The Statistically Significantly Illogical CAHPS Cut-Points


Imagine 100 students take a test.

The professor in this particular class is known to grade on a curve, so the students know beforehand that there is no preset grading scale for this test. They must simply strive to score as high as possible, and their letter grade will be based on the distribution of scores of all the students. Theoretically, at least. More on that in a second.

I’ll mention that the professor is also known to be a bit “wonky”. He is a statistics enthusiast who tends to make tests and their grading scales overly complicated, sometimes completely missing the point of what the test was actually measuring.

When the results of the test are tabulated, the professor posts each student’s score and the threshold for each letter grade. The thresholds are as follows:

[Image: table of letter-grade thresholds for the curved exam]

The students are initially perplexed that their results all fall within such an incredibly tight range. They realize that only 5% separates an A from an F, and they can’t help but wonder whether, given the limited variance between the “high” performers and the “low” performers, maybe they need a different test. Maybe the test isn’t measuring what it should be measuring.

Further, imagine that 63 students all score an 86, which is clearly in the C threshold based on the curve results. However, the 63 students do not all receive a C.

Of the 63 students who score 86, only 59 receive an actual C.

[Image: grades awarded to the 63 students who scored 86]

Of the remaining 4 students, 3 are awarded a B. The professor’s explanation is that those 3 scored a “more solid” 86. Their 86 was statistically significantly higher than the rest of the 86’s scored in class, meaning that, perhaps, they scored 86.4, the uppermost “86” possible. So, despite the fact that only 5 percentage points separate excelling from failing, the wonky professor has developed statistical caveats that allow students who achieved the extremely limited performance threshold to still not receive the grade associated with it.

And the last student who scored an 86? They receive a D. The professor’s explanation is that their 86 was statistically significantly lower than the rest of the 86’s scored in class. Possibly, the student scored an 85.5, the lowest “86” possible, and/or the professor deemed the student’s responses less reliable than the other students’ responses.
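The professor’s adjustment logic above can be sketched as a toy function. Every cut-point, the z-value, and the bump-one-grade rule here are hypothetical illustrations of the idea, not CMS’s actual methodology:

```python
# Toy sketch of "significance-adjusted" grading as described above.
GRADES = ["F", "D", "C", "B", "A"]  # lowest to highest

def base_grade(score, cutpoints):
    """cutpoints: dict mapping each grade above F to its minimum score."""
    grade = "F"
    for g in GRADES[1:]:
        if score >= cutpoints[g]:
            grade = g
    return grade

def adjusted_grade(score, se, cutpoints, class_mean, low_reliability=False, z=1.96):
    """Start from the cut-point grade, then bump one step up if the score is
    significantly above the class mean, or one step down if it is
    significantly below the mean or flagged as low-reliability."""
    i = GRADES.index(base_grade(score, cutpoints))
    if score - z * se > class_mean and i < len(GRADES) - 1:
        i += 1  # "a more solid 86": significantly high, awarded a B
    elif (score + z * se < class_mean or low_reliability) and i > 0:
        i -= 1  # significantly low or unreliable: dropped to a D
    return GRADES[i]

cuts = {"D": 84, "C": 86, "B": 88, "A": 90}  # hypothetical tight curve
print(adjusted_grade(86, 0.1, cuts, 85.5))                        # precise 86 above the mean -> "B"
print(adjusted_grade(86, 2.0, cuts, 85.5))                        # noisy 86 -> stays "C"
print(adjusted_grade(86, 2.0, cuts, 85.5, low_reliability=True))  # unreliable 86 -> "D"
```

The point of the sketch is that three calls with the identical score of 86 return three different grades, depending only on the standard error and the reliability flag.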

In the program these students are all enrolled in, every other professor in every other class awards the same grade to students who achieve the same score. Every professor and every test…except this one. Why didn’t all 63 students receive the same grade in this class?

This is exactly what happened last year with 63 Medicare Advantage health plans and the Rating of Health Care Quality measure from the Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS®) survey, as part of the Star Ratings program.

Real World Example

Rating of Health Care Quality: 2018 CAHPS Survey (2019 Star Ratings)

[Image: 2018 CAHPS Rating of Health Care Quality scores of 83–87 and the Star Ratings plans actually received]

Illustrated above are actual numbers from the 2018 CAHPS survey. These numbers are specific to Medicare Advantage health plans that scored between 83 and 87 on the Rating of Health Care Quality measure. This measure is based on a single question asked of a random sample of members from each health plan:

Using any number from 0 to 10, where 0 is the worst health care possible and 10 is the best health care possible, what number would you use to rate all your health care in the last 6 months?

The measure itself is not complicated. Only the scoring is. Some Star Rating measures from the CAHPS survey involve the calculation of a composite score based on up to six separate questions. This measure is based on a single question.
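As a rough sketch, a single-question measure score like this one can be thought of as the mean of members’ 0-to-10 ratings rescaled to a 0–100 scale. The real CAHPS scoring also applies case-mix adjustment and reliability weighting, which this toy version deliberately omits:

```python
def measure_score(ratings):
    """Mean of 0-10 survey responses, rescaled to 0-100 and rounded.
    Omits CAHPS case-mix adjustment and reliability weighting."""
    if not ratings:
        raise ValueError("no responses to score")
    return round(sum(ratings) / len(ratings) * 10)

print(measure_score([9, 8, 9, 10, 7]))  # -> 86
```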

And as shown in the numbers above, in many cases health plans achieved the minimum threshold to receive a certain Star Rating, but they received a lower (or higher) Star Rating based on the reliability of their responses and the statistical significance of their score compared to the same scores by other health plans.

Some other examples to help further orient you to the data above:

[Image: additional examples of plans receiving different Star Ratings for identical 2018 CAHPS scores]

This method of scoring doesn’t apply to all 40+ Star Rating measures. It applies only to the 9 measures from the CAHPS survey. Why are they so different, especially measures such as Rating of Health Care Quality that are based on a single question? Why does the math have to be so fuzzy for these measures? And if it does have to be different for these measures based on some valid statistical reason – I confess I’m not a statistician – why can’t there be more transparency? Plans are left frustrated and often unclear about exactly why they received something other than the Star Rating their score warrants.

If member incentive plans were measured and rewarded this way, it would lead to frustrated, disengaged members.

All in the Family

Another interesting note is that many of these examples involve plans from the same parent organization. Imagine Thanksgiving dinner if 9 students from the same family all received the same grade on an exam, even though 5 of them scored high enough to receive a higher grade.

[Image: plans from the same parent organization receiving identical Star Ratings despite differing scores]

This Year’s Results

Last week, CMS released its first plan preview for 2020 Star Ratings. As part of that release, plans saw their own CAHPS rates and Star Ratings for this year. As I helped plans review and validate their data, I was reminded of the frustration plans have when they achieve a certain threshold but fail to receive the related Star Rating, due to these statistical caveats. The results of all plans are not yet public; thus the need to revisit last year’s results for the analysis above. However, we do have access to this year’s base Star thresholds.

As shown below, the gap between the score that earns 1 Star on any given CAHPS measure (excluding Annual Flu Vaccine) and the score that earns 5 Stars on that same measure this year is as low as 4 percentage points and no more than 7.

Case in point: Plan A scores 87 on the Getting Needed Prescription Drugs measure and receives 1 Star, the lowest possible score a plan can receive. Plan B scores 91, only 4 points higher than Plan A, and receives 5 Stars, the highest possible score. Can 4 points really distinguish a plan with the highest-possible rating from a plan with the lowest-possible rating?
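Without the statistical caveats, the cut-point lookup itself is trivial. The thresholds below are hypothetical, chosen only to mirror the 4-point spread in the Plan A / Plan B example above:

```python
def star_rating(score, thresholds):
    """thresholds: minimum scores for 2, 3, 4, and 5 Stars, ascending."""
    stars = 1
    for cut in thresholds:
        if score >= cut:
            stars += 1
    return stars

# Hypothetical cut-points: a 4-point band separates 1 Star from 5 Stars.
needed_rx_cuts = [88, 89, 90, 91]
print(star_rating(87, needed_rx_cuts))  # Plan A -> 1
print(star_rating(91, needed_rx_cuts))  # Plan B -> 5
```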

[Image: base CAHPS cut-points for the 2020 Star Ratings]

Increasing in Importance

Beginning with the 2020 survey, CMS is placing even more emphasis on the CAHPS survey – by increasing the weight on 8 of the 9 measures along with several other key member experience measures. The combined weight of these member experience measures will be greater than in the past, forcing plans to invest more in improvement of the member experience if they want to perform highly in Star Ratings.
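To see why the weight change matters: the overall rating is, roughly, a weighted average of per-measure stars. The measures, stars, and weights below are made up purely to illustrate the mechanic of member-experience measures counting for more:

```python
def overall_rating(stars_and_weights):
    """stars_and_weights: list of (measure stars, measure weight) pairs.
    Returns the weighted average of stars, rounded to two decimals."""
    total_weight = sum(w for _, w in stars_and_weights)
    weighted = sum(s * w for s, w in stars_and_weights)
    return round(weighted / total_weight, 2)

# Same per-measure stars; only the member-experience weight changes.
before = overall_rating([(5, 1), (5, 1), (2, 1.5)])  # -> 3.71
after = overall_rating([(5, 1), (5, 1), (2, 2)])     # -> 3.5
print(before, after)
```

With the higher weight, the same weak member-experience score drags the overall rating down further, which is exactly the pressure on plans described above.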

The Medicare Advantage industry is ready for this shift in importance of member experience measures and does not appear to be pushing back. Health plans need to improve their members’ experience. There is zero argument there. However, the scoring for those measures needs to be clearer.

