What If a Five-Star Rating Was Actually Bad?

In her latest research on consumer behavior and decision making, Tuck associate professor Ellie Kyung investigates what happens to consumer judgment when our rating system is turned upside down.

So many things in our world are evaluated using numeric ratings, from the quality of an everyday product like shampoo to the more solemn assessment of employee performance.

Numeric ratings are popular because they are instantly comprehensible. But a problem can arise when rating systems clash. In the U.S., for example, a higher number equals a higher rating. In Germany, it is just the opposite: a lower number means something is better. According to a new study by Tuck associate professor Ellie J. Kyung, this can lead to errors in judgment when numeric ratings are shared cross-culturally.

The research behind “When Bigger is Better (and When it is Not): Implicit Bias in Numeric Judgments,” which Kyung co-authored with Manoj Thomas of Cornell University and the University of Michigan’s Aradhna Krishna, was sparked by Germany’s rating system. Kyung and her co-authors noticed that the rating scale in Germany’s version of Consumer Reports went the other way: a higher number meant a product was of poorer quality. “We started to wonder, what happens when you have to decide on something that is being rated in a format you are less used to?” says Kyung.

Over a series of seven experiments that used participants in both the U.S. and Germany, and tested judgments such as auction bids and willingness to pay, the authors identified what they term “the rating polarity effect.” What they found is a persistent tendency for U.S. consumers to be less sensitive to differences in product quality when using the smaller-is-better rating format, and for Germans’ judgments to be similarly skewed when using a bigger-is-better format. For both groups, the perceived difference between a terrible and a wonderful product is smaller than it would be under the familiar format.

We found it is difficult to override the implicit memory that causes bias.

“I think a lot of us would assume that because we’re dealing with numbers it’s very concrete, and that we just need to explain the differences between the rating systems and people will grasp it,” says Kyung. “But we did that and people’s judgments were still biased. What we found really interesting is how difficult it is to override the implicit memory that causes the bias.”

A case in point is the experiment described in the study involving a tooth-whitening product. U.S. participants were told that the manufacturer was considering launching a product in the U.S. and that it had received a quality rating from a reputable consumer welfare agency in Europe. Half of the participants were given a product quality rating for the teeth whitener that was either high or low, using a bigger-is-better rating format. The other participants were given a quality rating with the smaller-is-better format.

When the participants were asked to evaluate the same set of before-and-after photos of whitened teeth, those who had received the American, bigger-is-better format judged the difference between the high- and low-quality products as far more impressive than did those using the smaller-is-better rating format. The latter group saw virtually no difference between products rated as high versus low quality. “People’s perceptions of the product’s effectiveness were dramatically different depending on which rating format they were using, even though they were looking at exactly the same photographs,” says Kyung.

People’s perceptions were dramatically different even though they were looking at the same photographs.

So what’s the takeaway for businesses and organizations? “I think that this is where a translation process can become really important,” Kyung says. “For example, if you are introducing a product in a country that has an opposite numerical association with ratings, you would want to translate that number into the kind of scale that people there are used to encountering. And it’s the same if you have people evaluating grant applications, candidates for admission to a university, or managers evaluating people for jobs. Anywhere you have this kind of potential cross-cultural mixing of rating systems, you want to use the system that the manager or decision-maker is most familiar with.”