Where would we be without rating and reputation systems these days? Take them away, and we wouldn’t know whom to trust on eBay, which movies to pick on Netflix, or which books to buy on Amazon. Reputation systems (essentially rating systems for people) also help guide us through the labyrinth of individuals who make up our social web. Is this person worth my time? For pity’s sake, please don’t check our reputation points before deciding whether to read this article.
Rating and reputation systems have become standard tools in our design toolbox. But sometimes they are not well understood. A recent post on the IxDA forum showed confusion about how and when to use rating systems. Much of the conversation was about whether to use stars or some other iconography. These can be important questions, but they miss the central point of rating systems: to manage risk.
So, when we think about rating and reputation systems, the first question to ask is not, “Am I using stars, bananas, or chili peppers?” but, “What risk is being managed?”
What Is Risk?
We desire certainty in our transactions; it’s just our nature. We want to know that the person we’re dealing with on eBay won’t cheat us, or to find out that Blues Brothers 2000 is a bad movie (one star on Netflix) before we spend an evening on it. So risk, most simply (and broadly), arises when a transaction has a number of possible outcomes, some of which are undesirable, but the precise outcome cannot be determined in advance.
Where Does Risk Come From?
There are two main sources of risk that are important for rating and reputation systems: asymmetric information and uncertainty.
Asymmetric information arises when one party to a transaction cannot completely determine in advance the characteristics of the other party, and this information cannot credibly be communicated. The main question here is: can I, the buyer, trust you, the seller, to honestly complete the transaction we’re about to engage in? Will you take my money and run? Have you described what you’re selling accurately? And so on.
This unequal distribution of information between buyers and sellers is a characteristic of most transactions, even those where fraud is not a concern. Online transactions make asymmetric-information problems worse: we can no longer look the seller in the eye and judge their honesty, nor physically inspect what we’re buying to get a feel for its quality. We need other ways to manage the risk that asymmetric information generates.
The other source of risk is not knowing beforehand whether we’ll like the thing we’re buying. Here honesty and quality are not the issue, but rather our own personal tastes and the nature of the thing we’re buying. Movies, books, and wine are examples of experience goods, which we need to experience before we know their true value. For example, we’re partial to red wine from Italy, but that doesn’t mean we’ll like every bottle of Italian red wine we buy.
Managing Risk with Design
Among the ways to manage risk, two methods will be of interest to user experience designers:
- Signaling is where participants in a transaction communicate something meaningful about themselves.
- Reducing information costs involves reducing the time and effort it takes participants in a transaction to get meaningful information (such as: is this a good price? is this a quality good?).
Reputation systems tend to enable signaling and are best suited to evaluating people’s past actions. In contrast, rating systems leverage user feedback to reduce information costs and are best suited to evaluating standard products or services.
It is important to note that reputation systems are not the only way to signal (branding and media coverage are two others), and rating systems are not the only means of reducing information costs (better search engines and product reviews also help). But these two tools are becoming increasingly important because they provide quick reference points that capture useful data.
As we review various aspects of rating and reputation systems, the key questions to keep in mind are:
- Who is doing the rating?
- What, exactly, is being rated?
- If people are being rated, what behaviors are we trying to encourage or discourage?
Who is doing the rating?
An informal poll of several friends shows that about half use the Amazon rating system when buying books and the other half ignore it. Why do they ignore it? Because they don’t know whether the people doing the rating are crackpots or readers with tastes like their own.
Amazon has tried to counteract some of these issues by using features such as “Real Name” and “helpfulness” ratings of the ratings themselves (see Figure 1).
Figure 1: Amazon uses real names and helpfulness to communicate honesty of the review.
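Amazon doesn’t publish how it turns helpfulness votes into a ranking, but a minimal sketch (in Python, with invented vote counts) shows the basic problem any such ranking must solve: a naive “percent helpful” sort would rank a review with a single vote above a review with a hundred. One common solution, used here purely for illustration, is to rank by the lower bound of a confidence interval (the Wilson score) instead:

```python
import math

def wilson_lower_bound(helpful, total, z=1.96):
    """Lower bound of the 95% confidence interval on the 'helpful' proportion.

    A review with 90 of 100 helpful votes ranks above one with a single
    helpful vote, which a naive ratio (90% vs. 100%) would get backwards.
    """
    if total == 0:
        return 0.0
    phat = helpful / total                      # observed fraction of helpful votes
    centre = phat + z * z / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)

reviews = [
    {"title": "Great router!",  "helpful": 1,  "total": 1},    # 100%, but one vote
    {"title": "Easy to set up", "helpful": 90, "total": 100},  # 90% of many votes
]
reviews.sort(key=lambda r: wilson_lower_bound(r["helpful"], r["total"]), reverse=True)
print([r["title"] for r in reviews])  # 'Easy to set up' ranks first
```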
These features help, but they require time to read and evaluate the ratings and reviews. They also don’t answer the question: how much is this person like me?
Better is Netflix’s system (Figure 2), which is explicit about finding people like you, be they acknowledged friends or matched by algorithm.
Figure 2: Netflix lets you know what people like you thought of a movie.
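Netflix’s matching algorithm is proprietary, but the underlying idea, weighting other people’s ratings by how closely their tastes match yours, can be sketched in a few lines. Everything below (the similarity measure, the sample ratings) is illustrative, not Netflix’s actual method:

```python
import math

def taste_similarity(a, b):
    """Cosine similarity of two users over the movies both have rated (1-5 stars)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predicted_rating(me, others, movie):
    """Average of other users' ratings, weighted by how well their tastes match mine."""
    weighted = total_sim = 0.0
    for other in others:
        if movie in other:
            sim = taste_similarity(me, other)
            weighted += sim * other[movie]
            total_sim += sim
    return weighted / total_sim if total_sim else None

me = {"Fargo": 5, "Blues Brothers 2000": 1}
others = [
    {"Fargo": 5, "Blues Brothers 2000": 2, "The Big Lebowski": 5},  # tastes like mine
    {"Fargo": 1, "Blues Brothers 2000": 5, "The Big Lebowski": 2},  # tastes unlike mine
]
print(round(predicted_rating(me, others, "The Big Lebowski"), 1))  # ~4.2
```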
Both these systems implicitly recognize that validation of the rating system itself is important. Ideally users should understand three things about the other people who are doing the rating:
- Are they honest and authentic?
- Are they like you in a way that is meaningful?
- Are they qualified to adequately rate the good or service in question?
The last point is important. While less meaningful for rating systems of some experience goods (we’re all movie experts, after all), it is more important for things we understand less well. For example, while we might be able to say whether a doctor is friendly or not, we may be less able to fairly evaluate a doctor’s medical skills.
What is being rated?
Many rating systems are binary (thumbs up, thumbs down) or scaled (five stars, five chili peppers, etc.), but this one-dimensionality is inappropriate for complicated products or services that have many characteristics.
For example, Figure 3 depicts the rating system from the HP Activity Center and shows how not to do it. Users select a project that interests them (e.g., how to make an Ireland Forever poster) and then complete it using materials they can purchase from HP (e.g., paper). A rating system is included, presumably to help you decide which project deserves your valuable time.
Figure 3: The rating system on the HP Activity Center site: what not to do.
A moment’s reflection raises the question: what is being rated? The final outcome of the project? The clarity of the instructions? How much fun the project is? We honestly don’t know. Someone thoughtlessly included this rating system.
Good rating systems also don’t inappropriately “flatten” the information they collect into a single number. Products and services can have many characteristics; being unclear about which ones are being rated, or lumping them all into one score, is misleading and renders the rating meaningless.
RateMDs, a physician rating site, uses a smiley face to tell us how good a doctor is (Figure 4).
Figure 4: RateMDs.com rating system for doctors.
Simple? Yes. Appropriate? Perhaps not.
Better is Vitals, a physician rating site that includes doctors’ years of experience, any disciplinary actions against them, their education, and a patient rating system (Figure 5).
Figure 5: The multi-dimensional rating system on Vitals.com.
While Vitals has an overall rating, the components of the system matter more. Each variable – ease of appointment, promptness, and so on – reflects a point of concern that helps patients evaluate physicians.
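As a sketch of how such a multi-dimensional system might store its data (the dimensions below are borrowed loosely from the Vitals example; the numbers are invented), the key design decision is that the per-dimension breakdown is what gets stored and shown, while the single overall number is merely derived from it:

```python
from statistics import mean

# Hypothetical dimensions, modeled loosely on the Vitals example in Figure 5.
DIMENSIONS = ("ease_of_appointment", "promptness", "courteous_staff", "bedside_manner")

ratings = [  # each patient rates every dimension separately, on a 1-5 scale
    {"ease_of_appointment": 4, "promptness": 2, "courteous_staff": 5, "bedside_manner": 5},
    {"ease_of_appointment": 3, "promptness": 1, "courteous_staff": 4, "bedside_manner": 5},
]

# Keep the per-dimension breakdown; the overall score is derived, never stored alone.
breakdown = {d: round(mean(r[d] for r in ratings), 1) for d in DIMENSIONS}
overall = round(mean(breakdown.values()), 1)

print(breakdown)  # shows *why* the overall score is what it is
print(overall)    # 3.6: a well-liked doctor with a slow waiting room
```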
When rating experiences, what is being rated is relatively clear. Did you enjoy the experience of consuming this good or not? Rating physical goods and products can be more complicated. An ad hoc analysis of Amazon’s rating system (Figure 6) should help explain.
Figure 6: Amazon’s rating system is not always consistent.
In this example, the most helpful favorable and unfavorable reviews are highlighted. However, each review addresses different variables: the favorable review talks about how easy the router is to set up, while the unfavorable review complains about the lack of new features. The reviews are presented as comparable, but they are not; the raters were thinking about different characteristics of the router.
The point here is that rating systems need to be appropriate to the goods or services being rated. A rating system for books cannot simply be transplanted to routers, because we experience the two products in entirely different ways. The aspects we rate need to be carefully selected, based on the characteristics of the product or service being rated.
What behaviors are we trying to encourage?
Any rating of people is essentially a reputation system. Despite some people’s sensitivity to being rated, reputation systems are extremely valuable. Buyers need to know whom they can trust. Sellers need to be able to communicate – or signal based on their past actions – that they are trustworthy. This is particularly true online, where it’s common to do business with someone you don’t know.
But designing a good reputation system is hard. eBay’s reputation system has had some problems, such as the practice of “defensive rating” (rate me well and I’ll rate you well; rate me badly and I’ll rate you worse). This defeats the purpose of a rating system, since it undermines the honesty of the people doing the rating, and eBay has had to address the flaw. What started out as an open system now needs to permit anonymous ratings in order to save the reputation (as it were) of the reputation system.
While designing a good reputation system is hard, it’s not impossible. There are five key things to keep in mind when designing a reputation system:
1. List the behaviors you want to encourage and those that you want to discourage
It’s obvious what eBay wants to encourage (see Figure 7). A look at a detailed ratings page shows they want sellers to describe products accurately, communicate well (and often), ship in a reasonable time, and not charge unreasonably for shipping. (Not incidentally, you could also view these dimensions as sources of risk in a transaction.)
Figure 7: eBay encourages good behavior.
2. Be transparent
Once you know the behaviors you want to encourage, you need to be transparent about them. Your users need to know how they are being rated and on what basis. Often a reputation is distilled into a single number (say, reputation points), but it is impossible to look at a number and derive the formula that produced it. While Wikinvest (Figure 8) doesn’t show a formula (which would be ideal), it does indicate which of your actions produced your point total.
Figure 8: Wikinvest’s reputation system.
Any clarity you add to a reputation system will make your users happy, and it will make them more likely to behave in the manner you desire.
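One way to provide this transparency is to store reputation as a ledger of actions scored against a visible point table, so the total can always be decomposed for the user. This is a hypothetical sketch, not Wikinvest’s actual system; the action names and point values are invented:

```python
# Hypothetical point values; the real weights would reflect the behaviors
# you chose to encourage in point 1.
POINTS = {
    "article_written": 10,
    "edit_accepted": 3,
    "comment_posted": 1,
}

def reputation(actions):
    """Return the total score plus the per-action breakdown that justifies it."""
    breakdown = {}
    for action in actions:
        breakdown[action] = breakdown.get(action, 0) + POINTS[action]
    return sum(breakdown.values()), breakdown

total, breakdown = reputation(["article_written", "edit_accepted", "edit_accepted"])
print(total)      # 16
print(breakdown)  # {'article_written': 10, 'edit_accepted': 6} -- show this to users
```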
3. Keep your reputation system flexible
Any scoring system is open to abuse, and chances are that any reputation system you design will be abused in imaginative ways you can’t predict. It is therefore important to keep your system flexible: if people begin behaving in ways that enhance their reputations but don’t benefit the community, the reputation system needs to be adjusted.
Changing the weighting of certain behaviors is one way to adjust your system; adding ratings (or points) for new behaviors is another. The difficulty is keeping everything fair. People don’t like a playing field that shifts under them, so tweaks are better than wholesale changes, and any changes should be communicated clearly.
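In the point-table sketch under point 2, such tweaks are configuration changes rather than structural ones (again, purely hypothetical values):

```python
# Re-weight an abused behavior and reward a newly valued one.
POINTS["comment_posted"] = 0       # comment spam was inflating scores
POINTS["question_answered"] = 5    # hypothetical new behavior worth encouraging
```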
4. Avoid negative reputations
When possible, reputation systems should avoid casting individuals in a negative light. Negative reputations are important for protecting people, but they should not always be emphasized. This is especially true on community sites where users generate much of the content and little is at stake (except perhaps your prestige).
Looking at our example above (Figure 8), Wikinvest uses the term “Analyst” (a nice, non-offensive term … if you’re not in investment banking) to mean “this person isn’t really contributing much.”
5. Reflect reality
Reputation systems sometimes fail on community sites because people belong to multiple communities and no single site contains their complete reputation. While there are exceptions, allowing reputations earned elsewhere to be imported can be a smart way to bring your system in line with reality and increase the value of the information it provides.
Conclusion
Our discussion of rating and reputation systems is certainly incomplete. We hope, though, that we’ve given a good description of risk in online transactions and shown how understanding it can help user experience designers manage that risk through more robust rating and reputation systems.
In addition, we’d like to begin a repository of rating and reputation systems. If you find any that you’d like to share, feel free to submit them at http://101ratings.com/submit.php.