Thursday, October 9, 2008

Bayesian Rating - how to implement a weighted rating system

I had saved this link in my Saved Links in my favorite social bookmarking site reddit. When I wanted it for reference the link and the site was gone. Luckily, Google cache to the rescue. So for future reference I am saving it here as is:

March 30th, 2006 in Basic Tutorials · By Markus Weichselbaum

Many web sites allow users to provide feedback on products, services or other users. In addition to verbal reviews, rating facilities are typically present that allow visitors to rate an item from 0 to 5 (often in conjuction with stars), from 0 to 10, or simply by voting + or -, respectively.

These visitor ratings are then often used to rank the rated items. And when “rank” comes into play, it gets tricky.

Ranking using Bayesian average

Hopefully the headline hasn’t turned you away yet – it smells of mathematical hardcore. But fear not, once you know how, implementing a robust rating and ranking system using the approach discussed here is really quite simple, very elegant, and most importantly, it works really well!

A basic example using simple + and - votes

In fact, the artworks in TheBroth gallery are visitor rated, using a rather simple + and - system. If you like an item, rate it “plus”. If you don’t like it, give it a “minus”.

The rating of an item would then be: number of positive votes divided by number of total votes. For example, 4 + votes and 1 - vote would correspond to a rating of 0.8, or 80%.

Now if you want to rank the items based on this simple equation, the following happens:

Assume you have on item with a rating of 0.93, based on 100s of votes. Now another new item is rated by a total of 2 visitors (or even just one), and they rate it +. Boom, it goes straight to #1 position in the ranking, as its rating is 100%!

A weighty issue

What we want is this:

If there is only few votes, then these votes should count less than when there are many votes and we can trust that this is how the public feels about it. In other texts this value is also refered to as “certainty” or “believability”.

This means, the more votes an item has, the higher the “weight” of these votes.

Thus, we want to calculate a corrected rating that somehow takes the weight of votes into account:

  • The more votes an item has, the closer the corrected rating value would be to the uncorrected rating value.
  • The less votes an item has – and this is the main trick here – the closer its rating should be to the average rating value of all items!

That way, new votes pull the corrected rating value away from the average rating, towards the uncorrected rating value.

There you have it – this is the main algorithm of what we call “Bayesian rating”, or rather “Bayesian ranking” as it is really about the relation of the item ratings to each other, based on the number of votes of each item.

Using a magic value

We now need to apply a “magic” value that determines how strong the weighting (or dampening, as some consider it) shall be. In other words, how many votes are required until the uncorrected value approximates the corrected value?

It really depends on how many votes the items get, in average. There is no point requiring 1000 votes for the item to rank 60% when each item only gets a handful of votes in average.

Thus, we could make this “magic” value exactly that, namely the average number of votes for all rated items, and voila, our Bayesian rating system is complete. By making the magic value dynamic, it will auto adapt to your system.

Finetuning the magic value

You could opt to create an upper limit to your magic value so that your doesn’t come to a grinding halt when there are many votes per item – an evergrowing magic value would make it less and less possible to actually influence the rating of a new item because it takes so many votes before you believe the rating of a new item.

The finetuning will depend on whether your system has a large influx of new items or not. If there are many new items added all the time, this influx will keep the average number of votes per item low. If your system has a fixed number of items, such as “rate your favorite star of The Beatles”, you may not need an upper limit. If you do add the occasional item, then an upper limit makes sense to give new items a chance to rate highly more quickly.

Bayesian rating for everyone

Now, lets summarize this all and provide a working formula for you to use in your code:

Bayesian Rating is using the Bayesian Average. This is a mathematical term that calculates a rating of an item based on the “believability” of the votes. The greater the certainty based on the number of votes, the more the Bayesian rating approximates the plain, unweighted rating. When there are very few votes, the bayesian rating of an item will be closer to the average rating of all items.

Use this equation:

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

Legend:

  • avg_num_votes: The average number of votes of all items that have num_votes>0
  • avg_rating: The average rating of each item (again, of those that have num_votes>0)
  • this_num_votes: number of votes for this item
  • this_rating: the rating of this item

Note: avg_num_votes is used as the “magic” weight in this formula. The higher this value, the more votes it takes to influence the bayesian rating value.

How Bayesian Rating is used in TheBroth

We use it to show the “highest rated artworks” in order. We wanted to avoid that a new artwork with 1 vote immediately jumps to first place, as its rating would be 100%. Using Bayesian rating, its starting rating with one positive vote would be a little bit higher than the average rating of all items.

Resources