Performance ratings models for 100% and 0% scores



Kevin Bonham
14-07-2010, 11:52 PM
There has been a fair bit of discussion about performance ratings for cases where a player has a 100% (or zero) score. You can see some of this on ChessBase here (http://chessbase.com/newsdetail.asp?newsid=6316).

Technically, under the ratings formulae, you simply can't convert 100% or 0% to a performance rating; it comes out infinitely far above or below the ratings of the people you are playing.

However, some programs still attempt the conversion anyway. A practice used by some programs is to assume a 99% score and add c. 800 points to the player's rating; I have also seen some programs add a smaller amount (e.g. 400).

A flat addition is unsatisfactory since it treats the 100% as just as exceptional when based on 1 game as when based on 100 (and indeed if you base it on more than 50 games the player is penalised for winning every game instead of drawing at least one).
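The "more than 50 games" figure follows from simple arithmetic if the flat stand-in for a perfect score is taken to be 99%: a single draw in n games scores (n - 1/2)/n, which exceeds 99% once n is above 50. A minimal sketch of that check (the names `ASSUMED_PERFECT`, `one_draw_fraction` and `threshold` are just illustrative):

```python
from fractions import Fraction

# Assumption: a perfect score is replaced by a flat 99%, as described above.
ASSUMED_PERFECT = Fraction(99, 100)

def one_draw_fraction(n):
    """Score fraction for n - 1 wins and one draw from n games."""
    return Fraction(2 * n - 1, 2 * n)

# Smallest number of games at which conceding one draw scores
# strictly better than the flat 99% given for winning every game.
threshold = next(n for n in range(1, 1000)
                 if one_draw_fraction(n) > ASSUMED_PERFECT)
print(threshold)  # 51
```

At exactly 50 games the two are equal (49.5/50 = 99%), so the penalty only appears from 51 games on.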

ChessBase cites Ken Thompson as adding a notional draw between the player and themselves (a method ChessBase is now going to adopt). So for instance, if a 2600 has 3/3 against three 2500s, treat it as if they have 3.5/4 against three 2500s and a 2600 (which comes out to 2868; 2.5/3 against the three 2500s would have been 2780). In this way, low-rated players who get perfect scores are given lower PRs than high-rated players who do the same thing against the same field. An advantage of this system is that a player on 100% is always performing (if you use TPRs) at or above their own rating.
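As a rough illustration of the notional-draw idea, here is a sketch that finds the rating whose summed expected score matches the actual score. It assumes the standard logistic Elo expectation curve, which is my assumption rather than something stated in the post; with that curve the results land within a point or so of the figures quoted above. The function names (`expected`, `tpr`, `thompson_tpr`) are hypothetical.

```python
def expected(rating, opponent):
    """Expected score under the logistic Elo curve (an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def tpr(score, opponents, lo=0.0, hi=4000.0):
    """Rating at which the summed expected score equals `score` (bisection).

    Only meaningful for 0 < score < len(opponents).
    """
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(expected(mid, o) for o in opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def thompson_tpr(own, score, opponents):
    """Add a notional draw against oneself when the score is 0% or 100%."""
    if score in (0, len(opponents)):
        return tpr(score + 0.5, list(opponents) + [own])
    return tpr(score, opponents)

print(round(thompson_tpr(2600, 3, [2500] * 3)))    # 3/3 with the notional draw
print(round(thompson_tpr(2600, 2.5, [2500] * 3)))  # ordinary 2.5/3
```

Because the player's own rating enters the field, a lower-rated player with the same perfect score against the same opponents comes out lower.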

However, I don't entirely like this system as applied to TPRs (and I just don't use batched PRs if I can avoid it), because a low-rated player gets almost no credit for getting a perfect score rather than conceding half a point. Example: a junior, rated 1600, plays three 2000-rated players in a row. The following are the junior's TPRs under the Thompson system for various scores:

0/3: 1491
0.5/3: 1720
1/3: 1879
1.5/3: 2000
2/3: 2121
2.5/3: 2280
3/3: 2288

...and the last one seems a bit harsh. (And yes, you can add more games and the same problem is still there.)

A similar problem with the Thompson method is this. A player beats a bunch of players of their own rating in a row. They then play another player of their own rating, and draw. The draw is a bad result compared to the other game results, and should reduce the performance rating, but it doesn't affect their Thompson method performance rating at all.
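The reason the draw makes no difference is that n wins plus a notional draw against yourself is numerically the same input as n wins plus a real draw against an equally rated opponent. A small sketch of this, again assuming the logistic Elo curve (not stated in the post) and using hypothetical helper names:

```python
def expected(rating, opponent):
    """Expected score under the logistic Elo curve (an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def tpr(score, opponents, lo=0.0, hi=4000.0):
    """Rating at which the summed expected score equals `score` (bisection)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(expected(mid, o) for o in opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def thompson_tpr(own, score, opponents):
    """Add a notional draw against oneself when the score is 0% or 100%."""
    if score in (0, len(opponents)):
        return tpr(score + 0.5, list(opponents) + [own])
    return tpr(score, opponents)

own, wins = 2000, 5  # illustrative values only

# Five wins against equals: the notional draw turns this into 5.5/6.
all_wins = thompson_tpr(own, wins, [own] * wins)
# Five wins and one real draw against equals: already 5.5/6.
wins_then_draw = thompson_tpr(own, wins + 0.5, [own] * (wins + 1))

print(all_wins == wins_then_draw)  # True
```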

I think the following might be one approach to using a lot of empirical data to model a sound version. Since a game is a granular outcome, a range of performances are likely to round to 100%. For instance, if you play at a playing strength a few hundred points above your opponent, and would score 85% against that opponent by playing that well against them over a large number of games, then in any given game you're a lot more likely to win than draw or lose. Over two games, you've probably got a more than 50% chance of being 2/2.

If you're playing to a playing strength that would give you a 60% score against that opponent, then you're not all that likely to beat them, unless you're both so weak (or sharp) that you'd hardly ever draw. More likely your chance of a win in a given game is something like 40% and your chance of two wins in a row is something like 16%.

What all this means is that if you know a player has 100% from a given number of games against a field of a given strength, and have access to win-rate data by rating for players of all rating strengths, you can infer that if a player was performing at a given rating, they have a given chance of scoring 100% from that number of games. Then, using that range of probabilities and the distribution of different ratings across all players, you can work out an average rating strength for a player who would obtain that result. (This cuts out the infinity problem, since while a player infinitely higher rated than you has a 100% chance of scoring 100% against you, no players like that exist, and therefore they disappear from the model.)
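To make the idea concrete, here is a toy sketch of that calculation. Everything numeric in it is a placeholder for the empirical data described above: the logistic expectation curve, the draw model in `win_prob`, and the normal-shaped distribution of playing strengths in `prior` (mean 1500, spread 300) are all assumptions chosen only so the example runs.

```python
import math

def expected(rating, opponent):
    """Expected score under the logistic Elo curve (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def win_prob(rating, opponent, max_draw=0.3):
    """Placeholder win-rate model: draws peak at `max_draw` between equals.

    Real win-rate-by-rating data would replace this function.
    """
    e = expected(rating, opponent)
    return e * (1 - 2 * max_draw * (1 - e))

def prior(rating, mean=1500.0, sd=300.0):
    """Placeholder distribution of playing strengths (unnormalised)."""
    return math.exp(-0.5 * ((rating - mean) / sd) ** 2)

def perfect_score_estimate(opponents, grid=range(0, 3501, 5)):
    """Average strength of players who would win every game in this field."""
    num = den = 0.0
    for r in grid:
        weight = prior(r)
        for opp in opponents:
            weight *= win_prob(r, opp)  # chance of beating every opponent
        num += weight * r
        den += weight
    return num / den

print(round(perfect_score_estimate([2000])))      # 1/1 against a 2000
print(round(perfect_score_estimate([2000] * 3)))  # 3/3 against 2000s
```

Under these assumptions the estimate is always finite, and it rises as the perfect score is sustained over more games, which is the behaviour a flat addition lacks.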

However, all that is a heck of a lot of work, and really not worth the bother.

I'd be interested to know if anyone has any better practical approximations than Thompson's.

One I have looked at is to treat a 100% score as equivalent to that score minus a small fraction of a game. So, for instance, the fraction might be 0.2 (it shouldn't be more than 0.3 and should probably be a fair bit less than that). Then 1/1 is treated as 0.8/1, 2/2 as 1.8/2, 3/3 as 2.8/3, etc. (and you then find the performance rating as normal using that percentage score against those opponents).
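A minimal sketch of this adjustment, assuming the logistic Elo curve for the "as normal" performance calculation and mirroring the adjustment for 0% scores (both are my assumptions; the helper names are hypothetical):

```python
def expected(rating, opponent):
    """Expected score under the logistic Elo curve (an assumption here)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def tpr(score, opponents, lo=0.0, hi=4000.0):
    """Rating at which the summed expected score equals `score` (bisection)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(expected(mid, o) for o in opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def adjusted_score(score, games, fraction=0.2):
    """Shave `fraction` of a game off a perfect score.

    Assumption: a 0% score is mirrored by adding the same fraction.
    """
    if score == games:
        return score - fraction
    if score == 0:
        return score + fraction
    return score

# 3/3 against three 2000s is looked up as 2.8/3.
print(round(tpr(adjusted_score(3, 3), [2000] * 3)))
# 1/1 for a 2600 against an 1800 is looked up as 0.8/1.
print(round(tpr(adjusted_score(1, 1), [1800])))
```

The second call illustrates the small-sample weakness: the 2600-rated player's single win over an 1800 comes out far below 2600.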

This gets rid of the absurdities of adding or subtracting a single value that can be too high or too low depending on how many games are included. But a problem with this one can be that from a small number of games, a player who vastly outrates their opposition can be said to have performed below their own rating with a 100% score. (Also a problem with my theoretical version above.)

Rincewind
15-07-2010, 12:32 AM
A simple thing to do would be to use some score less than 100% but greater than the next highest possible score for that number of rounds. For example, if after n rounds a player has won all their games, then instead of doing a reverse lookup with 1 (which leads to an infinity) you use 1-1/[2(n+2)]. So after one round use 5/6, after two rounds use 7/8, etc. The reason for the plus 2 is so that if the player draws in the next round, the score used in the performance calculation will go down, which mitigates some of the issues you raised.

Likewise, a player on 0/2, say, would use a figure of 1/8.

Another advantage is that it is independent of the player's own rating, which seems fair: why should two players with the same score against the same opposition have different performance ratings?

I haven't tried to apply this, just an idea.
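For what it's worth, the substitution 1-1/[2(n+2)] is easy to sketch with exact fractions; the function name `lookup_fraction` is just illustrative:

```python
from fractions import Fraction

def lookup_fraction(n, perfect=True):
    """Score fraction to look up after n wins out of n (or n losses out of n)."""
    shave = Fraction(1, 2 * (n + 2))
    return 1 - shave if perfect else shave

print(lookup_fraction(1))                 # 5/6
print(lookup_fraction(2))                 # 7/8
print(lookup_fraction(2, perfect=False))  # 1/8

def after_draw(n):
    """Actual score fraction after n wins followed by one draw."""
    return Fraction(2 * n + 1, 2 * (n + 1))

# The "+2" guarantees a draw in the next round lowers the looked-up figure.
print(all(after_draw(n) < lookup_fraction(n) for n in range(1, 101)))  # True
```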

Kevin Bonham
15-07-2010, 01:00 AM
Originally posted by Rincewind: A simple thing to do would be to use some score less than 100% but greater than the next highest possible score for that number of rounds. For example, if after n rounds a player has won all their games, then instead of doing a reverse lookup with 1 (which leads to an infinity) you use 1-1/[2(n+2)]. So after one round use 5/6, after two rounds use 7/8, etc. The reason for the plus 2 is so that if the player draws in the next round, the score used in the performance calculation will go down, which mitigates some of the issues you raised.

I think this works very well for small numbers of games. For large numbers of games, it has a possible limitation: n*(1-1/[2(n+2)]) closely approaches (n-1/2), meaning that the difference between a 100% score and 100%-minus-a-draw becomes very small. For instance, while 1 win at .83 is much better than a draw at 0.5, +10=0-0 at .9583 (possibly rounded to .96 for lookup purposes) is not much different to +9=1-0 at .95. (It is still better, though, than the Thompson method, in which +10=0-0 is only worth .9545.)
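The three figures compared above can be tabulated for any number of games. This is only a sketch of the arithmetic; the function names are illustrative, and the Thompson fraction is the raw score over the enlarged field of n+1 games, as in the comparison above:

```python
from fractions import Fraction

def rincewind_fraction(n):
    """Lookup fraction for n wins from n games: 1 - 1/[2(n+2)]."""
    return 1 - Fraction(1, 2 * (n + 2))

def thompson_fraction(n):
    """n wins plus one notional draw, out of n + 1 games."""
    return Fraction(2 * n + 1, 2 * (n + 1))

def one_draw_fraction(n):
    """n - 1 wins and one draw from n games."""
    return Fraction(2 * n - 1, 2 * n)

for n in (1, 3, 10, 50):
    print(n,
          round(float(rincewind_fraction(n)), 4),
          round(float(thompson_fraction(n)), 4),
          round(float(one_draw_fraction(n)), 4))
```

For n = 10 this gives .9583, .9545 and .95 respectively, and the gaps keep shrinking as n grows.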

Kevin Bonham
02-04-2011, 12:41 AM
I noticed today that SP uses +999 for 100%. (My test example was a 14-player round robin where one player scored 13/13 but every opponent was rated 1000). Presumably it uses -999 for 0% but I haven't tested that.