Calculating true performance ratings



Kevin Bonham
02-08-2006, 09:07 PM
Your true performance rating (TPR) is the rating at which your expected score against the field met equals your actual score, ie if that was your start rating you would neither gain nor lose points from the event.

Is there a simple way of determining TPR, even in the ELO system, to within say 20 points (or an online calculator that will do it)? Working out performance ratings by the common batched game method leads to inaccurate results if you have a few outliers skewing the ratings. For instance, I'll quite often play an event in which I play an 1100 in round 1 and nobody weaker than 1500 for the rest of the event. Where games are batched and I look up my %age on a lookup table, the outlier drags the average down so far that my crude PR is higher with the win against the outlier dropped.

Apologies if this has been covered before.

pax
03-08-2006, 10:14 AM
The solution is Ra in the following equation:

Ea=sum(i=1..n)(1/(1+10^((Rbi-Ra)/400)))

Where Ea is the score achieved, n is the number of games and Rb1, Rb2, ... Rbn are the ratings of the n opponents.

I don't think there is a simple analytical solution. On the other hand, a numerical solution is easy, since Ea is monotonic in Ra. Bisection can probably get you an answer in a few milliseconds.
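
The bisection idea above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code behind any calculator mentioned in this thread: the function names, the 0 to 4000 search bracket, the tolerance and the sample event are all invented for the example.

```python
def expected_score(ra, opponents):
    """Total expected score for rating ra against the field (logistic Elo formula)."""
    return sum(1.0 / (1.0 + 10 ** ((rb - ra) / 400.0)) for rb in opponents)


def true_performance_rating(opponents, score, lo=0.0, hi=4000.0, tol=1e-6):
    """Find Ra with expected_score(Ra) == score by bisection.

    Assumes the answer lies inside [lo, hi]; that bracket is an arbitrary choice.
    """
    n = len(opponents)
    if not 0 < score < n:
        # With 0% or 100% no finite rating satisfies the equation.
        raise ValueError("score must be strictly between 0 and the number of games")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Expected score rises with Ra, so move the bracket towards the root.
        if expected_score(mid, opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical event: one 1100-rated outlier, everyone else 1500 or higher.
field = [1100, 1500, 1600, 1700, 1800]
tpr = true_performance_rating(field, 4.0)
print(round(tpr))
```

Because each game contributes its own expected score, a win against a much weaker opponent can only nudge the result up slightly; it never drags it down the way averaging the field can.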

Hmm... Maybe I will do a calculator in PHP.

Garvinator
03-08-2006, 12:51 PM
Hmm... Maybe I will do a calculator in PHP.
:clap:

Ian Rout
03-08-2006, 03:01 PM
Noting the definition, or interpretation of the definition:

ie if that was your start rating you would neither gain nor lose points from the event.

the TPR would depend on the rating system; TPR(Glicko) and TPR(Elo) are different from each other and from TPRs under other rating systems.

If you are using a rating system that you can easily model in Excel (such as standard Elo) you can use the Goal Seek function to find the starting rating which generates a zero adjustment.

pax
03-08-2006, 03:16 PM
Try this:

http://www.paxmans.net/performance_calc.php

Test it out, let me know if there are any bugs. I can include Glicko as well, if people think that would be useful.

Kevin Bonham
04-08-2006, 08:58 PM
Noting the definition, or interpretation of the definition: the TPR would depend on the rating system; TPR(Glicko) and TPR(Elo) are different from each other and from TPRs under other rating systems.

Yes, this is true, especially under Glicko if you are playing opponents with a wide variation in rating reliability. I am not sure how different Glicko and ELO would be as a result of this in any given case.


Try this:

http://www.paxmans.net/performance_calc.php

Test it out, let me know if there are any bugs. I can include Glicko as well, if people think that would be useful.

Excellent! Thanks very much for that. Bill sent me an Excel file that works the same way, but with a small amount of manual trial and error to get the final result. I entered my results vs rated opponents in the current HICC Club Champs thus far and got the same figure (2052) in Bill's file and your php link.

One trivial bug found: it accepts negative scores (however a PR of zero is returned in such cases).

pax
04-08-2006, 09:39 PM
One trivial bug found: it accepts negative scores (however a PR of zero is returned in such cases).

Ta. Will fix that.

Igor_Goldenberg
07-08-2006, 10:01 AM
Try this:

http://www.paxmans.net/performance_calc.php

Test it out, let me know if there are any bugs. I can include Glicko as well, if people think that would be useful.

I tried it; it's quite good. However, it does not account for the 350 rule correctly.
Indeed, after I entered the result and got the TPR, I bumped the rating of my opponent to TPR-350 and recalculated. The TPR should have stayed the same, but it actually went up.

Bill Gletsos
07-08-2006, 12:11 PM
I tried it, It's quite good. However, it does not account for 350 rule correctly.
Indeed, after I entered the result and got TPR, I bumped ratings of my opponent to TPR-350 and recalculated. TPR should've stayed the same, but it actually went up.

When calculating true performance ratings there is absolutely no need to use a 350 rule or in fact any sort of xxx rule.

Igor_Goldenberg
07-08-2006, 12:21 PM
When calculating true performance ratings there is absolutely no need to use a 350 rule or in fact any sort of xxx rule.

TPR is the rating at which you would not lose/gain rating points. Loss/gain of points depends on the 350 rule.

Bill Gletsos
07-08-2006, 02:34 PM
TPR is the rating at which you would not lose/gain rating points.

Incorrect.

Loss/gain of points depend on 350 rule.

FIDE use a 350 point rule in calculating rating changes but that isn't the same as calculating what a true performance rating is.

FIDE don't even use a 350 rule in calculation of performance ratings in norms.

However even then we are not talking about the flawed method by which FIDE calculate performance ratings where they average the rating of your opponents as part of the calculation.

A true performance rating is the rating at which your expected score equals your actual score. When determining TPRs, 350 rules and any other such xxx rules have no place in the calculations; they are totally meaningless and in fact distort results.

e.g. a player 2800 plays a 2800 and a 1400 and he scores 0.5 and 1.0 respectively.

His true performance rating is 2800.
His performance using a 350 rule is 2877 which is totally incorrect.
His performance rating using averaging is 2291 which is also totally incorrect.
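
The 2800 and 2291 figures in this example can be checked with a few lines of Python. This is a rough sketch, assuming the logistic expected-score formula given earlier in the thread and assuming the averaging method converts the percentage score with 400*log10(p/(1-p)) rather than a printed lookup table; the variable names are made up for the illustration. The 350-rule figure is not reproduced here.

```python
import math


def expected(ra, rb):
    """Logistic Elo expected score for a player rated ra against one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


opponents = [2800, 1400]
score = 1.5

# Game-by-game expected score of a 2800 player against the actual field.
at_2800 = sum(expected(2800, rb) for rb in opponents)

# Averaging method: one conversion against the mean opponent rating.
mean_opp = sum(opponents) / len(opponents)   # 2100
p = score / len(opponents)                   # 0.75
averaged_pr = mean_opp + 400 * math.log10(p / (1 - p))

# Expected score the averaged figure would really earn against this field.
at_averaged = sum(expected(averaged_pr, rb) for rb in opponents)

print(f"{at_2800:.4f} {averaged_pr:.0f} {at_averaged:.4f}")
```

At 2800 the expected score is within a whisker of the actual 1.5, whereas a player rated at the averaged figure would be expected to score only a little over 1 point against the same two opponents.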

pax
07-08-2006, 11:08 PM
What Bill said :D

Vlad
09-08-2006, 11:59 AM
e.g. a player 2800 plays a 2800 and a 1400 and he scores 0.5 and 1.0 respectively.

His true performance rating is 2800.
His performance using a 350 rule is 2877 which is totally incorrect.
His performance rating using averaging is 2291 which is also totally incorrect.

Bill is the nicest person on the bb. :clap: How come all these ... complaining all the time??

Igor_Goldenberg
09-08-2006, 09:59 PM
What Bill said :D

Agree

Kevin Bonham
20-12-2015, 06:54 PM
pax's site is down at the moment.

ER
20-12-2015, 07:08 PM
pax's site is down at the moment.

how about that for a thread resurrection!!! :)

Kevin Bonham
20-12-2015, 07:26 PM
That just shows how long it has been up for!

Pepechuy
21-07-2016, 04:20 PM
Incorrect.

His true performance rating is 2800.
His performance using a 350 rule is 2877 which is totally incorrect.
His performance rating using averaging is 2291 which is also totally incorrect.

According to my computations, his true performance rating is 2800.000263
I think it can be safely rounded to 2800.
I am assuming a normal distribution with standard deviation 200*sqrt(2) to calculate the expected score for each game, just like Elo originally proposed.

Bill Gletsos
22-07-2016, 01:46 PM
According to my computations, his true performance rating is 2800.000263

Yes, to 6 decimal places. Using a normal distribution it is 2800.00026342 to 8 decimal places.

I think it can be safely rounded to 2800.

True.

I am assuming a normal distribution with standard deviation 200*sqrt(2) to calculate the expected score for each game, just like Elo originally proposed.

Elo switched to a logistic function and introduced it years ago in the USCF calculations.
Using the logistic function it is 2800.21939096 to 8 decimal places.

Interestingly the FIDE rating regulations totally mess this all up.
The published tables which are what they actually use for calculations are based on the normal distribution.
However the approximating formula they list is the logistic formula.

Why they stick with the inferior normal distribution is anyone's guess.
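
For anyone wanting to reproduce the normal versus logistic figures quoted above, here is a minimal Python sketch. It assumes a standard deviation of 200*sqrt(2) for the normal model and the usual 400-point logistic formula; the function names, the 0 to 5000 bracket and the fixed 200 bisection steps are arbitrary choices for the illustration.

```python
import math

SIGMA = 200 * math.sqrt(2)  # standard deviation of the rating difference (normal model)


def expected_normal(ra, rb):
    """Expected score under the normal model: Phi((ra - rb) / SIGMA)."""
    return 0.5 * (1.0 + math.erf((ra - rb) / (SIGMA * math.sqrt(2))))


def expected_logistic(ra, rb):
    """Expected score under the logistic model."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))


def tpr(opponents, score, expected, lo=0.0, hi=5000.0):
    """Bisection for the rating whose total expected score equals score."""
    for _ in range(200):  # far more halvings than double precision needs
        mid = (lo + hi) / 2.0
        if sum(expected(mid, rb) for rb in opponents) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


opponents, score = [2800, 1400], 1.5
normal_tpr = tpr(opponents, score, expected_normal)
logistic_tpr = tpr(opponents, score, expected_logistic)
print(f"normal:   {normal_tpr:.6f}")
print(f"logistic: {logistic_tpr:.6f}")
```

The only difference between the two runs is the per-game expected-score function, which makes it easy to see how little the choice of distribution matters in this example.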

Rincewind
22-07-2016, 02:40 PM
Why they stick with the inferior normal distribution is anyone's guess.

I suspect it has something to do with the bureaucratic nature of changing anything in FIDE.

BTW Do you have some reference to the argument Elo had at the time that the USCF switched? I believe it happened and in fact other people have said the same thing as you just adding that the USCF looked at a lot of data and determined the logistic distribution was better. But generally the logistic and normal distributions are difficult to distinguish without a very big dataset. A reference to the dataset or a graph of the data demonstrating the logistic distribution would be great.

Patrick Byrom
22-07-2016, 03:37 PM
BTW Do you have some reference to the argument Elo had at the time that the USCF switched? I believe it happened and in fact other people have said the same thing as you just adding that the USCF looked at a lot of data and determined the logistic distribution was better. But generally the logistic and normal distributions are difficult to distinguish without a very big dataset. A reference to the dataset or a graph of the data demonstrating the logistic distribution would be great.
Mark Glickman's paper (http://www.glicko.net/research/acjpaper.pdf) suggests (page 6) that the results are basically identical with either distribution, and it's just easier to calculate using the logistic - that was certainly my experience in implementing the Elo formula. (I tried to copy the relevant section, but Adobe Reader doesn't seem to like the format of the paper!)

Rincewind
22-07-2016, 05:20 PM
Mark Glickman's paper (http://www.glicko.net/research/acjpaper.pdf) suggests (page 6) that the results are basically identical with either distribution, and it's just easier to calculate using the logistic - that was certainly my experience in implementing the Elo formula. (I tried to copy the relevant section, but Adobe Reader doesn't seem to like the format of the paper!)

Thanks Patrick, I have the paper and can check it out. There are some figures in that paper, but mostly they are generic. Figure 6 is constructed from a large dataset of actual games, but I don't think it demonstrates that a particular distribution is better.

I also had problems with the paper; it seems to totally mess with Adobe's search function as well.

Pepechuy
25-07-2016, 07:32 AM
I think the difference of normal vs logistic is a very minor issue for FIDE ratings.
There are far bigger problems with the Elo system as implemented by FIDE:
1. Between two rated players, the expected score is looked up from a table that provides very low precision (with modern computers, it is easy to compute it directly).
2. The "conversion from fractional score into rating differences" is also provided by a table: the fractional score is first rounded to two decimals(!), and then the table is consulted. Again, modern computers can provide a very precise answer in a very short time.
3. For unrated players, the ratings of the opponents are averaged. Again, using modern technology, it is easy to compute a "true performance rating". I understand that FIDE does not want anything like that for new players that score more than 50%, I think this issue can be (artificially) addressed.
4. The 400-point rule is artificial. Computing the expected score for each game individually, there is no need for it.
5. In complete round-robin tournaments, all the results of the unrated players count towards the rating of their opponents; but even if one game is missing they do not (they also do not count in other type of competitions, like Swiss system). In an extreme case, this is quite unfair. It is possible to rate all the games solving a non-linear optimization problem (the only requirement is that the unrated player does not lose all the games, and does not win all the games). The procedure described by FIDE is based on assumptions that rely heavily on all the games being played, just drop those assumptions.

pappubahry
26-09-2020, 01:25 AM
I've made a webpage (https://pappubahry.com/misc/tpr_vega/) that takes a Vega cross table of a Swiss event and adds a (logistic) true-performance-rating column, with an option to replace ratings of zero with some other figure.

Kevin Bonham
26-09-2020, 12:15 PM
I've made a webpage (https://pappubahry.com/misc/tpr_vega/) that takes a Vega cross table of a Swiss event and adds a (logistic) true-performance-rating column, with an option to replace ratings of zero with some other figure.

Very nice, thankyou!

Pepechuy
28-09-2020, 02:01 PM
Your true performance rating (TPR) is the rating at which your expected score against the field met equals your actual score, ie if that was your start rating you would neither gain nor lose points from the event.

Is there a simple way of determining TPR, even in the ELO system, to within say 20 points (or an online calculator that will do it)? Working out performance ratings by the common batched game method leads to inaccurate results if you have a few outliers skewing the ratings. For instance, I'll quite often play an event in which I play an 1100 in round 1 and nobody weaker than 1500 for the rest of the event. Where games are batched and I look up my %age on a lookup table, the outlier drags the average down so far that my crude PR is higher with the win against the outlier dropped.

Apologies if this has been covered before.


It can be done easily in Excel, as long as the score is neither 0% nor 100% (I am thinking of the Elo system).
First, define a constant
sigma=200*sqrt(2)

Put the ratings of the opponents in a column (nothing else should be in it), let's say A
Define
n=count(A:A)
Now, you might have the score of your player for each game, or the total score.
If you have the individual score, just add them up and compute the total score.
I am assuming 0 < TotalScore < n

You now need an initial guess. The average rating of the opponents should work well.
Let's call this TruePR
Now, take another column, let's say column B.
In cell B1 put
=norm.dist(TruePR, A1, sigma, TRUE)

Copy this to the lower cells.
Sum column B. Let's call the result ExpectedScore
What we want to achieve is ExpectedScore = TotalScore, by modifying the TruePR
In another cell, write
(ExpectedScore - TotalScore)^2

Now open the Solver.
The Objective is to Minimise the cell with (ExpectedScore - TotalScore)^2, by changing the TruePR.
No extra restrictions are needed.

You might need to repeat this a few times.

This can be generalised if you have two (or more) players with unreliable ratings (or even unrated) who have played among themselves and need a TruePR.

Note: Pax uses a logistic distribution, while the Elo system is actually based on the normal distribution.
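
For readers without Excel, the spreadsheet steps above might be mirrored in Python along these lines. This is a sketch only: in place of the Solver it uses a Newton step with a bisection fallback (a swapped-in technique, not what Excel does internally), and the function names, the 2000-point bracket padding and the sample field are assumptions made for the example.

```python
import math

SIGMA = 200 * math.sqrt(2)  # the constant "sigma" from the steps above


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def solve_true_pr(column_a, total_score, tol=1e-9, max_iter=200):
    """Adjust TruePR until ExpectedScore matches TotalScore (normal model)."""
    n = len(column_a)
    if not 0 < total_score < n:
        raise ValueError("requires 0 < TotalScore < n")
    lo, hi = min(column_a) - 2000.0, max(column_a) + 2000.0  # assumed bracket
    true_pr = sum(column_a) / n  # initial guess: average opponent rating
    for _ in range(max_iter):
        column_b = [norm_cdf((true_pr - r) / SIGMA) for r in column_a]
        diff = sum(column_b) - total_score  # ExpectedScore - TotalScore
        if diff > 0:
            hi = true_pr
        else:
            lo = true_pr
        if abs(diff) < tol or hi - lo < tol:
            break
        slope = sum(norm_pdf((true_pr - r) / SIGMA) for r in column_a) / SIGMA
        newton = true_pr - diff / slope if slope > 0 else lo
        # Accept the Newton step only if it stays inside the bracket.
        true_pr = newton if lo < newton < hi else (lo + hi) / 2.0
    return true_pr


# Hypothetical field and score, for illustration only.
ratings = [1100, 1500, 1600, 1650, 1700, 1800]
result = solve_true_pr(ratings, 4.5)
print(round(result, 1))
```

Driving the difference ExpectedScore - TotalScore to zero is equivalent to minimising its square, and keeping a bracket around the root guards against the wild jumps a plain Newton step can make when the starting guess sits far from the answer.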

Greetings,
José.

Bulldozer
08-07-2021, 01:45 PM
I'm trying to get a better understanding of why the FIDE performance rating works so badly for extreme scores (say, 8.5 of 9), and found that FIDE didn't implement Elo's formula in full.
Have a look at "modified performance rating Rf" in his book (https://www.gwern.net/docs/statistics/comparison/1978-elo-theratingofchessplayerspastandpresent.pdf) "The Rating of Chess Players, Past & Present".
I thought it might fix things to some extent.
Does anyone have an idea what the formula behind F is? In other words, how to reconstruct the table?
Apparently, the purpose of Rf is to make the modified PR differ less from the opponents' average rating than the non-modified one, especially when the number of games is low or when the score is not 50%.
When I was reading the text, my first thought was that F would be just a ratio of the inverse Student's CDF to the inverse normal CDF. I checked, and it wasn't the case. Also, that ratio would produce counter-intuitive results, because the t-distribution has heavier tails, so in that case F for better scores (bigger Dps) would be >1, hence the modified performance rating Rf would be even higher than the non-modified one as a result.
So, it's something else, not that simple. I tried several things, including multiplication by sqrt(N), sqrt(p*(1-p)), etc., but couldn't match the F values in the table.

Bill Gletsos
08-07-2021, 03:19 PM
I'm trying to get a better understanding why the FIDE performance rating works so badly for extreme scores (say, 8.5 of 9), and found that FIDE didn't implement the Elo's formula in full.
Have a look at "modified performance rating Rf" in his book (https://www.gwern.net/docs/statistics/comparison/1978-elo-theratingofchessplayerspastandpresent.pdf) "The Rating of Chess Players, Past & Present".
I thought it might fix things at some extent.
Does anyone have an idea what the formula behind F is? In other words, how to reconstruct the table?
Apparently, the purpose of Rf is to make the modified PR less than non-modified.
When I was reading the text, my first thought was that F would be just a ratio of the inverse Student's CDF to the inverse normal CDF. I checked, and it wasn't the case. Also, that ratio would produce counter-intuitive results, because the t-distribution has heavier tails, so in that case F for better scores (bigger Dps) would be >1, hence the modified performance rating Rf would be even higher that the non-modified as a result.
So, it's something else - not that simple. I tried several things including multiplication by sqrt(N), sqrt(p*(1-p)), etc but couldn't match the F values in the table.

Using Rf isn't a solution, as the problem is that both FIDE's current method and Elo's Rf method use a batched solution.

Although Elo mentions in his book at 1.53 that a more precise formula for the performance rating is in section 8.85 (that you show above), he notes in section 1.55 there are times when the method of successive approximations described in 3.4 should be used.

He also states:

The exact value for Rp is simply the value at which the expected game score is equal to the actual game score.

Kevin stated this in post #1 of this thread.

Your true performance rating (TPR) is the rating at which your expected score against the field met equals your actual score, ie if that was your start rating you would neither gain nor lose points from the event.
The solution as stated by pax in post #2 is

The solution is Ra in the following equation:

Ea=sum(i=1..n)(1/(1+10^((Rbi-Ra)/400)))

Where Ea is the score achieved, n is the number of games and Rb1, Rb2, ... Rbn are the ratings of the n opponents.