PDA

View Full Version : performance ratings

23-01-2006, 11:15 AM
I was thinking another day how performance ratings are calculated and find the process less than satisfactory. The first well known issue is that average rating is used instead of doing it game by game. This one is easy to fix and this is what I believe Bill is doing in Australia.

The second problem is that actual ratings are used in calculations. This might be not so good idea because the ratings are quite often lagging. Take example of Raymond in the last championship. His actual rating was 2050, while he performed about 2300. When the performance ratings of his opponents are calculated the above fact is not taken into account. This problem is easy to fix if approach that scientists use is applied. (Scientists have a very similar problem. They rank scientific journals. There are cross references from one journal to another. There is original ranking, which is used to calculate new ranking based on references a particular journal receives from other journals. The process is repeated until the ranking converges.)

In performance rating terms the process can be amended in the following way. After old performance ratings are calculated, they are used as initial ratings in the process and using them new performance ratings are calculated. The process is repeated until performance ratings converge. The process will give true performance rating which does not depend on your initial rating.

Rincewind
23-01-2006, 12:04 PM
I was thinking another day how performance ratings are calculated and find the process less than satisfactory. The first well known issue is that average rating is used instead of doing it game by game. This one is easy to fix and this is what I believe Bill is doing in Australia.

Can you expand on how this works?

Bill Gletsos
23-01-2006, 12:11 PM
Can you expand on how this works?It involves not only game by game but also iteration. I explained how it works in post http://www.chesschat.org/showpost.php?p=83771&postcount=39

Rincewind
23-01-2006, 12:22 PM
It involves not only game by game but also iteration. I explained how it works in post http://www.chesschat.org/showpost.php?p=83771&postcount=39

Thanks. I think game-by-game is a bit of a misnomer as taking an average is also game-by-game in the same sense. What you are doing is matching true expected total scores, rather than averaging the rating of opponents.

From a purist point of view, it's a pity that iteration is required (although the function shoud be monotonic which greatly simplifies the task of root finding).

Have you looked for an explicit form? Refresh my memory, is the expected score calculated using a normal or logistic distribution?

23-01-2006, 01:57 PM
It involves not only game by game but also iteration. I explained how it works in post http://www.chesschat.org/showpost.php?p=83771&postcount=39

I hope it is clear that your iteration is defferent from my iteration. If I correctly understand your post, your iteration is needed because in the first place you take an average rating, which is an approximate for calculating the performance. My iteration comes because the true performance rating should not depend on the initial rating.

p.S. A stupid question - I hope you do not iterate by your hands. It will take me about 15 minutes to write a computer program that can do that.

Bill Gletsos
23-01-2006, 02:01 PM
I hope it is clear that your iteration is defferent from my iteration.It is to me. I cannot speak for others. ;)

If I correctly understand your post, your iteration is needed because in the first place you take an average rating, which is an approximate for calculating the performance. My iteration comes because the true performance rating should not depend on the initial rating.Of course there are situations where you may not get convergence.

p.S. A stupid question - I hope you do not iterate by your hands.No. By program.

It will take me about 15 minutes to write a computer program that can do that.You can also do it via Excel.

Bill Gletsos
23-01-2006, 02:02 PM
Thanks. I think game-by-game is a bit of a misnomer as taking an average is also game-by-game in the same sense. What you are doing is matching true expected total scores, rather than averaging the rating of opponents.

From a purist point of view, it's a pity that iteration is required (although the function shoud be monotonic which greatly simplifies the task of root finding).

Have you looked for an explicit form? Refresh my memory, is the expected score calculated using a normal or logistic distribution?Logistic, however in my example both logistic and normal round to 2626.
Logistic is 2626.2 and normal 2625.8.

23-01-2006, 02:13 PM
Of course there are situations where you may not get convergence.

Do not agree. The process of performance rating calculation is equivalent to multiplying a vector by a matrix. If you repeat it - it means you multiply by the same matrix again.

So what you have is x=A^n*y, where y is a vector of initial ratings. A is a matrix that gives performance rating when you multiply initial ratings by it. It is to the power of n because we repeat the process potentially infinite number of times.

There is a very well known theorem in mathematics (unfortunately I forgot its name), which is saying that in the limit when n goes to infinity x has to converge to a unique vector, which is the eigenvector for matrix A.

Bill Gletsos
23-01-2006, 02:37 PM
Do not agree.Sorry, I should have been clearer.
There are situations where I dont believe they meaningfully converge.

The process of performance rating calculation is equivalent to multiplying a vector by a matrix. If you repeat it - it means you multiply by the same matrix again.

So what you have is x=A^n*y, where y is a vector of initial ratings. A is a matrix that gives performance rating when you multiply initial ratings by it. It is to the power of n because we repeat the process potentially infinite number of times.

There is a very well known theorem in mathematics (unfortunately I forgot its name), which is saying that in the limit when n goes to infinity x has to converge to a unique vector, which is the eigenvector for matrix A.I know the one you mean. Its name escapes me too.

Now perhaps I am wrong but I seem to recall seeing an example where they do not meaningfully converge.
If I remember it correctly it goes something like the following.

Players A and B draw with each and both beat player C and player D.
Player C in turn draws with Player D.

All start out nominally rated at 1500.
Their performance ratings wont meaningfully converge.

The reason why is as follows:

A=B+C+D
B=A+C+D
C-A-B=D
D-A-B=C

In the above scenario with A, B, C and D you effectively have two groups (A,B) , (C,D) and everyone in the first group beats everyone they play in the second group. So this effectively 100% / 0% and (A,B) keep rising and (C,D) keep falling.

Eventually they will converge at the point where (A,B)'s rating is around 920 points higher than (C,D). However are they meaningful.

It is essentially the problem of how much stronger is Player X than Player Y when X has 100% score against Y over a number of games. All you know from the logistic scale is that X is possibly 920 points higher than Y or if using the normal scale 735 higher, however how much higher is unknown.

23-01-2006, 03:57 PM
Sorry, I should have been clearer.
There are situations where I dont believe they meaningfully converge.
I know the one you mean. Its name escapes me too.

Now perhaps I am wrong but I seem to recall seeing an example where they do not meaningfully converge.
If I remember it correctly it goes something like the following.

Players A and B draw with each and both beat player C and player D.
Player C in turn draws with Player D.

All start out nominally rated at 1500.
Their performance ratings wont meaningfully converge.

The reason why is as follows:

A=B+C+D
B=A+C+D
C-A-B=D
D-A-B=C

In the above scenario with A, B, C and D you effectively have two groups (A,B) , (C,D) and everyone in the first group beats everyone they play in the second group. So this effectively 100% / 0% and (A,B) keep rising and (C,D) keep falling.

Eventually they will converge at the point where (A,B)'s rating is around 920 points higher than (C,D). However are they meaningful.

It is essentially the problem of how much stronger is Player X than Player Y when X has 100% score against Y over a number of games. All you know from the logistic scale is that X is possibly 920 points higher than Y or if using the normal scale 735 higher, however how much higher is unknown.

Yep, that is right. The same is true for scietific journals however as i mentioned it is implemented. The probablity of this event in a reasonable 11-round competition is pretty small.

Say, for the recent championship do you believe this problem is relevant?

Bill Gletsos
23-01-2006, 03:59 PM
In fact try this scenario.

Players A and B draw with each and both beat player C.
Player C in turn draws with Player D.

All start out nominally rated at 1500.
Their performance ratings wont meaningfully converge.

The reason why is as follows:

A=B+C
B=A+C
C-A-B=D
D=C

These dont seem to converge at all.
The ratings for all 4 players just continue to drop.

Bill Gletsos
23-01-2006, 04:03 PM
Yep, that is right.Thanks for confirming that.

The same is true for scietific journals however as i mentioned it is implemented. The probablity of this event in a reasonable 11-round competition is pretty small.True, provided you dont have a number of withdrawls who only played a few games and also played each other.
I've also seen it happen enough in swiss events with many new players who are mostly unrated and very weak.

Say, for the recent championship do you believe this problem is relevant?No.

Rincewind
23-01-2006, 07:01 PM
Logistic, however in my example both logistic and normal round to 2626.
Logistic is 2626.2 and normal 2625.8.

Well logistic is probably easier to work with. In that case it looks like you are dealing with a polynomial of degree n where n is the number of games in the calculation. Once n is reasonably large (greater than 4 or 5) you are doing iterative calculations anyway. With just a little bit of effort you could probably express this polynomial for a general n and then it would be a piece of cake to write a root-finding routine that converges very quickly using something as simple as Newton's method.

However an explicit form would seem to be unobtainable.

Out of interest what is a typical number of iterations currently required?

Rincewind
23-01-2006, 08:03 PM
Actually, expressing the thing as a polynomial is not a requirement to use Newton's method. You just need something which is differentiable and the starting function is is that. Here is a brief description of the method please let me know if you want further details...

First thing is to normalise the opponent coefficients. call these a[r]

a[r] = 10 ^ (rating/400)

for r = 1..n

Then if E is the expected score you are matching to you define a function y such that

y = sum(1 / (1 + x * a[r]), r=1..n) - E

and

y' = - sum(a[r] / (1 + x * a[r])^2)

Then you set x[0] equal to a first guess which you calculate this way (this is the standard performance rating)...

set a = 10 ^ (averagerating/400)
the set x[0] = (numberofgames - E) / (E * a)

Then proceed as normal for Newton method...

x[i+1] = x[i] - y(x[i])/y'(x[i])

to convert x[i] into a rating use the formula

rating = -400 * log(x[i])

Using this method and your example data I get the right rating to one decimal place in one iteration. In three iterations I get the exact answer to the accuracy of single floating point in Excel (2626.191843)

(Edit: fixed a couple of errors in the above 28-Jan-2006)

Rincewind
23-01-2006, 09:10 PM
NB Convergence is not guaranteed by this method and the system I outline above using the example data seems to not converge if you use a very low value of initial rating guess. Best single value seems to be a starting value of x[0] = 0 (equivalent to a rating guess of +infinity). Using this you get absolute converge to the accuracy of Excel in 5 iterations. However using the initial guess as outlined above is probably still best.

Spiny Norman
24-01-2006, 06:22 AM
Truly terrifying Barry! My mind is boggling away here ... ;)

Rincewind
25-01-2006, 03:32 PM
Truly terrifying Barry! My mind is boggling away here ... ;)

Sorry Stephen, I meant to respond to this post earlier.

I must admit my intended audience was primarily Bill. Since I have spoken to Bill previously about Newton-Ralphson root-finding I had a feel that what I posted was at the right level to describe my method clearly.

However, if you (or anyone else) want to discuss further some time you can give me a call or ask specific questions in this thread or via email. Alternatively you could google Newton-Ralphson and it would explain the idea which requires high school level of calculus to appreciate.

Bill Gletsos
25-01-2006, 04:01 PM
Barry,

When I wrote the code for doing performance ratings instead of using Newton-Ralphson I used Brent's method.

I did however implement Newton-Ralphson for determinging the new volatility in Glicko2 as described in Glickman's document when I discovered that his iterative code might fail to find a root. In this case I implemented Newton-Ralphson with the additional step of ensuring that initially the root is bracketed between points a and b.

pax
25-01-2006, 04:27 PM
There is a very well known theorem in mathematics (unfortunately I forgot its name), which is saying that in the limit when n goes to infinity x has to converge to a unique vector, which is the eigenvector for matrix A.

Only if the eigenvalues of A are less than one!

Spiny Norman
25-01-2006, 04:52 PM
... requires high school level of calculus to appreciate.
I probably would have appreciated it 25-30 years ago ... but I was always an "average" Pure Math student and I'm afraid that I've forgotten it all now. :oops:

25-01-2006, 05:12 PM
Only if the eigenvalues of A are less than one!

I believe this is the answer to Bill's post. (He was saying that in his example all the ratings were going down.) This is because in a procedure you need to adjust the average performance rating to the average initial rating every step. When you do the procedure only once, the discrepancy is not very big, when you do it many times it does accumulate.:)

Rincewind
25-01-2006, 05:13 PM
Only if the eigenvalues of A are less than one!

Do you know the name of the theorem? Linear Algebra certainly isn't a strength of mine. I was wondering if this is somehow related to the Jacobi method of determining eigenvalues.

pax
25-01-2006, 05:20 PM
Do you know the name of the theorem? Linear Algebra certainly isn't a strength of mine. I was wondering if this is somehow related to the Jacobi method of determining eigenvalues.

Nah, this is in the category of stuff that mathematicians label "clear" or "obvious". You wouldn't bother using the name (if indeed it has one).

25-01-2006, 05:39 PM
Do you know the name of the theorem? Linear Algebra certainly isn't a strength of mine. I was wondering if this is somehow related to the Jacobi method of determining eigenvalues.

The theorem is called THE PERRON-FROBENIUS THEOREM. Below is the reference that explains the theorem.

http://www.prenhall.com/divisions/esm/app/ph-linear/leon/html/perron.html

25-01-2006, 05:47 PM
It is pretty much saying that if a matrix is nonnegative then the largest eigenvalue corresponds to a positive vector. This eigenvalue will dominate all other eigenvalues and when you multiply a vector by a matrix many times, it will eventually converge to this positive eigenvector because all other terms (corresponding to other eigenvalues) are relatively small.

There is not requirement with respect to the size of the eigenvalues though.:)

shaun
25-01-2006, 05:57 PM
Barry,

When I wrote the code for doing performance ratings instead of using Newton-Ralphson I used Brent's method.

Which is especially appropriate as Richard Brent is still an active chess player here in Canberra.:D

Bill Gletsos
25-01-2006, 06:20 PM
Which is especially appropriate as Richard Brent is still an active chess player here in Canberra.:DThanks for that.
I knew Brent was Australian but hadnt considered he was the one on the ACF rating list.