1. Performance ratings

I was thinking the other day about how performance ratings are calculated, and I find the process less than satisfactory. The first well-known issue is that an average opponent rating is used instead of doing the calculation game by game. This one is easy to fix, and this is what I believe Bill is doing in Australia.

The second problem is that actual ratings are used in the calculations. This might not be such a good idea because ratings quite often lag. Take the example of Raymond in the last championship. His actual rating was 2050, while he performed at about 2300. When the performance ratings of his opponents are calculated, this fact is not taken into account. This problem is easy to fix if the approach that scientists use is applied. (Scientists have a very similar problem: they rank scientific journals. There are cross-references from one journal to another. There is an original ranking, which is used to calculate a new ranking based on the references a particular journal receives from other journals. The process is repeated until the ranking converges.)

In performance-rating terms the process can be amended in the following way. After the old performance ratings are calculated, they are used as the initial ratings, and from them new performance ratings are calculated. The process is repeated until the performance ratings converge. This gives a true performance rating which does not depend on your initial rating.
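The iterated scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not anyone's actual rating code: it assumes the standard logistic expectancy 1 / (1 + 10^((opp - own)/400)), and the function names and the toy tournament are mine.

```python
def expected_score(own, opp):
    """Logistic expectancy: expected score per game at a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** ((opp - own) / 400.0))

def performance(opp_ratings, score, lo=-1000.0, hi=5000.0):
    """Game-by-game performance rating: the rating at which the summed
    expectancies against each opponent equal the actual score.
    Found by bisection, since the sum is monotonic in the rating."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, r) for r in opp_ratings) < score:
            lo = mid  # expected score too low: rating must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0

def iterate_performances(opponents, scores, ratings, rounds=20):
    """Recompute every player's performance rating, feeding each round's
    performances back in as the opponents' ratings for the next round."""
    for _ in range(rounds):
        ratings = {p: performance([ratings[o] for o in opponents[p]], scores[p])
                   for p in opponents}
    return ratings

# Toy 3-player event: P beats Q and draws R; Q beats R; R draws P.
opponents = {"P": ["Q", "R"], "Q": ["P", "R"], "R": ["P", "Q"]}
scores = {"P": 1.5, "Q": 1.0, "R": 0.5}
print(iterate_performances(opponents, scores, {p: 1500.0 for p in opponents}))
```

Whether the loop settles down depends on the structure of the results; 100%-score situations are exactly the cases where it will not.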

2. Originally Posted by drug
I was thinking the other day about how performance ratings are calculated, and I find the process less than satisfactory. The first well-known issue is that an average opponent rating is used instead of doing the calculation game by game. This one is easy to fix, and this is what I believe Bill is doing in Australia.
Can you expand on how this works?

3. Originally Posted by Rincewind
Can you expand on how this works?
It involves not only game by game but also iteration. I explained how it works in post http://www.chesschat.org/showpost.ph...1&postcount=39

4. Originally Posted by Bill Gletsos
It involves not only game by game but also iteration. I explained how it works in post http://www.chesschat.org/showpost.ph...1&postcount=39
Thanks. I think game-by-game is a bit of a misnomer as taking an average is also game-by-game in the same sense. What you are doing is matching true expected total scores, rather than averaging the rating of opponents.

From a purist point of view, it's a pity that iteration is required (although the function should be monotonic, which greatly simplifies the task of root finding).

Have you looked for an explicit form? Refresh my memory, is the expected score calculated using a normal or logistic distribution?

5. Originally Posted by Bill Gletsos
It involves not only game by game but also iteration. I explained how it works in post http://www.chesschat.org/showpost.ph...1&postcount=39
I hope it is clear that your iteration is different from my iteration. If I understand your post correctly, your iteration is needed because in the first place you take an average rating, which is an approximation used to calculate the performance. My iteration arises because the true performance rating should not depend on the initial rating.

P.S. A stupid question: I hope you do not iterate by hand. It would take me about 15 minutes to write a computer program to do that.

6. Originally Posted by drug
I hope it is clear that your iteration is different from my iteration.
It is to me. I cannot speak for others.
Originally Posted by drug
If I understand your post correctly, your iteration is needed because in the first place you take an average rating, which is an approximation used to calculate the performance. My iteration arises because the true performance rating should not depend on the initial rating.
Of course there are situations where you may not get convergence.
Originally Posted by drug
P.S. A stupid question: I hope you do not iterate by hand.
No. By program.
Originally Posted by drug
It would take me about 15 minutes to write a computer program to do that.
You can also do it via Excel.

7. Originally Posted by Rincewind
Thanks. I think game-by-game is a bit of a misnomer as taking an average is also game-by-game in the same sense. What you are doing is matching true expected total scores, rather than averaging the rating of opponents.

From a purist point of view, it's a pity that iteration is required (although the function should be monotonic, which greatly simplifies the task of root finding).

Have you looked for an explicit form? Refresh my memory, is the expected score calculated using a normal or logistic distribution?
Logistic; however, in my example both logistic and normal round to 2626.
Logistic gives 2626.2 and normal 2625.8.

8. Originally Posted by Bill Gletsos
Of course there are situations where you may not get convergence.
I do not agree. The process of performance-rating calculation is equivalent to multiplying a vector by a matrix. If you repeat it, you multiply by the same matrix again.

So what you have is x = A^n * y, where y is the vector of initial ratings and A is the matrix that gives performance ratings when the initial ratings are multiplied by it. It is raised to the power n because we repeat the process a potentially infinite number of times.

There is a very well-known theorem in mathematics (unfortunately I forget its name) which says that in the limit, as n goes to infinity, x has to converge to a unique vector: the eigenvector of matrix A.
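The result being alluded to is presumably the Perron-Frobenius theorem, and the repeated multiplication is what numerical analysts call power iteration: for a positive matrix, A^n * y (suitably normalised) converges to the dominant eigenvector regardless of the starting vector y. A small self-contained sketch follows; the matrix is an arbitrary example of mine, and note that the actual performance-rating map is not literally linear, so this only illustrates the linear-algebra claim.

```python
def mat_vec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def normalize(v):
    """Scale v to unit length so repeated multiplication stays bounded."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def power_iteration(A, v, steps=50):
    """Repeatedly apply A and renormalise; the direction of the result
    converges to the dominant eigenvector of A."""
    for _ in range(steps):
        v = normalize(mat_vec(A, v))
    return v

# Positive symmetric matrix with eigenvalues 3 and 1;
# its dominant eigenvector is (1, 1) / sqrt(2).
A = [[2.0, 1.0], [1.0, 2.0]]
print(power_iteration(A, [1.0, 0.0]))
```

Starting from [1.0, 0.0], the two components rapidly equalise, which is the convergence to a unique vector described above.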

9. Originally Posted by drug
I do not agree.
Sorry, I should have been clearer.
There are situations where I don't believe they meaningfully converge.
Originally Posted by drug
The process of performance-rating calculation is equivalent to multiplying a vector by a matrix. If you repeat it, you multiply by the same matrix again.

So what you have is x = A^n * y, where y is the vector of initial ratings and A is the matrix that gives performance ratings when the initial ratings are multiplied by it. It is raised to the power n because we repeat the process a potentially infinite number of times.

There is a very well-known theorem in mathematics (unfortunately I forget its name) which says that in the limit, as n goes to infinity, x has to converge to a unique vector: the eigenvector of matrix A.
I know the one you mean. Its name escapes me too.

Now perhaps I am wrong but I seem to recall seeing an example where they do not meaningfully converge.
If I remember it correctly it goes something like the following.

Players A and B draw with each other, and both beat players C and D.
Player C in turn draws with player D.

All start out nominally rated at 1500.
Their performance ratings won't meaningfully converge.

The reason why is as follows:

A = B + C + D
B = A + C + D
C = A + B + D
D = A + B + C

In the above scenario with A, B, C and D you effectively have two groups, (A,B) and (C,D), and everyone in the first group beats everyone they play in the second group. So this is effectively 100% / 0%, and (A,B) keep rising while (C,D) keep falling.

Eventually they will converge at the point where (A,B)'s rating is around 920 points higher than (C,D)'s. However, are they meaningful?

It is essentially the problem of how much stronger player X is than player Y when X has a 100% score against Y over a number of games. All you know from the logistic scale is that X is possibly 920 points higher than Y, or 735 higher if using the normal scale; however, how much higher is unknown.
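The 100%-score problem can be seen directly from the logistic curve: no finite rating gap ever gives an expected score of exactly 1, so a perfect score only bounds the gap from below. A quick check (the 920/735 figures come from the post above; this snippet only demonstrates the logistic side):

```python
def logistic_expectancy(gap):
    """Expected per-game score for the higher-rated player at a given gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

# The expectancy creeps toward, but never reaches, 1.0.
for gap in (0, 200, 400, 920, 2000):
    print(gap, logistic_expectancy(gap))
```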

10. Originally Posted by Bill Gletsos
Sorry, I should have been clearer.
There are situations where I don't believe they meaningfully converge.
I know the one you mean. Its name escapes me too.

Now perhaps I am wrong but I seem to recall seeing an example where they do not meaningfully converge.
If I remember it correctly it goes something like the following.

Players A and B draw with each other, and both beat players C and D.
Player C in turn draws with player D.

All start out nominally rated at 1500.
Their performance ratings won't meaningfully converge.

The reason why is as follows:

A = B + C + D
B = A + C + D
C = A + B + D
D = A + B + C

In the above scenario with A, B, C and D you effectively have two groups, (A,B) and (C,D), and everyone in the first group beats everyone they play in the second group. So this is effectively 100% / 0%, and (A,B) keep rising while (C,D) keep falling.

Eventually they will converge at the point where (A,B)'s rating is around 920 points higher than (C,D)'s. However, are they meaningful?

It is essentially the problem of how much stronger player X is than player Y when X has a 100% score against Y over a number of games. All you know from the logistic scale is that X is possibly 920 points higher than Y, or 735 higher if using the normal scale; however, how much higher is unknown.
Yep, that is right. The same is true for scientific journals; however, as I mentioned, it is implemented anyway. The probability of this event in a reasonable 11-round competition is pretty small.

Say, for the recent championship do you believe this problem is relevant?

11. In fact try this scenario.

Players A and B draw with each other and both beat player C.
Player C in turn draws with player D.

All start out nominally rated at 1500.
Their performance ratings won't meaningfully converge.

The reason why is as follows:

A = B + C
B = A + C
C = A + B + D
D = C

These don't seem to converge at all.
The ratings for all four players just continue to drop.

12. Originally Posted by drug
Yep, that is right.
Thanks for confirming that.
Originally Posted by drug
The same is true for scientific journals; however, as I mentioned, it is implemented anyway. The probability of this event in a reasonable 11-round competition is pretty small.
True, provided you don't have a number of withdrawals who only played a few games and also played each other.
I've also seen it happen often enough in Swiss events with many new players who are mostly unrated and very weak.

Originally Posted by drug
Say, for the recent championship do you believe this problem is relevant?
No.

13. Originally Posted by Bill Gletsos
Logistic; however, in my example both logistic and normal round to 2626.
Logistic gives 2626.2 and normal 2625.8.
Well, logistic is probably easier to work with. In that case it looks like you are dealing with a polynomial of degree n, where n is the number of games in the calculation. Once n is reasonably large (greater than 4 or 5) you are doing iterative calculations anyway. With a little effort you could probably express this polynomial for general n, and then it would be a piece of cake to write a root-finding routine that converges very quickly using something as simple as Newton's method.

However an explicit form would seem to be unobtainable.

Out of interest what is a typical number of iterations currently required?

Actually, expressing the thing as a polynomial is not a requirement for using Newton's method. You just need something which is differentiable, and the function we start with is. Here is a brief description of the method; please let me know if you want further details...

The first thing is to normalise the opponent coefficients. Call these a[r]:

a[r] = 10 ^ (rating[r] / 400)

for r = 1..n

Then if E is the expected score you are matching to you define a function y such that

y = sum(1 / (1 + x * a[r]), r=1..n) - E

and

y' = - sum(a[r] / (1 + x * a[r])^2, r=1..n)

Then you set x[0] equal to a first guess, which you calculate this way (this is the standard performance rating)...

set a = 10 ^ (averagerating/400)
then set x[0] = (numberofgames - E) / (E * a)

Then proceed as normal for Newton's method...

x[i+1] = x[i] - y(x[i])/y'(x[i])

to convert x[i] into a rating use the formula

rating = -400 * log10(x[i])

Using this method and your example data I get the right rating to one decimal place in one iteration. In three iterations I get the exact answer to the accuracy of floating point in Excel (2626.191843).

(Edit: fixed a couple of errors in the above 28-Jan-2006)

15. NB: Convergence is not guaranteed by this method, and the system I outline above, with the example data, seems not to converge if you use a very low initial rating guess. The best single starting value seems to be x[0] = 0 (equivalent to a rating guess of +infinity). Using this you get absolute convergence to the accuracy of Excel in 5 iterations. However, using the initial guess as outlined above is probably still best.
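For reference, the Newton recipe from post 14 translates almost line for line into code. This is a sketch under my own naming (perf_newton, opp_ratings, and so on are not from the thread), and the example opponents are invented, not the 2626-rating data discussed above:

```python
import math

def perf_newton(opp_ratings, E, max_iter=50, tol=1e-12):
    """Performance rating via Newton's method, per the recipe in post 14:
    solve y(x) = sum(1 / (1 + x * a[r])) - E = 0, where x = 10^(-R/400)."""
    a = [10.0 ** (r / 400.0) for r in opp_ratings]
    n = len(a)
    # First guess from the average-rating performance formula.
    a_avg = 10.0 ** (sum(opp_ratings) / n / 400.0)
    x = (n - E) / (E * a_avg)
    for _ in range(max_iter):
        y = sum(1.0 / (1.0 + x * ar) for ar in a) - E
        yp = -sum(ar / (1.0 + x * ar) ** 2 for ar in a)
        step = y / yp
        x -= step
        if abs(step) <= tol * abs(x):
            break
    return -400.0 * math.log10(x)

# Invented example: 1.5/2 against opponents rated 1400 and 1700.
print(perf_newton([1400.0, 1700.0], 1.5))
```

As post 15 warns, Newton's method is not guaranteed to converge from every starting guess; in particular a perfect score (E equal to the number of games) makes x[0] = 0, i.e. an infinite rating, which is exactly the 100%-score problem from earlier in the thread.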