
View Full Version : ACF Rating system



peter_parr
20-12-2010, 10:19 AM
I have received many communications concerning the ACF rating system since my NSWCA President’s report published in the last ACF newsletter.
Some players have gained or lost 95 or more rating points on one game in the latest ACF rating list.
Australia is the only place in the world where this can happen. A former ACF ratings officer wrote, in part of an email to me a few days ago:

“I’d like to see the current ratings system thrown out, but it won’t happen without help from influential people”
The following message is authorized for publication by Damian Norris. He is one of many players this has happened to. The ACF ratings officer refuses to allow any review of the rating system. In my view it is causing enormous damage and we are losing many players.

Hi Peter

I read with interest your report in the ACF newsletter and of particular interest were your views on the Glicko rating system.

I don’t know if you are aware of what happened to me….but I will probably never play in an Australian tournament again because of the stupid ACF rating system.

I have only played in one tournament in the past 4 years in Australia but I still regularly compete in international events. I'm not sure if Greg Canfell informed you of my circumstances…but in summary, I competed in the Zonal on the Gold Coast in 2009. I came down with a very bad flu just prior to the event and was not well enough to play, but because it was the zonal and I had travelled from overseas, I thought it would have looked bad if I had pulled out. The end result is that I had my worst tournament ever…my FIDE rating dropped 79 points, which was terrible but expected, but my ACF rating went from 2068 to 1621…down 447 points in one tournament.

I thought this must have been a joke and Guy West certainly spent many e-mails stirring me about it, as did about half of Queensland…but after I queried Bill Gletsos (ratings officer at the time), all he would do was stoutly defend it…no amount of logic would change his mind and he viewed any constructive criticism as a personal attack on him. My logic was: how could someone who has been rated between 2000 and 2100 for the past 25 years, and who still has an active FIDE rating (at the time above 2100), drop an astonishing 447 points in one tournament due to ill health? But the even more ridiculous thing is that prior to the tournament I had a ?? rating indicator, yet after that one tournament it now indicates a reliable rating (no question mark!). But Bill in his infinite wisdom still strongly supported the Glicko system. I wonder, if Roger Federer had one incredibly bad tennis tournament, whether his world ranking would drop so much.
It is now clear that if I ever want to recoup the rating loss, I can only compete again once my rating indicates a ?? again and hopefully I will make sure that I'm not ill or my mobile doesn't go off in a winning position or other stupid things like this…or perhaps I might lose another 450 points!

The Glicko system, or is it the Gletsos system (because he is such a strong supporter of it), is an absolute joke, and many chess players considering a comeback have seen what has happened to me and have now decided against it. I will either never play in Australia again, or perhaps only play in 24 months' time if my rating gets to ?? in the hope of picking up a lot of points in one tournament…but most people who have had a stable rating over many years yet have not recently competed would be absolutely stupid to come back under the present system. Tim Reilly is an example of one person who still enjoys international tournaments and regularly competes in Thailand but refuses to play in Australia for the same reason.

I notice that under my present rating I'm not good enough to enter the Qld Championships…I think you have to be 1700 or above! Anyway, for your amusement, 3 months after that Zonal tournament I competed in the Thai National Championships…came 3rd (with a performance rating of 2300) but in fact would have been outright champion if I had won my last game instead of losing it. But that doesn't change my ACF rating…I'm still 1621 and according to Bill Gletsos and the ACF rating system this rating is reliable (or at least not unreliable) and thus a true indication of my current playing strength.

Anyway, I hope everything is going well with you. I just wanted to say that I support your comments wholeheartedly and I'm happy for you to use me as an example in any further criticism of the rating system that you wish to express.

All the best
Damian Norris

Garrett
20-12-2010, 11:03 AM
Yes it's astonishing that Damian could lose so many points. I've been playing him on and off for around 30 years.

If I was to be drawn to play against Damian (rated 1621) in a tournament there is a chance I would just forfeit and buy him a beer and catch up.

Manack
20-12-2010, 11:22 AM
I'm not a mathematician, but if the ACF system uses Glicko-2 then, from my understanding, a performance that was remarkably different from the existing rating would introduce a large volatility measure for that player.

So one wouldn't need to wait through rating cycles before playing again and expecting large rating swings back towards the player's consistent performance.

I don't know how the ??, ?, !, !! are calculated and whether they take into account the volatility, or just the rating deviation, or both. It would be nice to know what those symbols represent.

A Glicko rating presented without its associated RD value can't be given much stock.
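
The ACF actually runs Glicko-2, whose extra volatility term Manack mentions, but Glicko-1 already shows the core effect: the size of a rating swing scales with the player's rating deviation (RD). Below is a minimal sketch of a one-game Glicko-1 update following Glickman's published Glicko-1 formulas; the starting RDs of 350 and 50 are illustrative values, not actual ACF parameters.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scale constant, ~0.00575646

def g(rd):
    """Attenuation factor: discounts the result of a game against
    an opponent whose own rating is uncertain (high RD)."""
    return 1 / math.sqrt(1 + 3 * (Q * rd) ** 2 / math.pi ** 2)

def glicko1_update(r, rd, opp_r, opp_rd, score):
    """One-game Glicko-1 update. Returns (new_rating, new_rd).
    score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    gj = g(opp_rd)
    e = 1 / (1 + 10 ** (-gj * (r - opp_r) / 400))  # expected score
    d2 = 1 / (Q ** 2 * gj ** 2 * e * (1 - e))      # estimation variance
    denom = 1 / rd ** 2 + 1 / d2
    new_r = r + (Q / denom) * gj * (score - e)
    new_rd = math.sqrt(1 / denom)
    return new_r, new_rd

# A long-inactive player (RD 350) loses to an equally rated, active opponent...
rusty_r, rusty_rd = glicko1_update(2000, 350, 2000, 50, 0)
# ...versus an active player (RD 50) losing the same game.
active_r, active_rd = glicko1_update(2000, 50, 2000, 50, 0)
print(round(2000 - rusty_r), round(2000 - active_r))  # the high-RD loss costs far more
```

This is why a single loss can cost a returning player over 100 points while costing an active player only a handful: the formula weights new evidence heavily when the old rating is uncertain.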

george
20-12-2010, 12:32 PM
Hi all,

In the last rating period I only played one non rapid rated game. I was filling in in Interclub. I played someone rated about 200 points below me and won. You can understand my surprise when with the last rating list I found my rating had gone up by about 60 points.

Over the last few years I have found the ACF Rating system to be fairly accurate and representative of the strength of players I have played.

I consider Glicko-2 to have a few quirks but I don't believe it is fatally flawed, and I appreciate as always the amount of work done by Bill Gletsos in running the ACF rating system.

Kevin Bonham
20-12-2010, 12:50 PM
It is now clear that if I ever want to recoup the rating loss, I can only compete again once my rating indicates a ?? again

This just isn't true. Glicko-2 is a more advanced and more dynamic version of Glicko-1 but in most regards it is similar and you can model likely impacts roughly (note that in extreme and artificial cases the results may be unreliable) in Barry's Glicko 1 calculator (http://www.bjcox.com/modules.php?name=Glicko_Calc).

A player whose rating has a blank after it and who performs many hundreds of points above that rating gains points rapidly under either system. On the Glicko 1 calculator if Damian Norris returned with his rating at 1619_ and played about 18 games at 2068 strength against opponents with blanks after their name in one period he would recoup all the lost points.

With his rating at 1619? as it is now he would recoup the damage in about 10 games in one period.

If the opponents are !s or !!s the points will be recouped in fewer games, and the figures for G2 will be broadly similar.

Yes, it's not as simple as just playing one tournament and performing at your old rating strength again, but nor should it be - if the system has two recent results and before that nothing for years, and one of them is at 2068 strength and one of them is at 1400-ish strength, then it should be inclined to hold some doubt about whether you're really going to consistently play at 2068. So it's natural that it needs more games to erase the bad rating than it did to create it.


I will either never play in Australia again, or perhaps only play in 24 months' time if my rating gets to ?? in the hope of picking up a lot of points in one tournament…

I've had a bit of a look at the maths of this now and then and while a lot of players think this is a good idea, it's actually only a good idea if you are no longer performing at your old rating strength and want to attempt a Hail Mary at getting your old rating back.

If you are performing at your old rating strength (on average) then you will get your old rating back both faster and with a higher probability of success by playing more tournaments.


Anyway, for your amusement, 3 months after that Zonal tournament I competed in the Thai National Championships…came 3rd (with a performance rating of 2300) but in fact would have been outright champion if I had won my last game instead of losing it.

And if this event had been rated under the ACF system it would have wiped out most of the points loss lifting the player from 1621 to the mid-high 1900s in just six games. Which is fair enough given that the only recent data it would have for a former 2068-rated player would be a tournament at 1400-ish and one at 2300-ish with an average still well below the old rating.


But that doesn't change my ACF rating…I'm still 1621 and according to Bill Gletsos and the ACF rating system this rating is reliable (or at least not unreliable) and thus a true indication of my current playing strength.

!! is very reliable, ! is reliable and blank is "neither reliable nor unreliable". In my experience blank ratings are actually quite unreliable on the whole, especially when players are returning, and they could probably be described differently to accord with the general understanding of what "reliable" means, e.g. blank = unreliable, ? = very unreliable, ?? = extremely unreliable, ?^infinity = Peter Parr's old rating, etc.
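
If the symbols are driven by rating deviation alone, the mapping would just be a threshold function. Here is a hypothetical sketch; the RD cutoffs below are invented for illustration and nothing in this thread states the ACF's actual thresholds:

```python
def reliability_symbol(rd):
    """Map a rating deviation to an ACF-style reliability marker.
    These cutoffs are invented for illustration only; the ACF's
    actual thresholds are not published in this thread."""
    if rd < 50:
        return "!!"  # very reliable
    if rd < 75:
        return "!"   # reliable
    if rd < 150:
        return ""    # neither reliable nor unreliable (blank)
    if rd < 250:
        return "?"   # unreliable
    return "??"      # very unreliable
```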

The Damian Norris case is a very unusual one because most players who spend years out of the ACF system and lose bundles of points do so because they are rusty and simply not as good as they were. When a player is a little more active and only dipping in and out of the ACF system occasionally, irregularities can occur.

In general Glicko-2 is vastly superior to FIDE ELO which is still stuck in the mathematical stone ages, but I'd be interested to see some predictivity testing specifically focused on those cases where players have been inactive for >2 years and record results wildly divergent from their own rating. Also cases where a player plays a very small number of games years after last playing would be worth looking at.

Bill Gletsos
20-12-2010, 05:51 PM
Hi all,

In the last rating period I only played one non rapid rated game. I was filling in in Interclub. I played someone rated about 200 points below me and won. You can understand my surprise when with the last rating list I found my rating had gone up by about 60 points.

Your September 2010 rating was 1474 and your December 2010 rating is 1501. As such you only went up 27 points.

Oepty
20-12-2010, 06:28 PM
Hi all,

In the last rating period I only played one non rapid rated game. I was filling in in Interclub. I played someone rated about 200 points below me and won. You can understand my surprise when with the last rating list I found my rating had gone up by about 60 points.

Over the last few years I have found the ACF Rating system to be fairly accurate and representative of the strength of players I have played.

I consider Glicko-2 to have a few quirks but I don't believe it is fatally flawed, and I appreciate as always the amount of work done by Bill Gletsos in running the ACF rating system.

George, you mean you beat me. I think my rating was 1446, now 1478; yours, as Bill pointed out, was 1474. Only 28 points different.
Scott

george
20-12-2010, 09:20 PM
Hi Scott,

I had no idea what your rating was and took an uneducated guess - sorry mate.

I really did think my rating had gone right down in the Sept ratings but Bill corrected me so 21 (?) point jump I guess is fair enough.

In truth I get mixed up sometimes between rapid and longer ratings - but restricting myself almost exclusively to rapid play should at least bring my search for the simple life that much closer.

Ah well - see you at Bay - Scott:) .

peter_parr
22-12-2010, 02:30 PM
I was pleased to meet a former NSW Chess Association President and strong tournament player in my shop today after a lengthy absence from administration and competitive play.

After chatting about old times of Sydney chess during the last 30 years he told me he was interested in playing again in a tournament in the near future. He asked me about the rating system.

Once again as I have on hundreds of occasions over the last ten years I explained to him how the ACF Glicko rating system worked.

I advised him that on the latest rating list some players, due to inactivity, had lost well over 100 rating points by losing one game against a player of similar rating.

I further advised him that this system was not used by any other of the 170 FIDE countries.

It is very doubtful that he will now return to competitive chess. This situation has repeated itself hundreds of times and adult chess in Australia is very severely affected by the Glicko system.

The statement by Gletsos and Bonham that everyone in the world is wrong and they are right is doing more and more damage.
Even a review of the rating system is NOT permitted.

We need more input from all players in Australia on what they think of the Glicko rating system.

An ELO system accepted worldwide causes no problems.

It is so very sad that players do not return to chess, entirely because we are using a very bad rating system, when as Damian points out a loss of nearly 450 rating points in a 9-round event is utterly absurd.

Garvinator
22-12-2010, 02:44 PM
We need more input from all players in Australia on what they think of the Glicko rating system.

I think you only want the input of those who support your beliefs.

I do not support your beliefs at all, do you still want my input? I doubt it.

Kevin Bonham
22-12-2010, 03:32 PM
The statement by Gletsos and Bonham that everyone in the world is wrong and they are right is doing more and more damage.

As you claim I have made this statement, can you please point me to where I have said it, preferably in exactly those words?


Even a review of the rating system is NOT permitted.

On the contrary, I believe that many aspects of a rating system should be reviewed at all times to check that its predictiveness is optimised. Note the latter qualifier; soothing the egos of people whose ratings have dropped, even if this leads to much less accurate ratings on the whole, is not a fit criterion for reviews.

In my previous post I even wrote this:


In general Glicko-2 is vastly superior to FIDE ELO which is still stuck in the mathematical stone ages, but I'd be interested to see some predictivity testing specifically focused on those cases where players have been inactive for >2 years and record results wildly divergent from their own rating. Also cases where a player plays a very small number of games years after last playing would be worth looking at.

Apparently writing that was a complete waste of my time because you go on saying I don't support reviews of the system, whereas I actually support constant reviews of it in all kinds of ways (and I know that the Ratings Officers are frequently working on ways to improve it). What I don't support is a kangaroo-court review by people whose views about ratings are simplistic or outmoded.

As for the person you were talking to, why does he care if a rating that is many years out of date and irrelevant as a guide to his actual current playing strength goes down? Is it so unacceptable to him if his new rating doesn't exceed his actual current playing strength by at least 200 points? If (perhaps) being underrated for a few months would be enough to deter him from returning then just how serious is he about coming back anyway?


We need more input from all players in Australia on what they think of the Glicko rating system.

If it was a tenth as outrageous as you claim you wouldn't need to be asking for that input.


I think you only want the input of those who support your beliefs.

I have a similar impression. The Parr modus of repeating a set rant with content changes (if any) having little relevance to the actual nature of replies is a sadly too-familiar one by now.

Oepty
22-12-2010, 07:39 PM
Hi Scott,

I had no idea what your rating was and took an uneducated guess - sorry mate.

I really did think my rating had gone right down in the Sept ratings but Bill corrected me so 21 (?) point jump I guess is fair enough.

In truth I get mixed up sometimes between rapid and longer ratings - but restricting myself almost exclusively to rapid play should at least bring my search for the simple life that much closer.

Ah well - see you at Bay - Scott:) .

No worries George, my rapid rating is quite a bit lower so maybe you were remembering that. At this stage I doubt you will see me at Glenelg. I definitely will not be playing, might drop in briefly if I can though.
Scott

ER
22-12-2010, 09:42 PM
I was pleased to meet a former NSW Chess Association President and strong tournament player in my shop today after a lengthy absence from administration and competitive play.

Nice to see players / administrators coming back to the chess scene!

...


Once again as I have on hundreds of occasions over the last ten years I explained to him how the ACF Glicko rating system worked.

Did you also explain to him that parallel to the ACF's Glicko rating system we also have the FIDE Elo rating system, so efficiently looked after by Greg? Does the ex-Pres know that most serious tournaments in Australia are also FIDE rated?


We need more input from all players in Australia on what they think of the Glicko rating system.

Input and discussion are always welcome, although I believe that participation of players is a sound indicator of approval or disapproval; in this case all of our highest rated players (by both systems) play more often than not in local tournaments. On a personal basis I wonder if there are any players, male or female, of say Olympiad team strength who have given up OTB chess because of Glicko disapproval?

BTW Peter are you going to be open for business during the Aus Open?

Spiny Norman
23-12-2010, 06:41 AM
I was pleased to meet a former NSW Chess Association President and strong tournament player in my shop today after a lengthy absence from administration and competitive play.

After chatting about old times of Sydney chess during the last 30 years he told me he was interested in playing again in a tournament in the near future. He asked me about the rating system.

Once again as I have on hundreds of occasions over the last ten years I explained to him how the ACF Glicko rating system worked.

I advised him that on the latest rating list some players, due to inactivity, had lost well over 100 rating points by losing one game against a player of similar rating.

I further advised him that this system was not used by any other of the 170 FIDE countries.

It is very doubtful that he will now return to competitive chess.

Oh, so very well done Peter ... :hand: you goose!

Manack
23-12-2010, 08:47 AM
What's most interesting about this argument is that there is any argument at all. This is simply a conflict between various mathematical systems, and it is very easy to determine which of these systems gives the best predictive result for the majority of cases. No emotive argument from anyone should really have any impact on this debate. Simply input the data and do the math.

Garvinator
23-12-2010, 09:54 AM
What's most interesting about this argument is that there is any argument at all. This is simply a conflict between various mathematical systems, and it is very easy to determine which of these systems gives the best predictive result for the majority of cases. No emotive argument from anyone should really have any impact on this debate. Simply input the data and do the math.

The arguments here have nothing to do with logical analysis, but have to do with stroking egos and the pushing of agendas.

Kevin Bonham
23-12-2010, 09:59 AM
This is simply a conflict between various mathematical systems and it is very easy to determine which of these systems gives the best predictive result for the majority of cases. No emotive arguement from anyone really should have any impact on this debate. Simply input the data and do the math.

Yes and this has been tested extensively and the current system is a much better predictor than the ELO system that went before it, and is also much better at dealing with cases where players are rapidly changing in strength. The ACF wouldn't be using it if it wasn't. I'm not saying here that it is perfect, just that it is much better than ELO on the whole.

The problem is that opponents of the system who are running Parr's line just don't seem to care that ELO is predictively inferior. They just won't accept any system in which a player can "lose" a relatively large number of points from few games, even if that player is "losing" those points from a rating that is so old that it is utterly absurd to consider it to still mean anything in terms of the player's current playing strength.

peter_parr
23-12-2010, 11:42 AM
Nearly every week for the last ten years one or more inactive players come to my shop with the intention of playing again.

The one question everyone asks is: how do the ratings work? Kevin please explain how I can persuade these inactive players to play again when they discover they may lose well over 100 rating points on their very first game against a player of similar rating, as in the December 2010 rating list?

The fact very simply is that the player does not come back and never does play even one game.

Further details of Glicko have no relevance. I put it to you that the former NSWCA President and many others are genuine about competing again but will not do so if such massive losses of rating points occur in one game (or 447 rating points in one event – poor Damian of Qld – Garvin).

If our system was similar to FIDE and the other 169 countries – pick up the old rating and carry on – as accepted by all experts and all other countries then numerous inactive Australians would be playing again – they want to but with our system they will not play. This is well understood by all other experts around the world and by FIDE.

All I suggest is we have a similar system. Do you want our inactive players to play again – there are many thousands. They will not play even a single game again under the ACF Glicko Rating system.

I see all these players – what can I tell them? If a player has lost some strength it will soon be reflected in a sensible amended rating. Why is Australia the only country in the world where this is happening?

Kevin please listen to the world opinion – please advise how I can get an inactive player to play one game without the threat of massive losses of rating points unique to Australia.

Is the world wrong and Australia right, or vice-versa?

We must do what is best for chess.

peter_parr
23-12-2010, 11:44 AM
Did you also explain to him that parallel to the ACF's Glicko rating system we also have the FIDE Elo rating system, so efficiently looked after by Greg? Does the ex-Pres know that most serious tournaments in Australia are also FIDE rated?
Yes, I did explain about the FIDE ELO rating system and its worldwide acceptance.
Greg is of course very efficient.
Bill is also of course very efficient in processing data – all we need is the data in the right program as in FIDE and not highly volatile.


BTW Peter are you going to be open for business during the Aus Open?
Yes 30% discount applies to all products regular business hours. Good to see you again and all players (I will be at the opening ceremony) from VIC and all states!

Kevin Bonham
23-12-2010, 12:07 PM
Nearly every week for the last ten years one or more inactive players come to my shop with the intention of playing again.

The one question everyone asks is: how do the ratings work? Kevin please explain how I can persuade these inactive players to play again when they discover they may lose well over 100 rating points on their very first game against a player of similar rating, as in the December 2010 rating list?

Who lost over 100 points from one game against a player of similar rating?

Firstly you should tell the inactive players that under the current system their new rating will probably be quite different from their old rating, initially. Not only might it be much lower but if their results are good it might be much higher. In any case if they play actively their rating will quickly reach a level consistent with their current playing strength, so long as they play a few tournaments rather than just one.

If they persist in complaining that they will "lose" rating points, tell them that a rating that is 10, 20 or 30 years old is meaningless as an indicator of a player's current skill level, and therefore they shouldn't mind getting a new one that might be lower, since their old one means very little anyway. Their old rating doesn't say how good they are; it says how good they were. Indeed, in times past some previous ratings officers would simply have deleted the old rating, forcing them to start from scratch, and their new rating (if they performed poorly) would have been even lower than under Glicko. If they are really coming back for the enjoyment of playing the game competitively rather than for an ego boost from parading an out-of-date rating, they shouldn't care.

As Spiny Norman seems to have spotted you don't seem to be going out of your way to encourage these players to return to chess. Instead you seem to alarm them with scare stories about the Glicko system based on a small number of extreme cases. When telling them about how quickly they could lose points do you tell them about how quickly they could gain them?

(Indeed I have more problems with Glicko's handling of inactive players who perform above their ratings on return than below.)


If our system was similar to FIDE and the other 169 countries – pick up the old rating and carry on – as accepted by all experts

Oh no it isn't. :rolleyes: And even though many other nations use ELO, there's ELO and there's ELO. ELO with activity-based sliding k-factors is basically primitive Glicko.
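
The "sliding k-factor" point can be made concrete: letting K grow with inactivity is a crude stand-in for Glicko's rating deviation. A minimal sketch, where the K schedule is invented purely for illustration:

```python
def elo_expected(r, opp_r):
    """Standard Elo expected score for the player rated r."""
    return 1 / (1 + 10 ** ((opp_r - r) / 400))

def sliding_k(recent_games, base_k=15, max_k=60):
    """A crude 'rating deviation': K rises as recent activity falls.
    This schedule is invented for illustration, not any federation's rule."""
    return max(base_k, max_k - 5 * recent_games)

def elo_update(r, opp_r, score, recent_games):
    """Elo update with an activity-dependent K factor."""
    return r + sliding_k(recent_games) * (score - elo_expected(r, opp_r))

# Losing to an equal-rated opponent costs an inactive player more:
inactive = elo_update(2000, 2000, 0, recent_games=0)   # K = 60 -> 1970.0
active = elo_update(2000, 2000, 0, recent_games=20)    # K = 15 -> 1992.5
```

The same mechanism that makes inactive players' ratings drop faster also makes them rise faster when results are good, which is the trade-off being argued about in this thread.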


Kevin please listen to the world opinion

Argument from authority. Invalid. Ignored. :hand: What I listen to is evidence of what works and what doesn't. One of my other fields of interest is electoral systems and in Tasmania we have some very advanced ones that can be a little confusing for the public but are very fair. I don't listen to world opinion from countries like the USA and the UK who run their elections using the inferior but easy system that is first-past-the-post. ELO is the first-past-the-post of chess ratings systems. I don't care how many countries use it.

Now Peter, since you maintain that you have "one or more inactive players" piking because of the rating system "nearly every week" that sounds like at least 50/year.

If the numbers were really that massive you could have harnessed this sentiment by means of a simple petition "We the undersigned formerly active chessplayers would like to come back to active chess but will only do so if the ACF allows us to be overrated for at least another 3-4 years no matter how badly we play." (I'm sure it wouldn't be that revealingly worded but I'm sure you get the drift.) By now you could have accumulated hundreds of signatures to that effect. Where are they? Why does the ACF virtually never hear complaints from such players directly, but only vicariously through you?

Could it be that the one or more inactive players are for the most part a small group of players who eternally talk about returning to competitive chess but don't actually do it? Yes I know plenty of these too. Most chess organisers under any rating system would.

Garvinator
23-12-2010, 12:24 PM
Peter Parr,

From now on, each time you claim that there are these inactive players who are not playing because of Glicko, I will ask you to name these players, because quite frankly I believe you are grossly over-stating the problem, or, and I think this is certainly what is happening:

You are biasing these so-called former players with your own biased views. You only give them one side of the story: that Glicko is bad. I would not be surprised at all if some, if not most, of these people are only agreeing with you because to do otherwise would be to disagree with or question an anti-Glicko zealot.

As for the loss of rating points, the explanation is simple. Their old rating is only used to seed them in the tournament. Their next rating will be based on their current tournament results. Perform better than your old rating in the tournament and you will get a higher new rating. Perform a lot worse than your old rating and your new rating will reflect your current tournament performance.

As the new rating is very unreliable, being based on one tournament, you can quite easily get back to your old rating if you perform at that rating in subsequent tournaments.

I guess you would not use an explanation like that because it would not fit your agenda.

A worse situation than some mythical former players not returning is the loss of rating points by adult players to rapidly improving juniors, which would occur if the system were not so dynamic. That is really one of the major issues in tournament chess: the loss of rating points to underrated players, which the ACF Glicko system handles very well.

Given the choice between massaging a few broken egos or dealing with rapidly improving players correctly, I will take the latter every day of the week.
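
The idea that a returning player's new rating tracks their tournament performance can be illustrated with the common linear performance-rating approximation: average opponent rating plus 800·(score − n/2)/n. This is a standard summary formula, not the ACF's actual update rule.

```python
def linear_performance_rating(opponent_ratings, score):
    """Linear approximation of tournament performance:
    average opponent rating + 800 * (score - n/2) / n.
    A standard summary formula, not the ACF's update rule."""
    n = len(opponent_ratings)
    avg = sum(opponent_ratings) / n
    return avg + 800 * (score - n / 2) / n

# Scoring 5/6 against opponents averaging 2000 performs at about 2267:
print(round(linear_performance_rating([2000] * 6, 5)))  # -> 2267
```

On this measure a 50% score against similarly rated opponents simply reproduces their average rating, which is why performing "at your old strength" pulls the new rating back towards the old one.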

ER
23-12-2010, 01:05 PM
Yes 30% discount applies to all products regular business hours. Good to see you again and all players (I will be at the opening ceremony) from VIC and all states!

Great! :) See you at both the opening ceremony and the shop!

Desmond
23-12-2010, 01:48 PM
Nearly every week for the last ten years one or more inactive players come to my shop with the intention of playing again.

The one question everyone asks is how does the ratings work? Kevin please explain how I can persuade these inactive players to play again when they discover they may lose well over 100 rating points on their very first game against a player of similar rating as in the December 2010 rating list?
Don't "discover" it to them in the first place.

But c'mon Peter you are a salesman, you know how to sell! Every objection is an opportunity to highlight something good.

Yes ratings can change quickly - that means that those underrated pesky juniors aren't so underrated anymore.
Yes if you're rusty you will drop some points - but if you bang out that rust before the next tournament you can bounce back just as quickly.

Unless of course the product you're selling is not chess participation.

Garrett
23-12-2010, 02:07 PM
Yes ratings can change quickly - that means that those underrated pesky juniors aren't so underrated anymore.

Does it work like that ?

A lot of those rapidly rising stars seem to have !! next to their ratings indicating it is a reliable rating.

Desmond
23-12-2010, 02:59 PM
Does it work like that ?

A lot of those rapidly rising stars seem to have !! next to their ratings indicating it is a reliable rating.
Well, according to the more (than me) mathematically minded posters here, the system has better predictive results. I'd take that to mean that ratings are closer to right than they would otherwise be.

Garvinator
23-12-2010, 03:13 PM
A lot of those rapidly rising stars seem to have !! next to their ratings indicating it is a reliable rating.

When those rising stars go through their next upsurge, an intermediate rating is most often used.

So if the player was 1400!! and then for a rating period they performed at 1800 for the whole period, the latter half of their games would be rated as though they were 1600.

This has the effect of increasing the improving player's rating faster, while minimising the effects on that player's opponents, as for the second half of the rating period the games would be worked out as though the player had a 1600 rating.

Also, for the next rating period, that player would most likely have a !, instead of !!, as their new rating is less reliable. This then means if they are continuing their improvement, their rating is more volatile from the first game of the new rating period.

I am not sure, but I believe the opposite is true for a player on a downward spiral.

Now under ELO, the improving junior would only have their games rated for the whole period at a K factor of 15, which means their gains are much slower, meaning they are 'underrated' for a longer period of time. This then affects that player's ability to get into rating-restricted events, and hurts their opponents with more rating points lost than should have occurred.

Also, here in Australia, we do have the effect quite often where a junior will play a few long time control events, get a rating, then only play in junior rapids for quite a while before returning to the long time control scene when they get a bit older.

Under ELO, it would be likely that their old rating would not catch up if they are rapidly improving, but under Glicko that returning junior would be starting with a ? or ?? and so their rating can go up in leaps and bounds.
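The intermediate-rating idea described above can be sketched roughly as follows. This is only an illustration of the scheme as explained in this thread, not the actual ACF Glicko code; the midpoint rule and the example numbers are assumptions.

```python
# Rough sketch of the "intermediate rating" idea described above.
# NOT the actual ACF implementation - just an illustration:
# a 1400!! player who performs at 1800 over a period has the second
# half of their games rated as though they were midway, at 1600.

def expected_score(rating, opp):
    """Standard Elo/Glicko expected score against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opp - rating) / 400.0))

def intermediate_rating(old_rating, performance_rating):
    """Midpoint used to rate the latter half of the period's games."""
    return (old_rating + performance_rating) / 2.0

mid = intermediate_rating(1400, 1800)
print(mid)   # 1600.0
# Opponents in the second half of the period then lose or gain points
# as if they had faced a 1600 rather than a 1400, softening the hit
# they take from the improving player's out-of-date published rating.
print(round(expected_score(mid, 1600), 2))   # 0.5
```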

Garrett
23-12-2010, 05:00 PM
Thanks for explaining it Garvin.

Cheers
Garrett.

peter_parr
23-12-2010, 05:32 PM
Who lost over 100 points from one game against a player of similar rating?

Nicholas Cooper lost 124 points from a single game played in the previous rating period (from 1891 in September quarter to 1767 in December quarter) against an opponent rated just over 1800. Given an Elo expected score of ~62%, this implies an effective K factor of 124/62% = 200.
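The arithmetic behind that implied K factor can be checked directly with the standard Elo expected-score formula. The opponent's exact rating is not given above ("just over 1800"), so 1811 is an assumption used purely for illustration.

```python
# Check of the implied K factor quoted above. The opponent's exact
# rating is not stated ("just over 1800"); 1811 is an assumption.

def expected_score(rating, opp):
    """Standard Elo expected score."""
    return 1.0 / (1.0 + 10 ** ((opp - rating) / 400.0))

player, opponent = 1891, 1811
e = expected_score(player, opponent)
print(round(e, 2))        # ~0.61, i.e. roughly the 62% cited

# Elo: rating change on a loss = K * (0 - e), so a 124-point drop
# implies an effective K of 124 / e.
implied_k = 124 / e
print(round(implied_k))   # ~202, i.e. roughly 200
```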

Denis_Jessop
23-12-2010, 07:44 PM
If players are refusing to come back to competitive chess just because they may lose some rating points, perhaps only temporarily, it doesn't say much for their commitment to chess as such.

DJ

Kevin Bonham
23-12-2010, 08:10 PM
Nicholas Cooper lost 124 points from a single game played in the previous rating period (from 1891 in September quarter to 1767 in December quarter) against an opponent rated just over 1800. Given an Elo expected score of ~62%, this implies an effective K factor of 124/62% = 200.

Thank you. The player in question has not played a significant number of games in any period since Aug 2002, when he played 10 and his rating was 1716. In the rest of that year and 2003-4 he seems to have played just 5 games and in the process gained 167 points (partly by uplift but more than half not) - which is a good demonstration of how quickly even a moderately inactive player can rise. Prior to the one-game showing for the recent period he had been inactive for 6 years. I suspect that under a less responsive system he would not have been as high as 1891 in the first place.

Any other examples or was he the only one?

CivicChessMan
28-12-2010, 11:10 AM
Interesting discussion. In New Zealand, a provisionally rated player (< 20 rated games) will be purged from the system after 4 years of inactivity. Typically, these are one-tournament players. Established players are purged after 10 years of inactivity and on the rare occasion they do resurface, their ratings are quite different to what they were at purge time. A 10+ year old rating doesn't seem to be that reliable. It might be interesting to look at comeback players who have been inactive between 5 and 10 years to see if their current performance is anywhere near their old rating. It may actually show that established players can be purged sooner.

Kevin Bonham
28-12-2010, 11:24 AM
Interesting discussion. In New Zealand, a provisionally rated player (< 20 rated games) will be purged from the system after 4 years of inactivity. Typically, these are one-tournament players. Established players are purged after 10 years of inactivity and on the rare occasion they do resurface, their ratings are quite different to what they were at purge time. A 10+ year old rating doesn't seem to be that reliable. It might be interesting to look at comeback players who have been inactive between 5 and 10 years to see if their current performance is anywhere near their old rating. It may actually show that established players can be purged sooner.

The thing I find intriguing about this whole discussion is that if you have a system where people's out-of-date ratings are purged (not such a rare practice although FIDE does not do it), inactive players don't seem to mind. But under Glicko, where the old rating is allowed to still have some influence on the new rating (which is mathematically best practice since it will still have some predictive value even after several years) some people complain that their rating has gone "down", and do so even if their new rating is significantly higher than the new rating they would have got in a system where their old rating was "scrapped".

I proposed a way of getting around this, which is that old ratings be declared to have expired (so that they are still held on master file but a player no longer "has" that rating and hence has no valid rating to lose points from by returning to tournament play) but Peter Parr didn't like that either!

CivicChessMan
29-12-2010, 04:45 AM
that old ratings be declared to have expired (so that they are still held on master file but a player no longer "has" that rating and hence has no valid rating to lose points from by returning to tournament play)
Effectively the rating is declared as expired in NZ and has no relevance if a player makes a comeback. I have never heard any complaints from players making a comeback either. After 10 years or more inactivity, it's quite likely that they have only a rough idea of their rating anyway. The rating pool can have changed significantly, for example, in NZ, in 2006, there were 213 active juniors, in 2010, there are 473 active juniors. I can't think of any other sport where a player can "retire" for a number of years and not have to prove himself when making a comeback.

Kevin Bonham
01-01-2011, 03:37 PM
Reading the Australian Open thread I noticed a blatant inconsistency in Peter Parr's arguments across different issues. On this thread he argues against Glicko and a part of his argument is that Australia is (supposedly) the only country using such a system and therefore there must be something wrong with it.

Yet in discussions of the Australian Open Rapidplay (not held this year despite Parr supporting it) we find:


I would like to suggest that the Australian Open Rapid Play Championship be held on Saturday 8 January 2011.

Australia to the best of my knowledge is the only country in the world that has a regular updated rapid rating list (produced by Bill Gletsos).

Since Peter has argued against Glicko on the grounds that it is not widely used, by the same token he should be arguing against us having a rapid rating system (under Parr logic if it was any good everyone else would do it including FIDE) and therefore he shouldn't have used its existence as an argument in favour of an Australian Open Rapidplay.

CivicChessMan
01-01-2011, 06:24 PM
From Peter Parr: Australia to the best of my knowledge is the only country in the world that has a regular updated rapid rating list (produced by Bill Gletsos).

But we all know that other countries have rapid rating lists.

Kevin Bonham
01-01-2011, 06:34 PM
But we all know that other countries have rapid rating lists.

Indeed. This is another problem with Parr's arguments; he often makes claims based on "to my knowledge" without making use of the excellent facilities of the internet to quickly check whether these claims are actually true. A quick search using terms like "chess rapid ratings lists" will show within seconds that we are not alone in having one.

Tony Dowden
01-02-2011, 06:43 PM
In my view any decent rating system needs to have the confidence of all (or very nearly all) the players. It is not enough for a system to be mathematically unimpeachable (keeping the geekiest maths geeks among us chess geeks deliriously happy!), it also has to fit the wider social context.

Although a mathematical argument might be made in defence of Norris' stupendous rating drop, I doubt there's a sensible contextual argument to be made. I think it's perfectly understandable for an established player to feel aggrieved at such a massive plunge in their rating.

The solution - especially if one doesn't have much respect for Glicko in the first place - is, firstly, simply to ignore one's rating and just play chess. But of course this isn't so easy if your rating partly defines your chess identity. In my case I played chess for over 30 years without the encumbrance of a Glicko rating. So my identity as a player isn't (and never has been) tied to an Aussie rating system I don't understand and - in all honesty - don't really care about.

Secondly, one should cross the Tasman and get an NZ rating. A Kiwi Elo is worth cherishing: before you can blink it will add at least 200 points to your old Glicko rating; it has all kinds of additional perks like exceptional performance bonuses, junior proofing, more extra points for everyone to combat global financial crisis blues, and good behaviour bonuses*; and it generally makes you feel good and generates the illusion you are a strong player. Why be a struggling 1450 Glicko-rated player at risk of dropping to 1200 in your next event when you can be a 1700 Kiwi-rated player and be confident that you'll still be a 1500 player in your dotage?

*Just kidding about good behaviour bonuses, but I reckon it's not a bad idea ;)

Denis_Jessop
04-02-2011, 06:35 PM
In my view any decent rating system needs to have the confidence of all (or very nearly all) the players. It is not enough for a system to be mathematically unimpeachable (keeping the geekiest maths geeks among us chess geeks deliriously happy!), it also has to fit the wider social context.

Although a mathematical argument might be made in defence of Norris' stupendous rating drop, I doubt there's a sensible contextual argument to be made. I think it's perfectly understandable for an established player to feel aggrieved at such a massive plunge in their rating.

The solution - especially if one doesn't have much respect for Glicko in the first place - is, firstly, simply to ignore one's rating and just play chess. But of course this isn't so easy if your rating partly defines your chess identity. In my case I played chess for over 30 years without the encumbrance of a Glicko rating. So my identity as a player isn't (and never has been) tied to an Aussie rating system I don't understand and - in all honesty - don't really care about.

Secondly, one should cross the Tasman and get an NZ rating. A Kiwi Elo is worth cherishing: before you can blink it will add at least 200 points to your old Glicko rating; it has all kinds of additional perks like exceptional performance bonuses, junior proofing, more extra points for everyone to combat global financial crisis blues, and good behaviour bonuses*; and it generally makes you feel good and generates the illusion you are a strong player. Why be a struggling 1450 Glicko-rated player at risk of dropping to 1200 in your next event when you can be a 1700 Kiwi-rated player and be confident that you'll still be a 1500 player in your dotage?

*Just kidding about good behaviour bonuses, but I reckon it's not a bad idea ;)

The problem with this argument is the use of words like "confidence" and "respect". Even if the bulk of Australian chess players' views on this topic could be obtained - a virtual impossibility - what has it to do with the value of the rating system? It's somewhat like saying "Do you believe in the Theory of Relativity".

DJ

Tony Dowden
06-02-2011, 08:55 AM
The problem with this argument is the use of words like "confidence" and "respect". Even if the bulk of Australian chess players' views on this topic could be obtained - a virtual impossibility - what has it to do with the value of the rating system? It's somewhat like saying "Do you believe in the Theory of Relativity".

DJ
DJ: Your point is too cryptic (you may well have one but I don't know what it is).

Perhaps your problem is the analogy about the Theory of Relativity - which might be too much of a stretch?

Kerry Stead
06-02-2011, 08:30 PM
Would something similar to what is now being used in the US, a title system, explained here (http://www.glicko.net/ratings/titles-0910.pdf), satisfy both sides of the argument?
Use Glicko for rating, which is current & predictively more accurate, but have titles for those attached to past glories ('I may be rated 1500 now, but I'm a Candidate Master') ...
Of course this requires someone going through rating lists, tournament results, etc & calculating said norms & titles ...

Rincewind
06-02-2011, 08:41 PM
DJ: Your point is too cryptic (you may well have one but I don't know what it is).

I believe what DJ might be hinting at is countering your claim that a rating system's benefit cannot be measured by its statistical utility. When it comes to rating systems, mathematical arguments trump "wider social contexts".

Like the theory of relativity. It is not a question of believing or not believing it. It fits the data better than the opposition and so that is what is used. When a physics theory comes along that fits the data better, then that will replace Einstein's theory. But at the moment we are not there and we certainly won't pick the physics theory which most people feel confidence in.

It was claimed (probably incorrectly) that when relativity was in its youth, only 3 people in the world understood it. This did not stop it from being the most correct theory available.

Kevin Bonham
06-02-2011, 09:43 PM
Of course this requires someone going through rating lists, tournament results, etc & calculating said norms & titles ...

Old tournament results too, possibly dating even from times when the ACF system was very inaccurate, since the players who complain about Glicko mainly do so because it fails to preserve strong indications of past glories.

It's an interesting system and I'd never looked at the full details of it before. For instance for a US Candidate Master title you need to at some point get a rating of 2000 and you need five "norms". However the norms require performances that are quite a bit stronger than 2000.

Norms are gained in events of 4+ rounds by scoring more than a point above a target score based on the rating cutoff for the title being aimed at. If you are chasing the CM title then a win against a 2400+ puts you +1 vs target, a win against a <1800 scores you nothing, a win against 1800-2000 scores you points on a sliding scale of 0 to 0.5 and a win against 2000-2400 scores you points on a sliding scale from 0.5 to 1. For a draw subtract 0.5 from the above and for a loss subtract 1.
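As I read the sliding-scale rule described above, the per-game points for a Candidate Master aspirant could be sketched like this. Linear interpolation within each band is my assumption; the linked titles PDF is the authoritative source.

```python
# Sketch of per-game norm points for a US Candidate Master aspirant,
# per the sliding scale described above. Linear interpolation within
# each band is an assumption; the linked titles PDF is authoritative.

def cm_norm_points(opp_rating, result):
    """result: 1.0 win, 0.5 draw, 0.0 loss."""
    if opp_rating < 1800:
        win_value = 0.0
    elif opp_rating < 2000:
        win_value = 0.5 * (opp_rating - 1800) / 200        # 0 .. 0.5
    elif opp_rating < 2400:
        win_value = 0.5 + 0.5 * (opp_rating - 2000) / 400  # 0.5 .. 1
    else:
        win_value = 1.0
    # A draw scores 0.5 less than a win, a loss 1.0 less.
    return win_value - (1.0 - result)

print(cm_norm_points(2400, 1.0))   # 1.0  (win vs 2400+)
print(cm_norm_points(1900, 1.0))   # 0.25 (win, middle of 1800-2000 band)
print(cm_norm_points(2400, 0.5))   # 0.5  (draw vs 2400+)
print(cm_norm_points(1700, 1.0))   # 0.0  (win vs <1800 scores nothing)
```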

If we had this system in Australia I would be First Category, probably a fair few times over in terms of the 5 norms required. I would be nowhere near Candidate Master since although I have twice briefly met the rating threshold, not a single one of my roughly twenty 2000+ performances is even close to a Candidate Master norm.

Craig_Hall
07-02-2011, 01:20 PM
I believe what DJ might be hinting at is countering your claim that a rating system's benefit cannot be measured by its statistical utility. When it comes to rating systems, mathematical arguments trump "wider social contexts".

Like the theory of relativity. It is not a question of believing or not believing it. It fits the data better than the opposition and so that is what is used. When a physics theory comes along that fits the data better, then that will replace Einstein's theory. But at the moment we are not there and we certainly won't pick the physics theory which most people feel confidence in.

It was claimed (probably incorrectly) that when relativity was in its youth, only 3 people in the world understood it. This did not stop it from being the most correct theory available.

The difference being that the Laws of Physics don't get more accurate with more data.

Oepty
07-02-2011, 01:47 PM
Perhaps we can have a Peak Rating rating list, that way we could see how good people were at their best.
Scott

Rincewind
07-02-2011, 07:48 PM
The difference being that the Laws of Physics don't get more accurate with more data.

Well they can, after a fashion. All analogies break at some level, but you could extend this to physics by thinking about the improvement in measurement over the history of physics. Newtonian mechanics were fine, and in the mid-1800s people were sort of thinking that physics was pretty dull and uninteresting because Newton had stood on the shoulders of giants and seen as far as there was to see. However, then we got relativity thanks to Einstein (and Lorentz and others), and then the atom got very interesting and quantum effects were measured, and for the last 100 years at least physics has been a very vibrant field with a lot of interest.

So I guess there's some sort of analogy there, but I don't think Denis's comments were aimed in that direction. It was just making the point that we know how predictive a rating system is. We can measure that objectively. If you have an objective measure of the usefulness of a system then you obviously pick the most useful one. The "wider social context" might be an important issue in the management of a system (you may need to sell it to the community) but it should not be a factor in the selection of a rating system.

So the marketing of Glicko could possibly be improved. Which reminds me of something Nietzsche said


Not enough! - It is not enough to prove something, one also has to seduce or elevate people to it. That is why the man of knowledge should learn how to speak his wisdom: and often in such a way that it sounds like folly!

Friedrich Nietzsche, Daybreak

Rincewind
07-02-2011, 07:54 PM
Perhaps we can have a Peak Rating rating list, that way we could see how good people were at their best.

The trouble is that comparison between eras is always difficult. Perhaps the CM title idea is the best way to do this - but I'm sure that system would have issues too.

The only way to really know what to do is to identify the problem. If the problem is players' ratings going down by massive amounts - if that causes a predictive problem with the system then it should be addressed. If it is actually a good thing for the system to do (provided predictivity is not compromised) then potentially we have a self-esteem issue or a public confidence issue that needs to be dealt with separately. But in general mathematics is not good at solving such problems.

Oepty
07-02-2011, 09:11 PM
The trouble is that comparison between eras is always difficult. Perhaps the CM title idea is the best way to do this - but I'm sure that system would have issues too.

Yes, that is true, and people can choose to take as little or as much stock in a 20-year-old peak rating of 2250 as they like. Maybe players who return and have their rating drop feel better because they can still point to something as proof of the strength they once had and may be able to get back to. I am not entirely sold on the idea, but it might be worth considering.



The only way to really know what to do is to identify the problem. If the problem is players' ratings going down by massive amounts - if that causes a predictive problem with the system then it should be addressed. If it is actually a good thing for the system to do (provided predictivity is not compromised) then potentially we have a self-esteem issue or a public confidence issue that needs to be dealt with separately. But in general mathematics is not good at solving such problems.

To Peter Parr there is a problem that players lose a lot of rating points when they return, even though it seems to be fairly good for the predictivity of the system. I am open to someone putting forward an argument that returning players are sometimes losing too many rating points, but I also think that if it has happened and the fall was really not justified, playing a few more tournaments will quickly correct it. I also think that there are, in any rating system including the ACF rating system, ratings that are just wrong.
Scott

Basil
07-02-2011, 11:31 PM
My logic was that how could it be possible that someone who has been rated between 2000 and 2100 for the past 25 years and one who still has an active FIDE rating (at the time above 2100) drop an astonishing 447 points in one tournament ...
I don't know Damian well, but have met him on occasion. From these occurrences and by reputation, I believe him to be 'a good chap'.

Nonetheless ...

At around the time of these happenings, Damian attended the Brisbane Club where I played him in a semi-serious game (clocks, competition, unrated, perhaps game in 30 mins). I won. I was rated approximately 1450-1550 at the time.

Carry on! You're all doing very well.

Tony Dowden
09-02-2011, 06:30 PM
I believe what DJ might be hinting at is countering your claim that a rating system's benefit cannot be measured by its statistical utility. When it comes to rating systems, mathematical arguments trump "wider social contexts".

Like the theory of relativity. It is not a question of believing or not believing it. It fits the data better than the opposition and so that is what is used. When a physics theory comes along that fits the data better, then that will replace Einstein's theory. But at the moment we are not there and we certainly won't pick the physics theory which most people feel confidence in.

It was claimed (probably incorrectly) that when relativity was in its youth, only 3 people in the world understood it. This did not stop it from being the most correct theory available.

Thanks for the patient explanation. Yes, I think that's probably what was meant.

But to my mind it confirms my suspicion that the analogy of the theory of relativity is probably too much of a stretch. Mathematics is great for describing/calculating relativity - I guess(!) [a close relation is an emeritus professor of physics but it didn't rub off on me] - but the (social) context of chessplayers and their self concept is far less exact. Being assured that "the maths is correct (so there)" isn't necessarily going to shore up confidence among the subset of players experiencing wild gyrations in ratings. (Admittedly, it probably does for the subset of maths geeks among us, but even I know it's too much of a stretch to assume both subsets will completely intersect).

I still recommend getting a Kiwi Elo ;)

Kevin Bonham
09-02-2011, 11:09 PM
Some forms of ELO may be more flattering than others but all forms of ELO that are not radically modified are pseudoscientific rubbish that is unworthy of even being considered to be a rating system. This is a view I have held for a long time but it has just been strengthened by consideration of what the Tasmanian top ten would look like if we still used it. :lol:

Desmond
10-02-2011, 10:02 AM
Some forms of ELO may be more flattering than others but all forms of ELO that are not radically modified are pseudoscientific rubbish that is unworthy of even being considered to be a rating system. This is a view I have held for a long time but it has just been strengthened by consideration of what the Tasmanian top ten would look like if we still used it. :lol:
Well go on then ... you can't just say that and not show and tell! What would it look like?

Kevin Bonham
10-02-2011, 06:43 PM
Well go on then ... you can't just say that and not show and tell! What would it look like?

The current Tasmanian top ten is Dowden, Dyer, Small, Steward, Markovitz, Bonham, Frame, Chadwick, T Hendrey, M Bretag.

This seems about right to me. Dowden and Dyer in some order are clearly 1 and 2 as they more or less consistently dominate weekenders and almost always draw with each other. Small is hard to place correctly as he had a very high rating but has only played two tournaments in Tas, both at well below that strength but not enormously so. Steward had some modest results on returning from a long absence but has since had some stellar ones and won this year's Hobart champs well ahead of Markovitz. Bonham has been out of form for two years but there is no one below consistently performing well enough to push him lower than sixth.

Under ELO the top ten for the December ratings period would be Small, Markovitz, Dowden, Steward, Bonham, Dyer, Lucas, Chadwick, Frame, Gibbs. Marcus Bretag who has been a regular podium finisher (and even in one case winner) in weekenders would not be near the top ten. Small and Markovitz would be over-favoured by their high start rating in the state (Small) and good results in first rated sample (Markovitz). Dyer, M Bretag, T Hendrey would still be experiencing drag on their rating from long-ago junior days.

Rincewind
10-02-2011, 10:47 PM
Being assured that "the maths is correct (so there)" isn't necessarily going to shore up confidence among the subset of players experiencing wild gyrations in ratings. (Admittedly, it probably does for the subset of maths geeks among us but even I know it's too much of stretch to assume both subsets will completely intersect).

I don't disagree that, as a marketing exercise, the present situation could be improved upon. However, as far as the maths goes, it is not usually a good idea to try and fix marketing problems with mathematics. :)

My suggestion is stick with Glicko until something better comes along but address the management issues (what I suspect are actually latent change management issues) as and when necessary.

Santa
07-03-2011, 03:32 AM
If they persist in complaining that they will "lose" ratings points tell them that a rating that is 10, 20 or 30 years old is meaningless as an indicator of a player's current skill level and therefore they shouldn't mind about getting a new one that might be lower since their old one means very little anyway. Their old rating doesn't say how good they are, it says how good they were.


Who was it complained that a previous NRO expunged ancient ratings?

Santa
07-03-2011, 03:32 AM
On the contrary I believe that many aspects of a rating system should be reviewed at all times to check that its predictiveness is optimised. Note the latter qualifier; to soothe the egos of people whose ratings have dropped even if this leads to much less accurate ratings on the whole is not a fit criterion for reviews.

When the National Ratings Officers make undocumented changes to the undocumented algorithms, when they reprocess data for the past ten years or so using those changed algorithms, when they don't publish the before/after results or the data they have processed, then the ratings are pretty much unpredictable to the vast majority of players.

I recently played in a tournament where I defeated each player I played whose rating was below 1800.

I also lost to everyone I played who was rated over 1300.

I was keen to see what magical effect this had on my rating, but unfortunately it was muddied by another tournament (fair enough), and by this reprocessing :rolleyes:

Cordless makes the point that kids like to see their ratings increase. I can tell you that holds for adults too, but some of us distinguish between earned increases and unearned increases.

Worse, I could not find these changes mentioned in the ACF newsletter or on its website.

Santa
07-03-2011, 04:14 AM
When those rising stars go through their next upsurge, an intermediate rating is most often used.

So if a player was 1400!! and then performed at 1800 for a whole rating period, the latter half of their games would be rated as though they were 1600.

This has the effect of increasing the improving player's rating faster, while minimising the effects on that player's opponents, as for the second half of the rating period the games would be worked out as though the player had a 1600 rating.

Also, for the next rating period, that player would most likely have a !, instead of !!, as their new rating is less reliable. This then means if they are continuing their improvement, their rating is more volatile from the first game of the new rating period.

I am not sure, but I believe the opposite is true for a player on a downward spiral.

Now under ELO, the improving junior would only have their games rated for the whole period at a K factor of 15, which means their gains are much slower, meaning they are 'underrated' for a longer period of time. This then affects that player's ability to get into rating-restricted events, and hurts their opponents with more rating points lost than should have occurred.

Also, here in Australia, we do have the effect quite often where a junior will play a few long time control events, get a rating, then only play in junior rapids for quite a while before returning to the long time control scene when they get a bit older.

Under ELO, it would be likely that their old rating would not catch up if they are rapidly improving, but under Glicko that returning junior would be starting with a ? or ?? and so their rating can go up in leaps and bounds.

Note, Elo is a chap's name, it is not spelled ELO.

There are many implementations of Elo ratings systems, FIDE has one, Australia had another.

One of the flaws in Australia's implementation was that all games were rated using players' last-published ratings.

A player with a rating of 1400 on a list might perform at 1800 through 50 games in the next period, but all his rating changes were based on his previous rating of 1400.

When I took over the ratings system, K was a property of the tournament! My memory is faded, but I think Junior events used K=25 which might be okay as all the participants were juniors. But K=25 was also used for the Australian Championship "to reflect its importance."

As I recall, I changed it to use K=25 (not 15) for juniors, and we also used bonus points (an optional part of an Elo system, clearly unsuitable for FIDE back then) where players' ratings would be increased more rapidly if they were clearly underrated as evidenced in tournaments of more than five (or six?) rounds. These bonus points were also calculated using players' last-published ratings.
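The drag from rating every game off the last-published rating is easy to illustrate with a plain Elo update. The numbers below are hypothetical, and this is not the actual old ACF code - just a sketch of the flaw being described.

```python
# Illustration of the flaw described above: under the old scheme all
# of a period's games were rated from the player's last-published
# rating. Numbers are hypothetical; this is not the actual ACF code.

def expected_score(rating, opp):
    """Standard Elo expected score."""
    return 1.0 / (1.0 + 10 ** ((opp - rating) / 400.0))

K = 15
last_published = 1400
opponents = [1600] * 9   # nine games against 1600-rated opposition
score = 7.0              # roughly what an 1800 performer would score

expected = sum(expected_score(last_published, opp) for opp in opponents)
gain = K * (score - expected)
print(round(gain))   # ~73 points - far short of the ~400-point gap
```

So even a whole period of 1800-level play moves the published 1400 rating only a fraction of the way there, which is the slow catch-up (and opponent "robbery") described in the post.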

A consequence of this was that some players (those on both busiest and most improved lists) were likely to switch from underrated to overrated.

As for me, I am not convinced that this much sought-after predictability is even achievable. I used to play at Waverley and Dandenong chess clubs in Melbourne.

Waverley had a junior club, with juniors regularly advancing to the senior club. People like Geoff Saw, David Cordover, Joseph, Michael and Samuel Chow, Jason de Boer and Matthew de Souza were all in the Junior club and advanced to Seniors while I was there. Additionally, because of the presence of those juniors, others came in from other clubs - Simon Rutherford, Joel McDonald, Michael Kagan the younger - and we also had some Monash students such as Robert Shankly & Min Vui Voon.

Because of the number of younger players, all generally improving relatively rapidly, the older senior members were "robbed" of ratings points and so were underrated themselves.

In contrast, I only recall one junior at Dandenong, Daniel See. Mostly, players there were from Eastern Europe with a good number rated 1500-1600 or so. Their ratings were around 200 points above mine, but I could expect to beat them fairly often, and in one club championship I beat enough to come equal second.

Unless and until you have a ratings system that can predict improved performances before they occur, then you will always have improving players who are underrated and they (and their victims) will defeat your predictions.

Likely you will also have overrated players, but I've not seen so many of them - except when they come from another circle of chessplayers: maybe another club, another state or territory.

Santa
07-03-2011, 04:23 AM
Effectively the rating is declared as expired in NZ and has no relevance if a player makes a comeback. I have never heard any complaints from players making a comeback either. After 10 years or more of inactivity, it's quite likely that they have only a rough idea of their rating anyway. The rating pool can have changed significantly: for example, in NZ in 2006 there were 213 active juniors; in 2010 there are 473. I can't think of any other sport where a player can "retire" for a number of years and not have to prove himself when making a comeback.

My oldest daughter hasn't played since she acquired a published rating (a fairly dubious one I have to say, I think it's based on one win, a one-move blunder by her opponent, but that's how it works). Her old record is still in the ACF master file, and she was pretty surprised when I told her what her rating is. She remembers, even after 20 years or so.

Santa
07-03-2011, 04:33 AM
The problem with this argument is the use of words like "confidence" and "respect". Even if the bulk of Australian chess players' views on this topic could be obtained - a virtual impossibility - what has it to do with the value of the rating system? It's somewhat like asking "Do you believe in the Theory of Relativity?"

DJ

Which Theory of Relativity, Denis?


The Special Theory of Relativity might be hard to understand, but it does not make random changes to my environment.

Nor does the General Theory of Relativity.

Santa
07-03-2011, 05:03 AM
Some forms of ELO may be more flattering than others but all forms of ELO that are not radically modified are pseudoscientific rubbish that is unworthy of even being considered to be a rating system. This is a view I have held for a long time but it has just been strengthened by consideration of what the Tasmanian top ten would look like if we still used it. :lol:

If you mean the Elo system as implemented by the ACF is rubbish, I agree with you. If your comparisons are between Glicko-2 and the ACF's implementation of Elo's recommendations, then those comparisons are unfair to Elo systems because of the ACF's misguided implementation.

You can make all the assertions you like about the superiority of the ACF's implementation of the Glicko-2 system, but until the ACF implementation is properly documented, and the tests you say have been run are published, I can see no merit in your assertions.

All the documentation I have seen on the ACF's current rating system says that it is "based on the Glicko-2 system." I have seen nothing on what changes have been made or why they were made.

The ACF's Elo system, with all its faults, was documented.

Kevin Bonham
07-03-2011, 09:00 AM
If you mean the Elo system as implemented by the ACF is rubbish, I agree with you. If your comparisons are between Glicko-2 and the ACF's implementation of Elo's recommendations, then those comparisons are unfair to Elo systems because of the ACF's misguided implementation.

Which ELO recommendations did you have in mind, and has there ever been a running example of an ELO system that did implement them? As far as I can tell the use of the term "ELO system" for a system lacking the features you attribute to it in this post (http://chesschat.org/showpost.php?p=303217&postcount=23) has become so widespread that calling what Elo originally had in mind (but which isn't implemented) "the ELO system" is scarcely valid. It's like saying that what everybody knows as ELO isn't actually ELO at all - which may be true but isn't helpful in discussing the widespread "imposter".


You can make all the assertions you like about the superiority of the ACF's implementation of the Glicko-2 system, but until the ACF implementation is properly documented, and the tests you say have been run are published, I can see no merit in your assertions.

Actually you don't need to have a rating system documented in order to test how well it works. You can simply use the ratings published and a sufficiently large sample of tournament test data from appropriate following periods.
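That black-box test can be sketched directly: take one period's published ratings and a later period's games, and measure how far an Elo-style expectancy misses the actual scores. All names, ratings and results below are invented for illustration:

```python
# Black-box check of a rating list's predictive accuracy, as described
# above: no knowledge of the system's internals is needed, only the
# published ratings and a later sample of results. All data invented.

def expected(r1, r2):
    return 1.0 / (1.0 + 10 ** ((r2 - r1) / 400.0))

def mean_abs_error(ratings, games):
    """games: (white, black, white_score) triples from a later period."""
    errs = [abs(score - expected(ratings[w], ratings[b]))
            for w, b, score in games]
    return sum(errs) / len(errs)

ratings = {"A": 1800, "B": 1600, "C": 1700}
games = [("A", "B", 1.0), ("B", "C", 0.0), ("A", "C", 0.5)]
err = mean_abs_error(ratings, games)  # lower = more predictive list
```

Running the same later-period games against two different published lists gives a head-to-head comparison without knowing how either list was computed.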

As to whether the Ratings Officers see any need to release all the documentation mentioned for the sake of convincing you, I leave that up to them.


All the documentation I have seen on the ACF's current rating system says that it is "based on the Glicko-2 system." I have seen nothing on what changes have been made or why they were made.

Actually while there hasn't been enough released for anyone to attempt to exactly replicate it themselves (not that you could do so without all the player files anyway), broad details of reasons and natures of significant changes have generally been released.

Kevin Bonham
07-03-2011, 09:24 AM
Who was it complained that a previous NRO expunged ancient ratings?

While they are largely meaningless as an indicator of current playing strength (most likely being accurate only within 300-500 points), ancient ratings that were based on substantial numbers of games are still useful both for historical purposes and to ensure that a player is not wrongly treated as unrated for ratings prize purposes in those events which use "unrated prizes" to encourage players who are more or less newcomers.


When I took over the ratings system, K was a property of the tournament! My memory is faded, but I think Junior events used K=25 which might be okay as all the participants were juniors. But K=25 was also used for the Australian Championship "to reflect its importance."

I think this is a silly change unless there is evidence that weighting performance in more "important" events better predicts a player's performance overall.


One of the flaws in Australia's implementation was that all games were rated using players' last-published ratings.

Yet it is precisely this flaw (and it is a flaw), widespread in well known versions of the ELO system, that is the main reason we have people like Peter Parr coming on here to defend that implementation and ask us to bring it back.


Because of the number of younger players, all generally improving relatively rapidly, the older senior members were "robbed" of ratings points and so were underrated themselves.

The ACF's version of Glicko includes many features that mute this problem and this is probably a reason why it is more predictive than what is widely known as ELO. It is very difficult for any rating system to be completely predictive when dealing with fast-improving players without occasional severe predictive overshoots. But just because you can't fix the problem completely doesn't mean you shouldn't try.


When the National Ratings Officers make undocumented changes to the undocumented algorithms, when they reprocess data for the past ten years or so using those changed algorithms, when they don't publish the before/after results or the data they have processed, then the ratings are pretty much unpredictable to the vast majority of players.

Ratings are pretty much unfathomable to the vast majority of players if you use any complex system whether you publicise those details or not.

Denis_Jessop
10-03-2011, 10:24 AM
Which Theory of Relativity, Denis?


The Special Theory of Relativity might be hard to understand, but it does not make random changes to my environment.

Nor does the General Theory of Relativity.

It doesn't matter which theory. The point of my observation was the word "believe" that is, as an act of faith.

DJ

pax
11-03-2011, 04:36 PM
While I'm not a huge fan of Glicko, I do think the concept of not playing *ever* in order to protect a ten year old rating is nothing short of absurd.

20 years ago the rating system was full of 1000-1300 rated juniors who routinely played >500 points above their rating. At least that situation does not arise any more (at least, not extensively).

Tzoglanis
16-03-2011, 11:50 AM
Hi,
I do not mind if I am made unrated, rather than going through all this with Glicko 1, 2, etc.
Why not make me unrated? I have not played in ten years. I am not protecting a rating. Who cares? With all this bickering chess is going backwards in NSW. Is achieving a high rating the goal in someone's enjoyment of the game? Question for Kerry: having a 1200 rating and a title at the same time is a little absurd?! Unless you meant your post as a joke.
Cheers anyway.
So my idea is make everyone or people that do not mind, unrated after a 10 year hiatus?
What do you think?
Start afresh.

Vlad
16-03-2011, 02:14 PM
Do you realize that if you have no rating and you have a bad result in the first tournament your new rating will be even worse than under Glicko??

antichrist
16-03-2011, 02:24 PM
On dark side of moon they allege with graphs to prove that NSW players get a better rating than Victorian players - is it true?

Oepty
16-03-2011, 04:14 PM
On dark side of moon they allege with graphs to prove that NSW players get a better rating than Victorian players - is it true?

The reply he received over there seems to be very much on the point. The fact the poster cannot even get support on the dark side is probably a bad sign for the argument.
Scott

Kevin Bonham
16-03-2011, 04:37 PM
On dark side of moon they allege with graphs to prove that NSW players get a better rating than Victorian players - is it true?

That post is rubbish, even Sweeney (to his credit) pointed that out.

The poster is claiming that strong NSW players have a higher ACF rating compared to their FIDE than Victorians do. Of those players for whom figures are visible in the post, the average NSW example has an ACF 68 points higher than their FIDE but the average Victorian example has a FIDE 11 points higher than their ACF. That suggests a 79 point gap.

The problem is that the poster is using too small a sample (though parts of it seem to have been chopped by dodgy presentation) and also they seem to not be selecting their sample systematically.

Doing it properly by comparing the top 30 by ACF rating from each state (of those who have FIDE ratings) it turns out that the average gap drops from the 79 points in Frightened Victorian!'s sample to just 14 points.

Thus the ACF system thinks the top 30 NSW players are better than the top 30 Victorians by 53 points but the FIDE system thinks the NSW players are better by 39.

It's a trivial difference and is more likely to be just a result of the vagaries of the histories of different players in each group than anything systematic, let alone suspicious.

ChessGuru
16-03-2011, 04:48 PM
What are we really talking about...

Can anyone give a specific number by which the ratings are more accurate under the shrouded-in-complexity Glicko 2.5 when compared to the more straightforward Elo?

Because if we're talking about a rating which is 5 points more accurate then I'm sure many a player would sacrifice the 5 points in exchange for clarity, predictability, transparency etc....

After all, the super accurate ratings didn't exactly predict all that well the results of the recent Begonia Open.

Oepty
16-03-2011, 05:15 PM
After all, the super accurate ratings didn't exactly predict all that well the results of the recent Begonia Open.

You might be right with this statement but what is your proof?
Scott

Garvinator
16-03-2011, 05:45 PM
After all, the super accurate ratings didn't exactly predict all that well the results of the recent Begonia Open.
In individual results NO rating system is going to be accurate 100 per cent of the time. So Glicko2 is not going to be able to predict results 100 per cent of the time, neither will ELO.

The key number is that when 20 or so players of a certain rating (say 1800 approx) met another 20 or so players with a different rating (say 1600 approx), did the results of all those games fall within the percentage that Glicko2 would predict would occur.

And conversely, was ELO's predictive factor better for those players than was Glicko's.

Those are the key questions in a nutshell.

For ELO to be a better predictor, it needs to score a higher percentage on the test above than a better rating system does. I strongly doubt it.

Feel free to present your hard evidence based argument from the results from Ballarat to show that ELO did better than Glicko2 at predicting results.
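The group-versus-group check described above can be sketched as follows; the observed 14.5/20 score is invented purely to show the shape of the comparison:

```python
# Sketch of the aggregate test above: compare the Elo-style predicted
# score share for a ~1800 group against a ~1600 group with what was
# actually observed. The observed figure here is invented.

def expected(r1, r2):
    return 1.0 / (1.0 + 10 ** ((r2 - r1) / 400.0))

predicted = expected(1800, 1600)           # ~0.76 per game for the 1800s
games_played = 20
observed = 14.5 / games_played             # hypothetical actual share

# A rough pass/fail: a proper test would also allow for sampling noise
# (the binomial-style spread around the predicted share).
gap = abs(observed - predicted)
```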

Oepty
16-03-2011, 06:53 PM
In individual results NO rating system is going to be accurate 100 per cent of the time. So Glicko2 is not going to be able to predict results 100 per cent of the time, neither will ELO.

The key number is that when 20 or so players of a certain rating (say 1800 approx) met another 20 or so players with a different rating (say 1600 approx), did the results of all those games fall within the percentage that Glicko2 would predict would occur.

And conversely, was ELO's predictive factor better for those players than was Glicko's.

Those are the key questions in a nutshell.

For ELO to be a better predictor, it needs to score a higher percentage on the test above than a better rating system does. I strongly doubt it.

Feel free to present your hard evidence based argument from the results from Ballarat to show that ELO did better than Glicko2 at predicting results.

Garvin, to be fair to David it is a bit hard for him to do a proper comparison between the two lots of ratings because he does not have a list showing what every player's rating would have been if the old Elo system had been continued until now. This makes your task for him impossible. He can, however, compare results with expected results from players' current ACF ratings and see how they compare.
I have no idea what a good result for a rating system would be for the tournament. Maybe this tournament was one where the ACF ratings did worse than normal; not having looked at it, I have no idea though.
Scott

Kevin Bonham
16-03-2011, 08:31 PM
What are we really talking about...

Can anyone give a specific number by which the ratings are more accurate under the shrouded-in-complexity Glicko 2.5 when compared to the more straightforward Elo?

In terms of using each one to predict player scores in subsequent play, "Glicko 2.5" is about 14% less error-prone than the version of ELO that the ACF replaced with Glicko. (The 14% is an average of two error estimate methods being looked at.)


Because if we're talking about a rating which is 5 points more accurate then I'm sure many a player would sacrifice the 5 points in exchange for clarity, predictability, transparency etc....

Certainly - but the difference will be many times that even for the proverbial "average player". I'm not sure how to translate the prediction differences into a rating error difference but I'd be surprised if it was less than about 30 points average, and it could be much more. And for juniors, especially strong juniors, it would be worse still.

For instance, if we were still using the same Elo system as the one we replaced with Glicko over a decade ago, then the average difference between the top 20 players who are over 20 years old, and the top 20 who are under 20 years old, would be about 180 points higher than it is in the current system. It's likely all this error would be driven by the old Elo system's inadequate response to junior improvement and over-40 decline, suggesting that top players and top juniors would be incorrectly rated by at least 90 points on average. There would also be substantial differences in the composition of those lists.

Moulthun Ly recently tied for first in the Australian Open with a GM and a near-GM. Under the current system he is rated 5th in Australia. Under the old Elo system he would still not be rated in the top 20 and nor would any other under-20 age player.

antichrist
16-03-2011, 08:55 PM
Originally Posted by DoubleRoo on his blog comments
David, that sounds like an introduction to a bloody good rant. The Glicko system was introduced after I left but nevertheless my rating skyrocketed to 2582. Meanwhile Darryl's rating took a temporary dive. I don't remember to what, but a friend at an intoxicated blitz session worked out that I would now need to score 90% against Darryl to keep my rating. We were all in tears laughing.

__________________
AC, don't know who doubleroo is but sure KB does. I have deleted FG's comments re KB's figure.

Kevin Bonham
16-03-2011, 09:08 PM
Well firegoat is just incoherently abusive as usual - too stupid to even find Alex Wohl's current ACF rating for himself. It is 2432 (not the 2582 of Wohl's anecdote) followed by a blank. If goatboy can't find it in one minute then I suggest he be careful with challenging tasks such as breathing; he might absorb oxygen and seriously injure himself.

doubleroo is Wohl and the anecdote he tells is amusing but would suggest that at some point his rating exceeded Johansen's by about 360 points. I doubt that is actually true.

Anecdotes (even true ones!) about issues from when Glicko-1 was first introduced are not relevant to assessing Glicko-2 now.

Kevin Bonham
16-03-2011, 10:39 PM
AC, I deleted that rubbish of firegoat's that you brought across from the other place since most of it was just his usual dumb abuse and you were clearly doing so just to try to exacerbate fights.

If you must try to give his rubbish an airing on this board then just stick to the bits that actually try to make a coherent point related to the issue (good luck finding them) and don't copy and paste the rest of his dross. If you keep dragging abuse over the lights will go out for you from another section of the board.

The only thing in his latest frothing that's worth responding to is this:


Nope...but I will add, not being active in Australia makes Alexs rating unpublishable on the March list.

firegoat is completely wrong again. Alex Wohl has played games rated by the ACF within the last two years and therefore appears on the main list of active Australian players. Indeed his rating (having a blank after it) is more reliable than many ? and ??s that appear on the list.

Kevin Bonham
17-03-2011, 09:36 AM
And he's still going on about this:


The rule now is that Alex Wohl is no longer considered a "Top Player" because of some arbitrary ACF created conditional rule. But if you look in the active player list he is arbitrary regarded as an active player because of some arbitrary created conditional rule made up by the ACF.

It has long been the case that the activity thresholds for listing as an active player and for listing on the Top Players lists are different. The first just requires at least one game in the last two years - you have to draw a line defining active and inactive somewhere. The latter requires a reliable rating indicated by ! or !!.

There is nothing arbitrary about excluding less reliably rated players at all. If you have a rusty player with a high rating who comes out of hiding and wins one game against a bunny (and yes this sort of thing has happened), you don't want someone with 2400?? to be listed as one of the country's top 20 players until they have shown they really are such. But as they have recently played a game, they are an active player.

The treatment of expats and players who play mainly overseas under this rule is a tricky business. Perhaps if ways can be found to assess their performance in OS tournaments within the ACF rating system then they could be rated more reliably and qualify for inclusion in the Top Players lists. The alternative is to go cherry-picking players who are excluded from the rule and listed anyway - now that really would be "arbitrary".

Vlad
17-03-2011, 11:19 AM
Below is the list, which is an average of ACF and FIDE ratings. I think both systems have some issues; taking averages diminishes the issues of both. In my view, the ratings below are very reasonable (at least for the top players).




1. 3202534 Zhao, Zong-Yuan g 2601 1986
2. 3202305 Smerdon, David g 2514 1984
3. 3202933 Xie, George Wendi m 2485 1985
4. 3200345 Wohl, Aleksandar H. m 2425 1963
5. 3200035 Johansen, Darryl K. g 2415 1959
6. 3200043 Solomon, Stephen J. m 2413 1963
7. 3204405 Ly, Moulthun f 2400 1991
8. 3201970 Goldenberg, Igor m 2383 1969
8. 4126505 Smirnov, Vladimir f 2383 1974
10. 401269 Teichmann, Erik f 2369 1961
10. 3205207 Illingworth, Max f 2369 1992
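The blending described is just the arithmetic mean of a player's two ratings; a one-line sketch with a hypothetical input pair:

```python
# Sketch of the simple blend Vlad describes: the mean of a player's
# ACF and FIDE ratings. The input figures below are hypothetical.

def blended(acf, fide):
    return round((acf + fide) / 2)

rating = blended(2612, 2590)  # a hypothetical pair averaging to 2601
```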

Oepty
17-03-2011, 11:39 AM
Below is the list, which is an average of ACF and FIDE ratings. I think both systems have some issues; taking averages diminishes the issues of both. In my view, the ratings below are very reasonable (at least for the top players).




1. 3202534 Zhao, Zong-Yuan g 2601 1986
2. 3202305 Smerdon, David g 2514 1984
3. 3202933 Xie, George Wendi m 2485 1985
4. 3200345 Wohl, Aleksandar H. m 2425 1963
5. 3200035 Johansen, Darryl K. g 2415 1959
6. 3200043 Solomon, Stephen J. m 2413 1963
7. 3204405 Ly, Moulthun f 2400 1991
8. 3201970 Goldenberg, Igor m 2383 1969
8. 4126505 Smirnov, Vladimir f 2383 1974
10. 401269 Teichmann, Erik f 2369 1961
10. 3205207 Illingworth, Max f 2369 1992



Seems like quite a good list. Others people might consider to be very close to this list would be Chapman, Canfell, Brown, Morris, Cheng and Ikeda. Might have missed someone but that is what I come up with off the top of my head.
Scott

antichrist
17-03-2011, 12:28 PM
As it is all done by computers, maybe two lists could be churned out, one Glicko-2 and also ELO or FIDE, whatever. Any extra volunteers?

Vlad
17-03-2011, 02:42 PM
Seems like quite a good list. Others people might consider to be very close to this list would be Chapman, Canfell, Brown, Morris, Cheng and Ikeda. Might have missed someone but that is what I come up with off the top of my head.
Scott


12. Lane IM 2366
13. Chapman IM 2358
14. West IM 2350
15. Bjelobrk FM 2349
16. Cheng FM 2348


This list could be used to make an educated forecast of who is Australia's next IM. Moulthun, myself and Erik are pretty much IMs in waiting. That means the "hottest" FM at the moment is Max, followed by Bobby.

Kevin Bonham
18-03-2011, 08:40 PM
Below is the list, which is an average of ACF and FIDE ratings.

This average is used by the ACF to conduct selections when there is insufficient time to form a selections panel. It generally works quite well, and is especially useful given that some players playing mainly offshore have much more FIDE rated data than ACF. Selecting someone like Wohl purely on his ACF rating at any time would not be a good idea.

While top young players tend to have worse rankings under FIDE than in the ACF system, they still rank better under it than they would under ACF-ELO if it was still being used.

I think this is because upcoming juniors typically enter the FIDE system at a much better playing strength than the ACF system, and therefore their FIDE ratings do not suffer so much drag from their earliest years. Also because ratings at the low end of the Australian FIDE pool have tended to be artificially high as a result (or legacy) of ratings floors, the reasonably high entry point is compensation for the slower pace of change as a player improves.

ChessGuru
18-03-2011, 11:47 PM
Given the fact that players are continually changing strength and coming in and out of the system (how many ACF-rated games does an average player play in their lifetime?), is it even possible to create a "reliable" and "accurate" rating system for our players?

Perhaps we're chasing a wild goose? With the amount of data that is available (i.e. the number of games each player plays and the unreliable opponents they are playing against) can we really achieve the sort of "predictive accuracy" that Glicko2 is able to produce on larger data-sets?

Santa
28-03-2011, 05:48 AM
While they are largely meaningless as an indicator of current playing strength (most likely being accurate only within 300-500 points), ancient ratings that were based on substantial numbers of games are still useful both for historical purposes and to ensure that a player is not wrongly treated as unrated for ratings prize purposes in those events which use "unrated prizes" to encourage players who are more or less newcomers.


This player master file isn't the place for data merely of historical interest. There are printed documents that provide that information, I have the odd rating booklet back to 1986.

Offering unrated prizes is hazardous at the best of times. Numerous times I've seen uni students from o/s picking them up.

Tournament organisers should instead offer novice prizes. A player performing at +1700 clearly is not a novice.



I think this is a silly change unless there is evidence that weighting performance in more "important" events better predicts a player's performance overall.

Exactly.



Yet it is precisely this flaw (and it is a flaw), widespread in well known versions of the ELO system, that is the main reason we have people like Peter Parr coming on here to defend that implementation and ask us to bring it back.



I quoted text in another post from Glickman saying that one should not change the ratings during a ratings period, and that the system works best with lots of games to process.

The ACF's version of Glicko includes many features that mute this problem and this is probably a reason why it is more predictive than what is widely known as ELO. It is very difficult for any rating system to be completely predictive when dealing with fast-improving players without occasional severe predictive overshoots. But just because you can't fix the problem completely doesn't mean you shouldn't try.



Ratings are pretty much unfathomable to the vast majority of players if you use any complex system whether you publicise those details or not.

Without the documentation, there is no transparency and no possibility of anyone finding a problem in the maths.

I was reading an old ratings booklet a while ago, researching before posting, and found where a player had picked up that sometimes one player's results were not being processed (and had never been processed). In the April '95 booklet I reported I'd found the bug, and promised reprocessed results RSN. The reason for the reprocessing was late results from SA.

Santa
28-03-2011, 05:58 AM
That post is rubbish, even Sweeney (to his credit) pointed that out.

The poster is claiming that strong NSW players have a higher ACF rating compared to their FIDE than Victorians do. Of those players for whom figures are visible in the post, the average NSW example has an ACF 68 points higher than their FIDE but the average Victorian example has a FIDE 11 points higher than their ACF. That suggests a 79 point gap.

The problem is that the poster is using too small a sample (though parts of it seem to have been chopped by dodgy presentation) and also they seem to not be selecting their sample systematically.

Doing it properly by comparing the top 30 by ACF rating from each state (of those who have FIDE ratings) it turns out that the average gap drops from the 79 points in Frightened Victorian!'s sample to just 14 points.

Thus the ACF system thinks the top 30 NSW players are better than the top 30 Victorians by 53 points but the FIDE system thinks the NSW players are better by 39.

It's a trivial difference and is more likely to be just a result of the vagaries of the histories of different players in each group than anything systematic, let alone suspicious.

I don't see how comparisons between ACF and FIDE ratings have any significant meaning. For starters, they are calculated differently and one should expect that with any given set of inputs, they would produce different outputs.

Both systems should produce higher relative ratings for stronger players.

Where a comparison is required, then the reporting tool should perform whatever conversion is required, and as the comparison changes, then the conversion formula needs changing.

On no account should ACF ratings be changed just because the gap between ACF and FIDE ratings is changing, that is to be expected given the use of different calculations and the processing of different results: Australian players play in FIDE events that the ACF does not rate, and they also play in Australian events that FIDE doesn't rate.

Santa
28-03-2011, 06:03 AM
Feel free to present your hard evidence based argument from the results from Ballarat to show that ELO did better than Glicko2 at predicting results.

The first requirement for an Elo system to outperform a Glicko system is for it to have Elo ratings input.

Equally, a Glicko system requires Glicko ratings as input.

Unless those conditions are met, there is no possibility of a valid comparison.

Santa
28-03-2011, 06:08 AM
In terms of using each one to predict player scores in subsequent play, "Glicko 2.5" is about 14% less error-prone than the version of ELO that the ACF replaced with Glicko. (The 14% is an average of two error estimate methods being looked at.)



I'm sure someone will find this a funny thing to ask, but: to whom is this high predictive capability important, and for what reason(s)?

I'll warrant that not a lot of chessplayers value it.

Santa
28-03-2011, 06:13 AM
Indeed his rating (having a blank after it) is more reliable than many ? and ??s that appear on the list.

If Alex has played a lot of games overseas and improved his chess, that would not be so. Given his age, it's likely he's not improved a lot, but if someone did the same and was 30 years younger....

Santa
28-03-2011, 06:24 AM
Also because ratings at the low end of the Australian FIDE pool have tended to be artificially high as a result (or legacy) of ratings floors, the reasonably high entry point is compensation for the slower pace of change as a player improves.

I'm not sure that ratings floors are that important. Discarding early losses is, and FIDE is a little harder than that: it requires (I think - I read it an hour or so ago) two wins or the equivalent. Requiring players to actually score ensures their initial rating isn't all that bad; I recall some of my juniors from Waverley entered the ratings system at around 1000.

Kevin Bonham
28-03-2011, 12:09 PM
Without the documentation, there is no transparency and no possibility of anyone finding a problem in the maths.

Actually with the use of software like Barry's Glicko-1 calculator it is possible for a player roughly checking their own rating to notice if there are really gross errors that might have been caused by, for instance, a 1-0 submitted as 0-1. Of course, with more data available checkability could be more precise.
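For reference, the single-period update from Glickman's published Glicko-1 description, which a third-party calculator of this kind can implement, is compact enough to sketch. This is the textbook Glicko-1 algorithm, not the ACF's modified system, and the figures are illustrative:

```python
import math

# Textbook single-period Glicko-1 update, per Glickman's published
# description - the kind of calculation a third-party calculator can
# do for rough checks. NOT the ACF's modified system; data invented.

Q = math.log(10) / 400

def g(rd):
    """Attenuation factor: discounts opponents with uncertain ratings."""
    return 1 / math.sqrt(1 + 3 * (Q ** 2) * (rd ** 2) / (math.pi ** 2))

def glicko1_update(r, rd, games):
    """games: (opponent_rating, opponent_rd, score) triples."""
    gs = [g(ord) for _, ord, _ in games]
    es = [1 / (1 + 10 ** (-gj * (r - orat) / 400))
          for gj, (orat, _, _) in zip(gs, games)]
    d2 = 1 / (Q ** 2 * sum(gj * gj * e * (1 - e) for gj, e in zip(gs, es)))
    denom = 1 / rd ** 2 + 1 / d2
    r_new = r + (Q / denom) * sum(gj * (s - e)
                                  for gj, e, (_, _, s) in zip(gs, es, games))
    return r_new, math.sqrt(1 / denom)

# A 1500 (RD 200) player beats a single 1400 (RD 30) opponent:
r_new, rd_new = glicko1_update(1500, 200, [(1400, 30, 1)])
```

Note how the win also shrinks RD from 200 to about 175: more games in a period mean a more confident (lower-RD) rating.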


I'm sure someone will find this a funny thing to ask, but, to whom is this high predictive capability important, and for what reason(s).

I'll warrant that not a lot of chessplayers value it.

You might think that but there are no end of complaints from adults and parents alike when juniors are rated far too low and as a result are playing 500+ points above rating and remaining underrated for years as a legacy of the overweighting of first results. Fast-improving juniors are a terrible problem for any rating system but ours handles them better than most.

Also, ratings have official purposes for things like setting limits for ratings prizes, conducting selections when there is not enough time to form a selection panel, and granting automatic entry for players of a certain strength to the Aus Champs. For all these things a predictively accurate rating system is important.


If Alex has played a lot of games overseas and improved his chess, that would not be so.

The same unknowns can apply to any player on that level of (un)reliability. They may be playing on the internet or receiving coaching, or they may just be playing sparsely without any actual change in playing strength.

ChessGuru
01-04-2011, 08:58 PM
Just enter the modified Glicko-2 system that the ACF is using into the Kaggle Ratings Comp and you've got independent verification of the improved predictive power of the system.

antichrist
04-04-2011, 10:28 AM
According to Peter Parr's chess column today, FIDE will put out an international list for no extra charge, making Glicko and Tornelo ratings redundant. My comment: all the fighting for nothing.

Kevin Bonham
04-04-2011, 10:37 AM
According to Peter Parr's chess column today, FIDE will put out an international list for no extra charge, making Glicko and Tornelo ratings redundant. My comment: all the fighting for nothing.

It won't make anything redundant if the FIDE system continues to be mathematically junk. By the way as FIDE includes more and more weaker players their current system will become more and more inappropriate.

If they are going to supplant all national rating lists they will need to get a real rating system. Hopefully this is the basis of their interest in the Kaggle comp.

antichrist
04-04-2011, 11:28 AM
It won't make anything redundant if the FIDE system continues to be mathematically junk. By the way as FIDE includes more and more weaker players their current system will become more and more inappropriate.

If they are going to supplant all national rating lists they will need to get a real rating system. Hopefully this is the basis of their interest in the Kaggle comp.
I don't care what system they use, as long as it is universal. A few years back I was considering attending an overseas comp, but the message came back asking whether I had a FIDE rating; well, that ruined that.

Ratings are not just the playthings of the ACF; members have a right to a universally accepted system.

peter_parr
04-04-2011, 12:14 PM
A report on the Kaggle competition exhibition in George St, Sydney at Deloitte last week appears in the Sydney Morning Herald 4th April 2011.

SMH (http://www.chessdiscountsales.com/news/newsindex.htm)

ChessGuru
04-04-2011, 12:42 PM
Can someone please help me with the maths of the Kaggle comp....as I understand it they're trying to find the best "predictor" of future results. The number that is on the "leaderboard" is around 0.25....

Eg.
ELO Benchmark 0.258992
Glicko Benchmark 0.258078
Current leader 0.248604
All Draws 0.301030

What I'd like to know is a reverse engineering thing... in order to improve from .259 to .249 (that is .01) exactly how much better does your predicting need to be?

If you predict all games as drawn then you score 0.301
If you predict a loss, when it was actually a win then you score 0.004

If White actually wins all games, then to score .259 you need to predict white wins 0.55 of the time... to improve to .249 you need to predict white wins 0.563. I dunno what that equates to, perhaps around 7 rating points.

But when you take into account that about half of all games are drawn...it gets a bit more complicated.

Assuming you can predict .55 for all games which actually end up drawn (actual score .5) - to get the .259 score you need to predict the white wins 0.61 of the time.... to improve to .249 you need to predict white wins 0.64. That's around 16 rating points.

But what if your draw predictions are hopeless... let's say you only pick .25 for each draw (correct half the time). Now you need to improve your win-picking from 0.70 to 0.73 -- again around 16 rating points?

Am I right in thinking then that the best improvements we are going to be able to make on ELO are around 7-16 rating points per player? And the Glicko v. ELO benchmark is maybe only 1-2 points?


I'm also interested to know if there is any point to discussing the maths of ratings in the real world. Over the 10-year period FIDE provided results for, only 25% of players had "initial ratings".... I assume that means there is "churn" (players starting and stopping) of about 75% over 10 years.

So given that you may not get sufficient data for most players to make their ratings reliable - is it worth trying to improve on a reasonably efficient (0.259) and simple ELO formula?

In order to win the Kaggle comp you're only improving on ELO by 0.01 - and Glicko only improves on ELO by 0.001. And to do so you are adding up to 10 attributes to each player to help modify their rating by just the right amount...
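For what it's worth, the leaderboard numbers being compared above are mean binomial deviances: a log loss taken with base-10 logarithms, with predictions capped away from 0 and 1. A minimal sketch follows (the 0.01 cap is my assumption about the capping convention; the metric itself can be checked against the fact that predicting 0.5 everywhere scores exactly -log10(0.5) = 0.301030, the "All Draws" benchmark quoted above):

```python
import math

def binomial_deviance(predictions, outcomes, cap=0.01):
    # Mean per-game deviance: -[y*log10(p) + (1-y)*log10(1-p)],
    # with p clamped to [cap, 1-cap] so one confident miss
    # cannot blow up to infinity.
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, cap), 1 - cap)
        total += -(y * math.log10(p) + (1 - y) * math.log10(1 - p))
    return total / len(predictions)

# Predicting every game as a draw reproduces the "All Draws" benchmark.
games = [1, 0, 0.5, 1]  # hypothetical outcomes: win, loss, draw, win
print(binomial_deviance([0.5] * len(games), games))  # -log10(0.5) = 0.301030
```

Note that a capped prediction of 0.99 for a game White actually wins costs -log10(0.99), roughly 0.0044, which may be where the 0.004 figure above comes from.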

Vlad
04-04-2011, 01:21 PM
As usual, once it comes to ratings you have no idea what you are talking about... Let me just enumerate a few clear misunderstandings.
1) The number for Elo is not simply 0.258992. That number is what one could get if he/she optimized Elo, which is why it is called the "Optimized Elo Benchmark". The current FIDE system would get a much worse result. The number for Glicko is 0.258078, and it is not optimized. You can see that even non-optimized Glicko easily outperforms optimized Elo.
2) You can't say that the difference is just 0.01; the absolute value of a difference has no meaning unless it is compared with the base number. So 0.01 out of 0.25 is actually 4%. Do not forget that it is 4% on average for each game. They are using average deviances, which makes the numbers hard to interpret directly. However, when there is a 4% improvement in predictability for each game and a player plays, say, 100 games per year, I would think the difference would be noticeable.

Correction: a better way of thinking about 2) is to use the information you provided. When the rating system has pretty much no predictability (assumes each result is a draw), the index is equal to 0.3. Optimized Elo improves this number by roughly 0.04. The best program does a further 0.01. Now, how bad is the system with no predictability? Well, it assumes that each result is a draw. Let me give you a simple example. I normally score on average 6/7 in each weekender; the k-factor for me is 15. So if I perform according to my rating, the no-predictability system will allocate me an additional 15*2.5=37.5 points; that is just from one tournament. If I play 12 tournaments a year, that gives 37.5*12=450 points. Now, those 450 points are associated with a 0.05 difference in the index (from 0.25 to 0.3). That implies that if the relationship is linear, the further 0.01 difference in the index could mean a rating difference of the order of 100 points; and that is just after one year.
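Vlad's back-of-envelope scaling can be written out directly. The 6/7 average score, the k-factor of 15 and the 12 tournaments a year are his figures; treating the relationship between the deviance index and rating points as linear is his assumption (and the index gap from "all draws" at 0.30 down to optimized Elo at 0.25 is 0.05):

```python
# A no-predictability system expects 0.5 per game, so a player who
# actually scores 6/7 in a weekender with k-factor 15 gains:
k, games, score = 15, 7, 6
points_per_event = k * (score - 0.5 * games)  # 15 * 2.5 = 37.5 points
points_per_year = 12 * points_per_event       # 450 points over 12 events

# Pairing those 450 points with the 0.05 index gap and scaling linearly,
# the further 0.01 gap down to the Kaggle leader is worth about:
print(points_per_year * 0.01 / 0.05)  # ~90, i.e. "of the order of 100 points"
```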

Desmond
04-04-2011, 01:25 PM
If you predict a loss, when it was actually a win then you score 0.004
Look, all this is frankly beyond my maths skills, but I don't think it's as simple as that. See here (http://www.kaggle.com/c/ChessRatings2/Details/Evaluation)
It looks like it's not just a matter of predicting a loss or a win, but a sliding scale of how likely that loss or win is judged to be. And the points you get are determined by the formulas, not just by whether you predicted a loss or not.

antichrist
04-04-2011, 01:28 PM
As usual once it comes to ratings you have no idea what you are talking about... Let me just enumerate a few clear misunderstandings

AC
Did chessguru design tornelo? Does tornelo do ratings? Does the designer have no idea about how ratings work? Come again?

Vlad
04-04-2011, 02:35 PM
If White actually wins all games; to score .259 you need to predict white wins 0.55 of the time...to improve to .249 you need to predict white wins 0.563. I dunno what that equates to, perhaps around 7 rating points.

Replying to AC's post for the first and last time. When somebody writes the above sentence without explaining where the 7 comes from, I believe I have the right to say that they have no idea about ratings. He could just as easily have said 70 points or even 700 points. He does not understand the relationship and is just making numbers up.

Kevin Bonham
04-04-2011, 04:53 PM
I don't care what system they use, as long as it is universal.

Actually the final sentence ("‘One player,one FIDE rating’ for every chess player worldwide would eliminate the need for individual countries to produce their own rating lists. ") was just speculation on Peter's part.

See post 4 at http://www.chesschat.org/showthread.php?t=12006 where this was done to death before.

ChessGuru
04-04-2011, 10:00 PM
Did chessguru design tornelo? Does tornelo do ratings? Does the designer have no idea about how ratings work?
I get the concept of ratings but like I said in my post - I need help with the maths!

Tornelo is a tool for doing pairings and calculating ratings. You can put whatever engine you like into it for ratings or for pairings... the beauty of tornelo isn't in the .249, it's the 'eye candy' as the ACF puts it. But save that discussion for another thread....

I do appreciate Vlad's manner - I can see why he's such a good chess coach. Patient with the kids, explains what he knows in such a nice way. Much appreciated.

I made up the "7 points" from the difference between predicting .55 and .563 -- in the ELO tables .013 would be roughly equal to 7 points.

And I've got to say, that while I don't understand a lot of maths, what you're on about doesn't make much sense either.....did you even take a look at the kaggle site?

Just based on the ACF system and last year's results, my anecdotal observation is that the Glicko-2 that Gletsos is running is a better predictor of results. I'm not intending to argue anything different....

I have spoken to some of the kaggle event entrants who are currently in the top 10 spots and they all tell me that their method of calculation is too complex to be of any 'real world' value. I don't know what that means...but I'm at least asking questions.

ChessGuru
04-04-2011, 10:01 PM
Can someone give us a clear answer. Is the ACF entering the Kaggle event, and if not WHY not?

Kevin Bonham
04-04-2011, 10:25 PM
There's been no discussion at ACF level of entering it. It's up to the Ratings Officers to decide if they feel like doing so privately or not.

I'd be quite interested to see how our version of Glicko2 went at predicting results in the current comp - but it would not be the most meaningful test since the actual aim of the ACF ratings system is to provide accurate ratings for the ACF ratings pool. The ACF ratings pool includes not just the mostly 1200+ strength players used to provide data for the Kaggle comp but also that long tail of low-rated juniors (some of whom later go on to be very highly-rated.) The lower the strengths of players included, the more difficult a predictive task. No system would be scoring anything like .25 if it had to predict our data pool!

ER
04-04-2011, 10:28 PM
KAGGLE not to be confused with CAGLES.
The latter being an acronym of my end of statement greetings of Cheers And Good Luck as formed by Eclectic who added the ES in the end thus making it usable as a verb or noun. In our private correspondence we still use CAGLES as well as other variations ie AU CAGLOIR (Eclectic) or CAGLEREMOS (myself)!

Vlad
04-04-2011, 11:36 PM
I made up the "7 points" from the difference between predicting .55 and .563 -- in the ELO tables .013 would be roughly equal to 7 points.


To me that sounds similar to “It is 6 o’clock on my watch which means the temperature in my house is 6 degrees Celsius”.

BTW, there is Real Glicko, which is currently in 32nd position. Compare that result with the Benchmark ELO, which is #100, not very far from "White always wins" and "White always draws", which are in 103rd and 110th position respectively.

peter_parr
05-04-2011, 11:25 AM
“FIDE General Secretary Ignatius Leong (Singapore) in a recent visit to Sydney advised that FIDE will soon restructure its rating fees with no extra charges for thousands of games. ‘One player, one FIDE rating’ for every chess player worldwide would eliminate the need for individual countries to produce their own rating lists.”

See SMH (http://www.chessdiscountsales.com/news/2011.htm)



No, the last sentence is just speculation on Peter's part. The real idea was for FIDE to expand by rating rapid and blitz. See previous discussion at http://www.chesschat.org/showthread.php?t=12006


Actually the final sentence ("‘One player,one FIDE rating’ for every chess player worldwide would eliminate the need for individual countries to produce their own rating lists. ") was just speculation on Peter's part.

See post 4 at http://www.chesschat.org/showthread.php?t=12006 where this was done to death before.

My last sentence was not speculation on my part, as stated incorrectly by KB.

I was quoting what FIDE General Secretary Ignatius Leong (Singapore) said at Norths (Sydney) in his speech on “One player, One FIDE rating”.
(I was there; KB was not.)

The whole point Leong was making was simply one player, one rating: eliminating the need for individual countries to produce their own rating lists.

This is what Leong said, and this is what I correctly published in the SMH.
Leong spoke at some length about one player, one rating. My report was 100% accurate in the few lines available.

I’m not writing the encyclopedia of Chit-Chat-Chess-Chat.

Btw – it was unfortunate that no ACF official attended the Deloitte chess exhibition last Wednesday in George St, Sydney CBD. I sincerely hope, KB (ACF vice-president), that the ACF writes to Deloitte on its letterhead thanking Deloitte Australia for their $10,000 sponsorship.

Kevin Bonham
05-04-2011, 11:36 AM
Disposed of on other thread.

http://chesschat.org/showpost.php?p=306716&postcount=38