Objectives of rating systems



Kevin Bonham
13-09-2006, 12:19 PM
Rincewind suggested on another thread that agreement about the objectives of a rating system is needed for debate about the system to advance.

This thread is for debate about what should be the fundamental aims of a rating system such as the ACF's. I will be linking back to it to encourage critics of the existing rating system to discuss their views on what a rating system should do, where it seems their criticisms are related to this rather than to how well it does it.

I propose that the main objective of a rating system should be to measure the strength of players in the form of a figure or set of figures that can be used to predict the performance of players as accurately as possible.

More formally, the average difference between actual score and score expected based on rating for those players active in a ratings period should be as low as possible.

This should be subject to many constraints. For starters, I propose the following:

* Practicality.

* Affordability.

* The primary (but not necessarily the only) source of data should be a player's previous performances.

* The system should be logically defensible - eg it should not only be shown to be predictively effective but it should make sense (at least to a suitably qualified mathematician) that it is so.

* Avoidance of discrimination. Players should not be thrown or penalised points on account of age (however, age could be considered in weighting different performances if this is shown to improve predictive capacity).

* The system should not deflate. Gradual inflation should be permitted if it is clearly not due to any known statistical problem with the system and if it is considered to reflect a real improvement in the standard of play across the board.

* The system should be open to subjective expert review to permit corrections to be applied in demonstrated cases of local dysfunction.

* The system should control the risk of serious overshoots and undershoots. A slightly less predictive system may be preferred to a slightly more predictive one if the latter has more severe or more numerous cases of a small number of players being severely overrated or underrated. This constraint applies particularly to top players and strong juniors because of the potential for adverse impact on selections.

Suggestions for further constraints welcome.

Vlad
13-09-2006, 12:41 PM
More formally, the average difference between actual score and score expected based on rating for those players active in a ratings period should be as low as possible.


What do you mean by average? Is it just divided by the number of active players? I suppose it should be weighted by the number of games played. For players who play a lot, deviations will be significant; for players who play only a few games, deviations will be relatively small.

It probably does not matter in any case, as Glicko will outperform the others anyway; but just for consistency.

Kevin Bonham
13-09-2006, 12:58 PM
What do you mean by average? Is it just divided by the number of active players? I suppose it should be weighted by the number of games played. For players who play a lot, deviations will be significant; for players who play only a few games, deviations will be relatively small.

Good question. I think this is open for offers as to whether it should be

* mean error per player
* mean error per player per game

(I could add * mean error per game, but I think it's unlikely a system could perform especially well overall while having an unusually high error rate on particular games.)
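To make the two candidates concrete, here is a minimal sketch in Python, assuming the standard logistic expected-score curve used by Elo/Glicko-style systems; the function names and data layout are illustrative only, not any federation's actual code.

from collections import defaultdict

def expected_score(rating, opp_rating):
    # Standard logistic expected score for a single game.
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def error_metrics(games):
    # games: iterable of (player, rating, opp_rating, actual_score),
    # one entry per game from that player's point of view.
    err_sum = defaultdict(float)   # total |actual - expected| per player
    n_games = defaultdict(int)     # number of games per player
    for player, rating, opp_rating, score in games:
        err_sum[player] += abs(score - expected_score(rating, opp_rating))
        n_games[player] += 1
    players = list(err_sum)
    # Mean error per player: each player's total error, averaged over players.
    per_player = sum(err_sum[p] for p in players) / len(players)
    # Mean error per player per game: normalise by game count first.
    per_player_per_game = sum(err_sum[p] / n_games[p] for p in players) / len(players)
    return per_player, per_player_per_game

The two metrics differ only in whether a player's errors are divided by their game count before averaging, which is exactly the weighting question Vlad raises.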

Oepty
13-09-2006, 06:13 PM
* The system should not deflate. Gradual inflation should be permitted if it is clearly not due to any known statistical problem with the system and if it is considered to reflect a real improvement in the standard of play across the board.


Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
Scott

pax
13-09-2006, 06:22 PM
Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
Scott

Because our means of testing is strictly within the system (games between players), it is impossible to determine if the overall pool is increasing or decreasing in strength. Ratings only have a relative relevance.

Of course, this means that it doesn't actually matter to predictability if the system deflates. However deflation is not desirable, since it is also useful to be able to compare ratings from different time periods.

Garvinator
13-09-2006, 06:27 PM
Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
Scott
I would regard deflation/inflation as occurring where the standard of play inside the pool of players stays the same, but the ratings as a whole have decreased/increased.

Therefore, deflation or inflation are highly undesirable, but if one has to be chosen as a side effect, then inflation would be slightly 'better', as people like to see their ratings going up.

Oepty
13-09-2006, 06:32 PM
Because our means of testing is strictly within the system (games between players), it is impossible to determine if the overall pool is increasing or decreasing in strength. Ratings only have a relative relevance.

Well, that is sort of what I was trying to get at with my second question to Kevin.

Of course, this means that it doesn't actually matter to predictability if the system deflates. However deflation is not desirable, since it is also useful to be able to compare ratings from different time periods.

Well then inflation would be just as bad. I took Kevin's comments as saying inflation is somehow fundamentally better than deflation and I was trying to find out whether there was any reason for that.
Scott

Oepty
13-09-2006, 06:36 PM
I would regard deflation/inflation as occurring where the standard of play inside the pool of players stays the same, but the ratings as a whole have decreased/increased.

Therefore, deflation or inflation are highly undesirable, but if one has to be chosen as a side effect, then inflation would be slightly 'better', as people like to see their ratings going up.

How do we measure whether the standard of play has gone up or down? If there was a perfect or close to perfect way of doing this then surely we could have a rating system that just looks at the strength of play of every game of a player and gives them a rating based on that. Impossible though in reality.
Scott

Rincewind
13-09-2006, 06:47 PM
I think the whole inflation/deflation issue is a non-problem if the system works and is self-consistent. It is desirable to compare ratings over time, so that a player who hasn't played for 2 years still has a reasonable rating, so large-scale inflation and deflation should be avoided where possible. However, I think this might not be a problem with the ratings system but could be systemic with the behaviour of players. For example, a large number of players rapidly increasing in strength, generally deflating the pool of stable players and then ceasing competition chess is not a problem with the rating system but with the environment.

Kevin Bonham
13-09-2006, 07:18 PM
Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
Scott

In a sufficiently large chess playing area, like a nation, real deflation shouldn't happen. You might get a decline in the average rating because there are fewer strong players or more weak players about at one time than another, but that's not real deflation. Real deflation is, for instance, when stably rated adults in the 35-45 age bracket are losing points when there is no reason to believe their play is getting better or worse.

The standard of chess play should gradually improve globally as more and more lines are studied and more and more conclusions reached about particular lines (and ditto to a degree for endgames). There is no mechanism by which it should decline.

I'll agree just to be really precise that if, for instance, a virus from Mars caused all chessplayers to blunder pieces every five moves, then deflation should be possible. However there is no normal circumstance under which it should occur - so if you have significant average points losses among adults who should be close to stable in playing strength, you have a problem with your system.


How do we measure whether the standard of play has gone up or down? If there was a perfect or close to perfect way of doing this then surely we could have a rating system that just looks at the strength of play of every game of a player and gives them a rating based on that. Impossible though in reality.

You could measure it to a degree by looking at their games. This is not a practical way to run a rating system, but it is a practical way to try to calculate whether drift in ratings over a very long time is reasonable.

John Nunn did an interesting exercise involving putting games of leading 19th century players through computers and looking at the frequency with which severe tactical errors occurred. He found that in the games of the leading 19th century players errors were far more common than they are now. On that basis today's leading players are probably better than the leading players of times past.

Another reason is the growth in theory. Any GM of today would wipe the floor with a Philidor or Morphy through better understanding of theory alone. That makes them objectively better players, although if Philidor or Morphy had access to the same resources over the same period of time they could well triumph through superior raw talent.

So what I'm saying is that there are cases in which inflation may represent some real improvement in playing strength, whereas true deflation almost certainly means there is something wrong with the system.

Kevin Bonham
13-09-2006, 07:25 PM
I think the whole inflation/deflation issue is a non-problem if the system works and is self-consistent. It is desirable to compare ratings over time, so that a player who hasn't played for 2 years still has a reasonable rating, so large-scale inflation and deflation should be avoided where possible. However, I think this might not be a problem with the ratings system but could be systemic with the behaviour of players. For example, a large number of players rapidly increasing in strength, generally deflating the pool of stable players and then ceasing competition chess is not a problem with the rating system but with the environment.

Agreed. On this basis, I'll change


* The system should not deflate.

to


* The system should not deflate under normal circumstances. If it deflates under exceptional circumstances, such as unusual player demographics, corrections should be applied.

I think one reason deflation should be avoided (or corrected) is that, since the rating scale is merely relative, if the system deflates then points can be added across the board without affecting the relative standings of players. This has the added advantage of preventing widespread points losses and the undue discouragement they cause.
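As a tiny illustration of why a relative scale makes such corrections harmless: under the standard expected-score formula only rating differences matter, so adding the same constant to every rating changes no prediction. The correction value below is invented for the example.

def expected_score(rating, opp_rating):
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

c = 50  # hypothetical across-the-board anti-deflation correction
# The prediction is unchanged because only the difference enters the formula.
assert expected_score(1800, 1650) == expected_score(1800 + c, 1650 + c)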

Denis_Jessop
13-09-2006, 09:32 PM
John Nunn did an interesting exercise involving putting games of leading 19th century players through computers and looking at the frequency with which severe tactical errors occurred. He found that in the games of the leading 19th century players errors were far more common than they are now. On that basis today's leading players are probably better than the leading players of times past.

Another reason is the growth in theory. Any GM of today would wipe the floor with a Philidor or Morphy through better understanding of theory alone. That makes them objectively better players, although if Philidor or Morphy had access to the same resources over the same period of time they could well triumph through superior raw talent.

I can't let this one pass unchallenged, especially as it's nice and off topic.

John Nunn was not the only person to do an exercise of that kind. Apparently Prof. Elo himself did one, as did Sir Richard Clarke, an English ratings expert, so it is said. Others have also had a go. The whole issue is dealt with at length by the inimitable Fox and James in The Even More (Inimitable) Complete Chess Addict.

What is interesting here is not so much the lists of strongest players as the opinions of recent strong players of their predecessors. So Larsen (1967) nominated Philidor as the greatest of all time. Fischer (1964) said that Morphy would "beat anyone alive today" and Euwe and Gligoric also nominated Morphy. Lasker, Alekhine, Botvinnik and Spassky all opted for Capablanca.

One point is that in assessing players' comparative worth, you cannot properly pit Morphy with 19th C knowledge against Kasparov with 21st C knowledge as it's quite an unfair comparison. My own guess is that the very best of old-time players were intrinsically as strong as the best modern players but that the overall standard has improved. That, I'm sure, is the case in most sports and in classical music performance (except singing) as well.

Another factor is the different tournament atmosphere these days. In another part of their work F & J have a section entitled Drink Like a Grandmaster in which, among other things, they note that James Mason, a leading player of his day, was claimed frequently to have lost games in a "hilarious condition" and that in the London Tournament of 1899 he was discovered asleep in a fireplace. They don't mention whether the fire was lit at the time or whose move it was so please don't ask. :lol: :doh: :hmm:

DJ

Kevin Bonham
14-09-2006, 12:39 PM
John Nunn was not the only person to do an exercise of that kind. Apparently Prof. Elo himself did one, as did Sir Richard Clarke, an English ratings expert, so it is said.

I am not sure whether Elo and Clarke made any attempt to assess changes in the quality of play or whether they were just trying to provide a comparative ratings process based on game results between players over time. (In the latter light, Sonas' Chessmetrics site is by far the most thorough attempt, but it assumes parity in strength between all the eras mentioned.)


What is interesting here is not so much the lists of strongest players as the opinions of recent strong players of their predecessors. So Larsen (1967) nominated Philidor as the greatest of all time. Fischer (1964) said that Morphy would "beat anyone alive today" and Euwe and Gligoric also nominated Morphy. Lasker, Alekhine, Botvinnik and Spassky all opted for Capablanca.

I have seen these nominations in the original edition of that book (I haven't read any of the later editions alas). I suspect that great players tend to factor in a correction for theory in making such a comment.


One point is that in assessing players' comparative worth, you cannot properly pit Morphy with 19th C knowledge against Kasparov with 21st C knowledge as it's quite an unfair comparison.

Nonetheless if you put Paul Morphy at his peak in a time machine and sit him down for a match with Kasparov today, it is a fair reflection of how he would perform. The role of a rating system is to say "how good is a player at a specific time?" not "how good would this player be if exposed to the theory of 150 years in the future at the same moment?" Otherwise you would be judging Kramnik according to how good he would be if NCO volume 23 from 2150 AD fell into a timewarp and landed in his lap before his next match.

To get this back to objectives of ratings systems, suppose a ratings system applied retrospectively says that the strongest players of the 19th century were (say) 2400-2500 strength at their best while the strongest players now are 2800 strength. What I am saying with my comment about inflation is that that is not automatically a reason to conclude the system is faulty. It may be that it is telling us something about a gradual improvement in play standards that is actually real. But if it said the top players of the 19th century were 3200 strength we would have to be sceptical about that.

That is all I was getting at in saying that gradual inflation does not necessarily prove a fault with a rating system.

The comment about tournament standards is interesting. For instance, until quite recently smoking was permitted in tournaments. For all we know this may have affected the performance of almost everybody! A rating system probably can't handle these sorts of issues and just has to assume that playing conditions from time to time remain the same - unless it takes time control (for instance and where known) into account.

I'm interested that the inflation/deflation part of my initial post has created a lot of debate (and it is a point that takes a fair amount of "unpacking") but the rest has not. Does anyone have any suggestions or points of difference with the rest?

Oepty
14-09-2006, 01:36 PM
In a sufficiently large chess playing area, like a nation, real deflation shouldn't happen. You might get a decline in the average rating because there are fewer strong players or more weak players about at one time than another, but that's not real deflation. Real deflation is, for instance, when stably rated adults in the 35-45 age bracket are losing points when there is no reason to believe their play is getting better or worse.

The standard of chess play should gradually improve globally as more and more lines are studied and more and more conclusions reached about particular lines (and ditto to a degree for endgames). There is no mechanism by which it should decline.

I'll agree just to be really precise that if, for instance, a virus from Mars caused all chessplayers to blunder pieces every five moves, then deflation should be possible. However there is no normal circumstance under which it should occur - so if you have significant average points losses among adults who should be close to stable in playing strength, you have a problem with your system.



You could measure it to a degree by looking at their games. This is not a practical way to run a rating system, but it is a practical way to try to calculate whether drift in ratings over a very long time is reasonable.

John Nunn did an interesting exercise involving putting games of leading 19th century players through computers and looking at the frequency with which severe tactical errors occurred. He found that in the games of the leading 19th century players errors were far more common than they are now. On that basis today's leading players are probably better than the leading players of times past.

Another reason is the growth in theory. Any GM of today would wipe the floor with a Philidor or Morphy through better understanding of theory alone. That makes them objectively better players, although if Philidor or Morphy had access to the same resources over the same period of time they could well triumph through superior raw talent.

So what I'm saying is that there are cases in which inflation may represent some real improvement in playing strength, whereas true deflation almost certainly means there is something wrong with the system.

Excellent answers Kevin. Thank you.

Oepty
14-09-2006, 01:39 PM
Kevin, I have a number of questions; I just started off with the one I did in a sort of random way.



* The primary (but not necessarily the only) source of data should be a player's previous performances.


What other sources of data did you have in mind? I am struggling to think of any other source.
Scott

Vlad
14-09-2006, 01:53 PM
As Kevin has already mentioned somewhere before, a player's age could be a very important piece of information. I am sure that if one ran a basic regression, it would be highly significant.

In general, it could be pretty much anything.
a) Do you smoke? Yes or No.
b) Do you work full time? Yes or No.
c) Have you recently got married or divorced? Yes or No.
d) Do you have a coach? Yes or No.

Kevin Bonham
14-09-2006, 01:54 PM
What other sources of data did you have in mind? I am struggling to think of any other source.

Possibly the player's age. I suggested in my opening post that while throwing points based on age (or silly things like "if you play a junior you can't lose ratings points") is not justified, it might be reasonable to consider age in determining what rate a player's rating changes at, if you have the data and if you can prove it makes the system better.

Can't think of any others offhand.
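As a purely hypothetical sketch of what "considering age in determining what rate a player's rating changes at" might look like, here is an Elo-style update with an age-dependent K-factor; the age bands and K values are invented and, as noted above, would need to be justified empirically before use.

def k_factor(age):
    # Invented bands: juniors' strength tends to move fastest, so their
    # ratings are allowed to change more per game.
    if age < 18:
        return 40
    if age < 60:
        return 20
    return 25

def update(rating, opp_rating, score, age):
    expected = 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))
    return rating + k_factor(age) * (score - expected)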

Garvinator
14-09-2006, 02:53 PM
* Affordability.
I will give this a go. I would say that the rating system must be affordable to those it is being marketed to. There is very little use having a whizz-bang, perfect rating system if nobody can afford to use it.

Furthermore, if not many organisers can afford to send tournaments in to be rated, then that is less information being fed into the system, reducing 'predictive accuracy'.

But then also on the flip side, if a rating system is too cheap, or dare I say FREE :uhoh:, then it could be viewed by some as having no meaningful value.

I understand that FIDE rating fees are more expensive than the most expensive admin fees for rating in Australia (not sure if true); maybe someone can help with the FIDE costs by posting them as a comparison.

Kevin Bonham
14-09-2006, 03:44 PM
My meaning when I wrote "affordability" was that it should be affordable for the organisation running it - no point having a rating system if you need to buy out the Pentagon's entire IT network to run it.

However I'll add something else there. It should be affordable for the players too. Players should be able to get a rating cheaply. No good having a rating system which is theoretically perfect if the players have to pay $5 a game to get rated and therefore don't use it.

From this I'll add another constraint:

* Inclusivity. The ideal system should rate as many players as it is capable of providing reasonably accurate ratings for. A player should be ratable by the system after a reasonably small number of games and the requirements for an event to be rated should not be too restrictive. Also a system should be able to rate players of a very wide spread of tournament-playing abilities.

Oepty
14-09-2006, 05:00 PM
Possibly the player's age. I suggested in my opening post that while throwing points based on age (or silly things like "if you play a junior you can't lose ratings points") is not justified, it might be reasonable to consider age in determining what rate a player's rating changes at, if you have the data and if you can prove it makes the system better.

Can't think of any others offhand.

Thank you Kevin. Another question: I have a vague idea of what is meant by overshoots and undershoots; can you please define these terms?
Scott

Kevin Bonham
14-09-2006, 07:07 PM
Thank you Kevin. Another question: I have a vague idea of what is meant by overshoots and undershoots; can you please define these terms?

In this context it means that a player performs spectacularly well or badly over a small number of games and gets a huge rating change as a result. A very dynamic system might assume this is because their playing strength has changed and give them an extremely good or bad rating as a result when actually they haven't got that much better or worse - it was just a patch of unusual form or "luck".

A different context in which the same term is sometimes used is where a player gets a rating that is higher than or lower than both of their previous rating and their performance rating for the period. I am not suggesting that a good rating system must always avoid undershoots/overshoots of this kind, but they're not something you would usually expect to see.
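As a rough illustration of the first kind of overshoot, here is a sketch using plain Elo updates, with the K-factor standing in for a system's dynamism; the ratings and streak are invented.

def expected_score(rating, opp_rating):
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))

def run_streak(rating, k, scores, opp_rating=1500):
    # Apply one Elo update per game; score is 1, 0.5 or 0.
    for score in scores:
        rating += k * (score - expected_score(rating, opp_rating))
    return rating

lucky_streak = [1, 1, 1, 1, 1]  # 5/5 against equal-rated opposition
print(run_streak(1500, 15, lucky_streak))  # sedate system: modest gain
print(run_streak(1500, 60, lucky_streak))  # dynamic system: much larger jump

If the 5/5 was luck rather than genuine improvement, the dynamic system has overshot and will have to claw the points back in later periods.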

Oepty
15-09-2006, 10:51 AM
In this context it means that a player performs spectacularly well or badly over a small number of games and gets a huge rating change as a result. A very dynamic system might assume this is because their playing strength has changed and give them an extremely good or bad rating as a result when actually they haven't got that much better or worse - it was just a patch of unusual form or "luck".


Kevin. This would also be an issue to consider when looking at the ratings of players who are returning to competitive chess and might take a tournament or two to get back to their best. Having said that, I think the Glicko system has actually handled the return of players such as this quite well in SA. Exceptionally well in the case of Sykes, as I mentioned in the other thread.
Scott

Kevin Bonham
15-09-2006, 11:50 AM
Kevin. This would also be an issue to consider when looking at the ratings of players who are returning to competitive chess and might take a tournament or two to get back to their best. Having said that, I think the Glicko system has actually handled the return of players such as this quite well in SA. Exceptionally well in the case of Sykes, as I mentioned in the other thread.
Scott

I wasn't really thinking of the inactive player issue when I wrote that, more of cases of spectacularly good/bad performances by very active players and how much impact these should have. But yes, overshoots/undershoots can occur with inactive players too. However in their case since their new rating is highly likely to be inaccurate whatever it is, I don't think overshoots/undershoots (if they happen) are as much of a problem.

How inactive players relate to the objectives of a system is an important issue. The accuracy of their ratings generally won't be tested in a given ratings period since very few of them will return and probably not for many games. I think it's desirable to estimate their playing strength as accurately as possible anyway, if only to discourage players with excessive inactive ratings from sitting on them in fear of likely points loss on return. I have no objection to penalties for inactivity if they are proven to improve predictiveness for those inactive players who do return.

Cat
15-09-2006, 11:51 PM
For example, a large number of players rapidly increasing in strength, generally deflating the pool of stable players and then ceasing competition chess is not a problem with the rating system but with the environment.

It's time for your tablets my son!

Cat
15-09-2006, 11:58 PM
Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
Scott

Deflation is important because it occurs unevenly across the pool. Some regions deflate more quickly than others, which leads to regional variation. It's analogous to the way in which regional accents develop. English is spoken throughout the UK, but a Glaswegian and a Scouser could never communicate.

Rincewind
16-09-2006, 12:16 AM
It's time for your tablets my son!

I'm glad you're not medicating me.

Cat
17-09-2006, 11:05 PM
I'm glad you're not medicating me.


More's the pity!

Rincewind
17-09-2006, 11:12 PM
More's the pity!

Hardly. I'd have more confidence seeing Dr Patel.

bergil
17-09-2006, 11:22 PM
Hardly. I'd have more confidence seeing Dr Patel.
ROTFL :owned:

Santa
04-01-2010, 03:41 PM
This is one of the most stupid comments anyone could make when it comes to rating systems and just shows your complete lack of understanding.


I disagree with you Bill.

A ratings system cannot accurately predict the outcome of a match or a tournament - if it could, there would be little point in playing.

No ratings system could have predicted that Trevor Tao would come equal second (I think with Goldenberg) on his first attempt; he was granted discretionary entry as an improving junior who, it was felt, would be competitive.

One can, and should, compare what a ratings system predicts will happen with what does happen, and examine and understand the differences between predictions and reality. In any sport, players hit form and lose form, and sometimes a result reflects on one player but not the other. For example, any player who loses by blundering a piece. Gary Lane's book, Sharpen Your Chess Tactics in 7 Days, has quite a few grandmasterly oversights.

As in any statistical model, a ratings system can only predict trends over relatively large numbers.

You can take a group of players, say all those in the Championship and Reserves, and say with some degree of certainty, "Of all Australian chessplayers available to play now, one of these would win," and you'd name a few players. If you chose to name 30 players, few if any would be in the Reserves, and looking at those actually in the Championship you might be happy to nominate half as having no chance. If you only considered their ratings, the chances you are right would be really excellent. You might also be pretty confident that it will be one of these five....

The predictive capacity of a ratings system, such as it is, is good for determining eligibility for a championship tournament, but the ACF is wise to allow a few special cases to cater for rating inaccuracies.

I think most people accept that as fair.

Ratings between players can only validly be compared if they play substantially the same field, and that's why comparing Cordless Ratings with USCF ratings with FIDE ratings with ACF ratings is not valid. Heck, in the early 90s there were significant differences between the ratings of Waverley and Dandenong players.

Kevin Bonham
04-01-2010, 04:54 PM
I disagree with you Bill.

A ratings system cannot accurately predict the outcome of a match or a tournament - if it could, there would be little point in playing.

Without getting into Bill's evaluation of CG's comments (I often don't comment on CG-related matters because of conflict of interest) I am not sure you are representing Bill's views correctly.

He isn't saying, and has never said, that an aim of a rating system is to predict tournament results perfectly, and indeed there is no reason to believe it would be possible for any feasible rating system to do this.

What he is saying is that the better a rating system is the more accurate it should be at predicting results in the next ratings period on average.

Every ratings system will have its hits and misses but some ratings systems are demonstrably better than others over a sufficiently large mass of data.

Bill's view is that being a good system in this regard is an important objective. I don't see how your comment:


As in any statistical model, a ratings system can only predict trends over relatively large numbers.

is in any way inconsistent with that, so I'm not sure what you are actually disagreeing with him about.

Bill Gletsos
04-01-2010, 05:46 PM
I disagree with you Bill.

Disagree all you like, that just makes you wrong.

Cordover said:

The predictive power of a rating system is one of the less important goals of a ratings system.

I stand by my statement that this is one of the most stupid comments anyone could make when it comes to rating systems and just shows his complete lack of understanding.

A ratings system cannot accurately predict the outcome of a match or a tournament - if it could, there would be little point in playing.

I never said it could.

ChessGuru
04-01-2010, 07:59 PM
Disagree all you like, that just makes you wrong.

Cordover said:
I stand by my statement that this is one of the most stupid comments anyone could make when it comes to rating systems and just shows his complete lack of understanding.
I never said it could.

I am not suggesting it should be an IN-accurate system. But sacrificing aspects such as simplicity, transparency, speed, efficiency etc. in order to have a more 'accurate' system is (IMO) the wrong path to take.

If I had to answer the question: "Why should we create a rating system?"

My answer wouldn't be "to accurately determine the difference between 2149 and 2150th best player in the country".... or "to be able to predict to 3 decimal places the statistically likely results of a given tournament."

It would be more like "To encourage people to play more chess by providing feedback" or "To encourage rivalry and friendly competitiveness"....

You've already admitted that ratings cannot accurately predict who will win a match.... so if they aren't trusted to predict outcomes, they aren't used for selections (e.g. Olympiad).... We all know they aren't perfectly accurate.

My argument is based on the fact that there is finite time and energy being spent to improve the ratings system. In this case the priorities should be somewhat different from yours....

Bill Gletsos
04-01-2010, 08:13 PM
I am not suggesting it should be an IN-accurate system. But sacrificing aspects such as simplicity, transparency, speed, efficiency etc. in order to have a more 'accurate' system is (IMO) the wrong path to take.

You are changing your tune because your previous statements just made you look totally clueless and stupid.
You said:

I just think mine looks prettier (and not needing ratings officers is a bonus). Which IMO is really what a ratings system is about -- players want to look at their rating. If it looks nicer then it's a better system.

If I had to answer the question: "Why should we create a rating system?"

My answer wouldn't be "to accurately determine the difference between 2149 and 2150th best player in the country".... or "to be able to predict to 3 decimal places the statistically likely results of a given tournament."

It would be more like "To encourage people to play more chess by providing feedback" or "To encourage rivalry and friendly competitiveness"....

The aim of a rating system isn't to stroke people's egos; it is to show the relative playing strengths of chess players.
To this end the primary aim of a rating system is to be as predictively accurate as possible.

You've already admitted that ratings cannot accurately predict who will win a match.... so if they aren't trusted to predict outcomes, they aren't used for selections (e.g. Olympiad).... We all know they aren't perfectly accurate.

Just because you cannot determine those things with 100% accuracy does not mean you should not strive to be as accurate as possible.

My argument is based on the fact that there is finite time and energy being spent to improve the ratings system. In this case the priorities should be somewhat different from yours....

Whether the system is Elo or Glicko, the priority should be predictive accuracy.
The facts are simple: Glicko2 is more predictively accurate than Glicko, which is more predictively accurate than Elo.

Get back to me when you finally get a clue. :whistle:

Santa
04-01-2010, 11:46 PM
Disagree all you like, that just makes you wrong.

Cordover said:
I stand by my statement that this is one of the most stupid comments anyone could make when it comes to rating systems and just shows his complete lack of understanding.
I never said it could.

The predictive capacity of the ratings system is a test of it, not a function of it. David's a user, not a theoretician. He wants numbers, and he wants them fast.

However good the maths are, calculations based on data potentially more than three months old, typically half that, are not going to be accurate.

Bill Gletsos
05-01-2010, 12:42 AM
The predictive capacity of the ratings system is a test of it, not a function of it.

The point is that the better the predictive accuracy of a rating system, the better the ratings produced by that system compared to systems with less predictive accuracy.

As such, the primary aim of a rating system should be to be as predictively accurate as possible.

David's a user, not a theoretician. He wants numbers, and he wants them fast.

Actually he said he wanted them to look pretty.

However good the maths are, calculations based on data potentially more than three months old, typically half that, are not going to be accurate.

Clearly not an opinion held by numerous national chess federations such as the ECF (calculated annually) or the NZCF (4-month periods), to name just a couple. Until recently even FIDE had 3-monthly rating periods.
For the majority of players with fairly steady playing strengths your claim is not true, and for those whose rating is improving, that can be factored into the calculations.

Kevin Bonham
05-01-2010, 01:58 AM
However good the maths are, calculations based on data potentially more than three months old, typically half that, are not going to be accurate.

An average data age of a month and a half is a relative side-issue when you're running a junior ratings system where many of the players only play a few events a year, if that, and hence existing ratings can be way out of date no matter whether they're processed three months after a player played their games or three minutes after.

I'm not convinced that as a general principle processing games every few months rather than game-by-game updating leads to a loss in predictiveness. You do get lag when a player is improving, especially rapidly, but in a game-by-game system where you are chucking out the context of surrounding performances on at least one side (which you can get in Glicko or Elo systems using intermediate ratings) you are going to have much the same thing. Maybe someone can come up with a predictively superior system that updates game by game instantly but I'll believe that when I see it.

Anyway that's not especially relevant to the debate about rating objectives as they relate to CK vs Glicko-2.

Bearing in mind that I have a conflict of interest since I run tournaments for CG, I actually think that an immediate-update system is the right choice for the body of players he currently rates. These are mostly weak to midrange juniors, many of whom play seldom, and often there is a high proportion of previously unrated players in an event.

Rating this sort of player pool using a Rolls-Royce rating system with a turnaround of a few months wouldn't be worth it. They like the immediacy of getting their ratings right away and using a system with a long turnaround isn't going to make the system much more accurate anyway, given the notorious problems with rating that kind of player pool correctly in any rating system, however good. Furthermore, there generally aren't any sheep stations riding on the accuracy of the system.

That said there are ways in which the CK system could be made more accurate without much compromising simplicity or transparency, especially for the very top players where the ratings are very overcompressed - I can send David some of these sometime, or post them here if anyone's interested.

However, there are some organisations that legitimately and genuinely don't want an inaccurate fast-update system because they actually need their system to be quite predictively reliable (not perfect because that's impossible, but as good as reasonably possible). While ACF selections aren't always solely on a ratings basis (they are when there is no time for the selection-panel method) some selectors do rely heavily on ratings - not just the rating of the player but also their opponents played in the recent past - in making their decisions. Ratings are also used to decide the cutoffs for automatic entry to the Aus Champs and are critical in seeding tournaments as effectively as possible.

These things are all compromised if a more basic and less accurate system is used. I know that if the ACF went back to a low-dynamism version of Elo then I would find it very much more difficult to do the elite junior squad selections I do now, for example, and I think selections would be a lot more contentious with a less accurate rating system. (That said, there are some selectors who actually don't like Glicko.) Basically, the more inaccurate the rating system, the more you need selectors who have direct personal experience of the play of each of the players involved, and lots of it - and such potential selectors are not easy to come by, and often have conflicts of interest.

If someone can show that there is a specific instant-update system that is more or less as predictively accurate as G-2 that would be worth the ACF checking out.

Spiny Norman
05-01-2010, 05:23 AM
If there was a system where:
-- organisers could upload tournament results; and
-- state ratings officers could flag them as "provisionally approved"

then it would be feasible for a web-based system to provide players with a "provisional rating" (not an "official rating") based on the latest results. This might be good for juniors, or frankly for anyone like me who is slightly rating-obsessed. ;)

However this would be a lot of work for very little benefit. If someone wanted to do the work for free and assist the ACF in integrating the system with the work practices of the state ratings officers, I wouldn't complain; I would welcome it. But I think it would be a rather poor use of money and volunteer resources when there are so many other things that could be done.

Patrick Byrom
06-01-2010, 05:18 AM
Originally Posted by The Snail King
If there was a system where:
-- organisers could upload tournament results; and
-- state ratings officers could flag them as "provisionally approved"

then it would be feasible for a web-based system to provide players with a "provisional rating" (not an "official rating") based on the latest results. This might be good for juniors, or frankly for anyone like me who is slightly rating-obsessed.

However this would be a lot of work for very little benefit. If someone wanted to do the work for free and assist the ACF in integrating the system with the work practices of the state ratings officers, I wouldn't complain; I would welcome it. But I think it would be a rather poor use of money and volunteer resources when there are so many other things that could be done.

It is only a lot of work if you have to start from scratch. I'd already written a program to calculate ratings for the Qld Junior Rating List (which uses a modified Elo system), so incorporating the Glicko1 formulas (from Barry Cox's website) is fairly easy.

I also have the benefit of having all the SP files for Qld conveniently available (since I'm the Qld Rating Officer). I don't have any tournaments for this period, but I've processed the events from the last period, and the estimated rating changes for each event are available here:
www.southsidejuniorchessclub.org/Sept-Dec%20_09_Unofficial_ACF_Rating_Changes.htm.

You can also download a 'Masterfile' with the accumulated changes:
www.southsidejuniorchessclub.org/NewMasterlist.txt.

Obviously these results are only of historical interest, but it should be possible to provide estimated Qld rating changes approximately every month.

Of course, these results are no substitute for the more accurate official ratings.
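For anyone who wants to experiment along the same lines, here is a minimal sketch of the core Glicko-1 period update, following Glickman's published formulas (the same ones summarised on Barry Cox's site). It is an illustration only, not Patrick's actual program, and it omits the pre-period RD inflation step.

import math

Q = math.log(10) / 400  # Glicko scale constant

def g(rd):
    # Attenuates an opponent's influence by their rating uncertainty (RD).
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd / math.pi) ** 2)

def expected(r, r_j, rd_j):
    return 1.0 / (1.0 + 10 ** (-g(rd_j) * (r - r_j) / 400.0))

def glicko1_update(r, rd, results):
    # results: list of (opp_rating, opp_rd, score) for one rating period.
    d2_inv = Q * Q * sum(
        g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
        for r_j, rd_j, _ in results)
    denom = 1.0 / rd ** 2 + d2_inv
    new_r = r + (Q / denom) * sum(
        g(rd_j) * (s - expected(r, r_j, rd_j)) for r_j, rd_j, s in results)
    new_rd = math.sqrt(1.0 / denom)
    return new_r, new_rd

# e.g. a 1500 (RD 200) player who beats a 1400 (RD 30) opponent:
print(glicko1_update(1500, 200, [(1400, 30, 1)]))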