  1. #1
    Monster of the deep Kevin Bonham's Avatar
    Join Date
    Jan 2004
    Posts
    39,310

    Objectives of rating systems

    Rincewind suggested on another thread that agreement about the objectives of a rating system is needed for debate about the system to advance.

    This thread is for debate about what the fundamental aims of a rating system such as the ACF's should be. I will be linking back to it to encourage critics of the existing rating system to discuss their views on what a rating system should do, where it seems their criticisms relate to this rather than to how well it does it.

    I propose that the main objective of a rating system should be to measure the strength of players in the form of a figure or set of figures that can be used to predict the performance of players as accurately as possible.

    More formally, the average difference between actual score and score expected based on rating for those players active in a ratings period should be as low as possible.
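A minimal sketch of this objective, assuming the standard Elo logistic curve for the expected score (the thread does not specify a formula, and the ACF system may compute expectations differently). The player records and the names `expected_score` and `mean_abs_error` are hypothetical.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score for one game under the Elo logistic curve (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def mean_abs_error(players: list[dict]) -> float:
    """Average over active players of |actual score - expected score|."""
    errors = []
    for p in players:
        expected = sum(expected_score(p["rating"], opp) for opp in p["opponents"])
        errors.append(abs(p["score"] - expected))
    return sum(errors) / len(errors)


# Hypothetical ratings period with two active players.
period = [
    {"rating": 1600, "opponents": [1600, 1600], "score": 1.5},
    {"rating": 2000, "opponents": [1600], "score": 1.0},
]
print(round(mean_abs_error(period), 3))  # 0.295
```

Under this reading, a better rating system is one that yields a lower value of `mean_abs_error` on the same set of results.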

    This should be subject to many constraints. For starters, I propose the following:

    * Practicality.

    * Affordability.

    * The primary (but not necessarily the only) source of data should be a player's previous performances.

    * The system should be logically defensible - eg it should not only be shown to be predictively effective but it should make sense (at least to a suitably qualified mathematician) that it is so.

    * Avoidance of discrimination. Players should not be awarded or penalised points on account of age (however, age could be considered in weighting different performances if this is shown to improve predictive capacity).

    * The system should not deflate. Gradual inflation should be permitted if it is clearly not due to any known statistical problem with the system and if it is considered to reflect a real improvement in the standard of play across the board.

    * The system should be open to subjective expert review to permit corrections to be applied in demonstrated cases of local dysfunction.

    * The system should control the risk of serious overshoots and undershoots. A slightly less predictive system may be preferred to a slightly more predictive one if the latter has more severe or more numerous cases of a small number of players being severely overrated or underrated. This constraint applies particularly to top players and strong juniors because of the potential for adverse impact on selections.
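The trade-off in the last constraint can be illustrated with a small sketch. It assumes, purely for illustration, that each player's rating error (rated strength minus true strength) is known, which in practice it is not. The 200-point "severe" threshold and both sets of figures are hypothetical.

```python
def summarise_errors(rating_errors: list[float], severe: float = 200.0) -> dict:
    """Summarise per-player rating errors, in rating points."""
    abs_errors = [abs(e) for e in rating_errors]
    return {
        "mean_abs": sum(abs_errors) / len(abs_errors),
        "worst": max(abs_errors),
        "severe_count": sum(1 for e in abs_errors if e >= severe),
    }


# Hypothetical errors for the same five players under two candidate systems.
system_a = [10, -10, 10, -10, 300]  # slightly better on average, one severe overshoot
system_b = [70, -70, 70, -70, 70]   # slightly worse on average, no severe cases

print(summarise_errors(system_a))
print(summarise_errors(system_b))
```

Here system A has the lower mean error (68 against 70) but leaves one player 300 points overrated, so under this constraint system B could reasonably be preferred.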

    Suggestions for further constraints welcome.
    Moderation Requests: All requests for, comments about, or questions about moderation of any kind including thread changes must be posted in the Help and Feedback section and not on the thread in question. (Or by private message for routine changes or sensitive matters.)

    ACF Newsletter Information - All Australian players and administrators should subscribe and check each issue for relevant notices

    My psephology/politics site (token chess references only) : http://kevinbonham.blogspot.com.au/ Politics twitter feed https://twitter.com/kevinbonham

  2. #2
    CC International Master
    Join Date
    Jul 2005
    Posts
    2,256
    Quote Originally Posted by Kevin Bonham
    More formally, the average difference between actual score and score expected based on rating for those players active in a ratings period should be as low as possible.
    What do you mean by average? Is it just divided by the number of active players? I suppose it should be weighted by the number of games played. For players who play a lot, deviations will be significant; for players who play only a few games, deviations will be relatively small.

    It probably does not matter in any case, I mean Glicko will outperform others anyway; but just for consistency.

  3. #3
    Monster of the deep Kevin Bonham's Avatar
    Join Date
    Jan 2004
    Posts
    39,310
    Quote Originally Posted by drug
    What do you mean by average? Is it just divided by the number of active players? I suppose it should be weighted by the number of games played. For players who play a lot, deviations will be significant; for players who play only a few games, deviations will be relatively small.
    Good question. I think this is open for offers as to whether it should be

    * mean error per player
    * mean error per player per game

    (I could add *mean error per game, but I think it's unlikely a system could perform especially well overall while having an unusually high error rate on particular games.)
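The averaging options can give quite different numbers on the same data. The sketch below compares them on hypothetical figures. The third, games-weighted variant is one possible reading of the weighting suggested in post #2, not a definition taken from the thread.

```python
# Each tuple: (games played, actual score, expected score) for one player
# in the ratings period. All figures are hypothetical.
players = [
    (20, 12.0, 10.0),  # busy player, 2 points above expectation
    (2, 0.0, 1.0),     # occasional player, 1 point below expectation
]

# Mean error per player: every player counts equally.
per_player = sum(abs(a - e) for _, a, e in players) / len(players)

# Mean error per player per game: each player's error is divided by
# that player's number of games before averaging.
per_player_per_game = sum(abs(a - e) / n for n, a, e in players) / len(players)

# Games-weighted mean: total error divided by total games played.
games_weighted = sum(abs(a - e) for _, a, e in players) / sum(n for n, _, _ in players)

print(per_player, per_player_per_game, round(games_weighted, 3))
```

With these figures the busy player dominates the per-player mean, while the occasional player dominates the per-player-per-game mean, which is the asymmetry raised in the question above.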

  4. #4
    CC Grandmaster
    Join Date
    Jan 2004
    Posts
    3,444
    Quote Originally Posted by Kevin Bonham
    * The system should not deflate. Gradual inflation should be permitted if it is clearly not due to any known statistical problem with the system and if it is considered to reflect a real improvement in the standard of play across the board.
    Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
    Scott

  5. #5
    CC Grandmaster
    Join Date
    Jun 2004
    Posts
    5,672
    Quote Originally Posted by Freddy
    Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
    Scott
    Because our means of testing is strictly within the system (games between players), it is impossible to determine if the overall pool is increasing or decreasing in strength. Ratings only have a relative relevance.

    Of course, this means that it doesn't actually matter to predictability if the system deflates. However deflation is not desirable, since it is also useful to be able to compare ratings from different time periods.

  6. #6
    CC Grandmaster Garvinator's Avatar
    Join Date
    Jan 2004
    Location
    Brisbane
    Posts
    13,201
    Quote Originally Posted by Freddy
    Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
    Scott
    I would regard deflation/inflation as where the standard of play inside the pool of players is the same, but as a whole the ratings have decreased/increased.

    Therefore, deflation or inflation are highly undesirable, but if one has to be chosen as a side effect, then inflation would be slightly 'better' as ppl like to see their ratings going up.

  7. #7
    CC Grandmaster
    Join Date
    Jan 2004
    Posts
    3,444
    Quote Originally Posted by pax
    Because our means of testing is strictly within the system (games between players), it is impossible to determine if the overall pool is increasing or decreasing in strength. Ratings only have a relative relevance.
    Well, that is sort of what I was trying to get at with my second question to Kevin.
    Quote Originally Posted by pax
    Of course, this means that it doesn't actually matter to predictability if the system deflates. However deflation is not desirable, since it is also useful to be able to compare ratings from different time periods.
    Well then, inflation would be just as bad. I took Kevin's comments as saying inflation is somehow fundamentally better than deflation, and I was trying to find out whether there was any reason for that.
    Scott

  8. #8
    CC Grandmaster
    Join Date
    Jan 2004
    Posts
    3,444
    Quote Originally Posted by ggrayggray
    I would regard deflation/inflation as where the standard of play inside the pool of players is the same, but as a whole the ratings have decreased/increased.

    Therefore, deflation or inflation are highly undesirable, but if one has to be chosen as a side effect, then inflation would be slightly 'better' as ppl like to see their ratings going up.
    How do we measure whether the standard of play has gone up or down? If there was a perfect or close to perfect way of doing this then surely we could have a rating system that just looks at the strength of play of every game of a player and gives them a rating based on that. Impossible though in reality.
    Scott

  9. #9
    Reader in Slood Dynamics Rincewind's Avatar
    Join Date
    Jan 2004
    Location
    The multiverse
    Posts
    21,570
    I think the whole inflation/deflation issue is a non-problem if the system works and is self-consistent. It is desirable to compare ratings over time, so that a player who hasn't played for 2 years still has a reasonable rating, so large-scale inflation and deflation should be avoided where possible. However, I think this might not be a problem with the rating system but could be systemic to the behaviour of players. For example, a large number of players rapidly increasing in strength, generally deflating the pool of stable players and then ceasing competition chess is not a problem with the rating system but with the environment.
    As simple as possible, but no simpler - Albert Einstein

  10. #10
    Monster of the deep Kevin Bonham's Avatar
    Join Date
    Jan 2004
    Posts
    39,310
    Quote Originally Posted by Freddy
    Kevin. Why should the system not deflate if it is considered to reflect a real decline in the standard of play across the board? Who is to judge whether there is a real improvement or decline in the standard of play from year to year?
    Scott
    In a sufficiently large chess playing area, like a nation, real deflation shouldn't happen. You might get a decline in the average rating because there are fewer strong players or more weak players about at one time than another, but that's not real deflation. Real deflation is, for instance, when stably rated adults in the 35-45 age bracket are losing points when there is no reason to believe their play is getting better or worse.

    The standard of chess play should gradually improve globally as more and more lines are studied and more and more conclusions reached about particular lines (and ditto to a degree for endgames). There is no mechanism by which it should decline.

    I'll agree just to be really precise that if, for instance, a virus from Mars caused all chessplayers to blunder pieces every five moves, then deflation should be possible. However there is no normal circumstance under which it should occur - so if you have significant average points losses among adults who should be close to stable in playing strength, you have a problem with your system.
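A rough sketch of the check described here: take a reference group of players whose strength should be close to stable and measure their average rating change between two lists. The player names, ratings and the function `mean_rating_change` are all hypothetical.

```python
def mean_rating_change(old: dict[str, int], new: dict[str, int],
                       reference_group: set[str]) -> float:
    """Average rating change for reference players who appear in both lists."""
    common = [p for p in reference_group if p in old and p in new]
    if not common:
        raise ValueError("no reference players appear in both rating lists")
    return sum(new[p] - old[p] for p in common) / len(common)


# Hypothetical rating lists from two periods.
old_list = {"A": 1800, "B": 1650, "C": 1500, "junior": 1200}
new_list = {"A": 1780, "B": 1640, "C": 1470, "junior": 1450}

# Adults assumed (for illustration) to be stable in playing strength.
stable_adults = {"A", "B", "C"}

print(mean_rating_change(old_list, new_list, stable_adults))  # -20.0
```

A persistently negative figure for such a group would be the warning sign described above. A drop in the average of the whole pool would not, since that can simply reflect who happens to be playing.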

    Quote Originally Posted by Freddy
    How do we measure whether the standard of play has gone up or down? If there was a perfect or close to perfect way of doing this then surely we could have a rating system that just looks at the strength of play of every game of a player and gives them a rating based on that. Impossible though in reality.
    You could measure it to a degree by looking at their games. This is not a practical way to run a rating system, but it is a practical way to try to calculate whether drift in ratings over a very long time is reasonable.

    John Nunn did an interesting exercise involving putting games of leading 19th century players through computers and looking at the frequency with which severe tactical errors occurred. He found that in the games of the leading 19th century players errors were far more common than they are now. On that basis today's leading players are probably better than the leading players of times past.

    Another reason is the growth in theory. Any GM of today would wipe the floor with a Philidor or Morphy through better understanding of theory alone. That makes them objectively better players, although if Philidor or Morphy had access to the same resources over the same period of time they could well triumph through superior raw talent.

    So what I'm saying is that there are cases in which inflation may represent some real improvement in playing strength, whereas true deflation almost certainly means there is something wrong with the system.

  11. #11
    Monster of the deep Kevin Bonham's Avatar
    Join Date
    Jan 2004
    Posts
    39,310
    Quote Originally Posted by Rincewind
    I think the whole inflation/deflation issue is a non-problem if the system works and is self-consistent. It is desirable to compare ratings over time, so that a player who hasn't played for 2 years still has a reasonable rating, so large-scale inflation and deflation should be avoided where possible. However, I think this might not be a problem with the rating system but could be systemic to the behaviour of players. For example, a large number of players rapidly increasing in strength, generally deflating the pool of stable players and then ceasing competition chess is not a problem with the rating system but with the environment.
    Agreed. On this basis, I'll change

    * The system should not deflate.
    to

    * The system should not deflate under normal circumstances. If it deflates under exceptional circumstances, such as unusual player demographics, corrections should be applied.
    I think a reason to avoid deflation is that if it is merely a relative scale then if it deflates, points can be added without affecting the relative standings of players. This has the added advantage of preventing widespread points losses and undue discouragement caused by them.
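The point that adding points leaves relative standings untouched can be checked directly. The sketch assumes the Elo logistic expected-score curve, which depends only on the rating difference. The ratings and the 50-point correction are hypothetical.

```python
import math


def expected_score(rating: float, opponent_rating: float) -> float:
    """Expected score under the Elo logistic curve (an assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


ratings = {"A": 1900, "B": 1700, "C": 1450}
shift = 50  # hypothetical uniform correction for deflation
shifted = {name: r + shift for name, r in ratings.items()}

# The ranking order is unchanged by a uniform shift.
same_order = sorted(ratings, key=ratings.get) == sorted(shifted, key=shifted.get)

# Every pairwise expected score is unchanged, because differences are preserved.
same_predictions = all(
    math.isclose(expected_score(ratings[a], ratings[b]),
                 expected_score(shifted[a], shifted[b]))
    for a in ratings for b in ratings if a != b
)

print(same_order, same_predictions)  # True True
```

A uniform correction therefore costs nothing in predictive terms. A correction applied only to some players would change differences and so would change predictions.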

  12. #12
    CC Grandmaster Denis_Jessop's Avatar
    Join Date
    Jan 2004
    Location
    Canberra
    Posts
    3,333

    The Greatest

    Quote Originally Posted by Kevin Bonham
    John Nunn did an interesting exercise involving putting games of leading 19th century players through computers and looking at the frequency with which severe tactical errors occurred. He found that in the games of the leading 19th century players errors were far more common than they are now. On that basis today's leading players are probably better than the leading players of times past.

    Another reason is the growth in theory. Any GM of today would wipe the floor with a Philidor or Morphy through better understanding of theory alone. That makes them objectively better players, although if Philidor or Morphy had access to the same resources over the same period of time they could well triumph through superior raw talent.
    I can't let this one pass unchallenged especially as it's nice and off topic.

    John Nunn was not the only person to do an exercise of that kind. Apparently Prof. Elo himself did one, as did Sir Richard Clarke, an English ratings expert, so it is said. Others have also had a go. The whole issue is dealt with at length by the inimitable Fox and James in The Even More (Inimitable) Complete Chess Addict.

    What is interesting here is not so much the lists of strongest players as the opinions of recent strong players of their predecessors. So Larsen (1967) nominated Philidor as the greatest of all time. Fischer (1964) said that Morphy would "beat anyone alive today" and Euwe and Gligoric also nominated Morphy. Lasker, Alekhin, Botvinnik and Spassky all opted for Capablanca.

    One point is that in assessing players' comparative worth, you cannot properly pit Morphy with 19th C knowledge against Kasparov with 21st C knowledge as it's quite an unfair comparison. My own guess is that the very best of old-time players were intrinsically as strong as the best modern players but that the overall standard has improved. That, I'm sure, is the case in most sports and in classical music performance (except singing) as well.

    Another factor is the different tournament atmosphere these days. In another part of their work F & J have a section entitled Drink Like a Grandmaster in which, among other things, they note that James Mason, a leading player of his day, was claimed frequently to have lost games in a "hilarious condition" and that in the London Tournament of 1899 he was discovered asleep in a fireplace. They don't mention whether the fire was lit at the time or whose move it was, so please don't ask.

    DJ
    "...I don't want to go among mad people," Alice remarked. "Oh, you can't help that," said the Cat: "we're all mad here. I am mad. You're mad." "How do you know I'm mad?" said Alice. "You must be," said the Cat, "or you wouldn't have come here."

  13. #13
    Monster of the deep Kevin Bonham's Avatar
    Join Date
    Jan 2004
    Posts
    39,310
    Quote Originally Posted by Denis_Jessop
    John Nunn was not the only person to do an exercise of that kind. Apparently Prof. Elo himself did one, as did Sir Richard Clarke, an English ratings expert, so it is said.
    I am not sure whether Elo and Clarke made any attempt to assess changes in the quality of play or whether they were just trying to provide a comparative ratings process based on game results between players over time. (In the latter light, Sonas' Chessmetrics site is by far the most thorough attempt, but it assumes parity in strength between all the eras mentioned.)

    Quote Originally Posted by Denis_Jessop
    What is interesting here is not so much the lists of strongest players as the opinions of recent strong players of their predecessors. So Larsen (1967) nominated Philidor as the greatest of all time. Fischer (1964) said that Morphy would "beat anyone alive today" and Euwe and Gligoric also nominated Morphy. Lasker, Alekhin, Botvinnik and Spassky all opted for Capablanca.
    I have seen these nominations in the original edition of that book (I haven't read any of the later editions alas). I suspect that great players tend to factor in a correction for theory in making such a comment.

    Quote Originally Posted by Denis_Jessop
    One point is that in assessing players' comparative worth, you cannot properly pit Morphy with 19th C knowledge against Kasparov with 21st C knowledge as it's quite an unfair comparison.
    Nonetheless if you put Paul Morphy at his peak in a time machine and sit him down for a match with Kasparov today, it is a fair reflection of how he would perform. The role of a rating system is to say "how good is a player at a specific time?" not "how good would this player be if exposed to the theory of 150 years in the future at the same moment?" Otherwise you would be judging Kramnik according to how good he would be if NCO volume 23 from 2150 AD fell into a timewarp and landed in his lap before his next match.

    To get this back to objectives of ratings systems, suppose a ratings system applied retrospectively says (say) that the strongest players of the 19th century were 2400-2500 strength at their best while the strongest players now are 2800 strength. What I am saying with my comment about inflation is that that is not automatically a reason to conclude the system is faulty. It may be that it is telling us something about a gradual improvement in play standards that is actually real. But if it said the top players of the 19th century were 3200 strength we would have to be sceptical about that.

    That is all I was getting at in saying that gradual inflation does not necessarily prove a fault with a rating system.

    The comment about tournament standards is interesting. For instance until quite recently it was permitted for smoking to occur in tournaments. For all we know this may have affected the performance of almost everybody! A rating system probably can't handle these sorts of issues and just has to assume that playing conditions from time to time remain the same - unless it takes time control (for instance and where known) into account.

    I'm interested that the inflation/deflation part of my initial post has created a lot of debate (and it is a point that takes a fair amount of "unpacking") but the rest has not. Does anyone have any suggestions or points of difference with the rest?

  14. #14
    CC Grandmaster
    Join Date
    Jan 2004
    Posts
    3,444
    Quote Originally Posted by Kevin Bonham
    In a sufficiently large chess playing area, like a nation, real deflation shouldn't happen. You might get a decline in the average rating because there are fewer strong players or more weak players about at one time than another, but that's not real deflation. Real deflation is, for instance, when stably rated adults in the 35-45 age bracket are losing points when there is no reason to believe their play is getting better or worse.

    The standard of chess play should gradually improve globally as more and more lines are studied and more and more conclusions reached about particular lines (and ditto to a degree for endgames). There is no mechanism by which it should decline.

    I'll agree just to be really precise that if, for instance, a virus from Mars caused all chessplayers to blunder pieces every five moves, then deflation should be possible. However there is no normal circumstance under which it should occur - so if you have significant average points losses among adults who should be close to stable in playing strength, you have a problem with your system.



    You could measure it to a degree by looking at their games. This is not a practical way to run a rating system, but it is a practical way to try to calculate whether drift in ratings over a very long time is reasonable.

    John Nunn did an interesting exercise involving putting games of leading 19th century players through computers and looking at the frequency with which severe tactical errors occurred. He found that in the games of the leading 19th century players errors were far more common than they are now. On that basis today's leading players are probably better than the leading players of times past.

    Another reason is the growth in theory. Any GM of today would wipe the floor with a Philidor or Morphy through better understanding of theory alone. That makes them objectively better players, although if Philidor or Morphy had access to the same resources over the same period of time they could well triumph through superior raw talent.

    So what I'm saying is that there are cases in which inflation may represent some real improvement in playing strength, whereas true deflation almost certainly means there is something wrong with the system.
    Excellent answers Kevin. Thank you.

  15. #15
    CC Grandmaster
    Join Date
    Jan 2004
    Posts
    3,444
    Kevin, I have a number of questions. I just started off with the one I did in a sort of random way.

    Quote Originally Posted by Kevin Bonham
    * The primary (but not necessarily the only) source of data should be a player's previous performances.
    What other sources of data did you have in mind? I am struggling to think of any other source.
    Scott
