Kevin's rating question - extended



Kevin Bonham
09-09-2004, 08:46 PM
This is a question I've come up with which I regard as a basic test of having any clue about how chess ratings should work at all. I'm reposting it because it has become buried in a pile of rubbish on long-winded ratings threads, but David (who fancies himself as some kind of armchair expert on ratings despite all evidence to the contrary) has refused to answer this question the many times it has been put to him.

A player who has never played a rated game before plays ten games in a ratings period and performs at 1000.
The player then plays another ten games in the next ratings period and performs at 1200 in those ten games.
Each ratings period is three months long.

All other things being equal (eg you don't know the RDs, volatilities or other results of the opponents) what should be the new rating of the player?

A number of answers have already been ventured. I'm going to put this up for a week or so and I would ask everyone to submit answers by one of the following methods:

* PM it to me.
* Post it and immediately delete it.

David, you can feel free to post yours openly. I'd hate you to get any clues from anyone else's answers, since it's obvious you don't have any at the moment.

Some starting thoughts to get you going:

Matt said 1250 - see reasoning below, good luck making any sense of it.

Elo with K=15 says 1039.

KB, Glicko1 and Glicko2 all give answers within the same five-point range, but I'm not saying what it is ... yet. :lol:

Over to the rest of you ... David, do feel free to post abusive rubbish in response to keep this thread near the top.

firegoat7
09-09-2004, 10:33 PM
Immature posting from a moderator who should know better.

Cheers FG7

PHAT
09-09-2004, 10:37 PM
Matt said 1400 if the periods are "temporal", which I do believe they are.

Not the whole story, is it, so here:


Without a time component being given, there is only one answer: 1250.

If the subject was known to be static, then the new rating would be (1000+1200)/2 = 1100

If the subject was known to be dynamic, then the new rating would be 1400.

Since we do not know if the subject is static or dynamic, we should take a compromise course. (1100+1400)/2 = 1250

Clearly, Kevin, my estimation is 1250, not the unlikely 1400 that you would lead others to believe I said. :rolleyes:
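The arithmetic in the post above can be written out as a small Python sketch. The function and parameter names are hypothetical; `p_dynamic` is the assumed probability that the player is improving, and the "dynamic" estimate assumes the 200-point rise repeats for one more period, which is how the 1400 figure arises.

```python
def mixture_estimate(first_perf, second_perf, p_dynamic):
    """Blend a 'static' and a 'dynamic' estimate of current strength.

    static:    the player has not changed, so average the two performances.
    dynamic:   the change between periods continues at the same rate.
    p_dynamic: assumed probability (0..1) that the player is dynamic.
    """
    static = (first_perf + second_perf) / 2
    dynamic = second_perf + (second_perf - first_perf)
    return (1 - p_dynamic) * static + p_dynamic * dynamic


print(mixture_estimate(1000, 1200, 0.5))  # 1250.0
print(mixture_estimate(1000, 1200, 0.0))  # 1100.0
print(mixture_estimate(1000, 1200, 1.0))  # 1400.0
```

Under this model the answer can land anywhere across a 300-point range depending on `p_dynamic`, so the whole estimate rests on the probability assumed; a figure of 1150, for example, corresponds to `p_dynamic = 1/6`.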

PHAT
09-09-2004, 10:40 PM
Immature posting from a moderator who should know better.

Cheers FG7

I don't often disagree with you FG, but I am of the opinion it isn't his immaturity that is driving his spite, it is his ego. ;)

Bill Gletsos
09-09-2004, 11:05 PM
I don't often disagree with you FG, but I am of the opinion it isn't his immaturity that is driving his spite, it is his ego. ;)
What is abundantly clear is that DR is evading the question like his life depended on it.

Kevin Bonham
09-09-2004, 11:25 PM
Immature posting from a moderator who should know better.

firegoat, Matthew, David et al - any rubbish you spout on this thread will only further my aim of exposing David's refusal to answer the question by pushing this thread to the top. :owned: So eat your hearts out, if you have any.

Furthermore, to stop the answer post from being buried deep in the thread past the point where people have stopped reading due to trolls, I will post the answers in a new thread, locking this one at that time.

Finally, if you have any concerns about the content of my post please direct them to another moderator. I do not moderate my own posts. Cheapo - red card - firegoat fails again. :hand:

Kevin Bonham
09-09-2004, 11:29 PM
Clearly, Kevin my estimation is 1250, not the unlikely 1400 that you would lead others to believe I said. :rolleyes:

Sorry, I was just doing my best to unravel what you meant from your very poorly expressed posts on the issue, and in particular the fact that you kept asking me about the time component when it was stated: consecutive ratings periods. I now understand your position better.

Of course, it's still wrong. :owned:

PHAT
10-09-2004, 12:20 AM
Furthermore, to stop the answer post from being buried deep in the thread past the point where people have stopped reading due to trolls, I will post the answers in a new thread, locking this one at that time.


NO!

You cannot use your moderator powers to aid you in your personal feud. If you do this I will put in an official complaint.

What else would we expect from Mr Amoral.

Kevin Bonham
10-09-2004, 12:52 AM
NO!

You cannot use your moderator powers to aid you in your personal feud. If you do this I will put in an official complaint.

Hmmm, I can see that being taken very seriously indeed.

As an alternative, prior to giving the answer I could delete or move every post not relevant to the thread topic. :lol:

Keep it going Matt. Remember, the more this goes to the top the more pressure you put on your tag-team buddy Dave.

Bill Gletsos
10-09-2004, 01:12 PM
Keep it going Matt. Remember, the more this goes to the top the more pressure you put on your tag-team buddy Dave.
I just can't see DR providing an answer as he will just shoot himself in the foot.

Kevin Bonham
21-09-2004, 01:49 AM
Keep it going Matt. Remember, the more this goes to the top the more pressure you put on your tag-team buddy Dave.

Aaaaah. That all shut him up most effectively. We'll have to try these kinds of tactics again. :whistle:

No answer received from David of course, further confirming that he doesn't have even the roughest clue (and yes he has been online during the relevant time). This answer was definitely heading in the right direction (I also received a couple that weren't, but I won't embarrass those responsible unless they want to be outed) :


Hello,

this is not an attempt to solve the posed problem (my only answer would be "somewhere in the 1001-1199 range", which is probably not what you were after :-).

Instead, I am trying to use this example to evaluate if Glicko is better than our local, similar-to-Elo rating system, which comes up with either 1049, 1061 or 1064, depending on the player's age. I had to make a few assumptions as well, regarding the grouping of the results in tournaments, as we rate per tournament, not per three-month period.

But, my intuition says, any value below 1100 is too low as we have a total of 20 games averaging a 1100 performance with the ten recent (and thus more important) games played above that level.

Yes. Anything 1100 or below is too low because the 10 games at 1200 are more recent evidence than the 10 games at 1000 and therefore at least slightly more relevant, both because of the possibility of improvement and the possibility of form.

The question is how much higher. Matthew gives an estimate of 1250 but this is based on assuming a 1/2 probability that the subject is dynamic. This ignores several objections. How many players really improve 400 points of playing strength from a base of 1000 in their first six months? I don't have any empirical data on it but I've never seen it happen. However for a player of close to static strength to perform 200 points better in one set of ten games is nothing unusual. An 1100s player playing a field of average strength about 1460 would score about 1/10 on average, but could very easily score 0.5/10 one time and 2/10 another, which would actually give a slightly wider PR range than my example. (This is a good example of why you should generally not assume a probability of anything to be 1/2 in the absence of further information. Further information is often no further away than a few moments' thought about the problem.)
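To put rough numbers on the 1100-versus-1460 example, here is a Python sketch using the logistic expected-score curve. That curve is an assumption: some systems use tables derived from a normal distribution instead, so the exact figures differ from system to system. The helper names are hypothetical.

```python
import math


def expected_score(rating, opp_avg):
    """Logistic (Elo-style) expected score per game against a field averaging opp_avg."""
    return 1 / (1 + 10 ** ((opp_avg - rating) / 400))


def performance_rating(points, games, opp_avg):
    """Rating at which the logistic expected score equals the actual score.

    Undefined for a 0% or 100% score, where the logit is infinite.
    """
    p = points / games
    if not 0 < p < 1:
        raise ValueError("performance rating is undefined for a 0% or 100% score")
    return opp_avg + 400 * math.log10(p / (1 - p))


field = 1460
avg_points = 10 * expected_score(1100, field)  # average points from 10 games
poor = performance_rating(0.5, 10, field)      # a poor set of 10 games
good = performance_rating(2, 10, field)        # a good set of 10 games
print(round(avg_points, 2), round(poor), round(good))
```

Under the logistic curve the average score is about 1.1/10, and the two performance ratings come out at roughly 948 and 1219, about 270 points apart, so two samples of ten games from a player of unchanged strength can easily differ by more than 200 points.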

Also even if the performance jump was partly a product of improvement, you don't know if that improvement was as large as 200 points, nor do you know if that improvement is continuing or if it was simply a once-off. Matthew's argument doesn't consider any of these factors and as a result his 1250 is a gross overestimate.

To get an exact answer you'd need to examine the question empirically, but based on the above it is going to be in the ballpark 1101-1150.

My initial answer was 1105-1110. Maybe a little bit conservative.

Bill's initial answer was 1106 (RD=87) for Glicko and 1105 (RD=86) for Glicko-2. This was based on opponents all having RD=70, which gave the new player an RD of 107 after the first period.

Bill also looked at the highest possible RD the initially unrated player could get after the first period (abandoning the assumption that opponents had low RDs) which was around 155. That would result in ratings of 1143 (RD=103) for Glicko and 1144 (RD=102) for Glicko-2.

Probably the Glicko ratings would be much closer to the first set in practice, because while !!s are a minority of the active rating list, their high activity levels mean they play a very high proportion of the games.
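For readers who want to see what goes into a Glicko figure, here is a minimal Python sketch of a single Glicko-1 rating-period update, following the formulas in Glickman's published description of the system. It leaves out the pre-period RD increase for elapsed time and all of Glicko-2's volatility machinery. It cannot reproduce Bill's numbers above because the opponents' ratings are not given, and the function names are illustrative only.

```python
import math

Q = math.log(10) / 400  # Glicko scaling constant


def g(rd):
    """Down-weights a game against an opponent whose rating is uncertain."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score against one opponent, allowing for the opponent's RD."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko1_update(r, rd, games):
    """One Glicko-1 rating-period update.

    games: list of (opponent_rating, opponent_rd, score), score in {0, 0.5, 1}.
    Assumes rd has already been inflated for inactivity before the period.
    Returns (new_rating, new_rd).
    """
    if not games:
        return r, rd
    d2_inv = Q**2 * sum(
        g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1 - expected(r, r_j, rd_j))
        for r_j, rd_j, _ in games
    )
    precision = 1 / rd**2 + d2_inv
    delta = sum(g(rd_j) * (s - expected(r, r_j, rd_j)) for r_j, rd_j, s in games)
    new_r = r + (Q / precision) * delta
    new_rd = math.sqrt(1 / precision)
    return new_r, new_rd


# Worked example from Glickman's description: one win, two losses
r, rd = glicko1_update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(round(r), round(rd, 1))  # approximately 1464 151.4
```

The `Q / precision` factor is where the player's own RD matters: a high pre-period RD makes `1 / rd**2` small, so more of the period's results flow into the new rating, which is why an initially unrated player moves so far in one period.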

(Bill - please feel free to add any clarifications necessary).

What is the point of this? The point is that the main thrust of David's complaint (that many juniors are underrated) is wrong because in any effective system, the ratings of actually improving players must lag as a result of insufficient information to distinguish those who are improving from those who aren't. If David wants to come up with something useful, he needs to find a system that identifies which juniors are improving based on their past data. Maybe he'd like to look at the Gold Coast ratings one period and post a list of which ones will go up next period?

PHAT
21-09-2004, 05:47 PM
KB, Every now and again you make a post that is so poorly argued, I think you are having us on. To wit:


The question is how much higher. Matthew gives an estimate of 1250 but this is based on assuming a 1/2 probability that the subject is dynamic. This ignores several objections. How many players really improve 400 points of playing strength from a base of 1000 in their first six months?

I venture to say that nearly every chess player improves by 400 points in the first six months! Indeed, I would be worried about one that did not. From rank beginner ("How does that horse move again?") with a rating of ~100 to a rating of ~500 in half a year, is not a big ask.


I don't have any empirical data on it but I've never seen it happen.

See above, slow Joe.


However for a player of close to static strength to perform 200 points better in one set of ten games is nothing unusual. An 1100s player playing a field of average strength about 1460 would score about 1/10 on average, but could very easily score 0.5/10 one time and 2/10 another, which would actually give a slightly wider PR range than my example. (This is a good example of why you should generally not assume a probability of anything to be 1/2 in the absence of further information. Further information is often no further away than a few moments' thought about the problem.) [bold by MS]

Not a good example at all. It is only an example of something that proves to be not 0.50 after examining more data. When there is no a priori reason to suggest a non-0.50 chance, then 0.50 is the choice.


Also even if the performance jump was partly a product of improvement, you don't know if that improvement was as large as 200 points,...

... it might be even larger.

...nor do you know if that improvement is continuing or if it was simply a once-off.

Exactly right! It might or might not continue to improve. However, there is data available to us that says it is going up. So why on Earth would you say that we ought to treat the rating as if the player is static? Derr. Derr. Triple derr.


Matthew's argument doesn't consider any of these factors and as a result his 1250 is a gross overestimate.

Prove it.


To get an exact answer you'd need to examine the question empirically, but based on the above it is going to be in the ballpark 1101-1150.

"Exact" .... :lol: Exact except for the uncertainty. :wall:


Bill's initial answer was ... Bill also looked at ...

Probably the Glicko ratings would be much closer to the first set in practice, because while !!s are a minority of the active rating list, their high activity levels mean they play a very high proportion of the games.

Glicko is only going to give you what you want to see. You select how responsive/volatile you want it to be, and BINGO!!! you get exactly what you reckon it should be. Suffice to say, Glicko proves nothing at all.


(Bill - please feel free to add any clarifications necessary).
... is Bill your daddy as well as chesslover's?

Kevin Bonham
21-09-2004, 10:10 PM
KB, Every now and again you make a post that is so poorly argued,

Now now, none of that, I'm the one doing the baiting here. :lol:


I venture to say that nearly every chess player improves by 400 points in the first six months! Indeed, I would be worried about one that did not. From rank beginner ("How does that horse move again?") with a rating of ~100 to a rating of ~500 in half a year, is not a big ask.

However it is clearly a very big ask to expect you to reply to a paragraph containing the words "from a base of 1000" in a thread discussing tournament performance without giving an irrelevant example involving a player who starts at ~100 strength presumably outside tournaments. So your objection is bogus and the:


slow Joe.

who needed to:


See above,

was you. :owned:

FWIW I would actually be quite surprised if "nearly every" player improved 400 points in strength in their first six months of playing the game at all (unless you assign some kind of rating to a player who has never played), but even if this is true it is irrelevant. We are discussing the tournament performances of a player performing at 1000 in their first 10 games.


Not a good example at all. It is only an example of something that proves to be not 0.50 after examining more data. When there is no a priori reason to suggest a non-0.50 chance, then 0.50 is the choice.

That's unusually pedantic of you, Matt. I thought you were allergic to so-called pedantry, but obviously only when it suits you. My point, of course, was that you did not bother to think and consider whether there could have been further information. In any case, when there is no a priori reason, I'd say the probability is indeterminate, not 0.5. "0.5 +/- 0.5" doesn't actually tell you a hell of a lot.


it might be even larger.

And (although this is less likely) the player might even have actually gone backwards.


Exactly right! It might or might not continue to improve. However, there is data available to us that says it is going up. So why on Earth would you say that we ought to treat the rating as if the player is static? Derr. Derr. Triple derr.

Triple derr is you, because I never said that. Indeed, by not giving a final answer of exactly 1100 I have already ruled out the assumption that the player must be static. It is a question of how you mix the static/dynamic probabilities.


Prove it.

As I said, firm proof will require empirical evidence - this is just a theory exercise in establishing the most plausible range.


"Exact" .... :lol: Exact except for the uncertainty. :wall:

Point?


Glicko is only going to give you what you want to see. You select how responsive/volatile you want it to be, and BINGO!!! you get exactly what you reckon it should be. Suffice to say, Glicko proves nothing at all.

You missed the point - it wasn't to say that Glicko agrees with me therefore I'm right, it was to defend Glicko by pointing out that it gives reasonable answers, which Elo and Matthew Sweeney do not. :hand:


... is Bill your daddy as well as chesslover's?

No. But he certainly seems to be your master. :hmm:

rob
22-09-2004, 03:34 PM
What is the point of this? The point is that the main thrust of David's complaint (that many juniors are underrated) is wrong because in any effective system, the ratings of actually improving players must lag as a result of insufficient information to distinguish those who are improving from those who aren't.

I would like to think that the ratings of actually improving players should only lag by a minimal amount, if at all. Surely it is better to have players closer to their actual rating performance? If a player has a good rating period they should get a rating to reflect that. If they continue improving then it must be better that their current rating is higher than otherwise, or if they follow up (next period) with a lower rating performance then they should lose the appropriate number of points. Isn't this fair to both them and those that play them in future? We shouldn't be afraid to change ppl's ratings based on their results just because some may not follow up with similar performances. For many improving players (mainly juniors) an RD of !! may not be dynamic enough to change ratings enough to indicate their current strength. Also an RD of !! has a greater effect on their opponents' ratings than ! or blank. I believe that the September changes may reduce this problem a little. Apart from this, Glicko-2 is brilliant and most players' current ratings seem pretty accurate IMHO :) (I recall Greg Canfell expressing similar sentiments in another post)

PHAT
22-09-2004, 04:39 PM
I would like to think that the ratings of actually improving players should only lag by a minimal amount, if at all. Surely it is better to have players closer to their actual rating performance? If a player has a good rating period they should get a rating to reflect that. If they continue improving then it must be better that their current rating is higher than otherwise, or if they follow up (next period) with a lower rating performance then they should lose the appropriate number of points.

I am on it, with no thanks to the dog in the manger and resident control freak, BG.

Bill Gletsos
22-09-2004, 11:41 PM
I would like to think that the ratings of actually improving players should only lag by a minimal amount, if at all. Surely it is better to have players closer to their actual rating performance? If a player has a good rating period they should get a rating to reflect that. If they continue improving then it must be better that their current rating is higher than otherwise, or if they follow up (next period) with a lower rating performance then they should lose the appropriate number of points. Isn't this fair to both them and those that play them in future? We shouldn't be afraid to change ppl's ratings based on their results just because some may not follow up with similar performances. For many improving players (mainly juniors) an RD of !! may not be dynamic enough to change ratings enough to indicate their current strength. Also an RD of !! has a greater effect on their opponents' ratings than ! or blank. I believe that the September changes may reduce this problem a little. Apart from this, Glicko-2 is brilliant and most players' current ratings seem pretty accurate IMHO :) (I recall Greg Canfell expressing similar sentiments in another post)
Firstly, their RD isn't as important under Glicko-2, as the volatility factor will cut in if they demonstrate a significant change in strength. However it would be foolish to believe a small number of games is sufficient to do this.

Also, although the RD of an opponent has some impact, it is fairly insignificant in comparison to the player's own RD and volatility.

Now all ratings systems merge the rating at the start of the period with the results obtained during the period. The more weight given to the results in the period the more the new rating tends towards the performance rating.
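The merging described above can be illustrated with a deliberately simplified Python sketch; it is not Elo's or Glicko's exact formula. The new rating moves a fraction `weight` of the way from the start-of-period rating toward the performance rating. Setting that fraction from the relative uncertainty of the two estimates (an inverse-variance weight) is one common approach and is assumed here; the function names and the performance standard deviations are hypothetical, and 107 simply borrows the first-period RD from Bill's Glicko example.

```python
def merge(start_rating, performance, weight):
    """Move the start-of-period rating toward the period's performance rating.

    weight = 0 ignores the new results; weight = 1 replaces the rating outright.
    """
    return start_rating + weight * (performance - start_rating)


def inverse_variance_weight(sd_start, sd_perf):
    """Weight on the new results when each estimate's uncertainty is known.

    A noisy performance (few games, large sd_perf) gets little weight.
    """
    return sd_start**2 / (sd_start**2 + sd_perf**2)


for sd_perf in (50, 107, 250):  # hypothetical uncertainties of a 1200 performance
    w = inverse_variance_weight(107, sd_perf)
    print(sd_perf, round(w, 2), round(merge(1000, 1200, w)))
```

With `weight = 1` this becomes the option of making the new rating equal to the performance rating; the sketch just makes explicit that a small, noisy sample should move the rating less.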

Well, of course you could simply make a player's new rating equal to their performance rating for the rating period. As you say, wouldn't this be fair?
Unfortunately the answer is no, it would not be fair.

Not only ratings theory but also testing shows this leads to considerably less predictive accuracy.

The player needs to demonstrate their new proficiency over a reasonable number of games. This is especially true with Elo but less so with Glicko/Glicko2.

If you take the Elo system, for example, for those 10 games at the 1200 performance level a K of 77 is required for the new rating to equal 1200. Also a 10-game sample only represents a confidence level of 74%, which is pretty inadequate. As Kevin said above, Elo (K=15) gives a rating of 1039. At least 30 games are required for a 95% confidence level. With that Elo gives a rating of 1117. Glicko would give a rating of somewhere between 1171 and 1183. Now 50 games represents a 98.8% confidence level. Elo would give that a 1195 rating, whilst Glicko for 50 games would give a 1195-1204 rating.
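The Elo figures above can be reproduced with a short Python sketch if one assumes (this is not stated in the thread) that every opponent is rated 1200, the player scores 50% (exactly a 1200 performance), and there is a single update at the end of the period. Under those assumptions the sketch gives 1039, 1117 and 1195 for 10, 30 and 50 games with K = 15, and a required K of 77 for 10 games, matching the numbers quoted. The function names are hypothetical.

```python
def expected_score(rating, opp_rating):
    """Logistic Elo expected score for one game."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))


def elo_after_period(start, opp_rating, games, k, score_per_game=0.5):
    """Single end-of-period Elo update against opponents all of one rating.

    New rating = start + K * (actual score - expected score), summed over the games.
    """
    actual = games * score_per_game
    expected = games * expected_score(start, opp_rating)
    return start + k * (actual - expected)


for games in (10, 30, 50):
    print(games, round(elo_after_period(1000, 1200, games, k=15)))
# 10 1039
# 30 1117
# 50 1195

# K needed for 10 games at a 1200 performance to lift 1000 all the way to 1200
shortfall = 10 * (0.5 - expected_score(1000, 1200))
print(round(200 / shortfall))  # 77
```

Because K multiplies the whole score-minus-expectation term, the new rating only reaches the performance rating when K is large relative to the number of games, which is the trade-off between responsiveness and predictive accuracy discussed here.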

However what is abundantly clear is that just because a player improved in the current rating period there is no reason to believe they will improve at all in the next rating period, they may actually get worse.

Bill Gletsos
22-09-2004, 11:44 PM
I am on it, with no thanks to the dog in the manger and resident control freak, BG.
You aren't on it, in fact you aren't even close to it.
What is clear is that you have never read anything to do with ratings not even Elo's book. If by some chance you have, then your ramblings indicate you certainly did not gain any knowledge from it.

PHAT
23-09-2004, 01:13 AM
You aren't on it, in fact you aren't even close to it.
What is clear is that you have never read anything to do with ratings not even Elo's book. If by some chance you have, then your ramblings indicate you certainly did not gain any knowledge from it.

As it happens, I was having a go at it tonight. I feel like I am re-inventing the wheel because I am trying to find a way of combining the master files on to one excel spreadsheet with all the same ID numbers lined up in the same row. I haven't cracked it yet. But don't worry, I won't bother asking a no mates patzer, pretend NSWCA president, dog in the manger, to help. So, ought to have worked out, that that is a tacit GF.
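The spreadsheet task mentioned here (combining several master files so that each player ID sits on one row) is essentially a keyed merge. A minimal Python sketch, assuming each period's list is CSV text with `ID`, `Name` and `Rating` columns; that layout, the function name and the sample rows are all hypothetical, since the real master-file format isn't given. The resulting CSV can be opened in Excel.

```python
import csv
import io


def merge_rating_lists(periods):
    """Line up several rating lists by player ID, one row per player.

    periods: dict mapping a period label to CSV text with ID,Name,Rating columns.
    Returns (header, rows), with a blank cell where a player is missing from a period.
    """
    labels = list(periods)
    names = {}
    ratings = {}  # player ID -> {period label: rating}
    for label, text in periods.items():
        for row in csv.DictReader(io.StringIO(text)):
            pid = row["ID"]
            names.setdefault(pid, row["Name"])
            ratings.setdefault(pid, {})[label] = row["Rating"]
    header = ["ID", "Name"] + labels
    rows = [
        [pid, names[pid]] + [ratings[pid].get(label, "") for label in labels]
        for pid in sorted(ratings)
    ]
    return header, rows


# Hypothetical sample data for two periods
jun = "ID,Name,Rating\n101,Player A,1000\n102,Player B,1500\n"
sep = "ID,Name,Rating\n101,Player A,1105\n103,Player C,1300\n"
header, rows = merge_rating_lists({"Jun": jun, "Sep": sep})

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```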

BTW, you needn't think that moving a perennial BB flaming fest into real-world combat will make you look like a decent person. You look very very little.

Bill Gletsos
23-09-2004, 11:15 AM
As it happens, I was having a go at it tonight. I feel like I am re-inventing the wheel because I am trying to find a way of combining the master files on to one excel spreadsheet with all the same ID numbers lined up in the same row. I haven't cracked it yet. But don't worry, I won't bother asking a no mates patzer, pretend NSWCA president, dog in the manger, Conscientious Negative Technician, to help. So, ought to have worked out, that that is a tacit GF.
The point is your whole idea of trending is flawed.
There is no way to predict that just because someone improved this period they will improve the next or if their rating declined this period it will decline in the next.


BTW, you needn't think that moving a perennial BB flaming fest into real-world combat will make you look like a decent person. You look very very little.
And you look like the fool you are, with no credibility.
You are just a loud mouthed, abusive do nothing individual.
You chose to behave on the BB like a foul mouthed individual bent on spreading misinformation and generating beatups.
You could have acted like a reasonable human being ever since you first posted on the BB. However you did not take that option.
You have no one to blame but yourself for your complete lack of credibility.

rob
23-09-2004, 02:34 PM
As it happens, I was having a go at it tonight. I feel like I am re-inventing the wheel because I am trying to find a way of combining the master files on to one excel spreadsheet with all the same ID numbers lined up in the same row. I haven't cracked it yet. But don't worry, I won't bother asking a no mates patzer, pretend NSWCA president, dog in the manger, Conscientious Negative Technician, to help. So, ought to have worked out, that that is a tacit GF.

BTW, you needn't think that moving a perennial BB flaming fest into real-world combat will make you look like a decent person. You look very very little.

Gee Matthew, I can't think why Bill wouldn't want to assist you when you are so charming to him - it's all such a mystery to me :)

PHAT
23-09-2004, 03:07 PM
The point is your whole idea of trending is flawed.
There is no way to predict that just because someone improved this period they will improve the next or if their rating declined this period it will decline in the next.

If I cannot show that trending can be modelled, that does not mean that it cannot be modelled. It only means that I have not got the ability or data to do so.


You are just a loud mouthed, abusive do nothing individual.
Wrong. I use my loud mouth to abuse you.

You chose to behave on the BB like a foul mouthed individual ...
Hypocrite. It is a matter of degree.

... bent on spreading misinformation and generating beatups.


Previously you referred to my saying that the NSWJCL was not giving much support to representatives. I still stand by what I said. Nobody has provided a shred of evidence to show that I was wrong. When/if they do, I will retract. Until then put up or shut up.

PHAT
23-09-2004, 03:19 PM
Gee Matthew I can't think why Bill wouldn't want to assist you when you are so charming to him - its all such a mystery to me :)

Let me solve the mystery for you.

BG is scared that I might be able to show that trends can be identified and used to improve rating predictions.

BG is stupid. He could have used this opportunity to take the high moral ground and make himself look like a bigger person than me, but he hasn't. He just looks like the dog in the manger.

Bill Gletsos
23-09-2004, 03:20 PM
If I cannot show that trending can be modelled, that does not mean that it cannot be modelled. It only means that I have not got the ability or data to do so.
It has nothing to do with your ability or data.
There is no means of predicting if the player will improve or not in the next period. Your stubbornness in refusing to accept this just demonstrates further how foolish you are.


Wrong. I use my loud mouth to abuse you.
Actually it can be shown that you use your loud mouth to abuse all and sundry.
Too bad you never put even 1% of the effort you expend here in actually doing anything whilst on the NSWCA Council.


Hypocrite. It is a matter of degree.
Being crude and vulgar and foul mouthed has nothing to do with matters of degree. It just demonstrates to everyone just how uncouth you are.


Previously you referred to my saying that the NSWJCL was not giving much support to representatives. I still stand by what I said. Nobody has provided a shred of evidence to show that I was wrong. When/if they do, I will retract. Until then put up or shut up.
No you moron.
The onus is on you to check your facts before making false and incorrect claims. However you never attempt to do this.
By not even attempting to do this you just show that all you really care about is generating beatups where none exist.
No wonder everyone considers you are a complete joke.

Bill Gletsos
23-09-2004, 03:33 PM
Let me solve the mystery for you.
You couldn't solve a 3 move mate.


BG is scared that I might be able to show that trends can be identified and used to improve rating predictions.
You are deluding yourself as usual.

BG is a {deleted}.
This is good coming from a foolish useless cretin knowing essentially rubbish like you.

BG is stupid. He could have used this opportunity to take the high moral ground and make himself look like a bigger person than me, but he hasn't. He just looks like the dog in the manger.
All you are is a loud mouthed do nothing individual.
You had your chance to contribute to chess in NSW whilst on Council this year and you did nothing. Zip.
You have no credibility.

Kevin Bonham
23-09-2004, 11:47 PM
Should I split this now or is it going to get back on topic?

Trent Parker
23-09-2004, 11:58 PM
Ya can undelete my deleted posts now.... I cant remember what i said.... :D

Bill Gletsos
24-09-2004, 12:08 AM
Should I split this now or is it going to get back on topic?
We can but hope.

Garvinator
24-09-2004, 12:45 AM
Should I split this now or is it going to get back on topic?
you could always delete ;)

PHAT
24-09-2004, 03:30 PM
We can but hope.

You always have to get in the last word :rolleyes:

Bill Gletsos
24-09-2004, 03:46 PM
You always have to get in the last word :rolleyes:
Actually I thought gg had gotten in the last word. :lol:

PHAT
24-09-2004, 03:57 PM
Actually I thought gg had gotten in the last word. :lol:

This is the last word. :P

Rincewind
24-09-2004, 04:57 PM
This is the last word. :P

This thread is obviously going nowhere. Consider it parked.

rob
25-09-2004, 08:37 AM
I'd be interested in ppl's comments on Kevin's rating question (below - previous thread closed) but after the 3rd rating period if (a) they perform at 1100 for those 10 games or (b) they perform at 1300 for those 10 games.

I guess that ppl's (a) ratings will be pretty close :) but (b) will be far apart :(

I hope that ppl's answers and any reasons they kindly provide will be respected no matter how flawed they may appear :) By respected I mean not turning into aggressive personal attacks - just charmingly making ones point is much nicer and indicates confidence :) (perhaps I'm too keen on 'Yes Minister')

Kevins rating question:
A player who has never played a rated game before plays ten games in a ratings period and performs at 1000.
The player then plays another ten games in the next ratings period and performs at 1200 in those ten games.
Each ratings period is three months long.

All other things being equal (eg you don't know the RDs, volatilities or other results of the opponents) what should be the new rating of the player?

Spiny Norman
25-09-2004, 10:07 AM
My thoughts (better or worse) are as follows:

1) Has anyone looked at age-based trends in chess ratings over a long period (e.g. 20 years) to see whether people in different age brackets trend in a certain direction? Would this fall foul of age discrimination laws? Would there be a big enough sample size of people moving from one bracket to another to make it worthwhile? If there are trends (e.g. juniors "tend to improve", adults "are mostly stable" and seniors "tend to reduce in strength") then this could be accounted for mathematically.

2. On the question at hand, I would expect a rating around the 1150 mark to be "reasonable".

No mathematics there ... just what seems reasonable to me.

Frosty

Rincewind
25-09-2004, 10:47 AM
1) Has anyone looked at age-based trends in chess ratings over a long period (e.g. 20 years) to see whether people in different age brackets trend in a certain direction? Would this fall foul of age discrimination laws? Would there be a big enough sample size of people moving from one bracket to another to make it worthwhile? If there are trends (e.g. juniors "tend to improve", adults "are mostly stable" and seniors "tend to reduce in strength") then this could be accounted for mathematically.

I know of no study whatsoever that has looked at this. However, my belief is that there are a lot of things going on developmentally and a large number of interrelated variables which would be difficult to assess. The rating system is a results-based system. Your rating is designed to vary depending on the strength of your opponents and your scores against them. If your playing strength has increased, this will be reflected in the rating. There seems, therefore, no more need to introduce age-based rating formulae than, say, to start a registry of players (junior and senior) who are receiving coaching and introduce specific rating calculations for them.


2. On the question at hand, I would expect a rating around the 1150 mark to be "reasonable".

I think this is about right too. One period of 1000 performance then a period of 1200 performance (equal number of games). Assuming other factors are equal (like the opponent RDs) then the more recent results should have a greater weight leading to a rating of around 1150.

Do I think the player should be rated at 1400 due to the trend? No.


No mathematics there ... just what seems reasonable to me.

Is there a difference? I've always associated mathematics with what is reasonable.

Rincewind
25-09-2004, 10:54 AM
(below - previous thread closed)

Sorry about that. It seemed to be descending into a contest of one-upmanship, so I closed it to move that "debate" into the appropriate thread. I've now reopened it and merged the two threads back together.

Kevin Bonham
25-09-2004, 08:05 PM
I'd be interested in ppl's comments on Kevin's rating question (below - previous thread closed), but after the 3rd rating period, if (a) they perform at 1100 for those 10 games or (b) they perform at 1300 for those 10 games.

Along similar lines to my arguments on the original case:

(a) it would make little difference to the rating I would have them at after the second period, so about 1105.
(b) you have three periods of 1000, 1200 and 1300. The mean is 1167 but the 1000 performances are six months old whereas the 1300 performances are much more recent. I'd say about 1210.
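The "mean is 1167" arithmetic, and the idea of letting older periods count for less, can be sketched as a recency-weighted average of period performances. This is only an illustration of the reasoning in the post, not how Elo or Glicko actually compute a rating; the geometric `decay` factor is a hypothetical parameter chosen for illustration.

```python
def weighted_performance(perfs, decay):
    """Recency-weighted mean of period performance ratings.

    perfs is ordered oldest to newest. The newest period gets weight 1,
    the one before it `decay`, the one before that `decay**2`, and so on.
    decay=1.0 gives the plain mean; smaller values favour recent results.
    """
    n = len(perfs)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * p for w, p in zip(weights, perfs)) / sum(weights)


periods = [1000, 1200, 1300]  # performances in periods 1, 2 and 3 (case b)
print(round(weighted_performance(periods, 1.0)))  # plain mean -> 1167
print(round(weighted_performance(periods, 0.5)))  # hypothetical decay -> 1229
```

Where to set the decay is exactly the judgment call being debated here: equal weights give the plain mean of 1167, while halving the weight for each older period already lifts case (b) to about 1229.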

Kevin Bonham
25-09-2004, 08:20 PM
1) Has anyone looked at age-based trends in chess ratings over a long period (e.g. 20 years) to see whether people in different age brackets trend in a certain direction?

We've looked at that here very informally and selectively at times, and I think it's also been looked at in the US. It's difficult to monitor over 20 years because any system will get changed several times in that time, so the best way to look at it is by tracking players of different ages collectively as they age by a year. Juniors tend to improve, 35s-to-45s are fairly stable (I think Glickman uses this group for ratings pool monitoring for this reason), and over-45s tend to slowly decline. No surprises there; however, these are statistical averages only. For seniors, the average strength loss, which might be, say, 10-15 points a year, is swamped by individual variation. Some juniors never improve above their initial rating, or even put in good performances then lose interest and get worse. Some seniors are still improving well into their 60s.
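Tracking age groups as they age by a year, rather than following individuals over 20 years, can be sketched as averaging each player's one-year rating change within an age bracket. The bracket cut-offs and the six player records below are invented toy values for illustration only, not real rating data.

```python
from collections import defaultdict
from statistics import mean


def bracket(age):
    """Hypothetical age brackets loosely matching the groups in the post."""
    if age < 18:
        return "junior"
    if 35 <= age <= 45:
        return "35-45"
    if age > 45:
        return "over-45"
    return "other"


def mean_change_by_bracket(records):
    """records: (age_at_start, rating_at_start, rating_one_year_later).

    Groups each player's one-year rating change by the bracket they were in
    at the start of the year and returns the average change per bracket.
    """
    changes = defaultdict(list)
    for age, before, after in records:
        changes[bracket(age)].append(after - before)
    return {name: mean(deltas) for name, deltas in changes.items()}


# Invented toy records, purely to show the mechanics.
toy_players = [
    (14, 1200, 1350), (15, 1100, 1180),
    (40, 1600, 1595), (42, 1500, 1510),
    (60, 1700, 1690), (63, 1450, 1430),
]
print(mean_change_by_bracket(toy_players))
```

With real data the caveat in the post still applies: a bracket average of a few points a year says little about any individual, so the spread within each bracket (for example via `statistics.stdev`) matters as much as the mean.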


Would this fall foul of age discrimination laws?

Just examining it doesn't.

If there were an age-based ratings adjustment, I would be concerned about what would happen if a mature, improving player applying for an event for which the selection criterion was, as it usually is, "to rank the players in order of playing strength", narrowly missed selection because of their low rating. I think such a candidate would have some chance on appeal if not in court, although selectors are usually pretty good at seeing through any such quirks in the ratings.

Spiny Norman
25-09-2004, 09:43 PM
2. On the question at hand, I would expect a rating around the 1150 mark to be "reasonable".

... just realised that I only answered the "a" part of the question.

For the "b" part of the question, if they performed at 1300 for the next period, I would have thought a final rating around the 1265 mark, because they've now demonstrated a healthy increase over two consecutive periods, and therefore the likelihood that their performance is genuinely improving (not a fluke) is higher in my mind.

Frosty

Bill Gletsos
25-09-2004, 11:42 PM
Along similar lines to my arguments on the original case:

(a) it would make little difference to the rating I would have them at after the second period, so about 1105.
(b) you have three periods of 1000, 1200 and 1300. The mean is 1167 but the 1000 performances are six months old whereas the 1300 performances are much more recent. I'd say about 1210.
Elo (k=15) gives a rating of 1052 for a) and for b) a rating of 1087. If k=24 then a) 1075 and b) 1133.
Glicko gives a rating of 1104 for a) and for b) 1190.

In fact no current rating system would give a rating above 1200.
Anyone who believes b) should be over 1200 is ignoring the following. The number of games is still too small. The 20 games in the last two rating periods only represents a 89% confidence level.
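Bill's Elo figures, and the second-period figure of 1039, can be reproduced with a short sketch, but only under simplifying assumptions that the posts do not state: the player's first rating equals the first-period performance (1000); in each later period all ten opponents are rated exactly at that period's performance rating and the player scores 5/10 against them; and the rating is rounded to a whole number after each period.

```python
def expected_score(rating, opp_rating):
    """Standard Elo expected score against one opponent."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / 400))


def elo_period(rating, performance, games, k):
    """One rating period, ASSUMING every opponent is rated exactly at the
    period's performance rating and the player scores 50% against them."""
    actual = 0.5 * games
    expected = games * expected_score(rating, performance)
    # Rounding after each period is an assumption made to match the posts.
    return round(rating + k * (actual - expected))


def after_three_periods(third_perf, k):
    rating = 1000  # assumed: first rating = first-period performance
    rating = elo_period(rating, 1200, 10, k)
    return rating, elo_period(rating, third_perf, 10, k)


for k in (15, 24):
    second, case_a = after_three_periods(1100, k)
    _, case_b = after_three_periods(1300, k)
    print(f"k={k}: after period 2 = {second}, (a) = {case_a}, (b) = {case_b}")
```

Under these assumptions it prints 1039, 1052 and 1087 for k=15 and 1062, 1075 and 1133 for k=24. A real Elo calculation uses each opponent's actual rating and each game's result, so the exact numbers would shift with the field.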

Kevin Bonham
26-09-2004, 12:46 AM
Glicko gives a rating of 1104 for a) and for b) 1190.

Doesn't this again depend upon the RDs of the opponents, at least in the second case?


In fact no current rating system would give a rating above 1200.
Anyone who believes b) should be over 1200 is ignoring the following. The number of games is still too small. The 20 games in the last two rating periods only represents a 89% confidence level.

While 1210 was a rough figure that may be a touch on the high side (1190 sounds fine to me too), I don't think the above follows - after all the mean PR from these 20 games is 1250 not 1200, so it is not as if they are being taken as conclusive evidence that the first ten games are irrelevant.

I would certainly not consider figures around 1250 credible.

Bill Gletsos
26-09-2004, 12:53 AM
Doesn't this again depend upon the RDs of the opponents, at least in the second case?
To a certain degree.


While 1210 was a rough figure that may be a touch on the high side (1190 sounds fine to me too), I don't think the above follows - after all the mean PR from these 20 games is 1250 not 1200, so it is not as if they are being taken as conclusive evidence that the first ten games are irrelevant.
I'm not sure we are disagreeing. As you note, although the last 20 games give a PR of 1250, I was just pointing out that they don't meet an acceptable confidence level.


I would certainly not consider figures around 1250 credible.
Neither would I.
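Bill's "to a certain degree" can be illustrated with a sketch of a single Glicko-1 rating-period update, following Glickman's published Glicko-1 formulas. It omits the step that inflates the player's RD between periods, and the player's rating of 1100 and RD of 100, the opponents' rating of 1300 and both opponent RDs are hypothetical values chosen only to show the effect.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant q


def g(rd):
    """Down-weights results against opponents whose rating is uncertain."""
    return 1 / math.sqrt(1 + 3 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score against one opponent, discounted by g(RD)."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))


def glicko1_update(r, rd, results):
    """One Glicko-1 rating period (RD inflation between periods omitted).

    results is a list of (opponent_rating, opponent_rd, score) tuples.
    Returns the new (rating, rd).
    """
    inv_d2 = 0.0    # 1/d^2, the information gained this period
    weighted = 0.0  # sum of g(RD_j) * (s_j - E_j)
    for r_j, rd_j, s in results:
        e = expected(r, r_j, rd_j)
        inv_d2 += Q**2 * g(rd_j) ** 2 * e * (1 - e)
        weighted += g(rd_j) * (s - e)
    precision = 1 / rd**2 + inv_d2
    return r + (Q / precision) * weighted, math.sqrt(1 / precision)


# Hypothetical example: a player rated 1100 (RD 100) scores 5/10
# (treated as ten half-points) against opponents rated 1300.
settled = [(1300, 50, 0.5)] * 10     # opponents with well-known ratings
uncertain = [(1300, 300, 0.5)] * 10  # opponents with very uncertain ratings
print(glicko1_update(1100, 100, settled))
print(glicko1_update(1100, 100, uncertain))
```

Because the update is linear in the scores, ten half-points give the same result as five wins and five losses. With these made-up numbers the same 5/10 lifts the player noticeably more against the well-established (RD 50) opponents than against the uncertain (RD 300) ones, since g(RD) discounts results against opponents whose ratings are less reliable.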

Kevin Bonham
26-09-2004, 01:34 AM
Neither would I.

Matt gave 1250 after the second ratings period. I wonder what he would give after the third one. :hmm:

Spiny Norman
26-09-2004, 07:36 AM
The number of games is still too small. The 20 games in the last two rating periods only represents a 89% confidence level.

I have a further question then (following up on my question about age-based trends in ratings).

Has anyone considered the duration (i.e. number of moves) of the games to be relevant? Is it relevant, or a furphy?

What I mean is this: would 10 x 12-move draws be a different indicator of my strength than if I played 10 wins/draws/losses where my wins against slightly lower-rated opponents were all "quick wins" (fewer than 20 moves) and my draws and losses against higher-rated opponents were, say, 40+ move games? I have "played more chess" in those longer games against higher-ranked opponents because I have played far more moves than in the other games.

When I was starting out in chess all those years ago, "crushing" lower-ranked opponents and "holding out for a while" against higher-ranked opponents was how I measured my development. It was an indicator of my improvement (although it might also have indicated that my higher-ranked opponents were all just having off days I concede).

Although the number of games (20) gives a confidence level of 89%, what does the number of moves (fewer against lower-ranked players and more against equal or higher-ranked players) indicate in terms of confidence level?

I'm not seriously suggesting this ... just exploring the idea of "confidence levels" as pertaining to performance. From a practical perspective, nobody would want to see players holding out for move after move in clearly lost positions, just because it would help their rating to hang on for 5 more moves.

I'm more than happy with the ranking system "as is". It is what it is. We know the score. It doesn't (!) have to be perfect (what system is???).

Another thought. Were the games "home wins" or "away wins"? Home ground advantage might be a factor.... ;)

Cheers all,

Steve

Alan Shore
26-09-2004, 07:57 AM
I'm not seriously suggesting this ... just exploring the idea of "confidence levels" as pertaining to performance. From a practical perspective, nobody would want to see players holding out for move after move in clearly lost positions, just because it would help their rating to hang on for 5 more moves.

I think you answer your own question with this: some players resign quickly, while others will play on another 20 moves to mate despite being dead lost. Depending on their playing styles, some will win more quickly than others even with similar ratings. I for one enjoy endgames, so if I have an advantage I will often trade down and win decisively there rather than try to finish the game more quickly, and it's silly to suggest I should be penalised for that.


Another thought. Were the games "home wins" or "away wins"? Home ground advantage might be a factor.... ;)

There's a whole thread on this somewhere. I actually have performed better 'away' overall!


Edit: http://www.chesschat.org/showthread.php?t=578

Garvinator
26-09-2004, 09:15 AM
There's a whole thread on this somewhere. I actually have performed better 'away' overall!


Edit: http://www.chesschat.org/showthread.php?t=578
You know you have probably been here too long when you can recall topics that have been discussed before, find them and refer newer ppl to those threads :uhoh: :doh:

Alan Shore
26-09-2004, 12:32 PM
You know you have probably been here too long when you can recall topics that have been discussed before, find them and refer newer ppl to those threads :uhoh: :doh:

Too long.. yeah, it's about time for me to declare my innings closed isn't it? ;)

At least until my final exams are over.

Garvinator
26-09-2004, 06:56 PM
Too long.. yeah, it's about time for me to declare my innings closed isn't it? ;)

At least until my final exams are over.
Would that then be a declaration? :cool:

Kevin Bonham
26-09-2004, 09:32 PM
Has anyone considered the duration (i.e. number of moves) of the games to be relevant? Is it relevant, or a furphy?

Probably the latter from a ratings perspective overall, IMO. For instance there are 1400s players and there are 1400s players. I play some 1400s players who will throw the kitchen sink at me, attacking with sacrifices and wild tactics from the outset, beat me once in a blue moon and go down in a screaming heap before move 30 almost every other time. I play some other 1400s players who are solid and cautious but either too careful and timid or else simply a bit weak on their positional game, and I'll beat this kind of player nearly every time with the odd draw, but they'll frequently hang around past move 45 and even into the 60s before I convert the small advantages I've picked up through the game.


What I mean is this: would 10 x 12-move draws be a different indicator of my strength than if I played 10 wins/draws/losses where my wins against slightly lower-rated opponents were all "quick wins" (fewer than 20 moves) and my draws and losses against higher-rated opponents were, say, 40+ move games? I have "played more chess" in those longer games against higher-ranked opponents because I have played far more moves than in the other games.

That's true but we'd need to know if the chess was shuffling pieces in -/+ or worse positions where it is almost inevitable they will win, or actually going toe-to-toe with them in competitive positions. And we'd need to know if you were losing after move 40 because they'd slowly got the better of you or because you did well to that point but your endgame was hopeless. Also we'd need to know if the quick crushes were exactly that or if they were unsound attacks that the weaker player could have won against.


When I was starting out in chess all those years ago, "crushing" lower-ranked opponents and "holding out for a while" against higher-ranked opponents was how I measured my development. It was an indicator of my improvement (although it might also have indicated that my higher-ranked opponents were all just having off days I concede).

I measure my performance first and foremost by the score. After that the questions I ask are:

* against much lower-ranked opponents, did I maintain control of the game and take opportunities to obtain or increase an advantage? (The answer is far too often "no" in my case.)

* against much higher-rated opponents, was I competitive enough to force them to display their skills in order to beat me, and give them an opportunity to lose if they were having an off day?

When I'm playing a lower-rated player I don't care if I crush them or not. If I have a choice between winning two pawns and grinding them into the dust 50 moves later or playing a very promising attack that I'm not absolutely certain about, I'll take the pawns and the long hard grind every time.

eclectic
26-09-2004, 09:48 PM
Our ratings are based solely on the results of our games, not on our performance when playing them.

It's not as if Bill Gletsos compels tournament organisers to put all ratable games through Fritz or whatever so that a figure purporting to indicate our playing efficiency and effectiveness is then provided.

We simply assume that if your rating is higher, it's because overall you do play efficiently.

eclectic

Rhubarb
26-09-2004, 11:35 PM
^ :clap: ^ :clap: ^ :clap: