Ratings and History

Submitted by davidetal on Sun, 06/21/2009 at 7:00pm.

The dark science of ratings is beyond my mathematical ken, so why we end up with ratings like 1430 or 1683 or 2700 I have no idea. However, there does seem to be an excellent correlation between rating and historical strength.

Thus, a player with an 1800 rating today would have been a great player in the 18th Century. A current rating of 1450 would place you at the forefront of chess played in the 1400s. My personal goal is to achieve a rating that matches the present, that is, a 2000+ rating to go with the year as it is written. And then a 2100+ rating, which would place me firmly in the 21st Century.

GMs with 2700+ are several centuries ahead of most of us mere mortals, but would lose quickly to the GMs of the 27th Century.

Or so it seems to me.

 

Comments:

by flyingpawns - 4 months ago
China
Member Since: May 2009
Member Points: 184

cool

by cofresi - 4 months ago
United States
Member Since: Sep 2008
Member Points: 53

"cross-pollination". 

Wow. that's like a pretty fancy word to be using with people. 

You must have like books or something. 

by MikeRoesell - 4 months ago
Chicago United States
Member Since: Jan 2009
Member Points: 239

First of all, the reason you played better when you moved to the city is that you were playing higher-rated players and probably picking up pointers from them as well. As for the other theory, we see this in refuted ideas: the later the player, the more dubious lines they will know how to refute OTB. As for how the rating is made, here is someone else to tell you how it's done:

 

Mathematical details

Performance can't be measured absolutely; it can only be inferred from wins and losses. Ratings therefore have meaning only relative to other ratings. Therefore, both the average and the spread of ratings can be arbitrarily chosen. Elo suggested scaling ratings so that a difference of 200 rating points in chess would mean that the stronger player has an expected score of approximately 0.75, and the USCF initially aimed for an average club player to have a rating of 1500.

A player's expected score is his probability of winning plus half his probability of drawing. Thus an expected score of 0.75 could represent a 75% chance of winning, 25% chance of losing, and 0% chance of drawing. On the other extreme it could represent a 50% chance of winning, 0% chance of losing, and 50% chance of drawing. The probability of drawing, as opposed to having a decisive result, is not specified in the Elo system. Instead a draw is considered half a win and half a loss.
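The two extremes above can be checked with a few lines of Python. This is only a sketch; the function name score_from_probabilities is made up for illustration and is not part of any rating system's official code.

```python
def score_from_probabilities(p_win: float, p_draw: float) -> float:
    """Expected score: probability of winning plus half the probability of drawing."""
    return p_win + 0.5 * p_draw


# 75% wins, 25% losses, no draws
print(score_from_probabilities(0.75, 0.0))
# 50% wins, no losses, 50% draws
print(score_from_probabilities(0.50, 0.50))
```

Both calls give the same expected score, which is why the Elo system does not need to know how often games are drawn.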

If Player A has true strength R_A and Player B has true strength R_B, the exact formula (using the logistic curve) for the expected score of Player A is

E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}.

Similarly the expected score for Player B is

E_B = \frac{1}{1 + 10^{(R_A - R_B)/400}}.

This could also be expressed by

E_A = \frac{Q_A}{Q_A + Q_B}

and

E_B = \frac{Q_B}{Q_A + Q_B}

where Q_A = 10^{R_A/400} and Q_B = 10^{R_B/400}. Note that the same denominator applies to both expressions. This means that, by comparing only the numerators, Player A's expected score is Q_A / Q_B times that of Player B. It then follows that each 400 rating points of advantage over the opponent multiplies this ratio by ten.

Also note that E_A + E_B = 1. In practice, since the true strength of each player is unknown, the expected scores are calculated using the players' current ratings.
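As a rough illustration, the two equivalent forms of the expected score can be written in Python as below. The function names are assumptions chosen for this sketch, not taken from any federation's software.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Logistic form: E_A = 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def expected_score_q(r_a: float, r_b: float) -> float:
    """Equivalent form: E_A = Q_A / (Q_A + Q_B), with Q = 10^(R / 400)."""
    q_a = 10 ** (r_a / 400)
    q_b = 10 ** (r_b / 400)
    return q_a / (q_a + q_b)


e_a = expected_score(1613, 1609)
e_b = expected_score(1609, 1613)
print(round(e_a, 3), round(e_b, 3), round(e_a + e_b, 3))

# A 400-point advantage makes the ratio E_A / E_B equal to ten.
print(expected_score(2000, 1600) / expected_score(1600, 2000))
```

The two functions agree up to floating-point rounding, and the pair of expected scores always sums to 1.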

When a player's actual tournament scores exceed his expected scores, the Elo system takes this as evidence that player's rating is too low, and needs to be adjusted upward. Similarly when a player's actual tournament scores fall short of his expected scores, that player's rating is adjusted downward. Elo's original suggestion, which is still widely used, was a simple linear adjustment proportional to the amount by which a player overperformed or underperformed his expected score. The maximum possible adjustment per game (sometimes called the K-value) was set at K = 16 for masters and K = 32 for weaker players.

Suppose Player A was expected to score E_A points but actually scored S_A points. The formula for updating his rating is

R_A^\prime = R_A + K(S_A - E_A).

This update can be performed after each game or each tournament, or after any suitable rating period. An example may help clarify. Suppose Player A has a rating of 1613, and plays in a five-round tournament. He loses to a player rated 1609, draws with a player rated 1477, defeats a player rated 1388, defeats a player rated 1586, and loses to a player rated 1720. His actual score is (0 + 0.5 + 1 + 1 + 0) = 2.5. His expected score, calculated according to the formula above, was (0.506 + 0.686 + 0.785 + 0.539 + 0.351) = 2.867. Therefore his new rating is (1613 + 32· (2.5 − 2.867)) = 1601.

Note that while two wins, two losses, and one draw may seem like a par score, it is worse than expected for Player A because his opponents were lower rated on average. Therefore he is slightly penalized. If he had scored two wins, one loss, and two draws, for a total score of three points, that would have been slightly better than expected, and his new rating would have been (1613 + 32· (3 − 2.867)) = 1617.
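The tournament example can be reproduced with a short Python sketch. The helper names (expected_score, update_rating) are invented for this illustration, and K = 32 is the value used in the example above.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of a player rated r_a against an opponent rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update_rating(rating: float, opponents: list, results: list, k: float = 32) -> float:
    """Apply R' = R + K * (S - E) over a whole rating period."""
    expected = sum(expected_score(rating, opp) for opp in opponents)
    actual = sum(results)
    return rating + k * (actual - expected)


opponents = [1609, 1477, 1388, 1586, 1720]

# Loss, draw, win, win, loss: 2.5 points
print(round(update_rating(1613, opponents, [0, 0.5, 1, 1, 0])))

# Two wins, one loss, two draws: 3 points
print(round(update_rating(1613, opponents, [1, 1, 0, 0.5, 0.5])))
```

Only the total score matters in this simple version, so the order in which the results are listed in the second call makes no difference.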

This updating procedure is at the core of the ratings used by FIDE, USCF, Yahoo! Games, the ICC, and FICS. However, each organization has taken a different route to deal with the uncertainty inherent in the ratings, particularly the ratings of newcomers, and to deal with the problem of ratings inflation/deflation. New players are assigned provisional ratings, which are adjusted more drastically than established ratings, and various methods (none completely successful) have been devised to inject points into the rating system so that ratings from different eras are roughly comparable.

The principles used in these rating systems can be used for rating other competitions—for instance, international football matches.

Elo ratings have also been applied to games without the possibility of draws, and to games in which the result can also have a quantity (small/big margin) in addition to the quality (win/loss). See go rating with Elo for more.

Mathematical issues

There are three main mathematical concerns relating to the original work of Professor Elo, namely the correct curve, the correct K-factor, and the crude calculations used during the provisional period.

Most accurate distribution model

The first major mathematical concern addressed by both FIDE and the USCF was the use of the normal distribution. They found that this did not accurately represent the actual results achieved, particularly by the lower-rated players. Instead they switched to a logistic distribution model, which provides a better fit for the actual results achieved.[9]

Most accurate K-factor

The second major concern is the correct "K-factor" used. The chess statistician Jeff Sonas reckons that the original K=10 value (for players rated above 2400) is inaccurate in Elo's work. If the K-factor coefficient is set too large, there will be too much sensitivity to winning, losing or drawing, in terms of the large number of points exchanged. Too low a K-value, and the sensitivity will be minimal, and it would be hard to achieve a significant number of points for winning, etc.

Elo's original K-factor estimation was made without the benefit of huge databases and statistical evidence. Sonas indicates that a K-factor of 24 (for players rated above 2400) may be more accurate both as a predictive tool of future performance, and also more sensitive to performance.[10]
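To get a feel for what the K-factor does, here is a small Python sketch comparing the points gained from a single win between two equally rated players (expected score 0.5) under K = 10 and K = 24. The function name rating_change is made up for this example.

```python
def rating_change(k: float, actual: float, expected: float) -> float:
    """Points gained or lost for one result: K * (S - E)."""
    return k * (actual - expected)


for k in (10, 24):
    # A win (actual score 1.0) when the expected score was 0.5
    print(f"K={k}: {rating_change(k, 1.0, 0.5):+.1f} points")
```

The larger K more than doubles the swing from the same result, which is the sensitivity trade-off described above.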

Certain Internet chess sites seem to avoid a three-level K-factor staggering based on rating range. For example, the ICC seems to adopt a global K=32 except when playing against provisionally rated players. The USCF (which makes use of a logistic distribution as opposed to a normal distribution) has staggered the K-factor across three main rating ranges:

  • Players below 2100 -> K factor of 32 used
  • Players between 2100 and 2400 -> K factor of 24 used
  • Players above 2400 -> K factor of 16 used
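The three ranges listed above could be coded along these lines. This is a sketch of the list as quoted here, not the USCF's actual implementation; the function name is invented, and treating a rating of exactly 2400 as part of the middle range is an assumption, since the list does not say.

```python
def staggered_k_factor(rating: float) -> int:
    """K-factor by rating range, following the three-level list above."""
    if rating < 2100:
        return 32
    if rating <= 2400:  # assumption: exactly 2400 stays in the middle range
        return 24
    return 16


for r in (1613, 2250, 2550):
    print(r, staggered_k_factor(r))
```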

FIDE uses the following ranges[11]:

  • K = 25 for a player new to the rating list until he has completed events with a total of at least 30 games.
  • K = 15 as long as a player's rating remains under 2400.
  • K = 10 once a player's published rating has reached 2400, and he has also completed events with a total of at least 30 games. Thereafter it remains permanently at 10.
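The FIDE rules above depend on history as well as the current rating, so a sketch needs a little more state. The function and parameter names below are hypothetical, and the code follows only the three bullets as quoted here.

```python
def fide_k_factor(games_completed: int, ever_reached_2400: bool) -> int:
    """K-factor following the three FIDE rules listed above.

    games_completed: total rated games in completed events.
    ever_reached_2400: True once a published rating has reached 2400
    (hypothetical flag; after that K stays at 10 permanently).
    """
    if games_completed < 30:
        return 25  # still new to the rating list
    if ever_reached_2400:
        return 10  # stays at 10 even if the rating later drops
    return 15


print(fide_k_factor(12, False))   # new player
print(fide_k_factor(200, False))  # established, never reached 2400
print(fide_k_factor(200, True))   # has reached 2400 at some point
```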

In over-the-board chess, the staggering of K-factor is important to ensure minimal inflation at the top end of the rating spectrum. This assumption might in theory apply equally to an online chess server, as well as a standard over-the-board chess organisation such as FIDE or USCF. In theory, it would make it harder for players to reach much higher ratings if their K-factor sensitivity was lessened, from 32 to 16 for example, once they pass 2400. However, the ICC's help on K-factors indicates[12] that it may simply be the choosing of opponents that enables 2800+ players to further increase their rating quite easily. This would seem to hold true, for example, if one analysed the games of a GM on the ICC: one can find a string of games against opponents who are all over 3100. In over-the-board chess, it would only be in very high level all-play-all events that this player would be able to find a steady stream of 2700+ opponents – in at least a category 15+ FIDE event. A category 10 FIDE event would mean the players' average rating lies between 2476 and 2500. However, if the player entered normal Swiss-paired open over-the-board chess tournaments, he would likely meet many opponents rated below 2500 FIDE on a regular basis. A single loss or draw against a player rated below 2500 would knock the GM's FIDE rating down significantly.

Even if the K-factor was 16, and the player defeated a 3100+ player several games in a row, his rating would still rise quite significantly in a short period of time, due to the speed of blitz games, and hence the ability to play many games within a few days. The K-factor would arguably only slow down the increases that the player achieves after each win. The evidence given in the ICC K-factor article relates to the auto-pairing system, where the maximum ratings achieved are seen to be only about 2500. So it seems that random-pairing as opposed to selective pairing is the key for combatting rating inflation at the top end of the rating spectrum, and possibly only to a much lesser extent, a slightly lower K-factor for a player >2400 rating.

If you couldn't tell, this is straight from Wikipedia.

by Shivsky - 4 months ago
DFW United States
Member Since: Jun 2009
Member Points: 486

Well did you consider that one's rating is a sign of the kind of "competition" he/she overcame to achieve in his/her day?

A simplified example => I used to be stuck in a small town and my only chance of OTB tournament opposition was in the 1200-1600 range. I moved to a big city and then boom! I grew faster. So my theory is that with the world getting smaller and smaller (the industrial revolution, trade, colonization etc.), there was more "cross-pollination" and that led to stronger players playing stronger players and hence => adding to what would be the "top" rating of their day.

Just a thought :)

 
