After not being able to make much sense of chess, I decided to do something I am more familiar with: I set out to learn how the Elo rating system works. As always these days, I started with Wikipedia. Sure enough, there is a good wiki page, along with a useful link to Mark Glickman's research page, which has a collection of papers with theoretical and historical background on chess rating.
Here is what I found. The rating difference is modelled as a random variable with a logistic distribution, and individual players have ratings that follow an “extreme value distribution”. M. Glickman also states that the logistic distribution is quite similar to the Gaussian (normal) distribution, which significantly simplifies things, at least for me. The sum or difference of two independent Gaussian random variables is another Gaussian random variable, which means that I can approximate the rating of an individual player with a Gaussian distribution (this is really just reverse engineering). Great! [NOTE: The Gaussian distribution is only a close approximation, as it has been found empirically that the tails of an individual player's distribution are not symmetric, which is a main reason why the “extreme value distribution” was chosen.]
Another search revealed that if you model individual playing strength with a Gaussian distribution, the standard deviation around the mean value is 200 rating points. Because the variances of two independent players add, the standard deviation of the difference in ratings is 200*sqrt(2) ≈ 282.84, and the mean is simply the difference between the two means.
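The 200*sqrt(2) step can be checked in a couple of lines. This is only a sketch: the helper name `difference_sigma` is mine, and it assumes the two players' performances are independent.

```python
import math

SIGMA_PLAYER = 200.0  # per-player standard deviation quoted in the text


def difference_sigma(sigma_a: float, sigma_b: float) -> float:
    """Standard deviation of the difference of two independent Gaussians.

    Variances add, so sigma_diff = sqrt(sigma_a**2 + sigma_b**2).
    """
    return math.hypot(sigma_a, sigma_b)


print(round(difference_sigma(SIGMA_PLAYER, SIGMA_PLAYER), 2))
```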
I quickly verified this by using scipy (scientific Python) to plot the PDFs of the rating difference (logistic PDF vs. Gaussian PDF with sigma=282.84), and everything matches pretty closely. If you want to do the same, do not forget the ln(10) factor: the Elo formula uses powers of 10 rather than e, so the scale of the logistic distribution is 400/ln(10).
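For anyone who wants to repeat the comparison without plotting, here is a minimal sketch that uses only the standard library instead of scipy. The function names are mine, and I am assuming the logistic scale is 400/ln(10), which is what the Elo expectation formula 1/(1 + 10^(-d/400)) implies.

```python
import math

ELO_SCALE = 400.0 / math.log(10)   # logistic scale implied by 1 / (1 + 10**(-d/400))
SIGMA_DIFF = 200.0 * math.sqrt(2)  # Gaussian sigma of the rating difference


def logistic_pdf(d: float, s: float = ELO_SCALE) -> float:
    """PDF of a zero-mean logistic distribution with scale s."""
    z = math.exp(-abs(d) / s)  # the PDF is symmetric; abs() avoids overflow in exp()
    return z / (s * (1.0 + z) ** 2)


def gaussian_pdf(d: float, sigma: float = SIGMA_DIFF) -> float:
    """PDF of a zero-mean Gaussian distribution."""
    return math.exp(-0.5 * (d / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Print both densities at a few rating differences.
for d in range(0, 801, 200):
    print(f"{d:4d}  logistic={logistic_pdf(d):.6f}  gaussian={gaussian_pdf(d):.6f}")
```

The two columns should stay close near zero and drift apart in the tails, where the logistic distribution is heavier.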
The main purpose of this simple exercise was not to determine the expectation that I will win or lose against another player given the rating differential X, as the formulas for that are trivial regardless of whether you use the logistic or the normal distribution. Instead, I wanted a quantitative measure of how much my game varies from one match to another. A standard deviation of 200 means that the variations are quite big: while one plays within +/- 200 points of one's average rating about 68% of the time, there is still about 16% of the time when you play really well, and another 16% when you play well below what you normally do. This is significant, because when I have these bad days I tend to play more in the hope of winning a game, and lose really badly.
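The 68% / 16% / 16% split follows directly from the Gaussian CDF. A small sketch, with `normal_cdf` being my own helper built on `math.erf`:

```python
import math


def normal_cdf(x: float, mean: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of a Gaussian distribution, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


SIGMA = 200.0  # assumed per-player standard deviation, as in the text

within = normal_cdf(200, sigma=SIGMA) - normal_cdf(-200, sigma=SIGMA)
above = 1.0 - normal_cdf(200, sigma=SIGMA)
below = normal_cdf(-200, sigma=SIGMA)

print(f"within +/-200: {within:.3f}, above +200: {above:.3f}, below -200: {below:.3f}")
```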
Another useful piece of information is the expectation that you will play X points above your average rating. This probability is not the same as the probability that you will win against a player whose rating is X points above yours. Here are some numbers I've derived using the Gaussian CDF for an individual player's rating, as well as for the rating difference.
| X | E(pl. strength > X), Gaussian CDF for individual player (%) | E(win given X), Gaussian CDF (%) | E(win given X), Elo formula (%) |
|---|---|---|---|
| 50 | 40.13 | 42.98 | 42.854 |
| 100 | 30.85 | 36.18 | 35.994 |
| 150 | 22.66 | 29.79 | 29.661 |
| 200 | 15.87 | 23.98 | 24.025 |
| 250 | 10.57 | 18.84 | 19.168 |
| 300 | 6.68 | 14.44 | 15.098 |
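The numbers above can be recomputed with a short script. This is my reconstruction rather than the original scipy code: it assumes sigma=200 for a single player, sigma=200*sqrt(2) for the difference, and the usual Elo expectation 1/(1 + 10^(X/400)) against an opponent rated X points higher.

```python
import math

SIGMA_PLAYER = 200.0
SIGMA_DIFF = SIGMA_PLAYER * math.sqrt(2)


def normal_sf(x: float, sigma: float) -> float:
    """P(Z > x) for a zero-mean Gaussian with the given sigma."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))


def elo_expected(deficit: float) -> float:
    """Elo expected score against an opponent rated `deficit` points higher."""
    return 1.0 / (1.0 + 10.0 ** (deficit / 400.0))


def table_row(x: float) -> tuple:
    """Return the three table columns for rating gap x, in percent."""
    return (
        100.0 * normal_sf(x, SIGMA_PLAYER),  # playing X or more above your average
        100.0 * normal_sf(x, SIGMA_DIFF),    # winning, Gaussian model of the difference
        100.0 * elo_expected(x),             # winning, Elo (logistic) formula
    )


for x in range(50, 301, 50):
    play_above, win_gauss, win_elo = table_row(x)
    print(f"{x:3d}  {play_above:6.2f}  {win_gauss:6.2f}  {win_elo:7.3f}")
```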
These values show (as expected) that your chance of playing, for example, X=100 points or more above your average rating is about 30.9%, compared to a roughly 36% chance of winning against an opponent whose rating is 100 points higher than yours. For X=300 this difference is even more pronounced. The implication is that if somebody could make a computer program that always plays at a fixed rating (the PDF of its rating is a delta function), you wouldn't be able to beat it as often as you would beat a human player with the same average rating.
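A quick Monte Carlo run illustrates the same point. It is a toy model under my own assumptions: each game is decided purely by who draws the higher performance, there are no draws, and the function name, seed, and number of games are arbitrary.

```python
import random


def simulate(gap: float, sigma: float = 200.0, games: int = 200_000, seed: int = 1):
    """Estimate how often you beat a fixed-strength engine vs. a human.

    Both opponents are rated `gap` points above your average. The engine always
    performs at exactly `gap`; the human's performance varies with the same sigma
    as yours. Returns (fraction beating engine, fraction beating human).
    """
    rng = random.Random(seed)
    beat_engine = 0
    beat_human = 0
    for _ in range(games):
        me = rng.gauss(0.0, sigma)     # my performance in this game
        human = rng.gauss(gap, sigma)  # the human opponent's performance
        beat_engine += me > gap
        beat_human += me > human
    return beat_engine / games, beat_human / games


engine_rate, human_rate = simulate(100.0)
print(f"beat fixed engine: {engine_rate:.3f}, beat human: {human_rate:.3f}")
```

With a gap of 100 points, the two fractions should land near the first and second columns of the table (about 0.31 and 0.36).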
These results are of course trivial and expected, but I thought I would do it for fun.
Some thoughts/questions related to computer ratings
As I was going through this exercise I thought of some questions related to chess engines and their ratings. There is a chance that somebody who is familiar with the subject of computer rating stumbles upon this page, so if that happens I would be happy to hear some feedback on this.
Let us imagine that it is possible to make a chess engine that has a fixed rating (the PDF of its playing strength is a delta function, with no variation in rating). How would one determine an engine's rating in a chess tournament where every engine has a fixed rating? Does an Elo difference, no matter how small, imply that the higher-rated engine always wins?
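Within the Gaussian model used above, one can at least see what happens as the spread shrinks. The sketch below is only an illustration of that model, not a statement about how engine ratings are actually computed; `win_expectation` is a name I made up, and it assumes the higher performance always wins.

```python
import math


def win_expectation(rating_gap: float, sigma_a: float, sigma_b: float) -> float:
    """P(A outperforms B) when A's mean rating is `rating_gap` above B's.

    Both performances are modelled as independent Gaussians.
    """
    sigma_diff = math.hypot(sigma_a, sigma_b)
    if sigma_diff == 0.0:
        # Degenerate (delta-function) case: the outcome is fully determined.
        if rating_gap > 0:
            return 1.0
        if rating_gap < 0:
            return 0.0
        return 0.5
    return 0.5 * math.erfc(-rating_gap / (sigma_diff * math.sqrt(2.0)))


# A 10-point favourite, with both players' spread shrinking toward zero.
for sigma in (200.0, 50.0, 10.0, 1.0, 0.0):
    print(f"sigma={sigma:6.1f}  P(favourite wins)={win_expectation(10.0, sigma, sigma):.4f}")
```

In this model the win expectation of the favourite climbs toward 1 as sigma goes to zero, so two truly fixed-strength engines would produce the same result every time, however small the gap.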
I can think of a simple solution to the computer rating problem. It would involve calibration of these fixed-rated engines against humans, provided that the engines are not too strong.
Another question that comes to mind is related to the standard deviation of existing chess engines. I am pretty sure that engines don't play at a fixed strength, as I know that they sometimes win and sometimes lose against engines with a different rating. The question is: is the standard deviation of chess engines the same as the standard deviation of human players? Intuitively, I would think that it shouldn't be the same; it should be smaller. On the other hand, it is probably impossible for a chess engine to always play at a fixed rating, simply because some of its moves will match the moves of a much stronger engine, and everything would then depend on how long the game lasts and what type of position develops. However, as I said, the variations from one game to another should be somewhat smaller than the variations in human games. Is this taken into account in chess engine ratings?