A geeky question/suggestion about the ratings algorithm.

Suggestions on improving the site or comments in general?

Re: A geeky question/suggestion about the ratings algorithm.

Postby Groo » 16 Jun 2017, 15:44

nopunin10did wrote:
Though I'm sure their Elo has a lot more mess going on behind-the-scenes, I have been able to glean that it's roughly based on a model like this:

  • -20 for a loss
  • 120 for a solo
  • 50 each for 2-way
  • 26.667 each for 3-way
  • 15 each for 4-way
  • 8 each for 5-way
  • 3.333 each for 6-way
  • 0 for everyone in 7-way


I'm not sure what exactly you are referring to with this model, but I'm pretty sure ratings on PlayDip don't work this way now: I have only one ranked game, a solo victory, and I got around 400 points for it. So please explain :)

I glanced at Dixiecon, and I don't find it satisfying enough either. My idea was not to create a usable Elo system for PlayDip at this moment, but to "improve" the existing system a tad. It should encourage players to go for a solo even more: only the runner-up in a solo would get some points, or a ratings shield. For example, if you're the only survivor besides the victor, you get some points, let's say 10% of what the victor gains; if you finish second in a solo with more survivors, you get a ratings shield and lose no points.
I don't see the need for a zero-sum environment at all if we're talking about scores on this site alone. If we're talking about a "global" Dip Elo ranking system, then that should have a separate thread and be discussed much more thoroughly.
Groo
 
Posts: 23
Joined: 14 Nov 2016, 18:13
Class: Ambassador
Standard rating: 1457
All-game rating: 1460
Timezone: GMT


Postby NoPunIn10Did » 16 Jun 2017, 16:38

Wethalon wrote:The scoring system depends on rating differences. See here for chess:

https://en.wikipedia.org/wiki/Elo_ratin ... al_details

There are no standard changes for certain results. But yes it's zero-sum.


That's true. What I'm describing is the base case, where you assume equally skilled opponents and no variation in K-factors.
NoPunIn10Did

Variant GM, Designer & Collaborator
NoPunIn10Did
Premium Member
 
Posts: 721
Joined: 17 Aug 2011, 00:17
Location: North Carolina
Class: Ambassador
Standard rating: 1000
All-game rating: 1238
Timezone: GMT-5


Postby asudevil » 16 Jun 2017, 16:46

Groo wrote:I don't see the need for a zero-sum environment at all if we're talking about scores on this site alone. If we're talking about a "global" Dip Elo ranking system, then that should have a separate thread and be discussed much more thoroughly.


You have to have a zero-sum environment. Otherwise inflation or deflation screws with the ratings: people who have been here longer end up with higher ratings by default, based not on skill but on the number of games played.
Captain FANG, forum team championships WINNER
Part of the surviving nations of WW4/Haven

Unless I am in the cheater's subforum. 99% of what I say is NOT as a mod.

Want to play fantasy football next season? http://www.playdiplomacy.com/forum/viewtopic.php?f=8&t=56016
asudevil
Premium Member
 
Posts: 15773
Joined: 18 Jul 2011, 02:20
Class: Star Ambassador
Standard rating: 1339
All-game rating: 1513
Timezone: GMT-7


Postby NoPunIn10Did » 16 Jun 2017, 16:58

Groo wrote:
nopunin10did wrote:
Though I'm sure their Elo has a lot more mess going on behind-the-scenes, I have been able to glean that it's roughly based on a model like this:

  • -20 for a loss
  • 120 for a solo
  • 50 each for 2-way
  • 26.667 each for 3-way
  • 15 each for 4-way
  • 8 each for 5-way
  • 3.333 each for 6-way
  • 0 for everyone in 7-way


I'm not sure what exactly you are referring to with this model, but I'm pretty sure ratings on PlayDip don't work this way now: I have only one ranked game, a solo victory, and I got around 400 points for it. So please explain :)

I glanced at Dixiecon, and I don't find it satisfying enough either. My idea was not to create a usable Elo system for PlayDip at this moment, but to "improve" the existing system a tad. It should encourage players to go for a solo even more: only the runner-up in a solo would get some points, or a ratings shield. For example, if you're the only survivor besides the victor, you get some points, let's say 10% of what the victor gains; if you finish second in a solo with more survivors, you get a ratings shield and lose no points.
I don't see the need for a zero-sum environment at all if we're talking about scores on this site alone. If we're talking about a "global" Dip Elo ranking system, then that should have a separate thread and be discussed much more thoroughly.


As mentioned in a prior reply, the numbers I listed are not the whole story. They're the base case. Your 400 is likely an adjustment based on your K-factor, a value that shrinks or magnifies a win or draw by a certain amount independent of other players. Here, K-factors appear to start at high values, to push your rating further up or down in your earliest games (as a means of measuring your skill faster), and then shrink to a constant over time.

Your rating compared to that of your opponents prior to applying points is the other side. In the assumed base case, you're expected to get 20 points out of 140. Your adjustment is your victory minus your expectation, which is why a solo lands at 120 and a loss at -20.
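The base-case figures can be reproduced with a few lines of arithmetic. This is only a sketch under the assumptions stated in the thread (seven equally rated players sharing a pot of 140 points); `base_case_change` is a made-up helper name for illustration, not something taken from the site's code.

```python
from fractions import Fraction

def base_case_change(winners, players=7, pot=140):
    """Rating change per winner when `winners` players share the result.

    Assumes equally rated players, so each is expected to take an equal
    share of the pot. Pass winners=0 to get the change for a loser.
    """
    expected = Fraction(pot, players)
    actual = Fraction(pot, winners) if winners else Fraction(0)
    return actual - expected

# Print the whole base-case table: solo, 2-way ... 7-way, then a loss.
for n in range(1, 8):
    print(f"{n}-way: {float(base_case_change(n)):+.3f} each")
print(f"loss:  {float(base_case_change(0)):+.3f}")
```

Exact fractions are used so that values such as 26.667 and 3.333 are only rounded when printed.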

But let's say one player was three times as good as you, and the other five were twice as good as you. Then the Elo calculation would put your expected score at 10, the best player's at 30, and each of the other five at 20. For you, a loss would still be negative, but only half as much (-10 instead of -20). And if the best player landed in a large enough draw, they would lose points.
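The uneven example can be checked the same way. The sketch below assumes "three times as good" simply means three times the weight when the 140-point pot is split; the function names and the list layout are illustrative choices, not the site's method.

```python
def expected_points(strengths, pot=140):
    """Split the pot in proportion to each player's relative strength."""
    total = sum(strengths)
    return [pot * s / total for s in strengths]

def change(player, winners, expected, pot=140):
    """Actual share of the pot minus the expected share for `player`."""
    share = pot / len(winners) if player in winners else 0.0
    return share - expected[player]

# Index 0 is you (1x), index 1 is three times as strong,
# indices 2-6 are twice as strong.
strengths = [1, 3, 2, 2, 2, 2, 2]
expected = expected_points(strengths)

print(expected)                                # expected points per player
print(change(0, [1], expected))                # you lose to a solo
print(change(1, [1, 2, 3, 4], expected))       # best player in a 4-way draw
print(change(1, [1, 2, 3, 4, 5], expected))    # best player in a 5-way draw
```

Under these assumptions the best player still gains in a 4-way draw (35 against an expectation of 30) but loses points in a 5-way draw (28 against 30).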

And you absolutely need a near-zero sum environment for any ratings system where you don't want "I play more games" to become a trivial means of increasing ratings.

Postby Groo » 16 Jun 2017, 19:11

OK, thanks for the explanation, guys :D

Postby Wethalon » 18 Jun 2017, 14:20

Here is my best guess as to how the system actually works:

1) For each player (i = 1,2,...,7), divide their rating by 400 and exponentiate: p[i] = exp(Rating[i]/400)
2) Normalize the numbers you get by dividing by their sum. This gives a number between 0 and 1 representing each player's expected result: ExpectedResult[i] = p[i]/(sum_j p[j]).
3) ActualResult[i] is 1 over the number of winners if you are a winner, 0 otherwise. So it would be 1/3 for each member of a three-way draw.
4) Rating change is K * (ActualResult[i] - ExpectedResult[i]).

This system is guaranteed to be zero-sum as long as every player has the same K, because both ActualResult and ExpectedResult sum to 1 over all players. With two players, the system reduces to the same form as the Elo system in chess.
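The four steps can be sketched in Python as follows. This is a guess at the model exactly as described in this post, not the site's actual code: the scale of 400, the single shared K of 140, and all function names are assumptions. Note that chess Elo uses base 10 rather than e, which only changes the effective scale constant.

```python
import math

def expected_results(ratings, scale=400.0):
    """Steps 1-2: exponentiate each rating and normalize (a softmax)."""
    # Subtracting the top rating avoids overflow and leaves the ratios unchanged.
    top = max(ratings)
    p = [math.exp((r - top) / scale) for r in ratings]
    total = sum(p)
    return [x / total for x in p]

def rating_changes(ratings, winners, k=140.0, scale=400.0):
    """Steps 3-4: K * (actual - expected) for every player.

    `winners` is a set of player indices (a single index for a solo).
    """
    expected = expected_results(ratings, scale)
    share = 1.0 / len(winners)
    return [k * ((share if i in winners else 0.0) - e)
            for i, e in enumerate(expected)]

ratings = [1000, 1100, 1200, 1300, 1400, 1500, 1600]
changes = rating_changes(ratings, winners={0, 3})  # players 0 and 3 draw
print([round(c, 2) for c in changes])
print("sum of changes:", round(sum(changes), 9))
```

With one K for everyone the changes cancel out up to floating-point error. If each player had their own K, the sum would generally no longer be zero.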

Unknowns:
- The 400 in Step 1 is the value used in chess. I don't know if PD uses the same value.
- The K-factor in Step 4. This was discussed two posts above. It is set by the site and depends on how much you have played and/or your current rating. It likely starts at 400 and decreases in steps, probably always being a round number.

If someone is really dedicated, they could try to determine the unknowns from recently finished games (tricky: you'd need to know the ratings going in) 8-)
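As a rough illustration of that last idea, the K-factor can be backed out of a single observed rating change once the pre-game ratings are known. The sketch assumes the model in this post is right and that the scale really is 400 (both unknown); `implied_k` and the sample numbers are hypothetical.

```python
import math

def expected_share(ratings, player, scale=400.0):
    """Expected result for one player under the exp-and-normalize model."""
    top = max(ratings)
    weights = [math.exp((r - top) / scale) for r in ratings]
    return weights[player] / sum(weights)

def implied_k(ratings, player, winners, observed_change, scale=400.0):
    """Solve change = K * (actual - expected) for K."""
    actual = 1.0 / len(winners) if player in winners else 0.0
    surprise = actual - expected_share(ratings, player, scale)
    if abs(surprise) < 1e-12:
        raise ValueError("result matched expectation; K cannot be inferred")
    return observed_change / surprise

# Hypothetical game: seven players at 1000, player 0 solos and gains 400.
print(round(implied_k([1000] * 7, 0, {0}, 400.0), 1))
```

Repeating this over several games with different rating spreads would also constrain the scale, since a wrong scale gives inconsistent K values across games.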
Wethalon
Premium Member
 
Posts: 42
Joined: 14 Jun 2015, 04:35
Class: Star Ambassador
Standard rating: 1695
All-game rating: 1793
Timezone: GMT-5


Postby asudevil » 18 Jun 2017, 15:19

Also, each individual game isn't zero-sum, because solos get more than a 2-way draw, but it doesn't "cost" you more points to lose to a solo than to a 2-way draw.

And there are the surrenders, which take full losses immediately, and ratings shields, which DON'T cost you points when you lose.

So it's not entirely zero-sum for each game, although we try to keep it close. But it's pretty close to zero-sum for the site as a whole, hence our inflation of less than 0.1% over the last 3 years.
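To see how a ratings shield alone moves a game away from zero-sum, here is a small sketch. It assumes the equal-ratings base case with K = 140 and assumes a shield simply sets a loser's change to zero; how the site actually applies shields is not described in this thread, and the function name is made up.

```python
from fractions import Fraction

def changes_with_shields(winners, shielded, players=7, k=140):
    """Base-case rating changes, with losses waived for shielded players."""
    expected = Fraction(1, players)
    result = []
    for i in range(players):
        actual = Fraction(1, len(winners)) if i in winners else Fraction(0)
        delta = k * (actual - expected)
        if delta < 0 and i in shielded:
            delta = Fraction(0)  # assumed shield behaviour: the loss is waived
        result.append(delta)
    return result

plain = changes_with_shields(winners={0}, shielded=set())
shield = changes_with_shields(winners={0}, shielded={3})
print("net change without shield:", sum(plain))
print("net change with one shield:", sum(shield))
```

Each shielded loss adds that many points to the overall pool, which is the kind of small inflation the site-wide figure would capture.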

Postby NoPunIn10Did » 18 Jun 2017, 17:28

Based on the data I've been able to collect, the "400" value seems to be either 1200 or 1400. The K-factor changes based on the number of games you've played, but it appears to shrink to approximately 140 eventually. This value yields the -20 loss, 120 win base case mentioned earlier.

For Ancient Med games, where there are 5 players instead of 7, I think the K-factor is 100 instead. A base-case loss there counts the same -20, but a solo is worth 80, a 2-way 30, a 3-way about 13.3, and a 4-way 5.
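Both sets of base-case numbers are consistent with one formula. The working below assumes equally rated players and a single K; the symbols n (number of players), w (number of winners) and Δ (rating change) are introduced here for illustration only.

```latex
\Delta_{\text{win}}(w) = K\left(\frac{1}{w} - \frac{1}{n}\right),
\qquad
\Delta_{\text{loss}} = -\frac{K}{n}

% Classic: n = 7, K = 140
\Delta_{\text{loss}} = -\tfrac{140}{7} = -20, \quad
\Delta_{\text{win}}(1) = 140\left(1 - \tfrac{1}{7}\right) = 120

% Ancient Med: n = 5, K = 100
\Delta_{\text{loss}} = -\tfrac{100}{5} = -20, \quad
\Delta_{\text{win}}(1) = 80, \quad
\Delta_{\text{win}}(2) = 30, \quad
\Delta_{\text{win}}(3) \approx 13.33, \quad
\Delta_{\text{win}}(4) = 5
```

In both cases K equals 20 times the number of players, which is what keeps the base-case loss at -20.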

Postby Idols » 20 Jul 2017, 22:26

Maybe we are approaching this wrong. Ideally, I would want to know how good I am at playing Italy compared to other people, kind of like how best-country awards work in FTF Diplomacy tournaments.

So perhaps we could create a ranking sub-score for each country. This way, I would know that I am, say, the 289th best player at playing Austria but the 68th best at England.
This is a block of text that can be added to posts you make. There is a 300 character limit.
Yes, that is the unfortunate extent of my creativity.

Idols
Idols
 
Posts: 66
Joined: 07 Feb 2014, 19:00
Location: Chicago
Class: Star Ambassador
Standard rating: 1479
All-game rating: 1720
Timezone: GMT-6


Postby super_dipsy » 21 Jul 2017, 07:27

Some of that is available to you now, but only based on you versus the 'norm'. So in the Stats pages, you can see how solos split between the countries. Looking at your own stats, you can see how YOUR solo rate by each country matches that. But I understand that this does not allow you to see that you are ranked 203rd as an Italy player or whatever.

But this will always suffer from the points in the previous posts. If you want a statistics-based picture, then that can be done for lots of different slices on the fly, based on game histories. But this will not take account of the strength of your opposition. Someone doing well with Italy may only be doing so because they played a lot of games against weak or unreliable players. If you want to take the opposition into account, then you are back to needing a rating for every player for that country, which cannot be created on the fly but has to be built up historically, game by game. This can be done (as we have just done with Gunboat / Fog / the maps), but in your example, if we did it by country and wanted to be fair to players of all the variants, we would have to do it for many, many countries (e.g. Italy on Classic, Italy on 1900, Poland on Versailles, Greece on AM, etc.).

I think actually that this is a point I either saw here or in another thread. We need to differentiate between different cuts of the statistics that we would like (that could be created on the fly) and performance ratings (which have to be built game by game).
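The "on the fly" kind of statistic can be illustrated with a short sketch. The record layout, player names and results below are entirely hypothetical and do not reflect how the site stores games; the point is only that such a figure is a simple aggregate over history, in any order.

```python
from collections import defaultdict

# Hypothetical finished-game records: (player, country, result).
games = [
    ("alice", "Italy", "solo"),
    ("alice", "Italy", "loss"),
    ("alice", "Austria", "draw"),
    ("bob", "Italy", "loss"),
    ("bob", "England", "solo"),
]

def solo_rate_by_country(records, player):
    """Fraction of a player's games with each country that ended in a solo."""
    played = defaultdict(int)
    solos = defaultdict(int)
    for who, country, result in records:
        if who != player:
            continue
        played[country] += 1
        if result == "solo":
            solos[country] += 1
    return {country: solos[country] / played[country] for country in played}

print(solo_rate_by_country(games, "alice"))
```

A per-country performance rating is different: each update depends on the opponents' ratings at the time of the game, so the history would have to be replayed in order rather than simply counted.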
super_dipsy
Site Admin
 
Posts: 10698
Joined: 04 Nov 2009, 17:43
Class: Ambassador
Standard rating: (1000)
All-game rating: (956)
Timezone: GMT
