Rating survey results, Cultris will switch to ELO

5/21/2012 9:54:20 PM
Total Posts 11

Hi there

Well, the important news is right there in the title: Cultris will switch from the Bitpattern algorithm currently used to create the highscore list to good old ELO! The change will happen sometime at the beginning of next month and will of course take into account all the games that have been played up to then. The new ranking should describe the performance of all players much more accurately, and it will be easier to understand and more fun for everyone!

The survey

Since gdocs doesn't let me publish the summary, I'll quickly leave some notes about it here:
The average happiness with the system is 2.7 (on a scale of 1-5), so it is time to do something. A lot of people are unhappy with the accuracy and feel the system should respond better to their current performance. For more than one third it is also important to be able to understand the system. You clearly voted that players who haven't played in a while should lose some points; this is easy to implement. While about one third think the accuracy of top players is more important, the majority disagrees, so there will be no special treatment. You wished for the average to be around 1000, but were mostly indifferent about the question of rounded vs. exact values. ELO usually doesn't use floating-point values, though.
There were a few wishes for separate highscores for 1v1 games, monthly highscores and more. This is a bit outside of my jurisdiction, though, and for now we'll focus on improving the rating system in general. Let's first have one good ranking before wishing for more.

The new system

This is a school project for me that I'll present on the 6th of June. I'll give my code to Simon in two weeks, so there is still some time to optimize some values, but this is where it stands now:
Basically it will just be ELO with a K-factor of around 35 and an average score of 1000. The actual number for K is based on parameter optimization, and we're going to use the one that gives the best accuracy over all players. Currently I have it at 72 divided by the number of players in the round (so 36 in a 1vs1 game). This means it's better to win against a good player in 1vs1 than in a group with 4 other people.
New players will have a more adaptive rating in the beginning: they get a small multiplier bonus on the points they win / lose each round, a bonus that will slowly decrease during the first 50 games. This means a player's rank will be more volatile in the beginning and become more stable later on.
Further considerations like "lines sent" and so on could improve accuracy even more, but as it stands now they will not find their way into the system.
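
The K-factor rule and the newcomer bonus described above could be sketched roughly like this. The constant 72 and the 50-game window come from the post; the starting bonus of 1.5, its linear fade and the function names are assumptions made only for illustration.

```python
def k_factor(num_players: int, base: float = 72.0) -> float:
    """K-factor for a round: the base value divided by the number of players."""
    if num_players < 2:
        raise ValueError("a round needs at least two players")
    return base / num_players


def newcomer_multiplier(games_played: int, window: int = 50,
                        start_bonus: float = 1.5) -> float:
    """Multiplier on rating changes that fades to 1.0 over the first `window` games.

    The starting bonus and the linear fade are hypothetical; the post only
    says the bonus is small and decreases slowly during the first 50 games.
    """
    if games_played >= window:
        return 1.0
    remaining = (window - games_played) / window
    return 1.0 + (start_bonus - 1.0) * remaining


print(k_factor(2))  # 1vs1 game
print(k_factor(5))  # five-player room
print(newcomer_multiplier(0), newcomer_multiplier(25), newcomer_multiplier(50))
```

With these numbers a 1vs1 win moves the rating with K = 36, while each pairing in a five-player room only counts with K = 14.4.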

Comparisons

A simple way to see how accurate an algorithm is, is to implement it and then predict the outcome of the next game based on the current highscore. Some results for you:

Simple*: 62.8% correct predictions
Glicko: 66.6% correct predictions
Bitpattern: 66.8% correct predictions
ELO: 68.2% correct predictions
Improved ELO**: 68.7% correct predictions

*Simple system where winner gets +1 point, loser -1 point
**optimized K-Factor and more adaptive handling of newer players.
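
A minimal sketch of how such a prediction accuracy could be measured, assuming 1vs1 games replayed in chronological order and the plain ELO update with the usual 400-point scale. The game format, the function names and K = 36 are assumptions for the example; this is not the code behind the numbers above.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Standard ELO expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def prediction_accuracy(games, k: float = 36.0, start: float = 1000.0) -> float:
    """Replay (winner, loser) pairs and count how often the higher-rated player won.

    Games between equally rated players are skipped, since no prediction
    can be made for them.
    """
    ratings = defaultdict(lambda: start)
    correct = predicted = 0
    for winner, loser in games:
        r_w, r_l = ratings[winner], ratings[loser]
        if r_w != r_l:
            predicted += 1
            correct += r_w > r_l
        # Update the ratings only after the prediction has been scored.
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return correct / predicted if predicted else 0.0


games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
print(prediction_accuracy(games))
```

The important detail is the order: each game is predicted from the ratings as they stood before that game, and only then used to update them.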

In the end this means that ELO significantly outperforms the current system in terms of accuracy and should show a better representation of your real performance in the game. If you want to see a possible high score list created with the new system you can check the following document:
https://docs.google.com/spreadsheet/ccc?key=0ArhgFScG26ISdGZ5MXA1bmVYSHFWUF9pX0dIWmt6dWc#gid=0

The ratings in there are based on the data from February. To find the name / current rank behind a UserID, go to http://gewaltig.net/ProfileView.aspx?userid=4196 and change the end of the link to that UserID.

Well, that's it! Feel free to discuss it here, but don't forget to actually go and defend those ELO-Points now!

5/21/2012 10:42:22 PM
Total Posts 516

Re: Rating survey results, Cultris will switch to ELO

I had a look at the wiki page. Which values do you use for S_A in multiplayer rooms (e.g. when somebody comes in 2nd in a 5 player room)?

@Simon I would still like to see the Bitpattern ranking continued on the ranking page as a comparative value. It's more reliable for medium ranked players and cannot be tricked by somebody who only plays against his friends. And I would also like to see these values:

  • average BPM (average over the last 100 games, Cheese factory and Survivor excluded).
  • attack per minute ( [sent lines + blocked lines] / time )

5/21/2012 10:51:00 PM
Total Posts 257
“If they ever tell my story let them say I walked with giants (...) let them say I lived in the time of Achilles...”

Re: Rating survey results, Cultris will switch to ELO

Oh yeah, that's good news!

We finally have a reason to play and improve our ranking!

Thank you!

5/22/2012 12:52:10 AM
Total Posts 79

Re: Rating survey results, Cultris will switch to ELO

Awesome. I'm glad to hear it. I believe the new rating system will motivate players to improve more (and as a result, play more).

Other interesting stats that might deserve a separate leaderboard:

  • Rolling average BPM (as mISStAKE mentioned). Who's the fastest?
  • Rolling average lines per minute (including upstack and garbage). (I suspect this would have a strong correlation with Elo.)
  • Rolling average garbage cleared per minute. Who's the most defensive player?
  • Rolling average SPM. Although, there are some issues with this! This stat has a bias towards players who just happen to play in games of longer length, and it may not accurately reflect the player's attacking ability.
  • For the above reason, a more accurate measure of attacking power would be "adjusted SPM." Theoretically, aSPM could equal (a*(number of singles)+b*(number of doubles)+c*(number of triples)+d*(number of tetrises)+e*(number of combos that included a 1-Combo)+f*(number of combos that included a 2-Combo)...)/(minutes played), where a, b, c, etc. equal the number of lines each of those attacks sends when time=0.
  • Rolling average aFlux. This is simply (average garbage cleared per minute)+(aSPM), which would probably yield the strongest correlation with Elo. Which players are the most aggressive and defensive?
  • Rolling average keys per block. Who's got the most efficient finesse? Good for finding out if setting all your starting orientations as vertical will help or hurt you!
  • Cheese and Survivor Elo.
  • Rolling average Cheese and Survivor finish time (of those when the player wins the match).
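
To make the aSPM and aFlux definitions from the list above concrete, here is a small sketch. The attack names and the line values in `LINES_AT_T0` are placeholders, since the actual number of lines each attack sends at time=0 is not given in this thread; combos would simply be further entries in the same table.

```python
# Hypothetical lines sent per attack type at time=0 (the weights a, b, c, d).
LINES_AT_T0 = {"single": 0, "double": 1, "triple": 2, "tetris": 4}


def adjusted_spm(attack_counts: dict, minutes_played: float,
                 weights: dict = LINES_AT_T0) -> float:
    """Weighted attack count per minute, using the time=0 line values as weights."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    total = sum(weights[kind] * count for kind, count in attack_counts.items())
    return total / minutes_played


def a_flux(garbage_cleared_per_minute: float, aspm: float) -> float:
    """aFlux as defined above: garbage cleared per minute plus aSPM."""
    return garbage_cleared_per_minute + aspm


counts = {"single": 10, "double": 6, "triple": 3, "tetris": 2}
aspm = adjusted_spm(counts, minutes_played=2.0)
print(aspm, a_flux(5.0, aspm))
```

Because every attack is valued at its time=0 strength, a long game no longer inflates the number the way raw SPM would.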

Edit: A rolling average is preferable to a regular average, since players hate to be haunted by their past sub-par performances. It's a more accurate representation of the player's current abilities. Not only does it show if a player has gotten better, but also if a player has gotten worse. I have a feeling it would prevent players from making a bunch of new accounts just to make their stats look good, too.
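
A rolling average of this kind can be kept with a fixed-size window, as in the sketch below. The class name is made up for the example, and the default window of 100 games is borrowed from the "last 100 games" suggestion earlier in the thread.

```python
from collections import deque


class RollingAverage:
    """Average over the most recent `window` values only."""

    def __init__(self, window: int = 100):
        # A bounded deque drops the oldest entry automatically when full.
        self.values = deque(maxlen=window)

    def add(self, value: float) -> None:
        self.values.append(value)

    def average(self) -> float:
        return sum(self.values) / len(self.values) if self.values else 0.0


bpm = RollingAverage(window=3)
for game_bpm in [40, 50, 60, 90]:
    bpm.add(game_bpm)
print(bpm.average())  # the first game (40) no longer counts
```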

5/22/2012 3:05:11 PM
Total Posts 80

Re: Rating survey results, Cultris will switch to ELO

Cultris "will" switch to ELO

so do you mean it is still being worked on, and will be implemented soon?

or... "Well, that's it! Feel free to discuss it here, but don't forget to actually go and defend those ELO-Points now!"

do you mean it's already used now? XD .

5/22/2012 4:10:45 PM
Total Posts 11

Re: Rating survey results, Cultris will switch to ELO

"do you mean it's already used now? XD ."

It may not be used right now, but every game you play now will still count when the switch is done, so don't be lazy now :)

"I had a look at the wiki page. Which values do you use for S_A in multiplayer rooms (e.g. when somebody comes in 2nd in a 5 player room)?"

It counts as a win against each of the 3 weaker players and a single loss against the stronger player, though it takes into account the number of players in the room for the change of points - same as before. Concrete value: whatever gives the highest accuracy in the end, no guessing around involved.

"It's more reliable for medium ranked players"

What do you mean by that? While the gap in accuracy between the old and new system is indeed smallest for medium ranked players, Bitpattern still performs significantly worse.

5/22/2012 7:55:05 PM
Total Posts 516

Re: Rating survey results, Cultris will switch to ELO

(coming in 2nd against 4 opponents)

"It counts as a win against each of the 3 weaker players and a single loss against the stronger player, though it takes into account the number of players in the room for the change of points - same as before. Concrete value: whatever gives the highest accuracy in the end, no guessing around involved."

I think I understood. Let's say there are 4 players. Player A comes in 1st, B 2nd, C 3rd, D 4th.

The updated scores are:

  • R(A) += K * ( 1 - E(A,B) + 1 - E(A,C) + 1 - E(A,D) )
  • R(B) += K * ( 0 - E(B,A) + 1 - E(B,C) + 1 - E(B,D) )
  • R(C) += K * ( 0 - E(C,A) + 0 - E(C,B) + 1 - E(C,D) )
  • R(D) += K * ( 0 - E(D,A) + 0 - E(D,B) + 0 - E(D,C) )

where

  • R(A) is the rank (ELO points) of player A
  • E(A,B) is the expected score of A against B (winning chance)
  • E(A,B) + E(B,A) = 1
  • K depends on the number of players, here K = 72/4
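
The four update rules above can be sketched as one loop over all pairings. This is only an illustration: it assumes the standard ELO expected-score formula with a 400-point scale for E(A,B), and the function names are made up for the example.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """E(A,B): standard ELO expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_ratings(ratings_by_place: list, base_k: float = 72.0) -> list:
    """Return new ratings for players listed from 1st place to last.

    Every player is scored as a win against everyone placed below and a
    loss against everyone placed above, with K = base_k / number of players.
    """
    n = len(ratings_by_place)
    k = base_k / n
    new_ratings = []
    for i, r_i in enumerate(ratings_by_place):
        change = 0.0
        for j, r_j in enumerate(ratings_by_place):
            if i == j:
                continue
            score = 1.0 if i < j else 0.0  # i < j means i placed better
            change += score - expected_score(r_i, r_j)
        new_ratings.append(r_i + k * change)
    return new_ratings


old = [1000.0, 1000.0, 1000.0, 1000.0]  # A, B, C, D in finishing order
new = update_ratings(old)
print(new)
print(sum(new) - sum(old))  # pairwise changes cancel, so the total stays the same
```

For four equally rated players this gives +27, +9, -9 and -27 points, which shows how close 2nd place is to a neutral result rather than to a win.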

The problem is that coming in 2nd in a full room is nearly as good as winning. Actually it's fine by me, because I never come in first. Though, some Top 20 players will care. So I suggest using bigger K's for the winner. I know that this is difficult, because the sum of all ranks should stay the same. Here is an alternative for the example above:

  • R(A) += K * ( 1.5 - 1.5*E(A,B) + 1.5 - 1.5*E(A,C) + 1.5 - 1.5*E(A,D) )
  • R(B) += K * ( 0.0 - 1.5*E(B,A) + 1.0 - 1.0*E(B,C) + 1.0 - 1.0*E(B,D) )
  • R(C) += K * ( 0.0 - 1.5*E(C,A) + 0.0 - 1.0*E(C,B) + 1.0 - 1.0*E(C,D) )
  • R(D) += K * ( 0.0 - 1.5*E(D,A) + 0.0 - 1.0*E(D,B) + 0.0 - 1.0*E(D,C) )
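
A sketch of this weighted variant, again assuming the standard 400-point ELO expected score and hypothetical function names. Since each pairing uses the same weight on both sides and E(A,B) + E(B,A) = 1, the changes of every pair still cancel, so the sum of all ranks stays the same.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """E(A,B): standard ELO expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_ratings_weighted(ratings_by_place: list, base_k: float = 72.0,
                            winner_weight: float = 1.5) -> list:
    """Pairwise update where every pairing involving the winner counts more."""
    n = len(ratings_by_place)
    k = base_k / n
    new_ratings = []
    for i, r_i in enumerate(ratings_by_place):
        change = 0.0
        for j, r_j in enumerate(ratings_by_place):
            if i == j:
                continue
            # Index 0 is the winner; pairings with the winner get the extra weight.
            weight = winner_weight if 0 in (i, j) else 1.0
            score = 1.0 if i < j else 0.0
            change += weight * (score - expected_score(r_i, r_j))
        new_ratings.append(r_i + k * change)
    return new_ratings


old = [1200.0, 1000.0, 1100.0, 900.0]  # A, B, C, D in finishing order
new = update_ratings_weighted(old)
print(new)
print(sum(new) - sum(old))  # still zero up to rounding error
```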

(Bitpattern more reliable for medium ranked players)

What do you mean by that? While the gap in accuracy between the old and new system is indeed smallest for medium ranked players, Bitpattern still performs significantly worse.

I'm talking about players who will never meet, so their relative ranking can't be accurate. I know this from Cultris 1: e.g. Kent is currently ranked 5th with 80 BPM, because there are no good players from Viet Nam. Of course, the Cultris 2 community is more active, but it will still happen at ranks 50+. The strength of Bitpattern is that players must have proven their skill against enough players to be ranked high.

May I ask how your score changes over the course of time (gaining/losing score without playing)? Does it depend on your ELO points (more ELO points -> lose points faster) or on how many games you have played?

And some notes on Caffeine's proposals:

  • BPM vs. lines per minute (including upstack and garbage): too similar, one is enough
  • garbage cleared per minute: value may depend on your opponents; no need to downstack, when you block all lines
  • aSPM vs. aFlux: too similar, since # garbage cleared << lines sent
  • keys per block: how to handle DAS and 180°?
  • Cheese finish time: what happens, if one player is faster? time = winner's time * 9/# cleared lines?
  • Survivor finish time: hardly practicable, initial garbage insertion speed depends on the skill of the players

5/22/2012 8:35:15 PM
Total Posts 11

Re: Rating survey results, Cultris will switch to ELO

Yes, you understood correctly. As for your alternative: I might test whether it makes a difference, but not right now.

The change over time will not be that big, as it would decrease accuracy otherwise. At set intervals every score gets slightly normalized toward 1000, meaning that nothing changes for average players, players below 1000 gain a bit and players above 1000 lose a few points. This also helps keep the scores somewhat in check, so the max ELO scores don't rise indefinitely.
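
One simple way to get this behaviour is to move every score a small fraction of its distance toward 1000 at each interval. The sketch below is only a guess at the mechanism: the 1% rate and the function name are assumptions, and the thread does not say how large the step or the interval is.

```python
def normalize_toward_mean(rating: float, mean: float = 1000.0,
                          rate: float = 0.01) -> float:
    """Move a rating a small fraction `rate` of its distance toward `mean`."""
    return rating + rate * (mean - rating)


for r in (800.0, 1000.0, 1400.0):
    print(r, "->", normalize_toward_mean(r))
```

With a proportional step like this, players far above 1000 lose more points per interval than players only slightly above it.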

5/24/2012 11:13:17 PM
Total Posts 219

Re: Rating survey results, Cultris will switch to ELO

I don't think you should try an approximate solution to the "multiplayer ELO" problem like the one misstake suggested. As it stands, and as misstake explained, the approximation is good enough, because it is exactly as if every player had played one game against each of the other players to place themselves in a mini tourney. With the 1.5 constant that misstake proposed to introduce into the formula, we would completely leave the mathematical model behind ELO that we know is reliable.

If you really want to extend ELO to multiplayer in the most accurate way, that is, with the goal of finding a strength measurement for each player that allows predicting the outcome of a multiplayer game, you should take inspiration from the TrueSkill paper, which is easily found on Google.

Now, to come back to the original problem, which was that people were complaining about the highscore list: I don't understand why you would switch to ELO without first letting us try a "bug-free" Bitpattern system, since it clearly appears that the current Bitpattern algorithm is flawed, as the rankings in your Google doc do not match the current rankings at all. Anyway, I'm fine with the system you proposed, but I just think that the original complaints came from the fact that the highscore system was bugged, not that it was using the wrong system (Bitpattern as opposed to ELO).

5/25/2012 1:13:34 AM
Total Posts 79

Re: Rating survey results, Cultris will switch to ELO

"BPM vs. lines per minute (including upstack and garbage): too similar, one is enough"

A decent player might be able to clear a garbage row every three pieces on average. Let's say player A ignores the downstack entirely while player B downstacks all he can. Both players go exactly 60 BPM. Both players leave no upstack at the end of the match. Player A will produce 60/2.5= 24 LPM. Player B will produce ((1/3)*60)+(((2/3)*60)*(3/4))/2.5 = 32 LPM.
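
The two figures can be checked by evaluating the expressions exactly as written. Reading the divisor 2.5 as "pieces per cleared line" (10 cells per row, 4 cells per piece) is an interpretation, not something stated in the post.

```python
bpm = 60

# Player A: 60 pieces per minute at an assumed 2.5 pieces per cleared line.
lpm_a = bpm / 2.5

# Player B: expression copied from the post, with 60 replaced by `bpm`.
lpm_b = ((1 / 3) * bpm) + (((2 / 3) * bpm) * (3 / 4)) / 2.5

print(round(lpm_a, 2), round(lpm_b, 2))
```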

"garbage cleared per minute: value may depend on your opponents; no need to downstack, when you block all lines"

Nearly all of these metrics' values will be influenced by the opponent. If you play against an easy opponent, it's easier to do a high combo. If you play against a difficult opponent, you may have to slow down to tackle all the garbage he's sending you.

"aSPM vs. aFlux: too similar, since # garbage cleared << lines sent"

I'm not so sure that aSPM > garbage cleared always. I could be wrong, but in any case, it shouldn't matter. They won't be too similar. If a player's opponent is weak, he increases his aFlux by being more aggressive (sending garbage). If his opponent is strong, then he increases his aFlux by being more defensive (clearing garbage). aFlux's strength lies in the fact that you can have players who produce high aSPMs but don't win very much (overly aggressive) and you can have players who produce high (garbage cleared)/minute but don't win very much (overly defensive).

"keys per block: how to handle DAS and 180°?"

180 is a key press. DAS is not a key press. KPT simply equals (total keys)/(total pieces). I could fancy the formula up to take account of time used by DAS as a function of KPT, but I don't see why that would be so much better. It would be harder to understand, too.

"Cheese finish time: what happens, if one player is faster? time = winner's time * 9/# cleared lines?"

I guess that could work, but it just seems like excluding all lost games would be more consistent (maybe the player tends to choke towards the end?).

"Survivor finish time: hardly practicable, initial garbage insertion speed depends on the skill of the players"

Ah, I wasn't aware of that feature. Thanks for pointing that out to me.