The new ELO-based ranking system

gamerman01

And man this part annoys me! Maybe you are using the wrong term.

A circular reference is “a formula that visits its own or another cell more than once in its chain of calculations, creating an infinite loop.”

There are no circular references. There isn’t even a formula. There are no infinite loops.
You can take any given player at any given time and tally up the points from each player they’ve defeated, in that player’s current tier which is always known, and similarly with each loss.

The major strength of this characteristic of the past system is that anyone, especially me, can accurately verify exactly how many points divided by how many games and error check. A player can see exactly why his average is what it is.

There is no circular reference. There is no infinite loop.
I do understand that there are instances where it matters which result is put in, in which order. I have never seen how this makes any significant difference, and the fact remains, every average can be verified by the wins and losses against various tiers. I don’t think a player is going to be 5.43 instead of 5.20 because of this novelty exception.

Anyway.

I am only trying to educate and work together to make the ELO system as awesome as it can be.
The lifetime, multi-year results are awesome, and are quite accurate because there is more data than in a single year and the inherent limitation of players with only 1 or 3 games played is overcome by the large quantity of data.

We’ll figure out how to make good single year results and playoffs, which have always been the top priorities of league play.

gamerman01

I understand it will come across that I am angry and defensive because “my” rankings are being criticized and shoved to the side, but that is really not why I’m writing this way.

I am enthusiastic about entering data, validating data, and working with MrRoboto and his (to me) incredible prowess with formulas, spreadsheets, math, and computers in general.
And I am super excited about the multi-year data and how the results for the past 3 years are stacking up to look very accurate to me. With a database and queries, filters, sorting etc. the sky is the limit. And it would have been good if this had happened 10 years ago.

So! Don’t get me wrong.

I just really want to educate when I see criticisms, even constructive, well-meaning and intelligent ones, that are made without understanding all the angles/factors/history/theory.

Not to mention it’s fun.

pacifiersboard

@gamerman01 said in Proposal for a new, ELO-based, ranking system:
…

We’ll figure out how to make good single year results and playoffs, which have always been the top priorities of league play.

What you write doesn’t read to me as if you were pissed, but as readjusting focus!

pacifiersboard

So the issues are:

the main period of time is each calendar year,
is the relevant strength of a player the one on the last day of the year, or the full-year average?

MrRoboto

I know we basically already decided to go for the new system, but gamerman felt the need to clear up some things that are wrong in his opinion. And I do too.

Your old spreadsheet had no formulas only because you did everything manually. I could do the ELO system completely manually too, it’s not high math. This stuff is taught in high school.

But once you try to automate it, you will need formulas. In fact, that’s exactly what I did with the old PPG system, and Google Sheets immediately alerted me that there are circular references. I had to enable iterative calculation and limit the loops to 50 iterations, since there are, in fact, infinite loops.
This is done with File -> Settings -> Calculations.
Without that, the spreadsheet would give only errors.
You can check out my automated version here:
https://docs.google.com/spreadsheets/d/1_agqROzXQHWdmmiCJXGs3oFARJ29euk6MHPXoQ-0wlI/

I already showed it to you multiple times, but I can try it one more time, this time a bit more detailed.

The cell with the Tier ranking (A) looks at the PPG-cell (B)
The PPG-Cell (B) is just total points (C) divided by number of games.
Total points looks at all the individual points a result has awarded. Let’s say there are 10 results for a specific player, so total points looks at 10 different cells, we can call them R1-R10. Now you just added these points manually and it’s almost impossible to determine which summand refers to which specific result. But even manually all of these 10 summands depend on the tier of the opponent.
So R1-10 look at the cells with the tier of the different opponents, let’s call them O1-O10.
O1-10 looks at PPG (B1-B10).
PPG (B1-B10) look at total points of these players (TP1 - TP10).
But in these cells TP1-TP10 are obviously summands that include the result against the original player so all of these look at cell A.
And there is already the loop.
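
The chain above can be sketched in Python to show why an automated version needs an iteration limit. This is only an illustration: the tier cut-offs, the points per tier, the starting PPG and all function names are invented here (the thread does not state the real values). The sketch recomputes every PPG from the opponents' current tiers and stops when nothing changes or after 50 passes, similar in spirit to the iterative calculation setting.

```python
# Hypothetical tier table: (minimum PPG, tier name, points for beating that tier).
# All numbers are invented for illustration only.
TIERS = [(5.0, "E", 8.0), (3.0, "M", 5.0), (0.0, "L", 3.0)]
LOSS_POINTS = {"E": 2.0, "M": 1.0, "L": 0.0}  # hypothetical points for a loss

def tier_of(ppg):
    """Tier (A) depends on PPG (B)."""
    for min_ppg, name, _ in TIERS:
        if ppg >= min_ppg:
            return name
    return TIERS[-1][1]

def win_points(tier):
    return next(points for _, name, points in TIERS if name == tier)

def recompute(results, ppg):
    """One pass: every player's PPG from the opponents' current tiers."""
    totals, games = {}, {}
    for winner, loser in results:
        totals[winner] = totals.get(winner, 0.0) + win_points(tier_of(ppg[loser]))
        totals[loser] = totals.get(loser, 0.0) + LOSS_POINTS[tier_of(ppg[winner])]
        games[winner] = games.get(winner, 0) + 1
        games[loser] = games.get(loser, 0) + 1
    return {player: totals[player] / games[player] for player in games}

def solve(results, start_ppg=3.0, max_passes=50):
    """Repeat passes until the PPG values stop changing or the cap is hit."""
    players = {player for result in results for player in result}
    ppg = {player: start_ppg for player in players}
    for n in range(1, max_passes + 1):
        new = recompute(results, ppg)
        if new == ppg:       # nothing changed: tiers and PPG agree with each other
            return new, n
        ppg = new
    return ppg, max_passes   # gave up, like the 50-loop limit

# Invented results as (winner, loser) pairs.
results = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
ppg, passes = solve(results)
```

Each pass can move a player across a tier boundary, which changes the points of everyone who played them, which is why the values have to be recomputed until they settle. Nothing guarantees that they always settle, hence the cap.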

All of that because you want to cover the very specific case of a newcomer being a tomato or a god.
I do admit that this very specific case can be covered a tiny bit better by the old PPG system. The Elo system has the K-factor for that very reason. I mean, look at Adam: he is not even a god (since he lost 1 game) and still climbed to #5 with only 7 games. 2 of the 6 wins are against low-level players and 2 others against medium ones.
In your extreme case, a god would crush even elite players and climb to #1 within 4 (FOUR!) games.

The one downside with the current system is that the established player would still lose points to a newcomer that might be a god. (or gain some against a tomato). There is no beating around the bush, the first 2, maybe 3 games are not covered by this extreme case.

jkeller (the current #1) would lose 68 points against a newcomer.
Axis-Dominion (the new #1) would still lose 62 if he lost against the new god.
jkeller would reclaim #1 and if he lost again, would lose 51 points.
With the 4th win against then #1 Axis-Dominion, the new god already is #1.

So for 2, MAYBE 3 games there is an argument that the established players would lose a bit too many points. However, as I already said, that’s the absolute extreme case and I wonder if this ever happened.

The old PPG would treat these 2-3 games a bit more fairly but sacrifices soooo many things for that (I already listed them all).

Concerning the playoff qualifications:
We can just use 6 games played the last year as a requirement.
The more recent games (the last year) always have a much, much bigger impact on the ELO rating.

If you are a top10 player on Jan1 but perform poorly during the year with at least 6 completed games, there is no chance you keep the ranking and you WILL drop out of the top10.
On the other hand, even if you are far from #10, you can easily climb to top10 if you perform accordingly.
I just showed you how fast you can get to #1.
So just take the best 8 (or however many playoff spots there are) who have completed at least 6 games this year and voila: you’ll most likely have the 8 best AT THE MOMENT, which automatically also means the best 8 of the year.
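
As a rough sketch of that qualification rule: filter on games completed this year, then sort by current rating. The `Player` class, the function name and the sample data are all hypothetical; only the "at least 6 games" and "best 8" parameters come from the post.

```python
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    rating: float         # current (lifetime) ELO rating
    games_this_year: int  # completed games in the current year

def playoff_field(players, spots=8, min_games=6):
    """Highest-rated players who completed at least `min_games` this year."""
    eligible = [p for p in players if p.games_this_year >= min_games]
    eligible.sort(key=lambda p: p.rating, reverse=True)
    return eligible[:spots]

# Invented sample data, only to show the filter and the sort.
sample = [
    Player("P1", 1900, 2),  # top rated, but too few games this year
    Player("P2", 1750, 9),
    Player("P3", 1600, 6),
    Player("P4", 1820, 7),
]
field = [p.name for p in playoff_field(sample, spots=2)]
```

In this made-up example P1 is left out despite the highest rating, because only 2 games were completed in the year.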

MrRoboto

Sorry, my mistake. Wrong year.
My bad!

pacifiersboard

@MrRoboto

would you also like to share a local copy of the new file with formulae? I am interested in getting it to my sandbox :)

gamerman01

@MrRoboto I deleted the post for you

It’s a firm point that the system I maintained was vulnerable to human error, and I relied on players to catch the occasional error that I would make.

I’m not arguing against a new and improved system, as I made very clear. I don’t have time to answer point by point, nor is there any need. I don’t think you understand all of the nuances and benefits of what we’ve been relying on, but it doesn’t matter.

MrRoboto

@MrRoboto said in Proposal for a new, ELO-based, ranking system:

https://docs.google.com/spreadsheets/d/1_agqROzXQHWdmmiCJXGs3oFARJ29euk6MHPXoQ-0wlI/

@pacifiersboard

gamerman01

@MrRoboto said in Proposal for a new, ELO-based, ranking system:

But once you try to automate it, you will need formulas. In fact, that’s exactly what I did with the old PPG system, and Google Sheets immediately alerted me that there are circular references. I had to enable iterative calculation and limit the loops to 50 iterations, since there are, in fact, infinite loops.
This is done with File -> Settings -> Calculations.
Without that, the spreadsheet would give only errors.
You can check out my automated version here:
https://docs.google.com/spreadsheets/d/1_agqROzXQHWdmmiCJXGs3oFARJ29euk6MHPXoQ-0wlI/

I already showed it to you multiple times, but I can try it one more time, this time a bit more detailed.

The cell with the Tier ranking (A) looks at the PPG-cell (B)
The PPG-Cell (B) is just total points (C) divided by number of games.
Total points looks at all the individual points a result has awarded. Let’s say there are 10 results for a specific player, so total points looks at 10 different cells, we can call them R1-R10. Now you just added these points manually and it’s almost impossible to determine which summand refers to which specific result. But even manually all of these 10 summands depend on the tier of the opponent.
So R1-10 look at the cells with the tier of the different opponents, let’s call them O1-O10.
O1-10 looks at PPG (B1-B10).
PPG (B1-B10) look at total points of these players (TP1 - TP10).
But in these cells TP1-TP10 are obviously summands that include the result against the original player so all of these look at cell A.
And there is already the loop.

As you said at the beginning, “once you try to automate it”. So we are each talking about a different thing. I merely asserted that there are no circular references to the (manual) system. You are saying there are circular references if you try to automate it. There is no impasse.

I have been doing the system that shows average points per game while at the same time reflecting the current tier situation of all players. You’re saying the computer can’t do that. Score a rare win for humanity.

I want the system automated and I want the ELO spreadsheet to be “it”, starting 1/1/24, so discussions about the different features of both systems are purely academic, OTHER THAN to demonstrate what features we would be losing if we don’t take a look at them and decide whether we can live without them or whether we should work together to find a way to duplicate, replicate, preserve features we don’t want to lose.

So it seems that what is on the table right now is playoffs. Pretty sure the vast majority want optional annual playoffs for each major version played.
The fact is, with lifetime ELO, some players start the year out ahead of others, and the ELO at the end of the year is largely reflective of most recent games, but not entirely. This is not necessarily a problem. It’s just one of the issues that needs to be discussed since we have a significant system change.

pacifiersboard

@MrRoboto

thx! I meant a copy of the ELO file that is enabling some (local) experimentation

Stucifer

@gamerman01

The fact is, with lifetime ELO, some players start the year out ahead of others, and the ELO at the end of the year is largely reflective of most recent games, but not entirely. This is not necessarily a problem. It’s just one of the issues that needs to be discussed since we have a significant system change.

One idea: make a separate sheet for the Yearly rankings that takes the player’s Lifetime Ranking at the start of the year as their starting ELO, then only include the games from the year. Not sure how to calibrate the K value for that, maybe have all at the middle number for 10+ games played (90). Can be discussed.

That way, Lifetime players with high ELO are still recognized for their skill but someone that makes a lot of progress in a new year can move up more rapidly even if they played a lot of games previously. And someone that loses a lot of games that year would move down more rapidly.
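
A minimal sketch of that yearly sheet, assuming the standard Elo update and the F-factor of 500 mentioned elsewhere in this thread. The fixed K of 90 is just the value floated above, and the function names and sample players are hypothetical.

```python
def expected_score(rating, opp_rating, f=500):
    """Predicted score (win probability) of `rating` against `opp_rating`."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / f))

def yearly_ratings(start_of_year, games, k=90, f=500):
    """Replay only this year's games, starting from the lifetime ratings.

    start_of_year: dict of player -> lifetime rating on Jan 1
    games: list of (winner, loser) tuples in the order they finished
    """
    ratings = dict(start_of_year)  # copy, so the lifetime values stay untouched
    for winner, loser in games:
        gain = k * (1 - expected_score(ratings[winner], ratings[loser], f))
        ratings[winner] += gain
        ratings[loser] -= gain
    return ratings

# Invented example: two equally rated players, one game.
jan1 = {"X": 1500.0, "Y": 1500.0}
result = yearly_ratings(jan1, [("X", "Y")])
```

With equal ratings and K = 90, the winner moves up 45 points and the loser down 45. The sketch assumes every player already has a rating on Jan 1; how to seed newcomers would still need to be decided.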

However, to @MrRoboto’s point, the K value is already large: K at 70 gives a 35-point swing for even ELO matchups, and upsets quite a bit more than that. So it seems like just using the Lifetime rating should suffice for making brackets. 🤔🤔

Stucifer

@mr_stucifer said in Proposal for a new, ELO-based, ranking system:

middle number for 10+ games played (90)

Correction: I think 7-9 games is a K of 90. Somewhere around 90-110 would be a significant acceleration of change compared to 70.

gamerman01

OK, so there is a “slider bar” on sensitivity that can easily be adjusted.
Reading MrRoboto’s recent post a second time, I see the sensitivity to the past 6+ games can be set to essentially make the current ELO rating (at 12/31/XX) reflect the results of the past 12 months, and the objective is met.

So then would we just want another set of results/rankings that has a lower sensitivity, like for a life-ELO number? If the same sensitivity, K factor, is used and the last, say, 6-10 results are what mostly determine the current ELO rating, then older results are nearly irrelevant?

Just trying to understand

MrRoboto

@pacifiersboard said in Proposal for a new, ELO-based, ranking system:

thx! I meant a copy of the ELO file that is enabling some (local) experimentation

You can create your own copy.

File -> Make a copy

Stucifer

@gamerman01 They aren’t irrelevant, but they do factor in less over time generally speaking. They are important because it tries to calibrate you to your appropriate rating quickly, then from there it is more of a maintenance process unless there is relative improvement.

At the extremes of the ELO Rating, even current games are worth little unless there is an upset. If a 1900 plays 1100 and wins they will not go up very many points, as it is expected. But if the 1900 loses the 1100 will receive a huge boost and the 1900 will fall quite significantly. This does reduce the incentive for the highest-ranked players to play the lowest-ranked players, but that was already the case.

I am not sure how, but think it would be awesome if we could implement a possible Bid allowance into drastically different ELO games. Say a 500+ rating difference could get double the usual bid but have the game be worth half the points. This might promote playing between the extremes in skill levels, if useful.

MrRoboto

@mr_stucifer has summarized it perfectly. You clearly understand some math!

I agree, the incentive is not very high to play a lower-skilled player. However, you WILL gain points, if you win. You just need to be ready to take that risk.

Remember, I can always change how huge the impact of an upset is, by lowering or increasing the F-Factor. Right now it is at 500, which means that the system expects a player with 500 more rating than another player to win in 90% of the cases.

A lower F-Factor would squeeze everyone closer together, so the difference between #1 and the last player is lower. The number of points lost when the worst player wins against the #1 still remains the same, however, so the points gained/lost are a higher percentage of the total spread.
Upsets therefore hurt the better player more and help the worse player more.

Conversely, a higher F-Factor would stretch the extreme ELO ratings at the top and bottom, so the points lost/gained do not have as big an impact.

With the current F-Factor of 500, the difference between the first and last player in the BM4 rankings is 966 points. So the system expects the #1 player to win 98.8% of games against the last one. Which sounds about right if you ask me.
Play this matchup 100 times and #1 will gain exactly 1 point 99 times and lose 79 points once.
The expected outcome is still positive for the better player…
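
The figures in this post can be reproduced with a short sketch, assuming the standard Elo expected-score formula with the F-Factor as the divisor and K = 80 (the value for later games mentioned in this thread). The function names are hypothetical, and the actual spreadsheet formulas may differ in detail.

```python
def expected_score(diff, f=500):
    """Predicted win probability for the player who is `diff` points ahead."""
    return 1 / (1 + 10 ** (-diff / f))

def rating_change(diff, won, k=80, f=500):
    """Points gained (positive) or lost (negative) by the higher-rated player."""
    actual = 1.0 if won else 0.0
    return k * (actual - expected_score(diff, f))

p_500 = expected_score(500)             # 500 points ahead
p_966 = expected_score(966)             # #1 against the last player
win_gain = rating_change(966, True)     # a little under 1 point
upset_loss = rating_change(966, False)  # about -79 points

# Average change per game if #1 really wins exactly as often as predicted.
average = p_966 * win_gain + (1 - p_966) * upset_loss
```

Under these assumptions a 500-point lead gives about 91% (the "90%" above), and a 966-point lead gives about 98.8%, a gain of roughly 0.9 per win and a loss of roughly 79 per upset. One thing the sketch makes visible: with unrounded values, `average` comes out as zero when the favourite wins exactly as often as predicted, so the small surplus in the 99-to-1 tally comes from rounding.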

MrRoboto

The system becomes stable with more games played, but still flexible enough to allow for adjustments when a player improves.

If your skill stays the same, you will oscillate around your “correct” ELO-Rating, gaining some points, losing some but always hover within a certain corridor around your skill level.

Your chance of breaking out of that corridor is when you actually improve your skill.

This is different than PPG, which becomes a lot more stable with many games.

If you play 50 games and you have, for example, a PPG of 4, that means you have 200 points.
Even a win against the best player around would increase that total to 208 points, but your PPG increases only to 4.08.
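
The same arithmetic as a tiny sketch. The function name is hypothetical, and the 8 points for the win are simply taken from the 200 to 208 step above; the 5-game comparison is an added illustration, not from the post.

```python
def ppg_after_game(ppg, games, points_won):
    """New points-per-game average after one more result."""
    return (ppg * games + points_won) / (games + 1)

veteran = ppg_after_game(4, 50, 8)  # 208 points over 51 games
rookie = ppg_after_game(4, 5, 8)    # 28 points over 6 games
```

The veteran's average moves from 4 to about 4.08, while the same win after only 5 games would move it to about 4.67, which is the "solidifying" effect described here.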

With the new system it doesn’t matter if you play 30, 50 or 100 games. After a certain threshold is reached (when the system “Found” your correct place), your elo rating will not “solidify” more. It might only become more accurate in finding the correct spot.

That’s why the K-factor is so important: We want to reach that threshold as fast as possible. I think after around 15-20 games every player is where he/she should be. That’s a lot for a single year, but not for multiple years.

MrRoboto

@mr_stucifer said in Proposal for a new, ELO-based, ranking system:

I am not sure how, but think it would be awesome if we could implement a possible Bid allowance into drastically different ELO games. Say a 500+ rating difference could get double the usual bid but have the game be worth half the points. This might promote playing between the extremes in skill levels, if useful.

I support this idea. And I think together we can find the sweet spot.

My idea would be to take the average bid up until that game.

And then for every bid point above that average, the ELO change could be adjusted by 2%.

Right now in BM, the Allied bid is 18.3 on average.

If I play Axis and give my opponent +30, that’s 12 more than average.
If we are at the same ELO level, I would usually gain 40 for a win or lose 40 for a loss.

Factoring in the bid (12 × 2% = 24%), I would gain 40 × 1.24 = 49.6 (so 50) points for a win, and I would only lose 40 × 0.76 = 30.4 (so 30) points for a loss.

An example for players with different ELO rating:

A 1800 wins against 1500.
Ratings change +16 and -16.

With a bid that is 12 higher than average, that changes to:

+20 and -12

Or a 1800 loses against a 1500
Usually that is +64 and -64.

But with a bid 12 higher than average, that changes to
+80 and -49
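
A sketch of the adjustment as it reads from the examples above: gains are scaled up and losses scaled down by 2% per bid point above the average. This reading is inferred from the numbers in the post, and the function name and parameters are hypothetical.

```python
def bid_adjusted(change, bid_above_avg, pct_per_bid=0.02):
    """Scale a normal ELO change for a game played with an above-average bid.

    change: the usual rating change (positive for a win, negative for a loss)
    bid_above_avg: how far the bid is above the running average bid
    """
    shift = pct_per_bid * bid_above_avg
    factor = 1 + shift if change > 0 else 1 - shift
    return change * factor

extra = 30 - 18  # a +30 bid, taking the average as 18 as in the post's example

even_win = bid_adjusted(40, extra)        # same ELO level, win
even_loss = bid_adjusted(-40, extra)      # same ELO level, loss
favorite_win = bid_adjusted(16, extra)    # 1800 beats 1500
underdog_loss = bid_adjusted(-16, extra)  # 1500 loses to 1800
```

With a bid 12 above average the factors are 1.24 and 0.76, giving 49.6 and -30.4 for the even matchup and about 19.8 and -12.2 for the 1800 vs 1500 game. Applied to the upset case, 64 becomes about 79.4 (the post gives 80) and -64 becomes about -48.6. Note that the two changes no longer cancel out, so the total number of rating points in the system would drift over time.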

Do you think that 2% per bid is too high or too low? Or about right?

MrRoboto

@MrRoboto said in Proposal for a new, ELO-based, ranking system:

That’s why the K-factor is so important:

To be precise: The difference in K between the first and later games is the important part. The absolute value of K is not necessarily very important.

Right now K changes from 120 (first 3 games) to 80 in later games. So the first 3 games give 50% more points than later ones.

Lowering the 80 would not only enhance the impact of early games in comparison, it would also narrow said corridor. You would oscillate a bit less around your “correct” rating.

If the difference between the early and the last games is too high, players with slightly more than 10 games might be far off from their correct spot when a couple of those early games are outliers.
