Advanced analytics are all the rage in sports right now.
From the NBA abandoning the mid-range jumper for treys and layups to the NFL treating running backs like second-class citizens, a focus on analytics done by pointy-headed Ivy Leaguers is remaking the way sports are played and rosters are constructed.
Advanced analytics have yet to break into cyclocross, but during a recent episode of Cyclocross Radio, our crew of Bill, Micheal and me, on the spot, came up with a metric that quickly became known as the On Podium Percentage, or OPP for short—all credit is due to Bill because, well, you know, he was old enough to remember when that song was released.
Bill and I are admittedly huge basketball fans, so we have long hinted at the idea of developing “advanced metrics” for cyclocross. The on-the-spot creation of the OPP inspired me to think more about what we can measure by delving deeper into rider results.
This post shares some ideas I came up with after some intense brainstorming. I am not a pointy-headed Ivy Leaguer, just an over-educated state school product, so while I am likely not destined for a spot in the front office of my Milwaukee Bucks, hopefully there are a few interesting nuggets to be gleaned from this exercise.
The Assumptions
As I more or less said on the first edition of the Groadio power rankings last summer, when it comes to dealing with numbers, it is not as much about the numbers as the assumptions you make before you do the calculations. The same is true for the Cross Metrics presented here.
Before we go further, many thanks are due to crossresults.com and cyclocross24.com for compiling results into easy-to-use databases.
Despite their work, we are still limited to finishing positions. In a perfect world, we would have lap-by-lap results and times, but for now, all we have to work with are the placings from UCI races domestically and abroad.
For this exercise, I did calculations for international riders and domestic riders, with some caveats.
International events were defined as World Cups and Low Country events that held Elite races—Elite Euros counted, U23 and Junior Euros did not. National Championships also were not included.
Domestic races were defined as Elite American and Canadian events. The World Cups did not count since they were included with the International races. Same story holds for National Championships.
While the UCI, CrossResults and USA Cycling use a rolling 12-month window, the first-ever Cross Metrics were calculated based on only the 2019-20 season. While there is some continuity between seasons for riders, cyclocross has shown itself to be a fickle sport, and a rider’s performance one season is not necessarily the same as it is in another.
Two final assumptions before we move on: a rider has to have 5 results to qualify for inclusion in the CM, and the max score for any one race is 30, whether it be a DNF or a really bad race.
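As a rough sketch of how those last two assumptions might be applied before any metric is calculated, the snippet below caps every result at 30 and checks the 5-result minimum. The names `score` and `qualifies`, the use of `None` to stand in for a DNF, the reading of "5 results" as "at least 5," and the sample season are all assumptions made for illustration, not the actual spreadsheet behind the Cross Metrics.

```python
from typing import List, Optional

MAX_SCORE = 30    # cap for any one race, per the assumptions above
MIN_RESULTS = 5   # results needed to qualify (read here as "at least 5")


def score(place: Optional[int]) -> int:
    """Turn one finish into a capped score; None is this sketch's stand-in for a DNF."""
    if place is None:
        return MAX_SCORE
    return min(place, MAX_SCORE)


def qualifies(scores: List[int]) -> bool:
    """A rider needs enough results to be included."""
    return len(scores) >= MIN_RESULTS


# Made-up season: a win, a 3rd, a DNF, a 42nd, a 2nd and a 7th
raw = [1, 3, None, 42, 2, 7]
scores = [score(p) for p in raw]
print(scores)             # [1, 3, 30, 30, 2, 7]
print(qualifies(scores))  # True
```

With this setup, a DNF and a 42nd-place finish are treated identically, which is the point of the cap.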
The Metrics
Now that I have made an ass out of you and me, it is time to move on to the metrics. My ostensible forte is in statistical hydrology, so there are likely more holes in these metrics than in a 32-spoke rim. Feel free to let me know what I can do better.
OPP – On Podium Percentage
Any way you cut it, a podium finish is the gold standard in cycling, so what better way to kick off the Cross Metrics than with the On Podium Percentage, or OPP. Yes yes, you know me.
Calculating the OPP is simple enough:
OPP = Podium finishes / Races Started
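For anyone who wants to play along at home, here is a minimal sketch of that formula. The function name `opp` and the ten-race season are made up for illustration, and a podium is assumed to mean a finish of 3rd or better.

```python
def opp(scores):
    """On Podium Percentage: podium finishes divided by races started."""
    podiums = sum(1 for s in scores if s <= 3)  # assumes podium = top 3
    return podiums / len(scores)


# Made-up season of ten starts, including one capped score of 30
example = [1, 1, 2, 4, 3, 30, 1, 5, 2, 3]
print(f"{opp(example):.2f}")  # 0.70
```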
The numbers that stand out for OPP are Mathieu van der Poel’s 100% OPP and the 92% OPPs of Ceylin Alvarado on the international stage and Maghalie Rochette at home in North America.
Admittedly, also surprising is the 36% OPP for defending Elite Women’s World Champ Sanne Cant.
If you are looking for a counterargument to the final CX Heat Check that put Kerry Werner in the Number 1 slot ahead of Curtis White, you could look to White’s 88% OPP to Werner’s 65% for the same metric.
WAPP – Wide Angle Podium Percentage
Cyclocross Radio’s The Media Pit is part of the Wide Angle Podium network (click here to subscribe!), so it only makes sense that there is a Wide Angle Podium (TM) Percentage metric that measures how often riders finish in the Top 5.
WAPP = Top 5 Finishes / Races Started
PSZP – Podium Scrub Zone Percentage
The Scrub Zone of cyclocross has been well established thanks to the work of Colin Reuter and others, but credit for the Podium Scrub Zone goes to my friend Narayan, as far as I know.
For our purposes, the Podium Scrub Zone—or “thereabouts” as they call it in Europe—is defined as 4th through 6th place. Good results no doubt, but still the scrub zone with respect to those who get to stand up on the stage and get the accolades.
PSZP = 4th – 6th Place Finishes / Races Started
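WAPP and PSZP are the same kind of calculation with different place ranges, so one hedged way to sketch both is with a single helper. The names `place_share`, `wapp` and `pszp` are hypothetical, and the season is made up.

```python
def place_share(scores, best, worst):
    """Share of starts that finished between `best` and `worst` place, inclusive."""
    hits = sum(1 for s in scores if best <= s <= worst)
    return hits / len(scores)


def wapp(scores):
    """Wide Angle Podium Percentage: Top 5 finishes / races started."""
    return place_share(scores, 1, 5)


def pszp(scores):
    """Podium Scrub Zone Percentage: 4th through 6th place finishes / races started."""
    return place_share(scores, 4, 6)


# Made-up season of ten starts
example = [1, 1, 2, 4, 3, 30, 1, 5, 2, 3]
print(f"{wapp(example):.2f}")  # 0.90
print(f"{pszp(example):.2f}")  # 0.20
```

Note that 4th and 5th place count toward both metrics, so WAPP and PSZP overlap rather than splitting the results into separate buckets.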
Bad Legs Days
Pro or amateur, we have all had those “bad legs days.” Whether a BLD be a lame excuse or a legit bad day, we are allowed an off day.
Unfortunately, those supposed “bad legs days” happen more often for some than others. The Bad Legs Day (BLD) metric tries to get at those riders who are most prone to off days.
Bad days are obviously relative from rider to rider. In recent years, a second-place finish has been considered the end of the world for Mathieu van der Poel, while it might take a 10th or worse finish to be considered an off day for others.
For the Cross Metrics, a finish 5 places or more worse than that rider’s median finish for the year is considered a BLD. The decision to use 5 places worse and the median result are admittedly both arbitrary, so feel free to argue otherwise.
BLDs are reported as both an absolute value and a percentage of races started.
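A minimal sketch of the BLD count, assuming results have already been capped at 30 and that "5 places or more worse" means the finish minus the median is at least 5. The function name `bad_legs_days` and the sample season are made up.

```python
from statistics import median

BLD_THRESHOLD = 5  # places worse than the rider's median, per the definition above


def bad_legs_days(scores):
    """Return (number of BLDs, BLDs as a share of races started)."""
    med = median(scores)
    count = sum(1 for s in scores if s - med >= BLD_THRESHOLD)
    return count, count / len(scores)


# Made-up season of eight starts; the median finish is 4.5
example = [2, 3, 3, 4, 5, 9, 12, 30]
count, share = bad_legs_days(example)
print(count, f"{share:.2f}")  # 2 0.25
```

In this example the 9th place just misses the cut (4.5 places worse than the median), while the 12th and the 30 both count.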
80% Rule
Consistency can be an underappreciated thing in cyclocross, especially for riders who are not necessarily on the podium week in and week out. The 80% Rule metric seeks to get at a rider’s results consistency.
It is not, as one might expect, named for the number of times a rider got pulled by the 80% rule. What it does is identify the range that 80% of a rider’s results fall between.
Now, if we were doing Science!, it would be the 95% rule, and we would make sure our p-value was less than 0.05. However, since this is cyclocross, it made more sense to calculate the 10th and 90th percentiles for a rider’s results.
Keep in mind, this is a statistical analysis of the range a rider is most likely to finish within.
The most impressive of these are Van der Poel’s 1-1 and Rochette’s domestic 1-1.
Speaking to that measure of consistency, new Tormans CX Team teammates Quinten Hermans and Corne van Kessel have 80% Rule spreads of 2-8.6 and 3-8.5, respectively. So even though they are not necessarily winning races, they are pretty consistently finishing on the edge of the podium and no worse than about 8th or 9th place.
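The 80% Rule calculation can be sketched as below. The post does not say which percentile method was used, so the linear interpolation between neighboring results here is an assumption (it happens to be a common spreadsheet and NumPy default), and the function names and sample season are made up.

```python
import math


def percentile(values, p):
    """p-th percentile using linear interpolation between sorted values (an assumed method)."""
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def eighty_percent_rule(scores):
    """Return (10th percentile, 90th percentile, spread between them)."""
    low = percentile(scores, 10)
    high = percentile(scores, 90)
    return low, high, high - low


# Made-up season of eight starts
example = [2, 3, 3, 4, 5, 9, 12, 30]
low, high, spread = eighty_percent_rule(example)
print(f"{low:.1f} - {high:.1f}, spread {spread:.1f}")  # 2.7 - 17.4, spread 14.7
```

A rider who wins every race would come out at 1.0 - 1.0 with a spread of 0.0 under this approach, whatever the interpolation method.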
Average Placing Difference
The Average Placing Difference, or APD, is another metric designed to assess a rider’s consistency. The metric measures the average difference from a rider’s median finish across all races.
APD = Σ│(Finish – Median)│ / Races Started
Note: the little │ thing means absolute value, and sigma (Σ) stands for sum.
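Here is a minimal sketch of the APD formula, along with a look at how much a single capped result of 30 can move it. The function name `apd` and both ten-race seasons are made up for illustration; they are not any rider's actual results.

```python
from statistics import median


def apd(scores):
    """Average Placing Difference: mean absolute difference from the median finish."""
    med = median(scores)
    return sum(abs(s - med) for s in scores) / len(scores)


# Made-up season: nine wins and one 2nd place
clean = [1] * 9 + [2]
# Same season with one of the wins swapped for a DNF scored as 30
with_dnf = [1] * 8 + [2, 30]

print(f"{apd(clean):.1f}")     # 0.1
print(f"{apd(with_dnf):.1f}")  # 3.0
```

The median stays at 1 in both cases, but the lone DNF adds 29 places of difference on its own, which is why one bad day can dominate the APD over a short season.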
For APD, Van der Poel provides an interesting case study because as it stands, he has an APD of 0.1, but if he had taken a DNF or Van der Quit at Ronse like he was prone to do in the past, his APD would have jumped to 1.4.
That would not have necessarily captured how well he was racing, but sometimes, at least theoretically, the math is the math.
Internationally, the APD shows how well the top women have ridden. Alvarado and Worst have APDs of 0.9 and 1.6, which are much better than the 3.6 sported by Sanne Cant.
Lucinda Brand’s value is 3.8, but she falls into that theoretical Van der Poel situation after getting crashed out at Loenhout and taking a DNF. Without that 30 value, her APD would be in line with Alvarado’s.
A Sampling of Cross Metrics
Cross Metrics for what is admittedly an arbitrary number of top domestic and international riders are shown in the tables below.
To keep things interactive, the data are sortable by each Cross Metric.
Domestic Women: 2019 CrossMetrics
Rider | OPP | WAPP | PSZP | Bad Legs Days | BLD Percent | 80% Rule | 80% Spread | APD |
---|---|---|---|---|---|---|---|---|
Maghalie Rochette | 0.92 | 0.92 | 0.00 | 1 | 0.08 | 1.0 - 1.0 | 0.0 | 0.5 |
Clara Honsinger | 0.87 | 0.87 | 0.00 | 2 | 0.13 | 1.0 - 5.4 | 4.4 | 1.2 |
Caroline Mani | 0.57 | 1.00 | 0.43 | 0 | 0.00 | 2.0 - 5.0 | 3.0 | 1.1 |
Rebecca Fahringer | 0.75 | 0.85 | 0.10 | 3 | 0.15 | 1.0 - 7.0 | 6.0 | 1.7 |
Courtenay McFadden | 0.40 | 0.53 | 0.33 | 3 | 0.20 | 2.0 - 9.0 | 7.0 | 2.5 |
Jenn Jackson | 0.29 | 0.59 | 0.35 | 3 | 0.18 | 1.6 - 11.6 | 10.0 | 3.2 |
Katie Clouse | 0.67 | 0.83 | 0.17 | 1 | 0.17 | 1.5 - 6.5 | 5.0 | 1.8 |
Ruby West | 0.31 | 0.38 | 0.23 | 5 | 0.38 | 3.0 - 13.8 | 10.8 | 3.2 |
Raylyn Nuss | 0.27 | 0.50 | 0.31 | 7 | 0.27 | 2.0 - 19.0 | 17.0 | 1.6 |
Sammi Runnels | 0.11 | 0.32 | 0.26 | 2 | 0.11 | 3.8 - 11.2 | 7.4 | 2.2 |
Caroline Nolan | 0.40 | 0.73 | 0.40 | 2 | 0.13 | 1.0 - 11.0 | 10.0 | 2.7 |
Ellen Noble | 0.25 | 0.25 | 0.17 | 4 | 0.33 | 1.2 - 18.7 | 17.5 | 5.4 |
Sunny Gilbert | 0.22 | 0.56 | 0.44 | 5 | 0.28 | 3.0 - 13.3 | 10.3 | 3.1 |
Madigan Munro | 0.33 | 0.33 | 0.17 | 0 | 0.00 | 2.5 - 10.5 | 8.0 | 3.2 |
Lizzy Gunsalus | 0.14 | 0.21 | 0.14 | 4 | 0.29 | 2.9 - 25.1 | 22.2 | 7.4 |
Crystal Anthony | 0.15 | 0.31 | 0.15 | 2 | 0.15 | 3.2 - 14.4 | 11.2 | 3.8 |
Hannah Arensman | 0.22 | 0.22 | 0.00 | 0 | 0.00 | 2.6 - 15.0 | 12.4 | 3.7 |
Domestic Men: 2019 CrossMetrics
Rider | OPP | WAPP | PSZP | Bad Legs Days | BLD Percent | 80% Rule | 80% Spread | APD |
---|---|---|---|---|---|---|---|---|
Curtis White | 0.88 | 0.88 | 0.06 | 2 | 0.12 | 1.0 - 4.2 | 3.2 | 1.5 |
Kerry Werner | 0.65 | 0.85 | 0.25 | 2 | 0.10 | 1.0 - 6.3 | 5.3 | 1.9 |
Gage Hecht | 0.55 | 0.73 | 0.27 | 2 | 0.18 | 2.0 - 9.0 | 7.0 | 4.1 |
Michael van den Ham | 0.38 | 0.50 | 0.19 | 4 | 0.25 | 1.0 - 12.0 | 11.0 | 4.7 |
Stephen Hyde | 0.71 | 0.71 | 0.14 | 2 | 0.14 | 1.3 - 10.2 | 9.9 | 3.4 |
Lance Haidet | 0.43 | 0.57 | 0.21 | 2 | 0.14 | 1.3 - 11.8 | 10.5 | 3.8 |
Drew Dillman | 0.24 | 0.41 | 0.24 | 4 | 0.24 | 2.2 - 16.4 | 14.2 | 4.3 |
Eric Brunner | 0.38 | 0.50 | 0.25 | 2 | 0.25 | 2.4 - 21.6 | 19.2 | 6.5 |
Lane Maher | 0.31 | 0.50 | 0.31 | 4 | 0.25 | 1.5 - 23.0 | 21.5 | 6.0 |
Tobin Ortenblad | 0.25 | 0.25 | 0.00 | 3 | 0.19 | 1.5 - 17.5 | 16.0 | 5.6 |
Cody Kaiser | 0.06 | 0.38 | 0.38 | 3 | 0.19 | 4.0 - 19.0 | 15.0 | 4.8 |
Jamey Driscoll | 0.40 | 0.40 | 0.10 | 4 | 0.40 | 2.0 - 16.4 | 14.4 | 5.9 |
Sam Noel | 0.25 | 0.33 | 0.25 | 4 | 0.33 | 2.1 - 14.9 | 12.8 | 5.2 |
Eric Thompson | 0.21 | 0.43 | 0.21 | 6 | 0.43 | 3.0 - 20.8 | 17.8 | 6.5 |
Travis Livermon | 0.16 | 0.58 | 0.42 | 8 | 0.42 | 3.0 - 15.2 | 12.2 | 5.0 |
Cody Cupp | 0.27 | 0.55 | 0.27 | 4 | 0.36 | 3.0 - 15.0 | 12.0 | 3.9 |
International Women: 2019-2020 CrossMetrics
Rider | OPP | WAPP | PSZP | Bad Legs Days | BLD Percent | 80% Rule | 80% Spread | APD |
---|---|---|---|---|---|---|---|---|
Ceylin Alvarado | 0.92 | 0.92 | 0.08 | 0 | 0.00 | 1.0 - 3.0 | 2.0 | 0.9 |
Annemarie Worst | 0.79 | 0.92 | 0.13 | 2 | 0.08 | 1.0 - 4.7 | 3.7 | 1.6 |
Sanne Cant | 0.36 | 0.48 | 0.32 | 3 | 0.12 | 2.0 - 12.0 | 10.0 | 3.6 |
Lucinda Brand | 0.78 | 0.89 | 0.11 | 1 | 0.11 | 1.0 - 9.2 | 8.2 | 3.8 |
Maghalie Rochette | 0.11 | 0.22 | 0.11 | 2 | 0.22 | 4.2 - 16.2 | 12.0 | 3.6 |
Yara Kastelijn | 0.57 | 0.83 | 0.26 | 4 | 0.17 | 1.0 - 9.0 | 8.0 | 3.2 |
Inge van der Heijden | 0.21 | 0.47 | 0.26 | 6 | 0.32 | 3.0 - 18.6 | 15.6 | 5.4 |
Clara Honsinger | 0.11 | 0.22 | 0.33 | 3 | 0.33 | 3.8 - 20.0 | 16.2 | 6.0 |
Katerina Nash | 0.25 | 0.38 | 0.13 | 2 | 0.25 | 1.7 - 15.9 | 14.2 | 5.5 |
Katie Compton | 0.28 | 0.39 | 0.22 | 5 | 0.28 | 2.7 - 15.5 | 12.8 | 4.6 |
Alice Maria Arzuffi | 0.16 | 0.36 | 0.32 | 4 | 0.16 | 3.0 - 16.8 | 13.8 | 4.8 |
Lucia Gonzalez Blanco | 0.00 | 0.00 | 0.00 | 2 | 0.22 | 17.6 - 26.8 | 9.2 | 3.0 |
Ellen Van Loy | 0.03 | 0.26 | 0.26 | 7 | 0.21 | 5.0 - 15.7 | 10.7 | 3.6 |
Caroline Mani | 0.00 | 0.00 | 0.13 | 3 | 0.38 | 6.7 - 23.5 | 16.8 | 6.5 |
Kaitie Keough | 0.00 | 0.31 | 0.31 | 5 | 0.38 | 4.2 - 18.6 | 14.4 | 5.5 |
Rebecca Fahringer | 0.11 | 0.22 | 0.11 | 1 | 0.11 | 3.8 - 16.2 | 12.4 | 4.2 |
Laura Verdonschot | 0.13 | 0.29 | 0.25 | 5 | 0.21 | 3.3 - 30.0 | 26.7 | 6.7 |
Eva Lechner | 0.09 | 0.27 | 0.27 | 1 | 0.05 | 4.0 - 11.9 | 7.9 | 2.5 |
Anna Kay | 0.24 | 0.33 | 0.24 | 9 | 0.43 | 3.0 - 19.0 | 16.0 | 5.6 |
Loes Sels | 0.00 | 0.10 | 0.14 | 11 | 0.38 | 5.8 - 30.0 | 24.2 | 7.5 |
International Men: 2019-2020 CrossMetrics
Rider | OPP | WAPP | PSZP | Bad Legs Days | BLD Percent | 80% Rule | 80% Spread | APD |
---|---|---|---|---|---|---|---|---|
Mathieu van der Poel | 1.00 | 1.00 | 0.00 | 0 | 0.00 | 1.0 - 1.0 | 0.0 | 0.1 |
Toon Aerts | 0.56 | 0.76 | 0.28 | 3 | 0.12 | 1.4 - 8.2 | 6.8 | 2.1 |
Eli Iserbyt | 0.67 | 0.81 | 0.19 | 4 | 0.15 | 1.0 - 13.0 | 12.0 | 3.0 |
Laurens Sweeck | 0.46 | 0.50 | 0.12 | 6 | 0.23 | 2.0 - 12.0 | 10.0 | 4.0 |
Michael Vanthourenhout | 0.26 | 0.56 | 0.37 | 4 | 0.15 | 2.6 - 11.0 | 8.4 | 2.6 |
Lars van der Haar | 0.15 | 0.54 | 0.54 | 5 | 0.19 | 3.0 - 11.0 | 8.0 | 2.6 |
Quinten Hermans | 0.44 | 0.56 | 0.28 | 3 | 0.12 | 2.0 - 8.6 | 6.6 | 2.5 |
Corne van Kessel | 0.19 | 0.42 | 0.42 | 2 | 0.08 | 3.0 - 8.5 | 5.5 | 1.7 |
Gianni Vermeersch | 0.11 | 0.21 | 0.14 | 5 | 0.18 | 3.7 - 21.2 | 17.5 | 4.6 |
Felipe Orts Lloret | 0.07 | 0.07 | 0.00 | 3 | 0.20 | 8.4 - 27.8 | 19.4 | 6.6 |
Tom Pidcock | 0.38 | 0.62 | 0.29 | 4 | 0.19 | 2.0 - 10.0 | 8.0 | 2.5 |
Tim Merlier | 0.27 | 0.55 | 0.27 | 2 | 0.09 | 2.0 - 9.0 | 7.0 | 2.5 |
Marcel Meisen | 0.00 | 0.00 | 0.00 | 5 | 0.24 | 7.0 - 20.0 | 13.0 | 3.9 |
Michael Boros | 0.00 | 0.00 | 0.00 | 1 | 0.11 | 17.4 - 25.8 | 8.4 | 21.8 |
Jens Adams | 0.00 | 0.19 | 0.26 | 6 | 0.22 | 5.0 - 20.8 | 15.8 | 4.1 |
This is great! Thanks!
I’m also a data scientist by day and a CX fan by, uh, day, too, so I can’t help but think about ways to try to expand or extend what you’ve already done here. One idea so far:
XCxCX. In cross-country running (a.k.a. XC), team scores are computed as the sum of the placings for the first five finishers from each team. So, what if we treated a single athlete’s best five results over a rolling window of, say, 10 events like a team score from a single event? The best possible score would now be 5 instead of 15, while the worst possible score would depend on the events used, but in practice it would mostly smooth out over 10 events, and who cares about the comparability of the scores close to that tail of the distribution anyway? In contrast to the various podium-based metrics, this would give more weight to better finishes, and it would discriminate more among athletes further down the results list, too. A little post-computation transformation might be in order to get a distribution we really like, but we’d have to see the raw numbers to figure that out.
How about a good ol’ fashioned standard deviation thrown in there? Does someone else already do that?
@Jay … Interesting idea. I think that is kind of similar to what CrossResults does. I am admittedly not a data scientist, so anything past podium finishes divided by races is probably mostly bush-league level mumbo jumbo from me.
@Davis … I played around with using means and standard deviations. I opted for the median and the made-up deviation values because both the means and standard deviations were skewed by bad results. I wanted a way to eliminate the effect of outliers.
For example, the mean and standard deviation of Iserbyt’s results are 4.4 and 5.6, which does not, IMO, accurately reflect the results he has gotten.
As Cross is all about momentum, a ROC index may be of interest ;-)
(ROC : Rate of Change)