1. Introduction
The use of data analytics in professional tennis has become increasingly common over the
last decade. More and more players are taking advantage of advanced statistical analyses to
improve their game, while media outlets, helped by software giants (e.g., IBM), use the data to
present the audience with enriched analysis of matches in real time (Larson and Smith 2018).
These analyses, however, have been mostly limited to the development of models and metrics to
describe broad aspects of the game, such as predicting the outcomes of tennis matches (e.g.,
Ingram 2019; Klaassen and Magnus 2003; McHale and Morton 2011; Spanias and Knottenbelt
2013; see review by Kovalchik 2016), revising ranking systems (e.g., Bozóki, Csató, and Temesi
2016; Irons, Buckley, and Paulden 2014), or settling popular disputes of interest to pundits
and general audiences (e.g., Radicchi 2012). While a few studies have examined more specific
aspects, such as success rates of elite players in elite tournaments (e.g., Gallagher, Frisoli, and
Luby 2021; Leitner, Zeileis, and Hornik 2009; Wei, Lucey, Morgan, and Sridharan 2013), their
approach remained top-down: developing a model and then applying it to a particular dataset
chosen for its prominent status and visibility.
Much less common are bottom-up approaches, which begin with identifying local, unique
statistical patterns in the field, and then examine whether they could be accounted for by
mechanisms that have broader implications for the sport. The current work attempts to illuminate
such a unique pattern appearing in the ATP Finals tennis tournament for singles, explain its
possible sources through a “model-free” statistical approach, and draw conclusions of possible
interest to players, ATP officials, tennis pundits and betting agencies. To the best of our
knowledge, this is also the first academic attempt devoted specifically to identifying and
explaining statistical patterns in the ATP Finals in tennis.