European Journal of Operational Research 263 (2017) 611–624
Innovative Applications of O.R.
Beyond crowd judgments: Data-driven estimation of market value in
association football
Oliver Müller a,*,1, Alexander Simons b,1, Markus Weinmann b,1
a IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S, Denmark
b University of Liechtenstein, Fuerst-Franz-Josef-Strasse 21, 9490 Vaduz, Liechtenstein
Article info
Article history:
Received 5 May 2016
Accepted 4 May 2017
Available online 11 May 2017
Keywords:
OR in Sports
Football
Soccer
Market value
Crowdsourcing
Abstract
Association football is a popular sport, but it is also a big business. From a managerial perspective, the
most important decisions that team managers make concern player transfers, so issues related to player
valuation, especially the determination of transfer fees and market values, are of major concern. Market
values can be understood as estimates of transfer fees—that is, prices that could be paid for a player
on the football market—so they play an important role in transfer negotiations. These values have tradi-
tionally been estimated by football experts, but crowdsourcing has emerged as an increasingly popular
approach to estimating market value. While researchers have found high correlations between crowd-
sourced market values and actual transfer fees, the process behind crowd judgments is not transparent,
crowd estimates are not replicable, and they are updated infrequently because they require the partici-
pation of many users. Data analytics may thus provide a sound alternative or a complementary approach
to crowd-based estimations of market value. Based on a unique data set comprising 4217 players
from the top five European leagues over a period of six playing seasons, we estimate players’ market
values using multilevel regression analysis. The regression results suggest that data-driven estimates of
market value can overcome several of the crowd’s practical limitations while producing comparably accu-
rate numbers. Our results have important implications for football managers and scouts, as data analytics
facilitates precise, objective, and reliable estimates of market value that can be updated at any time.
©2017 The Author(s). Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license.
( http://creativecommons.org/licenses/by-nc-nd/4.0/ )
1. Introduction
With millions of players and billions of fans, association foot-
ball (“football” hereafter) is the world’s most popular sport. Be-
cause of its popularity, professional football teams generate enor-
mous revenues; they are no longer just clubs but companies with
shareholders and managers, sales and profits, and customers rather
than fans. From a managerial perspective, the most important de-
cisions that these “football companies” ( Amir & Livne, 2005 ) have
to make concern which players to employ. As player transfers have
a tremendous impact on a club’s chances for success ( Pawlowski,
Breuer, & Hovemann, 2010 ), researchers from various disciplines
have long studied the factors that impact transfer fees ( Frick,
2007 ).
* Corresponding author.
E-mail addresses: oliver[email protected] (O. Müller), alexander[email protected]
(A. Simons), [email protected] (M. Weinmann).
1 Authors are listed in alphabetical order to reflect equal contribution to this research.
More recently, though, researchers have begun to pay particu-
lar attention to players’ market values. A player’s market value is
an estimate of the amount for which a team can sell the player’s
contract to another team ( Herm, Callsen-Bracker, & Kreis, 2014 ).
While transfer fees represent actual prices paid on the market,
market values provide estimates of transfer fees, so they play
an important role in transfer negotiations. Market values have
long been estimated by football experts like team managers and
sports journalists, while crowdsourcing websites like Transfermarkt
( www.transfermarkt.com ) have proved their usefulness in estimat-
ing market value during the past few years. However, data-driven
approaches to estimating market value have not yet caught on in
professional football.
Football has long lagged behind other major sports in the use
of data analytics. In 2010, the New York Times still called football
the “least statistical” of all major sports (Kaplan, 2010), in large
part because the pool of data available at that time was comparatively
weak. Today, however, sports-data companies like Opta
(www.optasports.com) collect prodigious amounts of detailed
performance data that could be used for player valuation in
professional football (see, e.g., Brandes & Franck, 2012). While some
football clubs have started to analyze that data for training purposes
and decisions about line-ups, only a few have realized the data’s
economic potential. Most clubs still ignore the “Moneyball” idea of
using statistics to guide player scouting and recruitment (Zhu, Lakhani,
Schmidt, & Herman, 2015).
http://dx.doi.org/10.1016/j.ejor.2017.05.005
In this paper, we evaluate the applicability of data analytics for
estimating players’ market values in professional football; in doing
so, we make four primary contributions: 1) we identify the short-
comings of crowd-based estimations of market value, which jus-
tify the exploration of data-driven approaches to estimating market
value; 2) we synthesize the academic literature on player valuation
to identify the factors that determine players’ market values; 3) we
use a large sample of publicly available data on the five biggest
professional football leagues in Europe over a period of six playing
seasons to train a multilevel regression model for data-driven es-
timation of market value; and 4) we evaluate the accuracy of our
model based on a comparison with actual transfer fees and crowd
estimates and define the potential of data analytics in overcoming
the crowd’s limitations.
2. Background
2.1. Market values in professional football
Players are the most important investments in professional
football from both a sporting perspective and a business perspec-
tive. While in the United States (U.S.), professional athletes are
often traded for other athletes or for future draft picks (e.g., in
American football or baseball), European football players are usu-
ally traded for cash settlements, which are referred to as “trans-
fer fees” ( Frick, 2007 ). Players’ market values are estimates of the
transfer fees that are most likely to be paid for them. Although
there are conceptual differences, market values and transfer fees
are comparable ( He, Cachucho, & Knobbe, 2015 ). Accordingly, a
player’s market value can be defined as “an estimate of the amount
of money a club would be willing to pay in order to make [an] ath-
lete sign a contract, independent of an actual transaction” ( Herm
et al., 2014 , p. 484). As such, market values inform selling clubs and
buying clubs about football players’ monetary value—even those
whose contracts have not been sold recently—so they are impor-
tant in transfer negotiations. Market values have traditionally been
estimated by the clubs themselves or by sports journalists, but as
football fans have developed an interest in market values, websites
have emerged that provide estimates of players’ market values. In
particular, crowdsourcing has proved its usefulness in estimating
market values.
2.2. Crowd-based estimation of market value
Transfermarkt is the leading website on the football transfer
market. The site offers general football-related data, such as scores
and results, football news, transfer rumors, and estimations of mar-
ket value at the individual and team levels for most professional
football leagues. Once a user has registered at Transfermarkt, he
or she can follow discussion threads about players’ market val-
ues, propose personal estimations based on players’ current value
and performance, and discuss their proposals with other commu-
nity members. The final market values are then determined by ag-
gregating the individual estimates. Launched in Germany in 2001,
where it now ranks among the most frequently visited websites
( Alexa , n.d.), Transfermarkt released an English-language version in
2009, and versions of the site have since been made available in
Austria, Italy, Poland, Portugal, Spain, Switzerland, Turkey, and the
Netherlands.
Transfermarkt’s idea is that users can build an estimate of mar-
ket value together as well as or better than a few football experts
can, a style of judgment for which Surowiecki (2005) coined the
term “wisdom of crowds.” Some of the most influential newspa-
pers and magazines in Europe regularly quote Transfermarkt’s mar-
ket values for football players ( Bryson, Frick, & Simmons, 2012;
Herm et al., 2014 ), which have been found to correlate closely
with expert estimates and player salaries (Franck & Nüesch, 2011;
Torgler & Schmidt, 2007). Accordingly, Transfermarkt’s market values
have provided the foundation for several studies of the football
transfer market (e.g., Franck & Nüesch, 2012; He et al., 2015).
Transfermarkt’s accuracy in estimating market value is remarkable,
as crowdsourcing is generally associated with challenges like so-
cial influence, manipulation attempts, and lack of experience and
knowledge (e.g., Lorenz, Rauhut, Schweitzer, & Helbing, 2011 ) that
may bias estimations of players’ market value. As Herm et al.
(2014) explained, Transfermarkt has dealt with these challenges by
implementing the “judge principle,” a selective approach to infor-
mation aggregation.
According to Herm et al. (2014) , the judge principle of infor-
mation aggregation works as follows. Transfermarkt does not esti-
mate market values in a democratic way, such that all user esti-
mates have equal value, but uses a hierarchical approach. There-
fore, Transfermarkt does not calculate the final market values as
the mean or median of all individual estimates but gives a few
empowered community members, whom Herm et al. called the
“judges,” the final say. Accordingly, judges review other users’ es-
timates and select and weigh them when making their decisions,
so they can decrease or increase the influence of users they con-
sider to be less or more qualified. Although the final market val-
ues are not calculated democratically, there is reason to believe
that the selective-judge principle works better than purely demo-
cratic approaches to information aggregation would. For example,
when little-known players receive only a few votes, user estimates
that are clearly too high or too low would significantly bias the
results, either because of manipulation attempts (e.g., by opportunistic
sports agents) or because of a lack of knowledge (e.g., by
inexperienced fans). Judges can exclude such estimates from the
aggregation, which decreases the risk of bias. (For a more detailed
description of how Transfermarkt works, see Herm et al., 2014.)
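The difference between a democratic aggregation and a selective one can be sketched in a few lines of Python. This is only an illustration of the idea described above, not Transfermarkt's actual procedure: the function names, the trust weights, and the tolerance rule for discarding estimates are all hypothetical stand-ins for a judge's subjective choices.

```python
from statistics import mean, median


def democratic_estimate(estimates):
    """Plain mean of all user estimates (every vote counts equally)."""
    return mean(estimates)


def judge_estimate(estimates, weights, tolerance=0.5):
    """Weighted mean after dropping estimates far from the median.

    `weights` and `tolerance` are hypothetical stand-ins for a judge's
    subjective choices; they are not Transfermarkt parameters.
    """
    center = median(estimates)
    # Keep only estimates within +/- tolerance * median of the median.
    kept = [(e, w) for e, w in zip(estimates, weights)
            if abs(e - center) <= tolerance * center]
    total_weight = sum(w for _, w in kept)
    return sum(e * w for e, w in kept) / total_weight


# Five made-up user estimates (EUR million) for a little-known player;
# the last one is an inflated vote, for example from an interested agent.
votes = [2.0, 2.5, 3.0, 2.5, 20.0]
trust = [1.0, 2.0, 1.0, 2.0, 1.0]

print(democratic_estimate(votes))    # pulled upward by the outlier
print(judge_estimate(votes, trust))  # outlier excluded before averaging
```

With only a handful of votes, a single inflated estimate shifts the plain mean substantially, whereas the selective aggregation is unaffected once that estimate is excluded.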
However, despite its arguable benefits and its demonstrated ac-
curacy, the crowdsourcing approach to estimating market value
comes with several limitations. First, community members base
their estimates on arbitrary indicators, which may happen even
unconsciously, so they lack objectivity. (Transfermarkt suggests a
list of evaluation criteria, but these are not mandatory.) Second,
judges can independently determine the final market values based
on personal evaluations of user estimates and other indicators, so
they are not reproducible. (As Transfermarkt does not calculate
the final values in a formal way, the question arises concerning
who judges the judges.) Third, as crowd estimations require the
participation of many users, market values are not updated on a
match-by-match basis and may no longer be accurate after a few
games, so crowd estimations are generally not efficient. (Transfer-
markt usually estimates market values every six to twelve months.)
Fourth, crowd estimates tend to be more accurate for players who
are well known to a sufficiently large audience, so they often do
not support player scouting in minor leagues. (The number of
Transfermarkt’s forum posts is rather low in some countries and
leagues.) Fifth, crowd-estimated market values are public, so they
do not offer a competitive advantage to clubs in transfer negoti-
ations. (Transfermarkt’s market values increasingly affect contract
and wage negotiations on the football market.) As the next section
explains, a data-driven approach to estimating market value would
address these limitations.
Fig. 1. Conceptualization of market-value estimation at Transfermarkt
(adapted from Herm et al., 2014 , p. 486)
2.3. Data-driven estimation of market value
Major League Baseball (MLB) was the first sport to make serious
use of data analytics in player recruitment ( Steinberg, 2015 ). At the
end of the 1990s, Billy Beane, General Manager of the Oakland Ath-
letics, began using statistical data for player scouting and decisions
about the team roster, a story probably best known through the
bestseller “Moneyball” and its film adaptation of the same name
( Lewis, 2004 ). Insights generated from player statistics helped the
team’s management to identify undervalued but talented players
and overvalued players who had passed their zenith ( Zhu et al.,
2015 ). In the following two decades, the Athletics’ innovative ap-
proach to player recruitment helped the team reach the playoffs
roughly every second season, although they had one of the low-
est budgets of all of the MLB teams, many of which later adopted
Beane’s ideas.
Professional football has long lagged behind sports like baseball
and basketball in the use of quantitative data, so football clubs eschewed
the Moneyball idea. For example, in 2010 the U.S.’s Major
League Soccer (MLS) website displayed only six metrics per player,
while the MLB website featured twenty-nine batting metrics alone
( Kaplan, 2010 ). “Contrary to the situation in most American team
sports, few individual performance measures are recorded in foot-
ball” ( Frick, 2011 , p. 113). However, sports-data companies like
Opta have begun collecting exhaustive and detailed data about
football players, and some clubs have even begun to collect their
own data during training and games. For example, during the 2014
FIFA World Cup in Brazil, the German Football Association (DFB)
used one of SAP’s big-data solutions to analyze player performance
( SAP, 2014 ). The software company estimated that only ten min-
utes of training with ten players and three balls produced more
than seven million data points (also see Bojanova, 2014 ).
However, most clubs use the newly available data to adjust
training plans and support decisions about line-ups, while the
data’s potential for supporting managerial decisions is ignored.
Only a few clubs are known to use data analytics systematically
for player valuation, but most of them are small or medium-sized
clubs for which buying expensive superstars is not a viable strat-
egy. For example, Danish Superliga club FC Midtjylland has begun
to use statistical models to evaluate teams and players ( Murtagh,
2015 ), and Dietmar Hopp, owner of German Bundesliga club TSG
Hoffenheim and co-founder of SAP, has pushed the use of statis-
tical analysis at Hoffenheim. After Hoffenheim received from FC
Liverpool an all-time-high transfer fee of 41 million in 2015 for
Roberto Firmino, who had cost Hoffenheim only 4 million four
years earlier, Hopp identified two success factors for running the
team in the future: being an early adopter of innovative technolo-
gies and identifying talented players early in their careers and de-
veloping them so they contributed on both the pitch and the bal-
ance sheet ( Zhu et al., 2015 ). While data analytics is an innovative
technology, its applicability to estimating market value and recruit-
ing talented young players remains to be assessed.
Research on judgment and decision-making provides strong
empirical and theoretical arguments that favor statistical estimates
over human (heuristic) judgments ( Dawes, Faust, & Meehl, 1989 ),
particularly when it comes to complex decisions ( Evans, 2006;
Tversky & Kahneman, 1974 ) like estimating a football player’s mar-
ket value. A meta-analysis of 136 empirical studies that compared
statistical predictions and human judgments in fields from clinical
decision-making to economics showed that statistical techniques
are, on average, 10 percent more accurate than human judgments
are (Grove, Zald, Lebow, Snitz, & Nelson, 2000). The superiority
of statistical methods over human judgments holds for trained,
untrained, experienced, and inexperienced judges alike ( Grove &
Meehl, 1996 ). Therefore, our approach to data-driven estimation of
market value uses a statistical model.
Brunswik’s (1952) lens model, which Herm et al. (2014) used
to conceptualize how the Transfermarkt crowd estimates market
value, can also be used to explain our approach to data-driven
estimation of market value (Fig. 1). On the Transfermarkt website,
community members $j$ make subjective estimations $\hat{y}_j$ of a
football player’s true, unobservable market value $y$ based on arbitrary
indicators $x_i$ and subjective weightings $a_{i,j}$. A Transfermarkt judge
then creates a final estimation of market value $\hat{y}$ based on selected
user evaluations $\hat{y}_j$ and other indicators $x_i$, to both of which he or
she assigns subjective weightings $b_j$ and $a_i$. Accordingly, the
crowd-based approach to estimating market values uses divergent
indicators and weightings. In contrast, a data-driven approach to
estimating market value uses a statistical model with consistent indicators
$x_i$ and empirically derived weightings $a_i$ to estimate players’ market
values, so it overcomes the limitations of the crowd: Because
the model uses the same indicators and weightings for all players,
it is transparent and replicable; it is efficient, so market values can
be updated on a match-by-match basis; it produces unbiased
estimates for well-known and lesser known players alike, so it can be
used for player scouting; and its use does not require public
announcement, so it can offer the club that uses it an advantage in
transfer negotiations.
Table 1
Indicators of market value.
Indicator | Description | Selected references
Player characteristics
Age | Age reflects players’ experience and potential. | (1)–(19)
Height | Height reflects heading ability, which can influence the probability of scoring or preventing goals. | (2), (4), (11), (18)
Position | Position reflects players’ flexibility on the pitch and their crowd-pulling capacity. | (1)–(19)
Footedness | Two-footedness is an advantageous footballing ability that also reflects players’ flexibility. | (2), (12), (18)
Nationality | Nationality refers to a player’s country or continent of birth. | (2), (6), (8), (9), (14), (16), (17)
Player performance
Playing time | Playing time refers to the number of games or minutes played at the national and international levels. | (1)–(13), (15)–(19)
Goals | Goals refers to the number of goals a player has scored. | (2)–(5), (7), (8), (10)–(19)
Assists | Assists refers to the number of a player’s assists that helped other players score goals. | (7), (11)–(16)
Passing | Passing refers to the number of passes to other players or the accuracy of passing. | (7), (12), (16)
Dribbling | Dribbling refers to the number and success rate of a player’s ball maneuvers. | (7), (11), (16)
Dueling | Dueling refers to the number and success rate of a player’s tackles, clearances, blocks, and interceptions. | (7), (12), (14), (16)
Fouls | Fouls refers to the number of fouls committed or the number of times a player has been fouled. | (7), (11), (13)
Cards | Cards refers to the number of yellow, yellow/red, and red cards received by a player. | (7), (8), (13), (18)
Player popularity
News | A player’s news-worthiness is reflected in press citations. | (7), (13), (14)
Internet links | Popularity is reflected in the number of links reported by web search engines like Google. | (9), (12), (13)
References: (1) Brandes and Franck (2012); (2) Bryson et al. (2012); (3) Carmichael and Thomas (1993); (4) Carmichael et al. (1999); (5) Dobson et al. (2000); (6) Feess et al. (2004); (7) Franck and Nüesch (2012); (8) Frick (2011); (9) Garcia-del-Barrio and Pujol (2007); (10) Gerrard and Dobson (2000); (11) He et al. (2015); (12) Herm et al. (2014); (13) Kiefer (2014); (14) Lehmann and Schulze (2008); (15) Lucifora and Simmons (2003); (16) Medcalfe (2008); (17) Reilly and Witt (1995); (18) Ruijg and van Ophem (2014); (19) Speight and Thomas (1997)
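One way to make this contrast concrete is to write the three estimates side by side. The linear, additive form below is an assumption made purely for illustration; the text names the indicators and weightings but does not state a functional form at this point.

```latex
\begin{align}
\hat{y}_j &= \sum_i a_{i,j}\, x_i
  && \text{(estimate of community member } j\text{, subjective } a_{i,j}\text{)} \\
\hat{y} &= \sum_j b_j\, \hat{y}_j + \sum_i a_i\, x_i
  && \text{(final estimate of a judge, subjective } b_j \text{ and } a_i\text{)} \\
\hat{y} &= \sum_i a_i\, x_i
  && \text{(data-driven estimate, } a_i \text{ derived empirically)}
\end{align}
```

In the first two lines the weightings can differ from person to person and from player to player, whereas in the third line a single set of weightings is applied to every player.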
The next section’s literature review identifies indicators of mar-
ket value in order to provide a conceptual background for develop-
ing such a model.
3. Indicators of market value
3.1. Overview
Research has identified several factors that can be used to estimate
market values, and these factors are similar to those the
Transfermarkt crowd uses (see Herm et al., 2014 ). Table 1 or-
ganizes the most common indicators of market value into three
categories—player characteristics , player performance , and player
popularity —and shows selected studies that have used these indi-
cators.
While researchers have studied indicators of transfer fees (e.g.,
Carmichael & Thomas, 1993; Carmichael, Forrest, & Simmons,
1999; Dobson, Gerrard, & Howe, 2000; Gerrard & Dobson, 2000;
Medcalfe, 2008; Ruijg & van Ophem, 2014; Speight & Thomas,
1997) and market values (e.g., Franck & Nüesch, 2012;
Garcia-del-Barrio & Pujol, 2007; He et al., 2015; Herm et al., 2014; Kiefer,
2014), studies on players’ remuneration (e.g., Brandes & Franck,
2012; Bryson et al., 2012; Feess, Frick, & Muehlheusser, 2004;
Frick, 2011; Lehmann & Schulze, 2008; Lucifora & Simmons, 2003)
can also be used to identify indicators of market value. In fact,
players’ salaries are influenced by the same, or at least similar,
factors as those that influence market values and transfer fees
(see, e.g., Brandes & Franck, 2012; Bryson et al., 2012; Frick, 2007).
Therefore, we explain the three indicator categories of market
value by reviewing research on player valuation, payment, and
transfer. (Text references to the indicators listed in Table 1 are
italicized.)
3.2. Player characteristics
We conceptualize player characteristics as players’ physical and
demographic attributes. Age is an important indicator of market
value, as it reflects both experience and potential (e.g., Carmichael
& Thomas, 1993 ). Most studies on player valuation have used
quadratic age terms to allow for non-linear relationships, consid-
ering that players’ values usually increase into their mid-twenties
and decline thereafter (e.g., Bryson et al., 2012 ). Age (age squared)
has frequently been found to influence pay and value positively
(negatively) (e.g., Lehmann & Schulze, 2008 ). In addition, a player’s
height has been found to significantly increase salary returns
( Bryson et al., 2012 ) because it indicates good heading ability that
may increase the probability of scoring or preventing a goal ( Fry,
Galanos, & Posso, 2014 ).
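The role of the quadratic age term can be illustrated with a short sketch. With a positive coefficient on age and a negative coefficient on age squared, the combined effect rises, peaks, and then declines. The coefficient values below are hypothetical and chosen only so that the peak falls in the mid-twenties; they are not estimates from any of the cited studies.

```python
def age_effect(age, b_age, b_age_sq):
    """Combined contribution of the linear and quadratic age terms."""
    return b_age * age + b_age_sq * age ** 2


def peak_age(b_age, b_age_sq):
    """Age at which the quadratic age effect is largest.

    Setting the derivative b_age + 2 * b_age_sq * age to zero gives
    age = -b_age / (2 * b_age_sq), a maximum only if b_age_sq < 0.
    """
    if b_age_sq >= 0:
        raise ValueError("a peak requires a negative age-squared coefficient")
    return -b_age / (2 * b_age_sq)


# Hypothetical coefficients, for illustration only.
B_AGE, B_AGE_SQ = 0.52, -0.01

print(peak_age(B_AGE, B_AGE_SQ))
for age in (20, 26, 32):
    print(age, age_effect(age, B_AGE, B_AGE_SQ))
```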
Another player characteristic that has been studied in player-
valuation research is footedness . For example, Bryson et al.
(2012) concluded that two-footed ability raises players’ salaries,
and Herm et al. (2014) found that it positively impacts their
market values. Two-footedness is a generally advantageous foot-
ball skill, but it also reflects flexibility because players who are
adept with both feet can be used in various positions on the
pitch ( Bryson et al., 2012 ). Like the other player characteristics,
footedness is a talent-related indicator of market value, but re-
searchers have also studied whether players’ nationalities influ-
ence their value and pay because of discrimination ( Frick, 2007 ).
For example, in their study of the Spanish professional football
league, Garcia-del-Barrio and Pujol (2007) found that non-Spanish
European players were systematically overrated, while non-
European players were systematically underrated. However, Reilly
and Witt (1995) found no evidence of discrimination against players
in professional football, which was more recently confirmed by
Medcalfe (2008) .
Finally, a player’s position —goalkeeper, defender, midfielder, or
forward—is important in estimating market value. Several re-
searchers have found that players’ positions impact salaries and
transfer fees, as they reflect players’ degrees of specialization and
crowd-pulling capacity. For example, Frick (2007) found that goal-
keepers earn significantly less than midfielders because goalkeep-
ers can be used less flexibly on the pitch. Garcia-del-Barrio and
Pujol (2007) concluded that attackers receive much higher atten-
tion and rewards than goalkeepers, as attackers are more visible to
the audience and so have higher crowd-pulling power ( He et al.,
2015 ).
3.3. Player performance
Player performance reflects how well players function on the
pitch. Playing time has consistently been used in player-valuation
research. For example, appearances in domestic leagues, in the Eu-
ropean leagues, and on the national team have a positive impact
on transfer fees and market values (e.g., Carmichael & Thomas,
1993; Garcia-del-Barrio & Pujol, 2007; Gerrard & Dobson, 2000).
Researchers have distinguished between appearances during play-
ing seasons and appearances during players’ careers (e.g., Franck
& Nüesch, 2012 ), and they have considered substitute appearances
(e.g., Bryson et al., 2012 ) and minutes played (e.g., Ruijg & van
Ophem, 2014 ) to account for the actual time spent on the field.
Several other performance measures can be used to estimate
market values. Goals , including field goals, headers, and penal-
ties, indicate players’ scoring ability, so they are a largely unam-
biguous performance measure ( Carmichael et al., 1999 ). Accord-
ingly, the total and average number of goals, each across play-
ing seasons and players’ careers, have often been used in player-
valuation research (e.g., Bryson et al., 2012; Carmichael & Thomas,
1993; Frick, 2011; Gerrard & Dobson, 2000). Assists refer to players’
contributions that help others score goals, so they are also com-
mon indicators of player value. For example, Lucifora and Simmons
(2003) provided evidence from Italian football that forwards’ as-
sist rates can increase their salaries, a finding that Lehmann and
Schulze (2008) and Franck and Nüesch (2012) reinforced for Ger-
man Bundesliga players.
Because of the protracted unavailability of detailed performance
data in professional football, only a few researchers have used per-
formance measures other than goals and assists to explain value
and pay. Infrequently used are passing (e.g., Herm et al., 2014 );
dueling in the form of clearances, blocks, and interceptions (e.g.,
Franck & Nüesch, 2012 ); dribbles (e.g., Medcalfe, 2008 ); commit-
ted fouls (e.g., He et al., 2015 ); and yellow and red cards (e.g.,
Kiefer, 2014 ). Because the significance of performance indicators
varies by position, researchers have also included interaction effects
in their models of player value (e.g., Dobson et al., 2000;
Gerrard & Dobson, 2000). For example, while forwards are supposed
to score goals, defenders should win tackles, and midfielders
are expected to defend and attack equally well. To account for
the variety of performance indicators, some researchers have also
replaced them with aggregated indices and expert estimations as
proxies for player performance (e.g., Brandes & Franck, 2012; Feess
et al., 2004; Garcia-del-Barrio & Pujol, 2007 ).
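A position-by-performance interaction can be sketched as follows: each performance indicator is multiplied by a 0/1 position indicator, so that a regression can assign a different weight to, say, goals scored by forwards and goals scored by defenders. The function and feature names are hypothetical, and the sketch creates one term per position for readability, whereas a fitted model would usually omit one reference category.

```python
POSITIONS = ("defender", "midfielder", "forward")


def interaction_features(position, stats):
    """Multiply each performance indicator by a 0/1 position indicator.

    Returns one feature per (indicator, position) pair; only the terms
    for the player's own position are non-zero.
    """
    features = {}
    for pos in POSITIONS:
        indicator = 1.0 if position == pos else 0.0
        for name, value in stats.items():
            features[f"{name}_x_{pos}"] = indicator * value
    return features


# Made-up season totals for a forward, for illustration only.
print(interaction_features("forward", {"goals": 10.0, "tackles": 12.0}))
```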
3.4. Player popularity
Theories on the emergence of “superstars” like actors and
singers suggest that not only talent ( Rosen, 1981 ) but also the ex-
ternalities of popularity ( Adler, 1985 ) can explain demand for foot-
ball players ( Franck & Nüesch, 2012 ). Therefore, players’ market
values also depend on their crowd-pulling power, independent of
what they show on the pitch, as this power can sell their clubs’
jerseys and seats. Accordingly, studies of the football transfer mar-
ket have investigated popularity-related factors. While early studies
left popularity to the error term (e.g., Carmichael & Thomas, 1993 ),
the Internet has provided new ways to measure player popularity
by, for example, analyzing online news and web links . For exam-
ple, Lehmann and Schulze (2008) concluded that media presence,
measured as the number of times a player’s name is mentioned
in the online version of the German sports magazine Kicker, re-
lates to salary. Likewise, Franck and Nüesch (2012) found that non-
performance-related press citations in the LexisNexis database are
positively related to market value, and Brandes, Franck, and Nüesch
(2008) counted how often German Bundesliga players’ names were
mentioned in newspapers and magazines to determine whether
superstars boost attendance at home and away matches. Herm
et al. (2014) and Garcia-del-Barrio and Pujol (2007) measured
public attention as the total number of Google search hits and
found it to be a significant factor in player valuation, while Kiefer
(2014) measured popularity using Facebook “likes” and mentions
on the UEFA website.
In summary, research has identified several indicators of market
value, including player characteristics, performance, and popularity,
with most of the extant studies relying on similar factors. The next
section explains how we operationalized these factors and how we
collected and analyzed data to train a statistical market-value esti-
mation model.
4. Data collection and description
We gathered season-level data about players’ characteristics,
performance, and popularity from several Internet sources, in-
cluding Google, Reddit, Transfermarkt, WhoScored, Wikipedia, and
YouTube. We collected data for six playing seasons, from the
2009/10 season to the 2014/15 season, for players from the five
top European leagues, that is, England’s Premier League, Spain’s La
Liga, Germany’s Bundesliga, Italy’s Serie A, and France’s Ligue 1.
To increase the reliability of the performance data, and in line
with previous research, we considered only those players who ap-
peared on the pitch for at least ninety minutes in a given season
( Brandes & Franck, 2012 ) and excluded goalkeepers from our sam-
ple ( Bryson et al., 2012; Lucifora & Simmons, 2003 ), as their per-
formance is measured in a considerably different way than that
of outfield players. The resulting data set consisted of 10,350 ob-
servations from 4217 players on 146 teams. Table 2 provides an
overview.
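The two sample restrictions described above (at least ninety minutes on the pitch in a season, and no goalkeepers) amount to a simple filter. The sketch below assumes a hypothetical record layout with one entry per player and season; the field names and example records are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SeasonRecord:
    player: str
    position: str  # "goalkeeper", "defender", "midfielder", or "forward"
    minutes: int   # minutes played in the season


def select_sample(records, min_minutes=90):
    """Keep outfield players with at least `min_minutes` on the pitch."""
    return [
        r for r in records
        if r.position != "goalkeeper" and r.minutes >= min_minutes
    ]


# Made-up records, for illustration only.
records = [
    SeasonRecord("Player A", "forward", 1612),
    SeasonRecord("Player B", "goalkeeper", 3420),
    SeasonRecord("Player C", "defender", 45),
    SeasonRecord("Player D", "midfielder", 90),
]
sample = select_sample(records)
print([r.player for r in sample])
```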
Our data-driven approach to estimating market value is concep-
tually similar to how the crowd estimates market values. To es-
timate a player’s market value after a given season, we use his
estimation of market value from the end of the previous sea-
son as a baseline and add data about his characteristics, perfor-
mance, and popularity from that season. As the accuracy of Trans-
fermarkt’s estimations of market value has been repeatedly con-
firmed by researchers, and because of the unavailability of other
credible sources that provide historical data, we used Transfer-
markt’s estimations of market value to train our model. We first
collected the estimations that were made at the end of the six seasons
(as of June 30) for all players in our sample. The average player
across all leagues and seasons was worth around EUR 5.6 million at
Transfermarkt; players’ market values ranged from EUR 50,000 to
EUR 120 million with a standard deviation of around EUR 8.2 million.
( Appendix A provides a more detailed overview of the transfer
market.)
To conduct our own estimation of players’ market values, we
collected data about their characteristics, performance, and popu-
larity. We operationalized the player characteristics by means of
a player’s Age (years), Height (centimeters), Footedness (two-footed
ability or not), Nationality (continent of origin), and Position on
the pitch (defender, midfielder, forward). The average player in our
data set was 26.5 years old and 181.5 centimeters (nearly six feet)
tall. Eight percent of all players were adept with both feet, 41 per-
cent of them were midfielders (21% forwards, 38% defenders), and
76 percent were from Europe (12% from South America, 10% from
Africa, 2% from other continents). (Categorical variables are not dis-
played in Table 2 .)
We measured player performance by means of the number of Minutes played, Goals, Assists, and Yellow or Red cards per season; the number and success ratio of Passes, Dribbles, Aerial duels, and Tackles per game; and the number of Interceptions, Clearances, and committed Fouls per game. The average player in our sample was on the pitch for 1612 minutes per season, during which he scored 2.4 goals, gave 1.6 assists, and received 3.5 yellow and .2 red cards. In an average game, he made 29 passes (at a success rate of 78%), attempted 1.2 dribbles (51% successfully), and committed 1.1 fouls. He contested 2.2 aerial duels (47% won) and made 2.2 tackles (71% successfully), 1.4 interceptions, and 2.1 clearances per game.

Table 2
Descriptive statistics.

Variable                      Measurement                 Mean      Median    St. Dev.     Min.           Max.
Player valuation
Transfermarkt’s market value  EUR                    5,588,529   3,000,000   8,208,470   50,000    120,000,000
Player characteristics
Age                           Years                      26.51       26.00        4.08    17.00          40.00
Height                        Centimeters               181.49      182.00        6.15   161.00         203.00
Player performance
Minutes played                total p.s.               1612.39     1612.00      884.85    90.00        3420.00
Goals                         total p.s.                  2.39        1.00        3.85      .00          50.00
Assists                       total p.s.                  1.64        1.00        2.25      .00          20.00
Passes                        total p.g.                 29.45       28.48       13.36     1.55         110.03
Successful passes             percent p.g.                 .78         .78         .07      .43           1.00
Dribbles                      total p.g.                  1.21         .90        1.12      .00           9.58
Successful dribbles           percent p.g.                 .51         .50         .24      .00           1.00
Aerial duels                  total p.g.                  2.22        1.79        1.71      .00          15.50
Successful aerial duels       percent p.g.                 .47         .48         .18      .00           1.00
Tackles                       total p.g.                  2.21        2.09        1.21      .00           9.00
Successful tackles            percent p.g.                 .71         .72         .14      .00           1.00
Interceptions                 total p.g.                  1.35        1.25         .92      .00           7.13
Clearances                    total p.g.                  2.09        1.07        2.35      .00          13.44
Fouls                         total p.g.                  1.10        1.03         .53      .00           4.27
Yellow cards                  total p.s.                  3.48        3.00        2.89      .00          18.00
Red cards                     total p.s.                   .20         .00         .46      .00           3.00
Player popularity
Wikipedia page views          total p.s.            104,509.30   23,944.00  319,022.80      .00   8,786,701.00
Google Trends search index    average index p.s.         13.36       13.21       12.38      .00          91.83
Reddit posts                  total p.s.                 15.42        2.00       38.79      .00         789.00
YouTube videos                total p.s.             36,075.46      918.50  141,882.30      .00   1,000,000.00

Notes: p.s. = per season; p.g. = per game; N = 10,350
We used four Internet metrics to measure player popularity: the
number of times a player’s Wikipedia page was viewed, how often
a player’s name was searched on Google , the number of times a
player’s name appeared in the “soccer” forum on Reddit , and how
many videos about a player were shared on YouTube . The average
player had more than 100,000 Wikipedia page views and more
than 35,000 YouTube videos. His name appeared in 15.4 forum
posts on Reddit, and his average Google Trends search index was
13.4. (The data Google provides is scaled from 0 to 100 for a given
time frame, so it refers to total searches for a term relative to the
total number of searches over time.)
None of the independent variables were highly correlated, but
an exploratory data analysis revealed that the distributions of the
players’ market values were highly right-skewed, which was also
the case for the popularity variables. ( Appendix B shows how the
market values were distributed across seasons, leagues, and po-
sitions, and how the independent variables were correlated.) We
log-transformed these variables to avoid violating the linearity as-
sumption of linear regression. “Eyeballing” the associations be-
tween the players’ market values that we collected from Transfer-
markt and the numerical independent variables with scatterplots
showed that all variables except age had reasonably linear relation-
ships with market value. Therefore, we squared the age variable to
get a more linear relationship with market value.
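The transformations described in this paragraph can be illustrated with a short sketch. It is written under assumptions that go beyond the text: the field names are hypothetical, and log(1 + x) is used for the popularity counts only so that zero counts (which occur in Table 2) stay finite; the paper does not state how zeros were handled.

```python
import math


def transform(row):
    """Return model-ready features for one player-season (illustrative field names)."""
    return {
        # Dependent variable on the logarithmic scale.
        "log_market_value": math.log(row["market_value"]),
        # Age enters the model as a squared term.
        "age_sq": row["age"] ** 2,
        # Right-skewed popularity metrics; log(1 + x) is an assumption for zero counts.
        "log_wikipedia_views": math.log1p(row["wikipedia_views"]),
        "log_google_trends": math.log1p(row["google_trends"]),
        "log_reddit_posts": math.log1p(row["reddit_posts"]),
        "log_youtube_videos": math.log1p(row["youtube_videos"]),
    }


example = transform({
    "market_value": 3_000_000,  # median market value from Table 2
    "age": 26,
    "wikipedia_views": 0,
    "google_trends": 13.21,
    "reddit_posts": 2,
    "youtube_videos": 918,
})
```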
5. Results
5.1. Model specification
In order to build a statistical model with which to estimate
players’ market values, we fitted a series of regression models,
which included as predictors the players’ previous market values,
and the players’ characteristics, performance measures, and pop-
ularity metrics. As our data structure is hierarchical (players are
nested within teams, and teams are nested within leagues) and
longitudinal (players played multiple seasons), the model’s resid-
uals are likely not independent, which would violate a central as-
sumption of linear regression. Therefore, we used multilevel mod-
els that we specified to include player, team, league, position, con-
tinent of origin, and season as random factors, and for which
we allowed the intercepts to vary (notation adapted from Lee,
1975 ):
\[
\begin{aligned}
\text{Market value}_{i(t(l)pc)[s]} ={}& \alpha_{i(t(l)pc)[s]} + \beta \cdot \text{Market value}_{i(t(l)pc)[s-1]} \\
&+ \chi \cdot \text{Player characteristics}_{i(t(l)pc)[s]} \\
&+ \delta \cdot \text{Player performance}_{i(t(l)pc)[s]} \\
&+ \gamma \cdot \text{Player popularity}_{i(t(l)pc)[s]} \\
&+ u_{i(t(l)pc)[s]} + u_{t(l)} + u_{l} + u_{p} + u_{c} + u_{s} + \varepsilon_{i(t(l)pc)[s]},
\end{aligned}
\]
where $i(t(l)pc)[s]$ indexes a player $i$ who is nested within each of three factors that are crossed with each other (a team $t$, which is further nested in a league $l$; a position $p$; and the continent of origin $c$), with season observations $s$. $\text{Market value}_{i(t(l)pc)[s]}$ is the market value to be estimated; $\alpha_{i(t(l)pc)[s]}$ represents an individual intercept; $\text{Market value}_{i(t(l)pc)[s-1]}$ is the market value from the preceding season; $\text{Player characteristics}_{i(t(l)pc)[s]}$ consists of the predictors Age$^2$, Height, and Footedness; $\text{Player performance}_{i(t(l)pc)[s]}$ consists of the predictors Minutes played, Goals, Assists, (Successful) Passes, (Successful) Dribbles, (Successful) Aerial duels, (Successful) Tackles, Interceptions, Clearances, Fouls, Yellow cards, and Red cards; and $\text{Player popularity}_{i(t(l)pc)[s]}$ consists of the predictors Wikipedia page views, Google Trends search index, Reddit posts, and YouTube videos. $u_{i(t(l)pc)[s]}$, $u_{t(l)}$, $u_{l}$, $u_{p}$, $u_{c}$, and $u_{s}$ are random effects that are designed to capture the non-independence between 1) market values observed for the same player $i$ over time $s$ ($u_{i(t(l)pc)[s]}$), 2) market values observed for players on the same team ($u_{t(l)}$), 3) market values observed for teams in the same league ($u_{l}$), 4) market values observed for players who play the same position ($u_{p}$), 5) market values observed for players from the same continent of origin ($u_{c}$), and 6) market values observed for players in the same season ($u_{s}$), respectively. $\varepsilon_{i(t(l)pc)[s]}$ captures the remaining error. The random effects and the error term are assumed to be independently and identically distributed and follow a normal distribution with mean zero and standard deviation $\sigma_{\mu}$.
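To make the additive structure of the specification concrete, the following sketch assembles one estimate from a fixed part and a set of random intercepts. It is a simplified illustration, not the fitted model: the function name is hypothetical and all numbers are made up, and the covariate terms stand in for the products of coefficients and predictor values.

```python
import math


def predict_log_value(fixed_intercept, beta, log_prev_value,
                      covariate_terms, random_intercepts):
    """Sum the fixed part and the random intercepts of a varying-intercept model.

    covariate_terms: coefficient * predictor products (characteristics,
    performance, popularity); random_intercepts: one deviation per grouping
    factor (player, team, league, position, continent, season).
    """
    fixed_part = fixed_intercept + beta * log_prev_value + sum(covariate_terms)
    return fixed_part + sum(random_intercepts.values())


# All values below are hypothetical and chosen only for readability.
log_estimate = predict_log_value(
    fixed_intercept=7.0,
    beta=0.5,
    log_prev_value=14.0,
    covariate_terms=[0.10, -0.05],
    random_intercepts={
        "player": 0.20, "team": -0.10, "league": 0.05,
        "position": 0.0, "continent": 0.0, "season": -0.05,
    },
)
estimate = math.exp(log_estimate)  # back-transform from the log scale
```

Because only the intercepts vary, two players with identical predictors differ in their estimates solely through the group-specific deviations.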
5.2. Regression results
Table 3 shows the estimated coefficients, standard errors, and
p -values of the fixed effects as well as the standard deviations of
the random effects. Model 1 serves as a baseline model and con-
tains only an intercept and the Previous market value . Model 2 adds
player characteristics, Model 3 adds the player-performance variables, and Model 4 adds the player-popularity metrics. The goodness of fit, measured by the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), improves with each block of variables added; likelihood ratio tests confirm that these improvements are significant (from Model 1 to 2: χ²(3) = 2439.00, p < .001; from Model 2 to 3: χ²(16) = 4843.20, p < .001; from Model 3 to 4: χ²(4) = 144.12, p < .001).
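The fit statistics can be cross-checked against the reported log likelihoods with the standard definitions AIC = 2k − 2·logL, BIC = k·ln(n) − 2·logL, and LR = 2·(logL_full − logL_restricted). The sketch below rests on two assumptions that are not stated in the paper: the log likelihoods in Table 3 are taken to be negative, and the parameter counts k are inferred as the number of fixed-effect coefficients plus seven variance components.

```python
import math

N_OBS = 10_350  # number of observations reported in Table 3

# Log likelihoods from Table 3 (sign assumed negative); k is an inferred count.
models = {
    "Model 1": {"loglik": -8699.1, "k": 9},
    "Model 2": {"loglik": -7479.6, "k": 12},
    "Model 3": {"loglik": -5058.0, "k": 28},
    "Model 4": {"loglik": -4986.0, "k": 32},
}


def aic(loglik, k):
    """Akaike Information Criterion."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * loglik


def lr_statistic(loglik_restricted, loglik_full):
    """Likelihood ratio statistic for two nested models."""
    return 2 * (loglik_full - loglik_restricted)


names = list(models)
lr_stats = [
    lr_statistic(models[a]["loglik"], models[b]["loglik"])
    for a, b in zip(names, names[1:])
]
```

Under these assumptions the recomputed values agree with the reported AIC, BIC, and likelihood ratio statistics up to rounding of the log likelihoods.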
As our dependent variable is measured on the logarithmic scale,
the models’ coefficients can be interpreted roughly as percent
changes. The coefficients of the log-transformed independent vari-
ables have to be interpreted as elasticities. For example, an addi-
tional Goal (Assist) per season increases a player’s Market value by
2.4 (1.5) percent in Model 4, holding all other variables constant,
and a 1 percent increase in the number of Wikipedia page views is
associated with a .02 percent increase in Market value .
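The rough interpretation can be compared with the exact one. For a log-transformed outcome, a one-unit increase in an untransformed predictor changes the outcome by (exp(b) − 1) · 100 percent, and a q percent increase in a log-transformed predictor changes it by ((1 + q/100)^b − 1) · 100 percent. The sketch below applies both formulas to the Model 4 coefficients quoted above; the function names are illustrative.

```python
import math


def pct_change_per_unit(coef):
    """Exact percent change in the outcome for a one-unit increase (log-level)."""
    return (math.exp(coef) - 1) * 100


def pct_change_elasticity(coef, pct_increase):
    """Exact percent change in the outcome for a percent increase (log-log)."""
    return ((1 + pct_increase / 100) ** coef - 1) * 100


goal_effect = pct_change_per_unit(0.024)            # Goals, Model 4
assist_effect = pct_change_per_unit(0.015)          # Assists, Model 4
wikipedia_effect = pct_change_elasticity(0.016, 1)  # Log of Wikipedia page views
```

For coefficients this small the exact and approximate readings coincide after rounding, which is why the coefficients can be read directly as percent changes.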
In Model 1, the baseline model, the Previous market value (.543; p < .001) is significant. The significant variables in Model 2 are Previous market value (.610; p < .001) and Age² (−.002; p < .001). AIC drops from 17,416.2 to 14,983.2, indicating an improvement in goodness of fit. In Model 3, the significant variables from Model 2, that is, Previous market value (.495; p < .001) and Age² (−.002; p < .001), are still significant, and from the set of performance variables, Minutes played, Goals, Assists, Passes, Successful passes, Dribbles, Aerial duels, Tackles, and Yellow cards are also significant. With every minute a footballer plays, his market value increases by .03 percent (p < .001), each goal increases it by 2.60 percent (p < .001), and each assist increases it by 1.58 percent (p < .001). Passes (0.57%; p < .001), the ratio of Successful passes (30.05%; p < .001), Dribbles (3.02%; p < .001), and Aerial duels (1.33%; p < .001) further increase a player’s market value, whereas Tackles (−2.08%; p < .001) and Yellow cards (−0.41%; p < .05) decrease it. The model’s goodness of fit increases compared to Model 2, as AIC drops from 14,983.2 to 10,172.0.
Model 4 adds popularity data. The variables from Model 3
remain largely stable when Wikipedia page views , Google Trends
search index , Reddit posts , and YouTube videos are added. Three of
the four popularity variables are significantly related to a player’s
market value, with a .02 percent increase for each 1 percent in-
crease in Wikipedia page views ( p < .001), a .03 percent increase for
each 1 percent increase in Reddit posts ( p < .001), and a .01 per-
cent increase for each 1 percent increase in YouTube videos ( p <
.01). The model’s goodness of fit increases compared to the previ-
ous models, as AIC drops from 10,172.0 to 10,035.9.
The parameter estimates for the random effects (i.e., the standard deviations) remain largely stable across models (σ2 to σ6). However, unexplained player-specific variability (σ1, the standard deviation for Players nested in Teams nested in Leagues) is comparatively large in Model 1 (.444) but decreases when additional fixed factors and covariates are added (Model 4: .185). In other words, these variables explain additional variability between players. In what follows, we evaluate the accuracy of Model 4 in estimating market value, as it is the model with the highest goodness of fit.

Table 3
Multilevel regression models.
Dependent variable: Log of market value

                                    Model 1      Model 2      Model 3      Model 4
Fixed effects
Intercept                           6.789***     6.492***     7.432***     7.272***
                                    (.132)       (.219)       (.203)       (.200)
Log of previous market value        .543***      .610***      .495***      .486***
                                    (.006)       (.005)       (.005)       (.005)
Age²                                             −.002***     −.002***     −.002***
                                                 (.000)       (.000)       (.000)
Height                                           .002         .001         .001
                                                 (.001)       (.001)       (.001)
Footedness                                       .003         .006         .007
                                                 (.022)       (.017)       (.017)
Minutes played                                                .000***      .000***
                                                              (.000)       (.000)
Goals                                                         .026***      .024***
                                                              (.002)       (.002)
Assists                                                       .016***      .015***
                                                              (.002)       (.002)
Passes                                                        .006***      .005***
                                                              (.001)       (.001)
Successful passes                                             .301***      .286***
                                                              (.083)       (.083)
Dribbles                                                      .030***      .028***
                                                              (.005)       (.005)
Successful dribbles                                           .035         .034
                                                              (.019)       (.018)
Aerial duels                                                  .013***      .014***
                                                              (.004)       (.004)
Successful aerial duels                                       .005         .006
                                                              (.028)       (.027)
Tackles                                                       −.021***     −.018***
                                                              (.005)       (.005)
Successful tackles                                            .049         .050
                                                              (.030)       (.030)
Interceptions                                                 .013         .010
                                                              (.008)       (.008)
Clearances                                                    .003         .003
                                                              (.003)       (.003)
Fouls                                                         .002         .004
                                                              (.010)       (.010)
Yellow cards                                                  −.004*       −.004
                                                              (.002)       (.002)
Red cards                                                     .007         .007
                                                              (.009)       (.008)
Log of Wikipedia page views                                                .016***
                                                                           (.002)
Log of Google Trends search index                                          .006
                                                                           (.004)
Log of Reddit posts                                                        .026***
                                                                           (.005)
Log of YouTube videos                                                      .007**
                                                                           (.002)
Random effects
σ1 (Player/Team/League)             .444         .298         .179         .185
σ2 (Team/League)                    .280         .217         .237         .219
σ3 (League)                         .138         .137         .150         .120
σ4 (Position)                       .083         .052         .056         .050
σ5 (Continent of origin)            .057         .053         .034         .029
σ6 (Season)                         .107         .089         .089         .098
σ7 (Residual)                       .409         .411         .347         .343
Log Likelihood                      −8699.1      −7479.6      −5058.0      −4986.0
AIC                                 17,416.2     14,983.2     10,172.0     10,035.9
BIC                                 17,481.4     15,070.1     10,374.9     10,267.8

Notes: * p < .05; ** p < .01; *** p < .001; standard errors are in parentheses.
Number of observations: 10,350. Number of groups: Players, 4217; Teams, 146; Continents of origin, 6; Seasons, 6; Leagues, 5; Positions, 3.
5.3. Model evaluation
Market values are unobservable, which made it difficult to eval-
uate the accuracy of our statistical model. Still, market values are
proxies for transfer fees ( He et al . , 2015 ), so we compared the
model estimates with actual transfer fees. However, market val-
ues and transfer fees are not necessarily the same. For example,
players can switch clubs after their contracts have expired with-
out any transfer fee, but that does not mean that their market
value is zero, and clubs sometimes pay unreasonably high fees
for players, especially if they have to find replacements for in-
jured players quickly or want to weaken competitors ( Herm et al.,
2014). Against this background, we also compared our model estimates with the crowd estimates, which provided another benchmark for evaluating our model’s accuracy. We collected data on publicly announced transfer fees for all six playing seasons. From the evaluation sample we excluded players whose transfer fees were zero (because their contracts had expired or they were on loan), and we retained only players who had been sold by one of the 146 clubs in our data set (players whom these clubs had bought may have come from leagues other than the European top five, so we would not have had their data). This process yielded 845 transfer fees with which we could evaluate our model’s accuracy.
Because our sample spanned several playing seasons, we could
not use standard evaluation strategies for predictive models, such
as k-fold cross-validation (see, e.g., Hastie, Tibshirani, & Fried-
man, 2017 ), as these strategies would have introduced the risk of
leakage–that is, the use of data from the future to train a model in
the past ( Kaufman, Rosset, & Perlich, 2011 ). Therefore, we applied
a time-series-based evaluation approach to ensure that a player’s
market value after a given season was estimated based only on
data that was known at that point in time. For example, to esti-
mate players’ market values after the 2009/10 season, we trained
the model on data from the 2009/10 season, and to estimate play-
ers’ market values after the 2010/11 season, we trained the model
on data from the 2009/10 and 2010/11 seasons. After we had ob-
tained statistical estimates of market value for all 845 players in
our evaluation sample, we calculated the differences between the
model estimates and the transfer fees for each of them and, on
that basis, the Root Mean Squared Error (RMSE) and the Mean
Absolute Error (MAE) as aggregated measures. We calculated the
same two measures for the crowd’s estimates.
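The evaluation scheme can be sketched as an expanding window over seasons combined with the two error measures. This is a minimal illustration that follows the description above (the training data include every season up to and including the one being estimated); the function names are hypothetical, and the model-fitting step itself is omitted.

```python
import math

SEASONS = ["2009/10", "2010/11", "2011/12", "2012/13", "2013/14", "2014/15"]


def expanding_windows(seasons):
    """Yield (training_seasons, target_season) pairs in chronological order.

    No season later than the target enters the training data, which is the
    property that guards against leakage.
    """
    for idx, target in enumerate(seasons):
        yield seasons[: idx + 1], target


def rmse(estimates, actuals):
    """Root Mean Squared Error between estimates and actual transfer fees."""
    return math.sqrt(
        sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(actuals)
    )


def mae(estimates, actuals):
    """Mean Absolute Error between estimates and actual transfer fees."""
    return sum(abs(e - a) for e, a in zip(estimates, actuals)) / len(actuals)


windows = list(expanding_windows(SEASONS))
```

RMSE penalizes large individual deviations more heavily than MAE, so the two measures can rank estimation approaches differently when a few transfers are very expensive.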
As Table 4 shows, the evaluation results indicate that the crowd’s estimates are slightly more accurate in that they are closer to actual transfer fees than the model’s estimates, with an RMSE that is 3.4% lower and an MAE that is 3.6% lower. However, a Diebold-Mariano test that compared the MAEs of the crowd’s estimates and the model’s estimates showed no statistically significant difference (p < .340) (Diebold & Mariano, 1995). On average, the crowd estimates deviate by €3,241,733 from the players’ transfer fees and the model estimates by €3,359,743.

Table 4
Model evaluation.

                       RMSE         MAE
Crowd estimates        5,793,474    3,241,733
Model estimates        5,996,341    3,359,743
Relative difference    +3.4%        +3.6%

Notes: A positive value for relative difference indicates superiority of crowd. Actual transfer fees were used as ground truth. N = 845

However, as the exploratory data analysis revealed, the distribution of players’ market values was highly skewed and characterized by extreme outliers (Appendix B), as was the case with their transfer fees. Therefore, we evaluated the accuracy of both the model estimates and the crowd estimates for various price ranges. Fig. 2 shows the development of the difference in RMSE between the model’s estimates and the crowd’s estimates when the data set is filtered at various cut-off points. While the differences between the two estimation approaches are generally not large, the model tends to be more accurate for low- to medium-priced players, whereas the crowd tends to be more accurate for high-priced players.
The crossover between the model’s estimates and the crowd’s estimates occurs at a transfer fee of approximately €18 million, which is at the 90th percentile of the distribution. (Fig. 3 provides a transfer-fee histogram.) In other words, the model produced more accurate estimates on average than the crowd did for the lower 90 percent of all transfers (i.e., for 769 out of 845 transferred players).
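The cut-off analysis behind the crossover can be sketched as follows: for each upper limit on the transfer fee, the evaluation sample is restricted to transfers at or below that limit, and the RMSE of the crowd is subtracted from the RMSE of the model. The three transfers below are invented toy values in EUR million and the function names are hypothetical; only the procedure is taken from the text.

```python
import math


def rmse(estimates, actuals):
    """Root Mean Squared Error between estimates and actual transfer fees."""
    return math.sqrt(
        sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(actuals)
    )


def rmse_gap_by_cutoff(transfers, cutoffs):
    """Map each upper fee limit to RMSE(model) - RMSE(crowd).

    A negative gap means the model is more accurate for transfers up to
    that limit; a positive gap means the crowd is more accurate.
    """
    gaps = {}
    for cutoff in cutoffs:
        subset = [t for t in transfers if t["fee"] <= cutoff]
        if not subset:
            continue  # no transfers at or below this limit
        fees = [t["fee"] for t in subset]
        model_rmse = rmse([t["model"] for t in subset], fees)
        crowd_rmse = rmse([t["crowd"] for t in subset], fees)
        gaps[cutoff] = model_rmse - crowd_rmse
    return gaps


# Hypothetical toy transfers (EUR million), not data from the study.
transfers = [
    {"fee": 2.0, "model": 2.5, "crowd": 3.0},
    {"fee": 10.0, "model": 9.0, "crowd": 8.0},
    {"fee": 50.0, "model": 30.0, "crowd": 45.0},
]
gaps = rmse_gap_by_cutoff(transfers, [18.0, 101.0])
```

In this toy sample a single expensive transfer is enough to flip the sign of the gap, which illustrates how a few high-priced players can dominate an RMSE computed over the full sample.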
In contrast, the crowd produced more accurate estimates on average for players with high transfer fees, such as superstars like David Luiz and Edinson Cavani, who were bought by Paris Saint-Germain F.C. for fees of €49.5 million and €64.5 million, respectively. (Appendix C provides more detailed evaluation results.)
6. Discussion
Overall, the results from the evaluation of our statistical model
confirm the applicability of data analytics to estimating market
value, as the estimated market values did not deviate considerably from actual transfer fees. The average deviation was around €3.4 million, which is not much considering the high transfer fees in today’s football. (The players’ transfer fees ranged from €1,000 to €101,000,000 in our sample, with a standard deviation of €9,414,575.) Still, it is difficult to draw conclusions from
a comparison with transfer fees alone, because they are concep-
tually different from market values. To have another benchmark,
we also compared our model estimates with Transfermarkt’s es-
timates of market value, which we found to be more closely re-
lated to actual transfer fees. However, the difference was relatively
small, with an RMSE that was only 3.4 percent lower and not
statistically significant, so our evaluation results do not necessar-
ily indicate that crowds are more accurate in estimating market
value.
In fact, we found that the model tends to provide more ac-
curate estimations for low- to medium-priced players, while the
crowd tends to be more accurate for high-priced players. Specif-
ically, the model produced more accurate market-value estimates
on average for the lower 90 percent of the transfers we con-
sidered, even though the differences between crowd estimations
and model estimations were often not large. However, especially
for the smaller share of expensive players, the model estimations
were disproportionally inaccurate, which skewed the average so
the crowd was more accurate for the overall sample. There are at
least two possible explanations for this finding. First, the model
may not be able to value expensive players, especially superstars,
accurately because it may lack important intangible indicators (e.g.,
players’ potential to boost ticket or jersey sales). While the crowd
can consider such factors, which can range widely from player
to player, the statistical model uses the same set of predefined
factors for all players. In other words, the crowd has more freedom in selecting relevant information for player valuation, which may be an advantage when it comes to setting a value on a superstar. Second, professional football clubs sometimes pay very high transfer fees for players, which may not reflect their “true” value, so the model has difficulty in estimating their prices. In that case, the crowd would be severely biased by these players’ talent and popularity, while the statistical model would make it possible to detect disproportionate and unreasonable payments on the transfer market.

Fig. 2. Comparison of the model’s and the crowd’s estimates
Notes: The x-axis is log-transformed and represents the upper limits of transfer fees. The y-axis shows the difference in RMSE between the model and the crowd, calculated based on a comparison with transfer fees. The dotted line separates the lower 90% of all transfers from the higher 10% of all transfers.
Our findings have several implications for the practice of esti-
mating market value in professional football. We argued that data-
driven estimation of market value can overcome several limitations
that are associated with crowd-based estimates of market value.
The use of data analytics is arguably more transparent and re-
producible than crowd judgments are, as the estimated regression
coefficients directly quantify the impact of several variables on a
player’s market value. Transparency about the relationships of mar-
ket values with player characteristics, performance, and popularity
can help managers to make predictions about future market-value
developments that can be repeated at minimal cost and with a
high level of reliability. Because data analytics is efficient, it may
even allow players’ market values to be estimated on a match-
by-match basis, while the crowd can update market values only
infrequently. Based on a comparison with actual transfer fees, we
showed that formal models can provide accurate estimates of mar-
ket value that do not deviate much from crowd-based estimates,
even though the crowd’s estimates require considerably more time
and effort. Therefore, our statistical results can form the basis for
building real-time information systems that estimate and predict
players’ market values. In addition, our results may also be in-
teresting for operators of fantasy-football websites, where partic-
ipants slip into the role of club managers and choose their team
rosters by buying and selling players, as such games likewise use
performance data to determine players’ value, yet in a much sim-
pler way.
Furthermore, while crowdsourcing platforms like Transfermarkt produce public numbers, data analytics allows football clubs to evaluate players internally, so it can provide a competitive advantage to football clubs in transfer negotiations. In particular, data analytics can support clubs in player scouting, while the crowd often has difficulty evaluating lesser-known players (e.g., from less popular leagues). Players who are largely unknown tend to receive only a few votes from the crowd, which increases the risk of biased estimations. Formal models have the potential to identify talented young players early in their careers, when their value is still unknown to the broader public. Against this background, our study demonstrates the applicability of the “Moneyball” idea in association football.

Fig. 3. Distribution of transfer fees
Note: The dotted line separates the lower 90% of all transfers from the higher 10% of all transfers.
To the best of our knowledge, this study is grounded in the
largest data set in terms of both coverage (five leagues, six years)
and level of detail (more than twenty indicators) that has been
used for research on estimating market value in professional foot-
ball. Accordingly, our study can also inform future research in the
field. In particular, we determined the significance of various in-
dicators of market value that have guided related work, by which
we proposed a multilevel model for estimating market value. How-
ever, although our model incorporated a large number of market-
value indicators, commercial providers of sports data capture more
than two hundred metrics per player per game to which we did
not have access. Therefore, future research is challenged to test the
applicability of alternative model specifications and to determine
the significance of additional indicators of market value. For exam-
ple, it is likely that market values are a function of several other
variables at the league level (e.g., UEFA coefficients), at the club
level (e.g., team popularity), and at the individual level (e.g., ap-
pearances and performance on the national team or in the Cham-
pions League or Europa League), which we did not include in our
model. Moreover, future research could investigate the added value
of not only considering the volume of news shared on Reddit or
keywords used on Google as indicators of market value, but also
their sentiment ( Pang & Lee, 2008 ). For example, research on the
applicability of social-media data to predict politicians’ popularity
has shown that combining information on volume and sentiment
can enhance the accuracy of predictive models (see, e.g., Gayo-
Avello, 2013 ).
Against this background, our study has several limitations. First,
we could not confirm empirically the potential of data analyt-
ics in scouting young and/or unknown players. Because we used
data from the five largest European leagues, most of the play-
ers in our sample were already well known to the public and
crowd. Therefore, future research should conduct similar analy-
ses using minor-league data, which may be a challenge because
less data are available for the minor leagues. Second, we argued
that data analytics can make estimating changes in players’ mar-
ket value possible on a match-by-match basis, while crowd es-
timations require much more time and effort. However, this po-
tential also remains to be empirically confirmed. Our model used
seasonal data, so future research is challenged to conduct simi-
lar analyses with match-day data. Third, because of the unavail-
ability of other credible sources that provide historical estima-
tions of market value, we trained our model based on Transfer-
markt’s estimates of market value–another reason why our eval-
uation results are difficult to interpret. Therefore, data analytics
should not be viewed at this stage as an alternative but as a com-
plementary approach to crowd-based estimation. As our model in-
corporated human judgment, it can be considered a “model of
the judge” ( Baron, 2008 , pp. 366ff.)–that is, we used the subjec-
tive estimations by the Transfermarkt judges to train a statisti-
cal model based on objective market-value indicators. To evalu-
ate the superiority of purely formal models over crowd estimates,
or vice versa, future research should develop time-series based
approaches to data-driven estimation of market value that pre-
dict market values in the future based on their own past estima-
tions.
7. Conclusions
Based on an analysis of a unique data set of 4217 players on
146 teams from the top five European leagues and a period of six
playing seasons, we demonstrated the value of using multilevel re-
gression models to estimate players’ market values. Comparing our
results with crowd estimates shows that a data-driven approach to
estimating market value can overcome several of crowdsourcing’s
practical limitations while producing comparatively accurate esti-
mates. Given the increasing availability of data about football play-
ers in the form of data sets from commercial data providers and
user-generated content from the web, we expect that the football
industry will increasingly adopt data analytics to support player re-
cruitment and transfer negotiations.
Acknowledgments
This research did not receive any specific grant from funding
agencies in the public, commercial, or not-for-profit sectors. We
thank Alin Secareanu for his assistance and support in collecting
and preparing the data.
Appendix A. Descriptive Overview of the European transfer
market
We collected Transfermarkt’s estimations of market value at the end of the six seasons (as per June 30) for all players in our sample. Fig. A.1 shows how the players’ market values changed during the six-year period for the various playing positions, and Fig. A.2 shows how they changed during that time for the top five European leagues. For each of the five leagues, Fig. A.3 shows the two teams with the highest average player values across all seasons. Across all leagues, the average player was worth €5.4 million in 2009/10 and €6.0 million by 2014/15, an 11 percent increase in only six years, which illustrates how important player valuation has become in recent years.

Fig. A.1. Development of market value across positions
Note: The figure displays estimations of market value at the end of the six playing seasons, as estimated on the Transfermarkt website.

Fig. A.2. Development of market value across leagues
Note: The figure displays estimations of market value at the end of the six playing seasons, as estimated on the Transfermarkt website.
Market values have generally increased for all positions, but the amount of the increase has differed considerably among them. With an average market value of €4.4 million across all seasons, defenders had the lowest market values, while midfielders’ and forwards’ average market values were €5.9 million and €7.2 million, respectively. From 2009/10 to 2014/15, forwards’ market values increased from €6.8 million to €7.6 million (11.8%), midfielders’ market values increased from €5.7 million to €6.5 million (14.0%), and defenders’ market values increased from €4.4 million to €4.6 million (4.5%).
England’s Premier League had the highest average market value in every season. In 2009/10, its average market value was €7.3 million, and it increased to €8.5 million in 2014/15 (16.4%). The two most valuable teams were Chelsea F.C. (with an average player value of €19.3 million) and Manchester City (with an average player value of €18.8 million). Both of these teams were much less valuable than the two top teams from Spain, FC Barcelona (with an average player value of €29.4 million) and Real Madrid (with an average player value of €26.4 million), even though players in the Spanish league overall had considerably lower average market values (average of €6.8 million) across the six seasons.
German Bundesliga players’ average market values increased from €4.3 million in 2009/10 to €5.8 million in 2014/15 (34.9%). The two most valuable clubs were FC Bayern Munich (with an average player value of €17.8 million) and Borussia Dortmund (with an average player value of €11.3 million). In contrast, Italy’s Serie A players lost value, with average market values decreasing from €5.5 million in 2009/10 to €5.0 million in 2014/15, so the Serie A lost its place among the top three most valuable European leagues to Germany. The two most valuable teams were Juventus Turin (with an average player value of €12.6 million) and Inter Milan (with an average player value of €10.2 million).

Fig. A.3. Teams with the highest average player market values
Notes: The figure displays the average player values, not the total team values, at the end of the six playing seasons, as estimated on the Transfermarkt website. The two teams with the highest average player values are shown for each of the five leagues.
Finally, players’ market values in France’s Ligue 1 remained
largely stable over the six years under consideration, with an
average market value of 3.5 million in 2009/10 and 3.4 mil-
lion in 2014/15. The two most valuable teams were Paris Saint-
Germain F.C. (with an average player value of 12.0 million) and
Olympique de Marseille (with an average player value of 6.6
million).
Appendix B. Distribution and correlation of dependent and
independent variables
As we used Transfermarkt’s estimates of market value to train
our model, we investigated the distributions of the players’ market
values. Fig. B.1 provides box plots that show how the market values
were distributed across seasons, leagues, and positions. The distri-
bution of the players’ market values was highly right-skewed, with
means that were above the medians for all seasons, leagues, and
positions, which indicates that our sample contained a few players
with exceptionally high market values, as well as a large number
of players whose market values were below the average of around
5.6 million.
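The link between right skew and a mean above the median can be illustrated with a small sketch. The values below are hypothetical, made up for illustration, and are not taken from the data set:

```python
import statistics

# Hypothetical market values in millions (NOT from the paper's data):
# many modest values and one exceptionally high value.
values = [0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 6.0, 35.0]

mean_value = statistics.mean(values)
median_value = statistics.median(values)

# In a right-skewed sample, a few large values pull the mean above the median,
# so most observations end up below the mean.
below_mean = sum(1 for v in values if v < mean_value)

print(f"mean = {mean_value:.2f}, median = {median_value:.2f}")
print(f"{below_mean} of {len(values)} values lie below the mean")
```

In this toy sample a single high value is enough to lift the mean well above the median, which is the pattern the box plots show for every season, league, and position.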
We also investigated how the indicators of market value that
we used as independent variables in our regression model were
correlated (Table B.1). All correlations were below the critical
threshold of 0.7; in addition, all variance inflation factors (VIF)
were below 4, well below the critical threshold of 10, so
multicollinearity presented no problems.
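One common way to obtain VIFs is from the correlation matrix of the predictors: for standardized predictors in a linear regression, the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix. The sketch below is an illustration under that assumption, not necessarily how the VIFs reported here were computed. It uses only three variables from Table B.1 (minutes played, goals, assists), so its output does not reproduce the full-model VIFs; the function names are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest absolute entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def vifs(corr):
    """Return the VIFs as the diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Correlations among minutes played, goals, and assists, as printed in Table B.1.
corr = [
    [1.00, 0.37, 0.42],
    [0.37, 1.00, 0.51],
    [0.42, 0.51, 1.00],
]

for name, vif in zip(["Minutes played", "Goals", "Assists"], vifs(corr)):
    print(f"{name}: VIF = {vif:.2f}")
```

With only two predictors, this reduces to the familiar 1 / (1 - r^2), so a pairwise correlation of 0.7 alone corresponds to a VIF of about 2.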
Appendix C. Evaluation results
We used the sample of players who had transfer fees below 18
million to investigate our model’s accuracy by evaluating how the
estimates of market value differed from actual transfer fees across
seasons, positions, and leagues. Table C.1 shows the evaluation re-
sults.
In the first four seasons, the crowd’s estimates were closer
to the actual transfer fees, especially in season 2012/13 (relative
difference in RMSE of +20.0%), but in 2013/14 and 2014/15, the
model’s estimates were more accurate (−13.2% and −3.1%, respectively).
While the model produced more accurate numbers for Germany’s
Bundesliga (−6.4%) and England’s Premier League (−5.2%),
the crowd provided more accurate estimates for Spain’s La Liga
(+0.9%), France’s Ligue 1 (+2.1%), and Italy’s Serie A (+9.4%).
Finally, the crowd’s estimates were closer to the actual transfer fees
for defenders (+4.6%) and forwards (+7.3%), while the model was
more accurate for midfielders (−8.4%).
Fig. B.1. Distribution of market value across seasons, leagues, and positions
Notes: The figure displays box plots of market-value estimations at the end of the six playing seasons, as estimated on the Transfermarkt website. The y-axes are log-transformed. The whiskers (i.e., the lines at the bottom and top of each box) show the minimum and maximum values within 1.5 times the interquartile range; the bands in the boxes represent the 25th, 50th (median), and 75th percentiles. The dotted lines that cross the box plots show the mean market value.
Table B.1
Correlation matrix.
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22)
(1) Age 1
(2) Height 0.01 1
(3) Minutes played 0.10 0.02 1
(4) Goals 0.00 0.01 0.37 1
(5) Assists 0.02 0.20 0.42 0.51 1
(6) Passes 0.13 0.02 0.52 0.03 0.20 1
(7) Successful passes 0.01 0.13 0.07 0.12 0.01 0.49 1
(8) Dribbles 0.22 0.28 0.19 0.41 0.48 0.02 0.04 1
(9) Successful dribbles 0.03 0.10 0.18 0.09 0.04 0.32 0.21 0.06 1
(10) Aerial duels 0.10 0.40 0.22 0.19 0.04 0.06 0.29 0.09 0.07 1
(11) Successful aerial duels 0.14 0.41 0.13 0.21 0.23 0.29 0.04 0.34 0.19 0.25 1
(12) Tackles 0.02 0.05 0.33 0.24 0.00 0.56 0.18 0.02 0.22 0.00 0.24 1
(13) Successful tackles 0.00 0.09 0.10 0.12 0.11 0.13 0.09 0.15 0.04 0.00 0.22 0.10 1
(14) Interceptions 0.07 0.12 0.33 0.31 0.16 0.53 0.17 0.26 0.25 0.06 0.44 0.62 0.25 1
(15) Clearances 0.13 0.39 0.23 0.27 0.29 0.28 0.08 0.42 0.25 0.29 0.53 0.22 0.28 0.55 1
(16) Fouls 0.02 0.10 0.26 0.10 0.05 0.21 0.11 0.08 0.04 0.24 0.12 0.41 0.04 0.24 0.00 1
(17) Yellow cards 0.11 0.03 0.58 0.09 0.15 0.38 0.05 0.01 0.12 0.13 0.17 0.40 0.09 0.37 0.19 0.49 1
(18) Red cards 0.03 0.07 0.17 0.00 0.01 0.11 0.00 0.04 0.06 0.08 0.13 0.12 0.07 0.17 0.16 0.18 0.20 1
(19) Wikipedia page views 0.02 0.01 0.13 0.27 0.24 0.15 0.16 0.16 0.03 0.01 0.06 0.07 0.01 0.13 0.09 0.08 0.02 0.03 1
(20) Google Trends search index 0.03 0.02 0.03 0.10 0.08 0.04 0.08 0.04 0.03 0.03 0.03 0.03 0.04 0.03 0.00 0.04 0.03 0.00 0.15 1
(21) Reddit posts 0.09 0.03 0.13 0.17 0.17 0.18 0.18 0.11 0.09 0.18 0.02 0.02 0.10 0.11 0.02 0.14 0.04 0.02 0.33 0.04 1
(22) YouTube videos 0.07 0.06 0.14 0.21 0.20 0.15 0.12 0.15 0.03 0.08 0.05 0.05 0.05 0.08 0.06 0.09 0.08 0.00 0.26 0.08 0.63 1
Table C.1
Model evaluation across seasons, positions, and leagues.
                            RMSE                RMSE                Relative
                            Model’s estimates   Crowd’s estimates   difference   N
Seasons    2009/10          3,444,749           3,382,450           +1.8%        101
           2010/11          3,242,258           3,217,317           +0.8%        147
           2011/12          4,006,372           3,808,920           +5.1%        120
           2012/13          3,221,275           2,635,404           +20.0%       130
           2013/14          3,101,502           3,541,482           −13.2%       129
           2014/15          4,241,699           4,374,319           −3.1%        141
Positions  Defender         3,723,296           3,556,600           +4.6%        240
           Midfielder       3,175,083           3,453,751           −8.4%        315
           Forward          3,932,515           3,653,805           +7.3%        213
Leagues    Bundesliga       2,743,188           2,923,510           −6.4%        164
           La Liga          3,642,176           3,610,105           +0.9%        102
           Ligue 1          3,855,753           3,775,886           +2.1%        128
           Premier League   4,113,338           4,332,412           −5.2%        144
           Serie A          3,532,511           3,215,505           +9.4%        230
Notes: The table shows RMSEs for transfer fees below 18 million. A positive value
for relative difference indicates superiority of crowd. N = 768.
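The quantities in Table C.1 can be sketched in a few lines. The RMSE definition is standard. The formula for the relative difference is not stated in this section; the version below, which divides the gap between the two RMSEs by their mean, is an assumption inferred from the table, since it reproduces the printed percentages for the rows checked. The function names are hypothetical.

```python
import math


def rmse(estimates, actuals):
    """Root mean squared error between estimated values and actual transfer fees."""
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - a) ** 2 for e, a in zip(estimates, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_difference(rmse_model, rmse_crowd):
    """Gap between the two RMSEs, in percent of their mean.

    ASSUMPTION: inferred from the values in Table C.1, not stated in the text.
    Positive values mean the crowd had the lower (better) RMSE.
    """
    mean_rmse = (rmse_model + rmse_crowd) / 2.0
    return (rmse_model - rmse_crowd) / mean_rmse * 100.0


# RMSE pairs (model, crowd) taken from Table C.1.
rows = {
    "2012/13": (3_221_275, 2_635_404),
    "2013/14": (3_101_502, 3_541_482),
    "Midfielder": (3_175_083, 3_453_751),
    "Serie A": (3_532_511, 3_215_505),
}

for label, (model, crowd) in rows.items():
    print(f"{label}: {relative_difference(model, crowd):+.1f}%")
```

A symmetric denominator treats the model and the crowd alike, so swapping the two arguments only flips the sign of the result.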
References
Adler, M. (1985). Stardom and talent. The American Economic Review, 75 (1), 208–212 .
Alexa. (n.d.). How popular is transfermarkt.de? Retrieved March 14, 2017, from http:
//www.alexa.com/siteinfo/www.transfermarkt.de .
Amir, E. , & Livne, G. (2005). Accounting, valuation and duration of football player
contracts. Journal of Business Finance & Accounting, 32 (3–4),
549–586 .
Baron, J. (2008). Thinking and deciding (4th edition). New York, NY, USA: Cambridge
University Press .
Bojanova, I. (2014). IT enhances football at World Cup 2014. IT Professional, 16 (4),
12–17 .
Brandes, L. , & Franck, E. (2012). Social preferences or personal career concerns? Field
evidence on
positive and negative reciprocity in the workplace. Journal of Eco-
nomic Psychology, 33 (5), 925–939 .
Brandes, L. , Franck, E. , & Nüesch, S. (2008). Local heroes and superstars: an empiri-
cal analysis of star attraction in German soccer. Journal of Sports Economics, 9 (3),
266–286 .
Brunswik, E. (1952).
The conceptual framework of psychology . Chicago, IL, USA: The
University of Chicago Press .
Bryson, A. , Frick, B. , & Simmons, R. (2012). The returns to scarce talent: footedness
and player remuneration in European soccer. Journal of Sports Economics, 14 (6),
606–628 .
Carmichael, F. , & Thomas,
D. (1993). Bargaining in the transfer market: theory and
evidence. Applied Economics, 25 (12), 1467–1476 .
Carmichael, F. , Forrest, D. , & Simmons, R. (1999). The labour market in association
football: who gets transferred and for how much? Bulletin of Economic Research,
51 (2), 125–150 .
Dawes, R. M. , Faust, D. , & Meehl, P. E. (1989). Clinical versus actuarial judgment.
Science, 243 (4899), 1668–1674 .
Diebold, F. X. , & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of
Business & Economic Statistics, 13 (3), 253–263 .
Dobson, S. , Gerrard, B. , & Howe, S. (20 0
0). The determination of transfer fees in
English nonleague football. Applied Economics, 32 (9), 1145–1152 .
Evans, J. S. B. T. (2006). The heuristic-analytic theory of reasoning: extension and
evaluation. Psychonomic Bulletin & Review, 13 (3), 378–395 .
Feess, E., Frick, B., & Muehlheusser, G. (2004). Legal restrictions on buyout fees:
the-
ory and evidence from German soccer. IZA Discussion Paper No. 1180. Retrieved
March 14, 2017, from http://ssrn.com/abstract=562445 .
Franck, E. , & Nüesch, S. (2011). The effect of wage dispersion on team outcome and
the way team outcome is produced. Applied Economics, 43 (23), 3037–3049 .
Franck, E. ,
& Nüesch, S. (2012). Talent and/or popularity: what does it take to be a
superstar? Economic Inquiry, 50 (1), 202–216 .
Frick, B. (2007). The football players’ labor market: empirical evidence from the ma-
jor European leagues. Scottish Journal of Political Economy, 54 (3), 422–446 .
Frick, B. (2011). Performance, salaries,
and contract length: empirical evidence from
German soccer. International Journal of Sport Finance, 6 (2), 87–118 .
Fry, T. R. L. , Galanos, G. , & Posso, A. (2014). Let’s get Messi? Top-scorer productivity
in the European Champions League. Scottish Journal of Political Economy, 61 (3),
261–279 .
Garcia-del-Barrio, P.
, & Pujol, F. (2007). Hidden monopsony rents in winner-take-all
markets–Sport and economic contribution of Spanish soccer players. Managerial
and Decision Economics, 28 (1), 57–70 .
Gayo-Avello, D. (2013). A meta-analysis of state-of-the-art electoral prediction from
Twitter data. Social Science Computer Review, 31 (6), 649–679 .
Gerrard, B. , & Dobson, S. (20 0 0). Testing for monopoly rents in the market for play-
ing talent–Evidence from English professional football. Journal of Economic Stud-
ies, 27 (3),
142–164 .
Grove, W. M. , & Meehl, P. E. (1996). Comparative efficiency of informal (subjec-
tive, impressionistic) and formal (mechanical, algorithmic) prediction proce-
dures: the clinical-statistical controversy. Psychology, Public Policy, and Law, 2 (2),
293–323 .
Grove, W. M. , Zald, D. H. , Lebow, B. S. , Snitz,
B. E. , & Nelson, C. (20 0 0). Clinical
versus mechanical prediction: a meta-analysis. Psychological Assessment, 12 (1),
19–30 .
Hastie, T. , Tibshirani, R. , & Friedman, J. (2017). The elements of statistical learning:
data mining, inference, and prediction (2nd edition). New York, NY, USA: Springer .
He, M.,
Cachucho, R., & Knobbe, A. (2015). Football player’s performance and mar-
ket value. In Proceedings of the 2nd workshop of sports analytics, European Con-
ference on Machine Learning and Principles and Practice of Knowledge Discov-
ery in Databases (ECML PKDD) . Retrieved March 14, 2017, from https://dtai.cs.
kuleuven.be/events/MLSA15/papers/mlsa15 _ submission _
8.pdf .
Herm, S. , Callsen-Bracker, H.-M. , & Kreis, H. (2014). When the crowd evaluates soc-
cer players’ market values: accuracy and evaluation attributes of an online com-
munity. Sport Management Review, 17 (4), 4 84–4 92 .
Kaplan, T. (2010, July 8). When it comes to stats, soccer
seldom counts . The New York
Times . Retrieved March 14, 2017, from https:// mobile.nytimes.com/ 2010/ 07/ 09/
sports/ soccer/ 09soccerstats.html .
Kaufman, S. , Rosset, S. , & Perlich, C. (2011). Leakage in data mining: formulation,
detection, and avoidance. In Proceedings of the 17th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (pp. 556–563). San Diego, CA, USA .
Kiefer, S. (2014). The impact of the Euro 2012 on popularity and market value of
football players. International Journal of Sport Finance, 9 (2), 95–110 .
Lee, W. (1975). Experimental design and analysis . San Francisco, CA, USA: Freeman &
Co Ltd .
Lehmann, E. E. , & Schulze, G. G. (2008). What does it take to be a star? The role
of performance and the media for German soccer players. Applied Economics
Quarterly, 54 (1), 59–70 .
Lewis, M. (2004). Moneyball: the art of winning an unfair game . New
York, NY, USA:
W. W. Norton & Company, Inc .
Lorenz, J. , Rauhut, H. , Schweitzer, F. , & Helbing, D. (2011). How social influence can
undermine the wisdom of crowd effect. Proceedings of the National Academy of
Sciences of the United States of America, 108 (22), 9020–9025 .
Lucifora, C. , & Simmons, R. (2003). Superstar effects in sport: evidence from Italian
soccer. Journal of Sports Economics, 4 (1), 35–55 .
Medcalfe, S. (2008). English league transfer prices: is there a racial dimension? A
re-examination with new data. Applied Economics Letters, 15 (11), 865–867 .
Murtagh, J. (2015, August
20). Moneyball FC: how Midtjylland harnessed the
power of stats to set up Euro showdown with Southampton . Mirror. Re-
trieved March 14, 2017, from http://www.mirror.co.uk/sport/football/news/
moneyball- fc- how- midtjylland- harnessed- 6282271 .
Pang, B. , & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and
Trends in Information
Retrieval, 2 (1–2), 1–90 .
Pawlowski, T. , Breuer, C. , & Hovemann, A. (2010). Top clubs’ performance and the
competitive situation in European domestic football competitions. Journal of
Sports Economics, 11 (2), 186–202 .
Reilly, B. , & Witt, R. (1995). English league transfer prices: is there a racial dimen-
sion? Applied Economics Letters, 2 (7), 220–222 .
Rosen, S. (1981). The economics of superstars. The American Economic Review, 71 (5),
845–858 .
Ruijg, J., & van Ophem, H. (2014). Determinants of football transfers. Dis-
cussion Paper 2014/01, Department of Economics & Econometrics, Am-
sterdam School of Economics. Retrieved March
14, 2017, from http:
//ase.uva.nl/binaries/content/assets/subsites/amsterdam- school- of- economics/
research/uva- econometrics/dp- 2014/1401.pdf .
SAP. (2014, June 11) . SAP and the German Football Association turn big data into
smart decisions to improve player performance at the World Cup in Brazil.
SAP News Center. Retrieved March 14, 2017, from http://www.news-sap.com/
sap- dfb- turn- big-
data- smart- data- world- cup- brazil .
Speight, A. , & Thomas, D. (1997). Football league transfers: a comparison of negoti-
ated fees with arbitration settlements. Applied Economics Letters, 4 (1), 41–44 .
Steinberg, L. (2015, August 18). Changing the game: the rise of sports analytics . Forbes.
Retrieved March
14, 2017, from http://www.forbes.com/sites/leighsteinberg/
2015/08/18/changing- the- game- the- rise- of- sports- analytics/#1f21bfdc31b2 .
Surowiecki, J. (2005). The wisdom of crowds . New York, NY, USA: Random House .
Torgler, B. , & Schmidt, S. L. (2007). What shapes player performance in soccer? Em-
pirical findings from a panel analysis. Applied Economics,
39 (18), 2355–2369 .
Tversky, A. , & Kahneman, D. (1974). Judgment under uncertainty: heuristics and bi-
ases. Science, 185 (4157), 1124–1131 .
Zhu, F., Lakhani, K. R., Schmidt, S. L., & Herman, K. (2015). TSG Hoffenheim: football
in the age of analytics . Harvard Business School Case 616–010. Retrieved
March
14, 2017, from http:// www.hbs.edu/ faculty/ Pages/ item.aspx?num=49569 .