803
Views
0
CrossRef citations to date
0
Altmetric
Bayesian and Monte Carlo Methods

Modeling Intransitivity in Pairwise Comparisons with Application to Baseball Data

ORCID Icon, ORCID Icon, &
Pages 1383-1392 | Received 07 Apr 2022, Accepted 27 Jan 2023, Published online: 15 Mar 2023

Abstract

The seminal Bradley-Terry model exhibits transitivity, that is, the property that the probabilities of player A beating B and B beating C give the probability of A beating C, with these probabilities determined by a skill parameter for each player. Such transitive models do not account for different strategies of play between each pair of players, which gives rise to intransitivity. Various intransitive parametric models have been proposed but they lack the flexibility to cover the different strategies across n players, with the O(n2) values of intransitivity modeled using O(n) parameters, while they are not parsimonious when the intransitivity is simple. We overcome their lack of adaptability by allocating each pair of players to one of a random number of K intransitivity levels, each level representing a different strategy. Our novel approach for the skill parameters involves having the n players allocated to a random number of A<n distinct skill levels, to improve efficiency and avoid false rankings. Although we may have to estimate up to O(n2) unknown parameters for (A,K) we anticipate that in many practical contexts A+K<n. Our semiparametric model, which gives the Bradley-Terry model when (A=n1,K=0), is shown to have an improved fit relative to the Bradley-Terry, and the existing intransitivity models, in out-of-sample testing when applied to simulated and American League baseball data. Supplementary materials for the article areavailable online.

1 Introduction

The seminal Bradley-Terry model (Bradley and Terry Citation1952) is commonly used to rank objects from paired comparison data. Given a set I of n objects with each object iI having skill riR, then the Bradley-Terry model gives, for ijI,(1) pij(BT)=Pr{ij}:={1+exp[(rirj)]}1,(1) where ab denotes preference for object a over b, and r1=0 to avoid identifiability issues. A ranking of the objects is given by sorting estimates of r:={riR:iI}. This model is transitive, that is, pjk(BT) is given by pij(BT) and pik(BT), for all ijkI, see Section 3.

Now consider the game of Rock-Paper-Scissors, a zero-sum game in which Rock beats Scissors, Scissors beats Paper, and Paper beats Rock, and specifically consider the deterministic scenario where players (r,p,s) always pick (Rock, Paper, Scissors), respectively. In this scenario, all win probabilities in a game are either 0 or 1 depending on the opponent, and each player wins their next game with probability 1/2 if their next opponent is to be selected at random. Whatever way the skill of a player is defined, the symmetry of this game setup unquestionably leads to the conclusion that the three players have equalskill levels.

Conclusions drawn from a Bradley-Terry model fitted to data from this simple game are surprisingly poor. Given a round-robin tournament, where each player plays all other players an equal number of times, the model will correctly estimate that all players are equally ranked in terms of skills; however, it would also estimate all pairwise win probabilities to be 1/2, which couldn’t be more wrong. Even worse, is that any illusory ranking can result when the tournament is not round-robin, for example, if the most common pairing of players is (r,s) and the other two pairings occur equally often then the Bradley-Terry model will rank player r as top. The key reason for the failure of the Bradley-Terry model is its transitive nature, a trait shared by almost all commonly used ranking systems.

Here we develop a novel pairwise comparison model, and an associated ranking system, which accounts for intransitivity. Thus, it describes how specific pairwise probabilities differ from probabilities given by overall skill levels alone, that is, how probabilities differ from those given by the Bradley-Terry model. The Rock-Paper-Scissors game also illustrates that ranking can involve ties, where subsets of players can have equal skill levels, and that tournament structure can effect the subsequent inference. We also address some aspects associated withthese issues.

The concept and associated modeling of intransitivity is not new. Makowski and Piotrowski (Citation2006) present many examples of competitions exhibiting intransitivity and argue that it can occur whenever the best strategy in a given comparison depends on the strategy of the opponent, and Smead (Citation2019) provides a philosophical argument as to why intransitivity is particularly likely to occur in sports. Given this, it is not surprising to find cases of intransitivity in e-sports (Makhijani and Ugander Citation2019; Chen and Joachims Citation2016; Duan et al. Citation2017). Other applications include social choice, real sensory analysis, and electiondatasets.

With n competitors there are n(n1)/2 interactions, or intransitivities, so even in round-robin competitions, with m rounds, there are too many terms to estimate efficiently using empirical methods, unless m/n is large. Causeur and Husson (Citation2005) proposed an O(n2) parameter extension of the Bradley-Terry model to address intransitivity. Subsequently O(nd) parametric models have been studied for some fixed dN (dn), see all the models in Section 2, but they lack the flexibility to cover the potentially O(n2) different intransitivities across n players, leading to bias; whilst they are not parsimonious when the intransitivity is simple, leading to inefficiency.

Although the concept of intransitivity is quite clear, there is no established measure of the amount of intransitivity in a dataset. In this work, we propose a definition of intransitivity through a distance metric between the assumed probability of paired comparisons under a Bradley-Terry model, and the empirical or model-based probability estimate, such that for any given dataset the magnitude of the intransitivity present is unambiguous. A flexible model then, is one which is capable of exploring the space of all possible combinations of intransitivity, as defined by this measure. Any parametric model is restricted to a subset of this space by definition, with this restriction being most obviously revealed when assessing predictive performance.

We then develop a novel semiparametric extension of the Bradley-Terry model, allocating the n(n1)/2 pairs of objects to a random number K, with 0Kn(n1)/2, of distinct intransitivity levels, each level representing a different strategy. We term this model the Intransitive Clustered Bradley-Terry (ICBT) model. Relative to the aforementioned parametric models, this ICBT model provides greater flexibility to enable the incorporation of varying structures, and degrees of, intransitivity. As many of these strategies will have similar effects, we anticipate that K should be small, yet the random property of K provides the potential for it to be large when required. This flexibility ensures that our model is parsimonious, whatever the complexity of the data. For our Rock-Paper-Scissors illustration K=1.

Moreover, our novel approach for the objects’ skills is to allocate the n objects into a random number of A+1n distinct skill levels, to improve efficiency and avoid false rankings. This constraint recognizes that from paired comparison data there will be objects that are indistinguishable as having statistically significantly different skill levels, for example, for our Rock-Paper-Scissors illustration A=0. So clustering skills avoids overinterpretation of misinformed rankings, a feature Masarotto and Varin (Citation2012) address by clustering skills via a lasso procedure.

The basis of our model is the belief that in practice there are likely to small subsets of skill and intransitivity levels, namely An1 and Kn(n1)/2, respectively. As we have little prior knowledge about the skills of the objects or the intransitivities of the pairs of objects, we allow the clustering of objects into different skill levels, and of the pairs of objects into separate intransitivity levels, to be determined entirely through a Bayesian hierarchical model. We take each of (A,K), the allocations of objects to skill levels, and the allocations of the pairs of objects to intransitivity levels as unknown, with inference being conducted via a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm. This formulation does offer computational challenges; however, we anticipate that typically the posterior will give a high probability that A+K<n and that many of the cluster allocations also will be strongly identified. Our inference framework offers the opportunity to select a highly simplified model, with the values of A, K and allocations fixed at values given by posterior means/modes if these are found to align with known structure about the paired comparison. In the absence of such knowledge our results allow for the full uncertainty of these features to be accounted for.

In certain circumstances our model has the potential to identify and correct for imbalanced tournament structure on overall rankings since teams are not penalized if they (unfairly) compete most frequently against those whom they perform systematically worse to relative to what is expected based on respective skills alone.

We use American League Baseball data to illustrate the performance of our methods in comparison to existing models for a range of reasons. First, each game results in a win or a loss for a team. Second, it is known to be a highly strategic game, see Section 5, so we anticipate that the level of intransitivity will be high. Finally, although the tournament structure is not round robin, each team plays each other team often, and so the existence of intransitivity should become apparent in inference. Indeed this is found in Section 5, where our model is shown to have an improved fit over the Bradley-Terry model and existing parametric intransitivity models in out of sample testing for each of the nine seasons we study.

The layout is as follows. Section 2 introduces other approaches to modeling intransitivity. Section 3 then introduces our novel measure of intransitivity, the ICBT model, and the ranking formulation. Section 4 contains details of the inference, including prior specification, our full Bayesian hierarchical modeling strategy, an overview of the RJMCMC algorithm and its novel features, and an overview of a simulation study. Section 5 compares this model with the Bradley-Terry model and other competitor models, using American League baseball data. Section 6 is a discussion. Full details of the RJMCMC algorithm, simulation study, and extended analysis of the baseball application are in the supplementary material.

2 Literature on Intransitive Modeling

The blade-chest model of Chen and Joachims (Citation2016), extends the Bradley-Terry model into d-dimensions by incorporating so-called blade and chest vectors bi,ciRd for each object iI. There are two versions: the -dist and -inner variants, given, respectively, bylogit(pij(BCD)):=||bjci||22||bicj||22+rirj,andlogit(pij(BCI)):=biTcjbjTci+rirj.

The blade and chest parameters of all the objects can be viewed as features on a d-dimensional latent space. Then, if an object i’s blade is close to object j’s chest, and simultaneously object i’s chest is far from object j’s blade, then object i has an additional advantage over object j. If d=2 this model can represent a deterministic Rock-Paper-Scissors game, by placing the blade of Rock at the chest of Scissors, the blade of Scissors at the chest of Paper, and the blade of Paper at the chest of Rock. By increasing d, ever more complex relationships can be captured between the pairs of objects. Given n objects and bi,ciRd for each object i, the model contains 2d(n1) identifiable parameters. The r parameters can be absorbed into the blade and chest parameters; however, the above parametrisation makes it clear that the Bradley-Terry model is a special case of the blade-chest model, when bi=bj=ci=cj,i,jI.

Duan et al. (Citation2017) introduce a generalized model for intransitivity, withlogit(pij(G))=μiTΣμj+μiTΓμiμjTΓμj,with μiRd, where d is even, being a d-dimensional strength vector, for an object iI, and Σ,ΓRd×d are so-called transitive matrices. The first matrix Σ represents the interactions between objects, and Γ controls how an individual object’s strength components form the object’s overall strength. The number of identifiable parameters is d(3d/2+n1), since there are two d×d matrices (Σ,Γ), of which Σ is skew-symmetric, and n d-dimensional vectors. They show that their model is a generalization of the blade-chest and Bradley-Terry models. Specifically, the blade-chest-inner model arises when μi=(bi,ci), μj=(bj,cj), and ||μi||22=||μj||22, that is, the objects all have equal skill, and Σ is a block diagonal matrix with two (d/2)×(d/2) matrices of zeros on the diagonal and matrices Id/2 and Id/2 on the off-diagonals where Im is the m×m identity matrix, then μiTΣμj=biTcjbjTci. The degrees of freedom are restricted by regularization, using an L2 norm for the object strength vectors and Frobenius norm for both transitivity matrices. The tuning parameters are selected via cross-validation.

Makhijani and Ugander (Citation2019) introduced a majority vote model with object i having a vector of d skill attributes, (μi,1,,μi,d), where d is odd. Then, given a suitable choice of mapping function f, for example, logistic or Gaussian, define qijl=f(μi,lμj,l), l{1,,d} to be the probability of i beating j based only on their lth attribute. Then, majority vote model says that the probability of i being preferred to j overall, is the probability that it wins across the majority of attributes. For d=1 the model is linearly transitive, but not when d=3, asPr{ij}=qij1qij2qij3+(1qij1)qij2qij3+qij1(1qij2)qij3+qij1qij2(1qij3).

3 Modeling

3.1 Measure of Intransitivity

From the model definition (1), the Bradley-Terry model assumes linear transitivity. This assumption constrains the pairwise probabilities of the model such that, given pij(BT) and pjk(BT) from (1) for any ijk, the probability pik(BT) is completely determined. It is straightforward to show that the form of pik(BT) is given aspik(BT)=pij(BT)pjk(BT)1+2pij(BT)pjk(BT)(pij(BT)+pjk(BT)),ji,k,noting it is independent of the choice of bridge object j. Therefore, there can be no interaction that is specific to the pair {i,k}, that is not already captured between all other pairs.

Including intransitivity; however, allows for some pairwise probabilities to depart from those assumed by the Bradley-Terry model. This can be formalized by supposing that for all ikI the true probability of preference ik is given as some function f:{[0,1],R}[0,1] of the Bradley-Terry probability and the intransitivity, θik, of the pair {i,k}, then we can write(2) pik:=f(pik(BT),θik),ikI,(2) where we identify the form of f in Section 3.2. We define the intransitivity to be the displacement of the true probabilities from the Bradley-Terry probabilities on the log-odds scale, so that(3) θik:=log(pik/(1pik)pik(BT)/(1pik(BT))),ikI,(3) is the amount of intransitivity between the pair of objects {i,k}. A value of θik=0 indicates that the comparison is transitive, that is, the pairwise probabilities could be modeled by the Bradley-Terry model. As a consequence we require f(x,0)=x,x[0,1]. The choice of log-odds ratio in EquationEquation (3) reflects the nonlinearity of probabilities. For example, if ϵ=0.099, then a small linear shift in probability from 0.5 to 0.5+ϵ has little impact on the odds, which remain at approximately 1:2. However, a linear shift in probability from 0.9 to 0.9+ϵ has a huge impact on the odds, which move from 1:10 to 1:1000. Moreover, our definition (3) for the intransitivity θik also imposes rotational symmetry for pairs of objects, that is(4) θki=log(pki/(1pki)pki(BT)/(1pki(BT)))(4) =log((1pik)/pik(1pik(BT))/pik(BT))=θik,ikI, so we need to find {θik,i>kI} only, in order to completely define {θik,ikI}.

3.2 Model Formulation

To find the function f in EquationEquation (2), EquationEquation (3) can be simply rearranged which gives(5) pik=pik(BT)exp(θik)pik(BT)exp(θik)+1pik(BT),ikI,(5) and so for any pair {i,k}, EquationEquation (5) can be rewritten as(6) pik=11+exp[(θik+rirk)],ikI.(6)

Here the effect of θik is clear, positive (negative) θik, increases (decreases) the probability of team i beating team k relative to their skills alone, that is, relative to the Bradley-Terry model.

Thus far, the model contains the flexibility to describe P:={pik[0,1],ikI} completely. Here P contains n(n1)/2 degrees of freedom, because pki=1pik; however, the model contains n(n1)/2+n parameters: n(n1)/2 values of intransitivity between pairs, and n skill parameters from r, and thus, the model parameters are not identifiable. One way of ensuring identifiability in the standard Bradley-Terry model is to fix one object’s skill level, and here it is chosen that r1=0. As well as this constraint on the objects’ skill parameters, the intransitivity parameters require constraints for parameter identifiability. The minimal set of required constraints is identified in Proposition 1, see the Appendix for the proof.

Proposition 1.

Consider a round-robin tournament with pairs of objects (i,j) being compared, with i,jI, with ij where |I|=n. Suppose that the probabilities of i beating j are pij where these probabilities are given by expression (6), with r1=0 and intransitivity values θij. If a set of n1 pairs of objects, indexed by Jn1, have their intransitivity values set to arbitrary specified values, then all the rest of the {ri} and {θij} parameters in expression (6) are identifiable if Jn1 forms a connected graph over I. Furthermore, if less than n1 pairs’ intransitivity values are specified or if Jn1 is not a connected graph over I then identifiability is not achievable.

We choose the n1 constraints to be θij=0,(i,j):i=1,jI{1}, that is, all pairs involving object 1 have intransitivity set to 0. Proposition 1 gives that if any further constraints are imposed on the intransitivity values the flexibility of model (6) will be compromised.

With the above constraints, the minimal conditions for parameter identifiability are satisfied, but the model is still likely to overfit with so many parameters. To rectify this we restrict the total number of degrees of freedom, by restricting both the number of intransitivity values to only Kn(n1)/2 unique values and restricting the number of unknown skill values to be A<n, where A+Kn(n1)/2 and ideally A+Kn(n1)/2. In this fashion our ICBT model embraces intransitivity in a parsimonious way.

First, consider the A+1 unique skill values, which ensures parsimony in the model by clustering the objects’ skills r into distinct values which are sufficiently statistically significantly different. Since r1=0 is fixed, there are only A unknown skill levels, ϕRA. By defining the labels of the set of skill levels to be A:={A,,0,,A+} with A+ being the number of skill levels which are greater than 0 and A the number of skill levels less than 0 such that A+A++1=A+1=|A|n, we impose the equivalent condition in our model by fixing the skill level with label {0} to be ϕ0=0, and fixing object 1 to always be allocated to this cluster. The possible skill values an object can take are, therefore, defined as{ϕ0=0,ϕ:={ϕaR,aA{0}}:ϕA<<ϕ0<<ϕA+},where ϕ are the unknown skill levels, and the ordering helps with label switching problems in the inference. The skill cluster allocation of object i, denoted y{i}{0,1}A+1, is a binary vector which takes the value 1 at position sA and 0 everywhere else, if object iI belongs to cluster s. The set Y:={y{i}:iI{1}} then contains all the objects’ skill cluster allocations except object {1} which has fixed cluster allocation. Therefore, by defining S{i}(Y):=argmaxsy{i},s,iI{1}, then the objects’ skills can be written as(7) ri={ϕS{i}(Y),iI{1}0,i=1:=fr(ϕ,Y,i),iI.(7)

Now consider the K unique values of intransitivity to describe the different inter-object strategies. Of the n(n1)/2 pairs of objects, many will adopt similar strategies depending on their opponents. These similar strategies are translated by the model as having similar departures from transitivity, and are, thus, clustered together. For example, suppose some group of objects V:jV competed against object j in the same way. Then it would be reasonable to assume that θij is the same for all iV. This creates clusters of pairs of objects, such that the pairs are clustered according to them having identical intransitivity.

In order to measure the departure from a Bradley-Terry model, a linearly transitive cluster is imposed, which contains the set of pairs JT{{i,k}:ikI}, which have an intransitivity level θ0=0. Thus, the Bradley-Terry modeling assumption (1) holds for these pairs, such that pik=pik(BT), for all {i,k}JT. Given the existence of this cluster, there must be strong evidence from the data to produce an additional cluster with an intransitivity level close to 0. This choice does not impose transitivity of pairs as the linearly transitive cluster JT may be empty, except for all pairs with object 1 which are classified as transitive due to our imposition of constraints for identifiability from Proposition 1. Let the distinct set of intransivity levels beθK:={θ0=0,θ={θkR+,kK}:0<θ1<<θK},where K={1,,K} and the levels of intransitivity are ordered from smallest to largest. The levels of intransitivity, θ, contain the set of positive values of intransitivity which, due to symmetry and the completely transitive cluster with intransitivity value θ0, then define the full 2K+1 possible values of intransitivity between any pair of objects.

We define the intransitivity cluster allocation of a given pair {i,k} to be another binary matrix z{i,k}, which takes the value 1 at position s{K,,K} and 0 everywhere else, if the pair {i,k} belongs to cluster s. The clusters are, therefore, labeled from K to K, where a cluster labeled k{1,K} has cluster level θk, a cluster labeled k{K,,1} has cluster level θk, and a cluster with label 0 has cluster level θ0=0. The set Z:={z{i,k},i>kI{1}} then defines all the cluster allocations for all the free pairs ikI{1}, because of the rotational symmetry. For example, if the Kth index of z{i,k} has value z{i,k},K=1, then this indicates that the pair {i,k} belongs to the cluster with label K, whose cluster level is the largest level of intransitivity θK, and this enforces that the pair {k,i} belongs to cluster K and has the smallest level of intransitivity θK. If the cluster allocation of the pair {i,k}I{1} is(8) S{i,k}(Z):={argmaxsz{i,k},sif i>k,argmaxsz{k,i},sif i<k,(8) then the level of intransitivity for a pair {i,k}, θik can be redefined as(9) θik={θS{i,k}(Z) 1{S{i,k}(Z)0}θS{i,k}(Z)1{S{i,k}(Z)<0},{i,k}I{1}0,otherwise:=fθ(θ,Z,{i,k})(9) where 1 is the indicator function, and remembering that θ0=0.

The full model can be written either in terms of EquationEquation (6), noting that the parameters will be clustered, or can be written in terms of the levels and the cluster allocations,(10) pik=(1+exp{[fθ(θ,Z,{i,k})+fr(ϕ,Y,i)fr(ϕ,Y,k)]})1.(10)

So the ICBT model is defined by ψ={ϕ={ϕa:aA{0}}, θ={θk:kK}}.

Due to the intransitivity levels being fixed to 0 for all pairs of objects involving object 1, an adjustment is required to get a more interpretable value of intransitivity between the pairs. We define the adjusted intransitivity to be(11) θij*:=logit(pij)logit(pij(BT))=θij+rirj(ri(BT)rj(BT)),(11) that is, the difference between the logits of the pairwise probability between our ICBT model and the Bradley-Terry model. Note that the rotational symmetry of {θij} (4) also imposes rotational symmetry on {θij*}, that is, θij*=θji*,ijI.

To help see the value of this reparametrisation, consider then the earlier example of a deterministic game of Rock-Paper-Scissors. Take Rock as the constrained object, then Rock has fixed skill level rr=0, and that pairs involving Rock have intransitivity 0, that is θrp=θrs=0, where the p and s subscripts denote Paper and Scissors. To maintain that Rock always beats Scissors prs=1, then from the constraints, we get an excellent approximation from the ICBT model when rs=M for some large M, with the approximation improving as M. Likewise rp=M, and θps=3M. With this model there is only one skill level M, and one nonzero intransitivity level 3M. This parametrisation somewhat hides the symmetry of the intransitivity over pairs. However, with definition (11), then θrs*=θsp*=θpr*=M, resulting in an intuitive and easy interpretation of the intransitivity, reflecting the symmetry of the game, no-matter the choice of the fixed parameters.

3.3 Model Ranking

In the Bradley-Terry model, the skill parameters can simply be ordered to give a rank since a greater skill always results in higher win probabilities against all other objects. In our ICBT model this is not the case, because both the intransitivity parameters of each pair and the skill parameters of the objects impact the win probability between any pair. However, below we present two intuitive methods for determining overall ability, and therefore, ranking.

First, if pij=Pr{ij} is the probability of an object i beating object j according to our model, then we can rank the objects by ordering(12) p.:={pi.:=1n1jI:jipij:iI},(12) that is, pi. is the average probability of object i beating any other object jiI.

Second, if we consider the intransitivity between an object i and an opposing object ji as some “boost” which contributes to the overall ability (which could be negative), then the overall ability ai of object i could be defined by(13) ai:=ri+1njIθij, where θii=0,iI,(13) that is, the object skill plus its average intransitivity level. Definition (13) is equivalent to the Bradley-Terry definition of “ability.” Defining logit(pii(BT))=0,iI, a Bradley-Terry gives(14) 1njIlogit(pij(BT))=ri1njIrj,(14) where the sum on the right hand side does not depend on i, so the skill of object i is entirely determined by ri. Similarly, in our model(15) 1njIlogit(pij)=ri+1njIθij1njIrj=ai1njIrj,(15) then given definition (13), both (14) and (15) have the same form but with ai replacing ri. Then a ranking can be formed by ordering the set of abilities a:={ai:iI}. We argue that the first method, using the probabilities p. to rank the objects, is more meaningful since it is directly associated with the pairwise probabilities, the modeling of which is our ultimate aim. The application to baseball data of both methods is discussed in the supplementary material.

4 Inference

4.1 Likelihood

The data, x:={xc:cC}, are binary, and ij denotes that i is preferred to j. Then xc=1 if icjc, and xc=0 otherwise, where ic,jcI are the objects being compared in comparison c. Then, the log-likelihood for the ICBT model is(16) l(x|ϕ,Y,A,θ,Z,K)=cC[xclog(picjc)+(1xc)log(1picjc)],(16) where picjc is given by the ICBT model for all cC and is calculated from the set of parameters (ϕ,Y,A,θ,Z,K). All pairs’ intransitivities {θik:ijI} can be found from the intransitivity levels θ and the cluster allocations Z, using equation (3.2), so it is only necessary to do inference on these parameters, rather than the full 2K+1 separate clusters. Therefore, from here onwards the term intransitivity levels refers only to those K values which have positive intransitivity. Similarly, any individual object’s skill riiI can be found from knowing the ability levels ϕ and the cluster allocations Y, using EquationEquation (7). We formulate a Bayesian hierarchical model, which treats both K and A as unknown parameters, thus, accounting for uncertainty in the number of clusters. The posterior is, therefore, written asπ(ϕ,Y,A,θ,Z,K|x)L(x|ϕ,Y,A,θ,Z,K)π(ϕ,Y,A,θ,Z,K) where L()=exp[l()] is the likelihood and π(ϕ,Y,A,θ,Z,K) is the prior.

4.2 Prior Specification

Formulating the prior, we make the assumption that Z θ|K that is, the intransitivity level allocations and intransitivity levels are independent from one another given the number of intransitivity levels K. Likewise, it is assumed that Y\ϕ|A. Furthermore, we assume that the clustering of the objects’ skills and the clustering of the pairs’ intransitivities are independent systems, that is, A K, ϕ θ, and Y Z. This means that the prior specification for the two features we are clustering, skills and intransitivities, can be approached separately.

Consider first the prior specification for the clustering of the intransitivity values of the pairs. Remember that labels z{i,j},i>jI{1} have domain {K,,K}, that is, z{i,j}:{K,,K}{0,1},i>jI{1}, and also that z{i,1},iI{1} (and by symmetry z{1,j},jI{1} too) are fixed in the transitive level {0} for identifiability purposes, see Section 3.2. Let the prior on the cluster allocation bez{i,j}|ωKmultinomial(1,ωK),i>jI{1},where ωK is on {K,,K} such thatωK={ωK,s[0,1]:s{K,,K},s=KKωK,s=1}.

The distribution of Z|(ωK,K) is assumed independent over all pairs i>jI{1}, that is,f(Z|ωK,K)=(k=KK|bk|)!k=KK|bk|!s=KKωK,s|bs|,where bk={(i,j):i>jI{1}:z{i,j},k=1}k{K,,K} is the set of allocated pairs of objects belonging to cluster k. We set ωK|KDirichlet(γ¯K) to come from 2K+1 dimensional Dirichlet prior distribution, and γ¯KR+2K+1 is the hyperparameters vector. We use an uninformative prior, setting γ¯K=γK12K+1 where 12K+1 is a vector of ones of length 2K+1 and γKR+. In this case, the ωK parameter can be marginalized out, by(17) f(Z|γK,K)=ωKf(Z|ωK,K)f(ωK|γK,K)dωK=(k=KK|bk|)!k=KK|bk|!Γ((2K+1)γK)Γ(γK)2K+1×k=KKΓ(γK+|bk|)Γ((2K+1)γK+k=KK|bk|)(17) where integration on ωK is taken over the 2K+1 simplex. This is referred to as a Dirichlet-multinomial allocation prior. The prior for K is a Poisson (λK) distribution with probability mass function denoted g0(k|λK) so that E[K|λK]=λK, with λK>0. Note that K=0 is feasible, as this corresponds to the Bradley-Terry model since θ|(K=0)= and so only the transitive cluster exists, that is θK|(K=0)=θ0, and {i,j}JT,ijI, so all pairs belong to the transitive cluster JT. Formally K<n(n1)/2n but as this is large relative to our prior beliefs on K, for simplicity we ignore this constraint in the inference.

As the θ elements are ordered in increasing order and are positive, the prior on the θ parameters is taken to be the joint distribution of K order statistics drawn from independent gamma random variables, such that(18) h0(θ|K)=K!i=1Kh0(θi|α,β),with 0<θ1<<θK,K1,(18) and where h0(x|α,β) is the Gamma (α,β) density with shape and scale α,β>0, respectively.

Consider the prior for the skill levels clustering. The set of skill cluster allocations has distribution Y={y{i}|ωA,Amultinomial(1,ωA),i{2,,n}}, where ωA has domain on {A,,A+}. The distribution of y{i}|(ωA,A) is assumed independent over all objects iI{1} such thatf(Y|ωA,A)=iI{1}f(y{i}|ωA,A),whereωA={ωA,s[0,1]:s{A,,A+},s=AA+ωA,s=1}.

Again, ωA|(A,γA)Dirichlet(γ¯A) is modeled to come from an A+1 dimensional Dirichlet prior distribution, with γ¯A=γA1A+1 where γAR+. Marginalizing out as in derivation (4.2), another Dirichlet-multinomial allocation prior is obtained by integrating ωA over the A+1 dimensional simplex. The prior density for the skill allocations is, therefore, given as(19) f(Y|γA,A)=n!a=AA+|ca|!Γ(AγA)Γ(γA)Aa=AA+Γ(γA+|ca|)Γ(AγA+n),(19) because a=AA+|ca|=n, where ca:={i,iI{1}:y{i},a=1} is the set of objects belonging to skill cluster aA.

The prior for the number of unknown skill levels A is taken to be a truncated Poisson distribution with parameter (λA), λA>0 with probability mass functiongA(a|λA)=λAaa!(i=0n1λAii!)1a=0,1,,n1.

Similarly to θ, the prior choice for ϕ is taken to be the joint distribution of order statistics of independent and identically distributed A+1 Gaussian random variables such thatπ(ϕ|A)=(A+1)!aA{0}π(ϕa) for ϕA<<ϕ0<ϕA+,where ϕaN(0,νA2)aA{0}, and with νAR+. The (A+1)! term arises as ϕ0 can occur anywhere in the sequence of ϕ.

In summary, the prior π(ϕ,Y,A,θ,Z,K) is equal to(20) (A+1)![aA{0}π(ϕa)]f(Y|γA,A)gA(A|λA)K!×[i=1Kh0(θi|α,β)]f(Z|γK,K)g0(K|λK),(20) where, λK,λA,γK,γA,νA, and α,β are the hyperparameters.

4.3 Reversible Jump Markov Chain Monte Carlo Sampler

Inference is made via a reversible jump Markov chain Monte Carlo sampler (Green Citation1995), which provides samples from the posterior distribution π(ϕ,Y,A,θ,Z,K|x), that is, the intransitivity and skill levels, the allocations to the these levels, and the number of levels. Since the number of skill and intransitivity levels (A, K) are assumed to be unknown, the uncertainty in these parameters must be accounted for, thus, motivating the use of a reversible jump sampler. In a sense, the reversible jump sampler mixes over models as well as parameters, and thus, fully accounts for this uncertainty in the final inference.

The ICBT model is structured to try to favor the Bradley-Terry model as a special case, and this is reflected in our sampler, by explicitly incorporating the completely transitive cluster θ0 as an ever present cluster, even if no pairs are allocated to this cluster at a given iteration of the sampler. To ensure the skill and intransitivity levels both remain ordered, the updates to these levels occur in a transformed space such that no update can lead to a change in order.

The reversible jump algorithm used is a split-merge sampler (Green and Richardson Citation2001), which is adapted from the work of Ludkin (Citation2020). The sampler comprises three separate moves: a standard Markov chain Monte Carlo Metropolis-Hastings move, which samples parameters ϕ, θ, and reallocates clusters Y,Z; splitting or merging clusters; and adding or deleting empty clusters. For the construction of the algorithm and how it is implemented, see the supplementary material.

4.4 Model Assessment

The inferences produced by any model are only meaningful if the model itself is accurate. This accuracy is measured here by how well the model fits out of sample. If C is the set of total observed pairwise comparisons, then let Cs be the set of comparisons on which the model is fitted and Ct be the set of comparisons on which the model performance is analyzed, such that CsCt=C and CsCt=. We use log-loss l(x*) of the test dataset x* to measure model performance, which we take to be the average negative log-likelihood per observation in x*, that is,(21) l(x*)=1|Ct|cCt[xclog(p̂icjc)+(1xc)log(1p̂icjc)],(21) where p̂ij is the point estimate of pij based on the training dataset of comparisons Cs and x*:={xc:cCt} is the set of test data, where the notation is as used in the expression for the likelihood (16) .

4.5 Simulation Study

The model was tested using simulated datasets where the number of objects, the number of round-robin tournaments, and the amount of intransitivity varied between the datasets. The sensitivity of our model to these parameters was then tested by comparing out of sample prediction accuracy with a standard Bradley-Terry model. This provided insights into the amount of data, and the amount of intransitivity, required for our more complex model to outperform the Bradley-Terry model. A full analysis is provided in the supplementary material.

5 Baseball Data

5.1 Data

Baseball was chosen to illustrate the methodology due to the high frequency of games, with accessible data for the American League Baseball obtained from www.retrosheet.org. The data are from the 2010–2018 seasons, with the 2010–2012 seasons involving 14 teams, and the 2013–2018 seasons involving 15 teams due to the Houston Astros moving from the National League to the American League. We analyze each season’s data separately here, and jointly over years in the supplementary material.

The tournament structure is not as simple as the round robin tournament we considered in the simulation study. The American League is split into three divisions based on location: East, Central and West, with five teams in each (since 2013). Within the same division, pairs of teams play each other approximately 20 times, and pairs of teams from different divisions play each other around 5–7 times, as well as any Playoffs and World Series matchups, totaling around 140–160 matches per team every season, depending on the season and the team. Baseball is known to be a highly strategic game, with issues such as player selection, handedness of the of batters, strength and speed of players, and tactics such as “small ball” versus “long ball” all considered of great importance. So we anticipate that the level of intransitivity will be high.

The vast majority (at least 99.5%) of all matches are played at the home of one of the two teams competing in the game, with the rest played at neutral venues. Playing at home is well known to have the potential to increase the probability of the home team winning the match across a range of sports (Dixon and Coles Citation1997). Although prediction and model interpretation could be improved by incorporating this effect, we decided not to address home advantage here. Our reason was that none of the existing intransitivity models have such a feature, as they were developed for applications devoid of home advantage, such as e-sports, so a comparison of the different models would only be fair if we did not include this property. However, in Section 6 we formulate the home advantage adaptation given its potential interest.

If pairs of teams do not play equally home and away, then ignoring home advantage could lead to misinterpretation of the estimated ICBT model parameters, for example, if team i mostly played team k with team i at home, the home advantage would feed into θik. We do not believe this is problematic due to the near perfect balance of home to away matches per team, and the maximum home percentage within pairs of teams is 70% for 2010–2011 and only 57% subsequently.

5.2 Inference

The baseball data are analyzed using the ICBT model, and its results are compared with those of the Bradley-Terry model and with the existing models of Section 2 except for the model of Duan et al. (Citation2017) due to the subjective choices required for some parameters.

The ICBT model incorporates uncertainty in the choice of model itself, that is, the number of clusters, and therefore, how many parameters. Our prior distributions for number of intransitivity levels K and skill levels A are shown in . We used a Poisson(λK=2) prior for K, with the hyperparameter to give a 95% prior chance that K[0,5], as it was thought that there would only be a few different pairwise strategies. Similarly, the prior for A was taken to be Poisson(λA = 7) as this hyper-parameter choice gave a 95% prior chance that A[2,13]. The justification for our choice of the other hyper-parameters (γK,γA,α,β,νA) and a sensitivity analysis to hyper-parameter choice is reported in the supplementary material.

Figure 1: Posterior distributions of the K intransitivity levels (left) and the A skill levels (right) for the 2018 season: with the associated prior distributions in a lighter color.

Figure 1: Posterior distributions of the K intransitivity levels (left) and the A skill levels (right) for the 2018 season: with the associated prior distributions in a lighter color.

Now consider the posterior distributions for K and A based on the 2018 season data, also shown in . Despite the prior only providing vague information across values of K5, the data clearly favors having a single intransitivity level, meaning three possible clusters for each pair: a positive level, the completely transitive level, and the mirrored negative level. Further, although the prior gave a 14% probability to the Bradley-Terry model (K=0), the posterior probability for that model is estimated to be zero, showing strong evidence of intransitivity in the dataset. For the distinct skill levels the change from prior to posterior is relatively small, with a mean (and 95% credible interval) of 7.94(4,12), with the number of distinct skills levels favoring A[6,9].

The posterior estimated values of the teams’ skills, and the variance of these values in particular, provides a helpful summary of how competitive the tournament is in a season, with the smaller the variance the more closely contested the tournament. For our model the skills’ variance ranged from 0.027 (2012 season) to 0.32 (2018 season) and for the standard Bradley-Terry model, 0.031 (2015 season) to 0.23 (2018 season) - both models suggesting that 2018 was the least competitive season. In the 2013 season the variance of skill levels according to Bradley-Terry is almost 3 times that of our model. However, the 2013 season was found to contain a particularly large amount of intransitivity, indicating that the large range of skills in the Bradley-Terry model could be the result of compensating for an inability to express the intransitivity. This has perhaps resulted in the Bradley-Terry model concluding that the 2013 season was less competitive than it was in reality.

Now consider the pairwise interactions between teams. These interactions could be inferred from either the intransitivity of the posterior mean θ̂ij*,ij, or by the posterior mean of the intransitivity parameter θ̂ij,ij. The supplementary material contains a comparison for both and concludes that θ̂ij* is more meaningful and interpretable here, so we focus on that. For the 2018 season, (left) shows θ̂ij*, for each pair of teams i>jI: recall that intransitivity has rotational symmetry, that is, θij*=θji*,ij. The teams are sorted by their rank according to p., given by definition (12), see (right). Reading from the teams on the y-axis to x-axis there is a large positive value of intransitivity from Baltimore (BAL) to Tampa Bay (TBA) of 0.78 with 95% credible interval (0.36,1.22), indicating that Baltimore played better against Tampa Bay than expected, given their overall abilities. This is consistent with the data, with Baltimore winning 11 out of 19 matches between the two teams, despite being ranked lower.

Figure 2: Analysis of 2018 season: the posterior mean of the intransitivity parameter, θ̂ij* across all pairs of teams i>jI (left); ranking according to definition (12) (black) and Bradley-Terry model (red) for all teams iI (right).

Figure 2: Analysis of 2018 season: the posterior mean of the intransitivity parameter, θ̂ij* across all pairs of teams i>j∈I (left); ranking according to definition (12) (black) and Bradley-Terry model (red) for all teams i∈I (right).

The analysis of these intransitivities between pairs, and that of the skills of each team, can be combined to produce an overall ranking of the teams. As discussed in Section 3.3, with further details in the supplementary material, p. provides a suitable ranking of the teams. For the 2018 season (right) shows the ranks according to p. compared to the Bradley-Terry ranks. Both have been linearly scaled to help with a visual comparison, such that the best and worst teams have abilities 1 and 0, respectively. The two sets of estimated rankings using p. are clearly correlated; however, there is some difference in the ordering of the ranks, indicating that intransitivity may have been masking the true ranks of some teams. For example, consider Tampa Bay, ranked 6th by the Bradley-Terry model. Tampa Bay’s good record against Kansas City (KCA) has a much lower weighting than their poor record against Baltimore in the Bradley-Terry model due to the differing frequency of these match-ups, and therefore, impacts the overall rank of Tampa Bay. The ICBT model, however, recognizes that good or bad records against particular teams could be due to the presence of intransitivity, and therefore, penalizes Tampa Bay less overall, ranking them 5th, thus, illustrating our point in Section 1 that the ICBT model makes adjustments for tournament imbalance. Similar plots and inferences are drawn from the other seasons (2010–2017) but with different team rankings in each year.

5.3 Model Performance

To test the model performance, 70% of games from each season are randomly selected to be training data, on which the model is fitted, with the remaining 30% used as test data, on which the log-loss score is calculated. This random selection is appropriate as none of the models compared take time-dependency into consideration, a feature discussed in the supplementary material. The variation due to this random sampling in the training-test split is accounted for by taking 100 separate random training-test splits for each season. For each replicate of training data the model is fitted separately to each season’s data. Relative log-loss is then calculated by subtracting the log-loss of a baseline coin tossing model.

shows these negative relative log-loss scores for all years of baseball data, along with 95% confidence intervals, with this measure evaluated for the ICBT, Bradley-Terry, blade-chest and majority vote models. Since a larger value of negative relative log-loss indicates better model performance, a positive value indicates an improvement on the coin tossing model. So all four models improve on simply using coin tossing, showing that there is information to be exploited for inference and prediction. The Bradley-Terry, blade chest and majority vote models all have somewhat similar performance to each other across the years, with the most improved fit being in 2018. In contrast our model is the best performing out of the four models in terms of out-of-sample prediction on all years of data. When assessed as the cumulative improvement over years, relative to coin tossing, the ICBT model is 2.8 times better than the Bradley-Terry model, showing that we have substantially improved predictive performance. The difference in log-loss scores relative to the Bradley-Terry model is largest for the 2013 season, which in Section 5.2 has been identified as the season with the largest intransitivity.

Table 1: Negative relative log-loss ×103 (compared to a coin-tossing model) for each year of baseball data for the ICBT, Bradley-Terry, blade-chest and majority vote models.

6 Conclusions and Discussion

We have proposed a new model and inference structure for paired comparison data. We frame this in the context of sport competitions, baseball in particular, with teams competing against each other, though the potential applications of the model are much broader. Our proposed model, the Intransitive Clustered Bradley-Terry (ICBT) model, extends the standard Bradley Terry model, which is widely considered as the baseline model for such data. The extension allows for intransitivity so that the difference in skill levels between two objects being compared is not the only factor affecting the probabilities of the outcomes. There are a number of models which already allow for intransitivity, but each of these are quite restricted in the parametric form of intransitivity relative to our semi-parametric approach, which recognizes that certain patterns of interaction between pairs of objects can be common over multiple pairs. Our model also allows for objects’ skills to be clustered, a feature that is novel to paired comparisons, with this inducing parsimony and avoiding obtaining distinct rankings for some items when there is no evidence from the data that they are not equally good. We have shown evidence from American League baseball that our model provides a distinct improvement on existing models.

The ICBT model has complete flexibility, in the sense that cluster allocation to skill and intransitivity levels is not predetermined. In order that the data identify the appropriate structure of clustering, and for the inference to account for the uncertainty in this choice, the model is fitted via RJMCMC.

Based on the clusters with the highest posterior probabilities, we anticipate that experts in the particular sport may be able to identify some patterns of clustering that are interpretable, for example, associated with different styles of play. In such cases, these clustering features could be hard wired into the model as the only options, resulting in more efficient inference. A referee made the helpful suggestion that if accounting for clustering uncertainty was not an issue then the inference could be simplified by estimating the ICBT model with group lasso penalties to induce clusters. We feel that our model works sufficiently well for the current applications but agree that it presents an exciting springboard for the consideration of various extensions to the model and its inference. We finish by illustrating a few such possible extensions.

In Section 5 we did not attempt to account for home advantage, which is widely recognized as an important feature in sport, for example, Cattelan, Varin, and Firth (Citation2013) incorporate it in a Bradley-Terry model, though to the best of our knowledge it has not been accounted for in the existing intransitivity models. The most natural way to achieve this is to change pik given by expression (6) to a probability pik(i) of the home team i beating the away team k, with(22) pik(i)=11+exp[(θik+γ+rirk)],   andpki(i)=1pik(i)ikI,(22) where γR determines the effect of playing at home, which here is common over all pairs of teams. If γ>0 (γ<0) then the probability of a home win is increased (decreased) relative to the other factors of skill and intransitivity. This effect can be extended to vary over teams by replacing γ by γi in expression (6). To ensure these γi parameters are all identifiable, we fix γ1=0, though no additional constraints are needed if there is a common γ, but that is all that is required under the conditions of Proposition 1 on the other parameters, as we are able to exploit data that distinguishes which team is at home.

This article only considered win-loss scenarios. Extensions of the Bradley-Terry have been proposed for handling draws. Two distinct methods for handling draws are given by Cattelan, Varin, and Firth (Citation2013) and Hankin (Citation2020). The former use ordinal logistic regression, treating win, loss, draw as outcomes of an ordered multinomial random variable, which can then be analyzed via an ordered link model. In contrast, the latter treats the problem as a competition between the two teams and a third theoretical team, such that when the theoretical team wins the outcome of the match corresponds to a draw between the two actual teams. The ICBT model can be adapted similarly, with the use of the clustering strategy extended to pooling teams to account for their similar cautiousness, leading to them drawing more often than would be expected.

We have assumed that all teams play each other. If this is not the case we cannot improve on the prior inference for the θik parameters for pairs (i,k) that do not play each other. This is not a restriction for Bradley-Terry or the existing intransitivity models, where the associated pik are determined by the observed pairs. This raises issues about identifiability of the ICBT model parameters. Our approach, through Proposition 1, is no longer sufficient leaving the open problem of which parameters to fix in order to give the most efficient inference.

Supplemental material

Supplemental Material

Download Zip (5.4 MB)

Acknowledgments

We thank Dr. Matthew Ludkin for giving detailed insight into the reversible jump algorithm. Spearing gratefully acknowledges funding of the EPSRC funded STOR-i Centre for Doctoral Training (grant number EP/L015692/1), and ATASS Sports. We thank the referees for their helpful comments.

Supplementary Materials

Additional Analyses:

The supplementary material for this article include extra analyses: a simulation study; and an extended analysis of the baseball application, including a pooled analysis of data from all seasons. (ICBT_supplementary.pdf)

Reversible Jump Markov Chain Monte Carlo Sampler:

The supplementary material includes full details of the RJMCMC algorithm, including the initialization procedure and highlighting the main novelties of the algorithm. (ICBT_supplementary.pdf)

R Code:

The analyses can be replicated using the code and data found in the supplementary material. See the README contained in the zip file for more details. (spearing_ICBTcode.zip, zip archive)

Appendix:

The supplementary material contains the Appendix, which provides the proof of Proposition 1. (ICBTappendix.pdf)

Engineering and Physical Sciences Research Council;

Disclosure Statement

The authors report there are no competing interests to declare.

References

  • Bradley, R. A., and Terry, M. E. (1952), “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons,” Biometrika, 39, 324–345.
  • Cattelan, M., Varin, C., and Firth, D. (2013), “Dynamic Bradley–Terry Modelling of Sports Tournaments,” Journal of the Royal Statistical Society: Series C, 62, 135–150.
  • Causeur, D., and Husson, F. (2005), “A 2-Dimensional Extension of the Bradley–Terry Model for Paired Comparisons,” Journal of Statistical Planning and Inference, 135, 245–259.
  • Chen, S., and Joachims, T. (2016), “Modeling Intransitivity in Matchup and Comparison Data,” in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 227–236. ACM.
  • Dixon, M. J., and Coles, S. G. (1997), “Modelling Association Football Scores and Inefficiencies in the Football Betting Market,” Journal of the Royal Statistical Society: Series C, 46, 265–280.
  • Duan, J., Li, J., Baba, Y., and Kashima, H. (2017), “A Generalized Model for Multidimensional Intransitivity,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 840–852. Springer.
  • Green, P. J. (1995), “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination,” Biometrika, 82, 711–732.
  • Green, P. J., and Richardson, S. (2001), “Modelling Heterogeneity with and without the Dirichlet Process,” Scandinavian Journal of Statistics, 28, 355–375.
  • Hankin, R. K. (2020), “A Generalization of the Bradley–Terry Model for Draws in Chess with an Application to Collusion,” Journal of Economic Behavior & Organization, 180, 325–333.
  • Ludkin, M. (2020), “Inference for a Generalised Stochastic Block Model with Unknown Number of Blocks and Non-conjugate Edge Models,” Computational Statistics & Data Analysis, 152, 107051.
  • Makhijani, R., and Ugander, J. (2019), “Parametric Models for Intransitivity in Pairwise Rankings,” in The World Wide Web Conference, pp. 3056–3062. ACM.
  • Makowski, M., and Piotrowski, E. W. (2006), “Quantum Cat’s Dilemma: An Example of Intransitivity in a Quantum Game,” Physics Letters A, 355, 250–254.
  • Masarotto, G., and Varin, C. (2012), “The Ranking Lasso and Its Application to Sport Tournaments,” The Annals of Applied Statistics, 6, 1949–1970.
  • Smead, R. (2019), “Sports Tournaments and Social Choice Theory,” Philosophies, 4, 28.