Approximate Inference in Practice: Microsoft's Xbox TrueSkill™
Daniel Hernández-Lobato, Universidad Autónoma de Madrid, 2014
Outline
1 Introduction
2 The Probabilistic Model
3 Approximate Inference
4 Results
Introduction
Competition is a key aspect of human nature, innate in most people from a young age, and used as an organizing principle in most sports: soccer, basketball, etc.
Ratings are used in games to allow for fair competition.
ELO system: estimates the skill level of a chess player.
ATP system: estimates the skill level of a tennis player.
Ratings are used for matchmaking in tournaments and to generate a ranking of players.
Online gaming poses additional challenges:
Infer player skills from only a few match outcomes.
Handle teams with different numbers of players.
Questions that Arise in Online Gaming Skill Rating
Observed data: match outcomes of k teams with n_1, ..., n_k players each, in the form of a ranking with potential ties between teams.
Information we would like to obtain:
The skill s_i of each player i: if s_i > s_j, player i is expected to beat player j.
A global ranking among players.
Fair matches between players and teams of players.
Successfully achieved by Microsoft's Xbox TrueSkill™.
R. Herbrich, T. Minka and T. Graepel. TrueSkill™: A Bayesian Skill Rating System. Advances in Neural Information Processing Systems 19, 2006.
Outline
1 Introduction
2 The Probabilistic Model
3 Approximate Inference
4 Results
TrueSkill™: Observed and Latent Variables
Latent variables:
Skill s_i of player i: p(s_i) = N(s_i | µ_i, σ_i²).
Performance p_i of player i: p(p_i | s_i) = N(p_i | s_i, β²).
Performance t_j of team j: p(t_j | {p_i : i ∈ A_j}) = δ(t_j − Σ_{i ∈ A_j} p_i).
Observed variables:
Match outcome involving k teams: a rank r_j for each team j, with
p(r_1, ..., r_k | t_1, ..., t_k) = I(t_{r_1} > t_{r_2} > ... > t_{r_k}).
Thus, r_j < r_{j+1} implies t_j > t_{j+1}.
The parameters of the model are β², µ_i and σ_i².
TrueSkill™: Example
We consider a game with 3 teams: A_1 = {1}, A_2 = {2, 3}, A_3 = {4}.
Result: r = (1, 2, 3). Team A_1 wins, followed by A_2 and A_3.
Define d_1 = t_1 − t_2 and d_2 = t_2 − t_3:
p(d_1 | t_1, t_2) = δ(d_1 − t_1 + t_2),   p(d_2 | t_2, t_3) = δ(d_2 − t_2 + t_3).
The result implies d_1 > 0 and d_2 > 0 (use transitivity!).
Thus, observing r = (1, 2, 3) is equivalent to observing b_1 = 1 and b_2 = 1, where
p(b_1 = 1 | d_1) = I(d_1 > 0)  and  p(b_2 = 1 | d_2) = I(d_2 > 0),
i.e., b_1 = 1 if d_1 > 0 and b_1 = 0 if d_1 ≤ 0, and similarly for b_2.
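As a sanity check, the generative model for this example can be simulated directly. Below is a minimal Python sketch; the numerical values of µ_i, σ_i and β are illustrative assumptions (not taken from the slides), and player indexing is 0-based.

    import numpy as np

    rng = np.random.default_rng(0)

    mu = np.array([25.0, 25.0, 25.0, 25.0])        # prior means of the skills s_i (assumed)
    sigma = np.array([8.3, 8.3, 8.3, 8.3])         # prior std. deviations (assumed)
    beta = 4.2                                     # performance noise std. deviation (assumed)

    s = rng.normal(mu, sigma)                      # skills        s_i ~ N(mu_i, sigma_i^2)
    p = rng.normal(s, beta)                        # performances  p_i ~ N(s_i, beta^2)

    teams = [[0], [1, 2], [3]]                     # A_1 = {1}, A_2 = {2, 3}, A_3 = {4}
    t = np.array([p[idx].sum() for idx in teams])  # team performances t_j = sum_{i in A_j} p_i

    d1, d2 = t[0] - t[1], t[1] - t[2]              # performance differences
    b1, b2 = int(d1 > 0), int(d2 > 0)              # r = (1, 2, 3) corresponds to b1 = b2 = 1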
Bayesian Network for the Example
Note that we observe b_1 = 1 and b_2 = 1.
TrueSkill™: Bayesian Online Learning
Inference involves computing a posterior given a match outcome:
p(s, p, t, d | b) = p(s, p, t, d, b) / p(b) = p(b | d) p(d | t) p(t | p) p(p | s) p(s) / p(b).
We marginalize out p, t and d to get the marginal posterior of s.
Bayesian online learning: the posterior after one match is used as the prior for the next match:
{µ_i(0), σ_i²(0)} → Match #1 → {µ_i(1), σ_i²(1)} → Match #2 → {µ_i(2), σ_i²(2)} → ...
The marginal posterior is not Gaussian!
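The online scheme itself is just a loop: the Gaussian-projected posterior parameters obtained from one match are fed back as the prior for the next. A minimal sketch, where approximate_posterior is a hypothetical placeholder for the approximate-inference step described in the next section:

    def online_skill_learning(matches, mu0, sigma0):
        # Bayesian online learning: the approximate (Gaussian) posterior after
        # match n becomes the prior for match n + 1.
        mu, sigma = dict(mu0), dict(sigma0)        # {player_id: parameter}
        for teams, ranks in matches:               # one observed match outcome
            # `approximate_posterior` stands for the message-passing /
            # moment-matching update sketched in the next section (hypothetical).
            mu, sigma = approximate_posterior(teams, ranks, mu, sigma)
        return mu, sigma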
Outline
1 Introduction
2 The Probabilistic Model
3 Approximate Inference
4 Results
Bethe Cluster Graph for the Example
Message Passing Algorithm
We pass messages in the Bethe cluster graph until convergence to compute the posterior marginals of s_1, ..., s_4:
δ_{i→j}(s_{i,j}) = [ ∫ ψ_i(c_i) ∏_{k ∈ Nb_i} δ_{k→i}(s_{k,i}) d(c_i \ s_{i,j}) ] / δ_{j→i}(s_{i,j}),
where s_{i,j} are the variables in the sepset of edge i–j. This is the same rule as the one used in Belief Propagation.
The posterior marginals of s_1, ..., s_4 are given by
p(s_i | b) ∝ ψ_i(s_i) ∏_{k ∈ Nb_i} δ_{k→i}(s_i) = N(s_i | µ_i, σ_i²) ∏_{k ∈ Nb_i} δ_{k→i}(s_i).
All messages are Gaussian except for the two bottom messages!
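Since (almost) all messages are Gaussian, they are conveniently stored in natural parameters, so that the products and quotients in the message-passing rule become sums and differences. A minimal sketch of such a message representation (an assumed implementation detail, not code from the TrueSkill™ paper):

    from dataclasses import dataclass

    @dataclass
    class Gaussian:
        pi: float    # precision,       1 / variance
        tau: float   # precision-mean,  mean / variance

        def __mul__(self, other):
            # Product of two Gaussian messages (numerator of the message rule).
            return Gaussian(self.pi + other.pi, self.tau + other.tau)

        def __truediv__(self, other):
            # Quotient of two Gaussian messages (division by the reverse message).
            return Gaussian(self.pi - other.pi, self.tau - other.tau)

        @property
        def mean(self):
            return self.tau / self.pi

        @property
        def variance(self):
            return 1.0 / self.pi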
Bethe Cluster Graph: Message Passing
Approximate Messages: Projection Step
We consider the case of the first non-Gaussian message. The first step is to approximate the corresponding marginal by projecting it onto the Gaussian family:
ψ_i(d_1) δ_{j→i}(d_1) = I(d_1 > 0) N(d_1 | m̂, v̂).
For this, we compute the log of the normalization constant:
log Z = log ∫ I(d_1 > 0) N(d_1 | m̂, v̂) dd_1 = log Φ(m̂ / √v̂).
We can obtain the mean and the variance of I(d_1 > 0) N(d_1 | m̂, v̂) by computing the derivatives of log Z with respect to m̂ and v̂!
Let m and v be the resulting mean and variance. The approximate message is then
δ_{i→j}(d_1) ∝ N(d_1 | m, v) / N(d_1 | m̂, v̂).
The computation of the other approximate message is analogous.
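A sketch of this moment-matching step: differentiating log Z = log Φ(m̂/√v̂) gives the standard truncated-Gaussian moments m = m̂ + √v̂ λ(z) and v = v̂ (1 − λ(z)(λ(z) + z)), with z = m̂/√v̂ and λ(z) = φ(z)/Φ(z). The code below assumes SciPy for Φ and φ; the function names are illustrative.

    from math import sqrt
    from scipy.stats import norm

    def truncated_gaussian_moments(m_hat, v_hat):
        # Mean and variance of the normalized marginal I(d > 0) N(d | m_hat, v_hat),
        # obtained from the derivatives of log Z = log Phi(m_hat / sqrt(v_hat)).
        z = m_hat / sqrt(v_hat)
        lam = norm.pdf(z) / norm.cdf(z)          # lambda(z) = phi(z) / Phi(z)
        m = m_hat + sqrt(v_hat) * lam
        v = v_hat * (1.0 - lam * (lam + z))
        return m, v

    def approximate_message(m_hat, v_hat):
        # Natural parameters of the projected message N(d | m, v) / N(d | m_hat, v_hat).
        m, v = truncated_gaussian_moments(m_hat, v_hat)
        pi = 1.0 / v - 1.0 / v_hat               # precision of the outgoing message
        tau = m / v - m_hat / v_hat              # precision-mean of the outgoing message
        return pi, tau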
Outline
1 Introduction
2 The Probabilistic Model
3 Approximate Inference
4 Results
Convergence Speed
[Figure: estimated skill level versus number of games for two players, "char" and "SQLWildman", under TrueSkill™ and under the Halo 2 rank system; y-axis: Level (0–40), x-axis: Number of Games (0–400).]
Applications to Online Gaming
Leader-board of players: provides a global ranking of all players. The rank is conservative: µ_i − 3σ_i.
Matchmaking:
For the players: the most uncertain outcome gives the most even match.
For inference: the most uncertain outcome is the most informative.
Good for both the players and the model.
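A sketch of the conservative leader-board score (the constant 3 is the one quoted above):

    def conservative_skill(mu_i, sigma_i):
        # Leader-board score: penalize uncertain skill estimates so that a
        # player is ranked highly only once sigma_i has become small.
        return mu_i - 3.0 * sigma_i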
Matchmaking: Probability of Winning or Losing
Assume player i wants to play against player j. What is the probability that player i wins?
p(p_i > p_j) = ∫ I(p_i − p_j > 0) p(p_i | s_i) p(s_i) p(p_j | s_j) p(s_j) dp_i dp_j ds_i ds_j
             = ∫ I(p_i − p_j > 0) N(p_i | s_i, β²) N(s_i | µ_i, σ_i²) N(p_j | s_j, β²) N(s_j | µ_j, σ_j²) dp_i dp_j ds_i ds_j
             = ∫ I(p_i − p_j > 0) N(p_i | µ_i, σ_i² + β²) N(p_j | µ_j, σ_j² + β²) dp_i dp_j
             = Φ( (µ_i − µ_j) / √(σ_i² + σ_j² + 2β²) ).
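The final expression is straightforward to evaluate; a sketch, again assuming SciPy for Φ:

    from math import sqrt
    from scipy.stats import norm

    def win_probability(mu_i, sigma_i, mu_j, sigma_j, beta):
        # P(p_i > p_j): skills and performances marginalized out analytically.
        c = sqrt(sigma_i**2 + sigma_j**2 + 2.0 * beta**2)
        return norm.cdf((mu_i - mu_j) / c)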
Game Over!