Approximate Inference in Practice: Microsoft's Xbox TrueSkill™

1 Approximate Inference in Practice: Microsoft's Xbox TrueSkill™. Daniel Hernández-Lobato, Universidad Autónoma de Madrid, 2014.

2 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference 4 Results

3 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference 4 Results

4 Introduction Competition is a key aspect of human nature: it is innate in most people from a young age and it underlies most sports (soccer, basketball, etc.). Ratings are used in games to enable fair competition. ELO system: estimates the skill level of a chess player. ATP system: estimates the skill level of a tennis player. Ratings are used for matchmaking in tournaments and to generate a ranking of players. Online gaming poses additional challenges: player skills must be inferred from only a few match outcomes, and teams may have different numbers of players.

5 Questions that Arise in Online Gaming Skill Rating. Observed data: match outcomes of $k$ teams with $n_1, \ldots, n_k$ players each, in the form of a ranking with potential ties between teams. Information we would like to obtain: the skill $s_i$ of each player (if $s_i > s_j$, player $i$ is expected to beat player $j$); a global ranking among players; fair matches between players and teams of players. All of this is successfully achieved by Microsoft's Xbox TrueSkill™. R. Herbrich, T. Minka and T. Graepel, TrueSkill™: A Bayesian Skill Rating System. Advances in Neural Information Processing Systems 19, 2006.

6 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference 4 Results

7 TrueSkill™ Observed and Latent Variables. Latent variables: skill $s_i$ of player $i$: $p(s_i) = \mathcal{N}(s_i \mid \mu_i, \sigma_i^2)$; performance $p_i$ of player $i$: $p(p_i \mid s_i) = \mathcal{N}(p_i \mid s_i, \beta^2)$; performance $t_j$ of team $j$: $p(t_j \mid \{p_i : i \in A_j\}) = \delta(t_j - \sum_{i \in A_j} p_i)$. Observed variables: the match outcome involving $k$ teams, i.e. a rank $r_j$ for each team $j$: $p(r_1, \ldots, r_k \mid t_1, \ldots, t_k) = \mathbb{I}(t_{r_1} > t_{r_2} > \cdots > t_{r_k})$. Thus, $r_j < r_{j+1}$ implies $t_j > t_{j+1}$: a better rank corresponds to a larger team performance. The parameters of the model are $\beta^2$, $\mu_i$ and $\sigma_i^2$.
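The generative story above is straightforward to simulate. Below is a minimal sketch (not the original TrueSkill™ code; the function name, the numeric values and the use of 0-based player indices are our own choices) that samples skills, performances and team performances and returns the induced ranking:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_match(mu, sigma2, teams, beta2=1.0):
    """Sample one match outcome from the TrueSkill-style generative model.

    mu, sigma2 : per-player skill prior means and variances
    teams      : list of lists of player indices (the sets A_j)
    beta2      : performance noise variance beta^2
    Returns the team performances t_j and the resulting ranking (1 = winner).
    """
    s = rng.normal(mu, np.sqrt(sigma2))              # skills s_i ~ N(mu_i, sigma_i^2)
    p = rng.normal(s, np.sqrt(beta2))                # performances p_i ~ N(s_i, beta^2)
    t = np.array([p[A].sum() for A in teams])        # team performances: deterministic sums
    ranks = np.argsort(-t).argsort() + 1             # rank 1 = highest team performance
    return t, ranks

# Example: three teams A_1 = {0}, A_2 = {1, 2}, A_3 = {3} (0-based indices).
t, r = sample_match(mu=np.array([25.0] * 4), sigma2=np.array([8.0] * 4),
                    teams=[[0], [1, 2], [3]])
print(t, r)
```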

8 TrueSkill™ Example. We consider a game with 3 teams: $A_1 = \{1\}$, $A_2 = \{2, 3\}$, $A_3 = \{4\}$. Result: $r = (1, 2, 3)$, i.e. team $A_1$ wins, followed by $A_2$ and then $A_3$. Define $d_1 = t_1 - t_2$ and $d_2 = t_2 - t_3$, so that $p(d_1 \mid t_1, t_2) = \delta(d_1 - t_1 + t_2)$ and $p(d_2 \mid t_2, t_3) = \delta(d_2 - t_2 + t_3)$. The result implies $d_1 > 0$ and $d_2 > 0$ (use transitivity!). Thus, $r = (1, 2, 3)$ is equivalent to observing $b_1 = 1$ and $b_2 = 1$, where $p(b_1 = 1 \mid d_1) = \mathbb{I}(d_1 > 0)$ and $p(b_2 = 1 \mid d_2) = \mathbb{I}(d_2 > 0)$.
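As a tiny concrete illustration of this reparameterization (our own helper, assuming the three team performances are given in the order of the reported ranking $r = (1, 2, 3)$):

```python
def differences_and_outcomes(t1, t2, t3):
    """Difference variables d_1, d_2 and binary observations b_1, b_2 of the example."""
    d1, d2 = t1 - t2, t2 - t3
    b1, b2 = int(d1 > 0), int(d2 > 0)   # r = (1, 2, 3) corresponds to b1 = b2 = 1
    return (d1, d2), (b1, b2)
```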

9 Bayesian Network for the Example. Note that we observe $b_1 = 1$ and $b_2 = 1$.

10 TrueSkill™ Bayesian Online Learning. Inference involves computing a posterior given a match outcome: $p(s, p, t, d \mid b) = \frac{p(s, p, t, d, b)}{p(b)} = \frac{p(b \mid d)\, p(d \mid t)\, p(t \mid p)\, p(p \mid s)\, p(s)}{p(b)}$. We marginalize out $p$, $t$ and $d$ to obtain the marginal posterior of $s$. Bayesian online learning: the posterior after one match is used as the prior for the next match, $\{\mu_i(0), \sigma_i^2(0)\} \xrightarrow{\text{Match \#1}} \{\mu_i(1), \sigma_i^2(1)\} \xrightarrow{\text{Match \#2}} \{\mu_i(2), \sigma_i^2(2)\}$. The marginal posterior is not Gaussian!
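A schematic of this online learning loop, in the spirit of the slide; `approximate_posterior` stands in for the message-passing routine of the next section and is purely hypothetical here:

```python
def online_skill_update(mu, sigma2, matches, approximate_posterior):
    """Bayesian online learning sketch: the approximate Gaussian posterior
    after each match becomes the prior for the next one."""
    for match in matches:                               # match = (teams, observed ranks)
        # Run approximate inference with the current beliefs as the prior ...
        mu, sigma2 = approximate_posterior(mu, sigma2, match)
        # ... and reuse the resulting Gaussian marginals as the next prior.
    return mu, sigma2
```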

11 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference 4 Results

Bethe Cluster Graph for the Example 12

13 Message Passing Algorithm. We pass messages in the Bethe cluster graph until convergence to compute the posterior marginals of $s_1, \ldots, s_4$: $\delta_{i \to j}(s_{i,j}) = \left( \int \psi_i(c_i) \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(s_{k,i}) \, d(c_i \setminus s_{i,j}) \right) \Big/ \delta_{j \to i}(s_{i,j})$, where $s_{i,j}$ denotes the variables in the sepset of edge $i$--$j$. This is the same rule as the one used in Belief Propagation. The posterior marginals for $s_1, \ldots, s_4$ are given by $p(s_i \mid b) \propto \psi_i(s_i) \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(s_i) = \mathcal{N}(s_i \mid \mu_i, \sigma_i^2) \prod_{k \in \mathrm{Nb}_i} \delta_{k \to i}(s_i)$. All messages are Gaussian except for the two bottom messages!
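Because all but two of the messages are Gaussian, the products and quotients required by this update reduce to simple arithmetic on precisions and precision-adjusted means. A small sketch (our own helper functions, stated in mean/variance form):

```python
def multiply_gaussians(m1, v1, m2, v2):
    """Product N(x|m1,v1) * N(x|m2,v2), up to a constant, as (mean, variance)."""
    tau = 1.0 / v1 + 1.0 / v2            # precisions add
    rho = m1 / v1 + m2 / v2              # precision-adjusted means add
    return rho / tau, 1.0 / tau

def divide_gaussians(m1, v1, m2, v2):
    """Quotient N(x|m1,v1) / N(x|m2,v2) as (mean, variance); the result may
    have negative variance, i.e. it need not be a normalizable density."""
    tau = 1.0 / v1 - 1.0 / v2
    rho = m1 / v1 - m2 / v2
    return rho / tau, 1.0 / tau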

Bethe Cluster Graph: Message Passing 14

Bethe Cluster Graph: Message Passing 15

16 Approximate Messages: Projection Step. We consider the case of the first message. The first step is to approximate the marginal by projecting it onto the Gaussian family, where the marginal is $\psi_i(d_1)\, \delta_{j \to i}(d_1) = \mathbb{I}(d_1 > 0)\, \mathcal{N}(d_1 \mid \hat{m}, \hat{v})$. For this, we compute the log of the normalization constant: $\log Z = \log \int \mathbb{I}(d_1 > 0)\, \mathcal{N}(d_1 \mid \hat{m}, \hat{v})\, dd_1 = \log \Phi\!\left( \hat{m} / \sqrt{\hat{v}} \right)$. We can obtain the mean and the variance of $\mathbb{I}(d_1 > 0)\, \mathcal{N}(d_1 \mid \hat{m}, \hat{v})$ by computing the derivatives of $\log Z$ with respect to $\hat{m}$ and $\hat{v}$! Let $m$ and $v$ be that mean and variance. The approximate message is then $\delta_{i \to j}(d_1) \approx \mathcal{N}(d_1 \mid m, v) / \mathcal{N}(d_1 \mid \hat{m}, \hat{v})$. The computation of the other approximate message is analogous.
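A sketch of this projection step in code, under the stated assumptions: differentiating $\log Z = \log \Phi(\hat{m}/\sqrt{\hat{v}})$ yields the familiar truncated-Gaussian moment formulas used below; the function names are our own.

```python
import numpy as np
from scipy.stats import norm

def truncated_moments(m_hat, v_hat):
    """Mean and variance of the normalized I(d > 0) * N(d | m_hat, v_hat),
    obtained from the derivatives of log Z = log Phi(m_hat / sqrt(v_hat))."""
    z = m_hat / np.sqrt(v_hat)
    lam = norm.pdf(z) / norm.cdf(z)          # lambda(z) = phi(z) / Phi(z)
    mean = m_hat + np.sqrt(v_hat) * lam
    var = v_hat * (1.0 - lam * (lam + z))
    return mean, var

def approximate_message(m_hat, v_hat):
    """Approximate outgoing message N(d | m, v) / N(d | m_hat, v_hat), as (mean, variance)."""
    m, v = truncated_moments(m_hat, v_hat)
    tau = 1.0 / v - 1.0 / v_hat              # divide Gaussians: precisions subtract
    rho = m / v - m_hat / v_hat
    return rho / tau, 1.0 / tau
```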

17 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference 4 Results

18 Convergence Speed. [Figure: skill level versus number of games (0 to 400) for the players char and SQLWildman, under TrueSkill™ and under the Halo 2 rank system.]

19 Applications to Online Gaming. Leader-board of players: provides a global ranking of all players; the reported rank is conservative, $\mu_i - 3\sigma_i$. Matchmaking: for the players, the most uncertain outcome makes for the fairest match; for inference, the most uncertain outcome is the most informative. Good for both the players and the model.
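The conservative leader-board score can be read directly off the Gaussian skill marginal; a one-line sketch (our own helper name):

```python
def leaderboard_score(mu_i, sigma_i):
    """Conservative skill estimate mu_i - 3*sigma_i: with high probability
    the player's true skill lies above this value."""
    return mu_i - 3.0 * sigma_i
```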

20 Matchmaking: Probability of Winning or Losing. Suppose player $i$ wants to play against player $j$. What is the probability that player $i$ wins? $p(p_i > p_j) = \int \mathbb{I}(p_i - p_j > 0)\, p(p_j \mid s_j)\, p(s_j)\, p(p_i \mid s_i)\, p(s_i)\, dp_i\, dp_j\, ds_i\, ds_j = \int \mathbb{I}(p_i - p_j > 0)\, \mathcal{N}(p_j \mid s_j, \beta^2)\, \mathcal{N}(p_i \mid s_i, \beta^2)\, \mathcal{N}(s_j \mid \mu_j, \sigma_j^2)\, \mathcal{N}(s_i \mid \mu_i, \sigma_i^2)\, dp_i\, dp_j\, ds_i\, ds_j = \int \mathbb{I}(p_i - p_j > 0)\, \mathcal{N}(p_j \mid \mu_j, \sigma_j^2 + \beta^2)\, \mathcal{N}(p_i \mid \mu_i, \sigma_i^2 + \beta^2)\, dp_i\, dp_j = \Phi\!\left( \frac{\mu_i - \mu_j}{\sqrt{\sigma_i^2 + \sigma_j^2 + 2\beta^2}} \right)$.
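The closed-form expression is straightforward to evaluate; a minimal sketch (our own function name and example numbers):

```python
import numpy as np
from scipy.stats import norm

def win_probability(mu_i, sigma2_i, mu_j, sigma2_j, beta2):
    """P(p_i > p_j) = Phi((mu_i - mu_j) / sqrt(sigma_i^2 + sigma_j^2 + 2*beta^2))."""
    return norm.cdf((mu_i - mu_j) / np.sqrt(sigma2_i + sigma2_j + 2.0 * beta2))

# Example: player i has the higher mean skill, so the probability exceeds 0.5.
print(win_probability(30.0, 4.0, 25.0, 4.0, beta2=1.0))
```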

Game Over! 21