Probabilistic Ranking: TrueSkill

1 Probabilistic Ranking: TrueSkill

Zoubin Ghahramani
Department of Engineering, University of Cambridge, UK
Wrocław Lectures 2013

2 Motivation for ranking

Competition is central to our lives. It is an innate biological trait, and the driving principle of many sports. In the Ancient Olympics (700 BC), the winners of the events were admired and immortalised in poems and statues. Today in pretty much every sport there are player or team rankings (football leagues, poker tournament rankings, etc.).

We are going to focus on one example: tennis players, in men's singles games. We are going to keep in mind the goal of answering the following question: what is the probability that player 1 defeats player 2?

3 The ATP ranking system for tennis players

Men's singles ranking as of 8 December 2011 (rank, name and nationality, points):

 1  Djokovic, Novak (SRB)         13,675
 2  Nadal, Rafael (ESP)            9,575
 3  Federer, Roger (SUI)           8,170
 4  Murray, Andy (GBR)             7,380
 5  Ferrer, David (ESP)
 6  Tsonga, Jo-Wilfried (FRA)
 7  Berdych, Tomas (CZE)
 8  Fish, Mardy (USA)
 9  Tipsarevic, Janko (SRB)
10  Almagro, Nicolas (ESP)
11  Del Potro, Juan Martin (ARG)
12  Simon, Gilles (FRA)
13  Soderling, Robin (SWE)
14  Roddick, Andy (USA)
15  Monfils, Gael (FRA)
16  Dolgopolov, Alexandr (UKR)
17  Wawrinka, Stanislas (SUI)
18  Isner, John (USA)
19  Gasquet, Richard (FRA)
20  Lopez, Feliciano (ESP)         1,755

ATP: Association of Tennis Professionals

4 The ATP ranking system explained (to some degree)

Sum of points from the best 18 results of the past 52 weeks. Mandatory events: the 4 Grand Slams and 8 Masters 1000 Series events. Best 6 results from International Events (4 of these must be 500 events).

[Table: points breakdown for all tournament categories (2012), by round reached: W, F, SF, QF, R16, R32, R64, R128 and qualifying, covering the Grand Slams, the Barclays ATP World Tour Finals, ATP World Tour Masters 1000, 500 and 250 events, Challengers and Futures.]

The Grand Slams are the Australian Open, the French Open, Wimbledon, and the US Open. The Masters 1000 tournaments are: Cincinnati, Indian Wells, Madrid, Miami, Monte-Carlo, Paris, Rome, Shanghai, and Toronto. The Masters 500 tournaments are: Acapulco, Barcelona, Basel, Beijing, Dubai, Hamburg, Memphis, Rotterdam, Tokyo, Valencia and Washington. The Masters 250 tournaments are: Atlanta, Auckland, Bangkok, Bastad, Belgrade, Brisbane, Bucharest, Buenos Aires, Casablanca, Chennai, Delray Beach, Doha, Eastbourne, Estoril, Gstaad, Halle, Houston, Kitzbuhel, Kuala Lumpur, London, Los Angeles, Marseille, Metz, Montpellier, Moscow, Munich, Newport, Nice, Sao Paulo, San Jose, s-Hertogenbosch, St. Petersburg, Stockholm, Stuttgart, Sydney, Umag, Vienna, Vina del Mar, Winston-Salem, Zagreb and Dusseldorf.

5 A laundry list of objections and open questions

Some questions:

Rank  Name & Nationality       Points
1     Djokovic, Novak (SRB)    13,675
2     Nadal, Rafael (ESP)       9,575
3     Federer, Roger (SUI)      8,170
4     Murray, Andy (GBR)        7,380

Is a player ranked higher than another more likely to win? What is the probability that Murray defeats Djokovic? How much would you (rationally) bet on Murray?

And some concerns: the points system ignores who you played against. 6 out of the 18 tournaments don't need to be common to two players. Other examples: the Premier League. Meaningless intermediate results throughout the season: they don't say whom you played and whom you didn't!

6 Towards a probabilistic ranking system

What we really want is to infer a player's skill. Skills must be comparable: a player of higher skill is more likely to win. We want to do probabilistic inference of players' skills. We want to be able to compute the probability of a game outcome.

A generative model for game outcomes:
1. Take two tennis players with known skills w_i ∈ R: Player 1 with skill w_1, Player 2 with skill w_2.
2. Compute the difference between the skills of Player 1 and Player 2: s = w_1 − w_2.
3. Add noise n ~ N(0, 1) to account for performance inconsistency: t = s + n.
4. The game outcome is given by y = sign(t): y = +1 means Player 1 wins, y = −1 means Player 2 wins.
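A minimal numerical sketch of this generative model (the skill values and sample size are illustrative, not from the lecture):

import numpy as np

rng = np.random.default_rng(0)

def simulate_games(w1, w2, n_games=100_000):
    """Simulate outcomes y = sign(w1 - w2 + n) with noise n ~ N(0, 1)."""
    s = w1 - w2                               # skill difference
    t = s + rng.standard_normal(n_games)      # performance difference
    return np.sign(t)                         # +1: Player 1 wins, -1: Player 2 wins

y = simulate_games(w1=1.5, w2=1.0)
print("empirical P(Player 1 wins) =", np.mean(y == 1))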

7 TrueSkill, a probabilistic skill rating system

w_1 and w_2 are the skills of Players 1 and 2. We treat them in a Bayesian way, with Gaussian priors p(w_i) = N(w_i; µ_i, σ_i²).

s = w_1 − w_2 is the skill difference, and t ~ N(t; s, 1) is the performance difference.

The probability of the outcome y given known skills (the likelihood) is

p(y | w_1, w_2) = ∫∫ p(y | t) p(t | s) p(s | w_1, w_2) ds dt.

The posterior over skills given the game outcome is

p(w_1, w_2 | y) = p(w_1) p(w_2) p(y | w_1, w_2) / ∫∫ p(w_1) p(w_2) p(y | w_1, w_2) dw_1 dw_2.

TrueSkill: A Bayesian Skill Rating System. Herbrich, Minka and Graepel, NIPS 19, 2007.

8 The likelihood

p(y | w_1, w_2) = ∫∫ p(y | t) p(t | s) p(s | w_1, w_2) ds dt = ∫ p(y | t) p(t | w_1, w_2) dt = Φ(y(w_1 − w_2)),

where Φ(a) = ∫_{−∞}^{a} N(x; 0, 1) dx.

Φ(a) is the Gaussian cumulative distribution function, or probit function.

[Figure: the standard Gaussian density N(0, 1) and its cumulative distribution function Φ(x).]
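A quick check of this closed form against the generative model (the skill values are illustrative); scipy's norm.cdf plays the role of Φ:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
w1, w2 = 1.5, 1.0                              # illustrative known skills

p_analytic = norm.cdf(w1 - w2)                 # Phi(y (w1 - w2)) for y = +1

t = (w1 - w2) + rng.standard_normal(200_000)   # simulate the generative model
p_mc = np.mean(np.sign(t) == 1)

print(p_analytic, p_mc)                        # the two should agree closely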

9 The likelihood in a picture

[Figure: the likelihood p(y = +1 | w_1, w_2) = Φ(w_1 − w_2) as a function of the two skills.]

10 An intractable posterior

The joint posterior distribution over skills does not have a closed form:

p(w_1, w_2 | y) = N(w_1; µ_1, σ_1²) N(w_2; µ_2, σ_2²) Φ(y(w_1 − w_2)) / ∫∫ N(w_1; µ_1, σ_1²) N(w_2; µ_2, σ_2²) Φ(y(w_1 − w_2)) dw_1 dw_2

w_1 and w_2 become correlated: the posterior does not factorise. The posterior is no longer a Gaussian density function.

The normalising constant of the posterior, the prior probability of y, does have a closed form:

p(y) = ∫∫ N(w_1; µ_1, σ_1²) N(w_2; µ_2, σ_2²) Φ(y(w_1 − w_2)) dw_1 dw_2 = Φ( y(µ_1 − µ_2) / sqrt(1 + σ_1² + σ_2²) )

This is a smoother version of the likelihood p(y | w_1, w_2). Can you explain why?
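A small numerical illustration of this smoothing effect, with illustrative prior parameters (not from the lecture):

import numpy as np
from scipy.stats import norm

mu1, sigma1 = 1.0, 1.0        # illustrative prior mean and std for Player 1
mu2, sigma2 = 0.0, 1.0        # illustrative prior mean and std for Player 2

p_known = norm.cdf(mu1 - mu2)                    # likelihood at the prior means
p_marginal = norm.cdf((mu1 - mu2) / np.sqrt(1 + sigma1**2 + sigma2**2))

print(p_known, p_marginal)    # p_marginal lies closer to 0.5:
                              # skill uncertainty smooths the prediction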

11 Joint posterior after several games

Each player plays against multiple opponents, possibly multiple times; what does the joint posterior look like?

[Figure: contours of the pairwise joint posteriors over skills, with axes labelled skill w_1, skill w_2 and skill w_3.]

The combined posterior is difficult to picture. How do we do inference with an ugly posterior like that? How do we predict the outcome of a match?

12 How do we do integrals with respect to an intractable posterior?

We want to approximate expectations of a function φ(x) with respect to a probability distribution p(x):

E_{p(x)}[φ(x)] = ∫ φ(x) p(x) dx,

where x ∈ R^D, when these are not analytically tractable, and typically D is large. We assume that we can evaluate φ(x) and p(x).

13 Numerical integration on a grid

Approximate the integral by a sum of products:

∫ φ(x) p(x) dx ≈ ∑_{τ=1}^{T} φ(x^(τ)) p(x^(τ)) Δx,

where the x^(τ) lie on an equidistant grid (or fancier versions of this).

Problem: the number of grid points required, k^D, grows exponentially with the dimension D. Practicable only up to D = 4 or so.
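A sketch of grid integration applied to the two-player normaliser p(y = +1) from the previous slide; the prior parameters and grid range are illustrative. With k points per dimension, a D-dimensional grid needs k^D evaluations:

import numpy as np
from scipy.stats import norm

mu = np.array([1.0, 0.0]); sigma = np.array([1.0, 1.0])   # illustrative priors

k = 200                                     # grid points per dimension
w = np.linspace(-5.0, 5.0, k)
W1, W2 = np.meshgrid(w, w, indexing="ij")   # k**2 grid points in total
dw = w[1] - w[0]

integrand = (norm.pdf(W1, mu[0], sigma[0]) * norm.pdf(W2, mu[1], sigma[1])
             * norm.cdf(W1 - W2))           # prior times likelihood for y = +1
p_y_grid = integrand.sum() * dw**2          # sum of products approximation

p_y_exact = norm.cdf((mu[0] - mu[1]) / np.sqrt(1 + sigma[0]**2 + sigma[1]**2))
print(p_y_grid, p_y_exact)                  # the grid estimate matches the closed form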

14 Monte Carlo

The foundational identity for Monte Carlo approximations is

E[φ(x)] ≈ φ̂ = (1/T) ∑_{τ=1}^{T} φ(x^(τ)), where x^(τ) ~ p(x).

Under mild conditions, φ̂ → E[φ(x)] as T → ∞. For moderate T, φ̂ may still be a good approximation. In fact it is an unbiased estimate with

V[φ̂] = V[φ]/T, where V[φ] = ∫ (φ(x) − E[φ(x)])² p(x) dx.

Note that this variance is independent of the dimension D of x.
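A generic Monte Carlo estimator as a sketch; the test function and dimension are illustrative (E[||x||²] under a D-dimensional standard Gaussian equals D):

import numpy as np

rng = np.random.default_rng(0)

def mc_expectation(phi, sample_p, T):
    """Estimate E[phi(x)] by (1/T) * sum_t phi(x_t) with x_t ~ p(x)."""
    x = sample_p(T)
    return phi(x).mean()

D = 50
phi = lambda x: (x**2).sum(axis=1)                # phi(x) = ||x||^2
sample_p = lambda T: rng.standard_normal((T, D))  # samples from p(x) = N(0, I)

for T in (10, 1_000, 100_000):
    print(T, mc_expectation(phi, sample_p, T))    # approaches D; error shrinks as 1/sqrt(T)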

15 Markov chain Monte Carlo

This is great, but how do we generate random samples from p(x)? If p(x) has a standard form, or is one-dimensional, we may be able to generate independent samples.

Idea: could we design a Markov chain, q(x' | x), which generates (dependent) samples from the desired distribution p(x)?

One such algorithm is called Gibbs sampling: for each component i of x in turn, sample a new value from the conditional distribution of x_i given all other variables:

x_i ~ p(x_i | x_1, ..., x_{i−1}, x_{i+1}, ..., x_D).

It can be shown that this generates dependent samples from the joint distribution p(x). Gibbs sampling reduces the task of sampling from a joint distribution to sampling from a sequence of univariate conditional distributions.

16 Gibbs sampling example: multivariate Gaussian

[Figure: 20 iterations of Gibbs sampling on a bivariate Gaussian; both conditional distributions are Gaussian.]

Notice that strong correlations can slow down Gibbs sampling.
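A sketch of this example; the correlation coefficient and number of iterations are illustrative. For a standard bivariate Gaussian with correlation rho, each conditional is N(rho * x_other, 1 - rho²):

import numpy as np

rng = np.random.default_rng(0)
rho = 0.95              # strong correlation: the chain moves in small steps
n_iter = 20

x = np.zeros(2)         # arbitrary starting point
samples = np.empty((n_iter, 2))
for it in range(n_iter):
    # sample x1 | x2, then x2 | x1, each from its Gaussian conditional
    x[0] = rho * x[1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    x[1] = rho * x[0] + np.sqrt(1 - rho**2) * rng.standard_normal()
    samples[it] = x

print(samples[:5])      # successive samples are strongly dependent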

17 Gibbs sampling

Gibbs sampling is a parameter-free algorithm, applicable if we know how to sample from the conditional distributions. Its main disadvantage: depending on the target distribution, there may be very strong correlations between consecutive samples. To get less dependence, Gibbs sampling is often run for a long time, and the samples are thinned by keeping only every 10th or 100th sample. It is often challenging to judge the effective correlation length of a Gibbs sampler. Sometimes several Gibbs samplers are run from different starting points, to compare results.

18 Gibbs sampling for the TrueSkill model

We have g = 1, ..., N games, where I_g is the identity of Player 1 and J_g the identity of Player 2 in game g. The outcome of game g is y_g = +1 if I_g wins, and y_g = −1 if J_g wins.

Gibbs sampling alternates between sampling the skills w = [w_1, ..., w_M]^T conditional on fixed performance differences t = [t_1, ..., t_N]^T, and sampling t conditional on fixed w:

1. Initialise w, e.g. from the prior p(w) = N(w; µ_0, Σ_0).
2. Sample the performance differences from p(t_g | w_{I_g}, w_{J_g}, y_g) ∝ δ(y_g − sign(t_g)) N(t_g; w_{I_g} − w_{J_g}, 1).
3. Jointly sample the skills from p(w | t) ∝ p(w) ∏_{g=1}^{N} p(t_g | w_{I_g}, w_{J_g}); the prior is N(w; µ_0, Σ_0) and each likelihood term is proportional to a Gaussian N(w; µ_g, Σ_g) in w, so p(w | t) = N(w; µ, Σ).
4. Go back to step 2.
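Step 2 can be implemented by rejection sampling from the untruncated Gaussian, as sketched below (the skill values are illustrative; rejection becomes slow when the required side carries little probability mass):

import numpy as np

rng = np.random.default_rng(0)

def sample_tg_rejection(w_i, w_j, y_g):
    """Draw t_g from N(w_i - w_j, 1) restricted to sign(t_g) == y_g."""
    while True:
        t = rng.normal(loc=w_i - w_j, scale=1.0)
        if np.sign(t) == y_g:
            return t

print(sample_tg_rejection(w_i=0.3, w_j=0.5, y_g=+1))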

19 Gaussian identities

The conditional for the performance is both Gaussian in t_g and proportional to a Gaussian in the pair (w_{I_g}, w_{J_g}):

p(t_g | w_{I_g}, w_{J_g}) ∝ exp( −½ (w_{I_g} − w_{J_g} − t_g)² )
∝ exp( −½ [w_{I_g} − µ_1, w_{J_g} − µ_2] [[1, −1], [−1, 1]] [w_{I_g} − µ_1, w_{J_g} − µ_2]^T ),

with µ_1 − µ_2 = t_g. Notice that

[[1, −1], [−1, 1]] [µ_1, µ_2]^T = [t_g, −t_g]^T.

Remember that for products of Gaussians precisions add up, and means weighted by precisions (natural parameters) also add up:

N(w; µ_a, Σ_a) N(w; µ_b, Σ_b) = z_c N(w; µ_c, Σ_c), where Σ_c = (Σ_a^{−1} + Σ_b^{−1})^{−1} and µ_c = Σ_c (Σ_a^{−1} µ_a + Σ_b^{−1} µ_b).
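A univariate instance of this product rule, as a sketch (the means and variances are arbitrary):

import numpy as np

def gaussian_product(mu_a, var_a, mu_b, var_b):
    """Combine two Gaussian factors in w: precisions add, natural means add."""
    prec_c = 1.0 / var_a + 1.0 / var_b
    var_c = 1.0 / prec_c
    mu_c = var_c * (mu_a / var_a + mu_b / var_b)
    return mu_c, var_c

print(gaussian_product(mu_a=0.0, var_a=4.0, mu_b=2.0, var_b=1.0))
# the combined mean is pulled towards the more precise factor,
# and the combined variance is smaller than either input variance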

20 Conditional posterior over skills given performances

We can now compute the covariance and the mean of the conditional posterior p(w | t) = N(w; µ, Σ):

Σ^{−1} = Σ_0^{−1} + ∑_{g=1}^{N} Σ_g^{−1},    µ = Σ ( Σ_0^{−1} µ_0 + ∑_{g=1}^{N} Σ_g^{−1} µ_g ).

To compute the mean it is useful to note that

[ ∑_{g=1}^{N} Σ_g^{−1} µ_g ]_i = ∑_{g=1}^{N} t_g ( δ(i − I_g) − δ(i − J_g) ),

and for the covariance we note that

[ ∑_{g=1}^{N} Σ_g^{−1} ]_ii = ∑_{g=1}^{N} ( δ(i − I_g) + δ(i − J_g) ),
[ ∑_{g=1}^{N} Σ_g^{−1} ]_ij, i ≠ j = − ∑_{g=1}^{N} ( δ(i − I_g) δ(j − J_g) + δ(i − J_g) δ(j − I_g) ).
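A concrete sketch of this bookkeeping for a single game between players 0 and 1 (the prior parameters and the value of t_g are illustrative):

import numpy as np

mu0 = np.array([0.0, 0.0]); var0 = np.array([1.0, 1.0])   # illustrative priors
t_g, i, j = 0.7, 0, 1                                      # one game with I_g = 0, J_g = 1

prec = np.diag(1.0 / var0)            # Sigma_0^{-1}
prec[i, i] += 1.0; prec[j, j] += 1.0  # delta(i - I_g) + delta(i - J_g) on the diagonal
prec[i, j] -= 1.0; prec[j, i] -= 1.0  # off-diagonal coupling between the two players

nat_mean = mu0 / var0                 # Sigma_0^{-1} mu_0
nat_mean[i] += t_g                    # + t_g at index I_g
nat_mean[j] -= t_g                    # - t_g at index J_g

cov = np.linalg.inv(prec)
print("conditional covariance:\n", cov)
print("conditional mean:", cov @ nat_mean)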

21 Implementing Gibbs sampling for the TrueSkill model

We have derived the conditional distributions for the performance differences in each game g and for the skills. These are:

The conditional posterior for the performance difference t_g is a univariate truncated Gaussian. How can we sample from it? Either by rejection sampling from a Gaussian, or by the inverse transformation method (passing a uniform variate on an interval through the inverse Gaussian cumulative distribution function).

The conditional skills can be sampled jointly from the corresponding Gaussian (using the Cholesky factorisation of the covariance matrix).

Once samples have been drawn from the posterior, these can be used to make predictions for game outcomes, using the generative model. How would you do this?
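Putting the pieces together, here is a compact sketch of the full Gibbs sampler (not the lecture's reference implementation; the toy data, priors and sample counts are illustrative). The truncated Gaussian is sampled by the inverse-CDF method, the skills by a Cholesky factorisation, and posterior samples are pushed through the generative model to predict a new game:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def sample_truncated_gaussian(mean, y):
    """Inverse-CDF sample from N(mean, 1) restricted to sign(t) == y."""
    if y == +1:
        lo, hi = norm.cdf(0.0 - mean), 1.0   # t in (0, inf)
    else:
        lo, hi = 0.0, norm.cdf(0.0 - mean)   # t in (-inf, 0)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)      # guard against numerical underflow
    return mean + norm.ppf(u)

def gibbs_trueskill(M, I, J, y, mu0, var0, n_samples=2000):
    """Gibbs sampler alternating the two conditional updates (sketch)."""
    N = len(y)
    w = rng.normal(mu0, np.sqrt(var0))       # step 1: initialise from the prior
    samples = np.empty((n_samples, M))
    for s in range(n_samples):
        # step 2: performance differences from truncated Gaussians
        t = np.array([sample_truncated_gaussian(w[I[g]] - w[J[g]], y[g])
                      for g in range(N)])
        # step 3: skills jointly from the conditional Gaussian
        prec = np.diag(1.0 / var0)
        nat_mean = mu0 / var0
        for g in range(N):
            i, j = I[g], J[g]
            prec[i, i] += 1.0; prec[j, j] += 1.0
            prec[i, j] -= 1.0; prec[j, i] -= 1.0
            nat_mean[i] += t[g]; nat_mean[j] -= t[g]
        cov = np.linalg.inv(prec)
        mean = cov @ nat_mean
        L = np.linalg.cholesky(cov)
        w = mean + L @ rng.standard_normal(M)
        samples[s] = w
    return samples

# Toy data: 3 players; player 0 beats 1, player 1 beats 2, player 0 beats 2.
I = np.array([0, 1, 0]); J = np.array([1, 2, 2]); y = np.array([+1, +1, +1])
S = gibbs_trueskill(3, I, J, y, mu0=np.zeros(3), var0=np.ones(3))

# Predict a future game between players 0 and 2 using the generative model:
# average Phi(w_0 - w_2) over the posterior samples.
print("P(player 0 beats player 2) =", norm.cdf(S[:, 0] - S[:, 2]).mean())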

22 Appendix: the likelihood in detail

p(y | w_1, w_2) = ∫∫ p(y | t) p(t | s) p(s | w_1, w_2) ds dt = ∫ p(y | t) p(t | w_1, w_2) dt
= ∫_{−∞}^{+∞} δ(y − sign(t)) N(t; w_1 − w_2, 1) dt
= ∫_{−∞}^{+∞} δ(1 − sign(yt)) N(yt; y(w_1 − w_2), 1) dt
= ∫_{−∞}^{+∞} δ(1 − sign(z)) N(z; y(w_1 − w_2), 1) dz        (substituting z = yt; for y = −1 the sign flips in the bounds and in dz cancel)
= ∫_{0}^{+∞} N(z; y(w_1 − w_2), 1) dz
= ∫_{−∞}^{y(w_1 − w_2)} N(x; 0, 1) dx                         (substituting x = y(w_1 − w_2) − z)
= Φ(y(w_1 − w_2)),

where Φ(a) = ∫_{−∞}^{a} N(x; 0, 1) dx. Φ(a) is the Gaussian cumulative distribution function, or probit function.
