Bayesian modelling of football outcomes. 1 Introduction. Synopsis. (using Skellam s Distribution)

Size: px
Start display at page:

Download "Bayesian modelling of football outcomes. 1 Introduction. Synopsis. (using Skellam s Distribution)"

Transcription

1 Karlis & Ntzoufras: Bayesian modelling of football outcomes 3 Bayesian modelling of football outcomes (using Skellam s Distribution) Dimitris Karlis & Ioannis Ntzoufras s: {karlis, ntzoufras}@aueb.gr Department of Statistics Athens University of Economics and Business Athens, Greece; 1 Introduction Poisson distribution has been widely used as a simple modelling approach for predicting the number of goals in football (see, for example, Lee, 1997, Karlis and Ntzoufras, 2000). Empirical evidence has shown a (relatively low) correlation between the goals in a football game. This correlation must be incorporated in the model. It is reasonable to assume that the two outcome variables in football are correlated since the two teams interact during the game. 2007, 24-26th June, Manchester Karlis & Ntzoufras: Bayesian modelling of football outcomes 2 Karlis & Ntzoufras: Bayesian modelling of football outcomes 4 Synopsis 1. Introduction. 2. Skellam s Distribution. 3. Bayesian infernece for the Skellam s distribution 4. The General Model. 5. Illustrative example and results. 6. Discussion and further work. We can assume bivariate Poisson distribution (see Karlis and Ntzoufras, 2003). The marginal distributions are simple Poisson distributions, while the random variables are now dependent. Important issue model extra variation of draws (mainly 0-0, 1-1, 2-2). This problem has been effectively handled by Dixon and Coles (1997) and Karlis and Ntzoufras (2003)

2 Karlis & Ntzoufras: Bayesian modelling of football outcomes 5 Karlis & Ntzoufras: Bayesian modelling of football outcomes 7 The Bivariate Poisson Model (Karlis and Ntzoufras, 2003) (X i,y i ) BP(λ 1i,λ 2i,λ 3i ) where BP(λ 1,λ 2,λ 3 ) is the bivariate Poisson distribution with parameters λ i, i =1, 2, 3 and probability function P (X = x, Y = y) =e (λ1+λ2+λ3) λx 1 λ y min(x,y) 2 x y ( ) k λ3 k!. (1) x! y! k k λ 1 λ 2 k=0 Marginals: X P oisson(λ 1 + λ 3 )andy P oisson(λ 2 + λ 3 ) Means and Variance: E(X) =V (X) =λ 1 + λ 3, E(Y )=V (X) =λ 2 + λ 3 Advantages Accounts for the covariance which can be modelled Has he latent variables decomposition which can be used for data augmentation Poisson Marginals Easy interpretation Easy extension to similar models. Covariance: Cov(X, Y )=λ 3 > 0 Can be derived using latent variables W i P oisson(λ i ), for i =1, 2, 3with X = W 1 + W 3 and with Y = W 2 + W 3. Karlis & Ntzoufras: Bayesian modelling of football outcomes 6 Karlis & Ntzoufras: Bayesian modelling of football outcomes 8 Covariates can be linked directly on the means of the latent variables as in Karlis and Ntzoufras (2003) or directly on the marginal means. A Simple Model (Karlis and Ntzoufras, 2003) Response variables (X, Y ) are the home and away goals in each game. Consider the structure of Lee (1997) for λ 1 and λ 2 log(λ 1i ) = µ + H + A HTi + D ATi (2) log(λ 2i ) = µ + A ATi + D HTi (3) µ: constant; H: home effect; A k, D k attacking and defensive parameters of team k; HT i, AT i home and away team in i game. Constant covariance λ 3 Disadvantages How to model λ 3? Is constant λ 3 sufficient? Does λ 3 varies across teams or time? Underestimates the number of draws in certain cases (while it provides better results on this from the double Poisson model) Poisson Marginals while empirical evidence has shown slight over-dispersion: need to adjust The model allows only for positive correlation, if the data show negative correlation (even slow) the model cannot handle it

3 Karlis & Ntzoufras: Bayesian modelling of football outcomes 9 Karlis & Ntzoufras: Bayesian modelling of football outcomes 11 marginal means λ 3 win draw loss ratio Table 1: The effect on the predicted probabilities when ignoring the covariance; the BP increases the probability of a draw. Skellam s distribution (1) For any pair of variables (X, Y )thatcanbewrittenas X = W 1 + W 3, Y = W 2 + W 3 with W 1 P oisson(λ 1 ), W 2 P oisson(λ 2 ) and W 3 any distribution with parameters θ 3 then Z = X Y PD(λ 1,λ 2 ) (Poisson difference or Skellam s distribution with parameters λ 1 and λ 2 ). Karlis & Ntzoufras: Bayesian modelling of football outcomes 10 Karlis & Ntzoufras: Bayesian modelling of football outcomes 12 Skellam s distribution (2) Diagonal Inflated Bivariate Poisson Model Under this approach a diagonal inflated model is specified by (1 p)bp(x, y λ 1,λ 2,λ 3 ), x y P D (x, y) = (1 p)bp(x, y λ 1,λ 2,λ 3 )+pd(x, θ), x = y, where D(x, θ) is discrete distribution with parameter vector θ. Such models can be fitted using the EM algorithm. Important: diagonal inflation improves in several aspects: better draw prediction, overdipsersed marginals, introduce correlation (4) Z = X Y PD(λ 1,λ 2 ) Mean E(Z) = λ 1 λ 2 Poisson difference or Skellam s distribution with Variance Var(Z) = λ 1 + λ 2 and density function ( ) z/2 f PD (z λ, λ 2 )=P(Z = z λ 1,λ 2 )=e (λ1+λ2) λ1 I z (2 ) λ 1 λ 2. (5) λ 2 for all z Z,whereI r (x) is the modified Bessel function of order r ( ) k ( x ) r x 2 4 I r (x) = 2 k!γ(r + k +1). k=0 (see Abramowitz and Stegun, 1974, pp. 375).

4 Karlis & Ntzoufras: Bayesian modelling of football outcomes 13 Karlis & Ntzoufras: Bayesian modelling of football outcomes 15 Prob lambda1 = 1,lambda2= x Prob lambda1 = 1,lambda2= x (see, Karlis and Ntzoufras,2006, SIM) Bayesian approach Used Discrete differences of dental data to test for differences of means between two repeated measurements test for zero inflated components Prob lambda1 = 1,lambda2= Prob lambda1 = 2,lambda2= using the Bayes Factor approach (and RJMCMC to estimate it). The same approach can be used to quantify (using the Bayes factor) Used Discrete differences of dental data to the importance of home effect (by testing for the equality of expected home and away goals) and the excess of draws (by testing the importance of the zero inflated component). x x Karlis & Ntzoufras: Bayesian modelling of football outcomes 14 Advantages Removes additive covariance (do not need to model it) Has an Poisson latent variable interpretation Does not assumes Poisson marginals (i.e. one may assume an overdispersed marginal distribution) Easy interpretation Simpler Model set-up than the corresponding Biv. Poisson model Discards part of the information Disadvantages Cannot model covariance and final full score (only differences). Karlis & Ntzoufras: Bayesian modelling of football outcomes 16 Skellam s Distribution for Football Scores Response variable: Z = X Y the goal difference in each game. Same structure for parameters λ 1 and λ 2 as in Bivariate Poisson: log(λ 1i ) = µ + H + A HTi + D ATi (6) log(λ 2i ) = µ + A ATi + D HTi (7) µ: constant; H: home effect; A k, D k attacking and defensive parameters of team k; HT i, AT i home and away team in i game. Use the zero inflated variation of Skellam s distribution to model the excess of draws. Hence we define the zero inflated Poisson Difference (ZPD) distribution as f ZPD (0 p, λ 1,λ 2 )= p + (1 p)f PD (0 λ 1,λ 2 ) and f ZPD (z p, λ 1,λ 2 )= (1 p)f PD (z λ 1,λ 2 ), (8) for z Z\{0}; wherep (0, 1) and f PD (z λ 1,λ 2 ) is given by (5).

5 Karlis & Ntzoufras: Bayesian modelling of football outcomes 17 Karlis & Ntzoufras: Bayesian modelling of football outcomes 19 Posterior Distributions The key element for building an MCMC algorithm for the above models is to use augmented data Sample latent data w 1i and w 2i from f(w 1i,w 2i z i = w 1i w 2i,λ 1i,λ 2i ) λw1i w 1i! where I(A) =1ifA is true and zero otherwise. Sample [δ i z i,λ 1i,λ 2i ] Bernoulli( p i )from p p i = p +(1 p)f PD (z i λ 1i,λ 2i ) [δ i indicates the mixing (zero inflated or PD) component] λ w2i 2i w 2i! I(z i = w 1i w 2i ) Then we MCMC algorithm for model parameters is similar to the one used in Poisson regression models. We can additionally model p as in logistic regression models Fulham (38) Watford (29) Chelsea (64) Arsenal (63) Tottenham (57) Everton (52) Bolton (47) Reading (52) Portsmouth (45) Blackburn (52) Aston Villa (43) Middlesbrough (44) Newcastle (38) Man City (29) West Ham (35) Wigan (37) Sheff Utd (32) Charlton (34) Man Utd (83) Liverpool (57) % Posterior intervals for attacking coefficients (A i ) Karlis & Ntzoufras: Bayesian modelling of football outcomes 18 Karlis & Ntzoufras: Bayesian modelling of football outcomes 20 Example: Premiership season data Data from the premiership for the season are used to fit the PD and ZPD models. Results using PD Posterior summaries for µ and home effect Min. 1st Qu. Median Mean 3rd Qu. Max. s.d. mu home Man Utd (27) Chelsea (24) Liverpool (27) Arsenal (35) Tottenham (54) Everton (36) Bolton (52) Reading (47) Portsmouth (42) Blackburn (54) Aston Villa (41) Middlesbrough (49) Newcastle (47) Man City (44) West Ham (59) Fulham (60) Wigan (59) Sheff Utd (55) Charlton (60) Watford (59) % Posterior intervals for defensive coefficients [( 1) D i ]

6 Karlis & Ntzoufras: Bayesian modelling of football outcomes 21 Karlis & Ntzoufras: Bayesian modelling of football outcomes 23 Number of games Goal difference 95% Posterior intervals for predicted differences (in red=observed differences; - indicates the posterior median) Karlis & Ntzoufras: Bayesian modelling of football outcomes 22 Karlis & Ntzoufras: Bayesian modelling of football outcomes 24 Posterior Predictive Table Observed Final Table Pred.(Obs.) Post. Expectations Obs. Obs. values Rank Team Pts G.Dif. Rank Team Pts G.Dif. 1 (1) Man Utd ManUtd (2) Chelsea Chelsea (3) Arsenal Liverpool (3) Liverpool Arsenal (6) Everton Tottenham (8) Reading Everton (5) Tottenham Bolton (9) Portsmouth Reading (10) Blackburn Portsmouth (11) Aston Villa Blackburn (12) Middlesbrough Aston Villa (7) Bolton Middlesbrough (13) Newcastle Newcastle (14) Man City Man City (15) West Ham West Ham (17) Wigan Fulham (18) Sheff Utd Wigan (19) Charlton Sheff Utd (16) Fulham Charlton (20) Watford Watford Deviations between observed and predictive measures Comparison Deviation 1. Relative Frequencies (counts/games) 1.06% 2. Frequencies (counts) Relative Frequencies of win/draw/lose 2.20% 4. Frequencies of win/draw/lose Expected points Expected goal difference 0.28 Deviation = 1 K ( E(Q Pred i y) Q obs i K where K is the lengths of vector Q (K = 13 for comparisons 1-4, K =20for comparisons 5-6). i=1 ) 2

7 Karlis & Ntzoufras: Bayesian modelling of football outcomes 25 Results using ZPD Posterior summaries for µ and home effect Min. 1st Qu. Median Mean 3rd Qu. Max. s.d. mu home p The posterior distribution of p is close to zero Small deviations between µ for PD and ZPD Home effect is similar in both models Results from PD Min. 1st Qu. Median Mean 3rd Qu. Max. s.d. mu home Karlis & Ntzoufras: Bayesian modelling of football outcomes 27 Deviations between observed and predictive measures Deviation Comparison PD ZPD 1. Relative Frequencies (counts/games) 1.06% 1.32% 2. Frequencies (counts) Relative Frequencies of win/draw/lose 2.20% 2.80% 4. Frequencies of win/draw/lose % 5. Expected points Expected goal difference Comments Zero inflation component does not improves the performance (as expected) Prediction of goal differences for ZPD are worse than the corresponding ones for PD Nevertheless, differences in the final rankings are minimal Karlis & Ntzoufras: Bayesian modelling of football outcomes 26 Karlis & Ntzoufras: Bayesian modelling of football outcomes 28 Number of games Goal difference 95% Posterior intervals for predicted differences (in red=observed differences; - indicates the posterior median) Comment: No major differences between the two models are observed. ZPD PD Discussion and further research Although in the simple and bivariate Poisson model we underestimate draws here using PD we have overestimated draws. No zero inflation is needed. We might need to add a component which will reduce the predicted draws. Can covariates on p improve the model? Apply Bayesian variable selection and Bayesian model averaging techniques. Apply other distributions defined on the same range.

8 Karlis & Ntzoufras: Bayesian modelling of football outcomes 29 Our Related Publications 1. Karlis D. and Ntzoufras, I. (2006). Bayesian analysis of the differences of count data. Statistics in Medicine, 25, Karlis, D. and Ntzoufras, I. (2005). Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R. Journal of Statistical Software, Volume 10, Issue Karlis, D. and Ntzoufras, I. (2003). Analysis of Sports Data Using Bivariate Poisson Models. Journal of the Royal Statistical Society, D,(Statistician),52, Karlis, D. and Ntzoufras, I. (2000). On Modelling Soccer Data. Student, 3, Karlis & Ntzoufras: Bayesian modelling of football outcomes 30 Additional References Lee, A.J. (1997). Modeling scores in the Premier League: Is Manchester United really the best? Chance, 10, Dixon, M.J. and Coles, S.G. (1997). Modelling association football scores and inefficiencies in football betting market. Applied Statistics, 46,

Bayesian Analysis of Bivariate Count Data

Bayesian Analysis of Bivariate Count Data Karlis and Ntzoufras: Bayesian Analysis of Bivariate Count Data 1 Bayesian Analysis of Bivariate Count Data Dimitris Karlis and Ioannis Ntzoufras, Department of Statistics, Athens University of Economics

More information

Analysis of sports data by using bivariate Poisson models

Analysis of sports data by using bivariate Poisson models The Statistician (2003) 52, Part 3, pp. 381 393 Analysis of sports data by using bivariate Poisson models Dimitris Karlis Athens University of Economics and Business, Greece and Ioannis Ntzoufras University

More information

Statistical Properties of indices of competitive balance

Statistical Properties of indices of competitive balance Statistical Properties of indices of competitive balance Dimitris Karlis, Vasilis Manasis and Ioannis Ntzoufras Department of Statistics, Athens University of Economics & Business AUEB sports Analytics

More information

Advanced Computational Methods in Statistics: Lecture 5 Sequential Monte Carlo/Particle Filtering

Advanced Computational Methods in Statistics: Lecture 5 Sequential Monte Carlo/Particle Filtering Advanced Computational Methods in Statistics: Lecture 5 Sequential Monte Carlo/Particle Filtering Axel Gandy Department of Mathematics Imperial College London http://www2.imperial.ac.uk/~agandy London

More information

A Bivariate Weibull Count Model for Forecasting Association Football Scores

A Bivariate Weibull Count Model for Forecasting Association Football Scores A Bivariate Weibull Count Model for Forecasting Association Football Scores Georgi Boshnakov 1, Tarak Kharrat 1,2, and Ian G. McHale 2 1 School of Mathematics, University of Manchester, UK. 2 Centre for

More information

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like

More information

Composite Poisson Models For Goal Scoring

Composite Poisson Models For Goal Scoring Swarthmore College Works Mathematics & Statistics Faculty Works Mathematics & Statistics 4-1-2008 Composite Poisson Models For Goal Scoring Philip J. Everson Swarthmore College, peverso1@swarthmore.edu

More information

On the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA EURO 2016

On the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA EURO 2016 On the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA EURO 2016 A. Groll & A. Mayr & T. Kneib & G. Schauberger Department of Statistics, Georg-August-University Göttingen MathSport

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

arxiv: v1 [physics.soc-ph] 24 Sep 2009

arxiv: v1 [physics.soc-ph] 24 Sep 2009 Final version is published in: Journal of Applied Statistics Vol. 36, No. 10, October 2009, 1087 1095 RESEARCH ARTICLE Are soccer matches badly designed experiments? G. K. Skinner a and G. H. Freeman b

More information

A penalty criterion for score forecasting in soccer

A penalty criterion for score forecasting in soccer 1 A penalty criterion for score forecasting in soccer Jean-Louis oulley (1), Gilles Celeux () 1. Institut Montpellierain Alexander Grothendieck, rance, foulleyjl@gmail.com. INRIA-Select, Institut de mathématiques

More information

Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R

Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R 1 Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R Dimitris Karlis and Ioannis Ntzoufras Department of Statistics, Athens University of Economics and Business, 76, Patission

More information

Lecture 3: Sports rating models

Lecture 3: Sports rating models Lecture 3: Sports rating models David Aldous January 27, 2016 Sports are a popular topic for course projects usually involving details of some specific sport and statistical analysis of data. In this lecture

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software September 2005, Volume 14, Issue 10. http://www.jstatsoft.org/ Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R Dimitris Karlis Athens

More information

LORENZ A. GILCH AND SEBASTIAN MÜLLER

LORENZ A. GILCH AND SEBASTIAN MÜLLER ON ELO BASED PREDICTION MODELS FOR THE FIFA WORLDCUP 2018 LORENZ A. GILCH AND SEBASTIAN MÜLLER arxiv:1806.01930v1 [stat.ap] 5 Jun 2018 Abstract. We propose an approach for the analysis and prediction of

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr. Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick

More information

A market-based algorithm for predicting soccer outcomes

A market-based algorithm for predicting soccer outcomes A market-based algorithm for predicting soccer outcomes Víctor Hernández García Universidad Complutense de Madrid & FEDEA June 2017 Abstract The main objective of this study is to develop a market-based

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

CS37300 Class Notes. Jennifer Neville, Sebastian Moreno, Bruno Ribeiro

CS37300 Class Notes. Jennifer Neville, Sebastian Moreno, Bruno Ribeiro CS37300 Class Notes Jennifer Neville, Sebastian Moreno, Bruno Ribeiro 2 Background on Probability and Statistics These are basic definitions, concepts, and equations that should have been covered in your

More information

Business Statistics: 41000

Business Statistics: 41000 Business Statistics: 41000 Week 3: Learning from Data: Estimation Nick Polson The University of Chicago Booth School of Business http://faculty.chicagobooth.edu/nicholas.polson/teaching/41000/ Suggested

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

PREDICTING AND RETROSPECTIVE ANALYSIS OF SOCCER MATCHES IN A LEAGUE

PREDICTING AND RETROSPECTIVE ANALYSIS OF SOCCER MATCHES IN A LEAGUE PREDICTING ND RETROSPECTIVE NLYSIS OF SOCCER MTCHES IN LEGUE HÅVRD RUE ND ØYVIND SLVESEN DEPRTMENT OF MTHEMTICL SCIENCES NTNU, NORWY FIRST VERSION: OCTOBER 1997 REVISED: UGUST 1998 BSTRCT common discussion

More information

Six Examples of Fits Using the Poisson. Negative Binomial. and Poisson Inverse Gaussian Distributions

Six Examples of Fits Using the Poisson. Negative Binomial. and Poisson Inverse Gaussian Distributions APPENDIX A APPLICA TrONS IN OTHER DISCIPLINES Six Examples of Fits Using the Poisson. Negative Binomial. and Poisson Inverse Gaussian Distributions Poisson and mixed Poisson distributions have been extensively

More information

Discrete Choice Modeling

Discrete Choice Modeling [Part 6] 1/55 0 Introduction 1 Summary 2 Binary Choice 3 Panel Data 4 Bivariate Probit 5 Ordered Choice 6 7 Multinomial Choice 8 Nested Logit 9 Heterogeneity 10 Latent Class 11 Mixed Logit 12 Stated Preference

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Twelfth Problem Assignment

Twelfth Problem Assignment EECS 401 Not Graded PROBLEM 1 Let X 1, X 2,... be a sequence of independent random variables that are uniformly distributed between 0 and 1. Consider a sequence defined by (a) Y n = max(x 1, X 2,..., X

More information

Approximating the Conway-Maxwell-Poisson normalizing constant

Approximating the Conway-Maxwell-Poisson normalizing constant Filomat 30:4 016, 953 960 DOI 10.98/FIL1604953S Published by Faculty of Sciences and Mathematics, University of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat Approximating the Conway-Maxwell-Poisson

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Variance reduction. Michel Bierlaire. Transport and Mobility Laboratory. Variance reduction p. 1/18

Variance reduction. Michel Bierlaire. Transport and Mobility Laboratory. Variance reduction p. 1/18 Variance reduction p. 1/18 Variance reduction Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory Variance reduction p. 2/18 Example Use simulation to compute I = 1 0 e x dx We

More information

ECE 302, Final 3:20-5:20pm Mon. May 1, WTHR 160 or WTHR 172.

ECE 302, Final 3:20-5:20pm Mon. May 1, WTHR 160 or WTHR 172. ECE 302, Final 3:20-5:20pm Mon. May 1, WTHR 160 or WTHR 172. 1. Enter your name, student ID number, e-mail address, and signature in the space provided on this page, NOW! 2. This is a closed book exam.

More information

(3) Review of Probability. ST440/540: Applied Bayesian Statistics

(3) Review of Probability. ST440/540: Applied Bayesian Statistics Review of probability The crux of Bayesian statistics is to compute the posterior distribution, i.e., the uncertainty distribution of the parameters (θ) after observing the data (Y) This is the conditional

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

Class 8 Review Problems solutions, 18.05, Spring 2014

Class 8 Review Problems solutions, 18.05, Spring 2014 Class 8 Review Problems solutions, 8.5, Spring 4 Counting and Probability. (a) Create an arrangement in stages and count the number of possibilities at each stage: ( ) Stage : Choose three of the slots

More information

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Bayes Model Selection with Path Sampling: Factor Models

Bayes Model Selection with Path Sampling: Factor Models with Path Sampling: Factor Models Ritabrata Dutta and Jayanta K Ghosh Purdue University 07/02/11 Factor Models in Applications Factor Models in Applications Factor Models Factor Models and Factor analysis

More information

On the bivariate Skellam distribution. hal , version 1-22 Oct Noname manuscript No. (will be inserted by the editor)

On the bivariate Skellam distribution. hal , version 1-22 Oct Noname manuscript No. (will be inserted by the editor) Noname manuscript No. (will be inserted by the editor) On the bivariate Skellam distribution Jan Bulla Christophe Chesneau Maher Kachour Submitted to Journal of Multivariate Analysis: 22 October 2012 Abstract

More information

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access

Online Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci

More information

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago Mohammed Research in Pharmacoepidemiology (RIPE) @ National School of Pharmacy, University of Otago What is zero inflation? Suppose you want to study hippos and the effect of habitat variables on their

More information

Unit 2. Describing Data: Numerical

Unit 2. Describing Data: Numerical Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

More information

MODEL BASED CLUSTERING FOR COUNT DATA

MODEL BASED CLUSTERING FOR COUNT DATA MODEL BASED CLUSTERING FOR COUNT DATA Dimitris Karlis Department of Statistics Athens University of Economics and Business, Athens April OUTLINE Clustering methods Model based clustering!"the general model!"algorithmic

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Obnoxious lateness humor

Obnoxious lateness humor Obnoxious lateness humor 1 Using Bayesian Model Averaging For Addressing Model Uncertainty in Environmental Risk Assessment Louise Ryan and Melissa Whitney Department of Biostatistics Harvard School of

More information

LECTURE 1. Introduction to Econometrics

LECTURE 1. Introduction to Econometrics LECTURE 1 Introduction to Econometrics Ján Palguta September 20, 2016 1 / 29 WHAT IS ECONOMETRICS? To beginning students, it may seem as if econometrics is an overly complex obstacle to an otherwise useful

More information

Business Statistics 41000: Homework # 2 Solutions

Business Statistics 41000: Homework # 2 Solutions Business Statistics 4000: Homework # 2 Solutions Drew Creal February 9, 204 Question #. Discrete Random Variables and Their Distributions (a) The probabilities have to sum to, which means that 0. + 0.3

More information

Where would you rather live? (And why?)

Where would you rather live? (And why?) Where would you rather live? (And why?) CityA CityB Where would you rather live? (And why?) CityA CityB City A is San Diego, CA, and City B is Evansville, IN Measures of Dispersion Suppose you need a new

More information

Approximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs

Approximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs Approximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs Aleksandar Dimitriev Erik Štrumbelj Abstract We present a novel approach to binary and ordinal

More information

Estimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith

Estimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith Estimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith Melbourne Business School, University of Melbourne (Joint with Mohamad Khaled, University of Queensland)

More information

This paper is not to be removed from the Examination Halls

This paper is not to be removed from the Examination Halls ~~ST104B ZA d0 This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON ST104B ZB BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the Social Sciences,

More information

Multivariate Normal & Wishart

Multivariate Normal & Wishart Multivariate Normal & Wishart Hoff Chapter 7 October 21, 2010 Reading Comprehesion Example Twenty-two children are given a reading comprehsion test before and after receiving a particular instruction method.

More information

Lecture Notes for BUSINESS STATISTICS - BMGT 571. Chapters 1 through 6. Professor Ahmadi, Ph.D. Department of Management

Lecture Notes for BUSINESS STATISTICS - BMGT 571. Chapters 1 through 6. Professor Ahmadi, Ph.D. Department of Management Lecture Notes for BUSINESS STATISTICS - BMGT 571 Chapters 1 through 6 Professor Ahmadi, Ph.D. Department of Management Revised May 005 Glossary of Terms: Statistics Chapter 1 Data Data Set Elements Variable

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Incorporating cost in Bayesian variable selection, with application to cost-effective measurement of quality of health care

Incorporating cost in Bayesian variable selection, with application to cost-effective measurement of quality of health care Incorporating cost in Bayesian variable selection, with application to cost-effective measurement of quality of health care Dimitris Fouskakis, Department of Mathematics, School of Applied Mathematical

More information

EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY

EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY HIGHER CERTIFICATE IN STATISTICS, 2013 MODULE 5 : Further probability and inference Time allowed: One and a half hours Candidates should answer THREE questions.

More information

Chapter 5. Bayesian Statistics

Chapter 5. Bayesian Statistics Chapter 5. Bayesian Statistics Principles of Bayesian Statistics Anything unknown is given a probability distribution, representing degrees of belief [subjective probability]. Degrees of belief [subjective

More information

CIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions

CIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions CIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions December 14, 2016 Questions Throughout the following questions we will assume that x t is the state vector at time t, z t is the

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Bayesian inference for factor scores

Bayesian inference for factor scores Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor

More information

Optimization and Simulation

Optimization and Simulation Optimization and Simulation Variance reduction Michel Bierlaire Transport and Mobility Laboratory School of Architecture, Civil and Environmental Engineering Ecole Polytechnique Fédérale de Lausanne M.

More information

Contents. Part I: Fundamentals of Bayesian Inference 1

Contents. Part I: Fundamentals of Bayesian Inference 1 Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Systematic strategies for real time filtering of turbulent signals in complex systems

Systematic strategies for real time filtering of turbulent signals in complex systems Systematic strategies for real time filtering of turbulent signals in complex systems Statistical inversion theory for Gaussian random variables The Kalman Filter for Vector Systems: Reduced Filters and

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.

Dependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline. MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y

More information

Exam 1 - Math Solutions

Exam 1 - Math Solutions Exam 1 - Math 3200 - Solutions Spring 2013 1. Without actually expanding, find the coefficient of x y 2 z 3 in the expansion of (2x y z) 6. (A) 120 (B) 60 (C) 30 (D) 20 (E) 10 (F) 10 (G) 20 (H) 30 (I)

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

Multivariate Generalized Linear Mixed Models for Joint Estimation of Sporting Outcomes

Multivariate Generalized Linear Mixed Models for Joint Estimation of Sporting Outcomes arxiv:1710.05284v1 [stat.ap] 15 Oct 2017 Multivariate Generalized Linear Mixed Models for Joint Estimation of Sporting Outcomes Jennifer E. Broatch Andrew T. Karl Abstract This paper explores improvements

More information

Vojtěch Herrmann. Favoritism Under Social Pressure: Evidence From English Premier League

Vojtěch Herrmann. Favoritism Under Social Pressure: Evidence From English Premier League BACHELOR THESIS Vojtěch Herrmann Favoritism Under Social Pressure: Evidence From English Premier League Department of Probability and Mathematical Statistics Supervisor of the bachelor thesis: Study programme:

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Hidden Markov models for time series of counts with excess zeros

Hidden Markov models for time series of counts with excess zeros Hidden Markov models for time series of counts with excess zeros Madalina Olteanu and James Ridgway University Paris 1 Pantheon-Sorbonne - SAMM, EA4543 90 Rue de Tolbiac, 75013 Paris - France Abstract.

More information

A strategy for modelling count data which may have extra zeros

A strategy for modelling count data which may have extra zeros A strategy for modelling count data which may have extra zeros Alan Welsh Centre for Mathematics and its Applications Australian National University The Data Response is the number of Leadbeater s possum

More information

arxiv: v2 [physics.data-an] 3 Mar 2010

arxiv: v2 [physics.data-an] 3 Mar 2010 Soccer: is scoring goals a predictable Poissonian process? A. Heuer, 1 C. Müller, 1,2 and O. Rubner 1 1 Westfälische Wilhelms Universität Münster, Institut für physikalische Chemie, Corrensstr. 3, 48149

More information

Lecture 3: Sports rating models

Lecture 3: Sports rating models Lecture 3: Sports rating models David Aldous August 31, 2017 Sports are a popular topic for course projects usually involving details of some specific sport and statistical analysis of data. In this lecture

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Forestry experiment and dose-response modelling. In the News: High water mark: the rise in sea levels may be accelerating Economist, Jan 17

Forestry experiment and dose-response modelling. In the News: High water mark: the rise in sea levels may be accelerating Economist, Jan 17 Today HW 1: due February 4, 11.59 pm. Regression with count data Forestry experiment and dose-response modelling In the News: High water mark: the rise in sea levels may be accelerating Economist, Jan

More information

36-463/663: Multilevel & Hierarchical Models

36-463/663: Multilevel & Hierarchical Models 36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have

More information

Bayesian linear regression

Bayesian linear regression Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

More information

Approximate Inference in Practice Microsoft s Xbox TrueSkill TM

Approximate Inference in Practice Microsoft s Xbox TrueSkill TM 1 Approximate Inference in Practice Microsoft s Xbox TrueSkill TM Daniel Hernández-Lobato Universidad Autónoma de Madrid 2014 2 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference

More information

MATH 3510: PROBABILITY AND STATS June 15, 2011 MIDTERM EXAM

MATH 3510: PROBABILITY AND STATS June 15, 2011 MIDTERM EXAM MATH 3510: PROBABILITY AND STATS June 15, 2011 MIDTERM EXAM YOUR NAME: KEY: Answers in Blue Show all your work. Answers out of the blue and without any supporting work may receive no credit even if they

More information

13.1 Causal effects with continuous mediator and. predictors in their equations. The definitions for the direct, total indirect,

13.1 Causal effects with continuous mediator and. predictors in their equations. The definitions for the direct, total indirect, 13 Appendix 13.1 Causal effects with continuous mediator and continuous outcome Consider the model of Section 3, y i = β 0 + β 1 m i + β 2 x i + β 3 x i m i + β 4 c i + ɛ 1i, (49) m i = γ 0 + γ 1 x i +

More information

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Lecture 25: Review. Statistics 104. April 23, Colin Rundel Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April

More information

Introduction to Probability 2017/18 Supplementary Problems

Introduction to Probability 2017/18 Supplementary Problems Introduction to Probability 2017/18 Supplementary Problems Problem 1: Let A and B denote two events with P(A B) 0. Show that P(A) 0 and P(B) 0. A A B implies P(A) P(A B) 0, hence P(A) 0. Similarly B A

More information

Approaches for Multiple Disease Mapping: MCAR and SANOVA

Approaches for Multiple Disease Mapping: MCAR and SANOVA Approaches for Multiple Disease Mapping: MCAR and SANOVA Dipankar Bandyopadhyay Division of Biostatistics, University of Minnesota SPH April 22, 2015 1 Adapted from Sudipto Banerjee s notes SANOVA vs MCAR

More information

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments by Gordon K. Smyth (as interpreted by Aaron J. Baraff) STAT 572 Intro Talk April 10, 2014 Microarray

More information

R-squared for Bayesian regression models

R-squared for Bayesian regression models R-squared for Bayesian regression models Andrew Gelman Ben Goodrich Jonah Gabry Imad Ali 8 Nov 2017 Abstract The usual definition of R 2 (variance of the predicted values divided by the variance of the

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

45 6 = ( )/6! = 20 6 = selected subsets: 6 = and 10

45 6 = ( )/6! = 20 6 = selected subsets: 6 = and 10 Homework 1. Selected Solutions, 9/12/03. Ch. 2, # 34, p. 74. The most natural sample space S is the set of unordered -tuples, i.e., the set of -element subsets of the list of 45 workers (labelled 1,...,

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

CSC 411: Lecture 09: Naive Bayes

CSC 411: Lecture 09: Naive Bayes CSC 411: Lecture 09: Naive Bayes Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 8, 2015 Urtasun, Zemel, Fidler (UofT) CSC 411: 09-Naive Bayes Feb 8, 2015 1

More information

Online Appendix: Bayesian versus maximum likelihood estimation of treatment effects in bivariate probit instrumental variable models

Online Appendix: Bayesian versus maximum likelihood estimation of treatment effects in bivariate probit instrumental variable models Online Appendix: Bayesian versus maximum likelihood estimation of treatment effects in bivariate probit instrumental variable models A. STAN CODE // STAN code for Bayesian bivariate model // based on code

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information