Bayesian modelling of football outcomes. 1 Introduction. Synopsis. (using Skellam s Distribution)
|
|
- Erick Young
- 6 years ago
- Views:
Transcription
1 Karlis & Ntzoufras: Bayesian modelling of football outcomes 3 Bayesian modelling of football outcomes (using Skellam s Distribution) Dimitris Karlis & Ioannis Ntzoufras s: {karlis, ntzoufras}@aueb.gr Department of Statistics Athens University of Economics and Business Athens, Greece; 1 Introduction Poisson distribution has been widely used as a simple modelling approach for predicting the number of goals in football (see, for example, Lee, 1997, Karlis and Ntzoufras, 2000). Empirical evidence has shown a (relatively low) correlation between the goals in a football game. This correlation must be incorporated in the model. It is reasonable to assume that the two outcome variables in football are correlated since the two teams interact during the game. 2007, 24-26th June, Manchester Karlis & Ntzoufras: Bayesian modelling of football outcomes 2 Karlis & Ntzoufras: Bayesian modelling of football outcomes 4 Synopsis 1. Introduction. 2. Skellam s Distribution. 3. Bayesian infernece for the Skellam s distribution 4. The General Model. 5. Illustrative example and results. 6. Discussion and further work. We can assume bivariate Poisson distribution (see Karlis and Ntzoufras, 2003). The marginal distributions are simple Poisson distributions, while the random variables are now dependent. Important issue model extra variation of draws (mainly 0-0, 1-1, 2-2). This problem has been effectively handled by Dixon and Coles (1997) and Karlis and Ntzoufras (2003)
2 Karlis & Ntzoufras: Bayesian modelling of football outcomes 5 Karlis & Ntzoufras: Bayesian modelling of football outcomes 7 The Bivariate Poisson Model (Karlis and Ntzoufras, 2003) (X i,y i ) BP(λ 1i,λ 2i,λ 3i ) where BP(λ 1,λ 2,λ 3 ) is the bivariate Poisson distribution with parameters λ i, i =1, 2, 3 and probability function P (X = x, Y = y) =e (λ1+λ2+λ3) λx 1 λ y min(x,y) 2 x y ( ) k λ3 k!. (1) x! y! k k λ 1 λ 2 k=0 Marginals: X P oisson(λ 1 + λ 3 )andy P oisson(λ 2 + λ 3 ) Means and Variance: E(X) =V (X) =λ 1 + λ 3, E(Y )=V (X) =λ 2 + λ 3 Advantages Accounts for the covariance which can be modelled Has he latent variables decomposition which can be used for data augmentation Poisson Marginals Easy interpretation Easy extension to similar models. Covariance: Cov(X, Y )=λ 3 > 0 Can be derived using latent variables W i P oisson(λ i ), for i =1, 2, 3with X = W 1 + W 3 and with Y = W 2 + W 3. Karlis & Ntzoufras: Bayesian modelling of football outcomes 6 Karlis & Ntzoufras: Bayesian modelling of football outcomes 8 Covariates can be linked directly on the means of the latent variables as in Karlis and Ntzoufras (2003) or directly on the marginal means. A Simple Model (Karlis and Ntzoufras, 2003) Response variables (X, Y ) are the home and away goals in each game. Consider the structure of Lee (1997) for λ 1 and λ 2 log(λ 1i ) = µ + H + A HTi + D ATi (2) log(λ 2i ) = µ + A ATi + D HTi (3) µ: constant; H: home effect; A k, D k attacking and defensive parameters of team k; HT i, AT i home and away team in i game. Constant covariance λ 3 Disadvantages How to model λ 3? Is constant λ 3 sufficient? Does λ 3 varies across teams or time? Underestimates the number of draws in certain cases (while it provides better results on this from the double Poisson model) Poisson Marginals while empirical evidence has shown slight over-dispersion: need to adjust The model allows only for positive correlation, if the data show negative correlation (even slow) the model cannot handle it
3 Karlis & Ntzoufras: Bayesian modelling of football outcomes 9 Karlis & Ntzoufras: Bayesian modelling of football outcomes 11 marginal means λ 3 win draw loss ratio Table 1: The effect on the predicted probabilities when ignoring the covariance; the BP increases the probability of a draw. Skellam s distribution (1) For any pair of variables (X, Y )thatcanbewrittenas X = W 1 + W 3, Y = W 2 + W 3 with W 1 P oisson(λ 1 ), W 2 P oisson(λ 2 ) and W 3 any distribution with parameters θ 3 then Z = X Y PD(λ 1,λ 2 ) (Poisson difference or Skellam s distribution with parameters λ 1 and λ 2 ). Karlis & Ntzoufras: Bayesian modelling of football outcomes 10 Karlis & Ntzoufras: Bayesian modelling of football outcomes 12 Skellam s distribution (2) Diagonal Inflated Bivariate Poisson Model Under this approach a diagonal inflated model is specified by (1 p)bp(x, y λ 1,λ 2,λ 3 ), x y P D (x, y) = (1 p)bp(x, y λ 1,λ 2,λ 3 )+pd(x, θ), x = y, where D(x, θ) is discrete distribution with parameter vector θ. Such models can be fitted using the EM algorithm. Important: diagonal inflation improves in several aspects: better draw prediction, overdipsersed marginals, introduce correlation (4) Z = X Y PD(λ 1,λ 2 ) Mean E(Z) = λ 1 λ 2 Poisson difference or Skellam s distribution with Variance Var(Z) = λ 1 + λ 2 and density function ( ) z/2 f PD (z λ, λ 2 )=P(Z = z λ 1,λ 2 )=e (λ1+λ2) λ1 I z (2 ) λ 1 λ 2. (5) λ 2 for all z Z,whereI r (x) is the modified Bessel function of order r ( ) k ( x ) r x 2 4 I r (x) = 2 k!γ(r + k +1). k=0 (see Abramowitz and Stegun, 1974, pp. 375).
4 Karlis & Ntzoufras: Bayesian modelling of football outcomes 13 Karlis & Ntzoufras: Bayesian modelling of football outcomes 15 Prob lambda1 = 1,lambda2= x Prob lambda1 = 1,lambda2= x (see, Karlis and Ntzoufras,2006, SIM) Bayesian approach Used Discrete differences of dental data to test for differences of means between two repeated measurements test for zero inflated components Prob lambda1 = 1,lambda2= Prob lambda1 = 2,lambda2= using the Bayes Factor approach (and RJMCMC to estimate it). The same approach can be used to quantify (using the Bayes factor) Used Discrete differences of dental data to the importance of home effect (by testing for the equality of expected home and away goals) and the excess of draws (by testing the importance of the zero inflated component). x x Karlis & Ntzoufras: Bayesian modelling of football outcomes 14 Advantages Removes additive covariance (do not need to model it) Has an Poisson latent variable interpretation Does not assumes Poisson marginals (i.e. one may assume an overdispersed marginal distribution) Easy interpretation Simpler Model set-up than the corresponding Biv. Poisson model Discards part of the information Disadvantages Cannot model covariance and final full score (only differences). Karlis & Ntzoufras: Bayesian modelling of football outcomes 16 Skellam s Distribution for Football Scores Response variable: Z = X Y the goal difference in each game. Same structure for parameters λ 1 and λ 2 as in Bivariate Poisson: log(λ 1i ) = µ + H + A HTi + D ATi (6) log(λ 2i ) = µ + A ATi + D HTi (7) µ: constant; H: home effect; A k, D k attacking and defensive parameters of team k; HT i, AT i home and away team in i game. Use the zero inflated variation of Skellam s distribution to model the excess of draws. Hence we define the zero inflated Poisson Difference (ZPD) distribution as f ZPD (0 p, λ 1,λ 2 )= p + (1 p)f PD (0 λ 1,λ 2 ) and f ZPD (z p, λ 1,λ 2 )= (1 p)f PD (z λ 1,λ 2 ), (8) for z Z\{0}; wherep (0, 1) and f PD (z λ 1,λ 2 ) is given by (5).
5 Karlis & Ntzoufras: Bayesian modelling of football outcomes 17 Karlis & Ntzoufras: Bayesian modelling of football outcomes 19 Posterior Distributions The key element for building an MCMC algorithm for the above models is to use augmented data Sample latent data w 1i and w 2i from f(w 1i,w 2i z i = w 1i w 2i,λ 1i,λ 2i ) λw1i w 1i! where I(A) =1ifA is true and zero otherwise. Sample [δ i z i,λ 1i,λ 2i ] Bernoulli( p i )from p p i = p +(1 p)f PD (z i λ 1i,λ 2i ) [δ i indicates the mixing (zero inflated or PD) component] λ w2i 2i w 2i! I(z i = w 1i w 2i ) Then we MCMC algorithm for model parameters is similar to the one used in Poisson regression models. We can additionally model p as in logistic regression models Fulham (38) Watford (29) Chelsea (64) Arsenal (63) Tottenham (57) Everton (52) Bolton (47) Reading (52) Portsmouth (45) Blackburn (52) Aston Villa (43) Middlesbrough (44) Newcastle (38) Man City (29) West Ham (35) Wigan (37) Sheff Utd (32) Charlton (34) Man Utd (83) Liverpool (57) % Posterior intervals for attacking coefficients (A i ) Karlis & Ntzoufras: Bayesian modelling of football outcomes 18 Karlis & Ntzoufras: Bayesian modelling of football outcomes 20 Example: Premiership season data Data from the premiership for the season are used to fit the PD and ZPD models. Results using PD Posterior summaries for µ and home effect Min. 1st Qu. Median Mean 3rd Qu. Max. s.d. mu home Man Utd (27) Chelsea (24) Liverpool (27) Arsenal (35) Tottenham (54) Everton (36) Bolton (52) Reading (47) Portsmouth (42) Blackburn (54) Aston Villa (41) Middlesbrough (49) Newcastle (47) Man City (44) West Ham (59) Fulham (60) Wigan (59) Sheff Utd (55) Charlton (60) Watford (59) % Posterior intervals for defensive coefficients [( 1) D i ]
6 Karlis & Ntzoufras: Bayesian modelling of football outcomes 21 Karlis & Ntzoufras: Bayesian modelling of football outcomes 23 Number of games Goal difference 95% Posterior intervals for predicted differences (in red=observed differences; - indicates the posterior median) Karlis & Ntzoufras: Bayesian modelling of football outcomes 22 Karlis & Ntzoufras: Bayesian modelling of football outcomes 24 Posterior Predictive Table Observed Final Table Pred.(Obs.) Post. Expectations Obs. Obs. values Rank Team Pts G.Dif. Rank Team Pts G.Dif. 1 (1) Man Utd ManUtd (2) Chelsea Chelsea (3) Arsenal Liverpool (3) Liverpool Arsenal (6) Everton Tottenham (8) Reading Everton (5) Tottenham Bolton (9) Portsmouth Reading (10) Blackburn Portsmouth (11) Aston Villa Blackburn (12) Middlesbrough Aston Villa (7) Bolton Middlesbrough (13) Newcastle Newcastle (14) Man City Man City (15) West Ham West Ham (17) Wigan Fulham (18) Sheff Utd Wigan (19) Charlton Sheff Utd (16) Fulham Charlton (20) Watford Watford Deviations between observed and predictive measures Comparison Deviation 1. Relative Frequencies (counts/games) 1.06% 2. Frequencies (counts) Relative Frequencies of win/draw/lose 2.20% 4. Frequencies of win/draw/lose Expected points Expected goal difference 0.28 Deviation = 1 K ( E(Q Pred i y) Q obs i K where K is the lengths of vector Q (K = 13 for comparisons 1-4, K =20for comparisons 5-6). i=1 ) 2
7 Karlis & Ntzoufras: Bayesian modelling of football outcomes 25 Results using ZPD Posterior summaries for µ and home effect Min. 1st Qu. Median Mean 3rd Qu. Max. s.d. mu home p The posterior distribution of p is close to zero Small deviations between µ for PD and ZPD Home effect is similar in both models Results from PD Min. 1st Qu. Median Mean 3rd Qu. Max. s.d. mu home Karlis & Ntzoufras: Bayesian modelling of football outcomes 27 Deviations between observed and predictive measures Deviation Comparison PD ZPD 1. Relative Frequencies (counts/games) 1.06% 1.32% 2. Frequencies (counts) Relative Frequencies of win/draw/lose 2.20% 2.80% 4. Frequencies of win/draw/lose % 5. Expected points Expected goal difference Comments Zero inflation component does not improves the performance (as expected) Prediction of goal differences for ZPD are worse than the corresponding ones for PD Nevertheless, differences in the final rankings are minimal Karlis & Ntzoufras: Bayesian modelling of football outcomes 26 Karlis & Ntzoufras: Bayesian modelling of football outcomes 28 Number of games Goal difference 95% Posterior intervals for predicted differences (in red=observed differences; - indicates the posterior median) Comment: No major differences between the two models are observed. ZPD PD Discussion and further research Although in the simple and bivariate Poisson model we underestimate draws here using PD we have overestimated draws. No zero inflation is needed. We might need to add a component which will reduce the predicted draws. Can covariates on p improve the model? Apply Bayesian variable selection and Bayesian model averaging techniques. Apply other distributions defined on the same range.
8 Karlis & Ntzoufras: Bayesian modelling of football outcomes 29 Our Related Publications 1. Karlis D. and Ntzoufras, I. (2006). Bayesian analysis of the differences of count data. Statistics in Medicine, 25, Karlis, D. and Ntzoufras, I. (2005). Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R. Journal of Statistical Software, Volume 10, Issue Karlis, D. and Ntzoufras, I. (2003). Analysis of Sports Data Using Bivariate Poisson Models. Journal of the Royal Statistical Society, D,(Statistician),52, Karlis, D. and Ntzoufras, I. (2000). On Modelling Soccer Data. Student, 3, Karlis & Ntzoufras: Bayesian modelling of football outcomes 30 Additional References Lee, A.J. (1997). Modeling scores in the Premier League: Is Manchester United really the best? Chance, 10, Dixon, M.J. and Coles, S.G. (1997). Modelling association football scores and inefficiencies in football betting market. Applied Statistics, 46,
Bayesian Analysis of Bivariate Count Data
Karlis and Ntzoufras: Bayesian Analysis of Bivariate Count Data 1 Bayesian Analysis of Bivariate Count Data Dimitris Karlis and Ioannis Ntzoufras, Department of Statistics, Athens University of Economics
More informationAnalysis of sports data by using bivariate Poisson models
The Statistician (2003) 52, Part 3, pp. 381 393 Analysis of sports data by using bivariate Poisson models Dimitris Karlis Athens University of Economics and Business, Greece and Ioannis Ntzoufras University
More informationStatistical Properties of indices of competitive balance
Statistical Properties of indices of competitive balance Dimitris Karlis, Vasilis Manasis and Ioannis Ntzoufras Department of Statistics, Athens University of Economics & Business AUEB sports Analytics
More informationAdvanced Computational Methods in Statistics: Lecture 5 Sequential Monte Carlo/Particle Filtering
Advanced Computational Methods in Statistics: Lecture 5 Sequential Monte Carlo/Particle Filtering Axel Gandy Department of Mathematics Imperial College London http://www2.imperial.ac.uk/~agandy London
More informationA Bivariate Weibull Count Model for Forecasting Association Football Scores
A Bivariate Weibull Count Model for Forecasting Association Football Scores Georgi Boshnakov 1, Tarak Kharrat 1,2, and Ian G. McHale 2 1 School of Mathematics, University of Manchester, UK. 2 Centre for
More informationPetr Volf. Model for Difference of Two Series of Poisson-like Count Data
Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like
More informationComposite Poisson Models For Goal Scoring
Swarthmore College Works Mathematics & Statistics Faculty Works Mathematics & Statistics 4-1-2008 Composite Poisson Models For Goal Scoring Philip J. Everson Swarthmore College, peverso1@swarthmore.edu
More informationOn the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA EURO 2016
On the Dependency of Soccer Scores - A Sparse Bivariate Poisson Model for the UEFA EURO 2016 A. Groll & A. Mayr & T. Kneib & G. Schauberger Department of Statistics, Georg-August-University Göttingen MathSport
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationarxiv: v1 [physics.soc-ph] 24 Sep 2009
Final version is published in: Journal of Applied Statistics Vol. 36, No. 10, October 2009, 1087 1095 RESEARCH ARTICLE Are soccer matches badly designed experiments? G. K. Skinner a and G. H. Freeman b
More informationA penalty criterion for score forecasting in soccer
1 A penalty criterion for score forecasting in soccer Jean-Louis oulley (1), Gilles Celeux () 1. Institut Montpellierain Alexander Grothendieck, rance, foulleyjl@gmail.com. INRIA-Select, Institut de mathématiques
More informationBivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R
1 Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R Dimitris Karlis and Ioannis Ntzoufras Department of Statistics, Athens University of Economics and Business, 76, Patission
More informationLecture 3: Sports rating models
Lecture 3: Sports rating models David Aldous January 27, 2016 Sports are a popular topic for course projects usually involving details of some specific sport and statistical analysis of data. In this lecture
More informationBayesian Networks in Educational Assessment
Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior
More informationJournal of Statistical Software
JSS Journal of Statistical Software September 2005, Volume 14, Issue 10. http://www.jstatsoft.org/ Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R Dimitris Karlis Athens
More informationLORENZ A. GILCH AND SEBASTIAN MÜLLER
ON ELO BASED PREDICTION MODELS FOR THE FIFA WORLDCUP 2018 LORENZ A. GILCH AND SEBASTIAN MÜLLER arxiv:1806.01930v1 [stat.ap] 5 Jun 2018 Abstract. We propose an approach for the analysis and prediction of
More informationDependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.
Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,
More informationTopic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.
Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick
More informationA market-based algorithm for predicting soccer outcomes
A market-based algorithm for predicting soccer outcomes Víctor Hernández García Universidad Complutense de Madrid & FEDEA June 2017 Abstract The main objective of this study is to develop a market-based
More informationThis does not cover everything on the final. Look at the posted practice problems for other topics.
Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry
More informationCS37300 Class Notes. Jennifer Neville, Sebastian Moreno, Bruno Ribeiro
CS37300 Class Notes Jennifer Neville, Sebastian Moreno, Bruno Ribeiro 2 Background on Probability and Statistics These are basic definitions, concepts, and equations that should have been covered in your
More informationBusiness Statistics: 41000
Business Statistics: 41000 Week 3: Learning from Data: Estimation Nick Polson The University of Chicago Booth School of Business http://faculty.chicagobooth.edu/nicholas.polson/teaching/41000/ Suggested
More informationClass 26: review for final exam 18.05, Spring 2014
Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event
More informationPREDICTING AND RETROSPECTIVE ANALYSIS OF SOCCER MATCHES IN A LEAGUE
PREDICTING ND RETROSPECTIVE NLYSIS OF SOCCER MTCHES IN LEGUE HÅVRD RUE ND ØYVIND SLVESEN DEPRTMENT OF MTHEMTICL SCIENCES NTNU, NORWY FIRST VERSION: OCTOBER 1997 REVISED: UGUST 1998 BSTRCT common discussion
More informationSix Examples of Fits Using the Poisson. Negative Binomial. and Poisson Inverse Gaussian Distributions
APPENDIX A APPLICA TrONS IN OTHER DISCIPLINES Six Examples of Fits Using the Poisson. Negative Binomial. and Poisson Inverse Gaussian Distributions Poisson and mixed Poisson distributions have been extensively
More informationDiscrete Choice Modeling
[Part 6] 1/55 0 Introduction 1 Summary 2 Binary Choice 3 Panel Data 4 Bivariate Probit 5 Ordered Choice 6 7 Multinomial Choice 8 Nested Logit 9 Heterogeneity 10 Latent Class 11 Mixed Logit 12 Stated Preference
More informationSwarthmore Honors Exam 2012: Statistics
Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationTwelfth Problem Assignment
EECS 401 Not Graded PROBLEM 1 Let X 1, X 2,... be a sequence of independent random variables that are uniformly distributed between 0 and 1. Consider a sequence defined by (a) Y n = max(x 1, X 2,..., X
More informationApproximating the Conway-Maxwell-Poisson normalizing constant
Filomat 30:4 016, 953 960 DOI 10.98/FIL1604953S Published by Faculty of Sciences and Mathematics, University of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat Approximating the Conway-Maxwell-Poisson
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationVariance reduction. Michel Bierlaire. Transport and Mobility Laboratory. Variance reduction p. 1/18
Variance reduction p. 1/18 Variance reduction Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory Variance reduction p. 2/18 Example Use simulation to compute I = 1 0 e x dx We
More informationECE 302, Final 3:20-5:20pm Mon. May 1, WTHR 160 or WTHR 172.
ECE 302, Final 3:20-5:20pm Mon. May 1, WTHR 160 or WTHR 172. 1. Enter your name, student ID number, e-mail address, and signature in the space provided on this page, NOW! 2. This is a closed book exam.
More information(3) Review of Probability. ST440/540: Applied Bayesian Statistics
Review of probability The crux of Bayesian statistics is to compute the posterior distribution, i.e., the uncertainty distribution of the parameters (θ) after observing the data (Y) This is the conditional
More informationCourse: ESO-209 Home Work: 1 Instructor: Debasis Kundu
Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear
More informationClass 8 Review Problems solutions, 18.05, Spring 2014
Class 8 Review Problems solutions, 8.5, Spring 4 Counting and Probability. (a) Create an arrangement in stages and count the number of possibilities at each stage: ( ) Stage : Choose three of the slots
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationBayes Model Selection with Path Sampling: Factor Models
with Path Sampling: Factor Models Ritabrata Dutta and Jayanta K Ghosh Purdue University 07/02/11 Factor Models in Applications Factor Models in Applications Factor Models Factor Models and Factor analysis
More informationOn the bivariate Skellam distribution. hal , version 1-22 Oct Noname manuscript No. (will be inserted by the editor)
Noname manuscript No. (will be inserted by the editor) On the bivariate Skellam distribution Jan Bulla Christophe Chesneau Maher Kachour Submitted to Journal of Multivariate Analysis: 22 October 2012 Abstract
More informationOnline Appendix to: Marijuana on Main Street? Estimating Demand in Markets with Limited Access
Online Appendix to: Marijuana on Main Street? Estating Demand in Markets with Lited Access By Liana Jacobi and Michelle Sovinsky This appendix provides details on the estation methodology for various speci
More informationMohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago
Mohammed Research in Pharmacoepidemiology (RIPE) @ National School of Pharmacy, University of Otago What is zero inflation? Suppose you want to study hippos and the effect of habitat variables on their
More informationUnit 2. Describing Data: Numerical
Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient
More informationMODEL BASED CLUSTERING FOR COUNT DATA
MODEL BASED CLUSTERING FOR COUNT DATA Dimitris Karlis Department of Statistics Athens University of Economics and Business, Athens April OUTLINE Clustering methods Model based clustering!"the general model!"algorithmic
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationObnoxious lateness humor
Obnoxious lateness humor 1 Using Bayesian Model Averaging For Addressing Model Uncertainty in Environmental Risk Assessment Louise Ryan and Melissa Whitney Department of Biostatistics Harvard School of
More informationLECTURE 1. Introduction to Econometrics
LECTURE 1 Introduction to Econometrics Ján Palguta September 20, 2016 1 / 29 WHAT IS ECONOMETRICS? To beginning students, it may seem as if econometrics is an overly complex obstacle to an otherwise useful
More informationBusiness Statistics 41000: Homework # 2 Solutions
Business Statistics 4000: Homework # 2 Solutions Drew Creal February 9, 204 Question #. Discrete Random Variables and Their Distributions (a) The probabilities have to sum to, which means that 0. + 0.3
More informationWhere would you rather live? (And why?)
Where would you rather live? (And why?) CityA CityB Where would you rather live? (And why?) CityA CityB City A is San Diego, CA, and City B is Evansville, IN Measures of Dispersion Suppose you need a new
More informationApproximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs
Approximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs Aleksandar Dimitriev Erik Štrumbelj Abstract We present a novel approach to binary and ordinal
More informationEstimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith
Estimation of Copula Models with Discrete Margins (via Bayesian Data Augmentation) Michael S. Smith Melbourne Business School, University of Melbourne (Joint with Mohamad Khaled, University of Queensland)
More informationThis paper is not to be removed from the Examination Halls
~~ST104B ZA d0 This paper is not to be removed from the Examination Halls UNIVERSITY OF LONDON ST104B ZB BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the Social Sciences,
More informationMultivariate Normal & Wishart
Multivariate Normal & Wishart Hoff Chapter 7 October 21, 2010 Reading Comprehesion Example Twenty-two children are given a reading comprehsion test before and after receiving a particular instruction method.
More informationLecture Notes for BUSINESS STATISTICS - BMGT 571. Chapters 1 through 6. Professor Ahmadi, Ph.D. Department of Management
Lecture Notes for BUSINESS STATISTICS - BMGT 571 Chapters 1 through 6 Professor Ahmadi, Ph.D. Department of Management Revised May 005 Glossary of Terms: Statistics Chapter 1 Data Data Set Elements Variable
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationIncorporating cost in Bayesian variable selection, with application to cost-effective measurement of quality of health care
Incorporating cost in Bayesian variable selection, with application to cost-effective measurement of quality of health care Dimitris Fouskakis, Department of Mathematics, School of Applied Mathematical
More informationEXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY
EXAMINATIONS OF THE HONG KONG STATISTICAL SOCIETY HIGHER CERTIFICATE IN STATISTICS, 2013 MODULE 5 : Further probability and inference Time allowed: One and a half hours Candidates should answer THREE questions.
More informationChapter 5. Bayesian Statistics
Chapter 5. Bayesian Statistics Principles of Bayesian Statistics Anything unknown is given a probability distribution, representing degrees of belief [subjective probability]. Degrees of belief [subjective
More informationCIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions
CIS 390 Fall 2016 Robotics: Planning and Perception Final Review Questions December 14, 2016 Questions Throughout the following questions we will assume that x t is the state vector at time t, z t is the
More informationPart IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015
Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More informationBayesian inference for factor scores
Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor
More informationOptimization and Simulation
Optimization and Simulation Variance reduction Michel Bierlaire Transport and Mobility Laboratory School of Architecture, Civil and Environmental Engineering Ecole Polytechnique Fédérale de Lausanne M.
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationSystematic strategies for real time filtering of turbulent signals in complex systems
Systematic strategies for real time filtering of turbulent signals in complex systems Statistical inversion theory for Gaussian random variables The Kalman Filter for Vector Systems: Reduced Filters and
More informationChapter 5. Chapter 5 sections
1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions
More informationDependence. MFM Practitioner Module: Risk & Asset Allocation. John Dodson. September 11, Dependence. John Dodson. Outline.
MFM Practitioner Module: Risk & Asset Allocation September 11, 2013 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y
More informationExam 1 - Math Solutions
Exam 1 - Math 3200 - Solutions Spring 2013 1. Without actually expanding, find the coefficient of x y 2 z 3 in the expansion of (2x y z) 6. (A) 120 (B) 60 (C) 30 (D) 20 (E) 10 (F) 10 (G) 20 (H) 30 (I)
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationLecture 1: Probability Fundamentals
Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability
More informationMultivariate Generalized Linear Mixed Models for Joint Estimation of Sporting Outcomes
arxiv:1710.05284v1 [stat.ap] 15 Oct 2017 Multivariate Generalized Linear Mixed Models for Joint Estimation of Sporting Outcomes Jennifer E. Broatch Andrew T. Karl Abstract This paper explores improvements
More informationVojtěch Herrmann. Favoritism Under Social Pressure: Evidence From English Premier League
BACHELOR THESIS Vojtěch Herrmann Favoritism Under Social Pressure: Evidence From English Premier League Department of Probability and Mathematical Statistics Supervisor of the bachelor thesis: Study programme:
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationHidden Markov models for time series of counts with excess zeros
Hidden Markov models for time series of counts with excess zeros Madalina Olteanu and James Ridgway University Paris 1 Pantheon-Sorbonne - SAMM, EA4543 90 Rue de Tolbiac, 75013 Paris - France Abstract.
More informationA strategy for modelling count data which may have extra zeros
A strategy for modelling count data which may have extra zeros Alan Welsh Centre for Mathematics and its Applications Australian National University The Data Response is the number of Leadbeater s possum
More informationarxiv: v2 [physics.data-an] 3 Mar 2010
Soccer: is scoring goals a predictable Poissonian process? A. Heuer, 1 C. Müller, 1,2 and O. Rubner 1 1 Westfälische Wilhelms Universität Münster, Institut für physikalische Chemie, Corrensstr. 3, 48149
More informationLecture 3: Sports rating models
Lecture 3: Sports rating models David Aldous August 31, 2017 Sports are a popular topic for course projects usually involving details of some specific sport and statistical analysis of data. In this lecture
More informationGauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA
JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter
More informationForestry experiment and dose-response modelling. In the News: High water mark: the rise in sea levels may be accelerating Economist, Jan 17
Today HW 1: due February 4, 11.59 pm. Regression with count data Forestry experiment and dose-response modelling In the News: High water mark: the rise in sea levels may be accelerating Economist, Jan
More information36-463/663: Multilevel & Hierarchical Models
36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationApproximate Inference in Practice Microsoft s Xbox TrueSkill TM
1 Approximate Inference in Practice Microsoft s Xbox TrueSkill TM Daniel Hernández-Lobato Universidad Autónoma de Madrid 2014 2 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference
More informationMATH 3510: PROBABILITY AND STATS June 15, 2011 MIDTERM EXAM
MATH 3510: PROBABILITY AND STATS June 15, 2011 MIDTERM EXAM YOUR NAME: KEY: Answers in Blue Show all your work. Answers out of the blue and without any supporting work may receive no credit even if they
More information13.1 Causal effects with continuous mediator and. predictors in their equations. The definitions for the direct, total indirect,
13 Appendix 13.1 Causal effects with continuous mediator and continuous outcome Consider the model of Section 3, y i = β 0 + β 1 m i + β 2 x i + β 3 x i m i + β 4 c i + ɛ 1i, (49) m i = γ 0 + γ 1 x i +
More informationLecture 25: Review. Statistics 104. April 23, Colin Rundel
Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April
More informationIntroduction to Probability 2017/18 Supplementary Problems
Introduction to Probability 2017/18 Supplementary Problems Problem 1: Let A and B denote two events with P(A B) 0. Show that P(A) 0 and P(B) 0. A A B implies P(A) P(A B) 0, hence P(A) 0. Similarly B A
More informationApproaches for Multiple Disease Mapping: MCAR and SANOVA
Approaches for Multiple Disease Mapping: MCAR and SANOVA Dipankar Bandyopadhyay Division of Biostatistics, University of Minnesota SPH April 22, 2015 1 Adapted from Sudipto Banerjee s notes SANOVA vs MCAR
More informationLinear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments
Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments by Gordon K. Smyth (as interpreted by Aaron J. Baraff) STAT 572 Intro Talk April 10, 2014 Microarray
More informationR-squared for Bayesian regression models
R-squared for Bayesian regression models Andrew Gelman Ben Goodrich Jonah Gabry Imad Ali 8 Nov 2017 Abstract The usual definition of R 2 (variance of the predicted values divided by the variance of the
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More information45 6 = ( )/6! = 20 6 = selected subsets: 6 = and 10
Homework 1. Selected Solutions, 9/12/03. Ch. 2, # 34, p. 74. The most natural sample space S is the set of unordered -tuples, i.e., the set of -element subsets of the list of 45 workers (labelled 1,...,
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School
More informationCSC 411: Lecture 09: Naive Bayes
CSC 411: Lecture 09: Naive Bayes Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 8, 2015 Urtasun, Zemel, Fidler (UofT) CSC 411: 09-Naive Bayes Feb 8, 2015 1
More informationOnline Appendix: Bayesian versus maximum likelihood estimation of treatment effects in bivariate probit instrumental variable models
Online Appendix: Bayesian versus maximum likelihood estimation of treatment effects in bivariate probit instrumental variable models A. STAN CODE // STAN code for Bayesian bivariate model // based on code
More informationBayes: All uncertainty is described using probability.
Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationSTAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
More information