Foundations of Statistical Inference


Julien Berestycki, Department of Statistics, University of Oxford, MT 2015

Lecture 16: Bayesian analysis of contingency tables. Bayesian linear regression.

Example 2 × 2

From the Wikipedia article on contingency tables:

              Left-handed   Right-handed   Total
Male          9 (y_1)       43             52
Female        4 (y_2)       44             48
Total         13            87             100

Hypothesis: θ_1 = proportion of left-handed men > θ_2 = proportion of left-handed women.

Model: y_1 ~ Binom(n_1, θ_1), y_2 ~ Binom(n_2, θ_2), independently.

Use uniform priors θ_i ~ U[0,1] = Beta(1, 1). The posteriors are then

p(θ_1 | y_1, n_1) = Beta(y_1 + 1, n_1 - y_1 + 1),   p(θ_2 | y_2, n_2) = Beta(y_2 + 1, n_2 - y_2 + 1).

Then compute the posterior probability P(θ_1 > θ_2), either by computing an integral or by simulation.

Example 2 × 2: simulations

See R code. Generate M samples from the joint posterior

p(θ_1, θ_2 | y_1, n_1, y_2, n_2) = p(θ_1 | y_1, n_1) p(θ_2 | y_2, n_2)

and then use the Monte Carlo approximation

P[\theta_1 > \theta_2] \approx \frac{1}{M} \sum_{i=1}^{M} I(\theta_1^{(i)} > \theta_2^{(i)})

Output with M = 10000: posterior quantiles of θ_1 - θ_2

 2.5%     50%   97.5%
-0.046   0.083   0.218

print(mean(theta1 > theta2))
[1] 0.8997

[Figure: histogram of the posterior density of θ_1 - θ_2, titled "Posterior Simulation of Male - Female Lefties".]
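The R code referred to on the slide is not reproduced in this transcription; a minimal sketch consistent with the output above might look as follows (the seed is my own addition):

set.seed(1)                          # reproducibility; not part of the original output
M  <- 10000                          # number of posterior draws
y1 <- 9;  n1 <- 52                   # left-handed / total men
y2 <- 4;  n2 <- 48                   # left-handed / total women

# Independent Beta posteriors under uniform Beta(1, 1) priors
theta1 <- rbeta(M, y1 + 1, n1 - y1 + 1)
theta2 <- rbeta(M, y2 + 1, n2 - y2 + 1)

# Monte Carlo estimate of P(theta1 > theta2 | y) and quantiles of the difference
print(mean(theta1 > theta2))
print(quantile(theta1 - theta2, c(0.025, 0.5, 0.975)))
hist(theta1 - theta2, freq = FALSE,
     main = "Posterior Simulation of Male - Female Lefties")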

Contingency table analysis

North Carolina State University data. EC = extra-curricular activities, in hours per week.

              EC < 2   2 to 12   > 12
C or better   11       68        3
D or F        9        23        5

Let y = (y_ij) be the matrix of counts.

Frequentist analysis

Usual χ² test from R: Pearson's chi-squared test. First sum rows and columns (y_ij is the count in cell (i, j)):

              < 2    2 to 12   > 12   total
C or better   11     68        3      82
D or F        9      23        5      37
total         20     91        8      119

Expected counts E_ij = r_i c_j / N:

              < 2    2 to 12   > 12   total
C or better   13.8   62.7      5.51   82
D or F        6.22   28.3      2.49   37
total         20     91        8      119

Cell contributions to χ² = Σ_ij (y_ij - E_ij)² / E_ij:

              < 2    2 to 12   > 12
C or better   0.56   0.45      1.15
D or F        1.24   0.99      2.54
                                      total: 6.92

X-squared = 6.9264, df = (2 - 1)(3 - 1) = 2, p-value = 0.03133

The p-value of 0.03133 is evidence that grades are related to time spent on extra-curricular activities.
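In R this is a one-liner; a sketch reproducing the numbers above (chisq.test is base R):

y <- matrix(c(11, 68, 3,
              9, 23, 5), nrow = 2, byrow = TRUE)
chisq.test(y)            # X-squared = 6.9264, df = 2, p-value = 0.03133
chisq.test(y)$expected   # the table of expected counts E_ij above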

Bayesian analysis

              EC < 2   2 to 12   > 12
C or better   p_11     p_12      p_13
D or F        p_21     p_22      p_23

Let p = {p_11, ..., p_23}. The model is that y = (y_11, ..., y_23) is multinomial(N, p), i.e. N trials with P(X_k = (i, j)) = p_ij and y_ij = #{k : X_k = (i, j)}.

Bayesian method: make p a random variable. Consider two models:

M_I: the two categorical variables are independent, i.e. p_ij = p_i· p_·j, equivalently the rows are proportional: (p_11, p_12, p_13) ∝ (p_21, p_22, p_23);
M_D: the two categorical variables are dependent, i.e. the rows are not proportional.

The Bayes factor is

BF = \frac{P(y | M_D)}{P(y | M_I)}

The Dirichlet distribution

Dirichlet integral:

\int_{z_1 + \cdots + z_k = 1} z_1^{\nu_1 - 1} \cdots z_k^{\nu_k - 1} \, dz_1 \cdots dz_k = \frac{\Gamma(\nu_1) \cdots \Gamma(\nu_k)}{\Gamma(\sum_i \nu_i)}

Dirichlet distribution:

\frac{\Gamma(\sum_i \nu_i)}{\Gamma(\nu_1) \cdots \Gamma(\nu_k)} \, z_1^{\nu_1 - 1} \cdots z_k^{\nu_k - 1}, \qquad z_1 + \cdots + z_k = 1

The means are E[Z_i] = \nu_i / \sum_j \nu_j, i = 1, ..., k.

A representation that makes the Dirichlet easy to simulate from is the following. Let W_1, ..., W_k be independent Gamma(ν_1, θ), ..., Gamma(ν_k, θ) random variables, let W = Σ_i W_i, and set Z_i = W_i / W, i = 1, ..., k. Then (Z_1, ..., Z_k) is Dirichlet(ν_1, ..., ν_k), whatever the value of θ.
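This representation is two lines of R; a sketch (rdirichlet here is a hypothetical helper name, not a base R function):

rdirichlet <- function(M, nu) {
  k <- length(nu)
  # Each row: k independent Gamma(nu_i, 1) draws, then normalise by the row sum
  w <- matrix(rgamma(M * k, shape = nu, rate = 1), nrow = M, byrow = TRUE)
  w / rowSums(w)
}

z <- rdirichlet(10000, c(2, 3, 5))
colMeans(z)     # close to nu / sum(nu) = (0.2, 0.3, 0.5)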

Examples of 3D Dirichlet distributions

[Figures: density plots of Dirichlet distributions on the 3-simplex for several choices of (ν_1, ν_2, ν_3).]

Calculating marginal likelihoods

Under M_D the likelihood f(y | p) is multinomial, so

P(y | M_D) = \int_p P(y | p) \, \pi(p) \, dp = \int_{p_{11} + \cdots + p_{23} = 1} c(y) \prod_{ij} p_{ij}^{y_{ij}} \, \pi(p) \, dp_{11} \cdots dp_{23}

where c(y) = (\sum_{ij} y_{ij})! / \prod_{ij} y_{ij}! is the multinomial coefficient.

Under M_D the rows (p_11, p_12, p_13) and (p_21, p_22, p_23) are not proportional, so choose a uniform distribution for p, i.e. Dirichlet(1, ..., 1):

\pi(p) = \Gamma(RC), \qquad p_{11} + \cdots + p_{23} = 1

Then, by the Dirichlet integral,

P(y | M_D) = c(y) \, \Gamma(RC) \int_{p_{11} + \cdots + p_{23} = 1} \prod_{ij} p_{ij}^{y_{ij}} \, dp_{11} \cdots dp_{23}
           = c(y) \, \Gamma(RC) \, \frac{\prod_{ij} \Gamma(y_{ij} + 1)}{\Gamma(\sum_{ij} y_{ij} + RC)}
           = c(y) \, \frac{D(y + 1)}{D(1_{RC})}

where

D(\nu) = \prod_i \Gamma(\nu_i) \big/ \Gamma(\textstyle\sum_i \nu_i)

and y + 1 denotes the matrix of counts with 1 added to all entries, 1_RC a vector of length RC with all entries equal to 1.

Calculating marginal likelihoods

Under M_I the cell probabilities are determined by the marginal probabilities p_r = {p_1·, p_2·} and p_c = {p_·1, p_·2, p_·3}:

              < 2     2 to 12   > 12
C or better   p_11    p_12      p_13    p_1·
D or F        p_21    p_22      p_23    p_2·
              p_·1    p_·2      p_·3

Under M_I we have a table where p_ij = p_i· p_·j. Under independence the priors for the row and column probabilities are independent uniform Dirichlet priors (with k = R = 2 and k = C = 3 respectively):

\pi(p_r) = \frac{\Gamma(R)}{\Gamma(1) \cdots \Gamma(1)} \, p_{1\cdot}^{1-1} \cdots p_{R\cdot}^{1-1} = \Gamma(R), \qquad \pi(p_c) = \frac{\Gamma(C)}{\Gamma(1) \cdots \Gamma(1)} \, p_{\cdot 1}^{1-1} \cdots p_{\cdot C}^{1-1} = \Gamma(C)

The marginal likelihood under M_I is therefore

P(y | M_I) = c(y) \int_{p_r} \int_{p_c} \prod_{ij} (p_{i\cdot} \, p_{\cdot j})^{y_{ij}} \, \pi(p_r) \, \pi(p_c) \, dp_r \, dp_c
           = c(y) \, \Gamma(R) \Gamma(C) \int_{p_r} \prod_i p_{i\cdot}^{y_{i\cdot}} \, dp_r \int_{p_c} \prod_j p_{\cdot j}^{y_{\cdot j}} \, dp_c
           = c(y) \, \Gamma(R) \Gamma(C) \, \frac{\prod_i \Gamma(y_{i\cdot} + 1)}{\Gamma(\sum y + R)} \, \frac{\prod_j \Gamma(y_{\cdot j} + 1)}{\Gamma(\sum y + C)}
           = c(y) \, \frac{D(y_R + 1) \, D(y_C + 1)}{D(1_R) \, D(1_C)}

where y_R and y_C denote the vectors of row and column sums.

Bayes factor

Combining the two marginal likelihoods we get the Bayes factor

BF = \frac{P(y | M_D)}{P(y | M_I)} = \frac{D(y + 1) \, D(1_R) \, D(1_C)}{D(1_{RC}) \, D(y_R + 1) \, D(y_C + 1)}

Our data are

11   68   3   |  82
 9   23   5   |  37
20   91   8   | 119

The Bayes factor is

BF = \frac{11! \, 68! \, 3! \, 9! \, 23! \, 5!}{124!} \cdot \frac{5! \, 120! \, 121!}{1! \, 2! \, 82! \, 37! \, 20! \, 91! \, 8!} = 1.66

which gives modest support against independence.
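A sketch of this computation in R, working on the log scale with lgamma to avoid overflow (the helper name logD is my own):

# log D(nu) = sum_i log Gamma(nu_i) - log Gamma(sum_i nu_i)
logD <- function(nu) sum(lgamma(nu)) - lgamma(sum(nu))

y <- matrix(c(11, 68, 3,
              9, 23, 5), nrow = 2, byrow = TRUE)
R <- nrow(y); C <- ncol(y)

logBF <- logD(c(y) + 1) + logD(rep(1, R)) + logD(rep(1, C)) -
         logD(rep(1, R * C)) - logD(rowSums(y) + 1) - logD(colSums(y) + 1)
exp(logBF)   # approximately 1.66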

Normal linear regression model

Model: response variable, an n × 1 vector Y = (y_1, ..., y_n)ᵀ; predictor variables, an n × p matrix X = (x_1, ..., x_p); and

Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I)

Recall that the classical unbiased estimates are

\hat\beta = (X^T X)^{-1} X^T Y, \qquad \hat\sigma^2 = \frac{(Y - X\hat\beta)^T (Y - X\hat\beta)}{n - p}

and the predicted Y is \hat Y = X\hat\beta = P_X Y, where P_X = X (X^T X)^{-1} X^T.
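A quick numerical check of these formulas against lm() on simulated data (the seed, dimensions, and true coefficients are illustrative assumptions, not from the lecture); the objects defined here are reused in the sketches below:

set.seed(2)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))        # design matrix with intercept
Y <- drop(X %*% c(1, 2, -1) + rnorm(n, sd = 0.5))   # true beta = (1, 2, -1)

beta_hat   <- drop(solve(crossprod(X), crossprod(X, Y)))  # (X'X)^{-1} X'Y
sigma2_hat <- sum((Y - X %*% beta_hat)^2) / (n - p)       # unbiased variance estimate

fit <- lm(Y ~ X - 1)                         # same model, fitted by lm()
all.equal(beta_hat, unname(coef(fit)))       # TRUE
all.equal(sigma2_hat, summary(fit)$sigma^2)  # TRUE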

Normal linear regression model

To sum up:

Y \mid \beta, \sigma^2, X \sim N_n(X\beta, \sigma^2 I)

Bayesian formulation: assume that (β, σ²) has the non-informative prior

g(\beta, \sigma^2) \propto \frac{1}{\sigma^2}

Posterior distribution

The posterior factorizes as

q(\beta, \sigma^2 \mid Y) = q(\beta \mid Y, \sigma^2) \, q(\sigma^2 \mid Y)

where

q(\sigma^2 \mid Y) \propto \frac{1}{(\sigma^2)^{(n-p)/2 + 1}} \exp\left\{ -\frac{(n-p)\hat\sigma^2}{2\sigma^2} \right\}, \qquad \text{i.e. } \sigma^2 \mid Y \sim IG\left( \frac{n-p}{2}, \frac{(n-p)\hat\sigma^2}{2} \right)

(recall that the Inverse Gamma(a, b) density is proportional to y^{-a-1} \exp\{-b/y\}), and

q(\beta \mid Y, \sigma^2) = N(\hat\beta, V_\beta \sigma^2), \qquad \hat\beta = (X^T X)^{-1} X^T Y, \quad V_\beta = (X^T X)^{-1}
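Because both conditionals are standard, exact Monte Carlo draws from this posterior are easy. A sketch, reusing X, Y, beta_hat and sigma2_hat from the simulated-data example above, and using the fact that IG((n-p)/2, (n-p)σ̂²/2) is the distribution of (n-p)σ̂² divided by a χ²_{n-p} draw:

M <- 5000
V <- solve(crossprod(X))          # V_beta = (X'X)^{-1}
U <- chol(V)                      # upper triangular, t(U) %*% U = V

sig2_draws <- (n - p) * sigma2_hat / rchisq(M, df = n - p)   # sigma^2 | Y
beta_draws <- t(sapply(sig2_draws, function(s2)
  beta_hat + sqrt(s2) * drop(rnorm(p) %*% U)))   # beta | Y, sigma^2 ~ N(beta_hat, s2 V)

# Marginal 95% posterior intervals for each beta_j
apply(beta_draws, 2, quantile, c(0.025, 0.975))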

Posterior

The posterior density comes from a classical factorization of the likelihood

\frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta) \right\}

using the decomposition

(y - X\beta)^T (y - X\beta) = (y - X\hat\beta)^T (y - X\hat\beta) + (\hat\beta - \beta)^T X^T X (\hat\beta - \beta)

Integrating out σ², P(β | Y) is a non-central multivariate t_{n-p} distribution, and for each j

\frac{\beta_j - \hat\beta_j}{s \sqrt{(X^T X)^{-1}_{jj}}} \sim t_{n-p}

where s² = σ̂².

Prediction

Given a new covariate matrix X̃, predict Ỹ:

p(\tilde Y \mid Y) = \int p(\tilde Y \mid \beta, \sigma^2) \, p(\beta, \sigma^2 \mid Y) \, d\beta \, d\sigma^2

Either simulate, or use the closed form: p(Ỹ | Y) is a multivariate t distribution,

\tilde Y \mid Y \sim t_{n-p}\left( \tilde X \hat\beta, \; \hat\sigma^2 \left( I + \tilde X (X^T X)^{-1} \tilde X^T \right) \right)
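The simulation route is immediate given the posterior draws above. A sketch continuing the running example, with Xtilde an illustrative choice of two new design points (my own, not from the lecture):

Xtilde <- cbind(1, matrix(rnorm(2 * (p - 1)), 2))   # two new covariate rows

# For each posterior draw (beta, sigma^2), draw Ytilde | beta, sigma^2
Ytilde <- sapply(seq_len(M), function(i)
  Xtilde %*% beta_draws[i, ] + rnorm(nrow(Xtilde), sd = sqrt(sig2_draws[i])))

# 95% posterior predictive intervals, one per new design point
apply(Ytilde, 1, quantile, c(0.025, 0.975))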