Foundations of Statistical Inference
Julien Berestycki, Department of Statistics, University of Oxford
SB2a, MT 2015
Lecture 16: Bayesian analysis of contingency tables. Bayesian linear regression.
Example 2 x 2

From the Wikipedia article on contingency tables:

              Left-handed   Right-handed   Total
Male          9 (y_1)       43             52
Female        4 (y_2)       44             48
Total         13            87             100

Hypothesis: θ_1 = proportion of left-handed men > θ_2 = proportion of left-handed women.

Model: y_1 ~ Binom(n_1, θ_1), y_2 ~ Binom(n_2, θ_2), independently.

Use uniform priors θ_i ~ U[0,1] = Beta(1, 1).

Posteriors: p(θ_1 | y_1, n_1) = Beta(y_1 + 1, n_1 − y_1 + 1), p(θ_2 | y_2, n_2) = Beta(y_2 + 1, n_2 − y_2 + 1).

Then compute the posterior probability P(θ_1 > θ_2), either by computing an integral or by simulation.
Example 2 x 2: simulations

See R code. Generate M samples from the joint posterior

p(θ_1, θ_2 | y_1, n_1, y_2, n_2) = p(θ_1 | y_1, n_1) p(θ_2 | y_2, n_2)

and then use the Monte Carlo approximation

P[θ_1 > θ_2] ≈ (1/M) Σ_{i=1}^M I(θ_1^(i) > θ_2^(i))

Output, M = 10000. Posterior simulation of male − female lefties; quantiles of θ_1 − θ_2:

  2.5%     50%   97.5%
−0.046   0.083   0.218

print(mean(theta1 > theta2))
[1] 0.8997

[Figure: histogram of the posterior of θ_1 − θ_2, p(θ_1 − θ_2 | y, n).]
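The Monte Carlo step above can be sketched in Python (the slides use R; this is a standard-library translation using `random.betavariate`, with the table's counts):

```python
import random

# Observed counts from the 2x2 table
y1, n1 = 9, 52   # left-handed men out of 52
y2, n2 = 4, 48   # left-handed women out of 48

random.seed(1)
M = 10000

# Draw from the independent Beta posteriors (uniform Beta(1,1) priors)
theta1 = [random.betavariate(y1 + 1, n1 - y1 + 1) for _ in range(M)]
theta2 = [random.betavariate(y2 + 1, n2 - y2 + 1) for _ in range(M)]

# Monte Carlo estimate of P(theta1 > theta2 | data)
p_gt = sum(t1 > t2 for t1, t2 in zip(theta1, theta2)) / M
print(p_gt)  # close to 0.90, as on the slide
```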
Contingency table analysis

North Carolina State University data. EC: extra-curricular activities in hours per week.

              EC < 2   2 to 12   > 12
C or better     11        68       3
D or F           9        23       5

Let y = (y_ij) be the matrix of counts.
Frequentist analysis

Usual χ² test from R: Pearson's chi-squared test. Here y_ij is the count in cell (i, j).

Sum rows and columns:

              < 2   2 to 12   > 12   total
C or better    11      68       3      82
D or F          9      23       5      37
total          20      91       8     119

Expected counts under independence, E_ij = r_i c_j / N:

              < 2   2 to 12   > 12   total
C or better  13.8     62.7    5.51     82
D or F       6.22     28.3    2.49     37
total          20       91       8    119

Cell contributions to χ² = Σ_ij (y_ij − E_ij)² / E_ij:

              < 2   2 to 12   > 12
C or better  0.56     0.45    1.15
D or F       1.24     0.99    2.54
total                         6.92

X-squared = 6.9264, df = (3 − 1)(2 − 1) = 2, p-value = 0.03133

The p-value is 0.03133, evidence that grades are related to time spent on extra-curricular activities.
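The test statistic is easy to reproduce by hand. A minimal Python sketch (for df = 2 the chi-squared survival function is exactly exp(−x/2), so no statistics library is needed):

```python
import math

# Observed counts: rows = grade (C or better, D or F), cols = EC hours
y = [[11, 68, 3],
     [9, 23, 5]]

rows = [sum(r) for r in y]                       # row totals: 82, 37
cols = [sum(r[j] for r in y) for j in range(3)]  # column totals: 20, 91, 8
N = sum(rows)                                    # 119

# Chi-squared statistic with expected counts E_ij = r_i * c_j / N
chi2 = sum((y[i][j] - rows[i] * cols[j] / N) ** 2 / (rows[i] * cols[j] / N)
           for i in range(2) for j in range(3))

# Survival function of chi-squared with df = 2 is exp(-x/2)
p_value = math.exp(-chi2 / 2)
print(round(chi2, 4), round(p_value, 5))  # 6.9264 0.03133
```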
Bayesian analysis

              EC < 2   2 to 12   > 12
C or better    p_11      p_12    p_13
D or F         p_21      p_22    p_23

Let p = {p_11, ..., p_23}. The model is that Y = (y_11, ..., y_23) is multinomial(N, p) (i.e. N trials with P(X_k = (i, j)) = p_ij and y_ij = #{k : X_k = (i, j)}).

Bayesian method: make p a random variable. Consider two models:

M_I: the two categorical variables are independent, i.e. the rows are proportional, (p_11, p_12, p_13) ∝ (p_21, p_22, p_23);
M_D: the two categorical variables are dependent, (p_11, p_12, p_13) ∝̸ (p_21, p_22, p_23).

The Bayes factor is BF = P(y | M_D) / P(y | M_I).
The Dirichlet distribution

Dirichlet integral:

∫_{z_1+⋯+z_k=1} z_1^{ν_1−1} ⋯ z_k^{ν_k−1} dz_1 ⋯ dz_k = Γ(ν_1) ⋯ Γ(ν_k) / Γ(Σ_i ν_i)

Dirichlet density:

[Γ(Σ_i ν_i) / (Γ(ν_1) ⋯ Γ(ν_k))] z_1^{ν_1−1} ⋯ z_k^{ν_k−1},   z_1 + ⋯ + z_k = 1

The means are E[Z_i] = ν_i / Σ_j ν_j, i = 1, ..., k.

A representation that makes the Dirichlet easy to simulate from is the following. Let W_1, ..., W_k be independent Gamma(ν_1, θ), ..., Gamma(ν_k, θ) random variables, set W = Σ_i W_i and Z_i = W_i / W, i = 1, ..., k. Then (Z_1, ..., Z_k) is Dirichlet(ν_1, ..., ν_k); the distribution does not depend on θ.
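A minimal Python sketch of this gamma representation, using only the standard library:

```python
import random

def rdirichlet(nu):
    """Draw one Dirichlet(nu) vector via independent gamma variables."""
    # random.gammavariate(alpha, beta) has shape alpha and scale beta;
    # the scale theta cancels in the ratio Z_i = W_i / W, so any fixed value works.
    w = [random.gammavariate(a, 1.0) for a in nu]
    total = sum(w)
    return [wi / total for wi in w]

random.seed(0)
z = rdirichlet([1, 1, 1, 1, 1, 1])  # Dirichlet(1,...,1): uniform on the simplex
print(z, sum(z))
```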
Examples of 3D Dirichlet distributions

[Figure slide: density plots omitted.]
Calculating marginal likelihoods

Under M_D the marginal likelihood is

P(y | M_D) = ∫_p P(y | p) π(p) dp
           = ∫_{p_11+⋯+p_23=1} (y choose y) ∏_ij p_ij^{y_ij} π(p) dp_11 ⋯ dp_23

where the multinomial coefficient is

(y choose y) = (Σ y_ij)! / ∏ y_ij!

Under M_D, (p_11, p_12, p_13) ∝̸ (p_21, p_22, p_23), so choose a uniform distribution for p, i.e. Dirichlet(1, ..., 1):

π(p) = Γ(RC),   p_11 + ⋯ + p_23 = 1.
Calculating marginal likelihoods

P(y | M_D) = (y choose y) Γ(RC) ∫_{p_11+⋯+p_23=1} ∏_ij p_ij^{y_ij} dp_11 ⋯ dp_23
           = (y choose y) Γ(RC) ∏_ij Γ(y_ij + 1) / Γ(Σ y_ij + RC)
           = (y choose y) D(y + 1) / D(1_RC)

where

D(ν) = ∏_i Γ(ν_i) / Γ(Σ_i ν_i)

and y + 1 denotes the matrix of counts with 1 added to all entries, and 1_RC denotes a vector of length RC with all entries equal to 1.
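This marginal likelihood is best computed on the log scale via `math.lgamma`; a sketch following the D-notation above:

```python
import math

def logD(nu):
    """log D(nu) = sum_i log Gamma(nu_i) - log Gamma(sum_i nu_i)."""
    return sum(math.lgamma(a) for a in nu) - math.lgamma(sum(nu))

def log_multinomial_coef(counts):
    """log of (sum counts)! / prod(counts!)."""
    return math.lgamma(sum(counts) + 1) - sum(math.lgamma(c + 1) for c in counts)

y = [11, 68, 3, 9, 23, 5]   # counts flattened over the R*C = 6 cells

# log P(y | M_D) = log (y choose y) + log D(y + 1) - log D(1_RC)
log_marg_D = (log_multinomial_coef(y)
              + logD([c + 1 for c in y])
              - logD([1] * len(y)))
print(log_marg_D)
```

As a sanity check, D(1_RC) = 1/Γ(6) = 1/120 for this 2 x 3 table.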
Calculating marginal likelihoods

Under M_I the probabilities are determined by the marginal probabilities p_r = {p_1·, p_2·} and p_c = {p_·1, p_·2, p_·3}:

              < 2   2 to 12   > 12
C or better  p_11     p_12    p_13   p_1·
D or F       p_21     p_22    p_23   p_2·
             p_·1     p_·2    p_·3

Under M_I we have a table where p_ij = p_i· p_·j. Under independence the priors for the row and column probabilities are independent uniform Dirichlet distributions (with k = R = 2 and k = C = 3 respectively):

π(p_r) = [Γ(R)/Γ(1)^R] p_1·^{1−1} ⋯ p_R·^{1−1} = Γ(R),   π(p_c) = [Γ(C)/Γ(1)^C] p_·1^{1−1} ⋯ p_·C^{1−1} = Γ(C)
The marginal likelihood under M_I is therefore

P(y | M_I) = (y choose y) ∫_{p_r} ∫_{p_c} ∏_ij (p_i· p_·j)^{y_ij} π(p_r) π(p_c) dp_r dp_c
           = (y choose y) Γ(R)Γ(C) ∫_{p_r} ∏_i p_i·^{y_i·} dp_r ∫_{p_c} ∏_j p_·j^{y_·j} dp_c
           = (y choose y) Γ(R)Γ(C) [∏_i Γ(y_i· + 1) / Γ(Σ y + R)] [∏_j Γ(y_·j + 1) / Γ(Σ y + C)]
           = (y choose y) D(y_R + 1) D(y_C + 1) / (D(1_R) D(1_C))

where y_R and y_C denote the vectors of row and column totals.
Bayes Factor

Combining the two marginal likelihoods we get the Bayes factor

BF = P(y | M_D) / P(y | M_I) = D(y + 1) D(1_R) D(1_C) / (D(1_RC) D(y_R + 1) D(y_C + 1))

Our data is

11   68   3    82
 9   23   5    37
20   91   8   119

The Bayes factor is

BF = (5! · 120! · 121! · 11! 68! 3! 9! 23! 5!) / (2! · 124! · 82! 37! · 20! 91! 8!) = 1.66

which gives modest support against independence.
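The Bayes factor can be evaluated directly from the D-function formula, working in logs to avoid the huge factorials:

```python
import math

def logD(nu):
    """log of the Dirichlet normalising constant D(nu)."""
    return sum(math.lgamma(a) for a in nu) - math.lgamma(sum(nu))

y = [[11, 68, 3],
     [9, 23, 5]]
flat = [c for row in y for c in row]
row_tot = [sum(r) for r in y]                        # y_R = [82, 37]
col_tot = [sum(r[j] for r in y) for j in range(3)]   # y_C = [20, 91, 8]
R, C = 2, 3

# log BF = log D(y+1) + log D(1_R) + log D(1_C)
#        - log D(1_RC) - log D(y_R+1) - log D(y_C+1)
log_bf = (logD([c + 1 for c in flat]) + logD([1] * R) + logD([1] * C)
          - logD([1] * (R * C)) - logD([c + 1 for c in row_tot])
          - logD([c + 1 for c in col_tot]))
print(round(math.exp(log_bf), 2))  # 1.66
```

Note that the multinomial coefficient (y choose y) is common to both marginal likelihoods and cancels from the ratio.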
Normal linear regression model

Model: the response variable is an n x 1 vector Y = (y_1, ..., y_n), the predictor variables form an n x p matrix X = (x_1, ..., x_p), and

Y = Xβ + ε,   ε ~ N(0, σ² I)

Recall that the classical unbiased estimates are

β̂ = (XᵀX)⁻¹ XᵀY,   σ̂² = (Y − Xβ̂)ᵀ(Y − Xβ̂) / (n − p)

and the predicted Y is Ŷ = Xβ̂ = P_X Y, with P_X = X(XᵀX)⁻¹Xᵀ.
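For a single predictor with an intercept (p = 2) the normal equations reduce to the familiar closed form. A minimal pure-Python sketch on hypothetical toy data:

```python
# Simple linear regression y = b0 + b1*x + eps via the normal equations,
# specialised to p = 2 (intercept + one slope) so no matrix library is needed.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]   # exactly y = 1 + 2x, so residuals are zero
n, p = len(x), 2

xbar = sum(x) / n
ybar = sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

# Unbiased residual variance estimate: RSS / (n - p)
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = rss / (n - p)
print(b0, b1, s2)  # 1.0 2.0 0.0
```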
Normal linear regression model

To sum up: Y | β, σ², X ~ N_n(Xβ, σ² I).

Bayesian formulation: assume that (β, σ²) have the non-informative prior

g(β, σ²) ∝ 1/σ²
Posterior distribution

q(β, σ² | Y) = q(β | Y, σ²) q(σ² | Y)

with

q(σ² | Y) ∝ (σ²)^{−((n−p)/2 + 1)} exp{−(n − p)s² / (2σ²)},

i.e. σ² | Y ~ IG((n − p)/2, (n − p)s²/2). Recall the Inverse Gamma(a, b) density is ∝ y^{−a−1} exp{−b/y}.

Also,

q(β | Y, σ²) = N(β̂, V_β σ²),   β̂ = (XᵀX)⁻¹XᵀY,   V_β = (XᵀX)⁻¹
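Sampling from this joint posterior follows the factorisation directly: draw σ² from the inverse gamma, then β | σ² from the normal. A sketch for a single predictor without intercept (p = 1, so XᵀX is a scalar; the data are hypothetical), using that 1/G ~ IG(a, b) when G ~ Gamma(a, scale = 1/b):

```python
import random

# Hypothetical data for y = beta * x + eps, p = 1 (no intercept)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 1

xtx = sum(xi * xi for xi in x)                 # X^T X (scalar)
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / xtx
s2 = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - p)

random.seed(2)
draws = []
for _ in range(5000):
    # sigma^2 | Y ~ IG((n-p)/2, (n-p)s^2/2)
    sigma2 = 1.0 / random.gammavariate((n - p) / 2, 2.0 / ((n - p) * s2))
    # beta | Y, sigma^2 ~ N(beta_hat, sigma^2 / (X^T X))
    draws.append(random.gauss(beta_hat, (sigma2 / xtx) ** 0.5))

post_mean = sum(draws) / len(draws)
print(beta_hat, post_mean)  # posterior mean of beta is close to beta_hat
```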
Posterior

The posterior density comes from a classical factorization of the likelihood

(2πσ²)^{−n/2} exp{−(1/(2σ²)) (y − Xβ)ᵀ(y − Xβ)}

using that

(y − Xβ)ᵀ(y − Xβ) = (y − Xβ̂)ᵀ(y − Xβ̂) + (β − β̂)ᵀ XᵀX (β − β̂).

Marginally, p(β | Y) is a non-central multivariate t_{n−p} distribution. For each j,

(β_j − β̂_j) / (s √((XᵀX)⁻¹_jj)) ~ t_{n−p}
Prediction

Given a new covariate matrix X̃, predict Ỹ:

p(Ỹ | Y) = ∫ p(Ỹ | β, σ²) p(β, σ² | Y) dβ dσ²

Either simulate, or use that p(Ỹ | Y) is a multivariate t distribution:

t_{n−p}(X̃β̂, s²(I + X̃(XᵀX)⁻¹X̃ᵀ))
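The simulation route composes the posterior draws with the sampling model: draw (β, σ²) from the posterior, then Ỹ | β, σ² ~ N(x̃β, σ²). A self-contained sketch in the scalar p = 1, no-intercept case (hypothetical data and a hypothetical new point x̃ = 6):

```python
import random

# Hypothetical data for y = beta * x + eps with p = 1 (no intercept)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 1
xtx = sum(xi * xi for xi in x)
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / xtx
s2 = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - p)

x_new = 6.0            # new covariate value (assumed for illustration)
random.seed(3)
pred = []
for _ in range(5000):
    # posterior draw of (beta, sigma^2), then a predictive draw of y_new
    sigma2 = 1.0 / random.gammavariate((n - p) / 2, 2.0 / ((n - p) * s2))
    beta = random.gauss(beta_hat, (sigma2 / xtx) ** 0.5)
    pred.append(random.gauss(x_new * beta, sigma2 ** 0.5))

pred.sort()
print(pred[len(pred) // 2])  # predictive median, close to x_new * beta_hat
```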