Sampling Distributions


Merlise Clyde, Duke University, September 8, 2016

Outline

Topics:
- Normal Theory
- Chi-squared Distributions
- Student t Distributions

Readings: Christensen Appendix C, Chapters 1-2

Prostate Example

> library(lasso2); data(prostate)   # n = 97, 9 variables
> summary(lm(lpsa ~ ., data = prostate))

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.669399   1.296381   0.516  0.60690
lcavol       0.587023   0.087920   6.677 2.11e-09 ***
lweight      0.454461   0.170012   2.673  0.00896 **
age         -0.019637   0.011173  -1.758  0.08229 .
lbph         0.107054   0.058449   1.832  0.07040 .
svi          0.766156   0.244309   3.136  0.00233 **
lcp         -0.105474   0.091013  -1.159  0.24964
gleason      0.045136   0.157464   0.287  0.77506
pgg45        0.004525   0.004421   1.024  0.30885
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7084 on 88 degrees of freedom
Multiple R-squared: 0.6548, Adjusted R-squared: 0.6234
F-statistic: 20.86 on 8 and 88 DF, p-value: < 2.2e-16

Summary of Distributions

Model (full): $Y = X\beta + \epsilon$. Assume $X$ is full rank with first column $1_n$ and $p$ additional predictors, so $r(X) = p + 1$.

$$\hat\beta \mid \sigma^2 \sim N(\beta, \sigma^2 (X^TX)^{-1})$$

$$\frac{SSE}{\sigma^2} \sim \chi^2_{n - r(X)}$$

$$\frac{\hat\beta_j - \beta_j}{SE(\hat\beta_j)} \sim t_{n - r(X)}$$

where $SE(\hat\beta_j)$ is the square root of the $j$th diagonal element of $\hat\sigma^2 (X^TX)^{-1}$ and $\hat\sigma^2$ is the unbiased estimate of $\sigma^2$.
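In R, the pieces of this summary can be read off a fitted lm object. A minimal sketch with simulated data (hypothetical, not the prostate fit):

# Map the summary quantities onto a fitted lm object (simulated data)
set.seed(101)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)                    # n x (p + 1), r(X) = p + 1
beta.hat   <- coef(fit)                     # beta.hat ~ N(beta, sigma^2 (X^T X)^{-1})
SSE        <- sum(residuals(fit)^2)         # SSE / sigma^2 ~ chi^2_{n - r(X)}
sigma2.hat <- SSE / df.residual(fit)        # unbiased estimate of sigma^2
vcov(fit)                                   # equals sigma2.hat * solve(t(X) %*% X)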

General Case

Let $W = \mu + AZ$ with $Z \sim N(0, I_d)$ in $\mathbb{R}^d$ and $A$ an $n \times d$ matrix. Then

$$E[W] = \mu, \qquad Cov(W) = AA^T \geq 0,$$

so $W \sim N(\mu, \Sigma)$ with $\Sigma = AA^T$. If $\Sigma$ is singular then there is no density (on $\mathbb{R}^n$), but we claim that $W$ still has a multivariate normal distribution!

Definition: $W \in \mathbb{R}^n$ has a multivariate normal distribution $N(\mu, \Sigma)$ if for every $v \in \mathbb{R}^n$, $v^TW$ has a (univariate) normal distribution with mean $v^T\mu$ and variance $v^T\Sigma v$.

See "Lessons in Normal Theory" on Sakai for videos using characteristic functions.
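A small simulation sketch of this construction, using a rank-deficient A (hypothetical numbers) so that Sigma = AA^T is singular; the empirical mean and covariance still match mu and AA^T even though no density exists on R^3:

# W = mu + A Z with A (3 x 2), so Sigma = A A^T is singular (rank 2)
set.seed(42)
A  <- matrix(c(1, 0,
               0, 1,
               1, 1), nrow = 3, byrow = TRUE)
mu <- c(1, 2, 3)
Z  <- matrix(rnorm(2 * 1e5), nrow = 2)      # columns are draws of Z ~ N(0, I_2)
W  <- mu + A %*% Z                          # columns are draws of W
rowMeans(W)                                 # approximately mu
cov(t(W))                                   # approximately A %*% t(A)
A %*% t(A)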

Linear Transformations are Normal

If $Y \sim N_n(\mu, \Sigma)$, then for any $m \times n$ matrix $A$,

$$AY \sim N_m(A\mu, A\Sigma A^T).$$

$A\Sigma A^T$ does not have to be positive definite!

Equal in Distribution

There are multiple ways to construct the same normal distribution:
- $Z_1 \sim N(0, I_n)$, $Z_1 \in \mathbb{R}^n$, and take $A$ of dimension $d \times n$
- $Z_2 \sim N(0, I_p)$, $Z_2 \in \mathbb{R}^p$, and take $B$ of dimension $d \times p$
- define $Y = \mu + AZ_1$
- define $W = \mu + BZ_2$

Theorem: If $Y = \mu + AZ_1$ and $W = \mu + BZ_2$, then $Y \overset{D}{=} W$ if and only if $AA^T = BB^T = \Sigma$.
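As a quick illustration (not from the slides): any factorization of the same Sigma gives the same distribution. Below, a Cholesky factor and an eigendecomposition-based factor of one Sigma both satisfy AA^T = BB^T:

# Two factorizations of the same Sigma: A from Cholesky, B from the eigendecomposition
Sigma <- matrix(c(2, 1,
                  1, 2), nrow = 2)
A <- t(chol(Sigma))                          # lower-triangular factor
e <- eigen(Sigma)
B <- e$vectors %*% diag(sqrt(e$values))      # square-root style factor
A %*% t(A)                                   # both equal Sigma, so
B %*% t(B)                                   # mu + A Z1 and mu + B Z2 are equal in distribution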

Zero Correlation and Independence

Theorem: For a random vector $Y \sim N(\mu, \Sigma)$ partitioned as

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right),$$

$Cov(Y_1, Y_2) = \Sigma_{12} = \Sigma_{21}^T = 0$ if and only if $Y_1$ and $Y_2$ are independent.

Independence Implies Zero Covariance

Proof.

$$Cov(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)^T]$$

If $Y_1$ and $Y_2$ are independent, then

$$E[(Y_1 - \mu_1)(Y_2 - \mu_2)^T] = E[(Y_1 - \mu_1)]\,E[(Y_2 - \mu_2)]^T = \mathbf{0}\,\mathbf{0}^T = \mathbf{0},$$

therefore $\Sigma_{12} = 0$.

Zero Covariance Implies Independence

Proof. Assume $\Sigma_{12} = 0$. Choose

$$A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}$$

such that $A_1A_1^T = \Sigma_{11}$ and $A_2A_2^T = \Sigma_{22}$. Partition

$$Z = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \sim N\left( \begin{bmatrix} 0_1 \\ 0_2 \end{bmatrix}, \begin{bmatrix} I_1 & 0 \\ 0 & I_2 \end{bmatrix} \right), \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}.$$

Then $Y \overset{D}{=} AZ + \mu \sim N(\mu, \Sigma)$.

Continued

Proof (continued).

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \overset{D}{=} \begin{bmatrix} A_1Z_1 + \mu_1 \\ A_2Z_2 + \mu_2 \end{bmatrix}$$

But $Z_1$ and $Z_2$ are independent, and functions of independent random variables are independent. Therefore $Y_1$ and $Y_2$ are independent.

For the multivariate normal, zero covariance implies independence.

Another Useful Result

Corollary: If $Y \sim N(\mu, \sigma^2 I_n)$ and $AB^T = 0$, then $AY$ and $BY$ are independent.

Proof.

$$\begin{bmatrix} W_1 \\ W_2 \end{bmatrix} = \begin{bmatrix} A \\ B \end{bmatrix} Y = \begin{bmatrix} AY \\ BY \end{bmatrix}$$

$$Cov(W_1, W_2) = Cov(AY, BY) = \sigma^2 AB^T$$

so $AY$ and $BY$ are independent if $AB^T = 0$.
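A concrete check of the corollary (sketch with a hypothetical design): take A = P_X and B = I - P_X, so AB^T = 0 and fitted values are independent of residuals.

# P_X (I - P_X)^T = 0, so P_X Y (fitted values) and (I - P_X) Y (residuals) are independent
set.seed(1)
n <- 20
X <- cbind(1, rnorm(n))                      # hypothetical design with intercept
P <- X %*% solve(t(X) %*% X) %*% t(X)        # orthogonal projection onto C(X)
max(abs(P %*% t(diag(n) - P)))               # numerically zero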

Distribution of $\hat\beta$

If $Y \sim N(X\beta, \sigma^2 I_n)$, then $\hat\beta \sim N(\beta, \sigma^2 (X^TX)^{-1})$.

Unknown $\sigma^2$

$$\hat\beta_j \mid \beta_j, \sigma^2 \sim N(\beta_j, \sigma^2 [(X^TX)^{-1}]_{jj})$$

What happens if we substitute $\hat\sigma^2 = e^Te/(n - r(X))$ in the above?

$$\frac{(\hat\beta_j - \beta_j)\big/\sqrt{\sigma^2[(X^TX)^{-1}]_{jj}}}{\sqrt{e^Te/(\sigma^2(n - r(X)))}} \overset{D}{=} \frac{N(0, 1)}{\sqrt{\chi^2_{n-r(X)}/(n - r(X))}} \sim t(n - r(X), 0, 1)$$

We need to show that $e^Te/\sigma^2$ has a $\chi^2$ distribution and is independent of the numerator!
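A sketch (simulated data, not the prostate fit) building this t statistic by hand and matching the value reported by summary(lm):

# Build the t statistic for one coefficient by hand and compare with summary(lm)
set.seed(123)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)
X <- model.matrix(fit)
sigma2.hat <- sum(residuals(fit)^2) / (n - ncol(X))     # e^T e / (n - r(X))
se <- sqrt(sigma2.hat * solve(t(X) %*% X)[2, 2])        # SE(beta.hat_1)
c(by.hand = coef(fit)[2] / se,                          # tests H0: beta_1 = 0
  from.summary = summary(fit)$coefficients[2, "t value"])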

Central Student t Distribution

Definition: Let $Z \sim N(0, 1)$ and $S \sim \chi^2_p$ with $Z$ and $S$ independent. Then

$$W = \frac{Z}{\sqrt{S/p}}$$

has a (central) Student t distribution with $p$ degrees of freedom.

See Casella & Berger or DeGroot & Schervish for the derivation, a nice change-of-variables and marginalization problem!

Chi-Squared Distribution

Definition: If $Z \sim N(0, 1)$ then $Z^2 \sim \chi^2_1$ (a Chi-squared distribution with one degree of freedom).

Density:

$$f(x) = \frac{1}{\Gamma(1/2)}\,(1/2)^{1/2}\, x^{1/2 - 1} e^{-x/2}, \qquad x > 0$$

Characteristic function:

$$E[e^{itZ^2}] = \varphi(t) = (1 - 2it)^{-1/2}$$

Chi-Squared Distribution with p Degrees of Freedom

If $Z_j \overset{iid}{\sim} N(0, 1)$, $j = 1, \ldots, p$, then $X \equiv Z^TZ = \sum_{j=1}^p Z_j^2 \sim \chi^2_p$.

Characteristic function:

$$\varphi_X(t) = E\!\left[e^{it\sum_{j=1}^p Z_j^2}\right] = \prod_{j=1}^p E[e^{itZ_j^2}] = \prod_{j=1}^p (1 - 2it)^{-1/2} = (1 - 2it)^{-p/2}$$

This is a Gamma distribution with shape $p/2$ and rate $1/2$, $G(p/2, 1/2)$:

$$f(x) = \frac{1}{\Gamma(p/2)}\,(1/2)^{p/2}\, x^{p/2 - 1} e^{-x/2}, \qquad x > 0$$
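A quick simulation sketch comparing the sum of p squared standard normals with the chi-squared (equivalently Gamma(p/2, rate 1/2)) density:

# Sum of p = 5 squared standard normals versus the chi-squared / Gamma density
set.seed(7)
p <- 5
x <- colSums(matrix(rnorm(1e5 * p), nrow = p)^2)        # 1e5 draws of Z^T Z
hist(x, breaks = 80, freq = FALSE, main = "chi-squared, 5 df")
curve(dchisq(x, df = p), add = TRUE, col = "red")
curve(dgamma(x, shape = p / 2, rate = 1 / 2), add = TRUE, col = "blue", lty = 2)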

Quadratic Forms

Theorem: Let $Y \sim N(\mu, \sigma^2 I_n)$ with $\mu \in C(X)$. If $Q$ is a rank $k$ orthogonal projection onto a subspace orthogonal to $C(X)$ (so that $Q\mu = 0$), then

$$\frac{Y^TQY}{\sigma^2} \sim \chi^2_k.$$

Proof. For an orthogonal projection, $Q = U\Lambda U^T = U_kU_k^T$, where $C(Q) = C(U_k)$ and $U_k^TU_k = I_k$ (Spectral Theorem). Then

$$Y^TQY = Y^TU_kU_k^TY.$$

Let $Z = U_k^TY/\sigma \sim N(U_k^T\mu/\sigma,\, U_k^TU_k)$. Since $U_k^T\mu = 0$ and $U_k^TU_k = I_k$,

$$Z \sim N(0, I_k) \quad\Rightarrow\quad Z^TZ \sim \chi^2_k.$$

Since $U_k^TY/\sigma \overset{D}{=} Z$, it follows that $Y^TQY/\sigma^2 \sim \chi^2_k$.

Residual Sum of Squares

Example (Sum of Squares Error, SSE). Let $Y \sim N(\mu, \sigma^2 I_n)$ with $\mu \in C(X)$. Because $\mu \in C(X)$, $I - P_X$ is a rank $n - r(X)$ orthogonal projection onto $C(X)^\perp$, so $(I - P_X)\mu = 0$ and

$$\frac{e^Te}{\sigma^2} = \frac{Y^T(I - P_X)Y}{\sigma^2} \sim \chi^2_{n - r(X)}.$$
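A simulation sketch (hypothetical design) confirming that e'e/sigma^2 behaves like a chi-squared with n - r(X) degrees of freedom:

# SSE / sigma^2 has mean n - r(X), the mean of a chi-squared with that many df
set.seed(11)
n <- 30; sigma <- 2
X  <- cbind(1, rnorm(n), rnorm(n))                      # r(X) = 3
mu <- X %*% c(1, -1, 0.5)
P  <- X %*% solve(t(X) %*% X) %*% t(X)
sse.scaled <- replicate(1e4, {
  Y <- mu + sigma * rnorm(n)
  sum(((diag(n) - P) %*% Y)^2) / sigma^2                # e^T e / sigma^2
})
c(mean = mean(sse.scaled), df = n - 3)                  # both close to 27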

Estimated Coefficients and Residuals are Independent

If $Y \sim N(X\beta, \sigma^2 I_n)$, then $Cov(\hat\beta, e) = 0$, which implies independence. Functions of independent random variables are independent (show that the characteristic functions or densities factor).

Putting it all together

- $\hat\beta \sim N(\beta, \sigma^2(X^TX)^{-1})$
- $(\hat\beta_j - \beta_j)\big/\sqrt{\sigma^2[(X^TX)^{-1}]_{jj}} \sim N(0, 1)$
- $e^Te/\sigma^2 \sim \chi^2_{n - r(X)}$
- $\hat\beta$ and $e$ are independent

Therefore

$$\frac{(\hat\beta_j - \beta_j)\big/\sqrt{\sigma^2[(X^TX)^{-1}]_{jj}}}{\sqrt{e^Te/(\sigma^2(n - r(X)))}} \sim t(n - r(X), 0, 1).$$

Inference

95% confidence interval: $\hat\beta_j \pm t_{\alpha/2}\, SE(\hat\beta_j)$

- use qt(a, df) for the $t_a$ quantile
- derived from the pivotal quantity $t = (\hat\beta_j - \beta_j)/SE(\hat\beta_j)$, where $P(t \in (t_{\alpha/2}, t_{1-\alpha/2})) = 1 - \alpha$
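A sketch computing the 95% interval from the pivot with qt and comparing to confint (simulated data):

# 95% CI for a slope from the pivotal quantity, compared with confint()
set.seed(5)
x <- rnorm(40)
y <- 2 + 3 * x + rnorm(40)
fit <- lm(y ~ x)
est <- coef(fit)["x"]
se  <- summary(fit)$coefficients["x", "Std. Error"]
est + c(-1, 1) * qt(0.975, df.residual(fit)) * se       # by hand
confint(fit)["x", ]                                     # from confint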

Prostate Example

xtable(confint(prostate.lm))    # from library(MASS) and library(xtable)

             2.5 %  97.5 %
(Intercept)  -1.91    3.25
lcavol        0.41    0.76
lweight       0.12    0.79
age          -0.04    0.00
lbph         -0.01    0.22
svi           0.28    1.25
lcp          -0.29    0.08
gleason      -0.27    0.36
pgg45        -0.00    0.01

Interpretation

- For a one-unit increase in $X_j$, we expect $Y$ to increase by $\hat\beta_j \pm t_{\alpha/2}\, SE(\hat\beta_j)$.
- For log transforms, $Y = \exp(x^T\beta + \epsilon) = \prod_j \exp(x_j\beta_j)\,\exp(\epsilon)$; if $X_j = \log(W_j)$, then look at 2-fold or percentage increases in $W_j$ to describe the multiplicative increase in the median of $Y$.
- If cavol increases by 10%, then we expect PSA to change by a factor of $1.10^{\text{CI}} = (1.10^{0.41}, 1.10^{0.76}) = (1.0398, 1.0751)$, that is, by 3.98 to 7.51 percent.
- For a 10% increase in cancer volume, we are 95% confident that PSA levels will increase by approximately 4 to 7.5 percent.
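The multiplicative interpretation can be computed directly from the interval (a sketch, assuming prostate.lm is the fit shown on the earlier slide):

# Multiplicative change in PSA for a 10% increase in cancer volume
ci <- confint(prostate.lm)["lcavol", ]   # prostate.lm assumed to be lm(lpsa ~ ., data = prostate)
1.10^ci                                  # approximately (1.0398, 1.0751): a 3.98% to 7.51% increase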

Derivation