Sampling Distributions

1 Merlise Clyde Duke University September 3, 2015

2 Outline Topics: Normal Theory, Chi-squared Distributions, Student t Distributions. Readings: Christensen Appendix C, Chapters 1-2

3 Prostate Example
> library(lasso2); data(prostate)  # n = 97, 9 variables
> summary(lm(lpsa ~ ., data = prostate))
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)) are reported for (Intercept), lcavol, lweight, age, lbph, svi, lcp, gleason, pgg. Significance flags: lcavol *** (p-value of order e-09), lweight **, svi **.
Residual standard error on 88 degrees of freedom; F-statistic on 8 and 88 DF, p-value: < 2.2e-16
STA721 Sampling Distributions

4 Summary of Distributions
Model: Full $Y = X\beta + \epsilon$. Assume $X$ is full rank with the first column of ones $1_n$ and $p$ additional predictors, so $r(X) = p + 1$.
$\hat\beta \mid \sigma^2 \sim N(\beta, \sigma^2 (X^T X)^{-1})$
$RSS/\sigma^2 \sim \chi^2_{n - r(X)}$
$(\hat\beta_j - \beta_j)/SE(\hat\beta_j) \sim t_{n - r(X)}$
where $SE(\hat\beta_j)$ is the square root of the $j$th diagonal element of $\hat\sigma^2 (X^T X)^{-1}$ and $\hat\sigma^2$ is the unbiased estimate of $\sigma^2$.

5 of $\beta$
If $Y \sim N(X\beta, \sigma^2 I_n)$, then $\hat\beta \sim N(\beta, \sigma^2 (X^T X)^{-1})$.

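6 Unknown $\sigma^2$
$\hat\beta_j \mid \beta_j, \sigma^2 \sim N(\beta_j, \sigma^2 [(X^T X)^{-1}]_{jj})$
What happens if we substitute $\hat\sigma^2 = e^T e/(n - r(X))$ in the above?
$$\frac{(\hat\beta_j - \beta_j)\big/\left(\sigma \sqrt{[(X^T X)^{-1}]_{jj}}\right)}{\sqrt{e^T e/(\sigma^2 (n - r(X)))}} \overset{D}{=} \frac{N(0, 1)}{\sqrt{\chi^2_{n - r(X)}/(n - r(X))}} \sim t(n - r(X), 0, 1)$$
Need to show that $e^T e/\sigma^2$ has a $\chi^2$ distribution and is independent of the numerator!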

7 Central Student t Distribution
Definition: Let $Z \sim N(0, 1)$ and $S \sim \chi^2_p$ with $Z$ and $S$ independent. Then $W = Z/\sqrt{S/p}$ has a (central) Student t distribution with $p$ degrees of freedom.
See Casella & Berger or DeGroot & Schervish for the derivation: a nice change of variables and marginalization problem!
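The definition can be checked by simulation. The following is a minimal sketch using only the Python standard library: it builds $W = Z/\sqrt{S/p}$ from independent standard normals and compares the sample variance with $p/(p-2)$, the known variance of a t distribution with $p > 2$ degrees of freedom. The function name `simulate_t`, the seed, and the choice $p = 10$ are illustrative assumptions, not part of the slides.

```python
import math
import random


def simulate_t(p, n_draws, rng):
    """Draw W = Z / sqrt(S / p) with Z ~ N(0, 1) and S ~ chi^2_p independent."""
    draws = []
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        # S is built as a sum of p squared independent standard normals.
        s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(p))
        draws.append(z / math.sqrt(s / p))
    return draws


rng = random.Random(721)  # arbitrary seed, chosen only for reproducibility
p = 10                    # illustrative degrees of freedom
w = simulate_t(p, 20000, rng)

mean_w = sum(w) / len(w)
var_w = sum((x - mean_w) ** 2 for x in w) / (len(w) - 1)

# Theoretical values for a central t with p > 2: mean 0, variance p / (p - 2).
print("sample mean:", mean_w)
print("sample variance:", var_w, "theory:", p / (p - 2))
```

The sample mean should be close to 0 and the sample variance close to 1.25, up to Monte Carlo error.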

8 Chi-Squared Distribution
Definition: If $Z \sim N(0, 1)$ then $Z^2 \sim \chi^2_1$ (a Chi-squared distribution with one degree of freedom).
Density: $f(x) = \frac{1}{\Gamma(1/2)} (1/2)^{1/2} x^{1/2 - 1} e^{-x/2}, \quad x > 0$
Characteristic Function: $E[e^{itZ^2}] = \varphi(t) = (1 - 2it)^{-1/2}$

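9 Chi-Squared Distribution with p Degrees of Freedom
If $Z_j \overset{iid}{\sim} N(0, 1)$, $j = 1, \ldots, p$, then $X \equiv Z^T Z = \sum_{j=1}^p Z_j^2 \sim \chi^2_p$.
Characteristic Function:
$$\varphi_X(t) = E\left[e^{it \sum_{j=1}^p Z_j^2}\right] = \prod_{j=1}^p E\left[e^{itZ_j^2}\right] = \prod_{j=1}^p (1 - 2it)^{-1/2} = (1 - 2it)^{-p/2}$$
This is a Gamma distribution with shape $p/2$ and rate $1/2$, $G(p/2, 1/2)$:
$f(x) = \frac{1}{\Gamma(p/2)} (1/2)^{p/2} x^{p/2 - 1} e^{-x/2}, \quad x > 0$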
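As a numerical sanity check of the Gamma form of the density, the sketch below integrates $f(x)$ with a simple trapezoid rule and confirms that it has total mass 1, mean $p$, and variance $2p$ (standard facts for $\chi^2_p$). The helper names `chi2_pdf` and `trapezoid`, the choice $p = 4$, and the integration range are assumptions made for illustration.

```python
import math


def chi2_pdf(x, p):
    """Density of Gamma(shape = p/2, rate = 1/2), i.e. chi-squared with p df."""
    return (0.5 ** (p / 2)) / math.gamma(p / 2) * x ** (p / 2 - 1) * math.exp(-x / 2)


def trapezoid(f, a, b, n):
    """Composite trapezoid rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


p = 4                        # illustrative degrees of freedom
upper, panels = 80.0, 80000  # the tail beyond 80 is negligible for p = 4

total_mass = trapezoid(lambda x: chi2_pdf(x, p), 0.0, upper, panels)
mean = trapezoid(lambda x: x * chi2_pdf(x, p), 0.0, upper, panels)
var = trapezoid(lambda x: (x - p) ** 2 * chi2_pdf(x, p), 0.0, upper, panels)

print(total_mass, mean, var)  # should be close to 1, p, and 2p
```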

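10 Quadratic Forms
Theorem: Let $Y \sim N(\mu, \sigma^2 I_n)$ with $\mu \in C(X)$. If $Q$ is a rank $k$ orthogonal projection onto $C(X)^\perp$, then $(Y^T Q Y)/\sigma^2 \sim \chi^2_k$.
Proof. For an orthogonal projection, $Q = U \Lambda U^T = U_k U_k^T$ where $C(Q) = C(U_k)$ and $U_k^T U_k = I_k$ (Spectral Theorem).
$Y^T Q Y = Y^T U_k U_k^T Y$
$Z = U_k^T Y/\sigma \sim N(U_k^T \mu/\sigma, U_k^T U_k)$
Because $\mu \in C(X)$ and the columns of $U_k$ span $C(X)^\perp$, $U_k^T \mu = 0$, so $Z \sim N(0, I_k)$ and $Z^T Z \sim \chi^2_k$.
Since $U_k^T Y/\sigma \overset{D}{=} Z$, $\frac{Y^T Q Y}{\sigma^2} \sim \chi^2_k$.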
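A small simulation can illustrate the theorem in its simplest special case. The sketch below assumes $X = 1_n$ (intercept only), so that $Q = I_n - \frac{1}{n} 1_n 1_n^T$ is the rank $n - 1$ orthogonal projection onto $C(X)^\perp$ and $Y^T Q Y = \sum_i (Y_i - \bar{Y})^2$. The values of `n`, `mu`, `sigma`, and the seed are arbitrary illustrative choices.

```python
import random


def centered_sum_of_squares(y):
    """Y^T Q Y for Q = I_n - (1/n) 1 1^T, i.e. the sum of squared deviations."""
    ybar = sum(y) / len(y)
    return sum((yi - ybar) ** 2 for yi in y)


rng = random.Random(2015)     # arbitrary seed
n, mu, sigma = 5, 3.0, 2.0    # mean vector mu * 1_n lies in C(1_n)
reps = 20000

q = []
for _ in range(reps):
    y = [rng.gauss(mu, sigma) for _ in range(n)]
    q.append(centered_sum_of_squares(y) / sigma ** 2)

k = n - 1  # rank of the projection
mean_q = sum(q) / reps
var_q = sum((v - mean_q) ** 2 for v in q) / (reps - 1)

# A chi-squared variable with k degrees of freedom has mean k and variance 2k.
print("mean:", mean_q, "theory:", k)
print("variance:", var_q, "theory:", 2 * k)
```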

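11 Residual Sum of Squares
Example: Sum of Squares Error (SSE). Let $Y \sim N(\mu, \sigma^2 I_n)$ with $\mu \in C(X)$. Because $\mu \in C(X)$ and $I_n - P_X$ is the orthogonal projection onto $C(X)^\perp$, which has rank $n - r(X)$, the quadratic forms theorem gives
$$\frac{e^T e}{\sigma^2} = \frac{Y^T (I_n - P_X) Y}{\sigma^2} \sim \chi^2_{n - r(X)}$$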

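12 Estimated Coefficients and Residuals are Independent
If $Y \sim N(X\beta, \sigma^2 I_n)$, then $Cov(\hat\beta, e) = 0$, which implies independence because $\hat\beta$ and $e$ are jointly normal (both are linear functions of $Y$).
Functions of independent random variables are independent (show that the characteristic functions or densities factor), so $\hat\beta$ and $e^T e$ are independent.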
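The zero covariance can be verified directly. The short derivation below is a sketch that assumes the usual expressions $\hat\beta = (X^T X)^{-1} X^T Y$ and $e = (I_n - P_X) Y$ with $P_X = X (X^T X)^{-1} X^T$, and uses $Cov(AY, BY) = A \, Cov(Y) \, B^T$:

```latex
\begin{align*}
Cov(\hat\beta, e)
  &= (X^T X)^{-1} X^T \, (\sigma^2 I_n) \, (I_n - P_X)^T \\
  &= \sigma^2 (X^T X)^{-1} \left( X^T - X^T P_X \right)
     && \text{since } P_X \text{ is symmetric} \\
  &= \sigma^2 (X^T X)^{-1} \left( X^T - X^T \right)
     && \text{since } P_X X = X \\
  &= 0 .
\end{align*}
```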

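13 Putting it all together
$\hat\beta \sim N(\beta, \sigma^2 (X^T X)^{-1})$
$(\hat\beta_j - \beta_j)\big/\left(\sigma \sqrt{[(X^T X)^{-1}]_{jj}}\right) \sim N(0, 1)$
$e^T e/\sigma^2 \sim \chi^2_{n - r(X)}$
$\hat\beta$ and $e$ are independent
$$\frac{(\hat\beta_j - \beta_j)\big/\left(\sigma \sqrt{[(X^T X)^{-1}]_{jj}}\right)}{\sqrt{e^T e/(\sigma^2 (n - r(X)))}} \sim t(n - r(X), 0, 1)$$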
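The result can be illustrated by simulation in the simplest case. The sketch below assumes a simple linear regression (intercept plus one predictor, so $r(X) = 2$ and $[(X^T X)^{-1}]_{jj} = 1/S_{xx}$ for the slope) and checks that the studentized slope falls inside $\pm t_{0.975}$ about 95% of the time. The design, true coefficients, seed, and the tabulated quantile 2.306 for 8 degrees of freedom are illustrative assumptions, not values from the slides.

```python
import math
import random

T_975_DF8 = 2.306  # tabulated 0.975 quantile of a t distribution with 8 df


def slope_t_statistic(x, y, beta1):
    """(b1 - beta1) / SE(b1) for the least squares slope in simple regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)           # e^T e / (n - r(X)) with r(X) = 2
    se_b1 = math.sqrt(sigma2_hat / sxx)  # sqrt of sigma2_hat * [(X^T X)^{-1}]_jj
    return (b1 - beta1) / se_b1


rng = random.Random(88)                  # arbitrary seed
x = [float(i) for i in range(10)]        # n = 10, so n - r(X) = 8
beta0, beta1, sigma = 1.0, 0.5, 2.0      # hypothetical true parameters
reps = 5000

inside = 0
for _ in range(reps):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    if abs(slope_t_statistic(x, y, beta1)) <= T_975_DF8:
        inside += 1

coverage = inside / reps
print("fraction within +/- t quantile:", coverage)  # should be near 0.95
```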

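14 Inference
95% Confidence interval: $\hat\beta_j \pm t_{\alpha/2} SE(\hat\beta_j)$, with $\alpha = 0.05$.
Use qt(a, df) for the $t_a$ quantile.
Derive from the pivotal quantity $t = (\hat\beta_j - \beta_j)/SE(\hat\beta_j)$, where $P(t \in (t_{\alpha/2}, t_{1 - \alpha/2})) = 1 - \alpha$.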
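To make the interval concrete outside of R, the sketch below computes a t quantile numerically (Simpson's rule for the CDF plus bisection), playing the role that `qt(a, df)` plays on the slide, and then forms $\hat\beta_j \pm t \cdot SE(\hat\beta_j)$. All function names are hypothetical helpers, and the estimate 0.5 and standard error 0.1 are made-up inputs used only to show the call; they are not values from the prostate fit.

```python
import math


def t_pdf(x, df):
    """Density of the central Student t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, n=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] and symmetry about zero."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, n)
    h = x / n  # n must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(a, df, tol=1e-8):
    """Bisection for q with P(T <= q) = a; stands in for R's qt(a, df)."""
    lo, hi = -1.0, 1.0
    while t_cdf(lo, df) > a:
        lo *= 2
    while t_cdf(hi, df) < a:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(beta_hat, se, df, alpha=0.05):
    """Interval beta_hat +/- t_{1 - alpha/2} * se from the pivotal quantity."""
    t = t_quantile(1 - alpha / 2, df)
    return beta_hat - t * se, beta_hat + t * se


q88 = t_quantile(0.975, 88)  # 88 residual degrees of freedom, as in the example
lower, upper = confidence_interval(beta_hat=0.5, se=0.1, df=88)  # made-up inputs
print(q88, lower, upper)
```

Computing the CDF from 0 and exploiting symmetry avoids integrating over an infinite range; a statistics library would normally be used instead of this hand-rolled quantile.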

15 Prostate Example
xtable(confint(prostate.lm)), using library(MASS) and library(xtable). The table has columns 2.5 % and 97.5 % and one row per coefficient: (Intercept), lcavol, lweight, age, lbph, svi, lcp, gleason, pgg

16 Interpretation
For a 1 unit increase in $X_j$, expect $Y$ to increase by $\hat\beta_j \pm t_{\alpha/2} SE(\hat\beta_j)$.
For log transforms: $Y = \exp(x^T \beta + \epsilon) = \prod_j \exp(x_j \beta_j) \exp(\epsilon)$. If $X_j = \log(W_j)$, then look at 2-fold or % increases in $W_j$ to obtain the multiplicative increase in the median of $Y$.
If cavol increases by 10%, then we expect PSA to increase by a factor of $1.10^{(CI)} = (1.0398, 1.0751)$, that is, by 3.98 to 7.51 percent.
For a 10% increase in cancer volume, we are 95% confident that PSA levels will increase by approximately 4 to 7.5 percent.
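The arithmetic behind the multiplicative interpretation can be sketched as follows. When both the response and the predictor are on the log scale, multiplying $W_j$ by a factor `fold` multiplies the median of $Y$ by `fold ** beta_j`. The function name `percent_change` is a hypothetical helper, and the interval endpoints 0.41 and 0.76 are assumed placeholders chosen to be roughly consistent with the quoted 3.98 to 7.51 percent range; they are not the values from the original confidence interval table.

```python
def percent_change(beta, fold=1.10):
    """Percent change in the median of Y when W is multiplied by `fold`,
    for a model of log(Y) on log(W) with coefficient `beta`."""
    return (fold ** beta - 1.0) * 100.0


# Hypothetical confidence interval endpoints for the log(cavol) coefficient.
ci_lower, ci_upper = 0.41, 0.76

pct_lower = percent_change(ci_lower)  # effect of a 10% increase at the lower endpoint
pct_upper = percent_change(ci_upper)  # effect of a 10% increase at the upper endpoint
pct_double = percent_change(ci_lower, fold=2.0)  # same idea for a 2-fold increase

print(pct_lower, pct_upper, pct_double)
```

Transforming the endpoints of the interval for $\beta_j$ is valid because `fold ** beta` is monotone in `beta`, so coverage is preserved.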

17 Derivation
