Sampling Distributions
Merlise Clyde, Duke University. September 8, 2016
Outline

Topics: Normal Theory; Chi-squared Distributions; Student t Distributions
Readings: Christensen Appendix C, Chapters 1-2
Prostate Example

> library(lasso2); data(prostate)  # n = 97, 9 variables
> summary(lm(lpsa ~ ., data = prostate))

The summary table reports Estimate, Std. Error, t value, and Pr(>|t|) for the intercept and the predictors lcavol (***), lweight (**), age, lbph, svi (**), lcp, gleason, and pgg45; the numeric estimates were lost in transcription. Residual standard error on 88 degrees of freedom; F-statistic on 8 and 88 DF, p-value < 2.2e-16. Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05.
Summary of Distributions

Model (full): Y = X\beta + \epsilon. Assume X is full rank with first column of ones 1_n and p additional predictors, so r(X) = p + 1.

\hat{\beta} \mid \sigma^2 \sim N(\beta, \sigma^2 (X^T X)^{-1})

SSE / \sigma^2 \sim \chi^2_{n - r(X)}

(\hat{\beta}_j - \beta_j) / SE(\hat{\beta}_j) \sim t_{n - r(X)}

where SE(\hat{\beta}_j) is the square root of the jth diagonal element of \hat{\sigma}^2 (X^T X)^{-1} and \hat{\sigma}^2 is the unbiased estimate of \sigma^2.
General Case

W = \mu + A Z with Z \in R^d and A an n x d matrix:

E[W] = \mu
Cov(W) = A A^T \geq 0
W \sim N(\mu, \Sigma) where \Sigma = A A^T

If \Sigma is singular then there is no density (on R^n), but we claim that W still has a multivariate normal distribution!

Definition: W \in R^n has a multivariate normal distribution N(\mu, \Sigma) if for any v \in R^n, v^T W has a normal distribution with mean v^T \mu and variance v^T \Sigma v.

See "Lessons in Normal Theory" in Sakai for videos using characteristic functions.
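The definition above can be checked numerically. Below is a minimal sketch (in Python/NumPy rather than the course's R; the matrices are made up for illustration) that builds W = \mu + A Z with a singular \Sigma = A A^T, so W has no density on R^3, and confirms by simulation that a linear functional v^T W still has the predicted normal mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# W = mu + A Z with Z in R^d, A an n x d matrix (here n = 3, d = 2),
# so Sigma = A A^T has rank 2 < 3: singular, no density on R^3,
# yet every linear functional v^T W is still univariate normal.
mu = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # third row = sum of first two -> singular Sigma
Sigma = A @ A.T

Z = rng.standard_normal((2, 100_000))
W = mu[:, None] + A @ Z          # 3 x N draws of W

v = np.array([0.5, -1.0, 2.0])
proj = v @ W                     # samples of v^T W
# Theory: v^T W ~ N(v^T mu, v^T Sigma v)
print(np.linalg.matrix_rank(Sigma))   # 2: Sigma is singular
print(proj.mean(), v @ mu)            # sample mean vs v^T mu
print(proj.var(), v @ Sigma @ v)      # sample variance vs v^T Sigma v
```

The sample moments of v^T W match v^T \mu and v^T \Sigma v up to Monte Carlo error, even though \Sigma is rank-deficient.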
Linear Transformations are Normal

If Y \sim N_n(\mu, \Sigma), then for A m x n,

A Y \sim N_m(A \mu, A \Sigma A^T)

A \Sigma A^T does not have to be positive definite!
Equal in Distribution

There are multiple ways to define the same normal:

Z_1 \sim N(0, I_n), Z_1 \in R^n, and take A d x n
Z_2 \sim N(0, I_p), Z_2 \in R^p, and take B d x p
Define Y = \mu + A Z_1
Define W = \mu + B Z_2

Theorem: If Y = \mu + A Z_1 and W = \mu + B Z_2, then Y =_D W if and only if A A^T = B B^T = \Sigma.
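A quick numerical illustration of the theorem (Python/NumPy sketch; the \Sigma is arbitrary, chosen for the example): the Cholesky factor and the symmetric eigendecomposition root are different matrices, but they generate the same normal law because A A^T = B B^T = \Sigma.

```python
import numpy as np

# Two different "square roots" of the same Sigma: the Cholesky factor A and
# the symmetric eigendecomposition root B. Since A A^T = B B^T = Sigma, the
# theorem says mu + A Z1 and mu + B Z2 are equal in distribution.
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
A = np.linalg.cholesky(Sigma)        # lower triangular, 2 x 2
w, V = np.linalg.eigh(Sigma)
B = V @ np.diag(np.sqrt(w)) @ V.T    # symmetric square root, 2 x 2

print(np.allclose(A @ A.T, Sigma))   # True
print(np.allclose(B @ B.T, Sigma))   # True
print(np.allclose(A, B))             # False: different factors, same law
```

Any other d with A (2 x d) satisfying A A^T = \Sigma would work as well; the distribution depends only on \mu and \Sigma.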
Zero Correlation and Independence

Theorem: For a random vector Y \sim N(\mu, \Sigma) partitioned as

Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right)

then Cov(Y_1, Y_2) = \Sigma_{12} = \Sigma_{21}^T = 0 if and only if Y_1 and Y_2 are independent.
Independence Implies Zero Covariance

Proof.

Cov(Y_1, Y_2) = E[(Y_1 - \mu_1)(Y_2 - \mu_2)^T]

If Y_1 and Y_2 are independent, the expectation factors:

E[(Y_1 - \mu_1)(Y_2 - \mu_2)^T] = E[Y_1 - \mu_1] \, E[(Y_2 - \mu_2)^T] = 0 \, 0^T = 0

therefore \Sigma_{12} = 0.
Zero Covariance Implies Independence

Proof. Assume \Sigma_{12} = 0. Choose a block-diagonal

A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}

such that A_1 A_1^T = \Sigma_{11} and A_2 A_2^T = \Sigma_{22}. Partition

Z = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} \sim N\left( \begin{bmatrix} 0_1 \\ 0_2 \end{bmatrix}, \begin{bmatrix} I_1 & 0 \\ 0 & I_2 \end{bmatrix} \right) \quad \text{and} \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}

Then Y =_D A Z + \mu \sim N(\mu, \Sigma).
Continued

Proof.

\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} =_D \begin{bmatrix} A_1 Z_1 + \mu_1 \\ A_2 Z_2 + \mu_2 \end{bmatrix}

But Z_1 and Z_2 are independent.
Functions of Z_1 and of Z_2 are independent.
Therefore Y_1 and Y_2 are independent.

For the multivariate normal, zero covariance implies independence.
Another Useful Result

Corollary: If Y \sim N(\mu, \sigma^2 I_n) and A B^T = 0, then A Y and B Y are independent.

Proof.

\begin{bmatrix} W_1 \\ W_2 \end{bmatrix} = \begin{bmatrix} A \\ B \end{bmatrix} Y = \begin{bmatrix} A Y \\ B Y \end{bmatrix}

Cov(W_1, W_2) = Cov(A Y, B Y) = \sigma^2 A B^T

so A Y and B Y are independent if A B^T = 0.
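A simulation sketch of the corollary (Python/NumPy; the matrices A and B below are invented so that A B^T = 0): the sample cross-covariance of AY and BY vanishes up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Y ~ N(mu, sigma^2 I_4) with sigma = 1. The rows of A are orthogonal to the
# rows of B (A B^T = 0), so by the corollary AY and BY are independent.
A = np.array([[1.0, 1.0, 0.0, 0.0]])
B = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
assert np.allclose(A @ B.T, 0.0)      # the hypothesis of the corollary

mu = np.array([1.0, 0.0, -1.0, 2.0])
Y = mu + rng.standard_normal((200_000, 4))

W1 = Y @ A.T                          # draws of AY, shape (N, 1)
W2 = Y @ B.T                          # draws of BY, shape (N, 2)

# Sample cross-covariance should be near zero
cross = (W1 - W1.mean(0)).T @ (W2 - W2.mean(0)) / len(Y)
print(np.abs(cross).max())
```

Note that for the multivariate normal this zero covariance is equivalent to full independence, not merely uncorrelatedness.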
Distribution of \hat{\beta}

If Y \sim N(X\beta, \sigma^2 I_n), then \hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1}).
Unknown \sigma^2

\hat{\beta}_j \mid \beta_j, \sigma^2 \sim N(\beta_j, \sigma^2 [(X^T X)^{-1}]_{jj})

What happens if we substitute \hat{\sigma}^2 = e^T e / (n - r(X)) in the above?

\frac{(\hat{\beta}_j - \beta_j) / (\sigma \sqrt{[(X^T X)^{-1}]_{jj}})}{\sqrt{e^T e / (\sigma^2 (n - r(X)))}} =_D \frac{N(0, 1)}{\sqrt{\chi^2_{n - r(X)} / (n - r(X))}} \sim t(n - r(X), 0, 1)

We need to show that e^T e / \sigma^2 has a \chi^2 distribution and is independent of the numerator!
Central Student t Distribution

Definition: Let Z \sim N(0, 1) and S \sim \chi^2_p with Z and S independent. Then

W = \frac{Z}{\sqrt{S/p}}

has a (central) Student t distribution with p degrees of freedom.

See Casella & Berger or DeGroot & Schervish for the derivation, a nice change-of-variables and marginalization problem!
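As a sanity check on this definition, the closed-form density it implies can be compared against a library implementation (a Python/SciPy sketch; the course materials use R). The density with p degrees of freedom is f(w) = \Gamma((p+1)/2) / (\sqrt{p\pi}\, \Gamma(p/2)) \, (1 + w^2/p)^{-(p+1)/2}.

```python
import numpy as np
from math import gamma, pi, sqrt
from scipy import stats

def t_density(w, p):
    """Student t density with p degrees of freedom, from the closed form."""
    return gamma((p + 1) / 2) / (sqrt(p * pi) * gamma(p / 2)) \
        * (1 + w**2 / p) ** (-(p + 1) / 2)

grid = np.linspace(-4, 4, 9)
for p in (1, 5, 30):
    # Agrees pointwise with scipy's implementation
    assert np.allclose([t_density(w, p) for w in grid],
                       stats.t.pdf(grid, df=p))
print("t density matches scipy for p = 1, 5, 30")
```

For p = 1 this is the Cauchy density, and as p grows the density approaches the standard normal.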
Chi-Squared Distribution

Definition: If Z \sim N(0, 1), then Z^2 \sim \chi^2_1 (a chi-squared distribution with one degree of freedom).

Density:

f(x) = \frac{1}{\Gamma(1/2)} (1/2)^{1/2} x^{1/2 - 1} e^{-x/2}, \quad x > 0

Characteristic function:

E[e^{itZ^2}] = \phi(t) = (1 - 2it)^{-1/2}
Chi-Squared Distribution with p Degrees of Freedom

If Z_j iid N(0, 1), j = 1, ..., p, then X \equiv Z^T Z = \sum_{j=1}^p Z_j^2 \sim \chi^2_p.

Characteristic function:

\phi_X(t) = E[e^{it \sum_j Z_j^2}]
          = \prod_{j=1}^p E[e^{it Z_j^2}]
          = \prod_{j=1}^p (1 - 2it)^{-1/2}
          = (1 - 2it)^{-p/2}

This is a Gamma distribution with shape p/2 and rate 1/2, G(p/2, 1/2):

f(x) = \frac{1}{\Gamma(p/2)} (1/2)^{p/2} x^{p/2 - 1} e^{-x/2}, \quad x > 0
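The Gamma identification can be verified directly against a library's parameterization (a Python/SciPy sketch; SciPy's gamma takes a scale, so rate 1/2 means scale = 2), together with a seeded Monte Carlo check that Z^T Z has the \chi^2_p mean p and variance 2p.

```python
import numpy as np
from scipy import stats

# chi^2_p and Gamma(shape p/2, rate 1/2) have the same density pointwise.
x = np.linspace(0.1, 20, 50)
for p in (1, 4, 9):
    assert np.allclose(stats.chi2.pdf(x, df=p),
                       stats.gamma.pdf(x, a=p / 2, scale=2))

# Direct check that Z^T Z ~ chi^2_p: mean p, variance 2p (here p = 5)
rng = np.random.default_rng(2)
Z = rng.standard_normal((100_000, 5))
X = (Z**2).sum(axis=1)
print(X.mean(), X.var())   # close to 5 and 10
```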
Quadratic Forms

Theorem: Let Y \sim N(\mu, \sigma^2 I_n) with \mu \in C(X). If Q is a rank k orthogonal projection onto C(X)^\perp, then (Y^T Q Y) / \sigma^2 \sim \chi^2_k.

Proof. For an orthogonal projection, Q = U \Lambda U^T = U_k U_k^T where C(Q) = C(U_k) and U_k^T U_k = I_k (Spectral Theorem).

Y^T Q Y = Y^T U_k U_k^T Y

Z \equiv U_k^T Y / \sigma \sim N(U_k^T \mu / \sigma, U_k^T U_k)

Since \mu \in C(X) and the columns of U_k span C(X)^\perp, U_k^T \mu = 0, so Z \sim N(0, I_k)

Z^T Z \sim \chi^2_k

Since U_k^T Y / \sigma =_D Z, we have Y^T Q Y / \sigma^2 \sim \chi^2_k.
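A simulation sketch of the theorem (Python/NumPy; the design matrix X and the coefficients are generated arbitrarily for illustration): with Q = I - P_X projecting onto the orthogonal complement of C(X) and \mu = X\beta in C(X), the quadratic form Y^T Q Y / \sigma^2 shows the \chi^2 mean and variance with k = n - r(X) degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 20, 3
X = rng.standard_normal((n, p))            # arbitrary full-rank design
P = X @ np.linalg.inv(X.T @ X) @ X.T       # projection onto C(X)
Q = np.eye(n) - P                          # rank n - p projection onto C(X)-perp

beta = np.array([1.0, -2.0, 0.5])
mu = X @ beta                              # mu lies in C(X), so Q mu = 0
sigma = 1.5

N = 50_000
Y = mu + sigma * rng.standard_normal((N, n))          # N draws of Y
draws = ((Y @ Q) * Y).sum(axis=1) / sigma**2          # Y^T Q Y / sigma^2

k = n - p
print(draws.mean(), draws.var())   # approximately k and 2k
```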
Residual Sum of Squares

Example (Sum of Squares Error, SSE): Let Y \sim N(\mu, \sigma^2 I_n) with \mu \in C(X). Because \mu \in C(X), I - P_X is a rank n - r(X) orthogonal projection onto C(X)^\perp, so by the theorem

\frac{e^T e}{\sigma^2} = \frac{Y^T (I_n - P_X) Y}{\sigma^2} \sim \chi^2_{n - r(X)}
Estimated Coefficients and Residuals are Independent

If Y \sim N(X\beta, \sigma^2 I_n), then Cov(\hat{\beta}, e) = 0, which under joint normality implies independence. Functions of independent random variables are independent (show that the characteristic functions or densities factor).
Putting it all together

\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})

(\hat{\beta}_j - \beta_j) / (\sigma \sqrt{[(X^T X)^{-1}]_{jj}}) \sim N(0, 1)

e^T e / \sigma^2 \sim \chi^2_{n - r(X)}

\hat{\beta} and e are independent

\frac{(\hat{\beta}_j - \beta_j) / (\sigma \sqrt{[(X^T X)^{-1}]_{jj}})}{\sqrt{e^T e / (\sigma^2 (n - r(X)))}} \sim t(n - r(X), 0, 1)
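These pieces can be assembled by hand on simulated data (a Python/NumPy/SciPy sketch rather than R's lm(); the design and the true \beta are invented): compute \hat{\beta}, e, \hat{\sigma}^2, SE(\hat{\beta}_j), and the resulting t statistic, and check the orthogonality X^T e = 0 that drives the independence argument.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([2.0, 1.0, 0.0])
Y = X @ beta + rng.standard_normal(n)        # sigma = 1

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y                 # least squares estimate
e = Y - X @ beta_hat                         # residuals
df = n - p                                   # n - r(X)
sigma2_hat = e @ e / df                      # unbiased estimate of sigma^2

j = 1
se_j = np.sqrt(sigma2_hat * XtX_inv[j, j])   # SE(beta_hat_j)
t_j = beta_hat[j] / se_j                     # test of H0: beta_j = 0
p_val = 2 * stats.t.sf(abs(t_j), df=df)      # two-sided p-value

# Residuals are orthogonal to the columns of X by construction
print(np.abs(X.T @ e).max())                 # numerically zero
```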
Inference

95% confidence interval: \hat{\beta}_j \pm t_{\alpha/2} SE(\hat{\beta}_j)

Use qt(a, df) for the t_a quantile.
Derive from the pivotal quantity t = (\hat{\beta}_j - \beta_j) / SE(\hat{\beta}_j), where P(t \in (t_{\alpha/2}, t_{1-\alpha/2})) = 1 - \alpha.
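The same computation outside R might look like this (a Python/SciPy sketch, where stats.t.ppf plays the role of qt; the estimate and standard error below are hypothetical, just to show the arithmetic):

```python
from scipy import stats

# With 88 residual degrees of freedom, as in the prostate fit,
# the 95% interval multiplier is t_{0.975, 88}
df = 88
t_crit = stats.t.ppf(0.975, df)              # analogue of qt(0.975, 88) in R
print(round(t_crit, 3))                      # about 1.987

# Hypothetical estimate and standard error:
beta_hat_j, se_j = 0.5, 0.1
ci = (beta_hat_j - t_crit * se_j, beta_hat_j + t_crit * se_j)
print(ci)
```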
Prostate Example

xtable(confint(prostate.lm))   # using library(MASS) and library(xtable)

The table lists the 2.5% and 97.5% confidence limits for (Intercept), lcavol, lweight, age, lbph, svi, lcp, gleason, and pgg45; the numeric values were lost in transcription.
Interpretation

For a 1 unit increase in X_j, expect Y to increase by \hat{\beta}_j \pm t_{\alpha/2} SE(\hat{\beta}_j).

For log transforms, Y = \exp(X\beta + \epsilon) = \exp(x_j \beta_j) \exp(\epsilon); if X_j = \log(W_j), then look at 2-fold or percent increases in W_j to get a multiplicative increase in the median of Y.

If cavol increases by 10%, then we expect PSA to increase by a factor of 1.10^{CI} = (1.0398, 1.0751), or by 3.98 to 7.51 percent.

For a 10% increase in cancer volume, we are 95% confident that PSA levels will increase by approximately 4 to 7.5 percent.
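The back-transformation arithmetic can be spelled out (a Python sketch; the interval for the log cancer volume coefficient below is a hypothetical stand-in, since the fitted values are not reproduced here): a 10% increase in volume multiplies the median PSA by 1.10 raised to the coefficient.

```python
import numpy as np

ci_beta = (0.41, 0.76)     # hypothetical 95% CI for the lcavol coefficient
factor_lo, factor_hi = 1.10 ** np.array(ci_beta)

# Convert the multiplicative factors to percent increases in median PSA
pct_lo = 100 * (factor_lo - 1)
pct_hi = 100 * (factor_hi - 1)
print(round(pct_lo, 2), round(pct_hi, 2))   # percent increase in median PSA
```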
Derivation
More informationSimple Linear Regression
Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent
More informationThis model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that
Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear
More informationLecture 11: Regression Methods I (Linear Regression)
Lecture 11: Regression Methods I (Linear Regression) 1 / 43 Outline 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear Regression Ordinary Least Squares Statistical
More information6. Multiple Linear Regression
6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X
More informationHigh-dimensional data analysis
High-dimensional data analysis HW3 Reproduce Figure 3.8 3.10 and Table 3.3. (do not need PCR PLS Std Error) Figure 3.8 There is Profiles of ridge coefficients for the prostate cancer example, as the tuning
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationSTAT 350. Assignment 6 Solutions
STAT 350 Assignment 6 Solutions 1. For the Nitrogen Output in Wallabies data set from Assignment 3 do forward, backward, stepwise all subsets regression. Here is code for all the methods with all subsets
More information4 Multiple Linear Regression
4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ
More informationSTAT5044: Regression and Anova. Inyoung Kim
STAT5044: Regression and Anova Inyoung Kim 2 / 51 Outline 1 Matrix Expression 2 Linear and quadratic forms 3 Properties of quadratic form 4 Properties of estimates 5 Distributional properties 3 / 51 Matrix
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationRegression #5: Confidence Intervals and Hypothesis Testing (Part 1)
Regression #5: Confidence Intervals and Hypothesis Testing (Part 1) Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #5 1 / 24 Introduction What is a confidence interval? To fix ideas, suppose
More informationBayes Estimators & Ridge Regression
Readings Chapter 14 Christensen Merlise Clyde September 29, 2015 How Good are Estimators? Quadratic loss for estimating β using estimator a L(β, a) = (β a) T (β a) How Good are Estimators? Quadratic loss
More informationRegression and Statistical Inference
Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF
More informationRandom Vectors and Multivariate Normal Distributions
Chapter 3 Random Vectors and Multivariate Normal Distributions 3.1 Random vectors Definition 3.1.1. Random vector. Random vectors are vectors of random 75 variables. For instance, X = X 1 X 2., where each
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationChapter 3: Multiple Regression. August 14, 2018
Chapter 3: Multiple Regression August 14, 2018 1 The multiple linear regression model The model y = β 0 +β 1 x 1 + +β k x k +ǫ (1) is called a multiple linear regression model with k regressors. The parametersβ
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationBayesian Linear Models
Eric F. Lock UMN Division of Biostatistics, SPH elock@umn.edu 03/07/2018 Linear model For observations y 1,..., y n, the basic linear model is y i = x 1i β 1 +... + x pi β p + ɛ i, x 1i,..., x pi are predictors
More informationMath 3330: Solution to midterm Exam
Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the
More informationDealing with Heteroskedasticity
Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing
More information