BIO752: Advanced Methods in Biostatistics, II TERM 2, 2010
T. A. Louis

BIO 752: MIDTERM EXAMINATION: ANSWERS
30 November 2010

Question #1 (15 points): Let $X$ and $Y$ be random variables with a joint distribution and assume that $V(X) = V(Y) < \infty$. Let $S = X + Y$ and $D = X - Y$.

(a) Show that $\mathrm{cov}(S, D) = 0$.

$\mathrm{cov}(S, D) = \mathrm{cov}(X + Y, X - Y) = \mathrm{cov}(X, X) - \mathrm{cov}(X, Y) + \mathrm{cov}(X, Y) - \mathrm{cov}(Y, Y) = V(X) - V(Y) = 0$

(b) Give an example of an $(X, Y)$ pair for which $\mathrm{cov}(S, D) = 0$, but $S$ and $D$ are dependent.

Let $Z \sim N(0, 1)$ and set $D = Z$, $S = Z^2$. So, $\mathrm{cov}(S, D) = E(Z^3) = 0$, but $(S, D)$ are completely dependent. Working back, $X = (S + D)/2 = (Z^2 + Z)/2$, $Y = (S - D)/2 = (Z^2 - Z)/2$, and $V(X) = V(Y) = (1 + 2)/4 = 3/4$.

Note: When studying the relation between variables with a symmetric role, for example the IQs of twins, analyzing $D$ versus $S$ (plotting and regressing) avoids arbitrarily treating one twin as the predictor. It is a poor person's principal component analysis. In the twins application, you usually find a slope of 0, but with the spread of $D$ biggest in the middle range of IQs, narrowing at each end. This suggests that environmental factors have more influence in the middle range than at either end of the IQ scale.
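The example in part (b) can be checked numerically. The following is a minimal simulation sketch, not part of the original answer key; the seed, the sample size, and the helper `cov` are arbitrary choices made for illustration. It draws $Z \sim N(0,1)$, forms $S = Z^2$ and $D = Z$, and compares the sample covariance of $(S, D)$ with the sample covariance of $(S, D^2)$, which should be far from zero if $S$ and $D$ are dependent.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


rng = random.Random(752)            # arbitrary seed, chosen for reproducibility
z = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

d = z                               # D = Z
s = [v * v for v in z]              # S = Z^2
x = [(si + di) / 2 for si, di in zip(s, d)]   # X = (S + D)/2
y = [(si - di) / 2 for si, di in zip(s, d)]   # Y = (S - D)/2

cov_sd = cov(s, d)                         # estimates E(Z^3) = 0
cov_s_d2 = cov(s, [v * v for v in d])      # estimates V(Z^2) = 2
var_x, var_y = cov(x, x), cov(y, y)        # both estimate 3/4

print(f"cov(S, D)   = {cov_sd:+.3f}")
print(f"cov(S, D^2) = {cov_s_d2:+.3f}")
print(f"V(X) = {var_x:.3f}, V(Y) = {var_y:.3f}")
```

A covariance near zero for $(S, D)$ together with a clearly nonzero covariance for $(S, D^2)$ illustrates that zero covariance rules out only linear association.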
Question #2 (25 points): Assume that you have data and plan to conduct the regression,

$Y_i = \alpha_0 + \alpha_1 X_{i1} + \alpha_2 X_{i2} + e_i, \quad i = 1, \ldots, 15.$

(a) In the space below draw a scatterplot of 15 design points, $(X_{i1}, X_{i2})$. Include and identify,
i. One that is high leverage for $\alpha_1$, but not for $\alpha_2$
ii. One that is high leverage for both $\alpha_1$ and $\alpha_2$
iii. One that is high leverage for neither $\alpha_1$ nor $\alpha_2$

(b) Justify your identifications.

[Figure: DESIGN SPACE. Scatterplot of the 15 design points with X1 on the horizontal axis and X2 on the vertical axis, both running from -2 to 2. Legend: square = alpha1 high leverage; triangle = (alpha1, alpha2) high leverage; solid = high leverage for neither.]

The square point is an outlier for $X_1$, but not $X_2$, and so is high leverage for $\alpha_1$.
The triangle point is an outlier for $X_1$ and $X_2$ and so is high leverage for both.
The solid point is dead center and so is high leverage for neither.
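The justification above can be illustrated with a small computation. This is a sketch under stated assumptions: the 15 coordinates below are hypothetical (the original plot's points are not recoverable), the helper names `hat_diagonal` and `z_scores` are made up for this example, and the $2p/n$ cutoff mentioned in the comments is a common rule of thumb rather than something the answer key uses. The code computes the hat-matrix diagonal for a fit with intercept, $X_1$ and $X_2$, and the coordinate-wise z-scores that show in which direction each marked point is outlying.

```python
from math import sqrt

# Hypothetical design: 12 background points near the center,
# plus the three marked points named in the figure legend.
background = [(-0.5, -0.5), (-0.5, 0.0), (-0.5, 0.5), (0.0, -0.5),
              (0.0, 0.5), (0.5, -0.5), (0.5, 0.0), (0.5, 0.5),
              (-0.25, 0.25), (0.25, -0.25), (-0.25, -0.25), (0.25, 0.25)]
marked = {"square": (2.0, 0.0), "triangle": (2.0, 2.0), "solid": (0.0, 0.0)}
points = background + list(marked.values())


def hat_diagonal(pts):
    """Diagonal of the hat matrix for a model with intercept, X1 and X2.

    Uses h_ii = 1/n + d_i' S^{-1} d_i, where d_i is the centered design
    point and S is the 2x2 centered cross-product matrix.
    """
    n = len(pts)
    m1 = sum(p[0] for p in pts) / n
    m2 = sum(p[1] for p in pts) / n
    d = [(p[0] - m1, p[1] - m2) for p in pts]
    s11 = sum(a * a for a, _ in d)
    s22 = sum(b * b for _, b in d)
    s12 = sum(a * b for a, b in d)
    det = s11 * s22 - s12 ** 2
    return [1.0 / n + (s22 * a * a - 2 * s12 * a * b + s11 * b * b) / det
            for a, b in d]


def z_scores(values):
    """Standardize one coordinate using its sample mean and SD."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


h = hat_diagonal(points)
z1 = z_scores([p[0] for p in points])
z2 = z_scores([p[1] for p in points])

# Rule-of-thumb cutoff (an assumption): 2p/n with p = 3 coefficients.
cutoff = 2 * 3 / len(points)
for offset, name in enumerate(marked):
    i = len(background) + offset
    print(f"{name:8s} h={h[i]:.3f} (cutoff {cutoff:.2f}) "
          f"z1={z1[i]:+.2f} z2={z2[i]:+.2f}")
```

In this made-up design the square and triangle points exceed the cutoff while the solid point does not; the z-scores show that the square point is extreme only in $X_1$ and the triangle point is extreme in both coordinates.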
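Question #3 (30 points): You estimate $\theta$ using $\hat\theta(c) = (1 - c)\bar{X}$, when $[X_1, \ldots, X_n \mid \theta]$ are iid with mean $\theta$ and variance $\sigma^2$.

(a) Find $\mathrm{MSE}(\theta, c, \sigma^2, n) = E[(\hat\theta(c) - \theta)^2 \mid \theta, c, \sigma^2, n]$.

$\mathrm{MSE}(\theta, c, \sigma^2, n) = c^2\theta^2 + (1 - c)^2 \frac{\sigma^2}{n}$

(b) Find $\sup_{-\infty < \theta < \infty} \mathrm{MSE}(\theta, c, \sigma^2, n)$. For what value(s) of $c$ is this finite?

$\sup = \infty$ unless $c = 0$, in which case it equals $\sigma^2/n$.

(c) Index $c$ by $n$ (i.e., $c_n$). Find a necessary and sufficient condition on $c_n$ so that for any $-\infty < \theta < \infty$, as $n \to \infty$, $\mathrm{MSE}(\theta, c_n, \sigma^2, n) \to 0$ (i.e., $\hat\theta(c_n)$ is $L_2$ consistent).

$c_n \to 0$ is necessary and sufficient, as is clear from the answer in part (a).

(d) Find the value of $c$ that minimizes $\mathrm{MSE}(\theta, c, \sigma^2, n)$. Denote it by $c_n(\theta, \sigma^2)$.

$c_n(\theta, \sigma^2) = \frac{\sigma^2}{n\theta^2 + \sigma^2}$

Note that the optimal $c$ is a function of $\theta/\sigma$.

(e) For $0 < A < \infty$, find

$\mathrm{MSE}^*(A, \sigma^2, n) = \inf_c \sup_{\theta \in (-A, A)} \mathrm{MSE}(\theta, c, \sigma^2, n)$

and find $c_n^*(A, \sigma^2)$, the value of $c$ that produces $\mathrm{MSE}^*$.

$\mathrm{MSE}^*(A, \sigma^2, n) = \mathrm{MSE}(A, c_n(A, \sigma^2), \sigma^2, n) = \frac{\sigma^2 A^2}{nA^2 + \sigma^2} = \frac{\sigma^2}{n}\left(\frac{nA^2}{nA^2 + \sigma^2}\right) < \frac{\sigma^2}{n}$

$c_n^*(A, \sigma^2) = c_n(A, \sigma^2)$

(f) Evaluate $\lim_{A \to \infty} \mathrm{MSE}^*(A, \sigma^2, n)$ and $\lim_{A \to \infty} c_n^*(A, \sigma^2)$.

$\mathrm{MSE}^* \to \sigma^2/n$; $c_n^* \to 0$.

i. Briefly discuss what this implies about selecting $c$.

If you want to control the MSE for absolutely all $\theta$, you must set $c = 0$ and use the unbiased estimate. If, however, you are willing/able to constrain the possible values of $\theta$ to a finite interval, you can shrink towards 0 and reduce the MSE.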
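The formulas in parts (a), (d), (e) and (f) can be checked numerically. The sketch below is illustrative only: the function names and the values of $\sigma^2$, $n$ and $A$ are hypothetical choices, not part of the exam. It evaluates the MSE of $(1 - c)\bar{X}$, compares the closed-form minimizer with a grid search over $c$, and shows the minimax MSE approaching $\sigma^2/n$ as $A$ grows.

```python
def mse(theta, c, sigma2, n):
    """MSE of (1 - c) * Xbar: squared bias plus variance."""
    return c ** 2 * theta ** 2 + (1 - c) ** 2 * sigma2 / n


def c_opt(theta, sigma2, n):
    """Closed-form minimizer of the MSE over c, from part (d)."""
    return sigma2 / (n * theta ** 2 + sigma2)


def minimax_mse(A, sigma2, n):
    """Minimax MSE over theta in (-A, A), from part (e)."""
    return sigma2 * A ** 2 / (n * A ** 2 + sigma2)


# Hypothetical values chosen only for illustration.
sigma2, n, A = 4.0, 10, 1.5

# Grid search over c in [0, 1] at the worst-case boundary |theta| = A.
grid = [k / 10_000 for k in range(10_001)]
c_grid = min(grid, key=lambda c: mse(A, c, sigma2, n))
c_star = c_opt(A, sigma2, n)

print(f"closed-form c* = {c_star:.4f}, grid-search c* = {c_grid:.4f}")
print(f"minimax MSE    = {minimax_mse(A, sigma2, n):.4f}")
print(f"unbiased MSE   = {sigma2 / n:.4f}")
for big_a in (10.0, 100.0, 1000.0):
    print(f"A = {big_a:6.0f}: MSE* = {minimax_mse(big_a, sigma2, n):.6f}, "
          f"c* = {c_opt(big_a, sigma2, n):.6f}")
```

The grid search agrees with the closed form to the grid resolution, and the last loop shows the shrinkage factor tending to 0 as the constraint on $\theta$ is relaxed.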
Question #4 (30 points): You are to analyze blood pressure data ($Y_i$, $i = 1, \ldots, n$) with primary interest in the effect of a treatment. There are two explanatory variables, $T_i = \{0 \text{ or } 1\}$ according as participant $i$ is in the control or treatment group and $D_i = \{0 \text{ or } 1\}$ according as the participant does not or does have diabetes.

(a) Write down a linear model that includes an intercept, the main effect for treatment and the main effect for diabetes, but no interaction term.

$Y_i = \mu + \alpha T_i + \beta D_i + e_i, \quad e_i \sim (0, \sigma^2)$

(b) Interpret the coefficients. Does your interpretation depend on whether treatment assignment was randomized or the study was observational? If so, how; if not, why not?

$\mu$ is the expected BP for non-diabetics in the control group.
$\alpha$ is the expected increment in BP associated with treatment group membership. This is the treatment effect for both diabetics and non-diabetics.
$\beta$ is the expected increment in BP associated with being diabetic.
$\mu + \alpha + \beta$ is the expected BP for diabetics in the treatment group, etc.

Interpretation of the treatment effect ($\alpha$) does depend on whether or not the study is randomized. If randomized, then $T$, $D$ and other attributes not (yet) in the model are statistically independent, and $\alpha$ can be interpreted in a causal manner as the expected change induced by treatment. If the study is not randomized, then $\alpha$ is the expected change associated with treatment, but it is potentially confounded by imbalance in unmeasured or yet to be included covariates. If the effect of diabetes is truly additive, then the treatment effect will not be confounded by it, but if the diabetes effect is not additive, there can still be confounding by diabetes status.

(c) Augment the model in part (a) by a treatment-by-diabetes interaction.

$Y_i = \mu + \alpha T_i + \beta D_i + \gamma T_i D_i + e_i, \quad e_i \sim (0, \sigma^2)$

i. What are the treatment effects for diabetics and for non-diabetics?

Treatment effects:
Non-diabetics: $\alpha$
Diabetics: $\alpha + \gamma$

ii. For what relation between $\alpha$ and $\gamma$ do the treatment effects for diabetics and non-diabetics have the same sign?

So long as $|\gamma| < |\alpha|$ the treatment effects will have the same sign (more generally, whenever $\alpha(\alpha + \gamma) > 0$).
This implies that even with a significant treatment-by-diabetes interaction, there can be a consistent recommendation for both groups.

(d) There are $4\,(= 2 \times 2)$ different types of participants, $(T, D) = \{(00), (01), (10), (11)\}$.

i. Logic regression: Write down the linear model that uses indicators for these Boolean expressions as regressors.

With $I_i(td) = I_{\{(T_i, D_i) = (t, d)\}}$,

$Y_i = \delta_{00} I_i(00) + \delta_{01} I_i(01) + \delta_{10} I_i(10) + \delta_{11} I_i(11) + e_i$

ii. Interpret the slopes.

All terms play the same role; each slope measures the expectation of $Y$ conditional on the scenario, not as compared to some other scenario.
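iii. What constraints on these slopes would produce the model in part (a)?

Scenario   Part (a) model            Logic regression                Constraint
00         $\mu$                     $\delta_{00}$                   $\delta_{00} = \mu$
01         $\mu + \beta$             $\delta_{00} + \delta_{01}$     $\delta_{01} = \beta$
10         $\mu + \alpha$            $\delta_{00} + \delta_{10}$     $\delta_{10} = \alpha$
11         $\mu + \alpha + \beta$    $\delta_{00} + \delta_{11}$     $\delta_{11} = \delta_{01} + \delta_{10}\ (= \alpha + \beta)$

Note that this table treats $\delta_{00}$ as a baseline, so $\delta_{01}$, $\delta_{10}$ and $\delta_{11}$ are increments relative to scenario 00. If instead each $\delta_{td}$ is the expected response in its own scenario, as in part (d)(i), the equivalent single constraint is $\delta_{11} - \delta_{10} = \delta_{01} - \delta_{00}$.

(e) NOT ON EXAM: Briefly discuss the advantages and disadvantages of the standard and logic modeling approaches.

Logic regression has the potential to be more parsimonious in that it automatically includes the joint action of predictors. This feature becomes more important as the dimension of the regressors increases, especially in applications where the statistical relations are dominated by interactions and joint actions. For example, in genomics, it is usually the joint activity of genes or SNPs that matters; main effects have a relatively smaller role. There is no absolute here; both approaches can be effective, and each should be tried to see which provides a better fit relative to the degrees of freedom used.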
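The link between the coefficient form in parts (a) and (c) and the scenario means in part (d) can be sketched in a few lines. This is an illustration under assumptions: the function names and the numerical values of $\mu$, $\alpha$, $\beta$ and $\gamma$ are hypothetical. The code maps the coefficients to the four expected responses, recovers the two treatment effects, and computes the interaction contrast that must vanish for the additive model to hold.

```python
def cell_means(mu, alpha, beta, gamma=0.0):
    """Expected response in each (T, D) scenario under the part (c) model.

    With gamma = 0 this reduces to the additive model of part (a).
    """
    return {(t, d): mu + alpha * t + beta * d + gamma * t * d
            for t in (0, 1) for d in (0, 1)}


def treatment_effects(delta):
    """Treatment effect (T = 1 minus T = 0) for non-diabetics, then diabetics."""
    return delta[(1, 0)] - delta[(0, 0)], delta[(1, 1)] - delta[(0, 1)]


def interaction_contrast(delta):
    """Difference of differences; zero exactly when the model is additive."""
    return delta[(1, 1)] - delta[(1, 0)] - delta[(0, 1)] + delta[(0, 0)]


# Hypothetical coefficient values, chosen only for illustration.
additive = cell_means(mu=120.0, alpha=-8.0, beta=10.0)
interact = cell_means(mu=120.0, alpha=-8.0, beta=10.0, gamma=3.0)

for label, delta in (("additive", additive), ("interaction", interact)):
    non_diab, diab = treatment_effects(delta)
    same_sign = non_diab * diab > 0
    print(f"{label:12s} effects: non-diabetics {non_diab:+.1f}, "
          f"diabetics {diab:+.1f}, same sign: {same_sign}, "
          f"contrast: {interaction_contrast(delta):+.1f}")
```

With these made-up values the interaction model has $|\gamma| < |\alpha|$, so both treatment effects are negative even though the contrast is nonzero, which matches the remark that an interaction need not change the recommendation.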