BIO752: Advanced Methods in Biostatistics, II TERM 2, 2010 T. A. Louis. BIO 752: MIDTERM EXAMINATION: ANSWERS 30 November 2010

Question #1 (15 points): Let $X$ and $Y$ be random variables with a joint distribution and assume that $V(X) = V(Y) < \infty$. Let $S = X + Y$ and $D = X - Y$.

(a) Show that $\mathrm{cov}(S, D) = 0$.

$\mathrm{cov}(S, D) = \mathrm{cov}(X + Y, X - Y) = \mathrm{cov}(X, X) - \mathrm{cov}(X, Y) + \mathrm{cov}(X, Y) - \mathrm{cov}(Y, Y) = V(X) - V(Y) = 0$

(b) Give an example of an $(X, Y)$ pair for which $\mathrm{cov}(S, D) = 0$, but $S$ and $D$ are dependent.

Let $Z \sim N(0, 1)$ and set $D = Z$, $S = Z^2$. So, $\mathrm{cov}(S, D) = E(Z^3) = 0$, but $(S, D)$ are completely dependent. Working back, $X = (S + D)/2 = (Z^2 + Z)/2$, $Y = (S - D)/2 = (Z^2 - Z)/2$, and $V(X) = V(Y) = (1 + 2)/4 = 3/4$, using $V(Z) = 1$, $V(Z^2) = 2$ and $\mathrm{cov}(Z, Z^2) = 0$.

Note: When studying the relation between variables with a symmetric role, for example the IQs of twins, analyzing $D$ versus $S$ (plotting and regressing) avoids arbitrarily treating one twin as the predictor. It is a poor person's principal component analysis. In the twins application, you usually find a slope of 0, but with the spread of $D$ biggest in the middle range of IQs, narrowing at each end. This suggests that environmental factors have more influence in the middle range than at either end of the IQ scale.
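The counterexample in part (b) can be checked by simulation. The following is a minimal sketch, not part of the exam answer: the helper `sample_cov`, the seed and the sample size are illustrative choices. It shows that the sample covariance of $S = Z^2$ and $D = Z$ is close to 0, while the covariance of $S$ with $D^2$ is clearly nonzero, which exposes the dependence.

```python
import random

def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

random.seed(752)  # arbitrary seed, chosen only for reproducibility
z = [random.gauss(0.0, 1.0) for _ in range(200_000)]

d = z                       # D = Z
s = [v * v for v in z]      # S = Z^2
d_sq = [v * v for v in d]   # D^2, a function of D alone

cov_sd = sample_cov(s, d)         # estimates E(Z^3) = 0
cov_s_dsq = sample_cov(s, d_sq)   # estimates V(Z^2) = 2, so S depends on D

# Work back to X and Y and check that their variances agree.
x = [(si + di) / 2 for si, di in zip(s, d)]
y = [(si - di) / 2 for si, di in zip(s, d)]
var_x = sample_cov(x, x)   # theoretical value 3/4
var_y = sample_cov(y, y)   # theoretical value 3/4

print(cov_sd, cov_s_dsq, var_x, var_y)
```

Zero covariance only rules out a linear relation; the second covariance picks up the quadratic one.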

Question #2 (25 points): Assume that you have data and plan to conduct the regression,

$Y_i = \alpha_0 + \alpha_1 X_{i1} + \alpha_2 X_{i2} + e_i, \quad i = 1, \ldots, 15.$

(a) In the space below draw a scatterplot of 15 design points, $(X_{i1}, X_{i2})$. Include and identify,
i. One that is high leverage for $\alpha_1$, but not for $\alpha_2$
ii. One that is high leverage for both $\alpha_1$ and $\alpha_2$
iii. One that is high leverage for neither $\alpha_1$ nor $\alpha_2$

(b) Justify your identifications.

[Figure: DESIGN SPACE. Scatterplot of the 15 design points, X1 on the horizontal axis and X2 on the vertical axis, each running from -2 to 2. Legend: square = alpha1 high leverage; triangle = (alpha1, alpha2) high leverage; solid = high leverage for neither.]

The square point is an outlier for $X_1$, but not $X_2$, and so is high leverage for $\alpha_1$.
The triangle point is an outlier for $X_1$ and $X_2$ and so is high leverage for both.
The solid point is dead center and so is high leverage for neither.
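The leverage argument can be illustrated numerically with the hat-matrix diagonal. This is a sketch under stated assumptions: the 15 design points below are hypothetical (the exam figure's coordinates are not recoverable), and the hat value measures overall leverage rather than leverage for a single coefficient. For a regression with an intercept and two predictors, $h_{ii} = 1/n + (x_i - \bar{x})^T S^{-1} (x_i - \bar{x})$, where $S$ is the centered cross-product matrix.

```python
def hat_values(points):
    """Hat-matrix diagonal for a regression on an intercept and two predictors."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    # Centered sums of squares and cross-products (the 2x2 matrix S).
    s11 = sum((p[0] - m1) ** 2 for p in points)
    s22 = sum((p[1] - m2) ** 2 for p in points)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points)
    det = s11 * s22 - s12 ** 2
    values = []
    for x1, x2 in points:
        d1, d2 = x1 - m1, x2 - m2
        # Quadratic form d' S^{-1} d using the closed-form 2x2 inverse.
        q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
        values.append(1 / n + q)
    return values

# Hypothetical design: 12 ordinary points plus the three labelled ones.
background = [(-1, -1), (-1, 1), (1, -1), (1, 1),
              (-1, 0), (1, 0), (0, -1), (0, 1),
              (-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
square = (2, 0)     # outlier in X1 only
triangle = (2, 2)   # outlier in both X1 and X2
solid = (0, 0)      # near the center of the design

design = background + [square, triangle, solid]
h = hat_values(design)
h_square, h_triangle, h_solid = h[-3], h[-2], h[-1]
print(h_square, h_triangle, h_solid, sum(h))
```

The hat values sum to 3, the number of regression coefficients, so a point with $h_{ii}$ well above $3/15$ stands out as high leverage.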

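Question #3 (30 points): You estimate $\theta$ using $\hat{\theta}(c) = (1 - c)\bar{X}$, when $[X_1, \ldots, X_n \mid \theta]$ are iid with mean $\theta$ and variance $\sigma^2$.

(a) Find $MSE(\theta, c, \sigma^2, n) = E[(\hat{\theta}(c) - \theta)^2 \mid \theta, c, \sigma^2, n]$.

$MSE(\theta, c, \sigma^2, n) = c^2\theta^2 + (1 - c)^2 \frac{\sigma^2}{n}$

(b) Find $\sup_{-\infty < \theta < \infty} MSE(\theta, c, \sigma^2, n)$. For what value(s) of $c$ is this finite?

$\sup = \infty$ unless $c = 0$, in which case it equals $\sigma^2/n$.

(c) Index $c$ by $n$ (i.e., $c_n$). Find a necessary and sufficient condition on $c_n$ so that for any $-\infty < \theta < \infty$, as $n \to \infty$, $MSE(\theta, c_n, \sigma^2, n) \to 0$ (i.e., $\hat{\theta}(c_n)$ is $L_2$ consistent).

$c_n \to 0$ is necessary and sufficient, as is clear from the answer in part (a).

(d) Find the value of $c$ that minimizes $MSE(\theta, c, \sigma^2, n)$. Denote it by $c^*(\theta, \sigma^2)$.

$c^*(\theta, \sigma^2) = \frac{\sigma^2}{n\theta^2 + \sigma^2}$

Note that the optimal $c$ is a function of $\theta/\sigma$.

(e) For $0 < A < \infty$, find

$MSE^*(A, \sigma^2, n) = \inf_c \sup_{\theta \in (-A, A)} MSE(\theta, c, \sigma^2, n)$

and find $c^{**}(A, \sigma^2)$, the value of $c$ that produces $MSE^*$.

$MSE^*(A, \sigma^2, n) = MSE(A, c^*(A, \sigma^2), \sigma^2, n) = \frac{\sigma^2 A^2}{nA^2 + \sigma^2} = \frac{\sigma^2}{n}\left(\frac{nA^2}{nA^2 + \sigma^2}\right) < \frac{\sigma^2}{n}$

$c^{**}(A, \sigma^2) = c^*(A, \sigma^2)$

(f) Evaluate $\lim_{A \to \infty} MSE^*(A, \sigma^2, n)$ and $\lim_{A \to \infty} c^{**}(A, \sigma^2)$.

$MSE^* \to \sigma^2/n$; $c^{**} \to 0$.

i. Briefly discuss what this implies about selecting $c$.

If you want to control the MSE for absolutely all $\theta$, you must set $c = 0$ and use the unbiased estimate. If, however, you are willing/able to constrain the possible values of $\theta$ to a finite interval, you can shrink towards 0 and reduce the MSE.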
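The formulas in parts (a), (d) and (e) can be checked numerically. The sketch below is illustrative only: the function names and the values $A = 2$, $\sigma^2 = 4$, $n = 10$ are assumptions, not taken from the exam. It compares the closed-form shrinkage constant with a grid search over $c$ at the worst case $\theta = A$.

```python
def mse(theta, c, sigma2, n):
    """MSE of (1 - c) * Xbar: squared bias plus variance."""
    return c ** 2 * theta ** 2 + (1 - c) ** 2 * sigma2 / n

def c_opt(theta, sigma2, n):
    """Value of c that minimizes the MSE at a given theta."""
    return sigma2 / (n * theta ** 2 + sigma2)

def minimax_mse(A, sigma2, n):
    """Closed form for inf over c of sup over |theta| < A of the MSE."""
    return sigma2 * A ** 2 / (n * A ** 2 + sigma2)

# Hypothetical values chosen only for illustration.
A, sigma2, n = 2.0, 4.0, 10

c_star = c_opt(A, sigma2, n)

# Grid search over c in [0, 1] at the worst-case theta = A.
grid = [i / 10000 for i in range(10001)]
c_grid = min(grid, key=lambda c: mse(A, c, sigma2, n))

print(c_star, c_grid)
print(mse(A, c_star, sigma2, n), minimax_mse(A, sigma2, n), sigma2 / n)
```

The worst case over $(-A, A)$ sits at the boundary because the MSE is increasing in $\theta^2$ for any fixed $c \neq 0$.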

Question #4 (30 points): You are to analyze blood pressure data $(Y_i, i = 1, \ldots, n)$ with primary interest in the effect of a treatment. There are two explanatory variables, $T_i = \{0 \text{ or } 1\}$ according as participant $i$ is in the control or treatment group and $D_i = \{0 \text{ or } 1\}$ according as the participant does not or does have diabetes.

(a) Write down a linear model that includes an intercept, the main effect for treatment and the main effect for diabetes, but no interaction term.

$Y_i = \mu + \alpha T_i + \beta D_i + e_i, \quad e_i \sim (0, \sigma^2)$

(b) Interpret the coefficients. Does your interpretation depend on whether treatment assignment was randomized or the study was observational? If so, how; if not, why not?

$\mu$ is the expected BP for non-diabetics in the control group.
$\alpha$ is the expected increment in BP associated with treatment group membership. This is the treatment effect for both diabetics and non-diabetics.
$\beta$ is the expected increment in BP associated with being diabetic.
$\mu + \alpha + \beta$ is the expected BP for diabetics in the treatment group, etc.

Interpretation of the treatment effect ($\alpha$) does depend on whether or not the study is randomized. If randomized, then $T$ is statistically independent of $D$ and of other attributes not (yet) in the model, and $\alpha$ can be interpreted in a causal manner as the expected change induced by treatment. If the study is not randomized, then $\alpha$ is the expected change associated with treatment, but it is potentially confounded by imbalance in unmeasured or yet to be included covariates. If the effect of diabetes is truly additive, then the treatment effect will not be confounded by it, but if the diabetes effect is not additive, there can still be confounding by diabetes status.

(c) Augment the model in part (a) by a treatment-by-diabetes interaction.

$Y_i = \mu + \alpha T_i + \beta D_i + \gamma T_i D_i + e_i, \quad e_i \sim (0, \sigma^2)$

i. What are the treatment effects for diabetics and for non-diabetics?

Treatment effects: Non-diabetics: $\alpha$. Diabetics: $\alpha + \gamma$.

ii. For what relation between $\alpha$ and $\gamma$ do the treatment effects for diabetics and non-diabetics have the same sign?

So long as $|\gamma| < |\alpha|$ the treatment effects will have the same sign. This condition is sufficient; the exact requirement is $\alpha(\alpha + \gamma) > 0$. This implies that even with a significant treatment-by-diabetes interaction, there can be a consistent recommendation for both groups.

(d) There are $4 (= 2 \times 2)$ different types of participants, $(T, D) = \{(00), (01), (10), (11)\}$.

i. Logic regression: Write down the linear model that uses indicators for these Boolean expressions as regressors.

With $I_i(td) = I_{\{(T_i, D_i) = (t, d)\}}$,

$Y_i = \delta_{00} I_i(00) + \delta_{01} I_i(01) + \delta_{10} I_i(10) + \delta_{11} I_i(11) + e_i$

ii. Interpret the slopes.

All terms play the same role; each slope measures the expectation of $Y$ conditional on the scenario, not as compared to some other scenario.
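The subgroup treatment effects in part (c) can be made concrete with a short sketch. The function names and the numeric coefficients below are hypothetical, chosen only to show the cases; the sign check uses the exact condition $\alpha(\alpha + \gamma) > 0$, of which $|\gamma| < |\alpha|$ is a sufficient special case.

```python
def treatment_effects(alpha, gamma):
    """Treatment effect by diabetes status under the interaction model."""
    return {"non_diabetic": alpha, "diabetic": alpha + gamma}

def same_sign(alpha, gamma):
    """True when both subgroup effects are nonzero and share a sign."""
    return alpha * (alpha + gamma) > 0

# Hypothetical coefficients (mmHg); negative values mean BP is lowered.
print(treatment_effects(-5.0, 3.0), same_sign(-5.0, 3.0))    # |gamma| < |alpha|
print(treatment_effects(-5.0, 6.0), same_sign(-5.0, 6.0))    # sign flips for diabetics
print(treatment_effects(-5.0, -8.0), same_sign(-5.0, -8.0))  # |gamma| > |alpha|, same sign
```

The third case shows why $|\gamma| < |\alpha|$ is not necessary: an interaction that pushes in the same direction as the main effect never changes the sign.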

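iii. What constraints on these slopes would produce the model in part (a)?

Scenario   Part (a) model           Logic regression              Constraint
00         $\mu$                    $\delta_{00}$                 $\delta_{00} = \mu$
01         $\mu + \beta$            $\delta_{00} + \delta_{01}$   $\delta_{01} = \beta$
10         $\mu + \alpha$           $\delta_{00} + \delta_{10}$   $\delta_{10} = \alpha$
11         $\mu + \alpha + \beta$   $\delta_{00} + \delta_{11}$   $\delta_{11} = \delta_{01} + \delta_{10}$ $(= \alpha + \beta)$

The table treats $\delta_{00}$ as a baseline and the other $\delta$'s as increments over it. Under the cell-means form written in part (d)i, where each $\delta_{td}$ is the expected $Y$ for scenario $(t, d)$, the equivalent constraint is $\delta_{11} = \delta_{01} + \delta_{10} - \delta_{00}$.

(e) NOT ON EXAM: Briefly discuss the advantages and disadvantages of the standard and logic modeling approaches.

Logic regression has the potential to be more parsimonious in that it automatically includes the joint action of predictors. This feature becomes more important as the dimension of the regressors increases, especially in applications where the statistical relations are dominated by interactions and joint actions. For example, in genomics, it is usually the joint activity of genes or SNPs that matters; main effects have a relatively smaller role. There is no absolute here: both approaches can be effective, and each should be tried to see which provides a better fit relative to the degrees of freedom used.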
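The constraint in part (d)iii can be checked by computing the four cell means directly. This is a minimal sketch with hypothetical helper names and coefficient values; it uses the cell-means parametrization, in which the part (a) model holds exactly when the interaction contrast $\delta_{11} - \delta_{10} - \delta_{01} + \delta_{00}$ is zero.

```python
def cell_means(mu, alpha, beta, gamma=0.0):
    """Expected Y for each (T, D) scenario under the linear model."""
    return {(t, d): mu + alpha * t + beta * d + gamma * t * d
            for t in (0, 1) for d in (0, 1)}

def is_additive(delta, tol=1e-12):
    """True when the cell means satisfy the no-interaction constraint."""
    contrast = delta[(1, 1)] - delta[(1, 0)] - delta[(0, 1)] + delta[(0, 0)]
    return abs(contrast) <= tol

# Hypothetical coefficients, for illustration only.
additive = cell_means(mu=120.0, alpha=-5.0, beta=10.0)
interacting = cell_means(mu=120.0, alpha=-5.0, beta=10.0, gamma=3.0)

print(additive, is_additive(additive))
print(interacting, is_additive(interacting))
```

With four free cell means the logic-regression form uses four parameters, and the single contrast constraint reduces it to the three parameters of the part (a) model.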