Master's Written Examination
Option: Statistics and Probability
Spring 2016

Full points may be obtained for correct answers to eight questions. Each numbered question (which may have several parts) is worth the same number of points. All answers will be graded, but the score for the examination will be the sum of the scores of your best eight solutions. Use separate answer sheets for each question. DO NOT PUT YOUR NAME ON YOUR ANSWER SHEETS. When you have finished, insert all your answer sheets into the envelope provided, then seal it.
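
Problem 1 (Stat 401). Let $X$ and $Y$ have parameters $EX = \mu_1$, $EY = \mu_2$, $\operatorname{Var}X = \sigma_1^2$, $\operatorname{Var}Y = \sigma_2^2$, and let $\rho$ be the correlation coefficient of $X$ and $Y$. Show that the correlation coefficient of $X$ and $Y - \rho\frac{\sigma_2}{\sigma_1}X$ is zero.

Solution to Problem 1. Since the correlation coefficient of $X$ and $Y - \rho\frac{\sigma_2}{\sigma_1}X$ is zero if and only if $\operatorname{Cov}\left(X, Y - \rho\frac{\sigma_2}{\sigma_1}X\right) = 0$, it suffices to show that this covariance vanishes:

$$\begin{aligned}
\operatorname{Cov}\left(X, Y - \rho\tfrac{\sigma_2}{\sigma_1}X\right)
&= E\left[XY - \rho\tfrac{\sigma_2}{\sigma_1}X^2\right] - EX \cdot E\left[Y - \rho\tfrac{\sigma_2}{\sigma_1}X\right] \\
&= E[XY] - \rho\tfrac{\sigma_2}{\sigma_1}E[X^2] - EX\,EY + \rho\tfrac{\sigma_2}{\sigma_1}(EX)^2 \\
&= E[XY] - EX\,EY - \rho\tfrac{\sigma_2}{\sigma_1}\left(\sigma_1^2 + \mu_1^2\right) + \rho\tfrac{\sigma_2}{\sigma_1}\mu_1^2 \\
&= E[XY] - EX\,EY - \rho\sigma_1\sigma_2 = 0,
\end{aligned}$$

where the last equality holds because $\operatorname{Cov}(X, Y) = \rho\sigma_1\sigma_2$.

Problem 2 (Stat 401). Suppose $X_n$ converges to $X$ in distribution and $Y_n$ converges in probability to some constant $C$. Show that $X_n + Y_n$ converges to $X + C$ in distribution.

Solution to Problem 2. Let $C(F) := \{x \in \mathbb{R} : F \text{ is continuous at } x\}$. Clearly $C(F_{X+C}) = \{x : F_X \text{ is continuous at } x - C\}$. Thus for any $a \in C(F_{X+C})$, $a - C \in C(F_X)$. We need to show

$$\lim_{n\to\infty} F_{X_n+Y_n}(a) = F_{X+C}(a).$$

For this purpose, we first prove $\limsup_{n} F_{X_n+Y_n}(a) \le F_{X+C}(a)$. Let $\{\epsilon_k\} \downarrow 0$ be a sequence of points such that $\{a - C + \epsilon_k\} \subset C(F_X)$. It is possible to select such a sequence $\{\epsilon_k\}_{k \ge 1}$ because $C(F_X)^c$ is at most countable. Therefore, for any $k \ge 1$,

$$\begin{aligned}
P[X_n + Y_n \le a] &= P[X_n + Y_n \le a, |Y_n - C| \ge \epsilon_k] + P[X_n + Y_n \le a, |Y_n - C| < \epsilon_k] \\
&\le P[|Y_n - C| \ge \epsilon_k] + P[X_n \le a - C + \epsilon_k].
\end{aligned}$$

Letting $n \to \infty$ and using that $Y_n$ converges in probability to $C$, we have

$$\limsup_{n} F_{X_n+Y_n}(a) \le \lim_{n} F_{X_n}(a - C + \epsilon_k) = F_X(a - C + \epsilon_k), \quad \text{for all } k \ge 1,$$

where the limit exists because $F_X$ is continuous at every $a - C + \epsilon_k$. Since $F_X$ is also continuous at $a - C$, letting $k \to \infty$ proves $\limsup_{n} F_{X_n+Y_n}(a) \le F_X(a - C) = F_{X+C}(a)$.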
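
To prove $\liminf_{n} F_{X_n+Y_n}(a) \ge F_{X+C}(a)$, we select a sequence $\{\epsilon_k\} \downarrow 0$ such that $\{a - C - \epsilon_k\} \subset C(F_X)$. Similar to the argument above, we can show

$$P[X_n + Y_n > a] \le P[|Y_n - C| \ge \epsilon_k] + P[X_n > a - C - \epsilon_k] \quad \text{for all } n, k.$$

First fix $k$; letting $n \to \infty$ yields

$$\limsup_{n}\left[1 - F_{X_n+Y_n}(a)\right] = \limsup_{n} P[X_n + Y_n > a] \le 1 - F_X(a - C - \epsilon_k), \quad \text{for every } k,$$

i.e.,

$$\liminf_{n} F_{X_n+Y_n}(a) \ge F_X(a - C - \epsilon_k), \quad \text{for every } k.$$

Again, since $F_X$ is continuous at $a - C$, letting $k \to \infty$ completes the proof.

Problem 3 (Stat 411). Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(0, \theta^2)$, $0 < \theta < \infty$.
(i) Find the Fisher information $I(\theta)$.
(ii) Derive the MLE $\hat\theta$ of $\theta$.
(iii) What is the asymptotic distribution of $\sqrt{n}(\hat\theta - \theta)$?
(iv) Compute the efficiency of $\hat\theta$.
Hint: The $\chi^2$ distribution is also a special gamma distribution. The density function of the gamma distribution with parameters $\alpha, \beta$ can be written as $\frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta}$, $x > 0$.

Solution to Problem 3.
(i) The Fisher information $I(\theta)$ can be computed as

$$I(\theta) = -E\left[\frac{\partial^2 \log f(X, \theta)}{\partial\theta^2}\right] = -E\left[\frac{1}{\theta^2} - \frac{3}{\theta^4}X^2\right] = \frac{2}{\theta^2}.$$

(ii) The joint likelihood function of $X_1, X_2, \ldots, X_n$ can be written as

$$L(\theta) = (2\pi)^{-n/2}\theta^{-n}\exp\left\{-\frac{1}{2\theta^2}\sum_{i=1}^n X_i^2\right\}.$$

Setting $\frac{\partial \log L(\theta)}{\partial\theta} = 0$, we have

$$-\frac{n}{\theta} + \frac{\sum_{i=1}^n X_i^2}{\theta^3} = 0.$$

Solving this equation for $\theta$, we have $\hat\theta = \sqrt{\sum_{i=1}^n X_i^2 / n}$.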
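(iii) The asymptotic distribution of $\sqrt{n}(\hat\theta - \theta)$ is the normal distribution with mean 0 and variance $I^{-1}(\theta) = \theta^2/2$.

(iv) We need to compute the variance of $\hat\theta$. Notice that $Z = \sum_{i=1}^n X_i^2/\theta^2$ follows the $\chi^2$ distribution with $n$ degrees of freedom, which is also a gamma distribution with $\alpha = n/2$ and $\beta = 2$, i.e., with density function

$$f(z) = \frac{1}{\Gamma\left(\frac{n}{2}\right)2^{n/2}}z^{\frac{n}{2}-1}e^{-z/2}, \quad z > 0.$$

So we have

$$\begin{aligned}
E\sqrt{Z} &= \int_0^\infty \sqrt{z}\,\frac{1}{\Gamma\left(\frac{n}{2}\right)2^{n/2}}z^{\frac{n}{2}-1}e^{-z/2}\,dz
= \int_0^\infty \frac{1}{\Gamma\left(\frac{n}{2}\right)2^{n/2}}z^{\frac{n+1}{2}-1}e^{-z/2}\,dz \\
&= \frac{\Gamma\left(\frac{n+1}{2}\right)2^{\frac{n+1}{2}}}{\Gamma\left(\frac{n}{2}\right)2^{n/2}}\int_0^\infty \frac{1}{\Gamma\left(\frac{n+1}{2}\right)2^{\frac{n+1}{2}}}z^{\frac{n+1}{2}-1}e^{-z/2}\,dz
= \frac{\sqrt{2}\,\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}.
\end{aligned}$$

Thus, since $\hat\theta = \theta\sqrt{Z/n}$,

$$\operatorname{Var}(\hat\theta) = E\left[\frac{\sum_{i=1}^n X_i^2}{n}\right] - \left(E\sqrt{\frac{\sum_{i=1}^n X_i^2}{n}}\right)^2 = \theta^2 - \frac{2\theta^2\,\Gamma^2\left(\frac{n+1}{2}\right)}{n\,\Gamma^2\left(\frac{n}{2}\right)}.$$

So the efficiency of $\hat\theta$ is

$$\frac{[nI(\theta)]^{-1}}{\operatorname{Var}(\hat\theta)} = \frac{n\,\Gamma^2\left(\frac{n}{2}\right)}{2n\left[n\,\Gamma^2\left(\frac{n}{2}\right) - 2\,\Gamma^2\left(\frac{n+1}{2}\right)\right]}.$$

Problem 4 (Stat 411). Let $X_1, X_2, \ldots, X_n$ be a random sample from the uniform distribution over the interval $(\theta - 1, \theta + 1)$. The parameter $\theta$ can be any real number.
(a) Show that the order statistics $Y_1 = \min_i\{X_i\}$ and $Y_n = \max_i\{X_i\}$ are jointly sufficient statistics for $\theta$.
(b) Show that $Y_1$ and $Y_n$ are also minimally sufficient for $\theta$.
(c) Are $Y_1$ and $Y_n$ jointly complete statistics for $-\infty < \theta < \infty$? Why?

Solution to Problem 4.
(a) The joint pdf of $X_1, \ldots, X_n$ is

$$\prod_{i=1}^n \frac{1}{2}I_{(\theta-1,\theta+1)}(x_i) = 2^{-n}\,I_{(\theta-1,+\infty)}(Y_1)\,I_{(-\infty,\theta+1)}(Y_n).$$

By the factorization theorem, $Y_1$ and $Y_n$ are jointly sufficient for $\theta$.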
(b) Consider two random samples $x = (x_1, \ldots, x_n)^T$ and $z = (z_1, \ldots, z_n)^T$. The ratio of joint pdfs

$$\frac{\prod_{i=1}^n \frac{1}{2}I_{(\theta-1,\theta+1)}(x_i)}{\prod_{i=1}^n \frac{1}{2}I_{(\theta-1,\theta+1)}(z_i)} = \frac{I_{(\theta-1,+\infty)}(\min_i\{x_i\})}{I_{(\theta-1,+\infty)}(\min_i\{z_i\})} \cdot \frac{I_{(-\infty,\theta+1)}(\max_i\{x_i\})}{I_{(-\infty,\theta+1)}(\max_i\{z_i\})}$$

does not depend on $\theta$ if and only if $\min_i\{x_i\} = \min_i\{z_i\}$ and $\max_i\{x_i\} = \max_i\{z_i\}$. Therefore, $Y_1$ and $Y_n$ are minimally sufficient for $\theta$.

(c) No, $Y_1$ and $Y_n$ are not jointly complete for $\theta$, because

$$E\left[\frac{Y_n - Y_1}{2} - \frac{n-1}{n+1}\right] = 0$$

for all $\theta$. That is, there is a nonzero function of those statistics whose expectation is always zero.

Problem 5 (Stat 411). Let $X_1, \ldots, X_n$ be an iid sample from a shifted exponential distribution, i.e., the density function for each $X_i$ is

$$f_\theta(x) = e^{-(x-\theta)}I_{[\theta,\infty)}(x), \quad \theta \in (-\infty, \infty).$$

1. Consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$, where $\theta_0$ and $\theta_1$ are fixed, with $\theta_1 > \theta_0$. Show that the most powerful test is of the form "reject $H_0$ if and only if $X_{(1)} > c$" for some constant $c$, where $X_{(1)} = \min\{X_1, \ldots, X_n\}$ is the sample minimum.
2. For a specified $\alpha \in (0, 1)$, find the constant $c$ so that the size, or Type I error probability, of the test in Part 1 is $\alpha$.
3. Calculate the power, $p_n(\theta_1)$, of the test at the alternative $\theta_1$. What happens to $p_n(\theta_1)$ as $n \to \infty$?

Solution to Problem 5.
1. The likelihood function is

$$L(\theta) = \prod_{i=1}^n f_\theta(X_i) = \prod_{i=1}^n e^{-(X_i-\theta)}I_{[\theta,\infty)}(X_i) = e^{-\sum_{i=1}^n (X_i-\theta)}I_{[\theta,\infty)}(X_{(1)}).$$

Then the likelihood ratio is

$$\frac{L(\theta_0)}{L(\theta_1)} = e^{n(\theta_0-\theta_1)}\frac{I_{[\theta_0,\infty)}(X_{(1)})}{I_{[\theta_1,\infty)}(X_{(1)})}.$$

According to the Neyman-Pearson lemma, the most powerful test rejects $H_0$ iff the likelihood ratio above is small which, in this case, is equivalent to $X_{(1)}$ being big. Therefore, the most powerful test rejects $H_0$ iff $X_{(1)}$ is bigger than some constant $c$.
2. To find the constant $c$ so that the size of the test is the specified level $\alpha$, we must solve the equation

$$P_{\theta_0}(X_{(1)} > c) = \alpha.$$

Since the $X_i$'s are iid, the left-hand side above can be rewritten as

$$P_{\theta_0}(X_{(1)} > c) = P_{\theta_0}(X_1 > c)^n = e^{-n(c-\theta_0)}.$$

Setting this equal to $\alpha$ and solving gives

$$c = \theta_0 - n^{-1}\log\alpha.$$

3. Let $p_n(\theta_1)$ be the power function. Then a calculation similar to that for the size above gives

$$p_n(\theta_1) = P_{\theta_1}(X_{(1)} > c) = e^{-n\max\{c-\theta_1,\,0\}}.$$

Plugging in the value of $c$ derived above gives

$$p_n(\theta_1) = e^{-n\max\{\theta_0-\theta_1-n^{-1}\log\alpha,\,0\}} = \min\{1, \alpha e^{-n(\theta_0-\theta_1)}\}.$$

Since $\theta_1 > \theta_0$, the second term in the minimum tends to $\infty$ as $n \to \infty$. Therefore, the power converges to 1 as $n \to \infty$.

Problem 6 (Stat 416). Consider the following two independent random samples drawn from continuous populations which have the same form but possibly a difference of $\theta$ in their locations:

X    79    13   138    19    59    76    75    53
Y    96   141   133   107    10    19   110   104

(a) Using the Mann-Whitney test and the significance level 0.10, test $H_0: \theta = 0$ versus $H_1: \theta \ne 0$. For a two-sided test with significance level 0.10, the rejection region for the Mann-Whitney test is $U \le 15$ or $U \ge 49$.
(b) For what kind of distributions does the Mann-Whitney test perform better than the t test?

Solution to Problem 6.
(a) In this case, $m = 8$, $n = 8$. Note that there is a tie: $X_4 = Y_6 = 19$. The Mann-Whitney $U$ statistic is either 12 (if $X_4$ precedes $Y_6$) or 13 (if $Y_6$ precedes $X_4$). In either case, we reject the null hypothesis.
(b) The Mann-Whitney test performs better than the t test for heavy-tailed distributions, including the double exponential distribution and the logistic distribution.

Problem 7 (Stat 431). A client has a finite population of 6 units and has funds to survey 3 units for the purpose of estimating the mean and its related standard deviation. There are two sampling plans to be considered.

Sampling Plan 1: A simple random sample of size 3 without replacement, SRS(6, 3).
Sampling Plan 2: A uniform sampling plan on the following support. Samples in the support: {1,3,5}, {2,4,5}, {1,4,6}, {1,2,3}, {2,5,6}.

Suppose upon the survey we obtain the following data: 10, 15, and 20. Answer the following questions:
(i) Can we compute the Horvitz-Thompson (HT) estimate of the population mean under both sampling plans? If yes, compute the estimate.
(ii) Can we estimate the standard deviation of your estimator under both sampling plans? If yes, compute the estimate.

Solution to Problem 7.
(i) To compute the HT estimator, we need the first-order inclusion probabilities. For Plan 1, the first-order inclusion probability is $\pi_i = 1/2$, $i = 1, \ldots, 6$. For Plan 2, the first-order inclusion probabilities are $\pi_1 = 3/5$, $\pi_2 = 3/5$, $\pi_3 = 2/5$, $\pi_4 = 2/5$, $\pi_5 = 3/5$, and $\pi_6 = 2/5$. The HT estimator of the population mean is given by $\frac{1}{N}\sum_{i \in S}\frac{Y_i}{\pi_i}$. Notice that we do not know which units correspond to the data 10, 15, and 20, so we cannot compute the HT estimate of the population mean for Plan 2. However, we can still compute it for Plan 1 since all $\pi_i$'s are the same. The corresponding HT estimate for Plan 1 is

$$\frac{1}{6} \cdot 2\,(10 + 15 + 20) = 15.$$

(ii) Plan 1 is a simple random sample. The variance of the HT estimator under simple random sampling is estimated by $\left(\frac{1}{n} - \frac{1}{N}\right)s^2$, where $s^2$ is the sample variance; here $s^2 = 25$. Thus the estimated standard deviation of the HT estimator is $\sqrt{(1/3 - 1/6) \cdot 25} = 5/\sqrt{6}$. For Plan 2, we cannot estimate the standard deviation since we do not have the required information on which units were sampled.

Problem 8 (Stat 451).
Consider a triangular distribution with density function $f(x) = 1 - |x|$, $x \in [-1, 1]$.
1. Propose an accept-reject procedure to simulate a random variable $X$ having the triangular distribution above. Hint: Keep it simple!
2. What is the acceptance probability for your proposed method?
3. Is the $X$ produced by your accept-reject method an exact or approximate sample from the triangular distribution? Justify your answer.

Solution to Problem 8.
1. The simplest approach is to choose a $\text{Unif}(-1, 1)$ proposal distribution, which is possible since the support is bounded. Let $g(y) = \frac{1}{2}I_{[-1,1]}(y)$ be the corresponding density function. We have the bound $f(y) \le M g(y)$, where $M = 2$. Then the accept-reject algorithm goes as follows:
   Step 1. Sample $Y \sim g = \text{Unif}(-1, 1)$ and $U \sim \text{Unif}(0, 1)$.
   Step 2. If $U \le \frac{f(Y)}{M g(Y)} = 1 - |Y|$, then set $X = Y$; else, go back to Step 1.
2. The acceptance probability is $M^{-1} = 1/2$.
3. The sample $X$ has exactly the triangular distribution.

Problem 9 (Stat 461). Consider a Markov chain $X_0, X_1, \ldots$ with transition probability matrix

$$P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0.4 & 0.1 & 0.1 & 0.4 \\ 0.3 & 0.2 & 0.2 & 0.3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

The transition probability matrix $Q$ corresponding to the non-absorbing states is

$$Q = \begin{pmatrix} 0.1 & 0.1 \\ 0.2 & 0.2 \end{pmatrix}.$$

Calculate the matrix inverse of $I - Q$, and use it to answer the following two questions:
(a) Suppose the chain starts from state 1. What is the mean time spent in each of states 1 and 2 prior to absorption?
(b) What is the probability of absorption into state 3 from state 1?

Solution to Problem 9.

$$I - Q = \begin{pmatrix} 0.9 & -0.1 \\ -0.2 & 0.8 \end{pmatrix}.$$
Hence we have

$$W = (I - Q)^{-1} = \frac{1}{0.7}\begin{pmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{pmatrix} = \begin{pmatrix} 8/7 & 1/7 \\ 2/7 & 9/7 \end{pmatrix}.$$

Hence, given the chain starts from state 1, the mean time spent in state 1 is $w_{11} = 8/7$, and the mean time spent in state 2 is $w_{12} = 1/7$. Note that

$$R = \begin{pmatrix} 0.4 & 0.4 \\ 0.3 & 0.3 \end{pmatrix}.$$

Since

$$U = WR = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix},$$

we obtain that the probability of absorption into state 3 from state 1 is $u_{13} = 0.5$.

Problem 10 (Stat 461). A population begins with a single individual. In each generation, each individual in the population dies with probability 1/2 or doubles with probability 1/2. Let $X_n$ be the number of individuals in the population in the $n$th generation. It is clear that $X_n$ is a branching process with $X_0 = 1$. Find the mean and variance of $X_n$.

Solution to Problem 10. Let $\xi_i$ be i.i.d. random variables with common distribution

$$P\{\xi_i = 0\} = \frac{1}{2}, \quad P\{\xi_i = 2\} = \frac{1}{2}.$$

It is clear that $\xi_i$ has mean $\mu = 1$ and variance $\sigma^2 = 1$. By the definition of a branching process, we have

$$X_{n+1} = \xi_1 + \xi_2 + \cdots + \xi_{X_n}.$$

Hence for the mean $M(n)$ of $X_n$ we have

$$M(n) = \mu M(n-1) = \mu^2 M(n-2) = \cdots = \mu^n = 1.$$

Similarly, for the variance $V(n)$ of $X_n$, we have

$$V(n) = \sigma^2 M(n-1) + \mu^2 V(n-1) = 1 + V(n-1) = \cdots = n.$$

Problem 11 (Stat 481). In an attempt to study fat absorption in doughnuts, 24 doughnuts are randomly selected in a study of four kinds of fats. The dependent variable is grams of fat absorbed, and the factor variable is the type of fat. The factor contains 4 levels (four types of fat were tested), and there are 6 doughnuts for each of the 4 kinds of fats. The researcher accidentally dropped one of the doughnuts from the second type of fat, so the second type of fat contains 5 observations instead of 6. Given SSTR = 1504.5, SSE = 2018,
1. What is the research objective of this study? Determine the parameters of interest first and state the null and alternative hypotheses for the parameters.
2. What design is employed in this study? What model would you suggest to fit the data? Write down the model with the necessary assumptions.
3. Construct an ANOVA table, and draw your decision accordingly. The significance level $\alpha = 0.05$ is given. [$F_{0.05}(3, 19) = 3.1$, $F_{0.05}(3, 20) = 3.10$.]
4. Suppose we would like to do further analysis on the data, for example to test hypotheses on all pairwise comparisons between the four treatment levels simultaneously. What method would you suggest? Why?

Solution to Problem 11.
1. The objective is to investigate the fat absorption of four kinds of fats in doughnuts. Let $\mu_i$ denote the mean fat absorption of the $i$-th kind of fat, $i = 1, \ldots, 4$. The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, versus the alternative hypothesis $H_1$: at least one $\mu_i$ is different from the other means.
2. It is a completely randomized design with a fixed effect. A one-way ANOVA model is suggested:

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \quad i = 1, \ldots, 4;\ j = 1, \ldots, n_i.$$

3. ANOVA table:

Source of Variation    SS        DF    MS       F       p-value
Treatment              1504.5     3    501.5    4.72    0.013
Error                  2018      19    106.2
Total                  3522.5    22

As the p-value < 0.05, the null hypothesis is rejected, i.e., there is significant evidence that differences exist among the mean fat absorptions of the four kinds of fats in the study.
4. For simultaneous pairwise comparisons among the four mean fat absorptions, Tukey's method is recommended, as the studentized range test for pairwise comparisons is exact. It gives more accurate results for pairwise comparisons than the conservative though flexible Bonferroni method, and it is also better than Scheffé's confidence region method, which covers all linear contrasts of the four means.

Problem 12 (Stat 481). Consider a linear regression model $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$, with i.i.d. errors $\varepsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$. The constant variance $\sigma^2$ is unknown.
1. Express the model in matrix form with the notation below:

$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{1,1} & x_{2,1} \\ \vdots & \vdots & \vdots \\ 1 & x_{1,n} & x_{2,n} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}.$$

Based on the least squares criterion (loss function), derive the normal equation and then the least squares estimates for the coefficients, $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2)'$.
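2. Calculate the variance-covariance matrix of the coefficient estimators, $\operatorname{Var}(\hat\beta)$, and determine the distribution of $\hat\beta$.
3. Construct the confidence interval for the mean response $\mu_i = E(Y_i)$ at a design point $(x_{1i}, x_{2i})$.

Solution to Problem 12.
1. The linear regression model can be written as

$$Y = X\beta + \varepsilon, \quad \varepsilon \sim N_n(0, \sigma^2 I_n).$$

The least squares objective function is $Q(\beta) = (Y - X\beta)'(Y - X\beta)$. Taking the derivative with respect to $\beta$ gives the normal equation:

$$\frac{\partial Q(\beta)}{\partial\beta} = 0 \ \Rightarrow\ X'X\beta = X'Y \ \Rightarrow\ \hat\beta = (X'X)^{-1}X'Y.$$

2. The variance-covariance matrix of the least squares estimator $\hat\beta$ is

$$\operatorname{Var}(\hat\beta) = (X'X)^{-1}X'\operatorname{Var}(Y)X(X'X)^{-1} = \sigma^2(X'X)^{-1}X'I_nX(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$

In addition, we can show that $E(\hat\beta) = (X'X)^{-1}X'X\beta = \beta$. As $\hat\beta = (X'X)^{-1}X'Y$ is a linear function of the normal vector $Y$, it follows a normal distribution, i.e., $\hat\beta \sim N_3\left(\beta, \sigma^2(X'X)^{-1}\right)$.

3. The mean response at a design point $(x_{1i}, x_{2i})$ is $\mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} = X_i'\beta$, where $X_i = (1, x_{1i}, x_{2i})'$. Its estimator based on the least squares estimates $\hat\beta$ is $\hat\mu_i = X_i'\hat\beta$, which is also normally distributed:

$$X_i'\hat\beta \sim N\left(X_i'\beta,\ \sigma^2 X_i'(X'X)^{-1}X_i\right) = N\left(\mu_i,\ \sigma^2 X_i'(X'X)^{-1}X_i\right).$$

Its sampling distribution with $\sigma^2$ unknown is

$$\frac{X_i'\hat\beta - \mu_i}{\sqrt{MSE \cdot X_i'(X'X)^{-1}X_i}} \sim t_{n-3}, \quad \text{with } MSE = SSE/(n-3).$$

The $100(1-\alpha)\%$ confidence interval for $\mu_i$ is

$$X_i'\hat\beta \pm t_{\alpha/2}(n-3)\sqrt{MSE \cdot X_i'(X'X)^{-1}X_i}.$$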
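
The formulas in the solution to Problem 12 can be checked numerically. The following is a minimal sketch in plain Python, not part of the exam solution: the helper names `solve` and `mean_response_ci`, the toy data, and the choice to pass the critical value $t_{\alpha/2}(n-3)$ in as an argument (read from a t table) are all assumptions made for illustration.

```python
from math import sqrt


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mean_response_ci(X, y, x0, t_crit):
    """Return (beta_hat, fitted mean at x0, confidence interval).

    X      : list of rows (1, x1, x2), the design matrix
    y      : list of responses
    x0     : design point (1, x1, x2) for the mean response
    t_crit : t_{alpha/2}(n - p), supplied by the caller (assumption)
    """
    n, p = len(X), len(X[0])
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(XtX, Xty)
    # MSE = SSE / (n - p)
    resid = [y[k] - sum(X[k][j] * beta[j] for j in range(p)) for k in range(n)]
    mse = sum(e * e for e in resid) / (n - p)
    # x0' (X'X)^{-1} x0, computed by solving (X'X) v = x0
    v = solve(XtX, x0)
    quad = sum(a * b for a, b in zip(x0, v))
    fit = sum(a * b for a, b in zip(x0, beta))
    half = t_crit * sqrt(mse * quad)
    return beta, fit, (fit - half, fit + half)


# Hypothetical toy data: n = 5 observations, so n - 3 = 2 degrees of freedom.
X = [[1, 0, 1], [1, 1, 0], [1, 2, 2], [1, 3, 1], [1, 4, 3]]
y = [4.1, 2.9, 11.2, 9.8, 18.1]
beta_hat, mu_hat, ci = mean_response_ci(X, y, [1, 2, 2], t_crit=4.303)
print(beta_hat, mu_hat, ci)
```

Solving $(X'X)v = X_i$ instead of forming $(X'X)^{-1}$ explicitly gives the same quadratic form $X_i'(X'X)^{-1}X_i$ with less arithmetic; the value 4.303 is the two-sided 95% critical value of the t distribution with 2 degrees of freedom.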