Statistics & Data Sciences: First Year Prelim Exam, May 2018

Instructions:

1. Do not turn this page until instructed to do so.
2. Start each new question on a new sheet of paper.
3. This is a closed book exam.
4. Show your working and explain your reasoning as fully as possible.
5. Switch off all electronic devices.
6. The time limit is 3 hours.
7. The exam has 6 questions; do as much as you can, but credit will be given for only 5 answers.
Question 1. Consider the exponential family model given by
$$f_Y(y; \theta) = c(y) \exp\{y\theta - b(\theta)\},$$
where $c$ is a known function and $b$ is part of the normalizing constant.

(i) Show that $\mathbb{E}\,Y = b'(\theta)$, where $b'$ denotes the derivative of $b$, assumed to exist.

(ii) Prove that $b$ is a convex function; i.e. $b''(\theta) \ge 0$, assumed to exist.

(iii) Show that $\operatorname{Var} Y = b''(\theta)$. The MLE $\hat\theta$ is given by $b'(\hat\theta) = \bar{Y}$, where $\bar{Y}$ is the sample mean.

(iv) Show that if the true model is $f_0(y)$, i.e. the data arise from $f_0$, then $\hat\theta$ converges to the $\theta$ which maximizes $\int f_0(y) \log f(y; \theta)\, dy$.
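The identities in (i) and (iii) can be checked numerically for a concrete member of the family. The sketch below (illustrative, not part of the exam) takes the Poisson distribution, for which the natural-parameter form has $b(\theta) = e^\theta$, so $b'(\theta) = b''(\theta) = e^\theta$.

```python
import numpy as np

# Illustrative check: for the Poisson family written in exponential-family form,
# b(theta) = exp(theta), so b'(theta) = b''(theta) = exp(theta).
# We verify E[Y] ~ b'(theta) and Var[Y] ~ b''(theta) by simulation.
rng = np.random.default_rng(0)
theta = 0.5
lam = np.exp(theta)               # Poisson rate implied by the natural parameter
y = rng.poisson(lam, size=200_000)
print(y.mean(), y.var())          # both close to exp(0.5) ~ 1.6487
```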
Question 2. Suppose $X_1, \ldots, X_n$ are independent and identically distributed from some distribution function $F(x)$ and the aim is to test
$$H_0: F(0) = \tfrac{1}{2} \quad \text{vs} \quad H_1: F(0) \ne \tfrac{1}{2}.$$
That is, the test is to see whether the median of $F$ is at 0 or not.

(i) If $H_0$ is true, what is the distribution of
$$T = \sum_{i=1}^n 1(X_i \le 0)?$$
Here $1(X \le 0) = 1$ if $X \le 0$ and otherwise is 0.

(ii) Hence, using a normal approximation for $T$, which will involve finding the mean and variance of $T$, describe how you would test the hypothesis; i.e. find the test statistic and how to obtain the critical value.

(iii) Find the Laplace transform of a normal density function with mean $\mu$ and variance $\sigma^2$; i.e. find $\mathbb{E}\, e^{\theta X}$ where $X \sim N(\mu, \sigma^2)$.

(iv) Using Laplace transforms, show that
$$\frac{T/n - F(0)}{\sqrt{F(0)(1 - F(0))/n}}$$
converges in distribution to a standard normal random variable.
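The test described in (ii) is the classical sign test. A minimal simulation sketch (illustrative only; the critical value 1.96 is the two-sided 5% normal quantile):

```python
import numpy as np
from math import sqrt

# Sketch of the sign test: under H0, T ~ Binomial(n, 1/2), so the standardized
# statistic Z = (T - n/2) / sqrt(n/4) is approximately N(0, 1) for large n.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(loc=0.0, scale=1.0, size=n)  # data whose true median is 0 (H0 holds)
T = np.sum(x <= 0)
Z = (T - n / 2) / sqrt(n / 4)
reject = abs(Z) > 1.96                       # two-sided test at the 5% level
print(T, Z, reject)
```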
Question 3. Let $P = (p_{ij})$ be an $M \times M$ transition matrix for a discrete Markov chain $(X_n)$ on $\{1, \ldots, M\}$, for $M > 1$. Hence, for all $n \ge 1$, $P(X_n = j \mid X_{n-1} = i) = p_{ij}$.

(i) Show that all the eigenvalues of $P$ lie between $-1$ and $+1$ and that 1 is actually the largest eigenvalue. Hint: for any vector $v = (v_1, \ldots, v_M)$ of dimension $M$ it is the case that
$$v_{(1)} \le \sum_{j=1}^M p_{ij} v_j \le v_{(M)},$$
where $v_{(1)}$ is the minimum of $(v_1, \ldots, v_M)$ and $v_{(M)}$ is the maximum.

(ii) If $P$ is reducible, explain why the second largest eigenvalue of $P$ is 1.

Now suppose $\pi$ is the unique stationary probability for $P$, i.e. $\pi' P = \pi'$, and that $P$ is irreducible and that the smallest eigenvalue of $P$ is greater than $-1$. Let $(\pi, w_2, \ldots, w_M)$ be the left eigenvectors of $P$, with $\lambda_j$ the eigenvalue corresponding to $w_j$.

(iii) Prove that $w_j' \mathbf{1} = 0$, where $\mathbf{1}$ is the $M$-vector of 1's.

(iv) The chain is started at $q_0$; i.e. $P(X_0 = j) = q_{0j}$. Show that we can write
$$q_0 = \alpha\, \pi + \sum_{j=2}^M \alpha_j w_j$$
and, using (iii), show that $\alpha = 1$. Show that the probability mass function for $X_n$ is
$$q_n = \pi + \sum_{j=2}^M \alpha_j \lambda_j^n w_j,$$
and hence explain why $q_n \to \pi$.
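The convergence in (iv) is easy to see numerically: since $|\lambda_j| < 1$ for $j \ge 2$, the terms $\alpha_j \lambda_j^n w_j$ decay geometrically. An illustrative sketch with a hypothetical 3-state chain:

```python
import numpy as np

# Numerical illustration of part (iv): for an irreducible chain, the row
# distribution q_n = q_0 P^n converges to the stationary probability pi.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
q = np.array([1.0, 0.0, 0.0])     # chain started in state 1
for _ in range(50):
    q = q @ P                     # one step: q_{n+1} = q_n P

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# i.e. eigenvector of P^T, normalized to sum to 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
print(q, pi)                      # q after 50 steps matches pi closely
```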
Question 4. Assume independent observations $(y_i)_{i=1}^n$ arise from the model
$$y_i = \theta_i + \varepsilon_i,$$
where $\theta = (\theta_1, \ldots, \theta_n)$ are unknown parameters and $\mathbb{E}\,\varepsilon_i = 0$ and $\operatorname{Var} \varepsilon_i = \sigma^2$, where $\sigma > 0$ is assumed known. The true value of the parameter is denoted by $\theta$.

(i) Find the least squares estimator $\hat\theta$ of $\theta$ subject to the constraint
$$\sum_{i=1}^n \theta_i = \tau$$
for some fixed $\tau$.

(ii) Find $\mathbb{E}\,\hat\theta$ and $\operatorname{Cov} \hat\theta$.

Now consider the model
$$y_i = x_i' \theta + \varepsilon_i,$$
where the $\varepsilon_i$ are the same as before, but now $\theta$ is an unknown parameter of dimension $d$ and $x_i = (x_{i1}, \ldots, x_{id})$ is a vector of covariates.

(iii) Find the least squares estimator of $\theta$ subject to the constraint $\theta' \theta = \tau$ for some fixed $\tau > 0$.

(iv) Show that the estimator given in (iii) has a Bayesian interpretation.
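A sketch of the estimator that arises in (iii): introducing a Lagrange multiplier $\lambda$ for the constraint $\theta'\theta = \tau$ leads to the ridge form $\hat\theta = (X'X + \lambda I)^{-1} X'y$, which is also the posterior mode under a mean-zero Gaussian prior (the Bayesian reading in (iv)). The value of $\lambda$ below is a hypothetical fixed choice, not calibrated to any particular $\tau$.

```python
import numpy as np

# Ridge-form constrained least squares on synthetic data:
# theta_hat = (X'X + lam I)^{-1} X'y, with lam a hypothetical fixed multiplier.
rng = np.random.default_rng(2)
n, d, lam = 100, 3, 5.0
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(size=n)      # unit-variance noise
theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(theta_hat)                              # shrunk slightly toward zero
```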
Question 5. Consider the model for which
$$P(y_i = k \mid x_i, \theta, \beta) = \frac{(\theta\, e^{\beta x_i})^k}{k!} \exp\{-\theta\, e^{\beta x_i}\}$$
for $k \in \{0, 1, 2, \ldots\}$. So this is a Poisson regression model. The prior for $\theta$ is gamma$(a, b)$ and the prior for $\beta$ is $N(0, \sigma^2)$, where $(\sigma, a, b)$ is specified.

(i) Write down the posterior distribution for $(\theta, \beta)$.

(ii) Describe in detail a Gibbs sampler for approximately sampling from the posterior. In particular, show that the transition density for your chain has the posterior as the stationary density.

(iii) If the above model is referred to as model $M_1$, then $M_2$ is the same model except that now $\beta = 0$ is fixed. If the prior for each model is $1/2$, i.e. $P(M_1) = P(M_2) = 1/2$, give a method by which you could compute $P(M_1 \mid \text{data})$.

(iv) Give a full interpretation of the probabilities for the models, given that if model $M_2$ is correct then so is model $M_1$. This interpretation should include the cases when one of the models is correct and when neither is correct.
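One possible sampler for (ii), sketched on synthetic data with assumed hyperparameters: the full conditional of $\theta$ given $\beta$ is conjugate gamma, while the conditional of $\beta$ has no closed form, so a random-walk Metropolis step is used within the Gibbs scan (Metropolis-within-Gibbs; a plain two-block Gibbs sampler would require sampling the $\beta$ conditional exactly).

```python
import numpy as np

# Metropolis-within-Gibbs sketch for the Poisson regression posterior.
# theta | beta, data ~ gamma(a + sum(y), rate = b + sum(exp(beta * x))).
rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = rng.poisson(1.0 * np.exp(0.3 * x))       # synthetic data: theta=1, beta=0.3
a, b, sigma = 1.0, 1.0, 1.0                   # assumed hyperparameters

def log_beta_cond(beta, theta):
    # log full conditional of beta, up to an additive constant
    return np.sum(y * beta * x - theta * np.exp(beta * x)) - beta**2 / (2 * sigma**2)

theta, beta = 1.0, 0.0
draws = []
for _ in range(2000):
    theta = rng.gamma(a + y.sum(), 1.0 / (b + np.exp(beta * x).sum()))
    prop = beta + 0.1 * rng.normal()          # random-walk proposal
    if np.log(rng.uniform()) < log_beta_cond(prop, theta) - log_beta_cond(beta, theta):
        beta = prop
    draws.append((theta, beta))
post_mean = np.mean(draws, axis=0)
print(post_mean)                              # rough posterior means of (theta, beta)
```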
Question 6. Suppose $p_\theta(x)$ is a density function for each $\theta \in \Theta$, $p_0(x)$ is a member of the family for a particular $\theta_0$, and observations $(x_1, \ldots, x_n)$ are i.i.d. from $p_0$. The aim is to estimate $\theta$ by minimizing the Fisher information distance given by
$$F(p_0, p_\theta) = \int p_0(x) \left( \frac{p_0'(x)}{p_0(x)} - \frac{p_\theta'(x)}{p_\theta(x)} \right)^2 dx,$$
where $p_0'(x)$ denotes the derivative with respect to $x$.

(i) Show that under regularity conditions, we can write
$$F(p_0, p_\theta) = \kappa + \int p_0(x) \left[ \left( \frac{p_\theta'(x)}{p_\theta(x)} \right)^2 + 2\, \frac{d}{dx} \left( \frac{p_\theta'(x)}{p_\theta(x)} \right) \right] dx,$$
where $\kappa$ is a constant not depending on $\theta$.

(ii) If
$$p_\theta(x) = \frac{\exp(w(x)/\theta)}{\int \exp(w(x)/\theta)\, dx},$$
find
$$l(\theta, x) = \left( \frac{p_\theta'(x)}{p_\theta(x)} \right)^2 + 2\, \frac{d}{dx} \left( \frac{p_\theta'(x)}{p_\theta(x)} \right).$$

(iii) Hence, explain how one could estimate $\theta$ from the observations using an approximation to $F(p_0, p_\theta)$.

(iv) Find the form of the estimator using your answer to (iii) with the model in (ii).
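A numerical sketch of the idea in (iii): replace the integral over $p_0$ by a sample average of $l(\theta, x)$ and minimize over $\theta$. It uses the hypothetical choice $w(x) = -x^2/2$, so that $p_\theta$ is $N(0, \theta)$ with $p_\theta'/p_\theta = -x/\theta$, giving $l(\theta, x) = x^2/\theta^2 - 2/\theta$.

```python
import numpy as np

# Score-matching-style estimation: minimize the sample average of
# l(theta, x) = (x/theta)^2 - 2/theta over a grid of theta values.
# Data are drawn from p_0 = N(0, 2), so the minimizer should be near theta = 2.
rng = np.random.default_rng(4)
x = rng.normal(0.0, np.sqrt(2.0), size=100_000)

def objective(theta):
    # empirical approximation to the theta-dependent part of F(p_0, p_theta)
    return np.mean((x / theta) ** 2 - 2.0 / theta)

grid = np.linspace(0.5, 4.0, 351)
theta_hat = grid[np.argmin([objective(t) for t in grid])]
print(theta_hat)                   # close to the true value 2
```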