Principles of Statistics


Part II, 2018–2005

Paper 4, Section II, 28K

Let $g : \mathbb{R} \to \mathbb{R}$ be an unknown function, twice continuously differentiable with $|g''(x)| \leq M$ for all $x \in \mathbb{R}$. For some $x_0 \in \mathbb{R}$, we know the value $g(x_0)$ and we wish to estimate its derivative $g'(x_0)$. To do so, we have access to a pseudo-random number generator that gives $U_1, \ldots, U_N$ i.i.d. uniform over $[0, 1]$, and a machine that takes input $x_1, \ldots, x_N \in \mathbb{R}$ and returns $g(x_i) + \varepsilon_i$, where the $\varepsilon_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$.

(a) Explain how this setup allows us to generate $N$ independent $X_i = x_0 + hZ_i$, where the $Z_i$ take value $-1$ or $1$ with probability $1/2$, for any $h > 0$.

(b) We denote by $Y_i$ the output $g(X_i) + \varepsilon_i$. Show that for some independent $\xi_i \in \mathbb{R}$,
$$Y_i - g(x_0) = hZ_i\, g'(x_0) + \frac{h^2}{2}\, g''(\xi_i) + \varepsilon_i.$$

(c) Using the intuition given by the least-squares estimator, justify the use of the estimator $\hat{g}_N$ given by
$$\hat{g}_N = \frac{1}{Nh} \sum_{i=1}^N Z_i \big(Y_i - g(x_0)\big).$$

(d) Show that
$$\mathbb{E}\big[|\hat{g}_N - g'(x_0)|^2\big] \leq \frac{h^2 M^2}{4} + \frac{\sigma^2}{N h^2}.$$
Show that for some choice $h_N$ of parameter $h$, this implies
$$\mathbb{E}\big[|\hat{g}_N - g'(x_0)|^2\big] \leq \frac{\sigma M}{\sqrt{N}}.$$

Part II, 2018
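As a quick numerical companion (not part of the question), the sketch below simulates the estimator $\hat g_N$ of part (c) for an arbitrary choice of $g$, $x_0$, $\sigma$ and $M$, using a bias–variance balancing choice of $h$ in the spirit of part (d); all concrete values are illustrative assumptions.

```python
import numpy as np

# Illustrative check of the estimator \hat g_N; every concrete choice here is an assumption.
rng = np.random.default_rng(0)
g = np.cos                     # example function; |g''| <= M = 1
dg = lambda x: -np.sin(x)      # true derivative, used only for comparison
x0, sigma, M, N = 1.0, 0.5, 1.0, 10_000

h = (4 * sigma**2 / (N * M**2)) ** 0.25               # balances bias^2 and variance
Z = np.where(rng.uniform(size=N) < 0.5, -1.0, 1.0)    # Rademacher signs built from U[0,1] draws
X = x0 + h * Z
Y = g(X) + rng.normal(0.0, sigma, size=N)             # noisy machine evaluations
g_hat = np.mean(Z * (Y - g(x0))) / h

print(g_hat, dg(x0), sigma * M / np.sqrt(N))          # estimate, truth, scale of the risk bound
```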

Paper 3, Section II, 28K

In the model $\{N(\theta, I_p),\ \theta \in \mathbb{R}^p\}$ of a Gaussian distribution in dimension $p$, with unknown mean $\theta$ and known identity covariance matrix $I_p$, we estimate $\theta$ based on a sample of i.i.d. observations $X_1, \ldots, X_n$ drawn from $N(\theta_0, I_p)$.

(a) Define the Fisher information $I(\theta_0)$, and compute it in this model.

(b) We recall that the observed Fisher information $i_n(\theta)$ is given by
$$i_n(\theta) = \frac{1}{n} \sum_{i=1}^n \nabla_\theta \log f(X_i, \theta)\, \nabla_\theta \log f(X_i, \theta)^\top.$$
Find the limit of $\hat{i}_n = i_n(\hat{\theta}_{MLE})$, where $\hat{\theta}_{MLE}$ is the maximum likelihood estimator of $\theta$ in this model.

(c) Define the Wald statistic $W_n(\theta)$ and compute it. Give the limiting distribution of $W_n(\theta_0)$ and explain how it can be used to design a confidence interval for $\theta_0$.

[You may use results from the course provided that you state them clearly.]

Paper 2, Section II, 28K

We consider the model $\{N(\theta, I_p),\ \theta \in \mathbb{R}^p\}$ of a Gaussian distribution in dimension $p \geq 3$, with unknown mean $\theta$ and known identity covariance matrix $I_p$. We estimate $\theta$ based on one observation $X \sim N(\theta, I_p)$, under the loss function $\ell(\theta, \delta) = \|\theta - \delta\|_2^2$.

(a) Define the risk of an estimator $\hat{\theta}$. Compute the maximum likelihood estimator $\hat{\theta}_{MLE}$ of $\theta$ and its risk for any $\theta \in \mathbb{R}^p$.

(b) Define what an admissible estimator is. Is $\hat{\theta}_{MLE}$ admissible?

(c) For any $c > 0$, let $\pi_c(\theta)$ be the prior $N(0, c^2 I_p)$. Find a Bayes optimal estimator $\hat{\theta}_c$ under this prior with the quadratic loss, and compute its Bayes risk.

(d) Show that $\hat{\theta}_{MLE}$ is minimax.

[You may use results from the course provided that you state them clearly.]

Part II, 2018

Paper 1, Section II, 29K

A scientist wishes to estimate the proportion $\theta \in (0, 1)$ of presence of a gene in a population of flies of size $n$. Every fly receives a chromosome from each of its two parents, each carrying the gene A with probability $(1 - \theta)$ or the gene B with probability $\theta$, independently. The scientist can observe if each fly has two copies of the gene A (denoted by AA), two copies of the gene B (denoted by BB) or one of each (denoted by AB). We let $n_{AA}$, $n_{BB}$ and $n_{AB}$ denote the number of each observation among the $n$ flies.

(a) Give the probability of each observation as a function of $\theta$, denoted by $f(x, \theta)$, for all three values $x =$ AA, BB or AB.

(b) For a vector $w = (w_{AA}, w_{BB}, w_{AB})$, we let $\hat{\theta}_w$ denote the estimator defined by
$$\hat{\theta}_w = w_{AA}\, \frac{n_{AA}}{n} + w_{BB}\, \frac{n_{BB}}{n} + w_{AB}\, \frac{n_{AB}}{n}.$$
Find the unique vector $w$ such that $\hat{\theta}_w$ is unbiased. Show that $\hat{\theta}_w$ is a consistent estimator of $\theta$.

(c) Compute the maximum likelihood estimator of $\theta$ in this model, denoted by $\hat{\theta}_{MLE}$. Find the limiting distribution of $\sqrt{n}(\hat{\theta}_{MLE} - \theta)$. [You may use results from the course, provided that you state them clearly.]

Part II, 2018
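A purely illustrative numerical companion: the sketch below simulates genotype counts and evaluates both $\hat\theta_w$ and $\hat\theta_{MLE}$. The closed forms used, $w = (0, 1, 1/2)$ and $\hat\theta_{MLE} = (2n_{BB} + n_{AB})/(2n)$, are the editor's own working rather than quotations from the paper, and the true $\theta$ and $n$ are made-up values.

```python
import numpy as np

# Illustrative computation for the gene-proportion question; counts are simulated,
# and the closed forms in the comments are working, not quoted from the paper.
rng = np.random.default_rng(1)
theta, n = 0.3, 1000
genotypes = rng.choice(["AA", "BB", "AB"], size=n,
                       p=[(1 - theta)**2, theta**2, 2 * theta * (1 - theta)])
n_AA = np.sum(genotypes == "AA")
n_BB = np.sum(genotypes == "BB")
n_AB = np.sum(genotypes == "AB")

theta_w = 0 * n_AA / n + 1 * n_BB / n + 0.5 * n_AB / n   # unbiased estimator with w = (0, 1, 1/2)
theta_mle = (2 * n_BB + n_AB) / (2 * n)                  # MLE; here it coincides with theta_w
print(theta_w, theta_mle)
```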

Paper 2, Section II, 26K

We consider the problem of estimating $\theta$ in the model $\{f(x, \theta) : \theta \in (0, \infty)\}$, where
$$f(x, \theta) = (1 - \alpha)(x - \theta)^{-\alpha}\, \mathbf{1}\{x \in [\theta, \theta + 1]\}.$$
Here $\mathbf{1}\{A\}$ is the indicator of the set $A$, and $\alpha \in (0, 1)$ is known. This estimation is based on a sample of $n$ i.i.d. $X_1, \ldots, X_n$, and we denote by $X_{(1)} < \ldots < X_{(n)}$ the ordered sample.

(a) Compute the mean and the variance of $X_1$. Construct an unbiased estimator of $\theta$ taking the form $\tilde{\theta}_n = \bar{X}_n + c(\alpha)$, where $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$, specifying $c(\alpha)$.

(b) Show that $\tilde{\theta}_n$ is consistent and find the limit in distribution of $\sqrt{n}(\tilde{\theta}_n - \theta)$. Justify your answer, citing theorems that you use.

(c) Find the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$. Compute $\mathbb{P}(\hat{\theta}_n - \theta > t)$ for all real $t$. Is $\hat{\theta}_n$ unbiased?

(d) For $t > 0$, show that $\mathbb{P}(n^\beta(\hat{\theta}_n - \theta) > t)$ has a limit in $(0, 1)$ for some $\beta > 0$. Give explicitly the value of $\beta$ and the limit. Why should one favour using $\hat{\theta}_n$ over $\tilde{\theta}_n$?

Part II, 2017

Paper 3, Section II, 26K

We consider the problem of estimating an unknown $\theta_0$ in a statistical model $\{f(x, \theta),\ \theta \in \Theta\}$ where $\Theta \subseteq \mathbb{R}$, based on $n$ i.i.d. observations $X_1, \ldots, X_n$ whose distribution has p.d.f. $f(x, \theta_0)$. In all the parts below you may assume that the model satisfies necessary regularity conditions.

(a) Define the score function $S_n$ of $\theta$. Prove that $S_n(\theta_0)$ has mean $0$.

(b) Define the Fisher information $I(\theta)$. Show that it can also be expressed as
$$I(\theta) = -\mathbb{E}_\theta\left[\frac{d^2}{d\theta^2} \log f(X_1, \theta)\right].$$

(c) Define the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$. Give without proof the limits of $\hat{\theta}_n$ and of $\sqrt{n}(\hat{\theta}_n - \theta_0)$ (in a manner which you should specify). [Be as precise as possible when describing a distribution.]

(d) Let $\psi : \Theta \to \mathbb{R}$ be a continuously differentiable function, and $\tilde{\theta}_n$ another estimator of $\theta_0$ such that $|\hat{\theta}_n - \tilde{\theta}_n| \leq 1/n$ with probability $1$. Give the limits of $\psi(\tilde{\theta}_n)$ and of $\sqrt{n}\big(\psi(\tilde{\theta}_n) - \psi(\theta_0)\big)$ (in a manner which you should specify).

Part II, 2017

Paper 4, Section II, 27K

For the statistical model $\{N_d(\theta, \Sigma),\ \theta \in \mathbb{R}^d\}$, where $\Sigma$ is a known, positive-definite $d \times d$ matrix, we want to estimate $\theta$ based on $n$ i.i.d. observations $X_1, \ldots, X_n$ with distribution $N_d(\theta, \Sigma)$.

(a) Derive the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$. What is the distribution of $\hat{\theta}_n$?

(b) For $\alpha \in (0, 1)$, construct a confidence region $C_n^\alpha$ such that $\mathbb{P}_\theta(\theta \in C_n^\alpha) = 1 - \alpha$.

(c) For $\Sigma = I_d$, compute the maximum likelihood estimator of $\theta$ for the following parameter spaces:
(i) $\Theta = \{\theta : \|\theta\|_2 = 1\}$;
(ii) $\Theta = \{\theta : v^\top \theta = 0\}$ for some unit vector $v \in \mathbb{R}^d$.

(d) For $\Sigma = I_d$, we want to test the null hypothesis $\Theta_0 = \{0\}$ (i.e. $\theta = 0$) against the composite alternative $\Theta_1 = \mathbb{R}^d \setminus \{0\}$. Compute the likelihood ratio statistic $\Lambda(\Theta_1, \Theta_0)$ and give its distribution under the null hypothesis. Compare this result with the statement of Wilks' theorem.

Part II, 2017

Paper 1, Section II, 28K

For a positive integer $n$, we want to estimate the parameter $p$ in the binomial statistical model $\{\mathrm{Bin}(n, p),\ p \in [0, 1]\}$, based on an observation $X \sim \mathrm{Bin}(n, p)$.

(a) Compute the maximum likelihood estimator for $p$. Show that the posterior distribution for $p$ under a uniform prior on $[0, 1]$ is $\mathrm{Beta}(a, b)$, and specify $a$ and $b$.
[The p.d.f. of $\mathrm{Beta}(a, b)$ is given by $f_{a,b}(p) = \frac{(a + b - 1)!}{(a - 1)!\,(b - 1)!}\, p^{a-1} (1 - p)^{b-1}$.]

(b) (i) For a loss function $L$, define the risk of an estimator $\hat{p}$ of $p$, and the Bayes risk under a prior $\pi$ for $p$.
(ii) Under the loss function
$$L(\hat{p}, p) = \frac{(\hat{p} - p)^2}{p(1 - p)},$$
find a Bayes optimal estimator for the uniform prior. Give its risk as a function of $p$.
(iii) Give a minimax optimal estimator for the loss function $L$ given above. Justify your answer.

Part II, 2017

Paper 3, Section II, 25J

Let $X_1, \ldots, X_n$ be i.i.d. random variables from a $N(\theta, 1)$ distribution, $\theta \in \mathbb{R}$, and consider a Bayesian model $\theta \sim N(0, v^2)$ for the unknown parameter, where $v > 0$ is a fixed constant.

(a) Derive the posterior distribution $\Pi(\cdot \mid X_1, \ldots, X_n)$ of $\theta \mid X_1, \ldots, X_n$.

(b) Construct a credible set $C_n \subset \mathbb{R}$ such that
(i) $\Pi(C_n \mid X_1, \ldots, X_n) = 0.95$ for every $n \in \mathbb{N}$, and
(ii) for any $\theta_0 \in \mathbb{R}$, $P^{\mathbb{N}}_{\theta_0}(\theta_0 \in C_n) \to 0.95$ as $n \to \infty$,
where $P^{\mathbb{N}}_\theta$ denotes the distribution of the infinite sequence $X_1, X_2, \ldots$ when drawn independently from a fixed $N(\theta, 1)$ distribution.

[You may use the central limit theorem.]

Paper 2, Section II, 26J

(a) State and prove the Cramér–Rao inequality in a parametric model $\{f(\theta) : \theta \in \Theta\}$, where $\Theta \subseteq \mathbb{R}$. [Necessary regularity conditions on the model need not be specified.]

(b) Let $X_1, \ldots, X_n$ be i.i.d. Poisson random variables with unknown parameter $\mathbb{E}X_1 = \theta > 0$. For $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$ and $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ define
$$T_\alpha = \alpha \bar{X}_n + (1 - \alpha) S^2, \qquad 0 \leq \alpha \leq 1.$$
Show that $\mathrm{Var}_\theta(T_\alpha) \geq \mathrm{Var}_\theta(\bar{X}_n)$ for all values of $\alpha, \theta$.

Now suppose $\tilde{\theta} = \tilde{\theta}(X_1, \ldots, X_n)$ is an estimator of $\theta$ with possibly nonzero bias $B(\theta) = \mathbb{E}_\theta \tilde{\theta} - \theta$. Suppose the function $B$ is monotone increasing on $(0, \infty)$. Prove that the mean-squared errors satisfy
$$\mathbb{E}_\theta(\tilde{\theta}_n - \theta)^2 \geq \mathbb{E}_\theta(\bar{X}_n - \theta)^2 \qquad \text{for all } \theta \in \Theta.$$

Part II, 2016

Paper 4, Section II, 26J

Consider a decision problem with parameter space $\Theta$. Define the concepts of a Bayes decision rule $\delta_\pi$ and of a least favourable prior.

Suppose $\pi$ is a prior distribution on $\Theta$ such that the Bayes risk of the Bayes rule equals $\sup_{\theta \in \Theta} R(\delta_\pi, \theta)$, where $R(\delta, \theta)$ is the risk function associated to the decision problem. Prove that $\delta_\pi$ is least favourable.

Now consider a random variable $X$ arising from the binomial distribution $\mathrm{Bin}(n, \theta)$, where $\theta \in \Theta = [0, 1]$. Construct a least favourable prior for the squared risk $R(\delta, \theta) = \mathbb{E}_\theta(\delta(X) - \theta)^2$. [You may use without proof the fact that the Bayes rule for quadratic risk is given by the posterior mean.]

Paper 1, Section II, 27J

Derive the maximum likelihood estimator $\hat{\theta}_n$ based on independent observations $X_1, \ldots, X_n$ that are identically distributed as $N(\theta, 1)$, where the unknown parameter $\theta$ lies in the parameter space $\Theta = \mathbb{R}$. Find the limiting distribution of $\sqrt{n}(\hat{\theta}_n - \theta)$ as $n \to \infty$.

Now define
$$\tilde{\theta}_n = \begin{cases} \hat{\theta}_n & \text{whenever } |\hat{\theta}_n| > n^{-1/4}, \\ 0 & \text{otherwise}, \end{cases}$$
and find the limiting distribution of $\sqrt{n}(\tilde{\theta}_n - \theta)$ as $n \to \infty$.

Calculate
$$\lim_{n \to \infty}\ \sup_{\theta \in \Theta}\ n\, \mathbb{E}_\theta(T_n - \theta)^2$$
for the choices $T_n = \hat{\theta}_n$ and $T_n = \tilde{\theta}_n$. Based on the above findings, which estimator $T_n$ of $\theta$ would you prefer? Explain your answer.

[Throughout, you may use standard facts of stochastic convergence, such as the central limit theorem, provided they are clearly stated.]

Part II, 2016
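The estimator $\tilde\theta_n$ in Paper 1 is a Hodges-type superefficient estimator. The Monte Carlo sketch below (sample size, $\theta$ grid and replication count are illustrative assumptions) shows the scaled risk $n\,\mathbb{E}_\theta(T_n - \theta)^2$ staying near 1 for the MLE, dropping below 1 at $\theta = 0$, but becoming very large at $\theta$ of order $n^{-1/4}$, which is the point of the comparison.

```python
import numpy as np

# Monte Carlo sketch of the scaled risk n * E_theta (T_n - theta)^2 for the MLE
# (sample mean) and for the thresholded estimator of the question.
rng = np.random.default_rng(2)
n, reps = 400, 20_000

def scaled_risk(theta):
    x_bar = theta + rng.normal(size=reps) / np.sqrt(n)         # law of the sample mean
    hodges = np.where(np.abs(x_bar) > n ** (-0.25), x_bar, 0.0)  # thresholded estimator
    return (n * np.mean((x_bar - theta) ** 2),
            n * np.mean((hodges - theta) ** 2))

for theta in [0.0, n ** (-0.25), 0.5]:
    print(theta, scaled_risk(theta))
```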

Paper 4, Section II, 24J

Given independent and identically distributed observations $X_1, \ldots, X_n$ with finite mean $\mathbb{E}(X_1) = \mu$ and variance $\mathrm{Var}(X_1) = \sigma^2$, explain the notion of a bootstrap sample $X_1^b, \ldots, X_n^b$, and discuss how you can use it to construct a confidence interval $C_n$ for $\mu$.

Suppose you can operate a random number generator that can simulate independent uniform random variables $U_1, \ldots, U_n$ on $[0, 1]$. How can you use such a random number generator to simulate a bootstrap sample?

Suppose that $(F_n : n \in \mathbb{N})$ and $F$ are cumulative probability distribution functions defined on the real line, that $F_n(t) \to F(t)$ as $n \to \infty$ for every $t \in \mathbb{R}$, and that $F$ is continuous on $\mathbb{R}$. Show that, as $n \to \infty$,
$$\sup_{t \in \mathbb{R}} |F_n(t) - F(t)| \to 0.$$

State (without proof) the theorem about the consistency of the bootstrap of the mean, and use it to give an asymptotic justification of the confidence interval $C_n$. That is, prove that, as $n \to \infty$, $P^{\mathbb{N}}(\mu \in C_n) \to 1 - \alpha$, where $P^{\mathbb{N}}$ is the joint distribution of $X_1, X_2, \ldots$.

[You may use standard facts of stochastic convergence and the Central Limit Theorem without proof.]

Part II, 2015
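As a concrete companion to the first two paragraphs of the question, the sketch below builds a bootstrap confidence interval for the mean, drawing the resampling indices from uniforms as the question suggests. The data-generating distribution, sample size, number of bootstrap replicates and the particular (centred-percentile) interval used are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a nonparametric bootstrap confidence interval for the mean.
rng = np.random.default_rng(3)
n, B, alpha = 200, 2000, 0.05
x = rng.exponential(scale=2.0, size=n)            # observed sample; true mean mu = 2
x_bar = x.mean()

boot_means = np.empty(B)
for b in range(B):
    idx = (n * rng.uniform(size=n)).astype(int)   # resampling indices built from U[0,1] draws
    boot_means[b] = x[idx].mean()

# Invert the bootstrap law of x_bar* - x_bar to get an interval for mu.
lo, hi = np.quantile(boot_means - x_bar, [alpha / 2, 1 - alpha / 2])
C_n = (x_bar - hi, x_bar - lo)
print(x_bar, C_n)
```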

Paper 3, Section II, 24J

Define what it means for an estimator $\hat{\theta}$ of an unknown parameter $\theta$ to be consistent.

Let $S_n$ be a sequence of random real-valued continuous functions defined on $\mathbb{R}$ such that, as $n \to \infty$, $S_n(\theta)$ converges to $S(\theta)$ in probability for every $\theta \in \mathbb{R}$, where $S : \mathbb{R} \to \mathbb{R}$ is non-random. Suppose that for some $\theta_0 \in \mathbb{R}$ and every $\varepsilon > 0$ we have
$$S(\theta_0 - \varepsilon) < 0 < S(\theta_0 + \varepsilon),$$
and that $S_n$ has exactly one zero $\hat{\theta}_n$ for every $n \in \mathbb{N}$. Show that $\hat{\theta}_n \to^{\mathbb{P}} \theta_0$ as $n \to \infty$, and deduce from this that the maximum likelihood estimator (MLE) based on observations $X_1, \ldots, X_n$ from a $N(\theta, 1)$, $\theta \in \mathbb{R}$, model is consistent.

Now consider independent observations $X_1, \ldots, X_n$ of bivariate normal random vectors
$$X_i = (X_{1i}, X_{2i})^T \sim N_2\big[(\mu_i, \mu_i)^T,\ \sigma^2 I_2\big], \qquad i = 1, \ldots, n,$$
where $\mu_i \in \mathbb{R}$, $\sigma > 0$ and $I_2$ is the $2 \times 2$ identity matrix. Find the MLE $\hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_n)^T$ of $\mu = (\mu_1, \ldots, \mu_n)^T$ and show that the MLE of $\sigma^2$ equals
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n s_i^2, \qquad s_i^2 = \frac{1}{2}\big[(X_{1i} - \hat{\mu}_i)^2 + (X_{2i} - \hat{\mu}_i)^2\big].$$
Show that $\hat{\sigma}^2$ is not consistent for estimating $\sigma^2$. Explain briefly why the MLE fails in this model.

[You may use the Law of Large Numbers without proof.]

Part II, 2015

Paper 2, Section II, 25J

Consider a random variable $X$ arising from the binomial distribution $\mathrm{Bin}(n, \theta)$, $\theta \in \Theta = [0, 1]$. Find the maximum likelihood estimator $\hat{\theta}_{MLE}$ and the Fisher information $I(\theta)$ for $\theta \in \Theta$.

Now consider the following priors on $\Theta$:
(i) a uniform $U([0, 1])$ prior on $[0, 1]$,
(ii) a prior with density $\pi(\theta)$ proportional to $\sqrt{I(\theta)}$,
(iii) a $\mathrm{Beta}(\sqrt{n}/2, \sqrt{n}/2)$ prior.

Find the means $\mathbb{E}[\theta \mid X]$ and modes $m_{\theta \mid X}$ of the posterior distributions corresponding to the prior distributions (i)–(iii). Which of these posterior decision rules coincide with $\hat{\theta}_{MLE}$? Which one is minimax for quadratic risk? Justify your answers.

[You may use the following properties of the $\mathrm{Beta}(a, b)$ ($a > 0$, $b > 0$) distribution. Its density $f(x; a, b)$, $x \in [0, 1]$, is proportional to $x^{a-1}(1 - x)^{b-1}$, its mean is equal to $a/(a + b)$, and its mode is equal to
$$\frac{\max(a - 1, 0)}{\max(a, 1) + \max(b, 1) - 2}$$
provided either $a > 1$ or $b > 1$.
You may further use the fact that a unique Bayes rule of constant risk is a unique minimax rule for that risk.]

Part II, 2015
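The posterior computations here reduce to Beta–binomial conjugacy. The sketch below evaluates the three posterior means and modes using the Beta facts quoted in the question; the observed $X$ and $n$ are illustrative assumptions, and the identification of the prior in (ii) with $\mathrm{Beta}(1/2, 1/2)$ is the editor's working.

```python
import numpy as np

# Posterior means/modes via Beta-binomial conjugacy, using the Beta mean a/(a+b)
# and the mode formula quoted in the question.  X and n are made-up values.
n, X = 25, 7

def posterior_summary(a0, b0):
    a, b = a0 + X, b0 + n - X              # Beta(a0, b0) prior -> Beta(a0 + X, b0 + n - X) posterior
    mean = a / (a + b)
    mode = max(a - 1, 0) / (max(a, 1) + max(b, 1) - 2)
    return mean, mode

priors = {
    "(i)   uniform = Beta(1, 1)":                  (1.0, 1.0),
    "(ii)  density prop. to sqrt(I) = Beta(1/2, 1/2)": (0.5, 0.5),
    "(iii) Beta(sqrt(n)/2, sqrt(n)/2)":            (np.sqrt(n) / 2, np.sqrt(n) / 2),
}
print("MLE:", X / n)
for name, (a0, b0) in priors.items():
    print(name, posterior_summary(a0, b0))
```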

Paper 1, Section II, 25J

Consider a normally distributed random vector $X \in \mathbb{R}^p$ modelled as $X \sim N(\theta, I_p)$ where $\theta \in \mathbb{R}^p$, $I_p$ is the $p \times p$ identity matrix, and where $p \geq 3$. Define the Stein estimator $\hat{\theta}_{STEIN}$ of $\theta$.

Prove that $\hat{\theta}_{STEIN}$ dominates the estimator $\tilde{\theta} = X$ for the risk function induced by quadratic loss
$$\ell(a, \theta) = \sum_{i=1}^p (a_i - \theta_i)^2, \qquad a \in \mathbb{R}^p.$$

Show however that the worst case risks coincide, that is, show that
$$\sup_{\theta \in \mathbb{R}^p} \mathbb{E}_\theta\, \ell(X, \theta) = \sup_{\theta \in \mathbb{R}^p} \mathbb{E}_\theta\, \ell(\hat{\theta}_{STEIN}, \theta).$$

[You may use Stein's lemma without proof, provided it is clearly stated.]

Part II, 2015
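Both claims are easy to see numerically. The sketch below compares the Monte Carlo risk of $X$ with that of the shrinkage estimator $(1 - (p-2)/\|X\|^2)X$, taken here to be the usual James–Stein form (which may differ in constants from the course's exact definition of the Stein estimator); the dimension, $\theta$ values and replication count are illustrative assumptions.

```python
import numpy as np

# Monte Carlo comparison of the quadratic risk of X and of a James-Stein-type estimator.
rng = np.random.default_rng(4)
p, reps = 10, 50_000

def risks(theta):
    X = theta + rng.normal(size=(reps, p))
    norm2 = np.sum(X ** 2, axis=1)
    stein = (1 - (p - 2) / norm2)[:, None] * X
    return (np.mean(np.sum((X - theta) ** 2, axis=1)),       # risk of X, equal to p
            np.mean(np.sum((stein - theta) ** 2, axis=1)))   # strictly smaller for every theta

for scale in [0.0, 1.0, 10.0]:        # the gap shrinks as ||theta|| grows: worst-case risks coincide
    theta = np.full(p, scale)
    print(scale, risks(theta))
```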

Paper 4, Section II, 27J

Suppose you have at hand a pseudo-random number generator that can simulate an i.i.d. sequence of uniform $U[0, 1]$ distributed random variables $U_1, \ldots, U_N$ for any $N \in \mathbb{N}$. Construct an algorithm to simulate an i.i.d. sequence $X_1, \ldots, X_N$ of standard normal $N(0, 1)$ random variables. [Should your algorithm depend on the inverse of any cumulative probability distribution function, you are required to provide an explicit expression for this inverse function.]

Suppose as a matter of urgency you need to approximately evaluate the integral
$$I = \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}}\, (\pi + |x|)^{1/4}\, e^{-x^2/2}\, dx.$$
Find an approximation $I_N$ of this integral that requires $N$ simulation steps from your pseudo-random number generator, and which has stochastic accuracy
$$\Pr\big(|I_N - I| > N^{-1/4}\big) \leq N^{-1/2},$$
where $\Pr$ denotes the joint law of the simulated random variables. Justify your answer.

Paper 3, Section II, 27J

State and prove Wilks' theorem about testing the simple hypothesis $H_0 : \theta = \theta_0$ against the alternative $H_1 : \theta \in \Theta \setminus \{\theta_0\}$, in a one-dimensional regular parametric model $\{f(\cdot, \theta) : \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}$. [You may use without proof the results from lectures on the consistency and asymptotic distribution of maximum likelihood estimators, as well as on uniform laws of large numbers. Necessary regularity conditions can be assumed without statement.]

Find the maximum likelihood estimator $\hat{\theta}_n$ based on i.i.d. observations $X_1, \ldots, X_n$ in a $N(0, \theta)$-model, $\theta \in \Theta = (0, \infty)$. Deduce the limit distribution as $n \to \infty$ of the sequence of statistics
$$-n\Big(\log\big(\overline{X^2}\big) - \big(\overline{X^2} - 1\big)\Big),$$
where $\overline{X^2} = (1/n)\sum_{i=1}^n X_i^2$ and $X_1, \ldots, X_n$ are i.i.d. $N(0, 1)$.

Part II, 2014
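One standard answer to the simulation part of Paper 4 is the Box–Muller transform, which avoids writing down $\Phi^{-1}$ explicitly. The sketch below uses it and then estimates $I$ by a plain Monte Carlo average, using that $I = \mathbb{E}[(\pi + |X|)^{1/4}]$ for $X \sim N(0, 1)$; the sample size is an illustrative assumption, and the Chebyshev-type accuracy argument asked for in the question is not reproduced.

```python
import numpy as np

# Box-Muller: two U[0,1] draws give an N(0,1) draw; then Monte Carlo for the integral I.
rng = np.random.default_rng(5)
N = 100_000

U1, U2 = rng.uniform(size=N), rng.uniform(size=N)
X = np.sqrt(-2 * np.log(U1)) * np.cos(2 * np.pi * U2)   # X ~ N(0, 1)

I_N = np.mean((np.pi + np.abs(X)) ** 0.25)              # estimates E[(pi + |X|)^{1/4}]
print(I_N)
```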

Paper 2, Section II, 28J

In a general decision problem, define the concepts of a Bayes rule and of admissibility. Show that a unique Bayes rule is admissible.

Consider i.i.d. observations $X_1, \ldots, X_n$ from a $\mathrm{Poisson}(\theta)$, $\theta \in \Theta = (0, \infty)$, model. Can the maximum likelihood estimator $\hat{\theta}_{MLE}$ of $\theta$ be a Bayes rule for estimating $\theta$ in quadratic risk for any prior distribution on $\theta$ that has a continuous probability density on $(0, \infty)$? Justify your answer.

Now model the $X_i$ as i.i.d. copies of $X \mid \theta \sim \mathrm{Poisson}(\theta)$, where $\theta$ is drawn from a prior that is a Gamma distribution with parameters $\alpha > 0$ and $\lambda > 0$ (given below). Show that the posterior distribution of $\theta \mid X_1, \ldots, X_n$ is a Gamma distribution and find its parameters. Find the Bayes rule $\hat{\theta}_{BAYES}$ for estimating $\theta$ in quadratic risk for this prior.

[The Gamma probability density function with parameters $\alpha > 0$, $\lambda > 0$ is given by
$$f(\theta) = \frac{\lambda^\alpha \theta^{\alpha - 1} e^{-\lambda\theta}}{\Gamma(\alpha)}, \qquad \theta > 0,$$
where $\Gamma(\alpha)$ is the usual Gamma function.]

Finally assume that the $X_i$ have actually been generated from a fixed $\mathrm{Poisson}(\theta_0)$ distribution, where $\theta_0 > 0$. Show that $\sqrt{n}(\hat{\theta}_{BAYES} - \hat{\theta}_{MLE})$ converges to zero in probability and deduce the asymptotic distribution of $\sqrt{n}(\hat{\theta}_{BAYES} - \theta_0)$ under the joint law $P^{\mathbb{N}}_{\theta_0}$ of the random variables $X_1, X_2, \ldots$. [You may use standard results from lectures without proof provided they are clearly stated.]

Paper 1, Section II, 28J

State without proof the inequality known as the Cramér–Rao lower bound in a parametric model $\{f(\cdot, \theta) : \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}$. Give an example of a maximum likelihood estimator that attains this lower bound, and justify your answer.

Give an example of a parametric model where the maximum likelihood estimator based on observations $X_1, \ldots, X_n$ is biased. State without proof an analogue of the Cramér–Rao inequality for biased estimators.

Define the concept of a minimax decision rule, and show that the maximum likelihood estimator $\hat{\theta}_{MLE}$ based on $X_1, \ldots, X_n$ in a $N(\theta, 1)$ model is minimax for estimating $\theta \in \Theta = \mathbb{R}$ in quadratic risk.

Part II, 2014
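A small numerical companion to the conjugacy part of Paper 2: the sketch below performs the Gamma–Poisson update and compares the posterior-mean Bayes rule with the MLE. The prior parameters, $\theta_0$ and $n$ are illustrative assumptions; the conjugate update itself is standard.

```python
import numpy as np

# Gamma-Poisson conjugacy: Gamma(alpha, lam) prior + Poisson data -> Gamma(alpha + sum(x), lam + n).
rng = np.random.default_rng(6)
alpha, lam = 2.0, 1.0
theta0, n = 3.0, 200
x = rng.poisson(theta0, size=n)

alpha_post, lam_post = alpha + x.sum(), lam + n
theta_bayes = alpha_post / lam_post          # posterior mean = Bayes rule under quadratic risk
theta_mle = x.mean()
print(theta_mle, theta_bayes, np.sqrt(n) * (theta_bayes - theta_mle))   # difference shrinks with n
```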

Paper 4, Section II, 27K

Assuming only the existence and properties of the univariate normal distribution, define $N_p(\mu, \Sigma)$, the multivariate normal distribution with mean (row-)vector $\mu$ and dispersion matrix $\Sigma$; and $W_p(\nu; \Sigma)$, the Wishart distribution on integer $\nu > 1$ degrees of freedom and with scale parameter $\Sigma$. Show that, if $X \sim N_p(\mu, \Sigma)$, $S \sim W_p(\nu; \Sigma)$, and $b$ ($1 \times q$), $A$ ($p \times q$) are fixed, then
$$b + XA \sim N_q(b + \mu A, \Phi), \qquad A^T S A \sim W_q(\nu; \Phi),$$
where $\Phi = A^T \Sigma A$.

The random ($n \times p$) matrix $X$ has rows that are independently distributed as $N_p(M, \Sigma)$, where both parameters $M$ and $\Sigma$ are unknown. Let $\bar{X} := n^{-1}\mathbf{1}^T X$, where $\mathbf{1}$ is the ($n \times 1$) vector of 1s; and $S^c := X^T \Pi X$, with $\Pi := I_n - n^{-1}\mathbf{1}\mathbf{1}^T$. State the joint distribution of $\bar{X}$ and $S^c$ given the parameters.

Now suppose $n > p$ and $\Sigma$ is positive definite. Hotelling's $T^2$ is defined as
$$T^2 := n(\bar{X} - M)\, (\bar{S}^c)^{-1} (\bar{X} - M)^T,$$
where $\bar{S}^c := S^c/\nu$ with $\nu := (n - 1)$. Show that, for any values of $M$ and $\Sigma$,
$$\frac{\nu - p + 1}{\nu p}\, T^2 \sim F^p_{\nu - p + 1},$$
the $F$ distribution on $p$ and $\nu - p + 1$ degrees of freedom.

[You may assume that:
1. If $S \sim W_p(\nu; \Sigma)$ and $a$ is a fixed ($p \times 1$) vector, then
$$\frac{a^T \Sigma^{-1} a}{a^T S^{-1} a} \sim \chi^2_{\nu - p + 1}.$$
2. If $V \sim \chi^2_p$ and $W \sim \chi^2_\lambda$ are independent, then $\dfrac{V/p}{W/\lambda} \sim F^p_\lambda$.]

Part II, 2013

Paper 3, Section II, 27K

What is meant by a convex decision problem? State and prove a theorem to the effect that, in a convex decision problem, there is no point in randomising. [You may use standard terms without defining them.]

The sample space, parameter space and action space are each the two-point set $\{1, 2\}$. The observable $X$ takes value 1 with probability $2/3$ when the parameter $\Theta = 1$, and with probability $3/4$ when $\Theta = 2$. The loss function $L(\theta, a)$ is 0 if $a = \theta$, otherwise 1. Describe all the non-randomised decision rules, compute their risk functions, and plot these as points in the unit square. Identify an inadmissible non-randomised decision rule, and a decision rule that dominates it.

Show that the minimax rule has risk function $(8/17, 8/17)$, and is Bayes against a prior distribution that you should specify. What is its Bayes risk? Would a Bayesian with this prior distribution be bound to use the minimax rule?

Paper 1, Section II, 28K

When the real parameter $\Theta$ takes value $\theta$, variables $X_1, X_2, \ldots$ arise independently from a distribution $P_\theta$ having density function $p_\theta(x)$ with respect to an underlying measure $\mu$. Define the score variable $U_n(\theta)$ and the information function $I_n(\theta)$ for estimation of $\Theta$ based on $X^n := (X_1, \ldots, X_n)$, and relate $I_n(\theta)$ to $i(\theta) := I_1(\theta)$.

State and prove the Cramér–Rao inequality for the variance of an unbiased estimator of $\Theta$. Under what conditions does this inequality become an equality? What is the form of the estimator in this case?

[You may assume $\mathbb{E}_\theta\{U_n(\theta)\} = 0$, $\mathrm{var}_\theta\{U_n(\theta)\} = I_n(\theta)$, and any further required regularity conditions, without comment.]

Let $\hat{\Theta}_n$ be the maximum likelihood estimator of $\Theta$ based on $X^n$. What is the asymptotic distribution of $n^{1/2}(\hat{\Theta}_n - \Theta)$ when $\Theta = \theta$?

Suppose that, for each $n$, $\hat{\Theta}_n$ is unbiased for $\Theta$, and the variance of $n^{1/2}(\hat{\Theta}_n - \Theta)$ is exactly equal to its asymptotic variance. By considering the estimator $\alpha\hat{\Theta}_k + (1 - \alpha)\hat{\Theta}_n$, or otherwise, show that, for $k < n$,
$$\mathrm{cov}_\theta(\hat{\Theta}_k, \hat{\Theta}_n) = \mathrm{var}_\theta(\hat{\Theta}_n).$$

Part II, 2013

Paper 2, Section II, 28K

Describe the Weak Sufficiency Principle (WSP) and the Strong Sufficiency Principle (SSP). Show that Bayesian inference with a fixed prior distribution respects WSP.

A parameter $\Phi$ has a prior distribution which is normal with mean 0 and precision (inverse variance) $h_\Phi$. Given $\Phi = \phi$, further parameters $\Theta := (\Theta_i : i = 1, \ldots, I)$ have independent normal distributions with mean $\phi$ and precision $h_\Theta$. Finally, given both $\Phi = \phi$ and $\Theta = \theta := (\theta_1, \ldots, \theta_I)$, observables $X := (X_{ij} : i = 1, \ldots, I;\ j = 1, \ldots, J)$ are independent, $X_{ij}$ being normal with mean $\theta_i$ and precision $h_X$. The precision parameters $(h_\Phi, h_\Theta, h_X)$ are all fixed and known. Let $\bar{X} := (\bar{X}_1, \ldots, \bar{X}_I)$, where $\bar{X}_i := \sum_{j=1}^J X_{ij}/J$. Show, directly from the definition of sufficiency, that $\bar{X}$ is sufficient for $(\Phi, \Theta)$. [You may assume without proof that, if $Y_1, \ldots, Y_n$ have independent normal distributions with the same variance, and $\bar{Y} := n^{-1}\sum_{i=1}^n Y_i$, then the vector $(Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$ is independent of $\bar{Y}$.]

For data-values $x := (x_{ij} : i = 1, \ldots, I;\ j = 1, \ldots, J)$, determine the joint distribution, $\Pi_\phi$ say, of $\Theta$, given $X = x$ and $\Phi = \phi$. What is the distribution of $\Phi$, given $\Theta = \theta$ and $X = x$?

Using these results, describe clearly how Gibbs sampling combined with Rao–Blackwellisation could be applied to estimate the posterior joint distribution of $\Theta$, given $X = x$.

Part II, 2013
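The Gibbs-sampling part can be illustrated directly. The sketch below alternates draws from the two conditional distributions asked for in the question; the exact normal–normal conditional formulas in the comments are the editor's working (not quoted from the paper), and the precisions and data are made-up values.

```python
import numpy as np

# Gibbs sampler for the normal hierarchy Phi -> Theta_i -> X_ij of the question.
rng = np.random.default_rng(7)
I, J = 5, 8
h_phi, h_theta, h_x = 1.0, 2.0, 4.0
x = rng.normal(loc=1.0, scale=1.0, size=(I, J))     # fake data
x_bar = x.mean(axis=1)                              # sufficient statistic (X_bar_1, ..., X_bar_I)

phi, theta = 0.0, np.zeros(I)
theta_draws = []
for sweep in range(5000):
    # Theta_i | Phi = phi, data ~ N((h_theta*phi + J*h_x*x_bar_i)/(h_theta + J*h_x), 1/(h_theta + J*h_x))
    prec_t = h_theta + J * h_x
    theta = rng.normal((h_theta * phi + J * h_x * x_bar) / prec_t, np.sqrt(1 / prec_t))
    # Phi | Theta = theta, data ~ N(h_theta*sum(theta)/(h_phi + I*h_theta), 1/(h_phi + I*h_theta))
    prec_p = h_phi + I * h_theta
    phi = rng.normal(h_theta * theta.sum() / prec_p, np.sqrt(1 / prec_p))
    theta_draws.append(theta.copy())

print(np.mean(theta_draws[1000:], axis=0))          # posterior mean estimate of each Theta_i
```

A Rao–Blackwellised estimate of, say, $\mathbb{E}[\Theta_i \mid X = x]$ would average the conditional means of $\Pi_\phi$ over the sampled $\phi$'s rather than averaging the raw $\theta$ draws.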

Paper 4, Section II, 27K

For $i = 1, \ldots, n$, the pairs $(X_i, Y_i)$ have independent bivariate normal distributions, with $\mathbb{E}(X_i) = \mu_X$, $\mathbb{E}(Y_i) = \mu_Y$, $\mathrm{var}(X_i) = \mathrm{var}(Y_i) = \phi$, and $\mathrm{corr}(X_i, Y_i) = \rho$. The means $\mu_X, \mu_Y$ are known; the parameters $\phi > 0$ and $\rho \in (-1, 1)$ are unknown.

Show that the joint distribution of all the variables belongs to an exponential family, and identify the natural sufficient statistic, natural parameter, and mean-value parameter. Hence or otherwise, find the maximum likelihood estimator $\hat{\rho}$ of $\rho$.

Let $U_i := X_i + Y_i$, $V_i := X_i - Y_i$. What is the joint distribution of $(U_i, V_i)$?

Show that the distribution of
$$\frac{(1 + \hat{\rho})/(1 - \hat{\rho})}{(1 + \rho)/(1 - \rho)}$$
is $F^n_n$. Hence describe a $(1 - \alpha)$-level confidence interval for $\rho$. Briefly explain what would change if $\mu_X$ and $\mu_Y$ were also unknown.

[Recall that the distribution $F^{\nu_1}_{\nu_2}$ is that of $(W_1/\nu_1)/(W_2/\nu_2)$, where, independently for $j = 1$ and $j = 2$, $W_j$ has the chi-squared distribution with $\nu_j$ degrees of freedom.]

Part II, 2012

Paper 3, Section II, 27K

The parameter vector is $\Theta \equiv (\Theta_1, \Theta_2, \Theta_3)$, with $\Theta_i > 0$, $\Theta_1 + \Theta_2 + \Theta_3 = 1$. Given $\Theta = \theta \equiv (\theta_1, \theta_2, \theta_3)$, the integer random vector $X = (X_1, X_2, X_3)$ has a trinomial distribution, with probability mass function
$$p(x \mid \theta) = \frac{n!}{x_1!\, x_2!\, x_3!}\, \theta_1^{x_1}\theta_2^{x_2}\theta_3^{x_3} \qquad \Big(x_i \geq 0,\ \sum_{i=1}^3 x_i = n\Big). \tag{1}$$

Compute the score vector for the parameter $\Theta := (\Theta_1, \Theta_2)$, and, quoting any relevant general result, use this to determine $\mathbb{E}(X_i)$ ($i = 1, 2, 3$).

Considering (1) as an exponential family with mean-value parameter $\Theta$, what is the corresponding natural parameter $\Phi \equiv (\Phi_1, \Phi_2)$?

Compute the information matrix $I$ for $\Theta$, which has $(i, j)$-entry
$$I_{ij} = -\mathbb{E}\left(\frac{\partial^2 l}{\partial\theta_i\,\partial\theta_j}\right) \qquad (i, j = 1, 2),$$
where $l$ denotes the log-likelihood function, based on $X$, expressed in terms of $(\theta_1, \theta_2)$.

Show that the variance of $\log(X_1/X_3)$ is asymptotic to $n^{-1}(\theta_1^{-1} + \theta_3^{-1})$ as $n \to \infty$. [Hint: the information matrix $I_\Phi$ for $\Phi$ is $I^{-1}$, and the dispersion matrix of the maximum likelihood estimator $\hat{\Phi}$ behaves, asymptotically (for $n \to \infty$), as $I_\Phi^{-1}$.]

Part II, 2012

Paper 2, Section II, 28K

Carefully defining all italicised terms, show that, if a sufficiently general method of inference respects both the Weak Sufficiency Principle and the Conditionality Principle, then it respects the Likelihood Principle.

The position $X_t$ of a particle at time $t > 0$ has the Normal distribution $N(0, \phi t)$, where $\phi$ is the value of an unknown parameter $\Phi$; and the time, $T_x$, at which the particle first reaches position $x \neq 0$ has probability density function
$$p_x(t) = \frac{|x|}{\sqrt{2\pi\phi t^3}} \exp\left(-\frac{x^2}{2\phi t}\right) \qquad (t > 0).$$
Experimenter $E_1$ observes $X_\tau$, and experimenter $E_2$ observes $T_\xi$, where $\tau > 0$, $\xi \neq 0$ are fixed in advance. It turns out that $T_\xi = \tau$. What does the Likelihood Principle say about the inferences about $\Phi$ to be made by the two experimenters?

$E_1$ bases his inference about $\Phi$ on the distribution and observed value of $X_\tau^2/\tau$, while $E_2$ bases her inference on the distribution and observed value of $\xi^2/T_\xi$. Show that these choices respect the Likelihood Principle.

Part II, 2012

Paper 1, Section II, 28K

Prove that, if $T$ is complete sufficient for $\Theta$, and $S$ is a function of $T$, then $S$ is the minimum variance unbiased estimator of $\mathbb{E}(S \mid \Theta)$.

When the parameter $\Theta$ takes a value $\theta > 0$, observables $(X_1, \ldots, X_n)$ arise independently from the exponential distribution $\mathcal{E}(\theta)$, having probability density function
$$p(x \mid \theta) = \theta e^{-\theta x} \qquad (x > 0).$$
Show that the family of distributions
$$\Theta \sim \mathrm{Gamma}(\alpha, \beta) \qquad (\alpha > 0,\ \beta > 0), \tag{1}$$
with probability density function
$$\pi(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \theta^{\alpha - 1} e^{-\beta\theta} \qquad (\theta > 0),$$
is a conjugate family for Bayesian inference about $\Theta$ (where $\Gamma(\alpha)$ is the Gamma function).

Show that the expectation of $\Lambda := \log\Theta$, under prior distribution (1), is $\psi(\alpha) - \log\beta$, where $\psi(\alpha) := (d/d\alpha)\log\Gamma(\alpha)$. What is the prior variance of $\Lambda$? Deduce the posterior expectation and variance of $\Lambda$, given $(X_1, \ldots, X_n)$.

Let $\tilde{\Lambda}$ denote the limiting form of the posterior expectation of $\Lambda$ as $\alpha, \beta \downarrow 0$. Show that $\tilde{\Lambda}$ is the minimum variance unbiased estimator of $\Lambda$. What is its variance?

Part II, 2012

Paper 1, Section II, 28K

Define admissible, Bayes and minimax decision rules.

A random vector $X = (X_1, X_2, X_3)^T$ has independent components, where $X_i$ has the normal distribution $N(\theta_i, 1)$ when the parameter vector $\Theta$ takes the value $\theta = (\theta_1, \theta_2, \theta_3)^T$. It is required to estimate $\Theta$ by a point $a \in \mathbb{R}^3$, with loss function $L(\theta, a) = \|a - \theta\|^2$. What is the risk function of the maximum-likelihood estimator $\hat{\Theta} := X$? Show that $\hat{\Theta}$ is dominated by the estimator $\tilde{\Theta} := (1 - \|X\|^{-2})X$.

Paper 2, Section II, 28K

Random variables $X_1, \ldots, X_n$ are independent and identically distributed from the normal distribution with unknown mean $M$ and unknown precision (inverse variance) $H$. Show that the likelihood function, for data $X_1 = x_1, \ldots, X_n = x_n$, is
$$L_n(\mu, h) \propto h^{n/2} \exp\Big(-\tfrac{1}{2}h\big\{n(\bar{x} - \mu)^2 + S\big\}\Big),$$
where $\bar{x} := n^{-1}\sum_i x_i$ and $S := \sum_i (x_i - \bar{x})^2$.

A bivariate prior distribution for $(M, H)$ is specified, in terms of hyperparameters $(\alpha_0, \beta_0, m_0, \lambda_0)$, as follows. The marginal distribution of $H$ is $\Gamma(\alpha_0, \beta_0)$, with density
$$\pi(h) \propto h^{\alpha_0 - 1} e^{-\beta_0 h} \qquad (h > 0),$$
and the conditional distribution of $M$, given $H = h$, is normal with mean $m_0$ and precision $\lambda_0 h$.

Show that the conditional prior distribution of $H$, given $M = \mu$, is
$$H \mid M = \mu \sim \Gamma\Big(\alpha_0 + \tfrac{1}{2},\ \beta_0 + \tfrac{1}{2}\lambda_0(\mu - m_0)^2\Big).$$

Show that the posterior joint distribution of $(M, H)$, given $X_1 = x_1, \ldots, X_n = x_n$, has the same form as the prior, with updated hyperparameters $(\alpha_n, \beta_n, m_n, \lambda_n)$ which you should express in terms of the prior hyperparameters and the data.

[You may use the identity
$$p(t - a)^2 + q(t - b)^2 = (t - \delta)^2 + pq(a - b)^2,$$
where $p + q = 1$ and $\delta = pa + qb$.]

Explain how you could implement Gibbs sampling to generate a random sample from the posterior joint distribution.

Part II, 2011
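The final part of Paper 2 can likewise be illustrated numerically. The sketch below alternates between the two full conditionals of the Normal–Gamma posterior; the conditional formulas in the comments are the editor's conjugate-algebra working rather than quotations from the paper, and the hyperparameters and data are made-up values.

```python
import numpy as np

# Gibbs sampler for (M, H) in the normal model with a Normal-Gamma prior.
rng = np.random.default_rng(8)
alpha0, beta0, m0, lam0 = 2.0, 1.0, 0.0, 1.0
x = rng.normal(loc=1.5, scale=2.0, size=50)
n, x_bar, S = x.size, x.mean(), np.sum((x - x.mean()) ** 2)

mu, h = 0.0, 1.0
draws = []
for sweep in range(10_000):
    # M | H = h, data: normal with mean (lam0*m0 + n*x_bar)/(lam0 + n) and precision h*(lam0 + n)
    prec = h * (lam0 + n)
    mu = rng.normal((lam0 * m0 + n * x_bar) / (lam0 + n), np.sqrt(1 / prec))
    # H | M = mu, data: Gamma(alpha0 + (n+1)/2, beta0 + lam0*(mu-m0)^2/2 + (n*(x_bar-mu)^2 + S)/2)
    shape = alpha0 + (n + 1) / 2
    rate = beta0 + 0.5 * lam0 * (mu - m0) ** 2 + 0.5 * (n * (x_bar - mu) ** 2 + S)
    h = rng.gamma(shape, 1.0 / rate)          # numpy's gamma is parameterised by shape and scale
    draws.append((mu, h))

mu_s, h_s = np.array(draws[2000:]).T
print(mu_s.mean(), (1 / np.sqrt(h_s)).mean())  # posterior means of M and of the standard deviation
```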

Paper 3, Section II, 27K

Random variables $X_1, X_2, \ldots$ are independent and identically distributed from the exponential distribution $\mathcal{E}(\theta)$, with density function
$$p_X(x \mid \theta) = \theta e^{-\theta x} \qquad (x > 0),$$
when the parameter $\Theta$ takes value $\theta > 0$. The following experiment is performed. First $X_1$ is observed. Thereafter, if $X_1 = x_1, \ldots, X_i = x_i$ have been observed ($i \geq 1$), a coin having probability $\alpha(x_i)$ of landing heads is tossed, where $\alpha : \mathbb{R} \to (0, 1)$ is a known function and the coin toss is independent of the $X$'s and previous tosses. If it lands heads, no further observations are made; if tails, $X_{i+1}$ is observed.

Let $N$ be the total number of $X$'s observed, and $\mathbf{X} := (X_1, \ldots, X_N)$. Write down the likelihood function for $\Theta$ based on data $\mathbf{X} = (x_1, \ldots, x_n)$, and identify a minimal sufficient statistic. What does the likelihood principle have to say about inference from this experiment?

Now consider the experiment that only records $Y := X_N$. Show that the density function of $Y$ has the form
$$p_Y(y \mid \theta) = \exp\{a(y) - k(\theta) - \theta y\}.$$
Assuming the function $a(\cdot)$ is twice differentiable and that both $p_Y(y \mid \theta)$ and $\partial p_Y(y \mid \theta)/\partial y$ vanish at $0$ and $\infty$, show that $a'(Y)$ is an unbiased estimator of $\Theta$, and find its variance.

Stating clearly any general results you use, deduce that
$$-k''(\theta)\, \mathbb{E}_\theta\{a''(Y)\} \geq 1.$$

Part II, 2011

Paper 4, Section II, 27K

What does it mean to say that a $(1 \times p)$ random vector $\xi$ has a multivariate normal distribution?

Suppose $\xi = (X, Y)$ has the bivariate normal distribution with mean vector $\mu = (\mu_X, \mu_Y)$ and dispersion matrix
$$\Sigma = \begin{pmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{YY} \end{pmatrix}.$$
Show that, with $\beta := \sigma_{XY}/\sigma_{XX}$, $Y - \beta X$ is independent of $X$, and thus that the conditional distribution of $Y$ given $X$ is normal with mean $\mu_Y + \beta(X - \mu_X)$ and variance $\sigma_{YY\cdot X} := \sigma_{YY} - \sigma_{XY}^2/\sigma_{XX}$.

For $i = 1, \ldots, n$, $\xi_i = (X_i, Y_i)$ are independent and identically distributed with the above distribution, where all elements of $\mu$ and $\Sigma$ are unknown. Let
$$S = \begin{pmatrix} S_{XX} & S_{XY} \\ S_{XY} & S_{YY} \end{pmatrix} := \sum_{i=1}^n (\xi_i - \bar{\xi})^T(\xi_i - \bar{\xi}),$$
where $\bar{\xi} := n^{-1}\sum_{i=1}^n \xi_i$.

The sample correlation coefficient is $r := S_{XY}/\sqrt{S_{XX}S_{YY}}$. Show that the distribution of $r$ depends only on the population correlation coefficient $\rho := \sigma_{XY}/\sqrt{\sigma_{XX}\sigma_{YY}}$.

Student's $t$-statistic (on $n - 2$ degrees of freedom) for testing the null hypothesis $H_0 : \beta = 0$ is
$$t := \frac{\hat{\beta}}{\sqrt{S_{YY\cdot X}/\{(n - 2)S_{XX}\}}},$$
where $\hat{\beta} := S_{XY}/S_{XX}$ and $S_{YY\cdot X} := S_{YY} - S_{XY}^2/S_{XX}$. Its density when $H_0$ is true is
$$p(t) = C\left(1 + \frac{t^2}{n - 2}\right)^{-\frac{1}{2}(n - 1)},$$
where $C$ is a constant that need not be specified.

Express $t$ in terms of $r$, and hence derive the density of $r$ when $\rho = 0$.

How could you use the sample correlation $r$ to test the hypothesis $\rho = 0$?

Part II, 2011

Paper 1, Section II, 28J

The distribution of a random variable $X$ is obtained from the binomial distribution $B(n; \Pi)$ by conditioning on $X > 0$; here $\Pi \in (0, 1)$ is an unknown probability parameter and $n$ is known. Show that the distributions of $X$ form an exponential family and identify the natural sufficient statistic $T$, natural parameter $\Phi$, and cumulant function $k(\phi)$. Using general properties of the cumulant function, compute the mean and variance of $X$ when $\Pi = \pi$. Write down an equation for the maximum likelihood estimate $\hat{\Pi}$ of $\Pi$ and explain why, when $\Pi = \pi$, the distribution of $\hat{\Pi}$ is approximately normal $N(\pi, \pi(1 - \pi)/n)$ for large $n$.

Suppose we observe $X = 1$. It is suggested that, since the condition $X > 0$ is then automatically satisfied, general principles of inference require that the inference to be drawn should be the same as if the distribution of $X$ had been $B(n; \Pi)$ and we had observed $X = 1$. Comment briefly on this suggestion.

Paper 2, Section II, 28J

Define the Kolmogorov–Smirnov statistic for testing the null hypothesis that real random variables $X_1, \ldots, X_n$ are independently and identically distributed with specified continuous, strictly increasing distribution function $F$, and show that its null distribution does not depend on $F$.

A composite hypothesis $H_0$ specifies that, when the unknown positive parameter $\Theta$ takes value $\theta$, the random variables $X_1, \ldots, X_n$ arise independently from the uniform distribution $U[0, \theta]$. Letting $J := \arg\max_{1 \leq i \leq n} X_i$, show that, under $H_0$, the statistic $(J, X_J)$ is sufficient for $\Theta$. Show further that, given $\{J = j, X_j = \xi\}$, the random variables $(X_i : i \neq j)$ are independent and have the $U[0, \xi]$ distribution. How might you apply the Kolmogorov–Smirnov test to test the hypothesis $H_0$?

Part II, 2010
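For the first part of Paper 2, the Kolmogorov–Smirnov statistic is straightforward to compute. The sketch below (null distribution and sample size are illustrative assumptions) evaluates $D_n = \sup_t |F_n(t) - F(t)|$ at the order statistics, where the supremum of the discrepancy is attained.

```python
import numpy as np

# Kolmogorov-Smirnov statistic against a fully specified continuous null F.
rng = np.random.default_rng(9)
n = 100
x = np.sort(rng.uniform(size=n))      # simulate under H_0: F = U[0,1]
F = x                                 # F(x_(i)) for the U[0,1] null
i = np.arange(1, n + 1)
D_n = np.max(np.maximum(i / n - F, F - (i - 1) / n))
print(D_n, D_n * np.sqrt(n))          # sqrt(n) * D_n has the distribution-free Kolmogorov limit
```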

Paper 3, Section II, 27J

Define the normal and extensive form solutions of a Bayesian statistical decision problem involving parameter $\Theta$, random variable $X$, and loss function $L(\theta, a)$. How are they related? Let $R_0 = R_0(\Pi)$ be the Bayes loss of the optimal act when $\Theta \sim \Pi$ and no data can be observed. Express the Bayes risk $R_1$ of the optimal statistical decision rule in terms of $R_0$ and the joint distribution of $(\Theta, X)$.

The real parameter $\Theta$ has distribution $\Pi$, having probability density function $\pi(\cdot)$. Consider the problem of specifying a set $S \subseteq \mathbb{R}$ such that the loss when $\Theta = \theta$ is
$$L(\theta, S) = c|S| - \mathbf{1}_S(\theta),$$
where $\mathbf{1}_S$ is the indicator function of $S$, where $c > 0$, and where $|S| = \int_S dx$. Show that the highest density region $S^* := \{\theta : \pi(\theta) \geq c\}$ supplies a Bayes act for this decision problem, and explain why $R_0(\Pi) \leq 0$.

For the case $\Theta \sim N(\mu, \sigma^2)$, find an expression for $R_0$ in terms of the standard normal distribution function $\Phi$.

Suppose now that $c = 0.5$, that $\Theta \sim N(0, 1)$ and that $X \mid \Theta \sim N(\Theta, 1/9)$. Show that $R_1 < R_0$.

Paper 4, Section II, 27J

Define completeness and bounded completeness of a statistic $T$ in a statistical experiment.

Random variables $X_1, X_2, X_3$ are generated as $X_i = \Theta^{1/2}Z + (1 - \Theta)^{1/2}Y_i$, where $Z, Y_1, Y_2, Y_3$ are independently standard normal $N(0, 1)$, and the parameter $\Theta$ takes values in $(0, 1)$. What is the joint distribution of $(X_1, X_2, X_3)$ when $\Theta = \theta$? Write down its density function, and show that a minimal sufficient statistic for $\Theta$ based on $(X_1, X_2, X_3)$ is
$$T = (T_1, T_2) := \Big(\sum_{i=1}^3 X_i^2,\ \big(\sum_{i=1}^3 X_i\big)^2\Big).$$
[Hint: You may use that if $I$ is the $n \times n$ identity matrix and $J$ is the $n \times n$ matrix all of whose entries are 1, then $aI + bJ$ has determinant $a^{n-1}(a + nb)$, and inverse $cI + dJ$ with $c = 1/a$, $d = -b/\{a(a + nb)\}$.]

What is $\mathbb{E}_\theta(T_1)$? Is $T$ complete for $\Theta$?

Let $S := \mathrm{Prob}(X_1 > 0 \mid T)$. Show that $\mathbb{E}_\theta(S)$ is a positive constant $c$ which does not depend on $\theta$, but that $S$ is not identically equal to $c$. Is $T$ boundedly complete for $\Theta$?

Part II, 2010

Paper 1, Section II, 28I

(i) Let $X_1, \ldots, X_n$ be independent and identically distributed random variables, having the exponential distribution $\mathcal{E}(\lambda)$ with density $p(x \mid \lambda) = \lambda\exp(-\lambda x)$ for $x, \lambda > 0$. Show that $T_n = \sum_{i=1}^n X_i$ is minimal sufficient and complete for $\lambda$. [You may assume uniqueness of Laplace transforms.]

(ii) For given $x > 0$, it is desired to estimate the quantity $\phi = \mathrm{Prob}(X_1 > x \mid \lambda)$. Compute the Fisher information for $\phi$.

(iii) State the Lehmann–Scheffé theorem. Show that the estimator $\tilde{\phi}_n$ of $\phi$ defined by
$$\tilde{\phi}_n = \begin{cases} 0, & \text{if } T_n < x, \\ \left(1 - \dfrac{x}{T_n}\right)^{n-1}, & \text{if } T_n \geq x, \end{cases}$$
is the minimum variance unbiased estimator of $\phi$ based on $(X_1, \ldots, X_n)$. Without doing any computations, state whether or not the variance of $\tilde{\phi}_n$ achieves the Cramér–Rao lower bound, justifying your answer briefly.

Let $k \leq n$. Show that $\mathbb{E}(\tilde{\phi}_k \mid T_n, \lambda) = \tilde{\phi}_n$.

Paper 2, Section II, 28I

Suppose that the random vector $X = (X_1, \ldots, X_n)$ has a distribution over $\mathbb{R}^n$ depending on a real parameter $\theta$, with everywhere positive density function $p(x \mid \theta)$. Define the maximum likelihood estimator $\hat{\theta}$, the score variable $U$, the observed information $\hat{j}$ and the expected (Fisher) information $I$ for the problem of estimating $\theta$ from $X$.

For the case where the $(X_i)$ are independent and identically distributed, show that, as $n \to \infty$, $I^{-1/2}U \to^d N(0, 1)$. [You may assume sufficient conditions to allow interchange of integration over the sample space and differentiation with respect to the parameter.] State the asymptotic distribution of $\hat{\theta}$.

The random vector $X = (X_1, \ldots, X_n)$ is generated according to the rule
$$X_{i+1} = \theta X_i + E_i,$$
where $X_0 = 1$ and the $(E_i)$ are independent and identically distributed from the standard normal distribution $N(0, 1)$. Write down the likelihood function for $\theta$ based on data $x = (x_1, \ldots, x_n)$, find $\hat{\theta}$ and $\hat{j}$, and show that the pair $(\hat{\theta}, \hat{j})$ forms a minimal sufficient statistic.

A Bayesian uses the improper prior density $\pi(\theta) \propto 1$. Show that, in the posterior, $S(\theta - \hat{\theta})$ (where $S$ is a statistic that you should identify) has the same distribution as $E_1$.

Part II, 2009

Paper 3, Section II, 27I

What is meant by an equaliser decision rule? What is meant by an extended Bayes rule? Show that a decision rule that is both an equaliser rule and extended Bayes is minimax.

Let $X_1, \ldots, X_n$ be independent and identically distributed random variables with the normal distribution $N(\theta, h^{-1})$, and let $k > 0$. It is desired to estimate $\theta$ with loss function $L(\theta, a) = 1 - \exp\{-\tfrac{1}{2}k(a - \theta)^2\}$.

Suppose the prior distribution is $\theta \sim N(m_0, h_0^{-1})$. Find the Bayes act and the Bayes loss posterior to observing $X_1 = x_1, \ldots, X_n = x_n$. What is the Bayes risk of the Bayes rule with respect to this prior distribution?

Show that the rule that estimates $\theta$ by $\bar{X} = n^{-1}\sum_{i=1}^n X_i$ is minimax.

Paper 4, Section II, 27I

Consider the double dichotomy, where the loss is 0 for a correct decision and 1 for an incorrect decision. Describe the form of a Bayes decision rule. Assuming the equivalence of normal and extensive form analyses, deduce the Neyman–Pearson lemma.

For a problem with random variable $X$ and real parameter $\theta$, define monotone likelihood ratio (MLR) and monotone test.

Suppose the problem has MLR in a real statistic $T = t(X)$. Let $\phi$ be a monotone test, with power function $\gamma(\cdot)$, and let $\phi'$ be any other test, with power function $\gamma'(\cdot)$. Show that if $\theta_1 > \theta_0$ and $\gamma(\theta_0) > \gamma'(\theta_0)$, then $\gamma(\theta_1) > \gamma'(\theta_1)$. Deduce that there exists $\theta^* \in [-\infty, \infty]$ such that $\gamma(\theta) \leq \gamma'(\theta)$ for $\theta < \theta^*$, and $\gamma(\theta) \geq \gamma'(\theta)$ for $\theta > \theta^*$.

For an arbitrary prior distribution $\Pi$ with density $\pi(\cdot)$, and an arbitrary value $\theta^*$, show that the posterior odds
$$\frac{\Pi(\theta > \theta^* \mid X = x)}{\Pi(\theta \leq \theta^* \mid X = x)}$$
is a non-decreasing function of $t(x)$.

Part II, 2009

/II/27I

An angler starts fishing at time 0. Fish bite in a Poisson Process of rate $\Lambda$ per hour, so that, if $\Lambda = \lambda$, the number $N_t$ of fish he catches in the first $t$ hours has the Poisson distribution $\mathcal{P}(\lambda t)$, while $T_n$, the time in hours until his $n$th bite, has the Gamma distribution $\Gamma(n, \lambda)$, with density function
$$p(t \mid \lambda) = \frac{\lambda^n}{(n-1)!}\, t^{n-1} e^{-\lambda t} \qquad (t > 0).$$

Bystander $B_1$ plans to watch for 3 hours, and to record the number $N_3$ of fish caught. Bystander $B_2$ plans to observe until the 10th bite, and to record $T_{10}$, the number of hours until this occurs.

For $B_1$, show that $\hat{\Lambda}_1 := N_3/3$ is an unbiased estimator of $\Lambda$ whose variance function achieves the Cramér–Rao lower bound.

Find an unbiased estimator of $\Lambda$ for $B_2$, of the form $\hat{\Lambda}_2 = k/T_{10}$. Does it achieve the Cramér–Rao lower bound? Is it minimum-variance-unbiased? Justify your answers.

In fact, the 10th fish bites after exactly 3 hours. For each of $B_1$ and $B_2$, write down the likelihood function for $\Lambda$ based on their observations. What does the Likelihood Principle have to say about the inferences to be drawn by $B_1$ and $B_2$, and why? Compute the estimates $\hat{\lambda}_1$ and $\hat{\lambda}_2$ produced by applying $\hat{\Lambda}_1$ and $\hat{\Lambda}_2$ to the observed data. Does the method of minimum-variance-unbiased estimation respect the Likelihood Principle?

Part II, 2008

/II/27I

Under hypothesis $H_i$ ($i = 0, 1$), a real-valued observable $X$, taking values in $\mathcal{X}$, has density function $p_i(\cdot)$. Define the Type I error $\alpha$ and the Type II error $\beta$ of a test $\phi : \mathcal{X} \to [0, 1]$ of the null hypothesis $H_0$ against the alternative hypothesis $H_1$. What are the size and power of the test in terms of $\alpha$ and $\beta$?

Show that, for $0 < c < \infty$, $\phi$ minimises $c\alpha + \beta$ among all possible tests if and only if it satisfies
$$p_1(x) > c\,p_0(x) \Rightarrow \phi(x) = 1,$$
$$p_1(x) < c\,p_0(x) \Rightarrow \phi(x) = 0.$$
What does this imply about the admissibility of such a test?

Given the value $\theta$ of a parameter variable $\Theta \in [0, 1)$, the observable $X$ has density function
$$p(x \mid \theta) = \frac{2(x - \theta)}{(1 - \theta)^2} \qquad (\theta \leq x \leq 1).$$
For fixed $\theta \in (0, 1)$, describe all the likelihood ratio tests of $H_0 : \Theta = 0$ against $H_\theta : \Theta = \theta$.

For fixed $k \in (0, 1)$, let $\phi_k$ be the test that rejects $H_0$ if and only if $X \geq k$. Is $\phi_k$ admissible as a test of $H_0$ against $H_\theta$ for every $\theta \in (0, 1)$? Is it uniformly most powerful for its size for testing $H_0$ against the composite hypothesis $H_1 : \Theta \in (0, 1)$? Is it admissible as a test of $H_0$ against $H_1$?

3/II/26I

Define the notion of exponential family (EF), and show that, for data arising as a random sample of size $n$ from an exponential family, there exists a sufficient statistic whose dimension stays bounded as $n \to \infty$.

The log-density of a normal distribution $N(\mu, v)$ can be expressed in the form
$$\log p(x \mid \phi) = \phi_1 x + \phi_2 x^2 - k(\phi)$$
where $\phi = (\phi_1, \phi_2)$ is the value of an unknown parameter $\Phi = (\Phi_1, \Phi_2)$. Determine the function $k$, and the natural parameter-space $F$. What is the mean-value parameter $H = (H_1, H_2)$ in terms of $\Phi$?

Determine the maximum likelihood estimator $\hat{\Phi}_1$ of $\Phi_1$ based on a random sample $(X_1, \ldots, X_n)$, and give its asymptotic distribution for $n \to \infty$.

How would these answers be affected if the variance of $X$ were known to have value $v_0$?

Part II, 2008

/II/27I

Define sufficient statistic, and state the factorisation criterion for determining whether a statistic is sufficient. Show that a Bayesian posterior distribution depends on the data only through the value of a sufficient statistic.

Given the value $\mu$ of an unknown parameter $M$, observables $X_1, \ldots, X_n$ are independent and identically distributed with distribution $N(\mu, 1)$. Show that the statistic $\bar{X} := n^{-1}\sum_{i=1}^n X_i$ is sufficient for $M$.

If the prior distribution is $M \sim N(0, \tau^2)$, determine the posterior distribution of $M$ and the predictive distribution of $\bar{X}$.

In fact, there are two hypotheses as to the value of $M$. Under hypothesis $H_0$, $M$ takes the known value 0; under $H_1$, $M$ is unknown, with prior distribution $N(0, \tau^2)$. Explain why the Bayes factor for choosing between $H_0$ and $H_1$ depends only on $\bar{X}$, and determine its value for data $X_1 = x_1, \ldots, X_n = x_n$.

The frequentist 5%-level test of $H_0$ against $H_1$ rejects $H_0$ when $|\bar{X}| \geq 1.96/\sqrt{n}$. What is the Bayes factor for the critical case $|\bar{x}| = 1.96/\sqrt{n}$? How does this behave as $n \to \infty$? Comment on the similarities or differences in behaviour between the frequentist and Bayesian tests.

Part II, 2008

/II/27I

Suppose that $X$ has density $f(\cdot \mid \theta)$ where $\theta \in \Theta$. What does it mean to say that statistic $T \equiv T(X)$ is sufficient for $\theta$?

Suppose that $\theta = (\psi, \lambda)$, where $\psi$ is the parameter of interest, and $\lambda$ is a nuisance parameter, and that the sufficient statistic $T$ has the form $T = (C, S)$. What does it mean to say that the statistic $S$ is ancillary? If it is, how (according to the conditionality principle) do we test hypotheses on $\psi$?

Assuming that the set of possible values for $X$ is discrete, show that $S$ is ancillary if and only if the density (probability mass function) $f(x \mid \psi, \lambda)$ factorises as
$$f(x \mid \psi, \lambda) = \varphi_0(x)\, \varphi_C(C(x), S(x), \psi)\, \varphi_S(S(x), \lambda) \tag{$*$}$$
for some functions $\varphi_0$, $\varphi_C$ and $\varphi_S$ with the properties
$$\sum_{x \in C^{-1}(c)\,\cap\, S^{-1}(s)} \varphi_0(x) = 1 = \sum_s \varphi_S(s, \lambda) = \sum_c \varphi_C(c, s, \psi)$$
for all $c$, $s$, $\psi$ and $\lambda$.

Suppose now that $X_1, \ldots, X_n$ are independent observations from a $\Gamma(a, b)$ distribution, with density
$$f(x \mid a, b) = (bx)^{a-1} e^{-bx}\, b\, \mathbf{1}_{\{x > 0\}}/\Gamma(a).$$
Assuming that the criterion $(*)$ holds also for observations which are not discrete, show that it is not possible to find $(C(X), S(X))$ sufficient for $(a, b)$ such that $S$ is ancillary when $b$ is regarded as a nuisance parameter, and $a$ is the parameter of interest.

Part II, 2007

/II/27I

(i) State Wilks' likelihood ratio test of the null hypothesis $H_0 : \theta \in \Theta_0$ against the alternative $H_1 : \theta \in \Theta_1$, where $\Theta_0 \subset \Theta_1$. Explain when this test may be used.

(ii) Independent identically-distributed observations $X_1, \ldots, X_n$ take values in the set $S = \{1, \ldots, K\}$, with common distribution which under the null hypothesis is of the form
$$P(X_1 = k \mid \theta) = f(k \mid \theta) \qquad (k \in S)$$
for some $\theta \in \Theta_0$, where $\Theta_0$ is an open subset of some Euclidean space $\mathbb{R}^d$, $d < K - 1$. Under the alternative hypothesis, the probability mass function of the $X_i$ is unrestricted.

Assuming sufficient regularity conditions on $f$ to guarantee the existence and uniqueness of a maximum-likelihood estimator $\hat{\theta}_n(X_1, \ldots, X_n)$ of $\theta$ for each $n$, show that for large $n$ the Wilks likelihood ratio test statistic is approximately of the form
$$\sum_{j=1}^K (N_j - n\hat{\pi}_j)^2/N_j,$$
where $N_j = \sum_{i=1}^n I_{\{X_i = j\}}$, and $\hat{\pi}_j = f(j \mid \hat{\theta}_n)$. What is the asymptotic distribution of this statistic?

3/II/26I

(i) In the context of decision theory, what is a Bayes rule with respect to a given loss function and prior? What is an extended Bayes rule?

Characterise the Bayes rule with respect to a given prior in terms of the posterior distribution for the parameter given the observation. When $\Theta = A = \mathbb{R}^d$ for some $d$, and the loss function is $L(\theta, a) = \|\theta - a\|^2$, what is the Bayes rule?

(ii) Suppose that $A = \Theta = \mathbb{R}$, with loss function $L(\theta, a) = (\theta - a)^2$ and suppose further that under $P_\theta$, $X \sim N(\theta, 1)$.

Supposing that a $N(0, \tau^{-1})$ prior is taken over $\theta$, compute the Bayes risk of the decision rule $d_\lambda(X) = \lambda X$. Find the posterior distribution of $\theta$ given $X$, and confirm that its mean is of the form $d_\lambda(X)$ for some value of $\lambda$ which you should identify. Hence show that the decision rule $d_1$ is an extended Bayes rule.

Part II, 2007

/II/27I

Assuming sufficient regularity conditions on the likelihood $f(x \mid \theta)$ for a univariate parameter $\theta \in \Theta$, establish the Cramér–Rao lower bound for the variance of an unbiased estimator of $\theta$.

If $\hat{\theta}(X)$ is an unbiased estimator of $\theta$ whose variance attains the Cramér–Rao lower bound for every value of $\theta \in \Theta$, show that the likelihood function is an exponential family.

Part II, 2007

/II/27J

(a) What is a loss function? What is a decision rule? What is the risk function of a decision rule? What is the Bayes risk of a decision rule with respect to a prior $\pi$?

(b) Let $\theta \mapsto R(\theta, d)$ denote the risk function of decision rule $d$, and let $r(\pi, d)$ denote the Bayes risk of decision rule $d$ with respect to prior $\pi$. Suppose that $d^*$ is a decision rule and $\pi_0$ is a prior over the parameter space $\Theta$ with the two properties
(i) $r(\pi_0, d^*) = \min_d r(\pi_0, d)$,
(ii) $\sup_\theta R(\theta, d^*) = r(\pi_0, d^*)$.
Prove that $d^*$ is minimax.

(c) Suppose now that $\Theta = A = \mathbb{R}$, where $A$ is the space of possible actions, and that the loss function is
$$L(\theta, a) = \exp(-\lambda a\theta),$$
where $\lambda$ is a positive constant. If the law of the observation $X$ given parameter $\theta$ is $N(\theta, \sigma^2)$, where $\sigma > 0$ is known, show (using (b) or otherwise) that the rule
$$d^*(x) = x/(\sigma^2\lambda)$$
is minimax.

2/II/27J

Let $\{f(\cdot \mid \theta) : \theta \in \Theta\}$ be a parametric family of densities for observation $X$. What does it mean to say that the statistic $T \equiv T(X)$ is sufficient for $\theta$? What does it mean to say that $T$ is minimal sufficient?

State the Rao–Blackwell theorem. State the Cramér–Rao lower bound for the variance of an unbiased estimator of a (scalar) parameter, taking care to specify any assumptions needed.

Let $X_1, \ldots, X_n$ be a sample from a $U(0, \theta)$ distribution, where the positive parameter $\theta$ is unknown. Find a minimal sufficient statistic $T$ for $\theta$. If $h(T)$ is an unbiased estimator for $\theta$, find the form of $h$, and deduce that this estimator is minimum-variance unbiased. Would it be possible to reach this conclusion using the Cramér–Rao lower bound?

Part II, 2006

/II/26J

Write an essay on the rôle of the Metropolis–Hastings algorithm in computational Bayesian inference on a parametric model. You may for simplicity assume that the parameter space is finite. Your essay should:

(a) explain what problem in Bayesian inference the Metropolis–Hastings algorithm is used to tackle;

(b) fully justify that the algorithm does indeed deliver the required information about the model;

(c) discuss any implementational issues that need care.

4/II/27J

(a) State the strong law of large numbers. State the central limit theorem.

(b) Assuming whatever regularity conditions you require, show that if $\hat{\theta}_n \equiv \hat{\theta}_n(X_1, \ldots, X_n)$ is the maximum-likelihood estimator of the unknown parameter $\theta$ based on an independent identically distributed sample of size $n$, then under $P_\theta$
$$\sqrt{n}(\hat{\theta}_n - \theta) \to N(0, J(\theta)^{-1})$$
in distribution as $n \to \infty$, where $J(\theta)$ is a matrix which you should identify. A rigorous derivation is not required.

(c) Suppose that $X_1, X_2, \ldots$ are independent binomial $\mathrm{Bin}(1, \theta)$ random variables. It is required to test $H_0 : \theta = \tfrac{1}{2}$ against the alternative $H_1 : \theta \in (0, 1)$. Show that the construction of a likelihood-ratio test leads us to the statistic
$$T_n = 2n\big\{\hat{\theta}_n\log\hat{\theta}_n + (1 - \hat{\theta}_n)\log(1 - \hat{\theta}_n) + \log 2\big\},$$
where $\hat{\theta}_n \equiv n^{-1}\sum_{k=1}^n X_k$. Stating clearly any result to which you appeal, for large $n$, what approximately is the distribution of $T_n$ under $H_0$?

Writing $\hat{\theta}_n = \tfrac{1}{2} + Z_n$, and assuming that $Z_n$ is small, show that $T_n \simeq 4nZ_n^2$. Using this and the central limit theorem, briefly justify the approximate distribution of $T_n$ given by asymptotic maximum-likelihood theory. What could you say if the assumption that $Z_n$ is small failed?

Part II, 2006
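The essay question above concerns the Metropolis–Hastings algorithm on a finite parameter space. The sketch below is a toy illustration (the binomial likelihood, uniform prior on a grid, and all numbers are made-up assumptions) of the algorithm targeting a posterior known only up to its normalising constant.

```python
import numpy as np

# Metropolis-Hastings on a finite grid of theta values, targeting an unnormalised posterior.
rng = np.random.default_rng(10)
grid = np.linspace(0.01, 0.99, 99)                 # finite parameter space
n, x = 30, 21                                      # toy data: x successes out of n, Bin(n, theta)
unnorm = grid**x * (1 - grid)**(n - x)             # uniform prior on the grid, so this is the posterior up to a constant

samples = []
j = 50                                             # current state: index into the grid
for t in range(50_000):
    k = (j + rng.choice([-1, 1])) % len(grid)      # symmetric random-walk proposal on the grid
    accept = min(1.0, unnorm[k] / unnorm[j])       # acceptance probability (proposal is symmetric)
    if rng.uniform() < accept:
        j = k
    samples.append(grid[j])

# MCMC posterior mean vs the continuous Beta(x+1, n-x+1) posterior mean, for rough comparison.
print(np.mean(samples[5000:]), (x + 1) / (n + 2))
```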

/II/27I

State Wilks' Theorem on the asymptotic distribution of likelihood-ratio test statistics.

Suppose that $X_1, \ldots, X_n$ are independent with common $N(\mu, \sigma^2)$ distribution, where the parameters $\mu$ and $\sigma$ are both unknown. Find the likelihood-ratio test statistic for testing $H_0 : \mu = 0$ against $H_1 : \mu$ unrestricted, and state its (approximate) distribution.

What is the form of the $t$-test of $H_0$ against $H_1$? Explain why for large $n$ the likelihood-ratio test and the $t$-test are nearly the same.

2/II/27I

(i) Suppose that $X$ is a multivariate normal vector with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\sigma^2 I$, where $\mu$ and $\sigma^2$ are both unknown, and $I$ denotes the $d \times d$ identity matrix. Suppose that $\Theta_0 \subset \Theta_1$ are linear subspaces of $\mathbb{R}^d$ of dimensions $d_0$ and $d_1$, where $d_0 < d_1 < d$. Let $P_i$ denote orthogonal projection onto $\Theta_i$ ($i = 0, 1$). Carefully derive the joint distribution of $(\|X - P_1X\|^2, \|P_1X - P_0X\|^2)$ under the hypothesis $H_0 : \mu \in \Theta_0$. How could you use this to make a test of $H_0$ against $H_1 : \mu \in \Theta_1$?

(ii) Suppose that $I$ students take $J$ exams, and that the mark $X_{ij}$ of student $i$ in exam $j$ is modelled as
$$X_{ij} = m + \alpha_i + \beta_j + \varepsilon_{ij}$$
where $\sum_i \alpha_i = 0 = \sum_j \beta_j$, the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$, and the parameters $m$, $\alpha$, $\beta$ and $\sigma$ are unknown. Construct a test of $H_0 : \beta_j = 0$ for all $j$ against $H_1 : \sum_j \beta_j^2 \neq 0$.

3/II/26I

In the context of decision theory, explain the meaning of the following italicised terms: loss function, decision rule, the risk of a decision rule, a Bayes rule with respect to prior $\pi$, and an admissible rule. Explain how a Bayes rule with respect to a prior $\pi$ can be constructed.

Suppose that $X_1, \ldots, X_n$ are independent with common $N(0, v)$ distribution, where $v > 0$ is supposed to have a prior density $f_0$. In a decision-theoretic approach to estimating $v$, we take a quadratic loss: $L(v, a) = (v - a)^2$. Write $X = (X_1, \ldots, X_n)$ and $|X| = (X_1^2 + \cdots + X_n^2)^{1/2}$.

By considering decision rules (estimators) of the form $\hat{v}(X) = \alpha|X|^2$, prove that if $\alpha \neq 1/(n + 2)$ then the estimator $\hat{v}(X) = \alpha|X|^2$ is not Bayes, for any choice of prior $f_0$.

By considering decision rules of the form $\hat{v}(X) = \alpha|X|^2 + \beta$, prove that if $\alpha \neq 1/n$ then the estimator $\hat{v}(X) = \alpha|X|^2$ is not Bayes, for any choice of prior $f_0$.

[You may use without proof the fact that, if $Z$ has a $N(0, 1)$ distribution, then $\mathbb{E}Z^4 = 3$.]

Part II, 2005


More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics Chapter Three. Point Estimation 3.4 Uniformly Minimum Variance Unbiased Estimator(UMVUE) Criteria for Best Estimators MSE Criterion Let F = {p(x; θ) : θ Θ} be a parametric distribution

More information

ST5215: Advanced Statistical Theory

ST5215: Advanced Statistical Theory Department of Statistics & Applied Probability Wednesday, October 5, 2011 Lecture 13: Basic elements and notions in decision theory Basic elements X : a sample from a population P P Decision: an action

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

DA Freedman Notes on the MLE Fall 2003

DA Freedman Notes on the MLE Fall 2003 DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar

More information

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part : Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section

More information

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your answer

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Bayesian statistics: Inference and decision theory

Bayesian statistics: Inference and decision theory Bayesian statistics: Inference and decision theory Patric Müller und Francesco Antognini Seminar über Statistik FS 28 3.3.28 Contents 1 Introduction and basic definitions 2 2 Bayes Method 4 3 Two optimalities:

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random
