Principles of Statistics


Part II, 2018–2005

Paper 4, Section II, 28K

Let $g : \mathbb{R} \to \mathbb{R}$ be an unknown function, twice continuously differentiable with $|g''(x)| \leq M$ for all $x \in \mathbb{R}$. For some $x_0 \in \mathbb{R}$, we know the value $g(x_0)$ and we wish to estimate its derivative $g'(x_0)$. To do so, we have access to a pseudo-random number generator that gives $U_1, \ldots, U_N$ i.i.d. uniform over $[0, 1]$, and a machine that takes input $x_1, \ldots, x_N \in \mathbb{R}$ and returns $g(x_i) + \varepsilon_i$, where the $\varepsilon_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$.

(a) Explain how this setup allows us to generate $N$ independent $X_i = x_0 + hZ_i$, where the $Z_i$ take value $-1$ or $1$ with probability $1/2$, for any $h > 0$.

(b) We denote by $Y_i$ the output $g(X_i) + \varepsilon_i$. Show that for some independent $\xi_i \in \mathbb{R}$,
$$Y_i - g(x_0) = hZ_i\, g'(x_0) + \frac{h^2}{2}\, g''(\xi_i) + \varepsilon_i.$$

(c) Using the intuition given by the least-squares estimator, justify the use of the estimator $\hat{g}_N$ given by
$$\hat{g}_N = \frac{1}{Nh} \sum_{i=1}^N Z_i \big(Y_i - g(x_0)\big).$$

(d) Show that
$$\mathbb{E}\big[|\hat{g}_N - g'(x_0)|^2\big] \leq \frac{h^2 M^2}{4} + \frac{\sigma^2}{N h^2}.$$
Show that for some choice $h_N$ of parameter $h$, this implies
$$\mathbb{E}\big[|\hat{g}_N - g'(x_0)|^2\big] \leq \frac{\sigma M}{\sqrt{N}}.$$

Part II, 2018
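As a quick numerical companion (not part of the question), the sketch below simulates the estimator $\hat g_N$ of part (c) for an arbitrary choice of $g$, $x_0$, $\sigma$ and $M$, using a bias–variance balancing choice of $h$ in the spirit of part (d); all concrete values are illustrative assumptions.

```python
import numpy as np

# Illustrative check of the estimator \hat g_N; every concrete choice here is an assumption.
rng = np.random.default_rng(0)
g = np.cos                     # example function; |g''| <= M = 1
dg = lambda x: -np.sin(x)      # true derivative, used only for comparison
x0, sigma, M, N = 1.0, 0.5, 1.0, 10_000

h = (4 * sigma**2 / (N * M**2)) ** 0.25               # balances bias^2 and variance
Z = np.where(rng.uniform(size=N) < 0.5, -1.0, 1.0)    # Rademacher signs built from U[0,1] draws
X = x0 + h * Z
Y = g(X) + rng.normal(0.0, sigma, size=N)             # noisy machine evaluations
g_hat = np.mean(Z * (Y - g(x0))) / h

print(g_hat, dg(x0), sigma * M / np.sqrt(N))          # estimate, truth, scale of the risk bound
```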

Paper 3, Section II, 28K

In the model $\{N(\theta, I_p),\ \theta \in \mathbb{R}^p\}$ of a Gaussian distribution in dimension $p$, with unknown mean $\theta$ and known identity covariance matrix $I_p$, we estimate $\theta$ based on a sample of i.i.d. observations $X_1, \ldots, X_n$ drawn from $N(\theta_0, I_p)$.

(a) Define the Fisher information $I(\theta_0)$, and compute it in this model.

(b) We recall that the observed Fisher information $i_n(\theta)$ is given by
$$i_n(\theta) = \frac{1}{n} \sum_{i=1}^n \nabla_\theta \log f(X_i, \theta)\, \nabla_\theta \log f(X_i, \theta)^\top.$$
Find the limit of $\hat{i}_n = i_n(\hat{\theta}_{MLE})$, where $\hat{\theta}_{MLE}$ is the maximum likelihood estimator of $\theta$ in this model.

(c) Define the Wald statistic $W_n(\theta)$ and compute it. Give the limiting distribution of $W_n(\theta_0)$ and explain how it can be used to design a confidence interval for $\theta_0$.

[You may use results from the course provided that you state them clearly.]

Paper 2, Section II, 28K

We consider the model $\{N(\theta, I_p),\ \theta \in \mathbb{R}^p\}$ of a Gaussian distribution in dimension $p \geq 3$, with unknown mean $\theta$ and known identity covariance matrix $I_p$. We estimate $\theta$ based on one observation $X \sim N(\theta, I_p)$, under the loss function $\ell(\theta, \delta) = \|\theta - \delta\|_2^2$.

(a) Define the risk of an estimator $\hat{\theta}$. Compute the maximum likelihood estimator $\hat{\theta}_{MLE}$ of $\theta$ and its risk for any $\theta \in \mathbb{R}^p$.

(b) Define what an admissible estimator is. Is $\hat{\theta}_{MLE}$ admissible?

(c) For any $c > 0$, let $\pi_c(\theta)$ be the prior $N(0, c^2 I_p)$. Find a Bayes optimal estimator $\hat{\theta}_c$ under this prior with the quadratic loss, and compute its Bayes risk.

(d) Show that $\hat{\theta}_{MLE}$ is minimax.

[You may use results from the course provided that you state them clearly.]

Part II, 2018

Paper 1, Section II, 29K

A scientist wishes to estimate the proportion $\theta \in (0, 1)$ of presence of a gene in a population of flies of size $n$. Every fly receives a chromosome from each of its two parents, each carrying the gene A with probability $(1 - \theta)$ or the gene B with probability $\theta$, independently. The scientist can observe if each fly has two copies of the gene A (denoted by AA), two copies of the gene B (denoted by BB) or one of each (denoted by AB). We let $n_{AA}$, $n_{BB}$ and $n_{AB}$ denote the number of each observation among the $n$ flies.

(a) Give the probability of each observation as a function of $\theta$, denoted by $f(x, \theta)$, for all three values $x =$ AA, BB or AB.

(b) For a vector $w = (w_{AA}, w_{BB}, w_{AB})$, we let $\hat{\theta}_w$ denote the estimator defined by
$$\hat{\theta}_w = w_{AA}\, \frac{n_{AA}}{n} + w_{BB}\, \frac{n_{BB}}{n} + w_{AB}\, \frac{n_{AB}}{n}.$$
Find the unique vector $w$ such that $\hat{\theta}_w$ is unbiased. Show that $\hat{\theta}_w$ is a consistent estimator of $\theta$.

(c) Compute the maximum likelihood estimator of $\theta$ in this model, denoted by $\hat{\theta}_{MLE}$. Find the limiting distribution of $\sqrt{n}(\hat{\theta}_{MLE} - \theta)$. [You may use results from the course, provided that you state them clearly.]

Part II, 2018
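A purely illustrative numerical companion: the sketch below simulates genotype counts and evaluates both $\hat\theta_w$ and $\hat\theta_{MLE}$. The closed forms used, $w = (0, 1, 1/2)$ and $\hat\theta_{MLE} = (2n_{BB} + n_{AB})/(2n)$, are the editor's own working rather than quotations from the paper, and the true $\theta$ and $n$ are made-up values.

```python
import numpy as np

# Illustrative computation for the gene-proportion question; counts are simulated,
# and the closed forms in the comments are working, not quoted from the paper.
rng = np.random.default_rng(1)
theta, n = 0.3, 1000
genotypes = rng.choice(["AA", "BB", "AB"], size=n,
                       p=[(1 - theta)**2, theta**2, 2 * theta * (1 - theta)])
n_AA = np.sum(genotypes == "AA")
n_BB = np.sum(genotypes == "BB")
n_AB = np.sum(genotypes == "AB")

theta_w = 0 * n_AA / n + 1 * n_BB / n + 0.5 * n_AB / n   # unbiased estimator with w = (0, 1, 1/2)
theta_mle = (2 * n_BB + n_AB) / (2 * n)                  # MLE; here it coincides with theta_w
print(theta_w, theta_mle)
```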

Paper 2, Section II, 26K

We consider the problem of estimating $\theta$ in the model $\{f(x, \theta) : \theta \in (0, \infty)\}$, where
$$f(x, \theta) = (1 - \alpha)(x - \theta)^{-\alpha}\, \mathbf{1}\{x \in [\theta, \theta + 1]\}.$$
Here $\mathbf{1}\{A\}$ is the indicator of the set $A$, and $\alpha \in (0, 1)$ is known. This estimation is based on a sample of $n$ i.i.d. $X_1, \ldots, X_n$, and we denote by $X_{(1)} < \ldots < X_{(n)}$ the ordered sample.

(a) Compute the mean and the variance of $X_1$. Construct an unbiased estimator of $\theta$ taking the form $\tilde{\theta}_n = \bar{X}_n + c(\alpha)$, where $\bar{X}_n = n^{-1} \sum_{i=1}^n X_i$, specifying $c(\alpha)$.

(b) Show that $\tilde{\theta}_n$ is consistent and find the limit in distribution of $\sqrt{n}(\tilde{\theta}_n - \theta)$. Justify your answer, citing theorems that you use.

(c) Find the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$. Compute $\mathbb{P}(\hat{\theta}_n - \theta > t)$ for all real $t$. Is $\hat{\theta}_n$ unbiased?

(d) For $t > 0$, show that $\mathbb{P}(n^\beta(\hat{\theta}_n - \theta) > t)$ has a limit in $(0, 1)$ for some $\beta > 0$. Give explicitly the value of $\beta$ and the limit. Why should one favour using $\hat{\theta}_n$ over $\tilde{\theta}_n$?

Part II, 2017

Paper 3, Section II, 26K

We consider the problem of estimating an unknown $\theta_0$ in a statistical model $\{f(x, \theta),\ \theta \in \Theta\}$ where $\Theta \subseteq \mathbb{R}$, based on $n$ i.i.d. observations $X_1, \ldots, X_n$ whose distribution has p.d.f. $f(x, \theta_0)$. In all the parts below you may assume that the model satisfies necessary regularity conditions.

(a) Define the score function $S_n$ of $\theta$. Prove that $S_n(\theta_0)$ has mean $0$.

(b) Define the Fisher information $I(\theta)$. Show that it can also be expressed as
$$I(\theta) = -\mathbb{E}_\theta\left[\frac{d^2}{d\theta^2} \log f(X_1, \theta)\right].$$

(c) Define the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$. Give without proof the limits of $\hat{\theta}_n$ and of $\sqrt{n}(\hat{\theta}_n - \theta_0)$ (in a manner which you should specify). [Be as precise as possible when describing a distribution.]

(d) Let $\psi : \Theta \to \mathbb{R}$ be a continuously differentiable function, and $\tilde{\theta}_n$ another estimator of $\theta_0$ such that $|\hat{\theta}_n - \tilde{\theta}_n| \leq 1/n$ with probability $1$. Give the limits of $\psi(\tilde{\theta}_n)$ and of $\sqrt{n}\big(\psi(\tilde{\theta}_n) - \psi(\theta_0)\big)$ (in a manner which you should specify).

Part II, 2017

Paper 4, Section II, 27K

For the statistical model $\{N_d(\theta, \Sigma),\ \theta \in \mathbb{R}^d\}$, where $\Sigma$ is a known, positive-definite $d \times d$ matrix, we want to estimate $\theta$ based on $n$ i.i.d. observations $X_1, \ldots, X_n$ with distribution $N_d(\theta, \Sigma)$.

(a) Derive the maximum likelihood estimator $\hat{\theta}_n$ of $\theta$. What is the distribution of $\hat{\theta}_n$?

(b) For $\alpha \in (0, 1)$, construct a confidence region $C_n^\alpha$ such that $\mathbb{P}_\theta(\theta \in C_n^\alpha) = 1 - \alpha$.

(c) For $\Sigma = I_d$, compute the maximum likelihood estimator of $\theta$ for the following parameter spaces:
(i) $\Theta = \{\theta : \|\theta\|_2 = 1\}$;
(ii) $\Theta = \{\theta : v^\top \theta = 0\}$ for some unit vector $v \in \mathbb{R}^d$.

(d) For $\Sigma = I_d$, we want to test the null hypothesis $\Theta_0 = \{0\}$ (i.e. $\theta = 0$) against the composite alternative $\Theta_1 = \mathbb{R}^d \setminus \{0\}$. Compute the likelihood ratio statistic $\Lambda(\Theta_1, \Theta_0)$ and give its distribution under the null hypothesis. Compare this result with the statement of Wilks' theorem.

Part II, 2017

Paper 1, Section II, 28K

For a positive integer $n$, we want to estimate the parameter $p$ in the binomial statistical model $\{\mathrm{Bin}(n, p),\ p \in [0, 1]\}$, based on an observation $X \sim \mathrm{Bin}(n, p)$.

(a) Compute the maximum likelihood estimator for $p$. Show that the posterior distribution for $p$ under a uniform prior on $[0, 1]$ is $\mathrm{Beta}(a, b)$, and specify $a$ and $b$.
[The p.d.f. of $\mathrm{Beta}(a, b)$ is given by $f_{a,b}(p) = \frac{(a + b - 1)!}{(a - 1)!\,(b - 1)!}\, p^{a-1} (1 - p)^{b-1}$.]

(b) (i) For a loss function $L$, define the risk of an estimator $\hat{p}$ of $p$, and the Bayes risk under a prior $\pi$ for $p$.
(ii) Under the loss function
$$L(\hat{p}, p) = \frac{(\hat{p} - p)^2}{p(1 - p)},$$
find a Bayes optimal estimator for the uniform prior. Give its risk as a function of $p$.
(iii) Give a minimax optimal estimator for the loss function $L$ given above. Justify your answer.

Part II, 2017

Paper 3, Section II, 25J

Let $X_1, \ldots, X_n$ be i.i.d. random variables from a $N(\theta, 1)$ distribution, $\theta \in \mathbb{R}$, and consider a Bayesian model $\theta \sim N(0, v^2)$ for the unknown parameter, where $v > 0$ is a fixed constant.

(a) Derive the posterior distribution $\Pi(\cdot \mid X_1, \ldots, X_n)$ of $\theta \mid X_1, \ldots, X_n$.

(b) Construct a credible set $C_n \subset \mathbb{R}$ such that
(i) $\Pi(C_n \mid X_1, \ldots, X_n) = 0.95$ for every $n \in \mathbb{N}$, and
(ii) for any $\theta_0 \in \mathbb{R}$, $P^{\mathbb{N}}_{\theta_0}(\theta_0 \in C_n) \to 0.95$ as $n \to \infty$,
where $P^{\mathbb{N}}_\theta$ denotes the distribution of the infinite sequence $X_1, X_2, \ldots$ when drawn independently from a fixed $N(\theta, 1)$ distribution.

[You may use the central limit theorem.]

Paper 2, Section II, 26J

(a) State and prove the Cramér–Rao inequality in a parametric model $\{f(\theta) : \theta \in \Theta\}$, where $\Theta \subseteq \mathbb{R}$. [Necessary regularity conditions on the model need not be specified.]

(b) Let $X_1, \ldots, X_n$ be i.i.d. Poisson random variables with unknown parameter $\mathbb{E}X_1 = \theta > 0$. For $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$ and $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ define
$$T_\alpha = \alpha \bar{X}_n + (1 - \alpha) S^2, \qquad 0 \leq \alpha \leq 1.$$
Show that $\mathrm{Var}_\theta(T_\alpha) \geq \mathrm{Var}_\theta(\bar{X}_n)$ for all values of $\alpha, \theta$.

Now suppose $\tilde{\theta} = \tilde{\theta}(X_1, \ldots, X_n)$ is an estimator of $\theta$ with possibly nonzero bias $B(\theta) = \mathbb{E}_\theta \tilde{\theta} - \theta$. Suppose the function $B$ is monotone increasing on $(0, \infty)$. Prove that the mean-squared errors satisfy
$$\mathbb{E}_\theta(\tilde{\theta}_n - \theta)^2 \geq \mathbb{E}_\theta(\bar{X}_n - \theta)^2 \qquad \text{for all } \theta \in \Theta.$$

Part II, 2016

Paper 4, Section II, 26J

Consider a decision problem with parameter space $\Theta$. Define the concepts of a Bayes decision rule $\delta_\pi$ and of a least favourable prior.

Suppose $\pi$ is a prior distribution on $\Theta$ such that the Bayes risk of the Bayes rule equals $\sup_{\theta \in \Theta} R(\delta_\pi, \theta)$, where $R(\delta, \theta)$ is the risk function associated to the decision problem. Prove that $\delta_\pi$ is least favourable.

Now consider a random variable $X$ arising from the binomial distribution $\mathrm{Bin}(n, \theta)$, where $\theta \in \Theta = [0, 1]$. Construct a least favourable prior for the squared risk $R(\delta, \theta) = \mathbb{E}_\theta(\delta(X) - \theta)^2$. [You may use without proof the fact that the Bayes rule for quadratic risk is given by the posterior mean.]

Paper 1, Section II, 27J

Derive the maximum likelihood estimator $\hat{\theta}_n$ based on independent observations $X_1, \ldots, X_n$ that are identically distributed as $N(\theta, 1)$, where the unknown parameter $\theta$ lies in the parameter space $\Theta = \mathbb{R}$. Find the limiting distribution of $\sqrt{n}(\hat{\theta}_n - \theta)$ as $n \to \infty$.

Now define
$$\tilde{\theta}_n = \begin{cases} \hat{\theta}_n & \text{whenever } |\hat{\theta}_n| > n^{-1/4}, \\ 0 & \text{otherwise}, \end{cases}$$
and find the limiting distribution of $\sqrt{n}(\tilde{\theta}_n - \theta)$ as $n \to \infty$.

Calculate
$$\lim_{n \to \infty}\ \sup_{\theta \in \Theta}\ n\, \mathbb{E}_\theta(T_n - \theta)^2$$
for the choices $T_n = \hat{\theta}_n$ and $T_n = \tilde{\theta}_n$. Based on the above findings, which estimator $T_n$ of $\theta$ would you prefer? Explain your answer.

[Throughout, you may use standard facts of stochastic convergence, such as the central limit theorem, provided they are clearly stated.]

Part II, 2016
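The estimator $\tilde\theta_n$ in Paper 1 is a Hodges-type superefficient estimator. The Monte Carlo sketch below (sample size, $\theta$ grid and replication count are illustrative assumptions) shows the scaled risk $n\,\mathbb{E}_\theta(T_n - \theta)^2$ staying near 1 for the MLE, dropping below 1 at $\theta = 0$, but becoming very large at $\theta$ of order $n^{-1/4}$, which is the point of the comparison.

```python
import numpy as np

# Monte Carlo sketch of the scaled risk n * E_theta (T_n - theta)^2 for the MLE
# (sample mean) and for the thresholded estimator of the question.
rng = np.random.default_rng(2)
n, reps = 400, 20_000

def scaled_risk(theta):
    x_bar = theta + rng.normal(size=reps) / np.sqrt(n)         # law of the sample mean
    hodges = np.where(np.abs(x_bar) > n ** (-0.25), x_bar, 0.0)  # thresholded estimator
    return (n * np.mean((x_bar - theta) ** 2),
            n * np.mean((hodges - theta) ** 2))

for theta in [0.0, n ** (-0.25), 0.5]:
    print(theta, scaled_risk(theta))
```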

Paper 4, Section II, 24J

Given independent and identically distributed observations $X_1, \ldots, X_n$ with finite mean $\mathbb{E}(X_1) = \mu$ and variance $\mathrm{Var}(X_1) = \sigma^2$, explain the notion of a bootstrap sample $X_1^b, \ldots, X_n^b$, and discuss how you can use it to construct a confidence interval $C_n$ for $\mu$.

Suppose you can operate a random number generator that can simulate independent uniform random variables $U_1, \ldots, U_n$ on $[0, 1]$. How can you use such a random number generator to simulate a bootstrap sample?

Suppose that $(F_n : n \in \mathbb{N})$ and $F$ are cumulative probability distribution functions defined on the real line, that $F_n(t) \to F(t)$ as $n \to \infty$ for every $t \in \mathbb{R}$, and that $F$ is continuous on $\mathbb{R}$. Show that, as $n \to \infty$,
$$\sup_{t \in \mathbb{R}} |F_n(t) - F(t)| \to 0.$$

State (without proof) the theorem about the consistency of the bootstrap of the mean, and use it to give an asymptotic justification of the confidence interval $C_n$. That is, prove that, as $n \to \infty$, $P^{\mathbb{N}}(\mu \in C_n) \to 1 - \alpha$, where $P^{\mathbb{N}}$ is the joint distribution of $X_1, X_2, \ldots$.

[You may use standard facts of stochastic convergence and the Central Limit Theorem without proof.]

Part II, 2015
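As a concrete companion to the first two paragraphs of the question, the sketch below builds a bootstrap confidence interval for the mean, drawing the resampling indices from uniforms as the question suggests. The data-generating distribution, sample size, number of bootstrap replicates and the particular (centred-percentile) interval used are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a nonparametric bootstrap confidence interval for the mean.
rng = np.random.default_rng(3)
n, B, alpha = 200, 2000, 0.05
x = rng.exponential(scale=2.0, size=n)            # observed sample; true mean mu = 2
x_bar = x.mean()

boot_means = np.empty(B)
for b in range(B):
    idx = (n * rng.uniform(size=n)).astype(int)   # resampling indices built from U[0,1] draws
    boot_means[b] = x[idx].mean()

# Invert the bootstrap law of x_bar* - x_bar to get an interval for mu.
lo, hi = np.quantile(boot_means - x_bar, [alpha / 2, 1 - alpha / 2])
C_n = (x_bar - hi, x_bar - lo)
print(x_bar, C_n)
```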

Paper 3, Section II, 24J

Define what it means for an estimator $\hat{\theta}$ of an unknown parameter $\theta$ to be consistent.

Let $S_n$ be a sequence of random real-valued continuous functions defined on $\mathbb{R}$ such that, as $n \to \infty$, $S_n(\theta)$ converges to $S(\theta)$ in probability for every $\theta \in \mathbb{R}$, where $S : \mathbb{R} \to \mathbb{R}$ is non-random. Suppose that for some $\theta_0 \in \mathbb{R}$ and every $\varepsilon > 0$ we have
$$S(\theta_0 - \varepsilon) < 0 < S(\theta_0 + \varepsilon),$$
and that $S_n$ has exactly one zero $\hat{\theta}_n$ for every $n \in \mathbb{N}$. Show that $\hat{\theta}_n \to^{\mathbb{P}} \theta_0$ as $n \to \infty$, and deduce from this that the maximum likelihood estimator (MLE) based on observations $X_1, \ldots, X_n$ from a $N(\theta, 1)$, $\theta \in \mathbb{R}$, model is consistent.

Now consider independent observations $X_1, \ldots, X_n$ of bivariate normal random vectors
$$X_i = (X_{1i}, X_{2i})^T \sim N_2\big[(\mu_i, \mu_i)^T,\ \sigma^2 I_2\big], \qquad i = 1, \ldots, n,$$
where $\mu_i \in \mathbb{R}$, $\sigma > 0$ and $I_2$ is the $2 \times 2$ identity matrix. Find the MLE $\hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_n)^T$ of $\mu = (\mu_1, \ldots, \mu_n)^T$ and show that the MLE of $\sigma^2$ equals
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n s_i^2, \qquad s_i^2 = \frac{1}{2}\big[(X_{1i} - \hat{\mu}_i)^2 + (X_{2i} - \hat{\mu}_i)^2\big].$$
Show that $\hat{\sigma}^2$ is not consistent for estimating $\sigma^2$. Explain briefly why the MLE fails in this model.

[You may use the Law of Large Numbers without proof.]

Part II, 2015

Paper 2, Section II, 25J

Consider a random variable $X$ arising from the binomial distribution $\mathrm{Bin}(n, \theta)$, $\theta \in \Theta = [0, 1]$. Find the maximum likelihood estimator $\hat{\theta}_{MLE}$ and the Fisher information $I(\theta)$ for $\theta \in \Theta$.

Now consider the following priors on $\Theta$:
(i) a uniform $U([0, 1])$ prior on $[0, 1]$,
(ii) a prior with density $\pi(\theta)$ proportional to $\sqrt{I(\theta)}$,
(iii) a $\mathrm{Beta}(\sqrt{n}/2, \sqrt{n}/2)$ prior.

Find the means $\mathbb{E}[\theta \mid X]$ and modes $m_{\theta \mid X}$ of the posterior distributions corresponding to the prior distributions (i)–(iii). Which of these posterior decision rules coincide with $\hat{\theta}_{MLE}$? Which one is minimax for quadratic risk? Justify your answers.

[You may use the following properties of the $\mathrm{Beta}(a, b)$ ($a > 0$, $b > 0$) distribution. Its density $f(x; a, b)$, $x \in [0, 1]$, is proportional to $x^{a-1}(1 - x)^{b-1}$, its mean is equal to $a/(a + b)$, and its mode is equal to
$$\frac{\max(a - 1, 0)}{\max(a, 1) + \max(b, 1) - 2}$$
provided either $a > 1$ or $b > 1$.
You may further use the fact that a unique Bayes rule of constant risk is a unique minimax rule for that risk.]

Part II, 2015
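The posterior computations here reduce to Beta–binomial conjugacy. The sketch below evaluates the three posterior means and modes using the Beta facts quoted in the question; the observed $X$ and $n$ are illustrative assumptions, and the identification of the prior in (ii) with $\mathrm{Beta}(1/2, 1/2)$ is the editor's working.

```python
import numpy as np

# Posterior means/modes via Beta-binomial conjugacy, using the Beta mean a/(a+b)
# and the mode formula quoted in the question.  X and n are made-up values.
n, X = 25, 7

def posterior_summary(a0, b0):
    a, b = a0 + X, b0 + n - X              # Beta(a0, b0) prior -> Beta(a0 + X, b0 + n - X) posterior
    mean = a / (a + b)
    mode = max(a - 1, 0) / (max(a, 1) + max(b, 1) - 2)
    return mean, mode

priors = {
    "(i)   uniform = Beta(1, 1)":                  (1.0, 1.0),
    "(ii)  density prop. to sqrt(I) = Beta(1/2, 1/2)": (0.5, 0.5),
    "(iii) Beta(sqrt(n)/2, sqrt(n)/2)":            (np.sqrt(n) / 2, np.sqrt(n) / 2),
}
print("MLE:", X / n)
for name, (a0, b0) in priors.items():
    print(name, posterior_summary(a0, b0))
```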

Paper 1, Section II, 25J

Consider a normally distributed random vector $X \in \mathbb{R}^p$ modelled as $X \sim N(\theta, I_p)$ where $\theta \in \mathbb{R}^p$, $I_p$ is the $p \times p$ identity matrix, and where $p \geq 3$. Define the Stein estimator $\hat{\theta}_{STEIN}$ of $\theta$.

Prove that $\hat{\theta}_{STEIN}$ dominates the estimator $\tilde{\theta} = X$ for the risk function induced by quadratic loss
$$\ell(a, \theta) = \sum_{i=1}^p (a_i - \theta_i)^2, \qquad a \in \mathbb{R}^p.$$

Show however that the worst case risks coincide, that is, show that
$$\sup_{\theta \in \mathbb{R}^p} \mathbb{E}_\theta\, \ell(X, \theta) = \sup_{\theta \in \mathbb{R}^p} \mathbb{E}_\theta\, \ell(\hat{\theta}_{STEIN}, \theta).$$

[You may use Stein's lemma without proof, provided it is clearly stated.]

Part II, 2015
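Both claims are easy to see numerically. The sketch below compares the Monte Carlo risk of $X$ with that of the shrinkage estimator $(1 - (p-2)/\|X\|^2)X$, taken here to be the usual James–Stein form (which may differ in constants from the course's exact definition of the Stein estimator); the dimension, $\theta$ values and replication count are illustrative assumptions.

```python
import numpy as np

# Monte Carlo comparison of the quadratic risk of X and of a James-Stein-type estimator.
rng = np.random.default_rng(4)
p, reps = 10, 50_000

def risks(theta):
    X = theta + rng.normal(size=(reps, p))
    norm2 = np.sum(X ** 2, axis=1)
    stein = (1 - (p - 2) / norm2)[:, None] * X
    return (np.mean(np.sum((X - theta) ** 2, axis=1)),       # risk of X, equal to p
            np.mean(np.sum((stein - theta) ** 2, axis=1)))   # strictly smaller for every theta

for scale in [0.0, 1.0, 10.0]:        # the gap shrinks as ||theta|| grows: worst-case risks coincide
    theta = np.full(p, scale)
    print(scale, risks(theta))
```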

Paper 4, Section II, 27J

Suppose you have at hand a pseudo-random number generator that can simulate an i.i.d. sequence of uniform $U[0, 1]$ distributed random variables $U_1, \ldots, U_N$ for any $N \in \mathbb{N}$. Construct an algorithm to simulate an i.i.d. sequence $X_1, \ldots, X_N$ of standard normal $N(0, 1)$ random variables. [Should your algorithm depend on the inverse of any cumulative probability distribution function, you are required to provide an explicit expression for this inverse function.]

Suppose as a matter of urgency you need to approximately evaluate the integral
$$I = \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}}\, (\pi + |x|)^{1/4}\, e^{-x^2/2}\, dx.$$
Find an approximation $I_N$ of this integral that requires $N$ simulation steps from your pseudo-random number generator, and which has stochastic accuracy
$$\Pr\big(|I_N - I| > N^{-1/4}\big) \leq N^{-1/2},$$
where $\Pr$ denotes the joint law of the simulated random variables. Justify your answer.

Paper 3, Section II, 27J

State and prove Wilks' theorem about testing the simple hypothesis $H_0 : \theta = \theta_0$ against the alternative $H_1 : \theta \in \Theta \setminus \{\theta_0\}$, in a one-dimensional regular parametric model $\{f(\cdot, \theta) : \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}$. [You may use without proof the results from lectures on the consistency and asymptotic distribution of maximum likelihood estimators, as well as on uniform laws of large numbers. Necessary regularity conditions can be assumed without statement.]

Find the maximum likelihood estimator $\hat{\theta}_n$ based on i.i.d. observations $X_1, \ldots, X_n$ in a $N(0, \theta)$-model, $\theta \in \Theta = (0, \infty)$. Deduce the limit distribution as $n \to \infty$ of the sequence of statistics
$$-n\Big(\log\big(\overline{X^2}\big) - \big(\overline{X^2} - 1\big)\Big),$$
where $\overline{X^2} = (1/n)\sum_{i=1}^n X_i^2$ and $X_1, \ldots, X_n$ are i.i.d. $N(0, 1)$.

Part II, 2014
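One standard answer to the simulation part of Paper 4 is the Box–Muller transform, which avoids writing down $\Phi^{-1}$ explicitly. The sketch below uses it and then estimates $I$ by a plain Monte Carlo average, using that $I = \mathbb{E}[(\pi + |X|)^{1/4}]$ for $X \sim N(0, 1)$; the sample size is an illustrative assumption, and the Chebyshev-type accuracy argument asked for in the question is not reproduced.

```python
import numpy as np

# Box-Muller: two U[0,1] draws give an N(0,1) draw; then Monte Carlo for the integral I.
rng = np.random.default_rng(5)
N = 100_000

U1, U2 = rng.uniform(size=N), rng.uniform(size=N)
X = np.sqrt(-2 * np.log(U1)) * np.cos(2 * np.pi * U2)   # X ~ N(0, 1)

I_N = np.mean((np.pi + np.abs(X)) ** 0.25)              # estimates E[(pi + |X|)^{1/4}]
print(I_N)
```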

Paper 2, Section II, 28J

In a general decision problem, define the concepts of a Bayes rule and of admissibility. Show that a unique Bayes rule is admissible.

Consider i.i.d. observations $X_1, \ldots, X_n$ from a $\mathrm{Poisson}(\theta)$, $\theta \in \Theta = (0, \infty)$, model. Can the maximum likelihood estimator $\hat{\theta}_{MLE}$ of $\theta$ be a Bayes rule for estimating $\theta$ in quadratic risk for any prior distribution on $\theta$ that has a continuous probability density on $(0, \infty)$? Justify your answer.

Now model the $X_i$ as i.i.d. copies of $X \mid \theta \sim \mathrm{Poisson}(\theta)$, where $\theta$ is drawn from a prior that is a Gamma distribution with parameters $\alpha > 0$ and $\lambda > 0$ (given below). Show that the posterior distribution of $\theta \mid X_1, \ldots, X_n$ is a Gamma distribution and find its parameters. Find the Bayes rule $\hat{\theta}_{BAYES}$ for estimating $\theta$ in quadratic risk for this prior.

[The Gamma probability density function with parameters $\alpha > 0$, $\lambda > 0$ is given by
$$f(\theta) = \frac{\lambda^\alpha \theta^{\alpha - 1} e^{-\lambda\theta}}{\Gamma(\alpha)}, \qquad \theta > 0,$$
where $\Gamma(\alpha)$ is the usual Gamma function.]

Finally assume that the $X_i$ have actually been generated from a fixed $\mathrm{Poisson}(\theta_0)$ distribution, where $\theta_0 > 0$. Show that $\sqrt{n}(\hat{\theta}_{BAYES} - \hat{\theta}_{MLE})$ converges to zero in probability and deduce the asymptotic distribution of $\sqrt{n}(\hat{\theta}_{BAYES} - \theta_0)$ under the joint law $P^{\mathbb{N}}_{\theta_0}$ of the random variables $X_1, X_2, \ldots$. [You may use standard results from lectures without proof provided they are clearly stated.]

Paper 1, Section II, 28J

State without proof the inequality known as the Cramér–Rao lower bound in a parametric model $\{f(\cdot, \theta) : \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}$. Give an example of a maximum likelihood estimator that attains this lower bound, and justify your answer.

Give an example of a parametric model where the maximum likelihood estimator based on observations $X_1, \ldots, X_n$ is biased. State without proof an analogue of the Cramér–Rao inequality for biased estimators.

Define the concept of a minimax decision rule, and show that the maximum likelihood estimator $\hat{\theta}_{MLE}$ based on $X_1, \ldots, X_n$ in a $N(\theta, 1)$ model is minimax for estimating $\theta \in \Theta = \mathbb{R}$ in quadratic risk.

Part II, 2014
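A small numerical companion to the conjugacy part of Paper 2: the sketch below performs the Gamma–Poisson update and compares the posterior-mean Bayes rule with the MLE. The prior parameters, $\theta_0$ and $n$ are illustrative assumptions; the conjugate update itself is standard.

```python
import numpy as np

# Gamma-Poisson conjugacy: Gamma(alpha, lam) prior + Poisson data -> Gamma(alpha + sum(x), lam + n).
rng = np.random.default_rng(6)
alpha, lam = 2.0, 1.0
theta0, n = 3.0, 200
x = rng.poisson(theta0, size=n)

alpha_post, lam_post = alpha + x.sum(), lam + n
theta_bayes = alpha_post / lam_post          # posterior mean = Bayes rule under quadratic risk
theta_mle = x.mean()
print(theta_mle, theta_bayes, np.sqrt(n) * (theta_bayes - theta_mle))   # difference shrinks with n
```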

Paper 4, Section II, 27K

Assuming only the existence and properties of the univariate normal distribution, define $N_p(\mu, \Sigma)$, the multivariate normal distribution with mean (row-)vector $\mu$ and dispersion matrix $\Sigma$; and $W_p(\nu; \Sigma)$, the Wishart distribution on integer $\nu > 1$ degrees of freedom and with scale parameter $\Sigma$. Show that, if $X \sim N_p(\mu, \Sigma)$, $S \sim W_p(\nu; \Sigma)$, and $b$ ($1 \times q$), $A$ ($p \times q$) are fixed, then
$$b + XA \sim N_q(b + \mu A, \Phi), \qquad A^T S A \sim W_q(\nu; \Phi),$$
where $\Phi = A^T \Sigma A$.

The random ($n \times p$) matrix $X$ has rows that are independently distributed as $N_p(M, \Sigma)$, where both parameters $M$ and $\Sigma$ are unknown. Let $\bar{X} := n^{-1}\mathbf{1}^T X$, where $\mathbf{1}$ is the ($n \times 1$) vector of 1s; and $S^c := X^T \Pi X$, with $\Pi := I_n - n^{-1}\mathbf{1}\mathbf{1}^T$. State the joint distribution of $\bar{X}$ and $S^c$ given the parameters.

Now suppose $n > p$ and $\Sigma$ is positive definite. Hotelling's $T^2$ is defined as
$$T^2 := n(\bar{X} - M)\, (\bar{S}^c)^{-1} (\bar{X} - M)^T,$$
where $\bar{S}^c := S^c/\nu$ with $\nu := (n - 1)$. Show that, for any values of $M$ and $\Sigma$,
$$\frac{\nu - p + 1}{\nu p}\, T^2 \sim F^p_{\nu - p + 1},$$
the $F$ distribution on $p$ and $\nu - p + 1$ degrees of freedom.

[You may assume that:
1. If $S \sim W_p(\nu; \Sigma)$ and $a$ is a fixed ($p \times 1$) vector, then
$$\frac{a^T \Sigma^{-1} a}{a^T S^{-1} a} \sim \chi^2_{\nu - p + 1}.$$
2. If $V \sim \chi^2_p$ and $W \sim \chi^2_\lambda$ are independent, then $\dfrac{V/p}{W/\lambda} \sim F^p_\lambda$.]

Part II, 2013

Paper 3, Section II, 27K

What is meant by a convex decision problem? State and prove a theorem to the effect that, in a convex decision problem, there is no point in randomising. [You may use standard terms without defining them.]

The sample space, parameter space and action space are each the two-point set $\{1, 2\}$. The observable $X$ takes value 1 with probability $2/3$ when the parameter $\Theta = 1$, and with probability $3/4$ when $\Theta = 2$. The loss function $L(\theta, a)$ is 0 if $a = \theta$, otherwise 1. Describe all the non-randomised decision rules, compute their risk functions, and plot these as points in the unit square. Identify an inadmissible non-randomised decision rule, and a decision rule that dominates it.

Show that the minimax rule has risk function $(8/17, 8/17)$, and is Bayes against a prior distribution that you should specify. What is its Bayes risk? Would a Bayesian with this prior distribution be bound to use the minimax rule?

Paper 1, Section II, 28K

When the real parameter $\Theta$ takes value $\theta$, variables $X_1, X_2, \ldots$ arise independently from a distribution $P_\theta$ having density function $p_\theta(x)$ with respect to an underlying measure $\mu$. Define the score variable $U_n(\theta)$ and the information function $I_n(\theta)$ for estimation of $\Theta$ based on $X^n := (X_1, \ldots, X_n)$, and relate $I_n(\theta)$ to $i(\theta) := I_1(\theta)$.

State and prove the Cramér–Rao inequality for the variance of an unbiased estimator of $\Theta$. Under what conditions does this inequality become an equality? What is the form of the estimator in this case?

[You may assume $\mathbb{E}_\theta\{U_n(\theta)\} = 0$, $\mathrm{var}_\theta\{U_n(\theta)\} = I_n(\theta)$, and any further required regularity conditions, without comment.]

Let $\hat{\Theta}_n$ be the maximum likelihood estimator of $\Theta$ based on $X^n$. What is the asymptotic distribution of $n^{1/2}(\hat{\Theta}_n - \Theta)$ when $\Theta = \theta$?

Suppose that, for each $n$, $\hat{\Theta}_n$ is unbiased for $\Theta$, and the variance of $n^{1/2}(\hat{\Theta}_n - \Theta)$ is exactly equal to its asymptotic variance. By considering the estimator $\alpha\hat{\Theta}_k + (1 - \alpha)\hat{\Theta}_n$, or otherwise, show that, for $k < n$,
$$\mathrm{cov}_\theta(\hat{\Theta}_k, \hat{\Theta}_n) = \mathrm{var}_\theta(\hat{\Theta}_n).$$

Part II, 2013

Paper 2, Section II, 28K

Describe the Weak Sufficiency Principle (WSP) and the Strong Sufficiency Principle (SSP). Show that Bayesian inference with a fixed prior distribution respects WSP.

A parameter $\Phi$ has a prior distribution which is normal with mean 0 and precision (inverse variance) $h_\Phi$. Given $\Phi = \phi$, further parameters $\Theta := (\Theta_i : i = 1, \ldots, I)$ have independent normal distributions with mean $\phi$ and precision $h_\Theta$. Finally, given both $\Phi = \phi$ and $\Theta = \theta := (\theta_1, \ldots, \theta_I)$, observables $X := (X_{ij} : i = 1, \ldots, I;\ j = 1, \ldots, J)$ are independent, $X_{ij}$ being normal with mean $\theta_i$ and precision $h_X$. The precision parameters $(h_\Phi, h_\Theta, h_X)$ are all fixed and known. Let $\bar{X} := (\bar{X}_1, \ldots, \bar{X}_I)$, where $\bar{X}_i := \sum_{j=1}^J X_{ij}/J$. Show, directly from the definition of sufficiency, that $\bar{X}$ is sufficient for $(\Phi, \Theta)$. [You may assume without proof that, if $Y_1, \ldots, Y_n$ have independent normal distributions with the same variance, and $\bar{Y} := n^{-1}\sum_{i=1}^n Y_i$, then the vector $(Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$ is independent of $\bar{Y}$.]

For data-values $x := (x_{ij} : i = 1, \ldots, I;\ j = 1, \ldots, J)$, determine the joint distribution, $\Pi_\phi$ say, of $\Theta$, given $X = x$ and $\Phi = \phi$. What is the distribution of $\Phi$, given $\Theta = \theta$ and $X = x$?

Using these results, describe clearly how Gibbs sampling combined with Rao–Blackwellisation could be applied to estimate the posterior joint distribution of $\Theta$, given $X = x$.

Part II, 2013
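The Gibbs-sampling part can be illustrated directly. The sketch below alternates draws from the two conditional distributions asked for in the question; the exact normal–normal conditional formulas in the comments are the editor's working (not quoted from the paper), and the precisions and data are made-up values.

```python
import numpy as np

# Gibbs sampler for the normal hierarchy Phi -> Theta_i -> X_ij of the question.
rng = np.random.default_rng(7)
I, J = 5, 8
h_phi, h_theta, h_x = 1.0, 2.0, 4.0
x = rng.normal(loc=1.0, scale=1.0, size=(I, J))     # fake data
x_bar = x.mean(axis=1)                              # sufficient statistic (X_bar_1, ..., X_bar_I)

phi, theta = 0.0, np.zeros(I)
theta_draws = []
for sweep in range(5000):
    # Theta_i | Phi = phi, data ~ N((h_theta*phi + J*h_x*x_bar_i)/(h_theta + J*h_x), 1/(h_theta + J*h_x))
    prec_t = h_theta + J * h_x
    theta = rng.normal((h_theta * phi + J * h_x * x_bar) / prec_t, np.sqrt(1 / prec_t))
    # Phi | Theta = theta, data ~ N(h_theta*sum(theta)/(h_phi + I*h_theta), 1/(h_phi + I*h_theta))
    prec_p = h_phi + I * h_theta
    phi = rng.normal(h_theta * theta.sum() / prec_p, np.sqrt(1 / prec_p))
    theta_draws.append(theta.copy())

print(np.mean(theta_draws[1000:], axis=0))          # posterior mean estimate of each Theta_i
```

A Rao–Blackwellised estimate of, say, $\mathbb{E}[\Theta_i \mid X = x]$ would average the conditional means of $\Pi_\phi$ over the sampled $\phi$'s rather than averaging the raw $\theta$ draws.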

Paper 4, Section II, 27K

For $i = 1, \ldots, n$, the pairs $(X_i, Y_i)$ have independent bivariate normal distributions, with $\mathbb{E}(X_i) = \mu_X$, $\mathbb{E}(Y_i) = \mu_Y$, $\mathrm{var}(X_i) = \mathrm{var}(Y_i) = \phi$, and $\mathrm{corr}(X_i, Y_i) = \rho$. The means $\mu_X, \mu_Y$ are known; the parameters $\phi > 0$ and $\rho \in (-1, 1)$ are unknown.

Show that the joint distribution of all the variables belongs to an exponential family, and identify the natural sufficient statistic, natural parameter, and mean-value parameter. Hence or otherwise, find the maximum likelihood estimator $\hat{\rho}$ of $\rho$.

Let $U_i := X_i + Y_i$, $V_i := X_i - Y_i$. What is the joint distribution of $(U_i, V_i)$?

Show that the distribution of
$$\frac{(1 + \hat{\rho})/(1 - \hat{\rho})}{(1 + \rho)/(1 - \rho)}$$
is $F^n_n$. Hence describe a $(1 - \alpha)$-level confidence interval for $\rho$. Briefly explain what would change if $\mu_X$ and $\mu_Y$ were also unknown.

[Recall that the distribution $F^{\nu_1}_{\nu_2}$ is that of $(W_1/\nu_1)/(W_2/\nu_2)$, where, independently for $j = 1$ and $j = 2$, $W_j$ has the chi-squared distribution with $\nu_j$ degrees of freedom.]

Part II, 2012

Paper 3, Section II, 27K

The parameter vector is $\Theta \equiv (\Theta_1, \Theta_2, \Theta_3)$, with $\Theta_i > 0$, $\Theta_1 + \Theta_2 + \Theta_3 = 1$. Given $\Theta = \theta \equiv (\theta_1, \theta_2, \theta_3)$, the integer random vector $X = (X_1, X_2, X_3)$ has a trinomial distribution, with probability mass function
$$p(x \mid \theta) = \frac{n!}{x_1!\, x_2!\, x_3!}\, \theta_1^{x_1}\theta_2^{x_2}\theta_3^{x_3} \qquad \Big(x_i \geq 0,\ \sum_{i=1}^3 x_i = n\Big). \tag{1}$$

Compute the score vector for the parameter $\Theta := (\Theta_1, \Theta_2)$, and, quoting any relevant general result, use this to determine $\mathbb{E}(X_i)$ ($i = 1, 2, 3$).

Considering (1) as an exponential family with mean-value parameter $\Theta$, what is the corresponding natural parameter $\Phi \equiv (\Phi_1, \Phi_2)$?

Compute the information matrix $I$ for $\Theta$, which has $(i, j)$-entry
$$I_{ij} = -\mathbb{E}\left(\frac{\partial^2 l}{\partial\theta_i\,\partial\theta_j}\right) \qquad (i, j = 1, 2),$$
where $l$ denotes the log-likelihood function, based on $X$, expressed in terms of $(\theta_1, \theta_2)$.

Show that the variance of $\log(X_1/X_3)$ is asymptotic to $n^{-1}(\theta_1^{-1} + \theta_3^{-1})$ as $n \to \infty$. [Hint: the information matrix $I_\Phi$ for $\Phi$ is $I^{-1}$, and the dispersion matrix of the maximum likelihood estimator $\hat{\Phi}$ behaves, asymptotically (for $n \to \infty$), as $I_\Phi^{-1}$.]

Part II, 2012

Paper 2, Section II, 28K

Carefully defining all italicised terms, show that, if a sufficiently general method of inference respects both the Weak Sufficiency Principle and the Conditionality Principle, then it respects the Likelihood Principle.

The position $X_t$ of a particle at time $t > 0$ has the Normal distribution $N(0, \phi t)$, where $\phi$ is the value of an unknown parameter $\Phi$; and the time, $T_x$, at which the particle first reaches position $x \neq 0$ has probability density function
$$p_x(t) = \frac{|x|}{\sqrt{2\pi\phi t^3}} \exp\left(-\frac{x^2}{2\phi t}\right) \qquad (t > 0).$$
Experimenter $E_1$ observes $X_\tau$, and experimenter $E_2$ observes $T_\xi$, where $\tau > 0$, $\xi \neq 0$ are fixed in advance. It turns out that $T_\xi = \tau$. What does the Likelihood Principle say about the inferences about $\Phi$ to be made by the two experimenters?

$E_1$ bases his inference about $\Phi$ on the distribution and observed value of $X_\tau^2/\tau$, while $E_2$ bases her inference on the distribution and observed value of $\xi^2/T_\xi$. Show that these choices respect the Likelihood Principle.

Part II, 2012

Paper 1, Section II, 28K

Prove that, if $T$ is complete sufficient for $\Theta$, and $S$ is a function of $T$, then $S$ is the minimum variance unbiased estimator of $\mathbb{E}(S \mid \Theta)$.

When the parameter $\Theta$ takes a value $\theta > 0$, observables $(X_1, \ldots, X_n)$ arise independently from the exponential distribution $\mathcal{E}(\theta)$, having probability density function
$$p(x \mid \theta) = \theta e^{-\theta x} \qquad (x > 0).$$
Show that the family of distributions
$$\Theta \sim \mathrm{Gamma}(\alpha, \beta) \qquad (\alpha > 0,\ \beta > 0), \tag{1}$$
with probability density function
$$\pi(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, \theta^{\alpha - 1} e^{-\beta\theta} \qquad (\theta > 0),$$
is a conjugate family for Bayesian inference about $\Theta$ (where $\Gamma(\alpha)$ is the Gamma function).

Show that the expectation of $\Lambda := \log\Theta$, under prior distribution (1), is $\psi(\alpha) - \log\beta$, where $\psi(\alpha) := (d/d\alpha)\log\Gamma(\alpha)$. What is the prior variance of $\Lambda$? Deduce the posterior expectation and variance of $\Lambda$, given $(X_1, \ldots, X_n)$.

Let $\tilde{\Lambda}$ denote the limiting form of the posterior expectation of $\Lambda$ as $\alpha, \beta \downarrow 0$. Show that $\tilde{\Lambda}$ is the minimum variance unbiased estimator of $\Lambda$. What is its variance?

Part II, 2012

Paper 1, Section II, 28K

Define admissible, Bayes and minimax decision rules.

A random vector $X = (X_1, X_2, X_3)^T$ has independent components, where $X_i$ has the normal distribution $N(\theta_i, 1)$ when the parameter vector $\Theta$ takes the value $\theta = (\theta_1, \theta_2, \theta_3)^T$. It is required to estimate $\Theta$ by a point $a \in \mathbb{R}^3$, with loss function $L(\theta, a) = \|a - \theta\|^2$. What is the risk function of the maximum-likelihood estimator $\hat{\Theta} := X$? Show that $\hat{\Theta}$ is dominated by the estimator $\tilde{\Theta} := (1 - \|X\|^{-2})X$.

Paper 2, Section II, 28K

Random variables $X_1, \ldots, X_n$ are independent and identically distributed from the normal distribution with unknown mean $M$ and unknown precision (inverse variance) $H$. Show that the likelihood function, for data $X_1 = x_1, \ldots, X_n = x_n$, is
$$L_n(\mu, h) \propto h^{n/2} \exp\Big(-\tfrac{1}{2}h\big\{n(\bar{x} - \mu)^2 + S\big\}\Big),$$
where $\bar{x} := n^{-1}\sum_i x_i$ and $S := \sum_i (x_i - \bar{x})^2$.

A bivariate prior distribution for $(M, H)$ is specified, in terms of hyperparameters $(\alpha_0, \beta_0, m_0, \lambda_0)$, as follows. The marginal distribution of $H$ is $\Gamma(\alpha_0, \beta_0)$, with density
$$\pi(h) \propto h^{\alpha_0 - 1} e^{-\beta_0 h} \qquad (h > 0),$$
and the conditional distribution of $M$, given $H = h$, is normal with mean $m_0$ and precision $\lambda_0 h$.

Show that the conditional prior distribution of $H$, given $M = \mu$, is
$$H \mid M = \mu \sim \Gamma\Big(\alpha_0 + \tfrac{1}{2},\ \beta_0 + \tfrac{1}{2}\lambda_0(\mu - m_0)^2\Big).$$

Show that the posterior joint distribution of $(M, H)$, given $X_1 = x_1, \ldots, X_n = x_n$, has the same form as the prior, with updated hyperparameters $(\alpha_n, \beta_n, m_n, \lambda_n)$ which you should express in terms of the prior hyperparameters and the data.

[You may use the identity
$$p(t - a)^2 + q(t - b)^2 = (t - \delta)^2 + pq(a - b)^2,$$
where $p + q = 1$ and $\delta = pa + qb$.]

Explain how you could implement Gibbs sampling to generate a random sample from the posterior joint distribution.

Part II, 2011
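The final part of Paper 2 can likewise be illustrated numerically. The sketch below alternates between the two full conditionals of the Normal–Gamma posterior; the conditional formulas in the comments are the editor's conjugate-algebra working rather than quotations from the paper, and the hyperparameters and data are made-up values.

```python
import numpy as np

# Gibbs sampler for (M, H) in the normal model with a Normal-Gamma prior.
rng = np.random.default_rng(8)
alpha0, beta0, m0, lam0 = 2.0, 1.0, 0.0, 1.0
x = rng.normal(loc=1.5, scale=2.0, size=50)
n, x_bar, S = x.size, x.mean(), np.sum((x - x.mean()) ** 2)

mu, h = 0.0, 1.0
draws = []
for sweep in range(10_000):
    # M | H = h, data: normal with mean (lam0*m0 + n*x_bar)/(lam0 + n) and precision h*(lam0 + n)
    prec = h * (lam0 + n)
    mu = rng.normal((lam0 * m0 + n * x_bar) / (lam0 + n), np.sqrt(1 / prec))
    # H | M = mu, data: Gamma(alpha0 + (n+1)/2, beta0 + lam0*(mu-m0)^2/2 + (n*(x_bar-mu)^2 + S)/2)
    shape = alpha0 + (n + 1) / 2
    rate = beta0 + 0.5 * lam0 * (mu - m0) ** 2 + 0.5 * (n * (x_bar - mu) ** 2 + S)
    h = rng.gamma(shape, 1.0 / rate)          # numpy's gamma is parameterised by shape and scale
    draws.append((mu, h))

mu_s, h_s = np.array(draws[2000:]).T
print(mu_s.mean(), (1 / np.sqrt(h_s)).mean())  # posterior means of M and of the standard deviation
```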

Paper 3, Section II, 27K

Random variables $X_1, X_2, \ldots$ are independent and identically distributed from the exponential distribution $\mathcal{E}(\theta)$, with density function
$$p_X(x \mid \theta) = \theta e^{-\theta x} \qquad (x > 0),$$
when the parameter $\Theta$ takes value $\theta > 0$. The following experiment is performed. First $X_1$ is observed. Thereafter, if $X_1 = x_1, \ldots, X_i = x_i$ have been observed ($i \geq 1$), a coin having probability $\alpha(x_i)$ of landing heads is tossed, where $\alpha : \mathbb{R} \to (0, 1)$ is a known function and the coin toss is independent of the $X$'s and previous tosses. If it lands heads, no further observations are made; if tails, $X_{i+1}$ is observed.

Let $N$ be the total number of $X$'s observed, and $\mathbf{X} := (X_1, \ldots, X_N)$. Write down the likelihood function for $\Theta$ based on data $\mathbf{X} = (x_1, \ldots, x_n)$, and identify a minimal sufficient statistic. What does the likelihood principle have to say about inference from this experiment?

Now consider the experiment that only records $Y := X_N$. Show that the density function of $Y$ has the form
$$p_Y(y \mid \theta) = \exp\{a(y) - k(\theta) - \theta y\}.$$
Assuming the function $a(\cdot)$ is twice differentiable and that both $p_Y(y \mid \theta)$ and $\partial p_Y(y \mid \theta)/\partial y$ vanish at $0$ and $\infty$, show that $a'(Y)$ is an unbiased estimator of $\Theta$, and find its variance.

Stating clearly any general results you use, deduce that
$$-k''(\theta)\, \mathbb{E}_\theta\{a''(Y)\} \geq 1.$$

Part II, 2011

Paper 4, Section II, 27K

What does it mean to say that a $(1 \times p)$ random vector $\xi$ has a multivariate normal distribution?

Suppose $\xi = (X, Y)$ has the bivariate normal distribution with mean vector $\mu = (\mu_X, \mu_Y)$ and dispersion matrix
$$\Sigma = \begin{pmatrix} \sigma_{XX} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{YY} \end{pmatrix}.$$
Show that, with $\beta := \sigma_{XY}/\sigma_{XX}$, $Y - \beta X$ is independent of $X$, and thus that the conditional distribution of $Y$ given $X$ is normal with mean $\mu_Y + \beta(X - \mu_X)$ and variance $\sigma_{YY\cdot X} := \sigma_{YY} - \sigma_{XY}^2/\sigma_{XX}$.

For $i = 1, \ldots, n$, $\xi_i = (X_i, Y_i)$ are independent and identically distributed with the above distribution, where all elements of $\mu$ and $\Sigma$ are unknown. Let
$$S = \begin{pmatrix} S_{XX} & S_{XY} \\ S_{XY} & S_{YY} \end{pmatrix} := \sum_{i=1}^n (\xi_i - \bar{\xi})^T(\xi_i - \bar{\xi}),$$
where $\bar{\xi} := n^{-1}\sum_{i=1}^n \xi_i$.

The sample correlation coefficient is $r := S_{XY}/\sqrt{S_{XX}S_{YY}}$. Show that the distribution of $r$ depends only on the population correlation coefficient $\rho := \sigma_{XY}/\sqrt{\sigma_{XX}\sigma_{YY}}$.

Student's $t$-statistic (on $n - 2$ degrees of freedom) for testing the null hypothesis $H_0 : \beta = 0$ is
$$t := \frac{\hat{\beta}}{\sqrt{S_{YY\cdot X}/\{(n - 2)S_{XX}\}}},$$
where $\hat{\beta} := S_{XY}/S_{XX}$ and $S_{YY\cdot X} := S_{YY} - S_{XY}^2/S_{XX}$. Its density when $H_0$ is true is
$$p(t) = C\left(1 + \frac{t^2}{n - 2}\right)^{-\frac{1}{2}(n - 1)},$$
where $C$ is a constant that need not be specified.

Express $t$ in terms of $r$, and hence derive the density of $r$ when $\rho = 0$.

How could you use the sample correlation $r$ to test the hypothesis $\rho = 0$?

Part II, 2011

Paper 1, Section II, 28J

The distribution of a random variable $X$ is obtained from the binomial distribution $B(n; \Pi)$ by conditioning on $X > 0$; here $\Pi \in (0, 1)$ is an unknown probability parameter and $n$ is known. Show that the distributions of $X$ form an exponential family and identify the natural sufficient statistic $T$, natural parameter $\Phi$, and cumulant function $k(\phi)$. Using general properties of the cumulant function, compute the mean and variance of $X$ when $\Pi = \pi$. Write down an equation for the maximum likelihood estimate $\hat{\Pi}$ of $\Pi$ and explain why, when $\Pi = \pi$, the distribution of $\hat{\Pi}$ is approximately normal $N(\pi, \pi(1 - \pi)/n)$ for large $n$.

Suppose we observe $X = 1$. It is suggested that, since the condition $X > 0$ is then automatically satisfied, general principles of inference require that the inference to be drawn should be the same as if the distribution of $X$ had been $B(n; \Pi)$ and we had observed $X = 1$. Comment briefly on this suggestion.

Paper 2, Section II, 28J

Define the Kolmogorov–Smirnov statistic for testing the null hypothesis that real random variables $X_1, \ldots, X_n$ are independently and identically distributed with specified continuous, strictly increasing distribution function $F$, and show that its null distribution does not depend on $F$.

A composite hypothesis $H_0$ specifies that, when the unknown positive parameter $\Theta$ takes value $\theta$, the random variables $X_1, \ldots, X_n$ arise independently from the uniform distribution $U[0, \theta]$. Letting $J := \arg\max_{1 \leq i \leq n} X_i$, show that, under $H_0$, the statistic $(J, X_J)$ is sufficient for $\Theta$. Show further that, given $\{J = j, X_j = \xi\}$, the random variables $(X_i : i \neq j)$ are independent and have the $U[0, \xi]$ distribution. How might you apply the Kolmogorov–Smirnov test to test the hypothesis $H_0$?

Part II, 2010
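For the first part of Paper 2, the Kolmogorov–Smirnov statistic is straightforward to compute. The sketch below (null distribution and sample size are illustrative assumptions) evaluates $D_n = \sup_t |F_n(t) - F(t)|$ at the order statistics, where the supremum of the discrepancy is attained.

```python
import numpy as np

# Kolmogorov-Smirnov statistic against a fully specified continuous null F.
rng = np.random.default_rng(9)
n = 100
x = np.sort(rng.uniform(size=n))      # simulate under H_0: F = U[0,1]
F = x                                 # F(x_(i)) for the U[0,1] null
i = np.arange(1, n + 1)
D_n = np.max(np.maximum(i / n - F, F - (i - 1) / n))
print(D_n, D_n * np.sqrt(n))          # sqrt(n) * D_n has the distribution-free Kolmogorov limit
```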

Paper 3, Section II, 27J

Define the normal and extensive form solutions of a Bayesian statistical decision problem involving parameter $\Theta$, random variable $X$, and loss function $L(\theta, a)$. How are they related? Let $R_0 = R_0(\Pi)$ be the Bayes loss of the optimal act when $\Theta \sim \Pi$ and no data can be observed. Express the Bayes risk $R_1$ of the optimal statistical decision rule in terms of $R_0$ and the joint distribution of $(\Theta, X)$.

The real parameter $\Theta$ has distribution $\Pi$, having probability density function $\pi(\cdot)$. Consider the problem of specifying a set $S \subseteq \mathbb{R}$ such that the loss when $\Theta = \theta$ is
$$L(\theta, S) = c|S| - \mathbf{1}_S(\theta),$$
where $\mathbf{1}_S$ is the indicator function of $S$, where $c > 0$, and where $|S| = \int_S dx$. Show that the highest density region $S^* := \{\theta : \pi(\theta) \geq c\}$ supplies a Bayes act for this decision problem, and explain why $R_0(\Pi) \leq 0$.

For the case $\Theta \sim N(\mu, \sigma^2)$, find an expression for $R_0$ in terms of the standard normal distribution function $\Phi$.

Suppose now that $c = 0.5$, that $\Theta \sim N(0, 1)$ and that $X \mid \Theta \sim N(\Theta, 1/9)$. Show that $R_1 < R_0$.

Paper 4, Section II, 27J

Define completeness and bounded completeness of a statistic $T$ in a statistical experiment.

Random variables $X_1, X_2, X_3$ are generated as $X_i = \Theta^{1/2}Z + (1 - \Theta)^{1/2}Y_i$, where $Z, Y_1, Y_2, Y_3$ are independently standard normal $N(0, 1)$, and the parameter $\Theta$ takes values in $(0, 1)$. What is the joint distribution of $(X_1, X_2, X_3)$ when $\Theta = \theta$? Write down its density function, and show that a minimal sufficient statistic for $\Theta$ based on $(X_1, X_2, X_3)$ is
$$T = (T_1, T_2) := \Big(\sum_{i=1}^3 X_i^2,\ \big(\sum_{i=1}^3 X_i\big)^2\Big).$$
[Hint: You may use that if $I$ is the $n \times n$ identity matrix and $J$ is the $n \times n$ matrix all of whose entries are 1, then $aI + bJ$ has determinant $a^{n-1}(a + nb)$, and inverse $cI + dJ$ with $c = 1/a$, $d = -b/\{a(a + nb)\}$.]

What is $\mathbb{E}_\theta(T_1)$? Is $T$ complete for $\Theta$?

Let $S := \mathrm{Prob}(X_1 > 0 \mid T)$. Show that $\mathbb{E}_\theta(S)$ is a positive constant $c$ which does not depend on $\theta$, but that $S$ is not identically equal to $c$. Is $T$ boundedly complete for $\Theta$?

Part II, 2010

Paper 1, Section II, 28I

(i) Let $X_1, \ldots, X_n$ be independent and identically distributed random variables, having the exponential distribution $\mathcal{E}(\lambda)$ with density $p(x \mid \lambda) = \lambda\exp(-\lambda x)$ for $x, \lambda > 0$. Show that $T_n = \sum_{i=1}^n X_i$ is minimal sufficient and complete for $\lambda$. [You may assume uniqueness of Laplace transforms.]

(ii) For given $x > 0$, it is desired to estimate the quantity $\phi = \mathrm{Prob}(X_1 > x \mid \lambda)$. Compute the Fisher information for $\phi$.

(iii) State the Lehmann–Scheffé theorem. Show that the estimator $\tilde{\phi}_n$ of $\phi$ defined by
$$\tilde{\phi}_n = \begin{cases} 0, & \text{if } T_n < x, \\ \left(1 - \dfrac{x}{T_n}\right)^{n-1}, & \text{if } T_n \geq x, \end{cases}$$
is the minimum variance unbiased estimator of $\phi$ based on $(X_1, \ldots, X_n)$. Without doing any computations, state whether or not the variance of $\tilde{\phi}_n$ achieves the Cramér–Rao lower bound, justifying your answer briefly.

Let $k \leq n$. Show that $\mathbb{E}(\tilde{\phi}_k \mid T_n, \lambda) = \tilde{\phi}_n$.

Paper 2, Section II, 28I

Suppose that the random vector $X = (X_1, \ldots, X_n)$ has a distribution over $\mathbb{R}^n$ depending on a real parameter $\theta$, with everywhere positive density function $p(x \mid \theta)$. Define the maximum likelihood estimator $\hat{\theta}$, the score variable $U$, the observed information $\hat{j}$ and the expected (Fisher) information $I$ for the problem of estimating $\theta$ from $X$.

For the case where the $(X_i)$ are independent and identically distributed, show that, as $n \to \infty$, $I^{-1/2}U \to^d N(0, 1)$. [You may assume sufficient conditions to allow interchange of integration over the sample space and differentiation with respect to the parameter.] State the asymptotic distribution of $\hat{\theta}$.

The random vector $X = (X_1, \ldots, X_n)$ is generated according to the rule
$$X_{i+1} = \theta X_i + E_i,$$
where $X_0 = 1$ and the $(E_i)$ are independent and identically distributed from the standard normal distribution $N(0, 1)$. Write down the likelihood function for $\theta$ based on data $x = (x_1, \ldots, x_n)$, find $\hat{\theta}$ and $\hat{j}$, and show that the pair $(\hat{\theta}, \hat{j})$ forms a minimal sufficient statistic.

A Bayesian uses the improper prior density $\pi(\theta) \propto 1$. Show that, in the posterior, $S(\theta - \hat{\theta})$ (where $S$ is a statistic that you should identify) has the same distribution as $E_1$.

Part II, 2009

Paper 3, Section II, 27I

What is meant by an equaliser decision rule? What is meant by an extended Bayes rule? Show that a decision rule that is both an equaliser rule and extended Bayes is minimax.

Let $X_1, \ldots, X_n$ be independent and identically distributed random variables with the normal distribution $N(\theta, h^{-1})$, and let $k > 0$. It is desired to estimate $\theta$ with loss function $L(\theta, a) = 1 - \exp\{-\tfrac{1}{2}k(a - \theta)^2\}$.

Suppose the prior distribution is $\theta \sim N(m_0, h_0^{-1})$. Find the Bayes act and the Bayes loss posterior to observing $X_1 = x_1, \ldots, X_n = x_n$. What is the Bayes risk of the Bayes rule with respect to this prior distribution?

Show that the rule that estimates $\theta$ by $\bar{X} = n^{-1}\sum_{i=1}^n X_i$ is minimax.

Paper 4, Section II, 27I

Consider the double dichotomy, where the loss is 0 for a correct decision and 1 for an incorrect decision. Describe the form of a Bayes decision rule. Assuming the equivalence of normal and extensive form analyses, deduce the Neyman–Pearson lemma.

For a problem with random variable $X$ and real parameter $\theta$, define monotone likelihood ratio (MLR) and monotone test.

Suppose the problem has MLR in a real statistic $T = t(X)$. Let $\phi$ be a monotone test, with power function $\gamma(\cdot)$, and let $\phi'$ be any other test, with power function $\gamma'(\cdot)$. Show that if $\theta_1 > \theta_0$ and $\gamma(\theta_0) > \gamma'(\theta_0)$, then $\gamma(\theta_1) > \gamma'(\theta_1)$. Deduce that there exists $\theta^* \in [-\infty, \infty]$ such that $\gamma(\theta) \leq \gamma'(\theta)$ for $\theta < \theta^*$, and $\gamma(\theta) \geq \gamma'(\theta)$ for $\theta > \theta^*$.

For an arbitrary prior distribution $\Pi$ with density $\pi(\cdot)$, and an arbitrary value $\theta^*$, show that the posterior odds
$$\frac{\Pi(\theta > \theta^* \mid X = x)}{\Pi(\theta \leq \theta^* \mid X = x)}$$
is a non-decreasing function of $t(x)$.

Part II, 2009

/II/27I

An angler starts fishing at time 0. Fish bite in a Poisson Process of rate $\Lambda$ per hour, so that, if $\Lambda = \lambda$, the number $N_t$ of fish he catches in the first $t$ hours has the Poisson distribution $\mathcal{P}(\lambda t)$, while $T_n$, the time in hours until his $n$th bite, has the Gamma distribution $\Gamma(n, \lambda)$, with density function
$$p(t \mid \lambda) = \frac{\lambda^n}{(n-1)!}\, t^{n-1} e^{-\lambda t} \qquad (t > 0).$$

Bystander $B_1$ plans to watch for 3 hours, and to record the number $N_3$ of fish caught. Bystander $B_2$ plans to observe until the 10th bite, and to record $T_{10}$, the number of hours until this occurs.

For $B_1$, show that $\hat{\Lambda}_1 := N_3/3$ is an unbiased estimator of $\Lambda$ whose variance function achieves the Cramér–Rao lower bound.

Find an unbiased estimator of $\Lambda$ for $B_2$, of the form $\hat{\Lambda}_2 = k/T_{10}$. Does it achieve the Cramér–Rao lower bound? Is it minimum-variance-unbiased? Justify your answers.

In fact, the 10th fish bites after exactly 3 hours. For each of $B_1$ and $B_2$, write down the likelihood function for $\Lambda$ based on their observations. What does the Likelihood Principle have to say about the inferences to be drawn by $B_1$ and $B_2$, and why? Compute the estimates $\hat{\lambda}_1$ and $\hat{\lambda}_2$ produced by applying $\hat{\Lambda}_1$ and $\hat{\Lambda}_2$ to the observed data. Does the method of minimum-variance-unbiased estimation respect the Likelihood Principle?

Part II, 2008

/II/27I

Under hypothesis $H_i$ ($i = 0, 1$), a real-valued observable $X$, taking values in $\mathcal{X}$, has density function $p_i(\cdot)$. Define the Type I error $\alpha$ and the Type II error $\beta$ of a test $\phi : \mathcal{X} \to [0, 1]$ of the null hypothesis $H_0$ against the alternative hypothesis $H_1$. What are the size and power of the test in terms of $\alpha$ and $\beta$?

Show that, for $0 < c < \infty$, $\phi$ minimises $c\alpha + \beta$ among all possible tests if and only if it satisfies
$$p_1(x) > c\,p_0(x) \Rightarrow \phi(x) = 1,$$
$$p_1(x) < c\,p_0(x) \Rightarrow \phi(x) = 0.$$
What does this imply about the admissibility of such a test?

Given the value $\theta$ of a parameter variable $\Theta \in [0, 1)$, the observable $X$ has density function
$$p(x \mid \theta) = \frac{2(x - \theta)}{(1 - \theta)^2} \qquad (\theta \leq x \leq 1).$$
For fixed $\theta \in (0, 1)$, describe all the likelihood ratio tests of $H_0 : \Theta = 0$ against $H_\theta : \Theta = \theta$.

For fixed $k \in (0, 1)$, let $\phi_k$ be the test that rejects $H_0$ if and only if $X \geq k$. Is $\phi_k$ admissible as a test of $H_0$ against $H_\theta$ for every $\theta \in (0, 1)$? Is it uniformly most powerful for its size for testing $H_0$ against the composite hypothesis $H_1 : \Theta \in (0, 1)$? Is it admissible as a test of $H_0$ against $H_1$?

3/II/26I

Define the notion of exponential family (EF), and show that, for data arising as a random sample of size $n$ from an exponential family, there exists a sufficient statistic whose dimension stays bounded as $n \to \infty$.

The log-density of a normal distribution $N(\mu, v)$ can be expressed in the form
$$\log p(x \mid \phi) = \phi_1 x + \phi_2 x^2 - k(\phi)$$
where $\phi = (\phi_1, \phi_2)$ is the value of an unknown parameter $\Phi = (\Phi_1, \Phi_2)$. Determine the function $k$, and the natural parameter-space $F$. What is the mean-value parameter $H = (H_1, H_2)$ in terms of $\Phi$?

Determine the maximum likelihood estimator $\hat{\Phi}_1$ of $\Phi_1$ based on a random sample $(X_1, \ldots, X_n)$, and give its asymptotic distribution for $n \to \infty$.

How would these answers be affected if the variance of $X$ were known to have value $v_0$?

Part II, 2008

/II/27I

Define sufficient statistic, and state the factorisation criterion for determining whether a statistic is sufficient. Show that a Bayesian posterior distribution depends on the data only through the value of a sufficient statistic.

Given the value $\mu$ of an unknown parameter $M$, observables $X_1, \ldots, X_n$ are independent and identically distributed with distribution $N(\mu, 1)$. Show that the statistic $\bar{X} := n^{-1}\sum_{i=1}^n X_i$ is sufficient for $M$.

If the prior distribution is $M \sim N(0, \tau^2)$, determine the posterior distribution of $M$ and the predictive distribution of $\bar{X}$.

In fact, there are two hypotheses as to the value of $M$. Under hypothesis $H_0$, $M$ takes the known value 0; under $H_1$, $M$ is unknown, with prior distribution $N(0, \tau^2)$. Explain why the Bayes factor for choosing between $H_0$ and $H_1$ depends only on $\bar{X}$, and determine its value for data $X_1 = x_1, \ldots, X_n = x_n$.

The frequentist 5%-level test of $H_0$ against $H_1$ rejects $H_0$ when $|\bar{X}| \geq 1.96/\sqrt{n}$. What is the Bayes factor for the critical case $|\bar{x}| = 1.96/\sqrt{n}$? How does this behave as $n \to \infty$? Comment on the similarities or differences in behaviour between the frequentist and Bayesian tests.

Part II, 2008

/II/27I

Suppose that $X$ has density $f(\cdot \mid \theta)$ where $\theta \in \Theta$. What does it mean to say that statistic $T \equiv T(X)$ is sufficient for $\theta$?

Suppose that $\theta = (\psi, \lambda)$, where $\psi$ is the parameter of interest, and $\lambda$ is a nuisance parameter, and that the sufficient statistic $T$ has the form $T = (C, S)$. What does it mean to say that the statistic $S$ is ancillary? If it is, how (according to the conditionality principle) do we test hypotheses on $\psi$?

Assuming that the set of possible values for $X$ is discrete, show that $S$ is ancillary if and only if the density (probability mass function) $f(x \mid \psi, \lambda)$ factorises as
$$f(x \mid \psi, \lambda) = \varphi_0(x)\, \varphi_C(C(x), S(x), \psi)\, \varphi_S(S(x), \lambda) \tag{$*$}$$
for some functions $\varphi_0$, $\varphi_C$ and $\varphi_S$ with the properties
$$\sum_{x \in C^{-1}(c)\,\cap\, S^{-1}(s)} \varphi_0(x) = 1 = \sum_s \varphi_S(s, \lambda) = \sum_c \varphi_C(c, s, \psi)$$
for all $c$, $s$, $\psi$ and $\lambda$.

Suppose now that $X_1, \ldots, X_n$ are independent observations from a $\Gamma(a, b)$ distribution, with density
$$f(x \mid a, b) = (bx)^{a-1} e^{-bx}\, b\, \mathbf{1}_{\{x > 0\}}/\Gamma(a).$$
Assuming that the criterion $(*)$ holds also for observations which are not discrete, show that it is not possible to find $(C(X), S(X))$ sufficient for $(a, b)$ such that $S$ is ancillary when $b$ is regarded as a nuisance parameter, and $a$ is the parameter of interest.

Part II, 2007

/II/27I

(i) State Wilks' likelihood ratio test of the null hypothesis $H_0 : \theta \in \Theta_0$ against the alternative $H_1 : \theta \in \Theta_1$, where $\Theta_0 \subset \Theta_1$. Explain when this test may be used.

(ii) Independent identically-distributed observations $X_1, \ldots, X_n$ take values in the set $S = \{1, \ldots, K\}$, with common distribution which under the null hypothesis is of the form
$$P(X_1 = k \mid \theta) = f(k \mid \theta) \qquad (k \in S)$$
for some $\theta \in \Theta_0$, where $\Theta_0$ is an open subset of some Euclidean space $\mathbb{R}^d$, $d < K - 1$. Under the alternative hypothesis, the probability mass function of the $X_i$ is unrestricted.

Assuming sufficient regularity conditions on $f$ to guarantee the existence and uniqueness of a maximum-likelihood estimator $\hat{\theta}_n(X_1, \ldots, X_n)$ of $\theta$ for each $n$, show that for large $n$ the Wilks likelihood ratio test statistic is approximately of the form
$$\sum_{j=1}^K (N_j - n\hat{\pi}_j)^2/N_j,$$
where $N_j = \sum_{i=1}^n I_{\{X_i = j\}}$, and $\hat{\pi}_j = f(j \mid \hat{\theta}_n)$. What is the asymptotic distribution of this statistic?

3/II/26I

(i) In the context of decision theory, what is a Bayes rule with respect to a given loss function and prior? What is an extended Bayes rule?

Characterise the Bayes rule with respect to a given prior in terms of the posterior distribution for the parameter given the observation. When $\Theta = A = \mathbb{R}^d$ for some $d$, and the loss function is $L(\theta, a) = \|\theta - a\|^2$, what is the Bayes rule?

(ii) Suppose that $A = \Theta = \mathbb{R}$, with loss function $L(\theta, a) = (\theta - a)^2$ and suppose further that under $P_\theta$, $X \sim N(\theta, 1)$.

Supposing that a $N(0, \tau^{-1})$ prior is taken over $\theta$, compute the Bayes risk of the decision rule $d_\lambda(X) = \lambda X$. Find the posterior distribution of $\theta$ given $X$, and confirm that its mean is of the form $d_\lambda(X)$ for some value of $\lambda$ which you should identify. Hence show that the decision rule $d_1$ is an extended Bayes rule.

Part II, 2007

/II/27I

Assuming sufficient regularity conditions on the likelihood $f(x \mid \theta)$ for a univariate parameter $\theta \in \Theta$, establish the Cramér–Rao lower bound for the variance of an unbiased estimator of $\theta$.

If $\hat{\theta}(X)$ is an unbiased estimator of $\theta$ whose variance attains the Cramér–Rao lower bound for every value of $\theta \in \Theta$, show that the likelihood function is an exponential family.

Part II, 2007

/II/27J

(a) What is a loss function? What is a decision rule? What is the risk function of a decision rule? What is the Bayes risk of a decision rule with respect to a prior $\pi$?

(b) Let $\theta \mapsto R(\theta, d)$ denote the risk function of decision rule $d$, and let $r(\pi, d)$ denote the Bayes risk of decision rule $d$ with respect to prior $\pi$. Suppose that $d^*$ is a decision rule and $\pi_0$ is a prior over the parameter space $\Theta$ with the two properties
(i) $r(\pi_0, d^*) = \min_d r(\pi_0, d)$,
(ii) $\sup_\theta R(\theta, d^*) = r(\pi_0, d^*)$.
Prove that $d^*$ is minimax.

(c) Suppose now that $\Theta = A = \mathbb{R}$, where $A$ is the space of possible actions, and that the loss function is
$$L(\theta, a) = \exp(-\lambda a\theta),$$
where $\lambda$ is a positive constant. If the law of the observation $X$ given parameter $\theta$ is $N(\theta, \sigma^2)$, where $\sigma > 0$ is known, show (using (b) or otherwise) that the rule
$$d^*(x) = x/(\sigma^2\lambda)$$
is minimax.

2/II/27J

Let $\{f(\cdot \mid \theta) : \theta \in \Theta\}$ be a parametric family of densities for observation $X$. What does it mean to say that the statistic $T \equiv T(X)$ is sufficient for $\theta$? What does it mean to say that $T$ is minimal sufficient?

State the Rao–Blackwell theorem. State the Cramér–Rao lower bound for the variance of an unbiased estimator of a (scalar) parameter, taking care to specify any assumptions needed.

Let $X_1, \ldots, X_n$ be a sample from a $U(0, \theta)$ distribution, where the positive parameter $\theta$ is unknown. Find a minimal sufficient statistic $T$ for $\theta$. If $h(T)$ is an unbiased estimator for $\theta$, find the form of $h$, and deduce that this estimator is minimum-variance unbiased. Would it be possible to reach this conclusion using the Cramér–Rao lower bound?

Part II, 2006

/II/26J

Write an essay on the rôle of the Metropolis–Hastings algorithm in computational Bayesian inference on a parametric model. You may for simplicity assume that the parameter space is finite. Your essay should:

(a) explain what problem in Bayesian inference the Metropolis–Hastings algorithm is used to tackle;

(b) fully justify that the algorithm does indeed deliver the required information about the model;

(c) discuss any implementational issues that need care.

4/II/27J

(a) State the strong law of large numbers. State the central limit theorem.

(b) Assuming whatever regularity conditions you require, show that if $\hat{\theta}_n \equiv \hat{\theta}_n(X_1, \ldots, X_n)$ is the maximum-likelihood estimator of the unknown parameter $\theta$ based on an independent identically distributed sample of size $n$, then under $P_\theta$
$$\sqrt{n}(\hat{\theta}_n - \theta) \to N(0, J(\theta)^{-1})$$
in distribution as $n \to \infty$, where $J(\theta)$ is a matrix which you should identify. A rigorous derivation is not required.

(c) Suppose that $X_1, X_2, \ldots$ are independent binomial $\mathrm{Bin}(1, \theta)$ random variables. It is required to test $H_0 : \theta = \tfrac{1}{2}$ against the alternative $H_1 : \theta \in (0, 1)$. Show that the construction of a likelihood-ratio test leads us to the statistic
$$T_n = 2n\big\{\hat{\theta}_n\log\hat{\theta}_n + (1 - \hat{\theta}_n)\log(1 - \hat{\theta}_n) + \log 2\big\},$$
where $\hat{\theta}_n \equiv n^{-1}\sum_{k=1}^n X_k$. Stating clearly any result to which you appeal, for large $n$, what approximately is the distribution of $T_n$ under $H_0$?

Writing $\hat{\theta}_n = \tfrac{1}{2} + Z_n$, and assuming that $Z_n$ is small, show that $T_n \simeq 4nZ_n^2$. Using this and the central limit theorem, briefly justify the approximate distribution of $T_n$ given by asymptotic maximum-likelihood theory. What could you say if the assumption that $Z_n$ is small failed?

Part II, 2006
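The essay question above concerns the Metropolis–Hastings algorithm on a finite parameter space. The sketch below is a toy illustration (the binomial likelihood, uniform prior on a grid, and all numbers are made-up assumptions) of the algorithm targeting a posterior known only up to its normalising constant.

```python
import numpy as np

# Metropolis-Hastings on a finite grid of theta values, targeting an unnormalised posterior.
rng = np.random.default_rng(10)
grid = np.linspace(0.01, 0.99, 99)                 # finite parameter space
n, x = 30, 21                                      # toy data: x successes out of n, Bin(n, theta)
unnorm = grid**x * (1 - grid)**(n - x)             # uniform prior on the grid, so this is the posterior up to a constant

samples = []
j = 50                                             # current state: index into the grid
for t in range(50_000):
    k = (j + rng.choice([-1, 1])) % len(grid)      # symmetric random-walk proposal on the grid
    accept = min(1.0, unnorm[k] / unnorm[j])       # acceptance probability (proposal is symmetric)
    if rng.uniform() < accept:
        j = k
    samples.append(grid[j])

# MCMC posterior mean vs the continuous Beta(x+1, n-x+1) posterior mean, for rough comparison.
print(np.mean(samples[5000:]), (x + 1) / (n + 2))
```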

/II/27I

State Wilks' Theorem on the asymptotic distribution of likelihood-ratio test statistics.

Suppose that $X_1, \ldots, X_n$ are independent with common $N(\mu, \sigma^2)$ distribution, where the parameters $\mu$ and $\sigma$ are both unknown. Find the likelihood-ratio test statistic for testing $H_0 : \mu = 0$ against $H_1 : \mu$ unrestricted, and state its (approximate) distribution.

What is the form of the $t$-test of $H_0$ against $H_1$? Explain why for large $n$ the likelihood-ratio test and the $t$-test are nearly the same.

2/II/27I

(i) Suppose that $X$ is a multivariate normal vector with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\sigma^2 I$, where $\mu$ and $\sigma^2$ are both unknown, and $I$ denotes the $d \times d$ identity matrix. Suppose that $\Theta_0 \subset \Theta_1$ are linear subspaces of $\mathbb{R}^d$ of dimensions $d_0$ and $d_1$, where $d_0 < d_1 < d$. Let $P_i$ denote orthogonal projection onto $\Theta_i$ ($i = 0, 1$). Carefully derive the joint distribution of $(\|X - P_1X\|^2, \|P_1X - P_0X\|^2)$ under the hypothesis $H_0 : \mu \in \Theta_0$. How could you use this to make a test of $H_0$ against $H_1 : \mu \in \Theta_1$?

(ii) Suppose that $I$ students take $J$ exams, and that the mark $X_{ij}$ of student $i$ in exam $j$ is modelled as
$$X_{ij} = m + \alpha_i + \beta_j + \varepsilon_{ij}$$
where $\sum_i \alpha_i = 0 = \sum_j \beta_j$, the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$, and the parameters $m$, $\alpha$, $\beta$ and $\sigma$ are unknown. Construct a test of $H_0 : \beta_j = 0$ for all $j$ against $H_1 : \sum_j \beta_j^2 \neq 0$.

3/II/26I

In the context of decision theory, explain the meaning of the following italicised terms: loss function, decision rule, the risk of a decision rule, a Bayes rule with respect to prior $\pi$, and an admissible rule. Explain how a Bayes rule with respect to a prior $\pi$ can be constructed.

Suppose that $X_1, \ldots, X_n$ are independent with common $N(0, v)$ distribution, where $v > 0$ is supposed to have a prior density $f_0$. In a decision-theoretic approach to estimating $v$, we take a quadratic loss: $L(v, a) = (v - a)^2$. Write $X = (X_1, \ldots, X_n)$ and $|X| = (X_1^2 + \cdots + X_n^2)^{1/2}$.

By considering decision rules (estimators) of the form $\hat{v}(X) = \alpha|X|^2$, prove that if $\alpha \neq 1/(n + 2)$ then the estimator $\hat{v}(X) = \alpha|X|^2$ is not Bayes, for any choice of prior $f_0$.

By considering decision rules of the form $\hat{v}(X) = \alpha|X|^2 + \beta$, prove that if $\alpha \neq 1/n$ then the estimator $\hat{v}(X) = \alpha|X|^2$ is not Bayes, for any choice of prior $f_0$.

[You may use without proof the fact that, if $Z$ has a $N(0, 1)$ distribution, then $\mathbb{E}Z^4 = 3$.]

Part II, 2005


More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics Chapter Three. Point Estimation 3.4 Uniformly Minimum Variance Unbiased Estimator(UMVUE) Criteria for Best Estimators MSE Criterion Let F = {p(x; θ) : θ Θ} be a parametric distribution

More information

ST5215: Advanced Statistical Theory

ST5215: Advanced Statistical Theory Department of Statistics & Applied Probability Wednesday, October 5, 2011 Lecture 13: Basic elements and notions in decision theory Basic elements X : a sample from a population P P Decision: an action

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Hypothesis Testing Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution. Suppose the family of population distributions is indexed

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

DA Freedman Notes on the MLE Fall 2003

DA Freedman Notes on the MLE Fall 2003 DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar

More information

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf

Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part : Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section

More information

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your answer

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Bayesian statistics: Inference and decision theory

Bayesian statistics: Inference and decision theory Bayesian statistics: Inference and decision theory Patric Müller und Francesco Antognini Seminar über Statistik FS 28 3.3.28 Contents 1 Introduction and basic definitions 2 2 Bayes Method 4 3 Two optimalities:

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random
