Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach


Jae-Kwang Kim, Department of Statistics, Iowa State University

Outline
1. Introduction
2. Observed likelihood
3. Mean Score Approach
4. Observed Information

1. Introduction: Basic Setup (no missing data)
$\mathbf{y} = (y_1, y_2, \ldots, y_n)$ is a realization of a random sample from an infinite population with density $f(y)$.
Assume that the true density $f(y)$ belongs to a parametric family of densities $\mathcal{P} = \{f(y;\theta);\ \theta\in\Omega\}$ indexed by $\theta\in\Omega$. That is, there exists $\theta_0\in\Omega$ such that $f(y;\theta_0) = f(y)$ for all $y$.

Likelihood
Definitions for likelihood theory
The likelihood function of $\theta$ is defined as $L(\theta) = f(\mathbf{y};\theta)$, where $f(\mathbf{y};\theta)$ is the joint pdf of $\mathbf{y}$.
$\hat\theta$ is the maximum likelihood estimator (MLE) of $\theta_0$ if it satisfies $L(\hat\theta) = \max_{\theta\in\Omega} L(\theta)$.
A parametric family of densities $\mathcal{P} = \{f(y;\theta);\ \theta\in\Omega\}$ is identifiable if $f(y;\theta_1) = f(y;\theta_2)$ for all $y$ implies $\theta_1 = \theta_2$; equivalently, for every $\theta_1\neq\theta_2$, $f(y;\theta_1)\neq f(y;\theta_2)$ for some $y$.

Lemma 2.1: Properties of identifiable distributions
Lemma 2.1. If $\mathcal{P} = \{f(y;\theta);\ \theta\in\Omega\}$ is identifiable and $E\{|\ln f(y;\theta)|\} < \infty$ for all $\theta$, then
$M(\theta) = E_{\theta_0}\left[\ln\frac{f(y;\theta)}{f(y;\theta_0)}\right]$
has a unique maximum at $\theta_0$.
[Proof] Let $Z = f(y;\theta)/f(y;\theta_0)$ and fix $\theta\neq\theta_0$. By identifiability, $Z$ is not degenerate, so the strict version of Jensen's inequality gives $E_{\theta_0}\{\ln(Z)\} < \ln\{E_{\theta_0}(Z)\}$. Because $E_{\theta_0}(Z) = \int f(y;\theta)\,dy = 1$, we have $\ln\{E_{\theta_0}(Z)\} = 0$, and hence $M(\theta) < 0 = M(\theta_0)$.

Remark
1. In Lemma 2.1, $-M(\theta) = E_{\theta_0}[\ln\{f(y;\theta_0)/f(y;\theta)\}]$ is the Kullback-Leibler divergence of $f(y;\theta)$ from $f(y;\theta_0)$. It is often considered a measure of the distance between the two distributions.
2. Lemma 2.1 simply says that, under identifiability, $Q(\theta) = E_{\theta_0}\{\log f(Y;\theta)\}$ takes its (unique) maximum value at $\theta = \theta_0$.
3. Define $Q_n(\theta) = n^{-1}\sum_{i=1}^n \log f(y_i;\theta)$; then the MLE $\hat\theta$ is the maximizer of $Q_n(\theta)$. Since $Q_n(\theta)$ converges in probability to $Q(\theta)$, can we say that the maximizer of $Q_n(\theta)$ converges to the maximizer of $Q(\theta)$?

Theorem 2.1: (Weak) consistency of the MLE
Theorem 2.1. Let
$Q_n(\theta) = \frac{1}{n}\sum_{i=1}^n \log f(y_i;\theta)$
and let $\hat\theta$ be any solution of $Q_n(\hat\theta) = \max_{\theta\in\Omega} Q_n(\theta)$. Assume the following two conditions:
1. Uniform weak convergence: for some non-stochastic function $Q(\theta)$,
$\sup_{\theta\in\Omega}\,|Q_n(\theta) - Q(\theta)| \stackrel{p}{\longrightarrow} 0.$
2. Identification: $Q(\theta)$ is uniquely maximized at $\theta_0$.
Then $\hat\theta \stackrel{p}{\longrightarrow} \theta_0$.

Remark
1. Convergence in probability: $\hat\theta \stackrel{p}{\longrightarrow} \theta_0$ means that
$P\{\|\hat\theta - \theta_0\| > \epsilon\} \to 0 \quad\text{as } n\to\infty,$
for any $\epsilon > 0$.
2. If $\mathcal{P} = \{f(y;\theta);\ \theta\in\Omega\}$ is not identifiable, then $Q(\theta)$ may not have a unique maximum and $\hat\theta$ may not converge (in probability) to a single point.
3. If the true distribution $f(y)$ does not belong to the class $\mathcal{P} = \{f(y;\theta);\ \theta\in\Omega\}$, which point does $\hat\theta$ converge to?

Example: log-likelihood function for $N(\theta, 1)$
Under the assumption $y_i \sim N(\theta, 1)$, the log-likelihood is
$l_n(\theta) = \sum_{i=1}^n \log f(y_i;\theta).$
[Figure: plot of $l_n(\theta)$ against $\theta$.]
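The figure itself is not reproduced in this transcription. The following minimal sketch generates such a plot with simulated data; the sample size and true value of $\theta$ are assumed for illustration only.

```python
# Plot the log-likelihood l_n(theta) of an N(theta, 1) sample (assumed simulated data).
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)      # sample with (assumed) true theta = 1

theta_grid = np.linspace(-1, 3, 200)
loglik = np.array([stats.norm.logpdf(y, loc=th, scale=1.0).sum() for th in theta_grid])

plt.plot(theta_grid, loglik)
plt.axvline(y.mean(), linestyle="--")            # the maximizer is the sample mean
plt.xlabel(r"$\theta$")
plt.ylabel(r"$l_n(\theta)$")
plt.show()
```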

Other properties of the MLE
Asymptotic normality (Theorem 2.2)
Asymptotic optimality: the MLE achieves the Cramér-Rao lower bound
Wilks' theorem: $2\{l_n(\hat\theta) - l_n(\theta_0)\}\stackrel{d}{\longrightarrow}\chi^2_p$

1. Introduction: Fisher information
Definition
1. Score function: $S(\theta) = \frac{\partial}{\partial\theta}\log L(\theta)$
2. Fisher information (curvature of the log-likelihood): $I(\theta) = -\frac{\partial^2}{\partial\theta\,\partial\theta^T}\log L(\theta) = -\frac{\partial}{\partial\theta^T}S(\theta)$
3. Observed (Fisher) information: $I(\hat\theta_n)$, where $\hat\theta_n$ is the MLE.
4. Expected (Fisher) information: $\mathcal{I}(\theta) = E_\theta\{I(\theta)\}$
The observed information is always positive. The observed information applies to a single dataset.
The expected information is meaningful as a function of $\theta$ across the admissible values of $\theta$. The expected information is an average quantity over all possible datasets.
$I(\hat\theta) = \mathcal{I}(\hat\theta)$ for the exponential family (a small numeric illustration follows below).
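As a small numeric check (not part of the slides), the sketch below computes the observed and the expected information for a Bernoulli sample; the sample size and success probability are made-up values. At the MLE the two coincide, consistent with the exponential-family remark above.

```python
# Observed vs. expected Fisher information for a Bernoulli(p) sample (assumed values).
import numpy as np

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=200)
n, s = y.size, y.sum()
p_hat = s / n                                        # MLE of p

# log L(p) = s*log(p) + (n - s)*log(1 - p)
obs_info = s / p_hat**2 + (n - s) / (1 - p_hat)**2   # -d^2 log L / dp^2 evaluated at p_hat
exp_info = n / (p_hat * (1 - p_hat))                 # E_p{I(p)} evaluated at p_hat
print(obs_info, exp_info)                            # equal at the MLE
```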

Properties of the score function
Theorem 2.3: Properties of score functions
Under the regularity conditions allowing the exchange of the order of integration and differentiation,
$E_\theta\{S(\theta)\} = 0 \quad\text{and}\quad V_\theta\{S(\theta)\} = \mathcal{I}(\theta).$

Proof (sketch). Differentiating the identity $\int f(\mathbf{y};\theta)\,d\mathbf{y} = 1$ with respect to $\theta$ gives $\int S(\theta)f(\mathbf{y};\theta)\,d\mathbf{y} = 0$, that is, $E_\theta\{S(\theta)\} = 0$. Differentiating once more,
$\int\left\{\frac{\partial}{\partial\theta^T}S(\theta)\right\} f(\mathbf{y};\theta)\,d\mathbf{y} + \int S(\theta)S(\theta)^T f(\mathbf{y};\theta)\,d\mathbf{y} = 0,$
so $V_\theta\{S(\theta)\} = E_\theta\{S(\theta)S(\theta)^T\} = E_\theta\left\{-\frac{\partial}{\partial\theta^T}S(\theta)\right\} = \mathcal{I}(\theta)$.

1. Introduction: Fisher information
Remark
Since $\hat\theta \stackrel{p}{\longrightarrow} \theta_0$, we can apply a Taylor linearization to $S(\hat\theta) = 0$ to get
$\hat\theta - \theta_0 \doteq \{\mathcal{I}(\theta_0)\}^{-1}S(\theta_0).$
Here, we use the fact that $I(\theta) = -\partial S(\theta)/\partial\theta^T$ converges in probability to $\mathcal{I}(\theta)$. Thus, the (asymptotic) variance of the MLE is
$V(\hat\theta) \doteq \{\mathcal{I}(\theta_0)\}^{-1}V\{S(\theta_0)\}\{\mathcal{I}(\theta_0)\}^{-1} = \{\mathcal{I}(\theta_0)\}^{-1},$
where the last equality follows from Theorem 2.3.

2. Observed likelihood

Basic Setup
1. Let $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ be a realization of a random sample from an infinite population with density $f(y;\theta_0)$.
2. Let $\delta_i$ be an indicator function defined by $\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ otherwise, and assume that $\Pr(\delta_i = 1\mid y_i) = \pi(y_i,\phi)$ for some (unknown) parameter $\phi$, where $\pi(\cdot)$ is a known function.
3. Thus, instead of observing $(\delta_i, y_i)$ directly, we observe $(\delta_i, y_{i,obs})$, where $y_{i,obs} = y_i$ if $\delta_i = 1$ and $y_{i,obs}$ is missing if $\delta_i = 0$.
4. What is the (marginal) density of $(y_{i,obs}, \delta_i)$?

Motivation (change-of-variable technique)
1. Suppose that $z$ is a random variable with density $f(z)$.
2. Instead of observing $z$ directly, we observe only $y = y(z)$, where the mapping $z\mapsto y(z)$ is known.
3. The density of $y$ is
$g(y) = \int_{R(y)} f(z)\,dz,$
where $R(y) = \{z:\ y(z) = y\}$.

Derivation
1. The original distribution of $z = (y,\delta)$ is $f(y,\delta) = f_1(y)\,f_2(\delta\mid y)$, where $f_1(y)$ is the density of $y$ and $f_2(\delta\mid y)$ is the conditional density of $\delta$ given $y$:
$f_2(\delta\mid y) = \{\pi(y)\}^{\delta}\{1-\pi(y)\}^{1-\delta}, \qquad \pi(y) = \Pr(\delta = 1\mid y).$
2. Instead of observing $z = (y,\delta)$ directly, we observe only $(y_{obs},\delta)$, where $y_{obs} = y_{obs}(y,\delta)$ and the mapping $(y,\delta)\mapsto y_{obs}$ is known.
3. The (marginal) density of $(y_{obs},\delta)$ is
$g(y_{obs},\delta) = \int_{R(y_{obs},\delta)} f(y,\delta)\,dy,$
where $R(y_{obs},\delta) = \{y:\ y_{obs}(y,\delta) = y_{obs}\}$.

Likelihood under missing data (= observed likelihood)
The observed likelihood is the likelihood obtained from the marginal density of $(y_{i,obs}, \delta_i)$, $i = 1, 2, \ldots, n$. Under the IID setup it can be written as
$L_{obs}(\theta,\phi) = \prod_{\delta_i=1}\left[f_1(y_i;\theta)\,f_2(\delta_i\mid y_i;\phi)\right]\times\prod_{\delta_i=0}\left[\int f_1(y_i;\theta)\,f_2(\delta_i\mid y_i;\phi)\,dy_i\right]$
$= \prod_{\delta_i=1}\left[f_1(y_i;\theta)\,\pi(y_i;\phi)\right]\times\prod_{\delta_i=0}\left[\int f_1(y;\theta)\{1-\pi(y;\phi)\}\,dy\right],$
where $\pi(y;\phi) = P(\delta=1\mid y;\phi)$.

Example 2.2 (Censored regression model, or Tobit model)
$z_i = x_i'\beta + \epsilon_i, \qquad \epsilon_i\sim N(0,\sigma^2),$
$y_i = z_i$ if $z_i \geq 0$ and $y_i = 0$ if $z_i < 0$.
The observed log-likelihood is
$l(\beta,\sigma^2) = -\frac{1}{2}\sum_{y_i>0}\left[\ln 2\pi + \ln\sigma^2 + \frac{(y_i - x_i'\beta)^2}{\sigma^2}\right] + \sum_{y_i=0}\ln\left[1-\Phi\left(\frac{x_i'\beta}{\sigma}\right)\right],$
where $\Phi(x)$ is the cdf of the standard normal distribution.
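A minimal sketch of maximizing this Tobit observed log-likelihood numerically, assuming simulated data; the design, true coefficients, and sample size below are illustrative choices, not quantities from the slides.

```python
# Tobit (Example 2.2): build the observed log-likelihood and maximize it numerically.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])    # intercept + one covariate (assumed design)
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
z = x @ beta_true + rng.normal(scale=sigma_true, size=n)
y = np.where(z >= 0, z, 0.0)                              # left-censoring at zero

def neg_loglik(par):
    beta, log_sigma = par[:-1], par[-1]
    sigma = np.exp(log_sigma)                             # keep sigma > 0
    mu = x @ beta
    obs = y > 0
    # uncensored part: normal density; censored part: log{1 - Phi(x'beta/sigma)} = log Phi(-x'beta/sigma)
    ll_obs = stats.norm.logpdf(y[obs], loc=mu[obs], scale=sigma).sum()
    ll_cen = stats.norm.logcdf(-mu[~obs] / sigma).sum()
    return -(ll_obs + ll_cen)

res = optimize.minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print("beta-hat:", res.x[:-1], " sigma-hat:", np.exp(res.x[-1]))
```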

Example 2.3
Let $t_1, t_2, \ldots, t_n$ be an IID sample from a distribution with density $f_\theta(t) = \theta e^{-\theta t}I(t>0)$. Instead of observing $t_i$, we observe $(y_i,\delta_i)$, where
$y_i = t_i$ if $\delta_i = 1$, $y_i = c$ if $\delta_i = 0$, and $\delta_i = 1$ if $t_i\leq c$, $\delta_i = 0$ if $t_i > c$,
where $c$ is a known censoring time. The observed likelihood for $\theta$ can be derived as
$L_{obs}(\theta) = \prod_{i=1}^n\{f_\theta(t_i)\}^{\delta_i}\{P(t_i>c)\}^{1-\delta_i} = \theta^{\sum_{i=1}^n\delta_i}\exp\left(-\theta\sum_{i=1}^n y_i\right).$
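A short simulation sketch of this censored exponential example; the values of $\theta$, $c$, and $n$ are assumptions for illustration only.

```python
# Example 2.3: simulate right-censored exponential data and compute the MLE.
import numpy as np

rng = np.random.default_rng(1)
theta_true, c, n = 2.0, 0.8, 10_000          # assumed values
t = rng.exponential(scale=1 / theta_true, size=n)
delta = (t <= c).astype(int)                 # 1 = uncensored, 0 = censored at c
y = np.where(delta == 1, t, c)

# closed-form MLE from the observed likelihood: theta-hat = (sum of delta_i) / (sum of y_i)
theta_hat = delta.sum() / y.sum()
print("theta-hat =", theta_hat)              # should be close to theta_true
```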

Multivariate extension
Basic Setup
Let $\mathbf{y} = (y_1,\ldots,y_p)$ be a $p$-dimensional random vector with probability density function $f(\mathbf{y};\theta)$.
Let $\delta_{ij}$ be the response indicator function of $y_{ij}$, with $\delta_{ij} = 1$ if $y_{ij}$ is observed and $\delta_{ij} = 0$ otherwise.
$\delta_i = (\delta_{i1},\ldots,\delta_{ip})$: $p$-dimensional random vector with density $P(\delta\mid y)$, assuming $P(\delta\mid y) = P(\delta\mid y;\phi)$ for some $\phi$.
Let $(y_{i,obs}, y_{i,mis})$ be the observed part and the missing part of $y_i$, respectively.
Let $R(y_{obs},\delta) = \{y:\ y_{obs}(y_i,\delta_i) = y_{i,obs},\ i = 1,\ldots,n\}$ be the set of all possible values of $y$ with the same realized value of $y_{obs}$ for given $\delta$, where $y_{obs}(y_i,\delta_i)$ is a function that gives the value of $y_{ij}$ for $\delta_{ij} = 1$.

Definition: Observed likelihood
Under the above setup, the observed likelihood of $(\theta,\phi)$ is
$L_{obs}(\theta,\phi) = \int_{R(y_{obs},\delta)} f(y;\theta)\,P(\delta\mid y;\phi)\,dy.$
Under the IID setup, the observed likelihood is
$L_{obs}(\theta,\phi) = \prod_{i=1}^n\left[\int f(y_i;\theta)\,P(\delta_i\mid y_i;\phi)\,dy_{i,mis}\right],$
where it is understood that, if $y_i = y_{i,obs}$ and $y_{i,mis}$ is empty, there is nothing to integrate out.
In the special case of scalar $y$, the observed likelihood is
$L_{obs}(\theta,\phi) = \prod_{\delta_i=1}\left[f(y_i;\theta)\,\pi(y_i;\phi)\right]\times\prod_{\delta_i=0}\left[\int f(y;\theta)\{1-\pi(y;\phi)\}\,dy\right],$
where $\pi(y;\phi) = P(\delta=1\mid y;\phi)$.

Missing At Random
Definition: Missing At Random (MAR)
$P(\delta\mid y)$ is the density of the conditional distribution of $\delta$ given $y$. Let $y_{obs} = y_{obs}(y,\delta)$, where $y_{i,obs} = y_i$ if $\delta_i = 1$ and $y_{i,obs}$ is missing if $\delta_i = 0$.
The response mechanism is MAR if
$P(\delta\mid y_1) = P(\delta\mid y_2) \quad\{\text{or } P(\delta\mid y) = P(\delta\mid y_{obs})\}$
for all $y_1$ and $y_2$ satisfying $y_{obs}(y_1,\delta) = y_{obs}(y_2,\delta)$.
MAR: the response mechanism $P(\delta\mid y)$ depends on $y$ only through $y_{obs}$.
Let $y = (y_{obs}, y_{mis})$. By Bayes' theorem,
$P(y_{mis}\mid y_{obs},\delta) = \frac{P(\delta\mid y_{mis}, y_{obs})}{P(\delta\mid y_{obs})}\,P(y_{mis}\mid y_{obs}).$
MAR: $P(y_{mis}\mid y_{obs},\delta) = P(y_{mis}\mid y_{obs})$. That is, $y_{mis}\perp\delta\mid y_{obs}$.
MAR: the conditional independence of $\delta$ and $y_{mis}$ given $y_{obs}$.

Remark
MCAR (Missing Completely At Random): $P(\delta\mid y)$ does not depend on $y$.
MAR (Missing At Random): $P(\delta\mid y) = P(\delta\mid y_{obs})$.
NMAR (Not Missing At Random): $P(\delta\mid y) \neq P(\delta\mid y_{obs})$.
Thus, MCAR is a special case of MAR.

Likelihood factorization theorem
Theorem 2.4 (Rubin, 1976). Let $P_\phi(\delta\mid y)$ be the joint density of $\delta$ given $y$ and $f_\theta(y)$ the joint density of $y$. Under the conditions that
1. the parameters $\theta$ and $\phi$ are distinct, and
2. the MAR condition holds,
the observed likelihood can be written as $L_{obs}(\theta,\phi) = L_1(\theta)\,L_2(\phi)$, and the MLE of $\theta$ can be obtained by maximizing $L_1(\theta)$. Thus, we do not have to specify a model for the response mechanism.
The response mechanism is called ignorable if the above likelihood factorization holds.

Example 2.4
Bivariate data $(x_i, y_i)$ with pdf $f(x,y) = f_1(y\mid x)\,f_2(x)$. $x_i$ is always observed and $y_i$ is subject to missingness.
Assume that the response status variable $\delta_i$ of $y_i$ satisfies
$P(\delta_i = 1\mid x_i, y_i) = \Lambda_1(\phi_0 + \phi_1 x_i + \phi_2 y_i)$
for some function $\Lambda_1(\cdot)$ of known form.
Let $\theta$ be the parameter of interest in the regression model $f_1(y\mid x;\theta)$. Let $\alpha$ be the parameter in the marginal distribution of $x$, denoted by $f_2(x_i;\alpha)$. Define $\Lambda_0(x) = 1 - \Lambda_1(x)$.
Three parameters:
$\theta$: parameter of interest
$\alpha$ and $\phi$: nuisance parameters

Example 2.4 (Cont'd)
Observed likelihood:
$L_{obs}(\theta,\alpha,\phi) = \prod_{\delta_i=1} f_1(y_i\mid x_i;\theta)\,f_2(x_i;\alpha)\,\Lambda_1(\phi_0+\phi_1 x_i+\phi_2 y_i)\times\prod_{\delta_i=0}\int f_1(y\mid x_i;\theta)\,f_2(x_i;\alpha)\,\Lambda_0(\phi_0+\phi_1 x_i+\phi_2 y)\,dy = L_1(\theta,\phi)\,L_2(\alpha),$
where $L_2(\alpha) = \prod_{i=1}^n f_2(x_i;\alpha)$.
Thus, we can safely ignore the marginal distribution of $x$ if $x$ is completely observed.

Example 2.4 (Cont'd)
If $\phi_2 = 0$, then MAR holds and $L_1(\theta,\phi) = L_{1a}(\theta)\,L_{1b}(\phi)$, where
$L_{1a}(\theta) = \prod_{\delta_i=1} f_1(y_i\mid x_i;\theta)$
and
$L_{1b}(\phi) = \prod_{\delta_i=1}\Lambda_1(\phi_0+\phi_1 x_i)\prod_{\delta_i=0}\Lambda_0(\phi_0+\phi_1 x_i).$
Thus, under MAR, the MLE of $\theta$ can be obtained by maximizing $L_{1a}(\theta)$, which is obtained by ignoring the missing part of the data.

Example 2.4 (Cont'd)
If $x_i$, rather than $y_i$, is subject to missingness, the observed likelihood becomes
$L_{obs}(\theta,\phi,\alpha) = \prod_{\delta_i=1} f_1(y_i\mid x_i;\theta)\,f_2(x_i;\alpha)\,\Lambda_1(\phi_0+\phi_1 x_i+\phi_2 y_i)\times\prod_{\delta_i=0}\int f_1(y_i\mid x;\theta)\,f_2(x;\alpha)\,\Lambda_0(\phi_0+\phi_1 x+\phi_2 y_i)\,dx.$
If $\phi_1 = 0$, then MAR holds and $L_{obs}(\theta,\alpha,\phi) = L_1(\theta,\alpha)\,L_2(\phi)$.
Although we are not interested in the marginal distribution of $x$, we still need to specify a model for the marginal distribution of $x$.

3. Mean Score Approach

3. Mean Score Approach
The observed likelihood is the marginal density of $(y_{obs},\delta)$:
$L_{obs}(\eta) = \int_{R(y_{obs},\delta)} f(y;\theta)\,P(\delta\mid y;\phi)\,d\mu(y) = \int f(y;\theta)\,P(\delta\mid y;\phi)\,d\mu(y_{mis}),$
where $y_{mis}$ is the missing part of $y$ and $\eta = (\theta,\phi)$.
Observed score equation: $S_{obs}(\eta) \equiv \frac{\partial}{\partial\eta}\log L_{obs}(\eta) = 0$.
Computing the observed score function can be computationally challenging because the observed likelihood is in integral form.

3. Mean Score Approach
Theorem 2.5: Mean Score Theorem (Fisher, 1922)
Under some regularity conditions, the observed score function equals the mean score function. That is, $S_{obs}(\eta) = \bar S(\eta)$, where
$\bar S(\eta) = E\{S_{com}(\eta)\mid y_{obs},\delta\}, \qquad S_{com}(\eta) = \frac{\partial}{\partial\eta}\log f(y,\delta;\eta), \qquad f(y,\delta;\eta) = f(y;\theta)\,P(\delta\mid y;\phi).$
The mean score function is computed by taking the conditional expectation of the complete-sample score function given the observed data.
The mean score function is easier to compute than the observed score function.

3. Mean Score Approach
Proof of Theorem 2.5
Since $L_{obs}(\eta) = f(y,\delta;\eta)/f(y,\delta\mid y_{obs},\delta;\eta)$, we have
$\frac{\partial}{\partial\eta}\ln L_{obs}(\eta) = \frac{\partial}{\partial\eta}\ln f(y,\delta;\eta) - \frac{\partial}{\partial\eta}\ln f(y,\delta\mid y_{obs},\delta;\eta).$
Taking the conditional expectation of the above equation over the conditional distribution of $(y,\delta)$ given $(y_{obs},\delta)$, we have
$\frac{\partial}{\partial\eta}\ln L_{obs}(\eta) = E\left\{\frac{\partial}{\partial\eta}\ln L_{obs}(\eta)\;\middle|\;y_{obs},\delta\right\} = E\{S_{com}(\eta)\mid y_{obs},\delta\} - E\left\{\frac{\partial}{\partial\eta}\ln f(y,\delta\mid y_{obs},\delta;\eta)\;\middle|\;y_{obs},\delta\right\}.$
Here, the first equality holds because $L_{obs}(\eta)$ is a function of $(y_{obs},\delta)$ only. The last term is equal to zero by Theorem 2.3, which states that the expected value of the score function is zero; the reference distribution in this case is the conditional distribution of $(y,\delta)$ given $(y_{obs},\delta)$.

Example 2.6
1. Suppose that the study variable $y$ follows a normal distribution with mean $x'\beta$ and variance $\sigma^2$. The score equations for $\beta$ and $\sigma^2$ under complete response are
$S_1(\beta,\sigma^2) = \sum_{i=1}^n (y_i - x_i'\beta)\,x_i/\sigma^2 = 0$
and
$S_2(\beta,\sigma^2) = -\frac{n}{2\sigma^2} + \sum_{i=1}^n \frac{(y_i - x_i'\beta)^2}{2\sigma^4} = 0.$
2. Assume that the $y_i$ are observed only for the first $r$ elements and that the MAR assumption holds. In this case, the mean score functions reduce to
$\bar S_1(\beta,\sigma^2) = \sum_{i=1}^r (y_i - x_i'\beta)\,x_i/\sigma^2$
and
$\bar S_2(\beta,\sigma^2) = -\frac{n}{2\sigma^2} + \sum_{i=1}^r \frac{(y_i - x_i'\beta)^2}{2\sigma^4} + \frac{n-r}{2\sigma^2}.$

Example 2.6 (Cont'd)
3. The maximum likelihood estimators obtained by solving the mean score equations are
$\hat\beta = \left(\sum_{i=1}^r x_i x_i'\right)^{-1}\sum_{i=1}^r x_i y_i \quad\text{and}\quad \hat\sigma^2 = \frac{1}{r}\sum_{i=1}^r (y_i - x_i'\hat\beta)^2.$
Thus, the resulting estimators can also be obtained by simply ignoring the missing part of the sample, which is consistent with the result in Example 2.4 (for $\phi_2 = 0$); a small sketch follows below.
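The following sketch reproduces these complete-case estimates on simulated data under an MCAR (hence MAR) response mechanism; all numeric values are made up for illustration.

```python
# Example 2.6: complete-case estimates of (beta, sigma^2) under MAR (assumed simulated data).
import numpy as np

rng = np.random.default_rng(2)
n, beta_true, sigma_true = 2000, np.array([1.0, 2.0]), 1.5
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x @ beta_true + rng.normal(scale=sigma_true, size=n)
delta = rng.binomial(1, 0.7, size=n).astype(bool)     # MCAR response indicator

xr, yr, r = x[delta], y[delta], delta.sum()
beta_hat = np.linalg.solve(xr.T @ xr, xr.T @ yr)      # solves sum_r x_i x_i' b = sum_r x_i y_i
sigma2_hat = np.mean((yr - xr @ beta_hat) ** 2)       # divides by r, as in the slide
print(beta_hat, sigma2_hat)
```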

Discussion of Example 2.6
We are interested in estimating $\theta$ in the conditional density $f(y\mid x;\theta)$. Under MAR, the observed likelihood for $\theta$ is
$L_{obs}(\theta) = \prod_{i=1}^r f(y_i\mid x_i;\theta)\prod_{i=r+1}^n\int f(y\mid x_i;\theta)\,dy = \prod_{i=1}^r f(y_i\mid x_i;\theta).$
The same conclusion follows from the mean score theorem. Under MAR, the mean score function is
$\bar S(\theta) = \sum_{i=1}^r S(\theta;x_i,y_i) + \sum_{i=r+1}^n E\{S(\theta;x_i,Y)\mid x_i\} = \sum_{i=1}^r S(\theta;x_i,y_i),$
where $S(\theta;x,y)$ is the score function for $\theta$ and the second equality follows from Theorem 2.3.

Example 2.5
1. Suppose that the study variable $y$ follows a Bernoulli distribution with probability of success $p_i$, where
$p_i = p_i(\beta) = \frac{\exp(x_i'\beta)}{1+\exp(x_i'\beta)}$
for some unknown parameter $\beta$, and $x_i$ is the vector of covariates in the logistic regression model for $y_i$. We assume that $1$ is in the column space of $x_i$.
2. Under complete response, the score function for $\beta$ is
$S_1(\beta) = \sum_{i=1}^n \{y_i - p_i(\beta)\}\,x_i.$

Example 2.5 (Cont'd)
3. Let $\delta_i$ be the response indicator function for $y_i$ with distribution Bernoulli$(\pi_i)$, where
$\pi_i = \frac{\exp(x_i'\phi_0 + y_i\phi_1)}{1+\exp(x_i'\phi_0 + y_i\phi_1)}.$
We assume that $x_i$ is always observed, but $y_i$ is missing if $\delta_i = 0$.
4. Under missing data, the mean score function for $\beta$ is
$\bar S_1(\beta,\phi) = \sum_{\delta_i=1}\{y_i - p_i(\beta)\}\,x_i + \sum_{\delta_i=0}\sum_{y=0}^1 w_i(y;\beta,\phi)\{y - p_i(\beta)\}\,x_i,$
where $w_i(y;\beta,\phi)$ is the conditional probability of $y_i = y$ given $x_i$ and $\delta_i = 0$:
$w_i(y;\beta,\phi) = \frac{P_\beta(y_i=y\mid x_i)\,P_\phi(\delta_i=0\mid y_i=y,x_i)}{\sum_{z=0}^1 P_\beta(y_i=z\mid x_i)\,P_\phi(\delta_i=0\mid y_i=z,x_i)}.$
Thus, $\bar S_1(\beta,\phi)$ is also a function of $\phi$.

Example 2.5 (Cont'd)
5. If the response mechanism is MAR so that $\phi_1 = 0$, then
$w_i(y;\beta,\phi) = \frac{P_\beta(y_i=y\mid x_i)}{\sum_{z=0}^1 P_\beta(y_i=z\mid x_i)} = P_\beta(y_i=y\mid x_i)$
and so
$\bar S_1(\beta,\phi) = \sum_{\delta_i=1}\{y_i - p_i(\beta)\}\,x_i,$
which no longer depends on $\phi$.
6. If MAR does not hold, then $(\hat\beta,\hat\phi)$ can be obtained by solving $\bar S_1(\beta,\phi) = 0$ and $\bar S_2(\beta,\phi) = 0$ jointly, where
$\bar S_2(\beta,\phi) = \sum_{\delta_i=1}\{\delta_i - \pi(\phi;x_i,y_i)\}(x_i',y_i)' + \sum_{\delta_i=0}\sum_{y=0}^1 w_i(y;\beta,\phi)\{\delta_i - \pi(\phi;x_i,y)\}(x_i',y)'.$
A sketch of computing the weights $w_i$ appears below.
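The sketch below evaluates the weights $w_i(y;\beta,\phi)$ and one nonrespondent's contribution to the mean score $\bar S_1(\beta,\phi)$; the covariate vector and parameter values are made-up illustrations, not values from the slides.

```python
# Example 2.5: weights w_i(y; beta, phi) for a single unit with missing y (delta_i = 0).
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

x_i = np.array([1.0, 0.3])                 # covariate vector (with intercept), assumed
beta = np.array([0.2, -0.5])               # outcome-model parameter, assumed
phi0, phi1 = np.array([0.1, 0.4]), -0.8    # response-model parameters (phi1 != 0: not MAR)

p_i = expit(x_i @ beta)                                    # P_beta(y_i = 1 | x_i)
p_y = np.array([1.0 - p_i, p_i])                           # P_beta(y_i = y | x_i), y = 0, 1
pi_y = expit(x_i @ phi0 + np.array([0.0, 1.0]) * phi1)     # P_phi(delta_i = 1 | x_i, y), y = 0, 1
num = p_y * (1.0 - pi_y)                                   # proportional to P(y_i = y | x_i, delta_i = 0)
w = num / num.sum()                                        # weights w_i(y; beta, phi)

# contribution of this nonrespondent to the mean score for beta
score_contrib = sum(w[y] * (y - p_i) * x_i for y in (0, 1))
print(w, score_contrib)
```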

Discussion of Example 2.5
We may not have a unique solution to $\bar S(\eta) = 0$, where $\bar S(\eta) = \{\bar S_1(\beta,\phi)', \bar S_2(\beta,\phi)'\}'$, when MAR does not hold, because of the non-identifiability problem associated with nonignorable missingness.
To avoid this problem, a reduced model is often used for the response mechanism:
$\Pr(\delta = 1\mid x, y) = \Pr(\delta = 1\mid u, y),$
where $x = (u, z)$. The reduced response model involves a smaller set of parameters, so the identification problem can be resolved. (More discussion will be given in Chapter 6.)
Computing the solution to $\bar S(\eta) = 0$ is also difficult. The EM algorithm, which will be discussed in Chapter 3, is a useful computational tool.

4. Observed Information

4. Observed information
Definition
1. Observed score function: $S_{obs}(\eta) = \frac{\partial}{\partial\eta}\log L_{obs}(\eta)$
2. Fisher information from the observed likelihood: $I_{obs}(\eta) = -\frac{\partial^2}{\partial\eta\,\partial\eta^T}\log L_{obs}(\eta)$
3. Expected (Fisher) information from the observed likelihood: $\mathcal{I}_{obs}(\eta) = E_\eta\{I_{obs}(\eta)\}$
Theorem 2.6. Under regularity conditions,
$E\{S_{obs}(\eta)\} = 0 \quad\text{and}\quad V\{S_{obs}(\eta)\} = \mathcal{I}_{obs}(\eta),$
where $\mathcal{I}_{obs}(\eta) = E_\eta\{I_{obs}(\eta)\}$ is the expected information from the observed likelihood.

4. Observed information
Under missing data, the MLE $\hat\eta$ is the solution to $\bar S(\eta) = 0$. Under some regularity conditions, $\hat\eta$ converges in probability to $\eta_0$ and has asymptotic variance $\{\mathcal{I}_{obs}(\eta_0)\}^{-1}$, with
$\mathcal{I}_{obs}(\eta) = E\left\{-\frac{\partial}{\partial\eta^T}S_{obs}(\eta)\right\} = E\{S_{obs}(\eta)^{\otimes 2}\} = E\{\bar S(\eta)^{\otimes 2}\}, \qquad B^{\otimes 2} = BB^T.$
For variance estimation of $\hat\eta$, one may use $\{\mathcal{I}_{obs}(\hat\eta)\}^{-1}$ or $\{I_{obs}(\hat\eta)\}^{-1}$.
In general, $I_{obs}(\hat\eta)$ is preferred to $\mathcal{I}_{obs}(\hat\eta)$ for variance estimation of $\hat\eta$.

Return to Example 2.3
Observed log-likelihood:
$\ln L_{obs}(\theta) = \sum_{i=1}^n \delta_i\log\theta - \theta\sum_{i=1}^n y_i$
MLE for $\theta$: $\hat\theta = \sum_{i=1}^n \delta_i \Big/ \sum_{i=1}^n y_i$
Fisher information: $I_{obs}(\theta) = \sum_{i=1}^n \delta_i/\theta^2$
Expected information: $\mathcal{I}_{obs}(\theta) = \sum_{i=1}^n (1 - e^{-\theta c})/\theta^2 = n(1 - e^{-\theta c})/\theta^2$
Which one do you prefer?
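Continuing the earlier simulation sketch for this example (same assumed values of $\theta$, $c$, and $n$), the two information quantities, and the variance estimates they imply, can be compared numerically:

```python
# Example 2.3: observed vs. expected information at the MLE (assumed simulated data as before).
import numpy as np

rng = np.random.default_rng(1)
theta_true, c, n = 2.0, 0.8, 10_000
t = rng.exponential(scale=1 / theta_true, size=n)
delta, y = (t <= c).astype(int), np.minimum(t, c)
theta_hat = delta.sum() / y.sum()

info_obs = delta.sum() / theta_hat**2                       # I_obs(theta-hat) = sum(delta)/theta^2
info_exp = n * (1 - np.exp(-theta_hat * c)) / theta_hat**2  # expected information at theta-hat
print(1 / info_obs, 1 / info_exp)   # the two variance estimates for theta-hat; close but not identical
```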

4. Observed information
Motivation
$L_{com}(\eta) = f(y,\delta;\eta)$: the complete-sample likelihood with no missing data.
Fisher information associated with $L_{com}(\eta)$:
$I_{com}(\eta) = -\frac{\partial}{\partial\eta^T}S_{com}(\eta) = -\frac{\partial^2}{\partial\eta\,\partial\eta^T}\log L_{com}(\eta)$
$L_{obs}(\eta)$: the observed likelihood.
Fisher information associated with $L_{obs}(\eta)$:
$I_{obs}(\eta) = -\frac{\partial}{\partial\eta^T}S_{obs}(\eta) = -\frac{\partial^2}{\partial\eta\,\partial\eta^T}\log L_{obs}(\eta)$
How can we express $I_{obs}(\eta)$ in terms of $I_{com}(\eta)$ and $S_{com}(\eta)$?

Louis' formula
Theorem 2.7 (Louis, 1982; Oakes, 1999). Under regularity conditions allowing the exchange of the order of integration and differentiation,
$I_{obs}(\eta) = E\{I_{com}(\eta)\mid y_{obs},\delta\} - \left[E\{S_{com}(\eta)^{\otimes 2}\mid y_{obs},\delta\} - \bar S(\eta)^{\otimes 2}\right] = E\{I_{com}(\eta)\mid y_{obs},\delta\} - V\{S_{com}(\eta)\mid y_{obs},\delta\},$
where $\bar S(\eta) = E\{S_{com}(\eta)\mid y_{obs},\delta\}$.

Proof of Theorem 2.7
By Theorem 2.5, the observed information associated with $L_{obs}(\eta)$ can be expressed as $I_{obs}(\eta) = -\frac{\partial}{\partial\eta^T}\bar S(\eta)$, where $\bar S(\eta) = E\{S_{com}(\eta)\mid y_{obs},\delta;\eta\}$. Thus, we have
$-\frac{\partial}{\partial\eta^T}\bar S(\eta) = -\frac{\partial}{\partial\eta^T}\int S_{com}(\eta;y)\,f(y,\delta\mid y_{obs},\delta;\eta)\,d\mu(y)$
$= -\int\left\{\frac{\partial}{\partial\eta^T}S_{com}(\eta;y)\right\} f(y,\delta\mid y_{obs},\delta;\eta)\,d\mu(y) - \int S_{com}(\eta;y)\left\{\frac{\partial}{\partial\eta^T}f(y,\delta\mid y_{obs},\delta;\eta)\right\}d\mu(y)$
$= E\left\{-\frac{\partial}{\partial\eta^T}S_{com}(\eta)\;\middle|\;y_{obs},\delta\right\} - \int S_{com}(\eta;y)\left\{\frac{\partial}{\partial\eta^T}\log f(y,\delta\mid y_{obs},\delta;\eta)\right\} f(y,\delta\mid y_{obs},\delta;\eta)\,d\mu(y).$
The first term equals $E\{I_{com}(\eta)\mid y_{obs},\delta\}$, and the second integral equals
$E\{S_{com}(\eta)S_{mis}(\eta)^T\mid y_{obs},\delta\} = E[\{\bar S(\eta) + S_{mis}(\eta)\}S_{mis}(\eta)^T\mid y_{obs},\delta] = E\{S_{mis}(\eta)S_{mis}(\eta)^T\mid y_{obs},\delta\} = V\{S_{com}(\eta)\mid y_{obs},\delta\},$
because $E\{\bar S(\eta)S_{mis}(\eta)^T\mid y_{obs},\delta\} = 0$.
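As a quick check (not on the original slides), applying Theorem 2.7 to the censored exponential model of Example 2.3, with $r = \sum_{i=1}^n\delta_i$, recovers the observed information computed directly earlier:

\begin{align*}
S_{\mathrm{com}}(\theta) &= \frac{n}{\theta} - \sum_{i=1}^n t_i,
\qquad I_{\mathrm{com}}(\theta) = \frac{n}{\theta^2},\\
V\{S_{\mathrm{com}}(\theta)\mid y_{\mathrm{obs}},\delta\}
  &= \sum_{\delta_i=0} V(t_i \mid t_i > c) = \frac{n-r}{\theta^2}
  \quad\text{(memoryless property: } t_i - c \mid t_i>c \sim \mathrm{Exp}(\theta)\text{)},\\
I_{\mathrm{obs}}(\theta)
  &= E\{I_{\mathrm{com}}(\theta)\mid y_{\mathrm{obs}},\delta\} - V\{S_{\mathrm{com}}(\theta)\mid y_{\mathrm{obs}},\delta\}
   = \frac{n}{\theta^2} - \frac{n-r}{\theta^2} = \frac{\sum_{i=1}^n \delta_i}{\theta^2},
\end{align*}

which agrees with $I_{obs}(\theta) = \sum_{i=1}^n\delta_i/\theta^2$ obtained directly on the earlier slide.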

Missing Information Principle
$S_{mis}(\eta)$: the score function associated with the conditional density $f(y,\delta\mid y_{obs},\delta)$.
Expected missing information: $\mathcal{I}_{mis}(\eta) = E\left\{-\frac{\partial}{\partial\eta^T}S_{mis}(\eta)\right\}$, satisfying $\mathcal{I}_{mis}(\eta) = E\{S_{mis}(\eta)^{\otimes 2}\}$.
Missing information principle (Orchard and Woodbury, 1972):
$\mathcal{I}_{mis}(\eta) = \mathcal{I}_{com}(\eta) - \mathcal{I}_{obs}(\eta),$
where $\mathcal{I}_{com}(\eta) = E\left\{-\frac{\partial}{\partial\eta^T}S_{com}(\eta)\right\}$ is the expected information from the complete-sample likelihood.
An alternative expression of the missing information principle is
$V\{S_{mis}(\eta)\} = V\{S_{com}(\eta)\} - V\{\bar S(\eta)\}.$
Note that $V\{S_{com}(\eta)\} = \mathcal{I}_{com}(\eta)$ and $V\{S_{obs}(\eta)\} = \mathcal{I}_{obs}(\eta)$.

Example 2.7
1. Consider the following bivariate normal distribution:
$\begin{pmatrix} y_{1i} \\ y_{2i} \end{pmatrix} \sim N\left[\begin{pmatrix}\mu_1\\ \mu_2\end{pmatrix},\begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}\right], \qquad i = 1,2,\ldots,n.$
Assume for simplicity that $\sigma_{11}$, $\sigma_{12}$, and $\sigma_{22}$ are known constants and that $\mu = (\mu_1,\mu_2)$ is the parameter of interest.
2. The complete-sample score function for $\mu$ is
$S_{com}(\mu) = \sum_{i=1}^n S_{com}^{(i)}(\mu) = \sum_{i=1}^n\begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}^{-1}\begin{pmatrix}y_{1i}-\mu_1\\ y_{2i}-\mu_2\end{pmatrix}.$
The information matrix of $\mu$ based on the complete sample is
$I_{com}(\mu) = n\begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}^{-1}.$

Example 2.7 (Cont'd)
3. Suppose that there are some missing values in $y_{1i}$ and $y_{2i}$, and that the original sample is partitioned into four sets:
H = both $y_1$ and $y_2$ respond
K = only $y_1$ is observed
L = only $y_2$ is observed
M = both $y_1$ and $y_2$ are missing.
Let $n_H$, $n_K$, $n_L$, $n_M$ denote the sizes of H, K, L, M, respectively.
4. Assume that the response mechanism does not depend on the value of $(y_1, y_2)$, so it is MAR. In this case, the observed (mean) score function of $\mu$ based on a single observation in set K is
$E\{S_{com}^{(i)}(\mu)\mid y_{1i},\ i\in K\} = \begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}^{-1}\begin{pmatrix}y_{1i}-\mu_1\\ E(y_{2i}\mid y_{1i})-\mu_2\end{pmatrix} = \begin{pmatrix}\sigma_{11}^{-1}(y_{1i}-\mu_1)\\ 0\end{pmatrix}.$

Example 2.7 (Cont'd)
5. Similarly, we have
$E\{S_{com}^{(i)}(\mu)\mid y_{2i},\ i\in L\} = \begin{pmatrix}0\\ \sigma_{22}^{-1}(y_{2i}-\mu_2)\end{pmatrix}.$
6. Therefore, the observed information matrix of $\mu$ is
$I_{obs}(\mu) = n_H\begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}^{-1} + n_K\begin{pmatrix}\sigma_{11}^{-1} & 0\\ 0 & 0\end{pmatrix} + n_L\begin{pmatrix}0 & 0\\ 0 & \sigma_{22}^{-1}\end{pmatrix},$
and the asymptotic variance of the MLE of $\mu$ can be obtained as the inverse of $I_{obs}(\mu)$; a numeric sketch follows below.
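A numeric sketch assembling this observed information matrix; the covariance entries and the partition counts $n_H$, $n_K$, $n_L$, $n_M$ below are made-up values, not from the slides.

```python
# Example 2.7: observed information for mu = (mu_1, mu_2) and the implied asymptotic variance.
import numpy as np

sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])                 # (sigma_11, sigma_12; sigma_12, sigma_22), assumed known
n_H, n_K, n_L, n_M = 300, 120, 80, 50          # both / only y1 / only y2 / neither observed (assumed)

info = (n_H * np.linalg.inv(sigma)
        + n_K * np.diag([1 / sigma[0, 0], 0.0])
        + n_L * np.diag([0.0, 1 / sigma[1, 1]]))   # units in M contribute nothing

avar_mu_hat = np.linalg.inv(info)              # asymptotic variance of the MLE of (mu_1, mu_2)
print(avar_mu_hat)
```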

References
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London A 222.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society: Series B 44.
Oakes, D. (1999). Direct calculation of the information matrix via the EM algorithm. Journal of the Royal Statistical Society: Series B 61.
Orchard, T. and Woodbury, M. A. (1972). A missing information principle: theory and applications. In Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, California.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63.
