Chapter 3: Maximum Likelihood Theory
Florian Pelgrin (HEC)
September-December 2010
Outline

1. Introduction: Example
2. Maximum likelihood estimator: Notation; Likelihood and log-likelihood; Maximum likelihood principle; Equivariance principle
3. Fisher information: Score vector; Fisher information matrix
4. Asymptotic results: Overview; Consistency; Asymptotic efficiency; Large sample distribution; Back to the equivariance...
1. Introduction

Example 1: Suppose that $Y_1, Y_2, \ldots, Y_n$ are i.i.d. random variables with $Y_i \sim \mathcal{B}(p)$:
$$Y_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}$$
where $p$ is an unknown parameter to estimate. The sample $(y_1, y_2, \ldots, y_n)$ is observed. Note the explicit assumption regarding the distribution of $Y_i$. Can we find an estimate (estimator) of $p$?
Example 1 (cont'd): The joint distribution of the sample is:
$$P(Y_1 = y_1, \ldots, Y_n = y_n) = \prod_{i=1}^n P(Y_i = y_i) = \prod_{i=1}^n p^{y_i}(1-p)^{1-y_i} = p^{\sum_i y_i}(1-p)^{n - \sum_i y_i}.$$
The likelihood function is the joint density of the data, except that we treat it as a function of the parameter:
$$L(p \mid y) \equiv L(y; p) = \prod_{i=1}^n p^{y_i}(1-p)^{1-y_i}.$$
It captures the likely values of the unknown parameter given the realizations of the random variables.
Example 1 (cont'd): Suppose that two estimates of $p$, $\hat{p}_{1,n}(y)$ and $\hat{p}_{2,n}(y)$, are such that:
$$L_n(y; \hat{p}_{1,n}(y)) > L_n(y; \hat{p}_{2,n}(y)).$$
Then the sample we observe, $y = (y_1, \ldots, y_n)$, is more likely to have occurred if $p = \hat{p}_{1,n}(y)$ than if $p = \hat{p}_{2,n}(y)$: $\hat{p}_{1,n}(y)$ is the more plausible value.
Example 1 (cont'd): Under suitable regularity conditions, the maximum likelihood estimate (estimator) is defined to be:
$$\hat{p} = \operatorname{argmax}_p L(y; p) = \operatorname{argmax}_p l(y; p)$$
where $l(y; p) = \log L(y; p)$ is the log-likelihood function. The maximum likelihood estimate is $\hat{p}(y) = \frac{1}{n}\sum_{i=1}^n y_i$ and the maximum likelihood estimator is $\hat{p} = \frac{1}{n}\sum_{i=1}^n Y_i$.
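As a sanity check, the maximizer of the Bernoulli log-likelihood can be located numerically and compared with the closed-form estimate $\bar{y}_n$. A minimal sketch in Python (the simulated sample and the grid resolution are illustrative assumptions, not part of the slides):

```python
import numpy as np

# Simulated Bernoulli(0.3) sample -- illustrative, not from the slides.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.3, size=500)

def log_lik(p):
    # l(y; p) = sum_i [ y_i log p + (1 - y_i) log(1 - p) ]
    return y.sum() * np.log(p) + (len(y) - y.sum()) * np.log(1.0 - p)

# Crude numerical maximization over a fine grid of candidate values of p.
grid = np.linspace(0.001, 0.999, 9999)
p_hat_grid = grid[np.argmax(log_lik(grid))]

# Closed-form MLE from the first-order condition: the sample mean.
p_hat = y.mean()
```

The grid maximizer agrees with the sample mean up to the grid resolution, illustrating that the first-order condition indeed yields the global maximum here.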
How do we apply the maximum likelihood principle to the multiple linear regression model? What are the main properties of the maximum likelihood estimator? Is it asymptotically unbiased? Is it asymptotically efficient, and under which condition(s)? Is it consistent? What is its asymptotic distribution? What are the main properties of a transformation of the estimator, say $\theta = g(p)$? All of these questions are answered in this lecture.
2. Maximum likelihood estimator
2.1 Notation

Consider the multiple linear regression model:
$$y_i = x_i' b + u_i$$
where the error terms are spherical and the observations $(y_i, x_i)$, $i = 1, \ldots, n$, are i.i.d. The joint density function is given by $f(y_i, x_i) \equiv L_i(y_i, x_i; \theta)$, where $\theta = (b, \sigma^2)$. By definition,
$$f(y_i, x_i) = f(y_i \mid x_i) f(x_i)$$
where $f(y_i \mid x_i)$ is the conditional density of $Y$ given $X = x_i$ and $f(x_i)$ is the marginal density of $X_i$.
To get the (log-)likelihood function, one needs some parametric assumptions:
1. One can specify the conditional distribution of $u \mid X$, i.e. the conditional distribution of $Y \mid X$:
$$u \mid X \sim \mathcal{N}(0_{n \times 1}, \sigma^2 I_n), \quad \text{i.e.} \quad Y \mid X \sim \mathcal{N}(Xb, \sigma^2 I_n).$$
2. One can specify the joint (multivariate) distribution of $(X, Y)$ and the marginal (multivariate) distribution of $X$:
$$\begin{pmatrix} Y \\ X \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} EY \\ EX \end{pmatrix}, \begin{pmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} \right) \quad \text{or} \quad X \sim \mathcal{N}(EX, \Sigma_{xx}).$$
In the first case (conditional distribution), the estimator of $\theta$ can be obtained from the conditional (log-)likelihood function. In the second case (joint distribution), the estimator of $\theta$ can be derived from the joint (log-)likelihood function. The joint likelihood function is the product of the conditional likelihood function and the marginal likelihood function (the information provided by the marginal distribution of $X$); equivalently, the joint log-likelihood function is the sum of the conditional log-likelihood function and the marginal log-likelihood function.
The conditional and joint (log-)likelihood functions are conceptually different, and so are the two corresponding estimators (especially in finite samples). Choosing one or the other depends on the empirical setting. For instance, the distribution of the sample data can be conditionally normal but not jointly normal (e.g., the variables $X$ are arbitrarily determined in some experimental settings). In the sequel, we only consider the conditional maximum likelihood estimator (under the assumption of independent samples).
2.2 Likelihood and log-likelihood

Definition. The (conditional) likelihood function is defined to be:
$$L_n : \mathcal{Y} \times \Theta \to [0, +\infty), \quad ((y, x), \theta) \mapsto L_n(y \mid x; \theta) = \prod_{i=1}^n L_i(y_i \mid x_i; \theta).$$
Remark: The conditional likelihood function is the joint conditional density of the data, viewed as a function of the unknown parameter $\theta$.
Definition. The (conditional) log-likelihood function is defined to be:
$$l_n : \mathcal{Y} \times \Theta \to \mathbb{R}, \quad ((y, x), \theta) \mapsto l_n(y \mid x; \theta) = \sum_{i=1}^n \log L_i(y_i \mid x_i; \theta).$$
Application: the multiple linear regression model. Under the conditional normality assumption,
$$f(y_i \mid x_i; \theta) \equiv L_i(y_i \mid x_i; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left( -\frac{(y_i - x_i' b)^2}{2\sigma^2} \right).$$
Therefore
$$L_n(y \mid x; \theta) = \prod_{i=1}^n L_i(y_i \mid x_i; \theta) = (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i' b)^2 \right)$$
and
$$l_n(y \mid x; \theta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i' b)^2.$$
2.3 Maximum likelihood principle

Definition. A maximum likelihood estimator of $\theta \in \Theta \subseteq \mathbb{R}^k$ is a solution to the maximization problem:
$$\hat{\theta}_n = \operatorname{argmax}_{\theta \in \Theta} L_n(\theta) \quad \text{or} \quad \hat{\theta}_n = \operatorname{argmax}_{\theta \in \Theta} l_n(\theta).$$
The maximum likelihood principle: using the first-order conditions.

Definition. Under suitable regularity conditions, a maximum likelihood estimator of $\theta \in \Theta \subseteq \mathbb{R}^k$ is defined to be the solution of the first-order conditions (the likelihood or log-likelihood equations):
$$\frac{\partial L_n}{\partial \theta}(y \mid x; \hat{\theta}_n) = 0_{k \times 1} \quad \text{or} \quad \frac{\partial l_n}{\partial \theta}(y \mid x; \hat{\theta}_n) = 0_{k \times 1}.$$
Remark: Regularity conditions are fundamental!
Application: the multiple linear regression model (cont'd). Under suitable regularity conditions, the first-order conditions are given by:
$$\frac{\partial l_n}{\partial b}(y \mid x; \hat{\theta}_n) = \frac{1}{\hat{\sigma}_n^2} \sum_{i=1}^n x_i (y_i - x_i' \hat{b}_n) = 0_{k \times 1}$$
$$\frac{\partial l_n}{\partial \sigma^2}(y \mid x; \hat{\theta}_n) = -\frac{n}{2\hat{\sigma}_n^2} + \frac{1}{2\hat{\sigma}_n^4} \sum_{i=1}^n (y_i - x_i' \hat{b}_n)^2 = 0.$$
The maximum likelihood estimate of $\theta$ is:
$$\hat{b}_n = \left( \sum_{i=1}^n x_i x_i' \right)^{-1} \sum_{i=1}^n x_i y_i, \qquad \hat{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^n (y_i - x_i' \hat{b}_n)^2.$$
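The closed-form solutions of the first-order conditions are easy to verify numerically. A sketch in Python (the simulated design, true coefficients, and noise scale are illustrative assumptions):

```python
import numpy as np

# Simulated data for y_i = x_i' b + u_i with spherical normal errors.
rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
b_true = np.array([1.0, 2.0, -0.5])
y = X @ b_true + rng.normal(scale=0.8, size=n)

# ML estimates from the first-order conditions:
#   b_hat      = (sum_i x_i x_i')^{-1} sum_i x_i y_i   (the OLS formula)
#   sigma2_hat = (1/n) sum_i (y_i - x_i' b_hat)^2      (no degrees-of-freedom correction)
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b_hat
sigma2_hat = resid @ resid / n
```

Note that $\hat{\sigma}_n^2$ divides by $n$ rather than $n-k$: the ML variance estimator is biased in finite samples, a point taken up again in the asymptotic results below.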
Second-order conditions: the Hessian matrix evaluated at $\theta = \hat{\theta}_n$ must be negative definite. The Hessian matrix is given by:
$$H = \begin{pmatrix} -\frac{1}{\sigma^2} \sum_i x_i x_i' & -\frac{1}{\sigma^4} \sum_i x_i (y_i - x_i' b) \\ -\frac{1}{\sigma^4} \sum_i x_i' (y_i - x_i' b) & \frac{n}{2\sigma^4} - \frac{1}{\sigma^6} \sum_i (y_i - x_i' b)^2 \end{pmatrix}$$
and
$$H_{\theta = \hat{\theta}_n} = \begin{pmatrix} -\frac{1}{\hat{\sigma}_n^2} \sum_i x_i x_i' & 0_{k \times 1} \\ 0_{1 \times k} & -\frac{n}{2\hat{\sigma}_n^4} \end{pmatrix}.$$
Given that $X'X$ is positive definite and $\hat{\sigma}_n^2 > 0$, $H_{\theta = \hat{\theta}_n}$ is negative definite and $\hat{\theta}_n$ is a maximum.
2.4 Equivariance principle

Definition. Under suitable regularity conditions, the maximum likelihood estimator of a function $g(\theta)$ of the parameter $\theta$ is $g(\hat{\theta}_n)$, where $\hat{\theta}_n$ is the maximum likelihood estimator of $\theta$.
Example: Suppose $Y_1, \ldots, Y_n$ are i.i.d. $\mathcal{E}(\theta)$. The likelihood function is:
$$L_n(y; \theta) = \prod_{i=1}^n \theta \exp(-\theta y_i) = \theta^n \exp\left( -\theta \sum_{i=1}^n y_i \right).$$
One gets (the second-order condition holds):
$$\hat{\theta}_n = \frac{1}{\bar{Y}_n}, \qquad \hat{\theta}_n(y) = \frac{1}{\bar{y}_n}.$$
Consider now the probability density function:
$$f_{Y_i}(y_i; \lambda) = \frac{1}{\lambda} \exp\left( -\frac{y_i}{\lambda} \right).$$
Example (cont'd): The log-likelihood function is:
$$l(y; \lambda) = -n \log(\lambda) - \frac{1}{\lambda} \sum_{i=1}^n y_i.$$
The first-order condition with respect to $\lambda$ is:
$$-\frac{n}{\lambda} + \frac{1}{\lambda^2} \sum_{i=1}^n y_i = 0.$$
Since the second-order condition holds, one gets (as is to be expected!):
$$\hat{\lambda}_n = \bar{Y}_n = \frac{1}{\hat{\theta}_n}, \qquad \hat{\lambda}_n(y) = \bar{y}_n = \frac{1}{\hat{\theta}_n(y)}.$$
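The equivariance of the maximum under reparameterization can be checked numerically: maximizing either parameterization of the exponential log-likelihood returns estimates linked by $\hat{\lambda}_n = 1/\hat{\theta}_n = \bar{y}_n$. A sketch (the simulated sample and the grids are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(scale=2.0, size=400)   # i.i.d. draws with mean lambda = 2
n = len(y)

# Rate parameterization: l(y; theta) = n log(theta) - theta * sum(y)
theta_grid = np.linspace(0.01, 5.0, 50001)
theta_hat = theta_grid[np.argmax(n * np.log(theta_grid) - theta_grid * y.sum())]

# Mean parameterization: l(y; lambda) = -n log(lambda) - sum(y) / lambda
lam_grid = np.linspace(0.01, 10.0, 50001)
lam_hat = lam_grid[np.argmax(-n * np.log(lam_grid) - y.sum() / lam_grid)]

# Equivariance: lam_hat = 1 / theta_hat = sample mean (up to grid resolution)
```

Both maximizations land on the same estimate in different coordinates, which is exactly the content of the equivariance principle.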
3. Fisher information
3.1 Score vector

Definition. The score vector, $s$, is defined to be the vector formed by the first (partial) derivatives of the (conditional) log-likelihood with respect to the parameters $\theta \in \Theta \subseteq \mathbb{R}^k$:
$$s(\theta) \equiv \frac{\partial l_n}{\partial \theta}(Y \mid x; \theta) = \left( \frac{\partial l_n}{\partial \theta_i}(Y \mid x; \theta) \right)_{1 \le i \le k}.$$
It satisfies:
$$E_\theta\left[ \frac{\partial l_n}{\partial \theta}(Y \mid x; \theta) \right] = 0_{k \times 1}, \quad \forall x, \theta.$$
Remark: $E_\theta$ denotes the expectation with respect to the conditional distribution of $Y \mid X$.
Application: the multiple linear regression model. The score vector is given by:
$$s(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} \sum_i x_i (Y_i - x_i' b) \\ -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_i (Y_i - x_i' b)^2 \end{pmatrix}$$
and $E_\theta[s(\theta)] = 0_{(k+1) \times 1}$, since:
$$E_\theta\left[ \frac{\partial l_n}{\partial b}(Y \mid x; \theta) \right] = \frac{1}{\sigma^2} \sum_{i=1}^n x_i \left( E_\theta(Y_i \mid x_i) - x_i' b \right) = 0_{k \times 1}$$
and
$$E_\theta\left[ -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (Y_i - E_\theta(Y_i \mid x_i))^2 \right] = -\frac{n}{2\sigma^2} + \frac{n\sigma^2}{2\sigma^4} = 0,$$
using $V_\theta(Y_i \mid x_i) = \sigma^2$.
3.2 Fisher information matrix

Definition. The Fisher information matrix at $x$ is the variance-covariance matrix of the score vector:
$$I_F^x = V_\theta\left[ \frac{\partial l_n}{\partial \theta}(Y \mid x; \theta) \right] = E_\theta\left[ \frac{\partial l_n}{\partial \theta}(Y \mid x; \theta) \cdot \frac{\partial l_n}{\partial \theta'}(Y \mid x; \theta) \right].$$
Definition. The Fisher information matrix at $x$ is also given by:
$$I_F^x = -E_\theta\left[ \frac{\partial^2 l_n}{\partial \theta \partial \theta'}(Y \mid x; \theta) \right].$$
Remarks:
1. Three equivalent definitions of the Fisher information matrix lead to three different consistent estimates of the Fisher information matrix.
2. Their finite-sample properties can be quite different!
3. $I_F^x$ can be defined from the Fisher information matrix for observation $i$.
Definition. The Fisher information matrix for observation $i$ (or $x_i$) can be defined by:
$$\tilde{I}_F^{x_i}(\theta) = V_\theta\left[ \frac{\partial l_i}{\partial \theta}(Y_i \mid x_i; \theta) \right] = E_\theta\left[ \frac{\partial l_i}{\partial \theta}(Y_i \mid x_i; \theta) \cdot \frac{\partial l_i}{\partial \theta'}(Y_i \mid x_i; \theta) \right] = -E_\theta\left[ \frac{\partial^2 l_i}{\partial \theta \partial \theta'}(Y_i \mid x_i; \theta) \right].$$
Proposition. The Fisher information matrix at $x = (x_1, \ldots, x_n)$ (i.e., for $n$ observations) is given by:
$$I_F^x(\theta) = \sum_{i=1}^n \tilde{I}_F^{x_i}(\theta).$$
Remark: In a sampling model (with i.i.d. observations), one has $I_F^x(\theta) = n \tilde{I}_F^{x_i}(\theta)$.
Definition. The average Fisher information matrix for one observation is defined by:
$$\tilde{I}_F(\theta) = \operatorname{plim}_{n \to \infty} \frac{1}{n} I_F^X(\theta).$$
Theorem.
(a) $\tilde{I}_F(\theta) = E_{X_i}\left[ \tilde{I}_F^{X_i}(\theta) \right]$
(b) $\tilde{I}_F(\theta) = E\left[ \frac{\partial l_i}{\partial \theta}(Y_i \mid X_i; \theta) \cdot \frac{\partial l_i}{\partial \theta'}(Y_i \mid X_i; \theta) \right]$
(c) $\tilde{I}_F(\theta) = -E\left[ \frac{\partial^2 l_i}{\partial \theta \partial \theta'}(Y_i \mid X_i; \theta) \right]$
A consistent estimator of the Fisher information matrix?

Proposition. If $\hat{\theta}_n$ converges in probability to $\theta_0$, then:
$$\hat{I}_F^{(1)}(\hat{\theta}_{n,ML}) = \frac{1}{n} \sum_{i=1}^n \tilde{I}_F^{x_i}(\hat{\theta}_{n,ML})$$
$$\hat{I}_F^{(2)}(\hat{\theta}_{n,ML}) = \frac{1}{n} \sum_{i=1}^n \frac{\partial l_i}{\partial \theta}(y_i \mid x_i; \hat{\theta}_{n,ML}) \cdot \frac{\partial l_i}{\partial \theta'}(y_i \mid x_i; \hat{\theta}_{n,ML})$$
$$\hat{I}_F^{(3)}(\hat{\theta}_{n,ML}) = -\frac{1}{n} \sum_{i=1}^n \frac{\partial^2 l_i}{\partial \theta \partial \theta'}(y_i \mid x_i; \hat{\theta}_{n,ML}) = -\frac{1}{n} \frac{\partial^2 l_n}{\partial \theta \partial \theta'}(y \mid x; \hat{\theta}_{n,ML})$$
are three consistent estimators of the Fisher information matrix.
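The outer-product-of-gradients estimator $\hat{I}_F^{(2)}$ and the Hessian-based estimator $\hat{I}_F^{(3)}$ can be compared in the Bernoulli model of Example 1, where the population Fisher information for one observation is $1/(p(1-p))$. A sketch (the sample is simulated for illustration; in this particular model the two estimators turn out to agree very closely at $\hat{p}$):

```python
import numpy as np

rng = np.random.default_rng(3)
p0 = 0.3
y = rng.binomial(1, p0, size=5000)
p_hat = y.mean()                                   # Bernoulli MLE

# Per-observation score and second derivative of
# l_i(p) = y_i log p + (1 - y_i) log(1 - p), evaluated at p_hat.
score = y / p_hat - (1 - y) / (1 - p_hat)
d2l = -y / p_hat**2 - (1 - y) / (1 - p_hat)**2

I_opg = np.mean(score**2)       # I_F^(2): outer product of gradients
I_hess = -np.mean(d2l)          # I_F^(3): minus the average Hessian
I_pop = 1.0 / (p0 * (1 - p0))   # population Fisher information
```

Both estimates sit close to the population value, consistent with the proposition; in richer models the two can diverge noticeably in finite samples, which is the point made on the next slide.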
These three consistent estimators of the Fisher information matrix are asymptotically equivalent, and none of them is preferable to the others on statistical grounds. The main difficulty is that they can have very different finite-sample properties (again!), which can lead to different statistical conclusions for the same problem.
Application: the multiple linear regression model. Computation of $\tilde{I}_F(\theta)$. The Hessian matrix of the log-likelihood function for observation $i$ is:
$$\frac{\partial^2 l_i}{\partial \theta \partial \theta'}(y_i \mid x_i; \theta) = \begin{pmatrix} -\frac{1}{\sigma^2} x_i x_i' & -\frac{1}{\sigma^4} x_i (y_i - x_i' b) \\ -\frac{1}{\sigma^4} x_i' (y_i - x_i' b) & \frac{1}{2\sigma^4} - \frac{1}{\sigma^6} (y_i - x_i' b)^2 \end{pmatrix}.$$
Taking the expectation with respect to the conditional distribution of $Y_i \mid X_i = x_i$:
$$-E_\theta\left[ \frac{\partial^2 l_i}{\partial \theta \partial \theta'}(\cdot) \,\Big|\, x_i \right] = \begin{pmatrix} \frac{1}{\sigma^2} x_i x_i' & 0_{k \times 1} \\ 0_{1 \times k} & \frac{1}{2\sigma^4} \end{pmatrix}.$$
Taking the expectation with respect to the distribution of $X_i$:
$$\tilde{I}_F(\theta) = E_{X_i}\left[ -E_\theta\left( \frac{\partial^2 l_i}{\partial \theta \partial \theta'}(\cdot) \right) \right] = \begin{pmatrix} \frac{1}{\sigma^2} E(X_i X_i') & 0_{k \times 1} \\ 0_{1 \times k} & \frac{1}{2\sigma^4} \end{pmatrix}.$$
4. Asymptotic results
4.1 Overview

Under certain regularity conditions, the maximum likelihood estimator, $\hat{\theta}_n$, possesses many appealing properties:
1. The maximum likelihood estimator is consistent.
2. The maximum likelihood estimator is asymptotically normal: $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(\cdot, \cdot)$.
3. The maximum likelihood estimator is asymptotically optimal, or efficient.
4. The maximum likelihood estimator is equivariant: if $\hat{\theta}_n$ is an estimator of $\theta_0$, then $g(\hat{\theta}_n)$ is an estimator of $g(\theta_0)$.
At the same time:
- These properties depend on the explicit distributional assumptions regarding $Y_1, \ldots, Y_n$.
- Finite-sample properties can be very different from large-sample properties: the maximum likelihood estimator is consistent but can be severely biased in finite samples, and the estimation of the variance-covariance matrix can be seriously doubtful in finite samples.
4.2 Consistency

Theorem. Under suitable regularity conditions, $\hat{\theta}_{n,ML} \xrightarrow{a.s.} \theta_0$.
Remark: This implies that $\hat{\theta}_{n,ML} \xrightarrow{p} \theta_0$.
4.3 Asymptotic efficiency

Proposition. An unbiased maximum likelihood estimator of $\theta$ or $g(\theta)$ attains the FDCR (Fréchet-Darmois-Cramér-Rao) lower bound and is thus (asymptotically) efficient.
4.4 Large sample distribution

Theorem. Under suitable regularity conditions,
$$\sqrt{n}\left( \hat{\theta}_{n,ML} - \theta_0 \right) \xrightarrow{d} \mathcal{N}\left( 0, \tilde{I}_F^{-1}(\theta_0) \right), \quad \text{i.e.} \quad \hat{\theta}_{n,ML} \stackrel{a}{\sim} \mathcal{N}\left( \theta_0, n^{-1} \tilde{I}_F^{-1}(\theta_0) \right).$$
Remark: In a sampling model, the Fisher information does not depend on $x$ and $\tilde{I}_F(\theta_0)$ is the Fisher information matrix for one observation, $I_1(\theta_0)$:
$$\sqrt{n}\left( \hat{\theta}_{n,ML} - \theta_0 \right) \xrightarrow{d} \mathcal{N}\left( 0, I_1^{-1}(\theta_0) \right).$$
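The theorem can be illustrated by Monte Carlo in the Bernoulli model, where $\tilde{I}_F^{-1}(p_0) = p_0(1-p_0)$: the empirical variance of $\sqrt{n}(\hat{p}_n - p_0)$ across replications should approach that bound. A sketch (the sample size and replication count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
p0, n, reps = 0.3, 1000, 5000

# reps independent samples of size n; each row yields one MLE p_hat = sample mean.
p_hats = rng.binomial(1, p0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (p_hats - p0)

var_mc = z.var()                # Monte Carlo variance of sqrt(n) (p_hat - p0)
var_asy = p0 * (1 - p0)         # asymptotic variance I_F^{-1}(p0) = 0.21
```

The simulated variance matches the inverse Fisher information closely, and the simulated mean of $z$ is near zero, in line with the centered normal limit.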
Interpretation: For large $n$, the distribution of $\hat{\theta}_n$ is approximately normal, with expectation the true unknown parameter and variance-covariance matrix the FDCR lower bound. The maximum likelihood estimator is asymptotically unbiased and asymptotically efficient.
4.5 Back to the equivariance...

Proposition. Assume H1, H2, H3-H8 hold, and $g$ is a continuously differentiable function of $\theta$ defined from $\mathbb{R}^k$ to $\mathbb{R}^p$. Then:
$$g(\hat{\theta}_n) \xrightarrow{a.s.} g(\theta_0)$$
$$\sqrt{n}\left( g(\hat{\theta}_n) - g(\theta_0) \right) \xrightarrow{d} \mathcal{N}\left( 0, \frac{\partial g}{\partial \theta'}(\theta_0) \, \tilde{I}_F^{-1}(\theta_0) \left[ \frac{\partial g}{\partial \theta'}(\theta_0) \right]' \right).$$
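For the exponential example, the delta method with $g(\theta) = 1/\theta$ predicts that $\sqrt{n}(\hat{\lambda}_n - \lambda_0)$ has asymptotic variance $g'(\theta_0)^2 \, \tilde{I}_F^{-1}(\theta_0) = (1/\theta_0^4)\,\theta_0^2 = 1/\theta_0^2$, using $\tilde{I}_F(\theta) = 1/\theta^2$ for the exponential rate. A Monte Carlo sketch (parameter values, sample size, and replication count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
theta0, n, reps = 0.5, 1000, 4000      # rate theta0, so mean lam0 = 1/theta0 = 2

# Each replication: MLE theta_hat = 1 / ybar, and g(theta_hat) = 1/theta_hat = ybar
# (by the equivariance principle, this is the MLE of the mean).
ybar = rng.exponential(scale=1.0 / theta0, size=(reps, n)).mean(axis=1)
lam_hats = ybar

# Delta-method prediction: Var[ sqrt(n) (lam_hat - lam0) ] -> 1 / theta0^2
var_delta = 1.0 / theta0**2
var_mc = (np.sqrt(n) * (lam_hats - 1.0 / theta0)).var()
```

The Monte Carlo variance is close to the delta-method bound, illustrating the sandwich formula of the proposition in the scalar case.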
Application: the multiple linear regression model. The inverse Fisher information matrix is given by:
$$\tilde{I}_F^{-1}(\theta_0) = \begin{pmatrix} \sigma_0^2 \left( E X_i X_i' \right)^{-1} & 0_{k \times 1} \\ 0_{1 \times k} & 2\sigma_0^4 \end{pmatrix}.$$
Therefore,
$$\sqrt{n}\left( \hat{b}_{n,ML} - b_0 \right) \xrightarrow{d} \mathcal{N}\left( 0, \sigma_0^2 (E X_i X_i')^{-1} \right), \qquad \sqrt{n}\left( \hat{\sigma}^2_{n,ML} - \sigma_0^2 \right) \xrightarrow{d} \mathcal{N}\left( 0, 2\sigma_0^4 \right),$$
and the two vectors $\sqrt{n}(\hat{b}_{n,ML} - b_0)$ and $\sqrt{n}(\hat{\sigma}^2_{n,ML} - \sigma_0^2)$ are asymptotically independent.
A consistent estimate of the Fisher information matrix can be given by:
$$\hat{I}_F^{(1)} = -\frac{1}{n} \sum_{i=1}^n E\left[ \frac{\partial^2 l_i}{\partial \theta \partial \theta'}(y_i \mid x_i; \hat{\theta}_n) \right] = \begin{pmatrix} \frac{1}{\hat{\sigma}_n^2} \left( \frac{1}{n} X'X \right) & 0_{k \times 1} \\ 0_{1 \times k} & \frac{1}{2\hat{\sigma}_n^4} \end{pmatrix}$$
so that:
$$\hat{b}_{n,ML} \stackrel{a}{\sim} \mathcal{N}\left( b_0, \hat{\sigma}_n^2 (X'X)^{-1} \right), \qquad \hat{\sigma}^2_{n,ML} \stackrel{a}{\sim} \mathcal{N}\left( \sigma_0^2, \frac{2\hat{\sigma}_n^4}{n} \right).$$
96 CHAPTER 2. ELEMENTS OF STATISTICAL INFERENCE 2.3 Methods of Estimation 2.3. Method of Moments The Method of Moments is a simple technique based on the idea that the sample moments are natural estimators
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationReview and continuation from last week Properties of MLEs
Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that
More informationLECTURE 2 LINEAR REGRESSION MODEL AND OLS
SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another
More informationSpring 2012 Math 541A Exam 1. X i, S 2 = 1 n. n 1. X i I(X i < c), T n =
Spring 2012 Math 541A Exam 1 1. (a) Let Z i be independent N(0, 1), i = 1, 2,, n. Are Z = 1 n n Z i and S 2 Z = 1 n 1 n (Z i Z) 2 independent? Prove your claim. (b) Let X 1, X 2,, X n be independent identically
More informationRecent Advances in the analysis of missing data with non-ignorable missingness
Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation
More informationEstimation, Inference, and Hypothesis Testing
Chapter 2 Estimation, Inference, and Hypothesis Testing Note: The primary reference for these notes is Ch. 7 and 8 of Casella & Berger 2. This text may be challenging if new to this topic and Ch. 7 of
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationLecture 2: ARMA(p,q) models (part 2)
Lecture 2: ARMA(p,q) models (part 2) Florian Pelgrin University of Lausanne, École des HEC Department of mathematics (IMEA-Nice) Sept. 2011 - Jan. 2012 Florian Pelgrin (HEC) Univariate time series Sept.
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationPart 6: Multivariate Normal and Linear Models
Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of
More informationExercises Chapter 4 Statistical Hypothesis Testing
Exercises Chapter 4 Statistical Hypothesis Testing Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans December 5, 013 Christophe Hurlin (University of Orléans) Advanced Econometrics
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationMathematical statistics
October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter
More informationStat 5102 Lecture Slides Deck 3. Charles J. Geyer School of Statistics University of Minnesota
Stat 5102 Lecture Slides Deck 3 Charles J. Geyer School of Statistics University of Minnesota 1 Likelihood Inference We have learned one very general method of estimation: method of moments. the Now we
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationRegression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood
Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 Minimize this by maximizing
More information10. Linear Models and Maximum Likelihood Estimation
10. Linear Models and Maximum Likelihood Estimation ECE 830, Spring 2017 Rebecca Willett 1 / 34 Primary Goal General problem statement: We observe y i iid pθ, θ Θ and the goal is to determine the θ that
More informationBetter Bootstrap Confidence Intervals
by Bradley Efron University of Washington, Department of Statistics April 12, 2012 An example Suppose we wish to make inference on some parameter θ T (F ) (e.g. θ = E F X ), based on data We might suppose
More informationCopula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011
Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models
More informationGaussian Processes 1. Schedule
1 Schedule 17 Jan: Gaussian processes (Jo Eidsvik) 24 Jan: Hands-on project on Gaussian processes (Team effort, work in groups) 31 Jan: Latent Gaussian models and INLA (Jo Eidsvik) 7 Feb: Hands-on project
More informationSTA 260: Statistics and Probability II
Al Nosedal. University of Toronto. Winter 2017 1 Properties of Point Estimators and Methods of Estimation 2 3 If you can t explain it simply, you don t understand it well enough Albert Einstein. Definition
More informationWooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics
Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationMaximum Likelihood Tests and Quasi-Maximum-Likelihood
Maximum Likelihood Tests and Quasi-Maximum-Likelihood Wendelin Schnedler Department of Economics University of Heidelberg 10. Dezember 2007 Wendelin Schnedler (AWI) Maximum Likelihood Tests and Quasi-Maximum-Likelihood10.
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7
MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is
More informationPOLI 8501 Introduction to Maximum Likelihood Estimation
POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,
More informationA General Overview of Parametric Estimation and Inference Techniques.
A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying
More informationMISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30
MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)
More informationMathematical statistics
October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation
More informationChapter 4: Asymptotic Properties of the MLE (Part 2)
Chapter 4: Asymptotic Properties of the MLE (Part 2) Daniel O. Scharfstein 09/24/13 1 / 1 Example Let {(R i, X i ) : i = 1,..., n} be an i.i.d. sample of n random vectors (R, X ). Here R is a response
More informationSystem Identification, Lecture 4
System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2012 F, FRI Uppsala University, Information Technology 30 Januari 2012 SI-2012 K. Pelckmans
More informationInverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1
Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationNotes on the Multivariate Normal and Related Topics
Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions
More informationTheory of Statistics.
Theory of Statistics. Homework V February 5, 00. MT 8.7.c When σ is known, ˆµ = X is an unbiased estimator for µ. If you can show that its variance attains the Cramer-Rao lower bound, then no other unbiased
More informationVarious types of likelihood
Various types of likelihood 1. likelihood, marginal likelihood, conditional likelihood, profile likelihood, adjusted profile likelihood 2. semi-parametric likelihood, partial likelihood 3. empirical likelihood,
More informationStatistics and Econometrics I
Statistics and Econometrics I Point Estimation Shiu-Sheng Chen Department of Economics National Taiwan University September 13, 2016 Shiu-Sheng Chen (NTU Econ) Statistics and Econometrics I September 13,
More information