Point Estimation: definition of estimators

Point estimator: any function $W(X_1, \ldots, X_n)$ of a data sample. The exercise of point estimation is to use particular functions of the data in order to estimate certain unknown population parameters.

Examples: Assume that $X_1, \ldots, X_n$ are drawn i.i.d. from some distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. Potential point estimators for $\mu$ include: the sample mean $\bar{X} = \frac{1}{n} \sum_i X_i$; the sample median $\mathrm{med}(X_1, \ldots, X_n)$. Potential point estimators for $\sigma^2$ include: the sample variance $\frac{1}{n} \sum_i (X_i - \bar{X})^2$.

Any point estimator is a random variable, whose distribution is that induced by the distribution of $X_1, \ldots, X_n$. Example: $X_1, \ldots, X_n$ i.i.d. $N(\mu, \sigma^2)$. Then the sample mean $\bar{X} \sim N(\bar{\mu}, \bar{\sigma}^2)$, where $\bar{\mu} = \mu$ and $\bar{\sigma}^2 = \sigma^2 / n$.

For a particular realization of the random variables $x_1, \ldots, x_n$, the corresponding point estimator evaluated at $x_1, \ldots, x_n$, i.e., $W(x_1, \ldots, x_n)$, is called the point estimate.

In these lecture notes, we will consider three types of estimators:

1. Method of moments
2. Maximum likelihood
3. Bayesian estimation

Method of moments: Assume: $X_1, \ldots, X_n$ i.i.d. $f(x \mid \theta_1, \ldots, \theta_K)$. Here the unknown parameters are $\theta_1, \ldots, \theta_K$ ($K \in \mathbb{N}$). The idea is to find values of the parameters such that the population moments are as close as possible to their sample analogs. This involves finding values of the parameters to solve the following $K$-system of equations:
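As a sketch of these definitions, the snippet below computes the three point estimates above on one realized sample. The sample size, distribution parameters, and random seed are illustrative assumptions, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: n = 500 draws from N(mu = 2, sigma^2 = 9)
x = rng.normal(loc=2.0, scale=3.0, size=500)

# Point estimators W(X_1, ..., X_n) evaluated at the realized sample:
sample_mean = x.mean()                      # estimator for mu
sample_median = np.median(x)                # alternative estimator for mu
sample_var = ((x - x.mean()) ** 2).mean()   # (1/n) sum (X_i - Xbar)^2, estimator for sigma^2

print(sample_mean, sample_median, sample_var)
```

Rerunning with a different seed gives different estimates: the estimator is itself a random variable, as the notes emphasize.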

\[
\begin{aligned}
m_1 &\equiv \frac{1}{n} \sum_i X_i = E X = \int x \, f(x \mid \theta_1, \ldots, \theta_K) \, dx \\
m_2 &\equiv \frac{1}{n} \sum_i X_i^2 = E X^2 = \int x^2 \, f(x \mid \theta_1, \ldots, \theta_K) \, dx \\
&\;\;\vdots \\
m_K &\equiv \frac{1}{n} \sum_i X_i^K = E X^K = \int x^K \, f(x \mid \theta_1, \ldots, \theta_K) \, dx.
\end{aligned}
\]

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, \sigma^2)$. The parameters are $\theta, \sigma^2$. The moment equations are:
\[
\bar{X} = E X = \theta, \qquad \frac{1}{n} \sum_i X_i^2 = E X^2 = V X + (E X)^2 = \sigma^2 + \theta^2.
\]
Hence, the MOM estimators are $\hat{\theta}_{MOM} = \bar{X}$ and $\hat{\sigma}^2_{MOM} = \frac{1}{n} \sum_i X_i^2 - (\bar{X})^2$.

Example: $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$. The parameter is $\theta$. MOM: $\bar{X} = \theta/2 \;\Rightarrow\; \hat{\theta}_{MOM} = 2\bar{X}$.

Remarks:

Apart from these special cases above, for general density functions $f(\cdot \mid \theta)$, the MOM estimator is often difficult to calculate, because the population moments involve difficult integrals. In Pearson's original paper, the density was a mixture of two normal density functions:
\[
f(x \mid \theta) = \frac{\lambda}{\sqrt{2\pi}\,\sigma_1} \exp\left( -\frac{(x - \mu_1)^2}{2\sigma_1^2} \right) + \frac{1 - \lambda}{\sqrt{2\pi}\,\sigma_2} \exp\left( -\frac{(x - \mu_2)^2}{2\sigma_2^2} \right)
\]
with unknown parameters $\lambda, \mu_1, \mu_2, \sigma_1, \sigma_2$.

The model assumption that $X_1, \ldots, X_n$ i.i.d. $f(\cdot \mid \theta)$ implies a number of moment equations equal to the number of moments, which can be $\gg K$. This leaves room for evaluating the model specification.
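A minimal numerical sketch of both MOM examples above; the true parameter values and sample sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Normal example: match the first two uncentered sample moments ---
x = rng.normal(loc=1.5, scale=2.0, size=2000)  # hypothetical data: theta = 1.5, sigma^2 = 4
m1 = x.mean()            # (1/n) sum X_i
m2 = (x ** 2).mean()     # (1/n) sum X_i^2
theta_mom = m1                   # from  Xbar = theta
sigma2_mom = m2 - m1 ** 2        # from  m2 = sigma^2 + theta^2

# --- Uniform[0, theta] example: Xbar = theta / 2 ---
u = rng.uniform(0.0, 5.0, size=2000)  # hypothetical data: theta = 5
theta_mom_unif = 2 * u.mean()

print(theta_mom, sigma2_mom, theta_mom_unif)
```

Both estimators are just the moment equations solved for the parameters, with population moments replaced by sample moments.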

For example, in the uniform distribution example above, another moment condition which should be satisfied is that
\[
\frac{1}{n} \sum_i X_i^2 = E X^2 = V X + (E X)^2 = \frac{\theta^2}{12} + \frac{\theta^2}{4} = \frac{\theta^2}{3}. \tag{1}
\]
At the MOM estimator $\hat{\theta}_{MOM}$, one can see whether
\[
\frac{1}{n} \sum_i X_i^2 \approx \frac{\hat{\theta}_{MOM}^2}{3}.
\]
(Later, you will learn how this can be tested more formally.) If this does not hold, then that might be cause for you to conclude that the original specification that $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$ is inadequate. Eq. (1) is an example of an overidentifying restriction.

While the MOM estimator focuses on using the sample uncentered moments to construct estimators, there are other sample quantities which could be useful, such as the sample median (or other sample percentiles), as well as the sample minimum or maximum. (Indeed, for the uniform case above, the sample maximum would be a very reasonable estimator for $\theta$.) All these estimators are lumped under the rubric of generalized method of moments (GMM).

Maximum Likelihood Estimation

Let $X_1, \ldots, X_n$ be i.i.d. with density $f(\cdot \mid \theta_1, \ldots, \theta_K)$. Define: the likelihood function, for a continuous random variable, is the joint density of the sample observations:
\[
L(\theta \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta).
\]
View $L(\theta \mid \vec{x})$ as a function of the parameters $\theta$, for the data observations $\vec{x}$. From the classical point of view, the likelihood function $L(\theta \mid \vec{x})$ is a random variable due to the randomness in the data $\vec{x}$. (In the Bayesian point of view, which we talk about later, the likelihood function is also random because the parameters $\theta$ are also treated as random variables.)
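This informal specification check for the uniform example is easy to carry out on simulated data; the true parameter values and sample sizes below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# Data actually drawn from U[0, 5]: the overidentifying restriction should roughly hold
x = rng.uniform(0.0, 5.0, size=5000)
theta_mom = 2 * x.mean()
gap_uniform = (x ** 2).mean() - theta_mom ** 2 / 3   # should be near 0

# Data from a different distribution with the same mean: the restriction should fail
w = rng.exponential(scale=2.5, size=5000)  # mean 2.5, so theta_mom is similar
theta_mom_w = 2 * w.mean()
gap_expo = (w ** 2).mean() - theta_mom_w ** 2 / 3    # far from 0

print(gap_uniform, gap_expo)
```

A formal test would compare the gap to its sampling variability; here the point is only that the extra moment condition carries information about the specification.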

The maximum likelihood estimator (MLE) is the vector of parameter values $\hat{\theta}_{ML}$ which maximizes the likelihood function:
\[
\hat{\theta}_{ML} = \operatorname{argmax}_\theta L(\theta \mid \vec{x}).
\]
Usually, in practice, to avoid numerical overflow problems, one maximizes the log of the likelihood function:
\[
\hat{\theta}_{ML} = \operatorname{argmax}_\theta \log L(\theta \mid \vec{x}) = \operatorname{argmax}_\theta \sum_{i=1}^{n} \log f(x_i \mid \theta).
\]
Analogously, for discrete random variables, the likelihood function is the joint probability mass function: $L(\theta \mid \vec{x}) = \prod_{i=1}^{n} P(X_i = x_i \mid \theta)$.

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$.
\[
\log L(\theta \mid \vec{x}) = n \log(1/\sqrt{2\pi}) - \frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2
\]
\[
\max_\theta \log L(\theta \mid \vec{x}) = \min_\theta \frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2
\]
FOC: $\frac{\partial \log L}{\partial \theta} = \sum_i (x_i - \theta) = 0 \;\Rightarrow\; \hat{\theta}_{ML} = \frac{1}{n} \sum_i x_i$ (the sample mean). One should also check the second-order condition: $\frac{\partial^2 \log L}{\partial \theta^2} = -n < 0$: satisfied.

Example: $X_1, \ldots, X_n$ i.i.d. Bernoulli with probability $p$. The unknown parameter is $p$.
\[
L(p \mid \vec{x}) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i}
\]
\[
\log L(p \mid \vec{x}) = \sum_{i=1}^{n} \left[ x_i \log p + (1 - x_i) \log(1 - p) \right] = y \log p + (n - y) \log(1 - p), \quad y \text{ is the number of 1's.}
\]
FOC: $\frac{\partial \log L}{\partial p} = \frac{y}{p} - \frac{n - y}{1 - p} = 0 \;\Rightarrow\; \hat{p}_{ML} = \frac{y}{n}$.

For $y = 0$ or $y = n$, $\hat{p}_{ML}$ is (respectively) 0 and 1: corner solutions.
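The Bernoulli first-order condition can be checked numerically: the sketch below compares the closed-form $\hat{p}_{ML} = y/n$ against a generic numerical maximizer of the log-likelihood. The data-generating $p$ and sample size are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# Bernoulli(p) sample; hypothetical true p = 0.3
x = rng.binomial(1, 0.3, size=1000)
y = x.sum()
n = x.size
p_closed_form = y / n   # MLE from the first-order condition

# Numerical check: minimize the negative log-likelihood over p in (0, 1)
def neg_log_lik(p):
    return -(y * np.log(p) + (n - y) * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
p_numeric = res.x

print(p_closed_form, p_numeric)
```

For most models no closed form exists and the numerical route is the only one; the Bernoulli case is handy because the answer is known exactly.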

SOC: $\left. \frac{\partial^2 \log L}{\partial p^2} \right|_{p = \hat{p}_{ML}} = -\frac{y}{p^2} - \frac{n - y}{(1 - p)^2} < 0$ for $0 < y < n$.

When the parameter is multidimensional: check that the Hessian matrix $\frac{\partial^2 \log L}{\partial \theta \, \partial \theta'}$ is negative definite.

Example: $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$. The likelihood function is
\[
L(\vec{X} \mid \theta) = \begin{cases} (1/\theta)^n & \text{if } \max(X_1, \ldots, X_n) \le \theta \\ 0 & \text{if } \max(X_1, \ldots, X_n) > \theta, \end{cases}
\]
which is maximized at $\hat{\theta}_{MLE} = \max(X_1, \ldots, X_n)$.

You can think of ML as a generalized MOM estimator: for $X_1, \ldots, X_n$ i.i.d. and a $K$-dimensional parameter vector $\theta$, the MLE solves the FOCs:
\[
\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_1} = 0, \quad \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_2} = 0, \quad \ldots, \quad \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_K} = 0.
\]
Under the LLN: $\frac{1}{n} \sum_i \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_k} \xrightarrow{p} E_{\theta_0} \frac{\partial \log f(X \mid \theta)}{\partial \theta_k}$, for $k = 1, \ldots, K$, where the notation $E_{\theta_0}$ denotes the expectation over the distribution of $X$ at the true parameter vector $\theta_0$. Hence, MLE is equivalent to GMM with the moment conditions
\[
E_{\theta_0} \frac{\partial \log f(X \mid \theta)}{\partial \theta_k} = 0, \quad k = 1, \ldots, K.
\]

Bayes estimators
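To contrast the two uniform-distribution estimators seen so far, a small sketch (the true $\theta$ and sample size are assumed for illustration) computing $\hat{\theta}_{MLE} = \max(X_i)$ alongside $\hat{\theta}_{MOM} = 2\bar{X}$:

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true = 5.0                                  # assumed for the simulation
x = rng.uniform(0.0, theta_true, size=1000)

# The likelihood (1/theta)^n is decreasing in theta, so the MLE is the
# smallest theta consistent with the data: the sample maximum.
theta_mle = x.max()
theta_mom = 2 * x.mean()

print(theta_mle, theta_mom)
```

Note that $\hat{\theta}_{MLE} \le \theta$ always (it can only undershoot), whereas $\hat{\theta}_{MOM}$ can fall on either side of the truth and can even be smaller than the observed maximum.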

This is a philosophically different view of the world. Model the unknown parameters $\theta$ as random variables, and assume that the researcher's beliefs about $\theta$ are summarized in a prior distribution $f(\theta)$. In this sense, the Bayesian approach is subjective, because the researcher's beliefs about $\theta$ are accommodated in the inferential approach.

$X_1, \ldots, X_n$ i.i.d. $f(x \mid \theta)$: the Bayesian views the density of each data observation as a conditional density, which is conditional on a realization of the random variable $\theta$.

Given data $X_1, \ldots, X_n$, we can update our beliefs about the parameter $\theta$ by computing the posterior density (using Bayes' Rule):
\[
f(\theta \mid \vec{x}) = \frac{f(\vec{x} \mid \theta) f(\theta)}{f(\vec{x})} = \frac{f(\vec{x} \mid \theta) f(\theta)}{\int f(\vec{x} \mid \theta) f(\theta) \, d\theta}.
\]
A Bayesian point estimate of $\theta$ is some feature of this posterior density. Common point estimators are:

Posterior mean: $E[\theta \mid \vec{x}] = \int \theta f(\theta \mid \vec{x}) \, d\theta$.

Posterior median: $F^{-1}_{\theta \mid \vec{x}}(0.5)$, where $F_{\theta \mid \vec{x}}$ is the CDF corresponding to the posterior density, i.e., $F_{\theta \mid \vec{x}}(\bar{\theta}) = \int_{-\infty}^{\bar{\theta}} f(\theta \mid \vec{x}) \, d\theta$.

Posterior mode: $\operatorname{argmax}_\theta f(\theta \mid \vec{x})$. This is the point at which the posterior density is highest.

Note that $f(\vec{x} \mid \theta)$ is just the likelihood function, so that the posterior density $f(\theta \mid \vec{x})$ can be written as:
\[
f(\theta \mid \vec{x}) = \frac{L(\theta \mid \vec{x}) f(\theta)}{\int L(\theta \mid \vec{x}) f(\theta) \, d\theta}.
\]
But there is a difference in interpretation: in the Bayesian world, the likelihood function is random due to both $\vec{x}$ and $\theta$, whereas in the classical world, only $\vec{x}$ is random.

Example: $X_1, \ldots, X_n$ i.i.d. $N(\theta, 1)$, with prior density $f(\theta)$. The posterior density is
\[
f(\theta \mid \vec{x}) = \frac{\prod_i \exp\left( -\frac{1}{2} (x_i - \theta)^2 \right) f(\theta)}{\int \prod_i \exp\left( -\frac{1}{2} (x_i - \theta)^2 \right) f(\theta) \, d\theta}.
\]
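When the normalizing integral is intractable, the three posterior point estimators can still be approximated on a grid. A sketch for the $N(\theta, 1)$ example under a flat prior; the grid range, sample size, and true $\theta$ are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=1.0, scale=1.0, size=50)   # X_i ~ N(theta, 1); true theta = 1 assumed

# Evaluate the unnormalized posterior on a grid; flat prior f(theta) = const
grid = np.linspace(-3.0, 5.0, 4001)
dx = grid[1] - grid[0]
log_lik = np.array([-0.5 * np.sum((x - t) ** 2) for t in grid])
post = np.exp(log_lik - log_lik.max())   # subtract max to avoid numerical underflow
post /= post.sum() * dx                  # normalize so it integrates to ~1

post_mean = (grid * post).sum() * dx
cdf = np.cumsum(post) * dx
post_median = grid[np.searchsorted(cdf, 0.5)]
post_mode = grid[np.argmax(post)]        # with a flat prior, this is the MLE (xbar)
```

For this symmetric posterior all three estimators coincide (up to grid error) with the sample mean; with a skewed posterior they would differ.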

The integral in the denominator can be difficult to calculate: such computational difficulties can hamper computation of posterior densities. However, note that the denominator is not a function of $\theta$. Thus $f(\theta \mid \vec{x}) \propto L(\theta \mid \vec{x}) f(\theta)$. Hence, if we assume that $f(\theta)$ is constant (i.e., uniform) for all possible values of $\theta$, then the posterior mode $\operatorname{argmax}_\theta f(\theta \mid \vec{x}) = \operatorname{argmax}_\theta L(\theta \mid \vec{x}) = \hat{\theta}_{ML}$.

Example: Bayesian updating for the normal distribution, with a normal prior. $X \sim N(\theta, \sigma^2)$; assume $\sigma^2$ is known. Prior: $\theta \sim N(\mu, \tau^2)$; assume $\tau$ is known. Then the posterior distribution is
\[
\theta \mid X \sim N\big(E(\theta \mid X),\, V(\theta \mid X)\big),
\]
where
\[
E(\theta \mid X) = \frac{\tau^2}{\tau^2 + \sigma^2} X + \frac{\sigma^2}{\sigma^2 + \tau^2} \mu, \qquad V(\theta \mid X) = \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}.
\]
This is an example of a conjugate prior and conjugate distribution, where the posterior distribution comes from the same family as the prior distribution. The posterior mean $E(\theta \mid X)$ is a weighted average of $X$ and the prior mean $\mu$. In this case, as $\tau \to \infty$ (so that the prior information gets worse and worse): $E(\theta \mid X) \to X$ (a.s.) and $V(\theta \mid X) \to \sigma^2$. These are just the MLE (for just one data observation).

When you observe an i.i.d. sample $\vec{X} \equiv (X_1, \ldots, X_n)$, with sample mean $\bar{X}_n$:
\[
E(\theta \mid \vec{X}_n) = \frac{\tau^2}{\tau^2 + \sigma^2/n} \bar{X}_n + \frac{\sigma^2/n}{\sigma^2/n + \tau^2} \mu, \qquad V(\theta \mid \vec{X}_n) = \frac{(\sigma^2/n) \tau^2}{\sigma^2/n + \tau^2}.
\]
In this case, as the number of observations $n \to \infty$, the posterior mean $E(\theta \mid \vec{X}_n) \to \bar{X}_n$. So as $n \to \infty$, the posterior mean converges to the MLE: when your sample becomes arbitrarily large, you place no weight on your prior information.
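The conjugate updating formulas translate directly into code. A minimal sketch; the hyperparameter values in the usage lines are illustrative assumptions.

```python
def normal_posterior(xbar, n, sigma2, mu, tau2):
    """Posterior of theta given an i.i.d. N(theta, sigma2) sample with mean xbar,
    under the conjugate prior theta ~ N(mu, tau2)."""
    s2n = sigma2 / n  # variance of the sample mean
    post_mean = (tau2 / (tau2 + s2n)) * xbar + (s2n / (s2n + tau2)) * mu
    post_var = s2n * tau2 / (s2n + tau2)
    return post_mean, post_var

# As n grows, the posterior mean approaches xbar (the MLE):
m_small, v_small = normal_posterior(xbar=2.0, n=5, sigma2=1.0, mu=0.0, tau2=1.0)
m_large, v_large = normal_posterior(xbar=2.0, n=10000, sigma2=1.0, mu=0.0, tau2=1.0)
print(m_small, m_large)
```

With $n = 5$ the prior mean still pulls the estimate toward 0; with $n = 10000$ the posterior mean is essentially $\bar{X}_n$, mirroring the limit argument above.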

Exchangeability and independence: An interesting feature of the Bayesian approach here is that, for any $n$, the posterior inference does not depend on the order in which the observations $X_1, X_2, \ldots, X_n$ are observed; any permutation of these variables would yield the same posterior mean. This exchangeability of the posterior mean would appear to be a reasonable requirement to make of any inference procedure in the case when the data are drawn in i.i.d. fashion. De Finetti's Theorem formalizes the connection between exchangeability and Bayesian inference with i.i.d. variates.

Define an exchangeable sequence of random variables to be a sequence $X_1, X_2, \ldots, X_n$ such that the joint distribution function $F(X_1, \ldots, X_n)$ is the same for any permutation of the random variables. (Obviously, if $X_1, \ldots, X_n$ are i.i.d., then they are exchangeable; but the converse is not true.)

De Finetti's Theorem, in its simplest form, says that an infinite sequence of 0-1 random variables $X_1, X_2, \ldots, X_n, \ldots$ which are exchangeable has a joint probability distribution equal to the joint marginal distribution of conditionally i.i.d. Bernoulli random variables: that is, for all $n$,
\[
\underbrace{f(X_1, \ldots, X_n)}_{\text{exchangeable}} = \int_0^1 \underbrace{\prod_t p^{X_t} (1 - p)^{1 - X_t}}_{\text{i.i.d. Bernoulli}(p)} \, dH(p).
\]
(The result has been extended to continuous random elements.)

Data augmentation

The important philosophical distinction of the Bayesian approach is that data and model parameters are treated on an equal footing. Hence, just as we make posterior inference on model parameters, we can also make posterior inference on unobserved variables in latent variable models, which are models where not all the model variables are observed.

Consider a simple example (the binary probit model):
\[
z = \beta x + \epsilon, \quad \epsilon \sim N(0, 1), \qquad y = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0. \end{cases} \tag{2}
\]
The researcher observes $(x, y)$, but not $(z, \beta)$. He wishes to form the posterior of $z, \beta \mid x, y$.

We do all inference conditional on $x$. Therefore the relevant prior is
\[
f(z, \beta \mid x) = f(z \mid \beta, x) \cdot f(\beta \mid x) = N(\beta x, 1) \cdot \underbrace{N(\bar{\beta}, a^2)}_{f(\beta)} = \phi(z - \beta x) \cdot \frac{1}{a} \phi\left( \frac{\beta - \bar{\beta}}{a} \right). \tag{3}
\]
In the above, we assume the marginal prior on $\beta$ is normal (and doesn't depend on $x$). The conditional prior density of $z \mid \beta, x$ is derived from the model specification (2). $\phi$ denotes the standard normal density.

The posterior is:
\[
f(z, \beta \mid y, x) \propto L(y \mid z, \beta, x) \cdot f(z, \beta \mid x) = \big( \mathbf{1}(z \ge 0) \cdot y + \mathbf{1}(z < 0) \cdot (1 - y) \big) \cdot f(z, \beta \mid x). \tag{4}
\]
Note how doing this simplifies the likelihood. This is not the likelihood you would use when you do MLE (because you do not observe $z$!), but it is the one you can use if you treat the latent $z$ as a parameter. Accordingly, this can be marginalized over $\beta$ to obtain the posterior of $z \mid y, x$.

Using the Bayesian procedure to do posterior inference on latent data variables is sometimes called data augmentation. In a non-Bayesian context, obtaining values for missing data is usually done by some sort of imputation procedure. Thus, data augmentation can be viewed as a sort of Bayesian imputation procedure. One attractive feature of the Bayesian approach is that it follows easily and naturally from the usual Bayesian logic.
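The factorization in (3)-(4) is what makes Gibbs-style data augmentation workable: given $\beta$ and $y$, the latent $z$ is a truncated normal, and given $z$, the draw of $\beta$ is a conjugate normal update. Below is a sketch of such a sampler in the spirit of the Albert-Chib approach; the simulated data, prior hyperparameters, and iteration counts are all assumptions, not prescribed by the notes.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)

# Simulated probit data; hypothetical true beta = 1.0
n = 500
x = rng.normal(size=n)
z_true = 1.0 * x + rng.normal(size=n)
y = (z_true >= 0).astype(int)

# Prior beta ~ N(beta_bar, a^2); hyperparameters assumed for the sketch
beta_bar, a2 = 0.0, 10.0

beta = 0.0
draws = []
for it in range(2000):
    # 1. Draw latent z_i | beta, y_i, x_i: N(beta * x_i, 1) truncated at 0,
    #    from above or below depending on y_i (this encodes the indicator in (4))
    mu = beta * x
    lo = np.where(y == 1, -mu, -np.inf)   # standardized lower bound
    hi = np.where(y == 1, np.inf, -mu)    # standardized upper bound
    z = mu + truncnorm.rvs(lo, hi, random_state=rng)
    # 2. Draw beta | z, x from its normal posterior (conjugate step, as in (3))
    prec = x @ x + 1.0 / a2
    mean = (x @ z + beta_bar / a2) / prec
    beta = rng.normal(mean, np.sqrt(1.0 / prec))
    if it >= 500:                         # discard burn-in draws
        draws.append(beta)

beta_hat = np.mean(draws)
print(beta_hat)
```

The retained $\beta$ draws approximate the marginal posterior of $\beta \mid y, x$, and the retained $z$ draws (not stored here) are exactly the Bayesian imputations of the latent data that the text describes.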