Point Estimation: definition of estimators

Point estimator: any function $W(X_1, \dots, X_n)$ of a data sample. The exercise of point estimation is to use particular functions of the data in order to estimate certain unknown population parameters.

Examples: Assume that $X_1, \dots, X_n$ are drawn i.i.d. from some distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.

Potential point estimators for $\mu$ include: the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$; the sample median $\mathrm{med}(X_1, \dots, X_n)$.

Potential point estimators for $\sigma^2$ include: the sample variance $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$.

Any point estimator is a random variable, whose distribution is that induced by the distribution of $X_1, \dots, X_n$.

Example: $X_1, \dots, X_n$ i.i.d. $N(\mu, \sigma^2)$. The sample mean $\bar{X}_n \sim N(\mu_n, \sigma_n^2)$, where $\mu_n = \mu$ for all $n$, and $\sigma_n^2 = \sigma^2 / n$.

For a particular realization $x_1, \dots, x_n$ of the random variables, the corresponding point estimator evaluated at $x_1, \dots, x_n$, i.e., $W(x_1, \dots, x_n)$, is called the point estimate.

In these lecture notes, we will consider three types of estimators:
1. Method of moments
2. Maximum likelihood
3. Bayesian estimation

Method of moments: very intuitive idea

Assume: $X_1, \dots, X_n$ i.i.d. $f(x \mid \theta_1, \dots, \theta_K)$. Here the unknown parameters are $\theta_1, \dots, \theta_K$ ($K \le N$). The idea is to find values of the parameters such that the population moments are as close as possible to their sample analogs. This involves finding values of the parameters to solve the following $K$-system of equations:

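$$m_1 \equiv \frac{1}{n}\sum_{i} X_i = EX = \int x \, f(x \mid \theta_1, \dots, \theta_K)\,dx$$
$$m_2 \equiv \frac{1}{n}\sum_{i} X_i^2 = EX^2 = \int x^2 f(x \mid \theta_1, \dots, \theta_K)\,dx$$
$$\vdots$$
$$m_K \equiv \frac{1}{n}\sum_{i} X_i^K = EX^K = \int x^K f(x \mid \theta_1, \dots, \theta_K)\,dx.$$

Example: $X_1, \dots, X_n$ i.i.d. $N(\theta, \sigma^2)$. Parameters are $\theta$, $\sigma^2$. Moment equations are:
$$\frac{1}{n}\sum_i X_i = EX = \theta$$
$$\frac{1}{n}\sum_i X_i^2 = EX^2 = VX + (EX)^2 = \sigma^2 + \theta^2.$$
Hence, the MOM estimators are $\theta^{MOM} = \bar{X}_n$ and $\sigma^{2,MOM} = \frac{1}{n}\sum_i X_i^2 - (\bar{X}_n)^2$.

Example: $X_1, \dots, X_n$ i.i.d. $U[0, \theta]$. Parameter is $\theta$. MOM: $\bar{X}_n = \frac{\theta}{2} \implies \theta^{MOM} = 2\bar{X}_n$.

Remarks:

Apart from the special cases above, for general density functions $f(\cdot \mid \theta)$, the general-case MOM estimator is often difficult to calculate, because the population moments involve difficult integrals. (In Pearson's original paper, the density was a mixture of two normal density functions:
$$f(x \mid \theta) = \lambda \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left(-\frac{(x - \mu_1)^2}{2\sigma_1^2}\right) + (1 - \lambda) \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left(-\frac{(x - \mu_2)^2}{2\sigma_2^2}\right)$$
with unknown parameters $\lambda, \mu_1, \mu_2, \sigma_1, \sigma_2$.)

The model assumption that $X_1, \dots, X_n$ i.i.d. $f(\cdot \mid \theta)$ implies a number of moment equations equal to the number of moments, which can be $\gg K$. This leaves room for evaluating the model specification.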
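The two closed-form examples above (normal and uniform) can be sketched numerically. This is a minimal illustration using only the Python standard library, not part of the original notes; the function names, the seed, and the true parameter values used to simulate data are all assumptions chosen for the demonstration.

```python
import random

def mom_normal(xs):
    """MOM estimates (theta, sigma^2) for N(theta, sigma^2).

    Matches the first two uncentered sample moments to
    EX = theta and EX^2 = sigma^2 + theta^2.
    """
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return m1, m2 - m1 ** 2

def mom_uniform(xs):
    """MOM estimate of theta for U[0, theta]: solve sample mean = theta / 2."""
    return 2 * sum(xs) / len(xs)

# Simulated data with hypothetical true values (theta=3, sigma=2; theta=5).
rng = random.Random(0)
normal_sample = [rng.gauss(3.0, 2.0) for _ in range(10_000)]
uniform_sample = [rng.uniform(0.0, 5.0) for _ in range(10_000)]

theta_hat, sigma2_hat = mom_normal(normal_sample)
theta_unif_hat = mom_uniform(uniform_sample)
print(theta_hat, sigma2_hat, theta_unif_hat)
```

With a sample this large, the three printed estimates should land close to the simulated values 3, 4, and 5.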

As an example of evaluating the model specification: in the uniform distribution example above, another moment condition which should be satisfied is that
$$\frac{1}{n}\sum_i X_i^2 = EX^2 = VX + (EX)^2 = \frac{\theta^2}{12} + \frac{\theta^2}{4} = \frac{\theta^2}{3}. \quad (1)$$
At the MOM estimator $\theta^{MOM}$, one can see whether
$$\frac{1}{n}\sum_i X_i^2 = \frac{(\theta^{MOM})^2}{3}.$$
(Later, you will learn how this can be tested more formally.) If this does not hold, then that might be cause for you to conclude that the original specification that $X_1, \dots, X_n$ i.i.d. $U[0, \theta]$ is inadequate. Eq. (1) is an example of an overidentifying restriction.

While the MOM estimator focuses on using the sample uncentered moments to construct estimators, there are other sample quantities which could be useful, such as the sample median (or other sample percentiles), as well as the sample minimum or maximum. (Indeed, for the uniform case above, the sample maximum would be a very reasonable estimator for $\theta$.)

Maximum Likelihood Estimation

Let $X_1, \dots, X_n$ be i.i.d. with density $f(\cdot \mid \theta_1, \dots, \theta_K)$.

Define: the likelihood function, for a continuous random variable, is the joint density of the sample observations:
$$L(\theta \mid x_1, \dots, x_n) = \prod_{i=1}^n f(x_i \mid \theta).$$
View $L(\theta \mid x)$ as a function of the parameters $\theta$, for the data observations $x$.

From the classical point of view, the likelihood function $L(\theta \mid x)$ is a random variable due to the randomness in the data $x$. (In the Bayesian point of view, which we talk about later, the likelihood function is also random because the parameters $\theta$ are also treated as random variables.)

The maximum likelihood estimator (MLE) is the vector of parameter values $\theta^{ML}$ which maximizes the likelihood function:
$$\theta^{ML} = \operatorname{argmax}_\theta L(\theta \mid x).$$
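The definition of the MLE as the maximizer of the joint density can be sketched by brute force. The example below assumes a unit-variance normal model $N(\theta, 1)$, a small made-up data set, and a grid search over candidate values of $\theta$. All of these are illustrative choices, not part of the original notes, and a grid search is only one crude way to locate the maximum.

```python
import math

def normal_pdf(x, theta):
    """Density of N(theta, 1) evaluated at x."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

def likelihood(theta, xs):
    """L(theta | x): product of the individual densities f(x_i | theta)."""
    value = 1.0
    for x in xs:
        value *= normal_pdf(x, theta)
    return value

xs = [1.2, 0.7, 2.1, 1.5, 0.9]             # hypothetical data
grid = [i / 100 for i in range(0, 301)]     # candidate theta values in [0, 3]

# The grid point with the highest likelihood approximates theta^ML.
theta_ml = max(grid, key=lambda t: likelihood(t, xs))
print(theta_ml, sum(xs) / len(xs))
```

The grid maximizer and the sample mean of the data are printed side by side for comparison.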

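Usually, in practice, to avoid numerical overflow or underflow problems, one maximizes the log of the likelihood function:
$$\theta^{ML} = \operatorname{argmax}_\theta \log L(\theta \mid x) = \operatorname{argmax}_\theta \sum_{i=1}^n \log f(x_i \mid \theta).$$

Analogously, for discrete random variables, the likelihood function is the joint probability mass function:
$$L(\theta \mid x) = \prod_{i=1}^n P(X = x_i \mid \theta).$$

Example: $X_1, \dots, X_n$ i.i.d. $N(\theta, 1)$.
$$\log L(\theta \mid x) = n \log\left(1/\sqrt{2\pi}\right) - \frac{1}{2}\sum_{i=1}^n (x_i - \theta)^2$$
Maximizing $\log L(\theta \mid x)$ over $\theta$ is equivalent to minimizing $\frac{1}{2}\sum_i (x_i - \theta)^2$ over $\theta$.

FOC: $\frac{\partial \log L}{\partial \theta} = \sum_i (x_i - \theta) = 0 \implies \theta^{ML} = \frac{1}{n}\sum_i x_i$ (sample mean).

One should also check the second-order condition: $\frac{\partial^2 \log L}{\partial \theta^2} = -n < 0$, so it is satisfied.

Example: $X_1, \dots, X_n$ i.i.d. Bernoulli with probability $p$. The unknown parameter is $p$.
$$L(p \mid x) = \prod_{i=1}^n p^{x_i} (1 - p)^{1 - x_i}$$
$$\log L(p \mid x) = \sum_{i=1}^n \left[ x_i \log p + (1 - x_i) \log(1 - p) \right] = y \log p + (n - y) \log(1 - p),$$
where $y$ is the number of 1's.

FOC: $\frac{\partial \log L}{\partial p} = \frac{y}{p} - \frac{n - y}{1 - p} = 0 \implies p^{ML} = \frac{y}{n}$.

For $y = 0$ or $y = n$, $p^{ML}$ is (respectively) 0 and 1: corner solutions.

SOC: $\left.\frac{\partial^2 \log L}{\partial p^2}\right|_{p = p^{ML}} = -\frac{y}{p^2} - \frac{n - y}{(1 - p)^2} < 0$ for $0 < y < n$.

When the parameter is multidimensional: check that the Hessian matrix $\frac{\partial^2 \log L}{\partial \theta \, \partial \theta'}$ is negative definite.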
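The Bernoulli example can be checked numerically. The sketch below evaluates the log-likelihood, its first derivative (the FOC), and its second derivative (the SOC) at the closed-form estimate $y/n$, and compares that estimate with a grid search. The data and function names are hypothetical and chosen only for illustration.

```python
import math

def bernoulli_loglik(p, y, n):
    """log L(p | x) = y log p + (n - y) log(1 - p), valid for 0 < p < 1."""
    return y * math.log(p) + (n - y) * math.log(1 - p)

def bernoulli_score(p, y, n):
    """First derivative of the log-likelihood with respect to p."""
    return y / p - (n - y) / (1 - p)

def bernoulli_second_derivative(p, y, n):
    """Second derivative of the log-likelihood with respect to p."""
    return -y / p ** 2 - (n - y) / (1 - p) ** 2

xs = [1, 0, 1, 1, 0, 1, 0, 1]     # hypothetical coin flips
n, y = len(xs), sum(xs)
p_ml = y / n                       # closed-form solution of the FOC

# Grid search over the interior of (0, 1) as an independent check.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_loglik(p, y, n))

print(p_ml, p_grid)
print(bernoulli_score(p_ml, y, n), bernoulli_second_derivative(p_ml, y, n))
```

The score should vanish at `p_ml` and the second derivative should be negative there, matching the FOC and SOC in the text. The grid deliberately excludes 0 and 1, where the logarithms are undefined; these are the corner cases noted above.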

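You can think of ML as a MOM estimator: for $X_1, \dots, X_n$ i.i.d., and a $K$-dimensional parameter vector $\theta$, the MLE solves the FOCs:
$$\frac{1}{n}\sum_i \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_1} = 0$$
$$\frac{1}{n}\sum_i \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_2} = 0$$
$$\vdots$$
$$\frac{1}{n}\sum_i \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_K} = 0.$$

Under the LLN:
$$\frac{1}{n}\sum_i \frac{\partial \log f(x_i \mid \theta)}{\partial \theta_k} \xrightarrow{p} E_{\theta_0} \frac{\partial \log f(X \mid \theta)}{\partial \theta_k}, \quad k = 1, \dots, K,$$
where the notation $E_{\theta_0}$ denotes the expectation over the distribution of $X$ at the true parameter vector $\theta_0$. Hence, MLE is equivalent to MOM with the moment conditions
$$E_{\theta_0} \frac{\partial \log f(X \mid \theta)}{\partial \theta_k} = 0, \quad k = 1, \dots, K.$$

Bayes estimators

This is a fundamentally different view of the world. Model the unknown parameters $\theta$ as random variables, and assume that the researcher's beliefs about $\theta$ are summarized in a prior distribution $f(\theta)$. In this sense, the Bayesian approach is subjective, because the researcher's beliefs about $\theta$ are accommodated in the inferential approach.

$X_1, \dots, X_n$ i.i.d. $f(x \mid \theta)$: the Bayesian views the density of each data observation as a conditional density, which is conditional on a realization of the random variable $\theta$.

Given data $X_1, \dots, X_n$, we can update our beliefs about the parameter $\theta$ by computing the posterior density (using Bayes' Rule):
$$f(\theta \mid x) = \frac{f(x \mid \theta)\, f(\theta)}{f(x)} = \frac{f(x \mid \theta)\, f(\theta)}{\int f(x \mid \theta) f(\theta)\,d\theta}.$$

A Bayesian point estimate of $\theta$ is some feature of this posterior density. Common point estimators are:

Posterior mean: $E[\theta \mid x] = \int \theta f(\theta \mid x)\,d\theta$.

Posterior median: $F^{-1}_{\theta \mid x}(0.5)$, where $F_{\theta \mid x}$ is the CDF corresponding to the posterior density, i.e., $F_{\theta \mid x}(\tilde{\theta}) = \int_{-\infty}^{\tilde{\theta}} f(\theta \mid x)\,d\theta$.

Posterior mode: $\operatorname{argmax}_\theta f(\theta \mid x)$. This is the point at which the density is highest.

Note that $f(x \mid \theta)$ is just the likelihood function, so that the posterior density $f(\theta \mid x)$ can be written as:
$$f(\theta \mid x) = \frac{L(\theta \mid x)\, f(\theta)}{\int L(\theta \mid x) f(\theta)\,d\theta}.$$
But there is a difference in interpretation: in the Bayesian world, the likelihood function is random due to both $x$ and $\theta$, whereas in the classical world, only $x$ is random.

Example: $X_1, \dots, X_n$ i.i.d. $N(\theta, 1)$, with prior density $f(\theta)$. The posterior density is
$$f(\theta \mid x) = \frac{\exp\left(-\frac{1}{2}\sum_i (x_i - \theta)^2\right) f(\theta)}{\int \exp\left(-\frac{1}{2}\sum_i (x_i - \theta)^2\right) f(\theta)\,d\theta}.$$
The integral in the denominator can be difficult to calculate: computational difficulties can hamper computation of posterior densities.

Special case: if we assume that $f(\theta) = 1$ for $-\infty < \theta < \infty$ (this is what is called an "improper prior"), then $f(\theta \mid x) \propto L(\theta \mid x)$ (because the denominator is just a constant, and not a function of $\theta$). For this case, posterior mode $= \operatorname{argmax}_\theta L(\theta \mid x) = \theta^{ML}$.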
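The $N(\theta, 1)$ posterior above can be approximated on a grid, replacing the integral in the denominator with a sum. This is a rough sketch under stated assumptions: the data, the grid range $[-5, 5]$, and the grid spacing are all hypothetical, and a Riemann sum over a truncated range is only one simple way to handle the normalizing constant. With the flat prior $f(\theta) = 1$, the posterior mode should coincide with the MLE (here the sample mean).

```python
import math

def grid_posterior(xs, prior, grid):
    """Normalized posterior weights on an equally spaced grid, X_i ~ N(theta, 1)."""
    # Log-likelihood up to a constant; subtract the max before exponentiating
    # so the largest term is exp(0) = 1 (numerical stability).
    loglik = [-0.5 * sum((x - t) ** 2 for x in xs) for t in grid]
    shift = max(loglik)
    unnorm = [math.exp(ll - shift) * prior(t) for ll, t in zip(loglik, grid)]
    total = sum(unnorm)   # stands in for the integral in the denominator
    return [u / total for u in unnorm]

def flat_prior(theta):
    """Improper prior f(theta) = 1."""
    return 1.0

xs = [1.2, 0.7, 2.1, 1.5, 0.9]                  # hypothetical data
grid = [-5 + i / 100 for i in range(1001)]       # theta values in [-5, 5]

weights = grid_posterior(xs, flat_prior, grid)
post_mode = grid[max(range(len(grid)), key=lambda i: weights[i])]
post_mean = sum(t * w for t, w in zip(grid, weights))
print(post_mode, post_mean, sum(xs) / len(xs))
```

Swapping `flat_prior` for any other non-negative function of `theta` gives the corresponding grid posterior, which is one way around the hard integral when no closed form is available.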

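Example: Bayesian updating for normal distribution, with normal priors

$X \sim N(\theta, \sigma^2)$, assume $\sigma^2$ is known.

Prior: $\theta \sim N(\mu, \tau^2)$, assume $\tau$ is known.

Then the posterior distribution is
$$\theta \mid X \sim N\left(E(\theta \mid X), V(\theta \mid X)\right),$$
where
$$E(\theta \mid X) = \frac{\tau^2}{\tau^2 + \sigma^2} X + \frac{\sigma^2}{\sigma^2 + \tau^2} \mu$$
$$V(\theta \mid X) = \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}.$$

This is an example of a conjugate prior and conjugate distribution, where the posterior distribution comes from the same family as the prior distribution. (The classic reference is Morris DeGroot, Optimal Statistical Decisions.)

The posterior mean is a weighted average of $X$ and the prior mean $\mu$.

In this case, as $\tau \to \infty$ (so that prior information gets worse and worse), $E(\theta \mid X) \to X$ (a.s.). This is just the MLE (for just one data observation).

When you observe an i.i.d. sample $X_n \equiv (X_1, \dots, X_n)$, with sample mean $\bar{X}_n$:
$$E(\theta \mid X_n) = \frac{n\tau^2}{n\tau^2 + \sigma^2} \bar{X}_n + \frac{\sigma^2}{\sigma^2 + n\tau^2} \mu$$
$$V(\theta \mid X_n) = \frac{\sigma^2 \tau^2}{\sigma^2 + n\tau^2}.$$

In this case, as the number of observations $n \to \infty$, the posterior mean $E(\theta \mid X_n) \to \bar{X}_n$. So as $n \to \infty$, the posterior mean converges to the MLE: when your sample becomes arbitrarily large, you place no weight on your prior information.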
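The conjugate updating formulas above translate directly into a small function. The sketch below is illustrative only: the function name and the numerical values for $\bar{X}_n$, $\sigma^2$, $\mu$, and $\tau^2$ are assumptions. It shows the weight on the sample mean growing with $n$, so the posterior mean moves from the prior mean toward $\bar{X}_n$.

```python
def normal_posterior(xbar, n, sigma2, mu, tau2):
    """Posterior mean and variance of theta given n observations.

    Model: X_i ~ N(theta, sigma2) with sigma2 known; prior theta ~ N(mu, tau2).
    xbar is the sample mean of the n observations.
    """
    weight = n * tau2 / (n * tau2 + sigma2)        # weight on the sample mean
    post_mean = weight * xbar + (1 - weight) * mu  # 1 - weight = sigma2 / (sigma2 + n*tau2)
    post_var = sigma2 * tau2 / (sigma2 + n * tau2)
    return post_mean, post_var

# Hypothetical values: sample mean 2, known variance 4, prior N(0, 1).
xbar, sigma2, mu, tau2 = 2.0, 4.0, 0.0, 1.0
for n in (1, 10, 1000):
    print(n, normal_posterior(xbar, n, sigma2, mu, tau2))
```

Setting `n = 1` recovers the single-observation formulas, and making `tau2` very large with `n` fixed reproduces the diffuse-prior limit in which the posterior mean approaches the data.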