LECTURE NOTES 9

1 Point Estimation

Under the hypothesis that the sample was generated from some parametric statistical model, a natural way to understand the underlying population is by estimating the parameters of the statistical model. One can of course wonder: what happens if the model is wrong? Or where do models really come from? In some rare cases, we actually know enough about our data to hypothesise a reasonable model. Most often, however, when we specify a model, we do so hoping that it can provide a useful approximation to the data generating mechanism. The George Box quote is worth remembering in this context: "all models are wrong, but some are useful."

1.1 The Method of Moments

Suppose that $\theta = (\theta_1, \ldots, \theta_k)$, so that there are $k$ unknown parameters. We can estimate $\theta$ by matching $k$ moments. Let

$$m_1 = \frac{1}{n} \sum_{i=1}^n X_i, \quad m_2 = \frac{1}{n} \sum_{i=1}^n X_i^2, \quad \ldots, \quad m_k = \frac{1}{n} \sum_{i=1}^n X_i^k.$$

Let $\mu_i = \int x^i p_\theta(x)\, dx$ denote the $i$-th population moment. This depends on $\theta$, so we write it as $\mu_i(\theta)$. The method of moments prescribes estimating the parameters $\theta_1, \ldots, \theta_k$ by solving the system of equations:

$$m_1 = \mu_1(\theta_1, \ldots, \theta_k)$$
$$\vdots$$
$$m_k = \mu_k(\theta_1, \ldots, \theta_k).$$

Example 1: If $X_1, \ldots, X_n \sim N(\theta, \sigma^2)$, we would solve:

$$\frac{1}{n} \sum_{i=1}^n X_i = \theta, \quad \text{and} \quad \frac{1}{n} \sum_{i=1}^n X_i^2 = \theta^2 + \sigma^2,$$

to obtain the estimators:

$$\hat{\theta} = \frac{1}{n} \sum_{i=1}^n X_i = \bar{X}_n,$$
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n X_i^2 - \left( \frac{1}{n} \sum_{i=1}^n X_i \right)^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2,$$

which is, up to the factor $n/(n-1)$, the sample variance $s^2$.
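To make the recipe concrete, here is a minimal numerical sketch (not part of the original notes) of the moment-matching computation for the normal example; the true parameter values, sample size, and seed are arbitrary choices for illustration.

```python
import numpy as np

# Simulate data from N(theta, sigma^2) with illustrative values
# theta = 2.0, sigma = 1.5 (arbitrary, chosen only for the demo).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

# Sample moments: m1 = (1/n) sum X_i, m2 = (1/n) sum X_i^2.
m1 = np.mean(x)
m2 = np.mean(x ** 2)

# Solve m1 = theta and m2 = theta^2 + sigma^2 for the estimators.
theta_hat = m1
sigma2_hat = m2 - m1 ** 2  # equals (1/n) sum (X_i - Xbar)^2

print(theta_hat, sigma2_hat)  # should be near 2.0 and 1.5^2 = 2.25
```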

Example 2: Suppose $X_1, \ldots, X_n \sim \mathrm{Bin}(k, p)$, where $k$ and $p$ are both unknown. Now

$$\mu_1 = kp, \qquad \mu_2 = kp(1-p) + k^2 p^2.$$

Solving

$$\bar{X}_n = kp, \qquad \frac{1}{n} \sum_{i=1}^n X_i^2 = kp(1-p) + k^2 p^2$$

gives

$$\hat{p} = \frac{\bar{X}_n - \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2}{\bar{X}_n}, \qquad \hat{k} = \frac{\bar{X}_n}{\hat{p}} = \frac{\bar{X}_n^2}{\bar{X}_n - \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X}_n)^2}.$$
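As a sanity check on these formulas, the following sketch (again illustrative, not from the notes; all numbers are arbitrary demo choices) computes $\hat{k}$ and $\hat{p}$ on simulated binomial data. One caveat: $\hat{k}$ need not be an integer, and the denominator $\bar{X}_n - \frac{1}{n}\sum_i (X_i - \bar{X}_n)^2$ can be small or even negative in unlucky samples, so these estimators can be unstable.

```python
import numpy as np

# Simulate Bin(k, p) data with illustrative values k = 10, p = 0.3.
rng = np.random.default_rng(1)
x = rng.binomial(n=10, p=0.3, size=500)

xbar = np.mean(x)
s2 = np.mean((x - xbar) ** 2)  # the (1/n) sum (X_i - Xbar)^2 term above

p_hat = (xbar - s2) / xbar
k_hat = xbar ** 2 / (xbar - s2)  # equivalently xbar / p_hat

print(k_hat, p_hat)  # should be roughly 10 and 0.3
```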

1.2 Maximum Likelihood Estimation

The most popular technique to derive estimators is via the principle of maximum likelihood. Suppose that $X_1, \ldots, X_n \sim p_\theta$, where $p_\theta$ denotes either the pmf or the pdf. The likelihood function is defined by:

$$L(\theta) \equiv L(\theta; X_1, \ldots, X_n) = \prod_{i=1}^n p_\theta(X_i).$$

The log-likelihood function is $\ell(\theta) \equiv \ell(\theta; X_1, \ldots, X_n) = \log L(\theta)$.

The maximum likelihood estimator, or MLE, denoted by $\hat{\theta}$ or $\hat{\theta}_n$, is the value of $\theta$ that maximizes $L(\theta)$. Note that $\hat{\theta}$ also maximizes $\ell(\theta)$. We write

$$\hat{\theta} = \mathop{\mathrm{argmax}}_{\theta} L(\theta) = \mathop{\mathrm{argmax}}_{\theta} \ell(\theta).$$

Keep in mind that $\hat{\theta}$ is a function of the data. Sometimes we will write $\hat{\theta}$ as $\hat{\theta}(X_1, \ldots, X_n)$ to emphasize this point. Later, we shall see that the MLE has many optimality properties in certain settings.

Finding the MLE might not be easy. Sometimes we need to resort to numerical techniques. The typical way to compute the MLE (supposing that we have $k$ unknown parameters) is to either analytically or numerically solve the system of equations:

$$\frac{\partial}{\partial \theta_i} \ell(\theta) = 0, \qquad i = 1, \ldots, k.$$

Note: We can throw away any constants not depending on $\theta$ in the likelihood function when we find the MLE. This does not affect the location of the maximizer.

Example 1: Suppose $X_1, \ldots, X_n \sim N(\theta, 1)$. Then the likelihood function is given as:

$$L(\theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} \exp\left( -(X_i - \theta)^2 / 2 \right) \propto e^{-n(\theta - \bar{X}_n)^2 / 2},$$

and, up to constants, $\ell(\theta) = -n(\theta - \bar{X}_n)^2 / 2$. We get $\hat{\theta} = \bar{X}_n$. Since $\ell''(\hat{\theta}) < 0$, this is indeed a maximum.

Example 2: Suppose that $X_1, \ldots, X_n \sim \mathrm{Ber}(p)$. Then the log-likelihood is given by

$$\ell(p) = \sum_{i=1}^n X_i \log p + \left( n - \sum_{i=1}^n X_i \right) \log(1 - p) = n \bar{X}_n \log p + n (1 - \bar{X}_n) \log(1 - p),$$

which is maximized at $\hat{p} = \bar{X}_n$.

Invariance of the MLE: The MLE is invariant to transformations. This means that the MLE of $r(\theta)$ is $r(\hat{\theta})$ for any function $r$. We will not prove this but it is a very useful fact. We will discuss other properties of the MLE in future lectures.
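When no closed form is available, we maximize $\ell(\theta)$ numerically, as noted above. The sketch below (illustrative only; the simulated data and seed are arbitrary) does this for the Bernoulli log-likelihood and compares the result with the closed-form answer $\hat{p} = \bar{X}_n$, using scipy's bounded scalar minimizer on the negative log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Bernoulli(0.7) sample; the true p and seed are arbitrary demo choices.
rng = np.random.default_rng(2)
x = rng.binomial(1, 0.7, size=200)
n, s = len(x), x.sum()

def neg_log_lik(p):
    # -l(p) = -[S log p + (n - S) log(1 - p)], negated for a minimizer
    return -(s * np.log(p) + (n - s) * np.log(1.0 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, x.mean())  # numerical MLE vs. closed form p_hat = Xbar
```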

1.3 Bayes Estimators

The third general method to derive estimators is the Bayes estimator. We treat $\theta$ as a random variable and assign it a distribution $p(\theta)$ called the prior distribution. (This opens up a bunch of philosophical questions that we will deal with later in the course.) Now we can use Bayes' theorem to get the distribution of $\theta$ given $X_1, \ldots, X_n$, which is called the posterior distribution:

$$p(\theta \mid x_1, \ldots, x_n) = \frac{p(\theta, x_1, \ldots, x_n)}{p(x_1, \ldots, x_n)} = \frac{p(x_1, \ldots, x_n \mid \theta)\, p(\theta)}{\int p(x_1, \ldots, x_n \mid \theta)\, p(\theta)\, d\theta} = \frac{L(\theta)\, p(\theta)}{\int L(\theta)\, p(\theta)\, d\theta} \propto L(\theta)\, p(\theta).$$

Finally, we can use the mean of $p(\theta \mid x_1, \ldots, x_n)$ as an estimator:

$$\hat{\theta} = \int \theta\, p(\theta \mid X_1, \ldots, X_n)\, d\theta.$$

We call this the Bayes estimator. We could also use the median or mode of the posterior.

Example 1: Suppose $X_1, \ldots, X_n \sim \mathrm{Ber}(\theta)$. We will first need to define the Beta distribution: $\theta$ has a Beta distribution with parameters $\alpha$ and $\beta$ if its density on $[0, 1]$ is

$$p(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}.$$

We write $\theta \sim \mathrm{Beta}(\alpha, \beta)$. The mean of the Beta distribution is $\alpha / (\alpha + \beta)$. Let $S = \sum_i X_i$. The posterior distribution is

$$p(\theta \mid X_1, \ldots, X_n) \propto L(\theta)\, p(\theta) \propto \theta^S (1 - \theta)^{n - S}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1} = \theta^{S + \alpha - 1} (1 - \theta)^{n - S + \beta - 1}.$$

Thus, the posterior distribution is $\mathrm{Beta}(S + \alpha, n - S + \beta)$. We write

$$\theta \mid X_1, \ldots, X_n \sim \mathrm{Beta}(S + \alpha, n - S + \beta).$$

The mean is $(S + \alpha)/(n + \alpha + \beta)$. Thus,

$$\hat{\theta} = \frac{S + \alpha}{n + \alpha + \beta}.$$

A common choice is $\alpha = \beta = 1$ (so that the prior for $\theta$ is uniform). In that case:

$$\hat{\theta} = \frac{n \bar{X}_n + 1}{n + 2} = \frac{n}{n + 2} \bar{X}_n + \frac{2}{n + 2} \cdot \frac{1}{2} = w \bar{X}_n + (1 - w) \cdot \frac{1}{2}, \qquad w = \frac{n}{n + 2},$$

which can be viewed as a convex combination of the MLE and the prior mean $1/2$. Note that, when $n$ is large, $\hat{\theta} \approx \bar{X}_n$, which is the MLE.
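Here is a small sketch (illustrative only; the true $\theta$, sample size, and seed are arbitrary demo choices) of the Beta-Bernoulli computation: it forms the posterior $\mathrm{Beta}(S + \alpha, n - S + \beta)$ under the uniform prior $\alpha = \beta = 1$ and checks that the posterior mean equals the convex combination $w \bar{X}_n + (1 - w)/2$.

```python
import numpy as np

# Bernoulli(theta) data; theta = 0.4 and n = 50 are arbitrary demo choices.
rng = np.random.default_rng(3)
x = rng.binomial(1, 0.4, size=50)

alpha, beta = 1.0, 1.0  # uniform prior: Beta(1, 1)
S, n = x.sum(), len(x)

# Posterior is Beta(S + alpha, n - S + beta); its mean is the Bayes estimator.
theta_bayes = (S + alpha) / (n + alpha + beta)

# Equivalent convex-combination form: w * Xbar + (1 - w) * prior mean.
w = n / (n + alpha + beta)  # with alpha = beta = 1 this is n / (n + 2)
theta_convex = w * x.mean() + (1 - w) * 0.5

print(theta_bayes, theta_convex)  # identical values
```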

Example 2: Suppose that $X_1, \ldots, X_n$ are drawn from $N(\theta, \sigma^2)$. Assume that $\sigma^2$ is known. Let us use the prior $\theta \sim N(\mu, \tau^2)$. It can be shown that the posterior is $N(a, b^2)$, where

$$a = \frac{n \tau^2}{\sigma^2 + n \tau^2} \bar{X}_n + \frac{\sigma^2}{\sigma^2 + n \tau^2} \mu, \qquad b^2 = \frac{\sigma^2 \tau^2}{\sigma^2 + n \tau^2}.$$

Exercise: Prove this.

The Bayes estimator is thus

$$\hat{\mu} = \frac{n \tau^2}{\sigma^2 + n \tau^2} \bar{X}_n + \frac{\sigma^2}{\sigma^2 + n \tau^2} \mu = w \left( \frac{1}{n} \sum_{i=1}^n X_i \right) + (1 - w) \mu,$$

where $w = \frac{n \tau^2}{\sigma^2 + n \tau^2}$. When $n$ is large, $w \approx 1$ and $\hat{\mu} \approx \bar{X}_n$.
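The same shrinkage pattern can be seen numerically. This short sketch (illustrative values throughout, not part of the original notes) computes the posterior mean $a$, posterior variance $b^2$, and the weight $w$ for the normal model with a normal prior.

```python
import numpy as np

# Known sigma^2, prior N(mu, tau^2); all numbers are arbitrary demo choices.
sigma2, tau2, mu = 1.0, 4.0, 0.0
rng = np.random.default_rng(4)
x = rng.normal(loc=1.5, scale=np.sqrt(sigma2), size=100)

n, xbar = len(x), x.mean()
w = n * tau2 / (sigma2 + n * tau2)

a = w * xbar + (1 - w) * mu               # posterior mean (Bayes estimator)
b2 = sigma2 * tau2 / (sigma2 + n * tau2)  # posterior variance
print(a, b2, w)  # for large n, w is close to 1 and a is close to xbar
```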