Mathematical statistics


October 4th, 2018. Lecture 12: Information

Where are we?
Week 1: Probability reviews
Week 2: Chapter 6: Statistics and Sampling Distributions
Week 4: Chapter 7: Point Estimation
Week 7: Chapter 8: Confidence Intervals
Week 10: Chapter 9: Test of Hypothesis
Week 14: Regression

Overview
7.1 Point estimate: unbiased estimator; mean squared error
7.2 Methods of point estimation: method of moments; method of maximum likelihood
7.3 Sufficient statistic
7.4 Information and efficiency: large-sample properties of the maximum likelihood estimator

Sufficient statistic

Sufficient statistic Definition A statistic $T = t(X_1, \dots, X_n)$ is said to be sufficient for making inferences about a parameter $\theta$ if the joint distribution of $X_1, X_2, \dots, X_n$, given that $T = t$, does not depend upon $\theta$ for every possible value $t$ of the statistic $T$.

Fisher-Neyman factorization theorem Theorem $T$ is sufficient for $\theta$ if and only if nonnegative functions $g$ and $h$ can be found such that $f(x_1, x_2, \dots, x_n; \theta) = g(t(x_1, x_2, \dots, x_n), \theta) \, h(x_1, x_2, \dots, x_n)$, i.e. the joint density can be factored into a product such that one factor, $h$, does not depend on $\theta$, and the other factor, which does depend on $\theta$, depends on $x$ only through $t(x)$.

Example Let $X_1, X_2, \dots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$, where $\lambda$ is unknown: $f(x; \lambda) = \frac{1}{x!} \lambda^x e^{-\lambda}$, $x = 0, 1, 2, \dots$ Find a sufficient statistic for $\lambda$.
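
One possible solution sketch (not spelled out on the slide): factor the joint pmf as

\[
f(x_1, \dots, x_n; \lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}
= \underbrace{\lambda^{\sum_i x_i} \, e^{-n\lambda}}_{g\left(\sum_i x_i,\ \lambda\right)} \cdot \underbrace{\prod_{i=1}^{n} \frac{1}{x_i!}}_{h(x_1, \dots, x_n)},
\]

so by the factorization theorem $T = \sum_{i=1}^{n} X_i$ is sufficient for $\lambda$.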

Jointly sufficient statistic Definition The $m$ statistics $T_1 = t_1(X_1, \dots, X_n)$, $T_2 = t_2(X_1, \dots, X_n)$, $\dots$, $T_m = t_m(X_1, \dots, X_n)$ are said to be jointly sufficient for the parameters $\theta_1, \theta_2, \dots, \theta_k$ if the joint distribution of $X_1, X_2, \dots, X_n$, given that $T_1 = t_1, T_2 = t_2, \dots, T_m = t_m$, does not depend upon $\theta_1, \theta_2, \dots, \theta_k$ for every possible value $t_1, t_2, \dots, t_m$ of the statistics.

Fisher-Neyman factorization theorem Theorem $T_1, T_2, \dots, T_m$ are jointly sufficient for $\theta_1, \theta_2, \dots, \theta_k$ if and only if nonnegative functions $g$ and $h$ can be found such that $f(x_1, x_2, \dots, x_n; \theta_1, \theta_2, \dots, \theta_k) = g(t_1, t_2, \dots, t_m, \theta_1, \theta_2, \dots, \theta_k) \, h(x_1, x_2, \dots, x_n)$.

Example Problem Let $X_1, X_2, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$, with density $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Prove that $T_1 = X_1 + \dots + X_n$ and $T_2 = X_1^2 + X_2^2 + \dots + X_n^2$ are jointly sufficient for the two parameters $\mu$ and $\sigma$.
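
A sketch of one way to do this exercise (filling in a step not shown on the slide): expanding the square in the exponent,

\[
\prod_{i=1}^{n} f_X(x_i) = (2\pi\sigma^2)^{-n/2} \exp\!\left( -\frac{1}{2\sigma^2} \sum_i x_i^2 + \frac{\mu}{\sigma^2} \sum_i x_i - \frac{n\mu^2}{2\sigma^2} \right),
\]

which depends on the data only through $t_1 = \sum_i x_i$ and $t_2 = \sum_i x_i^2$; taking $h \equiv 1$ gives the required factorization.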

Example Problem Let $X_1, X_2, \dots, X_n$ be a random sample from a Gamma distribution $f_X(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}$, where $\alpha$ and $\beta$ are unknown. Prove that $T_1 = X_1 + \dots + X_n$ and $T_2 = X_1 X_2 \cdots X_n$ are jointly sufficient for the two parameters $\alpha$ and $\beta$.

Information

Fisher information Definition The Fisher information $I(\theta)$ in a single observation from a pmf or pdf $f(x; \theta)$ is the variance of the random variable $U = \frac{\partial \log f(X, \theta)}{\partial \theta}$, which is $I(\theta) = \operatorname{Var}\left[ \frac{\partial \log f(X, \theta)}{\partial \theta} \right]$. Note: We always have $E[U] = 0$.

Fisher information We have $\sum_x f(x, \theta) = 1$. Thus, since $\frac{\partial \log f(x, \theta)}{\partial \theta} f(x, \theta) = \frac{\partial f(x, \theta)}{\partial \theta}$,
\[
E[U] = E\left[ \frac{\partial \log f(X, \theta)}{\partial \theta} \right] = \sum_x \frac{\partial \log f(x, \theta)}{\partial \theta} f(x, \theta) = \sum_x \frac{\partial f(x, \theta)}{\partial \theta} = \frac{\partial}{\partial \theta} \sum_x f(x, \theta) = 0 .
\]

Example Problem Let $X$ be distributed by

x          0          1
f(x, θ)    1 - θ      θ

Compute $I(\theta)$. Hint: If $x = 1$, then $f(x, \theta) = \theta$, so $u(x) = \frac{\partial \log f(x, \theta)}{\partial \theta} = \frac{1}{\theta}$. How about $x = 0$?

Example Problem Let $X$ be distributed by

x          0          1
f(x, θ)    1 - θ      θ

Compute $I(\theta)$. We have $\operatorname{Var}[U] = E[U^2] - (E[U])^2 = E[U^2] = \sum_{x=0,1} u^2(x) f(x, \theta) = \frac{1}{(1-\theta)^2}(1-\theta) + \frac{1}{\theta^2}\,\theta = \frac{1}{1-\theta} + \frac{1}{\theta} = \frac{1}{\theta(1-\theta)}$.
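
A quick numeric check of this formula (a simulation sketch; the value of θ and the sample size are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
x = rng.binomial(1, theta, size=1_000_000)   # Bernoulli(theta) sample

# Score U = d(log f)/d(theta): 1/theta when x = 1, -1/(1-theta) when x = 0
u = np.where(x == 1, 1 / theta, -1 / (1 - theta))

print("E[U]   ~", u.mean())                  # close to 0
print("Var[U] ~", u.var())                   # empirical Fisher information
print("1/(theta*(1-theta)) =", 1 / (theta * (1 - theta)))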

The Cramer-Rao Inequality Theorem Assume a random sample $X_1, X_2, \dots, X_n$ from the distribution with pmf or pdf $f(x, \theta)$ such that the set of possible values does not depend on $\theta$. If the statistic $T = t(X_1, X_2, \dots, X_n)$ is an unbiased estimator for the parameter $\theta$, then $\operatorname{Var}(T) \ge \frac{1}{n I(\theta)}$.
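
As a short check connecting this bound to the Bernoulli example above (not on the slide): there $I(\theta) = \frac{1}{\theta(1-\theta)}$, so any unbiased estimator $T$ of $\theta$ satisfies

\[
\operatorname{Var}(T) \ge \frac{1}{n I(\theta)} = \frac{\theta(1-\theta)}{n},
\]

and the sample mean $\bar{X}$ attains this bound, since $\operatorname{Var}(\bar{X}) = \theta(1-\theta)/n$.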

Efficiency Theorem Let $T = t(X_1, X_2, \dots, X_n)$ be an unbiased estimator for the parameter $\theta$. The ratio of the lower bound to the variance of $T$ is its efficiency: $\text{Efficiency} = \frac{1}{n I(\theta) V(T)} \le 1$. $T$ is said to be an efficient estimator if $T$ achieves the Cramer-Rao lower bound (i.e., the efficiency is 1). Note: An efficient estimator is a minimum variance unbiased estimator (MVUE).

Large Sample Properties of the MLE Theorem Given a random sample $X_1, X_2, \dots, X_n$ from the distribution with pmf or pdf $f(x, \theta)$ such that the set of possible values does not depend on $\theta$, for large $n$ the maximum likelihood estimator $\hat{\theta}$ has approximately a normal distribution with mean $\theta$ and variance $\frac{1}{n I(\theta)}$. More precisely, the limiting distribution of $\sqrt{n}(\hat{\theta} - \theta)$ is normal with mean $0$ and variance $1/I(\theta)$.
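
A simulation sketch of this theorem (the exponential model and the parameter values are illustrative assumptions, not from the slides): for $f(x; \theta) = \theta e^{-\theta x}$ the MLE is $\hat{\theta} = 1/\bar{X}$ and $I(\theta) = 1/\theta^2$, so $\sqrt{n}(\hat{\theta} - \theta)$ should be approximately $N(0, \theta^2)$.

import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 500, 20_000

# reps samples of size n from Exponential(rate=theta); the MLE is 1/xbar
samples = rng.exponential(scale=1 / theta, size=(reps, n))
mle = 1 / samples.mean(axis=1)

z = np.sqrt(n) * (mle - theta)    # sqrt(n) * (theta_hat - theta)
print("mean ~", z.mean())         # close to 0
print("var  ~", z.var())          # close to 1/I(theta) = theta^2 = 4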

The Central Limit Theorem Theorem Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then, in the limit as $n \to \infty$, the standardized version of $\bar{X}$ has the standard normal distribution: $\lim_{n \to \infty} P\left( \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z \right) = P[Z \le z] = \Phi(z)$.
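
A minimal simulation of the statement (the uniform parent distribution is an arbitrary choice for illustration):

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n, reps = 100, 50_000
mu, sigma = 0.5, sqrt(1 / 12)                  # mean and sd of Uniform(0, 1)

xbar = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
z = (xbar - mu) / (sigma / sqrt(n))            # standardized sample mean

phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))   # standard normal cdf
for q in (-1.0, 0.0, 1.5):
    print(q, (z <= q).mean(), phi(q))          # empirical cdf vs Phi(q)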

Chapter 7: Summary

Overview
7.1 Point estimate: unbiased estimator; mean squared error; bootstrap
7.2 Methods of point estimation: method of moments; method of maximum likelihood
7.3 Sufficient statistic
7.4 Information and efficiency: large-sample properties of the maximum likelihood estimator

Point estimate Definition A point estimate $\hat{\theta}$ of a parameter $\theta$ is a single number that can be regarded as a sensible value for $\theta$.

population parameter ⇒ sample ⇒ estimate
$\theta$ ⇒ $X_1, X_2, \dots, X_n$ ⇒ $\hat{\theta}$

Mean Squared Error Measuring the error of estimation: $\hat{\theta} - \theta$ or $(\hat{\theta} - \theta)^2$. The error of estimation is random. Definition The mean squared error of an estimator $\hat{\theta}$ is $E[(\hat{\theta} - \theta)^2]$.

Bias-variance decomposition Theorem $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + \left( E(\hat{\theta}) - \theta \right)^2$ Bias-variance decomposition: mean squared error = variance of estimator + (bias)$^2$.
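
A Monte Carlo sketch of the decomposition (the normal model and the deliberately biased estimator $\hat{\mu} = 0.9\,\bar{X}$ are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 25, 100_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
est = 0.9 * xbar                          # a deliberately biased estimator of mu

mse = ((est - mu) ** 2).mean()
decomp = est.var() + (est.mean() - mu) ** 2
print("MSE          ~", mse)
print("Var + Bias^2 ~", decomp)           # matches the MSE up to noise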

Unbiased estimators Definition A point estimator $\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$ for every possible value of $\theta$. For an unbiased estimator, bias = 0, so mean squared error = variance of estimator.

Example 1 Problem Consider a random sample $X_1, \dots, X_n$ from the pdf $f(x) = \frac{1 + \theta x}{2}$, $-1 \le x \le 1$. Show that $\hat{\theta} = 3\bar{X}$ is an unbiased estimator of $\theta$.
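
A one-line verification (solution sketch, not on the slide):

\[
E[X] = \int_{-1}^{1} x \cdot \frac{1 + \theta x}{2} \, dx = \frac{\theta}{3},
\qquad\text{so}\qquad
E[\hat{\theta}] = E[3\bar{X}] = 3\,E[X] = \theta .
\]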

Overview
7.1 Point estimate: unbiased estimator; mean squared error; bootstrap
7.2 Methods of point estimation: method of moments; method of maximum likelihood
7.3 Sufficient statistic
7.4 Information and efficiency: large-sample properties of the maximum likelihood estimator

Method of moments: ideas Let $X_1, \dots, X_n$ be a random sample from a distribution with pmf or pdf $f(x; \theta_1, \theta_2, \dots, \theta_m)$. Assume that for $k = 1, \dots, m$, $\frac{X_1^k + X_2^k + \dots + X_n^k}{n} = E(X^k)$. Solve the system of equations for $\theta_1, \theta_2, \dots, \theta_m$.

Method of moments: Example 4 Problem Suppose that for a parameter $0 \le \theta \le 1$, $X$ is the outcome of the roll of a four-sided tetrahedral die:

x       1        2       3             4
p(x)    3θ/4     θ/4     3(1 - θ)/4    (1 - θ)/4

Suppose the die is rolled 10 times with outcomes 4, 1, 2, 3, 1, 2, 3, 4, 2, 3. Use the method of moments to obtain an estimator of $\theta$.
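
A solution sketch in code (the algebra, filled in here: $E(X) = \frac{3\theta}{4} + \frac{2\theta}{4} + \frac{9(1-\theta)}{4} + \frac{4(1-\theta)}{4} = \frac{13 - 8\theta}{4}$, so equating $\bar{x} = E(X)$ gives $\hat{\theta} = (13 - 4\bar{x})/8$):

import numpy as np

outcomes = np.array([4, 1, 2, 3, 1, 2, 3, 4, 2, 3])
xbar = outcomes.mean()                 # first sample moment: 2.5

# E(X) = (13 - 8*theta)/4; solve xbar = E(X) for theta
theta_hat = (13 - 4 * xbar) / 8
print("theta_hat =", theta_hat)        # 0.375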

Maximum likelihood estimator Let $X_1, X_2, \dots, X_n$ have joint pmf or pdf $f_{\text{joint}}(x_1, x_2, \dots, x_n; \theta)$, where $\theta$ is unknown. When $x_1, \dots, x_n$ are the observed sample values and this expression is regarded as a function of $\theta$, it is called the likelihood function. The maximum likelihood estimate $\theta_{ML}$ is the value of $\theta$ that maximizes the likelihood function: $f_{\text{joint}}(x_1, x_2, \dots, x_n; \theta_{ML}) \ge f_{\text{joint}}(x_1, x_2, \dots, x_n; \theta)$ for all $\theta$.

How to find the MLE?
Step 1: Write down the likelihood function.
Step 2: Can you find the maximum of this function?
Step 3: Try taking the logarithm of this function.
Step 4: Find the maximum of this new function.
To find the maximum of a function of $\theta$: compute the derivative of the function with respect to $\theta$, set the derivative equal to 0, and solve the resulting equation.

Example 3 Let $X_1, \dots, X_{10}$ be a random sample of size $n = 10$ from a distribution with pdf $f(x) = (\theta + 1) x^{\theta}$ if $0 \le x \le 1$, and $f(x) = 0$ otherwise. The observed $x_i$'s are 0.92, 0.79, 0.90, 0.65, 0.86, 0.47, 0.73, 0.97, 0.94, 0.77. Question: Use the method of maximum likelihood to obtain an estimator of $\theta$.
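
A solution sketch following the four steps above: the log-likelihood is $n \ln(\theta + 1) + \theta \sum_i \ln x_i$, and setting its derivative $\frac{n}{\theta + 1} + \sum_i \ln x_i$ to zero gives the closed form $\hat{\theta} = -\frac{n}{\sum_i \ln x_i} - 1$. In code:

import numpy as np

x = np.array([0.92, 0.79, 0.90, 0.65, 0.86, 0.47, 0.73, 0.97, 0.94, 0.77])
n = len(x)

# log-likelihood: n*log(theta + 1) + theta*sum(log(x_i)); maximized in closed form
theta_hat = -n / np.log(x).sum() - 1
print("theta_hat =", theta_hat)    # about 3.12 for these data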

Overview
7.1 Point estimate: unbiased estimator; mean squared error; bootstrap
7.2 Methods of point estimation: method of moments; method of maximum likelihood
7.3 Sufficient statistic
7.4 Information and efficiency: large-sample properties of the maximum likelihood estimator

Fisher-Neyman factorization theorem Theorem $T$ is sufficient for $\theta$ if and only if nonnegative functions $g$ and $h$ can be found such that $f(x_1, x_2, \dots, x_n; \theta) = g(t(x_1, x_2, \dots, x_n), \theta) \, h(x_1, x_2, \dots, x_n)$, i.e. the joint density can be factored into a product such that one factor, $h$, does not depend on $\theta$, and the other factor, which does depend on $\theta$, depends on $x$ only through $t(x)$.

Fisher information Definition The Fisher information $I(\theta)$ in a single observation from a pmf or pdf $f(x; \theta)$ is the variance of the random variable $U = \frac{\partial \log f(X, \theta)}{\partial \theta}$, which is $I(\theta) = \operatorname{Var}\left[ \frac{\partial \log f(X, \theta)}{\partial \theta} \right]$. Note: We always have $E[U] = 0$.

The Cramer-Rao Inequality Theorem Assume a random sample $X_1, X_2, \dots, X_n$ from the distribution with pmf or pdf $f(x, \theta)$ such that the set of possible values does not depend on $\theta$. If the statistic $T = t(X_1, X_2, \dots, X_n)$ is an unbiased estimator for the parameter $\theta$, then $V(T) \ge \frac{1}{n I(\theta)}$.

Large Sample Properties of the MLE Theorem Given a random sample $X_1, X_2, \dots, X_n$ from the distribution with pmf or pdf $f(x, \theta)$ such that the set of possible values does not depend on $\theta$, for large $n$ the maximum likelihood estimator $\hat{\theta}$ has approximately a normal distribution with mean $\theta$ and variance $\frac{1}{n I(\theta)}$. More precisely, the limiting distribution of $\sqrt{n}(\hat{\theta} - \theta)$ is normal with mean $0$ and variance $1/I(\theta)$.