36-705: Intermediate Statistics, Fall 2017
Lecture 12: September 27
Lecturer: Siva Balakrishnan

Today we will discuss sufficiency in more detail and then begin to discuss some general strategies for constructing estimators.

12.1 Minimal sufficiency

As we have seen previously, sufficient statistics are not unique. Furthermore, it seems at least intuitively that some sufficient statistics present much more reduction than others (for instance, in the Poisson model both the mean and the entire sample are sufficient). This motivates the following definition of minimal sufficient statistics:

Minimal Sufficiency: A statistic $T(x_1, \ldots, x_n)$ is minimal sufficient if it is sufficient, and furthermore for any other sufficient statistic $S(x_1, \ldots, x_n)$ we can write $T(x_1, \ldots, x_n) = g(S(x_1, \ldots, x_n))$, i.e. $T$ is a function of $S$.

There is unfortunately no straightforward way to verify this condition directly. Analogous to the factorization theorem, we have a condition that we can check.

Theorem 12.1 Define
$$R(x_1, \ldots, x_n, y_1, \ldots, y_n; \theta) = \frac{p(y_1, \ldots, y_n; \theta)}{p(x_1, \ldots, x_n; \theta)}.$$
Suppose that a statistic $T$ has the following property: $R(x_1, \ldots, x_n, y_1, \ldots, y_n; \theta)$ does not depend on $\theta$ if and only if $T(y_1, \ldots, y_n) = T(x_1, \ldots, x_n)$. Then $T$ is an MSS.

Before we prove the theorem let us consider some examples.

Example 12.2 Suppose that $Y_1, \ldots, Y_n$ are i.i.d. Poisson($\theta$). Then
$$p(y_1, \ldots, y_n; \theta) = \frac{e^{-n\theta} \theta^{\sum_i y_i}}{\prod_i y_i!}, \qquad \frac{p(y_1, \ldots, y_n; \theta)}{p(x_1, \ldots, x_n; \theta)} = \theta^{\sum_i y_i - \sum_i x_i} \frac{\prod_i x_i!}{\prod_i y_i!},$$
which is independent of $\theta$ iff $\sum_i y_i = \sum_i x_i$. This implies that $T(X_1, \ldots, X_n) = \sum_i X_i$ is a minimal sufficient statistic for $\theta$.
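As a quick numerical sanity check of Theorem 12.1 in the Poisson case, the following short Python sketch (my own illustration, not part of the original notes; the sample values are arbitrary) shows that the likelihood ratio is constant in $\theta$ exactly when the two samples have the same sum:

    import math

    def poisson_loglik(xs, theta):
        # log of prod_i e^{-theta} * theta^{x_i} / x_i!
        return sum(-theta + x * math.log(theta) - math.lgamma(x + 1) for x in xs)

    x = [3, 1, 4]   # sum = 8
    y = [2, 2, 4]   # sum = 8, same as x
    z = [5, 1, 1]   # sum = 7, different from x

    for theta in [0.5, 1.0, 2.0]:
        r_xy = math.exp(poisson_loglik(y, theta) - poisson_loglik(x, theta))
        r_xz = math.exp(poisson_loglik(z, theta) - poisson_loglik(x, theta))
        print(theta, r_xy, r_xz)  # r_xy is the same for every theta; r_xz varies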

The minimal sufficient statistic is not unique. But the minimal sufficient partition is unique.

Example 12.3 Cauchy. Then
$$p(x; \theta) = \frac{1}{\pi(1 + (x - \theta)^2)}, \qquad \frac{p(y_1, \ldots, y_n; \theta)}{p(x_1, \ldots, x_n; \theta)} = \frac{\prod_{i=1}^n \{1 + (x_i - \theta)^2\}}{\prod_{j=1}^n \{1 + (y_j - \theta)^2\}}.$$
The ratio is a constant function of $\theta$ if
$$T(X_1, \ldots, X_n) = (X_{(1)}, \ldots, X_{(n)}).$$
It is technically harder to show that the ratio is independent of $\theta$ only if $T$ is the order statistics, but it can be done using theorems about polynomials. Having shown this, one can conclude that the order statistics are the minimal sufficient statistic for $\theta$.
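The "if" direction of Example 12.3 is easy to check numerically: when $(y_1, \ldots, y_n)$ is a permutation of $(x_1, \ldots, x_n)$ (i.e. the order statistics agree), the ratio is constant in $\theta$. A small sketch (my own illustration, with arbitrary sample values):

    import math

    def cauchy_ratio(ys, xs, theta):
        # p(y_1,...,y_n; theta) / p(x_1,...,x_n; theta)
        num = math.prod(1 + (x - theta) ** 2 for x in xs)
        den = math.prod(1 + (y - theta) ** 2 for y in ys)
        return num / den

    x = [0.3, -1.2, 2.5]
    y = [2.5, 0.3, -1.2]   # a permutation of x: identical order statistics
    z = [0.3, -1.2, 2.6]   # different order statistics

    for theta in [-1.0, 0.0, 1.0]:
        print(theta, cauchy_ratio(y, x, theta), cauchy_ratio(z, x, theta))
        # the first ratio is 1 for every theta; the second varies with theta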

Proof: This proof is a bit technical, so feel free to skip it. We prove the theorem in two steps: we first show that $T$ is a sufficient statistic, and then we check that it is minimal. We define the partition induced by $T$ as $\{A_t : t \in \text{Range}(T)\}$, and with each set $A_t$ in the partition we associate a representative $(x_{t,1}, \ldots, x_{t,n}) \in A_t$.

$T$ is sufficient: We look at the joint distribution at any $(x_1, \ldots, x_n)$. Suppose that $T(x_1, \ldots, x_n) = u$; then consider $(y_1, \ldots, y_n) := (x_{u,1}, \ldots, x_{u,n})$, the representative of $A_u$. Observe that $(y_1, \ldots, y_n)$ depends only on $T(x_1, \ldots, x_n)$, i.e. the point $y$ is a function of the statistic $T$ only. Now we have that
$$p(x_1, \ldots, x_n; \theta) = p(y_1, \ldots, y_n; \theta) R(y_1, \ldots, y_n, x_1, \ldots, x_n; \theta),$$
and since $T(x_1, \ldots, x_n) = T(y_1, \ldots, y_n)$, $R$ does not depend on $\theta$. Recalling that $(y_1, \ldots, y_n)$ is only a function of $T(x_1, \ldots, x_n)$, we have that
$$p(x_1, \ldots, x_n; \theta) = g(T(x_1, \ldots, x_n); \theta) h(x_1, \ldots, x_n),$$
where $g$ corresponds to the first term and $h$ corresponds to the $R$ term. We conclude that $T$ is sufficient.

$T$ is minimal: As a preliminary, we note that the definition of a minimal sufficient statistic can be written equivalently as: $T$ is an MSS if for any other sufficient statistic $S$, whenever $S(x_1, \ldots, x_n) = S(y_1, \ldots, y_n)$ we also have $T(x_1, \ldots, x_n) = T(y_1, \ldots, y_n)$. This is equivalent to the statement that $T$ is a function of $S$.

Consider any other sufficient statistic $S$. Suppose that $S(x_1, \ldots, x_n) = S(y_1, \ldots, y_n)$; then by the factorization theorem we have that
$$p(x_1, \ldots, x_n; \theta) = g(S(x_1, \ldots, x_n); \theta) h(x_1, \ldots, x_n) = g(S(y_1, \ldots, y_n); \theta) h(y_1, \ldots, y_n) \frac{h(x_1, \ldots, x_n)}{h(y_1, \ldots, y_n)} = p(y_1, \ldots, y_n; \theta) \frac{h(x_1, \ldots, x_n)}{h(y_1, \ldots, y_n)},$$
so we have that $R(x_1, \ldots, x_n, y_1, \ldots, y_n; \theta)$ does not depend on $\theta$. So we conclude that $T(x_1, \ldots, x_n) = T(y_1, \ldots, y_n)$, and so $T$ is minimal.

12.2 Minimal sufficiency and the likelihood

Although minimal sufficient statistics are not unique, they induce a unique partition on the possible datasets. This partition is also induced by the likelihood, i.e. suppose we have a partition such that $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ are placed in the same set of the partition iff $L(\theta; x_1, \ldots, x_n) \propto L(\theta; y_1, \ldots, y_n)$; then the partition is the minimal sufficient partition. You will prove this on your homework, but it is a simple consequence of the characterization we have seen in the previous section.

12.3 Sufficiency - the risk reduction viewpoint

We will return to the concept of risk more formally in the next few lectures, but for now let us try to understand the main ideas.

Setting: Suppose we observe $X_1, \ldots, X_n \sim p(x; \theta)$ and we would like to estimate $\theta$, i.e. we want to construct some function of the data that is close in some sense to $\theta$. We construct an estimator $\hat{\theta}(X_1, \ldots, X_n)$. In order to evaluate our estimator we might consider how far our estimate is from $\theta$ on average, i.e. we can define the risk
$$R(\hat{\theta}, \theta) = \mathbb{E}(\hat{\theta} - \theta)^2.$$
We will see this again later on, but the risk of an estimator can be decomposed into its bias and variance, i.e.
$$\mathbb{E}(\hat{\theta} - \theta)^2 = (\mathbb{E}\hat{\theta} - \theta)^2 + \mathbb{E}(\hat{\theta} - \mathbb{E}\hat{\theta})^2,$$
where the first term is the squared bias and the second is the variance.
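To make the decomposition concrete, here is a short Monte Carlo sketch (my own illustration, not from the notes) that estimates the risk of the sample mean $\hat{\theta} = \bar{X}$ for $N(\theta, 1)$ data and confirms that it matches squared bias plus variance (here $0 + 1/n$):

    import numpy as np

    rng = np.random.default_rng(0)
    theta, n, reps = 2.0, 10, 100_000

    # reps independent datasets of size n, one estimate per dataset
    est = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)

    risk = np.mean((est - theta) ** 2)    # E(theta_hat - theta)^2
    bias_sq = (est.mean() - theta) ** 2   # (E theta_hat - theta)^2
    var = est.var()                       # E(theta_hat - E theta_hat)^2

    print(risk, bias_sq + var)  # both are approximately 1/n = 0.1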

There is a strong sense in which estimators that do not depend only on sufficient statistics can be improved. This is known as the Rao-Blackwell theorem. Let $\hat{\theta}$ be an estimator. Let $T$ be any sufficient statistic and define $\tilde{\theta} = \mathbb{E}[\hat{\theta} \mid T]$.

Rao-Blackwell theorem: $R(\tilde{\theta}, \theta) \leq R(\hat{\theta}, \theta)$.

We will not spend too much time on this, but let's see a quick example and then prove the result.

Example: Suppose we toss a coin $n$ times, i.e. $X_1, \ldots, X_n \sim \text{Ber}(\theta)$. We consider the estimator
$$\hat{\theta} = X_1,$$
and the sufficient statistic $T = \sum_i X_i$; then
$$\tilde{\theta} = \mathbb{E}[X_1 \mid T] = \mathbb{E}\Big[X_1 \,\Big|\, \sum_i X_i\Big].$$
I claim that this conditional expectation is simply the average, i.e. $\tilde{\theta} = \frac{1}{n}\sum_i X_i$. First, let us check this in the case when $n = 2$. If $X_1 + X_2 = 2$ then $X_1 = 1$, and if $X_1 + X_2 = 0$ then $X_1 = 0$. In the case when $X_1 + X_2 = 1$, we have $X_1 = 1$ with probability $1/2$ and $0$ with probability $1/2$. So we conclude the conditional expectation is $(X_1 + X_2)/2$. More generally, if we have $\sum_i X_i = k$, then of the $\binom{n}{k}$ equally likely possibilities we have that $X_1 = 1$ for $\binom{n-1}{k-1}$ of them, so that the conditional expectation is simply
$$\mathbb{E}\Big[X_1 \,\Big|\, \sum_i X_i = k\Big] = \frac{\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{k}{n},$$
as desired.

We observe that both estimators are unbiased, but the variance of the Rao-Blackwellized estimator is $\theta(1-\theta)/n$, as opposed to the original estimator, which has variance $\theta(1-\theta)$.

Proof of Rao-Blackwell: Observe that
$$R(\tilde{\theta}, \theta) = \mathbb{E}[(\mathbb{E}[\hat{\theta} \mid T] - \theta)^2] = \mathbb{E}[(\mathbb{E}[\hat{\theta} - \theta \mid T])^2] \leq \mathbb{E}[\mathbb{E}[(\hat{\theta} - \theta)^2 \mid T]] = R(\hat{\theta}, \theta).$$
The inequality is Jensen's inequality (equivalently, just $\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \geq 0$). A question worth pondering is: why does it matter for Rao-Blackwellization that $T$ is a sufficient statistic?
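A quick simulation of the coin-tossing example (my own sketch; the values of $\theta$ and $n$ are arbitrary) shows the variance reduction from Rao-Blackwellization:

    import numpy as np

    rng = np.random.default_rng(1)
    theta, n, reps = 0.3, 20, 200_000

    X = rng.binomial(1, theta, size=(reps, n))
    theta_hat = X[:, 0]            # crude estimator: theta_hat = X_1
    theta_tilde = X.mean(axis=1)   # E[X_1 | sum_i X_i] = sample mean

    print(theta_hat.var(), theta * (1 - theta))        # ~ theta(1 - theta)
    print(theta_tilde.var(), theta * (1 - theta) / n)  # ~ theta(1 - theta)/n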

12.4 More examples with the likelihood

Example 12.4 Suppose that $X = (X_1, X_2, X_3) \sim \text{Multinomial}(n, p)$ where
$$p = (p_1, p_2, p_3) = (\theta, \theta, 1 - 2\theta).$$
So
$$p(x; \theta) = \binom{n}{x_1\, x_2\, x_3} p_1^{x_1} p_2^{x_2} p_3^{x_3} = \binom{n}{x_1\, x_2\, x_3} \theta^{x_1 + x_2} (1 - 2\theta)^{x_3}.$$
Suppose that $X = (1, 3, 2)$. Then
$$L(\theta) = \frac{6!}{1!\, 3!\, 2!}\, \theta^1 \theta^3 (1 - 2\theta)^2 \propto \theta^4 (1 - 2\theta)^2.$$
Now suppose that $X = (2, 2, 2)$. Then
$$L(\theta) = \frac{6!}{2!\, 2!\, 2!}\, \theta^2 \theta^2 (1 - 2\theta)^2 \propto \theta^4 (1 - 2\theta)^2.$$
Hence, the likelihood function is the same (up to a constant) for these two datasets.

Example 12.5 $X_1, \ldots, X_n \sim N(\mu, 1)$. Then
$$L(\mu) = \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2} \sum_i (x_i - \mu)^2\right\} \propto \exp\left\{-\frac{n}{2} (\bar{x} - \mu)^2\right\}.$$

Example 12.6 Let $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$. Then for $p \in [0, 1]$,
$$L(p) \propto p^X (1 - p)^{n - X},$$
where $X = \sum_i X_i$.
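Example 12.4 is easy to verify numerically: the two likelihoods differ only by a multiplicative constant, so their ratio does not depend on $\theta$. A small sketch (my own illustration):

    from math import factorial

    def lik(x, theta):
        # multinomial likelihood with p = (theta, theta, 1 - 2*theta)
        n = sum(x)
        coef = factorial(n) // (factorial(x[0]) * factorial(x[1]) * factorial(x[2]))
        return coef * theta ** (x[0] + x[1]) * (1 - 2 * theta) ** x[2]

    for theta in [0.1, 0.2, 0.3]:
        print(theta, lik((1, 3, 2), theta) / lik((2, 2, 2), theta))
        # the ratio is 60/90 = 2/3 for every theta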

12.5 Estimation

Now we begin discussing the estimation problem more formally.

$X_1, \ldots, X_n \sim p(x; \theta)$. We want to estimate $\theta = (\theta_1, \ldots, \theta_k)$. An estimator
$$\hat{\theta} = \hat{\theta}_n = w(X_1, \ldots, X_n)$$
is a function of the data. Keep in mind that the parameter is a fixed, unknown constant. The estimator is a random variable.

For now, we will discuss three methods of constructing estimators:

1. The Method of Moments (MOM)
2. Maximum likelihood (MLE)
3. Bayesian estimators.

Some Terminology. Throughout these notes, we will use the following terminology:

1. $\mathbb{E}_\theta(\hat{\theta}) = \int \hat{\theta}(x_1, \ldots, x_n)\, p(x_1; \theta) \cdots p(x_n; \theta)\, dx_1 \cdots dx_n$.
2. Bias: $\mathbb{E}_\theta(\hat{\theta}) - \theta$.
3. The distribution of $\hat{\theta}$ is called its sampling distribution.
4. The standard deviation of $\hat{\theta}$ is called the standard error, denoted by $\text{se}(\hat{\theta})$.
5. $\hat{\theta}$ is consistent if $\hat{\theta} \xrightarrow{p} \theta$ as $n \to \infty$.
6. Later we will see that if bias $\to 0$ and $\text{Var}(\hat{\theta}) \to 0$ as $n \to \infty$, then $\hat{\theta}$ is consistent.

12.6 The Method of Moments

Suppose that $\theta = (\theta_1, \ldots, \theta_k)$. Define
$$m_1 = \frac{1}{n} \sum_i X_i, \qquad \mu_1(\theta) = \mathbb{E}(X_i),$$
$$m_2 = \frac{1}{n} \sum_i X_i^2, \qquad \mu_2(\theta) = \mathbb{E}(X_i^2),$$
$$\vdots$$
$$m_k = \frac{1}{n} \sum_i X_i^k, \qquad \mu_k(\theta) = \mathbb{E}(X_i^k).$$

Let $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_k)$ solve:
$$m_j = \mu_j(\hat{\theta}), \quad j = 1, \ldots, k.$$
In other words, we equate the first $k$ sample moments with the first $k$ theoretical moments. This defines $k$ equations with $k$ unknowns.

Example 12.7 $N(\beta, \sigma^2)$ with $\theta = (\beta, \sigma^2)$. Then $\mu_1 = \beta$ and $\mu_2 = \sigma^2 + \beta^2$. Equate:
$$\frac{1}{n} \sum_i X_i = \hat{\beta}, \qquad \frac{1}{n} \sum_i X_i^2 = \hat{\sigma}^2 + \hat{\beta}^2$$
to get
$$\hat{\beta} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_i (X_i - \bar{X})^2.$$

Example 12.8 Suppose
$$X_1, \ldots, X_n \sim \text{Binomial}(k, p),$$
where both $k$ and $p$ are unknown. We get
$$\hat{k}\hat{p} = \bar{X}, \qquad \frac{1}{n} \sum_i X_i^2 = \hat{k}\hat{p}(1 - \hat{p}) + \hat{k}^2 \hat{p}^2,$$
giving
$$\hat{p} = \frac{\bar{X}}{\hat{k}}, \qquad \hat{k} = \frac{\bar{X}^2}{\bar{X} - \frac{1}{n} \sum_i (X_i - \bar{X})^2}.$$

The method of moments was popular many years ago because it is often easy to compute. Lately, it has attracted attention again. For example, there is a large literature on estimating mixtures of Gaussians using the method of moments.
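The following sketch (my own, with arbitrary parameter values) applies the method of moments to simulated data from Examples 12.7 and 12.8; the estimates should land close to the true parameters:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000

    # Example 12.7: N(beta, sigma^2)
    beta, sigma2 = 1.5, 4.0
    x = rng.normal(beta, np.sqrt(sigma2), size=n)
    m1, m2 = x.mean(), np.mean(x ** 2)
    beta_hat = m1
    sigma2_hat = m2 - m1 ** 2        # equals (1/n) sum_i (x_i - xbar)^2
    print(beta_hat, sigma2_hat)      # ~ (1.5, 4.0)

    # Example 12.8: Binomial(k, p) with both k and p unknown
    k, p = 12, 0.35
    y = rng.binomial(k, p, size=n)
    ybar = y.mean()
    s2 = np.mean((y - ybar) ** 2)
    k_hat = ybar ** 2 / (ybar - s2)  # unstable if ybar - s2 is near zero
    p_hat = ybar / k_hat
    print(k_hat, p_hat)              # ~ (12, 0.35)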