
Chapter 3. Sufficient statistics and variance reduction

Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.m./d.f. $f(x \mid \theta)$. A function $T(X_1, X_2, \ldots, X_n) = T(X)$ of these observations is called a statistic. From a statistical point of view, taking a statistic of the observations is equivalent to taking into account only part of the information in the sample.

Example: An experiment can result in either success or failure, with probabilities $\theta$ and $1-\theta$ respectively. The experiment is performed independently $n$ times. Let
\[
X_i = \begin{cases} 1 & \text{if the $i$th repetition results in success} \\ 0 & \text{if the $i$th repetition results in failure.} \end{cases}
\]
Let $S_m = \sum_{i=1}^{m} X_i$ and $S_{n-m} = \sum_{i=m+1}^{n} X_i$, and consider the bivariate statistic $T(X) = (S_m, S_{n-m})$. This statistic tells us how many successes were obtained in the first $m$ experiments and how many in the last $n-m$ experiments. The information on which particular experiments produced the successes is not retained; neither is the information on how many successes were obtained in the first $r$ experiments for $r < m$.

Consider now the statistic $U(X) = \sum_{i=1}^{n} X_i$. This statistic gives the total number of successes in the $n$ repetitions; all other information in the sample is not retained by $U(X)$. Note, in fact, that $U(X)$ retains even less information than $T(X)$. Note also that $U(X) = S_m + S_{n-m}$, i.e. $U(X)$ is a function (i.e. a statistic) of $T(X)$. Consequently we come to the conclusion that every time we take a function of a statistic we drop some of the information.
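To make this data reduction concrete, here is a minimal numerical sketch (not part of the notes; it assumes NumPy is available, and the values of $n$, $m$ and $\theta$ are arbitrary illustrative choices) that computes $T(X) = (S_m, S_{n-m})$ and $U(X) = \sum_{i=1}^{n} X_i$ from a simulated Bernoulli sample and checks that $U(X)$ is a function of $T(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, m = 0.4, 10, 4              # illustrative values only

x = rng.binomial(1, theta, size=n)    # X_1, ..., X_n: one Bernoulli(theta) sample

s_m = x[:m].sum()                     # S_m      = successes in the first m trials
s_nm = x[m:].sum()                    # S_{n-m}  = successes in the last n-m trials
T = (s_m, s_nm)                       # bivariate statistic T(X)
U = x.sum()                           # U(X) = total number of successes

# U(X) = S_m + S_{n-m}: U is a function of T, so it retains even less information
assert U == T[0] + T[1]
print("sample:", x, " T(X) =", T, " U(X) =", U)
```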

We have argued in the past that the Fisher information
\[
I(\theta) = I_X(\theta) = E\!\left[S^2(X)\right], \qquad S(X) = \frac{d}{d\theta} \log f_X(X \mid \theta),
\]
is a measure of the amount of information in the sample $X$ about the parameter $\theta$. Now, if $T(X)$ is a statistic, a measure of the amount of information in $T$ about $\theta$ is given by the Fisher information of $T$, defined by $I_T(\theta) = E\!\left[S^2(T)\right]$ with $S(T) = \frac{d}{d\theta} \log f_T(T \mid \theta)$, where $f_T(t \mid \theta)$ is the p.m./d.f. of the statistic $T$. If $\hat\theta_T$ is an unbiased estimator of $\theta$ based on the statistic $T$ instead of on the whole sample $X$, then the Cramér-Rao inequality becomes
\[
\operatorname{Var}(\hat\theta_T) \ge \frac{1}{I_T(\theta)}. \qquad (3.1)
\]
Now, in view of the remarks made above about a statistic taking into account only part of the information in the sample, we should expect that
\[
I_T(\theta) \le I_X(\theta), \qquad (3.2)
\]
with equality holding if and only if the statistic has retained all the relevant information about $\theta$ and dropped only information which does not relate to $\theta$. A statistic which retains all the relevant information about $\theta$ and discards only information which does not relate to $\theta$ is said to be sufficient for $\theta$.

Unfortunately, tempting as it may be, we cannot adopt equality in (3.2) as the formal definition of sufficiency of a statistic $T$, as this is only possible when there is enough regularity for the Fisher information to be defined. We need a formal definition of sufficiency which holds in all cases, irrespective of whether this regularity is present or not.

Formal definition of sufficiency: A statistic $T(X)$ of the observations $X$ with p.m./d.f. $f_X(x \mid \theta)$ is said to be sufficient for the parameter $\theta$ if the conditional distribution of $X$ given $T = t$ is free of $\theta$, i.e. if the conditional p.m./d.f. $f_X(x \mid T = t)$ does not involve $\theta$.
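Before moving on, inequality (3.2) can be checked numerically for the Bernoulli model of the opening example, where both Fisher informations are available in closed form. The sketch below (an added illustration, not part of the notes; it assumes NumPy and SciPy, and the values of $\theta$ and $n$ are arbitrary) compares $I_X(\theta) = n/\{\theta(1-\theta)\}$ with $I_T(\theta)$ for $T = \sum_{i=1}^{n} X_i \sim \text{Binomial}(n, \theta)$, where equality holds, and with the information carried by $T = X_1$ alone, where the inequality is strict.

```python
import numpy as np
from scipy.stats import binom

def info_full_sample(theta, n):
    # I_X(theta) for n i.i.d. Bernoulli(theta) observations: n / (theta (1 - theta))
    return n / (theta * (1 - theta))

def info_statistic_binomial(theta, n):
    # I_T(theta) for T = sum X_i ~ Binomial(n, theta), computed as E[S(T)^2]
    t = np.arange(n + 1)
    score = t / theta - (n - t) / (1 - theta)     # d/dtheta log f_T(t | theta)
    return np.sum(binom.pmf(t, n, theta) * score**2)

theta, n = 0.3, 10                                 # illustrative values
print(info_full_sample(theta, n))                  # 47.619...
print(info_statistic_binomial(theta, n))           # same value: equality in (3.2) for a sufficient T
print(info_full_sample(theta, 1))                  # information in T = X_1 alone: strictly smaller
```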

From the formal definition of sufficiency we have the following.

The factorization theorem: A statistic $T(X)$, where $X$ has joint p.m./d.f. $f_X(x \mid \theta)$, is sufficient for $\theta$ if and only if
\[
f_X(x \mid \theta) = g(t, \theta)\, h(x) \qquad \text{for all } x \in \mathcal{X}^n,
\]
where $g(t, \theta)$ is a function of $\theta$ which depends on the observations only through the value $t$ of $T$, and $h(x)$ is a function which does not involve $\theta$.

Proof. We first note that if $f_{X,T}(x, t \mid \theta)$ is the joint p.m./d.f. of $X$ and $T$, then
\[
f_{X,T}(x, t \mid \theta) = \begin{cases} f_X(x \mid \theta) & \text{if } t = T(x) \\ 0 & \text{if } t \ne T(x) \end{cases}
\;=\; \begin{cases} f_X(x \mid \theta) & \text{if } x \in A_t \\ 0 & \text{if } x \notin A_t, \end{cases} \qquad (3.3)
\]
where $A_t = \{x : T(x) = t\}$ is the set of all sample results for which $T = t$.

We can understand the result in (3.3) better in terms of an example. Suppose an experiment which can result in either success or failure is repeated independently three times, and on the $i$th repetition we record $X_i = 1$ if we get a success and $X_i = 0$ if we get a failure ($i = 1, 2, 3$). Let the statistic $T = \sum_{i=1}^{3} X_i$ be the number of successes in the three repetitions. The possible outcomes of the sample $X = (X_1, X_2, X_3)$ and of the statistic $T$ are shown below.

Partition set    Sample points $(X_1, X_2, X_3)$        $T$
$A_0$            $(0,0,0)$                              $0$
$A_1$            $(1,0,0),\ (0,1,0),\ (0,0,1)$          $1$
$A_2$            $(1,1,0),\ (1,0,1),\ (0,1,1)$          $2$
$A_3$            $(1,1,1)$                              $3$

Clearly
\[
f_{X,T}\big((0,1,0),\, 2 \mid \theta\big) = \Pr\Big((X_1, X_2, X_3) = (0,1,0),\ \sum_{i=1}^{3} X_i = 2\Big) = 0,
\]
since clearly we cannot have the result $(X_1, X_2, X_3) = (0,1,0)$ and at the same time have $\sum_{i=1}^{3} X_i = 2$. On the other hand,
\[
f_{X,T}\big((0,1,0),\, 1 \mid \theta\big) = \Pr\Big((X_1, X_2, X_3) = (0,1,0),\ \sum_{i=1}^{3} X_i = 1\Big) = \Pr\big((X_1, X_2, X_3) = (0,1,0)\big) = f_X\big((0,1,0) \mid \theta\big).
\]
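This three-trial example can be reproduced directly. The following sketch (an illustration added here, using only the Python standard library; the value of $\theta$ and the function names are ours) enumerates the partition sets $A_t$ and evaluates the joint p.m.f. in (3.3), showing that $f_{X,T}(x, t \mid \theta)$ is zero unless $x \in A_t$, in which case it equals $f_X(x \mid \theta)$.

```python
from itertools import product

theta = 0.6                                    # any value in (0, 1); illustrative

def f_X(x, theta):
    # joint p.m.f. of three independent Bernoulli(theta) trials
    s = sum(x)
    return theta**s * (1 - theta)**(len(x) - s)

# partition sets A_t = {x : T(x) = t} for T = X_1 + X_2 + X_3
A = {t: [x for x in product((0, 1), repeat=3) if sum(x) == t] for t in range(4)}
print(A[1])                                    # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]

def f_XT(x, t, theta):
    # equation (3.3): f_{X,T}(x, t | theta) = f_X(x | theta) if x in A_t, else 0
    return f_X(x, theta) if sum(x) == t else 0.0

print(f_XT((0, 1, 0), 2, theta))               # 0.0, since (0,1,0) is not in A_2
print(f_XT((0, 1, 0), 1, theta))               # equals f_X((0,1,0) | theta) = theta (1-theta)^2
```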

We now turn our attention to the proof of the factorization theorem.

Assume first that $T$ is sufficient for $\theta$, i.e. that $f_X(x \mid T = t)$ is free of the parameter $\theta$. Since for $t = T(x)$
\[
f_X(x \mid \theta) = f_{X,T}(x, t \mid \theta) = f_X(x \mid T = t)\, f_T(t \mid \theta)
\]
(see (3.3)), the factorization follows by taking $f_X(x \mid T = t) \equiv h(x)$ and $f_T(t \mid \theta) \equiv g(t, \theta)$.

Assume now that the factorization $f_X(x \mid \theta) = g(t, \theta) h(x)$ holds for all $x \in \mathcal{X}^n$ with $t = T(x)$. It follows that
\[
f_T(t \mid \theta) = \sum_{x \in A_t} f_X(x \mid \theta) = \sum_{x \in A_t} g(t, \theta) h(x) = g(t, \theta) \sum_{x \in A_t} h(x) = g(t, \theta) H(t), \qquad (3.4)
\]
where $A_t = \{x : T(x) = t\}$ is the set of all sample results for which $T = t$. In calculating (3.4) we have assumed the observations to be discrete; if they are continuous, replace summations by integrals. Further, in (3.3) we have seen that
\[
f_X(x \mid T = t) = \begin{cases} \dfrac{f_X(x \mid \theta)}{f_T(t \mid \theta)} & \text{if } x \in A_t \\[1ex] 0 & \text{if } x \notin A_t, \end{cases}
\]
and from (3.4) and the factorization we get
\[
f_X(x \mid T = t) = \begin{cases} \dfrac{g(t, \theta) h(x)}{g(t, \theta) H(t)} = \dfrac{h(x)}{H(t)} & \text{if } x \in A_t \\[1ex] 0 & \text{if } x \notin A_t, \end{cases}
\]
i.e. $f_X(x \mid T = t)$ is free of $\theta$. This completes the proof of the factorization theorem.

Remark 3.0.1 What are the implications of having the conditional p.m./d.f. $f_X(x \mid T = t)$ free of $\theta$? Given that we know that $T(x) = t$, it follows that $x$ must lie in the set $A_t$; if, further, $f_X(x \mid T = t)$ is free of $\theta$, we can conclude that once we know that $x$ is in $A_t$, the probability of it being in any particular position within $A_t$ does not depend on $\theta$, i.e. once we know that $x$ is in $A_t$, information on its exact position within $A_t$ does not relate to $\theta$. Put another way, all the information in $x$ relating to $\theta$ is contained in the value of $T(x)$; the information in $x$ which is not retained by the statistic $T$ does not relate to $\theta$. But we have seen that a statistic $T$ which retains all the relevant information about $\theta$ and discards only information that is not relevant to $\theta$ is what we call a sufficient statistic for $\theta$.

Result: Let $T(X)$ be a statistic of the sample $X$ whose joint distribution depends on a parameter $\theta$. Then, under certain regularity conditions on the joint p.m./d.f. $f_X(x \mid \theta)$ of $X$ and on the p.m./d.f. $f_T(t \mid \theta)$ of $T$,
\[
I_T(\theta) \le I_X(\theta) \qquad \text{for all } \theta \in \Theta,
\]

with equality if and only if $T(X)$ is sufficient for $\theta$. Here
\[
I_T(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta} \log f_T(T \mid \theta)\right)^{2}\right] = -E\!\left[\frac{\partial^2}{\partial\theta^2} \log f_T(T \mid \theta)\right]
\]
and
\[
I_X(\theta) = E\!\left[\left(\frac{\partial}{\partial\theta} \log f_X(X \mid \theta)\right)^{2}\right] = -E\!\left[\frac{\partial^2}{\partial\theta^2} \log f_X(X \mid \theta)\right].
\]

Proof. The inequality $I_T(\theta) \le I_X(\theta)$ will be assumed valid as a consequence of our understanding of what a statistic does and of what the Fisher information represents, although it can be proved rigorously. That equality holds if and only if $T$ is sufficient for $\theta$ follows from the factorization theorem and is left as an exercise.

Remarks:
1. Notice that the factorization theorem not only gives us a necessary and sufficient condition for a statistic to be sufficient; it also identifies the sufficient statistic for us.
2. Sufficiency implies that basing inferences about $\theta$ on procedures involving sufficient statistics, rather than the whole sample, is preferable, since such procedures discard outright unnecessary information which does not relate to $\theta$. In particular, in estimating $\theta$, the best unbiased estimators based on sufficient statistics are not going to be any less efficient, in the formal sense, than the best unbiased estimators based on the whole sample, since for $T$ sufficient $I_T(\theta) = I_X(\theta)$, i.e. the CRLB for unbiased estimators based on $T$ is the same as the CRLB for unbiased estimators based on $X$.

Example 3.0.1 Let $X_1, X_2, \ldots, X_n$ be a random sample from the Bernoulli distribution, i.e.
\[
X_i = \begin{cases} 1 & \text{with probability } \theta \\ 0 & \text{with probability } 1 - \theta. \end{cases}
\]
Hence
\[
f_{X_i}(x_i \mid \theta) = \theta^{x_i} (1-\theta)^{1-x_i}
\]

for all $i$, where $x_i \in \{0, 1\}$. Use the factorization theorem to find a sufficient statistic for $\theta$, and then confirm that it is sufficient for $\theta$ using the formal definition of sufficiency.

Solution: The joint mass function of the observations $X = (X_1, X_2, \ldots, X_n)$ is
\[
f_X(x \mid \theta) = \prod_{i=1}^{n} f_{X_i}(x_i \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} = \theta^{\sum_{i=1}^{n} x_i} (1-\theta)^{\,n - \sum_{i=1}^{n} x_i} = g\Big(\sum_{i=1}^{n} x_i,\ \theta\Big)\, h(x)
\]
with $h(x) \equiv 1$. Hence, by the factorization theorem, $\sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$.

Notice that, since the factorization is not unique, there may be more than one sufficient statistic. For example, we could have written
\[
f_X(x \mid \theta) = \theta^{S_m + S_{n-m}} (1-\theta)^{\,n - S_m - S_{n-m}} = g(S_m, S_{n-m}, \theta)\, h(x)
\]
with, once again, $h(x) \equiv 1$ and $S_m = \sum_{i=1}^{m} x_i$, $S_{n-m} = \sum_{i=m+1}^{n} x_i$. Hence, by the factorization theorem, $\big(\sum_{i=1}^{m} X_i,\ \sum_{i=m+1}^{n} X_i\big)$ is a bivariate sufficient statistic for $\theta$.

We now show, using the formal definition, that $\sum_{i=1}^{n} X_i$ is indeed a sufficient statistic. The conditional p.m.f. of $X$ given that $T(X) = \sum_{i=1}^{n} X_i = t$ is
\[
f_X(x \mid T = t) = \frac{f_{X,T}(x, t \mid \theta)}{f_T(t \mid \theta)} = \frac{f_X(x \mid \theta)}{f_T(t \mid \theta)}
\]
when $T(x) = t$, and zero otherwise; the last equality was obtained using (3.3). However, the statistic $T = \sum_{i=1}^{n} X_i$ has the Binomial$(n, \theta)$ distribution. Hence
\[
f_X(x \mid T = t) = \frac{f_X(x \mid \theta)}{f_T(t \mid \theta)} = \frac{\theta^{\sum_{i=1}^{n} x_i} (1-\theta)^{\,n - \sum_{i=1}^{n} x_i}}{\binom{n}{t} \theta^{t} (1-\theta)^{\,n-t}} = \frac{1}{\binom{n}{t}},
\]
which is independent of $\theta$, confirming that $\sum_{i=1}^{n} X_i$ is sufficient for $\theta$.
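The same conclusion can be checked by simulation: whatever the value of $\theta$, the conditional distribution of the sample given $\sum_{i=1}^{n} X_i = t$ should put probability $1/\binom{n}{t}$ on each arrangement of the successes. The sketch below (illustrative only, not part of the notes; it assumes NumPy, and the parameter values, sample size and function name are arbitrary choices) compares the empirical conditional distributions obtained under two very different values of $\theta$.

```python
import numpy as np
from collections import Counter
from math import comb

def empirical_conditional(theta, n, t, reps=200_000, seed=0):
    # empirical distribution of X = (X_1,...,X_n) among simulated samples with sum(X) = t
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, theta, size=(reps, n))
    hits = x[x.sum(axis=1) == t]
    return {pattern: c / len(hits) for pattern, c in Counter(map(tuple, hits)).items()}

n, t = 4, 2
for theta in (0.3, 0.8):                        # two very different parameter values
    cond = empirical_conditional(theta, n, t)
    print(theta, sorted(cond.values()))          # all frequencies close to 1/C(4,2)
print(1 / comb(n, t))                            # 1/6 = 0.1666..., free of theta
```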

Example 3.0.2 Let $X_1, X_2, \ldots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution. Then
\[
f_X(x \mid \theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)
= (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right)
= (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2 + \frac{\mu}{\sigma^2} \sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2}\right).
\]
Suppose that both $\mu$ and $\sigma^2$ are unknown, so that $\theta = (\mu, \sigma^2)^{T}$. Then
\[
f_X(x \mid \theta) = g\Big(\sum_{i=1}^{n} x_i,\ \sum_{i=1}^{n} x_i^2,\ \theta\Big)\, h(x)
\]
with $h(x) \equiv 1$ and
\[
g\Big(\sum_{i=1}^{n} x_i,\ \sum_{i=1}^{n} x_i^2,\ \theta\Big) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2 + \frac{\mu}{\sigma^2} \sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2}\right).
\]
Hence the bivariate statistic $\big(\sum_{i=1}^{n} X_i,\ \sum_{i=1}^{n} X_i^2\big)$ is sufficient for $(\mu, \sigma^2)$. This should NOT be interpreted as saying that $\sum_{i=1}^{n} X_i$ is sufficient for $\mu$ and $\sum_{i=1}^{n} X_i^2$ is sufficient for $\sigma^2$. All it says is that all the information contained in the sample about $\mu$ and $\sigma^2$ is also contained in the statistic $\big(\sum_{i=1}^{n} X_i,\ \sum_{i=1}^{n} X_i^2\big)$.

Suppose now that $\mu$ is unknown but $\sigma^2$ is known, so that we now have $\theta = \mu$ and
\[
f_X(x \mid \theta) = \underbrace{(2\pi\sigma^2)^{-n/2} \exp\!\left(\frac{\mu}{\sigma^2} \sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2}\right)}_{=\, g\left(\sum x_i,\ \theta\right)} \; \underbrace{\exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2\right)}_{=\, h(x)}.
\]
By the factorization theorem we conclude that $\sum_{i=1}^{n} X_i$ is sufficient for $\theta = \mu$.

Suppose now that $\mu$ is known but $\sigma^2$ is unknown, so that now $\theta = \sigma^2$ and
\[
f_X(x \mid \theta) = \underbrace{(2\pi\theta)^{-n/2} \exp\!\left(-\frac{1}{2\theta} \sum_{i=1}^{n} x_i^2 + \frac{\mu}{\theta} \sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\theta}\right)}_{=\, g\left(\sum x_i,\ \sum x_i^2,\ \theta\right)} \; \underbrace{1}_{=\, h(x)}.
\]
By the factorization theorem we conclude that the bivariate statistic $\big(\sum_{i=1}^{n} X_i,\ \sum_{i=1}^{n} X_i^2\big)$ is sufficient for $\theta = \sigma^2$. Note that $\sum_{i=1}^{n} X_i^2$ by itself is not sufficient for $\sigma^2$ unless $\mu = 0$.
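A quick way to see the sufficiency of $\big(\sum X_i, \sum X_i^2\big)$ in practice is to note that the whole likelihood can be evaluated from these two numbers alone. The sketch below (an added illustration, assuming NumPy and SciPy; the simulated sample, the grid of parameter values and the function names are arbitrary) computes the Normal log-likelihood from the raw sample and from $(\sum x_i, \sum x_i^2)$ and checks that the two agree for every $(\mu, \sigma^2)$ considered.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=50)        # illustrative N(mu, sigma^2) sample
s1, s2, n = x.sum(), (x**2).sum(), len(x)          # the sufficient statistic (sum, sum of squares)

def loglik_full(mu, sigma2):
    # log-likelihood computed from every observation
    return norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)).sum()

def loglik_from_stats(mu, sigma2):
    # log f_X(x | mu, sigma^2) written in terms of (s1, s2) only
    return (-n / 2) * np.log(2 * np.pi * sigma2) - (s2 - 2 * mu * s1 + n * mu**2) / (2 * sigma2)

for mu in (0.0, 1.0, 3.0):
    for sigma2 in (0.5, 2.0):
        assert np.isclose(loglik_full(mu, sigma2), loglik_from_stats(mu, sigma2))
print("log-likelihood depends on the data only through (sum x_i, sum x_i^2)")
```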

Example 3.0.3 Let $X_1, X_2, \ldots, X_n$ be a random sample from the $U(0, \theta)$ distribution, i.e.
\[
f_{X_i}(x_i \mid \theta) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 < x_i < \theta \\[1ex] 0 & \text{otherwise.} \end{cases}
\]
Note that $\theta$ is involved in the range of the distribution. Hence it is better to write the p.d.f. of $X_i$ as
\[
f_{X_i}(x_i \mid \theta) = \frac{1}{\theta}\, I_{(0,\theta)}(x_i),
\]
where $I_{(0,\theta)}$ is the indicator function of the interval $(0, \theta)$. For any set $A$ the indicator function of $A$ is defined as
\[
I_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases}
\]
Hence, since all the $x_i$ are positive, $\prod_{i=1}^{n} I_{(0,\theta)}(x_i) = I_{(0,\theta)}\big(\max_i x_i\big)$ and
\[
f_X(x \mid \theta) = \prod_{i=1}^{n} \frac{1}{\theta}\, I_{(0,\theta)}(x_i) = \frac{1}{\theta^n} \prod_{i=1}^{n} I_{(0,\theta)}(x_i) = \underbrace{\frac{1}{\theta^n}\, I_{(0,\theta)}\big(\max_i x_i\big)}_{=\, g\left(\max_i x_i,\ \theta\right)} \; \underbrace{1}_{=\, h(x)},
\]
so $\max_{1 \le i \le n} X_i$ is sufficient for $\theta$ by the factorization theorem.
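Similarly, for the $U(0, \theta)$ model the likelihood $\theta^{-n} I_{(0,\theta)}\big(\max_i x_i\big)$ can be evaluated from $\max_i x_i$ alone. The sketch below (illustrative only, assuming NumPy; the simulated sample, the trial values of $\theta$ and the function names are arbitrary choices) checks that the likelihood computed from the full sample agrees with the one computed from the maximum.

```python
import numpy as np

rng = np.random.default_rng(2)
true_theta = 3.0
x = rng.uniform(0.0, true_theta, size=20)          # illustrative U(0, theta) sample

def lik_full(theta):
    # product of the individual densities (1/theta) I_(0,theta)(x_i)
    return np.prod(np.where((x > 0) & (x < theta), 1.0 / theta, 0.0))

def lik_from_max(theta, x_max=x.max()):
    # g(max x_i, theta) = theta^{-n} I_(0,theta)(max x_i), with h(x) = 1
    return theta ** (-len(x)) if 0 < x_max < theta else 0.0

for theta in (2.0, x.max() + 0.01, 5.0):
    assert np.isclose(lik_full(theta), lik_from_max(theta))
print("likelihood depends on the sample only through max x_i")
```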