Lecture 23: Minimal sufficiency

Similar documents
Chapter 6 Principles of Data Reduction

Lecture 19: Convergence

6. Sufficient, Complete, and Ancillary Statistics

Lecture 12: September 27

Lecture 18: Sampling distributions

Lecture 20: Multivariate convergence and the Central Limit Theorem

Lecture 8: Convergence of transformations and law of large numbers

Topic 9: Sampling Distributions of Estimators

Lecture 16: UMVUE: conditioning on sufficient and complete statistics

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

Lecture 7: Properties of Random Samples

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

Last Lecture. Unbiased Test

Chapter 6 Infinite Series

Convergence of random variables. (telegram style notes) P.J.C. Spreij

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

Math 155 (Lecture 3)

Sequences and Series of Functions

1.010 Uncertainty in Engineering Fall 2008

Lecture 17: Minimal sufficiency

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Statistical Theory MT 2009 Problems 1: Solution sketches

Statistical Theory MT 2008 Problems 1: Solution sketches

Probability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)].

Lecture 33: Bootstrap

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Week 5-6: The Binomial Coefficients

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes.

Lecture 6 Ecient estimators. Rao-Cramer bound.

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise

Distribution of Random Samples & Limit theorems

5. Likelihood Ratio Tests

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15

Questions and Answers on Maximum Likelihood

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

2.2. Central limit theorem.

Lecture 24: Variable selection in linear models

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Stat 421-SP2012 Interval Estimation Section

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

Infinite Sequences and Series

Lesson 10: Limits and Continuity

Mathematical Statistics - MS

IIT JAM Mathematical Statistics (MS) 2006 SECTION A

Lecture Chapter 6: Convergence of Random Sequences

7.1 Convergence of sequences of random variables

CHAPTER 5. Theory and Solution Using Matrix Techniques

The multiplicative structure of finite field and a construction of LRC

Summary. Recap ... Last Lecture. Summary. Theorem

An Introduction to Randomized Algorithms

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Expectation and Variance of a random variable

Introduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT

Solutions: Homework 3

Lecture 11 and 12: Basic estimation theory

Stat410 Probability and Statistics II (F16)

MATHEMATICAL SCIENCES PAPER-II

Properties of Point Estimators and Methods of Estimation

MATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED

Stat 200 -Testing Summary Page 1

Notes 5 : More on the a.s. convergence of sums

7.1 Convergence of sequences of random variables

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

1. ARITHMETIC OPERATIONS IN OBSERVER'S MATHEMATICS

This section is optional.

Kurskod: TAMS11 Provkod: TENB 21 March 2015, 14:00-18:00. English Version (no Swedish Version)

Since X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain

ECONOMETRIC THEORY. MODULE XIII Lecture - 34 Asymptotic Theory and Stochastic Regressors

EE 4TM4: Digital Communications II Probability Theory

Binomial Distribution

Joint Probability Distributions and Random Samples. Jointly Distributed Random Variables. Chapter { }

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.

Approximations and more PMFs and PDFs

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

STAT Homework 1 - Solutions

Beurling Integers: Part 2

Understanding Samples

DIVISIBILITY PROPERTIES OF GENERALIZED FIBONACCI POLYNOMIALS

5 : Exponential Family and Generalized Linear Models

Chapter 6 Sampling Distributions

1. By using truth tables prove that, for all statements P and Q, the statement

Measure and Measurable Functions

Problem Set 4 Due Oct, 12

10-701/ Machine Learning Mid-term Exam Solution

Math 61CM - Solutions to homework 3

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.

Bayesian Methods: Introduction to Multi-parameter Models

Analytic Continuation

Review Questions, Chapters 8, 9. f(y) = 0, elsewhere. F (y) = f Y(1) = n ( e y/θ) n 1 1 θ e y/θ = n θ e yn

Introductory statistics

( θ. sup θ Θ f X (x θ) = L. sup Pr (Λ (X) < c) = α. x : Λ (x) = sup θ H 0. sup θ Θ f X (x θ) = ) < c. NH : θ 1 = θ 2 against AH : θ 1 θ 2

Transcription:

Lecture 23: Miimal sufficiecy Maximal reductio without loss of iformatio There are may sufficiet statistics for a give problem. I fact, X (the whole data set) is sufficiet. If T is a sufficiet statistic ad T = ψ(s), where ψ is a fuctio ad S is aother statistic, the S is sufficiet. For istace, if X 1,...,X are iid with P(X i = 1) = θ ad P(X i = 0) = 1 θ, the ( m X i, i=m+1 X i) is sufficiet for θ, where m is ay fixed iteger betwee 1 ad. If T is sufficiet ad T = ψ(s) with a measurable fuctio ψ that is ot oe-to-oe, the T is more useful tha S, sice T provides a further reductio of the data without loss of iformatio. Is there a sufficiet statistics that provides maximal" reductio of the data? Defiitio 6.2.11 (miimal sufficiecy) A sufficiet statistic T (X) is a miimal sufficiet statistic if, for ay other sufficiet statistic U(X), T (x) is a fuctio of U(x). UW-Madiso (Statistics) Stat 609 Lecture 23 2015 1 / 15

T (x) is a fuctio of U(x) iff U(x) = U(y) implies that T (x) = T (y) for ay pair (x,y). I terms of the partitios of the rage of the data set, if T (x) is a fuctio of U(x), the {x : U(x) = t} {x : T (x) = t} Thus, a miimal sufficiet statistic achieves the greatest possible data reductio as a sufficiet statistic. If both T ad S are miimal sufficiet statistics, the by defiitio there is a oe-to-oe fuctio ψ such that T = ψ(s); hece, the miimal sufficiet statistic is uique i the sese that two statistics that are fuctios of each other ca be treated as oe statistic. For example, if T is miimal sufficiet, the so is (T,e T ), but o oe is goig to use (T,e T ). If the rage of X is R k, the there exists a miimal sufficiet statistic. Example 6.2.15. Let X 1,...,X be iid form the uiform(θ,θ + 1) distributio, where θ R is a ukow parameter. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 2 / 15

The joit pdf of (X 1,...,X ) is I({θ < x i < θ + 1}) = I({x () 1 < θ < x (1) }), (x 1,...,x ) R, where x (i) deotes the ith ordered value of x 1,...,x. By the factorizatio theorem, T = (X (1),X () ) is sufficiet for θ. Note that x (1) = sup{θ : f θ (x) > 0} ad x () = 1 + if{θ : f θ (x) > 0}. If S(X) is a statistic sufficiet for θ, the by the factorizatio theorem, there are fuctios h ad g θ such that f θ (x) = g θ (S(x))h(x). For x with h(x) > 0, x (1) = sup{θ : g θ (S(x)) > 0} ad x () = 1 + if{θ : g θ (S(x)) > 0}. Hece, there is a fuctio ψ such that T (x) = ψ(s(x)) whe h(x) > 0. Sice P(h(X) > 0) = 1, we coclude that T is miimal sufficiet. I this example, the dimesio of a miimal sufficiet statistic is larger tha the dimesio of θ. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 3 / 15

Fidig a miimal sufficiet statistic by defiitio is ot coveiet. The ext theorem is a useful tool. Theorem 6.2.13. Let f θ (x) be the pmf or pdf of X. Suppose that T (x) is sufficiet for θ ad that, for every pair x ad y with at least oe of f θ (x) ad f θ (y) is ot 0, f θ (x)/f θ (y) does ot deped o θ implies T (x) = T (y). (Here, we defie a/0 = for ay a > 0.) The T (X) is miimal sufficiet for θ. Proof. Let U(X) be aother sufficiet statistic. By the factorizatio theorem, there are fuctios h ad g θ such that f θ (x) = g θ (U(x))h(x) for all x ad θ. For x ad y such that at least oe of f θ (x) ad f θ (y) is ot 0, i.e., at least oe of h(x) ad h(y) is ot 0, if U(x) = U(y), the which does ot deped o θ. f θ (x) f θ (y) = g θ (U(x))h(x) g θ (U(y))h(y) = h(x) h(y) UW-Madiso (Statistics) Stat 609 Lecture 23 2015 4 / 15

By the assumptio of the theorem, T (x) = T (y). This shows that there is a fuctio ψ such that T (x) = ψ(s(x)) Hece, T is miimal sufficiet for θ Example 6.2.15. We re-visit this example ad show how to apply Theorem 6.2.13. If X 1,...,X are iid form the uiform(θ,θ + 1) distributio, the the joit pdf is f θ (x) = I({x () 1 < θ < x (1) }), x = (x 1,...,x ) R, Let x ad y be two data poits. Each of f θ (x) ad f θ (y) oly takes two possible values, 0 ad 1. Let A x = (x () 1,x (1) ) ad A y = (y () 1,y (1) ). The f θ (x) 0 θ A x,θ A y f θ (y) = 1 θ A x,θ A y θ A x,θ A y This depeds o θ uless A x = A y. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 5 / 15

Thus, if the ratio does ot deped o θ, we must have x (1) = y (1) ad x () = y (). Therefore, by Theorem 6.2.13, (X (1),X () ) is miimal sufficiet for θ. Example 6.2.14. Let X 1,...,X be iid from N(µ,σ 2 ) with ukow µ R ad σ > 0. Earlier, we showed that T = ( X,S 2 ) is sufficiet for θ = (µ,σ 2 ). Is T miimal sufficiet? Let f θ (x) be the joit pdf of the sample, x ad y be two sample poits. The f θ (x) f θ (y) = (2πσ 2 ) /2 exp( [(( x µ) 2 + ( 1)s 2 x]/(2σ 2 )) (2πσ 2 ) /2 exp( [((ȳ µ) 2 + ( 1)s 2 y]/(2σ 2 )) = exp([ ( x 2 ȳ 2 ) + 2µ( x ȳ) ( 1)(s 2 x s 2 y)]/(2σ 2 )) where x ad s 2 x are the sample mea ad variace based o the sample poit x, ad ȳ ad s 2 y are the sample mea ad variace based o the sample poit y. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 6 / 15

This ratio does ot deped o θ = (µ,σ 2 ) iff x = ȳ ad s 2 x = s 2 y. By Theorem 6.2.13, T = ( X,S 2 ) is miimal sufficiet for θ = (µ,σ 2 ). Like the cocept of sufficiecy, miimal sufficiecy also depeds o the parameters we are iterested i. I Example 6.2.14, let us cosider two sub-families. Suppose that it is kow that σ = 1. The T = ( X,S 2 ) is still sufficiet but ot miimal sufficiet for µ. I fact, usig Theorem 6.2.13 we ca show that X is miimal sufficiet for µ whe σ = 1 ad, T = ( X,S 2 ) is ot a fuctio of X (why?). Suppose that it is kow that µ 2 = σ 2 so that we oly have oe ukow parameter µ R, µ 0. (This is called a curved expoetial family.) Is T = ( X,S 2 ) miimal sufficiet for θ = µ? Note that T is certaily sufficiet for θ. The aswer ca be foud as a special case of the followig result. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 7 / 15

Miimal sufficiet statistics i expoetial families Let X 1,...,X be iid from a expoetial family with pdf f θ (x) = h(x)c(θ)exp ( η(θ) T (x) ), θ Θ where η(θ) = (w 1 (θ),...,w k (θ)) ad T (x) = (t 1 (x),...,t k (x)). Suppose that there exists {θ 0,θ 1,...,θ k } Θ such that the vectors η i = η(θ i ) η(θ 0 ), i = 1,...,k, are liearly idepedet i R k. (This is true if the family is of full rak). We have show that T (X) is sufficiet for θ. We ow show that T is i fact miimal sufficiet for θ. Note that we ca focus o the pits x at which h(x) > 0. For ay θ, f θ (x) f θ (y) = exp( η(θ) [T (x) T (y)] ) h(x) h(y) If this ratio equals φ(x,y) ot depedig o θ, the which implies η(θ) [T (x) T (y)] = log(φ(x,y)) + log(h(y)/h(x)) [η(θ i ) η(θ 0 )] [T (x) T (y)] = 0 i = 1,...,k. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 8 / 15

Sice η(θ i ) η(θ 0 ), i = 1,...,k, are liearly idepedet, we must have T (x) = T (y). By Theorem 6.2.13, T is miimal sufficiet for θ. Applicatio to curved ormal family We ca ow show that T = ( X,S 2 ) miimal sufficiet for θ = µ whe the sample is a radom sample from N(µ, µ 2 ), µ R, µ 0. It ca be show that ( µ η(θ) = σ 2, 1 ) ( 2σ 2 = µ, 1 ) 2µ 2 Poits θ 0 = (1,1), θ 1 = ( 1,1), ad θ 2 = (1/2,1/2) are i the parameter space, ad ( ) ( ) ( 2 η(θ 1 ) η(θ 0 ) = = ( 1)/2 ( 1)/2 0 ( η(θ 2 ) η(θ 0 ) = /2 2( 1) ) ( ( 1)/2 ) ( = ) ) /2 3( 1)/2 UW-Madiso (Statistics) Stat 609 Lecture 23 2015 9 / 15

Are these two vectors liearly idepedet? If there are c ad d such that i.e., c[η(θ 1 ) η(θ 0 )] + d[η(θ 2 ) η(θ 0 )] = 0 ( 2 c 0 ) ( + d /2 3( 1)/2 ) = 0 The, we must have d = 0 ad the c = 0. That meas vectors η(θ 1 ) η(θ 0 ) ad η(θ 2 ) η(θ 0 ) are liearly idepedet ad the coditio for the result of expoetial family is satisfied. Therefore, T = ( X,S 2 ) is miimal sufficiet for θ = µ. Example. Let (X 1,...,X ) be a radom sample from Cauchy(µ,σ), where µ R ad σ > 0 are ukow parameters. The joit pdf of (X 1,...,X ) is f µ,σ (x) = σ 1 π σ 2 + (x i µ) 2, x = (x 1,...,x ) R. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 10 / 15

For ay x = (x 1,...,x ) ad y = (y 1,...,y ), if f µ,σ (x) = ψ(x,y)f µ,σ (y) holds for ay µ ad σ, where ψ does ot deped o (µ,σ), the [ 1 + (y i µ) 2] [ = ψ(x,y) 1 + (x i µ) 2] µ R Both sides of the above idetity are polyomials of degree 2 i µ. Compariso of the coefficiets to the highest terms gives ψ(x,y) = 1 ad hece [ 1 + (y i µ) 2] = [ 1 + (x i µ) 2] µ R As a polyomial of µ, the left-had side of the above idetity has 2 complex roots x i ± ı, i = 1,...,, while the right-had side of the above idetity has 2 complex roots y i ± ı, i = 1,...,. Sice the two sets of roots must agree, the ordered values of x i s are the same as the ordered values of y i s. By Theorem 6.2.13, the order statistics of X 1,...,X is miimal sufficiet for (µ,σ). UW-Madiso (Statistics) Stat 609 Lecture 23 2015 11 / 15

The followig result may be useful i establishig miimal sufficiecy outside of the expoetial family. Theorem 6.6.5. Let f 0,f 1,f 2,... be a sequece of pdf s or pmf s o the rage of X, all have the same support. a. The statistic ( f1 (X) T (X) = f 0 (X), f ) 2(X) f 0 (X),... is miimal sufficiet for the family {f 0,f 1,f 2,...} of populatios for X. b. If P is a family of pdf s or pmf s for X with commo support, ad (i) f i P, i = 0,1,2,... (ii) T (X) is sufficiet for P, the T is miimal sufficiet for P. Proof of part a. Let g i (T ) = T i, the ith compoet of T, i = 0,1,2,..., ad let g 0 (T ) = 1. The f i (x) = g i (T (x))f 0 (x), i = 0,1,2,... By the factorizatio theorem, T is sufficiet for P 0 = {f 0,f 1,f 2,...}. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 12 / 15

Suppose that S(X) is aother sufficiet statistic. By the factorizatio theorem, there are Borel fuctios h ad g i such that f i (x) = g i (S(x))h(x), i = 0,1,2,... The T i (x) = f i(x) f 0 (x) = g i(s(x)) g 0 (S(x)), i = 0,1,2,... Thus, T is a fuctio of S. By defiitio, T is miimal sufficiet for P 0. Proof of part b. The two additioal coditios are P 0 P ad T is sufficiet for P. Let S be ay sufficiet statistic for P. The S is also sufficiet for P 0. From part a, T is miimal sufficiet for P 0 so there is a fuctio ψ such that T = ψ(s). By defiitio, T is miimal sufficiet for P sice T is sufficiet for P. UW-Madiso (Statistics) Stat 609 Lecture 23 2015 13 / 15

I fact, the result i Theorem 6.6.5 still holds if the commo support coditio is replaced by that the support of f 0 cotais the support of ay pdf or pmf i P. I most applicatios, we ca choose P 0 cotaiig fiitely may elemets. Example Let X 1,...,X be a radom sample from double-expoetial(µ,1), µ R, i.e., the joit pdf of X is ) f µ (x) = 1 2 exp ( x i µ Cosider P 0 = {f 0,f 1,...,f }, where each f j is the pdf with µ = j, j = 0,1,...,. By Theorem 6.6.5a, a miimal sufficiet statistic for P 0 is ( ) ) T = exp( X i X i j, j = 1,..., which is equivalet to UW-Madiso (Statistics) Stat 609 Lecture 23 2015 14 / 15

( U = X i X i j, j = 1,..., We ca further show that U is equivalet to S, the set of order statistics. Sice S is sufficiet, by Theorem 6.6.5b, S, U, ad T are all miimal sufficiet. Defiitio 6.6.3 (Necessity). A statistic is said to be ecessary if it ca be writte as a fuctio of every sufficiet statistic. T is ecessary if, for every sufficiet S, there is a fuctio ψ such that T = ψ(s). From the defiitios of the ecessity ad miimal sufficiecy, we ca reach the followig coclusio. Theorem 6.6.4. A statistic is miimal sufficiet if ad oly if it is ecessary ad sufficiet. ) UW-Madiso (Statistics) Stat 609 Lecture 23 2015 15 / 15