36-705: Intermediate Statistics, Fall 2017
Lecturer: Siva Balakrishnan
Lecture 12: September 27

Today we will discuss sufficiency in more detail and then begin to discuss some general strategies for constructing estimators.

12.1 Minimal sufficiency

As we have seen previously, sufficient statistics are not unique. Furthermore, it seems at least intuitively that some sufficient statistics present much more reduction than others (for instance, in the Poisson model both the mean and the entire sample are sufficient). This motivates the following definition of minimal sufficient statistics:

Minimal Sufficiency: A statistic $T(x_1, \ldots, x_n)$ is minimal sufficient if it is sufficient, and furthermore for any other sufficient statistic $S(x_1, \ldots, x_n)$ we can write $T(x_1, \ldots, x_n) = g(S(x_1, \ldots, x_n))$, i.e. $T$ is a function of $S$.

There is unfortunately no straightforward way to verify this condition directly. Analogous to the factorization theorem, we have a condition that we can check.

Theorem 12.1 Define
$$R(x_1, \ldots, x_n, y_1, \ldots, y_n; \theta) = \frac{p(y_1, \ldots, y_n; \theta)}{p(x_1, \ldots, x_n; \theta)}.$$
Suppose that a statistic $T$ has the following property: $R(x_1, \ldots, x_n, y_1, \ldots, y_n; \theta)$ does not depend on $\theta$ if and only if $T(y_1, \ldots, y_n) = T(x_1, \ldots, x_n)$. Then $T$ is a minimal sufficient statistic (MSS).

Before we prove the theorem let us consider some examples.

Example 12.2 Suppose that $Y_1, \ldots, Y_n$ are i.i.d. Poisson$(\theta)$. Then
$$p(y_1, \ldots, y_n; \theta) = \frac{e^{-n\theta} \theta^{\sum_i y_i}}{\prod_i y_i!}, \qquad \frac{p(y_1, \ldots, y_n; \theta)}{p(x_1, \ldots, x_n; \theta)} = \theta^{\sum_i y_i - \sum_i x_i} \, \prod_i x_i! \Big/ \prod_i y_i!,$$
which is independent of $\theta$ iff $\sum_i y_i = \sum_i x_i$. This implies that $T(X_1, \ldots, X_n) = \sum_i X_i$ is a minimal sufficient statistic for $\theta$.
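As a quick numerical sanity check of Example 12.2 (a sketch not in the original notes; the sample values are arbitrary), we can verify that the Poisson likelihood ratio is free of $\theta$ exactly when the two samples have the same sum:

```python
import math

def poisson_loglik(xs, theta):
    # log p(x_1,...,x_n; theta) = -n*theta + (sum_i x_i) log(theta) - sum_i log(x_i!)
    n = len(xs)
    return -n * theta + sum(xs) * math.log(theta) - sum(math.lgamma(x + 1) for x in xs)

def log_ratio(ys, xs, theta):
    # log R(x, y; theta) = log p(y; theta) - log p(x; theta)
    return poisson_loglik(ys, theta) - poisson_loglik(xs, theta)

# Two samples with the same sum: the ratio should not depend on theta.
xs, ys = [0, 2, 4], [1, 2, 3]          # both sum to 6
r1 = log_ratio(ys, xs, theta=0.5)
r2 = log_ratio(ys, xs, theta=3.0)
same_sum_constant = abs(r1 - r2) < 1e-9

# Different sums: the ratio varies with theta.
zs = [1, 2, 4]                          # sums to 7
r3 = log_ratio(zs, xs, theta=0.5)
r4 = log_ratio(zs, xs, theta=3.0)
diff_sum_varies = abs(r3 - r4) > 1e-6
```

When the sums agree, the $e^{-n\theta}$ and $\theta^{\sum_i y_i - \sum_i x_i}$ factors cancel exactly, leaving only a ratio of factorials.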
The minimal sufficient statistic is not unique. But the minimal sufficient partition is unique.

Example 12.3 (Cauchy) Here
$$p(x; \theta) = \frac{1}{\pi(1 + (x - \theta)^2)}, \qquad \frac{p(y_1, \ldots, y_n; \theta)}{p(x_1, \ldots, x_n; \theta)} = \frac{\prod_{i=1}^n \{1 + (x_i - \theta)^2\}}{\prod_{j=1}^n \{1 + (y_j - \theta)^2\}}.$$
The ratio is a constant function of $\theta$ if the two samples agree on the statistic
$$T(X_1, \ldots, X_n) = (X_{(1)}, \ldots, X_{(n)}),$$
i.e. on the order statistics. It is technically harder to show that the ratio is independent of $\theta$ only if the order statistics agree, but it can be done using theorems about polynomials. Having shown this, one can conclude that the order statistics are minimal sufficient for $\theta$.

Proof (of Theorem 12.1): This proof is a bit technical, so feel free to skip it. We proceed in two steps: we first show that $T$ is a sufficient statistic and then check that it is minimal. We define the partition induced by $T$ as $\{A_t : t \in \mathrm{Range}(T)\}$, and for each set $A_t$ in the partition we fix a representative $(x_1^t, \ldots, x_n^t) \in A_t$.

$T$ is sufficient: Consider the joint distribution at any $(x_1, \ldots, x_n)$. Suppose that $T(x_1, \ldots, x_n) = u$, and let $(y_1, \ldots, y_n) := (x_1^u, \ldots, x_n^u)$ be the representative of $A_u$. Observe that $(y_1, \ldots, y_n)$ depends only on $T(x_1, \ldots, x_n)$, i.e. the point $y$ is a function of the statistic $T$ only. Now we have that
$$p(x_1, \ldots, x_n; \theta) = p(y_1, \ldots, y_n; \theta) \, R(y_1, \ldots, y_n, x_1, \ldots, x_n; \theta),$$
and since $T(x_1, \ldots, x_n) = T(y_1, \ldots, y_n)$, the factor $R$ does not depend on $\theta$. Recalling that $(y_1, \ldots, y_n)$ is only a function of $T(x_1, \ldots, x_n)$, we have that
$$p(x_1, \ldots, x_n; \theta) = g(T(x_1, \ldots, x_n); \theta) \, h(x_1, \ldots, x_n),$$
where $g$ corresponds to the first term and $h$ corresponds to the $R$ term. We conclude that $T$ is sufficient.

$T$ is minimal: As a preliminary, note that the definition of a minimal sufficient statistic can be equivalently written as: $T$ is an MSS if for any other sufficient statistic $S$, whenever $S(x_1, \ldots, x_n) = S(y_1, \ldots, y_n)$ we also have $T(x_1, \ldots, x_n) = T(y_1, \ldots, y_n)$. This is equivalent to the statement that $T$ is a function of $S$.
Consider any other sufficient statistic $S$, and suppose that $S(x_1, \ldots, x_n) = S(y_1, \ldots, y_n)$. Then by the factorization theorem we have that
$$p(x_1, \ldots, x_n; \theta) = g(S(x_1, \ldots, x_n); \theta) h(x_1, \ldots, x_n) = g(S(y_1, \ldots, y_n); \theta) h(y_1, \ldots, y_n) \frac{h(x_1, \ldots, x_n)}{h(y_1, \ldots, y_n)} = p(y_1, \ldots, y_n; \theta) \frac{h(x_1, \ldots, x_n)}{h(y_1, \ldots, y_n)},$$
so $R(x_1, \ldots, x_n, y_1, \ldots, y_n; \theta)$ does not depend on $\theta$. By the hypothesis of the theorem we conclude that $T(x_1, \ldots, x_n) = T(y_1, \ldots, y_n)$, and so $T$ is minimal.

12.2 Minimal sufficiency and the likelihood

Although minimal sufficient statistics are not unique, they induce a unique partition on the possible datasets. This partition is also induced by the likelihood: suppose we have a partition such that $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ are placed in the same set of the partition iff $L(\theta; x_1, \ldots, x_n) \propto L(\theta; y_1, \ldots, y_n)$; then this partition is the minimal sufficient partition. You will prove this on your homework, but it is a simple consequence of the characterization we have seen in the previous section.

12.3 Sufficiency: the risk reduction viewpoint

We will return to the concept of risk more formally in the next few lectures, but for now let us try to understand the main ideas.

Setting: Suppose we observe $X_1, \ldots, X_n \sim p(x; \theta)$ and we would like to estimate $\theta$, i.e. we want to construct some function of the data that is close in some sense to $\theta$. We construct an estimator $\widehat{\theta}(X_1, \ldots, X_n)$. In order to evaluate our estimator we might consider how far our estimate is from $\theta$ on average, i.e. we can define the risk
$$R(\widehat{\theta}, \theta) = \mathbb{E}(\widehat{\theta} - \theta)^2.$$
We will see this again later on, but the risk of an estimator can be decomposed into its bias and variance:
$$\mathbb{E}(\widehat{\theta} - \theta)^2 = (\mathbb{E}\widehat{\theta} - \theta)^2 + \mathbb{E}(\widehat{\theta} - \mathbb{E}\widehat{\theta})^2,$$
where the first term is referred to as the (squared) bias and the second is the variance.
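The bias–variance decomposition can be made concrete with a small Monte Carlo sketch (not part of the original notes; the shrunken-mean estimator $\widehat{\theta} = c\bar{X}$ and all constants below are illustrative choices). For $X_i \sim N(\theta, 1)$, the estimator $c\bar{X}$ has bias $(c-1)\theta$ and variance $c^2/n$, and the simulated risk should match the sum of the squared bias and the variance:

```python
import random

random.seed(0)

def simulate_risk(c, theta, n, reps=20000):
    # Monte Carlo estimate of R(theta_hat, theta) = E(theta_hat - theta)^2
    # for the shrunken mean theta_hat = c * Xbar, with X_i ~ N(theta, 1).
    total = 0.0
    for _ in range(reps):
        xbar = sum(random.gauss(theta, 1.0) for _ in range(n)) / n
        total += (c * xbar - theta) ** 2
    return total / reps

theta, n, c = 2.0, 10, 0.9
mc_risk = simulate_risk(c, theta, n)

# Decomposition: risk = bias^2 + variance
bias_sq = ((c - 1.0) * theta) ** 2   # (E theta_hat - theta)^2 = 0.04
variance = c * c / n                 # Var(c * Xbar) = c^2 / n = 0.081
analytic_risk = bias_sq + variance   # 0.121
```

Note that the biased estimator trades a small amount of bias for reduced variance; we will revisit this trade-off when we discuss risk more formally.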
There is a strong sense in which estimators that do not depend only on sufficient statistics can be improved. This is known as the Rao-Blackwell theorem. Let $\widehat{\theta}$ be an estimator, let $T$ be any sufficient statistic, and define $\widetilde{\theta} = \mathbb{E}[\widehat{\theta} \mid T]$.

Rao-Blackwell theorem: $R(\widetilde{\theta}, \theta) \le R(\widehat{\theta}, \theta)$.

We will not spend too much time on this, but let's see a quick example and then prove the result.

Example: Suppose we toss a coin $n$ times, i.e. $X_1, \ldots, X_n \sim \mathrm{Ber}(\theta)$. We consider the estimator $\widehat{\theta} = X_1$ and the sufficient statistic $T = \sum_i X_i$. Then
$$\widetilde{\theta} = \mathbb{E}[X_1 \mid T] = \mathbb{E}\Big[X_1 \,\Big|\, \sum_i X_i\Big].$$
We claim that this conditional expectation is simply the average, i.e. $\widetilde{\theta} = \frac{1}{n}\sum_i X_i$. First, let us check this in the case when $n = 2$. If $X_1 + X_2 = 2$ then $X_1 = 1$, and if $X_1 + X_2 = 0$ then $X_1 = 0$. When $X_1 + X_2 = 1$, we have $X_1 = 1$ with probability $1/2$ and $0$ with probability $1/2$. So we conclude the conditional expectation is $(X_1 + X_2)/2$. More generally, if $\sum_i X_i = k$, then of the $\binom{n}{k}$ equally likely possibilities we have $X_1 = 1$ for $\binom{n-1}{k-1}$ of them, so the conditional expectation is simply
$$\mathbb{E}\Big[X_1 \,\Big|\, \sum_i X_i = k\Big] = \frac{\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{k}{n},$$
as desired.

We observe that both estimators are unbiased, but the variance of the Rao-Blackwellized estimator is $\theta(1-\theta)/n$, as opposed to the original estimator, which has variance $\theta(1-\theta)$.

Proof of Rao-Blackwell: Observe that
$$R(\widetilde{\theta}, \theta) = \mathbb{E}[(\mathbb{E}[\widehat{\theta} \mid T] - \theta)^2] = \mathbb{E}[(\mathbb{E}[\widehat{\theta} - \theta \mid T])^2] \le \mathbb{E}[\mathbb{E}[(\widehat{\theta} - \theta)^2 \mid T]] = R(\widehat{\theta}, \theta).$$
The inequality is Jensen's inequality (equivalently, just $\mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \ge 0$, applied conditionally on $T$). A question worth pondering is: why does it matter for Rao-Blackwellization that $T$ is a sufficient statistic?

12.4 More examples with the likelihood

Example 12.4 Suppose that $X = (X_1, X_2, X_3) \sim \mathrm{Multinomial}(n, p)$ where
$$p = (p_1, p_2, p_3) = (\theta, \theta, 1 - 2\theta).$$
So
$$p(x; \theta) = \binom{n}{x_1\, x_2\, x_3} p_1^{x_1} p_2^{x_2} p_3^{x_3} = \binom{n}{x_1\, x_2\, x_3} \theta^{x_1 + x_2} (1 - 2\theta)^{x_3}.$$
Suppose that $X = (1, 3, 2)$. Then
$$L(\theta) = \frac{6!}{1!\, 3!\, 2!} \, \theta^1 \theta^3 (1 - 2\theta)^2 \propto \theta^4 (1 - 2\theta)^2.$$
Now suppose that $X = (2, 2, 2)$. Then
$$L(\theta) = \frac{6!}{2!\, 2!\, 2!} \, \theta^2 \theta^2 (1 - 2\theta)^2 \propto \theta^4 (1 - 2\theta)^2.$$
Hence, the likelihood function is the same (up to a constant) for these two datasets.

Example 12.5 $X_1, \ldots, X_n \sim N(\mu, 1)$. Then
$$L(\mu) = \prod_{i=1}^n \left(\frac{1}{\sqrt{2\pi}}\right) \exp\left\{-\frac{1}{2}(x_i - \mu)^2\right\} \propto \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\}.$$

Example 12.6 Let $X_1, \ldots, X_n \sim \mathrm{Bernoulli}(p)$. Then for $p \in [0, 1]$,
$$L(p) \propto p^X (1 - p)^{n - X},$$
where $X = \sum_i X_i$.

12.5 Estimation

Now we begin discussing the estimation problem more formally.
We observe $X_1, \ldots, X_n \sim p(x; \theta)$ and want to estimate $\theta = (\theta_1, \ldots, \theta_k)$. An estimator
$$\widehat{\theta} = \widehat{\theta}_n = w(X_1, \ldots, X_n)$$
is a function of the data. Keep in mind that the parameter is a fixed, unknown constant, while the estimator is a random variable.

For now, we will discuss three methods of constructing estimators:

1. The method of moments (MOM)
2. Maximum likelihood (MLE)
3. Bayesian estimators

Some terminology. Throughout these notes, we will use the following terminology:

1. $\mathbb{E}_\theta(\widehat{\theta}) = \int \cdots \int \widehat{\theta}(x_1, \ldots, x_n) \, p(x_1; \theta) \cdots p(x_n; \theta) \, dx_1 \cdots dx_n$.
2. Bias: $\mathbb{E}_\theta(\widehat{\theta}) - \theta$.
3. The distribution of $\widehat{\theta}$ is called its sampling distribution.
4. The standard deviation of $\widehat{\theta}$ is called the standard error, denoted $\mathrm{se}(\widehat{\theta})$.
5. $\widehat{\theta}$ is consistent if $\widehat{\theta} \xrightarrow{P} \theta$ as $n \to \infty$.
6. Later we will see that if the bias $\to 0$ and $\mathrm{Var}(\widehat{\theta}) \to 0$ as $n \to \infty$, then $\widehat{\theta}$ is consistent.

12.6 The method of moments

Suppose that $\theta = (\theta_1, \ldots, \theta_k)$. Define
$$m_1 = \frac{1}{n}\sum_{i=1}^n X_i, \qquad \mu_1(\theta) = \mathbb{E}(X_i),$$
$$m_2 = \frac{1}{n}\sum_{i=1}^n X_i^2, \qquad \mu_2(\theta) = \mathbb{E}(X_i^2),$$
$$\vdots$$
$$m_k = \frac{1}{n}\sum_{i=1}^n X_i^k, \qquad \mu_k(\theta) = \mathbb{E}(X_i^k).$$
Let $\widehat{\theta} = (\widehat{\theta}_1, \ldots, \widehat{\theta}_k)$ solve:
$$m_j = \mu_j(\widehat{\theta}), \qquad j = 1, \ldots, k.$$
In other words, we equate the first $k$ sample moments with the first $k$ theoretical moments. This defines $k$ equations with $k$ unknowns.

Example 12.7 $X_1, \ldots, X_n \sim N(\beta, \sigma^2)$ with $\theta = (\beta, \sigma^2)$. Then $\mu_1 = \beta$ and $\mu_2 = \sigma^2 + \beta^2$. Equate:
$$\frac{1}{n}\sum_{i=1}^n X_i = \widehat{\beta}, \qquad \frac{1}{n}\sum_{i=1}^n X_i^2 = \widehat{\sigma}^2 + \widehat{\beta}^2$$
to get
$$\widehat{\beta} = \bar{X}, \qquad \widehat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$

Example 12.8 Suppose
$$X_1, \ldots, X_n \sim \mathrm{Binomial}(k, p)$$
where both $k$ and $p$ are unknown. We get
$$\widehat{k}\widehat{p} = \bar{X}, \qquad \frac{1}{n}\sum_{i=1}^n X_i^2 = \widehat{k}\widehat{p}(1 - \widehat{p}) + \widehat{k}^2\widehat{p}^2,$$
giving
$$\widehat{p} = \frac{\bar{X}}{\widehat{k}}, \qquad \widehat{k} = \frac{\bar{X}^2}{\bar{X} - \frac{1}{n}\sum_i (X_i - \bar{X})^2}.$$

The method of moments was popular many years ago because it is often easy to compute. Lately, it has attracted attention again. For example, there is a large literature on estimating mixtures of Gaussians using the method of moments.
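Example 12.8 can be illustrated numerically with a short simulation (a sketch, not part of the original notes; the true values $k = 10$, $p = 0.4$ and the sample size are arbitrary choices):

```python
import random

random.seed(3)

def mom_binomial(xs):
    # Method-of-moments estimates for X_i ~ Binomial(k, p) with k and p unknown:
    # solve  k*p = Xbar  and  (1/n) sum X_i^2 = k*p*(1-p) + (k*p)^2,
    # which gives  k_hat = Xbar^2 / (Xbar - s2)  and  p_hat = Xbar / k_hat,
    # where s2 = (1/n) sum (X_i - Xbar)^2.
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    k_hat = xbar ** 2 / (xbar - s2)
    p_hat = xbar / k_hat
    return k_hat, p_hat

# Simulated data from Binomial(k=10, p=0.4) (illustrative ground truth)
k_true, p_true = 10, 0.4
xs = [sum(1 for _ in range(k_true) if random.random() < p_true)
      for _ in range(5000)]
k_hat, p_hat = mom_binomial(xs)
```

Note that $\widehat{k}$ is unstable when $\bar{X} - s^2$ is close to zero, which is one reason this particular MOM estimator is treated with caution in practice.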