ECE 6540, Lecture 06 Sufficient Statistics & Complete Statistics Variations
Last Time
- Minimum Variance Unbiased Estimators
- Sufficient Statistics
- Proving $t = T(x)$ is sufficient
- Neyman-Fisher Factorization
Practice
Consider $y = x + w$ with $w \sim \mathcal{N}(0, \Sigma)$. What is the distribution of:
- $y$
- $Hy$
- $y^T H y$
- $1^T y / N$
- $y^T x$
- $x^T \Lambda y$
Neyman-Fisher Factorization
Let's reconsider $X_1, X_2, X_3, \ldots, X_N \sim$ iid $\mathcal{N}(\mu, \sigma^2)$ with unknown $\theta = [\mu,\ \sigma^2]^T$

$p(x;\theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\|x - \mu 1\|^2\right) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\left(x^T x - 2\mu x^T 1 + N\mu^2\right)\right)$

Are these sufficient statistics for $\mu$ and/or $\sigma^2$?
- $x^T 1$
- $x^T x$
- $\|x - \mu 1\|^2 = x^T x - 2\mu x^T 1 + N\mu^2$
- $\left\|x - \frac{x^T 1}{N} 1\right\|^2 = x^T x - 2\frac{(x^T 1)^2}{N} + \frac{(x^T 1)^2}{N} = x^T x - \frac{(x^T 1)^2}{N}$, which contains only $x^T 1$ and $x^T x$
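Sufficiency of the pair $(x^T 1,\ x^T x)$ can also be illustrated numerically: any two data sets that share the same values of these two statistics have identical Gaussian likelihood for every $(\mu, \sigma^2)$. A sketch assuming Python/NumPy; mirroring the data about its mean ($y = 2\bar{x}1 - x$) is just one convenient way to build a second, different sample with the same two statistics.

```python
import numpy as np

def gaussian_loglike(x, mu, sigma2):
    """iid N(mu, sigma2) log-likelihood, written in terms of x^T 1 and x^T x."""
    N = len(x)
    s1, s2 = x.sum(), (x**2).sum()   # the two candidate sufficient statistics
    return -0.5 * N * np.log(2 * np.pi * sigma2) - (s2 - 2*mu*s1 + N*mu**2) / (2*sigma2)

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=8)
y = 2 * x.mean() - x   # mirror about the mean: same x^T 1 and x^T x, different samples

# Identical likelihoods for several (mu, sigma^2) pairs
for mu, s2 in [(0.0, 1.0), (2.0, 2.25), (-1.0, 0.5)]:
    assert np.isclose(gaussian_loglike(x, mu, s2), gaussian_loglike(y, mu, s2))
```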
Last Time
Consider random variables $X_1, X_2, X_3, \ldots, X_N$ iid such that $X_i \in \{1, 2, 3, \ldots, M\}$ with $\Pr[X = i] = p_i$ and $\theta = [p_1\ p_2\ \cdots\ p_M]^T$
What is this describing?
Practice
Consider random variables $X_1, X_2, X_3, \ldots, X_N$ iid such that $X_i \in \{1, 2, 3, \ldots, M\}$ with $\Pr[X = i] = p_i$ and $\theta = [p_1\ p_2\ \cdots\ p_M]^T$

$p(x;\theta) = p_1^{\delta(x-1)}\, p_2^{\delta(x-2)}\, p_3^{\delta(x-3)} \cdots p_M^{\delta(x-M)}$
Practice
Consider random variables $X_1, X_2, X_3, \ldots, X_N$ iid such that $X_i \in \{1, 2, 3, \ldots, M\}$ with $\Pr[X = i] = p_i$ and $\theta = [p_1\ p_2\ \cdots\ p_M]^T$

$p(x;\theta) = \prod_{n=1}^{N} p_1^{\delta(x_n-1)}\, p_2^{\delta(x_n-2)}\, p_3^{\delta(x_n-3)} \cdots p_M^{\delta(x_n-M)} = p_1^{\sum_{n=1}^N \delta(x_n-1)}\, p_2^{\sum_{n=1}^N \delta(x_n-2)} \cdots p_M^{\sum_{n=1}^N \delta(x_n-M)}$

$= g\left(T(x) = \left[\sum_{n=1}^N \delta(x_n-1),\ \ldots,\ \sum_{n=1}^N \delta(x_n-M)\right]^T;\ \theta\right)$

A Histogram!!!
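The slide's conclusion can be sketched in code: the categorical log-likelihood computed sample by sample equals the one computed from the histogram counts alone, so the counts carry all the information about $\theta$. A minimal illustration assuming Python/NumPy; the helper names are hypothetical.

```python
import numpy as np

def loglike_per_sample(x, p):
    """Categorical log-likelihood computed one sample at a time."""
    return sum(np.log(p[xi - 1]) for xi in x)

def loglike_from_histogram(counts, p):
    """Same log-likelihood, but using only the histogram T(x) = (n_1, ..., n_M)."""
    return float(np.dot(counts, np.log(p)))

x = np.array([1, 3, 2, 3, 3, 1, 2])        # samples taking values in {1, 2, 3}
counts = np.bincount(x, minlength=4)[1:]   # histogram: occurrences of 1, 2, 3
p = np.array([0.2, 0.3, 0.5])              # one candidate theta

# The likelihood depends on the data only through the counts.
assert np.isclose(loglike_per_sample(x, p), loglike_from_histogram(counts, p))
```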
Practice: Beta Distribution Example
Let's consider $X_1, X_2, X_3, \ldots, X_N \sim$ iid Beta$(\alpha, \beta)$ with unknown $\beta$
$p(x_i;\beta) = \frac{1}{B(\alpha,\beta)} x_i^{\alpha-1} (1-x_i)^{\beta-1}$ for $0 \le x_i \le 1$
Determine a sufficient statistic for $\beta$. See Homework!!
Source: https://en.wikipedia.org/wiki/Beta_distribution
Neyman-Fisher Factorization
Note about minimal sufficient statistics: Kay's definition is misleading
According to Kay: "The data set that contains the least number of elements is called the minimal one."
More correct definition: a sufficient statistic $S(x)$ is minimal sufficient if, for every sufficient statistic $T(x)$, there exists a function $f$ such that $S(x) = f(T(x))$
That is, if $S(x)$ can be written as a function of any other sufficient statistic, then $S(x)$ is a minimal sufficient statistic
Using Sufficient Statistics to Find the MVU Estimator
Rao-Blackwell-Lehmann-Scheffé Theorem
If $\hat\theta$ is an unbiased estimator of $\theta$ and $T(x)$ is a sufficient statistic for $\theta$, then $\check\theta = E[\hat\theta \mid T(x)]$ is
1) a valid estimator for $\theta$ (not dependent on $\theta$)
2) unbiased ($T(x)$ represents more information)
3) of lesser or equal variance than that of $\hat\theta$, for all $\theta$ (each element of $\theta$ has lesser or equal variance)
4) if the sufficient statistic is complete, then $\check\theta$ is the MVU estimator
Rao-Blackwell-Lehmann-Scheffé Theorem
If $\hat\theta$ is an unbiased estimator of $\theta$ and $T(x)$ is a sufficient statistic for $\theta$, then $\check\theta = E[\hat\theta \mid T(x)]$ is...
[Diagram: conditioning $\hat\theta_1$ on successive sufficient statistics $T_1(x), T_2(x), T_3(x)$ yields $\hat\theta_2, \hat\theta_3, \hat\theta_4$, each a smaller-variance estimate, ending at $\check\theta$, the smallest-variance estimate]
Rao-Blackwell-Lehmann-Scheffé Theorem
If $\hat\theta$ is an unbiased estimator of $\theta$ and $T(x)$ is a sufficient statistic for $\theta$, then $\check\theta = E[\hat\theta \mid T(x)]$ is...
[Diagram: using a complete statistic $T_1(x)$ takes $\hat\theta_1$ directly to $\check\theta$, the smallest-variance estimate]
Using Complete, Sufficient Statistics to Find the MVU Estimator
Complete Statistics
A statistic is complete if there is only one function $g$ of the statistic $t = T(x)$ that is unbiased. That is, a statistic is complete if only one $g$ exists such that $E[g(T(x))] = \theta$
Procedure for finding the MVU Estimator (using sufficient statistics)
1. Find a sufficient statistic $T(x)$ using the Neyman-Fisher factorization
2. Show that $T(x)$ is complete (if not, stop)
3. The MVU estimator is $\check\theta = E[\hat\theta \mid T(x)]$, where $\hat\theta$ is any unbiased estimator for $\theta$
[Diagram: the space of all sufficient unbiased estimators sits inside the space of all unbiased estimators; the MVUB estimator has the lowest variance]
In the end though, proving that a statistic $T(x)$ is complete is difficult
Example: Consider the signal $x = A1 + w$, $w \sim \mathcal{N}(0, \sigma^2 I)$ (DC voltage)
Show that $t = T(x) = \sum_{n=1}^N x_n$ is complete.
Example: Consider the signal $x = A1 + w$, $w \sim \mathcal{N}(0, \sigma^2 I)$ (DC voltage)
Show that $t = T(x) = \sum_{n=1}^N x_n$ is complete.
Assume there exists a function $g$ such that $E[g(t)] = A$
Assume there exists a second function $h$ such that $E[h(t)] = A$
Then $E[g(t) - h(t)] = A - A = 0$
Let $v(t) = g(t) - h(t)$. Then $E[v(t)] = \int v(t)\, p(t;\theta)\, dt = 0$
Example: Consider the signal $x = A1 + w$, $w \sim \mathcal{N}(0, \sigma^2 I)$ (DC voltage)
Show that $t = T(x) = \sum_{n=1}^N x_n$ is complete.
Then $E[v(t)] = \int v(t)\, p(t;\theta)\, dt = 0$
$t \sim \mathcal{N}(NA, N\sigma^2)$, so $E[v(t)] = \int v(t)\, \frac{1}{\sqrt{2\pi N \sigma^2}} \exp\left(-\frac{(t - NA)^2}{2N\sigma^2}\right) dt = 0$
If $\tau = t/N$, then $E[v(t)] = \int \underbrace{v(N\tau)}_{v'(\tau)}\, \underbrace{\sqrt{\frac{N}{2\pi\sigma^2}} \exp\left(-\frac{N(A - \tau)^2}{2\sigma^2}\right)}_{f(A - \tau)}\, d\tau = 0$
Example: Consider the signal $x = A1 + w$, $w \sim \mathcal{N}(0, \sigma^2 I)$ (DC voltage)
Show that $t = T(x) = \sum_{n=1}^N x_n$ is complete.
With $v'(\tau) = v(N\tau)$ and $f$ the Gaussian kernel above,
$E[v(t)] = \int v'(\tau)\, f(A - \tau)\, d\tau = \left. v'(\tau) * f(\tau) \right|_{A} = 0$ for all $A$
Taking the Fourier transform of this convolution: $V(\omega)\, F(\omega) = 0$
Example: Consider the signal $x = A1 + w$, $w \sim \mathcal{N}(0, \sigma^2 I)$ (DC voltage)
Show that $t = T(x) = \sum_{n=1}^N x_n$ is complete.
$V(\omega)\, F(\omega) = 0$
The Fourier transform of a Gaussian is Gaussian, so $F(\omega)$ is never zero
Therefore $V(\omega) = 0 \Rightarrow v'(\tau) = 0 \Rightarrow v(t) = 0 \Rightarrow g(t) - h(t) = 0$
Hence $g(t) = h(t)$: there can only be one unbiased function! $T(x)$ is a complete statistic!
Complete Statistics
A statistic is complete if there is only one function $g$ of the statistic $t = T(x)$ that is unbiased. That is, a statistic is complete if only one $g$ exists such that $E[g(T(x))] = \theta$
Alternatively, a statistic is complete if $\int v(t)\, p(t;\theta)\, dt = E[v(T(x))] = 0$ for all $\theta$ is satisfied only when $v(t) = 0$ for all $t = T(x)$
There is one simplification
Consider the Neyman-Fisher factorization: $p(x;\theta) = g(t;\theta)\, h(x)$
If $g(t;\theta)$ has the form $g(t;\theta) = c(\theta) \exp\left(\sum_{k=1}^K a_k(\theta)\, t_k\right)$
then $p(x;\theta)$ and $p(t;\theta)$ are said to be in the exponential family, and $t = T(x)$ is a complete, sufficient statistic
Many common PDFs satisfy this.
Example: Are the following distributions in the exponential family?
Exponential: $p(x;\lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $0$ for $x < 0$
Gamma: $p(x;k,\theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^k\, \Gamma(k)}$
Gaussian: $p(x;\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
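As a check for the first case, the exponential PDF already sits in the required form; a short derivation, consistent with the template $g(t;\theta) = c(\theta)\exp\left(\sum_k a_k(\theta) t_k\right)$ above:

```latex
p(x;\lambda) = \lambda e^{-\lambda x}
            = \underbrace{\lambda}_{c(\lambda)}
              \exp\big(\underbrace{(-\lambda)}_{a(\lambda)}\,\underbrace{x}_{t(x)}\big),
            \qquad x \ge 0,
\quad\Rightarrow\quad
p(x;\lambda) = \lambda^N \exp\Big(-\lambda \sum_{n=1}^{N} x_n\Big)
```

so for $N$ iid samples, $t = \sum_{n=1}^N x_n$ is a complete, sufficient statistic for $\lambda$.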
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid Bernoulli$(\theta)$
Determine a complete, sufficient statistic for $\theta$
Determine an MVUB estimator
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid Bernoulli$(\theta)$

$p(x;\theta) = \prod_{i=1}^N \theta^{x_i} (1-\theta)^{1-x_i} = \theta^{\sum_{i=1}^N x_i} (1-\theta)^{N - \sum_{i=1}^N x_i} = (1-\theta)^N \left(\frac{\theta}{1-\theta}\right)^{\sum_{i=1}^N x_i} = (1-\theta)^N \exp\left(\log\left(\frac{\theta}{1-\theta}\right) \sum_{i=1}^N x_i\right)$

This matches $g(t;\theta) = c(\theta) \exp\left(\sum_{k=1}^K a_k(\theta)\, t_k\right)$ with $c(\theta) = (1-\theta)^N$, $a(\theta) = \log\frac{\theta}{1-\theta}$, $t = \sum_{i=1}^N x_i$, and $h(x) = 1$
Hence $T(x) = \sum_{i=1}^N x_i$ is a complete sufficient statistic
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid Bernoulli$(\theta)$
The expected value of the sufficient statistic is $E[t] = E\left[\sum_{i=1}^N x_i\right] = \sum_{i=1}^N E[x_i] = N\theta$
Therefore $t/N$ is an unbiased estimator of $\theta$. Since $t$ is complete, it is also MVUB. So the MVUB estimator is $\hat\theta = \frac{1}{N} \sum_{i=1}^N x_i$
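The Rao-Blackwell step behind this result can be seen in a quick Monte Carlo sketch (Python/NumPy assumed): starting from the crude unbiased estimator $\hat\theta = x_1$, conditioning on $t = \sum_i x_i$ gives $E[x_1 \mid t] = t/N$, the sample mean, and the variance drops from $\theta(1-\theta)$ to $\theta(1-\theta)/N$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, N, trials = 0.3, 20, 200_000

x = rng.random((trials, N)) < theta     # Bernoulli(theta) samples, one row per trial
crude = x[:, 0].astype(float)           # unbiased but high-variance: just x_1
rb = x.mean(axis=1)                     # E[x_1 | sum x] = t/N, the MVUB estimator

# Both are unbiased; Rao-Blackwellization shrinks the variance
# from theta(1-theta) down to theta(1-theta)/N.
assert abs(crude.mean() - theta) < 0.01
assert abs(rb.mean() - theta) < 0.01
assert rb.var() < crude.var()
```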
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid $\mathcal{N}(\theta, 1)$
Determine a complete, sufficient statistic for $\theta$
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid $\mathcal{N}(\theta, 1)$

$p(x;\theta) = \frac{1}{(2\pi)^{N/2}} \exp\left(-\frac{1}{2}\|x - \theta 1\|^2\right) = \frac{1}{(2\pi)^{N/2}} \exp\left(-\frac{1}{2}\left(x^T x - 2\theta x^T 1 + N\theta^2\right)\right)$
$= \frac{1}{(2\pi)^{N/2}} \exp\left(-\frac{1}{2}\left(-2\theta\, x^T 1 + N\theta^2\right)\right) \exp\left(-\frac{x^T x}{2}\right) = \frac{1}{(2\pi)^{N/2}} \exp\left(\theta\, x^T 1 - \frac{N\theta^2}{2}\right) \exp\left(-\frac{x^T x}{2}\right)$

This matches $g(T(x) = x^T 1;\ \theta)\, h(x) = c(\theta) \exp\left(\sum_{k=1}^K a_k(\theta)\, t_k\right) h(x)$ with $c(\theta) = \exp(-N\theta^2/2)$, $a(\theta) = \theta$, $t = x^T 1$, and $h(x) = (2\pi)^{-N/2} \exp(-x^T x/2)$
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid $\mathcal{N}(\theta, 1)$
The unbiased complete, sufficient statistic is $x^T 1 / N$
Hence the MVUB estimator is $\hat\theta = x^T 1 / N$
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid $\mathcal{N}(\theta, \sigma^2)$ with unknown $\theta, \sigma^2$
Determine a complete, sufficient statistic for $\theta$ and $\sigma^2$
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid $\mathcal{N}(\theta, \sigma^2)$ with unknown $\theta, \sigma^2$

$p(x;\theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\|x - \theta 1\|^2\right) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\left(x^T x - 2\theta\, x^T 1 + N\theta^2\right)\right)$
$= \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\left(T_2(x) - 2\theta\, T_1(x) + N\theta^2\right)\right)$ with $T_1(x) = x^T 1$, $T_2(x) = x^T x$

This matches $g\left(t = \begin{bmatrix} x^T 1 \\ x^T x \end{bmatrix};\ \begin{bmatrix} \theta \\ \sigma^2 \end{bmatrix}\right) h(x) = c(\theta) \exp\left(\sum_{k=1}^K a_k(\theta)\, t_k\right) h(x)$ with $a_1(\theta) = \theta/\sigma^2$, $a_2(\theta) = -1/(2\sigma^2)$, and $h(x) = 1$
Example
Let $X_1, X_2, X_3, \ldots, X_N \sim$ iid $\mathcal{N}(\theta, \sigma^2)$ with unknown $\theta, \sigma^2$
The unbiased complete, sufficient statistic for $\theta$ is $x^T 1 / N$
The unbiased complete, sufficient statistic for $\sigma^2$ is $\frac{1}{N-1}\left\|x - \frac{x^T 1}{N} 1\right\|^2$
Hence the MVUB estimators are $\hat\theta = x^T 1 / N$ and $\hat\sigma^2 = \frac{1}{N-1}\left\|x - \frac{x^T 1}{N} 1\right\|^2$
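A quick Monte Carlo sketch (Python/NumPy assumed) of the two MVUB estimators: the sample mean and the $\frac{1}{N-1}$-normalized sample variance both average to the true parameters across many trials.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma2, N, trials = 1.0, 4.0, 10, 200_000

x = rng.normal(theta, np.sqrt(sigma2), size=(trials, N))
theta_hat = x.mean(axis=1)           # x^T 1 / N
sigma2_hat = x.var(axis=1, ddof=1)   # ||x - (x^T 1 / N) 1||^2 / (N - 1)

# Both estimators are unbiased (up to Monte Carlo error).
assert abs(theta_hat.mean() - theta) < 0.02
assert abs(sigma2_hat.mean() - sigma2) < 0.05
```

Note the `ddof=1` choice: with `ddof=0` (the default, dividing by $N$), the variance estimate would be biased low by a factor of $(N-1)/N$.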
Beta Distribution Example
Let's consider $X_1, X_2, X_3, \ldots, X_N \sim$ iid Beta$(\alpha, \beta)$ with unknown $\beta$
$p(x_i;\beta) = \frac{1}{B(\alpha,\beta)} x_i^{\alpha-1} (1-x_i)^{\beta-1}$ for $0 \le x_i \le 1$
Determine a complete, sufficient statistic for $\beta$. See Homework!!!
Consider $X_1, X_2, X_3, \ldots, X_N \sim$ iid exponential$(\lambda)$
The statistic $t = \sum_{n=1}^N x_n$ is a complete, sufficient statistic for $\lambda$. What is the MVUB estimator?
Consider $X_1, X_2, X_3, \ldots, X_N \sim$ iid exponential$(\lambda)$
The statistic $t = \sum_{n=1}^N x_n$ is a complete, sufficient statistic for $\lambda$. What is the MVUB estimator?
$t \sim \text{Erlang}(N, \lambda)$, i.e., Gamma$(N, 1/\lambda)$
$\frac{1}{t} \sim \text{InvGamma}(N, \lambda)$, and $E\left[\frac{1}{t}\right] = \frac{\lambda}{N-1}$
So the MVUB estimator is $\hat\lambda = \frac{N-1}{\sum_{n=1}^N x_n}$
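The $N-1$ factor matters: the sketch below (Python/NumPy assumed) contrasts the MVUB estimator $(N-1)/\sum_n x_n$ with the maximum-likelihood estimate $N/\sum_n x_n$, which is biased high by the factor $N/(N-1)$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, N, trials = 2.0, 5, 400_000

# NumPy parameterizes the exponential by scale = 1/rate
x = rng.exponential(1 / lam, size=(trials, N))
t = x.sum(axis=1)
mvub = (N - 1) / t   # unbiased: E[(N-1)/t] = lam
ml = N / t           # maximum likelihood: biased high by N/(N-1)

assert abs(mvub.mean() - lam) < 0.05   # unbiased up to Monte Carlo error
assert ml.mean() > mvub.mean()         # ML overshoots
```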
Procedure for finding the MVU Estimator (using sufficient statistics)
For $x = A1 + w$, $w \sim \mathcal{N}(0, \sigma^2 I)$:
1. Find a sufficient statistic $T(x)$ using the Neyman-Fisher factorization
2. Show that $T(x)$ is complete (if not, stop)
3. The MVU estimator is $\check\theta = E[\hat\theta \mid T(x)]$, where $\hat\theta$ is any unbiased estimator for $\theta$
In the end,
- Applying the Neyman-Fisher factorization can be difficult
- Proving that a statistic is complete can be difficult (unless you have an exponential-family PDF)
- Applying the Rao-Blackwell-Lehmann-Scheffé Theorem can be difficult
What else can we do?