Lecture 3: Properties of Summary Statistics: Sampling Distributions
Main Theme: How can we use math to justify that our numerical summaries from the sample are good summaries of the population?
Lecture Summary
Today, we focus on two summary statistics of the sample and study their theoretical properties:
Sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Sample variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
They aim to give us an idea about the population mean and the population variance (i.e. parameters). First, we'll study, on average, how well our statistics do in estimating the parameters. Second, we'll study the distribution of the summary statistics, known as sampling distributions.
Setup
Let $X_1, \dots, X_n \in \mathbb{R}^p$ be i.i.d. samples from the population $F_\theta$.
$F_\theta$: distribution of the population (e.g. normal) with features/parameters $\theta$. Often the distribution AND the parameters are unknown. That's why we sample from it, to get an idea of them!
i.i.d.: independent and identically distributed. Every time we sample, we redraw $n$ members from the population and obtain $(X_1, \dots, X_n)$. This provides a representative sample of the population.
$\mathbb{R}^p$: dimension of $X_i$. For simplicity, we'll consider univariate cases (i.e. $p = 1$).
Loss Function
How good are our numerical summaries (i.e. statistics) at capturing features of the population (i.e. parameters)?
Loss function: measures how good the statistic is at estimating the parameter. A loss of 0 means the statistic is a perfect estimator of the parameter; an infinite loss means the statistic is a terrible estimator of the parameter.
Example: $\ell(T, \theta) = (T - \theta)^2$, where $T$ is the statistic and $\theta$ is the parameter. This is called squared-error loss.
Sadly...
It is impossible to compute the value of the loss function. Why? We don't know what the parameter is! (Since it's an unknown feature of the population, and we're trying to study it!)
More importantly, the statistic is random! It changes every time we take a different sample from the population. Thus, the value of our loss function changes per sample.
A Remedy
A more manageable question: on average, how good is our statistic at estimating the parameter?
Risk (i.e. expected loss): the average loss incurred after repeated sampling, $R(\theta) = E[\ell(T, \theta)]$. Risk is a function of the parameter.
For squared-error loss, we have $R(\theta) = E[(T - \theta)^2]$.
Bias-Variance Trade-Off: Squared-Error Risk
After some algebra, we obtain another expression for the squared-error risk:
$R(\theta) = E[(T - \theta)^2] = (E[T] - \theta)^2 + E[(T - E[T])^2] = \mathrm{Bias}_\theta(T)^2 + \mathrm{Var}(T)$
Bias: on average, how far the statistic is from the parameter (i.e. accuracy): $\mathrm{Bias}_\theta(T) = E[T] - \theta$.
Variance: how variable the statistic is (i.e. precision).
In estimation, there is always a bias-variance tradeoff!
Sample Mean: Bias, Variance, and Risk
Let $T(X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^{n} X_i$. We want to see how well the sample mean estimates the population mean. We'll use squared-error loss.
$\mathrm{Bias}_\mu(T) = 0$, i.e. the sample mean is unbiased for the population mean. Interpretation: on average, the sample mean will be close to the population mean, $\mu$.
$\mathrm{Var}(T) = \frac{\sigma^2}{n}$. Interpretation: the sample mean is precise up to order $\frac{1}{n}$. That is, we decrease the variability of our estimate of the population mean by a factor of $\frac{1}{n}$.
Thus, $R(\mu) = \frac{\sigma^2}{n}$. Interpretation: on average, the squared distance between the sample mean and the population mean is $\frac{\sigma^2}{n}$. This holds for every population mean $\mu$.
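A quick Monte Carlo sketch of these facts (assuming NumPy is available; the values of $\mu$, $\sigma$, and $n$ below are arbitrary illustration choices): repeated sampling lets us estimate the bias, variance, and risk of the sample mean, and confirm both the decomposition $R = \mathrm{Bias}^2 + \mathrm{Var}$ and the value $R(\mu) = \sigma^2/n$.

```python
import numpy as np

# Arbitrary illustration choices for the population and sample size.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

# T holds the sample mean of each of `reps` independent samples of size n.
T = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

bias = T.mean() - mu            # estimate of Bias_mu(T); should be near 0
var = T.var()                   # estimate of Var(T); should be near sigma^2 / n
risk = np.mean((T - mu) ** 2)   # estimate of R(mu) = E[(T - mu)^2]
```

On the simulated samples, `risk` matches `bias**2 + var` exactly (the decomposition is an algebraic identity) and both are close to `sigma**2 / n = 0.4`.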
Sample Variance and Bias
Let $T(X_1, \dots, X_n) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$. We want to see how well the sample variance estimates the population variance. We'll use squared-error loss.
$\mathrm{Bias}_{\sigma^2}(T) = 0$, i.e. the sample variance is unbiased for the population variance. Interpretation: on average, the sample variance will be close to the population variance, $\sigma^2$.
$\mathrm{Var}(T) = ?$, $R(\sigma^2) = ?$ Both depend on assumptions about fourth moments.
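Unbiasedness of the sample variance can be checked the same way (a sketch; $\mu$, $\sigma$, and $n$ are arbitrary choices, and `ddof=1` gives the $\frac{1}{n-1}$ factor):

```python
import numpy as np

# Arbitrary illustration choices.
rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 3.0, 5, 400_000

X = rng.normal(mu, sigma, size=(reps, n))
S2 = X.var(axis=1, ddof=1)  # sample variance with the 1/(n - 1) factor

mean_S2 = S2.mean()  # should be close to sigma^2 = 9
```

Even with a sample size as small as $n = 5$, the average of $S^2$ over repeated samples sits close to $\sigma^2$.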
In Summary
We studied how good, on average, our statistic is at estimating the parameter. The sample mean, $\bar{X}$, and the sample variance, $S^2$, are unbiased for the population mean, $\mu$, and the population variance, $\sigma^2$, respectively.
But...
But what about the distribution of our summary statistics? So far, we have only studied the statistics' average behavior. Sampling distribution!
Sampling Distribution when F is Normal
Case 1 (Sample Mean): Suppose $F$ is a normal distribution with mean $\mu$ and variance $\sigma^2$ (denoted as $N(\mu, \sigma^2)$). Then $\bar{X}$ is distributed as
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Proof: use the fact that $X_i \sim N(\mu, \sigma^2)$ i.i.d.
Sampling Distribution Example
Suppose you want to know the probability that your sample mean, $\bar{X}$, is at least $\varepsilon > 0$ away from the population mean. We assume $\sigma$ is known and $X_i \sim N(\mu, \sigma^2)$ i.i.d. Then
$P(|\bar{X} - \mu| \ge \varepsilon) = P\left(\frac{\sqrt{n}\,|\bar{X} - \mu|}{\sigma} \ge \frac{\sqrt{n}\,\varepsilon}{\sigma}\right) = P\left(|Z| \ge \frac{\sqrt{n}\,\varepsilon}{\sigma}\right)$
where $Z \sim N(0, 1)$.
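This tail probability can be computed exactly from the standard normal CDF and checked by simulation (a sketch; the values of $\mu$, $\sigma$, $n$, and $\varepsilon$ are arbitrary choices, and $\Phi$ is built from `math.erf`):

```python
import math
import numpy as np

# Arbitrary illustration choices.
rng = np.random.default_rng(2)
mu, sigma, n, eps = 10.0, 2.0, 25, 0.5

# P(|Xbar - mu| >= eps) = 2 * (1 - Phi(sqrt(n) * eps / sigma)),
# where Phi is the N(0, 1) CDF, expressed via the error function.
z = math.sqrt(n) * eps / sigma
phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
exact = 2.0 * (1.0 - phi)

# Monte Carlo check: simulate many sample means and count the exceedances.
xbar = rng.normal(mu, sigma, size=(200_000, n)).mean(axis=1)
mc = np.mean(np.abs(xbar - mu) >= eps)
```

Here $\sqrt{n}\,\varepsilon/\sigma = 1.25$, so the exact probability is about $0.211$, and the simulated frequency agrees to within Monte Carlo error.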
Sampling Distribution when F is Normal
Case 1 (Sample Variance): Suppose $F$ is a normal distribution with mean $\mu$ and variance $\sigma^2$ (denoted as $N(\mu, \sigma^2)$). Then $\frac{(n-1)S^2}{\sigma^2}$ is distributed as
$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \chi^2_{n-1}$
where $\chi^2_{n-1}$ is the chi-square distribution with $n-1$ degrees of freedom.
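A simulation sketch of this result (parameter values are arbitrary choices): a $\chi^2_k$ random variable has mean $k$ and variance $2k$, so $(n-1)S^2/\sigma^2$ should have mean $n-1$ and variance $2(n-1)$ over repeated samples.

```python
import numpy as np

# Arbitrary illustration choices.
rng = np.random.default_rng(3)
mu, sigma, n, reps = 1.0, 2.0, 8, 300_000

X = rng.normal(mu, sigma, size=(reps, n))
W = (n - 1) * X.var(axis=1, ddof=1) / sigma**2  # should be chi-square, n-1 df

# Chi-square with k degrees of freedom has mean k and variance 2k.
mean_W, var_W = W.mean(), W.var()
```

With $n = 8$, the simulated mean and variance land near $7$ and $14$, matching the $\chi^2_7$ moments.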
Some Preliminaries from Stat 430
Fact 0: $(\bar{X}, X_1 - \bar{X}, \dots, X_n - \bar{X})$ is jointly normal. Proof: because $\bar{X}$ and the $X_i - \bar{X}$ are linear combinations of normal random variables, they must be jointly normal.
Fact 1: For any $i = 1, \dots, n$, we have $\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = 0$. Proof:
$E[\bar{X}(X_i - \bar{X})] = \frac{n-1}{n}\mu^2 + \frac{1}{n}(\mu^2 + \sigma^2) - \left(\mu^2 + \frac{\sigma^2}{n}\right) = 0$
$E[\bar{X}]\,E[X_i - \bar{X}] = \mu \cdot (\mu - \mu) = 0$
Thus, $\mathrm{Cov}(\bar{X}, X_i - \bar{X}) = E[\bar{X}(X_i - \bar{X})] - E[\bar{X}]\,E[X_i - \bar{X}] = 0$.
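Fact 1 is easy to see numerically (a sketch with standard normal data; $n$ and the number of replications are arbitrary choices): the sample covariance between $\bar{X}$ and $X_1 - \bar{X}$ across many repeated samples should be near zero.

```python
import numpy as np

# Arbitrary illustration choices; standard normal population.
rng = np.random.default_rng(5)
n, reps = 6, 500_000

X = rng.normal(0.0, 1.0, size=(reps, n))
xbar = X.mean(axis=1)
resid = X[:, 0] - xbar  # X_1 - Xbar for each sample

# Sample covariance between Xbar and X_1 - Xbar; should be near 0.
cov = np.mean(xbar * resid) - xbar.mean() * resid.mean()
```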
Since $\bar{X}$ and $X_i - \bar{X}$ are jointly normal, the zero covariance between them implies that $\bar{X}$ and $X_i - \bar{X}$ are independent. Furthermore, because $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ is a function of the $X_i - \bar{X}$, $S^2$ is independent of $\bar{X}$.
Fact 2: If $W = U + V$ with $W \sim \chi^2_{a+b}$, $V \sim \chi^2_b$, and $U$ and $V$ independent, then $U \sim \chi^2_a$. Proof: use moment generating functions.
Now, we can prove the sampling distribution of the sample variance (see blackboard):
$W = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i=1}^{n} \left(\frac{X_i - \bar{X} + \bar{X} - \mu}{\sigma}\right)^2 = U + V \sim \chi^2_n$
where $U = \frac{(n-1)S^2}{\sigma^2}$ and $V = \frac{n(\bar{X} - \mu)^2}{\sigma^2} \sim \chi^2_1$. Thus, $U \sim \chi^2_{n-1}$.
Sampling Distribution when F is not Normal
Case 2: Suppose $F$ is an arbitrary distribution with mean $\mu$ and variance $\sigma^2$ (denoted as $F(\mu, \sigma^2)$). Then as $n \to \infty$,
$\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \to N(0, 1)$ in distribution.
Proof: use the fact that $X_i \sim F(\mu, \sigma^2)$ i.i.d. and apply the central limit theorem.
Properties
As the sample size increases, $\bar{X}$ is unbiased: asymptotically unbiased.
As the sample size increases, the distribution of $\bar{X}$ approaches the normal distribution at rate $\frac{1}{\sqrt{n}}$.
Even if we don't have an infinite sample size, using $N\left(\mu, \frac{\sigma^2}{n}\right)$ as an approximation to the distribution of $\bar{X}$ is meaningful for large samples. How large? General rule of thumb: $n \ge 30$.
Example
Suppose $X_i \sim \mathrm{Exp}(\lambda)$ i.i.d. Remember, $E[X_i] = \frac{1}{\lambda}$ and $\mathrm{Var}(X_i) = \frac{1}{\lambda^2}$. Then, for large enough $n$, $\bar{X} \approx N\left(\frac{1}{\lambda}, \frac{1}{n\lambda^2}\right)$.
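A simulation sketch of this example ($\lambda$, $n$, and the number of replications are arbitrary choices; note NumPy's exponential sampler takes the scale $1/\lambda$, not the rate):

```python
import numpy as np

# Arbitrary illustration choices.
rng = np.random.default_rng(4)
lam, n, reps = 2.0, 100, 100_000

# Exponential with rate lam has mean 1/lam and variance 1/lam^2;
# NumPy parameterizes by the scale 1/lam.
xbar = rng.exponential(1.0 / lam, size=(reps, n)).mean(axis=1)

# CLT approximation: Xbar ~ N(1/lam, 1/(n * lam^2)) for large n.
mean_xbar, var_xbar = xbar.mean(), xbar.var()
```

With $\lambda = 2$ and $n = 100$, the simulated mean and variance of $\bar{X}$ are close to $0.5$ and $0.0025$, as the normal approximation predicts.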
An Experiment
Lecture Summary
Risk: the average loss/mistake that the statistic makes in estimating the population parameter. Bias-variance tradeoff for squared-error loss. $\bar{X}$ and $S^2$ are unbiased for the population mean and the population variance.
Sampling distribution: the distribution of the statistics used for estimating the population parameters.
If the population is normally distributed: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ and $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
If the population is not normally distributed: $\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \to N(0, 1)$, i.e. $\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)$.