SAMPLE STATISTICS

A random sample $x_1, x_2, \ldots, x_n$ from a distribution $f(x)$ is a set of independently and identically distributed variables with $x_i \sim f(x)$ for all $i$. Their joint pdf is

$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i).$$

The sample moments provide estimates of the moments of $f(x)$. We need to know how they are distributed.

The mean $\bar{x}$ of a random sample is an unbiased estimate of the population moment $\mu = E(x)$, since

$$E(\bar{x}) = E\left(\sum \frac{x_i}{n}\right) = \frac{1}{n}\sum E(x_i) = \frac{n\mu}{n} = \mu.$$

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore

$$V(\bar{x}) = V\left(\sum \frac{x_i}{n}\right) = \frac{1}{n^2}\sum V(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

Observe that $V(\bar{x}) \to 0$ as $n \to \infty$. Since $E(\bar{x}) = \mu$, the estimates become increasingly concentrated around the true population parameter. Such an estimate is said to be consistent.
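The two moment results for the sample mean can be checked numerically. The following is a minimal simulation sketch, not part of the original notes: the function name, the normal population, the sample size of 25, and the seed are all illustrative assumptions.

```python
import random
import statistics

def simulate_sample_means(n, replications, mu=0.0, sigma=1.0, seed=0):
    """Draw `replications` samples of size n from N(mu, sigma^2) and return their means.

    The normal population and the seed are assumptions made for illustration.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(replications)
    ]

means = simulate_sample_means(n=25, replications=4000, mu=3.0, sigma=2.0)
print(statistics.fmean(means))      # should be close to mu = 3.0
print(statistics.pvariance(means))  # should be close to sigma^2 / n = 0.16
```

Increasing `n` shrinks the spread of the simulated means, which is the concentration around the population mean that consistency describes.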
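The sample variance $s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$ is not an unbiased estimate of $\sigma^2 = V(x)$, since

$$\begin{aligned}
E(s^2) &= E\left\{\frac{1}{n}\sum (x_i - \bar{x})^2\right\} \\
&= E\left[\frac{1}{n}\sum \left\{(x_i - \mu) + (\mu - \bar{x})\right\}^2\right] \\
&= E\left[\frac{1}{n}\sum \left\{(x_i - \mu)^2 + 2(x_i - \mu)(\mu - \bar{x}) + (\mu - \bar{x})^2\right\}\right] \\
&= V(x) - 2E\{(\bar{x} - \mu)^2\} + E\{(\bar{x} - \mu)^2\} = V(x) - V(\bar{x}).
\end{aligned}$$

Here, we have used the result that

$$E\left\{\frac{1}{n}\sum (x_i - \mu)(\mu - \bar{x})\right\} = -E\{(\mu - \bar{x})^2\} = -V(\bar{x}).$$

It follows that

$$E(s^2) = V(x) - V(\bar{x}) = \sigma^2 - \frac{\sigma^2}{n} = \sigma^2\,\frac{(n-1)}{n}.$$

Therefore, $s^2$ is a biased estimator of the population variance. For an unbiased estimate, we should use

$$\hat{\sigma}^2 = s^2\,\frac{n}{n-1} = \frac{\sum (x_i - \bar{x})^2}{n-1}.$$

However, $s^2$ is still a consistent estimator, since $E(s^2) \to \sigma^2$ as $n \to \infty$ and also $V(s^2) \to 0$. The value of $V(s^2)$ depends on the distribution of the underlying population, which is often assumed to be normal.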
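The relationship between the biased and unbiased variance estimates can be illustrated as follows. This is a sketch under stated assumptions: the data values, the helper name `average_s2`, the normal population, and the seed are hypothetical and not taken from the notes. In Python's `statistics` module, `pvariance` divides by n and `variance` divides by n - 1.

```python
import math
import random
import statistics

# Hypothetical data, chosen only to illustrate the two divisors.
sample = [4.1, 5.3, 3.8, 6.0, 5.2, 4.7]
n = len(sample)
s2 = statistics.pvariance(sample)         # divides by n: the biased s^2
sigma2_hat = statistics.variance(sample)  # divides by n - 1: the unbiased estimate
print(math.isclose(sigma2_hat, s2 * n / (n - 1)))

def average_s2(n, replications, sigma=2.0, seed=3):
    """Average the divide-by-n variance over many simulated normal samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        draws = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(draws) / n
        total += sum((x - m) ** 2 for x in draws) / n
    return total / replications

# With sigma^2 = 4 and n = 5, the expected value of s^2 is 4 * (5 - 1) / 5 = 3.2.
avg = average_s2(n=5, replications=20000)
print(avg)
```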
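Theorem. Let $x_1, x_2, \ldots, x_n$ be a random sample from the normal population $N(\mu, \sigma^2)$. Then $y = \sum a_i x_i$ is normally distributed with $E(y) = \sum a_i E(x_i) = \mu \sum a_i$ and $V(y) = \sum a_i^2 V(x_i) = \sigma^2 \sum a_i^2$.

Any linear function of a set of normally distributed variables is normally distributed. If $x_i \sim N(\mu, \sigma^2)$; $i = 1, \ldots, n$ is a normal random sample, then $\bar{x} \sim N(\mu, \sigma^2/n)$.

Let $\mu = [\mu_1, \mu_2, \ldots, \mu_n]' = E(x)$ be the expected value of the normally distributed vector $x = [x_1, x_2, \ldots, x_n]'$ and let $\Sigma = [\sigma_{ij};\ i, j = 1, 2, \ldots, n]$ be the variance-covariance matrix. If $a = [a_1, a_2, \ldots, a_n]'$ is a constant vector, then $a'x \sim N(a'\mu, a'\Sigma a)$ is normally distributed with a mean of

$$E(a'x) = a'\mu = \sum_i a_i \mu_i$$

and a variance of

$$V(a'x) = a'\Sigma a = \sum_i \sum_j a_i a_j \sigma_{ij} = \sum_i a_i^2 \sigma_{ii} + \sum_i \sum_{j \neq i} a_i a_j \sigma_{ij}.$$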
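The mean and variance of a linear combination can be computed directly from the double sum. The sketch below uses plain Python lists; the function name and the numerical values of the weights, means, and covariance matrix are hypothetical, chosen only for illustration.

```python
def linear_combination_moments(a, mu, Sigma):
    """Return (mean, variance, variance via diagonal/off-diagonal split) of a'x."""
    n = len(a)
    mean = sum(a[i] * mu[i] for i in range(n))
    # Full double sum over i and j of a_i * a_j * sigma_ij.
    variance = sum(a[i] * a[j] * Sigma[i][j] for i in range(n) for j in range(n))
    # The same quantity split into variance terms and covariance terms.
    diagonal = sum(a[i] ** 2 * Sigma[i][i] for i in range(n))
    off_diagonal = sum(
        a[i] * a[j] * Sigma[i][j] for i in range(n) for j in range(n) if j != i
    )
    return mean, variance, diagonal + off_diagonal

# Hypothetical inputs.
a = [1.0, 2.0, -1.0]
mu = [1.0, 0.0, 2.0]
Sigma = [
    [2.0, 1.0, 0.5],
    [1.0, 3.0, 1.0],
    [0.5, 1.0, 4.0],
]
mean, variance, split = linear_combination_moments(a, mu, Sigma)
print(mean, variance, split)  # -1.0 17.0 17.0
```

When the off-diagonal covariances are all zero, only the diagonal term survives, which reproduces the random-sample result that the variance is the sum of the squared weights times the common variance.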
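Let $\iota = [1, 1, \ldots, 1]'$. Then, if $x = [x_1, x_2, \ldots, x_n]'$ has $x_i \sim N(\mu, \sigma^2)$ for all $i$, there is $x \sim N(\mu\iota, \sigma^2 I_n)$, where $\mu\iota = [\mu, \mu, \ldots, \mu]'$ and $I_n$ is an identity matrix of order $n$. Writing this explicitly, we have

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \sim N\left( \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix},\ \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} \right).$$

Then, there is

$$\bar{x} = (\iota'\iota)^{-1}\iota' x = \frac{1}{n}\iota' x \sim N(\mu, \sigma^2/n)$$

and

$$\frac{1}{n^2}\,\iota'\{\sigma^2 I\}\iota = \frac{\sigma^2}{n^2}\,\iota'\iota = \frac{\sigma^2}{n},$$

where we have used repeatedly the result that $\iota'\iota = n$.

If we do not know the form of the distribution from which the sample has been taken, we can still say that, under very general conditions, the distribution of $\bar{x}$ tends to normality as $n \to \infty$:

The Central Limit Theorem states that, if $x_1, x_2, \ldots, x_n$ is a random sample from a distribution with mean $\mu$ and variance $\sigma^2$, then the distribution of $\bar{x}$ tends to the normal distribution $N(\mu, \sigma^2/n)$ as $n \to \infty$. Equivalently, $(\bar{x} - \mu)/(\sigma/\sqrt{n})$ tends in distribution to the standard normal $N(0, 1)$ distribution.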
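The Central Limit Theorem can be illustrated with a population that is clearly not normal. The sketch below assumes an exponential population with rate 1 (mean 1, variance 1); this choice, the sample size of 50, the function name, and the seed are illustrative assumptions only. It standardizes each sample mean and reports the share that falls within the central 95% range of the standard normal.

```python
import math
import random
from statistics import NormalDist

def standardized_means(n, replications, seed=1):
    """Standardized means of samples from an Exponential(rate=1) population."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)
    out = []
    for _ in range(replications):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append((xbar - mu) / (sigma / math.sqrt(n)))
    return out

z = standardized_means(n=50, replications=5000)
beta = NormalDist().inv_cdf(0.975)  # about 1.96
share = sum(-beta <= v <= beta for v in z) / len(z)
print(share)  # should be near 0.95 if the normal approximation is adequate
```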
Distribution of the sample variance

To describe the distribution of the sample variance, we need to define the chi-square distribution:

Definition. If $z \sim N(0, 1)$ is distributed as a standard normal variable, then $z^2 \sim \chi^2(1)$ is distributed as a chi-square variate with one degree of freedom.

If $z_i \sim N(0, 1)$; $i = 1, 2, \ldots, n$, are a set of independent and identically distributed standard normal variates, then the sum of their squares is a chi-square variate of $n$ degrees of freedom:

$$u = \sum_{i=1}^{n} z_i^2 \sim \chi^2(n).$$

The mean and the variance of the $\chi^2(n)$ variate are $E(u) = n$ and $V(u) = 2n$ respectively.

Theorem. The sum of two independent chi-square variates is a chi-square variate with degrees of freedom equal to the sum of the degrees of freedom of its additive components. If $x \sim \chi^2(n)$ and $y \sim \chi^2(m)$ are independent, then $(x + y) \sim \chi^2(m + n)$.

If $x_1, x_2, \ldots, x_n$ is a random sample from a standard normal $N(0, 1)$ distribution, then

$$\sum_{i=1}^{n} x_i^2 \sim \chi^2(n).$$

If $x_1, x_2, \ldots, x_n$ is a random sample from a $N(\mu, \sigma^2)$ distribution, then $(x_i - \mu)/\sigma \sim N(0, 1)$ and, therefore,

$$\sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} \sim \chi^2(n).$$
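The stated mean and variance of a chi-square variate can be checked by building the variate from its definition as a sum of squared standard normals. This is a simulation sketch; the function name, the choice of 4 degrees of freedom, and the seed are assumptions made for illustration.

```python
import random
import statistics

def chi_square_draws(n, replications, seed=2):
    """Each draw is the sum of squares of n independent N(0, 1) variates."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        for _ in range(replications)
    ]

u = chi_square_draws(n=4, replications=20000)
print(statistics.fmean(u))      # should be close to n = 4
print(statistics.pvariance(u))  # should be close to 2n = 8
```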
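Consider the identity

$$\begin{aligned}
\sum (x_i - \mu)^2 &= \sum \left(\{x_i - \bar{x}\} + \{\bar{x} - \mu\}\right)^2 \\
&= \sum \left(\{x_i - \bar{x}\}^2 + 2\{\bar{x} - \mu\}\{x_i - \bar{x}\} + \{\bar{x} - \mu\}^2\right) \\
&= \sum \{x_i - \bar{x}\}^2 + n\{\bar{x} - \mu\}^2,
\end{aligned}$$

which follows from the fact that the cross product term is $\{\bar{x} - \mu\}\sum \{x_i - \bar{x}\} = 0$. This decomposition of a sum of squares features in the following result:

The Decomposition of a Chi-square statistic. If $x_1, x_2, \ldots, x_n$ is a random sample from a normal $N(\mu, \sigma^2)$ distribution, then

$$\sum \frac{(x_i - \mu)^2}{\sigma^2} = \sum \frac{(x_i - \bar{x})^2}{\sigma^2} + n\,\frac{(\bar{x} - \mu)^2}{\sigma^2},$$

with

(1) $\sum \dfrac{(x_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$,

(2) $\sum \dfrac{(x_i - \bar{x})^2}{\sigma^2} \sim \chi^2(n - 1)$,

(3) $n\,\dfrac{(\bar{x} - \mu)^2}{\sigma^2} \sim \chi^2(1)$,

where the statistics under (2) and (3) are independently distributed.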
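The decomposition is an algebraic identity, so it holds exactly for any data set, whatever its distribution. The sketch below verifies it numerically; the function name and the data, mean, and standard deviation values are hypothetical.

```python
import math
from statistics import fmean

def chi_square_decomposition(sample, mu, sigma):
    """Return the three terms: total, deviations about the sample mean, mean part."""
    n = len(sample)
    xbar = fmean(sample)
    total = sum((x - mu) ** 2 for x in sample) / sigma ** 2
    about_mean = sum((x - xbar) ** 2 for x in sample) / sigma ** 2
    mean_part = n * (xbar - mu) ** 2 / sigma ** 2
    return total, about_mean, mean_part

# Hypothetical data and parameters.
total, about_mean, mean_part = chi_square_decomposition(
    [2.3, 1.7, 3.1, 2.9, 2.0], mu=2.5, sigma=0.6
)
print(math.isclose(total, about_mean + mean_part))  # True
```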
Sampling Distributions

(1) If $u \sim \chi^2(m)$ and $v \sim \chi^2(n)$ are independent chi-square variates with $m$ and $n$ degrees of freedom respectively, then

$$F = \left\{\frac{u}{m} \Big/ \frac{v}{n}\right\} \sim F(m, n),$$

which is the ratio of the chi-squares divided by their respective degrees of freedom, has an F distribution of $m$ and $n$ degrees of freedom, denoted by $F(m, n)$.

(2) If $z \sim N(0, 1)$ is a standard normal variate and if $v \sim \chi^2(n)$ is a chi-square variate of $n$ degrees of freedom, and if the two variates are distributed independently, then the ratio

$$t = z \Big/ \sqrt{\frac{v}{n}} \sim t(n)$$

has a t distribution of $n$ degrees of freedom, denoted $t(n)$.

Notice that

$$t^2 = \frac{z^2}{v/n} \sim \left\{\frac{\chi^2(1)}{1} \Big/ \frac{\chi^2(n)}{n}\right\} = F(1, n).$$
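CONFIDENCE INTERVALS

Let $z \sim N(0, 1)$. From the tables of the standard normal, we can find numbers $a, b$ such that, for any $Q \in (0, 1)$, there is $P(a \le z \le b) = Q$. The interval $[a, b]$ is called a $Q \times 100\%$ confidence interval for $z$. The length of the interval is minimised when it is centred on $E(z) = 0$.

A confidence interval for the mean

Let $x_i \sim N(\mu, \sigma^2)$; $i = 1, \ldots, n$ be a random sample. Then

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad\text{and}\quad \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

Therefore, we can find numbers $\pm\beta$ such that

$$P\left(-\beta \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \beta\right) = Q.$$

But, the following events are equivalent:

$$\left(-\beta \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \beta\right) \iff \left(\bar{x} - \beta\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \beta\frac{\sigma}{\sqrt{n}}\right).$$

The probability that $[\bar{x} - \beta\sigma/\sqrt{n},\ \bar{x} + \beta\sigma/\sqrt{n}]$ falls over $\mu$ is

$$P\left(\bar{x} - \beta\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \beta\frac{\sigma}{\sqrt{n}}\right) = Q,$$

which means that we are $Q \times 100\%$ confident that $\mu$ lies in the interval.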
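The interval for the mean with a known standard deviation can be computed as in the sketch below. Instead of a printed table, it obtains the number beta from `statistics.NormalDist().inv_cdf`, which is a substitution made here for convenience. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist, fmean

def mean_interval_known_sigma(sample, sigma, Q=0.95):
    """Symmetric Q*100% interval for mu when sigma is known."""
    n = len(sample)
    xbar = fmean(sample)
    # beta satisfies P(-beta <= z <= beta) = Q for z ~ N(0, 1).
    beta = NormalDist().inv_cdf((1 + Q) / 2)
    half_width = beta * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical measurements with an assumed known sigma of 0.3.
data = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.2, 10.1]
low, high = mean_interval_known_sigma(data, sigma=0.3, Q=0.95)
print(low, high)
```

With nine observations and sigma equal to 0.3, the half-width is about 1.96 times 0.1.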
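A confidence interval for µ when σ is unknown

The unbiased estimate of $\sigma^2$ is $\hat{\sigma}^2 = \sum (x_i - \bar{x})^2/(n - 1)$. When $\sigma^2$ is replaced by $\hat{\sigma}^2$,

$$\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \sim N(0, 1) \quad\text{is replaced by}\quad \frac{\sqrt{n}(\bar{x} - \mu)}{\hat{\sigma}} \sim t(n - 1).$$

To demonstrate this result, consider writing

$$\frac{\sqrt{n}(\bar{x} - \mu)}{\hat{\sigma}} = \left\{\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \Big/ \sqrt{\frac{\sum (x_i - \bar{x})^2}{\sigma^2 (n - 1)}}\right\},$$

and observe that $\sigma$ is cancelled from the numerator and the denominator. The denominator contains $\sum (x_i - \bar{x})^2/\sigma^2 \sim \chi^2(n - 1)$. The numerator is a standard normal variate. The two are independently distributed, by the decomposition of the chi-square statistic. Therefore

$$\left\{N(0, 1) \Big/ \sqrt{\frac{\chi^2(n - 1)}{n - 1}}\right\} \sim t(n - 1).$$

To construct a confidence interval, the numbers $\pm\beta$, from the table of the $N(0, 1)$ distribution, are replaced by corresponding numbers $\pm b$, from the $t(n - 1)$ table. Then

$$P\left(\bar{x} - b\frac{\hat{\sigma}}{\sqrt{n}} \le \mu \le \bar{x} + b\frac{\hat{\sigma}}{\sqrt{n}}\right) = Q.$$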
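A sketch of the t-based interval follows. Python's standard library has no t quantile function, so the number b is passed in as an argument, to be read from a t table as the text describes. The function name and data are hypothetical; the value 2.262 is the commonly tabulated two-sided 95% point of t with 9 degrees of freedom.

```python
import math
import statistics

def mean_interval_unknown_sigma(sample, b):
    """Interval xbar +/- b * sigma_hat / sqrt(n), with b taken from a t(n - 1) table."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    sigma_hat = statistics.stdev(sample)  # square root of the divide-by-(n - 1) variance
    half_width = b * sigma_hat / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample of size 10, so the table entry is for 9 degrees of freedom.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
low, high = mean_interval_unknown_sigma(data, b=2.262)
print(low, high)
```

Because b exceeds the corresponding normal value of about 1.96, the interval is wider than it would be if the standard deviation were known, and the gap narrows as the sample size grows.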
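A confidence interval for the difference between two means

Imagine a treatment that affects the mean of a normal population without affecting its variance. To establish a confidence interval for the change in the mean, take samples before and after the treatment. Before treatment, there is

$$x_i \sim N(\mu_x, \sigma^2);\ i = 1, \ldots, n \quad\text{and}\quad \bar{x} \sim N\left(\mu_x, \frac{\sigma^2}{n}\right),$$

and, after treatment, there is

$$y_j \sim N(\mu_y, \sigma^2);\ j = 1, \ldots, m \quad\text{and}\quad \bar{y} \sim N\left(\mu_y, \frac{\sigma^2}{m}\right).$$

Assuming that the samples are mutually independent, the difference between their means is

$$(\bar{x} - \bar{y}) \sim N\left(\mu_x - \mu_y,\ \frac{\sigma^2}{n} + \frac{\sigma^2}{m}\right).$$

Hence

$$\frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}} \sim N(0, 1).$$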
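If $\sigma^2$ were known, then, for any given value of $Q \in (0, 1)$, a number $\beta$ can be found from the $N(0, 1)$ table such that

$$P\left\{(\bar{x} - \bar{y}) - \beta\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}} \le \mu_x - \mu_y \le (\bar{x} - \bar{y}) + \beta\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}\right\} = Q,$$

giving a confidence interval for $\mu_x - \mu_y$. Usually, $\sigma^2$ has to be estimated from the sample information. There are

$$\frac{\sum (x_i - \bar{x})^2}{\sigma^2} \sim \chi^2(n - 1) \quad\text{and}\quad \frac{\sum (y_j - \bar{y})^2}{\sigma^2} \sim \chi^2(m - 1),$$

which are independent variates with expectations equal to the numbers of their degrees of freedom. The sum of independent chi-squares is itself a chi-square with degrees of freedom equal to the sum of those of its constituent parts. Therefore,

$$\frac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{\sigma^2} \sim \chi^2(n + m - 2)$$

has an expected value of $n + m - 2$, whence the unbiased estimate of the variance is

$$\hat{\sigma}^2 = \frac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{n + m - 2}.$$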
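The pooled estimate of the common variance is straightforward to compute. The following is a minimal sketch; the function name and the two small data sets are hypothetical.

```python
from statistics import fmean

def pooled_variance(x, y):
    """Pooled unbiased estimate of a common variance from two independent samples."""
    xbar, ybar = fmean(x), fmean(y)
    sum_sq = sum((xi - xbar) ** 2 for xi in x) + sum((yj - ybar) ** 2 for yj in y)
    return sum_sq / (len(x) + len(y) - 2)

# Hypothetical before and after samples.
before = [1.0, 2.0, 3.0]
after = [2.0, 4.0, 6.0, 8.0]
# Sums of squares are 2 and 20, with 3 + 4 - 2 = 5 degrees of freedom.
print(pooled_variance(before, after))  # 4.4
```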
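If the estimate is used in place of the unknown value of $\sigma^2$, then we get

$$\begin{aligned}
\frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\hat{\sigma}^2}{n} + \dfrac{\hat{\sigma}^2}{m}}}
&= \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma^2}{n} + \dfrac{\sigma^2}{m}}} \Bigg/ \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{\sigma^2 (n + m - 2)}} \\
&\sim \frac{N(0, 1)}{\sqrt{\dfrac{\chi^2(n + m - 2)}{n + m - 2}}} = t(n + m - 2),
\end{aligned}$$

which is the basis for a confidence interval.

A confidence interval for the variance

If $x_i \sim N(\mu, \sigma^2)$; $i = 1, \ldots, n$ is a random sample, then $\sum (x_i - \bar{x})^2/(n - 1)$ is an unbiased estimate of the variance and $\sum (x_i - \bar{x})^2/\sigma^2 \sim \chi^2(n - 1)$. Therefore, from the appropriate chi-square table, we can find numbers $\alpha$ and $\beta$ such that

$$P\left(\alpha \le \frac{\sum (x_i - \bar{x})^2}{\sigma^2} \le \beta\right) = Q$$

for some chosen $Q \in (0, 1)$. From this, it follows that

$$P\left(\frac{1}{\alpha} \ge \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \ge \frac{1}{\beta}\right) = Q \iff P\left(\frac{\sum (x_i - \bar{x})^2}{\beta} \le \sigma^2 \le \frac{\sum (x_i - \bar{x})^2}{\alpha}\right) = Q,$$

and the latter provides a confidence interval for $\sigma^2$.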
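The interval for the variance can be sketched as follows. The numbers alpha and beta are passed in as arguments, to be read from a chi-square table, since the standard library offers no chi-square quantile function. The function name and data are hypothetical; 2.700 and 19.023 are the commonly tabulated 2.5% and 97.5% points of the chi-square distribution with 9 degrees of freedom.

```python
from statistics import fmean

def variance_interval(sample, alpha, beta):
    """Interval (SS / beta, SS / alpha) for sigma^2, with SS the sum of squared deviations.

    alpha and beta are the lower and upper chi-square(n - 1) table values.
    """
    xbar = fmean(sample)
    sum_sq = sum((x - xbar) ** 2 for x in sample)
    return sum_sq / beta, sum_sq / alpha

# Hypothetical sample of size 10, so the table row is for 9 degrees of freedom.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
low, high = variance_interval(data, alpha=2.700, beta=19.023)
print(low, high)
```

The interval is not symmetric about the unbiased estimate, because the chi-square distribution is skewed.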
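The confidence interval for the ratio of two variances

Imagine a treatment that affects the variance of a normal population. It is possible that the mean is also affected. Let $x_i \sim N(\mu_x, \sigma_x^2)$; $i = 1, \ldots, n$ be a random sample taken from the population before treatment and let $y_j \sim N(\mu_y, \sigma_y^2)$; $j = 1, \ldots, m$ be a random sample taken after treatment. Then

$$\frac{\sum (x_i - \bar{x})^2}{\sigma_x^2} \sim \chi^2(n - 1) \quad\text{and}\quad \frac{\sum (y_j - \bar{y})^2}{\sigma_y^2} \sim \chi^2(m - 1)$$

are independent chi-squared variates, and hence

$$F = \left\{\frac{\sum (x_i - \bar{x})^2}{\sigma_x^2 (n - 1)} \Big/ \frac{\sum (y_j - \bar{y})^2}{\sigma_y^2 (m - 1)}\right\} \sim F(n - 1, m - 1).$$

It is possible to find numbers $\alpha$ and $\beta$ such that $P(\alpha \le F \le \beta) = Q$, where $Q \in (0, 1)$ is some chosen probability value. Given such values, we may make the following probability statement:

$$P\left(\alpha\,\frac{\sum (y_j - \bar{y})^2\,(n - 1)}{\sum (x_i - \bar{x})^2\,(m - 1)} \le \frac{\sigma_y^2}{\sigma_x^2} \le \beta\,\frac{\sum (y_j - \bar{y})^2\,(n - 1)}{\sum (x_i - \bar{x})^2\,(m - 1)}\right) = Q.$$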
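The probability statement for the variance ratio translates into the sketch below. The numbers alpha and beta are supplied by the caller from an F table. The function name and data are hypothetical. The example assumes two samples of size 10 and uses approximately 4.03 as the upper 2.5% point of F(9, 9); because both degrees of freedom are equal, the lower point is its reciprocal.

```python
from statistics import fmean

def variance_ratio_interval(x, y, alpha, beta):
    """Interval for sigma_y^2 / sigma_x^2, with alpha and beta from an F(n-1, m-1) table."""
    n, m = len(x), len(y)
    xbar, ybar = fmean(x), fmean(y)
    ss_x = sum((xi - xbar) ** 2 for xi in x)
    ss_y = sum((yj - ybar) ** 2 for yj in y)
    # Ratio of the two unbiased variance estimates, after over before.
    ratio = (ss_y * (n - 1)) / (ss_x * (m - 1))
    return alpha * ratio, beta * ratio

# Hypothetical before and after samples of equal size.
before = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
after = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
upper = 4.03  # approximate table value, stated as an assumption
low, high = variance_ratio_interval(before, after, alpha=1 / upper, beta=upper)
print(low, high)
```

In this example the ratio of the estimated variances is 4, and the interval runs from roughly 1 to roughly 16, which shows how wide such intervals are for small samples.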