Lecture 3. Sampling, sampling distributions, and parameter estimation

Sampling

Definition. A population is defined as the collection of all possible observations of interest. The collection of observations we take from the population is called a sample. The number of observations in the sample is called the sample size.

Sampling. When we are interested in a population, we typically study a sample of that population rather than attempt to study the entire population. The sample should ideally be representative of the population, with similar characteristics.

Principles of sampling.
1. Same distribution. All variables in the sample $X_1, \dots, X_n$ have the same distribution as the entire population.
2. Independence. $X_1, \dots, X_n$ are independent. In other words, each observation has no relationship with the others.

Simple random sampling. Simple random sampling is the most straightforward of the random sampling strategies. We use this strategy when we believe that the population is relatively homogeneous for the characteristic of interest, i.e. there is no population structure.

Simple random sampling. For example, let's say you were surveying first-time parents about their attitudes toward mandatory seat belt laws. You might expect that their status as new parents might lead to similar concerns about safety. On campus, those who share a major might also have similar interests and values; we might expect psychology majors to share concerns about access to mental health services on campus.
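A minimal sketch of drawing a simple random sample in Python. The sampling frame (ID numbers 1 to 500 standing in for first-time parents), the sample size of 20, and the helper name `simple_random_sample` are all hypothetical and only illustrate the idea that every subset of the chosen size is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers of 500 first-time parents.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

Fixing the seed makes the draw reproducible, which is convenient when the selection has to be documented.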

Simple random sampling

Other sampling methods. Systematic sampling, stratified sampling, proportionate sampling, cluster sampling, multistage sampling, and so on.

Sample statistics and distributions

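Sample mean and sample variance. Let $X_1, \dots, X_n$ be a random sample.
Sample mean:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Sample variance:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
Sample standard deviation:
$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$$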
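The definitions of the sample mean, sample variance, and sample standard deviation can be sketched directly in Python. The data values below are made up for illustration, and the function names are not from the lecture.

```python
import math

def sample_mean(xs):
    """Arithmetic mean of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Made-up observations, used only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar = sample_mean(data)
s2 = sample_variance(data)
s = math.sqrt(s2)
print(x_bar, s2, s)
```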

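For frequency or grouped data. Let $X_1, \dots, X_k$ be the group values and $f_1, \dots, f_k$ be the frequency of each group, with $f_1 + \dots + f_k = n$.
Sample mean:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{k} f_i X_i$$
Sample variance:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (X_i - \bar{X})^2$$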
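The grouped-data formulas can be sketched the same way. The group values and frequencies below are invented for illustration; the result should agree with expanding the groups into individual observations and applying the ungrouped formulas.

```python
def grouped_mean(values, freqs):
    """Mean of grouped data: sum of f_i * X_i divided by n."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

def grouped_variance(values, freqs):
    """Variance of grouped data with the n - 1 divisor."""
    n = sum(freqs)
    m = grouped_mean(values, freqs)
    return sum(f * (x - m) ** 2 for x, f in zip(values, freqs)) / (n - 1)

# Invented group values and their frequencies (n = 10 in total).
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]
g_mean = grouped_mean(values, freqs)
g_var = grouped_variance(values, freqs)
print(g_mean, g_var)
```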

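Properties of the sample mean and variance. Let $X_1, \dots, X_n$ be a random sample from a normal distribution $N(\mu, \sigma^2)$, and let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
Then we have
(1) $\bar{X}$ and $S^2$ are independent random variables.
(2) $\bar{X}$ has a normal distribution, i.e. $\bar{X} \sim N(\mu, \sigma^2/n)$.
(3) $(n-1)S^2/\sigma^2$ has a chi-squared distribution with $n-1$ degrees of freedom.
$$(n-1)S^2 = SS = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$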
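The identity for the sum of squares $SS$ follows by expanding the square and using $\sum_{i=1}^{n} X_i = n\bar{X}$. A short worked derivation:

```latex
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
\end{aligned}
```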

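$\chi^2$ distribution. If $X_i \sim N(0, 1)$, $i = 1, \dots, n$, and the $X_i$ are independent, define
$$\chi^2 = \sum_{i=1}^{n} X_i^2$$
Then $\chi^2$ obeys the $\chi^2$ distribution with $n$ degrees of freedom, denoted by $\chi^2 \sim \chi^2(n)$.
[Figure: density $p(x)$ of $\chi^2(n)$ for $n = 1, 2, 3, 4, 6, 9$.]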

Student's t distribution. If $X \sim N(0, 1)$, $Y \sim \chi^2(n)$, and $X$ and $Y$ are independent, define
$$t = \frac{X}{\sqrt{Y/n}}$$
Then $t$ obeys the t distribution with $n$ degrees of freedom, denoted by $t \sim t(n)$.
[Figure: density $p(x)$ of $t(n)$ for $n = 1, 2, 3, 4, 6, 9$.]

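F distribution. If $X \sim \chi^2(n_1)$, $Y \sim \chi^2(n_2)$, and $X$ and $Y$ are independent, define
$$F = \frac{X/n_1}{Y/n_2}$$
Then $F$ obeys the F distribution with degrees of freedom $n_1$ and $n_2$, denoted by $F \sim F(n_1, n_2)$.
[Figure: density $p(x)$ of $F(d_1, d_2)$ for several pairs $(d_1, d_2)$.]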
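The three definitions above are constructive, so they can be simulated directly from standard normal draws. The sketch below builds $\chi^2$, t, and F variates exactly as defined and compares their simulated means with the known theoretical means ($n$ for $\chi^2(n)$, 0 for $t(n)$ with $n > 1$, and $n_2/(n_2 - 2)$ for $F(n_1, n_2)$ with $n_2 > 2$). The degrees of freedom, seed, and number of replications are arbitrary choices for illustration.

```python
import math
import random

rng = random.Random(0)

def chi2_draw(n):
    """Sum of n squared independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

def t_draw(n):
    """N(0, 1) draw divided by sqrt(chi2(n) / n), drawn independently."""
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2_draw(n) / n)

def f_draw(n1, n2):
    """Ratio of two independent chi2 draws, each divided by its d.f."""
    return (chi2_draw(n1) / n1) / (chi2_draw(n2) / n2)

def sim_mean(draw, reps=20000):
    """Average of reps simulated values."""
    return sum(draw() for _ in range(reps)) / reps

chi2_mean = sim_mean(lambda: chi2_draw(6))   # theory: 6
t_mean = sim_mean(lambda: t_draw(6))         # theory: 0
f_mean = sim_mean(lambda: f_draw(5, 10))     # theory: 10 / 8 = 1.25
print(chi2_mean, t_mean, f_mean)
```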

Relationships between different distributions.
[Figure: diagram linking the two-point, binomial, Poisson, standardized normal, uniform, exponential, Erlang, gamma, beta, $\chi^2$, t, and F distributions. Among the links shown: if $X \sim N(0, 1)$ and $Z \sim \chi^2(n)$ are independent, then $X/\sqrt{Z/n} \sim t(n)$; if $x_1 \sim \chi^2(m)$ and $x_2 \sim \chi^2(n)$ are independent, then $(x_1/m)/(x_2/n) \sim F(m, n)$; for a Poisson process with rate $\lambda$, the time to the first jump is exponential, $\exp(\lambda)$, and the time to the $\alpha$th jump is Erlang, $\Gamma(\alpha, \lambda)$.]

Statistical inference. Statistical inference means drawing conclusions about the whole population on the basis of a sample. Precondition for statistical inference: the sample is randomly selected from the population (a probability sample).

Parameter estimation

Parameter estimation is an important problem in statistics. It can be divided into two types:
1. Point estimation: the use of sample data to calculate a single value (known as a statistic) which is to serve as a "best guess" or "best estimate" of an unknown (fixed or random) population parameter.
2. Interval estimation: the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.

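Point estimation. Let $X \sim F(x; \theta)$, where $\theta$ is unknown, and suppose there is a group of observations $X_1, X_2, \dots, X_n$. The target of point estimation is to construct a statistic from these observations that estimates $\theta$. The estimator of $\theta$ is denoted by $\hat{\theta} = \hat{\theta}(X_1, X_2, \dots, X_n)$. For example, when $\theta = E(X)$, we can use the sample mean as the estimator of $\theta$, i.e. $\hat{\theta} = \bar{X}$.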

Two commonly used point estimation methods. The maximum likelihood method and the moment method.

Maximum likelihood estimate (MLE). This method maximizes the likelihood function to obtain the estimator of the parameters. The probability density function of $X$ is $p(x; \theta)$, and $\theta$ is unknown. Suppose there is a sample of $n$ observations $X_1, X_2, \dots, X_n$ of $X$.

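Maximum likelihood method. Then the joint probability function is
$$L(\theta) = L(x_1, x_2, \dots, x_n; \theta) = p(x_1; \theta)\, p(x_2; \theta) \cdots p(x_n; \theta) = \prod_{i=1}^{n} p(x_i; \theta)$$
We call the above function the likelihood function. Define the logarithm of the likelihood as
$$\ln L(\theta) = \ln L(x_1, x_2, \dots, x_n; \theta) = \sum_{i=1}^{n} \ln p(x_i; \theta)$$
Let
$$\frac{d \ln L(\theta)}{d\theta} = 0,$$
then we can calculate the maximum likelihood estimator (MLE) of $\theta$.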

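Maximum likelihood method. When the likelihood function contains $k$ parameters $\theta_1, \theta_2, \dots, \theta_k$, then
$$L(\theta_1, \theta_2, \dots, \theta_k) = \prod_{i=1}^{n} p(x_i; \theta_1, \theta_2, \dots, \theta_k)$$
The maximum likelihood estimators of $\theta_1, \theta_2, \dots, \theta_k$, written $\hat{\theta}_j = \hat{\theta}_j(x_1, x_2, \dots, x_n)$, $j = 1, 2, \dots, k$, are the solution of the $k$ equations
$$\frac{\partial \ln L(\theta_1, \theta_2, \dots, \theta_k)}{\partial \theta_j} = 0, \qquad j = 1, 2, \dots, k$$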

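Example. Assume $X_1, X_2, \dots, X_n$ are random samples from a normal distribution $N(\mu, \sigma^2)$. How do we get the maximum likelihood estimators of the parameters $\mu$ and $\sigma^2$?
Solution: The likelihood function is
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2\right]$$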

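Example. Then
$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2$$
The derivative equations are
$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (x_i - \mu) = 0$$
$$\frac{\partial \ln L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} (x_i - \mu)^2 = 0$$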

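Example. So the solutions are
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
The maximum likelihood estimator of $\mu$ is $\hat{\mu} = \bar{x}$.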

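Example. When $\mu$ is known, the MLE of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$
If $\mu$ is unknown, the MLE of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
The MLE of $\sigma^2$ is not equal to the sample variance! Which one is better?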
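A small numerical sketch of the normal MLE, using made-up data. It computes $\hat{\mu}$ and $\hat{\sigma}^2$ with the $1/n$ divisor and compares $\hat{\sigma}^2$ with the sample variance $S^2$, which uses $1/(n-1)$. The two differ by the factor $(n-1)/n$, so the MLE is always the smaller of the two; since $S^2$ is unbiased for $\sigma^2$, the MLE has expectation $\frac{n-1}{n}\sigma^2$.

```python
import statistics

def normal_mle(xs):
    """MLE of (mu, sigma^2) for a normal sample: mean and 1/n variance."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

# Made-up observations, used only for illustration.
data = [4.1, 5.3, 3.8, 6.0, 5.2, 4.6]
n = len(data)
mu_hat, sigma2_hat = normal_mle(data)
s2 = statistics.variance(data)  # sample variance, divisor n - 1
print(mu_hat, sigma2_hat, s2)
```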

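Moment method. Basic idea: equate sample moments with the unobservable population moments and then solve those equations for the quantities to be estimated. Suppose the probability density function of $X$ is $p(x; \theta_1, \theta_2, \dots, \theta_k)$; then the $r$th moment of $X$ is
$$\alpha_r(\theta_1, \theta_2, \dots, \theta_k) = E(X^r) = \int_{-\infty}^{\infty} x^r\, p(x; \theta_1, \theta_2, \dots, \theta_k)\, dx$$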

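Moment method. Suppose there is a sample of $n$ observations $X_1, X_2, \dots, X_n$ of $X$. Then the $r$th sample moment is
$$a_r = \frac{1}{n}\sum_{i=1}^{n} X_i^r$$
Equate the $j$th ($j = 1, \dots, k$) sample moments with the unobservable population moments:
$$\alpha_1(\theta_1, \theta_2, \dots, \theta_k) = a_1, \quad \alpha_2(\theta_1, \theta_2, \dots, \theta_k) = a_2, \quad \dots, \quad \alpha_k(\theta_1, \theta_2, \dots, \theta_k) = a_k$$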

Moment method. Solve the equations; then we get the estimators of $\theta$: $\hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_k$. We call them the moment estimators of $\theta$.

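Example. $X_1, X_2, \dots, X_n$ are samples from the uniform distribution $U(0, \theta)$. Then
$$p(x, \theta) = \begin{cases} 1/\theta, & 0 \le x \le \theta \\ 0, & \text{otherwise} \end{cases}$$
$$\alpha_1 = \int_{-\infty}^{\infty} x\, p(x, \theta)\, dx = \frac{1}{\theta}\int_{0}^{\theta} x\, dx = \frac{\theta}{2}$$
So $\hat{\theta}/2 = \bar{x}$. The moment estimator of $\theta$ is $\hat{\theta} = 2\bar{x}$.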
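A quick simulation sketch of the moment estimator for a uniform distribution on $(0, \theta)$, where the first population moment $\theta/2$ is matched to the sample mean. The true value $\theta = 4$, the sample size, and the seed are arbitrary choices for illustration.

```python
import random

def uniform_moment_estimate(xs):
    """Moment estimator of theta for U(0, theta): twice the sample mean."""
    return 2.0 * sum(xs) / len(xs)

rng = random.Random(42)
theta_true = 4.0  # assumed true parameter, for illustration only
xs = [rng.uniform(0.0, theta_true) for _ in range(10000)]
theta_hat = uniform_moment_estimate(xs)
print(theta_hat)
```

With 10,000 observations the estimate should land close to the true value; with small samples it can even fall below the largest observation, which is one known weakness of this estimator.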

Desirable properties of an estimator. Since an estimator gives an estimate that depends on the sample points $(X_1, X_2, \dots, X_n)$, the estimate is a function of the sample points. The sample points are random variables, therefore the estimate is a random variable and has a probability distribution. We want the estimator to have several desirable properties, such as:
1. Unbiasedness
2. Effectiveness
3. Minimum mean square error

Unbiasedness. An estimator is said to be unbiased if the expected value of the estimator is equal to the true value of the parameter being estimated, i.e. $E(\hat{\theta}) = \theta$. Example: the sample proportion is an unbiased estimator of the population proportion.

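Effectiveness. The most efficient estimator among a group of unbiased estimators is the one with the smallest variance. Generally speaking, assuming $\hat{\theta}_1(X_1, X_2, \dots, X_n)$ and $\hat{\theta}_2(X_1, X_2, \dots, X_n)$ are two unbiased estimators of $\theta$, and $V(\hat{\theta}_1) < V(\hat{\theta}_2)$, then $\hat{\theta}_1$ is said to be more effective than $\hat{\theta}_2$.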

Minimum mean square error (MSE). Basic idea: minimize the average squared deviation between the estimate and the true value. We call the estimator which minimizes $E\{[\hat{\theta}(X_1, X_2, \dots, X_n) - \theta]^2\}$ the minimum mean square error estimator of $\theta$.
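The mean square error ties the previous two properties together: it splits into the variance of the estimator plus its squared bias. A short derivation, adding and subtracting $E(\hat{\theta})$ inside the square (the cross term vanishes because $E[\hat{\theta} - E(\hat{\theta})] = 0$):

```latex
\begin{aligned}
E\{[\hat{\theta} - \theta]^2\}
  &= E\{[(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)]^2\} \\
  &= E\{[\hat{\theta} - E(\hat{\theta})]^2\}
     + 2\,[E(\hat{\theta}) - \theta]\, E[\hat{\theta} - E(\hat{\theta})]
     + [E(\hat{\theta}) - \theta]^2 \\
  &= V(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2
\end{aligned}
```

For an unbiased estimator the second term is zero, so minimizing the MSE is the same as minimizing the variance.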

Interval estimation. Estimating the parameter alone is not sufficient. It is necessary to analyze how confident we can be about this particular estimate. One way of doing this is to define confidence intervals. If we have estimated $\theta$, we want to know whether the true parameter is close to our estimate. In other words, we want to find an interval that satisfies the following relation:
$$P(G_L < \theta < G_U) \ge 1 - \alpha$$

Interval estimation. That is, the probability that the true parameter $\theta$ is in the interval $(G_L, G_U)$ is at least $1 - \alpha$. The actual realization of this interval, $(g_L, g_U)$, is called a $100(1 - \alpha)\%$ confidence interval; the limits of the interval are called the lower and upper confidence limits. $1 - \alpha$ is called the confidence level.

Example. If the population variance $\sigma^2$ is known and we estimate the population mean, then
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
We can find from the table that the probability that $Z$ is more than 1 is 0.1587. The probability that $Z$ is less than $-1$ is again 0.1587. These values come from the table of the standard normal distribution.

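Example. Now we can find a confidence interval for the population mean. Since
$$P(-1 < Z < 1) = 1 - P(Z > 1) - P(Z < -1) = 1 - 2 \times 0.1587 = 0.6826,$$
for $\mu$ we can write
$$P\left(-1 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1\right) = P\left(\bar{x} - \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + \frac{\sigma}{\sqrt{n}}\right) = 0.6826$$
The confidence level that the true value is within 1 standard error (the standard deviation of the sampling distribution) of the sample mean is 0.6826. The probability that the true value is within 2 standard errors of the sample mean is 0.9545.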
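The table lookups in this example can be reproduced with Python's standard library. The sketch below recomputes the probabilities of falling within 1 and 2 standard errors and builds a confidence interval for the mean when $\sigma$ is known. The data, the value $\sigma = 1$, and the function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

def prob_within(k):
    """P(-k < Z < k) for a standard normal Z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def z_interval(xs, sigma, level=0.95):
    """Confidence interval for the mean when sigma is known."""
    n = len(xs)
    x_bar = sum(xs) / n
    z = std_normal.inv_cdf(1.0 - (1.0 - level) / 2.0)
    half = z * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half

# Made-up observations; sigma = 1.0 is assumed known.
data = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.7, 10.6]
lower, upper = z_interval(data, sigma=1.0, level=0.95)
print(prob_within(1), prob_within(2))
print(lower, upper)
```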

Interval estimation. Above we considered the case when the population variance is known in advance. This is rarely the case in real life. When both the population mean and variance are unknown, we can still find confidence intervals. In this case we estimate the population mean and variance from the sample and then consider the distribution of the statistic
$$Z = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
Here $S^2$ is the sample variance.

Interval estimation. Since it is the ratio of a standard normal random variable to the square root of a $\chi^2$ random variable with $n - 1$ degrees of freedom divided by its degrees of freedom, $Z$ has Student's t distribution with $n - 1$ degrees of freedom. In this case we can use the table of the t distribution to find confidence levels. It is not surprising that when we do not know the population variance, confidence intervals for the same confidence level become wider. That is the price we pay for what we do not know.

Interval estimation. If the number of degrees of freedom becomes large, then the t distribution is approximated well by the normal distribution. For $n > 100$ we can use the normal distribution to find confidence levels and intervals.
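A simulation sketch of why the t distribution is needed for small samples. It draws many samples of size 5 from a normal population, builds nominal 95% intervals using the estimated standard deviation, and counts how often the true mean is covered, once with the normal critical value 1.960 and once with the tabulated t critical value 2.776 for 4 degrees of freedom. The population parameters, seed, and number of replications are arbitrary assumptions.

```python
import math
import random

Z_CRIT = 1.960      # 97.5% point of N(0, 1), from the normal table
T_CRIT_4DF = 2.776  # 97.5% point of t(4), from the t table

def coverage(crit, n=5, mu=0.0, sigma=1.0, reps=20000, seed=0):
    """Share of intervals x_bar +/- crit * S / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
        half = crit * s / math.sqrt(n)
        if x_bar - half <= mu <= x_bar + half:
            hits += 1
    return hits / reps

cover_z = coverage(Z_CRIT)      # too narrow: covers less often than 95%
cover_t = coverage(T_CRIT_4DF)  # close to the nominal 95%
print(cover_z, cover_t)
```

Both calls use the same seed, so the two intervals are compared on identical samples; the wider t interval is what restores the nominal coverage.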

The Law of Large Numbers and Central Limit Theorem

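The Law of Large Numbers. Assume $X_1, X_2, \dots, X_n$ are random samples of $X$, and that $E(X) = \mu$ and $V(X)$ exist. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$; then for any given $\varepsilon > 0$,
$$\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| < \varepsilon\right) = 1$$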
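A small sketch of the law of large numbers in action: the sample mean of uniform draws on [0, 1) is compared with the population mean 0.5 for increasing sample sizes. The sample sizes and seed are arbitrary choices.

```python
import random

def mean_error(n, seed=0):
    """Absolute gap between the mean of n U(0, 1) draws and 0.5."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += rng.random()
    return abs(total / n - 0.5)

errors = {n: mean_error(n) for n in (10, 1000, 100000)}
for n, err in errors.items():
    print(n, err)
```

The gap does not have to shrink at every step, but it becomes small with high probability as the sample size grows.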

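The Central Limit Theorem. Let $\bar{X}$ be the mean of a random sample $X_1, X_2, \dots, X_n$ of size $n$ from a distribution with a finite mean $\mu$ and a finite positive variance $\sigma^2$. Then, as $n \to \infty$,
$$Y = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$$
in distribution.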
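A sketch of the central limit theorem with a clearly non-normal population. Samples of size 50 are drawn from an exponential distribution (mean and standard deviation both equal to 1 for rate 1), each sample mean is standardized, and the share of standardized means inside (-1, 1) is compared with the normal value of about 0.68. The choice of distribution, sample size, and seed are assumptions for illustration.

```python
import math
import random

def standardized_mean(rng, n, rate=1.0):
    """(x_bar - mu) / (sigma / sqrt(n)) for n Exponential(rate) draws."""
    xs = [rng.expovariate(rate) for _ in range(n)]
    x_bar = sum(xs) / n
    mu = sigma = 1.0 / rate  # exponential: mean and sd are both 1 / rate
    return (x_bar - mu) / (sigma / math.sqrt(n))

rng = random.Random(1)
ys = [standardized_mean(rng, 50) for _ in range(10000)]
share_within_1 = sum(abs(y) < 1.0 for y in ys) / len(ys)
print(share_within_1)
```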

Small probability event. A small probability event is an event that has a low probability of occurring. A small probability event will hardly ever happen in a single experiment. This principle is used in hypothesis tests: if an event is a small probability event under $H_0$, it should hardly ever happen in theory. But if it actually happens, then we reject $H_0$.

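Experiments on the distribution of the sample mean and sample variance.
Use RAND() in EXCEL to generate pseudorandom numbers $X_1$ and $X_2$ from $U(0, 1)$, the uniform distribution on the interval $[0, 1]$.
Use a transformation to generate random numbers $Y_1$ and $Y_2$ from $N(0, 1)$:
$$Y_1 = \sqrt{-2\ln(X_1)}\,\sin(2\pi X_2), \qquad Y_2 = \sqrt{-2\ln(X_1)}\,\cos(2\pi X_2)$$
Use a transformation to generate random numbers $Z_1$ and $Z_2$ from $N(\mu, \sigma^2)$:
$$Z_i = \mu + \sigma Y_i$$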
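The transformation above (commonly known as the Box-Muller transform) can be sketched in Python instead of Excel. Python's `random.random()` plays the role of RAND(); because it can return exactly 0, the sketch uses `1 - random.random()` for $X_1$ so that the logarithm is always defined. The target parameters $\mu = 10$, $\sigma = 2$ and the seed are arbitrary choices.

```python
import math
import random

def box_muller(rng):
    """Turn two U(0, 1) draws into two independent N(0, 1) draws."""
    x1 = 1.0 - rng.random()  # in (0, 1], so log(x1) is defined
    x2 = rng.random()
    r = math.sqrt(-2.0 * math.log(x1))
    return r * math.sin(2.0 * math.pi * x2), r * math.cos(2.0 * math.pi * x2)

def normal_pair(rng, mu, sigma):
    """Shift and scale N(0, 1) draws to N(mu, sigma^2)."""
    y1, y2 = box_muller(rng)
    return mu + sigma * y1, mu + sigma * y2

rng = random.Random(7)
zs = []
for _ in range(5000):
    zs.extend(normal_pair(rng, mu=10.0, sigma=2.0))

z_mean = sum(zs) / len(zs)
z_sd = math.sqrt(sum((z - z_mean) ** 2 for z in zs) / (len(zs) - 1))
print(z_mean, z_sd)
```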

Let's do some exercises together.
Draw 100 random samples from $U(0, 1)$.
Draw the frequency distribution of the 100 samples.
Draw 100 sets of 5 samples from $N(0, 10)$.
Draw the frequency distribution of the 500 samples, and compare it with $N(0, 10)$.
Draw the frequency distribution of the 100 sample means and 100 sample variances.
Compare the distribution of $\bar{X}$ with $N(0, 2)$.
Compare the distribution of $S^2$ with $\frac{10}{4}\chi^2(4)$.
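A sketch of the sampling part of these exercises in Python rather than Excel. It assumes the population is normal with mean 0 and variance 10 and uses 100 sets of 5 observations; these values, the seed, and the variable names are assumptions made for illustration. Under those assumptions the sample means should scatter around 0 with variance $10/5 = 2$, and the sample variances should average about 10, since $4S^2/10 \sim \chi^2(4)$ has mean 4.

```python
import math
import random

# Assumed settings for the exercise (not confirmed by the slides).
MU, SIGMA2 = 0.0, 10.0
N_SETS, N = 100, 5

rng = random.Random(3)
sd = math.sqrt(SIGMA2)
means, variances = [], []
for _ in range(N_SETS):
    xs = [rng.gauss(MU, sd) for _ in range(N)]
    m = sum(xs) / N
    means.append(m)
    variances.append(sum((x - m) ** 2 for x in xs) / (N - 1))

avg_mean = sum(means) / N_SETS
var_of_means = sum((m - avg_mean) ** 2 for m in means) / (N_SETS - 1)
avg_var = sum(variances) / N_SETS
print(avg_mean, var_of_means, avg_var)
```

The lists `means` and `variances` can then be binned into frequency distributions and compared with the theoretical curves.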