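Course code: TAMS24 (Statistisk teori) / exam code: TEN 08-08-4 08:00-:00
Examiner: Zhenxia Liu (Tel: 070089508)

You are permitted to bring: a calculator, and "Formel- och tabellsamling i matematisk statistik".

Grading: 8-11 points give grade 3; 11.5-14.5 points give grade 4; 15-18 points give grade 5.

English Version

1 (3 points)
Assume that the distribution of lifetimes (unit: year) of a certain type of electronic component is $Exp(1/\mu)$, where the true average lifetime $\mu$ is unknown. One chose 400 such electronic components, and after one year 109 components still worked (namely, the other 291 components were broken after one year). Based on this information, use the method of moments to find a point estimate of $\mu$.

Solution. Let $X$ = number of components which still work after one year. Then $X \sim Bin(400, p)$, where
$$p = P(Exp(1/\mu) > 1) = \int_1^{\infty} \frac{1}{\mu} e^{-x/\mu}\,dx = e^{-1/\mu}.$$
It follows from the method of moments that $E(X) = \bar{x}$ (here $\bar{x} = x_1/1 = 109$), therefore $400p = 109$, implying $e^{-1/\mu} = 109/400$ and
$$\hat{\mu} = -1/\ln(109/400) = 0.77.$$

2 (3 points)
A random sample $\{X_1, \ldots, X_n\}$ is taken from a population $N(\mu, \sigma)$ with unknown $\mu$ and known $\sigma$.
(2.1) (1p) Find a point estimator of $\mu$ using the maximum-likelihood method.
(2.2) (1p) Is the point estimator in (2.1) unbiased? Why?
(2.3) (1p) Is the point estimator in (2.1) consistent? Why?

Solution.
(2.1) The likelihood function is
$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(X_i-\mu)^2/(2\sigma^2)} = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2}.$$
Taking the logarithm gives
$$\ln L(\mu) = -n \ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2.$$
In order to find the maximal point, we take the first derivative:
$$0 = \frac{d}{d\mu}\ln L(\mu) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\mu) \;\Rightarrow\; \hat{\mu} = \bar{X}.$$
The second derivative is $-n/\sigma^2 < 0$, which verifies that $\hat{\mu} = \bar{X}$ is indeed a maximum.
(2.2) Yes, since
$$E(\hat{\mu}) = E(\bar{X}) = E\left(\tfrac{1}{n}(X_1 + \cdots + X_n)\right) = \tfrac{1}{n}\left(E(X_1) + \cdots + E(X_n)\right) = \tfrac{1}{n}\, n\mu = \mu.$$
(2.3) Yes, since $\hat{\mu}$ is unbiased and
$$V(\hat{\mu}) = V\left(\tfrac{1}{n}(X_1 + \cdots + X_n)\right) = \tfrac{1}{n^2}\left(V(X_1) + \cdots + V(X_n)\right) = \tfrac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \to 0, \quad \text{as } n \to \infty.$$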
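As a numerical cross-check of the method-of-moments estimate in Problem 1, the short sketch below solves $400\,e^{-1/\mu} = 109$ for $\mu$. The function name `moment_estimate_mu` is only an illustrative choice, not part of the exam solution.

```python
from math import log

def moment_estimate_mu(n_components, n_working, t=1.0):
    """Method-of-moments estimate of mu when lifetimes are Exp(1/mu).

    Matches the observed count of survivors at time t with its
    expectation n_components * exp(-t / mu) and solves for mu.
    """
    p_hat = n_working / n_components  # observed survival proportion
    return -t / log(p_hat)            # from exp(-t / mu) = p_hat

mu_hat = moment_estimate_mu(400, 109)
print(round(mu_hat, 2))
```

Keeping the survival time `t` as a parameter makes it easy to redo the calculation for an inspection time other than one year.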
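3 (3 points)
One wants to collect a random sample of $n$ values from a population $Po(\mu)$. Using the sample, one intends to test the null hypothesis $H_0: \mu = 4$ against the alternative hypothesis $H_1: \mu > 4$ such that the probability of the type I error is 0.05 and the probability of the type II error is 0.01 when the true $\mu = 5$. How should $n$ be chosen?

Solution. Let us pretend that $n$ is large so that we can use normal approximations, that is,
$$X_1 + \cdots + X_n \sim Po(n\mu) \approx N(n\mu, \sqrt{n\mu}).$$
It then follows that
$$\frac{\bar{X} - \mu}{\sqrt{\mu/n}} \approx N(0, 1).$$
The fact that the type I error is 0.05 gives that $H_0$ is rejected when
$$\frac{\bar{X} - \mu_0}{\sqrt{\mu_0/n}} > z_{0.05} = 1.645.$$
Therefore,
$$\begin{aligned}
0.01 &= \text{type II error} = P(\text{don't reject } H_0 \text{ when } H_0 \text{ is false and } \mu = 5) \\
&= P\left(\frac{\bar{X} - \mu_0}{\sqrt{\mu_0/n}} \le 1.645 \text{ when } \mu = 5\right) \\
&= P\left(\bar{X} \le 4 + 1.645\sqrt{4/n} \text{ when } \mu = 5\right) \\
&= P\left(\frac{\bar{X} - 5}{\sqrt{5/n}} \le \frac{4 + 1.645\sqrt{4/n} - 5}{\sqrt{5/n}}\right) \\
&\approx P\left(N(0,1) \le \frac{4 + 1.645\sqrt{4/n} - 5}{\sqrt{5/n}}\right).
\end{aligned}$$
Therefore,
$$\frac{4 + 1.645\sqrt{4/n} - 5}{\sqrt{5/n}} = -2.33 \;\Rightarrow\; n = 72.25, \text{ i.e. } n = 73.$$

4 (3 points)
Assume that
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left(\begin{pmatrix} 2 \\ 5 \end{pmatrix}, \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}\right).$$
One wants to make a linear combination $Y = aX_1 + bX_2$ such that the mean $E(Y) = 8$ and the variance $V(Y)$ is minimized. Determine $a$ and $b$.

Solution. It follows from $8 = E(Y) = aE(X_1) + bE(X_2) = 2a + 5b$ that $a = (8 - 5b)/2$. Then the variance is computed as
$$\begin{aligned}
V(Y) &= V(aX_1 + bX_2) = a^2 V(X_1) + b^2 V(X_2) + 2ab\,cov(X_1, X_2) = 4a^2 + 3b^2 + 2ab \\
&= (8 - 5b)^2 + 3b^2 + (8 - 5b)b = 23b^2 - 72b + 64.
\end{aligned}$$
To find the minimal value of $V(Y)$ we just take the first derivative:
$$0 = dV(Y)/db = 46b - 72 \;\Rightarrow\; b = 72/46 = 1.565, \quad a = (8 - 5b)/2 = 0.087.$$

5 (3 points)
The number of cars passing a bridge can be assumed to be Poisson distributed with a mean of $\mu_1$ cars per minute from the North and a mean of $\mu_2$ cars per minute from the South. Suppose that the number of cars from the North is independent of the number of cars from the South. During one hour 160 cars passed, of which 90 cars were from the North. Find a 95% confidence interval for $\mu_1 - \mu_2$.

Solution. Let
X = number of cars from the North in an hour $\sim Po(60\mu_1) \approx N(60\mu_1, \sqrt{60\mu_1})$,
Y = number of cars from the South in an hour $\sim Po(60\mu_2) \approx N(60\mu_2, \sqrt{60\mu_2})$.
Then
$$X - Y \approx N\left(60\mu_1 - 60\mu_2, \sqrt{60\mu_1 + 60\mu_2}\right) \;\Rightarrow\; \frac{\left(\frac{X}{60} - \frac{Y}{60}\right) - (\mu_1 - \mu_2)}{\sqrt{\frac{\mu_1 + \mu_2}{60}}} \approx N(0, 1).$$
Therefore, the confidence interval for $\mu_1 - \mu_2$ is
$$\begin{aligned}
I_{\mu_1 - \mu_2} &= \left(\frac{x}{60} - \frac{y}{60}\right) \mp z_{\alpha/2}\sqrt{\frac{\hat{\mu}_1 + \hat{\mu}_2}{60}} = \left(\frac{90}{60} - \frac{70}{60}\right) \mp 1.96\sqrt{\frac{90/60 + 70/60}{60}} \\
&= 0.333 \mp 1.96 \cdot 0.21 = 0.333 \mp 0.413 = (-0.08, 0.746).
\end{aligned}$$

6 (3 points)
One wishes to investigate whether or not the check-out frequency in a certain library varies with the day of the week. During a randomly chosen week one counts the number of books checked out on the individual days:

weekday                Monday   Tuesday   Wednesday   Thursday   Friday
# books checked out    108      135       114         146        120

Test at significance level $\alpha = 0.01$ whether or not the check-out frequency varies with the day of the week.

Solution. In this case,
$$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = 0.2 \quad \text{against} \quad H_1: \text{some } p_i \ne 0.2.$$
With $n = 623$ books in total, the test statistic is
$$TS = \sum_{i=1}^{5} \frac{(N_i - np_i)^2}{np_i} = 7.8266,$$
and the rejection region is
$$C = (\chi^2_{\alpha}(5 - 1), +\infty) = (13.28, +\infty).$$
It is clear that $TS \notin C$, so we don't reject $H_0$, i.e. there is no evidence that the frequency varies with the day of the week.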
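The numbers in the solutions to Problems 3, 5 and 6 can be cross-checked with a few lines of standard-library Python. This is a minimal sketch: the function names are illustrative, the normal quantiles come from `statistics.NormalDist` rather than a table (so intermediate values differ slightly from the rounded table values 1.645, 2.33 and 1.96), and the chi-square critical value 13.28 is entered by hand as in the solution.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def poisson_sample_size(mu0, mu1, alpha, beta):
    """Smallest n for testing mu = mu0 vs mu > mu0 (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(1 - beta)
    # Solve (mu0 + z_alpha*sqrt(mu0/n) - mu1) / sqrt(mu1/n) = -z_beta for n.
    root_n = (z_alpha * sqrt(mu0) + z_beta * sqrt(mu1)) / (mu1 - mu0)
    return ceil(root_n ** 2)

def poisson_rate_difference_ci(x, y, minutes, level=0.95):
    """Approximate CI for mu1 - mu2 from two independent Poisson counts."""
    z = Z.inv_cdf(1 - (1 - level) / 2)
    diff = (x - y) / minutes
    half_width = z * sqrt((x / minutes + y / minutes) / minutes)
    return diff - half_width, diff + half_width

def chi2_uniform_statistic(counts):
    """Chi-square statistic for H0: all categories equally likely."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

n_needed = poisson_sample_size(4, 5, alpha=0.05, beta=0.01)
ci_low, ci_high = poisson_rate_difference_ci(90, 70, minutes=60)
ts = chi2_uniform_statistic([108, 135, 114, 146, 120])
CHI2_CRIT = 13.28  # table value chi^2_{0.01}(4), as used in the solution

print(n_needed, round(ci_low, 3), round(ci_high, 3), round(ts, 4), ts > CHI2_CRIT)
```

With exact quantiles the required sample size still rounds up to the same integer as in the hand calculation.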
Swedish Version (translated)

1 (3 points)
Assume that the distribution of the lifetime (unit: year) of a certain type of electronic component is $Exp(1/\mu)$, where the true average lifetime $\mu$ is unknown. 400 such electronic components were chosen, and after one year 109 components were still working (that is, the other 291 components had broken within one year). Based on this information, use the method of moments to compute a point estimate of $\mu$.

2 (3 points)
A random sample $\{X_1, \ldots, X_n\}$ is taken from a population $N(\mu, \sigma)$ with unknown $\mu$ and known $\sigma$.
(2.1) (1p) Compute a point estimate of $\mu$ using the maximum-likelihood method.
(2.2) (1p) Is the point estimate in (2.1) unbiased? Why?
(2.3) (1p) Is the point estimate in (2.1) consistent? Why?

3 (3 points)
One wishes to collect a random sample of $n$ values from a population $Po(\mu)$. Using the sample, one intends to test the null hypothesis $H_0: \mu = 4$ against the alternative hypothesis $H_1: \mu > 4$ in such a way that the probability of a type I error is 0.05 and the probability of a type II error is 0.01 when the true $\mu = 5$. How should $n$ be chosen?

4 (3 points)
Assume that
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\left(\begin{pmatrix} 2 \\ 5 \end{pmatrix}, \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}\right).$$
One wants to form a linear combination $Y = aX_1 + bX_2$ such that the mean $E(Y) = 8$ and the variance $V(Y)$ is minimized. Determine $a$ and $b$.

5 (3 points)
The number of cars passing a bridge can be assumed to be Poisson distributed with a mean of $\mu_1$ cars per minute northbound and a mean of $\mu_2$ cars per minute southbound. Assume that the number of northbound cars is independent of the number of southbound cars. During one hour 160 cars passed, of which 90 were northbound. Construct an approximate 95% confidence interval for $\mu_1 - \mu_2$.

6 (3 points)
One wants to investigate whether the lending frequency of a library varies with the day of the week. During a randomly chosen week the following result was obtained:

weekday             Monday   Tuesday   Wednesday   Thursday   Friday
# books lent out    108      135       114         146        120

Test at significance level $\alpha = 0.01$ whether lending varies with the day of the week.
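TAMS24: Notations and Formulas
by Xiangfeng Yang

1 Basic notations and definitions

X: random variable (stokastisk variabel);

Mean (Väntevärde):
$$\mu = E(X) = \begin{cases} \sum_k k\,p_X(k), & \text{if } X \text{ is discrete}, \\ \int_{-\infty}^{\infty} x f_X(x)\,dx, & \text{if } X \text{ is continuous}; \end{cases}$$

Variance (Varians): $\sigma^2 = V(X) = E((X - \mu)^2) = E(X^2) - (E(X))^2$;

Standard deviation (Standardavvikelse): $\sigma = D(X) = \sqrt{V(X)}$;

Population $X$;

Random sample (slumpmässigt stickprov): $X_1, \ldots, X_n$ are independent and have the same distribution as the population $X$. Before observing/measuring, $X_1, \ldots, X_n$ are random variables, and after observing/measuring, we use $x_1, \ldots, x_n$, which are numbers (not random variables);

Sample mean (Stickprovsmedelvärde): before observing/measuring, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, and after observing/measuring, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$;

Sample variance (Stickprovsvarians): before observing/measuring, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$, and after observing/measuring, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$;

Sample standard deviation (Stickprovsstandardavvikelse): before observing/measuring, $S = \sqrt{S^2}$, and after observing/measuring, $s = \sqrt{s^2}$;

$E\left(\sum c_i X_i\right) = \sum c_i E(X_i)$; $V\left(\sum c_i X_i\right) = \sum c_i^2 V(X_i)$, if $X_1, \ldots, X_n$ are independent (oberoende);

If $X \sim N(\mu, \sigma)$, then $\frac{X - \mu}{\sigma} \sim N(0, 1)$;

If $X_1, \ldots, X_n$ are independent and $X_i \sim N(\mu_i, \sigma_i)$, then $d + \sum c_i X_i \sim N\left(d + \sum c_i \mu_i, \sqrt{\sum c_i^2 \sigma_i^2}\right)$;

For a population $X$ with an unknown parameter $\theta$, and a random sample $\{X_1, \ldots, X_n\}$:
Estimator (Stickprovsvariabel): $\hat{\Theta} = g(X_1, \ldots, X_n)$, a random variable;
Estimate (Punktskattning): $\hat{\theta} = g(x_1, \ldots, x_n)$, a number;
Unbiased (Väntevärdesriktig): $E(\hat{\Theta}) = \theta$;
Effective (Effektiv): if two estimators $\hat{\Theta}_1$ and $\hat{\Theta}_2$ are unbiased, we say that $\hat{\Theta}_1$ is more effective than $\hat{\Theta}_2$ if $V(\hat{\Theta}_1) < V(\hat{\Theta}_2)$;

Binomial distribution $X \sim Bin(N, p)$: there are $N$ independent and identical trials, each trial has a probability of success $p$, and $X$ = the number of successes in these $N$ trials. The random variable $X \sim Bin(N, p)$ has a probability function (sannolikhetsfunktion)
$$p(k) = P(X = k) = \binom{N}{k} p^k (1 - p)^{N - k};$$

Exponential distribution $X \sim Exp(1/\mu)$: used when we consider a waiting time/lifetime. The random variable $X \sim Exp(1/\mu)$ has a density function (täthetsfunktion)
$$f(x) = \frac{1}{\mu} e^{-x/\mu}, \quad x \ge 0.$$

2 Point estimation

Method of moments (Momentmetoden): the number of equations depends on the number of unknown parameters,
$$E(X) = \bar{x}, \quad E(X^2) = \frac{1}{n}\sum_{i=1}^{n} x_i^2, \quad E(X^3) = \frac{1}{n}\sum_{i=1}^{n} x_i^3, \quad \ldots$$

Consistent (Konsistent): an estimator $\hat{\Theta} = g(X_1, \ldots, X_n)$ is consistent if
$$\lim_{n \to \infty} P(|\hat{\Theta} - \theta| > \varepsilon) = 0, \quad \text{for any constant } \varepsilon > 0.$$
(This is called "convergence in probability".)
Theorem: If $E(\hat{\Theta}) = \theta$ and $\lim_{n \to \infty} V(\hat{\Theta}) = 0$, then $\hat{\Theta}$ is consistent.

Least square method (minsta-kvadrat-metoden): the least square estimate $\hat{\theta}$ is the one minimizing
$$Q(\theta) = \sum_{i=1}^{n} (x_i - E(X))^2.$$

Maximum-likelihood method (Maximum-likelihood-metoden): the maximum-likelihood estimate $\hat{\theta}$ is the one maximizing the likelihood function
$$L(\theta) = \begin{cases} \prod_{i=1}^{n} f(x_i; \theta), & \text{if } X \text{ is continuous}, \\ \prod_{i=1}^{n} p(x_i; \theta), & \text{if } X \text{ is discrete}. \end{cases}$$
Remark 1 on ML: In general, it is easier/better to maximize $\ln L(\theta)$;
Remark 2 on ML: If there are several random samples (say $m$) from different populations with the same unknown parameter $\theta$, then the maximum-likelihood estimate $\hat{\theta}$ is the one maximizing the likelihood function defined as $L(\theta) = L_1(\theta) \cdots L_m(\theta)$, where $L_i(\theta)$ is the likelihood function from the $i$-th population.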
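To illustrate the maximum-likelihood recipe and Remark 1, the sketch below maximizes $\ln L(\mu)$ numerically for an $Exp(1/\mu)$ sample and compares the result with the sample mean, which is the closed-form maximizer in this model. The lifetimes are made-up illustration data, and the ternary search is just one simple way to maximize a unimodal function; neither comes from the formula sheet.

```python
from math import log

def exp_log_likelihood(mu, data):
    """ln L(mu) for density f(x) = (1/mu) * exp(-x/mu), x >= 0."""
    return -len(data) * log(mu) - sum(data) / mu

def argmax_unimodal(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    return (lo + hi) / 2

lifetimes = [0.5, 1.2, 0.3, 2.0, 0.9]  # hypothetical observations (years)
mu_ml = argmax_unimodal(lambda mu: exp_log_likelihood(mu, lifetimes), 0.01, 100.0)
sample_mean = sum(lifetimes) / len(lifetimes)

print(round(mu_ml, 4), round(sample_mean, 4))
```

For this model the method of moments equation $E(X) = \bar{x}$ gives the same estimate, since $E(X) = \mu$.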
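Estimates of population variance $\sigma^2$: If there is only one population with an unknown mean, the method of moments and the maximum-likelihood method, in general, give an estimate of $\sigma^2$ as follows:
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{(NOT unbiased)}.$$
An adjusted (or corrected) estimate would be the sample variance
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{(unbiased)}.$$
If there are $m$ different populations with unknown means and the same variance $\sigma^2$, then an adjusted (or corrected) ML estimate is
$$s^2 = \frac{(n_1 - 1)s_1^2 + \cdots + (n_m - 1)s_m^2}{(n_1 - 1) + \cdots + (n_m - 1)} \quad \text{(unbiased)},$$
where $n_i$ is the sample size of the $i$-th population, and $s_i^2$ is the sample variance of the $i$-th population.

Standard error (medelfelet) of an estimator $\hat{\Theta}$: an estimate of the standard deviation $D(\hat{\Theta})$.

3 Interval estimation

One sample $\{X_1, \ldots, X_n\}$ from $N(\mu, \sigma)$:

$I_\mu = \bar{x} \mp \lambda_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$, if $\sigma$ is known; fact: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$;

$I_\mu = \bar{x} \mp t_{\alpha/2}(n - 1)\,\frac{s}{\sqrt{n}}$, if $\sigma$ is unknown; fact: $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$;

$I_{\sigma^2} = \left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)}\right)$; fact: $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n - 1)$.

The unknown $\sigma^2$ can be estimated by the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.

Two samples $\{X_1, \ldots, X_{n_1}\}$ from $N(\mu_1, \sigma_1)$ and $\{Y_1, \ldots, Y_{n_2}\}$ from $N(\mu_2, \sigma_2)$, where $N(\mu_1, \sigma_1)$ and $N(\mu_2, \sigma_2)$ are independent:

$I_{\mu_1 - \mu_2} = (\bar{x} - \bar{y}) \mp \lambda_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, if $\sigma_1$ and $\sigma_2$ are known; fact: $\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1)$;

$I_{\mu_1 - \mu_2} = (\bar{x} - \bar{y}) \mp t_{\alpha/2}(n_1 + n_2 - 2)\, s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, if $\sigma_1 = \sigma_2 = \sigma$ is unknown; fact: $\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S\sqrt{1/n_1 + 1/n_2}} \sim t(n_1 + n_2 - 2)$;

$I_{\mu_1 - \mu_2} = (\bar{x} - \bar{y}) \mp t_{\alpha/2}(f)\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, if $\sigma_1 \ne \sigma_2$ are both unknown; fact: $\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} \approx t(f)$, with degrees of freedom
$$f = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}};$$

$I_{\sigma^2} = \left(\frac{(n_1 + n_2 - 2)s^2}{\chi^2_{\alpha/2}(n_1 + n_2 - 2)}, \frac{(n_1 + n_2 - 2)s^2}{\chi^2_{1-\alpha/2}(n_1 + n_2 - 2)}\right)$, if $\sigma_1 = \sigma_2 = \sigma$; fact: $\frac{(n_1 + n_2 - 2)S^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2)$.

The unknown $\sigma^2$ can be estimated by the pooled sample variance $s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

$m$ samples: the unknown $\sigma_1^2 = \cdots = \sigma_m^2 = \sigma^2$ can be estimated by $s^2 = \frac{(n_1 - 1)s_1^2 + \cdots + (n_m - 1)s_m^2}{(n_1 - 1) + \cdots + (n_m - 1)}$.

Remark: The idea of using a "fact" to find confidence intervals is very important. There are many more confidence intervals besides the ones above. For instance, consider two independent samples: $\{X_1, \ldots, X_{n_1}\}$ from $N(\mu_1, \sigma)$ and $\{Y_1, \ldots, Y_{n_2}\}$ from $N(\mu_2, \sigma)$. In this case, we can easily prove that
$$c_1\bar{X} + c_2\bar{Y} \sim N\left(c_1\mu_1 + c_2\mu_2, \; \sigma\sqrt{\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2}}\right).$$
If $\sigma$ is known, then the fact is $\frac{(c_1\bar{X} + c_2\bar{Y}) - (c_1\mu_1 + c_2\mu_2)}{\sigma\sqrt{c_1^2/n_1 + c_2^2/n_2}} \sim N(0, 1)$, so we can find $I_{c_1\mu_1 + c_2\mu_2}$;
If $\sigma$ is unknown, then the fact is $\frac{(c_1\bar{X} + c_2\bar{Y}) - (c_1\mu_1 + c_2\mu_2)}{S\sqrt{c_1^2/n_1 + c_2^2/n_2}} \sim t(n_1 + n_2 - 2)$, so we can find $I_{c_1\mu_1 + c_2\mu_2}$.

3.1 Confidence intervals from normal approximations

$X \sim Bin(N, p)$: $I_p = \hat{p} \mp \lambda_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{N}}$; fact: $\frac{\hat{P} - p}{\sqrt{\hat{P}(1 - \hat{P})/N}} \approx N(0, 1)$ (we require that $N\hat{p} > 10$ and $N\hat{p}(1 - \hat{p}) > 10$).

$X \sim Hyp(N, n, p)$: $I_p = \hat{p} \mp \lambda_{\alpha/2}\sqrt{\frac{N - n}{N - 1} \cdot \frac{\hat{p}(1 - \hat{p})}{n}}$; fact: $\frac{\hat{P} - p}{\sqrt{\frac{N - n}{N - 1} \cdot \frac{\hat{P}(1 - \hat{P})}{n}}} \approx N(0, 1)$.

$X \sim Po(\mu)$: $I_\mu = \bar{x} \mp \lambda_{\alpha/2}\sqrt{\frac{\bar{x}}{n}}$; fact: $\frac{\bar{X} - \mu}{\sqrt{\bar{X}/n}} \approx N(0, 1)$ (we require that $n\bar{x} > 15$).

$X \sim Exp(1/\mu)$ (we require that $n \ge 30$):
$I_\mu = \left(\frac{\bar{x}}{1 + \lambda_{\alpha/2}/\sqrt{n}}, \frac{\bar{x}}{1 - \lambda_{\alpha/2}/\sqrt{n}}\right)$; fact: $\frac{\bar{X} - \mu}{\mu/\sqrt{n}} \approx N(0, 1)$;
$I_\mu = \bar{x} \mp \lambda_{\alpha/2}\frac{\bar{x}}{\sqrt{n}}$; fact: $\frac{\bar{X} - \mu}{\bar{X}/\sqrt{n}} \approx N(0, 1)$.

Remark: Again, there are more confidence intervals besides the ones above. For instance, consider two independent samples: $X$ from $Bin(N_1, p_1)$ and $Y$ from $Bin(N_2, p_2)$, with unknown $p_1$ and $p_2$. As we know,
$$\hat{P}_1 \approx N\left(p_1, \sqrt{\frac{p_1(1 - p_1)}{N_1}}\right) \quad \text{and} \quad \hat{P}_2 \approx N\left(p_2, \sqrt{\frac{p_2(1 - p_2)}{N_2}}\right),$$
so
$$\hat{P}_1 - \hat{P}_2 \approx N\left(p_1 - p_2, \sqrt{\frac{p_1(1 - p_1)}{N_1} + \frac{p_2(1 - p_2)}{N_2}}\right).$$
Therefore, the fact is
$$\frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{N_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{N_2}}} \approx N(0, 1), \qquad I_{p_1 - p_2} = (\hat{p}_1 - \hat{p}_2) \mp \lambda_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{N_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{N_2}}.$$

3.2 Confidence intervals from the ratio of two population variances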
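Suppose there are two independent samples $\{X_1, \ldots, X_{n_1}\}$ from $N(\mu_1, \sigma_1)$, and $\{Y_1, \ldots, Y_{n_2}\}$ from $N(\mu_2, \sigma_2)$. Then
$$\frac{(n_1 - 1)S_1^2}{\sigma_1^2} \sim \chi^2(n_1 - 1) \quad \text{and} \quad \frac{(n_2 - 1)S_2^2}{\sigma_2^2} \sim \chi^2(n_2 - 1),$$
therefore the fact is
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1).$$
Thus
$$I_{\sigma_2^2/\sigma_1^2} = \left(\frac{s_2^2}{s_1^2} F_{1-\alpha/2}(n_1 - 1, n_2 - 1), \; \frac{s_2^2}{s_1^2} F_{\alpha/2}(n_1 - 1, n_2 - 1)\right).$$

3.3 Large sample size ($n \ge 30$, population may be completely unknown)

If there is no information about the population(s), then we can apply the Central Limit Theorem (usually with a large sample, $n \ge 30$) to get approximate normal distributions. Here are two examples:

Example 1: Let $\{X_1, \ldots, X_n\}$, $n \ge 30$, be a random sample from a population. Then (no matter what distribution the population has)
$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \approx N(0, 1).$$

Example 2: Let $\{X_1, \ldots, X_{n_1}\}$, $n_1 \ge 30$, be a random sample from a population, and $\{Y_1, \ldots, Y_{n_2}\}$, $n_2 \ge 30$, be a random sample from another population which is independent of the first population. Then (no matter what distributions the populations have)
$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \approx N(0, 1).$$

4 Hypothesis testing

4.1 One sample and the general theory of hypothesis testing

Suppose there is a random sample $\{X_1, \ldots, X_n\}$ from a population $X$ with an unknown parameter $\theta$, and
$$H_0: \theta = \theta_0 \quad \text{vs} \quad H_1: \theta < \theta_0, \text{ or } \theta > \theta_0, \text{ or } \theta \ne \theta_0.$$

                     $H_0$ is true                                      $H_0$ is false and $\theta = \theta_1$
reject $H_0$         $\alpha$ (type I error, or significance level)     $h(\theta_1)$ (power)
don't reject $H_0$   $1 - \alpha$                                       $\beta(\theta_1) = 1 - h(\theta_1)$ (type II error)

For notational simplicity, we employ TS := test statistic and C := critical region.
Reject $H_0$ if $TS \in C$;
regarding the p-value: reject $H_0$ if and only if p-value $< \alpha$.

4.2 Hypothesis testing for population mean(s)

One sample: $\{X_1, \ldots, X_n\}$ from $N(\mu, \sigma)$. Null hypothesis $H_0: \mu = \mu_0$.

$\sigma$ is known: $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, $TS = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
$H_1: \mu < \mu_0$: $C = (-\infty, -\lambda_\alpha)$, p-value $= P(N(0,1) \le TS)$;
$H_1: \mu > \mu_0$: $C = (\lambda_\alpha, +\infty)$, p-value $= P(N(0,1) \ge TS)$;
$H_1: \mu \ne \mu_0$: $C = (-\infty, -\lambda_{\alpha/2}) \cup (\lambda_{\alpha/2}, +\infty)$, p-value $= 2P(N(0,1) \ge |TS|)$.

$\sigma$ is unknown: $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$, $TS = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
$H_1: \mu < \mu_0$: $C = (-\infty, -t_\alpha(n - 1))$, p-value $= P(t(n - 1) \le TS)$;
$H_1: \mu > \mu_0$: $C = (t_\alpha(n - 1), +\infty)$, p-value $= P(t(n - 1) \ge TS)$;
$H_1: \mu \ne \mu_0$: $C = (-\infty, -t_{\alpha/2}(n - 1)) \cup (t_{\alpha/2}(n - 1), +\infty)$, p-value $= 2P(t(n - 1) \ge |TS|)$.

Two samples: $\{X_1, \ldots, X_{n_1}\}$ from $N(\mu_1, \sigma_1)$; $\{Y_1, \ldots, Y_{n_2}\}$ from $N(\mu_2, \sigma_2)$. Null hypothesis $H_0: \mu_1 = \mu_2$.

$\sigma_1, \sigma_2$ are known: $\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1)$, $TS = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$.
$H_1: \mu_1 < \mu_2$: $C = (-\infty, -\lambda_\alpha)$, p-value $= P(N(0,1) \le TS)$;
$H_1: \mu_1 > \mu_2$: $C = (\lambda_\alpha, +\infty)$, p-value $= P(N(0,1) \ge TS)$;
$H_1: \mu_1 \ne \mu_2$: $C = (-\infty, -\lambda_{\alpha/2}) \cup (\lambda_{\alpha/2}, +\infty)$, p-value $= 2P(N(0,1) \ge |TS|)$.

$\sigma_1 = \sigma_2 = \sigma$ is unknown: $\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S\sqrt{1/n_1 + 1/n_2}} \sim t(n_1 + n_2 - 2)$, $TS = \frac{\bar{x} - \bar{y}}{s\sqrt{1/n_1 + 1/n_2}}$.
$H_1: \mu_1 < \mu_2$: $C = (-\infty, -t_\alpha(n_1 + n_2 - 2))$, p-value $= P(t(n_1 + n_2 - 2) \le TS)$;
$H_1: \mu_1 > \mu_2$: $C = (t_\alpha(n_1 + n_2 - 2), +\infty)$, p-value $= P(t(n_1 + n_2 - 2) \ge TS)$;
$H_1: \mu_1 \ne \mu_2$: $C = (-\infty, -t_{\alpha/2}(n_1 + n_2 - 2)) \cup (t_{\alpha/2}(n_1 + n_2 - 2), +\infty)$, p-value $= 2P(t(n_1 + n_2 - 2) \ge |TS|)$.

$\sigma_1 \ne \sigma_2$, both unknown: similarly as in the tree of confidence intervals.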
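The one-sample test with known $\sigma$ can be sketched in a few lines of standard-library Python. The function name `one_sample_z_test`, the `alternative` labels and the numbers in the usage example are illustrative assumptions; only the formulas for TS and the p-values follow the list above.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Test H0: mu = mu0 for a normal sample with known sigma.

    Returns the test statistic TS and the p-value for the chosen H1.
    """
    ts = (xbar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf  # distribution function of N(0, 1)
    if alternative == "less":         # H1: mu < mu0
        p_value = phi(ts)
    elif alternative == "greater":    # H1: mu > mu0
        p_value = 1 - phi(ts)
    elif alternative == "two-sided":  # H1: mu != mu0
        p_value = 2 * (1 - phi(abs(ts)))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return ts, p_value

# Hypothetical data: n = 64 observations with mean 10.5, known sigma = 2.
ts, p_value = one_sample_z_test(10.5, 10.0, 2.0, 64)
print(ts, round(p_value, 4))
```

Rejecting $H_0$ when the p-value is below $\alpha$ gives the same decision as checking whether TS falls in the critical region $C$.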
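4.3 Hypothesis testing for population variance(s)

One sample: $\{X_1, \ldots, X_n\}$ from $N(\mu, \sigma)$; $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n - 1)$. Null hypothesis $H_0: \sigma^2 = \sigma_0^2$, $TS = \frac{(n-1)s^2}{\sigma_0^2}$.
$H_1: \sigma^2 < \sigma_0^2$: $C = (0, \chi^2_{1-\alpha}(n - 1))$, p-value $= P(\chi^2(n - 1) \le TS)$;
$H_1: \sigma^2 > \sigma_0^2$: $C = (\chi^2_\alpha(n - 1), +\infty)$, p-value $= P(\chi^2(n - 1) \ge TS)$;
$H_1: \sigma^2 \ne \sigma_0^2$: $C = (0, \chi^2_{1-\alpha/2}(n - 1)) \cup (\chi^2_{\alpha/2}(n - 1), +\infty)$, p-value $= 2P(\chi^2(n - 1) \ge TS)$ or $2P(\chi^2(n - 1) \le TS)$.

Two samples: $\{X_1, \ldots, X_{n_1}\}$ from $N(\mu_1, \sigma_1)$; $\{Y_1, \ldots, Y_{n_2}\}$ from $N(\mu_2, \sigma_2)$; $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1 - 1, n_2 - 1)$. Null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, $TS = s_1^2/s_2^2$.
$H_1: \sigma_1^2 < \sigma_2^2$: $C = (0, F_{1-\alpha}(n_1 - 1, n_2 - 1))$, p-value $= P(F(n_1 - 1, n_2 - 1) \le TS)$;
$H_1: \sigma_1^2 > \sigma_2^2$: $C = (F_\alpha(n_1 - 1, n_2 - 1), +\infty)$, p-value $= P(F(n_1 - 1, n_2 - 1) \ge TS)$;
$H_1: \sigma_1^2 \ne \sigma_2^2$: $C = (0, F_{1-\alpha/2}(n_1 - 1, n_2 - 1)) \cup (F_{\alpha/2}(n_1 - 1, n_2 - 1), +\infty)$, p-value $= 2P(F(n_1 - 1, n_2 - 1) \ge TS)$ or $2P(F(n_1 - 1, n_2 - 1) \le TS)$.

4.4 Large sample size ($n \ge 30$, population may be completely unknown)

If there is no information about the population(s), then we can apply the Central Limit Theorem (usually with a large sample, $n \ge 30$). The idea is exactly the same as the one used in confidence intervals. One example: a sample $\{X_1, \ldots, X_n\}$, $n \ge 30$, from some population (which is unknown) with a mean $\mu$ and standard deviation $\sigma$. Null hypothesis $H_0: \mu = \mu_0$. Then it follows from the CLT that $\frac{\bar{X} - \mu}{s/\sqrt{n}} \approx N(0, 1)$, therefore, with $TS = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$:
$H_1: \mu < \mu_0$: $C = (-\infty, -\lambda_\alpha)$, p-value $= P(N(0,1) \le TS)$;
$H_1: \mu > \mu_0$: $C = (\lambda_\alpha, +\infty)$, p-value $= P(N(0,1) \ge TS)$;
$H_1: \mu \ne \mu_0$: $C = (-\infty, -\lambda_{\alpha/2}) \cup (\lambda_{\alpha/2}, +\infty)$, p-value $= 2P(N(0,1) \ge |TS|)$.

5 Multi-dimension random variables (or random vectors)

Covariance (Kovarians) of $(X, Y)$: $\sigma_{X,Y} = cov(X, Y) = E((X - \mu_X)(Y - \mu_Y))$ (so $cov(X, X) = V(X)$).

Correlation coefficient (Korrelation) of $(X, Y)$: $\rho_{X,Y} = \frac{cov(X, Y)}{\sqrt{V(X)}\sqrt{V(Y)}} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y}$.

A rule: for real constants $a$, $a_i$, $b$ and $b_j$,
$$cov\left(a + \sum_{i=1}^{m} a_i X_i, \; b + \sum_{j=1}^{n} b_j Y_j\right) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_i b_j\, cov(X_i, Y_j).$$

$X$ and $Y$ are uncorrelated: if $cov(X, Y) = 0$.

An important theorem: Suppose that a random vector $X$ has a mean $\mu_X$ and a covariance matrix $C_X$. Define a new random vector $Y = AX + b$, for some matrix $A$ and vector $b$. Then
$$\mu_Y = A\mu_X + b, \quad C_Y = A C_X A'.$$

Standard normal vectors: $\{X_i\}$ are independent and $X_i \sim N(0, 1)$, $X = (X_1, X_2, \ldots, X_n)'$, thus $\mu_X = (0, 0, \ldots, 0)'$, $C_X = I_n$ (the $n \times n$ identity matrix), and the density is
$$f_X(x) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}x'x}.$$

General normal vectors: $Y = AX + b$, where $X$ is a standard normal vector, and $\mu_Y = b$, $C_Y = AA'$, with density
$$f_Y(y) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(C_Y)}}\, e^{-\frac{1}{2}(y - \mu_Y)' C_Y^{-1} (y - \mu_Y)}.$$

6 (Simple and multiple) Linear regressions

Simple linear regression: $Y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$, $\varepsilon_j \sim N(0, \sigma)$, $j = 1, \ldots, n$.
Multiple linear regression: $Y_j = \beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2} + \cdots + \beta_k x_{jk} + \varepsilon_j$, $\varepsilon_j \sim N(0, \sigma)$, $j = 1, \ldots, n$.

Both simple linear regression and multiple linear regression can be written in vector form, $Y = X\beta + \varepsilon$:
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon \sim N(0, \sigma^2 I_n),$$
so $Y \sim N(\mu_Y, C_Y)$, where $\mu_Y = X\beta$ and $C_Y = \sigma^2 I_n$.

Estimate of the coefficient $\beta$: $\hat{\beta} = (X'X)^{-1}X'y$.
Estimator of the coefficient $\beta$: $\hat{B} = (X'X)^{-1}X'Y \sim N(\beta, \sigma^2 (X'X)^{-1})$.
Estimated line: $\hat{\mu}_j = \hat{\beta}_0 + \hat{\beta}_1 x_{j1} + \hat{\beta}_2 x_{j2} + \cdots + \hat{\beta}_k x_{jk}$.

Analysis of variance:
$$SS_{TOT} = \sum_{j=1}^{n}(y_j - \bar{y})^2, \quad SS_R = \sum_{j=1}^{n}(\hat{\mu}_j - \bar{y})^2, \quad SS_E = \sum_{j=1}^{n}(y_j - \hat{\mu}_j)^2,$$
$$\frac{SS_{TOT}}{\sigma^2} = \frac{\sum_{j=1}^{n}(Y_j - \bar{Y})^2}{\sigma^2} \sim \chi^2(n - 1), \quad \text{if } \beta_1 = \cdots = \beta_k = 0;$$
$$\frac{SS_R}{\sigma^2} = \frac{\sum_{j=1}^{n}(\hat{\mu}_j - \bar{Y})^2}{\sigma^2} \sim \chi^2(k), \quad \text{if } \beta_1 = \cdots = \beta_k = 0;$$
$$\frac{SS_E}{\sigma^2} = \frac{\sum_{j=1}^{n}(Y_j - \hat{\mu}_j)^2}{\sigma^2} \sim \chi^2(n - k - 1).$$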
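For the simple linear regression case ($k = 1$), the estimate $\hat{\beta} = (X'X)^{-1}X'y$ and the three sums of squares can be computed by hand from the 2x2 normal equations. The sketch below does this in plain Python on made-up data; the function name and the data are illustrative assumptions, and a matrix library would be the natural choice for $k > 1$.

```python
def fit_simple_regression(x, y):
    """Solve the 2x2 normal equations (X'X) beta = X'y for beta0, beta1."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X = [[n, sx], [sx, sxx]]
    beta0 = (sxx * sy - sx * sxy) / det
    beta1 = (n * sxy - sx * sy) / det
    return beta0, beta1

x = [1, 2, 3, 4, 5]               # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical responses

b0, b1 = fit_simple_regression(x, y)
fitted = [b0 + b1 * v for v in x]  # estimated line evaluated at each x_j
ybar = sum(y) / len(y)

ss_tot = sum((v - ybar) ** 2 for v in y)
ss_r = sum((m - ybar) ** 2 for m in fitted)
ss_e = sum((v - m) ** 2 for v, m in zip(y, fitted))

print(round(b0, 4), round(b1, 4), round(ss_tot, 4), round(ss_r + ss_e, 4))
```

For this fit the total sum of squares agrees with the sum of the regression and error sums of squares up to rounding error.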
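$SS_{TOT} = SS_R + SS_E$, and $R^2 = \frac{SS_R}{SS_{TOT}}$.

$\sigma^2$ is estimated as $\hat{\sigma}^2 = s^2 = \frac{SS_E}{n - k - 1}$.

For the hypothesis test $H_0: \beta_1 = \cdots = \beta_k = 0$ vs $H_1$: at least one $\beta_j \ne 0$,
$$\frac{SS_R/k}{SS_E/(n - k - 1)} \sim F(k, n - k - 1), \quad TS = \frac{SS_R/k}{SS_E/(n - k - 1)}, \quad C = (F_\alpha(k, n - k - 1), +\infty).$$

We know $\hat{B} = (X'X)^{-1}X'Y \sim N(\beta, \sigma^2 (X'X)^{-1})$, thus if we denote
$$(X'X)^{-1} = \begin{pmatrix} h_{00} & h_{01} & \cdots & h_{0k} \\ h_{10} & h_{11} & \cdots & h_{1k} \\ \vdots & \vdots & & \vdots \\ h_{k0} & h_{k1} & \cdots & h_{kk} \end{pmatrix},$$
then $\hat{B}_j \sim N(\beta_j, \sigma\sqrt{h_{jj}})$ and $\frac{\hat{B}_j - \beta_j}{\sigma\sqrt{h_{jj}}} \sim N(0, 1)$. But $\sigma$ is generally unknown, therefore
$$\frac{\hat{B}_j - \beta_j}{S\sqrt{h_{jj}}} \sim t(n - k - 1)$$
($s\sqrt{h_{jj}}$ is sometimes denoted as $d(\hat{\beta}_j)$ or $se(\hat{\beta}_j)$).

Confidence interval of $\beta_j$: $I_{\beta_j} = \hat{\beta}_j \mp t_{\alpha/2}(n - k - 1)\, s\sqrt{h_{jj}}$;
Hypothesis test $H_0: \beta_j = 0$ vs $H_1: \beta_j \ne 0$: $TS = \frac{\hat{\beta}_j}{s\sqrt{h_{jj}}}$, $C = (-\infty, -t_{\alpha/2}(n - k - 1)) \cup (t_{\alpha/2}(n - k - 1), +\infty)$.

Rewrite simple and multiple linear regressions as follows:
$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$, $\varepsilon \sim N(0, \sigma)$ (the model);
$\mu = E(Y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$ (the mean);
$\hat{\mu} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$ (the estimated line).
For a given/fixed $x = (1, x_1, \ldots, x_k)'$, the scalar $\hat{\mu}$ is an estimate of the unknown $\mu$ (and of $Y$). Then we can talk about the accuracy of this estimate in terms of confidence intervals and prediction intervals.
Confidence interval of $\mu$: $I_\mu = \hat{\mu} \mp t_{\alpha/2}(n - k - 1)\, s\sqrt{x'(X'X)^{-1}x}$.
Prediction interval of $Y$: $I_Y = \hat{\mu} \mp t_{\alpha/2}(n - k - 1)\, s\sqrt{1 + x'(X'X)^{-1}x}$.

Suppose we have two models:
Model 1: $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$;
Model 2: $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \beta_{k+1} x_{k+1} + \cdots + \beta_{k+p} x_{k+p} + \varepsilon$,
and we want to test $H_0: \beta_{k+1} = \cdots = \beta_{k+p} = 0$ vs $H_1$: at least one $\beta_{k+i} \ne 0$. Then
$$\frac{(SS_E^{(1)} - SS_E^{(2)})/p}{SS_E^{(2)}/(n - k - p - 1)} \sim F(p, n - k - p - 1), \quad TS = \frac{(SS_E^{(1)} - SS_E^{(2)})/p}{SS_E^{(2)}/(n - k - p - 1)}, \quad C = (F_\alpha(p, n - k - p - 1), +\infty).$$

Variable selection. If we have a response variable $y$ with possibly many predictors $x_1, \ldots, x_k$, how do we choose appropriate $x$'s (some $x$'s are useful for $Y$, and some are not)?
Step 1: compute $corr(x_1, \ldots, x_k, y)$, choose the predictor with maximal correlation with $y$ (say $x_i$), fit $Y = \beta_0 + \beta_i x_i + \varepsilon$, and test if $\beta_i = 0$.
Step 2: do the regression $Y = \beta_0 + \beta_i x_i + \beta_l x_l + \varepsilon$ for $l = 1, \ldots, i - 1, i + 1, \ldots, k$, choose the one with minimal $SS_E$ (say $x_j$), fit $Y = \beta_0 + \beta_i x_i + \beta_j x_j + \varepsilon$, and test if $\beta_j = 0$.
Step 3: repeat Step 2 until the last test for $\beta = 0$ is not rejected.

7 Basic $\chi^2$-test

Suppose we want to test
$H_0$: $X \sim$ distribution (with or without unknown parameters);
$H_1$: $X \nsim$ distribution (with or without unknown parameters).
The fact is:
$$\sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} \approx \chi^2(k - 1 - \#\text{ of unknown parameters}).$$
Then
$$TS = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}; \quad C = (\chi^2_\alpha(k - 1 - \#\text{ of unknown parameters}), +\infty).$$

Homogeneity test. Suppose we have data with $r$ rows and $k$ columns, and
$H_0$: different rows have the same pattern (in terms of columns);
$H_1$: different rows have different patterns (in terms of columns).
Equivalently,
$H_0$: rows and columns are independent;
$H_1$: rows and columns are not independent.
The fact is:
$$\sum_{j=1}^{k}\sum_{i=1}^{r} \frac{(N_{ij} - np_{ij})^2}{np_{ij}} \approx \chi^2((r - 1)(k - 1)).$$
Then
$$TS = \sum_{j=1}^{k}\sum_{i=1}^{r} \frac{(N_{ij} - np_{ij})^2}{np_{ij}}; \quad C = (\chi^2_\alpha((r - 1)(k - 1)), +\infty),$$
where $p_{ij} = p_i q_j$ are the theoretical probabilities.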
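The homogeneity test statistic can be sketched as follows. The sketch assumes that the theoretical probabilities are estimated from the margins of the table ($p_i$ = row total / $n$, $q_j$ = column total / $n$), which is the usual practice but is not spelled out above; the function name and the 2x2 table are made up for illustration.

```python
def homogeneity_statistic(table):
    """Chi-square statistic and degrees of freedom for an r x k count table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    ts = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count n * p_i * q_j with margins as estimates.
            expected = row_totals[i] * col_totals[j] / n
            ts += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return ts, df

counts = [[20, 30],
          [30, 20]]  # hypothetical observed counts
ts, df = homogeneity_statistic(counts)
print(ts, df)
```

The statistic is then compared with the table value $\chi^2_\alpha((r - 1)(k - 1))$; for example, $\chi^2_{0.05}(1) = 3.84$, so this hypothetical table would lead to rejection at the 5% level.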