UNIVERSITY OF TORONTO
Faculty of Arts and Science
December 2005 Examinations
STA437H1F/STA2005H1F
Duration - hours
AIDS ALLOWED (to be supplied by the student): non-programmable calculator; one handwritten 8.5'' x 11'' aid sheet (both sides may be used).
Please show your work clearly in the space provided. You may use the back of the pages if necessary, but you must remain organized.
Last name:
First name:
Student number:
Benchmarks of 5% significance and 95% confidence may be used when evaluating results/conclusions. Assume multivariate normality of data whenever necessary.
PLEASE CHECK AND MAKE SURE THAT THERE ARE NO MISSING PAGES IN THIS BOOKLET.
Question: 1 2 3 4 5 6 7 8 9 10 Total
1) [0] The students in a school take the three courses Math I, Stat I and Math II. Let X1, X2 and X3 denote the scores of a student in these three courses (Math I, Stat I and Math II, respectively). We have the following information on the three variables:
Mean of X1 (µ1) = 75, standard deviation of X1 (σ1) = 8
Mean of X2 (µ2) = 70, standard deviation of X2 (σ2) = 0
Mean of X3 (µ3) = 60, standard deviation of X3 (σ3) =
Correlation of X and X (ρ) = 0.
Correlation of X and X (ρ) = 0.5
Correlation of X and X (ρ) = 0.4
Assume that (X1, X2, X3) has a multivariate normal distribution with the above parameters. Given that a student has an average of 70 for Math I and Math II (i.e. (X1 + X3)/2 = 70), find the conditional probability that this student will get an A grade for Stat I. (Note: in order to get an A, the student has to score 80 or above.)
(Time estimate: 0 minutes) (Solution on hard copy)
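A minimal R sketch of the conditional-distribution step, assuming the standard result that X2 given Y = (X1 + X3)/2 = y is normal with mean µ2 + Cov(X2, Y)(y − E(Y))/Var(Y) and variance Var(X2) − Cov(X2, Y)²/Var(Y). The helper name `cond_prob_A` is hypothetical, and `Sigma_demo` is an illustrative covariance matrix, not the one implied by the question; to answer the question, build Σ from the stated standard deviations and correlations (σij = ρij σi σj).

```r
# Sketch: P(X2 >= cutoff | (X1 + X3)/2 = y) for (X1, X2, X3)' ~ N3(mu, Sigma).
# cond_prob_A is a hypothetical helper name, not part of the original solution.
cond_prob_A <- function(mu, Sigma, y, cutoff = 80) {
  w <- c(0.5, 0, 0.5)                        # Y = w'X = (X1 + X3)/2
  mu_Y   <- sum(w * mu)                      # E(Y)
  var_Y  <- drop(t(w) %*% Sigma %*% w)       # Var(Y) = w' Sigma w
  cov_2Y <- drop(Sigma[2, ] %*% w)           # Cov(X2, Y)
  m <- mu[2] + cov_2Y / var_Y * (y - mu_Y)   # conditional mean of X2
  v <- Sigma[2, 2] - cov_2Y^2 / var_Y        # conditional variance of X2
  list(mean = m, var = v,
       prob = pnorm(cutoff, mean = m, sd = sqrt(v), lower.tail = FALSE))
}

# Hypothetical covariance matrix, for illustration only
Sigma_demo <- matrix(c(4, 2, 2,
                       2, 4, 2,
                       2, 2, 4), nrow = 3, byrow = TRUE)
res <- cond_prob_A(mu = c(75, 70, 60), Sigma = Sigma_demo, y = 70)
res$mean  # 70 + (2/3) * 2.5
res$var   # 4 - 4/3
res$prob
```

Only the means in the usage line come from the question; the answer itself needs the actual Σ.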
2) [0] In a study on counterfeit notes, the lengths (x1) and widths (x2) of 00 counterfeit notes were measured. The sample mean vector and the sample covariance matrix are given below:
x̄ = (4.8, 0.0)' and S = [0.4 0.0; 0.0 0.065]
Genuine notes have a mean length of 5 and a mean width of 0 (in the same units as the measurements above). Test whether the mean dimensions (i.e. length and width) of the counterfeit notes are significantly different from those of genuine notes. Assume bivariate normality of the data.
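A possible R sketch of the one-sample Hotelling T² test from summary statistics, using T² = n(x̄ − µ0)'S⁻¹(x̄ − µ0) and the fact that (n − p)T²/((n − 1)p) ~ F(p, n − p) under H0. The function name `hotelling_T2` and the demo inputs are hypothetical; for question 2 one would pass the printed x̄ and S, the number of notes, and µ0 = (genuine mean length, genuine mean width)'.

```r
# Sketch: one-sample Hotelling T^2 test of H0: mu = mu0 from summary statistics.
# hotelling_T2 is a hypothetical helper name.
hotelling_T2 <- function(xbar, S, n, mu0, alpha = 0.05) {
  p <- length(xbar)
  dev <- xbar - mu0
  T2 <- n * drop(t(dev) %*% solve(S) %*% dev)  # n (xbar - mu0)' S^{-1} (xbar - mu0)
  F_stat <- (n - p) / ((n - 1) * p) * T2       # ~ F(p, n - p) under H0
  F_crit <- qf(1 - alpha, p, n - p)
  list(T2 = T2, F = F_stat, F_crit = F_crit,
       p_value = pf(F_stat, p, n - p, lower.tail = FALSE),
       reject = F_stat > F_crit)
}

# Hypothetical inputs for illustration (not the exam data)
out <- hotelling_T2(xbar = c(1, 1), S = diag(2), n = 10, mu0 = c(0, 0))
out$T2      # 20
out$F       # 160/18
out$F_crit  # qf(0.95, 2, 8)
```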
3) The nutritional content of 10 diets was analyzed in two labs (lab A and lab B). Each lab measured two variables V1 and V2 on each diet. The objective is to compare the measurements in the two labs. Let xj and yj denote the observation vectors on diet j from lab A and lab B, respectively, and let dj = xj − yj. Some useful summary statistics are given below (in the usual notation):
d̄ = (.0, 0.80)', Wd = Σj dj dj' = [4.55 9.86; 9.86 7.888], Wd⁻¹ = [0.503 −0.627; −0.627 0.909]
Wd⁻¹ has eigenpairs (λi, ei), i = 1, 2, where λ1 = 1.365, e1 = (0.588, −0.809)' and λ2 = 0.047, e2 = (0.809, 0.588)'.
Assume that dj ~ N2(µd, Σd), where µd = (µ1, µ2)'.
a) [0] Test whether there is a significant difference (on average) between the measurements in the two labs.
b) [0] Test the hypothesis H0: µi = 0, i = 1, 2, against the alternative µi ≥ 0, i = 1, 2, with µi > 0 for at least one i. (Assume that any possible deviations of the mean components from the null hypothesis need not be of equal magnitude.)
Sol:
> db=c(.0, 0.80)
> db
[1] .0 0.80
> db=matrix(db)
> db
     [,1]
[1,]   .0
[2,] 0.80
> wd=matrix(c(4.55,9.86,9.86,7.888), nrow=2, ncol=2, byrow=TRUE)
> wd
     [,1]  [,2]
[1,] 4.55  9.86
[2,] 9.86 7.888
> solve(wd)
         [,1]     [,2]
[1,]   0.5050 -0.66648
[2,] -0.66648  0.90867
> eigen(solve(wd))
$values
[1] .6985 0.04670746

$vectors
         [,1]     [,2]
[1,] 0.588470  0.80868
[2,] -0.80868 0.588470

> n=10
> f=n-1
> f
[1] 9
> p=2
> sd=(wd-n*db%*%t(db))/f    # sample covariance S_d = (W_d - n*dbar*dbar')/(n-1)
> T_sq=n*t(db)%*%solve(sd)%*%db
> f0=((f-p+1)/(f*p))*T_sq
> fa=qf(0.95,p,f-p+1)
> fa
[1] 4.45897
> # f0 exceeds fa, so reject H0
> p=matrix(c(0.588,0.809,-0.809,0.588), nrow=2, ncol=2, byrow=TRUE)
> p
       [,1]  [,2]
[1,]  0.588 0.809
[2,] -0.809 0.588
> l=diag(c(1.365,0.047))
> l
      [,1]  [,2]
[1,] 1.365 0.000
[2,] 0.000 0.047
> l^0.5
         [,1]      [,2]
[1,] 1.168332 0.0000000
[2,] 0.000000 0.2167948
> a=p%*%l^0.5%*%t(p)
> a
           [,1]       [,2]
[1,]  0.5458319 -0.4526387
[2,] -0.4526387  0.8396067
> u=sqrt(10)*a%*%db
> u
        [,1]
[1,] 0.67590
[2,] 0.66854
> ub_sq=t(u)%*%u
> ub_sq
      [,1]
[1,] 0.808
> # Read Table B with p=2 and n=10.
> # This value is 0.40, and so reject H0.
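The two statistics in the solution above can be collected into one R function, assuming Wd = Σj dj dj' is the uncentered sum of products defined in the question, so that S_d = (Wd − n d̄d̄')/(n − 1). The name `paired_stats` and the demo inputs are hypothetical. Part (b)'s ū'ū equals n d̄'Wd⁻¹d̄, and the two statistics are linked by T² = (n − 1)ū'ū/(1 − ū'ū), which gives a quick consistency check on hand calculations.

```r
# Sketch: question 3 statistics from dbar and Wd = sum_j d_j d_j' (uncentered).
# paired_stats is a hypothetical helper name; the demo inputs are made up.
paired_stats <- function(dbar, Wd, n) {
  p <- length(dbar)
  Sd <- (Wd - n * outer(dbar, dbar)) / (n - 1)     # sample covariance of the d_j
  T2 <- n * drop(t(dbar) %*% solve(Sd) %*% dbar)   # Hotelling T^2 for H0: mu_d = 0
  F_stat <- (n - p) / ((n - 1) * p) * T2           # ~ F(p, n - p) under H0
  # Symmetric square root of Wd^{-1} from its spectral decomposition, as in part (b)
  e <- eigen(solve(Wd), symmetric = TRUE)
  A <- e$vectors %*% diag(sqrt(e$values), nrow = p) %*% t(e$vectors)
  u <- sqrt(n) * A %*% dbar
  list(T2 = T2, F = F_stat, F_crit = qf(0.95, p, n - p),
       ubar_sq = drop(t(u) %*% u))                 # equals n * dbar' Wd^{-1} dbar
}

demo <- paired_stats(dbar = c(0.2, 0.2), Wd = diag(2, 2), n = 10)
demo$T2       # 6
demo$F        # 8/3
demo$ubar_sq  # 0.4
```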
4) [5] In an experiment, three varieties (A, B and C) of rice are sown in 18 plots, where each variety of rice is assigned at random to six plots. Two variables are measured after six weeks: y1, the height of the plant, and y2, the number of tillers per plant. Some useful summary statistics (in the usual notation) are given below:
Variety          A                       B                      C
Mean vector ȳ    (58.7, 5.67)'           (50.8, 5.50)'          (54.8, 5.8)'
Covariance S     [7.7 0.67; 0.67 .07]    [6.7 0.0; 0.0 .90]     [4.97 .0; .0 0.57]
SSTR = [6.78 4.00; 4.00 0.]
Assuming that all the observations are independently normally distributed with common covariance matrix Σ and that the ith variety has mean µi = (µi,1, µi,2)', i = 1, 2, 3, test the hypothesis H0: µ1 = µ2 = µ3 against the alternative A: at least one mean is different, at the 5% level of significance.
Give a Bonferroni-type 95% confidence interval for µ µ µ + µ. Assume that we are interested in computing k = 3 such confidence intervals, but it is enough to present only this one, since we don't have time to compute the other two. (t15, 0.05/6 = 2.694)
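A minimal R sketch for question 4, assuming the standard one-way MANOVA set-up: the within-groups matrix is W = Σi (ni − 1)Si, Wilks' Λ = |W|/|W + B| with B = SSTR, and for p = 2 the statistic ((N − g − 1)/(g − 1))(1 − √Λ)/√Λ has an exact F distribution on 2(g − 1) and 2(N − g − 1) degrees of freedom. The Bonferroni interval treats the contrast as Σi ci'µi, with estimated variance Σi ci'S_pooled ci/ni, S_pooled = W/(N − g), and multiplier t(N − g, α/(2k)). Function names and demo values are hypothetical; with N = 18, g = 3 and k = 3 the multiplier is the t15, 0.05/6 = 2.694 quoted in the question.

```r
# Sketch for question 4 (p = 2). wilks_p2, bonferroni_ci and the demo values
# are hypothetical; substitute the S_i, n_i and SSTR from the question.
wilks_p2 <- function(S_list, n_i, B) {
  g <- length(n_i)
  N <- sum(n_i)
  # Within-groups SSCP: W = sum_i (n_i - 1) S_i
  W <- Reduce(`+`, Map(function(S, n) (n - 1) * S, S_list, n_i))
  Lambda <- det(W) / det(W + B)                  # Wilks' Lambda
  # Exact for p = 2: F on 2(g - 1) and 2(N - g - 1) degrees of freedom
  F_stat <- (N - g - 1) / (g - 1) * (1 - sqrt(Lambda)) / sqrt(Lambda)
  df1 <- 2 * (g - 1)
  df2 <- 2 * (N - g - 1)
  list(W = W, Lambda = Lambda, F = F_stat, F_crit = qf(0.95, df1, df2))
}

# Bonferroni interval for sum_i c_i' mu_i, one of k simultaneous intervals.
# C and ybar are g x p matrices whose row i belongs to group i.
bonferroni_ci <- function(C, ybar, W, n_i, k, alpha = 0.05) {
  g <- nrow(ybar)
  df <- sum(n_i) - g
  S_pooled <- W / df
  est <- sum(C * ybar)
  v <- sum(sapply(seq_len(g), function(i) {
    ci <- C[i, , drop = FALSE]
    drop(ci %*% S_pooled %*% t(ci)) / n_i[i]
  }))
  half <- qt(1 - alpha / (2 * k), df) * sqrt(v)
  c(estimate = est, lower = est - half, upper = est + half)
}

fit <- wilks_p2(list(diag(2), diag(2), diag(2)), n_i = c(6, 6, 6), B = diag(15, 2))
fit$Lambda  # 0.25
fit$F       # 7, to be compared with fit$F_crit = qf(0.95, 4, 28)

ybar_demo <- rbind(c(5, 1), c(3, 1), c(4, 1))
C_demo <- rbind(c(1, 0), c(-1, 0), c(0, 0))  # contrast mu_1,1 - mu_2,1
bonferroni_ci(C_demo, ybar_demo, fit$W, n_i = c(6, 6, 6), k = 3)
```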
5) In a study on counterfeit notes, the investigators collected data on four variables: left length, right length, bottom length and top length. The objective of the study was to investigate the possibility of classifying the notes as real or fake. Some SAS output (using PROC DISCRIM) is given below:

The DISCRIM Procedure
Observations    149    DF Total               148
Variables         4    DF Within Classes      147
Classes           2    DF Between Classes       1

Class Level Information
type    Variable Name    Frequency    Weight     Proportion    Prior Probability
fake    fake                    61    61.0000      0.409396             0.500000
real    real                    88    88.0000      0.590604             0.500000

Pairwise Generalized Squared Distances Between Groups
D²(i|j) = (X̄i − X̄j)' COV⁻¹ (X̄i − X̄j)

Generalized Squared Distance to type
From type    fake     real
fake            0    .0790
real        .0790        0

Linear Discriminant Function
Constant = −.5 X̄j' COV⁻¹ X̄j        Coefficient Vector = COV⁻¹ X̄j

Linear Discriminant Function for type
Variable         fake        real
Constant         -907        -980
left          75.6560    75.76897
right       694.64696      69.540
bottom         -90.08    -96.6407
top           -9.9590      -7.788
a) [5] Use this information to classify a note with left length 0., right length 9.9, bottom length 9.9 and top length 0. (= x0).
b) [5] Let Di² denote the estimated Mahalanobis squared distance between x0 and x̄i (i = 1, 2, with 1 = fake and 2 = real). Give the value of D1² − D2².
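An R sketch for parts (a) and (b), assuming the SAS linear discriminant functions have the form shown in the output, Lj(x) = constantj + coefficientj'x = x̄j'S⁻¹x − 0.5 x̄j'S⁻¹x̄j. Since Dj² = (x − x̄j)'S⁻¹(x − x̄j) = x'S⁻¹x − 2Lj(x), classifying to the larger Lj is the same as classifying to the smaller Dj², and D1² − D2² = −2(L1(x0) − L2(x0)). The helper `lda_classify` and the demo numbers are hypothetical, not the SAS values.

```r
# Sketch: classify x0 with linear discriminant functions L_j(x) = const_j + coef_j' x
# and recover D1^2 - D2^2. lda_classify and the demo numbers are hypothetical.
lda_classify <- function(const, coef, x0) {
  # const: named vector (one entry per group); coef: p x (number of groups) matrix
  scores <- const + drop(t(coef) %*% x0)
  list(scores = scores,
       class = names(scores)[which.max(scores)],
       # D_j^2 = x0' S^{-1} x0 - 2 L_j(x0); the x0' S^{-1} x0 term cancels
       D2_diff = -2 * (scores[[1]] - scores[[2]]))
}

coef_demo  <- cbind(fake = c(1, 2), real = c(2, 1))
const_demo <- c(fake = -3, real = -4)
out <- lda_classify(const_demo, coef_demo, x0 = c(1, 1))
out$class    # "fake"
out$D2_diff  # -2
```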
c) [5] Estimate the probability of misclassifying an observation from population 1 (fake notes).
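One standard estimate, assuming normal populations with a common covariance matrix, equal priors and equal costs, is Φ(−Δ̂/2), where Δ̂² is the generalized squared distance between the two group means printed by SAS. A one-line R sketch (the function name is hypothetical, and D² = 4 is only an illustrative value):

```r
# Estimated P(2|1) = Phi(-Delta/2); misclass_prob is a hypothetical helper name.
misclass_prob <- function(D2) pnorm(-sqrt(D2) / 2)

misclass_prob(4)  # pnorm(-1); D^2 = 4 is illustrative only
```

Other estimates (for example the apparent error rate or a holdout estimate) need the raw data, which the output above does not provide.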
d) [5] The estimated coefficients in the SAS output above were obtained using equal prior probabilities and equal costs of misclassification. Suppose we now want to set the prior probabilities to reflect the fact that only 1 percent of the notes are fake (and 99% real), and assume that the cost of misclassifying a fake note as real is 0 times greater than the cost of misclassifying a real note as fake. Classify the observation in part (a) (i.e. x0) into one population using this additional information on prior probabilities and misclassification costs.
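An R sketch of the minimum expected cost rule for part (d), assuming the usual two-population result: allocate x0 to population 1 (fake) when L1(x0) − L2(x0) ≥ ln[(c(1|2)/c(2|1))(p2/p1)], where c(1|2) is the cost of classifying a real note as fake and c(2|1) the cost of classifying a fake note as real. With p1 = 0.01, p2 = 0.99 and c(2|1) = K·c(1|2), the threshold is ln(99/K). The function name and the value K = 10 are illustrative assumptions.

```r
# Sketch: minimum expected cost allocation for two populations.
# Allocate to population 1 (fake) when L1(x0) - L2(x0) >= log((c12 / c21) * (p2 / p1)),
# c12 = cost(real classified as fake), c21 = cost(fake classified as real).
# classify_ecm and K = 10 are hypothetical/illustrative.
classify_ecm <- function(L_diff, p1, p2, c12, c21) {
  threshold <- log((c12 / c21) * (p2 / p1))
  if (L_diff >= threshold) "fake" else "real"
}

K <- 10  # illustrative cost ratio c21 / c12
classify_ecm(L_diff = 1, p1 = 0.01, p2 = 0.99, c12 = 1, c21 = K)  # "real"
classify_ecm(L_diff = 3, p1 = 0.01, p2 = 0.99, c12 = 1, c21 = K)  # "fake"
```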
Miscellaneous
6) [] X1 and X2 are normal random variables, both having mean 0 and variance. State whether the following statements are true or false.
a) (X1, X2)' has a bivariate normal distribution. (True / False) (Circle one)
False (not always; true if X1 and X2 are independent)
b) For all a ≠ 0, a'x, where x = (X1, X2)', has a normal distribution. (True / False)
False (similar to (a) above)
7) (X1, X2, X3)' ~ N(µ, Σ), where µ = [ , , ]', Σ = [2 1 0; 1 2 1; 0 1 2] and Σ⁻¹ = [0.75 −0.50 0.25; −0.50 1.00 −0.50; 0.25 −0.50 0.75].
a) [7] State whether the following statements are true or false.
i) ( X ) ( X ) ) + has a χ² distribution with degrees of freedom. (True / False) (False)
ii) ( X ) ( X ) ) + has a χ² distribution with degrees of freedom. (True / False) (True)
iii) X ~ N( , ) and X ~ N( , ). (True / False) (True)
iv) X X = . ~ N( , ). (True / False) (True)
v) X + X + X and X X + X are independent. (True / False) Ans: False. Reason: Cov(X + X + X, X X + X) = −.
vi) ( X + X X X + ) + 6 has a χ² distribution with degrees of freedom. (True / False)
Ans: True. X + X X 6 0 = ~ N X X X 0
vii) The probability density at ( , , 0) is greater than that at ( , , 0). (True / False)
Sol: False. Reason:
> a=matrix(c(,,0), nrow=3, ncol=1, byrow=TRUE)
> a
     [,1]
[1,]
[2,]
[3,]    0
> b=matrix(c(,,0), nrow=3, ncol=1, byrow=TRUE)
> b
     [,1]
[1,]
[2,]
[3,]    0
> s=matrix(c(2,1,0,1,2,1,0,1,2), nrow=3, ncol=3, byrow=TRUE)
> s
     [,1] [,2] [,3]
[1,]    2    1    0
[2,]    1    2    1
[3,]    0    1    2
> m=matrix(c(,,), nrow=3, ncol=1, byrow=TRUE)
> m
     [,1]
[1,]
[2,]
[3,]
> t(a-m)%*%solve(s)%*%(a-m)
     [,1]
[1,]    9
> t(b-m)%*%solve(s)%*%(b-m)
     [,1]
[1,] 6.75
( , , 0) is closer to m = ( , , ) (Mahalanobis squared distance 6.75) than ( , , 0) (which has Mahalanobis squared distance 9), and therefore has the higher density.
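A short R check for question 7, assuming the Σ and Σ⁻¹ stated there. It confirms the stated inverse, computes Cov(a'X, b'X) = a'Σb (for jointly normal linear combinations, zero covariance is equivalent to independence), and uses base R's `mahalanobis()` to compare squared distances, since the N3 density decreases as the squared Mahalanobis distance to µ increases. The helper `lin_cov`, the mean `mu_demo` and the two test points are hypothetical.

```r
# Sketch for question 7 using the Sigma given there; lin_cov, mu_demo and the
# two points below are hypothetical.
Sigma <- matrix(c(2, 1, 0,
                  1, 2, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)
Sigma_inv <- matrix(c( 0.75, -0.50,  0.25,
                      -0.50,  1.00, -0.50,
                       0.25, -0.50,  0.75), nrow = 3, byrow = TRUE)
all.equal(solve(Sigma), Sigma_inv)      # TRUE: the stated inverse checks out

# Cov(a'X, b'X) = a' Sigma b
lin_cov <- function(a, b, S) drop(t(a) %*% S %*% b)
lin_cov(c(1, 1, 1), c(1, 1, 1), Sigma)  # Var(X1 + X2 + X3) = 10

# Smaller squared Mahalanobis distance to the mean <=> higher N3 density
mu_demo <- c(1, 1, 1)
mahalanobis(rbind(c(2, 1, 1), c(1, 1, 3)), center = mu_demo, cov = Sigma)  # 0.75 and 3
```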
b) [] Find E(Q), where Q = X + X + X + XX 4XX.
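For part (b), one route is the identity E(X'AX) = tr(AΣ) + µ'Aµ, which holds for any random vector with mean µ and covariance Σ. To use it, write Q as X'AX with A symmetric: the coefficient of Xi² goes to A[i, i], and half the coefficient of XiXj goes to each of A[i, j] and A[j, i]. A minimal R sketch, with a hypothetical helper name and an illustrative A and µ (not the exam's Q):

```r
# E(X' A X) = tr(A Sigma) + mu' A mu; expected_quadratic is a hypothetical name.
expected_quadratic <- function(A, mu, Sigma) {
  sum(diag(A %*% Sigma)) + drop(t(mu) %*% A %*% mu)
}

Sigma <- matrix(c(2, 1, 0,
                  1, 2, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)

# Illustration only: Q = X1^2 + X2^2 + X3^2 (A = I) with mu = (1, 1, 1)'
expected_quadratic(diag(3), mu = c(1, 1, 1), Sigma = Sigma)  # tr(Sigma) + 3 = 9

# A cross term c * Xi * Xj is split as A[i, j] = A[j, i] = c / 2; e.g. Q = X1 * X2
A_x1x2 <- matrix(0, 3, 3)
A_x1x2[1, 2] <- A_x1x2[2, 1] <- 0.5
expected_quadratic(A_x1x2, mu = c(0, 0, 0), Sigma = Sigma)   # Cov(X1, X2) = 1
```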
8) [4] Let x1, x2, …, x 0 be independently and identically distributed N(µ, Σ) (µ, Σ known) random variables with 0 C. 0 0, x̄ = x, 0 = 0 S = (x x)(x x) and 9. Give the distribution (with the values of the parameters) of the following statistics:
a) (x̄ − µ)' Σ⁻¹ (x̄ − µ)
b) (x̄ − µ)' S⁻¹ (x̄ − µ)
c) (x̄ − µ)' C' (C Σ C')⁻¹ C (x̄ − µ)
d) (x̄ − µ)' C' (C S C')⁻¹ C (x̄ − µ)
9) [] In a profile analysis of two normal populations with (unknown) mean vectors µ1 and µ2, we test H1: µ1 − µ2 = γ1, where 1 is the vector of ones, and, if H1 is accepted, then test H2: γ = 0. State whether the following statements are true or false.
a) If we accept H1 at α = 0.05, then for any r × p matrix C, the test of H: C(µ1 − µ2) = 0 will also be accepted at α = 0.05. (True / False) (False)
b) If we accept both H1 and H2 at α = 0.05, then the test of H: µ1 = µ2 (with a Hotelling's T² test) will also be accepted at α = 0.05. (True / False) (False)
c) If H: µ1 = µ2 (with a Hotelling's T² test) is accepted at α = 0.05, then in a profile analysis both H1 and H2 will also be accepted at α = 0.05. (True / False)
(False)
10) [6] Eight patients were given a certain drug and the change in their blood pressure was measured. The 95% confidence region for the mean change in blood pressure is given by {(µ1, µ2): (µ1 + 0.75)² − .707(µ1 + 0.75)(µ2 − .5) + .68(µ2 − .5)² ≤ .9}. State whether the following statements are true or false.
a) The p-value for the Hotelling's T² test of H0: µ = ( , ) against H1: µ ≠ ( , ) is greater than 0.05. (True / False)
b) The p-value for the Hotelling's T² test of H0: µ = ( , ) against H1: µ ≠ ( , ) is greater than 0.05. (True / False)
c) The null hypothesis H0: µ = (0, ) will be rejected (using a Hotelling's T² test) at any α < 0.05. (True / False)
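Parts (a) to (c) rest on the duality between the T² confidence region and the T² test: µ0 lies inside the 95% region exactly when the test of H0: µ = µ0 has p-value greater than 0.05. A region written as (µ1 + 0.75)² − h(µ1 + 0.75)(µ2 − k) + m(µ2 − k)² ≤ r corresponds to centre (−0.75, k)' and matrix M = [1 −h/2; −h/2 m], with the cross-product coefficient split equally between the two off-diagonal entries. A minimal R sketch; the helper name and `M_demo` are hypothetical, not the exam's coefficients.

```r
# Sketch: is mu0 inside {mu : (mu - centre)' M (mu - centre) <= r}?
# By the T^2 duality, TRUE means the test of H0: mu = mu0 has p-value > 0.05.
# in_region and M_demo are hypothetical.
in_region <- function(mu0, centre, M, r) {
  d <- mu0 - centre
  drop(t(d) %*% M %*% d) <= r
}

M_demo <- matrix(c(1,   0.5,
                   0.5, 2), nrow = 2, byrow = TRUE)  # symmetric, positive definite
in_region(c(0, 0), centre = c(0, 0), M = M_demo, r = 1)  # TRUE  (0 <= 1)
in_region(c(2, 0), centre = c(0, 0), M = M_demo, r = 1)  # FALSE (4 > 1)
```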