UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS

(1) The sample median is greater than the sample mean when there is ____. (B)
(2) A random variable X is normally distributed with mean 0 and variance 5. If a random variable Y is defined as Y = …, what is P(Y < 5)? (E)
(3) For a random sample affected by sampling error but not affected by any non-sampling errors, which are definitely true statements about the sampling distribution of the sample mean? (B)
(4) An airplane is overloaded if the total weight of passengers exceeds 6,500 pounds. On average passengers weigh 75 pounds with a standard deviation of 6 pounds. If there are 36 passengers what is the probability that the plane is overloaded (to the nearest hundredth)? (A)
(5) For the pairs of measurements (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), the simple linear regression line for Y on X is … and for X on Y is …. What is the coefficient of correlation between X and Y? (E)
(6) What is the variance of the dependent variable (to the nearest integer)? (D)
(7) What percent of the variation in the dependent variable is explained by variation in the independent variable (to the nearest integer)? (A)
(8) A simple linear regression of Y on X with a sample size of 3 yields t = -3. for the test of statistical significance (i.e. validity) of the model. Which is closest to the p-value? (C)
(9) What is the standardized test statistic (to the nearest hundredth)? (A)
(10) To support the inference that people use the stairs more when the sign is posted, the difference in the fraction using the stairs must be at least ____. (B)
(11) Which are legitimate limitations of the statistical analysis of these data? (B)
(12) If the return on investment in Canada is 9, what is the predicted return on investment overseas (to the nearest tenth)? (D)
(13) A population is Uniformly distributed between 50 and 60. For a sample size of 35 what is the probability that the sample mean is less than 56 (to the nearest tenth)? (E)

(14) What is the interpretation of the coefficient estimate for the variable APR? (C)
(15) What would the coefficient estimate for the variable FEB be if the included month variables are FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, and DEC instead of the month variables in the table? (B)
(16) What is the estimated relationship between delays and precipitation at the origin airport for a September flight if there is .5 cm of rain at the destination airport? (E)
(17) Which are legitimate criticisms of this multiple regression model? (C)
(18) Compared to Graph …, Graph … clearly shows a regression analysis where the coefficient of determination is ____ and the F test statistic is ____. (A)
(19) Which would result in the highest probability of a Type II error? (C)
(20) If a random sample has 00 observations, the true population mean is 60, and the significance level is 0.0 then what is the probability of a Type II error? (A)

PART 2 OF 2
Duration - 3 hours
Examination Aids: Calculator

(1) (a) The chance that sampling error explains the result is \( 0.5^8 \approx 0.004 \), which is very close to zero. Hence we should conclude that either we have non-sampling errors (such as a selection bias, non-response bias, or a poorly designed questionnaire) or that it is not true that 50% of customers order the large bowl.

(b) The sample is large, therefore the distribution of the sample mean will be normal according to the CLT:

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

\[ E(X) = 0.5 \cdot 6 + 0.5 \cdot 9 = 7.50 \]

\[ V(X) = E(X^2) - \mu^2 = \left[ 0.5 \cdot 6^2 + 0.5 \cdot 9^2 \right] - 7.50^2 = 2.25 \]

or

\[ V(X) = E\left[(X - \mu)^2\right] = 0.5(6 - 7.5)^2 + 0.5(9 - 7.5)^2 = 2.25 \]

\[ \bar{X} \sim N\left(7.5, \frac{2.25}{40}\right) \]

\[ P(\bar{X} > 8.00) = P\left(Z > \frac{8.00 - 7.5}{1.5/\sqrt{40}}\right) = P(Z > 2.11) = 0.5 - 0.4826 = 0.0174 \]

[Figure: density of the sample mean (X-bar), mean 7.5, s.e. 0.237, with 8.00 marked on the horizontal axis.]
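The arithmetic in (1) can be checked with a short script. This is only a sketch using the Python standard library: the variable names are illustrative, and the inputs (two outcomes of 6 and 9 with probability 0.5 each, a sample of 40, a threshold of 8.00) are read off the solution above rather than from the original question.

```python
from math import sqrt
from statistics import NormalDist

# (a) The probability quoted in the solution: 0.5 raised to the 8th power.
p_sampling_error = 0.5 ** 8

# (b) Inputs as they appear in the solution (assumed, not taken from the question).
values = [6.0, 9.0]
probs = [0.5, 0.5]
n = 40
threshold = 8.00

# Population mean and variance of the two-point distribution.
mu = sum(p * x for p, x in zip(probs, values))
var = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

# Standard error of the sample mean and the upper-tail probability under the CLT.
se = sqrt(var / n)
z = (threshold - mu) / se
p_tail = 1 - NormalDist().cdf(z)

print(f"0.5^8 = {p_sampling_error:.4f}")
print(f"mu = {mu:.2f}, var = {var:.2f}, se = {se:.3f}, z = {z:.2f}, P = {p_tail:.4f}")
```

The tail probability computed from the unrounded z differs slightly in the fourth decimal from the table-based 0.0174, which uses z rounded to 2.11.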

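(2)

\[ \hat{p}_B = \frac{652}{2000} = 0.326 \qquad \hat{p}_A = \frac{576}{1500} = 0.384 \]

\[ H_0: p_A - p_B = 0 \qquad H_1: p_A - p_B > 0 \]

\[ \hat{p} = \frac{652 + 576}{2000 + 1500} = 0.3509 \]

\[ Z = \frac{0.384 - 0.326}{\sqrt{0.3509\,(1 - 0.3509)\left(\frac{1}{2000} + \frac{1}{1500}\right)}} = 3.558 \]

\[ p\text{-value} = P(Z > 3.558) \approx 0 \]

Because the p-value is very small there is strong evidence to support the inference that the fraction of cars carrying two or more people is higher after the introduction of the carpool-only lanes.

(3) The statistical analysis that addresses the question is interval estimation (and not hypothesis testing). We need to make an inference about the difference between two means. The unequal variances approach is most appropriate because it looks like the revenues from large cart sales are much more variable than small cart sales. The degrees of freedom are

\[ \nu = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}} \]

and the interval estimate is

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

Substituting the sample statistics gives a standard error of 2.31, and the critical value used is 1.96:

\[ (129.94 - 55.01) \pm 1.96 \times 2.31 = 74.93 \pm 4.53 \]

\[ \text{LCL} = 70.40 \qquad \text{UCL} = 79.46 \]

We are 95% confident that the difference in sales between large and small carts is in the interval from $70.40 to $79.46.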
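The pooled two-proportion test in (2) and the interval arithmetic in (3) can be reproduced as a rough check. This is a sketch, not the course's method of record. The counts 652 out of 2000 (before) and 576 out of 1500 (after) are an assumption: they are the values consistent with the reported pooled proportion 0.3509 and test statistic 3.558. The standard error 2.31 and critical value 1.96 are taken as given from the solution to (3).

```python
from math import sqrt
from statistics import NormalDist

# Question (2): counts assumed to match the reported 0.3509 and 3.558.
x_before, n_before = 652, 2000
x_after, n_after = 576, 1500

p_before = x_before / n_before
p_after = x_after / n_after

# Pooled proportion under H0: p_A - p_B = 0.
p_pool = (x_before + x_after) / (n_before + n_after)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_before + 1 / n_after))

z = (p_after - p_before) / se_pool
p_value = 1 - NormalDist().cdf(z)  # one-tailed: H1 is p_A - p_B > 0

print(f"pooled p = {p_pool:.4f}, z = {z:.3f}, p-value = {p_value:.5f}")

# Question (3): interval from the reported means, standard error and critical value.
diff = 129.94 - 55.01
margin = 1.96 * 2.31
lcl, ucl = diff - margin, diff + margin

print(f"difference = {diff:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```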

(4) (a) The estimated regression equation is \( \hat{y} = 5.0 + 0.87x_1 + 8.4x_2 \). The coefficient of 0.87 on \( x_1 \) measures how much higher revenues are as the local population grows by 1,000 households: each additional 1,000 households is associated with sales revenues that are $870 higher.

[EXTRA: The constant term (intercept) of 5.0 has no meaning because no location will have zero households: it is just a shifter. The coefficient of 8.4 on \( x_2 \) measures how much higher revenues are when parking is available compared to places where it is not available: places that have parking have revenues that are $8,400 higher than places without parking.]

(b)
\( H_0: \beta_1 = \beta_2 = 0 \) (Model is not statistically significant)
\( H_1 \): not all slopes are 0 (Model is statistically significant)

Use the F-test. Calculate the F test statistic:

\[ F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.7 / 2}{(1 - 0.7)/(15 - 2 - 1)} = 14 \]

Find the rejection region: the numerator degrees of freedom is 2 (\( k \)) and the denominator degrees of freedom is 12 (\( n - k - 1 \)). Our F table does not have the exact critical value for our test but we can see that for \( \alpha = 0.05 \) it will be between 3.49 and 4.10. Our F-statistic is 14, which is clearly greater than either of those critical values and hence falls in the rejection region. Reject \( H_0 \) and conclude that the model is statistically significant.

[Note: Some students may correctly note that the model is statistically significant even if \( \alpha = 0.01 \), which is a tougher standard.]

[EXTRA: We are given SST = 1000.

\[ R^2 = \frac{SSR}{SST} \;\Rightarrow\; 0.7 = \frac{SSR}{1000} \;\Rightarrow\; SSR = 700 \]

\[ SSE = SST - SSR = 1000 - 700 = 300 \]

We could complete the ANOVA table as follows:

Source       SS     df   MS    F
Regression   700    2    350   14
Error        300    12   25
Total        1000   14

At \( \alpha = 0.05 \), reject \( H_0 \) if F > 3.89 (a computer is needed to get 3.89 exactly) and fail to reject \( H_0 \) if F ≤ 3.89, where F has degrees of freedom (2, 12).]
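The ANOVA decomposition in (4)(b) follows mechanically from R², SST, n and k, so it can be rebuilt in a few lines. The sketch below assumes n = 15 observations and k = 2 slope coefficients, which are the values consistent with the F statistic of 14. The critical value 3.89 is copied from the solution, not computed here. Variable names are illustrative.

```python
# Inputs as used in the solution to (4)(b); n and k are assumed to be 15 and 2.
r_squared = 0.7
sst = 1000.0
n = 15
k = 2

# Sums of squares implied by R^2 = SSR / SST.
ssr = r_squared * sst
sse = sst - ssr

# Degrees of freedom and mean squares.
df_reg = k
df_err = n - k - 1
df_tot = n - 1
msr = ssr / df_reg
mse = sse / df_err

f_stat = msr / mse

# Critical value for alpha = 0.05 with (2, 12) degrees of freedom, as quoted in the solution.
f_crit = 3.89
reject_h0 = f_stat > f_crit

print(f"{'Source':<12}{'SS':>8}{'df':>5}{'MS':>8}{'F':>6}")
print(f"{'Regression':<12}{ssr:>8.0f}{df_reg:>5}{msr:>8.0f}{f_stat:>6.0f}")
print(f"{'Error':<12}{sse:>8.0f}{df_err:>5}{mse:>8.0f}")
print(f"{'Total':<12}{sst:>8.0f}{df_tot:>5}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```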

(c) A location with parking: \( \hat{y} = 50.7 + 0.97x_1 \)
A location without parking: \( \hat{y} = 7.9 + 0.42x_1 \)

To test if these relationships differ in a statistically significant way we need to test whether the coefficient on the interaction term is statistically different from zero.

\[ H_0: \beta_3 = 0 \qquad H_1: \beta_3 \neq 0 \]

\[ t = \frac{0.55}{0.15} = 3.67 \]

The rejection region with 11 degrees of freedom and \( \alpha = 0.05 \) is t < -2.201 or t > 2.201. Since 3.67 > 2.201 we reject the null hypothesis and infer the research hypothesis is true: the coefficient is statistically significant.

[EXTRA: Hence we have sufficient evidence to infer that the effect of parking depends on the local population size (or alternatively that the effect of local population size depends on parking). Also some students did two tests: a test of whether the slopes differ (above) and also a test of whether the intercepts differ. Technically these two tests should be joint but we did not learn that in our course: we only learned a joint test of all of the coefficients, which is the F test. However, students who also tested the dummy coefficient did not lose any points: in some ways they gave a more complete answer even though the test is not exactly right (it should be a special F test, which we didn't learn, and not two t tests).]

(5) We are 99% confident that the difference in health scores between subscribers and non-subscribers is between 9.7 and 4.3. However, the data are observational and not experimental, which means that we cannot infer causality. While it is true that people who subscribe to NA are substantially healthier we cannot conclude that reading NA caused this. In fact customers choose whether to subscribe to NA: it was not randomly assigned (i.e. it is an endogenous variable). Customers that are interested in a healthy lifestyle would choose to subscribe. Instead the second interval is based on experimental data. We are 99% confident that the difference in health scores between free subscribers and non-subscribers is between -0.1 and 4.7. It looks like having NA around probably does have a positive health effect although this causal effect is not huge.
The point estimate is that a subscription to NA boosts the health score by 2.3 (out of 100). Notice this language correctly implies causality. While the 99% confidence interval estimate does include zero, if we did a one-tailed test we would find that NA has a positive and statistically significant causal effect on health.
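The closing claim in (5) is that a one-tailed test is significant even though the 99% interval includes zero. It can be illustrated by working backwards from the experimental interval. This is a back-of-the-envelope sketch under two assumptions that are not stated in the solution: the interval endpoints are -0.1 and 4.7, and the interval was built with a standard normal critical value (a t critical value would give a slightly larger p-value).

```python
from statistics import NormalDist

# Assumed endpoints of the 99% interval from the experimental data.
lcl, ucl = -0.1, 4.7
confidence = 0.99

# Point estimate is the midpoint; margin of error is the half-width.
point_estimate = (lcl + ucl) / 2
margin = (ucl - lcl) / 2

# Recover the standard error, assuming a normal critical value was used.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
std_error = margin / z_crit

# One-tailed test of H0: effect = 0 against H1: effect > 0.
z = point_estimate / std_error
p_one_tailed = 1 - NormalDist().cdf(z)

print(f"point estimate = {point_estimate:.1f}, s.e. = {std_error:.3f}")
print(f"z = {z:.2f}, one-tailed p-value = {p_one_tailed:.4f}")
```

Under these assumptions the one-tailed p-value falls below 0.01 while the two-tailed p-value (twice as large) stays above it, which is consistent with the interval including zero.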