UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL/MAY 2009 EXAMINATIONS ECO220Y1Y PART 1 OF 2 SOLUTIONS

PART 1 OF 2

(1) The sample median is greater than the sample mean when there is ___. (B)
(2) A random variable X is normally distributed with mean 0 and variance 5. If a random variable Y is defined as Y 0, what is P(Y < 5)? (E)
(3) For a random sample affected by sampling error but not affected by any non-sampling errors, which are definitely true statements about the sampling distribution of the sample mean? (B)
(4) An airplane is overloaded if the total weight of passengers exceeds 6,500 pounds. On average passengers weigh 75 pounds with a standard deviation of 6 pounds. If there are 36 passengers, what is the probability that the plane is overloaded (to the nearest hundredth)? (A)
(5) For the pairs of measurements $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the simple linear regression line for Y on X is Y 3 and for X on Y is, x y. What is the coefficient of correlation between X and Y? (E)
(6) What is the variance of the dependent variable (to the nearest integer)? (D)
(7) What percent of the variation in the dependent variable is explained by variation in the independent variable (to the nearest integer)? (A)
(8) A simple linear regression of Y on X with a sample size of 3 yields t -3. for the test of statistical significance (i.e. validity) of the model. Which is closest to the p-value? (C)
(9) What is the standardized test statistic (to the nearest hundredth)? (A)
(10) To support the inference that people use the stairs more when the sign is posted, the difference in the fraction using the stairs must be at least ___. (B)
(11) Which are legitimate limitations of the statistical analysis of these data? (B)
(12) If the return on investment in Canada is 9, what is the predicted return on investment overseas (to the nearest tenth)? (D)
(13) A population is Uniformly distributed between 50 and 60. For a sample size of 35, what is the probability that the sample mean is less than 56 (to the nearest tenth)? (E)

(14) What is the interpretation of the coefficient estimate for the variable APR? (C)
(15) What would the coefficient estimate for the variable FEB be if the included month variables are FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, and DEC instead of the month variables in the table? (B)
(16) What is the estimated relationship between delays and precipitation at the origin airport for a September flight if there is .5 cm of rain at the destination airport? (E)
(17) Which are legitimate criticisms of this multiple regression model? (C)
(18) Compared to Graph, Graph clearly shows a regression analysis where the coefficient of determination is ___ and the F test statistic is ___. (A)
(19) Which would result in the highest probability of a Type II error? (C)
(20) If a random sample has 00 observations, the true population mean is 60, and the significance level is 0.0, then what is the probability of a Type II error? (A)

PART 2 OF 2

Duration - 3 hours
Examination Aids: Calculator

(1) (a) The chance that sampling error explains the result is $0.5^8 \approx 0.004$, which is very close to zero. Hence we should conclude that either we have non-sampling errors (such as a selection bias, non-response bias, or a poorly designed questionnaire) or that it is not true that 50% of customers order the large bowl.

(b) The sample is large, therefore the distribution of the sample mean will be normal according to the CLT:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

$$E(X) = 0.5 \cdot 6 + 0.5 \cdot 9 = 7.50$$

$$V(X) = E(X^2) - \mu^2 = \left[0.5 \cdot 6^2 + 0.5 \cdot 9^2\right] - 7.50^2 = 2.25$$

or

$$V(X) = E\left[(X - \mu)^2\right] = 0.5(6 - 7.5)^2 + 0.5(9 - 7.5)^2 = 2.25$$

$$\bar{X} \sim N\left(7.5, \frac{2.25}{40}\right)$$

$$P(\bar{X} > 8.00) = P\left(Z > \frac{8.00 - 7.5}{1.5/\sqrt{40}}\right) = P(Z > 2.11) = 0.5 - 0.4826 = 0.0174$$

[Figure: density of X-bar, centered at the mean 7.5 with s.e. 0.237; the value 8.00 is marked on the horizontal axis.]
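The arithmetic in (1) can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the inputs (values 6 and 9 with probability 0.5 each, sample size 40) are taken as they appear to read in the solution.

```python
from math import sqrt
from statistics import NormalDist

# Inputs as read from the solution (assumed: two outcomes, 6 and 9,
# each with probability 0.5, and a sample of n = 40).
values = [6.0, 9.0]
probs = [0.5, 0.5]
n = 40

# Part (a): probability of the observed run under pure sampling error.
p_sampling_error = 0.5 ** 8

# Part (b): population mean and variance of the two-point distribution.
mu = sum(p * x for p, x in zip(probs, values))
var = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

# Standard error of the sample mean and the CLT-based tail probability.
se = sqrt(var / n)
z = (8.00 - mu) / se
p_upper = 1.0 - NormalDist().cdf(z)

print(f"P(sampling error) = {p_sampling_error:.4f}")
print(f"mu = {mu:.2f}, var = {var:.2f}, se = {se:.3f}")
print(f"z = {z:.2f}, P(X-bar > 8.00) = {p_upper:.4f}")
```

The tail probability printed here uses the unrounded z, so it can differ from the table-based 0.0174 in the fourth decimal place.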

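(2)

$$\hat{p}_B = \frac{652}{2000} = 0.326 \qquad \hat{p}_A = \frac{576}{1500} = 0.384$$

$$H_0: p_A - p_B = 0 \qquad H_1: p_A - p_B > 0$$

$$\hat{p} = \frac{652 + 576}{2000 + 1500} = 0.3509$$

$$Z = \frac{0.384 - 0.326}{\sqrt{0.3509\,(1 - 0.3509)\left(\frac{1}{2000} + \frac{1}{1500}\right)}} = 3.558$$

$$p\text{-value} = P(Z > 3.558) \approx 0$$

Because the p-value is very small, there is strong evidence to support the inference that the fraction of cars carrying two or more people is higher after the introduction of the carpool-only lanes.

(3) The statistical analysis that addresses the question is interval estimation (and not hypothesis testing). We need to make an inference about the difference between two means. The unequal variances approach is most appropriate because it looks like the revenues from large cart sales are much more variable than small cart sales.

$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(50.09^2/721 + 15.23^2/124\right)^2}{\dfrac{\left(50.09^2/721\right)^2}{721 - 1} + \dfrac{\left(15.23^2/124\right)^2}{124 - 1}} = 632$$

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (129.94 - 55.01) \pm 1.96\sqrt{\frac{50.09^2}{721} + \frac{15.23^2}{124}} = 74.93 \pm 1.96 \times 2.31$$

LCL = 70.40
UCL = 79.46

We are 95% confident that the difference in sales between large and small carts is in the interval from $70.40 to $79.46.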
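The two calculations in (2) and (3) can be sketched as small helper functions. This is an illustrative sketch, not the course's own code: the function names are hypothetical, and the inputs are assumed values (counts 576/1500 and 652/2000; means 129.94 and 55.01, standard deviations 50.09 and 15.23, sample sizes 721 and 124) chosen because they agree with the pooled proportion 0.3509, the statistic 3.558, and the interval 74.93 ± 1.96 × 2.31 shown in the solution.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x_a, n_a, x_b, n_b):
    """Pooled z statistic and upper-tail p-value for H1: p_A - p_B > 0."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, 1.0 - NormalDist().cdf(z)


def welch_interval(mean1, s1, n1, mean2, s2, n2, crit):
    """Unequal-variances interval for mu1 - mu2, plus the approximate df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = mean1 - mean2
    return diff - crit * se, diff + crit * se, df


# Problem (2): counts are assumed values consistent with 0.3509 and 3.558.
z, p_value = two_proportion_z(x_a=576, n_a=1500, x_b=652, n_b=2000)

# Problem (3): inputs are assumed values consistent with 74.93 +/- 1.96*2.31.
lcl, ucl, df = welch_interval(129.94, 50.09, 721, 55.01, 15.23, 124, crit=1.96)

print(f"z = {z:.3f}, p-value = {p_value:.5f}")
print(f"df = {df:.0f}, interval = ({lcl:.2f}, {ucl:.2f})")
```

Because the script carries the unrounded standard error, its interval endpoints can differ from the hand calculation in the second decimal place.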

(4) (a) The estimated regression equation is $y = 5.0 + 0.87x_1 + 8.4x_2$. The coefficient of 0.87 on $x_1$ measures how much higher revenues are as the local population grows by 1,000 households: each additional 1,000 households is associated with sales revenues that are $870 higher. [EXTRA: The constant term (intercept) of 5.0 has no meaning because no location will have zero households: it is just a shifter. The coefficient of 8.4 on $x_2$ measures how much higher revenues are when parking is available compared to places where it is not available: places that have parking have revenues that are $8,400 higher than places without parking.]

(b)
$H_0: \beta_1 = \beta_2 = 0$ (Model is not statistically significant)
$H_1$: not all slopes are 0 (Model is statistically significant)

Use the F-test. Calculate the F test statistic:

$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.7 / 2}{(1 - 0.7)/(15 - 2 - 1)} = 14$$

Find the rejection region: the numerator degrees of freedom is 2 ($k$) and the denominator degrees of freedom is 12 ($n - k - 1$). Our F table does not have the exact critical value for our test, but we can see that for $\alpha = 0.05$ it will be between 3.49 and 4.10. Our F-statistic is 14, which is clearly greater than either of those critical values and hence falls in the rejection region. Reject $H_0$ and conclude that the model is statistically significant. [Note: Some students may correctly note that the model is statistically significant even if $\alpha = 0.01$, which is a tougher standard.]

[EXTRA: We are given that SST = 1000.

$$R^2 = SSR/SST \;\Rightarrow\; 0.7 = SSR/1000 \;\Rightarrow\; SSR = 700$$

$$SSE = SST - SSR = 1000 - 700 = 300$$

We could complete the ANOVA table as follows:

             SS     df    MS     F
Regression   700     2    350    14
Error        300    12     25
Total       1000    14

At $\alpha = 0.05$, reject $H_0$ if $F > 3.89$ (*need a computer to get 3.89 exactly*) and fail to reject $H_0$ if $F \le 3.89$, where F has degrees of freedom (2, 12).]
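The overall F test in (4)(b) can be reproduced from the coefficient of determination alone. The sketch below is illustrative: the function names are hypothetical, the inputs (R² = 0.7, n = 15, k = 2, SST = 1000) are assumed from the figures in the solution, and the critical value 3.89 is the one quoted there rather than computed.

```python
def overall_f(r_squared, n, k):
    """F statistic for H0: all k slopes are zero, with its degrees of freedom."""
    df_num = k
    df_den = n - k - 1
    f_stat = (r_squared / df_num) / ((1.0 - r_squared) / df_den)
    return f_stat, df_num, df_den


def anova_rows(sst, r_squared, n, k):
    """Rows of the ANOVA table as (SS, df, MS); MS is None for the total row."""
    ssr = r_squared * sst
    sse = sst - ssr
    return {
        "Regression": (ssr, k, ssr / k),
        "Error": (sse, n - k - 1, sse / (n - k - 1)),
        "Total": (sst, n - 1, None),
    }


# Assumed inputs, matching the figures used in the solution.
R2, N, K, SST = 0.7, 15, 2, 1000.0
F_CRIT_05 = 3.89  # critical value quoted in the solution for F(2, 12)

f_stat, df_num, df_den = overall_f(R2, N, K)
table = anova_rows(SST, R2, N, K)
reject = f_stat > F_CRIT_05

print(f"F = {f_stat:.2f} with df ({df_num}, {df_den}); reject H0: {reject}")
for name, (ss, df, ms) in table.items():
    ms_text = "" if ms is None else f"{ms:.0f}"
    print(f"{name:<11}{ss:>6.0f}{df:>5}{ms_text:>6}")
```

The F statistic equals the ratio of the regression and error mean squares in the table (350 / 25), which is a useful cross-check on the R²-based formula.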

(c)
A location with parking: $y = 50.7 + 0.97x$
A location without parking: $y = 7.9 + 0.42x$

To test if these relationships differ in a statistically significant way, we need to test whether the coefficient on the interaction term is statistically different from zero.

$$H_0: \beta_3 = 0 \qquad H_1: \beta_3 \ne 0$$

$$t = 0.55 / 0.15 = 3.67$$

The rejection region with 11 degrees of freedom and $\alpha = 0.05$ is $t < -2.201$ or $t > 2.201$. Since 3.67 > 2.201, we reject the null hypothesis and infer the research hypothesis is true: the coefficient is statistically significant. [EXTRA: Hence we have sufficient evidence to infer that the effect of parking depends on the local population size (or alternatively that the effect of local population size depends on parking). Also, some students did two tests: test if slopes differ (above) and also test if intercepts differ. Technically these two tests should be joint, but we did not learn that in our course: we only learned a joint test of all of the coefficients, which is the F test. However, students that also tested the dummy coefficient did not lose any points: in some ways they gave a more complete answer even though the test is not exactly right (it should be a special F test, which we didn't learn, and not two t tests).]

(5) We are 99% confident that the difference in health scores between subscribers and non-subscribers is between 9.7 and 4.3. However, the data are observational and not experimental, which means that we cannot infer causality. While it is true that people who subscribe to NA are substantially healthier, we cannot conclude that reading NA caused this. In fact customers choose whether to subscribe to NA: it was not randomly assigned (i.e. it is an endogenous variable). Customers that are interested in a healthy lifestyle would choose to subscribe. Instead, the second interval is based on experimental data. We are 99% confident that the difference in health scores between free subscribers and non-subscribers is between -0.1 and 4.7. It looks like having NA around probably does have a positive health effect, although this causal effect is not huge. The point estimate is that a subscription to NA boosts the health score by 2.3 (out of 100). Notice this language correctly implies causality. While the 99% Confidence Interval estimate does include zero, if we did a one-tailed test we would find that NA has a positive and statistically significant causal effect on health.
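Returning to problem (4)(c), the test on the interaction coefficient is a plain t ratio compared against a two-sided critical value. The sketch below is illustrative only: the function names are hypothetical, the estimate 0.55 and standard error 0.15 are assumed from the ratio in the solution, and the critical value 2.201 (the 0.025 upper-tail point of a t distribution with 11 degrees of freedom) is supplied by hand rather than computed.

```python
def t_statistic(estimate, std_error, null_value=0.0):
    """t ratio for testing a single regression coefficient."""
    return (estimate - null_value) / std_error


def rejects_two_sided(t_stat, critical_value):
    """True when the statistic falls in either tail of the rejection region."""
    return abs(t_stat) > critical_value


# Assumed inputs for the interaction coefficient in (4)(c).
B3_HAT = 0.55     # estimated interaction coefficient
SE_B3 = 0.15      # its standard error
T_CRIT = 2.201    # assumed two-sided 5% critical value, 11 degrees of freedom

t_stat = t_statistic(B3_HAT, SE_B3)
reject = rejects_two_sided(t_stat, T_CRIT)

# The interaction coefficient is the gap between the two fitted slopes,
# assuming slopes of 0.42 (no parking) and 0.97 (parking).
slope_gap = 0.97 - 0.42

print(f"t = {t_stat:.2f}, reject H0: {reject}, slope gap = {slope_gap:.2f}")
```

Passing the critical value in as a parameter keeps the sketch free of third-party dependencies; a statistics library could supply it instead.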