UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL - MAY 2005 EXAMINATIONS STA 248 H1S. Duration - 3 hours. Aids Allowed: Calculator

LAST NAME:
FIRST NAME:
STUDENT NUMBER:

There are 17 pages including this page. On the last page is a list of formulae that may be useful. Tables of the normal distribution can be found on page 14, the t distribution on page 15, and the chi-square distribution on page 16.

Total marks: 85

1. (8 marks) The histograms below show the distributions of marks for the 188 students who wrote the STA 247 exam. The first histogram is constructed from the marks for question 9 (out of 10) and the second histogram is constructed from the total of the marks on all questions (out of 100).

[Figure: two histograms. "Histogram of Q9": x-axis Q9 (0 to 10), y-axis Frequency. "Histogram of total": x-axis total, y-axis Frequency.]

(a) Describe the shape of each of the two distributions of marks.

(b) If you were to pick a single statistic to summarize the distribution for question 9 and another to summarize the distribution of the total, what would they be and why?

(c) Suppose the marks for all 10 questions on the exam had distributions like that for question 9. (They don't, but pretend they do.) Ignoring the actual values on the horizontal axis and concentrating on shape only, would you be surprised if the shape of the distribution of the total was the shape shown above? Why or why not?

2. (6 marks) For $X$ a random variable with a Poisson distribution, the probability mass function is

$P(X = x) = \dfrac{e^{-\mu}\mu^{x}}{x!}$ for $x = 0, 1, 2, \ldots$

and $E(X) = \mu$ and $\mathrm{Var}(X) = \mu$. Suppose $x_1, x_2, \ldots, x_n$ are $n$ observations from a Poisson distribution.

(a) Find the maximum likelihood estimate for $\mu$.

(b) Is your answer to part (a) unbiased? Explain.

3. (8 marks) A sample of size 4 from a normal distribution with $\sigma^2 = 16$ (assumed known) is used to test $H_0: \mu = 10$ versus $H_a: \mu = 13$. Suppose that the test statistic used is the sample mean, $\bar{X}$, and that we will reject $H_0$ in favour of $H_a$ if the observed value of $\bar{X}$ is greater than 12.

(a) If $H_0$ is true, what is the distribution of $\bar{X}$?

(b) On the diagram below, shade the region whose area is $\alpha$.

[Figure: blank axes for sketching; x-axis x (5 to 15), y-axis density (0.00 to 0.20).]

(c) On the diagram below, shade the region whose area is the power of the test $H_0: \mu = 10$ versus $H_a: \mu = 13$.

[Figure: blank axes for sketching; x-axis x (5 to 15), y-axis density (0.00 to 0.20).]

(d) Give one way to increase the power and describe how it will affect your sketch in (c).

4. (5 marks) Identify an appropriate parametric test for each of the following situations. You can assume that the assumptions of the test procedure are satisfied in each case. Your choices for this question are:

- 1 sample t-test
- 2 independent samples t-test
- paired t-test
- 1-way analysis of variance
- 2-way analysis of variance
- chi-square test

(a) A recent lawsuit against the Ford automobile manufacturer suggested that tire failure was the cause of fatal accidents in their sport utility vehicles (SUVs) more often than in SUVs of other manufacturers. The cause of fatal accidents involving SUVs (in particular, whether they were tire related or not) and the manufacturers of the SUVs (in particular, Ford or another) were recorded and the data are the counts in each category. We'd like to determine whether there is a relationship between cause of accident and manufacturer.

(b) A doctor is interested in assessing whether or not there is a difference in blood pressure levels for populations of young women using birth control pills and young women not using birth control pills. A random sample of 50 young women in each population is collected and their blood pressures are measured.

(c) A random sample of workers is taken from each of 3 factories and the number of overtime hours worked for each worker is recorded. The purpose of the study is to examine the relationship between factory and hours of overtime worked.

(d) A chemist is evaluating a new method for determining the percentage content of an element in a sample. She obtains a specimen of known content and makes 10 measurements of the percentage content of the element. She wants to compare her measurements to the known content.

(e) Researchers collected intelligence test scores on twins, one of whom was raised by the natural parents and one of whom was raised by foster parents. They are interested in knowing whether there is an advantage in resulting intelligence scores for children raised by their natural parents.

5. (24 marks) Fourteen volunteer males with high blood pressure were randomly assigned to one of two diets for four weeks: a fish oil diet and a regular oil diet. The data collected are the reductions in diastolic blood pressure from the beginning of the study to the end. Here is some R output giving some summary statistics and side-by-side boxplots for the change in blood pressure.

    mean(pressuredrop[dietoil=="fish"])
    [1] 6.571429
    mean(pressuredrop[dietoil=="regular"])
    [1] -1.142857
    sqrt(var(pressuredrop[dietoil=="fish"]))
    [1] 5.8554
    sqrt(var(pressuredrop[dietoil=="regular"]))
    [1] 3.184785

[Figure: side-by-side boxplots of the change in blood pressure for the FISH and REGULAR groups.]

(a) Describe and compare the distributions of blood pressure change for each of the two treatment groups.

(b) Estimate the median of each group. Compare its value with the mean of each group. How is this comparison related to your answer to part (a)?

(c) Here is some more output from R for question 5.

            Two Sample t-test

    data:  pressuredrop by dietoil
    t = 3.0621, df = 12, p-value = 0.009861
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
      2.225174 13.203398
    sample estimates:
       mean in group FISH mean in group REGULAR
                 6.571429             -1.142857

The following questions relate to the output above.

i. Give the null and alternative hypotheses being tested.

ii. Explain how the given confidence interval and p-value give the same conclusion.

iii. What assumptions are being made in the testing procedure?

(d) Since the measurements are the reductions in blood pressure for each man, it is of interest to know whether the mean reduction is zero for each group. For the regular oil diet group, carry out a test to determine the evidence that the mean reduction for this group is different from zero.
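As a consistency check on the two-sample t-test output in part (c), the reported statistic and interval can be reproduced from the group means and standard deviations printed earlier in this question. This is only a sketch under stated assumptions: that the output uses a pooled variance, and that $n_1 = n_2 = 7$ (fourteen men, with df = 12). The critical value $t_{12;0.025} \approx 2.179$ comes from standard t tables and is not part of the output.

```latex
\begin{align*}
s_p^2 &= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
       = \frac{6(5.8554)^2 + 6(3.184785)^2}{12} \approx 22.214,
       \qquad s_p \approx 4.713 \\
t_{obs} &= \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}
       = \frac{6.571429 - (-1.142857)}{4.713\sqrt{2/7}}
       \approx \frac{7.714}{2.519} \approx 3.062 \\
(\bar{y}_1 - \bar{y}_2) \pm t_{12;0.025}\, s_p\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}
      &\approx 7.714 \pm 2.179(2.519) \approx (2.225,\ 13.203)
\end{align*}
```

Both values agree with the printed t = 3.0621 and the interval (2.225174, 13.203398) up to rounding.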

(e) Here is some R code related to the reductions in blood pressure for the men in the regular oil diet group. Two different procedures are being carried out.

    # PROCEDURE 1
    bootsamples <- matrix(sample(pressuredrop[dietoil=="regular"], 7*1000,
                                 replace=TRUE), nrow=1000)
    bootmeans <- apply(bootsamples, 1, mean)
    diff <- bootmeans - mean(bootmeans)
    diff <- sort(diff)
    llimit <- mean(pressuredrop[dietoil=="regular"]) - diff[975]
    ulimit <- mean(pressuredrop[dietoil=="regular"]) - diff[25]
    llimit
    [1] -3.240857
    ulimit
    [1] 1.044857

    # PROCEDURE 2
    y <- pressuredrop[dietoil=="regular"] - mean(pressuredrop[dietoil=="regular"])
    bootsamples <- matrix(sample(y, 7*1000, replace=TRUE), nrow=1000)
    bootmeans <- apply(bootsamples, 1, mean)
    mean(pressuredrop[dietoil=="regular"])
    [1] -1.142857
    (sum(bootmeans > 1.142857) + sum(bootmeans < -1.142857))/1000
    [1] 0.341

Indicate clearly what the procedures are and what the results are.

(f) Explain why the bootstrap samples are drawn from different data for the two procedures in part (e).

6. (17 marks) Suppose that three database servers compete for our business. Each purports to have the smallest mean response time, averaged over a query mix particular to our activity. We collect a number of response times (variable name: times) from each server (recorded in variable server as 1, 2 or 3). The following analysis was carried out using R. Questions begin on the next page.

    mean(times[server==1])
    [1] 624.199
    mean(times[server==2])
    [1] 238.123
    mean(times[server==3])
    [1] 348.44
    sqrt(var(times[server==1]))
    [1] 155.5974
    sqrt(var(times[server==2]))
    [1] 194.0457
    sqrt(var(times[server==3]))
    [1] 147.928

    db.aov <- aov(times ~ server)
    summary(db.aov)
                Df Sum Sq Mean Sq F value    Pr(>F)
    server       2 790892  395446  14.166 6.213e-05 ***
    Residuals   27 753723   27916
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    db.aov$coef
    (Intercept)     server2     server3
        624.199    -386.076    -275.759

    qqnorm(db.aov$resid)

    TukeyHSD(db.aov)
      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = times ~ server)

    $server
            diff        lwr        upr
    2-1 -386.076 -571.33898 -200.81302
    3-1 -275.759 -461.02198  -90.49602
    3-2  110.317  -74.94598  295.57998
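For readers checking the transcription of the ANOVA table, its entries are internally consistent: each mean square is the sum of squares divided by its degrees of freedom, and the F value is the ratio of the two mean squares. The arithmetic below uses only numbers printed in the `summary(db.aov)` output. The symbols $MS_{server}$ and $MS_{Res}$ are labels introduced here for the two Mean Sq entries and do not appear in the exam.

```latex
\begin{align*}
MS_{server} &= \frac{790892}{2} = 395446 \\
MS_{Res} &= \frac{753723}{27} \approx 27916 \\
F &= \frac{MS_{server}}{MS_{Res}} = \frac{395446}{27916} \approx 14.166
\end{align*}
```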

Normal Q-Q plot of the residuals:

[Figure: Normal Q-Q Plot; x-axis Theoretical Quantiles, y-axis Sample Quantiles.]

(a) What are the null and alternative hypotheses being tested by the F test?

(b) How many observations were there?

(c) What is $SS_{Tot}$?

(d) What is the estimate of the error variance?

(e) What assumptions are necessary to justify the F test?

(More questions for #6.)

(f) What assumptions are assessed by the Normal Q-Q plot? What does the given plot (on the previous page) suggest?

(g) Show how the mean response times for each server can be calculated from db.aov$coef.

(h) Suppose the residual for the 4th response time on server 2 is negative. What does this tell you about that response time as it relates to the other observations from that database server?

(i) Tukey's procedure was carried out. What is its purpose and what conclusions can be drawn from it?

7. (9 marks) In a double-blind study, human subjects were randomly assigned to take either a placebo or vitamin C tablet daily during the winter. The purpose of the experiment was to determine whether or not taking vitamin C helps protect people from colds. The following data were collected:

                 Cold   No cold   Total
    Placebo        62        26      88
    Vitamin C     157        75     232
    Total         219       101     320

(a) Is this an experiment or an observational study? Explain how you know.

(b) The study is described as double-blind. What does this mean and why is it a good feature of a study?

(c) Conduct an appropriate test to determine whether there is a significant difference in catching a cold between subjects who took vitamin C and those who took the placebo.

8. (8 marks) The following statements are false. Correct them. (When there is more than one sentence in the parts below, the correction should be made to the last sentence.) Trivial corrections (e.g. simply inserting the word "not") will receive no credit.

(a) If a sample size is large, then the shape of a histogram of the sample will be approximately normal, even if the population distribution is not normal.

(b) A 95% confidence interval for the mean weight of adult males is calculated from a random sample of 120 males and found to be (70, 100) kg. Thus 95% of adult males weigh between 70 and 100 kg.

(c) A type I error occurs when the test statistic falls in the rejection region of the test.

(d) An analysis of variance is carried out to test whether the means of two groups are equal. Of course, this analysis could also have been carried out with an appropriate t-test. The test statistic for the analysis of variance F-test is the same as the test statistic for this t-test.


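Some Assorted Formulae

If $X \sim \mathrm{Bin}(n, p)$, $E(X) = np$ and $\mathrm{Var}(X) = np(1-p)$.

Some confidence intervals:

$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

$\bar{x} \pm t_{n-1;\alpha/2}\,\dfrac{s}{\sqrt{n}}$

$\hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

$\left(\dfrac{(n-1)s^2}{\chi^2_{n-1;\alpha/2}},\ \dfrac{(n-1)s^2}{\chi^2_{n-1;1-\alpha/2}}\right)$

$(\bar{y}_1 - \bar{y}_2) \pm t_{(df;\,\alpha/2)}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

$(\bar{y}_1 - \bar{y}_2) \pm t_{(n_1+n_2-2;\,\alpha/2)}\; s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

Some test statistics:

$z_{obs} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

$t_{obs} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

$z_{obs} = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

$t_{obs} = \dfrac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

$t_{obs} = \dfrac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

$\chi^2_{obs} = \sum_{i=1}^{r}\sum_{j=1}^{c}\dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$

Some analysis of variance formulae:

$SS_{Tot} = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{\cdot\cdot})^2$

$SS_{Tr} = \sum_{i=1}^{a} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$

$SS_{E} = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\cdot})^2$

$(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) \pm q_{(a,\,df_E,\,\alpha)}\,\dfrac{s_p}{\sqrt{n}}$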