STAT 285: Fall Semester Final Examination Solutions

Similar documents
STAT 350: Summer Semester Midterm 1: Solutions

Smoking Habits. Moderate Smokers Heavy Smokers Total. Hypertension No Hypertension Total

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Bias Variance Trade-off

AMCS243/CS243/EE243 Probability and Statistics. Fall Final Exam: Sunday Dec. 8, 3:00pm- 5:50pm VERSION A

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

STAT Exam Jam Solutions. Contents

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

Unless provided with information to the contrary, assume for each question below that the Classical Linear Model assumptions hold.

Evaluating Classifiers. Lecture 2 Instructor: Max Welling

HT Introduction. P(X i = x i ) = e λ λ x i

Statistics II Lesson 1. Inference on one population. Year 2009/10

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

Simple Linear Regression

Disadvantages of using many pooled t procedures. The sampling distribution of the sample means. The variability between the sample means

Ch 2: Simple Linear Regression

This exam is closed book and closed notes. (You will have access to a copy of the Table of Common Distributions given in the back of the text.

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Lecture 10: Generalized likelihood ratio test

Probability & Statistics - FALL 2008 FINAL EXAM

Mathematical Notation Math Introduction to Applied Statistics

Evaluating Hypotheses

Statistics 135 Fall 2008 Final Exam

MAT 2379, Introduction to Biostatistics, Sample Calculator Questions 1. MAT 2379, Introduction to Biostatistics

COMP2610/COMP Information Theory

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

Problem 1 (20) Log-normal. f(x) Cauchy

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 1 SOLUTIONS Fall 2016 Instructor: Martin Farnham

Interval estimation. October 3, Basic ideas CLT and CI CI for a population mean CI for a population proportion CI for a Normal mean

CS 361 Sample Midterm 2

Review. December 4 th, Review

Math 1040 Final Exam Form A Introduction to Statistics Fall Semester 2010

Math 243 Section 3.1 Introduction to Probability Lab

Basic Probability Reference Sheet

WISE International Masters

Swarthmore Honors Exam 2012: Statistics

Stat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.

Master s Written Examination - Solution

MTMS Mathematical Statistics

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

Inferences for Regression

Probability Distributions

MS&E 226: Small Data

6.867 Machine Learning

Mathematical statistics

This does not cover everything on the final. Look at the posted practice problems for other topics.

Unit 6 - Simple linear regression

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

14.30 Introduction to Statistical Methods in Economics Spring 2009

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 2 Fall 2016 Instructor: Martin Farnham

Chapter 7: Statistical Inference (Two Samples)

Stat 231 Exam 2 Fall 2013

Regression Estimation Least Squares and Maximum Likelihood

Discrete distribution. Fitting probability models to frequency data. Hypotheses for! 2 test. ! 2 Goodness-of-fit test

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11

5.2 Fisher information and the Cramer-Rao bound

MTH302 Long Solved Questions By

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing

Stat 135 Fall 2013 FINAL EXAM December 18, 2013

Statistics 100A Homework 5 Solutions

CSE 312: Foundations of Computing II Quiz Section #10: Review Questions for Final Exam (solutions)

Null Hypothesis Significance Testing p-values, significance level, power, t-tests

The Central Limit Theorem

7 Estimation. 7.1 Population and Sample (P.91-92)

MATH c UNIVERSITY OF LEEDS Examination for the Module MATH1725 (May-June 2009) INTRODUCTION TO STATISTICS. Time allowed: 2 hours

SCHOOL OF MATHEMATICS AND STATISTICS

Discrete Structures Prelim 1 Selected problems from past exams

Chapter 12 - Lecture 2 Inferences about regression coefficient

Probability and Probability Distributions. Dr. Mohammed Alahmed

Solutionbank S1 Edexcel AS and A Level Modular Mathematics

Lecture 6 Multiple Linear Regression, cont.

Unit 6 - Introduction to linear regression

Simple and Multiple Linear Regression

Semester , Example Exam 1

Introduction to Statistical Inference

Statistics Handbook. All statistical tables were computed by the author.

Statistics 135 Fall 2007 Midterm Exam

We're in interested in Pr{three sixes when throwing a single dice 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p).

Oct Simple linear regression. Minimum mean square error prediction. Univariate. regression. Calculating intercept and slope

Design of Engineering Experiments

Ch. 7 Statistical Intervals Based on a Single Sample

Mock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual

UNIVERSITY OF TORONTO Faculty of Arts and Science

Chapter 3. Introduction to Linear Correlation and Regression Part 3

Mathematical Notation Math Introduction to Applied Statistics

Confidence Intervals. - simply, an interval for which we have a certain confidence.

6.867 Machine Learning

6 The normal distribution, the central limit theorem and random samples

Originality in the Arts and Sciences: Lecture 2: Probability and Statistics

Lesson One Hundred and Sixty-One Normal Distribution for some Resolution

Null Hypothesis Significance Testing p-values, significance level, power, t-tests Spring 2017

Lectures on Simple Linear Regression Stat 431, Summer 2012

BNAD 276 Lecture 5 Discrete Probability Distributions Exercises 1 11

STAT 450: Final Examination Version 1. Richard Lockhart 16 December 2002

STAT5044: Regression and Anova. Inyoung Kim

EXAM TMA4240 STATISTICS Thursday 20 Dec 2012 Tid: 09:00 13:00

STAT 31 Practice Midterm 2 Fall, 2005

Transcription:

Name: Student Number: STAT 285: Fall Semester 2014 Final Examination Solutions 5 December 2014 Instructor: Richard Lockhart Instructions: This is an open book test. As such you may use formulas such as those for means and variances of standard distributions from the book without deriving them. You may use notes, text, other books and a calculator; you may not use a computer. Your work will be marked for clarity of explanation. I expect you to explain what assumptions you are making and to comment if those assumptions seem unreasonable. In general you need not finish doing arithmetic in confidence interval problems; I will be satisfied if your answers contain things like 27±1.96 247.5/11, but I have to be absolutely convinced you know what arithmetic to do! In hypothesis testing problems you will have to finish the arithmetic enough to reach a real world conclusion. I want the answers written on the paper. The exam is out of 70. 1. We generally assume that when a coin is tossed it lands heads up with probability 0.5. It has been suggested, however, that if you do the experiment in a different way the chance might change. One such way is to stand the coin on edge on a hard flat surface, hold it upright with a finger and then flick the edge with a finger to send the coin spinning away. A group of 107 statistics students actually did this, 40 times each for a total of 4280 spins. (a) [5 marks] Suppose they got a total of 2376 heads. Give a 99%confidence interval for the probability that spinning produces heads. Wearegoingtoassume thatall40students hadthesameprobability pofspinning heads; it is not obvious that this is realistic. If so then the number, X, of heads has a Binomial(4280,p) distribution. The estimate is The estimated standard error is ˆp = 2376 = 0.5551402 = 0.555 4280 ˆσˆp = ˆp(1 ˆp) 4280 = 0.555 0.445 4280 = 0.007596106 = 0.00760. 1

For a 99% confidence interval the relevant critical point from the normal curve is and the interval is z 0.005 = 2.58 0.5551±2.58 0.0076 or 0.5475 to 0.5627. It is not necessary to work out all the numbers for full marks. (b) [5 marks] Is it reasonable to believe that this method produces heads with probability 1/2? This is a typical hypothesis testing question. The way it is asked the null hypothesis is H 0 : p = 0.5 and the alternative is H a : p 0.5. The relevant test statistic is T = 0.5551 0.5 = 7.25 0.0076 and the P-value is obtained from a normal curve. On the test the tables being used end at 3.49 and you can only say that P is less than 2 0.0002 = 0.0004. I can use R to check that P 4.2 10 13. In any case P is so ridiculously small that the null hypothesis is untenable. The probability of heads is more than 0.5. 2. [5 marks] It has also been suggested that if you flip a coin in the usual way and catch it in your hand it is slightly more likely than 50%to land the same way up as it started. If the chance of landing the same side up as it started were really 0.51 how many tosses would I need to make to have at least a 90% chance that a level 5% test would detect a significant difference between the probability of heads and 0.5? Thisisanapplicationofthesamplesizedeterminationproblem. Wearegivenα = 0.05, β = 0.1, p 0 = 0.5, and p = 0.51. The alternative suggested calls for a one-sided test so [ ] 2 [ ] 2 z 0.05 0.5 0.5+z0.1 0.51 0.49 1.65 0.5+1.28 0.50 n = = = 21609 0.51 0.5 0.01 is the required sample size. 3. A large population of new mothers is divided into two groups: smokers and nonsmokers. Independent samples of 40 mothers are drawn from each group in order to make comparisons. One comparison made is birth weight. The babies of the 40 smokers average 112.1 ounces with a standard deviation of 16.1 ounces while those of the 40 non-smokers average 123.1 ounces with an SD of 15.8 ounces. 2

(a) [5 marks] Is it clear that smokers have lower birth weight babies? We have two samples: X 1,...,X n with n = 40 from the population of smokers, and Y 1,...,Y m with m = 40 from the population of non-smokers. If µ 1 is the population mean birth weight for smoker s babies and µ 2 is the population mean birth weight for non-smokers babies then our null hypothesis is H 0 : µ 2 = µ 1 (or H 0 : µ 2 µ 1 ) and the alternative is H a : µ 1 < µ 2. (There may be students who try to make the case for a 2 sided test; I would dock 0.5 marks only.) The relevant test statistic is Ȳ T = X = 3.08. 16.1 2 + 15.82 40 40 The degrees of freedom, no matter how you calculate it will be close to 78; they cannot be lower than 39 and the two SDs are close together so they will be close to 78. The P value for 3.08, 1 sided, with any degrees of freedom from 40 to 120 is in the range of 0.001 to 0.002 so in any case we conclude that there is very strong evidence against the idea that smokers have babies as heavy as those of non-smokers. Clearly they have low birth weight babies. From R the P-value is between 0.00141 and 0.00187 and likely closer to the smaller number but the conclusion is not affected by this difference. (b) [5 marks] Give a 90 percent confidence interval for the difference in mean birth weights between smoking and non-smoking mothers. In the previous part you computed both a degrees of freedom and an estimated standard error. The df should be used to find a t multiplier. 3

4. The following sequence of questions concern the following model. Imagine we have 3 dice which have been carefully manufactured so that they all have exactly the same weight, θ. We begin by weighing 1 of the dice and recording Y 1 which you may assume has mean θ. The error in the measurement, namely Y 1 θ has a normal distribution with mean 0 and standard deviation σ. In order to keep this problem simple you may assume that σ = 1 and that you know this somehow. Then you weigh 2 of the dice together and record Y 2 whose mean is 2θ. Assume that the error Y 2 2θ has a normal distribution with mean 0 and standard deviation σ. In this problem you are to compare two estimators for θ. (a) [5 marks] The first estimator is based on the idea that Y 2 /2 has mean θ the same as Y 1. This estimator is the average of these two. ˆθ 1 = Y 1 +Y 2 /2 2 Find the bias, standard error and mean square error of ˆθ 1. The mean of ˆθ 1 is ( ) Y1 +Y 2 /2 E(ˆθ 1 ) = E = E(Y 1)+E(Y 2 )/2 = θ+2θ/2 2 2 2 = θ. Thus ˆθ 1 is unbiased (the bias is 0). The variance is ( ) Y1 +Y 2 /2 Var(ˆθ 1 ) = Var = Var(Y 1)+Var(Y 2 )/4 2 4 = σ2 +σ 2 /4 4 = 5σ2 16 The mean squared error is also the variance. The standard error is the square root of the variance 5σ 4. 4

(b) [5 marks] Another estimator is obtained by least squares. Derive the formula for the least squares estimate of θ; call this estimator ˆθ 2. The error sum of squares is (Y 1 θ) 2 +(Y 2 2θ) 2 To minimize this take the derivative with respect to θ and get 2(Y 1 θ) 4(Y 2 2θ) = 2(Y 1 +2Y 2 5θ). This is 0 when so Y 1 +2Y 2 5θ = 0 ˆθ 2 = Y 1 +2Y 2. 5 (c) [5 marks] Find the bias, standard error and mean squared error of ˆθ 2. The mean of ˆθ 2 is ( ) Y1 +2Y 2 E = E(Y 1)+2E(Y 2 ) = θ+4θ = θ. 5 5 5 Thus ˆθ 2 is unbiased. Its variance is ( ) Y1 +2Y 2 Var = Var(Y 1)+4Var(Y 2 ) 5 25 This is also the MSE. The standard error is σ 5. = σ2 +4σ 2 25 = σ2 5. (d) [2 marks] Based on these calculations, which is the better estimator of θ, ˆθ 1 or ˆθ 2? Both are unbiased and ˆθ 2 has a smaller variance because Thus ˆθ 2 is better. 1 5 < 5 16. 5

5. Suppose X has density f(x,θ) = { θ x 2 x > θ 0 x θ (a) [2 marks] Find the cumulative distribution function of X. Let F(x) = P(X x) be the cdf. For x < θ we have F(x) = 0. For x θ we have x θ F(x) = P(X x) = u 2du = θ x u = 1 θ x. (b) [2 marks] For any b such that 1 b find θ P(1 X θ b) and then find the values of b for which this probability is 0.025 and 0.975. This is just To make this be α we need or 1/b = 1 α or P(θ X bθ) = F(bθ) F(θ) = 1 θ bθ = 1 1 b. α = 1 1/b b = 1 1 α. Thus for 0.025 we have b = 1/0.975 and for 0.975 we need b = 1/0.025. θ 6

(c) [5 marks] Use the results of the previous problem to find a 95% confidence interval for θ. We now know that ( 1 P 0.975 X θ 1 ) = 0.975 0.025 = 0.95. 0.025 Solving the inequalities we find P(0.025X θ 0.975X) = 0.95 so that the interval [0.025X,0.975X] is a 95% confidence interval for θ. (d) [1 mark] Evaluate your interval if we observe X = 40. The interval runs from 0.025 40 = 1 to 0.975 40 = 39. 7

6. In the October 7, 2014 issue of the Canadian Medical Association Journal a randomized controlled double blind study of 378 patients studied the effect of melatonin on delirium. The treatment group had 186 patients and observed 55 cases of delirium while the control group had 192 patients and 49 cases of delirium. (a) [1 mark] If the treatment is completely ineffective what is the estimated probability of delirium in this group? The pooled estimate of the binomial probability is ˆp = 55+49 186+192 = 104 378 = 0.27513. (b) [2 marks] What is the estimated standard error for the estimate in the previous part. If the treatment is completely ineffective we have n = 378 trials and ˆp is a sample proportion so its standard error is p(1 p) 378 which we estimate by ˆp(1 ˆp) 0.275 0.725 = = 0.02297. 378 378 (c) [5 marks] Is there clear evidence of a difference in either direction in delirium rates between treatment and control? This is a test for the equality of two proportions, p 1, the probability of delirium in the treatment group and p 2 the probability of delirium in the control group. The null hypothesis is p 1 = p 2. The alternative is two sided: p 1 p 2. The test statistic is ˆp 1 ˆp 2 T = ˆp(1 ˆp) ( 1 + ) = 0.736. 1 186 192 This is nowhere near significant; there is little evidence of a difference in the probability of delirium between treatment and control. The actual P-value is 0.46. 8

7. I have regularly used the heights of 1078 father / adult son pairs gathered in Victorian England. We are interested in predicting the heights of sons from the heights of fathers supposing that the relationship is described by a straight line. Fathers average 67.69 inches in height with a standard deviation of 2.74 inches. Sons average 68.68 inches in height with a standard deviation of 2.81 inches. The average of the products (son s height times father s height) is 4652.895 square inches. Students writing in Burnaby or at the CSD got to see the following sentence which was added after the exams were printed; students in Surrey never found this out! I adjusted the marking to compensate. The Error Sum of Squares i (Y i ˆβ 0 ˆβ 1 x i ) 2 is 6390.331 square inches. (a) [2 marks] Show that xy xȳ = 1 n (x i x)(y i ȳ) The right hand side is 1 n as desired. (x i x)(y i ȳ) = 1 n = 1 n (x i y i x i ȳ y i x+ xȳ) x i y i ȳ 1 n x i x 1 n = xy ȳ x xȳ + xȳ = xy xȳ y i + 1 n n xȳ (b) [3 marks] Estimate the slope and intercept of the least squares line for predicting son s heights from father s heights. This just requires you to plug in to the various formulas. The numerator of ˆβ 1 is The denominator is Thus Then xy xȳ = 4652.895 67.69 68.68 = 3.946. x 2 x 2 = n 1 n s2 x = 1077 1078 2.742 = 7.501. ˆβ 1 = 3.946 7.501 = 0.526. ˆβ 0 = ȳ ˆβ 1 x = 68.68 0.526 67.69 = 33.08. The units of ˆβ 0 are inches while ˆβ 1 is unitless (inches per inch). 9

(c) [5 marks]give a 95% confidence interval for the true slope for the population these families are drawn from. The interval is S ˆβ 1 ±t 0.025,n 1 (n 1)s 2 x where n 1 = 1077 which is so large we must use the normal critical value: 1.96. The quantity S is the estimated standard error which I had to add to the data give to you (it can be computed from the information given if you know how but I did not teach how). Using the information given in red above we find inches. Our interval is 6390.331 S = = 2.437 1076 0.526±1.96 2.437 1077 2.75 2. 10

1a 5 5a 2 1b 5 5b 2 2 5 5c 5 3a 5 5d 1 3b 5 6a 1 4a 5 6b 2 4b 5 6c 5 4c 5 7a 2 4d 2 7b 3 7c 5 Total 70 11