Homework 9 Sample Solution


# 1 (Ex 9.12, Ex 9.23)

Ex 9.12

(a) Let p_vitamin denote the probability of catching a cold when a person has taken vitamin C, and p_placebo the probability of catching a cold when a person has taken only a placebo.

H_0: p_vitamin = p_placebo
H_A: p_vitamin ≠ p_placebo

Note that p̂_vitamin = 17/139 ≈ 0.122 and p̂_placebo = 31/140 ≈ 0.221. Then

z = (p̂_vitamin − p̂_placebo) / sqrt( p̂_vitamin(1 − p̂_vitamin)/n_vitamin + p̂_placebo(1 − p̂_placebo)/n_placebo )
  = (0.122 − 0.221) / sqrt( (0.122)(0.878)/139 + (0.221)(0.779)/140 )
  = −2.212

The two-sided p-value is 2 × P(Z ≥ 2.212) = 2 × (0.0136) = 0.0272 < 0.05 = α. Thus we reject H_0 and conclude that vitamin C significantly changes (reduces) the incidence rate of colds (i.e., the probability of catching a cold).
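For reference, a minimal R sketch of the same two-proportion z test (the variable names are mine; the unpooled standard error matches the hand computation above):

# Two-proportion z test for the vitamin C data
x <- c(17, 31)                    # colds in the vitamin C and placebo groups
n <- c(139, 140)                  # group sizes
phat <- x / n                     # 0.122 and 0.221
se <- sqrt(sum(phat * (1 - phat) / n))
z <- (phat[1] - phat[2]) / se
p_value <- 2 * pnorm(-abs(z))
c(z = z, p_value = p_value)       # z = -2.21, p = 0.027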

(b) We will reject H_0 if χ² > χ²_{(r−1)(c−1), α}, where

χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (n_ij − e_ij)² / e_ij

Note that the chi-square test for two-way data can test both the hypothesis of independence and the hypothesis of homogeneity (p. 323). The following three ways of stating the hypotheses are all correct and equivalent:

(1) H_0: P(cold | VC) = P(cold | placebo) = P(cold) and P(no cold | VC) = P(no cold | placebo) = P(no cold)
    H_A: at least one probability is different.
(2) H_0: The chance of having a cold is homogeneous (equal) in the VC and placebo groups.
    H_A: The chance of having a cold is heterogeneous (not equal) in the VC and placebo groups.
(3) H_0: Having a cold is independent of whether a person took vitamin C or a placebo.
    H_A: Having a cold is not independent of whether a person took vitamin C or a placebo.

Observed values:

Group          Cold: Yes   Cold: No   Row total
Vitamin C             17        122         139
Placebo               31        109         140
Column total          48        231         279

Expected values:

Group          Cold: Yes   Cold: No   Row total
Vitamin C          23.90     115.04      138.94
Placebo            24.09     115.97      140.06
Column total       47.99     231.01         279

χ² = (17 − 23.90)²/23.90 + (31 − 24.09)²/24.09 + (122 − 115.04)²/115.04 + (109 − 115.97)²/115.97
   = 4.814 > 3.841 = χ²_{1, 0.05}

Thus we reject H_0 and conclude that vitamin C reduces the incidence rate of colds.
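The same test can be run in R (a sketch; chisq.test applies Yates' continuity correction to 2×2 tables by default, so correct = FALSE is needed to reproduce the hand computation):

# Pearson chi-square test on the 2x2 table, without continuity correction
tab <- matrix(c(17, 31, 122, 109), nrow = 2,
              dimnames = list(Group = c("Vitamin C", "Placebo"),
                              Cold = c("Yes", "No")))
chisq.test(tab, correct = FALSE)   # X-squared = 4.81, df = 1, p = 0.028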

Ex 9.23

(a) χ² = Σ_{i=1}^{2} (n_i − e_i)²/e_i
       = (x − np_0)²/(np_0) + ((n − x) − n(1 − p_0))²/(n(1 − p_0))
       = [(x − np_0)²(1 − p_0) + (np_0 − x)² p_0] / (np_0(1 − p_0))
       = (x − np_0)² / (np_0(1 − p_0))
       = z²

We reject H_0 if |z| > z_{α/2}, or equivalently if z² > z²_{α/2} = χ²_{1, α}. It is evident that the two tests are, indeed, equivalent.

# 2 (Ex 9.20)

(a) H_0: p_1 = 9/16, p_2 = 3/16, p_3 = 3/16, p_4 = 1/16
    H_A: not H_0

Note that e_i = np_i = 1611 p_i.

Phenotype             n_i        e_i   (n_i − e_i)²/e_i
Tall, cut-leaf        926    906.188              0.433
Dwarf, cut-leaf       293    302.063              0.272
Tall, potato-leaf     288    302.063              0.655
Dwarf, potato-leaf    104    100.688              0.109
Total                1611                 χ² = 1.469

Note that χ²_{3, 0.05} = 7.815. Thus, we fail to reject H_0: the data are consistent with the hypothesized 9:3:3:1 ratio.
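This test is a one-liner in R, since chisq.test accepts the hypothesized cell probabilities directly (a sketch):

# Chi-square goodness-of-fit test for the 9:3:3:1 Mendelian ratio
counts <- c(926, 293, 288, 104)   # tall/cut, dwarf/cut, tall/potato, dwarf/potato
chisq.test(counts, p = c(9, 3, 3, 1) / 16)   # X-squared = 1.47, df = 3, p = 0.69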

# 3 (Ex 9.22)

(a) Note that λ̂ = 0.519. Then p̂_i = e^{−0.519} (0.519)^i / i!, and e_i = n p̂_i.

Passengers     n_i     p̂_i      e_i   (n_i − e_i)²/e_i
0              678   0.595   601.662             9.686
1              227   0.309   312.262            23.281
2               56   0.080    81.032             7.733
3               28   0.014    14.019            13.944
4                8   0.002     1.819           196.998
5 or more       14   0.000     0.206
Total         1011                      χ² = 251.642

Note that the "5 or more" cell was combined with the "4" cell (giving n = 8 + 14 = 22 and e = 1.819 + 0.206 = 2.025; the combined contribution 196.998 appears in the "4" row), to satisfy the requirement that no cell may have e_i < 1 and no more than one fifth of the e_i may be less than 5. Since χ² > χ²_{3, 0.05} = 7.815, we reject H_0 and conclude that the Poisson distribution is not a plausible distribution for the number of passengers.

(b) Since p̂ = 1/(1 + 0.519) = 0.658, we have p̂_i = (1 − p̂)^{i−1} p̂ = (0.342)^{i−1}(0.658) and e_i = n p̂_i.

Occupants      n_i     p̂_i      e_i   (n_i − e_i)²/e_i
1              678   0.658   665.569             0.232
2              227   0.225   227.407             0.001
3               56   0.077    77.698             6.060
4               28   0.026    26.547             0.079
5                8   0.009     9.071             0.126
6 or more       14   0.005     4.708            18.342
Total         1011                       χ² = 24.841

Since χ² > χ²_{4, 0.05} = 9.488, we reject H_0 and conclude that the geometric distribution is not a plausible distribution for the number of occupants.
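A sketch of both fits in R, with the chi-square statistics computed by hand because the degrees of freedom must be reduced by one for the estimated parameter (chisq.test would not do this); small rounding differences from the tables above are expected:

# Poisson fit for the passenger counts (cells 0-3, plus "4 or more" combined)
n <- c(678, 227, 56, 28, 8, 14); N <- sum(n)       # N = 1011
lambda <- 0.519
p <- c(dpois(0:3, lambda), 1 - ppois(3, lambda))   # last cell is P(X >= 4)
obs <- c(n[1:4], n[5] + n[6])                      # combine the two sparse cells
e <- N * p
chisq_pois <- sum((obs - e)^2 / e)                 # about 250, vs 251.6 above
1 - pchisq(chisq_pois, df = length(obs) - 1 - 1)   # df = 3, p essentially 0

# Geometric fit for the occupant counts (cells 1-5, plus "6 or more")
phat <- 1 / (1 + lambda)                           # 0.658
p_geo <- c(dgeom(0:4, phat), 1 - pgeom(4, phat))   # dgeom counts failures before the first success
e_geo <- N * p_geo
chisq_geo <- sum((n - e_geo)^2 / e_geo)            # about 24.8
1 - pchisq(chisq_geo, df = length(n) - 1 - 1)      # df = 4, p = 5e-05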

(c) While neither distribution is plausible for these data, the geometric distribution fits much better, since its χ² value (24.8) is far smaller than that of the Poisson (251.6). Note also that the lack of fit of the geometric distribution comes primarily from the tail category ("6 or more").

# 4 (Ex 10.14, Ex 10.33)

Ex 10.14

First, note that Ȳ = (1/n) Σ_i Y_i and β̂_1 = Σ_i c_i Y_i, where c_i = (x_i − x̄)/S_xx and Σ_i c_i = 0. Then, since the Y_i are independent,

Cov(Ȳ, β̂_1) = (1/n) Σ_i Σ_j c_j Cov(Y_i, Y_j)
            = (1/n) Σ_i c_i Var(Y_i)
            = (σ²/n) Σ_i c_i
            = 0

Since Ȳ and β̂_1 are both normally distributed (as linear functions of normal random variables), a covariance of 0 implies that they are independent.

Ex 10.33

ŷ_i = β̂_0 + β̂_1 x_i = ȳ + β̂_1(x_i − x̄). Then,

Σ_i (y_i − ŷ_i)(ŷ_i − ȳ) = Σ_i (y_i − ȳ − β̂_1(x_i − x̄)) · β̂_1(x_i − x̄)
                         = β̂_1 Σ_i (y_i − ȳ)(x_i − x̄) − β̂_1² Σ_i (x_i − x̄)²
                         = β̂_1 S_xy − β̂_1² S_xx
                         = S_xy²/S_xx − (S_xy²/S_xx²) S_xx
                         = 0
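The Ex 10.33 identity is easy to confirm numerically in R (a sketch on made-up data; the identity holds for any simple linear regression fit):

# Verify sum((y_i - yhat_i)(yhat_i - ybar)) = 0 for a fitted line
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)
fit <- lm(y ~ x)
yhat <- fitted(fit)
sum((y - yhat) * (yhat - mean(y)))   # 0 up to floating-point error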

# 5 (Coding Assignment: Ex 9.32)

(a) Note that the total for each row and column is not fixed. Thus, this is an example of multinomial sampling (refer to page 322 of the textbook for the explanation).

        Pearson's Chi-squared test

data:  data
X-squared = 138.29, df = 9, p-value < 2.2e-16

With X-squared = 138.29 on 9 degrees of freedom and a p-value below any conventional α, we reject the hypothesis that hair color and eye color are independent.

# 6 (Coding Assignment: Ex 10.4, Ex 10.11)

Ex 10.4

(a) [Scatter plot of NEXT versus LAST with the fitted regression line; produced by the plot and abline calls in the code at the end.] The regression output is:

> summary(model)

Call:
lm(formula = NEXT ~ LAST)

Residuals:
     Min       1Q   Median       3Q      Max
-12.2364  -4.2364  -0.6352   5.5327   9.9316

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   31.013      4.417   7.022 1.10e-06 ***
LAST           9.790      1.300   7.531 4.06e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.129 on 19 degrees of freedom
Multiple R-squared:  0.7491,    Adjusted R-squared:  0.7359
F-statistic: 56.72 on 1 and 19 DF,  p-value: 4.059e-07

(b) ŷ = 31.013 + 9.790 × 3 = 60.383

(c) Note that the R-squared value is 0.7491 from the regression output. Alternatively, you can run an ANOVA, get SSR and SST, and compute R² = SSR/SST = 2130.60/2844.286 = 0.749.

Analysis of Variance Table

Response: NEXT
          Df  Sum Sq Mean Sq F value    Pr(>F)
LAST       1 2130.60 2130.60  56.721 4.059e-07 ***
Residuals 19  713.69   37.56
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(d) You can read σ̂ = 6.129 (the residual standard error) from the regression output. MSE can be found in the ANOVA table: 37.56. You can also compute it directly:

MSE = SSE / (residual degrees of freedom) = 713.69/19 = 37.563

Thus, σ̂ = √37.563 ≈ 6.129.

(e) In the regression output above, the p-value of the LAST coefficient corresponds to the t-test of

H_0: β_1 = 0
H_A: β_1 ≠ 0

Since the p-value 4.06e-07 < 0.05 and the coefficient is positive (9.790), we conclude that the time to the next eruption significantly increases as the duration of the last eruption increases.

(f) We can use the cor function in R to find the sample correlation, r = 0.865, and the cor.test function to find a 95% confidence interval for the correlation, [0.692, 0.944].
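A sketch consolidating programmatic cross-checks of (c), (d), and (f), with model, LAST, and NEXT as defined in the code at the end:

# Recompute R-squared, sigma-hat, and the correlation CI from fitted objects
a <- anova(model)
SSR <- a["LAST", "Sum Sq"]          # 2130.60
SSE <- a["Residuals", "Sum Sq"]     # 713.69
SSR / (SSR + SSE)                   # R-squared = 0.7491
sqrt(SSE / df.residual(model))      # sigma-hat = 6.129
cor.test(LAST, NEXT)$conf.int       # 95% CI for the correlation: [0.692, 0.944]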

Another way to find a confidence interval for the correlation is to follow the steps on p. 383. Define

ψ̂ = (1/2) log_e((1 + r)/(1 − r)) = (1/2) log_e((1 + 0.865)/(1 − 0.865)) = 1.3129

and compute the z-statistic z = √(n − 3) (ψ̂ − ψ_0). The interval for ψ is

ψ̂ − t_{20, 0.05} · 1/√(n − 3) ≤ ψ ≤ ψ̂ + t_{20, 0.05} · 1/√(n − 3)
1.3129 − 1.725 · 1/√18 ≤ ψ ≤ 1.3129 + 1.725 · 1/√18
0.906 ≤ ψ ≤ 1.719

Lastly, we transform back to the correlation scale:

(e^{2(0.906)} − 1)/(e^{2(0.906)} + 1) ≤ ρ ≤ (e^{2(1.719)} − 1)/(e^{2(1.719)} + 1)
0.719 ≤ ρ ≤ 0.938
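In R this is a short sketch (atanh and tanh implement the transformation and its inverse; 1.725 is the multiplier used above, and LAST and NEXT are as defined in the code at the end):

# Fisher z-transformation interval for the correlation
r <- cor(LAST, NEXT)                # 0.865
n <- length(LAST)                   # 21
psi <- atanh(r)                     # (1/2) * log((1 + r)/(1 - r)) = 1.313
half <- 1.725 / sqrt(n - 3)
tanh(c(psi - half, psi + half))     # about (0.719, 0.938)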

Ex 10.11

(a)      fit     lwr      upr
1   60.38332 47.2377 73.52893

Prediction interval = [47.2377, 73.52893]

(b)      fit      lwr      upr
1   60.38332 57.51009 63.25654

Confidence interval = [57.51009, 63.25654]

Note that the confidence interval is narrower than the prediction interval.

(c)      fit      lwr      upr
1   40.80318 26.33021 55.27614

Prediction interval = [26.33021, 55.27614]

This prediction interval is not reliable because it extrapolates beyond the range of the data (LAST = 1 is below the smallest observed duration, 1.7).

Codes Used

# Copy the given data table into R
data <- matrix(c(68, 20, 15, 5, 119, 84, 54, 29, 26, 17, 14, 14, 7, 94, 10, 16), ncol = 4)
rownames(data) <- c("Brown", "Blue", "Hazel", "Green")
colnames(data) <- c("Black", "Brown", "Red", "Blond")

# Perform the chi-square test. Note that you don't need to compute the expected
# value for each cell; R does all the work for you.
chisq.test(data)

# Copy the data into R
LAST <- c(2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5, 3.7, 3.8, 4.5, 4.7, 4, 4, 1.7, 1.8, 4.9, 4.2, 4.3)
NEXT <- c(50, 57, 55, 47, 53, 50, 62, 57, 72, 62, 63, 70, 85, 75, 77, 70, 43, 48, 70, 79, 72)

# Scatter plot with the fitted regression line
plot(LAST, NEXT, main = "Scatter Plot of NEXT vs LAST")
model <- lm(NEXT ~ LAST)
abline(model)
summary(model)

# Correlation between LAST and NEXT, and a confidence interval for it
cor(LAST, NEXT)
cor.test(LAST, NEXT)

# Prediction interval at LAST = 3
newdata <- data.frame(LAST = 3)
predict(model, newdata, interval = "prediction", level = 0.95)

# Confidence interval for the mean response at LAST = 3
predict(model, newdata, interval = "confidence", level = 0.95)

# Prediction interval at LAST = 1
newdata <- data.frame(LAST = 1)
predict(model, newdata, interval = "prediction", level = 0.95)