Homework 9 Sample Solution

Maurice Hunt
5 years ago

# 1 (Ex 9.12, Ex 9.23)

Ex 9.12

(a) Let $p_{\text{vitamin}}$ denote the probability of catching a cold for a person who took vitamin C, and $p_{\text{placebo}}$ the probability of catching a cold for a person who took only the placebo.

$$H_0: p_{\text{vitamin}} = p_{\text{placebo}} \qquad H_A: p_{\text{vitamin}} \neq p_{\text{placebo}}$$

Note that $\hat p_{\text{vitamin}}$ = […] and $\hat p_{\text{placebo}}$ = […].

$$z = \frac{\hat p_{\text{vitamin}} - \hat p_{\text{placebo}}}{\sqrt{\dfrac{\hat p_{\text{vitamin}}(1 - \hat p_{\text{vitamin}})}{n_{\text{vitamin}}} + \dfrac{\hat p_{\text{placebo}}(1 - \hat p_{\text{placebo}})}{n_{\text{placebo}}}}}$$

so that z = […]. The P-value is 2 × 0.0136 = 0.0272 < 0.05 = α. Thus, we reject $H_0$ and conclude that vitamin C significantly changes (reduces) the incidence rate of colds (that is, the probability of catching a cold).

We will reject $H_0$ if $\chi^2 > \chi^2_{(r-1)(c-1),\alpha}$, where

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$$

Note that the chi-square test for two-way data can test both the hypothesis of independence and the hypothesis of homogeneity (p. 323).
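The z statistic and two-sided P-value in part (a) can be sketched in R as follows. This is only an illustration: the function name `two_prop_z` and the counts passed to it are hypothetical and are not the Ex 9.12 data.

```r
# Unpooled two-sample z-test for proportions (illustrative sketch).
# x1, x2: number of subjects with a cold; n1, n2: group sizes.
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  z <- (p1 - p2) / se
  # Two-sided P-value: twice the tail area beyond |z|
  list(z = z, p_value = 2 * pnorm(-abs(z)))
}

# Hypothetical counts, NOT the data from Ex 9.12
res <- two_prop_z(x1 = 20, n1 = 100, x2 = 35, n2 = 100)
print(res)
```

A negative z here means the first group has the lower sample proportion, which is the direction behind the "reduces" wording in the conclusion.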

The following three ways of stating the hypotheses are all correct and equivalent.

(1) $H_0$: P(cold | VC) = P(cold | placebo) = P(cold), P(no cold | VC) = P(no cold | placebo) = P(no cold)
    $H_A$: at least one is different.

(2) $H_0$: The chance of catching a cold is homogeneous (equal) in the VC and placebo groups.
    $H_A$: The chance of catching a cold is heterogeneous (NOT equal) in the VC and placebo groups.

(3) $H_0$: Catching a cold is independent of whether a person took vitamin C or placebo.
    $H_A$: Catching a cold is NOT independent of whether a person took vitamin C or placebo.

Observed Values

    Group        Cold: Yes   Cold: No   Column Total
    Vitamin C    […]         […]        […]
    Placebo      […]         […]        […]
    Row Total    […]         […]        […]

Expected Values

    Group        Cold: Yes   Cold: No   Column Total
    Vitamin C    […]         […]        […]
    Placebo      […]         […]        […]
    Row Total    […]         […]        […]

$\chi^2 = \sum_{i}\sum_{j} (n_{ij} - e_{ij})^2 / e_{ij}$ = […] > […] = $\chi^2_{1,\alpha}$

Thus, we reject $H_0$ and conclude that vitamin C reduces the incidence of colds (in rates).
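The expected counts and the chi-square statistic for a 2 × 2 table can be sketched in R as below. The counts are hypothetical placeholders, not the Ex 9.12 data; `correct = FALSE` is used so that `chisq.test` returns the uncorrected Pearson statistic defined above.

```r
# Hypothetical 2x2 counts, NOT the data from Ex 9.12
obs <- matrix(c(20, 35, 80, 65), nrow = 2,
              dimnames = list(Group = c("Vitamin C", "Placebo"),
                              Cold = c("Yes", "No")))

# Expected counts: e_ij = (row total * column total) / n
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic and the critical value with (r - 1)(c - 1) = 1 df
chi_sq <- sum((obs - expected)^2 / expected)
crit <- qchisq(0.95, df = (nrow(obs) - 1) * (ncol(obs) - 1))

# The same statistic from chisq.test without the continuity correction
fit <- chisq.test(obs, correct = FALSE)
print(c(by_hand = chi_sq, chisq_test = unname(fit$statistic), critical = crit))
```

By default `chisq.test` applies Yates' continuity correction to 2 × 2 tables, so the default call would not reproduce the hand computation exactly.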

Ex 9.23

(a)

$$\chi^2 = \sum_{i=1}^{2} \frac{(n_i - e_i)^2}{e_i} = \frac{(x - np_0)^2}{np_0} + \frac{(n - x - n(1 - p_0))^2}{n(1 - p_0)} = \frac{(x - np_0)^2 (1 - p_0) + (np_0 - x)^2 p_0}{np_0 (1 - p_0)} = \frac{(x - np_0)^2}{np_0 (1 - p_0)} = z^2$$

We reject $H_0$ if $|z| > z_{\alpha/2}$, or equivalently if $z^2 > z^2_{\alpha/2} = \chi^2_{1,\alpha}$. It is evident that the two tests are, indeed, equivalent.

# 2 (Ex 9.20)

(a) $H_0: p_1 = \frac{9}{16},\ p_2 = \frac{3}{16},\ p_3 = \frac{3}{16},\ p_4 = \frac{1}{16}$
    $H_A$: not $H_0$

Note that $e_i = np_i = 1611\,p_i$.

    Phenotype            n_i     e_i     (n_i - e_i)^2 / e_i
    Tall, cut-leaf       […]     […]     […]
    Dwarf, cut-leaf      […]     […]     […]
    Tall, potato-leaf    […]     […]     […]
    Dwarf, potato-leaf   […]     […]     […]
    Total                1611

$\chi^2 = \sum_i (n_i - e_i)^2 / e_i$ = […]

Note that $\chi^2 < \chi^2_{3,0.05} = 7.815$. Thus, we fail to reject $H_0$.
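Both results on this page can be checked numerically with a short R sketch. The values of `x`, `n`, and `p0` in the first part are hypothetical; the second part uses only the null proportions and the total of 1611 given above, since the observed phenotype counts are not reproduced here.

```r
# Ex 9.23: one-sample z-test versus the two-cell chi-square test.
# x, n and p0 are hypothetical values chosen for illustration.
x <- 43; n <- 100; p0 <- 0.5
z <- (x - n * p0) / sqrt(n * p0 * (1 - p0))
gof <- chisq.test(c(x, n - x), p = c(p0, 1 - p0))
z_sq_equals_chi_sq <- isTRUE(all.equal(z^2, as.numeric(gof$statistic)))
print(z_sq_equals_chi_sq)

# The rejection thresholds agree as well: z_{alpha/2}^2 = chi^2_{1, alpha}
print(c(qnorm(0.975)^2, qchisq(0.95, df = 1)))

# Ex 9.20: expected counts under the 9:3:3:1 hypothesis with n = 1611
p_null <- c(9, 3, 3, 1) / 16
e <- 1611 * p_null
crit <- qchisq(0.95, df = length(p_null) - 1)
print(e)
print(crit)
```

With the observed counts in a vector `n_i`, the statistic would be `sum((n_i - e)^2 / e)`, or equivalently `chisq.test(n_i, p = p_null)$statistic`.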

# 3 (Ex 9.22)

(a) Note that $\hat\lambda = 0.519$. Then

$$p_i = \frac{e^{-0.519}(0.519)^i}{i!}$$

    Passengers         n_i     p_i     e_i     (n_i - e_i)^2 / e_i
    […]                […]     […]     […]     […]
    Greater than […]   […]     […]     […]     […]
    Total              1011

$\chi^2 = \sum_i (n_i - e_i)^2 / e_i$ = […]

Note that the cell "Greater than 5" was combined with cell 4 to satisfy the requirement that no cell has $e_i < 1$ and that no more than 1/5 of the $e_i$ are below 5. Since $\chi^2 > \chi^2_{3,0.05} = 7.815$, we reject $H_0$ and conclude that the Poisson distribution is not a plausible distribution for the number of passengers.

(b) Since $\hat p$ = […] = 0.658, we have $p_i = (1 - \hat p)^{i-1}\hat p = (0.342)^{i-1}(0.658)$ and $e_i = np_i$.

    Occupants          n_i     p_i     e_i     (n_i - e_i)^2 / e_i
    […]                […]     […]     […]     […]
    Greater than […]   […]     […]     […]     […]
    Total              1011

$\chi^2 = \sum_i (n_i - e_i)^2 / e_i$ = […]

Since $\chi^2 > \chi^2_{4,0.05} = 9.488$, we reject $H_0$ and conclude that the geometric distribution is not a plausible distribution for the number of occupants.
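The fitted cell probabilities and expected counts can be sketched in R as follows. The estimates 0.519 and 0.658 and the total 1011 come from the solution above; the cell boundaries are an assumption inferred from the degrees of freedom (3 df with one estimated parameter implies five Poisson cells, 4 df implies six geometric cells), and the observed counts are not reproduced.

```r
n <- 1011

# Poisson fit with lambda-hat = 0.519.
# Assumed cells: 0, 1, 2, 3 and "4 or more".
lambda_hat <- 0.519
p_pois <- c(dpois(0:3, lambda_hat),
            ppois(3, lambda_hat, lower.tail = FALSE))
e_pois <- n * p_pois

# Geometric fit with p-hat = 0.658 on the values 1, 2, 3, ...
# Assumed cells: 1, 2, 3, 4, 5 and "6 or more".
# dgeom() counts failures before the first success, hence the shift by 1.
p_hat <- 0.658
p_geom <- c(dgeom((1:5) - 1, p_hat),
            pgeom(4, p_hat, lower.tail = FALSE))
e_geom <- n * p_geom

# Critical values: df = (number of cells) - 1 - (estimated parameters)
crit_pois <- qchisq(0.95, df = length(p_pois) - 1 - 1)
crit_geom <- qchisq(0.95, df = length(p_geom) - 1 - 1)

print(round(e_pois, 2))
print(round(e_geom, 2))
print(c(crit_pois, crit_geom))
```

Inspecting `e_pois` and `e_geom` is also a quick way to check the rule about small expected counts before deciding which tail cells to merge.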

(c) While neither is a plausible distribution for the data, the geometric distribution seemed to fit much better, since the $\chi^2$ value is much smaller. Also note that the lack of fit of the geometric distribution comes primarily from the tail category (Greater than 6).

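# 4 (Ex 10.14, Ex 10.33)

Ex 10.14

First of all, note that $\bar Y = \frac{1}{n}\sum_i Y_i$ and $\hat\beta_1 = \sum_i c_i Y_i$, where $c_i = \frac{x_i - \bar x}{S_{xx}}$ and $\sum_i c_i = 0$. Then

$$\operatorname{Cov}(\bar Y, \hat\beta_1) = \frac{1}{n}\sum_i \sum_j c_i \operatorname{Cov}(Y_i, Y_j) = \frac{1}{n}\sum_i c_i \operatorname{Var}(Y_i) = \frac{\sigma^2}{n}\sum_i c_i = 0$$

The second equality holds because the $Y_i$ are independent, so $\operatorname{Cov}(Y_i, Y_j) = 0$ for $i \neq j$. Since $\bar Y$ and $\hat\beta_1$ are jointly normally distributed (as linear functions of the same independent normal random variables), a covariance of 0 implies that they are independent.

Ex 10.33

$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i = \bar y + \hat\beta_1 (x_i - \bar x)$$

Then,

$$\sum_i (y_i - \hat y_i)(\hat y_i - \bar y) = \sum_i \bigl(y_i - \bar y - \hat\beta_1 (x_i - \bar x)\bigr)\bigl(\bar y + \hat\beta_1 (x_i - \bar x) - \bar y\bigr)$$
$$= \sum_i \bigl(y_i - \bar y - \hat\beta_1 (x_i - \bar x)\bigr)\hat\beta_1 (x_i - \bar x)$$
$$= \hat\beta_1 \sum_i (y_i - \bar y)(x_i - \bar x) - \hat\beta_1^2 \sum_i (x_i - \bar x)^2$$
$$= \hat\beta_1 S_{xy} - \hat\beta_1^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}} - \frac{S_{xy}^2}{S_{xx}^2} S_{xx} = 0$$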
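Both identities can be checked numerically. The sketch below uses the LAST and NEXT eruption data listed in the Codes Used section of this solution; the variable names `c_i`, `cross_term`, and `slope_from_c` are introduced here only for illustration.

```r
# Eruption data from the Codes Used section of this solution
LAST <- c(2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5, 3.7,
          3.8, 4.5, 4.7, 4, 4, 1.7, 1.8, 4.9, 4.2, 4.3)
NEXT <- c(50, 57, 55, 47, 53, 50, 62, 57, 72, 62, 63,
          70, 85, 75, 77, 70, 43, 48, 70, 79, 72)

fit <- lm(NEXT ~ LAST)
y_hat <- fitted(fit)

# Ex 10.14: the weights c_i sum to zero and reproduce the slope estimate
c_i <- (LAST - mean(LAST)) / sum((LAST - mean(LAST))^2)
sum_c <- sum(c_i)
slope_from_c <- sum(c_i * NEXT)

# Ex 10.33: the cross-product term vanishes (up to rounding error)
cross_term <- sum((NEXT - y_hat) * (y_hat - mean(NEXT)))

print(c(sum_c = sum_c, cross_term = cross_term))
print(c(slope_from_c = slope_from_c, slope_lm = unname(coef(fit)[2])))
```

The vanishing cross term is what makes the decomposition SST = SSR + SSE hold for a least-squares fit with an intercept.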

# 5 (Coding Assignment: Ex 9.32)

(a) Note that the total for each row or column is not fixed. Thus, this is an example of multinomial sampling (refer to page 322 of the textbook for an explanation).

    Pearson's Chi-squared test

    data:  data
    X-squared = […], df = 9, p-value < 2.2e-16

# 6 (Coding Assignment: Ex 10.4, Ex 10.11)

Ex 10.4

(a)

    > summary(model)

    Call:
    lm(formula = NEXT ~ LAST)

    Residuals:
        Min      1Q  Median      3Q     Max
        […]     […]     […]     […]     […]

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)     […]        […]     […]  […]e-06 ***
    LAST            […]        […]     […]  4.06e-07 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: […] on 19 degrees of freedom
    Multiple R-squared: […], Adjusted R-squared: […]
    F-statistic: […] on 1 and 19 DF,  p-value: 4.059e-07

$\hat y$ = […] = […]

(c) Note that the R-squared value is […] from the regression output. Alternatively, you can run an ANOVA test to get SSR and SST and compute $R^2 = \frac{SSR}{SST}$ = […].

    Analysis of Variance Table

    Response: NEXT
              Df Sum Sq Mean Sq F value    Pr(>F)
    LAST       1    […]     […]     […] 4.059e-07 ***
    Residuals 19    […]     […]
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(d) You can find $\hat\sigma$ = […] from the regression output. The MSE can be found in the ANOVA table, and its value is […]. You can also compute it directly:

$$MSE = \frac{SSE}{\text{residual degrees of freedom}} = \frac{SSE}{19}, \qquad \hat\sigma = \sqrt{MSE}$$

which gives MSE = […] and $\hat\sigma$ = […].

(e) In the previous regression output, the p-value of the slope coefficient corresponds to the t-test of

$$H_0: \beta_1 = 0 \qquad H_A: \beta_1 \neq 0$$

Since the p-value 4.06e-07 < 0.05 and the coefficient is positive with value 9.79, we conclude that the time to the next eruption increases significantly as the duration of the last eruption increases.

(f) We can use the cor function in R to find the sample correlation r = 0.865, and the cor.test function to find the confidence interval for the correlation, [0.692, 0.944]. Another way to find a confidence interval for the correlation is to follow the steps on p. 383. Define
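The quantities in parts (c) and (d) can be recovered from the ANOVA table in R, as in the sketch below. It uses the LAST and NEXT data from the Codes Used section; names such as `ssr`, `sse`, and `sigma_hat` are chosen here for illustration.

```r
# Eruption data from the Codes Used section of this solution
LAST <- c(2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5, 3.7,
          3.8, 4.5, 4.7, 4, 4, 1.7, 1.8, 4.9, 4.2, 4.3)
NEXT <- c(50, 57, 55, 47, 53, 50, 62, 57, 72, 62, 63,
          70, 85, 75, 77, 70, 43, 48, 70, 79, 72)

fit <- lm(NEXT ~ LAST)
tab <- anova(fit)

# First row of the table is the regression (LAST), second is the residuals
ssr <- tab[["Sum Sq"]][1]
sse <- tab[["Sum Sq"]][2]
sst <- ssr + sse

r_squared <- ssr / sst          # (c) R^2 = SSR / SST
mse <- sse / tab[["Df"]][2]     # (d) MSE = SSE / residual df
sigma_hat <- sqrt(mse)          # (d) sigma-hat = sqrt(MSE)

print(c(r_squared = r_squared, mse = mse, sigma_hat = sigma_hat))
# These agree with summary(fit)$r.squared and summary(fit)$sigma
```

Reading the sums of squares from `anova()` rather than retyping them avoids rounding the intermediate values.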

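$$\hat\psi = \frac{1}{2}\log_e\left(\frac{1 + r}{1 - r}\right)$$

which gives $\hat\psi$ = […]. Compute the z-statistic

$$z = \sqrt{n - 3}\,(\hat\psi - \psi_0)$$

The interval for $\psi$ is

$$\hat\psi - \frac{t_{20,[\ldots]}}{\sqrt{n - 3}} \le \psi \le \hat\psi + \frac{t_{20,[\ldots]}}{\sqrt{n - 3}}$$

that is, […] ≤ ψ ≤ […]. Lastly, we transform back to the correlation scale, where l and u are the lower and upper limits for ψ:

$$\frac{e^{2l} - 1}{e^{2l} + 1} \le \rho \le \frac{e^{2u} - 1}{e^{2u} + 1}$$

that is, […] ≤ ρ ≤ […].

Ex 10.11

(a)

    fit    lwr    upr
    […]    […]    […]

Prediction Interval = [[…], […]]

    fit    lwr    upr
    […]    […]    […]

Confidence Interval = [[…], […]]

Note that the confidence interval is narrower than the prediction interval.

(c)

    fit    lwr    upr
    […]    […]    […]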
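The Fisher transformation steps can be sketched in R as follows, using the LAST and NEXT data from the Codes Used section. One assumption to flag: this sketch uses the standard normal quantile `qnorm(0.975)`, which is what `cor.test` uses internally, rather than the t quantile written above; a t quantile would give a slightly wider interval.

```r
# Eruption data from the Codes Used section of this solution
LAST <- c(2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5, 3.7,
          3.8, 4.5, 4.7, 4, 4, 1.7, 1.8, 4.9, 4.2, 4.3)
NEXT <- c(50, 57, 55, 47, 53, 50, 62, 57, 72, 62, 63,
          70, 85, 75, 77, 70, 43, 48, 70, 79, 72)

r <- cor(LAST, NEXT)
n <- length(LAST)

# Fisher transformation: atanh(r) = 0.5 * log((1 + r) / (1 - r))
psi_hat <- atanh(r)

# Interval on the psi scale (normal quantile; see the note above)
half_width <- qnorm(0.975) / sqrt(n - 3)
psi_limits <- psi_hat + c(-1, 1) * half_width

# Back-transform: tanh(l) = (exp(2l) - 1) / (exp(2l) + 1)
ci_manual <- tanh(psi_limits)

# Interval reported by cor.test, for comparison
ci_cor_test <- as.numeric(cor.test(LAST, NEXT)$conf.int)

print(rbind(manual = ci_manual, cor_test = ci_cor_test))
```

Using `atanh` and `tanh` avoids typing the logarithm and exponential forms by hand while computing exactly the same quantities.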

Prediction Interval = [[…], […]]

This prediction interval is not reliable because it extrapolates beyond the domain of the data (LAST = 1 lies below the smallest observed value, 1.7).

Codes Used

    # Ex 9.32: copy the given data table into R
    data <- matrix(c(68, 20, 15, 5, 119, 84, 54, 29, 26, 17, 14, 14, 7, 94, 10, 16), ncol = 4)
    rownames(data) <- c("Brown", "Blue", "Hazel", "Green")
    colnames(data) <- c("Black", "Brown", "Red", "Blond")

    # Perform the chi-square test
    chisq.test(data)
    # Note that you don't need to compute the expected value for each cell.
    # R will automatically do all the work for you.

    # Ex 10.4: copy the data into R
    LAST <- c(2, 1.8, 3.7, 2.2, 2.1, 2.4, 2.6, 2.8, 3.3, 3.5, 3.7, 3.8, 4.5, 4.7, 4, 4, 1.7, 1.8, 4.9, 4.2, 4.3)
    NEXT <- c(50, 57, 55, 47, 53, 50, 62, 57, 72, 62, 63, 70, 85, 75, 77, 70, 43, 48, 70, 79, 72)
    plot(LAST, NEXT, main = "Scatter Plot of NEXT vs LAST")
    model <- lm(NEXT ~ LAST)
    abline(model)
    summary(model)

    # Correlation between LAST and NEXT
    cor(LAST, NEXT)

    # Ex 10.11: prediction interval at LAST = 3
    newdata <- data.frame(LAST = 3)
    predict(model, newdata, interval = "prediction", level = 0.95)

    # Confidence interval at LAST = 3
    predict(model, newdata, interval = "confidence", level = 0.95)

    # Prediction interval at LAST = 1 (outside the observed range)
    newdata <- data.frame(LAST = 1)
    predict(model, newdata, interval = "prediction", level = 0.95)
