Section 4.6 Simple Linear Regression


Section 4.6 Simple Linear Regression

Objectives

- Basic philosophy of SLR and the regression assumptions
- Point & interval estimation of the model parameters, and how to make predictions
- Point and interval estimation of future observations from the model
- Regression diagnostics, including R² and basic residual analysis

Basic Philosophy

We have two variables X and Y. Here X is not random (so we will write x), but Y is random. We believe that Y depends in some way on x. Some typical examples of (x, Y) pairs are

- x = study time and Y = score on a test
- x = height and Y = weight
- x = father's height and Y = son's height

We focus our efforts on estimating the two parameters β₀ and β₁ in the simple linear regression model

    Y_i = β₀ + β₁ x_i + ε_i,   where ε_i ~ N(0, σ²).

- Y_i is the (random) response for the ith case.
- β₀ and β₁ are the unknown parameters we want to estimate: β₀ is the (unknown) intercept and β₁ the (unknown) slope.
- x_i is the value of the predictor variable for the ith case.
- ε_i is the (random) error term for the ith case; the errors have mean 0, the same variance σ² for all cases, and covariance 0 between the ith and jth cases.

Least Squares Estimates

We begin with the likelihood function

    L(β₀, β₁, σ²) = ∏_{i=1}^n f(y_i; β₀, β₁, σ²)
                  = ∏_{i=1}^n (2πσ²)^(−1/2) exp[ −(y_i − β₀ − β₁x_i)² / (2σ²) ]
                  = (2πσ²)^(−n/2) exp[ −Σ_{i=1}^n (y_i − β₀ − β₁x_i)² / (2σ²) ],

so that

    ln L(β₀, β₁, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (y_i − β₀ − β₁x_i)².

To maximize the log-likelihood, we minimize the sum in the exponent, i.e.,

    H = Σ_{i=1}^n (y_i − β₀ − β₁x_i)².

That is, we find the β₀ and β₁ that minimize H. Because there are two parameters, we differentiate H with respect to

β₀ and β₁, set the derivatives equal to zero, and obtain

    ∂H/∂β₀ = −2 Σ (y_i − β₀ − β₁x_i) = 0   ⇒   nβ₀ + β₁ Σ x_i = Σ y_i
    ∂H/∂β₁ = −2 Σ x_i (y_i − β₀ − β₁x_i) = 0   ⇒   β₀ Σ x_i + β₁ Σ x_i² = Σ x_i y_i

Solving these two normal equations gives

    β̂₁ = [ Σ x_i y_i − (Σ x_i)(Σ y_i)/n ] / [ Σ x_i² − (Σ x_i)²/n ]
        = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²

    β̂₀ = ȳ − β̂₁ x̄

Shown below are the second derivatives:

    ∂²H/∂β₀² = 2n,   ∂²H/∂β₀∂β₁ = ∂²H/∂β₁∂β₀ = 2 Σ x_i,   ∂²H/∂β₁² = 2 Σ x_i²

The 2×2 matrix of these second derivatives is positive definite because its (1,1) element 2n > 0 and its determinant is also positive:

    det [ 2n       2 Σ x_i
          2 Σ x_i  2 Σ x_i² ] = 4 [ n Σ x_i² − (Σ x_i)² ] > 0.

The conclusion? The line ŷ_i = β̂₀ + β̂₁ x_i fits the (x, y) pattern best, in the sense that it leaves the smallest total squared gap between the observed y's and the estimated line. For this reason β̂₀ and β̂₁ are also called the least squares estimates.
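As a numerical cross-check of the normal-equation solutions above, here is a short sketch (Python for illustration, while the worked examples in these notes use R; the data are the midterm/final scores from the regression example later in this section):

```python
import numpy as np

# Midterm (x) and final (y) scores from the example later in this section
x = np.array([70, 74, 80, 84, 80, 67, 70, 64, 74, 82], dtype=float)
y = np.array([87, 79, 88, 98, 96, 73, 83, 79, 91, 94], dtype=float)

# Least squares estimates from the normal equations
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx                  # slope estimate
b0 = y.mean() - b1 * x.mean()   # intercept estimate

# Cross-check against numpy's built-in degree-1 polynomial fit
slope, intercept = np.polyfit(x, y, 1)
print(b1, b0)   # about 1.016 and 11.13
```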

Next, let's find the MLE of σ²:

    ∂/∂σ² { ln L(β₀, β₁, σ²) } = −n/(2σ²) + Σ (y_i − β₀ − β₁x_i)² / (2(σ²)²) = 0

We get

    σ̂² = (1/n) Σ (y_i − β̂₀ − β̂₁x_i)².

Two notes:

- In statistics, the gap between the observed value y_i and the expected (or predicted) value ŷ_i is called the residual. So Σ (y_i − β̂₀ − β̂₁x_i)² = Σ (y_i − ŷ_i)² is the sum of squared residuals, commonly called SS_E. For a point estimate of σ² we use SS_E/(n − 2), i.e.,

      s = σ̂ = √( SS_E / (n − 2) ).

- There are many equivalent formulas for β̂₁ that are more intuitive, or at least easier to remember. One of the popular ones is

      β̂₁ = r · SD_y / SD_x,

  where r is the correlation coefficient between x and y, SD_y is the sd of y, and SD_x is the sd of x.

Inferences about the Parameters

Let's introduce some more notation:

    b₁ = β̂₁ = S_xy / S_xx = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)² = Σ (x_i − x̄) y_i / Σ (x_i − x̄)²
    b₀ = β̂₀ = ȳ − b₁ x̄
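The equivalent slope formula β̂₁ = r · SD_y/SD_x and the estimate s = √(SS_E/(n − 2)) can be verified numerically; a minimal Python sketch (again using the midterm/final data from the example below):

```python
import numpy as np

x = np.array([70, 74, 80, 84, 80, 67, 70, 64, 74, 82], dtype=float)
y = np.array([87, 79, 88, 98, 96, 73, 83, 79, 91, 94], dtype=float)
n = len(x)

# Slope via S_xy / S_xx
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()

# Equivalent form: b1 = r * SD_y / SD_x (sample SDs, ddof = 1)
r = np.corrcoef(x, y)[0, 1]
b1_alt = r * np.std(y, ddof=1) / np.std(x, ddof=1)

# Residuals, SS_E, and the point estimate s of sigma
resid = y - (b0 + b1 * x)
SSE = np.sum(resid ** 2)
s = np.sqrt(SSE / (n - 2))
```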

Here is how to derive the expectation and the variance of the estimates, using S_xy = Σ (x_i − x̄) y_i:

    E(b₁) = E(S_xy / S_xx) = (1/S_xx) Σ (x_i − x̄) E(y_i)
          = (1/S_xx) Σ (x_i − x̄)(β₀ + β₁x_i)
          = (1/S_xx) [ β₀ Σ (x_i − x̄) + β₁ Σ (x_i − x̄) x_i ]
          = (1/S_xx) [ 0 + β₁ S_xx ] = β₁

    E(b₀) = E(ȳ − b₁ x̄) = (β₀ + β₁ x̄) − x̄ E(b₁) = β₀

    Var(b₁) = Var( Σ (x_i − x̄) y_i / S_xx ) = (1/S_xx²) Σ (x_i − x̄)² σ² = σ² / S_xx

    Var(b₀) = Var(ȳ − b₁ x̄) = σ² ( 1/n + x̄²/S_xx )

Furthermore, it can be shown that b₁ ~ N(mean = β₁, sd = σ_b₁), where

    σ_b₁ = σ / √S_xx.

σ_b₁ is also called the standard error of b₁, and we can estimate σ from the previous description by s = √(SS_E/(n − 2)), so the estimated SE of b₁ becomes

    s_b₁ = s / √S_xx.

It can be shown (see the textbook) that

    SS_E / σ² = n σ̂² / σ² ~ χ²(n − 2),

and it turns out that the estimators (b₀, b₁) are independent of SS_E. Therefore we have the following t-distribution:

    T₁ = [ (b₁ − β₁)/σ_b₁ ] / √[ (SS_E/σ²) / (n − 2) ] = (b₁ − β₁) / (s/√S_xx) = (b₁ − β₁) / s_b₁ ~ t(df = n − 2)

Therefore, a 100(1 − α)% confidence interval for β₁ is given by

    b₁ ± t_{α/2}(df = n − 2) · s_b₁
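Putting the pieces together, the t-based interval for the slope can be computed directly; a sketch assuming `scipy` is available (same midterm/final data as the example below):

```python
import numpy as np
from scipy import stats

x = np.array([70, 74, 80, 84, 80, 67, 70, 64, 74, 82], dtype=float)
y = np.array([87, 79, 88, 98, 96, 73, 83, 79, 91, 94], dtype=float)
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
SSE = np.sum((y - b0 - b1 * x) ** 2)
s = np.sqrt(SSE / (n - 2))

# Standard error of b1 and the t-based 95% confidence interval
sb1 = s / np.sqrt(Sxx)
tcrit = stats.t.ppf(0.975, df=n - 2)        # t_{0.025} with n - 2 = 8 df
ci = (b1 - tcrit * sb1, b1 + tcrit * sb1)

# The t statistic for H0: beta1 = 0, as reported by R's summary()
tstat = b1 / sb1
```

The interval here excludes 0, which matches rejecting H₀: β₁ = 0 at the 5% level.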

It can also be shown in a similar way that b₀ ~ N(mean = β₀, sd = σ_b₀), where

    σ_b₀ = σ √( 1/n + x̄²/S_xx )

is the standard error of b₀, and its estimate is

    s_b₀ = s √( 1/n + x̄²/S_xx ).

Therefore we have another t-distribution:

    T₀ = [ (b₀ − β₀)/σ_b₀ ] / √[ (SS_E/σ²) / (n − 2) ] = (b₀ − β₀) / s_b₀ ~ t(df = n − 2)

Therefore, a 100(1 − α)% confidence interval for β₀ is given by

    b₀ ± t_{α/2}(df = n − 2) · s_b₀

We have seen how to estimate the coefficients of a regression line with both point estimates and confidence intervals. We have also learned how to compute the fitted value ŷ on the regression line for a given value of x, say x = x₀. But how good is our estimate ŷ at x = x₀? How much confidence do we have in it? Furthermore, suppose we were going to observe another value of y at x = x₀. What can we say?

Intuitively, it should be easier to get bounds on the mean (average) value of y at x₀ (called a confidence interval for the mean value of y at x₀) than it is to get bounds on a future observation of y (called a prediction interval for y at x₀). It turns out the intervals are narrower for the mean value and wider for the individual value.

Our point estimate of y at x₀ is, of course, ŷ at x₀, so for a confidence interval we will need the sampling distribution of ŷ. It turns out that

    ŷ|x₀ ~ N( mean = E(y|x₀), sd = σ_ŷ|x₀ ),   where   σ_ŷ|x₀ = σ √( 1/n + (x₀ − x̄)²/S_xx ).

σ_ŷ|x₀ is the standard error of ŷ at x₀, and its estimate is

    s_ŷ|x₀ = s √( 1/n + (x₀ − x̄)²/S_xx ).

Therefore we have the following t-distribution:

    T₂ = ( ŷ|x₀ − E(y|x₀) ) / s_ŷ|x₀ ~ t(df = n − 2)

Therefore, a 100(1 − α)% confidence interval (C.I.) for E(y) at x₀ is given by

    ŷ|x₀ ± t_{α/2}(df = n − 2) · s_ŷ|x₀

Next, prediction intervals are slightly different. In order to find confidence bounds for a new observation of y (we will denote it y_future) we use the fact that

    ŷ_future ~ N( mean = E(y_future), sd = σ_ŷfuture ),   where   σ_ŷfuture = σ √( 1 + 1/n + (x₀ − x̄)²/S_xx ).

Of course σ is unknown and we estimate it with s. Therefore, a 100(1 − α)% prediction interval (P.I.) for a future value of y at x₀ is given by

    ŷ|x₀ ± t_{α/2}(df = n − 2) · s_ŷfuture

Take note that the prediction interval is wider than the confidence interval, as its SE is greater.

Ex 1. Consider the following sample data (midterm and final exam scores for n = 10 students) and carry out all the inferences involved.

    Midterm (X): 70 74 80 84 80 67 70 64 74 82
    Final (Y):   87 79 88 98 96 73 83 79 91 94

> x <- c(70,74,80,84,80,67,70,64,74,82)
> y <- c(87,79,88,98,96,73,83,79,91,94)
> model <- lm(y ~ x)
> summary(model)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  11.1317    17.4220   0.639   0.5409
x             1.0157     0.2330   4.359   0.0024 **
---
Residual standard error: 4.743 on 8 degrees of freedom
Multiple R-squared: 0.7038, Adjusted R-squared: 0.6667
F-statistic: 19 on 1 and 8 DF, p-value: 0.0024

> plot(y ~ x, pch = 16, col = 2)
> abline(model, col = 4)
> predict(model, interval = "confidence")
       fit      lwr      upr
> predict(model, interval = "prediction")
       fit      lwr      upr
> newx <- seq(60, 95, 0.2)
> ci <- predict(model, list(x = newx), interval = "confidence")
> pi <- predict(model, list(x = newx), interval = "prediction")
> plot(x, y, pch = 16, col = 2)
> matplot(newx, ci, type = "l", lty = c(1,2,2), col = c(1,2,2), add = T)
> matplot(newx, pi, type = "l", lty = c(1,3,3), col = c(1,4,4), add = T)
> legend(locator(1), c("regression line","95% CI","95% PI"), cex = 0.8, lty = 1:3, col = c(1,2,4))
> # The following command creates four diagnostic plots.
> par(mfrow = c(1,4))

> plot(model)

Figure 1: Regression line, 95% CI & 95% PI

Figure 2: Diagnostic plots of a regression model
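The intervals produced by `predict()` above can be reproduced from the formulas in this section; a Python sketch assuming `scipy` (x₀ = 75 is an arbitrary illustration point, not one used in the notes):

```python
import numpy as np
from scipy import stats

x = np.array([70, 74, 80, 84, 80, 67, 70, 64, 74, 82], dtype=float)
y = np.array([87, 79, 88, 98, 96, 73, 83, 79, 91, 94], dtype=float)
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))

x0 = 75.0                       # arbitrary point at which to predict
yhat = b0 + b1 * x0
tcrit = stats.t.ppf(0.975, df=n - 2)

# CI for the mean response E(y | x0) and PI for a future observation at x0
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / Sxx)
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx)
ci = (yhat - tcrit * se_mean, yhat + tcrit * se_mean)
pi = (yhat - tcrit * se_pred, yhat + tcrit * se_pred)
```

The prediction interval comes out wider at every x₀, since its standard error carries the extra "1 +" term.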

Section 4.8 One-Factor ANOVA

One-Factor Samples

Suppose you have collected n_i observations (i = 1, 2, ..., m) from each of m groups, with n = n₁ + ⋯ + n_m in total:

    Group 1:  Y₁₁  Y₁₂  ...  Y₁ₙ₁    mean Ȳ₁.
    Group 2:  Y₂₁  Y₂₂  ...  Y₂ₙ₂    mean Ȳ₂.
    ⋮
    Group m:  Yₘ₁  Yₘ₂  ...  Yₘₙₘ    mean Ȳₘ.
                            Grand mean: Ȳ..

The hypotheses we want to test are:

    H₀: µ₁ = µ₂ = ⋯ = µₘ   (i.e., all group means are the same)
    H₁: not H₀             (i.e., some group means are significantly different)

In the end, all will be summarized in the following ANOVA table:

    source     SS      df     MS                       F-value       p-value
    Treatment  SS_trt  m − 1  MS_trt = SS_trt/(m − 1)  MS_trt/MS_E
    Error      SS_E    n − m  MS_E = SS_E/(n − m)
    Total      SS_tot  n − 1

Here are all the SS (sum of squares) quantities and how SS_tot is partitioned:

    SS_tot = Σᵢ Σⱼ (Y_ij − Ȳ..)²
           = Σᵢ Σⱼ (Y_ij − Ȳᵢ. + Ȳᵢ. − Ȳ..)²
           = Σᵢ Σⱼ (Y_ij − Ȳᵢ.)² + Σᵢ nᵢ (Ȳᵢ. − Ȳ..)²     (the cross-product term = 0)
           = SS_E + SS_trt

Under H₀ we also have

    SS_trt/σ² ~ χ²(m − 1),   SS_E/σ² ~ χ²(n − m),

and the two are independent, so

    F = [ SS_trt/(m − 1) ] / [ SS_E/(n − m) ] = MS_trt / MS_E ~ F(m − 1, n − m).
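The sum-of-squares partition and the F ratio can be checked numerically; a Python sketch with small made-up data (three groups, chosen only for illustration), using scipy's `f_oneway` as the reference:

```python
import numpy as np
from scipy import stats

# Small made-up example: m = 3 groups (unequal sizes are allowed)
groups = [np.array([92., 90., 87., 105.]),
          np.array([100., 108., 98., 97., 94.]),
          np.array([143., 149., 138., 136.])]
m = len(groups)
n = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

# Partition SS_tot = SS_trt + SS_E
SStrt = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
SSE = sum(np.sum((g - g.mean()) ** 2) for g in groups)
SStot = np.sum((np.concatenate(groups) - grand) ** 2)

F = (SStrt / (m - 1)) / (SSE / (n - m))
p = stats.f.sf(F, m - 1, n - m)

# Reference: scipy's one-way ANOVA gives the same F and p-value
F_ref, p_ref = stats.f_oneway(*groups)
```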

Ex 2. Consider the following sample data (5 groups, 7 observations each; the values are shown in the R vector y below) and carry out all the inferences involved.

> grp <- c(rep(1,7), rep(2,7), rep(3,7), rep(4,7), rep(5,7))
> y <- c(92,90,87,105,86,83,102,100,108,98,0,4,97,94,
+        143,149,138,136,139,120,145,147,144,160,149,152,3,134,
+        142,155,9,134,133,146,152)
> data <- data.frame(grp, y)
> head(data)
> attach(data)
> grp <- factor(grp)
> boxplot(y ~ grp, col = "pink")
> model <- lm(y ~ grp)
> summary(model)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                             < 2e-16 ***
grp2                                          *
grp3                                   e-10 ***
grp4                                   e-11 ***
grp5                                   e-10 ***
---
Residual standard error: 9.7 on 30 degrees of freedom
Multiple R-squared:        , Adjusted R-squared:
F-statistic: 44.2 on 4 and 30 DF, p-value: 3.664e-12

> anova(model)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
grp        4                   44.2 3.664e-12 ***
Residuals 30

> summary(aov(y ~ grp))
          Df Sum Sq Mean Sq F value    Pr(>F)
grp        4                   44.2 3.664e-12 ***
Residuals 30

> boxplot(y ~ grp, col = "pink")
> plot(TukeyHSD(aov(y ~ grp)))

> par(mfrow = c(1,4))
> plot(model)

Figure 3: Boxplot & Tukey's pairwise comparison

Figure 4: Diagnostic plots of ANOVA

Section 4.10 χ² Tests

Review: Facts about the χ²-Distribution

For X ~ χ²(df), the degrees of freedom df is the only parameter, and it uniquely determines the shape. The (theoretical) population mean is µ = df and the (theoretical) population standard deviation is σ = √(2·df).

- If you square a random variable that has the standard normal distribution, the result has the χ²(df = 1) distribution. This is often written Z² ~ χ²(1).
- A random variable with a χ² distribution with k degrees of freedom is the sum of k independent squared standard normal variables, i.e., χ²(df = k) = Z₁² + Z₂² + ⋯ + Z_k², where Zᵢ ~ N(0, 1).
- The curve is nonsymmetrical and skewed to the right.
- The mean µ is always located just to the right of the peak.
- The χ² test statistic is always greater than or equal to zero.
- When df > 90, the χ² curve is well approximated by the normal distribution. For example, if X ~ χ²(df = 1000), then X ≈ N(µ = 1000, σ = √2000).

χ² Goodness-of-Fit Test

We test whether the data fit a particular distribution or not. For example, we can test whether the color distribution of M&M bags fits what the company claims on its webpage, or, after flipping a coin many times, whether the counts fit a binomial distribution. We use a χ² test statistic to determine whether there is a good fit.

Why χ²? Demo for a binomial case. Let Y₁ ~ binomial(n, p₁); then

    Z = (Y₁ − np₁) / √( np₁(1 − p₁) )

has an approximate N(0, 1) distribution due to the CLT. Consider the following, writing Y₂ = n − Y₁ and p₂ = 1 − p₁:

    Q₁ = Z² = (Y₁ − np₁)² / [ np₁(1 − p₁) ]
       = (Y₁ − np₁)²/(np₁) + (Y₁ − np₁)²/(n(1 − p₁))      (since 1/(p₁(1 − p₁)) = 1/p₁ + 1/(1 − p₁))
       = (Y₁ − np₁)²/(np₁) + (Y₂ − np₂)²/(np₂)            (because Y₁ − np₁ = −(Y₂ − np₂), so the squares are equal)
       = Σ_{i=1}^{2} (Yᵢ − npᵢ)²/(npᵢ) ~ χ²(df = 1)
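The algebraic identity behind the demo — 1/(p(1−p)) = 1/p + 1/(1−p), so the single squared z-score equals the two-category χ² sum — is easy to verify numerically; a Python sketch with arbitrary illustrative numbers:

```python
# Arbitrary illustrative values (not from the notes' examples)
n, p1 = 100, 0.3
y1 = 24                     # observed successes
y2 = n - y1                 # observed failures
p2 = 1 - p1

# Single squared z-score from the CLT
q_z = (y1 - n * p1) ** 2 / (n * p1 * (1 - p1))

# Two-category chi-square sum; note (y1 - n*p1)^2 == (y2 - n*p2)^2
q_chi = (y1 - n * p1) ** 2 / (n * p1) + (y2 - n * p2) ** 2 / (n * p2)

print(q_z, q_chi)   # identical
```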

This will be generalized to the case of k categories. It can be shown that

    Q_{k−1} = Σ_{i=1}^{k} (Yᵢ − npᵢ)² / (npᵢ) ~ χ²(df = k − 1).

The null and alternative hypotheses for the goodness-of-fit test can be written as:

    H₀: pᵢ = pᵢ₀ for i = 1, 2, ..., k   (i.e., the data fit the hypothesized distribution)
    H₁: pᵢ ≠ pᵢ₀ for some i             (i.e., at least in some cases, the data do NOT fit the hypothesized distribution)

Ex 1. People were asked to write down a bunch of random digits. If these digits are truly random, the next digit is the same as the preceding one with probability 1/10, one away from the preceding one with probability 2/10, and neither of those with probability 7/10. We want to test whether the data fit this thinking (i.e., whether the sequence is random as examined by this idea):

    H₀: p₁ = 1/10, p₂ = 2/10, p₃ = 7/10
    H₁: at least one of the proportions is significantly different from the hypothesized one.

Here is the summary (n = 51 digit transitions):

                      observed freq   expected freq
    same digit             10         51 (1/10) = 5.1
    one-away digit         18         51 (2/10) = 10.2
    others                 23         51 (7/10) = 35.7

Test statistic:

    χ² = Σ_{i=1}^{3} (observed − expected)² / expected
       = (10 − 5.1)²/5.1 + (18 − 10.2)²/10.2 + (23 − 35.7)²/35.7 ≈ 15.19 ~ χ²(df = 2)

The p-value ≈ 0.0005, so we reject H₀ and conclude that the data didn't follow the hypothesized proportions, i.e., the digits don't seem random.

The whole thing can be done in R as shown below (note that the probabilities must be passed as the named argument p =; writing p <- c(...) inside the call would assign a variable and be passed as a second sample instead):

> x <- c(10, 18, 23)
> chisq.test(x, p = c(0.1, 0.2, 0.7))

        Chi-squared test for given probabilities

data:  x
X-squared = 15.19, df = 2, p-value = 0.0005
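The digits example can be cross-checked with `scipy.stats.chisquare`; a sketch assuming observed counts (10, 18, 23) for n = 51 transitions with claimed probabilities (0.1, 0.2, 0.7) — the counts in the extracted notes are partly garbled, so treat them as illustrative:

```python
import numpy as np
from scipy import stats

obs = np.array([10, 18, 23])          # same digit, one-away, others (assumed counts)
p0 = np.array([0.1, 0.2, 0.7])        # hypothesized probabilities
exp = obs.sum() * p0                  # 5.1, 10.2, 35.7

stat, pval = stats.chisquare(f_obs=obs, f_exp=exp)
# df = k - 1 = 2; a small p-value rejects the hypothesized distribution
reject = pval < 0.05
```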

Ex 2. You flipped a coin 4 times a day and counted the total number of H's each day. You did this for 100 days. Test whether the results agree with X (the daily number of H's) being binomial(4, 1/2).

Answer: With 100 days, the expected frequencies are 100 · C(4,k)(1/2)⁴:

    Number of H's      0     1      2     3     4
    observed freq      7    18                  4
    expected freq    6.25   25   37.5    25   6.25

Test statistic:

    χ² = Σ_{k=0}^{4} (obs − exp)²/exp = (7 − 6.25)²/6.25 + (18 − 25)²/25 + ⋯ + (4 − 6.25)²/6.25 ~ χ²(df = 4)

The p-value exceeds 0.05, so we do not reject H₀ and conclude that the data support the hypothesis of binomial(4, 0.5).

Ex 3. You lose one more df by estimating another parameter! Shown below are values of X, the number of α particles emitted by barium-133 in 1/10 of a second, counted by a Geiger counter. Test H₀: X ~ Poisson.

Answer: We first have to estimate the Poisson parameter λ by the mean of the data, i.e., λ̂ = x̄ = 5.4. Then we calculate the expected probability for each case and the expected frequencies (n = 50 observations, expected = 50 · p̂ᵢ):

    Cases           observed freq   expected freq
    {0, 1, 2, 3}         13             10.65
    {4}                                  8.00
    {5}                                  8.64
    {6}                                  7.78
    {7}                                  6.00
    {8, 9, ...}          10              8.90

Test statistic:

    χ² = Σ_{i=1}^{6} (obs − exp)²/exp = (13 − 10.65)²/10.65 + ⋯ + (10 − 8.90)²/8.90 ~ χ²(df = 6 − 1 − 1 = 4)

The p-value is 0.408, so we do not reject H₀ and conclude that the data cannot reject the hypothesis that the counts form a Poisson distribution.

χ² Test for Homogeneity

The goodness-of-fit test can be used to decide whether data fit a given distribution, but it will not suffice to decide whether two populations follow the same unknown distribution. A different test, called the test for homogeneity, can be used to draw a conclusion about whether two populations have the same distribution. Here we are concerned with:

    H₀: the distributions of the two populations are the same.
    H₁: the distributions of the two populations are NOT the same.

Ex 4. Shown below are the grade distributions of two groups of students.

    observed freq    A    B    C    D    F   total
    Group I          8   13   16   10    3     50
    Group II         4    9   14   16    7     50
    total           12   22   30   26   10    100

Test H₀: the grade distributions of the two groups are the same.

Answer: Under H₀, the probability of each grade is common to both groups, and the respective pooled estimates of the probabilities are 12/100 = 0.12, 22/100 = 0.22, 30/100 = 0.30, 26/100 = 0.26, and 10/100 = 0.10. Note also, since we have estimated these probabilities, the χ² test statistic will have df = (5 − 1) + (5 − 1) − 4 = 4. Here are the expected frequencies for each case (50 · p̂ per group):

    expected freq    A    B    C    D    F
    Group I          6   11   15   13    5
    Group II         6   11   15   13    5

Test statistic:

    χ² = Σ_{i=1}^{2} Σ_{j=1}^{5} (obs − exp)²/exp = (8 − 6)²/6 + ⋯ + (7 − 5)²/5 ≈ 5.18 ~ χ²(df = 4)

The p-value ≈ 0.27, so we do not reject H₀ and conclude that we cannot say there is a significant difference in grade distribution between the two groups.
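The homogeneity computation maps directly onto scipy's contingency-table test; a Python sketch (the grade counts below are as restored in the example above, so treat them as assumed):

```python
import numpy as np
from scipy import stats

# Grades A..F for the two groups (rows)
obs = np.array([[8, 13, 16, 10, 3],
                [4,  9, 14, 16, 7]])

stat, pval, df, expected = stats.chi2_contingency(obs)
# Expected counts pool the two groups: each row comes out (6, 11, 15, 13, 5)
```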

> data <- matrix(c(8,4,13,9,16,14,10,16,3,7), nrow = 2, ncol = 5)
> chisq.test(as.table(data))$observed
   A  B  C  D  E
A  8 13 16 10  3
B  4  9 14 16  7
> chisq.test(as.table(data))$expected
   A  B  C  D  E
A  6 11 15 13  5
B  6 11 15 13  5
> chisq.test(as.table(data))$residual
        A       B       C       D       E
A  0.8165  0.6030  0.2582 -0.8321 -0.8944
B -0.8165 -0.6030 -0.2582  0.8321  0.8944
> chisq.test(as.table(data))

        Pearson's Chi-squared test

data:  as.table(data)
X-squared = 5.1786, df = 4, p-value = 0.2695

χ² Test for Independence

A test of independence involves a contingency table of observed (data) values. The test statistic for a test of independence is similar to that of a goodness-of-fit test:

    χ² = Σ_{j=1}^{c} Σ_{i=1}^{r} (obs − exp)²/exp ~ χ²(df = (r − 1)(c − 1)),

where r = number of rows and c = number of columns.

Ex 5. A random sample of 400 students at the University of Iowa shows the following breakdown of gender and the colleges where they study (Business, Engineering, Liberal Arts, Nursing, Pharmacy; row counts for males and females, total 400).

Test H₀: p_ij = p_i · p_j (i.e., the college where a student studies is independent of gender).

Answer:

> data2 <- matrix(c(2,4,6,4,45,75,2,3,6,4), nrow = 2, ncol = 5)
> chisq.test(as.table(data2))$observed
> chisq.test(as.table(data2))$expected

> chisq.test(as.table(data2))$residual
> chisq.test(as.table(data2))

        Pearson's Chi-squared test

data:  as.table(data2)
X-squared =        , df = 4, p-value =

We do reject H₀ and conclude that the number of students in each college is highly dependent on gender, i.e., the two variables (gender and college) are NOT independent.

Section 4.9 Distribution-Free CI & TI

Basics

Let Y₁ < Y₂ < Y₃ < Y₄ < Y₅ be the order statistics of a random sample of size n = 5 from any continuous distribution. Also, let m = π₀.₅ (i.e., the 50th percentile) be the median. For example, we can find the following probability:

    P(Y₁ < m < Y₅) = Σ_{k=1}^{4} C(5,k) (1/2)^k (1/2)^{5−k}
                   = 1 − P(X = 0) − P(X = 5),   where X ~ binomial(5, 1/2)
                   = 1 − (1/2)⁵ − (1/2)⁵ = 0.9375

Why is P(Y₁ < m < Y₅) calculated like this? Any individual observation, say X₁, has P(X₁ < m) = 0.5, and in order for Y₁ to be less than m and Y₅ to be greater than m, we must have 1, 2, 3, or 4 observations below m. And so we say (y₁, y₅) is a 94% (distribution-free) confidence interval for m.

In a similar way, when there are n independent observations, we calculate

    P(Yᵢ < m < Yⱼ) = Σ_{k=i}^{j−1} C(n,k) (1/2)^k (1/2)^{n−k} = 1 − α

and (yᵢ, yⱼ) is a 100(1 − α)% (distribution-free) confidence interval for the median m.

Ex 1. Suppose we have an ordered data set with n = 9. Let's calculate:

    P(Y₂ < m < Y₈) = Σ_{k=2}^{7} C(9,k) (1/2)^k (1/2)^{9−k} = 0.9609

so (y₂, y₈) = (9.0, 30.1) is a 96.1% (distribution-free) confidence interval for the median m.

It turns out we can argue the same thing for any percentile π_p. In this case, any individual observation X has P(X < π_p) = p, so when there are n independent observations we calculate

    P(Yᵢ < π_p < Yⱼ) = Σ_{k=i}^{j−1} C(n,k) p^k (1 − p)^{n−k} = 1 − α,

and (yᵢ, yⱼ) is a 100(1 − α)% (distribution-free) confidence interval for the percentile π_p.

Ex 2. Suppose we have an ordered data set with n = 27. First, note that for π₀.₂₅ (i.e., the first quartile), (n + 1)p = (27 + 1)(0.25) = 7, so we have π̂₀.₂₅ = y₇. Now, let's see how much confidence we can have in (y₄, y₁₀) as a confidence interval for π₀.₂₅:

    P(Y₄ < π₀.₂₅ < Y₁₀) = Σ_{k=4}^{9} C(27,k) (0.25)^k (0.75)^{27−k} = 0.820

i.e., (y₄, y₁₀) = (74, 87) is an 82.0% (distribution-free) confidence interval for the 25th percentile π₀.₂₅.

One note: for some of these binomial probability calculations it is OK to use the normal approximation. For example, in the last problem we calculated P(4 ≤ X ≤ 9), where X ~ binomial(n = 27, p = 1/4), so approximately X ~ N(µ = 27/4 = 6.75, σ = √(27 (1/4)(3/4)) = 2.25). Finding the same probability by the normal approximation with continuity correction:

    P(4 ≤ X ≤ 9) ≈ P(3.5 ≤ X ≤ 9.5) = P( (3.5 − 6.75)/2.25 ≤ Z ≤ (9.5 − 6.75)/2.25 )
                 = P(−1.44 ≤ Z ≤ 1.22) ≈ 0.815
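Both coverage sums can be evaluated with `scipy.stats.binom`; a sketch reproducing the 94% and 82.0% figures:

```python
from scipy import stats

# P(Y1 < m < Y5) for n = 5: between 1 and 4 observations fall below the median
cov_median = stats.binom.cdf(4, 5, 0.5) - stats.binom.cdf(0, 5, 0.5)

# P(Y4 < pi_0.25 < Y10) for n = 27: between 4 and 9 observations below the quartile
cov_quartile = stats.binom.cdf(9, 27, 0.25) - stats.binom.cdf(3, 27, 0.25)

print(cov_median, cov_quartile)   # about 0.9375 and 0.820
```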

i.e., the normal approximation works rather well for such a case.

Theorem 1. Let Y₁ < Y₂ < ⋯ < Yₙ be the order statistics based on a random sample X₁, X₂, ..., Xₙ. Then the pdf of Y_k is

    g_k(y) = n! / [ (k−1)!(n−k)! ] · [F(y)]^{k−1} [1 − F(y)]^{n−k} f(y),

where f(·) and F(·) are the pdf and cdf of X.

Theorem 2. Let U₍₁₎ < U₍₂₎ < ⋯ < U₍ₙ₎ be the order statistics of Uᵢ ~ uniform(0, 1). Then U₍ₖ₎ has a beta distribution with the two parameters k and n − k + 1.

Proof. From Theorem 1, with f(y) = 1 and F(y) = y, we have

    g_k(y) = n! / [ (k−1)!(n−k)! ] · y^{k−1} (1 − y)^{n−k},   0 < y < 1,

which is the pdf of β(k, n − k + 1).

Theorem 3. Let X₁, X₂, ..., Xₙ be random variables with continuous cdf F(·). Then F(X₍ₖ₎) has a beta distribution with the two parameters k and n − k + 1.

Proof. First, note that Uᵢ = F(Xᵢ) is iid uniform(0, 1) due to the probability integral transformation. Furthermore, F(·) is a nondecreasing function, i.e., F(·) preserves order. So U₍ᵢ₎ = F(X₍ᵢ₎). That is,

    { U₍₁₎, U₍₂₎, ..., U₍ₙ₎ } = { F(X₍₁₎), F(X₍₂₎), ..., F(X₍ₙ₎) },

and therefore F(X₍ₖ₎) ~ β(k, n − k + 1).

Application: let Y_k be the kth order statistic of the X's, i.e., Y_k = X₍ₖ₎. Consider the following n + 1

random variables:

    W₁ = F(Y₁)
    W₂ = F(Y₂) − F(Y₁)
    W₃ = F(Y₃) − F(Y₂)
    ⋮
    Wₙ = F(Yₙ) − F(Yₙ₋₁)
    Wₙ₊₁ = 1 − F(Yₙ)

- These W₁, W₂, ..., Wₙ₊₁ are called the coverages of the intervals such as (Yᵢ₋₁, Yᵢ].
- Note that the sum of the first k of these coverages is W₁ + ⋯ + W_k = F(Y_k) ~ β(k, n − k + 1).
- F(Yⱼ) − F(Yᵢ), i < j, is the sum of k = j − i coverages, so it has a β(j − i, n − j + i + 1) distribution, i.e.,

      γ = P{ F(Yⱼ) − F(Yᵢ) ≥ p } = ∫_p^1 [ Γ(n+1) / (Γ(j−i) Γ(n−j+i+1)) ] v^{j−i−1} (1 − v)^{n−j+i} dv,

  and (yᵢ, yⱼ) is called a 100γ% tolerance interval for 100p% of the distribution.

Ex 3. Let Y₁ < Y₂ < ⋯ < Y₆ be the order statistics of a random sample of size n = 6 from any continuous distribution. Also, let p = 0.8. Then

    γ = P{ F(Y₆) − F(Y₁) ≥ 0.8 } = ∫_{0.8}^1 [ Γ(7) / (Γ(5)Γ(2)) ] v⁴ (1 − v) dv ≈ 0.34,

i.e., (y₁, y₆) is a 34% (distribution-free) tolerance interval for 80% of the distribution.
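The tolerance-interval probability in Ex 3 is just an upper tail of a Beta(5, 2) distribution; a one-line check with `scipy.stats.beta`:

```python
from scipy import stats

# gamma = P{ F(Y6) - F(Y1) >= 0.8 }, where the coverage ~ Beta(j-i, n-j+i+1) = Beta(5, 2)
gamma = stats.beta.sf(0.8, 5, 2)   # survival function = 1 - cdf
print(gamma)   # about 0.345
```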

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Homework 9 Sample Solution

Homework 9 Sample Solution Homework 9 Sample Solution # 1 (Ex 9.12, Ex 9.23) Ex 9.12 (a) Let p vitamin denote the probability of having cold when a person had taken vitamin C, and p placebo denote the probability of having cold

More information

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College

ANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

22s:152 Applied Linear Regression. Take random samples from each of m populations.

22s:152 Applied Linear Regression. Take random samples from each of m populations. 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA

22s:152 Applied Linear Regression. There are a couple commonly used models for a one-way ANOVA with m groups. Chapter 8: ANOVA 22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each

More information

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION FOR SAMPLE OF RAW DATA (E.G. 4, 1, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN G / STANDARD DEVIATION MEDIAN AND QUARTILES Σ ( Σ) / 1 GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic.

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic. Serik Sagitov, Chalmers and GU, February, 08 Solutions chapter Matlab commands: x = data matrix boxplot(x) anova(x) anova(x) Problem.3 Consider one-way ANOVA test statistic For I = and = n, put F = MS

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

STAT 215 Confidence and Prediction Intervals in Regression

STAT 215 Confidence and Prediction Intervals in Regression STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:

More information

Chapter 26: Comparing Counts (Chi Square)

Chapter 26: Comparing Counts (Chi Square) Chapter 6: Comparing Counts (Chi Square) We ve seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that s a compromise it forces

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X 1.04) =.8508. For z < 0 subtract the value from

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer

Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer Solutions to Exam in 02402 December 2012 Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer 3 1 5 2 5 2 3 5 1 3 Exercise IV.2 IV.3 IV.4 V.1

More information

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling Review for Final For a detailed review of Chapters 1 7, please see the review sheets for exam 1 and. The following only briefly covers these sections. The final exam could contain problems that are included

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1)

Summary of Chapter 7 (Sections ) and Chapter 8 (Section 8.1) Summary of Chapter 7 (Sections 7.2-7.5) and Chapter 8 (Section 8.1) Chapter 7. Tests of Statistical Hypotheses 7.2. Tests about One Mean (1) Test about One Mean Case 1: σ is known. Assume that X N(µ, σ

More information

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists fall Correlation Linear regression Analysis of variance Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html MATH 282A University

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Chapter 8: Correlation & Regression

Chapter 8: Correlation & Regression Chapter 8: Correlation & Regression We can think of ANOVA and the two-sample t-test as applicable to situations where there is a response variable which is quantitative, and another variable that indicates

More information

13 Simple Linear Regression

13 Simple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 3 Simple Linear Regression 3. An industrial example A study was undertaken to determine the effect of stirring rate on the amount of impurity

More information

ANOVA: Analysis of Variation

ANOVA: Analysis of Variation ANOVA: Analysis of Variation The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative variables depend on which group (given by categorical

More information

Review of Statistics

Review of Statistics Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and

More information

Ch. 1: Data and Distributions

Ch. 1: Data and Distributions Ch. 1: Data and Distributions Populations vs. Samples How to graphically display data Histograms, dot plots, stem plots, etc Helps to show how samples are distributed Distributions of both continuous and

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Chapter 16: Understanding Relationships Numerical Data

Chapter 16: Understanding Relationships Numerical Data Chapter 16: Understanding Relationships Numerical Data These notes reflect material from our text, Statistics, Learning from Data, First Edition, by Roxy Peck, published by CENGAGE Learning, 2015. Linear

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book.

This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables in your book. NAME (Please Print): HONOR PLEDGE (Please Sign): statistics 101 Practice Final Key This is a multiple choice and short answer practice exam. It does not count towards your grade. You may use the tables

More information

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution

Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution Introduction to Statistical Data Analysis Lecture 7: The Chi-Square Distribution James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013 STAC67H3 Regression Analysis Duration: One hour and fifty minutes Last Name: First Name: Student

More information

Intro to Linear Regression

Intro to Linear Regression Intro to Linear Regression Introduction to Regression Regression is a statistical procedure for modeling the relationship among variables to predict the value of a dependent variable from one or more predictor

More information

1. Simple Linear Regression

1. Simple Linear Regression 1. Simple Linear Regression Suppose that we are interested in the average height of male undergrads at UF. We put each male student s name (population) in a hat and randomly select 100 (sample). Then their

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

Analysis of Variance

Analysis of Variance Analysis of Variance Blood coagulation time T avg A 62 60 63 59 61 B 63 67 71 64 65 66 66 C 68 66 71 67 68 68 68 D 56 62 60 61 63 64 63 59 61 64 Blood coagulation time A B C D Combined 56 57 58 59 60 61

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Statistics. Statistics

Statistics. Statistics The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,

More information

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or.

: The model hypothesizes a relationship between the variables. The simplest probabilistic model: or. Chapter Simple Linear Regression : comparing means across groups : presenting relationships among numeric variables. Probabilistic Model : The model hypothesizes an relationship between the variables.

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

STAT2912: Statistical Tests. Solution week 12

STAT2912: Statistical Tests. Solution week 12 STAT2912: Statistical Tests Solution week 12 1. A behavioural biologist believes that performance of a laboratory rat on an intelligence test depends, to a large extent, on the amount of protein in the

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Multivariate Linear Regression Models

Multivariate Linear Regression Models Multivariate Linear Regression Models Regression analysis is used to predict the value of one or more responses from a set of predictors. It can also be used to estimate the linear association between

More information

Statistics for Engineers Lecture 9 Linear Regression

Statistics for Engineers Lecture 9 Linear Regression Statistics for Engineers Lecture 9 Linear Regression Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu April 17, 2017 Chong Ma (Statistics, USC) STAT 509 Spring 2017 April

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

Can you tell the relationship between students SAT scores and their college grades?

Can you tell the relationship between students SAT scores and their college grades? Correlation One Challenge Can you tell the relationship between students SAT scores and their college grades? A: The higher SAT scores are, the better GPA may be. B: The higher SAT scores are, the lower

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math

More information

http://www.statsoft.it/out.php?loc=http://www.statsoft.com/textbook/ Group comparison test for independent samples The purpose of the Analysis of Variance (ANOVA) is to test for significant differences

More information

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X.

Estimating σ 2. We can do simple prediction of Y and estimation of the mean of Y at any value of X. Estimating σ 2 We can do simple prediction of Y and estimation of the mean of Y at any value of X. To perform inferences about our regression line, we must estimate σ 2, the variance of the error term.

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 Minimize this by maximizing

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters

Objectives Simple linear regression. Statistical model for linear regression. Estimating the regression parameters Objectives 10.1 Simple linear regression Statistical model for linear regression Estimating the regression parameters Confidence interval for regression parameters Significance test for the slope Confidence

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

FinalExamReview. Sta Fall Provided: Z, t and χ 2 tables

FinalExamReview. Sta Fall Provided: Z, t and χ 2 tables Final Exam FinalExamReview Sta 101 - Fall 2017 Duke University, Department of Statistical Science When: Wednesday, December 13 from 9:00am-12:00pm What to bring: Scientific calculator (graphing calculator

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information