13 Simple Linear Regression
B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice

13.1 An industrial example

A study was undertaken to determine the effect of stirring rate on the amount of impurity in paint produced by a chemical process. The study yielded the data shown in the following S+ output. The stirring rate is in revolutions per minute and the impurity is recorded as a percentage. It appears from the data that twelve stirring rates were chosen at intervals of 2 rpm and the resulting impurity levels recorded for each stirring rate. The subsequent plot shows that impurity increases approximately linearly with stirring rate.

> stirrate <- seq(20, 42, 2)
> impurity <- c(8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7, 16.4, 16.5, 18.9, 18.5)
> paint.data <- data.frame(stirrate, impurity)
> rm(stirrate, impurity)
> attach(paint.data)
> plot(stirrate, impurity)
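The data setup can also be sketched in Python (used here for illustration, since S+ is assumed unavailable); the twelve impurity values are those read from the listing above, which reproduce the sums of squares quoted later in this section.

```python
# Python sketch of the S+ data setup above (illustrative; the notes use S+).
stirrate = list(range(20, 43, 2))      # 20, 22, ..., 42 rpm
impurity = [8.4, 9.5, 11.8, 10.4, 13.3, 14.8,
            13.2, 14.7, 16.4, 16.5, 18.9, 18.5]

n = len(stirrate)
xbar = sum(stirrate) / n
ybar = sum(impurity) / n
print(n, xbar, round(ybar, 4))         # 12 31.0 13.8667
```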
Figure 1: Plot of impurity versus stirrate

13.2 The statistical model for simple linear regression

In general, suppose that we have observed n pairs of values, (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where y is regarded as the observed value of the response variable Y and x as the regressor variable (or predictor variable or explanatory variable), so that Y is the dependent variable, and we wish to investigate how the values of Y depend upon the values of x. The simplest model is a linear one. Given the set of values x_i, i = 1, ..., n, regarded as fixed and observed without error, consider the linear regression model

Y_i = β_0 + β_1 x_i + ε_i,  i = 1, ..., n,   (1)

where β_0 and β_1 are unknown parameters. The random errors ε_i are assumed to be NID(0, σ²), with σ² unknown. We are now looking at the relationship of the (observed) response variable y to a quantitative factor, which takes numerical values x. Previously we dealt with a qualitative factor, in the form of a treatment, the different levels of which did not necessarily represent different numerical levels of some variable; even if they did, this was not taken into account in the underlying statistical model. The line with equation y = β_0 + β_1 x is known as the regression line. The regression coefficient β_1 is the slope of the regression line, and the regression coefficient β_0 (the constant) is the intercept of the line on the y-axis.
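The model (1) is easy to simulate, which can help build intuition for what the assumptions say: the x_i are fixed, and all the randomness enters through NID(0, σ²) errors. A minimal Python sketch (not part of the original notes; the parameter values are arbitrary illustrative choices):

```python
import random

random.seed(1)                           # reproducible illustration
beta0, beta1, sigma = -0.3, 0.46, 0.9    # arbitrary "true" parameter values

x = list(range(20, 43, 2))               # fixed design points, as in the example
# Y_i = beta_0 + beta_1 * x_i + eps_i, with eps_i ~ NID(0, sigma^2)
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

print(len(y))                            # one simulated response per fixed x_i
```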
13.3 The least squares estimates of the parameters

We shall use hatted Greek letters, β̂, for parameter estimators, and lower case Roman letters, b, for parameter estimates. Thus

                        coefficients    regression line
model parameters        β_0, β_1        y = β_0 + β_1 x
parameter estimators    β̂_0, β̂_1        y = β̂_0 + β̂_1 x
parameter estimates     b_0, b_1        y = b_0 + b_1 x

Given estimated parameter values, for each x_i the corresponding observed fitted value ŷ_i is given by ŷ_i = b_0 + b_1 x_i, and e_i ≡ y_i − ŷ_i is the corresponding observed residual. According to the method of least squares, given the observed values (x_i, y_i), i = 1, ..., n, we choose our parameter estimates, b_0 and b_1, to be those values of β_0 and β_1 that minimize

L = Σ_{i=1}^{n} (y_i − β_0 − β_1 x_i)².   (2)

In geometrical terms, given a scatter plot of the points (x_i, y_i), i = 1, ..., n, we choose our fitted regression line in such a way as to minimize the sum of squares of the vertical distances of the points from the line.

It is worth introducing some more notation at this stage. In what follows, all summations are from i = 1 to n. Denote the corrected sums of squares by

S_xx = Σ (x_i − x̄)²  and  S_yy = Σ (Y_i − Ȳ)².

Note that S_yy is the total (corrected) sum of squares in the ANOVA. The corrected sum of products, S_xy, is defined by

S_xy = Σ (x_i − x̄)(Y_i − Ȳ),

or, equivalently, by S_xy = Σ (x_i − x̄) Y_i. Note that, whereas S_xx and S_yy are necessarily non-negative, S_xy can take negative values. The observed values of S_yy and S_xy are denoted by s_yy and s_xy respectively. It turns out that the least squares estimates b_0 and b_1 of β_0 and β_1, respectively, are given by

b_1 = s_xy / s_xx   (3)
and

b_0 = ȳ − b_1 x̄.

Thus the equation of the fitted regression line, y = b_0 + b_1 x, can be written as

y = ȳ + b_1 (x − x̄).

This is the equation of the line with slope b_1 = s_xy / s_xx passing through the point (x̄, ȳ).

13.4 The partition of the total sum of squares

It turns out that the total sum of squares SS_T ≡ S_yy may be partitioned as

SS_T = SS_Reg + SS_R,   (4)

where SS_Reg is the regression sum of squares,

SS_Reg = β̂_1 S_xy,   (5)

and the residual sum of squares SS_R corresponds to the minimized value of L in Equation (2). The regression sum of squares SS_Reg may be interpreted as that part of the total sum of squares which is accounted for by the estimated regression. Given SS_T, the larger the value of SS_Reg and the smaller the value of SS_R, the better the fit of the estimated regression line.

We may test the null hypothesis H_0: β_1 = 0 against the alternative H_1: β_1 ≠ 0, which is a test for the absence of a linear relationship between the x and y variables. If β_1 = 0 then the regression model (1) reduces to

Y_i = β_0 + ε_i,  i = 1, ..., n,

so that the Y_i are assumed to be NID(β_0, σ²). In this case, the joint distribution of the Y_i does not depend upon the values of the x_i, so that the x_i have no predictive power. It may be shown that the two terms on the right hand side of Equation (4), SS_Reg and SS_R, are independently distributed. SS_R/σ² has the χ²_{n−2} distribution and, under H_0, SS_Reg/σ² has the χ²_1 distribution. Hence, under H_0, the ratio

F = MS_Reg / MS_R

has the F_{1,n−2} distribution. This statistic is used for a one-tail test of H_0. The calculations may be laid out in the form of the following ANOVA table. As in previous ANOVAs, a mean
square (MS) is obtained by dividing the corresponding sum of squares by its degrees of freedom, and Ŝ² ≡ MS_R is an unbiased estimator of the error variance σ².

ANOVA TABLE
Source      DF      SS              MS
Regression  1       β̂_1 S_xy       SS_Reg
Error       n − 2   by subtraction  Ŝ² = SS_R/(n − 2)
Total       n − 1   S_yy

13.5 Example (continued)

The regression analysis is carried out using the S+ function lm, where impurity is regressed against a constant (which is included by default) and stirrate, the data being drawn from the data frame paint.data. The functions summary and anova are then applied to the fitted model object paint.lm in order to obtain the corresponding parameter estimates and analysis of variance table.

> paint.lm <- lm(impurity ~ stirrate, data = paint.data)
> summary(paint.lm)

Call: lm(formula = impurity ~ stirrate, data = paint.data)
Residuals:
     Min      1Q  Median     3Q   Max
 -1.1835 -0.5432 -0.3233 0.8333  1.39

Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) -0.2893     1.2208 -0.2370   0.8172
   stirrate  0.4566     0.0384 11.8798   0.0000

Residual standard error: 0.9193 on 10 degrees of freedom
Multiple R-Squared: 0.9338
Adjusted R-squared: 0.9272
F-statistic: 141.1 on 1 and 10 degrees of freedom, the p-value is 3.2e-007

> anova(paint.lm)
Analysis of Variance Table
Response: impurity
Terms added sequentially (first to last)
           Df Sum of Sq  Mean Sq  F Value    Pr(F)
 stirrate   1  119.2752 119.2752 141.1304 3.2e-007
Residuals  10    8.4514   0.8451

The output shows that the coefficients of the fitted regression line are b_0 = -0.2893 and b_1 = 0.4566. We shall discuss in a future section some of the details of the calculation of the
associated standard errors and tests of significance. In the case of simple linear regression, the t-test for the coefficient β_1 (for stirrate) is equivalent to the F-test in the ANOVA. The p-value for both is 0.000, so clearly there is a very significant linear relationship. The p-value for the constant β_0 is not significant, but we do keep the constant term in the regression equation. We may also verify that, correct to two decimal places, the observed value of Ŝ = √MS_R is ŝ = √0.85 = 0.92.

13.6 Correlation and the coefficient of determination

From Equations (3) and (5),

SS_Reg = S_xy² / S_xx.

Hence

SS_Reg / SS_T = S_xy² / (S_xx S_yy) = r²,   (6)

where r is the sample correlation coefficient,

r = S_xy / √(S_xx S_yy).

r, which satisfies the inequalities −1 ≤ r ≤ 1, may be thought of as a measure of the strength of the linear relationship between the x_i and the y_i. The closer |r| is to the value 1, the stronger the relationship. But from Equation (6) it follows that r² may be characterized as the proportion of the total sum of squares accounted for by the regression. (r²_obs = 93.4% = 119.28/127.73 in our example.) It also follows from Equations (4) and (6) that

r² = 1 − SS_R / SS_T,   (7)

and it turns out that Equation (7) is the one that is the most appropriate for generalization to more general regression models and measures of fit. In general, define the coefficient of determination R² by

R² = SS_Reg / SS_T = 1 − SS_R / SS_T.

This quantity, which like r² is the proportion of the total sum of squares accounted for by the regression, may be regarded as a measure of the goodness of fit of the regression model. An alternative measure, which is often preferred, is the adjusted coefficient of determination R̄² (adjusted for the number of regressor variables, one in the case of simple linear regression),

R̄² = 1 − MS_R / MS_T,

where MS_T = SS_T/(n − 1). The significance of these quantities becomes apparent only when more complicated regression models are to be investigated.
S+ outputs these two coefficients as Multiple R-Squared and Adjusted R-squared, respectively.
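The least squares estimates, the ANOVA partition, and the two coefficients of determination can all be recomputed from first principles. A Python sketch (used in place of S+; the data are those of the industrial example):

```python
# Least squares fit and ANOVA partition for the paint data.
x = list(range(20, 43, 2))
y = [8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7, 16.4, 16.5, 18.9, 18.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                        # slope estimate, Equation (3)
b0 = ybar - b1 * xbar                 # intercept estimate

ss_t = syy                            # total sum of squares
ss_reg = b1 * sxy                     # regression SS, Equation (5)
ss_r = ss_t - ss_reg                  # residual SS, by subtraction
ms_r = ss_r / (n - 2)                 # unbiased estimator of sigma^2
f = ss_reg / ms_r                     # F-statistic on (1, n-2) d.o.f.

r2 = ss_reg / ss_t                    # coefficient of determination
adj_r2 = 1 - ms_r / (ss_t / (n - 1))  # adjusted coefficient

print(round(b1, 4), round(b0, 4))     # 0.4566 -0.2893
print(round(f, 1), round(r2, 3))      # 141.1 0.934
```

The printed values agree with the quantities quoted from the S+ output.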
In comparing the use of the F-statistic and R², we may recall that the F-statistic is used to investigate whether there is evidence of a linear relationship between the variables x and y. The value of R² is an indicator of the strength of that relationship. It is readily checked that, in the case of simple linear regression, the values of F and R² are related by the formula

R² = F / (n − 2 + F)

or, equivalently,

F = (n − 2) R² / (1 − R²).

It is possible to have a highly significant value of F together with a relatively low value of R² (if n is large) or a relatively large value of R² with a non-significant value of F (if n is small).

13.7 A test and confidence interval for the slope parameter

Recall that the least squares estimator β̂_1 of β_1 is given by

β̂_1 = S_xy / S_xx = Σ (x_i − x̄) Y_i / S_xx   (8)

and that in the regression model the x_i are regarded as fixed. So, on the right hand side of Equation (8), only the Y_i are random variables, independently and normally distributed. Since β̂_1 is a linear combination of normally distributed r.v.s, it follows that β̂_1 is also normally distributed. It may be shown that β̂_1 is an unbiased estimator of β_1, that is,

E[β̂_1] = β_1,

and that the variance of β̂_1 is given by

var(β̂_1) = σ² / S_xx.

Hence β̂_1 has the N(β_1, σ²/S_xx) distribution. We estimate the unknown error variance σ² by using the estimator Ŝ² ≡ MS_R from the ANOVA table. (In the S+ output, the estimate ŝ of σ is given by Residual standard error.) Thus ŝ/√s_xx is the observed standard error of b_1, and the t-statistic for testing H_0: β_1 = 0 is

T = β̂_1 √S_xx / Ŝ,

which under H_0 has the t_{n−2} distribution. We can verify from our S+ output that T_obs for β_1 is calculated as the ratio of the estimated coefficient to its standard error: 11.88 = 0.4566/0.0384. The above t-statistic satisfies

T² = β̂_1² S_xx / Ŝ² = MS_Reg / MS_R = F,
where F is the F-statistic calculated from the ANOVA. This fact is a special feature of simple linear regression and does not hold for more general regression models. In our example, F_obs = 141.13 = (11.88)² = T²_obs. It follows from the definitions of the distributions that the square of a random variable with a t_ν distribution has the F_{1,ν} distribution. The p-values of the above t-statistic and F-statistic are identical.

Given the value of b_1, a 100(1 − α)% observed confidence interval for β_1 is given by

b_1 ± t_{n−2, α/2} ŝ / √s_xx.

In our example we may calculate the 95% confidence interval for β_1 using S+.

## Direct calculation of the observed CI
# k1 is the upper 2.5% percentage point of the t-distn with 10 d.o.f.
# k2 is the half-length of the interval
# k3 is the estimated value of the slope
> k1 <- qt(0.975, 10)
> k2 <- k1 * 0.0384
> k3 <- 0.4566
> CI <- c(k3 - k2, k3 + k2)
> CI
[1] 0.3710 0.5422

Thus the confidence interval for β_1 is (0.37, 0.54).

13.8 Fitted values and analysis of residuals

Previously, we found that the fitted equation was of the form

y = −0.2893 + 0.4566 x.

The observed fitted values may be obtained for each of the stir rates in the data set using the function fitted.

> fitted.values <- fitted(paint.lm)
> fitted.values
      1      2       3       4       5       6       7       8       9      10      11      12
 8.8436 9.7569 10.6702 11.5835 12.4967 13.4100 14.3233 15.2366 16.1499 17.0632 17.9765 18.8897

Recall that the residuals ε̂_i are defined by

ε̂_i = Y_i − Ŷ_i = Y_i − β̂_0 − β̂_1 x_i,  i = 1, ..., n.

Given that β̂_1 is an unbiased estimator of β_1, it is easy to check that β̂_0 is an unbiased estimator of β_0. It follows that

E[ε̂_i] = E[Y_i] − E[β̂_0] − E[β̂_1] x_i = (β_0 + β_1 x_i) − β_0 − β_1 x_i = 0.
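The t-statistic, the identity T² = F, and the 95% confidence interval for β_1 can be checked numerically. A Python sketch, with the 97.5% point of the t-distribution on 10 d.o.f. hardcoded as 2.2281 to match qt(0.975, 10) in the S+ session:

```python
import math

# Refit the paint data from scratch so the block is self-contained.
x = list(range(20, 43, 2))
y = [8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7, 16.4, 16.5, 18.9, 18.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)
b1 = sxy / sxx
ss_r = syy - b1 * sxy
f = (b1 * sxy) / (ss_r / (n - 2))     # ANOVA F-statistic
s_hat = math.sqrt(ss_r / (n - 2))     # residual standard error

se_b1 = s_hat / math.sqrt(sxx)        # standard error of b1
t = b1 / se_b1                        # t-statistic for H0: beta_1 = 0

# T^2 equals the ANOVA F-statistic (special to simple linear regression)
print(round(t, 2), round(abs(t ** 2 - f), 8))   # 11.88 0.0

t_crit = 2.2281                       # qt(0.975, 10)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(tuple(round(v, 3) for v in ci))           # (0.371, 0.542)
```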
A more detailed analysis shows that

var(ε̂_i) = (1 − h_i) σ²,  i = 1, ..., n,

where h_i is the leverage of the i-th observation,

h_i = 1/n + (x_i − x̄)² / S_xx,  i = 1, ..., n.   (9)

Hence the standardized residuals D_i are defined by

D_i = ε̂_i / (Ŝ √(1 − h_i)),  i = 1, ..., n.

If the assumptions of the regression model are correct, the standardized residuals are approximately NID(0, 1). The leverage h_i of the i-th observation as defined in Equation (9) depends only on the value x_i of the predictor variable and not on the value y_i of the response variable. The leverage h_i may be regarded as a measure of the remoteness of the value x_i of the predictor variable for the i-th observation from the sample mean x̄ of all n observed values of the predictor variable. It is always the case for simple linear regression that

1/n ≤ h_i ≤ 1,  i = 1, ..., n,  and  Σ h_i = 2,

so that h̄ = 2/n. If h_i is large then the corresponding observation may be highly influential in determining the estimated regression coefficients. There are situations in which removal of an observation with large leverage from the data set can result in drastic changes in the estimates of the regression coefficients. So observations with large leverage should be treated with caution. We can obtain a list of the leverage values and the standardized residuals by using the commands lm.influence() and (upon invoking library(MASS) first) stdres(), respectively. As a benchmark, we might consider an h_i greater than, say, 3 times the average (or very close to 1), which equates to 0.5 in our example, as high (suggesting the corresponding predictor value is unusual), and a standardized residual d_i satisfying |d_i| > 2 to be high (suggesting the corresponding response is unusual).

> leverages <- lm.influence(paint.lm)$hat
> library(MASS)
> std.residuals <- stdres(paint.lm)
> diagnostics <- data.frame(leverages, std.residuals)
> diagnostics
   leverages std.residuals
 1    0.2949       -0.5746
 2    0.2249       -0.3174
 3    0.1690        1.3482
 4    0.1270       -1.3778
 5    0.0991        0.9205
 6    0.0851        1.5807
 7    0.0851       -1.2774
 8    0.0991       -0.6149
 9    0.1270        0.2912
10    0.1690       -0.6720
11    0.2249        1.1411
12    0.2949       -0.5049
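The leverages and standardized residuals can be computed from first principles, checking the facts quoted above (Σ h_i = 2, the 0.5 leverage benchmark, and whether any |d_i| exceeds 2). A Python sketch using the paint data:

```python
import math

x = list(range(20, 43, 2))
y = [8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7, 16.4, 16.5, 18.9, 18.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)
b1 = sxy / sxx
b0 = ybar - b1 * xbar
s_hat = math.sqrt((syy - b1 * sxy) / (n - 2))    # residual standard error

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]          # Equation (9)
std_resid = [e / (s_hat * math.sqrt(1 - h)) for e, h in zip(resid, lev)]

print(round(sum(lev), 6))                        # 2.0: leverages sum to 2
print(round(max(lev), 3))                        # 0.295, below the 0.5 benchmark
print(round(max(abs(d) for d in std_resid), 2))  # 1.58, so no |d_i| exceeds 2
```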
Nothing untoward appears in the above S+ output.

13.9 Prediction

One of the reasons for carrying out a linear regression analysis may be that, in future, given an x-value, we wish to be able to predict the corresponding y-value, using the fitted regression equation, so that

Ŷ = β̂_0 + β̂_1 x.   (10)

Assuming the validity of the linear regression model, for the given x-value, the actual y-value will be given by

Y = β_0 + β_1 x + ε,

where, as before, the error term ε is assumed to have the N(0, σ²) distribution. Hence

E[Y] = β_0 + β_1 x

and

Y = E[Y] + ε.   (11)

The Ŷ defined in Equation (10) may be regarded in two ways, either as an estimator of E[Y] (the long-term average of all y-values for the given x-value) or as a predictor of y (one particular y-value for the given x-value). In the latter case, there are two sources of error in accounting for the difference between an observed value of Y, i.e. y, and the predicted value ŷ: one due to using the estimators β̂_0 and β̂_1 instead of the actual parameter values β_0 and β_1, and the other due to the presence of the error term ε. Since β̂_1 is an unbiased estimator of β_1 and β̂_0 is an unbiased estimator of β_0, from Equation (10),

E[Ŷ] = E[β̂_0 + β̂_1 x] = β_0 + β_1 x = E[Y].

Thus Ŷ is an unbiased estimator of E[Y] and an unbiased predictor of Y. From Equation (10), var(Ŷ) = var(β̂_0 + β̂_1 x), which turns out to be given by

var(Ŷ) = (1/n + (x − x̄)²/S_xx) σ².   (12)
Additionally, using Equation (11), var(Ŷ − Y) is equal to

var(Ŷ) + var(ε) = var(Ŷ) + σ²,   (13)

i.e.

(1 + 1/n + (x − x̄)²/S_xx) σ².

As before, we estimate σ² by Ŝ² ≡ MS_R from the ANOVA table. A 100(1 − α)% observed confidence interval for E[Y] is given by

b_0 + b_1 x ± t_{n−2, α/2} √(1/n + (x − x̄)²/s_xx) ŝ.

A 100(1 − α)% observed prediction interval for the value of y is given by

b_0 + b_1 x ± t_{n−2, α/2} √(1 + 1/n + (x − x̄)²/s_xx) ŝ.

S+ refers to the quantity

√(1/n + (x − x̄)²/s_xx) ŝ

as se.fit, the standard error of the fit. Note how the widths of the confidence and prediction intervals depend on the distance of x from x̄. The prediction interval is wider than the confidence interval. If the regression equation has been fitted using x-values in some interval A and appears to provide a good representation of the relationship between x and y in A, we should be wary of extrapolating this equation to make predictions for x-values outside A, as the linear relationship between x and y may not hold outside A.

13.10 Example (continued)

We use the function predict in S+ to obtain predicted values and their standard errors. We construct a data frame x whose variable name is that of the regressor variable, stirrate, and which contains the values of the regressor variable for which we wish to make predictions. In the present case, we shall use the single value of 41. The first argument of the predict function is the object paint.lm that corresponds to our model and the second argument is the data frame x that contains the values of the regressor variable for which we wish to make predictions. The argument se.fit = TRUE is required so that we obtain standard errors for our predictions and so that, subsequently, we can use the function pointwise to produce confidence intervals.
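Both interval formulas can be evaluated directly at a chosen x-value. The Python sketch below takes x = 41 for illustration, with qt(0.975, 10) hardcoded as 2.2281:

```python
import math

x = list(range(20, 43, 2))
y = [8.4, 9.5, 11.8, 10.4, 13.3, 14.8, 13.2, 14.7, 16.4, 16.5, 18.9, 18.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)
b1 = sxy / sxx
b0 = ybar - b1 * xbar
s_hat = math.sqrt((syy - b1 * sxy) / (n - 2))

x0 = 41                                   # illustrative prediction point
fit = b0 + b1 * x0
se_fit = s_hat * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)       # se.fit
se_pred = s_hat * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)  # for a new y
t_crit = 2.2281                           # qt(0.975, 10)

ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)    # CI for E[Y]
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)  # PI for a new y
print(round(fit, 2))                                   # 18.43
print(tuple(round(v, 2) for v in ci))                  # (17.39, 19.47)
print(tuple(round(v, 2) for v in pi))                  # (16.14, 20.73)
```

As expected, the prediction interval is noticeably wider than the confidence interval.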
In the output, the term residual.scale refers to the value of ŝ. Given this and the value of the standard error of the fit, we may, if desired, calculate the prediction interval as defined above, in addition to the confidence interval produced by the function pointwise.

> x <- data.frame(stirrate = 41)
> predict.impurity <- predict(paint.lm, x, se.fit = TRUE)
> predict.impurity
$fit:
       1
 18.4331
$se.fit:
      1
 0.4671
$residual.scale:
[1] 0.9193
$df:
[1] 10
> pointwise(predict.impurity, 0.95)
$upper:
       1
 19.4739
$fit:
       1
 18.4331
$lower:
       1
 17.3923
0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#
More informationSTAT 215 Confidence and Prediction Intervals in Regression
STAT 215 Confidence and Prediction Intervals in Regression Colin Reimer Dawson Oberlin College 24 October 2016 Outline Regression Slope Inference Partitioning Variability Prediction Intervals Reminder:
More informationREGRESSION ANALYSIS AND INDICATOR VARIABLES
REGRESSION ANALYSIS AND INDICATOR VARIABLES Thesis Submitted in partial fulfillment of the requirements for the award of degree of Masters of Science in Mathematics and Computing Submitted by Sweety Arora
More informationR 2 and F -Tests and ANOVA
R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.
More informationy n 1 ( x i x )( y y i n 1 i y 2
STP3 Brief Class Notes Instructor: Ela Jackiewicz Chapter Regression and Correlation In this chapter we will explore the relationship between two quantitative variables, X an Y. We will consider n ordered
More informationRegression Models - Introduction
Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent
More informationCorrelation Analysis
Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the
More informationLeverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.
Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response
More informationLecture 15. Hypothesis testing in the linear model
14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma
More informationChapter 16. Simple Linear Regression and dcorrelation
Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will
More informationUNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75
More informationSimple linear regression
Simple linear regression Biometry 755 Spring 2008 Simple linear regression p. 1/40 Overview of regression analysis Evaluate relationship between one or more independent variables (X 1,...,X k ) and a single
More informationSCHOOL OF MATHEMATICS AND STATISTICS
RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester
More informationMODULE 4 SIMPLE LINEAR REGRESSION
MODULE 4 SIMPLE LINEAR REGRESSION Module Objectives: 1. Describe the equation of a line including the meanings of the two parameters. 2. Describe how the best-fit line to a set of bivariate data is derived.
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationSTAT 511. Lecture : Simple linear regression Devore: Section Prof. Michael Levine. December 3, Levine STAT 511
STAT 511 Lecture : Simple linear regression Devore: Section 12.1-12.4 Prof. Michael Levine December 3, 2018 A simple linear regression investigates the relationship between the two variables that is not
More informationy ˆ i = ˆ " T u i ( i th fitted value or i th fit)
1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u
More informationExercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer
Solutions to Exam in 02402 December 2012 Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer 3 1 5 2 5 2 3 5 1 3 Exercise IV.2 IV.3 IV.4 V.1
More informationInference for Regression Inference about the Regression Model and Using the Regression Line
Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7
MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationMultiple Regression Introduction to Statistics Using R (Psychology 9041B)
Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment
More informationMa 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA
Ma 3/103: Lecture 25 Linear Regression II: Hypothesis Testing and ANOVA March 6, 2017 KC Border Linear Regression II March 6, 2017 1 / 44 1 OLS estimator 2 Restricted regression 3 Errors in variables 4
More informationRegression. Bret Hanlon and Bret Larget. December 8 15, Department of Statistics University of Wisconsin Madison.
Regression Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison December 8 15, 2011 Regression 1 / 55 Example Case Study The proportion of blackness in a male lion s nose
More informationSix Sigma Black Belt Study Guides
Six Sigma Black Belt Study Guides 1 www.pmtutor.org Powered by POeT Solvers Limited. Analyze Correlation and Regression Analysis 2 www.pmtutor.org Powered by POeT Solvers Limited. Variables and relationships
More informationChapter 4 Describing the Relation between Two Variables
Chapter 4 Describing the Relation between Two Variables 4.1 Scatter Diagrams and Correlation The is the variable whose value can be explained by the value of the or. A is a graph that shows the relationship
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationdf=degrees of freedom = n - 1
One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:
More informationCoefficient of Determination
Coefficient of Determination ST 430/514 The coefficient of determination, R 2, is defined as before: R 2 = 1 SS E (yi ŷ i ) = 1 2 SS yy (yi ȳ) 2 The interpretation of R 2 is still the fraction of variance
More informationTMA4255 Applied Statistics V2016 (5)
TMA4255 Applied Statistics V2016 (5) Part 2: Regression Simple linear regression [11.1-11.4] Sum of squares [11.5] Anna Marie Holand To be lectured: January 26, 2016 wiki.math.ntnu.no/tma4255/2016v/start
More informationLectures on Simple Linear Regression Stat 431, Summer 2012
Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population
More informationLecture 2. The Simple Linear Regression Model: Matrix Approach
Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution
More informationChapter 2 Inferences in Simple Linear Regression
STAT 525 SPRING 2018 Chapter 2 Inferences in Simple Linear Regression Professor Min Zhang Testing for Linear Relationship Term β 1 X i defines linear relationship Will then test H 0 : β 1 = 0 Test requires
More informationInference for the Regression Coefficient
Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates
More informationCorrelation. Bivariate normal densities with ρ 0. Two-dimensional / bivariate normal density with correlation 0
Correlation Bivariate normal densities with ρ 0 Example: Obesity index and blood pressure of n people randomly chosen from a population Two-dimensional / bivariate normal density with correlation 0 Correlation?
More information2. Outliers and inference for regression
Unit6: Introductiontolinearregression 2. Outliers and inference for regression Sta 101 - Spring 2016 Duke University, Department of Statistical Science Dr. Çetinkaya-Rundel Slides posted at http://bit.ly/sta101_s16
More informationCAS MA575 Linear Models
CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers
More informationLAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION
LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION In this lab you will learn how to use Excel to display the relationship between two quantitative variables, measure the strength and direction of the
More informationLecture 1 Linear Regression with One Predictor Variable.p2
Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of
More information