Linear Regression

1 Linear Regression

While linear regression has limited value in the classification problem, it is often very useful in predicting a numerical response on an interval or ratio scale. Furthermore, it is simple. Further-furthermore, performing linear regression analysis involves many important principles of data analysis. So it's a good place to start.

2 The Model

In the basic problem, we have some predictor variables and a response. Let p be the number of predictors. We have n observations on all of these. So we have an n × p matrix of predictors; call it X. And we have an n-vector of numerical responses; call it Y. A model for this situation is

Y ≈ f(X),

where f is some function. In linear regression, we have

Y ≈ Xβ,

where β is an unknown vector.

3 The Model

Instead of Y ≈ Xβ, let's write

Y = Xβ + ɛ,

where ɛ is an n-vector of latent variables we'll call errors. We treat ɛ as a random vector. Our problem is to find β.

4 Assumptions about the Error Term in Linear Regression

We make some simple assumptions about the distribution of the random vector ɛ. At the least, we assume

E(ɛ) = 0,  V(ɛ) = σ²I,

where I is the identity matrix. We often also add an additional assumption: we may assume that ɛ has a multivariate normal (Gaussian) distribution. As we discuss linear regression further, it is important to recognize whether or not we make the additional assumption of normality (for example, for t tests).

5 Simple Linear Regression

Just as regression in general can be used to illustrate many of the principles of modeling in data science, the simple linear model can be used to illustrate many of the principles and techniques of multiple linear regression. It's easy to visualize. The model, one observation at a time, is

yᵢ = β₀ + β₁xᵢ + ɛᵢ,

where we assume, at the least,

E(ɛᵢ) = 0 for all i,
V(ɛᵢ) = σ² for all i (finite and constant),
Cov(ɛᵢ, ɛⱼ) = 0 for all i, j with j ≠ i.

(These are the same as on the previous slide, except these are written in terms of the individual elements of the vector.) We may also make the further assumption that the errors have a normal distribution.

6 Notation

What is X if we write the model as y = Xβ + ɛ? This is always a problem. Should we write y = β₀ + xᵀβ + ɛ? How do we write the constant (the intercept)? I don't have an opinion; I just want you to be aware of the possibilities. Different authors do it differently, and I even do it differently at different times. There is an interesting property of a least-squares fit that is relevant to this issue, as we will see.

7 Simulated Data

Simulation of data is very useful, both for teaching and for research. In research, it allows us to study various scenarios (a "Monte Carlo" study). In teaching, it just gives us some numbers and pictures to look at. Let's simulate some data for simple linear regression.

8 Simulated Data

We'll generate some data from this model:

yᵢ = β₀ + β₁xᵢ + ɛᵢ.

First, we need to decide on what's random. Only ɛᵢ. Now we need to decide on a probability distribution for ɛᵢ.

9 Simulated Data

The distribution must be consistent with the three assumptions mentioned earlier. Independence ⟹ Cov = 0, so that's easy to satisfy. Let's use ɛᵢ iid N(0, σ²). The mean is 0 and the variance is finite and constant. Now, what about the xᵢ? Let's just let them be randomly distributed over (0, 10). Let β₀ = 0.8 and β₁ = 1.2.

10 Simulate the Data in R and Plot It

set.seed(555)
beta0 <- 0.8
beta1 <- 1.2
xlo <- 0
xhi <- 10
n <- 20
eps <- rnorm(n)
x <- runif(n, xlo, xhi)
y <- beta0 + beta1*x + eps
plot(x, y, main="Simulated Data")

Save the plot. (savePlot is specific to MS Windows.)

setwd("c:/isye6740_course/l03")
savePlot("Fig01L030505", type="ps")

11 Simulated Data

[Figure: scatter plot of the simulated data, y versus x]

12 How the Data Compare to the (Known) Model

Plot the line representing the true model, and draw lines representing the errors.

abline(beta0, beta1, col="green")
title("True Model")
for (i in 1:n) lines(c(x[i], x[i]), c(y[i], beta0 + beta1*x[i]), col="green")

Save plot.

13 True Model

[Figure: the data with the true regression line and the errors drawn as vertical segments]

14 Fitting the Data

Now, suppose we have the data, but don't know β₀ and β₁. This is the usual case, of course. Let's just try a line with β₀ = 1 and β₁ = 1. The plot function starts a new graph.

plot(x, y)
b0 <- 1
b1 <- 1
abline(b0, b1, col="red")
title(expression(paste("Model with ", beta[0], "=1, ", beta[1], "=1")))
for (i in 1:n) lines(c(x[i], x[i]), c(y[i], b0 + b1*x[i]), col="red")

Save plot.

15 Model with β₀ = 1, β₁ = 1

[Figure: the data with the trial line β₀ = 1, β₁ = 1 and its residuals]

16 Fitting the Data

Doesn't look good. The residuals are bad: they are not balanced. Look at the sum of the squares of the residuals:

sum((y - (b0 + b1*x))^2)

This is called the residual sum of squares, RSS, for this model, that is, for these two values of β₀ and β₁.

17 RSS for the (Unknown) True Model

What about the sum of the squared residuals for the true model (which we don't know)?

sum((y - (beta0 + beta1*x))^2)

The sum of squares provides a comparative indication of how well a model fits the data.

18 Least Squares Fit

OK. Let's fit the model by determining β₀ and β₁ in such a way as to make the sum of squared residuals small. Least squares. An R function that will do this is lm. It generates an R object, which I will name fit, that has all kinds of information in it.

19 Least Squares Fit

Here's what my R console looks like:

> fit <- lm(y~x)
> summary(fit)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                    *
x                                       e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 18 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 249 on 1 and 18 DF, p-value: 5.502e-12

20 Least Squares Fit

We can see what kinds of information are in the fit object using the function names.

> names(fit)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"

We can access individual pieces of the fit object using the extractor $. For example,

> fit$coefficients
(Intercept)           x

We can abbreviate the names; coef, for example.

21 Least Squares Fit

What is the RSS for this fit?

> b0hat <- fit$coef[1]
> b1hat <- fit$coef[2]
> sum((y - (b0hat + b1hat*x))^2)

Even better than the true model (which, of course, we don't know). That's because it's a least squares fit.

22 Least Squares Fit

Plot it, as before. Also plot the mean point (x̄, ȳ).

plot(x, y)
abline(b0hat, b1hat, col="blue")
title("Least Squares Fit")
for (i in 1:n) lines(c(x[i], x[i]), c(y[i], b0hat + b1hat*x[i]), col="blue")
points(mean(x), mean(y), pch="+", cex=1.5, col="red")
legend("bottomright", c("mean"), pch="+", col="red")

23 Least Squares Fit

[Figure: the least squares fit with its residuals; the mean point (x̄, ȳ) is marked with +]

24 Properties of the Least Squares Estimators

First, what are the estimators? Differentiate

Σ(yᵢ − (b₀ + b₁xᵢ))²

with respect to b₀ and b₁ and set the derivatives equal to 0. Get two equations. (Exercise.) Solve them and call the solution β̂₀ and β̂₁. How do you know the solution of the equations is a minimum?
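As a sketch of that exercise, the two derivatives, set equal to 0, are

∂/∂b₀ Σ(yᵢ − b₀ − b₁xᵢ)² = −2 Σ(yᵢ − b₀ − b₁xᵢ) = 0,
∂/∂b₁ Σ(yᵢ − b₀ − b₁xᵢ)² = −2 Σ xᵢ(yᵢ − b₀ − b₁xᵢ) = 0.

The matrix of second derivatives is 2 times the matrix with rows (n, Σxᵢ) and (Σxᵢ, Σxᵢ²), which is positive definite whenever the xᵢ are not all equal, so the solution is indeed a minimum.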

25 Properties of the Least Squares Estimators

The solutions, which are our estimators, written in terms of xᵢ − x̄ and yᵢ − ȳ, are

β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,
β̂₀ = ȳ − β̂₁x̄.

We see the first important property: the least-squares line goes through (x̄, ȳ). Therefore, you often see the derivation in terms of xᵢ − x̄ and yᵢ − ȳ. This is a result, not an a priori requirement.
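We can check these formulas numerically against lm on the simulated data (a quick sketch; it assumes the x and y vectors from the earlier simulation are still in the workspace):

b1hat.manual <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0hat.manual <- mean(y) - b1hat.manual * mean(x)
c(b0hat.manual, b1hat.manual)   # the hand-computed estimates
coef(lm(y ~ x))                 # should agree up to rounding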

26 Properties of the Least Squares Estimators

From the expressions for β̂₀ and β̂₁ we can figure out all kinds of things. If we assume that the distribution of ɛᵢ is normal, from our original model we have

yᵢ ~ N(β₀ + β₁xᵢ, σ²).

Likewise, we can work out the distribution of ȳ.

27 Properties of the Least Squares Estimators

From the distribution of yᵢ and ȳ, we can get the distribution of β̂₀ and β̂₁. It is also easy to work out any property, such as E(β̂₁), for example:

E(β̂₁) = E[ Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² ]
      = [1/Σ(xᵢ − x̄)²] E[ Σ(xᵢ − x̄)(yᵢ − ȳ) ]
      = [1/Σ(xᵢ − x̄)²] Σ(xᵢ − x̄)(E(yᵢ) − E(ȳ))
      = [1/Σ(xᵢ − x̄)²] Σ(xᵢ − x̄)(β₀ + β₁xᵢ − β₀ − β₁x̄)
      = [1/Σ(xᵢ − x̄)²] β₁ Σ(xᵢ − x̄)(xᵢ − x̄)
      = β₁.

28 Properties of the Least Squares Estimators

So we see that β̂₁ is an unbiased estimator of β₁. Likewise, we can see that β̂₀ is an unbiased estimator of β₀. (Exercise: work that out.) In similar fashion, we can work out the variances of β̂₀ and β̂₁. We can even work out their distributions. They are normal under our assumption that the ɛᵢ are normal.

29 Summary of Properties Using Matrix Notation

We will use the full model with intercept (that is, where X has a column of 1's):

y = Xβ + ɛ.

Sum of squared residuals: (y − Xβ)ᵀ(y − Xβ)
Derivative set equal to 0: XᵀXβ − Xᵀy = 0
Solution: β̂ = (XᵀX)⁻¹Xᵀy
Expected value: E(β̂) = (XᵀX)⁻¹XᵀE(y) = (XᵀX)⁻¹XᵀXβ = β
Predicted values: ŷ = X(XᵀX)⁻¹Xᵀy
The matrix X(XᵀX)⁻¹Xᵀ is called the hat matrix.
Residuals: r = (I − X(XᵀX)⁻¹Xᵀ)y
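These matrix formulas are easy to verify numerically on the simulated data (a sketch; solve() is used here for clarity, not because it is the right numerical algorithm):

X <- cbind(1, x)                   # design matrix with a column of 1's
XtX <- t(X) %*% X
betahat <- solve(XtX, t(X) %*% y)  # solves the normal equations
H <- X %*% solve(XtX) %*% t(X)     # the hat matrix
yhat <- H %*% y                    # predicted values
r <- y - yhat                      # residuals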

30 R Functions

After the function lm has been used, and an object of class lm has been created, various things can be extracted from that object. Suppose we have used a statement like

myfit <- lm(...

Confidence intervals for the coefficients are given by confint(myfit). The predicted values (a vector) are given by predict(myfit). The residuals are given by residuals(myfit). The studentized residuals (what are they?) are given by rstudent(myfit). The values along the diagonal of the hat matrix are given by hatvalues(myfit).

31 Interesting Fact about Residuals

Xᵀr = Xᵀ(I − X(XᵀX)⁻¹Xᵀ)y
    = (Xᵀ − XᵀX(XᵀX)⁻¹Xᵀ)y
    = (Xᵀ − Xᵀ)y
    = 0.

So the residuals are orthogonal to every column of X. That means they have zero correlation with each X variable. And since X contains a column of 1's, it also means the sum of the residuals is 0.
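Using the X and r computed in the sketch above, we can see the orthogonality numerically:

t(X) %*% r   # essentially 0 for every column of X (up to rounding)
sum(r)       # essentially 0, since X contains a column of 1's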

32 Testing Hypotheses Concerning the Coefficient Estimates

Knowing the distributions of β̂₀ and β̂₁ allows us to develop a test of the hypothesis

H₀: β₁ = 0

versus the alternative

Hₐ: β₁ ≠ 0,

for example; it is a t test. It uses the standard error. (Review: What's a t test? What's a standard error?) If we do not reject this hypothesis, that is, if β₁ = 0 in the model, there's really no linear relation between the predictor and the response.

33 t Tests

The results of the t tests are shown in the summary for fit on a previous slide:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                                    *
x                                       e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Can you interpret these? Would we reject the null hypotheses?

34 The Accuracy of the Model

Even if we reject the null hypothesis, and conclude that there is a linear relation between the predictor and the response, the question still remains of how well the model fits the data. As we have seen, the RSS tells us which of two models fits better. To make it more meaningful, however, we need to standardize it in some way. First of all, it is clear that RSS gets bigger the more observations we have, so it would be a good idea to scale it back by n, the sample size.

35 The MSE and the RSE

It can be proven (we won't do that here) that the mean squared error,

MSE = RSS / (n − p),

is an unbiased estimator of σ² (assuming all assumptions for the model are satisfied). We call the square root of the MSE the residual standard error, or RSE:

RSE = √(RSS / (n − p)).

While MSE is unbiased for σ², RSE is not unbiased for σ.
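A quick numerical check (a sketch; here p = 2, counting the intercept and the slope, and fit is the lm object from earlier):

rss <- sum(residuals(fit)^2)
sqrt(rss / (n - 2))     # the RSE by hand
summary(fit)$sigma      # the "Residual standard error" reported by R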

36 R²

Another way of assessing the accuracy of the model is by comparing the RSS with the total variation, which we measure by the total sum of squares, TSS, that is, the sum of the squared deviations of the response from its mean, without the linear model:

TSS = Σ(yᵢ − ȳ)².

Obviously, if RSS is almost as large as TSS, the model did not explain very much of the variation in the response. A measure of how much variation it does explain is called R². It is defined as

R² = 1 − RSS/TSS.
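Continuing the sketch above, R² by hand:

tss <- sum((y - mean(y))^2)
1 - rss / tss              # matches Multiple R-squared in summary(fit)
summary(fit)$r.squared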

37 R²

It ranges from 0 to 1. (Remember ȳ = β̂₀ + β̂₁x̄, and RSS is the minimum of anything of the form Σ(yᵢ − b₀ − b₁xᵢ)², so RSS ≤ TSS.) It is often expressed as a percentage. For good fits the value of R² varies widely depending on the application. In the social sciences, an R² of 60% may indicate a very good-fitting model. In engineering applications, an R² less than 90% may indicate a poor-fitting model.

38 Other Criteria for Fitting the Simple Linear Regression Model

We chose to fit the model

yᵢ = β₀ + β₁xᵢ + ɛᵢ

by finding values of β₀ and β₁ that minimize

Σ(yᵢ − β₀ − β₁xᵢ)².

Least squares. The properties of the least-squares estimators that we have discussed, which we can use to test hypotheses and make other types of inference, depend on the way we obtained these estimators. It might make sense to use another criterion for fitting the regression equation. For example, we may choose to find values of β₀ and β₁ that minimize

Σ|yᵢ − β₀ − β₁xᵢ|^p,

for some p ≥ 1. These are called Lₚ estimators.

39 L₁ Estimators for the Simple Linear Regression Model

We want to minimize the sum of the absolute values of the residuals:

Σ|yᵢ − β₀ − β₁xᵢ|.

The R function l1fit in the L1pack package does this.

library(L1pack)
plot(x, y)
fitL1 <- l1fit(x, y)
b0L1 <- fitL1$coef[1]
b1L1 <- fitL1$coef[2]
abline(b0L1, b1L1, col="blue")
title("Least Absolute Values Fit")
for (i in 1:n) lines(c(x[i], x[i]), c(y[i], b0L1 + b1L1*x[i]), col="blue")
points(mean(x), mean(y), pch="+", col="red")

40 Least Absolute Values Fit

[Figure: the least absolute values fit with its residuals]

41 [Figure: the LS and LAV fits plotted together for comparison]

42 The Principles and Procedures of Simple Linear Regression

The methods we have used and illustrated in the simple regression model apply to other regression models. A basic procedure in any analysis is to partition the total variation into two parts, explained and residual. We can measure variation in different ways, such as squared deviations or absolute deviations, for example.

43 The Principles and Procedures of Simple Linear Regression and Least Squares

The most common measure of variation is the squared deviation from a mean. We partition the variation as

TSS = (TSS − RSS) + RSS = explained + residual.

We observed several things about least squares in simple linear regression:
- The coefficient estimators are unbiased.
- The MSE is unbiased for the error variance.
- The least squares fit goes through the mean point (x̄, ȳ).

44 The Roles of Training and Test Data

Given a model and some data, we use the data to train the model (to fit it). The best fit is obtained by using all of the data. After fitting the model, we can look at things like R², but those really don't tell us if the model is good. Suppose we are unsure what kind of model to use. Use some of the data to fit it (the "training data") and hold back some to see how well the fitted model fits this "test data". In most applications of machine learning, the use of two subgroups of the data, the training set and the test set, is important. It is not done so often in regression analysis, but the idea is valid and it is straightforward, as the sketch below shows.
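A minimal sketch on our simulated data (the 70/30 split and the seed are arbitrary choices for illustration):

set.seed(666)                          # arbitrary seed, just for reproducibility
train <- sample(n, floor(0.7 * n))     # indices of the training observations
fit.tr <- lm(y ~ x, subset = train)    # fit on the training data only
pred <- predict(fit.tr, newdata = data.frame(x = x[-train]))
mean((y[-train] - pred)^2)             # mean squared error on the test data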

45 Multiple Linear Regression

The model is

y = Xβ + ɛ,

where y is an n-vector, X is an n × p matrix, ɛ is an n-vector, and V(ɛ) = σ²I. Recall the possible ambiguities regarding the intercept, that is, a column of 1's in X. Remember the difference between multiple linear regression and multivariate linear regression.

46 Multiple Linear Regression

All of the things we did with the simple linear regression model apply to this. For least squares, we minimize

(y − Xβ)ᵀ(y − Xβ).

We expand this, differentiate with respect to β, and set the result equal to 0. We get a system of p equations in p unknowns:

XᵀXβ − Xᵀy = 0.

These are called the normal equations.

47 Least-Squares Estimators and Their Properties

These are very similar to those of the simple linear regression model. Remember, there are other kinds of estimators! But the LS estimators are easiest and also have some of the nicest properties. The least-squares estimators are solutions to the normal equations:

β̂ = (XᵀX)⁻¹Xᵀy,

if XᵀX is nonsingular, that is, if the rank of X is p; otherwise,

β̂ = (XᵀX)⁺Xᵀy.

(Pseudoinverse; in simple linear regression, this would be like the two normal equations being the same, which happens iff x is constant.)

48 Least-Squares Estimators and Their Properties

Let's assume the rank of X is p ("full rank"), so we will write β̂ = (XᵀX)⁻¹Xᵀy. Comment: this is not the way to compute the estimators; the computer algorithms are different. Because y ~ N(Xβ, σ²I),

β̂ ~ N( (XᵀX)⁻¹XᵀXβ, σ²(XᵀX)⁻¹Xᵀ((XᵀX)⁻¹Xᵀ)ᵀ ).

Simplify:

β̂ ~ N(β, σ²(XᵀX)⁻¹).

This tells us everything we need to know to make inferences about the data-generating process: test hypotheses, etc.
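We can check the covariance formula numerically against R's vcov function (a sketch, using the simple-regression fit from earlier):

X <- cbind(1, x)
s2 <- summary(fit)$sigma^2        # estimate of sigma^2
s2 * solve(t(X) %*% X)            # estimated covariance of betahat, by hand
vcov(fit)                         # should agree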

49 Statistical Inference

We can do the same kinds of things we did with the simple linear regression model:

yᵢ = β₀ + β₁x₁ᵢ + ⋯ + βₚxₚᵢ + ɛᵢ.

We can test individual hypotheses, such as H₁₀: β₁ = 0 or H₂₀: β₂ = 0. Each is a t test. The problem is that the tests are not independent of each other.

50 Statistical Inference

We might consider an hypothesis about all the βs at once:

H₀: β₁ = ⋯ = βₚ = 0; that is, β = 0,

versus

Hₐ: βⱼ ≠ 0 for at least one βⱼ; that is, β ≠ 0.

This requires a different kind of test (because the distribution of β̂ is different; it is multivariate normal). An F test.

51 Statistical Inference

F = [ (TSS − RSS)/p ] / [ RSS/(n − p − 1) ].

(What's the 1 for?) F with p and n − p − 1 degrees of freedom. To test just q of the coefficients, fit the model without those variables and get the RSS; call it RSS₀. Then form

F = [ (RSS₀ − RSS)/q ] / [ RSS/(n − p − 1) ],

F with q and n − p − 1 degrees of freedom.

52 Prediction

For a specific set of X values, say x₀ (this is a p-vector),

ŷ = x₀ᵀβ̂.

What's the expected value? It is unbiased:

E(ŷ) = E(x₀ᵀβ̂) = x₀ᵀE(β̂) = x₀ᵀβ.

What's the variance?

V(ŷ) = V(x₀ᵀβ̂) = x₀ᵀV(β̂)x₀ = x₀ᵀ(XᵀX)⁻¹x₀ σ².
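A sketch checking the variance formula at a hypothetical point, x = 5, for the simple-regression fit:

x0 <- c(1, 5)                          # intercept term and x = 5
X <- cbind(1, x)
sqrt(t(x0) %*% solve(t(X) %*% X) %*% x0) * summary(fit)$sigma
predict(fit, newdata = data.frame(x = 5), se.fit = TRUE)$se.fit   # should agree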

53 Simulate Data in R and Fit It

set.seed(555)
beta0 <- 0.8
beta1 <- 1.2
beta2 <- 2.8
beta3 <- 4.2
xlo <- 0
xhi <- 10
n <- 20
eps <- rnorm(n)
x1 <- runif(n, xlo, xhi)
x2 <- runif(n, xlo, xhi)
x3 <- runif(n, xlo, xhi)
y <- beta0 + beta1*x1 + beta2*x2 + beta3*x3 + eps
fit3 <- lm(y ~ x1 + x2 + x3)

54 Fit of Model with Simulated Data

> summary(fit3)

Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
x1                                      e-10 ***
x2                                      e-15 ***
x3                                     < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 16 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 1679 on 3 and 16 DF, p-value: < 2.2e-16
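The overall and partial F tests described on slides 50 and 51 can be reproduced with anova (a sketch using fit3 above):

fit0 <- lm(y ~ 1)            # the null model: intercept only
anova(fit0, fit3)            # overall F test, as reported by summary(fit3)
fitred <- lm(y ~ x1)         # drop x2 and x3, so q = 2
anova(fitred, fit3)          # partial F test for the two dropped coefficients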

55 Are All of the Predictor Variables Important?

Forward selection: Find the best single variable; then find the next best to add; and so on. How?
Backward selection: Put all variables in the model; remove the least important; then remove the next least important; and so on. How?
Stepwise selection: Find the best few; find the least important of those; decide whether or not to remove it; then find the next best one to add; and so on.
All best: Find the best single variable; find the best two; find the best three; and so on.

We'll talk more about the criteria later.

56 Qualitative Predictors

Some variables are not numeric, e.g., sex: male or female. Create a numeric variable Z that takes values in (0, 1), (1, 2), or (−1, 1), for example. Then we can do the regression in the usual way. How about 3 categories: red, green, blue? Use two dummy variables, Z₁ and Z₂:

Z₁ = 1 if red, 0 if not red;
Z₂ = 1 if green, 0 if not green.

In R, a factor handles this automatically, as sketched below.
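A sketch with made-up values; lm constructs the dummy variables from the factor:

color <- factor(c("red", "green", "blue", "green", "red"))  # hypothetical data
model.matrix(~ color)   # shows the dummy coding R builds (blue is the baseline)
# lm(y ~ color) would then fit one coefficient per non-baseline level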

57 Polynomial Regression

yᵢ = β₀ + β₁xᵢ + β₂xᵢ² + ɛᵢ.

This is essentially a linear model, y = Xβ + ɛ, where the i-th row of X is (1, xᵢ, xᵢ²).

58 Interactions among Predictors

yᵢ = β₀ + β₁x₁ᵢx₂ᵢ + β₂x₃ᵢ + ɛᵢ.

This is different from the additive linear model, but it is still linear in the coefficients: the product x₁x₂ is just another predictor column.

59 The Regression Model Statement in R

Suppose we have a response y and predictors x1 and x2. To tell R we want the model yᵢ = β₀ + β₁x₁ᵢ + ɛᵢ, after appropriately assigning the variables in R, we use

lm(y ~ x1)

For the model yᵢ = β₁x₁ᵢ + ɛᵢ (no intercept), use

lm(y ~ x1 - 1)   or   lm(y ~ x1 + 0)

For the model yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + ɛᵢ, use

lm(y ~ x1 + x2)

For the model yᵢ = β₀ + β₁x₁ᵢx₂ᵢ + ɛᵢ (interaction), use

lm(y ~ x1:x2)

For the model yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₃x₁ᵢx₂ᵢ + ɛᵢ ("full factorial"), use

lm(y ~ x1*x2)   (same as lm(y ~ x1 + x2 + x1:x2))

For the model yᵢ = β₀ + β₁x₁ᵢ + β₂x₁ᵢ² + ɛᵢ (polynomial regression), use

lm(y ~ x1 + I(x1^2))

60 Nonlinear Data

yᵢ = β₀ + β₁e^(β₂xᵢ) + ɛᵢ.

Some of the same least squares methods apply, but there are differences. We don't (usually) have unbiasedness; we don't have t and F distributions.

61 Correlated Errors

Time series: Cor(ɛᵢ, ɛᵢ₊₂₀) may be 0, but Cor(ɛᵢ, ɛᵢ₊₁) is not 0. Serial correlation. There are ways of dealing with this (fit an ARMA model, e.g.).

62 Nonconstant Variance of Errors

The variance is often larger for larger values of the response; this is a common problem. Scale the data when fitting. Suppose we have

y = Xβ + ɛ,  V(ɛ) = Σ,

that is, nonconstant variance and correlated errors. Then

β̂ = (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹y.

Generalized least squares.
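A minimal sketch of generalized least squares with an assumed, known Σ (here a made-up diagonal Σ, so this reduces to weighted least squares):

Sig <- diag(1 + x)                 # hypothetical: error variance grows with x
Sinv <- solve(Sig)
X <- cbind(1, x)
solve(t(X) %*% Sinv %*% X, t(X) %*% Sinv %*% y)   # the GLS estimate of beta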

63 Outliers (in Response)

Recall the simple linear model we began with:

yᵢ = β₀ + β₁xᵢ + ɛᵢ.

Consider the same dataset as before, except that one of the observations doesn't fit the model well. It is an outlier.

64 Outlier in y

[Figure: the data with one outlying response value]

65 We fit it with least squares.

66 Outlier in y

[Figure: the least squares fit to the data including the outlier]

67 Fit the good data (without the outlier).

68 Outlier in y

[Figure: least squares fits with and without the outlier, for comparison]

69 Fit with L₁.

70 [Figure: two panels comparing the LS fit and the LAV fit on the data with the outlier]

71 High Leverage Points (Outliers in Predictors)

The estimators are affected by the outlier, but there's really not a whole lot of difference. Now consider a slightly different problem.

72 Outlier in x

[Figure: the data with one outlying predictor value]

73 The Effect of High Leverage Points

The response at an outlier in the predictors wields more influence. The hat value (the diagonal element of the hat matrix that corresponds to this point in the predictor space) is a measure of this relative influence.

74 Outlier in x

[Figure: the fit showing the influence of the high leverage point]

75 Look at the effect of an outlier at a high leverage point.

76 Outlier in x

[Figure: the fit with an outlying response at the high leverage point]

77 The Least Squares Residuals Do Not Have the Same Variance

One of our fundamental assumptions is that the errors all have the same variance: V(ɛᵢ) = σ² for all i, where

ɛᵢ = yᵢ − (β₀ + β₁xᵢ).

How about the least squares residuals,

rᵢ = yᵢ − (β̂₀ + β̂₁xᵢ)?

V(rᵢ) depends on xᵢ − x̄.

78 Multicollinearity

Multicollinearity is the situation where one vector of predictors is almost a linear combination of the others. Take three vectors, x₁, x₂, x₃. They are independent iff there do not exist scalars c₁, c₂, c₃, with some cᵢ ≠ 0, such that c₁x₁ + c₂x₂ + c₃x₃ = 0. Suppose they are independent, but x₃ = a₁x₁ + a₂x₂ + d, where d is a vector not equal to 0. What if d is very small? Multicollinearity. Multicollinearity is not a binary quality; it exists in various degrees.

79 Multicollinearity

Predictor variables with strong multicollinearity result in needlessly large variances of the coefficient estimators. The increased variance associated with each coefficient estimator depends on how strongly the corresponding predictor variable is linearly related to the other predictors. We measure this by an R².

80 Multicollinearity

For the j-th predictor, consider the regression

xⱼᵢ = α₀ + α₁x₁ᵢ + ⋯ + αₚxₚᵢ + ɛᵢ,

where xⱼ itself is not included on the right-hand side. The R² for this regression is a measure of the strength of the linear relationship between xⱼ and the other predictors. Let R²ⱼ be that R². We define the variance inflation factor, VIF, for that coefficient estimator as

VIF(β̂ⱼ) = 1 / (1 − R²ⱼ).
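A sketch computing the VIF for x1 in the three-predictor simulation by hand (the vif function in the car package computes the same thing for every coefficient):

r2.1 <- summary(lm(x1 ~ x2 + x3))$r.squared   # R^2 of x1 on the other predictors
1 / (1 - r2.1)                                # VIF for the coefficient of x1
# library(car); vif(fit3)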

81 An Application: Assessing the Effect of Advertising in a Marketing Plan

The Advertising dataset from the book's web site. Read Section

82 Linear Regression and KNN

In the model Y ≈ f(X), for linear regression we let f be a linear function, and we can write Y ≈ Xβ, where β is an unknown vector. To fit the linear regression model, we estimate β by some method, maybe least squares, for example. To fit the model by K-nearest neighbors, KNN, at any point we use the average response of the K nearest points in the predictor space. Let's consider an example in simple regression (one predictor).

83 Consider our Simulated Data

[Figure: the simulated data scatter plot, y versus x]

84 KNN Prediction

Let K = 3. Now, let's find ŷ(x) when x = 4 and when x = 8. The R code below will do that and then plot it. I've written a simple function, just to illustrate function writing in R.

85

KNNprediction <- function(x, y, K, x0) {
  dist <- abs(x - x0)          # distances from x0 to each observed x
  Kn <- order(dist)[1:K]       # indices of the K nearest neighbors
  yhatx0 <- mean(y[Kn])        # average their responses
  return(yhatx0)
}

K <- 3
x0 <- 4
arrows(x0, 1, x1 = x0, y1 = 0, length = 0.1, col="red")
yhatx0 <- KNNprediction(x, y, K, x0)
lines(c(x0 - .2, x0 + .2), c(yhatx0, yhatx0), col="red")
x0 <- 8
arrows(x0, 1, x1 = x0, y1 = 0, length = 0.1, col="red")
yhatx0 <- KNNprediction(x, y, K, x0)
lines(c(x0 - .2, x0 + .2), c(yhatx0, yhatx0), col="red")

86 KNN Predictions at x = 4 and at x = 8

[Figure: the simulated data with the two KNN predictions marked by short horizontal segments]

87 KNN Regression

The length of the horizontal lines depends on the next nearest neighbors. Continuing in this fashion, the regression fit would be a step function; the sketch below draws it over a grid. The idea of course extends to higher dimensions. We would get a set of hyperplane pieces making a step function. It works whether the data are linear or not.
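Evaluating the KNN prediction over a fine grid draws the whole step function (a sketch using the KNNprediction function above, assuming the simple-regression x and y):

xs <- seq(xlo, xhi, by = 0.05)
yhats <- sapply(xs, function(x0) KNNprediction(x, y, 3, x0))
plot(x, y, main = "KNN Regression, K = 3")
lines(xs, yhats, col = "red")    # the fitted step function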
