Lecture 6: Linear Regression

Reading: Sections
STATS 202: Data mining and analysis
Jonathan Taylor, 10/5
Slide credits: Sergio Bacallado

Simple linear regression

Model: y_i = β_0 + β_1 x_i + ε_i, where the ε_i are N(0, σ) i.i.d.

[Figure 3.1: scatterplot of Sales against TV advertising, with the fitted regression line.]

The estimates β̂_0 and β̂_1 are chosen to minimize the residual sum of squares (RSS):

RSS = Σ_{i=1}^n (y_i − ŷ_i)² = Σ_{i=1}^n (y_i − β_0 − β_1 x_i)².

Estimates β̂_0 and β̂_1

A little calculus shows that the minimizers of the RSS are:

β̂_1 = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^n (x_i − x̄)²,
β̂_0 = ȳ − β̂_1 x̄.
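As an illustration (a sketch on simulated data, not from the original slides), these closed-form estimates can be computed by hand in R and checked against lm():

```r
# Minimal sketch on simulated data (not the lecture's Sales/TV dataset).
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)

# Closed-form least-squares estimates from the formulas above.
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

fit <- lm(y ~ x)
c(beta0_hat, beta1_hat)  # hand-computed estimates
coef(fit)                # lm() returns the same values
```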

Assessing the accuracy of β̂_0 and β̂_1

[Figure 3.3: fitted lines over repeated samples, illustrating the variability of the estimates.]

The standard errors of the parameters are:

SE(β̂_0)² = σ² [ 1/n + x̄² / Σ_{i=1}^n (x_i − x̄)² ],
SE(β̂_1)² = σ² / Σ_{i=1}^n (x_i − x̄)².

The (approximate) 95% confidence intervals are:

β̂_0 ± 2 SE(β̂_0),
β̂_1 ± 2 SE(β̂_1).

These calculations depend on the model assumptions.

Hypothesis test

H_0: There is no relationship between X and Y; that is, β_1 = 0.
H_a: There is some relationship; that is, β_1 ≠ 0.

Test statistic:

t = (β̂_1 − 0) / SE(β̂_1).

Under the null hypothesis (a special case of the model), this has a t-distribution with n − 2 degrees of freedom.
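Continuing the simulated sketch above (x, y, and fit carried over), the t-statistic and the two-sided p-value reported by summary() can be reproduced by hand:

```r
# Reuses x, y, fit from the previous sketch.
s <- summary(fit)
se_beta1 <- s$coefficients["x", "Std. Error"]
t_stat <- coef(fit)["x"] / se_beta1
t_stat                                    # matches s$coefficients["x", "t value"]
2 * pt(-abs(t_stat), df = length(x) - 2)  # two-sided p-value under t with n - 2 df
```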

Interpreting the hypothesis test

If we reject the null hypothesis, can we assume there is an exact linear relationship? No. A quadratic relationship may be a better fit, for example. Rejection is only evidence of a linear association.

If we don't reject the null hypothesis, can we assume there is no relationship between X and Y? No. This test is only powerful against certain monotone alternatives; there could be more complex non-linear relationships.

Multiple linear regression

Y = β_0 + β_1 X_1 + ... + β_p X_p + ε, where ε is N(0, σ) i.i.d.

[Figure 3.4: least-squares plane fit to points in (X_1, X_2, Y) space.]

Or, in matrix notation:

E[y] = Xβ,

where y = (y_1, ..., y_n)^T, β = (β_0, ..., β_p)^T, and X is our usual data matrix with an extra column of ones on the left to account for the intercept.

Multiple linear regression can answer several questions:

- Is at least one of the variables X_i useful for predicting the outcome Y?
- Which subset of the predictors is most important?
- How good is a linear model for these data?
- Given a set of predictor values, what is a likely value for Y, and how accurate is this prediction?

The estimates β̂

Our goal again is to minimize the RSS:

RSS = Σ_{i=1}^n (y_i − ŷ_i)² = Σ_{i=1}^n (y_i − β_0 − β_1 x_{i,1} − ... − β_p x_{i,p})².

One can show that this is minimized by the vector β̂:

β̂ = (X^T X)^{−1} X^T y.

We also write RSS for the minimized sum of squares.
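A sketch of this matrix formula in R, on simulated data with p = 3 predictors; lm() produces the same estimates (internally via a QR decomposition rather than an explicit matrix inverse):

```r
set.seed(2)
n <- 100; p <- 3
X0 <- matrix(rnorm(n * p), n, p)            # predictors, without the intercept
y <- drop(1 + X0 %*% c(2, -1, 0.5) + rnorm(n))
X <- cbind(1, X0)                           # prepend the column of ones
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # (X^T X)^{-1} X^T y
drop(beta_hat)
coef(lm(y ~ X0))                            # same values
```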

Which variables are important?

Consider the hypothesis:

H_0: The last q predictors have no relation with Y; that is, β_{p−q+1} = β_{p−q+2} = ... = β_p = 0.

Let RSS_0 be the residual sum of squares for the model which excludes these variables. The F-statistic is defined by:

F = [(RSS_0 − RSS)/q] / [RSS/(n − p − 1)].

Under the null hypothesis, this has an F-distribution with q and n − p − 1 degrees of freedom.

Example: If q = p, we test whether any of the variables is important; in that case,

RSS_0 = Σ_{i=1}^n (y_i − ȳ)².
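A sketch of this F-test in R via anova() on nested fits, continuing the simulated design above; here q = 2 of the p = 3 predictors are dropped:

```r
fit_full <- lm(y ~ X0)           # all p = 3 predictors
fit_reduced <- lm(y ~ X0[, 1])   # excludes the last q = 2 predictors
anova(fit_reduced, fit_full)     # reports F = ((RSS0 - RSS)/q) / (RSS/(n - p - 1))
```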

Which variables are important?

A multiple linear regression in R produces output like the following. [R summary table not reproduced in this transcription.]

The t-statistic associated with the ith predictor is the square root of the F-statistic for the null hypothesis which sets only β_i = 0. A low p-value indicates that the predictor is important: it adds something to the model not explained by the other variables.

P-hacking warning: If there are many predictors, then even under the null hypothesis some of the t-tests will have low p-values.
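A sketch of the kind of call that produces such output, assuming the ISLR Advertising data is available as a CSV with columns Sales, TV, Radio, and Newspaper (the file name and column names are assumptions):

```r
Advertising <- read.csv("Advertising.csv")  # assumed file; columns as named below
fit_ad <- lm(Sales ~ TV + Radio + Newspaper, data = Advertising)
summary(fit_ad)  # one t-statistic and p-value per coefficient, plus the overall F
```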

How many variables are important?

When we select a subset of the predictors, we have 2^p choices. A way to simplify the choice is to define a range of models with an increasing number of variables, then select the best:

- Forward selection: Starting from the null model, include variables one at a time, minimizing the RSS at each step.
- Backward selection: Starting from the full model, eliminate variables one at a time, choosing the one with the largest p-value at each step.
- Mixed selection: Starting from the null model, include variables one at a time, minimizing the RSS at each step. If the p-value for some variable goes beyond a threshold, eliminate that variable.

Choosing a model this way is a form of tuning. P-hacking warning: hypothesis tests and confidence intervals are not valid after selection!
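For reference, R's step() function automates this kind of search, though it ranks models by AIC rather than by the raw RSS/p-value rules described above; a sketch, reusing the assumed Advertising fit:

```r
null_model <- lm(Sales ~ 1, data = Advertising)
step(null_model,                        # forward selection, scored by AIC
     scope = Sales ~ TV + Radio + Newspaper,
     direction = "forward")
step(fit_ad, direction = "backward")    # backward selection, scored by AIC
```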

How good is the fit?

To assess the fit, we focus on the residuals. The RSS always decreases as we add more variables. The residual standard error (RSE) corrects for this:

RSE = √( RSS / (n − p − 1) ).

Visualizing the residuals can reveal phenomena that are not accounted for by the model, e.g. synergies or interactions.

[Figure: regression surface for Sales as a function of TV and Radio, whose residuals show structure an additive model misses.]
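The RSE can be computed by hand and matches the residual standard error that summary() reports; a sketch, using the fit_ad object assumed above:

```r
rss <- sum(residuals(fit_ad)^2)
sqrt(rss / df.residual(fit_ad))  # RSE; df.residual(fit_ad) equals n - p - 1
summary(fit_ad)$sigma            # the same value, as reported by summary()
```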

How good are the predictions?

The function predict in R outputs predictions from a linear model. [R output not reproduced in this transcription.]

Confidence intervals reflect the uncertainty in β̂. Prediction intervals reflect the uncertainty in β̂ as well as the irreducible error ε.
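A sketch of both interval types from predict(), with hypothetical predictor values (fit_ad assumed as above):

```r
new_x <- data.frame(TV = 100, Radio = 20, Newspaper = 30)  # hypothetical values
predict(fit_ad, newdata = new_x, interval = "confidence")  # uncertainty in beta_hat only
predict(fit_ad, newdata = new_x, interval = "prediction")  # also adds irreducible error
```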

Dealing with categorical or qualitative predictors

Example: the Credit dataset, with quantitative variables Balance, Age, Cards, Education, Income, Limit, and Rating. [Table of sample rows not reproduced in this transcription.]

In addition, there are 4 qualitative variables:

- gender: male, female
- student: student or not
- status: married, single, divorced
- ethnicity: African American, Asian, Caucasian

For each qualitative predictor, e.g. ethnicity:

- Choose a baseline category, e.g. African American.
- For every other category, define a new predictor: X_Asian is 1 if the person is Asian and 0 otherwise; X_Caucasian is 1 if the person is Caucasian and 0 otherwise.

The model will be:

Y = β_0 + β_1 X_1 + ... + β_7 X_7 + β_Asian X_Asian + β_Caucasian X_Caucasian + ε.

β_Asian is the relative effect on Balance of being Asian, compared to the baseline category.

The model fit and predictions are independent of the choice of the baseline category. However, hypothesis tests derived from these variables are affected by the choice.

Solution: To check whether ethnicity is important, use an F-test for the hypothesis β_Asian = β_Caucasian = 0. This does not depend on the coding.

Other ways to encode qualitative predictors produce the same fit f̂, but the coefficients have different interpretations.
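A sketch in R, assuming the Credit data is loaded as a data frame named Credit with columns Balance, Income, and a factor Ethnicity (names are assumptions). R builds the dummy variables automatically; relevel() changes the baseline, and anova() gives the coding-independent F-test:

```r
fit_c <- lm(Balance ~ Income + Ethnicity, data = Credit)
head(model.matrix(fit_c))   # shows the 0/1 dummy columns R created

# F-test for beta_Asian = beta_Caucasian = 0; does not depend on the coding.
anova(lm(Balance ~ Income, data = Credit), fit_c)

# Changing the baseline changes the coefficients, not the fitted values.
Credit$Ethnicity <- relevel(factor(Credit$Ethnicity), ref = "Asian")
coef(lm(Balance ~ Income + Ethnicity, data = Credit))
```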

Recap

So far, we have:

- Defined multiple linear regression.
- Discussed how to test the importance of variables.
- Described one approach to choosing a subset of variables.
- Explained how to code qualitative variables.

Now, how do we evaluate model fit? Is the linear model any good? What can go wrong?

How good is the fit?

To assess the fit, we focus on the residuals. R² = Cor(Y, Ŷ)² always increases as we add more variables. The residual standard error (RSE) does not always improve with more predictors:

RSE = √( RSS / (n − p − 1) ).

Visualizing the residuals can reveal phenomena that are not accounted for by the model.

Potential issues in linear regression

1. Interactions between predictors
2. Non-linear relationships
3. Correlation of error terms
4. Non-constant variance of errors (heteroskedasticity)
5. Outliers
6. High leverage points
7. Collinearity

Interactions between predictors

Linear regression has an additive assumption:

sales = β_0 + β_1 · tv + β_2 · radio + ε,

i.e., an increase of $100 in TV ads causes a fixed increase in sales, regardless of how much you spend on radio ads. If we visualize the residuals, it is clear that this is false.

[Figure: residuals of the additive Sales ~ TV + Radio fit, showing a systematic pattern.]

One way to deal with this is to include a multiplicative variable in the model:

sales = β_0 + β_1 · tv + β_2 · radio + β_3 · (tv × radio) + ε.

The interaction variable is high when both tv and radio are high. R makes it easy to include interaction variables in the model. [R output not reproduced in this transcription.]
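In R's formula language, TV * Radio expands to TV + Radio + TV:Radio; a sketch with the assumed Advertising data:

```r
fit_int <- lm(Sales ~ TV * Radio, data = Advertising)
summary(fit_int)  # the TV:Radio row estimates the interaction coefficient beta_3
```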

Non-linearities

Example: the Auto dataset. A scatterplot between a predictor and the response may reveal a non-linear relationship.

[Figure: Miles per gallon vs. Horsepower, with linear, degree-2, and degree-5 polynomial fits.]

Solution: include polynomial terms in the model:

MPG = β_0 + β_1 · horsepower + β_2 · horsepower² + β_3 · horsepower³ + ... + ε.

Non-linearities

In 2 or 3 dimensions, this is easy to visualize. What do we do when we have too many predictors? Plot the residuals against the fitted values and look for a pattern.

[Figure: residual plots for the linear fit (a clear U-shaped pattern) and for the quadratic fit (little remaining pattern).]
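A sketch of the polynomial fix and the residual-plot diagnostic, assuming the Auto data from the ISLR package (with columns mpg and horsepower):

```r
library(ISLR)                                  # provides the Auto data frame
fit_lin <- lm(mpg ~ horsepower, data = Auto)
fit_quad <- lm(mpg ~ poly(horsepower, 2), data = Auto)
plot(fitted(fit_lin), residuals(fit_lin))      # U-shaped pattern: non-linearity
plot(fitted(fit_quad), residuals(fit_quad))    # pattern largely gone
```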

Correlation of error terms

We assumed that the errors for each sample are independent:

y_i = f(x_i) + ε_i, where the ε_i are N(0, σ) i.i.d.

What if this breaks down? The main effect is that it invalidates any assertions about standard errors, confidence intervals, and hypothesis tests.

Example: Suppose that by accident we double the data (we use each sample twice). Then the standard errors would be artificially smaller by a factor of √2.
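A quick simulation of the doubling example (a sketch, not from the slides): the duplicated dataset reports standard errors that are too small by roughly √2.

```r
set.seed(3)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
se_once  <- coef(summary(lm(y ~ x)))[2, "Std. Error"]
se_twice <- coef(summary(lm(rep(y, 2) ~ rep(x, 2))))[2, "Std. Error"]
se_once / se_twice  # close to sqrt(2), although no new information was collected
```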

Correlation of error terms

When could this happen in real life?

- Time series: Each sample corresponds to a different point in time; the errors for samples that are close in time are correlated.
- Spatial data: Each sample corresponds to a different location in space.
- Grouped subjects: In a study predicting height from weight at birth, some of the subjects may be in the same family; their shared environment could make them deviate from f(x) in similar ways.

Correlation of error terms

[Figure: simulated time series of residuals with increasing correlation ρ between the ε_i, for ρ = 0.0, 0.5, and 0.9; as ρ grows, neighboring residuals track each other.]

Non-constant variance of errors (heteroskedasticity)

The variance of the error depends on the input. To diagnose this, we can plot the residuals vs. the fitted values.

[Figure: residuals vs. fitted values for response Y (funnel-shaped spread) and for response log(Y) (roughly constant spread).]

Solution: If the trend in variance is relatively simple, we can transform the response, using a logarithm for example.
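A sketch of the log-transform fix on simulated data with multiplicative noise, so the error variance grows with the mean:

```r
set.seed(4)
x <- runif(200, 1, 10)
y <- exp(0.5 + 0.3 * x + rnorm(200, sd = 0.2))   # variance grows with the mean
par(mfrow = c(1, 2))
plot(fitted(lm(y ~ x)), residuals(lm(y ~ x)),
     main = "Response Y")                         # funnel-shaped spread
plot(fitted(lm(log(y) ~ x)), residuals(lm(log(y) ~ x)),
     main = "Response log(Y)")                    # roughly constant spread
```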
