Sections 7.1, 7.2, 7.4, & 7.6

Adapted from Timothy Hanson
Department of Statistics, University of South Carolina
Stat 704: Data Analysis I

Chapter 7 example: Body fat

n = 20 healthy females, 25-34 years old.
x_1 = triceps skinfold thickness (mm)
x_2 = thigh circumference (cm)
x_3 = midarm circumference (cm)
Y = body fat (%)

Obtaining Y_i, the percent of the body that is purely fat, requires immersing a person in water. We want to develop a model based on simple body measurements that spares people from getting wet.
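
The slides assume a SAS data set named bodyfat already exists. A minimal sketch of how it might be read in, assuming the measurements sit in a plain-text file (the file name and column order are illustrative, not from the slides):

data bodyfat;
   infile 'bodyfat.dat';                 * assumed file name;
   input triceps thigh midarm bodyfat;   * assumed column order;
run;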

SAS code

*******************************
* Body fat data from Chapter 7
*******************************;
proc sgscatter data=bodyfat;
   matrix bodyfat triceps thigh midarm;
run;
proc corr data=bodyfat;
   var triceps thigh midarm;
run;

Correlation coefficients

There is high correlation among the predictors. For example, r = 0.92 for triceps and thigh. These two variables are essentially carrying the same information; maybe only one or the other is really needed.

In general, one predictor may be essentially perfectly predicted by the remaining predictors (a high partial correlation), and so would be unnecessary if the other predictors are in the model.

7.1 Extra sums of squares

An extra sum of squares is the difference in SSE between a model with some predictors and a larger model that adds additional predictors. Fact: as predictors are added, the SSE can only decrease. The extra sum of squares is how much the SSE decreases.

Def'n: Let x_1, x_2, ..., x_k be predictors in a model. Then

SSR(x_{j+1}, ..., x_k | x_1, x_2, ..., x_j) = SSE(x_1, x_2, ..., x_j) - SSE(x_1, x_2, ..., x_j, x_{j+1}, ..., x_k).

I.e., SSR(x_{j+1}, ..., x_k | x_1, x_2, ..., x_j) is the drop in the sum of squared errors from the reduced to the full model. This is how much of the total variation SSTO is further explained by adding the new predictors.
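
One way to see an extra sum of squares directly in SAS (a sketch, not from the slides): fit the reduced and full models in a single proc reg call and subtract the two Error sums of squares by hand. The model labels Reduced and Full are names chosen here for readability.

proc reg data=bodyfat;
   Reduced: model bodyfat = triceps;        * ANOVA table gives SSE(x_1);
   Full:    model bodyfat = triceps thigh;  * ANOVA table gives SSE(x_1, x_2);
run;
* SSR(x_2 | x_1) = SSE(x_1) - SSE(x_1, x_2), read off the two Error rows;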

Example with k = 8 predictors

The predictors under consideration are x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8. There are two models:

Reduced: x_1, x_3, x_5, x_6, x_8
Full: x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8

Extra SS = SSR(x_2, x_4, x_7 | x_1, x_3, x_5, x_6, x_8)
         = SSE(reduced) - SSE(full)
         = SSE(x_1, x_3, x_5, x_6, x_8) - SSE(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8)
         = SSR(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8) - SSR(x_1, x_3, x_5, x_6, x_8)

This is how much additional total variability (SSTO) is explained by adding x_2, x_4, x_7 to a model that already has x_1, x_3, x_5, x_6, x_8.

7.2 Associated tests

We can formally test whether a certain set of predictors is useless in the presence of the other predictors in the model. This is the general linear test we talked about a few lectures ago (in simple linear regression).

In the example above, we can test whether x_2, x_4, x_7 are needed if x_1, x_3, x_5, x_6, x_8 are in the model. If the full model (with x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8) has much lower SSE than the reduced model (without x_2, x_4, x_7), then at least one of x_2, x_4, x_7 is needed.

F-test

Say we want to test whether we can drop q variables from a model that has p = k + 1 parameters (including the intercept), q < p. Let R denote the reduced model and F the full model, and let SSE(R), SSE(F) denote the sums of squared errors from the two models. To test H_0: β_{j1} = β_{j2} = ... = β_{jq} = 0 in the full model, use

F* = { [SSE(R) - SSE(F)] / q } / { SSE(F) / (n - p) } ~ F(q, n - p) if H_0: β_{j1} = β_{j2} = ... = β_{jq} = 0 is true;

a p-value for the test is P(F(q, n - p) > F*). Can carry this out in SAS using test in proc reg.

F-test example with k = 8 predictors

To test H_0: β_2 = β_4 = β_7 = 0,

F* = { [SSE(reduced) - SSE(full)] / (# parameters in test) } / MSE(full)
   = { [SSE(x_1, x_3, x_5, x_6, x_8) - SSE(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8)] / 3 } / { SSE(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8) / (n - 9) }
   = { SSR(x_2, x_4, x_7 | x_1, x_3, x_5, x_6, x_8) / 3 } / { SSE(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8) / (n - 9) }
   ~ F(3, n - 9) if H_0: β_2 = β_4 = β_7 = 0 is true.
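
In SAS this reduces to one test statement (a sketch; the data set name mydata and variables y, x1-x8 are hypothetical):

proc reg data=mydata;
   model y = x1 x2 x3 x4 x5 x6 x7 x8;
   Drop3: test x2=0, x4=0, x7=0;   * F-test of H_0: beta_2 = beta_4 = beta_7 = 0;
run;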

Body fat example

proc reg data=body;
   model bodyfat=triceps thigh midarm;
   test thigh=0, midarm=0;
run;

Test 1 Results for Dependent Variable bodyfat

Source        DF   Mean Square   F Value   Pr > F
Numerator      2        22.360      3.64   0.0490
Denominator   16         6.150

Reject H_0: β_2 = β_3 = 0 in

fat_i = β_0 + β_1 triceps_i + β_2 thigh_i + β_3 midarm_i + ε_i

with p = 0.0490.

Type I (sequential) sums of squares

Note: Say you have k = 4 predictors. Then the SSR for the full model can be written

SSR = SSR(x_1, x_2, x_3, x_4)
    = SSR(x_1) + SSR(x_2 | x_1) + SSR(x_3 | x_1, x_2) + SSR(x_4 | x_1, x_2, x_3).

These are called sequential sums of squares, or Type I sums of squares. They explain how much variability is soaked up by adding predictors sequentially to a model. There are four corresponding hypothesis tests with these sequential sums of squares:

Model                                                      Hypothesis       F-statistic
Y_i = β_0 + β_1 x_i1 + ε_i                                 H_0: β_1 = 0     SSR(x_1)/MSE(x_1)
Y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ε_i                      H_0: β_2 = 0     SSR(x_2 | x_1)/MSE(x_1, x_2)
Y_i = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + ε_i           H_0: β_3 = 0     SSR(x_3 | x_1, x_2)/MSE(x_1, x_2, x_3)
Y_i = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + β_4 x_i4 + ε_i  H_0: β_4 = 0   SSR(x_4 | x_1, x_2, x_3)/MSE(x_1, x_2, x_3, x_4)

Body fat example

You can get sequential SS from proc reg by adding ss1 as a model option. proc glm gives them automatically.

proc glm data=body;
   model bodyfat=triceps thigh midarm / solution;
run;

Source     DF   Type I SS   Mean Square   F Value   Pr > F
triceps     1      352.27        352.27     57.28   <.0001
thigh       1       33.17         33.17      5.39   0.0337
midarm      1       11.55         11.55      1.88   0.1896

Reject H_0: β_1 = 0 in fat_i = β_0 + β_1 triceps_i + ε_i with p < .0001.
Reject H_0: β_2 = 0 in fat_i = β_0 + β_1 triceps_i + β_2 thigh_i + ε_i with p = 0.0337.
Accept H_0: β_3 = 0 in fat_i = β_0 + β_1 triceps_i + β_2 thigh_i + β_3 midarm_i + ε_i with p = 0.1896.

The order entered (triceps, thigh, midarm) matters!
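
The proc reg route mentioned above would be (a sketch; ss1 appends the Type I SS column to the parameter estimates table):

proc reg data=body;
   model bodyfat = triceps thigh midarm / ss1;
run;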

ANOVA table & decomposing the SSR(F)

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3          396.98         132.33     21.52   <.0001
Error             16           98.40           6.15
Corrected Total   19          495.39

The sequential extra sums of squares are given on the last slide: SSR(x_1) = 352.3, SSR(x_2 | x_1) = 33.2, and SSR(x_3 | x_1, x_2) = 11.5. Almost all of SSR(x_1, x_2, x_3) = 396.98 is explained by x_1 (triceps) alone. Also note, as required,

SSR(x_1, x_2, x_3) = 396.98 = 352.27 + 33.17 + 11.55 = SSR(x_1) + SSR(x_2 | x_1) + SSR(x_3 | x_1, x_2).

Finally, we strongly reject H_0: β_1 = β_2 = β_3 = 0.

7.4 Coefficients of partial determination

We can standardize extra sums of squares to be between 0 and 1 (like R²). The coefficient of partial determination is the fraction by which the sum of squared errors is reduced by adding predictor(s) to an existing model. Examples:

R²_{Y2|1} = SSR(x_2 | x_1)/SSE(x_1)
R²_{Y3|12} = SSR(x_3 | x_1, x_2)/SSE(x_1, x_2)
R²_{Y23|1} = SSR(x_2, x_3 | x_1)/SSE(x_1)

For example, if R²_{Y3|12} = 0.5 then 50% of the remaining variability is explained by adding x_3 to a model that already had x_1 and x_2.
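
A quick numeric check, using the body fat sums of squares from the earlier ANOVA slide (rounded, so treat the arithmetic as approximate):

R²_{Y2|1} = SSR(x_2 | x_1)/SSE(x_1) = 33.17/(495.39 - 352.27) = 33.17/143.12 ≈ 0.23,

so adding thigh to the triceps-only model removes about 23% of that model's remaining error variation.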

Body fat example

In proc reg you can get R²_{Y1}, R²_{Y2|1}, and R²_{Y3|12} by adding pcorr1 as a model option. You can get R²_{Y1|23}, R²_{Y2|13}, and R²_{Y3|12} by adding pcorr2.

proc reg data=body;
   model bodyfat=triceps thigh midarm / pcorr1;
   model bodyfat=triceps thigh midarm / pcorr2;
run;

7.6 Multicollinearity

Recall: in the body fat example, the F-test for H_0: β_1 = β_2 = β_3 = 0 was highly significant, but the individual t-tests for dropping any one of x_1, x_2, or x_3 were not significant! The set x_1, x_2, x_3 is useful for explaining body fat, but none of the three is useful in the presence of the other two.

Why? The predictors are measuring similar phenomena; their sample values are highly correlated. For example, r = 0.92 between triceps thickness x_1 and thigh circumference x_2. This is known as multicollinearity among the predictors.

Effects of multicollinearity

- The model may still provide a good fit and precise prediction/estimation of the response.
- Several estimated regression coefficients b_1, b_2, ..., b_k will have large standard errors, leading to conclusions that individual predictors are not significant even though the overall F-test may be highly significant.
- The concept of holding all other predictors constant doesn't make sense in practice.
- Signs of regression coefficients may be opposite of intuition (or of what we might think, marginally, based on a scatterplot).

Bodyfat example

proc glm data=body;
   model bodyfat=triceps thigh midarm / solution;
run;

Parameter    Estimate   Standard Error   t Value   Pr > |t|
Intercept     117.08         99.78          1.17     0.2578
triceps         4.334         3.016         1.44     0.1699
thigh          -2.857         2.582        -1.11     0.2849
midarm         -2.186         1.595        -1.37     0.1896

Two of the three regression effects are negative. Holding midarm and triceps constant, increasing thigh circumference by 1 cm decreases estimated body fat. Does this make sense?

Detecting multicollinearity

A formal method for determining the presence of multicollinearity is the variance inflation factor (VIF). VIFs measure how much the variances of estimated regression coefficients are inflated compared to having uncorrelated predictors.

We will start with the standardized regression model of Section 7.5. Let

Y*_i = (1/√(n-1)) (Y_i - Ȳ)/s_Y   and   x*_ij = (1/√(n-1)) (x_ij - x̄_j)/s_j,

where s_Y² = (n-1)⁻¹ Σ_{i=1}^n (Y_i - Ȳ)², s_j² = (n-1)⁻¹ Σ_{i=1}^n (x_ij - x̄_j)², and x̄_j = n⁻¹ Σ_{i=1}^n x_ij.

VIFs

These variables are centered about their means and standardized to have Euclidean norm 1. For example,

||x*_j||² = (x*_1j)² + ... + (x*_nj)² = 1.

Note that in general

(Y*)'(Y*) = (x*_j)'(x*_j) = 1   and   (x*_j)'(x*_s) = corr(x_j, x_s) =: r_js.

VIFs

Consider the standardized regression model

Y*_i = β*_1 x*_i1 + ... + β*_k x*_ik + ε*_i.

Define the k x k sample correlation matrix R for the standardized predictors, and the n x k design matrix X*, to be:

R = | 1     r_12  ...  r_1k |        X* = | x*_11  x*_12  ...  x*_1k |
    | r_12  1     ...  r_2k |             | x*_21  x*_22  ...  x*_2k |
    | ...                   |             | ...                      |
    | r_1k  r_2k  ...  1    |             | x*_n1  x*_n2  ...  x*_nk |

Since (X*)'(X*) = R, the least-squares estimate of β* = (β*_1, ..., β*_k)' is given by b* = R⁻¹(X*)'Y*. Hence Cov(b*) = (σ*)² R⁻¹.

VIFs

Now note that if all predictors are uncorrelated then R = I_k = R⁻¹. Hence the ith diagonal element of R⁻¹ measures how much the variance of b*_i is inflated due to correlation between predictors. We call this the ith variance inflation factor:

VIF_i = (R⁻¹)_ii.

Usually the largest VIF_i is taken to be a measure of the seriousness of the multicollinearity among the predictors.
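
This diagonal-of-R-inverse computation can be checked directly, assuming SAS/IML is licensed (a sketch; only the variable names come from the body fat example):

proc iml;
   use body;
   read all var {triceps thigh midarm} into X;
   close body;
   R   = corr(X);         * k x k sample correlation matrix of the predictors;
   VIF = vecdiag(inv(R)); * VIF_i = (R^{-1})_ii;
   print VIF;
quit;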

Detecting multicollinearity

Predictor x_j has a variance inflation factor of

VIF_j = 1/(1 - R_j²),

where R_j² is the R² from regressing x_j on the remaining predictors x_1, x_2, ..., x_{j-1}, x_{j+1}, ..., x_k.

High R_j² (near 1) ⇒ x_j is linearly associated with the other predictors ⇒ high VIF_j.
VIF_j ≈ 1 ⇒ x_j is not involved in any multicollinearity.
VIF_j > 10 ⇒ x_j is involved in severe multicollinearity.
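
To check a single VIF by hand (a sketch; regressing thigh on the other two predictors is just one illustrative choice):

proc reg data=body;
   model thigh = triceps midarm;   * R-square from this fit is R_j^2 for thigh;
run;
* then VIF_thigh = 1/(1 - R_j^2), which should match the vif output on the next slide;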

VIF_j's in SAS

model bodyfat = triceps thigh midarm / vif;

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   Variance Inflation
Intercept    1        117.08               99.78          1.17    0.2578          0
triceps      1          4.334               3.016         1.44    0.1699        708.84
thigh        1         -2.857               2.582        -1.11    0.2849        564.34
midarm       1         -2.186               1.595        -1.37    0.1896        104.61

What do you conclude?

Remedies for multicollinearity

- Drop one or more predictors from the model. We'll discuss this in Chapter 9.
- More advanced: principal components regression uses indexes (new predictors) that are linear combinations of the original predictors as predictors in a new model. The indexes are selected to be uncorrelated. Disadvantage: the indexes might be hard to interpret.
- More advanced: ridge regression (Section 11.2).
- More advanced: ensemble methods.
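
For the ridge regression remedy, proc reg can produce a ridge trace directly (a sketch; the grid of ridge constants is arbitrary, and outvif writes VIFs at each ridge value into the outest= data set):

proc reg data=body outest=ridge_est outvif ridge=0 to 0.1 by 0.01;
   model bodyfat = triceps thigh midarm;
run;
proc print data=ridge_est; run;   * coefficients and VIFs along the ridge trace;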
