Chapter 6 Multiple Regression

Size: px
Start display at page:

Download "Chapter 6 Multiple Regression"

Transcription

1 STAT 525 FALL 2018 Chapter 6 Multiple Regression Professor Min Zhang

2 The Data and Model Still have single response variable Y Now have multiple explanatory variables Examples: Blood Pressure vs Age, Weight, Diet, Fitness Level Traffic Count vs Time, Location, Population, Month Goal: There is a total amount of variation in Y (SSTO). We want to explain as much of this variation as possible using a linear model and our explanatory variables Y i = β 0 +β 1 X i1 + +β p 1 X i,p 1 +ε i Have p 1 predictors p coefficients 6-1

3 First Order Model with Two Predictors Y i = β 0 +β 1 X i1 +β 2 X i2 +ε i ; i = 1,...,n β 0 is the intercept and β 1 and β 2 are the regression coefficients Meaning of regression coefficients β 1 describes change in mean response per unit increase in X 1 when X 2 is held constant β 2 describes change in mean response per unit increase in X 2 when X 1 is held constant Variables X 1 and X 2 are additive. Value of X 1 does not affect the change due to X 2. There is no interaction. The response surface is a plane. 6-2

4 Additive Response Surface Ŷ i = X i X i2 6-3

5 Interaction Model Y i = β 0 +β 1 X i1 +β 2 X i2 +β 3 X i1 X i2 +ε i Meaning of parameters: Change in X 1 when X 2 = x 2 E[Y] = {β 0 +β 1 (X 1 +1)+β 2 x 2 +β 3 (X 1 +1)x 2 } {β 0 +β 1 X 1 +β 2 x 2 +β 3 X 1 x 2 } = β 1 +β 3 x 2 Change in X 2 when X 1 = x 1 E[Y] = β 2 +β 3 x 1 Rate of change due to one variable affected by the other 6-4

6 Interaction Response Surface Ŷ i = X i1 +1.2X i2.75x i1 X i2 6-5

7 Qualitative Predictors Y i = β 0 +β 1 X i1 +β 2 X i2 +β 3 X i1 X i2 +ε i where Y is a senior student s GPA, X 1 is the SAT score. Let X 2 = 1 if case from Purdue, and X 2 = 0 if from IU Meaning of parameters: Case from Purdue (X 2 = 1): E[Y] = β 0 +β 1 X 1 +β 2 1+β 3 X 1 (1) = (β 0 +β 2 )+(β 1 +β 3 )X 1 Case from other location (X 2 = 0) E[Y] = β 0 +β 1 X 1 +β 2 0+β 3 X 1 (0) = β 0 +β 1 X 1 Have two regression lines β 2 quantify the difference between intercepts β 3 quantify the difference between slops 6-6

8 Polynomial Regression and Transformations Polynomial regression: where X i2 = X 2 i. Y i = β 0 +β 1 X i +β 2 X 2 i +ε i = β 0 +β 1 X i1 +β 2 X i2 +ε i this is a linear model because it is a linear function of parameters β Transformations Y i = 1 β 0 +β 1 X i1 +β 2 X i2 +ε i 1 Y i = β 0 +β 1 X i1 +β 2 X i2 +ε i log(y i ) = β 0 +β 1 X i1 +β 2 X i2 +ε i the last one is a linear model on the log(y i ) scale 6-7

9 General Linear Regression In Matrix Terms As an equation Y i = β 0 +β 1 X i1 + +β p 1 X i,p 1 +ε i As an array Y 1 Y 2. = Y n β 1 X 11 X 12 X 0 1 p 1 β 1 X 21 X 22 X 2 p 1 1 β X n1 X n2 X n p 1 β p 1 ε 1 ε 2. ε n In matrix notation Y = Xβ +ε Distributional assumptions: ε N(0,σ 2 I) Y N(Xβ,σ 2 I) 6-8

10 Estimation of Regression Coefficients Least squares estimates find b to minimize (Y Xb) (Y Xb) b = (X X) 1 X Y Fitted values define a (hyper)plane Ŷ = X(X X) 1 X Y = HY HY forms a response surface Residuals e = Y Ŷ = (I H)Y 6-9

11 The Distribution of Residuals e = Y Ŷ = (I H)Y I H is symmetric and idempotent Expected value E(e) = 0 Covariance Matrix σ 2 (e) = σ 2 (I H)(I H) = σ 2 (I H) Var(e i ) = σ 2 (1 h ii ) where h ii = X i (X X) 1 X i Residuals are usually correlated, i.e., cov(e i,e j ) = σ 2 h ij, i j Will use this information to look for outliers 6-10

12 Estimation of σ 2 Similar approach as before Now p model parameters s 2 = e e n p = (Y Xb) (Y Xb) n p = SSE n p = MSE 6-11

13 ANOVA TABLE Source of Variation df SS MS F Value Regression p 1 SSR MSR=SSR/(p 1) MSR/MSE (Model) Error n p SSE MSE=SSE/(n p) Total n 1 SSTO F Test: Tests if the predictors collectively help explain the variation in Y H 0 : β 1 = β 2 =... = β p 1 = 0 H a : at least one β k 0, 1 k p 1 F = SSR/(p 1) SSE/(n p) H 0 F(p 1,n p) Reject H 0 if F > F(1 α,p 1,n p) No conclusions possible regarding individual predictors 6-12

14 Testing Individual Predictor t-test Have already shown that b N ( β,σ 2 (X X) 1) This implies b k N(β k,σ 2 (b k )) Perform t test H 0 : β k = 0 vs H a : β k 0 b k β k s(b k ) t n p so t = b k s(b k ) t n p under H 0 Reject H 0 if t > t(1 α/2,n p) Confidence interval for β k b k ±t(1 α/2,n p)s{b k } 6-13

15 General Linear Test H 0 : β k = 0 vs H a : β k 0 Full Model : Y i = β 0 + p 1 j=1 β j X ji +ε i Reduced Model : Y i = β 0 + k 1 j=1 β j X ji + p 1 j=k+1 β j X ji +ε i F = (SSE(R) SSE(F))/1 SSE(F)/(n p) Reject H 0 if F > F(1 α,1,n p) 6-14

16 Equivalence of t-test and General Linear Test Can show that F = (t ) 2 both tests result in the same conclusion Both tests investigate significance of a predictor given the other variables are already in the model i.e. significance of the variable which is fitted last 6-15

17 Coefficient of Multiple Determination Coefficient of Determination R 2 describes proportionate reduction in total variation associated with the full set of X variables R 2 = SSR SSTO = 1 SSE SSTO, 0 R2 1 R 2 usually increases with the increasing p Adjusted R 2 a attempts to account for p R 2 a = 1 SSE/(n p) SSTO/(n 1), 0 R2 a 1 The adustment is often insufficient 6-16

18 Example: Purdue Computer Science Students Computer Science majors at Purdue have a large drop-out rate Goal: Find predictors of success (defined as high GPA) Predictors must be available at time of entry into program. These are: GPA: grade points average after three semesters HSM: high-school math grades HSS: high-school science grades HSE: high-school english grades SATM: SAT Math SATV: SAT Verbal Data available on n = 224 students 6-17

19 Now investigate the model: GPA = HSM HSS HSE options nocenter linesize=72; goptions colors=( none ); data a1; infile U:\.www\datasets525\csdata.dat ; input id gpa hsm hss hse satm satv; proc reg data=a1; model gpa=hsm hss hse; run; quit; 6-18

20 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model <.0001 Error Corrected Total Root MSE R-Square Dependent Mean Adj R-Sq Coeff Var Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > t Intercept hsm <.0001 hss hse

21 Estimation of Mean Response E(Y h ) We are interested in predictors X h Can show Ŷ h N ( X h β,σ2 X h (X X) 1 X h ) Individual CI for X h Ŷ h ± t(1 α/2,n p)s{ŷ h } Bonferroni CI for g vectors X h Ŷ h ± t(1 α/(2g),n p)s{ŷ h } Working-Hotelling confidence band for the whole regression line Ŷ h ± pf(1 α,p,n p) s{ŷ h } Be careful to be only in range of X s 6-20

22 Predict New Observation Y h(new) = E(Y h )+ε s 2 (pred) = s 2 (Ŷ h )+MSE Ŷ h +ε N ( X h β,σ2 (1+X h (X X) 1 X h ) ) Individual CI of Y h(new) Ŷ h ± t(1 α/2,n p)s{pred} Bonferroni CI for g vectors X h Ŷ h ± t(1 α/(2g),n p)s{pred} Simultaneous Scheffé prediction limits for g vectors X h Ŷ h ± gf(1 α,g,n p) s{pred} 6-21

23 Diagnostics Diagnostics play a key role in both the development and assessment of multiple regression models Most previous diagnostics carry over to multiple regression Given more than one predictor, must also consider relationship between predictors Specialized diagnostics discussed later in Chapters 9 and

24 Scatterplot Matrix Scatterplot matrix summarizes bivariate relationships between Y and X j as well as between X j and X k (j,k = 1,2,...,p 1) Nature of bivariate relationships Strength of bivariate relationships Detection of outliers Range spanned by X s Can be generated within SAS Click on Solutions Analysis Interactive data analysis Inside SAS/Insight window, click on library WORK Click on the desired data set (NOTE: data must already be read into SAS) Click Open (shift) Click on desired variables Go to menu Analyze Choose option Scatterplot(XY) 6-23

25 Correlation Matrix Complementary summary Displays all pairwise correlations Must be wary of Nonlinear relationships Outliers Influential observations 6-24

26 Example: Purdue Computer Science Student Descriptive Statistics: using PROC MEANS or PROC UNIVARIATE proc means data=cs maxdec=2; var gpa hsm hss hse satm satv; run; The MEANS Procedure Variable N Mean Std Dev Minimum Maximum gpa hsm hss hse satm satv maxdec = 2 sets the number of decimal places in the output to

27 proc univariate data=cs noprint; var gpa hsm hss hse satm satv; histogram gpa hsm hss hse satm satv /normal; run; Top Left: GPA Top Right: HSM Bottom Left: HSS Bottom Right: HSE 6-26

28 Upper SATM Lower SATV 6-27

29 Scatterplot Matrix in SAS: From the menu bar, select Solutions -> analysis -> interactive data analysis 6-28

30 Uses PROC CORR to report correlation values; Use NOPROB statement to get rid of those p-values. [1] proc corr data=a1; [3] proc corr data=a1; var hsm hss hse; var hsm hss hse satm satv; with gpa; hsm hss hse hsm hsm hss hse <.0001 <.0001 gpa hss <.0001 <.0001 <.0001 <.0001 <.0001 hse satm satv <.0001 <.0001 gpa [2] proc corr data=a1 noprob; var satm satv; satm satv satm satv

31 Residual Plots Used for similar assessment of assumptions Model is correct Normality Constant Variance Independence Plot e vs Ŷ (overall) Plot e vs X j (with respect to X j ) Plot e vs missing variable (e.g., X j X k ) 6-30

32 Tests Univariate graphical summaries of e are still preferred NORMAL option in PROC UNIVARIATE test normality Modified Levene s and Breusch-Pagan for constant variance Lack of fit test: need repeated observations where all X fixed at same levels 6-31

33 Lack of Fit Test Compare (reduced) linear model H 0 : E(Y i ) = β 0 +β 1 X i1 + +β p 1 X i,p 1 (full) model where Y has c means (i.e. c combinations of X i ) H a : E(Y i ) β 0 +β 1 X i1 + +β p 1 X i,p 1 F = {SSE(R) SSE(F)}/{(n p) (n c)} H 0 SSE(F)/(n c) F(c p,n c) under Reject H 0 if F > F(1 α,c p,n c) If reject H 0, conclude that a more complex relationship between Y and X 1,...,X p 1 is needed 6-32

34 Calculate SSE(F) in SAS * Analysis of Variance - Full Model proc glm; class x1 x2; model y=x1*x2; run; CLASS and MODEL specify that every combination of levels of X 1 and X 2 has their own mean Plug the SSE of this model into the lack of fit test 6-33

35 Example: Purdue Computer Science Student Investigate the model: GPA = HSM HSS HSE PROC REG reports SSE(R) = with df(sse) = 220. proc glm data=a1; class hsm hss hse; model gpa=hsm*hss*hse; run; quit; Class Level Information Class Levels Values hsm hss hse Sum of Source DF Squares Mean Square F Value Pr > F Model <.0001 Error Corrected Total F = {SSE(R) SSE(F)}/{(n p) (n c)} SSE(F)/(n c) = F96,124 1 (0.95). = { }/{ } /124 = > 6-34

36 Chapter Review Data and Notation Model in Matrix Terms Parameter Estimation ANOVA F-test Estimation of Mean Responses Prediction of New Observations Diagnostics 6-35

Lecture 11 Multiple Linear Regression

Lecture 11 Multiple Linear Regression Lecture 11 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 11-1 Topic Overview Review: Multiple Linear Regression (MLR) Computer Science Case Study 11-2 Multiple Regression

More information

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company

Multiple Regression. Inference for Multiple Regression and A Case Study. IPS Chapters 11.1 and W.H. Freeman and Company Multiple Regression Inference for Multiple Regression and A Case Study IPS Chapters 11.1 and 11.2 2009 W.H. Freeman and Company Objectives (IPS Chapters 11.1 and 11.2) Multiple regression Data for multiple

More information

Chapter 2 Inferences in Simple Linear Regression

Chapter 2 Inferences in Simple Linear Regression STAT 525 SPRING 2018 Chapter 2 Inferences in Simple Linear Regression Professor Min Zhang Testing for Linear Relationship Term β 1 X i defines linear relationship Will then test H 0 : β 1 = 0 Test requires

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

Lecture 13 Extra Sums of Squares

Lecture 13 Extra Sums of Squares Lecture 13 Extra Sums of Squares STAT 512 Spring 2011 Background Reading KNNL: 7.1-7.4 13-1 Topic Overview Extra Sums of Squares (Defined) Using and Interpreting R 2 and Partial-R 2 Getting ESS and Partial-R

More information

Lecture 12 Inference in MLR

Lecture 12 Inference in MLR Lecture 12 Inference in MLR STAT 512 Spring 2011 Background Reading KNNL: 6.6-6.7 12-1 Topic Overview Review MLR Model Inference about Regression Parameters Estimation of Mean Response Prediction 12-2

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

STOR 455 STATISTICAL METHODS I

STOR 455 STATISTICAL METHODS I STOR 455 STATISTICAL METHODS I Jan Hannig Mul9variate Regression Y=X β + ε X is a regression matrix, β is a vector of parameters and ε are independent N(0,σ) Es9mated parameters b=(x X) - 1 X Y Predicted

More information

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model

Outline. Remedial Measures) Extra Sums of Squares Standardized Version of the Multiple Regression Model Outline 1 Multiple Linear Regression (Estimation, Inference, Diagnostics and Remedial Measures) 2 Special Topics for Multiple Regression Extra Sums of Squares Standardized Version of the Multiple Regression

More information

Overview Scatter Plot Example

Overview Scatter Plot Example Overview Topic 22 - Linear Regression and Correlation STAT 5 Professor Bruce Craig Consider one population but two variables For each sampling unit observe X and Y Assume linear relationship between variables

More information

Topic 14: Inference in Multiple Regression

Topic 14: Inference in Multiple Regression Topic 14: Inference in Multiple Regression Outline Review multiple linear regression Inference of regression coefficients Application to book example Inference of mean Application to book example Inference

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

Chapter 8 Quantitative and Qualitative Predictors

Chapter 8 Quantitative and Qualitative Predictors STAT 525 FALL 2017 Chapter 8 Quantitative and Qualitative Predictors Professor Dabao Zhang Polynomial Regression Multiple regression using X 2 i, X3 i, etc as additional predictors Generates quadratic,

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Chapter 2 Multiple Regression I (Part 1)

Chapter 2 Multiple Regression I (Part 1) Chapter 2 Multiple Regression I (Part 1) 1 Regression several predictor variables The response Y depends on several predictor variables X 1,, X p response {}}{ Y predictor variables {}}{ X 1, X 2,, X p

More information

Overview of Topic III

Overview of Topic III Overview of This topic will cover thinking in terms of matrices (Chapter 5) regression on multiple predictor variables (Chapter 6) case study: CS majors Text Example (KNNL 236) Page 1 Chapter 5: Linear

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

Topic 16: Multicollinearity and Polynomial Regression

Topic 16: Multicollinearity and Polynomial Regression Topic 16: Multicollinearity and Polynomial Regression Outline Multicollinearity Polynomial regression An example (KNNL p256) The P-value for ANOVA F-test is

More information

Lecture 18 MA Applied Statistics II D 2004

Lecture 18 MA Applied Statistics II D 2004 Lecture 18 MA 2612 - Applied Statistics II D 2004 Today 1. Examples of multiple linear regression 2. The modeling process (PNC 8.4) 3. The graphical exploration of multivariable data (PNC 8.5) 4. Fitting

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6 STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Handout 1: Predicting GPA from SAT

Handout 1: Predicting GPA from SAT Handout 1: Predicting GPA from SAT appsrv01.srv.cquest.utoronto.ca> appsrv01.srv.cquest.utoronto.ca> ls Desktop grades.data grades.sas oldstuff sasuser.800 appsrv01.srv.cquest.utoronto.ca> cat grades.data

More information

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Q = (Y i β 0 β 1 X i1 β 2 X i2 β p 1 X i.p 1 ) 2, which in matrix notation is Q = (Y Xβ) (Y

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Lecture 1 Linear Regression with One Predictor Variable.p2

Lecture 1 Linear Regression with One Predictor Variable.p2 Lecture Linear Regression with One Predictor Variablep - Basics - Meaning of regression parameters p - β - the slope of the regression line -it indicates the change in mean of the probability distn of

More information

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz

Statistiek II. John Nerbonne. March 17, Dept of Information Science incl. important reworkings by Harmut Fitz Dept of Information Science j.nerbonne@rug.nl incl. important reworkings by Harmut Fitz March 17, 2015 Review: regression compares result on two distinct tests, e.g., geographic and phonetic distance of

More information

6. Multiple Linear Regression

6. Multiple Linear Regression 6. Multiple Linear Regression SLR: 1 predictor X, MLR: more than 1 predictor Example data set: Y i = #points scored by UF football team in game i X i1 = #games won by opponent in their last 10 games X

More information

Lecture notes on Regression & SAS example demonstration

Lecture notes on Regression & SAS example demonstration Regression & Correlation (p. 215) When two variables are measured on a single experimental unit, the resulting data are called bivariate data. You can describe each variable individually, and you can also

More information

Lecture 11: Simple Linear Regression

Lecture 11: Simple Linear Regression Lecture 11: Simple Linear Regression Readings: Sections 3.1-3.3, 11.1-11.3 Apr 17, 2009 In linear regression, we examine the association between two quantitative variables. Number of beers that you drink

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 25 Outline 1 Multiple Linear Regression 2 / 25 Basic Idea An extra sum of squares: the marginal reduction in the error sum of squares when one or several

More information

ST Correlation and Regression

ST Correlation and Regression Chapter 5 ST 370 - Correlation and Regression Readings: Chapter 11.1-11.4, 11.7.2-11.8, Chapter 12.1-12.2 Recap: So far we ve learned: Why we want a random sample and how to achieve it (Sampling Scheme)

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

F-tests and Nested Models

F-tests and Nested Models F-tests and Nested Models Nested Models: A core concept in statistics is comparing nested s. Consider the Y = β 0 + β 1 x 1 + β 2 x 2 + ǫ. (1) The following reduced s are special cases (nested within)

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum T-test: means of Spock's judge versus all other judges 1 The TTEST Procedure Variable: pcwomen judge1 N Mean Std Dev Std Err Minimum Maximum OTHER 37 29.4919 7.4308 1.2216 16.5000 48.9000 SPOCKS 9 14.6222

More information

Need for Several Predictor Variables

Need for Several Predictor Variables Multiple regression One of the most widely used tools in statistical analysis Matrix expressions for multiple regression are the same as for simple linear regression Need for Several Predictor Variables

More information

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Final Review. Yang Feng.   Yang Feng (Columbia University) Final Review 1 / 58 Final Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Final Review 1 / 58 Outline 1 Multiple Linear Regression (Estimation, Inference) 2 Special Topics for Multiple

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

Topic 18: Model Selection and Diagnostics

Topic 18: Model Selection and Diagnostics Topic 18: Model Selection and Diagnostics Variable Selection We want to choose a best model that is a subset of the available explanatory variables Two separate problems 1. How many explanatory variables

More information

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables

Regression Analysis. Regression: Methodology for studying the relationship among two or more variables Regression Analysis Regression: Methodology for studying the relationship among two or more variables Two major aims: Determine an appropriate model for the relationship between the variables Predict the

More information

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013 Topic 19 - Inference - Fall 2013 Outline Inference for Means Differences in cell means Contrasts Multiplicity Topic 19 2 The Cell Means Model Expressed numerically Y ij = µ i + ε ij where µ i is the theoretical

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Soc 589 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline Notation NELS88 data Fixed Effects ANOVA

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

Sections 7.1, 7.2, 7.4, & 7.6

Sections 7.1, 7.2, 7.4, & 7.6 Sections 7.1, 7.2, 7.4, & 7.6 Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 25 Chapter 7 example: Body fat n = 20 healthy females 25 34

More information

Models for Clustered Data

Models for Clustered Data Models for Clustered Data Edps/Psych/Stat 587 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2017 Outline Notation NELS88 data Fixed Effects ANOVA

More information

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

STAT 3A03 Applied Regression With SAS Fall 2017

STAT 3A03 Applied Regression With SAS Fall 2017 STAT 3A03 Applied Regression With SAS Fall 2017 Assignment 2 Solution Set Q. 1 I will add subscripts relating to the question part to the parameters and their estimates as well as the errors and residuals.

More information

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

STAT 3A03 Applied Regression Analysis With SAS Fall 2017 STAT 3A03 Applied Regression Analysis With SAS Fall 2017 Assignment 5 Solution Set Q. 1 a The code that I used and the output is as follows PROC GLM DataS3A3.Wool plotsnone; Class Amp Len Load; Model CyclesAmp

More information

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION

COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION COMPREHENSIVE WRITTEN EXAMINATION, PAPER III FRIDAY AUGUST 26, 2005, 9:00 A.M. 1:00 P.M. STATISTICS 174 QUESTION Answer all parts. Closed book, calculators allowed. It is important to show all working,

More information

2.1: Inferences about β 1

2.1: Inferences about β 1 Chapter 2 1 2.1: Inferences about β 1 Test of interest throughout regression: Need sampling distribution of the estimator b 1. Idea: If b 1 can be written as a linear combination of the responses (which

More information

Topic 20: Single Factor Analysis of Variance

Topic 20: Single Factor Analysis of Variance Topic 20: Single Factor Analysis of Variance Outline Single factor Analysis of Variance One set of treatments Cell means model Factor effects model Link to linear regression using indicator explanatory

More information

Inference in Normal Regression Model. Dr. Frank Wood

Inference in Normal Regression Model. Dr. Frank Wood Inference in Normal Regression Model Dr. Frank Wood Remember We know that the point estimator of b 1 is b 1 = (Xi X )(Y i Ȳ ) (Xi X ) 2 Last class we derived the sampling distribution of b 1, it being

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Chapter 14. Linear least squares

Chapter 14. Linear least squares Serik Sagitov, Chalmers and GU, March 5, 2018 Chapter 14 Linear least squares 1 Simple linear regression model A linear model for the random response Y = Y (x) to an independent variable X = x For a given

More information

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). Statistics 512: Solution to Homework#11 Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat). 1. Perform the two-way ANOVA without interaction for this model. Use the results

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Two-factor studies. STAT 525 Chapter 19 and 20. Professor Olga Vitek

Two-factor studies. STAT 525 Chapter 19 and 20. Professor Olga Vitek Two-factor studies STAT 525 Chapter 19 and 20 Professor Olga Vitek December 2, 2010 19 Overview Now have two factors (A and B) Suppose each factor has two levels Could analyze as one factor with 4 levels

More information

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO.

SSR = The sum of squared errors measures how much Y varies around the regression line n. It happily turns out that SSR + SSE = SSTO. Analysis of variance approach to regression If x is useless, i.e. β 1 = 0, then E(Y i ) = β 0. In this case β 0 is estimated by Ȳ. The ith deviation about this grand mean can be written: deviation about

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 8, 2014 List of Figures in this document by page: List of Figures 1 Popcorn data............................. 2 2 MDs by city, with normal quantile

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

Analysis of Bivariate Data

Analysis of Bivariate Data Analysis of Bivariate Data Data Two Quantitative variables GPA and GAES Interest rates and indices Tax and fund allocation Population size and prison population Bivariate data (x,y) Case corr&reg 2 Independent

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

One-Way Analysis of Variance (ANOVA) There are two key differences regarding the explanatory variable X.

One-Way Analysis of Variance (ANOVA) There are two key differences regarding the explanatory variable X. One-Way Analysis of Variance (ANOVA) Also called single factor ANOVA. The response variable Y is continuous (same as in regression). There are two key differences regarding the explanatory variable X.

More information

Linear Algebra Review

Linear Algebra Review Linear Algebra Review Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Linear Algebra Review 1 / 45 Definition of Matrix Rectangular array of elements arranged in rows and

More information

Statistics 5100 Spring 2018 Exam 1

Statistics 5100 Spring 2018 Exam 1 Statistics 5100 Spring 2018 Exam 1 Directions: You have 60 minutes to complete the exam. Be sure to answer every question, and do not spend too much time on any part of any question. Be concise with all

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

3 Variables: Cyberloafing Conscientiousness Age

3 Variables: Cyberloafing Conscientiousness Age title 'Cyberloafing, Mike Sage'; run; PROC CORR data=sage; var Cyberloafing Conscientiousness Age; run; quit; The CORR Procedure 3 Variables: Cyberloafing Conscientiousness Age Simple Statistics Variable

More information

Statistics 512: Applied Linear Models. Topic 1

Statistics 512: Applied Linear Models. Topic 1 Topic Overview This topic will cover Course Overview & Policies SAS Statistics 512: Applied Linear Models Topic 1 KNNL Chapter 1 (emphasis on Sections 1.3, 1.6, and 1.7; much should be review) Simple linear

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Multiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1

Multiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1 Multiple Regression Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 12, Slide 1 Review: Matrix Regression Estimation We can solve this equation (if the inverse of X

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

STAT 350. Assignment 4

STAT 350. Assignment 4 STAT 350 Assignment 4 1. For the Mileage data in assignment 3 conduct a residual analysis and report your findings. I used the full model for this since my answers to assignment 3 suggested we needed the

More information

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES Normal Error RegressionModel : Y = β 0 + β ε N(0,σ 2 1 x ) + ε The Model has several parts: Normal Distribution, Linear Mean, Constant Variance,

More information

Answer Keys to Homework#10

Answer Keys to Homework#10 Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

Simple Linear Regression: One Quantitative IV

Simple Linear Regression: One Quantitative IV Simple Linear Regression: One Quantitative IV Linear regression is frequently used to explain variation observed in a dependent variable (DV) with theoretically linked independent variables (IV). For example,

More information

Assignment 9 Answer Keys

Assignment 9 Answer Keys Assignment 9 Answer Keys Problem 1 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean 26.00 + 34.67 + 39.67 + + 49.33 + 42.33 + + 37.67 + + 54.67

More information

More about Single Factor Experiments

More about Single Factor Experiments More about Single Factor Experiments 1 2 3 0 / 23 1 2 3 1 / 23 Parameter estimation Effect Model (1): Y ij = µ + A i + ɛ ij, Ji A i = 0 Estimation: µ + A i = y i. ˆµ = y..  i = y i. y.. Effect Modell

More information

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model Topic 23 - Unequal Replication Data Model Outline - Fall 2013 Parameter Estimates Inference Topic 23 2 Example Page 954 Data for Two Factor ANOVA Y is the response variable Factor A has levels i = 1, 2,...,

More information

STA 4210 Practise set 2a

STA 4210 Practise set 2a STA 410 Practise set a For all significance tests, use = 0.05 significance level. S.1. A multiple linear regression model is fit, relating household weekly food expenditures (Y, in $100s) to weekly income

More information

1) Answer the following questions as true (T) or false (F) by circling the appropriate letter.

1) Answer the following questions as true (T) or false (F) by circling the appropriate letter. 1) Answer the following questions as true (T) or false (F) by circling the appropriate letter. T F T F T F a) Variance estimates should always be positive, but covariance estimates can be either positive

More information

STAT 705 Chapter 16: One-way ANOVA

STAT 705 Chapter 16: One-way ANOVA STAT 705 Chapter 16: One-way ANOVA Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 21 What is ANOVA? Analysis of variance (ANOVA) models are regression

More information

Lecture 5: Comparing Treatment Means Montgomery: Section 3-5

Lecture 5: Comparing Treatment Means Montgomery: Section 3-5 Lecture 5: Comparing Treatment Means Montgomery: Section 3-5 Page 1 Linear Combination of Means ANOVA: y ij = µ + τ i + ɛ ij = µ i + ɛ ij Linear combination: L = c 1 µ 1 + c 1 µ 2 +...+ c a µ a = a i=1

More information

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping

Outline. Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Topic 19: Remedies Outline Review regression diagnostics Remedial measures Weighted regression Ridge regression Robust regression Bootstrapping Regression Diagnostics Summary Check normality of the residuals

More information

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal

Econ 3790: Business and Economics Statistics. Instructor: Yogesh Uppal Econ 3790: Business and Economics Statistics Instructor: Yogesh Uppal yuppal@ysu.edu Sampling Distribution of b 1 Expected value of b 1 : Variance of b 1 : E(b 1 ) = 1 Var(b 1 ) = σ 2 /SS x Estimate of

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information