Multiple Regression. Dan Frey, Associate Professor of Mechanical Engineering and Engineering Systems


Plan for Today

Multiple Regression
Estimation of the parameters
Hypothesis testing
Regression diagnostics
Testing lack of fit
Case study
Next steps

The Model Equation

For a single variable: Y = α + βx + ε

For multiple variables: y = Xβ + ε, where

y = [y_1, y_2, …, y_n]^T,  β = [β_0, β_1, …, β_k]^T,  ε = [ε_1, ε_2, …, ε_n]^T

and X is an n × (k+1) matrix whose first column is all 1s. α is renamed β_0; the 1s allow β_0 to enter the equation without being multiplied by an x. With k regressors there are p = k + 1 parameters.

The Model Equation

y = Xβ + ε

Each row of X is paired with an observation; each column of X is paired with a coefficient. There are n observations of the response and k + 1 coefficients. Each observation is affected by an independent, homoscedastic normal variate: E(ε_i) = 0 and Var(ε_i) = σ².

Accounting for Indices

y = Xβ + ε, with dimensions y: n × 1, X: n × p, β: p × 1, ε: n × 1, where p = k + 1.

Concept Question

Which of these is a valid X matrix?

A = [5.0m 0.3sec; 7.m 0.sec; 3.m 0.7sec; 5.4m 0.4sec]
B = [5.0m 7.V 3.sec 5.4A; 0.3m 0.V 0.7sec 0.4A]
C = [5.0m 7.m; 0.sec 0.3sec]

1) A only  2) B only  3) C only  4) A and B  5) B and C  6) A and C  7) A, B, & C  8) None  9) I don't know

Adding h.o.t. to the Model Equation

Higher-order terms add columns to X: you can add interactions (products of regressors) and curvature (squared terms). Each row of X is still paired with an observation; there are n observations of the response.

Estimation of the Parameters

Assume the model equation y = Xβ + ε. We wish to minimize the sum squared error

L = ε^T ε = (y − Xβ)^T (y − Xβ)

To minimize, we take the derivative and set it equal to zero:

∂L/∂β = −2 X^T y + 2 X^T X β̂ = 0

The solution is β̂ = (X^T X)⁻¹ X^T y, and we define the fitted model ŷ = X β̂.
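The normal-equations solution above can be sketched numerically. The lecture itself uses a Mathcad demo; below is a minimal Python illustration with made-up data and one regressor plus an intercept, so that X^T X stays a 2×2 matrix we can invert in closed form:

```python
# Least squares via the normal equations: beta_hat = (X^T X)^{-1} X^T y.
# One regressor plus an intercept, so X^T X is 2x2 and easy to invert.
x = [1.0, 2.0, 3.0, 4.0, 5.0]          # regressor values (illustrative)
y = [2.1, 3.9, 6.2, 7.8, 10.1]         # observed responses (illustrative)
n = len(x)

# Entries of X^T X and X^T y for X = [1 | x]
Sx = sum(x)
Sxx = sum(xi * xi for xi in x)
Sy = sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))

# Closed-form inverse of the 2x2 matrix X^T X, applied to X^T y
det = n * Sxx - Sx * Sx
b0 = (Sxx * Sy - Sx * Sxy) / det       # intercept beta_0
b1 = (n * Sxy - Sx * Sy) / det         # slope beta_1

yhat = [b0 + b1 * xi for xi in x]      # fitted model y_hat = X beta_hat
print(b0, b1)                          # roughly 0.05 and 1.99 for these data
```

For more regressors the same formula applies, but the inverse is computed numerically (as the Mathcad demo does).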

Done in Mathcad: Mathcad demo of a Montgomery example. Montgomery, D. C., 2001, Design and Analysis of Experiments, John Wiley & Sons.

Breakdown of Sum Squares

Grand total sum of squares: GTSS = Σᵢ yᵢ²
SS due to the mean: n ȳ²
Total: SS_T = Σᵢ (yᵢ − ȳ)²
Regression: SS_R = Σᵢ (ŷᵢ − ȳ)²
Error: SS_E = Σᵢ eᵢ², which splits into SS_PE (pure error) and SS_LOF (lack of fit)

Breakdown of DOF

n: number of y values
1: due to the mean
n − 1: total sum of squares
k: for the regression
n − k − 1: for error

Estimation of the Error Variance σ²

Remember the model equation y = Xβ + ε with ε ~ N(0, σ²). If the assumptions of the model equation hold, then

E[SS_E / (n − k − 1)] = σ²

So an unbiased estimate of σ² is σ̂² = SS_E / (n − k − 1).

R² and Adjusted R² (a.k.a. coefficient of multiple determination)

What fraction of the total sum of squares (SS_T) is accounted for jointly by all the parameters in the fitted model?

R² = SS_R / SS_T = 1 − SS_E / SS_T

R² can only rise as parameters are added.

R²_adj = 1 − [SS_E / (n − p)] / [SS_T / (n − 1)]

R²_adj can rise or drop as parameters are added.
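A numeric sketch of these two formulas in Python; the sums of squares, n, and p below are made-up illustrations, not values from the lecture demo:

```python
# R^2 and adjusted R^2 from the breakdown of sums of squares.
# All numbers are illustrative.
SS_T = 100.0   # total sum of squares
SS_E = 20.0    # error sum of squares
n = 20         # number of observations
p = 4          # number of parameters (k regressors + intercept)

R2 = 1.0 - SS_E / SS_T                               # can only rise as terms are added
R2_adj = 1.0 - (SS_E / (n - p)) / (SS_T / (n - 1))   # can rise or drop
print(R2, R2_adj)
```

Because the adjustment divides each sum of squares by its degrees of freedom, adding a useless parameter shrinks n − p without shrinking SS_E much, and R²_adj falls.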

Back to the Mathcad demo of the Montgomery example. Montgomery, D. C., 2001, Design and Analysis of Experiments, John Wiley & Sons.

Why Hypothesis Testing is Important in Multiple Regression

Say there are 10 regressor variables; then there are 11 coefficients in a linear model. To make a fully 2nd-order model requires 10 curvature terms (one in each variable) and (10 choose 2) = 45 interactions. You'd need 66 samples just to get the matrix X^T X to be invertible. You need a way to discard insignificant terms.

Test for Significance of Regression

The hypotheses are

H_0: β_1 = β_2 = … = β_k = 0
H_1: β_j ≠ 0 for at least one j

The test statistic is

F_0 = (SS_R / k) / (SS_E / (n − k − 1))

Reject H_0 if F_0 > F_{α, k, n−k−1}.
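As a numeric sketch of this F statistic (Python, with made-up sums of squares; in practice the critical value F_{α, k, n−k−1} comes from a table or a stats library):

```python
# Overall-significance F statistic: F0 = (SS_R / k) / (SS_E / (n - k - 1)).
# All numbers are illustrative.
SS_R = 80.0    # regression sum of squares
SS_E = 20.0    # error sum of squares
k = 3          # regressor coefficients, excluding beta_0
n = 20         # observations

F0 = (SS_R / k) / (SS_E / (n - k - 1))
print(F0)      # reject H0 if F0 exceeds F_{alpha, k, n-k-1}
```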

Test for Significance of Individual Coefficients

The hypotheses are

H_0: β_j = 0
H_1: β_j ≠ 0

The test statistic is

t_0 = β̂_j / √(σ̂² C_jj)

where C_jj is the j-th diagonal element of (X^T X)⁻¹ and √(σ̂² C_jj) is the standard error of β̂_j. Reject H_0 if |t_0| > t_{α/2, n−k−1}.
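The same statistic as a short Python sketch; the coefficient, variance estimate, and C_jj below are illustrative placeholders rather than values from the demo:

```python
import math

# t statistic for one coefficient: t0 = beta_j / sqrt(sigma2_hat * C_jj).
beta_j = 1.99        # estimated coefficient (illustrative)
sigma2_hat = 0.04    # estimate of sigma^2, i.e. SS_E / (n - k - 1)
C_jj = 0.1           # j-th diagonal element of (X^T X)^{-1}

se = math.sqrt(sigma2_hat * C_jj)   # standard error of beta_j
t0 = beta_j / se
print(t0)            # reject H0: beta_j = 0 if |t0| > t_{alpha/2, n-k-1}
```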

Test for Significance of Groups of Coefficients

Partition the coefficients into two groups, β_2 (to be removed) and β_1 (to remain), so that

y = Xβ + ε = X_1 β_1 + X_2 β_2 + ε

The hypotheses are H_0: β_2 = 0 and H_1: β_2 ≠ 0. Basically, you form X_1 by removing from X the columns associated with the coefficients you are testing for significance.

Test for Significance of Groups of Coefficients (continued)

Reduced model: y = X_1 β_1 + ε. The regression sum of squares for the reduced model is

SS_R(β_1) = y^T H_1 y − n ȳ²

Define the sum of squares of the removed set, given that the other coefficients are in the model:

SS_R(β_2 | β_1) = SS_R(β) − SS_R(β_1)

The partial F test is

F_0 = [SS_R(β_2 | β_1) / r] / [SS_E / (n − p)]

Reject H_0 if F_0 > F_{α, r, n−p}.
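A numeric sketch of the partial F test; the full-model and reduced-model sums of squares, n, p, and r below are made-up illustrations:

```python
# Partial F test for a group of r coefficients.
# SS_R(beta2 | beta1) = SS_R(full model) - SS_R(reduced model).
SS_R_full = 80.0       # regression SS with all p parameters
SS_R_reduced = 70.0    # regression SS after removing the tested columns
SS_E = 20.0            # error SS of the full model
n, p, r = 20, 4, 2     # observations, full-model parameters, removed terms

SS_R_given = SS_R_full - SS_R_reduced   # extra SS due to the tested group
F0 = (SS_R_given / r) / (SS_E / (n - p))
print(F0)              # reject H0 if F0 > F_{alpha, r, n-p}
```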

Excel Demo -- Montgomery example. Montgomery, D. C., 2001, Design and Analysis of Experiments, John Wiley & Sons.

Factorial Experiments, Cuboidal Representation

(Figure: cube with corners at the + and − levels of three factors.)

Exhaustive search of the space of discrete 2-level factors is the full factorial 2³ experimental design.

Adding Center Points

(Figure: 2³ cube with a center point added.)

Center points allow an experimenter to check for curvature and, if replicated, allow for an estimate of pure experimental error.

Plan for Today

Mud cards
Multiple Regression
Estimation of the parameters
Hypothesis testing
Regression diagnostics
Testing lack of fit
Case study
Next steps

The Hat Matrix

Since β̂ = (X^T X)⁻¹ X^T y and ŷ = X β̂, therefore

ŷ = X (X^T X)⁻¹ X^T y

So we define the hat matrix

H = X (X^T X)⁻¹ X^T

which maps from observations y to predictions ŷ: ŷ = H y.
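The hat matrix can be computed directly from this definition. A pure-Python sketch for an assumed intercept-plus-one-regressor design (illustrative x values), using the closed-form 2×2 inverse of X^T X:

```python
# Hat matrix H = X (X^T X)^{-1} X^T for a model with an intercept and
# one regressor. The diagonal elements h_ii are the leverages.
x = [1.0, 2.0, 3.0, 4.0, 5.0]          # regressor values (illustrative)
n = len(x)
X = [[1.0, xi] for xi in x]            # design-matrix rows [1, x_i]

Sx = sum(x)
Sxx = sum(xi * xi for xi in x)
det = n * Sxx - Sx * Sx
XtX_inv = [[Sxx / det, -Sx / det],     # (X^T X)^{-1} in closed form
           [-Sx / det, n / det]]

def h(i, j):
    # H[i][j] = row_i(X) @ (X^T X)^{-1} @ row_j(X)^T
    a = [sum(X[i][r] * XtX_inv[r][c] for r in range(2)) for c in range(2)]
    return sum(a[c] * X[j][c] for c in range(2))

leverage = [h(i, i) for i in range(n)]  # diagonal elements h_ii
print(leverage)                         # extreme x values get the most leverage
print(sum(leverage))                    # trace(H) equals p, here 2
```

Note that the endpoints of the x range carry the largest h_ii, matching the influence-diagnostics discussion that follows.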

Influence Diagnostics

The relative disposition of points in x-space determines their effect on the coefficients. The hat matrix H gives us an ability to check for leverage points: h_ij is the amount of leverage exerted by point y_j on ŷ_i. Usually the diagonal elements h_ii are about p/n, and it is good to check whether any of them are much larger than that (a common rule flags h_ii > 2p/n).

Mathcad Demo on Distribution of Samples and Its Effect on Regression

Standardized Residuals

The residuals are defined as e = y − ŷ. An unbiased estimate of σ² is σ̂² = SS_E / (n − p). The standardized residuals are defined as

d_i = e_i / σ̂

If these elements were z-scores, then with probability 99.7%, −3 < d_i < 3.

Studentized Residuals

The residuals are defined as e = y − ŷ; therefore

e = y − H y = (I − H) y

So the covariance matrix of the residuals is

Cov(e) = σ² (I − H)

The studentized residuals are defined as

r_i = e_i / √(σ̂² (1 − h_ii))

If these elements were z-scores, then with probability 99.7%, −3 < r_i < 3 ????
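The two kinds of residuals compare as in this Python sketch; the residuals, leverages, and σ̂² below are illustrative placeholders:

```python
import math

# Standardized residuals d_i = e_i / sigma_hat versus studentized
# residuals r_i = e_i / sqrt(sigma2_hat * (1 - h_ii)).
e = [0.06, -0.13, 0.18, -0.21, 0.10]   # raw residuals e = y - y_hat (illustrative)
hh = [0.6, 0.3, 0.2, 0.3, 0.6]         # leverages h_ii from the hat matrix
sigma2_hat = 0.05                      # SS_E / (n - p), illustrative

d = [ei / math.sqrt(sigma2_hat) for ei in e]
r = [ei / math.sqrt(sigma2_hat * (1 - hi)) for ei, hi in zip(e, hh)]
print(d)
print(r)   # each |r_i| >= |d_i|: dividing by (1 - h_ii) < 1 inflates them
```

Studentizing corrects for the fact that Var(e_i) = σ²(1 − h_ii), so high-leverage points no longer get artificially small residuals.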

Testing for Lack of Fit (Assuming a Central Composite Design)

Compute the standard deviation of the center points and assume that it represents the pure experimental error:

S²_PE = Σ_{i ∈ center points} (y_i − ȳ)² / (n_C − 1)

The error sum of squares splits as SS_E = SS_PE + SS_LOF, so

S²_LOF = SS_LOF / (n − p − (n_C − 1))

and the test statistic is

F_0 = S²_LOF / S²_PE
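A Python sketch of this test, assuming replicated center points supply the pure-error estimate; the center-point responses, SS_E, and degrees of freedom are made-up illustrations:

```python
# Lack-of-fit F test with pure error estimated from replicated center points.
center_y = [40.0, 41.0, 40.3, 40.5, 41.2]   # replicated center-point responses
nC = len(center_y)
ybar = sum(center_y) / nC
SS_PE = sum((yi - ybar) ** 2 for yi in center_y)   # pure-error sum of squares
df_PE = nC - 1

SS_E = 2.5          # total error SS from the regression (illustrative)
df_E = 10           # error degrees of freedom, n - p (illustrative)
SS_LOF = SS_E - SS_PE                              # SS_E = SS_PE + SS_LOF
df_LOF = df_E - df_PE

F0 = (SS_LOF / df_LOF) / (SS_PE / df_PE)
print(F0)           # a large F0 indicates lack of fit
```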

Concept Test

You perform a linear regression of 100 data points (n = 100). There are two independent variables, x_1 and x_2. The regression R² is 0.7. Both β_1 and β_2 pass a t test for significance. You decide to add the interaction x_1 x_2 to the model. Select all the things that cannot happen:

1) Absolute value of β_1 decreases
2) β_1 changes sign
3) R² decreases
4) β_1 fails the t test for significance

Plan for Today

Mud cards
Multiple Regression
Estimation of the parameters
Hypothesis testing
Regression diagnostics
Testing lack of fit
Case study
Next steps

Scenario

The FAA and EPA are interested in reducing CO2 emissions. Some parameters of airline operations are thought to affect CO2 (e.g., speed, altitude, temperature, weight). Imagine flights have been made with special equipment that allowed CO2 emissions to be measured (data provided). You will report to the FAA and EPA on your analysis of the data and make some recommendations.

Phase One

Open a MATLAB window
Load the data (load FAAcase3.mat)
Explore the data

Phase Two

Do the regression
Examine the betas and their intervals
Plot the residuals

y = [co2./ground_speed];
ones(1:3538) = 1;
X = [ones' TAS alt temp weight];
[b,bint,r,rint,stats] = regress(y,X,0.05);
yhat = X*b;
plot(yhat,r,'+')

dims = size(X);
i = 2:dims(1)-1;
climb(1) = 1; climb(dims(1)) = 0;
des(1) = 0; des(dims(1)) = 1;
climb(i) = (alt(i) > (alt(i-1)+100)) | (alt(i+1) > (alt(i)+100));
des(i) = (alt(i) < (alt(i-1)-100)) | (alt(i+1) < (alt(i)-100));
for i = dims(1):-1:1
    if climb(i) | des(i)
        y(i,:) = []; X(i,:) = []; yhat(i,:) = []; r(i,:) = [];
    end
end
hold off
plot(yhat,r,'or')

This code will remove the points at which the aircraft is climbing or descending.

Try the Regression Again on Cruise-Only Portions

What were the effects on the residuals?
What were the effects on the betas?

hold off
[b,bint,r,rint,stats] = regress(y,X,0.05);
yhat = X*b;
plot(yhat,r,'+')

See What Happens if We Remove Variables

Remove weight & temp
Do the regression (CO2 vs TAS & alt)
Examine the betas and their intervals

[b,bint,r,rint,stats] = regress(y,X(:,1:3),0.05);

Phase Three

Try different data (flight34.mat)
Do the regression
Examine the betas and their intervals
Plot the residuals

y = [fuel_burn];
ones(1:34) = 1;
X = [ones' TAS alt temp];
[b,bint,r,rint,stats] = regress(y,X,0.05);
yhat = X*b;
plot(yhat,r,'+')

Adding Interactions

X(:,5) = X(:,2).*X(:,3);

This line will add an interaction (the elementwise product of two columns of X). What's the effect on the regression?
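The same column-product idea can be sketched in Python; the rows below are made-up, with columns assumed to be [intercept, TAS, alt, temp] as in the case study:

```python
# Adding an interaction column to a design matrix, analogous to the
# elementwise MATLAB product of the TAS and altitude columns.
X = [[1.0, 200.0, 30.0, 15.0],
     [1.0, 210.0, 32.0, 14.0],
     [1.0, 220.0, 35.0, 13.0]]

for row in X:
    row.append(row[1] * row[2])   # elementwise product: TAS * alt
print([row[4] for row in X])      # -> [6000.0, 6720.0, 7700.0]
```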

Case Wrap-Up

What were the recommendations?
What other analysis might be done?
What were the key lessons?

Next Steps

Wednesday 25 April: Design of Experiments. Please read "Statistics as a Catalyst to Learning"
Friday 27 April: Recitation to support the term project
Monday 30 April: Design of Experiments
Wednesday 2 May: Design of Computer Experiments
Friday 4 May: ?? Exam review ??
Monday 7 May: Frey at NSF
Wednesday 9 May: Exam #2