ultiple Regression Dan Frey Associate Professor of echanical Engineering and Engineering Systems
Plan for Today ultiple Regression Estimation of the parameters Hypothesis testing Regression diagnostics Testing lack of fit Case study Net steps
The odel Equation ε X y + For a single variable For multiple variables ε α + + Y nk n n k k L O L L X k 0 ε n ε ε ε y n y y y α is renamed 0 These s allow 0 to enter the equation without being mult by s + k p
The odel Equation y X + ε Each row of X is paired with an observation y y y y n X Each column of X is paired with a coefficient L n n L O L k k nk ε E( ε i ) ε ε ε n 0 Var( ε ) σ i There are n observations of the response 0 k There are k coefficients Each observation is affected by an independent homoscedastic normal variates
Accounting for Indices ε X y + y n y y y nk n n k k L O L L X k 0 ε n ε ε ε + k p n np p n
Concept Question Which of these is a valid X matri? X 5.0m 0.3sec 7.m 0.sec 3.m 0.7 sec 5.4m 0.4sec A X 5.0m 7.V 3.sec 5.4A B 0.3m 0.V 0.7 sec 0.4A X 5.0m 7.m C 0.sec 0.3sec ) A only ) B only 3) C only 4) A and B 5) B and C 6) A and C 7) A, B, & C 8) None 9) I don t know
Adding h.o.t. to the odel Equation y n y y y n n n n n O X Each row of X is paired with an observation There are n observations of the response You can add interactions You can add curvature 0
Estimation of the Parameters Assume the model equation y X + ε We wish to minimize the sum squared error L ε T ε ( ) T y X ( y X) To minimize, we take the derivative and set it equal to zero L ˆ X T y + X T Xˆ The solution is ( T ) T X X X y And we define the fitted model ˆ yˆ Xˆ
Done in athcad: athcad Demo ontgomery Eample 0- ontgomery, D. C., 00, Design and Analysis of Eperiments, John Wiley & Sons.
Breakdown of Sum Squares Grand Total Sum of Squares GTSS n i y i SS due to mean ny SS T n i ( y i y) n SS R (yˆ y) i i SS E n i e i SSPE SSLOF
Breakdown of DOF n number of y values due to the mean n- total sum of squares k for the regression n-k- for error
Estimation of the Error Variance σ Remember the the model equation y X + ε ε ~ N(0, σ ) If assumptions of the model equation hold, then E ( ) SS ( n k ) σ E So an unbiased estimate of σ is σ ˆ SS E ( n k )
a.k.a. coefficient of multiple determination R and Adjusted R What fraction of the total sum of squares (SS T ) is accounted for jointly by all the parameters in the fitted model? R SS R SS T SS SS E T R can only rise as parameters are added R adj R adj SS SS E T ( n ( n p) ) n ( n p can rise or drop as parameters are added R )
Back to athcad Demo ontgomery Eample 0- ontgomery, D. C., 00, Design and Analysis of Eperiments, John Wiley & Sons.
Why Hypothesis Testing is Important in ultiple Regression Say there are 0 regressor variables Then there are coefficients in a linear model To make a fully nd order model requires 0 curvature terms in each variable 0 choose 45 interactions You d need 68 samples just to get the matri X T X to be invertible You need a way to discard insignificant terms
Test for Significance of Regression The hypotheses are H 0 : K k 0 H : j 0 for at least one j The test statistic is F 0 SS E SSR k ( n k ) F > F k n k Reject H 0 if 0 α,,
The hypotheses are Test for Significance Individual Coefficients H 0 : j 0 H : j 0 j C ( The test statistic is t0 X T X) ˆ σ C jj Standard error ˆ t > t k Reject H 0 if 0 α /, n σˆ C jj
Test for Significance of Groups of Coefficients Partition the coefficients into two groups to be removed to remain ε X y + Reduced model nk n n k k L O L L X Basically, you form X by removing the columns associated with the coefficients you are testing for significance X 0 0 0 : : H H
Test for Significance Groups of Coefficients Reduced model y X + ε The regression sum of squares for the reduced model is SS R ( T y H y ny ) Define the sum squares of the removed set given the other coefficients are in the model The partial F test F 0 SS SS R E ( ) ( n SS R r p) ( ) SSR ( ) SSR ( ) Reject H 0 if F 0 > Fα, r, n p
Ecel Demo -- ontgomery E0- ontgomery, D. C., 00, Design and Analysis of Eperiments, John Wiley & Sons.
Factorial Eperiments Cuboidal Representation + + - - + - 3 Ehaustive search of the space of discrete -level factors is the full factorial 3 eperimental design
Adding Center Points + + - - + - 3 Center points allow an eperimenter to check for curvature and, if replicated, allow for an estimate of pure eperimental error
Plan for Today ud cards ultiple Regression Estimation of the parameters Hypothesis testing Regression diagnostics Testing lack of fit Case study Net steps
The Hat atri Since ˆ ( T ) T X X X y and yˆ Xˆ therefore yˆ ( T ) T X X y X X So we define Which maps from observations y to predictions ŷ H ( T ) T X X X X y ˆ Hy
Influence Diagnostics The relative disposition of points in space determines their effect on the coefficients The hat matri H gives us an ability to check for leverage points h ij is the amount of leverage eerted by point y j on Usually the diagonal elements ~p/n and it is good to check whether the diagonal elements within X of that ŷ i
athcad Demo on Distribution of Samples and Its Effect on Regression
Standardized Residuals The residuals are defined as e y yˆ So an unbiased estimate of σ is ˆ σ SS E ( n p) The standardized residuals are defined as d e σˆ If these elements were z-scores then with probability 99.7% 3 < d < 3 i
Studentized Residuals The residuals are defined as therefore e y yˆ e y Hy ( I H) y So the covariance matri of the residuals is The studentized residuals are defined as Cov( e) σ Cov( I r i e ˆ ( i σ h ii ) H) If these elements were z-scores then with probability 99.7%???? 3 < r < 3 i
Testing for Lack of Fit (Assuming a Central Composite Design) Compute the standard deviation of the center points and assume that represents the S PE S PE i center points n C ( y y) S LOF SS p LOF SS ( n ) PE S PE SS + SS PE LOF SS E F 0 S S LOF PE
Concept Test You perform a linear regression of 00 data points (n00). There are two independent variables and. The regression R is 0.7. Both and pass a t test for significance. You decide to add the interaction to the model. Select all the things that cannot happen: ) Absolute value of decreases ) changes sign 3) R decreases 4) fails the t test for significance
Plan for Today ud cards ultiple Regression Estimation of the parameters Hypothesis testing Regression diagnostics Testing lack of fit Case study Net steps
Scenario The FAA and EPA are interested in reducing CO emissions Some parameters of airline operations are thought to effect CO (e.g., Speed, Altitude, Temperature, Weight) Imagine flights have been made with special equipment that allowed CO emission to be measured (data provided) You will report to the FAA and EPA on your analysis of the data and make some recommendations
Phase One Open a atlab window Load the data (load FAAcase3.mat) Eplore the data
Phase Two Do the regression Eamine the betas and their intervals Plot the residuals y[co./ground_speed]; ones(:3538); X[ones' TAS alt temp weight]; [b,bint,r,rint,stats] regress(y,x,0.05); yhatx*b; plot(yhat,r,'+')
dimssize(x); i:dims()-; climb(); climb(dims())0; des()0; des(dims()); climb(i)(alt(i)>(alt(i-)+00)) (alt(i+)>(alt(i)+00)); des(i)(alt(i)<(alt(i-)-00)) (alt(i+)<(alt(i)-00)); for idims():-: if climb(i) des(i) y(i,:)[]; X(i,:)[]; yhat(i,:)[]; r(i,:)[]; end end hold off plot(yhat,r,'or') This code will remove the points at which the aircraft is climbing or descending
Try The Regression Again on Cruise Only Portions What were the effects on the residuals? What were the effects on the betas? hold off [b,bint,r,rint,stats] regress(y,x,0.05); yhatx*b; plot(yhat,r,'+')
See What Happens if We Remove Variables Remove weight & temp Do the regression (CO vs TAS & alt) Eamine the betas and their intervals [b,bint,r,rint,stats] regress(y,x(:,:3),0.05);
Phase Three Try different data (flight34.mat) Do the regression Eamine the betas and their intervals Plot the residuals y[fuel_burn]; ones(:34); X[ones' TAS alt temp]; [b,bint,r,rint,stats] regress(y,x,0.05); yhatx*b; plot(yhat,r,'+')
Adding Interactions X(:,5)X(:,).*X(:,3); This line will add a interaction What s the effect on the regression?
Case Wrap-Up What were the recommendations? What other analysis might be done? What were the key lessons?
Net Steps Wenesday 5 April Design of Eperiments Please read "Statistics as a Catalyst to Learning" Friday 7 April Recitation to support the term project onday 30 April Design of Eperiments Wednesday ay Design of Computer Eperiments Friday 4 ay?? Eam review?? onday 7 ay Frey at NSF Wednesday 9 ay Eam #