STATGRAPHICS Rev. 9/13/2013

Comparison of Regression Lines

    Summary
    Data Input
    Analysis Summary
    Plot of Fitted Model
    Conditional Sums of Squares
    Analysis Options
    Forecasts
    Confidence Intervals
    Observed versus Predicted
    Residual Plots
    Unusual Residuals
    Influential Points
    Save Results
    Calculations

Summary

The Comparison of Regression Lines procedure is designed to compare the regression lines relating Y and X at two or more levels of a categorical factor. Tests are performed to determine whether there are significant differences between the intercepts and the slopes at the different levels of that factor. The regression lines are plotted, unusual residuals are identified, and predictions are made using the fitted model.

Sample StatFolio: compare reg.sgp

© 2013 by StatPoint Technologies, Inc.
Sample Data:

The file soap.sgd contains data on the amount of scrap produced by two production lines as a function of the speed at which those lines are run. The data, from Neter et al. (1996), is shown below:

    Line   Scrap   Speed
    1      218     100
    1      248     125
    1      360     220
    1      351     205
    1      470     300
    1      394     255
    1      332     225
    1      321     175
    1      410     270
    1      260     170
    1      241     155
    1      331     190
    1      275     140
    1      425     290
    1      367     265
    2      140     105
    2      277     215
    2      384     270
    2      341     255
    2      215     175
    2      180     135
    2      260     200
    2      361     275
    2      252     155
    2      422     320
    2      273     190
    2      410     295

It is desired to determine whether the relationship between Scrap (Y) and Speed (X) is different for the two Lines.
Data Input

The data input dialog box requests the names of the input variables:

    Dependent Variable: numeric column containing the n values of Y.

    Independent Variable: numeric column containing the n values of X.

    Level Codes: numeric or non-numeric column identifying the level of the categorical factor. A separate intercept and slope will be estimated for each unique value in this column.

    Select: subset selection.
Analysis Summary

The Analysis Summary shows information about the fitted model.

    Comparison of Regression Lines - Scrap vs. Speed
    Dependent variable: Scrap
    Independent variable: Speed
    Level codes: Line
    Number of complete cases: 27
    Number of regression lines: 2

    Multiple Regression Analysis
                                  Standard     T
    Parameter       Estimate      Error        Statistic   P-Value
    CONSTANT        97.9653       19.1817      5.10724     0.0000
    Speed           1.14539       0.0895535    12.79       0.0000
    Line=2          -90.3909      28.3457      -3.18887    0.0041
    Speed*Line=2    0.176661      0.128838     1.37119     0.1835

    Coefficients
    Line    Intercept    Slope
    1       97.9653      1.14539
    2       7.57446      1.32205

    Analysis of Variance
    Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
    Model            169165.0          3     56388.2        130.95     0.0000
    Residual         9904.06           23    430.611
    Total (Corr.)    179069.0          26

    R-Squared = 94.4691 percent
    R-Squared (adjusted for d.f.) = 93.7477 percent
    Standard Error of Est. = 20.7512
    Mean absolute error = 16.0087
    Durbin-Watson statistic = 1.79915 (P=0.2120)
    Lag 1 residual autocorrelation = 0.0911102

    Residual Analysis
            Estimation      Validation
    n       27
    MSE     430.611
    MAE     16.0087
    MAPE    5.45666
    ME      4.31589E-14
    MPE     -0.437749

Included in the output are:

Data Summary: a summary of the input data. In the example, there are a total of n = 27 observations. The categorical factor contains m = 2 levels.

Coefficients: the estimated coefficients, standard errors, t-statistics, and P-values. The general equation for the model is

$$Y = \beta_0 + \beta_1 X + \beta_2 I_1 + \beta_3 I_1 X + \beta_4 I_2 + \beta_5 I_2 X + \cdots + \beta_{2m-2} I_{m-1} + \beta_{2m-1} I_{m-1} X \tag{1}$$

where $I_1, I_2, \ldots, I_{m-1}$ are indicator variables for the categorical factor, with $I_j = 1$ if the categorical factor is at its (j+1)-st level and 0 otherwise. The terms involving only the indicator variables allow the intercepts to vary amongst levels of the categorical factor, while
the terms containing the cross-products of the indicator variables with X allow the slopes to vary. The t-statistic tests the null hypothesis that the corresponding model parameter equals 0, versus the alternative hypothesis that it does not equal 0. Small P-Values (less than 0.05 if operating at the 5% significance level) indicate that a model coefficient is significantly different from 0.

Analysis of Variance: decomposition of the variability of the dependent variable Y into a model sum of squares and a residual or error sum of squares. The residual sum of squares is further partitioned into a lack-of-fit component and a pure error component. Of particular interest is the F-test and the associated P-value. The F-test on the Model line tests the statistical significance of the fitted model. A small P-Value (less than 0.05 if operating at the 5% significance level) indicates that a significant relationship of the form specified exists between Y and X.

Statistics: summary statistics for the fitted model, including:

    R-squared - represents the percentage of the variability in Y which has been explained by the fitted regression model, ranging from 0% to 100%. For the sample data, the regression has accounted for about 94.5% of the variability amongst the observed amounts of Scrap.

    Adjusted R-Squared - the R-squared statistic, adjusted for the number of coefficients in the model. This value is often used to compare models with different numbers of coefficients.

    Standard Error of Est. - the estimated standard deviation of the residuals (the deviations around the model). This value is used to create prediction limits for new observations.

    Mean Absolute Error - the average absolute value of the residuals.

    Durbin-Watson Statistic - a measure of serial correlation in the residuals. If the residuals vary randomly, this value should be close to 2. A small P-value indicates a non-random pattern in the residuals. For data recorded over time, a small P-value could indicate that some trend over time has not been accounted for.
    Lag 1 Residual Autocorrelation - the estimated correlation between consecutive residuals, on a scale of -1 to 1. Values far from 0 indicate that significant structure remains unaccounted for by the model.

    Residual Analysis - if a subset of the rows in the datasheet has been excluded from the analysis using the Select field on the data input dialog box, the fitted model is used to make predictions of the Y values for those rows. This table shows statistics on the prediction errors, defined by

$$e_i = y_i - \hat{y}_i \tag{2}$$

Included are the mean squared error (MSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), the mean error (ME), and the mean percentage error (MPE). These validation statistics can be compared to the statistics for the fitted model to determine how well that model predicts observations outside of the data used to fit it.
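The summary statistics described above can be reproduced outside STATGRAPHICS. The following is a minimal Python sketch, not the program's own implementation. It assumes the sample data listed earlier, fits one simple regression per line (which is equivalent to model (1) when every intercept and slope is free), and assumes that MAE, ME, MAPE and MPE are plain averages over the rows. The names DATA, fit_line and stats are illustrative only.

```python
from math import sqrt

# Sample data as listed earlier: {line: [(scrap, speed), ...]} in datasheet order.
DATA = {
    1: [(218, 100), (248, 125), (360, 220), (351, 205), (470, 300),
        (394, 255), (332, 225), (321, 175), (410, 270), (260, 170),
        (241, 155), (331, 190), (275, 140), (425, 290), (367, 265)],
    2: [(140, 105), (277, 215), (384, 270), (341, 255), (215, 175),
        (180, 135), (260, 200), (361, 275), (252, 155), (422, 320),
        (273, 190), (410, 295)],
}


def fit_line(rows):
    """Least squares fit of scrap = a + b * speed for one level; returns (a, b)."""
    n = len(rows)
    xbar = sum(x for _, x in rows) / n
    ybar = sum(y for y, _ in rows) / n
    sxx = sum((x - xbar) ** 2 for _, x in rows)
    sxy = sum((x - xbar) * (y - ybar) for y, x in rows)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# One (intercept, slope) pair per level of the categorical factor.
coef = {line: fit_line(rows) for line, rows in DATA.items()}

# Observed values and residuals e = y - yhat, in datasheet order.
obs = [y for rows in DATA.values() for y, _ in rows]
resid = [y - (coef[line][0] + coef[line][1] * x)
         for line, rows in DATA.items() for y, x in rows]

n = len(obs)          # number of complete cases
p = 2 * len(DATA)     # one intercept and one slope per level
mean_y = sum(obs) / n
sst = sum((y - mean_y) ** 2 for y in obs)   # total (corrected) sum of squares
sse = sum(e * e for e in resid)             # residual sum of squares

stats = {
    "R-squared (%)": 100 * (1 - sse / sst),
    "Adjusted R-squared (%)": 100 * (1 - (sse / (n - p)) / (sst / (n - 1))),
    "Standard error of est.": sqrt(sse / (n - p)),
    "MAE": sum(abs(e) for e in resid) / n,
    "ME": sum(resid) / n,
    "MAPE (%)": 100 * sum(abs(e / y) for e, y in zip(resid, obs)) / n,
    "MPE (%)": 100 * sum(e / y for e, y in zip(resid, obs)) / n,
    "Durbin-Watson": sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / sse,
    "Lag 1 autocorrelation": sum(resid[t] * resid[t - 1] for t in range(1, n)) / sse,
}

for line, (a, b) in coef.items():
    print(f"Line {line}: intercept = {a:.4f}, slope = {b:.5f}")
for name, value in stats.items():
    print(f"{name:>24}: {value:.4f}")
```

The Durbin-Watson and lag 1 values depend on the row order, so they are only meaningful when the datasheet order reflects the order of collection.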
For the sample data, the fitted model is

    Scrap = 97.9653 + 1.14539*Speed - 90.3909*(Line=2) + 0.176661*Speed*(Line=2)

where Line=2 takes the value 1 for line #2 and 0 for line #1. To see that this corresponds to 2 separate regression lines, the model can be written separately for each line:

    Line #1: Scrap = 97.9653 + 1.14539*Speed
    Line #2: Scrap = (97.9653 - 90.3909) + (1.14539 + 0.176661)*Speed

The third and fourth coefficients in the full model represent the difference in the intercepts and slopes between the two lines.

Plot of Fitted Model

The Plot of Fitted Model pane shows the two regression lines:

    [Figure: Plot of Fitted Model. Scrap versus Speed, with a separate fitted line for Line 1 and Line 2.]

There is a noticeable offset between the lines, with line #1 producing more scrap at all speeds. In addition, the slope for line #2 is somewhat greater than for line #1.

Conditional Sums of Squares

The Conditional Sums of Squares pane displays an analysis of variance table that may be used to determine whether the intercepts and the slopes of the lines are significantly different:

    Further ANOVA for Variables in the Order Fitted
    Source        Sum of Squares    Df    Mean Square    F-Ratio    P-Value
    Speed         149661.0          1     149661.0       347.55     0.0000
    Intercepts    18694.1           1     18694.1        43.41      0.0000
    Slopes        809.623           1     809.623        1.88       0.1835
    Model         169165.0          3

Of primary interest are the F-tests and P-values for the Intercepts and Slopes.

1. The F-test for Slopes tests the hypotheses:
    Null Hypothesis: slopes of the lines are all equal
    Alt. Hypothesis: slopes of the lines are not all equal

If the P-Value is small (less than 0.05 if operating at the 5% significance level), then the slopes of the lines vary significantly amongst the levels of the categorical factor.

2. The F-test for Intercepts tests the hypotheses:

    Null Hypothesis: intercepts of the lines are all equal
    Alt. Hypothesis: intercepts of the lines are not all equal

If the P-Value is small (less than 0.05 if operating at the 5% significance level), then the intercepts of the lines vary significantly amongst the levels of the categorical factor.

In the sample data, the large P-value for Slopes indicates that the slopes of line #1 and line #2 are not significantly different. However, the intercepts of the two lines are significantly different.

Analysis Options

    Assume Equal Intercepts: select this option to fit a model with equal intercepts at all levels of the categorical factor.

    Assume Equal Slopes: select this option to fit a model with equal slopes at all levels of the categorical factor.

Example - Fitting Parallel Regression Lines

Selecting Assume Equal Slopes forces the regression lines to be parallel:
    [Figure: Plot of Fitted Model with equal slopes. Scrap versus Speed, with parallel fitted lines for Line 1 and Line 2.]

This is accomplished by removing the terms in the model that contain cross-products between the indicator variables and X. In the current example, the new model is:

    Scrap = 80.411 + 1.23074*Speed - 53.1292*(Line=2)

The last coefficient in the model indicates that line #2 produces 53.1292 units less scrap on average than line #1.

Forecasts

The model can be used to predict Y at selected values of X.

    Forecasts
                                95.00% Prediction Limits    95.00% Confidence Limits
    Speed    Line    Predicted Scrap    Lower      Upper        Lower      Upper
    100.0    1       203.485            156.234    250.736      185.288    221.682
             2       150.356            102.339    198.373      130.255    170.457
    200.0    1       326.559            281.516    371.602      315.274    337.844
             2       273.43             227.992    318.868      260.661    286.199
    300.0    1       449.633            402.823    496.443      432.614    466.652
             2       396.504            349.71     443.298      379.53     413.478

Included in the table are:

    X - the value of the independent variable at which the prediction is to be made.

    Predicted Y - the predicted value of the dependent variable using the fitted model.

    Prediction limits - prediction limits for new observations.

    Confidence limits - confidence limits for the mean value of Y.

For example, the predicted Scrap for Line #1 at Speed = 200 is approximately 327. New observations from that line can be expected to be between 282 and 372 with 95% confidence. On
average, the scrap from that line at that speed is estimated to be somewhere between 315 and 338.

Pane Options

    Confidence Level: confidence percentage for the intervals.

    Forecast at X: up to 10 values of X at which to make predictions.

Confidence Intervals

The Confidence Intervals pane shows the potential estimation error associated with each coefficient in the model.

    95.0% confidence intervals for coefficient estimates
                              Standard
    Parameter    Estimate     Error        Lower Limit    Upper Limit
    CONSTANT     80.411       14.5438      50.394         110.428
    Speed        1.23074      0.0655522    1.09545        1.36603
    Line=2       -53.1292     8.21003      -70.0739       -36.1845

Pane Options

    Confidence Level: percentage level for the interval or bound.
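The forecasts and limits for the equal-slopes model can be checked with a short calculation. The sketch below is an illustration under stated assumptions rather than the STATGRAPHICS algorithm: it uses the pooled within-line slope (the closed-form least squares solution for parallel lines), the sample data listed earlier, and a hard-coded critical value T_CRIT = 2.0639, which is the 97.5th percentile of Student's t with 24 degrees of freedom. The names forecast, summary and T_CRIT are illustrative.

```python
from math import sqrt

# Sample data as listed earlier: {line: [(scrap, speed), ...]}.
DATA = {
    1: [(218, 100), (248, 125), (360, 220), (351, 205), (470, 300),
        (394, 255), (332, 225), (321, 175), (410, 270), (260, 170),
        (241, 155), (331, 190), (275, 140), (425, 290), (367, 265)],
    2: [(140, 105), (277, 215), (384, 270), (341, 255), (215, 175),
        (180, 135), (260, 200), (361, 275), (252, 155), (422, 320),
        (273, 190), (410, 295)],
}
T_CRIT = 2.0639  # assumed: t quantile for 95% two-sided limits, 24 d.f.

# Per-line sample size, means, and within-line sums of squares and products.
summary = {}
for line, pts in DATA.items():
    n_g = len(pts)
    xbar = sum(x for _, x in pts) / n_g
    ybar = sum(y for y, _ in pts) / n_g
    sxx = sum((x - xbar) ** 2 for _, x in pts)
    sxy = sum((x - xbar) * (y - ybar) for y, x in pts)
    summary[line] = (n_g, xbar, ybar, sxx, sxy)

# Common slope pooled over lines; each line keeps its own intercept.
sxx_w = sum(s[3] for s in summary.values())
slope = sum(s[4] for s in summary.values()) / sxx_w
intercept = {g: s[2] - slope * s[1] for g, s in summary.items()}

n = sum(s[0] for s in summary.values())
p = len(DATA) + 1  # one intercept per line plus the common slope
sse = sum((y - intercept[g] - slope * x) ** 2
          for g, pts in DATA.items() for y, x in pts)
s2 = sse / (n - p)  # residual mean square of the equal-slopes model


def forecast(line, speed):
    """Return (prediction, prediction limits, confidence limits) at a speed."""
    n_g, xbar = summary[line][0], summary[line][1]
    pred = intercept[line] + slope * speed
    var_mean = s2 * (1 / n_g + (speed - xbar) ** 2 / sxx_w)
    half_ci = T_CRIT * sqrt(var_mean)        # limits for the mean of Y
    half_pi = T_CRIT * sqrt(s2 + var_mean)   # limits for a new observation
    return pred, (pred - half_pi, pred + half_pi), (pred - half_ci, pred + half_ci)


# Standard error and 95% confidence interval for the common slope.
slope_se = sqrt(s2 / sxx_w)
slope_ci = (slope - T_CRIT * slope_se, slope + T_CRIT * slope_se)

for speed in (100.0, 200.0, 300.0):
    for line in DATA:
        pred, pi, ci = forecast(line, speed)
        print(f"{speed:6.1f} {line} {pred:8.3f} "
              f"{pi[0]:8.3f} {pi[1]:8.3f} {ci[0]:8.3f} {ci[1]:8.3f}")
print(f"slope = {slope:.5f}, SE = {slope_se:.7f}, "
      f"95% CI = ({slope_ci[0]:.5f}, {slope_ci[1]:.5f})")
```

For a different confidence level or sample size, T_CRIT would need to be replaced by the matching t quantile.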
Observed versus Predicted

The Observed versus Predicted plot shows the observed values of Y on the vertical axis and the predicted values ŷ on the horizontal axis.

    [Figure: Plot of Scrap. Observed versus predicted values, with points marked by Line (1 or 2).]

If the model fits well, the points should be randomly scattered around the diagonal line. It is sometimes possible to see curvature in this plot, which would indicate the need for a curvilinear model rather than a linear model. Any change in variability from low values of Y to high values of Y might also indicate the need to transform the dependent variable before fitting a model to the data.

Residual Plots

As with all statistical models, it is good practice to examine the residuals. In a regression, the residuals are defined by

$$e_i = y_i - \hat{y}_i \tag{3}$$

i.e., the residuals are the differences between the observed data values and the fitted model.

The Comparison of Regression Lines procedure creates various types of residual plots, depending on Pane Options.
Scatterplot versus X

This plot is helpful in visualizing any need for a curvilinear model.

    [Figure: Residual Plot. Studentized residual versus predicted Scrap, with points marked by Line (1 or 2).]

Normal Probability Plot

This plot can be used to determine whether or not the deviations around the line follow a normal distribution, which is the assumption used to form the prediction intervals.

    [Figure: Normal Probability Plot for Scrap. Cumulative percentage (normal probability scale) versus Studentized residual.]

If the deviations follow a normal distribution, they should fall approximately along a straight line.
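The coordinates behind a normal probability plot can be sketched as follows. This is a generic construction, not necessarily the one STATGRAPHICS uses: the plotting positions (i - 0.375) / (n + 0.25) are an assumption, and the residual values are hypothetical, chosen only to make the example runnable.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Return (residual, cumulative percentage, normal score) for each residual.

    Residuals are sorted, the i-th smallest is assigned the assumed plotting
    position (i - 0.375) / (n + 0.25), and that position is converted to a
    standard normal quantile.
    """
    n = len(residuals)
    std_normal = NormalDist()
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        position = (i - 0.375) / (n + 0.25)
        points.append((r, 100 * position, std_normal.inv_cdf(position)))
    return points


# Hypothetical Studentized residuals, for illustration only.
example = [0.3, -2.11, 1.4, -0.2, -0.9]
points = normal_plot_points(example)
for r, pct, z in points:
    print(f"residual = {r:6.2f}   percentage = {pct:5.1f}   normal score = {z:6.3f}")
```

If the residuals are close to normal, plotting the residuals against the normal scores gives points that lie near a straight line.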
Residual Autocorrelations

This plot calculates the autocorrelation between residuals as a function of the number of rows between them in the datasheet.

    [Figure: Residual Autocorrelations for Scrap. Autocorrelation (from -1 to 1) versus lag (0 to 12), with probability limits.]

It is only relevant if the data have been collected sequentially. Any bars extending beyond the probability limits would indicate significant dependence between residuals separated by the indicated lag, which would violate the assumption of independence made when fitting the regression model.

Pane Options

    Plot: the type of residuals to plot:

    1. Residuals - the residuals from the least squares fit.

    2. Studentized residuals - the difference between the observed values y_i and the predicted values ŷ_i when the model is fit using all observations except the i-th, divided by the estimated standard error. These residuals are sometimes called externally deleted
residuals, since they measure how far each value is from the fitted model when that model is fit using all of the data except the point being considered. This is important, since a large outlier might otherwise affect the model so much that it would not appear to be unusually far away from the line.

    Type: the type of plot to be created. A Scatterplot is used to test for curvature. A Normal Probability Plot is used to determine whether the model residuals come from a normal distribution. An Autocorrelation Function is used to test for dependence between consecutive residuals.

    Plot Versus: for a Scatterplot, the quantity to plot on the horizontal axis.

    Number of Lags: for an Autocorrelation Function, the maximum number of lags. For small data sets, the number of lags plotted may be less than this value.

    Confidence Level: for an Autocorrelation Function, the level used to create the probability limits.

Unusual Residuals

Once the model has been fit, it is useful to study the residuals to determine whether any outliers exist that should be removed from the data. The Unusual Residuals pane lists all observations that have Studentized residuals of 2.0 or greater in absolute value.

    Unusual Residuals
                    Predicted                  Studentized
    Row    Y        Y            Residual      Residual
    15     367.0    406.557      -39.5573      -2.11

Studentized residuals greater than 3 in absolute value correspond to points more than 3 standard deviations from the fitted model, which is a rare event for a normal distribution.

Note: Points can be removed from the fit while examining the Plot of the Fitted Model by clicking on a point and then pressing the Exclude/Include button on the analysis toolbar. Excluded points are marked with an X.

Influential Points

In fitting a regression model, not all observations have an equal influence on the parameter estimates in the fitted model. In a simple regression, points located at very low or very high values of X have greater influence than those located nearer to the mean of X.
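The Studentized residual reported for row 15, together with the leverage and DFITS values that quantify influence, can be reproduced with a few lines of Python. The sketch below is an illustration under assumptions, not the STATGRAPHICS code: it refits the equal-slopes model from the sample data listed earlier, uses the closed-form leverage that applies to that particular model, and obtains the externally Studentized residual from the standard deletion identity instead of refitting 27 times. The DFITS cutoff of 2*sqrt(p/n) is the conventional one and is assumed here. All names are illustrative.

```python
from math import sqrt

# Sample data as listed earlier: {line: [(scrap, speed), ...]} in datasheet order.
DATA = {
    1: [(218, 100), (248, 125), (360, 220), (351, 205), (470, 300),
        (394, 255), (332, 225), (321, 175), (410, 270), (260, 170),
        (241, 155), (331, 190), (275, 140), (425, 290), (367, 265)],
    2: [(140, 105), (277, 215), (384, 270), (341, 255), (215, 175),
        (180, 135), (260, 200), (361, 275), (252, 155), (422, 320),
        (273, 190), (410, 295)],
}

# Per-line sample size, means, and within-line sums of squares and products.
summary = {}
for line, pts in DATA.items():
    n_g = len(pts)
    xbar = sum(x for _, x in pts) / n_g
    ybar = sum(y for y, _ in pts) / n_g
    sxx = sum((x - xbar) ** 2 for _, x in pts)
    sxy = sum((x - xbar) * (y - ybar) for y, x in pts)
    summary[line] = (n_g, xbar, ybar, sxx, sxy)

# Equal-slopes fit: pooled slope, separate intercepts.
sxx_w = sum(s[3] for s in summary.values())
slope = sum(s[4] for s in summary.values()) / sxx_w
intercept = {g: s[2] - slope * s[1] for g, s in summary.items()}

cases = [(g, y, x) for g, pts in DATA.items() for y, x in pts]
n = len(cases)
p = len(DATA) + 1
resid = [y - intercept[g] - slope * x for g, y, x in cases]
sse = sum(e * e for e in resid)

diagnostics = []  # (row, residual, Studentized residual, leverage, DFITS)
for row, ((g, y, x), e) in enumerate(zip(cases, resid), start=1):
    n_g, xbar = summary[g][0], summary[g][1]
    # Leverage of the point in the equal-slopes model.
    h = 1 / n_g + (x - xbar) ** 2 / sxx_w
    # Externally Studentized (deleted) residual via the deletion identity.
    t = e * sqrt((n - p - 1) / (sse * (1 - h) - e * e))
    dfits = t * sqrt(h / (1 - h))
    diagnostics.append((row, e, t, h, dfits))

average_leverage = p / n
dfits_cutoff = 2 * sqrt(p / n)  # assumed conventional cutoff
unusual = [d[0] for d in diagnostics if abs(d[2]) >= 2.0]
influential = [d[0] for d in diagnostics
               if d[3] > 3 * average_leverage or abs(d[4]) > dfits_cutoff]

for row, e, t, h, dfits in diagnostics:
    if row in unusual or row in influential:
        print(f"row {row}: residual = {e:.4f}, Studentized = {t:.2f}, "
              f"leverage = {h:.6f}, DFITS = {dfits:.6f}")
print(f"average leverage = {average_leverage:.6f}")
```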
The Influential Points pane displays any observations that have high influence on the fitted model:

    Influential Points
                        Mahalanobis                  Cook's
    Row    Leverage     Distance       DFITS         Distance
    15     0.100555     1.83337        -0.706032     0.0131351

    Average leverage of single data point = 0.111111

Points are placed on this list for one of the following reasons:
    Leverage - measures how distant an observation is from the mean of all n observations in the space of the independent variables. The higher the leverage, the greater the impact of the point on the fitted values ŷ. Points are placed on the list if their leverage is more than 3 times that of an average data point.

    Mahalanobis Distance - measures the distance of a point from the center of the collection of points in the multivariate space of the independent variables. Since this distance is related to leverage, it is not used to select points for the table.

    DFITS - measures the difference between the predicted values ŷ_i when the model is fit with and without the i-th data point. Points are placed on the list if the absolute value of DFITS exceeds $2\sqrt{p/n}$, where p is the number of coefficients in the fitted model.

Save Results

The following results may be saved to the datasheet:

    1. Predicted Values - the predicted value of Y corresponding to each of the n observations.
    2. Standard Errors of Predictions - the standard errors for the n predicted values.
    3. Lower Limits for Predictions - the lower prediction limits for each predicted value.
    4. Upper Limits for Predictions - the upper prediction limits for each predicted value.
    5. Standard Errors of Means - the standard errors for the mean value of Y at each of the n values of X.
    6. Lower Limits for Forecast Means - the lower confidence limits for the mean value of Y at each of the n values of X.
    7. Upper Limits for Forecast Means - the upper confidence limits for the mean value of Y at each of the n values of X.
    8. Residuals - the n residuals.
    9. Studentized Residuals - the n Studentized residuals.
    10. Leverages - the leverage values corresponding to the n values of X.
    11. DFITS Statistics - the value of the DFITS statistic corresponding to the n values of X.
    12. Mahalanobis Distances - the Mahalanobis distance corresponding to the n values of X.
    13. Coefficients - the estimated model coefficients.

Calculations

Details on the calculations in this procedure may be found in the Multiple Regression documentation.
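As one illustration of the calculations, the Conditional Sums of Squares table can be rebuilt by comparing three nested fits: a single common line, parallel lines, and separate lines. The Python sketch below follows that extra-sum-of-squares reasoning using the sample data listed earlier. It is an assumption-laden illustration, not the procedure's documented algorithm, and the names sums, ss and f_ratio are illustrative. P-values are omitted because they would need an F distribution routine.

```python
# Sample data as listed earlier: {line: [(scrap, speed), ...]}.
DATA = {
    1: [(218, 100), (248, 125), (360, 220), (351, 205), (470, 300),
        (394, 255), (332, 225), (321, 175), (410, 270), (260, 170),
        (241, 155), (331, 190), (275, 140), (425, 290), (367, 265)],
    2: [(140, 105), (277, 215), (384, 270), (341, 255), (215, 175),
        (180, 135), (260, 200), (361, 275), (252, 155), (422, 320),
        (273, 190), (410, 295)],
}


def sums(pts):
    """Corrected sums of squares and products (Sxx, Sxy, Syy) for (y, x) pairs."""
    n = len(pts)
    xbar = sum(x for _, x in pts) / n
    ybar = sum(y for y, _ in pts) / n
    sxx = sum((x - xbar) ** 2 for _, x in pts)
    sxy = sum((x - xbar) * (y - ybar) for y, x in pts)
    syy = sum((y - ybar) ** 2 for y, _ in pts)
    return sxx, sxy, syy


all_pts = [pt for pts in DATA.values() for pt in pts]
n, m = len(all_pts), len(DATA)

# Residual sums of squares for the three nested models.
sxx_t, sxy_t, sst = sums(all_pts)
sse_common = sst - sxy_t ** 2 / sxx_t              # one line for all levels

within = [sums(pts) for pts in DATA.values()]
sxx_w = sum(s[0] for s in within)
sxy_w = sum(s[1] for s in within)
syy_w = sum(s[2] for s in within)
sse_parallel = syy_w - sxy_w ** 2 / sxx_w          # equal slopes
sse_full = sum(s[2] - s[1] ** 2 / s[0] for s in within)  # separate lines

# Sums of squares in the order fitted, each with its degrees of freedom.
ss = {
    "Speed": (sst - sse_common, 1),
    "Intercepts": (sse_common - sse_parallel, m - 1),
    "Slopes": (sse_parallel - sse_full, m - 1),
}
mse = sse_full / (n - 2 * m)   # residual mean square of the full model
f_ratio = {name: (value / df) / mse for name, (value, df) in ss.items()}

for name, (value, df) in ss.items():
    print(f"{name:<11} SS = {value:10.3f}  Df = {df}  F = {f_ratio[name]:7.2f}")
print(f"{'Model':<11} SS = {sst - sse_full:10.3f}  Df = {2 * m - 1}")
```

Each F-ratio divides the change in residual sum of squares per degree of freedom by the residual mean square of the full model, which is why all three tests in the table share the same denominator.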