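Multiple linear regression (with $K$ predictors)

Dependent data set: $y_i$, $i = 1, \dots, n$ (one predictand); predictors $x_{ik}$, $i = 1, \dots, n$, $k = 1, \dots, K$.

The forecast equation is

$$\hat{y}_i = b_0 + \sum_{k=1}^{K} b_k x_{ik}.$$

Use matrix notation:

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1K} \\ 1 & x_{21} & \cdots & x_{2K} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nK} \end{pmatrix}
= \begin{pmatrix} x_{10} & x_{11} & \cdots & x_{1K} \\ x_{20} & x_{21} & \cdots & x_{2K} \\ \vdots & \vdots & & \vdots \\ x_{n0} & x_{n1} & \cdots & x_{nK} \end{pmatrix}, \qquad x_{i0} = 1.$$

The regression coefficients are $B = (b_0, b_1, \dots, b_K)^T$, the forecast equation is $\hat{Y} = XB$, and the forecast error vector $E = (e_1, \dots, e_n)^T$ is given by $E = Y - XB$.

Least squares approach: choose $B$ to minimize

$$SSE = \|E\|^2 = E^T E = \sum_{i=1}^{n} e_i^2.$$

The total sum of squares (variance) of $Y$ is

$$SST = \sum_{i=1}^{n} y_i'^2 = Y'^T Y'.$$

The residual (forecast errors) sum of squares is

$$SSE = E^T E = (Y - XB)^T (Y - XB) = Y^T Y - Y^T XB - B^T X^T Y + B^T X^T XB$$

(all terms are scalars). The minimization is clearer in sum notation: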
$$SSE = \|E\|^2 = \sum_{i=1}^{n} \left( y_i - \sum_{l=0}^{K} b_l x_{il} \right)^2.$$

The minimization gives the normal equations:

$$\frac{\partial SSE}{\partial b_k} = -2 \sum_{i=1}^{n} \left( y_i - \sum_{l=0}^{K} b_l x_{il} \right) x_{ik} = 0,$$

that is,

$$\sum_{i=1}^{n} x^T_{ki}\, y_i - \sum_{i=1}^{n} x^T_{ki} \sum_{l=0}^{K} b_l x_{il} = 0, \qquad k = 0, 1, \dots, K,$$

which is the $k$-th component of $X^T Y - X^T XB = 0$.

So, in matrix form, the normal equations for the coefficients are

$$X^T XB = X^T Y \qquad \text{or} \qquad B = (X^T X)^{-1} X^T Y.$$

Again, we separate the total sum of squares SST into the regression (explained) sum of squares (SSR) and the error sum of squares (SSE):

$$SST = Y'^T Y', \qquad Y' = Y - \bar{Y},$$

$$SSE = (Y - XB)^T (Y - XB) = Y^T Y - B^T X^T Y - Y^T XB + B^T X^T XB = Y^T Y - Y^T XB,$$

since $X^T XB = X^T Y$. Since these are all scalars, they are the same as their transpose: $Y^T XB = (Y^T XB)^T = B^T X^T Y$, so that

$$SSE = Y^T Y - Y^T XB = (Y^T - B^T X^T) Y = E^T Y$$

(the scalar product of the error and the predictand).

Note (prove) that $E^T \hat{Y} = 0$ (the error is orthogonal to the forecast).

$$SSR = SST - SSE = R^2\, SST, \qquad R^2 = \frac{SSR}{SST}.$$

$R^2$ is the square of the generalized correlation coefficient, or explained variance.
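The normal equations and the SST = SSR + SSE decomposition can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`fit_normal_equations`, `sums_of_squares`) and the synthetic data are illustrative assumptions, not part of the notes.

```python
import numpy as np

def fit_normal_equations(x, y):
    """Build X (leading column of ones) and solve X^T X B = X^T Y for B."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), x])
    B = np.linalg.solve(X.T @ X, X.T @ y)
    return B, X

def sums_of_squares(X, y, B):
    """Return (SST, SSR, SSE) for the fitted coefficients B."""
    y = np.asarray(y, dtype=float)
    y_hat = X @ B                      # forecast  Y_hat = X B
    e = y - y_hat                      # error     E = Y - X B
    sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
    ssr = np.sum((y_hat - y.mean()) ** 2)
    sse = e @ e
    return sst, ssr, sse

# Synthetic example (hypothetical data): n = 50 cases, K = 3 predictors.
rng = np.random.default_rng(0)
n, K = 50, 3
x = rng.normal(size=(n, K))
y = 1.0 + x @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

B, X = fit_normal_equations(x, y)
sst, ssr, sse = sums_of_squares(X, y, B)
r2 = ssr / sst
e = y - X @ B

print("B =", B)
print("SST - (SSR + SSE) =", sst - (ssr + sse))  # zero up to round-off
print("E^T Y_hat =", e @ (X @ B))                # zero up to round-off
print("R^2 =", r2)
```

Solving the linear system with `np.linalg.solve` avoids forming $(X^T X)^{-1}$ explicitly; for ill-conditioned predictors, `np.linalg.lstsq` is usually the safer choice.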
This is an estimate for the dependent sample used for training the regression, so it is overoptimistic. In an independent sample, the explained variance is smaller.

Note that the naïve forecast error variance

$$\frac{SSE}{n-1} = \frac{1}{n-1} \sum_{i=1}^{n} e_i^2$$

is seriously biased (overoptimistic). This is for two reasons: a) we are using $K$ predictors, and b) it is the estimate for the dependent (training) sample.

The unbiased estimate for the forecast error variance for the dependent sample is

$$s^2 = \frac{1}{n-K-1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{SSE}{n-K-1}.$$

This is because the SSE is estimated with $n-K-1$ dof: SST is estimated with $n-1$ dof (one dof was used to compute $\bar{y}$), and SSR uses $K$ dof for the regression coefficients $b_k$, so that SSE is left with $n-K-1$ dof.

It is clear that if we use too many predictors, i.e., if $K \sim O(n)$, we can have overfitting. If $K = n-1$, we can fit the dependent sample perfectly, so that the naïve dependent sum of errors squared is $SSE = 0$. However, the estimate of the dependent forecast error squared is in that case

$$s^2 = \frac{SSE}{n-K-1} = \frac{0}{0}.$$

So we should never overfit, and should keep a number of predictors such that $K \ll n$.

Moreover, the regression coefficients (trained on the dependent sample) also have sampling errors (they are only estimates of the true regression coefficients).
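The effect of adding predictors on the naïve and unbiased error variances can be seen in a small experiment. This sketch assumes NumPy; the helper `error_variances` and the pure-noise data are hypothetical, chosen so that no predictor has any real skill.

```python
import numpy as np

def error_variances(x, y):
    """Return (SSE, naive variance SSE/(n-1), unbiased variance SSE/(n-K-1))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    K = X.shape[1] - 1
    B = np.linalg.lstsq(X, y, rcond=None)[0]
    sse = float(np.sum((y - X @ B) ** 2))
    dof = n - K - 1
    # With K = n - 1 there are no degrees of freedom left: s^2 = 0/0.
    unbiased = sse / dof if dof > 0 else float("nan")
    return sse, sse / (n - 1), unbiased

# Pure-noise predictand and predictors (hypothetical data).
rng = np.random.default_rng(1)
n = 12
y = rng.normal(size=n)
x_all = rng.normal(size=(n, n - 1))

# Use the first K columns as predictors, up to the saturated case K = n - 1.
results = {K: error_variances(x_all[:, :K], y) for K in (1, 5, n - 1)}
for K, (sse, naive, unbiased) in results.items():
    print(f"K={K:2d}  SSE={sse:.3e}  naive={naive:.3e}  unbiased={unbiased:.3e}")
```

Even though the predictors are unrelated to the predictand, SSE shrinks as K grows and vanishes at K = n - 1, while the unbiased estimate becomes undefined.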
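Because of this sampling error in $B$, when we apply the regression formula to new predictors $x_0$ (an independent data set), the standard deviation of the error is given by

$$\left( \frac{SSE}{n-K-1} \right)^{1/2} \left[ 1 + x_0^T (X^T X)^{-1} x_0 \right]^{1/2}.$$

The first factor shows that the uncertainty increases with the number of predictors; the second shows that it increases due to errors in the sampling of $B$ when used with independent data.

For $K = 1$ this correction for independent data is

$$1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sim 1 + \frac{1}{n}$$

if $x_0$ is not far from $\bar{x}$.

Therefore the forecast error for independent data is given by the $t$ distribution:

$$\frac{x_0^T B - E(x_0^T B)}{\left( \dfrac{SSE}{n-K-1} \right)^{1/2} \left[ 1 + x_0^T (X^T X)^{-1} x_0 \right]^{1/2}} \sim t_{n-K-1}.$$

We can estimate a $100(1-\alpha)\%$ confidence interval for predicting $Y(x_0)$ as

$$Y(x_0) = x_0^T B \pm \left( \frac{SSE}{n-K-1} \right)^{1/2} \left[ 1 + x_0^T (X^T X)^{-1} x_0 \right]^{1/2} t_{\alpha/2,\, n-K-1}.$$

If $\alpha = 5\%$, we look for values of $t_{0.025,\, n-K-1}$.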
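As an illustration of the interval above, the sketch below computes the half-width for a single predictor and compares the matrix correction factor with the closed form for $K = 1$. It assumes NumPy. The function name `prediction_interval` and the data are hypothetical. The critical value is passed in by hand (2.048 is the two-sided 5% value for 28 dof from a standard $t$ table), which is a design choice to avoid depending on SciPy.

```python
import numpy as np

def prediction_interval(x, y, x_new, t_crit):
    """Return (forecast, half-width, correction factor) at the new point x_new."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    K = X.shape[1] - 1
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ y
    sse = np.sum((y - X @ B) ** 2)
    s = np.sqrt(sse / (n - K - 1))            # unbiased error std. dev.
    x0 = np.concatenate([[1.0], np.atleast_1d(x_new).astype(float)])
    factor = 1.0 + x0 @ XtX_inv @ x0          # 1 + x0^T (X^T X)^-1 x0
    half_width = t_crit * s * np.sqrt(factor)
    return float(x0 @ B), float(half_width), float(factor)

# Hypothetical data: n = 30 cases, K = 1 predictor, so n - K - 1 = 28 dof.
rng = np.random.default_rng(2)
n = 30
x = rng.uniform(0.0, 10.0, size=n)
y = 3.0 + 0.8 * x + rng.normal(scale=1.0, size=n)

x_new = 4.0
t_crit = 2.048  # t_{0.025, 28} from a t table (assumed input)
# If SciPy is installed, scipy.stats.t.ppf(0.975, 28) gives the same value.

y0, half_width, factor = prediction_interval(x, y, x_new, t_crit)
closed_form = 1 + 1 / n + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

print(f"forecast = {y0:.3f} +/- {half_width:.3f}")
print(f"factor (matrix) = {factor:.6f}, factor (K=1 closed form) = {closed_form:.6f}")
```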
These prediction error estimates are only estimates. A better estimation of the error is to reserve part of the data for independent data testing (cross-validation). For example, we use 90% of the data to obtain the regression coefficients, and test the forecast on the remaining 10%. This can be repeated 10 times for different 10% subsets ("jackknifing"). This will give a good estimate of the forecast errors to be expected with an independent data set, as well as the error variance of the regression coefficients.

Statistical packages provide information not only about regression coefficients but also about their error estimates (using the formulas above) and about how significantly smaller the forecast error is. This is given in analysis of variance (ANOVA) and parameter error tables.

ANOVA

Source of variance   dof       Sum of Squares (SS)   Mean Square (MS = SS/dof)   F ratio (test statistic)
Total                n-1       SST                   SST/(n-1)
Regression           K         SSR                   MSR = SSR/K                 MSR/MSE
Error (residual)     n-K-1     SSE                   MSE = SSE/(n-K-1)

Regression Summary

Predictor   Coefficient   Standard error   t-ratio (n-K-1 dof)
Constant    b_0           s_{b_0}          b_0 / s_{b_0}
x_k         b_k           s_{b_k}          b_k / s_{b_k}

If the F ratio is large compared to $F_{0.05,\, K,\, n-K-1}$, then we reject the null hypothesis that the coefficients $b_k$ are really zero for the population and that $b_k \neq 0$ is due to sampling.
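Both the 10-fold cross-validation and the entries of the two tables can be sketched in a few lines. This is an illustration under assumptions, not a reproduction of any statistical package's output: it assumes NumPy, the function names (`fit`, `kfold_mse`, `anova`) and the data are hypothetical, and the coefficient standard errors use the standard least squares result $s_{b_k}^2 = MSE\,[(X^T X)^{-1}]_{kk}$, which the notes do not write out explicitly.

```python
import numpy as np

def fit(X, y):
    """Least squares coefficients for a design matrix X that includes the ones column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def kfold_mse(X, y, n_folds=10, seed=0):
    """Mean squared forecast error on held-out folds (each about 1/n_folds of the data)."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    sq_err = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)        # the remaining ~90% of the cases
        B = fit(X[train], y[train])
        sq_err.append((y[test] - X[test] @ B) ** 2)
    return float(np.mean(np.concatenate(sq_err)))

def anova(X, y):
    """ANOVA and regression-summary quantities for the full (dependent) sample."""
    n, p = X.shape
    K = p - 1
    B = fit(X, y)
    e = y - X @ B
    sse = e @ e
    sst = np.sum((y - y.mean()) ** 2)
    ssr = sst - sse
    msr, mse = ssr / K, sse / (n - K - 1)
    se_b = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))  # assumed standard formula
    return {"SST": sst, "SSR": ssr, "SSE": sse, "MSR": msr, "MSE": mse,
            "F": msr / mse, "B": B, "se": se_b, "t": B / se_b}

# Hypothetical data: n = 100 cases, K = 2 predictors, noise variance 0.25.
rng = np.random.default_rng(3)
n, K = 100, 2
x = rng.normal(size=(n, K))
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

table = anova(X, y)
cv_mse = kfold_mse(X, y)

print("F ratio =", table["F"])
print("t-ratios =", table["t"])
print("dependent-sample MSE =", table["MSE"], " cross-validated MSE =", cv_mse)
```

The F ratio and t-ratios still have to be compared against tabulated critical values ($F_{0.05,\,K,\,n-K-1}$ and $t$ with $n-K-1$ dof); the sketch only computes the statistics.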