11.4.1 Estimation of Multiple Regression Coefficients

In multiple linear regression, we essentially solve n equations for the p unknown parameters. Thus n must be equal to or greater than p, and in practice n should be at least 3 or 4 times as large as p. The difference between the observed and predicted value of y (using regression), or the error, is $e_i = y_i - \hat{y}_i$. The regression coefficients are obtained by minimizing the sum of squares of errors. In matrix form, the n equations can be written as

$$Y = X\beta + e$$ (11.36)

where Y = (n x 1) column vector of the dependent variable, X = (n x p) matrix of independent variables, $\beta$ = (p x 1) column vector of the regression coefficients, and e = (n x 1) column vector of residuals. The residuals are conditioned by:

$$E[e] = 0$$ (11.37)

$$\mathrm{Cov}(e) = \sigma_\varepsilon^2 I$$ (11.38)

where I = (n x n) diagonal identity matrix with diagonal elements = 1 and off-diagonal elements = 0; and $\sigma_\varepsilon^2$ = variance of (Y | X).

According to the least squares principle, the estimates of the regression parameters are those which minimize the residual sum of squares $e^T e$. Hence

$$e^T e = (Y - X\beta)^T (Y - X\beta)$$ (11.39)

is differentiated with respect to $\beta$, and the resulting expression is set equal to zero. This gives:

$$X^T X \hat{\beta} = X^T Y$$ (11.40)

which are called the normal equations. Multiplying both sides by $(X^T X)^{-1}$ leads to an explicit expression for $\hat{\beta}$:

$$\underbrace{\hat{\beta}}_{(p \times 1)} = \Big[\underbrace{X^T}_{(p \times n)} \underbrace{X}_{(n \times p)}\Big]^{-1} \underbrace{X^T}_{(p \times n)} \underbrace{Y}_{(n \times 1)}$$
(11.41)

Note that the independent variables should be chosen such that none of them is a linear combination of the other independent variables. The estimator $\hat{\beta}$ has the covariance matrix:

$$\mathrm{Cov}(\hat{\beta}) = \sigma_\varepsilon^2 (X^T X)^{-1}$$ (11.42)

From the model (11.36) and the least squares solution (11.41), the total adjusted sum of squares $Y^T Y$ can be partitioned into an explained part due to regression and an unexplained part about regression, as follows:

$$Y^T Y = \hat{\beta}^T X^T Y + e^T e$$ (11.43)

where $(X\hat{\beta})^T Y$ = sum of squares due to regression; $e^T e$ = sum of squares about regression. This equation states:

Total sum of squares about mean = regression sum of squares + residual sum of squares

The mean square values of the right-hand side terms in (11.43) are obtained by dividing the sums of squares by their corresponding degrees of freedom. If $\beta$ is a (p x 1) column vector, i.e. there are p independent variables in the regression, then the regression sum of squares has p degrees of freedom. Since the total sum of squares has (n-1) degrees of freedom (note: 1 degree of freedom is lost due to the estimation of $\bar{y}$), it follows by subtraction that the residual sum of squares has (n-1-p) degrees of freedom. It can be shown that the residual mean square $S_e^2$:

$$S_e^2 = \frac{e^T e}{n - 1 - p}$$ (11.44)

is an unbiased estimate of $\sigma_\varepsilon^2$. The estimate $S_e$ of $\sigma_\varepsilon$ is the standard error of estimate.

The analysis of variance (ANOVA) table (see Table 11.2) summarizes the sum of squares quantities.

Table 11.2: Analysis of variance table (ANOVA)

| Source | Sum of squares | Degrees of freedom |
|---|---|---|
| Total | $S_Y = Y^T Y$ | n |
| Mean | $n\bar{y}^2$ | 1 |
| Regression | $\hat{\beta}^T X^T Y - n\bar{y}^2$ | p-1 |
| Residual | $Y^T Y - \hat{\beta}^T X^T Y$ | n-p |

As for simple linear regression, a measure for the quality of the regression equation is the coefficient of determination, defined as the ratio of the explained (regression) sum of squares to the total adjusted sum of squares:

$$R_m^2 = \frac{\hat{\beta}^T X^T Y}{Y^T Y} = 1 - \frac{e^T e}{Y^T Y}$$ (11.45)

11.4.2 Confidence Intervals on the Regression Line

To place confidence limits on $Y_0$, where $Y_0 = X_0 \beta$, it is necessary to have an estimate for the variance of $\hat{Y}_0$. Considering $\mathrm{Cov}(\hat{\beta})$ as given in (11.42), the variance $\mathrm{Var}(\hat{Y}_0)$ is given by:

$$\mathrm{Var}(\hat{Y}_0) = S_e^2 \, X_0 (X^T X)^{-1} X_0^T$$ (11.46)

The confidence limits for the mean regression equation are given by

$$CL_{\pm} = X_0 \hat{\beta} \pm t_{1-\alpha/2,\, n-p} \sqrt{\mathrm{Var}(\hat{Y}_0)}$$ (11.47)

Coefficient of Determination ($R^2$)

Let

$$Z_{i,j} = (X_{i,j} - \bar{x}_j)/S_j$$ (11.48)

where $\bar{x}_j$ and $S_j$ are the mean and standard deviation of the j-th independent variable. The correlation matrix is:

$$R = Z^T Z/(n-1) = [R_{i,j}]$$ (11.49)

where $R_{i,j}$ is the correlation between the i-th and j-th independent variables. R is a symmetric matrix since $R_{i,j} = R_{j,i}$. The coefficient of determination is defined as

$$R^2 = \frac{\text{Sum of squares due to regression}}{\text{Sum of squares about mean}}$$
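or

$$R^2 = \frac{\hat{\beta}^T X^T Y - n\bar{y}^2}{Y^T Y - n\bar{y}^2}$$ (11.50)

Here $\hat{\beta}^T$ is the transpose of vector $\hat{\beta}$, of size (1 x p), and $Y^T$ is the transpose of vector Y, of size (1 x n). Let the residual error be $e = Y - X\hat{\beta}$. $R^2$ is the part of the total sum of squares corrected for the mean that is explained by the regression equation. It ranges between 0 and 1, and the closer it is to 1, the better is the regression.

11.4.3 Inferences on Regression Coefficients

(i) Confidence intervals on $\beta_i$

Assuming that the model is correct, the quantity $(\hat{\beta}_i - \beta_i)/S_{\hat{\beta}_i}$ follows a t-distribution with (n-p) degrees of freedom, where $S_{\hat{\beta}_i}$, the standard error of $\hat{\beta}_i$, is the square root of the i-th diagonal element of the estimated covariance matrix $S_e^2 (X^T X)^{-1}$. The confidence intervals on $\beta_i$ are given as

$$(L_{\hat{\beta}_i}, U_{\hat{\beta}_i}) = \left(\hat{\beta}_i - t_{(1-\alpha/2),(n-p)}\, S_{\hat{\beta}_i},\ \hat{\beta}_i + t_{(1-\alpha/2),(n-p)}\, S_{\hat{\beta}_i}\right)$$ (11.51)

(ii) Test of hypothesis concerning $\beta_i$

The hypothesis that the i-th variable is not contributing significantly to explaining the variation in the dependent variable is equivalent to testing the hypothesis $H_0: \beta_i = \beta_o$ versus $H_a: \beta_i \neq \beta_o$, with $\beta_o = 0$. The test is conducted by computing:

$$t = (\hat{\beta}_i - \beta_o)/S_{\hat{\beta}_i}$$ (11.52)

The null hypothesis $H_0$ is rejected if $|t| > t_{(1-\alpha/2),(n-p)}$. If this hypothesis is accepted, it is advisable to delete the concerned variable from the regression model.

Significance of the overall regression

The null hypothesis $H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$ versus $H_a$: at least one of these $\beta$'s is not zero is used to test whether the regression equation is able to explain a significant amount of the variation of Y or not. The ratio of the mean square due to regression to the residual mean square has an F distribution with p-1 and n-p degrees of freedom. Hence, the hypothesis is tested by computing the test statistic:

$$F = \frac{(\hat{\beta}^T X^T Y - n\bar{y}^2)/(p-1)}{(Y^T Y - \hat{\beta}^T X^T Y)/(n-p)}$$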
(11.53)

$H_0$ is rejected if F exceeds the critical value $F_{(1-\alpha),(p-1),(n-p)}$.

Confidence Intervals on Regression Line:

To put the confidence limits on $\hat{Y}_k = X_k \hat{\beta}$, it is necessary to estimate the variance of $\hat{Y}_k$. This is given by

$$S_{\hat{Y}_k}^2 = S_e^2 \, X_k (X^T X)^{-1} X_k^T$$ (11.54)

where $\hat{Y}_k = X_k \hat{\beta}$. The confidence limits are

$$(L, U) = \left\{\hat{Y}_k - t_{(1-\alpha/2),(n-p)}\, S_{\hat{Y}_k},\ \hat{Y}_k + t_{(1-\alpha/2),(n-p)}\, S_{\hat{Y}_k}\right\}$$

Confidence Intervals on Individual Predicted Value of Y

$$S'^{2}_{\hat{Y}_k} = S_e^2 \left[1 + X_k (X^T X)^{-1} X_k^T\right]$$ (11.55)

$$(L', U') = \left\{\hat{Y}_k - t_{(1-\alpha/2),(n-p)}\, S'_{\hat{Y}_k},\ \hat{Y}_k + t_{(1-\alpha/2),(n-p)}\, S'_{\hat{Y}_k}\right\}$$ (11.56)

Example 11.5: Table 11.3 contains rainfall for the months of July and August and discharge for the month of August for a catchment. Estimate the parameters of linear regression and multiple linear regression and find out if there is an advantage in using multiple linear regression in this case.

Table 11.3 Data and computations for multiple linear regression example

| Year | RF-JUL (MCM) | RF-AUG (MCM) | Obs. Q Aug (MCM) | Comp. Q by Lin. Reg. ($Q_L$) | $(Q_o - Q_L)^2$ | Comp. Q by Mult. Lin. Reg. ($Q_M$) | $(Q_o - Q_M)^2$ |
|---|---|---|---|---|---|---|---|
| 1982 | 500.04 | 15664.05 | 5996.939 | 6830.0 | 694015.6 | 6873.0 | 76753 |
| 1983 | 7980.13 | 6546.4 | 557.916 | 363.6 | 497987.7 | 357.3 | 108983 |
| 1984 | 300.36 | 13086.63 | 4395.515 | 581.9 | 034467.0 | 4736.5 | 1164 |
| 1985 | 857.75 | 753.13 | 575.0 | 3649. | 4308915.7 | 4314.0 | 1990914 |
| 1986 | 54.03 | 5799.34 | 53.373 | 971.4 | 19787. | 045. | 3739 |
| 1987 | 6311.05 | 95.80 | 774.517 | 447.9 | 733589. | 4353.5 | 49339 |
| 1988 | 6040.00 | 785.46 | 4163.013 | 355.7 | 3747.7 | 313.1 | 108147 |
| 1989 | 1597.33 | 69.49 | 046.694 | 3410.8 | 186070.7 | 1068.8 | 95635 |
| 1990 | 8561.71 | 6889.43 | 4190.084 | 3397.8 | 6765.7 | 3988.7 | 40541 |
| 1991 | 7153.31 | 1566.8 | 6107.45 | 5618.5 | 39036.6 | 67.3 | 14365 |
| 1992 | 563.67 | 1063.08 | 5145.44 | 4717.4 | 183188.8 | 4433.0 | 507510 |
| 1993 | 433.30 | 7108.91 | 300.774 | 3483.7 | 139981.8 | 73. | 759 |
| 1994 | 13076.88 | 1047.3 | 8994.085 | 4799. | 17596705.8 | 7679.9 | 177088 |
| 1995 | 6843.64 | 8068.47 | 3695.11 | 3859.0 | 6865.5 | 385.6 | 4788 |
| 1996 | 7819.49 | 9330.16 | 4870.4 | 435.5 | 68196.6 | 4893.4 | 531 |
| 1997 | 9403.8 | 744.9 | 3943.455 | 3607.3 | 113005.8 | 4610.9 | 445541 |
| 1998 | 7040.85 | 8306.55 | 3801.77 | 395.1 | 64.6 | 4054.5 | 63883 |
| 1999 | 7380.56 | 9987.30 | 5895.899 | 4609.6 | 1654653.4 | 5036. | 739043 |
| 2000 | 860.8 | 483.79 | 1501.445 | 378.6 | 769480.6 | 713.5 | 1469074 |
| 2001 | 9113.46 | 5071.5 | 670.739 | 686.8 | 56.8 | 3314.4 | 414338 |
| 2002 | 196.93 | 11168.68 | 319.95 | 5071.7 | 359547.4 | 3060.5 | 17531 |
| 2003 | 7493.84 | 7784.6 | 3708.33 | 3748.0 | 157.8 | 3985.1 | 76593 |
| Sum | 14747.41 | 191085.61 | 9009.88 | 9009.88 | 3.91E+07 | 9.0E+04 | 1.4E+07 |
| Average | 6701.5 | 8685.71 | 4100.45 | 4100.449 | | | |

Solution: Using the data given in the table, a linear regression equation of the following form was established between the rainfall and the observed discharge for the month of August:

$$Q_A = a + b R_A$$

where $Q_A$ = discharge for August and $R_A$ = rainfall for August. The parameters a and b were estimated to yield the following equation:

$$Q_A = 703.05 + 0.391 R_A$$

Next, the discharge for the month of August was computed by using the above equation, and the sum of squares of residuals turned out to be 3.91*10^7. With a total sum of squares of the observed discharge about its mean of 6.33*10^7, the coefficient of determination is $R^2$ = 1 - 3.91*10^7/6.33*10^7 = 0.38.

In the case of multiple linear regression, the independent variables were the rainfall for the months of July and August, and the dependent variable was the discharge for August. A regression equation of the following form was envisaged:

$$Q_A = a + b_1 R_J + b_2 R_A$$
where $R_J$ = rainfall for the month of July. After computations, the following regression equation was obtained:

$$Q_A = -3058.4 + 0.4 R_J + 0.50 R_A$$

Coefficient of determination $R^2$ = 1 - 1.4*10^7/6.33*10^7 = 0.78.

The discharge for August was computed by the LR and MLR equations, and the sums of squares of errors were computed. The values were 3.91*10^7 for LR and 1.4*10^7 for MLR. When these values are compared along with $R^2$ for the two cases, it can be concluded that MLR gives much improved estimates of the discharge compared to LR.
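The comparison made in this example can be sketched in a few lines of code. The following is a minimal illustration, not the procedure used to produce Table 11.3: it solves the normal equations (11.40) by Gaussian elimination and then evaluates the residual sum of squares, $R^2$ (11.50), and the F statistic (11.53). The function names `solve_linear_system` and `fit_ols` are hypothetical, and the three data series are invented for illustration only; they are not the catchment data of the example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Happens when one regressor is a linear combination of others.
            raise ValueError("X'X is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares fit; `rows` holds the regressors without the constant."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, p = len(X), len(X[0])             # p counts the intercept as well
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)  # normal equations X'X b = X'Y
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual SS
    y_mean = sum(y) / n
    sst = sum((yi - y_mean) ** 2 for yi in y)  # SS about the mean
    r2 = 1.0 - sse / sst
    # Regression mean square over residual mean square, (p-1) and (n-p) d.o.f.
    f_stat = ((sst - sse) / (p - 1)) / (sse / (n - p)) if sse > 0 else float("inf")
    return {"beta": beta, "sse": sse, "sst": sst, "r2": r2, "f": f_stat}


# Invented illustrative data (NOT the values of Table 11.3).
rain_jul = [5.0, 7.0, 3.0, 8.0, 6.0, 4.0, 9.0, 2.0]
rain_aug = [9.0, 6.0, 8.0, 5.0, 7.0, 10.0, 4.0, 6.0]
flow_aug = [5.2, 4.0, 3.8, 4.1, 4.2, 5.4, 3.6, 2.5]

lr = fit_ols([(a,) for a in rain_aug], flow_aug)           # Q_A = a + b R_A
mlr = fit_ols(list(zip(rain_jul, rain_aug)), flow_aug)     # adds R_J

print("LR : SSE = %.3f, R2 = %.3f" % (lr["sse"], lr["r2"]))
print("MLR: SSE = %.3f, R2 = %.3f" % (mlr["sse"], mlr["r2"]))
```

Because the single-variable model is nested in the two-variable one, the residual sum of squares of the MLR fit can never exceed that of the LR fit on the same data. Whether the reduction is worthwhile is what the t-test on the added coefficient (11.52) or the F-test (11.53) is meant to judge. For real data sets an established numerical library would normally be preferred over hand-written elimination.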