Objectives of Multiple Regression

Establish the linear equation that best predicts values of a dependent variable $Y$ using more than one explanatory variable from a large set of potential predictors $\{x_1, x_2, \ldots, x_k\}$.
Find the subset of all possible predictor variables that explains a significant and appreciable proportion of the variance of $Y$, trading off adequacy of prediction against the cost of measuring more predictor variables.

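Expanding Simple Linear Regression
Quadratic model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$
General polynomial model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_1^3 + \cdots + \beta_k x_1^k + \varepsilon$
Adding one or more polynomial terms to the model. Any independent variable, $x_i$, which appears in the polynomial regression model as $x_i^k$ is called a $k$th-degree term.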
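As a minimal sketch of how a polynomial model is evaluated, the snippet below builds the terms $1, x, x^2, \ldots, x^k$ and combines them with a coefficient list. The function names and the coefficient values are illustrative assumptions, not taken from the slides.

```python
def polynomial_terms(x, degree):
    """Return [1, x, x**2, ..., x**degree], the regressors of a polynomial model."""
    return [x ** p for p in range(degree + 1)]


def predict_polynomial(coeffs, x):
    """Evaluate b0 + b1*x + ... + bk*x**k for coefficients [b0, ..., bk]."""
    terms = polynomial_terms(x, len(coeffs) - 1)
    return sum(b * t for b, t in zip(coeffs, terms))


# Hypothetical quadratic model: y = 1 + 2*x + 0.5*x**2 (made-up coefficients).
quadratic = [1.0, 2.0, 0.5]
print(predict_polynomial(quadratic, 3.0))  # 1 + 6 + 4.5 = 11.5
```

The model stays linear in the coefficients even though it is nonlinear in $x$, which is why it can still be fitted by ordinary least squares.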

Polynomial model shapes
[Figure: two panels, Linear and Quadratic.] Adding one more term to the model significantly improves the model fit.

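Incorporating Additional Predictors
Simple additive multiple regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \varepsilon$
Additive (Effect) Assumption - The expected change in $y$ per unit increment in $x_j$ is constant and does not depend on the value of any other predictor. This change in $y$ is equal to $\beta_j$.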

Additive regression models: For two independent variables, the response is modeled as a surface.

Interpreting Parameter Values (Model Coefficients)
Intercept $\beta_0$ - value of $y$ when all predictors are 0.
Partial slopes $\beta_1, \beta_2, \beta_3, \ldots, \beta_k$ - $\beta_j$ describes the expected change in $y$ per unit increment in $x_j$ when all other predictors in the model are held at a constant value.

Graphical depiction of $\beta_j$
$\beta_1$ - slope in direction of $x_1$. $\beta_2$ - slope in direction of $x_2$.

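Multiple Regression with Interaction Terms
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \cdots + \beta_{1k} x_1 x_k + \cdots + \beta_{k-1,k} x_{k-1} x_k + \varepsilon$
The cross-product terms quantify the interaction among predictors.
Interactive (Effect) Assumption: The effect of one predictor, $x_i$, on the response, $y$, will depend on the value of one or more of the other predictors.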

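Interpreting Interaction
Interaction model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$
Or define $x_3 = x_1 x_2$: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$. No difference.
$\beta_1$: no longer the expected change in $Y$ per unit increment in $X_1$!
$\beta_{12}$: no easy interpretation! The effect on $y$ of a unit increment in $X_1$ now depends on $X_2$.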
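To make the dependence concrete, here is a small sketch of a two-predictor interaction model, in which the expected change in $y$ for a unit increment in $x_1$ works out to $\beta_1 + \beta_{12} x_2$. The coefficient values and function names are made up for illustration.

```python
def expected_y(b0, b1, b2, b12, x1, x2):
    """Mean response under the two-predictor interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


def unit_effect_of_x1(b0, b1, b2, b12, x1, x2):
    """Change in the mean response when x1 increases by one unit, x2 held fixed."""
    return expected_y(b0, b1, b2, b12, x1 + 1, x2) - expected_y(b0, b1, b2, b12, x1, x2)


# Hypothetical coefficients (not from the slides).
coeffs = dict(b0=1.0, b1=2.0, b2=3.0, b12=0.5)

# The effect equals b1 + b12 * x2, so it changes with x2.
print(unit_effect_of_x1(x1=1.0, x2=0.0, **coeffs))  # 2.0
print(unit_effect_of_x1(x1=1.0, x2=4.0, **coeffs))  # 4.0
```

Setting `b12` to zero recovers the additive model, where the effect is the same at every value of `x2`.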

[Figure: no-interaction case versus interaction case.]

Multiple Regression models with interaction
[Figure: two panels, "Lines move apart" and "Lines come together".]

Effect of the Interaction Term
The multiple regression surface is twisted.

A Protocol for Multiple Regression
- Identify all possible predictors.
- Establish a method for estimating model parameters and their standard errors.
- Develop tests to determine if a parameter is equal to zero (i.e. no evidence of association).
- Reduce the number of predictors appropriately.
- Develop predictions and associated standard errors.

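Estimating Model Parameters
Least Squares Estimation. Assume a random sample of $n$ observations $(y_i, x_{i1}, x_{i2}, \ldots, x_{ik})$, $i = 1, 2, \ldots, n$. The estimates of the parameters for the best predicting equation
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}$$
are found by choosing the values $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ which minimize the expression
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}\right)^2$$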

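Normal Equations
Take the partial derivatives of the SSE function with respect to $\beta_0, \beta_1, \ldots, \beta_k$, and equate each equation to 0. Solve this system of $k+1$ equations in $k+1$ unknowns to obtain the equations for the parameter estimates.
$$\begin{aligned}
\sum_{i} y_i &= n\hat\beta_0 + \hat\beta_1 \sum_{i} x_{i1} + \cdots + \hat\beta_k \sum_{i} x_{ik} \\
\sum_{i} x_{i1} y_i &= \hat\beta_0 \sum_{i} x_{i1} + \hat\beta_1 \sum_{i} x_{i1}^2 + \cdots + \hat\beta_k \sum_{i} x_{i1} x_{ik} \\
&\;\;\vdots \\
\sum_{i} x_{ik} y_i &= \hat\beta_0 \sum_{i} x_{ik} + \hat\beta_1 \sum_{i} x_{ik} x_{i1} + \cdots + \hat\beta_k \sum_{i} x_{ik}^2
\end{aligned}$$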
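The normal equations can be solved numerically in a few lines. The following is a minimal sketch in plain Python, assuming a small, well-conditioned problem: it forms the sums of squares and cross-products and solves the resulting $(k+1) \times (k+1)$ system by Gaussian elimination. The function names and the data are illustrative; in practice a numerical library routine would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a * beta = b by Gaussian elimination with partial pivoting.

    Assumes the system is nonsingular (no check for a zero pivot).
    """
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - partial) / m[r][r]
    return beta


def least_squares(xs, ys):
    """Return [b0, b1, ..., bk] for predictor rows xs and responses ys."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 for the intercept
    p = len(design[0])
    # Left and right sides of the normal equations.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
beta_hat = least_squares(xs, ys)
print(beta_hat)  # close to [1, 2, 3], up to floating-point rounding
```

Because the example data contain no noise, the recovered coefficients match the generating values up to rounding; with real data the same code returns the SSE-minimizing estimates.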

An Overall Measure of How Well the Full Model Performs
Coefficient of Multiple Determination. Denoted as $R^2$. Defined as the proportion of the variability in the dependent variable $y$ that is accounted for by the independent variables $x_1, x_2, \ldots, x_k$ through the regression model.
With only one independent variable ($k = 1$), $R^2 = r^2$, the square of the simple correlation coefficient.

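Computing the Coefficient of Determination
$$R^2 = \frac{SSR}{TSS} = \frac{S_{yy} - SSE}{S_{yy}}$$
$$TSS = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$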
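A short sketch of this computation, assuming the fitted values are already available. The data are made up: with a single 0/1 predictor taking values `[0, 0, 1, 1]`, the least squares fit is simply the mean of each group.

```python
def r_squared(ys, fitted):
    """Coefficient of determination: (TSS - SSE) / TSS."""
    y_bar = sum(ys) / len(ys)
    tss = sum((y - y_bar) ** 2 for y in ys)            # total sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    return (tss - sse) / tss


# Made-up responses; fitted values are the group means for x = [0, 0, 1, 1].
ys = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
print(r_squared(ys, fitted))  # TSS = 5, SSE = 1, so R^2 = 0.8
```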

Multicollinearity
A further assumption in multiple regression (absent in SLR) is that the predictors $(x_1, x_2, \ldots, x_k)$ are statistically uncorrelated. That is, the predictors do not co-vary. When the predictors are significantly correlated (correlation greater than about 0.6), the multiple regression model is said to suffer from problems of multicollinearity.
[Figure: scatterplots of predictor pairs at increasing correlation, including r = 0.6 and r = 0.8.]

Effect of Multicollinearity on the Fitted Surface
[Figure: fitted surface under extreme collinearity.]

Multicollinearity leads to
- Numerical instability in the estimates of the regression parameters: wild fluctuations in these estimates if a few observations are added or removed.
- No longer having simple interpretations for the regression coefficients in the additive model.
Ways to detect multicollinearity
- Scatterplots of the predictor variables.
- Correlation matrix for the predictor variables: the higher these correlations, the worse the problem.
- Variance Inflation Factors (VIFs) reported by software packages. Values larger than 10 usually signal a substantial amount of collinearity.
What can be done about multicollinearity
- Regression estimates are still OK, but the resulting confidence/prediction intervals are very wide.
- Choose explanatory variables wisely! (E.g. consider omitting one of two highly correlated variables.)
- More advanced solutions: principal components analysis; ridge regression.
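As an illustration of the VIF diagnostic, the sketch below handles the simplest case of exactly two predictors, where the VIF of either predictor reduces to $1/(1 - r^2)$ with $r$ their sample correlation. For more predictors, $r^2$ is replaced by the $R^2$ from regressing one predictor on all the others. The data and function names are made up for illustration.

```python
import math


def correlation(u, v):
    """Sample (Pearson) correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up predictor values with correlation 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(correlation(x1, x2))         # approximately 0.8
print(vif_two_predictors(x1, x2))  # approximately 1 / 0.36 = 2.78
```

Even at a correlation of 0.8 the two-predictor VIF stays well below 10; in this two-predictor case the VIF reaches 10 only when $|r|$ is about 0.95.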

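Testing in Multiple Regression
Testing individual parameters in the model. Computing predicted values and associated standard errors.
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$
Overall AOV F-test
$H_0$: None of the explanatory variables is a significant predictor of $Y$.
$$F = \frac{SSR / k}{SSE / (n - k - 1)} = \frac{MSR}{MSE}$$
Reject if $F > F_{k,\, n-k-1,\, \alpha}$.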
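The overall F-test can be sketched as follows, assuming the fitted values are already available and the critical value is looked up separately and passed in. The data are made up; the fitted values are the least squares fit for a single 0/1 predictor `[0, 0, 1, 1]`, and 18.51 is the tabulated upper 5% point of the F distribution with (1, 2) degrees of freedom.

```python
def overall_f_test(ys, fitted, k, f_critical):
    """Return (F, reject) for H0: no predictor is useful.

    k is the number of predictors; f_critical is the F quantile with
    (k, n - k - 1) degrees of freedom, supplied by the caller.
    """
    n = len(ys)
    y_bar = sum(ys) / n
    tss = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ssr = tss - sse
    msr = ssr / k              # mean square for regression
    mse = sse / (n - k - 1)    # mean square error
    f_stat = msr / mse
    return f_stat, f_stat > f_critical


ys = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
f_stat, reject = overall_f_test(ys, fitted, k=1, f_critical=18.51)
print(f_stat, reject)  # F = 4 / 0.5 = 8.0, which is below 18.51, so H0 is not rejected
```

With only four observations the test has very little power, which is why a fit with $R^2 = 0.8$ still fails to reach significance here.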

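Standard Error for Partial Slope Estimate
The estimated standard error for $\hat\beta_j$:
$$s_{\hat\beta_j} = \hat\sigma_\varepsilon \sqrt{\frac{1}{S_{x_j x_j}\left(1 - R^2_{x_j \cdot x_1 x_2 \cdots x_k}\right)}}$$
where
$$\hat\sigma_\varepsilon = \sqrt{\frac{SSE}{n - (k+1)}}, \qquad S_{x_j x_j} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2,$$
and $R^2_{x_j \cdot x_1 x_2 \cdots x_k}$ is the coefficient of determination for the model with $x_j$ as the dependent variable and all other $x$ variables as predictors.
What happens if all the predictors are truly independent of each other? $R^2_{x_j \cdot x_1 x_2 \cdots x_k} \to 0$, so $s_{\hat\beta_j} \to \hat\sigma_\varepsilon \sqrt{1 / S_{x_j x_j}}$.
If there is high dependency? $R^2_{x_j \cdot x_1 x_2 \cdots x_k} \to 1$, so $s_{\hat\beta_j} \to \infty$.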
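A small numerical sketch shows how the standard error of a partial slope grows as the predictor becomes more dependent on the others. It assumes the usual form $s_{\hat\beta_j} = \hat\sigma_\varepsilon \sqrt{1 / (S_{x_j x_j}(1 - R_j^2))}$ with $\hat\sigma_\varepsilon = \sqrt{SSE / (n - (k+1))}$; all input values are made up.

```python
import math


def partial_slope_se(sse, n, k, s_xjxj, r2_j):
    """Estimated standard error of the j-th partial slope.

    sse     residual sum of squares of the full model
    n, k    number of observations and number of predictors
    s_xjxj  sum of squared deviations of x_j about its mean
    r2_j    R^2 from regressing x_j on the other predictors
    """
    sigma_hat = math.sqrt(sse / (n - (k + 1)))
    return sigma_hat * math.sqrt(1.0 / (s_xjxj * (1.0 - r2_j)))


# Hypothetical inputs: SSE = 8 with n = 11 and k = 2 gives sigma_hat = 1.
for r2_j in (0.0, 0.75, 0.99):
    print(r2_j, partial_slope_se(sse=8.0, n=11, k=2, s_xjxj=4.0, r2_j=r2_j))
# The standard error is 0.5 at r2_j = 0, 1.0 at 0.75, and about 5 at 0.99.
```

The factor $1/(1 - R_j^2)$ inside the square root is the variance inflation factor for $x_j$.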

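Confidence Interval
$100(1-\alpha)\%$ Confidence Interval for $\beta_j$:
$$\hat\beta_j \pm t_{n-(k+1),\, \alpha/2} \; s_{\hat\beta_j}$$
The degrees of freedom $n - (k+1)$ are the df for SSE: they reflect the number of data points minus the number of parameters that have to be estimated.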
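The interval is simple to compute once the estimate, its standard error, and the $t$ quantile are in hand. The sketch below takes the quantile as an argument rather than computing it; the estimate and standard error are made up, and 2.306 is the tabulated upper 2.5% point of the $t$ distribution with 8 degrees of freedom (for example, $n = 11$ and $k = 2$).

```python
def slope_confidence_interval(beta_hat, se, t_crit):
    """Return (lower, upper) for beta_hat +/- t_crit * se.

    t_crit is the t quantile with n - (k + 1) degrees of freedom at alpha / 2,
    supplied by the caller.
    """
    half_width = t_crit * se
    return beta_hat - half_width, beta_hat + half_width


# Hypothetical estimate and standard error with df = 8, 95% confidence.
lower, upper = slope_confidence_interval(beta_hat=2.0, se=0.5, t_crit=2.306)
print(lower, upper)  # approximately 0.847 and 3.153
```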

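Testing whether a partial slope coefficient is equal to zero
$H_0: \beta_j = 0$
Test statistic: $t = \hat\beta_j / s_{\hat\beta_j}$
Alternatives and rejection regions:
- $H_a: \beta_j > 0$: reject if $t > t_{n-(k+1),\, \alpha}$
- $H_a: \beta_j < 0$: reject if $t < -t_{n-(k+1),\, \alpha}$
- $H_a: \beta_j \neq 0$: reject if $|t| > t_{n-(k+1),\, \alpha/2}$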
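The three rejection rules can be written as one small function. This is a sketch under the assumption that the caller supplies the appropriate critical value: the $\alpha$ quantile for a one-sided alternative, or the $\alpha/2$ quantile for the two-sided one. The function name, the `alternative` labels, and the inputs are illustrative.

```python
def partial_slope_test(beta_hat, se, t_crit, alternative="two-sided"):
    """Return (t, reject) for H0: beta_j = 0.

    t_crit is a positive t quantile with n - (k + 1) degrees of freedom.
    """
    t = beta_hat / se
    if alternative == "greater":      # Ha: beta_j > 0
        reject = t > t_crit
    elif alternative == "less":       # Ha: beta_j < 0
        reject = t < -t_crit
    elif alternative == "two-sided":  # Ha: beta_j != 0
        reject = abs(t) > t_crit
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t, reject


# Hypothetical estimate and standard error; 2.306 is the tabulated
# upper 2.5% point of t with 8 degrees of freedom.
print(partial_slope_test(2.0, 0.5, 2.306))  # (4.0, True)
```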

Predicting Y
We use the least squares fitted value, $\hat{y}$, as our predictor of a single value of $y$ at a particular value of the explanatory variables $(x_1, x_2, \ldots, x_k)$. The corresponding interval about the predicted value of $y$ is called a prediction interval.
The least squares fitted value also provides the best predictor of $E(y)$, the mean value of $y$, at a particular value of $(x_1, x_2, \ldots, x_k)$. The corresponding interval for the mean prediction is called a confidence interval.
Formulas for these intervals are much more complicated than in the case of SLR; they cannot be calculated by hand (see the book).

Minimum $R^2$ for a Significant Regression
Since we have formulas for $R^2$ and $F$ in terms of $n$, $k$, SSE and TSS, we can relate these two quantities. We can then ask the question: what is the minimum $R^2$ which will ensure the regression model will be declared significant, as measured by the appropriate quantile from the F distribution?
The answer (below) shows that this depends on $n$, $k$, and SSE/TSS.
$$R^2_{\min} = F_{k,\, n-k-1,\, \alpha} \cdot \frac{k}{n-k-1} \cdot \frac{SSE}{TSS}$$
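The relation between $F$ and $R^2$ can be derived in a few steps. The sketch below uses only $R^2 = SSR/TSS$ and $1 - R^2 = SSE/TSS$, and writes $F_\alpha$ as shorthand for the quantile $F_{k,\,n-k-1,\,\alpha}$. Since SSE/TSS is itself $1 - R^2$, the threshold can also be solved for explicitly, which is the last line.

```latex
\begin{align*}
F &= \frac{SSR/k}{SSE/(n-k-1)}
   = \frac{R^2/k}{(1-R^2)/(n-k-1)} \\[4pt]
F > F_\alpha
  &\iff R^2 > F_\alpha \cdot \frac{k}{n-k-1}\,(1-R^2)
   = F_\alpha \cdot \frac{k}{n-k-1} \cdot \frac{SSE}{TSS} \\[4pt]
  &\iff R^2 > \frac{k\,F_\alpha}{k\,F_\alpha + (n-k-1)}
\end{align*}
```

For fixed $k$ and $\alpha$, the explicit threshold shrinks as $n$ grows, so with many observations even a small $R^2$ can be declared significant.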

Minimum $R^2$ for Simple Linear Regression ($k = 1$)