Chapter 7. Transformation

Transformation
Is linear regression appropriate?

Transformation
The assumption of a linear relationship does not always hold.
We can transform the predictor, the response, or both to achieve a linear relationship.

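Power transformation
Power transformation: $\psi(U, \lambda) = U^{\lambda}$.
We want a linear relationship: $\text{BrainWt}^{\lambda} = \beta_0 + \beta_1\,\text{BodyWt}^{\lambda} + e$, with candidate values
(a) $\lambda = -1$
(b) $\lambda = 0$, i.e. $\log U$
(c) $\lambda = 0.33$
(d) $\lambda = 0.5$
Which $\lambda$ will you choose?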

Practical suggestions
Log rule: the log transform is useful when the observations are positive and the range of the variable is huge, i.e. the biggest observation is much bigger than the smallest.
Range rule: no transformation is useful if the range of the variable is too small.

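Interpretation
$\lambda > 0$: $\text{BrainWt}^{\lambda} = \beta_0 + \beta_1\,\text{BodyWt}^{\lambda} + e$. Artificial; usually has no physical meaning.
$\lambda = 0$ (log transformation): $\log(\text{BrainWt}) = \beta_0 + \beta_1 \log(\text{BodyWt}) + e$. This corresponds to a physical model, the allometric model: $\text{BrainWt} = \exp(\beta_0)\,\text{BodyWt}^{\beta_1}\exp(e)$, so the error is multiplicative.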

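Improving the power transformation
Power transformation: $\psi(U, \lambda) = U^{\lambda}$.
Scaled power transformation: $\psi_S(U, \lambda) = (U^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$, and $\psi_S(U, 0) = \log U$.
Advantages:
1. Continuous function of $\lambda$: $\lim_{\lambda \to 0} \psi_S(U, \lambda) = \log U$.
2. Preserves the direction of association. Suppose the true model has a negative association between $Y$ and $X$ (with $X > 0$).
   Power transform with $\lambda = -1$: $E(Y \mid X) = \beta_0 + \beta_1 / X$ with $\beta_1 > 0$, a positive association between $Y$ and $1/X$.
   Scaled power transform: $E(Y \mid X) = \beta_0 + \beta_1 \psi_S(X, -1)$ with $\beta_1 < 0$, a negative association between $Y$ and $\psi_S(X, -1)$, because $\psi_S(X, -1) = 1 - 1/X$ is increasing in $X$.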
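The two advantages of the scaled power transformation can be checked numerically. The following is a minimal sketch; the function name scaled_power and the small vector u are illustrative choices, not part of the original slides.

```r
# Scaled power transformation psi_S(u, lambda) for positive u.
# (scaled_power is a hypothetical helper name used for illustration.)
scaled_power <- function(u, lambda) {
  stopifnot(all(u > 0))
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

u <- c(0.5, 1, 2, 10)

# Continuity in lambda: for lambda close to 0 the values approach log(u).
gap <- max(abs(scaled_power(u, 1e-6) - log(u)))

# Direction of association: the plain power u^(-1) reverses the ordering of u,
# whereas the scaled version 1 - 1/u stays increasing in u.
cor.power  <- cor(u, u^(-1))
cor.scaled <- cor(u, scaled_power(u, -1))
```

Here gap should be tiny, cor.power negative and cor.scaled positive, which is the direction-preserving property described above.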

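Procedures to look for a transformation
Method 1: Draw many fitted curves, i.e. plot $\hat{y}$ against $x$ for various $\lambda$, where $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \psi_S(x, \lambda)$.
Method 2: Draw many scatter plots: $Y$ vs $X$, $Y$ vs $1/X$, $Y$ vs $\log X$.
Method 3: Plot $\lambda$ against the RSS of fitting $Y$ against $\psi_S(X, \lambda)$, then find the $\lambda$ that minimizes the RSS. Or choose $\lambda$ in the set $\{-1, -1/2, 0, 1/3, 1/2, 1\}$.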
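Method 3 can be sketched as a short grid search. The data below are simulated for illustration only (they are not the tree data from the example), and the helper names scaled_power and rss_for are assumptions of this sketch.

```r
set.seed(1)
# Hypothetical data: y is linear in log(x) plus noise.
x <- runif(100, min = 1, max = 100)
y <- 2 + 3 * log(x) + rnorm(100, sd = 0.2)

scaled_power <- function(u, lambda) {
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

# RSS from regressing y on psi_S(x, lambda).
rss_for <- function(lambda) {
  fit <- lm(y ~ scaled_power(x, lambda))
  sum(residuals(fit)^2)
}

lambdas <- c(-1, -1/2, 0, 1/3, 1/2, 1)
rss <- sapply(lambdas, rss_for)
best <- lambdas[which.min(rss)]

# RSS as a function of lambda; the minimum marks the chosen transformation.
plot(lambdas, rss, type = "b", xlab = "lambda", ylab = "RSS")
```

Because the simulated response was generated from log(x), the minimum should fall at lambda = 0 here; with real data the curve is usually flatter and a nearby value from the set is chosen.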

Example
$Y$ = height of tree, $X$ = diameter of tree.
M1: Draw many curves.
M2: The best scatterplots.
M3: Minimize RSS:
  RSS($\lambda = 0$) = 3.
  RSS($\lambda = 1$) = 44.5
  RSS($\lambda = -1$) = 54.8
Conclusion: Height = $\beta_0 + \beta_1 \log(\text{diameter}) + e$.

Methods for multiple regression
Three approaches:
1. Inverse fitted value plot: plot $\hat{Y}$ against $Y$, then find a transformation for $Y$ that matches the pattern in the plot.
2. Box-Cox transformation: a modification of the scaled power transformation, but applied to $Y$.
3. Modified power transform for each predictor.

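Inverse fitted value plot
1. Fit a linear regression of $Y$ on $X$ and get the fitted values $\hat{Y} = \hat{\beta}' X$.
2. Plot $\hat{Y}$ (y-axis) against $Y$ (x-axis).
3. Fix a $\lambda$, fit $\hat{Y}$ against $\psi_S(Y, \lambda)$, and obtain the fitted curve $\hat{\alpha}_0 + \hat{\alpha}_1 \psi_S(Y, \lambda)$.
4. Draw the fitted curve on the graph and see if it matches the pattern in step 2.
5. Repeat steps 3-4 to search for the best $\lambda$, say $\lambda^*$.
Then $\hat{Y}$ and $\psi_S(Y, \lambda^*)$ are linearly related, so regress $\psi_S(Y, \lambda^*)$ against $X$.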

Example of inverse fitted value plot

    # Read data
    highway.data = read.table("c:/highway.txt", header = TRUE)
    # Or: library(alr3); highway.data = highway

    # Step 1: Multiple regression
    fit = lm(rate ~ log(adt) + log(trks) + shld + log(len), data = highway.data)

    # Step 2: Plot fitted values against y
    y.hat = fit$fitted.values
    y = highway.data$rate
    plot(y, y.hat)
    abline(lm(y.hat ~ y))

    # Step 3+4: Regress the fitted values against transformed y
    # and plot the newly fitted values (lambda = 0)
    psi.0 = log(y)
    fit1 = lm(y.hat ~ psi.0)
    points(y, fit1$fitted.values, col = 2)

    # Trial 2: Step 3+4 with lambda = -1
    psi.minus1 = (y^(-1) - 1) / (-1)
    fit2 = lm(y.hat ~ psi.minus1)
    points(y, fit2$fitted.values, col = 3)

    # More R techniques: sort y to draw the fitted curves as lines
    order.y = order(y)
    ordered.y = y[order.y]
    ordered.fit1 = fit1$fitted.values[order.y]
    ordered.fit2 = fit2$fitted.values[order.y]
    lines(ordered.y, ordered.fit1, type = "l", col = 2)
    lines(ordered.y, ordered.fit2, type = "l", col = 3)

In this case $\lambda = 0$ seems to be the best.
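Instead of comparing the curves by eye, the search over $\lambda$ in steps 3-5 can also be summarised by the RSS of each inverse fit. The following is a sketch under stated assumptions: the data are simulated (not the highway data), and the variable and helper names are hypothetical.

```r
set.seed(2)
# Hypothetical data: log(y) is linear in two predictors.
n  <- 200
x1 <- runif(n, 1, 10)
x2 <- runif(n, 1, 10)
y  <- exp(0.5 + 0.3 * x1 - 0.2 * x2 + rnorm(n, sd = 0.1))

scaled_power <- function(u, lambda) {
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

# Step 1: fit on the untransformed response and keep the fitted values.
y.hat <- fitted(lm(y ~ x1 + x2))

# Steps 3-5: for each lambda, regress y.hat on psi_S(y, lambda) and record the RSS.
lambdas <- c(-1, -1/2, 0, 1/3, 1/2, 1)
rss <- sapply(lambdas, function(l) {
  sum(residuals(lm(y.hat ~ scaled_power(y, l)))^2)
})
best <- lambdas[which.min(rss)]

# Steps 2 and 4: inverse fitted value plot with the curve for the selected lambda.
plot(y, y.hat)
o <- order(y)
curve.fit <- fitted(lm(y.hat ~ scaled_power(y, best)))
lines(y[o], curve.fit[o], col = 2)
```

The RSS values are only a numerical aid; the plot should still be inspected, since the method relies on the fitted curve matching the visible pattern.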

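Box-Cox transformation
1. Modified power family: $\psi_M(Y, \lambda) = \psi_S(Y, \lambda)\,\mathrm{gm}(Y)^{1-\lambda}$, that is,
   $\psi_M(Y, \lambda) = \mathrm{gm}(Y)^{1-\lambda}\,(Y^{\lambda} - 1)/\lambda$ if $\lambda \neq 0$,
   $\psi_M(Y, \lambda) = \mathrm{gm}(Y) \log Y$ if $\lambda = 0$,
   where $\mathrm{gm}(Y) = (\prod_{i=1}^{n} y_i)^{1/n}$ is the geometric mean of the response.
2. Advantage: the unit of $\psi_M(Y, \lambda)$ is the same as that of $Y$ for all $\lambda$.
3. Model assumption: $E(\psi_M(Y, \lambda) \mid X = x) = \beta' x$.
4. How to choose $\lambda$? Fix a $\lambda$, fit the model for $\psi_M(Y, \lambda)$ and obtain RSS($\lambda$). Try various $\lambda$ and find the one which minimizes RSS($\lambda$).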

Example of Box-Cox transformation
Model: $E(\psi_M(Y, \lambda) \mid X = x) = \beta' x$, with the modified power family
$\psi_M(Y, \lambda) = \mathrm{gm}(Y)^{1-\lambda}\,(Y^{\lambda} - 1)/\lambda$ if $\lambda \neq 0$, and $\psi_M(Y, \lambda) = \mathrm{gm}(Y) \log Y$ if $\lambda = 0$.

    # Read data
    highway.data = read.table("c:/highway.txt", header = TRUE)
    y = highway.data$rate
    n = length(y)
    gm = prod(y)^(1/n)   # geometric mean of y

    # A: lambda = -1
    Transform.A = -gm^2 * (1/y - 1)
    fit.A = lm(Transform.A ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    Rss.A = sum(fit.A$residuals^2)

    # B: lambda = -1/2
    Transform.B = -2 * gm^(3/2) * (y^(-1/2) - 1)
    fit.B = lm(Transform.B ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    Rss.B = sum(fit.B$residuals^2)

    # C: lambda = 0
    Transform.C = gm * log(y)
    fit.C = lm(Transform.C ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    Rss.C = sum(fit.C$residuals^2)

    # D: lambda = 1/3
    Transform.D = 3 * gm^(2/3) * (y^(1/3) - 1)
    fit.D = lm(Transform.D ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    Rss.D = sum(fit.D$residuals^2)

    # E: lambda = 1/2
    Transform.E = 2 * gm^(1/2) * (sqrt(y) - 1)
    fit.E = lm(Transform.E ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    Rss.E = sum(fit.E$residuals^2)

    # F: lambda = 1 (psi_M(y, 1) = y - 1; the constant shift does not change the RSS)
    Transform.F = y
    fit.F = lm(Transform.F ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    Rss.F = sum(fit.F$residuals^2)

    # G: lambda = 2
    Transform.G = 1/2/gm * (y^2 - 1)
    fit.G = lm(Transform.G ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    Rss.G = sum(fit.G$residuals^2)

    # RSS as a function of lambda
    plot(c(-1, -1/2, 0, 1/3, 1/2, 1, 2),
         c(Rss.A, Rss.B, Rss.C, Rss.D, Rss.E, Rss.F, Rss.G), type = "l")

Choose log ($\lambda = 0$) or $\lambda = -0.5$.
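The seven hand-written cases follow one formula, so they can be generated by a single function and a loop. This is a minimal sketch: the function name modified_power is hypothetical, the data are simulated stand-ins for the highway file, and the geometric mean is computed on the log scale (an alternative to prod(y)^(1/n) that avoids overflow for long vectors).

```r
# Modified power family psi_M(y, lambda) = gm(y)^(1 - lambda) * psi_S(y, lambda).
# (modified_power is a hypothetical helper name used for illustration.)
modified_power <- function(y, lambda) {
  gm <- exp(mean(log(y)))   # geometric mean, computed on the log scale
  if (lambda == 0) gm * log(y) else gm^(1 - lambda) * (y^lambda - 1) / lambda
}

set.seed(3)
# Hypothetical data: log(y) is linear in x.
n <- 100
x <- runif(n, 1, 10)
y <- exp(1 + 0.2 * x + rnorm(n, sd = 0.2))

# RSS of the regression of psi_M(y, lambda) on x, for each lambda on the grid.
lambdas <- c(-1, -1/2, 0, 1/3, 1/2, 1, 2)
rss <- sapply(lambdas, function(l) {
  sum(residuals(lm(modified_power(y, l) ~ x))^2)
})
best <- lambdas[which.min(rss)]

plot(lambdas, rss, type = "l", xlab = "lambda", ylab = "RSS")
```

Because every transformed response is on the scale of y, the RSS values are directly comparable across the grid, which is the point of multiplying by the power of the geometric mean.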

Modified power transformation for all predictors
Modified power family: $\psi_M(X_j, \lambda_j) = \mathrm{gm}(X_j)^{1-\lambda_j}\,(X_j^{\lambda_j} - 1)/\lambda_j$ if $\lambda_j \neq 0$, and $\psi_M(X_j, \lambda_j) = \mathrm{gm}(X_j) \log X_j$ if $\lambda_j = 0$.
Transform the predictors to $\psi_M(X_1, \lambda_1), \ldots, \psi_M(X_p, \lambda_p)$ so that each pair of variables in the scatterplot matrix has a linear relationship.
Not an easy task. Only use it if the other methods do not work well.

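Transformation of non-positive variables
Problem of non-positive variables:
  e.g. $\lambda = 2$: $\psi_S(x, 2) = \psi_S(-x, 2)$, so we can't distinguish between $x$ and $-x$.
  $\log x$ is undefined if $x \le 0$.
Solutions:
1. Find a sufficiently large constant $\gamma$ so that $U + \gamma > 0$, and transform $U + \gamma$ instead of $U$.
2. Yeo-Johnson transformation:
   $\psi_{YJ}(U, \lambda) = \psi_S(U + 1, \lambda)$ if $U \ge 0$,
   $\psi_{YJ}(U, \lambda) = -\psi_S(-U + 1, 2 - \lambda)$ if $U < 0$.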
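The Yeo-Johnson transformation can be written directly from the scaled power family. The sketch below uses the standard definition, with a leading minus sign on the negative branch; the function names and the test vector are illustrative.

```r
scaled_power <- function(u, lambda) {
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

# Yeo-Johnson transformation, defined for any real u.
# (yeo_johnson is a hypothetical helper name used for illustration.)
yeo_johnson <- function(u, lambda) {
  out <- numeric(length(u))
  pos <- u >= 0
  out[pos]  <- scaled_power(u[pos] + 1, lambda)
  out[!pos] <- -scaled_power(-u[!pos] + 1, 2 - lambda)
  out
}

u <- c(-3, -1, 0, 1, 3)
yj1 <- yeo_johnson(u, 1)   # lambda = 1 leaves u unchanged
yj0 <- yeo_johnson(u, 0)   # defined for negative values, unlike log(u)
yj2 <- yeo_johnson(u, 2)   # unlike u^2, distinguishes -3 from 3
```

For every $\lambda$ the transformation is increasing in u, so the ordering of the observations, including the sign, is kept.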

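Final remarks
No need to transform factors, e.g. a dummy variable $F$ with $F = 0$ for group 1 and $F = 1$ for group 2: we look at $\beta$ to see the mean difference between the groups. Transforming the dummy doesn't help.
There is no single correct way of transformation. Once you come up with transformations $\psi(Y, \lambda_0), \psi(X_1, \lambda_1), \ldots, \psi(X_p, \lambda_p)$ which look roughly linear in the scatterplot matrix, it is OK to fit
$\psi(Y, \lambda_0) = \beta_0 + \beta_1 \psi(X_1, \lambda_1) + \cdots + \beta_p \psi(X_p, \lambda_p) + e$.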
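The remark about dummy variables can be illustrated with a small sketch. The two-group data below are simulated and the object names are hypothetical; exp() is just one arbitrary transformation of the dummy.

```r
set.seed(4)
# Hypothetical two-group data.
f <- rep(c(0, 1), each = 20)   # dummy: 0 = group 1, 1 = group 2
y <- 5 + 2 * f + rnorm(40)

fit.raw <- lm(y ~ f)
# A transformed 0/1 variable still takes only two distinct values,
# so it is just a relabelling of the two groups.
fit.tr <- lm(y ~ exp(f))

# The fitted values (the two group means) are the same in both fits.
same.fit <- isTRUE(all.equal(unname(fitted(fit.raw)), unname(fitted(fit.tr))))

# The slope for the raw dummy is the difference between the group means.
slope     <- unname(coef(fit.raw)["f"])
mean.diff <- mean(y[f == 1]) - mean(y[f == 0])
```

Only the scale of the slope changes under the transformation; the fit itself does not, which is why the untransformed dummy, with its direct interpretation as a mean difference, is kept.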