Chapter 7. Transformation


7.1 Transformation

Is linear regression appropriate? The assumption of a linear relationship does not always hold. To achieve a linear relationship, we can transform the predictor, the response, or both.

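Power transformation
$\psi(U,\lambda) = U^{\lambda}$ for $\lambda \neq 0$, and $\psi(U,0) = \log U$.
Goal: a linear relationship between BrainWt and BodyWt.
[Figure: scatterplots of power-transformed BrainWt against power-transformed BodyWt for (a) λ = −1, (b) λ = 0, i.e. log U, (c) λ = 0.33, (d) λ = 0.5.]
Which λ will you choose?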

Practical suggestions
Log rule: the log transformation is useful when the observations are positive and the range of the variable is huge, i.e. the biggest observation is much bigger than the smallest (as a rule of thumb, by more than one order of magnitude).
Range rule: no transformation is likely to help if the range of the variable is too small.
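A minimal R sketch of the two rules of thumb. The cutoffs are assumptions for illustration only: `big = 10` encodes "more than one order of magnitude", and `small = 2` is an arbitrary stand-in for "too small". `log_rule` and `range_rule` are hypothetical helper names, not functions from any package.

```r
# Log rule (sketch): suggest log(u) when every value is positive and the
# largest value exceeds the smallest by more than a factor of `big`.
# big = 10 ("more than one order of magnitude") is an assumed cutoff.
log_rule <- function(u, big = 10) {
  all(u > 0) && max(u) / min(u) > big
}

# Range rule (sketch): flag a positive variable whose range is too small
# for any transformation to help. small = 2 is an assumed, arbitrary cutoff.
range_rule <- function(u, small = 2) {
  all(u > 0) && max(u) / min(u) < small
}

log_rule(c(0.1, 5, 60, 6000))   # TRUE: max/min = 60000
log_rule(c(-1, 5, 100))         # FALSE: not all values are positive
range_rule(c(10, 11, 12, 13))   # TRUE: max/min = 1.3
```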

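Interpretation
λ > 0, e.g. $\mathrm{BrainWt}^{\lambda} = \beta_0 + \beta_1\,\mathrm{BodyWt}^{\lambda} + e$: artificial; usually has no physical meaning.
λ = 0, the log transformation: $\log(\mathrm{BrainWt}) = \beta_0 + \beta_1 \log(\mathrm{BodyWt}) + e$. This corresponds to a physical model, the allometric model $\mathrm{BrainWt} = \exp(\beta_0)\,\mathrm{BodyWt}^{\beta_1}\exp(e)$, which has a multiplicative error.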
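To see why the log scale suits an allometric model, the sketch below simulates data from $\mathrm{BrainWt} = \exp(\beta_0)\,\mathrm{BodyWt}^{\beta_1}\exp(e)$ with made-up values β0 = 2, β1 = 0.75 and error SD 0.3 (these are not the real brain-weight data) and fits the log-log regression with `lm`. The slope estimates the allometric exponent directly.

```r
# Simulated allometric data (hypothetical values, not the real brain data):
# BrainWt = exp(b0) * BodyWt^b1 * exp(e), i.e. a multiplicative error exp(e).
set.seed(42)
n <- 60
b0 <- 2
b1 <- 0.75
body_wt <- exp(runif(n, min = -3, max = 8))   # spans several orders of magnitude
brain_wt <- exp(b0) * body_wt^b1 * exp(rnorm(n, sd = 0.3))

# Taking logs gives a linear model with additive error:
# log(BrainWt) = b0 + b1 * log(BodyWt) + e
fit <- lm(log(brain_wt) ~ log(body_wt))
coef(fit)   # intercept near 2, slope near 0.75
```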

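Improving the power transformation
Power transformation: $\psi(U,\lambda) = U^{\lambda}$, with $\log U$ at λ = 0.
Scaled power transformation:
$$\psi_S(U,\lambda) = \begin{cases} \dfrac{U^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt] \log U, & \lambda = 0. \end{cases}$$
Advantages:
1. $\lim_{\lambda \to 0} \psi_S(U,\lambda) = \log U$, so $\psi_S$ is a continuous function of λ.
2. It preserves the direction of association, because $\partial \psi_S(U,\lambda)/\partial U = U^{\lambda-1} > 0$ for every λ. Example: suppose the true model has a negative association between Y and X. With the power transform at λ = −1, $E(Y \mid X) = \beta_0 + \beta_1 (1/X)$ shows a positive association between Y and 1/X (the sign flips). With the scaled power transform, $E(Y \mid X) = \beta_0 + \beta_1 \psi_S(X,-1)$, where $\psi_S(X,-1) = 1 - 1/X$, the association between Y and $\psi_S$ stays negative.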
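A small R sketch of ψ and ψ_S (`power_tf` and `scaled_power` are hypothetical names). It checks numerically the two advantages listed above: continuity at λ = 0, and preservation of the direction of association at λ = −1.

```r
# Power transformation psi(U, lambda) = U^lambda, with log(U) at lambda = 0.
power_tf <- function(u, lambda) {
  if (lambda == 0) log(u) else u^lambda
}

# Scaled power transformation psi_S(U, lambda) = (U^lambda - 1) / lambda,
# with log(U) at lambda = 0.
scaled_power <- function(u, lambda) {
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

u <- c(0.5, 1, 2, 4, 8)

# Continuity: psi_S(u, lambda) approaches log(u) as lambda -> 0.
max(abs(scaled_power(u, 1e-6) - log(u)))   # of order 1e-6

# Direction: with lambda = -1, U^lambda = 1/U reverses the ordering of u,
# while psi_S(u, -1) = 1 - 1/U keeps it.
cor(u, power_tf(u, -1))       # negative
cor(u, scaled_power(u, -1))   # positive
```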

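Procedures to look for a transformation
Method 1: draw many fitted curves, i.e. plot $\hat{y} = \hat\beta_0 + \hat\beta_1 \psi_S(x,\lambda)$ against x for various λ.
Method 2: draw many scatterplots: y vs x, y vs 1/x, y vs log x, ...
Method 3: plot λ against the RSS of fitting y against $\psi_S(x,\lambda)$, then find the λ that minimizes RSS. Or choose λ from a small set of standard values, e.g. {−1, −1/2, 0, 1}.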
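A sketch of Method 3 in R, assuming a single positive predictor; `rss_profile` is a hypothetical helper, and the data are simulated (hypothetical coefficients) so that y is linear in log x. Because only x is transformed, the response and its units stay fixed, so the RSS values are directly comparable across λ.

```r
# Scaled power transformation psi_S(x, lambda); log(x) at lambda = 0.
scaled_power <- function(u, lambda) {
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

# Method 3 (sketch): RSS of the simple regression of y on psi_S(x, lambda)
# for each lambda in the grid.
rss_profile <- function(x, y, lambdas) {
  sapply(lambdas, function(l) {
    z <- scaled_power(x, l)
    sum(resid(lm(y ~ z))^2)
  })
}

# Simulated data in which y is linear in log(x) (hypothetical coefficients).
set.seed(7)
x <- exp(runif(80, min = 0, max = 7))
y <- 3 + 2 * log(x) + rnorm(80, sd = 0.2)

lambdas <- c(-1, -1/2, 0, 1/3, 1/2, 1)
rss <- rss_profile(x, y, lambdas)
lambdas[which.min(rss)]   # expected: 0 for data generated this way
```

With real data one would also plot `rss` against `lambdas`, as Method 3 describes, rather than only reading off the minimum.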

Example: y = height of tree, x = diameter of tree.
M1: draw many curves. M2: find the best scatterplot. M3: minimize RSS over λ ∈ {−1, 0, 1}; λ = 0 gives the smallest RSS.
Conclusion: Height = β0 + β1 log(Diameter) + e.

Methods for multiple regression
Three approaches:
1. Inverse fitted value plot: plot ŷ against y, then find a transformation of y that matches the pattern in that plot.
2. Box-Cox transformation: a modification of the scaled power transformation, applied to y.
3. Modified power transformation for each predictor.

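Inverse fitted value plot
1. Fit a linear regression of y on x and get the fitted values ŷ.
2. Plot ŷ (vertical axis) against y (horizontal axis).
3. Fix a λ, fit ŷ against $\psi_S(y,\lambda)$, and obtain the fitted values $\hat\alpha_0 + \hat\alpha_1 \psi_S(y,\lambda)$.
4. Draw this fitted curve on the graph and see if it matches the pattern of ŷ against y.
5. Repeat steps 3-4 to search for the best λ, say λ*. Then ŷ and $\psi_S(y,\lambda^*)$ are linearly related; regress $\psi_S(y,\lambda^*)$ on x.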
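Steps 3-5 can also be run numerically instead of by eye: for each λ on a grid, regress ŷ on ψ_S(y, λ) and keep the λ with the smallest RSS. This R sketch assumes positive y and uses simulated data (hypothetical coefficients, normal predictors) in which log(y) is linear in the predictors; `inverse_fit_rss` is a hypothetical helper name. Since ŷ does not depend on λ, the RSS values are comparable across the grid.

```r
scaled_power <- function(u, lambda) {
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

# Inverse fitted value plot, numeric version (sketch).
# Step 1: y_hat from the regression of y on the predictors in X.
# Steps 3-5: for each lambda, regress y_hat on psi_S(y, lambda) and
# record the RSS; the smallest RSS marks the best-matching curve.
inverse_fit_rss <- function(y, X, lambdas) {
  y_hat <- fitted(lm(y ~ X))
  sapply(lambdas, function(l) {
    z <- scaled_power(y, l)
    sum(resid(lm(y_hat ~ z))^2)
  })
}

# Simulated data: log(y) is linear in two normal predictors
# (hypothetical coefficients).
set.seed(3)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- exp(1 + 0.8 * X[, "x1"] + 0.6 * X[, "x2"] + rnorm(n, sd = 0.05))

lambdas <- c(-1, -1/2, 0, 1/3, 1/2, 1)
rss <- inverse_fit_rss(y, X, lambdas)
lambdas[which.min(rss)]   # expected: 0, i.e. log(y)
```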

Example of the inverse fitted value plot

    # Read the data
    highway.data <- read.table("c:/highway.txt", header = TRUE)
    # Or: library(alr3); highway.data <- highway

    # Step 1: multiple regression
    fit <- lm(rate ~ log(adt) + log(trks) + shld + log(len), data = highway.data)

    # Step 2: plot the fitted values against y
    y.hat <- fit$fitted.values
    y <- highway.data$rate
    plot(y, y.hat)
    abline(lm(y.hat ~ y))

    # Steps 3+4, trial 1 (lambda = 0): regress the fitted values on the
    # transformed y and plot the newly fitted values
    psi.0 <- log(y)
    fit1 <- lm(y.hat ~ psi.0)
    points(y, fit1$fitted.values, col = 2)

    # Steps 3+4, trial 2 (lambda = -1): psi_S(y, -1) = (y^(-1) - 1)/(-1)
    psi.minus1 <- (y^(-1) - 1) / (-1)
    fit2 <- lm(y.hat ~ psi.minus1)
    points(y, fit2$fitted.values, col = 3)

    # More R techniques: sort y so the fitted curves can be drawn as lines
    order.y <- order(y)
    ordered.y <- y[order.y]
    ordered.fit1 <- fit1$fitted.values[order.y]
    ordered.fit2 <- fit2$fitted.values[order.y]
    lines(ordered.y, ordered.fit1, col = 2)
    lines(ordered.y, ordered.fit2, col = 3)

In this case λ = 0 seems to be the best.

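Box-Cox transformation
1. Modified power family:
$$\psi_M(y,\lambda) = \mathrm{gm}(y)^{1-\lambda}\,\psi_S(y,\lambda) = \begin{cases} \mathrm{gm}(y)^{1-\lambda}\,\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt] \mathrm{gm}(y)\,\log y, & \lambda = 0, \end{cases}$$
where $\mathrm{gm}(y) = (y_1 y_2 \cdots y_n)^{1/n}$ is the geometric mean of y.
2. Advantage: the unit of $\psi_M(y,\lambda)$ is the same as that of y for all λ, so RSS values are comparable across λ.
3. Model assumption: $E(\psi_M(Y,\lambda) \mid X = x) = \beta' x$.
4. How to choose λ? Fix a λ, fit the model for $\psi_M(y,\lambda)$, and obtain RSS(λ). Try various λ and choose the one that minimizes RSS(λ).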
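A compact R sketch of steps 1 and 4 that loops over a grid of λ instead of writing out each case. `geo_mean`, `box_cox_m`, and `box_cox_rss` are hypothetical helpers, and the data are simulated (hypothetical coefficients) so that log(y) is linear in x. The last line illustrates the unit advantage: rescaling y by 10 changes ψ_M by a factor of 10 plus a constant absorbed by the intercept, so every RSS(λ) is multiplied by exactly 100 (up to rounding).

```r
# Geometric mean of a positive vector (same as prod(y)^(1/n), without overflow).
geo_mean <- function(y) exp(mean(log(y)))

# Modified power family psi_M(y, lambda) = gm(y)^(1 - lambda) * psi_S(y, lambda).
box_cox_m <- function(y, lambda) {
  gm <- geo_mean(y)
  if (lambda == 0) gm * log(y) else gm^(1 - lambda) * (y^lambda - 1) / lambda
}

# RSS(lambda) from regressing psi_M(y, lambda) on the predictor(s) x.
box_cox_rss <- function(y, x, lambdas) {
  sapply(lambdas, function(l) sum(resid(lm(box_cox_m(y, l) ~ x))^2))
}

# Simulated data where log(y) is linear in x (hypothetical coefficients).
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- exp(1 + 0.5 * x + rnorm(100, sd = 0.02))

lambdas <- c(-1, -1/2, 0, 1/3, 1/2, 1, 2)
rss <- box_cox_rss(y, x, lambdas)
lambdas[which.min(rss)]   # expected: 0

# Unit advantage: rescaling y by 10 rescales every RSS(lambda) by 10^2.
all.equal(box_cox_rss(10 * y, x, lambdas), 100 * rss, tolerance = 1e-6)
```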

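Example of Box-Cox transformation
Model: $E(\psi_M(Y,\lambda) \mid X = x) = \beta' x$, with $\psi_M$ the modified power family. Abbreviated code (cases A and G only):

    highway.data <- read.table("c:/highway.txt", header = TRUE)
    y <- highway.data$rate
    gm <- exp(mean(log(y)))   # geometric mean of y, same as prod(y)^(1/length(y))
    # A: lambda = -1
    transform.A <- -gm^2 * (1/y - 1)
    fit.A <- lm(transform.A ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.A <- sum(fit.A$residuals^2)
    # ... cases B to F (lambda = -1/2, 0, 1/3, 1/2, 1) are analogous
    # G: lambda = 2
    transform.G <- 1/2 / gm * (y^2 - 1)
    fit.G <- lm(transform.G ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.G <- sum(fit.G$residuals^2)
    plot(c(-1, -1/2, 0, 1/3, 1/2, 1, 2),
         c(RSS.A, RSS.B, RSS.C, RSS.D, RSS.E, RSS.F, RSS.G), type = "l")

Conclusion: choose log(y) (λ = 0) or λ = −0.5.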

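Example of Box-Cox transformation (full code)

    # Read data
    highway.data <- read.table("c:/highway.txt", header = TRUE)
    y <- highway.data$rate
    gm <- exp(mean(log(y)))   # geometric mean of y, same as prod(y)^(1/length(y))

    # A: lambda = -1
    transform.A <- -gm^2 * (1/y - 1)
    fit.A <- lm(transform.A ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.A <- sum(fit.A$residuals^2)
    # B: lambda = -1/2
    transform.B <- -2 * gm^(3/2) * (y^(-1/2) - 1)
    fit.B <- lm(transform.B ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.B <- sum(fit.B$residuals^2)
    # C: lambda = 0
    transform.C <- gm * log(y)
    fit.C <- lm(transform.C ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.C <- sum(fit.C$residuals^2)
    # D: lambda = 1/3
    transform.D <- 3 * gm^(2/3) * (y^(1/3) - 1)
    fit.D <- lm(transform.D ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.D <- sum(fit.D$residuals^2)
    # E: lambda = 1/2
    transform.E <- 2 * gm^(1/2) * (sqrt(y) - 1)
    fit.E <- lm(transform.E ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.E <- sum(fit.E$residuals^2)
    # F: lambda = 1
    transform.F <- y - 1
    fit.F <- lm(transform.F ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.F <- sum(fit.F$residuals^2)
    # G: lambda = 2
    transform.G <- 1/2 / gm * (y^2 - 1)
    fit.G <- lm(transform.G ~ log(adt) + log(trks) + shld + log(len), data = highway.data)
    RSS.G <- sum(fit.G$residuals^2)

    # RSS as a function of lambda
    plot(c(-1, -1/2, 0, 1/3, 1/2, 1, 2),
         c(RSS.A, RSS.B, RSS.C, RSS.D, RSS.E, RSS.F, RSS.G), type = "l")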

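Modified power transformation for all predictors
Modified power family: transform the predictors,
$$(x_1, \dots, x_p) \;\to\; \bigl(\psi_M(x_1,\lambda_1), \dots, \psi_M(x_p,\lambda_p)\bigr),$$
with $\psi_M$ defined as in the Box-Cox transformation ($\lambda_j = 0$ meaning the log), so that each pair of variables in the scatterplot matrix has a linear relationship.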

Choosing $(\lambda_1, \dots, \lambda_p)$ jointly for the transformation $\bigl(\psi_M(x_1,\lambda_1), \dots, \psi_M(x_p,\lambda_p)\bigr)$ is not an easy task. Only use it if other methods do not work well.

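Transformation of non-positive variables
Problems with non-positive variables:
- e.g. with λ = 2, $\psi_S(x,2) = \psi_S(-x,2)$, so we can't distinguish between x and −x;
- log(x) is undefined if x ≤ 0.
Solutions:
1. Find a sufficiently large constant c and transform U to U + c, so that all values are positive.
2. Yeo-Johnson transformation:
$$\psi_{YJ}(U,\lambda) = \begin{cases} \psi_S(U+1,\,\lambda), & U \ge 0,\\[4pt] -\psi_S(-U+1,\,2-\lambda), & U < 0. \end{cases}$$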
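A minimal R sketch of the Yeo-Johnson transformation as written above, built on the scaled power transformation; `yeo_johnson` and `scaled_power` are hypothetical names. The first check shows the λ = 2 problem on non-positive data; the last two lines show that λ = 1 returns the data unchanged and that the transform is increasing across zero.

```r
scaled_power <- function(u, lambda) {
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

# The problem: with lambda = 2, x and -x are mapped to the same value.
scaled_power(2, 2) == scaled_power(-2, 2)   # TRUE

# Yeo-Johnson transformation (sketch):
#   psi_YJ(U, lambda) =  psi_S(U + 1, lambda)       for U >= 0
#   psi_YJ(U, lambda) = -psi_S(-U + 1, 2 - lambda)  for U < 0
yeo_johnson <- function(u, lambda) {
  out <- numeric(length(u))
  pos <- u >= 0
  out[pos] <- scaled_power(u[pos] + 1, lambda)
  out[!pos] <- -scaled_power(-u[!pos] + 1, 2 - lambda)
  out
}

u <- c(-3, -1, -0.5, 0, 0.5, 1, 3)
yeo_johnson(u, 1)   # lambda = 1 gives back u itself
yeo_johnson(u, 0)   # log(u + 1) for u >= 0; increasing in u throughout
```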

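Final remarks
- No need to transform factors. E.g. with a dummy variable x = 1 for group F and x = 0 for the other group, we look at β1 to see the mean difference between the groups. Transforming the dummy doesn't help: any function of a 0/1 variable takes only two values, so it is an affine function of x and gives exactly the same fit.
- There is no single correct way of transforming. Once you come up with transformations $\psi(x_1,\lambda_1), \dots, \psi(x_p,\lambda_p)$ (and possibly of y) under which the scatterplot matrix looks roughly linear, it is OK to fit the regression.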
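A short R check of the first remark, using simulated two-group data (hypothetical numbers): replacing the dummy x by exp(x) leaves the fitted values unchanged, and the slope on x equals the difference of the two group means.

```r
# Simulated two-group data (hypothetical): x = 1 for group F, 0 otherwise.
set.seed(11)
x <- rep(c(0, 1), each = 20)
y <- 5 + 2 * x + rnorm(40)

fit_raw <- lm(y ~ x)
# exp(x) takes only the values 1 and e, so exp(x) = 1 + (e - 1) * x is an
# affine function of x and spans the same column space as (1, x).
fit_tf <- lm(y ~ exp(x))

all.equal(fitted(fit_raw), fitted(fit_tf))   # TRUE: identical fits
coef(fit_raw)[["x"]]                         # slope on the dummy ...
mean(y[x == 1]) - mean(y[x == 0])            # ... equals the mean difference
```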