Ch. 5 Transformations and Weighting


Outline: three approaches
1. Variance stabilizing transformations; Box-Cox transformations - Sections 5.2, 5.4
2. Transformations to linearize the model - Section 5.3
3. Weighted regression - Section 5.5

Variance-Stabilizing Transformations

Model assumptions: E[y|x] = β₀ + β₁x, V(y|x) = σ². Set μ_y = E[y|x]. What if V(y|x) = σ²f(μ_y), where f is some non-constant function? Try to find a function g(y) so that V(g(y)|x) = constant.

Obtain a Taylor expansion of g(y) about μ_y:

  g(y) = g(μ_y) + (y − μ_y)g′(μ_y) + (y − μ_y)²g″(μ_y)/2 + ⋯

Then

  V(g(y)) ≈ V(y)(g′(μ_y))² = σ²f(μ_y)(g′(μ_y))²,

so V(g(y)) will be approximately constant if

  g′(μ_y) = 1/√f(μ_y), i.e. g′(z) = 1/√f(z).

Examples:

1. f(x) = x (e.g. Poisson data): g′(x) = x^(−1/2), so g(y) = √y.

[Figure: Residuals vs Fitted for the untransformed Poisson data, lm(formula = yy ~ xx)]
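For intuition, here is a minimal simulated version of the two Poisson plots (a sketch only: the sample size, intercept and slope are my own choices; yy and xx mirror the variable names in the fitted formulas above):

set.seed(1)
xx <- runif(200, 0, 2)
yy <- rpois(200, lambda = exp(1 + 1.5 * xx))   # Poisson response: variance equals the mean
par(mfrow = c(1, 2))
plot(lm(yy ~ xx), which = 1)                   # funnel shape: spread grows with the fitted mean
plot(lm(I(sqrt(yy)) ~ xx), which = 1)          # after the square-root transform: roughly constant spread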

[Figure: Residuals vs Fitted after the square-root transformation, lm(formula = I(sqrt(yy)) ~ xx)]

2. f(x) = x² (e.g. Exponential data): g′(x) = 1/x, so g(y) = log(y).

[Figure: Residuals vs Fitted for the untransformed Exponential data, lm(formula = yy ~ xx)]

3. f(x) = x(1 − x) (e.g. binomial proportions): 1/√f(x) = 1/√(x(1 − x)), and since
d/dx sin⁻¹(√x) = 1/(2√(x(1 − x))), we get g(y) = arcsin(√y).

5.4 Box-Cox Transformations (on the response)

Select the power λ in the transformation g(y) = y^λ by maximum likelihood. This is equivalent to minimizing the SSE with respect to λ (and the other parameters).

Caution: the residual sums of squares are not comparable for different values of λ. We need to ensure that comparisons are made according to the same standard:

  y^(λ) = (y^λ − 1) / (λ ẏ^(λ−1))   if λ ≠ 0
  y^(λ) = ẏ log y                    if λ = 0

where ẏ = geometric mean of the y's.

Strategy:
1. Perform the transformation y₁^(λ), ..., y_n^(λ) for several values of λ.
2. Compute SSE for each value of λ.
3. Select the λ which gives the minimum value.
4. Fit y^(λ) = Xβ + ε.

Approximate confidence intervals for λ can also be obtained. In R, use boxcox(y ~ x, data = dataset); a hand-rolled sketch of this λ search appears after the bacteria example below.

Examples:

1. Bacteria data (Ex. 5.3) - the average number of surviving bacteria (y) in a canned food product versus time (t) of exposure to 300°F heat.

> library(MPV)
> data(p5.3)
> bact.lm <- lm(bact ~ min, data=p5.3)
> plot(bact.lm, which=1)    # residuals vs fitted
> plot(bact.lm, which=2)    # normal Q-Q
> library(MASS)
> boxcox(bact.lm)           # profile log-likelihood vs lambda
> bactlog.lm <- lm(log(bact) ~ min, data=p5.3)
> plot(bactlog.lm, which=1)
> plot(bactlog.lm, which=2)

[Figure: Residuals vs Fitted, lm(formula = bact ~ min, data = p5.3)]

[Figure: Normal Q-Q plot of standardized residuals, lm(formula = bact ~ min, data = p5.3)]

[Figure: Box-Cox profile log-likelihood vs λ with 95% interval, boxcox(bact.lm)]
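The λ search that boxcox automates can be done by hand with the scaled transformation above. A minimal sketch for the bacteria data, continuing the R session above (the grid of λ values is my own choice):

library(MPV)                                   # p5.3: bacteria data (bact, min), as above
y <- p5.3$bact; x <- p5.3$min
ydot <- exp(mean(log(y)))                      # geometric mean of the y's
sse <- function(lambda) {
  ylam <- if (abs(lambda) < 1e-8) ydot * log(y)
          else (y^lambda - 1) / (lambda * ydot^(lambda - 1))
  sum(resid(lm(ylam ~ x))^2)                   # SSE on a comparable scale
}
lambdas  <- seq(-2, 2, by = 0.05)
sse.vals <- sapply(lambdas, sse)
plot(lambdas, sse.vals, type = "l", xlab = "lambda", ylab = "SSE")
lambdas[which.min(sse.vals)]                   # should be near 0, consistent with the log transform used for bactlog.lm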

[Figure: Residuals vs Fitted, lm(formula = log(bact) ~ min, data = p5.3)]

[Figure: Normal Q-Q plot of standardized residuals, lm(formula = log(bact) ~ min, data = p5.3)]

A model of the form

  log(y) = β₀ + β₁t + ε

is reasonable; note that β₁ is negative (β̂₁ = −0.236), as expected for a decaying bacteria count.

2. trees data: 31 observations on Girth (g), Height (h) and Volume (V).

Simple model: V ≈ g²h/(4π), or log V = β₀ + β₁ log h + β₂ log g + ε.

> library(DAAG)
> data(trees); attach(trees)

> trees.lm <- lm(log(Volume) ~ log(Girth) + log(Height))
> boxcox(trees.lm)    # (lambda = 1 is OK)
> summary(trees.lm)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    -6.632      0.800   -8.29  5.1e-09
log(Height)     1.117      0.204    5.46  7.8e-06
log(Girth)      1.983      0.075   26.43  < 2e-16

[Figure: Box-Cox profile log-likelihood vs λ with 95% interval, boxcox(trees.lm)]

The coefficient of log(Height) is not distinguishable from 1, and the coefficient of log(Girth) is not distinguishable from 2.

5.3 Linearizing Transformations

Intrinsically linear model: the relationship between y and x is such that a simple transformation can produce a linear model.

Example: fit the model E[y] = β₀e^(β₁x), so that log E[y] = log β₀ + β₁x, and fit

  log y_i = log β₀ + β₁x_i + ε_i.

Note that this implies multiplicative errors, i.e.

  y_i = e^(log β₀ + β₁x_i + ε_i) = β₀ e^(β₁x_i) e^(ε_i).

If the error is additive, i.e. y_i = β₀e^(β₁x_i) + ε_i, then the transformation is not appropriate. A small simulation sketch is given below.
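To see why the log transform targets multiplicative errors, here is a minimal simulated sketch (the parameter values and sample size are my own choices, not from the notes):

set.seed(1)
x   <- runif(100, 0, 4)
y   <- 2 * exp(0.7 * x) * exp(rnorm(100, sd = 0.1))      # multiplicative error: beta0 = 2, beta1 = 0.7
fit <- lm(log(y) ~ x)
c(beta0 = exp(coef(fit)[[1]]), beta1 = coef(fit)[[2]])    # estimates close to 2 and 0.7
plot(fit, which = 1)    # roughly constant spread; additive errors would give a non-constant spread here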

Other possibilities from the text:

1. E[y] = β₀x^β₁: log E[y] = log β₀ + β₁ log x. New model: log y_i = log β₀ + β₁ log x_i + ε_i.

2. E[y] = x/(β₀x − β₁): 1/E[y] = β₀ − β₁(1/x). New model: 1/y_i = β₀ − β₁(1/x_i) + ε_i.

Example - windmill data (see wind). These data concern the relation between the electrical output of a windmill and the wind velocity to which it is subjected. A decent model is

  DC output = β₀ + β₁/velocity + ε.

[Figure: Windmill data, untransformed: DC output vs wind velocity]

[Figure: Windmill data, transformed: DC output vs 1/(wind velocity)]

Some models are intrinsically nonlinear, e.g.

Michaelis-Menten model (useful for modelling chemical reaction rates):

  y = β₀x/(β₁ + x) + ε

Mitscherlich law (useful for modelling chemical yield, etc.):

  y = β₀ − β₁γ^x + ε

Logistic growth model:

  y = β₀/(1 + β₁e^(−kx)) + ε

Box-Tidwell transformation of a predictor variable

Consider the model y = β₀ + β₁x^α + ε. If α is known, β₀ and β₁ can be estimated by least squares. How can α be estimated?

Suppose we have a good guess α₀. Taylor expand x^α about α₀:

  x^α = x^α₀ + (α − α₀)x^α₀ log(x) + O((α − α₀)²),

so if α₀ is close to α, we have x^α ≈ x^α₀ + (α − α₀)x^α₀ log(x). Our regression model then looks like

  y ≈ β₀ + β₁x^α₀ + β₁(α − α₀)x^α₀ log(x) + ε,

so consider

  y ≈ β₀* + β₁*x^α₀ + β₂*x^α₀ log(x) + ε,  where β₂* = β₁(α − α₀).

This gives the updating equation α₁ = β₂*/β₁ + α₀.

Algorithm:
1. Guess α: α₀.
2. Fit y = β₀ + β₁x^α₀ + ε to get β̂₁.
3. Fit y ≈ β₀* + β₁*x^α₀ + β₂*x^α₀ log(x) + ε to get β̂₂*.
4. Update α: α₁ = β̂₂*/β̂₁ + α₀.

Repeat the above steps to get α₂, and so on. Convergence usually occurs within three iterations, although there are instances where this procedure may not converge at all. A small sketch of the iteration is given below.
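A minimal R sketch of this iteration (my own illustration, not the course's boxtidwell.lm function used in the example below; the function name, defaults and tolerance are arbitrary, and the predictor is assumed positive):

bt.sketch <- function(y, x, alpha0 = 1, maxit = 10, tol = 1e-4) {
  alpha <- alpha0
  for (i in 1:maxit) {
    b1 <- coef(lm(y ~ I(x^alpha)))[2]                          # step 2: slope without the log term
    b2 <- coef(lm(y ~ I(x^alpha) + I(x^alpha * log(x))))[3]    # step 3: coefficient of x^alpha * log(x)
    alpha.new <- unname(b2 / b1 + alpha)                       # step 4: update alpha
    if (abs(alpha.new - alpha) < tol) return(alpha.new)
    alpha <- alpha.new
  }
  alpha
}

Applied to the windmill data of the next example (e.g. bt.sketch(wind$dc, wind$v), assuming the notes' wind data frame), it should track the iterates reported by boxtidwell.lm there.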

Example: Windmill generation of electricity. DC output is measured against wind velocity:

> wind
       v     DC
1   5.00  1.582
2   6.00  1.822
3   3.40  1.057
4   2.70  0.500
...
24  3.95  1.144
25  2.45  0.123

The scatterplot (windmill.pdf) indicates the need for a transformation. We saw earlier the usefulness of the reciprocal transformation of the velocity, 1/v. Does the Box-Tidwell procedure agree?

Box-Tidwell, with initial guess α₀ = 1 (the earlier model y = β₀ + β₁(1/v) + ε corresponds to α = −1):

> boxtidwell.lm(dc~v, data=wind)
  initial guess  alpha_1  alpha_2  alpha_3  alpha_4
          1.000    -0.98   -0.836   -0.833   -0.833

So the suggested model is y = β₀ + β₁(1/v^0.833) + ε:

> wind.lm <- lm(dc ~ I(v^(-0.833)), data=wind)
> summary(wind.lm)

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      3.2608     0.0514    63.4   <2e-16
I(v^(-0.833))   -6.4677     0.1880   -34.4   <2e-16

Fitted model: ŷ = 3.26 − 6.47(1/v^0.833).

[Figure: Windmill data, DC output vs wind velocity, with the two transformed LS fits overlaid: red curve uses the reciprocal of v, black curve uses v^(-0.833)]
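A sketch of how the comparison plot could be reproduced (assuming the notes' wind data frame with columns v and dc, and wind.lm from the fit above):

plot(dc ~ v, data = wind, xlab = "v", ylab = "DC")
vv <- seq(min(wind$v), max(wind$v), length.out = 200)
lines(vv, predict(lm(dc ~ I(1/v), data = wind), data.frame(v = vv)), col = "red")   # reciprocal fit
lines(vv, predict(wind.lm, data.frame(v = vv)))                                     # v^(-0.833) fit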

[Figure: Normal Q-Q plot of the standardized residuals for wind.lm, alongside three simulated Q-Q plots for comparison]

These plots indicate that the model fits fairly well. Note that the textbook implementation of the Box-Tidwell procedure is incorrect.

Exercises on Box-Cox and Box-Tidwell: 5.4 (data are in p5.4; do you need to transform the response or the predictor? Check all diagnostics before and after transforming, and obtain a plot of the data with the fitted curve overlaid.), 5.2 (data are in p5.2; for part (c) check the Box-Tidwell transformation - is it consistent with the theory?), 5.3 (p5.3), 5.5 (p5.5).

5.5.2 Weighted Least Squares

Consider the regression-through-the-origin model y_i = β₁x_i + ε_i with E[ε_i] = 0, and suppose V(y_i|x_i) = σ²/w_i where w_i is a known weight, i.e. E[ε_i²] = σ²/w_i.

The least squares estimate was previously found by minimizing Σ ε_i²:

  β̂₁ = Σ x_i y_i / Σ x_i².

Gauss-Markov theorem: when the variances are constant, β̂₁ has the smallest variance of any linear unbiased estimator of β₁. But β̂₁ is not the best linear unbiased estimator (BLUE) for β₁ when there are weights w_i.

To find the BLUE now, multiply the model by a_i:

  a_i y_i = a_i β₁ x_i + a_i ε_i, or y_i′ = β₁ x_i′ + ε_i′.

Compute β̂₁′ for the new data (x_i′, y_i′):

  β̂₁′ = Σ x_i′ y_i′ / Σ (x_i′)².

Then E[β̂₁′] = β₁ (unbiased), and

  V(β̂₁′) = σ² (Σ x_i² a_i⁴/w_i) / (Σ a_i² x_i²)².

How do we choose a₁, a₂, ..., a_n to make this as small as possible?

Recall the Cauchy-Schwarz inequality:

  (Σ u_i v_i)² ≤ (Σ u_j²)(Σ v_k²),

with equality when the u_i's are proportional to the v_i's (u_i = c·v_i).

Look at the denominator of our variance, taking u_i = a_i²x_i/√w_i and v_i = √w_i·x_i:

  (Σ a_i² x_i²)² ≤ (Σ a_i⁴ x_i²/w_i)(Σ w_i x_i²),

with equality when a_i⁴x_i²/w_i = w_i x_i², i.e. a_i = √w_i. Thus V(β̂₁′) is minimized if a_i = √w_i, giving

  V(β̂₁′) = σ² / Σ w_i x_i².

Note also that E[√w_i·ε_i] = 0 and V(√w_i·ε_i) = σ², and that instead of minimizing Σ ε_i² we are now minimizing Σ w_i ε_i².

Example: roller data.

Ordinary least squares:

roller.lm <- lm(depression ~ weight, data=roller)
plot(roller.lm, which=1)    # residuals vs fitted

The residual plot indicates that the variance might not be constant.

[Figure: Residuals vs Fitted, lm(formula = depression ~ weight, data = roller)]

Weighted least squares:

roller.wlm <- lm(depression ~ weight, data=roller, weights=1/weight^2)
plot(roller.wlm, which=1)

[Figure: Residuals vs Fitted, lm(formula = depression ~ weight, data = roller, weights = 1/weight^2)]

The weighted fit shows a more random residual pattern.

[Figure: Roller data, depression vs weight, with the OLS and WLS fitted lines overlaid]

This plot compares the fitted lines.

5.5.1 Generalized Least Squares

Model: y = Xβ + ε with E[ε] = 0 and E[εεᵀ] = Σ = σ²V.

Σ must be symmetric and positive definite. This implies, among other things, that Σ possesses an inverse. Weighted least squares is the special case in which Σ is a diagonal matrix with (i, i) element σ²/w_i.

Write V = K² for some symmetric nonsingular K, and consider

  K⁻¹y = K⁻¹Xβ + K⁻¹ε.

Note that

  Var(K⁻¹ε) = E[K⁻¹εεᵀK⁻¹] = K⁻¹σ²VK⁻¹ = σ²I.

By multiplying through by K⁻¹ we now have constant variance, so β can be estimated by least squares:

  β̂ = (XᵀK⁻²X)⁻¹XᵀK⁻²y = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y.

β̂ is the generalized least squares estimator for β. It is unbiased, E[β̂] = β, with variance

  Var(β̂) = (XᵀV⁻¹X)⁻¹XᵀV⁻¹ΣV⁻¹X(XᵀV⁻¹X)⁻¹ = σ²(XᵀV⁻¹X)⁻¹.
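A minimal numerical sketch of these formulas (simulated data and weights are my own choices): in the diagonal-V case the matrix GLS estimate should coincide with lm()'s weights argument.

set.seed(1)
n <- 50
x <- runif(n, 1, 10)
w <- 1 / x^2                               # known weights: Var(y_i) = sigma^2 / w_i
y <- 1 + 2 * x + rnorm(n, sd = x)          # error sd proportional to x (sigma = 1)
X <- cbind(1, x)
Vinv <- diag(w)                            # V = diag(1/w_i), so V^{-1} = diag(w_i)
beta.gls <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)   # (X'V^{-1}X)^{-1} X'V^{-1} y
drop(beta.gls)
coef(lm(y ~ x, weights = w))               # identical estimates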