Ch. 5: Transformations and Weighting


1 Ch. 5: Transformations and Weighting

1. Variance stabilizing transformations; Box-Cox transformations - Sections 5.2, 5.4
2. Transformations to linearize the model - Section 5.3
3. Weighted regression - Section 5.5

2 Variance-Stabilizing Transformations

Model assumptions:
E[y | x] = β_0 + β_1 x
V(y | x) = σ^2

Set μ_y = E[y | x]. What if V(y | x) = σ^2 f(μ_y), where f is some non-constant function? Try to find a function g(y) so that V(g(y) | x) = constant.

3 Variance-Stabilizing Transformations (cont'd)

Then obtain a Taylor expansion of g(y) about μ_y:
g(y) = g(μ_y) + (y − μ_y) g'(μ_y) + ((y − μ_y)^2 / 2) g''(μ_y) + ...

Then
V(g(y)) ≈ V(y) (g'(μ_y))^2 = σ^2 f(μ_y) (g'(μ_y))^2

V(g(y)) will be constant if
g'(μ_y) = 1/√(f(μ_y)), i.e. g'(z) = 1/√(f(z))
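
The delta-method recipe above can be checked by simulation. A minimal Python sketch (the data here are simulated, not from the course notes): for Poisson data, f(μ) = μ, and with g(y) = √y the approximation predicts V(g(y)) ≈ μ · (1/(2√μ))^2 = 1/4, regardless of μ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson: V(y) = mu, so f(mu) = mu and g'(z) = 1/sqrt(z), i.e. g(y) = sqrt(y)
# up to a constant factor.  With g(y) = sqrt(y), g'(mu) = 1/(2*sqrt(mu)), so
# the delta method predicts V(g(y)) ~ mu * 1/(4*mu) = 1/4 for every mu.
raw_var = {}
stab_var = {}
for mu in (5.0, 20.0, 80.0):
    y = rng.poisson(mu, size=200_000).astype(float)
    raw_var[mu] = y.var()            # grows roughly like mu
    stab_var[mu] = np.sqrt(y).var()  # roughly constant, near 1/4
```

The raw variances track the mean, while the variances after the square-root transformation stay near 0.25 across all three means.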

4 Examples

1. f(x) = x (e.g. Poisson data)
1/√(f(x)) = x^(−1/2), so g(y) = √y

[Figures: residuals vs fitted values for lm(yy ~ xx) and, after the square-root transformation, for lm(sqrt(yy) ~ xx)]

5 Examples (cont'd)

2. f(x) = x^2 (e.g. Exponential data)
1/√(f(x)) = 1/x, so g(y) = log(y)

[Figure: residuals vs fitted values for lm(yy ~ xx)]

6 Examples (cont'd)

3. f(x) = x(1 − x) (e.g. binomial data)
1/√(f(x)) = 1/√(x(1 − x))
Since d/dx sin^(−1)(√x) = 1/(2√(x(1 − x))), take g(y) = arcsin(√y)

7 5.4.1 Box-Cox Transformations (on response)

Select the power λ in the transformation g(y) = y^λ by maximum likelihood. Equivalent to minimizing the SSE with respect to λ (and the other parameters).

Caution: the residual sums of squares are not comparable for different values of λ. We need to ensure that comparisons are made according to the same standard:

y^(λ) = (y^λ − 1) / (λ ẏ^(λ−1)),  λ ≠ 0
      = ẏ log y,                   λ = 0

where ẏ = geometric mean of the y's

8 Strategy

1. Compute the transformed values y_1^(λ), ..., y_n^(λ) for several values of λ.
2. Compute the SSE for each value of λ.
3. Select the λ which gives the minimum SSE.
4. Fit y^(λ) = Xβ + ε.
5. Approximate confidence intervals for λ can also be obtained.
6. In R, use boxcox(y ~ x, data=dataset).
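
The strategy above can be sketched numerically. This is an illustrative Python sketch, not the MASS::boxcox implementation: it applies the geometric-mean scaling from the y^(λ) formula, scans a grid of λ values for the smallest SSE, and uses hypothetical data simulated so that the true λ is 0.

```python
import numpy as np

def boxcox_sse(y, X, lambdas):
    """SSE of the OLS fit to y^(lambda) for each candidate lambda, where
    y^(lambda) = (y**lam - 1) / (lam * gdot**(lam - 1)) for lam != 0,
               = gdot * log(y)                          for lam == 0,
    and gdot is the geometric mean of y -- the scaling that makes the
    residual sums of squares comparable across lambdas."""
    gdot = np.exp(np.mean(np.log(y)))
    out = {}
    for lam in lambdas:
        if lam == 0:
            z = gdot * np.log(y)
        else:
            z = (y**lam - 1.0) / (lam * gdot**(lam - 1.0))
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        resid = z - X @ beta
        out[lam] = float(resid @ resid)
    return out

# Hypothetical data generated so that log(y) is linear in x, i.e. the
# true lambda is 0 (this is NOT the textbook bacteria data).
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 60)
y = np.exp(2.0 - 0.3 * x + rng.normal(0.0, 0.1, x.size))
X = np.column_stack([np.ones_like(x), x])

sse = boxcox_sse(y, X, lambdas=[-1, -0.5, 0, 0.5, 1])
best = min(sse, key=sse.get)   # the lambda with the smallest scaled SSE
```

Because the data were generated with log(y) linear in x, the grid search picks λ = 0, i.e. the log transformation.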

9 Example 1

Bacteria data (Ex. 5.3): the average number of surviving bacteria (y) in a canned food product versus time (t) of exposure to 300°F heat.

10 Example 1 (cont'd)

> library(MPV)
> data(p5.3)
> bact.lm <- lm(bact ~ min, data=p5.3)
> plot(bact.lm, which=1)   # residuals vs fitted
> plot(bact.lm, which=2)   # normal Q-Q plot
> library(MASS)
> boxcox(bact.lm)          # Box-Cox log-likelihood plot
> bactlog.lm <- lm(log(bact) ~ min, data=p5.3)
> plot(bactlog.lm, which=1)   # residuals vs fitted, after log
> plot(bactlog.lm, which=2)   # normal Q-Q plot, after log

11 Residuals vs. Fitted

[Figure: residuals vs fitted values for lm(bact ~ min, data = p5.3)]

12 Q-Q Plot

[Figure: normal Q-Q plot of standardized residuals for lm(bact ~ min, data = p5.3)]

13 Box-Cox

[Figure: Box-Cox profile log-likelihood versus λ, with a 95% confidence interval for λ]

14 Residuals vs. Fitted (after log-transforming)

[Figure: residuals vs fitted values for lm(log(bact) ~ min, data = p5.3)]

15 Q-Q Plot (after log-transforming)

[Figure: normal Q-Q plot of standardized residuals for lm(log(bact) ~ min, data = p5.3)]

16 Example (cont'd)

A model of the form log(y) = β_0 + β_1 t + ε is reasonable, especially if β_1 is negative (β̂_1 = −.236).

17 Example 2

trees data: 31 observations on Girth (g), Height (h) and Volume (V).

A simple model (treating the trunk as a cylinder of circumference g):
V ≈ g^2 h / (4π)

or
log V = β_0 + β_1 log h + β_2 log g + ε

18 Example 2 (cont'd)

> library(DAAG)
> data(trees); attach(trees)
> trees.lm <- lm(log(Volume) ~ log(Girth) + log(Height))
> boxcox(trees.lm)   # lambda = 1 is OK
> summary(trees.lm)

[Coefficient table: all three terms highly significant; intercept p of order e-09, log(Height) p of order e-06, log(Girth) p < 2e-16]

19 Example 2 (cont'd) - Box-Cox after Transforming

[Figure: Box-Cox profile log-likelihood versus λ for the log-log model; λ = 1 lies inside the confidence interval]

Coefficient of log(height) is not distinguishable from 1, and coefficient of log(girth) is not distinguishable from 2.

20 5.3 Linearizing Transformations

Intrinsically linear model: the relationship between y and x is such that a simple transformation can produce a linear model.

Example: Fit the model E[y] = β_0 e^(β_1 x). Then
log E[y] = log β_0 + β_1 x
so fit log y_i = β_0' + β_1 x_i + ε_i, where β_0' = log β_0.

Note that this implies multiplicative errors, i.e.
y_i = e^(β_0' + β_1 x_i + ε_i) = β_0 e^(β_1 x_i) e^(ε_i)

If the error is additive, i.e. y_i = β_0 e^(β_1 x_i) + ε_i, then the transformation is not appropriate.
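
A quick simulated check of the multiplicative-error case (illustrative parameter values, not from the text): when the error enters multiplicatively, log y is exactly linear in x, and ordinary least squares on the log scale recovers the parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1 = 3.0, 0.8          # assumed true values for this sketch
x = np.linspace(0.0, 5.0, 200)

# Multiplicative error: y = beta0 * exp(beta1*x) * exp(eps),
# so log(y) = log(beta0) + beta1*x + eps is a genuine linear model.
eps = rng.normal(0.0, 0.05, x.size)
y = beta0 * np.exp(beta1 * x) * np.exp(eps)

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
# b[0] estimates log(beta0); b[1] estimates beta1.
# With ADDITIVE error, y = beta0*exp(beta1*x) + eps, log(y) would not be
# linear in x and this log-scale fit would be misspecified.
```

Here b[1] comes out close to 0.8 and exp(b[0]) close to 3, as the linearization predicts.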

21 Other possibilities from the text

E[y] = β_0 x^(β_1)
log E[y] = log β_0 + β_1 log x
New model: log y_i = β_0' + β_1 log x_i + ε_i

E[y] = x / (β_0 x − β_1)
1/E[y] = β_0 − β_1 (1/x)
New model: 1/y_i = β_0 − β_1 (1/x_i) + ε_i

22 Example - Windmill Data

These data concern the relation between the electrical output of a windmill and the wind velocity to which it is subjected. A decent model is:
DC output = β_0 + β_1/velocity + ε

23 Scatter Plots Before and After Transformation

[Figures: DC output vs wind velocity (untransformed) and DC output vs 1/wind velocity (transformed)]

24 Some models that are intrinsically nonlinear

Michaelis-Menten model (useful for modelling chemical reaction rates):
y = β_0 x / (β_1 + x) + ε

Mitscherlich law (useful for modelling chemical yield, etc.):
y = β_0 − β_1 γ^x + ε

Logistic growth model:
y = β_0 / (1 + β_1 e^(−kx)) + ε

25 Box-Tidwell transformation of a predictor variable

Consider the model y = β_0 + β_1 x^α + ε.
If α were known, β_0 and β_1 could be estimated by least squares. How can α be estimated?

26 Suppose we have a good guess: α_0. Taylor expand x^α about α_0:
x^α = x^(α_0) + (α − α_0) x^(α_0) log(x) + O((α − α_0)^2)

so if α_0 is close to α, we have
x^α ≈ x^(α_0) + (α − α_0) x^(α_0) log(x)

Our regression model then looks like
y ≈ β_0 + β_1 x^(α_0) + β_1 (α − α_0) x^(α_0) log(x) + ε

so consider
y ≈ β_0* + β_1* x^(α_0) + β_2* x^(α_0) log(x) + ε

where β_2* = β_1 (α − α_0). This gives the updating equation:
α = β_2*/β_1 + α_0

27 Box-Tidwell Procedure

1. Guess α: α_0
2. Fit y = β_0 + β_1 x^(α_0) + ε to obtain β̂_1
3. Fit y ≈ β_0* + β_1* x^(α_0) + β_2* x^(α_0) log(x) + ε to obtain β̂_2*
4. Update α: α_1 = β̂_2*/β̂_1 + α_0
5. Repeat the above steps to get α_2, α_3, ...
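
The steps above can be sketched in Python. This is a simplified illustration on simulated data with a known α (it is not the boxtidwell.lm function used later in the notes):

```python
import numpy as np

def box_tidwell(x, y, alpha0=1.0, n_iter=8):
    """Iteratively estimate alpha in y = b0 + b1 * x**alpha + eps.
    Each pass fits the plain model to get b1_hat, fits the linearized
    model with the extra x**a * log(x) term to get b2_hat, and applies
    the updating equation a <- b2_hat / b1_hat + a."""
    a = alpha0
    for _ in range(n_iter):
        xa = x**a
        # Step 2: fit y = b0 + b1 * x**a, keep b1_hat
        X1 = np.column_stack([np.ones_like(x), xa])
        b1_hat = np.linalg.lstsq(X1, y, rcond=None)[0][1]
        # Step 3: augmented fit with the x**a * log(x) term, keep b2_hat
        X2 = np.column_stack([np.ones_like(x), xa, xa * np.log(x)])
        b2_hat = np.linalg.lstsq(X2, y, rcond=None)[0][2]
        # Step 4: updating equation
        a = b2_hat / b1_hat + a
    return a

# Hypothetical data with true alpha = -1 (a reciprocal relationship,
# loosely in the spirit of the windmill example).
rng = np.random.default_rng(3)
x = np.linspace(2.0, 10.0, 100)
y = 3.0 - 4.0 / x + rng.normal(0.0, 0.02, x.size)
alpha_hat = box_tidwell(x, y, alpha0=-0.5)
```

Starting from α_0 = −0.5, the iterates settle near the true value −1.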

28 Box-Tidwell Procedure (cont'd)

Convergence usually occurs in about three iterations. There are instances where this procedure may not converge at all.

Note that the textbook implementation of the Box-Tidwell procedure is incorrect.

29 Example

Windmill generation of electricity. DC output is measured against wind velocity (v).

[Table of wind velocity (v) and DC output values]

30 Windmill Example (cont'd)

The scatterplot (windmill.pdf) indicates the need for a transformation. We saw earlier the usefulness of the reciprocal transformation of the velocity, 1/v:
y = β_0 + β_1 (1/v) + ε

Does the Box-Tidwell procedure agree?

31 Box-Tidwell

Initial guess: α_0 = 1

> boxtidwell.lm(dc ~ v, data=wind)
[Table: initial guess and iterates alpha_1, alpha_2, alpha_3, alpha_4, converging to −0.833]

y = β_0 + β_1 (1/v^0.833) + ε

> wind.lm <- lm(dc ~ I(v^(-0.833)), data=wind)
> summary(wind.lm)
[Coefficient table: both the intercept and I(v^(-0.833)) have Pr(>|t|) < 2e-16]

Fitted model: ŷ = β̂_0 + β̂_1 (1/v^0.833)

32 Windmill Example (cont'd)

[Figure: windmill data, DC output vs wind velocity, with transformed LS fits overlaid; red curve: reciprocal of v; black curve: v^(−0.833)]

33 Windmill Example (cont'd)

[Figures: normal Q-Q plot of standardized residuals, together with several simulated Q-Q plots for comparison]

These plots indicate that this model fits fairly well.

34 Exercises on Box-Cox and Box-Tidwell

Analyse the data in p5.4. Do you need to transform the response or the predictor? Check all diagnostics before and after transforming. Also, obtain a plot of the data with the fitted curve overlaid.

Analyze the data in p5.2; check the Box-Tidwell transformation: is it consistent with the theory described in Exercise 5.2 of the textbook?

Analyze the data in p5.3.

Analyze the data in p

35 5.5.2 Weighted Least Squares

Consider the regression-through-the-origin model y_i = β_1 x_i + ε_i with E[ε_i] = 0, and suppose V(y_i | x_i) = σ^2/w_i, where w_i is a known weight; i.e. E[ε_i^2] = σ^2/w_i.

The least squares estimate was previously found by minimizing Σ_{i=1}^n ε_i^2:
β̂_1 = Σ x_i y_i / Σ x_i^2

Gauss-Markov Theorem: when the variances are constant, β̂_1 has the smallest variance of any linear unbiased estimator of β_1.

36 Weighted Least Squares (cont'd)

β̂_1 is not the best linear unbiased estimator for β_1 when there are weights w_i. To find the BLUE now, multiply the model by a_i:
a_i y_i = β_1 a_i x_i + a_i ε_i
or
y_i* = β_1 x_i* + ε_i*

Compute β̃_1 for the new data (x_i*, y_i*):
β̃_1 = Σ x_i* y_i* / Σ (x_i*)^2

E[β̃_1] = β_1 (unbiased)

V(β̃_1) = σ^2 (Σ a_i^4 x_i^2 / w_i) / (Σ a_i^2 x_i^2)^2

37 Weighted Least Squares (cont'd)

How do we choose a_1, a_2, ..., a_n to make this as small as possible?

Recall the Cauchy-Schwarz inequality:
(Σ_{i=1}^n u_i v_i)^2 ≤ (Σ_{j=1}^n u_j^2)(Σ_{k=1}^n v_k^2)
(equality holds if the u_i's are proportional to the v_i's: u_i = c v_i)

Look at the denominator of our variance:
(Σ_{i=1}^n a_i^2 x_i^2)^2 ≤ (Σ_{i=1}^n a_i^4 x_i^2 / w_i)(Σ_{i=1}^n w_i x_i^2)
(equality holds when a_i^4 x_i^2 / w_i is proportional to w_i x_i^2, e.g. a_i = √w_i)

38 Weighted Least Squares (cont'd)

Thus V(β̃_1) is minimized if a_i = √w_i:
V(β̃_1) = σ^2 / Σ_{i=1}^n w_i x_i^2

Note also that E[√w_i ε_i] = 0 and V(√w_i ε_i) = σ^2, and that instead of minimizing
Σ_{i=1}^n ε_i^2          (Ordinary Least Squares)
we are now minimizing
Σ_{i=1}^n w_i ε_i^2      (Weighted Least Squares)
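
A small simulation illustrates the comparison (hypothetical data, with w_i = 1/x_i^2 so that the response standard deviation grows with x): both estimators are unbiased, but the weighted one has the smaller variance, attaining σ^2 / Σ w_i x_i^2.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(1.0, 10.0, 40)
w = 1.0 / x**2          # V(y_i) = sigma^2 / w_i = sigma^2 * x_i^2, sigma = 1
beta1 = 2.0

ols_est, wls_est = [], []
for _ in range(2000):
    y = beta1 * x + rng.normal(0.0, 1.0, x.size) * x      # sd grows with x
    ols_est.append(np.sum(x * y) / np.sum(x**2))          # minimizes sum e_i^2
    wls_est.append(np.sum(w * x * y) / np.sum(w * x**2))  # minimizes sum w_i e_i^2

var_ols = float(np.var(ols_est))
var_wls = float(np.var(wls_est))   # near 1 / sum(w_i * x_i^2)
```

Over the replications, both estimators average out to β_1 = 2, and var_wls sits near the theoretical minimum 1/Σ w_i x_i^2 while var_ols is noticeably larger.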

39 Example - roller data

Ordinary Least Squares:
roller.lm <- lm(depression ~ weight, data=roller)
plot(roller.lm, which=1)

40 Example (cont'd)

[Figure: residuals vs fitted values for lm(depression ~ weight, data = roller)]

The residual plot indicates that the variance might not be constant.

41 Weighted Least Squares

roller.wlm <- lm(depression ~ weight, data=roller, weights=1/weight^2)
plot(roller.wlm, which=1)

[Figure: residuals vs fitted values for the weighted fit; a more random pattern]

42 Weighted Least Squares

Comparing the fitted lines:

[Figure: roller data, depression vs weight, with the OLS and WLS fitted lines overlaid]

43 Generalized Least Squares

Model: y = Xβ + ε, with E[ε] = 0 and E[ε εᵀ] = Σ = σ^2 V.

Σ must be symmetric and positive definite. This implies, among other things, that Σ possesses an inverse, and that we can write V = K^2 for some symmetric nonsingular K.

Weighted Least Squares is a special case where Σ is a diagonal matrix with ii element σ^2/w_i.

44 Generalized Least Squares (cont'd)

Consider
K^(−1) y = K^(−1) X β + K^(−1) ε

Note
Var(K^(−1) ε) = E[K^(−1) ε εᵀ K^(−1)] = K^(−1) σ^2 V K^(−1) = σ^2 I

By multiplying through by K^(−1) we now have a constant variance, so β can be estimated by least squares:
β̂ = (Xᵀ K^(−2) X)^(−1) Xᵀ K^(−2) y

i.e.
β̂ = (Xᵀ V^(−1) X)^(−1) Xᵀ V^(−1) y

is the generalized least-squares estimator for β.
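
In matrix form, the equivalence between the GLS formula and OLS on the K^(−1)-transformed data can be verified directly. A NumPy sketch, with an assumed AR(1)-style V chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])

# A known positive definite V (AR(1)-style correlation, an assumption for
# this sketch), and its symmetric square root K, so that K @ K = V.
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
evals, evecs = np.linalg.eigh(V)
K = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

beta = np.array([1.0, 0.5])
y = X @ beta + K @ rng.normal(0.0, 1.0, n)   # errors with covariance V

# GLS in closed form: (X' V^{-1} X)^{-1} X' V^{-1} y
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Equivalently: OLS on the transformed system K^{-1} y = K^{-1} X beta + K^{-1} eps
Kinv = np.linalg.inv(K)
beta_trans, *_ = np.linalg.lstsq(Kinv @ X, Kinv @ y, rcond=None)
```

The two estimates agree to numerical precision, since K symmetric with K^2 = V gives K^(−1) K^(−1) = V^(−1).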

45 Generalized Least Squares (cont'd)

Unbiased: E[β̂] = β

Variance:
Var(β̂) = (Xᵀ V^(−1) X)^(−1) Xᵀ V^(−1) Σ V^(−1) X (Xᵀ V^(−1) X)^(−1) = σ^2 (Xᵀ V^(−1) X)^(−1)


More information

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure What do you do when you don t have a line? Nonlinear Models Spores 0e+00 2e+06 4e+06 6e+06 8e+06 30 40 50 60 70 longevity What do you do when you don t have a line? A Quadratic Adventure 1. If nonlinear

More information

Introduction to Estimation Methods for Time Series models. Lecture 1

Introduction to Estimation Methods for Time Series models. Lecture 1 Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

STK4900/ Lecture 5. Program

STK4900/ Lecture 5. Program STK4900/9900 - Lecture 5 Program 1. Checking model assumptions Linearity Equal variances Normality Influential observations Importance of model assumptions 2. Selection of predictors Forward and backward

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

1. Simple Linear Regression

1. Simple Linear Regression 1. Simple Linear Regression Suppose that we are interested in the average height of male undergrads at UF. We put each male student s name (population) in a hat and randomly select 100 (sample). Then their

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Lecture 1 Intro to Spatial and Temporal Data

Lecture 1 Intro to Spatial and Temporal Data Lecture 1 Intro to Spatial and Temporal Data Dennis Sun Stanford University Stats 253 June 22, 2015 1 What is Spatial and Temporal Data? 2 Trend Modeling 3 Omitted Variables 4 Overview of this Class 1

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Lecture 16 Solving GLMs via IRWLS

Lecture 16 Solving GLMs via IRWLS Lecture 16 Solving GLMs via IRWLS 09 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due next class problem set 6, November 18th Goals for today fixed PCA example

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Fitting a regression model

Fitting a regression model Fitting a regression model We wish to fit a simple linear regression model: y = β 0 + β 1 x + ɛ. Fitting a model means obtaining estimators for the unknown population parameters β 0 and β 1 (and also for

More information

Tutorial 6: Linear Regression

Tutorial 6: Linear Regression Tutorial 6: Linear Regression Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction to Simple Linear Regression................ 1 2 Parameter Estimation and Model

More information

Lecture Notes 15 Prediction Chapters 13, 22, 20.4.

Lecture Notes 15 Prediction Chapters 13, 22, 20.4. Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html 1 / 42 Passenger car mileage Consider the carmpg dataset taken from

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Motivation: Why Applied Statistics?

More information

Dealing with Heteroskedasticity

Dealing with Heteroskedasticity Dealing with Heteroskedasticity James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Dealing with Heteroskedasticity 1 / 27 Dealing

More information

Steps in Regression Analysis

Steps in Regression Analysis MGMG 522 : Session #2 Learning to Use Regression Analysis & The Classical Model (Ch. 3 & 4) 2-1 Steps in Regression Analysis 1. Review the literature and develop the theoretical model 2. Specify the model:

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Reading: Hoff Chapter 9 November 4, 2009 Problem Data: Observe pairs (Y i,x i ),i = 1,... n Response or dependent variable Y Predictor or independent variable X GOALS: Exploring

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Chapter 1 Linear Regression with One Predictor

Chapter 1 Linear Regression with One Predictor STAT 525 FALL 2018 Chapter 1 Linear Regression with One Predictor Professor Min Zhang Goals of Regression Analysis Serve three purposes Describes an association between X and Y In some applications, the

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Chapter 13 Introduction to Nonlinear Regression( 非線性迴歸 )

Chapter 13 Introduction to Nonlinear Regression( 非線性迴歸 ) Chapter 13 Introduction to Nonlinear Regression( 非線性迴歸 ) and Neural Networks( 類神經網路 ) 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples of nonlinear

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information