Examples of fitting various piecewise-continuous functions to data, using basis functions in doing the regressions.


David M. Boore

The examples in this document use R to do the regressions. See also Notes_on_piecewise_continuous_regression.doc for more detail on why the basis functions used below guarantee continuity at the breakpoints. Based on Eric Thompson's program piecewise-continuous.r, using DMB basis functions. The predictor variable is called M.

CONSTRUCT DATA TO BE FIT:

Define function: QUADRATIC, LINEAR, LINEAR

Breakpoints:
c <- c(4, 7)

slp <- 2.0 # assumed slope for the linear segment (any value keeps the pieces continuous)

y <- function(M,c){
  ifelse(M < c[1], slp*(M-c[1]) - 0.5*(M-c[1])^2,
  ifelse(M <= c[2], slp*(M-c[1]),
                    slp*(c[2]-c[1]) - 0.5*(M-c[2])))
}

Generate data:
set.seed(1)
n <- 300
M <- runif(n, 0, 10)

Add some noise:
yn <- y(M,c) + rnorm(n, sd = 0.5)

plot(M, yn, col="black")
abline(v = c)
lines(sort(M), y(sort(M),c), lwd=3, col="red")

Compute basis functions for each region:

b1 <- function(x,r){ifelse(x < r, x, r)}
b1.2 <- function(x,r){ifelse(x < r, x^2, r^2)}
b2 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, x - l, r - l))}
b3 <- function(x,l){ifelse(x < l, 0, x - l)}

plot(M, b1(M,c[1]), ylim=c(-0.25,10.0), col="blue", pch=20)
points(M, b1.2(M,c[1]), col="red", pch=20)
points(M, b2(M,c[1],c[2]), col="green", pch=20)
points(M, b3(M,c[2]), col="purple", pch=20)

abline(v = c)

model <- lm(yn ~ b1(M,c[1]) + b1.2(M,c[1]) + b2(M,c[1],c[2]) + b3(M,c[2]))

Regression summary:
summary(model)

Call:
lm(formula = yn ~ b1(M, c[1]) + b1.2(M, c[1]) + b2(M, c[1], c[2]) + b3(M, c[2]))

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                                     <2e-16 ***
b1(M, c[1])                                     <2e-16 ***
b1.2(M, c[1])                                   <2e-16 ***
b2(M, c[1], c[2])                               <2e-16 ***
b3(M, c[2])                                     <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 295 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 3529 on 4 and 295 DF,  p-value: < 2.2e-16

Coefficients:
coef <- coefficients(model)
coef
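As a numerical cross-check of the continuity argument, the same construction can be sketched outside R. The block below (Python/NumPy, not part of the original R session; the breakpoints match this example, but the slope 2.0 and the generated data are illustrative) builds the four basis columns, fits by ordinary least squares, and confirms the fitted curve has no jump at either breakpoint:

```python
import numpy as np

rng = np.random.default_rng(1)
c1, c2 = 4.0, 7.0

# Same basis functions as b1, b1.2, b2, b3 above
def b1(x, r):   return np.minimum(x, r)          # x below r, constant r above
def b1_2(x, r): return np.minimum(x, r) ** 2     # x^2 below r, constant r^2 above
def b2(x, l, r): return np.clip(np.asarray(x, float) - l, 0.0, r - l)  # ramp between l and r
def b3(x, l):   return np.maximum(np.asarray(x, float) - l, 0.0)       # ramp starting at l

x = rng.uniform(0, 10, 300)
# Toy quadratic / linear / linear target, slope 2.0 (an illustrative value)
y_true = np.where(x < c1, 2.0*(x - c1) - 0.5*(x - c1)**2,
         np.where(x <= c2, 2.0*(x - c1),
                  2.0*(c2 - c1) - 0.5*(x - c2)))
yn = y_true + rng.normal(0, 0.5, x.size)

# Design matrix: intercept plus the four basis columns (mirrors the lm() call)
X = np.column_stack([np.ones_like(x), b1(x, c1), b1_2(x, c1),
                     b2(x, c1, c2), b3(x, c2)])
coef, *_ = np.linalg.lstsq(X, yn, rcond=None)

def fit(x):
    return (coef[0] + coef[1]*b1(x, c1) + coef[2]*b1_2(x, c1)
            + coef[3]*b2(x, c1, c2) + coef[4]*b3(x, c2))

# Every basis column is continuous in x, so the fitted curve cannot jump:
print(abs(fit(c1 - 1e-9) - fit(c1 + 1e-9)))  # ~0, continuous at c1
print(abs(fit(c2 - 1e-9) - fit(c2 + 1e-9)))  # ~0, continuous at c2
```

Whatever coefficients least squares returns, continuity at the knots is automatic, because it is built into the basis functions rather than imposed as a constraint.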


FLAT, QUADRATIC, LINEAR, LINEAR

Define function

Breakpoints:
c <- c(2, 4, 7)

slp <- 0.835 # this value makes the quadratic meet the flat level -3.67 continuously at c[1]

y <- function(M,c){
  ifelse(M < c[1], -3.67,
  ifelse(M < c[2], slp*(M-c[2]) - 0.5*(M-c[2])^2,
  ifelse(M <= c[3], slp*(M-c[2]),
                    slp*(c[3]-c[2]) - 0.5*(M-c[3]))))
}

Generate data:
set.seed(1)
n <- 300
M <- runif(n, 0, 10)

Add some noise:
yn <- y(M,c) + rnorm(n, sd = 0.5)

plot(M, yn, col="black")
abline(v = c)
lines(sort(M), y(sort(M),c), lwd=3, col="red")

Compute basis functions for each region:

b1 <- function(x){rep(1,length(x))}
Alternatively, but less desirable: b1 <- rep(1,length(M))
b2 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, x - l, r - l))}
b2.2 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, (x - l)^2, (r - l)^2))}
b3 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, x - l, r - l))}
b4 <- function(x,l){ifelse(x < l, 0, x - l)}

plot(M, b1(M), ylim=c(-0.25,5.0), col="blue", pch=20)
Alternatively, but less desirable: plot(M, b1, ylim=c(-0.25,5.0), col="blue", pch=20)
points(M, b2(M,c[1],c[2]), col="red", pch=20)

points(M, b2.2(M,c[1],c[2]), col="magenta", pch=20)
points(M, b3(M,c[2],c[3]), col="green", pch=20)
points(M, b4(M,c[3]), col="purple", pch=20)
abline(v = c)

model <- lm(yn ~ -1 + b1(M) + b2(M,c[1],c[2]) + b2.2(M,c[1],c[2]) + b3(M,c[2],c[3]) + b4(M,c[3]))
Alternatively, but less desirable:
model <- lm(yn ~ 1 + b2(M,c[1],c[2]) + b2.2(M,c[1],c[2]) + b3(M,c[2],c[3]) + b4(M,c[3]))

Regression summary:
summary(model)

Call:
lm(formula = yn ~ -1 + b1(M) + b2(M, c[1], c[2]) + b2.2(M, c[1], c[2]) + b3(M, c[2], c[3]) + b4(M, c[3]))

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
b1(M)                                            < 2e-16 ***
b2(M, c[1], c[2])                                < 2e-16 ***
b2.2(M, c[1], c[2])                                      ***
b3(M, c[2], c[3])                                < 2e-16 ***
b4(M, c[3])                                      < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 295 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 1069 on 5 and 295 DF,  p-value: < 2.2e-16

coef <- coefficients(model)
coef

Replot data:
plot(M, yn, col="black")

Plot actual function:
abline(v = c)
lines(sort(M), y(sort(M),c), lwd=2, col="red")

Add predictions:
Mp <- sort(M)
yp <- coef[1]*b1(Mp) + coef[2]*b2(Mp,c[1],c[2]) + coef[3]*b2.2(Mp,c[1],c[2]) + coef[4]*b3(Mp,c[2],c[3]) + coef[5]*b4(Mp,c[3])
Alternatively, but less desirable:
yp <- coef[1] + coef[2]*b2(Mp,c[1],c[2]) + coef[3]*b2.2(Mp,c[1],c[2]) + coef[4]*b3(Mp,c[2],c[3]) + coef[5]*b4(Mp,c[3])
lines(Mp, yp, lwd=2, col="blue")


FLAT, QUADRATIC, LINEAR, FLAT

Define function

Breakpoints:
c <- c(2, 4, 7)

slp <- 0.835 # this value makes the quadratic meet the flat level -3.67 continuously at c[1]

y <- function(M,c){
  ifelse(M < c[1], -3.67,
  ifelse(M < c[2], slp*(M-c[2]) - 0.5*(M-c[2])^2,
  ifelse(M <= c[3], slp*(M-c[2]),
                    slp*(c[3]-c[2]))))
}

Generate data:
set.seed(1)
n <- 300
M <- runif(n, 0, 10)

Add some noise:
yn <- y(M,c) + rnorm(n, sd = 0.5)

plot(M, yn, col="black")
abline(v = c)
lines(sort(M), y(sort(M),c), lwd=3, col="red")

Compute basis functions for each region:

b1 <- function(x){rep(1,length(x))}
b2 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, x - l, r - l))}
b2.2 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, (x - l)^2, (r - l)^2))}
b3 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, x - l, r - l))}

plot(M, b1(M), ylim=c(-0.25,5.0), col="blue", pch=20)
points(M, b2(M,c[1],c[2]), col="red", pch=20)
points(M, b2.2(M,c[1],c[2]), col="magenta", pch=20)
points(M, b3(M,c[2],c[3]), col="green", pch=20)
abline(v = c)

model <- lm(yn ~ 1 + b2(M,c[1],c[2]) + b2.2(M,c[1],c[2]) + b3(M,c[2],c[3]))

Regression summary:
summary(model)

Call:
lm(formula = yn ~ 1 + b2(M, c[1], c[2]) + b2.2(M, c[1], c[2]) + b3(M, c[2], c[3]))

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                                      < 2e-16 ***
b2(M, c[1], c[2])                                < 2e-16 ***
b2.2(M, c[1], c[2])                                 e-06 ***
b3(M, c[2], c[3])                                < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 296 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 2204 on 3 and 296 DF,  p-value: < 2.2e-16

coef <- coefficients(model)
coef

Replot data:
plot(M, yn, col="black")

Plot actual function:
abline(v = c)
lines(sort(M), y(sort(M),c), lwd=2, col="red")

Add predictions:
Mp <- sort(M)
yp <- coef[1]*b1(Mp) + coef[2]*b2(Mp,c[1],c[2]) + coef[3]*b2.2(Mp,c[1],c[2]) + coef[4]*b3(Mp,c[2],c[3])
lines(Mp, yp, lwd=2, col="blue")
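The final FLAT segment comes for free from the clipped-ramp form of the b2/b3 basis functions: each such column is constant once the predictor passes its right knot, so any linear combination of them is constant there too. A minimal sketch of that property (Python/NumPy, not from the original session; the knots follow this example but the coefficients are arbitrary illustrative values):

```python
import numpy as np

def ramp(x, l, r):
    """b2/b3-style basis: 0 below l, linear between l and r, constant (r - l) above r."""
    return np.clip(np.asarray(x, dtype=float) - l, 0.0, r - l)

c = [2.0, 4.0, 7.0]
coef = [-3.67, 0.8, -0.2, 1.1]   # arbitrary illustrative coefficients

def fit(x):
    # constant + ramp and squared ramp ending at c[1] + ramp ending at c[2]
    return (coef[0] + coef[1]*ramp(x, c[0], c[1])
            + coef[2]*ramp(x, c[0], c[1])**2
            + coef[3]*ramp(x, c[1], c[2]))

# Beyond the last knot (7) every column is saturated, so the fit is flat:
print(fit(7.5), fit(9.0), fit(10.0))   # identical values
# Below the last knot the fit still varies:
print(fit(5.0), fit(6.0))              # different values
```

No extra constraint is needed for the flat tail; truncating the last ramp at c[3] (rather than letting it run on like b4) is what flattens the fit there.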


FLAT, QUADRATIC, FLAT, LINEAR

Define function

Breakpoints:
c <- c(2, 4, 7)

y <- function(M,c){
  ifelse(M < c[1], -3.67,
  ifelse(M < c[2], 1.13 + 1.4*(M-c[2]) - 0.5*(M-c[2])^2,  # 1.4 joins -3.67 at c[1] and 1.13 at c[2]
  ifelse(M <= c[3], 1.13,
                    1.13 + 0.5*(M-c[3]))))                # final slope 0.5 is an assumed value
}

Generate data:
set.seed(1)
n <- 300
M <- runif(n, 0, 10)

Add some noise:
yn <- y(M,c) + rnorm(n, sd = 0.5)

plot(M, yn, col="black")
abline(v = c)
lines(sort(M), y(sort(M),c), lwd=3, col="red")

Compute basis functions for each region:

b1 <- function(x){rep(1,length(x))}
b2 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, x - l, r - l))}
b2.2 <- function(x,l,r){ifelse(x < l, 0, ifelse(x < r, (x - l)^2, (r - l)^2))}
b4 <- function(x,l){ifelse(x < l, 0, x - l)}

plot(M, b1(M), ylim=c(-0.25,5.0), col="blue", pch=20)
points(M, b2(M,c[1],c[2]), col="red", pch=20)
points(M, b2.2(M,c[1],c[2]), col="magenta", pch=20)
points(M, b4(M,c[3]), col="purple", pch=20)
abline(v = c)

model <- lm(yn ~ 1 + b2(M,c[1],c[2]) + b2.2(M,c[1],c[2]) + b4(M,c[3]))

Regression summary:
summary(model)

Call:
lm(formula = yn ~ 1 + b2(M, c[1], c[2]) + b2.2(M, c[1], c[2]) + b4(M, c[3]))

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                                      < 2e-16 ***
b2(M, c[1], c[2])                                < 2e-16 ***
b2.2(M, c[1], c[2])                                 e-07 ***
b4(M, c[3])                                      < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 296 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 1256 on 3 and 296 DF,  p-value: < 2.2e-16

coef <- coefficients(model)
coef

Replot data:
plot(M, yn, col="black")

Plot actual function:
abline(v = c)
lines(sort(M), y(sort(M),c), lwd=2, col="red")

Add predictions:
Mp <- sort(M)
yp <- coef[1]*b1(Mp) + coef[2]*b2(Mp,c[1],c[2]) + coef[3]*b2.2(Mp,c[1],c[2]) + coef[4]*b4(Mp,c[3])
lines(Mp, yp, lwd=2, col="blue")


FLAT, LINE CROSSING 0.0 AT XC, FLAT

How do we build a model that is flat up to c[1], forced to cross 0.0 at x = xc, and flat again beyond c[2]? Although the model has breaks at c[1] and c[2], there is only one basis function. The reason is that the slope of the line between c[1] and c[2] is determined by the value of the constant portion for x < c[1] together with the condition that the line cross zero at x = xc. The slope is therefore NOT a regression parameter; there is only one regression parameter.

FLAT, LINEAR GOING THROUGH A SPECIFIED POINT, FLAT

Define function

Breakpoints:
c <- c(4, 6)
cz <- 5.5
slope <- (2.0 - 0.0)/(cz - c[1]) # from level 2.0 at c[1] down to 0.0 at cz

y <- function(x,c){
  ifelse(x < c[1], 2.0,
  ifelse(x < c[2], 2.0 - slope*(x - c[1]),
                   2.0 - slope*(c[2] - c[1])))
}

Generate data:
set.seed(1)
n <- 300
M <- runif(n, 0, 10)

Add some noise:
yn <- y(M,c) + rnorm(n, sd = 0.5)

plot(M, yn, col="black")
abline(h = 0)
abline(v = c(c[1],cz,c[2]))
lines(sort(M), y(sort(M),c), lwd=3, col="red")

Compute the basis function (there is only one):

b1 <- function(x,l,xc,r){ifelse(x < l, 1, ifelse(x < r, 1 - (x - l)/(xc - l), 1 - (r - l)/(xc - l)))}

plot(M, b1(M,c[1],cz,c[2]), col="blue", pch=20)
abline(h = 0)
abline(v = c(c[1],cz,c[2]))

model <- lm(yn ~ -1 + b1(M,c[1],cz,c[2]))

Regression summary:
summary(model)

Call:
lm(formula = yn ~ -1 + b1(M, c[1], cz, c[2]))

Residuals:
    Min      1Q  Median      3Q     Max

Coefficients:

                      Estimate Std. Error t value Pr(>|t|)
b1(M, c[1], cz, c[2])                               <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 299 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 2486 on 1 and 299 DF,  p-value: < 2.2e-16

coef <- coefficients(model)
coef

Replot data:
plot(M, yn, col="black")

Plot actual function:
abline(h = 0)
abline(v = c(c[1],cz,c[2]))
lines(sort(M), y(sort(M),c), lwd=2, col="red")

Add predictions:
Mp <- sort(M)
yp <- coef[1]*b1(Mp,c[1],cz,c[2])
lines(Mp, yp, lwd=2, col="blue")
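To make the single-parameter structure concrete: the lone basis function hits zero at x = xc by construction, so whatever coefficient the regression returns, the fitted curve passes through (xc, 0) and its slope between the knots is implied by the fitted level, not estimated separately. A sketch of the same idea (Python/NumPy, not from the original session; the knots match this example and the level 2.0 and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
l, xc, r = 4.0, 5.5, 6.0   # c[1], zero-crossing point, c[2]

def b1(x, l, xc, r):
    # 1 below l; drops linearly and hits 0 exactly at x = xc; constant beyond r
    x = np.asarray(x, dtype=float)
    return np.where(x < l, 1.0,
           np.where(x < r, 1.0 - (x - l)/(xc - l),
                           1.0 - (r - l)/(xc - l)))

# Toy data from the same shape with flat level 2.0, plus noise
x = rng.uniform(0, 10, 300)
yn = 2.0*b1(x, l, xc, r) + rng.normal(0, 0.5, x.size)

# One regression parameter (the flat level); the slope -level/(xc - l) is implied
A = b1(x, l, xc, r)[:, None]
level, *_ = np.linalg.lstsq(A, yn, rcond=None)

fit = lambda x: level[0]*b1(x, l, xc, r)
print(fit(xc))   # exactly zero: the crossing is built into the basis, not estimated
```

Because b1(xc) = 0 identically, the zero crossing survives no matter how noisy the data are; the regression only has to find the level of the flat portion.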



Model Modifications. Bret Larget. Departments of Botany and of Statistics University of Wisconsin Madison. February 6, 2007 Model Modifications Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison February 6, 2007 Statistics 572 (Spring 2007) Model Modifications February 6, 2007 1 / 20 The Big

More information

Consider fitting a model using ordinary least squares (OLS) regression:

Consider fitting a model using ordinary least squares (OLS) regression: Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful

More information

Lecture 10. Factorial experiments (2-way ANOVA etc)

Lecture 10. Factorial experiments (2-way ANOVA etc) Lecture 10. Factorial experiments (2-way ANOVA etc) Jesper Rydén Matematiska institutionen, Uppsala universitet jesper@math.uu.se Regression and Analysis of Variance autumn 2014 A factorial experiment

More information

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure What do you do when you don t have a line? Nonlinear Models Spores 0e+00 2e+06 4e+06 6e+06 8e+06 30 40 50 60 70 longevity What do you do when you don t have a line? A Quadratic Adventure 1. If nonlinear

More information

GMM - Generalized method of moments

GMM - Generalized method of moments GMM - Generalized method of moments GMM Intuition: Matching moments You want to estimate properties of a data set {x t } T t=1. You assume that x t has a constant mean and variance. x t (µ 0, σ 2 ) Consider

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Practice 2 due today. Assignment from Berndt due Monday. If you double the number of programmers the amount of time it takes doubles. Huh?

Practice 2 due today. Assignment from Berndt due Monday. If you double the number of programmers the amount of time it takes doubles. Huh? Admistrivia Practice 2 due today. Assignment from Berndt due Monday. 1 Story: Pair programming Mythical man month If you double the number of programmers the amount of time it takes doubles. Huh? Invention

More information

Example of treatment contrasts used by R in estimating ANOVA coefficients

Example of treatment contrasts used by R in estimating ANOVA coefficients Example of treatment contrasts used by R in estimating ANOVA coefficients The first example shows a simple numerical design matrix in R (no factors) for the groups 1, a, b, ab. resp

More information

Correlation and Regression: Example

Correlation and Regression: Example Correlation and Regression: Example 405: Psychometric Theory Department of Psychology Northwestern University Evanston, Illinois USA April, 2012 Outline 1 Preliminaries Getting the data and describing

More information

Psychology 405: Psychometric Theory

Psychology 405: Psychometric Theory Psychology 405: Psychometric Theory Homework Problem Set #2 Department of Psychology Northwestern University Evanston, Illinois USA April, 2017 1 / 15 Outline The problem, part 1) The Problem, Part 2)

More information

Stat 4510/7510 Homework 7

Stat 4510/7510 Homework 7 Stat 4510/7510 Due: 1/10. Stat 4510/7510 Homework 7 1. Instructions: Please list your name and student number clearly. In order to receive credit for a problem, your solution must show sufficient details

More information

Design and Analysis of Experiments

Design and Analysis of Experiments Design and Analysis of Experiments Part IX: Response Surface Methodology Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Methods Math Statistics Models/Analyses Response

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Chapter 5. Transformations

Chapter 5. Transformations Chapter 5. Transformations In Chapter 4, you learned ways to diagnose violations of model assumptions. What should you do if some assumptions are badly violated? Often, you can use transformations to solve

More information

Topics on Statistics 2

Topics on Statistics 2 Topics on Statistics 2 Pejman Mahboubi March 7, 2018 1 Regression vs Anova In Anova groups are the predictors. When plotting, we can put the groups on the x axis in any order we wish, say in increasing

More information

HW3 Solutions : Applied Bayesian and Computational Statistics

HW3 Solutions : Applied Bayesian and Computational Statistics HW3 Solutions 36-724: Applied Bayesian and Computational Statistics March 2, 2006 Problem 1 a Fatal Accidents Poisson(θ I will set a prior for θ to be Gamma, as it is the conjugate prior. I will allow

More information

Stat 401B Final Exam Fall 2015

Stat 401B Final Exam Fall 2015 Stat 401B Final Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning

More information

Statistical Computing Session 4: Random Simulation

Statistical Computing Session 4: Random Simulation Statistical Computing Session 4: Random Simulation Paul Eilers & Dimitris Rizopoulos Department of Biostatistics, Erasmus University Medical Center p.eilers@erasmusmc.nl Masters Track Statistical Sciences,

More information

Section Least Squares Regression

Section Least Squares Regression Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression ST 430/514 Recall: a regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates).

More information

Exercise 2 SISG Association Mapping

Exercise 2 SISG Association Mapping Exercise 2 SISG Association Mapping Load the bpdata.csv data file into your R session. LHON.txt data file into your R session. Can read the data directly from the website if your computer is connected

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

Modern Regression HW #6 Solutions

Modern Regression HW #6 Solutions 36-401 Modern Regression HW #6 Solutions Problem 1 [32 points] (a) (4 pts.) DUE: 10/27/2017 at 3PM Given : Chick 50 150 300 50 150 300 50 150 300 50 150 300 Weight 50 150 300 50 150 300 50 150 300 Figure

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

Generalized Linear Models in R

Generalized Linear Models in R Generalized Linear Models in R NO ORDER Kenneth K. Lopiano, Garvesh Raskutti, Dan Yang last modified 28 4 2013 1 Outline 1. Background and preliminaries 2. Data manipulation and exercises 3. Data structures

More information

Lab #5 - Predictive Regression I Econ 224 September 11th, 2018

Lab #5 - Predictive Regression I Econ 224 September 11th, 2018 Lab #5 - Predictive Regression I Econ 224 September 11th, 2018 Introduction This lab provides a crash course on least squares regression in R. In the interest of time we ll work with a very simple, but

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Chaos, Complexity, and Inference (36-462)

Chaos, Complexity, and Inference (36-462) Chaos, Complexity, and Inference (36-462) Lecture 1 Cosma Shalizi 13 January 2009 Course Goals Learn about developments in dynamics and systems theory Understand how they relate to fundamental questions

More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information