Introduction to Nonparametric Regression

Size: px
Start display at page:

Download "Introduction to Nonparametric Regression"

Transcription

1 Introduction to Nonparametric Regression Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 1

2 Copyright Copyright c 2017 by Nathaniel E. Helwig Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 2

3 Outline of Notes 1) Need for NP Reg Motivating example Nonparametric regression 3) Local Regression: Overview Examples 2) Local Averaging Overview Examples 4) Kernel Smoothing: Overview Examples Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 3

4 Need for Nonparametric Regression Need for Nonparametric Regression Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 4

5 Need for Nonparametric Regression Motivating Example Results From Four Hypothetical Studies Study 1: y^ = x Study 2: y^ = x y^ R 2 = 0.67 y^ R 2 = x x Study 3: y^ = x Study 4: y^ = x y^ R 2 = 0.67 y^ R 2 = x x Figure: Estimated linear relationship from four hypothetical studies. Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 5

6 Need for Nonparametric Regression Motivating Example Implications of Four Hypothetical Studies What do the results on the previous slide imply? Can we conclude that there is a linear relationship between X and Y? Is the reproducibility of the finding indicative of a valid discovery? What have we learned about the data from these results? Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 6

7 Need for Nonparametric Regression Let s Look at the Data Motivating Example Study 1: y^ = x Study 2: y^ = x y^ R 2 = 0.67 y^ R 2 = x x Study 3: y^ = x Study 4: y^ = x y^ R 2 = 0.67 y^ x x Figure: Estimated linear relationship with corresponding data. Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 7

8 Need for Nonparametric Regression Anscombe s (1973) Quartet Motivating Example > anscombe x1 x2 x3 x4 y1 y2 y3 y Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 8

9 Need for Nonparametric Regression Nonparametric Regression Parametric versus Nonparametric Regression The general linear model is a form of parametric regression, where the relationship between X and Y has some predetermined form. Parameterizes relationship between X and Y, e.g., Ŷ = β 0 + β 1 X Then estimates the specified parameters, e.g., β 0 and β 1 Great if you know the form of the relationship (e.g., linear) In contrast, nonparametric regression tries to estimate the form of the relationship between X and Y. No predetermined form for relationship between X and Y Great for discovering relationships and building prediction models Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 9

10 Need for Nonparametric Regression Nonparametric Regression Problem of Interest Smoothers (aka nonparametric regression) try to estimate functions from noisy data. Suppose we have n pairs of points (x i, y i ) for i {1,..., n}, and WLOG assume that x 1 x 2 x n. Also, suppose the following assumptions hold: (A1) There is a functional relationship between x and y of the form y i = η(x i ) + ɛ i ; i {1,..., n} (A2) The ɛ i are iid from some distribution f (x) with zero mean Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 10

11 Local Averaging Local Averaging Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 11

12 Local Averaging Overview Friedman s (1984) Local Averaging To estimate η at the point x i, we could calculate the average of the y j values corresponding to x j values that are near x i. Friedman (1984) defined near as the smallest symmetric window around x i that contains s observations. Note that s is called the span Size of averaging window can differ for each x i But always use s points in averaging window Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 12

13 Local Averaging Overview Selecting the Span Friedman proposed using a cross-validation approach to select span s. For a given span s, leave-one-out cross-validation: Let ŷ (i) denote the local averaging estimate of η at the point x i obtained by holding out the i-th pair (x i, y i ) Define CV residuals e i (s) = y i ŷ (i) ; note residual is function of s ŝ = min s S (1/n) n i=1 e2 i (s) Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 13

14 Local Averaging Examples Local Averaging Example 1: sunspots data Raw Data span=0.01 sunspots sunlocavg$y years sunlocavg$x span=0.05 span=cv sunlocavg$y sunlocavg$y sunlocavg$x sunlocavg$x Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 14

15 Local Averaging Examples Local Averaging Example 1: R code data(sunspots) yrs=start(sunspots) yre=end(sunspots) years=seq(yrs[1]+yrs[2]/12,yre[1]+yre[2]/12,by=1/12) dev.new(width=8,height=6,norstudiogd=true) par(mfrow=c(2,2)) plot(years,sunspots,type="l",main="raw Data") sunlocavg=supsmu(years,sunspots,span=0.01) plot(sunlocavg,type="l",main="span=0.01") sunlocavg=supsmu(years,sunspots,span=0.05) plot(sunlocavg,type="l",main="span=0.05") sunlocavg=supsmu(years,sunspots) plot(sunlocavg,type="l",main="span=cv") Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 15

16 Local Averaging Examples Local Averaging Example 2: simulated data x y η^ η Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 16

17 Local Averaging Examples Local Averaging Example 2: R code > set.seed(1) > x=seq(0,1,length=50) > y=sin(2*pi*x)+rnorm(50,sd=0.5) > locavg=supsmu(x,y) > dev.new(width=6,height=6,norstudiogd=true) > plot(x,y) > lines(locavg$x,locavg$y) > lines(locavg$x,sin(2*pi*x),lty=2) > legend("bottomleft",c(expression(hat(eta)), + expression(eta)),lty=1:2,bty="n") Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 17

18 Local Regression Local Regression Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 18

19 Local Regression Overview Cleveland s (1979) Local Regression To estimate η at the point x i, we could calculate the local linear regression line using the (x j, y j ) points near x i. LOWESS: LOcally WEighted Scatterplot Smoothing LOESS: LOcal regression Cleveland (1979) proposed using weighted regression with weights related to distance of x j points to x i. Weight function is scaled so only proportion of (x j, y j ) points are used in each regression Size of regression window can differ for each x i But use αn points in each regression where α (0, 1] Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 19

20 Local Regression Overview Weighted Local Regression Cleveland proposed using a weight function W such that { > 0, x < 1 W (x) = 0, x 1 Then W is modified for each index i {1,..., n} by... Centering W at x i Scaling W such that αn values are nonzero R s loess uses tricube function: W (x) = (1 x 3 ) 3 for x < 1 Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 20

21 Local Regression Overview Weighted Local Regression (continued) Let {w i j }n j=1 denote the weights for a particular point x i. The weighted local regression problem minimizes n j=1 w i j (y j β i 0 βi 1 x j) 2 where β0 i and βi 1 are intercept and slope between x and y in neighborhood around x i Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 21

22 Local Regression Overview Robust Weighted Local Regression To reduce effect of outliers, we can perform another regression with weights based on the residuals ˆɛ i = y i ŷ i where ŷ i = ˆβ i 0 + ˆβ i 1 x i. Bisquare weight function: B(x) = (1 x 2 ) 2, x < 1 Residual-based weights: δ i = B (ˆɛ i /(6median 1 j n ˆɛ j ) ) The robust weighted local regression problem minimizes n j=1 δ j w i j (y j β i 0 βi 1 x j) 2 where β0 i and βi 1 are the robust (i.e., residual corrected) intercept and slope between x and y in neighborhood around x i Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 22

23 Local Regression Overview Selecting the Span Want to minimize the leave-one-out cross-validation criterion: 1 n n (y i ŷ (i) ) 2 i=1 where ŷ (i) is the LOESS estimate of y i obtained by holding out (x i, y i ). Rewrite the leave-one-out cross-validation criterion as 1 n n i=1 (y i ŷ i ) 2 (1 h ii ) 2 where h ii are diagonal entries of the hat matrix H that determines ŷ i. Replace h ii with 1 n n i=1 h ii = tr(h)/n for generalized CV Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 23

24 Local Regression Examples Local Regression Example 1: sunspots data Raw Data span=0.01 sunspots sunlocreg$fitted years sunlocreg$x span=0.05 span=gcv sunlocreg$fitted sunlocreg$fitted sunlocreg$x sunlocreg$x Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 24

25 Local Regression Examples Local Regression Example 1: R code library(fancova) data(sunspots) yrs=start(sunspots) yre=end(sunspots) years=seq(yrs[1]+yrs[2]/12,yre[1]+yre[2]/12,by=1/12) dev.new(width=8,height=6,norstudiogd=true) par(mfrow=c(2,2)) plot(years,sunspots,type="l",main="raw Data") sunlocreg=loess(sunspots~years,span=0.01) plot(sunlocreg$x,sunlocreg$fitted,type="l",main="span=0.01") sunlocreg=loess(sunspots~years,span=0.05) plot(sunlocreg$x,sunlocreg$fitted,type="l",main="span=0.05") sunlocreg=loess.as(years,sunspots,criterion="gcv") plot(sunlocreg$x,sunlocreg$fitted,type="l",main="span=gcv") Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 25

26 Local Regression Examples Local Regression Example 2: simulated data x y η^ η Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 26

27 Local Regression Examples Local Regression Example 2: R code > library(fancova) > set.seed(55455) > x=seq(0,1,length=50) > y=sin(2*pi*x)+rnorm(50,sd=0.5) > locreg=loess.as(x,y) > dev.new(width=6,height=6,norstudiogd=true) > plot(x,y) > lines(locreg$x,locreg$fitted) > lines(locreg$x,sin(2*pi*x),lty=2) > legend("bottomleft",c(expression(hat(eta)), + expression(eta)),lty=1:2,bty="n") Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 27

28 Kernel Smoothing Kernel Smoothing Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 28

29 Kernel Smoothing Overview Kernel Smoothing for General Functions Kernel smoothing extends KDE idea to estimation a general function η. Nadaraya (1964) and Watson (1964) independently introduced the kernel regression estimate where weights w i = n i=1 ˆη(x) = y ik ( x x i ) h n i=1 K ( x x ) i = h ( ) K x xi h ( n i=1 K x xi h n y i w i i=1 ) are dependent on chosen K and h. CV, GCV, or AIC to select bandwidth in kernel regression problems. Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 29

30 Kernel Smoothing Examples Kernel Smoothing Example 1: sunspots data Raw Data bw=0.01 sunspots fitted(sunkerreg) years years bws=0.05 bws=cv (0.09) fitted(sunkerreg) fitted(sunkerreg) years years Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 30

31 Kernel Smoothing Examples Kernel Smoothing Example 1: R code library(np) data(sunspots) yrs=start(sunspots) yre=end(sunspots) sunspots=as.vector(sunspots) years=seq(yrs[1]+yrs[2]/12,yre[1]+yre[2]/12,by=1/12) dev.new(width=8,height=6,norstudiogd=true) par(mfrow=c(2,2)) plot(years,sunspots,type="l",main="raw Data") sunkerreg=npreg(bws=0.01,txdat=years,tydat=sunspots) plot(years,fitted(sunkerreg),type="l",main="bw=0.01") sunkerreg=npreg(bws=0.05,txdat=years,tydat=sunspots) plot(years,fitted(sunkerreg),type="l",main="bws=0.05") sunkerreg=npreg(txdat=years,tydat=sunspots) plot(years,fitted(sunkerreg),type="l",main="bws=cv (0.09)") Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 31

32 Kernel Smoothing Examples Kernel Smoothing Example 2: simulated data x y η^ η Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 32

33 Kernel Smoothing Examples Kernel Smoothing Example 2: R code > library(np) > set.seed(55455) > x=seq(0,1,length=50) > y=sin(2*pi*x)+rnorm(50,sd=0.5) > kerreg=npreg(txdat=x,tydat=y) > dev.new(width=6,height=6,norstudiogd=true) > plot(x,y) > lines(x,fitted(kerreg)) > lines(x,sin(2*pi*x),lty=2) > legend("bottomleft",c(expression(hat(eta)), + expression(eta)),lty=1:2,bty="n") Nathaniel E. Helwig (U of Minnesota) Introduction to Nonparametric Regression Updated 04-Jan-2017 : Slide 33

STAT 704 Sections IRLS and Bootstrap

STAT 704 Sections IRLS and Bootstrap STAT 704 Sections 11.4-11.5. IRLS and John Grego Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 14 LOWESS IRLS LOWESS LOWESS (LOcally WEighted Scatterplot Smoothing)

More information

Density and Distribution Estimation

Density and Distribution Estimation Density and Distribution Estimation Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Density

More information

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression

Local regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression Local regression I Patrick Breheny November 1 Patrick Breheny STA 621: Nonparametric Statistics 1/27 Simple local models Kernel weighted averages The Nadaraya-Watson estimator Expected loss and prediction

More information

Nonparametric Location Tests: k-sample

Nonparametric Location Tests: k-sample Nonparametric Location Tests: k-sample Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

More information

Nonparametric Methods

Nonparametric Methods Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis

More information

Nonparametric Independence Tests

Nonparametric Independence Tests Nonparametric Independence Tests Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Nonparametric

More information

Chapter 3: Regression Methods for Trends

Chapter 3: Regression Methods for Trends Chapter 3: Regression Methods for Trends Time series exhibiting trends over time have a mean function that is some simple function (not necessarily constant) of time. The example random walk graph from

More information

Regression with Polynomials and Interactions

Regression with Polynomials and Interactions Regression with Polynomials and Interactions Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

More information

Introduction to Regression

Introduction to Regression Introduction to Regression David E Jones (slides mostly by Chad M Schafer) June 1, 2016 1 / 102 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures

More information

Factorial and Unbalanced Analysis of Variance

Factorial and Unbalanced Analysis of Variance Factorial and Unbalanced Analysis of Variance Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 04-Jan-2017 Nathaniel E. Helwig (U of Minnesota)

More information

Introduction to Linear regression analysis. Part 2. Model comparisons

Introduction to Linear regression analysis. Part 2. Model comparisons Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual

More information

Statistical View of Least Squares

Statistical View of Least Squares May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples

More information

Econ 582 Nonparametric Regression

Econ 582 Nonparametric Regression Econ 582 Nonparametric Regression Eric Zivot May 28, 2013 Nonparametric Regression Sofarwehaveonlyconsideredlinearregressionmodels = x 0 β + [ x ]=0 [ x = x] =x 0 β = [ x = x] [ x = x] x = β The assume

More information

Nonparametric Regression

Nonparametric Regression Nonparametric Regression Econ 674 Purdue University April 8, 2009 Justin L. Tobias (Purdue) Nonparametric Regression April 8, 2009 1 / 31 Consider the univariate nonparametric regression model: where y

More information

STAT 705 Nonlinear regression

STAT 705 Nonlinear regression STAT 705 Nonlinear regression Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 1 Chapter 13 Parametric nonlinear regression Throughout most

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Nonparametric Regression. Badr Missaoui

Nonparametric Regression. Badr Missaoui Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y

More information

Lecture 02 Linear classification methods I

Lecture 02 Linear classification methods I Lecture 02 Linear classification methods I 22 January 2016 Taylor B. Arnold Yale Statistics STAT 365/665 1/32 Coursewebsite: A copy of the whole course syllabus, including a more detailed description of

More information

3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No.

3. For a given dataset and linear model, what do you think is true about least squares estimates? Is Ŷ always unique? Yes. Is ˆβ always unique? No. 7. LEAST SQUARES ESTIMATION 1 EXERCISE: Least-Squares Estimation and Uniqueness of Estimates 1. For n real numbers a 1,...,a n, what value of a minimizes the sum of squared distances from a to each of

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships

More information

Introduction to Normal Distribution

Introduction to Normal Distribution Introduction to Normal Distribution Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 17-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Introduction

More information

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013

Applied Regression. Applied Regression. Chapter 2 Simple Linear Regression. Hongcheng Li. April, 6, 2013 Applied Regression Chapter 2 Simple Linear Regression Hongcheng Li April, 6, 2013 Outline 1 Introduction of simple linear regression 2 Scatter plot 3 Simple linear regression model 4 Test of Hypothesis

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Introduction to Regression

Introduction to Regression PSU Summer School in Statistics Regression Slide 1 Introduction to Regression Chad M. Schafer Department of Statistics & Data Science, Carnegie Mellon University cschafer@cmu.edu June 2018 PSU Summer School

More information

Local Polynomial Regression

Local Polynomial Regression VI Local Polynomial Regression (1) Global polynomial regression We observe random pairs (X 1, Y 1 ),, (X n, Y n ) where (X 1, Y 1 ),, (X n, Y n ) iid (X, Y ). We want to estimate m(x) = E(Y X = x) based

More information

Section 7: Local linear regression (loess) and regression discontinuity designs

Section 7: Local linear regression (loess) and regression discontinuity designs Section 7: Local linear regression (loess) and regression discontinuity designs Yotam Shem-Tov Fall 2015 Yotam Shem-Tov STAT 239/ PS 236A October 26, 2015 1 / 57 Motivation We will focus on local linear

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Chad M. Schafer May 20, 2015 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures Cross Validation Local Polynomial Regression

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

BNAD 276 Lecture 10 Simple Linear Regression Model

BNAD 276 Lecture 10 Simple Linear Regression Model 1 / 27 BNAD 276 Lecture 10 Simple Linear Regression Model Phuong Ho May 30, 2017 2 / 27 Outline 1 Introduction 2 3 / 27 Outline 1 Introduction 2 4 / 27 Simple Linear Regression Model Managerial decisions

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Math 3330: Solution to midterm Exam

Math 3330: Solution to midterm Exam Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the

More information

Introduction to Regression

Introduction to Regression Introduction to Regression p. 1/97 Introduction to Regression Chad Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/97 Acknowledgement Larry Wasserman, All of Nonparametric

More information

Penalized Splines, Mixed Models, and Recent Large-Sample Results

Penalized Splines, Mixed Models, and Recent Large-Sample Results Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong

More information

Principal Components Analysis

Principal Components Analysis Principal Components Analysis Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 16-Mar-2017 Nathaniel E. Helwig (U of Minnesota) Principal

More information

LWP. Locally Weighted Polynomials toolbox for Matlab/Octave

LWP. Locally Weighted Polynomials toolbox for Matlab/Octave LWP Locally Weighted Polynomials toolbox for Matlab/Octave ver. 2.2 Gints Jekabsons http://www.cs.rtu.lv/jekabsons/ User's manual September, 2016 Copyright 2009-2016 Gints Jekabsons CONTENTS 1. INTRODUCTION...3

More information

Graphical Presentation of a Nonparametric Regression with Bootstrapped Confidence Intervals

Graphical Presentation of a Nonparametric Regression with Bootstrapped Confidence Intervals Graphical Presentation of a Nonparametric Regression with Bootstrapped Confidence Intervals Mark Nicolich & Gail Jorgensen Exxon Biomedical Science, Inc., East Millstone, NJ INTRODUCTION Parametric regression

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Classification objectives COMS 4771

Classification objectives COMS 4771 Classification objectives COMS 4771 1. Recap: binary classification Scoring functions Consider binary classification problems with Y = { 1, +1}. 1 / 22 Scoring functions Consider binary classification

More information

THE PEARSON CORRELATION COEFFICIENT

THE PEARSON CORRELATION COEFFICIENT CORRELATION Two variables are said to have a relation if knowing the value of one variable gives you information about the likely value of the second variable this is known as a bivariate relation There

More information

LOCAL REGRESSION MODELS: ADVANCEMENTS, APPLICATIONS, AND NEW METHODS. A Dissertation. Submitted to the Faculty. Purdue University. Ryan P.

LOCAL REGRESSION MODELS: ADVANCEMENTS, APPLICATIONS, AND NEW METHODS. A Dissertation. Submitted to the Faculty. Purdue University. Ryan P. LOCAL REGRESSION MODELS: ADVANCEMENTS, APPLICATIONS, AND NEW METHODS A Dissertation Submitted to the Faculty of Purdue University by Ryan P. Hafen In Partial Fulfillment of the Requirements for the Degree

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Local Polynomial Modelling and Its Applications

Local Polynomial Modelling and Its Applications Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,

More information

Chapter 5 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004)

Chapter 5 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004) Chapter 5 Exercises 1 Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004) Preliminaries > library(daag) Exercise 2 The final three sentences have been reworded For each of the data

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Simple Linear Regression for the MPG Data

Simple Linear Regression for the MPG Data Simple Linear Regression for the MPG Data 2000 2500 3000 3500 15 20 25 30 35 40 45 Wgt MPG What do we do with the data? y i = MPG of i th car x i = Weight of i th car i =1,...,n n = Sample Size Exploratory

More information

22s:152 Applied Linear Regression. Chapter 6: Statistical Inference for Regression

22s:152 Applied Linear Regression. Chapter 6: Statistical Inference for Regression 22s:152 Applied Linear Regression Chapter 6: Statistical Inference for Regression Simple Linear Regression Assumptions for inference Key assumptions: linear relationship (between Y and x) *we say the relationship

More information

Geographically Weighted Regression as a Statistical Model

Geographically Weighted Regression as a Statistical Model Geographically Weighted Regression as a Statistical Model Chris Brunsdon Stewart Fotheringham Martin Charlton October 6, 2000 Spatial Analysis Research Group Department of Geography University of Newcastle-upon-Tyne

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Chad M. Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/100 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions Pattern Recognition and Machine Learning Chapter 2: Probability Distributions Cécile Amblard Alex Kläser Jakob Verbeek October 11, 27 Probability Distributions: General Density Estimation: given a finite

More information

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Regression #2. Econ 671. Purdue University. Justin L. Tobias (Purdue) Regression #2 1 / 24

Regression #2. Econ 671. Purdue University. Justin L. Tobias (Purdue) Regression #2 1 / 24 Regression #2 Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #2 1 / 24 Estimation In this lecture, we address estimation of the linear regression model. There are many objective functions

More information

Chapter 5 Exercises 1

Chapter 5 Exercises 1 Chapter 5 Exercises 1 Data Analysis & Graphics Using R, 2 nd edn Solutions to Exercises (December 13, 2006) Preliminaries > library(daag) Exercise 2 For each of the data sets elastic1 and elastic2, determine

More information

Statistical View of Least Squares

Statistical View of Least Squares Basic Ideas Some Examples Least Squares May 22, 2007 Basic Ideas Simple Linear Regression Basic Ideas Some Examples Least Squares Suppose we have two variables x and y Basic Ideas Simple Linear Regression

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Sociology 6Z03 Review I

Sociology 6Z03 Review I Sociology 6Z03 Review I John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review I Fall 2016 1 / 19 Outline: Review I Introduction Displaying Distributions Describing

More information

2. (Today) Kernel methods for regression and classification ( 6.1, 6.2, 6.6)

2. (Today) Kernel methods for regression and classification ( 6.1, 6.2, 6.6) 1. Recap linear regression, model selection, coefficient shrinkage ( 3.1, 3.2, 3.3, 3.4.1,2,3) logistic regression, linear discriminant analysis (lda) ( 4.4, 4.3) regression with spline basis (ns, bs),

More information

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods Alternatives to Basis Expansions Basis expansions require either choice of a discrete set of basis or choice of smoothing penalty and smoothing parameter Both of which impose prior beliefs on data. Alternatives

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Summarizing Data: Paired Quantitative Data

Summarizing Data: Paired Quantitative Data Summarizing Data: Paired Quantitative Data regression line (or least-squares line) a straight line model for the relationship between explanatory (x) and response (y) variables, often used to produce a

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships

Objectives. 2.3 Least-squares regression. Regression lines. Prediction and Extrapolation. Correlation and r 2. Transforming relationships Objectives 2.3 Least-squares regression Regression lines Prediction and Extrapolation Correlation and r 2 Transforming relationships Adapted from authors slides 2012 W.H. Freeman and Company Straight Line

More information

Section Least Squares Regression

Section Least Squares Regression Section 2.3 - Least Squares Regression Statistics 104 Autumn 2004 Copyright c 2004 by Mark E. Irwin Regression Correlation gives us a strength of a linear relationship is, but it doesn t tell us what it

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

The Correlation Principle. Estimation with (Nonparametric) Correlation Coefficients

The Correlation Principle. Estimation with (Nonparametric) Correlation Coefficients 1 The Correlation Principle Estimation with (Nonparametric) Correlation Coefficients 1 2 The Correlation Principle for the estimation of a parameter theta: A Statistical inference of procedure should be

More information

Regression Shrinkage and Selection via the Lasso

Regression Shrinkage and Selection via the Lasso Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,

More information

Spatial Process Estimates as Smoothers: A Review

Spatial Process Estimates as Smoothers: A Review Spatial Process Estimates as Smoothers: A Review Soutir Bandyopadhyay 1 Basic Model The observational model considered here has the form Y i = f(x i ) + ɛ i, for 1 i n. (1.1) where Y i is the observed

More information

Linear models. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. October 5, 2016

Linear models. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. October 5, 2016 Linear models Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark October 5, 2016 1 / 16 Outline for today linear models least squares estimation orthogonal projections estimation

More information

Generalized Additive Models

Generalized Additive Models Generalized Additive Models The Model The GLM is: g( µ) = ß 0 + ß 1 x 1 + ß 2 x 2 +... + ß k x k The generalization to the GAM is: g(µ) = ß 0 + f 1 (x 1 ) + f 2 (x 2 ) +... + f k (x k ) where the functions

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

STAT Regression Methods

STAT Regression Methods STAT 501 - Regression Methods Unit 9 Examples Example 1: Quake Data Let y t = the annual number of worldwide earthquakes with magnitude greater than 7 on the Richter scale for n = 99 years. Figure 1 gives

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Location Prediction of Moving Target

Location Prediction of Moving Target Location of Moving Target Department of Radiation Oncology Stanford University AAPM 2009 Outline of Topics 1 Outline of Topics 1 2 Outline of Topics 1 2 3 Estimation of Stochastic Process Stochastic Regression:

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Kernel Density Estimation, Factor Analysis Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 2: 2 late days to hand it in tonight. Assignment 3: Due Feburary

More information

Economics 620, Lecture 19: Introduction to Nonparametric and Semiparametric Estimation

Economics 620, Lecture 19: Introduction to Nonparametric and Semiparametric Estimation Economics 620, Lecture 19: Introduction to Nonparametric and Semiparametric Estimation Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 19: Nonparametric Analysis

More information

Biostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich

Biostatistics. Correlation and linear regression. Burkhardt Seifert & Alois Tschopp. Biostatistics Unit University of Zurich Biostatistics Correlation and linear regression Burkhardt Seifert & Alois Tschopp Biostatistics Unit University of Zurich Master of Science in Medical Biology 1 Correlation and linear regression Analysis

More information

Uncertainty Quantification for Inverse Problems. November 7, 2011

Uncertainty Quantification for Inverse Problems. November 7, 2011 Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance

More information

Day 4A Nonparametrics

Day 4A Nonparametrics Day 4A Nonparametrics A. Colin Cameron Univ. of Calif. - Davis... for Center of Labor Economics Norwegian School of Economics Advanced Microeconometrics Aug 28 - Sep 2, 2017. Colin Cameron Univ. of Calif.

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

THE MINIMIZATION PROCESS IN THE CORRELATION ESTIMATION SYSTEM (CES) COMPARED TO LEAST SQUARES IN LINEAR REGRESSION

THE MINIMIZATION PROCESS IN THE CORRELATION ESTIMATION SYSTEM (CES) COMPARED TO LEAST SQUARES IN LINEAR REGRESSION THE MINIMIZATION PROCESS IN THE CORRELATION ESTIMATION SYSTEM (CES) COMPARED TO LEAST SQUARES IN LINEAR REGRESSION RUDY A. GIDEON Abstract. This presentation contains a new system of estimation, starting

More information

Nonparametric Econometrics in R

Nonparametric Econometrics in R Nonparametric Econometrics in R Philip Shaw Fordham University November 17, 2011 Philip Shaw (Fordham University) Nonparametric Econometrics in R November 17, 2011 1 / 16 Introduction The NP Package R

More information

Stat 602 Exam 1 Spring 2017 (corrected version)

Stat 602 Exam 1 Spring 2017 (corrected version) Stat 602 Exam Spring 207 (corrected version) I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed This is a very long Exam. You surely won't be able to

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Chapter 5 Friday, May 21st

Chapter 5 Friday, May 21st Chapter 5 Friday, May 21 st Overview In this Chapter we will see three different methods we can use to describe a relationship between two quantitative variables. These methods are: Scatterplot Correlation

More information

GIS Analysis: Spatial Statistics for Public Health: Lauren M. Scott, PhD; Mark V. Janikas, PhD

GIS Analysis: Spatial Statistics for Public Health: Lauren M. Scott, PhD; Mark V. Janikas, PhD Some Slides to Go Along with the Demo Hot spot analysis of average age of death Section B DEMO: Mortality Data Analysis 2 Some Slides to Go Along with the Demo Do Economic Factors Alone Explain Early Death?

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Remedial Measures for Multiple Linear Regression Models

Remedial Measures for Multiple Linear Regression Models Remedial Measures for Multiple Linear Regression Models Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Remedial Measures for Multiple Linear Regression Models 1 / 25 Outline

More information

11 Regression. Introduction. The Correlation Coefficient. The Least-Squares Regression Line

11 Regression. Introduction. The Correlation Coefficient. The Least-Squares Regression Line 11 Regression The Correlation Coefficient The Least-Squares Regression Line The Correlation Coefficient Introduction A bivariate data set consists of n, (x 1, y 1 ),, (x n, y n ). A scatterplot is a of

More information

Graphical Methods of Determining Predictor Importance and Effect

Graphical Methods of Determining Predictor Importance and Effect Graphical Methods of Determining Predictor Importance and Effect a dissertation submitted to the faculty of the graduate school of the university of minnesota by Aaron Kjell Rendahl in partial fulfillment

More information