Simple linear regression



1 Simple linear regression Thomas Lumley BIOST 578C

2 Linear model Linear regression is usually presented in terms of a model $Y = \alpha + \beta x + \epsilon$, $\epsilon \sim N(0, \sigma^2)$, because the theoretical analysis is pretty for this model (BIOST 533: Theory of Linear Models). Unfortunately, this leads people to believe that these assumptions are necessary.

3 Drawing a line The best line between two points $(x_1, y_1)$ and $(x_2, y_2)$ is obvious: it is the line joining them. The line has slope $\beta_{12} = \frac{y_2 - y_1}{x_2 - x_1}$. With $n$ points, a sensible approach would be to compute all the pairwise slopes and take some sort of summary of them.

4 Drawing a line [Figure: plot with axes x and y]

5 Drawing a line When $x_i$ and $x_j$ are close, the line joining the two points is more likely to go the wrong way, so we should give more weight to pairs with large $|x_i - x_j|$. One sensible possibility is to define weights $w_{ij} = (x_i - x_j)^2$ and then $\hat\beta = \frac{\sum_{i,j} w_{ij} \beta_{ij}}{\sum_{i,j} w_{ij}}$. Another might be a weighted median: find $\hat\beta$ so that $\sum w_{ij}$ is the same over the pairs with $\beta_{ij} > \hat\beta$ and over the pairs with $\beta_{ij} < \hat\beta$. These are summaries of the data that don't make any assumptions.
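The two summaries can be computed directly from the pairwise slopes. The following is a minimal sketch, not code from the slides: the simulated data, the seed, and the helper `weighted_median` are assumptions made for illustration, and the helper returns the lower weighted median (the smallest slope at which the cumulative weight reaches half the total).

```r
set.seed(1)                      # assumed seed, for reproducibility only
x <- 1:10
y <- x + rnorm(10)

xdiff <- outer(x, x, "-")        # x_i - x_j for every pair
ydiff <- outer(y, y, "-")        # y_i - y_j for every pair

# Use each pair once (upper triangle) and skip pairs with equal x
keep    <- upper.tri(xdiff) & xdiff != 0
slopes  <- (ydiff / xdiff)[keep] # pairwise slopes beta_ij
weights <- (xdiff^2)[keep]       # weights w_ij = (x_i - x_j)^2

# Hypothetical helper: lower weighted median of `values` with weights `w`
weighted_median <- function(values, w) {
  ord <- order(values)
  values <- values[ord]
  w <- w[ord]
  cum <- cumsum(w) / sum(w)      # cumulative share of the total weight
  values[which(cum >= 0.5)[1]]
}

beta_wmean <- weighted.mean(slopes, weights)
beta_wmed  <- weighted_median(slopes, weights)
c(weighted_mean = beta_wmean, weighted_median = beta_wmed)
```

Using only the upper triangle does not change the weighted mean, because every pair appears twice in the full matrix with the same slope and weight.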

6 Least squares Some algebra shows that the weighted-average summary of the slopes is exactly the usual least squares estimator in a linear regression model. We can check this in an example. First, an example that satisfies the usual assumptions:
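The algebra can be sketched as follows. This is one possible derivation, not necessarily the one intended in the slides; it assumes weights $w_{ij} = (x_i - x_j)^2$ and pairwise slopes $\beta_{ij} = (y_i - y_j)/(x_i - x_j)$, and writes $\bar x$ and $\bar y$ for the sample means.

```latex
\begin{align*}
\sum_{i,j} w_{ij}\,\beta_{ij}
  &= \sum_{i,j} (x_i - x_j)(y_i - y_j) \\
  &= \sum_{i,j} \bigl[(x_i - \bar x) - (x_j - \bar x)\bigr]
                \bigl[(y_i - \bar y) - (y_j - \bar y)\bigr] \\
  &= 2n \sum_{i} (x_i - \bar x)(y_i - \bar y)
     - 2 \Bigl[\sum_{i} (x_i - \bar x)\Bigr]\Bigl[\sum_{j} (y_j - \bar y)\Bigr] \\
  &= 2n \sum_{i} (x_i - \bar x)(y_i - \bar y), \\[4pt]
\sum_{i,j} w_{ij}
  &= \sum_{i,j} (x_i - x_j)^2
   = 2n \sum_{i} (x_i - \bar x)^2, \\[4pt]
\hat\beta
  &= \frac{\sum_{i,j} w_{ij}\,\beta_{ij}}{\sum_{i,j} w_{ij}}
   = \frac{\sum_{i} (x_i - \bar x)(y_i - \bar y)}{\sum_{i} (x_i - \bar x)^2}.
\end{align*}
```

The cross terms vanish because deviations from the mean sum to zero, and the final ratio is the familiar least squares slope.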

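7
    x <- 1:10
    y <- rnorm(10) + x                  # true line y = x plus N(0, 1) noise
    lsmodel <- lm(y ~ x)
    xdiff <- outer(x, x, "-")           # pairwise differences x_i - x_j
    ydiff <- outer(y, y, "-")           # pairwise differences y_i - y_j
    wij <- xdiff^2                      # weights
    betaij <- ifelse(xdiff == 0, 0, ydiff / xdiff)  # pairwise slopes, 0 where undefined
    sum(wij * betaij) / sum(wij)        # weighted average of pairwise slopes
    weighted.mean(as.vector(betaij), as.vector(wij))
    coef(lsmodel)                       # least squares intercept and slope
    plot(x, y)
    abline(lsmodel)                     # fitted line
    lines(x, x, lty = 2)                # true line, dashed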

8 Notes
outer makes a matrix whose (i, j) element is a function applied to the ith element of the first argument and the jth element of the second argument.
lm is a function for fitting linear models. The object returned by lm incorporates a lot of information. Some of this can be extracted or used with functions such as coef, abline, and other functions that we didn't use.
Division by zero in ydiff/xdiff doesn't cause an error, but we can't just multiply by zero again. A non-zero number divided by zero is Inf or -Inf, zero divided by zero is NaN, and multiplying any of these by zero gives NaN.
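The division-by-zero behaviour is easy to check at the console. A small sketch; the toy vectors `xd` and `yd` are made up for illustration and stand in for entries of xdiff and ydiff.

```r
pos <- 1 / 0         # Inf
neg <- -1 / 0        # -Inf
und <- 0 / 0         # NaN
pos * 0              # NaN: multiplying by a zero weight does not rescue the value

# Toy stand-ins for entries of xdiff and ydiff (assumed values)
xd <- c(0, 1, 2)
yd <- c(0, 3, 4)

naive <- (xd != 0) * (yd / xd)        # first entry is 0 * NaN, which is NaN
safe  <- ifelse(xd == 0, 0, yd / xd)  # first entry replaced by 0 before weighting
naive
safe
```

This is why the slope matrix is built with ifelse rather than by zeroing out entries afterwards.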

9 Notes [Figure: plot of y against x]

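10 and now an example that doesn't satisfy the usual assumptions
    x <- 1:10
    y <- x^2 + rnorm(10, sd = 1:10)     # curved mean, noise sd grows with x
    lsmodel <- lm(y ~ x)
    xdiff <- outer(x, x, "-")           # pairwise differences x_i - x_j
    ydiff <- outer(y, y, "-")           # pairwise differences y_i - y_j
    wij <- xdiff^2                      # weights
    betaij <- ifelse(xdiff == 0, 0, ydiff / xdiff)  # pairwise slopes, 0 where undefined
    sum(wij * betaij) / sum(wij)        # weighted average of pairwise slopes
    weighted.mean(as.vector(betaij), as.vector(wij))
    coef(lsmodel)                       # least squares intercept and slope
    plot(x, y)
    abline(lsmodel)                     # fitted line
    lines(x, x^2, lty = 2)              # true curve, dashed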

11 [Figure: plot of y against x]

12 The true slope in this second example is the value we would get from a very large sample, or equivalently the value we get if we remove the rnorm error in y:
    ytruediff <- outer(x^2, x^2, "-")   # pairwise differences of the error-free y
    betatrueij <- ifelse(xdiff == 0, 0, ytruediff / xdiff)
    betatrue <- sum(wij * betatrueij) / sum(wij)
    alphatrue <- mean(x^2) - mean(x) * betatrue
    abline(alphatrue, betatrue, col = "red")  # true average-slope line
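For x = 1, ..., 10 and error-free y = x^2, the true slope and intercept can be worked out exactly. The sketch below uses the covariance form of the slope rather than the pairwise form (an assumption of this example; the two agree): because the x values are symmetric about their mean of 5.5, cov(x, x^2)/var(x) equals 2 × 5.5 = 11, and the intercept is mean(x^2) − 5.5 × 11 = 38.5 − 60.5 = −22.

```r
x <- 1:10

# Slope of the error-free relationship y = x^2, via cov/var
beta_true  <- cov(x, x^2) / var(x)
alpha_true <- mean(x^2) - mean(x) * beta_true

c(alpha = alpha_true, beta = beta_true)   # -22 and 11
```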

13 [Figure: plot of y against x]

14 Model assumptions
If you actually want to predict Y from X, then you need an accurate model.
An average slope may not be a useful summary of the data if you expect the relationship to be very nonlinear.
The most popular standard error formula assumes that the relationship is linear and the variance is constant, but there are alternatives. One obvious alternative is the bootstrap, but there is also a reasonably simple analytic approach.
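A rough sketch of both alternatives, under stated assumptions: the data are simulated as in the nonlinear example, the bootstrap resamples (x, y) pairs, and the number of replicates `n_boot` is an arbitrary choice. The slides do not say which analytic approach is meant; the heteroskedasticity-robust "sandwich" formula shown here (the HC0 form) is one common candidate and is an assumption of this example.

```r
set.seed(42)                          # assumed seed, for reproducibility only
x <- 1:10
y <- x^2 + rnorm(10, sd = 1:10)
fit <- lm(y ~ x)

# Model-based standard error (assumes linearity and constant variance)
se_model <- summary(fit)$coefficients["x", "Std. Error"]

# Bootstrap: resample (x, y) pairs with replacement and refit
n_boot <- 1000                        # assumed number of replicates
boot_slopes <- replicate(n_boot, {
  idx <- sample(seq_along(x), replace = TRUE)
  unname(coef(lm(y[idx] ~ x[idx]))[2])
})
se_boot <- sd(boot_slopes, na.rm = TRUE)  # NA only if a resample has a single x value

# Sandwich (HC0) standard error for the slope in simple regression
xc <- x - mean(x)
se_sandwich <- sqrt(sum(xc^2 * resid(fit)^2) / sum(xc^2)^2)

c(model = se_model, bootstrap = se_boot, sandwich = se_sandwich)
```

With only ten points all three numbers are noisy, so the comparison is illustrative rather than a recommendation.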

15 Example Anscombe (1973) gave four example data sets that give almost exactly the same slope and intercept summaries ($\hat\alpha = 3$, $\hat\beta = 0.5$) and model-based standard errors. Whether the slope of the line is a useful summary will depend on the scientific questions at hand, as well as the data, but it doesn't look promising for three of the data sets.
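R ships these data as the built-in data frame `anscombe`, with columns x1 to x4 and y1 to y4. A short sketch that fits the four regressions and collects the summaries; the object names `fits`, `coefs`, and `ses` are choices made for this example.

```r
# Fit y_k ~ x_k for each of the four Anscombe data sets
fits <- lapply(1:4, function(k) {
  lm(reformulate(paste0("x", k), response = paste0("y", k)), data = anscombe)
})

# One row per data set: intercept and slope
coefs <- t(sapply(fits, function(f) unname(coef(f))))
colnames(coefs) <- c("intercept", "slope")

# Model-based standard error of each slope
ses <- sapply(fits, function(f) summary(f)$coefficients[2, "Std. Error"])

round(cbind(coefs, slope_se = ses), 2)
```

The numbers agree to about two decimal places across the four fits, which is why plotting the data is needed to tell the data sets apart.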

16 Example [Figure: Anscombe's 4 Regression data sets; four panels plotting y against x1, x2, x3 and x4]

17 Least squares The usual way to describe this slope summary is in terms of squared errors: we choose the values $(\hat\alpha, \hat\beta)$ that minimize $\sum_{i=1}^{n} (Y_i - \alpha - \beta x_i)^2$. The solution to this minimization problem is $\hat\beta = \frac{\mathrm{cov}(x, Y)}{\mathrm{var}(x)}$, which turns out to be the same as the weighted average of pairwise slopes. Least squares generalizes more easily to multiple predictors, but the interpretation is less clear.
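The three descriptions of the slope can be compared numerically. A minimal sketch on simulated data (the seed and the helper `sse` are assumptions of this example); the intercept uses $\hat\alpha = \bar Y - \hat\beta \bar x$.

```r
set.seed(7)                           # assumed seed, for reproducibility only
x <- 1:10
y <- x^2 + rnorm(10, sd = 1:10)

# 1. Covariance form
beta_cov  <- cov(x, y) / var(x)
alpha_cov <- mean(y) - beta_cov * mean(x)

# 2. Weighted average of pairwise slopes: w_ij * beta_ij = xdiff * ydiff
xdiff <- outer(x, x, "-")
ydiff <- outer(y, y, "-")
beta_pairs <- sum(xdiff * ydiff) / sum(xdiff^2)

# 3. Least squares fit
fit <- lm(y ~ x)

# Hypothetical helper: sum of squared errors for a candidate line
sse <- function(a, b) sum((y - a - b * x)^2)

c(cov_form = beta_cov, pairwise = beta_pairs, lm = unname(coef(fit)[2]))
sse(alpha_cov, beta_cov)              # smaller than at any perturbed (a, b)
sse(alpha_cov + 0.1, beta_cov)
sse(alpha_cov, beta_cov + 0.1)
```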
