Geometric intuition of least squares

1 Geometric intuition of least squares. Consider the vector $x = (1, 2)$: a point in a two-dimensional space.
[Figure: the point $x$ plotted in the plane with axes $x_1$ and $x_2$.]

2 Linear combinations of $x$: multiplying $x$ by a constant $\beta$ generates a line that passes through $x$ and the origin $(0, 0)$.
[Figure: the line through $(0, 0)$ and $x = (1, 2)$ in the plane with axes $x_1$ and $x_2$.]

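3 The line is the span of $x$. What is regression? Take a point $y$ in the same space and model $y$ as a linear function of $x$:
$$y = x\beta$$
Find the number $\hat\beta$ that best explains the observations of $x$ and $y$.
[Figure: the points $y = (1, 1)$ and $x = (1, 2)$, with the line spanned by $x$.]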

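4 Choose the $x\beta$ that is closest to the observed $y$. Criterion: the usual (Euclidean) distance between $y$ and $x\beta$. Minimize the distance with respect to $\beta$:
$$\hat\beta = \arg\min_\beta \|y - x\beta\|$$
[Figure: $y = (1, 1)$, $x = (1, 2)$ and the origin $(0, 0)$ in the plane with axes $x_1$ and $x_2$.]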

5 OLS estimate $\hat\beta$: the point on the line $x\beta$ with the minimum distance to $y$. To find the minimum distance, minimize
$$(y - x\beta)'(y - x\beta)$$
This is an optimization problem. First-order condition:
$$\frac{\partial}{\partial\beta}(y - x\beta)'(y - x\beta) = -2x'(y - x\beta)$$
Set equal to zero and solve for $\beta$:
$$x'(y - x\beta) = 0$$
$$x'y - x'x\beta = 0$$
$$x'y = x'x\beta$$
$$[x'x]^{-1}x'y = \beta$$
The famed OLS (ordinary least squares) estimate of a linear regression model:
$$\hat\beta = [x'x]^{-1}x'y$$

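6 In the example, $x = (1, 2)$ and $y = (1, 1)$. What is the value of $\hat\beta$?
$$x'x = 5, \quad [x'x]^{-1} = \frac{1}{5}, \quad x'y = 3, \quad \hat\beta = \frac{3}{5}$$
The closest point to $y$ on the line $x\beta$ is
$$x\hat\beta = (1, 2)\,\frac{3}{5} = \left(\frac{3}{5}, \frac{6}{5}\right)$$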
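
The arithmetic on this slide can be reproduced in Octave. This is a minimal sketch, assuming x and y are stored as column vectors; the variable names `bhat` and `yhat` are chosen here for illustration.

```octave
% Column-vector versions of the slide's x = (1, 2) and y = (1, 1).
x = [1; 2];
y = [1; 1];

% OLS estimate for a single regressor without an intercept:
% solves (x'x) * bhat = x'y.
bhat = (x' * x) \ (x' * y);

% Closest point to y on the line spanned by x.
yhat = x * bhat;

printf("bhat   = %g\n", bhat);
printf("x*bhat = (%g, %g)\n", yhat(1), yhat(2));
```

The backslash solve is used instead of an explicit inverse; for a scalar $x'x$ the two are the same computation.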

7 Normal equation. The condition
$$x'(y - x\beta) = 0$$
is called the normal equation. What does this mean in terms of the model $y = x\beta + e$? Solve for $e$:
$$e = y - x\beta$$
In other words, the first-order condition $x'(y - x\beta) = 0$ can be written as
$$x'e = 0$$
or: the fitted errors are orthogonal to the data $x$.

8 This can also be thought about intuitively. $y - x\hat\beta = \hat e$ is the unexplained part of the regression. We have used the data $x$ to find $\hat\beta$. What is left to explain is
$$\hat e = y - x\hat\beta$$
If we have used $x$ optimally to find $\hat\beta$, then $x$ should have nothing left to explain of $\hat e$. That is, $x$ is orthogonal to $\hat e$:
$$x'(y - x\hat\beta) = x'\hat e = 0$$
The normal equation $x'(y - x\beta) = 0$ thus has the interpretation that we use the information in $x$ optimally to find $\hat\beta$.
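
As a quick check by hand, using the example values $x = (1, 2)$, $y = (1, 1)$ and $\hat\beta = 3/5$ from the earlier slide, the fitted residual is indeed orthogonal to $x$:

```latex
\begin{aligned}
\hat e   &= y - x\hat\beta
          = (1, 1) - \left(\tfrac{3}{5}, \tfrac{6}{5}\right)
          = \left(\tfrac{2}{5}, -\tfrac{1}{5}\right) \\
x'\hat e &= 1 \cdot \tfrac{2}{5} + 2 \cdot \left(-\tfrac{1}{5}\right) = 0
\end{aligned}
```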

9 Measuring the fit of a regression. Consider the picture with two $y$ observations, $y_1$ and $y_2$.
[Figure: $y_1 = (1, 1)$, $y_2 = (\tfrac{1}{2}, \tfrac{3}{2})$ and $x = (1, 2)$ in the plane with axes $x_1$ and $x_2$.]

10 Can we conclude that $y_2$ is closer than $y_1$ to $x\beta$? We need a measure of the distance that is not sensitive to scaling. Such a measure is the R-squared of a regression. In geometric terms, look at the two points $y_1$ and $y_2$ above. To compare their distance to $X\beta$ in a unit-free way, compare the angles that the lines from the origin to $y_1$ and $y_2$ form with $X\beta$. The angle $\theta$ between two vectors $x$ and $y$ is given by
$$\cos\theta = \frac{(x, y)}{\|x\|\,\|y\|}$$
In an OLS problem the angle between $y$ and the line $X\beta$ is measured as
$$\cos\theta = \frac{\|X\hat\beta\|}{\|y\|}$$
Taking the square of this, we find the (uncentered) $R^2$ of the regression:
$$R^2 = \cos^2\theta$$
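
A small Octave sketch of this comparison for the two points in the picture. It assumes $y_2 = (1/2, 3/2)$, which is one reading of the coordinates on the previous slide, and it uses the fact that with a single regressor the uncentered $R^2$ equals the squared cosine of the angle between $x$ and $y$. The helper name `uncentered_r2` is made up for this illustration.

```octave
x  = [1; 2];
y1 = [1; 1];
y2 = [0.5; 1.5];   % assumed reading of the y2 coordinates

% Squared cosine of the angle between x and y:
% cos(theta) = (x, y) / (||x|| * ||y||).
uncentered_r2 = @(x, y) (x' * y)^2 / ((x' * x) * (y' * y));

R2_1 = uncentered_r2(x, y1);
R2_2 = uncentered_r2(x, y2);

printf("uncentered R2 for y1: %g\n", R2_1);
printf("uncentered R2 for y2: %g\n", R2_2);
```

Under that assumption the second point forms the smaller angle with the line, so it has the higher uncentered $R^2$.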

11 Alternative way of interpreting $R^2$: the fraction of the variation in $y$ explained by the regression. If $R^2 = 1$, then $y$ is totally explained by the regression. If $R^2 = 0$, the regression explains nothing.
[Figure: directions in the plane with axes $x_1$ and $x_2$ for which $R^2 = 0$ and $R^2 = 1$.]

12 Probably the better-known interpretation of $R^2$:
$$R^2 = \frac{\text{Explained sum of squares}}{\text{Total sum of squares}}$$
$R^2$ is one when everything is explained, since the sum of squares is also our criterion function.

13 Geometry of multivariate regressions
$$y = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k + e$$
The dependent variable $y$ is now a linear function of $k$ independent variables $x_1, x_2, \dots, x_k$. The factors $b_i$ have the same interpretation as $b$ in the univariate regression: they measure the marginal effect of a change in one of the explanatory variables, holding everything else constant:
$$\frac{dy}{dx_i} = \frac{d(a + b_1 x_1 + b_2 x_2 + \dots + b_i x_i + \dots + b_k x_k + e)}{dx_i} = b_i$$

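14 This is written in matrix form
$$\tilde y = Xb + \tilde e$$
where, for observation $i$,
$$\tilde y_i = a + b_1 x_{i1} + b_2 x_{i2} + \dots + b_k x_{ik} + \tilde e_i$$
and
$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}$$
The regression is formulated as
$$y = Xb + e, \qquad b = \begin{pmatrix} a \\ b_1 \\ \vdots \\ b_k \end{pmatrix}, \qquad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$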

15 Geometrically: the regression line is a solution to a minimum distance problem, now in several dimensions. Bivariate regression: $y$ is a function of two variables $(x_1, x_2)$. This gives a three-dimensional picture, with the data a cloud of points $(x_1, x_2, y)$ in this space.

16 Regression is then drawing the best-fitting plane in this space, the two-regressor analogue of the best-fitting line.

17 Calculation of estimates. The minimum distance problem is
$$\hat b = \arg\min_b (y - Xb)'(y - Xb)$$
Calculation:
Step 1: The normal equation
$$X'(y - Xb) = 0$$
Step 2: The analytical solution
$$\hat b = (X'X)^{-1}X'y$$

18 Estimation using Octave
$$y = Xb$$
We explain $y$ as a linear function of $X$. Suppose we have two explanatory variables. Then $b$ is a $2 \times 1$ matrix. We simulate 100 observations of the model. In simulating, we need to add some noise $e$ to the data, to avoid a perfect fit:
$$y = Xb + e$$

19 Simulating the model
X = rand(100,2);
b = [2;1]
e = 0.25*randn(100,1);
y = X*b + e;

20 Estimating the model
> bhat = inv(X'*X)*X'*y
results in
> bhat = inverse(X'*X)*X'*y
bhat =
Alternatively
> ols(y,X)
ans =
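
The simulation and estimation steps can be put together in one short script. This is a sketch under stated assumptions: the seeds are arbitrary and only there for reproducibility, and the backslash solve `X \ y` is shown next to the explicit-inverse formula as an alternative route to the same least-squares estimate.

```octave
% Arbitrary seeds (an assumption of this sketch), for reproducibility only.
rand("state", 1);
randn("state", 1);

% Simulate 100 observations of y = X*b + e, as on the slides.
n = 100;
X = rand(n, 2);
b = [2; 1];
e = 0.25 * randn(n, 1);
y = X * b + e;

% Two routes to the same OLS estimate.
bhat_inv = inv(X' * X) * X' * y;   % analytical formula (X'X)^(-1) X'y
bhat_bs  = X \ y;                  % least-squares solve, no explicit inverse

% Fitted residuals; the normal equation says X'*ehat should be (numerically) zero.
ehat = y - X * bhat_bs;

disp([bhat_inv, bhat_bs]);
disp(X' * ehat);
```

Avoiding the explicit inverse is generally the numerically safer habit, although for a well-conditioned problem of this size both give the same answer to rounding error.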

21 Forecasting
> bhat
bhat =
> new_x = [1 1]
new_x =
   1   1
> forecast_y = new_x * bhat
forecast_y =

22 Residuals
ehat = y - X*bhat;
plot(ehat);

23 Remove the lines and plot points instead
>> plot(ehat,"*");

24 Detecting deviations from the assumed linear relationship. Residual: $\hat e = y - \hat a - \hat b x$. Plot the residuals against other variables. They should be centered at zero, with no obvious relationships.

25 Simulated example: Linear model. True model: $y = a + bx + e$, with $x = [1, 2, 3, \dots, 100]$, $a = 1$, $b = 1$, $e \sim N(0, 10^2)$.

26 Plot data

27 Simulated example: Linear model Fitted regression b =

28 Plot residuals. Against x

29 Plot residuals against y

30 Simulated example: nonlinear model. True model: $y = a + b\ln(x) + e$, with $x = [1, 2, 3, \dots, 100]$, $a = 1$, $b = 1$. Simulate series,

31 Plot observations

32 Fitted regression y = a + bx + e b =

33 What do the residuals look like? vs x

34 Residuals vs y Aha, there is a problem.

35 Solving this problem: linearizing it. Define a new regressor $x^* = \ln(x)$ and run the regression with $x^*$ instead of $x$. Fitted regression bhat =
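
The effect of linearizing can be sketched in Octave by fitting both specifications to the same simulated series. The noise standard deviation (0.1) and the seed are assumptions of this sketch, since the slides do not state them; the variable names are illustrative.

```octave
randn("state", 2);                 % arbitrary seed (assumption)

% True model from the slides: y = a + b*ln(x) + e, with a = 1, b = 1.
x = (1:100)';
a = 1;
b = 1;
e = 0.1 * randn(100, 1);           % assumed noise level, not given in the slides
y = a + b * log(x) + e;

X_lin = [ones(100, 1), x];         % misspecified regression: linear in x
X_log = [ones(100, 1), log(x)];    % linearized regression: linear in ln(x)

bhat_lin = X_lin \ y;
bhat_log = X_log \ y;

ehat_lin = y - X_lin * bhat_lin;
ehat_log = y - X_log * bhat_log;

printf("SSR, linear in x:     %g\n", ehat_lin' * ehat_lin);
printf("SSR, linear in ln(x): %g\n", ehat_log' * ehat_log);

% plot(x, ehat_lin, "*");          % uncomment to see the systematic pattern
% plot(log(x), ehat_log, "*");     % uncomment to see residuals without a pattern
```

The residual sum of squares of the linearized fit should be much smaller, and its residuals should show no systematic pattern against $\ln(x)$.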

36 Correct model

37 Residuals: against x (ln x)

38 Residuals: against y

39 Plotting the estimated relationship against x instead of ln(x).

40 Alternative nonlinear model y = a + b sin(0.1x) + e a = 1 b = 1 Fitted regression y = a + bx + e b = e e-04

41 Fitted regression

42 Residuals against x. Problem

43 Residual outliers are not always errors. One is not always justified in simply throwing out any observation considered an outlier.

44 Difference from the previous picture

45 Prediction example. $y = Xb$. Given an estimated set of parameters $b$, use it for prediction.

46 Prediction example. At a large state university, seven undergraduate students majoring in economics were randomly selected and surveyed. Two of the survey questions were: What was your grade-point average (GPA) in the preceding term? What was the average number of hours spent per week last term in the Orange and Brew? The Orange and Brew is the favorite, and only, watering hole on campus. Using the data below, estimate with ordinary least squares the equation $G = \alpha + \beta H$, where $G$ is GPA and $H$ is hours per week in the Orange and Brew. (The GPA is a numerical summary of grades, with 4 as the largest number.) What is the expected sign for $\beta$? Do the data support your expectations?

47
Student   GPA (G)   Hours per week in Orange and Brew (H)
1         3.6        3
2         2.2       15
3         3.1        8
4         3.5        9
5         2.7       12
6         2.6       12
7         3.9        4
Suppose that a freshman economics student has been spending 15 hours per week in the Orange and Brew during the first two weeks of class. Predict his GPA for this year.

48 Solution
data = [1, 3.6, 3 ;
        2, 2.2, 15 ;
        3, 3.1, 8 ;
        4, 3.5, 9 ;
        5, 2.7, 12 ;
        6, 2.6, 12 ;
        7, 3.9, 4 ]
y=data(:,2);
x=data(:,3);
X=[ones(7,1),x];
b=ols(y,X)
b =
Thus, we estimated the parameters as α = and β =

49 Now for prediction:
>> predicted=[1 15]*b
predicted =
prediction = α + β · 15 =
What if the number of hours is 4?
>> predicted=[1 4]*b
predicted =
A bit better.
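
The whole exercise fits in one self-contained script. This sketch uses the data matrix from the solution slide and the backslash operator in place of `ols`, which should give the same least-squares estimate; the names `G`, `H`, `pred15` and `pred4` are chosen here for readability.

```octave
% Columns: student number, GPA (G), hours per week in the Orange and Brew (H).
data = [1, 3.6,  3;
        2, 2.2, 15;
        3, 3.1,  8;
        4, 3.5,  9;
        5, 2.7, 12;
        6, 2.6, 12;
        7, 3.9,  4];

G = data(:, 2);
H = data(:, 3);

% Regressor matrix with a constant, so b(1) is alpha and b(2) is beta.
X = [ones(size(data, 1), 1), H];
b = X \ G;

% Predicted GPA at 15 and at 4 hours per week.
pred15 = [1, 15] * b;
pred4  = [1,  4] * b;

printf("alpha = %g, beta = %g\n", b(1), b(2));
printf("predicted GPA at 15 hours: %g\n", pred15);
printf("predicted GPA at  4 hours: %g\n", pred4);
```

The slope comes out negative, in line with the expected sign: more hours in the Orange and Brew go with a lower GPA in this sample.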

50 You want to find the risk of the stock IBM. To do so, you think the market model
$$r_{it} = \alpha_i + \beta_i r_{mt} + e_{it}$$
is a reasonable description of the risk-return relationship. To estimate the parameters $\alpha_i$ and $\beta_i$ you need a history of stock returns and index returns. Collect monthly returns for IBM and a broad-based US stock market index, for example the S&P 500. Take data for 1995:1 to 2006:12. Estimate the model. What is the $R^2$ in your estimation?

51 Plotting one against the other

52 Running the regression
>> r=ibm(:,2);
>> rm=sp500(:,2);
>> X=[ones(144,1) rm];
>> b=X\r
b =
We estimate the parameters as $\hat a$ = and $\hat b$ =

53 Next, plot the observations and compare them to the fitted regression line.
>> plot(rm,r,"*",rm,X*b)

54 Check for any obvious problems by calculating the residuals and plotting them against $r_m$:
>> e=r-X*b;
>> plot(rm,e,"*");

55 Calculate the $R^2$
>> SSR=e'*e
SSR =
>> TSS=(r-mean(r))'*(r-mean(r))
TSS =
>> R2=1-SSR/TSS
R2 =
The $R^2$ of the regression is
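
Since the IBM and S&P 500 return series are not included here, the following sketch runs the same steps on simulated returns. All numbers in the simulation (means, volatilities, the slope of 1.2, the seed) are hypothetical and only serve to make the script runnable; only the sample length of 144 months is taken from the exercise.

```octave
randn("state", 3);                            % arbitrary seed (assumption)

T  = 144;                                     % 1995:1 to 2006:12 is 144 months
rm = 0.01 + 0.04 * randn(T, 1);               % hypothetical market returns
r  = 0.002 + 1.2 * rm + 0.06 * randn(T, 1);   % hypothetical stock returns

% Market model: regress r on a constant and rm.
X = [ones(T, 1), rm];
b = X \ r;

% Residuals, residual sum of squares, total sum of squares, centered R^2.
e   = r - X * b;
SSR = e' * e;
TSS = (r - mean(r))' * (r - mean(r));
R2  = 1 - SSR / TSS;

printf("ahat = %g, bhat = %g, R2 = %g\n", b(1), b(2), R2);
```

With one regressor and a constant, this centered $R^2$ equals the squared sample correlation between `r` and `rm`, which gives a simple cross-check on the calculation.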

56 It is all optimization. So far, all our analysis has solved an optimization problem (a minimum distance problem) to find the parameter estimates in a regression. Remaining issue: probability statements. How can we evaluate an estimated coefficient, that is, how confident are we that the true coefficient is close to what we have estimated? To make such statements, we need additional assumptions.

OLS-parameter estimation. Bernt Arne Ødegaard, 29 May 2018. Contents: 1 The geometry of Least Squares. 1.1 Introduction. 1.2 OLS estimation, simplest