Solving Regression. Jordan Boyd-Graber. University of Colorado Boulder LECTURE 12. Slides adapted from Matt Nedrich and Trevor Hastie


1 Solving Regression. Jordan Boyd-Graber, University of Colorado Boulder. LECTURE 12. Slides adapted from Matt Nedrich and Trevor Hastie.

2 Roadmap: We talked about what regression is; now, how do we solve these problems? Gradient descent for OLS. Least angle regression (LAR) for the LASSO.

3 Plan: Gradient Descent for OLS.

4-5 Closed Form Estimator: Possible for ridge regression:

$\hat{\beta} = \left( X^T X + \lambda I \right)^{-1} X^T y$ (1)

But inverting a matrix is hard! It doesn't always scale. What if your data don't live in memory? Answer: stochastic gradient descent.
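As a quick illustration of Eq. (1), here is a minimal NumPy sketch on made-up data (the data and λ are assumptions, not the lecture's):

```python
import numpy as np

# Minimal sketch of the closed-form ridge estimator in Eq. (1),
# on synthetic data (the data and lambda here are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([1.5, 0.0, -2.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=200)

lam = 0.1
# Solve (X^T X + lam*I) beta = X^T y rather than forming the inverse:
# numerically safer and cheaper, though it still does not scale to data
# that cannot fit in memory -- hence stochastic gradient descent.
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(beta_hat)
```

Using `np.linalg.solve` instead of an explicit inverse is the standard numerical practice, but it does not remove the scaling problem the slide points out.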

6 Objective: Observations should be close to $\beta x$:

$\text{Error}(\beta) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \beta x_i \right)^2$ (2)

This is equivalent to assuming the observations come from a Gaussian centered at $\beta x_i$.

7-8 OLS Gradient for 2D: For convenience, write predictions as $mx + b$. The gradient of Eq. (2) is then

$\frac{\partial \text{Error}}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \left( y_i - (m x_i + b) \right)$, $\quad \frac{\partial \text{Error}}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \left( y_i - (m x_i + b) \right)$

Possible tweaks: stochastic gradient descent, adding regularization. (A code sketch follows below.)
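As referenced above, a minimal sketch of this gradient descent (the toy data and learning rate are made up, since the slide's value was lost in extraction; this is not the lecture's code):

```python
import numpy as np

# Minimal batch gradient descent for predictions m*x + b.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=100)   # noisy line

m, b = 0.0, 0.0
lr = 0.01                                             # learning rate (assumed)
for _ in range(1000):
    pred = m * x + b
    # Gradients of Error(m, b) = (1/N) sum_i (y_i - (m x_i + b))^2
    grad_m = (-2.0 / len(x)) * np.sum(x * (y - pred))
    grad_b = (-2.0 / len(x)) * np.sum(y - pred)
    m -= lr * grad_m
    b -= lr * grad_b

print(f"m = {m:.2f}, b = {b:.2f}")   # should recover roughly 2 and 1
```

Swapping the full sums for a single random example per update gives the stochastic variant the slide mentions.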

9-10 Toy Data [figures: scatter plots of the toy dataset]

11-16 Running Gradient Descent (learning rate elided in the source) [figures: the fitted line y = mx + b improving over successive gradient steps]

17 Plan: Gradient Descent for OLS. LAR for LASSO.

18 Can we use Gradient Descent for Lasso? No: the objective isn't differentiable (the L1 penalty has a kink at zero), and a naive approach becomes combinatorial optimization. The solution, LAR, is similar in spirit to the SMO algorithm for SVMs.

19 LAR Algorithm:
1. Start with $r = y$, $\beta_1 = \dots = \beta_p = 0$. Assume the $x_j$ all have mean zero and unit variance.
2. Until all predictors have been used and $\langle r, x_j \rangle = 0$ for all $j$:
2.1 Find the predictor $x_j$ most correlated with the residual $r$.
2.2 Increase $\beta_j$ in the direction of $\text{sign}\langle r, x_j \rangle$ until some other $x_k$ has as much correlation with $r$ as $x_j$, or the sign of $\beta_j$ changes. Call this distance $u$.
2.3 Update the prediction $\mu$ and the residual $r$.
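To make step 2 concrete, here is a tiny forward-stagewise sketch in NumPy, the small-step relative of LAR; it assumes standardized predictors and a centered response, as the algorithm does, and is illustrative rather than the lecture's code:

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=2000):
    """Epsilon forward stagewise: a small-step approximation to LAR.

    Assumes columns of X are standardized and y is centered.
    Returns the coefficient path, shape (n_steps + 1, p).
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()            # residual starts at y (mu = 0)
    path = [beta.copy()]
    for _ in range(n_steps):
        corr = X.T @ r                    # correlation of each x_j with r
        j = int(np.argmax(np.abs(corr)))  # most correlated predictor
        step = eps * np.sign(corr[j])
        beta[j] += step                   # nudge beta_j toward sign<r, x_j>
        r -= step * X[:, j]               # update prediction and residual
        path.append(beta.copy())
    return np.array(path)
```

True LAR computes each step length u analytically and moves equiangularly among the active predictors; this loop approximates that behavior as eps shrinks toward zero.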

20 Intuition [figure: predictors x_1, x_2, OLS target y_1*, initial prediction mu_0, step u_1] Initially, the prediction mu_0 is 0, the mean of y (remember, everything is standardized). x_1 is most correlated with y, so we move in that direction (toward the OLS solution y_1*). We move a distance u_1 until x_2 has as much correlation with the residual.

21 Intuition [figure: the estimate mu_1 after the first step] Our new estimate is mu_1, a function of just x_1. Now we need to start using x_2, so we incorporate it into our estimate.

22 Intuition [figure: the two-variable OLS target y_2*] We are now moving toward the OLS solution that uses both variables, y_2*, along a direction that combines x_1 and x_2.

23 Intuition [figure: the second step u_2] We move our estimate in that direction until some other variable has higher correlation with the residual. We keep moving closer and closer to (but never quite reaching) the OLS solution for the current set of variables.

24 Intuition [figure: the full path: initial mu_0, steps u_1 and u_2, estimates mu_1 and mu_2, targets y_1* and y_2*]

25 MPG Dataset: Predict mpg from features of a car (a code sketch on such data follows below):
1. Number of cylinders
2. Displacement
3. Horsepower
4. Weight
5. Acceleration
6. Year
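As referenced above, one way to trace the LASSO path on such a dataset is scikit-learn's `lars_path`; seaborn's bundled "mpg" dataset and its column names are assumptions here, not necessarily the lecture's exact data:

```python
import seaborn as sns
from sklearn.linear_model import lars_path

# Assumes seaborn's bundled "mpg" dataset; the column names below are
# seaborn's and may differ slightly from the lecture's.
df = sns.load_dataset("mpg").dropna()
features = ["cylinders", "displacement", "horsepower",
            "weight", "acceleration", "model_year"]
X = df[features].to_numpy(dtype=float)
y = df["mpg"].to_numpy(dtype=float)

# LAR assumes standardized predictors and a centered response.
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

# method="lasso" returns the full Lasso path via the LAR algorithm.
alphas, active, coefs = lars_path(X, y, method="lasso")
# `active` lists the predictors in the final active set, in the order
# they entered; `coefs` has one column of coefficients per path step.
print([features[j] for j in active])
```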

26-27 Example of LARS [figures: coefficient values beta and each predictor's correlation with the residual] The weight of the car has the highest (negative) correlation with the residual (which starts as mpg itself), so we add it to the active set.

28-29 Example of LARS [figures: updated coefficients beta and correlations] After making predictions with only the weight, the year is the most (positively) correlated, so it gets added to the active set.

30-31 Example of LARS [figures: updated coefficients beta and correlations] At this point, the correlations are getting fairly small. Horsepower wins, but only contributes a tiny amount.

32-33 Example of LARS [figures: updated coefficients beta and correlations] Same story with the number of cylinders...

34-35 Example of LARS [figures: updated coefficients beta and correlations] ... and acceleration.

36-37 Example of LARS [figures: updated coefficients beta and correlations] Now the year is again the most correlated. But take a look at displacement; it is negatively correlated (about -2.5).

38-39 Example of LARS [figures: updated coefficients beta and correlations] After accounting for the other variables, displacement is positively correlated.

40-42 Example of LARS [figures: the final coefficients beta and correlations] Now we have our final model.

43 Coefficient Trajectories [figure: coefficient values (beta) for acc, cyl, disp, hp, kg (weight), and yr plotted against LARS iteration (iter)]
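A sketch of how such a trajectory plot can be reproduced, continuing the `lars_path` example above (`coefs` and `features` come from that sketch; the x-axis choice is a common convention, not necessarily the lecture's):

```python
import numpy as np
import matplotlib.pyplot as plt

# Continues the lars_path sketch above: `coefs` is (p, n_steps) and
# `features` names its rows.
l1_norm = np.sum(np.abs(coefs), axis=0)     # |beta|_1 at each path step
for j, name in enumerate(features):
    plt.plot(l1_norm, coefs[j], label=name)
plt.xlabel("L1 norm of beta along the path")
plt.ylabel("coefficient value")
plt.legend()
plt.show()
```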

44 Benefits of LARS: Gives an interpretation of boosting for continuous problems. About as computationally difficult as computing OLS for each group of variables. No combinatorial optimization. Finds all Lasso solutions (the entire regularization path).

45 Recap: The objective function for regression. Algorithms for OLS and regularized regression. Like classification, a workhorse method for continuous data.
