9/26/17. Ridge regression. What our model needs to do. Ridge Regression: L2 penalty. Ridge coefficients. Ridge coefficients

Size: px

Start display at page:

Download "9/26/17. Ridge regression. What our model needs to do. Ridge Regression: L2 penalty. Ridge coefficients. Ridge coefficients"

Ronald Barrett
5 years ago
Views:

1 What our model needs to do regression Usually, we are not just trying to explain observed data We want to uncover meaningful trends And predict future observations Our questions then are Is β" a good estimate of β (consistent, minimizes error) Will Xβ" fit future observations well. (generalizes well) If the βs are unconstrained They can get very large As they get larger, they are more susceptible to high variance Regularize the coefficients: add constraints to keep them small. 6 min ) y ' ( + Xβ - s. t. ) β 5 t +/0 5/0 Necessary for Y to be centered, X s to be standardized. Centered: mean 0 Standardized: mean 0, variance 1 Regression: L2 penalty λ New loss function: (instead of MSE or RSS) Penalized residual sum of squares. 6 PRSS β <+=>? = ) y + Xβ λ ) β 5 +/0 5/0. = ) y + Xβ λ β 5 - +/0 λ is known as the shrinkage parameter λ controls the size the β coeffienients can take Controls the amount of regularization As λ 0 we obtain β" EFG As λ we obtain β" <+=>? J/ K = 0 (intercept-only model) This is a convex optimization problem: There s a unique solution Solution is a function of λ coefficients coefficients OLS solution Intercept-only 1

2 Why does Regression help? How do we choose λ? Bias-variance tradeoff We accept bias to turn down the variance. We need a systematic and principled way of choosing λ We want to choose λ that minimizes the PRSS Usually it s not the OLS solution We want to minimize the size of βs while minimizing the MSE. How do we choose λ? (geometric proof) Choosing lambda in practice The blue ball is the beta contribution, the red the OLS MSE. Just like in the two dimensional case, we want the cross over point Regression regression: keep the size of the βs small regression: keep the βs zero. Similar to, except different penalty. (and thus different interpretation) 2

3 Regularization as Optimization We could show is biased. Unless λ=0 Analytical solution is less clear than either or OLS, but it is again a function of λ. Similar problem to before, we have to choose λ. Performance of Performance of OLS solution Intercept-only How do we choose λ? (geometric proof) Performance of The teal square is the beta contribution, the red the MSE. Just like in the two dimensional case, we want the cross over point 3

4 Performance: vs Probability of seeing a given value of β Deciding which algorithm to use Machine learning overview First consideration: (supervised or unsupervised) Is there an underlying truth? Do you know exactly what values Y should be taking? Second consideration: (classification vs regression) Are you predicting a probability, class/cluster or value A whirlwind tour of some common algorithms (not instruction, just exposure come back Spring 18 for instruction) Third consideration: Are you interested in predictive accuracy or understanding influence of predictors 4

can be turned into a classification problem

5 Unsupervised vs supervised Regression/Classification/Clustering Mostly we ll work in supervised. Considering both regression and classification With parametric models Values vs labels All regression problems can be turned into a classification problem (e.g. Does the house cost more than 100k) Classification problems are either predicting labels or predicting probabilities of events Clustering is detecting communities/groups within a dataset Linear regression Logistic Regression K nearest neighbors K means clustering 5

6 Naïve Bayes Classifier Support Vector Machines Assume everything is independent given class. Support Vector Machines: Kernels Artificial Neural Networks 6

Linear Regression 9/23/17. Simple linear regression. Advertising sales: Variance changes based on # of TVs. Advertising sales: Normal error?

Linear Regression 9/23/17. Simple linear regression. Advertising sales: Variance changes based on # of TVs. Advertising sales: Normal error? Simple linear regression Linear Regression Nicole Beckage y " = β % + β ' x " + ε so y* " = β+ % + β+ ' x " Method to assess and evaluate the correlation between two (continuous) variables. The slope of