Regression. Jordan Boyd-Graber. University of Colorado Boulder. LECTURE 11.


Content Questions

Administrivia: Learnability HW due Friday. SVM HW next week. I'm out of town next week.

Project: Default project or choose your own. I'll meet with your group early in the week of Oct 12 about the project proposal. First deliverable due Nov 6: data / baseline.

Plan (current section: Basics): Basics, Regularization, Sklearn

Basics: Predictions

Weights: b = 1, w_1 = 2.0, w_2 = -1.0, σ = 1.0

1. x_1 = {0.0, 0.0}; y_1 = 1.0
2. x_2 = {1.0, 1.0}; y_2 = 2.0
3. x_3 = {0.5, 2.0}; y_3 = 0.0
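A minimal sketch in plain Python, assuming the weights in the table above (with w_2 read as -1.0, which is what makes the listed answers come out), just to reproduce the three predictions:

b = 1.0
w = [2.0, -1.0]

def predict(x):
    # y-hat = b + sum_j w_j * x_j
    return b + sum(wj * xj for wj, xj in zip(w, x))

for x in ([0.0, 0.0], [1.0, 1.0], [0.5, 2.0]):
    print(x, predict(x))   # -> 1.0, 2.0, 0.0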

Basics: Probabilities

Weights: w_0 = 1, w_1 = 2.0, w_2 = -1.0, σ = 1.0

$p(y \mid x): \quad y \sim \mathcal{N}\big(w_0 + \sum_{j=1}^{p} w_j x_j,\ \sigma^2\big)$

$p(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y - \hat{y})^2}{2\sigma^2}\right\}$

1. p(y_1 = 1 | x_1 = {0.0, 0.0}) = 0.399
2. p(y_2 = 3 | x_2 = {1.0, 1.0}) = 0.242
3. p(y_3 = 1 | x_3 = {0.5, 2.0}) = 0.242
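A matching sketch for the Gaussian likelihood above (σ = 1), standard library only:

import math

def density(y, y_hat, sigma=1.0):
    # Normal pdf N(y; y_hat, sigma^2)
    return math.exp(-(y - y_hat) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

print(round(density(1.0, 1.0), 3))   # 0.399
print(round(density(3.0, 2.0), 3))   # 0.242
print(round(density(1.0, 0.0), 3))   # 0.242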

Plan (current section: Regularization): Basics, Regularization, Sklearn

Regularization: Consider these points and data

[Figure: three data points with two candidate lines, red (w = 0.75, b = 0.25) and blue (w = 1, b = 0)]

Which is the better OLS solution? Blue! It has lower RSS.

RSS of the better (blue) solution:
$\frac{1}{2}\sum_i r_i^2 = \frac{1}{2}\big[(1-1)^2 + (2.5-2)^2 + (2.5-3)^2\big] = \frac{1}{4}$

RSS of the red line:
$\frac{1}{2}\sum_i r_i^2 = \frac{1}{2}\big[(1-1)^2 + (2.5-1.75)^2 + (2.5-2.5)^2\big] = \frac{3}{8}$

For what λ does the red line (the one with the smaller weight) become the better regularized solution, under L_2 and under L_1?

Regularization: When Regularization Wins

L_2: the blue line's penalized objective exceeds the red line's when
$\text{RSS}(\mathbf{x}, \mathbf{y}, w_{\text{blue}}) + \lambda \sum_d w_{\text{blue},d}^2 > \text{RSS}(\mathbf{x}, \mathbf{y}, w_{\text{red}}) + \lambda \sum_d w_{\text{red},d}^2$
$\frac{1}{4} + \lambda \cdot 1 > \frac{3}{8} + \lambda \cdot \frac{9}{16}$
$\frac{7}{16}\lambda > \frac{1}{8}$
$\lambda > \frac{2}{7}$

L_1: likewise,
$\text{RSS}(\mathbf{x}, \mathbf{y}, w_{\text{blue}}) + \lambda \sum_d |w_{\text{blue},d}| > \text{RSS}(\mathbf{x}, \mathbf{y}, w_{\text{red}}) + \lambda \sum_d |w_{\text{red},d}|$
$\frac{1}{4} + \lambda \cdot 1 > \frac{3}{8} + \lambda \cdot \frac{3}{4}$
$\frac{1}{4}\lambda > \frac{1}{8}$
$\lambda > \frac{1}{2}$

Bigger λ: stronger preference for lower weights w.
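A small sketch checking the two thresholds, taking the slide's RSS values (1/4 for blue, 3/8 for red) and weights (1 vs. 0.75, intercept not penalized) as given:

# blue line: w = 1, b = 0;  red line: w = 0.75, b = 0.25
rss_blue, w_blue = 1 / 4, 1.0
rss_red, w_red = 3 / 8, 0.75

for p, name in [(2, "L2"), (1, "L1")]:
    # red wins once  rss_blue + lam * |w_blue|^p  >  rss_red + lam * |w_red|^p
    lam = (rss_red - rss_blue) / (abs(w_blue) ** p - abs(w_red) ** p)
    print(name, "red (smaller weight) wins when lambda >", lam)
# -> L2: 0.2857... (= 2/7),  L1: 0.5 (= 1/2)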

Plan (current section: Sklearn): Basics, Regularization, Sklearn

Sklearn: MPG Dataset

Predict mpg from features of a car:
1. Number of cylinders
2. Displacement
3. Horsepower
4. Weight
5. Acceleration
6. Year
7. Country (ignore this)
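One possible way to get these features into Python (a sketch only; the file name and column names below are assumptions, not the course's actual files):

import pandas as pd

cols = ["mpg", "cyl", "dis", "hp", "wgt", "acl", "yr", "origin"]   # hypothetical column names
df = pd.read_csv("auto-mpg.csv", names=cols)                       # hypothetical local copy of the data
x = df[["cyl", "dis", "hp", "wgt", "acl", "yr"]]                   # drop country/origin, per the slide
y = df["mpg"]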

Sklearn: Simple Regression

If w = 0, what's the intercept? 23.4 (with no features, OLS just fits the mean of y, the average mpg).

Sklearn: Simple Linear Regression

What are the coefficients for OLS?

cyl  0.329859
dis  0.007678
hp   0.000391
wgt  0.006795
acl  0.085273
yr   0.753367

Intercept: -14.5

Sklearn: Simple Linear Regression

from sklearn import linear_model
model = linear_model.LinearRegression()
fit = model.fit(x, y)
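Continuing the snippet above, the coefficients and intercept reported on the previous slide can be read off the standard sklearn attributes:

print(fit.coef_)        # one coefficient per column of x (cyl, dis, hp, wgt, acl, yr)
print(fit.intercept_)   # about -14.5 on the slide above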

Sklearn: Lasso

As you increase the regularization weight alpha, what feature dominates? What happens to the other features?

Sklearn: Weight is Everything

mpg = 46 - 0.01 * Weight
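A sketch of the alpha sweep behind this slide, on synthetic stand-in data (not the real MPG table), just to show the mechanics of Lasso as alpha grows:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                          # stand-ins for cyl, dis, hp, wgt, acl, yr
true_w = np.array([0.3, 0.0, 0.0, -3.0, 0.1, 0.8])     # one dominant feature, like weight above
y = X @ true_w + rng.normal(scale=0.5, size=200)

for alpha in (0.01, 0.1, 1.0):
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(coefs, 3))   # more coefficients hit exactly 0 as alpha grows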

Sklearn: How is Ridge different?
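Continuing the same synthetic sketch, Ridge shrinks the coefficients smoothly but, unlike Lasso, does not set them exactly to zero:

from sklearn.linear_model import Ridge

for alpha in (0.01, 0.1, 1.0):
    print(alpha, np.round(Ridge(alpha=alpha).fit(X, y).coef_, 3))   # shrunk, but none exactly 0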

Sklearn: Regression isn't special. The same ideas carry over: feature engineering, regularization, overfitting, development / test data.