Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010


Transcription:

Linear Regression. Aarti Singh. Machine Learning 10-701/15-781, Sept 27, 2010.

Discrete to Continuous Labels
Classification: X = Document, Y = Topic (Sports / Science / News); X = Cell Image, Y = Diagnosis (Anemic cell / Healthy cell).
Regression: Stock Market Prediction, X = Feb 01 data, Y = ?

Regression Tasks
Weather Prediction: X = time (e.g. 7 pm), Y = temperature.
Estimating Contamination: X = new location, Y = sensor reading.

Supervised Learning Goal
Classification: minimize the probability of error.
Regression: minimize the mean squared error.

Regression
Optimal predictor (conditional mean): f*(x) = E[Y | X = x].
Intuition: signal plus (zero-mean) noise model, Y = f*(X) + ε with E[ε] = 0.

Regression
Optimal predictor: f* = arg min_f E[(f(X) − Y)²].
Proof strategy (dropping subscripts for notational convenience): for any f,
E[(f(X) − Y)²] = E[(f(X) − E[Y|X])²] + E[(E[Y|X] − Y)²],
since the cross term has zero expectation. The second term does not depend on f, so the minimizer is f*(x) = E[Y | X = x].

Regression
Optimal predictor (conditional mean): f*(x) = E[Y | X = x].
Intuition: signal plus (zero-mean) noise model. But the conditional mean depends on the unknown distribution P(X, Y).

Regression Algorithms
Linear Regression; Lasso, Ridge regression (regularized linear regression); Nonlinear Regression: Kernel Regression, Regression Trees, Splines, Wavelet estimators, ...

Empirical Risk Minimization (ERM)
Optimal predictor: f* = arg min_f E[(f(X) − Y)²].
Empirical Risk Minimizer: f̂ = arg min over f in a class of predictors F of the empirical mean (1/n) Σᵢ (f(Xᵢ) − Yᵢ)².
By the Law of Large Numbers, the empirical mean approaches the true risk (more later).

ERM: you saw it before! Learning distributions: maximum likelihood = minimizing the negative log likelihood, which is an empirical risk. What is the class F? A class of parametric distributions, e.g. Bernoulli(θ) or Gaussian(μ, σ²).

Linear Regression: Least Squares Estimator
Class of linear functions.
Univariate case: f(x) = β₁ + β₂x, where β₁ is the intercept and β₂ the slope.
Multivariate case: f(X) = Xᵀβ, where X = (1, X₁, ..., Xₚ)ᵀ (a leading 1 absorbs the intercept) and β is the p×1 coefficient vector.

Least Squares Estimator
β̂ = arg min over β of Σᵢ (Yᵢ − Xᵢᵀβ)² = arg min over β of ‖Y − Xβ‖², where X is the n×p design matrix with rows Xᵢᵀ. Setting the gradient to zero: ∇J(β) = −2Xᵀ(Y − Xβ) = 0.

Normal Equations
(XᵀX)β = XᵀY, where XᵀX is p×p and β and XᵀY are p×1.
If XᵀX is invertible, β̂ = (XᵀX)⁻¹XᵀY.
When is XᵀX invertible? Recall: full-rank matrices are invertible. What is the rank of XᵀX? It equals rank(X) ≤ min(n, p), so X must have p linearly independent columns (in particular, n ≥ p).
What if XᵀX is not invertible? Regularization (later).
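A minimal NumPy sketch of solving the normal equations, on synthetic data (the data, dimensions, and noise level here are illustrative assumptions, not from the slides). Solving the linear system directly is preferred over forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([2.0, -1.0, 0.5])
Y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: (X^T X) beta = X^T Y; solve the system rather than invert
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# np.linalg.lstsq solves the same least squares problem (via SVD)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)
```

Both routes give the same estimator when X has full column rank; `lstsq` is the numerically safer choice in practice.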

Geometric Interpretation
Difference in prediction on the training set: Xᵀ(Y − Xβ̂) = 0, i.e. the residual is orthogonal to every column of X. The fitted vector Ŷ = Xβ̂ is the orthogonal projection of Y onto the linear subspace spanned by the columns of X.
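The orthogonality property can be checked numerically; this small sketch (random data, purely illustrative) verifies that the least squares residual is orthogonal to the column space of X:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Y = rng.normal(size=50)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
residual = Y - X @ beta_hat

# X^T (Y - X beta_hat) should vanish (up to floating point error)
print(X.T @ residual)
```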

Revisiting Gradient Descent
Even when XᵀX is invertible, solving the normal equations might be computationally expensive if the matrix is huge. Gradient descent converges to the global minimum since J(β) is convex.
Initialize: β⁰.
Update: β^(t+1) = β^t − α∇J(β^t) = β^t + 2αXᵀ(Y − Xβ^t); the update is 0 if Xᵀ(Y − Xβ^t) = 0.
Stop: when some criterion is met, e.g. a fixed number of iterations, or ‖∇J(β)‖ < ε.
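The update rule above can be sketched directly; this is a minimal implementation on synthetic data (the step-size choice 1/(2·σmax(X)²), i.e. one over the Lipschitz constant of the gradient, is an assumption for stability, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 5
X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

beta = np.zeros(p)                        # initialize beta^0
alpha = 0.5 / np.linalg.norm(X, 2) ** 2   # step size 1/L, with L = 2*sigma_max(X)^2
for t in range(5000):
    grad = -2 * X.T @ (Y - X @ beta)      # gradient of J(beta) = ||Y - X beta||^2
    if np.linalg.norm(grad) < 1e-8:       # stopping criterion
        break
    beta = beta - alpha * grad

# Should agree with the normal-equations solution
closed_form = np.linalg.solve(X.T @ X, X.T @ Y)
```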

Effect of step-size α
Large α: fast convergence, but larger residual error and possible oscillations.
Small α: slow convergence, but small residual error.

Least Squares and MLE
Intuition: signal plus (zero-mean) noise model, Y = Xβ + ε with Gaussian noise ε ~ N(0, σ²I). Maximizing the log likelihood of Y given X and β is then equivalent to minimizing the sum of squared errors: the Least Squares Estimate is the same as the Maximum Likelihood Estimate under a Gaussian model!
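The equivalence follows from writing out the Gaussian log likelihood (a standard derivation, reconstructed here since the slide's equations were lost):

```latex
% Model: Y_i = X_i^\top \beta + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2)
\log p(Y \mid X, \beta)
  = \sum_{i=1}^n \log \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\Big( -\frac{(Y_i - X_i^\top \beta)^2}{2\sigma^2} \Big)
  = -\frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - X_i^\top \beta)^2 + \text{const}
```

Since the constant does not involve β, maximizing the log likelihood over β is exactly minimizing Σᵢ (Yᵢ − Xᵢᵀβ)².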

Regularized Least Squares and MAP
What if XᵀX is not invertible? Add a log prior to the log likelihood (MAP estimation).
I) Gaussian prior: β ~ N(0, τ²I) gives Ridge Regression:
β̂ = arg min over β of ‖Y − Xβ‖² + λ‖β‖₂², with closed form β̂ = (XᵀX + λI)⁻¹XᵀY (HW).
The prior belief that β is Gaussian with zero mean biases the solution toward small β.
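A sketch of the ridge closed form on a deliberately underdetermined problem (n &lt; p, so plain least squares is ill-posed; the data here are illustrative assumptions). Note that XᵀX + λI is always invertible for λ > 0:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 20                  # n < p: X^T X is rank-deficient
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

lam = 1.0
# Ridge closed form: (X^T X + lam*I) beta = X^T Y has a unique solution
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
print(beta_ridge.shape)
```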

Regularized Least Squares and MAP
What if XᵀX is not invertible? Again add a log prior to the log likelihood.
II) Laplace prior on β gives the Lasso:
β̂ = arg min over β of ‖Y − Xβ‖² + λ‖β‖₁.
The prior belief that β is Laplace with zero mean biases the solution toward small β.

Ridge Regression vs Lasso
Ridge Regression: ‖Y − Xβ‖² + λ‖β‖₂². Lasso: ‖Y − Xβ‖² + λ‖β‖₁ (HOT!).
Ideally one would use an l0 penalty (counting nonzero coordinates), but the optimization becomes non-convex.
[Figure: in the (β₁, β₂) plane, level sets of J(β) intersecting the constraint sets of constant l2 norm, constant l1 norm, and constant l0 norm.]
Lasso (l1 penalty) results in sparse solutions, i.e. vectors with more zero coordinates. Good for high-dimensional problems: you don't have to store all coordinates!
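The lasso has no closed form, but a standard solver (not covered on the slides; shown here as one illustrative option) is proximal gradient descent, ISTA, whose soft-thresholding step produces exact zeros. The data, λ, and iteration count below are assumptions for the sketch:

```python
import numpy as np

def soft_threshold(z, t):
    # prox of t*||.||_1: shrinks each coordinate toward zero, zeroing small ones exactly
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(4)
n, p = 100, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]        # only 3 nonzero coordinates
Y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 40.0
step = 0.5 / np.linalg.norm(X, 2) ** 2  # 1/L for the smooth part ||Y - X beta||^2
beta = np.zeros(p)
for _ in range(2000):
    grad = -2 * X.T @ (Y - X @ beta)
    beta = soft_threshold(beta - step * grad, step * lam)

print(np.count_nonzero(np.abs(beta) > 1e-6))
```

Most coordinates come out exactly zero, while the three true signal coordinates survive (slightly shrunk toward zero, the bias the l1 penalty introduces).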

Beyond Linear Regression
Polynomial regression; regression with nonlinear features/basis functions; kernel regression (local/weighted regression with bandwidth h); regression trees (spatially adaptive regression).

Polynomial Regression
Univariate (1-d) case: f(x) = β₀ + β₁x + β₂x² + ... + β_m x^m = Σⱼ βⱼφⱼ(x), where the nonlinear features are φⱼ(x) = x^j and βⱼ is the weight of each feature. Since f is linear in β, least squares still applies.
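Because the model is linear in the weights, polynomial fitting reduces to ordinary least squares on the feature matrix; a short sketch on synthetic quadratic data (the data and degree are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-1, 1, 50)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.05 * rng.normal(size=x.size)

degree = 2
# Feature map phi_j(x) = x^j: columns 1, x, x^2 (Vandermonde matrix)
Phi = np.vander(x, degree + 1, increasing=True)
beta = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(beta)
```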

Polynomial Regression http://mste.illinois.edu/users/exner/java.f/leastsquares/ 25

Nonlinear Regression
f(x) = Σⱼ βⱼφⱼ(x), with basis coefficients βⱼ and nonlinear features/basis functions φⱼ.
Fourier basis: a good representation for oscillatory functions. Wavelet basis: a good representation for functions localized at multiple scales.
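Any fixed basis keeps the fit linear in the coefficients; this sketch fits an oscillatory signal with Fourier features (the signal, maximum frequency K, and grid are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * 3 * x) + 0.5 * np.cos(2 * np.pi * 5 * x) + 0.05 * rng.normal(size=x.size)

# Fourier features up to frequency K; least squares in the coefficients beta
K = 8
Phi = np.column_stack(
    [np.ones_like(x)]
    + [np.sin(2 * np.pi * k * x) for k in range(1, K + 1)]
    + [np.cos(2 * np.pi * k * x) for k in range(1, K + 1)]
)
beta = np.linalg.lstsq(Phi, y, rcond=None)[0]
```

The recovered coefficients concentrate on the frequencies actually present in the signal (sin at k = 3, cos at k = 5).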

Local Regression
f(x) = Σⱼ βⱼφⱼ(x), with basis coefficients and nonlinear features/basis functions. Globally supported basis functions (polynomial, Fourier) will not yield a good representation of locally varying functions.

What You Should Know
Linear regression: least squares estimator, normal equations, gradient descent; geometric and probabilistic interpretation (connection to MLE). Regularized linear regression (connection to MAP): Ridge Regression, Lasso. Polynomial regression; basis (Fourier, wavelet) estimators.
Next time: kernel regression (localized); regression trees.