Linear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan


1 Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan

2 Outline
Univariate regression; Multivariate regression; Probabilistic view of regression; Loss functions; Bias-Variance analysis; Regularization
Linear Regression CSL465/603 - Machine Learning

3 Example - Green Chilies Entertainment Company
[Figure: plot with axes "Earnings from the film (in crores of Rs)" and "Cost of making the film (in crores of Rs)"]

4 Notations
Training dataset. Number of examples: $N$. Input variable: $x_i$. Target variable: $y_i$.
Goal: learn a function that predicts $y$ for a new input $x$.
[Table columns: Cost of Film (Crores of Rs), $x$; Profit/Loss (Crores of Rs), $y$]

5 Linear Regression
Simplest form: $f(x) = w_0 + w_1 x$
[Figure: plot with axes "Earnings from the film (in crores of Rs)" and "Cost of making the film (in crores of Rs)"]

6 Least Mean Squares - Cost Function
Choose parameters $w_0$ and $w_1$ (or $w$) so that $f(x)$ is as close as possible to $y$.
[Figure: plot with axes "Earnings from the film (in crores of Rs)" and "Cost of making the film (in crores of Rs)"]

7 Least Mean Squares - Cost Function - Parameter Space (1)
Let $J(w_0, w_1) = \frac{1}{2N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$

8 Least Mean Squares - Cost Function - Parameter Space (2)
Let $J(w_0, w_1) = \frac{1}{2N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$

9 Least Mean Squares - Cost Function - Parameter Space (3)
Let $J(w_0, w_1) = \frac{1}{2N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$

10 Plot of the Error Surface

11 Contour Plot of Error Surface

12 Estimating Optimal Parameters

13 Gradient Descent - Basic Principle
Minimize $J(w) = \frac{1}{2N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$.
Start with an initial estimate for $w$.
Keep changing $w$ so that $J(w)$ is progressively reduced.
Stop when there is no further change or the minimum has been reached.

14 Gradient Descent - Intuition

15 Effect of Learning Parameter
Too small a value: slow convergence.
Too large a value: the updates oscillate widely and may not converge.
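The effect of the learning parameter can be seen on a toy problem. The sketch below is an illustration under assumptions that are not in the slides: it uses a hypothetical one-dimensional cost $J(w) = \frac{1}{2} c\, w^2$ with curvature $c = 2$, for which each update multiplies $w$ by $(1 - \alpha c)$, so the iteration converges only when $0 < \alpha < 2/c$.

```python
def run_gradient_descent(alpha, n_iters=50, w_start=1.0, curvature=2.0):
    """Gradient descent on the toy cost J(w) = 0.5 * curvature * w**2.

    The cost, starting point and curvature are illustrative assumptions.
    """
    w = w_start
    for _ in range(n_iters):
        gradient = curvature * w      # dJ/dw
        w -= alpha * gradient         # w <- w - alpha * dJ/dw
    return w


w_small = run_gradient_descent(alpha=0.001)  # tiny steps: barely moves
w_good = run_gradient_descent(alpha=0.4)     # shrinks quickly towards 0
w_large = run_gradient_descent(alpha=1.1)    # overshoots and grows in size
print(w_small, w_good, w_large)
```

With the small value the iterate is still close to its starting point after 50 steps, while the large value makes the iterate flip sign at every step with growing magnitude.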

16 Gradient Descent - Local Minima
Depending on the function $J(w)$, gradient descent can get stuck at a local minimum.

17 Gradient Descent for Regression
Convex error function: $J(w) = \frac{1}{2N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$
Geometrically, the error surface is bowl shaped, so there is only a global minimum.
Exercise: prove that the sum of squared errors is a convex function.

18 Parameter Update (1)
Minimize $J(w) = \frac{1}{2N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$

19 Parameter Update (2)
Repeat till convergence:
$w_0 = w_0 - \alpha \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)$
$w_1 = w_1 - \alpha \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)\, x_i$
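The two update rules can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's implementation: the data points, the learning parameter `alpha` and the fixed iteration count (used in place of a convergence test) are all assumptions made for the example.

```python
def fit_univariate(xs, ys, alpha=0.05, n_iters=5000):
    """Batch gradient descent for f(x) = w0 + w1 * x."""
    n = len(xs)
    w0, w1 = 0.0, 0.0                      # initial estimate
    for _ in range(n_iters):
        # residuals f(x_i) - y_i under the current parameters
        residuals = [w0 + w1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(residuals) / n
        grad1 = sum(r * x for r, x in zip(residuals, xs)) / n
        # both gradients were computed before either parameter changes,
        # so this is a simultaneous update
        w0 -= alpha * grad0
        w1 -= alpha * grad1
    return w0, w1


# Hypothetical noise-free data generated from y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w0, w1 = fit_univariate(xs, ys)
print(w0, w1)
```

On this data the estimates approach the generating values $w_0 = 1$ and $w_1 = 2$.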

20-25 Example - Iterations 0, 1, 2, 4, 7 and 9
[Figures: one slide per iteration, each with two panels, "Regression Function" and "Error Function"]

26 Gradient Descent - Batch Mode
Each update includes the contribution of all data points:
$w_0 = w_0 - \alpha \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)$
$w_1 = w_1 - \alpha \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)\, x_i$
Stochastic gradient descent will be discussed later (neural networks).

27 Multivariate Linear Regression
[Table columns: Cost of Film (Crores of Rs); Celebrity status of the protagonist; # of theatres release; Age of the protagonist; Earnings (Crores of Rs), $y$]
Dimension of the input data: $D$

28 Multivariate Linear Regression - Formulation
Simplest model: $f(x) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_D x_D$
Parameters to learn: $w_0, w_1, \ldots, w_D = w$
Cost function: $J(w) = \frac{1}{2N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$
Update equation: $w_j = w_j - \alpha \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)\, x_{ij}$

29 Gradient Descent
Parameter update equation: $w_j = w_j - \alpha \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)\, x_{ij}$
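The per-parameter update can be applied to all $w_j$ at once in vectorized form. The following NumPy sketch is an illustration under assumed settings: the synthetic data, `alpha` and the fixed number of iterations are hypothetical, and a column of ones is prepended so that $w_0$ is treated like the other weights (with $x_{i0} = 1$).

```python
import numpy as np


def gradient_descent(X, y, alpha=0.1, n_iters=2000):
    """Batch gradient descent for f(x) = w_0 + w_1 x_1 + ... + w_D x_D."""
    N = X.shape[0]
    Xb = np.hstack([np.ones((N, 1)), X])   # x_{i0} = 1 carries the intercept
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residual = Xb @ w - y              # f(x_i) - y_i for every example
        w -= alpha * (Xb.T @ residual) / N  # all w_j updated simultaneously
    return w


# Hypothetical noise-free data with known weights [w_0, w_1, w_2, w_3]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, 2.0, -3.0, 0.5])
y = true_w[0] + X @ true_w[1:]

w_hat = gradient_descent(X, y)
print(w_hat)
```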

30 Feature Scaling for Multivariate Linear Regression (1)
[Table columns: Cost of Film (Crores of Rs); Celebrity status of the protagonist; # of theatres release; Age of the protagonist; Profit/Loss (Crores of Rs), $y$]
Transform the features so that they are on the same scale.

31 Feature Scaling for Multivariate Linear Regression (2)
Normalization: $-1 \le x_j \le 1$ or $0 \le x_j \le 1$
Standardization: mean 0 and standard deviation 1
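Both transformations are one-liners in NumPy. The sketch below is illustrative only: the feature values are made up (loosely following the film example's columns), and it assumes no feature is constant, since a constant column would cause a division by zero.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale every column to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)


def standardize(X):
    """Rescale every column to mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


# Hypothetical rows: cost, celebrity status, number of theatres, age
X = np.array([[10.0, 1.0, 200.0, 25.0],
              [50.0, 3.0, 1500.0, 40.0],
              [120.0, 5.0, 3000.0, 55.0]])

X_norm = min_max_normalize(X)
X_std = standardize(X)
print(X_norm)
print(X_std)
```

The scaling statistics should be computed on the training data and then reused unchanged for any new input.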

32 Multivariate Linear Regression - Analytical Solution
[Table columns: Cost of Film (Crores of Rs); Celebrity status of the protagonist; # of theatres release; Age of the protagonist; Profit/Loss (Crores of Rs), $y$]
Design Matrix and Target Vector:
$X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1D} \\ 1 & x_{21} & \cdots & x_{2D} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N1} & \cdots & x_{ND} \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$

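33 Least Squares Method
$f(X) = Xw = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1D} \\ 1 & x_{21} & \cdots & x_{2D} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N1} & \cdots & x_{ND} \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_D \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = Y$
$J(w) = \frac{1}{2} \sum_{i=1}^{N} (f(x_i) - y_i)^2$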

34 Normal Equations
$\min_W J(W) = \min_W \frac{1}{2} (XW - Y)^T (XW - Y)$
Find the gradient with respect to $W$ and equate it to 0, which gives $X^T X W = X^T Y$.

35 Analytical Solution
Advantage: no need for the learning parameter $\alpha$, and no need for iterative updates.
Disadvantage: need to perform a matrix inversion.
Pseudo-inverse of the matrix $X$: $(X^T X)^{-1} X^T$
Sometimes we deal with non-invertible matrices (redundant features).
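The analytical solution and the redundant-feature case can be tried out with NumPy. This is a sketch on hypothetical synthetic data; solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a choice made here, not something the slides prescribe. For the non-invertible case it uses the SVD-based `np.linalg.lstsq`, which returns the same minimum-norm solution the pseudo-inverse would.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 50, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])  # design matrix
true_w = np.array([0.5, 1.0, -2.0, 3.0])                   # hypothetical weights
Y = X @ true_w + 0.01 * rng.normal(size=N)

# Normal equations: solve (X^T X) w = X^T Y without an explicit inverse
w_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Same solution through the pseudo-inverse of X
w_pinv = np.linalg.pinv(X) @ Y

# Redundant feature: duplicating a column makes X^T X singular, so the
# plain inverse does not exist; an SVD-based solver still returns a
# (minimum-norm) least-squares solution.
X_red = np.hstack([X, X[:, [1]]])
w_red, _, rank, _ = np.linalg.lstsq(X_red, Y, rcond=1e-10)

print(w_normal, rank)
```

The weights of the two duplicated columns are no longer unique, but the fitted values are the same as for the original design matrix.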

36 Probabilistic View of Linear Regression (1)
Let $y = f(x) + \varepsilon$, where $\varepsilon$ is the error term that captures unmodeled effects or random noise.
$\varepsilon \sim N(0, \sigma^2)$ - Gaussian distribution

37 Probabilistic View of Linear Regression (2)
Let $y = f(x) + \varepsilon$, where $\varepsilon$ is the error term that captures unmodeled effects or random noise.
$\varepsilon \sim N(0, \sigma^2)$ - Gaussian distribution - why?
$N(0, \sigma^2)$ has maximum entropy among all real-valued distributions with a specified variance $\sigma^2$.
3-$\sigma$ rule:

38 Probabilistic View of Linear Regression (3)
Let $y = f(x) + \varepsilon$, where $\varepsilon$ is the error term that captures unmodeled effects or random noise.
$\varepsilon \sim N(0, \sigma^2)$ - Gaussian distribution
Then $P(\varepsilon) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\varepsilon^2}{2\sigma^2}\right)$
and $P(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y - f(x))^2}{2\sigma^2}\right)$
[Figure: regression curve $y(x)$ with the conditional density $p(t \mid x_0)$ centred on $y(x_0)$; axes $x$ and $t$]

39 Probabilistic View of Linear Regression (4)
$P(y_1, \ldots, y_N \mid x_1, \ldots, x_N) = P(y_1, \ldots, y_N \mid x_1, \ldots, x_N; W) = \prod_{i=1}^{N} P(y_i \mid x_i; W)$

40 Maximizing the Likelihood
Maximize $L(W) = \prod_{i=1}^{N} P(y_i \mid x_i; W)$
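A short worked derivation shows why maximizing this likelihood leads back to least squares. It assumes the Gaussian noise model $\varepsilon \sim N(0, \sigma^2)$ used in the probabilistic view, so that each factor is a Gaussian density centred on $f(x_i)$; taking the logarithm is a standard step that does not change the maximizer.

```latex
\begin{align*}
\log L(W) &= \sum_{i=1}^{N} \log P(y_i \mid x_i; W) \\
          &= \sum_{i=1}^{N} \log \left[ \frac{1}{\sqrt{2\pi}\,\sigma}
             \exp\left( -\frac{(y_i - f(x_i))^2}{2\sigma^2} \right) \right] \\
          &= -N \log\left(\sqrt{2\pi}\,\sigma\right)
             - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (f(x_i) - y_i)^2
\end{align*}
```

The first term does not depend on $W$, so maximizing $L(W)$ is the same as minimizing $\sum_{i=1}^{N} (f(x_i) - y_i)^2$, the squared-error cost.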

41 Loss Functions
Squared loss: $(f(x) - y)^2$
Absolute loss: $|f(x) - y|$
Dead band loss: $\max(0, |f(x) - y| - \varepsilon)$, $\varepsilon \in \mathbb{R}^{+}$
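The three losses translate directly into small functions. The sketch below uses hypothetical function names and example values chosen only to show how each loss treats the same error.

```python
def squared_loss(prediction, target):
    """(f(x) - y)^2: large errors are penalized heavily."""
    return (prediction - target) ** 2


def absolute_loss(prediction, target):
    """|f(x) - y|: the penalty grows linearly with the error."""
    return abs(prediction - target)


def dead_band_loss(prediction, target, eps):
    """max(0, |f(x) - y| - eps): errors smaller than eps cost nothing."""
    return max(0.0, abs(prediction - target) - eps)


# The same error of 2.0 under each loss, and a small error inside the band
sq = squared_loss(3.0, 1.0)
ab = absolute_loss(3.0, 1.0)
db_outside = dead_band_loss(3.0, 1.0, eps=0.5)
db_inside = dead_band_loss(1.25, 1.0, eps=0.5)
print(sq, ab, db_outside, db_inside)
```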

42 Loss Functions
Problem with squared loss

43 Linear Regression with Absolute Loss Function
Objective: $\min_W \sum_{i=1}^{N} |x_i W - y_i|$
Non-differentiable, so cannot take the gradient descent approach.
Solution: frame it as a constrained optimization problem.
Introduce new variables $v \in \mathbb{R}^N$ with $v_i \ge |x_i W - y_i|$:
$\min_{W, v} \sum_{i=1}^{N} v_i$, subject to $-v_i \le x_i W - y_i \le v_i$

44 Linear Regression with Absolute Loss Function - Example
[Figure panels: LMS output; LP output]

45 Some Additional Notations
Underlying response function (Target Concept): $C$
Actual observed response: $y = C(x) + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$ and $E[y \mid x] = C(x)$
Predicted response based on the model learned from dataset $A$: $f(x; A)$
Expected response averaged over all datasets: $\bar{f}(x) = E_A[f(x; A)]$
Expected $L_2$ error on a new test instance $x^*$: $E_{err} = E_A[(f(x^*; A) - y^*)^2]$

46 Bias-Variance Analysis (1)

47 Bias-Variance Analysis (2)

48 Bias-Variance Analysis (3)
[Figure: Root Mean Square Error]

49 Bias-Variance Analysis (4)
[Figure: 9th degree polynomial fit with more sample data]

50 Bias-Variance Analysis (5)
Expected square loss: $E[L] = \iint (f(x) - y)^2 P(x, y)\, dx\, dy$

51 Bias-Variance Analysis (6)
Expected square loss: $E[L] = \iint (f(x) - y)^2 P(x, y)\, dx\, dy$

52 Bias-Variance Analysis (7)
Relevant part of loss: $\int (f(x) - C(x))^2 P(x)\, dx$

53 Bias-Variance Analysis (8)
Relevant part of loss: $E_A[(f(x; A) - C(x))^2]$
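The relevant part of the loss splits into a bias term and a variance term by adding and subtracting the expected response. The derivation below is a standard sketch; it writes $\bar{f}(x) = E_A[f(x; A)]$ for the response averaged over datasets.

```latex
\begin{align*}
E_A\left[(f(x; A) - C(x))^2\right]
  &= E_A\left[\left(f(x; A) - \bar{f}(x) + \bar{f}(x) - C(x)\right)^2\right] \\
  &= E_A\left[(f(x; A) - \bar{f}(x))^2\right]
     + \left(\bar{f}(x) - C(x)\right)^2 \\
  &\quad + 2\left(\bar{f}(x) - C(x)\right) E_A\left[f(x; A) - \bar{f}(x)\right] \\
  &= \underbrace{E_A\left[(f(x; A) - \bar{f}(x))^2\right]}_{\text{variance}}
     + \underbrace{\left(\bar{f}(x) - C(x)\right)^2}_{\text{bias}^2}
\end{align*}
```

The cross term vanishes because $E_A[f(x; A) - \bar{f}(x)] = 0$ by the definition of $\bar{f}(x)$.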

54 Bias-Variance Analysis (9)
[Figure panels: Degree = 1; Degree = 4]

55 Bias-Variance Analysis (10)
Bias term of the error: $(E_A[f(x; A)] - C(x))^2$
Measures how well our approximation architecture can fit the data.
Weak approximators will have high bias. Example: low degree polynomials.
Strong approximators will have low bias. Example: high degree polynomials.

56 Bias-Variance Analysis (11)
Variance term of the error: $E_A[(f(x; A) - E_A[f(x; A)])^2]$
No direct dependence on the target value.
For a fixed size dataset $A$:
Strong approximators tend to have more variance: small changes in the dataset can result in wide changes in the predictors.
Weak approximators tend to have less variance: small changes in the dataset result in similar predictors.
Variance disappears as $|A| \to \infty$.
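The bias and variance terms can be estimated numerically when the target concept is known. The simulation below is an illustration under assumptions that are not in the slides: the target is taken to be $C(x) = \sin(2\pi x)$, inputs are fixed and equally spaced, the noise level and dataset sizes are made up, and polynomials of degree 1 and 4 play the roles of the weak and strong approximators.

```python
import numpy as np


def target(x):
    """Assumed target concept C(x) for the simulation."""
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_datasets=200, n_points=25, sigma=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial fit over many datasets."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_points)    # fixed design
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_datasets, x_test.size))
    for d in range(n_datasets):
        # every dataset A shares the inputs and differs only in its noise
        y_train = target(x_train) + sigma * rng.normal(size=n_points)
        coeffs = np.polyfit(x_train, y_train, degree)
        preds[d] = np.polyval(coeffs, x_test)     # f(x; A) on the test grid
    mean_pred = preds.mean(axis=0)                # estimate of E_A[f(x; A)]
    bias_sq = np.mean((mean_pred - target(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


bias1, var1 = bias_variance(degree=1)
bias4, var4 = bias_variance(degree=4)
print("degree 1: bias^2 =", bias1, "variance =", var1)
print("degree 4: bias^2 =", bias4, "variance =", var4)
```

In this setup the straight line shows the larger bias and the degree 4 polynomial the larger variance, matching the qualitative statements above.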

57 Bias-Variance Analysis (12)
Measuring Bias and Variance in practice: bootstrap from the given dataset.
Start with a complex approximator, and reduce the complexity through regularization:
setting more coefficients/parameters to 0 (feature selection).
This reduces variance, but can increase bias. Hopefully the result is just sufficient to model the given data.

58 Regularization
Central idea: penalize over-complicated solutions.
Linear regression minimizes $\sum_{i=1}^{N} (x_i w - y_i)^2$
Regularized regression minimizes $\sum_{i=1}^{N} (x_i w - y_i)^2 + \lambda \|w\|$

59 Modified Solution
Solution for ordinary linear regression:
$\min_w J(w) = \min_w \frac{1}{2} (Xw - Y)^T (Xw - Y)$, giving $w = (X^T X)^{-1} X^T Y$
Now for the regularized version, which uses the $L_2$ norm (Ridge Regression):
$\min_w J(w) = \min_w \frac{1}{2} (Xw - Y)^T (Xw - Y) + \lambda \|w\|^2$, giving $w = (X^T X + \lambda I)^{-1} X^T Y$ (the constant factor from differentiating the penalty is absorbed into $\lambda$)
Exercise: derive the closed-form solution for ridge regression with the $L_2$ regularizer.
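The ridge closed form is a small change to the ordinary least squares solution. The sketch below follows the stated formula $w = (X^T X + \lambda I)^{-1} X^T Y$ on hypothetical synthetic data; it assumes there is no intercept column, since the formula as written penalizes every weight equally.

```python
import numpy as np


def ridge_fit(X, Y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^{-1} X^T Y."""
    D = X.shape[1]
    # Solve the linear system rather than forming the inverse explicitly
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ Y)


# Hypothetical data: 30 examples, 5 features, known generating weights
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
Y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.normal(size=30)

w_ols = ridge_fit(X, Y, lam=0.0)     # lam = 0 recovers ordinary regression
w_ridge = ridge_fit(X, Y, lam=10.0)  # lam > 0 shrinks the weights
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

Adding $\lambda I$ with $\lambda > 0$ also makes the matrix invertible when $X^T X$ alone is singular.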

60 How to choose $\lambda$?
Tradeoff between complexity and goodness of the fit.
Solution 1: if we have lots of data, generate multiple models and use lots of test data to discard the bad models.
Solution 2: with limited data, use k-fold cross validation (will be discussed later).
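A minimal sketch of the second option, k-fold cross validation over a grid of candidate $\lambda$ values, is given below. The data, the candidate grid, the choice $k = 5$ and the helper names are all assumptions made for illustration; the rows are already in random order here, whereas real data would normally be shuffled before splitting.

```python
import numpy as np


def ridge_fit(X, Y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^{-1} X^T Y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)


def k_fold_cv_error(X, Y, lam, k=5):
    """Mean squared validation error of ridge regression over k folds."""
    indices = np.arange(X.shape[0])
    errors = []
    for fold in np.array_split(indices, k):
        train = np.setdiff1d(indices, fold)        # all rows outside the fold
        w = ridge_fit(X[train], Y[train], lam)
        errors.append(np.mean((X[fold] @ w - Y[fold]) ** 2))
    return float(np.mean(errors))


# Hypothetical data: only the first of 10 features influences the target
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))
Y = 2.0 * X[:, 0] + 0.5 * rng.normal(size=40)

candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_errors = {lam: k_fold_cv_error(X, Y, lam) for lam in candidates}
best_lam = min(cv_errors, key=cv_errors.get)
print(cv_errors, best_lam)
```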

61 General Form of Regularizer Term
$\sum_{i=1}^{N} (x_i w - y_i)^2 + \lambda \sum_{j=1}^{D} |w_j|^q$
Quadratic/$L_2$ regularizer: $q = 2$
[Figure: contours of the regularization term for $q = 0.5$, $q = 1$, $q = 2$, $q = 4$]

62 Special scenario $q = 1$ - LASSO
Least Absolute Shrinkage and Selection Operator
Error function: $\sum_{i=1}^{N} (x_i w - y_i)^2 + \lambda \sum_{j=1}^{D} |w_j|$
For sufficiently large $\lambda$, many of the coefficients become 0, resulting in a sparse solution.
[Figure: error contours and constraint regions in the $(w_1, w_2)$ plane, two panels]

63 LASSO
Quadratic programming can be used to solve the optimization problem.
Least Angle Regression solution - refer to ESL.
MATLAB packages are available for LASSO.
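For experimenting with the sparsity effect, a compact alternative is cyclic coordinate descent with soft-thresholding. This is a different technique from the quadratic programming and Least Angle Regression approaches named above, swapped in here because it fits in a few lines. The sketch minimizes $\sum_{i} (x_i w - y_i)^2 + \lambda \sum_{j} |w_j|$ without an intercept; the data, the number of sweeps and the function names are assumptions.

```python
import numpy as np


def soft_threshold(rho, t):
    """Shrink rho towards 0 by t, returning 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)


def lasso_coordinate_descent(X, Y, lam, n_sweeps=200):
    """Minimize sum_i (x_i w - y_i)^2 + lam * sum_j |w_j| one weight at a time."""
    D = X.shape[1]
    w = np.zeros(D)
    z = (X ** 2).sum(axis=0)                  # squared norm of each column
    for _ in range(n_sweeps):
        for j in range(D):
            # residual with feature j's current contribution removed
            partial_residual = Y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual
            # exact one-dimensional minimizer for coordinate j
            w[j] = soft_threshold(rho, lam / 2.0) / z[j]
    return w


# Hypothetical data in which only two of five features matter
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 5))
Y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=40)

w_no_penalty = lasso_coordinate_descent(X, Y, lam=0.0)
lam_big = 2.0 * np.max(np.abs(X.T @ Y)) + 1.0
w_big_penalty = lasso_coordinate_descent(X, Y, lam=lam_big)
print(w_no_penalty, w_big_penalty)
```

With $\lambda = 0$ the procedure approaches the least squares solution, and once $\lambda$ exceeds $2 \max_j |x_{\cdot j}^T Y|$ every coefficient is exactly 0.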

64 Linear Regression with Non-Linear Basis Functions
Linear combination of fixed non-linear functions of the input variables:
$f(x) = w_0 + \sum_{j=1}^{D} w_j \phi_j(x)$

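65 Linear Regression with Basis Functions - Solution
$f(X) = \begin{bmatrix} 1 & \phi_1(x_1) & \cdots & \phi_D(x_1) \\ 1 & \phi_1(x_2) & \cdots & \phi_D(x_2) \\ \vdots & \vdots & & \vdots \\ 1 & \phi_1(x_N) & \cdots & \phi_D(x_N) \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_D \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = Y$
$w = (\phi(X)^T \phi(X))^{-1} \phi(X)^T Y$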
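As an illustration of this solution, the sketch below uses polynomial basis functions $\phi_j(x) = x^j$ and solves for $w$ with the same normal equations. The choice of basis, the degree and the noise-free data are assumptions for the example; `np.vander` with `increasing=True` builds the matrix with columns $1, x, x^2, \ldots$

```python
import numpy as np


def polynomial_design_matrix(x, degree):
    """Rows [1, phi_1(x_i), ..., phi_degree(x_i)] with phi_j(x) = x**j."""
    return np.vander(x, degree + 1, increasing=True)


# Hypothetical noise-free data from a cubic: y = 1 - 2x + 0.5x^3
x = np.linspace(-1.0, 1.0, 20)
y = 1.0 - 2.0 * x + 0.5 * x ** 3

Phi = polynomial_design_matrix(x, degree=3)
# w = (Phi^T Phi)^{-1} Phi^T Y, computed by solving the linear system
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(w)
```

The model is non-linear in $x$ but still linear in $w$, which is why the linear regression solution carries over unchanged.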

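66 Linear Regression with Multiple Outputs
Multiple outputs: $Y$ has one column per output, where $K$ denotes the number of outputs.
$f(X) = XW = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1D} \\ 1 & x_{21} & \cdots & x_{2D} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N1} & \cdots & x_{ND} \end{bmatrix} W = \begin{bmatrix} y_{11} & \cdots & y_{1K} \\ \vdots & & \vdots \\ y_{N1} & \cdots & y_{NK} \end{bmatrix} = Y$
$W = (X^T X)^{-1} X^T Y = \begin{bmatrix} w_{10} & \cdots & w_{K0} \\ \vdots & & \vdots \\ w_{1D} & \cdots & w_{KD} \end{bmatrix}$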

67 Summary
Linear Regression (aka curve fitting); Gradient Descent approach for finding the solution; Analytical solution; Loss Functions; Probabilistic view of Linear Regression; Bias-Variance analysis; Regularization; Ridge Regression; Regression with basis functions; Locally weighted regression (refer ML - 8.3)


More information

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)

COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof ( Slides mostly by: Class web page: Unless

More information KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:

More information

Classification Logistic Regression

Classification Logistic Regression Announcements: Classification Logistic Regression Machine Learning CSE546 Sham Kakade University of Washington HW due on Friday. Today: Review: sub-gradients,lasso Logistic Regression October 3, 26 Sham

More information

Reminders. Thought questions should be submitted on eclass. Please list the section related to the thought question

Reminders. Thought questions should be submitted on eclass. Please list the section related to the thought question Linear regression Reminders Thought questions should be submitted on eclass Please list the section related to the thought question If it is a more general, open-ended question not exactly related to a

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Linear Models for Regression

Linear Models for Regression Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh.

Regression. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. Regression Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh September 24 (All of the slides in this course have been adapted from previous versions

More information

Fundamentals of Machine Learning. Mohammad Emtiyaz Khan EPFL Aug 25, 2015

Fundamentals of Machine Learning. Mohammad Emtiyaz Khan EPFL Aug 25, 2015 Fundamentals of Machine Learning Mohammad Emtiyaz Khan EPFL Aug 25, 25 Mohammad Emtiyaz Khan 24 Contents List of concepts 2 Course Goals 3 2 Regression 4 3 Model: Linear Regression 7 4 Cost Function: MSE

More information

Linear Regression. Robot Image Credit: Viktoriya Sukhanova

Linear Regression. Robot Image Credit: Viktoriya Sukhanova Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these

More information

ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

More information

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation

Machine Learning - MT & 5. Basis Expansion, Regularization, Validation Machine Learning - MT 2016 4 & 5. Basis Expansion, Regularization, Validation Varun Kanade University of Oxford October 19 & 24, 2016 Outline Basis function expansion to capture non-linear relationships

More information

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017 Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem

More information

Discriminative Learning and Big Data

Discriminative Learning and Big Data AIMS-CDT Michaelmas 2016 Discriminative Learning and Big Data Lecture 2: Other loss functions and ANN Andrew Zisserman Visual Geometry Group University of Oxford Lecture

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f

More information


CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods

More information

Machine Learning Linear Regression. Prof. Matteo Matteucci

Machine Learning Linear Regression. Prof. Matteo Matteucci Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares

More information

Machine Learning CS 4900/5900. Lecture 03. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Machine Learning CS 4900/5900. Lecture 03. Razvan C. Bunescu School of Electrical Engineering and Computer Science Machine Learning CS 4900/5900 Razvan C. Bunescu School of Electrical Engineering and Computer Science Machine Learning is Optimization Parametric ML involves minimizing an objective function

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 Some images from this lecture are

More information

Discriminative Models

Discriminative Models No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models

More information

Machine Learning. A. Supervised Learning A.1. Linear Regression. Lars Schmidt-Thieme

Machine Learning. A. Supervised Learning A.1. Linear Regression. Lars Schmidt-Thieme Machine Learning A. Supervised Learning A.1. Linear Regression Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany

More information

Machine Learning. 7. Logistic and Linear Regression

Machine Learning. 7. Logistic and Linear Regression Sapienza University of Rome, Italy - Machine Learning (27/28) University of Rome La Sapienza Master in Artificial Intelligence and Robotics Machine Learning 7. Logistic and Linear Regression Luca Iocchi,

More information


DATA MINING AND MACHINE LEARNING DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems

More information

LASSO Review, Fused LASSO, Parallel LASSO Solvers

LASSO Review, Fused LASSO, Parallel LASSO Solvers Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable

More information

Ridge Regression 1. to which some random noise is added. So that the training labels can be represented as:

Ridge Regression 1. to which some random noise is added. So that the training labels can be represented as: CS 1: Machine Learning Spring 15 College of Computer and Information Science Northeastern University Lecture 3 February, 3 Instructor: Bilal Ahmed Scribe: Bilal Ahmed & Virgil Pavlu 1 Introduction Ridge

More information

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati

SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Regressione. Ruggero Donida Labati SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA Sistemi di Elaborazione dell Informazione Regressione Ruggero Donida Labati Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy

More information

Lecture 1: Supervised Learning

Lecture 1: Supervised Learning Lecture 1: Supervised Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech ISYE6740/CSE6740/CS7641: Computational Data Analysis/Machine from Portland, Learning Oregon: pervised learning (Supervised)

More information

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression

More information

Stochastic Gradient Descent

Stochastic Gradient Descent Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

More information

Linear Models for Regression

Linear Models for Regression Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Bayesian Linear Regression [DRAFT - In Progress]

Bayesian Linear Regression [DRAFT - In Progress] Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory

More information

Reading Group on Deep Learning Session 1

Reading Group on Deep Learning Session 1 Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular

More information


LINEAR REGRESSION, RIDGE, LASSO, SVR LINEAR REGRESSION, RIDGE, LASSO, SVR Supervised Learning Katerina Tzompanaki Linear regression one feature* Price (y) What is the estimated price of a new house of area 30 m 2? 30 Area (x) *Also called

More information


CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun September 21, 2015 Announcements TA Monisha s office hour has changed to Thursdays 10-12pm, 462WVH (the same

More information