Lecture 5 Multivariate Linear Regression

1 Lecture 5: Multivariate Linear Regression. Dan Sheldon, September 23, 2014

2 Topics: Multivariate linear regression; Model; Cost function; Normal equations; Gradient descent; Features

3 Book Data [Figure: scatter plot of Weight (lbs.) against Pages with a fitted line]

6 Book Data Can we predict better with multiple features?
Features: Width, Thickness, Height, # Pages, Hardcover. Target: Weight.
Training data: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$, where each $x^{(i)}$ is a feature vector.

7 Multivariate Linear Regression Input: $x \in \mathbb{R}^n$. Output: $y \in \mathbb{R}$. Model (hypothesis class): ? Cost function: ?

11 Model
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n$
$h_\theta(x) = \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{bmatrix} \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$
$h_\theta(x) = \theta^T x = x^T \theta$ (augment the feature vector with 1)
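A minimal sketch of the model slide in NumPy: prepend a 1 to the feature vector, then take the inner product with θ. The helper names `augment` and `h`, and the numbers used, are hypothetical and chosen only for illustration.

```python
import numpy as np

def augment(x):
    """Prepend a constant 1 so that theta[0] plays the role of the intercept theta_0."""
    return np.concatenate(([1.0], np.asarray(x, dtype=float)))

def h(theta, x):
    """Evaluate h_theta(x) = theta^T x on the augmented feature vector."""
    return float(np.dot(theta, augment(x)))

theta = np.array([0.5, 2.0, -1.0])  # hypothetical theta_0, theta_1, theta_2
x = [3.0, 4.0]                      # hypothetical features x_1, x_2
prediction = h(theta, x)            # 0.5 + 2*3 - 1*4 = 2.5
```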

12 Geometry of high dimensional linear (affine) functions
$n$-dimensional function $h_\theta : \mathbb{R}^n \to \mathbb{R}$
$h_\theta(x) = \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n$ (linear)
$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n$ (affine)
Three facts on board:
1. Contours = hyperplanes
2. Gradient = $\theta$ (a vector, orthogonal to contours)
3. The norm $\|\theta\|$ can be interpreted as slope

15 The Problem Find $\theta$ such that $y^{(i)} \approx h_\theta(x^{(i)})$, $i = 1, \ldots, m$
$\begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} \approx \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)} \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$
$y \approx X\theta$

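16 Inputs: Data Matrix and Label Vector
Data matrix: $X = \begin{bmatrix} 1 & x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)} \end{bmatrix}$
Label vector: $y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$
Book data columns: Width, Thickness, Height, # Pages, Hardcover, Weight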
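As a sketch of how the data matrix and label vector might be assembled in NumPy: stack the raw features row by row and prepend a column of ones. The book measurements below are made up for illustration; the values from the slide's table are not available here.

```python
import numpy as np

# Hypothetical measurements for three books (not the lecture's data).
# Columns: width, thickness, height, pages, hardcover (0 or 1).
features = np.array([
    [6.0, 1.0,  9.0, 300.0, 1.0],
    [5.0, 0.5,  8.0, 150.0, 0.0],
    [7.0, 1.5, 10.0, 500.0, 1.0],
])
y = np.array([1.2, 0.4, 2.1])  # hypothetical weights in lbs. (label vector)

m, n = features.shape                       # m examples, n features
X = np.hstack([np.ones((m, 1)), features])  # data matrix: row i is (1, x^(i))
```

Each row of `X` is one augmented feature vector, so `X @ theta` computes all m predictions at once.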

17 Illustration Find $\theta$ such that $y^{(i)} \approx h_\theta(x^{(i)})$, $i = 1, \ldots, m$
[Figure 3.1 from Elements of Statistical Learning (2nd Ed.), © Hastie, Tibshirani & Friedman 2009, Chap. 3: Linear least squares fitting with $X \in \mathbb{R}^2$. We seek the linear function of $X$ that minimizes the sum of squared residuals from $Y$. Axes: $X_1$, $X_2$, $Y$.]

20 Cost Function
$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Exercise: write this succinctly in matrix-vector notation
Answer: $J(\theta) = \frac{1}{2} (X\theta - y)^T (X\theta - y)$

21 The Problem Given training data $X$ and $y$, find $\theta$ to minimize the cost function: $J(\theta) = \frac{1}{2} (X\theta - y)^T (X\theta - y)$
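The two forms of the cost function can be checked against each other numerically. The sketch below assumes `X` already contains the leading column of ones; the function names and the tiny dataset are hypothetical.

```python
import numpy as np

def cost_sum(theta, X, y):
    """J(theta) as an explicit sum of squared errors over the m examples."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        total += (np.dot(theta, x_i) - y_i) ** 2
    return 0.5 * total

def cost_matrix(theta, X, y):
    """J(theta) = 1/2 (X theta - y)^T (X theta - y) in matrix-vector form."""
    r = X @ theta - y
    return 0.5 * float(r @ r)

# Hypothetical data: one feature plus the column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])
theta = np.array([0.0, 1.0])

# Residuals are (0, 0, 1), so both forms give 0.5.
j_sum = cost_sum(theta, X, y)
j_mat = cost_matrix(theta, X, y)
```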

22 Solution 1: Normal Equations Normal equations: $\theta = (X^T X)^{-1} X^T y$ Heuristic derivation:

23 Proper Approach Set all partial derivatives to zero: $0 = \frac{\partial}{\partial \theta_j} J(\theta)$. Solve a system of $n + 1$ linear equations for $\theta_0, \ldots, \theta_n$. Tedious, but leads to normal equations.

28 Matrix Calculus Succinct (and cool!) way to solve for normal equations:
$0 = \nabla J(\theta) = \frac{d}{d\theta} \frac{1}{2} (X\theta - y)^T (X\theta - y)$
$0 = (X\theta - y)^T X$
$0 = X^T (X\theta - y)$
$X^T X \theta = X^T y$
$\theta = (X^T X)^{-1} X^T y$
(Note: you are not responsible for the vector derivative in the first line, but you should understand the rest of the derivation.)
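A minimal sketch of solving the normal equations in NumPy. It solves the linear system $X^T X \theta = X^T y$ with `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical choice and not something the slides prescribe. It assumes $X^T X$ is invertible; the function name and data are hypothetical.

```python
import numpy as np

def normal_equations(X, y):
    """Return theta solving X^T X theta = X^T y (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: one feature plus the column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

theta = normal_equations(X, y)   # intercept 2/3, slope 1/2 for this data
gradient = X.T @ (X @ theta - y) # should be (numerically) zero at the solution
```

Checking that `gradient` is close to zero mirrors the third line of the derivation, $0 = X^T (X\theta - y)$.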

30 Solution 2: Gradient Descent
1. Initialize $\theta_0, \theta_1, \ldots, \theta_n$ arbitrarily
2. Repeat until convergence: $\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$, $j = 0, \ldots, n$.
Partial derivatives: $\frac{\partial}{\partial \theta_j} J(\theta) = \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

31 Vectorized Gradient Descent
1. Initialize $\theta$ arbitrarily
2. Repeat until convergence: $\theta \leftarrow \theta - \alpha \underbrace{X^T (X\theta - y)}_{\nabla J(\theta)}$
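The vectorized update can be sketched in a few lines of NumPy. Two simplifications are assumptions of this sketch, not part of the slides: θ is initialized to zeros, and "repeat until convergence" is replaced by a fixed number of iterations. The step size α and the data are hypothetical; α must be small enough for the iteration to converge.

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Run theta <- theta - alpha * X^T (X theta - y) for a fixed number of steps."""
    theta = np.zeros(X.shape[1])           # assumed initialization
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y)   # gradient of J at the current theta
        theta = theta - alpha * gradient
    return theta

# Hypothetical data: one feature plus the column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

theta_gd = gradient_descent(X, y, alpha=0.05, num_iters=5000)
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)  # normal-equations solution for comparison
```

On this small problem the two solutions agree to several decimal places, which is a quick sanity check that the step size is reasonable.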

32 Feature Normalization Demo: Problem 3 from HW0. Advice: normalize your features so they have similar numeric ranges!

35 Feature Normalization For each feature $j$, compute the mean $\mu_j$ and standard deviation $\sigma_j$ of that feature over the training set:
$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( x_j^{(i)} - \mu_j \right)^2}$
Then, subtract the mean and divide by the standard deviation:
$x_j^{(i)} \leftarrow (x_j^{(i)} - \mu_j) / \sigma_j$
Effect: adjust columns of the data matrix to have mean zero and standard deviation equal to one.
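A sketch of this normalization in NumPy, applied to the raw feature columns before the column of ones is added (the ones column has zero standard deviation and cannot be normalized this way). `np.std` with its default `ddof=0` divides by m, matching the formula above. The function name and data are hypothetical, and the sketch assumes no feature is constant.

```python
import numpy as np

def normalize_features(F):
    """Return (Z, mu, sigma) where each column of Z has mean 0 and std 1."""
    mu = F.mean(axis=0)      # per-feature mean over the training set
    sigma = F.std(axis=0)    # per-feature std, dividing by m (ddof=0)
    Z = (F - mu) / sigma     # assumes sigma > 0 for every feature
    return Z, mu, sigma

# Hypothetical features on very different numeric scales.
F = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
Z, mu, sigma = normalize_features(F)
```

Keeping `mu` and `sigma` around matters in practice: new inputs need to be transformed with the training-set statistics before a prediction is made.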

36 Feature Normalization Example: cost function contours before and after normalization [Figure: two contour plots over the axes $w_1$ and $w_2$]

37 Feature Design It is possible to fit nonlinear functions using linear regression:
$(x_1, x_2, x_3) \mapsto (x_1, x_2, x_3, x_1^2, \log(x_2), x_1 + x_3)$
Approaches:
Try standard transformations
Design features you think will work
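The feature map on this slide can be written directly as a function. This is only an illustration: the function name is hypothetical, and the logarithm requires $x_2 > 0$.

```python
import numpy as np

def design_features(x):
    """Map (x1, x2, x3) to (x1, x2, x3, x1^2, log(x2), x1 + x3); requires x2 > 0."""
    x1, x2, x3 = x
    return np.array([x1, x2, x3, x1 ** 2, np.log(x2), x1 + x3])

phi = design_features((2.0, 1.0, 3.0))  # -> [2, 1, 3, 4, 0, 5]
```

The model stays linear in θ; only the inputs are transformed, so the same cost function and solvers apply to the new features.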

38 Polynomial Regression $x \mapsto (1, x, x^2, x^3, \ldots)$ [Figure: plot of $y$ against $x$]
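A small sketch of polynomial regression as linear regression on the expanded features. `np.vander` with `increasing=True` builds the columns $1, x, x^2, \ldots$; the fit uses `np.linalg.lstsq`, a least-squares solver substituted here for the normal equations. The data are synthetic, generated from a known quadratic so the result can be checked.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return the matrix whose row i is (1, x_i, x_i^2, ..., x_i^degree)."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

# Synthetic data from y = 1 + 2x + 3x^2 (no noise), for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

Phi = polynomial_features(x, degree=2)
theta = np.linalg.lstsq(Phi, y, rcond=None)[0]  # recovers approximately (1, 2, 3)
```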


More information

IV. Matrix Approximation using Least-Squares

IV. Matrix Approximation using Least-Squares IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that

More information

Linear regression COMS 4771

Linear regression COMS 4771 Linear regression COMS 4771 1. Old Faithful and prediction functions Prediction problem: Old Faithful geyser (Yellowstone) Task: Predict time of next eruption. 1 / 40 Statistical model for time between

More information

18.9 SUPPORT VECTOR MACHINES

18.9 SUPPORT VECTOR MACHINES 744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the

More information

A note on the group lasso and a sparse group lasso

A note on the group lasso and a sparse group lasso A note on the group lasso and a sparse group lasso arxiv:1001.0736v1 [math.st] 5 Jan 2010 Jerome Friedman Trevor Hastie and Robert Tibshirani January 5, 2010 Abstract We consider the group lasso penalty

More information

FSAN815/ELEG815: Foundations of Statistical Learning

FSAN815/ELEG815: Foundations of Statistical Learning FSAN815/ELEG815: Foundations of Statistical Learning Gonzalo R. Arce Chapter 14: Logistic Regression Fall 2014 Course Objectives & Structure Course Objectives & Structure The course provides an introduction

More information

Lecture 6 Optimization for Deep Neural Networks

Lecture 6 Optimization for Deep Neural Networks Lecture 6 Optimization for Deep Neural Networks CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 12, 2017 Things we will look at today Stochastic Gradient Descent Things

More information

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han Math for Machine Learning Open Doors to Data Science and Artificial Intelligence Richard Han Copyright 05 Richard Han All rights reserved. CONTENTS PREFACE... - INTRODUCTION... LINEAR REGRESSION... 4 LINEAR

More information

Linear Regression. S. Sumitra

Linear Regression. S. Sumitra Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D

More information

Lecture 17 Intro to Lasso Regression

Lecture 17 Intro to Lasso Regression Lecture 17 Intro to Lasso Regression 11 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due today Goals for today introduction to lasso regression the subdifferential

More information

Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve

More information

Machine Learning and Data Mining. Linear regression. Kalev Kask

Machine Learning and Data Mining. Linear regression. Kalev Kask Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance

More information

Lecture 3: Vectors. In Song Kim. September 1, 2011

Lecture 3: Vectors. In Song Kim. September 1, 2011 Lecture 3: Vectors In Song Kim September 1, 211 1 Solving Equations Up until this point we have been looking at different types of functions, often times graphing them. Each point on a graph is a solution

More information

Computing Neural Network Gradients

Computing Neural Network Gradients Computing Neural Network Gradients Kevin Clark 1 Introduction The purpose of these notes is to demonstrate how to quickly compute neural network gradients in a completely vectorized way. It is complementary

More information

Stanford Machine Learning - Week V

Stanford Machine Learning - Week V Stanford Machine Learning - Week V Eric N Johnson August 13, 2016 1 Neural Networks: Learning What learning algorithm is used by a neural network to produce parameters for a model? Suppose we have a neural

More information

LASSO Review, Fused LASSO, Parallel LASSO Solvers

LASSO Review, Fused LASSO, Parallel LASSO Solvers Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable

More information