CS-E3210 Machine Learning: Basic Principles


Slide 1: CS-E3210 Machine Learning: Basic Principles, Lecture 3: Regression I. Slides by Markus Heinonen. Department of Computer Science, Aalto University, School of Science. Autumn 2017 (Period I).

Slide 2: In a nutshell
- today and Friday we consider regression problems
- data points $x^{(i)} \in \mathbb{R}^d$ and continuous targets $y^{(i)} \in \mathbb{R}$
- we want to learn a function with $h(x^{(i)}) \approx y^{(i)}$
- the prediction $h(x)$ is continuous; in classification both the target $y$ and $h(x)$ are binary
- a function $h(\cdot)$ is represented by parameters $w$
- the parameters $w$ need to fit the data $X = (x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})$

Slide 3: Can we predict apartment rent?
[Figure: rent prediction, output y: rent vs. input x: house size (sqm)]
- we observe rents $y^{(i)}$ for houses $x^{(i)}$, $i = 1, \ldots, 11$
- we learn from this data to predict the rent $h(x) \in \mathbb{R}$ given $d$ house properties $x \in \mathbb{R}^d$
- (designing a good $h(x)$ by hand is not machine learning)

Slide 4: Which features do we have access to?
[Figures: rent prediction, rent vs. house size $x_1$ (sqm), and rent vs. house age $x_2$]
- house size $x_{\text{size}}$ can predict a linear trend in rent $y$
- house age $x_{\text{age}}$ gives non-linear information about $y$: new and old houses seem expensive, with little effect from the 40s to the 90s
- informative features add accuracy (e.g. location, condition)
- non-informative features add noise (e.g. house color)

Slide 5: Alternative hypotheses $h(x)$, which to choose?
[Figures: rent prediction, a linear fit $h(x) = 8.5x$ and a complex non-linear fit, rent vs. house size (sqm)]
- linear functions are surprisingly powerful
- non-linear functions can achieve low error, but can still err badly
- a model should learn the underlying function and generalise to future data (Lectures 7 & 8)

Slide 6: Alternative hypotheses $h(x)$, which to choose?
[Figures: rent prediction, rent vs. house size $x_1$ (sqm) and house age $x_2$]
- a linear function cannot explain the bimodal behavior of $x_{\text{age}}$
- this motivates basis functions

Slide 7: Outline
1. Linear regression
2. Basis functions: polynomial basis, Gaussian basis

Slide 8: A regression problem
- inputs $x^{(i)} = (x^{(i)}_1, \ldots, x^{(i)}_d)^T \in \mathbb{R}^d$ with $d$ features/properties/dimensions/covariates
- a scalar target/response/output/label $y^{(i)} \in \mathbb{R}$
- a dataset of $N$ data points $X = \{(x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})\} = \{x^{(i)}, y^{(i)}\}_{i=1}^{N}$
- in matrix form the dataset is
$$X = \begin{pmatrix} x^{(1)}_1 & \cdots & x^{(1)}_d \\ \vdots & \ddots & \vdots \\ x^{(N)}_1 & \cdots & x^{(N)}_d \end{pmatrix} = \begin{pmatrix} x^{(1)T} \\ \vdots \\ x^{(N)T} \end{pmatrix} \in \mathbb{R}^{N \times d}, \qquad y = \begin{pmatrix} y^{(1)} \\ \vdots \\ y^{(N)} \end{pmatrix} \in \mathbb{R}^{N}$$
- learn a function $h(\cdot): \mathbb{R}^d \to \mathbb{R}$ with $y^{(i)} \approx h(x^{(i)})$
- (1) which function family $h(x)$ to choose? (2) how to measure $h(x) \approx y$?

Slide 9: Linear regression
- linear regression for multivariate inputs $x \in \mathbb{R}^d$ defines
$$h_w(x) = \sum_{j=0}^{d} w_j x_j = w^T x$$
where $w \in \mathbb{R}^d$ are linear weight parameters
- encode $x_0 = 1$; then $w_0$ encodes the intercept
- the hypothesis class is $\{h_w : w \in \mathbb{R}^d\}$
- all predictions in matrix notation:
$$\begin{pmatrix} h(x^{(1)}) \\ \vdots \\ h(x^{(N)}) \end{pmatrix} = \begin{pmatrix} w^T x^{(1)} \\ \vdots \\ w^T x^{(N)} \end{pmatrix} = X w$$
- measure the prediction error by the squared error/loss $L((x^{(i)}, y^{(i)}), h(\cdot)) = (y^{(i)} - h(x^{(i)}))^2$

Slide 10: Can we predict apartment rent?
[Figure: rent prediction, rent vs. house size (sqm)]

  i     input x^(i) (sqm)   output y^(i) (rent)
  1     31                  705
  2     33                  540
  3     31                  650
  4     49                  840
  5     53                  890
  6     69                  850
  7     101                 1200
  8     99                  1150
  9     143                 1700
  10    132                 900
  11    109                 1550

- we observe data $X = (x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})$ with $N = 11$
- we assume $y^{(i)} \approx f(x^{(i)})$ where $f(\cdot)$ is the true function

Slide 11: Can we predict apartment rent?
[Figure: rent prediction with the fitted line $h(x) = 9x + 400$, rent vs. house size (sqm)]

  i     x^(i)   y^(i)   h(x^(i)) = 9 x^(i) + 400
  1     31      705     679
  2     33      540     697
  3     31      650     679
  4     49      840     841
  5     53      890     877
  6     69      850     1021
  7     101     1200    1309
  8     99      1150    1291
  9     143     1700    1687
  10    132     900     1588
  11    109     1550    1381

- linear hypothesis class $h_w(x) = w_1 x + w_0 = w^T x$
- encode $x = (x, 1)^T$ with $w = (w_1, w_0)^T$
- compute the losses $(y^{(i)} - h(x^{(i)}))^2$
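
To make the table concrete, here is a minimal numpy sketch (the code and variable names are mine, not the slides') that applies the bias trick, computes all predictions at once as $Xw$, and evaluates the per-house squared losses for the hypothesis $h(x) = 9x + 400$:

```python
import numpy as np

# Rent data from the slide: house size (sqm) and monthly rent.
x = np.array([31, 33, 31, 49, 53, 69, 101, 99, 143, 132, 109], dtype=float)
y = np.array([705, 540, 650, 840, 890, 850, 1200, 1150, 1700, 900, 1550], dtype=float)

# Bias trick: encode each input as (x, 1) so that w = (w1, w0).
X = np.column_stack([x, np.ones_like(x)])

w = np.array([9.0, 400.0])   # the hypothesis h(x) = 9x + 400
h = X @ w                    # all predictions at once: X w
losses = (y - h) ** 2        # per-example squared losses
print(h[:3])                 # [679. 697. 679.], as in the table
print(losses.mean())         # empirical risk of this hypothesis
```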

Slide 12: Which parameters to choose?
[Figure: rent prediction, rent vs. house size (sqm)]
- choose the parameters to minimize the empirical risk (mean loss)
$$\hat{w} = \arg\min_w E(h(\cdot) \mid X), \qquad E(h(\cdot) \mid X) = \frac{1}{N} \sum_{i=1}^{N} (y^{(i)} - h(x^{(i)}))^2 = \frac{1}{N} \|y - Xw\|^2$$

Slide 13: Empirical risk
[Figures: rent prediction (intercept $b = 0$) showing the data and $h(x) = 5x$; the empirical risk as a function of $w_1$, with $w_1 = 5$ marked]
- the empirical risk quantifies how well the function fits the data
- here $h(x) = w_1 x + 0$ with $w_1 = 5$

Slide 14: Empirical risk
[Figures: rent prediction (intercept $b = 0$) showing the data, $h(x) = 5x$ and $h(x) = 11.7x$; the empirical risk curve with $w_1 = 5$ and $w_1 = 11.7$ marked]
- the empirical risk quantifies how well the function fits the data
- here $h(x) = w_1 x + 0$ with $w_1 = 11.7$

Slide 15: Empirical risk
[Figures: rent prediction (intercept $b = 0$) showing the data, $h(x) = 5x$, $h(x) = 11.7x$ and $h(x) = 15x$; the empirical risk curve with $w_1 = 5$, $11.7$ and $15$ marked]
- the empirical risk quantifies how well the function fits the data
- here $h(x) = w_1 x + 0$ with $w_1 = 15$
- the best hypothesis was $w_1 = 11.7$ when $w_0 = 0$ (only on this data $X$!)

Slide 16: Empirical risk
[Figure: the 2D empirical risk surface over $w_0, w_1$]

Slide 17: Derivatives
- let's minimize the empirical risk
- minimization of functions is based on derivatives
$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
- the negative of the derivative is the direction of steepest descent

Slide 18: Derivatives
- the derivative of the empirical error with respect to $w$ is (for a 1D problem)
$$\frac{\partial E(h(\cdot) \mid X)}{\partial w} = \frac{\partial}{\partial w} \frac{1}{N} \sum_{i=1}^{N} (y^{(i)} - w x^{(i)})^2 = \frac{2}{N} \sum_{i=1}^{N} (y^{(i)} - w x^{(i)}) \frac{\partial (y^{(i)} - w x^{(i)})}{\partial w} = -\frac{2}{N} \sum_{i=1}^{N} x^{(i)} \underbrace{(y^{(i)} - w x^{(i)})}_{i\text{th data error}}$$
- the gradient of the empirical error with respect to $w = (w_1, \ldots, w_d)^T$ is
$$\nabla_w E(h_w(\cdot) \mid X) = \begin{pmatrix} \partial E(h_{w_1, \ldots, w_d}(\cdot) \mid X) / \partial w_1 \\ \vdots \\ \partial E(h_{w_1, \ldots, w_d}(\cdot) \mid X) / \partial w_d \end{pmatrix}$$
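
As a sanity check on this derivation, a small sketch (mine, not the slides') compares the analytic gradient $-\frac{2}{N} X^T (y - Xw)$ with a finite-difference approximation of the empirical risk:

```python
import numpy as np

def risk(w, X, y):
    # Empirical risk E(h | X) = (1/N) ||y - Xw||^2
    return np.mean((y - X @ w) ** 2)

def grad(w, X, y):
    # Analytic gradient: -(2/N) sum_i x^(i) (y^(i) - w^T x^(i))
    return -2.0 / len(y) * X.T @ (y - X @ w)

rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(20, 3)), rng.normal(size=20), rng.normal(size=3)

eps = 1e-6  # central finite differences along each coordinate
numeric = np.array([(risk(w + eps * e, X, y) - risk(w - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, grad(w, X, y), atol=1e-5))  # True
```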

Slide 19: Iterative gradient descent
- choose an initial parameter $w^{(0)}$ (e.g. all 0's) and a stepsize $\alpha$
- iterative gradient descent (GD): for $k = 1, \ldots, K$, update
$$w^{(k+1)} = w^{(k)} - \alpha \nabla_w E(h(\cdot) \mid X) = w^{(k)} + \frac{2\alpha}{N} \sum_{i=1}^{N} x^{(i)} \underbrace{(y^{(i)} - w^{(k)T} x^{(i)})}_{i\text{th data point error}}$$
- output: the final $K$th regression weight vector $w^{(K)}$
- the choice of step size or learning rate $\alpha$ is crucial: if $\alpha$ is too large, the iterations may not converge; if $\alpha$ is too small, convergence is very slow; $\alpha$ is usually chosen by trial and error
- the gradient $\nabla_w E(h(\cdot) \mid X)$ points in the direction of the maximal rate of increase of $E(h(\cdot) \mid X)$ at the current value $w$; subtract the gradient from $w^{(k)}$ to maximally decrease $E(h(\cdot) \mid X)$
- computational complexity $O(K \cdot N \cdot d)$: $K$ iterations, each touching all $N$ points with $d$ features (see the sketch below)
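
A minimal numpy sketch of this loop (mine; the function name, the default $K$, and the zero initialization are choices, not the slides' specification):

```python
import numpy as np

def gradient_descent(X, y, alpha, K=1000):
    """Minimize (1/N) ||y - Xw||^2 with the update
    w <- w + (2 alpha / N) * sum_i x^(i) (y^(i) - w^T x^(i))."""
    N, d = X.shape
    w = np.zeros(d)              # initial parameter w^(0): all zeros
    for _ in range(K):
        residual = y - X @ w     # i-th entry = i-th data point error
        w += 2 * alpha / N * (X.T @ residual)
    return w
```

On the rent data the step size has to be small relative to the squared feature scale (sizes up to 143 sqm), e.g. $\alpha \approx 10^{-5}$; larger values make the iterates diverge, as slide 21 illustrates.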

Slide 20: Gradient minimization
- we use the update equation
$$w^{(k+1)} = w^{(k)} + \frac{2\alpha}{N} \sum_{i=1}^{N} x^{(i)} (y^{(i)} - w^{(k)T} x^{(i)})$$
- here the stepsize $\alpha$ is good

Slide 21: Gradient minimization
- we use the update equation
$$w^{(k+1)} = w^{(k)} + \frac{2\alpha}{N} \sum_{i=1}^{N} x^{(i)} (y^{(i)} - w^{(k)T} x^{(i)})$$
- with too large an $\alpha$, we are not converging

Slide 22: Stochastic gradient descent
- in gradient descent each data point pulls the parameters:
$$w^{(k+1)} = w^{(k)} - \alpha \nabla_w E(h(\cdot) \mid X) = w^{(k)} + \frac{2\alpha}{N} \sum_{i=1}^{N} x^{(i)} \underbrace{(y^{(i)} - w^{(k)T} x^{(i)})}_{i\text{th data point error}}$$
- in stochastic gradient descent (SGD) we compute the gradient over random minibatches $I \subset \{1, \ldots, N\}$ of size $M < N$:
$$w^{(k+1)} = w^{(k)} - \alpha \nabla_w E(h(\cdot) \mid X_I) = w^{(k)} + \frac{2\alpha}{M} \sum_{i \in I} x^{(i)} (y^{(i)} - w^{(k)T} x^{(i)})$$
- computational complexity $O(K \cdot M \cdot d)$
- SGD is one of the most powerful optimizers for large models
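
A sketch of the minibatch loop (mine; sampling without replacement and averaging over the batch of size $M$ are assumptions where the transcription is ambiguous):

```python
import numpy as np

def sgd(X, y, alpha, K=1000, M=8, seed=0):
    """SGD for (1/N) ||y - Xw||^2: each step uses the gradient
    over a random minibatch I of size M instead of all N points."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(K):
        I = rng.choice(N, size=M, replace=False)    # random minibatch
        residual = y[I] - X[I] @ w                  # minibatch errors
        w += 2 * alpha / M * (X[I].T @ residual)    # minibatch gradient step
    return w
```

Each update costs $O(Md)$ instead of $O(Nd)$, which is what makes SGD attractive when $N$ is large.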

Slide 23: Analytical solution for linear regression
- to minimize $E(h(\cdot) \mid X)$ we can directly solve for the point where its gradient is 0: $\nabla_w E(h(\cdot) \mid X) = 0$, with solution (DL book 5.1.4)
$$\hat{w} = (X^T X)^{-1} X^T y$$
- we get the global optimum, since the empirical risk (of linear regression) is convex
- $X^T X$ needs to be invertible (Regression Home Assignment)
- the matrix inverse is an $O(d^3)$ operation
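
In code the normal equations are usually solved without forming the explicit inverse; a sketch (mine) that also reproduces the $w_1 \approx 11.7$ found on slide 15 for the intercept-free rent fit:

```python
import numpy as np

def fit_linear(X, y):
    # Solve (X^T X) w = X^T y; np.linalg.solve is preferred over
    # computing (X^T X)^{-1} explicitly. Requires X^T X to be
    # invertible, i.e. X must have full column rank.
    return np.linalg.solve(X.T @ X, X.T @ y)

x = np.array([31, 33, 31, 49, 53, 69, 101, 99, 143, 132, 109], dtype=float)
y = np.array([705, 540, 650, 840, 890, 850, 1200, 1150, 1700, 900, 1550], dtype=float)

print(fit_linear(x[:, None], y))  # ~[11.76]: the best w1 when w0 = 0
```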

Slide 24: ID card of linear regression
- input/feature space $\mathcal{X} = \mathbb{R}^d$
- target space $\mathcal{Y} = \mathbb{R}$
- function family $h(x) = w^T x = \sum_{j=0}^{d} w_j x_j$ (bias trick: $x_0 = 1$, and $j$ starts from 0)
- loss function $L((x, y), h(\cdot)) = (h(x) - y)^2$
- empirical risk $E(h(\cdot) \mid X) = \frac{1}{N} \|Xw - y\|_2^2$
- empirical risk minimization leads to the parameters
$$\hat{w} = (X^T X)^{-1} X^T y \quad \text{or} \quad w^{(k+1)} = w^{(k)} + \frac{2\alpha}{N} \sum_{i=1}^{N} x^{(i)} (y^{(i)} - w^{(k)T} x^{(i)})$$
- DL book: covered in chapter 5.1

Slide 25: Case study: predict red wine quality with linear regression
- one wants to understand what makes a wine taste good
- we have measured the chemical composition of many wines ($x$) and tasting evaluations to rate the wines ($y$)
- task: predict the wine quality $h(x)$ given its composition $x$

Slide 26: Wine measurement data
- we construct a dataset $X$ of $N = 1599$ wine measurements $x$
- we manually obtain a rating $y \in [0, 10]$ for each wine from subjective tastings
- the 11 features $x^{(i)}_1, \ldots, x^{(i)}_{11}$ are fixed acid, volatile acid, citric acid, sugar, chlorides, free sulfur, total sulfur, density, pH, sulphates and alcohol; the target is the quality rating
$$X = \begin{pmatrix} x^{(1)T} \\ x^{(2)T} \\ \vdots \\ x^{(1599)T} \end{pmatrix}, \qquad y = \begin{pmatrix} 5 \\ 5 \\ 5 \\ 6 \\ 5 \\ \vdots \\ 6 \end{pmatrix}$$
*P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547-553, 2009.

Slide 27: Linear regression on wine
- linear hypothesis space $\mathcal{H} = \{h_w(x) = w^T x : w \in \mathbb{R}^{11}\}$
- empirical risk minimizer (fits the 1599 wines best):
$$\hat{w} = \arg\min_w \frac{1}{N} \sum_{i=1}^{N} (y^{(i)} - w^T x^{(i)})^2 = (X^T X)^{-1} X^T y$$
- this gives one fitted weight $\hat{w}_j$ per feature (fixed acid, volatile acid, citric acid, sugar, chlorides, free sulfur, total sulfur, density, pH, sulphates, alcohol)
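
An end-to-end sketch (mine, not the slides' code): the red-wine data of Cortez et al. is distributed via the UCI repository; the URL below is assumed to still be valid, and running this needs a network connection.

```python
import numpy as np
import pandas as pd

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "wine-quality/winequality-red.csv")
df = pd.read_csv(url, sep=";")               # 1599 wines, ';'-separated

X = df.drop(columns="quality").to_numpy()    # the 11 chemical features
y = df["quality"].to_numpy(dtype=float)      # tasting ratings in [0, 10]

w_hat = np.linalg.solve(X.T @ X, X.T @ y)    # empirical risk minimizer
mse = np.mean((X @ w_hat - y) ** 2)          # empirical risk on the 1599 wines
print(np.round(w_hat, 2), round(mse, 3))
```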

Slide 28: Predictions
- predictions $h(x^{(i)}) = \sum_j \hat{w}_j x^{(i)}_j = \hat{w}^T x^{(i)}$; in matrix form
$$X \hat{w} = \begin{pmatrix} \hat{w}^T x^{(1)} \\ \hat{w}^T x^{(2)} \\ \vdots \\ \hat{w}^T x^{(1599)} \end{pmatrix}$$
- e.g. $h(x^{(1)})$ and $h(x^{(2)})$ are evaluated as such dot products over all 11 weighted features

Slide 29: Result on wine
- we achieve the empirical risk (mean squared error)
$$E(h(\cdot) \mid X) = \frac{1}{N} \sum_{i=1}^{N} (h(x^{(i)}) - y^{(i)})^2$$
[Figure: true ratings $y$ against predictions $X\hat{w}$, e.g. the spread of predicted values for wines rated $y = 5$]

Slide 30: Outline
1. Linear regression
2. Basis functions: polynomial basis, Gaussian basis

Slide 31: Non-linearity
- so far we have analysed linear models, where each feature's contribution towards the output is summed independently
- most machine learning problems are non-linear: non-linear effects, e.g. $\log(x_{\text{alcohol}})$, and combined effects, e.g. $x_{\text{sugar}} \cdot x_{\text{alcohol}}$
- let's expand the feature space by considering $n$ basis functions
$$h(x) = \sum_{j=0}^{n} w_j \phi_j(x) = w^T \phi(x)$$
where $\phi(x): \mathbb{R}^d \to \mathbb{R}^n$ with usually $n > d$ and $\phi_0(x) = 1$
- the dataset is then $\Phi = (\phi(x^{(1)}), \ldots, \phi(x^{(N)}))^T \in \mathbb{R}^{N \times n}$
- risk: $\frac{1}{N} \|\Phi w - y\|_2^2$; solution: $\hat{w} = (\Phi^T \Phi)^{-1} \Phi^T y$ (see the sketch below)
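
This is the same normal-equation machinery as before, with $\Phi$ in place of $X$; a small generic sketch (the helper names are mine):

```python
import numpy as np

def fit_basis(phi, X_raw, y):
    """Linear regression in a feature space: stack phi(x) row by row
    into Phi and solve (Phi^T Phi) w = Phi^T y."""
    Phi = np.array([phi(x) for x in X_raw])    # N x n design matrix
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

def predict(phi, w_hat, X_raw):
    # Non-linear prediction h(x) = w^T phi(x)
    return np.array([phi(x) @ w_hat for x in X_raw])
```

Any $\phi$ works here: the polynomial and Gaussian bases of the following slides just plug in different feature maps.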

Slide 32: Outline
1. Linear regression
2. Basis functions: polynomial basis, Gaussian basis

Slide 33: Polynomial expansion
- map $\phi: (x_1, x_2) \mapsto (x_1, x_2, x_1^2 x_2^2)$
- the product term $x_1^2 x_2^2$ solves the problem (feature expansion)
- the trivial solution now has $w_3 = 1$

Slide 34: Polynomial basis functions
- let's consider non-additive effects via $M$th-order polynomial basis functions:
$$\phi^{(M)}(x) = \{x_{j_1} x_{j_2} \cdots x_{j_M} : j_1, \ldots, j_M \in \{1, \ldots, d\}\}$$
where
$$\phi^{(0)}(x) = 1, \qquad \phi^{(1)}(x) = (x_1, x_2, \ldots, x_d)^T, \qquad \phi^{(2)}(x) = (x_1^2, x_1 x_2, \ldots, x_{d-1} x_d, x_d^2)^T$$
- $d = 11$ features give 55 pairwise terms, 165 triplets, etc.
- basis expansion dramatically increases the hypothesis space
- the bases are precomputed to produce the $\Phi$ matrix
- basis functions result in a non-linear prediction
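
A sketch of such an expansion (mine; it uses products over distinct indices, which matches the slide's counts of 55 pairs and 165 triplets for $d = 11$, whereas $\phi^{(2)}$ above additionally lists the squares):

```python
import numpy as np
from itertools import combinations

def poly_features(x, M):
    """Constant, the d raw features, then all products of m distinct
    features for m = 2..M. Including squares etc. would use
    combinations with replacement instead."""
    feats = [1.0] + list(x)
    for m in range(2, M + 1):
        feats += [np.prod(x[list(idx)]) for idx in combinations(range(len(x)), m)]
    return np.array(feats)

x = np.arange(1.0, 12.0)            # d = 11 features
print(len(poly_features(x, 2)))     # 1 + 11 + 55 = 67        (55 pairwise terms)
print(len(poly_features(x, 3)))     # 1 + 11 + 55 + 165 = 232 (165 triplets)
```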

Slide 35: Polynomial basis example
- sample 100 points where $x^{(i)} \in [-1, 1]$ and $y^{(i)} = \sin(\pi x^{(i)}) + \varepsilon$
- black dots: 7 data points; red dots: more samples
- linear function $h(x) = 1.37x$
[Figure: the data, $\sin(\pi x)$, and a degree-1 polynomial fit]

Slide 36: Polynomial regressor, $M = 0$
[Figure: the data, $\sin(\pi x)$, and a degree-0 polynomial fit]
- $h(x) = \hat{w}_0$

Slide 37: Polynomial regressor, $M = 1$
[Figure: the data, $\sin(\pi x)$, and a degree-1 polynomial fit]
- $h(x) = \hat{w}_0 + \hat{w}_1 x$

Slide 38: Polynomial regressor, $M = 2$
[Figure: the data, $\sin(\pi x)$, and a degree-2 polynomial fit]
- $h(x) = \hat{w}^T \phi(x) = \hat{w}_0 + \hat{w}_1 x + \hat{w}_2 x^2$

Slide 39: Polynomial regressor, $M = 3$
[Figure: the data, $\sin(\pi x)$, and a degree-3 polynomial fit]
- $h(x) = \hat{w}^T \phi(x) = \hat{w}_0 + \hat{w}_1 x + \hat{w}_2 x^2 + \hat{w}_3 x^3$

Slide 40: Polynomial regressors, $M = 5$
[Figure: the data, $\sin(\pi x)$, and a degree-5 polynomial fit]
- $h(x) = \hat{w}^T \phi(x) = \hat{w}_0 + \hat{w}_1 x + \hat{w}_2 x^2 + \hat{w}_3 x^3 + \hat{w}_4 x^4 + \hat{w}_5 x^5$

Slide 41: Polynomial regressors, $M = 5$ with enough data
[Figure: the data, $\sin(\pi x)$, and a degree-5 polynomial fit]

Slide 42: Outline
1. Linear regression
2. Basis functions: polynomial basis, Gaussian basis

Slide 43: Kernel basis functions
- a kernel function $K(x, x') \in \mathbb{R}$ measures the similarity of two vectors $x, x' \in \mathbb{R}^d$ (the opposite concept to a distance function $D(x, x')$)
- a common kernel is the Gaussian kernel
$$K(x, x') = \exp\left(-\frac{1}{2} \frac{\|x - x'\|^2}{\sigma^2}\right)$$
- a kernel basis function encodes the feature $\phi_i(x)$ as similarity to another point $m^{(i)}$: $\phi_i(x) = K(x, m^{(i)})$
- how to choose the basis points $m^{(i)}$?

Slide 44: Feature mapping with 3 Gaussian bases
- 3 features $\phi_j(x) = e^{-\frac{(x - m^{(j)})^2}{2\sigma^2}}$ at $m^{(j)} = 50, 100, 150$
- feature mapping $\phi: x \mapsto (\phi_1(x), \phi_2(x), \phi_3(x))$
- e.g. $x = 31$ becomes $\phi(31) = (0.74, 0.02, 0.00)$
- e.g. $x = 69$ becomes $\phi(69) = (0.74, 0.46, 0.00)$
- e.g. $x = 143$ becomes $\phi(143) = (0.00, 0.22, 0.96)$
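
A sketch that reproduces these feature vectors (mine; the bandwidth is not stated on the slide, but $\sigma \approx 24.75$ matches the printed numbers after rounding):

```python
import numpy as np

def gaussian_features(x, centers, sigma):
    # phi_j(x) = exp(-(x - m^(j))^2 / (2 sigma^2))
    return np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))

m = np.array([50.0, 100.0, 150.0])
sigma = 24.75                      # assumed bandwidth, not given on the slide
for x in (31.0, 69.0, 143.0):
    print(x, np.round(gaussian_features(x, m, sigma), 2))
# 31.0  [0.74 0.02 0.  ]
# 69.0  [0.74 0.46 0.  ]
# 143.0 [0.   0.22 0.96]
```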

Slide 45: 3 Gaussian bases on 1D
- three Gaussian features $\phi_j(x) = e^{-\frac{(x - m^{(j)})^2}{2\sigma^2}}$ with $(m^{(1)}, m^{(2)}, m^{(3)}) = (50, 100, 150)$
- the hypothesis is a sum of weighted Gaussian features
$$h(x) = \sum_{j=1}^{3} w_j \phi_j(x)$$

Slide 46: ID card of linear basis regression
- input space $\mathcal{X} = \mathbb{R}^d$
- feature space $\mathcal{F} = \mathbb{R}^n$, given by a basis function $\phi(x) \in \mathbb{R}^n$; the dataset is then $\Phi = (\phi(x^{(1)}), \ldots, \phi(x^{(N)}))^T \in \mathbb{R}^{N \times n}$
- target space $\mathcal{Y} = \mathbb{R}$
- function family $h(x) = w^T \phi(x)$
- loss function $L((x, y), h(\cdot)) = (h(x) - y)^2$
- empirical risk $E(h(\cdot) \mid X) = \frac{1}{N} \|\Phi w - y\|_2^2$
- empirical risk minimization leads to the parameters $\hat{w} = (\Phi^T \Phi)^{-1} \Phi^T y$

Slide 47: Basis function summary
- basis functions $\phi: \mathbb{R}^d \to \mathbb{R}^n$ project the data into a higher-dimensional space (if $n > d$)
- linear regression with the high-dimensional data points $\phi(x)$ leads to a non-linear hypothesis $h(\phi(x))$
- the selection of informative basis functions is a difficult task
- polynomial bases take combinations (products) of existing features
- Gaussian bases generate a new feature mapping

Slide 48: Next steps
- next lecture: Regression II with kernel methods and Bayesian regression, on Friday at 10:15
- DL book: read chapters 5.1 and 5.2 on linear regression
- more information about basis functions: Hastie's book (Elements of Statistical Learning, Springer), chapters 3.2 & 5; Bishop's book (Pattern Recognition and Machine Learning, Springer), chapter 3.1
- fill out the post-lecture questionnaire in MyCourses! we read and appreciate all feedback
