Linear Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
|
|
- Steven Washington
- 5 years ago
- Views:
Transcription
1 Linear Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric. Robot Image Credit: Viktoriya Sukhanova 123RF.com
2 Regression Given: n Data X = x (1),...,x (n)o where x (i) 2 R d n Corresponding labels y = y (1),...,y (n)o where y (i) 2 R September Arctic Sea Ice Extent (1,000,000 sq km) Linear Regression Quadratic Regression Year Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013) 2
3 Linear Regression Hypothesis: y = x x d x d = dx j x j Assume x 0 = 1 j=0 Fit model by minimizing sum of squared errors x Figures are courtesy of Greg Shakhnarovich 3
4 Least Squares Linear Regression Cost Function J( ) = 1 2n Fit by solving min nx i=1 J( ) h x (i) y (i) 2 4
5 Intuition Behind Cost Function J( ) = 1 2n For insight on J(), let s assume nx h x (i) y (i) 2 i=1 so x 2 R =[ 0, 1 ] Based on example by Andrew Ng 5
6 Intuition Behind Cost Function J( ) = 1 2n For insight on J(), let s assume nx h x (i) y (i) 2 i=1 so x 2 R =[ 0, 1 ] (for fixed, this is a function of x) (function of the parameter ) 3 3 y x Based on example by Andrew Ng 6
7 Intuition Behind Cost Function J( ) = 1 2n For insight on J(), let s assume nx h x (i) y (i) 2 i=1 so x 2 R =[ 0, 1 ] (for fixed, this is a function of x) (function of the parameter ) 3 3 y Based on example by Andrew Ng x J([0, 0.5]) = (0.5 1) 2 +(1 2) 2 +(1.5 3)
8 Intuition Behind Cost Function J( ) = 1 2n For insight on J(), let s assume nx h x (i) y (i) 2 i=1 so x 2 R =[ 0, 1 ] y (for fixed, this is a function of x) (function of the parameter ) 3 3 J([0, 0]) J() is concave x Based on example by Andrew Ng 8
9 Intuition Behind Cost Function Slide by Andrew Ng 9
10 Intuition Behind Cost Function (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 10
11 Intuition Behind Cost Function (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 11
12 Intuition Behind Cost Function (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 12
13 Intuition Behind Cost Function (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 13
14 Basic Search Procedure Choose initial value for Until we reach a minimum: Choose a new value for to reduce J( ) J(q 0,q 1 ) q 1 q 0 Figure by Andrew Ng 14
15 Basic Search Procedure Choose initial value for Until we reach a minimum: Choose a new value for to reduce J( ) J(q 0,q 1 ) q 1 q 0 Figure by Andrew Ng 15
16 Basic Search Procedure Choose initial value for Until we reach a minimum: Choose a new value for to reduce J( ) J(q 0,q 1 ) Since the least squares objective function is convex (concave), q 1 q we don t need 0 to worry about local minima Figure by Andrew Ng 16
17 Gradient Descent Initialize Repeat until convergence j J( j simultaneous update for j = 0... d learning rate (small) e.g., α = J( )
18 Gradient Descent Initialize Repeat until convergence j J( j simultaneous update for j = 0... d For j J( ) j 2n nx h x (i) y (i) 2 i=1 X X! 2 18
19 Gradient Descent Initialize Repeat until convergence j J( j simultaneous update for j = 0... d For j J( ) j j 1 2n X nx h x (i) y (i) 2 i=1 nx i=1 X dx k=0 k x (i) k! y (i)! 2 X! 19
20 Gradient Descent Initialize Repeat until convergence j J( j simultaneous update for j = 0... d For j J( ) j j 1 2n = 1 n nx i=1 X nx h x (i) y (i) 2 i=1 nx i=1 dx k=0 X dx k=0 k x (i) k k x (i) k y (i)!! y j dx k=0 k x (i) k y (i)! 20
21 Gradient Descent Initialize Repeat until convergence j J( j simultaneous update for j = 0... d For j J( ) j j 1 2n = 1 n = 1 n nx i=1 nx i=1 nx h x (i) y (i) 2 i=1 nx i=1 dx k=0 dx k=0 dx k=0 k x (i) k k x (i) k k x (i) k y (i) y (i)!! nx y (i) x (i) j dx k=0 k x (i) k y (i)! 21
22 Gradient Descent for Linear Regression Initialize Repeat until convergence j j 1 nx h x (i) n i=1 y (i) x (i) j simultaneous update for j = 0... d To achieve simultaneous update At the start of each GD iteration, compute h x (i) Use this stored value in the update step loop Assume convergence when L 2 norm: s X kvk 2 = vi qv 2 = v v2 v i k new old k 2 < 22
23 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) h(x) = x Slide by Andrew Ng 23
24 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 24
25 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 25
26 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 26
27 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 27
28 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 28
29 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 29
30 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 30
31 Gradient Descent (for fixed, this is a function of x) (function of the parameters ) Slide by Andrew Ng 31
32 Choosing α α too small slow convergence α too large Increasing value for J( ) May overshoot the minimum May fail to converge May even diverge To see if gradient descent is working, print out The value should decrease at each iteration If it doesn t, adjust α J( ) each iteration 32
33 Extending Linear Regression to More Complex Models The inputs X for linear regression can be: Original quantitative inputs Transformation of quantitative inputs e.g. log, exp, square root, square, etc. Polynomial transformation example: y = b 0 + b 1 x + b 2 x 2 + b 3 x 3 Basis expansions Dummy coding of categorical inputs Interactions between variables example: x 3 = x 1 x 2 This allows use of linear regression techniques to fit non- linear datasets.
34 Linear Basis Function Models Generally, h (x) = dx j=0 j j (x) basis function Typically, 0(x) =1so that 0 acts as a bias In the simplest case, we use linear basis functions : j(x) =x j Based on slide by Christopher Bishop (PRML)
35 Linear Basis Function Models Polynomial basis functions: These are global; a small change in x affects all basis functions Gaussian basis functions: These are local; a small change in x only affect nearby basis functions. μ j and s control location and scale (width). Based on slide by Christopher Bishop (PRML)
36 Linear Basis Function Models Sigmoidal basis functions: where These are also local; a small change in x only affects nearby basis functions. μ j and s control location and scale (slope). Based on slide by Christopher Bishop (PRML)
37 Example of Fitting a Polynomial Curve with a Linear Model y = x + 2 x p x p = px j x j j=0
38 Linear Basis Function Models Basic Linear Model: Generalized Linear Model: h (x) = h (x) = Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model Unless we use the kernel trick more on that when we cover support vector machines Therefore, there is no point in cluttering the math with basis functions dx j=0 dx j=0 j x j j j (x) Based on slide by Geoff Hinton 38
39 Linear Algebra Concepts Vector in R d is an ordered set of d real numbers e.g., v = [1,6,3,4] is in R 4 [1,6,3,4] is a column vector: as opposed to a row vector: ( ) æ 1ö ç ç 6 ç 3 ç ç è 4ø An m- by- n matrix is an object with m rows and n columns, where each entry is a real number: æ 1 2 8ö ç ç ç è 9 3 2ø Based on slides by Joseph Bradley
40 Linear Algebra Concepts Transpose: reflect vector/matrix on line: æ aö T ç = è bø ( a b) æ ç è a c b d ö ø T = æ ç è a b c d ö ø Note: (Ax) T =x T A T (We ll define multiplication soon ) Vector norms: L p norm of v = (v 1,,v k ) is Common norms: L 1, L 2 L infinity = max i v i Length of a vector v is L 2 (v) Based on slides by Joseph Bradley X i v i p! 1 p
41 Linear Algebra Concepts Vector dot product: ( u1 u2) ( v1 v2 ) = u1v 1 u2v2 u v = + Note: dot product of u with itself = length(u) 2 = kuk 2 2 Matrix product: A AB æ a = ç ç è a æ a = ç ç è a b b a a ö, ø + + B = a a b b æ b ç ç è b a a b b b b ö ø + + a a b b ö ø Based on slides by Joseph Bradley
42 Vector products: Dot product: Outer product: ( ) v u u v v v u u v u v u T + = ø ö ç ç è æ = = ( ) ø ö ç ç è æ = ø ö ç ç è æ = v u v u u v u v v v u u uv T Based on slides by Joseph Bradley Linear Algebra Concepts
43 Benefits of vectorization Vectorization More compact equations Faster code (using optimized matrix libraries) Consider our model: Let = d h(x) = dx j=0 j x j x = 1 x 1... x d Can write the model in vectorized form as h(x) = x 43
44 Vectorization Consider our model for n instances: h x (i) dx = j x (i) j Let = d R (d+1) X = Can write the model in vectorized form as j=0 1 x (1) 1... x (1) d x (i) 1... x (i) d x (n) 1... x (n) d R n (d+1) h (x) =X 44
45 Vectorization Let: y = For the linear regression cost function: J( ) = 1 nx h x (i) y (i) 2 2n y (1) y (2). y (n) = 1 2n i=1 nx i=1 x (i) y (i) 2 = 1 2n (X y) (X y) R 1 n R n 1 R n (d+1) R (d+1) 1 45
46 Closed Form Solution Instead of using GD, solve for optimal Notice that the solution is when Derivation: Closed Form J( ) analytically ( ) = 1 2n (X y) 1 x 1 J (X y) / X X y X X y + y y / / X X 2 X y + y y Take derivative and set equal to 0, then solve for ( X X 2 X y + y (X X) X y =0 (X X) = X y =(X X) 1 X y 46
47 Closed Form Solution Can obtain by simply plugging X and yinto =(X X) 1 X y X = x (1) 1... x (1) d x (i) 1... x (i) d x (n) 1... x (n) d y = 6 4 If X T X is not invertible (i.e., singular), may need to: Use pseudo- inverse instead of the inverse In python, numpy.linalg.pinv(a) Remove redundant (not linearly independent) features Remove extra features to ensure that d n y (1) y (2). y (n)
48 Gradient Descent vs Closed Form Gradient Descent Requires multiple iterations Need to choose α Works well when n is large Can support incremental learning Closed Form Solution Non- iterative No need for α Slow if n is large Computing (X T X) - 1 is roughly O(n 3 ) 48
49 Improving Learning: Feature Scaling Idea: Ensure that feature have similar scales Before Feature Scaling Makes gradient descent converge much faster After Feature Scaling
50 Feature Standardization Rescales features to have zero mean and unit variance Let μ j be the mean of feature j: µ j = 1 nx x (i) j n Replace each value with: x (i) j x (i) j s j s j is the standard deviation of feature j Could also use the range of feature j (max j min j ) for s j Must apply the same transformation to instances for both training and prediction Outliers can cause problems µ j i=1 for j = 1...d (not x 0!) 50
51 Quality of Fit Price Price Price Size Underfitting (high bias) Size Correct fit Size Overfitting (high variance) Overfitting: The learned hypothesis may fit the training set very well ( J( ) 0 )...but fails to generalize to new examples Based on example by Andrew Ng 51
52 Regularization A method for automatically controlling the complexity of the learned hypothesis Idea: penalize for large values of Can incorporate into the cost function Works well when we have a lot of features, each that contributes a bit to predicting the label Can also address overfitting by eliminating features (either manually or via model selection) j 52
53 Regularization Linear regression objective function J( ) = 1 2n nx i=1 h x (i) y (i) dx j=1 2 j model fit to data regularization is the regularization parameter ( 0) No regularization on 0! 53
54 Understanding Regularization J( ) = 1 2n nx i=1 h x (i) y (i) dx j=1 2 j dx Note that j 2 = k 1:d k 2 2 j=1 This is the magnitude of the feature coefficient vector! We can also think of this as: dx ( j 0) 2 = k 1:d ~0k 2 2 j=1 L 2 regularization pulls coefficients toward 0 54
55 Understanding Regularization J( ) = 1 2n nx i=1 h x (i) y (i) dx j=1 2 j What happens if we set to be huge (e.g., )? Price Size Based on example by Andrew Ng 55
56 Understanding Regularization J( ) = 1 2n nx i=1 h x (i) y (i) dx j=1 2 j What happens if we set to be huge (e.g., )? Price Size Based on example by Andrew Ng 56
57 Regularized Linear Regression Cost Function J( ) = 1 2n Fit by solving nx i=1 min h x (i) y (i) dx J( ) j=1 0 j J( ) Gradient update: n j j 1 n nx i=1 nx i=1 h x (i) y (i) h x (i) y (i) x (i) j j regularization 57
58 Regularized Linear Regression J( ) = 1 2n n nx h x (i) y (i) 2 dx + 2 i=1 j j 1 n nx h x (i) y (i) i=1 nx i=1 h x (i) y (i) x (i) j j=1 2 j j We can rewrite the gradient step as: j j (1 ) 1 nx h x (i) n i=1 y (i) x (i) j 58
59 Regularized Linear Regression To incorporate regularization into the closed form solution: = B X X X y C. 5A
60 Regularized Linear Regression To incorporate regularization into the closed form solution: = B X X X y C. 5A Can derive this the same way, by solving Can prove that for λ > 0, inverse exists in the equation J( ) 60
Logistic Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logistic Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationClassification Based on Probability
Logistic Regression These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely
More informationLogis&c Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Logis&c Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationLecture 2: Linear regression
Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued
More informationClassification: Decision Trees
Classification: Decision Trees These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online. Feel
More informationUnsupervised Learning: K-Means, Gaussian Mixture Models
Unsupervised Learning: K-Means, Gaussian Mixture Models These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online.
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 17 2019 Logistics HW 1 is on Piazza and Gradescope Deadline: Friday, Jan. 25, 2019 Office
More informationLinear Regression (continued)
Linear Regression (continued) Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 6, 2017 1 / 39 Outline 1 Administration 2 Review of last lecture 3 Linear regression
More informationCSC321 Lecture 2: Linear Regression
CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,
More informationLinear Models for Regression
Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Regression Problem Training data: A set of input-output
More informationRegression with Numerical Optimization. Logistic
CSG220 Machine Learning Fall 2008 Regression with Numerical Optimization. Logistic regression Regression with Numerical Optimization. Logistic regression based on a document by Andrew Ng October 3, 204
More informationMachine Learning and Data Mining. Linear regression. Kalev Kask
Machine Learning and Data Mining Linear regression Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ Parameters q Learning algorithm Program ( Learner ) Change q Improve performance
More informationLINEAR REGRESSION, RIDGE, LASSO, SVR
LINEAR REGRESSION, RIDGE, LASSO, SVR Supervised Learning Katerina Tzompanaki Linear regression one feature* Price (y) What is the estimated price of a new house of area 30 m 2? 30 Area (x) *Also called
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationCOMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS16
COMPUTATIONAL INTELLIGENCE (INTRODUCTION TO MACHINE LEARNING) SS6 Lecture 3: Classification with Logistic Regression Advanced optimization techniques Underfitting & Overfitting Model selection (Training-
More informationOverfitting, Bias / Variance Analysis
Overfitting, Bias / Variance Analysis Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 8, 207 / 40 Outline Administration 2 Review of last lecture 3 Basic
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More informationCOMP 562: Introduction to Machine Learning
COMP 562: Introduction to Machine Learning Lecture 20 : Support Vector Machines, Kernels Mahmoud Mostapha 1 Department of Computer Science University of North Carolina at Chapel Hill mahmoudm@cs.unc.edu
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationLogistic Regression Logistic
Case Study 1: Estimating Click Probabilities L2 Regularization for Logistic Regression Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 10 th,
More informationCSC 411 Lecture 6: Linear Regression
CSC 411 Lecture 6: Linear Regression Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 06-Linear Regression 1 / 37 A Timely XKCD UofT CSC 411: 06-Linear Regression
More informationGaussian Mixture Models, Expectation Maximization
Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationCSE446: non-parametric methods Spring 2017
CSE446: non-parametric methods Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Linear Regression: What can go wrong? What do we do if the bias is too strong? Might want
More informationCase Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!
Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 4, 017 1 Announcements:
More informationIntroduction to Machine Learning. Regression. Computer Science, Tel-Aviv University,
1 Introduction to Machine Learning Regression Computer Science, Tel-Aviv University, 2013-14 Classification Input: X Real valued, vectors over real. Discrete values (0,1,2,...) Other structures (e.g.,
More informationLinear Models for Regression
Linear Models for Regression Machine Learning Torsten Möller Möller/Mori 1 Reading Chapter 3 of Pattern Recognition and Machine Learning by Bishop Chapter 3+5+6+7 of The Elements of Statistical Learning
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationBias-Variance Tradeoff
What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff
More informationLinear Regression. CSL603 - Fall 2017 Narayanan C Krishnan
Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization
More informationLinear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan
Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis
More informationCOMP 551 Applied Machine Learning Lecture 20: Gaussian processes
COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55
More informationCSC 411: Lecture 02: Linear Regression
CSC 411: Lecture 02: Linear Regression Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto (Most plots in this lecture are from Bishop s book) Zemel, Urtasun, Fidler (UofT) CSC 411: 02-Regression
More informationSupport Vector Machines
Support Vector Machines Some material on these is slides borrowed from Andrew Moore's excellent machine learning tutorials located at: http://www.cs.cmu.edu/~awm/tutorials/ Where Should We Draw the Line????
More informationMustafa Jarrar. Machine Learning Linear Regression. Birzeit University. Mustafa Jarrar: Lecture Notes on Linear Regression Birzeit University, 2018
Mustafa Jarrar: Lecture Notes on Linear Regression Birzeit University, 2018 Version 1 Machine Learning Linear Regression Mustafa Jarrar Birzeit University 1 Watch this lecture and download the slides Course
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 3: Regression I slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 48 In a nutshell
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationCOMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)
COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationLinear Regression 1 / 25. Karl Stratos. June 18, 2018
Linear Regression Karl Stratos June 18, 2018 1 / 25 The Regression Problem Problem. Find a desired input-output mapping f : X R where the output is a real value. x = = y = 0.1 How much should I turn my
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLecture 6. Regression
Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationMachine Learning and Data Mining. Linear regression. Prof. Alexander Ihler
+ Machine Learning and Data Mining Linear regression Prof. Alexander Ihler Supervised learning Notation Features x Targets y Predictions ŷ Parameters θ Learning algorithm Program ( Learner ) Change µ Improve
More informationCSE546: SVMs, Dual Formula5on, and Kernels Winter 2012
CSE546: SVMs, Dual Formula5on, and Kernels Winter 2012 Luke ZeClemoyer Slides adapted from Carlos Guestrin Linear classifiers Which line is becer? w. = j w (j) x (j) Data Example i Pick the one with the
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationDay 3 Lecture 3. Optimizing deep networks
Day 3 Lecture 3 Optimizing deep networks Convex optimization A function is convex if for all α [0,1]: f(x) Tangent line Examples Quadratics 2-norms Properties Local minimum is global minimum x Gradient
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 1 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationGradient Descent. Stephan Robert
Gradient Descent Stephan Robert Model Other inputs # of bathrooms Lot size Year built Location Lake view Price (CHF) 6 5 4 3 2 1 2 4 6 8 1 Size (m 2 ) Model How it works Linear regression with one variable
More informationWeek 3: Linear Regression
Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More informationLecture 2 Machine Learning Review
Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things
More information10/05/2016. Computational Methods for Data Analysis. Massimo Poesio SUPPORT VECTOR MACHINES. Support Vector Machines Linear classifiers
Computational Methods for Data Analysis Massimo Poesio SUPPORT VECTOR MACHINES Support Vector Machines Linear classifiers 1 Linear Classifiers denotes +1 denotes -1 w x + b>0 f(x,w,b) = sign(w x + b) How
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: Classification: Logistic Regression NB & LR connections Readings: Barber 17.4 Dhruv Batra Virginia Tech Administrativia HW2 Due: Friday 3/6, 3/15, 11:55pm
More informationLinear Models for Regression
Linear Models for Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationSupport Vector Machines
Two SVM tutorials linked in class website (please, read both): High-level presentation with applications (Hearst 1998) Detailed tutorial (Burges 1998) Support Vector Machines Machine Learning 10701/15781
More informationOrdinary Least Squares Linear Regression
Ordinary Least Squares Linear Regression Ryan P. Adams COS 324 Elements of Machine Learning Princeton University Linear regression is one of the simplest and most fundamental modeling ideas in statistics
More informationSVMs, Duality and the Kernel Trick
SVMs, Duality and the Kernel Trick Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 26 th, 2007 2005-2007 Carlos Guestrin 1 SVMs reminder 2005-2007 Carlos Guestrin 2 Today
More informationSimple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017
Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient
More informationMachine Learning. Kernels. Fall (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang. (Chap. 12 of CIML)
Machine Learning Fall 2017 Kernels (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang (Chap. 12 of CIML) Nonlinear Features x4: -1 x1: +1 x3: +1 x2: -1 Concatenated (combined) features XOR:
More informationMachine Learning. Regression. Manfred Huber
Machine Learning Regression Manfred Huber 2015 1 Regression Regression refers to supervised learning problems where the target output is one or more continuous values Continuous output values imply that
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining Linear Classifiers: predictions Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due Friday of next
More informationCPSC 340: Machine Learning and Data Mining. Regularization Fall 2017
CPSC 340: Machine Learning and Data Mining Regularization Fall 2017 Assignment 2 Admin 2 late days to hand in tonight, answers posted tomorrow morning. Extra office hours Thursday at 4pm (ICICS 246). Midterm
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationThe Multivariate Gaussian Distribution [DRAFT]
The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,
More informationLECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION
15-382 COLLECTIVE INTELLIGENCE - S19 LECTURE 22: SWARM INTELLIGENCE 3 / CLASSICAL OPTIMIZATION TEACHER: GIANNI A. DI CARO WHAT IF WE HAVE ONE SINGLE AGENT PSO leverages the presence of a swarm: the outcome
More informationSample questions for Fundamentals of Machine Learning 2018
Sample questions for Fundamentals of Machine Learning 2018 Teacher: Mohammad Emtiyaz Khan A few important informations: In the final exam, no electronic devices are allowed except a calculator. Make sure
More informationLogistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
More informationError Functions & Linear Regression (1)
Error Functions & Linear Regression (1) John Kelleher & Brian Mac Namee Machine Learning @ DIT Overview 1 Introduction Overview 2 Univariate Linear Regression Linear Regression Analytical Solution Gradient
More informationCS60021: Scalable Data Mining. Large Scale Machine Learning
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Large Scale Machine Learning Sourangshu Bhattacharya Example: Spam filtering Instance
More informationMLCC 2017 Regularization Networks I: Linear Models
MLCC 2017 Regularization Networks I: Linear Models Lorenzo Rosasco UNIGE-MIT-IIT June 27, 2017 About this class We introduce a class of learning algorithms based on Tikhonov regularization We study computational
More informationCS 231A Section 1: Linear Algebra & Probability Review
CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 27 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationGradient Descent. Model
Gradient Descent Stephan Robert Model Other inputs # of bathrooms Lot size Year built Location Lake view Price (HF) 6 5 4 3 2 2 4 6 8 Size (m 2 ) 2 Model How it works Linear regression with one variable
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationLecture 3 - Linear and Logistic Regression
3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 26 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationLecture 4: Training a Classifier
Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationGeometric View of Machine Learning Nearest Neighbor Classification. Slides adapted from Prof. Carpuat
Geometric View of Machine Learning Nearest Neighbor Classification Slides adapted from Prof. Carpuat What we know so far Decision Trees What is a decision tree, and how to induce it from data Fundamental
More informationMax Margin-Classifier
Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization
More informationECE521: Inference Algorithms and Machine Learning University of Toronto. Assignment 1: k-nn and Linear Regression
ECE521: Inference Algorithms and Machine Learning University of Toronto Assignment 1: k-nn and Linear Regression TA: Use Piazza for Q&A Due date: Feb 7 midnight, 2017 Electronic submission to: ece521ta@gmailcom
More informationProbability Basics. Robot Image Credit: Viktoriya Sukhanova 123RF.com
Probability Basics These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationMidterm. Introduction to Machine Learning. CS 189 Spring Please do not open the exam before you are instructed to do so.
CS 89 Spring 07 Introduction to Machine Learning Midterm Please do not open the exam before you are instructed to do so. The exam is closed book, closed notes except your one-page cheat sheet. Electronic
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More information