Regression and Classification with Linear Models. CMPSCI 383, Nov 15, 2011.


Today's topics
- Learning from Examples: brief review
- Univariate Linear Regression
- Batch gradient descent
- Stochastic gradient descent
- Multivariate Linear Regression
- Regularization
- Linear Classifiers
- Perceptron learning rule
- Logistic Regression

Learning from Examples (supervised learning)
[A sequence of six figure slides illustrating supervised learning; the figures are not reproduced in this transcription.]

Important issues
- Generalization
- Overfitting
- Cross-validation
- Holdout cross-validation
- K-fold cross-validation
- Leave-one-out cross-validation
- Model selection

Recall Notation
Training set: $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, where each $y_j$ was generated by an unknown function $y = f(x)$.
Goal: discover a hypothesis $h$ that best approximates the true function $f$.

Loss Functions
Suppose the true prediction for input $x$ is $f(x) = y$ but the hypothesis gives $h(x) = \hat{y}$.
$$L(x, y, \hat{y}) = \mathrm{Utility}(\text{result of using } y \text{ given input } x) - \mathrm{Utility}(\text{result of using } \hat{y} \text{ given input } x)$$
Simplified version: $L(y, \hat{y})$
- Absolute-value loss: $L_1(y, \hat{y}) = |y - \hat{y}|$
- Squared-error loss: $L_2(y, \hat{y}) = (y - \hat{y})^2$
- 0/1 loss: $L_{0/1}(y, \hat{y}) = 0$ if $y = \hat{y}$, else $1$
Generalization loss: expected loss over all possible examples.
Empirical loss: average loss over the available examples.
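As a concrete illustration (not on the slide), these losses translate directly into Python; the function names are illustrative:

```python
# A minimal sketch of the three loss functions above.
def absolute_loss(y, y_hat):          # L1 loss
    return abs(y - y_hat)

def squared_loss(y, y_hat):           # L2 loss
    return (y - y_hat) ** 2

def zero_one_loss(y, y_hat):          # L0/1 loss
    return 0 if y == y_hat else 1

# Empirical loss: the average loss over the available examples.
def empirical_loss(loss, ys, y_hats):
    return sum(loss(y, yh) for y, yh in zip(ys, y_hats)) / len(ys)
```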

Univariate Linear Regression

Univariate Linear Regression contd.
Weight vector: $w = [w_0, w_1]$; hypothesis: $h_w(x) = w_1 x + w_0$.
Find the weight vector that minimizes the empirical loss, e.g. $L_2$:
$$\mathrm{Loss}(h_w) = \sum_{j=1}^{N} L_2(y_j, h_w(x_j)) = \sum_{j=1}^{N} (y_j - h_w(x_j))^2 = \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2$$
i.e., find $w^*$ such that
$$w^* = \operatorname*{argmin}_{w} \mathrm{Loss}(h_w)$$

Weight Space

Finding w*
Find weights such that:
$$\frac{\partial}{\partial w_0} \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2 = 0
\quad\text{and}\quad
\frac{\partial}{\partial w_1} \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2 = 0$$
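Solving these two equations (a step not spelled out on the slide, but standard for univariate least squares) gives the closed-form solution:
$$w_1 = \frac{N \sum_j x_j y_j - \big(\sum_j x_j\big)\big(\sum_j y_j\big)}{N \sum_j x_j^2 - \big(\sum_j x_j\big)^2},
\qquad
w_0 = \frac{\sum_j y_j - w_1 \sum_j x_j}{N}$$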

Gradient Descent
$$w_i \leftarrow w_i - \alpha \frac{\partial}{\partial w_i} \mathrm{Loss}(w)$$
where $\alpha$ is the step size or learning rate.

Gradient Descent contd.
For one training example $(x, y)$:
$$w_0 \leftarrow w_0 + \alpha\,(y - h_w(x)) \quad\text{and}\quad w_1 \leftarrow w_1 + \alpha\,(y - h_w(x))\, x$$
For $N$ training examples:
$$w_0 \leftarrow w_0 + \alpha \sum_j (y_j - h_w(x_j)) \quad\text{and}\quad w_1 \leftarrow w_1 + \alpha \sum_j (y_j - h_w(x_j))\, x_j$$
The summed form is batch gradient descent. Stochastic gradient descent instead takes a step for one training example at a time.
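A minimal NumPy sketch of both update schemes for the univariate model $h_w(x) = w_1 x + w_0$; the function names, learning rate, and epoch counts are illustrative choices, not from the slide:

```python
import numpy as np

def batch_gd(x, y, alpha=0.01, epochs=1000):
    """Batch gradient descent for h_w(x) = w1*x + w0 under L2 loss."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        err = y - (w1 * x + w0)          # residuals y_j - h_w(x_j) for all N examples
        w0 += alpha * err.sum()          # w0 <- w0 + alpha * sum_j (y_j - h_w(x_j))
        w1 += alpha * (err * x).sum()    # w1 <- w1 + alpha * sum_j (y_j - h_w(x_j)) * x_j
    return w0, w1                        # alpha may need to shrink with N for stability

def stochastic_gd(x, y, alpha=0.01, epochs=100):
    """Stochastic gradient descent: one update per training example."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        for xj, yj in zip(x, y):
            err = yj - (w1 * xj + w0)
            w0 += alpha * err
            w1 += alpha * err * xj
    return w0, w1

# Illustrative use: for y = 2x + 1 both routines should approach (w0, w1) = (1, 2).
# x = np.array([1.0, 2.0, 3.0, 4.0]); y = 2.0 * x + 1.0
# print(batch_gd(x, y, epochs=5000))
```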

The Multivariate case
$$h_w(x_j) = w_0 + w_1 x_{j,1} + \cdots + w_n x_{j,n} = w_0 + \sum_i w_i x_{j,i}$$
Augmented vectors: add a feature to each $x$ by tacking on a 1, i.e. $x_{j,0} = 1$. Then:
$$h_w(x_j) = w \cdot x_j = w^{T} x_j = \sum_i w_i x_{j,i}$$
And the batch gradient descent update becomes:
$$w_i \leftarrow w_i + \alpha \sum_j (y_j - h_w(x_j))\, x_{j,i}$$
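In vectorized form, with a leading column of ones serving as the augmented feature $x_{j,0} = 1$, the batch update can be sketched as follows (function name and hyperparameters are illustrative):

```python
import numpy as np

def multivariate_batch_gd(X, y, alpha=0.001, epochs=1000):
    """Batch gradient descent for h_w(x) = w . x with augmented inputs.

    X is an (N, n) data matrix; a column of ones is prepended so that
    x_{j,0} = 1 and w[0] plays the role of the intercept w_0.
    """
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])   # augmented data matrix
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):
        err = y - Xa @ w                  # y_j - h_w(x_j) for all j
        w += alpha * Xa.T @ err           # w_i += alpha * sum_j err_j * x_{j,i}
    return w
```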

The Multivariate case contd.
Or, solving analytically: let $y$ be the vector of outputs for the training examples, and let $X$ be the data matrix, each row of which is an input vector. Then $y = X w$, and solving this for $w^*$:
$$w^* = (X^{T} X)^{-1} X^{T} y$$
where $(X^{T} X)^{-1} X^{T}$ is the pseudo-inverse of the data matrix.
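The same solution in NumPy, as a sketch: np.linalg.pinv computes the pseudo-inverse directly, and np.linalg.lstsq is often the numerically preferred route in practice.

```python
import numpy as np

def analytic_fit(X, y):
    """Solve w* = (X^T X)^{-1} X^T y for the augmented data matrix."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_{j,0} = 1
    return np.linalg.pinv(Xa) @ y                    # pseudo-inverse solution

# Equivalent, and usually more stable numerically:
#   w, *_ = np.linalg.lstsq(Xa, y, rcond=None)
```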

Regularization
$$\mathrm{Cost}(h) = \mathrm{EmpLoss}(h) + \lambda\, \mathrm{Complexity}(h)$$
$$\mathrm{Complexity}(h_w) = L_q(w) = \sum_i |w_i|^q$$
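A direct transcription of this cost into Python (a sketch; the choice of lambda and q, and the function names, are illustrative, and in practice the intercept w_0 is often left out of the penalty):

```python
import numpy as np

def complexity(w, q=1):
    """L_q complexity penalty: sum_i |w_i|^q (q=1 gives L1, q=2 gives L2)."""
    return np.sum(np.abs(w) ** q)

def regularized_cost(emp_loss, w, lam=0.1, q=1):
    """Cost(h) = EmpLoss(h) + lambda * Complexity(h)."""
    return emp_loss + lam * complexity(w, q)
```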

L1 vs. L2 Regularization

Linear Classification: hard thresholds

Linear Classification: hard thresholds contd.
Decision boundary: in the linear case, a linear separator (a hyperplane).
Linearly separable: data is linearly separable if the classes can be separated by a linear separator.
Classification hypothesis:
$$h_w(x) = \mathrm{Threshold}(w \cdot x), \quad\text{where } \mathrm{Threshold}(z) = 1 \text{ if } z \geq 0 \text{ and } 0 \text{ otherwise}$$
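For instance (an illustrative example, not from the slide), with augmented weights $w = (-3, 1, 1)$ the hypothesis is $h_w(x) = \mathrm{Threshold}(-3 + x_1 + x_2)$: the decision boundary is the line $x_1 + x_2 = 3$, points on or above it are classified as 1, and points below it as 0.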

Perceptron Learning Rule
For a single sample $(x, y)$:
$$w_i \leftarrow w_i + \alpha\, (y - h_w(x))\, x_i$$
- If the output is correct, i.e. $y = h_w(x)$, then the weights don't change.
- If $y = 1$ but $h_w(x) = 0$, then $w_i$ is increased when $x_i$ is positive and decreased when $x_i$ is negative.
- If $y = 0$ but $h_w(x) = 1$, then $w_i$ is decreased when $x_i$ is positive and increased when $x_i$ is negative.
Perceptron Convergence Theorem: for any data set that is linearly separable and any training procedure that continues to present each training example, the learning rule is guaranteed to find a solution in a finite number of steps.
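A minimal perceptron training loop under the slide's conventions (hard threshold at 0, labels in {0, 1}, augmented inputs with x_0 = 1); the stopping rule, alpha, and function names are illustrative choices:

```python
import numpy as np

def threshold(z):
    """Hard threshold: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron_train(X, y, alpha=1.0, max_epochs=100):
    """Perceptron learning rule: w_i <- w_i + alpha*(y - h_w(x))*x_i."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])   # augment with x_0 = 1
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xj, yj in zip(Xa, y):
            pred = threshold(w @ xj)
            if pred != yj:
                w += alpha * (yj - pred) * xj        # update only on mistakes
                mistakes += 1
        if mistakes == 0:                            # every example classified correctly
            break
    return w

# Illustrative use: logical AND is linearly separable, so training converges.
# X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]]); y = np.array([0, 0, 0, 1])
# w = perceptron_train(X, y)
```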

Perceptron Performance

Linear Classification with Logistic Regression
An important function: the logistic (sigmoid) function.

Logistic Regression
$$h_w(x) = \mathrm{Logistic}(w \cdot x) = \frac{1}{1 + e^{-w \cdot x}}$$
For a single sample $(x, y)$ and the $L_2$ loss function:
$$w_i \leftarrow w_i + \alpha\, (y - h_w(x))\, h_w(x)\, (1 - h_w(x))\, x_i$$
where $h_w(x)\,(1 - h_w(x))$ is the derivative of the logistic function.
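The corresponding stochastic update in Python (a sketch; alpha, the epoch count, and the function names are illustrative, and the update follows the slide's L2-loss gradient rather than the cross-entropy version):

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_sgd(X, y, alpha=0.1, epochs=1000):
    """Stochastic gradient updates for h_w(x) = Logistic(w . x) under L2 loss."""
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])    # augment with x_0 = 1
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):
        for xj, yj in zip(Xa, y):
            h = logistic(w @ xj)
            # w_i <- w_i + alpha*(y - h_w(x))*h_w(x)*(1 - h_w(x))*x_i
            w += alpha * (yj - h) * h * (1.0 - h) * xj
    return w
```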

Logistic Regression Performance (separable case)

Summary
- Learning from Examples: brief review
- Loss functions
- Generalization
- Overfitting
- Cross-validation
- Regularization
- Univariate Linear Regression
- Batch gradient descent
- Stochastic gradient descent
- Multivariate Linear Regression
- Regularization
- Linear Classifiers
- Perceptron learning rule
- Logistic Regression

Next Class
Artificial Neural Networks, Nonparametric Models, & Support Vector Machines (Secs. 18.7-18.9).