Topics we covered. Machine Learning. Statistics. Optimization. Systems! Basics of probability Tail bounds Density Estimation Exponential Families

Similar documents
Kernelized Perceptron Support Vector Machines

Kernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning

SVMs, Duality and the Kernel Trick

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017

Support Vector Machines

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

CS6375: Machine Learning Gautam Kunapuli. Support Vector Machines

Machine Learning and Data Mining. Support Vector Machines. Kalev Kask

Announcements - Homework

Support Vector Machine (SVM) and Kernel Methods

CSC 411 Lecture 17: Support Vector Machine

Support Vector Machines

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machines

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Jeff Howbert Introduction to Machine Learning Winter

Dan Roth 461C, 3401 Walnut

Support Vector Machines

COMS 4771 Regression. Nakul Verma

Support Vector Machine (SVM) and Kernel Methods

Kernels. Machine Learning CSE446 Carlos Guestrin University of Washington. October 28, Carlos Guestrin

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

Max Margin-Classifier

CS145: INTRODUCTION TO DATA MINING

Support Vector Machines, Kernel SVM

Support Vector Machines

10701 Recitation 5 Duality and SVM. Ahmed Hefny

Warm up: risk prediction with logistic regression

Clustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26

Machine Learning. Support Vector Machines. Manfred Huber

Support Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

Support Vector Machines

Kernels and the Kernel Trick. Machine Learning Fall 2017

6.036 midterm review. Wednesday, March 18, 15

Support Vector Machines: Maximum Margin Classifiers

Support vector machines Lecture 4

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels

Machine Learning Basics Lecture 4: SVM I. Princeton University COS 495 Instructor: Yingyu Liang

Support Vector Machine (continued)

Support Vector Machines

Midterm. Introduction to Machine Learning. CS 189 Spring You have 1 hour 20 minutes for the exam.

CS798: Selected topics in Machine Learning

CSE546: SVMs, Dual Formula5on, and Kernels Winter 2012

Lecture Notes on Support Vector Machine

Optimization and Gradient Descent

Machine Learning Basics: Stochastic Gradient Descent. Sargur N. Srihari

Logistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han

PAC-learning, VC Dimension and Margin-based Bounds

Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x))

Machine Learning Practice Page 2 of 2 10/28/13

Pattern Recognition 2018 Support Vector Machines

COMP 875 Announcements

Multi-class SVMs. Lecture 17: Aykut Erdem April 2016 Hacettepe University

Lecture 3 January 28

Machine Learning Basics

Linear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction

COMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines

Lecture Support Vector Machine (SVM) Classifiers

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Numerical Optimization Techniques

Discriminative Models

Midterm exam CS 189/289, Fall 2015

Statistical Machine Learning from Data

Support Vector Machine II

LECTURE 7 Support vector machines

COMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma

Support Vector Machines

Indirect Rule Learning: Support Vector Machines. Donglin Zeng, Department of Biostatistics, University of North Carolina

SMO Algorithms for Support Vector Machines without Bias Term

Support Vector Machines

Support Vector Machines and Bayes Regression

Logistic Regression. Machine Learning Fall 2018

Introduction to Logistic Regression and Support Vector Machine

Support Vector Machines and Kernel Methods

Support Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs

Neural Networks: Backpropagation

PAC-learning, VC Dimension and Margin-based Bounds

Support Vector Machines and Kernel Methods

CS269: Machine Learning Theory Lecture 16: SVMs and Kernels November 17, 2010

Machine Learning And Applications: Supervised Learning-SVM

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18

About this class. Maximizing the Margin. Maximum margin classifiers. Picture of large and small margin hyperplanes

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

STA141C: Big Data & High Performance Statistical Computing

(Kernels +) Support Vector Machines

Machine Learning. Lecture 6: Support Vector Machine. Feng Li.

Kernel Methods. Barnabás Póczos

INTRODUCTION TO DATA SCIENCE

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Discriminative Models

Linear & nonlinear classifiers

Convex Optimization Lecture 16

Support Vector Machines. Machine Learning Fall 2017

Kernel Methods and Support Vector Machines

Foundation of Intelligent Systems, Part I. SVM s & Kernel Methods

Transcription:

Midterm Review

Topics we covered Machine Learning Optimization Basics of optimization Convexity Unconstrained: GD, SGD Constrained: Lagrange, KKT Duality Linear Methods Perceptrons Support Vector Machines Kernels Statistics Basics of probability Tail bounds Density Estimation Exponential Families Graphical Models Systems!

Basics of Machine Learning Supervised/Unsupervised Learning? Classification, Regression, Clustering Training error/test error? Model Complexity: Overfitting/Underfitting True error Bayes Optimal Error

Bias-Variance Tradeoff When estimating a quantity θ, we evaluate the performance of an estimator by computing its risk expected value of a loss function R θ, θ = E L(θ, θ), where L could be Mean Squared Error Loss 0/1 Loss Hinge Loss (used for SVMs) Bias-Variance Decomposition: Y = f x + ε Err x = E f x f x 2 = (E f x f(x)) 2 +E f x E f x 2 + σε 2 Bias Variance 3/3/2015 4

Copied from: Junier Oliva

Copied from: Junier Oliva

Copied from: Junier Oliva

Copied from: Junier Oliva

Copied from: Junier Oliva

Copied from: Junier Oliva

Copied from: Junier Oliva

Copied from: Junier Oliva

Copied from: Junier Oliva

Regression

Optimization Copied from: Xuezhi Wang

Convex Sets Copied from: Xuezhi Wang

Convex Functions Copied from: Xuezhi Wang

Convex Functions Examples

Useful Observations A function is convex if and only if its epigraph is a convex set. Below-Sets of Convex Functions is a convex set Convex functions cannot have local minima

Gradient Descent Copied from: Xuezhi Wang

Newton s Method Copied from: Prof Barnabas

Newton s Method Copied from: Prof Barnabas

Duality

Duality

KKT Conditions

Perceptrons

Convergence of Perceptrons

Back to Optimization

Gradient Descent

Stochastic Gradient Descent

SGD and Perceptron

SVM Primal Find maximum margin hyper-plane Hard Margin Copied from: Junier Oliva 3/3/2015 32

SVM Primal Find maximum margin hyper-plane Soft Margin Copied from: Junier Oliva 3/3/2015 33

SVM Dual Find maximum margin hyper-plane Dual for the hard margin SVM Copied from: Junier Oliva 3/3/2015 34

SVM Dual Find maximum margin hyper-plane Dual for the hard margin SVM Substituting α for w Copied from: Junier Oliva 3/3/2015 35

SVM Dual Find maximum margin hyper-plane Dual for the hard margin SVM The constraints are active for the support vectors Copied from: Junier Oliva 3/3/2015 36

SVM Dual Find maximum margin hyper-plane Dual for the hard margin SVM Copied from: Junier Oliva 3/3/2015 37

SVM Computing w Find maximum margin hyper-plane Dual for the hard margin SVM Copied from: Junier Oliva 3/3/2015 38

SVM Computing w Find maximum margin hyper-plane Dual for the soft margin SVM only difference from the separable case Copied from: Junier Oliva 3/3/2015 39

SVM the feature map Find maximum margin hyper-plane But data is not linearly separable We obtain a linear separator in the feature space.!! inputs feature map features is expensive to compute! Copied from: Junier Oliva 3/3/2015 40

Introducing the kernel The dual formulation no longer depends on w, only on a dot product! We obtain a linear separator in the feature space.!! is expensive to compute! But we don t have to! What we need is the dot product: Let s call this a kernel - 2-variable function - can be written as a dot product Copied from: Junier Oliva 3/3/2015 41

Kernel SVM The dual formulation no longer depends on w, only on a dot product! closed form This is the famous kernel trick. - never compute the feature map - learn using the closed form K - constant time for HD dot products Copied from: Junier Oliva 3/3/2015 42

Kernel SVM Run time What happens when we need to classify some x 0? Recall that w depends on α Our classifier for x 0 uses w Copied from: Junier Oliva 3/3/2015 43

Kernel SVM Run time What happens when we need to classify some x 0? Recall that w depends on α Our classifier for x 0 uses w Who needs w when we ve got dot products? Copied from: Junier Oliva 44

Kernel SVM Recap Pick kernel Solve the optimization to get α Classify as Compute b using the support vectors Copied from: Junier Oliva 45

Reminder on Kernels Remember Kernels are nothing but implicit feature maps Gram Matrix of a set of vectors x 1 x n in the inner product space defined by the kernel K Gram Matrix is always positive definite 46

Bayes Rule

Law of Large Numbers

Central Limit Theorem

Tail Bounds

More Tail Bounds