Linearized Smooth Additive Classifiers
Workshop on Web-scale Vision and Social Media, ECCV 2012, Firenze, Italy, October 2012
1 Linearized Smooth Additive Classifiers. Workshop on Web-scale Vision and Social Media, ECCV 2012, Firenze, Italy, October 2012. Subhransu Maji, Research Assistant Professor, Toyota Technological Institute at Chicago.
2 Generalized Additive Models. $f(x_1, x_2, \dots, x_n) = f_1(x_1) + f_2(x_2) + \dots + f_n(x_n)$. Why use them? Efficiency: they can be evaluated efficiently. Interpretability: a simple generalization of linear classifiers, which may lead to interpretable models. Well known in the statistics community as Generalized Additive Models (Hastie & Tibshirani 90). However, traditional learning algorithms (e.g. the backfitting algorithm) do not scale well.
3 Additive kernels in computer vision. Images are represented as histograms of low-level features such as color and texture [Swain and Ballard 91, Odone et al. 05]. Histogram-based similarity measures are typically additive: $K_{\min}(x, y) = \sum_i \min(x_i, y_i)$ and $K_{\chi^2}(x, y) = \sum_i \frac{2 x_i y_i}{x_i + y_i}$. Other examples of additive kernels based on approximate correspondence: the Pyramid Match Kernel (Grauman and Darrell, ICCV 05) and the Spatial Pyramid Match Kernel (Lazebnik, Schmid and Ponce, CVPR 06).
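As an aside (a minimal numpy sketch, not from the talk), both kernels are straightforward to evaluate on histogram features:

```python
import numpy as np

def k_min(x, y):
    # histogram intersection kernel: sum_i min(x_i, y_i)
    return np.minimum(x, y).sum()

def k_chi2(x, y):
    # chi-squared kernel: sum_i 2 * x_i * y_i / (x_i + y_i)
    s = x + y
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(s > 0, 2.0 * x * y / s, 0.0)
    return terms.sum()

x = np.array([0.5, 0.3, 0.2])  # L1-normalized histograms
y = np.array([0.4, 0.4, 0.2])
print(k_min(x, y), k_chi2(x, y))  # both equal 1 when x == y
```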
4 Directly learning additive classifiers. $f(x_1, x_2, \dots, x_n) = f_1(x_1) + f_2(x_2) + \dots + f_n(x_n)$. An SVM-like optimization framework: $\min_{f \in \mathcal{F}} \sum_k \ell(y_k, f(x_k)) + \lambda R(f)$, with $x_k \in \mathbb{R}^d$ and $y_k \in \{+1, -1\}$. The first term is a loss function on the data, e.g. the hinge loss $\ell(y_k, f(x_k)) = \max(0, 1 - y_k f(x_k))$; the second is a regularizer, e.g. a derivative norm.
5-8 Strategy for learning additive classifiers. Start from $\min_{f \in \mathcal{F}} \sum_k \ell(y_k, f(x_k)) + \lambda R(f)$, $x_k \in \mathbb{R}^d$, $y_k \in \{+1, -1\}$. Use a basis expansion $f_i = \sum_j w_{ij} \phi_{ij}$ together with a smoothness penalty on the weights. Search for representations of the function and regularizations for which the optimization can be solved efficiently; in particular, we want to leverage the latest (almost linear time) methods for learning linear classifiers.
9 Spline embeddings: motivation. $f(x_1, x_2, \dots, x_n) = f_1(x_1) + f_2(x_2) + \dots + f_n(x_n)$, with each $f_i = \sum_j w_{ij} \phi_{ij}$ represented using a uniformly spaced spline basis. Motivated by our earlier analysis that splines approximate additive classifiers well. Popularized by Eilers and Marx (P-Splines 02, 04) for additive modeling; well known in graphics (de Boor 01). Question: what is the regularization on $w$? [Figure from Eilers & Marx 04]
10 Spline embeddings: representation. $f(x) = w^T \phi(x)$: project the data onto a uniformly spaced spline basis. [Figure: linear, quadratic, and cubic B-spline bases.]
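For concreteness, a sketch (mine, not the talk's code) of this projection for linear B-splines on [0, 1]: each scalar feature maps to a sparse vector with at most two non-zero entries, and $w^T \phi(x)$ linearly interpolates the weights.

```python
import numpy as np

def encode_linear_spline(x, n_bins):
    """Sparse linear B-spline (hat function) encoding of a scalar x in [0, 1]."""
    phi = np.zeros(n_bins)
    t = x * (n_bins - 1)           # position on the uniform knot grid
    i = min(int(t), n_bins - 2)    # index of the left knot
    a = t - i                      # fractional offset in [0, 1]
    phi[i] = 1.0 - a               # weight on the left hat function
    phi[i + 1] = a                 # weight on the right hat function
    return phi

print(encode_linear_spline(0.62, 5))  # [0.   0.   0.52 0.48 0.  ]
```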
11 Spline embeddings: regularization. $R(f) = w^T H w$ with $H = D^T D$ (and $D_0 = I$). Penalizing differences between adjacent weights ensures that the learned function is smooth, similar to the penalized splines of Eilers and Marx 02.
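A small sketch of $D$ and $H$, assuming the square bidiagonal convention (which keeps $D$ invertible, as the linearization below requires):

```python
import numpy as np

def diff_matrix(n, order):
    """D_0 = I; higher orders compose the square first-difference matrix."""
    D = np.eye(n)
    step = np.eye(n) - np.diag(np.ones(n - 1), -1)  # (D w)_i = w_i - w_{i-1}
    for _ in range(order):
        D = step @ D
    return D

D1 = diff_matrix(5, 1)
H = D1.T @ D1  # w @ H @ w = w_0**2 + sum of squared adjacent differences
```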
12 Spline embeddings: optimization. $c(w) = \frac{\lambda}{2} w^T D_d^T D_d w + \frac{1}{n} \sum_k \max(0, 1 - y_k w^T \phi(x_k))$.
13 Spline embeddings: linear case. With zero-order differences, $D_0 = I$ [Maji and Berg, ICCV 09], the problem reduces to a standard linear SVM: $c(w) = \frac{\lambda}{2} w^T w + \frac{1}{n} \sum_k \max(0, 1 - y_k w^T \phi(x_k))$. The projected features are sparse: at most $d+1$ non-zero entries for a basis of degree $d$.
14 Spline embeddings: general case. Penalize first-order differences: $c(w) = \frac{\lambda}{2} w^T D_1^T D_1 w + \frac{1}{n} \sum_k \max(0, 1 - y_k w^T \phi(x_k))$. This is a non-standard SVM due to the regularization [Eilers & Marx 02], and can offer better smoothness when some bins have few data points.
15-18 Spline embeddings: linearization. Original problem: $c(w) = \frac{\lambda}{2} w^T D_d^T D_d w + \frac{1}{n} \sum_k \max(0, 1 - y_k w^T \phi(x_k))$. Since $D_1^{-T}$ is upper triangular, re-parameterizing the weights ($w \leftarrow D_d w$) gives $c(w) = \frac{\lambda}{2} w^T w + \frac{1}{n} \sum_k \max(0, 1 - y_k w^T D_1^{-T} \phi(x_k))$. Equivalently, this is a change of basis on the features, $\psi(x) = D_1^{-T} \phi(x)$, with implicit kernel $K(x, y) = \phi(x)^T (D^T D)^{-1} \phi(y)$.
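Under the same square-bidiagonal convention, $D_1^{-T}$ is the upper-triangular all-ones matrix, so the re-parameterized feature $\psi(x) = D_1^{-T} \phi(x)$ is just a reversed cumulative sum of the sparse spline encoding; a quick sketch to check the claim:

```python
import numpy as np

n = 5
D1 = np.eye(n) - np.diag(np.ones(n - 1), -1)  # square first-difference matrix
phi = np.array([0.0, 0.0, 0.52, 0.48, 0.0])   # sparse spline encoding of one feature

psi_explicit = np.linalg.solve(D1.T, phi)     # psi = D1^{-T} phi
psi_cumsum = np.cumsum(phi[::-1])[::-1]       # same thing as a reversed cumsum

print(np.allclose(psi_explicit, psi_cumsum))  # True: the features are dense suffix sums
```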
19 Spline embeddings: visualizing the kernel. [Figure: implicit kernels for various bases and regularizations. Fixing order(regularization) = 1 and varying degree(basis) gives smooth variants of the min kernel.]
20 Spline embeddings: visualizing the basis. $\phi_i(x) = (x - x_i)_+^r$. The bases are equivalent to the truncated polynomial basis when degree(basis) - order(regularization) = 1 [de Boor 01].
21-28 Solving the optimization efficiently. Original problem (non-standard regularization): $c(w) = \frac{\lambda}{2} w^T D_d^T D_d w + \frac{1}{n} \sum_k \max(0, 1 - y_k w^T \phi(x_k))$. Modified problem (dense features, a memory bottleneck): $c(w) = \frac{\lambda}{2} w^T w + \frac{1}{n} \sum_k \max(0, 1 - y_k w^T D_d^{-T} \phi(x_k))$. Solution: maintain an implicit representation of the weights so that classification is $f(x) = w_d^T \phi(x)$, which costs $O(d)$ per example instead of $O(n)$ since $\phi(x)$ is sparse. Computing the updates: given $n$ linearly spaced basis functions of degree $d$, updating $w_d$ takes $O(n^2)$ time naively; however, it can be computed in $O(nd)$ by exploiting the structure of $D$ (see the paper below). Hence these classifiers can be trained with very low memory overhead, without compromising training time. [S. Maji, Linearized Smooth Additive Classifiers, ECCV Workshops 2012]
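The $O(nd)$ update itself is detailed in the paper; the sketch below only shows the surrounding training loop in a generic Pegasos-style form, illustrating how encoding each example on the fly keeps the memory footprint at the size of the weight vector (the names here are illustrative, not the released code):

```python
import numpy as np

def train_sgd(X, y, encode, lam=1e-4, epochs=10, seed=0):
    """Hinge-loss SGD where features are encoded online, never stored as a matrix."""
    rng = np.random.default_rng(seed)
    w = np.zeros_like(encode(X[0]))
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)         # Pegasos step size
            psi = encode(X[i])            # computed on the fly
            margin = y[i] * (w @ psi)
            w *= 1.0 - eta * lam          # shrinkage from the L2 penalty
            if margin < 1.0:              # hinge-loss subgradient step
                w += eta * y[i] * psi
    return w
```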
29 Spline embeddings: computational tradeoffs. [Figure: grid with regularization order ($D_0$, $D_1$, $D_2$) on one axis and spline degree (linear, quadratic, cubic) on the other.]
30 Spline embeddings: computational tradeoffs. With $D_0$ regularization a standard linear solver applies: accuracy increases with spline degree, while training and test time increase linearly. [Figure annotations: 90.23% (linear); 7.26s, 90.34% (quadratic); 10.08s, 90.39% (cubic).] Experiments on the DC pedestrian dataset (n = 20); IKSVM training time: 360s.
31 Spline embeddings: computational tradeoffs. For regularization order > 1 a custom solver is needed. Accuracy peaks at $D_1$ [figure annotations: $D_2$: 89.06%; $D_1$: 91.20%; $D_0$: 5.96s, 90.23%], which suggests that first-order smoothness is sufficient. Training time increases linearly with regularization order; test time is constant. Experiments on the DC pedestrian dataset (n = 20); IKSVM training time: 360s.
32-34 Spline embeddings: computational tradeoffs. [Figure: the useful regime for most datasets is $D_1$ regularization, which is the most accurate and closely approximates IKSVM; $D_0$ with a linear spline is the fastest.] Experiments on MNIST, INRIA pedestrians, Caltech-101, DC pedestrians, etc. All of these are still an order of magnitude faster than a traditional SVM solver, and they have the same memory overhead as linear SVMs since the features are computed online.
35 General basis expansion. $f(x_1, x_2, \dots, x_n) = f_1(x_1) + f_2(x_2) + \dots + f_n(x_n)$ with $f_i = \sum_j w_{ij} \phi_{ij}$. So far local splines were the basis; in general one can choose any orthonormal basis.
36 Fourier embeddings: representation. $f(x) = \sum_i w_i \phi_i(x)$, where $\phi_1(x), \dots, \phi_n(x)$ is an orthonormal basis: $\int_a^b \phi_i(x) \phi_j(x)\, \omega(x)\, dx = \delta_{ij}$. Examples: trigonometric functions, wavelets, etc.
37 Fourier embeddings: regularization. Penalize the $d$-th order derivative: $R(f) = \int_a^b f^{(d)}(x)^2\, \omega(x)\, dx = w^T H w$, with $H_{ij} = \int_a^b \phi_i^{(d)}(x)\, \phi_j^{(d)}(x)\, \omega(x)\, dx$. Similar to the spline case (different basis); requires the basis to be differentiable.
38 Fourier embeddings: optimization. $c(w) = \frac{\lambda}{2} w^T H w + \frac{1}{n} \sum_k \max(0, 1 - y_k w^T \phi(x_k))$.
39-41 Fourier embeddings: optimization. $H$ is not diagonal, so fast linear solvers cannot be used directly, and in general $H$ is not structured either (unlike splines). Practical solution: pick an orthogonal basis with orthogonal derivatives, so the cross terms disappear and $H$ is diagonal again: $\int_a^b f^{(d)}(x)^2\, \omega(x)\, dx = \sum_{i,j} w_i w_j \int_a^b \phi_i^{(d)}(x)\, \phi_j^{(d)}(x)\, \omega(x)\, dx \propto \sum_i w_i^2$. Examples: trigonometric functions, and one of the Jacobi, Laguerre or Hermite polynomials. [M. Webster, Orthogonal polynomials with orthogonal derivatives, Mathematische Zeitschrift, 39, 1935]
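A numerical sanity check (my sketch, uniform weight $\omega(x) = 1$ on $[0, 1]$) that the first-derivative penalty matrix is indeed diagonal for the trigonometric basis:

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 10001)

def dbasis(k, x):
    """Derivative of the k-th function in the basis 1, sqrt(2)cos(2*pi*m*x), sqrt(2)sin(2*pi*m*x), ..."""
    if k == 0:
        return np.zeros_like(x)              # constant term
    m = (k + 1) // 2
    w = 2.0 * np.pi * m
    if k % 2 == 1:                           # cosine term
        return -w * np.sqrt(2) * np.sin(w * x)
    return w * np.sqrt(2) * np.cos(w * x)    # sine term

n = 5
# H_ij approximated by the mean of the integrand over a uniform grid on [0, 1]
H = np.array([[(dbasis(i, xs) * dbasis(j, xs)).mean() for j in range(n)]
              for i in range(n)])
print(np.round(H, 4))  # off-diagonal entries vanish up to quadrature error
```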
42 Fourier embeddings: two practical ones. Two families of orthogonal bases with orthogonal derivatives: embeddings that penalize the first- and second-order derivatives. Learning: project the data onto the first few basis functions and use a linear solver such as LIBLINEAR. Fourier features are low dimensional and dense, as opposed to spline features, which are high dimensional but sparse.
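A sketch of that pipeline using scikit-learn's LIBLINEAR wrapper (the trigonometric encoding here is illustrative; the talk's embeddings also rescale the basis according to the derivative penalty):

```python
import numpy as np
from sklearn.svm import LinearSVC

def fourier_encode(X, n_harmonics=3):
    """Map each feature in [0, 1] to its first few trigonometric basis projections."""
    feats = [X]
    for m in range(1, n_harmonics + 1):
        feats.append(np.cos(2 * np.pi * m * X))
        feats.append(np.sin(2 * np.pi * m * X))
    return np.hstack(feats)  # low dimensional and dense, unlike spline features

rng = np.random.default_rng(0)
X = rng.random((200, 10))            # toy 10-dimensional histogram-like data
y = np.where(X[:, 0] > 0.5, 1, -1)

clf = LinearSVC(C=1.0).fit(fourier_encode(X), y)
print(clf.score(fourier_encode(X), y))
```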
43 Comparison of various additive classifiers DC pedestrian dataset
44 Comparison of various additive classifiers MNIST dataset
45 Comparison of various additive classifiers
46 Software. Code to train large-scale additive classifiers. Provides functions: train (takes (y, x) and outputs additive classifiers, with a choice of encodings and regularizations; encodings are computed online; implements efficient weight updates for spline features), classify (takes a learned classifier and features, outputs decision values), and encode (returns encoded features which can be used directly with any linear solver). Download at:
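Based only on the function names the slide lists, a hypothetical usage sketch (the released code's actual signatures may differ):

```python
# Hypothetical workflow; every name and argument below is illustrative.
# model  = train(y, X, encoding="spline", regularization="D1")  # learn an additive classifier
# scores = classify(model, X_test)                              # decision values
# F      = encode(X_test, encoding="fourier")                   # features for any linear solver
```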
47 Conclusions. We discussed methods to directly learn additive classifiers based on regularized loss minimization. We proposed two kinds of bases for which the learning problem can be solved efficiently: spline and Fourier embeddings. Spline embeddings are sparse and easy to compute, and can be used to learn classifiers with almost no memory overhead compared to learning linear classifiers. Fourier embeddings (trigonometric and Hermite) are low dimensional but relatively expensive to compute, hence useful in settings where the features can be stored in memory. More experimental details and code can be found on the author's website (ttic.uchicago.edu/~smaji).