Support Vector Machines and Speaker Verification

1 Support Vector Machines and Speaker Verification David Cinciruk March 6, 2013

2 Table of Contents Review of Speaker Verification Introduction to Support Vector Machines Derivation of SVM Equations Soft Margin Nonlinear Classification SVMs in Speaker Verification Examples of Kernels Used in Speaker Verification How to Perform

3 Review of Speaker Verification Speaker verification, as discussed before, was done using Gaussian mixture models. Another popular way to perform speaker verification is with support vector machines.

5 Motivation Training data from two different classes are linearly separable. We want to classify unknown test data into one of the two classes. Since the training data are linearly separable, we can create a line that separates them. We need to be able to classify the test data with minimum error.

6 A Pictorial Overview of Support Vector Machines [Figure: points from two classes in the (x1, x2) plane, with three candidate separating hyperplanes H1, H2, and H3.]

7 Kernel Functions The kernel trick is required to work with training data that are not linearly separable. Kernel trick: mapping items from a set S into an inner product space V without having to compute the mapping explicitly. Data can be projected to higher dimensions using kernels. In the linear case, the kernel is k(x_i, x_j) = x_i · x_j. (1)
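
The kernel trick can be made concrete with a small sketch. The degree-2 feature map below is my own illustrative example (not from the slides): it shows that the kernel value (x·z + 1)^2 equals an ordinary dot product in a higher-dimensional space, computed without ever forming that space explicitly.

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input, chosen so that
    # (x . z + 1)^2 = phi(x) . phi(z)
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

def poly_kernel(x, z):
    # Same inner product, computed directly in the input space
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(poly_kernel(x, z))        # kernel-trick value: 4.0
print(np.dot(phi(x), phi(z)))   # explicit high-dimensional dot product: 4.0
```

The left side costs one dot product in 2 dimensions; the right side needs the full 6-dimensional expansion, which is exactly the cost the kernel trick avoids.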

9 Formulation of the Problem Consider a set of data points of the form S = {(x_i, y_i) | x_i ∈ R^d, y_i ∈ {-1, 1}}, (2) where y_i indicates which class the point x_i belongs to. We need to find a hyperplane that maximally separates the data. The hyperplane has the form w · x - b = 0, (3) with parameter vector w; b/||w|| denotes the affine offset from the origin along the normal.

10 Selecting the Separating Hyperplane First case: linearly separable training data. Select two parallel hyperplanes as shown; the region between them is the margin. The hyperplanes have the form shown in the picture, w · x - b = 1 and w · x - b = -1. The distance between the two is 2/||w||; thus we need to minimize ||w||.

11 Formulating the Optimization Problem We want to prevent points from falling into the margin, so we add constraints. For x_i of the first class, w · x_i - b ≥ 1, (4) and for x_i of the second class, w · x_i - b ≤ -1. (5) The two can be written together as y_i (w · x_i - b) ≥ 1. (6)

12 The Optimization Problem min over w, b of (1/2)||w||^2 (7) subject to y_i (w · x_i - b) ≥ 1. (8)

13 The Primal Problem The Lagrangian can be defined as L_P := (1/2)||w||^2 - Σ_{i=1}^{N} α_i [y_i (w · x_i - b) - 1]. (9) The primal problem is convex, since the objective is convex and the constraints are linear.

14 The Primal Problem Setting the gradient of L_P to zero gives w = Σ_i α_i y_i x_i (10) and Σ_i α_i y_i = 0, (11) and the offset b = (1/N_SV) Σ_{i=1}^{N_SV} (w · x_i - y_i), (12) where the sum is taken over the support vectors (the vectors on the edge of the margin).

15 The Dual Problem Since the primal problem is convex, there is no duality gap. The dual Lagrangian is given as L_D = Σ_i α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i · x_j. (13)
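
Slides 13-15 can be checked numerically. This is a minimal sketch, not the method from the slides: the toy data and the use of scipy's SLSQP solver are my own choices. It maximizes the dual Lagrangian L_D subject to α_i ≥ 0 and Σ α_i y_i = 0, then recovers w and b from equations (10) and (12).

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data: two points per class along the line x1 = x2
X = np.array([[1., 1.], [2., 2.], [-1., -1.], [-2., -2.]])
y = np.array([1., 1., -1., -1.])

# G_ij = y_i y_j (x_i . x_j), so that L_D(a) = sum(a) - 0.5 * a^T G a
G = (y[:, None] * X) @ (y[:, None] * X).T

res = minimize(lambda a: -(a.sum() - 0.5 * a @ G @ a),    # maximize L_D
               x0=np.zeros(len(y)),
               bounds=[(0, None)] * len(y),                # alpha_i >= 0
               constraints={'type': 'eq', 'fun': lambda a: a @ y},  # sum alpha_i y_i = 0
               method='SLSQP')
alpha = res.x

w = (alpha * y) @ X              # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-4                # support vectors have alpha_i > 0
b = np.mean(X[sv] @ w - y[sv])   # b = (1/N_SV) sum over SVs of (w . x_i - y_i)

print(w, b)   # expect w close to [0.5, 0.5] and b close to 0 for this data
```

For this symmetric data the support vectors are (1, 1) and (-1, -1), and the resulting margin 2/||w|| equals their separation, as the geometry on slide 10 predicts.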

16 KKT Conditions ∂L_P/∂w_v = w_v - Σ_i α_i y_i x_{iv} = 0, v = 1, ..., d (14) ∂L_P/∂b = Σ_i α_i y_i = 0 (15) y_i (w · x_i - b) - 1 ≥ 0, i = 1, ..., l (16) α_i ≥ 0 for all i (17) α_i (y_i (w · x_i - b) - 1) = 0 for all i (18)

18 Nonseparable Data Sometimes data from one class are mixed with data from the other. We could project with kernels, but that may overfit. We want to allow for errors.

19 Optimization Problem for Soft Margin Introduce a linear penalty featuring slack variables ξ_i, which measure the degree of misclassification of the data. The optimization problem becomes min over w, ξ, b of {(1/2)||w||^2 + C Σ_{i=1}^{N} ξ_i} (19) subject to y_i (w · x_i - b) ≥ 1 - ξ_i and ξ_i ≥ 0. (20)

20 Primal Lagrangian L_P = (1/2)||w||^2 + C Σ_{i=1}^{n} ξ_i - Σ_{i=1}^{n} α_i [y_i (w · x_i - b) - 1 + ξ_i] - Σ_{i=1}^{n} μ_i ξ_i, (21) with α_i, μ_i ≥ 0.

21 Dual Lagrangian L_D = Σ_{i=1}^{n} α_i - (1/2) Σ_{i,j} α_i α_j y_i y_j x_i · x_j (22) subject to 0 ≤ α_i ≤ C (23) and Σ_{i=1}^{n} α_i y_i = 0. (24)
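
The only change from the hard-margin dual is the box constraint 0 ≤ α_i ≤ C. A sketch (the data and solver are illustrative choices of mine, not from the slides): with one mislabeled point per class, those margin violators get their multipliers pushed to the upper bound C, while clean on-margin support vectors keep 0 < α_i < C.

```python
import numpy as np
from scipy.optimize import minimize

C = 1.0
# Two clean points per class plus one mislabeled "outlier" per class
X = np.array([[2., 0.], [3., 0.], [-1., 0.],    # labeled +1 (last is an outlier)
              [-2., 0.], [-3., 0.], [1., 0.]])  # labeled -1 (last is an outlier)
y = np.array([1., 1., 1., -1., -1., -1.])

G = (y[:, None] * X) @ (y[:, None] * X).T

res = minimize(lambda a: -(a.sum() - 0.5 * a @ G @ a),
               x0=np.zeros(len(y)),
               bounds=[(0, C)] * len(y),        # soft margin: 0 <= alpha_i <= C
               constraints={'type': 'eq', 'fun': lambda a: a @ y},
               method='SLSQP')
alpha = res.x

print(np.round(alpha, 3))
# The two outliers cannot satisfy the margin (their slack is positive), so by
# condition (27) their alphas sit at the bound C; the far points (3,0), (-3,0)
# lie well inside their own class and get alpha = 0.
```

This matches the KKT picture on the next slide: α_i = C exactly when μ_i = 0, i.e. when the slack ξ_i is allowed to be positive.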

22 KKT Conditions for Soft Margin ∂L_P/∂w_v = w_v - Σ_i α_i y_i x_{iv} = 0 (25) ∂L_P/∂b = Σ_i α_i y_i = 0 (26) ∂L_P/∂ξ_i = C - α_i - μ_i = 0 (27) y_i (w · x_i - b) - 1 + ξ_i ≥ 0 (28) ξ_i ≥ 0 (29) α_i ≥ 0 (30) μ_i ≥ 0 (31) α_i [y_i (w · x_i - b) - 1 + ξ_i] = 0 (32) μ_i ξ_i = 0 (33)

24 Motivation Use kernels to transform nonlinearly separable data into linearly separable data by projecting the data to higher dimensions.

25 Kernel Function The kernel function is the dot product of two vectors projected into another space: K(x_i, x_j) = φ(x_i) · φ(x_j). (34) The SVM equations only require dot products of vectors, so we can even work with infinite-dimensional projections.

26 Calculating the Decision Boundary We can use the same optimization as before, substituting K(x_i, x_j) for x_i · x_j. New data points are classified by the equation f(x) = Σ_{i=1}^{N_S} α_i y_i K(s_i, x) + b, (35) where the s_i are the support vectors.
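
The substitution can be sketched end to end. The XOR data and the scipy solver below are my own illustrative choices, not from the slides: XOR is not linearly separable in the input space, but solving the same dual with a Gaussian kernel Gram matrix and classifying with the form of equation (35) separates it.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(u, v, gamma=1.0):
    return np.exp(-gamma * np.sum((u - v) ** 2))

# XOR data: no line in the (x1, x2) plane separates the two classes
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1., 1., -1., -1.])

K = np.array([[rbf(xi, xj) for xj in X] for xi in X])
G = np.outer(y, y) * K    # same dual as before, K(x_i, x_j) replacing x_i . x_j

res = minimize(lambda a: -(a.sum() - 0.5 * a @ G @ a),
               x0=np.zeros(len(y)),
               bounds=[(0, None)] * len(y),
               constraints={'type': 'eq', 'fun': lambda a: a @ y},
               method='SLSQP')
alpha = res.x
sv = np.where(alpha > 1e-5)[0]

# Intercept chosen so that f(s_i) = y_i on the margin support vectors
b = np.mean([y[i] - np.sum(alpha * y * K[i]) for i in sv])

def f(x_new):
    # f(x) = sum over support vectors of alpha_i y_i K(s_i, x), plus intercept
    return sum(alpha[i] * y[i] * rbf(X[i], x_new) for i in sv) + b

print([np.sign(f(x)) for x in X])   # should reproduce y on the training points
```

Note that only kernel evaluations appear in the solver and the decision function; the (here implicit) feature map φ is never computed.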

27 Mercer's Condition Kernels must be positive semidefinite to obtain uniform convergence to a solution for all training sets: for any g(x) such that ∫ g(x)^2 dx < ∞, ∫∫ K(x, y) g(x) g(y) dx dy ≥ 0. (36)
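
In practice the Mercer condition is checked on a finite sample: a valid kernel's Gram matrix must have no negative eigenvalues. A quick sketch (the Euclidean-distance "kernel" is my own counterexample, not one from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# Squared pairwise distances between all points
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

# Gaussian RBF Gram matrix: a valid Mercer kernel
K_rbf = np.exp(-0.5 * sq)

# Plain Euclidean distance used as a "kernel": symmetric but NOT positive semidefinite
K_dist = np.sqrt(sq)

print(np.linalg.eigvalsh(K_rbf).min())   # >= 0 up to round-off
print(np.linalg.eigvalsh(K_dist).min())  # negative: this is the indefinite case
```

The distance matrix has zero trace with positive off-diagonal entries, so some eigenvalue must be negative; feeding such a matrix to a QP solver produces exactly the indefinite-Hessian failure described on the next slide.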

28 If Mercer's Condition Is Not Satisfied The training data may cause the Hessian to be indefinite. In general no solution can be found, although a solution may exist for some sets of training vectors.

29 Examples of Kernels Polynomial: K(x_i, x_j) = (x_i · x_j + 1)^d. (37) Gaussian radial basis function: K(x_i, x_j) = e^(-γ ||x_i - x_j||^2). (38)
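
A small sketch of these two kernels as vectorized Gram-matrix functions (the parameter values d and γ here are arbitrary examples, not values from the slides):

```python
import numpy as np

def polynomial_gram(X, Z, d=3):
    # K_ij = (x_i . z_j + 1)^d, equation (37)
    return (X @ Z.T + 1.0) ** d

def rbf_gram(X, Z, gamma=0.5):
    # K_ij = exp(-gamma * ||x_i - z_j||^2), equation (38),
    # expanded as ||x||^2 + ||z||^2 - 2 x.z to avoid explicit loops
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * sq)

X = np.random.default_rng(1).normal(size=(5, 4))
K = rbf_gram(X, X)
print(np.allclose(np.diag(K), 1.0))   # K(x, x) = e^0 = 1 for the RBF kernel
print(np.allclose(K, K.T))            # Gram matrices are symmetric
```

γ controls the width of the Gaussian (large γ means only very close points look similar), and d controls the polynomial degree; both are typically tuned on held-out data.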

31 Typical Questions Which kernel should be used? In what space should the data be modeled?

33 Radial Basis Functions Use the standard radial basis function as the kernel. Run on MFCCs, with each frame a data point. First subtract off the means of the MFCCs to normalize.
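
The normalization step can be sketched as follows; the array shape is illustrative (real MFCCs would come from a speech front end), and subtracting the per-utterance mean of each coefficient is one standard reading of "subtract off the means".

```python
import numpy as np

# Stand-in MFCC matrix for one utterance: 200 frames x 13 cepstral coefficients
rng = np.random.default_rng(42)
mfccs = rng.normal(loc=5.0, scale=2.0, size=(200, 13))

# Cepstral mean subtraction: remove each coefficient's per-utterance mean,
# compensating for stationary channel effects before the kernel sees the frames
mfccs_norm = mfccs - mfccs.mean(axis=0)

print(np.abs(mfccs_norm.mean(axis=0)).max())   # ~0: each coefficient is now zero-mean
```

Each row of `mfccs_norm` then serves as one data point for the RBF-kernel SVM described above.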

34 GMM Supervector Kernel The MAP adaptation of the means of the UBM for each utterance is a data point. This compares parameters from an unknown speaker to parameters from target and background speakers: K(utt_a, utt_b) = Σ_{i=1}^{N} w_i (μ_i^a)^T Σ_i^{-1} μ_i^b. (39) This is the current baseline SVM system for speaker verification (the GMM-SVM system).
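
Equation (39) can be sketched in numpy. The mixture sizes and values below are made up for illustration; the point is that this kernel is just a linear kernel between normalized supervectors ψ(utt) built by stacking √w_i Σ_i^(-1/2) μ_i over the mixtures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mix, dim = 8, 4                                   # tiny stand-in UBM: 8 mixtures, 4-dim features
w = rng.dirichlet(np.ones(n_mix))                   # mixture weights
sigma = rng.uniform(0.5, 2.0, size=(n_mix, dim))    # diagonal covariances
mu_a = rng.normal(size=(n_mix, dim))                # MAP-adapted means, utterance a
mu_b = rng.normal(size=(n_mix, dim))                # MAP-adapted means, utterance b

# K(utt_a, utt_b) = sum_i w_i (mu_i^a)^T Sigma_i^{-1} mu_i^b, diagonal covariance
K = sum(w[i] * mu_a[i] @ (mu_b[i] / sigma[i]) for i in range(n_mix))

def supervector(mu):
    # psi(utt): concatenation of sqrt(w_i) * Sigma_i^{-1/2} * mu_i over mixtures
    return np.concatenate([np.sqrt(w[i]) * mu[i] / np.sqrt(sigma[i])
                           for i in range(n_mix)])

# Same number as a plain dot product of the two supervectors
print(np.isclose(K, supervector(mu_a) @ supervector(mu_b)))
```

Because the kernel is linear in this supervector space, a standard linear-SVM solver can be used directly on the stacked vectors.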

35 Nuisance Attribute Projection Tries to project out subspaces that cause variability in the data. It is a nonlinear expansion of the GMM supervector kernel. Compared to factor analysis, it does not estimate the variability.
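
The projection itself is simple linear algebra: given an orthonormal basis U for the nuisance (e.g. channel) subspace, supervectors are mapped through P = I - U U^T. A sketch, with the dimensions and the nuisance basis invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_nuisance = 32, 3

# Orthonormal basis U for the estimated nuisance subspace (QR gives orthonormal columns)
U, _ = np.linalg.qr(rng.normal(size=(dim, n_nuisance)))

P = np.eye(dim) - U @ U.T      # projection that removes the nuisance directions

v = rng.normal(size=dim)       # a GMM supervector
v_clean = P @ v

print(np.abs(U.T @ v_clean).max())   # ~0: no nuisance component remains
print(np.allclose(P @ P, P))         # P is idempotent: projecting twice changes nothing
```

After this step, the kernel between utterances is computed on the projected supervectors, so differences along the nuisance directions no longer influence the SVM score.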

37 How to Perform A new SVM decision boundary is trained for each target speaker. For MFCC-based kernels, the target data are the target speaker's MFCCs; for GMM supervector-based kernels, the target data are adapted mean vectors from a UBM. The background data are either background MFCCs or background adapted means.

38 Sources for Pictures Slide 6: ZackWeinberg on Wikipedia, adapting a picture by Cyc. Slides 10, 12, 13, 14, 15: Cyc on Wikipedia. Slides 18, 19, 20, 21: EMILeA-stat by the Institut für Statistik und Wirtschaftsmathematik (http://emilea-stat.stochastik.rwth-aachen.de/cgi-bin/WebObjects/EMILeAstat.woa/wo/0.0.27.1.1.3.0). Slide 24: Alisneaky on Wikipedia. Slide 25: Tristan Fletcher, Support Vector Machines Explained, UCL, London, England.