1 Support Vector Machines and Speaker Verification
David Cinciruk
March 6, 2013
2 Table of Contents
- Review of Speaker Verification
- Introduction to Support Vector Machines
- Derivation of SVM Equations
- Soft Margin
- Nonlinear Classification
- SVMs in Speaker Verification
- Examples of Kernels Used in Speaker Verification
- How to Perform
3 Review of Speaker Verification
- Speaker verification, as discussed previously, was performed using Gaussian Mixture Models (GMMs)
- Another popular way to perform speaker verification is with Support Vector Machines (SVMs)
4 Table of Contents (section divider: Introduction to Support Vector Machines)
5 Motivation
- Training data from two different classes are linearly separable
- We want to classify unknown testing data into one of the classes
- Since the training data are linearly separable, we can construct a hyperplane that separates them
- We need to classify the testing data with minimum error
6 A Pictorial Overview of Support Vector Machines
[Figure: two classes of points in the (x_1, x_2) plane with three candidate hyperplanes H_1, H_2, H_3; one fails to separate the classes, one separates with a small margin, and one separates with the maximum margin]
7 Kernel Functions
- The kernel trick is required to work with training data that are not linearly separable
- Kernel trick: mapping items from a set S into an inner product space V without ever computing the mapping explicitly
- Data can thus be projected into higher dimensions using kernels (see the sketch below)
- In the linear case, the kernel is
  $k(x_i, x_j) = x_i \cdot x_j$  (1)
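A minimal NumPy sketch (an illustration, not from the slides) of the kernel trick: for 2-D inputs, the degree-2 polynomial kernel $k(x, z) = (x \cdot z)^2$ equals an ordinary dot product in the explicit feature space $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, yet never forms $\phi$:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D vector."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly_kernel(x, z):
    """The same inner product, computed without ever forming phi."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))  # 16.0
print(poly_kernel(x, z))       # 16.0 -- identical, with no explicit mapping
```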
8 Table of Contents (section divider: Derivation of SVM Equations)
9 Formulation of the Problem
- Consider a set of data points of the form
  $S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^d,\ y_i \in \{-1, 1\}\}$  (2)
- $y_i$ indicates which class the point $x_i$ belongs to
- We need to find a hyperplane that maximally separates the data, of the form
  $w \cdot x - b = 0$  (3)
  with parameter vector $w$; $\frac{b}{\|w\|}$ denotes the affine offset from the origin along the normal
10 Selecting the Separating Hyperplane
- First case: linearly separable training data
- Select two parallel hyperplanes as shown in the picture, $w \cdot x - b = 1$ and $w \cdot x - b = -1$; the region between them is the margin
- The distance between the two is $\frac{2}{\|w\|}$
- Thus, to maximize the margin, we need to minimize $\|w\|$
11 Formulating the Optimization Problem
- We want to prevent points from falling into the margin, so we add constraints
- For $x_i$ of the first class:
  $w \cdot x_i - b \ge 1$  (4)
- For $x_i$ of the second class:
  $w \cdot x_i - b \le -1$  (5)
- Together these can be written as
  $y_i (w \cdot x_i - b) \ge 1$  (6)
12 The Optimization Problem
  $\min_{w,b} \frac{1}{2}\|w\|^2$  (7)
subject to
  $y_i (w \cdot x_i - b) \ge 1$  (8)
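As a hedged illustration, the hard-margin problem (7)-(8) can be handed directly to an off-the-shelf convex solver. This sketch assumes cvxpy is installed and uses synthetic, linearly separable toy data:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters, with labels y in {-1, +1}
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

w = cp.Variable(2)
b = cp.Variable()
# min (1/2)||w||^2  subject to  y_i (w . x_i - b) >= 1
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w - b) >= 1])
prob.solve()
print("w =", w.value, " b =", b.value)
```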
13 The Primal Problem
- The Lagrangian can be defined as
  $L_P := \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i (w \cdot x_i - b) - 1 \right]$  (9)
- The primal problem is convex, since the objective is convex and the constraints are linear
14 The Primal Problem
- Setting the gradient of the Lagrangian to zero gives
  $w = \sum_i \alpha_i y_i x_i$  (10)
and
  $\sum_i \alpha_i y_i = 0$  (11)
and
  $b = \frac{1}{N_{SV}} \sum_{i=1}^{N_{SV}} (w \cdot x_i - y_i)$  (12)
  where the sum in (12) is taken over the support vectors (the vectors on the edge of the margin)
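A minimal sketch (assuming scikit-learn) of how Eqs. (10) and (12) recover $w$ and $b$ from a fitted linear SVM; `dual_coef_` stores the products $\alpha_i y_i$ for the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin

w = (clf.dual_coef_ @ clf.support_vectors_).ravel()      # Eq. (10)
b = np.mean(clf.support_vectors_ @ w - y[clf.support_])  # Eq. (12)
# scikit-learn's decision function is w . x + intercept_, i.e. intercept_ = -b
print("w =", w, " b =", b, " -intercept_ =", -clf.intercept_[0])
```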
15 The Dual Problem
- Since the primal problem is convex, there is no duality gap
- The dual Lagrangian is given by
  $L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$  (13)
16 KKT Conditions
  $\frac{\partial L_P}{\partial w_v} = w_v - \sum_i \alpha_i y_i x_{iv} = 0, \quad v = 1, \ldots, d$  (14)
  $\frac{\partial L_P}{\partial b} = \sum_i \alpha_i y_i = 0$  (15)
  $y_i (w \cdot x_i - b) - 1 \ge 0, \quad i = 1, \ldots, N$  (16)
  $\alpha_i \ge 0 \quad \forall i$  (17)
  $\alpha_i \left( y_i (w \cdot x_i - b) - 1 \right) = 0 \quad \forall i$  (18)
17 Table of Contents (section divider: Soft Margin)
18 Non-Separable Data
- Sometimes data from one class are mixed in with data from the other
- Kernels can project the data to higher dimensions, but this may overfit
- Instead, we want to allow for some errors
19 Optimization Problem for Soft Margin
- Introduce a linear penalty with slack variables $\xi_i$, which measure the degree of misclassification of each data point
- The optimization problem becomes
  $\min_{w,\xi,b} \left\{ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \right\}$  (19)
subject to
  $y_i (w \cdot x_i - b) \ge 1 - \xi_i, \quad \xi_i \ge 0$  (20)
20 Primal Lagrangian
  $L_P = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (w \cdot x_i - b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \beta_i \xi_i$  (21)
with $\alpha_i, \beta_i \ge 0$
21 Dual Lagrangian
  $L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$  (22)
subject to
  $0 \le \alpha_i \le C$  (23)
and
  $\sum_{i=1}^{n} \alpha_i y_i = 0$  (24)
22 KKT Conditions for Soft Margin
  $\frac{\partial L_P}{\partial w_v} = w_v - \sum_i \alpha_i y_i x_{iv} = 0$  (25)
  $\frac{\partial L_P}{\partial b} = \sum_i \alpha_i y_i = 0$  (26)
  $\frac{\partial L_P}{\partial \xi_i} = C - \alpha_i - \beta_i = 0$  (27)
  $y_i (w \cdot x_i - b) - 1 + \xi_i \ge 0$  (28)
  $\xi_i \ge 0$  (29)
  $\alpha_i \ge 0$  (30)
  $\beta_i \ge 0$  (31)
  $\alpha_i \left[ y_i (w \cdot x_i - b) - 1 + \xi_i \right] = 0$  (32)
  $\beta_i \xi_i = 0$  (33)
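A minimal sketch (assuming scikit-learn, whose SVC solves this soft-margin formulation) showing how $C$ in Eq. (19) trades margin width against slack, and how the box constraint (23) caps every $\alpha_i$ at $C$:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Overlapping clusters: no separating hyperplane exists, so slack is required
X = np.vstack([rng.normal(-1, 1.0, (50, 2)), rng.normal(1, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # |dual_coef_| equals alpha_i, so every value respects 0 <= alpha_i <= C
    print(f"C={C}: {len(clf.support_)} support vectors, "
          f"max alpha = {np.abs(clf.dual_coef_).max():.3g}")
```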
23 Table of Contents (section divider: Nonlinear Classification)
24 Motivation
- Use kernels to transform nonlinearly separable data into linearly separable data
- Project the data into a higher-dimensional space
25 Kernel Function
- A kernel function is the dot product of two vectors projected into another space:
  $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$  (34)
- The SVM equations only require dot products of vectors, so we can even work with infinite-dimensional projections
26 Calculating the Decision Boundary
- We can use the same optimization as before, substituting $K(x_i, x_j)$ for $x_i \cdot x_j$
- New data points are classified by the sign of
  $f(x) = \sum_{i=1}^{N_S} \alpha_i y_i K(s_i, x) + b$  (35)
  where the $s_i$ are the support vectors
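A minimal sketch (assuming scikit-learn) evaluating Eq. (35) by hand for an RBF-kernel SVM and checking it against the library's own decision function:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (100, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1, 1.0, -1.0)  # circular boundary

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)

x_new = np.array([0.2, -0.3])
# Eq. (35): f(x) = sum_i alpha_i y_i K(s_i, x) + b over the support vectors
K = np.exp(-gamma * np.sum((clf.support_vectors_ - x_new) ** 2, axis=1))
f = clf.dual_coef_.ravel() @ K + clf.intercept_[0]
print(f, clf.decision_function([x_new])[0])  # the two values agree
```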
27 Mercer's Condition
- Kernels must be positive semidefinite to guarantee convergence to a solution for all training sets
- For any $g(x)$ such that $\int g(x)^2 \, dx < \infty$,
  $\int\!\!\int K(x, y) \, g(x) \, g(y) \, dx \, dy \ge 0$  (36)
28 If Mercer's Condition Is Not Satisfied
- The training data may produce an indefinite Hessian
- In general, no solution can be found
- A solution may still exist for some sets of training vectors
29 Examples of Kernels
- Polynomial:
  $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$  (37)
- Gaussian radial basis function (RBF):
  $K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}$  (38)
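A minimal NumPy sketch of the two kernels in Eqs. (37)-(38), each returning the full Gram matrix for data matrices of shape (n, d); either matrix could be fed to an SVM trainer that accepts precomputed kernels:

```python
import numpy as np

def polynomial_kernel(X, Z, degree=3):
    """K(x, z) = (x . z + 1)^degree, Eq. (37)."""
    return (X @ Z.T + 1.0) ** degree

def rbf_kernel(X, Z, gamma=0.5):
    """K(x, z) = exp(-gamma ||x - z||^2), Eq. (38)."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Z**2, axis=1)[None, :]
                - 2.0 * X @ Z.T)
    return np.exp(-gamma * sq_dists)

X = np.random.default_rng(3).normal(size=(5, 2))
print(polynomial_kernel(X, X).shape, rbf_kernel(X, X).shape)  # (5, 5) (5, 5)
```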
30 Table of Contents (section divider: SVMs in Speaker Verification)
31 Typical Questions
- Which kernel should be used?
- In which space should the data be modeled?
32 Table of Contents (section divider: Examples of Kernels Used in Speaker Verification)
33 Radial Basis Functions
- Use the standard radial basis function as the kernel
- Run on MFCCs, treating each frame's MFCC vector as one data point
- First subtract off the means of the MFCCs to normalize
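A minimal sketch of this frame-level approach, assuming librosa is available and that "target.wav" and "background.wav" are hypothetical enrollment and impostor recordings; each mean-normalized MFCC frame becomes one training point:

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_frames(path, n_mfcc=13):
    """Return mean-normalized MFCC vectors, one row per frame."""
    audio, sr = librosa.load(path, sr=None)
    m = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    return m - m.mean(axis=0)  # subtract off the MFCC means, as on the slide

target = mfcc_frames("target.wav")          # hypothetical target-speaker audio
background = mfcc_frames("background.wav")  # hypothetical impostor audio

X = np.vstack([target, background])
labels = np.hstack([np.ones(len(target)), -np.ones(len(background))])
clf = SVC(kernel="rbf", gamma=0.1).fit(X, labels)
```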
34 GMM Supervector Kernel
- The data point for each utterance is the vector of MAP-adapted means of a UBM
- Compares parameters from an unknown speaker to parameters from target and background speakers:
  $K(\mathrm{utt}_a, \mathrm{utt}_b) = \sum_{i=1}^{N} w_i \, (\mu_i^a)^\top \Sigma_i^{-1} \mu_i^b$  (39)
- This is the current baseline SVM system for speaker verification (the GMM-SVM system)
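A minimal NumPy sketch of Eq. (39) under the usual diagonal-covariance assumption: `mu_a` and `mu_b` are the MAP-adapted UBM means for two utterances (shape (N, d)), `w` the UBM mixture weights (N,), and `cov` the diagonal covariances (N, d). Rescaling the means turns (39) into a plain linear kernel on stacked supervectors:

```python
import numpy as np

def supervector_kernel(mu_a, mu_b, w, cov):
    """K(utt_a, utt_b) = sum_i w_i (mu_i^a)' inv(Sigma_i) mu_i^b, Eq. (39)."""
    return np.sum(w[:, None] * mu_a * mu_b / cov)

def supervector(mu, w, cov):
    """Rescale and stack the means so Eq. (39) becomes an ordinary dot product."""
    return (np.sqrt(w)[:, None] * mu / np.sqrt(cov)).ravel()
```

With this rescaling, `supervector(mu_a, w, cov) @ supervector(mu_b, w, cov)` reproduces `supervector_kernel(mu_a, mu_b, w, cov)`, which is why a linear SVM on supervectors suffices.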
35 Nuisance Attribute Projection
- Tries to project out the subspaces that cause variability in the data
- Operates on the nonlinear expansion of the GMM supervector kernel
- Compared to factor analysis, it does not estimate the variability
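A minimal sketch of the projection step, assuming the nuisance subspace has already been estimated: given a matrix U with orthonormal columns spanning the nuisance directions, the projection applies $P = I - UU^\top$ to each expanded supervector:

```python
import numpy as np

def nap_project(v, U):
    """Remove the nuisance component: apply P = I - U U^T to v (row vectors)."""
    return v - (v @ U) @ U.T
```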
36 Table of Contents (section divider: How to Perform)
37 How to Perform
- Train a new SVM decision boundary for each target speaker (see the sketch below)
- For MFCC-based kernels, the target data are the target speaker's MFCCs
- For GMM-supervector-based kernels, the target data are the adapted mean vectors from a UBM
- The background data are, correspondingly, either background MFCCs or background adapted means
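A minimal end-to-end sketch (assuming scikit-learn, with per-utterance supervectors precomputed by some front end) of the per-target scheme: one SVM per target speaker, trained against a shared background set, then thresholded at test time:

```python
import numpy as np
from sklearn.svm import SVC

def train_target_svm(target_vecs, background_vecs):
    """One decision boundary per target speaker: target (+1) vs. background (-1)."""
    X = np.vstack([target_vecs, background_vecs])
    y = np.hstack([np.ones(len(target_vecs)), -np.ones(len(background_vecs))])
    return SVC(kernel="linear").fit(X, y)

def verify(svm, test_vec, threshold=0.0):
    """Accept the claimed identity if the SVM score clears the threshold."""
    return svm.decision_function([test_vec])[0] > threshold
```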
38 Sources for Pictures
- Slide 6: ZackWeinberg on Wikipedia, adapting a picture by Cyc
- Slides 10, 12, 13, 14, 15: Cyc on Wikipedia
- Slides 18, 19, 20, 21: EMILeA-stat by the Institut für Statistik und Wirtschaftsmathematik (http://emilea-stat.stochastik.rwth-aachen.de/cgi-bin/WebObjects/EMILeAstat.woa/wo/0.0.27.1.1.3.0)
- Slide 24: Alisneaky on Wikipedia
- Slide 25: Tristan Fletcher, "Support Vector Machines Explained," UCL, London, England