Kernel Learning for Multi-modal Recognition Tasks

1 Kernel Learning for Multi-modal Recognition Tasks J. Saketha Nath CSE, IIT-B IBM Workshop J. Saketha Nath (IIT-B) IBM Workshop 23-Sep-09 1 / 15

2 Multi-modal Learning Tasks
Multiple views or descriptions of the data are available.
E.g. object categorization

3 Object Categorization
Figure: Daffodils and Dandelions can be distinguished using shape features.
Figure: Blue-bell and Tulip can be distinguished using color features.

4 Object Categorization Features
Typical features:
  HSV: color features.
  SIFT: local texture and shape features.
  HOG: global shape features.

8 Multi-modal Learning Tasks
Multiple views or descriptions of the data are available.
E.g. object categorization, multi-modal speech/speaker/activity recognition, text mining?
Build a classifier using all features (say, an SVM).
Can we do better? E.g. Nilsback and Zisserman (CVPR06) found that:
  All 3 kinds of features are critical.
  Not all features within each kind may be important.
Exploit the natural grouping of features.

9 Scope and Objective
Scope:
  Binary classification task; kernel methods like SVM.
  Simultaneous feature selection and classifier construction.
  Multiple Kernel Learning [Lanckriet et al., 02].
Objective:
  Customized for multi-modal tasks.
  Exploit the group structure in features (prior info).

10 Problem Setting
Given:
  $D = \{(x_i, y_i) \mid x_i \in \mathcal{X},\ y_i \in \{-1, 1\},\ i = 1, \dots, m\}$
  Feature maps $\Phi_j$, $j = 1, \dots, n$ ($n$ is the number of modalities of the data).
  E.g. $\Phi_1(x_i)$: feature vector describing the color of flower $x_i$.
Using each $\Phi_j$, generate various kernels (linear, polynomial, Gaussian).
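The kernel-generation step can be sketched as follows. This is a minimal illustration that assumes each modality is available as explicit feature vectors; the helper names (`gram_matrix`, `kernel_bank`), the toy data, and the kernel parameters are hypothetical and not taken from the talk.

```python
import math

def linear_kernel(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """(<u, v> + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(u, v) + coef0) ** degree

def gaussian_kernel(u, v, sigma=1.0):
    """exp(-||u - v||^2 / (2 * sigma^2)); sigma is an illustrative default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(features, kernel):
    """m x m Gram matrix of one kernel on one modality's feature vectors."""
    return [[kernel(u, v) for v in features] for u in features]

# Hypothetical toy data: m = 3 examples described by n = 2 modalities.
modalities = [
    [[0.1, 0.9], [0.8, 0.2], [0.2, 0.7]],                  # Phi_1(x_i), e.g. color
    [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.9, 0.1, 0.4]],   # Phi_2(x_i), e.g. shape
]
kernel_bank = [linear_kernel, polynomial_kernel, gaussian_kernel]

# K[j][k] is the Gram matrix of the k-th kernel on the j-th modality.
K = [[gram_matrix(feats, kern) for kern in kernel_bank] for feats in modalities]
```

Here every modality gets the same three kernels, so $n_j = 3$ for all $j$; in general the number of kernels may differ per mode.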

12 Problem Setting
Given: kernels $K_{jk}$, $j = 1, \dots, n$, $k = 1, \dots, n_j$
  $n$: number of modalities of the data
  $n_j$: number of kernels generated from the $j$-th mode
MKL task:
  Simultaneously determine the weights given to the kernels (features) and the classifier.
  Utilize prior information regarding the kernels.

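14 Existing Methods
SVM:
\[ \max_{\alpha \in S}\ \mathbf{1}^\top \alpha - \frac{1}{2}\, \alpha^\top Y K Y \alpha, \qquad K = \sum_{j=1}^{n} \sum_{k=1}^{n_j} K_{jk} \]
(concatenation of all features)
MKL:
\[ \min_{\lambda_{jk} \ge 0,\ \sum_{j,k} \lambda_{jk} = 1}\ \max_{\alpha \in S}\ \mathbf{1}^\top \alpha - \frac{1}{2}\, \alpha^\top Y K Y \alpha, \qquad K = \sum_{j=1}^{n} \sum_{k=1}^{n_j} \lambda_{jk} K_{jk} \]
(convex combination of kernels)
Equivalent to selecting the single best kernel!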
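To make the two objectives concrete, the sketch below evaluates $\mathbf{1}^\top \alpha - \frac{1}{2} \alpha^\top Y K Y \alpha$ for a fixed $\alpha$ under both kernel choices. It does not solve either optimization problem. The function names, the toy Gram matrices, and the values of `alpha` and `lam` are hypothetical, and the feasible set $S$ is assumed to be the usual SVM dual set (box constraints plus $\sum_i \alpha_i y_i = 0$), which the slide does not spell out.

```python
def weighted_kernel(grams, weights):
    """Entry-wise weighted sum of equally sized Gram matrices."""
    m = len(grams[0])
    return [[sum(w * G[r][c] for w, G in zip(weights, grams)) for c in range(m)]
            for r in range(m)]

def dual_objective(alpha, y, K):
    """1^T alpha - 0.5 * alpha^T Y K Y alpha, with Y = diag(y)."""
    m = len(alpha)
    quad = sum(alpha[r] * y[r] * K[r][c] * y[c] * alpha[c]
               for r in range(m) for c in range(m))
    return sum(alpha) - 0.5 * quad

# Hypothetical toy problem: two examples, two candidate kernels.
y = [1, -1]
alpha = [0.5, 0.5]            # satisfies sum_i alpha_i * y_i = 0 (assumed form of S)
grams = [
    [[1.0, 0.5], [0.5, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]

# SVM on all features: unweighted sum of the kernels.
obj_sum = dual_objective(alpha, y, weighted_kernel(grams, [1.0, 1.0]))

# MKL: convex combination, lambda >= 0 and sum(lambda) = 1.
lam = [0.25, 0.75]
obj_mkl = dual_objective(alpha, y, weighted_kernel(grams, lam))
```

Setting `lam = [0.0, 1.0]` corresponds to the single-kernel corner case mentioned on the slide.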

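16 Proposed Methodology
$\Phi_{jk}(\cdot)$: implicit map defined by $K_{jk}$
\[ f(x) = \sum_{j=1}^{n} \sum_{k=1}^{n_j} w_{jk}^\top \Phi_{jk}(x) - b \]
SVM formulation:
\[ \min_{w_{jk},\, b,\, \xi_i}\ \frac{1}{2} \sum_{j=1}^{n} \sum_{k=1}^{n_j} \|w_{jk}\|_2^2 + C \sum_i \xi_i \]
\[ \text{s.t.}\quad y_i \Big( \sum_{j=1}^{n} \sum_{k=1}^{n_j} w_{jk}^\top \Phi_{jk}(x_i) - b \Big) \ge 1 - \xi_i, \quad \xi_i \ge 0 \]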
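As a small illustration of the decision function and the margin constraint, the sketch below evaluates $f(x)$ and the smallest feasible slack for one example. It assumes explicit, finite-dimensional feature vectors in place of the implicit maps $\Phi_{jk}$; the function names and all numbers are hypothetical.

```python
def decision_value(W, b, phi_x):
    """f(x) = sum_j sum_k <w_jk, Phi_jk(x)> - b.

    W[j][k] and phi_x[j][k] are the weight vector and the explicit
    feature vector for the k-th kernel of the j-th mode.
    """
    total = 0.0
    for W_j, phi_j in zip(W, phi_x):
        for w_jk, phi_jk in zip(W_j, phi_j):
            total += sum(a * c for a, c in zip(w_jk, phi_jk))
    return total - b

def slack(y, f_x):
    """Smallest xi >= 0 satisfying y * f(x) >= 1 - xi."""
    return max(0.0, 1.0 - y * f_x)

# Hypothetical example: mode 0 has one kernel, mode 1 has two.
W = [[[1.0, 0.0]], [[0.5], [2.0]]]
phi_x = [[[2.0, 3.0]], [[4.0], [-1.0]]]
b = 0.5

f_x = decision_value(W, b, phi_x)   # 2.0 + 2.0 - 2.0 - 0.5 = 1.5
xi_pos = slack(+1, f_x)             # 0.0: margin constraint met
xi_neg = slack(-1, f_x)             # 2.5: example on the wrong side
```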

17 Proposed Methodology
Convex formulation:
\[ \min_{w_{jk},\, b,\, \xi_i}\ \frac{1}{2} \Big[ \max_j \Big( \sum_{k=1}^{n_j} \|w_{jk}\|_2 \Big) \Big]^2 + C \sum_i \xi_i \]
\[ \text{s.t.}\quad y_i \Big( \sum_{j=1}^{n} \sum_{k=1}^{n_j} w_{jk}^\top \Phi_{jk}(x_i) - b \Big) \ge 1 - \xi_i, \quad \xi_i \ge 0 \]
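The difference between the two regularizers can be seen on a toy weight configuration. The sketch below is only an illustration under the assumption of explicit weight vectors; the function names and the values in `W` are hypothetical. The mixed norm sums the (unsquared) norms within each mode and takes the maximum across modes.

```python
import math

def l2_norm(w):
    """Euclidean norm of one weight vector w_jk."""
    return math.sqrt(sum(c * c for c in w))

def svm_regularizer(W):
    """0.5 * sum_j sum_k ||w_jk||_2^2 (all kernels pooled)."""
    return 0.5 * sum(l2_norm(w) ** 2 for group in W for w in group)

def mixed_norm_regularizer(W):
    """0.5 * (max_j sum_k ||w_jk||_2)^2 (sum within a mode, max across modes)."""
    return 0.5 * max(sum(l2_norm(w) for w in group) for group in W) ** 2

# Hypothetical weights: W[j][k] is w_jk for the k-th kernel of the j-th mode.
W = [
    [[3.0, 4.0], [0.0, 0.0]],   # mode 0: norms 5 and 0, sum 5
    [[1.0, 0.0], [0.0, 2.0]],   # mode 1: norms 1 and 2, sum 3
]

reg_svm = svm_regularizer(W)            # 0.5 * (25 + 0 + 1 + 4) = 15.0
reg_mixed = mixed_norm_regularizer(W)   # 0.5 * 5**2 = 12.5
```

Only the mode with the largest within-mode sum determines the mixed-norm value, so the other modes can carry weight at no extra cost, while the sum of unsquared norms inside a mode favors few active kernels.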

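19 Dual Formulation:
\[ \min_{\lambda_j \in \Delta_{n_j}\ \forall j}\ \ \max_{\gamma \in \Delta_n,\ \alpha \in S}\ \mathbf{1}^\top \alpha - \frac{1}{2}\, \alpha^\top Y \Big( \sum_{j=1}^{n} \frac{\sum_{k=1}^{n_j} \lambda_{jk} K_{jk}}{\gamma_j} \Big) Y \alpha \tag{1} \]
Here $\Delta_d$ denotes the probability simplex in $\mathbb{R}^d$ (non-negative entries summing to 1).
Comments:
  Equivalent to the SVM formulation with $K \equiv \sum_{j=1}^{n} \frac{1}{\gamma_j} \big( \sum_{k=1}^{n_j} \lambda_{jk} K_{jk} \big)$.
  $\gamma_j$ sets the weight for the $j$-th mode (through $1/\gamma_j$) and $\lambda_{jk}$ is the weight for the $k$-th kernel in the $j$-th mode.
  $\gamma_j > 0$, $j = 1, \dots, n$, provided the $K_{jk}$ are positive definite.
  $\lambda_{jk}$ is highly sparse for each $j$.
  $n = 1$ gives back MKL!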
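The effective kernel in the first comment can be assembled directly from the per-mode and per-kernel weights. The sketch below is a minimal illustration, not a solver: `effective_kernel` is a hypothetical helper, and the Gram matrices and the values of `lam` and `gamma` are made up rather than optimized.

```python
def effective_kernel(K, lam, gamma):
    """K_eff = sum_j (1 / gamma_j) * sum_k lam[j][k] * K[j][k]."""
    m = len(K[0][0])
    K_eff = [[0.0] * m for _ in range(m)]
    for K_j, lam_j, gamma_j in zip(K, lam, gamma):
        for K_jk, lam_jk in zip(K_j, lam_j):
            for r in range(m):
                for c in range(m):
                    K_eff[r][c] += lam_jk * K_jk[r][c] / gamma_j
    return K_eff

A = [[1.0, 0.5], [0.5, 1.0]]
B = [[2.0, 0.0], [0.0, 2.0]]
C = [[1.0, 0.0], [0.0, 1.0]]

# Two modes: mode 0 has kernels A and B, mode 1 has kernel C.
# lam[j] lies on a simplex for each mode; gamma lies on a simplex across modes.
K_two_modes = effective_kernel(
    [[A, B], [C]],
    lam=[[1.0, 0.0], [1.0]],   # sparse within mode 0: only A is used
    gamma=[0.5, 0.5],          # both modes keep a strictly positive gamma_j
)

# One mode (n = 1): gamma = [1.0], so K_eff is a plain convex combination (MKL).
K_one_mode = effective_kernel([[A, B]], lam=[[0.25, 0.75]], gamma=[1.0])
```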

21 Efficient Solver
Pose as an SOCP; solve using SeDuMi or Mosek.
Extensions of iterative algorithms in the MKL literature suffer from non-convexity problems.
Mirror-descent based algorithm:
  Iterative algorithm solving an SVM at each step.
  Far more scalable than state-of-the-art MKL solvers ($n = 1$).
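For readers unfamiliar with mirror descent on a simplex, the sketch below shows the generic entropic (multiplicative) update for one weight vector. It is not the algorithm from the talk: there, each iteration would obtain the gradient by solving an SVM, whereas here a fixed, hypothetical gradient stands in for that step, and the function name and step size are assumptions.

```python
import math

def entropic_mirror_descent_step(lam, grad, step_size):
    """One mirror-descent step on the probability simplex.

    Uses the entropy mirror map, which gives a multiplicative update
    followed by renormalization, so lam stays non-negative and sums to 1.
    """
    unnorm = [l * math.exp(-step_size * g) for l, g in zip(lam, grad)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Start from uniform kernel weights within one mode.
lam = [1.0 / 3.0] * 3

# Hypothetical fixed gradient; in an MKL setting it would be recomputed
# at every iteration from the current SVM solution.
grad = [0.9, 0.1, 0.5]

for _ in range(50):
    lam = entropic_mirror_descent_step(lam, grad, step_size=1.0)
```

With a fixed gradient the weights concentrate on the coordinate with the smallest gradient entry, which mirrors the sparsity of $\lambda_{jk}$ within a mode.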

22 Results: Object Categorization
Figure: Plot of average gain (%) in accuracy with MixNorm-MKL on Caltech-101. Panels: average gain w.r.t. L1 MKL (%) and average gain w.r.t. L2 MKL (%), each plotted against object categories.

23 Results: Scaling
Figure: Scaling plots comparing the mirror-descent based algorithm and SimpleMKL. Axes: training time (seconds) against log10(number of kernels). Legend: MixNorm MKL, L1 MKL.

24 THANK YOU
