Performance Limits on the Classification of Kronecker-structured Models

Ishan Jindal and Matthew Nokleby
Electrical and Computer Engineering, Wayne State University

Motivation: Subspace models
- (Unions of) subspace models are useful for many signal processing and machine learning problems.
- Represent signals via an (overcomplete) dictionary: y = Ax + z.
- Offers a compact representation.
- Efficient algorithms for classification, clustering, and dictionary learning.
- Good understanding of fundamental performance limits.
[Basri and Jacobs, "Lambertian Reflectance and Linear Subspaces," 2003]
[Aharon et al., "K-SVD: An Algorithm for Designing Overcomplete Dictionaries," 2006]
[Zhang and Li, "Discriminative K-SVD for Dictionary Learning," 2010]
[Heckel et al., "Dimensionality-Reduced Subspace Clustering," 2015]
[Nokleby, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]

Motivation: Kronecker-structured models
- Matrix-valued subspace model: Y = A X B^T + Z.
- Constituent dictionaries for the rows and the columns of the signal.
- Accounts for multidimensional data structure.
- Useful for video inpainting and tomographic image reconstruction.
- Even more compact representation.
- Efficient dictionary learning methods.
- What is the classification performance?
[Figure: video data cube with "time" and "space" axes]
[Tsiligkaridis and Hero, "Covariance Estimation via Kronecker Product Expansion," 2013]
[Hawe et al., "Separable Dictionary Learning," 2013]
[Shakeri et al., "Minimax Lower Bounds for Kronecker-Structured Dictionary Learning," 2016]

Contributions
- Information-theoretic analysis of the classification performance of K-S models.
- Performance metric, the diversity order: how fast does the classification error rate decay as the SNR approaches infinity?
- Main results: an exact derivation of the diversity order, and the diversity gap relative to unstructured subspace models.
- Not in this talk: the (classification) capacity, i.e., how many K-S subspaces one can discern in the limit of large signal dimension.
[Nokleby, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]

Problem Definition: Signal model
- The signal belongs to one of L classes.
- Signal model for class l, 1 ≤ l ≤ L: Y = A_l X B_l^T + Z, where
  Y ∈ R^{m_1 × m_2}: signal of interest,
  A_l ∈ R^{m_1 × n_1}: column subspace basis,
  B_l ∈ R^{m_2 × n_2}: row subspace basis,
  X ∈ R^{n_1 × n_2}: coefficients,
  Z ∈ R^{m_1 × m_2}: model noise,
  with n_1 < m_1 and n_2 < m_2.
- Signal columns live near an n_1-dimensional subspace; signal rows live near an n_2-dimensional subspace.

Problem Definition: Signal model
- It is useful to work with the vectorized signal model: y = (B_l ⊗ A_l) x + z, where y = vec(Y) ∈ R^M, x = vec(X) ∈ R^N, z = vec(Z) ∈ R^M, M = m_1 m_2, and N = n_1 n_2.
- The signal lives near an n_1 n_2-dimensional subspace with a Kronecker-structured dictionary.
- Classification is over a restricted set of subspace models.
[Figure: the m_1 × m_2 signal expressed as a combination of n_1 × n_2 basis elements]
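
A minimal NumPy check of the vectorization identity behind this step, vec(A X B^T) = (B ⊗ A) vec(X) with column-major vec; the dimensions below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Illustrative dimensions (assumption); any m1 > n1, m2 > n2 works.
m1, n1, m2, n2 = 6, 2, 5, 3
rng = np.random.default_rng(0)

A = rng.standard_normal((m1, n1))   # column-subspace basis
B = rng.standard_normal((m2, n2))   # row-subspace basis
X = rng.standard_normal((n1, n2))   # coefficient matrix

# Matrix form of the noise-free model: Y = A X B^T
Y = A @ X @ B.T

# Vectorized form: vec(Y) = (B kron A) vec(X), using column-major (Fortran-order) vec
D = np.kron(B, A)                   # K-S dictionary, shape (m1*m2, n1*n2)
print(np.allclose(Y.flatten(order="F"), D @ X.flatten(order="F")))  # True
```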

Problem Definition: Classification model
- Let the coefficients and the i.i.d. noise be Gaussian: x ~ N(0, I), z ~ N(0, σ² I).
- The noise variance σ² quantifies the deviation from the K-S model.
- The class-conditional distribution is Gaussian: p(y | A_l, B_l) = N(0, (B_l ⊗ A_l)(B_l ⊗ A_l)^T + σ² I).
- Equivalent to classification of a Gaussian mixture model with matrix normal components.
- A classifier estimates the class label l from y (e.g., by the ML rule).
- We want to understand the probability of error: P_e = (1/L) Σ_{l=1}^{L} Pr( l̂ ≠ l | y ~ p(y | A_l, B_l) ).
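
A sketch of the ML rule for this Gaussian mixture, assuming the class covariances D_l D_l^T + σ² I are known; the helper name ml_classify and the dimensions are illustrative assumptions:

```python
import numpy as np

def ml_classify(y, dictionaries, sigma2):
    """Return the maximum-likelihood class label for a vectorized sample y,
    assuming zero-mean Gaussian classes with covariance D_l D_l^T + sigma2*I."""
    scores = []
    for D in dictionaries:
        cov = D @ D.T + sigma2 * np.eye(D.shape[0])
        _, logdet = np.linalg.slogdet(cov)
        # Log-likelihood up to an additive constant shared by all classes
        scores.append(-0.5 * (logdet + y @ np.linalg.solve(cov, y)))
    return int(np.argmax(scores))

# Illustrative usage with two randomly drawn K-S classes (dimensions are assumptions)
rng = np.random.default_rng(1)
m1, n1, m2, n2, sigma2 = 6, 2, 5, 2, 1e-2
dicts = [np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
         for _ in range(2)]
y = dicts[0] @ rng.standard_normal(n1 * n2) + np.sqrt(sigma2) * rng.standard_normal(m1 * m2)
print(ml_classify(y, dicts, sigma2))  # 0 with high probability
```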

Diversity Order
- How fast does the error probability decay in the model noise?
- Fix L and the bases A_l, B_l for each class.
- Definition: the diversity order is the exponent d = lim_{σ² → 0} -log(P_e) / ((1/2) log(1/σ²)).
- The diversity order isolates the impact of the subspace model on performance.
- For standard subspace models (m_2 = n_2 = 1): d = min{n_1, m_1 - n_1}.
- "Equivalent K-S subspace" conjecture: d = min{n_1 n_2, m_1 m_2 - n_1 n_2}.
[Nokleby, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]

Bhattacharyya Bound
- Main analytical tool: the Bhattacharyya bound.
- Lemma (Bhattacharyya bound): For two zero-mean Gaussian classes with covariances Σ_1 and Σ_2 and equal priors, the pairwise maximum-likelihood classification error is bounded by
  P_e ≤ (1/2) ( |Σ_1|^{1/2} |Σ_2|^{1/2} / |(Σ_1 + Σ_2)/2| )^{1/2}.
- The bound is tight w.r.t. the exponent of the pairwise error as P_e → 0.
- The union bound is exponentially tight, too.
- Diversity order: the worst pairwise exponent of this bound.
[Kailath, "The Divergence and Bhattacharyya Distance Measures in Signal Selection," 1967]
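
A small numerical sketch of this lemma, evaluated in log space for stability; the function name and the test covariances are illustrative assumptions:

```python
import numpy as np

def bhattacharyya_bound(cov1, cov2):
    """Bhattacharyya upper bound on the pairwise ML error for two zero-mean
    Gaussians with equal priors:
        P_e <= 0.5 * (|S1|^(1/2) |S2|^(1/2) / |(S1 + S2)/2|)^(1/2)."""
    _, ld1 = np.linalg.slogdet(cov1)
    _, ld2 = np.linalg.slogdet(cov2)
    _, ldm = np.linalg.slogdet(0.5 * (cov1 + cov2))
    # Exponentiate the log of the bound for numerical stability
    return 0.5 * np.exp(0.25 * (ld1 + ld2) - 0.5 * ldm)

# Example with two arbitrary positive-definite covariances
rng = np.random.default_rng(2)
M1, M2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
print(bhattacharyya_bound(M1 @ M1.T + np.eye(4), M2 @ M2.T + np.eye(4)))
```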

Diversity Order: Bhattacharyya bound
- Recall the class-conditional distributions: p(y | A_l, B_l) = N(0, (B_l ⊗ A_l)(B_l ⊗ A_l)^T + σ² I).
- Define the Kronecker-structured dictionary for each class: D_l = B_l ⊗ A_l ∈ R^{m_1 m_2 × n_1 n_2}.
- The pairwise error probability between classes i and j is bounded by
  P_e(D_i, D_j) ≤ (1/2) ( |D_i D_i^T + σ² I|^{1/2} |D_j D_j^T + σ² I|^{1/2} / |(D_i D_i^T + D_j D_j^T + 2σ² I)/2| )^{1/2}.
- Simplifying, for almost every class pair the bound decays as (σ²)^{(r_ij - n_1 n_2)/2}, with a constant governed by λ_ij, where
  r_ij: rank of D_i D_i^T + D_j D_j^T,
  λ_ij: smallest nonzero eigenvalue of D_i D_i^T + D_j D_j^T.
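
As a quick check of this step, one can evaluate the bound for two randomly drawn K-S classes as σ² shrinks and compare its log-slope to (r_ij - n_1 n_2)/2; the dimensions and noise grid below are illustrative assumptions:

```python
import numpy as np

# Evaluate the Bhattacharyya bound for two random K-S classes as sigma^2 -> 0.
# Its log should fall off with slope (r_ij - n1*n2)/2 in log(sigma^2).
rng = np.random.default_rng(3)
m1, n1, m2, n2 = 6, 2, 6, 2
Di = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
Dj = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
r_ij = np.linalg.matrix_rank(Di @ Di.T + Dj @ Dj.T)

for sigma2 in [1e-1, 1e-2, 1e-3, 1e-4]:
    S_i = Di @ Di.T + sigma2 * np.eye(m1 * m2)
    S_j = Dj @ Dj.T + sigma2 * np.eye(m1 * m2)
    _, ld_i = np.linalg.slogdet(S_i)
    _, ld_j = np.linalg.slogdet(S_j)
    _, ld_m = np.linalg.slogdet(0.5 * (S_i + S_j))
    log_bound = np.log(0.5) + 0.25 * (ld_i + ld_j) - 0.5 * ld_m
    # Ratio tends to (r_ij - n1*n2)/2 as sigma2 -> 0
    print(sigma2, r_ij, log_bound / np.log(sigma2))
```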

Rank Analysis
- Theorem: For any K-S classification problem, the diversity order is d = r - n_1 n_2, where r = min_{i≠j} rank(D_i D_i^T + D_j D_j^T).
- Almost every dictionary D_l has rank n_1 n_2.
- In general: rank(D_1 D_1^T + D_2 D_2^T) = rank(D_1) + rank(D_2) - dim(R(D_1) ∩ R(D_2)).

Rank of the Sum of K-S Dictionaries
- Lemma: Let the intersections of the column and row range spaces satisfy dim[R(A_i) ∩ R(A_j)] = x and dim[R(B_i) ∩ R(B_j)] = y. Then dim[R(B_i ⊗ A_i) ∩ R(B_j ⊗ A_j)] = x y.
- Intuition: if the row or column subspaces are disjoint, the entire K-S subspaces are disjoint.
- Corollary: For almost every pair of K-S dictionaries, rank(D_i D_i^T + D_j D_j^T) = 2 n_1 n_2 - [2 n_1 - m_1]^+ [2 n_2 - m_2]^+.
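
A numerical sanity check of the corollary for randomly drawn (generic) dictionaries; the dimensions are chosen so the correction term [2n_1 - m_1]^+ [2n_2 - m_2]^+ is nonzero, and the helper name is an illustrative assumption:

```python
import numpy as np

def ks_rank_formula(m1, n1, m2, n2):
    """Predicted rank of D_i D_i^T + D_j D_j^T for generic K-S dictionaries:
    2*n1*n2 - max(2*n1 - m1, 0) * max(2*n2 - m2, 0)."""
    return 2 * n1 * n2 - max(2 * n1 - m1, 0) * max(2 * n2 - m2, 0)

rng = np.random.default_rng(5)
m1, n1, m2, n2 = 5, 3, 4, 3  # here 2*n1 > m1 and 2*n2 > m2, so the correction is active
Di = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
Dj = np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))

empirical = np.linalg.matrix_rank(Di @ Di.T + Dj @ Dj.T)
print(empirical, ks_rank_formula(m1, n1, m2, n2))  # should agree for generic draws
```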

Main Result: Diversity Order
- Corollary: For almost every K-S classification problem, the diversity order is d = n_1 n_2 - [2 n_1 - m_1]^+ [2 n_2 - m_2]^+.
- How does this compare to the unstructured conjecture d = min{n_1 n_2, m_1 m_2 - n_1 n_2} = n_1 n_2 - [2 n_1 n_2 - m_1 m_2]^+ ?
- When the subspace dimensions are small, the diversity orders coincide.
- For larger dimensions, the K-S diversity order is smaller than that of unstructured subspaces.
- The gap is bilinear in the constituent dimensions, hence quadratic in the overall dimension.
[Nokleby, Rodrigues, Calderbank, "Discrimination on the Grassmann Manifold," 2015]
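
The two formulas are easy to compare directly; a small sketch (function names and example dimensions are illustrative assumptions) showing one case with no gap and one where the K-S diversity order falls below the unstructured value:

```python
def diversity_ks(m1, n1, m2, n2):
    # K-S diversity order from the corollary above
    return n1 * n2 - max(2 * n1 - m1, 0) * max(2 * n2 - m2, 0)

def diversity_unstructured(m1, n1, m2, n2):
    # Conjectured diversity order for an unstructured subspace of the same size
    return min(n1 * n2, m1 * m2 - n1 * n2)

# Small subspaces (2*n_i <= m_i): no gap
print(diversity_ks(8, 3, 8, 3), diversity_unstructured(8, 3, 8, 3))  # 9 9
# Larger subspaces: K-S diversity is smaller, gap = 2*2 = 4
print(diversity_ks(8, 5, 8, 5), diversity_unstructured(8, 5, 8, 5))  # 21 25
```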

Numerical Results: Synthetic data
- Synthetic data: L = 2 K-S bases drawn randomly; ML classification performance plotted vs. SNR.
- Numerically validates the diversity results.
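
A rough sketch of such a synthetic experiment, under assumed dimensions, noise levels, and trial counts (not the talk's actual settings): draw two K-S classes, classify with the ML rule, and read the diversity order off the high-SNR slope of -log(P_e) versus (1/2) log(1/σ²):

```python
import numpy as np

rng = np.random.default_rng(6)
m1, n1, m2, n2, L = 6, 2, 6, 2, 2
dicts = [np.kron(rng.standard_normal((m2, n2)), rng.standard_normal((m1, n1)))
         for _ in range(L)]

def error_rate(sigma2, trials=5000):
    """Monte Carlo estimate of the ML classification error at noise level sigma2."""
    covs = [D @ D.T + sigma2 * np.eye(m1 * m2) for D in dicts]
    logdets = [np.linalg.slogdet(C)[1] for C in covs]
    invs = [np.linalg.inv(C) for C in covs]
    errors = 0
    for _ in range(trials):
        label = rng.integers(L)
        y = dicts[label] @ rng.standard_normal(n1 * n2) \
            + np.sqrt(sigma2) * rng.standard_normal(m1 * m2)
        scores = [-0.5 * (logdets[k] + y @ invs[k] @ y) for k in range(L)]
        errors += int(np.argmax(scores) != label)
    return errors / trials

# On a log-log scale the slope approaches the diversity order; resolving very
# small error rates at high SNR requires many more trials than used here.
for sigma2 in [1.0, 0.3, 0.1, 0.03]:
    print(sigma2, error_rate(sigma2))
```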

Numerical Results: YaleB
- Learned K-S dictionaries from 10 samples per class of the YaleB face dataset.
- Compared to standard subspace classification and classification from SIFT features.
- Much more compact representation with competitive classification accuracy.
[Table: test accuracy (%) and number of parameters for subspace, SIFT [1], and K-S classifiers on YaleB faces]
[Figure: learned K-S dictionary elements]
[1] Ramirez et al., "Classification and Clustering via Dictionary Learning," 2010

Conclusion
Summary:
- Studied the classification performance of Kronecker-structured subspace models.
- Derived the exact diversity order.
- Found that the error rate decays more slowly (lower diversity order) than for unstructured subspace models.
Future work:
- Discriminative K-S dictionary/subspace learning.
- Good feature extraction for K-S data.
- Higher-order tensor models.
