Compressed Sensing Meets Machine Learning - Classification of Mixture Subspace Models via Sparse Representation


1 Compressed Sensing Meets Machine Learning - Classification of Mixture Subspace Models via Sparse Representation
Allen Y. Yang <yang@eecs.berkeley.edu>
Feb 25, 2008, UC Berkeley

2 What is Sparsity
Sparsity: A signal is sparse if most of its coefficients are (approximately) zero.
Figure: 2-D DCT transform. (a) Harmonic functions (b) Magnitude spectrum

3 Sparsity in spatial domain: gene microarray data [Drmanac et al. 1993]

4 Sparsity in human visual cortex [Olshausen & Field 1997, Serre & Poggio 2006]
1. Feed-forward: No iterative feedback loop.
2. Redundancy: On average 80-200 neurons for each feature representation.
3. Recognition: Information exchange between stages is not about individual neurons, but rather about how many neurons fire together as a group.

5 Sparsity and $\ell_1$-Minimization
Black gold age [Claerbout & Muir 1973; Taylor, Banks & McCoy 1979]
Figure: Deconvolution of spike train

6 Sparse Support Estimators
Sparse support estimator [Donoho 1992, Meinshausen & Bühlmann 2006, Yu 2006, Wainwright 2006, Ramchandran 2007, Gastpar 2007]
Basis pursuit [Chen & Donoho 1999]: Given $y = Ax$ with $x$ unknown,
$x^* = \arg\min \|x\|_1$ subject to $y = Ax$
The Lasso (least absolute shrinkage and selection operator) [Tibshirani 1996]:
$x^* = \arg\min \|y - Ax\|_2$ subject to $\|x\|_1 \le k$
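The Lasso above is stated in constrained form. As a rough illustration only, the sketch below solves the closely related penalized form $\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$ with iterative soft thresholding (ISTA). ISTA, the squared residual, the function names, and the parameters `lam`, `step`, and `iters` are assumptions made for this example; they do not appear in the slides.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, lam=0.1, step=None, iters=500):
    """Approximately minimize 0.5 * ||y - A x||_2^2 + lam * ||x||_1.

    A is a list of d rows, each of length n; y has length d.
    """
    d, n = len(A), len(A[0])
    if step is None:
        # Conservative step size: 1 / ||A||_F^2, which is <= 1 / ||A||_2^2.
        step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(d)]
        # Gradient of the smooth term: g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(d)) for j in range(n)]
        # Gradient step followed by soft thresholding
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


if __name__ == "__main__":
    A = [[1.0, 0.0, 0.7], [0.0, 1.0, 0.7]]
    y = [2.0, 0.0]  # generated by the sparse vector (2, 0, 0)
    print(ista(A, y, lam=0.1, iters=2000))
```

The penalty weight `lam` plays the role of the bound $k$ in the constrained form: a larger `lam` corresponds to a tighter bound on $\|x\|_1$.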

7 Taking Advantage of Sparsity
What generates sparsity? (d'après Emmanuel Candès)
Measure first, analyze later.
Curse of dimensionality.
1. Numerical analysis: sparsity reduces the cost of storage and computation.
2. Regularization in classification.
Figure: Linear support vector machine (SVM). (a) decision boundary (b) maximal margin

8 Our Contributions
1. Classification via compressed sensing
2. Performance in face recognition
3. Extensions: outlier rejection, occlusion compensation
4. Distributed pattern recognition in sensor networks

9 Problem Formulation in Face Recognition
1. Notations
Training: For $K$ classes, collect training samples $\{v_{1,1}, \dots, v_{1,n_1}\}, \dots, \{v_{K,1}, \dots, v_{K,n_K}\} \subset \mathbb{R}^D$.
Test: Present a new $y \in \mathbb{R}^D$, solve for $\mathrm{label}(y) \in [1, 2, \dots, K]$.
2. Construct the $\mathbb{R}^D$ sample space via stacking.
Figure: For images, assume 3-channel image, D = e6
3. Assume $y$ belongs to Class $i$ [Belhumeur et al. 1997, Basri & Jacobs 2003]:
$y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \dots + \alpha_{i,n_i} v_{i,n_i} = A_i \alpha_i$,
where $A_i = [v_{i,1}, v_{i,2}, \dots, v_{i,n_i}]$.

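10
1. Nevertheless, the class label $i$ is the unknown we need to solve for. Global representation:
$y = [A_1, A_2, \dots, A_K] \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_K \end{pmatrix} = A x_0$
2. Over-determined system: $A \in \mathbb{R}^{D \times n}$, where $D \gg n = n_1 + \dots + n_K$.
$x_0$ encodes the membership of $y$: if $y$ belongs to Subject $i$,
$x_0 = [\,0 \cdots 0 \;\; \alpha_i^T \;\; 0 \cdots 0\,]^T \in \mathbb{R}^n$
Problems to face:
Solving for $x_0$ in $\mathbb{R}^D$ is intractable.
The true solution $x_0$ is sparse: on average only $1/K$ of its entries are nonzero.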

11 Dimensionality Reduction
1. Construct a linear projection $R \in \mathbb{R}^{d \times D}$, where $d$ is the feature dimension and $d \ll D$:
$\tilde{y} = Ry = RAx_0 = \tilde{A} x_0 \in \mathbb{R}^d$
$\tilde{A} \in \mathbb{R}^{d \times n}$, but $x_0$ is unchanged.
2. Holistic features: Eigenfaces [Turk 1991], Fisherfaces [Belhumeur 1997], Laplacianfaces [He 2005]
3. Partial features
4. Unconventional features: downsampled faces, random projections
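The claim that $x_0$ is unchanged under the projection can be checked numerically. The sketch below uses a Gaussian random projection as one possible choice of $R$; the scaling by $1/\sqrt{d}$, the seed, and all function names are assumptions for illustration, not taken from the slides.

```python
import random


def random_projection(d, D, seed=0):
    """Return a d x D matrix with i.i.d. Gaussian entries (scaled by 1/sqrt(d))."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / d ** 0.5 for _ in range(D)] for _ in range(d)]


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(M, A):
    """Matrix-matrix product M @ A for matrices stored as lists of rows."""
    cols = list(zip(*A))
    return [[sum(m * a for m, a in zip(row, col)) for col in cols] for row in M]


if __name__ == "__main__":
    # Toy dictionary A in R^{4 x 2} and coefficient vector x0.
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    x0 = [0.5, -2.0]
    y = matvec(A, x0)

    R = random_projection(d=2, D=4, seed=1)
    y_tilde = matvec(R, y)            # R y
    A_tilde = matmul(R, A)            # R A
    print(y_tilde)
    print(matvec(A_tilde, x0))        # same vector up to rounding: same x0 explains both
```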

12 $\ell_0$-Minimization
1. Solving for the sparsest solution via $\ell_0$-minimization:
$x_0 = \arg\min_x \|x\|_0$ s.t. $\tilde{y} = \tilde{A}x$
$\|x\|_0$ simply counts the number of nonzero terms.
2. $\ell_0$-ball
The $\ell_0$-ball is not convex.
$\ell_0$-minimization is NP-hard.

13 $\ell_1/\ell_0$ Equivalence
1. Compressed sensing: If $x_0$ is sparse enough, $\ell_0$-minimization is equivalent to
$(P_1)\quad \min \|x\|_1$ s.t. $\tilde{y} = \tilde{A}x$,
where $\|x\|_1 = |x_1| + |x_2| + \dots + |x_n|$.
2. $\ell_1$-ball
$\ell_1$-minimization is convex.
Its solution equals that of $\ell_0$-minimization.
3. $\ell_1/\ell_0$ equivalence [Donoho 2002, 2004; Candès et al. 2004; Baraniuk 2006]:
Given $\tilde{y} = \tilde{A}x_0$, there exists an equivalence breakdown point (EBP) $\rho(\tilde{A})$ such that, if $\|x_0\|_0 < \rho$:
the $\ell_1$-solution is unique, and
$x_1 = x_0$.
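A small worked example may help show why the $\ell_1$ objective favors sparse solutions where a least-squares objective does not. The one-equation system below is a toy case chosen for illustration; it is not from the slides.

```latex
\begin{aligned}
&\text{Constraint: } x_1 + 2x_2 = 2 \quad (\tilde{A} = [\,1 \;\; 2\,],\ \tilde{y} = 2).\\
&\ell_2 \text{ minimum-norm solution: } x = \tilde{A}^T(\tilde{A}\tilde{A}^T)^{-1}\tilde{y} = (0.4,\ 0.8),
  \quad \|x\|_0 = 2,\ \|x\|_1 = 1.2.\\
&\ell_1 \text{ solution: on the line, } \|x\|_1 = |2 - 2x_2| + |x_2| \text{ is minimized at } x = (0,\ 1),
  \quad \|x\|_0 = 1,\ \|x\|_1 = 1.
\end{aligned}
```

Here the $\ell_1$ minimizer attains the smallest possible nonzero count for this constraint, while the $\ell_2$ minimizer spreads energy over both coordinates.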

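14 $\ell_1$-Minimization Routines
Matching pursuit [Mallat 1993]
1. Find the vector $v_i$ in $\tilde{A}$ most correlated with $y$: $i = \arg\max_j \langle y, v_j \rangle$.
2. $\tilde{A} \leftarrow \tilde{A}_{\hat{i}}$ ($\tilde{A}$ with column $i$ removed), $x_i \leftarrow \langle y, v_i \rangle$, $y \leftarrow y - x_i v_i$.
3. Repeat until $\|y\| < \epsilon$.
Basis pursuit [Chen 1998]
1. Assume $x_0$ is $m$-sparse.
2. Select $m$ linearly independent vectors $B_m$ in $\tilde{A}$ as a basis: $x_m = B_m^{\dagger} y$.
3. Repeatedly swap one basis vector in $B_m$ with another vector in $\tilde{A}$ if this improves $\|y - B_m x_m\|$.
4. If $\|y - B_m x_m\|_2 < \epsilon$, stop.
Quadratic solvers: $\tilde{y} = \tilde{A}x_0 + z \in \mathbb{R}^d$, where $\|z\|_2 < \epsilon$:
$x^* = \arg\min \{ \|x\|_1 + \lambda \|\tilde{y} - \tilde{A}x\|_2 \}$
[Lasso, second-order cone programming]: more expensive.
Matlab toolboxes:
$\ell_1$-Magic by Candès at Caltech
SparseLab by Donoho at Stanford
cvx by Boyd at Stanford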
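The matching pursuit steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: columns are assumed to have unit norm, the selection uses the absolute correlation, and a selected column stays available for later iterations with its coefficient accumulated (the standard variant), whereas the slide removes the chosen column. The names `matching_pursuit`, `eps`, and `max_iter` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(columns, y, eps=1e-6, max_iter=100):
    """Greedy sparse approximation of y by unit-norm columns.

    Returns (x, residual), where x[j] is the coefficient of columns[j].
    """
    residual = list(y)
    x = [0.0] * len(columns)
    for _ in range(max_iter):
        if dot(residual, residual) ** 0.5 < eps:
            break
        # Step 1: column most correlated with the current residual.
        i = max(range(len(columns)), key=lambda j: abs(dot(residual, columns[j])))
        c = dot(residual, columns[i])
        # Step 2: record the coefficient and remove that component from the residual.
        x[i] += c
        residual = [r - c * v for r, v in zip(residual, columns[i])]
    return x, residual


if __name__ == "__main__":
    cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, res = matching_pursuit(cols, [0.0, 2.0, -1.0])
    print(x, res)
```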

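15 Classification
1. Project $x_1$ onto the face subspaces:
$\delta_1(x_1) = \begin{pmatrix} \alpha_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},\quad \delta_2(x_1) = \begin{pmatrix} 0 \\ \alpha_2 \\ \vdots \\ 0 \end{pmatrix},\quad \dots,\quad \delta_K(x_1) = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \alpha_K \end{pmatrix} \qquad (1)$
2. Define the residual $r_i = \|\tilde{y} - \tilde{A}\delta_i(x_1)\|_2$ for Subject $i$:
$\mathrm{id}(y) = \arg\min_{i=1,\dots,K} \{r_i\}$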
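The residual rule can be written out directly. In the sketch below, the dictionary is stored as a list of columns with a parallel list of class labels; this storage layout and the function name `classify_by_residual` are assumptions made for the example.

```python
import math


def classify_by_residual(columns, labels, x, y):
    """Return (best_label, residuals) using r_k = ||y - A * delta_k(x)||_2.

    columns[j] is the j-th training vector, labels[j] its class,
    x[j] its coefficient in the sparse representation of y.
    """
    residuals = {}
    for k in sorted(set(labels)):
        # A * delta_k(x): reconstruct y using only the coefficients of class k.
        approx = [0.0] * len(y)
        for col, lab, coef in zip(columns, labels, x):
            if lab == k:
                for t in range(len(y)):
                    approx[t] += coef * col[t]
        residuals[k] = math.sqrt(sum((yt - at) ** 2 for yt, at in zip(y, approx)))
    best = min(residuals, key=residuals.get)
    return best, residuals


if __name__ == "__main__":
    columns = [[1.0, 0.0], [0.0, 1.0]]   # one training vector per class
    labels = [1, 2]
    x = [0.9, 0.1]                        # coefficients concentrated on class 1
    y = [0.9, 0.1]
    print(classify_by_residual(columns, labels, x, y))
```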

16 AR Database
100 subjects (illumination and expression variance). Recognition rates [%] by feature type and feature dimension; "..." marks entries missing from the source.

Table I: Nearest Neighbor
Dimension       30     54     130    540
Eigen [%]       68.1   74.8   79.3   80.5
Laplacian [%]   73.1   ...    ...    ...
Random [%]      ...    ...    ...    ...
Down [%]        ...    ...    69.2   73.7
Fisher [%]      83.4   86.8   N/A    N/A

Table II: Nearest Subspace
Dimension       30     54     130    540
Eigen [%]       ...    ...    ...    ...
Laplacian [%]   ...    ...    ...    ...
Random [%]      59.2   68.2   80     83.3
Down [%]        56.2   67.7   77     82.1
Fisher [%]      80.3   85.8   N/A    N/A

Table III: Linear SVM
Dimension       30     54     130    540
Eigen [%]       ...    84.3   89     92
Laplacian [%]   73.4   85.8   90.8   95.7
Random [%]      54.1   ...    ...    ...
Down [%]        ...    ...    ...    ...
Fisher [%]      ...    ...    N/A    N/A

Table IV: $\ell_1$-Minimization
Dimension       30     54     130    540
Eigen [%]       71.1   ...    ...    ...
Laplacian [%]   ...    ...    ...    ...
Random [%]      ...    ...    ...    ...
Down [%]        ...    ...    ...    ...
Fisher [%]      ...    ...    N/A    N/A

17
Sparsity vs. non-sparsity: $\ell_1$ and SVM decisively outperform NN and NS.
1. Our framework seeks sparsity in the representation of $y$.
2. SVM seeks sparsity in decision boundaries on $A = [v_1, \dots, v_n]$.
3. NN and NS do not enforce sparsity.
$\ell_1$-minimization vs. SVM: Performance of SVM depends on the choice of features.
1. Random projection performs poorly with SVMs.
2. The performance of $\ell_1$-minimization converges across different features.
3. In lower-dimensional spaces, Fisher features outperform the others.
Tables shown for comparison: Table III (Linear SVM) and Table IV ($\ell_1$-Minimization).

18 Randomfaces
Blessing of dimensionality [Donoho 2000]: In a high-dimensional data space $\mathbb{R}^D$, with overwhelming probability, $\ell_1/\ell_0$ equivalence holds for a random projection $R$.
Unconventional properties:
1. Domain independent!
2. Data independent!
3. Fast to generate and compute!
Reference: Yang et al. Feature selection in face recognition: A sparse representation perspective. Berkeley Tech Report, 2007.

19 Variation: Outlier Rejection
Figure: $\ell_1$-coefficients for invalid images
Outlier rejection: When the $\ell_1$-solution is not sparse or not concentrated on one subspace, the test sample is invalid.
Sparsity Concentration Index:
$\mathrm{SCI}(x) = \dfrac{K \cdot \max_i \|\delta_i(x)\|_1 / \|x\|_1 - 1}{K - 1} \in [0, 1]$
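The index is simple to compute once each coefficient is tagged with its class. The sketch below assumes the coefficients and class labels are stored as parallel lists; the function name `sci` and the error handling are choices made for this example.

```python
def sci(x, labels):
    """Sparsity Concentration Index of coefficient vector x.

    labels[j] is the class of coefficient x[j]. Returns a value in [0, 1]:
    1 when all l1 mass sits in one class, 0 when it is spread evenly.
    """
    classes = sorted(set(labels))
    K = len(classes)
    total = sum(abs(c) for c in x)
    if K < 2 or total == 0.0:
        raise ValueError("SCI needs at least two classes and a nonzero x")
    # max_i ||delta_i(x)||_1: largest per-class share of the l1 norm.
    best = max(
        sum(abs(c) for c, lab in zip(x, labels) if lab == k) for k in classes
    )
    return (K * best / total - 1.0) / (K - 1.0)


if __name__ == "__main__":
    labels = [1, 1, 2, 2]
    print(sci([1.0, 2.0, 0.0, 0.0], labels))   # concentrated on class 1
    print(sci([1.0, 1.0, 1.0, 1.0], labels))   # spread evenly over both classes
```

A test sample would then be rejected as an outlier when `sci(x, labels)` falls below a chosen threshold.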

20 Variation: Occlusion Compensation
1. Sparse representation + sparse error:
$y = Ax + e$
2. Occlusion compensation:
$y = \begin{pmatrix} A & I \end{pmatrix} \begin{pmatrix} x \\ e \end{pmatrix} = Bw$
Reference: Wright et al. Robust face recognition via sparse representation. UIUC Tech Report, 2007.
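The augmented system is only a change of bookkeeping, which the sketch below makes explicit: appending an identity block to $A$ lets the error $e$ be treated as extra coefficients. The helper names and the toy numbers are assumptions for illustration.

```python
def augment_with_identity(A):
    """Return B = [A I] for a d x n matrix A stored as a list of rows."""
    d = len(A)
    return [
        list(row) + [1.0 if i == j else 0.0 for j in range(d)]
        for i, row in enumerate(A)
    ]


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    x = [2.0, -1.0]          # representation coefficients
    e = [0.0, 0.0, 5.0]      # sparse error: one corrupted entry
    y = [ax + ei for ax, ei in zip(matvec(A, x), e)]

    B = augment_with_identity(A)
    w = x + e                # stacked unknown w = [x; e]
    print(y)
    print(matvec(B, w))      # identical to y
```

Any sparse solver for $y = Bw$ then recovers $x$ and $e$ jointly, provided $w$ is sparse enough.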

21 Distributed Pattern Recognition
Figure: d-oracle: Distributed object recognition via camera wireless networks
Key components:
1. Each sensor observes only a partial profile of the event: this demands a global classification framework.
2. Each individual sensor has limited classification ability: sensors become active only when certain events are locally detected.
3. The network configuration is dynamic: the global classifier needs to adapt to changes in the set of active sensors.

22 Problem Formulation for Distributed Action Recognition
Architecture:
8 sensors distributed on the human body
Locations of the sensors are given and fixed
Each sensor carries a triaxial accelerometer and a biaxial gyroscope
Sampling frequency: 20 Hz
Figure: Readings from 8 x-axis accelerometers and x-axis gyroscopes for a stand-kneel-stand sequence

23 Challenges for Distributed Action Recognition
1. Simultaneous segmentation and classification.
2. Individual sensors are not sufficient to classify all human actions.
3. Sensor failure and network congestion are simulated by using different subsets of active sensors.
4. Identity independence: prior examples of the test subject are excluded from the training data.
Figure: Same actions performed by two subjects

24 Mixture Subspace Model for Distributed Action Recognition
1. Training samples are segmented manually with correct labels.
2. On each sensor node $i$, normalize the vector form (via stacking):
$v_i = [x(1), \dots, x(h),\ y(1), \dots, y(h),\ z(1), \dots, z(h),\ \theta(1), \dots, \theta(h),\ \rho(1), \dots, \rho(h)]^T \in \mathbb{R}^{5h}$
3. Full body motion:
Training sample: $v = \begin{pmatrix} v_1 \\ \vdots \\ v_8 \end{pmatrix}$
Test sample: $y = \begin{pmatrix} y_1 \\ \vdots \\ y_8 \end{pmatrix} \in \mathbb{R}^{8 \cdot 5h}$
4. Mixture subspace model:
$y = \begin{pmatrix} y_1 \\ \vdots \\ y_8 \end{pmatrix} = \left[ \begin{pmatrix} v_1 \\ \vdots \\ v_8 \end{pmatrix}_1, \dots, \begin{pmatrix} v_1 \\ \vdots \\ v_8 \end{pmatrix}_n \right] x = Ax$
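The stacking in steps 2 and 3 can be sketched as follows. The function names are hypothetical, and the normalization mentioned on the slide is left out because the slides do not say which normalization is used.

```python
def sensor_vector(x, y, z, theta, rho):
    """Stack h samples of the 3 accelerometer and 2 gyroscope channels into R^{5h}."""
    h = len(x)
    if not all(len(s) == h for s in (y, z, theta, rho)):
        raise ValueError("all five channels must have the same length h")
    return list(x) + list(y) + list(z) + list(theta) + list(rho)


def full_body_vector(sensor_vectors):
    """Concatenate the per-sensor vectors v_1, ..., v_8 into one long vector."""
    return [value for v in sensor_vectors for value in v]


if __name__ == "__main__":
    h = 2
    channel = [0.0] * h
    per_sensor = [sensor_vector(channel, channel, channel, channel, channel)
                  for _ in range(8)]
    v = full_body_vector(per_sensor)
    print(len(per_sensor[0]), len(v))   # 5h and 8 * 5h
```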

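25 Localized Classifiers
Distributed sparse representation:
$\begin{pmatrix} y_1 \\ \vdots \\ y_8 \end{pmatrix} = \left[ \begin{pmatrix} v_1 \\ \vdots \\ v_8 \end{pmatrix}_1, \dots, \begin{pmatrix} v_1 \\ \vdots \\ v_8 \end{pmatrix}_n \right] x$
On each sensor node $i$:
$y_1 = (v_{1,1}, \dots, v_{1,n})\,x,\quad \dots,\quad y_8 = (v_{8,1}, \dots, v_{8,n})\,x$
1. Given a (long) test sequence at time $t$, apply multiple duration hypotheses: $y_i \in \mathbb{R}^{5h}$.
2. Choose Fisher features $R_i \in \mathbb{R}^{10 \times 5h}$:
$\tilde{y}_i = R_i y_i = R_i A_i x = \tilde{A}_i x \in \mathbb{R}^{10}$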

26 Localized Classifiers
1. Equivalently, define $R'_i = (0 \cdots R_i \cdots 0)$:
$\tilde{y}_i = (0 \cdots R_i \cdots 0) \begin{pmatrix} y_1 \\ \vdots \\ y_8 \end{pmatrix} = (0 \cdots R_i \cdots 0) \left[ \begin{pmatrix} v_1 \\ \vdots \\ v_8 \end{pmatrix}_1, \dots, \begin{pmatrix} v_1 \\ \vdots \\ v_8 \end{pmatrix}_n \right] x = R'_i A x \in \mathbb{R}^{10}$
2. For all segmentation hypotheses, apply a sparsity concentration index (SCI) threshold $\sigma_1$.
Figure: (a) valid segmentation (b) invalid segmentation
Local sparsity threshold $\sigma_1$:
If $\mathrm{SCI}(x) > \sigma_1$, sensor $i$ becomes active and transmits $\tilde{y}_i \in \mathbb{R}^{10}$.
$\tilde{y}_i$ provides a segmentation hypothesis at time $t$ and length $h_i$.

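27 Adaptive Global Classifier
Adaptive classification for a subset of active sensors (suppose sensors $1, \dots, L$ are active at time $t$ with lengths $h_i$).
Define the global feature matrix
$R' = \begin{pmatrix} R_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & R_L & \cdots & 0 \end{pmatrix}$:
$\begin{pmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_L \end{pmatrix} = R' \begin{pmatrix} y_1 \\ \vdots \\ y_8 \end{pmatrix} = R' \begin{pmatrix} A_1 \\ \vdots \\ A_8 \end{pmatrix} x = R'Ax$
Global segmentation: Regardless of the number $L$ of active sensors, given a global threshold $\sigma_2$:
If $\mathrm{SCI}(x) > \sigma_2$, accept $y$ as a global segmentation with the label given by $x$.

Distributed Classification via Compressed Sensing
1. Reformulate adaptive classification via the feature matrix $R'$:
Local: $R' = (0 \cdots R_i \cdots 0)$
Global: $R' = \begin{pmatrix} R_1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & & \vdots \\ 0 & \cdots & R_L & \cdots & 0 \end{pmatrix}$
In both cases, $R'y = R'Ax$.
2. The representation $x$ and the training matrix $A$ remain invariant.
3. Segmentation, recognition, and outlier rejection are unified on $x$.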
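Because the global feature matrix is block-structured, it never has to be formed explicitly: stacking the projected blocks of the active sensors gives the same result. The sketch below illustrates this under assumed names (`global_features`, `global_dictionary`) and a toy configuration with three sensors, of which two are active.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(M, A):
    """Matrix-matrix product M @ A for matrices stored as lists of rows."""
    cols = list(zip(*A))
    return [[sum(m * a for m, a in zip(row, col)) for col in cols] for row in M]


def global_features(R_blocks, y_blocks, active):
    """Stack the projected measurements R_i y_i of the active sensors."""
    out = []
    for i in active:
        out.extend(matvec(R_blocks[i], y_blocks[i]))
    return out


def global_dictionary(R_blocks, A_blocks, active):
    """Stack the projected training blocks R_i A_i of the active sensors."""
    rows = []
    for i in active:
        rows.extend(matmul(R_blocks[i], A_blocks[i]))
    return rows


if __name__ == "__main__":
    # Toy setup: 3 sensors, 2 training samples, 1-D features per sensor.
    R_blocks = [[[1, 1]], [[1, -1]], [[2, 0]]]
    A_blocks = [[[1, 0], [0, 1]], [[1, 1], [1, -1]], [[0, 2], [3, 0]]]
    x = [1, 2]
    y_blocks = [matvec(A, x) for A in A_blocks]

    active = [0, 2]   # sensor 1 is inactive
    print(global_features(R_blocks, y_blocks, active))
    print(matvec(global_dictionary(R_blocks, A_blocks, active), x))  # same vector
```

When the set of active sensors changes, only the list `active` changes; the representation `x` and the training blocks stay the same.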

28 Experiment
The algorithm depends on only two parameters: $(\sigma_1, \sigma_2)$.
Data communicated between sensors and the station are 10-D action features.
Precision vs. recall ("..." marks an entry missing from the source):

Sensors     2      7      2,7    1,2,7   1-3,7,8   1-8
Prec [%]    ...    94.6   94.4   92.8    94.6      98.8
Rec [%]     65     61.5   82.5   80.6    89.5      94.2

Confusion tables: (a) Sensors 1-8 (b) Sensor 7
Reference: Yang et al. Distributed Segmentation and Classification of Human Actions Using a Wearable Motion Sensor Network. Berkeley Tech Report, 2007.

29 Conclusion
1. Sparsity is important for classification of high-dimensional (HD) data.
2. A new recognition framework via compressed sensing.
3. In an HD feature space, the choice of an optimal feature is no longer critical.
4. Randomfaces, outliers, occlusion.
5. Distributed pattern recognition in body sensor networks.

30 Future Directions
Distributed camera networks
Biosensor networks in health care

31 Acknowledgments
Collaborators:
Berkeley: Shankar Sastry, Ruzena Bajcsy
UIUC: Yi Ma
UT-Dallas: Roozbeh Jafari
MATLAB Toolboxes:
$\ell_1$-Magic by Candès at Caltech
SparseLab by Donoho at Stanford
cvx by Boyd at Stanford
References:
Robust face recognition via sparse representation. Submitted to PAMI, 2008.
Distributed segmentation and classification of human actions using a wearable motion sensor network. Berkeley Tech Report, 2007.
