Invertible Nonlinear Dimensionality Reduction via Joint Dictionary Learning
Xian Wei, Martin Kleinsteuber, and Hao Shen
Department of Electrical and Computer Engineering, Technische Universität München, Germany
{xian.wei, kleinsteuber,

Abstract. This paper proposes an invertible nonlinear dimensionality reduction method that jointly learns dictionaries in both the original high-dimensional data space and its low-dimensional representation space. We construct a cost function that preserves inner products of data representations in the low-dimensional space, and we minimize it with a conjugate gradient algorithm on a smooth manifold. In numerical experiments on image processing tasks, the proposed method provides competitive and robust performance in image compression and recovery, even on heavily corrupted data, so it can also be considered an alternative approach to compressed sensing. Moreover, our approach can outperform compressed sensing in task-driven learning problems such as data visualization.

Keywords: Invertible nonlinear dimensionality reduction, joint dictionary learning, inner product preservation, compressed sensing.

1 Introduction

Dimensionality reduction (DR) is a powerful instrument for tackling large-scale signal processing problems. It often serves as a preprocessing step that transforms the original high-dimensional data into a low-dimensional space, where specific tasks such as filtering or 2D visualization can be performed directly on the low-dimensional representations, cf. [1]. Most classic DR algorithms focus on finding a low-dimensional embedding of the original data and are not reversible; in other words, there is no reliable reconstruction from the low-dimensional space back to the original high-dimensional space.
However, many applications, such as communication, image down-sampling and super-resolution, and modeling of time-varying data (dynamic textures), require the DR process to be reversible. Finding an invertible nonlinear DR mapping is a long-standing problem in the community. Recently, the technique of compressed sensing (CS) [2] has shown that high-dimensional signals and images can be reconstructed from measurements in a far lower-dimensional space than is usually considered necessary. Formally, it assumes that a signal $x \in \mathbb{R}^m$ admits a factorization $x = D\alpha$ under a set of atoms $D$, also called a dictionary, where $\alpha \in \mathbb{R}^k$ is sparse. Then the CS problem can
be formulated as recovering $x$ from its low-dimensional representation $y = Ax$, or $y = AD\alpha$, with $y \in \mathbb{R}^d$ and $d \ll m$, where $A \in \mathbb{R}^{d \times m}$ is called a projection matrix and may be chosen as a random Gaussian matrix. This paper considers an alternative DR process associated with dictionary learning (DL) models [3, 4]; that is, $D$ is not a given standard orthonormal basis but is learned from training samples. This problem has been studied within the CS framework, known as blind CS [5], as well as in models where $D$ and $A$ are learned from data simultaneously via joint optimization [6, 7]. However, one challenge of the CS model is that it has to guarantee incoherence between the projection matrix $A$ and the dictionary $D$, as well as incoherence between pairs of atoms within $D$ and $A$ themselves [2], which is often difficult to achieve when $D$ is redundant. In addition, learning tasks (such as 2D visualization) in the compressed domain are often difficult [8]. In contrast to CS methods based on an optimized projection matrix [6, 7], we propose an alternative approach that models the DR process with a pair of coupled dictionaries $(D \in \mathbb{R}^{m \times k}, P \in \mathbb{R}^{d \times k})$, $d \ll m$, called DRCDL. DRCDL can successfully achieve the task of interest while avoiding learning the projection matrix directly. Finally, we cast the joint learning problem as an optimization on a product manifold, which is efficiently solved via a geometric conjugate gradient (CG) method.

2 Joint dictionary learning under inner product preservation

Let us denote by $X := [x_1, \ldots, x_n] \in \mathbb{R}^{m \times n}$ the data matrix containing $n$ data samples $x_i \in \mathbb{R}^m$, and by $Y := [y_1, \ldots, y_n] \in \mathbb{R}^{d \times n}$ with $d < m$ its corresponding low-dimensional representation via some DR mapping $g \colon x_i \mapsto y_i$ for all $i = 1, \ldots, n$. In this work, we assume that the original data and its low-dimensional representation share the same, or a very similar, sparse structure.
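As background, the classical CS pipeline described above (a signal sparse in a dictionary $D$, measured through a random Gaussian $A$, and recovered from $y = AD\alpha$) can be sketched as follows. The dimensions, seed, and the simple orthogonal matching pursuit (OMP) solver are illustrative choices for this sketch, not the setup used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, d, s = 128, 256, 64, 3   # signal dim, atoms, measurements, sparsity

# Dictionary with unit-norm atoms and an s-sparse ground-truth code
D = rng.standard_normal((m, k))
D /= np.linalg.norm(D, axis=0)
alpha = np.zeros(k)
alpha[rng.choice(k, s, replace=False)] = rng.uniform(1.0, 2.0, s) * rng.choice([-1, 1], s)
x = D @ alpha

# Random Gaussian projection matrix A and measurements y = A D alpha
A = rng.standard_normal((d, m)) / np.sqrt(d)
y = A @ x
B = A @ D  # effective dictionary in the measurement domain

def omp(B, y, s):
    """Greedy orthogonal matching pursuit for y ~ B a with at most s nonzeros."""
    residual, idx = y.copy(), []
    for _ in range(s):
        idx.append(int(np.argmax(np.abs(B.T @ residual))))
        coef, *_ = np.linalg.lstsq(B[:, idx], y, rcond=None)
        residual = y - B[:, idx] @ coef
    a = np.zeros(B.shape[1])
    a[idx] = coef
    return a

# Recover x from d << m measurements
x_hat = D @ omp(B, y, s)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

When the support is identified correctly, the least-squares step makes the recovery exact up to rounding.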
Such an assumption is commonly shared by coupled sparse representation models [9]. We assume that all data points $x_i \in \mathbb{R}^m$ admit sparse representations with respect to a common dictionary $D := [d_1, \ldots, d_k] \in \mathbb{R}^{m \times k}$, i.e.

$$x_i = D\phi_i, \quad \text{for all } i = 1, \ldots, n, \qquad (1)$$

where $\phi_i \in \mathbb{R}^k$ is the corresponding sparse representation of $x_i$. In this work, we further assume that all columns of the dictionary $D$ have unit norm. We then define the set

$$S(m, k) := \{D \in \mathbb{R}^{m \times k} \mid \operatorname{ddiag}(D^\top D) = I_k\}, \qquad (2)$$

where $\operatorname{ddiag}(Z)$ is the diagonal matrix whose diagonal entries are those of $Z$, and $I_k$ denotes the identity matrix. We assume that the low-dimensional representations $Y$ share the same sparse structure with respect to a low-dimensional dictionary $P := [p_1, \ldots, p_k] \in \mathbb{R}^{d \times k}$, i.e. $y_i = P\phi_i$ with $P \in S(d, k)$. By a slight abuse of notation, we denote by $\phi_D \colon x_i \mapsto \phi_i$ and $\phi_P \colon y_i \mapsto \phi_i$ the sparse
coding in the original data space and the low-dimensional representation space, respectively. We propose a nonlinear DR mapping

$$g \colon x_i \mapsto P\phi_D(x_i), \qquad (3)$$

and, reversely,

$$g^{-1} \colon y_i \mapsto D\phi_P(y_i). \qquad (4)$$

The aim of DR is to find a mapping $g \colon x_i \mapsto y_i$ that is stable and preserves as much useful structure as possible. According to the well-known Johnson-Lindenstrauss (JL) lemma, cf. [10], every $n$-point subset of Euclidean space can be embedded in dimension $O(\epsilon^{-2} \log n)$ with $1 + \epsilon$ distortion, where $0 < \epsilon < 1/2$. In particular, the distance or inner product information of the high-dimensional data is preserved in the low-dimensional representation space when $\epsilon$ is close to zero [11, 12]. Specifically, the loss introduced by the DR mapping $g$ can be measured by the function

$$G(X; Y) := \sum_{i,j=1}^{n} \left( x_i^\top x_j - y_i^\top y_j \right)^2. \qquad (5)$$

Recall the assumption that both the original data point $x_i$ and its low-dimensional representation $y_i := g(x_i)$ share the same sparse structure, i.e. $x_i = D\phi_i$ and $y_i = P\phi_i$. We adapt the loss function (5) to the coupled sparse representation setting as

$$G_{(D,P)}(X; Y) = \sum_{i,j=1}^{n} \left( \phi_i^\top \left( D^\top D - P^\top P \right) \phi_j \right)^2. \qquad (6)$$

Roughly speaking, the loss $G_{(D,P)}$ is small if either the sparse representations are pairwise conjugate with respect to $D^\top D - P^\top P$, or the difference $D^\top D - P^\top P$ is essentially small. In this work, we pursue the second option. As both dictionaries $D$ and $P$ are usually assumed to be of full rank, $P$ can also be considered a low-rank approximation of $D$. To ensure stability of the proposed nonlinear DR mapping $g$, we need to guarantee moderate mutual incoherence of both the high- and low-dimensional dictionaries $D \in \mathbb{R}^{m \times k}$ and $P \in \mathbb{R}^{d \times k}$, according to the theory of sparse representation, cf. [13]. However, when the difference $D^\top D - P^\top P$ is sufficiently small, the mutual coherence of $D$ is ensured to be close to the mutual coherence of $P$.
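The equivalence between (5) and (6) under the shared-code assumption is easy to verify numerically. The following sketch (the dimensions and random data are illustrative, not from the paper) builds unit-norm dictionaries as in (2) and checks that both loss expressions agree.

```python
import numpy as np

def gram_loss(X, Y):
    """G(X; Y) of Eq. (5): squared differences of all pairwise inner products."""
    GX, GY = X.T @ X, Y.T @ Y
    return float(np.sum((GX - GY) ** 2))

def coupled_gram_loss(Phi, D, P):
    """G_{(D,P)} of Eq. (6): the same loss expressed through the shared codes."""
    M = D.T @ D - P.T @ P
    return float(np.sum((Phi.T @ M @ Phi) ** 2))

rng = np.random.default_rng(2)
m, d, k, n = 20, 6, 10, 15
D = rng.standard_normal((m, k)); D /= np.linalg.norm(D, axis=0)  # D in S(m, k)
P = rng.standard_normal((d, k)); P /= np.linalg.norm(P, axis=0)  # P in S(d, k)
Phi = rng.standard_normal((k, n))                                # shared codes

X, Y = D @ Phi, P @ Phi   # x_i = D phi_i, y_i = P phi_i
g1, g2 = gram_loss(X, Y), coupled_gram_loss(Phi, D, P)
```

Since $X^\top X - Y^\top Y = \Phi^\top (D^\top D - P^\top P)\, \Phi$, the two values coincide up to rounding.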
Hence, instead of penalizing both $D$ and $P$, we propose to control the mutual coherence of $P$ alone via the logarithmic barrier function

$$r(P) = -\sum_{1 \le i < j \le k} \log\left( 1 - (p_i^\top p_j)^2 \right). \qquad (7)$$

Finally, let us denote $\Phi(Y, P) := [\phi_P(y_1), \ldots, \phi_P(y_n)] \in \mathbb{R}^{k \times n}$. Then, also taking the reconstruction error in the original data space into account, we propose the following
cost function

$$f \colon S(m, k) \times S(d, k) \times \mathbb{R}^{d \times n} \to \mathbb{R}, \quad (D, P, Y) \mapsto \frac{1}{2n} \|X - D\Phi(Y, P)\|_F^2 + \frac{\mu_1}{2k^2} \|D^\top D - P^\top P\|_F^2 + \mu_2\, r(P), \qquad (8)$$

where $\mu_1 > 0$ weighs the distance-preservation loss of the DR against how accurately $D\Phi(Y, P)$ reconstructs the training samples, and $\mu_2 > 0$ controls the mutual coherence of the learned dictionary. As an extension, if we assume that the relationship between $D$ and $P$ is linear, it reads $P = U^\top D$, and therefore $y$ can be obtained directly via $y = U^\top D\phi = U^\top x$. Here, $U \in \mathbb{R}^{m \times d}$ is chosen as the $d$ leading left singular vectors of $D$. We call this model compressed coupled dictionary learning (CCDL) in this paper.

3 A conjugate gradient DR algorithm

Recall that the set $S(m, k)$ is the product of $k$ unit spheres, i.e. a $k(m-1)$-dimensional smooth manifold. In what follows, we adopt the conjugate gradient algorithm on smooth manifolds, which has demonstrated competitive performance in (co-)sparse dictionary learning, cf. [3, 4], to minimize the cost function $f$ on the product manifold $S(m, k) \times S(d, k) \times \mathbb{R}^{d \times n}$. In this work, we employ the sparse solution given by the elastic-net problem, cf. [14],

$$\phi^* := \operatorname*{argmin}_{\phi \in \mathbb{R}^k} \tfrac{1}{2} \|y - P\phi\|_2^2 + \lambda_1 \|\phi\|_1 + \tfrac{\lambda_2}{2} \|\phi\|_2^2, \qquad (9)$$

where $\lambda_1 > 0$ and $\lambda_2 > 0$ are regularization parameters that ensure stability and uniqueness of the solution. Let us define the set of indices of the nonzero entries of the solution $\phi^* = [\varphi_1, \ldots, \varphi_k]^\top \in \mathbb{R}^k$ as $\Lambda := \{i \in \{1, \ldots, k\} \mid \varphi_i \neq 0\}$. Then the solution of the elastic net (9) has the closed-form expression

$$\phi_P(y) := \left( P_\Lambda^\top P_\Lambda + \lambda_2 I_{|\Lambda|} \right)^{-1} \left( P_\Lambda^\top y - \lambda_1 s_\Lambda \right), \qquad (10)$$

where $s_\Lambda \in \{\pm 1\}^{|\Lambda|}$ carries the signs of $\phi^*_\Lambda$, and $P_\Lambda \in \mathbb{R}^{d \times |\Lambda|}$ is the sub-matrix of $P$ whose atoms (columns) fall into the support $\Lambda$. The solution $\phi_P(y)$ has the algorithmically convenient property of being locally twice differentiable with respect to both $P$ and $y$, cf. [15, 16].
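The closed form (10) can be checked numerically. In the sketch below we solve the elastic net (9) with a plain ISTA (proximal gradient) iteration, an illustrative solver choice rather than anything prescribed by the paper, and confirm that, once the support and signs are fixed, (10) reproduces the iterative solution.

```python
import numpy as np

def elastic_net_ista(P, y, lam1, lam2, n_iter=5000):
    """ISTA for 0.5||y - P phi||^2 + lam1 ||phi||_1 + 0.5 lam2 ||phi||^2, Eq. (9)."""
    L = np.linalg.norm(P, 2) ** 2 + lam2        # Lipschitz constant of the smooth part
    phi = np.zeros(P.shape[1])
    for _ in range(n_iter):
        g = P.T @ (P @ phi - y) + lam2 * phi    # gradient of the smooth part
        z = phi - g / L
        phi = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0.0)  # soft threshold
    return phi

def closed_form_on_support(P, y, lam1, lam2, phi):
    """Eq. (10): exact solution restricted to the active set Lambda of phi."""
    Lam = np.flatnonzero(phi)
    s = np.sign(phi[Lam])                        # sign vector s_Lambda
    PL = P[:, Lam]                               # sub-dictionary P_Lambda
    phi_L = np.linalg.solve(PL.T @ PL + lam2 * np.eye(Lam.size), PL.T @ y - lam1 * s)
    out = np.zeros_like(phi)
    out[Lam] = phi_L
    return out

rng = np.random.default_rng(4)
d, k = 12, 20
P = rng.standard_normal((d, k)); P /= np.linalg.norm(P, axis=0)
y = rng.standard_normal(d)

phi = elastic_net_ista(P, y, lam1=0.3, lam2=0.03)
phi_cf = closed_form_on_support(P, y, 0.3, 0.03, phi)
```

The agreement follows from the KKT conditions of (9): on the active set, $P_\Lambda^\top(P\phi - y) + \lambda_2 \phi_\Lambda + \lambda_1 s_\Lambda = 0$, which rearranges exactly to (10).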
Recall the tangent space $T_D S(m, k)$ of $S(m, k)$ at $D \in S(m, k)$,

$$T_D S(m, k) := \{\Xi \in \mathbb{R}^{m \times k} \mid \operatorname{ddiag}(\Xi^\top D) = 0\}, \qquad (11)$$

and the orthogonal projection of a matrix $Z \in \mathbb{R}^{m \times k}$ onto the tangent space $T_D S(m, k)$ with respect to the inner product $\langle \Xi, \Psi \rangle = \operatorname{tr}(\Xi^\top \Psi)$,

$$\Pi_D(Z) := Z - D \operatorname{ddiag}(D^\top Z). \qquad (12)$$
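The projection (12) and the tangency condition (11) translate directly into code; the dimensions below are illustrative.

```python
import numpy as np

def ddiag(Z):
    """Diagonal matrix keeping only the diagonal entries of Z."""
    return np.diag(np.diag(Z))

def project_tangent(D, Z):
    """Orthogonal projection of Z onto T_D S(m, k), Eq. (12)."""
    return Z - D @ ddiag(D.T @ Z)

rng = np.random.default_rng(5)
m, k = 10, 4
D = rng.standard_normal((m, k))
D /= np.linalg.norm(D, axis=0)   # D in S(m, k): unit-norm atoms
Z = rng.standard_normal((m, k))
Xi = project_tangent(D, Z)
```

Since each atom has unit norm, $d_i^\top \xi_i = d_i^\top z_i - (d_i^\top z_i)(d_i^\top d_i) = 0$, so $\Xi$ satisfies (11), and applying the projection twice changes nothing.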
Then, by computing the first derivative of $f$ at $(D, P, Y)$ in a tangent direction $(H_D, H_P, H_Y) \in T_{(D,P,Y)} \left( S(m, k) \times S(d, k) \times \mathbb{R}^{d \times n} \right)$, we obtain the Riemannian gradient of $f$ at $(D, P, Y)$ as

$$\operatorname{grad} f(D, P, Y) = \left( \Pi_D(f'(D)), \; \Pi_P(f'(P)), \; f'(Y) \right), \qquad (13)$$

where $f'(D)$, $f'(P)$, and $f'(Y)$ are the Euclidean gradients of $f$ with respect to the three arguments, respectively. Firstly, the Euclidean gradient $f'(D)$ of $f$ with respect to $D$ is computed as

$$f'(D) = \frac{1}{n} \sum_{i=1}^{n} \left( D\phi_P(y_i) - x_i \right) \phi_P(y_i)^\top + \frac{2\mu_1}{k^2} D\left( D^\top D - P^\top P \right). \qquad (14)$$

Using some shorthand notation, let $\Lambda_i$ be the support of the nonzero entries of $\phi_P(y_i)$, and denote $K_i := P_{\Lambda_i}^\top P_{\Lambda_i} + \lambda_2 I_{|\Lambda_i|}$, $r_i := P_{\Lambda_i}^\top y_i - \lambda_1 s_{\Lambda_i}$, $\tilde{x}_i := x_i - D\phi_P(y_i)$, and $q_i := \tilde{x}_i r_i^\top$. Then the Euclidean gradient $f'(P)$ of $f$ is computed as

$$f'(P) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{V}\left\{ -y_i \tilde{x}_i^\top D_{\Lambda_i} K_i^{-1} + P_{\Lambda_i} K_i^{-1} D_{\Lambda_i}^\top q_i K_i^{-1} + P_{\Lambda_i} K_i^{-1} q_i^\top D_{\Lambda_i} K_i^{-1} \right\} + \frac{2\mu_1}{k^2} P\left( P^\top P - D^\top D \right) + \mu_2\, r'(P), \qquad (15)$$

with

$$r'(P) = P \sum_{1 \le i < j \le k} \frac{2\, p_i^\top p_j}{1 - (p_i^\top p_j)^2} \left( E_{ij} + E_{ji} \right) \qquad (16)$$

being the gradient of the logarithmic barrier function (7). Here, $\mathcal{V}\{\cdot\}$ denotes the embedding of a matrix supported on the columns indexed by $\Lambda_i$ into the full $d \times k$ matrix, with zeros elsewhere. By $E_{ij}$, we denote the matrix whose entry in the $i$-th row and $j$-th column is equal to one, and all others are zero. Finally, the Euclidean gradient $f'(Y)$ is computed as

$$f'(Y) = -\frac{1}{n} \left[ P_{\Lambda_1} K_1^{-1} D_{\Lambda_1}^\top \tilde{x}_1, \; \ldots, \; P_{\Lambda_n} K_n^{-1} D_{\Lambda_n}^\top \tilde{x}_n \right]. \qquad (17)$$

By assembling the Riemannian gradients, geodesics, and parallel transports on the underlying manifolds, a conjugate gradient algorithm on $S(m, k) \times S(d, k) \times \mathbb{R}^{d \times n}$ is straightforward. Due to the page limit, we omit the presentation of the algorithm and refer to [4] for further technical details.

4 Numerical experiments

In this section, we investigate the performance of our proposed DR framework via coupled dictionary learning (DRCDL) and its linear extension, compressed CDL (CCDL), for signal compression, reconstruction, and visualization.
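As a sanity check on the gradient expressions of Section 3, the barrier (7) and its Euclidean gradient (16) can be verified against finite differences. This is an illustrative sketch with arbitrary small dimensions, not the paper's implementation.

```python
import numpy as np

def barrier(P):
    """r(P) of Eq. (7) for a dictionary with unit-norm columns."""
    G = P.T @ P
    iu = np.triu_indices(G.shape[0], 1)
    return -np.sum(np.log(1.0 - G[iu] ** 2))

def barrier_grad(P):
    """Euclidean gradient r'(P) of Eq. (16)."""
    k = P.shape[1]
    G = P.T @ P
    W = np.zeros((k, k))
    iu = np.triu_indices(k, 1)
    W[iu] = 2.0 * G[iu] / (1.0 - G[iu] ** 2)
    W = W + W.T           # encodes the (E_ij + E_ji) terms of Eq. (16)
    return P @ W

rng = np.random.default_rng(6)
d, k = 7, 5
P = rng.standard_normal((d, k))
P /= np.linalg.norm(P, axis=0)

# Central finite difference along a random direction E vs. the analytic gradient
eps = 1e-6
E = rng.standard_normal((d, k))
fd = (barrier(P + eps * E) - barrier(P - eps * E)) / (2 * eps)
an = np.sum(barrier_grad(P) * E)
```

The directional derivatives agree to high precision, which confirms that the sign in (7) and the weights in (16) are consistent.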
Before presenting our experiments, we briefly discuss the choice of the parameters in our formulation. Considering the high coherence among the images or image patches, we prefer a dictionary with low redundancy, that is, $k \le 2m$ for $D \in \mathbb{R}^{m \times k}$. For the parameters $(\lambda_1, \lambda_2)$ in (9), we put an emphasis on sparse solutions and choose $\lambda_2 \in (0, \frac{\lambda_1}{10})$, as proposed in [14]. The parameters $\mu_1, \mu_2$ in (8) can be tuned via cross-validation.

The CMU Multi-PIE faces [17] and MNIST handwritten digit databases are used as the benchmark datasets for image compression, reconstruction, and 2D visualization in our experiments. In order to evaluate the proposed method on DR and reconstruction, we compare it with CS using a random Gaussian sensing matrix [2] (Gaussian CS) and with robust principal component analysis (RPCA) [18]. In Figures 1 and 2, 5000 images are randomly chosen for training $D$, and 500 images are randomly taken from the remaining database for testing. We first reduce the dimensionality from $m = 1024$ (PIE) and $m = 784$ (MNIST) to $d = 16$, and then recover the images using Gaussian CS, RPCA, and the proposed DRCDL and CCDL, respectively. Figures 1 and 2(e) demonstrate that the proposed methods perform much better on signal reconstruction than Gaussian CS and RPCA.

Fig. 1. (a) PIE data; (b) to (d): recovery of the reduced data from d = 16 using RPCA, Gaussian CS, and DRCDL, respectively. The PSNR is 23.01 dB, 25.28 dB, and 31.12 dB.

In Figure 2, we apply PCA to the original data and to the reduced data, respectively, to obtain a 2D visualization. Figures 2(b), 2(c), and 2(d) show that learning directly in the compressed domain is feasible. Compared to Gaussian CS, our proposed methods (DRCDL and CCDL) exhibit more stable and competitive PCA results, even in a very low-dimensional compressed domain, i.e. $d = 10$. Figure 3 shows the results of image compression and recovery on a single image, Lena.
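The PSNR values reported in Figures 1 to 3 follow the standard definition $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. A minimal helper is sketched below; the image size, peak value, and noise level are illustrative choices, not the paper's data.

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(7)
ref = rng.integers(0, 256, size=(64, 64)).astype(float)
noisy = ref + rng.normal(0.0, 8.0, size=ref.shape)  # additive Gaussian noise
val = psnr(ref, noisy)  # roughly 30 dB for sigma = 8 on an 8-bit scale
```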
Compared to Bayesian CS (BCS) [19] and JPEG2000, our proposed methods DRCDL and CCDL exhibit strong performance when the input data is heavily corrupted.

Fig. 2. (a) PCA on the original data; (b) PCA on the compressed data (DRCDL, d = 10); (c) PCA on the compressed data (CCDL, d = 10); (d) Gaussian CS on the original data, d = 10. (e) Reconstruction of MNIST images from d = 16 to m = 784. From top to bottom: original data, and recovered images using Gaussian CS, Gaussian CS with a K-SVD dictionary, DRCDL, and CCDL.

Fig. 3. Recovery performance on Lena with a compression rate η = 32, with 33 training images used for learning the dictionary. (b) is the corrupted image with PSNR = 16.12 dB; (c) to (f) are the recovered images using DRCDL, CCDL, JPEG2000, and BCS. The PSNR (dB) is 26.52, 26.44, and 22.28, respectively.

5 Conclusions

This paper proposed a coupled dictionary learning approach, called DRCDL, to achieve the task of invertible nonlinear DR. Following the Johnson-Lindenstrauss (JL) lemma in the process of DR, we developed a joint dictionary learning method that preserves the distance information of the high-dimensional data. Our experiments on a single image, digits, and facial images verified our idea. The proposed model is flexible and can be extended to other settings and applications.
References

1. Van der Maaten, L.J., Postma, E.O., van den Herik, H.J.: Dimensionality reduction: A comparative review. Journal of Machine Learning Research 10 (2009)
2. Donoho, D.L.: Compressed sensing. IEEE Transactions on Information Theory 52(4) (2006)
3. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing 54(11) (2006)
4. Hawe, S., Seibert, M., Kleinsteuber, M.: Separable dictionary learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE (2013)
5. Gleichman, S., Eldar, Y.C.: Blind compressed sensing. IEEE Transactions on Information Theory 57(10) (2011)
6. Duarte-Carvajalino, J.M., Sapiro, G.: Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization. IEEE Transactions on Image Processing 18(7) (2009)
7. Elad, M.: Optimized projections for compressed sensing. IEEE Transactions on Signal Processing 55(12) (2007)
8. Calderbank, R., Jafarpour, S., Schapire, R.: Compressed learning: Universal sparse dimensionality reduction and learning in the measurement domain. Technical report, Computer Science, Princeton University (2009)
9. Zeyde, R., Elad, M., Protter, M.: On single image scale-up using sparse representations. In: Curves and Surfaces. Volume 6920, Springer (2010)
10. Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 26 (1984)
11. Kim, H., Park, H., Zha, H.: Distance preserving dimension reduction for manifold learning. In: SDM, SIAM (2007)
12. Baraniuk, R., Davenport, M., DeVore, R., Wakin, M.: A simple proof of the restricted isometry property for random matrices. Constructive Approximation 28(3) (2008)
13. Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer (2010)
14. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67(2) (2005)
15. Mairal, J., Bach, F., Ponce, J.: Task-driven dictionary learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(4) (2012)
16. Wei, X., Shen, H., Kleinsteuber, M.: An adaptive dictionary learning approach for modeling dynamical textures. In: Proceedings of the 39th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2014)
17. Sim, T., Baker, S., Bsat, M.: The CMU pose, illumination, and expression (PIE) database. In: Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FG), IEEE (2002)
18. De la Torre, F., Black, M.J.: Robust principal component analysis for computer vision. In: Eighth IEEE International Conference on Computer Vision (ICCV). Volume 1, IEEE (2001)
19. Ji, S., Xue, Y., Carin, L.: Bayesian compressive sensing. IEEE Transactions on Signal Processing 56(6) (2008)
More informationWIRELESS sensor networks (WSNs) have attracted
On the Benefit of using ight Frames for Robust Data ransmission and Compressive Data Gathering in Wireless Sensor Networks Wei Chen, Miguel R. D. Rodrigues and Ian J. Wassell Computer Laboratory, University
More informationLinear Algebra & Geometry why is linear algebra useful in computer vision?
Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia
More informationc Springer, Reprinted with permission.
Zhijian Yuan and Erkki Oja. A FastICA Algorithm for Non-negative Independent Component Analysis. In Puntonet, Carlos G.; Prieto, Alberto (Eds.), Proceedings of the Fifth International Symposium on Independent
More informationA tutorial on sparse modeling. Outline:
A tutorial on sparse modeling. Outline: 1. Why? 2. What? 3. How. 4. no really, why? Sparse modeling is a component in many state of the art signal processing and machine learning tasks. image processing
More informationGeneralized Power Method for Sparse Principal Component Analysis
Generalized Power Method for Sparse Principal Component Analysis Peter Richtárik CORE/INMA Catholic University of Louvain Belgium VOCAL 2008, Veszprém, Hungary CORE Discussion Paper #2008/70 joint work
More informationFast Hard Thresholding with Nesterov s Gradient Method
Fast Hard Thresholding with Nesterov s Gradient Method Volkan Cevher Idiap Research Institute Ecole Polytechnique Federale de ausanne volkan.cevher@epfl.ch Sina Jafarpour Department of Computer Science
More informationConvolutional Dictionary Learning and Feature Design
1 Convolutional Dictionary Learning and Feature Design Lawrence Carin Duke University 16 September 214 1 1 Background 2 Convolutional Dictionary Learning 3 Hierarchical, Deep Architecture 4 Convolutional
More informationLearning Bound for Parameter Transfer Learning
Learning Bound for Parameter Transfer Learning Wataru Kumagai Faculty of Engineering Kanagawa University kumagai@kanagawa-u.ac.jp Abstract We consider a transfer-learning problem by using the parameter
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationData dependent operators for the spatial-spectral fusion problem
Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012 Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A.
More informationStrengthened Sobolev inequalities for a random subspace of functions
Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)
More informationImproving the Incoherence of a Learned Dictionary via Rank Shrinkage
Improving the Incoherence of a Learned Dictionary via Rank Shrinkage Shashanka Ubaru, Abd-Krim Seghouane 2 and Yousef Saad Department of Computer Science and Engineering, University of Minnesota, Twin
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed
More informationL26: Advanced dimensionality reduction
L26: Advanced dimensionality reduction The snapshot CA approach Oriented rincipal Components Analysis Non-linear dimensionality reduction (manifold learning) ISOMA Locally Linear Embedding CSCE 666 attern
More informationConnection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis
Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal
More informationMaximum variance formulation
12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal
More informationSparse molecular image representation
Sparse molecular image representation Sofia Karygianni a, Pascal Frossard a a Ecole Polytechnique Fédérale de Lausanne (EPFL), Signal Processing Laboratory (LTS4), CH-115, Lausanne, Switzerland Abstract
More informationThe Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)
Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will
More informationCompressed Sensing and Related Learning Problems
Compressed Sensing and Related Learning Problems Yingzhen Li Dept. of Mathematics, Sun Yat-sen University Advisor: Prof. Haizhang Zhang Advisor: Prof. Haizhang Zhang 1 / Overview Overview Background Compressed
More informationNon-linear Dimensionality Reduction
Non-linear Dimensionality Reduction CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Laplacian Eigenmaps Locally Linear Embedding (LLE)
More informationAutomatic Subspace Learning via Principal Coefficients Embedding
IEEE TRANSACTIONS ON CYBERNETICS 1 Automatic Subspace Learning via Principal Coefficients Embedding Xi Peng, Jiwen Lu, Senior Member, IEEE, Zhang Yi, Fellow, IEEE and Rui Yan, Member, IEEE, arxiv:1411.4419v5
More informationTHe linear decomposition of data using a few elements
1 Task-Driven Dictionary Learning Julien Mairal, Francis Bach, and Jean Ponce arxiv:1009.5358v2 [stat.ml] 9 Sep 2013 Abstract Modeling data with linear combinations of a few elements from a learned dictionary
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationLecture 5 : Projections
Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationAdaptive Compressive Imaging Using Sparse Hierarchical Learned Dictionaries
Adaptive Compressive Imaging Using Sparse Hierarchical Learned Dictionaries Jarvis Haupt University of Minnesota Department of Electrical and Computer Engineering Supported by Motivation New Agile Sensing
More informationCompressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles
Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional
More informationarxiv: v3 [cs.lg] 6 Sep 2017
An Efficient Method for Robust Projection Matrix Design Tao Hong a, Zhihui Zhu b a Department of Computer Science, Technion - Israel Institute of Technology, Haifa, 32000, Israel. b Department of Electrical
More informationCS168: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA)
CS68: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA) Tim Roughgarden & Gregory Valiant April 0, 05 Introduction. Lecture Goal Principal components analysis
More informationRui ZHANG Song LI. Department of Mathematics, Zhejiang University, Hangzhou , P. R. China
Acta Mathematica Sinica, English Series May, 015, Vol. 31, No. 5, pp. 755 766 Published online: April 15, 015 DOI: 10.1007/s10114-015-434-4 Http://www.ActaMath.com Acta Mathematica Sinica, English Series
More informationMachine Learning for Signal Processing Sparse and Overcomplete Representations
Machine Learning for Signal Processing Sparse and Overcomplete Representations Abelino Jimenez (slides from Bhiksha Raj and Sourish Chaudhuri) Oct 1, 217 1 So far Weights Data Basis Data Independent ICA
More informationConditions for Robust Principal Component Analysis
Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and
More informationUniqueness Conditions for A Class of l 0 -Minimization Problems
Uniqueness Conditions for A Class of l 0 -Minimization Problems Chunlei Xu and Yun-Bin Zhao October, 03, Revised January 04 Abstract. We consider a class of l 0 -minimization problems, which is to search
More informationClassification of handwritten digits using supervised locally linear embedding algorithm and support vector machine
Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine Olga Kouropteva, Oleg Okun, Matti Pietikäinen Machine Vision Group, Infotech Oulu and
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the
More informationNonnegative Matrix Factorization Clustering on Multiple Manifolds
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10) Nonnegative Matrix Factorization Clustering on Multiple Manifolds Bin Shen, Luo Si Department of Computer Science,
More information14 Singular Value Decomposition
14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationPHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN
PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the
More informationON THE STABILITY OF DEEP NETWORKS
ON THE STABILITY OF DEEP NETWORKS AND THEIR RELATIONSHIP TO COMPRESSED SENSING AND METRIC LEARNING RAJA GIRYES AND GUILLERMO SAPIRO DUKE UNIVERSITY Mathematics of Deep Learning International Conference
More information235 Final exam review questions
5 Final exam review questions Paul Hacking December 4, 0 () Let A be an n n matrix and T : R n R n, T (x) = Ax the linear transformation with matrix A. What does it mean to say that a vector v R n is an
More informationIndependent Component Analysis (ICA)
Independent Component Analysis (ICA) Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationNonlinear Dimensionality Reduction
Nonlinear Dimensionality Reduction Piyush Rai CS5350/6350: Machine Learning October 25, 2011 Recap: Linear Dimensionality Reduction Linear Dimensionality Reduction: Based on a linear projection of the
More informationSignal Recovery from Permuted Observations
EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,
More information