GRAPH-BASED REGULARIZATION OF LARGE COVARIANCE MATRICES.


GRAPH-BASED REGULARIZATION OF LARGE COVARIANCE MATRICES

A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By Srikar Yekollu, B.E.

The Ohio State University, 2009

Master's Examination Committee: Dr. Mikhail Belkin, Adviser; Dr. Simon Dennis

Approved by: Adviser, Computer Science & Engineering Graduate Program

© Copyright by Srikar Yekollu, 2009

ABSTRACT

A range of successful techniques in computer vision, such as Eigenfaces and Fisherfaces, are based on the spectral decomposition of empirical covariance matrices constructed from given data. These matrices are typically constructed in a setting where the dimension of the data (number of pixels) exceeds the number of available samples, sometimes by a large margin. However, it has been established in statistics that under these conditions, and under some fairly general modeling assumptions, the eigenvectors and eigenvalues of covariance matrices cannot be estimated reliably. Several techniques to remedy this problem have been proposed. These techniques typically make specific assumptions about the structure of the covariance matrix and assume that this structure is known in advance. In this thesis we propose a new method for automatically learning non-local structure in the covariance matrix in a data-dependent way. This learned structure is then used to improve inference for methods like Eigenfaces. Unlike most existing methods in computer vision and statistics, we do not make any assumptions about the spatial (pixel) proximity structure of the data. We provide theoretical results indicating that our methods may overcome the problem of insufficient data. We evaluate the performance of our algorithms empirically and demonstrate significant and consistent improvements over traditional Eigenfaces as well as more recent techniques, such as 2DPCA, Euclidean banding and thresholding, for a wide range of parameter settings.

This is dedicated to my aunt and uncle, Sailaja & Raja, and my parents, Girija & Anand, who have strived to provide me the best quality of education and have always encouraged me in my endeavours.

ACKNOWLEDGEMENTS

First, I would like to thank my adviser, Dr. Mikhail Belkin, not only for his wisdom and guidance but also for his unending patience with me. I thank him for his insightful ideas and attention to detail that have led to the production of this thesis. Through him, I have learned to become a better writer and overall thinker. He has strengthened my understanding of how to research effectively and continues to serve as an important role model for me. I thank Dr. Simon Dennis for agreeing to take the time to be a member of my thesis committee.

VITA

October 3, Born - Nellore, Andhra Pradesh, India
June B.E., Computer Science & Engineering
July 2005 - June Technical Associate, Trilogy E-Business, Bangalore, India
present Graduate Research Assistant, The Ohio State University

PUBLICATIONS

Research Publications

Srikar Yekollu and Mikhail Belkin. Learning Banded Faces: Towards better Eigenfaces. IEEE International Conference on Computer Vision. Submitted.

FIELDS OF STUDY

Major Field: Computer Science & Engineering

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgements
Vita
List of Tables

Chapters:
1. Introduction
2. Idea and the Algorithm
   2.1 Idea and intuition
   2.2 Preliminaries
   2.3 Algorithm
3. Theoretical Analysis
4. Experimental Evaluation
   4.1 Datasets
   4.2 Face Recognition
       Fisherfaces
   4.3 Image Reconstruction/Denoising
5. Summary and Conclusions
Bibliography

LIST OF TABLES

4.1 Classification error rate (in %) on a training set of 40 subjects with 4 samples per subject and a test set of 240 samples drawn from the ATT/ORL face dataset with different amounts of added noise. The number of eigenvectors was chosen to be 100 for all methods other than 2DPCA. For 2DPCA the number of eigenvectors was 4, which produced the best performance.

4.2 Classification error rate (in %) on a training set of 15 subjects with 4 samples per subject and a test set of 165 samples drawn from the Yale face dataset. The number of eigenvectors was 50 and k was chosen to be 20. The results are averaged over 15 iterations.

4.3 Classification error rate (in %) as a function of the number of eigenvectors N_e. The training set contains 40 subjects with 4 samples per subject and the test set contains 240 samples drawn from the ATT/ORL face dataset.

4.4 Classification error rate (in %) and percentage of non-zero entries (sparsity, in %) of the regularized covariance matrices for Banded Faces with different numbers of nearest neighbors k used to construct the graph, versus the best parameter settings for Euclidean and thresholding banding. While sparsity is not applicable for 2DPCA, its classification error is provided for comparison. The bolded setting is used for all other computations.

4.5 Classification error rate for Fisherfaces (in %) on a training set of 40 subjects with 4 samples per subject and a test set of 240 samples drawn from the ATT/ORL face dataset with different amounts of added noise.

4.6 Reconstruction error ($L_2$ distance to the original) on a set of 10 subjects with 3 samples per subject drawn from the ATT face dataset, with different amounts of noise added.

4.7 Reconstruction error ($L_2$ distance to the original) on a set of 15 subjects with 3 samples per subject drawn from the Yale face dataset, with different amounts of noise added.

4.8 A few representative samples from the experiments in Table (4.6). The numbers below an image are the values of its $L_2$ distance from the corresponding original image.

CHAPTER 1

INTRODUCTION

A large class of popular techniques in computer vision, such as Eigenfaces and Fisherfaces, are based on computing the eigenvectors of certain covariance matrices generated from data, where a w × h image is represented by a vector of pixels of dimension p = wh. These pixel-pixel covariance matrices have $(wh)^2$ entries. For even a moderately sized image of dimensions 30 × 40, this translates to working in a 1200-dimensional space. For example, images ranging from up to are used in [26], [3] and [8]. It is clear that accurate estimation of these pixel-pixel covariance matrices plays a key role in the success of these techniques. Standard results from random matrix theory ([2]) establish that the sample covariance matrix (the maximum likelihood estimator) constructed from n samples converges to the population (true) covariance matrix at a rate no better than $\sqrt{p/n}$. This implies that in the above examples we need at least several thousand samples to establish a reasonable approximation of the covariance matrix. However, in many cases this number of samples is not available. The problem is even more significant in the case of Fisherfaces, where the requirement escalates to p samples per subject. For example, in [26] 16 images of 16 individuals are used as the training set, and in [3] a training set of size is used. Interestingly, despite this theoretical

disconnect, these techniques have proven successful for a wide range of problems in vision. We conjecture that this success is due to the fact that images reside near a lower-dimensional linear subspace in the space of all pixel configurations. However, images have a wealth of structure, for example related to spatial proximity. Moreover, specific classes of images, e.g. faces, have additional non-linear structure, which is not necessarily local in nature. One expects that statistically grounded methods for identifying and using such structures in the context of estimating covariance matrices could significantly improve the performance of algorithms like Eigenfaces. In this thesis we will demonstrate that a class of methods specifically designed to detect non-linear structures in covariance matrices significantly outperforms classical Eigenface-like algorithms.

Analyzing high dimensional data has been a topic of active research in the statistics community. A subject of particular interest has been understanding the so-called high p, low n setting, where the number of data points is smaller than the dimension of the space. Some of this research has direct relevance to high-dimensional inference problems such as face recognition. In this thesis we will be interested in the structure of covariance matrices for PCA-based and closely related methods, such as Eigenfaces and Fisherfaces, to name two of the most well-known examples. We start by providing a brief overview of the relevant statistical literature. The behavior of the empirical (sample) covariance matrix $\hat{\Sigma}$ (used in the classical PCA algorithm) is well understood in traditional multivariate statistics (see, e.g., [2]) when the dimensionality p is fixed and the number of samples n goes to infinity. The problem of high dimensional inference is usually expressed by treating the dimensionality p as a variable which increases with the number of samples n, as opposed to treating

it as a fixed value as in traditional multivariate statistics. So the high p, low n setting is sometimes written as $p(n)/n \to c$ as $n \to \infty$, where $c > 0$ is a constant. Note that p is written as p(n) to indicate its dependence on n in this setting. In this setting, when p is comparable to n, $\hat{\Sigma}$ provides a poor estimate of the true covariance matrix $\Sigma$. In their fundamental work on the subject [21], Marcenko and Pastur provide the first theoretical analysis of the properties of certain random matrices and show how their eigenspectrum behaves as a function of p/n. This analysis was extended by Silverstein [24] to include covariance matrices. A number of theoretical advances, including Johnstone [15] and El Karoui [16], followed. These works explore the behavior of the eigenvalues of covariance matrices in the high-dimensional setting. A general conclusion is that the sample covariance matrix constructed from n p-dimensional samples approaches the population covariance matrix at the rate of $\sqrt{p/n}$ in operator norm (which implies the same rate of convergence for eigenvalues and eigenvectors). Informally speaking, one needs several times as many samples as there are dimensions of the space. However, obtaining more samples is not a viable option in a range of practical situations, and the need for better estimators of the true covariance matrix was recognized. While some standard regularization techniques such as Ridge regression and Steinian shrinkage [7] can be applied, they are not expected to work well as they do not make use of any existing additional structure in the data to overcome the curse of dimensionality [13]. A technique utilizing such additional structure was proposed by Bickel and Levina [6], who propose banding of the covariance matrix based on a linear ordering of the coordinates. Effectively, the entries of the empirical covariance matrix corresponding to coordinates which are not close are set to zero. Other techniques are those proposed by Wu

and Pourahmadi [28], who suggest smoothing along the components of the Cholesky decomposition of the covariance matrix, Huang et al. [14], who propose to impose an $L_1$ penalty on the Cholesky factor to achieve covariance sparsity, and Furrer and Bengtsson [9], who propose to shrink (taper) the sample covariance matrix based on its Schur product with a positive definite function. Another set of techniques for regularization is based on the more general assumption that the covariance matrix is sparse; see Bickel and Levina [5] and El Karoui [17]. In this case, the authors suggest choosing a threshold and setting the entries of the empirical covariance matrix which fall below it to zero. The authors provide a variety of theoretical results, such as convergence rates for these estimates, showing the potential of these methods to work even when p is significantly larger than n.

There have also been several lines of work in computer vision aimed at improving covariance-based methods. Most of these approaches (e.g., [1, 19, 22, 29, 30]) attempt to exploit a certain assumed spatial structure to improve inference. For example, the approach in [22] is to divide the data into blocks defined by pixel proximity and to apply PCA to each block separately. A popular method, 2DPCA [29], treats each image row as a unit, which can be interpreted as a certain assumption about the pixel proximity structure on the coordinates; see [27]. Most of the existing methods impose a certain fixed structure on the space of images. While such structures, based, e.g., on spatial pixel proximity, are often natural and lead to elegant algorithms, one expects that many non-local dependencies would be ignored because of the rigidity of these models.

In this thesis we propose an algorithm for learning certain dependencies between the pixels of an image (or, more generally, coordinates of a vector representation) in

a more flexible, data-dependent way. More precisely, the data (images) are used to construct a metric space structure on the set of coordinates (pixels), which is then used to regularize the covariance matrix. Our algorithm is related to the class of manifold methods ([4, 23, 25]), but operates on the space of coordinates rather than data points. We provide theoretical results indicating its potential ability to overcome the curse of dimensionality. In the experimental section we also show that our algorithms consistently outperform 2DPCA as well as classical Eigenfaces, Fisherfaces and several recent statistical techniques, particularly when the data is noisy. Our method also generates very sparse matrices, allowing for efficient computation.

The outline for the rest of the thesis is as follows. In Chapter 2, we introduce the idea and intuition behind our approach, followed by a few preliminaries and the algorithm itself. In Chapter 3, we present a theoretical analysis of the ideas in Chapter 2, along with the relevant assumptions. The experimental evaluation of the algorithm is presented and discussed in Chapter 4.

CHAPTER 2

IDEA AND THE ALGORITHM

2.1 Idea and intuition

In recent years there has been a large amount of research on manifold-based algorithms for machine learning (e.g., [25], [23], [4]). These methods make the assumption that the data lies close to a low-dimensional, generally non-linear manifold embedded in a high-dimensional space. The manifold structure is then recovered and used for various inferential purposes, such as data representation, clustering, semi-supervised learning and many others. These ideas have also been applied in vision, e.g., [12]. In this thesis we take a point of view dual to that of traditional manifold learning. Instead of considering a non-linear structure on the space of data points (images), we try to discover non-linear structure on the space of coordinates (pixels). We believe that there are strong interactions between the coordinates involved which are not fully explained by spatial pixel proximity. For example, for face images, the pixels of the left and of the right eye are strongly correlated despite significant spatial separation between them. Dependencies of this sort are clearly structurally important for families of images, yet it is often hard to encode them explicitly. Therefore, we would like to find an automated way of detecting such dependencies from data in a

way useful for inferential tasks. We do this by constructing a metric space (graph) structure on the space of coordinates, reflecting correlations between the pixel values in images, such that strongly correlated coordinates are close together in the graph. While technically not a manifold, we think of this metric space as analogous to a manifold in manifold learning. We note that this structure is related to graphical models, where the relation between variables (coordinates) is also represented by a graph (e.g., [10]). However, most graphical models research (see [20] for an overview) either assumes that the graph structure is known in advance or attempts to learn the independence structure of variables from the data. Our thesis, on the other hand, is primarily concerned with highly correlated coordinates and does not require the much more subtle notion of conditional independence. We note that in the case of a multivariate Gaussian distribution, learning conditional independence requires inverting the correlation matrix ([20]), a procedure which is unstable in the high p, low n setting considered here. Intuitively, the underlying idea is to take advantage of highly correlated coordinates in the data, where the signal is particularly strong and inference can be done with higher confidence. This small set of coordinate pairs is then used to construct a graph structure on the set of coordinates.

2.2 Preliminaries

We start with a set of n images of size w × h, represented as column vectors $X_i$ of length $p = wh$. The sample covariance matrix $\hat{\Sigma}_p$ is defined as
$$\hat{\Sigma}_p = [\hat{\sigma}_{ij}]_{p \times p} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})^T \qquad (2.1)$$

where
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i. \qquad (2.2)$$
The correlations between pixels are
$$\hat{\rho}_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii}\,\hat{\sigma}_{jj}}}. \qquad (2.3)$$
Recall that $\rho_{ij} \in [-1, 1]$. As discussed above, we will use the correlation structure to construct a metric on the set of coordinates. A metric on a set of p elements is represented by a p × p symmetric matrix G, where G(i, j) gives the distance between elements i and j. We will obtain this metric from a graph representing the strongest correlations between coordinates and will use it to regularize the covariance matrix by removing correlations between coordinates which are not close in the graph. The banding operator (following the notation in Bickel and Levina [6]) used for banding a p × p matrix $M = [m_{ij}]$ using a metric G with a distance threshold d is defined as
$$[B_G^d(M)]_{ij} = \begin{cases} m_{ij}, & \text{if } G(i, j) \le d \\ 0, & \text{if } G(i, j) > d \end{cases}$$
Thus, given a matrix M, the operator $B_G^d(M)$ produces a new matrix where certain entries are replaced by zeros. $B_G^d(M)$ can be viewed as a projection operator on the space of matrices, preserving entries corresponding to pairs of coordinates that are close in the graph. In practice we expect the resulting matrices to be sparse. The underlying assumption is that there is a metric on the set of coordinates that reflects their correlation structure. Moreover, we hope that this structure can be recovered just from the pixel pairs with the strongest correlations, as we expect these pairs to give us the most information. In Chapter 3 we will prove that these pixel pairs can be recovered reliably from a number of image samples logarithmic in the dimension.
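For concreteness, the quantities in Equations (2.1)-(2.3) and the banding operator $B_G^d$ can be written down directly in NumPy. The sketch below is purely illustrative: it uses random data in place of image vectors and a toy metric based on the linear ordering of the coordinates in place of the graph metric constructed in the next section.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 300                                  # n samples of dimension p = w*h
X = rng.random((n, p))                          # stand-in for vectorized images X_i

X_bar = X.mean(axis=0)
Sigma_hat = (X - X_bar).T @ (X - X_bar) / n     # sample covariance, Eq. (2.1)-(2.2)
sd = np.sqrt(np.diag(Sigma_hat))
Rho_hat = Sigma_hat / np.outer(sd, sd)          # sample correlations, Eq. (2.3)

def band(M, G, d):
    """Banding operator B_G^d: keep m_ij when G(i, j) <= d, set it to zero otherwise."""
    return np.where(G <= d, M, 0.0)

# Toy metric: distance between coordinates i and j is |i - j|
G = np.abs(np.subtract.outer(np.arange(p), np.arange(p))).astype(float)
Sigma_banded = band(Sigma_hat, G, d=10)
print("fraction of retained entries:", np.count_nonzero(Sigma_banded) / p**2)
```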

Moreover, we will demonstrate that under a simple model for the underlying metric, this number of samples is sufficient for reconstructing the whole metric structure.

2.3 Algorithm

Input: A set of n labeled w × h images as column vectors $(X_1, X_2, \ldots, X_n)$ of length $p = wh$. Parameters: k, d.

1. [Constructing the dissimilarity matrix] Construct the sample correlation matrix $[\hat{\rho}_{ij}]_{p \times p}$ as in Equation (2.3). Construct the corresponding $p \times p$ dissimilarity matrix $\hat{D}$ with $\hat{d}_{ij} = 1 - \hat{\rho}_{ij}$.

2. [Constructing the graph] For each coordinate i, retain only the k smallest (least dissimilar) entries in the i-th row of $\hat{D}$. Symmetrize the resulting matrix to get a sparse matrix $\hat{T} = [\hat{t}_{ij}]_{p \times p}$ with $\hat{t}_{ij} = \max(\hat{d}_{ij}, \hat{d}_{ji})$. We interpret $\hat{T}$ as the adjacency matrix of a weighted graph G.

3. [Constructing the metric] Apply the Floyd-Warshall algorithm to the graph G given by $\hat{T}$ to construct the matrix of shortest-path distances between coordinates.

4. [Regularizing the covariance matrix] Band the covariance matrix $\hat{\Sigma}_p$ to get $\hat{\Sigma}_p^{reg} = B_G^d(\hat{\Sigma}_p)$:
$$\hat{\sigma}_{ij}^{reg} = \begin{cases} \hat{\sigma}_{ij}, & \text{if the shortest distance between } i \text{ and } j \text{ is at most } d, \\ 0, & \text{otherwise.} \end{cases}$$
Note: In the supervised setting, the value of d can be learned through the standard technique of cross-validation.

5. [Eigenfaces] Replace $\hat{\Sigma}_p$ by $\hat{\Sigma}_p^{reg}$ in the standard Eigenfaces algorithm or any other algorithm that uses covariance matrices. A code sketch of Steps 1-5 is given below.
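The following is a compact sketch of Steps 1-5 in NumPy/SciPy, included only as an illustration of the algorithm above. The graph distances are computed with scipy.sparse.csgraph.shortest_path (Floyd-Warshall), and the default values of k and d are placeholders rather than recommended settings; in practice d would be chosen by cross-validation as noted in Step 4.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def banded_faces_covariance(X, k=20, d=1.5):
    """Regularize the sample covariance of the n x p data matrix X (Steps 1-5)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n                       # sample covariance, Eq. (2.1)
    sd = np.sqrt(np.diag(Sigma))
    Rho = Sigma / np.outer(sd, sd)              # sample correlations, Eq. (2.3)

    # Step 1: dissimilarity matrix
    D = 1.0 - Rho

    # Step 2: for each row keep the k smallest dissimilarities (the zero diagonal counts
    # as one of them); an edge {i, j} is kept if either endpoint retained the other.
    keep = np.zeros((p, p), dtype=bool)
    nn = np.argsort(D, axis=1)[:, :k]
    keep[np.arange(p)[:, None], nn] = True
    keep |= keep.T
    T = np.where(keep, D, 0.0)                  # weighted adjacency; zero = no edge

    # Step 3: shortest-path (Floyd-Warshall) distances between all pairs of coordinates
    G = shortest_path(csr_matrix(T), method="FW", directed=False)

    # Step 4: band the covariance with respect to the graph metric
    Sigma_reg = np.where(G <= d, Sigma, 0.0)    # unreachable pairs (G = inf) are zeroed

    # Step 5: eigen-decomposition of the regularized covariance for Eigenfaces-type use
    eigvals, eigvecs = np.linalg.eigh(Sigma_reg)
    order = np.argsort(eigvals)[::-1]
    return Sigma_reg, eigvecs[:, order], eigvals[order]
```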

CHAPTER 3

THEORETICAL ANALYSIS

In this chapter, we formalize and provide theoretical justification for the ideas and the Algorithm presented in the previous chapter. We assume that, for the class of facial images, the dependencies between coordinates can be represented by a weighted graph whose nodes are the coordinates. This graph induces a metric space structure $d_M(\cdot, \cdot)$ such that strongly related coordinates are close together. We propose a model for this structure and show how, under these assumptions, it can be reconstructed from the data. We will assume that, if j is among the k coordinates most highly correlated with the coordinate i, they are connected by an edge whose weight is a linear function of the correlation $\rho_{ij}$ between them. For adjacent vertices i and j, we define the distance $d_M(i, j)$ to be the weight of the corresponding edge:
$$d_M(i, j) = m\,(1 - \rho_{ij}) + c \qquad (3.1)$$
If i and j are not adjacent, then $d_M(i, j)$ is the shortest weighted distance between them. This definition encodes our intuition that strongly correlated pixel pairs contain most of the information about the structure of images.

To make this intuition algorithmically useful, we will further assume that the correlation structure has a certain inherent sparsity, so that the correlations between pixels far away in our metric are close to zero. This notion is formalized in Definition (3). We will now discuss our key results providing the basis for the Algorithm described in the previous section.

1. In Theorem (3.0.1) we show that the adjacency structure of our graph can be recovered from data at the rate $k\sqrt{\frac{\log p}{n}}$. We note that this quantity is logarithmic in the dimension and depends linearly on the number of nearest neighbors, which we expect to be small in practice. This justifies Step 1 in our algorithm.

2. Given that the adjacency structure is recovered, and under the assumption of linearity (as above), Theorem (3.0.2) shows that the full metric space structure can be reconstructed in Steps 2 and 3 of the algorithm.

3. Assuming now that the correlation matrix is bandable with respect to the metric structure, we show in Theorem (3.0.3) that the banded version of the sample covariance matrix (constructed in Step 4 of the Algorithm) is close to the underlying population covariance matrix. Specifically, the spectral norm of the difference between these matrices is bounded by a function of $\frac{\log p}{n}$. On the other hand, if no banding is performed, the difference is of the order $\sqrt{\frac{p}{n}}$.

4. Finally, Corollary (3.0.4) shows that the actual computation of Eigenfaces or any other eigenvector-based algorithm can be done accurately using a number of samples depending on $\log p$. This corollary justifies the use of the spectral norm in Theorem (3.0.3) by making use of a Result (see Appendix) which says

that spectral norm convergence implies that both the sample eigenvalues and sample eigenvectors converge to their corresponding population equivalents (see [11] for a thorough discussion of matrix norms). This justifies Step 5 of the Algorithm.

Definition 1. Given a matrix $M = [m_{ij}]$, define
$$[T_k(M)]_{ij} = \begin{cases} m_{ij}, & \text{if } m_{ij} \text{ is not less than the } k\text{-th largest element in the } i\text{-th row of } M, \\ 0, & \text{otherwise.} \end{cases}$$

Definition 2. Given the metric $d_M(i, j)$ on a set of p variables, define
$$k_d = \max_i \sum_j I[d_M(i, j) < d].$$
That is, $k_d$ is the maximum, over all variables, of the number of neighboring variables that are less than distance d away from the given variable with respect to the metric $d_M(i, j)$.

Definition 3. Given the metric $d_M(i, j)$ on a set of p variables, we define the set of $(\alpha, C)$-bandable matrices with respect to this metric as
$$\left\{ \Sigma_p : \max_j \sum_i \{ |\sigma_{ij}| : d_M(i, j) > d \} \le C\,k_d^{-\alpha} \right\}.$$
This definition is a generalization of Definition (5) in [6].

Definition 4. (Matrix Norms) For a p-dimensional vector x, the following norms are defined:
$$\|x\|_k = \Big(\sum_{j=1}^{p} |x_j|^k\Big)^{1/k}, \qquad \|x\|_\infty = \max_j |x_j|, \qquad \|x\| = \|x\|_2.$$

For a symmetric matrix M, the following norms are defined:
$$\|M\| = \|M\|_2 = \sup\{\|Mx\| : \|x\| = 1\} = \max_i |\lambda_i(M)|,$$
$$\|M\|_{(\infty,\infty)} = \sup\{\|Mx\|_\infty : \|x\|_\infty = 1\} = \max_i \sum_j |m_{ij}|,$$
$$\|M\|_{(1,1)} = \sup\{\|Mx\|_1 : \|x\|_1 = 1\} = \max_j \sum_i |m_{ij}|,$$
$$\|M\|_\infty = \max_{ij} |m_{ij}| \le \min\left(\|M\|_{(1,1)}, \|M\|_{(\infty,\infty)}\right).$$
For symmetric matrices, the following are true from the above definitions:
$$\|M\|_{(\infty,\infty)} = \|M\|_{(1,1)}, \qquad \|M\| \le \|M\|_{(\infty,\infty)}.$$

Result 1. (Hoeffding's Inequality, 1963) Let $X_1, \ldots, X_n$ be independent random variables. Assume that the $X_i$ are almost surely bounded; that is, assume for $1 \le i \le n$ that $\Pr(X_i \in [a_i, b_i]) = 1$. Then, for the sum of these variables $S = X_1 + \cdots + X_n$, we have the inequality
$$\Pr(S - E[S] \ge nt) \le \exp\left(-\frac{2 n^2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right),$$
which is valid for positive values of t (where E[S] is the expected value of S).

Result 2. (Implication of spectral norm convergence; see [18]) If A and B are two symmetric matrices, and $\lambda_i(\cdot)$ denotes the i-th eigenvalue with the eigenvalues sorted in decreasing order, we have
$$|\lambda_i(A) - \lambda_i(B)| \le \|A - B\|$$

and
$$\|P_D(A) - P_D(B)\| \le \frac{\|A - B\|}{\delta},$$
where $D > 0$, $P_D(A)$ is the projection operator onto the first D eigenvectors of A and $\delta = (\lambda_D(A) - \lambda_{D+1}(A))/2$.

Theorem 3.0.1 (k-Nearest Neighbor convergence of $\hat{T}$)
$$\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\| = O_P\left(k\sqrt{\frac{\log p}{n}}\right)$$

Proof. The outline of the proof is as follows. We note from Definition (4) that, for symmetric matrices, the spectral norm $\|M\|$ is bounded by $\|M\|_{(\infty,\infty)}$, and use this to get
$$\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\| \le \|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\|_{(\infty,\infty)} \le 2k\,\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\|_\infty.$$
The second inequality follows from bounding the number of non-zero elements in any row of the difference, which is at most 2k (attained when the operator $T_k$ applied to the two matrices completely disagrees on the locations of the non-zero elements in a given row). Thus,
$$\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\| = O_P\left(k\,\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\|_\infty\right). \qquad (3.2)$$
Since each element of both $\hat{\Sigma}_p$ and $\Sigma_p$ (and hence of $T_k(\hat{\Sigma}_p)$ and $T_k(\Sigma_p)$) can be represented as the sum of independent bounded random variables (the individual pixel values are bounded), an application of Hoeffding's Inequality (Result (1))

followed by a union bound over the whole matrix gives
$$\Pr\left[\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\|_\infty \ge t\right] \le kp\,e^{-\delta n t^2}.$$
Choosing $t = \sqrt{\frac{\log(pk)}{n}}$,
$$\Pr\left[\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\|_\infty \ge \sqrt{\frac{\log(pk)}{n}}\right] \le (pk)^{1-\delta}.$$
Thus we have
$$\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\|_\infty = O_P\left(\sqrt{\frac{\log(pk)}{n}}\right) = O_P\left(\sqrt{\frac{\log p}{n}}\right) \quad (\text{since } k < p). \qquad (3.3)$$
Thus from (3.2) and (3.3) we have
$$\|T_k(\hat{\Sigma}_p) - T_k(\Sigma_p)\| = O_P\left(k\sqrt{\frac{\log p}{n}}\right).$$

Theorem 3.0.2 (Reconstruction of the metric $d_M(\cdot,\cdot)$) Under the model in (3.1), given a set of sample images, we can recover a metric $d_G(i, j)$ such that if $d_M(i, j) \le d_M(i, k)$, then $d_G(i, j) \le d_G(i, k)$.

Proof. If the largest dissimilarity value retained by $T_k(\hat{D})$ is $\epsilon'$, then the smallest correlation retained in the corresponding correlation matrix is $\epsilon = 1 - \epsilon'$. If we choose k

such that $\epsilon > \epsilon_0$, then by Theorem (3.0.1) and the assumption in Equation (3.1), for $\rho_{ij} > \epsilon_0$ we have $d_G(i, j) = 1 - \rho_{ij}$ and $d_M(i, j) = m\,d_G(i, j) + c$. Consider a path with one intermediate coordinate in the original metric; say the path between coordinates X and Y passes through P. We have
$$d_M(X, Y) = d_M(X, P) + d_M(P, Y)$$
and
$$P = \arg\min_{v \in \Lambda}\left[d_M(X, v) + d_M(v, Y)\right] = \arg\min_{v \in \Lambda}\left[m\,d_G(X, v) + c + m\,d_G(v, Y) + c\right] = \arg\min_{v \in \Lambda}\left[d_G(X, v) + d_G(v, Y)\right],$$
where $\Lambda$ is the set of all coordinates. So the path in the metric that we construct also passes through P. This argument extends to paths passing through more than one intermediate coordinate, and is in fact the basis of the Floyd-Warshall algorithm (dynamic programming).

Theorem 3.0.3 (Convergence of the Banded Estimator $\hat{\Sigma}_p^{reg}$) Suppose the population covariance matrix $\Sigma_p$ belongs to the set of matrices bandable with respect to the metric $d_M(i, j)$ (Definition (3)). Then, if we choose d such that $k_d \asymp (n^{-1}\log p)^{-\frac{1}{2(\alpha+1)}}$,
$$\|\hat{\Sigma}_p^{reg} - \Sigma_p\| = O_P\left(\left(\frac{\log p}{n}\right)^{\frac{\alpha}{2(\alpha+1)}}\right),$$
where $\hat{\Sigma}_p^{reg} = B_G^d(\hat{\Sigma}_p)$.

Proof. The proof follows along the lines of the proof of Theorem 1 in [6]. From the triangle inequality we have
$$\|B_G^d(\hat{\Sigma}_p) - \Sigma_p\| \le \|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\| + \|B_G^d(\Sigma_p) - \Sigma_p\|. \qquad (3.4)$$
From Definition (4), for symmetric matrices the spectral norm $\|M\|$ is bounded by $\|M\|_{(\infty,\infty)}$, and we use this to get
$$\|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\| \le \|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\|_{(\infty,\infty)} \le k_d\,\|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\|_\infty.$$
The second inequality follows from bounding the number of non-zero elements in any row of the difference, which is at most $k_d$ by the definition of $k_d$. Thus,
$$\|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\| = O_P\left(k_d\,\|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\|_\infty\right). \qquad (3.5)$$
Since each element of both $\hat{\Sigma}_p$ and $\Sigma_p$ (and hence of $B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)$) can be represented as the sum of independent bounded random variables (the individual pixel values are bounded), an application of Hoeffding's Inequality (Result (1)) followed by a union bound over the whole matrix gives
$$\Pr\left[\|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\|_\infty \ge t\right] \le k_d\,p\,e^{-\delta n t^2}.$$
Choosing $t = \sqrt{\frac{\log(p k_d)}{n}}$,
$$\Pr\left[\|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\|_\infty \ge \sqrt{\frac{\log(p k_d)}{n}}\right] \le (p k_d)^{1-\delta}.$$

Thus we have
$$\|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\|_\infty = O_P\left(\sqrt{\frac{\log(p k_d)}{n}}\right) = O_P\left(\sqrt{\frac{\log p}{n}}\right) \quad (\text{since } k_d < p). \qquad (3.6)$$
Thus from (3.5) and (3.6) we have
$$\|B_G^d(\hat{\Sigma}_p) - B_G^d(\Sigma_p)\| = O_P\left(k_d\sqrt{\frac{\log p}{n}}\right) = O_P\left(\left(\frac{\log p}{n}\right)^{\frac{\alpha}{2(\alpha+1)}}\right). \qquad (3.7)$$
Moreover, from Definition (3) and the properties of the norms, we have
$$\|B_G^d(\Sigma_p) - \Sigma_p\| \le \|B_G^d(\Sigma_p) - \Sigma_p\|_{(\infty,\infty)} = O_P\left(k_d^{-\alpha}\right) = O_P\left(\left(\frac{\log p}{n}\right)^{\frac{\alpha}{2(\alpha+1)}}\right). \qquad (3.8)$$
From Equations (3.4), (3.7) and (3.8) we thus have
$$\|\hat{\Sigma}_p^{reg} - \Sigma_p\| = O_P\left(\left(\frac{\log p}{n}\right)^{\frac{\alpha}{2(\alpha+1)}}\right).$$

Corollary 3.0.4 (Convergence of Banded Faces) If the population covariance matrix $\Sigma_p$ belongs to the set of matrices bandable with respect to the metric $d_M(i, j)$ (Definition (3)) and we choose d such that $k_d \asymp (n^{-1}\log p)^{-\frac{1}{2(\alpha+1)}}$, then
$$\|P_D(\Sigma_p) - P_D(\hat{\Sigma}_p^{reg})\| = O_P\left(\frac{1}{\delta}\left(\frac{\log p}{n}\right)^{\frac{\alpha}{2(\alpha+1)}}\right),$$
where $\hat{\Sigma}_p^{reg} = B_G^d(\hat{\Sigma}_p)$, $D > 0$, $\delta = (\lambda_D(\Sigma_p) - \lambda_{D+1}(\Sigma_p))/2$, $\lambda_i(A)$ is the i-th largest eigenvalue of A and $P_D(A)$ is the projection operator onto the first D eigenvectors of A.

Proof. By a straightforward application of Result (2) to $P_D(\Sigma_p) - P_D(\hat{\Sigma}_p^{reg})$ we get
$$\|P_D(\Sigma_p) - P_D(\hat{\Sigma}_p^{reg})\| \le \frac{\|\Sigma_p - \hat{\Sigma}_p^{reg}\|}{\delta}.$$
Applying the result of Theorem (3.0.3),
$$\|P_D(\Sigma_p) - P_D(\hat{\Sigma}_p^{reg})\| \le \frac{1}{\delta}\,O_P\left(\left(\frac{\log p}{n}\right)^{\frac{\alpha}{2(\alpha+1)}}\right) = O_P\left(\frac{1}{\delta}\left(\frac{\log p}{n}\right)^{\frac{\alpha}{2(\alpha+1)}}\right).$$
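As an illustration of the flavour of Theorem (3.0.3) and Corollary (3.0.4), the following toy numerical check (not part of the thesis) compares the sample covariance with its banded version when the population covariance decays with a known coordinate metric. The metric |i - j|, the decay rate 0.7, the banding radius and the sample size are all arbitrary choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, D, d = 200, 40, 5, 6.0

# Population covariance that is "bandable" w.r.t. the metric d_M(i, j) = |i - j|
dist = np.abs(np.subtract.outer(np.arange(p), np.arange(p))).astype(float)
Sigma = 0.7 ** dist

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)        # n << p samples
Xc = X - X.mean(axis=0)
Sigma_hat = Xc.T @ Xc / n                                      # sample covariance
Sigma_band = np.where(dist <= d, Sigma_hat, 0.0)               # banded estimator B_G^d(Sigma_hat)

def op_norm(M):
    """Spectral norm of a symmetric matrix: largest absolute eigenvalue."""
    return np.abs(np.linalg.eigvalsh((M + M.T) / 2)).max()

def top_projector(M, D):
    """Projection operator P_D onto the span of the top D eigenvectors."""
    w, V = np.linalg.eigh(M)
    U = V[:, np.argsort(w)[::-1][:D]]
    return U @ U.T

print("operator-norm error, sample:", op_norm(Sigma_hat - Sigma))
print("operator-norm error, banded:", op_norm(Sigma_band - Sigma))
print("projector error, sample:", op_norm(top_projector(Sigma_hat, D) - top_projector(Sigma, D)))
print("projector error, banded:", op_norm(top_projector(Sigma_band, D) - top_projector(Sigma, D)))
```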

CHAPTER 4

EXPERIMENTAL EVALUATION

In this chapter we evaluate the performance of the proposed algorithm and compare it to 2DPCA, classical Eigenfaces and some of the techniques recently proposed in the statistical literature. We test our algorithm on two basic tasks of computer vision, face recognition and image restoration/denoising, using two popular face datasets, the Yale face dataset and the ATT/ORL face dataset.

4.1 Datasets

ATT/ORL Face dataset: The ATT/ORL dataset consists of images of 40 different subjects with 10 images per subject. To speed up the computations we scale the images down to pixels.

Yale Face dataset: The Yale face dataset contains grayscale images of 15 individuals. There are 11 images per subject, one per different facial expression or configuration. For our experiments, we scale the images down to resolution.

In some of the experiments, zero-mean Gaussian pixel noise of differing variance is added to the images. We note that in all our experiments the number of training images n is significantly smaller than the dimension of the space p.
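As an illustration of this preprocessing, a small sketch that downscales a face image and adds zero-mean Gaussian pixel noise is given below. The use of Pillow, the target resolution and the scaling of pixel values to [0, 1] are assumptions made only for the example; sigma_noise plays the role of the noise standard deviation used in the tables that follow.

```python
import numpy as np
from PIL import Image

def load_face(image_path, size_hw=(28, 23), sigma_noise=0.02, seed=0):
    """Load a face image, downscale it, vectorize it and add zero-mean Gaussian pixel noise."""
    rng = np.random.default_rng(seed)
    h, w = size_hw
    img = Image.open(image_path).convert("L").resize((w, h))    # grayscale, downscaled
    x = np.asarray(img, dtype=float).ravel() / 255.0            # pixel vector in [0, 1]
    x_noisy = x + rng.normal(0.0, sigma_noise, size=x.shape)    # zero-mean Gaussian noise
    return np.clip(x_noisy, 0.0, 1.0)
```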

4.2 Face Recognition

We compare the performance of our algorithm with 2DPCA, Eigenfaces and two recent statistical banding techniques in the classification setting. To do that, we reduce the dimensionality using each algorithm and run a very simple classification algorithm, the 1-Nearest Neighbor classifier, in the reduced space. The two statistical techniques are banding of the covariance matrix using Euclidean pixel proximity [6] and thresholding (see [5, 17]). In Euclidean pixel proximity based banding, the assumption is that each pixel is correlated only with those pixels which are less than a distance d away from it in the Euclidean pixel space. Hence the procedure is to zero the covariance between two pixels if the distance between them is greater than a certain distance (the banding parameter) in the Euclidean pixel space. The assumption for the thresholding approach is that all covariances below a certain threshold are due to noise. The algorithm works by setting all entries in the empirical covariance matrix that have an absolute value below a certain thresholding parameter t to zero. The values of the banding and thresholding parameters in these algorithms, as well as the parameter d for our method, were chosen using cross-validation, which is common practice in similar experiments. The number of nearest neighbors k for the Banded Faces algorithm was set to 20 for all the experiments other than Table 4.4, where we show the dependence of the error on the number of nearest neighbors. To test the performance of the estimators and their robustness to noise, we run them on a set of faces with additional Gaussian pixel noise added. The results of this experiment on the ATT/ORL dataset are shown in Table (4.1) and for the Yale face dataset in Table (4.2). The first 100 (50 for the Yale dataset) eigenvectors were chosen as they capture approximately 90% of the variance.
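A minimal sketch of this evaluation protocol, assuming X_train, y_train, X_test, y_test are NumPy arrays of vectorized images and integer labels, and reusing the banded_faces_covariance sketch from Section 2.3, might look as follows (the eigenvector counts are the ones quoted in the tables, not fixed requirements):

```python
import numpy as np

def one_nn_error(X_train, y_train, X_test, y_test, eigvecs, n_eig=100):
    """Project onto the top n_eig eigenvectors and classify with 1-nearest-neighbor."""
    U = eigvecs[:, :n_eig]                        # leading eigenvectors as columns
    mu = X_train.mean(axis=0)
    Z_train = (X_train - mu) @ U                  # reduced-dimension representations
    Z_test = (X_test - mu) @ U
    dists = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=2)
    pred = y_train[np.argmin(dists, axis=1)]      # label of the nearest training sample
    return float(np.mean(pred != y_test))         # classification error rate

# Usage (hypothetical data):
# _, eigvecs, _ = banded_faces_covariance(X_train, k=20, d=1.5)
# err = one_nn_error(X_train, y_train, X_test, y_test, eigvecs, n_eig=100)
```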

To see the influence of the number of eigenvectors on performance, in Table (4.3) we compare Banded Faces with the different techniques for different numbers of eigenvectors (2DPCA is omitted as the number of eigenvectors is not directly comparable). All experiments were run for 15 iterations and the average values are reported.

Fisherfaces

We also compare Banded Faces with the technique of Euclidean proximity banding ([6]) in the context of classification, using Fisherfaces as the dimensionality reduction technique. Here the banding techniques operate on the full covariance matrix constructed in the dimensionality reduction step of Fisherfaces (see [3]) as well as on the between-class covariance matrix. The number of nearest neighbors k for the Banded Faces algorithm was set to 20. The performance of the algorithms is compared with different amounts of Gaussian pixel noise added to the datasets. The results for this experiment can be found in Table (4.5).

Table 4.1 (columns: σ_noise, Eigenfaces, Euc, 2DPCA, Thresh, B F): Classification error rate (in %) on a training set of 40 subjects with 4 samples per subject and a test set of 240 samples drawn from the ATT/ORL face dataset with different amounts of added noise. The number of eigenvectors was chosen to be 100 for all methods other than 2DPCA. For 2DPCA the number of eigenvectors was 4, which produced the best performance.

Discussion. Several observations are now in order:

Table 4.2 (columns: σ_noise, Eigenfaces, Euc, 2DPCA, Thresh, B F): Classification error rate (in %) on a training set of 15 subjects with 4 samples per subject and a test set of 165 samples drawn from the Yale face dataset. The number of eigenvectors was 50 and k was chosen to be 20. The results are averaged over 15 iterations.

Table 4.3 (columns: N_e, Eigenfaces, Euc, Thresh, B F): Classification error rate (in %) as a function of the number of eigenvectors N_e. The training set contains 40 subjects with 4 samples per subject and the test set contains 240 samples drawn from the ATT/ORL face dataset.

1. In our experiments we see that Banded Faces produces consistently better classification accuracy than 2DPCA and classical Eigenfaces, as well as the thresholding and Euclidean banding methods. These improvements are consistent throughout a range of parameters. We also see that the performance of Banded Faces is quite robust to changes in the number of nearest neighbors.

2. In line with the theoretical results in [15], we expect that the addition of noise to images makes estimating the covariance matrix and its eigenvectors less

stable, since the number of samples is much smaller than the dimension of the space. Our theoretical results suggest that Banded Faces should require fewer samples for accurate estimation, and thus its advantage should increase with the amount of noise. This is generally borne out in our experiments. For the ATT/ORL dataset in Table (4.1), the performance of 2DPCA and the other methods deteriorates significantly with added noise, while the classification accuracy of Banded Faces decreases much less. For the Yale dataset, Table (4.2), the performance of all methods decreases significantly with added noise; however, the comparative advantage of Banded Faces still increases.

Table 4.4 (columns: number of nearest neighbors k for the Banded Faces adjacency graph, plus Euc, Thr and 2DPCA; rows: error rate in %, sparsity in %): Classification error rate (in %) and percentage of non-zero entries (sparsity, in %) of the regularized covariance matrices for Banded Faces with different numbers of nearest neighbors k used to construct the graph, versus the best parameter settings for Euclidean and thresholding banding. While sparsity is not applicable for 2DPCA, its classification error is provided for comparison. The bolded setting is used for all other computations.

3. It is interesting to note that the 2DPCA, Euclidean pixel proximity and thresholding methods show similar performance (Table (4.1) and Table (4.2)). We see that these methods tend to outperform Eigenfaces, especially for noisy images, but do not match the performance of Banded Faces. We take these findings as validating our intuition that incorporating additional structure into the covariance matrix is generally helpful, especially when large amounts of noise are present, but that banding

using just the Euclidean spatial structure (as in 2DPCA and Euclidean banding) or simple thresholding may be too rigid.

Table 4.5 (columns: σ_noise, Fisherfaces, Euclidean, Banded Faces): Classification error rate for Fisherfaces (in %) on a training set of 40 subjects with 4 samples per subject and a test set of 240 samples drawn from the ATT/ORL face dataset with different amounts of added noise.

4. On a coarse level, the regularization by Banded Faces might seem equivalent to a thresholding procedure, which functions by setting all covariances whose absolute values fall below a particular threshold to 0. The experiments in Table (4.4) serve to dispel this possibility by comparing the respective regularized covariance matrices in terms of sparsity. It can be seen that even when the parameter k used in Banded Faces approaches its maximum value, the two procedures do not collapse to equivalence.

4.3 Image Reconstruction/Denoising

The goal of the next set of experiments is to compare the performance of the algorithms in the context of image reconstruction/denoising. Here, the procedure is to project the images onto a subspace spanned by the few top eigenvectors, in which the dimensions that mostly capture noise are presumably eliminated. In our experiments we chose to project onto the top 20 eigenvectors. We compare the results obtained by using Eigenfaces, Banded Faces and banding using Euclidean pixel

proximity. The $L_2$ distance between the reconstructed and the original image is used to quantify reconstruction performance, with 0 distance corresponding to perfect reconstruction. As before, the value of k for the Banded Faces algorithm was set to 20 for all experiments. The experiments were repeated over multiple iterations, with 30 samples of 10 subjects (3 per subject) drawn randomly from the dataset for each iteration. It should be noted that Eigenfaces (Principal Components Analysis) produces the best possible low-rank approximation to the empirical covariance matrix, thus minimizing the average $L_2$ distance to the training set. However (according to the results in [15]), we believe that this may not be a good estimator of the true population covariance matrix. We test this by adding noise to the images, applying the algorithm, and analyzing the distance of the reconstructed images to the original (noise-free) prototypes. The corresponding performance values averaged over 5 iterations are shown in Table (4.6) for the ATT/ORL dataset and in Table (4.7) for the Yale dataset. We also show some representative sample images of three of the subjects with varying levels of noise and their reconstructions in Table (4.8).

Table 4.6 (columns: σ_noise, Eigenfaces, Euc, Thresh, 2DPCA, B F): Reconstruction error ($L_2$ distance to the original) on a set of 10 subjects with 3 samples per subject drawn from the ATT face dataset, with different amounts of noise added.
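The reconstruction just described is an orthogonal projection onto the span of the top eigenvectors. A minimal sketch, assuming the eigenvectors are available as the columns of a matrix (for instance from the covariance sketch in Section 2.3), is:

```python
import numpy as np

def reconstruct(x_noisy, mean, eigvecs, n_eig=20):
    """Project a (noisy) image vector onto the subspace spanned by the top n_eig eigenvectors."""
    U = eigvecs[:, :n_eig]
    return mean + U @ (U.T @ (x_noisy - mean))    # orthogonal projection plus the mean image

def reconstruction_error(x_original, x_noisy, mean, eigvecs, n_eig=20):
    """L2 distance between the reconstruction of the noisy image and the clean original."""
    return float(np.linalg.norm(reconstruct(x_noisy, mean, eigvecs, n_eig) - x_original))
```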

Table 4.7 (columns: σ_noise, Eigenfaces, Euc, Thresh, 2DPCA, B F): Reconstruction error ($L_2$ distance to the original) on a set of 15 subjects with 3 samples per subject drawn from the Yale face dataset, with different amounts of noise added.

Discussion. We observe that images reconstructed using Banded Faces are consistently closer to the original prototypes than images reconstructed with Eigenfaces and the other methods. As expected, this gap increases with the amount of added noise. These improvements in reconstruction accuracy are easily observable in the images in Table (4.8), where some of the facial features are more clearly seen in the rightmost column but are less apparent in the reconstructions obtained with the other two methods. Interestingly, Euclidean proximity banding shows almost identical performance to Eigenfaces for smaller amounts of noise but demonstrates some improvement for the largest noise setting, while 2DPCA is better on one of the datasets. We see that the difference in performance between Eigenfaces and Banded Faces is significantly larger on the ATT/ORL dataset than on the Yale dataset. We conjecture that this may be a result of the fact that the original images in the ATT dataset are considerably more noisy.

Table 4.8 (image panel; columns: Original Image, Noisy Image, Eigenfaces, Euclidean, Banded Faces; rows: Low Noise σ_noise 0.01, Medium Noise σ_noise 0.02, High Noise σ_noise 0.03, each with its $L_2$ distance): A few representative samples from the experiments in Table (4.6). The numbers below an image are the values of its $L_2$ distance from the corresponding original image.

CHAPTER 5

SUMMARY AND CONCLUSIONS

In this thesis we explore the structure of covariance matrices used in many popular algorithms, such as Eigenfaces, in the setting where the dimension of the space is significantly larger than the number of samples available, a situation which is common in a range of computer vision problems. Theoretical analysis of Eigenfaces shows that they generally require a number of samples linear in the dimension. We postulate the existence of a data-dependent metric space structure on the coordinates, and show that under certain assumptions covariance matrices can be estimated accurately even when the number of samples is logarithmic in the dimension of the space. The resulting algorithms are simple and, as demonstrated by our experimental results, compare favorably to Eigenfaces in image reconstruction/denoising and image classification. We note that Eigenfaces is a very popular algorithm used for a variety of problems in computer vision, and it provides a nice basis for both experimental and theoretical comparison. However, our methods are not restricted to Eigenfaces and can be utilized wherever covariance or correlation matrices are used.

BIBLIOGRAPHY

[1] T. H. Ahonen and M. A. Pietikainen. Face description with local binary patterns: Application to face recognition. IEEE PAMI, 28(12).

[2] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, New York.

[3] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7).

[4] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6).

[5] P. J. Bickel and E. Levina. Covariance regularization by thresholding. To appear in the Annals of Statistics.

[6] P. J. Bickel and E. Levina. Regularized estimation of large covariance matrices. Annals of Statistics, 36(1).

[7] P. J. Bickel and B. Li. Regularization in statistics. Test, 15(2).

[8] R. Chellappa, C. L. Wilson, and S. Sirohey. Human and machine recognition of faces: A survey. Proceedings of the IEEE, 83(5).

[9] R. Furrer and T. Bengtsson. Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. Journal of Multivariate Analysis, 98:227-255.

[10] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(2).

[11] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, second edition.

[12] X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang. Face recognition using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3).

[13] A. E. Hoerl and R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12:55-67.

[14] J. Huang, N. Liu, M. Pourahmadi, and L. Liu. Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93:85-98.

[15] I. M. Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, 29(2).

[16] N. El Karoui. Spectrum estimation for large dimensional covariance matrices using random matrix theory. Annals of Statistics.

[17] N. El Karoui. Operator norm consistent estimation of large dimensional sparse covariance matrices. To appear in the Annals of Statistics.

[18] T. Kato. Perturbation Theory for Linear Operators. Springer.

[19] H. Kong, L. Wang, E. K. Teoh, X. Li, J. G. Wang, and R. Venkateswarlu. Generalized 2D principal component analysis for face image representation and recognition. Neural Networks, 18(5-6).

[20] S. L. Lauritzen. Graphical Models. Clarendon Press.

[21] V. A. Marcenko and L. A. Pastur. Distribution of the eigenvalues in certain sets of random matrices. Math. USSR-Sbornik, 1(4).

[22] K. Nishino, S. Nayar, and T. Jebara. Clustered blockwise PCA for representing visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1675.

[23] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290.

[24] J. W. Silverstein and Z. D. Bai. On the empirical distribution of eigenvalues of a class of large dimensional random matrices. Journal of Multivariate Analysis, 54.

[25] J. Tenenbaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290.

[26] M. Turk and A. Pentland. Face recognition using Eigenfaces. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27] L. Wang, X. Wang, X. Zhang, and J. Feng. The equivalence of two-dimensional PCA to line-based PCA. Pattern Recognition Letters, 26(1):57-60.

[28] W. B. Wu and M. Pourahmadi. Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90:831-844.

[29] J. Yang, D. Zhang, A. F. Frangi, and J. Yang. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE PAMI, 26(1).

[30] J. Ye, R. Janardan, and Q. Li. Two-dimensional linear discriminant analysis. NIPS.


More information

CITS 4402 Computer Vision

CITS 4402 Computer Vision CITS 4402 Computer Vision A/Prof Ajmal Mian Adj/A/Prof Mehdi Ravanbakhsh Lecture 06 Object Recognition Objectives To understand the concept of image based object recognition To learn how to match images

More information

Graph-Laplacian PCA: Closed-form Solution and Robustness

Graph-Laplacian PCA: Closed-form Solution and Robustness 2013 IEEE Conference on Computer Vision and Pattern Recognition Graph-Laplacian PCA: Closed-form Solution and Robustness Bo Jiang a, Chris Ding b,a, Bin Luo a, Jin Tang a a School of Computer Science and

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

Keywords Eigenface, face recognition, kernel principal component analysis, machine learning. II. LITERATURE REVIEW & OVERVIEW OF PROPOSED METHODOLOGY

Keywords Eigenface, face recognition, kernel principal component analysis, machine learning. II. LITERATURE REVIEW & OVERVIEW OF PROPOSED METHODOLOGY Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Eigenface and

More information

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008

Regularized Estimation of High Dimensional Covariance Matrices. Peter Bickel. January, 2008 Regularized Estimation of High Dimensional Covariance Matrices Peter Bickel Cambridge January, 2008 With Thanks to E. Levina (Joint collaboration, slides) I. M. Johnstone (Slides) Choongsoon Bae (Slides)

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Manifold Regularization

Manifold Regularization 9.520: Statistical Learning Theory and Applications arch 3rd, 200 anifold Regularization Lecturer: Lorenzo Rosasco Scribe: Hooyoung Chung Introduction In this lecture we introduce a class of learning algorithms,

More information

Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine

Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine Olga Kouropteva, Oleg Okun, Matti Pietikäinen Machine Vision Group, Infotech Oulu and

More information

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation Introduction and Data Representation Mikhail Belkin & Partha Niyogi Department of Electrical Engieering University of Minnesota Mar 21, 2017 1/22 Outline Introduction 1 Introduction 2 3 4 Connections to

More information

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices

Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices Tuning Parameter Selection in Regularized Estimations of Large Covariance Matrices arxiv:1308.3416v1 [stat.me] 15 Aug 2013 Yixin Fang 1, Binhuan Wang 1, and Yang Feng 2 1 New York University and 2 Columbia

More information

Intrinsic Structure Study on Whale Vocalizations

Intrinsic Structure Study on Whale Vocalizations 1 2015 DCLDE Conference Intrinsic Structure Study on Whale Vocalizations Yin Xian 1, Xiaobai Sun 2, Yuan Zhang 3, Wenjing Liao 3 Doug Nowacek 1,4, Loren Nolte 1, Robert Calderbank 1,2,3 1 Department of

More information

Smart PCA. Yi Zhang Machine Learning Department Carnegie Mellon University

Smart PCA. Yi Zhang Machine Learning Department Carnegie Mellon University Smart PCA Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Abstract PCA can be smarter and makes more sensible projections. In this paper, we propose smart PCA, an extension

More information

Graph Metrics and Dimension Reduction

Graph Metrics and Dimension Reduction Graph Metrics and Dimension Reduction Minh Tang 1 Michael Trosset 2 1 Applied Mathematics and Statistics The Johns Hopkins University 2 Department of Statistics Indiana University, Bloomington November

More information

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti

More information

Reconnaissance d objetsd et vision artificielle

Reconnaissance d objetsd et vision artificielle Reconnaissance d objetsd et vision artificielle http://www.di.ens.fr/willow/teaching/recvis09 Lecture 6 Face recognition Face detection Neural nets Attention! Troisième exercice de programmation du le

More information

L26: Advanced dimensionality reduction

L26: Advanced dimensionality reduction L26: Advanced dimensionality reduction The snapshot CA approach Oriented rincipal Components Analysis Non-linear dimensionality reduction (manifold learning) ISOMA Locally Linear Embedding CSCE 666 attern

More information

A METHOD OF FINDING IMAGE SIMILAR PATCHES BASED ON GRADIENT-COVARIANCE SIMILARITY

A METHOD OF FINDING IMAGE SIMILAR PATCHES BASED ON GRADIENT-COVARIANCE SIMILARITY IJAMML 3:1 (015) 69-78 September 015 ISSN: 394-58 Available at http://scientificadvances.co.in DOI: http://dx.doi.org/10.1864/ijamml_710011547 A METHOD OF FINDING IMAGE SIMILAR PATCHES BASED ON GRADIENT-COVARIANCE

More information

Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA

Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Yoshua Bengio Pascal Vincent Jean-François Paiement University of Montreal April 2, Snowbird Learning 2003 Learning Modal Structures

More information

Riemannian Metric Learning for Symmetric Positive Definite Matrices

Riemannian Metric Learning for Symmetric Positive Definite Matrices CMSC 88J: Linear Subspaces and Manifolds for Computer Vision and Machine Learning Riemannian Metric Learning for Symmetric Positive Definite Matrices Raviteja Vemulapalli Guide: Professor David W. Jacobs

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Modeling Classes of Shapes Suppose you have a class of shapes with a range of variations: System 2 Overview

Modeling Classes of Shapes Suppose you have a class of shapes with a range of variations: System 2 Overview 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 4 4 4 6 Modeling Classes of Shapes Suppose you have a class of shapes with a range of variations: System processes System Overview Previous Systems:

More information

Spectral Regression for Efficient Regularized Subspace Learning

Spectral Regression for Efficient Regularized Subspace Learning Spectral Regression for Efficient Regularized Subspace Learning Deng Cai UIUC dengcai2@cs.uiuc.edu Xiaofei He Yahoo! hex@yahoo-inc.com Jiawei Han UIUC hanj@cs.uiuc.edu Abstract Subspace learning based

More information

Iterative Laplacian Score for Feature Selection

Iterative Laplacian Score for Feature Selection Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Robust Laplacian Eigenmaps Using Global Information

Robust Laplacian Eigenmaps Using Global Information Manifold Learning and its Applications: Papers from the AAAI Fall Symposium (FS-9-) Robust Laplacian Eigenmaps Using Global Information Shounak Roychowdhury ECE University of Texas at Austin, Austin, TX

More information

Graphs, Geometry and Semi-supervised Learning

Graphs, Geometry and Semi-supervised Learning Graphs, Geometry and Semi-supervised Learning Mikhail Belkin The Ohio State University, Dept of Computer Science and Engineering and Dept of Statistics Collaborators: Partha Niyogi, Vikas Sindhwani In

More information

Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center Distance Metric Learning in Data Mining (Part II) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II

More information

Deep Learning: Approximation of Functions by Composition

Deep Learning: Approximation of Functions by Composition Deep Learning: Approximation of Functions by Composition Zuowei Shen Department of Mathematics National University of Singapore Outline 1 A brief introduction of approximation theory 2 Deep learning: approximation

More information

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation

Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Orthogonal Laplacianfaces for Face Recognition

Orthogonal Laplacianfaces for Face Recognition 3608 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 11, NOVEMBER 2006 [29] G. Deng and J. C. Pinoli, Differentiation-based edge detection using the logarithmic image processing model, J. Math. Imag.

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

COVARIANCE REGULARIZATION FOR SUPERVISED LEARNING IN HIGH DIMENSIONS

COVARIANCE REGULARIZATION FOR SUPERVISED LEARNING IN HIGH DIMENSIONS COVARIANCE REGULARIZATION FOR SUPERVISED LEARNING IN HIGH DIMENSIONS DANIEL L. ELLIOTT CHARLES W. ANDERSON Department of Computer Science Colorado State University Fort Collins, Colorado, USA MICHAEL KIRBY

More information

Small sample size in high dimensional space - minimum distance based classification.

Small sample size in high dimensional space - minimum distance based classification. Small sample size in high dimensional space - minimum distance based classification. Ewa Skubalska-Rafaj lowicz Institute of Computer Engineering, Automatics and Robotics, Department of Electronics, Wroc

More information

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models

More information

Recognition Using Class Specific Linear Projection. Magali Segal Stolrasky Nadav Ben Jakov April, 2015

Recognition Using Class Specific Linear Projection. Magali Segal Stolrasky Nadav Ben Jakov April, 2015 Recognition Using Class Specific Linear Projection Magali Segal Stolrasky Nadav Ben Jakov April, 2015 Articles Eigenfaces vs. Fisherfaces Recognition Using Class Specific Linear Projection, Peter N. Belhumeur,

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Group Sparse Non-negative Matrix Factorization for Multi-Manifold Learning

Group Sparse Non-negative Matrix Factorization for Multi-Manifold Learning LIU, LU, GU: GROUP SPARSE NMF FOR MULTI-MANIFOLD LEARNING 1 Group Sparse Non-negative Matrix Factorization for Multi-Manifold Learning Xiangyang Liu 1,2 liuxy@sjtu.edu.cn Hongtao Lu 1 htlu@sjtu.edu.cn

More information

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1 Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

More information

Face Recognition Using Multi-viewpoint Patterns for Robot Vision

Face Recognition Using Multi-viewpoint Patterns for Robot Vision 11th International Symposium of Robotics Research (ISRR2003), pp.192-201, 2003 Face Recognition Using Multi-viewpoint Patterns for Robot Vision Kazuhiro Fukui and Osamu Yamaguchi Corporate Research and

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional

More information

ECE 661: Homework 10 Fall 2014

ECE 661: Homework 10 Fall 2014 ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;

More information

L 2,1 Norm and its Applications

L 2,1 Norm and its Applications L 2, Norm and its Applications Yale Chang Introduction According to the structure of the constraints, the sparsity can be obtained from three types of regularizers for different purposes.. Flat Sparsity.

More information

Enhanced graph-based dimensionality reduction with repulsion Laplaceans

Enhanced graph-based dimensionality reduction with repulsion Laplaceans Enhanced graph-based dimensionality reduction with repulsion Laplaceans E. Kokiopoulou a, Y. Saad b a EPFL, LTS4 lab, Bat. ELE, Station 11; CH 1015 Lausanne; Switzerland. Email: effrosyni.kokiopoulou@epfl.ch

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

ABSTRACT INTRODUCTION

ABSTRACT INTRODUCTION ABSTRACT Presented in this paper is an approach to fault diagnosis based on a unifying review of linear Gaussian models. The unifying review draws together different algorithms such as PCA, factor analysis,

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Approximating the Covariance Matrix with Low-rank Perturbations

Approximating the Covariance Matrix with Low-rank Perturbations Approximating the Covariance Matrix with Low-rank Perturbations Malik Magdon-Ismail and Jonathan T. Purnell Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 {magdon,purnej}@cs.rpi.edu

More information

LECTURE NOTE #11 PROF. ALAN YUILLE

LECTURE NOTE #11 PROF. ALAN YUILLE LECTURE NOTE #11 PROF. ALAN YUILLE 1. NonLinear Dimension Reduction Spectral Methods. The basic idea is to assume that the data lies on a manifold/surface in D-dimensional space, see figure (1) Perform

More information

Supervised locally linear embedding

Supervised locally linear embedding Supervised locally linear embedding Dick de Ridder 1, Olga Kouropteva 2, Oleg Okun 2, Matti Pietikäinen 2 and Robert P.W. Duin 1 1 Pattern Recognition Group, Department of Imaging Science and Technology,

More information

Lecture 13 Visual recognition

Lecture 13 Visual recognition Lecture 13 Visual recognition Announcements Silvio Savarese Lecture 13-20-Feb-14 Lecture 13 Visual recognition Object classification bag of words models Discriminative methods Generative methods Object

More information

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21

Discrete Mathematics and Probability Theory Fall 2015 Lecture 21 CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about

More information

Global (ISOMAP) versus Local (LLE) Methods in Nonlinear Dimensionality Reduction

Global (ISOMAP) versus Local (LLE) Methods in Nonlinear Dimensionality Reduction Global (ISOMAP) versus Local (LLE) Methods in Nonlinear Dimensionality Reduction A presentation by Evan Ettinger on a Paper by Vin de Silva and Joshua B. Tenenbaum May 12, 2005 Outline Introduction The

More information

Data dependent operators for the spatial-spectral fusion problem

Data dependent operators for the spatial-spectral fusion problem Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012 Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A.

More information

Table of Contents. Multivariate methods. Introduction II. Introduction I

Table of Contents. Multivariate methods. Introduction II. Introduction I Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation

More information

Symmetric Two Dimensional Linear Discriminant Analysis (2DLDA)

Symmetric Two Dimensional Linear Discriminant Analysis (2DLDA) Symmetric Two Dimensional inear Discriminant Analysis (2DDA) Dijun uo, Chris Ding, Heng Huang University of Texas at Arlington 701 S. Nedderman Drive Arlington, TX 76013 dijun.luo@gmail.com, {chqding,

More information

Spectral Techniques for Clustering

Spectral Techniques for Clustering Nicola Rebagliati 1/54 Spectral Techniques for Clustering Nicola Rebagliati 29 April, 2010 Nicola Rebagliati 2/54 Thesis Outline 1 2 Data Representation for Clustering Setting Data Representation and Methods

More information

Machine Learning, Fall 2009: Midterm

Machine Learning, Fall 2009: Midterm 10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all

More information

Nonlinear Manifold Learning Summary

Nonlinear Manifold Learning Summary Nonlinear Manifold Learning 6.454 Summary Alexander Ihler ihler@mit.edu October 6, 2003 Abstract Manifold learning is the process of estimating a low-dimensional structure which underlies a collection

More information

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision) CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions

More information

Nonlinear Methods. Data often lies on or near a nonlinear low-dimensional curve aka manifold.

Nonlinear Methods. Data often lies on or near a nonlinear low-dimensional curve aka manifold. Nonlinear Methods Data often lies on or near a nonlinear low-dimensional curve aka manifold. 27 Laplacian Eigenmaps Linear methods Lower-dimensional linear projection that preserves distances between all

More information

Machine Learning. Data visualization and dimensionality reduction. Eric Xing. Lecture 7, August 13, Eric Xing Eric CMU,

Machine Learning. Data visualization and dimensionality reduction. Eric Xing. Lecture 7, August 13, Eric Xing Eric CMU, Eric Xing Eric Xing @ CMU, 2006-2010 1 Machine Learning Data visualization and dimensionality reduction Eric Xing Lecture 7, August 13, 2010 Eric Xing Eric Xing @ CMU, 2006-2010 2 Text document retrieval/labelling

More information