Robust subspace recovery by geodesically convex optimization


Teng Zhang
(arXiv preprint, v2 [stat.ML], 10 June 2012. T. Zhang is with the Institute for Mathematics and its Applications, University of Minnesota, Minneapolis, MN, USA; zhang620@umn.edu.)

Abstract—We introduce Tyler's M-estimator to robustly recover the underlying linear model from a data set contaminated by outliers. We prove that the objective function of this estimator is geodesically convex on the manifold of all positive definite matrices and has a unique minimizer. Besides, we prove that when the inliers (i.e., the points that are not outliers) are sampled from a subspace and the percentage of outliers is bounded by some number, then under some very weak assumptions a commonly used algorithm for this estimator recovers the underlying subspace exactly. We also show that empirically this algorithm compares favorably with other convex algorithms for subspace recovery.

I. INTRODUCTION

This paper is about the following problem: given a data set X with inliers sampled from a low-dimensional linear model and some arbitrary outliers, can we recover the underlying linear model? The primary tool for this problem is Principal Component Analysis (PCA). However, PCA is very sensitive to outliers. Considering the popularity of linear modeling, a robust algorithm that finds the underlying linear model has many applications. This work introduces Tyler's M-estimator of covariance [28] and proves that its objective function is geodesically convex on the manifold of all positive definite matrices. Moreover, this work proves that when the inliers are sampled from a subspace L, a commonly used algorithm for this estimator finds the underlying subspace L under very weak conditions that depend almost only on the percentage of outliers.

A. Notation conventions

We assume that we are given a data set X ⊂ R^D with N points. For a subspace L, we define the projector Π_L as the D × D symmetric matrix such that Π_L^2 = Π_L and the range of Π_L is L, and we let P_L denote a D × dim(L) matrix with orthonormal columns spanning L, so that Π_L = P_L P_L^T. We use L^⊥ to denote the orthogonal complement of L. We use X ∩ L to denote the set of points that lie both in X and in the subspace L, and X \ L to denote the set of points that lie in X but not in the subspace L. We use |X| to denote the cardinality of the set X, S_+(D) to denote the set of all D × D positive semidefinite matrices, and S_{++}(D) to denote the set of all D × D positive definite matrices.

B. Main results

In this paper we introduce the following estimator due to Tyler [28]:

Σ̂ = arg min_{tr(Σ)=1, Σ=Σ^T, Σ∈S_{++}(D)} F(Σ), where F(Σ) = (1/N) Σ_{x∈X} log(x^T Σ^{-1} x) + (1/D) log det(Σ),   (I.1)

and we obtain Σ̂ as the limit of the sequence Σ^{(k)} generated by the following iterative procedure from [28]:

Σ^{(k+1)} = ( Σ_{x∈X} x x^T / (x^T (Σ^{(k)})^{-1} x) ) / tr( Σ_{x∈X} x x^T / (x^T (Σ^{(k)})^{-1} x) ).   (I.2)

We explain the motivation of this estimator as an M-estimator of covariance in Section I-D, and show in Section III that the objective function F(Σ) is geodesically convex on S_{++}(D) and that, under the condition (III.1), the sequence Σ^{(k)} generated by (I.2) converges to the unique solution of (I.1). When the inliers lie exactly on a subspace L, then under some weak assumptions (depending almost only on the percentage of outliers) we can recover L exactly from lim_{k→∞} Σ^{(k)}, which is a singular matrix with L as its range.

Theorem I.1.
If there exists a d-dimensional subspace L such that

|X ∩ L| / |X| > d / D,   (I.3)

and the points in the sets Y = {P_L^T x : x ∈ X ∩ L} ⊂ R^d and Y_0 = {P_{L⊥}^T x : x ∈ X \ L} ⊂ R^{D−d} lie in general position respectively (i.e., any k points in Y span a k-dimensional subspace for all k ≤ d, and any k points in Y_0 span a k-dimensional subspace for all k ≤ D − d), then the sequence Σ^{(k)} generated by (I.2) converges to some Σ̂ such that im(Σ̂) = L.

The general-position condition is very weak: for example, when we choose inliers arbitrarily from a uniform distribution on a ball in L, or from a Gaussian measure on L, and choose outliers arbitrarily from a uniform measure on a ball in R^D, or from a Gaussian measure on R^D, this condition holds with probability 1. We remark that when the ambient dimension D goes to infinity and the dimension of the subspace L is kept at the constant d, then d/D approaches 0 and the required percentage of inliers in Theorem I.1 approaches 0. This property makes Theorem I.1 particularly strong for high-dimensional data sets with a low-dimensional structure.
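As a concrete illustration of the iterative procedure (I.2), the following is a minimal NumPy sketch. It is not from the paper: the function name `tyler_iteration`, the initialization, the tolerance, and the safeguard are our own choices.

```python
import numpy as np

def tyler_iteration(X, max_iter=1000, tol=1e-8):
    """Run the fixed-point iteration (I.2) for Tyler's M-estimator.

    X is an N x D array whose rows are the data points. Returns the
    trace-normalized iterate at convergence or after max_iter steps.
    """
    N, D = X.shape
    Sigma = np.eye(D) / D                       # initialization with tr(Sigma) = 1
    for _ in range(max_iter):
        # quadratic forms x^T Sigma^{-1} x for every data point
        w = np.einsum('ij,ij->i', X, np.linalg.solve(Sigma, X.T).T)
        if np.min(w) <= 0:                      # safeguard against numerical breakdown
            break
        S = (X.T / w) @ X                       # sum_x x x^T / (x^T Sigma^{-1} x)
        Sigma_new = S / np.trace(S)             # trace normalization as in (I.2)
        if np.linalg.norm(Sigma_new - Sigma, 'fro') < tol * np.linalg.norm(Sigma, 'fro'):
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```

When the inliers lie exactly on a subspace L and (I.3) holds, the iterate above approaches a singular matrix with range L, which is the behavior described in Theorem I.1.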

C. Previous work

Robust estimation of covariance has been well studied in the statistical literature, and it is a topic related to robust linear modeling, since we can recover the linear model from the principal components of the estimated covariance. M-estimators, L-estimators, MCD/MVE estimators and the Stahel-Donoho estimator have been proposed; for a complete review we refer the reader to [20, Section 6]. However, most of these methods are not convex (or their convexity is not analyzed), so the associated algorithms are intractable (or have unknown tractability). Possibly the only exception is the class of M-estimators: the convergence of their algorithm has been analyzed in [13], and we will show later that, as a special M-estimator, F(Σ) is geodesically convex on the space of positive definite matrices (this is also shown in [32]).

There are other methods that recover the linear model without estimating a robust covariance, such as Projection Pursuit [18], [1], [22, Section 2], which finds the principal components directly by optimization on a sphere. Another common strategy is to fit the linear model by PCA after removing possible outliers [27], [33]. However, these methods are still nonconvex. Some recent works on robust linear modeling focus on convex optimization and tractable algorithms [34], [22], [35], [17]. Similar to Theorem I.1, these works provide conditions for exact subspace recovery. We remark that these conditions are more complicated than the condition in Theorem I.1, since they assume an incoherence condition that requires the inliers to be spread out in the subspace L, which is not required in Theorem I.1. Conditions of this kind are required in [34, Theorem 1] and [35, (6)-(7)]. In [17, Theorem 1.1] it is shown that exact recovery holds with high probability when |X ∩ L| > C_0 + C_1 (d/D) |X \ L|, which is a simple condition very similar to the condition in (I.3). However, this condition is obtained under the assumption that inliers and outliers are both sampled from Gaussian distributions. In another recent work, Soltanolkotabi and Candès proved that the sparse subspace clustering (SSC) algorithm [7] can recover multiple subspaces with high probability, but this theory also has probabilistic assumptions: it assumes that inliers and outliers are both i.i.d. sampled from uniform measures on unit spheres [26, Theorem 1.3].

We remark that our condition (I.3) can sometimes be more restrictive than the corresponding conditions of other convex methods. For example, when the outliers have small magnitude and concentrate around the origin, the conditions in [35, Theorem 2] can tolerate more outliers. Similarly, the conditions in [34, Theorem 1] and [26, Theorem 1.3] can also tolerate more outliers than (I.3) in some settings. The advantage of our condition is that it is deterministic and simple, and empirically it is also usually less restrictive than the conditions in [34], [22], [35], [17].

D. M-estimators

In this section we show that the estimator (I.1) can be considered as a special M-estimator of covariance, and we give background on the current research on this estimator. We start with the motivation: it is well known that the empirical covariance is the MLE of the covariance when we assume that all x ∈ X are i.i.d. drawn from a Gaussian distribution. As a natural generalization, M-estimators of covariance [19], [10], [21] consider the generalized distribution

C(ρ) e^{−ρ(x^T Σ^{-1} x)} / sqrt(det(Σ)),   (I.4)

where C(ρ) is a normalization constant chosen so that the integral of the distribution is equal to one. It is a generalization since when ρ(x) = x, (I.4) gives a Gaussian distribution.
When the data points are i.i.d. sampled from the distribution (I.4), the corresponding MLE of the covariance is called an M-estimator, and it minimizes

(1/N) Σ_{x∈X} ρ(x^T Σ^{-1} x) + (1/2) log det(Σ).   (I.5)

The objective function F(Σ) can be considered as the M-estimator with ρ(x) = (D/2) log(x). While for this choice of ρ the function in (I.4) is not a distribution, it can be considered as the limit of the following multivariate Student distribution as ν → 0 [20, page 187]:

Γ[(ν+D)/2] / ( Γ(ν/2) ν^{D/2} π^{D/2} sqrt(det(Σ)) ) · [1 + (1/ν) x^T Σ^{-1} x]^{−(ν+D)/2}.

Since the Student distribution has a heavy tail, it is expected that this estimator should be more robust to outliers.

The reason that we enforce the condition tr(Σ) = 1 in (I.1) is the scale invariance property of F(Σ): for any constant c > 0 and Σ ∈ S_{++}(D), we have

F(Σ) = F(cΣ).   (I.6)

This simple fact can be easily verified, but it will be used repeatedly in the analysis later.

Tyler and Kent investigated the estimator (I.1) implicitly, by solving the equation F'(Σ) = 0 [28], [12]. They obtained the uniqueness of the solution (up to scaling) of F'(Σ) = 0 and showed that the algorithm (I.2) converges to this solution in [12, Theorem 2], under the assumption (III.1). This result is almost equivalent to Theorem III.1 and Theorem III.4 in this work, except that we consider the minimization of F(Σ) directly and also show the existence of the minimizer. An interesting claim in [28] is that this estimator is the most robust estimator of the scatter matrix of an elliptical distribution, in the sense of minimizing the maximum asymptotic variance.

The geodesic convexity of the objective function F(Σ) was discovered later. In [2], Auderset et al. showed that the function is geodesically convex on the space {Σ ∈ S_{++}(D) : det(Σ) = 1}. After finishing this work, we learned that the geodesic convexity of F(Σ) on the space S_{++}(D) was recently and independently investigated by Wiesel in [32, Proposition 1]. Wiesel also extended the convexity analysis to the regularized Tyler's M-estimator in [32], and to generalized LSE (logarithm of a sum of exponentials) functions and the estimation of Kronecker-structured covariance in [30], [31].
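To make the scale invariance (I.6) concrete, the short sketch below evaluates the objective F(Σ) of (I.1) and checks numerically that F(cΣ) = F(Σ). The helper name `tyler_objective` and the test data are our own choices, not part of the paper.

```python
import numpy as np

def tyler_objective(X, Sigma):
    """F(Sigma) = (1/N) sum_x log(x^T Sigma^{-1} x) + (1/D) log det(Sigma), cf. (I.1)."""
    D = X.shape[1]
    w = np.einsum('ij,ij->i', X, np.linalg.solve(Sigma, X.T).T)  # x^T Sigma^{-1} x
    _, logdet = np.linalg.slogdet(Sigma)
    return np.mean(np.log(w)) + logdet / D

# scale invariance (I.6): F(c * Sigma) == F(Sigma) for any c > 0
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + np.eye(5)
print(np.isclose(tyler_objective(X, Sigma), tyler_objective(X, 3.7 * Sigma)))  # True
```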

E. Contributions and structure of this paper

The main contribution of this work is that we introduce Tyler's M-estimator for subspace recovery, and justify this estimator by showing that the algorithm (I.2) can recover the underlying subspace exactly under rather weak assumptions on the distribution of the data points. Besides, we also apply geodesic convexity and a majorization-minimization argument to show the existence and uniqueness of the minimizer and the convergence of the algorithm. While these two facts are also observed in [32], the analysis in this paper is more careful and therefore proves the uniqueness of the minimizer and the pointwise convergence of the algorithm.

The paper is organized as follows. In Section II, we introduce the background on the geometry of S_{++}(D) and geodesic convexity. With this background we prove the uniqueness of the solution of (I.1) and the convergence of the algorithm (I.2) in Section III. Then we perform some experiments that describe the performance of Algorithm 1 and verify Theorem I.1 in Section IV. Technical proofs are given in the Appendix.

II. PRELIMINARIES

Our analysis relies on basic concepts from the geometry of S_{++}(D) and geodesic convexity. For this purpose, in Section II-A we present a brief summary of the geometry of S_{++}(D), and in Section II-B we introduce the definition and some properties of geodesic convexity. For more details on the geometry of S_{++}(D) and geodesic convexity we refer the reader to [4], [29].

A. Metric and geodesics on S_{++}(D)

The metric of S_{++}(D) has been well studied in the literature. Indeed, the trace metric in differential geometry [15, pg. 326], the natural metric on the symmetric cone [8], [5], the affine-invariant metric [24], and the metric given by the Fisher information matrix for Gaussian covariance estimation [25] all give the same metric on S_{++}(D). For Σ_1, Σ_2 ∈ S_{++}(D), this metric is defined by

dist(Σ_1, Σ_2) = || log(Σ_1^{-1/2} Σ_2 Σ_1^{-1/2}) ||_F.   (II.1)

Based on this metric, the unique geodesic γ_{Σ_1Σ_2}(t) (0 ≤ t ≤ 1) connecting Σ_1 and Σ_2 is given by [4, (6.1)]:

γ_{Σ_1Σ_2}(t) = Σ_1^{1/2} (Σ_1^{-1/2} Σ_2 Σ_1^{-1/2})^t Σ_1^{1/2}.   (II.2)

It follows that the midpoint of Σ_1 and Σ_2 is γ_{Σ_1Σ_2}(1/2) = Σ_1^{1/2} (Σ_1^{-1/2} Σ_2 Σ_1^{-1/2})^{1/2} Σ_1^{1/2}. We remark that this midpoint is also called the geometric mean of Σ_1 and Σ_2 [4, Section 4.1].

B. Geodesic convexity

Geodesic convexity is a natural generalization of convexity to Riemannian manifolds [29, Chapter 3.2]. Given a Riemannian manifold M and a set A ⊂ M, we say a function f : A → R is geodesically convex if every geodesic γ_{xy} of M with endpoints x, y ∈ A (i.e., γ_{xy} is a function from [0,1] to M with γ_{xy}(0) = x and γ_{xy}(1) = y) lies in A, and

f(γ_{xy}(t)) ≤ (1−t) f(x) + t f(y) for any x, y ∈ A and 0 < t < 1.   (II.3)

Following the proof of [23, Theorem 1.1.4], for a continuous function, geodesic midpoint convexity is equivalent to geodesic convexity:

Lemma II.1. Let f : A → R be a continuous function. If

f(γ_{xy}(1/2)) ≤ (f(x) + f(y))/2 for any x, y ∈ A,   (II.4)

then f is a geodesically convex function.

III. PROPERTIES OF THE OBJECTIVE FUNCTION AND THE ALGORITHM

In this section we study the properties of the objective function F(Σ) and the algorithm (I.2). We show that under a very mild assumption, the solution of (I.1) is unique and the sequence Σ^{(k)} converges to it. We also discuss Theorem I.1, the empirical algorithm and some implementation issues in this section.

A. Uniqueness of the solution

We first show the existence and uniqueness of the solution of (I.1) under a rather weak assumption.

Theorem III.1. If for any linear subspace L ⊂ R^D we have

|X ∩ L| / |X| < dim(L) / D,   (III.1)

then the solution of (I.1) exists and is unique.

Indeed, for real data sets that contain noise, condition (III.1)
is usually satisfied when the dimension is smaller than the number of points: in a noisy data set, generally any d-dimensional linear subspace contains at most d points. An important remark is that the condition (III.1) is incompatible with the condition on the percentage of inliers in Theorem I.1. Indeed, under the condition of Theorem I.1 the solution of (I.1) does not exist: one may verify that F((Π_L + εI)/tr(Π_L + εI)) converges to −∞ as ε → 0, while (Π_L + εI)/tr(Π_L + εI) converges to a singular matrix, where F(Σ) is undefined.

The proof of Theorem III.1 depends on the following two lemmas, whose proofs are presented in the appendix. Roughly speaking, Lemma III.2 guarantees the uniqueness of the solution and Lemma III.3 guarantees its existence. While (III.2) is also proved in [32, Proposition 1], we additionally give the condition for equality in Lemma III.2, which implies the uniqueness of the minimizer of (I.1).

Lemma III.2. F(Σ) is geodesically convex on the manifold S_{++}(D). That is, for any Σ_1, Σ_2 ∈ S_{++}(D), we have

F(Σ_1) + F(Σ_2) ≥ 2 F( Σ_1^{1/2} (Σ_1^{-1/2} Σ_2 Σ_1^{-1/2})^{1/2} Σ_1^{1/2} ).   (III.2)

When span{X} = R^D, equality in (III.2) holds if and only if Σ_1 = c Σ_2 for some c > 0.
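The geometric quantities (II.1)-(II.2) and the midpoint inequality (III.2) are easy to check numerically. The sketch below, with helper names of our own choosing, computes the trace metric, the geodesic midpoint (geometric mean), and verifies F(Σ_1) + F(Σ_2) ≥ 2 F(midpoint) on random positive definite matrices; it is an illustration under these assumptions, not code from the paper.

```python
import numpy as np

def spd_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def geodesic_midpoint(S1, S2):
    """Geometric mean of S1 and S2, i.e. the geodesic (II.2) evaluated at t = 1/2."""
    R = spd_sqrt(S1)
    Rinv = np.linalg.inv(R)
    return R @ spd_sqrt(Rinv @ S2 @ Rinv) @ R

def geodesic_distance(S1, S2):
    """Trace metric (II.1): ||log(S1^{-1/2} S2 S1^{-1/2})||_F."""
    lam = np.linalg.eigvals(np.linalg.solve(S1, S2)).real
    return np.sqrt(np.sum(np.log(lam) ** 2))

def F(X, S):
    """Objective (I.1)."""
    w = np.einsum('ij,ij->i', X, np.linalg.solve(S, X.T).T)
    return np.mean(np.log(w)) + np.linalg.slogdet(S)[1] / X.shape[1]

# midpoint form of geodesic convexity, cf. (III.2)
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
A1, A2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
S1, S2 = A1 @ A1.T + np.eye(4), A2 @ A2.T + np.eye(4)
print(F(X, S1) + F(X, S2) >= 2 * F(X, geodesic_midpoint(S1, S2)))  # expected: True
```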

Lemma III.3. Under the condition (III.1), we have

F(Σ) → ∞ as λ_min(Σ) → 0 (for Σ with tr(Σ) = 1),   (III.3)

where λ_min(Σ) is the smallest eigenvalue of Σ.

Now we are ready to prove Theorem III.1.

Proof: We first prove the uniqueness of the solution of (I.1). If Σ_1 and Σ_2 are both solutions of (I.1), then applying (III.2) and the scale invariance (I.6), we have F(Σ_3) ≤ F(Σ_1) = F(Σ_2) for

Σ_3 = Σ_1^{1/2} (Σ_1^{-1/2} Σ_2 Σ_1^{-1/2})^{1/2} Σ_1^{1/2} / tr( Σ_1^{1/2} (Σ_1^{-1/2} Σ_2 Σ_1^{-1/2})^{1/2} Σ_1^{1/2} ).

Since Σ_1 and Σ_2 are both minimizers of F(Σ), we have F(Σ_3) = F(Σ_1) = F(Σ_2), and by the condition for equality in Lemma III.2 (the assumption span{X} = R^D in Lemma III.2 holds, since otherwise (III.1) fails for L = span{X}) we have Σ_1 = c Σ_2. Since tr(Σ_1) = tr(Σ_2) = 1, we obtain Σ_1 = Σ_2, which contradicts our assumption that Σ_1 and Σ_2 are distinct; this proves the uniqueness of the solution of (I.1).

Now we prove the existence of the solution. First, there exists a sequence {Σ_i}_{i≥1} ⊂ {Σ ∈ S_{++}(D) : tr(Σ) = 1} such that F(Σ_i) converges to inf_{tr(Σ)=1, Σ∈S_{++}(D)} F(Σ). By compactness there is a converging subsequence of {Σ_i}; by Lemma III.3 this subsequence does not converge to a singular matrix, and therefore it converges to some Σ_0 ∈ S_{++}(D). By continuity of F(Σ) we obtain F(Σ_0) = inf_{tr(Σ)=1, Σ∈S_{++}(D)} F(Σ), and therefore Σ_0 is a solution of (I.1).

B. Convergence of the algorithm

In this section we prove the convergence of the sequence Σ^{(k)} generated by (I.2) under the assumption (III.1), and we also discuss its connection to Theorem I.1, which concerns the convergence of the sequence Σ^{(k)} under a different assumption.

We begin with the motivation of the procedure (I.2). Setting the derivative of F(Σ) with respect to Σ to 0, we have

dF/dΣ = −(1/N) Σ_{x∈X} Σ^{-1} x x^T Σ^{-1} / (x^T Σ^{-1} x) + (1/D) Σ^{-1} = 0,

which gives Σ = (D/N) Σ_{x∈X} x x^T / (x^T Σ^{-1} x). Since we minimize F(Σ) under the constraint tr(Σ) = 1, we have

Σ = ( Σ_{x∈X} x x^T / (x^T Σ^{-1} x) ) / tr( Σ_{x∈X} x x^T / (x^T Σ^{-1} x) ),

whose right-hand side is the update formula in (I.2); therefore (I.2) can be viewed as a fixed-point iteration for the stationarity condition of (I.1).

Theorem III.4 shows that Σ^{(k)} converges to the solution of (I.1) under the assumption (III.1). Similar to [32, Sections I, II], its proof uses a majorization-minimization argument. However, the analysis here is more complete in the sense that it proves the convergence of the sequence Σ^{(k)}, while the argument in [32] only yields the convergence of the objective values F(Σ^{(k)}).

Theorem III.4. When the condition (III.1) holds, the sequence Σ^{(k)} generated by (I.2) converges to the unique solution of (I.1).

This theorem also implies that the condition |X ∩ L|/|X| > d/D in Theorem I.1 is almost necessary. Indeed, if |X ∩ L|/|X| < d/D, then the condition (III.1) is usually satisfied; by Theorem III.1 the solution of (I.1) then exists (and is by definition nonsingular), and by Theorem III.4 Σ^{(k)} converges to this nonsingular matrix. Therefore we cannot recover L from its range. This also shows a phase transition phenomenon at |X ∩ L|/|X| = d/D.

For simplicity, in the proof we define the operator T : S_{++}(D) → S_{+}(D) as

T(Σ) = ( Σ_{x∈X} x x^T / (x^T Σ^{-1} x) ) / tr( Σ_{x∈X} x x^T / (x^T Σ^{-1} x) ).   (III.4)

The main ingredient of the proof is the observation that Σ^{(k+1)} = T(Σ^{(k)}) can be considered as the minimizer of a majorization function G(Σ, Σ^{(k)}) of F(Σ) such that G(Σ, Σ^{(k)}) ≥ F(Σ) and G(Σ^{(k)}, Σ^{(k)}) = F(Σ^{(k)}). In this sense the iteration can be considered as an algorithm following the majorization-minimization (MM) principle [11]. We remark that similar observations are also used in the convergence proofs of other iteratively reweighted least squares (IRLS) algorithms, such as [14], [6], [35], [17].

When the condition of Theorem I.1 holds, the assumption (III.1) is violated, and by our analysis in Section III-A the solution of (I.1) does not exist. However, Theorem I.1 shows that the sequence Σ^{(k)} still converges, and it converges to a singular matrix. Due to the complexity of the argument, we defer its proof to the appendix.
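The majorization-minimization property behind Theorem III.4 is also easy to observe empirically: applying the map T of (III.4) never increases F. Below is a small sketch (the helper names `T` and `F` are ours, and the check is an illustration, not part of the paper's proof).

```python
import numpy as np

def T(X, Sigma):
    """One application of the operator T in (III.4)."""
    w = np.einsum('ij,ij->i', X, np.linalg.solve(Sigma, X.T).T)
    S = (X.T / w) @ X
    return S / np.trace(S)

def F(X, Sigma):
    """Objective (I.1)."""
    w = np.einsum('ij,ij->i', X, np.linalg.solve(Sigma, X.T).T)
    return np.mean(np.log(w)) + np.linalg.slogdet(Sigma)[1] / X.shape[1]

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 6))
Sigma = np.eye(6) / 6
for k in range(10):
    Sigma_new = T(X, Sigma)
    # monotone decrease of the objective, cf. F(T(Sigma)) <= F(Sigma)
    assert F(X, Sigma_new) <= F(X, Sigma) + 1e-12
    Sigma = Sigma_new
```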
C. Empirical algorithm and implementation issues

Since the solution of (I.1) can be considered a robust estimator of the covariance, empirically we can simply recover the underlying d-dimensional subspace by the span of its top d eigenvectors. Our empirical algorithm is summarized in Algorithm 1. In each iteration the major computational cost is due to the calculation of the inverse of Σ^{(k)} and the calculation of x^T (Σ^{(k)})^{-1} x for all x ∈ X; therefore the cost per iteration is of order O(N D^2) when N ≥ D. We will show later in Section IV-C that the algorithm exhibits linear convergence. In our implementation we stop the algorithm after the k-th iteration when ||Σ^{(k)} − Σ^{(k−1)}||_F / ||Σ^{(k−1)}||_F < 10^{-8}.

In the next paragraph we describe an empirical problem where the algorithm breaks down at some iteration step, and describe a way to overcome it. If the condition of Theorem I.1 holds, λ_min(Σ^{(k)}) converges to 0 as k → ∞, but it is nonzero for each k.

Algorithm 1 Empirical algorithm for recovering a d-dimensional subspace
Input: X ⊂ R^D: data set, d: dimension of the subspace.
Output: L̂: a d-dimensional linear subspace.
Steps:
- Initialization: Σ^{(0)} = I, k = 0.
- Repeat (1) and (2) until convergence: (1) k = k + 1; (2) Σ^{(k)} = ( Σ_{x∈X} x x^T/(x^T (Σ^{(k−1)})^{-1} x) ) / tr( Σ_{x∈X} x x^T/(x^T (Σ^{(k−1)})^{-1} x) ).
- Let Σ̂ be the limit of the sequence Σ^{(k)}, and let L̂ be the span of the top d eigenvectors of Σ̂.

However, in implementation, due to rounding error, when k is very large and λ_min(Σ^{(k)}) is very close to zero, the computed Σ^{(k)} could be a non-positive matrix or a matrix with an imaginary part, and the convergence of Algorithm 1 fails. Therefore, in implementation we check the value of min_{x∈X} x^T (Σ^{(k)})^{-1} x in each iteration and stop the algorithm when it is negative or has an imaginary part. We remark that this breakdown will not happen for real data sets or synthetic data sets with noise, since in these cases Σ^{(k)} converges to a nonsingular positive definite matrix and the rounding error will not make Σ^{(k)} a non-positive matrix or a matrix with an imaginary part.

D. Discussion on spherical projection

A simple and powerful method to enhance the robustness of an algorithm to outliers is to preprocess the data set by projecting the data points to the unit sphere. Empirically it enhances the robustness of the PCA and Reaper algorithms significantly [17, Section 5]. Therefore a natural question is whether it can also be applied in our algorithm. Interestingly, spherical projection has been implicitly applied in the objective function F(Σ) and our algorithm: one may verify that the magnitude of any point in X does not impact the solution of (I.1) or the update formula (I.2).

IV. NUMERICAL EXPERIMENTS

In this section we present some numerical experiments on Algorithm 1 to study its empirical performance. We also show that our algorithm outperforms other convex algorithms for robust PCA on a real data set.

A. Model for simulation

In Sections IV-B to IV-D, we apply our algorithm on data generated from the following model. We choose a d-dimensional subspace L, sample N_1 points i.i.d. from the Gaussian distribution N(0, Π_L) on L, and sample N_0 outliers i.i.d. from the uniform distribution on the cube [0, 1]^D. In some experiments we also add Gaussian noise N(0, ε^2 I) to each point. We use the uniform distribution on [0, 1]^D for the outliers to show that our algorithm allows anisotropic outliers.

Fig. 1. The dependence of the recovery error on the number of inliers: the x-axis is the number of inliers and the y-axis is the corresponding recovery error (N_0 = 100; left: D = 10, d = 5; right: D = 50, d = 5).

B. Exact recovery of the subspace

In this section we use the model in Section IV-A and choose D = 10 or 50, d = 5, N_0 = 100, and several values of N_1. The mean recovery error ||Π_L̂ − Π_L||_F over 20 experiments is recorded in Figure 1, where L̂ is obtained by Algorithm 1 and L is the true underlying subspace. Theorem I.1 guarantees that ||Π_L̂ − Π_L||_F = 0 once N_1/(N_1 + N_0) > d/D, that is, for N_1 > 100 when D = 10 and for N_1 > 100/9 when D = 50, and this is verified in the experiment. When D = 50 and N_1 is near this threshold there is a small nonzero recovery error, which seems to contradict Theorem I.1. But we remark that in this case the convergence is slow, and we stop the algorithm at the 1000-th iteration without it really converging to the solution of (I.1). We expect that exact recovery of the subspace L could be obtained after a larger number of iterations of Algorithm 1.
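For reference, the simulated data of Section IV-A and the recovery error ||Π_L̂ − Π_L||_F can be generated as in the sketch below. This is our own illustration of the stated model: the helper names are ours, and `tyler_iteration` refers to the iteration sketch given after Theorem I.1.

```python
import numpy as np

def simulate(N1, N0, D, d, eps=0.0, seed=0):
    """Inliers ~ N(0, Pi_L) on a random d-dim subspace L, outliers ~ Unif([0,1]^D)."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((D, d)))   # orthonormal basis of L
    inliers = rng.standard_normal((N1, d)) @ basis.T
    outliers = rng.uniform(0.0, 1.0, size=(N0, D))
    X = np.vstack([inliers, outliers])
    if eps > 0:
        X = X + eps * rng.standard_normal(X.shape)         # optional Gaussian noise
    return X, basis

def recovery_error(Sigma, basis, d):
    """|| Pi_Lhat - Pi_L ||_F with Lhat spanned by the top d eigenvectors of Sigma."""
    _, V = np.linalg.eigh(Sigma)
    Vd = V[:, -d:]
    return np.linalg.norm(Vd @ Vd.T - basis @ basis.T, 'fro')

X, basis = simulate(N1=120, N0=100, D=10, d=5)
Sigma = tyler_iteration(X)   # the fixed-point iteration sketch from Section I-B
print(recovery_error(Sigma, basis, d=5))
```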

Fig. 2. Convergence rate for simulated data sets (N_0 = 100, D = 10, d = 5, N_1 = 80, 100, 120 in the left panel). See the text in Section IV-C for more details of the experiment.
Fig. 3. Robustness to noise: the x-axis represents the size of the Gaussian noise ε, and the y-axis represents the recovery error.

C. Convergence rate

In this section we show that empirically the algorithm converges linearly. In the left panel of Figure 2, we show the convergence rate for simulated data sets with D = 10, d = 5, N_0 = 100 and N_1 = 80, 100, 120. Additionally we add Gaussian noise with ε = 0.01. The x-axis represents the number of iterations k and the y-axis is ||Σ^{(k)} − Σ^{(K)}||_F, where K is the total number of iterations of Algorithm 1. In the right panel we show a different convergence rate: for two simulations with no noise we plot ||Π_{L_k} − Π_L||_F with respect to the number of iterations k, where L_k is the span of the first d eigenvectors of Σ^{(k)}. We use the settings (N_1, N_0, D, d) = (120, 100, 10, 5) and (120, 100, 50, 5), since by Theorem I.1 lim_{k→∞} ||Π_{L_k} − Π_L||_F = 0. From the right panel of Figure 2 we see that the recovery error also converges linearly.

D. Robustness to noise

In this section we investigate the empirical robustness of our algorithm to noise, using simulated data sets sampled according to Section IV-A with (N_1, N_0, D, d) = (120, 100, 10, 5) and different sizes of noise ε. We use this setting since when ε = 0 we recover the subspace exactly. We record the recovery error with respect to the size of the noise in Figure 3. In this experiment the recovery error depends linearly on the size of the noise ε. We consider a theory that explains the performance of Algorithm 1 under noise of small size to be an interesting future question.

E. Faces in a Crowd

In this section we test our algorithm on the Faces in a Crowd experiment of [17, Section 5.4]. The goal of this experiment is to show that our algorithm can be used to robustly learn the structure of face images. Linear modeling is applicable here since the images of faces of the same person lie near a 9-dimensional subspace [3]. In this experiment we learn the subspace from a data set that contains 32 face images of one person from the Extended Yale Face Database [16] and 400 random images from the BACKGROUND/Google folder of the Caltech101 database [9]. The images are converted to grayscale and downsampled. We preprocess the images by subtracting their Euclidean median, apply Algorithm 1 to this data set to obtain a 9-dimensional subspace, and use 32 other images of the same person to test how well the learned subspace fits these images. This experiment is also used in [17, Section 5.4]; therefore we only compare our algorithm with S-Reaper, which has been shown to perform better than the PCA, spherical PCA, LLD and Reaper algorithms. The PCA algorithm is still included for comparison since it is the basic technique in linear modeling. Figure 4 shows five images and their projections onto the 9-dimensional subspaces fitted by PCA, S-Reaper and our algorithm (which is labeled as M-estimator due to the argument in Section I-D), respectively. Figure 4 shows that our algorithm visually performs better than S-Reaper, especially

Fig. 4. The projection of images onto the fitted subspaces (M-estimator, S-Reaper, PCA, and the original images).
Fig. 5. Ordered distances of the 32 test images to the fitted 9-dimensional subspaces obtained by Algorithm 1, S-Reaper and PCA.

for the test images. This observation can also be quantitatively verified by checking the distances of the 32 test images to the subspaces fitted by PCA, S-Reaper and our algorithm, which are shown in Figure 5. The subspace generated by our algorithm has smaller distances to the test images, which explains the better performance of our algorithm in Figure 4. Besides, in this experiment our algorithm runs much faster than S-Reaper: our algorithm takes 4.4 seconds on a machine with an Intel Core 2 Duo CPU at 3.00 GHz and 6 GB of memory, while S-Reaper takes 40 seconds. This is expected, since there is an additional eigenvalue decomposition in each iteration of the S-Reaper algorithm.

V. DISCUSSION

In this paper we have investigated an M-estimator for covariance estimation and proved that this estimator can find the underlying subspace exactly under a rather weak assumption. We also demonstrated the virtue of this method by experiments on simulated and real data sets. An open question is whether we can have a theoretical guarantee on the robustness of our algorithm to noise, and thereby verify the empirical performance observed in Section IV-D. We find it difficult to apply the commonly used perturbation analysis in [35, Section 2.7] or [34, Theorem 2], which is based on the size of the perturbation of the objective function, since the objective function F(Σ) is undefined at a singular matrix.

An interesting direction is to extend the idea of geodesic convexity to other problems. The Euclidean metric between matrices is usually used, and under this metric the set of all positive definite matrices is considered as a cone. However, in this work we consider the set of all positive definite matrices as a manifold and use the Riemannian metric between matrices. It turns out that while F(Σ) is nonconvex in the Euclidean metric, it is convex in the Riemannian metric, and this formulation is more powerful than similar formulations that are convex in the Euclidean metric [35], [17]. It would be interesting to find other optimization problems with the property of geodesic convexity.

VI. ACKNOWLEDGEMENT

The author would like to thank Michael McCoy for reading an earlier version of this manuscript and for helpful comments. The author is grateful to Lek-Heng Lim for introducing the book [4] and for helpful discussions.

VII. APPENDIX

A. Proof of Lemma III.2

Proof: Geodesic convexity of F(Σ) follows from (III.2) and Lemma II.1, therefore we only need to prove (III.2). We will prove (III.2) by showing that, if Σ_3 ∈ S_{++}(D) is the geometric mean of Σ_1, Σ_2 ∈ S_{++}(D), then we have

ln(det(Σ_1)) + ln(det(Σ_2)) = 2 ln(det(Σ_3)),   (VII.1)

and

ln(x^T Σ_1^{-1} x) + ln(x^T Σ_2^{-1} x) ≥ 2 ln(x^T Σ_3^{-1} x).   (VII.2)

We start with the proof of (VII.1). Using (II.2) with t = 1/2, we have

Σ_3 Σ_1^{-1} Σ_3 = Σ_1^{1/2} (Σ_1^{-1/2} Σ_2 Σ_1^{-1/2})^{1/2} Σ_1^{1/2} Σ_1^{-1} Σ_1^{1/2} (Σ_1^{-1/2} Σ_2 Σ_1^{-1/2})^{1/2} Σ_1^{1/2} = Σ_2.   (VII.3)

Using (VII.3), (VII.1) can be proved as follows:

det(Σ_2) = det(Σ_3 Σ_1^{-1} Σ_3) = det(Σ_3) det(Σ_1^{-1}) det(Σ_3) = det(Σ_3)^2 / det(Σ_1).

To prove (VII.2), we let the eigendecomposition Σ_1^{1/2} Σ_2^{-1} Σ_1^{1/2} = U_0 Σ_0 U_0^T and define x̂ = U_0^T Σ_1^{-1/2} x; then we have x^T Σ_1^{-1} x = x̂^T x̂, x^T Σ_2^{-1} x = x̂^T Σ_0 x̂, and x^T Σ_3^{-1} x =

x̂^T Σ_0^{1/2} x̂. Assuming that Σ_0 is a diagonal matrix with diagonal entries σ_1, σ_2, ..., σ_p and x̂ = (x̂_1, x̂_2, ..., x̂_p)^T, (VII.2) is equivalent to

( Σ_{i=1}^p σ_i x̂_i^2 ) ( Σ_{i=1}^p x̂_i^2 ) ≥ ( Σ_{i=1}^p σ_i^{1/2} x̂_i^2 )^2,

which can be verified by the Cauchy-Schwarz inequality. Therefore (VII.2) is proved.

Finally we find the condition under which equality in (III.2) holds. From the proof of geodesic convexity we know that equality holds only when equality in (VII.2) holds for every x ∈ X. By the equality condition of the Cauchy-Schwarz inequality, equality in (III.2) holds only when, for every x ∈ X, σ_i = c for some c ∈ R for all indices 1 ≤ i ≤ D (here i indexes the coordinates) such that x̂_i ≠ 0. When Σ_1 ≠ cΣ_2, the σ_i are not all the same number for 1 ≤ i ≤ D. Therefore for every x ∈ X there exists some i with x̂_i = 0, that is, x̂ lies on a hyperplane in R^D. Since x̂ is a fixed invertible linear transformation of x, if equality in (VII.2) holds for every x ∈ X, then there exists a hyperplane containing every x ∈ X, which contradicts our assumption that span{X} = R^D.

B. Proof of Theorem III.4

First we prove that the operator T is monotone with respect to the objective function F: F(T(Σ)) ≤ F(Σ), and for Σ ∈ S_{++}(D) equality holds only when T(Σ) = Σ. We prove it by constructing the following majorization function of F(Σ):

G(Σ_1, Σ) = (1/N) Σ_{x∈X} ⟨ x x^T / (x^T Σ^{-1} x), Σ_1^{-1} ⟩ + (1/D) log det(Σ_1) + C,   (VII.4)

where C is chosen such that G(Σ, Σ) = F(Σ). The fact that G(Σ_1, Σ) ≥ F(Σ_1) can be proved by checking the first and second derivatives of G(Σ_1, Σ) − F(Σ_1) with respect to Σ_1. It is easy to verify that the unique minimizer of G(·, Σ) is Σ̃ = (D/N) Σ_{x∈X} x x^T / (x^T Σ^{-1} x), which is a scaled version of T(Σ). Therefore we prove the monotonicity of T as follows:

F(T(Σ)) = F(Σ̃) ≤ G(Σ̃, Σ) ≤ G(Σ, Σ) = F(Σ).   (VII.5)

Because of the uniqueness of the minimizer of G(·, Σ), equality in the second inequality of (VII.5) holds only when Σ̃ = Σ. Since Σ̃ = c T(Σ) and tr(Σ) = tr(T(Σ)) = 1, equality in (VII.5) holds only when T(Σ) = Σ. Therefore the sequence F(Σ^{(k)}) is monotone, and any accumulation point Σ̂ of the sequence {Σ^{(k)}} satisfies F(T(Σ̂)) = F(Σ̂) and therefore T(Σ̂) = Σ̂. Applying T(Σ̂) = Σ̂, we have

Σ̂^{-1} Σ_{x∈X} x x^T / (x^T Σ̂^{-1} x) = c I, for some c ∈ R.   (VII.6)

Let A = log(Σ); applying log det(Σ) = tr(A) and (d/dA) exp(A) = exp(A), the derivative of F(Σ) with respect to A is

dF/dA = (1/D) I − (1/N) Σ_{x∈X} x x^T Σ^{-1} / (x^T Σ^{-1} x).

Since the set {A : A = log(Σ), det(Σ) = 1} = {A : tr(A) = 0}, applying (VII.6) we see that the derivative of F(Σ) with respect to A within the set {Σ : det(Σ) = 1} is 0 at c_0 Σ̂, where c_0 is a number chosen such that det(c_0 Σ̂) = 1. Since both the set {Σ : det(Σ) = 1} and the function F(Σ) are geodesically convex (see (VII.1) for the convexity of the set), c_0 Σ̂ is the unique minimizer of F(Σ) in the set {Σ : det(Σ) = 1}. Applying the scale invariance of F(Σ) in (I.6), Σ̂ is the unique minimizer in the set {Σ : tr(Σ) = 1}, which means that Σ̂ is the unique solution of (I.1). Since every accumulation point of the bounded sequence {Σ^{(k)}} coincides with this unique solution, the whole sequence converges to it.

C. Proof of Lemma III.3

Proof: If Lemma III.3 does not hold, then there exists a sequence Σ_m with tr(Σ_m) = 1 such that it converges to some Σ̃ ∈ S_+(D) \ S_{++}(D) and the sequence F(Σ_m) is bounded. WLOG we assume that λ_j(Σ_m) and v_j(Σ_m) also converge for every 1 ≤ j ≤ D, where λ_j(Σ) and v_j(Σ) are the j-th eigenvalue and eigenvector of Σ. This can be assumed since any sequence has a subsequence satisfying this property (the eigenvectors and eigenvalues of Σ_m lie in a compact space).

We prove (III.3) by induction on the ambient dimension D. When D = 2, we have dim(ker(Σ̃)) = 1, and, ordering the eigenvalues so that λ_2(Σ_m) → 0 and v_2(Σ_m) converges to a unit vector spanning ker(Σ̃),

F(Σ_m) ≥ (1/N) Σ_{x∈X\ker(Σ̃)^⊥} ( 2 log|x^T v_2(Σ_m)| − log(λ_2(Σ_m)) ) + (1/2) log(λ_2(Σ_m)) + (1/2) log(λ_1(Σ_m)) + C,   (VII.7)

where C collects the terms corresponding to x ∈ X ∩ ker(Σ̃)^⊥, which are bounded from below. When x ∉ ker(Σ̃)^⊥, we have inf_m |x^T v_2(Σ_m)| > 0, and therefore the term log|x^T v_2(Σ_m)| is bounded from below.
Since λ_1(Σ_m) is bounded from below, the term (1/2) log(λ_1(Σ_m)) is also bounded from below. Applying |X \ ker(Σ̃)^⊥| > N/2 (which follows from (III.1) applied to the one-dimensional subspace ker(Σ̃)^⊥) and lim_m λ_2(Σ_m) = 0, the right-hand side of (VII.7) converges to +∞, which contradicts the assumption that F(Σ_m) is bounded; therefore (III.3) is proved for D = 2.

If (III.3) holds whenever the ambient dimension is smaller than D_0, we now prove (III.3) when the ambient dimension is D_0. By the assumption on the convergence of the eigenvectors and eigenvalues of Σ_m, to prove (III.3) it is equivalent to prove that

F'(Σ'_m) → ∞ as m → ∞,   (VII.8)

where Σ'_m = P_L̃^T Σ_m P_L̃, L̃ = ker(Σ̃), d_0 = dim(L̃), and F' : S_{++}(d_0) → R is defined by

F'(Σ) = (1/N) Σ_{x∈X\L̃^⊥} log( (P_L̃^T x)^T Σ^{-1} (P_L̃^T x) ) + (1/D_0) log det(Σ).

An important observation is that lim_m tr(Σ'_m) = 0. Writing Σ̄'_m = Σ'_m / tr(Σ'_m) and combining this with |X \ L̃^⊥| > N d_0 / D_0 (which follows from (III.1) applied to L̃^⊥), we have

lim_m [ F'(Σ'_m) − F'(Σ̄'_m) ] = ( d_0/D_0 − |X \ L̃^⊥|/N ) lim_m log tr(Σ'_m) = ∞.   (VII.9)

When Σ̄'_m converges to a nonsingular matrix Σ̄', we have

lim_m F'(Σ̄'_m) = F'(Σ̄') = C   (VII.10)

for some constant C, and when Σ̄'_m converges to a singular matrix, by the induction hypothesis

lim_m F'(Σ̄'_m) = ∞.   (VII.11)

Combining (VII.9), (VII.10) and (VII.11), (VII.8) is proved, and therefore Lemma III.3 is proved by induction.

D. Proof of Theorem I.1

The roadmap of the proof is as follows. We denote the set of outliers by X_0 = X \ L and the set of inliers by X_1 = X ∩ L, and let N_1 = |X_1| and N_0 = |X_0|. Assume that the solutions of (I.1) for the sets Y and Y_0 are I_d/d and I_{D−d}/(D−d) respectively; then we will prove that

lim_{k→∞} Σ^{(k)} = (1/d) Π_L,   (VII.12)

which implies Theorem I.1. WLOG we can make these assumptions on the solutions of (I.1) for the sets Y and Y_0: since the points in Y and Y_0 lie in general position, applying Theorem III.1 the solutions of (I.1) for the sets Y and Y_0 exist and are nonsingular. Assuming the solutions of (I.1) for the sets Y and Y_0 are Σ̂_1 and Σ̂_2 respectively, the following set X', which is a linear transformation of X, satisfies that the solutions of (I.1) for the sets Y and Y_0 generated by X' are I_d/d and I_{D−d}/(D−d):

X' = { P_L Σ̂_1^{-1/2} P_L^T x + P_{L⊥} Σ̂_2^{-1/2} P_{L⊥}^T x : x ∈ X }.

If the algorithm (I.2) applied to X' converges to (1/d) Π_L, then by this linear transformation the algorithm applied to X converges to a multiple of P_L Σ̂_1 P_L^T, whose range is also L. Therefore, to prove Theorem I.1 we only need to prove (VII.12).

Now we start to prove (VII.12). Using the update formula (I.2) and the assumption that the solutions of (I.1) for Y and Y_0 are I_d/d and I_{D−d}/(D−d) respectively, we have

( Σ_{x∈X_1} P_L^T x x^T P_L / ||P_L^T x||^2 ) / tr( Σ_{x∈X_1} P_L^T x x^T P_L / ||P_L^T x||^2 ) = (1/d) I_d,   (VII.13)
( Σ_{x∈X_0} P_{L⊥}^T x x^T P_{L⊥} / ||P_{L⊥}^T x||^2 ) / tr( Σ_{x∈X_0} P_{L⊥}^T x x^T P_{L⊥} / ||P_{L⊥}^T x||^2 ) = (1/(D−d)) I_{D−d}.   (VII.14)

By checking the trace of the numerators on the left-hand sides of (VII.13) and (VII.14) we have

Σ_{x∈X_1} P_L^T x x^T P_L / ||P_L^T x||^2 = (N_1/d) I_d,   (VII.15)
Σ_{x∈X_0} P_{L⊥}^T x x^T P_{L⊥} / ||P_{L⊥}^T x||^2 = (N_0/(D−d)) I_{D−d}.   (VII.16)

Applying (VII.15) and (VII.16) we have

λ_min( P_L^T [ Σ_{x∈X} x x^T / (x^T Σ^{-1} x) ] P_L ) ≥ λ_min( Σ_{x∈X_1} P_L^T x x^T P_L λ_min(P_L^T Σ P_L) / ||P_L^T x||^2 ) = (N_1/d) λ_min(P_L^T Σ P_L),
λ_max( P_{L⊥}^T [ Σ_{x∈X} x x^T / (x^T Σ^{-1} x) ] P_{L⊥} ) ≤ λ_max( Σ_{x∈X_0} P_{L⊥}^T x x^T P_{L⊥} λ_max(P_{L⊥}^T Σ P_{L⊥}) / ||P_{L⊥}^T x||^2 ) = (N_0/(D−d)) λ_max(P_{L⊥}^T Σ P_{L⊥}).

Combining these with the definition of the operator T in (III.4), we have

λ_min(P_L^T T(Σ) P_L) / λ_max(P_{L⊥}^T T(Σ) P_{L⊥}) ≥ α · λ_min(P_L^T Σ P_L) / λ_max(P_{L⊥}^T Σ P_{L⊥}),

where α = ((D − d) N_1) / (d N_0) > 1 (this follows from the assumption |X ∩ L|/|X| > d/D). Therefore

lim_{k→∞} λ_min(P_L^T Σ^{(k)} P_L) / λ_max(P_{L⊥}^T Σ^{(k)} P_{L⊥}) ≥ lim_{k→∞} α^{k−1} λ_min(P_L^T Σ^{(1)} P_L) / λ_max(P_{L⊥}^T Σ^{(1)} P_{L⊥}) = ∞.   (VII.17)

Since tr(Σ^{(k)}) = 1 for all k > 0, we have

lim_{k→∞} λ_max(P_{L⊥}^T Σ^{(k)} P_{L⊥}) = 0, and hence lim_{k→∞} P_{L⊥}^T Σ^{(k)} P_{L⊥} = 0.   (VII.18)

Combining (VII.18) with the fact that Σ^{(k)} is positive semidefinite,

lim_{k→∞} P_{L⊥}^T Σ^{(k)} P_L = 0.   (VII.19)

Since we have obtained (VII.18) and (VII.19), in order to prove (VII.12) we only need to prove that P_L^T Σ^{(k)} P_L converges to I_d/d. Applying (VII.15) we have

λ_max( P_L^T [ Σ_{x∈X} x x^T / (x^T Σ^{-1} x) ] P_L ) ≤ λ_max( P_L^T [ Σ_{x∈X_1} x x^T / (x^T Σ^{-1} x) ] P_L ) + λ_max( P_L^T [ Σ_{x∈X_0} x x^T / (x^T Σ^{-1} x) ] P_L ) ≤ (N_1/d) λ_max(P_L^T Σ P_L) + λ_max(P_{L⊥}^T Σ P_{L⊥}) Σ_{x∈X_0} ||P_L^T x||^2 / ||P_{L⊥}^T x||^2,

and

λ_min( P_L^T [ Σ_{x∈X} x x^T / (x^T Σ^{-1} x) ] P_L ) ≥ (N_1/d) λ_min(P_L^T Σ P_L).

Therefore

λ_max(P_L^T T(Σ) P_L) / λ_min(P_L^T T(Σ) P_L) ≤ λ_max(P_L^T Σ P_L) / λ_min(P_L^T Σ P_L) + d λ_max(P_{L⊥}^T Σ P_{L⊥}) Σ_{x∈X_0} ||P_L^T x||^2 / ||P_{L⊥}^T x||^2 / ( N_1 λ_min(P_L^T Σ P_L) ).   (VII.20)

Now we will prove that

λ_min(P_L^T Σ^{(k)} P_L) > c for all k, for some c > 0.   (VII.21)

If (VII.21) does not hold, then there exists a subsequence Σ^{(k_j)} such that lim_j λ_min(P_L^T Σ^{(k_j)} P_L) = 0. Applying (VII.18), (VII.19) and the induction argument in the proof of Lemma III.3, we have lim_j F(Σ^{(k_j)}) = ∞, which contradicts the monotonicity of the algorithm in (VII.5). Therefore (VII.21) is proved. Applying (VII.17), there exists a constant C_1 > 0 such that

λ_max(P_{L⊥}^T Σ^{(k)} P_{L⊥}) ≤ C_1 α^{−k}.   (VII.22)

Now we prove the existence of lim_{k→∞} λ_max(P_L^T Σ^{(k)} P_L) / λ_min(P_L^T Σ^{(k)} P_L). If it does not exist, then there exists ε_1 > 0 such that for any sufficiently large K_0 there exist k_1 > k_2 > K_0 with

λ_max(P_L^T Σ^{(k_1)} P_L) / λ_min(P_L^T Σ^{(k_1)} P_L) − λ_max(P_L^T Σ^{(k_2)} P_L) / λ_min(P_L^T Σ^{(k_2)} P_L) > ε_1.

Summing (VII.20) for Σ = Σ^{(k_2)}, Σ^{(k_2+1)}, ..., Σ^{(k_1−1)} and applying (VII.21) and (VII.22), we obtain a contradiction for sufficiently large K_0. Next we will prove

lim_{k→∞} λ_max(P_L^T Σ^{(k)} P_L) / λ_min(P_L^T Σ^{(k)} P_L) = 1   (VII.23)

by contradiction, i.e., by assuming lim_{k→∞} λ_max(P_L^T Σ^{(k)} P_L) / λ_min(P_L^T Σ^{(k)} P_L) = c_0 > 1. Since the sequence Σ^{(k)} lies in a compact set, there is a subsequence {Σ^{(k_j)}}_j converging to some Σ̂ with

λ_max(P_L^T Σ̂ P_L) / λ_min(P_L^T Σ̂ P_L) = c_0 > 1.   (VII.24)

Applying (VII.18) and (VII.19) we have Π_L Σ̂ Π_L = Σ̂. By a simple calculation this property also holds for T^n(Σ̂) for any n, and therefore the update T^n(Σ̂) can be considered as an update that depends only on the set Y. Then, applying Theorem III.4 to the set Y, we have lim_{n→∞} T^n(Σ̂) = (1/d) Π_L, and therefore for any ε_2 > 0 there exists some n_0 > 0 such that

λ_max(P_L^T T^{n_0}(Σ̂) P_L) / λ_min(P_L^T T^{n_0}(Σ̂) P_L) < 1 + ε_2.   (VII.25)

Using the continuity of the mapping T^{n_0}, for any η > 0 there exists ε_3 > 0 such that

| λ_max(P_L^T T^{n_0}(Σ̂) P_L) / λ_min(P_L^T T^{n_0}(Σ̂) P_L) − λ_max(P_L^T T^{n_0}(Σ) P_L) / λ_min(P_L^T T^{n_0}(Σ) P_L) | < η   (VII.26)

whenever ||Σ − Σ̂|| < ε_3. Choose j_0 large enough such that ||Σ^{(k_{j_0})} − Σ̂|| < ε_3; then applying (VII.25) and (VII.26) with Σ = Σ^{(k_{j_0})} we obtain

λ_max(P_L^T Σ^{(k_{j_0}+n_0)} P_L) / λ_min(P_L^T Σ^{(k_{j_0}+n_0)} P_L) < 1 + ε_2 + η.   (VII.27)

Summing (VII.20) with Σ = Σ^{(k)} for all k ≥ k_{j_0} + n_0 and applying (VII.21) and (VII.22), we obtain that for some C_2 > 0,

c_0 = lim_{k→∞} λ_max(P_L^T Σ^{(k)} P_L) / λ_min(P_L^T Σ^{(k)} P_L) ≤ λ_max(P_L^T Σ^{(k_{j_0}+n_0)} P_L) / λ_min(P_L^T Σ^{(k_{j_0}+n_0)} P_L) + C_2 α^{−k_{j_0}−n_0} < 1 + C_2 α^{−k_{j_0}−n_0} + ε_2 + η.   (VII.28)

Since we can choose ε_2 and η arbitrarily small and k_{j_0}, n_0 arbitrarily large, (VII.28) contradicts (VII.24). Therefore (VII.23) is proved. Combining (VII.23) with (VII.18) and (VII.19), and noticing that tr(Σ^{(k)}) = 1 for all k > 0, we have proved (VII.12).

REFERENCES
[1] L. P. Ammann. Robust singular value decompositions: A new approach to projection pursuit. Journal of the American Statistical Association, 88(422), 1993.
[2] C. Auderset, C. Mazza, and E. A. Ruh. Angular Gaussian and Cauchy estimation. Journal of Multivariate Analysis, 93(1):180–197, 2005.
[3] R. Basri and D. Jacobs. Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2):218–233, February 2003.
[4] R. Bhatia. Positive Definite Matrices. Princeton Series in Applied Mathematics. Princeton University Press, 2007.
[5] S. Bonnabel and R. Sepulchre. Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank. SIAM Journal on Matrix Analysis and Applications, 31(3), 2009.
[6] T. F. Chan and P. Mulet. On the convergence of the lagged diffusivity fixed point method in total variation image restoration. SIAM Journal on Numerical Analysis, 36, February 1999.
[7] E. Elhamifar and R. Vidal.
Sparse subspace clustering. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '09), 2009.
[8] J. Faraut and A. Korányi. Analysis on Symmetric Cones. Oxford Mathematical Monographs. Clarendon Press, 1994.
[9] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59–70, April 2007.
[10] P. J. Huber. Robust Statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1981.
[11] D. R. Hunter and K. Lange. A tutorial on MM algorithms. The American Statistician, 58(1):30–37, 2004.
[12] J. T. Kent and D. E. Tyler. Maximum likelihood estimation for the wrapped Cauchy distribution. Journal of Applied Statistics, 15(2), 1988.

[13] J. T. Kent and D. E. Tyler. Redescending M-estimates of multivariate location and scatter. The Annals of Statistics, 19(4), 1991.
[14] H. W. Kuhn. A note on Fermat's problem. Mathematical Programming, 4:98–107, 1973.
[15] S. Lang. Fundamentals of Differential Geometry. Graduate Texts in Mathematics. Springer, 1999.
[16] K. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):684–698, 2005.
[17] G. Lerman, M. B. McCoy, J. A. Tropp, and T. Zhang. Robust computation of linear models, or how to find a needle in a haystack. Submitted, February 2012.
[18] G. Li and Z. Chen. Projection-pursuit approach to robust dispersion matrices and principal components: Primary theory and Monte Carlo. Journal of the American Statistical Association, 80(391), 1985.
[19] R. A. Maronna. Robust M-estimators of multivariate location and scatter. The Annals of Statistics, 4(1):51–67, 1976.
[20] R. A. Maronna, R. D. Martin, and V. J. Yohai. Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. John Wiley & Sons Ltd., Chichester, 2006.
[21] R. A. Maronna, R. D. Martin, and V. J. Yohai. Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. John Wiley & Sons Ltd., Chichester, 2006.
[22] M. McCoy and J. A. Tropp. Two proposals for robust PCA using semidefinite programming. Electronic Journal of Statistics, 5:1123–1160, 2011.
[23] C. Niculescu and L. Persson. Convex Functions and Their Applications: A Contemporary Approach. CMS Books in Mathematics. Springer, 2006.
[24] X. Pennec, P. Fillard, and N. Ayache. A Riemannian framework for tensor computing. International Journal of Computer Vision, 66(1):41–66, 2006.
[25] S. Smith. Covariance, subspace, and intrinsic Cramér-Rao bounds. IEEE Transactions on Signal Processing, 53(5):1610–1630, May 2005.
[26] M. Soltanolkotabi and E. J. Candès. A geometric analysis of subspace clustering with outliers. CoRR, abs/1112.4258, 2011.
[27] F. De La Torre and M. J. Black. A framework for robust subspace learning. International Journal of Computer Vision, 54:117–142, 2003.
[28] D. E. Tyler. A distribution-free M-estimator of multivariate scatter. The Annals of Statistics, 15(1):234–251, 1987.
[29] C. Udrişte. Convex Functions and Optimization Methods on Riemannian Manifolds. Mathematics and its Applications. Kluwer Academic Publishers, 1994.
[30] A. Wiesel. Geodesic convexity and covariance estimation. Submitted to IEEE Transactions on Signal Processing.
[31] A. Wiesel. On the convexity in Kronecker structured covariance estimation. To be presented at SSP 2012.
[32] A. Wiesel. Unified framework to regularized covariance estimation in scaled Gaussian models. IEEE Transactions on Signal Processing, 60(1):29–38, January 2012.
[33] H. Xu, C. Caramanis, and S. Mannor. Principal component analysis with contaminated data: The high dimensional case. In Conference on Learning Theory (COLT), 2010.
[34] H. Xu, C. Caramanis, and S. Sanghavi. Robust PCA via outlier pursuit. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, 2010.
[35] T. Zhang and G. Lerman. A novel M-estimator for robust PCA. arXiv preprint, 2011.


First Efficient Convergence for Streaming k-pca: a Global, Gap-Free, and Near-Optimal Rate 58th Annual IEEE Symposium on Foundations of Computer Science First Efficient Convergence for Streaming k-pca: a Global, Gap-Free, and Near-Optimal Rate Zeyuan Allen-Zhu Microsoft Research zeyuan@csail.mit.edu

More information

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA

MACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

distances between objects of different dimensions

distances between objects of different dimensions distances between objects of different dimensions Lek-Heng Lim University of Chicago joint work with: Ke Ye (CAS) and Rodolphe Sepulchre (Cambridge) thanks: DARPA D15AP00109, NSF DMS-1209136, NSF IIS-1546413,

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization

Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization Yuanming Shi ShanghaiTech University, Shanghai, China shiym@shanghaitech.edu.cn Bamdev Mishra Amazon Development

More information

Robust Statistics, Revisited

Robust Statistics, Revisited Robust Statistics, Revisited Ankur Moitra (MIT) joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart CLASSIC PARAMETER ESTIMATION Given samples from an unknown

More information

Math 341: Convex Geometry. Xi Chen

Math 341: Convex Geometry. Xi Chen Math 341: Convex Geometry Xi Chen 479 Central Academic Building, University of Alberta, Edmonton, Alberta T6G 2G1, CANADA E-mail address: xichen@math.ualberta.ca CHAPTER 1 Basics 1. Euclidean Geometry

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

Scalable Subspace Clustering

Scalable Subspace Clustering Scalable Subspace Clustering René Vidal Center for Imaging Science, Laboratory for Computational Sensing and Robotics, Institute for Computational Medicine, Department of Biomedical Engineering, Johns

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

GEOMETRIC DISTANCE BETWEEN POSITIVE DEFINITE MATRICES OF DIFFERENT DIMENSIONS

GEOMETRIC DISTANCE BETWEEN POSITIVE DEFINITE MATRICES OF DIFFERENT DIMENSIONS GEOMETRIC DISTANCE BETWEEN POSITIVE DEFINITE MATRICES OF DIFFERENT DIMENSIONS LEK-HENG LIM, RODOLPHE SEPULCHRE, AND KE YE Abstract. We show how the Riemannian distance on S n ++, the cone of n n real symmetric

More information

Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material

Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material Dimensionality Reduction Using the Sparse Linear Model: Supplementary Material Ioannis Gkioulekas arvard SEAS Cambridge, MA 038 igkiou@seas.harvard.edu Todd Zickler arvard SEAS Cambridge, MA 038 zickler@seas.harvard.edu

More information

Robustness Meets Algorithms

Robustness Meets Algorithms Robustness Meets Algorithms Ankur Moitra (MIT) ICML 2017 Tutorial, August 6 th CLASSIC PARAMETER ESTIMATION Given samples from an unknown distribution in some class e.g. a 1-D Gaussian can we accurately

More information

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal

More information

Positive semidefinite matrix approximation with a trace constraint

Positive semidefinite matrix approximation with a trace constraint Positive semidefinite matrix approximation with a trace constraint Kouhei Harada August 8, 208 We propose an efficient algorithm to solve positive a semidefinite matrix approximation problem with a trace

More information

Recent Developments in Compressed Sensing

Recent Developments in Compressed Sensing Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline

More information

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

Lecture: Face Recognition

Lecture: Face Recognition Lecture: Face Recognition Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 12-1 What we will learn today Introduction to face recognition The Eigenfaces Algorithm Linear

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

Analysis of Robust PCA via Local Incoherence

Analysis of Robust PCA via Local Incoherence Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu

More information

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 5, SEPTEMBER 2001 1215 A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing Da-Zheng Feng, Zheng Bao, Xian-Da Zhang

More information

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Alfredo Nava-Tudela ant@umd.edu John J. Benedetto Department of Mathematics jjb@umd.edu Abstract In this project we are

More information

Optimal Linear Estimation under Unknown Nonlinear Transform

Optimal Linear Estimation under Unknown Nonlinear Transform Optimal Linear Estimation under Unknown Nonlinear Transform Xinyang Yi The University of Texas at Austin yixy@utexas.edu Constantine Caramanis The University of Texas at Austin constantine@utexas.edu Zhaoran

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

Analysis of a Privacy-preserving PCA Algorithm using Random Matrix Theory

Analysis of a Privacy-preserving PCA Algorithm using Random Matrix Theory Analysis of a Privacy-preserving PCA Algorithm using Random Matrix Theory Lu Wei, Anand D. Sarwate, Jukka Corander, Alfred Hero, and Vahid Tarokh Department of Electrical and Computer Engineering, University

More information

Robust Motion Segmentation by Spectral Clustering

Robust Motion Segmentation by Spectral Clustering Robust Motion Segmentation by Spectral Clustering Hongbin Wang and Phil F. Culverhouse Centre for Robotics Intelligent Systems University of Plymouth Plymouth, PL4 8AA, UK {hongbin.wang, P.Culverhouse}@plymouth.ac.uk

More information

FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING

FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING FACTOR ANALYSIS AND MULTIDIMENSIONAL SCALING Vishwanath Mantha Department for Electrical and Computer Engineering Mississippi State University, Mississippi State, MS 39762 mantha@isip.msstate.edu ABSTRACT

More information

Approximating the Covariance Matrix with Low-rank Perturbations

Approximating the Covariance Matrix with Low-rank Perturbations Approximating the Covariance Matrix with Low-rank Perturbations Malik Magdon-Ismail and Jonathan T. Purnell Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 {magdon,purnej}@cs.rpi.edu

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

arxiv: v1 [math.na] 26 Nov 2009

arxiv: v1 [math.na] 26 Nov 2009 Non-convexly constrained linear inverse problems arxiv:0911.5098v1 [math.na] 26 Nov 2009 Thomas Blumensath Applied Mathematics, School of Mathematics, University of Southampton, University Road, Southampton,

More information

Recursive Sparse Recovery in Large but Structured Noise - Part 2

Recursive Sparse Recovery in Large but Structured Noise - Part 2 Recursive Sparse Recovery in Large but Structured Noise - Part 2 Chenlu Qiu and Namrata Vaswani ECE dept, Iowa State University, Ames IA, Email: {chenlu,namrata}@iastate.edu Abstract We study the problem

More information

Convergence of the Ensemble Kalman Filter in Hilbert Space

Convergence of the Ensemble Kalman Filter in Hilbert Space Convergence of the Ensemble Kalman Filter in Hilbert Space Jan Mandel Center for Computational Mathematics Department of Mathematical and Statistical Sciences University of Colorado Denver Parts based

More information

The Metric Geometry of the Multivariable Matrix Geometric Mean

The Metric Geometry of the Multivariable Matrix Geometric Mean Trieste, 2013 p. 1/26 The Metric Geometry of the Multivariable Matrix Geometric Mean Jimmie Lawson Joint Work with Yongdo Lim Department of Mathematics Louisiana State University Baton Rouge, LA 70803,

More information

Scaling Limits of Waves in Convex Scalar Conservation Laws Under Random Initial Perturbations

Scaling Limits of Waves in Convex Scalar Conservation Laws Under Random Initial Perturbations Journal of Statistical Physics, Vol. 122, No. 2, January 2006 ( C 2006 ) DOI: 10.1007/s10955-005-8006-x Scaling Limits of Waves in Convex Scalar Conservation Laws Under Random Initial Perturbations Jan

More information

Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

More information

A Characterization of Sampling Patterns for Union of Low-Rank Subspaces Retrieval Problem

A Characterization of Sampling Patterns for Union of Low-Rank Subspaces Retrieval Problem A Characterization of Sampling Patterns for Union of Low-Rank Subspaces Retrieval Problem Morteza Ashraphijuo Columbia University ashraphijuo@ee.columbia.edu Xiaodong Wang Columbia University wangx@ee.columbia.edu

More information

IT is well-known that the cone of real symmetric positive

IT is well-known that the cone of real symmetric positive SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 1 Geometric distance between positive definite matrices of different dimensions Lek-Heng Lim, Rodolphe Sepulchre, Fellow, IEEE, and Ke Ye Abstract We

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

A Riemannian Framework for Denoising Diffusion Tensor Images

A Riemannian Framework for Denoising Diffusion Tensor Images A Riemannian Framework for Denoising Diffusion Tensor Images Manasi Datar No Institute Given Abstract. Diffusion Tensor Imaging (DTI) is a relatively new imaging modality that has been extensively used

More information

Robust Stochastic Principal Component Analysis

Robust Stochastic Principal Component Analysis John Goes Teng Zhang Raman Arora Gilad Lerman University of Minnesota Princeton University Johns Hopkins University University of Minnesota Abstract We consider the problem of finding lower dimensional

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Background Mathematics (2/2) 1. David Barber

Background Mathematics (2/2) 1. David Barber Background Mathematics (2/2) 1 David Barber University College London Modified by Samson Cheung (sccheung@ieee.org) 1 These slides accompany the book Bayesian Reasoning and Machine Learning. The book and

More information

SYMMETRIC MATRIX PERTURBATION FOR DIFFERENTIALLY-PRIVATE PRINCIPAL COMPONENT ANALYSIS. Hafiz Imtiaz and Anand D. Sarwate

SYMMETRIC MATRIX PERTURBATION FOR DIFFERENTIALLY-PRIVATE PRINCIPAL COMPONENT ANALYSIS. Hafiz Imtiaz and Anand D. Sarwate SYMMETRIC MATRIX PERTURBATION FOR DIFFERENTIALLY-PRIVATE PRINCIPAL COMPONENT ANALYSIS Hafiz Imtiaz and Anand D. Sarwate Rutgers, The State University of New Jersey ABSTRACT Differential privacy is a strong,

More information

On the Behavior of Information Theoretic Criteria for Model Order Selection

On the Behavior of Information Theoretic Criteria for Model Order Selection IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 49, NO. 8, AUGUST 2001 1689 On the Behavior of Information Theoretic Criteria for Model Order Selection Athanasios P. Liavas, Member, IEEE, and Phillip A. Regalia,

More information