IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 3, MARCH 2006

Just Relax: Convex Programming Methods for Identifying Sparse Signals in Noise

Joel A. Tropp, Student Member, IEEE

Abstract: This paper studies a difficult and fundamental problem that arises throughout electrical engineering, applied mathematics, and statistics. Suppose that one forms a short linear combination of elementary signals drawn from a large, fixed collection. Given an observation of the linear combination that has been contaminated with additive noise, the goal is to identify which elementary signals participated and to approximate their coefficients. Although many algorithms have been proposed, there is little theory which guarantees that these algorithms can accurately and efficiently solve the problem. This paper studies a method called convex relaxation, which attempts to recover the ideal sparse signal by solving a convex program. This approach is powerful because the optimization can be completed in polynomial time with standard scientific software. The paper provides general conditions which ensure that convex relaxation succeeds. As evidence of the broad impact of these results, the paper describes how convex relaxation can be used for several concrete signal recovery problems. It also describes applications to channel coding, linear regression, and numerical analysis.

Index Terms: Algorithms, approximation methods, basis pursuit, convex program, linear regression, optimization methods, orthogonal matching pursuit, sparse representations.

I. INTRODUCTION

LATELY, there has been a lot of fuss about sparse approximation. This class of problems has two defining characteristics. 1) An input signal is approximated by a linear combination of elementary signals. In many modern applications, the elementary signals are drawn from a large, linearly dependent collection. 2) A preference for sparse linear combinations is imposed by penalizing nonzero coefficients. The most common penalty is the number of elementary signals that participate in the approximation. Sparse approximation problems arise throughout electrical engineering, statistics, and applied mathematics. One of the most common applications is to compress audio [1], images [2], and video [3]. Sparsity criteria also arise in linear regression [4], deconvolution [5], signal modeling [6], preconditioning [7], machine learning [8], denoising [9], and regularization [10].

Manuscript received June 7, 2004; revised February 1. This work was supported by the National Science Foundation Graduate Fellowship. The author was with the Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, Austin, TX USA. He is now with the Department of Mathematics, The University of Michigan at Ann Arbor, Ann Arbor, MI USA (e-mail: jtropp@umich.edu). Communicated by V. V. Vaishampayan, Associate Editor At Large.

A. The Model Problem

In this work, we will concentrate on the model problem of identifying a sparse linear combination of elementary signals that has been contaminated with additive noise. The literature on inverse problems tends to assume that the noise is an arbitrary vector of bounded norm, while the signal processing literature usually models the noise statistically; we will consider both possibilities. To be precise, suppose we measure a signal of the form

    s = Φ c_opt + ν    (1)

where Φ is a known matrix with unit-norm columns, c_opt is a sparse coefficient vector (i.e., few components are nonzero), and ν is an unknown noise vector.
Given the signal s, our goal is to approximate the coefficient vector c_opt. In particular, it is essential that we correctly identify the nonzero components of the coefficient vector because they determine which columns of the matrix Φ participate in the signal. Initially, linear algebra seems to preclude a solution: whenever Φ has a nontrivial null space, we face an ill-posed inverse problem. Even worse, the sparsity of the coefficient vector introduces a combinatorial aspect to the problem. Nevertheless, if the optimal coefficient vector is sufficiently sparse, it turns out that we can accurately and efficiently approximate c_opt given the noisy observation.

B. Convex Relaxation

The literature contains many types of algorithms for approaching the model problem, including brute force [4, Sec ], nonlinear programming [11], and greedy pursuit [12]-[14]. In this paper, we concentrate on a powerful method called convex relaxation. Although this technique was introduced over 30 years ago in [15], the theoretical justifications are still shaky. This paper attempts to lay a more solid foundation.

Let us explain the intuition behind convex relaxation methods. Suppose we are given a signal of the form (1) along with a bound δ on the norm of the noise vector, say ||ν||_2 <= δ. At first, it is tempting to look for the sparsest coefficient vector that generates a signal within distance δ of the input. This idea can be phrased as the mathematical program

    min_c ||c||_0    subject to    ||s - Φ c||_2 <= δ    (2)

where the ℓ_0 quasi-norm ||·||_0 counts the number of nonzero components in its argument. To solve (2) directly, one must sift through all possible disbursements of the nonzero components in c. This method is intractable because the search space is exponentially large [12], [16].
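To make the model problem concrete, the following small numerical sketch (an editorial illustration, not part of the original paper; the dimensions, the random dictionary, and variable names such as Phi, c_opt, and s are choices of convenience) builds a signal of the form (1) and counts the supports that a brute-force attack on (2) would have to examine.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
d, N, m = 64, 128, 4          # signal dimension, number of atoms, sparsity level

# Dictionary Phi: d x N matrix with unit-norm columns (the atoms).
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)

# Ideal sparse coefficient vector c_opt supported on m random indices.
support = rng.choice(N, size=m, replace=False)
c_opt = np.zeros(N)
c_opt[support] = rng.choice([-1.0, 1.0], size=m)

# Observed signal: s = Phi c_opt + noise, as in equation (1).
noise = 0.05 * rng.standard_normal(d)
s = Phi @ c_opt + noise

# A brute-force attack on (2) would examine every support of size at most m.
n_supports = sum(comb(N, k) for k in range(m + 1))
print(f"candidate supports of size <= {m}: {n_supports:,}")
```

Even at this toy scale the count runs into the millions, which is the intractability described in the text.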

2 TROPP: CONVEX PROGRAMMING METHODS FOR IDENTIFYING SPARSE SIGNALS IN NOISE 1031 To surmount this obstacle, one might replace the quasinorm with the norm to obtain a convex optimization problem subject to - where the tolerance is related to the error bound. Intuitively, the norm is the convex function closest to the quasi-norm, so this substitution is referred to as convex relaxation. One hopes that the solution to the relaxation will yield a good approximation of the ideal coefficient vector. The advantage of the new formulation is that it can be solved in polynomial time with standard scientific software [17]. We have found that it is more natural to study the closely related convex program As before, one can interpret the norm as a convex relaxation of the quasi-norm. So the parameter manages a tradeoff between the approximation error and the sparsity of the coefficient vector. This optimization problem can also be solved efficiently, and one expects that the minimizer will approximate the ideal coefficient vector. Appendix I offers an excursus on the history of convex relaxation methods for identifying sparse signals. C. Contributions The primary contribution of this paper is to justify the intuition that convex relaxation can solve the model problem. Our two major theorems describe the behavior of solutions to ( - ) and solutions to ( - ). We apply this theory to several concrete instances of the model problem. Let us summarize our results on the performance of convex relaxation. Suppose that we measure a signal of the form (1) and that we solve one of the convex programs with an appropriate choice of or to obtain a minimizer. Theorems 8 and 14 demonstrate that the vector is close to the ideal coefficient vector ; the support of (i.e., the indices of its nonzero components) is a subset of the support of ; moreover, the minimizer is unique. In words, the solution to the convex relaxation identifies every sufficiently large component of, and it never mistakenly identifies a column of that did not participate in the signal. As other authors have written, It seems quite surprising that any result of this kind is possible [18, Sec. 6]. The conditions under which the convex relaxations solve the model problem are geometric in nature. Section III-C describes the primary factors influencing their success. Small sets of the columns from should be well conditioned. The noise vector should be weakly correlated with all the columns of. These properties can be difficult to check in general. This paper relies on a simple approach based on the coherence parameter of, which measures the cosine of the minimal angle between - a pair of columns. It may be possible to improve these results using techniques from Banach space geometry. As an application of the theory, we will see that convex relaxation can be used to solve three versions of the model problem. If the coherence parameter is small both convex programs can identify a sufficiently sparse signal corrupted by an arbitrary vector of bounded norm (Sections IV-C and V-B); the program ( - ) can identify a sparse signal in additive white Gaussian noise (Section IV-D); the program ( - ) can identify a sparse signal in uniform noise of bounded norm (Section V-C). In addition, Section IV-E shows that ( - ) can be used for subset selection, a sparse approximation problem that arises in statistics. Section V-D demonstrates that ( - ) can solve another sparse approximation problem from numerical analysis. 
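For readers who wish to experiment with the two relaxations discussed above, the following sketch poses them with the CVXPY modeling package (an assumption on tooling; any convex solver would do) using the Phi, s, and N from the previous sketch. The parameter values are illustrative, not the choices analyzed later in the paper.

```python
import cvxpy as cp

# Phi (d x N) and s (length d) are the dictionary and signal from the previous sketch.
c = cp.Variable(N)

# Error-constrained relaxation: minimize the l1 norm subject to a fit constraint.
eps = 0.1
p_error = cp.Problem(cp.Minimize(cp.norm1(c)),
                     [cp.norm(Phi @ c - s, 2) <= eps])
p_error.solve()
c_error = c.value

# Penalized relaxation: trade squared error against the l1 norm with parameter gamma.
gamma = 0.05
p_penalty = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(Phi @ c - s)
                                   + gamma * cp.norm1(c)))
p_penalty.solve()
c_penalty = c.value
```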
This paper is among the first to analyze the behavior of convex relaxation when noise is present. Prior theoretical work has focused on a special case of the model problem where the noise vector. See, for example, [14], [19] [23]. Although these results are beautiful, they simply do not apply to real-world signal processing problems. We expect that our work will have a significant practical impact because so many applications require sparse approximation in the presence of noise. As an example, Dossal and Mallat have used our results to study the performance of convex relaxation for a problem in seismics [24]. After the first draft [25] of this paper had been completed, it came to the author s attention that several other researchers were preparing papers on the performance of convex relaxation in the presence of noise [18], [26], [27]. These manuscripts are significantly different from the present work, and they also deserve attention. Turn to Section VI-B for comparisons and contrasts. D. Channel Coding It may be illuminating to view the model problem in the context of channel coding. The Gaussian channel allows us to send one real number during each transmission, but this number is corrupted by the addition of a zero-mean Gaussian random variable. Shannon s channel coding theorem [28, Ch. 10] shows that the capacity of the Gaussian channel can be achieved (asymptotically) by grouping the transmissions into long blocks and using a random code whose size is exponential in the block length. A major practical problem with this approach is that decoding requires an exponentially large lookup table. In contrast, we could also construct a large code by forming sparse linear combinations of vectors from a fixed codebook. Both the choice of vectors and the nonzero coefficients carry information. The vector length is analogous with the block length. When we transmit a codeword through the channel, it is contaminated by a zero-mean, white Gaussian random vector. To identify the codeword, it is necessary to solve the model problem. Therefore, any sparse approximation algorithm such as convex relaxation can potentially be used for decoding. To see that this coding scheme is practical, one must show that the codewords carry almost as much information as the channel permits. One must also demonstrate that the decoding algorithm can reliably recover the noisy codewords.
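A toy version of this coding idea can be set up in a few lines; the construction below is an editorial illustration rather than the paper's scheme, and it reuses the dictionary Phi from the earlier sketches. Decoding amounts to solving the model problem, for example with either relaxation above.

```python
import numpy as np

def encode(Phi, support, signs):
    """Codeword = sparse combination of atoms; both the support and the signs carry bits."""
    c = np.zeros(Phi.shape[1])
    c[np.asarray(support)] = np.asarray(signs, dtype=float)
    return Phi @ c

# Transmit through a Gaussian channel; decode by solving the model problem,
# e.g., with the penalized relaxation from the previous sketch.
codeword = encode(Phi, support=[3, 17, 42, 99], signs=[+1, -1, +1, +1])
received = codeword + 0.1 * np.random.default_rng(1).standard_normal(Phi.shape[0])
```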

3 1032 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 3, MARCH 2006 We believe that channel coding is a novel application of sparse approximation algorithms. In certain settings, these algorithms may provide a very competitive approach. Sections IV-D and V-C present some simple examples that take a first step in this direction. See [29] for some more discussion of these ideas. E. Outline The paper continues with notation and background material in Section II. Section III develops the fundamental lemmata that undergird our major results. The two subsequent sections provide the major theoretical results for ( - ) and ( - ). Both these sections describe several specific applications in signal recovery, approximation theory, statistics, and numerical analysis. The body of the paper concludes with Section VI, which surveys extensions and related work. The appendices present several additional topics and some supporting analysis. II. BACKGROUND This section furnishes the mathematical mise en scène. We have adopted an abstract point of view that facilitates our treatment of convex relaxation methods. Most of the material here may be found in any book on matrix theory or functional analysis, such as [30] [32]. A. The Dictionary We will work in the finite-dimensional, complex innerproduct space, which will be called the signal space. 1 The usual Hermitian inner product for will be written as, and we will denote the corresponding norm by.adictio- nary for the signal space is a finite collection of unit-norm elementary signals. The elementary signals are called atoms, and each is denoted by, where the parameter is drawn from an index set. The indices may have an interpretation, such as the time frequency or time scale localization of an atom, or they may simply be labels without an underlying metaphysics. The whole dictionary structure is written as The letter will denote the number of atoms in the dictionary. It is evident that, where returns the cardinality of a finite set. B. The Coherence Parameter A summary parameter of the dictionary is the coherence, which is defined as the maximum absolute inner product between two distinct atoms [12], [19] When the coherence is small, the atoms look very different from each other, which makes them easy to distinguish. It is common that the coherence satisfies. We say informally that a dictionary is incoherent when we judge that the coherence 1 Modifications for real signal spaces should be transparent, but the apotheosis to infinite dimensions may take additional effort. (3) parameter is small. An incoherent dictionary may contain far more atoms than an orthonormal basis (i.e., ) [14, Sec. II-D]. The literature also contains a generalization of the coherence parameter called the cumulative coherence function [14], [21]. For each natural number, this function is defined as This function will often provide better estimates than the coherence parameter. For clarity of exposition, we only present results in terms of the coherence parameter. Parallel results using cumulative coherence are not hard to develop [25]. It is essential to be aware that coherence is not fundamental to sparse approximation. Rather, it offers a quick way to check the hypotheses of our theorems. It is possible to provide much stronger results using more sophisticated techniques. C. Coefficient Space Every linear combination of atoms is parameterized by a list of coefficients. We may collect them into a coefficient vector, which formally belongs to the linear space 2. 
The standard basis for this space is given by the vectors whose coordinate projections are identically zero, except for one unit coordinate. The th standard basis vector will be denoted. Given a coefficient vector, the expressions and both represent its th coordinate. We will alternate between them, depending on what is most typographically felicitous. The support of a coefficient vector is the set of indices at which it is nonzero Suppose that. Without notice, we may embed short coefficient vectors from into by extending them with zeros. Likewise, we may restrict long coefficient vectors from to their support. Both transubstantiations will be natural in context. D. Sparsity and Diversity The sparsity of a coefficient vector is the number of places where it equals zero. The complementary notion, diversity, counts the number of places where the coefficient vector does not equal zero. Diversity is calculated with the quasi-norm For any positive number,define with the convention that. As one might expect, there is an intimate connection between the definitions (5) and (6). Indeed,. It is well known that 2 In case this notation is unfamiliar, is the collection of functions from to. It is equipped with the usual addition and scalar multiplication to form a linear space. (4) (5) (6)
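Both the coherence and the cumulative coherence are straightforward to compute for a concrete dictionary. The sketch below follows the definitions of this section as stated; it assumes the unit-norm-column matrix Phi from the earlier sketches.

```python
import numpy as np

def coherence(Phi):
    """Maximum absolute inner product between two distinct unit-norm atoms."""
    G = np.abs(Phi.conj().T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G.max()

def cumulative_coherence(Phi, m):
    """mu_1(m): max over atoms psi and m-subsets Lambda not containing psi of the
    sum of |<psi, phi_lambda>|.  For each column this is just the sum of its m
    largest absolute inner products with the other columns."""
    G = np.abs(Phi.conj().T @ Phi)
    np.fill_diagonal(G, 0.0)
    top_m = np.sort(G, axis=1)[:, ::-1][:, :m]
    return top_m.sum(axis=1).max()

mu = coherence(Phi)
print(mu, cumulative_coherence(Phi, 4))   # note mu_1(1) == mu and mu_1(m) <= m * mu
```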

4 TROPP: CONVEX PROGRAMMING METHODS FOR IDENTIFYING SPARSE SIGNALS IN NOISE 1033 the function (6) is convex if and only if, in which case it describes the norm. From this vantage, one can see that the norm is the convex function closest to the quasi-norm (subject to the same normalization on the zero vector and the standard basis vectors). For a more rigorous approach to this idea, see [33, Proposition 2.1] and also [21]. E. Dictionary Matrices Let us define a matrix, called the dictionary synthesis matrix, that maps coefficient vectors to signals. Formally via The matrix describes the action of this linear transformation in the standard bases of the underlying vector spaces. Therefore, the columns of are the atoms. The conjugate transpose of is called the dictionary analysis matrix, and it maps each signal to a coefficient vector that lists the inner products between signal and atoms H. Operator Norms The operator norm of a matrix is given by An immediate consequence is the upper norm bound Suppose that and. Then we have the identity We also have the following lower norm bound, which is proved in [25, Sec. 3.3]. Proposition 1: For every matrix If has full row rank, equality holds in (8). When is invertible, this result implies (7) (8) The rows of are atoms, conjugate-transposed. The symbol matrix. denotes the range (i.e., column span) of a F. Subdictionaries A subdictionary is a linearly independent collection of atoms. We will exploit the linear independence repeatedly. If the atoms in a subdictionary are indexed by the set, then we define a synthesis matrix and an analysis matrix. These matrices are entirely analogous with the dictionary synthesis and analysis matrices. We will frequently use the fact that the synthesis matrix has full column rank. Let index a subdictionary. The Gram matrix of the subdictionary is given by. This matrix is Hermitian, it has a unit diagonal, and it is invertible. The pseudoinverse of the synthesis matrix is denoted by, and it may be calculated using the formula. The matrix will denote the orthogonal projector onto the span of the subdictionary. This projector may be expressed using the pseudoinverse:. On occasion, the distinguished index set is instead of. In this case, the synthesis matrix is written as, and the orthogonal projector is denoted. G. Signals, Approximations, and Coefficients Let be a subdictionary, and let be a fixed input signal. This signal has a unique best approximation using the atoms in, which is determined by. Note that the approximation is orthogonal to the residual. There is a unique coefficient vector supported on that synthesizes the approximation:. We may calculate that. III. FUNDAMENTAL LEMMATA Fix an input signal and a positive parameter.define the convex function In this section, we study the minimizers of this function. These results are important to us because (L) is the objective function of the convex program ( - ), and it is essentially the Lagrangian function of ( - ). By analyzing the behavior of (L), we can understand the performance of convex relaxation methods. The major lemma of this section provides a sufficient condition which ensures that the minimizer of (L) is supported on a given index set. To develop this result, we first characterize the (unique) minimizer of (L) when it is restricted to coefficient vectors supported on. We use this characterization to obtain a condition under which every perturbation away from the restricted minimizer must increase the value of the objective function. 
When this condition is in force, the global minimizer of (L) must be supported on the given index set. A. Convex Analysis The proof relies on standard results from convex analysis. As it is usually presented, this subject addresses the properties of real-valued convex functions defined on real vector spaces. Nevertheless, it is possible to transport these results to the complex

5 1034 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 3, MARCH 2006 setting by defining an appropriate real-linear structure on the complex vector space. In this subsection, therefore, we use the bilinear inner product instead of the usual sesquilinear (i.e., Hermitian) inner product. Note that both inner products generate the norm. Suppose that is a convex function from a real-linear innerproduct space to. The gradient of the function at the point is defined as the usual (Fréchet) derivative, computed with respect to the real-linear inner product. Even when the gradient does not exist, we may define the subdifferential of at a point for every The elements of the subdifferential are called subgradients.if possesses a gradient at, the unique subgradient is the gradient. That is, The subdifferential of a sum is the (Minkowski) sum of the subdifferentials. Finally, if is a closed, proper convex function then is a global minimizer of if and only if. The standard reference for this material is [34]. Remark 2: The subdifferential of a convex function provides a dual description of the function in terms of its supporting hyperplanes. In consequence, the appearance of the subdifferential in our proof is analogous with the familiar technique of studying the dual of a convex program. B. Restricted Minimizers First, we must characterize the minimizers of the objective function (L). Fuchs developed the following result in the real setting using essentially the same method [23]. Lemma 3: Suppose that indexes a linearly independent collection of atoms, and let minimize the objective function (L) over all coefficient vectors supported on. A necessary and sufficient condition on such a minimizer is that where the vector is drawn from. Moreover, the minimizer is unique. Proof: Suppose that. Then the vectors and are orthogonal to each other. Apply the Pythagorean Theorem to see that minimizing (L) over coefficient vectors supported on is equivalent to minimizing the function (9) (10) over coefficient vectors from. Recall that has full column rank. It follows that the quadratic term in (10) is strictly convex, and so the whole function must also be strictly convex. Therefore, its minimizer is unique. The function is convex and unconstrained, so is a necessary and sufficient condition for the coefficient vector to minimize. The gradient of the first term of at equals. From the additivity of subdifferentials, it follows that for some vector drawn from the subdifferential.we premultiply this relation by to reach Apply the fact that to reach the conclusion. Now we identify the subdifferential of the end, define the signum function as for for norm. To that One may extend the signum function to vectors by applying it to each component. Proposition 4: Let be a complex vector. The complex vector lies in the subdifferential if and only if whenever, and whenever. Indeed, unless, in which case. We omit the proof. At last, we may develop bounds on how much a solution to the restricted problem varies from the desired solution. Corollary 5 (Upper Bounds): Suppose that indexes a subdictionary, and let minimize the function (L) over all coefficient vectors supported on. The following bounds are in force: (11) (12) Proof: We begin with the necessary and sufficient condition (13) where. To obtain (11), we take the norm of (13) and apply the upper norm bound Proposition 4 shows that, which proves the result. To develop the second bound (12), we premultiply (13) by the matrix and compute the norm As before,. 
Finally, we apply the identity (7) to switch from the norm to the norm.
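Writing the reconstructed notation explicitly (this is an editorial reading of the stripped displays, so treat the symbols as assumptions): the objective under study is L(b) = (1/2)||s - Phi b||_2^2 + gamma ||b||_1, and Lemma 3 says that b minimizes L over vectors supported on an index set Lambda exactly when Phi_Lambda^*(s - Phi_Lambda b) = gamma g for some subgradient g of the l1 norm at b. The sketch below assembles the subdictionary objects of Section II (Gram matrix, pseudoinverse, orthogonal projector, best approximation) and checks that condition numerically on the support used in the earlier sketches.

```python
import numpy as np

# Subdictionary machinery from Section II (names are mine); Lam indexes a
# linearly independent set of atoms of the dictionary Phi from the earlier sketches.
Lam = np.sort(support)                        # use the true support for illustration
Phi_L = Phi[:, Lam]                           # synthesis matrix of the subdictionary
G = Phi_L.conj().T @ Phi_L                    # Gram matrix (Hermitian, unit diagonal, invertible)
pinv_L = np.linalg.inv(G) @ Phi_L.conj().T    # pseudoinverse of Phi_L
P_L = Phi_L @ pinv_L                          # orthogonal projector onto the span of the subdictionary
a_L = P_L @ s                                 # best approximation of s over Lam
c_L = pinv_L @ s                              # coefficients synthesizing that approximation
print(np.abs(Phi_L.conj().T @ (s - a_L)).max())   # ~0: the residual is orthogonal to the subdictionary

# Minimize L(b) = 0.5*||s - Phi_L b||^2 + gamma*||b||_1 over vectors supported on Lam
# with a plain iterative soft-thresholding loop, then check the Lemma 3 condition.
gamma = 0.05
step = 1.0 / np.linalg.norm(Phi_L, 2) ** 2
b = np.zeros(len(Lam))
for _ in range(5000):
    z = b + step * (Phi_L.conj().T @ (s - Phi_L @ b))
    b = np.sign(z) * np.maximum(np.abs(z) - step * gamma, 0.0)   # soft threshold

cond = Phi_L.conj().T @ (s - Phi_L @ b)       # should equal gamma * (a subgradient of ||b||_1)
on = (b != 0)
print(np.abs(cond[on] - gamma * np.sign(b[on])).max())   # ~0 where b is nonzero
if not on.all():
    print(np.abs(cond[~on]).max() <= gamma + 1e-8)       # and at most gamma where b is zero
```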

6 TROPP: CONVEX PROGRAMMING METHODS FOR IDENTIFYING SPARSE SIGNALS IN NOISE 1035 C. The Correlation Condition Now, we develop a condition which ensures that the global minimizer of (L) is supported inside a given index set. This result is the soul of the analysis. Lemma 6 (Correlation Condition): Assume that indexes a linearly independent collection of atoms. Let minimize the function (L) over all coefficient vectors supported on. Suppose that where is determined by (9). It follows that is the unique global minimizer of (L). In particular, the condition guarantees that is the unique global minimizer of (L). In this work, we will concentrate on the second sufficient condition because it is easier to work with. Nevertheless, the first condition is significantly more powerful. Note that either sufficient condition becomes worthless when the bracket on its right-hand side is negative. We typically abbreviate the bracket in the second condition as The notation stands for exact recovery coefficient, which reflects the fact that is a sufficient condition for several different algorithms to recover the optimal representation of an exactly sparse signal [14]. Roughly, measures how distinct the atoms in are from the remaining atoms. Observe that if the index set satisfies, then the sufficient condition of the lemma will hold as soon as becomes large enough. But if is too large, then. Given a nonzero signal,define the function Then, any coefficient vector that minimizes the function (L) must be supported inside. One might wonder whether the condition is really necessary to develop results of this type. The answer is a qualified affirmative. Appendix II offers a partial converse of Lemma 6. D. Proof of the Correlation Condition Lemma We now establish Lemma 6. The argument presented here is different from the original proof in the technical report [25]. It relies on a perturbation technique that can be traced to the independent works [18], [25]. Proof of Lemma 6: Let be the unique minimizer of the objective function (L) over coefficient vectors supported on. In particular, the value of the objective function increases if we change any coordinate of indexed in. We will develop a condition which guarantees that the value of the objective function also increases when we change any other component of. Since (L) is convex, these two facts will imply that is the global minimizer. Choose an index not listed in, and let be a nonzero scalar. We must develop a condition which ensures that where is the th standard basis vector. To that end, expand the left-hand side of this relation to obtain Next, simplify the first bracket by expanding the norms and canceling like terms. Simplify the second bracket by recognizing that since the two vectors have disjoint supports. Hence, and place the convention that. This quantity measures the maximum correlation between the signal and any atom in the dictionary. To interpret the left-hand side of the sufficient conditions in the lemma, observe that Therefore, the lemma is strongest when the magnitude of the residual and its maximum correlation with the dictionary are both small. If the dictionary is not exponentially large, a generic vector is weakly correlated with the dictionary on account of measure concentration phenomena. Since the atoms are normalized, the maximum correlation never exceeds one. This fact yields a (much weaker) result that depends only on the magnitude of the residual. Corollary 7: Let index a subdictionary, and let be an input signal. 
Suppose that the residual vector satisfies Add and subtract in the left-hand side of the inner product, and use linearity to split the inner product into two pieces. We reach We will bound the right-hand side below. To that end, observe that the first term is strictly positive, so we may discard it. Then invoke the lower triangle inequality, and use the linearity of the inner products to draw out (14) It remains to rewrite (14) in a favorable manner. First, identify in the last term on the right-hand side of (14). Next, let us examine the second term. Lemma 3 characterizes the difference. Introduce this characterization, and identify the pseudoinverse of to discover that (15)

7 1036 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 3, MARCH 2006 where bound (14) is determined by (9). Substitute (15) into the (16) Our final goal is to ensure that the left-hand side of (16) is strictly positive. This situation occurs whenever the bracket is nonnegative. Therefore, we need We require this expression to hold for every index that does not belong to. Optimizing each side over in yields the more restrictive condition Since is orthogonal to the atoms listed in, the lefthand side does not change if we maximize over all from. Therefore, We conclude that the relation is a sufficient condition for every perturbation away from to increase the objective function. Since (L) is convex, it follows that is the unique global minimizer of (L). In particular, since, it is also sufficient that This completes the argument. IV. PENALIZATION Suppose that is an input signal. In this section, we study applications of the convex program The parameter controls the tradeoff between the error in approximating the input signal and the sparsity of the approximation. We begin with a theorem that provides significant information about the solution to ( - ). Afterward, this theorem is applied to several important problems. As a first example, we show that the convex program can identify the sparsest representation of an exactly sparse signal. Our second example shows that ( - ) can be used to recover a sparse signal contaminated by an unknown noise vector of bounded norm, which is a well-known inverse problem. We will also see that a statistical model for the noise vector allows us to provide more refined results. In particular, we will discover that ( - ) is quite effective for recovering sparse signals corrupted with Gaussian noise. The section concludes with an application of the convex program to the subset selection problem from statistics. The reader should also be aware that convex programs of the form ( - ) have been proposed for numerous other applications. Geophysicists have long used them for deconvolution [5], [35]. Certain support vector machines, which arise in machine learning, can be reduced to this form [8]. Chen, - Donoho, and Saunders have applied ( - ) to denoise signals [9], and Fuchs has put it forth for several other signal processing problems, e.g., in [36], [37]. Daubechies, Defrise, and De Mol have suggested a related convex program to regularize ill-posed linear inverse problems [10]. Most intriguing, perhaps, Olshausen and Field have argued that the mammalian visual cortex may solve similar optimization problems to produce sparse representations of images [38]. Even before we begin our analysis, we can make a few immediate remarks about ( - ). Observe that, as the parameter tends to zero, the solution to the convex program approaches a point of minimal norm in the affine space. It can be shown that, except when belongs to a set of signals with Lebesgue measure zero in, no point of the affine space has fewer than nonzero components [14, Proposition 4.1]. So the minimizer of ( - ) is generically nonsparse when is small. In contrast, as approaches infinity, the solution tends toward the zero vector. It is also worth noting that ( - ) has an analytical solution whenever is unitary. One simply computes the orthogonal expansion of the signal with respect to the columns of and applies the soft thresholding operator to each coefficient. More precisely, where if otherwise. 
This result will be familiar to anyone who has studied the process of shrinking empirical wavelet coefficients to denoise functions [39], [40]. A. Performance of ( - ) Our major theorem on the behavior of ( - ) simply collects the lemmata from the last section. Theorem 8: Let index a linearly independent collection of atoms for which. Suppose that is an input signal whose best approximation over satisfies the correlation condition Let solve the convex program ( - ) with parameter. We may conclude the following. The support of is contained in, and the distance between and the optimal coefficient vector satisfies In particular, contains every index in for which Moreover, the minimizer is unique. In words, we approximate the input signal over, and we suppose that the remaining atoms are weakly correlated with the residual. It follows then that the minimizer of the convex program involves only atoms in and that this minimizer is not far from the coefficient vector that synthesizes the best approximation of the signal over.
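The hypotheses of Theorem 8 can be checked numerically for a candidate index set. In the notation reconstructed above, the Exact Recovery Coefficient is ERC(Lambda) = 1 - max over atoms outside Lambda of ||Phi_Lambda^+ phi_omega||_1, and the correlation condition asks that the inner products between the residual s - a_Lambda and every atom be bounded by gamma times ERC(Lambda). The sketch below implements these checks; the exact constants follow my reading of the text and should be treated as assumptions.

```python
import numpy as np

def erc(Phi, Lam):
    """Exact Recovery Coefficient of the index set Lam (definition of Section III-C)."""
    Lam = np.asarray(Lam)
    pinv_L = np.linalg.pinv(Phi[:, Lam])
    outside = np.setdiff1d(np.arange(Phi.shape[1]), Lam)
    return 1.0 - max(np.linalg.norm(pinv_L @ Phi[:, w], 1) for w in outside)

def correlation_condition_holds(Phi, s, Lam, gamma):
    """Hypothesis of Theorem 8: the residual of the best approximation over Lam
    should be weakly correlated with the whole dictionary."""
    Phi_L = Phi[:, np.asarray(Lam)]
    a_L = Phi_L @ np.linalg.lstsq(Phi_L, s, rcond=None)[0]   # best approximation over Lam
    max_corr = np.max(np.abs(Phi.conj().T @ (s - a_L)))
    return max_corr <= gamma * erc(Phi, Lam)

print(erc(Phi, support), correlation_condition_holds(Phi, s, support, gamma=0.05))
```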

8 TROPP: CONVEX PROGRAMMING METHODS FOR IDENTIFYING SPARSE SIGNALS IN NOISE 1037 To check the hypotheses of this theorem, one must carefully choose the index set and leverage information about the relationship between the signal and its approximation over. Our examples will show how to apply the theorem in several specific cases. At this point, we can also state a simpler version of this theorem that uses the coherence parameter to estimate some of the key quantities. One advantage of this formulation is that the index set plays a smaller role. Corollary 9: Suppose that, and assume that contains no more than indices. Suppose that is an input signal whose best approximation over satisfies the correlation condition Let solve ( - ) with parameter. We may conclude the following. The support of is contained in, and the distance between and the optimal coefficient vector satisfies In particular, contains every index in for which Moreover, the minimizer is unique. Proof: The corollary follows directly from the theorem when we apply the coherence bounds of Appendix III. This corollary takes an especially pleasing form if we assume that. In that case, the right-hand side of the correlation condition is no smaller than. Meanwhile, the coefficient vectors and differ in norm by no more than. Note that, when is unitary, the coherence parameter. Corollary 9 allows us to conclude that the solution to the convex program identifies precisely those atoms whose inner products against the signal exceed in magnitude. Their coefficients are reduced in magnitude by at most. This description matches the performance of the soft thresholding operator. B. Example: Identifying Sparse Signals As a simple application of the foregoing theorem, we offer a new proof that one can recover an exactly sparse signal by solving the convex program ( - ). Fuchs has already established this result in the real case [22]. Corollary 10 (Fuchs): Assume that indexes a linearly independent collection of atoms for which. Choose an arbitrary coefficient vector supported on, and fix an input signal. Let denote the unique minimizer of ( - ) with parameter. We may conclude the following. There is a positive number for which implies that. In the limit as,wehave. Proof: First, note that the best approximation of the signal over satisfies and that the corresponding coefficient vector. Therefore,. The Correlation Condition is in force for every positive, so Theorem 8 implies that the minimizer of the program ( - ) must be supported inside. Moreover, the distance from to the optimal coefficient vector satisfies It is immediate that as the parameter. Finally, observe that contains every index in provided that Note that the left-hand side of this inequality furnishes an explicit value for. As a further corollary, we obtain a familiar result for Basis Pursuit that was developed independently in [14], [23], generalizing the work in [19] [22]. We will require this corollary in the sequel. Corollary 11 (Fuchs, Tropp): Assume that. Let be supported on, and fix a signal. Then is the unique solution to subject to (BP) Proof: As, the solutions to ( - ) approach a point of minimum norm in the affine space. Since is the limit of these minimizers, it follows that must also solve (BP). The uniqueness claim seems to require a separate argument and the assumption that is strictly positive. See [23], [41] for two different proofs. C. 
Example: Identifying Sparse Signals in Noise This subsection shows how ( - ) can be applied to the inverse problem of recovering a sparse signal contaminated with an arbitrary noise vector of bounded norm. Let us begin with a model for our ideal signals. We will use this model for examples throughout the paper. Fix a -coherent dictionary containing atoms in a real signal space of dimension, and place the coherence bound. Each ideal signal is a linear combination of atoms with coefficients of. To formalize things, select to be any nonempty index set containing atoms. Let be a coefficient vector supported on, and let the nonzero entries of equal. Then each ideal signal takes the form. To correctly recover one of these signals, it is necessary to determine the support set of the optimal coefficient vector as well as the signs of the nonzero coefficients. Observe that the total number of ideal signals is since the choice of atoms and the choice of coefficients both carry information. The coherence estimate in Proposition 26 allows us to establish that the power of each ideal signal satisfies

9 1038 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 3, MARCH 2006 Similarly, no ideal signal has power less than. In this example, we will form input signals by contaminating the ideal signals with an arbitrary noise vector of bounded norm, say. Therefore, we measure a signal of the form We would like to apply Corollary 9 to find circumstances in which the minimizer of ( - ) identifies the ideal signal. That is,. Observe that this is a concrete example of the model problem from the Introduction. First, let us determine when the Correlation Condition holds for. According to the Pythagorean Theorem Therefore,. Each atom has unit norm, so Referring to the paragraph after Corollary 9, we see that the correlation condition is in force if we select. Invoking Corollary 9, we discover that the minimizer of ( - ) with parameter is supported inside and also that Meanwhile, we may calculate that The coherence bound in Proposition 26 delivers (17) (18) In consequence of the triangle inequality and the bounds (17) and (18) ideal signals provided that the noise level. For the supremal value of, the SNR lies between and db. The total number of ideal signals is just over, so each one encodes 68 bits of information. On the other hand, if we have a statistical model for the noise, we can obtain good performance even when the noise level is substantially higher. In the next subsection, we will describe an example of this stunning phenomenon. D. Example: The Gaussian Channel A striking application of penalization is to recover a sparse signal contaminated with Gaussian noise. The fundamental reason this method succeeds is that, with high probability, the noise is weakly correlated with every atom in the dictionary (provided that the number of atoms is subexponential in the dimension ). Note that Fuchs has developed qualitative results for this type of problem in his conference publication [26]; see Section VI-B for some additional details on his work. Let us continue with the ideal signals described in the last subsection. This time, the ideal signals are corrupted by adding a zero-mean Gaussian random vector with covariance matrix. We measure the signal and we wish to identify. In other words, a codeword is sent through a Gaussian channel, and we must decode the transmission. We will approach this problem by solving the convex program ( - ) with an appropriate choice of to obtain a minimizer. One may establish the following facts about this signal model and the performance of convex relaxation. For each ideal signal, the SNR satisfies SNR The capacity of a Gaussian channel [28, Ch. 10] at this SNR is no greater than It follows that provided that For our ideal signals, the number of bits per transmission equals In conclusion, the convex program ( - ) can always recover the ideal signals provided that the norm of the noise is less than. At this noise level, the signal-to-noise ratio (SNR) is no greater than The probability that convex relaxation correctly identifies the ideal signal exceeds SNR Similarly, the SNR is no smaller than. Note that the SNR grows linearly with the number of atoms in the signal. Let us consider some specific numbers. Suppose that we are working in a signal space with dimension, that the dictionary contains atoms, that the coherence, and that the sparsity level. We can recover any of the It follows that the failure probability decays exponentially as the noise power drops to zero. 
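A small Monte Carlo experiment in the spirit of this example is easy to run. The sketch below (an editorial illustration; the dimensions, noise level, and choice of gamma are not the values analyzed in the paper) draws random sparse signals with plus-or-minus-one coefficients, passes them through a Gaussian channel, solves the penalized relaxation by iterative soft thresholding, and reports how often the support and signs are recovered exactly.

```python
import numpy as np

def solve_penalized(Phi, s, gamma, iters=3000):
    """Iterative soft thresholding for  0.5*||s - Phi c||_2^2 + gamma*||c||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    c = np.zeros(Phi.shape[1])
    for _ in range(iters):
        z = c + step * (Phi.conj().T @ (s - Phi @ c))
        c = np.sign(z) * np.maximum(np.abs(z) - step * gamma, 0.0)
    return c

rng = np.random.default_rng(2)
d, N, m, sigma, gamma = 64, 128, 4, 0.1, 0.3
Phi = rng.standard_normal((d, N)); Phi /= np.linalg.norm(Phi, axis=0)

hits, trials = 0, 200
for _ in range(trials):
    supp = rng.choice(N, size=m, replace=False)
    c_opt = np.zeros(N); c_opt[supp] = rng.choice([-1.0, 1.0], size=m)
    s = Phi @ c_opt + sigma * rng.standard_normal(d)       # Gaussian channel
    c_hat = solve_penalized(Phi, s, gamma)
    hits += np.array_equal(np.sign(c_hat), np.sign(c_opt)) # decode: keep support and signs
print(f"exact support-and-sign recovery rate: {hits / trials:.2%}")
```

Varying sigma in this loop gives an empirical picture of how the failure probability falls off as the noise power drops, in line with the claim above.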
To establish the first item, we compare the power of the ideal signals against the power of the noise, which is. To determine the likelihood of success, we must find the probability (over the noise) that the Correlation Condition is in force so that we can

10 TROPP: CONVEX PROGRAMMING METHODS FOR IDENTIFYING SPARSE SIGNALS IN NOISE 1039 invoke Corollary 9. Then, we must ensure that the noise does not corrupt the coefficients enough to obscure their signs. The difficult calculations are consigned to Appendix IV-A. We return to the same example from the last subsection. Suppose that we are working in a signal space with dimension, that the dictionary contains atoms, that the coherence, and that the sparsity level is. Assume that the noise level, which is about as large as we can reliably handle. In this case, a good choice for the parameter is. With these selections, the SNR is between 7.17 and db, and the channel capacity does not exceed bits per transmission. Meanwhile, we are sending about bits per transmission, and the probability of perfect recovery exceeds 95% over the noise. Although there is now a small failure probability, let us emphasize that the SNR in this example is over 10 db lower than in the last example. This improvement is possible because we have accounted for the direction of the noise. Let us be honest. This example still does not inspire much confidence in our coding scheme: the theoretical rate we have established is nowhere near the actual capacity of the channel. The shortcoming, however, is not intrinsic to the coding scheme. To obtain rates close to capacity, it is necessary to send linear combinations whose length is on the order of the signal dimension. Experiments indicate that convex relaxation can indeed recover linear combinations of this length. But the analysis to support these empirical results requires tools much more sophisticated than the coherence parameter. We hope to pursue these ideas in a later work. E. Example: Subset Selection Statisticians often wish to predict the value of one random variable using a linear combination of other random variables. At the same time, they must negotiate a compromise between the number of variables involved and the mean-squared prediction error to avoid overfitting. The problem of determining the correct variables is called subset selection, and it was probably the first type of sparse approximation to be studied in depth. As Miller laments, statisticians have made limited theoretical progress due to numerous complications that arise in the stochastic setting [4, Prefaces]. We will consider a deterministic version of subset selection that manages a simple tradeoff between the squared approximation error and the number of atoms that participate. Let be an arbitrary input signal. Suppose is a threshold that quantifies how much improvement in the approximation error is necessary before we admit an additional term into the approximation. We may state the formal problem (Subset) Were the support of fixed, then (Subset) would be a least-squares problem. Selecting the optimal support, however, is a combinatorial nightmare. In fact, if the dictionary is unrestricted, it must be NP-hard to solve (Subset) in consequence of results from [12], [16]. The statistics literature contains dozens of algorithmic approaches to subset selection, which [4] describes in detail. A method that has recently become popular is the lasso, which replaces the difficult subset selection problem with a convex relaxation of the form ( - ) in hope that the solutions will be related [42]. Our example provides a rigorous justification that this approach can succeed. 
If we have some basic information about the solution to (Subset), then we may approximate this solution using ( - ) with an appropriate choice of. We will invoke Theorem 8 to show that the solution to the convex relaxation has the desired properties. To do so, we require a theorem about the behavior of solutions to the subset selection problem. The proof appears in Appendix V. Theorem 12: Fix an input signal, and choose a threshold. Suppose that the coefficient vector solves the subset selection problem, and set. For,wehave. For,wehave. In consequence of this theorem, any solution to the subset selection problem satisfies the Correlation Condition with, provided that is chosen so that Applying Theorem 8 yields the following result. Corollary 13 (Relaxed Subset Selection): Fix an input signal. Suppose that the vector solves (Subset) with threshold ; the set satisfies ; and the coefficient vector solves ( - ) with parameter. Then it follows that the relaxation never selects a nonoptimal atom since the solution to the relaxation is nearly optimal since In particular, contains every index for which Moreover, is the unique solution to ( - ). In words, if any solution to the subset selection problem satisfies a condition on the Exact Recovery Coefficient, then the solution to the convex relaxation ( - ) for an appropriately chosen parameter will identify every significant atom in that solution to (Subset) and it will never involve any atom that does not belong in that optimal solution. It is true that, in the present form, the hypotheses of this corollary may be difficult to verify. Invoking Corollary 9 instead of Theorem 8 would yield a more practical result involving the coherence parameter. Nevertheless, this result would still involve a strong assumption on the sparsity of an optimal solution to the subset selection problem. It is not clear that one could verify this hypothesis in practice, so it may be better to view these results as philosophical support for the practice of convex relaxation.
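For a dictionary small enough to enumerate, the combinatorial problem and its relaxation can be compared directly. The sketch below treats (Subset) as squared approximation error plus a charge of tau^2 per selected atom, which is my reading of the formulation and should be treated as an assumption; the relaxation is solved with the iterative soft-thresholding routine from the Gaussian-channel sketch.

```python
import numpy as np
from itertools import combinations

def solve_subset(Phi, s, tau, max_terms=3):
    """Brute-force 'squared error + tau^2 * (number of atoms)' over all small supports.
    Only feasible for tiny dictionaries; assumed form of (Subset), see the text above."""
    d, N = Phi.shape
    best_val, best_c = np.linalg.norm(s) ** 2, np.zeros(N)   # empty support
    for k in range(1, max_terms + 1):
        for Lam in combinations(range(N), k):
            A = Phi[:, Lam]
            coef, *_ = np.linalg.lstsq(A, s, rcond=None)
            val = np.linalg.norm(s - A @ coef) ** 2 + tau ** 2 * k
            if val < best_val:
                best_val = val
                best_c = np.zeros(N); best_c[list(Lam)] = coef
    return best_c

# Tiny instance: compare the atoms selected by (Subset) and by the relaxation.
rng = np.random.default_rng(3)
d, N = 16, 24
Phi = rng.standard_normal((d, N)); Phi /= np.linalg.norm(Phi, axis=0)
s = Phi[:, 2] - 0.8 * Phi[:, 7] + 0.02 * rng.standard_normal(d)
c_subset = solve_subset(Phi, s, tau=0.2)
c_lasso = solve_penalized(Phi, s, gamma=0.1)    # routine from the Gaussian-channel sketch
print(np.flatnonzero(c_subset), np.flatnonzero(np.abs(c_lasso) > 1e-6))
```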

11 1040 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 3, MARCH 2006 For an incoherent dictionary, one could develop a converse result of the following shape. Suppose that the solution to the convex relaxation ( - ) is sufficiently sparse, has large enough coefficients, and yields a good approximation of the input signal. Then the solution to the relaxation must approximate an optimal solution to (Subset). As this paper was being completed, it came to the author s attention that Gribonval et al. have developed some results of this type [43]. Their theory should complement the present work nicely. Theorem 14: Let index a linearly independent collection of atoms for which, and fix an input signal. Select an error tolerance no smaller than Let solve the convex program ( - ) with tolerance. We may conclude the following. The support of is contained in, and the distance between and the optimal coefficient vector satisfies V. ERROR-CONSTRAINED MINIMIZATION In particular, contains every index from for which Suppose that is an input signal. In this section, we study applications of the convex program. subject to - Minimizing the norm of the coefficients promotes sparsity, while the parameter controls how much approximation error we are willing to tolerate. We will begin with a theorem that yields significant information about the solution to ( - ). Then, we will tour several different applications of this optimization problem. As a first example, we will see that it can recover a sparse signal that has been contaminated with an arbitrary noise vector of bounded norm. Afterward, we show that a statistical model for the bounded noise allows us to sharpen our analysis significantly. Third, we describe an application to a sparse approximation problem that arises in numerical analysis. The literature does not contain many papers that apply the convex program ( - ). Indeed, the other optimization problem ( - ) is probably more useful. Nevertheless, there is one notable study [18] of the theoretical performance of ( - ). This article will be discussed in more detail in Section VI-B. Let us conclude this introduction by mentioning some of the basic properties of ( - ). As the parameter approaches zero, the solutions will approach a point of minimal norm in the affine space. On the other hand, as soon as exceeds, the unique solution to ( - ) is the zero vector. A. Performance of ( - ) The following theorem describes the behavior of a minimizer of ( - ). In particular, it provides conditions under which the support of the minimizer is contained in a specific index set. We reserve the proof until Section V-E. Moreover, the minimizer is unique. In words, if the parameter is chosen somewhat larger than the error in the optimal approximation, then the solution to ( - ) identifies every significant atom in and it never picks an incorrect atom. Note that the manuscript [18] refers to the first conclusion as a support result and the second conclusion as a stability result. To invoke the theorem, it is necessary to choose the parameter carefully. Our examples will show how this can be accomplished in several specific cases. Using the coherence parameter, it is possible to state a somewhat simpler result. Corollary 15: Suppose that, and assume that lists no more than atoms. Suppose that is an input signal, and choose the error tolerance no smaller than Let solve ( - ) with tolerance. We may conclude that. 
Furthermore, we have Proof: The corollary follows directly from the theorem when we apply the coherence bounds from Appendix III. This corollary takes a most satisfying form under the assumption that. In that case, if we choose the tolerance then we may conclude that. B. Example: Identifying Sparse Signals in Noise (Redux) In Section IV-C, we showed that ( - ) can be used to solve the inverse problem of recovering sparse signals corrupted with arbitrary noise of bounded magnitude. This example will demonstrate that ( - ) can be applied to the same problem.
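As a preview of that demonstration, the following sketch applies the error-constrained relaxation to the bounded-noise model of Section IV-C, simply setting the tolerance equal to the known bound on the noise norm. It assumes CVXPY, as before, and reuses Phi, c_opt, N, d, and the support from the first sketch. The tolerance recommended by Corollary 15 involves additional terms that are not reproduced here, so this choice is a simplification.

```python
import cvxpy as cp
import numpy as np

delta = 0.25                                   # known bound on the l2 norm of the noise
rng = np.random.default_rng(4)
noise = rng.standard_normal(d); noise *= delta / np.linalg.norm(noise)
s_bounded = Phi @ c_opt + noise                # arbitrary noise of norm exactly delta

c = cp.Variable(N)
prob = cp.Problem(cp.Minimize(cp.norm1(c)),
                  [cp.norm(Phi @ c - s_bounded, 2) <= delta])
prob.solve()
recovered = np.flatnonzero(np.abs(c.value) > 1e-4)
print(sorted(recovered), sorted(support))      # supports should agree when ERC and delta cooperate
```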


More information

DS-GA 1002 Lecture notes 10 November 23, Linear models

DS-GA 1002 Lecture notes 10 November 23, Linear models DS-GA 2 Lecture notes November 23, 2 Linear functions Linear models A linear model encodes the assumption that two quantities are linearly related. Mathematically, this is characterized using linear functions.

More information

Linear Algebra. Min Yan

Linear Algebra. Min Yan Linear Algebra Min Yan January 2, 2018 2 Contents 1 Vector Space 7 1.1 Definition................................. 7 1.1.1 Axioms of Vector Space..................... 7 1.1.2 Consequence of Axiom......................

More information

ASIGNIFICANT research effort has been devoted to the. Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach

ASIGNIFICANT research effort has been devoted to the. Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 42, NO 6, JUNE 1997 771 Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach Xiangbo Feng, Kenneth A Loparo, Senior Member, IEEE,

More information

The value of a problem is not so much coming up with the answer as in the ideas and attempted ideas it forces on the would be solver I.N.

The value of a problem is not so much coming up with the answer as in the ideas and attempted ideas it forces on the would be solver I.N. Math 410 Homework Problems In the following pages you will find all of the homework problems for the semester. Homework should be written out neatly and stapled and turned in at the beginning of class

More information

A PRIMER ON SESQUILINEAR FORMS

A PRIMER ON SESQUILINEAR FORMS A PRIMER ON SESQUILINEAR FORMS BRIAN OSSERMAN This is an alternative presentation of most of the material from 8., 8.2, 8.3, 8.4, 8.5 and 8.8 of Artin s book. Any terminology (such as sesquilinear form

More information

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin

More information

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models

More information

Linear Programming Redux

Linear Programming Redux Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains

More information

Thresholds for the Recovery of Sparse Solutions via L1 Minimization

Thresholds for the Recovery of Sparse Solutions via L1 Minimization Thresholds for the Recovery of Sparse Solutions via L Minimization David L. Donoho Department of Statistics Stanford University 39 Serra Mall, Sequoia Hall Stanford, CA 9435-465 Email: donoho@stanford.edu

More information

A simple test to check the optimality of sparse signal approximations

A simple test to check the optimality of sparse signal approximations A simple test to check the optimality of sparse signal approximations Rémi Gribonval, Rosa Maria Figueras I Ventura, Pierre Vergheynst To cite this version: Rémi Gribonval, Rosa Maria Figueras I Ventura,

More information

Sparse Approximation and Variable Selection

Sparse Approximation and Variable Selection Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Fourier and Wavelet Signal Processing

Fourier and Wavelet Signal Processing Ecole Polytechnique Federale de Lausanne (EPFL) Audio-Visual Communications Laboratory (LCAV) Fourier and Wavelet Signal Processing Martin Vetterli Amina Chebira, Ali Hormati Spring 2011 2/25/2011 1 Outline

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence

More information

On Cryptographic Properties of the Cosets of R(1;m)

On Cryptographic Properties of the Cosets of R(1;m) 1494 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 4, MAY 2001 On Cryptographic Properties of the Cosets of R(1;m) Anne Canteaut, Claude Carlet, Pascale Charpin, and Caroline Fontaine Abstract

More information

Error Correction via Linear Programming

Error Correction via Linear Programming Error Correction via Linear Programming Emmanuel Candes and Terence Tao Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 Department of Mathematics, University of California, Los Angeles,

More information

Lecture Notes 9: Constrained Optimization

Lecture Notes 9: Constrained Optimization Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form

More information

Sparsity in Underdetermined Systems

Sparsity in Underdetermined Systems Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2

More information

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

Inverse problems and sparse models (1/2) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France

Inverse problems and sparse models (1/2) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France Inverse problems and sparse models (1/2) Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France remi.gribonval@inria.fr Structure of the tutorial Session 1: Introduction to inverse problems & sparse

More information

Inner product spaces. Layers of structure:

Inner product spaces. Layers of structure: Inner product spaces Layers of structure: vector space normed linear space inner product space The abstract definition of an inner product, which we will see very shortly, is simple (and by itself is pretty

More information

Chapter 2. Error Correcting Codes. 2.1 Basic Notions

Chapter 2. Error Correcting Codes. 2.1 Basic Notions Chapter 2 Error Correcting Codes The identification number schemes we discussed in the previous chapter give us the ability to determine if an error has been made in recording or transmitting information.

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2)

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2) IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY 2006 361 Uplink Downlink Duality Via Minimax Duality Wei Yu, Member, IEEE Abstract The sum capacity of a Gaussian vector broadcast channel

More information

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 5, SEPTEMBER 2001 1215 A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing Da-Zheng Feng, Zheng Bao, Xian-Da Zhang

More information

LINEAR ALGEBRA: THEORY. Version: August 12,

LINEAR ALGEBRA: THEORY. Version: August 12, LINEAR ALGEBRA: THEORY. Version: August 12, 2000 13 2 Basic concepts We will assume that the following concepts are known: Vector, column vector, row vector, transpose. Recall that x is a column vector,

More information

Coding the Matrix Index - Version 0

Coding the Matrix Index - Version 0 0 vector, [definition]; (2.4.1): 68 2D geometry, transformations in, [lab]; (4.15.0): 196-200 A T (matrix A transpose); (4.5.4): 157 absolute value, complex number; (1.4.1): 43 abstract/abstracting, over

More information

Algorithms for sparse analysis Lecture I: Background on sparse approximation

Algorithms for sparse analysis Lecture I: Background on sparse approximation Algorithms for sparse analysis Lecture I: Background on sparse approximation Anna C. Gilbert Department of Mathematics University of Michigan Tutorial on sparse approximations and algorithms Compress data

More information

GREEDY SIGNAL RECOVERY REVIEW

GREEDY SIGNAL RECOVERY REVIEW GREEDY SIGNAL RECOVERY REVIEW DEANNA NEEDELL, JOEL A. TROPP, ROMAN VERSHYNIN Abstract. The two major approaches to sparse recovery are L 1-minimization and greedy methods. Recently, Needell and Vershynin

More information

Linear algebra for MATH2601: Theory

Linear algebra for MATH2601: Theory Linear algebra for MATH2601: Theory László Erdős August 12, 2000 Contents 1 Introduction 4 1.1 List of crucial problems............................... 5 1.2 Importance of linear algebra............................

More information

Stability Analysis and Synthesis for Scalar Linear Systems With a Quantized Feedback

Stability Analysis and Synthesis for Scalar Linear Systems With a Quantized Feedback IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 48, NO 9, SEPTEMBER 2003 1569 Stability Analysis and Synthesis for Scalar Linear Systems With a Quantized Feedback Fabio Fagnani and Sandro Zampieri Abstract

More information

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova Least squares regularized or constrained by L0: relationship between their global minimizers Mila Nikolova CMLA, CNRS, ENS Cachan, Université Paris-Saclay, France nikolova@cmla.ens-cachan.fr SIAM Minisymposium

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

JUST RELAX: CONVEX PROGRAMMING METHODS FOR SUBSET SELECTION AND SPARSE APPROXIMATION

JUST RELAX: CONVEX PROGRAMMING METHODS FOR SUBSET SELECTION AND SPARSE APPROXIMATION JUST RELAX: CONVEX PROGRAMMING METHODS FOR SUBSET SELECTION AND SPARSE APPROXIMATION JOEL A TROPP Abstract Subset selection and sparse approximation problems request a good approximation of an input signal

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 Machine Learning for Signal Processing Sparse and Overcomplete Representations Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 1 Key Topics in this Lecture Basics Component-based representations

More information

Lecture 3: Error Correcting Codes

Lecture 3: Error Correcting Codes CS 880: Pseudorandomness and Derandomization 1/30/2013 Lecture 3: Error Correcting Codes Instructors: Holger Dell and Dieter van Melkebeek Scribe: Xi Wu In this lecture we review some background on error

More information

4184 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 12, DECEMBER Pranav Dayal, Member, IEEE, and Mahesh K. Varanasi, Senior Member, IEEE

4184 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 12, DECEMBER Pranav Dayal, Member, IEEE, and Mahesh K. Varanasi, Senior Member, IEEE 4184 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 51, NO. 12, DECEMBER 2005 An Algebraic Family of Complex Lattices for Fading Channels With Application to Space Time Codes Pranav Dayal, Member, IEEE,

More information

Interactive Interference Alignment

Interactive Interference Alignment Interactive Interference Alignment Quan Geng, Sreeram annan, and Pramod Viswanath Coordinated Science Laboratory and Dept. of ECE University of Illinois, Urbana-Champaign, IL 61801 Email: {geng5, kannan1,

More information

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May University of Luxembourg Master in Mathematics Student project Compressed sensing Author: Lucien May Supervisor: Prof. I. Nourdin Winter semester 2014 1 Introduction Let us consider an s-sparse vector

More information

Stability and the elastic net

Stability and the elastic net Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for

More information

IN this paper, we show that the scalar Gaussian multiple-access

IN this paper, we show that the scalar Gaussian multiple-access 768 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 5, MAY 2004 On the Duality of Gaussian Multiple-Access and Broadcast Channels Nihar Jindal, Student Member, IEEE, Sriram Vishwanath, and Andrea

More information

Spanning and Independence Properties of Finite Frames

Spanning and Independence Properties of Finite Frames Chapter 1 Spanning and Independence Properties of Finite Frames Peter G. Casazza and Darrin Speegle Abstract The fundamental notion of frame theory is redundancy. It is this property which makes frames

More information

Stability and Robustness of Weak Orthogonal Matching Pursuits

Stability and Robustness of Weak Orthogonal Matching Pursuits Stability and Robustness of Weak Orthogonal Matching Pursuits Simon Foucart, Drexel University Abstract A recent result establishing, under restricted isometry conditions, the success of sparse recovery

More information

Lax embeddings of the Hermitian Unital

Lax embeddings of the Hermitian Unital Lax embeddings of the Hermitian Unital V. Pepe and H. Van Maldeghem Abstract In this paper, we prove that every lax generalized Veronesean embedding of the Hermitian unital U of PG(2, L), L a quadratic

More information

Sparse analysis Lecture II: Hardness results for sparse approximation problems

Sparse analysis Lecture II: Hardness results for sparse approximation problems Sparse analysis Lecture II: Hardness results for sparse approximation problems Anna C. Gilbert Department of Mathematics University of Michigan Sparse Problems Exact. Given a vector x R d and a complete

More information

Sparse linear models

Sparse linear models Sparse linear models Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 2/22/2016 Introduction Linear transforms Frequency representation Short-time

More information

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the

More information

10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics 10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

More information

MA201: Further Mathematical Methods (Linear Algebra) 2002

MA201: Further Mathematical Methods (Linear Algebra) 2002 MA201: Further Mathematical Methods (Linear Algebra) 2002 General Information Teaching This course involves two types of teaching session that you should be attending: Lectures This is a half unit course

More information

The Skorokhod reflection problem for functions with discontinuities (contractive case)

The Skorokhod reflection problem for functions with discontinuities (contractive case) The Skorokhod reflection problem for functions with discontinuities (contractive case) TAKIS KONSTANTOPOULOS Univ. of Texas at Austin Revised March 1999 Abstract Basic properties of the Skorokhod reflection

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

Machine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015

Machine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015 Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and

More information

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations.

Linear Algebra. The analysis of many models in the social sciences reduces to the study of systems of equations. POLI 7 - Mathematical and Statistical Foundations Prof S Saiegh Fall Lecture Notes - Class 4 October 4, Linear Algebra The analysis of many models in the social sciences reduces to the study of systems

More information

Contents. 0.1 Notation... 3

Contents. 0.1 Notation... 3 Contents 0.1 Notation........................................ 3 1 A Short Course on Frame Theory 4 1.1 Examples of Signal Expansions............................ 4 1.2 Signal Expansions in Finite-Dimensional

More information

Linear Algebra, Summer 2011, pt. 2

Linear Algebra, Summer 2011, pt. 2 Linear Algebra, Summer 2, pt. 2 June 8, 2 Contents Inverses. 2 Vector Spaces. 3 2. Examples of vector spaces..................... 3 2.2 The column space......................... 6 2.3 The null space...........................

More information

Preliminaries and Complexity Theory

Preliminaries and Complexity Theory Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra

More information

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014

Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document

More information

chapter 12 MORE MATRIX ALGEBRA 12.1 Systems of Linear Equations GOALS

chapter 12 MORE MATRIX ALGEBRA 12.1 Systems of Linear Equations GOALS chapter MORE MATRIX ALGEBRA GOALS In Chapter we studied matrix operations and the algebra of sets and logic. We also made note of the strong resemblance of matrix algebra to elementary algebra. The reader

More information

Characterization of Convex and Concave Resource Allocation Problems in Interference Coupled Wireless Systems

Characterization of Convex and Concave Resource Allocation Problems in Interference Coupled Wireless Systems 2382 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 59, NO 5, MAY 2011 Characterization of Convex and Concave Resource Allocation Problems in Interference Coupled Wireless Systems Holger Boche, Fellow, IEEE,

More information

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education

MTH Linear Algebra. Study Guide. Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education MTH 3 Linear Algebra Study Guide Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education June 3, ii Contents Table of Contents iii Matrix Algebra. Real Life

More information

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that

This model of the conditional expectation is linear in the parameters. A more practical and relaxed attitude towards linear regression is to say that Linear Regression For (X, Y ) a pair of random variables with values in R p R we assume that E(Y X) = β 0 + with β R p+1. p X j β j = (1, X T )β j=1 This model of the conditional expectation is linear

More information

Output Input Stability and Minimum-Phase Nonlinear Systems

Output Input Stability and Minimum-Phase Nonlinear Systems 422 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 47, NO. 3, MARCH 2002 Output Input Stability and Minimum-Phase Nonlinear Systems Daniel Liberzon, Member, IEEE, A. Stephen Morse, Fellow, IEEE, and Eduardo

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

1 Regression with High Dimensional Data

1 Regression with High Dimensional Data 6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices

MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices MATH 320, WEEK 6: Linear Systems, Gaussian Elimination, Coefficient Matrices We will now switch gears and focus on a branch of mathematics known as linear algebra. There are a few notes worth making before

More information

CS168: The Modern Algorithmic Toolbox Lecture #10: Tensors, and Low-Rank Tensor Recovery

CS168: The Modern Algorithmic Toolbox Lecture #10: Tensors, and Low-Rank Tensor Recovery CS168: The Modern Algorithmic Toolbox Lecture #10: Tensors, and Low-Rank Tensor Recovery Tim Roughgarden & Gregory Valiant May 3, 2017 Last lecture discussed singular value decomposition (SVD), and we

More information

Typical Problem: Compute.

Typical Problem: Compute. Math 2040 Chapter 6 Orhtogonality and Least Squares 6.1 and some of 6.7: Inner Product, Length and Orthogonality. Definition: If x, y R n, then x y = x 1 y 1 +... + x n y n is the dot product of x and

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information