ON THE OPTIMALITY OF THE BACKWARD GREEDY ALGORITHM FOR THE SUBSET SELECTION PROBLEM

Christophe Couvreur and Yoram Bresler

C. Couvreur: General Physics Department and TCTS Laboratory, Faculte Polytechnique de Mons, Belgium. Dr. C. Couvreur is also a Research Assistant of the National Fund for Scientific Research of Belgium (F.N.R.S.). Y. Bresler: Coordinated Science Laboratory and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA.

Corresponding author: Prof. Yoram Bresler, Coordinated Science Laboratory, University of Illinois, 1308 W. Main St., Urbana, IL (USA). E-mail: ybresler@uiuc.edu. Phone: +1 (217). FAX: +1 (217).

Submitted to the SIAM Journal on Matrix Analysis and Applications. Revised: May 18, 1998.

Abstract

The following linear inverse problem is considered: given a full column rank $m \times n$ data matrix $A$ and a length-$m$ observation vector $b$, find the best least-squares solution to $Ax = b$ with at most $r < n$ nonzero components. The backward greedy algorithm computes a sparse solution to $Ax = b$ by greedily removing columns from $A$ until $r$ columns are left. A simple implementation based on a QR downdating scheme by Givens rotations is described. The backward greedy algorithm is shown to be optimal for this problem in the sense that it selects the "correct" subset of columns from $A$ if the perturbation of the data vector $b$ is small enough.

1 Introduction

The problem of computing a sparse approximate solution to a set of linear equations is important in matrix computations, where it is known as the "subset selection problem" [4]. Such sparse approximate solutions have applications in statistical regression [6], in function interpolation [10], and in signal processing [5, 2, 7]. In this paper, the following formulation of the subset selection problem is considered. Given a data matrix $A \in \mathbb{C}^{m \times n}$, $m \geq n$, and an observation vector $b \in \mathbb{C}^m$, find the best least-squares solution to $Ax = b$ with at most $r$ nonzero components. That is, find the subset of $r$ columns from $A$ that gives the best approximation of $b$ in the sense that $\|Ax - b\|_2$ is minimized over all vectors $x$ with at most $r$ nonzero components. Alternately, the problem can be viewed as estimating $x_0 \in \mathbb{C}^n$ from $b$ and the a priori knowledge that $x_0$ is sparse, where

    b = A x_0 + \eta = b_0 + \eta    (1)

and $\eta$ is an unknown noise vector. For instance, in [2], $b$ is a noisy observation of a linear mixture of vector signals. A dictionary of possible signals is available, but which subset of signals from the dictionary is present in the mixture is unknown. Their mixing proportions are also unknown. It is desired to find the signals that are effectively present in $b$ and their mixing proportions. This estimation and detection problem can be naturally formulated as a subset selection problem of the form (1) by letting the columns of $A$ and the elements of $x_0$ represent the possible signal vectors in

the dictionary and the associated mixing proportions (which are equal to zero for absent signals), respectively.

The best subset of columns from $A$ can be found by exhaustive evaluation of the least-squares residual for all possible subsets. It is possible to take advantage of the structure of the problem to reduce the size of the search space by means of branch-and-bound-type algorithms [8, 9]. However, when $n$ increases, algorithms based on exhaustive searches (even with branch-and-bound restrictions of the search space) rapidly become impractical. Therefore, several heuristics that do not require exhaustive searches have been proposed [6]. One such heuristic is the well-known "greedy" algorithm of Golub and Van Loan [4]. The greedy algorithm is a sequential forward selection scheme. The idea of the greedy algorithm is to start by finding the column of $A$ closest to $b$, and then to add columns one by one until $r$ columns have been selected, each time adding the column that gives the largest decrement of the least-squares residual. Some theoretical arguments have recently been given in favor of the greedy algorithm [10]. However, there are situations in which the greedy algorithm will fail even if there exists an exact sparse solution to $Ax = b$. Consider a simple example with $r = 2$ in which $b$ lies exactly in the span of the second and third columns of $A$, while the first column of $A$, taken alone, gives the best single-column approximation of $b$. Clearly, the greedy algorithm will start by erroneously selecting the first column of $A$ and then add the second column. In this case, the correct set of columns (the second and third) cannot be selected by the greedy algorithm. In general, it is always possible to find situations in which one or all of the heuristics that have been proposed so far will fail [6].

An alternative to the sequential forward selection used in the greedy algorithm is to use sequential backward selection, i.e., rather than adding columns to the solution one by one, it is possible to start with all columns present (i.e., with the complete matrix $A$) and remove one column at a time until $r$ columns are left. The column that is removed should be chosen to minimize the increment in the least-squares residual. We call this alternative approach the "backward greedy algorithm." To avoid confusion, the standard greedy algorithm will be referred to as the "forward

greedy algorithm" in the sequel. On the simple example above, it is obvious that the backward greedy algorithm will correctly remove the first column from $A$. In this case, the backward greedy algorithm correctly yields the optimal solution. More generally, as will be shown in Section 3, the backward greedy algorithm is guaranteed to yield the correct subset of nonzero components in $x$ if the perturbation $\eta$ in (1) is small enough.

In the following section, the backward greedy algorithm for subset selection is formally defined and a simple QR-based implementation is described. Section 3 presents our main result about the backward greedy algorithm in the form of a theorem proving that it always finds the correct subset of nonzero components of $x$ in the small perturbation case. Some implications of this theorem are then discussed. Numerical results illustrating the properties of the backward greedy algorithm are given in Section 4. The paper is concluded by some remarks on the choice of the number of columns $r$ and on the NP-hardness of the subset selection problem, and by a discussion of some possible extensions of our result.

2 The Backward Greedy Algorithm

Formally, the backward greedy algorithm can be defined as follows. Let $\Omega = \{1, \ldots, n\}$ denote the ordered set of column indices of $A$, and let $\Gamma = \{\gamma_1, \ldots, \gamma_{c(\Gamma)}\}$, $\Gamma \subseteq \Omega$, be an ordered subset of $\Omega$ of cardinality $c(\Gamma)$. The "colon" notation $A(:, \Gamma)$ is used to designate the matrix formed from the columns of $A$ whose indices are in $\Gamma$. Similarly, $x(\Gamma)$ designates the $\Gamma$-indexed elements of the vector $x$, and $x(i:j)$ designates the sub-vector with elements indexed by $i$ through $j$. Denote by

    \rho(\Gamma) = \min_{z \in \mathbb{C}^{c(\Gamma)}} \|A(:, \Gamma) z - b\|_2    (2)

the least-squares residual associated with the sparse LS solution of $Ax = b$ based on the $\Gamma$-indexed subset of columns of $A$. The backward greedy algorithm for subset selection is initialized by taking $\Gamma = \Omega$. Elements are then removed from $\Gamma$ one by one by repeating the iteration

    \Gamma \leftarrow \Gamma \setminus \{k^*\}, \qquad k^* = \arg\min_{k \in \Gamma} \rho(\Gamma \setminus \{k\})    (3)

until $c(\Gamma) = r$. The column $k^*$ that is removed at each iteration is chosen to minimize the increment in the least-squares residual. Once the last iteration has been performed, the sparse least-squares

solution associated with the column indices left in $\Gamma$ is computed. The subset of indices obtained at the last iteration of the backward greedy algorithm and the associated sparse least-squares solution will be denoted by $\hat{\Gamma}_s$ and $\hat{x}_s$, respectively.

The forward greedy algorithm is usually implemented by means of the QR algorithm for least-squares solution of linear systems. In essence, the forward greedy algorithm is a QR factorization of $A$ by the Gram-Schmidt procedure in which the column pivot is chosen greedily with respect to the right-hand side $b$ of the matrix equation $Ax = b$ (see [10, 4]). Similarly, the backward greedy algorithm can be efficiently implemented by combining the QR algorithm for least-squares problems with a column-deletion QR downdating step based on Givens rotations.

For the implementation of the backward greedy algorithm, it is necessary to evaluate the least-squares residuals associated with the matrices obtained by gradual removal of columns from $A$. Consider first the initial step of the backward greedy algorithm. Let us assume that the QR factorization of $A$ is available: $Q^H A = R$, where $Q^H$ denotes the Hermitian transpose of $Q$. The associated least-squares residual is given by $\rho(\Omega) = \|e(n+1:m)\|_2$, with $e = Q^H b$. Since $A(:, \Omega \setminus \{k\})$ is the matrix obtained by deleting the $k$-th column from $A(:, \Omega)$, its QR factorization $(Q^{(k)})^H A(:, \Omega \setminus \{k\}) = R^{(k)}$ can be computed by "downdating" $Q$ and $R$ for the column deletion by a sequence of Givens rotations. The downdating operation is [4, p. 595]

    G_{n-1}^H \cdots G_k^H R(:, \Omega \setminus \{k\}) = R^{(k)}, \qquad G_{n-1}^H \cdots G_k^H e = e^{(k)},

where $G_i$ is a rotation in planes $i$, $i+1$ for $i = k:n-1$, and $R(:, \Omega \setminus \{k\})$ is the upper Hessenberg matrix obtained by deleting the $k$-th column of $R$. The downdated least-squares residual is given by

    \rho(\Omega \setminus \{k\}) = \|e^{(k)}(n:m)\|_2.

Note that there is no need to explicitly compute the downdated QR factorization $Q^{(k)} R^{(k)}$. It is simply necessary to apply the sequence of Givens rotations to $e$ to find the downdated least-squares residual. Once the minimum downdated residual $\rho(\Omega \setminus \{k^*\})$ has been found, the $k^*$-th column is removed from $A$ and the process is repeated recursively until only $r$ columns are left. The sparse least-squares solution

$\hat{x}_s$ can then be computed by the usual QR approach. The backward greedy algorithm can thus be written in pseudo-code as follows.

Algorithm Backward Greedy
Input: matrix $A$, column vector $b$, integer $r < n$.
Output: index set $\hat{\Gamma}_s$, column vector $\hat{x}_s$.

Subset selection step:
  $i \leftarrow n$; $\Gamma \leftarrow \{1, \ldots, n\}$; compute $Q^H A = R$; $e \leftarrow Q^H b$;
  while $i > r$ do
    for $j = 1:i$ do
      compute the Givens rotations downdating $G_{i-1}^H \cdots G_j^H H^{(j)} = R^{(j)}$,
        where $H^{(j)}$ is $R$ with column $j$ deleted and $R^{(j)}$ is upper triangular;
      $e^{(j)} \leftarrow G_{i-1}^H \cdots G_j^H e$;
      $\rho(\Gamma \setminus \{\gamma_j\}) \leftarrow \|e^{(j)}(i:m)\|$, where $\gamma_j$ is the $j$-th element of $\Gamma$;
    end
    choose $1 \leq k \leq i$ such that the residual $\rho(\Gamma \setminus \{\gamma_k\})$ is minimum;
    $\Gamma \leftarrow \Gamma \setminus \{\gamma_k\}$; $R \leftarrow R^{(k)}$; $e \leftarrow e^{(k)}$; $i \leftarrow i - 1$;
  end.

Solution step:
  $\hat{\Gamma}_s \leftarrow \Gamma$; compute the solution $z$ of $R z = e(1:r)$ by back-substitution;
  $\hat{x}_s(\Gamma) \leftarrow z$; $\hat{x}_s(\Gamma^c) \leftarrow 0$.

In general, the global cost of the backward greedy algorithm will be dominated by that of

the original QR factorization and the QR downdating operations during the subset selection step. The original QR factorization of $A$ has a cost of $O(mn^2)$. Downdating a QR factorization (i.e., computing the Givens rotations and applying them to $e$) at step $i$ requires $O(i^2)$ operations (recall that $i$ runs from $n$ down to $r$) [4]. The complete $i$-th selection iteration will therefore require $O(i^3)$ operations for the $i$ QR downdating operations, plus $O(i(m-i))$ operations for the evaluation of the $i$ least-squares residuals. Once the last selection iteration has been performed, computing $\hat{x}_s$ still requires $O(r^2)$ operations. The implementation of the algorithm presented in this section is reasonably efficient, but it is not optimized. Its optimization would be specific to particular constraints (number of operations, storage requirements, numerical stability) and is therefore not attempted in this paper. See [1] and the references therein for details on the numerical implementation of QR downdating algorithms for least-squares solutions.

We conclude this section with a short discussion of the comparative computational costs of the backward greedy algorithm and the forward greedy algorithm. The backward greedy algorithm starts with $n$ columns and removes them one by one until only $r$ columns are left. The first step of the algorithm is the solution of the non-sparse least-squares problem $Ax = b$ by QR factorization. Recalling the analogy between the forward greedy algorithm and QR factorization with column pivoting, this first step of the backward greedy algorithm can be seen to be equivalent to the final step of the forward greedy algorithm. Obviously, if $r$ is small with respect to $n$, the computational cost of the backward greedy algorithm will far exceed that of the forward greedy algorithm. Because of the overhead of the forward greedy algorithm compared to a "regular" QR factorization, the cost of the backward algorithm might be similar to or even slightly lower than that of the forward algorithm if $r$ is close to $n$. Even if $r$ is not too close to $n$, it may still be interesting to use the backward algorithm rather than the forward algorithm because it possesses some nice optimality properties, which are the subject of the next section. In contrast, the forward greedy algorithm has no such properties, as demonstrated in Section 1.
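To make the selection rule (3) concrete, the following Python/NumPy sketch implements the backward greedy algorithm. For clarity it recomputes each candidate residual with a dense least-squares solve rather than the Givens-rotation QR downdating described above, so it is slower than the implementation analyzed in this section; the function names (backward_greedy, residual) are ours, and the numerical example at the end uses illustrative numbers of our own choosing (the paper's original example did not survive transcription), built so that the first column alone best approximates $b$ while $b$ lies exactly in the span of columns 2 and 3.

```python
import numpy as np

def residual(A, b, cols):
    """Least-squares residual rho(Gamma) = min_z ||A[:, cols] z - b||_2."""
    z, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
    return np.linalg.norm(A[:, cols] @ z - b)

def backward_greedy(A, b, r):
    """Start from all columns and repeatedly delete the column whose removal
    increases the least-squares residual the least, until r columns remain."""
    m, n = A.shape
    cols = list(range(n))
    while len(cols) > r:
        # Residual obtained by deleting each remaining column in turn.
        cand = [residual(A, b, cols[:j] + cols[j + 1:]) for j in range(len(cols))]
        del cols[int(np.argmin(cand))]
    # Sparse least-squares solution supported on the selected columns.
    z, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
    x = np.zeros(n, dtype=A.dtype)
    x[cols] = z
    return sorted(cols), x

# Illustrative example (our own numbers): b is exactly a2 + a3, yet a1 alone is
# the best single-column approximation, so forward selection would start wrong.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.1, 0.0, 0.0]])
b = A @ np.array([0.0, 1.0, 1.0])
print(backward_greedy(A, b, r=2))   # -> ([1, 2], array([0., 1., 1.]))
```

On this example the backward pass removes the first column at its first iteration and returns the exact 2-sparse solution, in line with the discussion in Section 1.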

3 Main Result

Consider the alternative interpretation of the subset selection problem (1). In this interpretation, there exists a sparse vector $x_0$ such that $A x_0 = b_0$, or $A(:, \Gamma_0) x_0(\Gamma_0) = b_0$ and $x_0(\Gamma_0^c) = 0$, where $\Gamma_0$ is the subset of indices of nonzero elements in $x_0$. The subset selection problem can be viewed as finding the $r$ components that are "active" in $x_0$ (i.e., finding $\Gamma_0$) and estimating their values from a noisy observation $b = b_0 + \eta$. That is, subset selection is a detection and estimation problem with a priori knowledge on the solution in the form of a sparsity constraint. The main result of this paper, presented in Theorem 1, is that, for any full rank matrix $A$ and any sparse vector $x_0$, there exists a bound on the perturbation that guarantees that the backward greedy algorithm will select the correct subset of components, i.e., $\hat{\Gamma}_s = \Gamma_0$. This means that the backward greedy algorithm is optimal for the subset selection problem at least for small "noise levels."

In order to prove this theorem, we will need the following two lemmas. As usual, we define the orthogonal distance $d(x, S)$ between a point $x$ and a subspace $S$ as the distance between this point and its orthogonal projection on the subspace. The first lemma states that the set of points located at equal distance from two subspaces (such as the ones defined by the range spaces of two possible subsets of columns of $A$) consists of the union of two subspaces. The proof of Lemma 1 is given in the Appendix.

Lemma 1. Let $S_1$ and $S_2$ be two subspaces of $\mathbb{C}^m$, $\dim(S_1) = \dim(S_2) = r$. The set of points located at equal distance from $S_1$ and $S_2$, called the bisector of $S_1$ and $S_2$ and denoted by $H$, consists of the union of two subspaces $H_1$ and $H_2$ of $\mathbb{C}^m$ of dimensions $\dim(H_1) = \dim(H_2) = m - r + \dim(S_1 \cap S_2)$.

The second lemma considers one iteration of the backward greedy algorithm, i.e., the removal of one column from the current subset of columns $\Gamma$. It states that there exists a non-trivial bound on the norm of the perturbation $\|b - b_0\|$ such that, for any perturbation smaller than the bound, one iteration of the backward greedy algorithm is guaranteed to remove a column that is not part of the correct subset of columns $\Gamma_0$, provided that the current subset of columns $\Gamma$ contains the correct subset $\Gamma_0$. Let us first define the orthogonal distance $d(x, H)$ between a point $x$ and a bisector $H$ as

$d(x, H) = \min\{d(x, H_1), d(x, H_2)\}$. That is, $d(x, H)$ is the distance between the point and the closest of its projections on the two subspaces that make up $H$.

Lemma 2. Let

    \hat{\Gamma} \in \arg\min_{\Lambda \subset \Gamma,\; c(\Lambda) = c(\Gamma) - 1} \rho(\Lambda),    (4)

where $\rho(\cdot)$ is defined in (2) and $\Gamma \subseteq \Omega$. Denote by $\Gamma_0$ the subset of indices associated with the sparse solution $x_0$ in (1). If $\Gamma \supseteq \Gamma_0$, then there exists $\delta > 0$ such that $\|b - b_0\| < \delta$, $\|b_0\| > 0$, implies $\hat{\Gamma} \supseteq \Gamma_0$. The value of $\delta$ is given by

    \delta = \min_{\Lambda \subset \Gamma,\; \Lambda \not\supseteq \Gamma_0,\; c(\Lambda) = c(\Gamma) - 1} \;\; \min_{\Lambda' \subset \Gamma,\; \Lambda' \supseteq \Gamma_0,\; c(\Lambda') = c(\Gamma) - 1} d(b_0, H(\Lambda, \Lambda')),    (5)

where $H(\Lambda, \Lambda')$ denotes the bisector of $\mathcal{R}(A(:, \Lambda))$ and $\mathcal{R}(A(:, \Lambda'))$.

Proof. Let $H(\Lambda, \Lambda')$, $\Lambda \subseteq \{1, \ldots, n\}$, $\Lambda' \subseteq \{1, \ldots, n\}$, $c(\Lambda) = c(\Lambda')$, denote the bisector of $\mathcal{R}(A(:, \Lambda))$ and $\mathcal{R}(A(:, \Lambda'))$, that is, the union of the two hyperplanes of points located at equal distance from $\mathcal{R}(A(:, \Lambda))$ and $\mathcal{R}(A(:, \Lambda'))$ as given in Lemma 1. Let $\delta$ be the radius of the largest ball $V_\delta(b_0)$ centered at $b_0 \in \mathcal{R}(A(:, \Gamma_0))$ that does not intersect any of the bisectors $H(\Lambda, \Lambda')$, $\Lambda \not\supseteq \Gamma_0$, $\Lambda \subset \Gamma$, $\Lambda' \supseteq \Gamma_0$, $\Lambda' \subset \Gamma$. That is, $\delta$ is given by (5). Clearly, all the points in $V_\delta(b_0)$ are closer to at least one of the $\mathcal{R}(A(:, \Lambda'))$ for some $\Lambda' \supseteq \Gamma_0$ than they are to any of the $\mathcal{R}(A(:, \Lambda))$ for all $\Lambda \not\supseteq \Gamma_0$. Thus, if $\|b - b_0\| < \delta$, we are guaranteed to select $\hat{\Gamma}$ among the $\Lambda'$'s, which means $\hat{\Gamma} \supseteq \Gamma_0$, as desired. All that remains to show now is that $\delta > 0$. This comes as a consequence of the linear independence of the columns of $A$. Suppose $\delta = 0$. Then there exist $\Lambda \not\supseteq \Gamma_0$ and $\Lambda' \supseteq \Gamma_0$ such that $d(b_0, H(\Lambda, \Lambda')) = 0$, i.e., $b_0 \in H(\Lambda, \Lambda')$. Because, by the definition of $H(\Lambda, \Lambda')$, $d[b_0, \mathcal{R}(A(:, \Lambda))] = d[b_0, \mathcal{R}(A(:, \Lambda'))]$ and, by assumption, $b_0 \in \mathcal{R}(A(:, \Lambda'))$, this implies $b_0 \in \mathcal{R}(A(:, \Lambda)) \cap \mathcal{R}(A(:, \Lambda'))$. Given that $\|b_0\| > 0$, this would contradict the hypothesis that $A$ is full rank, because $\Lambda \not\supseteq \Gamma_0$ while $\Lambda' \supseteq \Gamma_0$.

Equipped with Lemma 2, we can now state our main theorem.

Theorem 1. For any full column rank matrix $A$ and for any sparse vector $x_0$ with $r$ nonzero components (or, alternately, for any $b_0 = A x_0$), there exists $\delta > 0$ such that $\|b - b_0\| < \delta$ guarantees that the backward greedy algorithm for solving $Ax = b$ will select the correct subset of components

$\Gamma_0$. The value of $\delta$ is given by

    \delta = \min_{r \leq k < n} \;\; \min_{\Lambda \not\supseteq \Gamma_0,\; c(\Lambda) = k} \;\; \min_{\Lambda' \supseteq \Gamma_0,\; c(\Lambda') = k} d(b_0, H(\Lambda, \Lambda')).    (6)

Furthermore, the corresponding sparse least-squares solution $\hat{x}_s$ satisfies $\|\hat{x}_s - x_0\| \leq \delta / \sigma_{\min}[A(:, \Gamma_0)]$, where $\sigma_{\min}[A]$ denotes the smallest singular value of $A$.

Proof. The first part of the theorem can be proven by an inductive argument based on Lemma 2. Let $\Gamma^{(k)}$ denote the subset of columns of $A$ selected at step $k$ of the backward greedy algorithm, $k = n, n-1, \ldots, r$. The base for the induction is the observation that $\Gamma^{(n)} \supseteq \Gamma_0$, since $\Gamma^{(n)} = \Omega$ by definition of the backward greedy algorithm. The induction step consists in showing that if $\Gamma^{(k+1)} \supseteq \Gamma_0$, then $\Gamma^{(k)} \supseteq \Gamma_0$, provided that $\|b - b_0\| < \delta^{(k)}$ for some $\delta^{(k)} > 0$, for $k = n-1, n-2, \ldots, r$. The induction step follows directly from Lemma 2. The bound $\delta^{(k)}$ is given by (5) with $\Gamma = \Gamma^{(k+1)}$. It follows that the backward greedy algorithm will select the correct subset of columns of $A$, i.e., $\hat{\Gamma}_s = \Gamma_0$, if $\|b - b_0\| < \delta$ with $\delta = \min_{r \leq k < n} \delta^{(k)}$, the last relation yielding the bound (6). The last part of the theorem follows directly from

    \|\hat{x}_s - x_0\| = \|A(:, \Gamma_0)^\dagger (\hat{b} - b_0)\| \leq \|A(:, \Gamma_0)^\dagger\|\, \|\hat{b} - b_0\| \leq \delta / \sigma_{\min}[A(:, \Gamma_0)],

where $\hat{b} = A(:, \Gamma_0)\, \hat{x}_s(\Gamma_0)$ and $A^\dagger$ denotes the pseudo-inverse of $A$.

Our specific motivation for studying the backward greedy algorithm for subset selection arises from an estimation and detection problem in statistical signal processing [3, 2]. In this application, the vector $b$ is a random vector converging with probability one to $b_0$. We are interested in the properties of the estimates of $\Gamma_0$ and $x_0$ obtained by the backward greedy algorithm. Note that for the application described in [3, 2], finding $\Gamma_0$ is at least as important as finding $x_0$. Using Theorem 1, it is easy to establish that the estimates $\hat{\Gamma}_s$ and $\hat{x}_s$ obtained by the backward greedy algorithm are strongly consistent. That is, they converge to the true values $\Gamma_0$ and $x_0$ with probability one. To see this, note that the backward greedy algorithm is guaranteed to pick the correct subset of components if the perturbation is small enough, and recall that $\|\hat{x}_s - x_0\| \leq \|b - b_0\| / \sigma_{\min}[A(:, \Gamma_0)]$. We thus directly get the following optimal detection corollary.

Corollary 1. If $b$ is a strongly consistent estimator of $b_0$, i.e., if $b \to b_0$ w.p.1, then $\hat{\Gamma}_s \to \Gamma_0$ and $\hat{x}_s \to x_0$ w.p.1.
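As a quick numerical illustration of the error bound in Theorem 1 and of the consistency argument above (a sketch that reuses the backward_greedy function given after Section 2; the random data, seed, and perturbation size are our own choices), one can check that for a small perturbation the recovered support equals $\Gamma_0$ and the estimation error stays below $\|b - b_0\| / \sigma_{\min}[A(:, \Gamma_0)]$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 7))
x0 = np.zeros(7)
x0[:3] = 1.0                              # Gamma_0 = first three columns
b0 = A @ x0

eta = rng.standard_normal(10)
eta *= 1e-3 / np.linalg.norm(eta)         # small perturbation, ||eta|| = 1e-3
b = b0 + eta

gamma_hat, x_hat = backward_greedy(A, b, r=3)
sigma_min = np.linalg.svd(A[:, :3], compute_uv=False)[-1]

print(gamma_hat == [0, 1, 2])             # correct support recovered (0-based indices)
print(np.linalg.norm(x_hat - x0) <= np.linalg.norm(b - b0) / sigma_min)
```

For perturbations this small relative to $\|b_0\|$, both checks are expected to print True; as $\|\eta\|$ grows past the bound $\delta$ of (6), the first check starts to fail, which is exactly the behavior explored in Section 4.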

So far it has been assumed that a "true" sparse solution $x_0$ existed and that the subset selection problem consisted in finding this sparse solution. This "detection and estimation" view led to Theorem 1 and its first corollary. There are, however, situations in which this formulation is not adequate: there is no underlying "true" solution, and the subset selection problem is simply that of finding the subset of $r$ columns from $A$ such that the least-squares residual is minimized. In this case, let $\Gamma_s$ and $x_s$ be defined as the optimal sparse solution. That is, let

    \Gamma_s = \arg\min_{\Gamma,\; c(\Gamma) = r} \rho(\Gamma),    (7)

    x_s = \arg\min_{x \in \mathbb{C}^n,\; x(\Gamma_s^c) = 0} \|Ax - b\|.    (8)

In practice, $\Gamma_s$ and $x_s$ can always be found by exhaustive search. Recall that $\hat{\Gamma}_s$ and $\hat{x}_s$ are the solutions provided by the backward greedy algorithm. The following corollary of Theorem 1 implies that, provided the residual $\rho(\hat{\Gamma}_s)$ is small enough, the backward greedy algorithm will give the same solution as an exhaustive search.

Corollary 2. The backward greedy algorithm is optimal, i.e., $\hat{\Gamma}_s = \Gamma_s$ and $\hat{x}_s = x_s$, provided that $\rho(\hat{\Gamma}_s)$ is small enough; that is, provided that $\rho(\hat{\Gamma}_s) < \delta$, where $\delta$ is defined by (6) in which $b_0 = A \hat{x}_s$ and $\Gamma_0 = \hat{\Gamma}_s$.

Proof. By hypothesis, $\|b - A \hat{x}_s\| < \delta$. From the definition of $\delta$, this implies that $b$ is closer to $\mathcal{R}(A(:, \hat{\Gamma}_s))$ than to $\mathcal{R}(A(:, \Gamma))$ for any subset $\Gamma$ with $r$ or fewer components, since $b$ is on the "$\hat{\Gamma}_s$-side" of all the bisectors $H(\hat{\Gamma}_s, \Gamma)$. Hence, $d(b, \mathcal{R}(A(:, \hat{\Gamma}_s))) < d(b, \mathcal{R}(A(:, \Gamma)))$, or $\rho(\hat{\Gamma}_s) < \rho(\Gamma)$, which implies $\hat{\Gamma}_s = \Gamma_s$ and $\hat{x}_s = x_s$.

Using Corollary 2, a procedure for checking whether the output $\hat{\Gamma}_s$ of the backward greedy algorithm is the optimal sparse solution $\Gamma_s$ can be suggested: simply compare $\rho(\hat{\Gamma}_s)$ to the bound $\delta$ given by (6). If $\rho(\hat{\Gamma}_s) < \delta$, then $\hat{\Gamma}_s = \Gamma_s$. Unfortunately, computing $\delta$ from (6) with $\hat{\Gamma}_s$, $A$, and $b$ is not practical, so this procedure is of limited usefulness in general, and the implication of Corollary 2 is more of a qualitative nature: the backward greedy algorithm is optimal for the subset selection problem when its solution corresponds to a small enough residual. That is, if $\rho(\Gamma_s)$ is small enough, then $\Gamma_s$ is guaranteed to be found by the algorithm. For larger residuals, the backward

greedy algorithm is sub-optimal; it may or may not find the correct solution, depending on the values of $A$ and $b$.

4 Numerical Results

Figure 1 illustrates Theorem 1 and Corollary 2. A $10 \times 7$ matrix $A$ with real random i.i.d. Gaussian coefficients was generated,

    A = \begin{bmatrix}
     -0.54 & -0.10 &  0.44 & -0.39 & -1.71 & -1.16 & -0.06 \\
     -1.77 &  0.80 & -1.38 &  0.73 &  0.29 &  0.59 & -0.25 \\
      0.08 & -0.60 &  0.78 &  0.78 & -1.12 &  0.64 &  1.20 \\
      0.28 &  2.25 &  1.64 &  0.65 &  1.33 &  1.73 & -1.91 \\
     -1.20 & -0.15 & -0.70 &  1.41 & -0.82 &  0.59 &  0.43 \\
      1.25 &  0.12 &  0.82 &  0.79 &  0.57 & -0.79 &  0.90 \\
      1.22 &  1.19 &  1.24 &  1.21 &  0.18 &  0.36 &  0.85 \\
     -0.03 &  0.23 & -1.61 & -1.33 & -0.91 & -0.98 &  0.71 \\
      0.48 & -0.09 & -1.51 &  0.04 & -2.26 &  0.04 & -0.06 \\
      0.14 & -0.88 &  0.62 & -0.23 & -0.05 &  0.03 &  0.
    \end{bmatrix}.

The first three columns of the matrix were linearly combined to yield the vector $b_0$, i.e., $b_0 = A x_0$ with $x_0 = [1, 1, 1, 0, 0, 0, 0]^H$. One hundred random perturbation vectors $\eta$ were generated and added to $b_0$ to yield $b = b_0 + \eta$. The perturbation vectors were uniformly distributed on a hypersphere of given radius, i.e., the norm $\|\eta\|$ of the perturbation vectors was fixed. For each perturbation vector, the sparse least-squares solution to $Ax = b$ with $r = 3$ nonzero elements was computed by the backward greedy algorithm and by exhaustive search. The experiment was repeated for several values of $\|\eta\|$. Figure 1 gives the number of times the backward greedy algorithm found the true subset of nonzero components $\Gamma_0 = \{1, 2, 3\}$ as a function of the ratio $\|\eta\|/\|b_0\|$ (i.e., of the inverse of the "signal-to-noise" ratio). Figure 1 also gives the number of times the sparse solution obtained by exhaustive search ($\Gamma_s$) was equal to $\Gamma_0$ or to $\hat{\Gamma}_s$ (recall that for large perturbations, $\Gamma_s$ does not necessarily need to be equal to $\Gamma_0$). According to Theorem 1 and Corollary 2, we should expect $\hat{\Gamma}_s = \Gamma_0 = \Gamma_s$ when $\|\eta\| < \delta$. This can indeed be verified in Figure 1, where the value of $\delta$ corresponding to $A$ and $b_0$ is marked by a circle. The value of $\delta$ was computed

from (6). The distance between $b_0$ and the bisectors of the subspaces corresponding to subsets of columns of $A$ in the rightmost part of (6) can be computed by the method suggested at the end of the Appendix.

The value of $\delta$ depends on the matrix $A$ and the "exact" vector $b_0$. Two general observations can be made about $\delta$. First, it is easily verified that $\delta$ is proportional to $\|b_0\|$. Second, recall that $\delta$ can be viewed as the radius of the largest hypersphere centered on the sparse vector $b_0$ that can be fitted in the $n$-dimensional cone containing $b_0$ whose facets are the bisectors of pairs of subspaces spanned by subsets of columns of $A$ in (6). Intuitively, when the principal angles between these subspaces are small, the principal angles between the bisectors will also be small, the $n$-dimensional cone will be "narrower," and $\delta$ will be smaller. The principal angles between subspaces spanned by subsets of columns of $A$ will be small when these columns are nearly linearly dependent, i.e., when the condition number of the matrix $A$ is large. One would thus expect the ratio $\delta/\|b_0\|$ to vary inversely with the condition number of the matrix $A$. In order to verify this hypothesis, two hundred $10 \times 7$ matrices with random coefficients were generated. By manipulating the singular values of the matrices, the condition number $\kappa(A)$ was set to 5 for the first 100 matrices and to 50 for the next 100 matrices. For each matrix, the vector $b_0$ was again computed from the first three columns of $A$ as $b_0 = A [1, 1, 1, 0, 0, 0, 0]^H$, and the bound $\delta$ was evaluated from (6). The histograms of Figure 2 give the distributions of $\delta$ obtained for $\kappa(A) = 5$ and $\kappa(A) = 50$. It can be verified that $\delta/\|b_0\|$ is generally higher when the condition number of the matrix is small.

Given $A$ and $b_0$, a method for computing an estimate of the bound $\delta$ that does not require the exhaustive evaluation of the distances between $b_0$ and the bisectors in (6) can be suggested. This method can be used when the matrix $A$ has a large number of columns. Using the same approach as in the first experiment, random perturbation vectors $\eta$ can be added to $b_0$, and the backward greedy algorithm can be used to compute $\hat{\Gamma}_s$ for increasing values of $\|\eta\|$. When a perturbation vector is found such that $\hat{\Gamma}_s \neq \Gamma_0$, Theorem 1 implies that $\delta \leq \|\eta\|$. If a large number of random vectors are generated for each value of $\|\eta\|$, this Monte Carlo-type method can yield tight approximations of $\delta$. Since it is based on the backward greedy algorithm, the method can yield an estimate of $\delta$ to any desired confidence level in polynomial time.
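The Monte Carlo procedure just described can be sketched as follows (again reusing the backward_greedy function from Section 2; the function name estimate_delta, the grid of perturbation norms, and the number of trials per norm are our own choices). The returned value is the smallest tested norm $\|\eta\|$ at which some random perturbation makes the algorithm miss $\Gamma_0$, which by Theorem 1 is an upper estimate of $\delta$.

```python
import numpy as np

def estimate_delta(A, b0, gamma0, r, norms, trials=100, seed=0):
    """Monte Carlo estimate of the bound delta of Theorem 1: return the
    smallest tested ||eta|| at which backward greedy misses gamma0."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    for nrm in sorted(norms):
        for _ in range(trials):
            eta = rng.standard_normal(m)
            eta *= nrm / np.linalg.norm(eta)       # random direction, fixed norm
            gamma_hat, _ = backward_greedy(A, b0 + eta, r)
            if gamma_hat != sorted(gamma0):
                return nrm                          # a failure implies delta <= nrm
    return None                                     # no failure on the tested grid

# Example usage (data of our own choosing):
A = np.random.default_rng(1).standard_normal((10, 7))
b0 = A[:, :3].sum(axis=1)                           # Gamma_0 = first three columns
grid = np.linalg.norm(b0) * np.logspace(-3, 0, 20)  # ||eta|| from 1e-3*||b0|| to ||b0||
print(estimate_delta(A, b0, gamma0=[0, 1, 2], r=3, norms=grid))
```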

5 Remarks

5.1 Determination of the Number of Components r

In the variant of the subset selection problem considered here, the number of desired nonzero components $r$ was supposed known a priori. It is possible to consider other formulations of the problem. For example, in [10], Natarajan states the following version of the subset selection problem ("Natarajan's problem" in the sequel): given $A$, $b$, and $\varepsilon > 0$, find the vector $x$ satisfying $\|Ax - b\|_2 \leq \varepsilon$, if such a vector exists, such that $x$ has the fewest nonzero entries of all such vectors. In the version considered in [2], neither $\varepsilon$ nor $r$ is known a priori. Instead, the subset selection problem is defined as finding the sparse vector $x$ that realizes the best trade-off between the match to the observation vector $b$ and the number of nonzero components. This trade-off is measured by a figure of merit of the form

    \min_x \{\|Ax - b\|_2 + f(x)\},    (9)

where $f(x)$ is a complexity penalty monotonically increasing with the number of nonzero components in $x$. Note that because $\kappa(A(:, \Gamma_0)) < \kappa(A)$, a sparsity constraint reduces the sensitivity of the solution of the linear inverse problem $Ax = b$ to perturbations of the observation vector $b_0$, at least for small enough perturbations ($\|\eta\| < \delta$). For this reason, solving a linear inverse problem with a sparsity constraint as in (9) is sometimes known as "regularization by sparsity" [5]. The backward greedy algorithm presented in Section 2 can be readily applied to the above problems by solving for successively decreasing $r$ and looking for the sparsest solution satisfying $\|Ax - b\| \leq \varepsilon$ in the case of Natarajan's problem, or for the minimizer of the figure of merit (9) in the case of regularization by sparsity; a sketch of this procedure is given after Corollary 3 below.

5.2 NP-Hardness of Subset Selection

In [10], Natarajan considered the variant of the subset selection problem described in Section 5.1. He showed that finding the sparsest solution to $\|Ax - b\| \leq \varepsilon$ for a given $\varepsilon$, if such a solution exists, is NP-hard. This may seem to be in contradiction with our result that the backward greedy algorithm can solve the subset selection problem correctly in polynomial time in the small residual case. Indeed, for Natarajan's formulation of the subset selection problem, we have the following

corollary to our main result.

Corollary 3. Let $\rho_1$, $\Gamma_0(\varepsilon)$, and $r(\varepsilon)$ be the residual, the component indices, and their number, respectively, in the solution to the subset selection problem (Natarajan's problem) for given $A$ of full column rank, $b$, and $\varepsilon$, if the solution exists. Suppose $\rho_1 < \delta$, where $\delta$ is given by (6) with $r = r(\varepsilon)$ and $\Gamma_0 = \Gamma_0(\varepsilon)$. Then, the backward greedy algorithm will provide the optimal solution $\hat{\Gamma}_s = \Gamma_0(\varepsilon)$ if stopped at the smallest $r$ satisfying $\rho(\hat{\Gamma}_s) \leq \varepsilon$.

Proof. The proof follows directly from the formulation of the subset selection problem by Natarajan and from Corollary 2.

Corollary 3 states that if the threshold $\varepsilon$ is small enough, the solution to Natarajan's problem will be found in polynomial time by the backward greedy algorithm. The contradiction between Natarajan's result on the NP-hardness of the subset selection problem and the last statement is only apparent. Indeed, assuming that the backward greedy algorithm yields a solution $\hat{\Gamma}_s$ complying with $\|Ax - b\| \leq \varepsilon$ for a given $\varepsilon$, it would still be necessary to compute the corresponding bound $\delta$ from (6) to verify that $\varepsilon < \delta$, which is not a polynomial-time operation. Thus, even if the backward greedy algorithm yields a solution in polynomial time that can be expected to be the optimal solution if the residual is small enough, the verification that the residual is indeed "small enough" cannot be performed in polynomial time.

5.3 Extensions of the Results

The results presented in the previous sections can be extended to variants of the subset selection problem based on metrics other than the Euclidean distance. In general, the subset selection problem can be stated as finding the subset of $r$ columns from $A$ that gives the best approximation of $b$ in the sense that $d(Ax, b)$ is minimized over all vectors $x$ with at most $r$ nonzero components, for a given distance measure $d(\cdot, \cdot)$ defined on $\mathbb{C}^m$. Let the distance between a point $x$ and a set of points $S$ be defined as $d(x, S) = \min_{y \in S} d(x, y)$, and let $H(\Lambda, \Lambda') = \{x : d[x, \mathcal{R}(A(:, \Lambda))] = d[x, \mathcal{R}(A(:, \Lambda'))]\}$ denote the bisector of $\mathcal{R}(A(:, \Lambda))$ and $\mathcal{R}(A(:, \Lambda'))$. It is then easily verified that Lemma 2 and the first part of Theorem 1 remain valid mutatis mutandis.
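The stopping rule of Corollary 3, and the sweep over decreasing $r$ described in Section 5.1, can be sketched as follows (reusing the backward_greedy function from Section 2; the function name natarajan_backward is ours). Because the supports produced by backward elimination are nested, the residual can only grow as $r$ decreases, so the loop may stop at the first value of $r$ for which the threshold is violated.

```python
import numpy as np

def natarajan_backward(A, b, eps):
    """Sparsest backward-greedy solution with ||A x - b|| <= eps
    (the stopping rule of Corollary 3)."""
    n = A.shape[1]
    best = None
    for r in range(n, 0, -1):                 # successively decreasing r
        cols, x = backward_greedy(A, b, r)
        if np.linalg.norm(A @ x - b) <= eps:
            best = (cols, x)                  # still within the threshold: keep going
        else:
            break                             # residual exceeds eps: stop
    return best                               # None if even r = n is infeasible
```

The regularization-by-sparsity variant (9) is handled by the same sweep, keeping instead the value of $r$ that minimizes $\|Ax - b\|_2 + f(x)$; note also that, in practice, a single elimination pass down to $r = 1$ would provide all the nested supports at once instead of rerunning the algorithm for each $r$.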

Appendix. Proof of Lemma 1

Proof. Let $P_1$ and $P_2$ denote the projection matrices associated with $S_1$ and $S_2$, respectively. The bisector $H$ of $S_1$ and $S_2$ is defined by

    H = \{x : \|x - P_1 x\| = \|x - P_2 x\|\}.

Let $W = S_1 \cap S_2$. If $W \neq \{0\}$, it is always possible to write $S_1 = W \oplus \tilde{S}_1$ and $S_2 = W \oplus \tilde{S}_2$ with $\tilde{S}_1 \cap \tilde{S}_2 = \{0\}$ and $s = \dim(\tilde{S}_i) = \dim(S_i) - \dim(S_1 \cap S_2)$, $i = 1, 2$. Let $P_W$, $\tilde{P}_1$, and $\tilde{P}_2$ be the projection matrices associated with $W$, $\tilde{S}_1$, and $\tilde{S}_2$, respectively. Applying Pythagoras' theorem to the projections of $x$ on the subspaces $S_i$, $\tilde{S}_i$, and $W$, we have

    \|x - \tilde{P}_i x\|^2 = \|x - P_i x\|^2 + \|P_i x - \tilde{P}_i x\|^2 = \|x - P_i x\|^2 + \|P_W x\|^2

for $i = 1, 2$. Hence, the bisector of $S_1$ and $S_2$ is also the bisector of $\tilde{S}_1$ and $\tilde{S}_2$, that is,

    H = \{x : \|x - \tilde{P}_1 x\| = \|x - \tilde{P}_2 x\|\}.

By the idempotence and Hermitian symmetry properties of projection matrices, $\|x - \tilde{P}_1 x\|^2 = \|x - \tilde{P}_2 x\|^2$ is equivalent to

    x^H (\tilde{P}_1 - \tilde{P}_2) x = 0.

Let $U_i = (u_1^i, \ldots, u_s^i)$ be an $m \times s$ matrix whose columns form an orthonormal basis for $\tilde{S}_i$. We have

    0 = x^H (\tilde{P}_1 - \tilde{P}_2) x = x^H (U_1 U_1^H - U_2 U_2^H) x = x^H (U_1 - U_2)(U_1 + U_2)^H x, \qquad \forall x \in H.

Thus, the bisector $H$ consists of the union of the two subspaces defined by $(U_1 + U_2)^H x = 0$ and $(U_1 - U_2)^H x = 0$. In other terms,

    H = H_1 \cup H_2, \qquad \text{where } H_1 = \mathcal{N}((U_1 + U_2)^H), \quad H_2 = \mathcal{N}((U_1 - U_2)^H),

and where $\mathcal{N}(A)$ denotes the nullspace of $A$. It also follows directly that

    \dim[H_1] = \dim[H_2] = m - s = m - r + \dim(S_1 \cap S_2),

which concludes the proof.

The derivation above suggests a method for computing the distance $d(x, H)$ between a vector $x$ and a bisector $H$. This distance can be easily computed from the orthonormal bases $U_1$ and $U_2$ by recalling that the range $\mathcal{R}(A)$ is the orthogonal complement of $\mathcal{N}(A^H)$. It follows that $d(x, H_1) = \|P_{\mathcal{R}(U_1 + U_2)}\, x\|$, where $P_{\mathcal{R}(U_1 + U_2)}$ is the orthogonal projection onto $\mathcal{R}(U_1 + U_2)$, which can be easily computed, e.g., by a QR factorization of $U_1 + U_2$. Likewise, $d(x, H_2) = \|P_{\mathcal{R}(U_1 - U_2)}\, x\|$ can be easily computed. The orthonormal bases $U_1$ and $U_2$ can be obtained from the principal vectors between the subspaces $S_1$ and $S_2$ [4]. The latter can be obtained from any pair of orthonormal bases for $S_1$ and $S_2$ via an SVD.
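The distance computation just described admits a direct NumPy transcription (a sketch; it assumes $\tilde{S}_1$ and $\tilde{S}_2$ are already given through orthonormal bases U1 and U2 with $\tilde{S}_1 \cap \tilde{S}_2 = \{0\}$, and it does not show the extraction of U1 and U2 from general bases of $S_1$ and $S_2$ via principal vectors; the function name bisector_distance is ours):

```python
import numpy as np

def bisector_distance(x, U1, U2):
    """d(x, H) for the bisector H = H1 u H2 of span(U1) and span(U2), where
    U1 and U2 are m x s matrices with orthonormal columns.  Since
    H1 = null((U1 + U2)^H), d(x, H1) = ||P_range(U1 + U2) x||, and similarly
    for H2 with U1 - U2; d(x, H) is the smaller of the two."""
    dists = []
    for M in (U1 + U2, U1 - U2):
        # Orthonormal basis of range(M); the SVD guards against rank deficiency.
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        Q = U[:, s > 1e-12 * s.max()]
        dists.append(np.linalg.norm(Q.conj().T @ x))   # ||Q Q^H x|| = ||Q^H x||
    return min(dists)
```

Evaluating the bound $\delta$ of (6) then amounts to looping this distance over all admissible pairs of column subsets, which is practical only for modest $n$; for larger problems the Monte Carlo estimate sketched at the end of Section 4 is the more workable route.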

References

[1] A. Bjorck, H. Park, and L. Elden, Accurate downdating of least squares solutions, SIAM Journal on Matrix Analysis and Applications, 15 (1994), pp. 549-568.

[2] C. Couvreur and Y. Bresler, Dictionary-based decomposition of linear mixtures of Gaussian processes, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Atlanta, GA, May 1996, pp. 2519-2522.

[3] C. Couvreur and Y. Bresler, Optimal decomposition and classification of linear mixtures of ARMA processes, submitted. Available at ftp://thor.fpms.ac.be/pub/couvreur/biblio/sarma.ps.gz.

[4] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, second ed., 1989.

[5] G. Harikumar and Y. Bresler, A new algorithm for computing sparse solutions to linear inverse problems, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, Atlanta, GA, May 1996, pp. 1331-1334.

[6] A. J. Miller, Subset Selection in Regression, Chapman and Hall, London, UK, 1990.

[7] M. Nafie, M. Ali, and A. H. Tewfik, Optimal subset selection for adaptive signal representation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4, Atlanta, GA, May 1996.

[8] G. M. Furnival and R. W. Wilson, Jr., Regression by leaps and bounds, Technometrics, 16 (1974), pp. 499-511.

[9] A. H. Feiveson, Finding the best regression subset by reduction in nonfull-rank cases, SIAM Journal on Matrix Analysis and Applications, 15 (1994), pp. 194-204.

[10] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM Journal on Computing, 24 (1995), pp. 227-234.

Figure 1: Performance of the backward greedy algorithm as a function of the ratio $\|b - b_0\|/\|b_0\|$ (percentage of trials in which Backward = True, Backward = Exhaustive, and Exhaustive = True). The value of the bound $\delta/\|b_0\|$ is represented by a circle.

Figure 2: Distribution of the bound $\delta$ for $10 \times 7$ random matrices with given condition number: (a) $\kappa(A) = 5$ and (b) $\kappa(A) = 50$. (Histograms of $\delta/\|b_0\|$, in percent of the matrices, with $\Gamma_0 = \{1, 2, 3\}$ in both cases.)


More information

MAT Linear Algebra Collection of sample exams

MAT Linear Algebra Collection of sample exams MAT 342 - Linear Algebra Collection of sample exams A-x. (0 pts Give the precise definition of the row echelon form. 2. ( 0 pts After performing row reductions on the augmented matrix for a certain system

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2 1 A Good Spectral Theorem c1996, Paul Garrett, garrett@math.umn.edu version February 12, 1996 1 Measurable Hilbert bundles Measurable Banach bundles Direct integrals of Hilbert spaces Trivializing Hilbert

More information

8. Prime Factorization and Primary Decompositions

8. Prime Factorization and Primary Decompositions 70 Andreas Gathmann 8. Prime Factorization and Primary Decompositions 13 When it comes to actual computations, Euclidean domains (or more generally principal ideal domains) are probably the nicest rings

More information

VII Selected Topics. 28 Matrix Operations

VII Selected Topics. 28 Matrix Operations VII Selected Topics Matrix Operations Linear Programming Number Theoretic Algorithms Polynomials and the FFT Approximation Algorithms 28 Matrix Operations We focus on how to multiply matrices and solve

More information

4.3 - Linear Combinations and Independence of Vectors

4.3 - Linear Combinations and Independence of Vectors - Linear Combinations and Independence of Vectors De nitions, Theorems, and Examples De nition 1 A vector v in a vector space V is called a linear combination of the vectors u 1, u,,u k in V if v can be

More information

Matrix decompositions

Matrix decompositions Matrix decompositions Zdeněk Dvořák May 19, 2015 Lemma 1 (Schur decomposition). If A is a symmetric real matrix, then there exists an orthogonal matrix Q and a diagonal matrix D such that A = QDQ T. The

More information

LECTURE 7. Least Squares and Variants. Optimization Models EE 127 / EE 227AT. Outline. Least Squares. Notes. Notes. Notes. Notes.

LECTURE 7. Least Squares and Variants. Optimization Models EE 127 / EE 227AT. Outline. Least Squares. Notes. Notes. Notes. Notes. Optimization Models EE 127 / EE 227AT Laurent El Ghaoui EECS department UC Berkeley Spring 2015 Sp 15 1 / 23 LECTURE 7 Least Squares and Variants If others would but reflect on mathematical truths as deeply

More information

Class notes: Approximation

Class notes: Approximation Class notes: Approximation Introduction Vector spaces, linear independence, subspace The goal of Numerical Analysis is to compute approximations We want to approximate eg numbers in R or C vectors in R

More information

An Introduction to Sparse Approximation

An Introduction to Sparse Approximation An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

Math 341: Convex Geometry. Xi Chen

Math 341: Convex Geometry. Xi Chen Math 341: Convex Geometry Xi Chen 479 Central Academic Building, University of Alberta, Edmonton, Alberta T6G 2G1, CANADA E-mail address: xichen@math.ualberta.ca CHAPTER 1 Basics 1. Euclidean Geometry

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

Computational Methods. Eigenvalues and Singular Values

Computational Methods. Eigenvalues and Singular Values Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

Compressed Sensing and Robust Recovery of Low Rank Matrices

Compressed Sensing and Robust Recovery of Low Rank Matrices Compressed Sensing and Robust Recovery of Low Rank Matrices M. Fazel, E. Candès, B. Recht, P. Parrilo Electrical Engineering, University of Washington Applied and Computational Mathematics Dept., Caltech

More information

5 and A,1 = B = is obtained by interchanging the rst two rows of A. Write down the inverse of B.

5 and A,1 = B = is obtained by interchanging the rst two rows of A. Write down the inverse of B. EE { QUESTION LIST EE KUMAR Spring (we will use the abbreviation QL to refer to problems on this list the list includes questions from prior midterm and nal exams) VECTORS AND MATRICES. Pages - of the

More information

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Alfredo Nava-Tudela ant@umd.edu John J. Benedetto Department of Mathematics jjb@umd.edu Abstract In this project we are

More information

then kaxk 1 = j a ij x j j ja ij jjx j j: Changing the order of summation, we can separate the summands, kaxk 1 ja ij jjx j j: let then c = max 1jn ja

then kaxk 1 = j a ij x j j ja ij jjx j j: Changing the order of summation, we can separate the summands, kaxk 1 ja ij jjx j j: let then c = max 1jn ja Homework Haimanot Kassa, Jeremy Morris & Isaac Ben Jeppsen October 7, 004 Exercise 1 : We can say that kxk = kx y + yk And likewise So we get kxk kx yk + kyk kxk kyk kx yk kyk = ky x + xk kyk ky xk + kxk

More information

Sparse analysis Lecture II: Hardness results for sparse approximation problems

Sparse analysis Lecture II: Hardness results for sparse approximation problems Sparse analysis Lecture II: Hardness results for sparse approximation problems Anna C. Gilbert Department of Mathematics University of Michigan Sparse Problems Exact. Given a vector x R d and a complete

More information

Math Linear Algebra II. 1. Inner Products and Norms

Math Linear Algebra II. 1. Inner Products and Norms Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,

More information

Contents. 4 Arithmetic and Unique Factorization in Integral Domains. 4.1 Euclidean Domains and Principal Ideal Domains

Contents. 4 Arithmetic and Unique Factorization in Integral Domains. 4.1 Euclidean Domains and Principal Ideal Domains Ring Theory (part 4): Arithmetic and Unique Factorization in Integral Domains (by Evan Dummit, 018, v. 1.00) Contents 4 Arithmetic and Unique Factorization in Integral Domains 1 4.1 Euclidean Domains and

More information

Linear Algebra. Paul Yiu. Department of Mathematics Florida Atlantic University. Fall A: Inner products

Linear Algebra. Paul Yiu. Department of Mathematics Florida Atlantic University. Fall A: Inner products Linear Algebra Paul Yiu Department of Mathematics Florida Atlantic University Fall 2011 6A: Inner products In this chapter, the field F = R or C. We regard F equipped with a conjugation χ : F F. If F =

More information

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space. Chapter 1 Preliminaries The purpose of this chapter is to provide some basic background information. Linear Space Hilbert Space Basic Principles 1 2 Preliminaries Linear Space The notion of linear space

More information

arxiv: v1 [math.pr] 22 May 2008

arxiv: v1 [math.pr] 22 May 2008 THE LEAST SINGULAR VALUE OF A RANDOM SQUARE MATRIX IS O(n 1/2 ) arxiv:0805.3407v1 [math.pr] 22 May 2008 MARK RUDELSON AND ROMAN VERSHYNIN Abstract. Let A be a matrix whose entries are real i.i.d. centered

More information

G1110 & 852G1 Numerical Linear Algebra

G1110 & 852G1 Numerical Linear Algebra The University of Sussex Department of Mathematics G & 85G Numerical Linear Algebra Lecture Notes Autumn Term Kerstin Hesse (w aw S w a w w (w aw H(wa = (w aw + w Figure : Geometric explanation of the

More information

LINEAR SYSTEMS (11) Intensive Computation

LINEAR SYSTEMS (11) Intensive Computation LINEAR SYSTEMS () Intensive Computation 27-8 prof. Annalisa Massini Viviana Arrigoni EXACT METHODS:. GAUSSIAN ELIMINATION. 2. CHOLESKY DECOMPOSITION. ITERATIVE METHODS:. JACOBI. 2. GAUSS-SEIDEL 2 CHOLESKY

More information