Low Rank Matrix Approximation


John T. Svadlenka
Ph.D. Program in Computer Science
The Graduate Center of the City University of New York, New York, NY, USA

Abstract

Low rank approximation is a fundamental computation across a broad range of applications where matrix dimension reduction is required. This survey paper on the low rank approximation (LRA) of matrices provides a broad overview of recent progress in the field from both the Theoretical Computer Science (TCS) and the Numerical Linear Algebra (NLA) points of view. While the traditional application areas of LRA come from scientific, engineering, and statistical disciplines, a great deal of recent activity has arisen in image processing, machine learning, and climate informatics, to name just a few of the emerging modern technologies. All of these disciplines and technologies are increasingly challenged by the scale of modern massive data sets (MMDS) collected from a variety of sources, including sensor measurements, computational models, and the Internet. At the same time, applied mathematicians and computer scientists have been seeking alternatives to the standard NLA algorithms, which are fundamentally incapable of handling the sheer size of these MMDSs. Against the backdrop of the disparity between increasing processing power and disk storage capacity on the one hand and memory bandwidth limitations on the other, a further research goal is to find flexible new LRA approaches that leverage the strengths of modern hardware architectures while limiting exposure to data communication bottlenecks. Central to new approaches for MMDSs is the approximation of a matrix by one of much smaller dimension, facilitated by randomization techniques. The term "low rank" in the context of LRA refers to the inherent dimension of the matrix, which may be much smaller than both the actual number of rows and columns of the matrix. Thus, a matrix may be approximated, if not exactly represented, by another matrix containing significantly fewer rows and columns while still preserving the salient characteristics of the original. Recent results have shown that it is possible to attain this dimension reduction using alternative approximation and probabilistic techniques in which randomization plays a key role. This paper surveys classical and recent results in LRA and presents practical algorithms representative of recent research progress. A broad understanding of potential future research directions in LRA should also be evident to readers from theoretical, algorithmic, and computational backgrounds.

Keywords: Low rank approximation, Modern Massive Data Sets, random sketches, random projections, dimension reduction, QR factorization, Singular Value Decomposition, CUR, parallelism

1 Introduction

Numerical Linear Algebra (NLA) provides the theoretical underpinnings and formal framework for analyzing matrices and the various operations on matrices. The results in NLA have typically been shared across a diverse spectrum of areas ranging from the physical and life sciences and engineering to data analysis. More recently, new computational disciplines and application areas have begun to test the limits of traditional NLA algorithms. Although existing algorithms have been rigorously developed, analyzed, and refined over many years, certain inherent limitations of these algorithms have been increasingly exposed in both new and existing areas of application. A confluence of diverse factors, as well as increased interest in NLA by researchers in Theoretical Computer Science, has contributed to recent developments in NLA. Perhaps more importantly, new perspectives on approximation and randomization as applied to NLA have generated new opportunities to further the progress of the field. A prominent factor has been the sheer size of today's MMDSs, such as those encountered in data mining [1], which challenge conventional NLA algorithms. It can be shown that if one is willing to relax the requirement of high-precision results in exchange for faster algorithms, then alternative algorithmic approaches with improved arithmetic complexity are available [2]. These alternative approaches create an approximation of the input matrix that is much smaller in size than the original. The approximate matrix is commonly known as a sketch, owing to the LRA randomization strategies employed to obtain the approximation. This random sketch of much smaller size replaces the original matrix in the application of interest in order to realize the computational savings. The question that naturally arises from this methodology is: how expensive is the LRA algorithm? While traditional LRA algorithms can yield a rank-k approximation of an m × n matrix in O(mnk) time, a randomized algorithm can reduce the asymptotic complexity to O(mn log k) [1], [2].

Another aspect of the approximation trade-off that justifies the approach lies within the context of data sets with inexact content [1]. Moreover, it can be argued that all numerical computation is itself approximate up to machine precision considerations, so an introduction of similarly inexact algorithms (up to a user-specified tolerance) is not an unreasonable proposition. If one also recognizes that other deficiencies in classical algorithms besides computational complexity may be mitigated with alternative methods, then opportunities to realize further computational gains are possible. As an example, arithmetic complexity does not consider the effects of data movement to and from a computer's random access memory (RAM). Floating point operation speed continues to exceed memory bandwidth [31], so that data transfer is a major performance bottleneck with regard to processing MMDSs. The problem is clearly magnified for out-of-core data sets. Algorithms that can significantly reduce the number of passes over the data set (pass efficiency) will reduce the wall-clock time of an algorithm. Similarly, good data and temporal locality contributes to the likelihood of a more favorable performance profile. Data locality refers to the characteristic of an algorithm whereby a segment of data in memory is likely to be processed next if it is located close to the data currently being processed. Likewise, temporal locality is the algorithmic property that a set of operations on adjacent sets of the data occurs in close time-wise succession. The implication is that there is a lower likelihood of having to swap components of the program data set among the various levels of the memory hierarchy multiple times. As a concrete example, consider the QR factorization, which is a standard algorithm for LRA. Matrix-matrix multiplication itself may be executed faster than a QR algorithm [2] due to block operations.

Owing to better use of memory hierarchies, matrix-matrix kernels generally perform better than matrix-vector kernels [33, 29], such as those encountered in QR processing. Therefore, an LRA algorithmic alternative that is based on the former kernels rather than the latter may be more favorably disposed to computational gains. Additionally, matrix-matrix multiplication is embarrassingly parallel [1], and it more generally motivates the search for new LRA strategies that can more fully leverage the parallelism widely supported by modern hardware architectures. It should be noted from the above discussion that scalability improvements in NLA can be addressed in a multifaceted manner, that is, from the theoretical, algorithmic, and computational perspectives. This survey paper examines hybrid algorithms which combine existing elements of conventional deterministic NLA algorithms with randomization and approximation schemes to offer new algorithms of reduced asymptotic complexity. The result is the capacity to process much larger data sets than is possible with conventional algorithms.

The outline of this survey paper after the current introductory section is as follows. We review classical results from the literature in Section 2 to provide some broad background on standard factorizations and algorithms for LRA. Subsequently, the more recent research concerning approximation and probabilistic results is given in Section 3. The results of these two sections form the theoretical basis for the presentation of the randomized hybrid LRA algorithms of Section 4, where we also discuss the strategies and benefits associated with them. Open problems are discussed in Section 5 and concluding remarks follow in Section 6.

2 Classical Results

2.1 Rank k Factorization

In the previous section we mentioned that LRA is concerned with the study of matrix sketches that significantly reduce the number of rows and columns of a matrix. LRA yields two significant benefits: a savings in memory storage and a decrease in the number of arithmetic computations. To see how this is possible, consider that the storage requirement of a dense m × n matrix A is mn memory cells. An LRA of A of rank k, for k ≪ min(m, n), is defined by the factorization

A ≈ BC    (2.1)

such that B ∈ R^{m×k} and C ∈ R^{k×n}. Therefore, the memory storage cost of the LRA of A is O((m + n)k), and (m + n)k ≪ mn. Consider a matrix-vector product involving the original matrix A of the form

x = Av    (2.2)

where x and v are m- and n-dimensional vectors, respectively. This operation requires mn multiplications and m(n − 1) additions, so the number of arithmetic operations is roughly 2mn. A matrix-vector product formulation with the LRA of A is

x ≈ BCv    (2.3)
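To make the savings of (2.1)–(2.3) concrete, here is a minimal sketch (assuming NumPy; matrix sizes and the helper name factored_matvec are illustrative) that compares the dense product Av with the two-stage product B(Cv):

```python
import numpy as np

def factored_matvec(B, C, v):
    """Compute x ~ A v using the rank-k factors A ~ B C.

    Cost: about k*n + m*k multiply-adds instead of m*n for the dense product."""
    y = C @ v          # k x n times n-vector: O(kn)
    return B @ y       # m x k times k-vector: O(mk)

if __name__ == "__main__":
    m, n, k = 2000, 1500, 20
    rng = np.random.default_rng(0)
    # Build a matrix that is exactly rank k so the comparison is exact.
    B = rng.standard_normal((m, k))
    C = rng.standard_normal((k, n))
    A = B @ C
    v = rng.standard_normal(n)

    x_dense = A @ v                        # ~ 2*m*n flops
    x_lowrank = factored_matvec(B, C, v)   # ~ 2*k*(m+n) flops
    print("storage (dense vs factored):", m * n, "vs", (m + n) * k)
    print("max abs difference:", np.max(np.abs(x_dense - x_lowrank)))
```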

We can see that the product y = Cv requires kn + k(n − 1) operations, while the product x = By uses mk + m(k − 1) operations. The overall arithmetic complexity is O(k(m + n)), which offers a significant savings when k ≪ min(m, n). If the matrix-vector product can be left in the rank factorization form

x ≈ By    (2.4)

then the number of operations is further reduced. The metric by which we prefer to measure the accuracy of a rank-k LRA Â_k of A is the (1 + ε) relative-error bound, for small positive ε, of the following form in both the spectral and Frobenius norms:

‖A − Â_k‖ ≤ (1 + ε) ‖A − A_k‖    (2.5)

The relative-error norm bound is a particular example of a multiplicative error bound. A_k is the theoretically best rank-k approximation of A, given by the Singular Value Decomposition, which we discuss next.

2.2 Singular Value Decomposition (SVD)

Though the origins of the Singular Value Decomposition (SVD) can be traced back to the late 1800s, it was not until the 20th century that it eventually evolved into its current and most general form. The SVD exists for any m × n matrix A regardless of whether its entries are real or complex. Let A be an m × n matrix with r = rank(A) whose elements may be complex. Then there exist two unitary matrices U and V such that

A = UΣV*    (2.6)

where U and V are m × m and n × n, respectively, and Σ is an m × n diagonal matrix with nonnegative elements σ_i such that σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 and σ_j = 0 for j > r. We may also write a truncated form of the SVD in which U_r consists of the r left-most columns of U, V_r similarly consists of the r left-most columns of V, and Σ_r = diag(σ_1, ..., σ_r). We write this truncated form as follows:

A = U_r Σ_r V_r*    (2.7)

In this truncated form the columns of U_r and V_r form orthonormal bases for the column spaces of A and A*, respectively. The values σ_1, ..., σ_r are commonly referred to as the singular values of A. More importantly, these singular values indicate the lower bounds on the error of any rank-k approximation of A in the spectral and Frobenius norms. For A_k = U_k Σ_k V_k* we have that

‖A − A_k‖_2 = σ_{k+1}    (2.8)

‖A − A_k‖_F = ( Σ_{j=k+1}^{min(m,n)} σ_j² )^{1/2}    (2.9)
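As a quick numerical check of (2.8) and (2.9), the following sketch (assuming NumPy; sizes are arbitrary) forms the truncated SVD A_k and compares its spectral and Frobenius errors against the trailing singular values:

```python
import numpy as np

def truncated_svd(A, k):
    """Return the best rank-k approximation A_k = U_k Sigma_k V_k^* and all singular values."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :], s

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((300, 200))
    k = 10
    A_k, s = truncated_svd(A, k)

    spec_err = np.linalg.norm(A - A_k, 2)
    frob_err = np.linalg.norm(A - A_k, "fro")
    print("spectral error:", spec_err, " sigma_{k+1}:", s[k])                     # eq. (2.8)
    print("Frobenius error:", frob_err,
          " sqrt(sum of trailing sigma^2):", np.sqrt(np.sum(s[k:] ** 2)))          # eq. (2.9)
```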

The SVD plays a dual role with regard to matrix approximation. Firstly, algorithms exist to compute the decomposition with asymptotic cost O((m + n)mn) [6], from which we may obtain a rank-k approximation for any k = 1, ..., r − 1. For the special case k = 0 we have ‖A‖_2 = σ_1. Secondly, the rank-k approximation from the SVD is utilized as the optimal rank-k approximation for evaluating and comparing decompositions and the approximation algorithms which produce them. It is indeed the high cost of producing the SVD factorization for MMDSs that has motivated the search for new LRA techniques. We also note that from an SVD representation of A we may write its low rank format by first forming the product Σ_k V_k*:

A_k = U_k (Σ_k V_k*)    (2.10)

Alternatively, we may also write:

A_k = (U_k Σ_k) V_k*    (2.11)

Starting with the QR factorization, we present other important classical decompositions in subsequent sections, along with the algorithms that generate them. The SVD may also be obtained from the QR factorization with an additional post-processing step appended to the QR algorithm.

2.3 QR Decomposition

The QR decomposition is a factorization of a matrix A into the product of two matrices: a unitary matrix Q providing an orthogonal basis for A, and an upper triangular matrix R. The significance of this decomposition is evident from its use as a preliminary step in determining a rank-k SVD decomposition of an m × n matrix A. The QR decomposition itself can be obtained faster than the SVD, in O(mn min(m, n)) time. More formally, let A be an m × n matrix with m ≥ n whose elements may be complex. Then there exist an m × n matrix Q and an n × n matrix R such that

A = QR    (2.12)

where the columns of Q are orthonormal and R is upper triangular. Column i of A is a linear combination of the columns of Q with the coefficients given by column i of R. In particular, by the upper triangular form of R, it is clear that column i of A is determined by the first i columns of Q. The existence of the QR factorization can be proven in a variety of ways. We present here a proof using the Gram-Schmidt procedure.

Suppose (a_1, a_2, ..., a_n) is a linearly independent list of vectors in an inner product space V. Then there is an orthonormal list of vectors (q_1, q_2, ..., q_n) such that

span(a_1, a_2, ..., a_n) = span(q_1, q_2, ..., q_n).    (2.13)

Proof: Let proj(r, s) := (⟨r, s⟩ / ⟨s, s⟩) s denote the projection of r onto s, and apply the following steps to obtain the orthonormal list of vectors:

w_1 := a_1
w_2 := a_2 − proj(a_2, w_1)
...
w_n := a_n − proj(a_n, w_1) − proj(a_n, w_2) − ... − proj(a_n, w_{n−1})

q_1 := w_1 / ‖w_1‖,  q_2 := w_2 / ‖w_2‖,  ...,  q_n := w_n / ‖w_n‖

Rearranging the equations for w_1, w_2, ..., w_n so that a_1, a_2, ..., a_n appear on the left-hand side and replacing each w_i with q_i gives A = QR, where A = [a_1, a_2, ..., a_n], Q = [q_1, q_2, ..., q_n], and

R = [ ⟨q_1, a_1⟩  ⟨q_1, a_2⟩  ⟨q_1, a_3⟩  ...  ⟨q_1, a_n⟩ ]
    [     0       ⟨q_2, a_2⟩  ⟨q_2, a_3⟩  ...  ⟨q_2, a_n⟩ ]
    [     0           0       ⟨q_3, a_3⟩  ...  ⟨q_3, a_n⟩ ]
    [    ...         ...          ...     ...      ...    ]
    [     0           0           0       ...  ⟨q_n, a_n⟩ ]

A problem with the practical application of the Gram-Schmidt procedure occurs in the case that rank(A) < n. It is then necessary to determine a permutation of the columns of A such that the first rank(A) columns of Q are orthonormal. Let P denote the n × n matrix representing this column permutation. We have the QRP formulation:

A = QRP    (2.14)

Enhancements to the basic QR algorithm that obtain both the QR factorization and the permutation matrix are known as QR with column pivoting. A further improvement discerns the rank of the matrix A, though at a higher cost. The goal is to find the permutation matrix P for the construction of R such that

R = [ R_11  R_12 ]
    [  0    R_22 ]    (2.15)

If ‖R_22‖ is small and R_11 is r × r, it can be shown that σ_{r+1}(A) ≤ ‖R_22‖_2, so that the numerical rank of A is r. These constructions are known as rank-revealing QR (RRQR) factorizations and are the most commonly used forms of the QR algorithm in use today. A deterministic RRQR algorithm was given by Gu and Eisenstat in their seminal paper [5]; it finds a k-column subset C of the input matrix A such that the projection of A onto C has error relative to the best rank-k approximation of A as follows:

‖A − CC⁺A‖_2 ≤ √(1 + k(n − k)) ‖A − A_k‖_2    (2.16)

The above result matches the classical existence result of Ruston [7]. The reader is referred to [5] for more information on RRQR factorizations.
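The Gram-Schmidt construction above, together with the column-pivoting idea behind RRQR, can be sketched as follows (assuming NumPy and SciPy; this is an illustration, not a robust rank-revealing implementation):

```python
import numpy as np
from scipy.linalg import qr as scipy_qr

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR of a full-column-rank matrix A (m >= n)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        w = A[:, i].copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]   # <q_j, a_i>
            w -= R[j, i] * Q[:, j]        # subtract proj(a_i, q_j)
        R[i, i] = np.linalg.norm(w)
        Q[:, i] = w / R[i, i]
    return Q, R

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((8, 5))
    Q, R = gram_schmidt_qr(A)
    print("||A - QR|| =", np.linalg.norm(A - Q @ R))
    print("||Q^T Q - I|| =", np.linalg.norm(Q.T @ Q - np.eye(5)))

    # Column-pivoted QR: A[:, piv] = Qp @ Rp, the basis for rank-revealing variants.
    Qp, Rp, piv = scipy_qr(A, pivoting=True)
    print("pivot order:", piv, " |diag(R)|:", np.abs(np.diag(Rp)))
```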

2.4 Skeleton (CUR) Decomposition

A different approach to LRA is one that selects a subset of actual rows and columns of the input matrix as factors in an approximation, instead of an orthogonal matrix. The CUR decomposition consists of a matrix C of a subset of columns of the original matrix A and a matrix R containing a subset of rows of A; U is a suitably chosen matrix that completes the decomposition. The problem described in this section is the submatrix selection problem.

Let A be an m × n matrix of real elements with rank(A) = r. Then there exists a nonsingular r × r submatrix Â of A. Moreover, let I and J be the sets of row and column indices of A, respectively, appearing in Â, so that C = A(1..m, J) and R = A(I, 1..n). For U = Â^{-1}, we have that

A = CUR    (2.17)

Therefore, it is clear that a subset of r columns and r rows captures A's column and row spaces, respectively. This skeleton stands in contrast to the SVD's left and right singular vectors, which are unitary. While it is NP-hard to find optimal row and column subsets, an advantage of this representation is that its content is conducive to being understood in application terms and with domain knowledge. Moreover, the CUR decomposition may preserve structural properties of the original matrix (such as sparsity or nonnegativity) that would otherwise be lost in the admittedly somewhat abstract decompositional reduction to unitary matrices. On the other hand, it is not guaranteed that Â is well-conditioned. The CUR decomposition requires O((m + n + r)r) memory space and may be simplified to a rank factorization format by writing GH = CUR, where G = CU and H = R, or G = C and H = UR.

We shall see in a later section that researchers from both Numerical Linear Algebra (NLA) and Theoretical Computer Science (TCS) have provided different algorithmic approaches to LRA employing the CUR decomposition. NLA algorithms have focused on the particular choice of an Â that maximizes the absolute value of its determinant, while TCS favors column and row sampling strategies based on sampling probabilities derived from the Euclidean norms of either the matrix's singular vectors or of its actual rows and columns. It is the construction of the sampling probabilities from singular vectors, commonly known as leverage scores, that is responsible for the computational complexity bound in the TCS approach. In the NLA algorithms the absolute value of the determinant is a proxy for quantifying the orthogonality of the columns of a matrix. This topic is discussed in more detail in a later section.

A variation of the above CUR strategy is to obtain a rank-k matrix C from columns of the original matrix A and project A onto C. This approximate decomposition is given by A ≈ CC⁺A and is known as a CX decomposition, where X := C⁺A. The key idea is to project the matrix A onto a rank-k subspace of A as given by C. Thus, a rank-k factorization may be given by GH = CC⁺A, where G = C and H = C⁺A. In the next section we present a related form of decomposition known as the Interpolative Decomposition.
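As a small illustration of the CX idea, the sketch below (assuming NumPy; the column indices are chosen arbitrarily here rather than by any of the sampling schemes discussed in Section 3) projects A onto a subset of its own columns via CC⁺A:

```python
import numpy as np

def cx_approximation(A, cols):
    """Project A onto the span of the chosen columns: A ~ C (C^+ A)."""
    C = A[:, cols]                 # m x c subset of actual columns of A
    X = np.linalg.pinv(C) @ A      # c x n coefficient matrix, X = C^+ A
    return C, X

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    # Low-rank-plus-noise test matrix.
    A = rng.standard_normal((100, 8)) @ rng.standard_normal((8, 60))
    A += 0.01 * rng.standard_normal(A.shape)

    cols = [0, 5, 10, 17, 23, 31, 40, 55]   # illustrative choice of 8 columns
    C, X = cx_approximation(A, cols)
    err = np.linalg.norm(A - C @ X, "fro") / np.linalg.norm(A, "fro")
    print("relative Frobenius error of CX:", err)
```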

2.5 Interpolative Decomposition (ID)

The intuition motivating the ID is that if an m × n matrix A has rank k, then it is reasonable to expect to be able to use some representative subset of k columns of A (call this column subset B) to represent all n columns of A. In effect, the columns of B serve as a basis for A. Consequently, we only need to construct a k × n matrix P to express each column i of A, for i = 1, ..., n, as a linear combination of the columns of B. This intuition leads us to the Interpolative Decomposition Lemma:

Suppose A is an m × n matrix of rank k whose elements may be complex. Then there exist an m × k matrix B consisting of a subset of columns of A and a k × n matrix P such that:
1. A = BP
2. The identity matrix I_k appears in some column subset of P
3. |p_ij| ≤ 1 for all i and j

Finding the subset of k columns from a choice of n columns is NP-hard, and algorithms based on the above conditions can be expensive. But the computation of the ID is made easier [1] by relaxing the requirement |p_ij| ≤ 1 to |p_ij| ≤ 2. The B factor of the ID, as with the C and R matrices of the CUR decomposition, facilitates data analysis, and it inherits properties of the matrix A. We may ask whether the ID can be extended into the form of a two-sided ID decomposition in which a subset of the rows of B also forms a basis for the row space of A. The existence of such a decomposition is given in [8] with the Two-sided Interpolative Decomposition Theorem:

Let A be an m × n matrix and k ≤ min(m, n). Then there exists a decomposition

A = P_L [I_k; S] A_S [I_k  T] P_R + X    (2.18)

where [I_k; S] denotes the m × k matrix obtained by stacking I_k above S and [I_k  T] the k × n matrix obtained by placing I_k beside T, such that P_L and P_R are permutation matrices, S ∈ C^{(m−k)×k}, T ∈ C^{k×(n−k)}, and S, T, and X satisfy:

‖S‖_F ≤ √(k(m − k))    (2.19)

‖T‖_F ≤ √(k(n − k))    (2.20)

‖X‖_2 ≤ σ_{k+1}(A) √(1 + k(min(m, n) − k))    (2.21)

In the above formulation A_S is a k × k submatrix of A. Though we will not investigate this decomposition any further in this survey, we mention it here to point out that this CUR-like decomposition includes a residual term X that is bounded by the (k + 1)-st singular value of A. To some extent we may infer an increased difficulty of the submatrix selection problem compared with column subset selection alone in the one-sided Interpolative Decomposition.
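A one-sided ID of the kind described above is often computed from a column-pivoted QR factorization; the following sketch (assuming NumPy and SciPy) illustrates that route, though it does not enforce the |p_ij| ≤ 2 guarantee of the lemma:

```python
import numpy as np
from scipy.linalg import qr

def interpolative_decomposition(A, k):
    """Approximate A ~ B P with B = k selected columns of A (via pivoted QR)."""
    Q, R, piv = qr(A, mode="economic", pivoting=True)
    B = A[:, piv[:k]]                          # m x k column skeleton of A
    # Coefficients expressing the remaining (pivoted) columns in terms of the first k:
    T = np.linalg.solve(R[:k, :k], R[:k, k:])  # k x (n - k)
    P = np.zeros((k, A.shape[1]))
    P[:, piv[:k]] = np.eye(k)                  # identity block on the chosen columns
    P[:, piv[k:]] = T
    return B, P

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    A = rng.standard_normal((80, 12)) @ rng.standard_normal((12, 50))
    B, P = interpolative_decomposition(A, 12)
    print("||A - B P||_F =", np.linalg.norm(A - B @ P, "fro"))
    print("max |P_ij| =", np.abs(P).max())
```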

2.6 QR Conventional Algorithm and Complexity Cost

Having described some of the more prominent decompositions in the prior sections, we turn our attention to the conventional algorithms commonly utilized to produce them. We will limit our presentation to the QR and SVD factorizations for a couple of reasons. First, these are two of the most ubiquitous factorizations in practice, and much effort has been invested over the years to enhance the performance and functionality of their algorithms. Moreover, an analysis of the techniques used in their production is sufficient to convey the limitations and inflexibilities that motivate the search for more robust algorithmic approaches.

Let us first present a method for generating the Q and R factors of A which addresses computational issues arising from the Gram-Schmidt procedure. As an alternative to QR via Gram-Schmidt, consider an orthogonal matrix product Q_1 Q_2 ... Q_n that transforms A to upper triangular form R:

(Q_n ... Q_2 Q_1) A = R

Multiplying both sides by (Q_n ... Q_2 Q_1)^{-1} yields:

(Q_n ... Q_2 Q_1)^{-1} (Q_n ... Q_2 Q_1) A = (Q_n ... Q_2 Q_1)^{-1} R
A = Q_1 Q_2 ... Q_n R

Note that a product of orthogonal matrices is also orthogonal, so allowing for column pivoting we have that

AΠ = Q_1 Q_2 ... Q_n R    (2.22)

A Householder reflection matrix is used for each Q_i, i = 1, 2, ..., n, to transform A to R column by column. More formally, the Householder matrix-vector multiplication Hx = (I − 2vv^T)x reflects a vector x across the hyperplane normal to v. The unit vector v is constructed for each Householder matrix Q_i so that the entries of column i below the diagonal of A vanish:

1. x = (a_{ii}, a_{i+1,i}, ..., a_{mi}), the subdiagonal part of column i
2. v depends upon x and the standard basis vector e_i
3. The matrix product Q_i A is applied
4. The above steps are repeated for each column of A

The impact on the QR algorithm is that Householder matrices improve numerical stability through multiplication by orthogonal matrices. The chain of Q_i's is not collapsed entirely together in a manner that would result in just one matrix-matrix multiplication operation, and it is also possible that the matrix product Q_i A is implemented as a series of matrix-vector multiplications instead. While parallelized deployments of the QR algorithm are utilized, the parallelization on a massive scale that we seek for MMDSs is not well suited to the algorithmic enhancements described above. We mention here that there also exists a variation of the QR algorithm that uses another type of orthogonal transformation, the Givens rotation, though the Householder version is more commonly used. A more detailed discussion of these QR algorithms may be found in one of the popular linear algebra textbooks such as [6]. We next turn our attention to algorithms for computing an SVD decomposition.
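Before doing so, the column-by-column Householder elimination just described might be sketched as follows (assuming NumPy; this unblocked form mirrors the matrix-vector-rich structure discussed above rather than a high-performance blocked code):

```python
import numpy as np

def householder_qr(A):
    """Unblocked Householder QR: returns Q (m x m) and R (m x n) with A = Q R."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for i in range(min(m, n)):
        x = R[i:, i]                                   # subdiagonal part of column i
        v = x.copy()
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        norm_v = np.linalg.norm(v)
        if norm_v == 0:
            continue
        v /= norm_v                                    # unit reflector
        R[i:, :] -= 2.0 * np.outer(v, v @ R[i:, :])    # apply H_i = I - 2 v v^T on the left
        Q[:, i:] -= 2.0 * np.outer(Q[:, i:] @ v, v)    # accumulate Q := Q H_i
    return Q, R

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    A = rng.standard_normal((7, 5))
    Q, R = householder_qr(A)
    print("||A - QR|| =", np.linalg.norm(A - Q @ R))
    print("||Q^T Q - I|| =", np.linalg.norm(Q.T @ Q - np.eye(7)))
    print("R is upper triangular:", np.allclose(np.tril(R, -1), 0))
```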

2.7 SVD Deterministic Algorithm and Complexity Cost

The standard SVD algorithm is based on the work of Golub and Reinsch [3], though we will also review an alternative algorithm that uses a QR algorithm with post-processing. The SVD decomposition A = UΣV* is computed in two distinct steps.

First step: use two sequences of Householder transformations to reduce A to upper bidiagonal form:
1. B = Q_n ... Q_2 Q_1 A P_1 P_2 ... P_{n−2}
2. Therefore, we have that A = Q_1 Q_2 ... Q_n B P_{n−2} ... P_2 P_1

Second step: use two sequences of Givens rotations (orthogonal transformations) to reduce B to diagonal form Σ:
1. Σ = G_{n−1} ... G_2 G_1 B F_1 F_2 ... F_{n−1}
2. Likewise, we have that B = G_1 G_2 ... G_{n−1} Σ F_{n−1} ... F_2 F_1
3. Set U := Q_1 Q_2 ... Q_n G_1 G_2 ... G_{n−1}
4. Set V := (F_1 F_2 ... F_{n−1})* (P_1 P_2 ... P_{n−2})*

A truncated SVD yielding a rank-k approximation may be obtained by running this algorithm, though there is no savings available in the arithmetic complexity. According to [28], the first step of bidiagonal reduction (BRD) can consume at least seventy percent of the time of the SVD algorithm. In practice, BRD consists of the repeated construction of Householder reflectors and updates of the matrix using those reflectors. Two matrix-vector multiplications during each reflector construction involve the remaining subdiagonal portion of the matrix being reduced. Depending upon the implementation, if the sequence of matrix updates and matrix-vector multiplications results in frequent data transfers across the memory hierarchy, a memory bottleneck may result. The situation may be exacerbated for matrices that are larger than the available cache. It should be clear that this has implications for processing MMDSs and motivates, in part, the search for new LRA strategies.

There exists an alternative to the above standard SVD algorithm which relies on first obtaining a rank-k QR factorization. In this case we have that

A = Q_1 Q_2 ... Q_n RΠ + E    (2.23)

where E is a residual error term. The SVD algorithm described above may be applied to the product RΠ with the result

RΠ = XΣV*    (2.24)

Letting Q = Q_1 Q_2 ... Q_n and noting that the product U = QX is also orthogonal, we have the following rank-k SVD decomposition of A:

A = UΣV* + E    (2.25)

While we have used the SVD algorithm, it is only applied to a matrix of typically much smaller dimension than A.
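The QR-then-SVD route of (2.23)–(2.25) can be sketched as follows (assuming NumPy and SciPy; the truncation rank k is a user choice and the residual E is simply whatever the discarded columns leave behind):

```python
import numpy as np
from scipy.linalg import qr

def svd_via_pivoted_qr(A, k):
    """Rank-k SVD approximation via a truncated pivoted QR plus a small SVD."""
    Q, R, piv = qr(A, mode="economic", pivoting=True)
    Qk, Rk = Q[:, :k], R[:k, :]                 # keep the leading k columns/rows
    # Undo the column permutation so that A ~ Qk @ Rk_unperm.
    Rk_unperm = np.empty_like(Rk)
    Rk_unperm[:, piv] = Rk
    X, s, Vh = np.linalg.svd(Rk_unperm, full_matrices=False)   # SVD of a small k x n matrix
    U = Qk @ X                                  # product of orthonormal factors is orthonormal
    return U, s, Vh

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    A = rng.standard_normal((500, 80)) @ rng.standard_normal((80, 300))
    U, s, Vh = svd_via_pivoted_qr(A, k=80)
    err = np.linalg.norm(A - U @ np.diag(s) @ Vh) / np.linalg.norm(A)
    print("relative error of QR-based SVD:", err)
```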

3 Approximation and Probabilistic Results

In recent years a number of theoretical results have appeared in the literature concerning the approximation of a matrix by one of a smaller size in terms of rows and/or columns. Three broad strategies have been identified by which we may seek to reduce matrix size. The first of these is dimension reduction, in which the goal is to approximate a matrix by one of much smaller rank than the original matrix. A second approximation strategy is to choose a subset of columns (or rows) from the original matrix which are most representative of it, thereby preserving the salient characteristics of the original matrix in the approximation. In the third strategy, a submatrix consisting of a subset of both rows and columns of the original matrix is chosen to formulate an approximate matrix of smaller size. Each of these strategies has received considerable attention from researchers, more often than not from both the Theoretical Computer Science and the Numerical Linear Algebra perspectives. An informative and in-depth comparison of these two points of view and their cultural differences may be found in [2]. This section covers the three broad strategies and, in particular, the random multiplier matrices utilized in dimension reduction.

3.1 Dimension Reduction

It is instructive to begin the discussion of dimension reduction, as it applies to matrices, with a related topic concerning points in Euclidean space. A seminal paper by Johnson and Lindenstrauss [9] proved that, given a set of n points of dimension d, it is possible to approximately preserve the distances between any two points in O(log n)-dimensional space. Let X_1, X_2, ..., X_n ∈ R^d. Then for ε ∈ (0, 1) there exists Φ ∈ R^{k×d}, with k = O(ε^{-2} log n), such that

(1 − ε) ‖X_i − X_j‖_2 ≤ ‖ΦX_i − ΦX_j‖_2 ≤ (1 + ε) ‖X_i − X_j‖_2    (3.1)

That is, the target dimension depends only upon the number of points. The immediate consequences of this result were obvious and significant for the Nearest Neighbor problem, though not readily apparent to the NLA community. The mapping matrix Φ (the J-L transform) can be constructed as a random Gaussian k × d matrix. Alternatively, Achlioptas [10] demonstrated that a matrix of Bernoulli random {+1, −1} entries could also be used. Perhaps even more important was his finding that sparsity could be introduced into the random matrix, thereby reducing the matrix-vector multiplication complexity. In this case the values {+1, −1} are each chosen with probability 1/6 and zero otherwise.
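A minimal illustration of the Johnson-Lindenstrauss embedding (3.1) follows (assuming NumPy; the constant in the target dimension k = O(ε^{-2} log n) is arbitrary here):

```python
import numpy as np

def jl_embed(X, eps, rng):
    """Embed the rows of X (n points in R^d) into R^k, k = O(eps^-2 log n)."""
    n, d = X.shape
    k = int(np.ceil(4.0 * np.log(n) / eps**2))      # constant 4 is illustrative
    Phi = rng.standard_normal((k, d)) / np.sqrt(k)  # scaled Gaussian J-L transform
    return X @ Phi.T

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n, d, eps = 200, 5000, 0.25
    X = rng.standard_normal((n, d))
    Y = jl_embed(X, eps, rng)

    # Check pairwise-distance distortion on a few random pairs.
    worst = 0.0
    for _ in range(1000):
        i, j = rng.integers(0, n, size=2)
        if i == j:
            continue
        ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        worst = max(worst, abs(ratio - 1.0))
    print("embedded dimension k =", Y.shape[1], " worst observed distortion:", worst)
```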

Sarlos [11] utilized the Johnson-Lindenstrauss lemma to provide the first relative-error approximation in terms of the Frobenius norm for LRA in a constant number of passes over the matrix, building on Achlioptas' result. If A ∈ R^{m×n} and B is an r × n J-L transform with i.i.d. zero-mean entries in {−1, +1}, with r = Θ(k/ε + k log k) and ε ∈ (0, 1), then with probability at least 1/2 we have that

‖A − Proj_{AB^T,k}(A)‖_F ≤ (1 + ε) ‖A − A_k‖_F    (3.2)

where Proj_{AB^T,k}(A) is the best rank-k approximation of the projection of A onto the column space of AB^T. This result extended the preservation of distance metrics for vectors to the preservation of the actual matrix subspace structure under J-L transforms. Moreover, it suggests a general two-step strategy for dimension reduction. In the first step, a random subspace is created from the application of the J-L transform to the matrix A. A rank-k approximation of A is then obtained in the second step after projecting A onto the subspace generated in the first step. Thus, the randomization in step 1 for the J-L transform construction enables a new approach to SVD approximation with constant probability. We cover the important topic of random multiplier matrices given by J-L transforms in more detail in the next section. Another important implication is that to arrive at a rank-k approximation of A, we must form r > k random linear combinations of columns of A.

Sarlos' result was actually preceded in the NLA literature by Papadimitriou et al. [12], who initially proposed using random projections in Latent Semantic Indexing (LSI) applications. Briefly, LSI is concerned with information retrieval and the evaluation of the spectral properties of term-document matrices, which capture documents along one matrix dimension and the terms found in those documents along the other. Each entry in the matrix contains a count of the occurrences of the particular term in a given document. Papadimitriou's result provides a weaker additive error bound than Sarlos' in terms of a rank-2k approximation Â_{2k} of a matrix A:

‖A − Â_{2k}‖_F² ≤ ‖A − A_k‖_F² + 2ε ‖A‖_F²    (3.3)

The weakness of the additive error bound is due to the second term on the right-hand side, because ‖A‖_F can be arbitrarily large. Nonetheless, Papadimitriou provided mathematical rigor in explaining why LRA can be used to capture the salient features of term-document matrices. Eventually, a relative-error bound in the spectral norm was given by Halko et al. [1], which relies on a power iteration to obtain the following result.

Let A ∈ R^{m×n}. If B is an n × 2k Gaussian matrix and Y = (AA*)^q AB, where q is a small non-negative integer, 2k is the target rank of the approximation, and 2 ≤ k ≤ 0.5 min{m, n}, then

E ‖A − Proj_{Y,2k}(A)‖_2 ≤ [1 + 4 √(2 min(m, n) / (k − 1))]^{1/(2q+1)} ‖A − A_k‖_2    (3.4)

The power iteration factor (AA*)^q appears in Y to address any case of slow decay in the singular values of A that might otherwise negatively affect the LRA accuracy. Thus, the accuracy of the approximation can be refined by a larger choice of q. It can be shown that the SVD of (AA*)^q A preserves the left and right singular vectors of A; the singular value matrix of (AA*)^q A is Σ^{2q+1}, where Σ is the singular value matrix of A. In practice, the approach employed in [1] for most input matrices does not utilize a power iteration (i.e., q = 0). Instead, an oversampling parameter p, a small positive integer, is added to the desired rank-k value to specify the size of an n × (k + p) random multiplier matrix. The choice of value for p involves a number of factors; please see [1] for more details. An improvement to their proof was given by Woodruff [13] that realizes an actual rank-k approximation. Woodruff also refined the proof using results for bounds on the maximum and minimum singular values of Gaussian random matrices [14].
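The two-step strategy described above, together with the power iteration and oversampling of [1], might be sketched as follows (assuming NumPy; the oversampling p and power q values are illustrative tuning choices, not prescribed ones):

```python
import numpy as np

def randomized_lra(A, k, p=10, q=1, rng=None):
    """Rank-k approximation via Y = (A A^T)^q A B with B Gaussian of size n x (k + p)."""
    rng = rng or np.random.default_rng()
    m, n = A.shape
    B = rng.standard_normal((n, k + p))      # random multiplier (sketching matrix)
    Y = A @ B
    for _ in range(q):                       # power iteration sharpens spectral decay
        Y = A @ (A.T @ Y)                    # (re-orthonormalize here for extra stability)
    Q, _ = np.linalg.qr(Y)                   # orthonormal basis for the sampled subspace
    S = Q.T @ A                              # small (k + p) x n projected matrix
    Uh, s, Vh = np.linalg.svd(S, full_matrices=False)
    U = Q @ Uh[:, :k]
    return U, s[:k], Vh[:k, :]

if __name__ == "__main__":
    rng = np.random.default_rng(8)
    A = rng.standard_normal((1000, 40)) @ rng.standard_normal((40, 800))
    A += 1e-3 * rng.standard_normal(A.shape)
    U, s, Vh = randomized_lra(A, k=40, rng=rng)
    err = np.linalg.norm(A - U @ np.diag(s) @ Vh) / np.linalg.norm(A)
    print("relative Frobenius error:", err)
```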

From the relative-error bound for a matrix A, we have seen that given a sample of r random linear combinations of the columns of A, we may obtain a rank-k approximation of A with k < r. Perhaps a more insightful explanation is to recognize the following:
1. Multiplying A by a random vector x yields a vector y ∈ colspace(A).
2. With high probability, a set of r such y's is linearly independent.
3. A new approximate basis Â for A consisting of the y's has dimension r.
4. If A is projected onto Â, a rank-k matrix decomposition of this projection approximates the truncated rank-k SVD decomposition of A.

The most expensive aspect of the random projection approach is the multiplication of A by a random multiplier matrix. On the one hand, matrix multiplication is an embarrassingly parallel operation, but the concern of memory bottlenecks arises with MMDSs, and the type of random multiplier involved can have an adverse impact on wall-clock performance. We shall see in the next section that we may use structured random multipliers besides Gaussian matrices. Structured matrices are beneficial in reducing the number of floating point operations (FLOPs) as compared to Gaussian matrices, as well as in the amount of storage that they require.

3.2 Subspace Projections with Random Multiplier Matrices

In the prior subsection, the discussion of J-L transforms described Gaussian random matrices or matrices of Bernoulli random {+1, −1} entries. Unfortunately, matrix-vector and matrix-matrix multiplication using such dense matrices is expensive. In the case of matrix-vector multiplication with a dense matrix in R^{k×d}, O(kd) arithmetic operations are required for each vector X ∈ R^d. One alternative involves the sparsification of the J-L transform, first proposed by Achlioptas [10]. In this sparse variant of the J-L transform, each element is chosen from a probability distribution in which {+1, −1} are each chosen with probability 1/6 and zero is selected with probability 2/3. A scaling constant √(3/k) completes the definition of this J-L transform. While this approach can be effective for dense vectors, it is problematic when the vector itself is also sparse. While researchers have focused attention in recent years on J-L transforms for the specific case of sparse vectors, we concern ourselves for the remainder of this section with the general case.
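The sparse J-L transform of Achlioptas just described (entries ±1 with probability 1/6 each, zero with probability 2/3, scaled by √(3/k)) can be generated as in the following sketch (assuming NumPy; a dense representation is used for clarity only):

```python
import numpy as np

def achlioptas_transform(k, d, rng):
    """Sparse J-L multiplier: entries sqrt(3/k) * {+1, 0, -1} with prob 1/6, 2/3, 1/6."""
    entries = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * entries

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    d, k = 10000, 600
    Phi = achlioptas_transform(k, d, rng)
    print("fraction of nonzeros:", np.count_nonzero(Phi) / Phi.size)   # about 1/3

    x = rng.standard_normal(d)
    print("||Phi x|| / ||x|| =", np.linalg.norm(Phi @ x) / np.linalg.norm(x))  # near 1
```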

The next significant result was the introduction of the Fast Johnson-Lindenstrauss Transform (FJLT) by Ailon and Chazelle [15]. Their efforts addressed the limitations of processing sparse vectors while reducing the complexity of dense matrix-vector multiplication. They introduced a transform, a random structured matrix, defined as the product of three matrix factors in which two of the matrices are randomized and the third is the Hadamard matrix. Let the FJLT be Φ = PHD with d = 2^l, such that:

P ∈ R^{k×d} and H, D ∈ R^{d×d};
P_ij ~ N(0, q^{-1}) with probability q, and P_ij = 0 with probability 1 − q, where q = min(Θ(log² n / d), 1);
H is the d × d Hadamard matrix normalized so that its entries are ±d^{-1/2}, defined recursively for d = 2^h, h = 0, 1, ..., l;
D is diagonal with the D_ii drawn independently from {+1, −1}, each with probability 1/2.

Then we have, with probability at least 2/3, for each X_i ∈ R^d:

(1 − ε) k ‖X_i‖_2 ≤ ‖ΦX_i‖_2 ≤ (1 + ε) k ‖X_i‖_2    (3.5)

Building Φ requires O(d log d + min(dε^{-2} log n, ε^{-2} log³ n)) operations. Moreover, the complexity of matrix-vector multiplication is O(d log d + |P|), where |P| denotes the number of nonzero entries of P. The motivation for the FJLT is to have a matrix with sparsity as proposed by Achlioptas that also avoids the sparse-vector scenario while reducing the arithmetic complexity of multiplication. The H and D matrices are orthogonal; therefore, matrix-vector multiplication with them preserves vector norms and the distances between vectors. According to [15], H densifies sparse vectors while D provides enough randomization to prevent dense vectors from becoming sparse. The P matrix provides the sparsity of the transform, similarly as in [16]. It is the structure inherent in the FJLT, as given by the recursive definition of H, that provides the improved matrix-vector multiplication complexity, rather than the sparsity given in P.

Woolfe et al. [16] subsequently applied the FJLT to LRA by formulating the subsampled random Fourier transform (SRFT) in the complex case. Let the SRFT be Φ = √(n/l) DFS, such that:

D ∈ C^{n×n} is a diagonal matrix whose entries are i.i.d. random variables distributed uniformly on the unit circle;
F is the n × n Discrete Fourier Transform (DFT) matrix;
S is an n × l matrix whose columns are sampled uniformly from the columns of the n × n identity matrix.

Therefore, a random subspace may be created from an m × n matrix using the SRFT as a random multiplier. In comparison to a Gaussian multiplier, which requires nl random entries, the SRFT requires only n + l random entries: n entries for the matrix D and l for the matrix S. The SRFT matrix-matrix multiplication is performed using O(mn log l) flops. In their algorithm, the accuracy of a rank-k approximation Â_k of A ∈ C^{m×n}, for real α, β > 1 such that m ≥ l ≥ α²β(α − 1)^{-2}(2k)², holds with probability at least 1 − 3β^{-1}:

‖A − Â_k‖_2 ≤ 2(√(2α − 1) + 1) √(α max(m, n)) ‖A − A_k‖_2    (3.6)

A similar random structured matrix may be formed with the DFT matrix replaced by a Hadamard matrix and the diagonal matrix D consisting of entries randomly chosen from {+1, −1}, as in the SRFT case.
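An SRFT of the form Φ = √(n/l) DFS can be applied implicitly with the FFT, as in the sketch below (assuming NumPy; FFT normalization conventions differ across libraries, so the constant factors here are illustrative):

```python
import numpy as np

def srft_sketch(A, l, rng):
    """Return the m x l complex sketch A @ Phi with Phi built from D, F, and S."""
    m, n = A.shape
    d = np.exp(2j * np.pi * rng.random(n))          # D: random unit-modulus diagonal
    cols = rng.choice(n, size=l, replace=False)     # S: uniform column subsample
    AD = A * d                                      # apply D (broadcast over rows)
    ADF = np.fft.fft(AD, axis=1) / np.sqrt(n)       # apply a unitary DFT row-wise
    return np.sqrt(n / l) * ADF[:, cols]            # subsample and rescale

if __name__ == "__main__":
    rng = np.random.default_rng(10)
    A = rng.standard_normal((400, 30)) @ rng.standard_normal((30, 1024))
    Y = srft_sketch(A, l=60, rng=rng)               # random subspace of the range of A
    Q, _ = np.linalg.qr(Y)                          # orthonormal basis, as in Section 3.1
    err = np.linalg.norm(A - Q @ (Q.conj().T @ A)) / np.linalg.norm(A)
    print("relative error of projection onto the SRFT subspace:", err)
```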

The primary drawback of SRFTs compared to Gaussian matrices is a theoretically higher probability of failure [1]. For Gaussian matrices the probability of failure with oversampling parameter p is on the order of e^{−p}, while for SRFTs and a rank-k approximation it increases to roughly 1/k. We may ask what other types of random structured matrices can be used that are perhaps faster than the SRFT for dimension reduction. The key to answering this question lies in a change in the definitions of the input matrix A and the random Gaussian multiplier B used to generate a random subspace in the first step of dimension reduction. According to the Dual Theorem in Pan et al. [17], assume that A ∈ R^{m×n} is an average input matrix with numerical rank at most r under the Gaussian probability distribution and that B ∈ R^{n×l} has numerical rank l. If l ≥ r, then dimension reduction using B succeeds in outputting a rank-l approximation to A. This implies that a unitary matrix B, or a matrix that is both full-rank and well-conditioned (with reasonably bounded condition number), is sufficient. Recall that the condition number of a unitary matrix is equal to one. This result suggests that we can expect success with random multipliers that are formed using structured and sparse orthogonal matrices. Moreover, as concerns MMDSs, we also benefit from the use of structured multipliers by reducing the memory space needed for their storage. Therefore, the possibility exists to find more efficient random multipliers, that is, ones having a lower complexity bound than the O(mn log l) flops for matrix-matrix multiplication with the SRFT. One such possibility is to employ Hadamard and Fourier matrices that are defined up to only a few recursive levels, thus inducing a sparse and orthogonal matrix. Indeed, the numerical experiments in [17] show that random multipliers using these abridged and orthogonal Hadamard matrices in place of a full Hadamard matrix are promising. Currently, there is no formal support for this specific type of multiplier, though it should not be surprising that orthogonal matrices are effective multipliers, given that matrix-vector multiplication with them preserves vector norms as well as the distances and angles between vectors.

3.3 Approximations with Column (or Row) Subsets

We now turn our attention to the problem of identifying a suitable subset of columns (or rows) of a matrix A ∈ R^{m×n} that may optionally be processed further in some manner in order to obtain an approximation of A. We previously saw a classical result in an earlier section concerning the existence of a k-column subset C of A such that

‖A − CC⁺A‖_2 ≤ √(1 + k(n − k)) ‖A − A_k‖_2    (3.7)

The CC⁺A term represents a CX approximation to A, for X := C⁺A, in which A is projected onto the column space of C by the projection matrix CC⁺. An influential paper by Frieze et al. [19] introduced a strategy of creating sampling probabilities from the Euclidean norms of the rows and columns of a matrix, from which a subset of columns and rows is subsequently sampled. Their key theoretical finding assumes probabilities P_i for the rows A_{(i)}, i = 1, ..., m, and a constant c ≤ 1 such that

P_i ≥ c ‖A_{(i)}‖² / ‖A‖_F²    (3.8)
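Row sampling with probabilities proportional to squared Euclidean norms, as in (3.8), can be sketched as follows (assuming NumPy; the rescaling of sampled rows is the usual device for unbiasedness, and the final projection step anticipates the results discussed next):

```python
import numpy as np

def sample_rows_by_norm(A, r, rng):
    """Sample r rows of A with probability proportional to their squared norms."""
    probs = np.sum(A * A, axis=1) / np.linalg.norm(A, "fro") ** 2   # eq. (3.8) with c = 1
    idx = rng.choice(A.shape[0], size=r, replace=True, p=probs)
    # Rescale so that R^T R is an unbiased estimator of A^T A.
    R = A[idx, :] / np.sqrt(r * probs[idx])[:, None]
    return R, idx

if __name__ == "__main__":
    rng = np.random.default_rng(11)
    A = rng.standard_normal((3000, 50)) @ rng.standard_normal((50, 200))
    R, idx = sample_rows_by_norm(A, r=400, rng=rng)

    # Project A onto the subspace spanned by the sampled rows.
    _, _, Vh = np.linalg.svd(R, full_matrices=False)
    k = 50
    W = Vh[:k, :]                      # top right singular vectors of the sample
    err = np.linalg.norm(A - (A @ W.T) @ W, "fro") / np.linalg.norm(A, "fro")
    print("relative Frobenius error using sampled-row subspace:", err)
```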

Theorem 1 [19]: Let R be a sample of r rows of A chosen from the above distribution and let W be the vector space spanned by R. With probability at least 0.9 there exists an orthonormal set of vectors w^(1), w^(2), ..., w^(k) in W such that

‖A − Σ_{i=1}^{k} A w^(i) (w^(i))^T‖_F² ≤ ‖A − A_k‖_F² + (10k / (cr)) ‖A‖_F²    (3.9)

The authors applied Theorem 1 to provide an algorithm that samples a subset of columns and rows of A to form a rank-k approximation Â_k of A with additive error bound

‖A − Â_k‖_F² ≤ ‖A − A_k‖_F² + ε ‖A‖_F²    (3.10)

The additive error bound is weak in the sense that the term ‖A‖_F² on the right-hand side may be arbitrarily large. While the algorithm has polynomial time complexity in k and 1/ε, the complexity bound does not include the computation of the sampling probabilities for the rows and columns of A, and the sample complexity (number of rows) is O(k⁴). Otherwise, it is interesting to note that the running time is independent of the matrix size.

Subsequently, Deshpande et al. [20] improved upon the above result of [19] by utilizing a volume-sampling technique to generate a multiplicative error bound, which is more refined than its additive counterpart. They showed that there exists a set of k rows of A whose span contains a rank-k matrix Ã_k that is a multiplicative approximation to the best rank-k matrix approximation A_k:

‖A − Ã_k‖_F ≤ √(k + 1) ‖A − A_k‖_F    (3.11)

Moreover, they extended this result into a stronger relative-error approximation in the following theorem.

Theorem 2 [20]: In any m × n matrix A there exist O(k²/ε) rows in whose span are rows that form a rank-k matrix Ã_k, for an error parameter ε, such that

‖A − Ã_k‖_F² ≤ (1 + ε) ‖A − A_k‖_F²    (3.12)

The volume sampling method utilized in this paper relies on volume distributions constructed over the k-subsets of rows of A. The volume of a matrix B containing k rows is

vol(B) = (1/k!) √(det(BB^T))    (3.13)

Thus, a k-row subset is chosen with probability proportional to the square of its volume. The improved error bound of Deshpande over that of [19] can at least in part be attributed to a volume metric that captures information about the matrix as a whole, as opposed to the Euclidean norms associated with individual rows and columns. However, Frieze's algorithm involves only two passes over the data, while Deshpande's requires multiple passes to obtain the relative-error approximation.
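For intuition about the volume distribution (3.13), the following toy sketch (assuming NumPy; brute-force enumeration, so only feasible for very small matrices) computes the squared-volume sampling probability of every k-row subset:

```python
import numpy as np
from itertools import combinations
from math import factorial

def volume_sampling_probs(A, k):
    """Return {subset: probability} with probability proportional to vol(B)^2, eq. (3.13)."""
    weights = {}
    for subset in combinations(range(A.shape[0]), k):
        B = A[list(subset), :]
        vol_sq = np.linalg.det(B @ B.T) / factorial(k) ** 2   # vol(B)^2
        weights[subset] = max(vol_sq, 0.0)                    # guard against tiny negatives
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(12)
    A = rng.standard_normal((8, 5))
    probs = volume_sampling_probs(A, k=2)
    best = max(probs, key=probs.get)
    print("most probable 2-row subset:", best, "with probability", probs[best])
```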

Both the algorithms of Deshpande [20] and of Frieze [19] have sample complexity that is at least quadratic for CX approximation. This complexity bound was subsequently reduced by Rudelson and Vershynin [21] in an approach using the Law of Large Numbers adapted to matrices. The rationale underlying their work is that if a matrix has small numerical rank, then a low rank approximation should be available from a random submatrix. Though they obtain only an additive error approximation, it is done with an algorithm using at most two passes over the data and with O(k log k) sample complexity. Their additive error is given in the spectral norm.

Theorem 3 [21]: Suppose A is an m × n matrix of numerical rank r = ‖A‖_F² / ‖A‖_2², and let ε, δ ∈ (0, 1), c > 0, and d be an integer such that

m ≥ d ≥ c (r / ε⁴) log (r / (ε⁴ δ))    (3.14)

Let a random submatrix Ã of d rows of A be sampled according to their squared Euclidean norms and let U_k be the n × k matrix of the top k right singular vectors of Ã. Then with probability at least 1 − 2 exp(−c/δ) we have that

‖A − A U_k U_k^T‖_2 ≤ ‖A − A_k‖_2 + ε ‖A‖_2    (3.15)

Another point of interest is that the sample size d depends upon the numerical rank and not on the desired rank-k value of the approximation, as it does in [19, 20].

Finally, an algorithm that realizes the CX relative-error existential result of Deshpande et al. [20] was given by Drineas et al. [22]. However, they take a different approach from previous papers concerning the construction of sampling probabilities. Recall that the previous sampling probabilities given in the literature utilized the squared Euclidean norms of the rows (or columns) of a matrix A. Drineas introduced the idea of subspace sampling according to the squared norms of the right singular vectors of A. Their argument is that this is an improvement over the previous column (row) sampling of the matrix due to linear span considerations. Suppose that the i-th column of a matrix A is given by

A^(i) = UΣ(V^T)^(i)    (3.16)

Therefore, (V^T)^(i) is in some sense a measure of the extent to which A^(i) is contained in the span of U. The effect of Σ is eliminated, as compared to the previous probability distribution approach, because it does not affect the span of U. Consequently, the probability distributions p_i for i = 1, ..., n, otherwise known as leverage scores, associated with each column i of the best rank-k approximation A_k are given by

p_i = ‖(V_k^T)^(i)‖_2² / k    (3.17)

The theorem for the LRA algorithm follows.

Theorem 4 [22]: Suppose A is an m × n matrix containing real entries and k ≤ min(m, n) is an integer. Then there exist randomized algorithms that choose a c-column subset C of A, where c = O(k² log(1/δ) / ε²), such that for ε, δ ∈ (0, 1] we have, with probability at least 1 − δ,

‖A − CC⁺A‖_F ≤ (1 + ε) ‖A − A_k‖_F    (3.18)

A similar result holds if at most c = O(k log k · log(1/δ) / ε²) columns are chosen in expectation.
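Subspace (leverage-score) sampling in the spirit of Theorem 4 might be sketched as follows (assuming NumPy; the number of sampled columns c is passed in directly rather than set from the theorem's bound, and exact rather than approximate singular vectors are used):

```python
import numpy as np

def leverage_score_cx(A, k, c, rng):
    """CX approximation with columns sampled by rank-k leverage scores, eq. (3.17)."""
    _, _, Vh = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vh[:k, :] ** 2, axis=0) / k          # leverage scores p_i, summing to 1
    cols = rng.choice(A.shape[1], size=c, replace=True, p=lev)
    C = A[:, cols]                                    # sampled column subset
    X = np.linalg.pinv(C) @ A                         # X = C^+ A
    return C, X, cols

if __name__ == "__main__":
    rng = np.random.default_rng(13)
    A = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 300))
    A += 0.01 * rng.standard_normal(A.shape)
    k = 20
    C, X, cols = leverage_score_cx(A, k, c=80, rng=rng)

    _, s, _ = np.linalg.svd(A, full_matrices=False)
    best_k_err = np.sqrt(np.sum(s[k:] ** 2))          # ||A - A_k||_F
    cx_err = np.linalg.norm(A - C @ X, "fro")
    print("CX error vs best rank-k error:", cx_err, best_k_err)
```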

The complexity bound to build the CX decomposition in this algorithm is dominated by the cost required to derive the right singular vectors of the rank-k approximation. A less expensive alternative is to use approximate leverage scores. These may be derived, according to a relative-error bound approximation in [23], with cost O(mn log k) to obtain the sampling probabilities corresponding to the top k singular vectors. An area of further research concerns the preconditioning of the matrix A by a suitable multiplier matrix such that the leverage scores of the product matrix are approximately uniform. In this case, the sampling algorithm includes a post-processing step to recover the original matrix. This strategy can be justified if the two matrix multiplications can be performed inexpensively, and it implies using orthogonal multipliers, as their inverses are readily available. A randomized algorithm that satisfies Theorem 4 is presented in Section 4.

3.4 Approximations with Column-Row Subset Combinations

We now extend the discussion of the column (row) subset approximation results of the prior section to approximations using column and row subsets simultaneously. Such approximations in the TCS community are commonly referred to as CUR decompositions, while among NLA researchers the terms CGR and matrix skeleton are also used. An investigation concerning the adaptation of the sampling approach of the last section to CUR can be found in Drineas et al. [24]. Their algorithms are devised with the goal of accommodating out-of-core massive data sets, so that only O(cm + nr) RAM is required to obtain a CUR for A ∈ R^{m×n}, C ∈ R^{m×c}, and R ∈ R^{r×n}. Moreover, at most three passes over the matrix A are required. Their linear-time algorithm, as with the column subset algorithms of the prior section, relies on sampling probabilities. These probabilities are proportional to the squared Euclidean norms p_i and q_j of the rows and columns, respectively, of A. Thus, C is computed by sampling c columns of A according to the column probabilities q_j. Likewise, R is constructed from r sampled rows of A according to the row probabilities p_i. The column and row subsets are scaled, and the matrix U is derived from additional processing of C and R. More formally, we have the following sampling probabilities:

p_i = ‖A_{(i)}‖² / ‖A‖_F²,  i = 1, ..., m    (3.19)

q_j = ‖A^{(j)}‖² / ‖A‖_F²,  j = 1, ..., n    (3.20)

The algorithm has cost complexity O(max(m, n)) and requires one pass over A to compute the probabilities and a second pass in which the matrices C and R are simultaneously obtained. Additive error bounds in expectation are given as follows for 1 ≤ k ≤ min(c, r), provided that c ≥ 64k/ε⁴ and r ≥ k/ε²:

E[‖A − Â_k‖_F] ≤ ‖A − A_k‖_F + ε ‖A‖_F    (3.21)

E[‖A − Â_k‖_2] ≤ ‖A − A_k‖_2 + ε ‖A‖_F    (3.22)

Though the algorithm is polynomial in k and 1/ε, it is linear in the input matrix dimensions. It is assumed that the desired rank k is much smaller than min(m, n) and that c and r are sufficiently small to be considered constants.
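The linear-time CUR construction just described might be sketched as follows (assuming NumPy; taking U as a pseudoinverse of the scaled intersection block is one common convention, consistent with the skeleton decomposition of Section 2.4, and the sample sizes are illustrative):

```python
import numpy as np

def simple_cur(A, c, r, rng):
    """CUR sketch: columns/rows sampled by the squared-norm probabilities (3.19)-(3.20)."""
    fro2 = np.linalg.norm(A, "fro") ** 2
    q = np.sum(A * A, axis=0) / fro2                 # column probabilities q_j
    p = np.sum(A * A, axis=1) / fro2                 # row probabilities p_i

    jc = rng.choice(A.shape[1], size=c, replace=True, p=q)
    ir = rng.choice(A.shape[0], size=r, replace=True, p=p)

    C = A[:, jc] / np.sqrt(c * q[jc])[None, :]       # scaled column subset
    R = A[ir, :] / np.sqrt(r * p[ir])[:, None]       # scaled row subset
    W = C[ir, :] / np.sqrt(r * p[ir])[:, None]       # scaled intersection block
    # Pseudoinverse with a modest cutoff so numerically-zero singular values are dropped.
    U = np.linalg.pinv(W, rcond=1e-8)
    return C, U, R

if __name__ == "__main__":
    rng = np.random.default_rng(14)
    A = rng.standard_normal((800, 30)) @ rng.standard_normal((30, 600))
    C, U, R = simple_cur(A, c=150, r=150, rng=rng)
    err = np.linalg.norm(A - C @ U @ R, "fro") / np.linalg.norm(A, "fro")
    print("relative Frobenius error of CUR:", err)
```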

The additive error bounds for CUR were subsequently improved to relative error bounds by Drineas et al. [22] by extending the relative-error bound result for CX decompositions in the same paper. Once again, they use a modified sampling probability strategy based on the singular vectors of the input matrix. The complexity of the algorithm is bounded by the time required to compute the squared Euclidean norms (or approximations thereof) of the singular vectors. A rough sketch of the algorithm is presented here; it is discussed in more detail in Section 4. The matrix C, a column subset of A, is generated by the algorithm of Theorem 4. The left singular vectors U_C of C are then used to compute probabilities q_i for the sampling of r rows from A to form R. The U matrix is the pseudoinverse of the matrix W ∈ R^{r×c} that is the intersection of R with C. Both R and W are scaled similarly to C. The row probabilities are

q_i = ‖(U_C)_{(i)}‖_2² / c,  i = 1, ..., m    (3.23)

The key theorem that enables the relative error bound for CUR follows.

Theorem 5 [22]: Suppose A is an m × n matrix containing real entries and C is an m × c matrix containing c columns of A obtained with the algorithm of Theorem 4. Set r = 3200 c² / ε² for ε ∈ (0, 1], and choose r rows of A (and the corresponding ones in C) as in the algorithm described immediately above. Then with probability at least 0.7 we have that

‖A − CUR‖_F ≤ (1 + ε) ‖A − CC⁺A‖_F    (3.24)

A similar result holds if at most r = O(c log c / ε²) rows are chosen in expectation. Please see [22] for further details. Theorems 4 and 5 can now be combined to obtain the final CUR relative error bound, as given in Section 5.2 of [22], where ε_p = 3ε:

‖A − CUR‖_F ≤ (1 + ε) ‖A − CC⁺A‖_F ≤ (1 + ε)² ‖A − A_k‖_F    (3.25)

(1 + ε)² ‖A − A_k‖_F ≤ (1 + ε_p) ‖A − A_k‖_F    (3.26)

According to [22], sampling according to the probabilities q_i defined above is done so that R may contain rows that capture a similar subspace to that spanned by the c left singular vectors of C. The running time of the algorithm, omitting the CX decomposition algorithm of Theorem 4 used to obtain C, is O(mn). The requirements of O(k log k / ε²) columns and O(c log c / ε²) rows for a rank-k CUR approximation were subsequently lowered in a paper by Boutsidis and Woodruff [25]. Though [25] is conceptually similar to [22] in its overall approach, Boutsidis and Woodruff employ approximation and sparsification results from the literature to obtain an algorithm with running time O(nnz(A)), where nnz(A) is the number of nonzero elements of A. Furthermore, their randomized algorithm requires only c = O(k/ε) columns and r = O(k/ε) rows for a rank-k CUR decomposition. The U matrix is constructed such that rank(U) = k.
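The two-stage procedure behind Theorem 5 (columns by the leverage scores of A, then rows by the leverage scores (3.23) of C, with U built from the intersection block W) might be sketched as follows (assuming NumPy; the sample sizes are far below the theorem's prescriptions, scaling factors are omitted, and U is taken as a rank-k pseudoinverse of W for numerical stability):

```python
import numpy as np

def relative_error_cur(A, k, c, r, rng):
    """CUR with subspace sampling: columns via leverage scores of A, rows via C."""
    # Stage 1: sample c columns of A by its rank-k (right singular vector) leverage scores.
    _, _, Vh = np.linalg.svd(A, full_matrices=False)
    p_cols = np.sum(Vh[:k, :] ** 2, axis=0) / k
    jc = rng.choice(A.shape[1], size=c, replace=True, p=p_cols)
    C = A[:, jc]

    # Stage 2: sample r rows by the leverage scores of C's left singular vectors.
    Uc, _, _ = np.linalg.svd(C, full_matrices=False)
    q_rows = np.sum(Uc ** 2, axis=1) / Uc.shape[1]      # eq. (3.23)
    ir = rng.choice(A.shape[0], size=r, replace=True, p=q_rows)
    R = A[ir, :]

    W = C[ir, :]                                        # intersection of R with C
    # Rank-k pseudoinverse of W, so that rank(U) = k and tiny singular values are ignored.
    Uw, sw, Vwh = np.linalg.svd(W, full_matrices=False)
    U = Vwh[:k, :].T @ np.diag(1.0 / sw[:k]) @ Uw[:, :k].T
    return C, U, R

if __name__ == "__main__":
    rng = np.random.default_rng(15)
    A = rng.standard_normal((600, 25)) @ rng.standard_normal((25, 400))
    A += 0.01 * rng.standard_normal(A.shape)
    k = 25
    C, U, R = relative_error_cur(A, k, c=100, r=100, rng=rng)
    _, s, _ = np.linalg.svd(A, full_matrices=False)
    print("CUR error:", np.linalg.norm(A - C @ U @ R, "fro"),
          " best rank-k error:", np.sqrt(np.sum(s[k:] ** 2)))
```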

A fast randomized algorithm for overdetermined linear least-squares regression

A fast randomized algorithm for overdetermined linear least-squares regression A fast randomized algorithm for overdetermined linear least-squares regression Vladimir Rokhlin and Mark Tygert Technical Report YALEU/DCS/TR-1403 April 28, 2008 Abstract We introduce a randomized algorithm

More information

A fast randomized algorithm for approximating an SVD of a matrix

A fast randomized algorithm for approximating an SVD of a matrix A fast randomized algorithm for approximating an SVD of a matrix Joint work with Franco Woolfe, Edo Liberty, and Vladimir Rokhlin Mark Tygert Program in Applied Mathematics Yale University Place July 17,

More information

Random Methods for Linear Algebra

Random Methods for Linear Algebra Gittens gittens@acm.caltech.edu Applied and Computational Mathematics California Institue of Technology October 2, 2009 Outline The Johnson-Lindenstrauss Transform 1 The Johnson-Lindenstrauss Transform

More information

Randomized Numerical Linear Algebra: Review and Progresses

Randomized Numerical Linear Algebra: Review and Progresses ized ized SVD ized : Review and Progresses Zhihua Department of Computer Science and Engineering Shanghai Jiao Tong University The 12th China Workshop on Machine Learning and Applications Xi an, November

More information

EE731 Lecture Notes: Matrix Computations for Signal Processing

EE731 Lecture Notes: Matrix Computations for Signal Processing EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University September 22, 2005 0 Preface This collection of ten

More information

Randomized algorithms for the approximation of matrices

Randomized algorithms for the approximation of matrices Randomized algorithms for the approximation of matrices Luis Rademacher The Ohio State University Computer Science and Engineering (joint work with Amit Deshpande, Santosh Vempala, Grant Wang) Two topics

More information

EECS 275 Matrix Computation

EECS 275 Matrix Computation EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 22 1 / 21 Overview

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information

Linear Algebra and Eigenproblems

Linear Algebra and Eigenproblems Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details

More information

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition AM 205: lecture 8 Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition QR Factorization A matrix A R m n, m n, can be factorized

More information

A fast randomized algorithm for the approximation of matrices preliminary report

A fast randomized algorithm for the approximation of matrices preliminary report DRAFT A fast randomized algorithm for the approximation of matrices preliminary report Yale Department of Computer Science Technical Report #1380 Franco Woolfe, Edo Liberty, Vladimir Rokhlin, and Mark

More information

MANY scientific computations, signal processing, data analysis and machine learning applications lead to large dimensional

MANY scientific computations, signal processing, data analysis and machine learning applications lead to large dimensional Low rank approximation and decomposition of large matrices using error correcting codes Shashanka Ubaru, Arya Mazumdar, and Yousef Saad 1 arxiv:1512.09156v3 [cs.it] 15 Jun 2017 Abstract Low rank approximation

More information

Subset Selection. Deterministic vs. Randomized. Ilse Ipsen. North Carolina State University. Joint work with: Stan Eisenstat, Yale

Subset Selection. Deterministic vs. Randomized. Ilse Ipsen. North Carolina State University. Joint work with: Stan Eisenstat, Yale Subset Selection Deterministic vs. Randomized Ilse Ipsen North Carolina State University Joint work with: Stan Eisenstat, Yale Mary Beth Broadbent, Martin Brown, Kevin Penner Subset Selection Given: real

More information

Parallel Numerical Algorithms

Parallel Numerical Algorithms Parallel Numerical Algorithms Chapter 6 Matrix Models Section 6.2 Low Rank Approximation Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Edgar

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they

More information

Numerical Analysis Lecture Notes

Numerical Analysis Lecture Notes Numerical Analysis Lecture Notes Peter J Olver 8 Numerical Computation of Eigenvalues In this part, we discuss some practical methods for computing eigenvalues and eigenvectors of matrices Needless to

More information

14.2 QR Factorization with Column Pivoting

14.2 QR Factorization with Column Pivoting page 531 Chapter 14 Special Topics Background Material Needed Vector and Matrix Norms (Section 25) Rounding Errors in Basic Floating Point Operations (Section 33 37) Forward Elimination and Back Substitution

More information

Math 407: Linear Optimization

Math 407: Linear Optimization Math 407: Linear Optimization Lecture 16: The Linear Least Squares Problem II Math Dept, University of Washington February 28, 2018 Lecture 16: The Linear Least Squares Problem II (Math Dept, University

More information

Randomized algorithms for the low-rank approximation of matrices

Randomized algorithms for the low-rank approximation of matrices Randomized algorithms for the low-rank approximation of matrices Yale Dept. of Computer Science Technical Report 1388 Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert

More information

Lecture 5: Randomized methods for low-rank approximation

Lecture 5: Randomized methods for low-rank approximation CBMS Conference on Fast Direct Solvers Dartmouth College June 23 June 27, 2014 Lecture 5: Randomized methods for low-rank approximation Gunnar Martinsson The University of Colorado at Boulder Research

More information

Randomized algorithms for matrix computations and analysis of high dimensional data

Randomized algorithms for matrix computations and analysis of high dimensional data PCMI Summer Session, The Mathematics of Data Midway, Utah July, 2016 Randomized algorithms for matrix computations and analysis of high dimensional data Gunnar Martinsson The University of Colorado at

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Numerical Methods. Elena loli Piccolomini. Civil Engeneering. piccolom. Metodi Numerici M p. 1/??

Numerical Methods. Elena loli Piccolomini. Civil Engeneering.  piccolom. Metodi Numerici M p. 1/?? Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Notes on Householder QR Factorization

Notes on Householder QR Factorization Notes on Householder QR Factorization Robert A van de Geijn Department of Computer Science he University of exas at Austin Austin, X 7872 rvdg@csutexasedu September 2, 24 Motivation A fundamental problem

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 7: More on Householder Reflectors; Least Squares Problems Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 15 Outline

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b

LU Factorization. LU factorization is the most common way of solving linear systems! Ax = b LUx = b AM 205: lecture 7 Last time: LU factorization Today s lecture: Cholesky factorization, timing, QR factorization Reminder: assignment 1 due at 5 PM on Friday September 22 LU Factorization LU factorization

More information

MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix

MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix MODULE 8 Topics: Null space, range, column space, row space and rank of a matrix Definition: Let L : V 1 V 2 be a linear operator. The null space N (L) of L is the subspace of V 1 defined by N (L) = {x

More information

Lecture 12: Randomized Least-squares Approximation in Practice, Cont. 12 Randomized Least-squares Approximation in Practice, Cont.

Lecture 12: Randomized Least-squares Approximation in Practice, Cont. 12 Randomized Least-squares Approximation in Practice, Cont. Stat60/CS94: Randomized Algorithms for Matrices and Data Lecture 1-10/14/013 Lecture 1: Randomized Least-squares Approximation in Practice, Cont. Lecturer: Michael Mahoney Scribe: Michael Mahoney Warning:

More information

Rank minimization via the γ 2 norm

Rank minimization via the γ 2 norm Rank minimization via the γ 2 norm Troy Lee Columbia University Adi Shraibman Weizmann Institute Rank Minimization Problem Consider the following problem min X rank(x) A i, X b i for i = 1,..., k Arises

More information

This can be accomplished by left matrix multiplication as follows: I

This can be accomplished by left matrix multiplication as follows: I 1 Numerical Linear Algebra 11 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method

More information

Fast Dimension Reduction

Fast Dimension Reduction Fast Dimension Reduction MMDS 2008 Nir Ailon Google Research NY Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes (with Edo Liberty) The Fast Johnson Lindenstrauss Transform (with Bernard

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

SUBSET SELECTION ALGORITHMS: RANDOMIZED VS. DETERMINISTIC

SUBSET SELECTION ALGORITHMS: RANDOMIZED VS. DETERMINISTIC SUBSET SELECTION ALGORITHMS: RANDOMIZED VS. DETERMINISTIC MARY E. BROADBENT, MARTIN BROWN, AND KEVIN PENNER Advisors: Ilse Ipsen 1 and Rizwana Rehman 2 Abstract. Subset selection is a method for selecting

More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

Applied Numerical Linear Algebra. Lecture 8

Applied Numerical Linear Algebra. Lecture 8 Applied Numerical Linear Algebra. Lecture 8 1/ 45 Perturbation Theory for the Least Squares Problem When A is not square, we define its condition number with respect to the 2-norm to be k 2 (A) σ max (A)/σ

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 13 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 208 LECTURE 3 need for pivoting we saw that under proper circumstances, we can write A LU where 0 0 0 u u 2 u n l 2 0 0 0 u 22 u 2n L l 3 l 32, U 0 0 0 l n l

More information

Linear Analysis Lecture 16

Linear Analysis Lecture 16 Linear Analysis Lecture 16 The QR Factorization Recall the Gram-Schmidt orthogonalization process. Let V be an inner product space, and suppose a 1,..., a n V are linearly independent. Define q 1,...,

More information

Orthogonalization and least squares methods

Orthogonalization and least squares methods Chapter 3 Orthogonalization and least squares methods 31 QR-factorization (QR-decomposition) 311 Householder transformation Definition 311 A complex m n-matrix R = [r ij is called an upper (lower) triangular

More information

A randomized algorithm for approximating the SVD of a matrix

A randomized algorithm for approximating the SVD of a matrix A randomized algorithm for approximating the SVD of a matrix Joint work with Per-Gunnar Martinsson (U. of Colorado) and Vladimir Rokhlin (Yale) Mark Tygert Program in Applied Mathematics Yale University

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

Parallel Singular Value Decomposition. Jiaxing Tan

Parallel Singular Value Decomposition. Jiaxing Tan Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector

More information

Subset Selection. Ilse Ipsen. North Carolina State University, USA

Subset Selection. Ilse Ipsen. North Carolina State University, USA Subset Selection Ilse Ipsen North Carolina State University, USA Subset Selection Given: real or complex matrix A integer k Determine permutation matrix P so that AP = ( A 1 }{{} k A 2 ) Important columns

More information

Sparse BLAS-3 Reduction

Sparse BLAS-3 Reduction Sparse BLAS-3 Reduction to Banded Upper Triangular (Spar3Bnd) Gary Howell, HPC/OIT NC State University gary howell@ncsu.edu Sparse BLAS-3 Reduction p.1/27 Acknowledgements James Demmel, Gene Golub, Franc

More information

arxiv: v1 [math.na] 29 Dec 2014

arxiv: v1 [math.na] 29 Dec 2014 A CUR Factorization Algorithm based on the Interpolative Decomposition Sergey Voronin and Per-Gunnar Martinsson arxiv:1412.8447v1 [math.na] 29 Dec 214 December 3, 214 Abstract An algorithm for the efficient

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Lecture 4 Orthonormal vectors and QR factorization

Lecture 4 Orthonormal vectors and QR factorization Orthonormal vectors and QR factorization 4 1 Lecture 4 Orthonormal vectors and QR factorization EE263 Autumn 2004 orthonormal vectors Gram-Schmidt procedure, QR factorization orthogonal decomposition induced

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU)

sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) 0 overview Our Contributions: 1 overview Our Contributions: A near optimal low-rank

More information

5.6. PSEUDOINVERSES 101. A H w.

5.6. PSEUDOINVERSES 101. A H w. 5.6. PSEUDOINVERSES 0 Corollary 5.6.4. If A is a matrix such that A H A is invertible, then the least-squares solution to Av = w is v = A H A ) A H w. The matrix A H A ) A H is the left inverse of A and

More information

Lecture 18 Nov 3rd, 2015

Lecture 18 Nov 3rd, 2015 CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 18 Nov 3rd, 2015 Scribe: Jefferson Lee 1 Overview Low-rank approximation, Compression Sensing 2 Last Time We looked at three different

More information

Stability of the Gram-Schmidt process

Stability of the Gram-Schmidt process Stability of the Gram-Schmidt process Orthogonal projection We learned in multivariable calculus (or physics or elementary linear algebra) that if q is a unit vector and v is any vector then the orthogonal

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

The Singular Value Decomposition

The Singular Value Decomposition The Singular Value Decomposition An Important topic in NLA Radu Tiberiu Trîmbiţaş Babeş-Bolyai University February 23, 2009 Radu Tiberiu Trîmbiţaş ( Babeş-Bolyai University)The Singular Value Decomposition

More information

Block Bidiagonal Decomposition and Least Squares Problems

Block Bidiagonal Decomposition and Least Squares Problems Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition (Com S 477/577 Notes Yan-Bin Jia Sep, 7 Introduction Now comes a highlight of linear algebra. Any real m n matrix can be factored as A = UΣV T where U is an m m orthogonal

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

Fast Random Projections

Fast Random Projections Fast Random Projections Edo Liberty 1 September 18, 2007 1 Yale University, New Haven CT, supported by AFOSR and NGA (www.edoliberty.com) Advised by Steven Zucker. About This talk will survey a few random

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725 Consider Last time: proximal Newton method min x g(x) + h(x) where g, h convex, g twice differentiable, and h simple. Proximal

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit V: Eigenvalue Problems. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit V: Eigenvalue Problems Lecturer: Dr. David Knezevic Unit V: Eigenvalue Problems Chapter V.4: Krylov Subspace Methods 2 / 51 Krylov Subspace Methods In this chapter we give

More information

Technical Report No September 2009

Technical Report No September 2009 FINDING STRUCTURE WITH RANDOMNESS: STOCHASTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS N. HALKO, P. G. MARTINSSON, AND J. A. TROPP Technical Report No. 2009-05 September 2009 APPLIED

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

ANSWERS (5 points) Let A be a 2 2 matrix such that A =. Compute A. 2

ANSWERS (5 points) Let A be a 2 2 matrix such that A =. Compute A. 2 MATH 7- Final Exam Sample Problems Spring 7 ANSWERS ) ) ). 5 points) Let A be a matrix such that A =. Compute A. ) A = A ) = ) = ). 5 points) State ) the definition of norm, ) the Cauchy-Schwartz inequality

More information

Linear Algebra. Min Yan

Linear Algebra. Min Yan Linear Algebra Min Yan January 2, 2018 2 Contents 1 Vector Space 7 1.1 Definition................................. 7 1.1.1 Axioms of Vector Space..................... 7 1.1.2 Consequence of Axiom......................

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 26, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 55 High dimensional

More information

Faster Johnson-Lindenstrauss style reductions

Faster Johnson-Lindenstrauss style reductions Faster Johnson-Lindenstrauss style reductions Aditya Menon August 23, 2007 Outline 1 Introduction Dimensionality reduction The Johnson-Lindenstrauss Lemma Speeding up computation 2 The Fast Johnson-Lindenstrauss

More information

Singular Value Decompsition

Singular Value Decompsition Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost

More information

Numerical Linear Algebra

Numerical Linear Algebra Numerical Linear Algebra The two principal problems in linear algebra are: Linear system Given an n n matrix A and an n-vector b, determine x IR n such that A x = b Eigenvalue problem Given an n n matrix

More information

Orthonormal Transformations and Least Squares

Orthonormal Transformations and Least Squares Orthonormal Transformations and Least Squares Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 30, 2009 Applications of Qx with Q T Q = I 1. solving

More information

Spectrum-Revealing Matrix Factorizations Theory and Algorithms

Spectrum-Revealing Matrix Factorizations Theory and Algorithms Spectrum-Revealing Matrix Factorizations Theory and Algorithms Ming Gu Department of Mathematics University of California, Berkeley April 5, 2016 Joint work with D. Anderson, J. Deursch, C. Melgaard, J.

More information

Lecture 8: Linear Algebra Background

Lecture 8: Linear Algebra Background CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 8: Linear Algebra Background Lecturer: Shayan Oveis Gharan 2/1/2017 Scribe: Swati Padmanabhan Disclaimer: These notes have not been subjected

More information

Mathematical Optimisation, Chpt 2: Linear Equations and inequalities

Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Mathematical Optimisation, Chpt 2: Linear Equations and inequalities Peter J.C. Dickinson p.j.c.dickinson@utwente.nl http://dickinson.website version: 12/02/18 Monday 5th February 2018 Peter J.C. Dickinson

More information

Chapter XII: Data Pre and Post Processing

Chapter XII: Data Pre and Post Processing Chapter XII: Data Pre and Post Processing Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 XII.1 4-1 Chapter XII: Data Pre and Post Processing 1. Data

More information

Lecture 24: Element-wise Sampling of Graphs and Linear Equation Solving. 22 Element-wise Sampling of Graphs and Linear Equation Solving

Lecture 24: Element-wise Sampling of Graphs and Linear Equation Solving. 22 Element-wise Sampling of Graphs and Linear Equation Solving Stat260/CS294: Randomized Algorithms for Matrices and Data Lecture 24-12/02/2013 Lecture 24: Element-wise Sampling of Graphs and Linear Equation Solving Lecturer: Michael Mahoney Scribe: Michael Mahoney

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Saniv Kumar, Google Research, NY EECS-6898, Columbia University - Fall, 010 Saniv Kumar 9/13/010 EECS6898 Large Scale Machine Learning 1 Curse of Dimensionality Gaussian Mixture Models

More information

Review problems for MA 54, Fall 2004.

Review problems for MA 54, Fall 2004. Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on

More information

1 The linear algebra of linear programs (March 15 and 22, 2015)

1 The linear algebra of linear programs (March 15 and 22, 2015) 1 The linear algebra of linear programs (March 15 and 22, 2015) Many optimization problems can be formulated as linear programs. The main features of a linear program are the following: Variables are real

More information

7. Dimension and Structure.

7. Dimension and Structure. 7. Dimension and Structure 7.1. Basis and Dimension Bases for Subspaces Example 2 The standard unit vectors e 1, e 2,, e n are linearly independent, for if we write (2) in component form, then we obtain

More information

Least Squares. Tom Lyche. October 26, Centre of Mathematics for Applications, Department of Informatics, University of Oslo

Least Squares. Tom Lyche. October 26, Centre of Mathematics for Applications, Department of Informatics, University of Oslo Least Squares Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 26, 2010 Linear system Linear system Ax = b, A C m,n, b C m, x C n. under-determined

More information

AMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems

AMS 209, Fall 2015 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems AMS 209, Fall 205 Final Project Type A Numerical Linear Algebra: Gaussian Elimination with Pivoting for Solving Linear Systems. Overview We are interested in solving a well-defined linear system given

More information

Chapter 5 Orthogonality

Chapter 5 Orthogonality Matrix Methods for Computational Modeling and Data Analytics Virginia Tech Spring 08 Chapter 5 Orthogonality Mark Embree embree@vt.edu Ax=b version of February 08 We needonemoretoolfrom basic linear algebra

More information

Linear Systems. Carlo Tomasi

Linear Systems. Carlo Tomasi Linear Systems Carlo Tomasi Section 1 characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix and of

More information

A Fast Algorithm For Computing The A-optimal Sampling Distributions In A Big Data Linear Regression

A Fast Algorithm For Computing The A-optimal Sampling Distributions In A Big Data Linear Regression A Fast Algorithm For Computing The A-optimal Sampling Distributions In A Big Data Linear Regression Hanxiang Peng and Fei Tan Indiana University Purdue University Indianapolis Department of Mathematical

More information

1 9/5 Matrices, vectors, and their applications

1 9/5 Matrices, vectors, and their applications 1 9/5 Matrices, vectors, and their applications Algebra: study of objects and operations on them. Linear algebra: object: matrices and vectors. operations: addition, multiplication etc. Algorithms/Geometric

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

arxiv: v1 [math.na] 5 May 2011

arxiv: v1 [math.na] 5 May 2011 ITERATIVE METHODS FOR COMPUTING EIGENVALUES AND EIGENVECTORS MAYSUM PANJU arxiv:1105.1185v1 [math.na] 5 May 2011 Abstract. We examine some numerical iterative methods for computing the eigenvalues and

More information

Rank revealing factorizations, and low rank approximations

Rank revealing factorizations, and low rank approximations Rank revealing factorizations, and low rank approximations L. Grigori Inria Paris, UPMC January 2018 Plan Low rank matrix approximation Rank revealing QR factorization LU CRTP: Truncated LU factorization

More information