A Fast Algorithm For Computing The A-optimal Sampling Distributions In A Big Data Linear Regression
Hanxiang Peng and Fei Tan
Indiana University Purdue University Indianapolis
Department of Mathematical Sciences, Indianapolis, IN, USA
(hanxpeng, feitan)@iupui.edu

March 19, 2018

Abstract: It was demonstrated in Peng and Tan (2018) that computing the A-optimal sampling distributions in a big data linear regression model takes the same $O(np^2)$ running time as the full-data least squares estimator. In this article, we construct a fast algorithm that computes the sampling distributions in $o(np^2)$ time and establish relative error bounds.

AMS 2000 subject classifications: Primary 62G05; secondary 62G10, 62G20.
Keywords and phrases: A-optimality; big data analysis; fast algorithm; Johnson-Lindenstrauss transform; leverage scores; non-uniform sampling.

1. Introduction

Let $X$ be an $n \times p$ matrix with $n \gg p$. For $\alpha \in \mathbb{R}$, let

$$H_\alpha = X(X^\top X)^{-\alpha}X^\top = (h^{(\alpha)}_{i,j}) =: (h_{\alpha,i,j}). \quad (1.1)$$

Then $H_1 = H = (h_{i,j})$ is the hat matrix. The A-optimal distribution is given by

$$\pi_i^{(\mathrm{aopt})} = \pi_i^{(2)} = \frac{h_{2,i,i}}{\sum_{j=1}^n h_{2,j,j}}, \quad i = 1,\dots,n. \quad (1.2)$$

For details, see Peng and Tan (2018).

2. Johnson-Lindenstrauss Transforms

Let $\mathbb{R}^{n \times p}$ denote the space of $n \times p$ matrices with real entries. Following Drineas, et al. (2012), given $\epsilon > 0$ and a set of $n$ points $x_i$ in $\mathbb{R}^p$, an $\epsilon$-Johnson-Lindenstrauss transform ($\epsilon$-JLT) for the set is a projection from $\mathbb{R}^p$ into $\mathbb{R}^r$, identified with a matrix $\Pi \in \mathbb{R}^{p \times r}$, such that

$$(1-\epsilon)\|x_i - x_j\|^2 \le \|x_i^\top\Pi - x_j^\top\Pi\|^2 \le (1+\epsilon)\|x_i - x_j\|^2, \quad i,j = 1,\dots,n. \quad (2.1)$$
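As an illustration, such a sparse random projection can be generated and its distortion checked numerically. The following is a minimal NumPy sketch, not part of the paper; the function name `sparse_jlt`, the sizes, and the seed are our own choices, anticipating the entry distribution and the subsample-size bound (2.2) given below.

```python
import numpy as np

def sparse_jlt(p, r, rng):
    # Achlioptas-style sparse projection: entries are +/- sqrt(3/r)
    # with probability 1/6 each, and 0 with probability 2/3, i.i.d.
    s = np.sqrt(3.0 / r)
    return rng.choice([s, 0.0, -s], size=(p, r), p=[1/6, 2/3, 1/6])

rng = np.random.default_rng(0)
n, p, eps, delta = 1000, 50, 0.5, 0.01
# Projection dimension from the bound (2.2):
r = int(np.ceil((4*np.log(n) + 2*np.log(1/delta)) / (eps**2/2 - eps**3/3)))
Pi = sparse_jlt(p, r, rng)
x = rng.standard_normal((n, p))
# Norm distortion as in (2.3): the squared-norm ratios should lie in [1-eps, 1+eps].
ratios = (np.linalg.norm(x @ Pi, axis=1) / np.linalg.norm(x, axis=1))**2
print(r, round(ratios.min(), 2), round(ratios.max(), 2))
```

Note that the guarantee is probabilistic: a fraction $\delta$ of draws of $\Pi$ may violate the distortion bound for some pair of points.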
What is focal is to construct a fast $\epsilon$-JLT. A popular method in the literature is to construct a random projection that is an $\epsilon$-JLT with high probability. Specifically, choose every entry of the projection to be i.i.d. with the random variable which takes the values $\pm\sqrt{3/r}$ with probability $1/6$ each and zero otherwise. Let $\Pi_{JLT}$ denote such a random projection. The following result, from Theorem 1.1 of Achlioptas (2003), ensures that $\Pi_{JLT}$ is such an $\epsilon$-JLT.

Lemma 2.1. Let $x_1,\dots,x_n$ be a set of $n$ points in $\mathbb{R}^p$. Let $\epsilon \in (0,1)$ be an accuracy parameter and $\delta$ a probability of failure. If

$$r \ge \frac{4\ln n + 2\ln(1/\delta)}{\epsilon^2/2 - \epsilon^3/3}, \quad (2.2)$$

then it holds with probability at least $1-\delta$ that the $p \times r$ random matrix $\Pi_{JLT}$ described above satisfies (2.1).

Assume $\Pi$ is an $\epsilon$-JLT. Since it is linear, we have

$$(1-\epsilon)\|x_i\|^2 \le \|x_i^\top\Pi\|^2 \le (1+\epsilon)\|x_i\|^2, \quad i = 1,\dots,n. \quad (2.3)$$

An $\epsilon$-JLT maps a point in $\mathbb{R}^p$ into $\mathbb{R}^r$ and distorts the distance between two points by a factor within $1\pm\epsilon$. While an $\epsilon$-JLT retains these local properties, the fast JLT (FJLT) satisfies stronger conditions than the $\epsilon$-JLT. Following Drineas, et al. (2012), let $U$ be an orthogonal matrix in $\mathbb{R}^{n \times p}$ and view its columns as $p$ vectors in $\mathbb{R}^n$. A projection from $\mathbb{R}^n$ to $\mathbb{R}^r$, identified with a matrix $\Pi \in \mathbb{R}^{r \times n}$, is an $\epsilon$-FJLT for $U$ if it satisfies:

- Approximate orthogonality: $\|(\Pi U)^\top(\Pi U) - I_p\|_o \le \epsilon$.
- Fast running time: for $M \in \mathbb{R}^{n \times p}$, the matrix product $\Pi M$ can be computed in $O(np\ln r)$ time.

A fast JLT possesses nice properties given in Lemma 2 of Drineas, et al. (2012); we restate these properties in Lemma 4.1 below for our use. Computing the matrix product $\Pi M$ directly takes $O(npr)$ time. An $\epsilon$-FJLT beats this running time with high probability and can be constructed by employing a randomized Hadamard transformation (RHT). An $n \times n$ Hadamard matrix can be recursively defined as $H_n = n^{-1/2}\tilde{H}_n$, where $\tilde{H}_1 = 1$ and

$$\tilde{H}_{2n} = \begin{pmatrix} \tilde{H}_n & \tilde{H}_n \\ \tilde{H}_n & -\tilde{H}_n \end{pmatrix}.$$

Here for simplicity we assume $n$ is a power of 2.
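The recursion above is exactly the butterfly pattern of the classical fast Walsh-Hadamard transform. A minimal sketch (the function name `fwht` and the sizes are ours, not from the paper):

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform via the butterfly recursion
    # H~_{2n} = [[H~_n, H~_n], [H~_n, -H~_n]]; O(n log n) operations,
    # assuming len(x) is a power of 2.
    x = np.array(x, dtype=float)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a, b = x[i:i+h].copy(), x[i+h:i+2*h].copy()
            x[i:i+h], x[i+h:i+2*h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(1)
n = 8
d = rng.choice([-1.0, 1.0], size=n)    # diagonal of the random sign matrix D
x = rng.standard_normal(n)
y = fwht(d * x) / np.sqrt(n)           # H_n D x, with H_n = n^{-1/2} H~_n
# The randomized Hadamard transform is orthogonal, so the norm is preserved:
print(np.allclose(np.linalg.norm(y), np.linalg.norm(x)))   # True
```

The sign flips $D$ and the norm preservation anticipate the randomized Hadamard transform discussed next.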
For general construction, numerical implementation and evaluation, see e.g. Avron, et al. (2010). The Hadamard matrix encodes the discrete Fourier transform over the additive group $(\mathbb{Z}/(2\mathbb{Z}))^n$: its FFT is particularly simple and requires $O(n\log n)$ time. Let $D \in \mathbb{R}^{n \times n}$ be a random diagonal matrix whose diagonal entries are i.i.d. with the random variable $D = \pm 1$ with probability $1/2$ each. The product $HD$ is an RHT and possesses useful properties, as remarked in Drineas, et al. (2012). When applied to a vector, it spreads out the energy; computing the product $HDx$ for $x \in \mathbb{R}^n$ takes only $O(n\log_2 n)$ time; and accessing $r$ components of $HDx$ takes $O(n\log_2 r)$ time.

The subsampled randomized Hadamard transformation (SRHT) uniformly samples a set of $r$ rows of an RHT. Here $HD$ plays the role of preconditioning the input matrix, and one then takes a uniform subsample of the rows of the resulting matrix. Let $S$ be an $r \times n$ sampling matrix which uniformly samples $r$ rows of an $n \times d$ matrix, and let $\Pi_{FJLT} = SHD$. As shown in Lemma 3 of Drineas, et al. (2012), $\Pi_{FJLT}$ is an FJLT for a large value of $r$ with high probability. Below we state the result with a slightly improved lower bound for the subsample size $r$, tailored for our applications.

Lemma 2.2. Let $\Pi_{FJLT}$ be an $r \times n$ random matrix obtained from the SRHT described above. Let $U$ be an $n \times p$ orthogonal matrix with $n > p$. If

$$r \ge \frac{64p\ln(40np)}{\epsilon^2}\ln\Big(\frac{64p\ln(40np)}{\epsilon^2\delta}\Big), \quad (2.4)$$

then it holds with probability at least $1-\delta$ that $\Pi_{FJLT}$ is an $\epsilon$-FJLT for $U$.

Note that the lower bound for the subsample size $r$ in Drineas, et al. (2012) is $r \ge 196\epsilon^{-2}p\ln(40np)\ln(900\epsilon^{-2}p\ln(40np))$ for a fixed $\delta$.

3. Fast Approximation of the A-optimal Distribution

Let $e_i \in \mathbb{R}^n$ be the vector with the $i$th entry one and all others zero. As $X$ has full rank, $(X^\top X)^{-1} = X^+(X^+)^\top$, so that the $i$th diagonal entry of $H_2$ is

$$h^{(2)}_{i,i} = e_i^\top H_2 e_i = \|e_i^\top X(X^\top X)^{-1}\|^2 = \|e_i^\top XX^+(X^+)^\top\|^2. \quad (3.1)$$

Similar to the fast computation of the leverage scores in Drineas, et al. (2012), computing $h^{(2)}_{i,i}$ for the A-optimal distribution according to (3.1) has a twofold bottleneck: first, computing the pseudoinverse and, second, performing the matrix multiplications; both take $O(np^2)$ time. We follow Drineas, et al. to get around the bottlenecks by judiciously applying random projections to (3.1).

To get around the $O(np^2)$ cost of computing $X^+$ in (3.1), we compute the pseudoinverse of a smaller matrix that approximates $X$. This is done by approximating $X$ by $\Pi_1 X$, where $\Pi_1 \in \mathbb{R}^{r_1 \times n}$ is an $\epsilon$-FJLT for the left singular vector matrix $U$ of $X$, chosen as the SRHT of Lemma 2.2. Computing the products in (3.1) this way reduces the time to $O(npr_1)$. This is still not efficient, as Lemma 2.2 requires $r_1 > p$. To get around this bottleneck, we can further reduce the dimensionality by using an $\epsilon$-JLT of Lemma 2.1 to reduce the dimension to $r_2 = O(\ln n)$. Specifically, we approximate $h^{(2)}_{i,i}$ by

$$\tilde{h}^{(2)}_{i,i} = \|e_i^\top X(\Pi_1 X)^+((\Pi_1 X)^+)^\top\Pi_2\|^2, \quad (3.2)$$

where $\Pi_2 \in \mathbb{R}^{p \times r_2}$. This is realized by the algorithm in Fig. 1. Below we give the relative error bound and the running time of the procedure.

Fig 1. Fast approximation to the A-optimal sampling distribution.
Input: $X \in \mathbb{R}^{n \times p}$ (with SVD $X = U\Lambda V^\top$), error parameter $\epsilon \in (0,1)$, and failure probability $\delta \in (0,1)$.
Output: $\tilde{h}^{(2)}_{i,i}$, $i = 1,\dots,n$.
1. Let $\Pi_1 \in \mathbb{R}^{r_1 \times n}$ be the $\epsilon$-FJLT for $U$ with $r_1$ satisfying (2.4).
2. Compute $\Pi_1 X \in \mathbb{R}^{r_1 \times p}$ and its SVD, $\Pi_1 X = U_1\Lambda_1 V_1^\top$. Let $R_1 = V_1\Lambda_1^{-1}$.
3. View the normalized rows of $XR_1R_1^\top \in \mathbb{R}^{n \times p}$ as $n$ vectors in $\mathbb{R}^p$, and compute an $\epsilon$-JLT $\Pi_2 \in \mathbb{R}^{p \times r_2}$ for the $n$ vectors and their $n^2-n$ pairwise sums, with $r_2$ satisfying (2.2).
4. Compute the matrix product $P = XR_1R_1^\top\Pi_2$.
5. Compute $\tilde{h}^{(2)}_{i,i} = \|e_i^\top P\|^2$ for $i = 1,\dots,n$.

Theorem 3.1. Let $X$ be an $n \times p$ matrix of full rank $p$ with $n \ge p$. Let $\epsilon \in (0,1)$ be an error parameter and $\delta \in (0,1)$ be a failure probability parameter. Let $h^{(2)}_{i,i}$, $i = 1,\dots,n$, be approximated by the output $\tilde{h}^{(2)}_{i,i}$ of the randomized algorithm given in Fig. 1. Then it holds with probability at least $(1-\delta)^2$ that, for $i = 1,\dots,n$,

$$|\tilde{h}^{(2)}_{i,i} - h^{(2)}_{i,i}| \le \epsilon\Big(\frac{2\epsilon^2+\epsilon}{(1-\epsilon)^2}\kappa^2(\Lambda) + \frac{4\epsilon+2}{1-\epsilon}\kappa(\Lambda) + 2\Big)h^{(2)}_{i,i}, \quad (3.3)$$

where $\kappa(X) = \sigma_{\max}(X)/\sigma_{\min}(X)$ is the condition number of $X$.

Remark 3.1. Following the computation of the running time of the algorithm for the leverage scores in Theorem 1 of Drineas, et al. (2012), the running time of the algorithm in Fig. 1 is $O(np\ln(r_1) + npr_2 + r_1p^2 + r_2p^2 + p^3)$, as there is only one additional matrix multiplication, $R_1R_1^\top$, which takes $O(p^3)$ time.
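The five steps of Fig. 1 can be sketched end to end in NumPy. This is an illustrative sketch only: a dense Gaussian projection stands in for the SRHT $\Pi_1$ of Lemma 2.2 (it shares the approximate-orthogonality property but not the fast running time), $\Pi_2$ is an Achlioptas-style sparse JLT, and the sizes $r_1$, $r_2$ are chosen for demonstration rather than from the bounds (2.2) and (2.4).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r1, r2 = 1024, 5, 512, 128   # illustrative sizes only

X = rng.standard_normal((n, p))

# Exact values via (3.1): h^(2)_{i,i} = ||e_i' X (X'X)^{-1}||^2,
# and the exact A-optimal distribution (1.2) for reference.
h2 = np.sum((X @ np.linalg.inv(X.T @ X))**2, axis=1)
pi_aopt = h2 / h2.sum()

# Step 1: Pi_1 (dense Gaussian stand-in for the SRHT of Lemma 2.2).
Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)
# Step 2: SVD of Pi_1 X and R_1 = V_1 Lambda_1^{-1}.
_, s1, Vt1 = np.linalg.svd(Pi1 @ X, full_matrices=False)
R1 = Vt1.T / s1
# Step 3: Pi_2, an Achlioptas-style sparse JLT from R^p to R^{r2}.
c = np.sqrt(3.0 / r2)
Pi2 = rng.choice([c, 0.0, -c], size=(p, r2), p=[1/6, 2/3, 1/6])
# Steps 4-5: P = X R_1 R_1' Pi_2 and its squared row norms.
P = X @ R1 @ R1.T @ Pi2
h2_approx = np.sum(P**2, axis=1)

rel_err = np.abs(h2_approx - h2) / h2
print(round(float(np.median(rel_err)), 3))
```

With these toy sizes the relative errors are moderate rather than tiny; the theory drives them down by choosing $r_1$ and $r_2$ according to (2.4) and (2.2).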
Thus the asymptotic running time of our algorithm is the same as that of the algorithm in Drineas, et al. (2012): $O(np\ln(p\epsilon^{-1}) + np\epsilon^{-2}\ln(n) + p^3\epsilon^{-2}\ln(n)\ln(p\epsilon^{-1}))$. Treating $\epsilon$ as a constant, the asymptotic running time of our algorithm is $O(np\ln(n) + p^3\ln(n)\ln(p))$, provided that $p \le n \le \exp(p)$. The running time is $o(np^2)$ if $p\ln(p) = o(n/\ln(n))$ and $\ln(n) = o(p)$.
4. Proofs for the Fast Algorithm

We restate Lemma 2 of Drineas, et al. (2012) below.

Lemma 4.1. Let $M$ be an $n \times d$ matrix of full rank $d$ with $n \gg d$. Let the SVD of $M$ be $M = U\Sigma V^\top$. Let $\Pi$ be an $\epsilon$-FJLT for $U$ with $0 < \epsilon \le 1$, and let the SVD of $\Pi U$ be $\Pi U = U_1\Sigma_1 V_1^\top$. Then $\Pi M$, $\Pi U$, $M$ and $U$ have the same rank $d$. Moreover, $\|I_d - \Sigma_1^{-2}\|_o \le \epsilon/(1-\epsilon)$ and $(\Pi M)^+ = V\Sigma^{-1}(\Pi U)^+$.

Lemma 4.2. Let $Y_1,\dots,Y_n$ be i.i.d. $d$-dimensional complex random vectors with $\|Y_1\| \le M$ a.s. and $\|E(Y_1Y_1^*)\|_o \le 1$. Then for any $t > 0$,

$$P\Big(\Big\|\frac{1}{n}\sum_{j=1}^n Y_jY_j^* - E(Y_1Y_1^*)\Big\|_o \ge t\Big) \le (2n)^{\sqrt{1+t}}\exp\Big(-\frac{nt^2}{2M^2(1+\sqrt{1+t})^2}\Big).$$

Note we reduced Oliveira's bound $(2n)^2\exp\big(-\frac{nt^2}{16M^2+8M^2t}\big)$ to the above one.

Proof of Lemma 4.2. We shall optimize the bound in the proof of Lemma 1 of Oliveira (2010), which is

$$f(s) = (2n)^{1/(1-2M^2s/n)}\exp\big(-st + 2M^2s^2/n\big), \quad t > M^2s/n.$$

To minimize $f(s)$, we seek a choice of $s$ of the form $s_0 = \frac{nt}{2M^2(t+b)}$, $b \ge 0$. Simple calculus shows that $b = 1+\sqrt{1+t}$, and the desired result follows from simplifying $f(s_0)$.

Proof of Lemma 2.2. Using the improved bound in Lemma 4.2 above and following Lemmas 3 and 4 of Drineas, et al. (2010), we obtain the desired improved bound.

Proof of Theorem 3.1. Introduce

$$v_i = e_i^\top XX^+(X^+)^\top, \quad \hat{v}_i = e_i^\top X(\Pi_1X)^+((\Pi_1X)^+)^\top, \quad \tilde{v}_i = \hat{v}_i\Pi_2.$$

With these we have

$$h^{(2)}_{i,i} = \|v_i\|^2, \quad \tilde{h}^{(2)}_{i,i} = \|\tilde{v}_i\|^2. \quad (4.1)$$

For two row vectors $a, b$, the inner product is $\langle a,b\rangle = ab^\top$. We show below that

$$|\langle\hat{v}_i,\hat{v}_j\rangle - \langle v_i,v_j\rangle| \le \frac{\epsilon\kappa(\Lambda)}{1-\epsilon}\Big(\frac{\epsilon\kappa(\Lambda)}{1-\epsilon}+2\Big)\|v_i\|\|v_j\|, \quad (4.2)$$

$$|\langle\tilde{v}_i,\tilde{v}_j\rangle - \langle\hat{v}_i,\hat{v}_j\rangle| \le 2\epsilon\|\hat{v}_i\|\|\hat{v}_j\|, \quad i,j = 1,\dots,n, \quad (4.3)$$

and

$$\|\hat{v}_i\| \le \Big(\frac{\epsilon\kappa(\Lambda)}{1-\epsilon}+1\Big)\|v_i\|, \quad (4.4)$$

where $\kappa(\Lambda) = \|\Lambda\|_o\|\Lambda^{-1}\|_o = \sigma_{\max}(X)/\sigma_{\min}(X) = \kappa(X)$. Note that (4.3) is straightforward from the definition of the $\epsilon$-JLT. By Lemma 2.2, both (4.2) and (4.4) hold with probability at least $1-\delta$, while (4.3) holds with probability at least $1-\delta$ by Lemma 2.1. Consequently, (4.2)-(4.4) hold simultaneously with probability at least $(1-\delta)^2$, as $\Pi_1$ and $\Pi_2$ are independent random matrices. From these the desired (3.3) immediately follows in view of

$$|\tilde{h}^{(2)}_{i,i} - h^{(2)}_{i,i}| \le |\langle\tilde{v}_i,\tilde{v}_i\rangle - \langle\hat{v}_i,\hat{v}_i\rangle| + |\langle\hat{v}_i,\hat{v}_i\rangle - \langle v_i,v_i\rangle|.$$

To prove (4.2), we use the SVD $X = U\Lambda V^\top$, where $U$ ($V$) is the $n \times p$ ($p \times p$) orthonormal matrix consisting of the left (right) singular vectors as its columns, and $\Lambda$ is the $p \times p$ singular value matrix. As $X$ has full rank $p$, all of $U$, $V$, $\Lambda$ have the same rank $p$. By Lemmas 4.1 and 2.2, it holds with probability at least $1-\delta$ that $(\Pi_1X)^+ = V\Lambda^{-1}(\Pi_1U)^+$. Using the singular value decompositions, we can express $v_i$ and $\hat{v}_i$ as

$$v_i = e_i^\top U\Lambda^{-1}V^\top, \quad \hat{v}_i = e_i^\top U(\Pi_1U)^+((\Pi_1U)^+)^\top\Lambda^{-1}V^\top. \quad (4.5)$$

Thus

$$|\langle\hat{v}_i,\hat{v}_j\rangle - \langle v_i,v_j\rangle| \le \|A\|_o\|v_i\|\|v_j\|,$$

where

$$A = \Lambda(\Pi_1U)^+((\Pi_1U)^+)^\top\Lambda^{-2}(\Pi_1U)^+((\Pi_1U)^+)^\top\Lambda - I_p.$$

Let $B = (\Pi_1U)^+((\Pi_1U)^+)^\top$. Then

$$\|A\|_o \le \|B - I_p\|_o^2\,\kappa^2(\Lambda) + 2\|B - I_p\|_o\,\kappa(\Lambda). \quad (4.6)$$

As $\Pi_1U$ has full rank $p$ with probability at least $1-\delta$, we have $B = ((\Pi_1U)^\top(\Pi_1U))^{-1}$ and

$$\|B - I_p\|_o \le \|(\Pi_1U)^\top(\Pi_1U) - I_p\|_o\|B\|_o \le \epsilon(\|B - I_p\|_o + 1),$$

where we used the defining property of an $\epsilon$-FJLT. Thus $\|B - I_p\|_o \le \epsilon/(1-\epsilon)$. Substitution of this in (4.6) gives (4.2). Using this and the second representation in (4.5), the desired (4.4) follows from

$$\|\hat{v}_i\| \le \|\Lambda B\Lambda^{-1}\|_o\|v_i\| \le (\kappa(\Lambda)\|B - I_p\|_o + 1)\|v_i\|.$$
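A concentration bound of this type is easy to check by simulation. The sketch below (our own illustration, not from the paper) draws Rademacher vectors $Y_j \in \{-1,+1\}^d$, which satisfy $\|Y_j\| = \sqrt{d}$ and $E(Y_jY_j^\top) = I_d$, and compares the empirical tail probability with Oliveira's (2010) bound $(2n)^2\exp(-nt^2/(16M^2+8M^2t))$, which Lemma 4.2 tightens.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, t, trials = 3, 5000, 0.5, 100
M = np.sqrt(d)            # ||Y_j|| = sqrt(d) almost surely

exceed = 0
for _ in range(trials):
    # Rademacher vectors: E(Y Y') = I_d, so ||E(Y Y')||_o = 1.
    Y = rng.choice([-1.0, 1.0], size=(n, d))
    dev = np.linalg.norm(Y.T @ Y / n - np.eye(d), 2)
    if dev >= t:
        exceed += 1

# Oliveira's (2010) tail bound at this (n, t, M):
bound = (2*n)**2 * np.exp(-n * t**2 / (16*M**2 + 8*M**2*t))
print(exceed / trials, round(float(bound), 3))
```

For these parameters the bound is already well below 1, and the empirical exceedance frequency is far smaller still, as expected from a worst-case tail bound.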
References

[1] Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4).
[2] Avron, H., Maymounkov, P. and Toledo, S. (2010). Blendenpik: Supercharging LAPACK's least-squares solver. SIAM Journal on Scientific Computing, 32.
[3] Baxter, J., Jones, R., Lin, M. and Olsen, J. (2004). SLLN for weighted independent identically distributed random variables. J. Theoret. Probab., 17.
[4] Candès, E.J. and Tao, T. (2009). Exact matrix completion via convex optimization. Found. Comput. Math., 9:717.
[5] Drineas, P., Kannan, R. and Mahoney, M.W. (2006). Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication. SIAM Journal on Computing, 36.
[6] Drineas, P., Mahoney, M.W. and Muthukrishnan, S. (2008).
[7] Drineas, P., Mahoney, M.W., Muthukrishnan, S. and Sarlós, T. (2010). Faster least squares approximation. Numerische Mathematik, 117(2).
[8] Drineas, P., Mahoney, M.W. and Muthukrishnan, S. (2006). Sampling algorithms for l2 regression and applications. Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms.
[9] Drineas, P., Magdon-Ismail, M., Mahoney, M.W. and Woodruff, D.P. (2012). Fast approximation of matrix coherence and statistical leverage. The Journal of Machine Learning Research, 13.
[10] Oliveira, R.I. (2010). Sums of random Hermitian matrices and an inequality by Rudelson. Technical report. arXiv preprint.
[11] Mahoney, M.W. and Drineas, P. (2009).
[12] Mahoney, M.W. (2011). Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning. NOW Publishers, Boston.
[13] Drineas, P., Mahoney, M.W., Muthukrishnan, S. and Sarlós, T. (2010). Faster least squares approximation. Numerische Mathematik, 117(2).
[14] Papadimitriou, C.H., Raghavan, P., Tamaki, H. and Vempala, S. (2000). Latent semantic indexing: a probabilistic analysis. J. Computer and System Sciences, 61(2).
[15] Sarlós, T. (2006). Improved approximation algorithms for large matrices via random projections. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science.
[16] Sarlós, T. (2010).
[17] Drineas, P., Mahoney, M.W. and Muthukrishnan, S. (2008). Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30.
[18] Mahoney, M.W. and Drineas, P. (2009). CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. USA, 106.
[19] Boutsidis, C., Mahoney, M.W. and Drineas, P. (2009). An improved approximation algorithm for the column subset selection problem. In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms.
[20] Teicher, H. (1974). On the law of the iterated logarithm. Ann. Probability, 2.
[21] Wang, C., Chen, M.-H., Schifano, E., Wu, J. and Yan, J. (2015). A survey of statistical methods and computing for big data. arXiv preprint.
Improved Bounds on the Dot Product under Random Projection and Random Sign Projection Ata Kabán School of Computer Science The University of Birmingham Birmingham B15 2TT, UK http://www.cs.bham.ac.uk/
More informationApproximating a Gram Matrix for Improved Kernel-Based Learning
Approximating a Gram Matrix for Improved Kernel-Based Learning (Extended Abstract) Petros Drineas 1 and Michael W. Mahoney 1 Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New
More informationYale university technical report #1402.
The Mailman algorithm: a note on matrix vector multiplication Yale university technical report #1402. Edo Liberty Computer Science Yale University New Haven, CT Steven W. Zucker Computer Science and Appled
More informationRandom projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016
Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use
More informationsublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU)
sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) 0 overview Our Contributions: 1 overview Our Contributions: A near optimal low-rank
More informationNumerische Mathematik
Numer. Math. 011) 117:19 49 DOI 10.1007/s0011-010-0331-6 Numerische Mathematik Faster least squares approximation Petros Drineas Michael W. Mahoney S. Muthukrishnan Tamás Sarlós Received: 6 May 009 / Revised:
More informationarxiv: v1 [stat.me] 23 Jun 2013
A Statistical Perspective on Algorithmic Leveraging Ping Ma Michael W. Mahoney Bin Yu arxiv:1306.5362v1 [stat.me] 23 Jun 2013 Abstract One popular method for dealing with large-scale data sets is sampling.
More informationA fast randomized algorithm for approximating an SVD of a matrix
A fast randomized algorithm for approximating an SVD of a matrix Joint work with Franco Woolfe, Edo Liberty, and Vladimir Rokhlin Mark Tygert Program in Applied Mathematics Yale University Place July 17,
More informationELE 538B: Mathematics of High-Dimensional Data. Spectral methods. Yuxin Chen Princeton University, Fall 2018
ELE 538B: Mathematics of High-Dimensional Data Spectral methods Yuxin Chen Princeton University, Fall 2018 Outline A motivating application: graph clustering Distance and angles between two subspaces Eigen-space
More informationAccelerated Dense Random Projections
1 Advisor: Steven Zucker 1 Yale University, Department of Computer Science. Dimensionality reduction (1 ε) xi x j 2 Ψ(xi ) Ψ(x j ) 2 (1 + ε) xi x j 2 ( n 2) distances are ε preserved Target dimension k
More informationNotes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T.
Notes on singular value decomposition for Math 54 Recall that if A is a symmetric n n matrix, then A has real eigenvalues λ 1,, λ n (possibly repeated), and R n has an orthonormal basis v 1,, v n, where
More informationSubspace sampling and relative-error matrix approximation
Subspace sampling and relative-error matrix approximation Petros Drineas Rensselaer Polytechnic Institute Computer Science Department (joint work with M. W. Mahoney) For papers, etc. drineas The CUR decomposition
More informationRandom Projections for Support Vector Machines
Saurabh Paul Christos Boutsidis Malik Magdon-Ismail Petros Drineas Computer Science Dept. Mathematical Sciences Dept. Computer Science Dept. Computer Science Dept. Rensselaer Polytechnic Inst. IBM Research
More informationSingle Pass PCA of Matrix Products
Single Pass PCA of Matrix Products Shanshan Wu The University of Texas at Austin shanshan@utexas.edu Sujay Sanghavi The University of Texas at Austin sanghavi@mail.utexas.edu Srinadh Bhojanapalli Toyota
More informationRandNLA: Randomization in Numerical Linear Algebra: Theory and Practice
RandNLA: Randomization in Numerical Linear Algebra: Theory and Practice Petros Drineas Ilse Ipsen (organizer) Michael W. Mahoney RPI NCSU UC Berkeley To access our web pages use your favorite search engine.
More informationFast and Robust Least Squares Estimation in Corrupted Linear Models
Fast and Robust Least Squares Estimation in Corrupted Linear Models Brian McWilliams Gabriel Krummenacher Mario Lucic Joachim M. Buhmann Department of Computer Science ETH Zürich, Switzerland {mcbrian,gabriel.krummenacher,lucic,jbuhmann}@inf.ethz.ch
More informationLecture 24: Element-wise Sampling of Graphs and Linear Equation Solving. 22 Element-wise Sampling of Graphs and Linear Equation Solving
Stat260/CS294: Randomized Algorithms for Matrices and Data Lecture 24-12/02/2013 Lecture 24: Element-wise Sampling of Graphs and Linear Equation Solving Lecturer: Michael Mahoney Scribe: Michael Mahoney
More informationPositive definite preserving linear transformations on symmetric matrix spaces
Positive definite preserving linear transformations on symmetric matrix spaces arxiv:1008.1347v1 [math.ra] 7 Aug 2010 Huynh Dinh Tuan-Tran Thi Nha Trang-Doan The Hieu Hue Geometry Group College of Education,
More informationTechnical Report. Random projections for Bayesian regression. Leo Geppert, Katja Ickstadt, Alexander Munteanu and Christian Sohler 04/2014
Random projections for Bayesian regression Technical Report Leo Geppert, Katja Ickstadt, Alexander Munteanu and Christian Sohler 04/2014 technische universität dortmund Part of the work on this technical
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear
More informationarxiv: v1 [cs.lg] 22 Mar 2014
CUR lgorithm with Incomplete Matrix Observation Rong Jin an Shenghuo Zhu Dept. of Computer Science an Engineering, Michigan State University, rongjin@msu.eu NEC Laboratories merica, Inc., zsh@nec-labs.com
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationA Tutorial on Matrix Approximation by Row Sampling
A Tutorial on Matrix Approximation by Row Sampling Rasmus Kyng June 11, 018 Contents 1 Fast Linear Algebra Talk 1.1 Matrix Concentration................................... 1. Algorithms for ɛ-approximation
More informationBlendenpik: Supercharging LAPACK's Least-Squares Solver
Blendenpik: Supercharging LAPACK's Least-Squares Solver The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published Publisher
More informationOpen Research Online The Open University s repository of research publications and other research outputs
Open Research Online The Open University s repository of research publications and other research outputs A note on the Weiss conjecture Journal Item How to cite: Gill, Nick (2013). A note on the Weiss
More informationFast Random Projections
Fast Random Projections Edo Liberty 1 September 18, 2007 1 Yale University, New Haven CT, supported by AFOSR and NGA (www.edoliberty.com) Advised by Steven Zucker. About This talk will survey a few random
More informationarxiv: v1 [cs.ds] 24 Dec 2017
Lectures on Randomized Numerical Linear Algebra * Petros Drineas Michael W. Mahoney arxiv:1712.08880v1 [cs.ds] 24 Dec 2017 Contents 1 Introduction 2 2 Linear Algebra 3 2.1 Basics..............................................
More informationTechnical Report No September 2009
FINDING STRUCTURE WITH RANDOMNESS: STOCHASTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS N. HALKO, P. G. MARTINSSON, AND J. A. TROPP Technical Report No. 2009-05 September 2009 APPLIED
More informationA Practical Randomized CP Tensor Decomposition
A Practical Randomized CP Tensor Decomposition Casey Battaglino, Grey Ballard 2, and Tamara G. Kolda 3 SIAM CSE 27 Atlanta, GA Georgia Tech Computational Sci. and Engr. Wake Forest University Sandia National
More informationR A N D O M I Z E D L I N E A R A L G E B R A F O R L A R G E - S C A L E D ATA A P P L I C AT I O N S
R A N D O M I Z E D L I N E A R A L G E B R A F O R L A R G E - S C A L E D ATA A P P L I C AT I O N S a dissertation submitted to the institute for computational and mathematical engineering and the committee
More informationSparse Features for PCA-Like Linear Regression
Sparse Features for PCA-Like Linear Regression Christos Boutsidis Mathematical Sciences Department IBM T J Watson Research Center Yorktown Heights, New York cboutsi@usibmcom Petros Drineas Computer Science
More informationNon-Asymptotic Theory of Random Matrices Lecture 4: Dimension Reduction Date: January 16, 2007
Non-Asymptotic Theory of Random Matrices Lecture 4: Dimension Reduction Date: January 16, 2007 Lecturer: Roman Vershynin Scribe: Matthew Herman 1 Introduction Consider the set X = {n points in R N } where
More information