Lecture 4: Constant Time SVD Approximation


Spectral Algorithms and Representations                          Feb. 17, Mar. 3 and 8, 2005
Lecturer: Santosh Vempala                                         Scribe: Jiangzhuo Chen

This topic consists of three lectures (02/17, 03/03, 03/08), based on [KV04]. We are interested in the following problem.

Problem: Given A ∈ R^{m×n}, find D ∈ R^{m×n} with rank(D) ≤ k that approximates A. Formally,

    min_{D : rank(D) ≤ k} ||A − D||_F^2                                        (1)

Notation. Let λ_t, u^(t), v^(t) denote the t-th singular value, left singular vector, and right singular vector of A, respectively. Let A_i denote the i-th column of A and A_(i) the i-th row of A. Let

    A_k = Σ_{t=1}^{k} λ_t u^(t) v^(t)T = Σ_{t=1}^{k} A v^(t) v^(t)T.

By a theorem of Eckart and Young, A_k is the optimal solution to (1), and ||A − A_k||_F^2 = Σ_{t=k+1}^{r} λ_t^2, where r is the rank of A. So one way to solve (1) is to find the top k right singular vectors of A, {v^(t)}_{t=1}^{k}.

1 Computing the SVD

Given an m × n matrix, it takes Θ(mn) time just to read the input. We want to find its top k right singular vectors. Notice that we can only hope for approximations, since the singular values can be irrational. So for the top right singular vector we want to find ṽ such that ||Aṽ|| ≥ (1 − ε)λ_1 and ||ṽ − v^(1)|| ≤ ε, for a given accuracy parameter ε. The following power method finds the top right singular vector.

Power method:
1. Let v_0 be a random unit vector in R^n.
2. Repeat: v_{t+1} = (A^T A) v_t / ||(A^T A) v_t||.
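As a concrete reference point, here is a minimal numpy sketch of the power method exactly as stated above; the stopping rule, iteration cap, and comparison against a full SVD are illustrative choices, not part of the lecture.

```python
import numpy as np

def top_right_singular_vector(A, eps=1e-3, max_iters=1000, rng=None):
    """Power method: iterate v <- (A^T A) v / ||(A^T A) v|| from a random unit vector."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    # A spherically symmetric (Gaussian) vector scaled to unit length is
    # uniform on the unit sphere.
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(max_iters):
        w = A.T @ (A @ v)            # one iteration costs O(mn) time
        w_norm = np.linalg.norm(w)
        if w_norm == 0:              # degenerate start: A v = 0
            return v
        w /= w_norm
        if np.linalg.norm(w - v) <= eps:
            return w
        v = w
    return v

# Example: compare ||A v~|| against the top singular value from a full SVD.
A = np.random.default_rng(0).standard_normal((50, 30))
v_tilde = top_right_singular_vector(A)
print(np.linalg.norm(A @ v_tilde), np.linalg.svd(A, compute_uv=False)[0])
```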

Remark 1.1. There are several questions concerning the power method.

- How do we generate a random unit vector v_0 ∈ R^n? All we need is the uniform distribution on the surface of the unit sphere. We can use any spherical distribution to generate a random vector and scale it to unit length.

- Does the iteration converge to the top right singular vector v^(1)? How fast? If v_t = v^(1), then

      (A^T A) v_t = A^T (λ_1 u^(1)) = λ_1 (A^T u^(1)) = λ_1^2 v^(1),

  and v_{t+1} = (A^T A) v_t / ||(A^T A) v_t|| = v^(1) = v_t, so v^(1) is a fixed point of the iteration. It can be shown that v_t ≈ ṽ after O((1/ε) log n) iterations. Since each round takes O(mn) time, the time complexity of the power method is O((mn/ε) log n).

  Exercise: Prove bounds on the convergence of the power method. (Hint: use the SVD. See Lemma 1 and Theorem 1 in [CKVW] for details.)

- How do we find the top k right singular vectors? Find the top one, compute A − A v v^T, and repeat. This takes time O((kmn/ε) log n).

- Can we do better when the matrix is sparse? Suppose A has only M nonzero entries. Each iteration of the power method takes O(M) time, so it takes O((M/ε) log n) time to find v^(1). For a sparse matrix, can we achieve O((kM/ε) log n) time for the top k singular vectors? Notice that in the search for the top singular vector, each iteration takes O(M) time; after computing A − A v v^T, however, the matrix is no longer sparse. Fortunately, we can extend the power method as follows.

1. Randomly choose an orthonormal matrix V_0 = (v_0^(1), ..., v_0^(k)) ∈ R^{n×k}.
2. Repeat:
       V_{t+1} = (A^T A) V_t
       V_{t+1} = V_{t+1} diag(||(V_{t+1})_1||^{-1}, ..., ||(V_{t+1})_k||^{-1})      (normalize each column)

Note that if V_t = V_k = (v^(1), ..., v^(k)), then

    (A^T A) V_t = A^T (A v^(1), ..., A v^(k)) = A^T (λ_1 u^(1), ..., λ_k u^(k)) = (λ_1^2 v^(1), ..., λ_k^2 v^(k)),

and V_{t+1} = (λ_1^2 v^(1), ..., λ_k^2 v^(k)) diag(λ_1^{-2}, ..., λ_k^{-2}) = (v^(1), ..., v^(k)) = V_t.
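A sketch of the extended method in numpy. One caveat: with plain per-column normalization all columns eventually drift toward v^(1); the standard remedy, used below, is to re-orthonormalize the columns each round via a QR factorization (orthogonal iteration), which plays the role of the diag(...) normalization in step 2. Sizes and the iteration count are illustrative.

```python
import numpy as np

def top_k_right_singular_vectors(A, k, iters=200, rng=None):
    """Orthogonal (block power) iteration: V <- orth((A^T A) V)."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    # Random n x k starting matrix with orthonormal columns.
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(iters):
        W = A.T @ (A @ V)        # costs O(k M) time when A has M nonzeros
        V, _ = np.linalg.qr(W)   # keep the k columns orthonormal
    return V                     # columns approximate v^(1), ..., v^(k)

A = np.random.default_rng(1).standard_normal((60, 40))
V = top_k_right_singular_vectors(A, k=3)
print(np.linalg.norm(A @ V, axis=0))            # approx top 3 singular values
print(np.linalg.svd(A, compute_uv=False)[:3])
```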

2 Approximate SVD

Can we find a faster approximate solution to (1)? That is, we want to find D with rank(D) ≤ k such that ||A − D||_F^2 is small.

2.1 Existence of a Good Constant-Dimensional Subspace

It is shown in [KV04] that we only need to look at O(k/ε) rows of A.

Theorem 1 ([KV04], Theorem 2). Given A ∈ R^{m×n}, an integer k, and ε > 0, there exists a subset S of k/ε rows of A such that in its span lies a matrix à (i.e., every row of à is in span{S}) with the following property:

    ||A − Ã||_F^2 ≤ min_{D : rank(D) ≤ k} ||A − D||_F^2 + ε ||A||_F^2.          (2)

Proof: Pick k/ε rows from the following distribution (with multiplicity):

    P_i = Pr{row i is picked} = ||A_(i)||^2 / ||A||_F^2,    i = 1, ..., m.

We need to show that with nonzero probability this subset has the desired property. (It would be a bad idea to just pick the k rows with largest ||A_(i)||; see Figure 1 for such an example.)

Figure 1 (sketch omitted): Suppose ||A_(1)|| = ||A_(2)|| < ||A_(3)|| = ... = ||A_(m)||, with the rows A_(3), ..., A_(m) all nearly parallel. The optimal subspace is span{A_(1), A_(2), ..., A_(k)}, while the subspace spanned by the k rows of largest norm, span{A_(3), ..., A_(k+2)}, is a bad subspace.

Let S be the chosen subset. We will identify vectors ŷ^(1), ..., ŷ^(k) in span{S} such that ŷ^(t) is close to v^(t), t = 1, ..., k. Notice that λ_t v^(t) is a linear combination of the rows of A:

    λ_t v^(t) = A^T u^(t) = Σ_{i=1}^{m} u_i^(t) A_(i)^T.

How do we approximate this linear combination? The random (row) vector w^(t) defined as follows has mean λ_t v^(t)T and bounded variance:

    w^(t) = (1/|S|) Σ_{i ∈ S} (u_i^(t) / P_i) A_(i).

Let s = |S| and write w^(t) = (1/s) Σ_{j=1}^{s} X_j, where the X_j are i.i.d. copies of the random row vector X that equals (u_i^(t) / P_i) A_(i) with probability P_i.

The mean of w^(t) is

    E[w^(t)] = E[X_j] = Σ_{i=1}^{m} P_i (u_i^(t) / P_i) A_(i) = Σ_{i=1}^{m} u_i^(t) A_(i) = λ_t v^(t)T.

The variance of w^(t) is

    E[||w^(t) − E[w^(t)]||^2] = (1/s) E[(X − λ_t v^(t)T)(X^T − λ_t v^(t))]
                              = (1/s) (E[X X^T] − λ_t^2)
                              = (1/s) (Σ_{i=1}^{m} (u_i^(t))^2 ||A_(i)||^2 / P_i − λ_t^2)
                              = (1/s) (||A||_F^2 Σ_{i=1}^{m} (u_i^(t))^2 − λ_t^2)
                              = (1/s) (||A||_F^2 − λ_t^2)
                              ≤ ||A||_F^2 / s.

Let ŷ^(t) = w^(t)T / λ_t and V_1 = span{ŷ^(1), ..., ŷ^(k)}. We show that Proj_{V_1} A approximates A by proving an upper bound on E[||A − Proj_{V_1} A||_F^2]. Let F̂ = Σ_{t=1}^{k} A v^(t) ŷ^(t)T. Every row of F̂ lies in V_1, and Proj_{V_1} A is the best approximation to A among matrices whose rows lie in V_1, so

    ||A − Proj_{V_1} A||_F^2 ≤ ||A − F̂||_F^2                    (error of orthogonal projection ≤ general error)
        = Σ_{i=1}^{n} ||u^(i)T (A − F̂)||^2
        = Σ_{i=1}^{k} ||u^(i)T (A − F̂)||^2 + Σ_{i=k+1}^{n} ||u^(i)T (A − F̂)||^2
        = Σ_{i=1}^{k} ||λ_i v^(i)T − λ_i ŷ^(i)T||^2 + Σ_{i=k+1}^{n} λ_i^2
        = Σ_{i=1}^{k} ||w^(i) − λ_i v^(i)T||^2 + Σ_{i=k+1}^{n} λ_i^2,

where we used u^(i)T A = λ_i v^(i)T, and u^(i)T F̂ = λ_i ŷ^(i)T for i ≤ k while u^(i)T F̂ = 0 for i > k (since u^(i)T A v^(t) = λ_i v^(i)T v^(t) = 0 for i ≠ t). Therefore,

    E[||A − Proj_{V_1} A||_F^2] ≤ Σ_{i=k+1}^{n} λ_i^2 + Σ_{i=1}^{k} E[||w^(i) − λ_i v^(i)T||^2]
                                ≤ ||A − A_k||_F^2 + (k/s) ||A||_F^2
                                ≤ ||A − A_k||_F^2 + ε ||A||_F^2        when s ≥ k/ε.

The existence of a subset S satisfying the properties in the theorem follows from this inequality on the expectation (take à = Proj_{V_1} A; each ŷ^(t) is a combination of sampled rows, so V_1 ⊆ span{S}). □

Remark 2.1. Theorem 1 is an existential result: we do not know u_i^(t) in the definition of w^(t). The corresponding algorithmic result is given in [DFK+04], presented on 02/24 and 03/01.
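The estimator in the proof is easy to check numerically. Below is a small numpy sketch (sizes, seed, and names are illustrative); like the existential argument, it uses the true left singular vector u^(t), which the algorithm of Section 2.2 will avoid.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, t = 200, 50, 0              # estimate the top (t = 0) right singular direction
A = rng.standard_normal((m, n))
U, lam, Vt = np.linalg.svd(A, full_matrices=False)

# Length-squared distribution over rows: P_i = ||A_(i)||^2 / ||A||_F^2.
P = np.sum(A**2, axis=1) / np.sum(A**2)

eps, k = 0.1, 5
s = int(np.ceil(k / eps))          # number of sampled rows
idx = rng.choice(m, size=s, p=P)

# w = (1/s) * sum over sampled rows i of u_i^(t) A_(i) / P_i, an estimate of lambda_t v^(t).
w = (U[idx, t] / P[idx]) @ A[idx, :] / s

print(np.linalg.norm(w - lam[t] * Vt[t]))    # small compared to ||A||_F
print(np.linalg.norm(A))
```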

2.2 Constant Time SVD Approximation Algorithm

Lemma 1 ([KV04], Lemma 2). Let M ∈ R^{a×b}. Let Q = (Q_1, Q_2, ..., Q_a) be a probability distribution on [a] such that

    Q_i ≥ α ||M_(i)||^2 / ||M||_F^2,    i = 1, ..., a,

for some α ∈ (0, 1] (so when α = 1, we may take equalities). Let σ = (i_1, ..., i_p) be p independent samples from [a], each drawn from distribution Q. Let N ∈ R^{p×b} with N_(t) = M_(i_t) / sqrt(p Q_{i_t}), t = 1, ..., p. Then

    E[||M^T M − N^T N||_F^2] ≤ (1/(αp)) ||M||_F^4.                              (3)

Proof: We first show E[N^T N] = M^T M. For any pair (r, s),

    E[(N^T N)_{r,s}] = Σ_{t=1}^{p} E[N_{t,r} N_{t,s}]
                     = Σ_{t=1}^{p} Σ_{i=1}^{a} Q_i (M_{i,r} M_{i,s}) / (p Q_i)
                     = Σ_{i=1}^{a} M_{i,r} M_{i,s} = (M^T M)_{r,s}.

Next we bound E[((N^T N)_{r,s} − (M^T M)_{r,s})^2]. Since (N^T N)_{r,s} is a sum of independent terms over t,

    E[((N^T N)_{r,s} − (M^T M)_{r,s})^2] = Σ_{t=1}^{p} ( E[(N_{t,r} N_{t,s})^2] − (E[N_{t,r} N_{t,s}])^2 )
        ≤ Σ_{t=1}^{p} Σ_{i=1}^{a} Q_i (M_{i,r} M_{i,s} / (p Q_i))^2
        = (1/p) Σ_{i=1}^{a} M_{i,r}^2 M_{i,s}^2 / Q_i
        ≤ (1/p) Σ_{i=1}^{a} M_{i,r}^2 M_{i,s}^2 ||M||_F^2 / (α ||M_(i)||^2)
        = (||M||_F^2 / (αp)) Σ_{i=1}^{a} M_{i,r}^2 M_{i,s}^2 / ||M_(i)||^2.

Thus,

    E[||M^T M − N^T N||_F^2] = Σ_{r,s=1}^{b} E[((N^T N)_{r,s} − (M^T M)_{r,s})^2]
        ≤ (||M||_F^2 / (αp)) Σ_{i=1}^{a} (1/||M_(i)||^2) Σ_{r,s=1}^{b} M_{i,r}^2 M_{i,s}^2
        = (||M||_F^2 / (αp)) Σ_{i=1}^{a} ||M_(i)||^2
        = (1/(αp)) ||M||_F^4.  □

Remark 2.2. Lemma 1 suggests that we can approximate the eigenvectors of M^T M (i.e., the right singular vectors of M, and the subspace spanned by them) by the eigenvectors of N^T N (the right singular vectors of N, and the subspace spanned by them). In our problem, if we sample the rows of A to get a p × n matrix S and then sample the columns of S to get a p × p matrix W, we may use the subspace spanned by the left singular vectors of W to approximate the subspace spanned by the left singular vectors of S, and the subspace spanned by the right singular vectors of S to approximate the subspace spanned by the right singular vectors of A. But can we use the subspace spanned by the left singular vectors of W to approximate the subspace spanned by the right singular vectors of A? They are not even of the same dimension. A key observation of [KV04] is that we can make use of the subspace spanned by the left singular vectors of S to get an approximation of the subspace spanned by the right singular vectors of S.

Remark 2.3. By Markov's inequality, Lemma 1 implies that with probability at least 1 − 1/(θ^2 αp), we may assume ||M^T M − N^T N||_F ≤ θ ||M||_F^2.

Algorithm:

1. Input: A ∈ R^{m×n}, k, ε.

2. Set p = f(k, ε) = max(k^4/ε^3, k^3/ε^4).

3. (Row sampling) Let P = (P_1, P_2, ..., P_m) be a probability distribution on [m] such that P_i ≥ c ||A_(i)||^2 / ||A||_F^2, i = 1, ..., m, for some c ∈ (0, 1]. Let i_1, ..., i_p be p independent samples from [m], each drawn from distribution P. Let S ∈ R^{p×n} with S_(t) = A_(i_t) / sqrt(p P_{i_t}), t = 1, ..., p.

4. (Column sampling) Let P' = (P'_1, P'_2, ..., P'_n) be a probability distribution on [n] such that P'_j ≥ c ||S_j||^2 / ||S||_F^2, j = 1, ..., n. Let j_1, ..., j_p be p independent samples from [n], each drawn from distribution P'. Let W ∈ R^{p×p} have columns W_t = S_{j_t} / sqrt(p P'_{j_t}), t = 1, ..., p.

5. Compute the top k left singular vectors of W: u^(1)(W), ..., u^(k)(W).

6. (Filter) Let T = {t : ||W^T u^(t)(W)||^2 ≥ γ ||W||_F^2}, where γ = cε/8. (Note that ||W^T u^(t)(W)|| = σ_t(W), the t-th singular value of W.) For t ∈ T, let

       v̂^(t) = S^T u^(t)(W) / ||W^T u^(t)(W)||.

7. Output v̂^(t) for t ∈ T. (The rank-k approximation to A can then be reconstructed as à = A Σ_{t∈T} v̂^(t) v̂^(t)T.)
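A compact numpy sketch of steps 3 through 7, using the exact length-squared probabilities (c = 1), a small illustrative p rather than the full f(k, ε), and a planted nearly rank-k input for the demo; the function and variable names are ours.

```python
import numpy as np

def constant_time_svd(A, k, eps, p, rng=None):
    """Sample rows of A -> S, columns of S -> W; build approximate top right
    singular vectors of A from the left singular vectors of W."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape

    # Step 3: row sampling with P_i = ||A_(i)||^2 / ||A||_F^2, rows scaled by 1/sqrt(p P_i).
    P = np.sum(A**2, axis=1) / np.sum(A**2)
    rows = rng.choice(m, size=p, p=P)
    S = A[rows, :] / np.sqrt(p * P[rows])[:, None]

    # Step 4: column sampling of S with P'_j = ||S_j||^2 / ||S||_F^2.
    Pc = np.sum(S**2, axis=0) / np.sum(S**2)
    cols = rng.choice(n, size=p, p=Pc)
    W = S[:, cols] / np.sqrt(p * Pc[cols])[None, :]

    # Step 5: top k left singular vectors of W.
    U, sigma, _ = np.linalg.svd(W, full_matrices=False)
    U, sigma = U[:, :k], sigma[:k]

    # Step 6: filter on sigma_t(W)^2 >= gamma * ||W||_F^2, then
    # v_hat^(t) = S^T u^(t)(W) / sigma_t(W).
    gamma = eps / 8.0                          # c = 1 here
    keep = sigma**2 >= gamma * np.sum(W**2)
    V_hat = (S.T @ U[:, keep]) / sigma[keep]

    # Step 7: rank-<=k approximation A_tilde = A * sum_t v_hat v_hat^T.
    return V_hat, A @ V_hat @ V_hat.T

# Demo on a matrix that is close to rank 5.
rng0 = np.random.default_rng(3)
A = 10 * rng0.standard_normal((500, 5)) @ rng0.standard_normal((5, 200)) \
    + rng0.standard_normal((500, 200))
V_hat, A_tilde = constant_time_svd(A, k=5, eps=0.5, p=60, rng=rng0)
print(np.linalg.norm(A - A_tilde) / np.linalg.norm(A))   # well below 1 for this input
```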

Remark 2.4. Some comments about the algorithm.

An important observation the algorithm is based on is that there exists a submatrix W of A whose size is only p × p, with p = f(k, ε), such that W contains an implicit approximation to A satisfying (2). The algorithm thus lets us answer in constant time the question: does there exist a good rank-k approximation to A?

Sampling: for the algorithm we make the following two assumptions on sampling.

1. We can pick row i of A with probability Q_i ≥ c ||A_(i)||^2 / ||A||_F^2, for some c ∈ (0, 1].
2. For any row i, we can pick its j-th entry with probability Q_{i,j} ≥ c A_{i,j}^2 / ||A_(i)||^2.

Note that if no entry of A is much larger than the average, then sampling from the uniform distribution is enough. To implement the column sampling step in the algorithm, we can pick a row uniformly from S and apply the second sampling assumption.

Suppose the entries of A arrive as a stream and we only have about p × p memory. How do we achieve the sampling assumptions? Consider a simpler question first: how do we pick one number from a stream a_1, a_2, ..., so that Pr[a = a_i] is proportional to a_i, while keeping only one number at any time? The answer: upon seeing a_i, replace the currently stored number by a_i with probability a_i / Σ_{j=1}^{i} a_j.
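A minimal sketch of that one-pass rule (assuming a stream of nonnegative numbers; the function name is ours):

```python
import random

def weighted_stream_pick(stream):
    """Keep one number; after seeing a_1..a_i, the kept number equals a_j
    with probability a_j / (a_1 + ... + a_i)."""
    total, kept = 0.0, None
    for a in stream:
        total += a
        # Replace the kept number by a with probability a / (sum so far).
        if total > 0 and random.random() < a / total:
            kept = a
    return kept

print(weighted_stream_pick([3.0, 1.0, 6.0, 2.0]))
```

A short induction shows correctness: if the kept number is distributed proportionally to a_1, ..., a_{i-1} before step i, then after step i it equals a_i with probability a_i / total_i and each earlier a_j with probability (a_j / total_{i-1}) (1 − a_i / total_i) = a_j / total_i.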

Proof sketch (that the algorithm's output satisfies (2)). Define the difference between M and the projection of M onto the subspace span{x^(i) : i ∈ I} as

    F(M; x^(i), I) = ||M − M Σ_{i∈I} x^(i) x^(i)T||_F^2.                        (4)

If {x^(i) : i ∈ I} is an orthonormal set, then

    F(M; x^(i), I) = ||M||_F^2 − Σ_{i∈I} x^(i)T M^T M x^(i).

Next we state the following lemmas, whose proofs can be found in [KV04].

Lemma 2 ([KV04], Lemma 3). Suppose ||A^T A − S^T S||_F ≤ θ ||A||_F^2. Then:

(a) For any pair of unit vectors z, z' in the row space of A,

        |z^T A^T A z' − z^T S^T S z'| ≤ θ ||A||_F^2.

(b) For any set of vectors z^(1), ..., z^(l) in the row space of A,

        |F(A; z^(i), [l]) − F(S; z^(i), [l])| ≤ θ ||A||_F^2.

By Lemma 1, the sampling in the algorithm lets us apply Lemma 2 several times (to the pair A, S and to the pair S^T, W^T).

Lemma 3 ([KV04], Claim 1).

    F(S; v̂^(t), t ∈ T) ≤ F(S^T; u^(t)(W), t ∈ T) + (ε/8) ||A||_F^2.

Now we are ready to show that à in the algorithm does satisfy (2). The proof uses Theorem 1, Lemma 1, Lemma 2(b), and Lemma 3.

1. From Lemma 1 and Remark 2.3, with constant probability we may assume that, for θ = sqrt(40/(cp)),

       ||A^T A − S^T S||_F ≤ θ ||A||_F^2    and    ||S S^T − W W^T||_F ≤ θ ||S||_F^2.

2. From Theorem 1, there exist vectors x^(1), ..., x^(k) (in the span of the sampled rows) such that

       F(A; x^(t), t ∈ [k]) ≤ ||A − A_k||_F^2 + (ε/8) ||A||_F^2.

3. From Lemma 2(b), by picking θ = ε/8,

       F(S; x^(t), t ∈ [k]) ≤ F(A; x^(t), t ∈ [k]) + θ ||A||_F^2 ≤ ||A − A_k||_F^2 + (ε/4) ||A||_F^2.

4. Since S and S^T have the same singular values, there exist vectors y^(t), t ∈ [k], in the column space of S such that

       F(S^T; y^(t), t ∈ [k]) ≤ ||A − A_k||_F^2 + (ε/4) ||A||_F^2.

5. From Lemma 2(b), applied to the pair S^T, W^T (using ||S S^T − W W^T||_F ≤ θ ||S||_F^2), for any vectors z^(t), t ∈ [k],

       F(W^T; z^(t), t ∈ [k]) ≤ F(S^T; z^(t), t ∈ [k]) + θ ||S||_F^2 ≤ ||A − A_k||_F^2 + (ε/2) ||A||_F^2,

   and specifically u^(t)(W), t ∈ [k], will have this property (they minimize F(W^T; ·, [k]) over orthonormal sets of size k):

       F(W^T; u^(t)(W), t ∈ [k]) ≤ ||A − A_k||_F^2 + (ε/2) ||A||_F^2.

6. From Lemma 2(b),

       F(S^T; u^(t)(W), t ∈ T) ≤ F(W^T; u^(t)(W), t ∈ T) + θ ||S||_F^2 ≤ ||A − A_k||_F^2 + (3ε/4) ||A||_F^2.

7. Apply Lemma 3 and then Lemma 2(b):

       F(A; v̂^(t), t ∈ T) ≤ F(S; v̂^(t), t ∈ T) + θ ||A||_F^2
                           ≤ F(S^T; u^(t)(W), t ∈ T) + (ε/8) ||A||_F^2 + θ ||A||_F^2
                           ≤ ||A − A_k||_F^2 + ε ||A||_F^2.

Since F(A; v̂^(t), t ∈ T) = ||A − Ã||_F^2, this implies (2). □

References

[CKVW] Cheng, D., Kannan, R., Vempala, S., and Wang, G. A divide-and-merge methodology for clustering. To appear in Proceedings of the ACM Symposium on Principles of Database Systems, 2005.

[DFK+04] Drineas, P., Frieze, A., Kannan, R., Vempala, S., and Vinay, V. Clustering large graphs via the singular value decomposition. Machine Learning, 56:9-33, 2004. Preliminary version in Proceedings of the 10th ACM-SIAM Symposium on Discrete Algorithms (SODA), Baltimore, 1999.

[KV04] Frieze, A., Kannan, R., and Vempala, S. Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM (JACM), 51(6):1025-1041, 2004. Preliminary version in Proceedings of the 39th Foundations of Computer Science (FOCS), Palo Alto, 1998.