
Matrix Approximation via Sampling, Subspace Embedding

Lecturer: Anup Rao    Scribes: Rakshith Sharma, Peng Zhang    02/01/2016

1 Solving Linear Systems Using SVD

Two applications of SVD have been covered so far. Today we look at a third application. The problem is: given A ∈ R^{m×n} and b ∈ R^m, find an x ∈ R^n that minimizes ‖Ax − b‖². SVD will form the core of the solution, and no assumptions are made on the matrix A. We denote the singular value decomposition of A by A = UDV^T.

Setting the gradient of f(x) = ‖Ax − b‖² to zero, we get

    ∂f(x)/∂x_j = Σ_i 2(⟨A_i, x⟩ − b_i) A_{ij} = 0,

where ⟨·,·⟩ is the inner product and A_i is the i-th row of A. Hence,

    A^T A x = A^T b.    (1)

Observe that this system always has a solution (unique or otherwise). Solving for x:

If A^T A is full rank, then the solution is

    x = (A^T A)^{-1} A^T b.    (2)

If A^T A is not full rank, we claim the following is the best solution in the least-squares sense:

    x = (A^T A)^+ A^T b,    (3)

where (A^T A)^+ = V D^{-2} V^T is known as the pseudoinverse (with D^{-2} inverting only the nonzero singular values).

To support this claim, we need to show that this is a solution to (1). Using the SVD of A and the proposed x,

    A^T A x = V D U^T U D V^T · V D^{-2} V^T A^T b = V V^T A^T b = V V^T V D U^T b = V D U^T b = A^T b.

Hence, the least-squares solution to Ax = b is given by x_LS = (A^T A)^+ A^T b.
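As a quick sanity check of this formula (not from the original notes), the following numpy sketch computes x_LS via the SVD-based pseudoinverse and compares it against numpy's built-in least-squares solver; the matrix sizes and the tolerance are arbitrary illustrative choices.

    import numpy as np

    def svd_least_squares(A, b, tol=1e-12):
        """Least-squares solution x_LS = (A^T A)^+ A^T b via the SVD of A."""
        U, d, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(d) V^T
        d_inv2 = np.zeros_like(d)
        d_inv2[d > tol] = 1.0 / d[d > tol]**2              # invert only nonzero singular values
        # (A^T A)^+ A^T b = V diag(d^{-2}) V^T A^T b
        return Vt.T @ (d_inv2 * (Vt @ (A.T @ b)))

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 8))
    b = rng.standard_normal(50)
    x = svd_least_squares(A, b)
    # Agrees with numpy's least-squares solver up to numerical error.
    assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])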

2 Matrix Approximation via Sampling

In this section, we present a method to estimate the best rank-k approximation of a matrix using a column-selection method.

2.1 Approximating a matrix product by sampling

This subsection establishes fast methods to approximate matrix-vector and matrix-matrix products, which will later be used in the algorithm that obtains the best rank-k approximation of a given matrix.

Let A ∈ R^{m×n} and v ∈ R^n, and let A^i denote the i-th column of A. Then

    Av = Σ_{i=1}^n A^i v_i.

We estimate this product by sampling just one column from A. Define the random variable X by

    X = A^i v_i / p_i    with probability p_i.

Then the expected value of X is

    E[X] = Σ_i p_i · (A^i v_i / p_i) = Av.

The variance of X (for a vector-valued random variable, Var denotes E‖X − E[X]‖²) is

    Var[X] = E‖X‖² − ‖E[X]‖² = Σ_i ‖A^i‖² v_i² / p_i − ‖Av‖².

We choose the p_i's to keep Var[X] small; a choice that works for every v is p_i ∝ ‖A^i‖². After normalizing, we get

    p_i = ‖A^i‖² / ‖A‖_F²    and    Var[X] = ‖A‖_F² ‖v‖² − ‖Av‖² ≤ ‖A‖_F² ‖v‖².

We reduce the variance by repeating the above process s times independently, getting random variables X_1, ..., X_s, and setting Y = (1/s) Σ_{t=1}^s X_t. Then

    E[Y] = Av    and    Var[Y] ≤ (1/s) ‖A‖_F² ‖v‖².
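The estimator Y is straightforward to simulate. The following numpy sketch (illustrative; the sizes and the number of samples s are arbitrary) draws s columns with the length-squared probabilities p_i = ‖A^i‖²/‖A‖_F² and averages the single-column estimates.

    import numpy as np

    def sampled_matvec(A, v, s, rng):
        """Estimate Av by averaging s single-column estimates A^i v_i / p_i,
        where p_i = ||A^i||^2 / ||A||_F^2 (length-squared sampling)."""
        col_norms2 = np.sum(A**2, axis=0)
        p = col_norms2 / col_norms2.sum()           # the LS_col(A) distribution
        idx = rng.choice(A.shape[1], size=s, p=p)   # s i.i.d. column indices
        return np.mean(A[:, idx] * (v[idx] / p[idx]), axis=1)

    rng = np.random.default_rng(1)
    A = rng.standard_normal((100, 500))
    v = rng.standard_normal(500)
    est = sampled_matvec(A, v, s=200, rng=rng)
    # Relative error; it shrinks as s grows, in line with Var[Y] <= ||A||_F^2 ||v||^2 / s.
    print(np.linalg.norm(est - A @ v) / np.linalg.norm(A @ v))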

More generally, we may use any approximately length-squared distribution: a distribution p with

    p_i ≥ c ‖A^i‖² / ‖A‖_F²    for all i,

where c ≤ 1, which we denote LS_col(A, c). (If Σ_i p_i < 1, we can output the zero vector with the remaining probability.) With this choice of the p_i's, we get

    Var[Y] ≤ (1/(cs)) ‖A‖_F² ‖v‖².    (4)

We now use this to compute the product of two matrices. Given A ∈ R^{m×n} and B ∈ R^{n×p}, write AB = [AB^1, AB^2, ..., AB^p]. We pick s columns of A, say j_1, j_2, ..., j_s, according to a distribution LS_col(A, c) (the exact length-squared distribution p_i = ‖A^i‖²/‖A‖_F² corresponds to c = 1), and approximate AB by the random matrix

    Y = (1/s) Σ_{t=1}^s A^{j_t} B_{j_t} / p_{j_t},

where subscripts denote row indices and superscripts denote column indices (so A^{j_t} is a column of A and B_{j_t} is a row of B). We have

    E[Y] = AB    and    Var[Y] ≤ (1/(cs)) ‖A‖_F² Σ_i ‖B_i‖² = (1/(cs)) ‖A‖_F² ‖B‖_F²,    (5)

where here Var[Y] denotes E‖Y − AB‖_F². We now have the framework required to study an algorithm that computes a rank-k approximation of a matrix quickly.

2.2 Fast algorithm for rank-k approximation

Algorithm:
  1. Sample s columns of A according to LS_col(A, c). Let C be the matrix of these columns, rescaled:
         C = (1/√(cs)) [ A^{j_1}/√p_{j_1}, ..., A^{j_s}/√p_{j_s} ].
  2. Find the top k left singular vectors of C, say u_1, u_2, ..., u_k.
  3. Output Ã := Σ_{i=1}^k u_i u_i^T A as the rank-k approximation of A.
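A minimal numpy sketch of this algorithm (illustrative only): it uses the exact length-squared distribution, i.e. c = 1, and scales the sampled columns by 1/√(s p_i) so that E[CC^T] = AA^T; only the top k left singular vectors of C are kept. Theorem 2.1 below quantifies how the additive error decreases with s.

    import numpy as np

    def fast_rank_k_approx(A, k, s, rng):
        """Rank-k approximation of A from s length-squared-sampled columns."""
        col_norms2 = np.sum(A**2, axis=0)
        p = col_norms2 / col_norms2.sum()            # LS_col(A), i.e. c = 1
        idx = rng.choice(A.shape[1], size=s, p=p)
        C = A[:, idx] / np.sqrt(s * p[idx])          # columns A^{j_t} / sqrt(s p_{j_t})
        U = np.linalg.svd(C, full_matrices=False)[0]
        Uk = U[:, :k]                                # top k left singular vectors of C
        return Uk @ (Uk.T @ A)                       # A_tilde = U_k U_k^T A

    rng = np.random.default_rng(2)
    A = rng.standard_normal((200, 40)) @ rng.standard_normal((40, 300))  # roughly low rank
    A_tilde = fast_rank_k_approx(A, k=10, s=100, rng=rng)
    print(np.linalg.norm(A - A_tilde, 'fro'))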

Theorem 2.1 ([KV08]). Let A_k denote the best rank-k approximation of A. Then

    E‖A − Ã‖_F² ≤ ‖A − A_k‖_F² + (2√k/√s) ‖A‖_F²,

where Ã is the rank-k approximation produced above, i.e., Ã = UU^T A with U = [u_1, ..., u_k]; if LS_col(A, c) is used, the last term becomes (2√k/√(cs)) ‖A‖_F².

We prove Theorem 2.1 via the following lemma.

Lemma 2.2. Let A ∈ R^{m×n}, C ∈ R^{m×s}, and let U ∈ R^{m×k} consist of the top k left singular vectors of C. Then

    ‖A − UU^T A‖_F² ≤ ‖A − A_k‖_F² + 2√k ‖AA^T − CC^T‖_F.

Proof. First,

    ‖A − UU^T A‖_F² = tr((A − UU^T A)^T (A − UU^T A))
                    = tr(A^T A − A^T UU^T A − A^T UU^T A + A^T UU^T UU^T A)
                    = tr(A^T A − A^T UU^T A)
                    = ‖A‖_F² − ‖U^T A‖_F².

The third equality is due to U^T U = I_k. Then,

    ‖A − UU^T A‖_F² − ‖A − A_k‖_F²
        = ‖A‖_F² − ‖U^T A‖_F² − (‖A‖_F² − ‖A_k‖_F²)
        = ‖A_k‖_F² − ‖U^T A‖_F²
        = ‖A_k‖_F² − ‖C_k‖_F² + ‖C_k‖_F² − ‖U^T A‖_F²
        = Σ_{i=1}^k (σ_i²(A) − σ_i²(C)) + Σ_{i=1}^k (σ_i²(C) − ‖u_i^T A‖²)
        ≤ Σ_{i=1}^k |σ_i²(C) − σ_i²(A)| + Σ_{i=1}^k |u_i^T (CC^T − AA^T) u_i|
        ≤ √k ‖CC^T − AA^T‖_F + √k ‖CC^T − AA^T‖_F
        = 2√k ‖CC^T − AA^T‖_F.

Here we first used the Cauchy–Schwarz inequality on both summations and then the Hoffman–Wielandt inequality on the first summation. (For the second summation we used σ_i²(C) = u_i^T CC^T u_i and ‖u_i^T A‖² = u_i^T AA^T u_i, together with the fact that Σ_i (u_i^T M u_i)² ≤ ‖M‖_F² for orthonormal u_1, ..., u_k.)

Proof of Theorem 2.1. By Lemma 2.2,

    E‖A − Ã‖_F² = E‖A − UU^T A‖_F² ≤ ‖A − A_k‖_F² + 2√k · E‖AA^T − CC^T‖_F.

Note that CC^T is exactly the estimator Y of (5) applied to the product AA^T (i.e., with B = A^T), so E[CC^T] = AA^T and, taking c = 1 in (5),

    E‖AA^T − CC^T‖_F² = Var[CC^T] ≤ (1/s) ‖A‖_F⁴.

Hence E‖AA^T − CC^T‖_F ≤ (E‖AA^T − CC^T‖_F²)^{1/2} ≤ (1/√s) ‖A‖_F², and therefore

    E‖A − Ã‖_F² ≤ ‖A − A_k‖_F² + (2√k/√s) ‖A‖_F².
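Since Lemma 2.2 is a deterministic inequality, it can be spot-checked numerically. The snippet below (an illustration with arbitrary sizes, not part of the notes) builds C from a uniformly sampled, rescaled column submatrix of A and verifies the bound.

    import numpy as np

    # Numerical spot-check of Lemma 2.2 (arbitrary sizes; C is a rescaled column sample of A).
    rng = np.random.default_rng(3)
    m, n, s, k = 80, 120, 30, 5
    A = rng.standard_normal((m, n))
    idx = rng.choice(n, size=s, replace=False)
    C = A[:, idx] * np.sqrt(n / s)                        # uniform column sample, rescaled

    U = np.linalg.svd(C, full_matrices=False)[0][:, :k]   # top k left singular vectors of C
    sigma = np.linalg.svd(A, compute_uv=False)
    lhs = np.linalg.norm(A - U @ (U.T @ A), 'fro')**2
    # ||A - A_k||_F^2 + 2 sqrt(k) ||AA^T - CC^T||_F
    rhs = np.sum(sigma[k:]**2) + 2*np.sqrt(k)*np.linalg.norm(A @ A.T - C @ C.T, 'fro')
    assert lhs <= rhs + 1e-8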

3 Fast Approximation Using CUR Decomposition [KV08]

Another way to approximate a matrix A ∈ R^{m×n} is to sample s columns of A and also s rows of A. Let C ∈ R^{m×s} and R ∈ R^{s×n} be made of (rescaled) columns and rows of A respectively, sampled according to LS_col(A) and LS_row(A). Then we can compute an s×s matrix U such that CUR ≈ A, with

    E‖A − CUR‖_F² ≤ ‖A − A_k‖_F² + poly(k)/√s · ‖A‖_F².

4 Subspace Embedding

Definition 4.1. Let S ∈ R^{r×n} and let V ⊆ R^n be a subspace with dim(V) = d. We say S is a subspace embedding for V if

    ‖Sx‖ = (1 ± ε) ‖x‖    for all x ∈ V.

We say S is a subspace embedding for a matrix A if it is a subspace embedding for the column space of A.

An oblivious subspace embedding is defined as follows: S is a random matrix with distribution D over R^{r×n} such that, for any fixed matrix A ∈ R^{n×d}, with good probability over S we have ‖SAx‖ = (1 ± ε) ‖Ax‖ for all x ∈ R^d.

One idea is to let each entry of S be an i.i.d. Gaussian random variable N(0, 1), scaled by 1/√r. The Johnson–Lindenstrauss lemma shows that, to preserve the pairwise distances among m vectors (here the vectors Ax of interest), we need r = O(log m / ε²). For an oblivious subspace embedding the vectors Ax form an entire d-dimensional subspace, which is handled via a net of m = 2^{Ω(d)} vectors. Furthermore, S sampled in this way is a dense matrix, so computing SA is expensive. In this lecture, we will show a distribution over S which has r = O(d²/ε²) and for which S is sparse.

An application of subspace embeddings is linear regression. The goal of linear regression is min_x ‖Ax − b‖, where A ∈ R^{n×d} with n ≫ d. It equals min_y ‖A₁ y‖ over vectors of the given form, where A₁ = [A  b] and y = (x, −1)^T. Now suppose S is a subspace embedding for A₁; then ‖SA₁ y‖ = (1 ± ε) ‖A₁ y‖ for every y. Consequently min_x ‖Ax − b‖ is approximated within a (1 ± ε) factor by

    min_x ‖SAx − Sb‖.

Since SA ∈ R^{r×d}, this reduces the dimension of the problem significantly if r depends only on d. We now give the following theorem.

Theorem 4.2 ([Woo14]). Let A ∈ R^{n×d}. There is an algorithm that generates a random S ∈ R^{r×n} with r = O((d²/ε²) · polylog(d/ε)) such that, with probability 0.99, S is a (1 ± ε) subspace embedding for A. Furthermore, SA can be computed in O(nnz(A)) time.

S can be generated in the following way. Each column of S has exactly one nonzero entry, whose row is chosen uniformly and independently; the entry is set to 1 or −1 with equal probability. More precisely, let h : [n] → {1, ..., r} and σ : [n] → {−1, 1} be chosen uniformly at random; then S_{ij} = χ[h(j) = i] · σ(j), where χ[h(j) = i] is an indicator. The dimension can be reduced further by applying the Johnson–Lindenstrauss lemma to SA, i.e., by using S′(SA) for a dense JL matrix S′.
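The following numpy sketch (illustrative; the sizes n, d and the sketch dimension r are arbitrary, and the guarantee only holds with constant probability as in Theorem 4.2) builds such an S and uses it for sketch-and-solve regression. Forming S densely is only for clarity; exploiting its one-nonzero-per-column structure is what gives the O(nnz(A)) time to compute SA.

    import numpy as np

    def countsketch(n, r, rng):
        """Sparse embedding S in R^{r x n}: one +-1 entry per column, in a uniformly random row."""
        h = rng.integers(0, r, size=n)               # h : [n] -> {1, ..., r}
        sigma = rng.choice([-1.0, 1.0], size=n)      # sigma : [n] -> {-1, +1}
        S = np.zeros((r, n))
        S[h, np.arange(n)] = sigma                   # S_{ij} = sigma(j) * 1[h(j) = i]
        return S

    rng = np.random.default_rng(4)
    n, d, r = 5000, 10, 500
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    S = countsketch(n, r, rng)
    x_sketch = np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]   # min_x ||SAx - Sb||
    x_exact = np.linalg.lstsq(A, b, rcond=None)[0]            # min_x ||Ax - b||
    # Ratio of residuals is close to 1 when S is a subspace embedding for [A b].
    print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))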

To prove Theorem 4.2, we need the following lemma and theorem.

Lemma 4.3. If r ≥ 2/(ε²δ), then for any x ∈ R^n with ‖x‖ = 1, the matrix S generated by the above algorithm satisfies

    E[(‖Sx‖² − 1)²] ≤ ε²δ.

Theorem 4.4. Let A ∈ R^{n×d} and B ∈ R^{n×d}. If r ≥ 2/(ε²δ), then

    P_S( ‖A^T S^T S B − A^T B‖_F ≥ 3ε ‖A‖_F ‖B‖_F ) ≤ δ.

Proof of Theorem 4.2. Let U = [u_1, ..., u_d] ∈ R^{n×d} be an orthonormal basis of the column space of A. To show that S is a subspace embedding for A, it suffices to show that ‖SUx‖ = (1 ± ε)‖x‖ for all x ∈ R^d. This is equivalent to showing x^T U^T S^T S U x = (1 ± ε) x^T I x for all x, i.e., ‖U^T S^T S U − I‖₂ ≤ ε. Since ‖U^T S^T S U − I‖₂ ≤ ‖U^T S^T S U − I‖_F, it suffices to show ‖U^T S^T S U − I‖_F ≤ ε. Take A = B = U in Theorem 4.4. Since the columns of U are orthonormal, U^T U = I_d and ‖U‖_F² = d. By Theorem 4.4,

    P_S( ‖U^T S^T S U − I_d‖_F ≥ 3ε₁ d ) ≤ δ.

Setting ε₁ = ε/(3d), we need r = O(d²/(ε²δ)).

Proof of Theorem 4.4. Let x, y ∈ R^n be unit vectors. By the polarization identity,

    ⟨Sx, Sy⟩ = (‖Sx‖² + ‖Sy‖² − ‖S(x − y)‖²) / 2.

For a random variable X, write f(X) = (E[X²])^{1/2}. By Minkowski's inequality, f(X + Y) ≤ f(X) + f(Y). Thus

    f(⟨Sx, Sy⟩ − ⟨x, y⟩)
        = (1/2) f( (‖Sx‖² − ‖x‖²) + (‖Sy‖² − ‖y‖²) − (‖S(x − y)‖² − ‖x − y‖²) )
        ≤ (1/2) [ f(‖Sx‖² − ‖x‖²) + f(‖Sy‖² − ‖y‖²) + f(‖S(x − y)‖² − ‖x − y‖²) ]
        ≤ (1/2) [ ε√δ + ε√δ + 4ε√δ ]
        = 3ε√δ.

The last inequality is due to Lemma 4.3, together with the facts that ‖x‖ = ‖y‖ = 1 and ‖x − y‖ ≤ 2 (for the third term, apply Lemma 4.3 to the unit vector (x − y)/‖x − y‖ and use ‖S(x − y)‖² − ‖x − y‖² = ‖x − y‖² (‖S((x − y)/‖x − y‖)‖² − 1)).

It is easy to see that (A^T B)_{ij} = ⟨A^i, B^j⟩, where A^i and B^j denote columns of A and B. Let X_{ij} = ⟨SA^i, SB^j⟩ − ⟨A^i, B^j⟩. Then

    ‖A^T S^T S B − A^T B‖_F² = Σ_{i,j} X_{ij}².

By the above calculation applied to the unit vectors A^i/‖A^i‖ and B^j/‖B^j‖, f(X_{ij}) ≤ 3ε√δ ‖A^i‖ ‖B^j‖, that is, E[X_{ij}²] ≤ 9ε²δ ‖A^i‖² ‖B^j‖². Then

    E‖A^T S^T S B − A^T B‖_F² ≤ 9ε²δ Σ_i ‖A^i‖² Σ_j ‖B^j‖² = 9ε²δ ‖A‖_F² ‖B‖_F².

By Chebyshev's inequality,

    P_S( ‖A^T S^T S B − A^T B‖_F ≥ 3ε ‖A‖_F ‖B‖_F ) ≤ E‖A^T S^T S B − A^T B‖_F² / (9ε² ‖A‖_F² ‖B‖_F²) ≤ δ.

Proof of Lemma 4.3. Expand (‖Sx‖² − 1)² = ‖Sx‖⁴ − 2‖Sx‖² + 1. We bound the expectations of ‖Sx‖² and ‖Sx‖⁴ separately.

    E‖Sx‖² = E[x^T S^T S x] = x^T E[S^T S] x.

Consider (E[S^T S])_{j j'} = E[⟨S^j, S^{j'}⟩], where S^j denotes the j-th column of S. According to the distribution of S, each column of S has exactly one nonzero entry, which is picked as −1 or 1 uniformly, independently across columns. Thus (E[S^T S])_{jj} = 1 and (E[S^T S])_{j j'} = 0 for j ≠ j', i.e., E[S^T S] = I_n. It follows that E‖Sx‖² = ‖x‖² = 1.

Similarly, we bound the expectation of ‖Sx‖⁴. Writing Sx = Σ_j x_j S^j,

    ‖Sx‖⁴ = ( Σ_{j1, j2} x_{j1} x_{j2} ⟨S^{j1}, S^{j2}⟩ )²
          = Σ_{j1, j2, j3, j4} x_{j1} x_{j2} x_{j3} x_{j4} ⟨S^{j1}, S^{j2}⟩ ⟨S^{j3}, S^{j4}⟩.

Since the columns of S are sampled independently and E[σ(j)] = 0, a term has nonzero expectation only when every column index appears an even number of times. Moreover, E[⟨S^{j1}, S^{j2}⟩²] = P(h(j1) = h(j2)) = 1/r for j1 ≠ j2, while ‖S^j‖² = 1 always. Thus

    E‖Sx‖⁴ = Σ_j x_j⁴ + Σ_{j1 ≠ j2} x_{j1}² x_{j2}² + (2/r) Σ_{j1 ≠ j2} x_{j1}² x_{j2}²
           = ( Σ_j x_j² )² + (2/r) Σ_{j1 ≠ j2} x_{j1}² x_{j2}²
           ≤ 1 + 2/r.

Putting it all together, E[(‖Sx‖² − 1)²] = E‖Sx‖⁴ − 2 E‖Sx‖² + 1 ≤ 2/r. Since r ≥ 2/(ε²δ), we have E[(‖Sx‖² − 1)²] ≤ ε²δ.

References

[KV08] Ravindran Kannan and Santosh Vempala. Spectral algorithms. Foundations and Trends in Theoretical Computer Science, 4(3–4):157–288, 2008.

[Woo14] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1–2):1–157, 2014.

By Chebyshev nequalty, P S A T S T SB A T B F ɛ A F B F E A T SS T B A T B F ɛ A F B F Proof of Lemma.3. Expand Sx 1 = Sx Sx + 1 δ. We wll then bound the expectaton of Sx and Sx separately. E Sx = Ex T S T Sx = x T ES T S x Consder ES T S j = E S, S j. Accordng to the dstrbuton of S, each column of S has exactly one nonzero entry, whch s pced as -1 or 1 unformly and ndependently. Thus, ES T S = 1 and ES T S j = 0 for j,.e., ES T S = I n. It means that E Sx = x = 1. Smlarly, we bound the expectaton of Sx. E Sx = E S j x j j = ES j1 S j S j S 1 j x j 1 x j x j 1 x j, j 1,j,j 1,j Snce columns of S are sampled ndependently and ES j = 0, we have that ES j S j = 0 for all j j. Moreover, ES j = ES j = 1 r. Thus, E Sx = ES jx j + ES j 1 S j x 1 j 1 x j + ES 1 j 1 S j x j 1 x j j, j 1 j 1 j 1 j = 1 r x j + 1 r x j 1 x j + 1 1 r x j 1 x j j, j 1 j 1 j 1 j = x j + x j 1 x j + 1 1 r x j 1 x j j j 1 j j 1 j 1 1 + r Puttng all together, we have E Sx 1 = E Sx E Sx + 1 r. Snce r ɛ δ, we have E Sx 1 ɛ δ. References [KV08] Ravndran Kannan and Santosh Vempala. Spectral algorthms. Foundatons and Trends n Theoretcal Computer Scence, 3:157 88, 008. [Woo1] Davd P. Woodruff. Setchng as a tool for numercal lnear algebra. Foundatons and Trends n Theoretcal Computer Scence, 101:1 157, 01. 7