Matrix Approximation via Sampling, Subspace Embedding

Lecturer: Anup Rao    Scribes: Rakshith Sharma, Peng Zhang    2016

1 Solving Linear Systems Using SVD

Two applications of SVD have been covered so far. Today we look at a third application. The problem: given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, find an $x \in \mathbb{R}^n$ such that $\|Ax - b\|$ is minimized. SVD will form the core of the solution, and no assumptions are made on the matrix $A$. We denote the singular value decomposition of $A$ by
$$A = UDV^T.$$
Setting the gradient of $f(x) = \|Ax - b\|^2$ to zero, we get
$$\frac{\partial f(x)}{\partial x_j} = \sum_i 2\big(\langle A_i, x\rangle - b_i\big)A_{ij} = 0,$$
where $\langle\cdot,\cdot\rangle$ is the inner product and $A_i$ is the $i$-th row of $A$. Hence,
$$A^TAx = A^Tb. \qquad (1)$$
Observe that this system always has a solution (unique or otherwise). Solving for $x$: if $A^TA$ is full rank, then the solution is
$$x = (A^TA)^{-1}A^Tb. \qquad (2)$$
If $A^TA$ is not full rank, we claim that the following is the best solution in the least-squares sense:
$$x = (A^TA)^{\dagger}A^Tb, \qquad (3)$$
where $(A^TA)^{\dagger} = VD^{-2}V^T$ is known as the pseudoinverse. To support this claim, we need to show that this is a solution to (1). Using the SVD of $A$ and the proposed $x$,
$$A^TAx = (VDU^T)(UDV^T)(VD^{-2}V^T)A^Tb = VV^TA^Tb = VV^TVDU^Tb = VDU^Tb = A^Tb.$$
Hence, the least-squares solution to $Ax = b$ is given by $x_{LS} = (A^TA)^{\dagger}A^Tb$.
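As a concrete companion to this derivation, here is a minimal NumPy sketch; the function name and the singular-value cutoff `tol` are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def svd_least_squares(A, b, tol=1e-12):
    """Compute x_LS = (A^T A)^+ A^T b via the SVD A = U diag(d) V^T.

    Since (A^T A)^+ A^T = V diag(d)^{-2} V^T V diag(d) U^T = V diag(d)^{-1} U^T,
    this is the Moore-Penrose pseudoinverse of A applied to b; it works whether
    or not A^T A is full rank."""
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    # Invert only the nonzero singular values (tol is an assumed cutoff).
    d_inv = np.where(d > tol * d.max(), 1.0 / d, 0.0)
    return Vt.T @ (d_inv * (U.T @ b))

# Sanity check on a rank-deficient A: the residual matches numpy's own solver.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))  # 8x5, rank 3
b = rng.standard_normal(8)
x = svd_least_squares(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(A @ x - b), np.linalg.norm(A @ x_ref - b))
```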
2 Matrix Approximation via Sampling

In this section, we present a method to estimate the best rank-$k$ approximation of a matrix using a column selection method.

2.1 Approximating matrix products by sampling

This subsection establishes fast methods to approximate matrix-vector and matrix-matrix products, which will later be used in the algorithm that obtains the best rank-$k$ approximation of a given matrix. Let $A \in \mathbb{R}^{m\times n}$ and $v \in \mathbb{R}^n$, and let $A^i$ denote the $i$-th column of $A$. Then
$$Av = \sum_{i=1}^n A^iv_i.$$
We estimate this product by sampling just one column from $A$. Define the random variable $X$ to be
$$X = \frac{A^iv_i}{p_i} \quad \text{with probability } p_i.$$
Then the expected value of $X$ is
$$\mathbb{E}[X] = \sum_i p_i\,\frac{A^iv_i}{p_i} = Av.$$
The variance of $X$ is
$$\mathrm{Var}[X] = \mathbb{E}\,\|X\|^2 - \|\mathbb{E}[X]\|^2 = \sum_i\frac{\|A^i\|^2v_i^2}{p_i} - \|Av\|^2.$$
We choose the $p_i$'s to minimize $\mathrm{Var}[X]$. This can be achieved by setting $p_i \propto \|A^i\|^2$. After normalizing, we get
$$p_i = \frac{\|A^i\|^2}{\|A\|_F^2}$$
and
$$\mathrm{Var}[X] = \|A\|_F^2\,\|v\|^2 - \|Av\|^2 \le \|A\|_F^2\,\|v\|^2.$$
We reduce the variance by repeating the above process $s$ times, getting i.i.d. random variables $X_1, \dots, X_s$. Let $Y = \frac{1}{s}\sum_{i=1}^s X_i$. Then we have
$$\mathbb{E}[Y] = Av \quad \text{and} \quad \mathrm{Var}[Y] \le \frac{1}{s}\,\|A\|_F^2\,\|v\|^2.$$
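A minimal NumPy sketch of this length-squared sampling estimator for $Av$, averaged over $s$ samples (the function name and parameter choices are illustrative):

```python
import numpy as np

def sampled_matvec(A, v, s, rng):
    """Estimate Av as Y = (1/s) sum_t A^{i_t} v_{i_t} / p_{i_t}, where the
    column indices i_t are i.i.d. from p_i = ||A^i||^2 / ||A||_F^2."""
    col_sq = (A * A).sum(axis=0)             # ||A^i||^2 for every column i
    p = col_sq / col_sq.sum()                # the length-squared distribution
    idx = rng.choice(A.shape[1], size=s, p=p)
    samples = A[:, idx] * (v[idx] / p[idx])  # column t is one sample X_t
    return samples.mean(axis=1)              # Y = (1/s) sum_t X_t

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 1000))
v = rng.standard_normal(1000)
Y = sampled_matvec(A, v, s=5000, rng=rng)
print(np.linalg.norm(Y - A @ v) / np.linalg.norm(A @ v))  # decays like 1/sqrt(s)
```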
We could also use an approximate probability distribution of the form
$$p_i \ge c\,\frac{\|A^i\|^2}{\|A\|_F^2},$$
where $c \le 1$; denote such a distribution by $\mathrm{LS}_{\mathrm{col}}(A, c)$. If $\sum_i p_i < 1$, we can pick the zero vector with the remaining probability. With this choice for the $p_i$'s, we get
$$\mathrm{Var}[Y] \le \frac{1}{cs}\,\|A\|_F^2\,\|v\|^2. \qquad (4)$$

We now use this to compute the product of two matrices. Given $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{n\times p}$, we have
$$AB = [AB^1, AB^2, \dots, AB^p],$$
so each column of $AB$ is a matrix-vector product; equivalently, $AB = \sum_{i=1}^n A^iB_i$, where $B_i$ is the $i$-th row of $B$. Let $p_i = \frac{\|A^i\|^2}{\|A\|_F^2}$ be the probability of picking column $i$, and denote this distribution by $\mathrm{LS}_{\mathrm{col}}(A)$. We pick $s$ columns of $A$, say $j_1, j_2, \dots, j_s$, according to this distribution, and approximate $AB$ by the random matrix
$$Y = \frac{1}{s}\sum_{t=1}^{s}\frac{A^{j_t}B_{j_t}}{p_{j_t}},$$
where subscripts denote row indices and superscripts denote column indices. We have $\mathbb{E}[Y] = AB$ and
$$\mathrm{Var}[Y] \le \frac{1}{cs}\,\|A\|_F^2\sum_i\|B_i\|^2 = \frac{1}{cs}\,\|A\|_F^2\,\|B\|_F^2. \qquad (5)$$
We now have the framework required to study an algorithm that computes a rank-$k$ approximation of a matrix quickly.

2.2 Fast algorithm for rank-k approximation

Algorithm: Sample $s$ columns $i_1, \dots, i_s$ of $A$ according to $\mathrm{LS}_{\mathrm{col}}(A, c)$, and let $C$ be the matrix containing these columns, rescaled as
$$C = \frac{1}{\sqrt{cs}}\left[\frac{A^{i_1}}{\sqrt{p_{i_1}}}, \dots, \frac{A^{i_s}}{\sqrt{p_{i_s}}}\right].$$
Find the top $k$ left singular vectors of $C$, say $u_1, u_2, \dots, u_k$, and output
$$\tilde{A} := \sum_{i=1}^k u_iu_i^TA$$
as the rank-$k$ approximation of $A$.

Theorem 2.1 [KV08].
$$\mathbb{E}\,\|A - \tilde{A}\|_F^2 \le \|A - A_k\|_F^2 + 2\sqrt{\frac{k}{s}}\,\|A\|_F^2,$$
where $A_k$ is the best rank-$k$ approximation of $A$ and, in our case, $\tilde{A} = UU^TA$ with $U = [u_1, \dots, u_k]$; the last term becomes $2\sqrt{\frac{k}{cs}}\,\|A\|_F^2$ if $\mathrm{LS}_{\mathrm{col}}(A, c)$ is used.
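A NumPy sketch of this algorithm with $c = 1$, i.e., sampling exactly from $\mathrm{LS}_{\mathrm{col}}(A)$; the helper names and test sizes are assumptions for illustration:

```python
import numpy as np

def fast_rank_k(A, k, s, rng):
    """Rank-k approximation by column sampling: build the rescaled sample
    matrix C, take its top-k left singular vectors u_1, ..., u_k, and
    return A_tilde = U U^T A."""
    col_sq = (A * A).sum(axis=0)
    p = col_sq / col_sq.sum()                # LS_col(A), with c = 1
    idx = rng.choice(A.shape[1], size=s, p=p)
    C = A[:, idx] / np.sqrt(s * p[idx])      # C = (1/sqrt(s)) [A^{i_t}/sqrt(p_{i_t})]
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    Uk = U[:, :k]                            # top-k left singular vectors of C
    return Uk @ (Uk.T @ A)

rng = np.random.default_rng(2)
k = 10
A = rng.standard_normal((200, k)) @ rng.standard_normal((k, 400))  # near-low-rank
A += 0.01 * rng.standard_normal(A.shape)
A_tilde = fast_rank_k(A, k, s=80, rng=rng)
U, d, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * d[:k]) @ Vt[:k]            # the best rank-k approximation
print(np.linalg.norm(A - A_tilde, 'fro'), np.linalg.norm(A - A_k, 'fro'))
```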
Lemma.. Let A R m n, C R m s, and U R m consstng of the top sngular vectors of C. Then, Proof. A UU T A F A A F + AA T CC T F A UU T A F = tra UU T A T A UU T A = tra T A A T UU T A A T UU T A + A T UU T UU T A = tra T A A T UU T A = A F U T A F The thrd equalty s due to U T U = I. Then, A UU T A F A A F = A F U T A F A F A F = A F U T A F = A F C F + C F U T A F = σ A σ C + σ C U T A F =1 =1 σ C σ A + σ C U T A F σ C σ A + u T CCT AA T u CC T AA T F + CC T AA T F = CC T AA T F Here we frst used the Cauchy-Schwarz nequalty on both summatons and then the Hoffman- Welandt nequalty on the frst summaton. Proof of Theorem.1. By Lemma., we have E A Ã = E A UU T A A A F + E AA T CC T F Snce ECC T = AA T, by 5, Thus, E AA T CC T F = VarCC T 1 s A F. E A Ã A A F + s A F.
3 Fast Approximation Using CUR Decomposition [KV08]

Another way to approximate a matrix $A \in \mathbb{R}^{m\times n}$ is to sample $s$ columns of $A$ and also $s$ rows of $A$. Let $C \in \mathbb{R}^{m\times s}$ and $R \in \mathbb{R}^{s\times n}$ be made of columns and rows of $A$ respectively, sampled according to $\mathrm{LS}_{\mathrm{col}}(A)$ and $\mathrm{LS}_{\mathrm{row}}(A)$. Then we can compute an $s\times s$ matrix $U$ such that $CUR \approx A$ and
$$\mathbb{E}\,\|A - CUR\|_F^2 \le \|A - A_k\|_F^2 + 2\sqrt{\frac{k}{s}}\,\|A\|_F^2 + \frac{1}{\sqrt{s}}\,\|A\|_F^2.$$

4 Subspace Embedding

Definition 4.1. Let $S \in \mathbb{R}^{r\times n}$ and let $V \subseteq \mathbb{R}^n$ with $\dim(V) = d$. We say $S$ is a subspace embedding for $V$ if
$$\|Sx\|^2 = (1\pm\epsilon)\,\|x\|^2, \quad \forall x \in V.$$
We say $S$ is a subspace embedding for a matrix $A$ if it is a subspace embedding for the column space of $A$.

An oblivious subspace embedding is defined as follows: $S$ is a random matrix with distribution $D$ over $\mathbb{R}^{r\times n}$ such that, for any fixed matrix $A$, with high probability $\|SAx\|^2 = (1\pm\epsilon)\,\|Ax\|^2$ for every $x \in \mathbb{R}^d$.

One idea is to let each entry of $S$ be an i.i.d. Gaussian random variable $N(0, 1)$ (scaled by $1/\sqrt{r}$). The Johnson-Lindenstrauss lemma shows that, to preserve the pairwise distances among $m$ vectors, we need $r = O(\log m/\epsilon^2)$, where $m$ is the number of vectors (i.e., vectors of the form $Ax$). For an oblivious subspace embedding, $m = 2^{\Omega(d)}$. Furthermore, an $S$ sampled this way is a dense matrix, so computing $SA$ is expensive. In this lecture, we will show a distribution over $S$ which has $r = O(d^2/\epsilon^2)$ and for which $S$ is sparse.

An application of subspace embeddings is linear regression. The goal of linear regression is $\min_x \|Ax - b\|^2$, where $A \in \mathbb{R}^{n\times d}$ with $n \gg d$. It is equal to $\min_y \|A_1y\|^2$, where $A_1 = [A,\ b]$ and $y = \binom{x}{-1}$. Now, suppose $S$ is a subspace embedding for $A_1$; then we have $\|SA_1y\|^2 = (1\pm\epsilon)\,\|A_1y\|^2$ for any $y$. The goal becomes the following:
$$\min_x \|Ax - b\|^2 = (1\pm\epsilon)\min_x \|SAx - Sb\|^2.$$
Since $SA \in \mathbb{R}^{r\times d}$, this reduces the dimension significantly if $r$ depends only on $d$. Now, we give the following theorem.

Theorem 4.2 [Woo14]. Let $A \in \mathbb{R}^{n\times d}$. There exists an algorithm which generates an $S \in \mathbb{R}^{r\times n}$ with $r = O\big((d^2/\epsilon^2)\,\mathrm{polylog}(d/\epsilon)\big)$ such that, with probability 0.99, $S$ is a $(1\pm\epsilon)$ subspace embedding for $A$. Furthermore, $SA$ can be computed in $O(\mathrm{nnz}(A))$ time.

$S$ can be generated in the following way. Each column of $S$ has exactly one non-zero entry, whose position is chosen uniformly and independently; we assign this entry to be $1$ or $-1$ with equal probability. More precisely, let $h : [n] \to \{1, \dots, r\}$ and $\sigma : [n] \to \{-1, 1\}$ be chosen uniformly at random; then
$$S_{ji} = \chi[h(i) = j]\,\sigma(i),$$
with $\chi[h(i) = j]$ being an indicator. The dimension of $S$ can be reduced further by applying the Johnson-Lindenstrauss lemma to $SA$, which gives $S'(SA)$.
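Here is a NumPy sketch of this construction applied to the regression example above. $S$ is never materialized: `np.add.at` scatters row $i$ of the input into output row $h(i)$, which is the $O(\mathrm{nnz}(A))$ idea; the dimensions and the choice of $r$ are assumptions for illustration.

```python
import numpy as np

def sketch_apply(M, r, h, sigma):
    """Compute S @ M for S_{ji} = 1[h(i) = j] * sigma(i) without forming S:
    row i of M, scaled by sigma(i), is added to output row h(i). For dense M
    this costs O(n d); for sparse input it is O(nnz)."""
    SM = np.zeros((r, M.shape[1]))
    np.add.at(SM, h, sigma[:, None] * M)
    return SM

rng = np.random.default_rng(4)
n, d, r = 100_000, 10, 2_000                 # r = O(d^2/eps^2) in theory
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

h = rng.integers(0, r, size=n)               # h : [n] -> {1, ..., r}
sigma = rng.choice([-1.0, 1.0], size=n)      # sigma : [n] -> {-1, 1}
SA = sketch_apply(A, r, h, sigma)
Sb = sketch_apply(b[:, None], r, h, sigma).ravel()

x_sk, *_ = np.linalg.lstsq(SA, Sb, rcond=None)  # min_x ||SAx - Sb||
x_ex, *_ = np.linalg.lstsq(A, b, rcond=None)    # min_x ||Ax - b||
print(np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_ex - b))  # close to 1
```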
Theorem.. Let A R n d and B R n d. If r ɛ δ, then P S A T S T SB A T B F ɛ A F B F δ. Proof of Theorem.. Let U = u 1,..., u d R n d be an orthonormal bass of the column space of A. To show S s a subspace embeddng of A, t suffces to show that SUx = 1±ɛ x, x R d. It s equvalent to show that x T U T S T SUx = 1±ɛx T Ix,.e., U T S T SU I ɛ. Snce U T S T SU I F U T S T SU I, t suffces to show that U T S T SU I F ɛ. Let A = B = U. Snce U s columns are orthonormal vectors, we have that U T U = I d and U F = d. By Theorem., we have P S U T S T SU I d F ɛ 1 d δ. Set ɛ 1 = ɛ/d, we have r = Od /ɛ δ. Proof of Theorem.. Let x, y R n be two unt vectors. Then, Sx, Sy = Sx + Sy Sx y. Let X be a random varable and fx = EX 1/. By Mnows s theorem, fx + Y fx + fy. Thus, f Sx, Sy x, y = 1 f Sx x + Sy y Sx y x y 1 f Sx x + f Sy y + f Sx y x y 1 ɛ δ + ɛ δ =ɛ δ The last nequalty s due to Lemma.3, and the fact that x = y = 1 and x y. It s easy to see that A T B j = A, B j. Let X j denote SA, SB j A, B j. Then, A T S T SB A T B F = j X j. By the above calculaton, X j f ɛ δ. A B j Thus, fx j ɛ δ A B j, that s, EXj ɛ δ A B j. Then, we have E A T S T SB A T B F ɛ δ A F B j F. 6
Proof of Lemma 4.3. Expand $(\|Sx\|^2 - 1)^2 = \|Sx\|^4 - 2\|Sx\|^2 + 1$. We then bound the expectations of $\|Sx\|^2$ and $\|Sx\|^4$ separately.
$$\mathbb{E}\,\|Sx\|^2 = \mathbb{E}[x^TS^TSx] = x^T\big(\mathbb{E}[S^TS]\big)x.$$
Consider $(\mathbb{E}[S^TS])_{ij} = \mathbb{E}\langle S^i, S^j\rangle$. According to the distribution of $S$, each column of $S$ has exactly one nonzero entry, which is picked to be $-1$ or $1$ uniformly and independently. Thus, $\mathbb{E}\langle S^i, S^i\rangle = 1$ and $\mathbb{E}\langle S^i, S^j\rangle = 0$ for $i \neq j$, i.e., $\mathbb{E}[S^TS] = I_n$. It follows that $\mathbb{E}\,\|Sx\|^2 = \|x\|^2 = 1$.

Similarly, we bound the expectation of $\|Sx\|^4$:
$$\mathbb{E}\,\|Sx\|^4 = \mathbb{E}\Big(\sum_i\Big(\sum_jS_{ij}x_j\Big)^2\Big)^2 = \sum_{i_1,i_2}\sum_{j_1,j_2,j_3,j_4}\mathbb{E}\big[S_{i_1j_1}S_{i_1j_2}S_{i_2j_3}S_{i_2j_4}\big]\,x_{j_1}x_{j_2}x_{j_3}x_{j_4}.$$
Since the columns of $S$ are sampled independently and $\mathbb{E}[S_{ij}] = 0$, a term survives only if every column index appears an even number of times; also $\mathbb{E}[S_{i_1j}S_{i_2j}] = 0$ for $i_1 \neq i_2$, since column $j$ has a single nonzero entry. Moreover, $\mathbb{E}[S_{ij}^2] = \mathbb{E}[S_{ij}^4] = \frac{1}{r}$, and $\mathbb{E}[S_{ij_1}^2S_{ij_2}^2] = \frac{1}{r^2}$ for $j_1 \neq j_2$. Thus,
$$\begin{aligned}
\mathbb{E}\,\|Sx\|^4 &= \sum_{i,j}\mathbb{E}[S_{ij}^4]\,x_j^4 + \sum_{i_1,i_2}\sum_{j_1\neq j_2}\mathbb{E}[S_{i_1j_1}^2]\,\mathbb{E}[S_{i_2j_2}^2]\,x_{j_1}^2x_{j_2}^2 + 2\sum_i\sum_{j_1\neq j_2}\mathbb{E}[S_{ij_1}^2S_{ij_2}^2]\,x_{j_1}^2x_{j_2}^2\\
&= \sum_jx_j^4 + \sum_{j_1\neq j_2}x_{j_1}^2x_{j_2}^2 + \frac{2}{r}\sum_{j_1\neq j_2}x_{j_1}^2x_{j_2}^2\\
&\le \Big(1 + \frac{2}{r}\Big)\|x\|^4 = 1 + \frac{2}{r}.
\end{aligned}$$
Putting it all together, we have
$$\mathbb{E}\big(\|Sx\|^2 - 1\big)^2 = \mathbb{E}\,\|Sx\|^4 - 2\,\mathbb{E}\,\|Sx\|^2 + 1 \le \frac{2}{r}.$$
Since $r \ge \frac{2}{\epsilon^2\delta}$, we have $\mathbb{E}\big(\|Sx\|^2 - 1\big)^2 \le \epsilon^2\delta$. $\Box$

References

[KV08] Ravindran Kannan and Santosh Vempala. Spectral algorithms. Foundations and Trends in Theoretical Computer Science, 4(3-4):157-288, 2008.

[Woo14] David P. Woodruff. Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science, 10(1-2):1-157, 2014.