CIS 700: algorithms for Big Data

1 CIS 700: Algorithms for Big Data. Lecture 5: Dimension Reduction. Slides at http://grigory.us/big-data-class.html. Grigory Yaroslavtsev, http://grigory.us

2 Today
Dimensionality reduction
AMS as dimensionality reduction
Johnson-Lindenstrauss transform

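3 $L_p$-norm Estimation
Stream: $m$ updates $(x_i, \Delta_i) \in [n] \times \mathbb{R}$ that define a vector $f$ where $f_j = \sum_{i : x_i = j} \Delta_i$.
Example: for $n = 4$, the stream $\langle (1, 3), (3, 0.5), (1, 2), (2, -2), (2, 1), (1, -1), (4, 1) \rangle$ defines $f = (4, -1, 0.5, 1)$.
$L_p$-norm: $\|f\|_p = \left( \sum_i |f_i|^p \right)^{1/p}$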
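
The stream model above can be sketched in a few lines of Python. This is only an illustration: the function names `frequency_vector` and `lp_norm` and the toy stream are assumptions made for this example, and the vector is stored explicitly, which a real streaming algorithm would avoid.

```python
def frequency_vector(n, updates):
    """Apply updates (x, delta) with x in 1..n and return the vector f as a list."""
    f = [0.0] * n
    for x, delta in updates:
        f[x - 1] += delta  # f_j is the sum of all deltas sent to coordinate j
    return f


def lp_norm(f, p):
    """Return (sum_i |f_i|^p)^(1/p) for p >= 1."""
    return sum(abs(v) ** p for v in f) ** (1.0 / p)


# Hypothetical toy stream over n = 4 coordinates.
stream = [(1, 3), (3, 0.5), (1, 2), (2, -2), (2, 1), (1, -1), (4, 1)]
f = frequency_vector(4, stream)
print(f, lp_norm(f, 1), lp_norm(f, 2))
```

Storing $f$ costs $\Theta(n)$ space; the sketches on the following slides replace it with a much shorter linear summary.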

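4 $L_p$-norm Estimation
$L_p$-norm: $\|f\|_p = \left( \sum_i |f_i|^p \right)^{1/p}$
Two lectures ago:
$\|f\|_0 = F_0$-moment
$\|f\|_2^2 = F_2$-moment (via AMS sketching)
Space: $O\left( \frac{\log n}{\varepsilon^2} \log \frac{1}{\delta} \right)$
Technique: linear sketches
$\|f\|_0$: $\sum_{i \in S} f_i$ for random sets $S$
$\|f\|_2^2$: $\sum_i \sigma_i f_i$ for random signs $\sigma_i$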

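5 AMS as dimensionality reduction
Maintain a linear sketch vector $Z = (Z_1, \dots, Z_k) = Rf$, where $Z_i = \sum_{j \in [n]} \sigma_{ij} f_j$ and $\sigma_{ij} \in_R \{-1, +1\}$.
Estimator $Y$ for $\|f\|_2^2$: $Y = \frac{1}{k} \sum_{i=1}^{k} Z_i^2 = \frac{\|Rf\|_2^2}{k}$
"Dimensionality reduction": $x \to Rx$, with a "heavy tail": $\Pr\left[ \left| Y - \|f\|_2^2 \right| \ge c \left( \frac{2}{k} \right)^{1/2} \|f\|_2^2 \right] \le \frac{1}{c^2}$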
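
A minimal sketch of the AMS estimator as a matrix-vector product, assuming fully random signs from Python's `random` module rather than the limited-independence hash functions a streaming implementation would use. The names `ams_sketch_matrix` and `ams_estimate` are hypothetical.

```python
import random


def ams_sketch_matrix(k, n, rng):
    """Return a k x n matrix R with independent uniform +/-1 entries."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def ams_estimate(R, f):
    """Return Y = (1/k) * sum_i Z_i^2 where Z = R f."""
    z = [sum(r * v for r, v in zip(row, f)) for row in R]
    return sum(v * v for v in z) / len(R)


rng = random.Random(0)
f = [4.0, -1.0, 0.5, 1.0]          # toy vector, ||f||_2^2 = 18.25
R = ams_sketch_matrix(2000, len(f), rng)
est = ams_estimate(R, f)
print(est)
```

The estimate concentrates around $\|f\|_2^2$ at the Chebyshev rate stated on the slide; for a vector with a single nonzero coordinate every $Z_i^2$ equals $f_j^2$, so the estimate is exact.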

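6 Normal Distribution
Normal distribution $N(0, 1)$
Range: $(-\infty, +\infty)$
Density: $\mu(x) = (2\pi)^{-1/2} e^{-x^2/2}$
Mean $= 0$, Variance $= 1$
Basic facts:
If $X$ and $Y$ are independent r.v. with normal distribution then $X + Y$ has normal distribution.
$\mathrm{Var}[cX] = c^2 \, \mathrm{Var}[X]$
If $X, Y$ are independent, then $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$.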

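7 Johnson-Lindenstrauss Transform
Instead of $\pm 1$ let $\sigma_i$ be i.i.d. random variables from the normal distribution $N(0, 1)$, and let $Z = \sum_i \sigma_i f_i$.
We still have $E[Z^2] = \sum_i f_i^2 = \|f\|_2^2$ because:
$E[\sigma_i] \, E[\sigma_j] = 0$; $E[\sigma_i^2]$ = "variance of $\sigma_i$" $= 1$.
Define $Z = (Z_1, \dots, Z_k)$ from $k$ independent copies and define $\|Z\|_2^2 = \sum_j Z_j^2$, so that $E\|Z\|_2^2 = k \|f\|_2^2$.
JL Lemma: There exists $C > 0$ s.t. for small enough $\varepsilon > 0$:
$\Pr\left[ \left| \|Z\|_2^2 - k \|f\|_2^2 \right| > \varepsilon k \|f\|_2^2 \right] \le \exp(-C \varepsilon^2 k)$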
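
The Gaussian transform can be tried out empirically. The following is a rough sketch using only the standard library; `gaussian_jl` and `sketch_sq_norm` are names invented for this example, and the dense $k \times n$ matrix is built explicitly for clarity.

```python
import random


def gaussian_jl(k, n, rng):
    """Return a k x n matrix with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]


def sketch_sq_norm(R, f):
    """Return ||R f||_2^2."""
    return sum(sum(r * v for r, v in zip(row, f)) ** 2 for row in R)


rng = random.Random(1)
f = [4.0, -1.0, 0.5, 1.0]
k = 2000
R = gaussian_jl(k, len(f), rng)
# ||Z||_2^2 / (k ||f||_2^2) should be close to 1 for large k.
ratio = sketch_sq_norm(R, f) / (k * sum(v * v for v in f))
print(ratio)
```

For a fixed $f$ the ratio is distributed as $\chi^2_k / k$, whose standard deviation is $\sqrt{2/k}$, so the deviation from 1 shrinks as $k$ grows.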

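8 Proof of JL Lemma
JL Lemma: $\exists C > 0$ s.t. for small enough $\varepsilon > 0$:
$\Pr\left[ \left| \|Z\|_2^2 - k \|f\|_2^2 \right| > \varepsilon k \|f\|_2^2 \right] \le \exp(-C \varepsilon^2 k)$
Assume $\|f\|_2^2 = 1$.
We have $Z_i = \sum_j \sigma_{ij} f_j$ and $Z = (Z_1, \dots, Z_k)$, so $E\|Z\|_2^2 = k \|f\|_2^2 = k$.
Alternative form of JL Lemma:
$\Pr\left[ \|Z\|_2^2 > k (1 + \varepsilon)^2 \right] \le \exp\left( -\varepsilon^2 k + O(k \varepsilon^3) \right)$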

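9 Proof of JL Lemma
Alternative form of JL Lemma: $\Pr\left[ \|Z\|_2^2 > k (1 + \varepsilon)^2 \right] \le \exp\left( -\varepsilon^2 k + O(k \varepsilon^3) \right)$
Let $Y = \|Z\|_2^2$ and $\alpha = k (1 + \varepsilon)^2$.
For every $s > 0$ we have: $\Pr[Y > \alpha] = \Pr\left[ e^{sY} > e^{s\alpha} \right]$
By Markov and independence of the $Z_i$'s:
$\Pr\left[ e^{sY} > e^{s\alpha} \right] \le \frac{E[e^{sY}]}{e^{s\alpha}} = e^{-s\alpha} \, E\left[ e^{s \sum_i Z_i^2} \right] = e^{-s\alpha} \prod_{i=1}^{k} E\left[ e^{s Z_i^2} \right]$
We have $Z_i \sim N(0, 1)$, hence for $s < 1/2$:
$E\left[ e^{s Z_i^2} \right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{s t^2} e^{-t^2/2} \, dt = \frac{1}{\sqrt{1 - 2s}}$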

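10 Proof of JL Lemma
Alternative form of JL Lemma: $\Pr\left[ \|Z\|_2^2 > k (1 + \varepsilon)^2 \right] \le \exp\left( -\varepsilon^2 k + O(k \varepsilon^3) \right)$
For every $0 < s < 1/2$, with $Y = \|Z\|_2^2$, we have:
$\Pr[Y > \alpha] \le e^{-s\alpha} \prod_{i=1}^{k} E\left[ e^{s Z_i^2} \right] = e^{-s\alpha} (1 - 2s)^{-k/2}$
Let $s = \frac{1}{2} \left( 1 - \frac{k}{\alpha} \right)$, which minimizes the right-hand side, and recall that $\alpha = k (1 + \varepsilon)^2$.
A calculation finishes the proof: the bound becomes $\exp\left( -k (\varepsilon + \varepsilon^2/2) + k \ln(1 + \varepsilon) \right)$, and expanding $\ln(1 + \varepsilon) = \varepsilon - \varepsilon^2/2 + O(\varepsilon^3)$ gives
$\Pr[Y > \alpha] \le \exp\left( -\varepsilon^2 k + O(k \varepsilon^3) \right)$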
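
The final calculation can be checked numerically. The sketch below evaluates the bound $e^{-s\alpha}(1 - 2s)^{-k/2}$ at $s = \frac{1}{2}(1 - k/\alpha)$ and compares its logarithm with $-\varepsilon^2 k$; the helper names and the values of $k$ and $\varepsilon$ are chosen here for illustration only.

```python
import math


def chernoff_bound(k, eps):
    """Evaluate exp(-s*alpha) * (1 - 2s)^(-k/2) at s = (1 - k/alpha) / 2."""
    alpha = k * (1 + eps) ** 2
    s = 0.5 * (1 - k / alpha)
    return math.exp(-s * alpha) * (1 - 2 * s) ** (-k / 2)


def closed_form_exponent(k, eps):
    """Logarithm of the bound: -k (eps + eps^2 / 2) + k ln(1 + eps)."""
    return -k * (eps + eps * eps / 2) + k * math.log1p(eps)


k, eps = 100, 0.1
bound = chernoff_bound(k, eps)
gap = closed_form_exponent(k, eps) - (-eps * eps * k)  # the O(k eps^3) term
print(bound, gap)
```

For these values the gap between the exact exponent and $-\varepsilon^2 k$ is positive and below $k \varepsilon^3$, consistent with the $O(k \varepsilon^3)$ error term.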

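11 Johnson-Lindenstrauss Transform
Single vector: $k = O\left( \frac{\log 1/\delta}{\varepsilon^2} \right)$
Tight: $k = \Omega\left( \frac{\log 1/\delta}{\varepsilon^2} \right)$ [Woodruff 10]
$n$ vectors simultaneously: $k = O\left( \frac{\log n \, \log 1/\delta}{\varepsilon^2} \right)$
Tight: $k = \Omega\left( \frac{\log n \, \log 1/\delta}{\varepsilon^2} \right)$ [Molinaro, Woodruff, Y. 13]
Distances between $n$ vectors $= O(n^2)$ vectors: $k = O\left( \frac{\log n \, \log 1/\delta}{\varepsilon^2} \right)$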

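12 Random Variables and Norms
For a random variable $X$ and $p \ge 1$ let: $\|X\|_p = \left( E|X|^p \right)^{1/p}$
Facts:
For any $c$: $\|cX\|_p = |c| \cdot \|X\|_p$
$\|\cdot\|_p$ is a norm (Minkowski's inequality)
$\|X\|_p \le \|X\|_q$ for $p \le q$ (monotonicity of norms)
Jensen's inequality (used a lot for $F(x) = |x|^p$): if $F$ is convex then $F(E X) \le E[F(X)]$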

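13 Khintchine Inequality
[Khintchine] For $p \ge 1$, $x \in \mathbb{R}^n$ and $\sigma_i$ i.i.d. Rademachers: $\left\| \sum_i \sigma_i x_i \right\|_p \lesssim \sqrt{p} \, \|x\|_2$
For $r_i$ (either $\sigma_i$ or $g_i \sim N(0, 1)$) expand $E\left( \sum_i r_i x_i \right)^p$.
All odd powers of $r_i$ have zero expectation.
All even moments for $\sigma_i$ are $1$, and for $g_i$ are $\ge 1$.
Hence $\left\| \sum_i \sigma_i x_i \right\|_p \le \left\| \sum_i g_i x_i \right\|_p$.
Since $\sum_i g_i x_i \sim N(0, \|x\|_2^2)$: $\left\| \sum_i g_i x_i \right\|_p \lesssim \sqrt{p} \, \|x\|_2$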
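
For small $n$ the Rademacher moment can be computed exactly by enumerating all $2^n$ sign patterns, which gives a quick sanity check of the inequality. This is an illustrative sketch; `rademacher_moment_norm` and the test vector are assumptions of this example, and the check uses constant 1 in place of the unspecified constant hidden in $\lesssim$.

```python
import itertools
import math


def rademacher_moment_norm(x, p):
    """Exact (E |sum_i sigma_i x_i|^p)^(1/p) over all sign vectors sigma."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(x)):
        total += abs(sum(s * v for s, v in zip(signs, x))) ** p
    return (total / 2 ** len(x)) ** (1.0 / p)


def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))


x = [1.0, 2.0, -0.5, 3.0]
for p in (1, 2, 4, 6):
    print(p, rademacher_moment_norm(x, p), math.sqrt(p) * l2_norm(x))
```

At $p = 2$ the two sides coincide, since $E\left( \sum_i \sigma_i x_i \right)^2 = \|x\|_2^2$.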

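14 Symmetrization
[Symmetrization]: If $Z_1, \dots, Z_n$ are independent and $\sigma_i$ are i.i.d. Rademachers:
$\left\| \sum_i Z_i - E \sum_i Z_i \right\|_p \le 2 \left\| \sum_i \sigma_i Z_i \right\|_p$
Let $Y_1, \dots, Y_n$ be independent with the same distribution as $Z_1, \dots, Z_n$. Then
$\left\| \sum_i Z_i - E \sum_i Z_i \right\|_p = \left\| \sum_i Z_i - E_Y \sum_i Y_i \right\|_p$
$\le \left\| \sum_i (Z_i - Y_i) \right\|_p$ (Jensen)
$= \left\| \sum_i \sigma_i (Z_i - Y_i) \right\|_p$ ($Z_i - Y_i$ are independent and symmetric)
$\le 2 \left\| \sum_i \sigma_i Z_i \right\|_p$ (triangle inequality)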

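15 Decoupling
Let $x_1, \dots, x_n$ be independent with mean $0$ and $x_1', \dots, x_n'$ identically distributed as the $x_i$ and independent of them. For any $(a_{ij})$ and $p \ge 1$:
$\left\| \sum_{i \ne j} a_{ij} x_i x_j \right\|_p \le 4 \left\| \sum_{i,j} a_{ij} x_i x_j' \right\|_p$
Let $\eta_1, \dots, \eta_n$ be i.i.d. Bernoullis ($0/1$ w.p. $1/2$):
$\left\| \sum_{i \ne j} a_{ij} x_i x_j \right\|_p = 4 \left\| E_\eta \sum_{i \ne j} a_{ij} x_i x_j \eta_i (1 - \eta_j) \right\|_p$
$\le 4 \left\| \sum_{i \ne j} a_{ij} x_i x_j \eta_i (1 - \eta_j) \right\|_p$ (Jensen)
There exists $\eta' \in \{0, 1\}^n$ such that:
$\left\| \sum_{i \ne j} a_{ij} x_i x_j \eta_i (1 - \eta_j) \right\|_p \le \left\| \sum_{i \in S} \sum_{j \notin S} a_{ij} x_i x_j \right\|_p$, where $S = \{ i : \eta_i' = 1 \}$.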

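16 Decoupling (continued)
Let $x_S$ be the $|S|$-dimensional vector of $x_i$ for $i \in S$.
$\left\| \sum_{i \in S} \sum_{j \notin S} a_{ij} x_i x_j \right\|_p = \left\| \sum_{i \in S} \sum_{j \notin S} a_{ij} x_i x_j' \right\|_p$
$= \left\| E_{x_{\bar{S}}} E_{x_S'} \sum_{i,j} a_{ij} x_i x_j' \right\|_p$ (since $E x_i = E x_j' = 0$)
$\le \left\| \sum_{i,j} a_{ij} x_i x_j' \right\|_p$ (Jensen)
Overall: $\left\| \sum_{i \ne j} a_{ij} x_i x_j \right\|_p \le 4 \left\| \sum_{i,j} a_{ij} x_i x_j' \right\|_p$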

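17 Hanson-Wright Inequality
For $\sigma_1, \dots, \sigma_n$ independent Rademachers and $A \in \mathbb{R}^{n \times n}$ real and symmetric, for all $p \ge 1$:
$\left\| \sigma^T A \sigma - E \sigma^T A \sigma \right\|_p \lesssim \sqrt{p} \, \|A\|_F + p \, \|A\|$
Recall:
$\|A\|_F^2 = \sum_{i,j} a_{ij}^2 = \mathrm{Tr}(A^T A)$
$\|A\| = \sup_{v \ne 0} \frac{\|Av\|_2}{\|v\|_2}$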
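
As a small sanity check at $p = 2$, the centered quadratic form can be enumerated exactly for a tiny symmetric matrix. The sketch below is illustrative only: the matrix `A` and the helper names are made up for this example, and it relies on the standard facts that $E \sigma^T A \sigma = \mathrm{Tr}(A)$ and that the variance equals $2 \sum_{i \ne j} a_{ij}^2$ for symmetric $A$.

```python
import itertools
import math


def quadratic_form(A, sigma):
    n = len(A)
    return sum(A[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))


def exact_centered_moment_norm(A, p):
    """Return (E sigma^T A sigma, ||sigma^T A sigma - E sigma^T A sigma||_p) exactly."""
    values = [quadratic_form(A, s) for s in itertools.product((-1, 1), repeat=len(A))]
    mean = sum(values) / len(values)
    moment = sum(abs(v - mean) ** p for v in values) / len(values)
    return mean, moment ** (1.0 / p)


def frobenius(A):
    return math.sqrt(sum(a * a for row in A for a in row))


A = [[1, 2, 0], [2, -1, 3], [0, 3, 2]]  # hypothetical symmetric matrix
mean, norm2 = exact_centered_moment_norm(A, 2)
print(mean, norm2, math.sqrt(2) * frobenius(A))
```

Here the mean equals the trace and the $L_2$ norm of the centered form is at most $\sqrt{2} \, \|A\|_F$, matching the $\sqrt{p} \, \|A\|_F$ term of the inequality.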

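18 Hanson-Wright Inequality
For $\sigma_1, \dots, \sigma_n$ independent Rademachers and $A \in \mathbb{R}^{n \times n}$ real and symmetric, for all $p \ge 1$:
$\left\| \sigma^T A \sigma - E \sigma^T A \sigma \right\|_p \lesssim \sqrt{p} \, \|A\|_F + p \, \|A\|$
Proof, with $\sigma'$ an independent copy of $\sigma$:
$\left\| \sigma^T A \sigma - E \sigma^T A \sigma \right\|_p \lesssim \left\| \sigma^T A \sigma' \right\|_p$ (decoupling)
$\lesssim \sqrt{p} \, \left\| \|A\sigma\|_2 \right\|_p$ (Khintchine)
$= \sqrt{p} \, \left\| \|A\sigma\|_2^2 \right\|_{p/2}^{1/2} \le \sqrt{p} \, \left\| \|A\sigma\|_2^2 \right\|_p^{1/2}$ (monotonicity of norms)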

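19 Hanson-Wright (continued)
$\left\| \|A\sigma\|_2 \right\|_p \le \left\| \|A\sigma\|_2^2 \right\|_p^{1/2}$
$\le \left( E \|A\sigma\|_2^2 + \left\| \|A\sigma\|_2^2 - E \|A\sigma\|_2^2 \right\|_p \right)^{1/2}$ (triangle ineq.)
$= \left( \|A\|_F^2 + \left\| \|A\sigma\|_2^2 - E \|A\sigma\|_2^2 \right\|_p \right)^{1/2}$
$\le \|A\|_F + \left\| \|A\sigma\|_2^2 - E \|A\sigma\|_2^2 \right\|_p^{1/2}$
$\lesssim \|A\|_F + \left\| \sigma^T A^T A \sigma' \right\|_p^{1/2}$ (decoupling)
$\lesssim \|A\|_F + p^{1/4} \left\| \|A^T A \sigma\|_2 \right\|_p^{1/2}$ (Khintchine)
$\le \|A\|_F + p^{1/4} \|A\|^{1/2} \left\| \|A\sigma\|_2 \right\|_p^{1/2}$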

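20 Hanson-Wright (continued)
$\left\| \|A\sigma\|_2 \right\|_p \lesssim \|A\|_F + p^{1/4} \|A\|^{1/2} \left\| \|A\sigma\|_2 \right\|_p^{1/2}$
Let $E = \left\| \|A\sigma\|_2 \right\|_p^{1/2}$; then for some $C > 0$: $E^2 - C p^{1/4} \|A\|^{1/2} E - C \|A\|_F \le 0$
$E \le$ larger root of the quadratic equation above, so $E^2 \lesssim \|A\|_F + \sqrt{p} \, \|A\|$.
Combined with $\left\| \sigma^T A \sigma - E \sigma^T A \sigma \right\|_p \lesssim \sqrt{p} \, \left\| \|A\sigma\|_2 \right\|_p$ this gives:
(Hanson-Wright) For $\sigma_1, \dots, \sigma_n$ independent Rademachers and $A \in \mathbb{R}^{n \times n}$ real and symmetric, for all $p \ge 1$:
$\left\| \sigma^T A \sigma - E \sigma^T A \sigma \right\|_p \lesssim \sqrt{p} \, \|A\|_F + p \, \|A\|$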

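21 Recap
For a random variable $X$ and $p \ge 1$ let: $\|X\|_p = \left( E|X|^p \right)^{1/p}$
[Khintchine] For $p \ge 1$, $x \in \mathbb{R}^n$ and $\sigma_i$ i.i.d. Rademachers: $\left\| \sum_i \sigma_i x_i \right\|_p \lesssim \sqrt{p} \, \|x\|_2$
[Symmetrization] If $Z_1, \dots, Z_n$ are independent and $\sigma_i$ are i.i.d. Rademachers: $\left\| \sum_i Z_i - E \sum_i Z_i \right\|_p \le 2 \left\| \sum_i \sigma_i Z_i \right\|_p$
[Hanson-Wright] For $\sigma_1, \dots, \sigma_n$ independent Rademachers and $A \in \mathbb{R}^{n \times n}$ real and symmetric, for all $p \ge 1$: $\left\| \sigma^T A \sigma - E \sigma^T A \sigma \right\|_p \lesssim \sqrt{p} \, \|A\|_F + p \, \|A\|$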

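22 Bernstein Inequality
Let $X_1, \dots, X_n$ be independent r.v.'s such that $|X_i| \le K$ almost surely and $\sum_i E X_i^2 \le \sigma^2$. For all $p \ge 1$:
$\left\| \sum_i X_i - E \sum_i X_i \right\|_p \lesssim \sigma \sqrt{p} + K p$
Proof (here $\sigma_i$ with a subscript denotes i.i.d. Rademachers, while $\sigma$ without a subscript is the variance parameter):
$\left\| \sum_i X_i - E \sum_i X_i \right\|_p \lesssim \left\| \sum_i \sigma_i X_i \right\|_p$ (symmetrization)
$\lesssim \sqrt{p} \, \left\| \left( \sum_i X_i^2 \right)^{1/2} \right\|_p$ (Khintchine)
$= \sqrt{p} \, \left\| \sum_i X_i^2 \right\|_{p/2}^{1/2} \le \sqrt{p} \, \left\| \sum_i X_i^2 \right\|_p^{1/2}$
$\le \sqrt{p} \left( \sigma + \left\| \sum_i X_i^2 - E \sum_i X_i^2 \right\|_p^{1/2} \right)$ (triangle inequality)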

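23 Bernstein Inequality (cont.)
$\left\| \left( \sum_i X_i^2 \right)^{1/2} \right\|_p \le \sigma + \left\| \sum_i X_i^2 - E \sum_i X_i^2 \right\|_p^{1/2}$
$\lesssim \sigma + \left\| \sum_i \sigma_i X_i^2 \right\|_p^{1/2}$ (symmetrization)
$\lesssim \sigma + p^{1/4} \left\| \left( \sum_i X_i^4 \right)^{1/2} \right\|_p^{1/2}$ (Khintchine)
$\le \sigma + p^{1/4} \sqrt{K} \, \left\| \left( \sum_i X_i^2 \right)^{1/2} \right\|_p^{1/2}$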

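24 Bernstein Inequality (cont.)
Let $E = \left\| \left( \sum_i X_i^2 \right)^{1/2} \right\|_p^{1/2}$; then for some $C > 0$: $E^2 - C p^{1/4} \sqrt{K} \, E - C \sigma \le 0$
$E \le$ larger root of this quadratic equation, so $E^2 \lesssim \sigma + \sqrt{p} \, K$.
[Bernstein] Let $X_1, \dots, X_n$ be independent r.v.'s such that $|X_i| \le K$ almost surely and $\sum_i E X_i^2 \le \sigma^2$. For all $p \ge 1$:
$\left\| \sum_i X_i - E \sum_i X_i \right\|_p \lesssim \sigma \sqrt{p} + K p$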

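25 Sparse Johnson-Lindenstrauss Transform
Let $\Pi \in \mathbb{R}^{m \times n}$ be a JL matrix where $m = O\left( \frac{1}{\varepsilon^2} \log \frac{1}{\delta} \right)$, which satisfies for $\|x\|_2 = 1$:
$\Pr_\Pi\left[ \left| \|\Pi x\|_2^2 - 1 \right| > \varepsilon \right] < \delta$
Takes $O(m \, \|x\|_0)$ time to compute $\Pi x$.
Would be $O(s \, \|x\|_0)$ time if $\Pi$ only had $s$ nonzero entries per column.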

26 Basic Sparse JL Transform
Pick a 2-wise independent hash function $h : [n] \to [m]$.
Pick a 4-wise independent hash function $\sigma : [n] \to \{-1, +1\}$.
For each $i \in [n]$ let $\Pi_{h(i), i} = \sigma(i)$; the rest are $0$.
[Thorup, Zhang 12]: This is JL if $m \gtrsim \frac{1}{\varepsilon^2 \delta}$. Best possible since $s = 1$.
Analysis: standard expectation/variance using bounded independence + Chebyshev.
To improve $m$ let's use Hanson-Wright (higher moment than Chebyshev's second).
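
A minimal sketch of this one-nonzero-per-column transform, applied to a sparse vector in time proportional to its number of nonzeros. As a simplifying assumption the hash values and signs are drawn fully at random with Python's `random` module instead of from 2-wise and 4-wise independent families; the function names and the dictionary representation of $x$ are choices made for this example.

```python
import random


def make_count_sketch(n, m, seed=0):
    """Draw h: [n] -> [m] and sign: [n] -> {-1, +1} (fully random, as an assumption)."""
    rng = random.Random(seed)
    h = [rng.randrange(m) for _ in range(n)]
    sign = [rng.choice((-1, 1)) for _ in range(n)]
    return h, sign


def apply_count_sketch(h, sign, m, x_sparse):
    """Compute Pi x for x given as {index: value}; cost is O(number of nonzeros)."""
    y = [0.0] * m
    for i, v in x_sparse.items():
        y[h[i]] += sign[i] * v  # column i has a single nonzero, in row h(i)
    return y


h, sign = make_count_sketch(n=100, m=16, seed=3)
y = apply_count_sketch(h, sign, 16, {5: 2.0})
print(sum(v * v for v in y))
```

With a single nonzero coordinate there are no collisions, so the squared norm is preserved exactly; the error for general $x$ comes entirely from coordinates hashed to the same row.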

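27 Sparse JL Transform: Construction
$\Pi_{r,i} = \frac{\eta_{r,i} \, \sigma_{r,i}}{\sqrt{s}}$, where $\eta_{r,i}$ are Bernoullis and $\sigma_{r,i}$ are Rademachers.
For all $r, i$: $E \eta_{r,i} = \frac{s}{m}$
For all $i$: $\sum_r \eta_{r,i} = s$ ($s$ non-zeros per column)
$\eta_{r,i}$ are negatively correlated: $E \prod_{(r,i) \in S} \eta_{r,i} \le \prod_{(r,i) \in S} E \eta_{r,i} = \left( \frac{s}{m} \right)^{|S|}$
Choosing each column uniformly from the $\binom{m}{s}$ columns of weight $s$ works here.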
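
One way to realize this construction is to pick, for every column, a uniformly random set of $s$ rows and independent signs. The sketch below stores only the nonzeros of each column; `sparse_jl_columns` and `apply_sparse_jl` are hypothetical names, and full randomness is assumed.

```python
import math
import random


def sparse_jl_columns(n, m, s, seed=0):
    """For each column, pick s distinct rows uniformly and signed entries +/- 1/sqrt(s)."""
    rng = random.Random(seed)
    cols = []
    for _ in range(n):
        rows = rng.sample(range(m), s)  # uniform over the C(m, s) supports
        cols.append([(r, rng.choice((-1.0, 1.0)) / math.sqrt(s)) for r in rows])
    return cols


def apply_sparse_jl(cols, m, x_sparse):
    """Compute Pi x for x given as {index: value} in O(s * nnz(x)) time."""
    y = [0.0] * m
    for i, v in x_sparse.items():
        for r, entry in cols[i]:
            y[r] += entry * v
    return y


cols = sparse_jl_columns(n=50, m=32, s=4, seed=1)
y = apply_sparse_jl(cols, 32, {7: 3.0})
print(sum(v * v for v in y))
```

Every column has exactly $s$ nonzeros and unit Euclidean norm, so $E \|\Pi x\|_2^2 = \|x\|_2^2$.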

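28 Sparse JL Transform: Analysis
Thm [KN 14]: If $m = O\left( \frac{1}{\varepsilon^2} \log \frac{1}{\delta} \right)$ and $s \simeq \varepsilon m$:
$\forall x : \|x\|_2 = 1, \quad \Pr_\Pi\left[ \left| \|\Pi x\|_2^2 - 1 \right| > \varepsilon \right] < \delta$
$Z = \|\Pi x\|_2^2 - 1 = \frac{1}{s} \sum_{r=1}^{m} \sum_{i \ne j} \eta_{r,i} \eta_{r,j} \sigma_{r,i} \sigma_{r,j} x_i x_j = \sigma^T A_{x,\eta} \sigma$
$A_{x,\eta}$ is a block-diagonal matrix with $m$ blocks, where the $r$-th block is $\frac{1}{s} x^{(r)} (x^{(r)})^T$ but with zeros on the diagonal.
$x^{(r)}$ is a vector with entries $x^{(r)}_i = \eta_{r,i} x_i$.
By Hanson-Wright: $\|Z\|_p \lesssim \sqrt{p} \, \left\| \|A_{x,\eta}\|_F \right\|_p + p \, \left\| \|A_{x,\eta}\| \right\|_p$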

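29 Sparse JL Transform: Analysis (Operator norm)
Since $A_{x,\eta}$ is block-diagonal, $\|A_{x,\eta}\|$ is the largest operator norm of any block.
Eigenvalues of the $r$-th block are at most $\frac{1}{s} \max\left( \|x^{(r)}\|_2^2, \|x^{(r)}\|_\infty^2 \right) \le \frac{1}{s}$.
Hence $\|A_{x,\eta}\| \le \frac{1}{s}$.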

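30 Sparse JL Transform: Analysis
Define $Q_{i,j} = \sum_{r=1}^{m} \eta_{r,i} \eta_{r,j}$ so that: $\|A_{x,\eta}\|_F^2 = \frac{1}{s^2} \sum_{i \ne j} x_i^2 x_j^2 Q_{i,j}$
Lemma: If $p \simeq s^2/m$ then $\forall i \ne j$: $\|Q_{i,j}\|_p \lesssim p$
$\left\| \|A_{x,\eta}\|_F \right\|_p = \left\| \|A_{x,\eta}\|_F^2 \right\|_{p/2}^{1/2} \le \left\| \frac{1}{s^2} \sum_{i \ne j} x_i^2 x_j^2 Q_{i,j} \right\|_p^{1/2}$
$\le \frac{1}{s} \left( \sum_{i \ne j} x_i^2 x_j^2 \|Q_{i,j}\|_p \right)^{1/2}$ (triangle ineq.)
$\lesssim \frac{\sqrt{p}}{s} \simeq \frac{1}{\sqrt{m}}$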

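31 Sparse JL Transform: Analysis
By Markov (with $m \simeq \frac{1}{\varepsilon^2} \log \frac{1}{\delta}$, $s \simeq \varepsilon m$, $p \simeq \log \frac{1}{\delta}$):
$\Pr\left[ \left| \|\Pi x\|_2^2 - 1 \right| > \varepsilon \right] = \Pr\left[ \left| \sigma^T A_{x,\eta} \sigma \right| > \varepsilon \right]$
$< \varepsilon^{-p} \, E\left| \sigma^T A_{x,\eta} \sigma \right|^p$ (Markov)
$\le \varepsilon^{-p} \, C^p \left( \sqrt{\frac{p}{m}} + \frac{p}{s} \right)^p < \delta$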

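32 Sparse JL Transform: Analysis
Lemma: If $p \simeq s^2/m$ then $\forall i \ne j$: $\|Q_{i,j}\|_p \lesssim p$
Suppose $\eta_{a_1,i}, \dots, \eta_{a_s,i}$ are all $1$, where $a_1 < \dots < a_s$.
Note that $Q_{i,j} = \sum_{t=1}^{s} Y_t$, where $Y_t$ is an indicator r.v. for the event $\eta_{a_t,j} = 1$.
The $Y_t$'s are not independent but negatively correlated, so the $p$-th moment of $\sum_t Y_t$ is at most the $p$-th moment of a sum of i.i.d. Bernoullis with expectation $s/m$ (expand $\left( \sum_t Y_t \right)^p$ and compare term by term).
By Bernstein inequality, with mean $s^2/m$, variance at most $s^2/m$ and $K = 1$:
$\|Q_{i,j}\|_p = \left\| \sum_t Y_t \right\|_p \lesssim \frac{s^2}{m} + \sqrt{\frac{s^2}{m}} \sqrt{p} + p \simeq p$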

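33 FFT-based Fast JL-Transform
[Ailon, Chazelle 09] Running time $O(n \log n)$
Define $\Pi \in \mathbb{R}^{m \times n}$ as $\Pi = \frac{1}{\sqrt{m}} S H D$
$S$ = $m \times n$ sampling matrix (with replacement)
$H$ = unnormalized bounded orthonormal system, i.e. $H \in \mathbb{R}^{n \times n}$; $H^T H = nI$; $\max_{i,j} |H_{i,j}| \le 1$
$D = \mathrm{diag}(\alpha)$ for $(\alpha_1, \dots, \alpha_n)$ i.i.d. Rademachers
If $H$ = Hadamard matrix $\Rightarrow$ $O(n \log n)$ time to compute $\Pi x$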
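
The $O(n \log n)$ running time comes from the fast Walsh-Hadamard transform. The following sketch implements the unnormalized transform (entries $\pm 1$, so applying it twice multiplies by $n$) and uses it for $\Pi x = \frac{1}{\sqrt{m}} S H D x$. The function names are assumptions of this example, $n$ must be a power of two, and full randomness is used for $D$ and $S$.

```python
import math
import random


def fwht(v):
    """Unnormalized Walsh-Hadamard transform; len(v) must be a power of two."""
    a = list(v)
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def fast_jl(x, m, rng):
    """Return (1/sqrt(m)) * S H D x with random signs D and m sampled rows S."""
    n = len(x)
    alpha = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    z = fwht([a * v for a, v in zip(alpha, x)])      # z = H D x, ||z||_2^2 = n ||x||_2^2
    rows = [rng.randrange(n) for _ in range(m)]      # sampling with replacement
    return [z[r] / math.sqrt(m) for r in rows]


rng = random.Random(2)
x = [1.0] + [0.0] * 7
y = fast_jl(x, 4, rng)
print(sum(v * v for v in y))
```

For a standard basis vector every entry of $z = HDx$ is $\pm 1$, so the sampled sketch has squared norm exactly 1; the random signs $D$ are what spread the mass of a general $x$ evenly enough for sampling to work.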

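34 FFT-based Fast JL-Transform
Change $S$ to $S_\eta = \mathrm{diag}(\eta_1, \dots, \eta_n)$ where $\eta_i$ are Bernoullis with expectation $E \eta_i = m/n$.
[CNW 15] If $\Pi = \frac{1}{\sqrt{m}} S_\eta H D$ and $m \gtrsim \frac{1}{\varepsilon^2} \log \frac{1}{\delta} \log \frac{1}{\varepsilon \delta}$:
$\forall x : \|x\|_2 = 1, \quad \Pr_\Pi\left[ \left| \|\Pi x\|_2^2 - 1 \right| > \varepsilon \right] < \delta$
Let $z = HDx$, so $\|\Pi x\|_2^2 = \frac{1}{m} \sum_{i=1}^{n} \eta_i z_i^2$.
Will show that $\frac{1}{m} \sum_{i=1}^{n} \eta_i z_i^2 - 1$ is small.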

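35 FFT-based Fast JL-Transform
$\left\| \frac{1}{m} \sum_{i=1}^{n} \eta_i z_i^2 - 1 \right\|_p \lesssim \frac{1}{m} \left\| \sum_i \sigma_i \eta_i z_i^2 \right\|_p$ (symmetrization)
$\lesssim \frac{\sqrt{p}}{m} \left\| \left( \sum_i \eta_i z_i^4 \right)^{1/2} \right\|_p$ (Khintchine)
$\le \frac{\sqrt{p}}{m} \left\| \left( \max_i \eta_i z_i^2 \right)^{1/2} \left( \sum_i \eta_i z_i^2 \right)^{1/2} \right\|_p$
$\le \frac{\sqrt{p}}{m} \left\| \max_i \eta_i z_i^2 \right\|_p^{1/2} \left\| \sum_i \eta_i z_i^2 \right\|_p^{1/2}$ (Cauchy-Schwarz)
$\le \sqrt{\frac{p}{m}} \left\| \max_i \eta_i z_i^2 \right\|_p^{1/2} \left( \left\| \frac{1}{m} \sum_i \eta_i z_i^2 - 1 \right\|_p^{1/2} + 1 \right)$ (triangle inequality)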

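36 FFT-based Fast JL-Transform
For $q = \max(p, \log m)$:
$\left\| \max_i \eta_i z_i^2 \right\|_p \le \left\| \max_i \eta_i z_i^2 \right\|_q = \left( E_{\alpha,\eta} \max_i \eta_i z_i^{2q} \right)^{1/q}$
$\le \left( \sum_i E_{\alpha,\eta} \, \eta_i z_i^{2q} \right)^{1/q} \le \left( n \max_i E_{\alpha,\eta} \, \eta_i z_i^{2q} \right)^{1/q}$
$= \left( n \max_i E_\eta \eta_i \cdot E_\alpha z_i^{2q} \right)^{1/q} = \left( m \max_i E_\alpha z_i^{2q} \right)^{1/q}$
$\le 2 \max_i \left\| z_i^2 \right\|_q$ ($m^{1/q} \le 2$ by choice of $q$)
$= 2 \max_i \|z_i\|_{2q}^2 \lesssim q$ (Khintchine)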

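37 FFT-based Fast JL-Transform
Let $E = \left\| \frac{1}{m} \sum_{i=1}^{n} \eta_i z_i^2 - 1 \right\|_p^{1/2}$; then for some $C > 0$:
$E^2 - C \sqrt{\frac{pq}{m}} \, E - C \sqrt{\frac{pq}{m}} \le 0 \;\Rightarrow\; E^2 \lesssim \max\left( \sqrt{\frac{pq}{m}}, \frac{pq}{m} \right)$
Markov: $\Pr\left[ \left| \|\Pi x\|_2^2 - 1 \right| > \varepsilon \right] \le \varepsilon^{-p} E^{2p} < \delta$
for $p = \log \frac{1}{\delta}$ and $m \gtrsim \frac{1}{\varepsilon^2} \log \frac{1}{\delta} \log \frac{m}{\delta}$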
