Fastfood - Approximating Kernel Expansions in Loglinear Time


1 Fastfood - Approximating Kernel Expansions in Loglinear Time
Quoc Le, Tamás Sarlós, Alex Smola. ICML 2013.
Machine Learning Journal Club, Gatsby. May 16, 2014.

2 Notations
Given: domain $X$, kernel $k$. Feature map $\phi : X \to H$ with
$k(x, x') = \langle \phi(x), \phi(x') \rangle_H$.  (1)
Representer theorem: for many tasks (SVM, ...)
$w = \sum_{i=1}^{N} \alpha_i \phi(x_i)$.  (2)
Consequence: decision function
$f(x) = \langle w, \phi(x) \rangle_H = \sum_{i=1}^{N} \alpha_i k(x_i, x)$.  (3)
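As a concrete illustration of the decision function (3), here is a minimal numpy sketch (the names gauss_kernel and decision_function, and the random data, are purely illustrative) that evaluates $f(x) = \sum_i \alpha_i k(x_i, x)$ for the Gaussian kernel used in the rest of the talk:

    import numpy as np

    def gauss_kernel(X, Y, sigma):
        """Gaussian kernel matrix, k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma ** 2))

    def decision_function(x, X_train, alpha, sigma):
        """f(x) = sum_i alpha_i k(x_i, x), cf. eq. (3)."""
        return gauss_kernel(X_train, x[None, :], sigma)[:, 0] @ alpha

    rng = np.random.default_rng(0)
    X_train, alpha = rng.standard_normal((10, 4)), rng.standard_normal(10)
    print(decision_function(rng.standard_normal(4), X_train, alpha, sigma=1.5))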

3 Random kitchen sinks ($X = R^d$)
Bochner theorem: $k$ continuous, shift invariant $\Rightarrow$
$k(x - x', 0) = \int_{R^d} e^{i \langle z, x - x' \rangle}\, d\lambda(z)$, $\lambda \in M_+(R^d)$  (4)
$= \int_{R^d} \phi_z(x)\, \overline{\phi_z(x')}\, d\lambda(z)$, where $\phi_z(x) = e^{i \langle z, x \rangle}$.  (5)
Assumption: $\lambda$ is a probability measure (normalization).
Trick: $\hat{k}(x - x', 0) = \frac{1}{n} \sum_{j=1}^{n} e^{i \langle z_j, x - x' \rangle}$, $z_j \sim \lambda$.  (6)

4 Random kitchen sinks - continued
Specifically, for Gaussians:
$k(x - x', 0) = e^{-\|x - x'\|^2/(2\sigma^2)}$, $\lambda(z) = N(z; 0, I_d/\sigma^2)$,  (7)
$k(x - x', 0) \approx \langle \hat\phi(x), \hat\phi(x') \rangle = \hat\phi(x)^* \hat\phi(x')$,  (8)
$\hat\phi(x) = \frac{1}{\sqrt{n}}\, e^{i Z x} \in C^n$,  (9)
$Z = \big[Z_{ab} \sim N(0, \sigma^{-2})\big] \in R^{n \times d}$.  (10)
Properties: O(nd) CPU, O(nd) RAM.
Idea (fastfood): do not store $Z$, only the fast generators of $\hat{Z}$.
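A minimal numpy sketch of the random kitchen sinks map (7)-(10), assuming the complex-exponential features of eq. (9); the function name rks_features and the sanity check are illustrative only:

    import numpy as np

    def rks_features(X, n_features, sigma, rng=None):
        """Random kitchen sinks map for the Gaussian kernel, cf. eqs. (7)-(10):
        phi_hat(x) = exp(i Z x) / sqrt(n), with Z_ab ~ N(0, 1/sigma^2)."""
        rng = np.random.default_rng(rng)
        Z = rng.standard_normal((n_features, X.shape[1])) / sigma  # stored: O(nd) RAM
        return np.exp(1j * X @ Z.T) / np.sqrt(n_features)          # O(nd) CPU per point

    # sanity check: phi_hat(x)^* phi_hat(x') ~ exp(-||x - x'||^2 / (2 sigma^2))
    rng = np.random.default_rng(0)
    X, sigma = rng.standard_normal((4, 8)), 2.0
    Phi = rks_features(X, n_features=20000, sigma=sigma, rng=1)
    K_hat = (Phi.conj() @ Phi.T).real
    K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    print(np.abs(K_hat - K).max())   # small; decays like 1/sqrt(n_features)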

5 Fastfood construction: n = d ($d = 2^l$; otherwise padding)
$V = \frac{1}{\sigma\sqrt{d}}\, S H G P H B$,  (11)
where
$G$: diag of i.i.d. $N(0,1)$ entries, $\in R^{d \times d}$.
$P$: random permutation matrix $\in \{0,1\}^{d \times d}$.
$B$: diag(Bernoulli) $\in R^{d \times d}$, $B_{ii} \in \{-1, 1\}$.
$H = H_d$: Walsh-Hadamard (WH) transformation $\in R^{d \times d}$,
$H_1 = 1$, $H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, $H_{2^{k+1}} = \begin{bmatrix} H_{2^k} & H_{2^k} \\ H_{2^k} & -H_{2^k} \end{bmatrix}$, i.e. $H_{2^k} = (H_2)^{\otimes k}$.
$S$: $\mathrm{diag}\big(s_i \|G\|_F^{-1}\big)$ with $s_i \sim (2\pi)^{-d/2} A_{d-1}\, r^{d-1} e^{-r^2/2}$, $A_{d-1} = \frac{2\pi^{d/2}}{\Gamma(d/2)}$.

6 Fastfood construction: n > d (assumption: d divides n)
We stack n/d independent copies together:
$V = [V_1; \ldots; V_{n/d}] = \hat{Z}$.  (12)
Intuition of $V_j = \frac{1}{\sigma\sqrt{d}} S H G P H B$ (see the sketch below):
$\frac{1}{\sqrt{d}} HB$: acts as an isometry, which makes the input denser.
$P$: ensures the incoherence of the two H-s.
$H, G$: WH transforms with a diagonal Gaussian in between $\approx$ dense Gaussian.
$S$: makes the length distributions of the rows of $V$ independent.
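A self-contained numpy sketch of the construction (11)-(12), combined with the complex-exponential features of eq. (9); the helpers fwht, fastfood_block and fastfood_features are illustrative names, and the scaling matrix S is drawn as on slide 5 ($s_i$ distributed as the norm of a d-dimensional standard normal vector):

    import numpy as np

    def fwht(x):
        """Fast Walsh-Hadamard transform along the last axis (length d = 2**l):
        computes H_d @ x for the +-1 Hadamard matrix in O(d log d)."""
        x = np.array(x, dtype=float, copy=True)
        d = x.shape[-1]
        h = 1
        while h < d:
            for i in range(0, d, 2 * h):
                a = x[..., i:i + h].copy()
                b = x[..., i + h:i + 2 * h].copy()
                x[..., i:i + h] = a + b
                x[..., i + h:i + 2 * h] = a - b
            h *= 2
        return x

    def fastfood_block(d, rng):
        """Draw (B, P, G, S) for one block; only O(d) numbers are stored."""
        B = rng.choice([-1.0, 1.0], size=d)         # diag(+-1)
        P = rng.permutation(d)                      # permutation, kept as an index array
        G = rng.standard_normal(d)                  # diag(N(0, 1))
        s = np.sqrt(rng.chisquare(df=d, size=d))    # s_i ~ chi_d = ||N(0, I_d)||
        S = s / np.linalg.norm(G)                   # S_ii = s_i / ||G||_F
        return B, P, G, S

    def fastfood_features(X, n_blocks, sigma, rng=None):
        """phi_hat(x) = exp(i V x) / sqrt(n), with V_j = (1/(sigma sqrt(d))) S H G P H B."""
        rng = np.random.default_rng(rng)
        d = X.shape[1]                    # assumed to be a power of two (else zero-pad)
        blocks = []
        for _ in range(n_blocks):
            B, P, G, S = fastfood_block(d, rng)
            T = fwht(X * B)               # H B x      -- O(d log d) per point
            T = G * T[:, P]               # G P H B x
            T = fwht(T)                   # H G P H B x
            blocks.append(S * T / (sigma * np.sqrt(d)))
        V_x = np.concatenate(blocks, axis=1)    # arguments V x for all points, shape (N, n_blocks*d)
        return np.exp(1j * V_x) / np.sqrt(V_x.shape[1])

    # sanity check against the exact Gaussian kernel
    rng = np.random.default_rng(0)
    X, sigma = rng.standard_normal((4, 16)), 2.0
    Phi = fastfood_features(X, n_blocks=512, sigma=sigma, rng=1)
    K_hat = (Phi.conj() @ Phi.T).real
    K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    print(np.abs(K_hat - K).max())        # small for large n_blocks

Only B, P, G and S (O(d) numbers per block) are stored; the dense Gaussian Z of random kitchen sinks is never formed.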

7 Fastfood: computational efficiency
$G, B, S$: generate them once and store; RAM: O(n), cost of multiplication: O(n).
$P$: O(n) storage, O(n) computation (lookup table).
$H_d$: do not store; $H_d x$ costs O(d log d) per block, n/d blocks $\Rightarrow$ O(n log d).
To sum up: kitchen sinks CPU O(nd), RAM O(nd) vs. fastfood CPU O(n log d), RAM O(n).

8 Walsh-Hadamard transformation: symmetry, orthogonality
Definition: $H_1 = 1$, $H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, $H_{2^{k+1}} = \begin{bmatrix} H_{2^k} & H_{2^k} \\ H_{2^k} & -H_{2^k} \end{bmatrix} = (H_2)^{\otimes (k+1)}$.

9 Walsh-Hadamard transformation: symmetry, orthogonality
Definition: $H_1 = 1$, $H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, $H_{2^{k+1}} = \begin{bmatrix} H_{2^k} & H_{2^k} \\ H_{2^k} & -H_{2^k} \end{bmatrix} = (H_2)^{\otimes (k+1)}$.
Symmetry, orthogonality ($d = 2^k$):
$H_d = H_d^T$, $\quad H_d H_d^T = d I$ (i.e., $\frac{1}{\sqrt{d}} H_d$ is orthogonal).  (13)

10 Walsh-Hadamard transformation: symmetry, orthogonality
Definition: $H_1 = 1$, $H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, $H_{2^{k+1}} = \begin{bmatrix} H_{2^k} & H_{2^k} \\ H_{2^k} & -H_{2^k} \end{bmatrix} = (H_2)^{\otimes (k+1)}$.
Symmetry, orthogonality ($d = 2^k$):
$H_d = H_d^T$, $\quad H_d H_d^T = d I$ (i.e., $\frac{1}{\sqrt{d}} H_d$ is orthogonal).  (13)
Proof: $H_1$, $H_2$: OK. For the induction step,
$[H_{2^{k+1}}]^T = [H_{2^k} \otimes H_2]^T = H_{2^k}^T \otimes H_2^T = H_{2^k} \otimes H_2 = H_{2^{k+1}}$,
$H_{2^{k+1}} H_{2^{k+1}}^T = (H_{2^k} \otimes H_2)(H_{2^k} \otimes H_2)^T = (H_{2^k} \otimes H_2)(H_{2^k}^T \otimes H_2^T) = (H_{2^k} H_{2^k}^T) \otimes (H_2 H_2^T) = (2^k I) \otimes (2 I) = 2^{k+1} I$,
using $(A \otimes B)^T = A^T \otimes B^T$ and $(A \otimes B)(C \otimes D) = AC \otimes BD$.  (14)

11 Walsh-Hadamard transformation: spectral norm
We have seen ($d = 2^k$): $H_d H_d^T = d I$.
Spectral norm: $\|H_d\|_2 = \sqrt{\lambda_{\max}(H_d^T H_d)} = \sqrt{\lambda_{\max}(d I)} = \sqrt{d}$.  (15)
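A quick numeric check of (13) and (15), using scipy's Sylvester-construction Hadamard matrix (the same recursion as above); illustrative only:

    import numpy as np
    from scipy.linalg import hadamard

    d = 16                                               # d = 2**4
    H = hadamard(d).astype(float)                        # +-1 Walsh-Hadamard matrix
    print(np.array_equal(H, H.T))                        # symmetry: True
    print(np.allclose(H @ H.T, d * np.eye(d)))           # H H^T = d I: True
    print(np.isclose(np.linalg.norm(H, 2), np.sqrt(d)))  # spectral norm = sqrt(d): True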

12 Goal (n = d)
Unbiasedness:
$E\big[\hat{k}(x, x')\big] = E\big[\hat\phi(x)^T \hat\phi(x')\big] = e^{-\|x - x'\|^2/(2\sigma^2)} = k(x, x')$.  (16)
Concentration:
$P\big[\,|\hat{k}(x, x') - k(x, x')| \ge a\,\big] \le b$.  (17)

13 Goal - continued
Low variance (one block): $v = \frac{x - x'}{\sigma}$, $\psi_j(v) = \cos\!\Big(\frac{[HGPHB\, v]_j}{\sqrt{d}}\Big)$, $j \in [d]$:
$\mathrm{var}[\psi_j(v)] = \frac{1}{2}\big(1 - e^{-\|v\|^2}\big)^2$,  (18)
$\mathrm{var}\Big[\frac{1}{d}\sum_{j=1}^{d}\psi_j(v)\Big] \le \frac{1}{2d}\big(1 - e^{-\|v\|^2}\big)^2 + \frac{C(\|v\|)}{d}$,  (19)
$C(\alpha) = 6\alpha^4\Big(e^{-\alpha^2} + \frac{\alpha^2}{3}\Big)$.  (20)
For the full feature map:
$\mathrm{var}\big[\hat\phi(x)^T \hat\phi(x')\big] \le \frac{\big(1 - e^{-\|v\|^2}\big)^2}{2n} + \frac{C(\|v\|)}{n}$.  (21)
Proof: $\hat\phi(x)^T \hat\phi(x')$ is the average of n/d independent block estimates, so the block bound (19) is divided by n/d.
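A quick Monte Carlo check of the single-feature variance (18), using only the fact (established on slide 25 below) that each $[HGPHBv]_j/\sqrt{d}$ is conditionally $N(0, \|v\|^2)$; all numbers are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    v_norm = 0.8                                     # ||v|| = ||x - x'|| / sigma
    z = v_norm * rng.standard_normal(1_000_000)      # z ~ N(0, ||v||^2)
    print(np.cos(z).var(), 0.5 * (1 - np.exp(-v_norm ** 2)) ** 2)  # both ~ 0.112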

14 Towards unbiasedness: $E([HGPHB]_{ij})$
Let $M := HGPHB$. Then
$E(M_{ij}) = 0$  (22)
since, with $H_i^T$ the i-th row of H and $H_j$ the j-th column of H,
$M_{ij} = (H_i^T)\, G P\, (H_j B_{jj})$;
conditioned on $P, B$, $M_{ij}$ is a sum of independent $N(0,1)$-s with $\pm$ sign changes, so
$E(M_{ij}) = E[E(M_{ij} \mid P, B)] = E(0) = 0$.

15 Unbiasedness: $\mathrm{var}([HGPHB]_{ij})$
Last slide: $M_{ij} = (H_i^T)\, G P\, (H_j B_{jj})$, $E(M_{ij}) = 0$.
$\mathrm{var}(M_{ij}) = E\big[M_{ij} M_{ij}^T\big]$
$= E\big[(H_i^T G P H_j B_{jj})(B_{jj} H_j^T P^T G H_i)\big]$
$= E\big[B_{jj}^2\, H_i^T G P e e^T P^T G H_i\big]$
$= E\big[1 \cdot H_i^T G e e^T G H_i\big]$
$= H_i^T E[G^2] H_i = H_i^T I H_i = H_i^T H_i = d$
using $e := [1; \ldots; 1] \in R^d$, $H_j H_j^T \to e e^T$ (only the diagonal matters under $E_G$), $Pe = e$, $E(G e e^T G) = E(G^2)$ ($G$: diagonal), $E(G_{ii}^2) = 1$.

16 Unbiasedness: $\mathrm{cov}([HGPHB]_{ij}, [HGPHB]_{ik})$, $j \ne k$
We have seen: $E(M_{ij}) = 0$, $\mathrm{var}(M_{ij}) = d$.
$\mathrm{cov}(M_{ij}, M_{ik}) = 0$ ($j \ne k$) since
l.h.s. $= E\big(H_i^T G P H_j B_{jj} \cdot H_i^T G P H_k B_{kk}\big)$  (23)
$= E(B_{jj} B_{kk})\, E\big(H_i^T G P H_j \cdot H_i^T G P H_k\big)$,  (24)
$E(B_{jj} B_{kk}) = E(B_{jj})\, E(B_{kk}) = 0 \cdot 0 = 0$  (25)
using that $0 = I\big((B_{jj}, B_{kk}); \text{others}\big) = I(B_{jj}; B_{kk}) = E(B_{uu})$, i.e. independence and zero mean.

17 Unbiasedness
In $V = \frac{1}{\sigma} \frac{1}{\sqrt{d}} HGPHB$ ($V = \frac{1}{\sigma\sqrt{d}} M$):
$E(V_{ij}) = E\Big(\frac{M_{ij}}{\sigma\sqrt{d}}\Big) = 0$,  (26)
$\mathrm{var}(V_{ij}) = \mathrm{var}\Big(\frac{M_{ij}}{\sigma\sqrt{d}}\Big) = \frac{\mathrm{var}(M_{ij})}{\sigma^2 d} = \frac{d}{\sigma^2 d} = \frac{1}{\sigma^2}$,  (27)
$\mathrm{cov}(V_{ij}, V_{ik}) = 0$ ($j \ne k$).  (28)
Thus the distribution of the rows of $V \mid P, B$ is $N\big(0, \frac{I}{\sigma^2}\big)$ $\Rightarrow$ [Rahimi & Recht, 2007] unbiasedness given $P, B$ $\Rightarrow$ unbiasedness.
Note: we need (i) (28) conditioned on $P, B$, but we used $E_B(B_{jj} B_{kk})$; otherwise $V$ is a mixture of Gaussians (MOG), (ii) the independence of the rows.

18 Concentration ($e^{i(\cdot)} \to \cos$, n = d)
Theorem (RBF): let
$\hat{k}(x, x') = \frac{1}{d} \sum_{j=1}^{d} \cos\!\Big(\frac{1}{\sigma\sqrt{d}}\big[HGPHB(x - x')\big]_j\Big)$.  (29)
Then
$P\Big[\,\big|\hat{k}(x, x') - k(x, x')\big| \ge \alpha\,\sqrt{\tfrac{2\log(2/\delta)}{d}}\,\Big] \le 2\delta$  (30)
for all $\delta > 0$, where $\alpha = \frac{\|x - x'\|_2}{\sigma}\sqrt{2\log(2d/\delta)}$.
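A small Monte Carlo sketch of the one-block estimator (29), written with a dense Hadamard matrix for simplicity (scipy's hadamard); it only illustrates that the error $|\hat k - k|$ is typically of order $1/\sqrt{d}$, in line with (30):

    import numpy as np
    from scipy.linalg import hadamard

    def one_block_khat(v, rng):
        """k_hat(v) = (1/d) sum_j cos([HGPHB v]_j / sqrt(d)), with v = (x - x')/sigma."""
        d = v.shape[0]
        H = hadamard(d).astype(float)
        B = rng.choice([-1.0, 1.0], size=d)
        P = rng.permutation(d)
        G = rng.standard_normal(d)
        z = H @ (G * (H @ (B * v))[P]) / np.sqrt(d)
        return np.cos(z).mean()

    rng = np.random.default_rng(0)
    d = 64
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)                            # ||v|| = 1
    k_true = np.exp(-np.linalg.norm(v) ** 2 / 2)      # k(v) = e^{-||v||^2 / 2}
    errs = np.abs([one_block_khat(v, rng) - k_true for _ in range(2000)])
    print(errs.mean(), np.quantile(errs, 0.95))       # both shrink as d grows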

19 Concentration proof
We have already seen: $E[\hat{k}(x, x')] = k(x, x')$.
Lemma (concentration of Gaussian measure; Ledoux 1996): $f: R^d \to R$ Lipschitz continuous with constant $L$, $g \sim N(0, I_d)$. Then
$P\big(|f(g) - E[f(g)]| \ge t\big) \le 2 e^{-t^2/(2L^2)}$.  (31)
Lemma (approximate isometry of HB; Ailon & Chazelle, 2009): $x \in R^d$; $H, B$ as in $V$. For any $\delta > 0$,
$P\Big[\|HBx\|_\infty \ge \|x\|_2\,\sqrt{2\log(2d/\delta)}\Big] \le \delta$.  (32)

20 Concentration proof
Notation: $v = \frac{x - x'}{\sigma}$, $k(v) = k(x, x')$, $\hat{k}(v) = \hat{k}(x, x')$.
Sufficient to prove that
$f(g, P, B) = \frac{1}{d} \sum_{j=1}^{d} \cos(z_j)$, $\quad z = HGu$, $\quad u = P\,\tfrac{1}{\sqrt{d}} HB\, v$  (33)
concentrates around its mean. Idea:
$g \mapsto f(g, P, B)$: Lipschitz $\Rightarrow$ high-probability concentration in $G$ given $B$.
Approximate isometry of HB: high probability in $B$ ($P$ does not matter).
Union bound.

21 Concentration proof
$h(a) = \frac{1}{d} \sum_{j=1}^{d} \cos(a_j)$ ($a \in R^d$),
$|f(g; P, B) - f(g'; P, B)| = \big|h[H \mathrm{diag}(g) u] - h[H \mathrm{diag}(g') u]\big|$,
$|h(a) - h(b)| = \frac{1}{d}\Big|\sum_{j=1}^{d} \cos(a_j) - \cos(b_j)\Big| \le \frac{1}{d}\sum_{j=1}^{d} |\cos(a_j) - \cos(b_j)| \le \frac{1}{d}\sum_{j=1}^{d} |a_j - b_j| = \frac{1}{d}\|a - b\|_1 \le \frac{1}{\sqrt{d}}\|a - b\|_2$,
$\|H \mathrm{diag}(g) u - H \mathrm{diag}(g') u\| \le \|H\|_2\, \|\mathrm{diag}(g - g') u\| = \sqrt{d}\, \|(g - g') \circ u\| \le \sqrt{d}\, \|g - g'\|_2\, \|u\|_\infty$.

22 Concentration proof
Until now:
$|f(g; P, B) - f(g'; P, B)| \le \|u\|_\infty\, \|g - g'\|$.  (34)
The $\|u\|_\infty$ term: using $\|Pw\|_\infty = \|w\|_\infty$,
$\|u\|_\infty = \big\|P \tfrac{1}{\sqrt{d}} HB v\big\|_\infty = \tfrac{1}{\sqrt{d}}\|HB v\|_\infty$.  (35)
Approximate isometry of HB: with probability $1 - \delta$ over $B$ (and $P$),
$\|u\|_\infty \le \|v\|_2\, \sqrt{\tfrac{2\log(2d/\delta)}{d}}$.  (36)

23 Concentration proof
Until now: $f$ is Lipschitz, with probability $1 - \delta$ over $B, P$:
$|f(g; P, B) - f(g'; P, B)| \le \Big[\|v\|_2 \sqrt{\tfrac{2\log(2d/\delta)}{d}}\Big]\, \|g - g'\| =: L\, \|g - g'\|$.
By the concentration of the Gaussian measure [$G_{ii} \sim N(0,1)$]:
$P_G\big[\,|f(g; P, B) - k(v)| \ge t\,\big] \le 2 e^{-t^2/(2L^2)} =: \delta$,  (37)
$P_G\Big[\,|f(g; P, B) - k(v)| \ge L \sqrt{2\log(2/\delta)}\,\Big] \le \delta$.  (38)
We apply a union bound: total failure probability $\le 2\delta$.

24 Low variance: $\mathrm{var}[\psi_j(v)]$
Notation: $w = \tfrac{1}{\sqrt{d}} HBv$, $u = Pw$, $z = HGu$.  (39)
High-level idea:
$(z_j, z_t) \mid u$ is normal $\Rightarrow$ compute $\mathrm{cov}(z_j, z_t \mid u)$.
Then $\mathrm{cov}(\psi(z_j), \psi(z_t) \mid u)$ via some exp-cosh relations; finally set $j = t$.

25 Low variance: $z_j \mid u$
Def.: $w = \tfrac{1}{\sqrt{d}} HBv$, $u = Pw$, $z = HGu$.
Using $E_G(HGu \mid u) = 0$:
$\mathrm{cov}(z_j, z_j \mid u) = \mathrm{cov}([HGu]_j, [HGu]_j \mid u) = \mathrm{cov}(H_j^T Gu, H_j^T Gu \mid u) = E\big[(H_j^T Gu)(H_j^T Gu)^T\big]$,
$H_j^T G u = \sum_i H_{ji} G_{ii} u_i$ ($G$: diagonal), so
$\mathrm{cov}(z_j, z_j \mid u) = E\Big(\sum_i G_{ii}^2 H_{ji}^2 u_i^2\Big) = \sum_i E(G_{ii}^2)\, u_i^2 = \sum_i u_i^2 = \|u\|^2 = \|v\|^2$
using $H_{ji}^2 = 1$ ($H_{ji} = \pm 1$), $E(G_{ii}^2) = 1$ [$G_{ii} \sim N(0,1)$], and the isometry of $P \tfrac{1}{\sqrt{d}} HB$.
$z \mid u$ is normal, so $z_j \mid u \sim N(0, \|v\|^2)$.

26 Low variance: $\mathrm{cov}(z_j, z_t \mid u)$
Last slide: $z_j \mid u \sim N(0, \|v\|^2)$.
$\mathrm{cov}(z_j, z_t \mid u) = \mathrm{corr}(z_j, z_t \mid u)\, \mathrm{std}(z_j \mid u)\, \mathrm{std}(z_t \mid u)$  (40)
$= \mathrm{corr}(z_j, z_t \mid u)\, \|v\|^2 =: \rho_{jt}(u)\, \|v\|^2 =: \rho\, \|v\|^2$.  (41)
$\begin{bmatrix} z_j \\ z_t \end{bmatrix} \Big|\, u \sim N\Big(0, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \|v\|^2 =: LL^T\Big)$, i.e. $\begin{bmatrix} z_j \\ z_t \end{bmatrix} \overset{d}{=} Lg$ with $g \sim N(0, I_2)$,  (42)
$L = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{bmatrix} \|v\|$.  (43)
Now for $\psi_j(v) = \cos(z_j)$:
$\mathrm{cov}(\psi_j(v), \psi_t(v) \mid u) = \mathrm{cov}\big(\cos([Lg]_1), \cos([Lg]_2)\big)$  (44)
$= E_g\Big[\prod_{k=1}^{2} \cos([Lg]_k)\Big] - \prod_{k=1}^{2} E_g\big[\cos([Lg]_k)\big]$.  (45)

27 Low variance: first term in $\mathrm{cov}(\psi_j(v), \psi_t(v) \mid u)$
Using $\cos\alpha \cos\beta = \tfrac{1}{2}\big[\cos(\alpha - \beta) + \cos(\alpha + \beta)\big]$ and $g = [g_1; g_2]$:
$E_g\big[\cos([Lg]_1) \cos([Lg]_2)\big] = \tfrac{1}{2} E_g\big\{\cos([Lg]_1 - [Lg]_2) + \cos([Lg]_1 + [Lg]_2)\big\}$,
where
$[Lg]_1 - [Lg]_2 = \|v\|\big((1 - \rho) g_1 - \sqrt{1 - \rho^2}\, g_2\big) \overset{d}{=} \|v\| \sqrt{2 - 2\rho}\; h$,
$[Lg]_1 + [Lg]_2 = \|v\|\big((1 + \rho) g_1 + \sqrt{1 - \rho^2}\, g_2\big) \overset{d}{=} \|v\| \sqrt{2 + 2\rho}\; h$,
since
$(1 - \rho) g_1 - \sqrt{1 - \rho^2}\, g_2 \overset{d}{=} \sqrt{(1 - \rho)^2 + (1 - \rho^2)}\; h = \sqrt{2 - 2\rho}\; h$,
$(1 + \rho) g_1 + \sqrt{1 - \rho^2}\, g_2 \overset{d}{=} \sqrt{(1 + \rho)^2 + (1 - \rho^2)}\; h = \sqrt{2 + 2\rho}\; h$,
where $h \sim N(0, 1)$.

28 Low variance: first term in $\mathrm{cov}(\psi_j(v), \psi_t(v) \mid u)$
Thus
$E_g\big[\cos([Lg]_1) \cos([Lg]_2)\big] = \tfrac{1}{2} E_g\big\{\cos(a_- h) + \cos(a_+ h)\big\}$,  (46)
$a_- = \|v\| \sqrt{2 - 2\rho}$,  (47)
$a_+ = \|v\| \sqrt{2 + 2\rho}$.  (48)
Making use of the relation
$E[\cos(a h)] = e^{-\frac{1}{2} a^2}$, $h \sim N(0, 1)$,  (49)
we obtain
$E_g\big[\cos([Lg]_1) \cos([Lg]_2)\big] = \tfrac{1}{2}\big[e^{-\|v\|^2 (1 - \rho)} + e^{-\|v\|^2 (1 + \rho)}\big]$.

29 Low variance: value of $E[\cos(b)]$
Lemma: $E[\cos(b)] = e^{-\frac{1}{2}\sigma^2}$ for $b \sim N(0, \sigma^2)$.  (50)
Proof: the characteristic function of $b \sim N(m, \sigma^2)$ is
$c(t) = E_b\big[e^{itb}\big] = e^{itm - \frac{1}{2}\sigma^2 t^2}$.  (51)
In particular, for $m = 0$, $t = 1$ ($b \sim N(0, \sigma^2)$):
$e^{-\frac{1}{2}\sigma^2} = E_b\big[e^{ib}\big] = E[\cos(b)]$  (52)
(the imaginary part $E[\sin(b)]$ vanishes by symmetry).
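A one-line Monte Carlo check of the lemma (50), just to make the identity concrete (numbers illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 1.3
    b = sigma * rng.standard_normal(2_000_000)        # b ~ N(0, sigma^2)
    print(np.cos(b).mean(), np.exp(-sigma ** 2 / 2))  # the two agree to about 1e-3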

30 Low variance: second term in $\mathrm{cov}(\psi_j(v), \psi_t(v) \mid u)$
Since $z_j \mid u \sim N(0, \|v\|^2)$,
$E_g[\cos(z_j)]\, E_g[\cos(z_t)] = \big(E_g[\cos(\|v\| h)]\big)^2 = \big(e^{-\frac{1}{2}\|v\|^2}\big)^2 = e^{-\|v\|^2}$
using the identity for $E[\cos(ah)]$. Thus, with $\cosh(a) = \frac{e^a + e^{-a}}{2}$,
$\mathrm{cov}(\psi_j(v), \psi_t(v) \mid u) = \tfrac{1}{2}\big[e^{-\|v\|^2(1 - \rho)} + e^{-\|v\|^2(1 + \rho)}\big] - e^{-\|v\|^2}$
$= e^{-\|v\|^2}\Big[\frac{e^{\|v\|^2 \rho} + e^{-\|v\|^2 \rho}}{2} - 1\Big] = e^{-\|v\|^2}\big[\cosh(\|v\|^2 \rho) - 1\big]$.
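A Monte Carlo check of the conditional covariance formula just derived, assuming a fixed correlation $\rho$ for the conditionally normal pair $(z_j, z_t)$ (all numbers illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    v2, rho, n = 0.7, 0.4, 2_000_000                 # ||v||^2, correlation, sample size
    cov = v2 * np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    emp = np.cov(np.cos(z[:, 0]), np.cos(z[:, 1]))[0, 1]
    print(emp, np.exp(-v2) * (np.cosh(v2 * rho) - 1))  # both ~ 0.0196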

31 Low variance: $\mathrm{var}[\psi_j(v)]$
With $j = t$, $\rho = 1$ we get
$\mathrm{var}[\psi_j(v)] = e^{-\|v\|^2}\Big[\frac{e^{\|v\|^2} + e^{-\|v\|^2}}{2} - 1\Big]$  (53)
$= \frac{1 + e^{-2\|v\|^2}}{2} - e^{-\|v\|^2}$  (54)
$= \frac{1}{2}\big(1 - 2e^{-\|v\|^2} + e^{-2\|v\|^2}\big)$  (55)
$= \frac{1}{2}\big(1 - e^{-\|v\|^2}\big)^2$.  (56)

32 Low variance: $\mathrm{var}\big[\sum_{j=1}^{d} \psi_j(v)\big]$
Decomposition:
$\mathrm{var}\Big[\sum_{j=1}^{d} \psi_j(v)\Big] = \sum_{j,t=1}^{d} \mathrm{cov}\big[\psi_j(v), \psi_t(v)\big]$.  (57)
We have seen that
$\mathrm{cov}\big[\psi_j(v), \psi_t(v) \mid u\big] = e^{-\|v\|^2}\big[\cosh(\|v\|^2 \rho) - 1\big]$.  (58)
We rewrite the cosh term.

33 Low variance: $\cosh(\|v\|^2 \rho)$
Third-order Taylor expansion around 0 with remainder term:
$\cosh(\|v\|^2 \rho) = 1 + \frac{1}{2!}\|v\|^4 \rho^2 + \frac{1}{3!}\sinh(\eta)\, \|v\|^6 \rho^3$, $\quad |\eta| \le \|v\|^2 |\rho| \le \|v\|^2$,  (59)
$\le 1 + \frac{1}{2}\|v\|^4 \rho^2 + \frac{1}{3!}\sinh(\|v\|^2)\, \|v\|^6 \rho^3$  (60)
$\le 1 + \|v\|^4 \rho^2\, B(\|v\|)$,  (61)
where $B(\|v\|) = \frac{1}{2} + \frac{\sinh(\|v\|^2)\, \|v\|^2}{6}$; we use: $\cosh' = \sinh$, $\sinh' = \cosh$, $\cosh(0) = 1$, $\sinh(a) = \frac{e^a - e^{-a}}{2}$, $\sinh(0) = 0$, the monotonicity of $\sinh$, and $|\rho| \le 1$ (so $\rho^3 \le \rho^2$).

34 Low variance: $\mathrm{var}\big[\sum_{j=1}^{d} \psi_j(v)\big]$
Plugging the result back into $\mathrm{cov}\big[\psi_j(v), \psi_t(v) \mid u\big]$ and using $e^{-\|v\|^2} \le 1$:
$\mathrm{cov}\big[\psi_j(v), \psi_t(v) \mid u\big] \le \|v\|^4 \rho^2\, B(\|v\|)$.  (62)
Here, $\rho = \rho_{jt}(u)$.
Remains: to bound $E_u\big[\rho_{jt}^2(u)\big]$. This is small if $E_u \|u\|_4^4$ is small ($HB$: randomized preconditioner).

35 Numerical experiments
Accuracy: similar to random kitchen sinks (RKS).
CPU, RAM: see the comparison on slide 7 (fastfood O(n log d) CPU, O(n) RAM vs. RKS O(nd), O(nd)).

36 Summary
Random kitchen sinks: use (normally distributed) random projections, which are stored (Z).
Fastfood: approximates the RKS features using the composition of diagonal, permutation and Walsh-Hadamard transformations ($\hat{Z}$); does not store the feature map!
Results: unbiasedness, concentration, low variance, RAM + CPU improvements.

37 Fastfood: properties - rows of HGPHB have the same length
Let $M = HGPHB$. The squared norm of the j-th row is
$l_j^2 = \big[M M^T\big]_{jj} = \big[(HGPHB)(HGPHB)^T\big]_{jj}$  (63)
$= \big[HGPHBB^T H^T P^T G H^T\big]_{jj} = d\,\big[H G^2 H^T\big]_{jj}$  (64)
$= d \sum_i H_{ij}^2 G_{ii}^2 = d \sum_i G_{ii}^2 = d\, \|G\|_F^2$  (65)
by $BB^T = I$ [$B = \mathrm{diag}(\pm 1)$], $HH^T = dI$, $PP^T = I$, $H_{ij}^2 = 1$ ($H_{ij} = \pm 1$).
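A quick numeric check of (63)-(65) with small dense matrices (illustrative only):

    import numpy as np
    from scipy.linalg import hadamard

    rng = np.random.default_rng(0)
    d = 32
    H = hadamard(d).astype(float)
    G = np.diag(rng.standard_normal(d))
    B = np.diag(rng.choice([-1.0, 1.0], size=d))
    P = np.eye(d)[rng.permutation(d)]                # permutation matrix

    M = H @ G @ P @ H @ B
    row_norms_sq = (M ** 2).sum(axis=1)
    print(np.allclose(row_norms_sq, d * np.linalg.norm(G, 'fro') ** 2))  # True: l_j^2 = d ||G||_F^2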

38 Fastfood: optional scaling matrix (S)
Previous slide: $l_j^2 = d\, \|G\|_F^2$.
Rescaling by $\frac{1}{l_j} = \frac{1}{\sqrt{d}\, \|G\|_F}$ yields rows of unit length.
$S$: $\mathrm{diag}\big(s_i \|G\|_F^{-1}\big)$ with $s_i \sim (2\pi)^{-d/2} A_{d-1}\, r^{d-1} e^{-r^2/2}$ (a chi distribution with d degrees of freedom), $A_{d-1} = \frac{2\pi^{d/2}}{\Gamma(d/2)}$.
$\Rightarrow$ the length distributions of the rows of $V$ become independent of each other.
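A small sketch of drawing the rescaling factors $S_{ii}$, assuming the chi-distribution reading of the density above ($s_i$ distributed as the norm of a d-dimensional standard normal vector); names are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 64
    G = rng.standard_normal(d)                   # diagonal of G
    s = np.sqrt(rng.chisquare(df=d, size=d))     # s_i ~ chi_d
    S = s / np.linalg.norm(G)                    # S_ii = s_i / ||G||_F
    # rows of V = (1/(sigma sqrt(d))) S H G P H B then have norms s_i / sigma,
    # i.e. the same length distribution as the rows of a dense Gaussian Z with N(0, 1/sigma^2) entries
    print(S[:5])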
