Powers of Tensors and Fast Matrix Multiplication
1 Powers of Tensors and Fast Matrix Multiplication François Le Gall Department of Computer Science Graduate School of Information Science and Technology The University of Tokyo Simons Institute, 12 November 2014
2 Overview of our Results
3 Algebraic Complexity of Matrix Multiplication
Compute the product of two n × n matrices A and B over a field F.
Model: algebraic circuits
- gates: +, −, × (operations on two elements of the field)
- input: a_ij, b_ij (2n^2 inputs)
- output: c_ij = Σ_{k=1}^n a_ik b_kj (n^2 outputs)
C_M(n) = minimal number of algebraic operations needed to compute the product (note: may depend on the field F).
Exponent of matrix multiplication: ω = inf { ρ | C_M(n) ≤ n^ρ for all large enough n } (note: may depend on the field F).
Obviously, 2 ≤ ω ≤ 3.
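The upper bound ω ≤ 3 comes from the trivial circuit that implements the defining formula directly: n multiplications and n − 1 additions for each of the n^2 outputs, so C_M(n) ≤ 2n^3 − n^2. A minimal sketch of this count (illustration only, not part of the talk):

```python
def schoolbook_op_count(n):
    """Operations used by the circuit computing each c_ij = sum_k a_ik * b_kj
    directly: n multiplications and n - 1 additions per output, n^2 outputs."""
    mults = n ** 3
    adds = n ** 2 * (n - 1)
    return mults + adds  # = 2*n**3 - n**2
```

Since 2n^3 − n^2 = O(n^3), the infimum defining ω is at most 3.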
4 History of the main improvements on the exponent of square matrix multiplication
Upper bound | Year | Authors
ω < 2.81 | 1969 | Strassen
ω < 2.79 | 1978 | Pan
ω < 2.78 | 1979 | Bini, Capovani, Romani and Lotti
ω < 2.55 | 1981 | Schönhage
ω < 2.53 | 1982 | Pan
ω < 2.52 | 1982 | Romani
ω < 2.50 | 1982 | Coppersmith and Winograd
ω < 2.48 | 1987 | Strassen
ω < 2.376 | 1987 | Coppersmith and Winograd
ω < 2.3730 | 2010 | Stothers
ω < 2.3729 | 2012 | Vassilevska Williams
ω < 2.3728639 | 2014 | Le Gall (this work)
5 History of the main improvements on the exponent of square matrix multiplication
(Same table as on the previous slide.) The five most recent bounds, from Strassen (1987) onward, were all obtained by the analysis of a tensor by the laser method.
6 History of the main improvements on the exponent of square matrix multiplication
ω < 2.48 | 1987 | Strassen | analysis of the tensor by the laser method (LM), v1
ω < 2.376 | 1987 | Coppersmith and Winograd | LM-based analysis v2.0
ω < 2.3730 | 2010 | Stothers | LM-based analysis v2.1
ω < 2.3729 | 2012 | Vassilevska Williams | LM-based analysis v2.2
ω < 2.3728639 | 2014 | Le Gall | LM-based analysis v2.3
The tensors considered become more and more difficult to analyze: technical difficulties appear, and the size of the tensor increases.
Previous versions (up to v2.2): analyzing the tensor required solving a complicated optimization problem (difficult when the size of the tensor increases).
Our new technique (v2.3): analyzing the tensor (i.e., obtaining an upper bound on ω from it) can be done in time polynomial in the size of the tensor, via an analysis based on convex optimization.
7 Applications of our method
Laser-method-based analysis v2.3: given any tensor from which an upper bound on ω can be obtained from the laser method, it computes, in time polynomial in the size of the tensor, the corresponding upper bound on ω.
Which tensor? Powers of the basic tensor from Coppersmith and Winograd's paper.
Analysis of the m-th power T_CW^⊗m of the tensor by CW:
m | Upper bound on ω | Authors
1 | ω < 2.3872 | CW (1987)
2 | ω < 2.3755 | CW (1987)
4 | ω < 2.3730 | Stothers (2010)
8 | ω < 2.3729 | Vassilevska Williams (2012)
16 | ω < 2.3728640 | Le Gall (2014)
32 | ω < 2.3728639 | Le Gall (2014)
The number of variables in the associated optimization problem grows quickly with m (1 variable for m = 1, 3 for m = 2, 9 for m = 4, ...).
11 How to Obtain Upper Bounds on ω?
12 Strassen's algorithm (for the product of two 2 × 2 matrices)
Goal: compute the product of A = (a_11 a_12 ; a_21 a_22) by B = (b_11 b_12 ; b_21 b_22).
1. Compute:
m_1 = a_11 (b_12 − b_22), m_2 = (a_11 + a_12) b_22, m_3 = (a_21 + a_22) b_11, m_4 = a_22 (b_21 − b_11),
m_5 = (a_11 + a_22)(b_11 + b_22), m_6 = (a_12 − a_22)(b_21 + b_22), m_7 = (a_11 − a_21)(b_11 + b_12).
2. Output:
−m_2 + m_4 + m_5 + m_6 = c_11, m_1 + m_2 = c_12, m_3 + m_4 = c_21, m_1 − m_3 + m_5 − m_7 = c_22.
7 multiplications, 18 additions/subtractions.
13 Strassen's algorithm (for the product of two 2^k × 2^k matrices)
Apply the same scheme, with the a_ij, b_ij and c_ij now standing for 2^(k−1) × 2^(k−1) blocks: 7 (recursive) multiplications and 18 additions/subtractions at each level.
Recursive application gives C_M(2^k) = O(7^k) = O((2^k)^(log_2 7)), hence ω ≤ log_2(7) ≈ 2.81. [Strassen 69]
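Strassen's recursion can be transcribed directly into code. The sketch below (illustrative, not from the talk) multiplies 2^k × 2^k integer matrices with the seven products m_1, ..., m_7 above, together with a schoolbook reference implementation for checking:

```python
def strassen(A, B):
    """Strassen's algorithm for two 2^k x 2^k matrices given as lists of rows."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    quad = lambda M, r, c: [row[c:c + h] for row in M[r:r + h]]
    add = lambda X, Y: [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    sub = lambda X, Y: [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    a11, a12, a21, a22 = quad(A, 0, 0), quad(A, 0, h), quad(A, h, 0), quad(A, h, h)
    b11, b12, b21, b22 = quad(B, 0, 0), quad(B, 0, h), quad(B, h, 0), quad(B, h, h)
    # the seven recursive multiplications of the slide
    m1 = strassen(a11, sub(b12, b22))
    m2 = strassen(add(a11, a12), b22)
    m3 = strassen(add(a21, a22), b11)
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a22), add(b11, b22))
    m6 = strassen(sub(a12, a22), add(b21, b22))
    m7 = strassen(sub(a11, a21), add(b11, b12))
    # recombination: c11 = -m2+m4+m5+m6, c12 = m1+m2, c21 = m3+m4, c22 = m1-m3+m5-m7
    c11 = add(sub(add(m4, m5), m2), m6)
    c12 = add(m1, m2)
    c21 = add(m3, m4)
    c22 = sub(add(sub(m1, m3), m5), m7)
    return [r1 + r2 for r1, r2 in zip(c11, c12)] + \
           [r1 + r2 for r1, r2 in zip(c21, c22)]

def naive(A, B):
    """Schoolbook multiplication, for checking."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

The recursion performs 7^k scalar multiplications on 2^k × 2^k inputs, which is exactly the source of the exponent log_2 7.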
14 Strassen's algorithm (for the product of two 2^k × 2^k matrices)
More generally: suppose that the product of two m × m matrices can be computed with t multiplications. Then ω ≤ log_m(t) or, equivalently, m^ω ≤ t.
Strassen's algorithm is the case m = 2 and t = 7: 7 multiplications and 18 additions/subtractions, and recursive application gives C_M(2^k) = O(7^k), hence ω ≤ log_2(7) ≈ 2.81. [Strassen 69]
15 The tensor of matrix multiplication
Definition. The tensor corresponding to the multiplication of an m × n matrix by an n × p matrix is
⟨m, n, p⟩ = Σ_{i=1}^m Σ_{j=1}^p Σ_{k=1}^n a_ik ⊗ b_kj ⊗ c_ij.
Intuitive interpretation: this is a formal sum; when the a_ik and the b_kj are replaced by the corresponding entries of matrices, the coefficient of c_ij becomes Σ_{k=1}^n a_ik b_kj.
16 General 3-tensors
Consider three vector spaces U, V and W over F, and take bases of U, V and W:
U = span{x_1, ..., x_dim(U)}, V = span{y_1, ..., y_dim(V)}, W = span{z_1, ..., z_dim(W)}.
A tensor over (U, V, W) is an element of U ⊗ V ⊗ W, i.e., a formal sum
T = Σ_{u=1}^{dim(U)} Σ_{v=1}^{dim(V)} Σ_{w=1}^{dim(W)} d_uvw x_u ⊗ y_v ⊗ z_w with d_uvw ∈ F,
in other words a three-dimensional array with dim(U) × dim(V) × dim(W) entries in F.
17 General 3-tensors
For the matrix multiplication tensor we have dim(U) = mn, dim(V) = np and dim(W) = mp, with
U = span{a_ik}_{1≤i≤m, 1≤k≤n}, V = span{b_k'j}_{1≤k'≤n, 1≤j≤p}, W = span{c_i'j'}_{1≤i'≤m, 1≤j'≤p},
and coefficients d_(ik)(k'j)(i'j') = 1 if i = i', j = j' and k = k', and 0 otherwise.
Definition. The tensor corresponding to the multiplication of an m × n matrix by an n × p matrix is
⟨m, n, p⟩ = Σ_{i=1}^m Σ_{j=1}^p Σ_{k=1}^n a_ik ⊗ b_kj ⊗ c_ij.
18 Rank
Definition. ⟨m, n, p⟩ = Σ_{i=1}^m Σ_{j=1}^p Σ_{k=1}^n a_ik ⊗ b_kj ⊗ c_ij.
The rank R of a tensor is the number of multiplications of the best (bilinear) algorithm computing it; trivially R(⟨m, n, p⟩) ≤ mnp.
Strassen's algorithm gives R(⟨2, 2, 2⟩) ≤ 7:
⟨2, 2, 2⟩ = a_11 ⊗ (b_12 − b_22) ⊗ (c_12 + c_22)
+ (a_11 + a_12) ⊗ b_22 ⊗ (−c_11 + c_12)
+ (a_21 + a_22) ⊗ b_11 ⊗ (c_21 − c_22)
+ a_22 ⊗ (b_21 − b_11) ⊗ (c_11 + c_21)
+ (a_11 + a_22) ⊗ (b_11 + b_22) ⊗ (c_11 + c_22)
+ (a_12 − a_22) ⊗ (b_21 + b_22) ⊗ c_11
+ (a_11 − a_21) ⊗ (b_11 + b_12) ⊗ (−c_22).
19 How to obtain upper bounds on ω?
Remember: suppose that the product of two m × m matrices can be computed with t multiplications; then ω ≤ log_m(t) or, equivalently, m^ω ≤ t.
In our terminology: R(⟨m, m, m⟩) ≤ t ⟹ m^ω ≤ t.
First generalization. Theorem: R(⟨m, n, p⟩) ≤ t ⟹ (mnp)^(ω/3) ≤ t.
Second generalization [Bini et al. 1979]. Theorem: the same holds for the border rank R̲, which satisfies R̲(⟨m, n, p⟩) ≤ R(⟨m, n, p⟩): R̲(⟨m, n, p⟩) ≤ t ⟹ (mnp)^(ω/3) ≤ t.
20 How to obtain upper bounds on ω?
Third generalization [Schönhage 1981].
Theorem (the asymptotic sum inequality, special case): R̲(⟨m_1, n_1, p_1⟩ ⊕ ⟨m_2, n_2, p_2⟩) ≤ t ⟹ (m_1 n_1 p_1)^(ω/3) + (m_2 n_2 p_2)^(ω/3) ≤ t, where ⊕ denotes the direct sum.
Theorem (the asymptotic sum inequality, general form): R̲(⊕_{i=1}^k ⟨m_i, n_i, p_i⟩) ≤ t ⟹ Σ_{i=1}^k (m_i n_i p_i)^(ω/3) ≤ t.
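The asymptotic sum inequality converts a border-rank upper bound into a bound on ω by solving Σ_i (m_i n_i p_i)^(ρ/3) = t for ρ. A small bisection sketch (illustrative code, not from the talk; the second numeric check uses Bini et al.'s bound R̲(⟨3, 2, 2⟩) ≤ 10):

```python
import math

def omega_bound(triples, t, iters=80):
    """Given border rank of the direct sum of the <m,n,p> in `triples` at most t,
    return the rho with sum_i (m_i*n_i*p_i)**(rho/3) = t; then omega <= rho.
    Assumes the root lies in [2, 3] (true in the cases of interest)."""
    f = lambda rho: sum((m * n * p) ** (rho / 3) for m, n, p in triples) - t
    lo, hi = 2.0, 3.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) <= 0:   # sum still below t: the root is to the right
            lo = mid
        else:
            hi = mid
    return hi
```

For a single triple this recovers the earlier bounds: omega_bound([(2, 2, 2)], 7) returns log_2 7 ≈ 2.807, and omega_bound([(3, 2, 2)], 10) returns about 2.78, Bini et al.'s exponent.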
21 History of the main improvements on the exponent of square matrix multiplication
ω < 2.81 | 1969 | Strassen | upper bound on ω from the analysis of the rank of a tensor
ω < 2.79 | 1978 | Pan | (rank)
ω < 2.78 | 1979 | Bini et al. | analysis of the border rank of a tensor
ω < 2.55 | 1981 | Schönhage | analysis of a tensor by the asymptotic sum inequality
ω < 2.53 | 1982 | Pan | (asymptotic sum inequality)
ω < 2.52 | 1982 | Romani | (asymptotic sum inequality)
ω < 2.50 | 1982 | Coppersmith and Winograd | (asymptotic sum inequality)
ω < 2.48 | 1987 | Strassen | analysis of a tensor by the laser method
ω < 2.376 | 1987 | Coppersmith and Winograd | (laser method)
ω < 2.3730 | 2010 | Stothers | (laser method)
ω < 2.3729 | 2012 | Vassilevska Williams | (laser method)
ω < 2.3728639 | 2014 | Le Gall | (laser method)
22 The Laser Method on a Simpler Example
23 The laser method
Why is this called the "laser method"? The name comes from V. Strassen, "Algebra and Complexity", Proceedings of the First European Congress of Mathematics (Ref. [27] in the paper).
All the bounds from Strassen (1987) onward in the history table were obtained by variants (improvements) of the laser method.
24 The first CW construction
Let q be a positive integer. Consider three vector spaces U, V and W of dimension q + 1 over F:
U = span{x_0, ..., x_q}, V = span{y_0, ..., y_q}, W = span{z_0, ..., z_q}.
Coppersmith and Winograd (1987) introduced the following tensor over (U, V, W):
T_easy = T^{011}_easy + T^{101}_easy + T^{110}_easy, where
T^{011}_easy = Σ_{i=1}^q x_0 ⊗ y_i ⊗ z_i ≅ ⟨1, 1, q⟩ (a 1 × 1 matrix times a 1 × q matrix),
T^{101}_easy = Σ_{i=1}^q x_i ⊗ y_0 ⊗ z_i ≅ ⟨q, 1, 1⟩ (a q × 1 matrix times a 1 × 1 matrix),
T^{110}_easy = Σ_{i=1}^q x_i ⊗ y_i ⊗ z_0 ≅ ⟨1, q, 1⟩ (a 1 × q matrix times a q × 1 matrix).
25 The first CW construction
Decompose the spaces as
U = U_0 ⊕ U_1, where U_0 = span{x_0} and U_1 = span{x_1, ..., x_q},
V = V_0 ⊕ V_1, where V_0 = span{y_0} and V_1 = span{y_1, ..., y_q},
W = W_0 ⊕ W_1, where W_0 = span{z_0} and W_1 = span{z_1, ..., z_q}.
Then T_easy = T^{011}_easy + T^{101}_easy + T^{110}_easy, where T^{011}_easy is a tensor over (U_0, V_1, W_1), T^{101}_easy over (U_1, V_0, W_1), and T^{110}_easy over (U_1, V_1, W_0).
This is not a direct sum.
26 The first CW construction
T_easy = T^{011}_easy + T^{101}_easy + T^{110}_easy, and R̲(T_easy) ≤ q + 2 (actually, R̲(T_easy) = q + 2).
Since the sum is not direct, we cannot use the asymptotic sum inequality.
Consider T_easy^⊗2 = (T^{011}_easy + T^{101}_easy + T^{110}_easy) ⊗ (T^{011}_easy + T^{101}_easy + T^{110}_easy) = T^{011}⊗T^{011} + T^{011}⊗T^{101} + ... + T^{110}⊗T^{110} (9 terms).
More generally, consider T_easy^⊗N (3^N terms).
Note: R̲(T_easy^⊗N) = (q + 1)^(N + o(N)) would imply ω = 2.
Coppersmith and Winograd showed how to select (3/2^(2/3))^((1−o(1))N) terms that do not share variables (i.e., form a direct sum), by zeroing variables (i.e., without increasing the rank).
27 The first CW construction: Analysis
Entropy: H(1/3, 2/3) = −(1/3) log(1/3) − (2/3) log(2/3) = log 3 − (2/3) log 2 = log(3/2^(2/3)).
Theorem [Coppersmith and Winograd 87]. The tensor T_easy^⊗N can be converted, by zeroing variables (i.e., without increasing the rank), into a direct sum of
exp((H(1/3, 2/3) − o(1)) N) = (3/2^(2/3))^((1−o(1))N)
terms, each containing N/3 copies of T^{011}_easy, N/3 copies of T^{101}_easy and N/3 copies of T^{110}_easy.
(Recall that T_easy^⊗N is a sum of 3^N terms, each a tensor product of N copies of T^{011}_easy, T^{101}_easy and T^{110}_easy.)
28 The first CW construction: Analysis
By the theorem, T_easy^⊗N can be converted into a direct sum of (3/2^(2/3))^((1−o(1))N) terms, each isomorphic to
[T^{011}_easy]^⊗(N/3) ⊗ [T^{101}_easy]^⊗(N/3) ⊗ [T^{110}_easy]^⊗(N/3) ≅ ⟨q^(N/3), q^(N/3), q^(N/3)⟩.
By the asymptotic sum inequality (general form) [Schönhage 1981], Σ_{i=1}^k (m_i n_i p_i)^(ω/3) ≤ R̲(⊕_{i=1}^k ⟨m_i, n_i, p_i⟩).
Consequence: (3/2^(2/3))^((1−o(1))N) · q^(Nω/3) ≤ R̲(T_easy^⊗N) ≤ R̲(T_easy)^N = (q + 2)^N,
hence (3/2^(2/3)) · q^(ω/3) ≤ q + 2, which gives ω < 2.404 for q = 8.
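The final implication can be checked numerically by solving (3/2^(2/3)) · q^(ρ/3) = q + 2 for ρ and optimizing over q (a sketch, not the talk's code):

```python
import math

def easy_cw_bound(q):
    """The rho solving (3 / 2**(2/3)) * q**(rho/3) = q + 2;
    the analysis of T_easy then gives omega <= rho."""
    return 3 * math.log((q + 2) * 2 ** (2 / 3) / 3) / math.log(q)

# scan small q; the bound is optimized at q = 8, giving omega < 2.404
best_q = min(range(2, 30), key=easy_cw_bound)
```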
29 Idea behind the proof
T_easy = T^{011}_easy + T^{101}_easy + T^{110}_easy, with T^{011}_easy = Σ_{i=1}^q x_0 ⊗ y_i ⊗ z_i, T^{101}_easy = Σ_{i=1}^q x_i ⊗ y_0 ⊗ z_i, T^{110}_easy = Σ_{i=1}^q x_i ⊗ y_i ⊗ z_0.
Consider N = 2: T_easy^⊗2 = (T^{011} + T^{101} + T^{110}) ⊗ (T^{011} + T^{101} + T^{110}), a sum of 9 terms.
For instance, T^{011}_easy ⊗ T^{101}_easy = Σ_{i,i'=1}^q (x_0 ⊗ x_i') ⊗ (y_i ⊗ y_0) ⊗ (z_i ⊗ z_i'), a tensor over (U_0 ⊗ U_1, V_1 ⊗ V_0, W_1 ⊗ W_1); its label is (01, 10, 11).
30 Idea behind the proof
T^{011}_easy ⊗ T^{011}_easy = Σ_{i,i'=1}^q (x_0 ⊗ x_0) ⊗ (y_i ⊗ y_i') ⊗ (z_i ⊗ z_i'), a tensor over (U_0 ⊗ U_0, V_1 ⊗ V_1, W_1 ⊗ W_1).
This term and T^{011}_easy ⊗ T^{101}_easy (over (U_0 ⊗ U_1, V_1 ⊗ V_0, W_1 ⊗ W_1)) SHARE VARIABLES: both use the variables of W_1 ⊗ W_1.
Remove T^{011}_easy ⊗ T^{011}_easy, e.g., by setting all variables in V_1 ⊗ V_1 to zero; note that this removes more than one term!
Continuing, set all the variables in U_1 ⊗ U_0, V_0 ⊗ V_0 and W_0 ⊗ W_1 to zero, so that the surviving terms no longer share variables.
31 Idea behind the proof
Conclusion: we can convert T_easy^⊗2 (a sum of 9 terms) into a direct sum of 2 terms.
NEXT STEP: consider T_easy^⊗N, a sum of 3^N terms.
Each term is identified by its label: three strings of length N over {0, 1}, one for the x-variables, one for the y-variables and one for the z-variables.
32 Idea behind the proof
Theorem [Coppersmith and Winograd 87]. The tensor T_easy^⊗N can be converted into a direct sum of exp((H(1/3, 2/3) − o(1))N) = (3/2^(2/3))^((1−o(1))N) terms, each containing N/3 copies of T^{011}_easy, N/3 copies of T^{101}_easy and N/3 copies of T^{110}_easy.
Equivalently: we can obtain (3/2^(2/3))^((1−o(1))N) labels, each of whose three parts is a string in {0, 1}^N with N/3 zeros and 2N/3 ones, such that no two selected labels share a first part, a second part or a third part.
Number of possibilities for each part: (N choose N/3) = exp((H(1/3, 2/3) + o(1))N).
The proof of this theorem is based on a complicated construction using the existence of dense sets of integers with no three-term arithmetic progression.
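The entropy in the theorem is exactly the exponential growth rate of the number of admissible parts: there are C(N, N/3) strings in {0,1}^N with N/3 zeros, and log_2 C(N, N/3) approaches H(1/3, 2/3) · N. A quick numerical check (illustration only):

```python
import math

H = math.log2(3) - 2 / 3           # H(1/3, 2/3) in bits, equal to log2(3 / 2**(2/3))

N = 600                            # any large multiple of 3
strings = math.comb(N, N // 3)     # strings with N/3 zeros and 2N/3 ones
rate = math.log2(strings) / N      # tends to H as N grows (always slightly below it)
```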
33 General Formulation and Reinterpretation of the Laser Method
34 The laser method: general formulation
For any tensor T, any N ≥ 1 and any ρ ∈ [2, 3], define V_{ρ,N}(T) as the maximum of Σ_{i=1}^k (m_i n_i p_i)^(ρ/3) over all restrictions of T^⊗N isomorphic to ⊕_{i=1}^k ⟨m_i, n_i, p_i⟩.
The value of T: V_ρ(T) = lim_{N→∞} V_{ρ,N}(T)^(1/N).
(This is the definition for symmetric tensors; otherwise we use V_ρ(T) = V_ρ(T ⊗ T' ⊗ T'')^(1/3), where T' and T'' are the two cyclic rotations of T.)
Properties:
V_ρ(⟨m, n, p⟩) = (mnp)^(ρ/3);
V_ρ(T) is an increasing function of ρ;
V_ρ(T ⊕ T') ≥ V_ρ(T) + V_ρ(T');
V_ρ(T ⊗ T') ≥ V_ρ(T) · V_ρ(T').
(Compare with the asymptotic sum inequality, general form [Schönhage 1981]: Σ_{i=1}^k (m_i n_i p_i)^(ω/3) ≤ R̲(⊕_{i=1}^k ⟨m_i, n_i, p_i⟩).)
35 Example: The first CW construction
Recall that V_{ρ,N}(T) is the maximum of Σ_i (m_i n_i p_i)^(ρ/3) over all restrictions of T^⊗N isomorphic to ⊕_i ⟨m_i, n_i, p_i⟩, and V_ρ(T) = lim_{N→∞} V_{ρ,N}(T)^(1/N) (the value of T).
By the theorem of [Coppersmith and Winograd 87], T_easy^⊗N can be converted into a direct sum of (3/2^(2/3))^((1−o(1))N) terms, each isomorphic to [T^{011}_easy]^⊗(N/3) ⊗ [T^{101}_easy]^⊗(N/3) ⊗ [T^{110}_easy]^⊗(N/3) ≅ ⟨q^(N/3), q^(N/3), q^(N/3)⟩.
Hence V_{ρ,N}(T_easy) ≥ (3/2^(2/3))^((1−o(1))N) · q^(ρN/3) and V_ρ(T_easy) ≥ (3/2^(2/3)) · q^(ρ/3).
36 The laser method: general formulation
Recall the value V_ρ(T) = lim_{N→∞} V_{ρ,N}(T)^(1/N); for instance, V_ρ(⟨m, n, p⟩) = (mnp)^(ρ/3).
Theorem (the asymptotic sum inequality, general form) [Schönhage 1981]: Σ_{i=1}^k (m_i n_i p_i)^(ω/3) ≤ R̲(⊕_{i=1}^k ⟨m_i, n_i, p_i⟩).
Theorem (simple generalization of the asymptotic sum inequality): V_ω(T) ≤ R̲(T).
37 The laser method: general formulation
Consider three vector spaces U, V and W over F; a tensor T over (U, V, W) is an element of U ⊗ V ⊗ W.
Assume that U, V and W are decomposed as U = ⊕_{i∈I} U_i, V = ⊕_{j∈J} V_j, W = ⊕_{k∈K} W_k for some I, J, K ⊆ Z.
The tensor T is a partitioned tensor (with respect to this decomposition) if it can be written as T = Σ_{(i,j,k)∈I×J×K} T_ijk, where T_ijk ∈ U_i ⊗ V_j ⊗ W_k for each (i, j, k) ∈ I × J × K.
Support of the tensor: supp(T) = {(i, j, k) ∈ I × J × K | T_ijk ≠ 0}; each non-zero T_ijk is called a component of T.
We say that the tensor is tight if there exists some integer d such that i + j + k = d for all (i, j, k) ∈ supp(T).
38 Example: The first CW construction
U = U_0 ⊕ U_1 with U_0 = span{x_0} and U_1 = span{x_1, ..., x_q}; similarly V = V_0 ⊕ V_1 and W = W_0 ⊕ W_1; here I = J = K = {0, 1}.
T_easy = T^{011}_easy + T^{101}_easy + T^{110}_easy, with T^{011}_easy over (U_0, V_1, W_1), T^{101}_easy over (U_1, V_0, W_1) and T^{110}_easy over (U_1, V_1, W_0).
supp(T_easy) = {(0, 1, 1), (1, 0, 1), (1, 1, 0)}; the tensor is tight, since i + j + k = 2 for all (i, j, k) ∈ supp(T_easy).
39 The laser method: general formulation
Main Theorem [LG 14] (reinterpretation of prior works). For any tight partitioned tensor T, any probability distribution P over supp(T), and any ρ ∈ [2, 3], we have
log(V_ρ(T)) ≥ min_{ℓ∈{1,2,3}} H(P_ℓ) + Σ_{(i,j,k)∈supp(T)} P(i, j, k) log(V_ρ(T_ijk)) − Γ(P).
H: entropy; P_ℓ: projection of P along the ℓ-th coordinate (= marginal distribution); Γ(P): to be defined later (zero in the case of simple tensors).
Conclusion: we can compute a lower bound on the value of T if we know a lower bound on the value of each component; we can then obtain an upper bound on ω via V_ω(T) ≤ R̲(T).
Concretely, we use V_ω(T) ≤ R̲(T) = q + 2 and do a binary search on ρ.
40 Example: The first CW construction
supp(T_easy) = {(0, 1, 1), (1, 0, 1), (1, 1, 0)}, with
V_ρ(T^{011}_easy) = V_ρ(⟨1, 1, q⟩) = q^(ρ/3), V_ρ(T^{101}_easy) = V_ρ(⟨q, 1, 1⟩) = q^(ρ/3), V_ρ(T^{110}_easy) = V_ρ(⟨1, q, 1⟩) = q^(ρ/3).
Take P(0, 1, 1) = P(1, 0, 1) = P(1, 1, 0) = 1/3; then Γ(P) = 0, P_1(0) = 1/3, P_1(1) = 2/3, and P_2 = P_3 = P_1.
The Main Theorem gives log(V_ρ(T_easy)) ≥ H(1/3, 2/3) + (1/3) log(q^(ρ/3)) + (1/3) log(q^(ρ/3)) + (1/3) log(q^(ρ/3)) = H(1/3, 2/3) + log(q^(ρ/3)).
41 Example: The first CW construction
The general formulation recovers the earlier analysis: the bound log(V_ρ(T_easy)) ≥ H(1/3, 2/3) + log(q^(ρ/3)) is exactly V_ρ(T_easy) ≥ (3/2^(2/3)) · q^(ρ/3), the bound obtained from the theorem of [Coppersmith and Winograd 87] via V_{ρ,N}(T_easy) ≥ (3/2^(2/3))^((1−o(1))N) · q^(ρN/3).
42 The laser method: general formulation
Main Theorem [LG 14]. For any tight partitioned tensor T, any probability distribution P over supp(T), and any ρ ∈ [2, 3], we have log(V_ρ(T)) ≥ min_ℓ H(P_ℓ) + Σ_{(i,j,k)∈supp(T)} P(i, j, k) log(V_ρ(T_ijk)) − Γ(P).
Interpretation: the laser method enables us to convert (by zeroing variables) T^⊗N into a direct sum of
exp((min_ℓ H(P_ℓ) − Γ(P) − o(1)) N)
terms, each isomorphic to ⊗_{(i,j,k)∈supp(T)} [T_ijk]^⊗(P(i,j,k)N).
43 The second CW construction
Let q be a positive integer. Consider three vector spaces U, V and W of dimension q + 2 over F:
U = span{x_0, ..., x_q, x_{q+1}}, V = span{y_0, ..., y_q, y_{q+1}}, W = span{z_0, ..., z_q, z_{q+1}}.
Coppersmith and Winograd (1987) considered the following tensor, with R̲(T_CW) = q + 2:
T_CW = T^{011}_CW + T^{101}_CW + T^{110}_CW + T^{002}_CW + T^{020}_CW + T^{200}_CW (= T_easy + T^{002}_CW + T^{020}_CW + T^{200}_CW), where
T^{011}_CW = T^{011}_easy, T^{101}_CW = T^{101}_easy, T^{110}_CW = T^{110}_easy, and
T^{002}_CW = x_0 ⊗ y_0 ⊗ z_{q+1} ≅ ⟨1, 1, 1⟩, T^{020}_CW = x_0 ⊗ y_{q+1} ⊗ z_0 ≅ ⟨1, 1, 1⟩, T^{200}_CW = x_{q+1} ⊗ y_0 ⊗ z_0 ≅ ⟨1, 1, 1⟩.
44 The second CW construction
Decompose the spaces as
U = U_0 ⊕ U_1 ⊕ U_2, where U_0 = span{x_0}, U_1 = span{x_1, ..., x_q} and U_2 = span{x_{q+1}},
V = V_0 ⊕ V_1 ⊕ V_2, where V_0 = span{y_0}, V_1 = span{y_1, ..., y_q} and V_2 = span{y_{q+1}},
W = W_0 ⊕ W_1 ⊕ W_2, where W_0 = span{z_0}, W_1 = span{z_1, ..., z_q} and W_2 = span{z_{q+1}}.
T_CW = T^{011}_CW + T^{101}_CW + T^{110}_CW + T^{002}_CW + T^{020}_CW + T^{200}_CW; this is not a direct sum:
T^{011}_CW is a tensor over (U_0, V_1, W_1), T^{101}_CW over (U_1, V_0, W_1), T^{110}_CW over (U_1, V_1, W_0),
T^{002}_CW = x_0 ⊗ y_0 ⊗ z_{q+1} ≅ ⟨1, 1, 1⟩ over (U_0, V_0, W_2), T^{020}_CW = x_0 ⊗ y_{q+1} ⊗ z_0 ≅ ⟨1, 1, 1⟩ over (U_0, V_2, W_0), T^{200}_CW = x_{q+1} ⊗ y_0 ⊗ z_0 ≅ ⟨1, 1, 1⟩ over (U_2, V_0, W_0).
45 The second CW construction: laser method
supp(T_CW) = {(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 2, 0), (2, 0, 0)} (tight: i + j + k = 2 on the support).
Values of the components: V_ρ(T^{002}_CW) = V_ρ(T^{020}_CW) = V_ρ(T^{200}_CW) = 1 and V_ρ(T^{011}_CW) = V_ρ(T^{101}_CW) = V_ρ(T^{110}_CW) = q^(ρ/3).
Take P(0, 1, 1) = P(1, 0, 1) = P(1, 1, 0) = α and P(0, 0, 2) = P(0, 2, 0) = P(2, 0, 0) = 1/3 − α, with 0 ≤ α ≤ 1/3.
Then P_1(0) = α + 2(1/3 − α) = 2/3 − α, P_1(1) = 2α, P_1(2) = 1/3 − α (and P_2 = P_3 = P_1).
46 The second CW construction: laser method
With this distribution, the Main Theorem [LG 14] gives
log(V_ρ(T_CW)) ≥ H(2/3 − α, 2α, 1/3 − α) + 3α log(q^(ρ/3)) = H(2/3 − α, 2α, 1/3 − α) + log(q^(ρα)).
Combined with V_ω(T_CW) ≤ R̲(T_CW) = q + 2, this gives ω < 2.3872 for q = 6 and α = 0.317.
47 Analysis of the second construction
Analysis of the m-th power T_CW^⊗m of the tensor by CW:
m | Upper bound on ω | Authors
1 | ω < 2.3872 | CW (1987)
2 | ω < 2.3755 | CW (1987)
4 | ω < 2.3730 | Stothers (2010)
8 | ω < 2.3729 | Vassilevska Williams (2012)
16 | ω < 2.3728640 | Le Gall (2014)
32 | ω < 2.3728639 | Le Gall (2014)
(The number of variables in the associated optimization problem grows with m.)
48 Analysis of the second power
T_CW^⊗2 = (T^{011}_CW + T^{101}_CW + T^{110}_CW + T^{002}_CW + T^{020}_CW + T^{200}_CW)^⊗2 (36 terms), and R̲(T_CW^⊗2) ≤ (q + 2)^2.
Idea: rewrite it as a (non-direct) sum of 15 terms by regrouping terms (MERGING):
T_CW^⊗2 = T_400 + T_040 + T_004 + T_310 + T_301 + T_130 + T_103 + T_031 + T_013 + T_220 + T_202 + T_022 + T_211 + T_121 + T_112, where
T_400 = T^{200}_CW ⊗ T^{200}_CW,
T_310 = T^{200}_CW ⊗ T^{110}_CW + T^{110}_CW ⊗ T^{200}_CW,
T_220 = T^{200}_CW ⊗ T^{020}_CW + T^{020}_CW ⊗ T^{200}_CW + T^{110}_CW ⊗ T^{110}_CW,
T_211 = T^{200}_CW ⊗ T^{011}_CW + T^{011}_CW ⊗ T^{200}_CW + T^{110}_CW ⊗ T^{101}_CW + T^{101}_CW ⊗ T^{110}_CW,
and the other 11 terms are obtained by permuting the variables (e.g., T_040 = T^{020}_CW ⊗ T^{020}_CW).
49 Analysis of the second power
supp(T_CW^⊗2) = {(4,0,0), ..., (0,0,4), (3,1,0), ..., (0,1,3), (2,2,0), ..., (0,2,2), (2,1,1), ..., (1,1,2)}
(3 permutations; 6 permutations; 3 permutations; 3 permutations).
Lower bounds on the values of each component can be computed (recursively).
Choice of distribution:
P(4,0,0) = ... = P(0,0,4) = α, P(3,1,0) = ... = P(0,1,3) = β,
P(2,2,0) = ... = P(0,2,2) = γ, P(2,1,1) = ... = P(1,1,2) = δ
(4 − 1 = 3 free parameters, because of the normalization constraint).
50 Analysis of the second power
For supp(T_CW^⊗2) and any distribution of the above symmetric form (with parameters α, β, γ, δ), we have Γ(P) = 0.
51 Analysis of the second power
Applying the Main Theorem [LG 14] with this distribution, and combining with V_ω(T_CW^⊗2) ≤ R̲(T_CW^⊗2) ≤ (q + 2)^2:
Theorem. ω < 2.3755 for q = 6, α = 0.0002, β = 0.0125, and suitable values of γ and δ.
52 Analysis of the second power
(Table of the analyses of the m-th powers of T_CW, as before.)
What about the third power (using similar merging schemes)? This does not give any improvement.
53 Analysis of the fourth power
T_CW^⊗4 = (T^{011}_CW + T^{101}_CW + T^{110}_CW + T^{002}_CW + T^{020}_CW + T^{200}_CW)^⊗4 (6^4 terms), and R̲(T_CW^⊗4) ≤ (q + 2)^4.
Idea: rewrite it as a (non-direct) sum of a smaller number of terms by regrouping terms:
T_CW^⊗4 = T_800 + T_710 + T_620 + T_530 + T_440 + T_611 + T_521 + T_431 + T_422 + T_332 + permutations of these terms (T_080, T_008, T_701, T_107, T_170, T_017, T_071, ...).
This gives 10 − 1 = 9 parameters for the probability distribution; this time Γ(P) ≠ 0.
54 The laser method: general formulation
Main Theorem [LG 14]. For any tight partitioned tensor T, any probability distribution P over supp(T), and any ρ ∈ [2, 3], we have
log(V_ρ(T)) ≥ min_{ℓ∈{1,2,3}} H(P_ℓ) + Σ_{(i,j,k)∈supp(T)} P(i, j, k) log(V_ρ(T_ijk)) − Γ(P).
H: entropy; P_ℓ: projection of P along the ℓ-th coordinate (= marginal distribution); Γ(P): to be defined later (zero in the case of simple tensors).
55 The laser method: general formulation
Main Theorem [LG 14] (as above), with:
Γ(P) = max [H(Q)] − H(P), where the max is over all distributions Q over supp(T) such that P_1 = Q_1, P_2 = Q_2 and P_3 = Q_3.
When the structure of the support is simple, we typically have: P_1 = Q_1, P_2 = Q_2, P_3 = Q_3 ⟹ P = Q, and thus Γ(P) = 0.
56 The laser method: general formulation
Interpretation: the laser method enables us to convert (by zeroing variables) T^⊗N into a direct sum of exp((min_ℓ H(P_ℓ) − Γ(P) − o(1)) N) terms, each isomorphic to ⊗_{(i,j,k)∈supp(T)} [T_ijk]^⊗(P(i,j,k)N) ("type P" terms).
We can control only the choice of the marginal distributions P_1, P_2 and P_3; what we obtain is a (non-direct) sum of all "type Q" terms, over every Q with the same marginals.
The most frequent terms are those with Q maximizing H(Q); the fact that type-P terms are not necessarily the most frequent introduces the penalty term −Γ(P), where
Γ(P) = max [H(Q)] − H(P) over all distributions Q over supp(T) such that P_1 = Q_1, P_2 = Q_2 and P_3 = Q_3.
57 The laser method: computing the bound
Main Theorem [LG 14]: log(V_ρ(T)) ≥ min_ℓ H(P_ℓ) + Σ_{(i,j,k)∈supp(T)} P(i, j, k) log(V_ρ(T_ijk)) − Γ(P).
(The entropy term is concave in P; the middle sum is linear in P.)
How to find the best distribution for a given ρ? (Assume that a lower bound on each V_ρ(T_ijk) is known.)
If Γ(P) = 0 for all distributions P, finding the best distribution can be done efficiently (numerically) using convex optimization: maximization of a concave function under linear constraints.
58 The laser method: computing the bound
In general: Γ(P) = max [H(Q)] − H(P), where the max is over all distributions Q over supp(T) such that P_1 = Q_1, P_2 = Q_2 and P_3 = Q_3.
The resulting optimization problem is hard to solve, but can be done up to the 4th power of the CW tensor. [Stothers 10]
59 The laser method: computing the bound
(Table of the analyses of the m-th powers of T_CW, as before.)
In general, the optimization problem involving Γ(P) is hard to solve, but can be done up to the 4th power of the CW tensor. [Stothers 10]
60 The laser method: computing the bound
Simplification: restrict the search to the set of distributions P such that Γ(P) = 0.
The problem is still hard to solve, but can be done up to the 8th power of the CW tensor. [Vassilevska Williams 12]
61 The laser method: computing the bound
(Table of the analyses of the m-th powers of T_CW, as before.)
Simplification: restrict the search to distributions P with Γ(P) = 0; still hard to solve, but can be done up to the 8th power of the CW tensor. [Vassilevska Williams 12]
63 The laser method: general formulation

Main Theorem [LG 14]. For any tight partitioned tensor T, any probability distribution P over supp(T), and any ρ ∈ [2, 3], we have

  log(V_ρ(T)) ≥ H(P) + Σ_{(i,j,k) ∈ supp(T)} P(i,j,k) log(V_ρ(T_ijk)) − Γ(P).

Call the right-hand side f(P). Here Γ(P) = max[H(Q)] − H(P), where the max is over all distributions Q over supp(T) such that P_1 = Q_1, P_2 = Q_2 and P_3 = Q_3.

Efficient method [LG 14] to find a solution close to the optimal one:
64 The laser method: general formulation

Main Theorem [LG 14]. For any tight partitioned tensor T, any probability distribution P over supp(T), and any ρ ∈ [2, 3], we have

  log(V_ρ(T)) ≥ H(P) + Σ_{(i,j,k) ∈ supp(T)} P(i,j,k) log(V_ρ(T_ijk)) − Γ(P).

Call the right-hand side f(P). Here Γ(P) = max[H(Q)] − H(P), where the max is over all distributions Q over supp(T) such that P_1 = Q_1, P_2 = Q_2 and P_3 = Q_3.

Efficient method [LG 14] to find a solution close to the optimal one:
1. Find a distribution P that maximizes f(P), and call it P̂ (concave objective function, linear constraints).
2. Find the distribution Q that maximizes H(Q) under the constraints Q_1 = P̂_1, Q_2 = P̂_2 and Q_3 = P̂_3; call it Q̂ (concave objective function, linear constraints).
3. Output f(Q̂).

Since Γ(Q̂) = 0, the theorem gives log(V_ρ(T)) ≥ f(Q̂).
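The three-step recipe can be sketched end to end. Everything specific below is an assumption for illustration: the support is that of the first power of the CW tensor, the component values LOG_V are made-up placeholders (not the values from the paper), step 1 drops the Γ(P) term (the remaining simplex-constrained concave program then has a closed-form softmax maximizer), and step 2 replaces the generic convex solver by iterative proportional fitting.

```python
import numpy as np

# Support of the first power of the CW tensor: (i, j, k) with i + j + k = 2.
SUPP = [(0,0,2), (0,2,0), (2,0,0), (0,1,1), (1,0,1), (1,1,0)]
# Hypothetical per-component values log2(V_rho(T_ijk)) -- placeholders only.
LOG_V = np.array([1.0, 1.0, 1.0, 1.5, 1.5, 1.5])

def entropy(p):
    nz = p > 1e-12
    return float(-(p[nz] * np.log2(p[nz])).sum())

def marginal(p, axis, val):
    return sum(pr for pr, t in zip(p, SUPP) if t[axis] == val)

# Step 1: maximize H(P) + sum_t P(t) * LOG_V[t] over the simplex (the Gamma
# penalty is ignored in this step).  The maximizer is the softmax P(t) ∝ 2^LOG_V[t].
P_hat = 2.0 ** LOG_V
P_hat /= P_hat.sum()

# Step 2: maximum-entropy distribution Q_hat with the same three marginals as
# P_hat, via iterative proportional fitting from the uniform distribution.
Q_hat = np.full(len(SUPP), 1 / len(SUPP))
for _ in range(500):
    for axis in range(3):
        for val in {t[axis] for t in SUPP}:
            mask = np.array([t[axis] == val for t in SUPP])
            cur = Q_hat[mask].sum()
            if cur > 0:
                Q_hat[mask] *= marginal(P_hat, axis, val) / cur

# Step 3: Gamma(Q_hat) = 0 by construction, so the Main Theorem gives
# log2(V_rho(T)) >= f(Q_hat) = H(Q_hat) + sum_t Q_hat(t) * LOG_V[t].
bound = entropy(Q_hat) + float(Q_hat @ LOG_V)
print(round(bound, 3))  # ≈ log2(sum(2**LOG_V)): here the marginals pin Q_hat down
```

On this particular support the three marginals determine the distribution uniquely, so Q̂ = P̂ and step 2 is vacuous; on higher powers of the tensor the feasible set is larger and step 2 genuinely matters. The printed number is a valid lower bound on log(V_ρ(T)) whenever the LOG_V entries really are lower bounds on the component values.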
65 Analysis of powers 16 and 32

Analysis of the m-th power CW^⊗m of the Coppersmith-Winograd tensor:

  m  | upper bound    | authors
  1  | ω < 2.3871900  | CW (1987)
  2  | ω < 2.3754770  | CW (1987)
  4  | ω < 2.3729     | Stothers (2010)
  8  | ω < 2.3728642  | Vassilevska Williams (2012)
  16 | ω < 2.3728640  | Le Gall (2014)
  32 | ω < 2.3728639  | Le Gall (2014)

The solutions to the optimization problems were obtained numerically by convex optimization.
66 Conclusion

We constructed a time-efficient implementation of the laser method: for any tight partitioned tensor for which (lower bounds on) the value of each component are known, convex optimization yields, in time polynomial in the size of the tensor, an upper bound on ω.

We applied it to study higher powers CW^⊗m of the basic tensor of Coppersmith and Winograd:

  m  | upper bound    | authors
  1  | ω < 2.3871900  | CW (1987)
  2  | ω < 2.3754770  | CW (1987)
  4  | ω < 2.3729     | Stothers (2010)
  8  | ω < 2.3728642  | Vassilevska Williams (2012)
  16 | ω < 2.3728640  | Le Gall (2014)
  32 | ω < 2.3728639  | Le Gall (2014)

Recent result [Ambainis, Filmus, LG 14]: the laser-method-based analysis (v2.3), applied to higher powers using the same approach, cannot give an upper bound better than 2.3725.