How to find good starting tensors for matrix multiplication. Markus Bläser, Saarland University.
Matrix multiplication

$$z_{i,j} = \sum_{k=1}^{n} x_{i,k}\, y_{k,j}, \qquad 1 \le i, j \le n.$$

The entries $x_{i,k}$, $y_{k,j}$ are variables; allowed operations: addition, multiplication, scalar multiplication.
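The defining formula reads directly as a triple loop (a minimal sketch; the function name is ours):

```python
import numpy as np

def matmul_naive(x, y):
    """Compute z[i,j] = sum_k x[i,k] * y[k,j] with n^3 multiplications."""
    n = x.shape[0]
    z = np.zeros((n, n), dtype=x.dtype)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                z[i, j] += x[i, k] * y[k, j]
    return z

rng = np.random.default_rng(0)
x, y = rng.integers(-5, 5, (4, 4)), rng.integers(-5, 5, (4, 4))
assert np.array_equal(matmul_naive(x, y), x @ y)
```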
Strassen's algorithm

$$\begin{pmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{pmatrix}.$$

$p_1 = (x_{11} + x_{22})(y_{11} + y_{22})$,
$p_2 = (x_{21} + x_{22})\, y_{11}$,
$p_3 = x_{11}(y_{12} - y_{22})$,
$p_4 = x_{22}(-y_{11} + y_{21})$,
$p_5 = (x_{11} + x_{12})\, y_{22}$,
$p_6 = (-x_{11} + x_{21})(y_{11} + y_{12})$,
$p_7 = (x_{12} - x_{22})(y_{21} + y_{22})$.

$$\begin{pmatrix} z_{11} & z_{12} \\ z_{21} & z_{22} \end{pmatrix} = \begin{pmatrix} p_1 + p_4 - p_5 + p_7 & p_3 + p_5 \\ p_2 + p_4 & p_1 + p_3 - p_2 + p_6 \end{pmatrix}.$$
Strassen's algorithm (2)

7 multiplications and 18 additions instead of 8 multiplications and 4 additions.

Observation: Strassen's algorithm works over any ring!

Recurse: treat an $n \times n$ matrix as a $2 \times 2$ matrix of $n/2 \times n/2$ blocks.

$C(n) \le 7\, C(n/2) + O(n^2)$, $C(1) = 1$.

Theorem (Strassen). We can multiply $n \times n$ matrices with $O(n^{\log_2 7}) = O(n^{2.81})$ arithmetic operations.
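The recursion can be sketched directly (a toy implementation for $n$ a power of 2, with our own helper names; real implementations switch to the classical method below a cutoff):

```python
import numpy as np

def strassen(x, y):
    """Multiply two n x n matrices, n a power of 2, with 7 recursive products."""
    n = x.shape[0]
    if n == 1:
        return x * y
    h = n // 2
    x11, x12, x21, x22 = x[:h, :h], x[:h, h:], x[h:, :h], x[h:, h:]
    y11, y12, y21, y22 = y[:h, :h], y[:h, h:], y[h:, :h], y[h:, h:]
    p1 = strassen(x11 + x22, y11 + y22)
    p2 = strassen(x21 + x22, y11)
    p3 = strassen(x11, y12 - y22)
    p4 = strassen(x22, y21 - y11)
    p5 = strassen(x11 + x12, y22)
    p6 = strassen(x21 - x11, y11 + y12)
    p7 = strassen(x12 - x22, y21 + y22)
    return np.block([[p1 + p4 - p5 + p7, p3 + p5],
                     [p2 + p4, p1 + p3 - p2 + p6]])

rng = np.random.default_rng(1)
a, b = rng.integers(-9, 9, (8, 8)), rng.integers(-9, 9, (8, 8))
assert np.array_equal(strassen(a, b), a @ b)
```

Note that only additions, subtractions, and multiplications of entries are used, so the same code works over any ring.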
Tensor rank

In general: bilinear forms $b_1(X, Y), \dots, b_n(X, Y)$ in variables $X = \{x_1, \dots, x_k\}$ and $Y = \{y_1, \dots, y_m\}$. Write

$$\sum_{j=1}^{n} b_j z_j = \sum_{h=1}^{k} \sum_{i=1}^{m} \sum_{j=1}^{n} t_{h,i,j}\, x_h y_i z_j.$$

$t = (t_{h,i,j}) \in K^k \otimes K^m \otimes K^n$ is the tensor corresponding to $b_1, \dots, b_n$.
Tensor rank

Definition. $u \otimes v \otimes w \in U \otimes V \otimes W$ is called a triad or rank-one tensor.

Definition (Rank). $R(t)$ is the smallest $r$ such that there are rank-one tensors $t_1, \dots, t_r$ with $t = t_1 + \dots + t_r$.

Lemma. Let $t \in U \otimes V \otimes W$ and $t' \in U' \otimes V' \otimes W'$. Then $R(t \oplus t') \le R(t) + R(t')$ and $R(t \otimes t') \le R(t)\, R(t')$.
Sums and products

Direct sum $t \oplus t' \in (U \oplus U') \otimes (V \oplus V') \otimes (W \oplus W')$: the $k \times m \times n$ array $t$ and the $k' \times m' \times n'$ array $t'$ sit in two disjoint diagonal blocks of a $(k+k') \times (m+m') \times (n+n')$ array; all other entries are zero.

Tensor product $t \otimes t' \in (U \otimes U') \otimes (V \otimes V') \otimes (W \otimes W')$: the $kk' \times mm' \times nn'$ array with entries $t_{h,i,j}\, t'_{h',i',j'}$.
Matrix multiplication tensor

Example: $2 \times 2$ matrix multiplication $\langle 2,2,2 \rangle$ in the variables $x_{11}, \dots, x_{22}$, $y_{11}, \dots, y_{22}$, $z_{11}, \dots, z_{22}$.

In general, $\langle k, m, n \rangle$ has entries $t_{(h,h'),(i,i'),(j,j')} = \delta_{h',i}\, \delta_{i',j}\, \delta_{j',h}$.

Lemma. $R(\langle k, m, n \rangle) = R(\langle n, k, m \rangle) = \dots = R(\langle n, m, k \rangle)$ (all permutations of $k, m, n$), and $\langle k, m, n \rangle \otimes \langle k', m', n' \rangle = \langle kk', mm', nn' \rangle$.
Strassen's algorithm and tensors

Observation: tensor product = recursion. Strassen's algorithm: $\langle 2,2,2 \rangle^{\otimes s} = \langle 2^s, 2^s, 2^s \rangle$, hence $R(\langle 2^s, 2^s, 2^s \rangle) \le 7^s$.

Definition (Exponent of matrix multiplication). $\omega = \inf\{\tau : R(\langle n,n,n \rangle) = O(n^\tau)\}$.

Strassen: $\omega \le \log_2 7 \approx 2.81$.

Lemma. If $R(\langle k, m, n \rangle) \le r$, then $\omega \le 3 \log r / \log(kmn)$.
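At the tensor level, Strassen's algorithm says exactly that $\langle 2,2,2 \rangle$ (a $4 \times 4 \times 4$ array) is a sum of 7 triads. This can be checked mechanically; a sketch with our own row-major encoding of the index pairs, with the triad vectors read off from the $p$'s above:

```python
import numpy as np
from math import log

# <2,2,2>: coefficient of x[h,k] * y[k,j] in z[h,j]; pair (a,b) -> index 2a+b
T = np.zeros((4, 4, 4))
for h in range(2):
    for j in range(2):
        for k in range(2):
            T[2*h + k, 2*k + j, 2*h + j] = 1

# rows l = 0..6: coefficients of p_{l+1} on x, on y, and in z
U = np.array([[1,0,0,1],[0,0,1,1],[1,0,0,0],[0,0,0,1],[1,1,0,0],[-1,0,1,0],[0,1,0,-1]])
V = np.array([[1,0,0,1],[1,0,0,0],[0,1,0,-1],[-1,0,1,0],[0,0,0,1],[1,1,0,0],[0,0,1,1]])
W = np.array([[1,0,0,1],[0,0,1,-1],[0,1,0,1],[1,0,1,0],[-1,1,0,0],[0,0,0,1],[1,0,0,0]])

S = sum(np.einsum('a,b,c->abc', U[l], V[l], W[l]) for l in range(7))
assert np.array_equal(S, T)  # rank(<2,2,2>) <= 7

# the lemma with k = m = n = 2, r = 7 recovers omega <= log_2 7
assert abs(3 * log(7) / log(8) - log(7, 2)) < 1e-12
```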
What next?

Maybe we can multiply $2 \times 2$ matrices with 6 multiplications?

Theorem (Winograd). $R(\langle 2,2,2 \rangle) \ge 7$.

Open question (not so open anymore). Is there a small tensor $\langle n,n,n \rangle$, say $n \le 10$, which gives a better bound on the exponent than Strassen?

Smirnov: $R(\langle 3,3,6 \rangle) \le 40$, hence $\omega \le 2.79$.
Border rank (example)

Polynomial multiplication mod $X^2$:

$$(a_0 + a_1 X)(b_0 + b_1 X) = \underbrace{a_0 b_0}_{f_0} + \underbrace{(a_1 b_0 + a_0 b_1)}_{f_1} X + a_1 b_1 X^2.$$

The corresponding tensor $t$ has slices $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

Observation: $R(t) = 3$. However, $t$ can be approximated by tensors of rank 2:

$$t(\epsilon) = (1, \epsilon) \otimes (1, \epsilon) \otimes (0, \epsilon^{-1}) + (1, 0) \otimes (1, 0) \otimes (1, -\epsilon^{-1}) = t + O(\epsilon).$$
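One can watch the rank-2 family $t(\epsilon)$ converge to $t$ at rate $O(\epsilon)$ numerically (a sketch; we encode $t$ with the two $2 \times 2$ slices along the last axis):

```python
import numpy as np

def triad(u, v, w):
    return np.einsum('a,b,c->abc', u, v, w)

# t: f0 = a0*b0, f1 = a1*b0 + a0*b1; slices [[1,0],[0,0]] and [[0,1],[1,0]]
t = np.zeros((2, 2, 2))
t[0, 0, 0] = 1
t[1, 0, 1] = t[0, 1, 1] = 1

def t_eps(e):
    return (triad([1, e], [1, e], [0, 1/e]) +
            triad([1, 0], [1, 0], [1, -1/e]))

for e in [1e-1, 1e-2, 1e-3]:
    err = np.abs(t_eps(e) - t).max()
    assert err <= 1.5 * e  # the error shrinks linearly in epsilon
```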
Proof of observation: restrictions

Definition. Let $A : U \to U'$, $B : V \to V'$, $C : W \to W'$ be homomorphisms. $(A \otimes B \otimes C)(u \otimes v \otimes w) = A(u) \otimes B(v) \otimes C(w)$, so $(A \otimes B \otimes C)t = \sum_{i=1}^{r} A(u_i) \otimes B(v_i) \otimes C(w_i)$ for $t = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i$. We write $t' \le t$ if there are $A, B, C$ such that $t' = (A \otimes B \otimes C)t$ ("restriction").

Lemma. If $t' \le t$, then $R(t') \le R(t)$. Moreover, $R(t) \le r$ iff $t \le \langle r \rangle$, where $\langle r \rangle$ is the diagonal of size $r$.
Proof of observation

Let $t = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i$ with $r = R(t)$. The slices $S_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $S_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ are linearly independent, so $\mathrm{lin}\{w_1, \dots, w_r\} = K^2$. Assume (after a change of basis) that $w_r = (1,1)$. Let $C$ be the projection along $\mathrm{lin}\{w_r\}$ onto $\mathrm{lin}\{(0,1)\}$. Then $(I \otimes I \otimes C)t = (S_2 - S_1) \otimes (0,1)$, which has rank 2. But $C(w_r) = 0$, so $(I \otimes I \otimes C)t$ is a sum of at most $r - 1$ triads; hence $r \ge 3$.
Border rank

Definition. Let $h \in \mathbb{N}$, $t \in K^{k \times m \times n}$.

1. $R_h(t) = \min\{r : \exists\, u_\rho \in K[\epsilon]^k, v_\rho \in K[\epsilon]^m, w_\rho \in K[\epsilon]^n : \sum_{\rho=1}^{r} u_\rho \otimes v_\rho \otimes w_\rho = \epsilon^h t + O(\epsilon^{h+1})\}$.
2. $\underline{R}(t) = \min_h R_h(t)$. $\underline{R}(t)$ is called the border rank of $t$.

Bini, Capovani, Lotti, Romani: $\underline{R}(\langle 2,2,3 \rangle) \le 10$.

Lemma. If $\underline{R}(\langle k,m,n \rangle) \le r$, then $\omega \le 3 \log r / \log(kmn)$.

Corollary. $\omega \le 2.79$.
Schönhage's τ-theorem

Schönhage: $\underline{R}(\langle k,1,n \rangle \oplus \langle 1,(k-1)(n-1),1 \rangle) = kn + 1$.

Theorem (Schönhage's τ-theorem). If $\underline{R}\bigl(\bigoplus_{i=1}^{p} \langle k_i, m_i, n_i \rangle\bigr) \le r$ with $r > p$, then $\omega \le 3\tau$, where $\tau$ is defined by $\sum_{i=1}^{p} (k_i m_i n_i)^\tau = r$.

Corollary. $\omega \le 2.55$.
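The defining equation for $\tau$ can be solved numerically. A sketch (function name ours), using Schönhage's example with $k = n = 4$, i.e. $\langle 4,1,4 \rangle \oplus \langle 1,9,1 \rangle$ of border rank at most $17$, which reproduces the corollary's bound $\omega \le 2.55$:

```python
from math import log

def tau_bound(kmn_list, r):
    """Solve sum_i (k_i*m_i*n_i)^tau = r for tau by bisection; return 3*tau."""
    lo, hi = 0.0, 3.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(p ** mid for p in kmn_list) < r:
            lo = mid
        else:
            hi = mid
    return 3 * lo

# Schoenhage: <4,1,4> + <1,9,1> has border rank <= 4*4 + 1 = 17
omega_bound = tau_bound([16, 9], 17)
assert 2.54 < omega_bound < 2.56
```

With a single summand the theorem degenerates to the previous lemma, e.g. `tau_bound([8], 7)` gives back $\log_2 7$.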
Strassen's tensor

$$\mathrm{Str} = \sum_{i=1}^{q} (\underbrace{e_i \otimes e_0 \otimes e_i}_{\langle q,1,1 \rangle} + \underbrace{e_0 \otimes e_i \otimes e_i}_{\langle 1,1,q \rangle}) = \frac{1}{\epsilon} \sum_{i=1}^{q} (e_0 + \epsilon e_i) \otimes (e_0 + \epsilon e_i) \otimes e_i - \frac{1}{\epsilon}\, e_0 \otimes e_0 \otimes \sum_{i=1}^{q} e_i + O(\epsilon).$$
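So $\underline{R}(\mathrm{Str}) \le q + 1$. A numeric check of this $(q+1)$-triad approximation, for $q = 3$ in the basis $e_0, \dots, e_q$ (a sketch; helper names ours):

```python
import numpy as np

q = 3
e = np.eye(q + 1)  # e[0] = e_0, ..., e[q] = e_q

def triad(u, v, w):
    return np.einsum('a,b,c->abc', u, v, w)

Str = sum(triad(e[i], e[0], e[i]) + triad(e[0], e[i], e[i])
          for i in range(1, q + 1))

def approx(eps):
    s = sum(triad(e[0] + eps * e[i], e[0] + eps * e[i], e[i])
            for i in range(1, q + 1)) / eps
    return s - triad(e[0], e[0], e[1:].sum(axis=0)) / eps

for eps in [1e-1, 1e-2]:
    assert np.abs(approx(eps) - Str).max() <= 1.5 * eps
```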
Rank versus border rank

Theorem. $R(\mathrm{Str}) = 2q$.

Proof sketch: Let $\mathrm{Str} = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i$. W.l.o.g. assume that $u_r \notin \mathrm{lin}\{e_1, \dots, e_q\}$. Let $A$ be the projection along $u_r$ onto $\mathrm{lin}\{e_1, \dots, e_q\}$, and let $B$ be the projection along $e_q$ onto $\mathrm{lin}\{e_0, \dots, e_{q-1}\}$. Then $R((A \otimes I \otimes B)\,\mathrm{Str}) \le R(\mathrm{Str}) - 1$, and $(A \otimes I \otimes B)\,\mathrm{Str}$ is like $\mathrm{Str}$ with one inner tensor now being $\langle q{-}1, 1, 1 \rangle$. Do this $q$ times and kill $q$ triads. We are left with a matrix of rank $q$, so $R(\mathrm{Str}) \ge 2q$.

Gap of almost a factor 2 between rank ($2q$) and border rank ($q+1$).
Laser method

Think of Strassen's tensor as having an outer and an inner structure: cut $\mathrm{Str}$ into (combinatorial) cuboids!

Inner tensors: $\langle q,1,1 \rangle$ and $\langle 1,1,q \rangle$.

Outer structure: put a 1 in every cuboid that is nonzero; this gives $\langle 1,2,1 \rangle$.

$(\mathrm{Str} \otimes \pi \mathrm{Str} \otimes \pi^2 \mathrm{Str})^{\otimes s}$, where $\pi$ cyclically permutes the three factors, has inner tensors $\langle x,y,z \rangle$ with $xyz = q^{3s}$ and outer structure $\langle 2^s, 2^s, 2^s \rangle$.
Degeneration

Definition.

1. Let $t = \sum_{\rho=1}^{r} u_\rho \otimes v_\rho \otimes w_\rho \in K^{k \times m \times n}$, $A(\epsilon) \in K[\epsilon]^{k \times k}$, $B(\epsilon) \in K[\epsilon]^{m \times m}$, and $C(\epsilon) \in K[\epsilon]^{n \times n}$. Define $(A(\epsilon) \otimes B(\epsilon) \otimes C(\epsilon))t = \sum_{\rho=1}^{r} A(\epsilon)u_\rho \otimes B(\epsilon)v_\rho \otimes C(\epsilon)w_\rho$.
2. $t'$ is a degeneration of $t \in K^{k \times m \times n}$ ($t' \trianglelefteq t$) if there are $A(\epsilon)$, $B(\epsilon)$, $C(\epsilon)$, and $q$ such that $\epsilon^q t' = (A(\epsilon) \otimes B(\epsilon) \otimes C(\epsilon))t + O(\epsilon^{q+1})$.

Remark. $\underline{R}(t) \le r$ iff $t \trianglelefteq \langle r \rangle$.
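A tiny instance of the definition: the tensor $W = e_0 \otimes e_0 \otimes e_1 + e_0 \otimes e_1 \otimes e_0 + e_1 \otimes e_0 \otimes e_0$ (rank 3, border rank 2) is a degeneration of the diagonal $\langle 2 \rangle$, with exponent 1 in the definition. A numeric sketch, where the maps $A(\epsilon) = B(\epsilon) = C(\epsilon)$ are our own choice:

```python
import numpy as np

def triad(u, v, w):
    return np.einsum('a,b,c->abc', u, v, w)

e0, e1 = np.eye(2)
W = triad(e0, e0, e1) + triad(e0, e1, e0) + triad(e1, e0, e0)

def degenerated(eps):
    # image of <2> = e0^{x3} + e1^{x3} under A(eps): e0 -> e0 + eps*e1, e1 -> -e0
    return (triad(e0 + eps*e1, e0 + eps*e1, e0 + eps*e1) +
            triad(-e0, e0, e0))

# eps * W = (A(eps) x B(eps) x C(eps)) <2> + O(eps^2)
for eps in [1e-1, 1e-2]:
    assert np.abs(degenerated(eps) / eps - W).max() <= 3 * eps
```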
Laser method (2)

A degeneration $(A(\epsilon), B(\epsilon), C(\epsilon))$ is called monomial if all entries are monomials.

Lemma (Strassen). $\langle \tfrac{3}{4} n^2 \rangle \trianglelefteq \langle n,n,n \rangle$ by a monomial degeneration.

Recall: $(\mathrm{Str} \otimes \pi \mathrm{Str} \otimes \pi^2 \mathrm{Str})^{\otimes s}$ has inner tensors $\langle x,y,z \rangle$ with $xyz = q^{3s}$ and outer structure $\langle 2^s, 2^s, 2^s \rangle$. This yields $\tfrac{3}{4} \cdot 2^{2s}$ independent matrix products $\langle x,y,z \rangle$ with $xyz = q^{3s}$. Now apply the τ-theorem!

Corollary (Strassen). $\omega \le 2.48$.
Coppersmith–Winograd tensor

$\mathrm{CW} = \mathrm{CW}_q = \sum_{i=1}^{q} (e_0 \otimes e_i \otimes e_i + e_i \otimes e_0 \otimes e_i + e_i \otimes e_i \otimes e_0)$ satisfies

$$\epsilon^5\, \mathrm{CW} = \epsilon \sum_{i=1}^{q} (e_0 + \epsilon^2 e_i)^{\otimes 3} - \Bigl(e_0 + \epsilon^3 \sum_{i=1}^{q} e_i\Bigr)^{\otimes 3} + (1 - q\epsilon)\, e_0 \otimes e_0 \otimes e_0 + O(\epsilon^6).$$
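So $\underline{R}(\mathrm{CW}_q) \le q + 2$: the right-hand side uses $q + 2$ triads. The $\epsilon$-identity can be checked numerically (a sketch, $q = 3$, with $\mathrm{CW}_q = \sum_i (e_0 \otimes e_i \otimes e_i + e_i \otimes e_0 \otimes e_i + e_i \otimes e_i \otimes e_0)$; helper names ours):

```python
import numpy as np

q = 3
e = np.eye(q + 1)

def triad(u, v, w):
    return np.einsum('a,b,c->abc', u, v, w)

CW = sum(triad(e[0], e[i], e[i]) + triad(e[i], e[0], e[i]) + triad(e[i], e[i], e[0])
         for i in range(1, q + 1))

def approx(eps):
    s = eps * sum(triad(e[0] + eps**2 * e[i], e[0] + eps**2 * e[i], e[0] + eps**2 * e[i])
                  for i in range(1, q + 1))
    u = e[0] + eps**3 * e[1:].sum(axis=0)
    s -= triad(u, u, u)
    s += (1 - q * eps) * triad(e[0], e[0], e[0])
    return s / eps**5  # equals CW + O(eps)

for eps in [1e-1, 1e-2]:
    assert np.abs(approx(eps) - CW).max() <= (q + 1) * eps
```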
Coppersmith–Winograd tensor

Remark (last time, a doable open question): $R(\mathrm{CW}) = 2q + 1$.
Laser method (3)

CW has inner structure $\langle q,1,1 \rangle$, $\langle 1,q,1 \rangle$, $\langle 1,1,q \rangle$ and outer structure supported on $\{(0,1,1), (1,0,1), (1,1,0)\}$, where block $0 = \mathrm{lin}\{e_0\}$ and block $1 = \mathrm{lin}\{e_1, \dots, e_q\}$.

There is a general method to degenerate large diagonals from arbitrary tensors; apply it to the outer tensor.

Corollary (Coppersmith & Winograd). $\omega \le 2.41$.

Coppersmith & Winograd, Stothers, Vassilevska-Williams, Le Gall: $\omega \le 2.37\ldots$
What did we learn so far?

How to multiply matrices of astronomic sizes fast!

If we want to multiply matrices of astronomic sizes even faster, we need tensors with border rank close to $\max\{\dim U, \dim V, \dim W\}$ and with a rich structure, or completely new methods.
Cheap approaches that do not work (not yet?)

Known bounds:
$\langle 2,2,2 \rangle$: $R = 7$, $\underline{R} = 7$
$\langle 2,2,3 \rangle$: $R = 11$, $\underline{R} \in [9,10]$
$\langle 2,2,4 \rangle$: $R = 14$, $\underline{R} \in [12,14]$
$\langle 2,3,3 \rangle$: $R \in [14,15]$, $\underline{R} \in [10,15]$
$\langle 3,3,3 \rangle$: $R \in [19,23]$, $\underline{R} \in [15,20]$

Main tools. Rank: the substitution method (Pan) and de Groote's twist of it. Border rank: vanishing equations (Strassen, Lickteig, Landsberg & Ottaviani), in combination with the substitution method (Landsberg & Michalek, B & Lysikov).

These did not yield any new upper bounds.
Characterization problem

Definition. $S_n(q) = \{t \in K^n \otimes K^n \otimes K^n : R(t) \le q\}$, $X_n(q) = \{t \in K^n \otimes K^n \otimes K^n : \underline{R}(t) \le q\}$.

These definitions are in complexity-theoretic terms; we need algebraic terms. But $\{t : t \trianglelefteq \langle q \rangle\}$ is not very useful. We need easy-to-check algebraic criteria.

Remark: all tensors considered are tight.
$S_n(n)$

Theorem. $t \in S_n(n)$ iff $t \cong \langle n \rangle$.

The multiplication in any finite-dimensional algebra $A$ can be described by a set of bilinear forms, giving the structural tensor $t_A$.

Example: $A_\epsilon = K[X]/(X^n - \epsilon) \cong K^n$ for $\epsilon \ne 0$, while $A_\epsilon \to K[X]/(X^n)$ as $\epsilon \to 0$ and $R(K[X]/(X^n)) = 2n - 1$.

Theorem (Alder–Strassen). $R(A) \ge 2 \dim A - \#\{\text{maximal twosided ideals of } A\}$.
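The degeneration $A_\epsilon \to K[X]/(X^n)$ can be seen computationally: for $\epsilon \ne 0$, multiplication mod $X^n - \epsilon$ needs only $n$ multiplications of the inputs, since one may evaluate both polynomials at the $n$ roots of $X^n - \epsilon$, multiply pointwise, and interpolate. A numeric sketch over $\mathbb{C}$ (function names ours):

```python
import numpy as np

def mult_mod(a, b, n, eps):
    """Multiply a, b in C[X]/(X^n - eps) using n pointwise multiplications."""
    roots = eps ** (1 / n) * np.exp(2j * np.pi * np.arange(n) / n)
    vals = np.polyval(a[::-1], roots) * np.polyval(b[::-1], roots)  # n mults
    # interpolation back to coefficients is linear, costing no input multiplications
    V = np.vander(roots, n, increasing=True)
    return np.linalg.solve(V, vals)

# compare against direct reduction using X^n = eps
n, eps = 4, 0.5
rng = np.random.default_rng(2)
a, b = rng.standard_normal(n), rng.standard_normal(n)
full = np.convolve(a, b)               # degree-(2n-2) product
direct = full[:n].astype(complex)
direct[: n - 1] += eps * full[n:]      # reduce X^{n+k} -> eps * X^k
assert np.allclose(mult_mod(a, b, n, eps), direct)
```

As $\epsilon \to 0$ the evaluation points collide, which is exactly why the limit algebra $K[X]/(X^n)$ needs $2n - 1$ multiplications.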
$X_n(n)$

Definition. Let $t \in U \otimes V \otimes W$. $t$ is $1_U$-generic (analogously $1_V$-, $1_W$-generic) if the space of $U$-slices of $t$ (a subspace of $V \otimes W$) contains an invertible element.

Proposition (B & Lysikov). Let $t$ be $1_U$- and $1_V$-generic. Then there is an algebra $A$ with structural tensor $t_A$ such that $t_A \cong t$.
$X_n(n)$

Theorem (B & Lysikov). Let $A$ and $B$ be algebras with structural tensors $t_A$ and $t_B$. Then $t_A \trianglelefteq_{\mathrm{GL}_n^3} t_B$ iff $t_A \trianglelefteq_{\mathrm{GL}_n} t_B$.

Theorem (B & Lysikov). Let $t$ be $1_U$- and $1_V$-generic. Then $t \in X_n(n)$ iff there is an algebra $A$ such that $t_A \cong t$ and $t_A \trianglelefteq_{\mathrm{GL}_n} \langle n \rangle$.
Smoothable algebras

Definition. An algebra $A$ of dimension $n$ of the form $K[X_1, \dots, X_m]/I$ for some ideal $I$ is called smoothable if $I$ is a degeneration of some ideal whose zero set consists of $n$ distinct points.

Theorem (B & Lysikov). Let $t$ be $1_U$- and $1_V$-generic. Then $t \in X_n(n)$ iff there is a smoothable algebra $A$ such that $t_A \cong t$.
Examples

Cartwright et al.: All (commutative) algebras of dimension $\le 7$ are smoothable. So are all algebras generated by two elements, all algebras with $\dim \mathrm{rad}(A)^2/\mathrm{rad}(A)^3 = 1$, and all algebras defined by a monomial ideal.

$\mathrm{Str}^+$ has minimal border rank. Its structural tensor is isomorphic to that of $k[X_1, \dots, X_q]/(X_i X_j : 1 \le i, j \le q)$.

$\mathrm{CW}^+$ has minimal border rank. Its structural tensor is isomorphic to that of $k[X_1, \dots, X_{q+1}]/(X_i X_j,\ X_i^2 - X_j^2,\ X_i^3 : i \ne j)$.
Comon's conjecture

Symmetric tensor = invariant under permutation of the dimensions; symmetric rank = use symmetric rank-one tensors.

Conjecture (Comon). For symmetric tensors, the rank equals the symmetric rank.

Proposition. The border rank Comon conjecture is true for $1$-generic tensors of minimal border rank.
$X_n(n+1)$

Theorem. $t \in S_n(n+1) \setminus S_n(n)$ iff $t$ is isomorphic to the multiplication tensor of one of the algebras $K[X]/(X^2) \times K^{n-2}$ or $T_2 \times K^{n-3}$, where $T_2$ is the algebra of upper triangular $2 \times 2$ matrices.

Open question (doable). What about $X_n(n+1)$ (for $1$-generic tensors)?
The asymptotic rank of CW

We know that $\underline{R}(\mathrm{CW}_q) = q + 2$. For fast matrix multiplication, good upper bounds on $\underline{R}(\mathrm{CW}_q^{\otimes N})$ are sufficient; in particular, $\underline{R}(\mathrm{CW}_q^{\otimes N}) = (q+1)^{N + o(N)}$ would imply $\omega = 2$.

Theorem (B. & Lysikov). $\underline{R}(\mathrm{CW}_q^{\otimes N}) \ge (q+1)^N + 2^N - 1$.