CSG399: Gems of Theoretical Computer Science. Lec. 21-23. Mar. 27-Apr. 3, 2009.
Instructor: Emanuele Viola. Scribe: Ravi Sundaram

Matrix multiplication: a group-theoretic approach

Given two n × n matrices A and B we want to compute their product C = A · B. The trivial algorithm runs in time n^3 (this and the following running times are meant up to lower-order factors n^{o(1)}). In 1967 Strassen improved the running time to n^{2.81}, and in 1990 Coppersmith and Winograd improved it further to n^{2.38}, which remains the best to date. Since the output is of size n^2, it is clear that the best possible running time one can aspire to is n^2. Remarkably, it is believed that this is attainable.

In this lecture we present a group-theoretic approach to matrix multiplication developed by H. Cohn and C. Umans (2003), later with R. Kleinberg and B. Szegedy (2005). This approach gives a conceptually clean way to get fast algorithms, and also provides specific conjectures that, if proven, yield the optimal n^2 running time. In what follows we first present some notation, then we cover polynomial multiplication, and lastly we present matrix multiplication.

1 Notation

Let G be a group.

Definition 1. The group algebra C[G] is the set of formal sums ∑_{g ∈ G} a_g g where a_g ∈ C.

An element of the group algebra can be thought of as a vector of |G| complex numbers (a_{g_1}, a_{g_2}, ..., a_{g_{|G|}}). The operations of addition and multiplication are as expected:

  (∑_g a_g g) + (∑_g b_g g) := ∑_g (a_g + b_g) g,

  (∑_g a_g g) · (∑_g b_g g) := ∑_g c_g g,  where  c_g = ∑_{g_i, g_j : g_i g_j = g} a_{g_i} b_{g_j}.

2 Polynomial Multiplication

To explain matrix multiplication, it is best to start with polynomial multiplication. Consider the task of multiplying two polynomials A(x) = ∑_{i=0}^{n} a_i x^i and B(x) = ∑_{i=0}^{n} b_i x^i. We think of each polynomial as being given as a vector of coefficients, and we are interested in computing the vector of coefficients of the product polynomial C(x) := A(x) · B(x) = ∑_{i=0}^{2n} c_i x^i, where c_i = ∑_{j=0}^{i} a_j b_{i-j}. The naive algorithm for multiplying two degree-n polynomials takes time n^2.
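The coefficient formula c_i = ∑_{j=0}^{i} a_j b_{i-j} translates directly into the quadratic-time algorithm; here is a minimal Python sketch (our illustration, function name ours):

```python
def poly_mul_naive(a, b):
    """Schoolbook polynomial multiplication: c[i] = sum_j a[j] * b[i-j].

    a, b are coefficient vectors (a[i] is the coefficient of x^i).
    Runs in time O(len(a) * len(b)), i.e. n^2 for two degree-n inputs.
    """
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# (1 + 2x)(3 + x^2) = 3 + 6x + x^2 + 2x^3
print(poly_mul_naive([1, 2], [3, 0, 1]))  # → [3, 6, 1, 2]
```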
The Fast Fourier Transform achieves time n (recall that in this discussion we are ignoring lower-order factors). The Fourier Transform amounts to interpreting polynomials as elements of a group algebra: A(x) = ∑_{i=0}^{n} a_i x^i can be viewed as an element Ā ∈ C[G], where G = Z_m for m = O(n) (think of G = {x^i : 0 ≤ i ≤ m − 1}). Observe how multiplication in this group algebra precisely corresponds to polynomial multiplication (for this to happen one needs to work over a group of size slightly larger than n, to avoid wrapping around; this is why we take the group size to be O(n) rather than n).

Now, the trick in the Fast Fourier Transform is to transform the group algebra into another, isomorphic algebra C^{|G|} such that multiplication in the original group algebra C[G] becomes point-wise multiplication in the new algebra C^{|G|} (addition continues to be point-wise in the new algebra as well). The speedup of the Fast Fourier Transform comes from the fact that such an isomorphism exists and that it can be computed efficiently. This can be visualized via the following diagram:

  A(x) = ∑_{i=0}^{n} a_i x^i,  B(x) = ∑_{i=0}^{n} b_i x^i   -->   C(x) := ∑_{i=0}^{2n} (∑_{j=0}^{i} a_j b_{i-j}) x^i
  Ā ∈ C[G],  B̄ ∈ C[G]                                      -->   C̄ ∈ C[G]     (convolution)
  C^{|G|},  C^{|G|}                                          -->   C^{|G|}       (point-wise, time n)

3 Matrix Multiplication

The basic idea for matrix multiplication is similar to that of the Fourier Transform: we embed matrices as elements of a group algebra, and then via a transform we map these into another algebra such that the original multiplication becomes component-wise multiplication (of smaller matrices). There are however a few notable differences from polynomial multiplication, arising from the recursive nature of matrix multiplication. First, for matrix multiplication the time required to compute the isomorphism is negligible (unlike polynomial multiplication, where speeding up this isomorphism via the Fast Fourier Transform is the main source of the overall time improvement); also, the existence of good algorithms (as measured in terms of multiplications) for a constant-size input implies an asymptotic improvement for all input lengths.
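As an aside, the polynomial-multiplication pipeline above (transform into C^{|G|}, multiply point-wise, transform back) can be sketched in Python. This is our own illustration, not from the lecture; the function names are ours, and we use a textbook recursive radix-2 Cooley-Tukey FFT:

```python
import cmath

def fft(v, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(v) must be a power of 2."""
    n = len(v)
    if n == 1:
        return list(v)
    even = fft(v[0::2], invert)
    odd = fft(v[1::2], invert)
    sign = 1 if invert else -1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def poly_mul_fft(a, b):
    """Multiply polynomials by transforming (FFT), multiplying point-wise,
    and transforming back (inverse FFT): convolution becomes point-wise product."""
    size = 1
    while size < len(a) + len(b) - 1:   # pad to avoid wrapping around in Z_m
        size *= 2
    fa = fft(a + [0] * (size - len(a)))
    fb = fft(b + [0] * (size - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # the inverse transform needs a 1/size normalization; round away float error
    return [round((x / size).real) for x in prod[: len(a) + len(b) - 1]]

print(poly_mul_fft([1, 2], [3, 0, 1]))  # → [3, 6, 1, 2]
```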
We now state the isomorphism theorem without proof.

Theorem 2. For every group G, C[G] ≅ C^{d_1 × d_1} ⊕ C^{d_2 × d_2} ⊕ ... ⊕ C^{d_k × d_k}. The operations in the resulting algebra are component-wise matrix addition and multiplication.

The integers d_i are called the character degrees of G, or the dimensions of the irreducible representations of G. Counting dimensions we readily see that ∑_{i=1}^{k} d_i^2 = |G|. Moreover, for every i, d_i ≤ |G| / |H| where H is any abelian subgroup of G. Note that if G is abelian then all its character degrees are 1.
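As a small sanity check of Theorem 2 (ours, not from the lecture), consider the symmetric group S_3, the smallest non-abelian group. It is a standard fact that its character degrees are (1, 1, 2); the number k of matrix blocks equals the number of conjugacy classes, which we can count by brute force:

```python
from itertools import permutations

# S_3: permutations of {0, 1, 2} under composition
elems = list(permutations(range(3)))

def compose(p, q):            # (p o q)(x) = p(q(x))
    return tuple(p[q[x]] for x in range(3))

def inverse(p):
    inv = [0] * 3
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

# k = number of irreducible blocks = number of conjugacy classes
classes = {frozenset(compose(compose(g, x), inverse(g)) for g in elems)
           for x in elems}
k = len(classes)
print(k)                      # → 3

# The character degrees of S_3 are (1, 1, 2), a standard fact.
degrees = (1, 1, 2)
assert sum(d * d for d in degrees) == len(elems)   # sum of d_i^2 = |G| = 6
# H = A_3 (the 3-cycles plus identity) is an abelian subgroup of size 3,
# and indeed every d_i <= |G| / |H| = 2.
assert all(d <= len(elems) // 3 for d in degrees)
```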
Like polynomial multiplication, the group-theoretic approach to matrix multiplication can be visualized via the following diagram:

  A ∈ C^{n × n},  B ∈ C^{n × n}                                -->   C = A · B ∈ C^{n × n}
  Ā ∈ C[G],  B̄ ∈ C[G]                                         -->   C̄ ∈ C[G]
  C^{d_1 × d_1} ⊕ ... ⊕ C^{d_k × d_k}  (one copy each)         -->   C^{d_1 × d_1} ⊕ ... ⊕ C^{d_k × d_k}   (point-wise)

where to compute the product in the last line we again use matrix multiplication. The gain will be that the matrix dimensions d_1, ..., d_k will be smaller than n. We now proceed to give the details of the approach.

3.1 Embedding

We now explain how to embed the n × n matrices A, B, C into the group algebra C[G] (for C we do not perform this embedding directly, but rather think of it when reading the coefficients of C off a group algebra element, as we explain later). Given 3 sets S_1, S_2, S_3 ⊆ G, |S_i| = n, we let

  Ā := ∑_{s_1 ∈ S_1, s_2 ∈ S_2} (s_1^{-1} s_2) A_{s_1 s_2},
  B̄ := ∑_{s_2 ∈ S_2, s_3 ∈ S_3} (s_2^{-1} s_3) B_{s_2 s_3},
  C̄ := ∑_{s_1 ∈ S_1, s_3 ∈ S_3} (s_1^{-1} s_3) C_{s_1 s_3}.

This embedding works when the cancellations in Ā · B̄ correspond to those in A · B. This is guaranteed whenever the sets satisfy the following property.

Definition 3. The sets S_1, S_2, S_3 satisfy the triple-product property if for all s_i ∈ S_i, t_i ∈ S_i:

  s_1^{-1} s_2 t_2^{-1} s_3 = t_1^{-1} t_3   ⟹   s_i = t_i for all i ≤ 3.

The next claim indeed shows that if the sets satisfy this property, then the coefficients of the matrix product appear as coefficients in the group algebra.

Claim 1. If S_1, S_2, S_3 satisfy the triple-product property then (A · B)_{t_1 t_3} is the coefficient of t_1^{-1} t_3 in Ā · B̄, for all t_1 ∈ S_1, t_3 ∈ S_3.
Proof. We have

  Ā · B̄ = ∑ (s_1^{-1} s_2 t_2^{-1} s_3) A_{s_1 s_2} B_{t_2 s_3},

where the sum ranges over s_1 ∈ S_1, s_2, t_2 ∈ S_2, s_3 ∈ S_3. Since by the triple-product property s_1^{-1} s_2 t_2^{-1} s_3 equals t_1^{-1} t_3 only when s_1 = t_1, s_2 = t_2, and s_3 = t_3, we see that the coefficient of t_1^{-1} t_3 is ∑_{s_2} A_{t_1 s_2} B_{s_2 t_3} = (A · B)_{t_1 t_3}.

3.2 Running time

Looking at the diagram, we see that we reduce multiplication of n × n matrices to k matrix multiplications of dimensions d_1, ..., d_k. Thus, intuitively, if ω is the exponent of the running time of matrix multiplication, provided the embedding works we should have n^ω ≤ ∑_i d_i^ω. This can be formalized in the following theorem, which we do not prove.

Theorem 4. If there exist S_1, S_2, S_3 ⊆ G of size n satisfying the triple-product property then, for ω the exponent of matrix multiplication,

  n^ω ≤ ∑_i d_i^ω,

where the integers d_i are the character degrees of G.

3.3 An example

Here is a simple example. Let G = Z_n × Z_n × Z_n = {(a, b, c) : 0 ≤ a, b, c < n}, with (a, b, c) · (a', b', c') := (a + a', b + b', c + c'). Let

  S_1 := {(a, 0, 0) : a < n},  S_2 := {(0, b, 0) : b < n},  S_3 := {(0, 0, c) : c < n}.

It is straightforward to verify that S_1, S_2, S_3 satisfy the triple-product property, i.e.

  (−a, 0, 0) + (0, b, 0) − (0, b', 0) + (0, 0, c) = (−a', 0, 0) + (0, 0, c')   ⟹   a = a', b = b', c = c'.

Since G is abelian, all the character degrees d_i are 1, and hence, using Theorem 4 together with the dimension count of Theorem 2, we get

  n^ω ≤ ∑_i d_i^ω = ∑_i d_i^2 = |G| = n^3.

Thus this does not rule out that ω = 3. This is no better than the naive algorithm with cubic running time. In fact, no abelian group does better. In the next section we give a non-trivial example via a non-abelian group.
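Both the triple-product property of this example and Claim 1 can be checked mechanically. The following Python sketch (ours; all names hypothetical) brute-forces the property for the sets above, then convolves Ā and B̄ in C[G] for random integer matrices and compares the coefficient of t_1^{-1} t_3 with (A · B)_{t_1 t_3}:

```python
import itertools, random

n = 3
G = list(itertools.product(range(n), repeat=3))    # G = Z_n x Z_n x Z_n

def mul(g, h):  return tuple((x + y) % n for x, y in zip(g, h))
def inv(g):     return tuple((-x) % n for x in g)

S1 = [(a, 0, 0) for a in range(n)]
S2 = [(0, b, 0) for b in range(n)]
S3 = [(0, 0, c) for c in range(n)]

# Triple-product property: s1^-1 s2 t2^-1 s3 = t1^-1 t3  =>  s_i = t_i
for s1, t1, s2, t2, s3, t3 in itertools.product(S1, S1, S2, S2, S3, S3):
    lhs = mul(mul(mul(inv(s1), s2), inv(t2)), s3)
    if lhs == mul(inv(t1), t3):
        assert (s1, s2, s3) == (t1, t2, t3)

# Claim 1: the coefficient of t1^-1 t3 in Abar * Bbar equals (A*B)_{t1 t3}
A = {(s1, s2): random.randint(0, 9) for s1 in S1 for s2 in S2}
B = {(s2, s3): random.randint(0, 9) for s2 in S2 for s3 in S3}

Abar = {g: 0 for g in G}                 # elements of C[G] as coefficient dicts
Bbar = {g: 0 for g in G}
for (s1, s2), v in A.items(): Abar[mul(inv(s1), s2)] += v
for (s2, s3), v in B.items(): Bbar[mul(inv(s2), s3)] += v

prod = {g: 0 for g in G}                 # convolution in C[G]
for g in G:
    for h in G:
        prod[mul(g, h)] += Abar[g] * Bbar[h]

for t1 in S1:
    for t3 in S3:
        ab = sum(A[(t1, s2)] * B[(s2, t3)] for s2 in S2)   # (A*B)_{t1 t3}
        assert prod[mul(inv(t1), t3)] == ab
print("triple-product property and Claim 1 verified for n =", n)
```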
4 A group yielding ω < 3

Now we give an example of a group and embedding giving a non-trivial algorithm, i.e., ω < 3. Given any group H we define a new group

  G := {(a, b) z^J : a, b ∈ H, J ∈ {0, 1}},

where z is a new symbol. It is easy to see that |G| = 2 |H|^2. For clarity, note (a, b) z^0 = (a, b) · 1 = (a, b). The group operation of G is defined by the following three rules:

  (a, b) · (a', b') = (a +_H a', b +_H b'),
  (a, b) z = z (b, a),
  z · z = 1.

In other words, we take two copies of H, and we let z act on them by swapping them. Those versed in group theory will recognize G as a wreath product. As an exercise it is good to verify that ((a, b) z)^{-1} = (−b, −a) z.

Now, let H := H_1 × H_2 × H_3, where for every i, H_i := Z_n is the additive group of integers modulo n. This is the group we use for matrix multiplication. Again for clarity, note 0_G = (0_H, 0_H) = ((0, 0, 0), (0, 0, 0)).

We now define the three sets for the embedding. We identify H_i with the subgroup of H consisting of the elements whose entries outside the i-th coordinate are zero. For ease of notation, let H_4 := H_1. For 1 ≤ i ≤ 3, we define

  S_i := {(a, b) z^J : a ∈ H_i \ {0}, b ∈ H_{i+1}, J ∈ {0, 1}}.

Removing the element 0 is crucial to obtain a non-trivial exponent ω < 3.

Lemma 5. S_1, S_2, S_3 as defined above satisfy the triple-product property.

Proof. We need to prove that for all s_i ∈ S_i, t_i ∈ S_i:

  s_1^{-1} s_2 t_2^{-1} s_3 = t_1^{-1} t_3   ⟹   s_i = t_i for all i ≤ 3.

Equivalently, we have to prove that

  t_1 s_1^{-1} s_2 t_2^{-1} s_3 t_3^{-1} = 0_G   ⟹   s_i = t_i for all i ≤ 3.

We first see that we can represent products of the form s_i t_i^{-1} in a normal form.
Claim 2. For all s_i, t_i ∈ S_i, either

  s_i t_i^{-1} = (a_i, b_i) z (a_i', b_i')   or   s_i t_i^{-1} = (a_i, b_i)(a_i', b_i')

for some a_i, a_i' ∈ H_i \ {0}, b_i, b_i' ∈ H_{i+1}.

Proof. If t_i contains no z then it is easy to see that one of the two forms occurs, depending on the presence or absence of z in s_i. Alternatively, if t_i = (c, d) z then t_i^{-1} = (−d, −c) z = z (−c, −d), from which the result follows. Specifically, the absence or presence of z in the final form depends on the presence or absence of z in s_i.

Given the above normal form (applied to the three factors t_1 s_1^{-1}, s_2 t_2^{-1}, s_3 t_3^{-1}, the first with the roles of s and t swapped), we can rewrite our equation t_1 s_1^{-1} s_2 t_2^{-1} s_3 t_3^{-1} = 0_G as

  (a_1, b_1) z^{J_1} (a_1', b_1') (a_2, b_2) z^{J_2} (a_2', b_2') (a_3, b_3) z^{J_3} (a_3', b_3') = ((0, 0, 0), (0, 0, 0)).

Observe that there can only be an even number of z's, since they annihilate in pairs and there is none on the right-hand side.

If there are no z's then, equating component-wise, we get that for all i ≤ 3, a_i + a_i' = b_i + b_i' = 0, and hence s_i = t_i for all i ≤ 3.

If there are 2 z's then, assuming without loss of generality that J_3 = 0, we get

  (a_1, b_1) z (a_1', b_1') (a_2, b_2) z (a_2', b_2') (a_3, b_3) (a_3', b_3')
    = (a_1, b_1) (b_1', a_1') (b_2, a_2) (a_2', b_2') (a_3, b_3) (a_3', b_3')
    = (a_1 + b_1' + b_2 + a_2' + a_3 + a_3',  b_1 + a_1' + a_2 + b_2' + b_3 + b_3').

Note that if

  (a_1 + b_1' + b_2 + a_2' + a_3 + a_3',  b_1 + a_1' + a_2 + b_2' + b_3 + b_3') = ((0, 0, 0), (0, 0, 0))

then it must be the case that a_1 + b_1' + b_2 + a_2' + a_3 + a_3' = (0, 0, 0). However, since a_i, a_i' ∈ H_i \ {0} and b_i, b_i' ∈ H_{i+1}, this can never happen: a_1 is the only term with an entry in the H_1 coordinate, but by definition it is non-zero. (The only other elements that could contribute to making the H_1 coordinate non-zero are a_1', b_3, and b_3', but all of these moved to the second copy of H.)

Having verified the embedding capability of G, we now observe that the sizes of the sets S_i relative to G yield a non-trivial exponent via Theorem 4.

Claim 3. The group G and associated sets S_1, S_2, S_3 with the triple-product property yield a matrix multiplication exponent ω < 3.
Proof. Elementary counting shows |G| = 2 |H|^2 = 2 n^6 and |S_i| = 2 n (n − 1) for all i ≤ 3. Since H × H is an abelian subgroup of G, we have by Theorem 2 that all character degrees satisfy d_i ≤ |G| / |H × H| = 2. Theorem 4 shows that the exponent ω of matrix multiplication satisfies

  (2 n (n − 1))^ω ≤ ∑_i d_i^ω.

We prove that ω < 3 by showing that ω = 3 yields a contradiction. Suppose ω = 3; then

  (2 n (n − 1))^3 ≤ ∑_i d_i^3 ≤ 2 ∑_i d_i^2 = 2 |G| = 4 n^6,

which is false for n = 5. Hence ω < 3. The best bound with this approach is ω ≤ 2.9.
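As a sanity check (ours, not part of the notes), the following Python sketch builds the wreath-product group G for a small n, brute-forces the triple-product property of Lemma 5, and redoes the counting of Claim 3 at n = 5. All names are ours; elements (a, b) z^J are encoded as ((a, b), J):

```python
import itertools

n = 2                                              # small n so brute force is feasible
def hadd(x, y):  return tuple((u + v) % n for u, v in zip(x, y))
def hneg(x):     return tuple((-u) % n for u in x)

def gmul(g, h):
    (a, b), J = g
    (c, d), K = h
    if J:                      # z (c, d) = (d, c) z: z swaps the two copies of H
        c, d = d, c
    return ((hadd(a, c), hadd(b, d)), J ^ K)

def ginv(g):                   # note ((a, b) z)^-1 = (-b, -a) z, as in the exercise
    (a, b), J = g
    a, b = hneg(a), hneg(b)
    return ((b, a), J) if J else ((a, b), J)

def unit(i, v):                # the element of H_i with entry v in coordinate i
    e = [0, 0, 0]; e[i % 3] = v
    return tuple(e)

# S_i = {(a, b) z^J : a in H_i \ {0}, b in H_{i+1}, J in {0, 1}}, with H_4 = H_1
S = [[((unit(i, a), unit(i + 1, b)), J)
      for a in range(1, n) for b in range(n) for J in (0, 1)]
     for i in range(3)]
assert all(len(Si) == 2 * n * (n - 1) for Si in S)

# Triple-product property: t1 s1^-1 s2 t2^-1 s3 t3^-1 = identity  =>  s_i = t_i
ident = (((0, 0, 0), (0, 0, 0)), 0)
for s1, t1, s2, t2, s3, t3 in itertools.product(S[0], S[0], S[1], S[1], S[2], S[2]):
    w = gmul(gmul(t1, ginv(s1)), gmul(gmul(s2, ginv(t2)), gmul(s3, ginv(t3))))
    if w == ident:
        assert (s1, s2, s3) == (t1, t2, t3)
print("triple-product property holds for n =", n)

# Counting in Claim 3: for n = 5, (2n(n-1))^3 > 2|G| = 4 n^6, so omega = 3 is impossible.
m = 5
assert (2 * m * (m - 1)) ** 3 > 4 * m ** 6         # 64000 > 62500
```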