A note on the multiplication of sparse matrices


Cent. Eur. J. Comp. Sci. 4(1) 2014 1-11
DOI: 10.2478/s13537-014-0201-x
Central European Journal of Computer Science
Research Article

Keivan Borna (1, 2), Sohrab Aboozarkhani Fard (1)
1 Faculty of Mathematics and Computer Science, Kharazmi University, Tehran, Iran
2 School of Mathematics, Institute for Research in Fundamental Sciences (IPM), P.O. 19395-5746, Tehran, Iran
Received 07 December 2012; accepted 28 December 2013

Abstract: We present a practical algorithm for the multiplication of two sparse matrices. If A and B are two matrices of size n with $m_1$ and $m_2$ non-zero elements respectively, then our algorithm performs $O(\min\{m_1 n, m_2 n, m_1 m_2\})$ multiplications and $O(k)$ additions, where k is the number of non-zero elements in the tiny matrices obtained by the columns times rows matrix multiplication method. Note that in the useful case, $k \le m_2 n$. However, in Proposition 3.3 and Proposition 3.4 we obtain tight upper bounds for the complexity of additions. We also study the complexity of multiplication in a practical case where the non-zero elements of A (resp. B) are distributed independently with uniform distribution among the columns (resp. rows) of them, and show that the expected number of multiplications is $O(m_1 m_2 / n)$. Finally, a comparison of the number of required multiplications in the naïve matrix multiplication, Strassen's method and our algorithm is given.

Keywords: algorithms, matrix multiplication, sparse matrices, tiny matrices

© Versita sp. z o.o.

1. Introduction

Let A be an $m \times n$ matrix and B a second matrix of size $n \times p$. Then the product AB is an $m \times p$ matrix whose entries are

$$(AB)_{i,j} = \sum_{k=1}^{n} A_{ik} B_{kj}.$$

In the following we recall some alternate descriptions of matrix multiplication; see [1, Section 2.4] for more details:

A times columns of B: The j-th column of AB is the product of A and the j-th column of B.
Rows of A times B: The i-th row of AB is the product of the i-th row of A and B.
Columns of A times rows of B: AB is obtained as the sum of columns of A times rows of B.
That is, $AB = A_1 B_1 + \cdots + A_n B_n$, where $A_i$ and $B_j$ stand for the i-th column of A and the j-th row of B respectively. We use this method for multiplication in our algorithms.

E-mail: borna@khu.ac.ir
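The columns times rows description can be checked numerically. The following is a minimal sketch in Python (the function names and the small test matrices are illustrative assumptions, not taken from the paper): it builds AB as a sum of n outer products, skipping zero entries of A, and compares the result with the entry-wise definition.

```python
def matmul_definition(A, B):
    """Entry-wise definition: (AB)[i][j] = sum over k of A[i][k] * B[k][j]."""
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


def matmul_columns_times_rows(A, B):
    """AB as the sum of n rank-one matrices: column t of A times row t of B."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for t in range(n):
        for i in range(m):
            if A[i][t] == 0:
                continue  # a zero in column t contributes a zero row
            for j in range(p):
                C[i][j] += A[i][t] * B[t][j]
    return C


# Hypothetical 2x3 and 3x2 inputs chosen for illustration.
A = [[1, 0, 2], [0, 3, 0]]
B = [[4, 0], [0, 5], [6, 0]]
product = matmul_columns_times_rows(A, B)
```

Both functions return the same matrix; the second one makes visible that a zero entry of A removes a whole row of one rank-one term, which is what the sparse algorithms exploit.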

In the following we give a very short review of some algorithms for matrix multiplication:

The naïve matrix multiplication algorithm performs $O(n^3)$ operations using $n^3$ multiplications and $n^3 - n^2$ additions.

Strassen [2] gave a divide-and-conquer algorithm which runs in $O(n^{2.81})$ time. For example, for two $2 \times 2$ matrices, the naïve method takes 8 multiplications and 4 additions, while with Strassen's method they can be multiplied using only 7 multiplications and 18 additions.

Horowitz et al. [3, Section 2.4.4] gave an algorithm that runs in $O(m_1 n + m_2 n)$ where the matrices are stored in a sparse storage model.

Coppersmith and Winograd [4] provided the fastest known matrix multiplication algorithm, with a complexity of $O(n^{2.38})$.

Yuster and Zwick [5] gave a new algorithm that multiplies A and B using $O(\min\{(m_1 m_2)^{0.347} n^{1.2} + n^{2+o(1)},\ m_1 n,\ m_2 n,\ n^{2.376+o(1)}\})$ algebraic operations (multiplications, additions and subtractions). In fact they split each of the given matrices into a dense and a sparse matrix. Recall that an $n \times n$ matrix is called sparse (resp. dense) if the number of its non-zero elements is $O(n^{1.37})$ (resp. $O(n^{1.68})$). They then multiply the dense parts using the fast dense rectangular matrix multiplication algorithm of Coppersmith [6] and multiply the sparse parts using the naïve sparse matrix multiplication algorithm. Finally they output the sum of these two parts.

Note that if the given matrices are sparse, one has to avoid multiplying zeros. Multiplying two sparse matrices with, for example, the matrix multiplication algorithm of Coppersmith and Winograd [4] does not provide any improvement over non-sparse matrix multiplication, as that algorithm is concerned with multiplying two matrices in general. Furthermore, note that the result of Yuster and Zwick [5] is of theoretical importance (at least for now). In this paper we give a practical algorithm for the multiplication of two sparse matrices using sparse storage models.
More precisely, let A and B be two sparse matrices of size n with $m_1$ and $m_2$ non-zero elements respectively. Then the complexity of multiplications of our algorithm is $O(\min\{m_1 n, m_2 n, m_1 m_2\})$ and the complexity of additions is $O(k)$. This improves the complexity $O(m_1 n + m_2 n)$ mentioned in [3, Section 2.4.4]. We also study the complexity of multiplication where the non-zero elements of A and B are distributed independently with uniform distribution among the columns of A and the rows of B respectively, and show that the expected number of multiplications is then $O(m_1 m_2 / n)$. Furthermore, in Section 3.2 we obtain tight upper bounds for the complexity of additions, k, as follows:

$$k \le \begin{cases} m_1 m_2, & m_1, m_2 \le n \\ m_1 n, & m_1 \le n,\ m_2 > n \\ m_2 n, & m_1 > n,\ m_2 \le n \end{cases}$$

and if $m_1, m_2 > n$ we have

$$k \le \begin{cases} m_2 n, & m_2 \le \alpha n \\ \alpha n^2 + (m_1 - \alpha n) \cdot \min\{m_2 - \alpha n,\ n\}, & m_2 > \alpha n \end{cases}$$

where $\alpha = \lfloor m_1 / n \rfloor$.

The organization of this paper is as follows. In Section 2 the main results of the paper are given and in Section 3 the complexity analysis of our algorithms is presented. Section 4 is devoted to some conclusions and future work. Finally, in Section 5 we give a comparison of the number of required multiplications in the naïve matrix multiplication, Strassen's method and our algorithm.

2. Main results

Let A and B be two square matrices of size n with $m_1$ and $m_2$ non-zero elements. Our results could be generalized easily to rectangular matrices, but for the sake of simplicity we present them only for square matrices. We first recall two storage models for sparse matrices; see [3, Section 2.3] for more details. Note that these two sparse storage models keep the row and column numbers sorted, respectively.

Storage Model 1. In this model we store any sparse matrix of size n in a matrix with $m + 1$ rows and 3 columns, where m is the number of its non-zero elements.

a) In the first row we store the triple row count, column count and number of non-zero elements.
b) In the next m rows we store the triple row number, column number and value for each non-zero element.

Storage Model 2. This model is essentially the same as Storage Model 1, with the difference that we first store the column numbers and then the row numbers.

In the following our main algorithm for the multiplication of two sparse matrices is given. Our approach is essentially based on the columns times rows matrix multiplication method, which states that $a_{ij} b_{jk}$ is the entry at row i and column k of the j-th tiny matrix.

2.1. Main algorithm

If $m_2 \le n$, multiply A and B using Partial Algorithm 1. Else multiply A and B using Partial Algorithm 2.

2.2. Partial algorithm 1

Input: A and B, two sparse matrices of size n with $m_1$ and $m_2$ non-zero elements, stored in Storage Model 1 as Ā and B̄ respectively.
Output: The product C := AB (a square matrix of size n), given in its ordinary model.
Process:
    // Z: a matrix with k + 1 rows and 3 columns,
    // i: index for non-zero elements of A,
    // j: index for non-zero elements of B,
    // u: index for non-zero elements of tiny matrices,
    // t: counter of non-zero elements of tiny matrices.
    i ← t ← u ← 1
    (Z(0, 0), Z(0, 1)) ← (n, n)
    while i ≤ m_1 do
        j ← 1
        while j ≤ m_2 do
            if Ā(i, 1) = B̄(j, 0) then
                (Z(t, 0), Z(t, 1), Z(t, 2), t) ← (Ā(i, 0), B̄(j, 1), Ā(i, 2) · B̄(j, 2), t + 1)
            j ← j + 1
        i ← i + 1
    k ← t − 1
    Z(0, 2) ← k
    while u ≤ k do
        C(Z(u, 0), Z(u, 1)) ← C(Z(u, 0), Z(u, 1)) + Z(u, 2)
        u ← u + 1

2.3. Partial algorithm 2

Let B be stored in Storage Model 1 as B̄; then the first column of B̄ is sorted. Using a one-dimensional array W of size n, for each i let W[i] stand for the number of the first row of B̄ whose first column is i, and zero if i does not appear in the first column of B̄. For example, if B̄ is

    3 3 3
    0 1 4
    0 2 7
    2 2 3

then W is filled with the following values: W[0] = 1, W[1] = 0 and W[2] = 3.
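Before moving on to the computation of W, Storage Model 1 and Partial Algorithm 1 of Section 2.2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are assumptions, the triples are kept in a Python list instead of a 3-column matrix, and the test matrix A is hypothetical (B is the matrix whose Storage Model 1 form is shown in the example above).

```python
def to_storage_model_1(M):
    """Header (rows, columns, nnz) followed by (row, column, value) triples,
    listed row by row so that the row numbers are sorted."""
    triples = [(i, j, v) for i, row in enumerate(M)
               for j, v in enumerate(row) if v != 0]
    return [(len(M), len(M[0]), len(triples))] + triples


def partial_algorithm_1(A_bar, B_bar):
    """Multiply two matrices given in Storage Model 1; return (C, k)."""
    n_rows, _, m1 = A_bar[0]
    _, n_cols, m2 = B_bar[0]
    Z = []  # non-zero entries of the tiny matrices as (row, column, value)
    for i in range(1, m1 + 1):
        for j in range(1, m2 + 1):
            # a_(r,c) meets b_(c,s) only when column of a equals row of b
            if A_bar[i][1] == B_bar[j][0]:
                Z.append((A_bar[i][0], B_bar[j][1], A_bar[i][2] * B_bar[j][2]))
    C = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in Z:  # the O(k) additions
        C[r][c] += v
    return C, len(Z)


A = [[1, 0, 0], [0, 0, 2], [0, 0, 0]]   # hypothetical left factor
B = [[0, 4, 7], [0, 0, 0], [0, 0, 3]]   # dense form of the example above
B_bar = to_storage_model_1(B)
C, k = partial_algorithm_1(to_storage_model_1(A), B_bar)
```

The double loop makes the $O(m_1 m_2)$ cost of the scan explicit, while only k products are actually formed.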

2.3.1. Computation of W

We use the following algorithm to compute W:

Input: B, a square matrix of size n with $m_2$ non-zero elements, stored in Storage Model 1 as B̄.
Output: W, an array of length n initialized with zero.
Process:
    u ← B̄(1, 0)
    W[u] ← 1
    for 2 ≤ i ≤ m_2 do
        if u ≠ B̄(i, 0) then
            u ← B̄(i, 0)
            W[u] ← i

In the following we give an algorithm to compute AB:

Input: A and B, two sparse matrices of size n with $m_1$ and $m_2$ non-zero elements, where A is stored in Storage Model 2 as Ā and B is stored in Storage Model 1 as B̄.
Output: The product C := AB (a square matrix of size n) in its ordinary storage model.
Process:
    // Z: a matrix with k + 1 rows and 3 columns,
    // i: index for non-zero elements of A,
    // j: index for non-zero elements of B,
    // u: index for non-zero elements of tiny matrices,
    // t: counter of non-zero elements of tiny matrices.
    i ← t ← u ← 1
    (Z(0, 0), Z(0, 1)) ← (n, n)
    while i ≤ m_1 do
        j ← W(Ā(i, 0))
        if j ≠ 0 then
            while j ≤ m_2 do
                if Ā(i, 0) = B̄(j, 0) then
                    (Z(t, 0), Z(t, 1), Z(t, 2), t) ← (Ā(i, 1), B̄(j, 1), Ā(i, 2) · B̄(j, 2), t + 1)
                j ← j + 1
        i ← i + 1
    k ← t − 1
    Z(0, 2) ← k
    while u ≤ k do
        C(Z(u, 0), Z(u, 1)) ← C(Z(u, 0), Z(u, 1)) + Z(u, 2)
        u ← u + 1

Remark 1. The product of two sparse matrices is not necessarily sparse. Thus the output C := AB is a square matrix of size n, and so we have to store it in its ordinary storage model.

Remark 2. One can apply Partial Algorithm 1 when $m_2 > n$. But the role of Partial Algorithm 2 is to reduce the dependence of the main algorithm on $m_2$. This improves the functionality of our algorithm as n grows.
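The index array W and Partial Algorithm 2 can be sketched in Python as below. This is a hedged illustration under stated assumptions: the function names are hypothetical, W is filled by recording the first occurrence of each row number (equivalent to the procedure of Section 2.3.1 when the first column of B̄ is sorted), and the inner loop stops as soon as the row number in B̄ changes, which is an assumed early exit that the pseudocode does not spell out. The matrix A is a hypothetical example; B̄ is the example of Section 2.3.

```python
def compute_w(B_bar, n):
    """W[r] = position in B_bar of the first triple in row r, 0 if row r is empty."""
    W = [0] * n
    for idx in range(1, B_bar[0][2] + 1):
        row = B_bar[idx][0]
        if W[row] == 0:  # positions start at 1, so 0 means "not seen yet"
            W[row] = idx
    return W


def partial_algorithm_2(A_bar2, B_bar):
    """A_bar2 holds (column, row, value) triples (Storage Model 2),
    B_bar holds (row, column, value) triples (Storage Model 1)."""
    m1, m2 = A_bar2[0][2], B_bar[0][2]
    W = compute_w(B_bar, B_bar[0][0])
    Z = []
    for i in range(1, m1 + 1):
        col, row, a = A_bar2[i]
        j = W[col]
        if j == 0:
            continue  # row `col` of B has no non-zero element
        # Assumed early exit: leave the loop once the row of B changes.
        while j <= m2 and B_bar[j][0] == col:
            Z.append((row, B_bar[j][1], a * B_bar[j][2]))
            j += 1
    C = [[0] * B_bar[0][1] for _ in range(A_bar2[0][0])]
    for r, c, v in Z:
        C[r][c] += v
    return C, W, len(Z)


B_bar = [(3, 3, 3), (0, 1, 4), (0, 2, 7), (2, 2, 3)]
A_bar2 = [(3, 3, 2), (0, 0, 1), (2, 1, 2)]  # A = [[1,0,0],[0,0,2],[0,0,0]]
C, W, k = partial_algorithm_2(A_bar2, B_bar)
```

With the early exit, each non-zero element of A only visits the non-zero elements of one row of B, so the scan costs at most $m_1 n$ steps instead of $m_1 m_2$.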

3. Complexity analysis

In the following, for each $1 \le t \le n$ let $\bar a_t$ and $\bar b_t$ denote the number of non-zero elements of the t-th column of A and the t-th row of B respectively.

3.1. Complexity analysis of multiplications

The number of multiplications in this algorithm is $\sum_{t=1}^{n} \bar a_t \bar b_t$; see [5] for more information. Since $\bar a_t \le n$ and $\bar b_t \le n$ for every t, this sum is at most $\min\{m_1 n, m_2 n\}$. On the other hand, the loops run in $O(m_1 m_2)$. Thus the total complexity of multiplications is $O(\min\{m_1 n, m_2 n, m_1 m_2\})$.

Corollary 3.1. The particular case of the matrix-vector product has cost $O(m_1)$.

This is because $m_2 \le n$ and for each i, $\bar b_i = 0$ or $\bar b_i = 1$. Thus the number of multiplications is $\sum_{i=1}^{n} \bar a_i \bar b_i \le \sum_{i=1}^{n} \bar a_i = m_1 = O(n^{1.37})$. A similar proof shows that the vector-matrix product has cost $O(m_2) = O(n^{1.37})$. Recall that for a dense matrix A of size $n \times n$, its product Ax with an arbitrary input vector x has cost $O(n^2)$. Furthermore, in [7] the authors presented an $O(n \log n)$ algorithm for computing the matrix-vector product of a Pascal matrix and a vector. Note that a fast solution for the matrix-vector product has many applications in solving a system of equations $Ax = b$, where the solution is given by $x = A^{-1} b$.

3.1.1. Complexity of multiplications in a practical case

Let A and B be two sparse matrices of size n with $m_1$ and $m_2$ non-zero elements respectively. Assume that the non-zero elements of A (resp. B) are distributed independently with uniform distribution among the columns (resp. rows) of them. For constructing A (resp. B) we run the following Bernoulli random test $m_1$ (resp. $m_2$) times. For each $1 \le i \le n$ define two random variables $\bar a_i$, $\bar b_i$ that count the number of non-zero elements of the i-th column of A and the i-th row of B respectively. For $\bar a_i$, put the first non-zero element randomly with uniform distribution in one of the columns of A. Then with probability $1/n$ this element lies in column i, and with probability $(n-1)/n$ it does not lie in column i of A. A similar random test can be applied for $\bar b_i$.
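Thus the random distributions of $\bar a_i$, $P_A$, and of $\bar b_i$, $P_B$, are binomial with the following probability functions, for which we have $0 \le \bar a_i \le \bar m_1 := \min\{m_1, n\}$ and $0 \le \bar b_i \le \bar m_2 := \min\{m_2, n\}$:

$$P_A(j) = P(\bar a_i = j) = \binom{m_1}{j} \left(\frac{1}{n}\right)^{j} \left(\frac{n-1}{n}\right)^{m_1 - j}, \qquad P_B(j) = P(\bar b_i = j) = \binom{m_2}{j} \left(\frac{1}{n}\right)^{j} \left(\frac{n-1}{n}\right)^{m_2 - j}.$$

Since $\bar a_i$ and $\bar b_i$ are independent, we have

$$\begin{aligned}
E[\text{number of multiplications}] &= E\Big[\sum_{i=1}^{n} \bar a_i \bar b_i\Big] = \sum_{i=1}^{n} E[\bar a_i]\, E[\bar b_i] \\
&= \sum_{i=1}^{n} \Big(\sum_{j=0}^{\bar m_1} j\, P_A(j)\Big) \Big(\sum_{j=0}^{\bar m_2} j\, P_B(j)\Big) \\
&= n \Big(\sum_{j=0}^{\bar m_1} j \binom{m_1}{j} \Big(\frac{1}{n}\Big)^{j} \Big(\frac{n-1}{n}\Big)^{m_1 - j}\Big) \Big(\sum_{j=0}^{\bar m_2} j \binom{m_2}{j} \Big(\frac{1}{n}\Big)^{j} \Big(\frac{n-1}{n}\Big)^{m_2 - j}\Big) \\
&= \frac{1}{n^{m_1 + m_2 - 1}} \Big(\sum_{j=0}^{\bar m_1} j \binom{m_1}{j} (n-1)^{m_1 - j}\Big) \Big(\sum_{j=0}^{\bar m_2} j \binom{m_2}{j} (n-1)^{m_2 - j}\Big) \\
&= O\Big(\frac{m_1 m_2}{n}\Big) = O(n^{1.74}).
\end{aligned}$$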
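The closed form behind this estimate can be checked with exact rational arithmetic. The sketch below is an illustration only: the function names and the sample values $m_1 = 6$, $m_2 = 4$, $n = 8$ are assumptions, chosen with $m_1, m_2 \le n$ so that the truncation at $\min\{m, n\}$ has no effect. It sums the binomial mean term by term and multiplies the two means by n, as in the derivation above.

```python
from fractions import Fraction
from math import comb


def expected_count(m, n):
    """Mean of Binomial(m, 1/n), summed term by term with exact rationals."""
    p = Fraction(1, n)
    return sum(j * comb(m, j) * p**j * (1 - p)**(m - j) for j in range(m + 1))


def expected_multiplications(m1, m2, n):
    """n * E[a_i] * E[b_i], using independence of column and row counts."""
    return n * expected_count(m1, n) * expected_count(m2, n)


value = expected_multiplications(6, 4, 8)  # hypothetical sizes, m1, m2 <= n
```

For these inputs the result equals $m_1 m_2 / n$ exactly, since each inner sum is the binomial mean $m/n$.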

The simplification in the last step of this computation is straightforward and can be carried out with Mathematica, for example. We summarize this result in the following corollary.

Corollary 3.2. The expected number of multiplications in the product of two sparse matrices A and B, where the non-zero elements of A (resp. B) are distributed independently with uniform distribution among the columns (resp. rows) of them, is $O(n^{1.74})$.

In Section 5 we compare the required number of multiplications in our algorithm with those of the naïve and Strassen's methods. The complexity of additions in both Partial Algorithms 1 and 2 is $O(k)$. A precise analysis of this issue is done in the following.

3.2. Complexity analysis of additions

As we mentioned, k is the complexity of additions. We can give upper and lower bounds for k as follows:

Proposition 3.3. Let A and B be two matrices of size n with $m_1$ and $m_2$ non-zero elements respectively. Then upper and lower bounds for k are:

$$k \le \begin{cases} m_1 m_2, & m_1, m_2 \le n \\ m_1 n, & m_1 \le n,\ m_2 > n \\ m_2 n, & m_1 > n,\ m_2 \le n \\ n^2 + (m_1 - n)(m_2 - n), & m_1 > n,\ m_2 > n \end{cases}$$

and

$$k \ge \begin{cases} -n^3 + (m_1 + m_2) n, & m_1 + m_2 > n^2 \\ 0, & m_1 + m_2 \le n^2 \end{cases}$$

Proof. Let $A = (a_{ij})$ and $B = (b_{ij})$. Then multiplying A and B by the columns times rows method we have:

$$AB = \begin{pmatrix} a_{11} \\ \vdots \\ a_{n1} \end{pmatrix} \begin{pmatrix} b_{11} & \cdots & b_{1n} \end{pmatrix} + \cdots + \begin{pmatrix} a_{1t} \\ \vdots \\ a_{nt} \end{pmatrix} \begin{pmatrix} b_{t1} & \cdots & b_{tn} \end{pmatrix} + \cdots + \begin{pmatrix} a_{1n} \\ \vdots \\ a_{nn} \end{pmatrix} \begin{pmatrix} b_{n1} & \cdots & b_{nn} \end{pmatrix}.$$

Thus

$$AB = \begin{pmatrix} a_{11} b_{11} & \cdots & a_{11} b_{1n} \\ \vdots & & \vdots \\ a_{n1} b_{11} & \cdots & a_{n1} b_{1n} \end{pmatrix} + \cdots + \begin{pmatrix} a_{1t} b_{t1} & \cdots & a_{1t} b_{tn} \\ \vdots & & \vdots \\ a_{nt} b_{t1} & \cdots & a_{nt} b_{tn} \end{pmatrix} + \cdots + \begin{pmatrix} a_{1n} b_{n1} & \cdots & a_{1n} b_{nn} \\ \vdots & & \vdots \\ a_{nn} b_{n1} & \cdots & a_{nn} b_{nn} \end{pmatrix}.$$

Thus we obtain a sum of n tiny matrices. Note that k is in fact the number of non-zero elements of these tiny matrices (rows of Z).

Lower bounds: Let $m = n^2 - m_1$ and $p = n^2 - m_2$ be the number of zero elements of A and B respectively. Since each zero in A produces n zeros in one of the tiny matrices, we have $mn$ zero elements in all tiny matrices. For example, if $a_{23} = 0$, then the second row of the third tiny matrix is zero. Now if for each j, $1 \le j \le n$, $a_{ij}$ and $b_{jk}$ are not zero simultaneously, then $k = n^3 - (m + p) n = (m_1 + m_2) n - n^3$. Thus $k \ge (m_1 + m_2) n - n^3$ when $m_1 + m_2 > n^2$, and $k \ge 0$ if $m_1 + m_2 \le n^2$.
Upper bounds: The maximum for k, the maximum number of non-zero elements in the tiny matrices, is obtained if whenever some column of A is non-zero (say the t-th), then the t-th row of B is non-zero too. This is because then the t-th tiny matrix is non-zero. We have the following upper bounds for k in different cases:

a) If $m_1, m_2 \le n$, then $k \le m_1 m_2$.
b) If $m_1 \le n$ and $m_2 > n$, then $k \le n m_1$.
c) If $m_1, m_2 > n$, then $k \le n^2 + (m_1 - n)(m_2 - n)$.

In order to see this when $m_1, m_2 \le n$, let $m_1 = n - i$ and $m_2 = n - j$ for some $i, j \ge 0$. Then, as was mentioned, the maximum for k is obtained if the non-zero elements of A (respectively B) are located in the t-th column of A (respectively row of B). Hence all tiny matrices except the t-th one become zero, and in this tiny matrix $ni$ elements vanish (because of zeros in the t-th column of A) and $(n - i) j$ elements vanish (because of zeros in the t-th row of B). Hence we have $n^2 - ni - (n - i) j = (n - i)(n - j) = m_1 m_2$ non-zero elements in Z in total.

If $m_1 = n - i$ for some $0 \le i \le n$ and $m_2 > n$, then one can argue similarly to see that $k \le n^2 - ni = n(n - i) = n m_1$.

Finally, if $m_1, m_2 > n$, then let $m_1 = n + i$ and $m_2 = n + j$ for some $i, j > 0$. Assume that n non-zero elements of A and B are in $A_t$ and $B_t$ respectively (in the worst case). This produces $n^2$ non-zero elements in the t-th tiny matrix. Then each of the other i elements produces at most j non-zero elements. Hence $k \le n^2 + ij = n^2 + (m_1 - n)(m_2 - n)$.

Remark 3. Note that the upper bound for k whenever $m_1, m_2 > n$ is still large. For example, let $m_1 = 3n$ and $m_2 = 2n$ (i.e., $i = 2n$ and $j = n$). Now assume that all elements of $A_i$, $A_j$, $A_t$ and $B_i$, $B_j$ are non-zero. Then by Proposition 3.3, $k \le 3n^2$, whereas we have only $2n^2$ non-zero elements in the i-th and the j-th tiny matrices, i.e., $k = 2n^2$. It is obvious that if the non-zero elements of A and B are located in different columns and rows, then $k \le 2n^2$. In Proposition 3.4 we overcome this problem.

Proposition 3.4. Let A and B be two matrices of size n with $m_1 > n$ and $m_2 > n$ non-zero elements respectively. Let $\alpha = \lfloor m_1 / n \rfloor$. Then the upper bound for k is:

$$k \le \begin{cases} m_2 n, & m_2 \le \alpha n \\ \alpha n^2 + (m_1 - \alpha n) \cdot \min\{m_2 - \alpha n,\ n\}, & m_2 > \alpha n \end{cases}$$

In particular,

$$k \le \begin{cases} m_2 n, & m_2 \le \alpha n \\ n \cdot \min\{m_1, m_2\}, & m_2 > \alpha n. \end{cases}$$

That is, in the worst case, $k \le n m_2$.

Proof. Let $m_1 = n + i$ and $m_2 = n + j$ for some $i, j > 0$.
As was mentioned in the proof of Proposition 3.3, in the worst case one can assume that n non-zero elements of A and B are in $A_t$ and $B_t$ respectively (for some t). This produces $n^2$ non-zero elements in the t-th tiny matrix. In addition, if $j \le (\alpha - 1) n$, then in fact $(\alpha - \beta - 1) n \le j \le (\alpha - \beta) n$ for some $1 \le \beta \le \alpha - 1$. Thus we have $(\alpha - \beta - 1) n^2$ further non-zero elements. Finally, $0 \le j - (\alpha - \beta - 1) n \le n$ and $i - (\alpha - \beta - 1) n \ge n$, which yields at most $[j - (\alpha - \beta - 1) n]\, n$ further non-zero elements. Hence for $j \le (\alpha - 1) n$ we have in total

$$n^2 + (\alpha - \beta - 1) n^2 + [j - (\alpha - \beta - 1) n]\, n = n^2 + jn = m_2 n$$

non-zero elements.

If $j > (\alpha - 1) n$, then we have $(\alpha - 1) n^2$ further non-zero elements. Finally, $j - (\alpha - 1) n > 0$ and $0 \le i - (\alpha - 1) n < n$, which gives at most $[i - (\alpha - 1) n] \cdot \min\{j - (\alpha - 1) n,\ n\}$ further non-zero elements. Hence for $j > (\alpha - 1) n$ we have in total

$$n^2 + (\alpha - 1) n^2 + [i - (\alpha - 1) n] \cdot \min\{j - (\alpha - 1) n,\ n\} = \alpha n^2 + (m_1 - \alpha n) \cdot \min\{m_2 - \alpha n,\ n\}$$

non-zero elements. Thus,

$$k \le \begin{cases} m_2 n, & m_2 \le \alpha n \\ \alpha n^2 + (m_1 - \alpha n) \cdot \min\{m_2 - \alpha n,\ n\}, & m_2 > \alpha n \end{cases}$$

Now let $j > (\alpha - 1) n$. If $\min\{m_2 - \alpha n, n\} = n$, then

$$\alpha n^2 + (m_1 - \alpha n) \cdot \min\{m_2 - \alpha n,\ n\} = m_1 n,$$

and if $\min\{m_2 - \alpha n, n\} = m_2 - \alpha n$, then

$$\alpha n^2 + (m_1 - \alpha n) \cdot \min\{m_2 - \alpha n,\ n\} = \alpha n^2 + (m_1 - \alpha n)(m_2 - \alpha n) < \alpha n^2 + n (m_2 - \alpha n) = m_2 n,$$

where the inequality is due to the fact that $(\alpha - 1) n \le m_1 - n < \alpha n$ and so $0 \le m_1 - \alpha n < n$. Hence for $j > (\alpha - 1) n$ we have $k \le n \cdot \min\{m_1, m_2\}$.

Example 4. Let $m_1 = 3n$ and $m_2 = 6n$ (i.e., $i = 2n$ and $j = 5n$) and assume that, in the worst case, all elements of $A_{i_1}, A_{i_2}, A_{i_3}$ and $B_{i_1}, \ldots, B_{i_6}$ are non-zero. Then $k = 3n^2$, and it is obvious that if the non-zero elements of A and B are located in different columns and rows, then $k \le 3n^2$. This result is also confirmed by Proposition 3.4 applied with $\alpha = 3$.

As a corollary of Proposition 3.3 and Proposition 3.4 we have the following result:

Corollary 3.5. In the worst case, $k \le m_2 n$. In fact this upper bound is reachable only in the following three cases:
i) $m_1 > m_2$, $m_2 > \alpha n$ and $m_1, m_2 > n$.
ii) $m_2 \le \alpha n$ and $m_1, m_2 > n$.
iii) $m_1 > n$ and $m_2 \le n$.

Example 5. Let

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$

Then $n = 3$, $m_1 = 3$ and $m_2 = 6$. By Proposition 3.3, $k \le 9$. One can note that in this situation in fact $k = 0$. But if one moves each of the 1's to the first row, it produces three non-zero elements in one of the tiny matrices. As a result, if all elements of the first row of B become non-zero (e.g., by commuting the first two rows of B), then $k = 9$. Now in order to apply Partial Algorithm 2, note that

$$\bar A = \begin{pmatrix} 3 & 3 & 3 \\ 0 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 2 & 1 \end{pmatrix}, \qquad \bar B = \begin{pmatrix} 3 & 3 & 6 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 2 & 1 \\ 2 & 0 & 1 \\ 2 & 1 & 1 \\ 2 & 2 & 1 \end{pmatrix}$$

and W on $\bar B$ is $W = (0\ \ 1\ \ 4)$. Hence $Z = (3\ \ 3\ \ 0)$, which confirms the fact that $AB = 0$ and no multiplications or additions are applied.
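The bounds of Proposition 3.3 and Proposition 3.4 and the actual value of k can be compared on small inputs. The sketch below is illustrative and rests on two assumptions that are not spelled out in the text: $\alpha$ is read as the integer part $\lfloor m_1/n \rfloor$, and k is computed as $\sum_t \bar a_t \bar b_t$, i.e. every product of two non-zero entries is counted as a non-zero element of a tiny matrix. The function names are hypothetical.

```python
def count_k(A, B):
    """k = sum over t of nnz(column t of A) * nnz(row t of B)."""
    total = 0
    for t in range(len(B)):
        col_nnz = sum(1 for row in A if row[t] != 0)
        row_nnz = sum(1 for v in B[t] if v != 0)
        total += col_nnz * row_nnz
    return total


def upper_bound_k(n, m1, m2):
    """Upper bound for k from Propositions 3.3 and 3.4."""
    if m1 <= n and m2 <= n:
        return m1 * m2
    if m1 <= n:            # m2 > n
        return m1 * n
    if m2 <= n:            # m1 > n
        return m2 * n
    alpha = m1 // n        # assumption: alpha is the floor of m1 / n
    if m2 <= alpha * n:
        return m2 * n
    return alpha * n * n + (m1 - alpha * n) * min(m2 - alpha * n, n)


# Matrices of Example 5, and B with its first two rows exchanged.
A = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
B = [[0, 0, 0], [1, 1, 1], [1, 1, 1]]
B_swapped = [B[1], B[0], B[2]]
k_original = count_k(A, B)
k_swapped = count_k(A, B_swapped)
bound = upper_bound_k(3, 3, 6)
```

With n = 5, the same function gives $2n^2 = 50$ for the setting of Remark 3 ($m_1 = 3n$, $m_2 = 2n$) and $3n^2 = 75$ for Example 4 ($m_1 = 3n$, $m_2 = 6n$).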

Example 6. Let

$$A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$$

and $B = A$. Then $n = 4$ and $m_1 = m_2 = 10$. The precise number of multiplications and additions that our algorithm does is 26, and the reported numbers (the bounds given above) are 40 and 36 respectively.

$$C := AB = \begin{pmatrix} 5 & 4 & 1 & 0 \\ 4 & 6 & 4 & 1 \\ 1 & 4 & 6 & 4 \\ 0 & 1 & 4 & 5 \end{pmatrix}$$

3.3. Space complexity

Storage models 1 and 2 together use $3(m_1 + 1) + 3(m_2 + 1) = 3(m_1 + m_2 + 2) = O(\max\{m_1, m_2\}) = O(n^{1.37})$ space. Since the product of two sparse matrices is not necessarily sparse, we have to use a further $O(n^2)$ space for storing the output. Furthermore, we use $3k$ memory for computing the number of non-zero elements of the tiny matrices; since $k < m_2 n$ we deduce that $k = O(n^{2.37})$. As a result the total space complexity is $O(n^{2.37})$.

4. Conclusions and future work

In this paper we improved the complexity of the algorithm for multiplication of two sparse matrices posed in [3]. In fact, when we use the sparse storage model for storing input matrices, the required time for multiplication, for example via the algorithm in [3, Section 2.4.4], exceeds the time presented by the naïve algorithm. In this paper we present an algorithm that stores the input matrices in the sparse storage model and for which the time for multiplication is less than that of the naïve and Strassen's algorithms. Furthermore, tight upper bounds for the complexity of additions are presented. Studying our algorithm for other alternatives to store sparse matrices (for example [8]) is the subject of future work.

5. Comparison of our algorithm with the naïve and Strassen's methods

The aim of this section is to compare the required number of multiplications in our algorithm, the naïve and Strassen's methods. In Tables 1 and 2 the columns N, S, O represent the required number of multiplications in the naïve, Strassen's and our algorithm, respectively. We have generated 100 pairs of sparse random matrices (each pair of different size) in the following two situations:

1. When each pair of the sparse matrices are uniformly distributed random matrices:

Using the functions sprand (or sprandn) of MATLAB, one can generate 100 pairs of sparse uniformly (or normally) distributed random matrices. We have written a Java applet for computing the average number of multiplications that each of the three algorithms (naïve, Strassen's and our algorithm) performs. The average number of multiplications for the naïve matrix multiplication is 30996, for Strassen's is 10199 and for our algorithm is 2226, as shown in Figure 1. This observation also shows that our algorithm performs the least number of multiplications and has the best performance. Furthermore, a benchmark on the multiplication of 20 pairs of such matrices with different sizes is given in Table 1. This table can be read as follows. For example, when No. = 15, the required numbers of multiplications in the three algorithms for two sparse matrices A and B of size 48 (with uniform distribution among the columns of A and the rows of B) with 778 and 915 non-zero elements are 110592, 64827 and 14930 respectively.

Figure 1. The average number of multiplications in case 1) in 100 tests (naïve: 30996, Strassen's: 10199, our algorithm: 2226).

Table 1. The number of multiplications for 20 pairs of sparse uniform random matrices (input: size, m_1, m_2; output: N, S, O)

No.  size  m_1  m_2       N      S      O
  1    40  452  216   64000  19208   2446
  2    45  167  553   91125  19208   2024
  3    44  592  225   85184  19208   3056
  4     5    4   12     125     56     10
  5     9    8   26     729    448     22
  6    32  454  308   32768  21952   4337
  7    11   57   52    1331    392    267
  8    39  490  457   59319  21952   5721
  9    13   60   68    2197   1323    313
 10    40  259  125   64000  19208    792
 11    25  178  148   15625   9261   1036
 12     7   17    3     343    189     10
 13    46  815   64   97336  19208   1149
 14    11   26   29    1331    392     73
 15    48  778  915  110592  64827  14930
 16     3    1    2      27     27      2
 17     8   22    3     512    448      8
 18    17   44   20    4913   3136     39
 19    23  236  122   12167   2744   1211
 20     4    4    5      64     64      5

2. When there is no limit on the distribution of sparse matrices:

In this case we generate 100 pairs of sparse random matrices and compute the average number of multiplications that each algorithm performs. Similar to case 1), the results are presented in Figure 2 and Table 2. For example, when No. = 4, the required numbers of multiplications in the three algorithms for two random sparse matrices of size 42 with 103 and 98 non-zero elements are 74088, 19208 and 227 respectively.

Figure 2. The average number of multiplications in case 2) for 100 tests (32315, 11445 and 84 for the naïve, Strassen's and our algorithm, respectively).

Table 2. The number of multiplications for 20 pairs of sparse random matrices (input: size, m_1, m_2; output: N, S, O)

No.  size  m_1  m_2       N      S      O
  1     8    6   12     512    448     10
  2    19   32   10    6859   3136     14
  3    22   36   21   10648   2744     38
  4    42  103   98   74088  19208    227
  5    11    3   18    1331    392      4
  6    21   49   57    9261   2744    121
  7    34   38   60   39304  21952     67
  8     6    2    3     216    189      0
  9    31  106    1   29791   9261      5
 10    50  158   35  125000  64827    104
 11    12    9   27    1728   1323     22
 12     2    2    2       8      8      2
 13    10    7   21    1000    392      9
 14    20   34   52    8000   2744     99
 15    30   81   15   27000   9261     42
 16    10    8   15    1000    392      5
 17    25    9   15   15625   9261      6
 18    41    7   72   68921  19208     11
 19    12   10    3    1728   1323      1
 20    30   23  104   27000   9261     64

Acknowledgements

The paper benefited from the helpful comments and encouragement of Professor Gilbert Strang. The authors would like to thank him very much. The first author is also thankful to the National Elite Foundation of Iran for partial financial support.

References

[1] G. Strang, Introduction to Linear Algebra (Wellesley-Cambridge Press, Wellesley, USA, 2003)
[2] V. Strassen, Gaussian elimination is not optimal, Numer. Math. 13, 354-356, 1969
[3] E. Horowitz, S. Sahni, Fundamentals of Data Structures (Computer Science Press, New York, 1983)
[4] D. Coppersmith, S. Winograd, Matrix multiplication via arithmetic progressions, J. Symb. Comput. 9, 251-280, 1990
[5] R. Yuster, U. Zwick, Fast sparse matrix multiplication, ACM T. Alg. 1, 2-13, 2005
[6] D. Coppersmith, Rectangular matrix multiplication revisited, J. Complexity 13, 42-49, 1997
[7] Z. Tang, R. Duraiswami, N. Gumerov, Fast algorithms to compute matrix-vector products for Pascal matrices, Technical Reports from UMIACS, UMIACS-TR-2004-08, 2004
[8] Å. Björck, Block bidiagonal decomposition and least squares problems, Perspectives in Numerical Analysis, Helsinki, May 27-29, 2008