Permutation Codes: Optimal Exact-Repair of a Single Failed Node in MDS Code Based Distributed Storage Systems


Viveck R. Cadambe
Electrical Engineering and Computer Science, University of California Irvine, Irvine, California, 92697, USA
vcadambe@uci.edu

Cheng Huang, Jin Li
Communications, Collaborations and Systems Group, Microsoft Research, Redmond, WA, USA
{chengh, jinl}@microsoft.com

Abstract

It is well known that maximum distance separable (MDS) codes are an efficient means of storing data in distributed storage systems, since they provide optimal tolerance against failures (erasures) for a fixed amount of storage. An $(n, k)$ code can be used in a distributed storage system with $n$ disks, each having a storage capacity of 1 unit, to store $k$ units of information, where $n > k$. If the code used is maximum distance separable, then the storage system can tolerate up to $(n - k)$ disk failures (erasures). The focus of this paper is the design of an MDS code with the additional property that a single disk failure can be repaired with optimal repair bandwidth, i.e., the amount of data to be downloaded for recovery of the failed node. Previously, a lower bound of $\frac{n-1}{n-k}$ units has been established by Dimakis et al. on the repair bandwidth for a single node failure in an $(n, k)$ MDS code based storage system, where each of the $n$ disks stores 1 unit. While the reference by Dimakis et al. established a lower bound on the repair bandwidth, the existence of codes achieving this lower bound for arbitrary $(n, k)$ has only been established in an asymptotic manner, in the limit of arbitrarily large storage capacities. While finite code constructions achieving this lower bound have been provided previously for $k \le \max(n/2, 3)$, the question of existence of finite codes for arbitrary values of $(n, k)$ achieving the lower bound remained open. In this paper, we provide the first known construction of a finite code for arbitrary $(n, k)$, which can repair a single failed systematic node by downloading exactly $\frac{n-1}{n-k}$ units of data. The codes, which are optimally efficient in terms of a metric known as repair bandwidth, are based on permutation matrices and hence termed Permutation Codes. We also show that our code has a simple repair property which enables efficiency, not only in terms of the amount of repair bandwidth, but also in terms of the amount of data accessed on the disk.

I. INTRODUCTION

Consider a distributed storage system with $n$ distributed data disks, each storing one unit of data. The data storage devices in the storage system can possibly fail. To protect the storage system from information loss in case of disk failure, the amount of storage space in the system is greater than the amount of information stored, with the extra storage space used to design redundancy in the system. Assume that the amount of information to be stored in this storage system is equal to $k$ units, where $k < n$. Then, it is well known that the optimal tolerance to failures is provided by using an $(n, k)$ maximum distance separable (MDS) code to store the data. Such a code would tolerate any $(n - k)$ disk failures, since the MDS property ensures that the original information can be recovered by using any $k$ surviving disks. While codes which are not maximum distance separable (such as repetition codes) can be used in the storage system, they have lower redundancy for a given amount of storage. Equivalently, for a given amount of redundancy, MDS codes provide the smallest cost of storage in distributed data storage systems. There is an enormous amount of literature associated with the search for erasure codes for storage systems with efficient encoding and decoding properties (see, for example, [1]-[4]).

While an MDS code based storage system can tolerate a worst-case failure scenario of $(n - k)$ disks, the most common failure scenario in a storage system is the case where a single disk fails. In case of single disk failure, efficient (fast) recovery of the failed disk is important, since replacing the disk before other disks fail reduces the chance of data loss and improves the overall reliability of the system. The focus of this paper is on recovery efficiency (speed) of a single disk failure in an MDS code for distributed storage systems.

When a single node fails, a new node enters the storage system, connects to the surviving $n - 1$ disks via a network, downloads data from these $(n - 1)$ surviving disks, and reconstructs the failed node. The primary factor in determining the speed of recovery of a failed node is the amount of time taken for the new node to download the necessary data from surviving disks, which, in turn, depends on the amount of data accessed and downloaded$^1$ by the new node. This problem has been studied previously from the perspective of the amount of data to be downloaded, also known as the repair bandwidth, by the new node in [5]-[11]. Note that a trivial solution for any $(n, k)$ MDS code is to achieve a repair bandwidth of $k$ units. This is because the entire original data, and hence the failed disk, can be recovered with the new node reading any set of $k$ surviving disks completely. The goal of our paper, as is the goal of references [5]-[11], is to minimize the amount of repair bandwidth to recover a single failed node in an MDS code based distributed storage system.

$^1$ There is a subtle difference between the amount of data accessed and downloaded; such differences are explored later in Section II-A.

We restrict our code design to systematic MDS codes, i.e., where the first $k$ storage nodes, also known as systematic nodes, store an uncoded copy of the $k$ information units. The remaining nodes are referred to as parity nodes. The systematic property is a crucial practical constraint in storage systems, since it ensures ease of access of data under normal operation (i.e., no failures). Further, unlike in [5], we require the new node to be a replica of the failed node (also termed exact regeneration in [6]), so that the original systematic code structure is retained after the repair operation is completed. The codes for the scenario we focus on in this paper have also been termed minimum storage regenerating (MSR) codes in the literature. We next summarize the results of references [5]-[11] before proceeding to describe our contributions to this problem.

A. Previous Work

The study of minimizing the amount of repair bandwidth was initiated in [5]. The reference studied the problem from the perspective of functional repair, where the new (replaced) node need not be a replica of the failed node; it only needs to be information equivalent to the failed node, so that in combination with the original $n - 1$ surviving nodes, the code still maintains the MDS property. This study showed that, under these constraints, the minimum repair bandwidth is equal to $\frac{n-1}{n-k}$ units, where, as mentioned before, each of the $n$ nodes stores 1 unit. Since $\frac{n-1}{n-k} \le k$ for $n > k > 1$, the result meant that, for the constraints of the functional repair problem, the trivial strategy of downloading $k$ units could be improved for any $(n, k)$. Further, since the constraints of functional repair [5] are weaker than the constraints of our problem, where the code has to be systematic and recovery has to be exact, the result implied that for any MDS code based system,

\[ \text{Repair Bandwidth} \ge \frac{n-1}{n-k}, \]

i.e., $\frac{n-1}{n-k}$ is a lower bound for the problem of exact regeneration which is of interest in this paper.

An important question is whether this lower bound is tight even if the stricter and more practical constraints of systematic coding and exact regeneration are imposed. This question was motivated in reference [6] for the special case of $(n = 4, k = 2)$, for which it was shown that the lower bound of $\frac{n-1}{n-k} = 3/2$ units is indeed tight. The reference made a connection between the repair bandwidth problem and the technique of interference alignment, which was developed in the context of wireless communications [12]-[14]. This question was later studied for more general cases in [7]-[11], [15]. By expanding connections of the repair bandwidth problem to interference alignment, references [7], [8] constructed scalar linear codes to show that the bound of $\frac{n-1}{n-k}$ is tight for $k \le \max(n/2, 3)$. In other words, for $k \le \max(n/2, 3)$, perhaps surprisingly, there is no loss in terms of minimum repair bandwidth even if we impose the stricter constraints of exact regeneration and systematic coding. References [11], [15] provided alternate linear code constructions for the case of $k \le n/2$ achieving the lower bound of $\frac{n-1}{n-k}$ units.

While the achievability of $\frac{n-1}{n-k}$ units for the range of high redundancy (i.e., $k \le n/2$) is a powerful result, most practical distributed storage systems do not operate in this regime. In fact, in most distributed storage systems the number of parity nodes is smaller than the number of systematic nodes, i.e., $k > n/2$. However, this practical setting is more challenging and results in current literature are relatively weaker. References [9], [10] constructed, for any $(n, k)$ including the cases where $k > n/2$, a class of asymptotic codes that achieve a repair bandwidth of $\frac{n-1}{n-k}$ asymptotically, as the amount of information stored in a single node grows arbitrarily large$^2$. In other words, they showed that exact regeneration is asymptotically equivalent to functional regeneration. While this asymptotic equivalence is an interesting theoretical limit to what practical codes can achieve, the existence of finite codes achieving a repair bandwidth of $\frac{n-1}{n-k}$ units remained an open problem of practical interest. In fact, for arbitrary $(n, k)$, the construction of finite codes having a repair strategy more efficient than the trivial repair strategy with a repair bandwidth of $k$ units remained open. The only other insight in previous literature regarding this open problem comes from [7], which restricts coding schemes to scalar linear codes and shows that, under this restriction, the limit of $\frac{n-1}{n-k}$ units on the repair bandwidth is NOT achievable.

$^2$ This means that a unit must be equivalent to an arbitrarily large number of bits.

It is this open problem, the existence of finite codes achieving the lower bound of $\frac{n-1}{n-k}$ units on the repair bandwidth, that is the focus of this paper. The main contribution of this paper is the achievability of this limit using finite-length vector linear codes, for the failure of systematic nodes. We explore our main contributions in greater detail in the next section.

Before we proceed, we note that there exists, in the literature, a parallel line of work which studies the repair bandwidth for codes which are not necessarily MDS and hence use a greater amount of storage for a given amount of redundancy [5], [7], [16]. These references study the tradeoff between the amount of storage and the repair bandwidth required, for a given amount of redundancy. Further, we also note that the design of codes from the perspective of efficient recovery of their information elements, for error-correcting (rather than erasure) codes, has also been studied in the literature in association with locally decodable codes (see [17] and references therein). The focus of this paper, however, is on MDS erasure codes (also referred to as minimum storage regenerating codes), i.e., $(n, k)$ codes which can tolerate any $(n - k)$ erasures.

II. SUMMARY OF CONTRIBUTIONS

The main contribution of this paper is the design of a new class of MDS codes which achieve the minimum repair bandwidth of $\frac{n-1}{n-k}$ units for the repair of a single failed systematic node. Specifically, the code constructions presented in this paper are listed below.

1) Random Permutation Codes: In Section V, we present a construction of codes which achieve the repair bandwidth lower bound of $\frac{n-1}{n-k}$ units for repair of systematic nodes for any tuple $(n, k)$ where $n > k$. The code construction, albeit finite, is based on random coding, with the random coding argument used to justify the existence of a repair-bandwidth optimal MDS code. This means that for any arbitrary $(n, k)$, a brute-force search over a (finite) set of codes described in the section will yield a repair-bandwidth optimal MDS code. Since our code construction is based on permutation matrices, the codes are termed Permutation Codes.

2) Explicit Construction of Permutation Codes for $n - k \in \{2, 3\}$: While Section V describes a random coding based construction, we also provide, in Section VI, explicit constructions for the special case of $n - k \in \{2, 3\}$.

It must be noted that the search for codes which efficiently repair both systematic and parity nodes is still open. However, from a practical perspective, the step taken in this paper is important since, in most storage systems, the number of parity nodes is small compared to the number of systematic nodes.
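To make the gap between the trivial repair strategy ($k$ units) and the lower bound ($\frac{n-1}{n-k}$ units) concrete, the following minimal sketch tabulates both for a few $(n, k)$ pairs. The function names and the chosen parameter pairs are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction


def min_repair_bandwidth(n: int, k: int) -> Fraction:
    """Lower bound (n-1)/(n-k) on the repair bandwidth, in units of one disk."""
    if not (n > k >= 1):
        raise ValueError("need n > k >= 1")
    return Fraction(n - 1, n - k)


def trivial_repair_bandwidth(n: int, k: int) -> Fraction:
    """Trivial strategy: read k surviving disks completely."""
    return Fraction(k)


if __name__ == "__main__":
    # Hypothetical parameter choices, picked only for illustration.
    for n, k in [(4, 2), (5, 3), (6, 4), (14, 10)]:
        bound = min_repair_bandwidth(n, k)
        trivial = trivial_repair_bandwidth(n, k)
        print(f"(n={n}, k={k}): bound = {bound} units, trivial = {trivial} units")
```

The two quantities coincide only when $n = k + 1$; for every other pair with $k > 1$ the bound is strictly smaller than $k$.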

A. Efficient Code Construction in terms of Disk Access

While most previous works described above explore the repair problem by accounting for the amount of information to be sent over the network for repair, there exists another important cost during the repair of a node, viz., the amount of disk access. To understand the difference between these two costs, consider a toy example where a disk stores two bits $a_1, a_2$. Now, suppose that, to repair some other failed node in this system, the bit $a_1 + a_2$ has to be sent to the new node. This means that the bandwidth required for this particular disk is 1 bit. However, in many storage systems, the disk-read speed is slower than the network transfer speed and hence becomes a bottleneck. In the case where the disk-read speed is a bottleneck, the defining factor in the speed of repair is the amount of disk access rather than the repair bandwidth. In the toy example described, the amount of disk access is 2 bits, as both $a_1$ and $a_2$ have to be read from the disk to compute $a_1 + a_2$. Thus, it is possible that certain codes, while minimizing repair bandwidth, perform poorly in terms of disk access, rendering the codes impractical. In this paper, we will formalize this notion of disk access cost, and show that Permutation Codes are not only bandwidth optimal, but also disk-access optimal, for the repair of a single failed systematic node.

III. THE SIMPLEST PERMUTATION CODE: $n = 4$, $k = 2$

We start with the case of $n = 4, k = 2$. Repair-bandwidth optimal codes for the case of $n = 4, k = 2$ have, in fact, been developed previously in [6]. We only use this case as a toy example to bring out the interference alignment perspective of the challenges of code development. The $k = 2$ sources are denoted $\mathbf{a}_1$ and $\mathbf{a}_2$ respectively, with $\mathbf{a}_i = [a_i(1)\ a_i(2)]^T$ being a $2 \times 1$ vector over a field of size $q \ge 5$. The goal is to design a code of $n = 4$ elements, where each of the four code elements is a $2 \times 1$ vector. The $i$th code element is denoted $\mathbf{d}_i$ for $i = 1, 2, 3, 4$. Since each storage/code element is equivalent to 2 scalars over the field, we consider 1 unit, the amount of storage in each disk, to be equivalent to 2 scalars over the field. The goal is to design an MDS code which repairs a single failed systematic node by downloading a total of 3/2 units, or equivalently 3 scalars over the field. In particular, we present a code which repairs a failed systematic node by downloading a single scalar from each of the 3 surviving nodes.

Since we restrict ourselves to systematic codes, we have $\mathbf{d}_1 = \mathbf{a}_1$ and $\mathbf{d}_2 = \mathbf{a}_2$. Let the parity nodes $i = 3, 4$ store vectors of the form $\mathbf{d}_i = \mathbf{C}_{i,1}\mathbf{a}_1 + \mathbf{C}_{i,2}\mathbf{a}_2$, where $\mathbf{C}_{3,1}, \mathbf{C}_{3,2}, \mathbf{C}_{4,1}, \mathbf{C}_{4,2}$ are all $2 \times 2$ matrices over the field. As depicted in Figure 1, we have

\[ \mathbf{C}_{3,1} = \mathbf{C}_{3,2} = \mathbf{C}_{4,2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \mathbf{C}_{4,1} = \begin{bmatrix} 0 & 1 \\ 2 & 0 \end{bmatrix}. \]

Fig. 1. An optimal (4, 2) code. Node 1 stores $a_1(1), a_1(2)$; node 2 stores $a_2(1), a_2(2)$; node 3 stores $a_1(1) + a_2(1),\ a_1(2) + a_2(2)$; node 4 stores $a_1(2) + a_2(1),\ 2a_1(1) + a_2(2)$. To repair node 1, the new node downloads $a_2(1)$, $a_1(1) + a_2(1)$ and $a_1(2) + a_2(1)$, in which the interference from $\mathbf{a}_2$ is aligned along $a_2(1)$.

It can be verified that the above code is an MDS code, since $\mathbf{a}_1, \mathbf{a}_2$ can be recovered from any 2 nodes. Now, suppose node 1 fails. A naive strategy would be to download the 4 scalars stored in any 2 nodes completely, and recover all of $\mathbf{a}_1$ and $\mathbf{a}_2$. In this naive strategy, while we only desire to reconstruct $\mathbf{a}_1$, we also end up completely reconstructing the variable $\mathbf{a}_2$, although it is not desired. Thus, this strategy uses 4 dimensions to download 4 variables: two desired variables and two undesired variables. However, as depicted in Figure 1, a more efficient strategy exists that downloads a total of 3 scalars, one scalar from each surviving node. This more efficient strategy restricts the undesired variable (or interference) $\mathbf{a}_2$ to only 1 dimension. The idea is to download the scalar $\mathbf{V}_1\mathbf{d}_i$ from node $i$ for $i = 2, 3, 4$, where $\mathbf{V}_1 = [1\ 0]$. Then, since

\[ \mathrm{rowspan}(\mathbf{V}_1) = \mathrm{rowspan}(\mathbf{V}_1\mathbf{C}_{3,2}) = \mathrm{rowspan}(\mathbf{V}_1\mathbf{C}_{4,2}) = \mathrm{rowspan}\left([1\ 0]\right), \tag{1} \]

the interference associated with $\mathbf{a}_2$ aligns along the scalar $\mathbf{V}_1\mathbf{a}_2 = a_2(1)$. This variable, being downloaded from node 2 as $\mathbf{V}_1\mathbf{d}_2 = a_2(1)$, is cancelled to obtain 2 equations in the two desired variables $a_1(1)$ and $a_1(2)$. Because

\[ \mathrm{rank}\begin{bmatrix} \mathbf{V}_1\mathbf{C}_{3,1} \\ \mathbf{V}_1\mathbf{C}_{4,1} \end{bmatrix} = 2, \tag{2} \]

the desired variables can be recovered and node 1 repaired. Similarly, if node 2 fails, it can be noted that downloading $\mathbf{V}_2\mathbf{d}_i$, where $\mathbf{V}_2 = [0\ 1]$, from nodes $i \in \{1, 3\}$ and $\mathbf{V}_1\mathbf{d}_4$ from node 4 suffices for repair, because the following relations are satisfied:

\[ \mathrm{rowspan}(\mathbf{V}_2\mathbf{C}_{3,1}) = \mathrm{rowspan}(\mathbf{V}_1\mathbf{C}_{4,1}) = \mathrm{rowspan}(\mathbf{V}_2), \tag{3} \]

\[ \mathrm{rank}\begin{bmatrix} \mathbf{V}_2\mathbf{C}_{3,2} \\ \mathbf{V}_1\mathbf{C}_{4,2} \end{bmatrix} = 2. \tag{4} \]

Note that the coding matrices in the above construction are scaled permutation matrices. In the context of interference channels, equations (1) and (3) are similar to the alignment constraints, and equations (2) and (4) are related to the reconstruction of the desired signal (see [14]). The coding matrices $\mathbf{C}_{i,j}$ are analogous to the channel gain matrices in wireless interference channels, and the vectors $\mathbf{V}_1$ and $\mathbf{V}_2$ are analogous to the beamforming vectors in the wireless context. Note that there is a fundamental difference between the wireless interference channel and the storage setting: in the storage setting, we are allowed to design the coding matrices, which play the role of the channel matrices, so that they satisfy the desired alignment relations. While this flexibility enables a simple solution for the case of $n = 4, k = 2$, the extensions of this solution to more general cases of $n, k$ are not trivial. In fact, it is shown in [7] that the extensions are not even possible if we restrict our code elements $\mathbf{d}_i$ to be $(n - k) \times 1$ vectors. As we will show in this paper, such extensions are in fact possible, by choosing the code elements $\mathbf{d}_i$ to be $L = (n - k)^k$ dimensional vectors. We will next introduce a general notation and an overview of our solution.

IV. OVERVIEW OF THE SOLUTION FOR ARBITRARY $(n, k)$

Consider $k$ sources, all of equal size $L = M/k$ over a field $\mathbb{F}_q$ of size $q$. Source $i \in \{1, 2, \ldots, k\}$ is represented by the $L \times 1$ vector $\mathbf{a}_i \in \mathbb{F}_q^L$. Note here that $M$ denotes the size of the total information stored in the distributed storage system, in terms of the number of elements over the field. There are $n$ nodes storing a code of the $k$ source symbols in an $(n, k)$ MDS code. Each node stores data of size $L$, i.e., each coded symbol of the $(n, k)$ code is an $L \times 1$ vector. Therefore, 1 unit is equivalent to $L$ scalars over the field $\mathbb{F}_q$. The data stored in node $i$ is represented by the $L \times 1$ vector $\mathbf{d}_i$, where $i = 1, 2, \ldots, n$. We assume that our code is linear, so that $\mathbf{d}_i$ can be represented as

\[ \mathbf{d}_i = \sum_{j=1}^{k} \mathbf{C}_{i,j}\mathbf{a}_j, \]

where $\mathbf{C}_{i,j}$ are $L \times L$ square matrices. Further, we restrict our codes to have a systematic structure, so that, for $i \in \{1, 2, \ldots, k\}$,

\[ \mathbf{C}_{i,j} = \begin{cases} \mathbf{I} & j = i \\ \mathbf{0} & j \ne i. \end{cases} \]
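The $(4, 2)$ code of Section III can be checked numerically in the notation just introduced, $\mathbf{d}_i = \sum_j \mathbf{C}_{i,j}\mathbf{a}_j$. The sketch below is only an illustration: it assumes the field is the integers modulo 5 (one admissible choice of $q$), and the helper names (`rank_mod_p`, `node_rows`, `is_mds`, `encode`, `repair_node1`) are hypothetical, not taken from the paper. It verifies that any two nodes determine $(\mathbf{a}_1, \mathbf{a}_2)$ and that node 1 is rebuilt from one scalar per surviving node, using $\mathbf{V}_1 = [1\ 0]$.

```python
from itertools import combinations, product

P = 5  # assumed field size; P is prime, so integers mod P form a field


def rank_mod_p(rows, p=P):
    """Rank of a matrix (list of rows) over GF(p), by Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
C41 = [[0, 1], [2, 0]]
# C[i] = (C_{i,1}, C_{i,2}); nodes 1, 2 are systematic, nodes 3, 4 are parity.
C = {1: (I2, Z2), 2: (Z2, I2), 3: (I2, I2), 4: (C41, I2)}


def node_rows(i):
    """The 2x4 block [C_{i,1} C_{i,2}] acting on the stacked vector (a_1, a_2)."""
    return [C[i][0][r] + C[i][1][r] for r in range(2)]


def is_mds():
    """MDS check: every pair of nodes yields a full-rank 4x4 system."""
    return all(rank_mod_p(node_rows(i) + node_rows(j)) == 4
               for i, j in combinations(C, 2))


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) % P for row in m]


def encode(a1, a2):
    """d_i = C_{i,1} a_1 + C_{i,2} a_2 for i = 1, ..., 4."""
    return {i: [(x + y) % P
                for x, y in zip(mat_vec(C[i][0], a1), mat_vec(C[i][1], a2))]
            for i in C}


def repair_node1(d):
    """Rebuild a_1 from the first scalar of nodes 2, 3 and 4 (V_1 = [1 0])."""
    s2, s3, s4 = d[2][0], d[3][0], d[4][0]
    # s2 = a_2(1) is the aligned interference; subtract it from s3 and s4.
    return [(s3 - s2) % P, (s4 - s2) % P]


if __name__ == "__main__":
    print("MDS:", is_mds())
    ok = all(repair_node1(encode(list(a1), list(a2))) == list(a1)
             for a1 in product(range(P), repeat=2)
             for a2 in product(range(P), repeat=2))
    print("node 1 repaired from 3 scalars for every input:", ok)
```

The repair routine reads exactly 3 field elements, which is 3/2 units when 1 unit equals 2 scalars.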

Since we restrict our attention to MDS codes, we will need the matrices $\mathbf{C}_{i,j}$ to satisfy the following property.

Property 1:

\[ \mathrm{rank}\begin{bmatrix} \mathbf{C}_{j_1,1} & \mathbf{C}_{j_1,2} & \cdots & \mathbf{C}_{j_1,k} \\ \mathbf{C}_{j_2,1} & \mathbf{C}_{j_2,2} & \cdots & \mathbf{C}_{j_2,k} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{C}_{j_k,1} & \mathbf{C}_{j_k,2} & \cdots & \mathbf{C}_{j_k,k} \end{bmatrix} = Lk = M \tag{5} \]

for any distinct $j_1, j_2, \ldots, j_k \in \{1, 2, \ldots, n\}$.

The MDS property ensures that the storage system can tolerate up to $(n - k)$ failures (erasures), since all the sources can be reconstructed from any $k$ nodes whose indices are represented by $j_1, j_2, \ldots, j_k \in \{1, 2, \ldots, n\}$. Now, consider the case where a single systematic node, say node $i \in \{1, 2, \ldots, k\}$, fails. The goal here is to reconstruct the failed node $i$, i.e., to reconstruct $\mathbf{d}_i$, using all the other $n - 1$ nodes, i.e., $\{\mathbf{d}_j : j \ne i\}$. To understand the solution, first consider the case where node 1 fails. We download a fraction $\frac{1}{n-k}$ of the data stored in each of the nodes $\{1, 2, 3, \ldots, n\} - \{1\}$, so that the total repair bandwidth is $\frac{n-1}{n-k}$ units. We focus on linear repair solutions for our codes, which implies that we need to download $\frac{L}{n-k}$ linear combinations from each of $\mathbf{d}_j, j \in \{2, 3, \ldots, n\}$. Specifically, we denote the linear combinations downloaded from node $j \in \{2, 3, \ldots, n\}$ as

\[ \mathbf{V}_{1,j}\mathbf{d}_j = \mathbf{V}_{1,j}\sum_{i=1}^{k}\mathbf{C}_{j,i}\mathbf{a}_i = \underbrace{\mathbf{V}_{1,j}\mathbf{C}_{j,1}\mathbf{a}_1}_{\text{Desired signal component}} + \underbrace{\sum_{i=2}^{k}\mathbf{V}_{1,j}\mathbf{C}_{j,i}\mathbf{a}_i}_{\text{Interference component}}, \]

where $\mathbf{V}_{1,j}$ is an $\frac{L}{n-k} \times L$ dimensional matrix. The matrices $\mathbf{V}_{1,j}$ are referred to as repair matrices in this paper. The goal of the problem is to reconstruct the $L$ components of $\mathbf{a}_1$ from the above equations. For a systematic node $j \in \{2, 3, \ldots, k\}$, the equations downloaded by the new node do not contain information about the desired signal $\mathbf{a}_1$, since for these nodes $\mathbf{C}_{j,1} = \mathbf{0}$. The linear combinations downloaded from the remaining nodes $j \in \{k+1, k+2, \ldots, n\}$, however, contain components of both the desired signal and the interference. Thus, the downloaded linear combinations $\mathbf{V}_{1,j}\mathbf{d}_j$ are of two types.

1) The data downloaded from the surviving systematic nodes $j = 2, \ldots, k$ contain no information about the desired signal $\mathbf{a}_1$, i.e., $\mathbf{V}_{1,j}\mathbf{d}_j = \mathbf{V}_{1,j}\mathbf{a}_j,\ j = 2, \ldots, k$. Note that there are $\frac{L}{n-k}$ such linear combinations of each interfering component $\mathbf{a}_j, j = 2, 3, \ldots, k$.

2) The $L$ components of the desired signal have to be reconstructed using the $(n-k)\frac{L}{n-k} = L$ linear combinations of the form $\mathbf{V}_{1,j}\mathbf{d}_j,\ j = k+1, k+2, \ldots, n$. Note that these linear combinations also contain the interference terms $\mathbf{a}_j, j = 2, \ldots, k$, which need to be cancelled.

The goal of our solution will be to completely cancel the interference from the second set of $L$ linear combinations, using the former set of linear combinations, and then to regenerate $\mathbf{a}_1$ using the latter $L$ interference-free linear combinations.

A. Interference Cancellation

The linear combinations corresponding to interference component $\mathbf{a}_i, i \ne 1$, downloaded from node $i$ by the new node are $\mathbf{V}_{1,i}\mathbf{a}_i$ for $i = 2, 3, \ldots, k$. To cancel the associated interference from all the remaining downloads $\mathbf{V}_{1,j}\mathbf{d}_j$ by linear techniques, we will need, for all $j = k+1, k+2, \ldots, n$ and $i = 2, 3, \ldots, k$,

\[ \mathrm{rowspan}(\mathbf{V}_{1,j}\mathbf{C}_{j,i}) \subseteq \mathrm{rowspan}(\mathbf{V}_{1,i}) \ \Rightarrow\ \mathrm{rowspan}(\mathbf{V}_{1,j}\mathbf{C}_{j,i}) = \mathrm{rowspan}(\mathbf{V}_{1,i}), \tag{6} \]

where the final equality in (6) follows because the $\mathbf{C}_{j,i}$ are all full rank matrices and therefore the subset relation automatically implies the equality relation, as $\mathrm{rank}(\mathbf{V}_{1,j}\mathbf{C}_{j,i}) = \mathrm{rank}(\mathbf{V}_{1,j}) = \frac{L}{n-k} = \mathrm{rank}(\mathbf{V}_{1,i})$. Thus, as long as (6) is satisfied for all values $j \in \{k+1, k+2, \ldots, n\}, i \in \{2, 3, \ldots, k\}$, the interference components can be completely cancelled from $\mathbf{V}_{1,j}\mathbf{d}_j$ to obtain $\mathbf{V}_{1,j}\mathbf{C}_{j,1}\mathbf{a}_1,\ j \in \{k+1, k+2, \ldots, n\}$. Now, we need to ensure that the desired $L \times 1$ vector $\mathbf{a}_1$ can be uniquely resolved from the $L$ linear combinations of the form $\mathbf{V}_{1,j}\mathbf{C}_{j,1}\mathbf{a}_1,\ j = k+1, k+2, \ldots, n$. In other words, we need to ensure that

\[ \mathrm{rank}\begin{bmatrix} \mathbf{V}_{1,k+1}\mathbf{C}_{k+1,1} \\ \mathbf{V}_{1,k+2}\mathbf{C}_{k+2,1} \\ \vdots \\ \mathbf{V}_{1,n}\mathbf{C}_{n,1} \end{bmatrix} = L. \tag{7} \]

If we construct $\mathbf{C}_{l,j}$ and $\mathbf{V}_{1,i}$ satisfying (6) and (7) for $i = 2, \ldots, n$, $j = 1, 2, \ldots, k$, $l = 1, 2, 3, \ldots, n$, then a failure of node 1 can be repaired with the desired minimum repair bandwidth. To solve the problem for the failure of any other systematic node, we need to ensure similar conditions. We summarize all the conditions required for successful reconstruction of a single failed (systematic) node with the minimum repair bandwidth below.

- Equation (5) in Property 1.
- The interference alignment relations
\[ \mathrm{rowspan}(\mathbf{V}_{l,j}\mathbf{C}_{j,i}) = \mathrm{rowspan}(\mathbf{V}_{l,i}) \]
for $l = 1, 2, \ldots, k$, $j = k+1, k+2, \ldots, n$ and $i \in \{1, 2, \ldots, k\} - \{l\}$.
- Reconstruction of the failed node, given that the alignment relations are satisfied:
\[ \mathrm{rank}\begin{bmatrix} \mathbf{V}_{l,k+1}\mathbf{C}_{k+1,l} \\ \mathbf{V}_{l,k+2}\mathbf{C}_{k+2,l} \\ \vdots \\ \mathbf{V}_{l,n}\mathbf{C}_{n,l} \end{bmatrix} = L \tag{8} \]
for $l = 1, 2, \ldots, k$.

Note that, given $n, k$, our design choices are $L$, $q$, $\mathbf{C}_{j,i}$ and $\mathbf{V}_{l,j}$ for $l = 1, 2, \ldots, k$, $j = k+1, k+2, \ldots, n$ and $i \in \{1, 2, \ldots, k\} - \{l\}$. Reference [7] has shown that the above conditions can NOT be satisfied if we restrict ourselves to $M = k(n-k)$. References [9], [10] constructed solutions which satisfied the above relations in an asymptotically exact manner, as $M \to \infty$. The main contribution of this paper is the construction of coding matrices and repair matrices so that the above relations are satisfied exactly, with finite $M$, i.e., with $M = k(n-k)^k$.

B. Characteristics of Our Solution

Above, we have defined a general structure for a linear, repair-bandwidth optimal solution. For the solution described in this paper, the repair matrices satisfy a set of additional properties described here. First, in our solution, $\mathbf{V}_{l,j} = \mathbf{V}_{l,j'}$ for all $l \in \{1, 2, \ldots, k\}$, $j \ne j'$, $j, j' \in \{1, 2, \ldots, n\} - \{l\}$. In other words, when a node, say node $l$, fails, we download the same linear combination from every surviving node. We use the notation $\mathbf{V}_l = \mathbf{V}_{l,j}$ for all $j \in \{1, 2, \ldots, n\} - \{l\}$. The second property satisfied by our solution is that it is disk-access optimal. This notion is explained further next.

C. Disk-Access Optimality

Definition 1: Consider a set of $L \times L$ dimensional coding matrices $\mathbf{C}_{i,j}$, $i = k+1, k+2, \ldots, n$, $j = 1, 2, \ldots, k$, and a set of repair matrices $\mathbf{W}_{l,i}$ for some $l \in \{1, 2, \ldots, n\}$ and for all $i \in \{1, 2, \ldots, n\} - \{l\}$, where the repair matrix $\mathbf{W}_{l,i}$ has dimension $B_{l,i} \times L$, with $B_{l,i} \le L$. The repair matrices satisfy the property that $\mathbf{d}_l$ can be reconstructed linearly from $\mathbf{W}_{l,i}\mathbf{d}_i$, $i \in \{1, 2, \ldots, n\} - \{l\}$. In other words, a failure of node $l$ can be repaired using the repair matrices. Then the amount of disk access required for the repair of node $l$ is defined to be the quantity

\[ \sum_{i \in \{1, 2, \ldots, n\} - \{l\}} \omega(\mathbf{W}_{l,i}), \]

where $\omega(\mathbf{A})$ represents the number of non-zero columns of the matrix $\mathbf{A}$.

Fig. 2. The two parity nodes in the (5, 3) permutation code. For each $x_1, x_2, x_3 \in \{0, 1\}$, row $\langle x_1, x_2, x_3 \rangle$ of node 4 stores $\lambda_{4,1}a_1(\langle x_1, x_2, x_3 \rangle) + \lambda_{4,2}a_2(\langle x_1, x_2, x_3 \rangle) + \lambda_{4,3}a_3(\langle x_1, x_2, x_3 \rangle)$, and row $\langle x_1, x_2, x_3 \rangle$ of node 5 stores $\lambda_{5,1}a_1(\langle x_1 \oplus 1, x_2, x_3 \rangle) + \lambda_{5,2}a_2(\langle x_1, x_2 \oplus 1, x_3 \rangle) + \lambda_{5,3}a_3(\langle x_1, x_2, x_3 \oplus 1 \rangle)$.

To compute $\mathbf{W}_{l,i}\mathbf{d}_i$, only $\omega(\mathbf{W}_{l,i})$ entries of the vector $\mathbf{d}_i$ have to be accessed. This leads to the above definition for the amount of disk access for a linear solution. Also, note that $\mathrm{rank}(\mathbf{W}_{l,i})$, the amount of bandwidth used, is never larger than $\omega(\mathbf{W}_{l,i})$. Therefore, the amount of disk access is at least as large as the amount of bandwidth used for a given solution. This leads to the following lemma.

Lemma 1: For any $(n, k)$ MDS code storing 1 unit of data in each disk, the amount of disk access needed to repair any single failed node $l = 1, 2, \ldots, n$ is at least as large as $\frac{n-1}{n-k}$ units.

It turns out that our solution is not only repair-bandwidth optimal, but also optimal in terms of disk access. More formally, for our solution, $\mathbf{V}_j$ not only has a rank of $L/(n-k)$, it also has exactly $L/(n-k)$ non-zero columns; in fact, $\mathbf{V}_j$ has exactly $L/(n-k)$ non-zero entries.
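The disk-access measure of Definition 1 is straightforward to compute. The following minimal sketch, with hypothetical helper names `omega` and `disk_access`, counts non-zero columns for the toy example of Section II-A (sending $a_1 + a_2$) and for the repair matrix $\mathbf{V}_1 = [1\ 0]$ of the $(4, 2)$ code. Matrices are plain lists of rows; this representation is an assumption made for illustration.

```python
def omega(w):
    """Number of non-zero columns of the matrix w (given as a list of rows)."""
    return sum(1 for col in zip(*w) if any(col))


def disk_access(repair_matrices):
    """Sum of omega(W_{l,i}) over the surviving nodes i (keys of the dict)."""
    return sum(omega(w) for w in repair_matrices.values())


# Toy example of Section II-A: one downloaded symbol a_1 + a_2,
# but both stored symbols must be read.
w_sum = [[1, 1]]

# Repair matrix V_1 = [1 0] of the (4, 2) code: one symbol downloaded, one read.
v1 = [[1, 0]]

if __name__ == "__main__":
    print("omega([1 1]) =", omega(w_sum))
    print("omega([1 0]) =", omega(v1))
    # Repair of node 1 in the (4, 2) code uses V_1 at nodes 2, 3 and 4.
    print("disk access, in scalars:", disk_access({2: v1, 3: v1, 4: v1}))
```

In the $(4, 2)$ code, 1 unit equals 2 scalars, so a disk access of 3 scalars corresponds to 3/2 units, which meets the bound of Lemma 1 with equality.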
Among the L columns of V j, L L columns are zero This means that, to obtain the linear combination V l d i from node i for repair of node l i, only L entries of the node i has to be accessed We now proceed to describe our solution V PERMUTATION CODE In this section, we describe a set of random codes based on permutation matrices satisfying the desired properties described in the previous section We begin with some preliminary notations required for our description Notations and Preliminary Definitions: The bold font is used for vectors and matrices and the regular font is reserved for scalars Given a l 1 dimensional vector a its l components are denoted by a = a(1) a(2) a(l) For example, d 1 = [d 1 (1) d 1 (2) d 1 (L)] T Given a set A, the l-dimensional Cartesian product of the set is denoted by A l The notation I l denotes the l l identity matrix; the subscript l is dropped when the size l is clear from the context Next, we define a set of functions which will be useful in the description of permutation codes Given (n,k) and a number m {1,2,,(n k) k }, we define a function 3 φ : {1,2,,(n k) k {0,1,,(n k 1)} k such that φ(m) is the unique k dimensional vector whose k components represent the 3 While the functions defined here are parametrized by n,k, these quantities are not explicitly denoted here for brevity of notation

9 k-length representation of m 1 in base () In other words φ(m) = (r 1,r 2,,r k ) m 1 = k r i () i 1, where r i {0,1,,( 1)} Further, we denote the ith component of φ(m) by φ i (m), for i = 1,2,,k Since the k-length representation of a number in base (n k) is unique, φ and φ i are well defined functions Further, φ is invertible and its inverse is denoted by φ 1 We also use the following compressed notation for φ 1 r 1,r 2,,r k = φ 1 (r 1,r 2,,r k ) = i=1 k r i () i 1 1 The definition of the above functions will be useful in constructing our codes A Example : n=5, k=3 We motivate our code by first considering the case where k = 3,n = 5 for simplicity The extension of the code to arbitrary n,k will follow later 4 For n = 5,k = 3, we have M/k = (n k) k = 2 3 = 8 As the name suggests, we use scaled permutation matrices for C i,j,j {1,2,,k},i {k +1,k +2,,n} Note here that the variables a j,j = 1,2,,k are () k 1 dimensional vectors We represent the () k = 8 components these vectors by the k = 3 bit representation of their indices as a j = (a j (1) a j (2) a j (8)) T = I 8 = i=1 a j ( 0,0,0 ) a j ( 0,0,1 ) a j ( 0,1,0 ) a j ( 1,0,0 ) a j ( 1,0,1 ) a j ( 1,1,0 ) a j ( 1,1,1 ) for all j = 1,2,,k Now, similarly, we can denote the identity matrix as e(1) e( 0,0,0 ) e(2) e( 0,0,1 ) e(8) = e( 1,1,1 ) where, naturally,e(i) is the ith row of the identity matrix Now, we describe our code as follows Since the first three storage nodes are systematic nodes and the remaining two are parity nodes, the design parameters are C 4,j,C 5,j,V j for j = 1,2,3 We choose C 4,j = λ 4,j I so that d 4 = 3 λ 4,j a j, j=1 where λ 4,j are independent random scalars chosen using a uniform distribution over the field F q Now, consider 4 Optimal codes for n = 5,k = 3 have been proposed in [8], [18] We only use this case to demonstrate our construction in a simple setting,

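the $8 \times 8$ permutation matrix $P_i$ defined as

$$
P_1 = \begin{bmatrix} e(\langle 1,0,0 \rangle) \\ e(\langle 1,0,1 \rangle) \\ e(\langle 1,1,0 \rangle) \\ e(\langle 1,1,1 \rangle) \\ e(\langle 0,0,0 \rangle) \\ e(\langle 0,0,1 \rangle) \\ e(\langle 0,1,0 \rangle) \\ e(\langle 0,1,1 \rangle) \end{bmatrix}, \quad
P_2 = \begin{bmatrix} e(\langle 0,1,0 \rangle) \\ e(\langle 0,1,1 \rangle) \\ e(\langle 0,0,0 \rangle) \\ e(\langle 0,0,1 \rangle) \\ e(\langle 1,1,0 \rangle) \\ e(\langle 1,1,1 \rangle) \\ e(\langle 1,0,0 \rangle) \\ e(\langle 1,0,1 \rangle) \end{bmatrix}, \quad
P_3 = \begin{bmatrix} e(\langle 0,0,1 \rangle) \\ e(\langle 0,0,0 \rangle) \\ e(\langle 0,1,1 \rangle) \\ e(\langle 0,1,0 \rangle) \\ e(\langle 1,0,1 \rangle) \\ e(\langle 1,0,0 \rangle) \\ e(\langle 1,1,1 \rangle) \\ e(\langle 1,1,0 \rangle) \end{bmatrix}.
$$

Then, the fifth node (i.e., the second parity node) is designed as

$$
d_5 = \sum_{j=1}^{3} \lambda_{5,j} P_j a_j,
$$

where $\lambda_{5,j}$ are random independent scalars drawn uniformly over the entries of the field $\mathbb{F}_q$. In other words, we have $C_{5,j} = \lambda_{5,j} P_j$, $j = 1,2,3$. The code is depicted in Figure 2.

For a better understanding of the structure of the permutations, consider an arbitrary column vector $a = [a(1)\ a(2)\ \ldots\ a(8)]^T$. Then,

$$
P_1 a = \begin{bmatrix} a(\langle 1,0,0 \rangle) \\ a(\langle 1,0,1 \rangle) \\ a(\langle 1,1,0 \rangle) \\ a(\langle 1,1,1 \rangle) \\ a(\langle 0,0,0 \rangle) \\ a(\langle 0,0,1 \rangle) \\ a(\langle 0,1,0 \rangle) \\ a(\langle 0,1,1 \rangle) \end{bmatrix}
= \begin{bmatrix} a(5) \\ a(6) \\ a(7) \\ a(8) \\ a(1) \\ a(2) \\ a(3) \\ a(4) \end{bmatrix}.
$$

In other words, $P_1$ is a permutation of the components of $a$ such that the element $a(\langle 1,x_2,x_3 \rangle)$ is swapped with the element $a(\langle 0,x_2,x_3 \rangle)$ for $x_2,x_3 \in \{0,1\}$. Similarly, $P_2$ swaps $a(\langle x_1,0,x_3 \rangle)$ with $a(\langle x_1,1,x_3 \rangle)$ and $P_3$ swaps $a(\langle x_1,x_2,0 \rangle)$ with $a(\langle x_1,x_2,1 \rangle)$, where $x_1,x_2,x_3 \in \{0,1\}$.

Now, we show that this code can be used to achieve optimal recovery, in terms of repair bandwidth, for a single failed systematic node. To see this, consider the case where node 1 fails. Note that for optimal repair, the new node has to download a fraction $\frac{1}{n-k} = \frac{1}{2}$ of every surviving node, i.e., nodes 2, 3, 4, 5. The repair strategy is to download $d_i(\langle 0,0,0 \rangle), d_i(\langle 0,0,1 \rangle), d_i(\langle 0,1,0 \rangle), d_i(\langle 0,1,1 \rangle)$ from node $i \in \{2,3,4,5\}$, so that

$$
V_1 = \begin{bmatrix} e(\langle 0,0,0 \rangle) \\ e(\langle 0,0,1 \rangle) \\ e(\langle 0,1,0 \rangle) \\ e(\langle 0,1,1 \rangle) \end{bmatrix}
= \begin{bmatrix} e(1) \\ e(2) \\ e(3) \\ e(4) \end{bmatrix}.
$$

In other words, the rows of $V_1$ come from the set $\{e(\langle 0,x_2,x_3 \rangle) : x_2,x_3 \in \{0,1\}\}$. Note that the strategy downloads half the data stored in every surviving node, as required. With these download vectors, it can be observed (see Figure 3) that the interference is aligned as required and all 8 components of the desired signal $a_1$ can be reconstructed. Specifically, we note that

$$
\operatorname{rowspan}(V_1 C_{4,i}) = \operatorname{rowspan}(V_1 C_{5,i}) = \operatorname{span}(\{e(\langle 0,x_2,x_3 \rangle) : x_2,x_3 \in \{0,1\}\}) \tag{9}
$$

for $i = 2,3$. Put differently, because of the structure of the permutations, the downloaded components can be expressed as

$$
\begin{aligned}
d_4(\langle 0,x_2,x_3 \rangle) &= \lambda_{4,1} a_1(\langle 0,x_2,x_3 \rangle) + \lambda_{4,2} a_2(\langle 0,x_2,x_3 \rangle) + \lambda_{4,3} a_3(\langle 0,x_2,x_3 \rangle), \\
d_5(\langle 0,x_2,x_3 \rangle) &= \lambda_{5,1} a_1(\langle 1,x_2,x_3 \rangle) + \lambda_{5,2} a_2(\langle 0,x_2 \oplus 1,x_3 \rangle) + \lambda_{5,3} a_3(\langle 0,x_2,x_3 \oplus 1 \rangle).
\end{aligned}
$$

Note that since $x_2,x_3 \in \{0,1\}$, there are a total of 8 components described in the two equations above, such that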
all the interference is of the form $a_i(\langle 0,y_2,y_3 \rangle)$, $i \in \{2,3\}$, $y_2,y_3 \in \{0,1\}$. In other words, the interference from $a_i$, $i = 2,3$, comes from only half its components, and the interference is aligned as described in (9). However, note that the 8 components span all the 8 components of the desired signal $a_1$. Thus, the interference can be completely cancelled and the desired signal can be completely reconstructed.

Fig. 3. Contents of parity nodes 4 and 5, with desired signals and aligned interference marked. Shaded portions indicate downloaded portions used to recover failure of node 1. Note that the undesired symbols can be cancelled by downloading half the components of $a_2, a_3$, i.e., by downloading $a_2(\langle 0,x_1,x_2 \rangle)$ and $a_3(\langle 0,x_1,x_2 \rangle)$ for $x_1,x_2 \in \{0,1\}$.

Similarly, in case of failure of node 2, the set of rows of the repair matrix $V_2$ is equal to the set $\{e(\langle x_1,0,x_3 \rangle) : x_1,x_3 \in \{0,1\}\}$, i.e.,

$$
V_2 = \begin{bmatrix} e(\langle 0,0,0 \rangle) \\ e(\langle 0,0,1 \rangle) \\ e(\langle 1,0,0 \rangle) \\ e(\langle 1,0,1 \rangle) \end{bmatrix}
= \begin{bmatrix} e(1) \\ e(2) \\ e(5) \\ e(6) \end{bmatrix}.
$$

With this set of download vectors, it can be noted that, for $i = 1,3$,

$$
\operatorname{rowspan}(V_2 C_{4,i}) = \operatorname{rowspan}(V_2 C_{5,i}) = \operatorname{span}(\{e(\langle x_1,0,x_3 \rangle) : x_1,x_3 \in \{0,1\}\}) \tag{10}
$$

so that the interference is aligned. It can be verified that the desired signal can be reconstructed completely because condition (8) is satisfied as well. The rows of $V_3$ come from the set $\{e(\langle x_1,x_2,0 \rangle) : x_1,x_2 \in \{0,1\}\}$. Equations (6) and (8) can be verified to be satisfied for this choice of $V_3$, with the alignment condition taking the form, for $i = 1,2$,

$$
\operatorname{rowspan}(V_3 C_{4,i}) = \operatorname{rowspan}(V_3 C_{5,i}) = \operatorname{span}(\{e(\langle x_1,x_2,0 \rangle) : x_1,x_2 \in \{0,1\}\}). \tag{11}
$$

While this shows that optimal repair is achieved, all that remains to be shown is that the code is an MDS code, i.e., Property 1. This is shown in Appendix I, for the generalization of this code to arbitrary values of $(n,k)$. Next, we describe this generalization.

B. The (n,k) Permutation Code

This is a natural generalization of the (5,3) code to general values of $(n,k)$, with $L = (n-k)^k$. To describe this generalization, we define the function

$$
\chi_i(m) = \langle \phi_1(m), \phi_2(m), \ldots, \phi_{i-1}(m), \phi_i(m) \oplus 1, \phi_{i+1}(m), \phi_{i+2}(m), \ldots, \phi_k(m) \rangle,
$$

where the operator $\oplus$ represents addition modulo $(n-k)$. In other words, $\chi_i(m)$ essentially modifies the $i$th position in the base-$(n-k)$ representation of $m-1$, by addition of 1 modulo $(n-k)$.

Remark 1: For the (5,3) Permutation Code described previously, note that the $m$th row of $P_i$ is $e(\chi_i(m))$. In other words, for the (5,3) Permutation Code described above, the $m$th component of $P_i a$ is equal to $a(\chi_i(m))$.

Remark 2: $\chi_i(1), \chi_i(2), \ldots, \chi_i((n-k)^k)$ is a permutation of $1,2,\ldots,(n-k)^k$ for any $i \in \{1,2,\ldots,k\}$. Therefore, given an $L \times 1$ vector $a$,

$$
\left[ a(\chi_i(1)),\ a(\chi_i(2)),\ \ldots,\ a(\chi_i((n-k)^k)) \right]^T
$$
is a permutation of $a$. We will use this permutation to construct our codes.

In this code, we have $L = M/k = (n-k)^k$, so that the $k$ sources $a_1,a_2,\ldots,a_k$ are all $(n-k)^k \times 1$ vectors and the coding matrices are $(n-k)^k \times (n-k)^k$ matrices. Consider the permutation matrix $P_i$ defined as

$$
P_i = \begin{bmatrix} e(\chi_i(1)) \\ e(\chi_i(2)) \\ \vdots \\ e(\chi_i((n-k)^k)) \end{bmatrix}
$$

for $i = 1,2,\ldots,k$, where $e(1),e(2),\ldots,e((n-k)^k)$ are the rows of the identity matrix $I_{(n-k)^k}$. Note that because of Remark 2, the above matrix is indeed a permutation matrix. Then, the coding matrices are defined as

$$
C_{j,i} = \lambda_{j,i} P_i^{\,j-k-1}.
$$

To understand the structure of the above permutation, consider an arbitrary column vector $a = (a(1)\ a(2)\ \ldots\ a((n-k)^k))^T$, and let $j = \langle x_1,x_2,x_3,\ldots,x_k \rangle$ for $1 \leq j \leq (n-k)^k$. Then, the $j$th component of $P_i a$ is

$$
a(\langle x_1,x_2,\ldots,x_{i-1},x_i \oplus 1,x_{i+1},\ldots,x_k \rangle).
$$

Thus, we can write

$$
\begin{aligned}
d_{k+r+1}(\langle x_1,x_2,\ldots,x_k \rangle) ={}& \lambda_{k+r+1,1}\, a_1(\langle x_1 \oplus r,x_2,x_3,\ldots,x_k \rangle) + \lambda_{k+r+1,2}\, a_2(\langle x_1,x_2 \oplus r,x_3,\ldots,x_k \rangle) \\
& + \cdots + \lambda_{k+r+1,k}\, a_k(\langle x_1,x_2,x_3,\ldots,x_k \oplus r \rangle)
\end{aligned} \tag{12}
$$

where $r \in \{0,1,2,\ldots,n-k-1\}$. This describes the coding matrices.

Now, in case of failure of node $l$, the rows of the repair matrix $V_l$ are chosen from the set $\{e(m) : \phi_l(m) = 0\}$. Since $\phi_l(m)$ can take $n-k$ values, this construction has $\frac{L}{n-k} = (n-k)^{k-1}$ rows for $V_l$, as required. Because of the construction, we have the following interference alignment relation for $i \neq l$, $j \in \{k+1,k+2,\ldots,n\}$:

$$
\operatorname{rowspan}(V_l C_{j,i}) = \operatorname{rowspan}(\{e(m) : \phi_l(m) = 0\}).
$$

Further,

$$
\operatorname{rowspan}(V_l C_{j,l}) = \operatorname{rowspan}(\{e(m) : \phi_l(m) = j-k-1\})
$$

for $j \in \{k+1,k+2,\ldots,n\}$, so that (8) is satisfied and the desired signal can be reconstructed from the interference. All that remains to be shown is the MDS property. This is shown in Appendix I.

VI. EXPLICIT CONSTRUCTION OF PERMUTATION CODES FOR $n-k \in \{2,3\}$

While, theoretically, any $(n,k)$ MDS code could be used to build distributed storage systems, in practice, the case of a small number of parity nodes, i.e., small values of $n-k$, is of special interest. In fact, a significant portion of the literature on the use of codes for storage systems is devoted to building codes for the cases of $n-k \in \{2,3\}$ with desirable properties (see, for example, [1]-[4]). While these references focused on constructing MDS codes with efficient encoding and decoding properties, here, we study the construction of MDS codes for $n-k \in \{2,3\}$ with desirable repair properties. In the previous section, we provided random code constructions based on permutation matrices. In this section, we further strengthen our constructions by providing explicit code constructions for the important case of $n-k \in \{2,3\}$.

Note that the codes constructed earlier were random constructions because the scalars $\lambda_{j,i}$ were picked randomly from the field. Further, note that, as long as $\lambda_{j,i}$, $j = k+1,k+2,\ldots,n$, $i = 1,2,\ldots,k$, are any set of non-zero scalars, the repair bandwidth for failure of a single systematic node is $\frac{n-1}{n-k}$ units, as required. The randomness of the scalars $\lambda_{j,i}$ was used in the previous section to show the existence of codes which satisfy the MDS property. In this section, for the two cases of $n-k = 2$ and $n-k = 3$, we choose these scalars explicitly
(i.e., not randomly) so that the MDS property is satisfied. For both cases, the scalars $\lambda_{j,i}$ are chosen as

$$
\lambda_{j,i} = \lambda_i^{\,j-k-1} \tag{13}
$$

so that we have

$$
C_{j,i} = (\lambda_i P_i)^{j-k-1}
$$

for $j \in \{k+1,k+2,\ldots,n\}$, $i = 1,2,\ldots,k$.

If $n-k = 2$, we choose $q \geq k+1$ and choose $\lambda_1,\lambda_2,\ldots,\lambda_k$ to be distinct non-zero elements of the field. With this choice of scalars, in Appendix II, we show that the code satisfies the MDS property. For example, for the case of $n = 5$, $k = 3$ described previously, we can choose the field size $q = 4$ and

$$
\lambda_{4,1} = \lambda_{4,2} = \lambda_{4,3} = 1, \qquad \lambda_{5,1} = 1,\ \lambda_{5,2} = 2,\ \lambda_{5,3} = 3.
$$

This choice of scalars $\lambda_{j,i}$ ensures the MDS property, as shown in Appendix II.

For $n-k = 3$, we choose $\lambda_1,\lambda_2,\ldots,\lambda_k$ to be $k$ non-zero elements in the field $\mathbb{F}_q$ satisfying

$$
\lambda_i \neq \lambda_j, \qquad \lambda_i + \lambda_j \neq 0 \tag{14}
$$

for all $i \neq j$, $i,j \in \{1,2,\ldots,k\}$. Note that elements $\lambda_i$ satisfying the above conditions can be found by choosing $q \geq 2k+1$ and ensuring that $\lambda_i + \lambda_i' = 0 \Rightarrow \lambda_i' \notin \{\lambda_1,\lambda_2,\ldots,\lambda_k\}$. In Appendix II, we also show that the code described here for the case of $n-k = 3$ is an MDS code.

VII. CONCLUSION

In this paper, we provide a class of MDS codes with optimal repair properties, in terms of repair bandwidth, for a single failed systematic node. We show that our codes are optimal, not only in terms of repair bandwidth, but also in terms of the amount of disk access during the recovery of a single failed node. We also provide an explicit construction over relatively small field sizes for the special cases of storage systems with 2 parity nodes or 3 parity nodes. Since we effectively provide the first set of minimum storage regenerating (MSR) codes for arbitrary $(n,k)$, this work can be viewed as a stepping stone towards implementation of MDS codes in distributed storage systems.

From a theoretical perspective, since coding matrices in the storage setup are analogous to channel matrices in wireless channels, our codes may be viewed as a tool to create interference alignment toy examples for wireless channels. An interesting exercise in this direction would be to test if our example constructions could be modified, perhaps via an appropriate change of basis, to generate constructions satisfying properties desired for ergodic interference alignment [19]. Such an exercise could lead to extensions of the powerful idea of ergodic interference alignment to more general contexts.

From the perspective of storage systems, there remain several unanswered questions. First, the existence of finite codes which can achieve more efficient repair of parity nodes as well, along with systematic nodes, remains open. Second, we assume that the new node connects to all $d = n-1$ surviving nodes in the system. An interesting question is whether finite code constructions can be found to conduct efficient repair when the new node is restricted to connect to a subset of the surviving nodes. While asymptotic constructions satisfying the lower bounds have been found for both these problems, the existence of finite codes satisfying these properties remains open. Finally, the search for repair strategies of existing codes, which is analogous to the search for interference alignment beamforming vectors for fixed channel matrices in the context of interference channels, remains open. While iterative techniques exist for the wireless context [20], [21], they cannot be directly extended to the storage context because of the discrete nature of the optimization problem in the latter context. Such algorithms, while explored in the context of certain classes of codes in [22], [23], are an interesting area of future work.

APPENDIX I
MDS PROPERTY

We intend to show that the determinant of the matrix in (5) is a non-zero polynomial in $\Lambda = \{\lambda_{j,i},\ j = k+1,k+2,\ldots,n,\ i = 1,2,\ldots,k\}$ for any $j_1,j_2,\ldots,j_k \in \{1,2,\ldots,n\}$. If we show this, then each MDS constraint corresponds to showing that a polynomial $p_{j_1,j_2,\ldots,j_k}(\Lambda)$ is non-zero. Using the Schwartz-Zippel Lemma on the product of these polynomials, $\prod_{j_1,j_2,\ldots,j_k} p_{j_1,j_2,\ldots,j_k}(\Lambda)$, automatically implies the existence of $\Lambda$ so that the
MDS constraints are satisfied, in a sufficiently large field. Therefore, all that remains to be shown is that the determinant of (5) is a non-zero polynomial in $\Lambda$. We will show this by showing that there exists at least one set of values for the variables $\Lambda$ such that the determinant of (5) is non-zero.

To show this, we first assume, without loss of generality, that $j_1,j_2,\ldots,j_k$ are in ascending order. Also, let $j_1,j_2,\ldots,j_{k-m} \in \{1,2,\ldots,k\}$ and $j_{k-m+1},j_{k-m+2},\ldots,j_k \in \{k+1,k+2,\ldots,n\}$. For simplicity we will assume that $j_1 = 1, j_2 = 2, \ldots, j_{k-m} = k-m$. The proof for any other set $\{j_1,j_2,\ldots,j_{k-m}\}$ is almost identical to this case, except for a difference in the indices used henceforth. Substituting the appropriate values of $C_{j,i}$, the matrix in (5) can be written as

$$
\begin{bmatrix}
I & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & I & \cdots & 0 \\
\lambda_{j_{k-m+1},1} P_1^{s_{k-m+1}} & \cdots & \lambda_{j_{k-m+1},k-m} P_{k-m}^{s_{k-m+1}} & \cdots & \lambda_{j_{k-m+1},k} P_k^{s_{k-m+1}} \\
\vdots & & \vdots & & \vdots \\
\lambda_{j_k,1} P_1^{s_k} & \cdots & \lambda_{j_k,k-m} P_{k-m}^{s_k} & \cdots & \lambda_{j_k,k} P_k^{s_k}
\end{bmatrix} \tag{15}
$$

where $s_i = j_i - k - 1$. Now, if

$$
\lambda_{j,i} = \begin{cases} 0 & \text{if } (j,i) \notin \{(j_t,t) : t = k-m+1,k-m+2,\ldots,k\} \\ 1 & \text{otherwise} \end{cases}
$$

then the above matrix is a block diagonal matrix. Therefore, its determinant evaluates to the product of the determinants of its diagonal blocks, i.e., $\prod_{u=k-m+1}^{k} |P_u^{s_u}|$, which is non-zero. This implies that the determinant in (5) is a non-zero polynomial in $\Lambda$, as required. This completes the proof.

APPENDIX II
PROOF OF MDS PROPERTY FOR EXPLICIT CONSTRUCTIONS OF SECTION VI

We need to show Property 1. Before we show this property, we begin with the following lemma, which shows that the coding matrices for the Permutation Code commute.

Lemma 2: $P_i^{m_1} P_j^{m_2} = P_j^{m_2} P_i^{m_1}$, where $P_i$ is chosen as in (12).

Proof: In order to show this, we show that $P_i^{m_1} P_j^{m_2} a = P_j^{m_2} P_i^{m_1} a$ for any $L \times 1$ column vector $a$, where $L = (n-k)^k$. Assuming without loss of generality that $i < j$, the $\langle r_1,r_2,\ldots,r_k \rangle$th element of both $P_i^{m_1} P_j^{m_2} a$ and $P_j^{m_2} P_i^{m_1} a$ can be verified to be

$$
a(\langle r_1,r_2,\ldots,r_{i-1},r_i \oplus m_1,r_{i+1},\ldots,r_{j-1},r_j \oplus m_2,r_{j+1},\ldots,r_k \rangle).
$$

Now, we proceed to show Property 1 for $n-k \in \{2,3\}$. Without loss of generality, we assume that $j_1,j_2,\ldots,j_k$ are in ascending order.

Case 1: $n-k = 2$. We divide this case into 2 scenarios. In the first scenario, $j_1,j_2,\ldots,j_{k-1} \in \{1,2,\ldots,k\}$ and $j_k \in \{k+1,k+2\}$. Note that this corresponds to reconstructing the data from $k-1$ systematic nodes and a single parity node. Now, substituting this in equation (15) in Appendix I, and expanding this determinant along the first $(k-1)L$ columns, we get this determinant to be equal to $|C_{j_k,i}|$. Therefore, the desired property is
equivalent to the matrix $C_{j,i} = (\lambda_i P_i)^{j-k-1}$ being full rank for all $j \in \{k+1,k+2,\ldots,n\}$, $i = 1,2,\ldots,k$. This scenario is, hence, trivial.

Now, in the second scenario, consider the case where $j_1,j_2,\ldots,j_{k-2} \in \{1,2,\ldots,k\}$ and $j_{k-1} = k+1$, $j_k = k+2$. This corresponds to the case where the original sources are reconstructed using $k-2$ systematic nodes and both parity nodes. By substituting in (15) and expanding along the first $(k-2)L$ rows, the MDS property can be shown to be equivalent to showing that the matrix

$$
\begin{bmatrix} I & I \\ \lambda_i P_i & \lambda_j P_j \end{bmatrix}
$$

has full rank. Now, note that the matrices $P_i$ and $P_j$ commute by Lemma 2. On noting that the determinant of commuting block matrices can be evaluated by using the element-wise determinant expansion over blocks [24], the determinant of the matrix can be written as

$$
|\lambda_j P_j - \lambda_i P_i| = |\lambda_j P_i| \, |P_j P_i^{-1} - \lambda_i \lambda_j^{-1} I|.
$$

Note that the above expression is equal to 0 if and only if $\lambda_i \lambda_j^{-1}$ is an eigenvalue of the permutation matrix $P_j P_i^{-1}$. Since $P_j P_i^{-1}$ is a permutation matrix, 1 is its only eigenvalue (this holds over fields of characteristic 2, such as the $q = 4$ example of Section VI; in general, the eigenvalues of a permutation matrix are roots of unity). As noted in (13), we have $\lambda_i \neq \lambda_j \Rightarrow \lambda_i \lambda_j^{-1} \neq 1$, and hence, the determinant shown above is non-zero and the matrix is full rank, as required.

Case 2: $n-k = 3$. We divide this case into 5 scenarios as listed below.
1) $j_1,j_2,\ldots,j_{k-1} \in \{1,2,\ldots,k\}$ and $j_k \in \{k+1,k+2\}$
2) $j_1,j_2,\ldots,j_{k-2} \in \{1,2,\ldots,k\}$ and $j_{k-1} = k+1$, $j_k = k+2$
3) $j_1,j_2,\ldots,j_{k-2} \in \{1,2,\ldots,k\}$ and $j_{k-1} = k+2$, $j_k = k+3$
4) $j_1,j_2,\ldots,j_{k-2} \in \{1,2,\ldots,k\}$ and $j_{k-1} = k+1$, $j_k = k+3$
5) $j_1,j_2,\ldots,j_{k-3} \in \{1,2,\ldots,k\}$ and $j_{k-2} = k+1$, $j_{k-1} = k+2$, $j_k = k+3$

Property 1 can be proved to hold in the first two scenarios using arguments similar to Case 1. For the third scenario, again using arguments similar to Case 1, showing the MDS property is equivalent to showing that the matrix

$$
\begin{bmatrix} \lambda_i P_i & \lambda_j P_j \\ \lambda_i^2 P_i^2 & \lambda_j^2 P_j^2 \end{bmatrix}
$$

has full rank. The above matrix has full rank because it is equal to

$$
\begin{bmatrix} I & I \\ \lambda_i P_i & \lambda_j P_j \end{bmatrix}
\begin{bmatrix} \lambda_i P_i & 0 \\ 0 & \lambda_j P_j \end{bmatrix}
$$

and both the matrices of the above product have full rank. Now, for the fourth scenario, we need to show that the matrix

$$
\begin{bmatrix} I & I \\ \lambda_i^2 P_i^2 & \lambda_j^2 P_j^2 \end{bmatrix}
$$

has full rank. This can be seen on noting that the determinant of the above matrix evaluates to

$$
|\lambda_j^2 P_j^2 - \lambda_i^2 P_i^2| = |\lambda_j^2 P_i^2| \, |P_j^2 P_i^{-2} - \lambda_i^2 \lambda_j^{-2} I|,
$$

which is non-zero if $\lambda_i^2 \lambda_j^{-2} \neq 1$. This is ensured because the conditions in (14) imply that $\lambda_i^2 \neq \lambda_j^2$.

Finally, we consider scenario 5, where we need to show that all the information can be recovered from $k-3$ systematic nodes and all 3 parity nodes. For this, we need

$$
\begin{bmatrix} I & I & I \\ \lambda_i P_i & \lambda_j P_j & \lambda_l P_l \\ \lambda_i^2 P_i^2 & \lambda_j^2 P_j^2 & \lambda_l^2 P_l^2 \end{bmatrix}
$$

to have full rank. Note that the above matrix has a block Vandermonde structure, where each of the blocks commute pairwise because of Lemma 2. This fact, combined with the fact that the determinant of commuting block matrices can be expanded in a manner similar to the element-wise determinant expansion, implies that the determinant of the above matrix is equal, up to sign, to the product of the determinants $|\lambda_u P_u - \lambda_v P_v|$ over the pairs of distinct indices $u, v$ among $i, j, l$. The determinant is non-zero since $\lambda_i \neq \lambda_j$ if $i \neq j$. This completes the proof of the desired MDS property.
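As a concrete illustration of the node-1 repair strategy for the (5,3) Permutation Code of Section V, the following Python sketch encodes three random source vectors, downloads only the components indexed by $\langle 0,x_2,x_3 \rangle$ from nodes 2 to 5, and rebuilds $a_1$. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `flip`, `encode` and `repair_node1` are hypothetical, and a prime field of size 7 is assumed here (instead of the $q = 4$ example) so that plain modular arithmetic suffices. It checks only the repair step, not the MDS property.

```python
from itertools import product
import random

Q = 7   # prime field size: an assumption made here for simple modular arithmetic
K = 3   # number of systematic nodes in the (5,3) code
INDICES = list(product((0, 1), repeat=K))  # index tuples <x1, x2, x3>


def flip(x, i):
    """Add 1 modulo 2 to position i (0-based) of the index tuple x."""
    return x[:i] + ((x[i] + 1) % 2,) + x[i + 1:]


def encode(a, lam4, lam5):
    """Parity nodes: d4 = sum_j lam4[j] a_j and d5 = sum_j lam5[j] P_j a_j."""
    d4 = {x: sum(lam4[j] * a[j][x] for j in range(K)) % Q for x in INDICES}
    d5 = {x: sum(lam5[j] * a[j][flip(x, j)] for j in range(K)) % Q
          for x in INDICES}
    return d4, d5


def repair_node1(a2_half, a3_half, d4_half, d5_half, lam4, lam5):
    """Rebuild a_1 from the x1 = 0 components of nodes 2, 3, 4 and 5."""
    def inv(v):
        return pow(v, Q - 2, Q)  # multiplicative inverse in a prime field

    a1 = {}
    for x in d4_half:  # x = (0, x2, x3)
        # Node 4 carries a_1(<0,x2,x3>) plus interference at the same index.
        noise4 = lam4[1] * a2_half[x] + lam4[2] * a3_half[x]
        a1[x] = (d4_half[x] - noise4) * inv(lam4[0]) % Q
        # Node 5 carries a_1(<1,x2,x3>); its interference still has x1 = 0.
        noise5 = lam5[1] * a2_half[flip(x, 1)] + lam5[2] * a3_half[flip(x, 2)]
        a1[flip(x, 0)] = (d5_half[x] - noise5) * inv(lam5[0]) % Q
    return a1


rng = random.Random(0)
a = [{x: rng.randrange(Q) for x in INDICES} for _ in range(K)]
lam4, lam5 = [1, 1, 1], [1, 2, 3]
d4, d5 = encode(a, lam4, lam5)

half = [x for x in INDICES if x[0] == 0]  # indices <0, x2, x3>


def take(node):
    """Keep only the downloaded half of a node's contents."""
    return {x: node[x] for x in half}


recovered = repair_node1(take(a[1]), take(a[2]), take(d4), take(d5), lam4, lam5)
downloaded = 4 * len(half)  # symbols read from nodes 2, 3, 4 and 5
assert recovered == a[0]
assert downloaded == 16     # half of each 8-symbol node
```

The 16 downloaded symbols amount to 2 units when each node stores 8 symbols, which matches the $\frac{n-1}{n-k}$ repair bandwidth stated for $n = 5$, $k = 3$. The interference terms read from node 5 all lie in the downloaded halves of $a_2$ and $a_3$, which is the alignment property expressed in (9).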

REFERENCES

[1] M. Blaum, J. Brady, J. Bruck, and J. Menon, "EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures," in Computer Architecture, 1994, Proceedings of the 21st Annual International Symposium on, Apr. 1994.
[2] C. Huang and L. Xu, "STAR: An efficient coding scheme for correcting triple storage node failures," Computers, IEEE Transactions on, vol. 57, July 2008.
[3] P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, and S. Sankar, "Row-diagonal parity for double disk failure correction," in Proceedings of the 3rd USENIX Symposium on File and Storage Technologies (FAST), pp. 1-14, 2004.
[4] J. S. Plank, "The RAID-6 Liber8tion code," The International Journal of High Performance Computing and Applications, vol. 23, Aug. 2009.
[5] A. Dimakis, P. Godfrey, M. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," in IEEE INFOCOM, May 2007.
[6] Y. Wu and A. Dimakis, "Reducing repair traffic for erasure coding-based storage via interference alignment," in IEEE International Symposium on Information Theory, July.
[7] N. B. Shah, R. K. V., P. V. Kumar, and K. Ramchandran, "Explicit codes minimizing repair bandwidth for distributed storage," CoRR.
[8] C. Suh and K. Ramchandran, "Exact regeneration codes for distributed storage repair using interference alignment," CoRR.
[9] V. R. Cadambe, S. Jafar, and H. Maleki, "Distributed data storage with minimum storage regenerating codes - exact and functional repair are asymptotically equally efficient," CoRR, Apr.
[10] C. Suh and K. Ramchandran, "On the existence of optimal exact-repair MDS codes for distributed storage," CoRR, Apr.
[11] B. Gaston and J. Pujol, "Double circulant minimum storage regenerating codes," CoRR.
[12] M. Maddah-Ali, A. Motahari, and A. Khandani, "Communication over MIMO X channels: Interference alignment, decomposition, and performance analysis," IEEE Trans. on Information Theory, 2008.
[13] S. Jafar and S. Shamai, "Degrees of freedom region for the MIMO X channel," IEEE Trans. on Information Theory, vol. 54, Jan. 2008.
[14] V. Cadambe and S. Jafar, "Interference alignment and the degrees of freedom of the K user interference channel," IEEE Trans. on Information Theory, vol. 54, Aug. 2008.
[15] R. K. V., N. B. Shah, and P. V. Kumar, "Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction," CoRR.
[16] N. B. Shah, R. K. V., P. V. Kumar, and K. Ramchandran, "Distributed storage codes with repair-by-transfer and non-achievability of interior points on the storage-bandwidth tradeoff," CoRR.
[17] S. Yekhanin, "Locally decodable codes," Now Publishers, June 2010.
[18] D. Cullina, A. Dimakis, and T. Ho, "Searching for minimum storage regenerating codes," Proceedings of the 47th Annual Allerton Conference on Communication, Control and Computation, Sep.
[19] B. Nazer, M. Gastpar, S. A. Jafar, and S. Vishwanath, "Ergodic interference alignment," June 2009.
[20] K. Gomadam, V. Cadambe, and S. Jafar, "Approaching the capacity of wireless networks through distributed interference alignment," submitted to Globecom 2008, preprint available through the authors' website, Mar. 2008.
[21] S. Peters and R. Heath, "Interference alignment via alternating minimization," in Acoustics, Speech and Signal Processing, 2009, ICASSP 2009, IEEE International Conference on, Apr. 2009.
[22] Z. Wang, A. G. Dimakis, and J. Bruck, "Rebuilding for array codes in distributed storage systems," ACTEMT: Workshop on the Application of Communication Theory to Emerging Memory Technologies, Dec.
[23] L. Xiang, Y. Xu, J. C. Lui, and Q. Chang, "Optimal recovery of single disk failure in RDP code storage systems," in Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '10, (New York, NY, USA), ACM.
[24] I. Kovacs, D. S. Silver, and S. G. Williams, "Determinants of commuting-block matrices," The American Mathematical Monthly, vol. 106, no. 10.
