Tornado and Luby Transform Codes
Ashish Khisti
6.454 Presentation
October 22, 2003
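Background: Erasure Channel
Elias [1956] studied the erasure channel.
[Figure: erasure channel with inputs $x_1, \dots, x_{2^m}$; each symbol is received correctly with probability $1-\beta$ and erased ("?") with probability $\beta$.]
Capacity of the noiseless erasure channel is $m(1-\beta)$.
No feedback is necessary to achieve capacity.
A random linear code can achieve capacity. Encoding: $O(n^2)$. Decoding: $O(n^3)$.
Applications:
  Communication links over the Internet
  Storage media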
Classical MDS Codes
[Figure: a codeword of n symbols.]
Features:
  Any set of k coordinates is an information set for an (n,k,d) MDS code.
  The receiver knows the codeword once it receives any k symbols and knows their positions.
  Capacity-achieving codes.
Drawbacks:
  Reed-Solomon (RS) codes require $O(k^2)$ time for decoding.
  Block codes: need prior knowledge of the erasure probability.
Digital Fountain Approach
A new protocol for bulk data distribution.
Scenario: one server, multiple receivers.
Encoding: construct encoding symbols on the fly and send them when at least one receiver is listening.
Decoding: collect the desired number of symbols from the server and reconstruct the original file.
Goals: Reliable, Efficient, On Demand, and Tolerant.
Tornado Codes
Features:
  Correct a $(1-R)(1-\varepsilon)$ fraction of erasures over the BEC, where R is the code rate.
  Time for encoding and decoding is proportional to $n \log(1/\varepsilon)$.
  Very fast software implementations.
Tradeoffs:
  The assumption of independent erasures is critical.
  High latency.
  Low-rate implementations are less attractive.
  Block codes: not suitable for a heterogeneous receiver population.
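Irregular Bipartite Graph
[Figure: bipartite graph with message nodes on the left and check nodes on the right; each check symbol is the sum of its neighbors, e.g. $c_1 = x_1 + x_2$.]
Irregular random graphs are used for generating check symbols:
  $(x_1, x_2, \dots, x_n) \to (x_1, \dots, x_n, c_1, \dots, c_{n\beta})$
  $x_1, \dots, x_n$: input/message symbols
  $c_1, \dots, c_{n\beta}$: check symbols
Degree sequences:
  Left degree sequence: $(\lambda_1, \lambda_2, \dots, \lambda_n)$
  Right degree sequence: $(\rho_1, \rho_2, \dots, \rho_m)$
Definition: $\lambda_k$ ($\rho_k$) is the fraction of edges that are incident on a left (right) node of degree k.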
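Irregular Graphs: Example
Given: $(\lambda_1, \lambda_2) = (1/2, 1/2)$, $(\rho_1, \rho_2) = (0, 1)$, number of edges $E = 4$.
$l_i$ = number of left nodes of degree i; $r_i$ = number of right nodes of degree i.
  $l_i = \lambda_i E / i$, $r_i = \rho_i E / i$
  $(l_1, l_2) = (2, 1)$
  $(r_1, r_2) = (0, 2)$
[Figure: a permutation $\pi$ matching the 4 left edge sockets to the 4 right edge sockets.]
A random permutation between edges induces a uniform distribution over the ensemble.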
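The example above can be reproduced with a short Python sketch. It is only an illustration: the helper names `node_counts` and `random_bipartite_graph` are hypothetical, and the socket-matching construction is an assumed reading of "random permutation between edges" (it may create repeated edges between the same pair of nodes).

```python
import random
from fractions import Fraction

def node_counts(edge_fractions, num_edges):
    """Convert {degree: fraction of edges} into {degree: number of nodes}."""
    counts = {}
    for degree, frac in edge_fractions.items():
        n = Fraction(frac) * num_edges / degree
        if n.denominator != 1:
            raise ValueError("degree sequence does not give integer node counts")
        counts[degree] = int(n)
    return counts

def random_bipartite_graph(left_counts, right_counts, rng):
    """Pair left and right edge sockets through a random permutation."""
    def sockets(counts):
        degrees = [d for d, c in sorted(counts.items()) for _ in range(c)]
        # Node j contributes one socket per unit of degree.
        return [node for node, d in enumerate(degrees) for _ in range(d)]
    left, right = sockets(left_counts), sockets(right_counts)
    rng.shuffle(right)  # the random permutation between edge sockets
    return list(zip(left, right))

left = node_counts({1: Fraction(1, 2), 2: Fraction(1, 2)}, 4)
right = node_counts({1: 0, 2: 1}, 4)
edges = random_bipartite_graph(left, right, random.Random(0))
print(left, right, edges)
```

Exact fractions are used so that a degree sequence incompatible with the chosen number of edges is rejected rather than silently rounded.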
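Construction of Tornado Codes
[Figure: cascade of layers with $n$, $\beta n$, $\beta^2 n$, ..., $\beta^{m+1} n$ nodes, connected by graphs $B_0, B_1, \dots, B_m$ and terminated by the code C. $B_i$: irregular graph; C: conventional code.]
Code $C(B_0, B_1, B_2, \dots, B_m, C)$:
  Each $B_i$ is an irregular bipartite graph with the same degree sequences.
  C is a conventional rate $(1-\beta)$ code with $O(n^2)$ complexity.
  m is chosen such that $n\beta^{m+1} \approx \sqrt{n}$.
  Length of the code: $\sum_{i=0}^{m+1} n\beta^{i} + \frac{n\beta^{m+2}}{1-\beta} = \frac{n}{1-\beta}$
This is a rate $(1-\beta)$ code with encoding/decoding complexity of $O(n)$.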
Linear Time Decoding Algorithm
1. Find a check node $c_n$ that is connected to only one source node $s_k$. If no such $c_n$ exists, stop and declare an error.
   (a) Set $s_k = c_n$.
   (b) Find all $c_{n'}$ that are neighbors of $s_k$ and set $c_{n'} = c_{n'} + s_k$.
   (c) Remove all edges connected to $s_k$.
2. Repeat (1) until all source nodes are determined.
[Figure: six-panel worked example of the decoder recovering the source nodes $s_1, s_2, s_3$.]
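The steps above can be sketched in Python for a single bipartite layer. This is a minimal illustration under stated assumptions: the check values are assumed to be known, erased source bits are marked `None`, the graph has no repeated edges, and the function name `peel_decode` and its argument layout are hypothetical.

```python
def peel_decode(received, checks, neighbors):
    """received: source bits, with None for erasures.
    checks: known check bits. neighbors[j]: source indices feeding check j."""
    symbols = list(received)
    residual = list(checks)
    unresolved = []
    for j, nbrs in enumerate(neighbors):
        pending = set()
        for k in nbrs:
            if symbols[k] is None:
                pending.add(k)
            else:
                residual[j] ^= symbols[k]  # fold known bits into the check
        unresolved.append(pending)
    checks_of = {}
    for j, pending in enumerate(unresolved):
        for k in pending:
            checks_of.setdefault(k, []).append(j)
    ready = [j for j, p in enumerate(unresolved) if len(p) == 1]
    while ready:
        j = ready.pop()
        if len(unresolved[j]) != 1:
            continue  # its only neighbor was recovered through another check
        (k,) = unresolved[j]
        symbols[k] = residual[j]              # step (a)
        for j2 in checks_of[k]:
            residual[j2] ^= symbols[k]        # step (b)
            unresolved[j2].discard(k)         # step (c)
            if len(unresolved[j2]) == 1:
                ready.append(j2)
    if any(s is None for s in symbols):
        raise ValueError("no check node of degree one left: decoding failed")
    return symbols

# Source bits (1, 0, 1); checks c0 = s0+s1, c1 = s1+s2, c2 = s0+s1+s2.
decoded = peel_decode([1, None, None], [1, 1, 0], [[0, 1], [1, 2], [0, 1, 2]])
```

Each edge is visited a constant number of times, which is where the linear running time in the number of edges comes from.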
What has to be solved?
So far:
  Identified the structure of the encoder as a cascade of irregular bipartite graphs.
  Suggested a candidate decoding algorithm which has linear complexity.
Goal: specify the set of degree sequences $(\lambda_1, \lambda_2, \dots, \lambda_n)$ and $(\rho_1, \rho_2, \dots, \rho_m)$ for which this simple decoding algorithm succeeds.
Main contribution of the paper:
  1. Develop mathematical conditions on the degree sequences under which this decoding scheme succeeds.
  2. Provide explicit degree sequences that achieve the capacity of the BEC.
Conditions on Degree Sequences
Define: $\rho(x) = \sum_i \rho_i x^{i-1}$, $\lambda(x) = \sum_i \lambda_i x^{i-1}$
$\delta$: erasure probability of the memoryless channel.
Necessary condition: if the decoding algorithm succeeds in recovering all message symbols, then
  $\rho(1 - \delta\lambda(x)) > 1 - x, \quad \forall x \in (0,1]$
  Approach: compute the expected value of the fraction of edges with degree one on the right and require that it is > 0.
Sufficient condition: the above condition is also sufficient if we impose $\lambda_1 = \lambda_2 = 0$.
  Approach: the proof uses tools from statistical mechanics to show that the variance in the degree distribution is small.
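Capacity Achieving Distribution
Fix an integer $D > 0$. Let
  $\lambda_i = \frac{1}{H(D)\,(i-1)}, \quad i = 2, 3, \dots, D+1$
  $\rho_i = \frac{e^{-\alpha}\alpha^{i-1}}{(i-1)!}, \quad i = 1, 2, 3, \dots$
where $H(D) = \sum_{j=1}^{D} 1/j$.
Average degree of left nodes: $a_\lambda = \left(\sum_i \lambda_i / i\right)^{-1} = H(D)(D+1)/D \approx \log(D)$
Average degree of right nodes: $a_\rho = \left(\sum_i \rho_i / i\right)^{-1} = \frac{\alpha e^{\alpha}}{e^{\alpha} - 1}$, with $\alpha$ chosen so that $a_\rho = a_\lambda / \beta \approx \log(D)/\beta$
Intuition:
  The Poisson distribution is natural if all the edges from the left uniformly choose the right nodes. This distribution is preserved when edges are successively removed from the graph.
  The heavy-tail distribution produces some message nodes of high degree that get decoded first and remove many edges from the graph.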
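Capacity Achieving Distribution
Note that: $\rho(x) = e^{\alpha(x-1)}$ and $\lambda(x) \approx -\frac{\log(1-x)}{H(D)}$
For the above choice of $\rho(x)$ and $\lambda(x)$, it is easy to verify that
  $\rho(1 - \delta\lambda(x)) > 1 - x, \quad \forall x \in (0,1]$, whenever $\delta < \frac{\beta}{1 + 1/D}$
Let $D = 1/\varepsilon$. It follows that a $\beta(1-\varepsilon)$ fraction of erasures can be corrected by this rate $1-\beta$ code.
The average left degree is about $\log(D)$, which implies that the number of operations in decoding is proportional to $n \log(1/\varepsilon)$.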
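The condition can also be checked numerically. The sketch below assumes the heavy-tail left sequence $\lambda_i = 1/(H(D)(i-1))$ for $i = 2, \dots, D+1$ and the Poisson right sequence with $\rho(x) = e^{\alpha(x-1)}$, with $\alpha$ fixed by matching the average right degree $\alpha e^{\alpha}/(e^{\alpha}-1)$ to the average left degree divided by $\beta$. The function names and the grid-based test are illustrative choices; a finite grid can only suggest that the inequality holds, not prove it.

```python
import math

def heavy_tail(D):
    """Heavy-tail edge-degree fractions lambda_i, i = 2..D+1, and H(D)."""
    H = sum(1.0 / j for j in range(1, D + 1))
    return {i: 1.0 / (H * (i - 1)) for i in range(2, D + 2)}, H

def solve_alpha(target):
    """Solve alpha * e^alpha / (e^alpha - 1) = target (target > 1) by bisection."""
    lo, hi = 1e-9, target
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / (1.0 - math.exp(-mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def condition_holds(D, beta, delta, grid=1000):
    """Check rho(1 - delta*lambda(x)) > 1 - x on the grid x = j/grid, j = 1..grid."""
    lam, H = heavy_tail(D)
    avg_left = H * (D + 1) / D
    alpha = solve_alpha(avg_left / beta)

    def lam_poly(x):
        return sum(c * x ** (i - 1) for i, c in lam.items())

    def rho_poly(x):
        return math.exp(alpha * (x - 1.0))

    return all(rho_poly(1.0 - delta * lam_poly(j / grid)) > 1.0 - j / grid
               for j in range(1, grid + 1))

D, beta = 20, 0.5
below = condition_holds(D, beta, 0.99 * beta / (1 + 1 / D))  # just under the threshold
above = condition_holds(D, beta, 0.6)                        # above capacity (delta > beta)
```

With these parameters the inequality holds on the grid just below $\beta/(1+1/D)$ and fails for an erasure probability above $\beta$.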
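Linear Programming Approach
Fix $(\lambda_1, \lambda_2, \dots, \lambda_n)$ and $\delta$. The objective is to find $(\rho_1, \rho_2, \dots, \rho_m)$ for some fixed m.
Let $x_i = i/N$, for $i = 1, 2, \dots, N$. We have the following constraints:
  $\rho(1 - \delta\lambda(x_i)) > 1 - x_i$
  $\rho_i \ge 0$ and $\rho(1) = 1$
Minimize $\sum_{i=1}^{N} \left( \rho(1 - \delta\lambda(x_i)) + x_i - 1 \right)$
The solution for $\rho(x)$ is feasible if the inequality holds for all x in (0,1].
Once a feasible solution has been found, the optimal $\delta$ is found by a binary search.
An iterative approach is suggested that uses the dual condition
  $\delta\lambda(1 - \rho(1 - y)) < y, \quad \forall y \in (0,1]$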
Practical Implementations
[Figure: Tornado Z code as a cascade of layers with 640K, 320K, 160K and 160K nodes, connected by graphs $G_0$, $G_1$, $G_2$.]
Rate ½ code. Takes 640,000 packets (each 256 bytes) as input.
Only three cascades have been used.
$G_0$ and $G_1$ use the heavy-tail/Poisson distribution as noted.
$G_2$ cannot use a standard quadratic-time code. Its degree distribution is obtained through linear programming.
On a 200MHz Pentium machine, the decoding operation takes .73 seconds.
Issues
The assumption of independent erasures is critical in the design of Tornado codes, so deep interleaving and very long block lengths are necessary.
High latency is incurred in encoding and decoding operations, since both encoding and decoding must be delayed by at least one block size.
Heavy memory usage: decoding each block of Tornado Z requires 32 MB of RAM.
Since they are block codes, they have to be optimized for a particular rate. The number of encoding symbols is fixed once the input block length and rate are fixed.
Luby Transform Codes
Features:
  k input symbols can be recovered from any set of $k + O(\sqrt{k}\,\log^2(k/\delta))$ encoding symbols with probability $1-\delta$.
  Encoding time: $O(\log(k/\delta))$ per encoding symbol.
  Decoding time: $O(k \log(k/\delta))$.
  These codes are rate-less: the number of distinct encoding symbols that can be generated is extremely large.
  Encoding symbols are generated on the fly.
  The construction does not make use of the channel erasure probability and hence can optimally serve heterogeneous receivers.
Encoding of LT Codes
Fix a degree distribution $\rho(d)$.
To produce each encoding symbol:
  Generate the degree $D \sim \rho(d)$.
  For each of the D edges, randomly pick one input symbol node.
  Compute the XOR of all the D neighbors and assign this value to the encoding symbol.
How does the decoder know the neighbors of an encoding symbol it receives?
  This information can be explicitly included as overhead in each packet.
  Pseudo-randomness can be exploited to duplicate the encoding process at the receiver. The receiver has to be given the seed and/or keys associated with the process.
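A minimal Python sketch of this encoder follows, using the seed-based option so that the receiver can regenerate the neighbor set. The function names, the use of `random.Random` as the shared pseudo-random generator, and the choice of D distinct neighbors (sampling without replacement) are assumptions made for illustration; input symbols are modeled as integers.

```python
import random
from functools import reduce
from operator import xor

def neighbors_for(seed, k, degree_probs):
    """Regenerate the neighbor set of the encoding symbol identified by seed.
    degree_probs[d-1] is the probability of degree d (at most k entries)."""
    rng = random.Random(seed)
    degree = rng.choices(range(1, len(degree_probs) + 1), weights=degree_probs)[0]
    return sorted(rng.sample(range(k), degree))  # distinct input symbols

def encode_symbol(message, seed, degree_probs):
    """Return (seed, value): value is the XOR of the chosen input symbols."""
    nbrs = neighbors_for(seed, len(message), degree_probs)
    return seed, reduce(xor, (message[i] for i in nbrs))

message = [5, 9, 12, 7]
probs = [1 / 4, 1 / 2, 1 / 6, 1 / 12]   # example degree distribution for k = 4
seed, value = encode_symbol(message, 42, probs)
# The receiver only needs (seed, value) plus the shared k and degree_probs.
receiver_view = neighbors_for(seed, len(message), probs)
```

Sending the seed instead of the explicit neighbor list keeps the per-packet overhead independent of the degree.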
Decoding LT Codes
The decoding process is virtually the same as that of Tornado codes.
At the start, release all encoding symbols of degree 1. Their neighbors are now covered and form a ripple.
In each subsequent step, one message symbol from the ripple is processed.
  It is removed as a neighbor of the encoding symbols.
  If any encoding symbol now has degree one, it is released.
  If its neighbor is not already in the ripple, it gets added to the ripple.
The process ends when the ripple is empty. If some message symbols remain unprocessed, this is a decoding failure.
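The ripple-based procedure can be sketched as follows. This is an illustrative implementation, not a reference one: the name `lt_decode` and the input layout (a list of `(neighbor_indices, value)` pairs with integer symbol values) are assumptions.

```python
def lt_decode(k, encoded):
    """encoded: list of (neighbor_indices, value) pairs. Returns the k input symbols."""
    symbols = [None] * k
    pending = [set(nbrs) for nbrs, _ in encoded]   # unprocessed neighbors
    values = [v for _, v in encoded]
    touching = [[] for _ in range(k)]
    for j, nbrs in enumerate(pending):
        for i in nbrs:
            touching[i].append(j)
    ripple = []

    def release(j):
        # Encoding symbol j has degree one: it covers its last neighbor.
        (i,) = pending[j]
        if symbols[i] is None:
            symbols[i] = values[j]
            ripple.append(i)

    for j in range(len(encoded)):
        if len(pending[j]) == 1:
            release(j)
    while ripple:
        i = ripple.pop()                  # process one covered input symbol
        for j in touching[i]:
            if i in pending[j]:
                pending[j].discard(i)     # remove it as a neighbor
                values[j] ^= symbols[i]
                if len(pending[j]) == 1:
                    release(j)
    if any(s is None for s in symbols):
        raise ValueError("ripple emptied before all symbols were recovered")
    return symbols

msg = [5, 9, 12]
packets = [([0], 5), ([0, 1], 5 ^ 9), ([1, 2], 9 ^ 12)]
recovered = lt_decode(3, packets)
```

An encoding symbol whose last neighbor is already covered adds nothing to the ripple, which is why redundant releases are simply ignored in `release`.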
LT Analysis-1 ($\rho(1) = 1$)
How many encoding symbols (each of degree 1) will guarantee that all message symbol nodes are covered with probability $> 1-\delta$?
Answer: $k \log(k/\delta)$.
  The probability that a message node is not covered is $\left(1 - \frac{1}{k}\right)^{k \log(k/\delta)} \approx \delta/k$.
  By using the union bound estimate, the desired result follows.
$k \log(k/\delta)$ encoding symbols is unacceptable.
Since all edges are randomly incident on message nodes, $k \log(k/\delta)$ edges are required to cover all the nodes.
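A quick numeric check of this bound, as a sketch: the function names are hypothetical, the logarithm is taken to be natural, and the symbol count is rounded up to an integer.

```python
import math

def degree_one_symbols_needed(k, delta):
    """Number of degree-one encoding symbols suggested by the slide: k*log(k/delta)."""
    return math.ceil(k * math.log(k / delta))

def union_bound_failure(k, n):
    """Union bound on P(some input symbol is uncovered) after n degree-one symbols."""
    return k * (1.0 - 1.0 / k) ** n

k, delta = 1000, 0.01
n = degree_one_symbols_needed(k, delta)
bound = union_bound_failure(k, n)
print(n, n / k, bound)
```

For k = 1000 and δ = 0.01 this asks for more than eleven encoding symbols per input symbol, which illustrates why the all-degree-one distribution is unacceptable.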
LT Analysis-2
Suppose L input symbols remain unprocessed during a decoding step.
Any encoding symbol is equally likely to get released, independent of all other encoding symbols.
If an encoding symbol is released, it is equally likely to cover any of the L symbols.
Define: $q(i, L)$ = probability that an encoding symbol of degree i is released when L input symbols remain unprocessed.
  $q(1, k) = 1$
  $q(i, L) = \frac{\binom{k-L-1}{i-2}\, L}{\binom{k}{i}}, \quad i = 2, \dots, k, \quad L = k-i+1, \dots, 1$
  $q(i, L) = 0$ otherwise
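LT Analysis-3
Soliton distribution:
  $\rho(1) = 1/k$
  $\rho(i) = \frac{1}{i(i-1)}, \quad i = 2, \dots, k$
$r(L)$: probability that an encoding symbol is released when L input symbols remain unprocessed.
  $r(L) = \sum_i \rho(i)\, q(i, L) = \sum_{i=2}^{k-L+1} \frac{L\,(k-L-1)!\,(k-i)!}{(k-L-i+1)!\,k!} = \frac{1}{k}$ for $L < k$, and $r(k) = \rho(1) = \frac{1}{k}$
Thus at each step, we expect one encoding symbol to be released.
The expected size of the ripple at each step is one.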
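The claim $r(L) = 1/k$ can be verified exactly for small k. The sketch below assumes the release probability in its binomial form, $q(i, L) = \binom{k-L-1}{i-2} L / \binom{k}{i}$ for $i \ge 2$ and $1 \le L \le k-i+1$, with $q(1, k) = 1$ and zero otherwise; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def ideal_soliton(k):
    """rho(1) = 1/k and rho(i) = 1/(i(i-1)) for i = 2..k, as exact fractions."""
    rho = {1: Fraction(1, k)}
    for i in range(2, k + 1):
        rho[i] = Fraction(1, i * (i - 1))
    return rho

def release_prob(i, L, k):
    """q(i, L): a degree-i symbol is released when L input symbols remain unprocessed."""
    if i == 1:
        return Fraction(int(L == k))
    if 2 <= i <= k and 1 <= L <= k - i + 1:
        # i-2 neighbors among the first k-L-1 processed symbols, one neighbor is
        # the symbol just processed, and the last is one of the L unprocessed.
        return Fraction(comb(k - L - 1, i - 2) * L, comb(k, i))
    return Fraction(0)

def overall_release_prob(L, k):
    """r(L) = sum over i of rho(i) * q(i, L)."""
    rho = ideal_soliton(k)
    return sum(rho[i] * release_prob(i, L, k) for i in range(1, k + 1))

k = 10
r_values = [overall_release_prob(L, k) for L in range(1, k + 1)]
```

Exact rational arithmetic is used so that equality with 1/k is checked without rounding error.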
Properties of Soliton Distribution
At each step one encoding symbol is released.
Only k encoding symbols are needed on average to retrieve k input symbols.
The expected number of edges in the graph is $k \log(k)$.
The ideal soliton distribution compresses the number of encoding symbols to the minimum possible value while keeping the number of edges in the graph at a minimum.
The ideal soliton distribution does not work well in practice, since the expected size of the ripple is one. It is extremely sensitive to small variations.
Robust Soliton Distribution
Maintain the size of the ripple at a larger value $R \sim c \sqrt{k} \log(k/\delta)$.
Define the following distribution:
  $\tau(i) = R/(ik)$ for $i = 1, \dots, k/R - 1$
  $\tau(i) = R \log(R/\delta)/k$ for $i = k/R$
  $\tau(i) = 0$ for $i = k/R + 1, \dots, k$
Intuition:
  The value of $\tau(1)$ is chosen so that R encoding symbols are expected to be released initially. This generates a ripple of size R.
  When L input symbols are unprocessed, the most probable symbols to be released have degree $k/L$.
Robust Soliton Distribution (contd.)
When $L = R$, we require that all the unprocessed symbols be covered. This is ensured by choosing $\tau(k/R) = R \log(R/\delta)/k$.
The probability that any covered symbol gets included in the ripple is $(L-R)/L$.
We need $L/(L-R)$ releases to expect that the size of the ripple remains the same.
Thus the fraction of encoding symbols of degree $i = k/L$ is proportional to:
  $\frac{L}{i(i-1)(L-R)} = \frac{k}{i(i-1)(k-iR)} = \frac{1}{i(i-1)} + \frac{R}{(i-1)(k-iR)} \approx \rho(i) + \tau(i)$
Robust Soliton Distribution
The robust soliton distribution is given by $\mu(i) = (\tau(i) + \rho(i))/\varsigma$, where $\varsigma = \sum_i (\tau(i) + \rho(i))$.
One can show that:
  The average number of encoding symbols needed to recover the message symbols is $k + O(\sqrt{k}\,\log^2(k/\delta))$.
  Decoding takes time proportional to $O(k \log(k/\delta))$ and encoding takes time $O(\log(k/\delta))$ per symbol.
  The probability that the decoding algorithm fails to recover the message symbols is less than $\delta$.
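As a sketch, the distribution can be tabulated in Python. The definitions used here are assumptions spelled out in the code: $R = c\,\log(k/\delta)\sqrt{k}$, $\rho$ is the ideal soliton distribution, $\tau(i) = R/(ik)$ below the spike at $i = k/R$, and $\tau(k/R) = R\log(R/\delta)/k$. Rounding $k/R$ to the nearest integer is an implementation choice made for this example, and the function name is hypothetical.

```python
import math

def robust_soliton(k, c, delta):
    """Return (mu, norm): mu[i-1] is the probability of degree i, norm is the
    normalizing sum of rho(i) + tau(i) over i = 1..k."""
    R = c * math.log(k / delta) * math.sqrt(k)
    spike = int(round(k / R))            # assumption: k/R rounded to an integer
    rho = [0.0] * (k + 1)
    tau = [0.0] * (k + 1)
    rho[1] = 1.0 / k
    for i in range(2, k + 1):
        rho[i] = 1.0 / (i * (i - 1))
    for i in range(1, min(spike, k + 1)):
        tau[i] = R / (i * k)
    if 1 <= spike <= k:
        tau[spike] = R * math.log(R / delta) / k
    norm = sum(rho) + sum(tau)
    mu = [(rho[i] + tau[i]) / norm for i in range(1, k + 1)]
    return mu, norm

mu, norm = robust_soliton(k=1000, c=0.1, delta=0.5)
print(len(mu), norm, sum(mu))
```

Roughly `norm * k` encoding symbols would be drawn from `mu` in this parameterization, so the excess of `norm` over 1 reflects the overhead paid for keeping the ripple large.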
Conclusions
Tornado codes achieve linear-time encoding and decoding but cannot solve the heterogeneous user case.
LT codes can simultaneously serve heterogeneous users, but require $O(k \log k)$ time.
Raptor codes (2000) achieve the best of both worlds and will be discussed next week.