Lossy Compression via Sparse Linear Regression: Computationally Efficient Encoding and Decoding

Ramji Venkataramanan, Dept. of Engineering, University of Cambridge, ramji.v@eng.cam.ac.uk
Tuhin Sarkar, Dept. of EE, IIT Bombay, tuhin91@gmail.com
Sekhar Tatikonda, Dept. of EE, Yale University, sekhar.tatikonda@yale.edu

Abstract — We propose computationally efficient encoders and decoders for lossy compression using a Sparse Regression Code. Codewords are structured linear combinations of columns of a design matrix. The proposed encoding algorithm sequentially chooses columns of the design matrix to successively approximate the source sequence. It is shown to achieve the optimal distortion-rate function for i.i.d. Gaussian sources with squared-error distortion. For a given rate, the parameters of the design matrix can be varied to trade off distortion performance with encoding complexity. An example of such a trade-off is: computational resource (space or time) per source sample of O((n/log n)²) and probability of excess distortion decaying exponentially in n/log n, where n is the block length. The Sparse Regression Code is robust in the following sense: for any ergodic source, the proposed encoder achieves the optimal distortion-rate function of an i.i.d. Gaussian source with the same variance. Simulations show that the encoder has very good empirical performance, especially at low and moderate rates.

I. INTRODUCTION

Developing efficient codes for lossy compression at rates approaching the Shannon rate-distortion limit has long been an important goal of information theory. Efficiency is measured in terms of the storage complexity of the codebook as well as the computational complexity of encoding and decoding. The Shannon-style i.i.d. random codebook achieves the optimal distortion-rate trade-off, but its storage and computational complexities grow exponentially with the block length. In this paper, we study a class of codes called Sparse Superposition or Sparse Regression Codes (SPARCs) for lossy compression with a squared-error distortion criterion. We present computationally efficient encoding and decoding algorithms that attain the optimal
rate-distortion function for i.i.d. Gaussian sources.

Sparse Regression codes were recently introduced by Barron and Joseph for communication over the AWGN channel and shown to approach the Shannon capacity with feasible decoding [1], [2], [3]. The codebook construction is based on the statistical framework of high-dimensional linear regression. The codewords are sparse linear combinations of columns of an n × N design matrix or dictionary, where n is the block length and N is a low-order polynomial in n. This structure enables the design of computationally efficient compression encoders based on sparse approximation ideas (e.g., [4], [5]). We propose one such encoder and analyze its performance.

SPARCs for lossy compression were first considered in [6], where some preliminary results were presented. The rate-distortion and error exponent performance of these codes under minimum-distance (optimal) encoding was characterized in [7]. The main contributions of this paper are the following.

- We propose a computationally efficient encoding algorithm for SPARCs which achieves the optimal distortion-rate function for i.i.d. Gaussian sources with growing block length. The algorithm is based on successive approximation of the source sequence by columns of the design matrix. The parameters of the design matrix can be chosen to trade off performance with complexity. For example, one choice of parameters discussed in Section IV yields an n × O(n²) design matrix, per-sample encoding complexity proportional to (n/log n)², and probability of excess distortion decaying exponentially in n/log n. To the best of our knowledge, this is the fastest known rate of decay among lossy compression codes with feasible encoding and decoding.

- With the proposed encoder, SPARCs share the following robustness property of random i.i.d. Gaussian codebooks [8], [9]: for a given rate R, any ergodic source with variance σ² can be compressed with distortion close to the i.i.d. Gaussian distortion-rate function σ²e^{−2R}.

We briefly review related work on developing computationally efficient codes for lossy compression. It was shown in [10] that the optimal rate-distortion function of memoryless sources can
be approached by concatenating optimal codes over sub-blocks of length much smaller than the overall block length. Nearest-neighbor encoding is used over each of these sub-blocks, which is feasible due to their short length. For this scheme, it is not known how rapidly the probability of excess distortion decays to zero with the overall block length. For sources with finite alphabet, various coding techniques have been proposed recently to approach the rate-distortion bound with computationally feasible encoding and decoding, e.g., [11], [12], [13]. The rates of decay of the probability of excess distortion for these schemes vary, but in general they are slower than exponential in the block length. The survey paper by Gray and Neuhoff [14] contains an extensive discussion of various compression techniques and their performance versus complexity trade-offs. These include scalar quantization with entropy coding, tree-structured vector quantization, multi-stage vector quantization, and trellis-coded quantization. Though these techniques have good empirical performance, they have not been proven to attain the optimal rate-distortion trade-off with computationally feasible encoders and decoders.

Fig. 1. A is an n × ML matrix and β is an ML × 1 vector. The positions of the non-zeros in β correspond to the gray columns of A, which combine to form the codeword Aβ. (Each of the L sections of A has M columns; β has a single non-zero value c_i in section i.)

Notation: Upper-case letters are used to denote random variables, lower-case for their realizations, and bold-face letters for random vectors and matrices. All vectors have length n. The source sequence is denoted by S = (S_1, …, S_n), and the reconstruction sequence by Ŝ = (Ŝ_1, …, Ŝ_n). |X| is the ℓ₂-norm of vector X, and ‖X‖ = |X|/√n is the normalized version. N(μ, σ²) denotes the Gaussian distribution with mean μ and variance σ². ⟨a, b⟩ denotes the inner product Σ_i a_i b_i. All logarithms are with base e.

II. THE SPARSE REGRESSION CODEBOOK

A sparse regression code (SPARC) is defined in terms of a design matrix A of dimension n × ML whose entries are i.i.d. N(0, 1). Here n is the block length, and M and L are integers whose values will be specified shortly in terms of n and the rate R. As shown in Figure 1, one can think of the matrix A as composed of L sections with M columns each. Each codeword is a linear combination of L columns, with one column from each section. Formally, a codeword can be expressed as Aβ, where β is an ML × 1 vector (β_1, …, β_{ML}) with the following property: there is exactly one non-zero β_i for i ∈ {1, …, M}, one non-zero β_i for i ∈ {M+1, …, 2M}, and so forth. The non-zero value of β in section i is set to c_i, where the value of c_i will be specified in the next section. Denote the set of all β's that satisfy this property by B_{M,L}.

Since there are M columns in each of the L sections, the total number of codewords is M^L. To obtain a compression rate of R nats/sample, we need

M^L = e^{nR}  or  L log M = nR.  (1)

Encoder: This is defined by a mapping g : ℝⁿ → B_{M,L}. Given the source sequence S and target distortion D, the encoder attempts to find a β̂ ∈ B_{M,L} such that ‖S − Aβ̂‖² ≤ D. If such a codeword is not found, an error is declared.

Decoder: This is a mapping h : B_{M,L} → ℝⁿ. On receiving β̂ ∈ B_{M,L} from the encoder, the
decoder produces the reconstruction h(β̂) = Aβ̂.

Storage Complexity: The storage complexity of the dictionary is proportional to nML. There are several choices for the pair (M, L) which satisfy (1). For example, L = 1 and M = e^{nR} recovers the Shannon-style random codebook, in which the number of columns in A is e^{nR}, i.e., the storage complexity is exponential in n. For our constructions, we choose M to be a low-order polynomial in n. Then L is Θ(n/log n), and the number of columns ML in the dictionary is a low-order polynomial in n. This reduction in storage complexity can be harnessed to develop computationally efficient encoders for the SPARC. The results in Section IV show that this choice of (M, L) offers a good trade-off between complexity and error performance.

III. COMPUTATIONALLY EFFICIENT ENCODER

The source sequence S is generated by an ergodic source with mean 0 and variance σ². The SPARC is defined by the n × ML design matrix A. The jth column of A is denoted A_j, 1 ≤ j ≤ ML. The non-zero value of β in section i is chosen to be

c_i = √( (2R/L) σ² (1 − 2R/L)^{i−1} ),  i = 1, …, L.  (2)

Given the source sequence S, the encoder determines β̂ ∈ B_{M,L} according to the following algorithm.

Step 0: Set R_0 = S.

Step i, i = 1, …, L: Pick

m_i = argmax_{j : (i−1)M+1 ≤ j ≤ iM} ⟨R_{i−1}, A_j⟩.  (3)

Set

R_i = R_{i−1} − c_i A_{m_i},  (4)

where c_i is given by (2).

Step L+1: The codeword β̂ has non-zero values in positions m_i, 1 ≤ i ≤ L. The value of the non-zero in section i is given by c_i.

In summary, the algorithm sequentially chooses the m_i's, section by section, to minimize the residue at each step.

A. Computational Complexity

Each of the L stages of the encoding algorithm involves computing M inner products and finding the maximum among them. Therefore the number of operations per source sample is proportional to ML. If we choose M = L^b for some b > 0, (1) implies L = Θ(n/log n), and the number of operations per source sample is of the order (n/log n)^{b+1}. We note that due to the sequential nature of the algorithm, only one section of the design matrix needs to be kept in memory at each step. When we have several source sequences to be encoded in succession, the encoder can have a pipelined architecture which requires
computational space (memory) of the order nM and has constant computation time per source symbol.

The code structure automatically yields low decoding complexity. The encoder can represent the chosen β with L binary sequences of log₂ M bits each. The ith binary sequence indicates the position of the non-zero element in section i. Hence the decoder complexity corresponding to locating the
non-zero elements using the received bits is L log₂ M, which is O(1) per source sample. Reconstructing the codeword then requires L additions per source sample.

IV. MAIN RESULT

Theorem 1. Consider a length-n source sequence S generated by an ergodic source having mean 0 and variance σ². Let δ_0, δ_1, δ_2 be any positive constants such that

Δ ≜ δ_0 + 5R(δ_1 + δ_2) < 0.5.  (5)

Let A be an n × ML design matrix with i.i.d. N(0,1) entries, with M, L satisfying (1). On the SPARC defined by A, the proposed encoding algorithm produces a codeword Aβ̂ that satisfies the following for sufficiently large M, L:

P( ‖S − Aβ̂‖² > σ² e^{−2R} (1 + Δ e^{R})² ) < p_0 + p_1 + p_2,  (6)

where

p_0 = P( |‖S‖/σ − 1| > δ_0 ),  p_1 = 2ML exp(−nδ_1²/8),  p_2 = ( 8 log M / M^{2δ_2} )^L.  (7)

Proof: A sketch of the proof is given in Section V. The full version can be found in [15].

Corollary 1. If the source sequence S is generated according to an i.i.d. N(0, σ²) distribution, then p_0 < 2 exp(−3nδ_0²/4), and the SPARC with the proposed encoder attains the optimal distortion-rate function σ²e^{−2R}, with probability of excess distortion decaying exponentially in n/log n.

Remarks:

1) The probability measure in (6) is over the space of source sequences and design matrices.

2) Ergodicity of the source is only needed to ensure that p_0 → 0 as n → ∞.

3) For an i.i.d. N(0, σ²) source, Corollary 1 says that with the choice M = L^b (b > 0) we can achieve a distortion within any constant gap of the optimal distortion-rate function σ²e^{−2R}, with the probability of excess distortion falling exponentially in L = Θ(n/log n).

4) For a given rate R, Theorem 1 guarantees that the proposed encoder achieves a squared-error distortion close to the Gaussian D*(R) for all ergodic sources with variance σ². Lapidoth [8] also shows that for any ergodic source of a given variance, one cannot attain a squared-error distortion smaller than this using an i.i.d. Gaussian codebook with minimum-distance encoding.

Gap from D*(R): To achieve distortions close to the Gaussian D*(R) with high probability, we need p_0, p_1, p_2 in (7) to all go to 0. In particular, for p_2 → 0 with growing L, from (7) we require M^{2δ_2} > 8 log M, or

δ_2 > log log M / (2 log M) + log 8 / (2 log M).  (8)

To approach D*(R), note that we need n, L, M to all go to ∞ while satisfying (1): n, L → ∞ for the probability of error in (7) to be small, and M → ∞ in order to allow δ_2 to be small according to (8). When n, L, M are sufficiently large, (8) dictates how small δ_2 can be: the distortion is approximately (log log M / log M) higher than the optimal value D*(R) = σ²e^{−2R}.

Performance versus Complexity Trade-off: Recall that the encoding complexity is O(ML) operations per source sample. The performance of the encoder improves as M, L increase, both in terms of the gap from the optimal distortion (8) and the probability of error (7). Choosing M = L^b for b > 0 yields L ∝ n/log n, and the resulting encoding complexity is Θ((n/log n)^{b+1}); the gap from D*(R) governed by (8) is approximately log log L / (b log L). At the other extreme, the Shannon codebook has L = 1, M = e^{nR}. Here the SPARC consists of only one section, and the proposed algorithm essentially performs minimum-distance encoding. The encoding complexity is O(e^{nR}) (exponential). From (8), δ_2 is approximately log(nR)/(2nR). The gap from D*(R) is now dominated by δ_0 and δ_1, whose typical values for the i.i.d. Gaussian case are Θ(1/√n) (from (7) and Corollary 1).

Successive Refinement Interpretation: The proposed encoder may be interpreted in terms of successive refinement [16]. We can think of each section of the design matrix A as a codebook of rate R/L. For step i, i = 1, …, L, the residue R_{i−1} acts as the source sequence, and the algorithm attempts to find the column within section i that minimizes the distortion. The distortion after step i is the variance of the new residue R_i. The minimum mean-squared distortion with a Gaussian codebook [8] at rate R/L is

D*_i = σ²_{i−1} exp(−2R/L) ≈ σ²_{i−1} (1 − 2R/L)  (9)

for R/L ≪ 1, where σ²_{i−1} denotes the variance of the residue entering step i. The typical value of the distortion in section i is close to D*_i since the algorithm is equivalent to maximum-likelihood encoding within each section (see (10) in Section V). Since the rate R/L is infinitesimal, the deviations from D*_i in each section can be quite large. However, since the number of sections L is very large, the final distortion ‖R_L‖² is close to the typical value σ²e^{−2R}, with excess distortion
probability that falls exponentially in L. We emphasize that the successive refinement interpretation is only true for the proposed encoder, and is not an inherent feature of the sparse regression codebook.

Figure 2 shows the performance of the proposed encoder on a unit-variance i.i.d. Gaussian source. The dimension of A is n × ML with M = L^b. The curves show the average distortion obtained at various rates for b = 2 and b = 3. The value of L was increased with rate in order to keep the total computational complexity (∝ ML = L^{b+1} per source sample) similar across different rates. (Recall that the block length n is determined by (1).) The reduction in distortion obtained by increasing b from 2 to 3 comes at the expense of an increase in computational complexity by a factor of L. Simulations were also performed for a unit-variance Laplacian source. The resulting distortion-rate curve was virtually identical to Figure 2, which is consistent with Theorem 1.

Fig. 2. Average distortion of the proposed encoder for an i.i.d. N(0, 1) source at various rates (bits/sample). With M = L^b, distortion-rate curves are shown for b = 2 and b = 3, along with D*(R) = e^{−2R}.

V. PROOF OF THEOREM 1

We first present a non-rigorous analysis of the proposed encoding algorithm based on the following observations.

1) ‖A_j‖² is approximately equal to 1 when n is large, for 1 ≤ j ≤ ML. This is because ‖A_j‖² is the normalized sum of squares of n i.i.d. N(0, 1) random variables.

2) Similarly, ‖S‖² is approximately equal to σ² for large n.

3) If X_1, X_2, …, X_M are i.i.d. N(0, 1) random variables, then max{X_1, …, X_M} is approximately equal to √(2 log M) for large M [17].

Step i, i = 1, …, L: We show that if

‖R_{i−1}‖² ≈ σ² (1 − 2R/L)^{i−1},  then  ‖R_i‖² ≈ σ² (1 − 2R/L)^{i}.  (10)

(10) is true for i = 0 (the second observation above). For each j ∈ {(i−1)M+1, …, iM}, the statistic

T^{(i)}_j = ⟨R_{i−1}, A_j⟩ / |R_{i−1}|  (11)

is an N(0, 1) random variable. This is because it is the projection of the i.i.d. N(0, 1) random vector A_j in the direction of R_{i−1}/|R_{i−1}|, which is independent of A_j. This independence holds because R_{i−1} is a function of the source sequence S and the columns {A_j}_{1 ≤ j ≤ (i−1)M}, which are all independent of A_j for (i−1)M+1 ≤ j ≤ iM. Further, the T^{(i)}_j's are mutually independent for (i−1)M+1 ≤ j ≤ iM. This can be seen by conditioning on the realization of R_{i−1}/|R_{i−1}|. We therefore have

max_{(i−1)M+1 ≤ j ≤ iM} T^{(i)}_j ≈ √(2 log M).  (12)

From (4), we have

|R_i|² = |R_{i−1}|² + c_i² |A_{m_i}|² − 2 c_i ⟨R_{i−1}, A_{m_i}⟩
 (a)≈ n σ² (1 − 2R/L)^{i−1} (1 + 2R/L) − 2 c_i σ √n (1 − 2R/L)^{(i−1)/2} √(2 log M)
 (b)= n σ² (1 − 2R/L)^{i}.  (13)

(a) follows from (12) and the induction hypothesis. (b) is obtained by substituting for c_i from (2) and for √(2 log M) = √(2nR/L) from (1). Therefore, the
final residue after Step L is

‖R_L‖² = ‖S − Aβ̂‖² ≈ σ² (1 − 2R/L)^L ≤ σ² e^{−2R},  (14)

where we have used 1 + x ≤ e^x for x ∈ ℝ.

Sketch of Formal Proof: The essence of the proof is in analyzing the deviation from the typical values of the residual distortion at each step of the algorithm. These deviations arise from atypicality concerning the source, the design matrix, and the maximum computed in each step. We introduce some notation to capture the deviations. The norm of the residue at stage i is expressed as

‖R_i‖² = σ² (1 − 2R/L)^i (1 + Δ_i)²,  i = 0, …, L.  (15)

Δ_i ∈ [−1, ∞) measures the deviation of the residual distortion ‖R_i‖² from its typical value given in (13). The norm of the column of A chosen in step i is written as

‖A_{m_i}‖² = 1 + γ_i,  i = 1, …, L.  (16)

We express the maximum of the statistic T^{(i)}_j in Step i as

max_{(i−1)M+1 ≤ j ≤ iM} T^{(i)}_j = √(2 log M) (1 + ε_i).  (17)

ε_i measures the deviation of the maximum computed in step i from √(2 log M). Armed with this notation, we have from (4)

‖R_i‖² = ‖R_{i−1}‖² + c_i² ‖A_{m_i}‖² − (2 c_i / n) ⟨R_{i−1}, A_{m_i}⟩
 = σ² (1 − 2R/L)^i [ (1 + Δ_{i−1})² + ((2R/L)/(1 − 2R/L)) ( Δ²_{i−1} + γ_i − 2 ε_i (1 + Δ_{i−1}) ) ].  (18)

From (18) and (15), we obtain

(1 + Δ_i)² = (1 + Δ_{i−1})² + ((2R/L)/(1 − 2R/L)) ( Δ²_{i−1} + γ_i − 2 ε_i (1 + Δ_{i−1}) )  (19)

for i = 1, …, L. The goal is to bound the final distortion

‖R_L‖² = σ² (1 − 2R/L)^L (1 + Δ_L)².  (20)
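The successive-approximation encoder of Section III is simple enough to check numerically. The following minimal Python sketch is ours, not the authors' code: the parameter values (R = 0.5 nats/sample, M = 32, L = 40, hence n ≈ L log M / R ≈ 277 by the rate equation (1)) are illustrative only, and each section of the design matrix is drawn on the fly rather than storing the full n × ML matrix A.

```python
import numpy as np

def sparc_encode(S, R, L, M, sigma2, seed=0):
    """Successive-approximation SPARC encoder (sketch of Section III).

    S: source sequence, R: rate in nats/sample, L sections of M columns.
    Sections of A are generated i.i.d. N(0,1) on the fly.
    Returns the reconstruction A @ beta_hat and the final residue R_L.
    """
    n = len(S)
    a = 2.0 * R / L                       # the factor 2R/L from (2)
    rng = np.random.default_rng(seed)
    residue = S.astype(float).copy()      # Step 0: R_0 = S
    recon = np.zeros(n)
    for i in range(1, L + 1):
        A_i = rng.standard_normal((n, M))            # section i of A
        m = int(np.argmax(residue @ A_i))            # step (3): max inner product
        c_i = np.sqrt(sigma2 * a * (1.0 - a) ** (i - 1))  # coefficient (2)
        residue = residue - c_i * A_i[:, m]          # step (4): new residue
        recon = recon + c_i * A_i[:, m]
    return recon, residue

# Illustrative run on an i.i.d. N(0,1) source.
rng = np.random.default_rng(1)
S = rng.standard_normal(277)
recon, residue = sparc_encode(S, R=0.5, L=40, M=32, sigma2=1.0)
D = float(np.mean(residue ** 2))   # per-sample distortion ||R_L||^2
# For these modest M, L, D should land well below the source variance 1.0,
# though noticeably above the Gaussian limit sigma^2 * e^{-2R} ~ 0.37,
# consistent with the O(log log M / log M) gap in (8).
```

Generating one section at a time mirrors the observation in Section III-A that only a single section of the design matrix need be held in memory; a practical encoder/decoder pair would share the matrix (or, in this sketch, the PRNG seed) so that the decoder can recompute Aβ̂ from the L transmitted column indices.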
We find an upper bound for (1 + Δ_L)² that holds under an event whose probability is close to 1. Accordingly, define A as the event where all of the following hold:

Δ_0 < δ_0,  (1/L) Σ_{i=1}^{L} γ_i < δ_1,  (1/L) Σ_{i=1}^{L} ε_i < δ_2,

for δ_0, δ_1, δ_2 as specified in the statement of Theorem 1. We upper bound the probability of the event Aᶜ using the following lemmas (proofs in [15]).

Lemma 1. P( (1/L) Σ_{i=1}^{L} γ_i > δ ) < 2ML exp(−nδ²/8) for δ ∈ (0, 1).

Lemma 2. For δ > 0, P( (1/L) Σ_{i=1}^{L} ε_i > δ ) < ( κ log M / M^{2δ} )^L.

Using these lemmas, we have P(Aᶜ) < p_0 + p_1 + p_2, where p_0, p_1, p_2 are given by (7). The remainder of the proof consists of obtaining a bound for (1 + Δ_L)² under the condition that A holds. This is done via the following lemma, proved in [15].

Lemma 3. When A is true and L is sufficiently large,

Δ_i ≤ Δ_0 w^i + ((4R/L)/(1 − 2R/L)) Σ_{j=1}^{i} w^{i−j} (γ_j + ε_j),  1 ≤ i ≤ L,  (21)

where w = 1 + (R/L)/(1 − 2R/L).

Lemma 3 implies that when A holds and L is sufficiently large,

Δ_L ≤ w^L [ Δ_0 + ((4R/L)/(1 − 2R/L)) Σ_{j=1}^{L} (γ_j + ε_j) ]
 (a)≤ w^L [ δ_0 + (4R/(1 − 2R/L)) (δ_1 + δ_2) ]
 (b)≤ e^R ( δ_0 + 5R (δ_1 + δ_2) ) = e^R Δ,  (22)

where Δ is defined in the statement of the theorem. (a) is true because A holds, and (b) is obtained using 1 + x ≤ e^x with x = (R/L)/(1 − 2R/L), together with L being sufficiently large. The distortion can then be bounded as

‖R_L‖² ≤ σ² e^{−2R} (1 + Δ_L)² ≤ σ² e^{−2R} (1 + Δ e^R)².  (23)

VI. CONCLUSION

We showed that Sparse Regression codes achieve the i.i.d. Gaussian distortion-rate function with a successive-approximation encoder. In terms of the block length n, the encoding complexity is a low-order polynomial in n and the probability of excess distortion decays exponentially in n/log n. The gap from the distortion-rate function D*(R) is O(log log M / log M), as given in (8). An important direction for future work is designing feasible encoders for SPARCs with faster convergence to D*(R) as the design matrix dimension (or block length) increases. The results of [18], [19] show that the optimal gap from D*(R) (among all codes) is Θ(1/√n). The fact that SPARCs achieve the optimal error exponent with minimum-distance encoding [20] suggests that it is possible to design encoders with faster convergence to D*(R) at the expense of slightly higher
computational complexity.

The results of this paper together with those in [2], [3] show that SPARCs with computationally efficient encoding and decoding achieve rates close to the Shannon-theoretic limits for both lossy compression and communication. Further, [21] demonstrates how source and channel coding SPARCs can be nested to effect binning and superposition, which are key ingredients of multi-terminal source and channel coding schemes. Sparse regression codes therefore offer a promising framework to develop fast, rate-optimal codes for a variety of models in network information theory.

REFERENCES

[1] A. Barron and A. Joseph, "Least squares superposition codes of moderate dictionary size are reliable at rates up to capacity," IEEE Trans. on Inf. Theory, vol. 58, pp. 2541–2557, Feb. 2012.
[2] A. Barron and A. Joseph, "Toward fast reliable communication at rates near capacity with Gaussian noise," in Proc. 2010 IEEE ISIT.
[3] A. Joseph and A. Barron, "Fast sparse superposition codes have exponentially small error probability for R < C," submitted to IEEE Trans. Inf. Theory, 2012. http://arxiv.org/abs/1207.2406
[4] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Processing, vol. 41, pp. 3397–3415, Dec. 1993.
[5] A. R. Barron, A. Cohen, W. Dahmen, and R. A. DeVore, "Approximation and learning by greedy algorithms," Annals of Statistics, vol. 36, pp. 64–94, 2008.
[6] I. Kontoyiannis, K. Rad, and S. Gitzenis, "Sparse superposition codes for Gaussian vector quantization," in 2010 IEEE Inf. Theory Workshop, p. 1, Jan. 2010.
[7] R. Venkataramanan, A. Joseph, and S. Tatikonda, "Gaussian rate-distortion via sparse linear regression over compact dictionaries," in Proc. 2012 IEEE ISIT. http://arxiv.org/abs/1202.0840
[8] A. Lapidoth, "On the role of mismatch in rate distortion theory," IEEE Trans. Inf. Theory, vol. 43, pp. 38–47, Jan. 1997.
[9] D. Sakrison, "The rate of a class of random processes," IEEE Trans. Inf. Theory, vol. 16, pp. 10–16, Jan. 1970.
[10] A. Gupta, S. Verdú, and T. Weissman, "Rate-distortion in near-linear time," in Proc. 2008 IEEE ISIT, pp. 847–851.
[11] I. Kontoyiannis and C. Gioran, "Efficient random codebooks and databases for lossy compression in near-linear time," in IEEE Inf. Theory Workshop on Networking and Inf. Theory, pp. 236–240, June 2009.
[12] M. Wainwright, E. Maneva, and E. Martinian, "Lossy source compression using low-density generator matrix codes: Analysis and algorithms," IEEE Trans. Inf. Theory, vol. 56, no. 3, pp. 1351–1368, 2010.
[13] S. Korada and R. Urbanke, "Polar codes are optimal for lossy source coding," IEEE Trans. Inf. Theory, vol. 56, pp. 1751–1768, April 2010.
[14] R. Gray and D. Neuhoff, "Quantization," IEEE Trans. Inf. Theory, vol. 44, pp. 2325–2383, Oct. 1998.
[15] R. Venkataramanan, T. Sarkar, and S. Tatikonda, "Lossy compression via sparse linear regression: Computationally efficient encoders and decoders," http://arxiv.org/abs/1212.1707
[16] W. Equitz and T. Cover, "Successive refinement of information," IEEE Trans. Inf. Theory, vol. 37, pp. 269–275, Mar. 1991.
[17] H. David and H. Nagaraja, Order Statistics. John Wiley & Sons, 2003.
[18] A. Ingber and Y. Kochman, "The dispersion of lossy source coding," in Data Compression Conference (DCC), pp. 53–62, March 2011.
[19] V. Kostina and S. Verdú, "Fixed-length lossy compression in the finite blocklength regime," IEEE Trans. on Inf. Theory, vol. 58, no. 6, pp. 3309–3338, 2012.
[20] R. Venkataramanan, A. Joseph, and S. Tatikonda, "Lossy compression via sparse linear regression: Performance under minimum-distance encoding," 2012. http://arxiv.org/abs/1202.0840
[21] R. Venkataramanan and S. Tatikonda, "Sparse regression codes for multi-terminal source and channel coding," in 50th Allerton Conf. on Commun., Control, and Computing, 2012.