Fixed-Length-Parsing Universal Compression with Side Information
Yeohee Im and Sergio Verdú
Dept. of Electrical Eng., Princeton University, NJ

Abstract—This paper presents a fixed-length-parsing universal compression algorithm for settings where compressor and decompressor share the same side information. We prove the optimality of the algorithm for any jointly stationary processes. Furthermore, strategies to modify the algorithm are suggested in order to lower the data compression rate.

(This work was partially supported by the Center for Science of Information, an NSF Science and Technology Center, under Grant CCF.)

I. INTRODUCTION

Underlying the most prevalent compression applications, the Lempel-Ziv class of algorithms [1], [2] is a widely used universal compression paradigm based on parsing into variable-length phrases, which achieves Shannon's fundamental entropy-rate limit. In [3], Willems put forth a fixed-length counterpart, which he analyzed using Kac's fundamental result on repetition times [4].

In many environments, side information is accessible to both compressor and decompressor. When software has to be updated, the old version serves as side information for clients and servers. In video compression, an estimate of the previous frame is useful as side information [5]. In some image transmission applications, a noisy version of the image is given to the decompressor in the form of side information to decode the digitized image [6]. The rsync algorithm proposed in [7] conveys the data of updated files, capitalizing on the fact that both files are highly similar. A weak and cheap checksum is compared first between blocks of the old and new files, and a more expensive and stronger checksum is calculated for blocks where the weak checksum matched, in order to ferret out discrepancies. Data compression with side information is also encountered in genomic sequencing. It was pointed out in [8] that the distance between two DNA sequences can be measured with an information-theoretic approach, and in this setting a reference sequence takes the role of side information [9]. An algorithm to compress a target genome when a reference genome is given was proposed in [10]. Taking into account that a high proportion of any two human genomes is identical, the difference between a target and a reference is compressed with a sliding-window Lempel-Ziv algorithm.
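As a concrete illustration of the two-level checksum strategy of rsync described above, the following Python sketch indexes the old file by a cheap weak checksum and confirms candidate matches with a strong hash. This is only a schematic rendering of the idea in [7], not the actual rsync implementation: the block size, the choice of Adler-32 and MD5, and the block-aligned (rather than rolling) scan of the new file are all simplifying assumptions of ours.

```python
import hashlib
import zlib

BLOCK = 1024  # illustrative block size (an assumption, not rsync's default)

def block_signatures(old: bytes):
    """Index each block of the old file by its weak (Adler-32) checksum,
    keeping the strong (MD5) digest for confirmation."""
    sigs = {}
    for off in range(0, len(old), BLOCK):
        blk = old[off:off + BLOCK]
        sigs.setdefault(zlib.adler32(blk), []).append(
            (off, hashlib.md5(blk).digest()))
    return sigs

def matched_blocks(new: bytes, sigs):
    """Blocks of the new file already present in the old one: the cheap
    weak checksum filters candidates, the expensive strong checksum
    confirms them. (Real rsync slides the weak checksum over every byte
    offset; this sketch only checks block-aligned positions.)"""
    matches = []
    for off in range(0, len(new), BLOCK):
        blk = new[off:off + BLOCK]
        for old_off, strong in sigs.get(zlib.adler32(blk), []):
            if hashlib.md5(blk).digest() == strong:
                matches.append((off, old_off))
                break
    return matches
```

The two-level structure is exactly the point made above: the expensive hash is computed only where the cheap checksum has already matched.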
The problem of universal source coding with side information has been considered in various works. In [11], an algorithm with side information was designed based on the Context Tree Weighting method [12], which weighs the model distributions recursively using a binary context tree. The algorithm in [11] and the conditional Multilevel Pattern Matching code introduced in [9] are optimal when source and side information are jointly stationary and ergodic processes. In addition, there has been some research on universal source coding with side information that applied Lempel-Ziv algorithms. A conditional version of the established sliding-window Lempel-Ziv algorithm was proposed in [13] and shown to be optimal in [14] for stationary ergodic sources. The LZ78 algorithm was modified to exploit side information in [15], based on the algorithm given in [16] for universal decoding for finite-state channels.

In this paper, we propose a universal fixed-length-parsing algorithm with side information available to compressor and decompressor, motivated by the counterpart without side information devised and elegantly analyzed by Willems in [3]. Fixed-length parsing has worse transient behavior than variable-length parsing (LZ), but it is amenable to more straightforward analysis. We make the first step toward designing an algorithm with fixed-length parsing for the system with side information, which we show is optimal for any jointly stationary sequences. While the algorithm in [3] traced back to find the first match of fixed-length phrases, our algorithm looks for a match of source and side information simultaneously, and uses the side information to represent the location of the match. Section II and Section III describe and analyze the algorithms, respectively.

II. DESCRIPTION OF ALGORITHMS

In this section, we propose an extension of the Willems algorithm for fixed-length parsing which takes advantage of side information known to both the compressor and the decompressor.

Consider a source $\boldsymbol{X} = (\ldots, X_1, X_2, \ldots)$ taking values on a finite alphabet $A$. Side information $\boldsymbol{Y} = (\ldots, Y_1, Y_2, \ldots)$ is given to the compressor and the decompressor, taking values on a finite alphabet $B$. The compressor deals with the source and the side information by parsing them into phrases of fixed size $\ell$. The number of phrases of size $\ell$ in $\boldsymbol{X}$ which the compressor aims to compress is denoted by $N$.
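Before specifying the algorithms, the shared front end can be made concrete. The sketch below is our illustration (the function names are not from the paper): it parses a stream into phrases of length $\ell$ and computes the cost $L$ of describing one phrase uncompressed, as defined at the start of Section II-A below.

```python
from math import ceil, log2

def parse_phrases(seq, ell):
    """Split a sequence into consecutive phrases of fixed length ell,
    ignoring any incomplete tail."""
    return [tuple(seq[i:i + ell])
            for i in range(0, (len(seq) // ell) * ell, ell)]

def raw_phrase_bits(alphabet_size, ell):
    """L = ceil(ell * log2 |A|): cost of sending one phrase uncompressed."""
    return ceil(ell * log2(alphabet_size))

phrases = parse_phrases("abracadabra", ell=3)  # [('a','b','r'), ('a','c','a'), ('d','a','b')]
L = raw_phrase_bits(alphabet_size=26, ell=3)   # 15 bits per uncompressed phrase
```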
A. Algorithm 1

Let $L$ be defined as the number of bits needed to represent $\ell$ symbols from the alphabet $A$, i.e., $L = \lceil \ell \log_2 |A| \rceil$. Here and below, $\{a : b\} \triangleq \{j \in \mathbb{Z} : a \le j \le b\}$.

Algorithm 1
  Send $L$ bits to describe $(X_1, \ldots, X_\ell)$.
  for $i = 2, \ldots, N$ do
    $s_i = \min\{ t : 0 < t < (i-1)\ell + 1,\ (XY)^{i\ell}_{(i-1)\ell+1} = (XY)^{i\ell-t}_{(i-1)\ell+1-t} \}$
    $n_i = |\{ t \in \{1 : s_i\} : Y^{i\ell}_{(i-1)\ell+1} = Y^{i\ell-t}_{(i-1)\ell+1-t} \}|$; if no such $s_i$ exists, $n_i = 0$.
    if $n_i \in \{1 : 2^k - 1\}$ then send $h_k(n_i)$.
    else send $h_k(2^k)$; send $X^{i\ell}_{(i-1)\ell+1}$ without compression using $L$ bits.
  end for

The algorithm starts by sending $L$ bits to convey the first phrase $(X_1, \ldots, X_\ell)$ without any compression. At the $i$-th step, the compressor finds out where the same values of the $i$-th phrase, $(X_{(i-1)\ell+1}, \ldots, X_{i\ell})$ and $(Y_{(i-1)\ell+1}, \ldots, Y_{i\ell})$, appeared simultaneously before, for the last time. Tracing back to the location of that $(X, Y)$-match, it counts the number of $Y$-matches of the phrase $(Y_{(i-1)\ell+1}, \ldots, Y_{i\ell})$. As the compressor transmits the number of $Y$-matches, the decompressor can recognize the location of the $(X, Y)$-match uniquely. The number of $Y$-matches is conveyed using $h_k(\cdot)$, the universal prefix code for $\{1 : 2^k\}$ in [3], whose length is

  $l(h_k(n)) = \lceil \log_2(k+1) \rceil + \lceil \log_2 n \rceil$, if $n < 2^k$; $\quad l(h_k(n)) = \lceil \log_2(k+1) \rceil$, if $n = 2^k$,  (1)

where $l(\cdot)$ is the length function. If the number of $Y$-matches $n_i \in \{1 : 2^k - 1\}$, the compressor sends $h_k(n_i)$. When $n_i \ge 2^k$ or there is no match, $h_k(2^k)$ is transmitted, followed by the uncompressed sequence $(X_{(i-1)\ell+1}, \ldots, X_{i\ell})$.

B. Algorithm 2

If Algorithm 1 cannot find an $(X, Y)$-match within the most recent $2^k - 1$ $Y$-matches, the compressor simply sends $h_k(2^k)$ and the uncompressed phrase. There is, however, a fair chance of finding an $X$-match in the past, permitting the compressor to capitalize on the original LZ77 algorithm [1].

The compressor looks back for the $(X, Y)$-match of the $i$-th phrase and calculates the number of $Y$-matches $n_i$ until the $(X, Y)$-match appears, as in Algorithm 1. In case $n_i \in \{1 : 2^k\}$, the flag bit 0 is sent to denote that an $(X, Y)$-match was found, and $h_k(n_i)$ is delivered. Otherwise, the compressor forwards the flag bit 1 and finds out whether there exists an $X$-match. When such a match is found, the number of symbols, $r_i$, that it has to look back to reach the match is counted, and $h_m(r_i)$ is sent if $r_i \le 2^m - 1$, for some fixed integer parameter $m$. When $r_i \ge 2^m$ or no match was found, $h_m(2^m)$ is conveyed, followed by the $L$ uncompressed bits.

Algorithm 2
  Send $L$ bits to describe $(X_1, \ldots, X_\ell)$.
  for $i = 2, \ldots, N$ do
    Compute $s_i$ and $n_i$ as in Algorithm 1.
    if $n_i \in \{1 : 2^k\}$ then send a flag bit 0 and $h_k(n_i)$.
    else
      Send a flag bit 1.
      $r_i = \min\{ t : 0 < t < (i-1)\ell + 1,\ X^{i\ell}_{(i-1)\ell+1} = X^{i\ell-t}_{(i-1)\ell+1-t} \}$
      if $r_i \le 2^m - 1$ then send $h_m(r_i)$.
      else send $h_m(2^m)$; send $X^{i\ell}_{(i-1)\ell+1}$ with $L$ bits.
  end for
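Before turning to Algorithm 3, the encoding loop of Algorithm 1 can be made concrete. The following Python sketch is our illustration under the definitions above; $h_k$ is modeled only through its length function (1), since any prefix code achieving those lengths will do, and the brute-force backward scan is chosen for clarity, not efficiency.

```python
from math import ceil, log2

def hk_length(n, k):
    """Length (1) of the universal prefix code h_k(n), n in {1 : 2**k}."""
    assert 1 <= n <= 2 ** k
    if n == 2 ** k:
        return ceil(log2(k + 1))
    return ceil(log2(k + 1)) + ceil(log2(n))

def algorithm1_lengths(x, y, ell, k, L):
    """Per-phrase codeword lengths under Algorithm 1 (brute-force search).

    For each phrase we scan backwards; n_i counts the Y-matches seen up to
    and including the most recent simultaneous (X, Y)-match. If that match
    exists and n_i <= 2**k - 1, the phrase costs l(h_k(n_i)) bits; otherwise
    the escape h_k(2**k) plus the L-bit raw phrase is charged.
    """
    assert len(x) == len(y)
    lengths = [L]                        # first phrase is sent uncompressed
    for i in range(2, len(x) // ell + 1):
        lo = (i - 1) * ell               # 0-based start of the i-th phrase
        xp, yp = x[lo:lo + ell], y[lo:lo + ell]
        n_i, found = 0, False
        for t in range(1, lo + 1):       # offsets 0 < t < (i-1)*ell + 1
            if y[lo - t:lo + ell - t] == yp:
                n_i += 1
                if x[lo - t:lo + ell - t] == xp:
                    found = True         # (X, Y)-match after n_i Y-matches
                    break
        if found and n_i <= 2 ** k - 1:
            lengths.append(hk_length(n_i, k))
        else:
            lengths.append(hk_length(2 ** k, k) + L)
    return lengths
```

For a binary source one would take $L = \ell$; summing the returned list gives the total codelength, apart from conveying the parameters $\ell$ and $k$, which we assume are known to both ends. Algorithm 2 would add one flag bit per phrase and fall back to an $X$-match search (with $h_m$) on the escape branch.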
C. Algorithm 3

In Algorithm 1, the prefix code $h_k(\cdot)$ is used to convey the number of $Y$-matches found until the $(X, Y)$-match is reached. The code assigns a codeword when the number of matches is in $\{1 : 2^k\}$. However, when the sequences are not long enough, the total number of $Y$-matches can be smaller than $2^k$, in which case it is inefficient to use a code for $\{1 : 2^k\}$. The algorithm is modified so that the compressor can reduce the number of bits to send, by adopting a prefix code with another parameter.

After sending $i - 1$ phrases, the match of the $i$-th phrase in the past is found, and the number of $Y$-matches $n_i$ until the match is counted as before. In addition, the algorithm computes $p_i$, the number of $Y$-matches from the beginning until the current time. If $p_i < 2^k$, then there is no need to use the code $h_k(\cdot)$, since it is efficient only when every index $1, 2, \ldots, 2^k$ is a possible candidate to be encoded. Choose the parameter $k_i = \lceil \log_2(p_i + 1) \rceil$. The decompressor as well as the compressor can compute $k_i$, since they are both aware of every location of the $Y$-matches. The compressor conveys $h_{k_i}(n_i)$ if $n_i \in \{1 : 2^{k_i} - 1\}$, and otherwise it sends $h_{k_i}(2^{k_i})$ together with the uncompressed sequence. After several phrases have been sent, $p_i$ becomes as large as $2^k$; in that case, $h_k(\cdot)$ can be used. This modification is useful for the original Willems algorithm without side information [3] as well.

Algorithm 3
  Send $L$ bits to describe $(X_1, \ldots, X_\ell)$.
  for $i = 2, \ldots, N$ do
    Compute $s_i$ and $n_i$ as in Algorithm 1.
    $p_i = |\{ t \in \{1 : (i-1)\ell\} : Y^{i\ell}_{(i-1)\ell+1} = Y^{i\ell-t}_{(i-1)\ell+1-t} \}|$
    if $p_i < 2^k$ then
      $k_i = \lceil \log_2(p_i + 1) \rceil$
      if $n_i \in \{1 : 2^{k_i} - 1\}$ then send $h_{k_i}(n_i)$.
      else send $h_{k_i}(2^{k_i})$; send $X^{i\ell}_{(i-1)\ell+1}$ using $L$ bits.
    else
      if $n_i \in \{1 : 2^k - 1\}$ then send $h_k(n_i)$.
      else send $h_k(2^k)$; send $X^{i\ell}_{(i-1)\ell+1}$ using $L$ bits.
  end for

III. ANALYSIS

In this section we show the optimality of Algorithm 1 and compare the performance of the modified algorithms.

A. Optimality of Algorithm 1

Willems [3] showed the asymptotic optimality of his fixed-length-parsing algorithm in the sense of expected length for stationary sources. Here we show that the length per symbol of the algorithm with side information approaches the conditional entropy rate as the blocklength and the number of phrases go to infinity.

For any doubly infinite sequence $\boldsymbol{a} = (\ldots, a_{-1}, a_0, a_1, \ldots)$, the repeated recurrence times are defined as

  $T_0(\boldsymbol{a}) = 0$,
  $T_j(\boldsymbol{a}) = \min\{ t \in \mathbb{N} : t > T_{j-1}(\boldsymbol{a}),\ a^0_{1-\ell} = a^{-t}_{1-\ell-t} \}$.

A random variable $C$ indicates the number of $Y$-matches that appeared in the past since the latest $(X, Y)$-match:

  $C = |\{ t \in \mathbb{N} : t \le T_1(\boldsymbol{X}, \boldsymbol{Y}),\ Y^0_{1-\ell} = Y^{-t}_{1-\ell-t} \}|$.  (2)

The following lemma provides the relationship between the conditional expectation of $C$ and the conditional probability.

Lemma 1. For any jointly stationary processes $\boldsymbol{X}, \boldsymbol{Y}$ and sequences $x$, $y$, the conditional probability of $x$ given $y$ and the conditional expectation of $C$ satisfy

  $\mathbb{E}[\, C \mid X^0_{1-\ell} = x,\ \boldsymbol{Y} = y \,] \le \dfrac{1}{P[\, X^0_{1-\ell} = x \mid \boldsymbol{Y} = y \,]}$,  (3)

where $C$ is defined in (2).

Proof. We simplify notation as $T_j \triangleq T_j(y)$ and define the probability

  $w_n \triangleq P[\, X^{-T_j}_{1-\ell-T_j} \neq x \text{ for all } j \in \{1 : n\} \mid \boldsymbol{Y} = y \,]$.

The difference between $w_n$ and $w_{n+1}$ is the probability that an $X$-match appears for the first time at the interval $T_{n+1}$, among the locations of the $Y$-matches:

  $w_n - w_{n+1} = P[\, X^{-T_j}_{1-\ell-T_j} \neq x,\ j \in \{1:n\},\ X^{-T_{n+1}}_{1-\ell-T_{n+1}} = x \mid \boldsymbol{Y} = y \,]$,

which, by stationarity, can be interpreted as

  $w_n - w_{n+1} = P[\, X^0_{1-\ell} \neq x,\ X^{-T_j}_{1-\ell-T_j} \neq x,\ j \in \{1:n-1\},\ X^{-T_n}_{1-\ell-T_n} = x \mid \boldsymbol{Y} = y \,]$.

The joint conditional probability of $C$ and $X^0_{1-\ell}$ then becomes

  $P[\, C = n,\ X^0_{1-\ell} = x \mid \boldsymbol{Y} = y \,]$
  $= P[\, X^0_{1-\ell} = x,\ X^{-T_j}_{1-\ell-T_j} \neq x,\ j \in \{1:n-1\},\ X^{-T_n}_{1-\ell-T_n} = x \mid \boldsymbol{Y} = y \,]$
  $= P[\, X^{-T_j}_{1-\ell-T_j} \neq x,\ j \in \{1:n-1\},\ X^{-T_n}_{1-\ell-T_n} = x \mid \boldsymbol{Y} = y \,] - P[\, X^0_{1-\ell} \neq x,\ X^{-T_j}_{1-\ell-T_j} \neq x,\ j \in \{1:n-1\},\ X^{-T_n}_{1-\ell-T_n} = x \mid \boldsymbol{Y} = y \,]$
  $= (w_{n-1} - w_n) - (w_n - w_{n+1})$.

Finally, we get the desired result from

  $\mathbb{E}[\, C \mid X^0_{1-\ell}=x, \boldsymbol{Y}=y \,]\, P[\, X^0_{1-\ell}=x \mid \boldsymbol{Y}=y \,] = \lim_{n\to\infty} \sum_{j=1}^{n} j\, (w_{j-1} - 2 w_j + w_{j+1})$  (4)
  $= \lim_{n\to\infty} \big( 1 - w_n - n (w_n - w_{n+1}) \big)$  (5)
  $= \lim_{n\to\infty} (1 - w_n)$  (6)
  $= P[\, X^{-T_j}_{1-\ell-T_j} = x \text{ for some } j \in \mathbb{N} \mid \boldsymbol{Y} = y \,] \le 1$,  (7)

where (6) holds because $\sum_j (w_j - w_{j+1}) < \infty$ and, accordingly, $w_n - w_{n+1} = o(1/n)$.

Note that Lemma 1 reduces to Kac's lemma [4] in the case without side information, which implies that the mean recurrence time of a sequence is at most the reciprocal of the probability of the sequence.
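Before proceeding, the Kac-type identity behind Lemma 1 admits a quick numerical sanity check in the special case without side information, where for an ergodic source the mean recurrence time of a phrase meets the reciprocal of its probability. The following sketch (our illustration; the i.i.d. binary model and all parameters are arbitrary choices) estimates both sides empirically.

```python
import random

def mean_recurrence_time(bits, ell, phrase):
    """Average gap (in symbols) between successive occurrences of `phrase`."""
    gaps, last = [], None
    for i in range(len(bits) - ell + 1):
        if tuple(bits[i:i + ell]) == phrase:
            if last is not None:
                gaps.append(i - last)
            last = i
    return sum(gaps) / len(gaps)

random.seed(0)
p = 0.3                                         # P(bit = 1) for the i.i.d. source
bits = [1 if random.random() < p else 0 for _ in range(1_000_000)]
phrase = (1, 0, 1)
print(mean_recurrence_time(bits, 3, phrase))    # empirically close to ...
print(1 / (p * (1 - p) * p))                    # ... 1/P(101), about 15.87
```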
Based on Lemma 1, the average codelength of the algorithm with side information is analyzed next. Let $w^i(X^{i\ell} \mid Y^{i\ell})$ be the codeword constructed by Algorithm 1 to send the $i$-th phrase $X^{i\ell}_{(i-1)\ell+1}$, when the previous phrases are $X^{(i-1)\ell}_1$, the current phrase of side information is $Y^{i\ell}_{(i-1)\ell+1}$, and the previous phrases of side information are $Y^{(i-1)\ell}_1$. For convenience, the first recurrence time of the sequences, $T_1(\boldsymbol{X}, \boldsymbol{Y})$, will be denoted by $T$ from this point on, and $\imath_{X|Y}(x \mid y) \triangleq \log_2 \frac{1}{P[X^\ell_1 = x \mid Y^\ell_1 = y]}$ denotes the conditional information.

Theorem 1. The conditional average codelength of the $i$-th phrase is bounded as

  $\mathbb{E}[\, l( w^i(X^{i\ell} \mid Y^{i\ell}) ) \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_{(i-1)\ell+1} = y \,] \le \imath_{X|Y}(x \mid y) + \lceil \log_2(k+1) \rceil + 1 + \dfrac{L}{((i-1)\ell+1)\, P[\, X^\ell_1 = x,\ Y^\ell_1 = y \,]}$,  (8)

for any $x \in A^\ell$, $y \in B^\ell$.

Proof. Define the event $E_{xy} \triangleq \{X^0_{1-\ell} = x,\ Y^0_{1-\ell} = y\}$. Using stationarity, the expected length of the codeword of the $i$-th phrase is

  $\mathbb{E}[\, l( w^i(X^{i\ell} \mid Y^{i\ell}) ) \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_{(i-1)\ell+1} = y \,] = \mathbb{E}[\, l( w^i(X^0_{1-i\ell} \mid Y^0_{1-i\ell}) ) \mid E_{xy} \,]$
  $= \lceil \log_2(k+1) \rceil + \sum_{j=1}^{2^k-1} \lceil \log_2 j \rceil\, P[\, C = j,\ T \le (i-1)\ell \mid E_{xy} \,] + L\, P[\, C \ge 2^k \text{ or } T > (i-1)\ell \mid E_{xy} \,]$
  $\overset{(a)}{\le} \lceil \log_2(k+1) \rceil + 1 + \log_2 \mathbb{E}[\, C \mid E_{xy} \,] + L \sum_{j \ge 1} P[\, C = j \mid E_{xy} \,]\, P[\, T_j(\boldsymbol{Y}) \ge (i-1)\ell + 1 \mid C = j,\ E_{xy} \,]$
  $\overset{(b)}{\le} \imath_{X|Y}(x \mid y) + \lceil \log_2(k+1) \rceil + 1 + \dfrac{L\, \mathbb{E}[\, C \mid E_{xy} \,]}{((i-1)\ell+1)\, P[\, Y^0_{1-\ell} = y \,]}$
  $\overset{(c)}{\le} \imath_{X|Y}(x \mid y) + \lceil \log_2(k+1) \rceil + 1 + \dfrac{L}{((i-1)\ell+1)\, P[E_{xy}]}$,

where (a) holds by the concavity of the logarithm and Jensen's inequality, (b) follows by Lemma 1, Markov's inequality and

  $\mathbb{E}[\, T_j(\boldsymbol{Y}) \mid Y^0_{1-\ell} = y \,] = j\, \mathbb{E}[\, T_1(\boldsymbol{Y}) \mid Y^0_{1-\ell} = y \,] \le \dfrac{j}{P[\, Y^0_{1-\ell} = y \,]}$,  (9)

and inequality (c) is implied by Lemma 1.

The codeword for $X^{N\ell}$ with side information $Y^{N\ell}$ is denoted by $w(X^{N\ell} \mid Y^{N\ell})$. Utilizing the bound for the codelength of each phrase given in Theorem 1, we show that the average compression rate is asymptotically bounded by the conditional entropy rate.

Theorem 2. Algorithm 1 is asymptotically optimal in the sense that

  $\lim_{\ell \to \infty} \lim_{N \to \infty} \dfrac{\mathbb{E}[\, l( w(X^{N\ell} \mid Y^{N\ell}) ) \,]}{N\ell} \le H(\boldsymbol{X} \mid \boldsymbol{Y})$.  (10)

Proof. For any finite $N$ and $\ell$, the expected codelength is bounded as

  $\mathbb{E}[\, l( w(X^{N\ell} \mid Y^{N\ell}) ) \,] = L + \sum_{i=2}^{N} \mathbb{E}[\, l( w^i(X^{i\ell} \mid Y^{i\ell}) ) \,]$  (11)
  $= L + \sum_{i=2}^{N} \sum_{x \in A^\ell} \sum_{y \in B^\ell} P[\, X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_{(i-1)\ell+1} = y \,]\; \mathbb{E}[\, l( w^i(X^{i\ell} \mid Y^{i\ell}) ) \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_{(i-1)\ell+1} = y \,]$  (12)
  $\le L + \sum_{i=2}^{N} \sum_{x \in A^\ell} \sum_{y \in B^\ell} P[E_{xy}] \Big( \imath_{X|Y}(x \mid y) + \lceil \log_2(k+1) \rceil + 1 + \dfrac{L}{((i-1)\ell+1)\, P[E_{xy}]} \Big)$  (13)
  $= L + (N-1)\, \lceil \log_2(k+1) \rceil + (N-1)\big( H(X^\ell \mid Y^\ell) + 1 \big) + L\, |A|^\ell |B|^\ell \sum_{i=2}^{N} \dfrac{1}{(i-1)\ell+1}$.  (14)

As the number of phrases goes to infinity, the asymptotic compression rate can be upper bounded as

  $\lim_{N \to \infty} \dfrac{\mathbb{E}[\, l( w(X^{N\ell} \mid Y^{N\ell}) ) \,]}{N\ell} \le \dfrac{\lceil \log_2(k+1) \rceil + 1}{\ell} + \dfrac{H(X^\ell \mid Y^\ell)}{\ell} + \dfrac{L\, |A|^\ell |B|^\ell}{\ell} \lim_{N \to \infty} \dfrac{1}{N} \sum_{i=2}^{N} \dfrac{1}{(i-1)\ell+1}$  (15)
  $= \dfrac{\lceil \log_2(k+1) \rceil + 1}{\ell} + \dfrac{H(X^\ell \mid Y^\ell)}{\ell}$.  (16)

Therefore, (10) follows as $\ell \to \infty$, since $\lceil \log_2(k+1) \rceil / \ell \to 0$ and $H(X^\ell \mid Y^\ell)/\ell$ converges to the conditional entropy rate $H(\boldsymbol{X} \mid \boldsymbol{Y})$ for jointly stationary processes.
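As an illustrative reading of (16) (our numbers, not taken from the paper), the per-symbol redundancy above the normalized conditional entropy shrinks like $1/\ell$:

```python
from math import ceil, log2

def overhead(ell, k):
    """Per-symbol redundancy term of (16): (ceil(log2(k+1)) + 1) / ell."""
    return (ceil(log2(k + 1)) + 1) / ell

for ell in (16, 64, 256, 1024):
    print(ell, overhead(ell, k=16))  # 0.375, 0.09375, 0.0234..., 0.0058...
```

For instance, with $k = 16$ the overhead is $6/\ell$ bits per source symbol, which vanishes as the phrase length grows.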
B. Algorithm 2

While Algorithm 1 exploits $(X, Y)$-matches only, $X$-matches are also exploited in Algorithm 2 in order to denote the match location. However, this modified strategy necessitates a flag bit to indicate whether the codeword signals an $(X, Y)$-match or an $X$-match. We show a sufficient condition for Algorithm 2 to outperform Algorithm 1. The codelengths are summarized in Table I, where $n_i$ is the number of $Y$-matches until the $(X, Y)$-match is reached, and $r_i$ is the number of symbols needed to reach the $X$-match, as before. For cases 2), 3), 4), Algorithm 2 constructs a shorter codeword than Algorithm 1 does if $m < k$, while Algorithm 2 outputs one more bit for case 1). Hence, Algorithm 2 is efficient when the frequency of case 1) is low.

TABLE I: Comparison of Codelengths

  Case                                     | Algorithm 1                                        | Algorithm 2
  1) $n_i \le 2^k - 1$                     | $\lceil\log_2(k+1)\rceil + \lceil\log_2 n_i\rceil$ | $\lceil\log_2(k+1)\rceil + \lceil\log_2 n_i\rceil + 1$
  2) $n_i = 2^k$                           | $\lceil\log_2(k+1)\rceil + L$                      | $\lceil\log_2(k+1)\rceil + 1$
  3) $n_i \ge 2^k + 1$, $r_i \le 2^m - 1$  | $\lceil\log_2(k+1)\rceil + L$                      | $\lceil\log_2(m+1)\rceil + \lceil\log_2 r_i\rceil + 1$
  4) $n_i \ge 2^k + 1$, $r_i \ge 2^m$      | $\lceil\log_2(k+1)\rceil + L$                      | $\lceil\log_2(m+1)\rceil + L + 1$

Denote

  $E_1 \triangleq \{ C \le 2^k - 1,\ T \le (i-1)\ell \}$,
  $E_2 \triangleq \{ C = 2^k,\ T \le (i-1)\ell \}$,
  $E_{3,t} \triangleq \big( \{ C \ge 2^k + 1 \} \cup \{ T \ge (i-1)\ell + 1 \} \big) \cap \{ T_1(\boldsymbol{X}) = t \}$,
  $E_4 \triangleq \big( \{ C \ge 2^k + 1 \} \cup \{ T \ge (i-1)\ell + 1 \} \big) \cap \{ T_1(\boldsymbol{X}) > c_i \}$,

where $c_i \triangleq \min(2^m - 1,\ (i-1)\ell)$. Let $w^i_2(\cdot)$ be the codeword of the $i$-th phrase constructed with Algorithm 2. Then, the difference between the average codelengths is

  $\mathbb{E}[\, l(w^i(X^{i\ell} \mid Y^{i\ell})) \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_{(i-1)\ell+1} = y \,] - \mathbb{E}[\, l(w^i_2(X^{i\ell} \mid Y^{i\ell})) \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_{(i-1)\ell+1} = y \,]$
  $= -P[E_1 \mid E_{xy}] + (L-1)\, P[E_2 \mid E_{xy}] + \sum_{t=1}^{c_i} P[E_{3,t} \mid E_{xy}] \big( \lceil\log_2(k+1)\rceil + L - \lceil\log_2(m+1)\rceil - \lceil\log_2 t\rceil - 1 \big) + P[E_4 \mid E_{xy}] \big( \lceil\log_2(k+1)\rceil - \lceil\log_2(m+1)\rceil - 1 \big)$  (17)
  $\ge -P[E_1 \mid E_{xy}] + (L-1)\, P[E_2 \mid E_{xy}] + P[\, \cup_{t \le c_i} E_{3,t} \mid E_{xy} \,] \big( \lceil\log_2(k+1)\rceil + L - \lceil\log_2(m+1)\rceil - m - 1 \big) + P[E_4 \mid E_{xy}] \big( \lceil\log_2(k+1)\rceil - \lceil\log_2(m+1)\rceil - 1 \big)$  (18)
  $\ge -P[E_1 \mid E_{xy}] + P[E_1^c \mid E_{xy}] \big( \lceil\log_2(k+1)\rceil - \lceil\log_2(m+1)\rceil - 1 \big)$.  (19)

This gap is positive if

  $P[E_1 \mid E_{xy}] \le \dfrac{\lceil\log_2(k+1)\rceil - \lceil\log_2(m+1)\rceil - 1}{\lceil\log_2(k+1)\rceil - \lceil\log_2(m+1)\rceil}$,  (20)

which means that Algorithm 2 is advantageous for relatively short sequences.

C. Algorithm 3

As described in Section II-C, Algorithm 3 improves performance by exploiting the fact that the decompressor is able to locate every match of the side information. We focus on the codelength for the $i$-th phrase, whose codeword is denoted by $w^i_3(\cdot)$. Let $\hat{B} \subseteq B^{i\ell}$ be the set composed of those $y \in B^{i\ell}$ such that the number of matches, $p_i$, of the $i$-th phrase among the $i-1$ previous phrases is smaller than $2^k$. If $y \in \hat{B}$, a new parameter $k_i = \lceil\log_2(p_i + 1)\rceil$ replaces $k$, and $C$ is certainly lower than $2^{k_i}$, given that $T \le (i-1)\ell$. The conditional average codelength of Algorithm 3 for the $i$-th phrase is

  $\mathbb{E}[\, l(w^i_3(X^{i\ell} \mid Y^{i\ell})) \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_1 = y \,] = \lceil\log_2(k_i+1)\rceil + \sum_{j=1}^{2^{k_i}-1} \lceil\log_2 j\rceil\, P[\, C = j,\ T \le (i-1)\ell \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_1 = y \,] + L\, P[\, T > (i-1)\ell \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_1 = y \,]$, for $y \in \hat{B}$.

The performance gap corresponding to the $i$-th phrase between Algorithm 1 and Algorithm 3 for this case results in

  $\mathbb{E}[\, l(w^i(X^{i\ell} \mid Y^{i\ell})) \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_1 = y \,] - \mathbb{E}[\, l(w^i_3(X^{i\ell} \mid Y^{i\ell})) \mid X^{i\ell}_{(i-1)\ell+1} = x,\ Y^{i\ell}_1 = y \,] = \lceil\log_2(k+1)\rceil - \lceil\log_2(k_i+1)\rceil \ge 0$, for $y \in \hat{B}$.

Hence, when $y \in \hat{B}$, the performance of Algorithm 3 is at least as good as that of Algorithm 1, while if $y \notin \hat{B}$, both algorithms yield the same expected codelength. Thus, Algorithm 1 always constructs codewords no shorter than those of Algorithm 3, which exploits the fact that the first several phrases typically find few side-information matches.
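Reading Table I programmatically (our sketch; the values of $k$, $m$, $L$ below are arbitrary), the function returns the (Algorithm 1, Algorithm 2) codeword-length pair for each case, and the last line evaluates the break-even frequency of case 1) given by (20):

```python
from math import ceil, log2

def table1(k, m, L, n_i=None, r_i=None):
    """Codeword lengths of Algorithms 1 and 2 for each case of Table I."""
    K, M = ceil(log2(k + 1)), ceil(log2(m + 1))
    if n_i is not None and 1 <= n_i <= 2 ** k - 1:     # case 1
        return K + ceil(log2(n_i)), K + ceil(log2(n_i)) + 1
    if n_i == 2 ** k:                                  # case 2
        return K + L, K + 1
    if r_i is not None and r_i <= 2 ** m - 1:          # case 3
        return K + L, M + ceil(log2(r_i)) + 1
    return K + L, M + L + 1                            # case 4

K, M = ceil(log2(16 + 1)), ceil(log2(4 + 1))           # k = 16, m = 4
print((K - M - 1) / (K - M))  # break-even P[case 1] from (20): here 0.5
```

With these numbers, Algorithm 2 yields a shorter expected codeword whenever the $(X, Y)$-match of case 1) occurs in fewer than half of the phrases, consistent with the observation that it pays off for relatively short sequences.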
REFERENCES

[1] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 337-343, May 1977.
[2] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," IEEE Trans. Inf. Theory, vol. 24, no. 5, pp. 530-536, Sep. 1978.
[3] F. M. J. Willems, "Universal data compression and repetition times," IEEE Trans. Inf. Theory, vol. 35, no. 1, pp. 54-58, Jan. 1989.
[4] M. Kac, "On the notion of recurrence in discrete stochastic processes," Bulletin of the American Mathematical Society, vol. 53, pp. 1002-1010, 1947.
[5] A. Aaron, R. Zhang, and B. Girod, "Wyner-Ziv coding of motion video," in Proc. Asilomar Conf. on Signals and Systems, Pacific Grove, CA, Nov. 2002.
[6] S. S. Pradhan and K. Ramchandran, "Enhancing analog image transmission systems using digital side information: a new wavelet-based image coding paradigm," in Proc. IEEE Data Compression Conf., Snowbird, UT, Mar. 2001.
[7] A. Tridgell and P. Mackerras, "The rsync algorithm," Technical Report TR-CS-96-05, Department of Computer Science, The Australian National University, 1996.
[8] M. Li, J. H. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang, "An information-based sequence distance and its application to whole mitochondrial genome phylogeny," Bioinformatics, vol. 17, no. 2, pp. 149-154, 2001.
[9] E. Yang, A. Kaltchenko, and J. C. Kieffer, "Universal lossless data compression with side information by using a conditional MPM grammar transform," IEEE Trans. Inf. Theory, vol. 47, no. 6, Sep. 2001.
[10] B. G. Chern, I. Ochoa, A. Manolakos, A. No, K. Venkat, and T. Weissman, "Reference based genome compression," in Proc. IEEE Information Theory Workshop, 2012.
[11] H. Cai, S. R. Kulkarni, and S. Verdú, "An algorithm for universal lossless compression with side information," IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 4008-4016, Sep. 2006.
[12] F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens, "The context-tree weighting method: basic properties," IEEE Trans. Inf. Theory, vol. 41, no. 3, pp. 653-664, May 1995.
[13] P. Subrahmanya and T. Berger, "A sliding window Lempel-Ziv algorithm for differential layer encoding in progressive transmission," in Proc. IEEE Int. Symp. on Information Theory, Whistler, BC, Canada, June 1995.
[14] T. Jacob and R. K. Bansal, "On the optimality of sliding window Lempel-Ziv algorithm with side information," in Proc. 2008 Int. Symp. on Information Theory and its Applications, Auckland, New Zealand, Dec. 2008.
[15] T. Uyematsu and S. Kuzuoka, "Conditional Lempel-Ziv complexity and its application to source coding theorem with side information," IEICE Trans. Fundamentals, vol. E86-A, no. 10, Oct. 2003.
[16] J. Ziv, "Universal decoding for finite-state channels," IEEE Trans. Inf. Theory, vol. 31, no. 4, pp. 453-460, July 1985.