1 Introduction to information theory

1.1 Introduction

In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through transmission and storage media designed for that purpose. The information is represented in digital formats using symbols and alphabets taken in a very broad sense. The deliberate choice of the way information is represented, often referred to as coding, is an essential aspect of the theory, and for many results it will be assumed that the user is free to choose a suitable code. We present the classical results of information theory, which were originally developed as a model of communication as it takes place over a telephone circuit or a similar connection. However, we shall pay particular attention to two-dimensional (2-D) applications, and many examples are chosen from these areas. A huge amount of information is exchanged in formats that are essentially 2-D, namely web pages, graphic material, etc. Such forms of communication typically have an extremely complex structure. The term media is often used to indicate the structure of the message as well as the surrounding organization. Information theory is relevant for understanding the possibilities and limitations of many aspects of 2-D media, but one should not expect to be able to model and analyze all aspects within a single approach.

1.2 Entropy of discrete sources

A discrete information source is a device or process that outputs symbols at discrete instances from some finite alphabet A = {x_1, x_2, ..., x_r}. We usually assume that a large number of such symbols will have to be processed, and that they may be produced at regular instances in time or read from various positions in the plane in some fashion. The use of a large number of symbols allows us to describe the frequencies in terms of a probability distribution.
A single symbol can be thought of as a random variable, X, with values from the finite alphabet A. We assume that all variables have the same probability distribution, and the probability of x_i is written as P(x_i) or P[X = x_i]. It is often convenient to refer to the probability distribution, P(X), of the variable as a vector of probabilities p_X, where the values are listed in some fixed order.

As a measure of the amount of information provided by X we introduce the entropy of a random variable.

Definition. The entropy of the variable X is

    H(X) = E[log(1/P(X))] = −Σ_x P(x) log P(x),    (1.1)

where E[·] indicates the expected value. The word entropy (as well as the use of the letter H) has its historic origin in statistical mechanics. The choice of the base for the logarithms is a matter of convention, but we shall always use binary logarithms and refer to the amount of information as measured in bits. We note that 0 ≤ H(X) ≤ log r. The entropy is never negative since each term is nonnegative. The maximal value of H(X) is reached when each value of X has probability 1/r, and the minimal value when one outcome has probability 1 (see Exercise 1.1). When r = 2^m, the variable provides at most m bits of information, which shows that the word bit in information theory is used in a way that differs slightly from common technical terminology. If X^N represents a string of N symbols, each with the same alphabet and probability distribution, it follows from (1.1) that the maximal entropy is NH, and that this maximum is reached when the variables are independent. We refer to a source for which the outputs are independent as a memoryless source, and if the probability distributions also are identical we describe it as i.i.d. (independent identically distributed). In this case we say that the entropy of the source is H bits/symbol.

Example 1.1 (Entropy of a binary source). For a binary memoryless source with probability distribution P = (p, 1 − p) we get the entropy from (1.1) as

    H(p) = −p log p − (1 − p) log(1 − p),    (1.2)

where H(p) is called the (binary) entropy function. The entropy function is symmetric with respect to p = 1/2, and reaches its maximal value, 1, there. For small p, H increases quickly with increasing p, and H(0.11) = 0.5.
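The entropy (1.1) and the binary entropy function (1.2) are easy to evaluate numerically. The following small Python sketch (added here for illustration; the function names are our own) checks the values quoted above:

```python
from math import log2

def entropy(p):
    """Entropy (1.1) of a probability distribution, in bits."""
    return -sum(q * log2(q) for q in p if q > 0)

def H(p):
    """Binary entropy function (1.2)."""
    return entropy([p, 1 - p])

print(entropy([0.25] * 4))  # uniform over r = 4 values: log 4 = 2 bits
print(H(0.5), H(0.11))      # maximum 1 at p = 1/2; H(0.11) is close to 0.5
```

The convention that terms with q = 0 contribute nothing corresponds to the usual limit 0 log 0 = 0.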
We note that the same letter, H, is used to indicate the entropy of a random variable and the value of the entropy function for a particular distribution, (p, 1 − p). We shall use the same notation H(P) when P is a general discrete probability distribution.

Example 1.2 (Run-length coding). For a binary source with a small value of P(1) it may be useful to segment a string of symbols into runs of 0s ending with a 1. In this way the source is converted into a memoryless source with alphabet {0, 1, 2, ..., n}, where the symbols indicate the length of the segment. The value k represents the string 0^k 1, k < n. If runs of more than n 0s are possible, the value k = n represents the string 0^n, i.e. a run of 0s that is not terminated. Run-length coding may be used to code binary images of text along the rows (Fig. 1.1).

Figure 1.1. A binary image of a printed text.

For a pair of random variables, (X, Y), with values from finite alphabets A_x = {x_1, x_2, ..., x_r} and A_y = {y_1, y_2, ..., y_s}, the conditional probability of y_j given x_i is P(y_j | x_i) = q_ij. It is often convenient to refer to the conditional probability distribution as the matrix Q = P(Y | X) = [q_ij]. The probability distribution of the variable Y then becomes p_Y = p_X Q, where p_X and p_Y are the distributions of X and Y, respectively.

Definition. The conditional entropy

    H(Y | X) = −Σ_{i,j} P(x_i, y_j) log P(y_j | x_i) = −Σ_i P(x_i) Σ_j P(y_j | x_i) log P(y_j | x_i)    (1.3)

can be interpreted as the additional information provided by Y when X is known. For any particular y_j, the average over X of P(y_j | x_i) is P(y_j). Since the function f(u) = −u log u is concave (it has a negative second derivative), we can use Jensen's inequality, which states that for such a function E[f(u)] ≤ f(E[u]). Thus, on applying Jensen's inequality with u = P(y | x) to (1.3), it follows that the entropy is never increased by conditioning, i.e. H(Y | X) ≤ H(Y).
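The inequality H(Y | X) ≤ H(Y) can be checked numerically. In this Python sketch (for illustration only; the input distribution and transition matrix are chosen arbitrarily) the conditional entropy (1.3) is the average of the entropies of the rows of Q:

```python
from math import log2

def entropy(p):
    """Entropy in bits, skipping zero-probability terms."""
    return -sum(q * log2(q) for q in p if q > 0)

# Assumed example: binary input distribution p_X and transition matrix Q.
pX = [0.6, 0.4]
Q = [[0.9, 0.1],   # row i is the conditional distribution P(Y | X = x_i)
     [0.2, 0.8]]

# Output distribution p_Y = p_X Q.
pY = [sum(pX[i] * Q[i][j] for i in range(2)) for j in range(2)]

# Conditional entropy (1.3): average over x_i of the entropy of row i.
H_Y_given_X = sum(pX[i] * entropy(Q[i]) for i in range(2))

print(pY, entropy(pY), H_Y_given_X)  # conditioning never increases entropy
```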

For a string of variables, X_1, X_2, ..., X_n, we have

    P(X_1, X_2, ..., X_n) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) ··· P(X_n | X_1, X_2, ..., X_{n−1}),

and thus, taking the expectation of the logarithm,

    H(X_1, X_2, ..., X_n) = H(X_1) + H(X_2 | X_1) + ··· + H(X_n | X_1, X_2, ..., X_{n−1}).    (1.4)

This result is known as the chain rule, and expresses the total amount of information as a sum of the information contributed by each variable X_i when the values of the previous variables are known. When the variables are not independent, the chain rule is often a valuable tool with which to decompose the probabilities and entropies of the entire string of data into terms that are easier to analyze and calculate.

Example 1.3 (Entropy by the chain rule). If X takes values in A = {x_1, x_2, x_3, x_4} with probability distribution (1/4, 1/4, 1/3, 1/6), we can of course calculate the entropy directly from the definition. As an alternative, we can think of the outcome as the result of first choosing a subset of two values, represented by the variable Y. We can then express the entropy by the chain rule:

    H(X) = H(Y) + H(X | Y) = H(1/2) + (1/2)H(1/2) + (1/2)H(1/3) = 3/2 + (log 3 − 2/3)/2 = 7/6 + (1/2) log 3.

1.3 Mutual information and discrete memoryless channels

For a pair of random variables, (X, Y), with values from finite alphabets A_x = {x_1, x_2, ..., x_r} and A_y = {y_1, y_2, ..., y_s}, the mutual information is defined as

    I(Y; X) = Σ_{x,y} P(x, y) log(P(y | x)/P(y)) = E[log(P(y | x)/P(y))].    (1.5)

The mutual information is a measure of the amount of information about X represented by Y. It follows from the definition that the mutual information can be expressed in terms of entropies as

    I(X; Y) = H(Y) − H(Y | X) = H(X) − H(X | Y).    (1.6)

The properties of the conditional entropy imply that the mutual information I(X; Y) ≥ 0. It is 0 when X and Y are independent, and achieves the maximal value H(X) when each value of X gives a unique value of Y.
It may also be noted that I(Y; X) = I(X; Y). The symmetry follows from rewriting P(y | x) as P(y, x)/P(x) in the definition of I. This is perhaps an unexpected property, since there is often a causal relation between X and Y, and it is not possible to interchange cause and effect.
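Both identities — the two expressions in (1.6) and the symmetry — can be verified on a small example. The Python sketch below (the joint distribution is an arbitrary choice for illustration) uses the symmetric rewriting P(y | x)/P(y) = P(x, y)/(P(x)P(y)) from the remark above:

```python
from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p if q > 0)

# Assumed joint distribution P(x, y) on a 2 x 2 alphabet.
Pxy = [[0.4, 0.1],
       [0.1, 0.4]]

px = [sum(row) for row in Pxy]                              # marginal of X
py = [sum(Pxy[i][j] for i in range(2)) for j in range(2)]   # marginal of Y

# Mutual information (1.5) in the symmetric form, which makes
# I(X; Y) = I(Y; X) evident from the expression itself.
I = sum(Pxy[i][j] * log2(Pxy[i][j] / (px[i] * py[j]))
        for i in range(2) for j in range(2) if Pxy[i][j] > 0)

H_Y = entropy(py)
H_Y_given_X = -sum(Pxy[i][j] * log2(Pxy[i][j] / px[i])
                   for i in range(2) for j in range(2) if Pxy[i][j] > 0)

print(I, H_Y - H_Y_given_X)  # the two expressions in (1.6) agree
```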

From the chain rule for entropy we get

    I(Y; X_1, X_2, ..., X_n) = I(Y; X_1) + I(Y; X_2 | X_1) + ··· + I(Y; X_n | X_1, X_2, ..., X_{n−1}),    (1.7)

which we refer to as the chain rule for mutual information. A discrete information channel is a model of communication where X and Y represent the input and output variables, and the channel is defined by the conditional probability matrix, Q = P(Y | X) = [q_ij], which in this context is also called the transition matrix of the channel. When the input distribution P(X) is given, we find the output distribution as p_Y = p_X Q.

Example 1.4 (The binary symmetric channel). The single most important channel model is the binary symmetric channel (BSC). This channel models a situation in which random errors occur with probability p in binary data. For equally distributed inputs, the output has the same distribution, and the mutual information may be found from (1.6) as

    C = I(Y; X) = 1 − H(p),    (1.8)

where H again indicates the binary entropy function. Thus C = 1 for p = 0 or 1, C has minimal value 0 for p = 1/2, and C = 1/2 for p = 0.11.

For two variables we can combine (1.4) and (1.6) to get

    H(X, Y) = H(X) + H(Y) − I(X; Y).

Mutual information cannot be increased by data processing. If Z is a variable that is obtained by processing Y, but depends on X only through Y, we have from (1.7)

    I(X; Z, Y) = I(X; Y) + I(X; Z | Y) = I(X; Z) + I(X; Y | Z).

Here I(X; Z | Y) = 0, since X and Z are independent given Y, and I(X; Y | Z) ≥ 0. Thus

    I(X; Z) ≤ I(X; Y).    (1.9)

1.4 Source coding for discrete sources

In this section we demonstrate that the entropy of a message has a more operational significance: if the entropy of the source is H, a message in the form of a string of N symbols can, by suitable coding, be represented by about NH binary symbols. We refer to such a process as source coding.
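The behaviour of (1.8) over the range of crossover probabilities is easily tabulated; a small Python sketch (function names are illustrative):

```python
from math import log2

def H(p):
    """Binary entropy function, with the convention H(0) = H(1) = 0."""
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_mutual_information(p):
    """I(Y; X) for a BSC with crossover probability p and
    equally distributed inputs, as in (1.8)."""
    return 1 - H(p)

for p in (0.0, 0.11, 0.5, 1.0):
    print(p, bsc_mutual_information(p))
```

As the text notes, the value is 1 bit at p = 0 and p = 1 (a channel that always inverts is as informative as a perfect one), 0 at p = 1/2, and 1/2 at p = 0.11.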

The Kraft inequality

We shall prove the existence of efficient source codes by actually constructing some codes that are important in applications. However, getting to these results requires some intermediate steps. A binary variable-length source code is described as a mapping from the source alphabet A to a set of finite strings, C, from the binary code alphabet, which we always denote {0, 1}. Since we allow the strings in the code to have different lengths, it is important that we can carry out the reverse mapping in a unique way. A simple way of ensuring this property is to use a prefix code, a set of strings chosen in such a way that no string is also the beginning (prefix) of another string. Thus, when the current string belongs to C, we know that we have reached the end, and we can start processing the following symbols as a new code string. In Example 1.5 a simple prefix code is given. If c_i is a string in C and l(c_i) its length in binary symbols, the expected length of the source code per source symbol is

    L(C) = Σ_{i=1}^{N} P(c_i) l(c_i).

If the set of lengths of the code is {l(c_i)}, any prefix code must satisfy the following important condition, known as the Kraft inequality:

    Σ_i 2^{−l(c_i)} ≤ 1.    (1.10)

The code can be described as a binary search tree: starting from the root, two branches are labelled 0 and 1, and each node is either a leaf that corresponds to the end of a string, or a node that can be assumed to have two continuing branches. Let l_m be the maximal length of a string. If a string has length l(c), it follows from the prefix condition that none of the 2^{l_m − l(c)} extensions of this string are in the code. Also, two extensions of different code strings are never equal, since this would violate the prefix condition. Thus by summing over all codewords we get

    Σ_i 2^{l_m − l(c_i)} ≤ 2^{l_m},

and the inequality follows.
It may further be proven that any uniquely decodable code must satisfy (1.10), and that if this is the case there exists a prefix code with the same set of code lengths. Thus the restriction to prefix codes imposes no loss in coding performance.

Example 1.5 (A simple code). The code {0, 10, 110, 111} is a prefix code for an alphabet of four symbols. If the probability distribution of the source is (1/2, 1/4, 1/8, 1/8), the average length of the code strings is 1 · 1/2 + 2 · 1/4 + 3 · 1/4 = 7/4, which is also the entropy of the source.
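The Kraft sum and the expected length for Example 1.5 can be checked directly (a Python sketch; the helper name is our own):

```python
def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality (1.10)."""
    return sum(2 ** -l for l in lengths)

# The prefix code of Example 1.5 and its source distribution.
code = ["0", "10", "110", "111"]
probs = [1/2, 1/4, 1/8, 1/8]

print(kraft_sum(len(c) for c in code))   # 1.0: (1.10) holds with equality
L = sum(p * len(c) for p, c in zip(probs, code))
print(L)                                 # 7/4, equal to the entropy
```

Equality in (1.10) reflects that the code tree is complete: every leaf of the tree is a codeword.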

If all the numbers −log P(c_i) were integers, we could choose these as the lengths l(c_i). In this way the Kraft inequality would be satisfied with equality, and furthermore

    L = Σ_i P(c_i) l(c_i) = −Σ_i P(c_i) log P(c_i) = H(X),

and thus the expected code length would equal the entropy. Such a case is shown in Example 1.5. However, in general we have to select code strings that only approximate the optimal values. If we round −log P(c_i) up to the nearest integer, ⌈−log P(c_i)⌉, the lengths satisfy the Kraft inequality, and we get an upper bound on the code lengths

    l(c_i) = ⌈−log P(c_i)⌉ < −log P(c_i) + 1.    (1.11)

The difference between the entropy and the average code length may be evaluated from

    H(X) − L = Σ_i P(c_i) [log(1/P(c_i)) − l_i] = Σ_i P(c_i) log(2^{−l_i}/P(c_i)) ≤ log Σ_i 2^{−l_i} ≤ 0,

where the inequalities are those established by Jensen and Kraft, respectively. This gives

    H(X) ≤ L ≤ H(X) + 1,    (1.12)

where the right-hand side is obtained by taking the average of (1.11). The loss due to the integer rounding may give a disappointing result when the coding is done on single source symbols. However, if we apply the result to strings of N symbols, we find an expected code length of at most NH + 1, and the result per source symbol becomes at most H + 1/N. Thus, for sources with independent symbols, we can get an expected code length close to the entropy by encoding sufficiently long strings of source symbols.

Huffman coding

In this section we present a construction of a variable-length source code with minimal expected code length. This method, Huffman coding, is a commonly used coding technique. It is based on the following simple recursive procedure for binary codewords.

(1) Input: a list of symbols, S, and their probability distribution.
(2) Let the two smallest probabilities be p_i and p_j (maybe not unique).
(3) Assign these two symbols the same code string, α, followed by 0 and 1, respectively.
(4) Merge them into a single symbol, a_ij, with probability p(a_ij) = p_i + p_j.
(5) Repeat from Step 1 until only a single symbol is left.

The result of the procedure can be interpreted as a decision tree, where the code string is obtained as the labels on the branches that lead to a leaf representing the symbol. It follows immediately from the description that the code is a prefix code. It may be noted that the algorithm constructs the tree by going from the leaves to the root.

Table 1.1. Huffman coding (S denotes symbols and the two bits determined in each stage are indicated by italics). Stage 1 (five symbols), Stage 2 (four symbols), Stage 3 (three symbols), Stage 4 (two symbols); each stage lists the probabilities, symbols S and partial code strings C.

To see that the Huffman code has minimal expected length, we first note that in any optimal prefix code there are two strings with maximal length differing in the last bit. These strings are code strings for the symbols with the smallest probabilities (Steps 1 and 2). If the symbols with the longest strings did not have minimal probabilities, we could switch the strings and get a smaller expected length. If the longest string did not have a companion that differs only in the last bit, we could eliminate the last bit, use this one-bit-shorter codeword and preserve the prefix property, thus reducing the expected length. The expected length can be expressed as L = L′ + p_i + p_j, where L′ is the expected length of the code after the two symbols have been merged, and the last terms were chosen to be minimal. Thus the reduced code must be chosen as an optimal code for the reduced alphabet with α (Step 3), and the same arguments may be repeated for this (Step 4). Thus, the steps of Huffman coding are consistent with the structure of an optimal prefix code. Further, the code is indeed optimal, insofar as restricting the class of codes to prefix codes gives no loss in coding performance, as previously remarked. It follows from the optimality of the Huffman code that the expected code length satisfies (1.12). An example of the Huffman code procedure is given in Table 1.1. For convenience the symbols are ordered by decreasing probabilities.
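The recursive procedure above can be sketched in a few lines of Python using a priority queue (this implementation and its symbol names are our own; as a check it reproduces the code of Example 1.5):

```python
import heapq

def huffman_code(probs):
    """Build a binary prefix code by the Huffman procedure.

    probs: dict mapping symbol -> probability.
    Returns a dict mapping symbol -> binary code string.
    """
    # Each heap entry: (probability, tiebreak, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Take the two least probable groups (Steps 1-2)...
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        # ...prefix their codewords with 0 and 1 (Step 3) and merge (Step 4).
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a1": 1/2, "a2": 1/4, "a3": 1/8, "a4": 1/8}
code = huffman_code(probs)
L = sum(probs[s] * len(code[s]) for s in probs)
print(code, L)  # expected length 7/4, matching Example 1.5
```

Since the probabilities here are dyadic, the Huffman code meets the entropy exactly; in general it only satisfies (1.12).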
By extending the alphabet and coding N symbols at a time, an average code length per original symbol arbitrarily close to the entropy may be achieved. The price is that the number of codewords increases exponentially in N.

1.5 Arithmetic codes

Arithmetic coding can code a sequence of symbols, x^n, using close to −log_2 P(x^n) bits, in accordance with the entropy, with a complexity that is only linear in n.

This is achieved by processing one source symbol at a time. Furthermore, arithmetic coding may immediately be combined with adaptive (dynamically changing) estimates of probabilities. Arithmetic coding is often referred to as coding by interval subdivision. The coding specifies a number (pointing) within a subinterval of the unit interval [0; 1[, where the lower end point is closed and the upper end point is open. The codeword identifying the subinterval, and thereby the sequence that is coded, is a binary representation of this number.

Ideal arithmetic coding

Let x^n denote a sequence of symbols to be coded; x^n = x_1 ... x_n has length n. Let x_i be the symbol at position i, and let P(x^n) denote a probability measure of x^n. At this point we disregard the restriction of finite precision of our computer, but this issue is discussed in the following sections. We shall associate the sequence of symbols x^n with an interval [C(x^n); C(x^n) + P(x^n)[. The lower bound C(x^n) may be used as the codeword. This interval subdivision may be done sequentially, so we need only calculate the subinterval, and thereby the codeword, associated with the sequence we are coding. If we were to order the intervals of all the possible sequences of length n one after another, we would have partitioned the unit interval [0; 1[ into intervals of lengths equal to the probabilities of the strings. (Summing the probabilities of all possible sequences adds to unity since we have presumed a probability measure.) For the sake of simplicity we restrict ourselves to the case of a binary source alphabet. The sequence x^n is extended to either x^n 0 or x^n 1. The codeword subinterval (of x^{n+1}) is calculated by the following recursion:

    C(x^n 0) = C(x^n)               if x_{n+1} = 0,    (1.13)
    C(x^n 1) = C(x^n) + P(x^n 0)    if x_{n+1} = 1,    (1.14)
    P(x^n i) = P(x^n) P(i | x^n),   i = 0, 1.          (1.15)

The width of the interval associated with x^n equals its probability P(x^n).
We note that (1.15) is a decomposition of the probability leading to the chain rule (1.4). Using pointers with a spacing of powers of 1/2, at least one pointer of length ⌈−log_2(P(x^n))⌉ will point to the interval. (⌈y⌉ denotes the ceiling or round-up value of y.) This suggests that arithmetic coding yields a code length that on average is very close to the entropy. Before formalizing this statement, the basic recursions of binary arithmetic coding (1.13)–(1.15) are generalized for a finite alphabet, A:

    C(x^n 0) = C(x^n),
    C(x^n i) = C(x^n) + Σ_{j<i} P(x^n j),   i = 1, ..., |A| − 1,
    P(x^n i) = P(x^n) P(i | x^n),           i = 0, ..., |A| − 1,

where |A| is the size of the symbol alphabet. Compared with the binary version, we still partition the interval into two for the lower bound, namely j < i and j ≥ i. The last equation again reduces the interval to a size proportional to the probability of x^{n+1}, i.e. P(x^{n+1}). One difference is that the selected subinterval may have intervals both above and below. Let a tag T(x^n) denote the center point of the interval, [C(x^n); C(x^n) + P(x^n)[, associated with x^n by the recursions. We shall determine a precision of l(x^n) bits, such that the codeword defined belongs to a prefix code for X^n. Now choose

    l(x^n) = ⌈−log_2(P(x^n))⌉ + 1 < −log_2(P(x^n)) + 2.    (1.16)

Truncating the tag to l(x^n) bits by (1.16) gives the codeword c = ⌊T(x^n)⌋_{l(x^n)}, where ⌊z⌋_k denotes the truncation of a fractional binary number z to a precision of k digits. Choosing l(x^n) by (1.16) ensures not only that this value, c, points within the interval but also that ⌊T(x^n)⌋_{l(x^n)} + 2^{−l(x^n)} points within the interval. This means that the codeword c will not be the prefix of the codeword generated by the procedure for any other outcome of X^n. Since this codeword construction is prefix-free, it is also uniquely decodable. Taking the expectation of (1.16) gives the right-hand side of

    H(X^n) < L < H(X^n) + 2.

Thus the expected code length per source symbol, L/n, will converge to the entropy as n approaches infinity. In addition to the fact that the arithmetic coding procedure above converges to the entropy with a complexity linear in the length of the sequence, it may also be noted that the probability expression is general, such that conditional probabilities may be used for sources with memory, and these conditional probabilities may vary over the sequence. These are desirable properties in many applications.
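The binary recursions (1.13)–(1.15) and the code length (1.16) can be sketched directly. In this Python illustration (our own; the example sequence and the i.i.d. probability of a 1 are arbitrary choices) the interval width after n symbols equals P(x^n), as stated above:

```python
from math import ceil, log2

def arithmetic_interval(bits, p1):
    """Ideal binary arithmetic coding: return the interval [C, C + P[
    for the sequence `bits`, following recursions (1.13)-(1.15).
    p1 is the probability of a 1 (i.i.d. source assumed here)."""
    C, P = 0.0, 1.0
    for b in bits:
        if b == 0:
            P *= (1 - p1)        # (1.13): C unchanged, interval shrinks
        else:
            C += P * (1 - p1)    # (1.14): skip past the subinterval for 0
            P *= p1              # (1.15)
    return C, P

def code_length(P):
    """Codeword precision (1.16) for an interval of width P."""
    return ceil(-log2(P)) + 1

C, P = arithmetic_interval([0, 1, 0, 0], p1=0.25)
print(C, P, code_length(P))
```

Note that the final interval width is the product of the symbol probabilities, so the code length tracks −log_2 P(x^n) to within two bits.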
The issue of estimating these conditional probabilities is addressed in a later chapter.

Finite-precision arithmetic coding

A major drawback of the recursions is that, even with finite-precision conditional probabilities P(i | j) in (1.15), the left-hand side will quickly need a precision beyond any register width we may choose for these calculations. (We remind ourselves that it is mandatory that the encoder and decoder can perform exactly the same calculations.) There are several remedies to this. The most straightforward is (returning to the binary case) to approximate the multiplication of (1.15) by

    P(x^n i) = ⌊P(x^n) P(i | x^n)⌋_k,   i = 0, 1,    (1.17)

i.e. k represents the precision of the representation of the interval. The relative position (or exponent) of the k digits is determined by the representation of C(x^n) as specified below. Having solved the precision problem of the multiplication, we still face the problem of representing C(x^n) with adequate precision and the related problem of avoiding having to code the whole data string before generating output bits. Adequate precision for unique decodability may be achieved by ensuring that all symbols at all points are associated


repetition, part ii Ole-Johan Skrede INF Digital Image Processing

repetition, part ii Ole-Johan Skrede 24.05.2017 INF2310 - Digital Image Processing Department of Informatics The Faculty of Mathematics and Natural Sciences University of Oslo today s lecture Coding and

Information Theory. Lecture 10. Network Information Theory (CT15); a focus on channel capacity results

Information Theory Lecture 10 Network Information Theory (CT15); a focus on channel capacity results The (two-user) multiple access channel (15.3) The (two-user) broadcast channel (15.6) The relay channel

Chapter I: Fundamental Information Theory

ECE-S622/T62 Notes Chapter I: Fundamental Information Theory Ruifeng Zhang Dept. of Electrical & Computer Eng. Drexel University. Information Source Information is the outcome of some physical processes.

Lecture 11: Quantum Information III - Source Coding

CSCI5370 Quantum Computing November 25, 203 Lecture : Quantum Information III - Source Coding Lecturer: Shengyu Zhang Scribe: Hing Yin Tsang. Holevo s bound Suppose Alice has an information source X that

Huffman Coding. C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University

Huffman Coding C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University http://www.csie.nctu.edu.tw/~cmliu/courses/compression/ Office: EC538 (03)573877 cmliu@cs.nctu.edu.tw

Data Compression Techniques

Data Compression Techniques Part 2: Text Compression Lecture 5: Context-Based Compression Juha Kärkkäinen 14.11.2017 1 / 19 Text Compression We will now look at techniques for text compression. These techniques

Kolmogorov complexity ; induction, prediction and compression

Kolmogorov complexity ; induction, prediction and compression Contents 1 Motivation for Kolmogorov complexity 1 2 Formal Definition 2 3 Trying to compute Kolmogorov complexity 3 4 Standard upper bounds

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Fabio Grazioso... July 3, 2006 1 2 Contents 1 Lecture 1, Entropy 4 1.1 Random variable...............................

ELEC 515 Information Theory. Distortionless Source Coding

ELEC 515 Information Theory Distortionless Source Coding 1 Source Coding Output Alphabet Y={y 1,,y J } Source Encoder Lengths 2 Source Coding Two coding requirements The source sequence can be recovered

CMPT 365 Multimedia Systems. Lossless Compression

CMPT 365 Multimedia Systems Lossless Compression Spring 2017 Edited from slides by Dr. Jiangchuan Liu CMPT365 Multimedia Systems 1 Outline Why compression? Entropy Variable Length Coding Shannon-Fano Coding

Information Theory and Model Selection

Information Theory and Model Selection Dean Foster & Robert Stine Dept of Statistics, Univ of Pennsylvania May 11, 1998 Data compression and coding Duality of code lengths and probabilities Model selection

The binary entropy function

ECE 7680 Lecture 2 Definitions and Basic Facts Objective: To learn a bunch of definitions about entropy and information measures that will be useful through the quarter, and to present some simple but

Introduction to algebraic codings Lecture Notes for MTH 416 Fall Ulrich Meierfrankenfeld

Introduction to algebraic codings Lecture Notes for MTH 416 Fall 2014 Ulrich Meierfrankenfeld December 9, 2014 2 Preface These are the Lecture Notes for the class MTH 416 in Fall 2014 at Michigan State

EE-597 Notes Quantization

EE-597 Notes Quantization Phil Schniter June, 4 Quantization Given a continuous-time and continuous-amplitude signal (t, processing and storage by modern digital hardware requires discretization in both

DRAFT. Algebraic computation models. Chapter 14

Chapter 14 Algebraic computation models Somewhat rough We think of numerical algorithms root-finding, gaussian elimination etc. as operating over R or C, even though the underlying representation of the

6.02 Fall 2012 Lecture #1

6.02 Fall 2012 Lecture #1 Digital vs. analog communication The birth of modern digital communication Information and entropy Codes, Huffman coding 6.02 Fall 2012 Lecture 1, Slide #1 6.02 Fall 2012 Lecture

Information redundancy

Information redundancy Information redundancy add information to date to tolerate faults error detecting codes error correcting codes data applications communication memory p. 2 - Design of Fault Tolerant

Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation

Large Deviations Performance of Knuth-Yao algorithm for Random Number Generation Akisato KIMURA akisato@ss.titech.ac.jp Tomohiko UYEMATSU uematsu@ss.titech.ac.jp April 2, 999 No. AK-TR-999-02 Abstract

Example: Letter Frequencies

Example: Letter Frequencies i a i p i 1 a 0.0575 2 b 0.0128 3 c 0.0263 4 d 0.0285 5 e 0.0913 6 f 0.0173 7 g 0.0133 8 h 0.0313 9 i 0.0599 10 j 0.0006 11 k 0.0084 12 l 0.0335 13 m 0.0235 14 n 0.0596 15 o

Factor Graphs and Message Passing Algorithms Part 1: Introduction

Factor Graphs and Message Passing Algorithms Part 1: Introduction Hans-Andrea Loeliger December 2007 1 The Two Basic Problems 1. Marginalization: Compute f k (x k ) f(x 1,..., x n ) x 1,..., x n except

Information in Biology

Information in Biology CRI - Centre de Recherches Interdisciplinaires, Paris May 2012 Information processing is an essential part of Life. Thinking about it in quantitative terms may is useful. 1 Living

1.6: Solutions 17. Solution to exercise 1.6 (p.13).

1.6: Solutions 17 A slightly more careful answer (short of explicit computation) goes as follows. Taking the approximation for ( N K) to the next order, we find: ( N N/2 ) 2 N 1 2πN/4. (1.40) This approximation

On the Cost of Worst-Case Coding Length Constraints

On the Cost of Worst-Case Coding Length Constraints Dror Baron and Andrew C. Singer Abstract We investigate the redundancy that arises from adding a worst-case length-constraint to uniquely decodable fixed

Machine Learning and Data Mining. Decision Trees. Prof. Alexander Ihler

+ Machine Learning and Data Mining Decision Trees Prof. Alexander Ihler Decision trees Func-onal form f(x;µ): nested if-then-else statements Discrete features: fully expressive (any func-on) Structure:

Infor mation Theor y. Outline

Outline Background Basic Definitions Infor mation and Entropy Independent Messages Codings Background Infor mation Theor y verses Infor mation Systems Infor mation theor y: originated in electrical engineering

10-704: Information Processing and Learning Spring Lecture 8: Feb 5

10-704: Information Processing and Learning Spring 2015 Lecture 8: Feb 5 Lecturer: Aarti Singh Scribe: Siheng Chen Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal

BASICS OF COMPRESSION THEORY

BASICS OF COMPRESSION THEORY Why Compression? Task: storage and transport of multimedia information. E.g.: non-interlaced HDTV: 0x0x0x = Mb/s!! Solutions: Develop technologies for higher bandwidth Find

Context tree models for source coding

Context tree models for source coding Toward Non-parametric Information Theory Licence de droits d usage Outline Lossless Source Coding = density estimation with log-loss Source Coding and Universal Coding

The Adaptive Classical Capacity of a Quantum Channel,

The Adaptive Classical Capacity of a Quantum Channel, or Information Capacities of Three Symmetric Pure States in Three Dimensions Peter W. Shor 1 AT&T Labs Research Florham Park, NJ 07932 02139 1 Current

Stream Codes. 6.1 The guessing game

About Chapter 6 Before reading Chapter 6, you should have read the previous chapter and worked on most of the exercises in it. We ll also make use of some Bayesian modelling ideas that arrived in the vicinity

THIS paper is aimed at designing efficient decoding algorithms

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 7, NOVEMBER 1999 2333 Sort-and-Match Algorithm for Soft-Decision Decoding Ilya Dumer, Member, IEEE Abstract Let a q-ary linear (n; k)-code C be used

CISC 876: Kolmogorov Complexity

March 27, 2007 Outline 1 Introduction 2 Definition Incompressibility and Randomness 3 Prefix Complexity Resource-Bounded K-Complexity 4 Incompressibility Method Gödel s Incompleteness Theorem 5 Outline

Information Theory and Coding

Information Theory and oding Subject ode : 0E55 IA Marks : 5 No. of Lecture Hrs/Week : 04 Exam Hours : 0 Total no. of Lecture Hrs. : 5 Exam Marks : 00 PART - A Unit : Information Theory: Introduction,

COMMUNICATION SCIENCES AND ENGINEERING

COMMUNICATION SCIENCES AND ENGINEERING X. PROCESSING AND TRANSMISSION OF INFORMATION Academic and Research Staff Prof. Peter Elias Prof. Robert G. Gallager Vincent Chan Samuel J. Dolinar, Jr. Richard

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms

Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds

Chapter 2. Error Correcting Codes. 2.1 Basic Notions

Chapter 2 Error Correcting Codes The identification number schemes we discussed in the previous chapter give us the ability to determine if an error has been made in recording or transmitting information.

Generalized Kraft Inequality and Arithmetic Coding

J. J. Rissanen Generalized Kraft Inequality and Arithmetic Coding Abstract: Algorithms for encoding and decoding finite strings over a finite alphabet are described. The coding operations are arithmetic

Introduction to Convolutional Codes, Part 1

Introduction to Convolutional Codes, Part 1 Frans M.J. Willems, Eindhoven University of Technology September 29, 2009 Elias, Father of Coding Theory Textbook Encoder Encoder Properties Systematic Codes

Lecture 4 October 18th

Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations

Optimum Binary-Constrained Homophonic Coding

Optimum Binary-Constrained Homophonic Coding Valdemar C. da Rocha Jr. and Cecilio Pimentel Communications Research Group - CODEC Department of Electronics and Systems, P.O. Box 7800 Federal University

Lecture 2. Capacity of the Gaussian channel

Spring, 207 5237S, Wireless Communications II 2. Lecture 2 Capacity of the Gaussian channel Review on basic concepts in inf. theory ( Cover&Thomas: Elements of Inf. Theory, Tse&Viswanath: Appendix B) AWGN

Block 2: Introduction to Information Theory

Block 2: Introduction to Information Theory Francisco J. Escribano April 26, 2015 Francisco J. Escribano Block 2: Introduction to Information Theory April 26, 2015 1 / 51 Table of contents 1 Motivation

3F1 Information Theory Course

3F1 Information Theory Course (supervisor copy) 1 3F1 Information Theory Course Jossy Sayir js851@cam.ac.uk November 15, 211 Contents 1 Introduction to Information Theory 4 1.1 Source Coding.....................................

18.2 Continuous Alphabet (discrete-time, memoryless) Channel

0-704: Information Processing and Learning Spring 0 Lecture 8: Gaussian channel, Parallel channels and Rate-distortion theory Lecturer: Aarti Singh Scribe: Danai Koutra Disclaimer: These notes have not

Reserved-Length Prefix Coding

Reserved-Length Prefix Coding Michael B. Baer vlnks Mountain View, CA 94041-2803, USA Email:.calbear@ 1eee.org Abstract Huffman coding finds an optimal prefix code for a given probability mass function.

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

COS597D: Information Theory in Computer Science September 21, Lecture 2

COS597D: Information Theory in Computer Science September 1, 011 Lecture Lecturer: Mark Braverman Scribe: Mark Braverman In the last lecture, we introduced entropy H(X), and conditional entry H(X Y ),

Aalborg Universitet. Bounds on information combining for parity-check equations Land, Ingmar Rüdiger; Hoeher, A.; Huber, Johannes

Aalborg Universitet Bounds on information combining for parity-check equations Land, Ingmar Rüdiger; Hoeher, A.; Huber, Johannes Published in: 2004 International Seminar on Communications DOI link to publication

Conditional Independence

H. Nooitgedagt Conditional Independence Bachelorscriptie, 18 augustus 2008 Scriptiebegeleider: prof.dr. R. Gill Mathematisch Instituut, Universiteit Leiden Preface In this thesis I ll discuss Conditional

Source and Channel Coding for Correlated Sources Over Multiuser Channels

Source and Channel Coding for Correlated Sources Over Multiuser Channels Deniz Gündüz, Elza Erkip, Andrea Goldsmith, H. Vincent Poor Abstract Source and channel coding over multiuser channels in which

On the minimum neighborhood of independent sets in the n-cube

Matemática Contemporânea, Vol. 44, 1 10 c 2015, Sociedade Brasileira de Matemática On the minimum neighborhood of independent sets in the n-cube Moysés da S. Sampaio Júnior Fabiano de S. Oliveira Luérbio

Digital Image Processing Lectures 25 & 26

Lectures 25 & 26, Professor Department of Electrical and Computer Engineering Colorado State University Spring 2015 Area 4: Image Encoding and Compression Goal: To exploit the redundancies in the image

Text Compression. Jayadev Misra The University of Texas at Austin December 5, A Very Incomplete Introduction to Information Theory 2

Text Compression Jayadev Misra The University of Texas at Austin December 5, 2003 Contents 1 Introduction 1 2 A Very Incomplete Introduction to Information Theory 2 3 Huffman Coding 5 3.1 Uniquely Decodable

Lies My Calculator and Computer Told Me

Lies My Calculator and Computer Told Me 2 LIES MY CALCULATOR AND COMPUTER TOLD ME Lies My Calculator and Computer Told Me See Section.4 for a discussion of graphing calculators and computers with graphing

Complexity and NP-completeness

Lecture 17 Complexity and NP-completeness Supplemental reading in CLRS: Chapter 34 As an engineer or computer scientist, it is important not only to be able to solve problems, but also to know which problems

Multi-User Fundamentals

Contents 12 Multi-User Fundamentals 401 12.1 Multi-user Channels and Bounds................................ 402 12.1.1 Data Rates and Rate Regions.............................. 406 12.1.2 Optimum multi-user

Information Theory and Communication

Information Theory and Communication Ritwik Banerjee rbanerjee@cs.stonybrook.edu c Ritwik Banerjee Information Theory and Communication 1/8 General Chain Rules Definition Conditional mutual information