
Six Lectures on Information Theory

John Kieffer

2 Prediction & Information Theory

Prediction is important in communications, control, forecasting, investment, and other areas. When the data model is known, it is understood how to do optimal prediction. When the data model is not known, one needs a prediction algorithm, called a universal prediction algorithm, that will perform well no matter what the data model turns out to be. These few pages represent a brief introduction to the interesting subject of universal prediction.

2.1 Shannon's Work in Prediction

Information theorists find prediction particularly interesting because of Claude Shannon's interest in this subject. In Section 2.1, we discuss Shannon's contributions to prediction.

2.1.1 Compression of English text via prediction

Shannon [16] performed an experiment in which a human subject was asked to try to predict the next letter in an English text, given all the previous letters. If the subject made a wrong prediction, the subject was allowed to make further guesses until the next letter was guessed correctly. The number of guesses needed to determine each letter was then recorded. Shannon reported one outcome of this experiment as follows:

T H E R E   I S   N O   R E V E R S E   O N   A   M O T O R C Y C L E   A   F R I E N D   O F   M I N E   F O U N D   T H I S   O U T   R A T H E R   D R A M A T I C A L L Y   T H E   O T H E R   D A Y

Beneath each letter of the text is given the number of guesses that were needed to guess that letter correctly. (The row of guess counts is the sequence (1) below.) Suppose the subject always makes predictions in the same way, based on what the subject has seen in the past. Then the text is completely recoverable from the sequence consisting of the numbers of guesses:

$$1, 1, 1, 5, 1, 1, 2, \ldots, 1, 1, 1, 1, 1 \tag{1}$$

For example, the subject looks at the first integer in this sequence and sees a "1". The subject then knows that the first letter of the text is the letter that the subject would make as his/her first guess as the first letter of any text. Presumably, the subject has a rule that tells him/her to always make the initial guess for the first letter of any text to be "T". The first letter of the text is therefore decoded correctly.

One can arrive at a data compression algorithm based on Shannon's idea. If $x_1, x_2, \ldots, x_{i-1}$ represents the first $i-1$ letters of any text, then you have to specify a permutation

$$\sigma(x_1, \ldots, x_{i-1}) = (\sigma_1, \sigma_2, \ldots, \sigma_{27}) \tag{2}$$

of the set of letters $\{a, b, c, \ldots, z, \mathrm{space}\}$ such that $\sigma_1$ is your first prediction (guess) for $x_i$, the $i$-th letter of the text, $\sigma_2$ is your second prediction (guess) for $x_i$, and so on. The family of all these permutations $\sigma(x_1, \ldots, x_{i-1})$, as $i$ varies and $x_1, x_2, \ldots, x_{i-1}$ vary, determines a data compression algorithm. You first represent your text $x_1, x_2, \ldots, x_N$ as the sequence of positive integers $j_1, j_2, \ldots, j_N$ in which $j_i = j$ if and only if the $j$-th term of the sequence $\sigma(x_1, \ldots, x_{i-1})$ in (2) is equal to $x_i$. Then, one compresses the sequence $j_1, j_2, \ldots, j_N$ using an adaptive arithmetic coder.
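
The permutation-plus-arithmetic-coding scheme just described is easy to prototype. The sketch below is a minimal illustration, assuming one simple concrete choice of the permutations $\sigma(x_1, \ldots, x_{i-1})$ (rank symbols by decreasing frequency in the prefix, ties broken by character order); it only produces the integer sequence $j_1, \ldots, j_N$ and its inverse, with the adaptive arithmetic coding stage left out.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # 27 symbols: the letters plus space

def guess_order(prefix):
    """One hypothetical choice of sigma(x_1,...,x_{i-1}): rank the symbols by
    decreasing frequency in the prefix, breaking ties by character order."""
    counts = {a: 0 for a in ALPHABET}
    for ch in prefix:
        counts[ch] += 1
    return sorted(ALPHABET, key=lambda a: (-counts[a], a))

def text_to_ranks(text):
    """Encoder: j_i is the 1-based position of x_i in sigma(x_1,...,x_{i-1})."""
    return [guess_order(text[:i]).index(ch) + 1 for i, ch in enumerate(text)]

def ranks_to_text(ranks):
    """Decoder: rebuild the same permutations step by step and invert the map."""
    text = ""
    for j in ranks:
        text += guess_order(text)[j - 1]
    return text

if __name__ == "__main__":
    msg = "there is no reverse on a motorcycle"
    ranks = text_to_ranks(msg)
    assert ranks_to_text(ranks) == msg        # the representation is lossless
    print(ranks)                              # mostly small integers, hence compressible
```

Because the guessing rule is deterministic and depends only on the prefix, the decoder rebuilds exactly the same permutation at every step, which is what makes the map invertible.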

P. Fenwick [5] has recently implemented a compression algorithm for English text of this type, with good results.

EXERCISE. For those of you that know how to do adaptive arithmetic coding, arithmetically encode the sequence (1), and thereby determine the number of codebits needed to represent Shannon's text above. (If you don't know how to do adaptive arithmetic coding, you can get an approximate upper bound on the arithmetic codeword length by computing $\log_2 J$, where $J$ is the product of the terms in the sequence (1).)

2.1.2 Mind-reading machines

At Bell Laboratories, David Hagelbarger [8] built a simple mind-reading machine whose purpose was to play the "penny matching" game. In this game, a player chooses head or tail, while a mind-reading machine tries to predict and match his/her choice. Hagelbarger's simple 8-state machine was able to match the "pennies" of its human opponent 5218 times over the course of 9795 plays. Hagelbarger's machine recorded and analyzed its human opponent's past choices, looking for patterns that would foretell the next choice; since it is almost impossible for a human to avoid falling into such patterns, Hagelbarger's machine could be successful more than 50% of the time. Inspired by Hagelbarger's success, Shannon then built a different 8-state mind-reading machine. In a duel between the two machines, Shannon's machine won. A description of Shannon's machine may be found in [17]. An account of Shannon's philosophy in building mind-reading machines and other game-playing machines may be found in [18].

2.2 Predictors for Known and Unknown Data Models

If one is dealing with a data sequence whose probabilistic model is known, then one can optimally predict the samples in the sequence. This is discussed in Section 2.2.1. If one does not know the data model, one must use a predictor that works well no matter what the data model is; that is, one must use a universal predictor. Section 2.2.2 pins down for us the notion of a universal predictor.

2.2.1 Optimum predictor for a known data model

Let $A$ be a finite alphabet. Let $X_1, X_2, X_3, \ldots$ be an infinite random data sequence in which each random data sample $X_i$ takes its value in $A$. A predictor applied to this sequence shall generate a sequence of predictions $\hat{X}_1, \hat{X}_2, \ldots$, in which $\hat{X}_i$ is the predicted value of $X_i$ for every $i$. The predictions must be nonanticipatory; that is, for each $i \ge 2$, $\hat{X}_i$ is determined by $X_1, X_2, \ldots, X_{i-1}$.

A predictor can be either deterministic or random. To define a deterministic predictor, you specify for each $i$ a function $f_i$ from $A^{i-1}$ into $A$. You then define the predictions by

$$\hat{X}_i = f_i(X_1, X_2, \ldots, X_{i-1}).$$

To define a random predictor, you specify a probability distribution $q_i(\cdot \mid x_1, x_2, \ldots, x_{i-1})$ on $A$ for each sequence of values $x_1, x_2, \ldots, x_{i-1}$ of $X_1, X_2, \ldots, X_{i-1}$, respectively. Then, the conditional distribution of $\hat{X}_i$ given $X_1 = x_1, X_2 = x_2, \ldots, X_{i-1} = x_{i-1}$ is taken to be $q_i(\cdot \mid x_1, \ldots, x_{i-1})$.
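
To make the distinction between the two kinds of predictors concrete, here is a small sketch for the binary alphabet $A = \{0, 1\}$; the particular rules used (predict the majority symbol seen so far, or randomize according to smoothed empirical frequencies) are illustrative choices, not ones prescribed by the notes.

```python
import random

A = (0, 1)   # binary alphabet

def f_i(past):
    """A deterministic predictor: a fixed function f_i of the past samples.
    This illustrative choice predicts the majority symbol seen so far."""
    return 1 if sum(past) > len(past) / 2 else 0

def q_i(past):
    """A random predictor: a probability distribution q_i(.|x_1,...,x_{i-1}) on A.
    This illustrative choice uses smoothed empirical frequencies of the past."""
    p1 = (sum(past) + 1) / (len(past) + 2)
    return {0: 1 - p1, 1: p1}

def predict(past, deterministic=True, rng=random):
    """Produce the prediction hat{X}_i from the past samples x_1,...,x_{i-1}."""
    if deterministic:
        return f_i(past)
    dist = q_i(past)
    return rng.choices(A, weights=[dist[a] for a in A])[0]

if __name__ == "__main__":
    past = [0, 1, 0, 0, 1]
    print(predict(past), predict(past, deterministic=False))
```
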
At this point, we need to make some technical comments. Having specified a predictor, one needs to be able to compute the prediction error probability $\Pr[\hat{X}_i \neq X_i]$ for each $i$. To accomplish this, we make the implicit assumption that the conditional probability

$$\Pr[\hat{X}_i = \hat{x}_i \mid X_1 = x_1, X_2 = x_2, \ldots, X_{i-1} = x_{i-1}] \tag{3}$$

coincides with the conditional probability

$$\Pr[\hat{X}_i = \hat{x}_i \mid X_1 = x_1, X_2 = x_2, \ldots, X_i = x_i].$$

This allows one to compute the joint conditional probability

$$\Pr[X_i = x_i, \hat{X}_i = \hat{x}_i \mid X_1 = x_1, X_2 = x_2, \ldots, X_{i-1} = x_{i-1}] \tag{4}$$

as the product of the conditional probability $\Pr[X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}]$ and the conditional probability in (3) (which comes from $q_i$). From the probabilities (4), one can compute the joint probability distribution of $X_i$ and $\hat{X}_i$, from which one determines $\Pr[\hat{X}_i \neq X_i]$.

The aim of predictor design is to design a predictor so that $\Pr[\hat{X}_i \neq X_i]$ is minimized for every $i$. Such a predictor is called an optimum predictor. If the probabilistic model of the random sequence $\{X_i\}$ is known (that is, the probability distribution of $(X_1, X_2, \ldots, X_i)$ is known for every $i$), then an optimum predictor is defined as follows:

$$\hat{X}_i = \arg\max_{a \in A} \Pr[X_i = a \mid X_1, X_2, \ldots, X_{i-1}].$$

Note that this optimum predictor is a deterministic predictor. We see that, in the known data model case, there is no need to consider random predictors.

EXAMPLE. Let $p(\cdot)$ be a probability distribution on $A$. Suppose the model for the data sequence $\{X_i\}$ is known to be the i.i.d. model in which each $X_i$ has the distribution $p(\cdot)$. Then the best predictor, as given by the formula above, can be defined in the following way. Pick $a \in A$ so that $p(a) \ge p(x)$ for every $x \in A$. Define

$$\hat{X}_i = a, \qquad i = 1, 2, 3, \ldots$$

2.2.2 Prediction for an unknown data model

If the model for the data sequence $\{X_i\}$ is not known, then the optimum predictor may not be known, since it may depend on the model. In this case, one can try to find a "universal predictor", a predictor that will make $\Pr[\hat{X}_i \neq X_i]$ asymptotically small as $i \to \infty$, no matter what the model may be.

In order to make precise the notion of a universal predictor, let us simplify matters by assuming that our random data sequence $\{X_i\}$ is a stationary sequence. Suppose we pick the best predictor in the sense that $\Pr[\hat{X}_i \neq X_i]$ is minimized for every $i$. Then it can be shown that the average prediction error probability

$$n^{-1} \sum_{i=1}^{n} \Pr[\hat{X}_i \neq X_i]$$

converges as $n \to \infty$ to a number we shall denote as $\pi(M)$, where $M$ is the probabilistic model of the sequence $\{X_i\}$. (You could take $M$ to be the family of finite-dimensional distributions of the random sequence $\{X_i\}$.)

We suppose now that the model $M$ of the data sequence $\{X_i\}$ is unknown, and lies in some known class $\mathcal{M}$ of stationary data models. (For example, $\mathcal{M}$ could be the class of i.i.d. models, or the class of stationary Markov models.) We say that a predictor is a universal predictor for the class $\mathcal{M}$ if

$$n^{-1} \sum_{i=1}^{n} \Pr[\hat{X}_i \neq X_i] \to \pi(M) \quad \text{as } n \to \infty \tag{5}$$

where the data sequence $\{X_i\}$ employed in (5) is allowed to have a model $M$ which can be any model in $\mathcal{M}$.

EXAMPLE. Suppose we have a binary alphabet ($A = \{0, 1\}$). Let $N_0(x_1, x_2, \ldots, x_{i-1})$ denote the number of zeroes in the binary sequence $(x_1, x_2, \ldots, x_{i-1})$ and let $N_1(x_1, x_2, \ldots, x_{i-1})$ denote the number of ones in this same sequence. Consider the following predictor:

$$\hat{X}_i = \begin{cases} 0, & N_0(X_1, \ldots, X_{i-1}) \ge N_1(X_1, \ldots, X_{i-1}) \\ 1, & \text{otherwise} \end{cases} \tag{6}$$

It is easy to show that this predictor is a universal predictor for the class of i.i.d. binary data models. (Hint: Let $\{X_i\}$ be an i.i.d. binary data sequence, and let $p = \Pr[X_1 = 0]$; it is easy to show that $\pi(M) = \min(p, 1-p)$, and using the law of large numbers, it is easily seen that the limit on the left side of (5) is also $\min(p, 1-p)$.)
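
As a quick numerical sanity check of this example (a sketch, not part of the original notes), the following simulates the predictor (6) on i.i.d. binary data and compares its empirical average error with $\min(p, 1-p)$:

```python
import random

def average_error(p, n, seed=0):
    """Empirical average prediction error of the predictor (6) over n i.i.d.
    binary samples with Pr[X = 0] = p."""
    rng = random.Random(seed)
    n0 = n1 = errors = 0
    for _ in range(n):
        prediction = 0 if n0 >= n1 else 1       # the rule (6)
        x = 0 if rng.random() < p else 1
        errors += (prediction != x)
        if x == 0:
            n0 += 1
        else:
            n1 += 1
    return errors / n

if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9):
        # for large n the empirical average error should be close to min(p, 1 - p)
        print(p, round(average_error(p, 100_000), 4), min(p, 1 - p))
```
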

No matter what the class of stationary data models $\mathcal{M}$ may be, there will exist at least one universal predictor for $\mathcal{M}$. Of these universal predictors, which are the best? One should look for universal predictors for the class $\mathcal{M}$ for which the speed of convergence to the limit in (5) is fast rather than slow. For example, Merhav et al. [11] have shown that for the binary i.i.d. model class and for the universal predictor in (6), the convergence in (5) takes place with speed $O(1/n)$ as $n \to \infty$. In addition, they obtain similar results for a universal predictor for the class of binary Markov models. Results of this type for more general model classes are needed.

2.3 Construction of Universal Predictors

We discuss two information-theoretic methods for constructing universal predictors. In Section 2.3.1, we show how to construct universal predictors from arithmetic encoders. In Section 2.3.2, we discuss the MDL method for constructing universal predictors.

2.3.1 Universal predictors from arithmetic encoders

Prediction and data compression are connected in the sense that good data compressors can be used to construct good predictors (and vice versa). Data compression can be accomplished using what are called arithmetic encoders. From an arithmetic encoder that works well, we show in this section how to construct a universal predictor. The reader need not know anything about the mechanics of arithmetic encoding in order to understand this section.

Let $\{q_i : i = 1, 2, \ldots\}$ denote a family of conditional probability distributions in which, for each $i$ and each $x_1, x_2, \ldots, x_{i-1}$ from the data alphabet $A$, $q_i(\cdot \mid x_1, x_2, \ldots, x_{i-1})$ is a probability distribution on $A$. The family $\{q_i\}$ defines an arithmetic encoder which encodes the first $n$ samples $X_1, X_2, \ldots, X_n$ of any random data sequence $\{X_i : i = 1, 2, \ldots\}$ using roughly

$$\sum_{i=1}^{n} -\log q_i(X_i \mid X_1, X_2, \ldots, X_{i-1}) \tag{7}$$

codebits, where the logarithm throughout is to base two. (The actual number of codebits differs from the quantity in (7) by at most 2.) The distributions $\{q_i\}$ used to define an arithmetic encoder shall be called the coding distributions for the arithmetic encoder.

Suppose the model of the data sequence $\{X_i\}$ is known. What is the optimum arithmetic encoder for this data sequence? For each $i$, let $p_i$ denote the conditional distribution of $X_i$ given $X_1, X_2, \ldots, X_{i-1}$. In other words,

$$p_i(x_i \mid x_1, x_2, \ldots, x_{i-1}) = \Pr[X_i = x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}].$$

It can be shown that the optimal arithmetic encoder for the data sequence $\{X_i\}$ is the arithmetic encoder whose coding distributions $\{q_i\}$ satisfy $q_i = p_i$ for every $i$.

We suppose now that the model $M$ of the data sequence $\{X_i\}$ is stationary. Let $H(M)$ (called the entropy of the model $M$) be the number defined by

$$H(M) = \lim_{n \to \infty} n^{-1} E\left[ \sum_{i=1}^{n} -\log p_i(X_i \mid X_1, \ldots, X_{i-1}) \right].$$

Let $\mathcal{M}$ be a family of stationary models. An arithmetic encoder with coding distributions $\{q_i\}$ is called a universal arithmetic encoder for $\mathcal{M}$ if

$$n^{-1} E\left[ \sum_{i=1}^{n} -\log q_i(X_i \mid X_1, X_2, \ldots, X_{i-1}) \right] \to H(M) \quad \text{as } n \to \infty$$

for all models $M$ in $\mathcal{M}$. There exist universal arithmetic encoders for the family of all stationary data models.
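
The quantity (7) is easy to compute once the coding distributions are given. The sketch below does exactly that; the distribution `kt_binary` used in the demo is the binary i.i.d. example given next (a Krichevsky-Trofimov-style estimator).

```python
import math

def ideal_codelength(x, q):
    """The ideal codelength (7): the sum of -log2 q_i(x_i | x_1,...,x_{i-1}).
    `q(past, a)` must return the coding probability of symbol a given the past;
    the actual arithmetic codeword is at most 2 bits longer than this total."""
    return sum(-math.log2(q(x[:i], a)) for i, a in enumerate(x))

def kt_binary(past, a):
    """Coding distribution of the binary i.i.d. example below (a Krichevsky-
    Trofimov-style estimator): q_i(a | past) = (N_a(past) + 1/2) / i."""
    return (past.count(a) + 0.5) / (len(past) + 1)

if __name__ == "__main__":
    x = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
    print(ideal_codelength(x, kt_binary), "bits for", len(x), "samples")
```
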

EXAMPLE. The following is known to define a universal arithmetic encoder for the family of all binary i.i.d. data models:

$$q_i(a \mid x_1, \ldots, x_{i-1}) = \begin{cases} \dfrac{N_0(x_1, \ldots, x_{i-1}) + 1/2}{i}, & a = 0 \\[1ex] \dfrac{N_1(x_1, \ldots, x_{i-1}) + 1/2}{i}, & a = 1 \end{cases}$$

The following result shows how a universal predictor can be obtained from a universal arithmetic encoder.

Theorem 1. Let $\mathcal{M}$ be any family of stationary data models. Let $\{q_i\}$ be the family of coding distributions for a universal arithmetic encoder for $\mathcal{M}$. For each $i$, let $\mu_i$ be the conditional probability distribution in which

$$\mu_i(a \mid x_1, \ldots, x_{i-1}) = (i-1)^{-1} \sum_{j=1}^{i-1} q_j(a \mid x_{i-j}, \ldots, x_{i-1}).$$

Consider the predictor defined by

$$\hat{X}_i = \arg\max_{a \in A} \mu_i(a \mid X_1, \ldots, X_{i-1}).$$

Then this predictor is a universal predictor for the class $\mathcal{M}$.

EXAMPLE. Ryabko [14] [15] discovered a universal predictor for the class of stationary data models which he constructed from an arithmetic encoder. We shall not present Ryabko's predictor here, because it is too complicated. However, we do present the results of Ryabko's case study, in which he compared the performance of his universal predictor to the performance of the well-known Laplace predictor. Ryabko's performance criterion was to see how well a predictor performs as a betting strategy in betting on the outcome of the World Cup championship games from 1950 through 1990. First, the performance of Laplace's predictor was tested. Laplace's predictor is a random predictor which yields predictions $\{\hat{X}_i\}$ for a binary sequence $\{X_i\}$ as follows:

$$\Pr[\hat{X}_i = a \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}] = \begin{cases} \dfrac{N_0(x_1, \ldots, x_{i-1}) + 1}{i + 1}, & a = 0 \\[1ex] \dfrac{N_1(x_1, \ldots, x_{i-1}) + 1}{i + 1}, & a = 1 \end{cases}$$

In each World Cup year, a prediction was made as to whether the winner would be a European team (E) or a non-European team (N). Bets were made by spreading the gambler's current capital according to the probabilities of the two possible predictions. An initial capital of $1000 was assumed. The unrealistic assumption was made that the betting odds were equal in each year.

Table 1: Performance of Laplace's Predictor

Year     1950  1954  1958  1962  1966  1970  1974  1978  1982  1986  1990
Winner    N     E     N     N     E     N     E     N     E     N     E
Pr[E]    1/2   1/3   1/2   2/5   1/3   3/7   3/8   4/9   2/5   5/11  5/12
Pr[N]    1/2   2/3   1/2   3/5   2/3   4/7   5/8   5/9   3/5   6/11  7/12
Capital  1000  667   667   800   533   610   457   508   406   443   369

The amount of capital after the termination of each year's World Cup is listed. The results show that the Laplace predictor does not yield a good betting strategy, since the capital dwindled from $1000 to $369 over the course of the 11 World Cups.

Ryabko's predictor gave the following results when applied to World Cup betting:

Table 2: Performance of Ryabko's Predictor

Year     1950  1954  1958  1962  1966  1970  1974  1978  1982  1986  1990
Winner    N     E     N     N     E     N     E     N     E     N     E
[Probability and capital entries omitted.]
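
The Laplace capital trajectory in Table 1 can be reproduced directly from the rule stated above. The sketch below interprets "equal betting odds" as a 2-for-1 payoff on either outcome, so that spreading the capital in proportion to the predicted probabilities multiplies it by $2\Pr[\text{actual winner}]$ each year; with the probabilities of Table 1 this ends at roughly $369.

```python
from fractions import Fraction

def laplace_prob(history, outcome):
    """Laplace predictor: Pr[next = outcome] = (count(outcome) + 1) / (i + 1),
    where i - 1 = len(history) is the number of past outcomes."""
    return Fraction(history.count(outcome) + 1, len(history) + 2)

def run_betting(winners, initial=1000):
    """Spread the current capital over E and N in proportion to the Laplace
    probabilities; at equal (2-for-1) odds the capital is multiplied each year
    by 2 * Pr[actual winner]."""
    capital, history = Fraction(initial), []
    for w in winners:
        capital *= 2 * laplace_prob(history, w)
        history.append(w)
        print(w, round(float(capital)))
    return capital

if __name__ == "__main__":
    winners = "N E N N E N E N E N E".split()          # World Cup winners, 1950-1990
    print("final capital:", float(run_betting(winners)))  # about $369, as in Table 1
```
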

Note that Ryabko's predictor greatly outperformed Laplace's predictor in this case study. Using Ryabko's predictor to govern the betting strategy, the initial capital increased from $1000 to $1920!

2.3.2 MDL-based universal predictor design

In information theory, "MDL" stands for "minimum description length". The minimum description length criterion, developed largely by J. Rissanen, selects a data model from a parametric family of data models so that the total number of codebits needed to encode the model parameter together with the observed data is minimized. The selected model can then be used to construct a predictor for the next data sample. The predictor selected by this MDL-based technique can be a universal predictor in some cases. We now furnish the details of the MDL method.

Let $\Theta$ be a parameter space. Let $\{M_\theta : \theta \in \Theta\}$ be a family of data models indexed by $\theta$. Each model $M_\theta$ is a sequence of probability distributions $\{p_\theta(x^n) : n = 1, 2, \ldots\}$, where each $p_\theta(x^n)$ is the distribution of the vector $x^n$ consisting of the first $n$ random data samples from the finite alphabet $A$, assuming that the true value of the model parameter is $\theta$.

One partitions the parameter space $\Theta$ into finitely many subsets, and selects a parameter value from each subset in the partition. The set of selected parameters forms a finite subset $\hat\Theta$ of the parameter space. Let $\theta \to \hat\theta$ be the mapping which maps each parameter $\theta$ in $\Theta$ into the parameter $\hat\theta$ from $\hat\Theta$ which lies in the same subset of the partition of $\Theta$ as does $\theta$. The mapping $\theta \to \hat\theta$ quantizes the parameters in the parameter space. Next, one selects a binary codeword of length $L(\hat\theta)$ to represent each parameter $\hat\theta$ in $\hat\Theta$.

Suppose now that the first $i-1$ data samples $x^{i-1}$ have been observed, and it is desired to form the prediction $\hat{x}_i$ of the next sample. To do this, one first selects $\theta$ so that

$$\theta = \arg\min_{\theta \in \Theta} \left[ L(\hat\theta) - \log p_{\hat\theta}(x^{i-1}) \right].$$

Then, one assumes that the true data model is the model $M_\theta$, and forms the prediction $\hat{x}_i$ based on this model:

$$\hat{x}_i = \arg\max_{a \in A} p_\theta(a \mid x^{i-1})$$

where the conditional probability $p_\theta(a \mid x^{i-1})$ is computed as the ratio

$$p_\theta(a \mid x^{i-1}) = p_\theta(x^{i-1}, a) / p_\theta(x^{i-1}).$$

The reader can consult the works [12] [13] to see instances in which the predictor defined as above turns out to be a universal predictor for the family of models $\{M_\theta\}$.

2.4 Nonparametric Universal Predictors

In nonparametric universal prediction theory, one wishes to predict as well as possible the terms in completely arbitrary data sequences, not governed by any probabilistic model. Universal predictors can be found in this nonparametric context, but they must be random predictors. One can draw the following moral from this: for a random data model, known or unknown, one can always choose a good deterministic predictor, provided that the model is sufficiently "smooth" (such as a stationary model); for a completely arbitrary data model, to get a good predictor, you must choose a nondeterministic predictor.

In order to describe nonparametric prediction theory, we need the concept of a finite-state predictor. For a given deterministic predictor, suppose there is a finite set $S$ and two functions $f : S \times A \to A$ and $g : S \times A \to S$ such that the predictions $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n$ arising from applying the predictor to an arbitrary finite-length sequence $x_1, x_2, \ldots, x_n$ are generated by the formulae

$$\hat{x}_i = f(s_{i-1}, x_{i-1}), \qquad s_i = g(s_{i-1}, x_{i-1}),$$

where $s_0$ is some fixed element of $S$. Then the predictor is said to be a finite-state predictor.
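
As a concrete instance of this definition (an illustrative sketch; the particular two-state machine is not taken from the notes), here is a finite-state predictor over $A = \{0, 1\}$ whose state remembers the previous sample and whose prediction repeats it:

```python
def run_fsp(x, f, g, s0, x0=0):
    """Finite-state predictor: hat{x}_i = f(s_{i-1}, x_{i-1}), s_i = g(s_{i-1}, x_{i-1}),
    with fixed initial state s0 (x0 stands in for the unobserved symbol x_0)."""
    predictions, s, prev = [], s0, x0
    for xi in x:
        predictions.append(f(s, prev))
        s, prev = g(s, prev), xi
    return predictions

# An illustrative two-state predictor over A = {0, 1}: the state stores the most
# recent input symbol, and the prediction simply repeats that symbol.
f = lambda s, a: a     # predict the previous symbol
g = lambda s, a: a     # next state = previous symbol

if __name__ == "__main__":
    x = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
    print(run_fsp(x, f, g, s0=0))
```
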

We are now ready to pose a general nonparametric prediction design problem. Let $x^n$ denote an arbitrary binary sequence $(x_1, x_2, \ldots, x_n)$ of length $n$. If some prediction rule applied to $x^n$ yields the sequence of predictions $\hat{x}^n = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)$, then the average prediction error $d_n(x^n, \hat{x}^n)$, which we must control, is defined by

$$d_n(x^n, \hat{x}^n) = n^{-1} \sum_{i=1}^{n} d(x_i, \hat{x}_i) \tag{8}$$

where $d(x, \hat{x})$ is the Hamming distance function given by

$$d(x, \hat{x}) = \begin{cases} 0, & x = \hat{x} \\ 1, & x \neq \hat{x}. \end{cases}$$

Let $\Lambda$ be a finite set of finite-state predictors. For each predictor $\lambda \in \Lambda$, let $d_\lambda(x^n)$ denote the average prediction error arising when $\lambda$ is applied to the sequence $x^n$. Can we design a random predictor so that, if $\hat{x}^n$ is the sequence of random predictions resulting from applying the predictor to $x^n$, then

$$\max_{x^n} \left| E[d_n(x^n, \hat{x}^n)] - \min_{\lambda \in \Lambda} d_\lambda(x^n) \right| \to 0 \quad \text{as } n \to \infty?$$

The answer is yes!

The design of the random predictor asked for in the preceding is somewhat tricky. We content ourselves here with an elaboration of the form of this universal random predictor in a special case. Let the data alphabet be binary. We consider two elementary deterministic prediction rules: Rule 0 is the prediction rule in which every prediction is taken to be 0, and Rule 1 is the prediction rule in which every prediction is taken to be 1. For Rule 0 applied to $x^n$, the average prediction error is $N_1(x^n)/n$, and for Rule 1, the average prediction error is $N_0(x^n)/n$. There must be a random predictor for which

$$\max_{x^n} \left| E[d_n(x^n, \hat{x}^n)] - \min(N_0(x^n)/n, N_1(x^n)/n) \right| \to 0 \quad \text{as } n \to \infty.$$

What form does this predictor take? Letting $q_i(\cdot \mid x_1, x_2, \ldots, x_{i-1})$ denote the distribution of $\hat{x}_i$ as determined by the past data samples $x_1, x_2, \ldots, x_{i-1}$, this universal predictor takes the form

$$q_i(1 \mid x_1, \ldots, x_{i-1}) = \begin{cases} 0, & N_0(x_1, \ldots, x_{i-1}) > (1/2 + \epsilon_i)(i-1) \\ 1, & N_0(x_1, \ldots, x_{i-1}) < (1/2 - \epsilon_i)(i-1) \\ \dfrac{1/2 + \epsilon_i - N_0(x_1, \ldots, x_{i-1})/(i-1)}{2\epsilon_i}, & \text{otherwise} \end{cases}$$

where the numbers $\epsilon_i$ are positive numbers which $\to 0$ as $i \to \infty$. In other words, if the fraction of ones seen in the past lies within the bounds $1/2 \pm \epsilon_i$, a random estimate is made for $x_i$; otherwise, a deterministic estimate is taken.

For a treatment of nonparametric universal prediction theory, the reader is invited to consult the references [7], [2], [3], and [9].
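
The randomized rule $q_i(1 \mid x_1, \ldots, x_{i-1})$ displayed above translates directly into code. In the sketch below, the choice $\epsilon_i = i^{-1/3}$ is just one admissible sequence tending to 0, and the uniform guess at $i = 1$ is a convention for the empty past:

```python
import random

def q1(past, eps):
    """Probability that the randomized predictor guesses hat{x}_i = 1, given the
    past samples and the slack eps = eps_i (the rule displayed above)."""
    i, n0 = len(past) + 1, past.count(0)
    if n0 > (0.5 + eps) * (i - 1):
        return 0.0
    if n0 < (0.5 - eps) * (i - 1):
        return 1.0
    if i == 1:
        return 0.5                         # empty past: guess uniformly
    return (0.5 + eps - n0 / (i - 1)) / (2 * eps)

def predict(past, rng=random):
    eps = (len(past) + 1) ** (-1 / 3)      # one admissible choice of eps_i -> 0
    return 1 if rng.random() < q1(past, eps) else 0

if __name__ == "__main__":
    past = [0, 1, 1, 0, 1, 0, 1, 1]
    print([predict(past) for _ in range(10)])   # random estimates for a near-balanced past
```
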

2.5 String-Matching Predictors

In the examples of universal predictors presented so far, prediction of the value of a sample is based upon frequencies of patterns appearing in the sequence of past samples. A string-matching type predictor operates in a simpler manner: one simply looks in the past to find one pattern that agrees with a pattern formed by the most recently observed samples, taking the prediction to be the sample appearing right after the pattern found. We illustrate one example of a string-matching type predictor. Suppose the samples that have been observed are

$$x_1, x_2, \ldots, x_{i-1}. \tag{9}$$

The prediction $\hat{x}_i$ is made by following these three steps:

Step 1: Find the longest pattern appearing at the end of the sequence (9) which has a matching pattern in the past.

Step 2: Identify the first place where a matching pattern is found as one moves back into the past, and construct the reduced sequence obtained by deleting all symbols from (9) appearing to the left of the position of this matching pattern.

Step 3: Take $\hat{x}_i$ to be the sample following the pattern at the beginning of the reduced sequence.

EXAMPLE. Let the past samples be

1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0

and let us predict the next sample. Step 1 identifies the longest matching pattern to be "0, 0, 1, 0". Step 2 gives us the reduced sequence

0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0

Step 3 tells us to take our predicted value to be "1", because this value follows the pattern "0, 0, 1, 0" at the beginning of the reduced sequence above.

The predictor defined above can be shown to be a universal predictor for the class of all i.i.d. data models with alphabet $A$. It is an open problem to find the largest class of stationary data models for which this predictor is a universal predictor.
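
A minimal implementation of the three steps (a sketch; it tries every suffix length by brute force, which is fine for short sequences such as the one in the example):

```python
def string_match_predict(past):
    """String-matching predictor: find the longest suffix of `past` that also occurs
    earlier (Step 1), take its most recent earlier occurrence (Step 2), and predict
    the symbol immediately following that occurrence (Step 3)."""
    n = len(past)
    for length in range(n - 1, 0, -1):                # longest suffix first
        suffix = past[n - length:]
        for start in range(n - length - 1, -1, -1):   # first match moving back in time
            if past[start:start + length] == suffix:
                return past[start + length]
    return past[-1] if past else 0                    # fallback: no match anywhere

if __name__ == "__main__":
    past = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0]
    print(string_match_predict(past))                 # prints 1, as in the example
```
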

References

[1] P. Algoet, "Universal schemes for prediction, gambling and portfolio selection," Ann. Prob., vol. 20.

[2] D. Blackwell, "Controlled random walk," in Proceedings of the 1954 Congress of Mathematicians, vol. III, Amsterdam: North Holland.

[3] T. Cover, "Behavior of sequential predictors of binary sequences," in Proc. 4th Prague Conference on Information Theory, Stat. Dec. Functions, and Random Processes, Prague.

[4] T. Cover, "Universal portfolios," Math. Finance, vol. 1, pp. 1-29.

[5] P. Fenwick, Technical Report, Dept. of Computer Science, University of Auckland, New Zealand.

[6] M. Feder, "Gambling using a finite-state machine," IEEE Trans. Inform. Theory, vol. 37.

[7] M. Feder, N. Merhav, and M. Gutman, "Universal prediction of individual sequences," IEEE Trans. Inform. Theory, vol. 38.

[8] D. Hagelbarger, "SEER, A SEquence Extrapolating Robot," I.R.E. Trans. Electronic Computers, vol. 5, pp. 1-7.

[9] F. Hannan, "Approximation to Bayes risk in repeated plays," in Contributions to the Theory of Games, vol. III, Annals of Mathematics Studies No. 39, Princeton.

[10] J. Kelly, "A new interpretation of information rate," Bell Sys. Tech. J., vol. 35.

[11] N. Merhav, M. Feder, and M. Gutman, "Some properties of sequential predictors for binary Markov sources," IEEE Trans. Inform. Theory, vol. 39.

[12] J. Rissanen, "Universal coding, information, prediction, and estimation," IEEE Trans. Inform. Theory, vol. 30.

[13] J. Rissanen, Stochastic Complexity in Statistical Inquiry. Singapore: World Scientific.

[14] B. Ryabko, "Prediction of random sequences and universal coding," Problemy Peredachi Informatsii, vol. 24, pp. 3-14.

[15] B. Ryabko, "The complexity and effectiveness of prediction algorithms," Journal of Complexity, vol. 10.

[16] C. Shannon, "Prediction and entropy of printed English," Bell System Tech. J., vol. 30, pp. 50-64.

[17] C. Shannon, "A mind-reading machine," Bell Laboratories memorandum. (Reprinted in the Collected Papers of Claude Elwood Shannon, IEEE Press.)

[18] C. Shannon, "Game Playing Machines," Journal of the Franklin Institute, vol. 260.
