Information Theory Notes


Erkip, Fall 2009

Table of contents

Week 1 (9/14/2009): Course Overview
Week 2 (9/21/2009): Jensen's Inequality
Week 3 (9/28/2009): Data Processing Inequality
Week 4 (10/5/2009): Asymptotic Equipartition Property
Week 5 (10/19/2009): Entropy Rates for Stochastic Processes; Data Compression
Week 6 (10/26/2009): Optimal Codes; Huffman Codes
Week 7 (11/9/2009): Other Codes with Special Properties; Channel Capacity
Week 8 (11/16/2009): Channel Coding Theorem
Week 9 (11/23/2009): Differential Entropy
Week 10 (11/30/2009): Gaussian Channels; Parallel Gaussian Channels; Non-white Gaussian Noise Channels
Week 11 (12/7/2009): Rate Distortion Theory; Binary Source / Hamming Distortion; Gaussian Source with Squared-Error Distortion; Independent Gaussian Random Variables with Single-Letter Distortion

Note: This set of notes follows the lectures, which reference the text Elements of Information Theory by Cover and Thomas, Second Edition.

Week 1 (9/14/2009)

Course Overview

Communications-centric motivation: communication over noisy channels.

    Source -> Encoder -> Channel (with Noise) -> Decoder -> Destination

Examples: text over the internet, voice over the phone network, data over USB storage. In these examples the sources would be the text, voice, and data, and the channels would be the internet, the (wired or wireless) phone network, and the storage device.

Purpose: reproduce the source data at the destination (in the presence of noise in the channel). The channel can introduce noise (static), interference (multiple copies arriving at the destination at different times, from radio waves bouncing off buildings, etc.), or distortion (amplitude, phase, modulation).

Sources have redundancy: e.g. in the English language we can reconstruct words from parts (th_t -> that), and in images, individual pixel differences are not noticed by the human eye. Some sources can tolerate distortion (images).

A typical encoder is broken into three components:

    From Source -> Source Encoder -> Channel Encoder -> Modulator -> to Channel

The source encoder outputs a source codeword, an efficient representation of the source, removing redundancies. The output is taken to be a sequence of bits (binary).

The channel encoder outputs a channel codeword, introducing controlled redundancy to tolerate errors that may be introduced by the channel. The controlled redundancy is taken into account while decoding. Note the distinction from redundancies present in the source, where the redundancies may be arbitrary (many pixels in an image). The codeword is designed by examining channel properties. The simplest channel encoder involves just repeating the source codeword.

The modulator converts the bits of the channel codeword into waveforms for transmission over the channel. They may be modulated in amplitude or frequency, e.g. radio broadcasts.

The corresponding decoder can also be broken into three components that invert the operations of the encoder above:

    From Channel -> Demodulator -> Channel Decoder -> Source Decoder -> to Destination

We call the output of the demodulator the received word, and the output of the channel decoder the estimated source codeword.

Remark. Later we will see that separating the source and channel encoder/decoder as above does not cause a loss in performance under certain conditions. This allows us to concentrate on each component independently of the others. This is called the Joint Source Channel Coding Theorem.

Information Theory addresses the following questions: Given a source, how much can I compress the data? Are there any limits? Given a channel, how noisy can the channel be, or how much redundancy is necessary to minimize error in decoding? What is the maximum rate of communication?

Information theory interacts with many other fields as well: probability/statistics, computer science, economics, quantum theory, etc. See the book for details.

There are two fundamental theorems by Shannon that address these questions. First we define a few quantities. Typically the source data is unknown, and we can treat the input data as a random variable, for instance flipping coins and sending that as input across the channel to test the communication scheme.

We define the entropy of a random variable X, taking values in the alphabet X, as

    H(X) = - sum_{x in X} p(x) log p(x)

The base 2 logarithm measures the entropy in bits. The intuition is that entropy describes the compressibility of the source.

Example 1. Let X ~ unif{1, ..., 16}. Note we intuitively need 4 bits to represent the values of X, and indeed the entropy is

    H(X) = - sum_{x=1}^{16} (1/16) log(1/16) = 4 bits

Example 2. Consider 8 horses in a race with winning probabilities (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64). Note

    H(X) = (1/2) log 2 + (1/4) log 4 + (1/8) log 8 + (1/16) log 16 + 4 (1/64) log 64 = 2 bits

The naive encoding for outputting the winner of the race uses 3 bits, labelling the horses from 000 to 111 in binary. However, since some horses win more frequently, it makes sense to use shorter encoding words for more frequent winners. For instance, label the horses (0, 10, 110, 1110, 111100, 111101, 111110, 111111). This particular encoding has the property that it is prefix-free: no codeword is a prefix of another codeword. This makes it easy to determine how many bits to read until we can determine which horse is being referenced. Now some codewords are longer than 3 bits, but on the average, the description length is

    (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/16)(4) + 4 (1/64)(6) = 2 bits

which is exactly the entropy.

Now we are ready to state (roughly) Shannon's First Theorem:

Theorem 3. (Source Coding Theorem) Roughly speaking, H(X) is the minimum rate at which we can compress a random variable X and recover it fully.

Intuition can be drawn from the previous two examples. We will visit this again in more detail later.
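As a quick numerical check of the two examples above (a minimal sketch, not part of the original notes; the function names are ours):

    import math

    def entropy(probs, base=2):
        # H(X) = -sum p log p, with the convention 0 log 0 = 0
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    print(entropy([1/16] * 16))  # 4.0 bits, the uniform example
    horses = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
    print(entropy(horses))       # 2.0 bits
    # average length of the prefix-free code (0, 10, 110, 1110, 111100, ...):
    lengths = [1, 2, 3, 4, 6, 6, 6, 6]
    print(sum(p * l for p, l in zip(horses, lengths)))  # 2.0 bits, matching H(X)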

Now we define the mutual information between two random variables (X, Y) ~ p(x, y):

    I(X; Y) = sum_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]

Later we will show that I(X; Y) = H(X) - H(X|Y), where the conditional entropy H(X|Y) is to be defined later. Here H(X) is to be interpreted as the compressibility (as stated earlier), or the amount of uncertainty in X; H(X|Y) is the amount of uncertainty in X knowing Y; and finally I(X; Y) is interpreted as the reduction in uncertainty of X due to knowledge of Y, or a measure of dependency between X and Y. For instance, if X and Y are completely dependent (knowing Y gives all information about X), then H(X|Y) = 0 (no uncertainty given Y), and so I(X; Y) = H(X) (a complete reduction of uncertainty). On the other hand, if X and Y are independent, then H(X|Y) = H(X) (still uncertain) and I(X; Y) = 0 (no reduction of uncertainty).

Now we will characterize a communications channel as one for which the channel output Y depends probabilistically on the input X; that is, the channel is characterized by the conditional probability P(Y|X). For a simple example, suppose the input is {0, 1} and the channel flips the input bit with probability eps; then P(Y = 1 | X = 0) = eps and P(Y = 0 | X = 0) = 1 - eps.

Define the capacity of the channel to be

    C = max_{p(x)} I(X; Y)

noting here that I(X; Y) depends on the joint p(x, y) = p(y|x) p(x), and thus depends on the input distribution as well as the channel. The maximization is taken over all possible input distributions. We can now roughly state Shannon's Second Theorem:

Theorem 4. (Channel Coding Theorem) The maximum rate of information over a channel with arbitrarily low probability of error is given by its capacity C. Here the rate is the number of bits per transmission.

This means that as long as the transmission rate is below the capacity, the probability of error can be made arbitrarily small. We can motivate the theorem above with a few examples:

Example 5. Consider a noiseless binary channel, where the input is {0, 1} and the output is equal to the input. We would then expect to be able to send 1 bit of data in each transmission through the channel with no probability of error. Indeed, if we compute

    I(X; Y) = H(X) - H(X|Y) = H(X)

since knowing Y gives complete information about X, then

    C = max_{p(x)} H(X) = max_p [ -p log p - (1 - p) log(1 - p) ]

and taking the derivative and setting it to zero gives

    -log p - 1 + log(1 - p) + 1 = 0

so p = 1 - p and p = 1/2. Then C = log 2 = 1 bit, as expected.
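The following sketch (ours, not from the notes) computes I(X; Y) directly from a joint pmf given as a matrix, and checks the two extreme cases just described:

    import math

    def mutual_information(joint, base=2):
        # I(X;Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) )
        px = [sum(row) for row in joint]
        py = [sum(col) for col in zip(*joint)]
        return sum(pxy * math.log(pxy / (px[i] * py[j]), base)
                   for i, row in enumerate(joint)
                   for j, pxy in enumerate(row) if pxy > 0)

    # Y = X (complete dependence): I(X;Y) = H(X) = 1 bit
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
    # X, Y independent: I(X;Y) = 0
    print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))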

Example 6. Consider a noisy 4-symbol channel, where the input and output X, Y take values in {0, 1, 2, 3} and

    P(Y = j | X = j) = P(Y = j + 1 | X = j) = 1/2

where j + 1 is taken modulo 4. That is, a given input is either preserved or incremented upon output, with equal probability. To make use of this noisy channel, we note that we do not have to use all the inputs when sending information. If we restrict the input to only 0 or 2, then we note that 0 is sent to 0 or 1, and 2 is sent to 2 or 3, and we can easily deduce from the output whether we sent a 0 or a 2. This allows us to send one bit of information through the channel with zero probability of error. It turns out that this is the best strategy.

Given an input distribution p_X(x), we compute the joint distribution:

    p(x, y) = { (1/2) p_X(x)   if y in {x, x+1}
              { 0              otherwise

Note that the marginal distribution of the output is

    p_Y(y) = (1/2) (p_X(y) + p_X(y - 1))

Applying the definition of conditional entropy, we have

    H(Y|X) = sum_x p(x) H(1/2) = 1

Then the mutual information is I(X; Y) = H(Y) - 1. The entropy of Y is

    H(Y) = - sum_{y=0}^{3} p_Y(y) log p_Y(y)

Since given p_Y(y) we can find a p_X corresponding to p_Y, it suffices to maximize I(X; Y) over the possibilities for p_Y. Letting p_Y(0) = p, p_Y(1) = q, p_Y(2) = r, p_Y(3) = 1 - p - q - r, we have the maximization problem

    max_{p,q,r} [ -p log p - q log q - r log r - (1 - p - q - r) log(1 - p - q - r) ]

Taking the gradient and setting it to zero, we have the equations

    -log p - 1 + log(1 - p - q - r) + 1 = 0
    -log q - 1 + log(1 - p - q - r) + 1 = 0
    -log r - 1 + log(1 - p - q - r) + 1 = 0

which reduce to

    p = 1 - p - q - r,   q = 1 - p - q - r,   r = 1 - p - q - r

so that p = q = r = 1/4. Plugging this into the above gives

    C = [ -4 (1/4) log(1/4) ] - 1 = 2 - 1 = 1 bit

as expected.

Remark. Above, to illustrate how to use the noisy 4-symbol channel, suppose the input string is 01101110. If the 4-symbol channel had no noise, then we could send 2 bits of information per transmission by using the following scheme: block the inputs two bits at a time, 01, 10, 11, 10, and send the messages 1, 2, 3, 2 across the channel. Then the channel decoder inverts back to 01, 10, 11, 10 and reproduces the input string. However, since the channel is noisy, we cannot do this without incurring some probability of error. Thus instead we decide to send one bit at a time, sending the symbol 0 for a 0 and the symbol 2 for a 1; the output of the channel is then "0 or 1" when we sent a 0 and "2 or 3" when we sent a 1, which we can easily invert to recover the input. Furthermore, by the theorem, 1 bit per transmission is the capacity of this channel, so this is the best scheme for this channel. An additional bonus is that we can even operate at capacity with zero probability of error!

In general, we can compute the capacity of a channel, but it is hard to find a nice scheme. For instance, consider the binary symmetric channel, with inputs {0, 1}, where the output is flipped with probability eps. Note that sending one bit through this channel has a probability of error eps. One possibility for using the channel involves sending the same bit repeatedly through the channel and taking the majority in the channel decoding. For instance, if we send the same bit three times, the probability of error is

    P(2 or 3 flips) = eps^3 + 3 eps^2 (1 - eps) = 3 eps^2 - 2 eps^3 = O(eps^2)

whereas the data rate becomes 1/3 bits per transmission (we needed to send 3 bits to send 1 bit of information). To lower the probability of error we would then need to send more bits. If we send n bits, then the data rate is 1/n and the probability of error is

    P(n/2 flips or more) = sum_{j = ceil(n/2)}^{n} (n choose j) eps^j (1 - eps)^{n-j}

We can compute the capacity of the channel directly in this case as well. Note that

    H(Y|X) = sum_{x=0}^{1} p_X(x) H(eps) = H(eps)

and I(X; Y) = H(Y) - H(eps) = -p log p - (1 - p) log(1 - p) - H(eps), where p = P(Y = 1); this is maximized at p = 1/2, corresponding to a channel capacity of 1 - H(eps). Note that the proposed repetition codes above do not achieve this capacity with arbitrarily low probability of error. Later we will show how to find such codes.
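A small sketch (ours) comparing the repetition scheme to the capacity 1 - H(eps); note how the error probability of the repetition code falls only as its rate 1/n goes to zero:

    import math
    from math import comb

    def h2(p):
        # binary entropy function, in bits
        return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

    def repetition_error(n, eps):
        # P(majority decoding fails) for n (odd) uses of a BSC(eps)
        return sum(comb(n, j) * eps**j * (1-eps)**(n-j) for j in range((n+1)//2, n+1))

    eps = 0.1
    print("capacity:", 1 - h2(eps))              # ~0.531 bits per transmission
    for n in (1, 3, 5, 11):
        print(n, 1/n, repetition_error(n, eps))  # rate 1/n vs. error probability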

Week 2 (9/21/2009)

Today: following Chapter 2, defining and examining properties of basic information-theoretic quantities.

Entropy is defined for a discrete random variable X, taking values in a countable alphabet X with probability mass function p(x) = P(X = x), x in X. Later we will discuss a notion of entropy for the continuous case.

Definition. The entropy of X ~ p(x) is defined to be

    H(X) = - sum_{x in X} p(x) log p(x) = E_{p(x)}[ -log p(X) ]

If we use the base 2 logarithm, the measure is in bits. If we use the natural logarithm, the measure is in nats. Also, by convention (continuity), we take 0 log 0 = 0. Note that H(X) is independent of labeling; it only depends on the masses in the pmf p(x). Also, H(X) >= 0, which follows from the fact that -log p(x) >= 0 for 0 <= p(x) <= 1.

Example 7. The entropy of a Bernoulli(p) random variable

    X = { 1  w.p. p
        { 0  w.p. 1 - p

is H(X) = -p log p - (1 - p) log(1 - p). We will often use the notation

    H(p) := -p log p - (1 - p) log(1 - p)    (binary entropy function)

for convenience.

[Figure: graph of the binary entropy function H(p) = (-p log p - (1 - p) log(1 - p)) / log 2 over 0 <= p <= 1.]

In particular, we note the following observations:

- H(p) is concave. Later we will show that entropy in general is a concave function of the underlying probability distribution.
- H(p) = 0 for p = 0, 1. This reinforces the intuition that entropy is a measure of randomness, since in the case of no randomness (p = 0, 1) there is no entropy.
- H(p) is maximized when p = 1/2. Likewise, this is the situation where a binary distribution is the most random.

- H(p) = H(1 - p).

Example 8. Let

    X = { 1  w.p. 1/2
        { 2  w.p. 1/4
        { 3  w.p. 1/4

In this case H(X) = (1/2) log 2 + 2 (1/4) log 4 = 1/2 + 1/2 + 1/2 = 3/2 bits. Last time we viewed entropy as the average length of a binary code needed to encode X. We can interpret such binary codes as a sequence of yes/no questions that are needed to determine the value of X. The entropy is then the average number of questions needed using the optimal strategy for determining X. In this case, we can use the following simple strategy, taking advantage of the fact that X = 1 occurs more often than the other cases:

    Question 1: Is X = 1?   Question 2: Is X = 2?

The number of questions needed is 1 if X = 1, which happens with probability 1/2, and 2 otherwise, with probability 1/2, so the expected number of questions needed is 1 (1/2) + 2 (1/2) = 3/2, which coincides with the entropy.

Example 9. Let X_1, ..., X_n be i.i.d. Bernoulli(p) distributed (0 with probability 1 - p, 1 with probability p). Then the joint distribution is given by

    p(x_1, ..., x_n) = p^{# of 1s} (1 - p)^{# of 0s} = p^{sum_i x_i} (1 - p)^{n - sum_i x_i}

Observe that if n is large, then by the law of large numbers sum_i x_i is close to np with high probability, so

    p(x_1, ..., x_n) ~ p^{np} (1 - p)^{n(1-p)} = 2^{-n H(p)}

The interpretation is that for large n, the probability of the joint is concentrated in a set of size about 2^{n H(p)}, on which it is approximately uniformly distributed. In this case we can exploit this for compression and use n H(p) bits to distinguish sequences in this set (per random variable we are using H(p) bits). For this reason 2^{H(p)} is the effective alphabet size of (X_1, ..., X_n).

Definition. The joint entropy of (X, Y) ~ p(x, y) is given by

    H(X, Y) = - sum_{x in X} sum_{y in Y} p(x, y) log p(x, y) = E_{p(x,y)}[ -log p(X, Y) ]

Definition. The conditional entropy of Y given X is

    H(Y|X) = sum_{x in X} p(x) H(Y|X = x)
           = - sum_{x in X} p(x) sum_{y in Y} p(y|x) log p(y|x)
           = - sum_{x,y} p(x, y) log p(y|x)
           = E_{p(x,y)}[ -log p(Y|X) ]

We expect that we can write the joint entropy in terms of the individual entropies, and this is indeed the case, in a result we call the Chain Rule for Entropy:

Proposition 10. (Chain Rule for Entropy) For two random variables X, Y,

    H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

For multiple random variables X_1, ..., X_n, we have that

    H(X_1, ..., X_n) = sum_{i=1}^{n} H(X_i | X_1, ..., X_{i-1})

Proof. We just perform a simple computation:

    H(X, Y) = - sum_{x in X} sum_{y in Y} p(x, y) log p(x, y)
            = - sum_{x in X} log p(x) sum_{y in Y} p(x, y) - sum_{x in X} sum_{y in Y} p(x, y) log p(y|x)
            = H(X) + H(Y|X)

By symmetry (switching the roles of X and Y) we get the other equality. For n random variables we proceed by induction. We have just proved the result for two random variables. Assume it is true for n - 1 random variables. Then

    H(X_1, ..., X_n) = H(X_n | X_1, ..., X_{n-1}) + H(X_1, ..., X_{n-1})
                     = H(X_n | X_1, ..., X_{n-1}) + sum_{i=1}^{n-1} H(X_i | X_1, ..., X_{i-1})
                     = sum_{i=1}^{n} H(X_i | X_1, ..., X_{i-1})

where we have used the usual chain rule and the induction hypothesis.

We also immediately get the chain rule for conditional entropy:

Corollary 11. (Chain Rule for Conditional Entropy) For three random variables X, Y, Z, we have

    H(X, Y|Z) = H(Y|Z) + H(X|Y, Z) = H(X|Z) + H(Y|X, Z)

and for random variables X_1, ..., X_n and a random variable Z, we have

    H(X_1, ..., X_n | Z) = sum_{i=1}^{n} H(X_i | Z, X_1, ..., X_{i-1})

Proof. This follows from the chain rule for entropy given Z = z, and averaging over z gives the result.

Example 12. Consider the following joint distribution for X, Y in tabular form, with the marginals:

    p(x, y)   x = 1   x = 2 | p(y)
    y = 1      1/4     1/2  |  3/4
    y = 2      1/4      0   |  1/4
    -------------------------
    p(x)       1/2     1/2  |

We compute the entropies H(X), H(Y), H(Y|X), H(X|Y), H(X, Y) below:

    H(X) = H(1/2) = 1 bit

    H(Y) = H(1/4) = (1/4) log 4 + (3/4) log(4/3) = 2 - (3/4) log 3 bits

    H(Y|X) = P(X = 1) H(Y|X = 1) + P(X = 2) H(Y|X = 2)
           = (1/2) H(1/2) = 1/2 bits

    H(X|Y) = P(Y = 1) H(X|Y = 1) + P(Y = 2) H(X|Y = 2)
           = (3/4) H(1/3)
           = (3/4) [ (1/3) log 3 + (2/3) log(3/2) ]
           = (3/4) log 3 - 1/2 bits

    H(X, Y) = 2 (1/4) log 4 + (1/2) log 2 = 3/2 bits

Above we note that H(Y|X = 2) = H(X|Y = 2) = H(0) = 0, since in each case the distribution becomes deterministic. Note that in general H(Y|X) != H(X|Y), and in this example H(X|Y) <= H(X) and H(Y|X) <= H(Y); we will show this in general later. Also, H(Y|X = x) has no relation to H(Y): it can be larger or smaller.

The conditional entropy is related to a more general quantity called relative entropy, which we now define.

Definition. For two pmfs p(x), q(x) on X, the relative entropy (Kullback-Leibler distance) is defined by

    D(p||q) = sum_{x in X} p(x) log( p(x) / q(x) )

using the conventions 0 log(0/0) = 0, 0 log(0/q) = 0, and p log(p/0) = infinity for p > 0. In particular, the distance is allowed to be infinite. Note that the distance is not symmetric, so D(p||q) != D(q||p) in general, and so it is not a metric. However, it does satisfy positivity: we will show that D(p||q) >= 0 with equality when p = q. Roughly speaking, it describes the loss from using q instead of p for compression.

Definition. The mutual information between two random variables X, Y ~ p(x, y) is

    I(X; Y) = sum_{x in X} sum_{y in Y} p(x, y) log [ p(x, y) / (p(x) p(y)) ] = D( p(x, y) || p(x) p(y) )
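A quick verification of Example 12 (our sketch; the joint table is the one given above):

    import math

    def H(probs):
        # entropy in bits; zero masses are skipped
        return -sum(p * math.log2(p) for p in probs if p > 0)

    joint = {(1, 1): 1/4, (2, 1): 1/2, (1, 2): 1/4, (2, 2): 0.0}  # keys are (x, y)
    px = {x: sum(v for (xx, y), v in joint.items() if xx == x) for x in (1, 2)}
    py = {y: sum(v for (x, yy), v in joint.items() if yy == y) for y in (1, 2)}

    H_XY, H_X, H_Y = H(joint.values()), H(px.values()), H(py.values())
    print(H_X, H_Y, H_XY)           # 1.0, ~0.811, 1.5
    print(H_XY - H_X, H_XY - H_Y)   # H(Y|X) = 0.5, H(X|Y) ~ 0.689, by the chain rule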

Relationship between Entropy and Mutual Information

    I(X; Y) = sum_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]
            = sum_{x,y} p(x, y) log [ p(y|x) / p(y) ]
            = - sum_y ( sum_x p(x, y) ) log p(y) + sum_{x,y} p(x, y) log p(y|x)
            = H(Y) - H(Y|X)

and thus we conclude by symmetry that

    I(X; Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)

Remark. As we said before, this is the reduction in the uncertainty of X due to knowledge of Y, and we will show later that I(X; Y) >= 0 with equality if and only if X, Y are independent. Also, note that I(X; X) = H(X) - H(X|X) = H(X), and so sometimes H(X) is referred to as self-information.

The relationships between the entropies and the mutual information can be represented by a Venn diagram: H(X), H(Y) represent the two circles, the intersection is I(X; Y), the union is H(X, Y), and the set differences X minus Y and Y minus X represent H(X|Y) and H(Y|X), respectively.

We will also establish chain rules for mutual information and relative entropy. First we define the mutual information conditioned on a random variable Z:

Definition. The conditional mutual information of X, Y given Z, for (X, Y, Z) ~ p(x, y, z), is

    I(X; Y | Z) = H(X|Z) - H(X|Y, Z) = E_{p(x,y,z)} [ log ( p(X, Y|Z) / (p(X|Z) p(Y|Z)) ) ]

noting that this is also I(X; Y | Z = z) averaged over z. The chain rule for mutual information is then

    I(X_1, ..., X_n; Y) = sum_{i=1}^{n} I(X_i; Y | X_1, ..., X_{i-1})

The proof just uses the chain rules for entropy and conditional entropy:

    I(X_1, ..., X_n; Y) = H(X_1, ..., X_n) - H(X_1, ..., X_n | Y)
                        = sum_{i=1}^{n} H(X_i | X_1, ..., X_{i-1}) - sum_{i=1}^{n} H(X_i | Y, X_1, ..., X_{i-1})
                        = sum_{i=1}^{n} I(X_i; Y | X_1, ..., X_{i-1})

Now we define conditional relative entropy and prove the corresponding chain rule:

Definition. Let X be a random variable with distribution given by the pmf p(x), and let p(y|x), q(y|x) be two conditional pmfs. The conditional relative entropy of p(y|x) and q(y|x) is then defined to be

    D( p(y|x) || q(y|x) ) = sum_x p(x) D( p(y|X = x) || q(y|X = x) )
                          = sum_{x,y} p(x) p(y|x) log [ p(y|x) / q(y|x) ]
                          = E_{p(x,y)} [ log ( p(Y|X) / q(Y|X) ) ]

The chain rule for relative entropy is then

    D( p(x, y) || q(x, y) ) = E_{p(x,y)} [ log ( p(X, Y) / q(X, Y) ) ]
                            = E_{p(x,y)} [ log ( p(Y|X) / q(Y|X) ) + log ( p(X) / q(X) ) ]
                            = E_{p(x,y)} [ log ( p(Y|X) / q(Y|X) ) ] + E_{p(x)} [ log ( p(X) / q(X) ) ]
                            = D( p(y|x) || q(y|x) ) + D( p(x) || q(x) )

The various chain rules will help simplify calculations later in the course.

Jensen's Inequality

Convexity and Jensen's Inequality will be very useful for us later. First a definition:

Definition. A function f is said to be convex on [a, b] if for all a_1, b_1 in [a, b] and 0 <= lambda <= 1 we have

    f( lambda a_1 + (1 - lambda) b_1 ) <= lambda f(a_1) + (1 - lambda) f(b_1)

If the inequality is strict, then we say f is strictly convex. Note that if we have a binary random variable X ~ Bernoulli(lambda) taking the values a_1 and b_1, then we can restate convexity as

    f( E_{p(x)}(X) ) <= E_{p(x)}( f(X) )

We can generalize this to more complicated random variables, which leads to Jensen's inequality:

Theorem 13. (Jensen's Inequality) If f is convex and X is a random variable, then

    f(E(X)) <= E(f(X))

and if f is strictly convex, then equality above implies X = E(X) with probability 1, i.e. X is constant.

Proof. There are many proofs, but for finite random variables we can show this by induction and the fact that it is already true by definition for binary random variables. To extend the proof to the general case, we can use the idea of supporting lines: we note that

    f(x) + m(y - x) <= f(y)

for all x, y, where the slope m depends on x. Set x = E(X) to get

    f(E(X)) + m(y - E(X)) <= f(y)

and integrating in y (against the distribution of X) gives the result f(E(X)) <= E(f(X)). If we have equality, then E[ f(Y) - f(E(X)) - m(Y - E(X)) ] = 0, and since the integrand is nonnegative, f(y) = f(E(X)) + m(y - E(X)) for a.e. y. However, by strict convexity this can occur only if y = E(X) a.e.

Week 3 (9/28/2009)

We continue with establishing information-theoretic quantities and inequalities. We start with applications of Jensen's inequality.

Theorem 14. (Information Inequality) For any pmfs p(x), q(x), we have that

    D(p||q) >= 0

with equality if and only if p = q.

Proof. First let A = {x : p(x) > 0}, and compute

    -D(p||q) = - sum_{x in A} p(x) log( p(x) / q(x) )
             = sum_{x in A} p(x) log( q(x) / p(x) )
             <= log sum_{x in A} p(x) ( q(x) / p(x) )      (Jensen, concavity of log)
             = log sum_{x in A} q(x)
             <= log sum_{x in X} q(x)
             = log 1 = 0
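A numerical illustration (ours, not from the notes): D(p||q) is nonnegative, vanishes only at p = q, and is not symmetric.

    import math

    def kl(p, q):
        # D(p||q) in bits; infinite if q(x) = 0 while p(x) > 0
        return sum(px * math.log2(px / qx) if qx > 0 else math.inf
                   for px, qx in zip(p, q) if px > 0)

    p, q = [0.75, 0.25], [0.5, 0.5]
    print(kl(p, q), kl(q, p))  # ~0.189 and ~0.208: both positive, not equal
    print(kl(p, p))            # 0.0 exactly when the pmfs agree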

Examining the proof above, equality occurs if and only if sum_{x in A} q(x) = sum_{x in X} q(x) and equality is achieved in the application of Jensen's inequality. The first condition leads to q(x) = 0 whenever p(x) = 0. Equality in the application of Jensen's inequality, with the strict concavity of log, implies that on A the ratio q(x)/p(x) is constant; then sum_{x in A} q(x) = 1 forces the constant to be 1, and therefore q = p.

This has quite a few implications:

Corollary 15. I(X; Y) >= 0, with equality if and only if X, Y are independent.

Proof. Noting I(X; Y) = D( p(x, y) || p(x) p(y) ), we have that I(X; Y) >= 0 with equality if and only if p(x, y) = p(x) p(y), i.e. X, Y are independent.

Corollary 16. If |X| < infinity, then we have the following useful upper bound:

    H(X) <= log |X|

Proof. Let u(x) ~ unif(X), i.e. u(x) = 1/|X| for x in X, and let X ~ p(x) be any random variable taking values in X. Then

    0 <= D(p||u) = sum_x p(x) log( p(x) / u(x) ) = -H(X) + sum_x p(x) log |X| = -H(X) + log |X|

from which the result follows. Equality holds if and only if p = u.

Theorem 17. (Conditioning Reduces Entropy)

    H(X|Y) <= H(X)

with equality if and only if X, Y are independent.

Proof. Note I(X; Y) = H(X) - H(X|Y) >= 0, where equality holds if and only if X, Y are independent.

Remark. Do not forget that H(X|Y = y) is not necessarily <= H(X).

The conditional entropy inequality allows us to show the concavity of the entropy function:

Corollary 18. Suppose X ~ p(x) is a r.v. Then H(p) := H(X) is a concave function of p(x), i.e.

    H( lambda p_1 + (1 - lambda) p_2 ) >= lambda H(p_1) + (1 - lambda) H(p_2)

Note that here the p_i are pmfs, so H(p_i) is not the binary entropy function.

Proof. Let X_i ~ p_i(x) for i = 1, 2, defined on the same alphabet X. Let

    theta = { 1  w.p. lambda
            { 2  w.p. 1 - lambda

and define

    Z = X_theta = { X_1  w.p. lambda
                  { X_2  w.p. 1 - lambda

Now note that H(Z|theta) = lambda H(p_1) + (1 - lambda) H(p_2), and by the conditional entropy inequality,

    lambda H(p_1) + (1 - lambda) H(p_2) = H(Z|theta) <= H(Z) = H( lambda p_1 + (1 - lambda) p_2 )

Note that P(Z = a) = lambda p_1(a) + (1 - lambda) p_2(a).

Corollary 19.

    H(X_1, ..., X_n) <= sum_{i=1}^{n} H(X_i)

Proof. By the chain rule,

    H(X_1, ..., X_n) = sum_{i=1}^{n} H(X_i | X_1, ..., X_{i-1}) <= sum_{i=1}^{n} H(X_i)

using the conditional entropy inequality. Equality holds if and only if X_1, ..., X_n are mutually independent, following from X_i being independent of the joint (X_1, ..., X_{i-1}).

Another application of Jensen's inequality will give us another set of useful inequalities.

Theorem 20. (Log Sum Inequality) Let a_1, ..., a_n and b_1, ..., b_n be nonnegative numbers. Then

    sum_{i=1}^{n} a_i log( a_i / b_i ) >= ( sum_i a_i ) log [ ( sum_i a_i ) / ( sum_i b_i ) ]

Proof. This follows from the convexity of t log t (with second derivative 1/t > 0). Define the random variable

    X = a_i / b_i   w.p.  b_i / sum_j b_j

Then

    sum_{i=1}^{n} ( b_i / sum_j b_j ) ( a_i / b_i ) log( a_i / b_i ) = E(X log X) >= E(X) log E(X) = ( sum_i a_i / sum_j b_j ) log( sum_i a_i / sum_j b_j )

and multiplying by sum_j b_j gives the result. Note that equality holds if and only if a_i / b_i is constant, i.e. a_i = C b_i.

Then we have the following corollaries:

Corollary 21. D(p||q) is convex in the pair (p, q), i.e.

    D( lambda p_1 + (1 - lambda) p_2 || lambda q_1 + (1 - lambda) q_2 ) <= lambda D(p_1||q_1) + (1 - lambda) D(p_2||q_2)

Proof. Expanding the inequality above, we want to show that

    sum_x ( lambda p_1(x) + (1 - lambda) p_2(x) ) log [ ( lambda p_1(x) + (1 - lambda) p_2(x) ) / ( lambda q_1(x) + (1 - lambda) q_2(x) ) ]
        <= lambda sum_x p_1(x) log( p_1(x) / q_1(x) ) + (1 - lambda) sum_x p_2(x) log( p_2(x) / q_2(x) )

For a fixed x, apply the log sum inequality with a_i = lambda_i p_i(x) and b_i = lambda_i q_i(x), where lambda_1 = lambda and lambda_2 = 1 - lambda:

    ( sum_{i=1,2} lambda_i p_i(x) ) log [ ( sum_i lambda_i p_i(x) ) / ( sum_i lambda_i q_i(x) ) ] <= sum_{i=1,2} lambda_i p_i(x) log( lambda_i p_i(x) / (lambda_i q_i(x)) )

and the lambda_i cancel inside the logarithm on the right. Summing over x gives the result.

Theorem 22. Let (X, Y) ~ p(x, y) = p(x) p(y|x). Then

1. I(X; Y) is a concave function of p(x) for a fixed conditional distribution p(y|x).
2. I(X; Y) is a convex function of p(y|x) for a fixed marginal distribution p(x).

Remark. This will be useful in studying the source-channel coding theorems, since p(y|x) captures information about the channel. Concavity implies that local maxima are also global maxima, so when computing the capacity of a channel (maximizing mutual information between input and output of the channel) it will suffice to look for local maxima. Likewise, convexity implies that local minima are global minima; when we study lossy compression we will want to compute, for a fixed input distribution, the worst-case scenario, which involves minimizing the mutual information.

Proof. For (1), note that I(X; Y) = H(Y) - H(Y|X). Since p(y) = sum_x p(x) p(y|x) is a linear function of p(x) for a fixed p(y|x), and H(Y) is a concave function of p(y), we see that H(Y) is a concave function of p(x), since a composition of a concave function phi with a linear function T is concave:

    phi( T( lambda x_1 + (1 - lambda) x_2 ) ) = phi( lambda T x_1 + (1 - lambda) T x_2 ) >= lambda phi(T x_1) + (1 - lambda) phi(T x_2)

As for the second term, note that

    H(Y|X) = sum_x p(x) H(Y|X = x),   where   H(Y|X = x) = - sum_y p(y|x) log p(y|x)

which is fixed for a fixed p(y|x), and thus H(Y|X) is a linear function of p(x), which in particular is concave. Thus I(X; Y) = H(Y) - H(Y|X) is a sum of two concave functions, which is also concave:

    (phi_1 + phi_2)( lambda x + (1 - lambda) y ) >= sum_i [ lambda phi_i(x) + (1 - lambda) phi_i(y) ] = lambda (phi_1 + phi_2)(x) + (1 - lambda) (phi_1 + phi_2)(y)

This proves (1).

To prove (2), consider p_1(y|x), p_2(y|x), two conditional distributions given p(x), with corresponding joints p_1(x, y), p_2(x, y) and output marginals p_1(y), p_2(y). Then also consider

    p_lambda(y|x) = lambda p_1(y|x) + (1 - lambda) p_2(y|x)

with corresponding joint p_lambda(x, y) = lambda p_1(x, y) + (1 - lambda) p_2(x, y) and marginal p_lambda(y) = lambda p_1(y) + (1 - lambda) p_2(y). Then the mutual information I_lambda(X; Y) using p_lambda is

    I_lambda(X; Y) = D( p_lambda(x, y) || p(x) p_lambda(y) )
                   <= lambda D( p_1(x, y) || p(x) p_1(y) ) + (1 - lambda) D( p_2(x, y) || p(x) p_2(y) )
                   = lambda I_1(X; Y) + (1 - lambda) I_2(X; Y)

using the convexity of D in the pair, which shows that I_lambda is a convex function of p(y|x) for a fixed p(x).

Data Processing Inequality

We now turn to proving a result about Markov chains, which we define first:

Definition 23. The random variables X, Y, Z form a Markov chain, denoted X -> Y -> Z, if the joint factors in the following way:

    p(x, y, z) = p(x) p(y|x) p(z|y)

for all x, y, z.

Properties.

1. X -> Y -> Z if and only if X, Z are conditionally independent given Y, i.e. p(x, z|y) = p(x|y) p(z|y).

Proof. First assume X -> Y -> Z. Then

    p(x, z|y) = p(x, y, z) / p(y) = p(x, y) p(z|y) / p(y) = p(x|y) p(z|y)

Conversely, suppose p(x, z|y) = p(x|y) p(z|y). Then

    p(x, y, z) = p(y) p(x, z|y) = p(y) p(x|y) p(z|y) = p(x) p(y|x) p(z|y)

as desired.

2. X -> Y -> Z if and only if Z -> Y -> X, which follows from the previous property.

3. If Z = f(Y), then X -> Y -> Z. This also follows from the first property, since given Y, Z is independent of X:

    p(z|x, y) = p(z|y) = { 1  if z = f(y)
                         { 0  else

Now we can state the theorem:

Theorem 24. (Data Processing Inequality) Suppose X -> Y -> Z. Then I(X; Y) >= I(X; Z).

Remark. This means that along the chain, the dependency decreases. Alternatively, if we interpret I(X; Y) as the amount of knowledge Y gives about X, then any processing of Y to get Z may decrease the information that we can get about X.

Proof. Expand I(X; Y, Z) by the chain rule in two ways:

    I(X; Y, Z) = I(X; Y) + I(X; Z|Y)
    I(X; Y, Z) = I(X; Z) + I(X; Y|Z)

Note that I(X; Z|Y) = 0 since X, Z are independent given Y. Also I(X; Y|Z) >= 0, and thus

    I(X; Y) = I(X; Y, Z) = I(X; Z) + I(X; Y|Z) >= I(X; Z)

as desired.

Now we illustrate some examples.

Example 25. (Communications Example) Modeling the process of sending a (random) message W to some destination:

    W -> Encoder -> X -> Channel -> Y -> Decoder -> Z

Note that W -> X -> Y -> Z is a Markov chain; each variable depends directly only on the previous one. The data processing inequality tells us that

    I(X; Y) >= I(X; Z)

Thus, if we are not careful, the decoder may decrease mutual information (and of course cannot add any information about X). Similarly, I(X; Y) >= I(W; Z).

Example 26. (Statistics Example) In statistics, frequently we want to estimate some unknown distribution X by making measurements (observations) and then coming up with some estimator:

    X -> Measurement -> Y -> Estimator -> Z

For instance, X could be the distribution of heights of a population (Gaussian with unknown mean and standard deviation). Then Y would be some samples, and from the samples we can come up with an estimator (say the sample mean / standard deviation). X -> Y -> Z, and by the data processing inequality we then note that the estimator may decrease information. An estimator that does not decrease information about X is called a sufficient statistic.

Here is a probabilistic setup. Suppose we have a family of probability distributions {p_theta(x)}, and Theta is the parameter to be estimated. X represents samples from some distribution p_Theta in the family. Let T(X) be any function of X (for instance, the sample mean). Then

    Theta -> X -> T(X)

form a Markov chain. By the data processing inequality, I(Theta; T(X)) <= I(Theta; X).

Definition 27. T(X) is a sufficient statistic for Theta if Theta -> T(X) -> X also forms a Markov chain. Note that this implies I(Theta; T(X)) = I(Theta; X).

As an example, consider p_theta(x) ~ Bernoulli(theta), and consider X = (X_1, ..., X_n) i.i.d. ~ p_theta(x). Then T(X) = sum_{i=1}^{n} X_i is a sufficient statistic for Theta, i.e. given T(X), the distribution of X does not depend on Theta. Here T(X) counts the number of 1s. Then

    P( X = (x_1, ..., x_n) | T(X) = k, Theta = theta ) = { 1 / (n choose k)   if sum_i x_i = k
                                                         { 0                  else

where we note that P( X = (x_1, ..., x_n), T(X) = k | Theta = theta ) = theta^k (1 - theta)^{n-k} (if sum_i x_i = k) and P( T(X) = k | Theta = theta ) = (n choose k) theta^k (1 - theta)^{n-k}.

As a second example, consider f_theta(x) ~ N(theta, 1) and consider X = (X_1, ..., X_n) i.i.d. ~ f_theta(x). Then T(X) = (1/n) sum_{i=1}^{n} X_i is a sufficient statistic for Theta. Note that (1/n) sum_i X_i ~ N(theta, 1/n), so that T(X) ~ N(theta, 1/n). Consider the joint of (X_1, ..., X_n, T(X)). The density of (X_1, ..., X_n) is proportional to

    exp( -(1/2) sum_{i=1}^{n} (x_i - theta)^2 )

whose theta-dependent part is exp( theta sum_i x_i - n theta^2 / 2 ). Also, the density of T(X) at xbar is proportional to

    exp( -(n/2) (xbar - theta)^2 )

whose theta-dependent part is exp( n theta xbar - n theta^2 / 2 ). If xbar = (1/n) sum_i x_i, then n theta xbar = theta sum_i x_i, so the theta-dependent parts cancel in the conditional density f(x_1, ..., x_n | xbar, Theta); thus given T(X), X and Theta are independent.

As a final example, if g_theta ~ Unif(theta, theta + 1), and X = (X_1, ..., X_n) i.i.d. ~ g_theta, then a sufficient statistic for theta is

    T(X_1, ..., X_n) = ( max_i X_i, min_i X_i )

In this case we note that the conditional probability

    P( x_1 in I_1, ..., x_n in I_n | max_i X_i = M, min_i X_i = m )

does not depend on the parameter theta: given the max and min, we know that the remaining X_i lie (uniformly) in between, so we do not need to know theta to compute the conditional distribution.
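A numeric check of the Bernoulli example (our sketch): the conditional probability of a particular sequence given T(X) = k is 1/(n choose k), whatever theta is.

    from math import comb

    def cond_prob(x, theta):
        # P(X^n = x | T(X) = sum(x), Theta = theta) for i.i.d. Bernoulli(theta) bits
        n, k = len(x), sum(x)
        joint = theta**k * (1 - theta)**(n - k)                   # P(X^n = x, T = k)
        marginal = comb(n, k) * theta**k * (1 - theta)**(n - k)   # P(T = k)
        return joint / marginal                                   # = 1 / comb(n, k)

    x = (1, 0, 1, 0)
    for theta in (0.2, 0.5, 0.9):
        print(cond_prob(x, theta))  # 1/6 each time: independent of theta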

Now consider the problem of estimating an unknown r.v. X ~ p(x) by observing Y with conditional distribution p(y|x). Use g(Y) = Xhat as the estimate for X, and consider the error probability P_e = Pr(Xhat != X). Fano's inequality relates this error to the conditional entropy H(X|Y). This is intuitive, since the lower the entropy, the lower we expect the probability of error to be.

Theorem 28. (Fano's Inequality) Suppose we have a setup as above with X taking values in the finite alphabet X. Then

    H(P_e) + P_e log( |X| - 1 ) >= H(X|Y)

where H(P_e) is the binary entropy function -P_e log P_e - (1 - P_e) log(1 - P_e). A weaker form of the inequality is (after using H(P_e) <= 1 and log(|X| - 1) <= log |X|):

    P_e >= ( H(X|Y) - 1 ) / log |X|

Remark. Note in particular from the first inequality that if P_e = 0 then H(X|Y) = 0, so that X is a function of Y.

Proof. Let

    E = { 1  if X != Xhat
        { 0  if X = Xhat

be the indicator of error, and note E ~ Bernoulli(P_e). Examine H(E, X|Y) using the chain rule:

    H(E|Y) + H(X|E, Y) = H(E, X|Y) = H(X|Y) + H(E|X, Y)

Note that H(E|Y) <= H(E) = H(P_e). Also,

    H(X|E, Y) = P(E = 0) H(X|E = 0, Y) + P(E = 1) H(X|E = 1, Y)

where in the first component H(X|E = 0, Y) = 0, because knowing Y and that there was no error tells us that X = g(Y), so X becomes deterministic. Also, given E = 1 and Y, we know that X takes values in X \ {Xhat}, an alphabet of size |X| - 1. Then using the estimate H(X) <= log |X|, we have that

    H(X|E, Y) <= P_e log( |X| - 1 )

Finally, H(E|X, Y) = 0, since knowing X and Y allows us to determine whether there is an error. Combining,

    H(X|Y) <= P_e log( |X| - 1 ) + H(P_e)

which is the desired inequality.

Remark. The theorem is also true with g(Y) random; see the book. In fact, we just make a slight adjustment: in the proof above we have actually shown the result with H(X|Xhat) in place of H(X|Y). Then the rest is a consequence of the data processing inequality, since

    I(X; Xhat) <= I(X; Y)  =>  H(X) - H(X|Xhat) <= H(X) - H(X|Y)  =>  H(X|Y) <= H(X|Xhat)

Also, the inequality may be sharp: there exist distributions for which equality is satisfied. See the book.
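As a small illustration (ours) of the weaker form of Fano's inequality:

    import math

    def fano_lower_bound(H_cond_bits, alphabet_size):
        # weaker Fano bound: P_e >= (H(X|Y) - 1) / log|X|, clipped at 0
        return max(0.0, (H_cond_bits - 1) / math.log2(alphabet_size))

    # with |X| = 8 and H(X|Y) = 2 bits, any estimator errs at least 1/3 of the time:
    print(fano_lower_bound(2.0, 8))  # 0.333...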

Week 4 (10/5/2009)

Suppose we have X, X' i.i.d. with entropy H(X). Note P(X = X') = sum_{x in X} p(x)^2. We can bound this probability below:

Lemma 29. P(X = X') >= 2^{-H(X)}, with equality if and only if X is uniformly distributed.

Proof. Since 2^x is convex, and H(X) = E_{p(x)}[ -log p(X) ], we have that

    2^{-H(X)} = 2^{E_{p(x)} log p(X)} <= E_{p(x)} 2^{log p(X)} = E_{p(x)} p(X) = sum_{x in X} p(x)^2 = P(X = X')

With the same proof, we also get:

Lemma 30. Let X ~ p(x), X' ~ r(x). Then

    P(X = X') >= 2^{-H(p) - D(p||r)}   and   P(X = X') >= 2^{-H(r) - D(r||p)}

Proof.

    2^{-H(p) - D(p||r)} = 2^{E_{p(x)} log p(X) + E_{p(x)} log( r(X)/p(X) )} = 2^{E_{p(x)} log r(X)} <= E_{p(x)} r(X) = sum_x p(x) r(x) = P(X = X')

and swapping the roles of p(x) and r(x), the other inequality holds.

These match intuition, since the higher the entropy (uncertainty), the less likely the two random variables will agree.

Asymptotic Equipartition Property

Now we will shift gears and discuss an important result that allows for compression of random variables. First recall the different notions of convergence of random variables:

- Usual convergence (of real numbers, or deterministic r.v.): a_n -> a if for all eps there exists N s.t. for n > N, |a_n - a| <= eps.
- Convergence a.e. (with probability 1): X_n -> X a.e. if P( omega : X_n(omega) -> X(omega) ) = 1.
- Convergence in probability (weaker than the above): X_n -> X in probability if for every eps, lim_n P( |X_n(omega) - X(omega)| > eps ) = 0.
- Convergence in law/distribution: X_n -> X in distribution if P(X_n in [a, b]) -> P(X in [a, b]) for all [a, b] with P(X = a) = P(X = b) = 0 (or: cdfs converge at points of continuity, or characteristic functions converge pointwise, etc.).

Now we work towards a result known as the Law of Large Numbers for i.i.d. r.v.s (sample means tend to the expectation in the limit). First:

Lemma 31. (Chebyshev's Inequality) Let X be a r.v. with mean mu and variance sigma^2. Then

    P( |X - mu| > eps ) <= sigma^2 / eps^2

Proof.

    sigma^2 = integral |X - mu|^2 dP >= integral_{|X - mu| > eps} |X - mu|^2 dP >= eps^2 P( |X - mu| > eps )

from which we conclude the result.

Theorem 32. (Weak Law of Large Numbers (WLLN)) Let X_1, ..., X_n be i.i.d. ~ X with mean mu and variance sigma^2. Then

    lim_n P( | (1/n) sum_{i=1}^{n} X_i - mu | > eps ) = 0

i.e. (1/n) sum_i X_i -> mu in probability.

Proof. Note that letting A_n = (1/n) sum_{i=1}^{n} X_i, the mean of A_n is mu and the variance is sigma^2 / n. Then by Chebyshev's inequality,

    P( |A_n - mu| > eps ) <= sigma^2 / (n eps^2) -> 0

so that we have convergence in probability.

With a straightforward application of the law of large numbers, we arrive at the following:

Theorem 33. (Asymptotic Equipartition Property (AEP)) If X_1, ..., X_n are i.i.d. ~ p(x), then

    -(1/n) log p(X_1, ..., X_n) -> H(X)

in probability.

Proof. Since p(x_1, ..., x_n) = p(x_1) ... p(x_n), we have that

    -(1/n) log p(X_1, ..., X_n) = -(1/n) sum_{i=1}^{n} log p(X_i) -> E( -log p(X) ) = H(X)

using the Law of Large Numbers on the random variables -log p(X_i), which are still i.i.d., since (X, Y) independent implies (f(X), g(Y)) independent.

Remark. Note that this shows that p(X_1, ..., X_n) is approximately 2^{-n H(X)}, so that the joint is approximately uniform on a smaller space. This is where we get compression of the r.v., and it motivates the definition of the typical set below.
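A quick simulation (ours) of the AEP for Bernoulli(p) bits: the normalized log-probability of the observed sequence approaches H(p).

    import math, random

    def minus_log_prob_rate(bits, p):
        # -(1/n) log2 p(x_1, ..., x_n) for an i.i.d. Bernoulli(p) sample
        n, k = len(bits), sum(bits)
        return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

    random.seed(0)
    p = 0.3
    print(-p*math.log2(p) - (1-p)*math.log2(1-p))   # H(p) ~ 0.881
    for n in (10, 100, 10000):
        bits = [random.random() < p for _ in range(n)]
        print(n, minus_log_prob_rate(bits, p))      # -> H(p) as n grows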

Definition 34. The typical set A_eps^(n) with respect to p(x) is the set of sequences (x_1, ..., x_n) such that

    2^{-n(H(X)+eps)} <= p(x_1, ..., x_n) <= 2^{-n(H(X)-eps)}

Properties of A_eps^(n):

1. Directly from the definition, we have that

    (x_1, ..., x_n) in A_eps^(n)  <=>  H(X) - eps <= -(1/n) log p(x_1, ..., x_n) <= H(X) + eps
                                  <=>  | -(1/n) log p(x_1, ..., x_n) - H(X) | <= eps

2. P( A_eps^(n) ) >= 1 - eps for n large.

Proof. The AEP shows that for n large enough,

    P( | -(1/n) log p(X_1, ..., X_n) - H(X) | > eps ) < eps

and comparing with the first property, this gives a bound on the measure of the complement of the typical set, which gives the result.

3. (1 - eps) 2^{n(H(X)-eps)} <= |A_eps^(n)| <= 2^{n(H(X)+eps)}

Proof.

    1 = sum_{(x_1,...,x_n)} p(x_1, ..., x_n) >= sum_{A_eps^(n)} p(x_1, ..., x_n) >= |A_eps^(n)| 2^{-n(H(X)+eps)}

gives the RHS inequality, and

    1 - eps < P( A_eps^(n) ) = sum_{A_eps^(n)} p(x_1, ..., x_n) <= |A_eps^(n)| 2^{-n(H(X)-eps)}

gives the LHS inequality (only true for n large enough).

Given these properties, we can come up with a scheme for compressing data. Given X_1, ..., X_n i.i.d. ~ p(x) over the alphabet X, note that sequences from the typical set occur with very high probability, > (1 - eps). Thus we expect to be able to use shorter codes for typical sequences. The scheme is very simple. Since there are at most 2^{n(H(X)+eps)} sequences in the typical set, we need at most n(H(X) + eps) bits, and rounding up, this is n(H(X) + eps) + 1 bits. To indicate in the code that we are describing a typical sequence, we prepend a 0 to any codeword of a typical sequence. Thus to describe a typical sequence we use at most n(H(X) + eps) + 2 bits. For non-typical sequences, since they occur rarely, we can afford to use the lousy description with at most n log |X| + 1 bits, and prepending a 1 to indicate that the sequence is not typical, we describe non-typical sequences with n log |X| + 2 bits. Note that this code is one-to-one: given the codeword, we can decode by looking at the first digit to see whether it is a typical sequence, and then looking in the appropriate table to recover the sequence.

Now we compute the average length of the code. To fix notation, let x^n = (x_1, ..., x_n) and let l(x^n) denote the length of the corresponding codeword. Then the expected length of the code is

    E( l(X^n) ) = sum_{x^n} p(x^n) l(x^n)
                <= sum_{A_eps^(n)} p(x^n) [ n(H(X)+eps) + 2 ] + sum_{(A_eps^(n))^c} p(x^n) [ n log |X| + 2 ]
                <= P(A_eps^(n)) [ n(H(X)+eps) ] + (1 - P(A_eps^(n))) [ n log |X| ] + 2
                <= n(H(X)+eps) + eps n log |X| + 2
                = n( H(X) + eps' )

with eps' = eps + eps log |X| + 2/n. Note that eps' can be made arbitrarily small, since eps' -> 0 as eps -> 0 and n -> infinity. Thus

    E[ (1/n) l(X^n) ] <= H(X) + eps'

for n large, and on the average we have represented X using H(X) bits.

Question: Can one do better than this? It turns out that the answer is no, but we will only be able to answer this much later.

Remark. This scheme is also not very practical:

- For one, we cannot compress individual r.v.s; we must take n large and compress them together. The lookup table will be large and inefficient.
- This requires knowledge of the underlying distribution p(x). We will later see schemes that can estimate the underlying distribution as the input comes, a sort of universal encoding scheme.

Next time: We will discuss entropy in the context of stochastic processes.

Definition 35. A stochastic process {X_i} is an indexed set of r.v.s with arbitrary dependence. To describe a stochastic process we specify the joint p(x_1, ..., x_n) for all n. We will be focusing on stationary stochastic processes, i.e. ones for which

    p(x_1, ..., x_n) = p(x_{1+t}, ..., x_{n+t})

for all n, t, i.e. independent of time shifts.

In the special case of i.i.d. random variables, we had that H(X_1, ..., X_n) = sum_{i=1}^{n} H(X) = n H(X), and we know how the entropy behaves for large n. We will ask how H(X_1, ..., X_n) grows for a general stationary process.

Week 5 (10/19/2009)

Notes: Midterm [date]; next class starts at [time] pm.

Entropy Rates for Stochastic Processes

Continuing on, we can now define an analogous quantity for entropy:

Definition 36. The entropy rate of an (arbitrary) stochastic process {X_i} is

    H(X) = lim_{n -> infinity} (1/n) H(X_1, ..., X_n)

when the limit exists. Also define a related quantity,

    H'(X) = lim_{n -> infinity} H(X_n | X_1, ..., X_{n-1})

when the limit exists. Note that H(X) is the Cesaro limit of H(X_n | X_1, ..., X_{n-1}) (using the chain rule), whereas H'(X) is the direct limit.

Also, for i.i.d. random variables, this agrees with the usual definition of entropy, since in that case

    (1/n) H(X_1, ..., X_n) = H(X)

Theorem 37. For a stationary stochastic process, the limits H(X) and H'(X) exist and are equal.

Proof. This follows from the fact that H(X_n | X_1, ..., X_{n-1}) is monotone decreasing for a stationary process:

    H(X_{n+1} | X_1, ..., X_n) <= H(X_{n+1} | X_2, ..., X_n) = H(X_n | X_1, ..., X_{n-1})

where the inequality follows since conditioning reduces entropy, and the equality follows since the process is stationary. The sequence is monotone decreasing and also bounded below by 0 (entropy is nonnegative), and thus the limit exists; hence the Cesaro limit also exists and the two are equal.

Here is an important result that we will not prove (maybe later), by Shannon, McMillan, and Breiman:

Theorem 38. (Generalized AEP) For any stationary ergodic process,

    -(1/n) log p(X_1, ..., X_n) -> H(X)   a.e.

Thus, the results about compression and typical sets generalize to stationary ergodic processes.

Now we introduce the following notation for Markov Chains (MC), and also some review:

1. We will take X to be a discrete, at most countable alphabet, called the state space. For instance, X could be {0, 1, 2, ...}.

2. A stochastic process is a Markov Chain (MC) if

    P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1) = P(X_{n+1} = j | X_n = i)

i.e. conditioning only depends on the previous time. We call this probability, which we denote P_{ij}^{n,n+1}, the one-step transition probability. If this is independent of n, then we call the process a time-invariant MC. Thus we can write X_1 -> X_2 -> ... -> X_n -> X_{n+1}, since the joint on any finite segment factorizes with dependencies only on the previous random variable.

3. For a time-invariant MC, we express the P_{ij} in a compact probability transition matrix P (which may be infinite-dimensional), where P_{ij} is the entry of the i-th row, j-th column. Note that sum_j P_{ij} = 1 for each i. Given a distribution mu on X, we can write it as a row vector

    mu = [p(0), p(1), ...]

If at time 0 the marginal distribution is mu, then at time 1 the probabilities can be found by the formula

    P(X_1 = j) = sum_i P(X_0 = i) P_{ij} = [mu P]_j

where we use the notation: row vector mu times matrix P. It turns out that a necessary and sufficient condition for stationarity is that the marginal distributions all equal some distribution mu for every n, where mu = mu P. For a time-invariant MC, the joint depends only on the initial distribution P(X_0 = i) and the time-invariant transition probability (which is fixed), and thus if P(X_0 = i) = mu_i where mu = mu P, stationarity follows.

4. Under some conditions on a general (not necessarily time-invariant) Markov chain (ergodicity, for instance), no matter what initial distribution mu_0 we start from, mu_n -> mu (convergence to a unique stationary distribution mu). Perron-Frobenius has such a result about conditions on P. An example of a not-so-nice Markov chain is a Markov chain with two isolated components: then the limits exist but depend on the initial distribution.

Now we turn to results about entropy rates. For stationary MCs, note that

    H(X) = H'(X) = lim_n H(X_n | X_1, ..., X_{n-1}) = lim_n H(X_n | X_{n-1}) = H(X_2 | X_1) = - sum_{i,j} mu_i P_{ij} log P_{ij}

Example 39. For a 2-state Markov chain with

    P = ( 1-alpha   alpha  )
        (  beta    1-beta  )

we can spot a stationary distribution to be mu_1 = beta/(alpha+beta), mu_2 = alpha/(alpha+beta). Then we can compute the entropy rate accordingly:

    H(X) = H(X_2|X_1) = ( beta/(alpha+beta) ) H(alpha) + ( alpha/(alpha+beta) ) H(beta)
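A small sketch (ours) evaluating Example 39 both ways: the closed form, and the general formula -sum_{i,j} mu_i P_{ij} log P_{ij}.

    import math

    def h2(p):
        return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

    alpha, beta = 0.2, 0.4
    mu = [beta/(alpha+beta), alpha/(alpha+beta)]   # stationary distribution
    print(mu[0]*h2(alpha) + mu[1]*h2(beta))        # closed form

    P = [[1-alpha, alpha], [beta, 1-beta]]
    print(-sum(mu[i]*P[i][j]*math.log2(P[i][j])
               for i in range(2) for j in range(2) if P[i][j] > 0))  # same value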

Example 40. (Random Walk on a Weighted Graph) For a graph with m nodes X = {1, ..., m}, assign weights w_ij >= 0, with w_ij = w_ji and w_ii = 0, to the edges of the graph. Let w_i = sum_j w_ij, the total weight of edges leaving i, and W = sum_{i<j} w_ij, the sum of all the weights. Then define transition probabilities p_ij = w_ij / w_i and examine the resulting stationary distribution. (An example random walk fixes a starting point, say node 1, which corresponds to the initial distribution mu_0 = [1, 0, ...].) The stationary distribution (which turns out to be unique) can be guessed: mu_i = w_i / 2W. To verify:

    [mu P]_j = sum_i mu_i p_ij = sum_i (w_i / 2W)(w_ij / w_i) = (1/2W) sum_i w_ij = w_j / 2W

which shows that mu P = mu. Now we can compute the entropy rate:

    H(X) = H(X_2|X_1) = - sum_{i,j} mu_i p_ij log p_ij
         = - sum_{i,j} (w_i / 2W)(w_ij / w_i) log( w_ij / w_i )
         = - sum_{i,j} (w_ij / 2W) log( w_ij / 2W ) + sum_i (w_i / 2W) log( w_i / 2W )
         = H( ..., w_ij / 2W, ... ) - H( ..., w_i / 2W, ... )

which we can interpret as the entropy of the edge distribution minus the entropy of the node distribution, so to speak. In the special case where all the weights are equal, let E_i be the number of edges from node i, and let E be the total number of edges. Then mu_i = E_i / 2E and

    H(X) = log(2E) - H( E_1/2E, ..., E_m/2E )

Example 41. (Random Walk on a Chessboard) In the book, but it is just a special case of the above.

Example 42. (Hidden Markov Model) This concerns a Markov chain {X_n} which we cannot observe (it is hidden), and functions (observations) Y_n(X_n) which we can observe. We can bound the entropy rate of Y; note that Y_n does not necessarily form a Markov chain.

Data Compression

We now work towards the goal of arguing that entropy is the limit of data compression. First let's consider properties of codes. A simple example is the Morse code, where each letter is encoded by a sequence of dots and dashes (a binary alphabet); the idea is to give frequently used letters shorter descriptions, and the descriptions have arbitrary length.

We define a code C for some source (r.v.) X to be a mapping from X to D*, the finite-length strings from a D-ary alphabet. We denote by C(x) the codeword of x and by l(x) the length of C(x). The expected codeword length is then L(C) = E(l(X)).

Reasonable properties of codes: We say a code is non-singular if it is one-to-one, i.e. each C(x) is easily decodable to recover x. Now consider the extension of a code C, defined by C(x_1, ..., x_n) = C(x_1) ... C(x_n) (concatenation of codewords), for encoding many realizations of X at once (for instance, to send across a channel). It would be good for the extension to be non-singular as well; we call a code uniquely decodable (UD) if the extension is non-singular. Even more specific, a code is prefix-free (or prefix) if no codeword is a prefix of another codeword. In this case, the moment a codeword is transmitted we can decode it immediately (without having to receive the next letter).

Example binary codes (cf. Cover & Thomas, Table 5.1):

    x | singular | non-singular but not UD | UD but not prefix | prefix
    1 |    0     |           0             |        10         |   0
    2 |    0     |          010            |        00         |   10
    3 |    0     |           01            |        11         |   110
    4 |    0     |           10            |        110        |   111

The goal is to minimize L(C), and to this end we note a useful inequality.

Theorem 43. (Kraft Inequality) For any prefix code over an alphabet of size D, the codeword lengths l_1, ..., l_m must satisfy

    sum_{i=1}^{m} D^{-l_i} <= 1

Conversely, for any set of lengths satisfying this inequality, there exists a prefix code with the specified lengths.

Proof. The idea is to examine a D-ary tree where, from the root, each branch represents a letter from the alphabet. The prefix condition implies that the codewords correspond to leaves of the D-ary tree. The argument is then to extend the tree to a full D-ary tree (add nodes until the depth is l_max, the maximum length). The number of leaves at depth l_max for a full tree is D^{l_max}, and if a codeword has length l_i, then the number of leaves at the bottom level that correspond to this word is D^{l_max - l_i}. Since these sets of leaves are disjoint, the codewords of a prefix encoding satisfy

    sum_i D^{l_max - l_i} <= D^{l_max}

and the result follows. The converse is easy to see by reversing the construction: associate each codeword to D^{l_max - l_i} consecutive leaves (which have a common ancestor).

Next time we will show that the Kraft inequality needs to hold also for UD codes, in which case there is no loss or benefit from focusing on prefix codes.
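A one-line numerical check (ours) of the Kraft inequality:

    def kraft_sum(lengths, D=2):
        # sum_i D^{-l_i}; at most 1 for any D-ary prefix (or UD) code
        return sum(D ** (-l) for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))  # 1.0: achieved by the prefix code {0, 10, 110, 111}
    print(kraft_sum([1, 1, 2]))     # 1.25 > 1: no binary prefix code has these lengths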

Week 6 (10/26/2009)

First we note that we can extend the Kraft inequality discussed last week to countably many lengths as well.

Theorem 44. (Extended Kraft Inequality) For any prefix code over an alphabet of size D, the codeword lengths l_1, l_2, ... must satisfy

    sum_{i=1}^{infinity} D^{-l_i} <= 1

Conversely, for any set of lengths satisfying this inequality, there exists a prefix code with the specified lengths.

Proof. Given a prefix code, we can again consider the infinite tree with the codewords at the leaves. We still extend the tree so that the tree is full, in a different sense: adding nodes until every node has either D children or no children. Such a tree can then be associated to a probability space on the (countably infinite) leaves by the formula P(X = l) = D^{-depth(l)}, and the sum of all the probabilities is 1. Since we have only added dummy nodes, we have only increased the sum of the D^{-l_i}, and thus

    sum_{i=1}^{infinity} D^{-l_i} <= 1

Conversely, as before, we can reverse the procedure above to construct a D-ary tree with leaves at depths l_i.

It turns out that UD codes also need to satisfy Kraft's inequality.

Theorem 45. (McMillan) The codeword lengths l_1, l_2, ..., l_m of a UD code satisfy

    sum_{i=1}^{m} D^{-l_i} <= 1

and conversely, for any set of lengths satisfying this inequality, there exists a UD code with the specified lengths.

Proof. The converse is immediate from the previous Theorem 44, since we can find a prefix code satisfying the specified lengths, and prefix codes are a subset of UD codes. Now given a UD code, examine C^k, the extension of the code to k symbols, where C(x_1, ..., x_k) = C(x_1) ... C(x_k) and l(x_1, ..., x_k) = sum_{m=1}^{k} l(x_m). Now look at

    ( sum_{x in X} D^{-l(x)} )^k = sum_{x_1, ..., x_k} D^{-l(x_1, ..., x_k)} = sum_{m=1}^{k l_max} a(m) D^{-m}

where a(m) is the number of sequences with code length m; since C^k is non-singular, a(m) <= D^m. Thus

    ( sum_{x in X} D^{-l(x)} )^k <= sum_{m=1}^{k l_max} 1 = k l_max

Taking k-th roots and letting k -> infinity, we then have that

    sum_{x in X} D^{-l(x)} <= lim_k (k l_max)^{1/k} = 1

which gives the result.

Remark. Thus, there is no loss in considering prefix codes as opposed to UD codes, since for any UD code there is a prefix code with the same lengths.

Optimal Codes

Given a source with probability distribution (p_1, ..., p_m), we would want to know how to find a prefix code with the minimum expected length. The problem is then a minimization problem:

    min  L = sum_{i=1}^{m} p_i l_i
    s.t. sum_{i=1}^{m} D^{-l_i} <= 1,   l_i in Z_+

where D is a fixed positive integer (the base). This is a linear program with integer constraints. First, we ignore the integer constraint and assume equality in Kraft's inequality. Then we have a Lagrange multiplier problem. Setting the derivative of the Lagrangian J = sum_i p_i l_i + lambda sum_i D^{-l_i} to zero gives the equations

    p_i = lambda D^{-l_i} log D   =>   D^{-l_i} = p_i / (lambda log D)

and summing over i,

    1 = sum_{i=1}^{m} p_i / (lambda log D) = 1 / (lambda log D)   =>   lambda = 1 / log D   =>   D^{-l_i} = p_i

Thus the optimal codeword lengths are l_i* = log_D(1/p_i), which yields the expected code length

    L* = sum_i p_i l_i* = sum_i p_i log_D(1/p_i) = H(X)

the entropy being in base D. This gives a lower bound for the minimization problem.

Theorem 46. For any prefix code for a random variable X, the expected codeword length satisfies

    L >= H(X)

with equality if and only if D^{-l_i} = p_i.

Proof. This has already been proved partially via Lagrange multipliers (with equality constraint in Kraft), but we now turn to an information-theoretic approach. Let r_i = D^{-l_i} / c, where c = sum_i D^{-l_i}. Then

    L - H(X) = sum_i p_i l_i + sum_i p_i log p_i
             = - sum_i p_i log D^{-l_i} + sum_i p_i log p_i
             = sum_i p_i log( p_i / r_i ) - log c
             = D(p||r) - log c >= 0

since D(p||r) >= 0 and -log c >= 0 (c <= 1 by Kraft), and thus L >= H(X) as desired. Equality holds if and only if log c = 0 and p = r, in which case we have equality in Kraft's inequality (c = 1) and D^{-l_i} = p_i, which finishes the proof.

Now if the p_i are not of the form D^{-l_i}, we would like to know how close to the entropy lower bound we can get. A good suboptimal code is the Shannon-Fano code, which involves a simple rounding procedure: use the lengths l_i = ceil( log_D(1/p_i) ).
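A sketch (ours) of this rounding and the resulting expected length, for an arbitrary pmf:

    import math

    def shannon_lengths(probs, D=2):
        # Shannon-Fano lengths l_i = ceil(log_D(1/p_i)); they satisfy Kraft
        return [math.ceil(-math.log(p, D)) for p in probs]

    probs = [0.4, 0.3, 0.2, 0.1]
    lengths = shannon_lengths(probs)
    H = -sum(p * math.log2(p) for p in probs)
    L = sum(p * l for p, l in zip(probs, lengths))
    print(lengths)   # [2, 2, 3, 4]
    print(H, L)      # H <= L < H + 1, as the next theorem guarantees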

Theorem 47. For the optimal codeword lengths l_1*, ..., l_m* for a random variable X, L* = sum_i p_i l_i* satisfies

    H(X) <= L* < H(X) + 1

Proof. Recall from Theorem 46 that lengths of log_D(1/p_i) achieve the lower bound if they are integers. We now suppose we use ceil( log_D(1/p_i) ) instead. This corresponds to the Shannon code, but first we note that the lengths still satisfy Kraft's inequality:

    sum_i D^{-ceil(log_D(1/p_i))} <= sum_i D^{-log_D(1/p_i)} = sum_i p_i = 1

and also that this gives the desired upper bound:

    sum_i p_i ceil( log_D(1/p_i) ) <= sum_i p_i ( log_D(1/p_i) + 1 ) = H(X) + 1

Since the optimal code is at least as good as this Shannon code, this gives the desired upper bound H(X) + 1. The lower bound was already proved in the previous Theorem 46.

Remark 48. The overhead of 1 in the previous theorem can be made insignificant if we encode blocks (X_1, ..., X_n) to increase H(X_1, ..., X_n), effectively spreading the overhead across a block. By the previous theorem, we can find a code for (X_1, ..., X_n) with lengths l(x_1, ..., x_n) such that

    H(X_1, ..., X_n) <= E[ l(X_1, ..., X_n) ] <= H(X_1, ..., X_n) + 1

Then per outcome, we have that

    (1/n) H(X_1, ..., X_n) <= (1/n) E[ l(X_1, ..., X_n) ] <= (1/n) H(X_1, ..., X_n) + 1/n

and letting L_n = (1/n) E[ l(X_1, ..., X_n) ], we see that L_n -> H(X), the entropy rate, if X_1, ..., X_n is stationary. Note the connection to the Shannon-McMillan-Breiman Theorem (or the AEP in the case of i.i.d. X_i): there we saw that we can design codes using typical sequences to get a compression rate arbitrarily close to the entropy. This remark uses a slightly different approach, via Shannon codes.

Huffman Codes

Now we consider a particular prefix code for a probability distribution, which will turn out to be optimal. The idea is essentially to construct a prefix code tree (as in the proofs of Kraft's inequality) from the bottom up. We first look at the case D = 2 (binary codes).

Huffman Procedure:

1. Order the symbols in decreasing order of probabilities, so that x_1, ..., x_m satisfies p_1 >= p_2 >= ... >= p_m.

2. Merge the two symbols of lowest probability (the last two symbols) into a single tree, with x_{m-1} and x_m as its two leaves and associated probability p_{m-1} + p_m; denote this merged symbol x'_{m-1}. (The procedure then repeats on the reduced set of m - 1 symbols.)
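A compact sketch (ours) of this merging procedure using a heap; applied to the horse-race distribution of Week 1, it recovers an optimal code with expected length equal to the entropy, 2 bits:

    import heapq

    def huffman_code(probs):
        # binary Huffman code: repeatedly merge the two least probable nodes
        heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        tiebreak = len(probs)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)   # two smallest probabilities
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
    code = huffman_code(probs)
    print(code)
    print(sum(p * len(code[i]) for i, p in enumerate(probs)))  # 2.0 bits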


Maximum Likelihood Estimation and Complexity Regularization

Maximum Likelihood Estimation and Complexity Regularization ECE90 Sprig 004 Statistical Regularizatio ad Learig Theory Lecture: 4 Maximum Likelihood Estimatio ad Complexity Regularizatio Lecturer: Rob Nowak Scribe: Pam Limpiti Review : Maximum Likelihood Estimatio

More information

The Maximum-Likelihood Decoding Performance of Error-Correcting Codes

The Maximum-Likelihood Decoding Performance of Error-Correcting Codes The Maximum-Lielihood Decodig Performace of Error-Correctig Codes Hery D. Pfister ECE Departmet Texas A&M Uiversity August 27th, 2007 (rev. 0) November 2st, 203 (rev. ) Performace of Codes. Notatio X,

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Cooperative Communication Fundamentals & Coding Techniques

Cooperative Communication Fundamentals & Coding Techniques 3 th ICACT Tutorial Cooperative commuicatio fudametals & codig techiques Cooperative Commuicatio Fudametals & Codig Techiques 0..4 Electroics ad Telecommuicatio Research Istitute Kiug Jug 3 th ICACT Tutorial

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

( ) = p and P( i = b) = q.

( ) = p and P( i = b) = q. MATH 540 Radom Walks Part 1 A radom walk X is special stochastic process that measures the height (or value) of a particle that radomly moves upward or dowward certai fixed amouts o each uit icremet of

More information

Lecture 7: Channel coding theorem for discrete-time continuous memoryless channel

Lecture 7: Channel coding theorem for discrete-time continuous memoryless channel Lecture 7: Chael codig theorem for discrete-time cotiuous memoryless chael Lectured by Dr. Saif K. Mohammed Scribed by Mirsad Čirkić Iformatio Theory for Wireless Commuicatio ITWC Sprig 202 Let us first

More information

UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 17 Lecturer: David Wagner April 3, Notes 17 for CS 170

UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 17 Lecturer: David Wagner April 3, Notes 17 for CS 170 UC Berkeley CS 170: Efficiet Algorithms ad Itractable Problems Hadout 17 Lecturer: David Wager April 3, 2003 Notes 17 for CS 170 1 The Lempel-Ziv algorithm There is a sese i which the Huffma codig was

More information

Lecture 12: November 13, 2018

Lecture 12: November 13, 2018 Mathematical Toolkit Autum 2018 Lecturer: Madhur Tulsiai Lecture 12: November 13, 2018 1 Radomized polyomial idetity testig We will use our kowledge of coditioal probability to prove the followig lemma,

More information

Math 525: Lecture 5. January 18, 2018

Math 525: Lecture 5. January 18, 2018 Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 18

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 18 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2013 Aat Sahai Lecture 18 Iferece Oe of the major uses of probability is to provide a systematic framework to perform iferece uder ucertaity. A

More information

1 Convergence in Probability and the Weak Law of Large Numbers

1 Convergence in Probability and the Weak Law of Large Numbers 36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec

More information

4.1 Data processing inequality

4.1 Data processing inequality ECE598: Iformatio-theoretic methods i high-dimesioal statistics Sprig 206 Lecture 4: Total variatio/iequalities betwee f-divergeces Lecturer: Yihog Wu Scribe: Matthew Tsao, Feb 8, 206 [Ed. Mar 22] Recall

More information

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound Lecture 7 Ageda for the lecture Gaussia chael with average power costraits Capacity of additive Gaussia oise chael ad the sphere packig boud 7. Additive Gaussia oise chael Up to this poit, we have bee

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

Asymptotic Coupling and Its Applications in Information Theory

Asymptotic Coupling and Its Applications in Information Theory Asymptotic Couplig ad Its Applicatios i Iformatio Theory Vicet Y. F. Ta Joit Work with Lei Yu Departmet of Electrical ad Computer Egieerig, Departmet of Mathematics, Natioal Uiversity of Sigapore IMS-APRM

More information

Lecture 3: August 31

Lecture 3: August 31 36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Notes 19 : Martingale CLT

Notes 19 : Martingale CLT Notes 9 : Martigale CLT Math 733-734: Theory of Probability Lecturer: Sebastie Roch Refereces: [Bil95, Chapter 35], [Roc, Chapter 3]. Sice we have ot ecoutered weak covergece i some time, we first recall

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

The Borel hierarchy classifies subsets of the reals by their topological complexity. Another approach is to classify them by size.

The Borel hierarchy classifies subsets of the reals by their topological complexity. Another approach is to classify them by size. Lecture 7: Measure ad Category The Borel hierarchy classifies subsets of the reals by their topological complexity. Aother approach is to classify them by size. Filters ad Ideals The most commo measure

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Increasing timing capacity using packet coloring

Increasing timing capacity using packet coloring 003 Coferece o Iformatio Scieces ad Systems, The Johs Hopkis Uiversity, March 4, 003 Icreasig timig capacity usig packet colorig Xi Liu ad R Srikat[] Coordiated Sciece Laboratory Uiversity of Illiois e-mail:

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Problem Set 2 Solutions

Problem Set 2 Solutions CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S

More information

Brief Review of Functions of Several Variables

Brief Review of Functions of Several Variables Brief Review of Fuctios of Several Variables Differetiatio Differetiatio Recall, a fuctio f : R R is differetiable at x R if ( ) ( ) lim f x f x 0 exists df ( x) Whe this limit exists we call it or f(

More information

Lecture 1: Basic problems of coding theory

Lecture 1: Basic problems of coding theory Lecture 1: Basic problems of codig theory Error-Correctig Codes (Sprig 016) Rutgers Uiversity Swastik Kopparty Scribes: Abhishek Bhrushudi & Aditya Potukuchi Admiistrivia was discussed at the begiig of

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Generalized Semi- Markov Processes (GSMP)

Generalized Semi- Markov Processes (GSMP) Geeralized Semi- Markov Processes (GSMP) Summary Some Defiitios Markov ad Semi-Markov Processes The Poisso Process Properties of the Poisso Process Iterarrival times Memoryless property ad the residual

More information

10-704: Information Processing and Learning Spring Lecture 10: Feb 12

10-704: Information Processing and Learning Spring Lecture 10: Feb 12 10-704: Iformatio Processig ad Learig Sprig 2015 Lecture 10: Feb 12 Lecturer: Akshay Krishamurthy Scribe: Dea Asta, Kirthevasa Kadasamy Disclaimer: These otes have ot bee subjected to the usual scrutiy

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Lecture 3 The Lebesgue Integral

Lecture 3 The Lebesgue Integral Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified

More information

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory 1. Graph Theory Prove that there exist o simple plaar triagulatio T ad two distict adjacet vertices x, y V (T ) such that x ad y are the oly vertices of T of odd degree. Do ot use the Four-Color Theorem.

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

STAT Homework 1 - Solutions

STAT Homework 1 - Solutions STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better

More information

CS 330 Discussion - Probability

CS 330 Discussion - Probability CS 330 Discussio - Probability March 24 2017 1 Fudametals of Probability 11 Radom Variables ad Evets A radom variable X is oe whose value is o-determiistic For example, suppose we flip a coi ad set X =

More information

Lecture 8: Convergence of transformations and law of large numbers

Lecture 8: Convergence of transformations and law of large numbers Lecture 8: Covergece of trasformatios ad law of large umbers Trasformatio ad covergece Trasformatio is a importat tool i statistics. If X coverges to X i some sese, we ofte eed to check whether g(x ) coverges

More information

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero? 2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a

More information

7 Sequences of real numbers

7 Sequences of real numbers 40 7 Sequeces of real umbers 7. Defiitios ad examples Defiitio 7... A sequece of real umbers is a real fuctio whose domai is the set N of atural umbers. Let s : N R be a sequece. The the values of s are

More information

Math 61CM - Solutions to homework 3

Math 61CM - Solutions to homework 3 Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig

More information

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function. MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

1 Review and Overview

1 Review and Overview DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learig Theory: Lecture Notes Kamalika Chaudhuri October 4, 0 Cocetratio of Averages Cocetratio of measure is very useful i showig bouds o the errors of machie-learig algorithms. We will begi with a basic

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS Cotets 1. A few useful discrete radom variables 2. Joit, margial, ad

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

ECE 564/645 - Digital Communication Systems (Spring 2014) Final Exam Friday, May 2nd, 8:00-10:00am, Marston 220

ECE 564/645 - Digital Communication Systems (Spring 2014) Final Exam Friday, May 2nd, 8:00-10:00am, Marston 220 ECE 564/645 - Digital Commuicatio Systems (Sprig 014) Fial Exam Friday, May d, 8:00-10:00am, Marsto 0 Overview The exam cosists of four (or five) problems for 100 (or 10) poits. The poits for each part

More information

Lecture 11: Pseudorandom functions

Lecture 11: Pseudorandom functions COM S 6830 Cryptography Oct 1, 2009 Istructor: Rafael Pass 1 Recap Lecture 11: Pseudoradom fuctios Scribe: Stefao Ermo Defiitio 1 (Ge, Ec, Dec) is a sigle message secure ecryptio scheme if for all uppt

More information

Notes on Information Theory by Jeff Steif

Notes on Information Theory by Jeff Steif Notes o Iformatio Theory by Jeff Steif 1 Etropy, the Shao-McMilla-Breima Theorem ad Data Compressio These otes will cotai some aspects of iformatio theory. We will cosider some REAL problems that REAL

More information

Lecture 4: April 10, 2013

Lecture 4: April 10, 2013 TTIC/CMSC 1150 Mathematical Toolkit Sprig 01 Madhur Tulsiai Lecture 4: April 10, 01 Scribe: Haris Agelidakis 1 Chebyshev s Iequality recap I the previous lecture, we used Chebyshev s iequality to get a

More information

ECE 901 Lecture 13: Maximum Likelihood Estimation

ECE 901 Lecture 13: Maximum Likelihood Estimation ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered

More information

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018 CSE 353 Discrete Computatioal Structures Sprig 08 Sequeces, Mathematical Iductio, ad Recursio (Chapter 5, Epp) Note: some course slides adopted from publisher-provided material Overview May mathematical

More information

Optimization Methods MIT 2.098/6.255/ Final exam

Optimization Methods MIT 2.098/6.255/ Final exam Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short

More information

Math 2784 (or 2794W) University of Connecticut

Math 2784 (or 2794W) University of Connecticut ORDERS OF GROWTH PAT SMITH Math 2784 (or 2794W) Uiversity of Coecticut Date: Mar. 2, 22. ORDERS OF GROWTH. Itroductio Gaiig a ituitive feel for the relative growth of fuctios is importat if you really

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

6.867 Machine learning, lecture 7 (Jaakkola) 1

6.867 Machine learning, lecture 7 (Jaakkola) 1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Math 216A Notes, Week 5

Math 216A Notes, Week 5 Math 6A Notes, Week 5 Scribe: Ayastassia Sebolt Disclaimer: These otes are ot early as polished (ad quite possibly ot early as correct) as a published paper. Please use them at your ow risk.. Thresholds

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information