INFORMATION THEORY AND STATISTICS. Jüri Lember


INFORMATION THEORY AND STATISTICS
Lecture notes and exercises, Spring 2013
Jüri Lember

Literature:
1. T.M. Cover, J.A. Thomas, "Elements of information theory", Wiley, 1991 and 2006;
2. Yeung, Raymond W., "A first course in information theory", Kluwer, 2002;
3. Te Sun Han, Kingo Kobayashi, "Mathematics of information and coding", AMS, 1994;
4. Csiszár, I., Shields, P., "Information theory and statistics: a tutorial", 2004;
5. MacKay, D., "Information theory, inference and learning algorithms", Cambridge, 2004;
6. McEliece, R., "Information and coding", Cambridge, 2004;
7. Gray, R., "Entropy and information theory", Springer, 1990;
8. Gray, R., "Source coding theory", Kluwer, 1990;
9. Shields, P., "The ergodic theory of discrete sample paths", AMS, 1996;
10. Dembo, A., Zeitouni, O., "Large deviation techniques and applications", Springer.

1 Main concepts

1.1 (Shannon) entropy

In what follows, let X = {x_1, x_2, ...} be a discrete (finite or countably infinite) alphabet. Let X be a random variable taking values in X with distribution P. We shall denote p_i := P(X = x_i) = P(x_i). Thus, for every A ⊆ X,

P(A) = P(X ∈ A) = ∑_{i : x_i ∈ A} p_i = ∑_{x ∈ A} P(x).

Since X is fixed, the distribution P can be uniquely represented via the probabilities p_i, i.e. P = (p_1, p_2, ...). Recall that the support of P, denoted by X_P, is the set of letters having positive probability (atoms), i.e. X_P := {x ∈ X : P(x) > 0}. Also recall that for any g : X → R such that ∑_i p_i |g(x_i)| < ∞, the expectation of g(X) is defined as follows:

Eg(X) = ∑_i p_i g(x_i) = ∑_{x ∈ X} g(x)P(x) = ∑_{x ∈ X_P} g(x)P(x).    (1.1)

NB! In what follows, log := log_2 and 0 log 0 := 0.

1.1.1 Definition and elementary properties

Def 1.1 The (Shannon) entropy of the random variable X (distribution P) is

H(X) = −∑_i p_i log p_i = −∑_{x ∈ X} P(x) log P(x).

Remarks:
- H(X) depends on X via P only.
- By (1.1), H(X) = E(−log P(X)) = −E log P(X).
- The sum −∑_i p_i log p_i is always defined (since −p_i log p_i ≥ 0), but it can be infinite. Hence 0 ≤ H(X) ≤ ∞, and H(X) = 0 iff X = x a.s. for some letter x.
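The definition of entropy translates directly into a few lines of code. A minimal sketch in Python (the helper name `entropy` is our choice, not from the notes); zero atoms are skipped, which implements the convention 0 log 0 = 0:

```python
from math import log2

def entropy(p):
    """Shannon entropy H(P) in bits; atoms with zero probability are skipped (0 log 0 = 0)."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

# A fair coin carries exactly one bit of randomness,
# a degenerate distribution carries none.
print(entropy([0.5, 0.5]))
print(entropy([1.0, 0.0]))
```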

Entropy does not depend on the alphabet X; it depends only on the probabilities p_i. Hence we can also write H(p_1, p_2, ...).

In principle, any other logarithm log_b can be used in the definition of entropy. Such an entropy is denoted by H_b, i.e.

H_b(X) = −∑_i p_i log_b p_i = −∑_{x ∈ X} P(x) log_b P(x).

Since log_b p = (log_b a)(log_a p), it holds that H_b(X) = (log_b a) H_a(X), so that H_b(X) = (log_b 2) H(X) and H_e(X) = (ln 2) H(X). In information theory, typically log_2 is used, and such entropy is measured in bits. The entropy defined with ln is measured in nats; the entropy defined with log_10 is measured in dits.

The number −log p(x_i) can be interpreted as the amount of information one gets if X takes the value x_i. The smaller p(x_i), the bigger the amount of information. The entropy is thus the average amount of information or randomness X contains: the bigger H(X), the more random X is. The concept of entropy was introduced by C. Shannon in his seminal paper "A mathematical theory of communication" (1948).

Examples:
- Let X = {0, 1} and p = P(X = 1), i.e. X ~ B(1, p). Then

H(X) = −p log p − (1 − p) log(1 − p) =: h(p).

The function h(p) is called the binary entropy function. The function h(p) is concave, symmetric around 1/2 and has its maximum at p = 1/2:

h(1/2) = −(1/2) log(1/2) − (1/2) log(1/2) = log 2 = 1.

- Consider the distributions

P: P(a) = 1/2, P(b) = 1/4, P(c) = 1/8, P(d) = 1/16, P(e) = 1/16;
Q: Q(a) = Q(b) = Q(c) = Q(d) = 1/4.

Then

H(P) = (1/2) log 2 + (1/4) log 4 + (1/8) log 8 + (1/16) log 16 + (1/16) log 16 = 15/8,
H(Q) = log 4 = 2.

Thus P is "less random", although its number of atoms (letters with positive probability) is bigger.
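The binary entropy function and the two example distributions can be verified numerically; `h` and `H` below are our own helper names:

```python
from math import log2

def h(p):
    """Binary entropy function h(p) = -p log p - (1-p) log(1-p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def H(dist):
    """Entropy of a distribution given as a dict letter -> probability."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)

P = {'a': 1/2, 'b': 1/4, 'c': 1/8, 'd': 1/16, 'e': 1/16}
Q = {'a': 1/4, 'b': 1/4, 'c': 1/4, 'd': 1/4}

print(H(P))    # 15/8: P is "less random" despite having more atoms
print(H(Q))    # 2 bits
print(h(0.5))  # the maximum of the binary entropy function
```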

1.1.2 Axiomatic approach

The entropy has the grouping property

H(p_1, p_2, p_3, ...) = H(∑_{i=1}^k p_i, p_{k+1}, p_{k+2}, ...) + (∑_{i=1}^k p_i) H(p_1/∑_{i=1}^k p_i, ..., p_k/∑_{i=1}^k p_i).    (1.2)

The proof of (1.2) is Exercise 2. In a sense, grouping is a natural "additivity" property that a measure of information should have. It turns out that when X is finite, grouping together with symmetry and continuity implies entropy. More precisely, let for any m, P_m be the set of all probability measures on an m-dimensional alphabet, i.e.

P_m := { (p_1, ..., p_m) : p_i ≥ 0, ∑_{i=1}^m p_i = 1 }.

Suppose that for every m we have a function f_m : P_m → [0, ∞) that is a candidate for a measure of information. The function f_m is continuous if it is continuous with respect to all coordinates, and it is symmetric if its value is independent of the order of the arguments.

Theorem 1.2 Let, for every m, f_m : P_m → [0, ∞) be symmetric functions satisfying the following axioms:
A1 f_2 is normalized, i.e. f_2(1/2, 1/2) = 1;
A2 f_m is continuous for every m = 2, 3, ...;
A3 it has the grouping property: for every 1 < k < m,

f_m(p_1, p_2, ..., p_m) = f_{m−k+1}(∑_{i=1}^k p_i, p_{k+1}, ..., p_m) + (∑_{i=1}^k p_i) f_k(p_1/∑_{i=1}^k p_i, ..., p_k/∑_{i=1}^k p_i);

A4 for every m < n, it holds that f_m(1/m, ..., 1/m) ≤ f_n(1/n, ..., 1/n).

Then for every m,

f_m(p_1, ..., p_m) = −∑_{i=1}^m p_i log p_i.    (1.3)

Proof. Let, for every m,

g(m) := f_m(1/m, ..., 1/m).

By symmetry and applying A3 m times (grouping mn equal atoms into m blocks of n), we obtain

g(mn) = f_{mn}(1/mn, ..., 1/mn) = f_m(1/m, ..., 1/m) + ∑_{i=1}^m (1/m) f_n(1/n, ..., 1/n) = g(m) + g(n).

Hence, for integers n and k, g(n^k) = k g(n), and by A1, g(2^k) = k g(2) = k, i.e. g(2^k) = log(2^k) for every k. Using A4, it is possible to show that the equality above holds for every integer, i.e.

g(n) = log n,  n ∈ N.

Fix an arbitrary m and consider (p_1, ..., p_m), where all components are rational. Then there exist integers k_1, ..., k_m and a common denominator n such that p_i = k_i/n, i = 1, ..., m. In this case, grouping the n equal atoms into blocks of sizes k_1, ..., k_m,

g(n) = f_n(1/n, ..., 1/n) = f_m(k_1/n, ..., k_m/n) + ∑_{i=1}^m (k_i/n) f_{k_i}(1/k_i, ..., 1/k_i)
     = f_m(p_1, ..., p_m) + ∑_{i=1}^m p_i g(k_i) = f_m(p_1, ..., p_m) + ∑_{i=1}^m p_i log k_i.

Therefore,

f_m(p_1, ..., p_m) = log n − ∑_{i=1}^m p_i log k_i = −∑_{i=1}^m p_i log(k_i/n) = −∑_{i=1}^m p_i log p_i,

so that (1.3) holds when all p_i are rational. Now use the continuity of f_m to deduce that (1.3) always holds.

Remark: One can drop the axiom A4.

1.1.3 Entropy is strictly concave

Jensen's inequality. We shall often use Jensen's inequality. Recall that a function g : R → R is convex if for every x_1, x_2 and λ ∈ [0, 1] it holds that

g(λx_1 + (1 − λ)x_2) ≤ λ g(x_1) + (1 − λ) g(x_2).

A function g is strictly convex if (for x_1 ≠ x_2) equality holds only for λ = 1 or λ = 0. A function g is concave if −g is convex.

Theorem 1.3 (Jensen's inequality) Let g be a convex function and X a random variable such that E|g(X)| < ∞ and E|X| < ∞. Then

Eg(X) ≥ g(EX).    (1.4)

If g is strictly convex, then (1.4) is an equality if and only if X = EX a.s.

Mixture of distributions and the concavity of entropy. Let P_1 and P_2 be two distributions given on X. (Note that any two discrete distributions can be defined on a common alphabet, e.g. the union of their supports.) The mixture of P_1 and P_2 is their convex combination:

Q = λP_1 + (1 − λ)P_2,  λ ∈ (0, 1).

When X_1 ~ P_1, X_2 ~ P_2 and Z ~ B(1, λ), then the following random variable has the mixture distribution Q:

Y = X_1 if Z = 1,  X_2 if Z = 0.

Clearly Q contains the randomness of P_1 and P_2; in addition, Z is random.

Proposition 1.1 Entropy is strictly concave, i.e.

H(Q) ≥ λH(P_1) + (1 − λ)H(P_2),

and the inequality is strict except when P_1 = P_2. When the supports X_{P_1} and X_{P_2} are disjoint, then

H(Q) = λH(P_1) + (1 − λ)H(P_2) + h(λ).    (1.5)

Proof. The function f(y) = −y log y is strictly concave (y ≥ 0). Thus, for every x ∈ X,

−λP_1(x) log P_1(x) − (1 − λ)P_2(x) log P_2(x) = λ f(P_1(x)) + (1 − λ) f(P_2(x)) ≤ f(λP_1(x) + (1 − λ)P_2(x)) = −Q(x) log Q(x).

Sum over X to get λH(P_1) + (1 − λ)H(P_2) ≤ H(Q). The inequality is strict when there is at least one x ∈ X with P_1(x) ≠ P_2(x). The proof of (1.5) is Exercise 5.

Example: Let P_1 = B(1, p_1) and P_2 = B(1, p_2) (both Bernoulli distributions). Then the mixture λP_1 + (1 − λ)P_2 is B(1, λp_1 + (1 − λ)p_2). The concavity of entropy implies that the binary entropy function h(p) is strictly concave: h(λp_1 + (1 − λ)p_2) ≥ λh(p_1) + (1 − λ)h(p_2).

1.2 Joint entropy

Let X and Y be random variables taking values in discrete alphabets X and Y, respectively. Then (X, Y) is a random vector with support in X × Y = {(x, y) : x ∈ X, y ∈ Y}. Let P be the (joint) distribution of (X, Y), a probability measure on X × Y. Denote

p_ij := P(x_i, y_j) = P((X, Y) = (x_i, y_j)) = P(X = x_i, Y = y_j).

The joint distribution is often represented by the following table.
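Proposition 1.1 and formula (1.5) can be sanity-checked on a concrete mixture. The distributions below are our own toy choices; P_1 and P_2 are given disjoint supports so that (1.5) holds exactly:

```python
from math import log2

def H(p):
    """Entropy of a probability vector, in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

def h(lam):
    """Binary entropy function."""
    return H([lam, 1 - lam])

lam = 0.3
P1 = [0.6, 0.4, 0.0, 0.0]   # supported on the first two letters
P2 = [0.0, 0.0, 0.7, 0.3]   # disjoint support: the last two letters
Q = [lam * a + (1 - lam) * b for a, b in zip(P1, P2)]  # the mixture

lhs = H(Q)
rhs = lam * H(P1) + (1 - lam) * H(P2)
print(lhs >= rhs)                         # concavity of entropy
print(abs(lhs - (rhs + h(lam))) < 1e-9)   # equality (1.5) for disjoint supports
```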

X\Y    y_1                  y_2                  ...   y_j   ...
x_1    P(x_1, y_1) = p_11   P(x_1, y_2) = p_12   ...   p_1j  ...   ∑_j p_1j = P(x_1)
x_2    P(x_2, y_1) = p_21   P(x_2, y_2) = p_22   ...   p_2j  ...   ∑_j p_2j = P(x_2)
...
x_i    p_i1                 p_i2                 ...   p_ij  ...   ∑_j p_ij = P(x_i)
...
       ∑_i p_i1 = P(y_1)    ∑_i p_i2 = P(y_2)    ...   ∑_i p_ij = P(y_j)   ...

In the table and in what follows (with some abuse of notation), P(x) := P(X = x) and P(y) := P(Y = y) denote the marginal laws. The random variables X and Y are independent if and only if

P(x, y) = P(x)P(y)  ∀x ∈ X, y ∈ Y.

The random vector (X, Y) can be considered as a random variable in the product alphabet X × Y, and the entropy of such a random variable is

H(X, Y) = −∑_{ij} p_ij log p_ij = −∑_{(x,y) ∈ X×Y} P(x, y) log P(x, y) = −E log P(X, Y).    (1.6)

Def 1.4 The entropy H(X, Y) as defined in (1.6) is called the joint entropy of (X, Y).

Independent X and Y. When X and Y are independent, then

H(X, Y) = −∑_{(x,y) ∈ X×Y} P(x, y) log P(x, y) = −∑_{(x,y) ∈ X×Y} P(x)P(y)(log P(x) + log P(y))
        = −∑_{x ∈ X} P(x) log P(x) − ∑_{y ∈ Y} P(y) log P(y) = H(X) + H(Y).

The argument above can be restated as follows. For every x ∈ X and y ∈ Y it holds that log P(x, y) = log P(x) + log P(y), so that

log P(X, Y) = log P(X) + log P(Y).

Expectation is linear, hence

H(X, Y) = −E log P(X, Y) = −E(log P(X) + log P(Y)) = −E log P(X) − E log P(Y) = H(X) + H(Y).

The joint entropy of several random variables. By analogy, the joint entropy of several random variables X_1, ..., X_n is defined as

H(X_1, ..., X_n) := −E log P(X_1, ..., X_n).

When all the random variables are independent, then H(X_1, ..., X_n) = ∑_{i=1}^n H(X_i).
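The additivity H(X, Y) = H(X) + H(Y) for independent X and Y is easy to confirm on a product distribution; the marginals below are arbitrary choices of ours:

```python
from math import log2

def H(p):
    """Entropy of a probability vector, in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

def H_joint(pxy):
    """Joint entropy of a table of probabilities p_ij."""
    return -sum(p * log2(p) for row in pxy for p in row if p > 0)

px = [0.5, 0.5]
py = [0.25, 0.75]
pxy = [[a * b for b in py] for a in px]  # independent product distribution

print(abs(H_joint(pxy) - (H(px) + H(py))) < 1e-12)  # additivity under independence
```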

1.3 Conditional entropy

1.3.1 Definition

Let x be such that P(x) > 0. Then define the conditional probabilities

P(y|x) := P(Y = y|X = x) = P(x, y)/P(x).

The conditional distribution of Y given X = x is

y_1         y_2         y_3         ...
P(y_1|x)    P(y_2|x)    P(y_3|x)    ...

The entropy of that distribution is

H(Y|x) := H(Y|X = x) := −∑_{y ∈ Y} P(y|x) log P(y|x).

Consider the function x ↦ H(Y|x). Applying it to the random variable X ~ P, we get a new random variable (a function of X) with distribution

H(Y|x_1)    H(Y|x_2)    H(Y|x_3)    ...
P(x_1)      P(x_2)      P(x_3)      ...

and expectation ∑_{x ∈ X_P} H(Y|x)P(x).

Def 1.5 The conditional entropy of Y given X ~ P is

H(Y|X) := ∑_{x ∈ X_P} H(Y|x)P(x) = −∑_{x ∈ X_P} P(x) ∑_{y ∈ Y} P(y|x) log P(y|x)
        = −∑_{x ∈ X_P} ∑_{y ∈ Y} P(x, y) log P(y|x) = −E log P(Y|X).

Remarks:
- When X and Y are independent, then P(y|x) = P(y) for all x ∈ X_P, y ∈ Y, so that H(Y|X) = H(Y).
- In general H(X|Y) ≠ H(Y|X) (take independent X, Y such that H(X) ≠ H(Y)).
- H(Y|X) = 0 iff Y = f(X) for some function f. Indeed, H(Y|X) = 0 iff H(Y|X = x) = 0 for every x ∈ X_P. Hence there exists f(x) such that P(Y = f(x)|X = x) = 1, i.e. Y = f(X).

Joint entropy for more than two random variables. Let X, Y, Z be random variables with supports X, Y and Z. Considering the vector (X, Y) (or the vector (Y, Z)) as a random variable, we have

H(X, Y|Z) := −∑_{z ∈ Z} P(z) ∑_{(x,y) ∈ X×Y} P(x, y|z) log P(x, y|z) = −∑_{(x,y,z) ∈ X×Y×Z} P(x, y, z) log P(x, y|z) = −E log P(X, Y|Z),

H(X|Y, Z) := −∑_{(y,z) ∈ Y×Z} P(y, z) ∑_{x ∈ X} P(x|y, z) log P(x|y, z) = −∑_{(x,y,z) ∈ X×Y×Z} P(x, y, z) log P(x|y, z) = −E log P(X|Y, Z).

Moreover, given any set X_1, ..., X_n of random variables, one can similarly define conditional entropies H(X_1, ..., X_j | X_{j+1}, ..., X_n).

1.3.2 Chain rules for entropy

Lemma 1.1 (Chain rule) Let X_1, ..., X_n be random variables. Then

H(X_1, ..., X_n) = H(X_1) + H(X_2|X_1) + H(X_3|X_1, X_2) + ··· + H(X_n|X_1, ..., X_{n−1}).

Proof. For any (x_1, ..., x_n) such that P(x_1, ..., x_n) > 0, it holds that

P(x_1, ..., x_n) = P(x_1)P(x_2|x_1)P(x_3|x_1, x_2) ··· P(x_n|x_1, ..., x_{n−1}),

so that

H(X_1, ..., X_n) = −E log P(X_1, ..., X_n) = −E log P(X_1) − E log P(X_2|X_1) − ··· − E log P(X_n|X_1, ..., X_{n−1})
                 = H(X_1) + H(X_2|X_1) + ··· + H(X_n|X_1, ..., X_{n−1}).

In particular, for any random vector (X, Y),

H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).

Lemma 1.2 (Chain rule for conditional entropy) Let X_1, ..., X_n, Z be random variables. Then

H(X_1, ..., X_n|Z) = H(X_1|Z) + H(X_2|X_1, Z) + H(X_3|X_1, X_2, Z) + ··· + H(X_n|X_1, ..., X_{n−1}, Z).
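The chain rule H(X, Y) = H(X) + H(Y|X) of Lemma 1.1 can be checked on a small dependent joint distribution (the numbers below are our own toy example):

```python
from math import log2

# Joint distribution of (X, Y) as a dict (x, y) -> probability; X and Y are dependent.
pxy = {('a', 0): 0.3, ('a', 1): 0.2, ('b', 0): 0.1, ('b', 1): 0.4}

def H(dist):
    """Entropy of a distribution given as a dict, in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Marginal distribution of X.
px = {}
for (x, y), p in pxy.items():
    px[x] = px.get(x, 0.0) + p

# H(Y|X) = -sum_{x,y} P(x,y) log P(y|x)
HYgX = -sum(p * log2(p / px[x]) for (x, y), p in pxy.items())

print(abs(H(pxy) - (H(px) + HYgX)) < 1e-12)  # chain rule H(X,Y) = H(X) + H(Y|X)
```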

Proof. For every (x_1, ..., x_n, z) such that P(x_1, ..., x_n, z) > 0, it holds that

P(x_1, ..., x_n|z) = P(x_1|z)P(x_2|x_1, z)P(x_3|x_2, x_1, z) ··· P(x_n|x_1, ..., x_{n−1}, z),

so that

−log P(X_1, ..., X_n|Z) = −log P(X_1|Z) − log P(X_2|X_1, Z) − ··· − log P(X_n|X_1, ..., X_{n−1}, Z).

Now take expectations. In particular, for any random vector (X, Y, Z),

H(X, Y|Z) = H(X|Z) + H(Y|X, Z) = H(Y|Z) + H(X|Y, Z).

1.4 Kullback-Leibler distance

1.4.1 Definition

NB! In what follows, 0 log(0/q) := 0 and p log(p/0) := ∞ if p > 0.

Def 1.6 Let P and Q be two distributions on X. The Kullback-Leibler distance (Kullback-Leibler divergence, relative entropy, informational divergence) between the probability distributions P and Q is defined as

D(P‖Q) := ∑_{x ∈ X} P(x) log(P(x)/Q(x)).    (1.7)

When X ~ P, then

D(P‖Q) = E log(P(X)/Q(X)).

When X ~ P and Y ~ Q, then D(X‖Y) := D(P‖Q).

Def 1.7 Let, for any x ∈ X, P(y|x) and Q(y|x) be two (conditional) probability distributions on Y, and let P(x) be a probability distribution on X. The conditional Kullback-Leibler distance is the K-L distance of P(y|x) and Q(y|x) averaged over P:

D(P(y|x)‖Q(y|x)) := ∑_x P(x) ∑_y P(y|x) log(P(y|x)/Q(y|x)) = ∑_x ∑_y P(x, y) log(P(y|x)/Q(y|x)) = E log(P(Y|X)/Q(Y|X)),

where P(x, y) := P(y|x)P(x) and (X, Y) ~ P(x, y).

Remarks:
- Note that log(P(x)/Q(x)) is not always non-negative, so in the case of infinite X we have to show that the sum of the series in (1.7) is defined. Let us do it. Define

X^+ := {x ∈ X : P(x)/Q(x) > 1},  X^− := {x ∈ X : P(x)/Q(x) ≤ 1}.

The series over X^− is absolutely convergent:

∑_{x ∈ X^−} P(x) |log(P(x)/Q(x))| = ∑_{x ∈ X^−} P(x) log(Q(x)/P(x)) ≤ ∑_{x ∈ X^−} P(x) (Q(x)/P(x)) = ∑_{x ∈ X^−} Q(x) ≤ 1.

If ∑_{x ∈ X^+} P(x) log(P(x)/Q(x)) < ∞, then the series (1.7) is convergent; otherwise its sum is ∞.
- As we shall show below, D(P‖Q) ≥ 0 with equality only if P = Q. However, in general D(P‖Q) ≠ D(Q‖P). Hence the K-L distance is not a metric (a true "distance"). Moreover, it does not satisfy the triangle inequality (Exercise 7).
- The K-L distance measures the amount of "average surprise" that a distribution P provides us when we believe that the distribution is Q. If there is an x' ∈ X such that Q(x') = 0 (we believe x' never occurs) but P(x') > 0 (it still happens sometimes), then

P(x') log(P(x')/Q(x')) = ∞,

implying that D(P‖Q) = ∞. This matches the intuition: seeing an impossible event happen is extremely surprising (a miracle). On the other hand, if there is a letter x' ∈ X such that Q(x') > 0 (we believe it might happen) but P(x') = 0 (it actually never happens), then

P(x') log(P(x')/Q(x')) = 0.

This also matches the intuition: we are not largely surprised if something that might happen actually never does. From this point of view the asymmetry of the K-L distance is rather natural.

Example: Let P = B(1, 1/2) and Q = B(1, q). Then

D(P‖Q) = (1/2) log(1/(2q)) + (1/2) log(1/(2(1 − q))) = −(1/2) log(4q(1 − q)),  which is ∞ if q ∈ {0, 1};
D(Q‖P) = q log(2q) + (1 − q) log(2(1 − q)) = 1 − h(q).
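The example above can be computed directly; `D` below is our helper name for the K-L distance in bits, and the run exhibits the asymmetry:

```python
from math import log2

def D(p, q):
    """Kullback-Leibler distance D(P||Q) in bits; infinite if P puts mass where Q does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return float('inf')
            total += pi * log2(pi / qi)
    return total

q = 0.25
P = [0.5, 0.5]       # B(1, 1/2)
Q = [1 - q, q]       # B(1, q)

print(D(P, Q))            # equals -(1/2) log(4 q (1-q))
print(D(Q, P))            # equals 1 - h(q)
print(D(P, Q) == D(Q, P)) # False: the K-L distance is not symmetric
```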

1.4.2 The K-L distance is non-negative: Gibbs' inequality and its consequences

Proposition 1.2 (Gibbs' inequality) D(P‖Q) ≥ 0, with equality iff P = Q.

Proof. When D(P‖Q) = ∞, the inequality trivially holds. Hence consider the situation D(P‖Q) < ∞, i.e. the series (1.7) converges absolutely (when X is infinite). Let X ~ P. Define Y := Q(X)/P(X) and let g(x) := −log x. Note that g is strictly convex. We shall apply Jensen's inequality. Let us first convince ourselves that all expectations exist:

E|g(Y)| = ∑_{x ∈ X_P} |log(Q(x)/P(x))| P(x) = ∑_{x ∈ X_P} |log(P(x)/Q(x))| P(x) < ∞,
E|Y| = ∑_{x ∈ X_P} (Q(x)/P(x)) P(x) = ∑_{x ∈ X_P} Q(x) ≤ 1.

By Jensen's inequality,

D(P‖Q) = E log(P(X)/Q(X)) = E(−log(Q(X)/P(X))) = Eg(Y) ≥ g(EY) ≥ −log 1 = 0,

with D(P‖Q) = 0 if and only if Y = 1 a.s., i.e. Q(x) = P(x) for every x ∈ X_P. This implies Q(x) = P(x) for every x ∈ X.

Corollary 1.1 (log-sum inequality) Let a_1, a_2, ... and b_1, b_2, ... be non-negative numbers such that ∑ a_i < ∞ and 0 < ∑ b_i < ∞. Then

∑_i a_i log(a_i/b_i) ≥ (∑_i a_i) log(∑_i a_i / ∑_i b_i),    (1.8)

with equality iff a_i/b_i = c for all i.

Proof. Let

ā_i = a_i/∑_j a_j,  b̄_i = b_i/∑_j b_j.

Hence (ā_1, ā_2, ...) and (b̄_1, b̄_2, ...) are probability measures, so from Gibbs' inequality it follows that

0 ≤ ∑_i ā_i log(ā_i/b̄_i) = (1/∑_j a_j) [ ∑_i a_i log(a_i/b_i) − (∑_i a_i) log(∑_j a_j/∑_j b_j) ].

Since ∑_i a_i log(∑_j a_j/∑_j b_j) < ∞, the inequality (1.8) follows. We know that D((ā_1, ā_2, ...)‖(b̄_1, b̄_2, ...)) = 0 iff ā_i = b̄_i. This, however, implies that a_i/b_i = ∑_j a_j/∑_j b_j =: c for all i.

Remark: Note that the log-sum inequality and Gibbs' inequality are equivalent.

From Gibbs' (or the log-sum) inequality it also follows that for finite X the distribution with the biggest entropy is the uniform one. Note that if U is the uniform distribution over X, then H(U) = log |X|.

Corollary 1.2 Let |X| < ∞. Then, for any distribution P, it holds that

H(P) ≤ log |X|,

with equality iff P is uniform over X.

Proof. Let U be the uniform distribution over X, i.e. U(x) = 1/|X| for all x ∈ X. Then

D(P‖U) = ∑_{x ∈ X} P(x) log(P(x)/U(x)) = log |X| − H(P) ≥ 0.

The equality holds iff U(x) = P(x) for every x ∈ X, i.e. P = U.

Pinsker's inequality. There are several ways to measure the distance between different probability measures on X. In statistics, a common measure is the so-called l_1 or total variation distance: for any two probability measures P_1 and P_2 on X,

‖P_1 − P_2‖ := ∑_{x ∈ X} |P_1(x) − P_2(x)|.

It is easy to see (Exercise 8) that

‖P_1 − P_2‖ = 2 sup_{B ⊆ X} |P_1(B) − P_2(B)| = 2(P_1(A) − P_2(A)) ≤ 2,    (1.9)

where

A := {x ∈ X : P_1(x) ≥ P_2(x)}.

The convergence in total variation, i.e. ‖P_n − P‖ → 0, implies that for every B ⊆ X, P_n(B) → P(B). In particular, for any x ∈ X, P_n(x) → P(x). On the other hand, it is possible to show (Scheffé's theorem) that the convergence P_n(x) → P(x) for every x implies ‖P_n − P‖ → 0. Thus

‖P_n − P‖ → 0  ⟺  P_n(x) → P(x) ∀x ∈ X.

In what follows, the convergence P_n → P is always meant in total variation. Note that for finite X this is equivalent to convergence in the usual (Euclidean) distance. Pinsker's inequality implies that convergence in K-L distance, i.e. D(P_n‖P) → 0 or D(P‖P_n) → 0, implies P_n → P.

Theorem 1.8 (Pinsker's inequality) For any two probability measures P_1 and P_2 on X, it holds that

D(P_1‖P_2) ≥ (1/(2 ln 2)) ‖P_1 − P_2‖².    (1.10)

The proof of Pinsker's inequality is based on the log-sum inequality.
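Pinsker's inequality (1.10) is easy to test numerically; the two distributions below are arbitrary choices of ours:

```python
from math import log, log2

def D(p, q):
    """K-L distance in bits (assumes q positive on the support of p)."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv(p, q):
    """Total variation distance ||P1 - P2|| = sum_x |P1(x) - P2(x)|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

P1 = [0.1, 0.4, 0.5]
P2 = [0.25, 0.25, 0.5]

# Pinsker: D(P1||P2) >= ||P1 - P2||^2 / (2 ln 2)
print(D(P1, P2) >= tv(P1, P2) ** 2 / (2 * log(2)))  # True
```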

Convexity of the K-L distance. Let P_1, P_2, Q_1, Q_2 be distributions on X and consider the mixtures λP_1 + (1 − λ)P_2 and λQ_1 + (1 − λ)Q_2.

Corollary 1.3

D(λP_1 + (1 − λ)P_2 ‖ λQ_1 + (1 − λ)Q_2) ≤ λ D(P_1‖Q_1) + (1 − λ) D(P_2‖Q_2).    (1.11)

Proof. Fix x ∈ X. By the log-sum inequality,

λP_1(x) log(λP_1(x)/(λQ_1(x))) + (1 − λ)P_2(x) log((1 − λ)P_2(x)/((1 − λ)Q_2(x)))
≥ (λP_1(x) + (1 − λ)P_2(x)) log((λP_1(x) + (1 − λ)P_2(x))/(λQ_1(x) + (1 − λ)Q_2(x))).

Sum over X.

Take Q_1 = Q_2 = Q. Then from (1.11) it follows that the function P ↦ D(P‖Q) is convex. Similarly one gets that Q ↦ D(P‖Q) is convex. When they are finite, both functions are also strictly convex. Indeed:

D(P‖Q) = ∑ P(x) log P(x) − ∑ P(x) log Q(x) = −∑ P(x) log Q(x) − H(P).    (1.12)

The function P ↦ −∑_x P(x) log Q(x) is linear and P ↦ H(P) is strictly concave. The difference is thus strictly convex (when finite). From (1.12) also the strict convexity of Q ↦ D(P‖Q) follows.

Continuity of the K-L distance for finite X. In a finite-dimensional space, a finite convex function is continuous. Hence, if |X| < ∞ and the function P ↦ D(P‖Q) is finite (in an open set), then it is continuous (in that set). The same holds for the function Q ↦ D(P‖Q).

Example: The finiteness is important. Let X = {a, b}, and let for every n the measure P_n be such that P_n(a) = p_n, where p_n > 0 and p_n → 0. Let P(a) = 0. Clearly P_n → P, but for every n,

∞ = D(P_n‖P) ↛ D(P‖P) = 0.

Conditioning increases the K-L distance. Let, for every x ∈ X, P_1(y|x) and P_2(y|x) be conditional probability distributions, and let P(x) be a probability measure on X. Let P_i(y) := ∑_x P_i(y|x)P(x), where i = 1, 2. Then

D(P_1(y|x)‖P_2(y|x)) ≥ D(P_1‖P_2).    (1.13)

The proof of (1.13) is Exercise 16.

1.5 Mutual information

Let (X, Y) be a random vector with distribution P(x, y), (x, y) ∈ X × Y. As usual, let P(x) and P(y) be the marginal distributions, i.e. P(x) is the distribution of X and P(y) is the distribution of Y.

Def 1.9 The mutual information I(X; Y) of X and Y is the K-L distance between the joint distribution P(x, y) and the product distribution P(x)P(y):

I(X; Y) := ∑_{x,y} P(x, y) log(P(x, y)/(P(x)P(y))) = D(P(x, y)‖P(x)P(y)) = E log(P(X, Y)/(P(X)P(Y))).

Hence I(X; Y) is the K-L distance between (X, Y) and a vector (X', Y'), where X' and Y' are distributed as X and Y but, unlike X and Y, the random variables X' and Y' are independent.

Properties:
- I(X; Y) depends on the joint distribution P(x, y).
- 0 ≤ I(X; Y) ≤ ∞.
- Mutual information is symmetric: I(X; Y) = I(Y; X).
- I(X; Y) = 0 iff X, Y are independent.

The following relation is important:

I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).    (1.14)

For the proof, note

I(X; Y) = E log(P(X, Y)/(P(X)P(Y))) = E log(P(X|Y)P(Y)/(P(X)P(Y))) = E log(P(X|Y)/P(X))
        = E log P(X|Y) − E log P(X) = H(X) − H(X|Y).

By symmetry, the roles of X and Y can be exchanged. Hence the mutual information is the reduction of the randomness of X due to the knowledge of Y. When X and Y are independent, then H(X|Y) = H(X) and I(X; Y) = 0. On the other hand, when X = f(Y), then H(X|Y) = 0, so that I(X; Y) = H(X). In particular,

I(X; X) = H(X) − H(X|X) = H(X).

Therefore entropy is sometimes referred to as self-information.

Recall the chain rule: H(X|Y) = H(X, Y) − H(Y). Hence

I(X; Y) = H(X) + H(Y) − H(X, Y).    (1.15)

Conditioning reduces entropy:

H(X|Y) ≤ H(X),  because H(X) − H(X|Y) = I(X; Y) ≥ 0.

Recall H(X|Y) = ∑_y H(X|Y = y)P(y). The fact that this sum is smaller than H(X) does not imply that H(X|Y = y) ≤ H(X) for every y. As the following little counterexample shows, that need not be the case (check!):

Y\X   a      b
u     0      3/4
v     1/8    1/8

For any random vector (X_1, ..., X_n), it holds that

H(X_1, ..., X_n) ≤ ∑_{i=1}^n H(X_i),

with equality iff all components are independent. For the proof, use the chain rule

H(X_1, ..., X_n) = H(X_1) + H(X_2|X_1) + H(X_3|X_1, X_2) + ··· + H(X_n|X_1, ..., X_{n−1})

and apply the fact that conditioning reduces entropy.

Conditional mutual information. Let X, Y, Z be random variables, and let Z be the support of Z.

Def 1.10 The conditional mutual information of X, Y given Z is

I(X; Y|Z) := H(X|Z) − H(X|Y, Z) = E log(P(X|Y, Z)/P(X|Z)) = E log(P(X|Y, Z)P(Y|Z)/(P(X|Z)P(Y|Z))) = E log(P(X, Y|Z)/(P(X|Z)P(Y|Z)))
= ∑_{x,y,z} P(x, y, z) log(P(x, y|z)/(P(x|z)P(y|z))) = ∑_z P(z) ∑_{x,y} P(x, y|z) log(P(x, y|z)/(P(x|z)P(y|z)))
= ∑_z D(P(x, y|z)‖P(x|z)P(y|z)) P(z).
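The little counterexample can be checked in a few lines: on average conditioning reduces entropy, yet observing Y = v increases the uncertainty about X. The probabilities are the ones from the table; the helper names are ours:

```python
from math import log2

# Joint distribution from the counterexample: Y in {u, v}, X in {a, b}.
pxy = {('u', 'a'): 0.0, ('u', 'b'): 3/4, ('v', 'a'): 1/8, ('v', 'b'): 1/8}

def H(p):
    """Entropy of a probability vector, in bits."""
    return -sum(x * log2(x) for x in p if x > 0)

px = {'a': 1/8, 'b': 7/8}   # marginal of X
py = {'u': 3/4, 'v': 1/4}   # marginal of Y

HX = H(px.values())
HX_u = H([pxy[('u', x)] / py['u'] for x in 'ab'])  # H(X | Y = u) = 0
HX_v = H([pxy[('v', x)] / py['v'] for x in 'ab'])  # H(X | Y = v) = 1
HXgY = py['u'] * HX_u + py['v'] * HX_v

print(HXgY <= HX)  # True: on average, conditioning reduces entropy
print(HX_v > HX)   # True: yet a single observation can increase the uncertainty
```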

Properties:
- I(X; Y|Z) ≥ 0, with equality iff X and Y are conditionally independent:

P(x, y|z) = P(x|z)P(y|z),  ∀x ∈ X, y ∈ Y, z ∈ Z.    (1.16)

For the proof, note that I(X; Y|Z) = 0 iff for every z ∈ Z it holds that

D(P(x, y|z)‖P(x|z)P(y|z)) = 0.

This means conditional independence.
- The proof of the following equalities is Exercise 18:

I(X; X|Z) = H(X|Z),
I(X; Y|Z) = H(Y|Z) − H(Y|X, Z),
I(X; Y|Z) = H(X|Z) + H(Y|Z) − H(X, Y|Z).

In addition, the following equality holds:

I(X; Y|Z) = H(X, Z) + H(Y, Z) − H(X, Y, Z) − H(Z).    (1.17)

Chain rule for mutual information:

I(X_1, ..., X_n; Y) = I(X_1; Y) + I(X_2; Y|X_1) + I(X_3; Y|X_1, X_2) + ··· + I(X_n; Y|X_1, ..., X_{n−1}).

For the proof, use the chain rules for entropy and conditional entropy:

I(X_1, ..., X_n; Y) = H(X_1, ..., X_n) − H(X_1, ..., X_n|Y)
= H(X_1) + H(X_2|X_1) + ··· + H(X_n|X_1, ..., X_{n−1}) − H(X_1|Y) − H(X_2|X_1, Y) − ··· − H(X_n|X_1, ..., X_{n−1}, Y).

Chain rule for conditional mutual information:

I(X_1, ..., X_n; Y|Z) = I(X_1; Y|Z) + I(X_2; Y|X_1, Z) + ··· + I(X_n; Y|X_1, ..., X_{n−1}, Z).

The proof is similar.

1.6 Fano's inequality

Let X be an (unknown) random variable and X̂ a related random variable, an estimate of X. Let P_e := P(X ≠ X̂) be the probability of the mistake made by the estimation. If P_e = 0, then X = X̂ a.s., so that H(X|X̂) = 0. Therefore it is natural to expect that when P_e is small, then H(X|X̂) should also be small. Fano's inequality quantifies that idea.

Theorem 1.11 (Fano's inequality) Let X and X̂ be random variables on X. Then

H(X|X̂) ≤ h(P_e) + P_e log(|X| − 1),    (1.18)

where h is the binary entropy function.

Proof. Let

E = 1 if X̂ ≠ X,  0 if X̂ = X.

Hence E = I_{X̂ ≠ X} and E ~ B(1, P_e). The chain rule for entropy gives

H(E, X|X̂) = H(X|X̂) + H(E|X, X̂) = H(X|X̂),    (1.19)

because H(E|X, X̂) = 0 (why?). On the other hand,

H(E, X|X̂) = H(E|X̂) + H(X|E, X̂) ≤ H(E) + H(X|E, X̂) = h(P_e) + H(X|E, X̂).

Note

H(X|E, X̂) = ∑_{x ∈ X} P(X̂ = x, E = 1)H(X|X̂ = x, E = 1) + ∑_{x ∈ X} P(X̂ = x, E = 0)H(X|X̂ = x, E = 0).

Given X̂ = x and E = 0, we have X = x and then H(X|X̂ = x, E = 0) = 0, so

H(X|E, X̂) = ∑_{x ∈ X} P(X̂ = x, E = 1)H(X|X̂ = x, E = 1).

If E = 1 and X̂ = x, then X ∈ X\{x}, so that H(X|X̂ = x, E = 1) ≤ log(|X| − 1). To summarize: H(X|E, X̂) ≤ P_e log(|X| − 1). From (1.19) we obtain

H(X|X̂) ≤ P_e log(|X| − 1) + h(P_e).
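Fano's inequality can be verified on any joint distribution of (X̂, X); the numbers below are our own toy example on a 3-letter alphabet:

```python
from math import log2

def h(p):
    """Binary entropy function."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Joint distribution of (Xhat, X) on a 3-letter alphabet (arbitrary numbers).
alphabet = ['a', 'b', 'c']
joint = {('a', 'a'): 0.30, ('a', 'b'): 0.10, ('a', 'c'): 0.10,
         ('b', 'a'): 0.05, ('b', 'b'): 0.20, ('b', 'c'): 0.05,
         ('c', 'a'): 0.05, ('c', 'b'): 0.05, ('c', 'c'): 0.10}

Pe = sum(p for (xh, x), p in joint.items() if xh != x)               # P(X != Xhat)
pxh = {a: sum(joint[(a, x)] for x in alphabet) for a in alphabet}    # marginal of Xhat
# H(X | Xhat) = -sum p(xhat, x) log p(x | xhat)
HXgXhat = -sum(p * log2(p / pxh[xh]) for (xh, x), p in joint.items() if p > 0)

print(HXgXhat <= h(Pe) + Pe * log2(len(alphabet) - 1))  # True: Fano's inequality (1.18)
```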

Corollary 1.4

H(X|X̂) ≤ 1 + P_e log |X|,  i.e.  P_e ≥ (H(X|X̂) − 1)/log |X|.

If |X| < ∞, then Fano's inequality implies: if P_e → 0, then H(X|X̂) → 0. When |X| = ∞, Fano's inequality is trivial and such an implication might not exist.

Example: Let Z ~ B(1, p) and let Y be a random variable such that Y > 0 and H(Y) = ∞. Define X as follows:

X = 0 if Z = 0,  Y if Z = 1.

Let X̂ = 0 a.s. Then P_e = P(X > 0) = P(X = Y) = P(Z = 1) = p. But

H(X|X̂) = H(X) ≥ H(X|Z) = pH(Y) = ∞.

Then for every p > 0, clearly H(X|X̂) = ∞, and therefore H(X|X̂) ↛ 0 when P_e → 0.

When is Fano's inequality an equality? Inspecting the proof reveals that equality holds iff for every x ∈ X,

H(X|X̂ = x, E = 1) = log(|X| − 1)    (1.20)

and

H(E|X̂) = H(E).    (1.21)

The equality (1.20) means that the conditional distribution of X given X̂ = x, X ≠ x is uniform over all of the remaining alphabet X\{x}. That, in turn, means that to every x_i ∈ X there corresponds a p_i such that P(X̂ = x_i, X = x_j) = p_i for every j ≠ i. In other words, the joint distribution of (X̂, X),

X̂\X   x_1                     x_2                     ...   x_n
x_1    P(X̂ = x_1, X = x_1)    P(X̂ = x_1, X = x_2)    ...   P(X̂ = x_1, X = x_n)
x_2    P(X̂ = x_2, X = x_1)    P(X̂ = x_2, X = x_2)    ...   P(X̂ = x_2, X = x_n)
...
x_n    P(X̂ = x_n, X = x_1)    ...                     ...   P(X̂ = x_n, X = x_n)

is such that in every row, all elements outside the main diagonal are equal (to a constant depending on the row). The relation (1.21) means that for every x ∈ X it holds that P(X ≠ x|X̂ = x) = P_e (in every row, the sum of the probabilities outside the main diagonal divided by the sum of the whole row equals P_e). A joint distribution satisfying both requirements (1.20) and (1.21) is, for example,

X̂\X   a       b       c
a      3/10    1/10    1/10
b      1/25    3/25    1/25
c      3/50    3/50    9/50

With this distribution, P_e = 2/5 and log(|X| − 1) = log 2 = 1, so that

P_e log(|X| − 1) + h(P_e) = 2/5 + (2/5) log(5/2) + (3/5) log(5/3) = log 5 − (3/5) log 3.

On the other hand,

H(X|X̂ = a) = H(X|X̂ = b) = H(X|X̂ = c) = H(3/5, 1/5, 1/5) = log 5 − (3/5) log 3,

implying that H(X|X̂) = log 5 − (3/5) log 3. Therefore Fano's inequality is an equality.

1.7 Data processing inequality

1.7.1 Finite Markov chains

Def 1.12 The random variables X_1, ..., X_n with supports X_1, ..., X_n form a Markov chain if for every x_i ∈ X_i and m = 2, ..., n − 1,

P(X_{m+1} = x_{m+1}|X_m = x_m, ..., X_1 = x_1) = P(X_{m+1} = x_{m+1}|X_m = x_m).    (1.22)

Thus X_1, ..., X_n is a Markov chain iff for every x_1, ..., x_n such that x_i ∈ X_i,

P(x_1, ..., x_n) = P(x_1, x_2)P(x_3|x_2) ··· P(x_n|x_{n−1}).

The fact that X_1, ..., X_n form a Markov chain is in information theory denoted by X_1 → X_2 → ··· → X_n. Thus X → Y → Z iff P(x, y, z) = P(x)P(y|x)P(z|y). We shall now list (without proofs) some elementary properties of Markov chains.

Properties:
- If X_1 → X_2 → ··· → X_n, then X_n → X_{n−1} → ··· → X_1 (a reversed MC is also an MC).
- Every sub-chain of a Markov chain is a Markov chain: if X_1 → X_2 → ··· → X_n, then X_{i_1} → X_{i_2} → ··· → X_{i_k} for any i_1 < i_2 < ··· < i_k.
- If X_1 → X_2 → ··· → X_n, then for every m < n and x_i ∈ X_i,

P(x_{m+1}, ..., x_n|x_m, ..., x_1) = P(x_{m+1}, ..., x_n|x_m).    (1.23)

- X_1 → ··· → X_n iff for every m = 2, ..., n − 1, the random variables X_1, ..., X_{m−1} and X_{m+1}, ..., X_n are conditionally independent given X_m: for every x_m ∈ X_m,

P(x_1, ..., x_{m−1}, x_{m+1}, ..., x_n|x_m) = P(x_1, ..., x_{m−1}|x_m)P(x_{m+1}, ..., x_n|x_m).    (1.24)

1.7.2 Data processing inequality

Lemma 1.3 (Data processing inequality) When X → Y → Z, then

I(X; Y) ≥ I(X; Z),

with equality iff X → Z → Y.

Proof. From X → Y → Z it follows that X and Z are conditionally independent given Y. This implies I(X; Z|Y) = 0, and from the chain rule for mutual information it follows that

I(X; Y, Z) = I(X; Z) + I(X; Y|Z) = I(X; Y) + I(X; Z|Y) = I(X; Y).    (1.25)

Since I(X; Y|Z) ≥ 0, we obtain I(X; Z) ≤ I(X; Y), and the equality holds iff I(X; Y|Z) = 0, i.e. the random variables X and Y are conditionally independent given Z. That means X → Z → Y.

Let X be an unknown random variable we are interested in. Instead of X, we know Y (the data), giving us I(X; Y) bits of information. Would it be possible to process the data so that the amount of information about X increases? The data can be processed deterministically by applying a deterministic function g, obtaining g(Y). Hence we have the Markov chain X → Y → g(Y), and from the data processing inequality I(X; Y) ≥ I(X; g(Y)) it follows that g(Y) does not give more information about X than Y does. Another possibility is to process Y by applying additional randomness independent of X. Since this additional randomness is independent of X, X → Y → Z is still a Markov chain, and by the data processing inequality I(X; Y) ≥ I(X; Z). Hence the data processing inequality postulates a well-known fact: it is not possible to increase information by processing the data.

Corollary 1.5 When X → Y → Z, then

H(X|Z) ≥ H(X|Y).

Proof. Exercise 23.

Corollary 1.6 When X → Y → Z, then

I(X; Z) ≤ I(Y; Z),  I(X; Y|Z) ≤ I(X; Y).

Proof. Exercise 24.

1.7.3 Sufficient statistics

Let {P_θ} be a family of probability distributions, a model. Let X^n be a random sample from the distribution P_θ. Recall that an n-element random sample can always be considered as a random variable taking values in X^n. Clearly the sample depends on the chosen distribution P_θ or, equivalently, on its index (parameter) θ. Let T(X^n) be any statistic (function of the sample) giving an estimate of the unknown parameter θ. Let us consider the Bayesian approach, where θ is a random variable with (prior) distribution π. Then θ → X^n → T(X^n) is a Markov chain, and from the data processing inequality,

I(θ; T(X^n)) ≤ I(θ; X^n).

When the inequality above is an equality, then T(X^n) gives as much information about θ as X^n, and we know that the equality implies θ → T(X^n) → X^n. By the definition of a Markov chain, then, for every sample x^n ∈ X^n,

P(X^n = x^n|T(X^n) = t, θ) = P(X^n = x^n|T(X^n) = t),

i.e. given the value of T(X^n), the distribution of the sample is independent of θ. In statistics, a statistic T(X^n) having such a property is called sufficient.

Corollary 1.7 A statistic T is sufficient iff for every distribution π of θ the following equality holds true:

I(θ; T(X^n)) = I(θ; X^n).

Example: Let {P_θ} be the family of all Bernoulli distributions. The statistic T(X^n) = ∑_{i=1}^n X_i is sufficient, because

P(X_1 = x_1, ..., X_n = x_n|T(X^n) = t, θ) = 0 if ∑_i x_i ≠ t,  1/(n choose t) if ∑_i x_i = t.

Indeed: if ∑_i x_i = t, then

P(X_1 = x_1, ..., X_n = x_n|T(X^n) = t, θ) = P(X_1 = x_1, ..., X_n = x_n, T(X^n) = t, θ)/P(T(X^n) = t, θ)
= θ^t(1 − θ)^{n−t}π(θ) / ∑_{x_1,...,x_n : ∑_i x_i = t} θ^t(1 − θ)^{n−t}π(θ) = 1/(n choose t),

because given the sum t (the number of ones) there are exactly (n choose t) possibilities for different samples.
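The sufficiency of T = ∑ X_i in the Bernoulli example can be confirmed by brute-force enumeration; the sample size and the value of θ below are arbitrary choices of ours:

```python
from itertools import product
from math import comb

# Given T = t, all samples with sum t should be equally likely, whatever theta is.
n, theta = 4, 0.3

def p_sample(x):
    """P(X_1 = x_1, ..., X_n = x_n) for an i.i.d. Bernoulli(theta) sample."""
    return theta ** sum(x) * (1 - theta) ** (n - sum(x))

t = 2
samples_t = [x for x in product([0, 1], repeat=n) if sum(x) == t]
pt = sum(p_sample(x) for x in samples_t)           # P(T = t)
cond = [p_sample(x) / pt for x in samples_t]       # P(X^n = x | T = t)

# Uniform over the (n choose t) samples, free of theta.
print(all(abs(c - 1 / comb(n, t)) < 1e-12 for c in cond))
```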

1.8 Entropy rate of a stochastic process

Let us consider a stochastic process {X_n}_{n=1}^∞.

Def 1.13 The entropy rate of a stochastic process {X_n}_{n=1}^∞ is

H_X := lim_n (1/n) H(X_1, ..., X_n),

provided the limit exists.

Examples:
- Let {X_n}_{n=1}^∞ be i.i.d. random variables with distribution P, i.e. X_i ~ P. Then

lim_n (1/n) H(X_1, ..., X_n) = lim_n (1/n) ∑_{i=1}^n H(X_i) = lim_n (n/n) H(P) = H(P).

Thus, in the i.i.d. case the entropy rate of the process equals the entropy of X_1.
- Let {X_n}_{n=1}^∞ be independent random variables. Then

H(X_1, ..., X_n) = ∑_{i=1}^n H(X_i).

The limit of (1/n) ∑_{i=1}^n H(X_i) need not always exist, so the entropy rate is not always defined for that process.
- Let X_1, X_2, ... be i.i.d. random variables, X_i ~ P, with X = Z. Consider the random walk {S_n}_{n=0}^∞ such that S_0 = 0, S_1 = X_1, S_2 = X_1 + X_2, ..., S_n = X_1 + ··· + X_n. The entropy rate of the random walk is H_S = H(P). The proof of that is Exercise 32.

The limit H'_X. Consider the limit (when it exists)

H'_X := lim_n H(X_n|X_1, ..., X_{n−1}).

We shall now show that for a large class of stochastic processes, called stationary processes, the limit H'_X always exists.

Def 1.14 A stochastic process {X_n}_{n=1}^∞ is stationary if for every n and every k the random vectors

(X_1, ..., X_n) and (X_{k+1}, ..., X_{k+n})

have the same distribution.

Hence, when {X_n}_{n=1}^∞ is stationary, all random variables X_1, X_2, ... have the same distribution, all two-dimensional random vectors (X_1, X_2), (X_2, X_3), ... have the same distribution, the vectors (X_1, X_2, X_3), (X_2, X_3, X_4), ... have the same distribution, etc.

Proposition 1.3 When {X_n}_{n=1}^∞ is stationary, the limit H'_X always exists.

Proof. Since {X_n}_{n=1}^∞ is stationary, for every n the random vectors (X_1, ..., X_n) and (X_2, ..., X_{n+1}) have the same distribution. Hence, for every n,

H(X_n|X_1, ..., X_{n−1}) = H(X_{n+1}|X_2, ..., X_n).

Therefore

H(X_{n+1}|X_1, ..., X_n) ≤ H(X_{n+1}|X_2, ..., X_n) = H(X_n|X_1, ..., X_{n−1}),

so that the sequence {H(X_n|X_1, ..., X_{n−1})} is non-negative and non-increasing. Such a sequence always has a limit.

Next we show that for a stationary process the entropy rate is always defined and equals H'_X. We need Cesàro's lemma.

Lemma 1.4 (Cesàro) Let {a_n} be non-negative real numbers with a_1 > 0 and ∑_n a_n = ∞. Denote b_n := ∑_{i=1}^n a_i. Let x_n → x be an arbitrary convergent sequence. Then

(1/b_n) ∑_{i=1}^n a_i x_i → x, when n → ∞.

In the special case a_n ≡ 1 we obtain (x_1 + ··· + x_n)/n → x.

Theorem 1.15 When {X_n}_{n=1}^∞ is a stationary process, then H_X always exists and H_X = H'_X.

Proof. From the chain rule for entropy,

(1/n) H(X_1, ..., X_n) = (1/n) ∑_{k=1}^n H(X_k|X_1, ..., X_{k−1}).

Use H(X_k|X_1, ..., X_{k−1}) → H'_X together with Cesàro's lemma to obtain

lim_n (1/n) H(X_1, ..., X_n) = lim_n (1/n) ∑_{k=1}^n H(X_k|X_1, ..., X_{k−1}) = H'_X.

Hence every stationary process has an entropy rate, and it equals H'_X. It might be 0 even if X_n is still random (can you find an example of such a process?). On the other hand, a non-stationary process might also have an entropy rate (which of the examples above was non-stationary?).

1.8.1 Entropy rate of a Markov chain

Determining the entropy rate of a stochastic process is, in general, not an easy task. In this subsection we find the entropy rate of a stationary Markov chain. Let {X_n}_{n=1}^∞ be a random process where all random variables X_i take values in a discrete alphabet X.

Def 1.16 The random process {X_n}_{n=1}^∞ is a Markov chain if for every m and x_1, ..., x_{m+1} ∈ X such that P(X_m = x_m, ..., X_1 = x_1) > 0, (1.22) holds, i.e.

P(X_{m+1} = x_{m+1}|X_m = x_m, ..., X_1 = x_1) = P(X_{m+1} = x_{m+1}|X_m = x_m).    (1.26)

In the terminology of Markov chains, the elements of X are called states, and the chain is called time-homogeneous if the right-hand side of the equality (1.26) is independent of m. In this case, for every m and x_i, x_j ∈ X,

P(X_{m+1} = x_j|X_m = x_i) = P(X_2 = x_j|X_1 = x_i) =: P_ij.

The matrix P = (P_ij) is the transition matrix of the time-homogeneous MC {X_n}. Let π(i) = π(x_i) be the initial distribution, i.e. the distribution of X_1. Then

P(X_2 = x_j) = ∑_{x_i ∈ X} P(X_2 = x_j|X_1 = x_i)P(X_1 = x_i) = ∑_i P_ij π(i),

so that the distribution of X_2 is π^T P. Similarly, the distribution of X_k is π^T P^{k−1}. Now it is not hard to see that the distribution of any finite vector (X_k, ..., X_{k+l}) is fully determined by the transition matrix P and the initial distribution π. The Markov chain {X_n} is stationary iff π is such that π^T P = π^T, i.e.

π(j) = ∑_i π(i)P_ij  ∀j.

Such an initial distribution (when it exists) is called a stationary initial distribution. Whether it exists and is unique depends on the transition matrix P.

Example: Let |X| = 2 and let the transition matrix be

( 1−α    α  )
(  β    1−β ).

The unique stationary initial distribution corresponding to that transition matrix is

( β/(α+β), α/(α+β) ).

Theorem 1.17 Let {X_n} be a stationary time-homogeneous Markov chain with transition matrix (P_ij) and (stationary) initial distribution π. Then

H_X = H(X_2|X_1) = −∑_i π(i) ∑_j P_ij log P_ij.
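Theorem 1.17 can be illustrated on the two-state chain from the example; the values of α and β below are arbitrary:

```python
from math import log2

# Two-state chain with transition matrix [[1-a, a], [b, 1-b]].
a, b = 0.2, 0.5
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]   # stationary initial distribution

# Entropy rate H_X = H(X_2 | X_1) = -sum_i pi(i) sum_j P_ij log P_ij
H_rate = -sum(pi[i] * sum(P[i][j] * log2(P[i][j]) for j in range(2) if P[i][j] > 0)
              for i in range(2))

def h(p):
    """Binary entropy function."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

# For a two-state chain the rate is a mixture of binary entropies of the rows.
print(abs(H_rate - (pi[0] * h(a) + pi[1] * h(b))) < 1e-12)
```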

Proof. From (1.26), we obtain that for every n,

H(X_n | X_1, ..., X_{n-1}) = H(X_n | X_{n-1}).

Since the chain is stationary, we get H(X_n | X_{n-1}) = H(X_2 | X_1), and by Theorem 1.5,

H_X = H̄_X = lim_n H(X_n | X_1, ..., X_{n-1}) = lim_n H(X_n | X_{n-1}) = H(X_2 | X_1).

The equation

H(X_2 | X_1) = -Σ_i π(i) Σ_j P_ij log P_ij

is Exercise 31.

1.9 Exercises

1. Let us toss a coin until the first head. Let X be the number of tosses needed. Find H(X), if the probability of head is p.

2. Prove the grouping property

H(p_1, p_2, p_3, ...) = H(p_1 + p_2, p_3, ...) + (p_1 + p_2) H( p_1/(p_1 + p_2), p_2/(p_1 + p_2) )

and deduce (1.2).

3. Let g : X → X be a function. Prove that

H(g(X)) ≤ H(X),  H(g(X) | Y) ≤ H(X | Y).

4. Find P such that H(P) = ∞.

5. Let X_1 and X_2 be random variables with disjoint supports. Let X have the mixture distribution, i.e.

X = X_1 if Z = 1,  X = X_2 if Z = 0,

where Z ~ B(1, p). Find H(X). Show that

2^{H(X)} ≤ 2^{H(X_1)} + 2^{H(X_2)}.

6. Let X ~ P. Show that

P( P(X) ≤ d ) log(1/d) ≤ H(X).

7. Find distributions P, Q and R such that D(P||Q) > D(P||R) + D(R||Q).

8. Prove (1.9).

9. Let

P = (p_1, p_2, ..., p_m, 0, 0, ...)

and for every ε > 0,

P_ε = ( (1-ε)p_1, ..., (1-ε)p_m, ε/M, ..., ε/M, 0, ... ),  (1.27)

where the mass ε is spread over M atoms and M = 2^{c/ε}, c > 0. Show that, as ε → 0,

H(P_ε) = (1-ε)H(P) + ε log_2 M + h(ε) → H(P) + c.

10. Let X be infinite. Define

P_n = ( 1 - α/log n, α/(n log n), ..., α/(n log n), 0, ... ),

where the mass α/log n is spread over n atoms and α > 0. Show that P_n → P, where P = (1, 0, ...), but H(P_n) → α. Let Q = (q_1, q_2, q_3, ...), where q_i = (1-q)q^{i-1}. Show that D(P||Q) < ∞, but D(P_n||Q) → ∞.

11. Let X = (X_1, ..., X_n) be a random vector, where X_i has a Bernoulli distribution for every i. The random variables X_i are neither independent nor identically distributed. Let R = (R_1, ..., R_k) be the run lengths of X. For example, if X = (1, 0, 0, 0, 1, 1, 0), then R = (1, 3, 2, 1). Show that

0 ≤ H(X) - H(R) ≤ min_i H(X_i).

12. Let X, Y be random variables, and let Z = X + Y. Show that H(Z | X) = H(Y | X). Show that when X and Y are independent, then H(X) ≤ H(Z) and H(Y) ≤ H(Z). Find X and Y such that H(X) > H(Z) and H(Y) > H(Z). When is H(Z) = H(X) + H(Y)?

13. Let ρ(X, Y) = H(X | Y) + H(Y | X). Show that ρ is a semi-metric. When is ρ(X, Y) = 0? Show that

ρ(X, Y) = H(X) + H(Y) - 2I(X; Y) = H(X, Y) - I(X; Y) = 2H(X, Y) - H(X) - H(Y).

14. Prove that for every n ≥ 2,

H(X_1, ..., X_n) ≥ Σ_{i=1}^n H(X_i | X_j, j ≠ i).

Show that

(1/2)[ H(X_1, X_2) + H(X_3, X_2) + H(X_1, X_3) ] ≥ H(X_1, X_2, X_3).

15. Let X, Y, Z be random variables, with Y and Z being independent. Show that

D(X||Y | Z) = H(X) - H(X | Z) + D(X||Y) = I(X; Z) + D(X||Y).

16. Using the log-sum inequality, prove (1.3).

17. (a) Let X_1 and X_2 have the same distribution. Let

ρ(X_1, X_2) := H(X_2 | X_1) / H(X_1). (1.28)

Prove that ρ is symmetric and ρ ∈ [0, 1]. When is ρ = 0? When is ρ = 1?

(b) Let (X, Y) have the following joint distribution, where ε ∈ (0, 1/4]:

Y \ X |    0    |    1
  0   | 1/2 - ε |    ε
  1   |    ε    | 1/2 - ε

Find I(X; Y) and ρ (as in (1.28)). Find cov(X, Y) and the correlation coefficient of X and Y. Note that when ε → 0, the limit of the correlation coefficient is 1.

(c) Let (X, Y) have the following joint distribution:

Y \ X |  0  |  1
  0   | 1/3 | 1/3
  1   |  0  | 1/3

Find I(X; Y) and ρ (as in (1.28)). Find cov(X, Y) and the correlation coefficient of X and Y.

18. Prove

I(X; X | Z) = H(X | Z)
I(X; Y | Z) = H(Y | Z) - H(Y | X, Z)
I(X; Y | Z) = H(X | Z) + H(Y | Z) - H(X, Y | Z)
I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z).

19. Prove

H(X, Y | Z) ≥ H(X | Z)
I(X, Y; Z) ≥ I(X; Z)
H(X, Y, Z) - H(X, Y) ≤ H(X, Z) - H(X).

When are the inequalities equalities?

20. Find X, Y, Z such that

I(X; Y | Z) > I(X; Y) = 0,
0 = I(X; Y | Z) < I(X; Y).

21. Prove that

I(X; Y | Z) = I(Y; Z | X) - I(Y; Z) + I(X; Y)

and that

H(X | g(Y)) ≥ H(X | Y).

Find (X, Y) such that X and Y are dependent, g is not one-to-one, but the inequality is an equality.

22. Let X = (X_1, ..., X_n) be a random vector with binary (0- or 1-valued) components having the following distribution:

P(x_1, ..., x_n) = 2^{-(n-1)}, when Σ_i x_i is even; 0, when Σ_i x_i is odd.

Find the distribution of X_i. Find the distribution of (X_i, X_{i+1}). Find I(X_1; X_2), I(X_2; X_3 | X_1), I(X_4; X_3 | X_1, X_2), ..., I(X_{n-1}; X_n | X_1, X_2, ..., X_{n-2}).

23. Prove that if X → Y → Z, then H(X | Z) ≥ H(X | Y), I(X; Z) ≤ I(Y; Z) and I(X; Y | Z) ≤ I(X; Y).

24. Let {P_θ} be a set of Bernoulli distributions, θ ∈ Θ, where Θ is a discrete set and π is a prior distribution of θ. Let X = (X_1, ..., X_n) be a random sample and T(X) = Σ_{i=1}^n X_i. Find H(θ | T(X)) and H(θ | X). Show that the data processing inequality is an equality.

25. Let X_1 → X_2 → X_3 → X_4. Prove I(X_1; X_4) ≤ I(X_2; X_3).

26. Let X_1 → X_2 → ... → X_n. Find I(X_1; X_2, X_3, ..., X_n).

27. Let X_1 → X_2 → X_3 be a Markov chain, where |X_1| = n, |X_2| = k, |X_3| = m, k < n and k < m. Show that the "bottleneck" decreases the mutual information between X_1 and X_3, i.e. I(X_1; X_3) ≤ log k. Show that when k = 1, then X_1 and X_3 are independent.

28. Let |X| = m and let X be a random variable taking values in X. Find a nonrandom estimate X̂ of X with the smallest error probability. Let P_e = P(X ≠ X̂). Find X such that Fano's inequality is an equality:

H(X) = P_e log(|X| - 1) + h(P_e).

29. Let P be a probability distribution with support X_P = {1, 2, ...}. Let μ be the mean of P. Prove that

H(P) ≤ μ log μ - (μ - 1) log(μ - 1),

with equality iff P has the geometric distribution. Hence, amongst such distributions, the geometric distribution has the biggest entropy.

30. Let {X_n}_{n=1}^∞ be a stationary random process. Prove

(1/n) H(X_1, ..., X_n) ≤ (1/(n-1)) H(X_1, ..., X_{n-1}),  (1/n) H(X_1, ..., X_n) ≥ H(X_n | X_1, ..., X_{n-1}).

31. Prove that for a stationary MC,

H(X_2 | X_1) = -Σ_i π(i) Σ_j P_ij log P_ij.

32. Let X_1, X_2, ... be i.i.d. random variables, X_i ~ P. Consider the random walk {S_n}_{n=0}^∞ s.t. S_0 = 0, S_1 = X_1, S_2 = X_1 + X_2, ..., S_n = X_1 + ... + X_n. Prove that the entropy rate of the random walk is H_S = H(P).

33. A dog walks on the integers: at time 0 it is at position 0. Then it starts to move, with probability 0.5 to the left and with the same probability to the right. Then it continues moving in the same direction, possibly reversing direction with probability 0.1. A typical walk might look like

(X_0, X_1, ...) = (0, 1, 2, 3, 4, 3, 2, 1, 0, 1, 2, 3, ...).

Find H_X.

34. Consider a random walk on the ring (0, 1, ..., l), i.e. l is followed by 0. Let

S_n = Σ_{i=1}^n X_i,

where X_1 has the uniform distribution on (0, 1, ..., l) and X_2, X_3, ... are i.i.d. random variables with P(X_2 = 1) = P(X_2 = 2) = 0.5. Find H_S.

2 Zero-error data compression

2.1 Codes

In this section, we suppose that besides our original alphabet X, we have another finite coding alphabet D. In what follows, |D| =: D, so that the alphabet D will be referred to as the D-ary alphabet, and without loss of generality we take D = {0, ..., D-1}. In the case D = 2, we thus speak about the binary alphabet {0, 1}, etc. The alphabet D is used in data transmission. Typically D < |X|, hence to transmit a letter x, it should be represented as a finite string of letters from D - a codeword. In what follows, let D* be the set of all finite-length strings (codewords) from D. Formally, thus

D* := ∪_{n=1}^∞ D^n,  X* := ∪_{n=1}^∞ X^n.

Def 2.1 A code is a mapping C : X → D*.

There are different codes. A classical example of a code is the Morse alphabet, where D consists of three elements: a dot, a dash and a letter space. (Actually, there is also a word space, but when coding letters only, it is not needed.) In Morse code, short codewords represent frequent letters (in English) and long sequences represent infrequent letters. This makes Morse code reasonably efficient but, as we shall see, it is not the most efficient (optimal) code. One can see this immediately by noticing that one of the three code-letters, the space, is used at the end of a word only.

Def 2.2 A code C is non-singular if it is injective, i.e. every element of X is mapped into a different codeword: if x_i ≠ x_j, then C(x_i) ≠ C(x_j).

Non-singularity is sufficient to decode single letters uniquely, but typically one needs to decode words, i.e. sequences of codewords. Then a stronger property is needed.

Def 2.3 The extension of a code C is the mapping C* from X* into D* defined as follows:

C* : X* → D*,  C*(x_1 ... x_n) := C(x_1) ... C(x_n).

Hence the extension of a code C is the concatenation of the codewords of letters to obtain a codeword for a word.

Def 2.4 A code C is uniquely decodable if its extension is non-singular.

Hence, if C is uniquely decodable, then to every codeword C(x_1) ... C(x_n) corresponds only one original word (source string) x_1 ... x_n. However, one may have to look at the entire string to determine even the first symbol in the corresponding source string.
It is natural to expect that the first letter x_1 can be decoded as soon as C(x_1) has been observed, i.e. that decoding can be performed "on-line". This means that C(x_1) cannot be the beginning (prefix) of any other codeword.

Def 2.5 A code C is a prefix code (prefix-free code, instantaneous code) if no codeword is a prefix of any other codeword, i.e. there are no different letters x_i and x_j such that C(x_i) is a prefix of C(x_j).

Clearly prefix codes are uniquely decodable, and uniquely decodable codes are non-singular.

Examples: Morse code is a prefix code, since every codeword ends with a space. Let X = {a, b, c, d} and consider the binary codes C_1, C_2, C_3 and C_4, represented in the table.

X | C_1 | C_2 | C_3 | C_4
a |  0  |  0  | 10  | 0
b |  0  | 010 | 00  | 10
c |  0  | 01  | 11  | 110
d |  0  | 10  | 110 | 111

Code C_1 is not non-singular; C_2 is non-singular but not uniquely decodable, since 010 could stand for the letter b as well as for the words ad and ca. Code C_3 is uniquely decodable but not a prefix code. Indeed, to figure out whether 110...0 is a codeword of cbb...b or dbb...b, one has to count all the 0's. Thus, one cannot decode the first letter before the whole codeword is read. This is so because the codeword C(c) = 11 is a prefix of the codeword C(d) = 110. Code C_4 is a prefix code, hence every letter can be decoded as soon as its codeword has been observed. Exercise: decode "on-line" a word coded with C_4.

2.2 Kraft inequality

Prefix code as a tree. Every prefix code can be represented as a D-ary tree, where every node has at most D children. To every branch of the tree corresponds a letter from D, to every leaf corresponds a letter from X, and the path from the root to the letter is the codeword of that letter (leaf). The length of that codeword is the length (or level) of that leaf.

Example: Let D = 3. Let us construct the code tree of a prefix code for the letters a, b, c, d, e, f, g, h. In what follows, given a code C, we shall denote by l(x) := |C(x)| the length of the codeword. In the example above, |X| = 8 and the lengths of the codewords in increasing order are l_1 = l_2 = 1, l_3 = 2, l_4 = l_5 = l_6 = l_7 = l_8 = 3.

It is clear that when C is a prefix code and can be represented as a tree, then the codeword lengths cannot be arbitrarily small. The Kraft inequality gives a nice bound.
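The "on-line" decoding of the prefix code C_4 can be sketched in a few lines; the sample word badca below is our own illustration. Each letter is emitted as soon as its codeword is complete, which is exactly the instantaneous property.

```python
# "On-line" decoding with the prefix code C_4 from the table:
# a -> 0, b -> 10, c -> 110, d -> 111.

C4 = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
decode_map = {w: x for x, w in C4.items()}

def decode_online(bits):
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in decode_map:          # a full codeword has been seen
            out.append(decode_map[buf])
            buf = ''
    assert buf == '', 'input ended in the middle of a codeword'
    return ''.join(out)

word = 'badca'
encoded = ''.join(C4[x] for x in word)
print(encoded, decode_online(encoded))
```

Running the same loop with the non-prefix code C_3 would fail, because a complete-looking buffer (such as 11) may still be the beginning of a longer codeword.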

Theorem 2.6 (Kraft inequality) Let C : X → D* be a prefix code and l_i = l(x_i). Then

Σ_i D^{-l_i} ≤ 1. (2.1)

Conversely, let {l_i}_{i=1}^{|X|} be integers that satisfy (2.1). Then there exists a prefix code C : X → D* such that l_i = l(x_i) for every x_i ∈ X.

Proof. Let us start with proving the first claim for the case |X| = m < ∞. Let l := max{l_1, ..., l_m} < ∞. Organize the code as a D-ary tree. A codeword at level l_i has D^{l - l_i} descendants at level l. All the descendant sets (corresponding to different l_i) must be disjoint. Therefore the total number of nodes in these sets (over all codewords) must be less than or equal to D^l:

Σ_{i=1}^m D^{l - l_i} ≤ D^l, i.e. Σ_{i=1}^m D^{-l_i} ≤ 1.

Let us now prove the same claim in the general case, where |X| ≤ ∞. Recall D = {0, ..., D-1} and consider the codeword d_1 d_2 ... d_{l_i}. Let 0.d_1 d_2 ... d_{l_i} be the real number having the D-ary expansion 0.d_1 d_2 ... d_{l_i}, i.e.

0.d_1 d_2 ... d_{l_i} = Σ_{j=1}^{l_i} d_j D^{-j}. (2.2)

Consider the interval (a sub-interval of [0, 1])

[ 0.d_1 d_2 ... d_{l_i}, 0.d_1 d_2 ... d_{l_i} + D^{-l_i} )

corresponding to the codeword d_1 d_2 ... d_{l_i}. To this interval belong all real numbers whose D-ary expansion begins with 0.d_1 d_2 ... d_{l_i}. Clearly the length of that interval is D^{-l_i}. Since C is a prefix code, the intervals corresponding to different codewords are disjoint. Since they are all sub-intervals of [0, 1], their lengths sum up to something less than or equal to 1. This means that (2.1) holds.

Let us prove the second statement: we are given the set {l_i}_{i=1}^{|X|} satisfying (2.1). We aim to construct a prefix code so that the codewords have lengths {l_i}. Since (2.1) holds, it is possible to divide the unit interval into disjoint subintervals with lengths D^{-l_i}. Indeed, order l_1 ≤ l_2 ≤ ... . Let the first interval be [0, D^{-l_1}), the second [D^{-l_1}, D^{-l_1} + D^{-l_2}), and so on. Thus the first interval corresponds to l_1. It begins with 0, which can be represented as

0.00...0 (l_1 zeros).

The first interval ends with D^{-l_1}, with D-ary expansion

0.00...01 (the digit 1 in position l_1).

Clearly the first interval consists of those real numbers whose D-ary expansion begins with 0.00...0 (with l_1 zeros). The second interval corresponds to l_2. We represent both D^{-l_1} and D^{-l_1} + D^{-l_2} as D-ary real numbers with l_2 digits after the point. Recall that l_2 ≥ l_1. If l_2 = l_1, then D^{-l_1} will be represented just like previously; otherwise it will be represented as

0.00...010...0, (2.3)

with the digit 1 in position l_1 followed by l_2 - l_1 zeros. Clearly one needs at most l_2 digits after the point to expand D^{-l_1} + D^{-l_2}. To this interval belong all real numbers whose D-ary expansion begins with (2.3). The beginning of the third interval (corresponding to l_3) can be represented as a D-ary number 0.d_1 d_2 ... d_{l_3}. Again, recall l_3 ≥ l_2, and if l_3 > l_2, then the last l_3 - l_2 digits of that representation are zero. The D-ary expansion of the endpoint of that interval, D^{-l_1} + D^{-l_2} + D^{-l_3}, obviously has at most l_3 digits after the point. We proceed similarly: the interval corresponding to l_i begins with D^{-l_1} + ... + D^{-l_{i-1}}. The D-ary expansion of that number has at most l_{i-1} digits after the point, and we use l_i digits, which is possible because l_i ≥ l_{i-1}. Hence the D-ary representation is 0.d_1 ... d_{l_i}. To this interval belong the real numbers whose D-ary expansion begins with that representation. To construct the code, assign to every l_i (to the letter x_i) the word d_1 ... d_{l_i} from the D-ary expansion of D^{-l_1} + ... + D^{-l_{i-1}} (the beginning of the interval). Since different codewords belong to different intervals, the obtained code is a prefix code.

Examples: Consider the code C_4. Then l_1 = 1, l_2 = 2, l_3 = l_4 = 3. Let us find the real numbers whose D-ary representations are 0.d_1 d_2 ... d_{l_i}. We obtain

0.0_2 = 0,  0.10_2 = 1/2 = 0.5,  0.110_2 = 1/2 + 1/4 = 0.75,  0.111_2 = 1/2 + 1/4 + 1/8 = 0.875.

Hence the intervals used in the first part of the proof are

[0, 0.5), [0.5, 0.75), [0.75, 0.875), [0.875, 1).

In this example, the Kraft inequality is an equality.

The converse: Let {1, 2, 3, 3} be the lengths of the codewords. The easiest way to construct the corresponding code is to construct a tree. The procedure used in the proof is as follows.
Let us construct the intervals:

[0, 1/2), [1/2, 1/2 + 1/4), [1/2 + 1/4, 1/2 + 1/4 + 1/8), [1/2 + 1/4 + 1/8, 1).

With binary representation, these intervals (recall that the number of digits after the point must be l_i) are

[0.0, 0.1), [0.10, 0.11), [0.110, 0.111), [0.111, 1).

Codewords: 0, 10, 110, 111.

Let the lengths of the codewords be {2, 2, 3, 3}. Note that the Kraft inequality is strict: 1/4 + 1/4 + 1/8 + 1/8 = 3/4 < 1. Intervals:

[0, 1/4), [1/4, 1/2), [1/2, 1/2 + 1/8), [1/2 + 1/8, 1/2 + 1/4).

With binary expansion these intervals are

[0.00, 0.01), [0.01, 0.10), [0.100, 0.101), [0.101, 0.110).

Codewords: 00, 01, 100, 101.

2.3 Expected length and entropy

Let us consider the case where the letters are chosen randomly according to a distribution P on X. In other words, we consider a random variable X ~ P. Given a code C, we are interested in the expected length of a codeword. Since l(x) is the length of the codeword C(x), the expected length of the code C is

L(C) = Σ_x l(x) P(x).

Example: Consider the code C_4. Let P(a) = 1/2, P(b) = 1/4, P(c) = P(d) = 1/8. Then

L(C_4) = 1·(1/2) + 2·(1/4) + 3·(1/8) + 3·(1/8) = 7/4.

Note that H(P) = 7/4.

Hence L is the average number of symbols we need to describe the outcome of X when the code C is used. Clearly, the smaller the expected length, the better the code. The expected length is obviously small when all codewords are short, i.e. l(x) is small for every x. On the other hand, we know that for a prefix code the lengths l(x) cannot be arbitrarily small, since they have to satisfy the Kraft inequality. But given lengths l(x) that satisfy the Kraft inequality, how should one choose the code with minimal expected length? We know how to find the codewords, but how do we assign these words to the letters x? Intuition correctly suggests that the expected length is small if the frequent (high-probability) letters have short codewords and the infrequent letters have longer ones. The Morse code follows the same principle, but since the symbol "space" is used only to mark the end of a word, one can figure out a three-letter prefix code with smaller expected length.
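The interval construction from the proof can be sketched directly in code. The function below is our own sketch (not part of the notes): it reads each codeword off the D-ary expansion of the cumulative sum D^(-l_1) + ... + D^(-l_(i-1)), and reproduces both worked examples above.

```python
# Construct a prefix code from codeword lengths (Kraft converse):
# given lengths with sum D^(-l_i) <= 1, each codeword is the first l_i
# D-ary digits of the left endpoint of its interval.

def kraft_code(lengths, D=2):
    lengths = sorted(lengths)
    assert sum(D ** -l for l in lengths) <= 1, 'Kraft inequality fails'
    codewords, start = [], 0.0
    for l in lengths:
        # D-ary expansion of `start` with exactly l digits after the point
        digits, x = [], start
        for _ in range(l):
            x *= D
            d = int(x)
            digits.append(str(d))
            x -= d
        codewords.append(''.join(digits))
        start += D ** -l           # move to the next interval
    return codewords

print(kraft_code([1, 2, 3, 3]))   # the lengths of C_4
print(kraft_code([2, 2, 3, 3]))   # strict Kraft inequality
```

For binary lengths the floating-point arithmetic is exact (all endpoints are dyadic rationals), so the two calls recover exactly the codewords 0, 10, 110, 111 and 00, 01, 100, 101 derived above.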

The next theorem provides a fundamental lower bound on the expected length of any prefix code. It turns out that for a D-ary code the expected length cannot be lower than H_D(P).

Theorem 2.7 Let C : X → D* be a prefix code. Then

L(C) ≥ H_D(P),

with equality if and only if l(x) = log_D (1/P(x)) for every x ∈ X.

Proof.

L(C) - H_D(P) = Σ_x l(x)P(x) - Σ_x P(x) log_D (1/P(x)) = -Σ_x P(x) log_D D^{-l(x)} + Σ_x P(x) log_D P(x).

Let

c := Σ_x D^{-l(x)},  R(x) := D^{-l(x)} / c.

Then

L(C) - H_D(P) = Σ_x P(x) log_D ( P(x) / R(x) ) - log_D c = D(P||R) + log_D (1/c) ≥ 0,

because D(P||R) ≥ 0 and, from the Kraft inequality, c ≤ 1, so log_D (1/c) ≥ 0. The inequality is an equality only if P = R and c = 1. This holds iff P(x) = D^{-l(x)} for every x ∈ X. A necessary condition is that log_D (1/P(x)) is an integer for every x ∈ X.

Optimal codes for D-adic distributions. The code with minimum expected length is called optimal. From the preceding theorem it follows that if P satisfies the condition

log_D (1/P(x)) ∈ Z for every x ∈ X, (2.4)

(such distributions are sometimes called D-adic), then an optimal prefix code is easy to construct: take l(x) = log_D (1/P(x)). The lengths l(x) satisfy the Kraft inequality (with equality), and the corresponding optimal code can be constructed by constructing the tree or by using the intervals as in the proof of the Kraft inequality. The expected length of such a code is H_D(P), and from the preceding theorem we know that it must be optimal.
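Theorem 2.7 can be checked on the dyadic example above, where P(a) = 1/2, P(b) = 1/4, P(c) = P(d) = 1/8 and C_4 has lengths 1, 2, 3, 3. The sketch below computes both sides of the bound; the alternative lengths (2, 2, 2, 2) are our own illustration of a strictly suboptimal choice.

```python
import math

# Checking L(C) >= H_D(P) (Theorem 2.7, here D = 2) on the dyadic
# distribution P = (1/2, 1/4, 1/8, 1/8) with codeword lengths of C_4.

def expected_length(P, lengths):
    return sum(p * l for p, l in zip(P, lengths))

def entropy(P, D=2):
    return -sum(p * math.log(p, D) for p in P if p > 0)

P = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]            # l(x) = log2(1/P(x)), so equality holds
print(expected_length(P, lengths), entropy(P))   # both equal 7/4

# Any other prefix-code lengths give a strictly larger expected length,
# e.g. the fixed-length choice (2, 2, 2, 2):
assert expected_length(P, [2, 2, 2, 2]) > entropy(P)
```

Since the distribution is dyadic, the equality case of the theorem is attained exactly, which is why L(C_4) = H(P) = 7/4 in the example of Section 2.3.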


More information

1 Approximating Integrals using Taylor Polynomials

1 Approximating Integrals using Taylor Polynomials Seughee Ye Ma 8: Week 7 Nov Week 7 Summary This week, we will lear how we ca approximate itegrals usig Taylor series ad umerical methods. Topics Page Approximatig Itegrals usig Taylor Polyomials. Defiitios................................................

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

PROBLEM SET 5 SOLUTIONS 126 = , 37 = , 15 = , 7 = 7 1.

PROBLEM SET 5 SOLUTIONS 126 = , 37 = , 15 = , 7 = 7 1. Math 7 Sprig 06 PROBLEM SET 5 SOLUTIONS Notatios. Give a real umber x, we will defie sequeces (a k ), (x k ), (p k ), (q k ) as i lecture.. (a) (5 pts) Fid the simple cotiued fractio represetatios of 6

More information

Random Models. Tusheng Zhang. February 14, 2013

Random Models. Tusheng Zhang. February 14, 2013 Radom Models Tusheg Zhag February 14, 013 1 Radom Walks Let me describe the model. Radom walks are used to describe the motio of a movig particle (object). Suppose that a particle (object) moves alog the

More information

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Lecture 12: November 13, 2018

Lecture 12: November 13, 2018 Mathematical Toolkit Autum 2018 Lecturer: Madhur Tulsiai Lecture 12: November 13, 2018 1 Radomized polyomial idetity testig We will use our kowledge of coditioal probability to prove the followig lemma,

More information

7 Sequences of real numbers

7 Sequences of real numbers 40 7 Sequeces of real umbers 7. Defiitios ad examples Defiitio 7... A sequece of real umbers is a real fuctio whose domai is the set N of atural umbers. Let s : N R be a sequece. The the values of s are

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 6 9/24/2008 DISCRETE RANDOM VARIABLES AND THEIR EXPECTATIONS Cotets 1. A few useful discrete radom variables 2. Joit, margial, ad

More information

EE 4TM4: Digital Communications II Probability Theory

EE 4TM4: Digital Communications II Probability Theory 1 EE 4TM4: Digital Commuicatios II Probability Theory I. RANDOM VARIABLES A radom variable is a real-valued fuctio defied o the sample space. Example: Suppose that our experimet cosists of tossig two fair

More information

Spring Information Theory Midterm (take home) Due: Tue, Mar 29, 2016 (in class) Prof. Y. Polyanskiy. P XY (i, j) = α 2 i 2j

Spring Information Theory Midterm (take home) Due: Tue, Mar 29, 2016 (in class) Prof. Y. Polyanskiy. P XY (i, j) = α 2 i 2j Sprig 206 6.44 - Iformatio Theory Midterm (take home) Due: Tue, Mar 29, 206 (i class) Prof. Y. Polyaskiy Rules. Collaboratio strictly prohibited. 2. Write rigorously, prove all claims. 3. You ca use otes

More information

1 Convergence in Probability and the Weak Law of Large Numbers

1 Convergence in Probability and the Weak Law of Large Numbers 36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec

More information

The Growth of Functions. Theoretical Supplement

The Growth of Functions. Theoretical Supplement The Growth of Fuctios Theoretical Supplemet The Triagle Iequality The triagle iequality is a algebraic tool that is ofte useful i maipulatig absolute values of fuctios. The triagle iequality says that

More information

Math 525: Lecture 5. January 18, 2018

Math 525: Lecture 5. January 18, 2018 Math 525: Lecture 5 Jauary 18, 2018 1 Series (review) Defiitio 1.1. A sequece (a ) R coverges to a poit L R (writte a L or lim a = L) if for each ǫ > 0, we ca fid N such that a L < ǫ for all N. If the

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 Fuctioal Law of Large Numbers. Costructio of the Wieer Measure Cotet. 1. Additioal techical results o weak covergece

More information

Week 5-6: The Binomial Coefficients

Week 5-6: The Binomial Coefficients Wee 5-6: The Biomial Coefficiets March 6, 2018 1 Pascal Formula Theorem 11 (Pascal s Formula For itegers ad such that 1, ( ( ( 1 1 + 1 The umbers ( 2 ( 1 2 ( 2 are triagle umbers, that is, The petago umbers

More information

4 The Sperner property.

4 The Sperner property. 4 The Sperer property. I this sectio we cosider a surprisig applicatio of certai adjacecy matrices to some problems i extremal set theory. A importat role will also be played by fiite groups. I geeral,

More information

5 Birkhoff s Ergodic Theorem

5 Birkhoff s Ergodic Theorem 5 Birkhoff s Ergodic Theorem Amog the most useful of the various geeralizatios of KolmogorovâĂŹs strog law of large umbers are the ergodic theorems of Birkhoff ad Kigma, which exted the validity of the

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

Mathematical Statistics - MS

Mathematical Statistics - MS Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios

More information

Shannon s noiseless coding theorem

Shannon s noiseless coding theorem 18.310 lecture otes May 4, 2015 Shao s oiseless codig theorem Lecturer: Michel Goemas I these otes we discuss Shao s oiseless codig theorem, which is oe of the foudig results of the field of iformatio

More information

INEQUALITIES BJORN POONEN

INEQUALITIES BJORN POONEN INEQUALITIES BJORN POONEN 1 The AM-GM iequality The most basic arithmetic mea-geometric mea (AM-GM) iequality states simply that if x ad y are oegative real umbers, the (x + y)/2 xy, with equality if ad

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

subcaptionfont+=small,labelformat=parens,labelsep=space,skip=6pt,list=0,hypcap=0 subcaption ALGEBRAIC COMBINATORICS LECTURE 8 TUESDAY, 2/16/2016

subcaptionfont+=small,labelformat=parens,labelsep=space,skip=6pt,list=0,hypcap=0 subcaption ALGEBRAIC COMBINATORICS LECTURE 8 TUESDAY, 2/16/2016 subcaptiofot+=small,labelformat=pares,labelsep=space,skip=6pt,list=0,hypcap=0 subcaptio ALGEBRAIC COMBINATORICS LECTURE 8 TUESDAY, /6/06. Self-cojugate Partitios Recall that, give a partitio λ, we may

More information

Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

ECE 901 Lecture 13: Maximum Likelihood Estimation

ECE 901 Lecture 13: Maximum Likelihood Estimation ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered

More information

Ma 530 Infinite Series I

Ma 530 Infinite Series I Ma 50 Ifiite Series I Please ote that i additio to the material below this lecture icorporated material from the Visual Calculus web site. The material o sequeces is at Visual Sequeces. (To use this li

More information

Complex Analysis Spring 2001 Homework I Solution

Complex Analysis Spring 2001 Homework I Solution Complex Aalysis Sprig 2001 Homework I Solutio 1. Coway, Chapter 1, sectio 3, problem 3. Describe the set of poits satisfyig the equatio z a z + a = 2c, where c > 0 ad a R. To begi, we see from the triagle

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Chapter 4. Fourier Series

Chapter 4. Fourier Series Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,

More information

Chapter 10: Power Series

Chapter 10: Power Series Chapter : Power Series 57 Chapter Overview: Power Series The reaso series are part of a Calculus course is that there are fuctios which caot be itegrated. All power series, though, ca be itegrated because

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

STAT Homework 1 - Solutions

STAT Homework 1 - Solutions STAT-36700 Homework 1 - Solutios Fall 018 September 11, 018 This cotais solutios for Homework 1. Please ote that we have icluded several additioal commets ad approaches to the problems to give you better

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018) Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

A Proof of Birkhoff s Ergodic Theorem

A Proof of Birkhoff s Ergodic Theorem A Proof of Birkhoff s Ergodic Theorem Joseph Hora September 2, 205 Itroductio I Fall 203, I was learig the basics of ergodic theory, ad I came across this theorem. Oe of my supervisors, Athoy Quas, showed

More information

4.3 Growth Rates of Solutions to Recurrences

4.3 Growth Rates of Solutions to Recurrences 4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Refinement of Two Fundamental Tools in Information Theory

Refinement of Two Fundamental Tools in Information Theory Refiemet of Two Fudametal Tools i Iformatio Theory Raymod W. Yeug Istitute of Network Codig The Chiese Uiversity of Hog Kog Joit work with Siu Wai Ho ad Sergio Verdu Discotiuity of Shao s Iformatio Measures

More information

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero?

If a subset E of R contains no open interval, is it of zero measure? For instance, is the set of irrationals in [0, 1] is of measure zero? 2 Lebesgue Measure I Chapter 1 we defied the cocept of a set of measure zero, ad we have observed that every coutable set is of measure zero. Here are some atural questios: If a subset E of R cotais a

More information

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS

REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS REAL ANALYSIS II: PROBLEM SET 1 - SOLUTIONS 18th Feb, 016 Defiitio (Lipschitz fuctio). A fuctio f : R R is said to be Lipschitz if there exists a positive real umber c such that for ay x, y i the domai

More information