Entropy, Relative Entropy and Mutual Information


Entropy, Relative Entropy and Mutual Information
Prof. Ja-Ling Wu
Department of Computer Science and Information Engineering, National Taiwan University

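Definition: The entropy $H(X)$ of a discrete random variable $X$ is defined by

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).$$

With the logarithm taken to base 2, entropy is measured in bits. By convention $0 \log 0 = 0$ (since $x \log x \to 0$ as $x \to 0$), so adding terms of zero probability does not change the entropy.

Information Theory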
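The definition of entropy can be tried out numerically. The following is a minimal sketch, assuming the pmf is passed as a plain list of probabilities; the function name `entropy` and the sample distributions are illustrative choices, not part of the slides.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a pmf given as a sequence of probabilities."""
    # Terms with p == 0 are skipped, which implements the convention 0 log 0 = 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])      # two equally likely outcomes
padded = entropy([0.5, 0.5, 0.0])    # same pmf with a zero-probability outcome added
four_sided = entropy([0.25] * 4)     # uniform over four outcomes
```

Note that the function only looks at the probabilities, never at the values the random variable takes, which matches the remark that entropy depends on the distribution alone.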

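Note that entropy is a function of the distribution of $X$. It does not depend on the actual values taken by the r.v. $X$, but only on the probabilities.

If $X \sim p(x)$, then the expected value of the r.v. $g(X)$ is written as

$$E_p\, g(X) = \sum_{x \in \mathcal{X}} g(x)\, p(x).$$

Remark: The entropy of $X$ is the expected value of $\log \frac{1}{p(X)}$ (the self-information):

$$H(X) = E_p \log \frac{1}{p(X)}.$$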

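Lemma: $H(X) \ge 0$.

Lemma: $H_b(X) = (\log_b a)\, H_a(X)$.

Ex: Let $X = 1$ with probability $p$ and $X = 0$ with probability $1 - p$. Then

$$H(X) = -p \log p - (1 - p) \log (1 - p) \overset{\text{def}}{=} H(p).$$

[Figure: $H(p)$ versus $p$ for $0 \le p \le 1$]

(1) $H(p) = 1$ bit when $p = 1/2$.
(2) $H(p)$ is a concave function of $p$.
(3) $H(p) = 0$ if $p = 0$ or $p = 1$.
(4) The maximum occurs when $p = 1/2$.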
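The four listed properties of the binary entropy function can be spot-checked on a grid. This is a sketch under stated assumptions: the grid spacing of 0.01 and the midpoint test for concavity are illustrative choices, and a grid check is of course not a proof.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), in bits, with 0 log 0 = 0."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

grid = [k / 100 for k in range(101)]
values = [binary_entropy(p) for p in grid]
p_at_max = grid[values.index(max(values))]

# Midpoint concavity check on the grid: H((a + b) / 2) >= (H(a) + H(b)) / 2.
concave_on_grid = all(
    binary_entropy((a + b) / 2) >= (binary_entropy(a) + binary_entropy(b)) / 2 - 1e-12
    for a in grid
    for b in grid
)
```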

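Joint Entropy and Conditional Entropy

Definition: The joint entropy $H(X,Y)$ of a pair of discrete random variables $(X,Y)$ with a joint distribution $p(x,y)$ is defined as

$$H(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x,y), \quad \text{or} \quad H(X,Y) = -E \log p(X,Y).$$

Definition: The conditional entropy $H(Y|X)$ is defined as

$$H(Y|X) = \sum_{x} p(x)\, H(Y|X = x) = -\sum_{x} p(x) \sum_{y} p(y|x) \log p(y|x) = -\sum_{x} \sum_{y} p(x,y) \log p(y|x) = -E_{p(x,y)} \log p(Y|X).$$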

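Theorem (Chain Rule):

$$H(X,Y) = H(X) + H(Y|X).$$

Pf:

$$H(X,Y) = -\sum_{x}\sum_{y} p(x,y) \log p(x,y) = -\sum_{x}\sum_{y} p(x,y) \log \big(p(x)\, p(y|x)\big)$$
$$= -\sum_{x} p(x) \log p(x) - \sum_{x}\sum_{y} p(x,y) \log p(y|x) = H(X) + H(Y|X).$$

Or, equivalently, we can write $\log p(X,Y) = \log p(X) + \log p(Y|X)$ and take the expectation of both sides.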
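The chain rule $H(X,Y) = H(X) + H(Y|X)$ can be verified on a small example. The sketch below assumes a hypothetical joint pmf stored as a dictionary keyed by `(x, y)`; the particular probabilities are made up for illustration.

```python
import math
from collections import defaultdict

# Hypothetical joint pmf p(x, y), keyed by (x, y); the values sum to 1.
joint = {("a", 0): 0.5, ("a", 1): 0.25, ("b", 0): 0.125, ("b", 1): 0.125}

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Marginal p(x) = sum_y p(x, y).
p_x = defaultdict(float)
for (x, _y), p in joint.items():
    p_x[x] += p

h_xy = entropy(joint.values())
h_x = entropy(p_x.values())
# H(Y|X) = -sum_{x,y} p(x,y) log p(y|x), with p(y|x) = p(x,y) / p(x).
h_y_given_x = sum(-p * math.log2(p / p_x[x]) for (x, _y), p in joint.items() if p > 0)
```

For this pmf the joint entropy is 1.75 bits, and it splits into $H(X)$ plus $H(Y|X)$ up to floating-point rounding.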

Corollary:

$$H(X,Y|Z) = H(X|Z) + H(Y|X,Z).$$

Remark:
(i) $H(Y|X) \ne H(X|Y)$ in general.
(ii) $H(X) - H(X|Y) = H(Y) - H(Y|X)$.

Relative Entropy and Mutual Information

The entropy of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on the average to describe the random variable.

The relative entropy is a measure of the distance between two distributions. In statistics, it arises as an expected logarithm of the likelihood ratio. The relative entropy $D(p\|q)$ is a measure of the inefficiency of assuming that the distribution is $q$ when the true distribution is $p$.

Ex: If we knew the true distribution $p$ of the r.v., then we could construct a code with average description length $H(p)$. If instead we used the code for a distribution $q$, we would need $H(p) + D(p\|q)$ bits on the average to describe the r.v.

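Definition: The relative entropy or Kullback-Leibler distance between two probability mass functions $p(x)$ and $q(x)$ is defined as

$$D(p\|q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} = E_p \log \frac{p(X)}{q(X)} = E_p \log \frac{1}{q(X)} - E_p \log \frac{1}{p(X)}.$$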

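Definition: Consider two r.v.'s $X$ and $Y$ with a joint probability mass function $p(x,y)$ and marginal probability mass functions $p(x)$ and $p(y)$. The mutual information $I(X;Y)$ is the relative entropy between the joint distribution and the product distribution $p(x)p(y)$, i.e.,

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = D\big(p(x,y)\,\|\,p(x)p(y)\big) = E_{p(x,y)} \log \frac{p(X,Y)}{p(X)\,p(Y)}.$$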

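Ex: Let $\mathcal{X} = \{0, 1\}$ and consider two distributions $p$ and $q$ on $\mathcal{X}$. Let $p(0) = 1 - r$, $p(1) = r$ and let $q(0) = 1 - s$, $q(1) = s$. Then

$$D(p\|q) = (1 - r) \log \frac{1 - r}{1 - s} + r \log \frac{r}{s}$$

and

$$D(q\|p) = (1 - s) \log \frac{1 - s}{1 - r} + s \log \frac{s}{r}.$$

If $r = s$, then $D(p\|q) = D(q\|p) = 0$, while in general $D(p\|q) \ne D(q\|p)$.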
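The asymmetry of relative entropy in this two-point example is easy to see numerically. The following sketch assumes the illustrative values $r = 1/2$ and $s = 1/4$, which are not given in the slides; the helper `kl_divergence` is likewise a hypothetical name.

```python
import math

def kl_divergence(p, q):
    """D(p||q) in bits for pmfs given as equal-length sequences."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # convention: 0 log(0/q) = 0
        if qi == 0:
            return math.inf  # convention: p log(p/0) = infinity for p > 0
        total += pi * math.log2(pi / qi)
    return total

r, s = 0.5, 0.25  # illustrative parameter values (an assumption)
p = [1 - r, r]
q = [1 - s, s]
d_pq = kl_divergence(p, q)  # about 0.2075 bits
d_qp = kl_divergence(q, p)  # about 0.1887 bits
```

With these values the two directions differ, which is why relative entropy is not a true metric even though it is often called a distance.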

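Relationship between Entropy and Mutual Information

Rewrite $I(X;Y)$ as

$$I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = \sum_{x,y} p(x,y) \log \frac{p(x|y)}{p(x)}$$
$$= -\sum_{x,y} p(x,y) \log p(x) + \sum_{x,y} p(x,y) \log p(x|y) = H(X) - H(X|Y).$$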

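Thus the mutual information $I(X;Y)$ is the reduction in the uncertainty of $X$ due to the knowledge of $Y$.

By symmetry, it follows that $I(X;Y) = H(Y) - H(Y|X)$: $X$ says as much about $Y$ as $Y$ says about $X$.

Since $H(X,Y) = H(X) + H(Y|X)$, we have $I(X;Y) = H(X) + H(Y) - H(X,Y)$.

Finally, $I(X;X) = H(X) - H(X|X) = H(X)$. The mutual information of a r.v. with itself is the entropy of the r.v., which is why entropy is also called self-information.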

Theorem (Mutual information and entropy):
1. $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$
2. $I(X;Y) = I(Y;X)$
3. $I(X;X) = H(X)$
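The identities relating mutual information and entropy can be checked against the definition of $I(X;Y)$ as a relative entropy. This is a minimal sketch, assuming a hypothetical joint pmf on $\{0,1\} \times \{0,1\}$ chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical joint pmf p(x, y) on {0, 1} x {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

# I(X;Y) as the relative entropy between the joint and the product of marginals.
mi_as_kl = sum(
    p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in joint.items() if p > 0
)
# I(X;Y) = H(X) + H(Y) - H(X,Y).
mi_from_entropies = (
    entropy(p_x.values()) + entropy(p_y.values()) - entropy(joint.values())
)
```

Both routes give the same value (roughly 0.278 bits for this pmf), and swapping the roles of `x` and `y` in the first sum leaves it unchanged, which is the symmetry $I(X;Y) = I(Y;X)$.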

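Chain Rules for Entropy, Relative Entropy and Mutual Information

Theorem (Chain rule for entropy): Let $X_1, X_2, \ldots, X_n$ be drawn according to $p(x_1, x_2, \ldots, x_n)$. Then

$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$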

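Proof: By repeated application of the two-variable chain rule,

$$H(X_1, X_2) = H(X_1) + H(X_2|X_1),$$
$$H(X_1, X_2, X_3) = H(X_1) + H(X_2, X_3|X_1) = H(X_1) + H(X_2|X_1) + H(X_3|X_2, X_1),$$
$$\vdots$$
$$H(X_1, X_2, \ldots, X_n) = H(X_1) + H(X_2|X_1) + \cdots + H(X_n|X_{n-1}, \ldots, X_1) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$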

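Alternatively, we write $p(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_{i-1}, \ldots, x_1)$; then

$$H(X_1, \ldots, X_n) = -\sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \log p(x_1, \ldots, x_n)$$
$$= -\sum_{x_1, \ldots, x_n} p(x_1, \ldots, x_n) \sum_{i=1}^{n} \log p(x_i \mid x_{i-1}, \ldots, x_1)$$
$$= -\sum_{i=1}^{n} \sum_{x_1, \ldots, x_i} p(x_1, \ldots, x_i) \log p(x_i \mid x_{i-1}, \ldots, x_1)$$
$$= \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$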

Definition: The conditional mutual information of r.v.'s $X$ and $Y$ given $Z$ is defined by

$$I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = E_{p(x,y,z)} \log \frac{p(X,Y|Z)}{p(X|Z)\,p(Y|Z)}.$$

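Theorem (Chain rule for mutual information):

$$I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1).$$

Proof:

$$I(X_1, \ldots, X_n; Y) = H(X_1, \ldots, X_n) - H(X_1, \ldots, X_n \mid Y)$$
$$= \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1) - \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1, Y)$$
$$= \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1}).$$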

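Definition: The conditional relative entropy $D(p(y|x)\,\|\,q(y|x))$ is the average of the relative entropies between the conditional probability mass functions $p(y|x)$ and $q(y|x)$, averaged over the probability mass function $p(x)$:

$$D\big(p(y|x)\,\|\,q(y|x)\big) = \sum_{x} p(x) \sum_{y} p(y|x) \log \frac{p(y|x)}{q(y|x)} = E_{p(x,y)} \log \frac{p(Y|X)}{q(Y|X)}.$$

Theorem (Chain rule for relative entropy):

$$D\big(p(x,y)\,\|\,q(x,y)\big) = D\big(p(x)\,\|\,q(x)\big) + D\big(p(y|x)\,\|\,q(y|x)\big).$$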

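Jensen's Inequality and Its Consequences

Definition: A function $f(x)$ is said to be convex over an interval $(a, b)$ if for every $x_1, x_2 \in (a, b)$ and $0 \le \lambda \le 1$,

$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2).$$

A function $f$ is said to be strictly convex if equality holds only if $\lambda = 0$ or $\lambda = 1$.

Definition: A function $f$ is concave if $-f$ is convex.

Ex:
convex functions: $x^2$, $|x|$, $e^x$, $x \log x$ (for $x \ge 0$)
concave functions: $\log x$, $x^{1/2}$ (for $x \ge 0$)
both convex and concave: $ax + b$ (linear functions)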

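Theorem: If the function $f$ has a second derivative which is non-negative (positive) everywhere, then the function is convex (strictly convex).

Notation for the expectation:

$$EX = \sum_{x} x\, p(x) \ \text{(discrete case)}, \qquad EX = \int x f(x)\, dx \ \text{(continuous case)}.$$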

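Theorem (Jensen's inequality): If $f$ is a convex function and $X$ is a random variable, then

$$E f(X) \ge f(EX).$$

Proof (by mathematical induction): For a two mass point distribution, the inequality becomes

$$p_1 f(x_1) + p_2 f(x_2) \ge f(p_1 x_1 + p_2 x_2), \qquad p_1 + p_2 = 1,$$

which follows directly from the definition of convex functions. Suppose the theorem is true for distributions with $k - 1$ mass points. Then, writing $p_i' = p_i / (1 - p_k)$ for $i = 1, 2, \ldots, k - 1$, we have

$$\sum_{i=1}^{k} p_i f(x_i) = p_k f(x_k) + (1 - p_k) \sum_{i=1}^{k-1} p_i' f(x_i)$$
$$\ge p_k f(x_k) + (1 - p_k)\, f\Big(\sum_{i=1}^{k-1} p_i' x_i\Big)$$
$$\ge f\Big(p_k x_k + (1 - p_k) \sum_{i=1}^{k-1} p_i' x_i\Big) = f\Big(\sum_{i=1}^{k} p_i x_i\Big).$$

The proof can be extended to continuous distributions by continuity arguments.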

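Theorem (Information inequality): Let $p(x)$, $q(x)$, $x \in \mathcal{X}$, be two probability mass functions. Then

$$D(p\|q) \ge 0,$$

with equality iff $p(x) = q(x)$ for all $x$.

Proof: Let $A = \{x : p(x) > 0\}$ be the support set of $p(x)$. Then

$$-D(p\|q) = -\sum_{x \in A} p(x) \log \frac{p(x)}{q(x)} = \sum_{x \in A} p(x) \log \frac{q(x)}{p(x)} = E_p \log \frac{q(X)}{p(X)}$$
$$\le \log E_p \frac{q(X)}{p(X)} \quad (\log t \text{ is concave})$$
$$= \log \sum_{x \in A} q(x) \le \log \sum_{x \in \mathcal{X}} q(x) = \log 1 = 0.$$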

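Corollary (Non-negativity of mutual information): For any two r.v.'s $X$, $Y$,

$$I(X;Y) \ge 0,$$

with equality iff $X$ and $Y$ are independent.

Proof: $I(X;Y) = D\big(p(x,y)\,\|\,p(x)p(y)\big) \ge 0$, with equality iff $p(x,y) = p(x)p(y)$, i.e., $X$ and $Y$ are independent.

Corollary: $D\big(p(y|x)\,\|\,q(y|x)\big) \ge 0$, with equality iff $p(y|x) = q(y|x)$ for all $y$ and all $x$ with $p(x) > 0$.

Corollary: $I(X;Y|Z) \ge 0$, with equality iff $X$ and $Y$ are conditionally independent given $Z$.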

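Theorem: $H(X) \le \log |\mathcal{X}|$, where $|\mathcal{X}|$ denotes the number of elements in the range of $X$, with equality iff $X$ has a uniform distribution over $\mathcal{X}$.

Proof: Let $u(x) = 1/|\mathcal{X}|$ be the uniform probability mass function over $\mathcal{X}$ and let $p(x)$ be the probability mass function for $X$. Then

$$D(p\|u) = \sum_{x} p(x) \log \frac{p(x)}{u(x)} = \log |\mathcal{X}| - H(X).$$

Hence, by the non-negativity of relative entropy,

$$0 \le D(p\|u) = \log |\mathcal{X}| - H(X).$$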

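Theorem (Conditioning reduces entropy): $H(X|Y) \le H(X)$, with equality iff $X$ and $Y$ are independent.

Proof: $0 \le I(X;Y) = H(X) - H(X|Y)$.

Note that this is true only on the average; specifically, $H(X|Y = y)$ may be greater than, less than, or equal to $H(X)$, but on the average $H(X|Y) = \sum_{y} p(y)\, H(X|Y = y) \le H(X)$.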

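Ex: Let $(X, Y)$ have the following joint distribution $p(x,y)$:

         X = 1    X = 2
Y = 1      0       3/4
Y = 2     1/8      1/8

Then $H(X) = H(1/8, 7/8) = 0.544$ bits, $H(X|Y = 1) = 0$ bits, and $H(X|Y = 2) = 1$ bit $> H(X)$.

However, $H(X|Y) = \frac{3}{4} H(X|Y = 1) + \frac{1}{4} H(X|Y = 2) = 0.25$ bits $< H(X)$.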
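The numbers in this example can be reproduced directly. The sketch below assumes the joint distribution $p(X{=}1, Y{=}1) = 0$, $p(X{=}2, Y{=}1) = 3/4$, $p(X{=}1, Y{=}2) = 1/8$, $p(X{=}2, Y{=}2) = 1/8$, which is the table consistent with the stated values $H(1/8, 7/8)$ and the weights $3/4$ and $1/4$; the helper names are illustrative.

```python
import math

# Joint pmf p(x, y) assumed from the example, keyed by (x, y).
joint = {(1, 1): 0.0, (2, 1): 0.75, (1, 2): 0.125, (2, 2): 0.125}

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def marginal(index):
    """Marginal pmf of coordinate `index` (0 for X, 1 for Y)."""
    out = {}
    for key, p in joint.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out

p_x, p_y = marginal(0), marginal(1)
h_x = entropy(p_x.values())
# H(X | Y = y) for each y, from p(x | y) = p(x, y) / p(y).
h_x_given = {y: entropy([joint[(x, y)] / p_y[y] for x in p_x]) for y in p_y}
# H(X | Y) = sum_y p(y) H(X | Y = y).
h_x_given_y = sum(p_y[y] * h_x_given[y] for y in p_y)
```

Observing $Y = 2$ raises the uncertainty about $X$ to 1 bit, yet the average over $Y$ is 0.25 bits, below $H(X)$.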

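Theorem (Independence bound on entropy): Let $X_1, X_2, \ldots, X_n$ be drawn according to $p(x_1, x_2, \ldots, x_n)$. Then

$$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i),$$

with equality iff the $X_i$ are independent.

Proof: By the chain rule for entropies,

$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1) \le \sum_{i=1}^{n} H(X_i),$$

with equality iff the $X_i$'s are independent.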

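The Log Sum Inequality and Its Applications

Theorem (Log sum inequality): For non-negative numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$,

$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \Big(\sum_{i=1}^{n} a_i\Big) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i},$$

with equality iff $a_i / b_i$ is constant.

Some conventions: $0 \log 0 = 0$, $a \log \frac{a}{0} = \infty$ if $a > 0$, and $0 \log \frac{0}{0} = 0$.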

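Proof: Assume w.l.o.g. that $a_i > 0$ and $b_i > 0$. The function $f(t) = t \log t$ is strictly convex, since $f''(t) = \frac{1}{t} \log e > 0$ for all positive $t$. Hence, by Jensen's inequality, we have

$$\sum_{i} \alpha_i f(t_i) \ge f\Big(\sum_{i} \alpha_i t_i\Big) \quad \text{for } \alpha_i \ge 0, \ \sum_{i} \alpha_i = 1.$$

Setting $\alpha_i = b_i / \sum_{j} b_j$ and $t_i = a_i / b_i$, we obtain

$$\sum_{i} \frac{a_i}{\sum_{j} b_j} \log \frac{a_i}{b_i} \ge \sum_{i} \frac{a_i}{\sum_{j} b_j} \log \sum_{i} \frac{a_i}{\sum_{j} b_j},$$

which is the log sum inequality.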
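A quick randomized check can make the log sum inequality concrete. This is only a sketch: the vector length, the sampling range, and the fixed seed are arbitrary assumptions, and passing random trials does not replace the proof.

```python
import math
import random

def log_sum_sides(a, b):
    """Return both sides of the log sum inequality for positive a_i, b_i (base-2 logs)."""
    lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
    rhs = sum(a) * math.log2(sum(a) / sum(b))
    return lhs, rhs

rng = random.Random(0)  # fixed seed so the check is repeatable
trials = []
for _ in range(1000):
    a = [rng.uniform(0.1, 10.0) for _ in range(5)]
    b = [rng.uniform(0.1, 10.0) for _ in range(5)]
    trials.append(log_sum_sides(a, b))

# Equality case: a_i / b_i is the same constant (1/2) for every i.
eq_lhs, eq_rhs = log_sum_sides([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```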

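Reproving the theorem that $D(p\|q) \ge 0$ with equality iff $p(x) = q(x)$:

$$D(p\|q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} \ge \Big(\sum_{x} p(x)\Big) \log \frac{\sum_{x} p(x)}{\sum_{x} q(x)} = 1 \cdot \log \frac{1}{1} = 0,$$

where the inequality is the log sum inequality, with equality iff $p(x)/q(x) = c$. Since both $p$ and $q$ are probability mass functions, $c = 1$, i.e., $p(x) = q(x)$ for all $x$.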

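Theorem: $D(p\|q)$ is convex in the pair $(p, q)$, i.e., if $(p_1, q_1)$ and $(p_2, q_2)$ are two pairs of probability mass functions, then

$$D\big(\lambda p_1 + (1 - \lambda) p_2 \,\|\, \lambda q_1 + (1 - \lambda) q_2\big) \le \lambda D(p_1\|q_1) + (1 - \lambda) D(p_2\|q_2) \quad \text{for all } 0 \le \lambda \le 1.$$

Proof: Let $a_1 = \lambda p_1(x)$, $a_2 = (1 - \lambda) p_2(x)$, $b_1 = \lambda q_1(x)$, $b_2 = (1 - \lambda) q_2(x)$. Then, by the log sum inequality,

$$(a_1 + a_2) \log \frac{a_1 + a_2}{b_1 + b_2} \le a_1 \log \frac{a_1}{b_1} + a_2 \log \frac{a_2}{b_2}.$$

Summing over all $x$ gives the stated inequality.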

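Theorem (Concavity of entropy): $H(p)$ is a concave function of $p$. That is,

$$H(\lambda p_1 + (1 - \lambda) p_2) \ge \lambda H(p_1) + (1 - \lambda) H(p_2).$$

Proof: $H(p) = \log |\mathcal{X}| - D(p\|u)$, where $u$ is the uniform distribution on $|\mathcal{X}|$ outcomes. The concavity of $H$ then follows directly from the convexity of $D$.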

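Theorem: Let $(X, Y) \sim p(x,y) = p(x)\,p(y|x)$. The mutual information $I(X;Y)$ is
(1) a concave function of $p(x)$ for fixed $p(y|x)$, and
(2) a convex function of $p(y|x)$ for fixed $p(x)$.

Proof of (1):

$$I(X;Y) = H(Y) - H(Y|X) = H(Y) - \sum_{x} p(x)\, H(Y|X = x).$$

If $p(y|x)$ is fixed, then $p(y) = \sum_{x} p(x,y) = \sum_{x} p(x)\,p(y|x)$ is a linear function of $p(x)$. Hence $H(Y)$, which is a concave function of $p(y)$, is a concave function of $p(x)$. The second term is a linear function of $p(x)$. Hence the difference is a concave function of $p(x)$.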

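Proof of (2): We fix $p(x)$ and consider two different conditional distributions $p_1(y|x)$ and $p_2(y|x)$. The corresponding joint distributions are $p_1(x,y) = p(x)\,p_1(y|x)$ and $p_2(x,y) = p(x)\,p_2(y|x)$, and their respective marginals are $p(x), p_1(y)$ and $p(x), p_2(y)$.

Consider a conditional distribution

$$p_\lambda(y|x) = \lambda p_1(y|x) + (1 - \lambda) p_2(y|x)$$

that is a mixture of $p_1(y|x)$ and $p_2(y|x)$. The corresponding joint distribution is also a mixture of the corresponding joint distributions (when $p(x)$ is fixed, $p(x,y)$ is linear in $p(y|x)$):

$$p_\lambda(x,y) = \lambda p_1(x,y) + (1 - \lambda) p_2(x,y),$$

and the distribution of $Y$ is also a mixture:

$$p_\lambda(y) = \lambda p_1(y) + (1 - \lambda) p_2(y).$$

Hence, if we let $q_\lambda(x,y) = p(x)\,p_\lambda(y)$, then $q_\lambda(x,y) = \lambda q_1(x,y) + (1 - \lambda) q_2(x,y)$: the product of the marginal distributions is also linear in $p(y|x)$ when $p(x)$ is fixed.

Since $I(X;Y) = D(p_\lambda\|q_\lambda)$ and $D(p\|q)$ is convex in the pair $(p, q)$, the mutual information is a convex function of the conditional distribution. In other words, the convexity of $I(X;Y)$ is the same as that of $D(p_\lambda\|q_\lambda)$ with respect to $p_i(y|x)$ when $p(x)$ is fixed.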

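Data processing inequality: No clever manipulation of the data can improve the inferences that can be made from the data.

Definition: R.v.'s $X$, $Y$, $Z$ are said to form a Markov chain in that order (denoted by $X \to Y \to Z$) if the conditional distribution of $Z$ depends only on $Y$ and is conditionally independent of $X$. That is, if $X \to Y \to Z$ form a Markov chain, then
(i) $p(x,y,z) = p(x)\,p(y|x)\,p(z|y)$;
(ii) $p(x,z|y) = p(x|y)\,p(z|y)$: $X$ and $Z$ are conditionally independent given $Y$.

$X \to Y \to Z$ implies that $Z \to Y \to X$.

If $Z = f(Y)$, then $X \to Y \to Z$.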

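Theorem (Data processing inequality): If $X \to Y \to Z$, then $I(X;Y) \ge I(X;Z)$. No processing of $Y$, deterministic or random, can increase the information that $Y$ contains about $X$.

Proof:

$$I(X;Y,Z) = I(X;Z) + I(X;Y|Z) \quad \text{(chain rule)}$$
$$= I(X;Y) + I(X;Z|Y) \quad \text{(chain rule)}.$$

Since $X$ and $Z$ are independent given $Y$, we have $I(X;Z|Y) = 0$. Since $I(X;Y|Z) \ge 0$, we have $I(X;Y) \ge I(X;Z)$, with equality iff $I(X;Y|Z) = 0$, i.e., $X \to Z \to Y$ forms a Markov chain. Similarly, one can prove $I(Y;Z) \ge I(X;Z)$.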

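Corollary: If $X \to Y \to Z$ forms a Markov chain and if $Z = g(Y)$, we have $I(X;Y) \ge I(X;g(Y))$: functions of the data $Y$ cannot increase the information about $X$.

Corollary: If $X \to Y \to Z$, then $I(X;Y|Z) \le I(X;Y)$.

Proof: $I(X;Y,Z) = I(X;Z) + I(X;Y|Z) = I(X;Y) + I(X;Z|Y)$. By Markovity, $I(X;Z|Y) = 0$, and $I(X;Z) \ge 0$, so $I(X;Y|Z) \le I(X;Y)$.

The dependence of $X$ and $Y$ is decreased (or remains unchanged) by the observation of a "downstream" r.v. $Z$.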

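Note that it is possible that $I(X;Y|Z) > I(X;Y)$ when $X$, $Y$ and $Z$ do not form a Markov chain.

Ex: Let $X$ and $Y$ be independent fair binary r.v.'s and let $Z = X + Y$. Then $I(X;Y) = 0$, but

$$I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = H(X|Z) = P(Z = 1)\, H(X|Z = 1) = \tfrac{1}{2} \text{ bit}.$$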
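The $Z = X + Y$ example can be worked out by brute force from the joint pmf. The sketch below assumes the joint is stored as a dictionary keyed by `(x, y, z)` and uses the expression $I(X;Y|Z) = \sum p(x,y,z) \log \frac{p(x,y,z)\,p(z)}{p(x,z)\,p(y,z)}$, which is equivalent to the definition; the helper `marginal` is a hypothetical name.

```python
import math
from collections import defaultdict
from itertools import product

# X, Y independent fair bits, Z = X + Y; joint pmf keyed by (x, y, z).
joint = {(x, y, x + y): 0.25 for x, y in product((0, 1), repeat=2)}

def marginal(pmf, keep):
    """Marginalize a pmf keyed by tuples onto the coordinate positions in `keep`."""
    out = defaultdict(float)
    for key, p in pmf.items():
        out[tuple(key[i] for i in keep)] += p
    return out

p_x, p_y, p_z = marginal(joint, [0]), marginal(joint, [1]), marginal(joint, [2])
p_xy, p_xz, p_yz = marginal(joint, [0, 1]), marginal(joint, [0, 2]), marginal(joint, [1, 2])

# I(X;Y) = sum p(x,y) log [ p(x,y) / (p(x) p(y)) ].
mi_xy = sum(p * math.log2(p / (p_x[(x,)] * p_y[(y,)])) for (x, y), p in p_xy.items())
# I(X;Y|Z) = sum p(x,y,z) log [ p(x,y,z) p(z) / (p(x,z) p(y,z)) ].
mi_xy_given_z = sum(
    p * math.log2(p * p_z[(z,)] / (p_xz[(x, z)] * p_yz[(y, z)]))
    for (x, y, z), p in joint.items()
)
```

Only the outcomes with $Z = 1$ contribute: there, knowing $Y$ pins down $X$, so conditioning on $Z$ creates dependence that was absent unconditionally.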

Fano's inequality: Fano's inequality relates the probability of error in guessing the r.v. $X$ to its conditional entropy $H(X|Y)$.

Note that: The conditional entropy of a r.v. $X$ given another random variable $Y$ is zero iff $X$ is a function of $Y$. (Proof: homework.)

$H(X|Y) = 0$ implies there is no uncertainty about $X$ if we know $Y$: for all $y$ with $p(y) > 0$, there is only one possible value of $x$ with $p(x,y) > 0$. We can estimate $X$ from $Y$ with zero probability of error iff $H(X|Y) = 0$.

We expect to be able to estimate $X$ with a low probability of error only if the conditional entropy $H(X|Y)$ is small. Fano's inequality quantifies this idea.

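Suppose we wish to estimate a r.v. $X$ with a distribution $p(x)$. We observe a r.v. $Y$ which is related to $X$ by the conditional distribution $p(y|x)$. From $Y$, we calculate a function $g(Y) = \hat{X}$, which is an estimate of $X$. We wish to bound the probability that $\hat{X} \ne X$. We observe that $X \to Y \to \hat{X}$ forms a Markov chain.

Define the probability of error

$$P_e = \Pr\{\hat{X} \ne X\} = \Pr\{g(Y) \ne X\}.$$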

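Theorem (Fano's inequality): For any estimator $\hat{X}$ such that $X \to Y \to \hat{X}$, with $P_e = \Pr(X \ne \hat{X})$, we have

$$H(P_e) + P_e \log(|\mathcal{X}| - 1) \ge H(X|\hat{X}) \ge H(X|Y),$$

where $H(P_e)$ is the entropy of a binary r.v. with parameter $P_e$. This inequality can be weakened to

$$1 + P_e \log |\mathcal{X}| \ge H(X|Y) \quad \text{or} \quad P_e \ge \frac{H(X|Y) - 1}{\log |\mathcal{X}|}.$$

Remark: $P_e = 0 \Rightarrow H(X|Y) = 0$.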

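Proof: Define an error r.v.

$$E = \begin{cases} 1 & \text{if } \hat{X} \ne X, \\ 0 & \text{if } \hat{X} = X. \end{cases}$$

By the chain rule for entropies, we have

$$H(E, X|\hat{X}) = H(X|\hat{X}) + \underbrace{H(E|X, \hat{X})}_{=\,0} = \underbrace{H(E|\hat{X})}_{\le\, H(P_e)} + \underbrace{H(X|E, \hat{X})}_{\le\, P_e \log(|\mathcal{X}| - 1)}.$$

Since conditioning reduces entropy, $H(E|\hat{X}) \le H(E) = H(P_e)$. Now, since $E$ is a function of $X$ and $\hat{X}$, $H(E|X, \hat{X}) = 0$. Since $E$ is a binary-valued r.v., $H(E) = H(P_e)$.

The remaining term $H(X|E, \hat{X})$ can be bounded as follows:

$$H(X|E, \hat{X}) = \Pr(E = 0)\, H(X|\hat{X}, E = 0) + \Pr(E = 1)\, H(X|\hat{X}, E = 1) \le (1 - P_e) \cdot 0 + P_e \log(|\mathcal{X}| - 1).$$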

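This holds since, given $E = 0$, $X = \hat{X}$, and given $E = 1$, we can upper bound the conditional entropy by the log of the number of remaining outcomes, $|\mathcal{X}| - 1$. Hence

$$H(P_e) + P_e \log(|\mathcal{X}| - 1) \ge H(X|\hat{X}).$$

By the data processing inequality, we have $I(X;\hat{X}) \le I(X;Y)$ since $X \to Y \to \hat{X}$, and therefore $H(X|\hat{X}) \ge H(X|Y)$. Thus we have

$$H(P_e) + P_e \log(|\mathcal{X}| - 1) \ge H(X|\hat{X}) \ge H(X|Y).$$

Remark: Suppose there is no knowledge of $Y$. Thus $X$ must be guessed without any information. Let $X \in \{1, 2, \ldots, m\}$ and $p_1 \ge p_2 \ge \cdots \ge p_m$. Then the best guess of $X$ is $\hat{X} = 1$ and the resulting probability of error is $P_e = 1 - p_1$. Fano's inequality becomes

$$H(P_e) + P_e \log(m - 1) \ge H(X).$$

The probability mass function

$$(p_1, p_2, \ldots, p_m) = \Big(1 - P_e, \frac{P_e}{m - 1}, \ldots, \frac{P_e}{m - 1}\Big)$$

achieves this bound with equality.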
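The closing remark, that Fano's bound is met with equality by the pmf $(1 - P_e, P_e/(m-1), \ldots, P_e/(m-1))$, can be confirmed numerically. This sketch assumes the illustrative values $m = 4$ and $P_e = 0.3$, and adds a second, hypothetical pmf with the same best-guess error to show a strict inequality.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def fano_bound(p_error, m):
    """H(P_e) + P_e log2(m - 1): Fano's bound when X is guessed with no side information."""
    return entropy([p_error, 1 - p_error]) + p_error * math.log2(m - 1)

m, p_error = 4, 0.3  # illustrative alphabet size and error probability (assumptions)
# Extremal pmf from the remark: the error mass is spread evenly over the other m - 1 symbols.
extremal = [1 - p_error] + [p_error / (m - 1)] * (m - 1)
# Hypothetical pmf with the same p_1 = 0.7 but an uneven tail.
skewed = [0.7, 0.2, 0.1, 0.0]

bound = fano_bound(p_error, m)
```

The uneven tail has lower entropy than the bound, so the inequality is strict there; only the even spread attains it.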

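Some Properties of the Relative Entropy

1. Let $\mu_n$ and $\mu_n'$ be two probability distributions on the state space of a Markov chain at time $n$, and let $\mu_{n+1}$ and $\mu_{n+1}'$ be the corresponding distributions at time $n + 1$. Let the corresponding joint mass functions be denoted by $p$ and $q$. That is,

$$p(x_n, x_{n+1}) = p(x_n)\, r(x_{n+1}|x_n), \qquad q(x_n, x_{n+1}) = q(x_n)\, r(x_{n+1}|x_n),$$

where $r(\cdot|\cdot)$ is the probability transition function for the Markov chain.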

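Then, by the chain rule for relative entropy, we have the following two expansions:

$$D\big(p(x_n, x_{n+1})\,\|\,q(x_n, x_{n+1})\big) = D\big(p(x_n)\,\|\,q(x_n)\big) + D\big(p(x_{n+1}|x_n)\,\|\,q(x_{n+1}|x_n)\big)$$
$$= D\big(p(x_{n+1})\,\|\,q(x_{n+1})\big) + D\big(p(x_n|x_{n+1})\,\|\,q(x_n|x_{n+1})\big).$$

Since both $p$ and $q$ are derived from the same Markov chain, $p(x_{n+1}|x_n) = q(x_{n+1}|x_n) = r(x_{n+1}|x_n)$, and hence

$$D\big(p(x_{n+1}|x_n)\,\|\,q(x_{n+1}|x_n)\big) = 0.$$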

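That is,

$$D\big(p(x_n)\,\|\,q(x_n)\big) = D\big(p(x_{n+1})\,\|\,q(x_{n+1})\big) + D\big(p(x_n|x_{n+1})\,\|\,q(x_n|x_{n+1})\big).$$

Since $D\big(p(x_n|x_{n+1})\,\|\,q(x_n|x_{n+1})\big) \ge 0$,

$$D\big(p(x_n)\,\|\,q(x_n)\big) \ge D\big(p(x_{n+1})\,\|\,q(x_{n+1})\big), \quad \text{or} \quad D(\mu_n\|\mu_n') \ge D(\mu_{n+1}\|\mu_{n+1}').$$

Conclusion: The distance between the probability mass functions is decreasing with time $n$ for any Markov chain.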
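The monotone decrease of $D(\mu_n\|\mu_n')$ can be watched on a small chain. The following is a sketch under stated assumptions: the 3-state transition matrix `R` and the two starting distributions are hypothetical, chosen with strictly positive entries so that every relative entropy is finite.

```python
import math

# Hypothetical 3-state transition matrix r(x_{n+1} | x_n); each row sums to 1.
R = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def step(mu):
    """One step of the chain: mu_{n+1}(j) = sum_i mu_n(i) R[i][j]."""
    return [sum(mu[i] * R[i][j] for i in range(len(mu))) for j in range(len(R[0]))]

def kl_divergence(p, q):
    """D(p||q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

mu, mu_prime = [0.9, 0.05, 0.05], [0.1, 0.2, 0.7]  # two arbitrary starting distributions
distances = []
for _ in range(10):
    distances.append(kl_divergence(mu, mu_prime))
    mu, mu_prime = step(mu), step(mu_prime)
```

The list `distances` holds $D(\mu_n\|\mu_n')$ for ten consecutive time steps; each entry is no larger than the one before it.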

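2. The relative entropy $D(\mu_n\|\mu)$ between a distribution $\mu_n$ on the states at time $n$ and a stationary distribution $\mu$ decreases with $n$.

In the last equation, if we let $\mu_n'$ be any stationary distribution $\mu$, then $\mu_{n+1}'$ is the same stationary distribution. Hence

$$D(\mu_n\|\mu) \ge D(\mu_{n+1}\|\mu).$$

Any state distribution gets closer and closer to each stationary distribution as time passes:

$$\lim_{n \to \infty} D(\mu_n\|\mu) = 0.$$

Monotonicity alone only guarantees that the limit exists; that it equals zero requires additional conditions on the chain (for example, a finite irreducible aperiodic chain, whose stationary distribution is unique).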

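3. Def: A probability transition matrix $[P_{ij}]$, $P_{ij} = \Pr\{X_{n+1} = j \mid X_n = i\}$, is called doubly stochastic if

$$\sum_{i} P_{ij} = 1, \ j = 1, 2, \ldots \quad \text{and} \quad \sum_{j} P_{ij} = 1, \ i = 1, 2, \ldots$$

The uniform distribution is a stationary distribution of $P$ iff the probability transition matrix is doubly stochastic.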
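The link between doubly stochastic matrices and the uniform stationary distribution can be illustrated with two small matrices. This is a sketch with hypothetical 3-state matrices: `doubly` has unit row and column sums, while `row_only` is a valid transition matrix whose columns do not sum to 1.

```python
def step(mu, P):
    """One step of the chain: (mu P)(j) = sum_i mu(i) P[i][j]."""
    return [sum(mu[i] * P[i][j] for i in range(len(mu))) for j in range(len(P[0]))]

def is_doubly_stochastic(P, tol=1e-12):
    """True if every row and every column of P sums to 1 (within tol)."""
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in P)
    cols_ok = all(abs(sum(row[j] for row in P) - 1.0) < tol for j in range(len(P[0])))
    return rows_ok and cols_ok

doubly = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
row_only = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.5, 0.25, 0.25]]
uniform = [1 / 3] * 3

after_doubly = step(uniform, doubly)      # stays uniform
after_row_only = step(uniform, row_only)  # drifts away from uniform
```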

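4. The conditional entropy $H(X_n|X_1)$ increases with $n$ for a stationary Markov process.

If the Markov process is stationary, then $H(X_n)$ is constant, so the entropy is non-increasing. However, it can be proved that $H(X_n|X_1)$ increases with $n$. This implies that the conditional uncertainty of the future increases.

Proof:

$$H(X_n|X_1) \ge H(X_n|X_1, X_2) \quad \text{(conditioning reduces entropy)}$$
$$= H(X_n|X_2) \quad \text{(by Markovity)}$$
$$= H(X_{n-1}|X_1) \quad \text{(by stationarity)}.$$

Similarly, $H(X_0|X_n)$ is increasing in $n$ for any Markov chain.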

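Sufficient Statistics

Suppose we have a family of probability mass functions $\{f_\theta(x)\}$ indexed by $\theta$, and let $X$ be a sample from a distribution in this family. Let $T(X)$ be any statistic (function of the sample), like the sample mean or sample variance. Then

$$\theta \to X \to T(X),$$

and by the data processing inequality, we have

$$I(\theta; T(X)) \le I(\theta; X)$$

for any distribution on $\theta$. However, if equality holds, no information is lost.

A statistic $T(X)$ is called sufficient for $\theta$ if it contains all the information in $X$ about $\theta$.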

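Def: A function $T(X)$ is said to be a sufficient statistic relative to the family $\{f_\theta(x)\}$ if $X$ is independent of $\theta$ given $T(X)$, i.e., $\theta \to T(X) \to X$ forms a Markov chain.

Or: $I(\theta; X) = I(\theta; T(X))$ for all distributions on $\theta$.

Sufficient statistics preserve mutual information.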

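Some Examples of Sufficient Statistics

1. Let $X_1, X_2, \ldots, X_n$, $X_i \in \{0, 1\}$, be an i.i.d. sequence of coin tosses of a coin with unknown parameter $\theta = \Pr(X_i = 1)$. Given $n$, the number of 1's is a sufficient statistic for $\theta$. Here

$$T(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i.$$

Given $T$, all sequences having that many 1's are equally likely and independent of the parameter $\theta$.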

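$$\Pr\Big\{(X_1, X_2, \ldots, X_n) = (x_1, x_2, \ldots, x_n) \,\Big|\, \sum_{i=1}^{n} X_i = k\Big\} = \begin{cases} \dfrac{1}{\binom{n}{k}} & \text{if } \sum_{i} x_i = k, \\ 0 & \text{otherwise.} \end{cases}$$

Thus $\theta \to \sum_{i} X_i \to (X_1, X_2, \ldots, X_n)$, and $T$ is a sufficient statistic for $\theta$.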
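The coin-toss example can be checked by enumeration: the conditional distribution of the sequence given the number of 1's should be uniform over $\binom{n}{k}$ sequences regardless of $\theta$. The sketch below assumes the illustrative values $n = 5$, $k = 2$ and two arbitrary biases; the function name is hypothetical.

```python
import math
from itertools import product

def conditional_given_count(theta, n, k):
    """Pr{(X_1..X_n) = x | sum X_i = k} for each binary x with k ones, Bernoulli(theta) tosses."""
    seq_prob = {
        x: theta ** sum(x) * (1 - theta) ** (n - sum(x))
        for x in product((0, 1), repeat=n)
    }
    mass_k = sum(p for x, p in seq_prob.items() if sum(x) == k)
    return {x: p / mass_k for x, p in seq_prob.items() if sum(x) == k}

n, k = 5, 2  # illustrative sequence length and count of ones
cond_a = conditional_given_count(0.2, n, k)  # strongly biased coin
cond_b = conditional_given_count(0.7, n, k)  # differently biased coin
```

Both dictionaries assign probability $1/10$ to each of the 10 sequences with two 1's, so once the count is known the sequence carries no further information about $\theta$.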

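2. If $X$ is normally distributed with mean $\theta$ and variance 1, that is, if

$$f_\theta(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - \theta)^2 / 2} = N(\theta, 1),$$

and $X_1, X_2, \ldots, X_n$ are drawn independently according to $f_\theta$, a sufficient statistic for $\theta$ is the sample mean

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

It can be verified that the conditional distribution of $X_1, X_2, \ldots, X_n$ given $\bar{X}_n$ and $n$ does not depend on $\theta$.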

The minimal sufficient statistic is a sufficient statistic that is a function of all other sufficient statistics.

Def: A statistic $T(X)$ is a minimal sufficient statistic relative to $\{f_\theta(x)\}$ if it is a function of every other sufficient statistic $U$:

$$\theta \to T(X) \to U(X) \to X.$$

Hence, a minimal sufficient statistic maximally compresses the information about $\theta$ in the sample. Other sufficient statistics may contain additional irrelevant information.

The sufficient statistics of the above examples are minimal.

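Shuffles increase entropy: If $T$ is a shuffle (permutation) of a deck of cards and $X$ is the initial (random) position of the cards in the deck, and if the choice of the shuffle $T$ is independent of $X$, then

$$H(TX) \ge H(X),$$

where $TX$ is the permutation of the deck induced by the shuffle $T$ on the initial permutation $X$.

Proof:

$$H(TX) \ge H(TX|T) = H(T^{-1}TX|T) \ \text{(why?)} = H(X|T) = H(X) \ \text{if } X \text{ and } T \text{ are independent.}$$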

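If $X$ and $X'$ are i.i.d. with entropy $H(X)$, then

$$\Pr(X = X') \ge 2^{-H(X)},$$

with equality iff $X$ has a uniform distribution.

Pf: Suppose $X \sim p(x)$. The probability that $X = X'$ is given by $\Pr(X = X') = \sum_{x} p^2(x)$. Notice that the function $f(y) = 2^y$ is convex. By Jensen's inequality, we have $2^{E \log p(X)} \le E\, 2^{\log p(X)}$, which implies that

$$2^{-H(X)} = 2^{\sum_{x} p(x) \log p(x)} \le \sum_{x} p(x)\, 2^{\log p(x)} = \sum_{x} p^2(x) = \Pr(X = X').$$

Let $X$, $X'$ be independent with $X \sim p(x)$, $X' \sim r(x)$, $x, x' \in \mathcal{X}$. Then

$$\Pr(X = X') \ge 2^{-H(p) - D(p\|r)}, \qquad \Pr(X = X') \ge 2^{-H(r) - D(r\|p)}.$$

Pf:

$$2^{-H(p) - D(p\|r)} = 2^{\sum_{x} p(x) \log p(x) + \sum_{x} p(x) \log \frac{r(x)}{p(x)}} = 2^{\sum_{x} p(x) \log r(x)} \le \sum_{x} p(x)\, 2^{\log r(x)} = \sum_{x} p(x)\, r(x) = \Pr(X = X').$$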
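The collision-probability bounds can be evaluated for concrete pmfs. This is a minimal sketch, assuming two hypothetical pmfs `p` and `r` with full support on four symbols; the function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """D(p||q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def collision_probability(p, r=None):
    """Pr(X = X') for independent X ~ p and X' ~ r (r defaults to p)."""
    r = p if r is None else r
    return sum(pi * ri for pi, ri in zip(p, r))

p = [0.5, 0.25, 0.125, 0.125]  # hypothetical non-uniform pmf
r = [0.25, 0.25, 0.25, 0.25]   # uniform pmf on the same four symbols

same_law = collision_probability(p)      # sum of p(x)^2
uniform_law = collision_probability(r)   # equality case of the first bound
cross = collision_probability(p, r)      # X ~ p, X' ~ r
```

For the non-uniform `p` the collision probability strictly exceeds $2^{-H(p)}$, while for the uniform `r` the two sides coincide at $1/4$.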