Entropy and Large Deviations
S.R.S. Varadhan, Courant Institute, NYU
Michigan State University, East Lansing, March 31, 2015
Entropy comes up in many different contexts.

In physics it was introduced by Rudolf Clausius in 1865, in connection with heat transfer and the relation between heat and work: classical thermodynamics. There it is a bulk quantity.

Boltzmann, around 1877, defined entropy as $c \log |\Omega|$, where $\Omega$ is the set of microstates that correspond to a given macrostate and $|\Omega|$ is its size.
Shannon's entropy, 1948. A Mathematical Theory of Communication: the beginning of modern information theory.

Given $p = (p_1, \ldots, p_k)$,
$$H(p) = -\sum_i p_i \log p_i$$

Conditional probability: $P[X = i \mid \Sigma] = p_i(\omega)$, with $E[p_i(\omega)] = p_i$.
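As a quick numerical companion (not part of the talk), here is Shannon's entropy of a finite distribution in Python; the distributions below are arbitrary examples.

```python
import numpy as np

def H(p):
    """Shannon entropy in nats, with the 0 log 0 = 0 convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(H([0.5, 0.3, 0.2]))              # entropy of an assumed example
print(H([1/3, 1/3, 1/3]), np.log(3))   # uniform case: H = log k
```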
Conditional entropy:
$$H(\omega) = -\sum_i p_i(\omega) \log p_i(\omega), \qquad E[H(\omega)] = E\Big[-\sum_i p_i(\omega) \log p_i(\omega)\Big]$$
Concavity:
$$H = -\sum_i p_i \log p_i \ \ge\ E\Big[-\sum_i p_i(\omega) \log p_i(\omega)\Big]$$

If $\{X_i\}$ is a stationary stochastic process, set $p_i(\omega) = P[X_1 = i \mid X_0, X_{-1}, \ldots, X_{-n}, \ldots]$. The entropy of the process is
$$H[P] = E^{P}\Big[-\sum_i p_i(\omega) \log p_i(\omega)\Big]$$

The conditioning is with respect to the past history.
Look at $p(x_1, \ldots, x_n)$ and
$$H_n(P) = -\sum_{x_1,\ldots,x_n} p(x_1,\ldots,x_n) \log p(x_1,\ldots,x_n)$$

$H_{n+m} \le H_n + H_m$, so $\frac{H_n}{n}$ is decreasing; $H_{n+1} - H_n$ is decreasing as well.
$$H(P) = \lim_{n \to \infty} \frac{H_n}{n} = \lim_{n \to \infty} \,[H_{n+1} - H_n]$$
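A small numerical check (not from the talk): for a two-state Markov chain, with the transition matrix below an assumed example, $H_n/n$ decreases to the entropy rate, while $H_{n+1} - H_n$ equals the rate already for every $n$, since the chain has one-step memory.

```python
import itertools
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])            # transition matrix (assumed example)
pi = np.array([0.8, 0.2])             # its stationary distribution: pi P = pi

def H_n(n):
    """Entropy of the first n observations of the stationary chain."""
    total = 0.0
    for x in itertools.product(range(2), repeat=n):
        p = pi[x[0]]
        for a, b in zip(x, x[1:]):
            p *= P[a, b]
        total -= p * np.log(p)
    return total

# entropy rate: -sum_i pi_i sum_j P[i,j] log P[i,j]
rate = -sum(pi[i] * P[i, j] * np.log(P[i, j])
            for i in range(2) for j in range(2))
H = [H_n(n) for n in range(1, 13)]
for n in range(1, 12):
    print(n, H[n - 1] / n, H[n] - H[n - 1])   # first column decreases to rate
print("entropy rate:", rate)
```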
Dynamical system $(X, \Sigma, T, P)$; $T: X \to X$ preserves $P$.

Example: the shift $T$ in a space of sequences, with $P$ any stationary process.

An isomorphism between $(X, \Sigma, T, P)$ and $(X', \Sigma', T', P')$ is a map $S: X \to X'$ with $ST = T'S$ and $PS^{-1} = P'$.

Spectral invariants: $(Uf)(x) = f(Tx)$ is a unitary map in $L^2(X, \Sigma, P)$, and an isomorphism $S$ induces a unitary $V$ intertwining the two: $VU' = UV$.
Entropy of a dynamical system $(\Omega, \Sigma, P, T)$:

Each function $f$ taking a finite set of values generates a stochastic process $P_f$, the distribution of $\{f(T^n x)\}$.
$$h(\Omega, \Sigma, P, T) = \sup_f h(P_f)$$

It is an invariant, but not a spectral one: $H(T^2, P) = 2H(T, P)$.

It is computable: for a product measure $P = \Pi p$, $H(P) = h(p)$.
Isomorphism Theorem of Ornstein: if $h(p) = h(q)$, then the dynamical systems with product measures $P, Q$, i.e. $(F, P)$ and $(G, Q)$, are isomorphic.
Shannon's entropy and coding:

Code in such a way that the best compression is achieved.

The incoming data stream has stationary statistics $P$.

The coding is to be done into words in an alphabet of size $r$.

The compression factor is $c = \frac{H(P)}{\log r}$.
The number of $n$-tuples is $k^n$.

Shannon-McMillan-Breiman theorem: if $P$ is stationary and ergodic, $P(E_n) \to 1$, where
$$E_n = \Big\{ \Big| -\tfrac{1}{n} \log p(x_1, \ldots, x_n) - H(P) \Big| \le \epsilon \Big\}$$

Almost the entire probability under $P$ is carried by nearly $e^{nH(P)}$ $n$-tuples of more or less equal probability.

These can be coded by words of length $m$ provided $r^m = e^{nH(P)}$, so $c = \frac{m}{n} = \frac{H(P)}{\log r}$.
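A quick Monte Carlo illustration (not from the talk), for an assumed i.i.d. source: the normalized log-likelihood $-\frac{1}{n}\log p(x_1,\ldots,x_n)$ concentrates near $H(P)$, so $P(E_n) \to 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])          # assumed marginal distribution
H = -np.sum(p * np.log(p))
n, trials, eps = 2000, 10000, 0.02

x = rng.choice(len(p), size=(trials, n), p=p)   # i.i.d. strings
score = -np.log(p[x]).mean(axis=1)              # -(1/n) log p(x_1,...,x_n)
print("H(P) =", H)
print("P(E_n) ~", np.mean(np.abs(score - H) <= eps))   # close to 1
```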
There are $k^n$ sequences of length $n$ from an alphabet of size $k$. How many look like $P$?
$$\exp[nH(P) + o(n)]$$

$H(P)$ is maximized when $P = P_0$, the uniform distribution on $F^n$, and $H(P_0) = \log k$.
Kullback-Leibler information, or relative entropy (1951):
$$H(q; p) = \sum_i q_i \log \frac{q_i}{p_i}$$

If $p$ is the uniform distribution with mass $\frac{1}{k}$ at every point, then $h(q : p) = \log k - h(q)$.

For two probability densities,
$$H(g; f) = \int g(x) \log \frac{g(x)}{f(x)} \, dx$$
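A small check (not from the talk) of the identity above, with an arbitrary example $q$: when $p$ is uniform on $k$ points, $H(q; p) = \log k - H(q)$.

```python
import numpy as np

def kl(q, p):
    """Relative entropy H(q; p) for distributions with full support."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.sum(q * np.log(q / p))

q = np.array([0.7, 0.2, 0.1])
p = np.full(3, 1 / 3)                  # uniform on k = 3 points
H_q = -np.sum(q * np.log(q))
print(kl(q, p), np.log(3) - H_q)       # the two values agree
```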
For two probability measures,
$$H(\mu, \lambda) = \int \log \frac{d\mu}{d\lambda} \, d\mu = \int \frac{d\mu}{d\lambda} \log \frac{d\mu}{d\lambda} \, d\lambda$$

For two stationary processes,
$$H(Q, P) = E^{Q}[H(Q_\Sigma; P_\Sigma)]$$
with $Q_\Sigma, P_\Sigma$ the conditional distributions given the past $\Sigma$.

This has issues! It is OK if $P_\Sigma$ is globally defined.
$\alpha, \beta$ are probability measures on $X$, and $P = \Pi\alpha$.
$$r_n(dx) = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}$$

Sanov's Theorem:
$$P[r_n(dx) \simeq \beta] = \exp[-nH(\beta; \alpha) + o(n)]$$
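A Monte Carlo sanity check (not from the talk) of Sanov's theorem for coin tossing: with $\alpha = (\frac12, \frac12)$ and the event that the empirical measure puts mass at least $\frac34$ on heads, the probability behaves like $\exp[-nH(\beta; \alpha)]$ with $\beta = (\frac34, \frac14)$. The value $n = 50$ is chosen small enough that the event is still observable by direct simulation, so $o(1)$ corrections remain visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 1_000_000
heads = rng.random((trials, n)) < 0.5
freq = heads.mean(axis=1)
p_hat = np.mean(freq >= 0.75)          # direct estimate of P[A]

H = 0.75 * np.log(0.75 / 0.5) + 0.25 * np.log(0.25 / 0.5)   # H(beta; alpha)
print("(1/n) log P[A] ~", np.log(p_hat) / n)   # roughly -H, up to o(1)
print("-H(beta; alpha) =", -H)
```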
Large Deviations:

For closed sets $C$,
$$\limsup_{n \to \infty} \frac{1}{n} \log P_n[C] \le -\inf_{x \in C} I(x)$$

For open sets $G$,
$$\liminf_{n \to \infty} \frac{1}{n} \log P_n[G] \ge -\inf_{x \in G} I(x)$$

$I$ is lower semicontinuous and has compact level sets.
Sanov's theorem is an example: $P_n$ on $\mathcal{M}(X)$ with the weak topology is the distribution of $r_n(dx)$ under $\Pi\alpha$, and $I(\beta) = h(\beta : \alpha)$.
$P$ is a stationary process, and we have a string of $n$ observations.

The ergodic theorem says most sequences resemble $P$.

What is the probability that they resemble $Q$ instead?
$$\exp[-nH(Q; P)]$$

If $P = P_0$ then $H(Q; P) = \log k - H(Q)$.

This is OK for nice $P$.
Functional analysis: $f \ge 0$, $\int f \, d\lambda = 1$.
$$\frac{d}{dp}\Big[\int f^p \, d\lambda\Big]\Big|_{p=1} = \int f \log f \, d\lambda$$
There is an analog of Hölder's inequality in the limit as $p \to 1$.

The duality, for $x, y \in \mathbb{R}$,
$$xy \le \frac{x^p}{p} + \frac{y^q}{q}, \qquad \frac{x^p}{p} = \sup_y \Big[xy - \frac{y^q}{q}\Big], \qquad \frac{y^q}{q} = \sup_x \Big[xy - \frac{x^p}{p}\Big]$$

becomes, for $x > 0$, $y \in \mathbb{R}$,
$$x \log x - x = \sup_y [xy - e^y], \qquad e^y = \sup_{x > 0} [xy - (x \log x - x)]$$
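A small numerical check (not from the talk) of the $p \to 1$ duality: $\sup_y [xy - e^y]$ is attained at $y = \log x$ and equals $x \log x - x$. The value of $x$ is arbitrary.

```python
import numpy as np

x = 2.7
ys = np.linspace(-5, 5, 200001)        # fine grid containing y = log(x)
lhs = np.max(x * ys - np.exp(ys))      # sup over the grid
print(lhs, x * np.log(x) - x)          # agree to grid accuracy
```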
Hölder's inequality becomes a generalized Jensen's inequality:
$$\int g \, d\mu = \int fg \, d\lambda \le \log \int e^g \, d\lambda + H(\mu, \lambda)$$

Take $g = c\,\chi_A(x)$ and optimize over $c > 0$: $c = \log \frac{1}{\lambda(A)}$.

Then $e^g = \chi_{A^c} + \frac{1}{\lambda(A)} \chi_A$, so $\int e^g \, d\lambda \le 2$, and
$$\mu(A) \le \frac{H(\mu, \lambda) + \log 2}{\log \frac{1}{\lambda(A)}}$$
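A finite-space check (not from the talk) of the resulting entropy inequality; the two measures and the set $A$ are assumed example values, with $\lambda(A)$ small.

```python
import numpy as np

lam = np.array([0.001, 0.009, 0.24, 0.75])   # reference measure (assumed)
mu  = np.array([0.30, 0.30, 0.20, 0.20])     # another measure (assumed)
A   = np.array([True, True, False, False])   # a set with small lambda(A)

H = np.sum(mu * np.log(mu / lam))            # relative entropy H(mu, lambda)
bound = (H + np.log(2)) / np.log(1 / lam[A].sum())
print(mu[A].sum(), "<=", bound)              # inequality holds
```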
Entropy grows linearly, but the $L^p$ norms grow exponentially fast.

This is a useful inequality for interacting particle systems.
Statistical mechanics: probability distributions are often defined through an energy functional,
$$p_n(x_1, x_2, \ldots, x_n) = [c(n)]^{-1} \exp\Big[\sum_{i=1}^{n-k+1} F(x_i, x_{i+1}, \ldots, x_{i+k-1})\Big]$$
$$c_n = \sum_{x_1,\ldots,x_n} \exp\Big[\sum_{i=1}^{n-k+1} F(x_i, x_{i+1}, \ldots, x_{i+k-1})\Big]$$

There are roughly $\exp[nH(P)]$ sequences typical for a stationary $P$,
and each contributes roughly $\exp\big[n \int F \, dP\big]$.

The sum is therefore
$$\sim e^{n\left(\int F \, dP + H(P)\right)}$$
Then by Laplace asymptotics,
$$\lim_{n \to \infty} \frac{\log c_n}{n} = \sup_P \Big[\int F \, dP + H(P)\Big]$$

If the sup is attained at a unique $P$, then
$$P \simeq c^{-1} \exp\Big[\sum_{i=-\infty}^{\infty} F(x_i, x_{i+1}, \ldots, x_{i+k-1})\Big]$$

The averages $\frac{1}{n-k+1} \sum_{i=1}^{n-k+1} G(x_i, x_{i+1}, \ldots, x_{i+k-1})$ converge in probability under $p_n$ to $E^{P}[G(x_1, \ldots, x_k)]$.
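One concrete check (my example, not from the talk): for nearest-neighbor energies ($k = 2$) the limit can be computed independently as the log of the Perron eigenvalue of the transfer matrix $e^F$, which is the standard way of evaluating $\sup_P [\int F \, dP + H(P)]$ in this case; dynamic programming for $\frac{\log c_n}{n}$ should converge to it. The energy $F$ below is an assumed example.

```python
import numpy as np

F = np.array([[0.3, -0.2],
              [0.1,  0.5]])           # assumed energy F(a, b) on {0, 1}
M = np.exp(F)                          # transfer matrix

u = np.ones(2) / 2                     # normalized vector over states
log_c = np.log(2.0)                    # log c_1 (empty energy sum)
for n in range(2, 2001):               # c_n = 1^T M^(n-1) 1
    w = M @ u
    log_c += np.log(w.sum())           # accumulate scale to avoid overflow
    u = w / w.sum()
    if n % 500 == 0:
        print(n, log_c / n)

spectral_radius = np.max(np.abs(np.linalg.eigvals(M)))   # Perron root
print("log lambda_max:", np.log(spectral_radius))
```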
Log Sobolev inequality:

$T_t$ is a Markov semigroup on $(X, \Sigma)$: $T_t f \ge 0$ if $f \ge 0$, and $T_t 1 = 1$.

It is symmetric with respect to $\mu$, with Dirichlet form $D$.

For $f \ge 0$ with $\int f \, d\mu = 1$,
$$\int f \log f \, d\mu \le c\, D(\sqrt{f})$$

Along $f_t = T_t f$,
$$\frac{d}{dt} \int f_t \log f_t \, d\mu \le -\frac{1}{2} D(\sqrt{f_t}) \le -\frac{1}{2c} \int f_t \log f_t \, d\mu$$
This gives exponential decay of the entropy, spectral gap estimates, etc.

Infinitesimal version: for a family of densities $\{f(x, \theta)\}$,
$$H(\theta, \theta_0) = \int f(x, \theta) \log \frac{f(x, \theta)}{f(x, \theta_0)} \, d\mu$$

$H(\theta, \theta_0)$ has a minimum at $\theta_0$, and the Fisher information is its curvature there:
$$I(\theta_0) = \frac{\partial^2 H}{\partial \theta^2}\Big|_{\theta = \theta_0}$$
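A numerical illustration (my example, not from the talk): for the Gaussian location family $f(x, \theta) = N(\theta, 1)$ one has $H(\theta, \theta_0) = \frac{(\theta - \theta_0)^2}{2}$, so the second derivative at $\theta_0$ is $1$, the Fisher information of this family.

```python
import numpy as np

def kl(theta, theta0, xs):
    """KL divergence between N(theta,1) and N(theta0,1) by Riemann sum."""
    f = np.exp(-(xs - theta) ** 2 / 2) / np.sqrt(2 * np.pi)
    f0 = np.exp(-(xs - theta0) ** 2 / 2) / np.sqrt(2 * np.pi)
    dx = xs[1] - xs[0]
    return np.sum(f * np.log(f / f0)) * dx

xs = np.linspace(-10, 10, 20001)
t0, h = 0.0, 1e-3
second = (kl(t0 + h, t0, xs) - 2 * kl(t0, t0, xs) + kl(t0 - h, t0, xs)) / h**2
print(second)   # ~ 1.0, the Fisher information
```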
Optimizing entropy:

Toss a coin $n$ times. It came up heads $\frac{3n}{4}$ times.

Was it a steady lead or a burst of heads?

Look at $P\big[\frac{1}{n} S(nt) \simeq f(t)\big]$, where $S(m)$ is the number of heads among the first $m$ tosses.
With $f'(t) = p(t)$, this probability is
$$\approx e^{-n \int_0^1 h(p(t)) \, dt}$$

Minimize $\int_0^1 h(p(t)) \, dt$ subject to $\int_0^1 p(s) \, ds = \frac{3}{4}$, where
$$h(p) = \log 2 + p \log p + (1-p) \log(1-p)$$

By convexity the minimizer is constant, $p(s) \equiv \frac{3}{4}$, so $f(t) = \frac{3t}{4}$: a steady lead.
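A sketch of the same conclusion by direct optimization (not from the talk, and assuming scipy is available): discretize $p(t)$ on a grid and minimize $\int_0^1 h(p(t)) \, dt$ under the mean constraint. The grid size and starting point are my choices; convexity predicts the constant profile.

```python
import numpy as np
from scipy.optimize import minimize

m = 50                                    # grid points on [0, 1]

def h(p):
    return np.log(2) + p * np.log(p) + (1 - p) * np.log(1 - p)

objective = lambda p: np.mean(h(p))       # ~ int_0^1 h(p(t)) dt
constraint = {"type": "eq", "fun": lambda p: np.mean(p) - 0.75}

res = minimize(objective, x0=np.full(m, 0.6), constraints=[constraint],
               bounds=[(1e-6, 1 - 1e-6)] * m, method="SLSQP")
print(res.x.round(4))                     # all entries ~ 0.75
print(res.fun, h(0.75))                   # both ~ h(3/4)
```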
General philosophy:

$P(A)$ is to be estimated, and $P(A)$ is small.

Find $Q$ such that $Q(A) \simeq 1$. Then
$$P(A) = Q(A) \cdot \frac{1}{Q(A)} \int_A e^{-\log \frac{dQ}{dP}} \, dQ$$
By Jensen's inequality,
$$P(A) \ge Q(A) \exp\Big[-\frac{1}{Q(A)} \int_A \log \frac{dQ}{dP} \, dQ\Big]$$

Asymptotically, more or less,
$$P(A) \ge \exp[-h(Q : P)]$$

Optimize over $Q$.
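A small demonstration (not from the talk) of this philosophy as importance sampling: estimate the rare event $A = \{$at least $75\%$ heads in $n$ fair tosses$\}$ by sampling from a tilted $Q$ (heads probability $\frac34$) under which $A$ is typical, and compare with $\exp[-n\, h(\frac34 : \frac12)]$. The tilted $Q$ and the event are my choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, q = 200, 100_000, 0.75
heads = rng.random((trials, n)) < q       # samples from Q
k = heads.sum(axis=1)
on_A = k >= 0.75 * n                      # under Q, A is typical

# dP/dQ for a string with k heads: (0.5/0.75)^k * (0.5/0.25)^(n-k)
log_w = k * np.log(0.5 / q) + (n - k) * np.log(0.5 / (1 - q))
p_est = np.mean(np.where(on_A, np.exp(log_w), 0.0))   # unbiased for P(A)

rate = 0.75 * np.log(0.75 / 0.5) + 0.25 * np.log(0.25 / 0.5)
print("(1/n) log P(A) ~", np.log(p_est) / n)
print("-h(Q : P)      =", -rate)
```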
Example: a Markov chain $\{p_{i,j}\}$ with invariant distribution $p$.

What is the probability that the empirical distribution $r_n$ is close to $q$?

Find $\{q_{i,j}\}$ such that the invariant distribution is $q$; then
$$h(Q : P) = \sum_{i,j} q_i\, q_{i,j} \log \frac{q_{i,j}}{p_{i,j}}$$

Optimize over $\{q_{i,j}\}$ with $q$ as invariant distribution.

The best possible lower bound is also the upper bound.
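A sketch (not from the talk): evaluate the entropy cost $h(Q : P)$ of one candidate chain $\{q_{i,j}\}$ whose invariant distribution is the target $q$; the rate function is the infimum of this quantity over all such candidates. Both chains below are assumed examples.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])              # given chain (assumed example)
Q = np.array([[0.5, 0.5],
              [0.5, 0.5]])              # candidate chain (assumed example)
q = np.array([0.5, 0.5])                # target: invariant for Q
assert np.allclose(q @ Q, q)            # check q Q = q

h = sum(q[i] * Q[i, j] * np.log(Q[i, j] / P[i, j])
        for i in range(2) for j in range(2))
print("h(Q : P) =", h)                  # one admissible cost; optimize over Q
```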
Random graphs:

$p$ is the probability of an edge.

Condition on having more triangles than typical: $\binom{N}{3}\tau$ of them, with $\tau > p^3$.

Fix
$$\int f(x,y)\, f(y,z)\, f(z,x) \, dx \, dy \, dz = \tau$$

Minimize
$$H(f) = \int \Big[f(x,y) \log \frac{f(x,y)}{p} + (1 - f(x,y)) \log \frac{1 - f(x,y)}{1 - p}\Big] \, dx \, dy$$

Is the minimizer $f \equiv \tau^{\frac{1}{3}}$?
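A quick computation (not from the talk): the constant candidate $f \equiv \tau^{1/3}$ has triangle density exactly $\tau$, and its entropy cost is easy to evaluate. The values of $p$ and $\tau$ are assumed examples.

```python
import numpy as np

def H_const(f, p):
    """Entropy cost per unit area of a constant graphon f against p."""
    return f * np.log(f / p) + (1 - f) * np.log((1 - f) / (1 - p))

p, tau = 0.5, 0.2                      # tau > p^3 = 0.125 (assumed numbers)
f = tau ** (1 / 3)
print("triangle density:", f ** 3, " cost H(f):", H_const(f, p))
```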
THANK YOU