Entropy and Large Deviations

S.R.S. Varadhan
Courant Institute, NYU
Michigan State University, East Lansing, March 31, 2015

Entropy comes up in many different contexts.

In Physics, introduced by Rudolf Clausius in 1865, in connection with heat transfer and the relation between heat and work. Classical thermodynamics. A bulk quantity.

Boltzmann, around 1877, defined entropy as $c\log|\Omega|$, where $\Omega$ is the set of micro states that correspond to a given macro state and $|\Omega|$ is its size.

Shannon's Entropy. 1948. Mathematical theory of communication. Modern information theory.

Given $p = (p_1,\ldots,p_k)$, $H(p) = -\sum_i p_i \log p_i$.

Conditional probability: $P[X = i \mid \Sigma] = p_i(\omega)$, with $E[p_i(\omega)] = p_i$.
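
As a concrete illustration, a minimal Python sketch of the Shannon entropy of a finite distribution (the function name and the example vectors are my own, not from the talk):

    import numpy as np

    def shannon_entropy(p):
        # H(p) = -sum_i p_i log p_i, with the convention 0 log 0 = 0
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    print(shannon_entropy([0.5, 0.5]))   # log 2 ~ 0.693, the maximum for k = 2
    print(shannon_entropy([0.9, 0.1]))   # ~ 0.325, a more predictable source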

Conditional Entropy:
$H(\omega) = -\sum_i p_i(\omega)\log p_i(\omega)$, $\qquad E[H(\omega)] = E\big[-\sum_i p_i(\omega)\log p_i(\omega)\big]$

Concavity:
$H = -\sum_i p_i\log p_i \;\ge\; E\big[-\sum_i p_i(\omega)\log p_i(\omega)\big]$

$\{X_i\}$ is a stationary stochastic process, $p_i(\omega) = P[X_1 = i \mid X_0, X_{-1},\ldots,X_{-n},\ldots]$.

$H(P) = E^{P}\big[-\sum_i p_i(\omega)\log p_i(\omega)\big]$

Conditioning is with respect to the past history.

Look at $p(x_1,\ldots,x_n)$.

$H_n(P) = -\sum_{x_1,\ldots,x_n} p(x_1,\ldots,x_n)\log p(x_1,\ldots,x_n)$

$H_{n+m} \le H_n + H_m$; $H_n$ is subadditive and non-decreasing.

$H_{n+1} - H_n$ is non-increasing.

$H(P) = \lim_{n\to\infty}\frac{H_n}{n} = \lim_{n\to\infty}\,[H_{n+1} - H_n]$
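
A small numerical sketch of these two limits for a stationary two-state Markov chain (the transition matrix Q and stationary vector pi below are made-up examples): both $H_n/n$ and $H_{n+1}-H_n$ approach the same entropy rate, and for a Markov chain the differences settle essentially immediately.

    import itertools
    import numpy as np

    Q = np.array([[0.9, 0.1],
                  [0.4, 0.6]])          # transition probabilities
    pi = np.array([0.8, 0.2])           # stationary distribution: pi Q = pi

    def H_n(n):
        # joint entropy of (X_1, ..., X_n) for the stationary chain
        total = 0.0
        for path in itertools.product([0, 1], repeat=n):
            p = pi[path[0]]
            for a, b in zip(path, path[1:]):
                p *= Q[a, b]
            total -= p * np.log(p)
        return total

    rate = -sum(pi[i] * Q[i, j] * np.log(Q[i, j])
                for i in range(2) for j in range(2))
    for n in range(1, 11):
        print(n, H_n(n) / n, H_n(n + 1) - H_n(n), rate)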

Dynamical system $(X,\Sigma,T,P)$. $T : X \to X$ preserves $P$.

Shift $T$ in a space of sequences; $P$ any stationary process.

Isomorphism between $(X,\Sigma,T,P)$ and $(X',\Sigma',T',P')$: $S : X \to X'$ with $ST = T'S$, $PS^{-1} = P'$.

Spectral invariant: $(Uf)(x) = f(Tx)$ is a unitary map in $L^2(X,\Sigma,P)$; an isomorphism gives $VU = U'V$.

Entropy of a dynamical system $(\Omega,\Sigma,P,T)$.

Each function $f$ taking a finite set of values generates a stochastic process $P_f$, the distribution of $\{f(T^n x)\}$.

$h(\Omega,\Sigma,P,T) = \sup_f h(P_f)$

An invariant, but not a spectral one. $H(T^2,P) = 2H(T,P)$.

Computable: for a product (Bernoulli) measure $P = \Pi p$, the entropy is $H(p)$.

Isomorphism Theorem of Ornstein.

If $h(P) = h(Q)$, then the dynamical systems with product measures $P, Q$, i.e. $(F,P)$ and $(G,Q)$, are isomorphic.

Shannon's entropy and coding.

Code in such a way that the best compression is achieved.

The incoming data stream has stationary statistics $P$.

Coding is to be done into words in an alphabet of size $r$.

The compression factor is $c = \dfrac{H(P)}{\log r}$.

The number of $n$-tuples is $k^n$.

Shannon–McMillan–Breiman theorem: if $P$ is stationary and ergodic, $P(E_n) \to 1$ where
$E_n = \big\{\,\big|-\tfrac{1}{n}\log p(x_1,\ldots,x_n) - H(P)\big| \le \epsilon\,\big\}$

Almost the entire probability under $P$ is carried by nearly $e^{nH(P)}$ $n$-tuples of more or less equal probability.

$r^m = e^{nH(P)}$, so $c = \dfrac{m}{n} = \dfrac{H(P)}{\log r}$.
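
A numerical sketch of this typical-set statement for an i.i.d. binary source (the parameters q, n, and eps are arbitrary choices): the set $E_n$ carries almost all of the probability, and $\frac{1}{n}\log|E_n|$ is close to $H(P)$.

    import numpy as np
    from scipy.stats import binom
    from scipy.special import gammaln

    q, n, eps = 0.3, 200, 0.05
    H = -q * np.log(q) - (1 - q) * np.log(1 - q)

    k = np.arange(n + 1)
    # -(1/n) log p(x_1,...,x_n) depends only on the number of ones k
    neglogp = -(k * np.log(q) + (n - k) * np.log(1 - q)) / n
    typical = np.abs(neglogp - H) <= eps

    log_comb = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)   # log C(n, k)
    print("P(E_n)          :", binom.pmf(k[typical], n, q).sum())     # close to 1
    print("(1/n) log |E_n| :", np.log(np.exp(log_comb[typical]).sum()) / n)
    print("H(P)            :", H)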

There are $k^n$ sequences of length $n$ from an alphabet of size $k$.

How many look like $P$? $\exp[\,nH(P) + o(n)\,]$

$H(P)$ is maximized when $P = P_0$, the uniform distribution on $F^n$.

$H(P_0) = \log k$

Kullback–Leibler information, Relative Entropy (1951):

$H(q;p) = \sum_i q_i\log\dfrac{q_i}{p_i}$

If $p$ is the uniform distribution with mass $\frac{1}{k}$ at every point, then $H(q;p) = \log k - H(q)$.

For two probability densities, $H(g;f) = \int g(x)\log\dfrac{g(x)}{f(x)}\,dx$.
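
As a quick sanity check of the relation to Shannon entropy when $p$ is uniform (the function name kl and the vector q are illustrative):

    import numpy as np

    def kl(q, p):
        # relative entropy H(q; p) = sum_i q_i log(q_i / p_i), with 0 log 0 = 0
        q, p = np.asarray(q, float), np.asarray(p, float)
        mask = q > 0
        return np.sum(q[mask] * np.log(q[mask] / p[mask]))

    k = 4
    q = np.array([0.4, 0.3, 0.2, 0.1])
    p_uniform = np.full(k, 1 / k)
    H_q = -np.sum(q * np.log(q))
    print(kl(q, p_uniform), np.log(k) - H_q)   # the two numbers agree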

For two probability measures,
$H(\mu,\lambda) = \int \log\dfrac{d\mu}{d\lambda}\,d\mu = \int \dfrac{d\mu}{d\lambda}\log\dfrac{d\mu}{d\lambda}\,d\lambda$

For two stationary processes,
$H(Q,P) = E^{Q}\big[H(Q|\Sigma;P|\Sigma)\big]$, the conditional distributions being taken given the past $\Sigma$.

Has issues! OK if $P|\Sigma$ is globally defined.

$\alpha, \beta$ are probability measures on $X$.

$P = \Pi\alpha$

$r_n(dx) = \dfrac{1}{n}\sum_{i=1}^{n}\delta_{x_i}$

Sanov's Theorem:
$P[\,r_n(dx) \simeq \beta\,] = \exp[-n\,h(\beta;\alpha) + o(n)]$
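
A rough numerical illustration in the simplest Bernoulli case ($\alpha$ = Bernoulli(1/2), $\beta$ = Bernoulli(3/4); the window half-width delta and the sample sizes are my own choices): the exponential decay rate of $P[r_n$ near $\beta]$ approaches $h(\beta;\alpha)$, up to an error of the order of the window width.

    import numpy as np
    from scipy.stats import binom
    from scipy.special import logsumexp

    a, b, delta = 0.5, 0.75, 0.01
    h = b * np.log(b / a) + (1 - b) * np.log((1 - b) / (1 - a))   # h(beta; alpha)

    for n in [100, 1000, 10000]:
        k = np.arange(n + 1)
        window = np.abs(k / n - b) <= delta
        logprob = logsumexp(binom.logpmf(k[window], n, a))        # log P[ r_n near beta ]
        print(n, -logprob / n, " vs  h(beta;alpha) =", h)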

Large Deviations.

For closed sets $C$:
$\limsup_{n\to\infty}\ \dfrac{1}{n}\log P_n[C] \le -\inf_{x\in C} I(x)$

For open sets $G$:
$\liminf_{n\to\infty}\ \dfrac{1}{n}\log P_n[G] \ge -\inf_{x\in G} I(x)$

$I$ is lower semicontinuous and has compact level sets.

Sanov's theorem is an example: $P_n$ on $M(X)$ with the weak topology is the distribution of $r_n(dx)$ under $\Pi\alpha$, and $I(\beta) = h(\beta;\alpha)$.

$P$ is a stationary process; we have a string of $n$ observations.

The ergodic theorem says most sequences resemble $P$.

What is the probability that they resemble $Q$?

$\exp[-n\,H(Q;P)]$

If $P = P_0$ then $H(Q;P) = \log k - H(Q)$.

OK for nice $P$.

Functional Analysis.

$f \ge 0$, $\int f\,d\lambda = 1$

$\dfrac{d}{dp}\Big[\int f^p\,d\lambda\Big]\Big|_{p=1} = \int f\log f\,d\lambda$

There is an analog of Hölder's inequality in the limit as $p \to 1$.

The duality for $x,y \in \mathbb{R}$,
$xy \le \dfrac{x^p}{p} + \dfrac{y^q}{q}$, $\qquad \dfrac{x^p}{p} = \sup_y\Big[xy - \dfrac{y^q}{q}\Big]$, $\qquad \dfrac{y^q}{q} = \sup_x\Big[xy - \dfrac{x^p}{p}\Big]$

becomes, for $x > 0$, $y \in \mathbb{R}$,
$x\log x - x = \sup_y\,[xy - e^y]$, $\qquad e^y = \sup_{x>0}\,[xy - (x\log x - x)]$
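
A one-line numerical check of the limiting duality (the test point x and the grid for y are arbitrary):

    import numpy as np

    x = 2.7
    y = np.linspace(-5.0, 5.0, 200001)
    print(np.max(x * y - np.exp(y)))     # sup_y [xy - e^y], attained at y = log x
    print(x * np.log(x) - x)             # x log x - x: the same value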

The Hölder inequality becomes a generalized Jensen inequality:
$\int g\,d\mu = \int fg\,d\lambda \le \log\int e^{g}\,d\lambda + H(\mu,\lambda)$

Take $g = c\,\chi_A(x)$ and optimize over $c > 0$: $c = \log\dfrac{1}{\lambda(A)}$.

Then $e^{g} = \chi_{A^c} + \dfrac{1}{\lambda(A)}\chi_A$, so $\int e^{g}\,d\lambda \le 2$, and

$\mu(A) \le \dfrac{H(\mu,\lambda) + \log 2}{\log\frac{1}{\lambda(A)}}$

Entropy grows linearly, but the $L^p$ norms grow exponentially fast.

A useful inequality for interacting particle systems.

Statistical Mechanics. Probability distributions are often defined through an energy functional:

$p_n(x_1,x_2,\ldots,x_n) = [c(n)]^{-1}\exp\Big[\sum_{i=1}^{n-k+1} F(x_i,x_{i+1},\ldots,x_{i+k-1})\Big]$

$c_n = \sum_{x_1,\ldots,x_n}\exp\Big[\sum_{i=1}^{n-k+1} F(x_i,x_{i+1},\ldots,x_{i+k-1})\Big]$

There are $\simeq \exp[nH(P)]$ $P$-typical sequences, each contributing $\simeq \exp\big[n\int F\,dP\big]$.

The sum over them is therefore $\simeq e^{\,n(\int F\,dP + H(P))}$.

Then by Laplace asymptotics
$\lim_{n\to\infty}\dfrac{\log c_n}{n} = \sup_P\Big[\int F\,dP + H(P)\Big]$

If the sup is attained at a unique $P$, then $p_n$ behaves like the Gibbs measure $c^{-1}\exp\big[\sum_i F(x_i,x_{i+1},\ldots,x_{i+k-1})\big]$,

and the averages $\dfrac{1}{n-k+1}\sum_{i=1}^{n-k+1} G(x_i,x_{i+1},\ldots,x_{i+k-1})$ converge in probability under $p_n$ to $E^{P}[G(x_1,\ldots,x_k)]$.
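
In the simplest case $k = 1$ the supremum can be computed exactly: the optimizer is the Gibbs measure $P(x) \propto e^{F(x)}$ and the value is $\log\sum_x e^{F(x)}$. A short numerical sketch (the energy values F are made up):

    import numpy as np

    F = np.array([1.0, -0.5, 0.3])

    lhs = np.log(np.sum(np.exp(F)))               # lim (log c_n)/n when k = 1

    P = np.exp(F) / np.sum(np.exp(F))             # Gibbs measure
    rhs = np.sum(P * F) - np.sum(P * np.log(P))   # integral of F dP + H(P)
    print(lhs, rhs)                               # the two agree

    P2 = np.array([1/3, 1/3, 1/3])                # any other P gives a smaller value
    print(np.sum(P2 * F) - np.sum(P2 * np.log(P2)) <= lhs)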

Log Sobolev inequality.

$T_t$ is a Markov semigroup on $(X,\Sigma)$: $T_t f \ge 0$ if $f \ge 0$, and $T_t 1 = 1$. Symmetric with respect to $\mu$.

For $f \ge 0$ with $\int f\,d\mu = 1$:
$\int f\log f\,d\mu \le c\,D(\sqrt{f})$

$\dfrac{d}{dt}\int f_t\log f_t\,d\mu \le -\tfrac{1}{2}D(\sqrt{f_t}) \le -\tfrac{1}{2c}\int f_t\log f_t\,d\mu$
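
A toy illustration of the resulting exponential decay of relative entropy along the semigroup, for a reversible two-state chain (the generator L, measure mu, and initial density f0 are all invented for this sketch):

    import numpy as np
    from scipy.linalg import expm

    mu = np.array([0.7, 0.3])
    L = np.array([[-0.3,  0.3],
                  [ 0.7, -0.7]])        # generator; mu_i L_ij = mu_j L_ji (reversible)

    f0 = np.array([0.5, 13 / 6])        # density w.r.t. mu: integral of f0 dmu = 1
    for t in [0.0, 1.0, 2.0, 4.0, 8.0]:
        ft = expm(t * L) @ f0           # T_t f0
        ent = np.sum(mu * ft * np.log(ft))
        print(t, ent)                   # decays exponentially to 0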

Exponential decay, spectral gap, etc.

Infinitesimal version: for a family of densities $\{f(x,\theta)\}$,

$H(\theta,\theta_0) = \int f(x,\theta)\log\dfrac{f(x,\theta)}{f(x,\theta_0)}\,d\mu$

$H(\theta,\theta_0)$ has a minimum at $\theta_0$, and

$I(\theta_0) = \dfrac{d^2 H}{d\theta^2}\Big|_{\theta=\theta_0}$, the Fisher information.

Optimizing Entropy.

Toss a coin $n$ times. It came up heads $\frac{3n}{4}$ times.

Was it a steady lead or a burst of heads?

$P\big[\tfrac{1}{n}S(nt) \simeq f(t)\big]$

With $f'(t) = p(t)$,

$P\big[\tfrac{1}{n}S(nt) \simeq f(t)\big] \simeq e^{-n\int_0^1 h(p(t))\,dt}$

Minimize $\int_0^1 h(p(t))\,dt$ subject to $\int_0^1 p(s)\,ds = \frac{3}{4}$, where

$h(p) = \log 2 + p\log p + (1-p)\log(1-p)$

By convexity the minimizer is constant: $p(s) \equiv \frac{3}{4}$, i.e. $f(s) = \frac{3s}{4}$. A steady lead.
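
A quick comparison of the exponential costs $\int_0^1 h(p(t))\,dt$ for the steady profile versus an arbitrary "burst" alternative (all heads in the first half, a fair coin afterwards, so the overall fraction is still 3/4):

    import numpy as np

    def h(p):
        # h(p) = log 2 + p log p + (1 - p) log(1 - p), with 0 log 0 = 0
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return np.log(2) + p * np.log(p) + (1 - p) * np.log(1 - p)

    t = np.linspace(0, 1, 100001)
    steady = np.full_like(t, 0.75)               # p(t) = 3/4 throughout
    burst = np.where(t < 0.5, 1.0, 0.5)          # all heads, then a fair coin

    print("steady cost:", h(steady).mean())      # = h(3/4) ~ 0.131
    print("burst cost :", h(burst).mean())       # = 0.5 log 2 ~ 0.347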

General Philosophy.

$P(A)$ is to be estimated; $P(A)$ is small.

Find $Q$ such that $Q(A) \simeq 1$.

$P(A) = \int_A \dfrac{dP}{dQ}\,dQ = Q(A)\cdot\dfrac{1}{Q(A)}\int_A e^{\log\frac{dP}{dQ}}\,dQ$

By Jensen's inequality,
$P(A) \ge Q(A)\exp\Big[-\dfrac{1}{Q(A)}\int_A \log\dfrac{dQ}{dP}\,dQ\Big]$

Asymptotically, more or less,
$P(A) \simeq \exp[-h(Q:P)]$

Optimize over $Q$.
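
A concrete check of the lower bound (the event and tilted measure are my own choices: $A$ = at least $3n/4$ heads among $n$ fair tosses, $Q$ = the coin tilted to $3/4$, so that $Q(A)$ is of order one):

    import numpy as np
    from scipy.stats import binom

    n, p, q = 100, 0.5, 0.75
    k = np.arange(n + 1)
    A = k >= 0.75 * n

    P_A = binom.pmf(k[A], n, p).sum()
    Q_A = binom.pmf(k[A], n, q).sum()

    # log dQ/dP evaluated on a string with k heads
    log_dQdP = k * np.log(q / p) + (n - k) * np.log((1 - q) / (1 - p))
    integral = np.sum(binom.pmf(k[A], n, q) * log_dQdP[A])   # integral over A of log(dQ/dP) dQ

    bound = Q_A * np.exp(-integral / Q_A)
    rate = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
    print("P(A)           :", P_A)
    print("lower bound    :", bound)               # <= P(A)
    print("exp[-n h(Q:P)] :", np.exp(-n * rate))   # same exponential scale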

Example: Markov Chain.

$\{p_{i,j}\}$ with invariant distribution $p$.

What is the probability that the empirical distribution $r_n$ is close to $q$?

Find $\{q_{i,j}\}$ such that the invariant distribution is $q$:

$h(Q:P) = \sum_{i,j} q_i\,q_{i,j}\log\dfrac{q_{i,j}}{p_{i,j}}$

Optimize over $\{q_{i,j}\}$ with $q$ as invariant distribution.

The best possible lower bound is also the upper bound.

Random graphs.

$p$ is the probability of an edge.

We want more triangles than usual: $\binom{N}{3}\tau$ of them, with $\tau > p^3$.

Fix $\int f(x,y)f(y,z)f(z,x)\,dx\,dy\,dz = \tau$.

Minimize $H(f) = \int\Big[f(x,y)\log\dfrac{f(x,y)}{p} + (1-f(x,y))\log\dfrac{1-f(x,y)}{1-p}\Big]\,dx\,dy$

Is $f \equiv \tau^{1/3}$?

THANK YOU