On the Complete Monotonicity of Heat Equation


[Pop-up Salon, Maths, SJTU, 2019/01/10]
On the Complete Monotonicity of Heat Equation
Fan Cheng
John Hopcroft Center, Computer Science and Engineering, Shanghai Jiao Tong University
chengfan@sjtu.edu.cn

Overview
Heat equation: ∂f(x, t)/∂t = (1/2) ∂²f(x, t)/∂x²
Gaussian channel: Y = X + √t Z, Z ~ N(0, 1); the connection is f(x, t) ↔ h(X + √t Z)
Differential entropy: h(X) = -∫ g(x) log g(x) dx
CMI = 2 (H. P. McKean, 1966); CMI ≥ 4. Conjecture: CMI = +∞ (Cheng, 2015)
(CMI can be read as the complete-monotonicity index: the order up to which the t-derivatives of h(X + √t Z) are known to alternate in sign.)

Outline
Super-H theorem: Boltzmann equation and heat equation
Shannon entropy power inequality
Complete monotonicity conjecture

Fire and Civilization
Fire drill; fire myths in both the West and the East
Steam engine: James Watt
The Wealth of Nations; independence of the US, 1776

Study of Heat
Heat equation: ∂f(x, t)/∂t = (1/2) ∂²f(x, t)/∂x²
Heat transfer: the history begins with the work of Joseph Fourier around 1807. In a remarkable memoir, Fourier invented both the heat equation and the method of Fourier analysis for its solution.

Information Age
"A Mathematical Theory of Communication", Bell System Technical Journal, 27 (3): 379-423.
Gaussian channel: Z_t ~ N(0, t), X and Z_t mutually independent; the p.d.f. of X is g(x).
Y = X + Z_t, so the p.d.f. of Y is the convolution
f(y; t) = ∫ g(x) (1/√(2πt)) e^{-(y-x)²/(2t)} dx,
and it satisfies ∂f(y; t)/∂t = (1/2) ∂²f(y; t)/∂y².
Fundamentally, the Gaussian channel and the heat equation are identical in mathematics (Gaussian mixture model).
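A minimal numerical sketch of this equivalence (my addition, using an illustrative two-point distribution for X): the Gaussian-channel output density is a Gaussian mixture, and a finite-difference check confirms that it satisfies the heat equation.

```python
import numpy as np

# Sketch (not from the slides): for X uniform on {0, 1}, the output density of the
# Gaussian channel Y = X + Z_t is the Gaussian mixture
#   f(y; t) = 0.5*N(y; 0, t) + 0.5*N(y; 1, t).
# Check numerically that it satisfies the heat equation df/dt = (1/2) d^2f/dy^2.

def gaussian(y, mean, var):
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def f(y, t):
    return 0.5 * gaussian(y, 0.0, t) + 0.5 * gaussian(y, 1.0, t)

y = np.linspace(-8.0, 9.0, 2001)
dy = y[1] - y[0]
t, dt = 1.0, 1e-5

df_dt = (f(y, t + dt) - f(y, t - dt)) / (2.0 * dt)                  # time derivative
f0 = f(y, t)
d2f_dy2 = (np.roll(f0, -1) - 2.0 * f0 + np.roll(f0, 1)) / dy ** 2   # space derivative

err = np.max(np.abs(df_dt[1:-1] - 0.5 * d2f_dy2[1:-1]))             # ignore wrap-around ends
print("max |df/dt - (1/2) d2f/dy2| =", err)   # small; shrinks as the grid is refined
```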

Entropy Formula
Second law of thermodynamics: one way only. Entropy.

Ludwig Boltzmann
Ludwig Eduard Boltzmann (1844-1906), Vienna, Austrian Empire.
Boltzmann formula: S = k_B ln W
Gibbs formula: S = -k_B ∑_i p_i ln p_i
Boltzmann equation: df/dt = (∂f/∂t)_force + (∂f/∂t)_diff + (∂f/∂t)_coll
H-theorem: H(f(t)) is non-decreasing.

Super H-theorem for Boltzmann Equation
Notation: a function is completely monotone (CM) iff the signs of its successive derivatives alternate: +, -, +, -, ... (e.g., 1/t, e^{-t}).
McKean's conjecture on the Boltzmann equation (1966): H(f(t)) is CM in t when f(t) satisfies the Boltzmann equation.
False: disproved by E. Lieb in the 1970s. For the particular Bobylev-Krook-Wu explicit solutions, the statement holds true for n ≤ 101 and breaks down afterwards.
H. P. McKean, NYU, National Academy of Sciences.
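As a quick illustration (my addition), the alternating sign pattern can be checked symbolically for the two examples above:

```python
import sympy as sp

# A minimal symbolic check (not from the slides): verify the CM sign condition
# (-1)^n f^(n)(t) >= 0 for the two examples mentioned above, f(t) = 1/t and
# f(t) = exp(-t), on t > 0.
t = sp.symbols('t', positive=True)

for f in (1 / t, sp.exp(-t)):
    checks = [sp.simplify((-1) ** n * sp.diff(f, t, n)).is_nonnegative
              for n in range(6)]
    print(f, checks)   # expect [True, True, True, True, True, True]
```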

Super H-theorem for Heat Equation
Is H(f(t)) CM in t if f(t) satisfies the heat equation? Equivalently, is h(X + √t Z) CM in t?
The signs of the first two derivatives were obtained; attempts to obtain the 3rd and 4th failed. (It is easy to compute the derivatives; it is hard to obtain their signs.)
"This suggests that ..., etc., but I could not prove it." (C. Villani, 2010 Fields Medalist)

Claude E. Shannon and EPI
Central limit theorem; capacity region of the Gaussian broadcast channel; capacity region of the Gaussian Multiple-Input Multiple-Output broadcast channel; uncertainty principle: all of them can be proved by the entropy power inequality (EPI).
Entropy power inequality (Shannon 1948): for any two independent continuous random variables X and Y,
e^{2h(X+Y)} ≥ e^{2h(X)} + e^{2h(Y)},
with equality iff X and Y are Gaussian.
Motivation: Gaussian noise is the worst noise.
Impact: a new characterization of the Gaussian distribution in information theory.
Comment: "most profound!" (Kolmogorov)
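A quick worked check of the EPI in two easy cases (my addition, not from the slides): Gaussians give equality, uniforms give strict inequality.

```latex
% Gaussian case (equality): X ~ N(0, a), Y ~ N(0, b) independent, so
% h(X) = (1/2) ln(2*pi*e*a), e^{2h(X)} = 2*pi*e*a, and X + Y ~ N(0, a+b):
\[
  e^{2h(X+Y)} = 2\pi e\,(a+b) = e^{2h(X)} + e^{2h(Y)} .
\]
% Non-Gaussian case (strict inequality): X, Y i.i.d. Uniform[0,1], so h(X) = h(Y) = 0,
% while X + Y has the triangular density on [0,2] with h(X+Y) = 1/2 nat, hence
\[
  e^{2h(X+Y)} = e \approx 2.718 \;>\; 2 = e^{2h(X)} + e^{2h(Y)} .
\]
```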

Entropy Power Inequality
Shannon himself didn't give a proof but an explanation, which turned out to be wrong. The first proofs were given by A. J. Stam (1959) and N. M. Blachman (1966).
Research on EPI: generalizations, new proofs, new connections. E.g., the Gaussian interference channel is open; some stronger EPI should exist.
Stanford information theory school: Thomas Cover and his students A. El Gamal, M. H. Costa, A. Dembo, A. Barron (1980-1990).
Princeton information theory school: Sergio Verdu, etc. (2000s).
A battlefield of Shannon theory.

Ramification of EPI
Gaussian perturbation: h(X + √t Z).
Shannon EPI.
Fisher information: I(X + √t Z) = 2 ∂h(X + √t Z)/∂t; Fisher information is decreasing in t.
Fisher information inequality (FII): 1/I(X+Y) ≥ 1/I(X) + 1/I(Y).
e^{2h(X + √t Z)} is concave in t.
Tight Young's inequality: ‖X + Y‖_r ≤ c ‖X‖_p ‖Y‖_q.
Status quo: FII can imply EPI and all its generalizations. However, life is always hard; FII is far from enough.
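For intuition, here is the purely Gaussian case of these identities worked out (my addition, assuming X ~ N(0, s) so that everything is in closed form):

```latex
% With X ~ N(0, s), we have X + sqrt(t) Z ~ N(0, s + t), so
\[
  h(X+\sqrt{t}Z) = \tfrac12 \ln\bigl(2\pi e (s+t)\bigr),
  \qquad
  I(X+\sqrt{t}Z) = \frac{1}{s+t},
\]
% and de Bruijn's identity  dh/dt = I/2  holds:
\[
  \frac{\partial}{\partial t} h(X+\sqrt{t}Z) = \frac{1}{2(s+t)} = \tfrac12\, I(X+\sqrt{t}Z).
\]
% The FII holds with equality for independent X ~ N(0, a), Y ~ N(0, b):
\[
  \frac{1}{I(X+Y)} = a+b = \frac{1}{I(X)} + \frac{1}{I(Y)} .
\]
```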

On X + √t Z
X is arbitrary and h(X) may not exist.
When t → 0, X + √t Z → X. When t → ∞, X + √t Z becomes Gaussian.
When t > 0, the density of X + √t Z and h(X + √t Z) are infinitely differentiable.
X + √t Z has a Gaussian mixture distribution (Gaussian Mixture Model (GMM) in machine learning); X + √t Z is the Gaussian channel/source in information theory.
Gaussian noise is the worst additive noise; the Gaussian distribution maximizes h(X) for a given variance; entropy power inequality, central limit theorem, etc.
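A small numerical illustration of these points (a sketch, not from the slides): X below is Bernoulli(1/2), so h(X) does not exist as a differential entropy, yet h(X + √t Z) is finite and smooth for every t > 0; the grid bounds and step are arbitrary choices.

```python
import numpy as np

# Sketch (not from the slides): X ~ Bernoulli(1/2) is discrete, so h(X) does not
# exist as a differential entropy, but for every t > 0 the Gaussian mixture
# Y = X + sqrt(t) Z has a smooth density and a finite differential entropy.

def mixture_pdf(y, t):
    g = lambda m: np.exp(-(y - m) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return 0.5 * g(0.0) + 0.5 * g(1.0)

def diff_entropy(t, lo=-20.0, hi=21.0, n=400001):
    y = np.linspace(lo, hi, n)
    f = mixture_pdf(y, t)
    mask = f > 0                                   # avoid log(0) in the far tails
    return -np.sum(f[mask] * np.log(f[mask])) * (y[1] - y[0])   # h(Y) in nats

for t in (0.01, 0.1, 1.0, 10.0):
    print(f"t = {t:5.2f}   h(X + sqrt(t) Z) = {diff_entropy(t):.4f} nats")
# h increases with t; for large t it approaches 0.5*log(2*pi*e*t), the Gaussian value.
```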

Where we take off
Shannon entropy power inequality; Fisher information inequality; h(X + √t Z).
Is H(f(t)) CM? When f(t) satisfies the Boltzmann equation: disproved. When f(t) satisfies the heat equation: unknown. We don't even know what CM is!
Motivation: to study some inequalities, e.g., the convexity of h(X + e^t Z) and the concavity of I(X + √t Z) in t.
Any progress? None. Information theorists got lost in the past 70 years; mathematicians ignored it.
It is widely believed that there should be no new EPI except the Shannon EPI and the FII.

Discovery
∂h(X + √t Z)/∂t = I(X + √t Z)/2 ≥ 0 (de Bruijn, 1958).
I^(1) = ∂I(X + √t Z)/∂t ≤ 0 (McKean 1966, Costa 1985).
Observation: I(X + √t Z) is convex in t.
When X is a constant, h(X + √t Z) = (1/2) ln(2πet) and I(X + √t Z) = 1/t, which is CM: +, -, +, -, ...
If the observation is true, the first three signs are +, -, +. Q: is the 4th sign -?
Because Z is Gaussian! The signs of the derivatives of h(X + √t Z) are independent of X. An invariant!
Exactly the same problem as in McKean 1966. To convince people, we must prove the convexity.

Challenge
Let X ~ g(x) and let Y_t = X + √t Z have density f(y, t), which satisfies the heat equation. Write f_k for ∂^k f/∂y^k.
h(Y_t) = -∫ f(y, t) ln f(y, t) dy: no closed form except for some special g(x).
I(Y_t) = ∫ f_1²/f dy
I^(1)(Y_t) = -∫ f (f_2/f - f_1²/f²)² dy
So what is I^(2)? (Heat equation, integration by parts.)
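For completeness, here is a sketch of the computation behind the expression for I^(1) (my reconstruction, under the reading f_k = ∂^k f/∂y^k and ∂f/∂t = (1/2) f_2; boundary terms are assumed to vanish):

```latex
% Differentiating I(Y_t) = \int f_1^2/f \, dy under the integral and using the
% heat equation \partial_t f = \tfrac12 f_2 (so \partial_t f_1 = \tfrac12 f_3):
\[
  \frac{d}{dt} I(Y_t)
  = \int \Bigl( \frac{2 f_1\, \partial_t f_1}{f} - \frac{f_1^2\, \partial_t f}{f^2} \Bigr) dy
  = \int \Bigl( \frac{f_1 f_3}{f} - \frac{f_1^2 f_2}{2 f^2} \Bigr) dy .
\]
% Integrating by parts and regrouping the terms as a perfect square gives
\[
  I^{(1)}(Y_t)
  = -\int f \Bigl( \frac{f_2}{f} - \frac{f_1^2}{f^2} \Bigr)^{\!2} dy
  = -\int f \,\bigl( (\ln f)_{yy} \bigr)^2 dy \;\le\; 0 .
\]
% Sanity check: for the Gaussian, f = N(0, t), this gives -1/t^2 = d(1/t)/dt.
```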

Challenge (cont'd)
It is trivial to calculate the derivatives; it is hard to prove their signs.

Breakthrough
Integration by parts: ∫ u dv = uv - ∫ v du.
First breakthrough since McKean 1966.

GCMC
Gaussian complete monotonicity conjecture: I(X + √t Z) is CM in t.
Conjecture: log I(X + √t Z) is convex in t.
C. Villani and G. Toscani pointed out the connection with McKean's paper.
A general form: number partitions. Hard to determine the coefficients; hard to find the β_{k,j}!
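The conjecture can at least be probed numerically. Below is a rough sanity check (my addition, not a proof): for a two-point X, estimate I(X + √t Z) on a grid and inspect the signs of the first few t-derivatives by finite differences; the grid sizes and the step dt are arbitrary.

```python
import numpy as np

# Numerical sanity check of GCMC (a sketch, not a proof): for X uniform on {0, 1},
# Y_t = X + sqrt(t) Z has density f(y;t) = 0.5*N(y;0,t) + 0.5*N(y;1,t).
# Estimate I(Y_t) = int f_y^2 / f dy on a grid and look at the signs of the first
# few t-derivatives, which should alternate +, -, +, -, ...

def fisher_info(t, lo=-15.0, hi=16.0, n=200001):
    y = np.linspace(lo, hi, n)
    comp = lambda m: np.exp(-(y - m) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    f = 0.5 * comp(0.0) + 0.5 * comp(1.0)
    # d/dy of each Gaussian component is -(y - m)/t times that component
    fy = 0.5 * (-(y - 0.0) / t) * comp(0.0) + 0.5 * (-(y - 1.0) / t) * comp(1.0)
    return np.sum(fy ** 2 / f) * (y[1] - y[0])

t0, dt = 1.0, 0.05
I = np.array([fisher_info(t0 + k * dt) for k in range(-3, 4)])   # 7 samples around t0

d1 = np.gradient(I, dt)       # first-derivative estimates
d2 = np.gradient(d1, dt)      # second derivative
d3 = np.gradient(d2, dt)      # third derivative
print("I    =", I[3])         # expect > 0
print("I'   =", d1[3])        # expect < 0
print("I''  =", d2[3])        # expect > 0
print("I''' =", d3[3])        # expect < 0 (consistent with CMI >= 4 / GCMC)
```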

Complete monotone function
How to construct g(x)?
A new expression for the entropy, involving special functions from mathematical physics (Herbert R. Stahl, 2013).

Complete monotone function
If a function f(t) is CM, then log f(t) is convex in t; so if I(Y_t) is CM in t, then log I(Y_t) is convex in t.
If a function f(t) is CM, a Schur-convex function can be obtained from f(t) (Schur convexity, majorization theory).
Remarks: the current tools in information theory don't work; more sophisticated tools should be built to attack this problem. A new mathematical theory for information theory.
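A standard argument for the first implication (my addition, via Bernstein's representation of CM functions and Cauchy-Schwarz):

```latex
% By Bernstein's theorem a CM function is the Laplace transform of a nonnegative
% measure, f(t) = \int_0^\infty e^{-ts}\, d\mu(s).  Then, by Cauchy-Schwarz,
\[
  f'(t)^2 = \Bigl(\int_0^\infty s\, e^{-ts}\, d\mu(s)\Bigr)^{\!2}
  \;\le\; \int_0^\infty e^{-ts}\, d\mu(s) \int_0^\infty s^2 e^{-ts}\, d\mu(s)
  = f(t)\, f''(t),
\]
% so (\log f)'' = \bigl(f f'' - (f')^2\bigr)/f^2 \ge 0, i.e. \log f is convex.
% In particular, GCMC (I(Y_t) CM in t) would imply the log-convexity conjecture above.
```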

Potential application: Interference channel
A challenge question: what is the application of GCMC? Mathematically speaking, a beautiful result on a fundamental problem will be very useful.
Where EPI works: central limit theorem; capacity region of the Gaussian broadcast channel; capacity region of the Gaussian Multiple-Input Multiple-Output broadcast channel; uncertainty principle.
Where EPI fails: the Gaussian interference channel, open since the 1970s.
CM is considered to be much more powerful than EPI.

Remarks
If GCMC is true: a fundamental breakthrough in mathematical physics, information theory, and any discipline related to the Gaussian distribution; a new expression for Fisher information; the derivative signs are an invariant; though h(X + √t Z) looks very messy, certain regularity exists. Application: the Gaussian interference channel?
If GCMC is false: no failure, as the heat equation is a physical phenomenon; but there would be a "lucky number" (e.g., 2019) where the Gaussian distribution fails. Painful!