Heat Flow Derivatives and Minimum Mean-Square Error in Gaussian Noise

Michel Ledoux
University of Toulouse, France

Abstract

We connect recent developments on Gaussian noise estimation and the Minimum Mean-Square Error to earlier results on entropy and Fisher information heat flow expansions. In particular, derivatives of the Minimum Mean-Square Error with respect to the noise parameter are related to the heat flow derivatives of the Fisher information and a special Lie algebra structure on iterated gradients. The results lead in particular to a partial answer to the Minimum Mean-Square Error conjecture.

1 Introduction

Recent works in information theory have been developed on the Minimum Mean-Square Error (MMSE) in the estimation of a random variable from its observation perturbed by a Gaussian noise [5, 6]. The concept of MMSE has been very fruitful; in particular, it has allowed for a simple proof of Shannon's monotonicity of entropy [4, 10] (see also [9]), first established in [1].

Given a random vector $X$ in $\mathbb{R}^n$, let $X_t = X + \sqrt{2t}\, N$, $t \geq 0$, where $N$ is an independent standard normal on $\mathbb{R}^n$. The Minimum Mean-Square Error

\[
\mathrm{MMSE}(t) = E\big(|X - E(X \mid X_t)|^2\big), \quad t \geq 0,
\]

is an estimate of the input $X$ of the model given the noisy output $X_t$ (the notation $|\cdot|$ being the Euclidean norm in $\mathbb{R}^n$). The MMSE can be studied as a function of the signal-to-noise ratio and as a functional of the input distribution. In particular, the aforementioned articles and the references therein provide a study of the analytic properties of the MMSE and of its derivatives with respect to the parameter of the noise perturbation, in relation with entropy and Fisher information.

The MMSE actually appears as an alternate description of the standard Fisher information along the heat flow. Given a random vector $X$ with smooth positive density $f$ with respect to the Lebesgue measure, let

\[
I(X) = \int_{\mathbb{R}^n} \frac{|\nabla f|^2}{f}\, dx
\]

denote its Fisher information. For each $t > 0$, the law of $X_t$ is a convolution with a Gaussian kernel and as such admits a smooth density $f_t$. With some abuse in notation, let then $I(t) = I(X_t)$ be the Fisher information of $X_t$. As emphasized in [5], the MMSE actually directly connects to the Fisher information $I(t)$, $t > 0$, along the flow via the identity (recalled below in Section 2)

\[
4t^2 I(t) = 2nt - \mathrm{MMSE}(t), \quad t > 0.
\]

The purpose of this note is to relate some of the conclusions of [5, 6] to earlier results on entropy and Fisher information expansions under heat flow [7] by means of the so-called $\Gamma$-calculus for Markov diffusion operators (cf. [3]). In particular, the Lie algebra structure on the iterated $\Gamma$-gradients emphasized in [7] provides the suitable structure to understand the successive derivatives of the MMSE and an induction mechanism to compute arbitrary orders. This tool moreover allows for a partial answer to the MMSE conjecture of [6] stating that the MMSE/Fisher information characterizes, up to the sign, the distribution of the underlying random variable.

The first paragraph of the paper (Section 2) relates, following [5], the standard notion of Fisher information to the MMSE. Section 3 describes via the $\Gamma$-calculus the successive derivatives of the Fisher information by the Lie brackets from [7], and the subsequent section interprets the main conclusion on the MMSE itself. Section 5 emphasizes conditional cumulants towards the representation of the derivatives of the MMSE. On the basis of this representation, the next section addresses the MMSE conjecture in a particular case. The final Section 7 collects a few bounds on the Fisher information and the MMSE of independent interest, already discussed in [6]. The Appendix (Section 8) presents proofs of some specific properties of the Lie brackets in this context.

2 MMSE and Fisher information

We start by connecting, following [5], the MMSE to the Fisher information along the heat flow. On a probability space $(\Omega, \mathcal{A}, \mathbb{P})$, let therefore $X$ be a random vector on $\mathbb{R}^n$ and $X_t = X + \sqrt{2t}\, N$, $t \geq 0$, where $N$ is an independent standard normal on $\mathbb{R}^n$ with the Gaussian distribution $d\gamma(x) = e^{-|x|^2/2}\, \frac{dx}{(2\pi)^{n/2}}$. Denote by $p_t(x)$ the Gaussian kernel

\[
p_t(x) = \frac{1}{(4\pi t)^{n/2}}\, e^{-|x|^2/4t}, \quad t > 0, \ x \in \mathbb{R}^n,
\]

so that in particular, for any measurable and bounded $\varphi : \mathbb{R}^n \to \mathbb{R}$,

\[
E\big(\varphi(X_t)\big) = \int_{\mathbb{R}^n} E\big(\varphi(X + \sqrt{2t}\, x)\big)\, d\gamma(x) = \int_{\mathbb{R}^n} E\big(\varphi(X + x)\big)\, p_t(x)\, dx = \int_{\mathbb{R}^n} \varphi(x)\, E\big(p_t(x - X)\big)\, dx.
\]

Therefore, for any $t > 0$, the law of $X_t$ has a strictly positive $C^\infty$ density with respect to the Lebesgue measure, given as the convolution of the law of $X$ with $p_t$, that is

\[
f_t(x) = E\big(p_t(x - X)\big), \quad x \in \mathbb{R}^n.
\]

The Fisher information $I(X_t) = I(t)$ of $X_t$ is therefore represented along the heat flow as

\[
I(t) = \int_{\mathbb{R}^n} \frac{|\nabla f_t|^2}{f_t}\, dx, \quad t > 0. \tag{2.1}
\]

As will be seen below via the connection with the MMSE, $I(t)$ is actually well defined (finite) for every $t > 0$.

Define now, for any $t > 0$,

\[
g_t(x) = E\big(X p_t(x - X)\big), \quad x \in \mathbb{R}^n.
\]

Then, assuming first that $X$ is integrable, the conditional expectation $E(X \mid X_t)$ may be represented as

\[
E(X \mid X_t) = \frac{g_t}{f_t}(X_t) \quad \text{almost surely.}
\]

Indeed, if $\varphi : \mathbb{R}^n \to \mathbb{R}$ is measurable and bounded,

\[
E\big(\varphi(X_t)\, E(X \mid X_t)\big) = E\big(X \varphi(X_t)\big) = \int_{\mathbb{R}^n} E\big(X \varphi(X + x)\big)\, p_t(x)\, dx = \int_{\mathbb{R}^n} \varphi(x)\, E\big(X p_t(x - X)\big)\, dx,
\]

while

\[
E\Big(\varphi(X_t)\, \frac{g_t}{f_t}(X_t)\Big) = \int_{\mathbb{R}^n} \varphi(x)\, g_t(x)\, dx,
\]

from which the claim follows. As a consequence, if $h_t = x - \frac{g_t}{f_t}$, then

\[
X_t - E(X \mid X_t) = h_t(X_t).
\]

Note that although $E(X \mid X_t)$ may not be well-defined if $X$ is not integrable, it makes sense to consider the integrable random variable $X_t - E(X \mid X_t)$, which is identified with $\sqrt{2t}\, E(N \mid X_t)$ since

\[
X_t = E(X_t \mid X_t) = E(X \mid X_t) + \sqrt{2t}\, E(N \mid X_t).
\]

In particular, $X - E(X \mid X_t)$ also makes sense and has moments of all orders. Therefore, the Minimum Mean-Square Error

\[
\mathrm{MMSE}(t) = E\big(|X - E(X \mid X_t)|^2\big), \quad t \geq 0, \tag{2.2}
\]

is well-defined for every $t \geq 0$.
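The kernel representation $E(X \mid X_t) = (g_t/f_t)(X_t)$ is easy to probe numerically. The following Python sketch (ours, for illustration only, not part of the original text) estimates $f_t$ and $g_t$ by Monte Carlo for the symmetric Bernoulli input $X = \pm 1$, for which the conditional mean is known in closed form, $E(X \mid X_t = x) = \tanh(x/2t)$.

```python
# Monte Carlo illustration (not from the paper) of the representation
# E(X | X_t) = (g_t / f_t)(X_t), with f_t(x) = E p_t(x - X) and
# g_t(x) = E(X p_t(x - X)), for the Bernoulli input X = +-1, where
# the conditional mean is exactly tanh(x / 2t).
import numpy as np

rng = np.random.default_rng(0)
t = 0.3
p_t = lambda y: np.exp(-y**2 / (4 * t)) / np.sqrt(4 * np.pi * t)

X = rng.choice([-1.0, 1.0], size=200_000)   # samples of the input X
x = 0.7                                      # a fixed observation point
f_t = p_t(x - X).mean()                      # estimates f_t(x)
g_t = (X * p_t(x - X)).mean()                # estimates g_t(x)
print(g_t / f_t, np.tanh(x / (2 * t)))       # Monte Carlo vs closed form
```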

Now, observe that, by the Lebesgue differentiation theorem,

\[
2t\, \nabla f_t(x) = E\big((X - x)\, p_t(x - X)\big) = g_t(x) - x f_t(x).
\]

Hence

\[
4t^2 I(t) = \int_{\mathbb{R}^n} \frac{1}{f_t}\, |x f_t - g_t|^2\, dx = \int_{\mathbb{R}^n} f_t\, \Big| x - \frac{g_t}{f_t} \Big|^2\, dx = E\big(|X_t - E(X \mid X_t)|^2\big) = 2t\, E\big(|N|^2\big) - E\big(|X - E(X \mid X_t)|^2\big)
\]

since $X_t - E(X \mid X_t)$ is orthogonal to $X - E(X \mid X_t)$. Therefore the Fisher information $I$ and the MMSE are related by the identity

\[
4t^2 I(t) = 2nt - \mathrm{MMSE}(t), \quad t > 0. \tag{2.3}
\]

In particular, the Fisher information $I(t)$ is well-defined for every $t > 0$, whatsoever the underlying random vector $X$.

The Minimum Mean-Square Error of [5] and [6] is actually studied as a function of the noise parameter as

\[
\mathrm{mmse}(s) = E\big(|X - E(X \mid \sqrt{s}\, X + N)|^2\big) = E\big(|X - E(X \mid X_t)|^2\big) = \mathrm{MMSE}(t) \tag{2.4}
\]

for $t = t(s) = \frac{1}{2s}$, $s > 0$. It is indeed in this form that the successive derivatives of the Fisher information and of the MMSE will connect most suitably (as opposed to the relation (2.3)). The study of the function $\mathrm{mmse}(s)$, $s > 0$, will then require various, unfortunately somewhat heavy, changes of variables in the subsequent analysis.

It is shown in [6] that the mmse function is infinitely differentiable at every $s > 0$, and at $s = 0$ whenever $X$ has moments of all orders (and thus also the MMSE and the Fisher information). We freely use these properties below.
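For a concrete check of (2.3) and of the reparametrization (2.4), the Gaussian case can be worked out in closed form: if $X$ is centered normal with variance $\sigma^2$ (and $n = 1$), then $\mathrm{MMSE}(t) = 2t\sigma^2/(\sigma^2 + 2t)$ and $I(t) = 1/(\sigma^2 + 2t)$. The short sketch below (ours, not part of the paper) verifies both identities numerically.

```python
# Sanity check (ours) of 4 t^2 I(t) = 2nt - MMSE(t) and of
# mmse(s) = MMSE(1 / 2s) for a Gaussian input, n = 1, where
# MMSE(t) = 2 t sigma^2 / (sigma^2 + 2t) and I(t) = 1 / (sigma^2 + 2t).
import numpy as np

sigma2 = 2.0
for t in (0.1, 0.5, 1.0, 5.0):
    mmse_t = 2 * t * sigma2 / (sigma2 + 2 * t)   # closed-form MMSE(t)
    fisher = 1.0 / (sigma2 + 2 * t)              # closed-form I(t)
    assert np.isclose(4 * t**2 * fisher, 2 * t - mmse_t)
    s = 1.0 / (2 * t)                            # noise parameter of (2.4)
    assert np.isclose(sigma2 / (1 + sigma2 * s), mmse_t)
print("identities (2.3) and (2.4) hold in the Gaussian case")
```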

3 Heat flow derivatives

We present here, in the context of the heat semigroup on $\mathbb{R}^n$, the results of [7] on the successive derivatives of entropy and Fisher information.

The derivatives of entropy along the heat flow have been examined from various viewpoints in the literature, in particular in connection with logarithmic Sobolev inequalities (cf. e.g. [2, 3]). Given a random vector $X$ and $X_t = X + \sqrt{2t}\, N$, $t > 0$, with probability density $f_t$ with respect to the Lebesgue measure, consider, provided it exists, the entropy (with a positive sign convention) along the flow

\[
H(t) = \int_{\mathbb{R}^n} f_t \log f_t\, dx, \quad t > 0.
\]

The classical de Bruijn formula expresses that

\[
\frac{d}{dt}\, H(t) = -I(t) \tag{3.1}
\]

where we recall the Fisher information $I(t)$ of (2.1). Note that, with $v = \log f_t$, $t > 0$,

\[
I(t) = \int_{\mathbb{R}^n} f_t\, |\nabla v|^2\, dx.
\]

Similarly, as is classical from the Bakry-Émery calculus [2, 3],

\[
\frac{d^2}{dt^2}\, H(t) = 2 \int_{\mathbb{R}^n} f_t\, |\nabla^2 v|^2\, dx \tag{3.2}
\]

where $\nabla^2 v$ is the matrix of the second derivatives of $v$. According to the end of Section 2, it may be implicitly assumed here and below that entropy and Fisher information are infinitely differentiable on $(0, \infty)$. Note that while the calculus of [2, 3, 7] is developed along a semigroup from a given probability density, it makes sense in the same way along the densities $f_t$, $t > 0$, since the latter similarly solve the heat equation $\partial_t f_t = \Delta f_t$.

Nevertheless, it is known from [7] that the rule

\[
\frac{d^l}{dt^l}\, H(t) = (-1)^l\, 2^{l-1} \int_{\mathbb{R}^n} f_t\, |\nabla^l v|^2\, dx
\]

cannot be iterated for $l \geq 3$. The work [7] actually develops, in the context of an abstract Markov diffusion operator, the suitable algebraic framework to express the successive derivatives of entropy and Fisher information along the heat flow via the $\Gamma$-calculus on the iterated gradients (cf. [3]). We recall here some of the main elements of this description.

Consider the bilinear symmetric carré du champ operator

\[
\Gamma(u_1, u_2) = \nabla u_1 \cdot \nabla u_2
\]

acting on smooth functions $u_1, u_2 : \mathbb{R}^n \to \mathbb{R}$. For simplicity, set $\Gamma(u) = \Gamma(u, u)$. Introduce the Lie bracket between the Laplace operator $\Delta$ (or more generally a linear operator $L$) and a $j$-multilinear ($j \geq 1$) symmetric form $B(u) = B(u, \ldots, u)$ on smooth functions by

\[
2\, [\Delta, B](u) = \Delta\big(B(u)\big) - j\, B(\Delta u, u, \ldots, u).
\]

Here $[\Delta, B]$ is actually a $j$-multilinear form, defined by polarization from $[\Delta, B](u) = [\Delta, B](u, \ldots, u)$. For example, if $B$ is 2-linear,

\[
2\, [\Delta, B](u_1, u_2) = \Delta\big(B(u_1, u_2)\big) - B(\Delta u_1, u_2) - B(u_1, \Delta u_2) = [\Delta, B](u_1 + u_2, u_1 + u_2) - [\Delta, B](u_1, u_1) - [\Delta, B](u_2, u_2).
\]

It is classical and easy to see that if we define $\Gamma_l(u_1, u_2) = \nabla^l u_1 \cdot \nabla^l u_2$ (the dot product being understood as the scalar product between $l$-tensors), with $\Gamma_1 = \Gamma$, then for every $l \geq 1$,

\[
[\Delta, \Gamma_l](u) = \Gamma_{l+1}(u) = \nabla^{l+1} u \cdot \nabla^{l+1} u = |\nabla^{l+1} u|^2. \tag{3.3}
\]

However, in order to study the time derivatives of entropy, it is necessary to go one step further in the brackets and to consider

\[
2\, [\Gamma, B](u) = 2\, \Gamma\big(B(u), u\big) - j\, B\big(\Gamma(u), u, \ldots, u\big).
\]

Here, since $\Gamma$ is 2-multilinear, $[\Gamma, B]$ is now $(j+1)$-multilinear and obtained by polarization from $[\Gamma, B](u) = [\Gamma, B](u, \ldots, u)$. For example, if $B$ is 2-linear,

\[
2\, [\Gamma, B](u_1, u_2, u_3) = \frac{1}{3} \sum_\sigma \Big[ \Gamma\big(B(u_{\sigma(1)}, u_{\sigma(2)}), u_{\sigma(3)}\big) - B\big(\Gamma(u_{\sigma(1)}, u_{\sigma(2)}), u_{\sigma(3)}\big) \Big]
\]

where the sum runs over all permutations $\sigma$ of $\{1, 2, 3\}$. We refer to [7] for more details.

In this framework, the main conclusion of [7] is summarized in the following statement. According to the preceding, define, for every $l \geq 2$,

\[
\widetilde{\Gamma}_l = [\Delta + \Gamma, \widetilde{\Gamma}_{l-1}]
\]

recursively from $\widetilde{\Gamma}_1(u) = \Gamma(u) = |\nabla u|^2$, for smooth $u : \mathbb{R}^n \to \mathbb{R}$. Note that actually

\[
\widetilde{\Gamma}_l = [\Delta, \widetilde{\Gamma}_{l-1}] + [\Gamma, \widetilde{\Gamma}_{l-1}],
\]

which by induction yields a sum of $j$-multilinear forms with $2 \leq j \leq l$. Recall $v = \log f_t$.

Theorem 1. For every $l \geq 1$,

\[
\frac{d^l}{dt^l}\, H(t) = (-1)^l\, 2^{l-1} \int_{\mathbb{R}^n} f_t\, \widetilde{\Gamma}_l(v)\, dx.
\]

For our further purposes, we will rather deal equivalently (by (3.1)) with the derivatives of the Fisher information. Theorem 1 therefore expresses that

\[
\frac{d^l}{dt^l}\, I(t) = (-1)^l\, 2^l \int_{\mathbb{R}^n} f_t\, \widetilde{\Gamma}_{l+1}(v)\, dx, \quad l \geq 0. \tag{3.4}
\]

It is a main feature, at the root of the Bakry-Émery criterion for logarithmic Sobolev inequalities (cf. [2, 7, 3]), that $\widetilde{\Gamma}_2 = \Gamma_2$. Indeed,

\[
\widetilde{\Gamma}_2 = [\Delta + \Gamma, \Gamma] = [\Delta, \Gamma] = \Gamma_2.
\]

However, the whole point of Theorem 1 is that this identity does not extend to the higher orders. Specifically,

\[
\widetilde{\Gamma}_2 = \Gamma_2, \qquad \widetilde{\Gamma}_3 = \Gamma_3 + [\Gamma_1, \Gamma_2], \qquad \widetilde{\Gamma}_4 = \Gamma_4 + 2\, [\Gamma_1, \Gamma_3] + [\Gamma_1, [\Gamma_1, \Gamma_2]],
\]
\[
\widetilde{\Gamma}_5 = \Gamma_5 + 3\, [\Gamma_1, \Gamma_4] + 2\, [\Gamma_2, \Gamma_3] + 3\, [\Gamma_1, [\Gamma_1, \Gamma_3]] + [\Gamma_2, [\Gamma_1, \Gamma_2]] + [\Gamma_1, [\Gamma_1, [\Gamma_1, \Gamma_2]]], \quad \ldots
\]

For example, in dimension one, $\widetilde{\Gamma}_1(u) = u'^2$, $\widetilde{\Gamma}_2(u) = u''^2$, but

\[
\widetilde{\Gamma}_3(u) = u'''^2 - 2\, u''^3, \qquad \widetilde{\Gamma}_4(u) = \big(u^{(4)}\big)^2 - 12\, u'' u'''^2 + 6\, u''^4,
\]
\[
\widetilde{\Gamma}_5(u) = \big(u^{(5)}\big)^2 - 30\, u'''^2 u^{(4)} - 20\, u'' \big(u^{(4)}\big)^2 + 120\, u''^2 u'''^2 - 24\, u''^5.
\]

The successive expressions $\widetilde{\Gamma}_l(u)$ are more and more cumbersome, but precisely given by the Lie bracket structure. It will be useful to record the following rough description of the $\widetilde{\Gamma}_l(u)$, $l \geq 1$, in dimension one. Its proof is postponed to the Appendix, Section 8. Recall $\widetilde{\Gamma}_1(u) = \Gamma(u) = u'^2$.

Proposition 2. For every $l \geq 2$ and every smooth function $u : \mathbb{R} \to \mathbb{R}$,

\[
\widetilde{\Gamma}_l(u) = R_l\big(u'', \ldots, u^{(l)}\big) \tag{3.5}
\]

where $u^{(j)}$, $j \geq 2$, denote the successive derivatives of $u$ and where $R_l = R_l(Z_2, \ldots, Z_l)$ is a polynomial in the variables $Z_2, \ldots, Z_l$ of the form

\[
R_l(Z_2, \ldots, Z_l) = Z_l^2 + \widehat{R}_{l-1}(Z_2, \ldots, Z_{l-1})
\]

where $\widehat{R}_{l-1}$ is some polynomial of $l - 2$ coordinates ($\widehat{R}_1 \equiv 0$). The polynomial $\widehat{R}_{l-1}$ is not explicit, although it may be noticed that it contains $(-1)^l (l-1)!\, Z_2^l$. Moreover, $\widetilde{\Gamma}_l(u)$ is $2l$-homogeneous under the transformation $u(x) \mapsto u_\lambda(x) = u(\lambda x)$, $\lambda \in \mathbb{R}$, in the sense that

\[
R_l\big(\lambda^2 Z_2, \ldots, \lambda^l Z_l\big) = \lambda^{2l}\, R_l(Z_2, \ldots, Z_l).
\]

In particular,

\[
\widetilde{\Gamma}_l(u_\lambda)(x) = \lambda^{2l}\, \widetilde{\Gamma}_l(u)(\lambda x). \tag{3.6}
\]

For example, according to the above expressions for $\widetilde{\Gamma}_2, \widetilde{\Gamma}_3, \widetilde{\Gamma}_4, \widetilde{\Gamma}_5$,

\[
R_2(Z_2) = Z_2^2, \qquad R_3(Z_2, Z_3) = Z_3^2 - 2 Z_2^3, \qquad R_4(Z_2, Z_3, Z_4) = Z_4^2 - 12 Z_2 Z_3^2 + 6 Z_2^4,
\]
\[
R_5(Z_2, Z_3, Z_4, Z_5) = Z_5^2 - 30 Z_3^2 Z_4 - 20 Z_2 Z_4^2 + 120 Z_2^2 Z_3^2 - 24 Z_2^5.
\]
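The recursion $\widetilde{\Gamma}_{l+1} = [\Delta + \Gamma, \widetilde{\Gamma}_l]$ can be carried out mechanically in dimension one, since on a $j$-multilinear form $B$ the terms $j B(\Delta u, u, \ldots, u)$ and $j B(\Gamma(u), u, \ldots, u)$ are first variations of $B(u)$ in the directions $u''$ and $u'^2$ respectively. The following sympy sketch (ours, not from the paper) encodes $u^{(k)}$ by the symbol u[k] and reproduces the polynomials $R_3$, $R_4$, $R_5$ displayed above.

```python
# Symbolic check (ours) in dimension one of the recursion
# Gamma~_{l+1} = [Delta + Gamma, Gamma~_l], reproducing Proposition 2:
# Gamma~_3 = u3^2 - 2 u2^3, Gamma~_4 = u4^2 - 12 u2 u3^2 + 6 u2^4, ...
import sympy as sp

N = 8
u = sp.symbols('u0:8')                     # u[k] stands for u^(k)

def Dx(E):
    # total derivative in x of a differential polynomial in the u[k]
    return sum(sp.diff(E, u[k]) * u[k + 1] for k in range(N - 1))

def first_variation(E, h):
    # d/d(eps) of E(u + eps h) at eps = 0: u[k] moves by the k-th
    # total derivative of h
    hs = [h]
    for _ in range(N - 1):
        hs.append(Dx(hs[-1]))
    return sum(sp.diff(E, u[k]) * hs[k] for k in range(N))

def next_gamma(G):
    # 2 [Delta, B](u) = (B(u))'' - dB(u)[u''] and
    # 2 [Gamma, B](u) = 2 (B(u))' u' - dB(u)[u'^2], summed over the
    # multilinear components of G via the first variation
    delta_part = Dx(Dx(G)) - first_variation(G, u[2])
    gamma_part = 2 * Dx(G) * u[1] - first_variation(G, u[1] ** 2)
    return sp.expand((delta_part + gamma_part) / 2)

G = u[1] ** 2                              # Gamma~_1(u) = u'^2
for l in range(2, 6):
    G = next_gamma(G)
    print('Gamma~_%d(u) =' % l, G)
```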

4 Derivatives of the mmse

The preceding section provides a full description of the successive derivatives of the Fisher information $I(t)$, $t > 0$. We develop here the connection with the derivatives of the mmse, $\mathrm{mmse}(s) = \mathrm{MMSE}(t)$, $t = t(s) = \frac{1}{2s}$, $s > 0$, of (2.4), via suitable changes of functions and variables.

Recall $v = v(t, x) = \log f_t(x)$, $t > 0$, $x \in \mathbb{R}^n$, which defines a $C^\infty$ function of both the space and time variables. Recall also the Gaussian kernel $p_t(x) = \frac{1}{(4\pi t)^{n/2}}\, e^{-|x|^2/4t}$ and set

\[
h = h(t, x) = \frac{f_t(x)}{p_t(x)}.
\]

Since $\partial_t p_t = \Delta p_t$ as well as $\partial_t f_t = \Delta f_t$, it is an easy exercise to check that $h$ solves the PDE

\[
\partial_t h = \Delta h - \frac{1}{t}\, x \cdot \nabla h.
\]

With $t = t(s) = \frac{1}{2s}$, set furthermore

\[
k = k(s, x) = h\big(t, \sqrt{2t}\, x\big) = h\Big(\frac{1}{2s}, \frac{x}{\sqrt{s}}\Big).
\]

Then

\[
\partial_s k = -\frac{1}{2s}\, L k \tag{4.1}
\]

where $L$ is the Ornstein-Uhlenbeck generator acting on smooth functions $\varphi$ on $\mathbb{R}^n$ as $L\varphi = \Delta \varphi - x \cdot \nabla \varphi$. The operator $L$ has to be thought of as the Laplacian with a drift, in such a way that the standard Gaussian measure $\gamma$ is its invariant and symmetric measure.

Let then $q = \log k$ and introduce, for $s > 0$,

\[
m(s) = \frac{1}{s} \int_{\mathbb{R}^n} k\, \Upsilon(q)\, d\gamma = \frac{1}{s} \int_{\mathbb{R}^n} \frac{|\nabla k|^2}{k}\, d\gamma \tag{4.2}
\]

where $\Upsilon(q) = \Gamma(q) = |\nabla q|^2$ (the change to the letter $\Upsilon$ will be justified below). The function $m(s)$, $s > 0$, is closely linked to the mmse. To this aim, observe first that, after a change of variables, with always $t = t(s) = \frac{1}{2s}$,

\[
m(s) = 4t^2 \int_{\mathbb{R}^n} f_t\, |\nabla w|^2\, dx
\]

where $w = \log h$. Indeed, recalling that $k(s, x) = h\big(\frac{1}{2s}, \frac{x}{\sqrt{s}}\big)$, after the change of variables $x = \sqrt{s}\, y$, and with $t = \frac{1}{2s}$,

\[
\frac{1}{s} \int_{\mathbb{R}^n} k\, \Upsilon(q)\, d\gamma = \frac{1}{s} \int_{\mathbb{R}^n} \frac{|\nabla k|^2}{k}\, d\gamma = \frac{1}{s^2} \int_{\mathbb{R}^n} \frac{|\nabla h|^2}{h}\Big(\frac{1}{2s}, \frac{x}{\sqrt{s}}\Big)\, e^{-|x|^2/2}\, \frac{dx}{(2\pi)^{n/2}}
\]
\[
= 4t^2 \int_{\mathbb{R}^n} \frac{|\nabla h|^2}{h}(t, y)\, p_t(y)\, dy = 4t^2 \int_{\mathbb{R}^n} f_t\, \frac{|\nabla h|^2}{h^2}\, dy = 4t^2 \int_{\mathbb{R}^n} f_t\, |\nabla w|^2\, dy.
\]

Since $w = \log h = v - \log p_t$, going back to $v = \log f_t$,

\[
|\nabla w|^2 = \Big| \nabla v + \frac{x}{2t} \Big|^2 = |\nabla v|^2 + \frac{1}{t}\, x \cdot \nabla v + \frac{|x|^2}{4t^2} = \frac{|\nabla f_t|^2}{f_t^2} + \frac{1}{t}\, \frac{x \cdot \nabla f_t}{f_t} + \frac{|x|^2}{4t^2}.
\]

Therefore

\[
m(s) = 4t^2 \int_{\mathbb{R}^n} f_t\, |\nabla w|^2\, dx = 4t^2 I(t) + 4t \int_{\mathbb{R}^n} x \cdot \nabla f_t\, dx + \int_{\mathbb{R}^n} |x|^2 f_t\, dx = 4t^2 I(t) - 4nt + \int_{\mathbb{R}^n} |x|^2 f_t\, dx
\]

where we used that

\[
\int_{\mathbb{R}^n} x \cdot \nabla f_t\, dx = \int_{\mathbb{R}^n} x \cdot E\Big(\frac{X - x}{2t}\, p_t(x - X)\Big)\, dx = \frac{1}{2t} \int_{\mathbb{R}^n} E\big(x \cdot (X - x)\, p_t(x - X)\big)\, dx = -\frac{1}{2t} \int_{\mathbb{R}^n} E\big((x + X) \cdot x\, p_t(x)\big)\, dx = -\frac{1}{2t} \int_{\mathbb{R}^n} |x|^2\, p_t(x)\, dx = -n.
\]

Using in addition that

\[
\int_{\mathbb{R}^n} |x|^2 f_t\, dx = E\big(|X_t|^2\big) = E\big(|X|^2\big) + 2nt,
\]

it follows that

\[
m(s) = 4t^2 I(t) - 2nt + E\big(|X|^2\big) = -\mathrm{MMSE}(t) + E\big(|X|^2\big).
\]

Hence

\[
m(s) = -\mathrm{mmse}(s) + E\big(|X|^2\big). \tag{4.3}
\]

This identity requires a second moment on $X$. However, as will be clear from the main Theorem 7 below, this condition is no longer necessary at the first and next derivatives, for which $E(|X|^2)$ cancels out.

The analysis of the derivatives of the mmse is thus brought back to the derivatives of the function $m$ of (4.2). The function $m$ is of the form of a Fisher information but, with respect to the setting of Section 2, the underlying generator and semigroup are the Ornstein-Uhlenbeck ones rather than the standard heat (Brownian) generator and semigroup, with the Gaussian measure $\gamma$ as invariant and symmetric measure. The principle of proof emphasized in [7] is nevertheless exactly the same, with simply some variations in the expression of the iterated gradients.

Denote by $\Upsilon_l$, $l \geq 1$, the iterated gradients associated to the Ornstein-Uhlenbeck operator $L = \Delta - x \cdot \nabla$, defined by $\Upsilon_l = [L, \Upsilon_{l-1}]$, $l \geq 2$, with $\Upsilon_1(u) = \Gamma(u) = |\nabla u|^2$. As developed in

[7], these iterated gradients are closely related to the standard gradients $\Gamma_l$, $l \geq 1$, of (3.3) in the following way. Let $Q_l$ be the polynomial in the variable $Z$

\[
Q_l(Z) = \prod_{i=0}^{l-1} (Z - i) = \sum_{i=1}^{l} a_i^l\, Z^i,
\]

and set by extension

\[
Q_l(\Upsilon) = \sum_{i=1}^{l} a_i^l\, \Upsilon_i.
\]

Then, for every $l \geq 1$,

\[
Q_l(\Upsilon) = \Gamma_l.
\]

For example, $Q_1(\Upsilon) = \Upsilon_1 = \Gamma_1$, $Q_2(\Upsilon) = \Upsilon_2 - \Upsilon_1 = \Gamma_2$, $Q_3(\Upsilon) = \Upsilon_3 - 3\Upsilon_2 + 2\Upsilon_1 = \Gamma_3$.

As in the standard Laplacian case, introduce the brackets $\widetilde{\Upsilon}_l = [L + \Gamma, \widetilde{\Upsilon}_{l-1}]$, $l \geq 2$, starting from $\widetilde{\Upsilon}_1 = \Upsilon_1 = \Gamma_1 = \Gamma$, and set then

\[
Q_l(\widetilde{\Upsilon}) = \sum_{i=1}^{l} a_i^l\, \widetilde{\Upsilon}_i.
\]

Note that $\widetilde{\Upsilon}_2 = \Upsilon_2 = \Gamma_2 + \Gamma$.

The differentiation process expressed by Theorem 1 then takes the same form. At the first and second steps, the derivatives of $m(s) = \frac{1}{s} \int_{\mathbb{R}^n} k\, \Upsilon(q)\, d\gamma$, $s > 0$, are given by

\[
\frac{d}{ds}\, m(s) = \frac{1}{s^2} \int_{\mathbb{R}^n} k\, \big[\widetilde{\Upsilon}_2(q) - \widetilde{\Upsilon}_1(q)\big]\, d\gamma = \frac{1}{s^2} \int_{\mathbb{R}^n} k\, Q_2(\widetilde{\Upsilon})(q)\, d\gamma,
\]
\[
\frac{d^2}{ds^2}\, m(s) = \frac{1}{s^3} \int_{\mathbb{R}^n} k\, \big[\widetilde{\Upsilon}_3(q) - 3\widetilde{\Upsilon}_2(q) + 2\widetilde{\Upsilon}_1(q)\big]\, d\gamma = \frac{1}{s^3} \int_{\mathbb{R}^n} k\, Q_3(\widetilde{\Upsilon})(q)\, d\gamma.
\]

By induction, we end up with the following conclusion [7].

Theorem 3. In the preceding notation, for every $l \geq 0$,

\[
\frac{d^l}{ds^l}\, m(s) = \frac{1}{s^{l+1}} \int_{\mathbb{R}^n} k\, Q_{l+1}(\widetilde{\Upsilon})(q)\, d\gamma.
\]

For the proof, we may refer more precisely to the key relation (3.7) from [7], which expresses that

\[
\frac{d}{ds} \int_{\mathbb{R}^n} k\, Q_l(\widetilde{\Upsilon})(q)\, d\gamma = \frac{l}{s} \int_{\mathbb{R}^n} k\, Q_l(\widetilde{\Upsilon})(q)\, d\gamma + \frac{1}{s} \int_{\mathbb{R}^n} k\, Q_{l+1}(\widetilde{\Upsilon})(q)\, d\gamma,
\]

from which the inductive step follows immediately.
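The coefficients $a_i^l$ are the signed Stirling numbers of the first kind and are immediate to generate; the small sketch below (ours, not part of the paper) expands $Q_l$ and recovers the combinations $Q_2(\Upsilon) = \Upsilon_2 - \Upsilon_1$ and $Q_3(\Upsilon) = \Upsilon_3 - 3\Upsilon_2 + 2\Upsilon_1$ quoted above.

```python
# The polynomials Q_l(Z) = Z (Z - 1) ... (Z - l + 1) and their
# coefficients a_i^l (signed Stirling numbers of the first kind);
# a sketch of ours, matching the examples in the text.
import sympy as sp

Z = sp.symbols('Z')
for l in range(1, 5):
    Q_l = sp.expand(sp.prod([Z - i for i in range(l)]))
    print('Q_%d(Z) =' % l, Q_l)
# Q_1(Z) = Z, Q_2(Z) = Z**2 - Z, Q_3(Z) = Z**3 - 3*Z**2 + 2*Z, ...
```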

Theorem 3 provides a full description of the derivatives of the function $m$, and therefore of the mmse, in terms of the iterated gradients $\Upsilon_l$ and their brackets. While the following is not strictly necessary towards the main conclusion, it will be convenient to record it here, as the calculus with the standard gradients $\widetilde{\Gamma}_l$ is somewhat more direct than the one with the $\widetilde{\Upsilon}_l$'s. Actually, in the same way as $Q_l(\Upsilon) = \Gamma_l$, we also have

Proposition 4. For every $l \geq 1$, $Q_l(\widetilde{\Upsilon}) = \widetilde{\Gamma}_l$.

We postpone the proof of this proposition to the Appendix. As a consequence of Proposition 4 and Theorem 3,

\[
\frac{d^l}{ds^l}\, m(s) = \frac{1}{s^{l+1}} \int_{\mathbb{R}^n} k\, \widetilde{\Gamma}_{l+1}(q)\, d\gamma.
\]

According to the change of variables (3.6), we therefore obtain a first description of the derivatives of the mmse. Recall $t = t(s) = \frac{1}{2s}$ and $w = \log h = \log \frac{f_t}{p_t}$.

Corollary 5. For every $l \geq 1$,

\[
\frac{d^l}{ds^l}\, \mathrm{mmse}(s) = -\frac{1}{s^{2(l+1)}} \int_{\mathbb{R}^n} f_{t(s)}\, \widetilde{\Gamma}_{l+1}(w)\, dx.
\]

5 Conditional cumulants

In order to make Corollary 5 effective, we express in this section the quantities $\widetilde{\Gamma}_l(w)$, $l \geq 1$, in terms of conditional cumulants. For simplicity, we deal with dimension one and analyze first the spatial derivatives of $v = v(t, x) = \log f_t(x)$, $t > 0$, denoted $v^{(l)}$, $l \geq 1$. Denote by $\kappa_t^l = \kappa_t^l(x)$, $l \geq 1$, the conditional cumulants given $X_t = x$ defined by the expansions, in $\lambda \in \mathbb{R}$,

\[
\log \frac{E\big(e^{\lambda X} p_t(x - X)\big)}{f_t(x)} = \sum_{l=1}^{\infty} \frac{\lambda^l}{l!}\, \kappa_t^l(x). \tag{5.1}
\]

The first conditional cumulant is the mean while the second cumulant is the variance:

\[
\kappa_t^1(x) = E(X \mid X_t = x), \qquad \kappa_t^2(x) = E\big(X^2 \mid X_t = x\big) - E(X \mid X_t = x)^2.
\]

By the kernel representation of the heat semigroup, for every $t > 0$ and $\lambda \in \mathbb{R}$,

\[
f_t(x + 2t\lambda) = E\big(p_t(x + 2t\lambda - X)\big) = \frac{1}{\sqrt{4\pi t}}\, E\big(e^{-(x + 2t\lambda - X)^2/4t}\big) = e^{-\lambda^2 t - \lambda x}\, E\big(e^{\lambda X} p_t(x - X)\big).
\]

Hence

\[
v(x + 2t\lambda) = \log f_t(x + 2t\lambda) = -\lambda^2 t - \lambda x + \log E\big(e^{\lambda X} p_t(x - X)\big),
\]

that is, by (5.1),

\[
v(x + 2t\lambda) - v(x) = -\lambda^2 t - \lambda x + \sum_{l=1}^{\infty} \frac{\lambda^l}{l!}\, \kappa_t^l(x).
\]

Since, by a formal Taylor expansion, $v(x + 2t\lambda) = \sum_{l=0}^{\infty} \frac{(2t\lambda)^l}{l!}\, v^{(l)}(x)$, it follows by comparison that

\[
2t\, v' = \kappa_t^1 - x, \qquad 4t^2 v'' = \kappa_t^2 - 2t,
\]

and, for every $l \geq 3$, $(2t)^l v^{(l)} = \kappa_t^l$. Recalling that $w = v - \log p_t = v + \frac{x^2}{4t} + \frac{1}{2} \log(4\pi t)$, we therefore reach the following conclusion.

Proposition 6. For every $l \geq 1$,

\[
(2t)^l\, w^{(l)} = \kappa_t^l. \tag{5.2}
\]

We can now go back to Corollary 5. Recall the polynomials $R_l$, $l \geq 1$, from Proposition 2, so that

\[
\widetilde{\Gamma}_{l+1}(w) = R_{l+1}\big(w'', \ldots, w^{(l+1)}\big) = (2t)^{-2(l+1)}\, R_{l+1}\big(\kappa_t^2, \ldots, \kappa_t^{l+1}\big).
\]

Hence, with $t = t(s) = \frac{1}{2s}$,

\[
\frac{d^l}{ds^l}\, \mathrm{mmse}(s) = -\int_{\mathbb{R}} f_t\, R_{l+1}\big(\kappa_t^2, \ldots, \kappa_t^{l+1}\big)\, dx. \tag{5.3}
\]

Let $K_s^l$ be the conditional cumulants of $X$ given $\sqrt{s}\, X + N$, equivalently $X_{t(s)}$ (to shorten the notation), that is, with generating series in $\lambda \in \mathbb{R}$,

\[
\log E\big(e^{\lambda X} \mid X_{t(s)}\big) = \sum_{l=1}^{\infty} \frac{\lambda^l}{l!}\, K_s^l.
\]

Note that all these conditional cumulants, besides the first one, have moments of all orders, on the same scheme as for (2.2). Indeed, as is classical, $K_s^l$, $l \geq 2$, is the coefficient in front of $\frac{\lambda^l}{l!}$ of the formal series expansion

\[
\sum_{k \geq 1} \frac{(-1)^{k-1}}{k} \bigg( \sum_{m \geq 2} \frac{\lambda^m}{m!}\, M_s^m \bigg)^{k}
\]

where the $M_s^m$ are the conditional centered moments

\[
M_s^m = E\Big( \big[X - E(X \mid X_{t(s)})\big]^m \,\Big|\, X_{t(s)} \Big), \quad m \geq 2. \tag{5.4}
\]

For example,

\[
K_s^2 = M_s^2, \qquad K_s^3 = M_s^3, \qquad K_s^4 = M_s^4 - 3\big(M_s^2\big)^2.
\]

In addition, $E(K_s^2) = E(M_s^2) = \mathrm{MMSE}(t(s))$.
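Proposition 6 can be tested symbolically on a simple input. For the Bernoulli variable $X = \pm 1$ one has, by Bayes' rule, $\mathbb{P}(X = \pm 1 \mid X_t = x) \propto e^{\pm x/2t}$, so both sides of (5.2) are explicit; the sympy sketch below (ours, not part of the paper) confirms the identity for the first few orders at a numerical test point.

```python
# Symbolic check (ours) of Proposition 6, (2t)^l w^(l) = kappa_t^l,
# for the Bernoulli input X = +-1, where P(X = +-1 | X_t = x) is
# proportional to exp(+-x / 2t).
import sympy as sp

x, t, lam = sp.symbols('x t lam', positive=True)

# conditional log-Laplace transform given X_t = x
logmgf = sp.log((sp.exp(lam + x / (2 * t)) + sp.exp(-lam - x / (2 * t)))
                / (sp.exp(x / (2 * t)) + sp.exp(-x / (2 * t))))
# w = log(f_t / p_t) = -1/(4t) + log cosh(x / 2t) for this input
w = -1 / (4 * t) + sp.log(sp.cosh(x / (2 * t)))

pt = {x: sp.Rational(7, 10), t: sp.Rational(1, 3)}   # a test point
for l in range(1, 5):
    kappa_l = sp.diff(logmgf, lam, l).subs(lam, 0)   # kappa_t^l(x)
    delta = ((2 * t) ** l * sp.diff(w, x, l) - kappa_l).subs(pt)
    assert abs(sp.N(delta, 30)) < sp.Float('1e-25'), l
print("(2t)^l w^(l) = kappa_t^l checked for l = 1, ..., 4")
```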

Recalling that $X_t$ has law $f_t\, dx$, the identities (5.3) lead to the following main conclusion describing the successive derivatives of the mmse in terms of the conditional cumulants $K_s^l$, $l \geq 1$.

Theorem 7. In the preceding notation, for every $l \geq 1$, at $s > 0$,

\[
\frac{d^l}{ds^l}\, \mathrm{mmse}(s) = -E\Big( R_{l+1}\big(K_s^2, \ldots, K_s^{l+1}\big) \Big). \tag{5.5}
\]

As an illustration, the first derivatives are given by

\[
\frac{d}{ds}\, \mathrm{mmse}(s) = -E\big((K_s^2)^2\big),
\]
\[
\frac{d^2}{ds^2}\, \mathrm{mmse}(s) = -E\big((K_s^3)^2 - 2 (K_s^2)^3\big),
\]
\[
\frac{d^3}{ds^3}\, \mathrm{mmse}(s) = -E\big((K_s^4)^2 - 12 K_s^2 (K_s^3)^2 + 6 (K_s^2)^4\big),
\]
\[
\frac{d^4}{ds^4}\, \mathrm{mmse}(s) = -E\big((K_s^5)^2 - 30 (K_s^3)^2 K_s^4 - 20 K_s^2 (K_s^4)^2 + 120 (K_s^2)^2 (K_s^3)^2 - 24 (K_s^2)^5\big).
\]

These identities are in accordance with the formulas of [8, 6] expressed in terms of the conditional central moments (5.4). Indeed,

\[
\frac{d}{ds}\, \mathrm{mmse}(s) = -E\big((M_s^2)^2\big),
\]
\[
\frac{d^2}{ds^2}\, \mathrm{mmse}(s) = -E\big((M_s^3)^2 - 2 (M_s^2)^3\big),
\]
\[
\frac{d^3}{ds^3}\, \mathrm{mmse}(s) = -E\big((M_s^4)^2 - 6 M_s^4 (M_s^2)^2 - 12 M_s^2 (M_s^3)^2 + 15 (M_s^2)^4\big),
\]
\[
\frac{d^4}{ds^4}\, \mathrm{mmse}(s) = -E\big((M_s^5)^2 - 20 M_s^2 M_s^3 M_s^5 - 30 (M_s^3)^2 M_s^4 - 20 M_s^2 (M_s^4)^2 + 120 (M_s^2)^3 M_s^4 + 310 (M_s^2)^2 (M_s^3)^2 - 204 (M_s^2)^5\big).
\]

To illustrate these identities, observe for example that if $X$ is centered normal with variance $\sigma^2$, then

\[
\mathrm{mmse}(s) = \frac{\sigma^2}{1 + \sigma^2 s}, \quad s > 0,
\]

while $K_s^1 = \frac{\sigma^2 s}{1 + \sigma^2 s}\, X_{t(s)}$, $K_s^2 = \frac{\sigma^2}{1 + \sigma^2 s}$, and $K_s^l = 0$ if $l > 2$, so that

\[
\frac{d^l}{ds^l}\, \mathrm{mmse}(s) = (-1)^l\, l!\, \Big( \frac{\sigma^2}{1 + \sigma^2 s} \Big)^{l+1} = -(-1)^{l+1}\, l!\, E\big((K_s^2)^{l+1}\big).
\]
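The first identity of Theorem 7 is also easy to test by simulation on a non-Gaussian input. For $X = \pm 1$ and $Y = \sqrt{s}\, X + N$, the conditional mean is $\tanh(\sqrt{s}\, Y)$ and $K_s^2 = 1 - \tanh^2(\sqrt{s}\, Y)$, so both sides of $\frac{d}{ds}\, \mathrm{mmse}(s) = -E((K_s^2)^2)$ can be estimated; the Monte Carlo sketch below (ours, not part of the paper) compares a finite-difference derivative with the cumulant expression.

```python
# Monte Carlo illustration (ours) of the first identity of Theorem 7,
#   d/ds mmse(s) = -E((K_s^2)^2),
# for the Bernoulli input X = +-1, with Y = sqrt(s) X + N, conditional
# mean tanh(sqrt(s) Y) and conditional variance 1 - tanh(sqrt(s) Y)^2.
import numpy as np

rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=2_000_000)
N = rng.standard_normal(X.size)

def mmse(s):
    Y = np.sqrt(s) * X + N
    return np.mean((X - np.tanh(np.sqrt(s) * Y)) ** 2)

s, ds = 1.0, 1e-4
deriv = (mmse(s + ds) - mmse(s - ds)) / (2 * ds)       # d/ds mmse(s)
K2 = 1.0 - np.tanh(np.sqrt(s) * (np.sqrt(s) * X + N)) ** 2
print(deriv, -np.mean(K2 ** 2))                        # should agree
```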

6 On the MMSE conjecture

The work [6] raises the conjecture that the knowledge of the Minimum Mean-Square Error $\mathrm{MMSE}(t)$, $t \geq 0$ (equivalently the entropy $H(t)$ or the Fisher information $I(t)$, $t > 0$), along the flow determines the underlying distribution of a centered real-valued random variable $X$ up to the sign, namely either the law of $X$ or of $-X$. Note indeed that

\[
\mathrm{MMSE}(t) = E\Big( \big[X - E(X \mid X_t)\big]^2 \Big), \quad t \geq 0,
\]

is invariant under translation of $X$ by a constant, so does not characterize the mean, and under the change of $X$ into $-X$.

The idea expressed by the preceding investigation is that the MMSE could act as a kind of moment generating function, its successive derivatives at $0$ determining the moments or the cumulants of the underlying distribution. To make use of Theorem 7 following this principle, observe first that the expected conditional cumulants $E(K_s^l)$, $l \geq 1$, converge as $s \to 0$ to the cumulants $K^l(X)$ of $X$, provided all moments of $X$ are finite. It might be of interest to briefly justify this intuitive statement.

To this task, let us first show that $E(X \mid X_t) \to E(X)$ almost surely as $t \to \infty$ whenever $X$ is integrable. Indeed, in the notation of Section 2,

\[
E(X \mid X_t) = \frac{g_t}{f_t}(X_t) = \frac{\widetilde{E}\big(\widetilde{X}\, p_t(X_t - \widetilde{X})\big)}{\widetilde{E}\big(p_t(X_t - \widetilde{X})\big)} = \frac{\widetilde{E}\big(\widetilde{X}\, e^{-(X_t - \widetilde{X})^2/4t}\big)}{\widetilde{E}\big(e^{-(X_t - \widetilde{X})^2/4t}\big)}
\]

where $\widetilde{X}$ is an independent copy of $X$ (and $\widetilde{E}$ denotes partial integration with respect to it). Now, if $E|\widetilde{X}| = E|X| < \infty$, by dominated convergence, almost surely,

\[
\lim_{t \to \infty} \widetilde{E}\big(\widetilde{X}\, e^{-(X_t - \widetilde{X})^2/4t}\big) = e^{-N^2/2}\, E(X),
\]

while

\[
\lim_{t \to \infty} \widetilde{E}\big(e^{-(X_t - \widetilde{X})^2/4t}\big) = e^{-N^2/2},
\]

from which the claim follows.

Assuming all moments of $X$ are finite, for each $m$, the family $\big[X - E(X \mid X_t)\big]^m$, $t \geq 0$, is uniformly integrable. Therefore, as $s \to 0$ (recall $s = \frac{1}{2t}$, $t = t(s) = \frac{1}{2s}$), the expected conditional moments $E(M_s^m)$ converge to the centered moments $E\big([X - E(X)]^m\big)$ of $X$. In particular, $\mathrm{MMSE}(t) \to K^2(X)$ as $t \to \infty$. Now, the same type of arguments shows that, for every $k \geq 1$ and every $m_1, \ldots, m_k \geq 1$,

\[
E\big(M_s^{m_1} \cdots M_s^{m_k}\big) \to E\big([X - E(X)]^{m_1}\big) \cdots E\big([X - E(X)]^{m_k}\big),
\]

and similarly with the conditional cumulants $K_s^l$,

\[
E\big(K_s^{m_1} \cdots K_s^{m_k}\big) \to K^{m_1}(X) \cdots K^{m_k}(X).
\]

On the basis of Theorem 7 and the latter, a partial result towards the MMSE conjecture may therefore be emphasized. Recall from Proposition 2 that $R_2(Z_2) = Z_2^2$ and, for every $l \geq 2$,

\[
R_{l+1}(Z_2, \ldots, Z_{l+1}) = Z_{l+1}^2 + \widehat{R}_l(Z_2, \ldots, Z_l).
\]

Hence, in the limit as $s \to 0$ in (5.5), the quantities

\[
K^{l+1}(X)^2 + \widehat{R}_l\big(K^2(X), \ldots, K^l(X)\big), \quad l \geq 1,
\]

are determined by the mmse/MMSE. While these expressions are recursive, the knowledge of $K^2(X), \ldots, K^l(X)$ only determines $K^{l+1}(X)$ up to the sign. Therefore, only the case of non-negative cumulants may be handled.

Corollary 8. Let $X$ be a centered random variable, determined by its moments, such that all its cumulants $K^l(X)$, $l \geq 0$, are non-negative. Then the MMSE characterizes the distribution of $X$.

Since $K^l(-X) = (-1)^l K^l(X)$, if $X$ is a centered random variable determined by its moments such that both $K^l(X)$ and $K^l(-X)$, $l \geq 0$, are non-negative, then necessarily $K^l(X) = 0$ for odd $l$'s and the law of $X$ is symmetric.

It may be pointed out that, within this corollary, the knowledge of all derivatives of MMSE/mmse at zero is enough to determine the law of $X$. Since it is not known whether $\mathrm{mmse}(s)$ is real-analytic at zero (this is proved in [5] only under extra assumptions), this might be weaker than knowing the function mmse.

A simple example of a random variable for which the hypotheses of the corollary are satisfied is the case of a symmetric infinitely divisible random variable $X$ determined by its moments. That is, the law of $X$ has Fourier transform of the form $e^\psi$ where

\[
\psi(t) = -\frac{\sigma^2 t^2}{2} + \int_{\mathbb{R}} \big(e^{itx} - 1\big)\, d\mu(x)
\]

where $\sigma^2 \geq 0$ and $\mu$ is a symmetric measure on $\mathbb{R}$ such that $\mu(\{0\}) = 0$ and $\int_{\mathbb{R}} x^{2k}\, d\mu(x) < \infty$ for every $k \geq 1$. Then

\[
K^2(X) = \sigma^2 + \int_{\mathbb{R}} x^2\, d\mu(x), \qquad K^{2k+1}(X) = 0, \ k \geq 1, \qquad K^{2k}(X) = \int_{\mathbb{R}} x^{2k}\, d\mu(x), \ k \geq 2.
\]

One may also consider the case of a Lévy measure $\mu$ supported on $\mathbb{R}_+$. Concrete examples may be achieved as compound Poisson distributions. Let $Y$ be a symmetric or positive random variable and $P$ an independent Poisson variable with parameter $\theta > 0$. Considering $Y_1, Y_2, \ldots$, independent copies of $Y$, set

\[
X = \sum_{k=1}^{P} Y_k.
\]

Then $K^l(X) = \theta\, E(Y^l) \geq 0$, $l \geq 1$.

It might be worthwhile mentioning in addition that the knowledge of the cumulants up to the sign is not enough to characterize the distribution in general. For example, the hyperbolic distribution $\frac{dx}{2 \cosh(\pi x / 2)}$ has Laplace transform $\frac{1}{\cos \lambda}$, while the Laplace transform of the symmetric Bernoulli measure on $\{-1, +1\}$ is $\cosh \lambda$. The odd cumulants of these two distributions are zero, while if $K^{2l}$, $l \geq 1$, denote the even cumulants of the Bernoulli law, those of the hyperbolic distribution are $(-1)^{l+1} K^{2l}$.

The multivariate version of the MMSE conjecture is also worth consideration, and might involve multidimensional versions of the Fisher information and cumulants. Some formulas in this regard are displayed in the next section.
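The compound Poisson example can be checked directly from the log-Laplace transform $\log E(e^{\lambda X}) = \theta\big(E(e^{\lambda Y}) - 1\big)$, whose $l$-th derivative at $\lambda = 0$ is $\theta\, E(Y^l)$. The sympy sketch below (ours, not part of the paper) does this for $Y$ exponential with parameter $1$, where $E(Y^l) = l!$, so that all cumulants are indeed non-negative and Corollary 8 applies.

```python
# Cumulants of a compound Poisson variable (a sketch of ours):
# log E e^{lam X} = theta (E e^{lam Y} - 1), hence K^l(X) = theta E(Y^l).
# Checked for Y exponential(1), where E(Y^l) = l!.
import sympy as sp

lam, theta = sp.symbols('lam theta', positive=True)
mgf_Y = 1 / (1 - lam)                  # E e^{lam Y}, Y ~ Exp(1), lam < 1
log_mgf_X = theta * (mgf_Y - 1)        # log E e^{lam X}
for l in range(1, 6):
    K_l = sp.diff(log_mgf_X, lam, l).subs(lam, 0)
    assert sp.simplify(K_l - theta * sp.factorial(l)) == 0
print("K^l(X) = theta E(Y^l) >= 0 verified for l = 1, ..., 5")
```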

7 Some monotonicity properties of the MMSE

In this last section, we collect a few simple monotonicity properties of the Fisher information and the MMSE, some of them already emphasized in [6]. We first mention, from the identity $4t^2 I(t) = 2nt - \mathrm{MMSE}(t)$, $t > 0$, of (2.3), that, for every $t > 0$,

\[
I(t) \leq \frac{n}{2t} \tag{7.1}
\]

which connects in this Euclidean setting to the classical Li-Yau parabolic inequality on heat kernels in Riemannian manifolds. (The Li-Yau inequality states that if $f_t$ is a positive solution of the heat equation on an $n$-dimensional Riemannian manifold with non-negative Ricci curvature, then $-\Delta \log f_t \leq \frac{n}{2t}$, which integrated yields (7.1); see [3].)

The derivative formulas of the preceding sections may be extended to higher dimension, at the expense however of somewhat cumbersome computations and notation. For simplicity, let us stick here at the level of the Fisher information and its first derivative. Let thus $X$ be a random vector in $\mathbb{R}^n$ and $X_t = X + \sqrt{2t}\, N$, $t \geq 0$, as above. With

\[
Z_t = \mathrm{Cov}(X \mid X_t) = E\Big( \big[X - E(X \mid X_t)\big] \otimes \big[X - E(X \mid X_t)\big] \,\Big|\, X_t \Big) = E\big(X \otimes X \mid X_t\big) - E(X \mid X_t) \otimes E(X \mid X_t),
\]

it holds from Sections 2 and 3 that

\[
4t^2 I(t) = 2nt - \mathrm{MMSE}(t) = -E\big(\mathrm{Tr}(Z_t - 2t\, \mathrm{Id})\big) \tag{7.2}
\]

and

\[
8t^4\, \frac{d}{dt}\, I(t) = -E\big(\|Z_t - 2t\, \mathrm{Id}\|^2\big) \tag{7.3}
\]

where $\|\cdot\|$ is the Hilbert-Schmidt norm on matrices. Indeed, from (3.2),

\[
\frac{d}{dt}\, I(t) = -2 \int_{\mathbb{R}^n} f_t\, |\nabla^2 v|^2\, dx.
\]

Now, in the same form as $2t\, \nabla f_t = g_t - x f_t$, where $g_t(x) = E(X p_t(x - X))$ and $E(X \mid X_t) = \frac{g_t}{f_t}(X_t)$, it holds true that

\[
4t^2\, \nabla^2 f_t = \big(x \otimes x - 2t\, \mathrm{Id}\big) f_t - x \otimes g_t - g_t \otimes x + h_t
\]

where $h_t(x) = E\big(X \otimes X\, p_t(x - X)\big)$ and $E(X \otimes X \mid X_t) = \frac{h_t}{f_t}(X_t)$. Hence

\[
4t^2\, \nabla^2 v = 4t^2 \Big( \frac{\nabla^2 f_t}{f_t} - \frac{\nabla f_t \otimes \nabla f_t}{f_t^2} \Big) = \frac{h_t}{f_t} - \frac{g_t \otimes g_t}{f_t^2} - 2t\, \mathrm{Id},
\]

from which the claim (7.3) follows by definition of $Z_t$.

Combining (7.2) and (7.3) yields some interesting consequences. First,

\[
2t^2\, \frac{d}{dt}\, \mathrm{MMSE}(t) = 4nt^2 + 4t\, E\big(\mathrm{Tr}(Z_t - 2t\, \mathrm{Id})\big) + E\big(\|Z_t - 2t\, \mathrm{Id}\|^2\big) = E\big(\|Z_t\|^2\big) \geq 0,
\]

so that $\mathrm{MMSE}(t)$, $t \geq 0$, is increasing. This property can be made somewhat more precise. Indeed,

\[
E\big(\mathrm{Tr}(Z_t - 2t\, \mathrm{Id})\big)^2 \leq n\, E\big(\|Z_t - 2t\, \mathrm{Id}\|^2\big),
\]

from which it follows that

\[
2t^2\, \frac{d}{dt}\, \mathrm{MMSE}(t) \geq \frac{1}{n}\, \mathrm{MMSE}(t)^2.
\]

Hence

\[
\frac{d}{dt} \Big( -\frac{1}{\mathrm{MMSE}(t)} \Big) \geq \frac{1}{2nt^2}.
\]

Therefore, for all $0 < t \leq t'$,

\[
\frac{1}{\mathrm{MMSE}(t)} - \frac{1}{\mathrm{MMSE}(t')} \geq \frac{1}{2n} \Big( \frac{1}{t} - \frac{1}{t'} \Big).
\]

In particular, as $t' \to \infty$, $\mathrm{MMSE}(t') \to M_2(X) = E\big(|X - E(X)|^2\big)$, so that for every $t \geq 0$,

\[
\mathrm{MMSE}(t) \leq \frac{2nt\, M_2(X)}{2nt + M_2(X)}. \tag{7.4}
\]

Note that the right-hand side of (7.4) is precisely the MMSE of a centered Gaussian random vector with second moment $M_2(X)$ (the standard normal law $\gamma$ when $M_2(X) = n$), which thus achieves the maximum of the MMSE over centered random variables with finite second moment.

Another property was emphasized in [6] as the single crossing property. Namely, if $\mathrm{MMSE}(t_0)^2 \geq 2nt_0^2\, \alpha(t_0)$ at $t_0 > 0$ for some smooth function $\alpha(t)$, $t > 0$, then

\[
\frac{d}{dt}\, \mathrm{MMSE}(t)\Big|_{t = t_0} \geq \alpha(t_0).
\]

Indeed, setting $F(t) = -\beta(t) + \mathrm{MMSE}(t)$, $t > 0$, where $\beta$ is an anti-derivative of $\alpha$, by (7.2),

\[
F'(t_0) = -\alpha(t_0) + 2n + \frac{2}{t_0}\, E\big(\mathrm{Tr}(Z_{t_0} - 2t_0\, \mathrm{Id})\big) + \frac{1}{2t_0^2}\, E\big(\|Z_{t_0} - 2t_0\, \mathrm{Id}\|^2\big)
\]
\[
\geq -\frac{1}{2nt_0^2}\, E\big(\mathrm{Tr}(Z_{t_0} - 2t_0\, \mathrm{Id})\big)^2 + \frac{1}{2t_0^2}\, E\big(\|Z_{t_0} - 2t_0\, \mathrm{Id}\|^2\big) \geq 0.
\]

As discussed in [6], a sensible choice for the function $\alpha$ is the MMSE of a standard normal given by the right-hand side of (7.4).
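The bound (7.4) is attained by Gaussian inputs and is strict otherwise; the following Monte Carlo sketch (ours, not part of the paper) illustrates it for a simple three-point input distribution, the conditional mean being explicit by Bayes' rule.

```python
# Numerical illustration (ours) of the bound (7.4), n = 1:
#   MMSE(t) <= 2 t M2(X) / (2 t + M2(X)),
# for a three-point input, the conditional mean being computed
# exactly by Bayes' rule.
import numpy as np

rng = np.random.default_rng(2)
vals = np.array([-1.0, 0.0, 2.0])
probs = np.array([0.3, 0.4, 0.3])
M2 = np.sum(probs * (vals - np.sum(probs * vals)) ** 2)

m = 500_000
X = rng.choice(vals, p=probs, size=m)
for t in (0.05, 0.5, 2.0):
    Xt = X + np.sqrt(2 * t) * rng.standard_normal(m)
    w = probs * np.exp(-(Xt[:, None] - vals) ** 2 / (4 * t))
    cond_mean = (w * vals).sum(axis=1) / w.sum(axis=1)
    mmse_t = np.mean((X - cond_mean) ** 2)
    bound = 2 * t * M2 / (2 * t + M2)
    print(t, mmse_t, bound, mmse_t <= bound)
```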

8 Appendix: Proofs of Propositions 2 and 4

Proof of Proposition 2. Although the proposition is presented for functions on the real line, we use the general multi-dimensional notation. The proof of (3.5) of Proposition 2 relies on the following statement. The more precise description of the multilinear forms $V_l$ therein is not strictly necessary for the purpose. Property (3.6) also follows from this statement.

Proposition 9. For every $l \geq 3$,

\[
\widetilde{\Gamma}_l(u) = \Gamma_l(u) + V_l(u)
\]

where

\[
V_l(u) = \sum_{j=3}^{l}\ \sum_{\substack{\alpha_1, \ldots, \alpha_j \in \{2, \ldots, l-1\} \\ \alpha_1 + \cdots + \alpha_j = 2l}} c_{\alpha_1, \ldots, \alpha_j}\, \nabla^{\alpha_1} u \cdots \nabla^{\alpha_j} u \tag{8.1}
\]

for real coefficients $c_{\alpha_1, \ldots, \alpha_j}$, $\alpha_1, \ldots, \alpha_j \in \{2, \ldots, l-1\}$, $j = 3, \ldots, l$. In addition, $c_{\alpha_1, \ldots, \alpha_l} = (-1)^l (l-1)!$.

It might be of interest to understand the combinatorial structure of the coefficients $c$.

Proof. The proof is performed by induction. Assuming that $\widetilde{\Gamma}_l(u) = \Gamma_l(u) + V_l(u)$, we examine $[\Delta, \widetilde{\Gamma}_l](u) = [\Delta, \Gamma_l](u) + [\Delta, V_l](u)$ and $[\Gamma, \widetilde{\Gamma}_l](u) = [\Gamma, \Gamma_l](u) + [\Gamma, V_l](u)$, as $\widetilde{\Gamma}_{l+1} = [\Delta + \Gamma, \widetilde{\Gamma}_l]$. First, $[\Delta, \Gamma_l](u) = \Gamma_{l+1}(u)$, providing the term $\Gamma_{l+1}(u)$ in $\widetilde{\Gamma}_{l+1}(u)$. Next,

\[
[\Gamma, \Gamma_l](u) = \Gamma\big(\Gamma_l(u), u\big) - \Gamma_l\big(\Gamma(u), u\big) = \nabla u \cdot \nabla\big(|\nabla^l u|^2\big) - \nabla^l\big(|\nabla u|^2\big) \cdot \nabla^l u,
\]

which is of the form $V_{l+1}(u)$.

We then study $[\Delta, V_l](u)$ and $[\Gamma, V_l](u)$ for some $V_l$ of the form (8.1). Let

\[
B(u) = B(u, \ldots, u) = \nabla^{\alpha_1} u \cdots \nabla^{\alpha_j} u
\]

be $j$-multilinear symmetric in smooth functions $u$, where $\alpha_1, \ldots, \alpha_j$ are integers greater than or equal to $2$ satisfying $\alpha_1 + \cdots + \alpha_j = 2l$. By definition,

\[
2\, [\Delta, B](u) = \Delta\big(B(u)\big) - j\, B(\Delta u, u, \ldots, u) = \Delta\big(\nabla^{\alpha_1} u \cdots \nabla^{\alpha_j} u\big) - \sum_{i=1}^{j} \nabla^{\alpha_1} u \cdots \nabla^{\alpha_i + 2} u \cdots \nabla^{\alpha_j} u = \sum_{\substack{i, i' = 1 \\ i \neq i'}}^{j} \nabla^{\alpha_1} u \cdots \nabla^{\alpha_i + 1} u \cdots \nabla^{\alpha_{i'} + 1} u \cdots \nabla^{\alpha_j} u.
\]

Hence $[\Delta, V_l](u)$ has the form $V_{l+1}(u)$. In the same way,

\[
2\, [\Gamma, B](u) = 2\, \Gamma\big(B(u), u\big) - j\, B\big(\Gamma(u), u, \ldots, u\big) = 2 \sum_{i=1}^{j} \nabla^{\alpha_1} u \cdots \nabla^{\alpha_i + 1} u \cdots \nabla^{\alpha_j} u \cdot \nabla u - \sum_{i=1}^{j} \nabla^{\alpha_1} u \cdots \nabla^{\alpha_i}\big(|\nabla u|^2\big) \cdots \nabla^{\alpha_j} u.
\]

Now

\[
\nabla^{\alpha_i}\big(|\nabla u|^2\big) = 2\, \nabla u \cdot \nabla^{\alpha_i + 1} u + P(u)
\]

where $P(u)$ only involves derivatives of $u$ up to the order $\alpha_i$, with total degree $\alpha_i + 2$. Therefore $[\Gamma, V_l](u)$ is also of the form $V_{l+1}(u)$. Moreover, if $j = l$, so that necessarily $\alpha_1 = \cdots = \alpha_l = 2$, then $2\, [\Gamma, B](u) = -2l\, (\nabla^2 u)^{l+1}$, the only term of this form in $V_{l+1}(u)$, justifying the last assertion of the statement. The proposition is established.

Proof of Proposition 4. We use induction on $l$. By construction of the polynomials

\[
Q_l(Z) = \prod_{i=0}^{l-1} (Z - i) = \sum_{i=1}^{l} a_i^l\, Z^i, \quad l \geq 1,
\]

it holds that $Q_{l+1}(Z) = \sum_{i=1}^{l} a_i^l\, Z^{i+1} - l\, Q_l(Z)$. Hence,

\[
Q_{l+1}(\widetilde{\Upsilon}) = [L + \Gamma, Q_l(\widetilde{\Upsilon})] - l\, Q_l(\widetilde{\Upsilon}).
\]

Then, by the induction hypothesis,

\[
Q_{l+1}(\widetilde{\Upsilon}) = [L + \Gamma, \widetilde{\Gamma}_l] - l\, \widetilde{\Gamma}_l = [L, \widetilde{\Gamma}_l] + [\Gamma, \widetilde{\Gamma}_l] - l\, \widetilde{\Gamma}_l.
\]

Since $\widetilde{\Gamma}_{l+1} = [\Delta + \Gamma, \widetilde{\Gamma}_l]$, it suffices to show that

\[
[L, \widetilde{\Gamma}_l] = [\Delta, \widetilde{\Gamma}_l] + l\, \widetilde{\Gamma}_l. \tag{8.2}
\]

We establish (8.2) also by induction. On the one hand,

\[
[L, \widetilde{\Gamma}_{l+1}] = [L, [\Delta, \widetilde{\Gamma}_l]] + [L, [\Gamma, \widetilde{\Gamma}_l]].
\]

On the other hand, by the induction hypothesis,

\[
[\Delta, \widetilde{\Gamma}_{l+1}] = [\Delta, [\Delta, \widetilde{\Gamma}_l]] + [\Delta, [\Gamma, \widetilde{\Gamma}_l]] = [\Delta, [L, \widetilde{\Gamma}_l]] - l\, [\Delta, \widetilde{\Gamma}_l] + [\Delta, [\Gamma, \widetilde{\Gamma}_l]].
\]

The identity (8.2) at the order $l + 1$ reads

\[
[L, \widetilde{\Gamma}_{l+1}] = [\Delta, \widetilde{\Gamma}_{l+1}] + (l+1)\, [\Delta + \Gamma, \widetilde{\Gamma}_l],
\]

so that, in order that it hold true, it amounts to show that

\[
[L, [\Delta, \widetilde{\Gamma}_l]] + [L, [\Gamma, \widetilde{\Gamma}_l]] = [\Delta, [L, \widetilde{\Gamma}_l]] + [\Delta, \widetilde{\Gamma}_l] + [\Delta, [\Gamma, \widetilde{\Gamma}_l]] + (l+1)\, [\Gamma, \widetilde{\Gamma}_l].
\]

Since $[\Delta, L] = \Delta$, by the Lie bracket property (cf. [7]),

\[
[\Delta, [L, \widetilde{\Gamma}_l]] + [\Delta, \widetilde{\Gamma}_l] - [L, [\Delta, \widetilde{\Gamma}_l]] = 0,
\]

so that it is left to show that

\[
[L, [\Gamma, \widetilde{\Gamma}_l]] = [\Delta, [\Gamma, \widetilde{\Gamma}_l]] + (l+1)\, [\Gamma, \widetilde{\Gamma}_l]. \tag{8.3}
\]

Again by the Lie bracket property, and the fact that $[L, \Gamma] = \widetilde{\Upsilon}_2 = \Gamma_2 + \Gamma = [\Delta, \Gamma] + \Gamma$,

\[
[L, [\Gamma, \widetilde{\Gamma}_l]] = -[\widetilde{\Gamma}_l, [L, \Gamma]] + [\Gamma, [L, \widetilde{\Gamma}_l]] = -[\widetilde{\Gamma}_l, [\Delta, \Gamma]] + [\Gamma, \widetilde{\Gamma}_l] + [\Gamma, [L, \widetilde{\Gamma}_l]],
\]

while

\[
[\Delta, [\Gamma, \widetilde{\Gamma}_l]] = -[\widetilde{\Gamma}_l, [\Delta, \Gamma]] + [\Gamma, [\Delta, \widetilde{\Gamma}_l]].
\]

Since, by the induction hypothesis, $[L, \widetilde{\Gamma}_l] = [\Delta, \widetilde{\Gamma}_l] + l\, \widetilde{\Gamma}_l$, it immediately follows that (8.3) holds true, thereby completing the proof of Proposition 4.

Acknowledgment. I am grateful to O. Johnson, and to I. Nourdin and G. Peccati, for pointing out to me the reference [6], and to D. Bakry, M. Capitaine and G. Letac for useful discussions. I thank the referees for a careful reading of the earlier manuscript and for suggestions of improvements in the exposition.

References

[1] S. Artstein, K. Ball, F. Barthe, A. Naor. Solution of Shannon's problem on the monotonicity of entropy. J. Amer. Math. Soc. 17 (2004), 975-982.

[2] D. Bakry, M. Émery. Diffusions hypercontractives. Séminaire de Probabilités XIX, Lecture Notes in Math. 1123 (1985), 177-206. Springer.

[3] D. Bakry, I. Gentil, M. Ledoux. Analysis and geometry of Markov diffusion operators. Grundlehren der mathematischen Wissenschaften 348. Springer, Berlin, 2014.

[4] M. Madiman, A. Barron. Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Inform. Theory 53 (2007), 2317-2329.

[5] D. Guo, S. Shamai, S. Verdú. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. Inform. Theory 51 (2005), 1261-1282.

[6] D. Guo, Y. Wu, S. Shamai, S. Verdú. Estimation in Gaussian noise: properties of the minimum mean-square error. IEEE Trans. Inform. Theory 57 (2011), 2371-2385.

[7] M. Ledoux. L'algèbre de Lie des gradients itérés d'un générateur markovien: développements de moyennes et entropies. Ann. Scient. Éc. Norm. Sup. 28 (1995), 435-460.

[8] V. Prelov, S. Verdú. Second-order asymptotics of mutual information. IEEE Trans. Inform. Theory 50 (2004), 1567-1580.

[9] D. Shlyakhtenko. A free analogue of Shannon's problem on monotonicity of entropy. Adv. Math. 208 (2007), 824-833.

[10] A. Tulino, S. Verdú. Monotonic decrease of the non-Gaussianness of the sum of independent random variables: a simple proof. IEEE Trans. Inform. Theory 52 (2006), 4295-4297.

Institut de Mathématiques de Toulouse
Université de Toulouse - Paul-Sabatier, F-31062 Toulouse, France
& Institut Universitaire de France
ledoux@math.univ-toulouse.fr