Convergence of generalized entropy minimizers in sequences of convex problems

Proceedings IEEE ISIT 2016, Barcelona, Spain, 2609–2613

Imre Csiszár
A. Rényi Institute of Mathematics
Hungarian Academy of Sciences
H-1364 Budapest, P.O. Box 127, Hungary
Email: csiszar@renyi.hu

František Matúš
Institute of Information Theory and Automation
Academy of Sciences of the Czech Republic
182 08 Prague, P.O. Box 18, Czech Republic
Email: matus@utia.cas.cz

Abstract—Integral functionals based on convex normal integrands are minimized over convex constraint sets. Generalized minimizers exist under a boundedness condition. Sequences of the minimization problems are studied when the constraint sets are nested. The corresponding sequences of generalized minimizers are related to the minimization over limit convex sets. Martingale theorems and moment problems are discussed.

I. INTRODUCTION

Let (Z, 𝒵, µ) be a σ-finite measure space. For real 𝒵-measurable functions g on Z, let

    H_β(g) = ∫_Z β(z, g(z)) µ(dz).

The functional H_β is based on an integrand β on Z × ℝ with values in (−∞, +∞]. Assumptions on β, summarized at the beginning of Section II, include strict convexity in the second coordinate, finiteness of β(z, t) for t > 0 and β(z, t) = +∞ for t < 0. Thus, H_β is finite only when g is nonnegative, µ-a.e.

This work generalizes and strengthens existing results on the minimization of the integral functional H_β over convex sets C of functions g for which inf_C H_β ≜ inf_{g∈C} H_β(g) is finite. If a minimizer g in C exists, H_β(g) = inf_C H_β, then it is unique, µ-a.e., by strict convexity. Otherwise, the minimizing sequences g_n in C, H_β(g_n) → inf_C H_β, are of interest. The minimization of H_β over C has a generalized minimizer if all minimizing sequences have a common limit, denoted here by ĝ_C. Theorem 1 implies the existence of ĝ_C when a single minimizing sequence is bounded. Corollary 1 presents equivalent assumptions.

For a sequence C_n of convex sets of functions g let

    (P_n)    J_n = inf_{C_n} H_β,    n ≥ 1.

Assuming monotonicity, either C_n ⊇ C_{n+1} or C_n ⊆ C_{n+1}, let C denote the intersection or union of the sets C_n, respectively. The goal is to relate the problems (P_n) to

    (P)    J = inf_C H_β.

Theorems 2 and 3, formulated in Section III, deal with the limit behavior of the sequence ĝ_{C_n} of generalized minimizers in the problems (P_n). Under some conditions the convergence to the generalized minimizer ĝ_C in (P) is established in the Bregman distance based on β.

In Section IV, the results are applied to extend martingale theorems. A discussion of moment problems is presented in Section V. Proofs are postponed to the Appendix.

II. PRELIMINARIES AND ASSUMPTIONS

It is assumed throughout that β is a normal integrand [7, Chapter 14] such that β(z, ·), z ∈ Z, belongs to the class Γ of functions γ on ℝ that are finite and strictly convex for t > 0 and equal to +∞ for t < 0. The integrand β is autonomous when γ ∈ Γ exists such that β(z, ·) = γ for all z ∈ Z. A function γ ∈ Γ is asymptotically nonlinear if either the limit γ′(+∞) of γ′(t) when t → +∞ is infinite or the increasing function t ↦ t γ′(+∞) − γ(t) is unbounded. The integrand β is asymptotically nonlinear if β(z, ·) has the property for µ-a.a. z ∈ Z.

From now on, all functions f, g, h on Z are assumed to be nonnegative and 𝒵-measurable. Equalities between them are considered µ-a.e. If neither the positive nor the negative part of the function z ↦ β(z, g(z)) is µ-integrable, the integral of this function over Z is +∞ by convention.

The Bregman distance of g and h, based on β, is given by

    B_β(g, h) = ∫_Z Δβ(z, g(z), h(z)) µ(dz)

where Δβ is a nonnegative integrand on Z × ℝ² such that Δβ(z, s, t), for z ∈ Z and s, t ≥ 0, is equal to γ(s) − γ(t) − γ′_{sgn(s−t)}(t)·(s − t) if γ′_+(t) is finite, and to s·(+∞) otherwise. Here, γ abbreviates β(z, ·), sgn(r) denotes +1 if r ≥ 0 and −1 if r < 0, and γ′_± are the one-sided derivatives. The Bregman distance is nonnegative and vanishes if and only if g = h. For more technical details see [5].

A sequence of functions g_n on Z converges locally in the measure µ to a function h, in symbols g_n → h, if it converges in measure on each set C of finite µ-measure, thus µ(C ∩ {|g_n − h| > ε}) → 0, ε > 0. Either of the convergences B_β(g_n, h) → 0 or B_β(h, g_n) → 0 implies g_n → h [5, Corollary 2.4]. A set ℋ of functions on Z is bounded locally in the measure µ if for each set C ∈ 𝒵 with finite µ(C) and every ε > 0 there exists K such that sup_{h∈ℋ} µ(C ∩ {h > K}) ≤ ε. A set of functions 𝒢 or ℋ is β-bounded or reversely β-bounded if there exist functions g, h such that, respectively, sup_{g∈𝒢} B_β(g, h) < +∞ or sup_{h∈ℋ} B_β(g, h) < +∞.
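To make the preceding definitions concrete, here is a minimal numerical sketch (not part of the paper, purely illustrative): it evaluates H_γ and the Bregman distance B_γ for the autonomous integrand γ(t) = t ln t on a finite measure space, where Δγ(s, t) reduces to γ(s) − γ(t) − γ′(t)(s − t) with γ′(t) = ln t + 1. All names and the chosen weights are hypothetical.

```python
# Illustrative sketch (not from the paper): H_gamma and B_gamma for gamma(t) = t*ln(t)
# on a finite measure space Z = {1,...,N} with weights mu; 0*ln(0) is taken as 0.
import numpy as np

def gamma(t):
    t = np.asarray(t, dtype=float)
    val = np.where(t > 0, t * np.log(np.where(t > 0, t, 1.0)), 0.0)
    return np.where(t < 0, np.inf, val)          # +infinity for t < 0

def H(g, mu):
    # H_gamma(g) = sum_z gamma(g(z)) mu(z)
    return float(np.sum(gamma(g) * mu))

def B(g, h, mu):
    # Bregman distance: sum_z [gamma(g) - gamma(h) - gamma'(h)(g - h)] mu(z),
    # with gamma'(t) = ln(t) + 1 for t > 0 (here h is assumed strictly positive)
    g, h = np.asarray(g, float), np.asarray(h, float)
    return float(np.sum((gamma(g) - gamma(h) - (np.log(h) + 1.0) * (g - h)) * mu))

mu = np.array([0.25, 0.25, 0.25, 0.25])          # weights of the measure mu
g  = np.array([0.5, 1.5, 1.0, 1.0])
h  = np.array([1.0, 1.0, 1.0, 1.0])
print(H(g, mu), B(g, h, mu), B(h, h, mu))        # B is nonnegative, zero iff g = h
```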

III. MAIN RESULTS

The following assertion extends [3, Theorem 1(c)], which is confined to autonomous integrands. There, an assumption of boundedness was missing, though implicitly used in a proof; see also [5, Example 10.5].

Theorem 1. Let C be a convex set of functions on Z with finite inf_C H_β. If a minimizing sequence g_n in C is bounded locally in measure then there exists a unique function ĝ_C such that

    (1)    H_β(g) ≥ inf_C H_β + B_β(g, ĝ_C),    g ∈ C.

Remark 1. If the finite infimum in (1) is attained, the function ĝ_C from Theorem 1 equals the minimizer and H_β(g) ≥ H_β(ĝ_C) + B_β(g, ĝ_C), g ∈ C. Otherwise, ĝ_C is the generalized minimizer in the sense that B_β(g_n, ĝ_C) → 0 for every minimizing sequence g_n in C.

The assumption of boundedness has equivalent reformulations.

Corollary 1. Assuming inf_C H_β is finite, the following assertions are equivalent:
(i) a minimizing sequence is bounded locally in measure;
(ii) every minimizing sequence is eventually β-bounded;
(iii) the generalized minimizer exists;
(iv) the level sets of H_β intersected with C are β-bounded.

Let C_n be a sequence of convex sets of nonnegative functions and (P_n) the corresponding minimization problems with values J_n. When nonincreasing/nondecreasing, in symbols C_n ↓ / C_n ↑, the intersection/union C gives rise to the problem (P). Correspondingly, the sequence of values J_n is nondecreasing/nonincreasing and upper/lower bounded by J.

Theorem 2. Let C_n ↓ be a sequence of convex sets such that J_1 is finite. Let H_β have a minimizing sequence in C_1 that is bounded locally in measure. If the sequence J_n has a finite limit then there exists a unique function h such that B_β(h, ĝ_{C_n}) → 0 and

    (2)    H_β(g) ≥ lim_n J_n + B_β(g, h),    g ∈ C.

If the finite limit of J_n equals J then h = ĝ_C.

Remark 2. In Theorem 2, let the infima in the problems (P_n) be attained. By Remark 1, the ĝ_{C_n} are the unique minimizers. If the sequence J_n is bounded then a unique function h exists such that ĝ_{C_n} converges in the sense B_β(h, ĝ_{C_n}) → 0 and

    (3)    H_β(g) ≥ lim_n H_β(ĝ_{C_n}) + B_β(g, h),    g ∈ C.

By Lemma 4 in the Appendix, ĝ_{C_n} → h. Hence, along a subsequence ĝ_{C_n} → h, µ-a.e. Often H_β is lower semicontinuous, lim inf_n H_β(g_n) ≥ H_β(g) whenever g_n → g, µ-a.e. Then, J ≥ lim_n H_β(ĝ_{C_n}) ≥ H_β(h). If h ∈ C then the inequalities are tight by (3), h = ĝ_C is the minimizer in the problem (P), B_β(ĝ_C, ĝ_{C_n}) → 0 and H_β(ĝ_{C_n}) → H_β(ĝ_C).

Example 1. Let µ be the measure on ℝ² expressible as the sum of the probability measure (pm) sitting in the point (0,1), the pm sitting in (0,2) and the pm on (0,+∞)² with the density (x_1, x_2) ↦ e^{−x_1−x_2}, see [4, Example 1]. The functional H_γ is based on the autonomous integrand γ: t ↦ t ln t. Given s ≥ 1, the functional H_γ is minimized over C_n = ∪{G(ε, s): 0 ≤ ε ≤ 1/n} where G(a_1, a_2), for (a_1, a_2) ∈ ℝ², denotes the set of nonnegative functions g that satisfy the three moment constraints

    ∫_{ℝ²} (1, z_1, z_2) g(z_1, z_2) µ(dz_1, dz_2) = (1, a_1, a_2).

The minimum of H_γ over G(ε, s), ε > 0, is attained uniquely at the function f_{ε,s}: (z_1, z_2) ↦ e^{ϑ_1 z_1 + ϑ_2 z_2 − Λ(ϑ_1, ϑ_2)} where

    Λ: (ϑ_1, ϑ_2) ↦ ln( e^{ϑ_2} + e^{2ϑ_2} + 1/((1−ϑ_1)(1−ϑ_2)) ),    ϑ_1, ϑ_2 < 1,

is the log-Laplace transform of µ and (ϑ_1, ϑ_2) is the unique solution of ∇Λ(ϑ_1, ϑ_2) = (ε, s). Thus,

    1/((1−ϑ_1)²(1−ϑ_2)) = ε ( e^{ϑ_2} + e^{2ϑ_2} + 1/((1−ϑ_1)(1−ϑ_2)) ),
    e^{ϑ_2} + 2e^{2ϑ_2} + 1/((1−ϑ_1)(1−ϑ_2)²) = s ( e^{ϑ_2} + e^{2ϑ_2} + 1/((1−ϑ_1)(1−ϑ_2)) ).

For fixed s > s_1 ≜ (1+2e)/(1+e), the second equation and ϑ_2 < 1 imply that, as ε → 0, 1/((1−ϑ_1)(1−ϑ_2)²) → (e + e²)(s − s_1) and ϑ_2 → 1. By the first equation, ϑ_1 → −∞ and ϑ_1 ε → 0. Hence, when ε → 0 the functions f_{ε,s} converge to zero on (0,+∞)², to 1/(1+e) at (0,1) and to e/(1+e) at (0,2). Further,

    min_{G(ε,s)} H_γ = ϑ_1 ε + ϑ_2 s − Λ(ϑ_1, ϑ_2) → s − 1 − ln(1+e).

This minimum is actually Λ*(ε, s) where Λ* is the conjugate of Λ. By calculus, for s > s_1,

    Λ*(ε, s) = sup_{ϑ_1, ϑ_2 < 1} [ ϑ_1 ε + ϑ_2 s − ln( e^{ϑ_2} + e^{2ϑ_2} + 1/((1−ϑ_1)(1−ϑ_2)) ) ]
             = s − 1 − ln(1+e) − 3 ( ε(s − s_1)/(e + e²) )^{1/3} + O(ε^{2/3}).

The sets C_n intersect to C = G(0, s). For s ≤ 2 this family contains the single function f_{0,s} that equals 2 − s at (0,1), s − 1 at (0,2), and 0 otherwise. Hence, J = (s−1) ln(s−1) + (2−s) ln(2−s), 1 ≤ s ≤ 2. If s > 2 then G(0, s) is empty, thus J = +∞. For s > s_1 it follows that the infimum J_n in (P_n) is attained and lim_n J_n = s − 1 − ln(1+e) < J. Thus, the limit is finite but smaller than J, which is even infinite for s > 2. When n is sufficiently large the minimizer ĝ_{C_n} in the problem (P_n) is equal to the minimizer f_{1/n,s} of H_γ over G(1/n, s), using the upper bound on Λ*. Theorem 2 applies. The function h = f_{0,s_1} is different from ĝ_C = f_{0,s} for s_1 < s ≤ 2, while ĝ_C is not defined for s > 2.
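The limiting behaviour in Example 1 can be checked numerically. The sketch below is not part of the paper and assumes SciPy is available; it evaluates Λ*(ε, s) = min_{G(ε,s)} H_γ by maximizing ϑ_1 ε + ϑ_2 s − Λ(ϑ_1, ϑ_2) over ϑ_1, ϑ_2 < 1, and illustrates that the values approach s − 1 − ln(1+e), which stays below J, as ε ↓ 0 (the approach is slow, of order ε^{1/3}).

```python
# Numerical sketch of Example 1 (illustrative only; assumes SciPy).
import numpy as np
from scipy.optimize import minimize

def Lam(theta):
    # log-Laplace transform of mu from Example 1, defined for theta_1, theta_2 < 1
    t1, t2 = theta
    return np.log(np.exp(t2) + np.exp(2*t2) + 1.0/((1.0 - t1)*(1.0 - t2)))

def min_H_over_G(eps, s):
    # min over G(eps, s) of H_gamma = Lambda*(eps, s) = sup_theta [theta.(eps, s) - Lambda(theta)]
    obj = lambda th: Lam(th) - th[0]*eps - th[1]*s
    res = minimize(obj, x0=np.array([-1.0, 0.0]),
                   bounds=[(None, 1 - 1e-8), (None, 1 - 1e-8)])
    return -res.fun

s = 1.9                              # s_1 = (1+2e)/(1+e) ~ 1.731 < s <= 2
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(eps, min_H_over_G(eps, s))
print("limit s - 1 - ln(1+e) =", s - 1 - np.log(1 + np.e))
print("J =", (s - 1)*np.log(s - 1) + (2 - s)*np.log(2 - s))
```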

Theorem 3. Let β be asymptotically nonlinear and C_n ↑ be a sequence of convex sets such that J_1 is finite. Let H_β have a minimizing sequence in C_n that is bounded locally in measure, n ≥ 1. If the sequence J_n has a finite limit then the limit equals J, the minimizing sequences in C are β-bounded and B_β(ĝ_{C_n}, ĝ_C) → 0.

Example 2. Let µ be the Lebesgue measure on [0,1]. The functional H_γ is based on γ: t ↦ t ln t and is minimized over C_n = {g ≥ 0: ∫_0^1 g dµ = 1, g(t) = 2t for t ≤ 1/n}. The minimizer exists, ĝ_{C_n}(t) = 1 + 1/n for t ≥ 1/n, and

    J_n = ∫_0^{1/n} 2t ln(2t) dt + (1 − 1/n²) ln(1 + 1/n) → 0.

By Theorem 3, J = 0 and ĝ_C ≡ 1 is the generalized minimizer in the problem (P), which however has no minimizer because ĝ_C is not in C = ∪_n C_n. Instead, ĝ_C is a minimizer when minimizing over all functions integrating to 1.

IV. MARTINGALE THEOREMS

In this section Theorems 2 and 3 are related to the convergence of conditional expectations in Bregman distances. A function f has the covering property w.r.t. a sub-σ-algebra 𝒴 of 𝒵 (and µ) if Z is the union of at most countably many sets Y ∈ 𝒴 with ∫_Y f dµ finite. By the Radon–Nikodym theorem, there exists a 𝒴-measurable function f_𝒴 = f_{𝒴,µ} that satisfies ∫_Y f_𝒴 dµ = ∫_Y f dµ for Y ∈ 𝒴. It is nonnegative and unique. If f > 0 then f_𝒴 > 0.

Let sub-σ-algebras 𝒴_n of 𝒵 be nested as 𝒴_n ⊆ 𝒴_{n+1}, in symbols 𝒴_n ↑, and generate 𝒴. A martingale theorem asserts that if f has the covering property w.r.t. 𝒴_1 then f_{𝒴_n} → f_𝒴, µ-a.e. In a backward martingale theorem, the sub-σ-algebras are nested as 𝒴_n ⊇ 𝒴_{n+1} and intersect to 𝒴.

Let D_{𝒴,f} denote the set of nonnegative functions g that satisfy ∫_Y g dµ = ∫_Y f dµ for Y ∈ 𝒴. The minimization of the functionals H_β over the convex set C = D_{𝒴,f} is not considered here in full generality but only for the integrands

    β: (z, t) ↦ h(z) γ(t/h(z)),    z ∈ Z, t ∈ ℝ,

where γ ∈ Γ and h is a positive function on Z. If h ≡ 1 then the integrand is autonomous, which can be arranged by replacing µ by ν with dν = h dµ and g by g/h, see [5, Appendix C]. It is further assumed that γ is differentiable, nonnegative and γ(1) = 0. Then, H_β(g) becomes the γ-divergence D_γ(g, h) of a function g from h,

    H_β(g) = ∫_Z h γ(g/h) dµ = D_γ(g, h).

Lemma 1. If f ≥ 0 and h > 0 have the covering property w.r.t. a sub-σ-algebra 𝒴 of 𝒵 and µ then min_{g ∈ D_{𝒴,f}} D_γ(g, h) = D_γ(f_𝒴, h_𝒴). If finite, the minimum is attained at g = f_𝒴 h / h_𝒴.

(Proofs of the assertions presented in this section are not included. They are available for the reviewers' purposes at http://staff.utia.cas.cz/matus/.)

When the minimum in Lemma 1 is finite, Theorem 1 for H_β = D_γ(·, h) and C = D_{𝒴,f} features the minimizer ĝ_{D_{𝒴,f}} equal to f_𝒴 h / h_𝒴. Inequality (1) is tight:

    D_γ(g, h) = D_γ(f_𝒴, h_𝒴) + B_β(g, f_𝒴 h / h_𝒴),    g ∈ D_{𝒴,f}.

In fact, if the right-hand side is finite the Bregman distance equals

    ∫_Z [ h γ(g/h) − h γ(f_𝒴/h_𝒴) − γ′(f_𝒴/h_𝒴) (g − f_𝒴 h / h_𝒴) ] dµ.

The third term can be omitted since g_𝒴 = f_𝒴. For the autonomous integrand, h ≡ 1, ĝ_{D_{𝒴,f}} = f_𝒴 and (1) takes the form

    H_γ(g) = H_γ(f_𝒴) + B_γ(g, f_𝒴),    g ∈ D_{𝒴,f}.

Lemma 2. If sub-σ-algebras 𝒴_n ↑ generate 𝒴, and f ≥ 0 and h > 0 have the covering property w.r.t. 𝒴_1, then D_γ(f_{𝒴_n}, h_{𝒴_n}) → D_γ(f_𝒴, h_𝒴).

Theorem 2 restricted to the autonomous integrands applies to sequences of sets of the type D_{𝒴,f}. Lemmas 1 and 2 are invoked in the proof.

Corollary 2. Under the assumptions of Lemma 2 with h ≡ 1, if H_γ(f_𝒴) is finite then B_γ(f_𝒴, f_{𝒴_n}) → 0.

The choice γ: t ↦ t ln t − t + 1 gives the convergence of relative entropies in Lemma 2, see [6, Theorem 2], while the assertion of Corollary 2 seems to be new. A consequence of Theorem 3 is analogous to Corollary 2.

Corollary 3. Let γ be asymptotically nonlinear. If 𝒴_n ↓ intersect to 𝒴, f ≥ 0 and h > 0 have the covering property w.r.t. 𝒴, and D_γ(f_{𝒴_1}, h_{𝒴_1}) is finite, then D_γ(f_{𝒴_n}, h_{𝒴_n}) → D_γ(f_𝒴, h_𝒴), and if H_γ(f_𝒴) is finite then B_γ(f_{𝒴_n}, f_𝒴) → 0.

Choosing γ: t ↦ t ln t − t + 1, the convergence of relative entropies follows, see [6, Theorem 3].
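The following sketch (again not part of the paper) illustrates Lemma 2 and Corollary 2 in the simplest autonomous case h ≡ 1, γ(t) = t ln t: on [0,1] with Lebesgue measure, the conditional expectations f_{𝒴_n} with respect to the increasing dyadic σ-algebras are block averages, and D_γ(f_{𝒴_n}, 1) = H_γ(f_{𝒴_n}) increases to H_γ(f). The grid resolution and the density f(t) = 2t are hypothetical choices made only for the illustration.

```python
# Illustration of Lemma 2 / Corollary 2 with mu = Lebesgue measure on [0,1], h = 1,
# gamma(t) = t*ln(t); Y_n is generated by the dyadic intervals of length 2^(-n).
import numpy as np

M = 2**16                                   # fine grid approximating [0,1]
x = (np.arange(M) + 0.5) / M
f = 2.0 * x                                 # a probability density w.r.t. Lebesgue measure

def cond_exp_dyadic(f, n):
    # f_{Y_n}: average of f over each dyadic interval of length 2^(-n)
    return np.repeat(f.reshape(2**n, -1).mean(axis=1), M // 2**n)

def H_gamma(g):
    g = np.maximum(g, 1e-300)
    return float(np.mean(g * np.log(g)))    # integral w.r.t. Lebesgue measure on [0,1]

for n in [0, 1, 2, 4, 8]:
    print(n, H_gamma(cond_exp_dyadic(f, n)))   # nondecreasing in n
print("H_gamma(f) =", H_gamma(f))              # the limit, here ln(2) - 1/2
```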

V. DISCUSSION

The minimization of integral functionals studied in this contribution includes a number of special situations that appeared before. The relations to the literature are discussed at length in [5, Section 1]. The focus here is on sequences of generalized minimizers for entropy-like functionals, which is a novelty even for the autonomous integrands. The generalization matters, see the simple situation in Example 2.

To make Theorem 2 operational, conditions for the convergence J_n → J < +∞ are desirable. By Remark 2, it is sufficient to have β ≥ 0, making H_β lower semicontinuous, and h ∈ C. An alternative is to assume that µ is finite and β bounded below. Likewise, for Theorem 3 the boundedness of J_n should be under control. For more general sequences of convex sets, see [2], [8].

In moment problems, C is the set of functions g satisfying

    ∫_Z φ_j g dµ = a_j,    j = 1, …, d,

where φ_j: Z → ℝ are given moment functions and a_j ∈ ℝ are prescribed moments. Feasibility is resolved by the concept of conic core [5]. If inf_C H_β is finite then the boundedness condition from Theorem 1 is equivalent to the modified dual constraint qualification. The generalized minimizer can be explicitly described avoiding the primary constraint qualification [5].

A well-understood special limiting situation is when µ is finite, H_γ is based on γ ∈ Γ with γ(t)/t → +∞, C_n = {g ∈ L_1(µ): ∫_Z φ_j g dµ = a_j, j = 1, …, n} where φ_j ∈ L_∞(µ), and C = ∩_n C_n contains a function g with H_γ(g) finite. Then, H_γ is bounded below, the sets C_n are weakly closed, H_γ has weakly compact level sets and is strongly lower semicontinuous in L_1(µ). Therefore, ĝ_{C_n} and ĝ_C are minimizers [8, Theorem 1]. The convergence J_n → J < +∞ takes place. By Theorem 2, B_γ(ĝ_C, ĝ_{C_n}) → 0. If γ(t) = t ln t this improves the previously known convergence in L_1(µ) [1, Corollary 3.3]. See also [3] for a sufficient condition on γ to conclude this convergence.

APPENDIX

Any β-bounded set is bounded locally in measure and so is any reversely β-bounded set provided β is asymptotically nonlinear, see Corollary 5. The following appeared previously as [5, Lemma 2.3].

Lemma 3. To any set C ∈ 𝒵 of finite µ-measure and K, ξ, ε positive there exists δ > 0 such that for functions g and h either of B_β(g, h) ≤ δ or B_β(h, g) ≤ δ implies µ(C ∩ {|g − h| > ε}) < ξ + µ(C ∩ {g > K}).

Lemma 4. If a sequence g_n is bounded locally in measure, h_{n,m} is an array of functions and B_β(g_n, h_{n,m}) → 0 with n, m → ∞ (n ≥ m or m ≥ n), then g_n − h_{n,m} → 0 with n, m → ∞ (n ≥ m or m ≥ n).

Proof: Since the sequence g_n is bounded, if C ∈ 𝒵 has finite µ-measure and ξ > 0 then there exists K > 0 such that µ(C ∩ {g_n > K}) < ξ for all n. By Lemma 3, for ε > 0 there exists δ > 0 such that if B_β(g_n, h_{n,m}) ≤ δ then µ(C ∩ {|h_{n,m} − g_n| > ε}) < ξ + µ(C ∩ {g_n > K}) < 2ξ. The assumption on convergence implies that this holds eventually in n, m. Hence, the local convergence follows.

Corollary 4. If a sequence g_n is bounded locally in measure and B_β(g_n, g_m) → 0 with n ≥ m → ∞ or m ≥ n → ∞, then g_n converges locally in measure.

Proof: Lemma 4 is applied with h_{n,m} = g_m to conclude that g_n − g_m → 0 with n, m as above. In either case, g_n is Cauchy locally in measure, and the convergence follows.

Proof of Theorem 1: The main ingredient is the identity involving integral functionals and Bregman distances

    t H_β(g_1) + (1−t) H_β(g_2) − H_β(t g_1 + (1−t) g_2)
        = t B_β(g_1, h) + (1−t) B_β(g_2, h) − B_β(t g_1 + (1−t) g_2, h)

where g_1, g_2 and h are nonnegative functions, 0 < t < 1 and the H_β-terms and B_β(t g_1 + (1−t) g_2, h) are finite. By assumption, there exists a minimizing sequence g_n in C that is bounded locally in measure, and the finite H_β(g_n) converge to the finite inf_C H_β. Then, the identity implies

    ½ H_β(g_n) + ½ H_β(g_m) − H_β(h_{n,m}) = ½ B_β(g_n, h_{n,m}) + ½ B_β(g_m, h_{n,m})

where h_{n,m} = ½ (g_n + g_m). The left-hand side tends to 0 with n, m → ∞ as H_β(h_{n,m}) converges to inf_C H_β due to the convexity of H_β. Then, B_β(g_n, h_{n,m}) → 0 and B_β(g_m, h_{n,m}) → 0. By Lemma 4, g_n − h_{n,m} → 0 and g_m − h_{n,m} → 0. Hence, g_n − g_m → 0, which expresses that the sequence g_n is Cauchy locally in measure. Going to a subsequence if necessary, g_n → ĝ_C, µ-a.e., for some function ĝ_C.

Taking g ∈ C with finite H_β(g) and h_n = t_n g + (1−t_n) g_n with 0 < t_n < 1, the identity implies

    t_n H_β(g) + (1−t_n) H_β(g_n) − H_β(h_n) = t_n B_β(g, h_n) + (1−t_n) B_β(g_n, h_n).

If t_n → 0 then H_β(h_n) → inf_C H_β. Hence, t_n can decrease slowly enough to make (1/t_n)[H_β(g_n) − H_β(h_n)] converge to zero. Dividing by t_n,

    H_β(g) ≥ inf_C H_β + lim sup_n B_β(g, h_n).

Since h_n → ĝ_C, µ-a.e., inequality (1) follows by the lower semicontinuity of the Bregman distance [5, Lemma 2.2].

Proof of Theorem 2: By the assumptions, Theorem 1 applied to C_1 and Corollary 1, the level sets of H_β intersected with C_1 are β-bounded. Then, Theorem 1 and Corollary 1 apply to all C_n and the generalized minimizers ĝ_{C_n} exist. For n ≥ m, inequality (1) and C_n ⊆ C_m imply

    (4)    H_β(g) ≥ J_m + B_β(g, ĝ_{C_m}),    g ∈ C_n.

If g_k ∈ C_n is a minimizing sequence in the problem (P_n) then (4), with g_k in place of g, is passed to the limit in k and gives

    (5)    J_n ≥ J_m + lim sup_k B_β(g_k, ĝ_{C_m}) ≥ J_m + B_β(ĝ_{C_n}, ĝ_{C_m}),

due to g_k → ĝ_{C_n} and the lower semicontinuity of the Bregman distance [5, Lemma 2.2]. Hence, B_β(ĝ_{C_n}, ĝ_{C_m}) → 0 when n, m → ∞ and n ≥ m. In addition, the sequence ĝ_{C_n} is β-bounded as lim_n J_n ≥ J_1 + sup_n B_β(ĝ_{C_n}, ĝ_{C_1}). By Corollary 4, ĝ_{C_n} → h for a function h. Passing to the limit in (5) with n → ∞, the lower semicontinuity implies

    lim_n J_n ≥ J_m + B_β(h, ĝ_{C_m}),    m ≥ 1.

Hence, B_β(h, ĝ_{C_m}) → 0 with m → ∞. It follows from (4), restricted to C ⊆ C_n, that

    H_β(g) ≥ lim_m J_m + lim sup_m B_β(g, ĝ_{C_m}),    g ∈ C.

This, ĝ_{C_m} → h and the lower semicontinuity imply inequality (2).

When the value J in (P) is finite then inequality (2) implies that the level sets of H_β intersected with C are β-bounded. Theorem 1 together with Corollary 1 imply the existence of a function ĝ_C such that

    (6)    H_β(g) ≥ J + B_β(g, ĝ_C),    g ∈ C.

When lim_n J_n = J, inequalities (6) and (2) differ only in h and ĝ_C. Then, the two functions coincide due to the uniqueness stated in Theorem 1.

Remark 3. In Theorem 2, if g_n ∈ C_n and H_β(g_n) − J_n → 0 then g_n → h. In fact, inequality (4) implies lim_n J_n ≥ J_m + lim sup_n B_β(g_n, ĝ_{C_m}). Then, g_n is eventually β-bounded. Since B_β(g_n, ĝ_{C_n}) → 0, due to inequality (4) with n = m, it follows that g_n − ĝ_{C_n} → 0 by Lemma 4. This and ĝ_{C_n} → h imply g_n → h.

The following is an extended version of [5, Lemma 9.9]. (In [5, Lemma 9.9] the arguments in B_β(h, g) are erroneously interchanged.)

Lemma 5. Let C ∈ 𝒵 have finite µ-measure and L, ξ, δ > 0. There exists K > L such that if B_β(g, h) ≤ δ for functions g, h then µ(C ∩ {g > K}) < ξ + µ(C ∩ {h > L}). The assertion holds also assuming instead B_β(h, g) ≤ δ and the asymptotic nonlinearity of β.

Proof: Let M = 2δ/ξ. By monotonicity, for K > L

    B_β(g, h) ≥ ∫_{{g>K, h≤L}} Δβ(z, g(z), h(z)) µ(dz) ≥ ∫_{{g>K, h≤L}} Δβ(z, K, L) µ(dz)
             ≥ M µ(C ∩ {Δβ(·, K, L) ≥ M, g > K, h ≤ L}).

Then, µ(C ∩ {g > K}) is at most

    (1/M) B_β(g, h) + µ(C ∩ {Δβ(·, K, L) < M}) + µ(C ∩ {h > L}).

If B_β(g, h) ≤ δ then (1/M) B_β(g, h) ≤ ½ ξ. By strict convexity, Δβ(z, K, L) increases to +∞ when K → +∞. Therefore, for K > L sufficiently large µ(C ∩ {Δβ(·, K, L) < M}) < ½ ξ, and the assertion follows.

If B_β(h, g) ≤ δ then for K > L

    δ ≥ ∫_{{g>K, h≤L}} Δβ(z, h(z), g(z)) µ(dz) ≥ ∫_{{g>K, h≤L}} Δβ(z, L, K) µ(dz)
      ≥ M µ(C ∩ {Δβ(·, L, K) ≥ M, g > K, h ≤ L}).

Therefore, µ(C ∩ {g > K}) is at most ½ ξ + µ(C ∩ {Δβ(·, L, K) < M}) + µ(C ∩ {h > L}). To conclude that Δβ(z, L, K) → +∞ with K → +∞, the asymptotic nonlinearity of β(z, ·) suffices. The modified assertion follows as above.

Corollary 5. If a set 𝒢 of functions is β-bounded then it is bounded locally in measure. The conclusion holds also for sets ℋ that are reversely β-bounded, assuming the asymptotic nonlinearity of β.

Proof of Theorem 3: By Theorem 1, the generalized minimizers ĝ_{C_n} exist. For n ≥ m it follows from inequality (1) and C_m ⊆ C_n that

    (7)    H_β(g) ≥ J_n + B_β(g, ĝ_{C_n}),    g ∈ C_m.

Since the left-hand side of (7) can be finite, the sequence ĝ_{C_n} is reversely β-bounded. Then, it is bounded locally in measure by Corollary 5, using that β is asymptotically nonlinear. For a minimizing sequence g_k in C_m, Theorem 1 implies g_k → ĝ_{C_m}, and then a subsequence converges µ-a.e. Passing to the limit in (7) along the subsequence,

    (8)    J_m ≥ J_n + B_β(ĝ_{C_m}, ĝ_{C_n}),    n ≥ m,

on account of the lower semicontinuity of the Bregman distance [5, Lemma 2.2]. Since the sequence J_n is bounded and monotone, B_β(ĝ_{C_m}, ĝ_{C_n}) → 0, n ≥ m → ∞. By Lemma 4, there exists a function h such that ĝ_{C_n} → h. Passing to the limit in (7) with n → ∞, along a subsequence,

    H_β(g) ≥ lim_n J_n + B_β(g, h),    g ∈ C_m,

analogously as above. Since m is arbitrary this inequality holds even for g ∈ C. Then, inf_C H_β is at least the finite limit, and J = lim_n J_n by monotonicity. Hence,

    H_β(g) ≥ J + B_β(g, h),    g ∈ C,

and the minimizing sequences in C are β-bounded. Theorem 1 applies to C and, by uniqueness, h = ĝ_C. Passing to the limit in (8) with n → ∞, along a subsequence with ĝ_{C_n} → ĝ_C, µ-a.e.,

    J_m ≥ lim_n J_n + B_β(ĝ_{C_m}, ĝ_C),    m ≥ 1.

Then, B_β(ĝ_{C_m}, ĝ_C) → 0 when m → ∞.

ACKNOWLEDGEMENT

The work on this contribution was supported by the Hungarian National Foundation for Scientific Research under Grant K 105840 and by the Grant Agency of the Czech Republic under Grant 16-12010S.

REFERENCES

[1] Borwein, J.M. and Lewis, A.S., Convergence of best entropy estimates. SIAM J. Optimization 1 (1991) 191–205.
[2] Borwein, J.M. and Lewis, A.S., Strong rotundity and optimization. SIAM J. Optimization 4 (1994) 146–158.
[3] Csiszár, I., Generalized projections for non-negative functions. Acta Math. Hungar. 68 (1995), 1–2, 161–185.
[4] Csiszár, I. and Matúš, F., Information projections revisited. IEEE Trans. Inform. Theory 49 (2003) 1474–1490.
[5] Csiszár, I. and Matúš, F., Generalized minimizers of convex integral functionals, Bregman distance, Pythagorean identities. Kybernetika 48 (2012) 637–689.
[6] Harremoës, P. and Holst, K.K., Convergence of Markov chains in information divergence. J. Theoretical Probability 22 (2009) 186–202.
[7] Rockafellar, R.T. and Wets, R.J.-B., Variational Analysis. Springer Verlag, Berlin, Heidelberg, New York 2004.
[8] Teboulle, M. and Vajda, I., Convergence of best φ-entropy estimates. IEEE Transactions on Information Theory 39 (1993) 297–301.

PROOFS OF THE RESULTS FROM SECTION IV

This is an appendix which is not a part of the publication. It provides supplementary material to Section IV.

Proof of Lemma 1: Let ν denote the measure with the µ-density h. Assuming the γ-divergence is finite,

    D_γ(g, h) = ∫_Z h γ(g/h) dµ = ∫_Z γ(g/h) dν = ∫_Z (γ(g/h))_{𝒴,ν} dν
              = ∫_Z (γ(g/h))_{𝒴,ν} h dµ = ∫_Z (γ(g/h))_{𝒴,ν} h_{𝒴,µ} dµ.

The conditional Jensen inequality with γ ≥ 0 implies (γ(g/h))_{𝒴,ν} ≥ γ((g/h)_{𝒴,ν}). By the assumptions, f_𝒴 and h_𝒴 are well defined and so is g_𝒴 for g ∈ D_{𝒴,f}. For such g and Y ∈ 𝒴,

    ∫_Y g_{𝒴,µ} dµ = ∫_Y g dµ = ∫_Y (g/h) dν = ∫_Y (g/h)_{𝒴,ν} dν
                  = ∫_Y (g/h)_{𝒴,ν} h dµ = ∫_Y (g/h)_{𝒴,ν} h_{𝒴,µ} dµ.

Using that h_{𝒴,µ} > 0, (g/h)_{𝒴,ν} = g_{𝒴,µ}/h_{𝒴,µ}. It follows by g_{𝒴,µ} = f_{𝒴,µ} that

    D_γ(g, h) ≥ ∫_Z γ(g_{𝒴,µ}/h_{𝒴,µ}) h_{𝒴,µ} dµ = D_γ(f_{𝒴,µ}, h_{𝒴,µ}).

Thus, the minimum is at least D_γ(f_𝒴, h_𝒴). Denoting f_𝒴 h/h_𝒴 by f^h, for Y ∈ 𝒴

    ∫_Y f dµ = ∫_Y f_𝒴 dµ = ∫_Y (f^h)_𝒴 dµ = ∫_Y f^h dµ

whenever the integral on the left is finite. Therefore, f^h belongs to D_{𝒴,f}. As D_γ(f^h, h) = D_γ(f_𝒴, h_𝒴), the minimum equals D_γ(f_𝒴, h_𝒴) and f^h is a minimizer.

Proof of Lemma 2: By Lemma 1, the sequence D_γ(f_{𝒴_n}, h_{𝒴_n}) is nondecreasing and its limit is at most D_γ(f_𝒴, h_𝒴). A martingale theorem implies the µ-a.e. convergences f_{𝒴_n} → f_𝒴 and h_{𝒴_n} → h_𝒴. Hence, lim inf_n D_γ(f_{𝒴_n}, h_{𝒴_n}) ≥ D_γ(f_𝒴, h_𝒴) by the Fatou lemma and the lower semicontinuity of γ ≥ 0.

Proof of Corollary 2: By the monotonicity of 𝒴_n ↑, the sets D_{𝒴_n,f} are nonincreasing. A function g belongs to the intersection of the D_{𝒴_n,f} if and only if ∫_Y g dµ = ∫_Y f dµ for Y ∈ ∪_n 𝒴_n. The equality holds then also for Y ∈ 𝒴. Hence, C_n = D_{𝒴_n,f} intersect to C = D_{𝒴,f}. If H_γ(f_𝒴) = D_γ(f_𝒴, 1) is finite then, by Lemma 1 with h ≡ 1, ĝ_{C_n} = f_{𝒴_n} and ĝ_C = f_𝒴. Lemma 2 and Theorem 2 with h = ĝ_C imply the convergence.

Proof of Corollary 3: The integrand β is asymptotically nonlinear, and the sets C_n = D_{𝒴_n,f} are nondecreasing with the union C contained in D_{𝒴,f}. Lemma 1 implies that J_n = D_γ(f_{𝒴_n}, h_{𝒴_n}). Then, the sequence J_n is finite and has a nonnegative limit because γ ≥ 0. It follows by Theorem 3 that B_β(ĝ_{C_n}, ĝ_C) → 0 where ĝ_{C_n} = f_{𝒴_n} h/h_{𝒴_n}. Then, by the Fatou lemma and the µ-a.e. convergences f_{𝒴_n} → f_𝒴 and h_{𝒴_n} → h_𝒴 (backward martingale theorem),

    0 = lim_n ∫_Z [ h γ(f_{𝒴_n}/h_{𝒴_n}) − h γ(ĝ_C/h) − γ′(ĝ_C/h) (f_{𝒴_n} h/h_{𝒴_n} − ĝ_C) ] dµ
      ≥ B_β(f_𝒴 h/h_𝒴, ĝ_C) ≥ 0.

In turn, B_β(f_𝒴 h/h_𝒴, ĝ_C) = 0. Hence, ĝ_C = f_𝒴 h/h_𝒴 and B_β(ĝ_{C_n}, ĝ_C) = D_γ(f_{𝒴_n}, h_{𝒴_n}) − D_γ(f_𝒴, h_𝒴) → 0. This rewrites to B_γ(f_{𝒴_n}, f_𝒴) = H_γ(f_{𝒴_n}) − H_γ(f_𝒴) → 0 when h ≡ 1. In Example 2, the inclusion C ⊆ D_{𝒴,f} was strict.
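As a numerical complement (not part of the paper) to the proofs above, the sketch below revisits Example 2: the minimizers ĝ_{C_n} are piecewise explicit, J_n → 0 = J, and, since ∫_0^1 ĝ_{C_n} dµ = 1, the Bregman distance B_γ(ĝ_{C_n}, ĝ_C) to the generalized minimizer ĝ_C ≡ 1 coincides with J_n and hence also tends to 0. SciPy quadrature is used only for convenience.

```python
# Numerical sketch of Example 2 (illustrative only; assumes SciPy).
import numpy as np
from scipy.integrate import quad

def g_n(t, n):
    # minimizer over C_n: equals 2t on [0, 1/n] and 1 + 1/n on (1/n, 1]
    return 2.0*t if t <= 1.0/n else 1.0 + 1.0/n

def xlogx(u):
    return u*np.log(u) if u > 0 else 0.0

for n in [2, 4, 16, 256]:
    J_n, _ = quad(lambda t: xlogx(g_n(t, n)), 0.0, 1.0, points=[1.0/n])
    # B_gamma(g_n, 1) = int [g_n ln g_n - (g_n - 1)] dmu = J_n, since int g_n dmu = 1
    print(n, J_n)
```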