
Stability and Sensitivity of the Capacity in Continuous Channels Malcolm Egan Univ. Lyon, INSA Lyon, INRIA 2019 European School of Information Theory April 18, 2019 1 / 40

Capacity of Additive Noise Models
Consider the (memoryless, stationary, scalar) additive noise channel Y = X + N, where the noise N is a random variable on (R, B(R)) with probability density function p_N. The capacity is defined by
C = sup_{µ_X ∈ P} I(µ_X, P_{Y|X}) subject to µ_X ∈ Λ.
Key Question: What is the capacity for general constraints and non-Gaussian noise distributions? 2 / 40

Non-Gaussian Noise Models
In many applications, the noise is non-Gaussian.
Example 1: Poisson Spatial Fields of Interferers. The noise in this model is the interference
Z = Σ_{i ∈ Φ} r_i^{−η/2} h_i X_i. 3 / 40

Non-Gaussian Noise Models
Suppose that (i) Φ is a homogeneous Poisson point process; (ii) (h_i) and (X_i) are processes with independent elements; (iii) E[|h_i X_i|^{4/η}] < ∞. Then, the interference Z converges almost surely to a symmetric α-stable random variable.
[Figure: symmetric α-stable densities for α = 1.1, 1.5, 2.] 4 / 40
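As a quick illustration (my own sketch, not from the talk), the following Monte Carlo simulation draws the interference from a Poisson field of interferers and shows the heavy tails consistent with an α-stable limit with α = 4/η; the intensity, path-loss exponent, fading model, and disc radius are arbitrary choices.

```python
import numpy as np

# Minimal Monte Carlo sketch: simulate Z = sum_{i in Phi} r_i^{-eta/2} h_i X_i for a
# homogeneous Poisson point process of intensity lam on a disc of radius R, with
# Gaussian fading h_i and BPSK symbols X_i. Heavy tails in the samples are consistent
# with convergence to a symmetric alpha-stable law with alpha = 4/eta.
rng = np.random.default_rng(0)
lam, eta, R, n_trials = 1.0, 3.0, 50.0, 2000

samples = np.empty(n_trials)
for t in range(n_trials):
    n_points = rng.poisson(lam * np.pi * R**2)       # number of interferers in the disc
    r = R * np.sqrt(rng.uniform(size=n_points))      # radii of uniformly placed points
    h = rng.standard_normal(n_points)                 # fading coefficients
    x = rng.choice([-1.0, 1.0], size=n_points)        # BPSK symbols
    samples[t] = np.sum(r ** (-eta / 2) * h * x)      # aggregate interference

# A Gaussian has kurtosis ~3; a much larger value indicates heavy (alpha-stable-like) tails.
print("empirical kurtosis:", np.mean(samples**4) / np.mean(samples**2) ** 2)
```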

Non-Gaussian Noise Models Example 2: Molecular Timing Channel. In the channel Y = X + N, the input X corresponds to time of release. 5 / 40

Non-Gaussian Noise Models
In the channel Y = X + N, the noise N corresponds to the diffusion time from the transmitter to the receiver. Under Brownian motion models of diffusion, the noise distribution is inverse Gaussian or Lévy stable.
p_N(x) = √(λ/(2πx³)) exp(−λ(x − µ)²/(2µ²x)), x > 0. 6 / 40
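A small numerical sketch (illustrative only; the parameter values and grid are arbitrary) evaluates this inverse Gaussian density and checks that it integrates to approximately one.

```python
import numpy as np

# Evaluate p_N(x) = sqrt(lam/(2*pi*x^3)) * exp(-lam*(x - mu)^2 / (2*mu^2*x)) on a grid
# and verify normalization by a simple Riemann sum.
def inverse_gaussian_pdf(x, mu, lam):
    x = np.asarray(x, dtype=float)
    return np.sqrt(lam / (2.0 * np.pi * x**3)) * np.exp(-lam * (x - mu) ** 2 / (2.0 * mu**2 * x))

x = np.linspace(1e-4, 50.0, 200000)
dx = x[1] - x[0]
mu, lam = 2.0, 1.5
print("integral of p_N over (0, 50]:", np.sum(inverse_gaussian_pdf(x, mu, lam)) * dx)
```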

Capacity of Non-Gaussian Noise Models
The capacity is defined by
C = sup_{µ_X ∈ P} I(µ_X, P_{Y|X}) subject to µ_X ∈ Λ.
The noise is in general non-Gaussian.
Question: What is the constraint set Λ? 7 / 40

Constraint Sets
A familiar constraint common in wireless communications is
Λ_P = {µ_X ∈ P : E_{µ_X}[X²] ≤ P},
corresponding to an average power constraint. Other constraints appear in applications. For example,
Λ_c = {µ_X ∈ P : E_{µ_X}[|X|^r] ≤ c}, where 0 < r < 2.
This corresponds to a fractional moment constraint (useful in the study of α-stable noise channels). In the molecular timing channel,
Λ_T = {µ_X ∈ P : E_{µ_X}[X] ≤ T, P_{µ_X}(X < 0) = 0}
is the relevant constraint. 8 / 40

Capacity of Non-Gaussian Noise Channels
The capacity is defined by
C = sup_{µ_X ∈ P} I(µ_X, P_{Y|X}) subject to µ_X ∈ Λ.
Since the channel is additive,
I(µ_X, P_{Y|X}) = ∫∫ p_N(y − x) log( p_N(y − x) / p_Y(y) ) dy dµ_X(x).
There are two basic questions that can be asked: (i) What is the value of the capacity C? (ii) What is the optimal solution µ*_X? 9 / 40
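To make the mutual information expression concrete, here is a small quadrature sketch (my own example, assuming Gaussian noise and a binary input for tractability) that evaluates I(µ_X, P_{Y|X}) for a discrete input measure.

```python
import numpy as np

# Evaluate I(mu_X, P_{Y|X}) = \int sum_x mu(x) p_N(y - x) log(p_N(y - x)/p_Y(y)) dy
# by a Riemann sum on a fine output grid; results are in nats.
def gaussian_pdf(y, sigma):
    return np.exp(-y**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def mutual_information(points, weights, sigma, y_grid):
    cond = np.array([gaussian_pdf(y_grid - x, sigma) for x in points])   # p_N(y - x), shape (K, M)
    p_y = weights @ cond                                                  # mixture output density
    integrand = cond * np.log(np.where(cond > 0, cond / p_y, 1.0))
    dy = y_grid[1] - y_grid[0]
    return float(weights @ (integrand.sum(axis=1) * dy))

# Binary antipodal input at +/-1 with equal weights, unit-variance Gaussian noise.
y = np.linspace(-10, 10, 4001)
print("I =", mutual_information(np.array([-1.0, 1.0]), np.array([0.5, 0.5]), 1.0, y), "nats")
```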

Topologies on Sets of Probability Measures Point set topology plays an important role in optimization theory. For example, it allows us to determine whether or not the optimum can be achieved (i.e., the sup becomes a max). 10 / 40

Topologies on Sets of Probability Measures
In applications, we usually optimize over R^n, which has the standard topology induced by Euclidean metric balls. In the capacity problem, we instead optimize over sets of probability measures, i.e., subsets of P.
Question: What is a useful topology on the set of probability measures? 11 / 40

Topologies on Sets of Probability Measures
A useful choice is the topology of weak convergence. Closed sets S are characterized by the property that whenever a sequence of probability measures (µ_i) ⊆ S converges to a probability measure µ, in the sense that
lim_{i→∞} ∫ f(x) dµ_i(x) = ∫ f(x) dµ(x) for all bounded and continuous functions f,
the limit µ also lies in S.
It turns out that the topology of weak convergence for probability measures is metrizable: there exists a metric d on P (the Lévy-Prokhorov metric) whose metric balls generate the topology of weak convergence. 12 / 40
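A tiny numerical illustration of weak convergence (my own example, with an arbitrary bounded continuous test function): the measures µ_i = N(1/i, 1) converge weakly to µ = N(0, 1).

```python
import numpy as np

# Check that integrals of a bounded continuous test function f(x) = arctan(x) against
# mu_i = N(1/i, 1) approach the integral against mu = N(0, 1) as i grows.
x = np.linspace(-12, 12, 10001)
dx = x[1] - x[0]
f = np.arctan(x)

def normal_pdf(x, m, s=1.0):
    return np.exp(-(x - m) ** 2 / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

limit = np.sum(f * normal_pdf(x, 0.0)) * dx
for i in (1, 10, 100, 1000):
    val = np.sum(f * normal_pdf(x, 1.0 / i)) * dx
    print(f"i = {i:5d}:  |int f dmu_i - int f dmu| = {abs(val - limit):.2e}")
```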

Topologies on Sets of Probability Measures
In addition, Prokhorov's theorem gives a characterization of compactness.
Prokhorov's Theorem: If a subset Λ ⊆ P of probability measures is tight and closed, then Λ is compact in the topology of weak convergence.
A set of probability measures Λ is tight if for all ε > 0, there exists a compact set K_ε ⊂ R such that µ(K_ε) ≥ 1 − ε for all µ ∈ Λ. 13 / 40
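As a concrete instance (my own check, using a few arbitrary members and scipy.stats), the power-constrained family {µ : E_µ[X²] ≤ P} is tight: by Chebyshev's inequality, K_ε = [−√(P/ε), √(P/ε)] satisfies µ(K_ε) ≥ 1 − ε for every member.

```python
import numpy as np
from scipy import stats

# Verify the Chebyshev tightness bound mu([-K, K]) >= 1 - P/K^2 for several
# distributions with second moment at most P = 4.
P = 4.0
members = {
    "N(0, 4)": stats.norm(0, 2.0),
    "Uniform(-2, 2)": stats.uniform(-2.0, 4.0),
    "Laplace(0, sqrt(2))": stats.laplace(0, np.sqrt(2.0)),   # variance 2*b^2 = 4
}
for eps in (0.1, 0.01):
    K = np.sqrt(P / eps)
    for name, dist in members.items():
        mass = dist.cdf(K) - dist.cdf(-K)
        print(f"eps={eps:5.2f}  K={K:6.2f}  {name:20s}  mu([-K,K]) = {mass:.4f} >= {1 - eps:.2f}")
```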

Existence of the Optimal Input
The capacity is defined by
C = sup_{µ_X ∈ P} I(µ_X, P_{Y|X}) subject to µ_X ∈ Λ.
Question: Does the capacity-achieving input exist? This is answered by the extreme value theorem.
Extreme Value Theorem: If Λ is weakly compact and I(µ_X, P_{Y|X}) is weakly continuous on Λ, then µ*_X exists. 14 / 40

Support of the Optimal Input Question: When is the optimal input discrete and compactly supported? The initial results on this question were due to Smith [Smith1971]. Theorem: For amplitude and average power constraints, the optimal input for the Gaussian noise channel is discrete and compactly supported. 15 / 40

Support of the Optimal Input
More generally, the support of the optimal input can be studied via the KKT conditions. Let M be a convex and compact set of channel input distributions. Then, µ*_X ∈ M maximizes the capacity if and only if for all µ_X ∈ M,
E_{µ_X}[ log( dP_{Y|X}(Y|X) / dP_Y(Y) ) ] ≤ I(µ*_X, P_{Y|X}).
Equality holds at the points of increase of µ*_X, which constrains the support of optimal inputs. There has been significant progress recently; e.g., [Fahs2018, Dytso2019]. 16 / 40

Characterizing the Capacity
In general, it is hard to compute the capacity in closed form. Exceptions are Gaussian and Cauchy noise channels under various constraints.
Theorem [Lapidoth and Moser]: Let the input alphabet X and the output alphabet Y of a channel W(·|·) be separable metric spaces, and assume that for any Borel subset B ⊆ Y the mapping x ↦ W(B|x) from X to [0, 1] is Borel measurable. Let Q(·) be any probability measure on X, and R(·) any probability measure on Y. Then, the mutual information I(Q; W) can be bounded by
I(Q; W) ≤ ∫ D(W(·|x) ‖ R(·)) dQ(x). 17 / 40
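A short sketch of how this duality bound is evaluated in practice (my own example: Gaussian channel, binary input, and an auxiliary output law chosen to match the output second moment):

```python
import numpy as np

# Duality upper bound I(Q; W) <= \int D(W(.|x) || R(.)) dQ(x) for W(.|x) = N(x, sigma^2),
# Q uniform on {-1, +1}, and R = N(0, sigma^2 + 1), using the closed-form Gaussian KL divergence.
def kl_gauss(m1, s1, m2, s2):
    return np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

sigma = 1.0
points, weights = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
s_R = np.sqrt(sigma**2 + 1.0)                       # R matches the output second moment
bound = sum(w * kl_gauss(x, sigma, 0.0, s_R) for x, w in zip(points, weights))
print("duality upper bound on I(Q; W):", bound, "nats")
```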

A Change in Perspective
New Perspective: the capacity is a map (p_N, Λ) ↦ C.
Definition: Let K = (p_N, Λ) and K̂ = (p̂_N, Λ̂) be two tuples of channel parameters. The capacity sensitivity due to a perturbation from the channel K to the channel K̂ is defined as
C_{K→K̂} = |C(K) − C(K̂)|.
Egan, M., Perlaza, S.M. and Kungurtsev, V., "Capacity sensitivity in additive non-Gaussian noise channels," Proc. IEEE International Symposium on Information Theory, Aachen, Germany, Jun. 2017. 18 / 40

A Strategy
Consider a differentiable function f : R^n → R, which admits a Taylor series representation
f(x + e ẽ) = f(x) + e ∇f(x)^T ẽ + o(|e|), where ẽ has unit norm.
This yields
|f(x + e ẽ) − f(x)| ≤ |D_ẽ f(x)| |e| + o(|e|),
i.e., the sensitivity, where D_ẽ f(x) = ∇f(x)^T ẽ is the directional derivative.
Question: what is the directional derivative of the optimal value function of an optimization problem (e.g., the capacity)? 19 / 40
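A quick numerical check of this first-order sensitivity estimate (my own toy function and direction, chosen arbitrarily):

```python
import numpy as np

# Compare the exact change f(x + e*e_tilde) - f(x) with the first-order estimate
# e * grad f(x)^T e_tilde for shrinking e, using f(x) = log(1 + x1^2 + exp(x2)).
def f(x):
    return np.log(1.0 + x[0] ** 2 + np.exp(x[1]))

def grad_f(x):
    denom = 1.0 + x[0] ** 2 + np.exp(x[1])
    return np.array([2.0 * x[0], np.exp(x[1])]) / denom

x = np.array([1.0, 0.5])
e_tilde = np.array([3.0, -4.0]) / 5.0            # unit-norm direction
directional = grad_f(x) @ e_tilde                # D_e_tilde f(x)

for e in (1e-1, 1e-2, 1e-3):
    exact = f(x + e * e_tilde) - f(x)
    print(f"e = {e:.0e}:  exact change = {exact:+.6f},  first-order estimate = {directional * e:+.6f}")
```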

A Strategy
In the case of vector, smooth optimization problems there is a good theory, e.g., envelope theorems.
Proposition: Let the real-valued function f(x, y) : R^n × R → R be twice differentiable on a compact convex subset X of R^{n+1} and strictly concave in x. Let x* be the maximizer of f(·, y) on X and denote ψ(y) = f(x*, y). Then, the derivative of ψ(y) exists and is given by
ψ'(y) = f_y(x*, y). 20 / 40

A Strategy
A sketch of the proof:
1. Use the implicit function theorem to write ψ(y) = f(x*(y), y).
2. Observe that
ψ'(y) = f_y(x*(y), y) + (∇_x f(x*(y), y))^T dx*(y)/dy = f_y(x*(y), y),
since ∇_x f(x*(y), y) = 0 at the maximizer.
Generalizations of this result are due to Danskin and Gol'shtein. 21 / 40
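To make the envelope theorem concrete, here is a tiny numerical check on a toy objective (my own example, not from the slides): the dependence of the maximizer on y contributes nothing to the first-order change of the optimal value.

```python
import numpy as np

# f(x, y) = -x^2 + x*y is strictly concave in x with maximizer x*(y) = y/2 and optimal
# value psi(y) = y^2/4. The envelope theorem predicts psi'(y) = f_y(x*(y), y) = x*(y).
def f(x, y):
    return -x**2 + x * y

def x_star(y):
    return y / 2.0

def psi(y):
    return f(x_star(y), y)

y0, h = 1.3, 1e-6
numerical_derivative = (psi(y0 + h) - psi(y0 - h)) / (2 * h)   # central difference of psi
partial_f_y = x_star(y0)                                       # f_y(x, y) = x, evaluated at x*(y0)
print("psi'(y0) ~", numerical_derivative, "  f_y(x*(y0), y0) =", partial_f_y)
```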

A Strategy
Recall:
C(Λ, p_N) = sup_{µ_X ∈ Λ} I(µ_X, p_N).
Question: What is the effect of
1. Constraint perturbations: C(Λ) (fix p_N)?
2. Noise distribution perturbations: C(p_N) (fix Λ)? 22 / 40

Constraint Perturbations
Common Question: What is the effect of power on the capacity?
Another Formulation: What is the effect of changing the set of probability measures
Λ_2 = {µ_X : E_{µ_X}[X²] ≤ P}?
Natural Generalization: What is the effect of changing Λ on
C(Λ) = sup_{µ_X ∈ P} I(µ_X, P_{Y|X}) subject to µ_X ∈ Λ? 23 / 40

Constraint Perturbations
Question: Do small changes in the constraint set lead to small changes in the capacity? To answer this question, we need to formalize what a small change means.
Key Idea: The constraint set is viewed as a point-to-set map.
Example: The power constraint
Λ_2(P) = {µ_X : E_{µ_X}[X²] ≤ P}
is a map from R to compact sets of probability measures, Λ_2 : R ⇉ P. 24 / 40

Constraint Perturbations
When the power P (or, more generally, any other parameter) is changed, Λ_2(P) can expand or contract. There are therefore two aspects to continuity of a point-to-set map. Let η_ε(·) denote the ε-neighborhood of a set of probability measures.
Definition: A point-to-set map Λ : R ⇉ P is upper hemicontinuous at P̄ ∈ R if for all ε > 0 there exists a δ > 0 such that d(P, P̄) < δ implies that Λ(P) ⊆ η_ε(Λ(P̄)).
Definition: A point-to-set map Λ : R ⇉ P is lower hemicontinuous at P̄ ∈ R if for all ε > 0 there exists a δ > 0 such that d(P, P̄) < δ implies that Λ(P̄) ⊆ η_ε(Λ(P)).
Λ is continuous if it is both upper and lower hemicontinuous. 25 / 40

Lemma 1: Berge's Maximum Theorem
Theorem (Berge's Maximum Theorem): Let Θ and S be two metric spaces, Γ : Θ ⇉ S a compact-valued point-to-set map, and ϕ : S × Θ → R a continuous function on S × Θ. Define
σ(θ) = arg max{ϕ(s, θ) : s ∈ Γ(θ)}, θ ∈ Θ,
ϕ*(θ) = max{ϕ(s, θ) : s ∈ Γ(θ)}, θ ∈ Θ,
and assume that Γ is continuous at θ ∈ Θ. Then, ϕ* : Θ → R is continuous at θ.
Implication: continuity of the capacity in P if
1. I(µ_X, P_{Y|X}) is weakly continuous on Λ;
2. Λ : R ⇉ P is continuous. 26 / 40
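A small finite-dimensional illustration of the theorem (my own example, with a constant and hence continuous constraint map): the optimal value is continuous in the parameter even where the maximizer jumps.

```python
import numpy as np

# Maximize phi(s, theta) = s*theta over the fixed compact set Gamma(theta) = [-1, 1].
# The value function phi*(theta) = |theta| is continuous, while the maximizer sigma(theta)
# jumps from -1 to +1 at theta = 0.
s_grid = np.linspace(-1.0, 1.0, 2001)

def value_and_argmax(theta):
    vals = s_grid * theta
    k = np.argmax(vals)
    return vals[k], s_grid[k]

for theta in (-0.2, -0.01, 0.0, 0.01, 0.2):
    v, s_opt = value_and_argmax(theta)
    print(f"theta = {theta:+.2f}:  phi*(theta) = {v:.4f}  (continuous),  argmax = {s_opt:+.2f}  (jumps at 0)")
```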

Bounding the Capacity Sensitivity
We now have general conditions to ensure that the capacity sensitivity satisfies
|C(Λ(P)) − C(Λ(P'))| → 0 as P' → P.
However, the capacity is in general a complicated function of the constraint parameters.
Question: Is there a general way of bounding the capacity sensitivity? 27 / 40

Bounding the Capacity Sensitivity
Key tool: Regular subgradients.
Definition: Consider a function f : R^n → R̄ (the extended reals) and a point x̄ ∈ R^n with f(x̄) finite. A vector v ∈ R^n is a regular subgradient of f at x̄, denoted by v ∈ ∂̂f(x̄), if there exists δ > 0 such that for all x ∈ B_δ(x̄),
f(x) ≥ f(x̄) + v^T(x − x̄) + o(‖x − x̄‖).
Regular subgradients are related to subgradients in convex optimization. What are conditions for their existence? 28 / 40
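A tiny numerical check of the defining inequality (my own example on the nonsmooth function f(x) = |x| at x̄ = 0, where the o(‖x − x̄‖) term is not needed since the inequality holds exactly for |v| ≤ 1):

```python
import numpy as np

# For f(x) = |x| at x_bar = 0, every v with |v| <= 1 satisfies f(x) >= f(0) + v*x on a
# neighborhood of 0 and is therefore a regular subgradient; any v with |v| > 1 fails.
f = np.abs
x_bar = 0.0
x = np.linspace(-0.5, 0.5, 10001)          # a neighborhood B_delta(x_bar)

for v in (0.3, 1.0, 1.5):
    ok = np.all(f(x) >= f(x_bar) + v * (x - x_bar) - 1e-12)
    print(f"v = {v:3.1f}: regular subgradient at 0? {bool(ok)}")
```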

Lemma 2: Existence of Regular Subgradients
Theorem (Rockafellar and Wets 1997): Suppose f : R^n → R̄ is finite and lower semicontinuous at x̄ ∈ R^n. Then, there exists a sequence x_k →_f x̄ (i.e., x_k → x̄ with f(x_k) → f(x̄)) such that ∂̂f(x_k) ≠ ∅ for all k.
Rockafellar, R. and Wets, R., Variational Analysis. Berlin Heidelberg: Springer-Verlag, 1997.
Implication:
1. Let f(P) = C(Λ(P)).
2. Apply Berge's maximum theorem and regular subgradients.
This yields general estimates of the capacity sensitivity. 29 / 40

Example 1: RHS Constraint Perturbations
Consider constraints
Λ(b) = {µ_X ∈ P : E_{µ_X}[f(|X|)] ≤ b},
where f is positive, non-decreasing and lower semicontinuous. The capacity is given by
C(b) = sup_{µ_X ∈ P} I(µ_X, P_{Y|X}) subject to µ_X ∈ Λ(b).
We need to establish continuity in b. 30 / 40

Example 1: RHS Constraint Perturbations
Theorem: Let b ∈ R_+ and suppose that the following conditions hold:
(i) Λ(b) is non-empty and compact.
(ii) I(µ_X, P_{Y|X}) is weakly continuous on Λ(b).
Then, C(b) is continuous at b.
It is now possible to apply the Rockafellar-Wets regular subgradient existence theorem. Suppose b > b̃ with C(b) < ∞ and C(b̃) < ∞. If b − b̃ and ε > 0 are sufficiently small, then
C(b) − C(b̃) ≤ (|v| + ε)(b − b̃) + o(b − b̃),
where v is a regular subgradient provided by the existence theorem. 31 / 40
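An illustrative check of this kind of estimate (assuming the Gaussian case with f(|X|) = X², where everything is available in closed form and the derivative of the capacity plays the role of v):

```python
import numpy as np

# For Gaussian noise with average power constraint b, C(b) = 0.5*log(1 + b/sigma^2) nats,
# and v = C'(b_tilde) = 1/(2*(sigma^2 + b_tilde)). Check the local sensitivity bound
# C(b) - C(b_tilde) <= (|v| + eps)*(b - b_tilde) for small b - b_tilde.
sigma2 = 1.0
C = lambda b: 0.5 * np.log(1.0 + b / sigma2)

b_tilde, eps = 2.0, 1e-3
v = 1.0 / (2.0 * (sigma2 + b_tilde))

for delta in (1e-1, 1e-2, 1e-3):
    b = b_tilde + delta
    lhs = C(b) - C(b_tilde)
    rhs = (abs(v) + eps) * (b - b_tilde)
    print(f"b - b_tilde = {delta:.0e}:  C(b)-C(b_tilde) = {lhs:.6f}  <=  (|v|+eps)(b-b_tilde) = {rhs:.6f}: {lhs <= rhs}")
```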

Example 2: Discrete Input Constraints
Consider the general constraint set Λ (allowing for continuous inputs):
C(Λ) = sup_{µ_X ∈ P} I(µ_X, P_{Y|X}) subject to µ_X ∈ Λ,
e.g., Λ = Λ_p = {µ_X ∈ P : E_{µ_X}[|X|^p] ≤ b}.
Let P_Δ be the set of probability measures on (R, B(R)) that have mass points in the set ΔZ. Let Λ be a compact subset of P. The discrete approximation of C(Λ) is then defined as
C(Λ_Δ) = sup_{µ_X ∈ P} I(µ_X, P_{Y|X}) subject to µ_X ∈ Λ_Δ,
where Λ_Δ = P_Δ ∩ Λ. 32 / 40

Example 2: Discrete Input Constraints
The capacity sensitivity in this case is
C_{Λ→Λ_Δ} = C(Λ) − C(Λ_Δ),
i.e., the cost of discreteness. Again, we need to establish continuity in order to apply the Rockafellar-Wets theorem. 33 / 40

Example 2: Discrete Input Constraints
Theorem (Egan, Perlaza 2018): Let Λ be a non-empty compact subset of P. If the mutual information I(·, P_{Y|X}) is weakly continuous on Λ, then C(Λ_Δ) → C(Λ) as Δ → 0.
The conditions hold, for example, in the following models:
(i) Gaussian model: p_N(x) = (1/√(2πσ²)) exp(−x²/(2σ²)), σ > 0; Λ = {µ_X ∈ P : E_{µ_X}[X²] ≤ b}, b > 0.
(ii) Cauchy model: p_N(x) = 1/(πγ(1 + (x/γ)²)), γ > 0; Λ = {µ_X ∈ P : E_{µ_X}[|X|^r] ≤ b}, b > 0.
(iii) Inverse Gaussian model: p_N(x) = √(λ/(2πx³)) exp(−λ(x − γ)²/(2γ²x)), x > 0, λ, γ > 0; Λ = {µ_X ∈ P : E_{µ_X}[X] ≤ b}, b > 0. 34 / 40
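The following numerical sketch (my own construction, not the proof of the theorem) illustrates the vanishing cost of discreteness in the Gaussian model: a feasible input supported on ΔZ is built from the N(0, b) density and its mutual information approaches C(Λ) = 0.5 log(1 + b/σ²) as Δ → 0.

```python
import numpy as np

# Build a discrete input on the grid Delta*Z with mass proportional to the N(0, b) density
# (mixing extra mass at 0 if needed so that E[X^2] <= b holds exactly), and compute its
# mutual information over the Gaussian channel. This lower-bounds C(Lambda_Delta).
sigma2, b = 1.0, 4.0

def gaussian_pdf(y, var):
    return np.exp(-y**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def mi_discrete_input(points, weights, sigma2, y_grid):
    cond = np.array([gaussian_pdf(y_grid - x, sigma2) for x in points])   # p_N(y - x) per mass point
    p_y = weights @ cond                                                   # output density of the mixture
    ratio = np.where(cond > 1e-300, cond / np.maximum(p_y, 1e-300), 1.0)
    dy = y_grid[1] - y_grid[0]
    return float(weights @ ((cond * np.log(ratio)).sum(axis=1) * dy))

y_grid = np.linspace(-25.0, 25.0, 8001)
C_true = 0.5 * np.log(1.0 + b / sigma2)
for Delta in (2.0, 1.0, 0.5, 0.25):
    points = Delta * np.arange(-int(12 / Delta), int(12 / Delta) + 1)      # truncated grid Delta*Z
    weights = gaussian_pdf(points, b)
    weights /= weights.sum()
    m2 = weights @ points**2
    if m2 > b:                                   # shift mass to 0 so the power constraint holds
        weights *= b / m2
        weights[points == 0.0] += 1.0 - b / m2
    mi = mi_discrete_input(points, weights, sigma2, y_grid)
    print(f"Delta = {Delta:4.2f}:  I = {mi:.4f} nats   (C(Lambda) = {C_true:.4f} nats)")
```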

Example 2: Discrete Input Constraints
Theorem (Egan, Perlaza 2018): Suppose that Λ is a non-empty compact subset of P and the mutual information I : P → R is weakly continuous on Λ. If C = sup_{µ_X ∈ Λ} I(µ_X, P_{Y|X}) < ∞, then for all ε > 0 there exists v ∈ R such that for Δ sufficiently small,
C(Λ) − C(Λ_Δ) ≤ (|v| + ε)Δ + o(Δ)
holds. 35 / 40

Recipe Review
If the model parameter is finite-dimensional:
1. Establish continuity via Berge's maximum theorem.
2. Apply the regular subgradient existence theorem.
Remark: The method applies to more general channels, e.g., vector channels.
Egan, M., "On Capacity Sensitivity in Additive Vector Symmetric α-Stable Noise Channels," Proc. IEEE WCNC (Invited Paper, MoTION Workshop), 2019.
What if the model parameter is not finite-dimensional? E.g., the noise distribution? 36 / 40

Noise Distribution Perturbations
In the case of noise pdf perturbations, the relevant capacity sensitivity is
C_{p_N^0 → p_N^1} = |C(p_N^0) − C(p_N^1)|.
Let (p_N^i)_i be a sequence of pdfs converging to p_N^0 (in, e.g., TV, weak, or KL divergence sense).
Question: Does C(p_N^i) → C(p_N^0) as i → ∞ hold? 37 / 40

Noise Distribution Perturbations
Theorem (Egan, Perlaza 2017): Let {p_N^i}_{i=1}^∞ be a pointwise convergent sequence with limit p_N^0 and let Λ be a compact set of probability measures not dependent on p_N. Suppose the following conditions hold:
(i) The mutual information I(µ_X, p_N^i) is weakly continuous on Λ.
(ii) For the convergent sequence {p_N^i}_{i=1}^∞ and all weakly convergent sequences {µ_i}_{i=1}^∞ in Λ with limit µ_0, lim_{i→∞} I(µ_i, p_N^i) = I(µ_0, p_N^0).
(iii) There exists an optimal input probability measure µ*_i for each noise probability density p_N^i.
Then, lim_{i→∞} C(p_N^i) = C(p_N^0). 38 / 40

Lemma 3: Mutual Information Bound
Lemma (Egan, Perlaza, Kungurtsev 2017): Let p_N^0, p_N^1 be two noise probability density functions and Λ be a compact subset of P such that C(p_N^0) < ∞ and C(p_N^1) < ∞. Then, the capacity sensitivity satisfies
|C(p_N^0) − C(p_N^1)| ≤ max{ |I(µ*_0, p_N^0) − I(µ*_0, p_N^1)|, |I(µ*_1, p_N^0) − I(µ*_1, p_N^1)| },
where µ*_0 and µ*_1 are optimal inputs for p_N^0 and p_N^1, respectively.
Observation: To compute the estimate, we need to characterize the optimal input distribution; i.e., is the support discrete, continuous, compact? This connects to questions about the optimal input structure. 39 / 40
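As an illustrative evaluation (assuming two Gaussian noise densities under an average power constraint, where both optimal inputs are N(0, P) and every term is in closed form; in this fully Gaussian case the two terms coincide and the bound is tight):

```python
import numpy as np

# With Gaussian input N(0, P) and Gaussian noise of standard deviation s,
# I(N(0, P), N(0, s^2)) = 0.5*log(1 + P/s^2) nats. Evaluate the lemma's bound for two
# noise densities p_N^0 = N(0, s0^2) and p_N^1 = N(0, s1^2).
P = 4.0
s0, s1 = 1.0, 1.2

def I_gauss(P, s):
    return 0.5 * np.log(1.0 + P / s**2)

term0 = abs(I_gauss(P, s0) - I_gauss(P, s1))   # |I(mu*_0, p0) - I(mu*_0, p1)|, mu*_0 = N(0, P)
term1 = abs(I_gauss(P, s0) - I_gauss(P, s1))   # |I(mu*_1, p0) - I(mu*_1, p1)|, mu*_1 = N(0, P)
cap_sensitivity = abs(I_gauss(P, s0) - I_gauss(P, s1))
print("|C(p0) - C(p1)| =", cap_sensitivity, "  bound =", max(term0, term1))
```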

Conclusions Key Question: How sensitive are information measures to model assumptions? Many noise models and constraints are highly idealized. The capacity sensitivity framework provides a means of investigating what happens when idealizations are relaxed. 40 / 40