Optimization in Information Theory

Dawei Shen

November 11, 2005

Abstract

This tutorial introduces the application of optimization techniques in information theory. We revisit the channel capacity problem from the convex optimization perspective and show that the Lagrange dual of the channel capacity problem with input cost is a simple geometric program, which can be solved efficiently using numerical algorithms such as the primal-dual interior point method. Upper bounds on the channel capacity can also be generated efficiently. This tutorial assumes the reader has some background in optimization theory, especially in convex optimization and geometric programming.

1 The Channel Capacity Problem with Input Cost

First we formulate the channel capacity problem with input cost as a standard optimization problem. Consider the problem of data transmission over a discrete memoryless channel with input $X \in \mathcal{X} = \{1, 2, \ldots, N\}$, output $Y \in \mathcal{Y} = \{1, 2, \ldots, M\}$, and channel law $P_{ij} = \mathrm{Prob}\{Y = j \mid X = i\}$, $i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, M$. The channel law forms the channel matrix $P \in \mathbf{R}^{N \times M}$, whose $(i, j)$ entry is $P_{ij} \geq 0$, with $P\mathbf{1} = \mathbf{1}$. Here $\mathbf{1}$ denotes the column vector with all entries equal to 1. A distribution $p \in \mathbf{R}^{1 \times N}$ on the input, together with a given channel matrix $P$, induces a distribution $q \in \mathbf{R}^{1 \times M}$ on the output by $q = pP$, and a joint distribution $Q$ on the input-output pair by $Q_{ij} = p_i P_{ij}$. We also associate with each input alphabet symbol $i$ an input cost $s_i \geq 0$, forming a column vector $s$.

It is a key result in information theory that the capacity $C(S)$ of a discrete memoryless channel under the input cost constraint $E_p[s] = ps \leq S$ is

$$C(S) = \max_{p \,:\, ps \leq S} I(X; Y) \qquad (1)$$

where the mutual information between the input $X$ and the output $Y$ is defined as

$$I(X; Y) = \sum_{i=1}^{N} \sum_{j=1}^{M} Q_{ij} \log \frac{Q_{ij}}{p_i q_j} = H(Y) - H(Y \mid X) = -\sum_{j=1}^{M} q_j \log q_j - pr,$$

where $r \in \mathbf{R}^{N \times 1}$ and $r_i = -\sum_{j=1}^{M} P_{ij} \log P_{ij}$ is the conditional entropy of $Y$ given $X = i$. Therefore, the channel capacity problem can be formulated as the following maximization problem, referred to as the channel capacity problem with input cost:

maximize    $-pr - \sum_{j=1}^{M} q_j \log q_j$
subject to  $pP = q$, $ps \leq S$,
            $p\mathbf{1} = 1$, $p \succeq 0$    (2)

where the optimization variables are $p$ and $q$, and $\succeq$ denotes componentwise inequality on a vector. The constant parameters are the channel matrix $P$ and the vector $r$. If there is no input cost constraint, the channel capacity problem becomes

maximize    $-pr - \sum_{j=1}^{M} q_j \log q_j$
subject to  $pP = q$,
            $p\mathbf{1} = 1$, $p \succeq 0$    (3)

This completes the primal formulation of the channel capacity problem. Note that keeping two sets of optimization variables $p$ and $q$, and introducing the equality constraint $pP = q$ in the primal problem, is the key step that yields an explicit and simple Lagrange dual of (2).
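As a sanity check, problem (3) can be handed directly to a generic convex solver. The following is a minimal sketch assuming the cvxpy package; the binary symmetric channel with crossover probability 0.1 is a made-up example, and all quantities are in nats.

```python
import numpy as np
import cvxpy as cp

eps = 0.1
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])        # channel matrix; rows sum to 1
r = -np.sum(P * np.log(P), axis=1)    # r_i = H(Y | X = i), in nats

p = cp.Variable(2)                    # input distribution
q = cp.Variable(2)                    # output distribution
# problem (3): maximize  -p r + H(q),  i.e., the mutual information
prob = cp.Problem(cp.Maximize(cp.sum(cp.entr(q)) - p @ r),
                  [q == p @ P, cp.sum(p) == 1, p >= 0])
prob.solve()
print(prob.value)                                               # ~0.3681 nats
print(np.log(2) + (1 - eps)*np.log(1 - eps) + eps*np.log(eps))  # log 2 - H(eps)
```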

2 Geometric Programming Dual

We are mainly interested in the Lagrange dual problem rather than the primal problem itself. After some manipulation, as done in [2], it can be shown that the Lagrange dual of the channel capacity problem with input cost (2) is the following geometric program in convex form:

minimize    $\gamma S + \log \sum_{j=1}^{M} \exp(\alpha_j)$
subject to  $P\alpha + \gamma s + r \succeq 0$,
            $\gamma \geq 0$    (4)

where the optimization variables are $\alpha$ and $\gamma$, and the constant parameters are $P$, $s$, and $S$. An equivalent version of the Lagrange dual problem is the following geometric program in standard form:

minimize    $w^S \sum_{j=1}^{M} z_j$
subject to  $w^{s_i} \prod_{j=1}^{M} z_j^{P_{ij}} \geq e^{-H(P^{(i)})}, \quad i = 1, 2, \ldots, N$,
            $w \geq 1$, $z_j \geq 0$, $j = 1, 2, \ldots, M$    (5)

where the optimization variables are $z$ and $w$, and $P^{(i)}$ is the $i$th row of $P$, with entropy $H(P^{(i)}) = r_i$. Lagrange duality between problems (2) and (4) means the following:

Weak Duality. Any feasible $(\alpha, \gamma)$ of the Lagrange dual problem (4) produces an upper bound on channel capacity with input cost: $\gamma S + \log \sum_{j=1}^{M} \exp(\alpha_j) \geq C(S)$.

Strong Duality. The optimal value of the Lagrange dual problem (4) is $C(S)$.

The weak duality part follows directly from duality theory: the Lagrange dual function is always an upper bound on the optimal value of a primal maximization problem. The strong duality part holds because the primal problem (2) is a convex optimization problem satisfying Slater's condition.

An immediate corollary can be obtained. The Lagrange dual of the channel capacity problem without input cost (3) is the following geometric program in convex form:

minimize    $\log \sum_{j=1}^{M} e^{\alpha_j}$
subject to  $P\alpha + r \succeq 0$    (6)

where the optimization variable is $\alpha$ and the constant parameters are $P$ and $r$. The equivalent geometric program in standard form is:

minimize    $\sum_{j=1}^{M} z_j$
subject to  $\prod_{j=1}^{M} z_j^{P_{ij}} \geq e^{-H(P^{(i)})}, \quad i = 1, 2, \ldots, N$,
            $z_j \geq 0$, $j = 1, 2, \ldots, M$    (7)

Lagrange duality between problems (3) and (6) means the following:

Weak Duality. $\log \left( \sum_{j=1}^{M} e^{\alpha_j} \right) \geq C$ for all $\alpha$ that satisfy $P\alpha + r \succeq 0$.

Strong Duality. $\log \left( \sum_{j=1}^{M} e^{\alpha_j^{*}} \right) = C$, where $\alpha^{*}$ are the optimal dual variables.
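The convex-form dual (6) is also directly machine-solvable. Below is a sketch, again assuming cvxpy and the same example binary symmetric channel; by strong duality the optimal value matches the primal value computed earlier.

```python
import numpy as np
import cvxpy as cp

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
r = -np.sum(P * np.log(P), axis=1)      # r_i = H(P^(i))

alpha = cp.Variable(2)
# problem (6): minimize log sum_j exp(alpha_j)  s.t.  P alpha + r >= 0
prob = cp.Problem(cp.Minimize(cp.log_sum_exp(alpha)),
                  [P @ alpha + r >= 0])
prob.solve()
print(prob.value)    # ~0.3681 nats, equal to the primal optimum
```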

The Lagrange dual (7) of the channel capacity problem is a simple geometric program with a linear objective function and only monomial inequality constraints. The dual problem (5) is a generalized version of the dual problem (7), weighting the objective function by $w^S$ and the $i$th constraint by $w^{s_i}$, where $w$ is (the exponential of) the Lagrange multiplier associated with the input cost constraint. If the costs of all alphabet symbols are 0, we can minimize the objective function over $w$ analytically by simply letting $w = 1$ (that is, $\gamma = 0$), so that we indeed recover the dual problem (7) for channels without the input cost constraint.

Both the primal and the dual problems for $C(S)$ can be solved simultaneously and efficiently by a primal-dual interior point method, which scales smoothly over different channels and alphabet sizes. By exploiting the structure and sparsity of the exponent constant matrix $A$ of the geometric program dual of channel capacity, standard convex optimization algorithms can be accelerated further in this case.

Suppose we have solved the geometric program dual of channel capacity. By strong duality, we obtain $C(S)$. We can also recover the optimal primal variables, i.e., the capacity-achieving input distribution, from the optimal dual variables. For example, we can recover a least-norm capacity-achieving input distribution for a channel without an input cost constraint as follows. First, the optimal output distribution $q^{*}$ is recovered from the optimal dual variable $\alpha^{*}$:

$$q_j^{*} = \exp(\alpha_j^{*} - C), \quad j = 1, 2, \ldots, M \qquad (8)$$

where $C = \log \sum_{j=1}^{M} e^{\alpha_j^{*}}$, and the optimal input distribution $p^{*}$ is a vector that satisfies the linear equations

$$pr = -e^{-C} \sum_{j=1}^{M} \alpha_j^{*} e^{\alpha_j^{*}}, \qquad pP = q^{*}, \qquad p\mathbf{1} = 1.$$
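The recovery step is a small linear-algebra computation. The sketch below assumes the reconstruction of the $pr$ equation given above and uses the example binary symmetric channel, for which the optimal dual point is $\alpha^{*} = -r$ by symmetry; np.linalg.lstsq returns the least-norm solution of the (possibly underdetermined) stacked system.

```python
import numpy as np

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
r = -np.sum(P * np.log(P), axis=1)
alpha_star = -r                          # optimal dual point for this symmetric channel
C = np.log(np.sum(np.exp(alpha_star)))
q_star = np.exp(alpha_star - C)          # equation (8)

# stacked linear system:  p P = q*,  p 1 = 1,  p r = -e^{-C} sum_j alpha*_j e^{alpha*_j}
A = np.vstack([P.T, np.ones((1, 2)), r[None, :]])
b = np.concatenate([q_star, [1.0], [-np.exp(-C) * (alpha_star @ np.exp(alpha_star))]])
p_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(C, p_star)                         # ~0.3681 nats, p* = [0.5, 0.5]
```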

At this point, we have shown how to formulate the channel capacity problem as a geometric program. Recall that this tutorial assumes some familiarity with optimization theory, especially convex optimization and geometric programming. A convex optimization problem can be solved numerically by efficient algorithms such as primal-dual interior point methods. Geometric programs, which have been used for a variety of engineering problems, are a class of nonlinear problems that can be transformed into convex optimization problems; interested readers may refer to [1][2] for a more complete introduction to geometric programming. Thus, using convex optimization techniques, we can compute the channel capacity and a capacity-achieving input distribution efficiently.

3 Some observations from the formulation

From the complementary slackness property, if $p_i^{*} > 0$ for all $i$, i.e., every mass point of the capacity-achieving input distribution is positive, then solving the system of linear equations

$$P\alpha + \gamma s + r = 0$$

yields $\alpha^{*}$ and $\gamma^{*}$, and hence the channel capacity with input cost is $C(S) = \gamma^{*} S + \log \sum_{j=1}^{M} \exp(\alpha_j^{*})$. In the case of no input cost constraint, this recovers the observation made by Gallager in [4].
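In the no-cost case with a square, invertible channel matrix, this observation reduces the capacity computation to one linear solve. A sketch on the example binary symmetric channel (assumptions: $\gamma = 0$ and all $p_i^{*} > 0$, which hold here by symmetry):

```python
import numpy as np

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
r = -np.sum(P * np.log(P), axis=1)
alpha = np.linalg.solve(P, -r)    # solve P alpha + r = 0  (no cost: gamma = 0)
C = np.log(np.sum(np.exp(alpha)))
print(C)                          # log 2 - H(eps) ~ 0.3681 nats
```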

A dual argument based on complementary slackness shows that $p_i^{*} = 0$ if $r_i + (P\alpha^{*})_i + \gamma^{*} s_i > 0$ in the Lagrange dual of channel capacity. Therefore, from the optimal dual variables $\alpha^{*}, \gamma^{*}$, we immediately obtain the support of the capacity-achieving input distribution as $\{i \mid r_i + (P\alpha^{*})_i + \gamma^{*} s_i = 0\}$.

From the primal and dual problems of channel capacity, we also obtain various optimality conditions. For example, if there are $\lambda \in \mathbf{R}^{N \times 1}$ and $\alpha \in \mathbf{R}^{M \times 1}$ satisfying the following KKT conditions for a given $P$:

$$\lambda \succeq 0, \qquad r + P\alpha \succeq 0,$$
$$\frac{e^{\alpha_j}}{\sum_{j'=1}^{M} e^{\alpha_{j'}}} - \sum_{i=1}^{N} \lambda_i P_{ij} = 0, \quad j = 1, 2, \ldots, M,$$
$$\lambda_i \left( r_i + \sum_{j=1}^{M} P_{ij} \alpha_j \right) = 0, \quad i = 1, 2, \ldots, N, \qquad (9)$$

then the resulting $\log \sum_{j=1}^{M} e^{\alpha_j}$ is the channel capacity $C$.

There is a minimax Kullback-Leibler divergence (minmax KL) characterization of discrete memoryless channel capacity with input cost in [5][6]:

$$C(S) = \min_{q} \min_{\gamma \geq 0} \max_{i} \left[ D(P^{(i)} \| q) + \gamma (S - s_i) \right] \qquad (10)$$

where the minimization over $q$ ranges over all possible output distributions. It is proved in [2] that the minmax KL characterization (10) can be recovered from the complementary slackness property.

4 Bounding From the Dual

Because the inequality constraints in the dual problem (4) are affine, it is easy to obtain a dual feasible $\alpha$ by finding any solution to a system of linear inequalities, and the resulting value of the dual objective function provides an easily derivable upper bound on channel capacity. Many channel matrices also exhibit sparsity patterns: special patterns with a small number of nonzero entries. Based on the sparsity pattern of a given channel matrix, tight analytic bounds may be obtained from an appropriate selection of the dual variables.

For example, it is easy to verify that $\alpha_j = -\min_i H(P^{(i)})$, for all $j$, satisfies the dual constraints and generates an upper bound on channel capacity: $C \leq \log M - \min_i H(P^{(i)})$. Similarly, it is easy to verify that $\alpha_j = \log \max_i P_{ij}$ satisfies the dual constraints and gives the following corollary.

Corollary. Channel capacity is upper-bounded in terms of a maximum-likelihood receiver selecting $\arg\max_i P_{ij}$ for each output alphabet symbol $j$:

$$C \leq \log \sum_{j=1}^{M} \max_i P_{ij} \qquad (11)$$

which is tight if and only if the optimal output distribution $q^{*}$ is

$$q_j^{*} = \frac{\max_i P_{ij}}{\sum_{k=1}^{M} \max_i P_{ik}}, \quad j = 1, 2, \ldots, M.$$
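Bound (11) costs only one pass over the channel matrix; a minimal sketch on the example binary symmetric channel:

```python
import numpy as np

eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
# dual-feasible choice alpha_j = log max_i P_ij gives C <= log sum_j max_i P_ij
bound = np.log(np.sum(np.max(P, axis=0)))
print(bound)    # log(1.8) ~ 0.588 >= C ~ 0.368 nats
```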

When there is an input cost constraint $ps \leq S$, the above upper bound becomes

$$C(S) \leq \log \sum_{j=1}^{M} \max_i \left( e^{-s_i} P_{ij} \right) + S \qquad (12)$$

where each maximum-likelihood decision is modified by the cost vector $s$ (this corresponds to the dual-feasible choice $\gamma = 1$).

5 Conclusions and Remarks

In this tutorial, we introduce the application of optimization theory to information theory research. We revisit the classic channel capacity problem and show that finding the channel capacity and a capacity-achieving input distribution can be cast as a geometric program, in both standard form and convex form, which can be solved efficiently using a primal-dual interior point method. We also discuss some observations arising from the formulation. The geometric programming dual characterization is shown to be equivalent to the minmax KL characterization in [5][6]. Upper bounds on the channel capacity can also be generated efficiently from the dual problem.

Note that this tutorial focuses on the channel capacity problem. In fact, the rate distortion problem can also be formulated as a geometric program, with many similar forms, properties, and observations. More interestingly, the dual of the channel capacity problem and the dual of the rate distortion problem are connected beautifully by the Shannon duality correspondence. This connection is mathematically heavy, however, and beyond the scope of this tutorial. Interested readers may refer to [2] for a more detailed treatment.

References

[1] S. Boyd, S. J. Kim, L. Vandenberghe, and A. Hassibi, "A Tutorial on Geometric Programming," Optimization and Engineering, 2005, to appear.

[2] M. Chiang and S. Boyd, "Geometric Programming Duals of Channel Capacity and Rate Distortion," IEEE Trans. on Information Theory, vol. 50, no. 2, pp. 245-258, February 2004.

[3] T. M. Cover and M. Chiang, "Duality between channel capacity and rate distortion with state information," IEEE Trans. on Information Theory, vol. 48, pp. 1629-1638, June 2002.

[4] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.

[5] R. G. Gallager, "Source coding with side information and universal coding," in Proc. IEEE Int. Symp. Information Theory, 1976.

[6] I. Csiszar and J. Korner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.

[7] M. Chiang and A. Sutivong, "Efficient optimization of constrained nonlinear resource allocation," in Proc. IEEE GLOBECOM, San Francisco, CA, Dec. 2003.