Variable-length codes for channels with memory and feedback: error-exponent upper bounds

Achilleas Anastasopoulos and Jui Wu

The authors are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48105 USA (e-mail: {juiwu, anastas}@umich.edu).

Abstract: The reliability function of memoryless channels with noiseless feedback and variable-length coding was found to be a linear function of the average rate in the classic work of Burnashev. In this work we consider unifilar channels with noiseless feedback and study upper bounds on the channel reliability function with variable-length codes. In unifilar channels the channel state is known to the transmitter but is unknown to the receiver. We generalize Burnashev's analysis and derive a similar expression which is linear in the average rate and depends on the channel capacity $C$, as well as on an additional parameter $C_1$ which relates to a sequential binary hypothesis testing problem over this channel. This parameter is evaluated by setting up an appropriate Markov decision process (MDP). Furthermore, an upper bound on this parameter is derived using a simplified MDP. Numerical evaluation of the parameter for several binary input/state/output unifilar channels hints at the optimal transmission strategies. Such strategies are studied in a companion paper to provide lower (achievable) bounds on the channel reliability function.

I. INTRODUCTION

Error-exponent analysis has been an active area of research for several decades. The vast literature in this area can be categorized based on (i) whether the channel is memoryless or has memory; (ii) whether there is channel-output feedback to the transmitter; (iii) whether the employed coding is fixed-length or variable-length; and (iv) whether upper (converse) or lower (achievable) bounds are analyzed. In the case of memoryless channels with noiseless feedback, Schalkwijk and Kailath [1] proposed a transmission scheme for the additive white Gaussian noise (AWGN) channel with infinite error exponent. On the other hand, Dobrushin [2] and later Haroutunian [3], by deriving an error-exponent upper bound for discrete memoryless channels (DMCs), showed that, at least for symmetric channels, there is no gain to be expected from feedback when fixed-length codes are employed. This was a strong negative result, since it suggested that for DMCs noiseless feedback can neither improve capacity (as was well known) nor improve the error exponent when fixed-length codes are used. A remarkable result was derived by Burnashev in [4], where matching upper and lower bounds on the error exponent were derived for DMCs with feedback and variable-length codes. The error exponent has the simple form $E(R) = C_1(1 - R/C)$, where $R$ is the average rate, $C$ is the channel capacity, and $C_1$ is the maximum divergence that can be obtained in the channel for a binary hypothesis testing problem. Berlin et al. [5] have provided a simpler derivation of the Burnashev bound that emphasizes the link between the constant $C_1$ and the binary hypothesis testing problem. Several variable-length transmission schemes have been proposed in the literature for DMCs and their error exponents have been analyzed [6], [7], [8]. In the case of channels with memory and feedback, the capacity was studied in [9], [10], [11], and a number of capacity-achieving schemes have recently been studied in the literature [12], [11], [13]. The only work that studies error exponents of variable-length codes for channels with memory and feedback is [12], where the authors consider finite-state channels with the channel state known causally to both the transmitter and the receiver.
In this work, we consider channels with memory and feedback, and derive a straight-line upper bound on the error exponent of variable-length codes. We specifically look at unifilar channels since, for this family, the capacity has been characterized in an elegant way through the use of Markov decision processes (MDPs) [10]. Our technique is motivated by that of [4], i.e., studying the rate of decay of the posterior message entropy using martingale theory in two distinct regimes: large and small message entropies. A major difference of this work compared to [4] is that we analyze the multi-step drift behavior of the communication system instead of the one-step drift analyzed for DMCs. This is necessitated by the fact that a one-step analysis cannot capture the memory inherent in the channel and thus results in extremely loose bounds. It is not surprising that the parameter $C_1$ in our case also relates to the maximum discrimination that can be achieved on this channel in a binary hypothesis testing problem. In order to evaluate this quantity, we formulate two MDPs of decreasing complexity, the solutions of which are upper bounds on the quantity $C_1$, with the former being tighter than the latter. The tightness of the bounds is argued based on the fact that asymptotically this is the expected performance of the best system, and by achievability results presented in the companion paper [14]. An additional contribution of this work is a complete reworking of some of the more opaque proofs of [4], resulting in a significant simplification of the exposition. We finally provide numerical results for a number of interesting unifilar channels, including the trapdoor, chemical, and other two-input/output/state unifilar channels. The main difference between our work and that in [12] is that for unifilar channels the channel state is not observed at the receiver. This complicates the analysis considerably, as is evidenced by the different approaches in evaluating the constant $C_1$ in these two works. Furthermore, our results indicate that the optimal policies for achieving the maximum divergence are very different depending on whether or not the receiver knows the channel state.

The remainder of this paper is organized as follows. In Section II, we describe the channel model for the unifilar channel and the class of encoding and decoding strategies. In Section III, we analyze the drifts of the posterior message entropy in the large- and small-entropy regimes. In Section IV, we formulate two MDPs in order to study the problem of one-bit transmission over this channel. Section V presents numerical results for several unifilar channels. Final conclusions are given in Section VI.
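As a concrete point of reference for the straight-line form $E(R) = C_1(1 - R/C)$, the following small numerical illustration (our own, not from the paper) evaluates Burnashev's exponent for a binary symmetric channel, for which both constants have closed forms; the function names and the BSC choice are ours.

```python
import math

def h2(p: float) -> float:
    """Binary entropy of p, in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def burnashev_exponent(p: float, rate_fraction: float) -> float:
    """Burnashev's E(R) = C1 * (1 - R/C) for a BSC with crossover p.

    C  = log(2) - h2(p) is the capacity (nats/use); C1 is the maximum
    binary-hypothesis-testing divergence D(Q(.|0) || Q(.|1)).
    """
    C = math.log(2) - h2(p)
    C1 = (1 - 2 * p) * math.log((1 - p) / p)   # D(Bern(1-p) || Bern(p))
    R = rate_fraction * C                      # operate below capacity
    return C1 * (1 - R / C)

if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2):
        print(f"p={p}: E(R) at R=C/2 is {burnashev_exponent(p, 0.5):.3f} nats")
```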

II. CHANNEL MODEL AND PRELIMINARIES

Consider a family of finite-state point-to-point channels with input $X_t \in \mathcal{X}$, output $Y_t \in \mathcal{Y}$ and state $S_t \in \mathcal{S}$ at time $t$, with all alphabets finite and the initial state $S_1 = s_1$ known to both the transmitter and the receiver. The channel conditional probability is
$$P(Y_t, S_{t+1} \mid X^t, Y^{t-1}, S^t) = Q(Y_t \mid X_t, S_t)\, \delta_{g(S_t, X_t, Y_t)}(S_{t+1}), \qquad (1)$$
for a given stochastic kernel $Q : \mathcal{X} \times \mathcal{S} \to \mathcal{P}(\mathcal{Y})$ and deterministic function $g : \mathcal{S} \times \mathcal{X} \times \mathcal{Y} \to \mathcal{S}$, where $\mathcal{P}(\mathcal{Y})$ denotes the space of all probability measures on $\mathcal{Y}$ and $\delta_a(\cdot)$ is the Kronecker delta function. This family of channels is referred to as unifilar channels [10]. The authors in [10] have derived the capacity $C$, under certain conditions, in the form
$$C = \lim_{N \to \infty} \sup_{\{p(x_t \mid s_t, y^{t-1}, s_1)\}_{t \geq 1}} \frac{1}{N} \sum_{t=1}^{N} I(X_t, S_t; Y_t \mid Y^{t-1}, S_1). \qquad (2)$$
In this paper, we restrict our attention to such channels with strictly positive $Q(y \mid x, s)$ for all $(y, x, s) \in \mathcal{Y} \times \mathcal{X} \times \mathcal{S}$ and with ergodic behavior, so that the above limit indeed exists.

Let $W \in \{1, 2, 3, \ldots, M = 2^K\}$ be the message to be transmitted. In this system, the transmitter receives perfect feedback of the output with unit delay and decides the input $X_t$ based on $(W, Y^{t-1}, S_1)$ at time $t$. The transmitter can adopt randomized encoding strategies, where $X_t \sim e_t(\cdot \mid W, Y^{t-1}, S_1)$ with a collection of distributions $\{e_t(\cdot \mid w, y^{t-1}, s_1) \in \mathcal{P}(\mathcal{X})\}_{(w, y^{t-1}, s_1)}$. Without loss of generality, we can represent the randomized encoder through a deterministic mapping $X_t = e_t(W, Y^{t-1}, V^{t-1}, S_1)$ involving random variables $\{V_t\}_t$ which are generated as $P(V_t \mid V^{t-1}, Y^t, X^t, S^t, W) = P(V_t)$. Furthermore, since we are interested in error-exponent upper bounds, we can assume that the random variables $V_t$ are causally observed common information between the transmitter and the receiver. The decoding policy consists of a stopping time $T$ w.r.t. the filtration $\{\mathcal{F}_t = \sigma(Y^t, V^t, S_1)\}_t$ and an estimated message $\hat{W}_t$ at every time $t$. The average rate $R$ and error probability $P_e$ of this scheme are defined as $R = K / E[T]$ and $P_e = P(\hat{W}_T \neq W)$. The channel reliability function (highest achievable error exponent) is defined as $E^*(R) = \sup\, (-\log P_e / E[T])$. Since transmission schemes with $P(T = \infty) > 0$ result in the trivial error exponent $-\log P_e / E[T] = 0$, we restrict attention to those schemes that have a.s. finite decision times.
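As a concrete instance of the model in (1), the following minimal sketch (ours, not from the paper) simulates channel uses of the trapdoor channel of Table I, whose next-state map is $g(s, x, y) = s \oplus x \oplus y$. Note that the trapdoor kernel has zero entries and therefore violates the strict-positivity assumption imposed above; it is used here only to illustrate the state recursion.

```python
import random

# Trapdoor channel (channel A of Table I): P(Y=0 | x, s).
Q0 = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 0.0}

def g(s: int, x: int, y: int) -> int:
    """Deterministic next-state map of the Table I channels: s XOR x XOR y."""
    return s ^ x ^ y

def channel_step(x: int, s: int) -> tuple:
    """Sample Y ~ Q(. | x, s) per Eq. (1) and return (y, s_next)."""
    y = 0 if random.random() < Q0[(x, s)] else 1
    return y, g(s, x, y)

if __name__ == "__main__":
    s = 0                              # initial state s_1, known to both ends
    for t in range(5):
        x = random.randint(0, 1)       # placeholder input; a real encoder
        y, s = channel_step(x, s)      # would use (W, y^{t-1}, s_1)
        print(f"t={t}: x={x} y={y} s'={s}")
```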
III. ERROR-EXPONENT UPPER BOUND

Our methodology is inspired by the analysis in [4] for DMCs. The analysis involves lower-bounding the rate of decrease of the posterior message entropy which, through a generalization of Fano's lemma, provides lower bounds on the error probability. Entropy can decrease no faster than the channel capacity. However, this bound becomes trivial at low values of entropy, which necessitates switching to lower-bounding the corresponding logarithmic drift. The log-drift analysis is quite involved in [4] even for the DMC. The fundamental difference of our work compared to the DMC case is the presence of memory in unifilar channels. A single-step drift analysis would not be able to capture this memory, resulting in loose bounds. For this reason we analyze multi-step drifts; in fact, we consider the asymptotic behavior as the step size becomes larger and larger. The outline of the analysis is as follows.

Lemma 1 and Lemma 2 describe the overall rate of decrease of the entropy induced by the posterior belief on the message, in terms of drifts in the linear and logarithmic regimes, respectively. The former relates the drift to the capacity $C$, while the latter relates it to a quantity $C_1$ which can be interpreted as the largest discrimination that can be achieved on this channel in a binary hypothesis testing problem, as elegantly explained in [5]. The result presented in Lemma 3 shows that from a general random process satisfying the two aforementioned drift conditions one can construct an appropriate submartingale. These three results are then combined in Proposition 1 to provide a lower bound on the stopping time of an arbitrary system employing variable-length coding, and equivalently an upper bound on the error exponent. Let us define the following random processes:
$$\Pi_t(i) = P(W = i \mid \mathcal{F}_t), \quad t \geq 0, \qquad (3)$$
$$H_t = -\sum_i \Pi_t(i) \log \Pi_t(i). \qquad (4)$$
By the generalized Fano's lemma [4, Lemma 1], the expectation of the posterior entropy at the stopping time $T$ is upper bounded by $-P_e(\log P_e + o(\log P_e))$. Thus, in order to estimate the rate of decrease of $P_e$, we study the corresponding rate for $H_t$. The next lemma gives a first estimate of the drift of $\{H_t\}_{t \geq 0}$.

Lemma 1. For any $t$ and $\epsilon > 0$, there exists an $N = N(\epsilon)$ such that
$$E[H_t - H_{t+N} \mid \mathcal{F}_t] \leq N(C + \epsilon) \quad \text{a.s.} \qquad (5)$$

Proof: Please see Appendix A.

Since for small values of $H_t$ the above result does not give any information, we now analyze the drifts of the process $\{\log H_t\}_{t \geq 0}$.
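Before proceeding, we note that the processes (3)-(4) are easy to compute at the receiver: since the state of a unifilar channel is a deterministic function of $(W, Y^{t-1}, S_1)$, tracking one state trajectory per hypothesis suffices. The sketch below is our own construction (the encoder rule is a hypothetical placeholder, not the paper's scheme), using the symmetric channel C($p_0$, $q_0$) of Table I (Section V).

```python
import math

p0, q0 = 0.9, 0.1
Q0 = {(0, 0): 1 - q0, (1, 0): p0, (0, 1): 1 - p0, (1, 1): q0}  # P(Y=0|x,s)

def q(y: int, x: int, s: int) -> float:
    return Q0[(x, s)] if y == 0 else 1.0 - Q0[(x, s)]

def g(s: int, x: int, y: int) -> int:
    return s ^ x ^ y

def encoder(i: int, t: int, s: int) -> int:
    """Hypothetical encoder for illustration: a message bit XOR current state."""
    return ((i >> (t % 3)) & 1) ^ s

def entropy_trace(M: int, y_seq: list, s1: int = 0) -> list:
    """Posterior Pi_t(i) = P(W=i | y^t) and entropy H_t, per (3)-(4)."""
    pi = [1.0 / M] * M
    st = [s1] * M                      # state trajectory under each hypothesis
    Hs = []
    for t, y in enumerate(y_seq):
        for i in range(M):
            x = encoder(i, t, st[i])
            pi[i] *= q(y, x, st[i])    # Bayes update with the kernel Q
            st[i] = g(st[i], x, y)
        z = sum(pi)
        pi = [p / z for p in pi]
        Hs.append(-sum(p * math.log(p) for p in pi if p > 0))
    return Hs

if __name__ == "__main__":
    print(entropy_trace(M=8, y_seq=[0, 1, 1, 0, 1, 0, 0, 1]))
```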

Lemma 2. For any given $\epsilon > 0$, there exists an $N = N(\epsilon)$ such that if $H_t < \epsilon$,
$$E[\log H_{t+N} - \log H_t \mid \mathcal{F}_t] \geq -N(C_1 + \epsilon) \quad \text{a.s.,} \qquad (6)$$
where the constant $C_1$ is given by
$$C_1 = \max_{s_1, y^t, v^t, k}\, \limsup_{N \to \infty}\, \max_{\{e_i\}} \frac{1}{N} \sum_{Y_{t+1}^{t+N}, V_{t+1}^{t+N}} P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \log \frac{P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1)}{P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W \neq k, y^t, v^t, s_1)}. \qquad (7)$$

Proof: Please see Appendix B.

We comment at this point that the proof of this result is significantly simpler than the corresponding one in [4, Lemma 3]. The reason is that we develop the proof directly in the asymptotic regime, and thus there is no need for complex convexity arguments such as those derived in [4, Lemma 7 and eqs. (A8)-(A12)]. At this point one can bound the quantity in (7) by $\max_{x, s, x', s'} D(Q(\cdot \mid x, s) \| Q(\cdot \mid x', s'))$ using convexity. Such a bound, however, can be very loose, since it does not account for the channel memory; a numerical illustration of this loose bound is given after Proposition 1 below. In Section IV, we will discuss how to evaluate $C_1$. Before we continue with the second stage of the analysis, we also note that $|\log H_{t+1} - \log H_t|$ is bounded above by a positive number $C_2$ almost surely, due to the fact that the kernel $Q(\cdot \mid \cdot, \cdot)$ is strictly positive. The proof is similar to that in [4, Lemma 4]. In the following lemma, we propose a submartingale that connects the drift analysis and the stopping time in the proof of our main result.

Lemma 3. Suppose a random process $\{H_t\}_{t \geq 0}$ has the following properties:
$$E[H_{t+1} - H_t \mid \mathcal{F}_t] \geq -K_1, \qquad (8a)$$
$$E[\log H_{t+1} - \log H_t \mid \mathcal{F}_t] \geq -K_2 \quad \text{if } H_t < 1, \qquad (8b)$$
$$|\log H_{t+1} - \log H_t| < K_3 \quad \text{if } H_t < 1, \qquad (8c)$$
almost surely, for some positive numbers $K_1, K_2, K_3$, where $K_2 > K_1$. Define a process $\{Z_t\}_{t \geq 0}$ by
$$Z_t = \Big(\frac{H_t - 1}{K_1} + t\Big)\mathbb{1}_{\{H_t > 1\}} + \Big(\frac{\log H_t}{K_2} + t + f(\log H_t)\Big)\mathbb{1}_{\{H_t \leq 1\}}, \quad t \geq 0, \qquad (9)$$
where $f : \mathbb{R} \to \mathbb{R}$ is defined by $f(y) = \frac{1 - e^{\lambda y}}{K_2 \lambda}$ with a positive constant $\lambda$. Then, for sufficiently small $\lambda$, $\{Z_t\}_{t \geq 0}$ is a submartingale w.r.t. $\mathcal{F}_t$.

Proof: Please see Appendix C.

Two comments are in order regarding the proof of this result. First, the main difficulty in proving such results is handling what happens in the transition range (around $H_t = 1$), where $H_t$ and $H_{t+1}$ are not both above or below the threshold. The choice of the function $f(\cdot)$ is what makes the proof work. The proof offered here is quite concise compared to the one employed in [4] (which consists of Lemma 5 and an approximation argument given in Theorem 1). The reason for that is the specific definition of the $Z_t$ process, and in particular the choice of the $f(\cdot)$ function, which simplifies the proof considerably. The second, and related, comment is that this lemma is not a straightforward extension of [15, Lemma 1, p. 50], since there the purpose was to bound from below a positive rate of increase of a process. In our case, the proof hinges on the additional constraint (40c) that we impose on the choice of the $f(\cdot)$ function. We are now ready to present our main result.

Proposition 1. Any transmission scheme with $M = 2^K$ messages and error probability $P_e$ satisfies
$$\frac{-\log P_e}{E[T]} \leq C_1\Big(1 - \frac{R}{C}\Big) + U(\epsilon, K, R, C, C_1, C_2) \qquad (10)$$
for any $\epsilon > 0$. Furthermore,
$$\lim_{K \to \infty} U(\epsilon, K, R, C, C_1, C_2) = o_\epsilon(1). \qquad (11)$$

Proof: Please see Appendix D.
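For reference, the loose convexity bound on $C_1$ mentioned after Lemma 2 is straightforward to compute. The sketch below (ours) evaluates $\max_{x, s, x', s'} D(Q(\cdot \mid x, s) \| Q(\cdot \mid x', s'))$ for the symmetric channel C($p_0$, $q_0$) of Table I (Section V); logs are base 2, under our reading that the paper's numerical values are in bits.

```python
import math
from itertools import product

def kernel_C(p0: float, q0: float) -> dict:
    """P(Y=0 | x, s) for the symmetric channel C(p0, q0) of Table I."""
    return {(0, 0): 1 - q0, (1, 0): p0, (0, 1): 1 - p0, (1, 1): q0}

def kl_bern(a: float, b: float) -> float:
    """D(Bern(a) || Bern(b)) in bits, where a and b are the P(Y=0) values."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

def naive_c1_bound(p0: float, q0: float) -> float:
    Q0 = kernel_C(p0, q0)
    pairs = list(product((0, 1), repeat=2))       # all (x, s) combinations
    return max(kl_bern(Q0[u], Q0[v]) for u in pairs for v in pairs)

if __name__ == "__main__":
    # For C(0.5, 0.1) this gives ~2.54 bits, well above the MDP-based value
    # of ~1.64 reported in Table II, illustrating how loose the bound is.
    print(naive_c1_bound(0.5, 0.1))
```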

IV. EVALUATION OF $C_1$

In this section we evaluate the constant $C_1$. As noted in [4], [5], the quantity $C_1$ relates to a binary hypothesis testing problem. When the posterior entropy is small, the receiver has very high confidence in a certain message. In this situation, the transmitter is essentially trying to inform the receiver whether or not this candidate message is the true one. Since the unifilar channel has memory, it is not surprising that the constant $C_1$ is connected to a Markov decision process related to the aforementioned binary hypothesis testing problem, as was the case in [12]. Recall that $C_1$ is defined as
$$C_1 = \max_{s_1, y^t, v^t, k}\, \limsup_{N \to \infty}\, \max_{\{e_i\}} \frac{1}{N} \sum_{Y_{t+1}^{t+N}, V_{t+1}^{t+N}} P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \log \frac{P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1)}{P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W \neq k, y^t, v^t, s_1)} \qquad (12)$$
$$= \max_{s_1, y^t, v^t, k}\, \limsup_{N \to \infty}\, \max_{\{e_i\}} \frac{1}{N}\, D\big(P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \,\big\|\, P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W \neq k, y^t, v^t, s_1)\big). \qquad (13)$$
We now look into the quantities $P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1)$ and $P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W \neq k, y^t, v^t, s_1)$. Let us define $X_t^k = e_t(k, Y^{t-1}, V^{t-1}, S_1)$ and $S_t^k = g_t(k, Y^{t-1}, V^{t-1}, S_1)$, which are the input and the state at time $t$, respectively, conditioned on $W = k$. Then
$$P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid y^t, v^t, s_1, W = k) = \prod_{i=t+1}^{t+N} P(Y_i \mid Y_{t+1}^{i-1}, V_{t+1}^{i-1}, y^t, v^t, s_1, W = k)\, P(V_i \mid Y_{t+1}^{i}, V_{t+1}^{i-1}, y^t, v^t, s_1, W = k) = \prod_{i=t+1}^{t+N} Q(Y_i \mid X_i^k, S_i^k)\, P(V_i), \qquad (14)$$
and
$$P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid y^t, v^t, s_1, W \neq k) = \prod_{i=t+1}^{t+N} \Big[\sum_{x, s} Q(Y_i \mid x, s)\, X_i^{\bar{k}}(x \mid s)\, B_i^{\bar{k}}(s)\Big] P(V_i), \qquad (15)$$
where $X_i^{\bar{k}}(x \mid s)$ and $B_i^{\bar{k}}(s)$ are given by
$$X_i^{\bar{k}}(x \mid s) = P(X_i = x \mid S_i = s, Y_{t+1}^{i-1}, V_{t+1}^{i-1}, y^t, v^t, s_1, W \neq k), \qquad (16)$$
$$B_i^{\bar{k}}(s) = P(S_i = s \mid Y_{t+1}^{i-1}, V_{t+1}^{i-1}, y^t, v^t, s_1, W \neq k). \qquad (17)$$
Moreover, $B_i^{\bar{k}}$ can be updated by
$$B_{i+1}^{\bar{k}}(s) = \frac{\sum_{\bar{x}, \bar{s}}\, \delta_{g(\bar{s}, \bar{x}, Y_i)}(s)\, Q(Y_i \mid \bar{x}, \bar{s})\, X_i^{\bar{k}}(\bar{x} \mid \bar{s})\, B_i^{\bar{k}}(\bar{s})}{\sum_{\bar{x}, \bar{s}} Q(Y_i \mid \bar{x}, \bar{s})\, X_i^{\bar{k}}(\bar{x} \mid \bar{s})\, B_i^{\bar{k}}(\bar{s})}, \qquad (18)$$
which we can concisely express as $B_{i+1}^{\bar{k}} = \phi(B_i^{\bar{k}}, X_i^{\bar{k}}, Y_i)$.
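The update (18) is simply a Bayes step on the state belief. Below is a direct transcription (our code, with our own argument names) of $\phi(b, \xi, y)$, where $\xi(x \mid s)$ plays the role of $X_i^{\bar{k}}$ and $b$ that of $B_i^{\bar{k}}$; `q` and `g` are as in the earlier sketches.

```python
def phi(b: dict, xi: dict, y: int, q, g, S=(0, 1), X=(0, 1)) -> dict:
    """Belief update of Eq. (18).

    b:  dict s -> P(S_i = s), the current state belief
    xi: dict (x, s) -> P(X_i = x | S_i = s), the randomized input map
    q:  q(y, x, s) = Q(y | x, s);  g: next-state function g(s, x, y)
    Returns the belief on S_{i+1} after observing output y.
    """
    num = {s: 0.0 for s in S}
    den = 0.0
    for s_prev in S:
        for x in X:
            w = q(y, x, s_prev) * xi[(x, s_prev)] * b[s_prev]
            num[g(s_prev, x, y)] += w
            den += w
    return {s: num[s] / den for s in S}
```

Repeated application of `phi` generates the belief trajectory $B_i^{\bar{k}}$ that drives the MDP defined next.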

With the above derivation, the divergence in (13) can be expressed as
$$D\big(P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \,\big\|\, P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W \neq k, y^t, v^t, s_1)\big) = \sum_{i=t+1}^{t+N} E\Big[\log \frac{Q(Y_i \mid X_i^k, S_i^k)}{\sum_{x, s} Q(Y_i \mid x, s)\, X_i^{\bar{k}}(x \mid s)\, B_i^{\bar{k}}(s)} \,\Big|\, y^t, v^t, s_1, W = k\Big]$$
$$= \sum_{i=t+1}^{t+N} E\Big[E\Big[\log \frac{Q(Y_i \mid X_i^k, S_i^k)}{\sum_{x, s} Q(Y_i \mid x, s)\, X_i^{\bar{k}}(x \mid s)\, B_i^{\bar{k}}(s)} \,\Big|\, S_i^k, B_i^{\bar{k}}, X_i^k, X_i^{\bar{k}}, y^t, v^t, s_1, W = k\Big] \,\Big|\, y^t, v^t, s_1, W = k\Big] = \sum_{i=t+1}^{t+N} E\big[R(S_i^k, B_i^{\bar{k}}, X_i^k, X_i^{\bar{k}}) \mid y^t, v^t, s_1, W = k\big], \qquad (19)$$
where the function $R(s^0, b^1, x^0, x^1)$ is given by
$$R(s^0, b^1, x^0, x^1) = \sum_y Q(y \mid x^0, s^0) \log \frac{Q(y \mid x^0, s^0)}{\sum_{\bar{x}, \bar{s}} Q(y \mid \bar{x}, \bar{s})\, x^1(\bar{x} \mid \bar{s})\, b^1(\bar{s})}. \qquad (20)$$
This inspires us to define a controlled Markov process with state $(S_t^0, B_t^1) \in \mathcal{S} \times \mathcal{P}(\mathcal{S})$, action $(X_t^0, X_t^1) \in \mathcal{X} \times (\mathcal{S} \to \mathcal{P}(\mathcal{X}))$, instantaneous reward $R(S_t^0, B_t^1, X_t^0, X_t^1)$ at time $t$, and transition kernel
$$Q^1(S_{t+1}^0, B_{t+1}^1 \mid S_t^0, B_t^1, X_t^0, X_t^1) = \sum_y \delta_{g(S_t^0, X_t^0, y)}(S_{t+1}^0)\, \delta_{\phi(B_t^1, X_t^1, y)}(B_{t+1}^1)\, Q(y \mid X_t^0, S_t^0). \qquad (21)$$
That this is indeed a controlled Markov process can be readily established. Note that at time $t = 0$ the process starts with initial state $(S_0^0, B_0^1)$. Let $V_N(s^0, b^1)$ be the average reward over $N$ steps of this process,
$$V_N(s^0, b^1) = \frac{1}{N} E\Big[\sum_{i=0}^{N-1} R(S_i^0, B_i^1, X_i^0, X_i^1) \,\Big|\, S_0^0 = s^0,\ B_0^1 = b^1\Big], \qquad (22)$$
and denote by $V_\infty(s^0, b^1)$ the corresponding $\limsup$, i.e., $V_\infty(s^0, b^1) = \limsup_{N \to \infty} V_N(s^0, b^1)$. Then, the constant $C_1$ is given by
$$C_1 = \sup_{s^0, b^1} V_\infty(s^0, b^1). \qquad (23)$$
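Although the belief space $\mathcal{P}(\mathcal{S})$ is uncountable, a uniform quantization of the belief and of the randomized input map makes value iteration feasible; the numerical results in Section V use $n = 100$ quantization points. The sketch below is our own, with much coarser grids and our own parameter names, and estimates $\sup V_\infty$ for channel C(0.5, 0.1) of Table I with base-2 logs (our reading of the paper's units).

```python
import math
from itertools import product

p0, q0 = 0.5, 0.1
P0 = {(0, 0): 1 - q0, (1, 0): p0, (0, 1): 1 - p0, (1, 1): q0}  # P(Y=0|x,s)

def q(y, x, s):
    return P0[(x, s)] if y == 0 else 1 - P0[(x, s)]

def g(s, x, y):
    return s ^ x ^ y

NB, NA = 41, 5                               # coarse grids, sketch only
bgrid = [i / (NB - 1) for i in range(NB)]    # belief b = P(S=1)
agrid = [i / (NA - 1) for i in range(NA)]    # x1 coordinates P(X=1|S=s)

def mix(y, b, a):
    """P(y) under the alternative: sum_{x,s} Q(y|x,s) x1(x|s) b(s)."""
    return sum(q(y, x, s) * px * ps
               for s, ps in ((0, 1 - b), (1, b))
               for x, px in ((0, 1 - a[s]), (1, a[s])))

def next_belief(y, b, a):
    """phi of Eq. (18) specialized to binary S: returns P(S'=1 | y)."""
    num = sum(q(y, x, s) * px * ps
              for s, ps in ((0, 1 - b), (1, b))
              for x, px in ((0, 1 - a[s]), (1, a[s]))
              if g(s, x, y) == 1)
    return num / mix(y, b, a)

def snap(b):
    return min(bgrid, key=lambda c: abs(c - b))

states = list(product((0, 1), bgrid))
actions = [(x0, (a0, a1)) for x0 in (0, 1) for a0 in agrid for a1 in agrid]
h = {st: 0.0 for st in states}               # relative value function
for _ in range(300):
    new = {}
    for s0, b in states:
        best = -1e9
        for x0, a in actions:
            val = 0.0
            for y in (0, 1):
                m = mix(y, b, a)             # reward (20) + continuation
                val += q(y, x0, s0) * (math.log2(q(y, x0, s0) / m)
                                       + h[(g(s0, x0, y),
                                            snap(next_belief(y, b, a)))])
            best = max(best, val)
        new[(s0, b)] = best
    gain = new[(0, 0.0)] - h[(0, 0.0)]       # converges to the optimal gain
    ref = new[(0, 0.0)]
    h = {st: v - ref for st, v in new.items()}
print(gain)  # rough estimate of sup V_infty; cf. ~1.64 bits in Table II
```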

A. A computationally efficient upper bound on $C_1$

The MDP defined above has uncountably infinite state and action spaces. In this section, we propose an alternative upper bound on $C_1$ and formulate an MDP with finite state and action spaces to evaluate it. This provides a looser but more computationally efficient upper bound. Consider again the divergence term
$$D\big(P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \,\big\|\, P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W \neq k, y^t, v^t, s_1)\big) \qquad (24)$$
$$= D\Big(P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \,\Big\|\, \sum_{j \neq k} \frac{P(W = j \mid y^t, v^t, s_1)}{P(W \neq k \mid y^t, v^t, s_1)}\, P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = j, y^t, v^t, s_1)\Big) \qquad (25)$$
$$\stackrel{(a)}{\leq} \sum_{j \neq k} \frac{P(W = j \mid y^t, v^t, s_1)}{P(W \neq k \mid y^t, v^t, s_1)}\, D\big(P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \,\big\|\, P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = j, y^t, v^t, s_1)\big) \qquad (26)$$
$$\leq \max_{j \neq k} D\big(P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \,\big\|\, P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = j, y^t, v^t, s_1)\big), \qquad (27)$$
where $(a)$ is due to convexity. Consider deterministic policies and look into the first distribution in the divergence:
$$P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) = \prod_{i=1}^{N} P(Y_{t+i} \mid W = k, Y_{t+1}^{t+i-1}, V_{t+1}^{t+i-1}, y^t, v^t, s_1)\, P(V_{t+i} \mid W = k, Y_{t+1}^{t+i}, V_{t+1}^{t+i-1}, y^t, v^t, s_1) = \prod_{i=1}^{N} Q(Y_{t+i} \mid X_{t+i}^k, S_{t+i}^k)\, P(V_{t+i}). \qquad (28)$$
Then we have
$$D\big(P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = k, y^t, v^t, s_1) \,\big\|\, P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = j, y^t, v^t, s_1)\big) = \sum_{i=1}^{N} E\Big[E\Big[\log \frac{Q(Y_{t+i} \mid X_{t+i}^k, S_{t+i}^k)}{Q(Y_{t+i} \mid X_{t+i}^j, S_{t+i}^j)} \,\Big|\, Y_{t+1}^{t+i-1}, V_{t+1}^{t+i-1}, y^t, v^t, s_1, W = k\Big] \,\Big|\, y^t, v^t, s_1, W = k\Big] = \sum_{i=1}^{N} E\big[\tilde{R}(S_{t+i}^k, S_{t+i}^j, X_{t+i}^k, X_{t+i}^j) \mid y^t, v^t, s_1, W = k\big], \qquad (29)$$
where $\tilde{R}$ is defined by
$$\tilde{R}(s^0, s^1, x^0, x^1) = \sum_y Q(y \mid x^0, s^0) \log \frac{Q(y \mid x^0, s^0)}{Q(y \mid x^1, s^1)}. \qquad (30)$$
Similar to the previous development, we define a controlled Markov chain with state $(S_t^0, S_t^1) \in \mathcal{S}^2$, action $(X_t^0, X_t^1) \in \mathcal{X}^2$, instantaneous reward $\tilde{R}(S_t^0, S_t^1, X_t^0, X_t^1)$ at time $t$, and transition kernel
$$\tilde{Q}(S_{t+1}^0, S_{t+1}^1 \mid S_t^0, S_t^1, X_t^0, X_t^1) = \sum_y \delta_{g(S_t^0, X_t^0, y)}(S_{t+1}^0)\, \delta_{g(S_t^1, X_t^1, y)}(S_{t+1}^1)\, Q(y \mid X_t^0, S_t^0). \qquad (31)$$
Let $\tilde{V}_N(s^0, s^1)$ denote the average $N$-stage reward of this MDP, i.e.,
$$\tilde{V}_N(s^0, s^1) = \frac{1}{N} E\Big[\sum_{i=0}^{N-1} \tilde{R}(S_i^0, S_i^1, X_i^0, X_i^1) \,\Big|\, S_0^0 = s^0,\ S_0^1 = s^1\Big], \qquad (32)$$
and let $\tilde{V}_\infty$ denote the corresponding $\limsup$. Combining the above with the definition of $C_1$, we have
$$C_1 \leq \max_{s^0, s^1} \tilde{V}_\infty(s^0, s^1), \qquad (33)$$
which gives a simple upper bound on $C_1$. A numerical sketch of solving this finite-state MDP follows.
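For binary channels the MDP of (30)-(33) has only four states and four actions, so standard relative value iteration suffices. The sketch below is our implementation (not the paper's code), with base-2 logs under our reading of Table II's units; it assumes, as is standard for finite unichain average-reward MDPs, that stationary deterministic policies are optimal.

```python
import math
from itertools import product

S = X = Y = (0, 1)

def make_q(p0: float, q0: float):
    """q(y, x, s) for the symmetric channel C(p0, q0) of Table I."""
    P0 = {(0, 0): 1 - q0, (1, 0): p0, (0, 1): 1 - p0, (1, 1): q0}
    return lambda y, x, s: P0[(x, s)] if y == 0 else 1 - P0[(x, s)]

def g(s: int, x: int, y: int) -> int:
    return s ^ x ^ y

def average_reward(q, iters: int = 5000) -> float:
    """Relative value iteration for the MDP with reward (30), kernel (31)."""
    states = list(product(S, S))
    h = {st: 0.0 for st in states}            # relative value function
    gain = 0.0
    for _ in range(iters):
        new = {}
        for s0, s1 in states:
            best = -float("inf")
            for x0, x1 in product(X, X):
                r = sum(q(y, x0, s0) * math.log2(q(y, x0, s0) / q(y, x1, s1))
                        for y in Y)           # one-step divergence reward
                r += sum(q(y, x0, s0) * h[(g(s0, x0, y), g(s1, x1, y))]
                         for y in Y)          # expected continuation value
                best = max(best, r)
            new[(s0, s1)] = best
        ref = new[(0, 0)]
        gain = ref - h[(0, 0)]                # converges to the optimal gain
        h = {st: v - ref for st, v in new.items()}
    return gain

if __name__ == "__main__":
    print(average_reward(make_q(0.9, 0.1)))   # ~2.5 bits, cf. Table II
```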

V. NUMERICAL RESULTS FOR UNIFILAR CHANNELS

In this section, we provide numerical results for the expressions $V_\infty$ and $\tilde{V}_\infty$ for some binary input/output/state unifilar channels. We consider the trapdoor channel (denoted as channel A), the chemical channel (denoted as channel B($p_0$)), symmetric unifilar channels (denoted as channel C($p_0, q_0$)), and asymmetric unifilar channels (denoted as channel D($p_0, q_0, p_1, q_1$)). All of these channels have $g(s, x, y) = s \oplus x \oplus y$ and kernel $Q$ characterized as shown in Table I.

TABLE I: Kernel definition for binary unifilar channels

| Channel | $Q(0 \mid 0,0)$ | $Q(0 \mid 1,0)$ | $Q(0 \mid 0,1)$ | $Q(0 \mid 1,1)$ |
|---|---|---|---|---|
| A | 1 | 0.5 | 0.5 | 0 |
| B($p_0$) | 1 | $p_0$ | $1-p_0$ | 0 |
| C($p_0, q_0$) | $1-q_0$ | $p_0$ | $1-p_0$ | $q_0$ |
| D($p_0, q_0, p_1, q_1$) | $1-q_0$ | $p_0$ | $1-p_1$ | $q_1$ |

The numerical results, tabulated in Table II, were obtained by numerically solving the corresponding MDPs. The results for $V_\infty$ were obtained by quantizing the state and input spaces using uniform quantization with $n = 100$ points.

TABLE II: Asymptotic reward per unit time

| Channel | $\inf_{s^0, b^1} V_\infty$ | $\sup_{s^0, b^1} V_\infty$ | $\min_{s^0, s^1} \tilde{V}_\infty$ | $\max_{s^0, s^1} \tilde{V}_\infty$ | $C_1$ | $C_1'$ |
|---|---|---|---|---|---|---|
| A | ∞ | ∞ | ∞ | ∞ | ∞ | ∞ |
| B(0.9) | ∞ | ∞ | ∞ | ∞ | ∞ | 3.294 |
| C(0.5, 0.1) | 1.633 | 1.637 | 1.637 | 1.637 | 1.637 | 1.533 |
| C(0.9, 0.1) | 2.459 | 2.536 | 2.533 | 2.536 | 2.536 | 2.459 |
| D(0.5, 0.1, 0.1, 0.1) | 2.274 | 2.303 | 2.298 | 2.298 | 2.303 | 2.247 |
| D(0.9, 0.1, 0.1, 0.1) | 2.459 | 2.536 | 2.533 | 2.536 | 2.536 | 2.459 |

It is not surprising that the trapdoor and chemical channels have infinite upper bounds. This is also true for the Z channel in the DMC case, and it is related to the fact that the transition kernel has a zero entry. Intuitively, discrimination between the two hypotheses can be perfect by always transmitting $X_t \neq S_t$ under hypothesis 0 and $X_t = S_t$ under hypothesis 1: with high probability, which does not depend on the message size or the target error rate, the receiver under the 0 hypothesis will observe an output that is impossible under hypothesis 1 and thus will make a perfect decision. For each MDP, the rewards do not seem to depend on the initial state, within the accuracy of our calculations. Similarly, the results comparing the first and second MDPs are within the accuracy of our calculations, and so we cannot make a conclusive statement regarding the difference between the two MDP solutions. There is a strong indication, however, that they both result in the same average reward asymptotically. Also shown in the above table is the quantity $C_1'$, which is the average reward received in the MDP with the instantaneous reward
$$R'(s^0, b^1, x^0, x^1) = \sum_{\bar{x}, \bar{s}} x^1(\bar{x} \mid \bar{s})\, b^1(\bar{s}) \sum_y Q(y \mid \bar{x}, \bar{s}) \log \frac{Q(y \mid \bar{x}, \bar{s})}{Q(y \mid x^0, s^0)},$$
which is of interest in the design of transmission schemes in [14].

VI. CONCLUSIONS

In this paper, we derive an upper bound on the error exponent of unifilar channels with noiseless feedback and variable-length codes. We generalize Burnashev's techniques by performing a multi-step drift analysis and deriving a lower bound on the stopping time together with a proposed submartingale. The constant $C_1$, which is the zero-rate exponent, is evaluated through an MDP and further upper bounded through a more computationally tractable MDP. Numerical results show that for some unifilar channels the two MDPs give different results. A future research direction is the analytical solution of these MDPs. In addition, the presented analysis can be easily generalized to channels with finite state and inter-symbol interference (ISI) with the state known only to the receiver.

APPENDIX A
PROOF OF LEMMA 1

Given any $y^t \in \mathcal{Y}^t$, $v^t \in \mathcal{V}^t$ and $s_1 \in \mathcal{S}$,
$$E[H_t - H_{t+1} \mid Y^t = y^t, V^t = v^t, S_1 = s_1] = I(W; Y_{t+1}, V_{t+1} \mid Y^t = y^t, V^t = v^t, S_1 = s_1)$$
$$= H(Y_{t+1} \mid V_{t+1}, Y^t = y^t, V^t = v^t, S_1 = s_1) + H(V_{t+1} \mid Y^t = y^t, V^t = v^t, S_1 = s_1) - H(Y_{t+1} \mid V_{t+1}, Y^t = y^t, V^t = v^t, S_1 = s_1, W) - H(V_{t+1} \mid Y^t = y^t, V^t = v^t, S_1 = s_1, W)$$
$$\stackrel{(a)}{=} H(Y_{t+1} \mid V_{t+1}, Y^t = y^t, V^t = v^t, S_1 = s_1) - H(Y_{t+1} \mid V_{t+1}, Y^t = y^t, V^t = v^t, S_1 = s_1, W)$$
$$\stackrel{(b)}{\leq} H(Y_{t+1} \mid Y^t = y^t, S_1 = s_1) - H(Y_{t+1} \mid V_{t+1}, Y^t = y^t, V^t = v^t, S_1 = s_1, W, S_{t+1}, X_{t+1})$$
$$\stackrel{(c)}{=} H(Y_{t+1} \mid Y^t = y^t, S_1 = s_1) - H(Y_{t+1} \mid Y^t = y^t, S_1 = s_1, S_{t+1}, X_{t+1}) = I(X_{t+1}, S_{t+1}; Y_{t+1} \mid Y^t = y^t, S_1 = s_1), \qquad (34)$$
where $(a)$ is due to the way the common random variables are selected (so that $H(V_{t+1} \mid \cdot) = H(V_{t+1} \mid \cdot, W) = H(V_{t+1})$), $(b)$ is due to the fact that conditioning reduces entropy, and $(c)$ is due to the channel properties. Note that the last term is the mutual information between $(X_{t+1}, S_{t+1})$ and $Y_{t+1}$ conditioned on the event $\{Y^t = y^t, S_1 = s_1\}$, which is different from the conditional mutual information $I(X_{t+1}, S_{t+1}; Y_{t+1} \mid Y^t, S_1)$. The $N$-step drift then

becomes
$$E[H_t - H_{t+N} \mid Y^t = y^t, V^t = v^t, S_1 = s_1] = \sum_{k=t}^{t+N-1} E\big[E[H_k - H_{k+1} \mid Y^k, V^k, S_1 = s_1] \mid Y^t = y^t, V^t = v^t, S_1 = s_1\big]$$
$$= \sum_{k=t}^{t+N-1} \sum_{y_{t+1}^k, v_{t+1}^k} P(Y_{t+1}^k = y_{t+1}^k, V_{t+1}^k = v_{t+1}^k \mid Y^t = y^t, V^t = v^t, S_1 = s_1)\, E[H_k - H_{k+1} \mid Y^k = y^k, V^k = v^k, S_1 = s_1]$$
$$\stackrel{(a)}{\leq} \sum_{k=t}^{t+N-1} \sum_{y_{t+1}^k, v_{t+1}^k} P(Y_{t+1}^k = y_{t+1}^k, V_{t+1}^k = v_{t+1}^k \mid Y^t = y^t, V^t = v^t, S_1 = s_1)\, I(X_{k+1}, S_{k+1}; Y_{k+1} \mid Y^k = y^k, S_1 = s_1)$$
$$= \sum_{k=t}^{t+N-1} \sum_{y_{t+1}^k} P(Y_{t+1}^k = y_{t+1}^k \mid Y^t = y^t, V^t = v^t, S_1 = s_1)\, I(X_{k+1}, S_{k+1}; Y_{k+1} \mid Y^k = y^k, S_1 = s_1) = \sum_{k=t}^{t+N-1} I(X_{k+1}, S_{k+1}; Y_{k+1} \mid Y_{t+1}^k, Y^t = y^t, S_1 = s_1) \stackrel{(b)}{\leq} N(C + \epsilon), \qquad (35)$$
where $(a)$ is due to (34) and $(b)$ is due to (2).

APPENDIX B
PROOF OF LEMMA 2

Given any $y^t \in \mathcal{Y}^t$, $v^t \in \mathcal{V}^t$ and $s_1 \in \mathcal{S}$,
$$E[\log H_{t+N} - \log H_t \mid Y^t = y^t, V^t = v^t, S_1 = s_1] = E\Big[\log \frac{-\sum_i P(W = i \mid Y_{t+1}^{t+N}, V_{t+1}^{t+N}, y^t, v^t, s_1) \log P(W = i \mid Y_{t+1}^{t+N}, V_{t+1}^{t+N}, y^t, v^t, s_1)}{-\sum_i P(W = i \mid y^t, v^t, s_1) \log P(W = i \mid y^t, v^t, s_1)} \,\Big|\, Y^t = y^t, V^t = v^t, S_1 = s_1\Big]. \qquad (36)$$
For convenience, we define the following quantities:
$$f_i = P(W = i \mid y^t, v^t, s_1), \qquad (37a)$$
$$f_i(Y_{t+1}^{t+N}, V_{t+1}^{t+N}) = P(W = i \mid Y_{t+1}^{t+N}, V_{t+1}^{t+N}, y^t, v^t, s_1), \qquad (37b)$$
$$\hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid i) = P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid W = i, y^t, v^t, s_1). \qquad (37c)$$
Since $H_t < \epsilon$, there exists a $k$ such that $f_k > 1 - \epsilon$ while $f_j < \epsilon$ for $j \neq k$. We further define $\hat{f}_j = f_j / (1 - f_k)$ for $j \neq k$, and note that $H_t = -(1 - f_k)(\log(1 - f_k) + o(\log(1 - f_k)))$. The following approximations are valid for $f_k$ close to 1:
$$-f_k(Y_{t+1}^{t+N}, V_{t+1}^{t+N}) \log f_k(Y_{t+1}^{t+N}, V_{t+1}^{t+N}) = (1 - f_k) \sum_{j \neq k} \hat{f}_j\, \frac{\hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid j)}{\hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid k)} + o(1 - f_k), \qquad (38a)$$
$$-\sum_{j \neq k} f_j(Y_{t+1}^{t+N}, V_{t+1}^{t+N}) \log f_j(Y_{t+1}^{t+N}, V_{t+1}^{t+N}) = -(1 - f_k)\big(\log(1 - f_k) + o(\log(1 - f_k))\big) \sum_{j \neq k} \hat{f}_j\, \frac{\hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid j)}{\hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid k)}, \qquad (38b)$$
$$P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid y^t, v^t, s_1) = \hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid k) + o(1). \qquad (38c)$$
Substituting these approximate expressions back into the drift expression, we have
$$E[\log H_{t+N} - \log H_t \mid Y^t = y^t, V^t = v^t, S_1 = s_1] = \sum_{Y_{t+1}^{t+N}, V_{t+1}^{t+N}} P(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid y^t, v^t, s_1) \log \frac{-\sum_i f_i(Y_{t+1}^{t+N}, V_{t+1}^{t+N}) \log f_i(Y_{t+1}^{t+N}, V_{t+1}^{t+N})}{-\sum_i f_i \log f_i}$$
$$= \sum_{Y_{t+1}^{t+N}, V_{t+1}^{t+N}} \hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid k) \log \Big(\sum_{j \neq k} \hat{f}_j\, \frac{\hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid j)}{\hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid k)}\Big) + o(1)$$
$$= -\sum_{Y_{t+1}^{t+N}, V_{t+1}^{t+N}} \hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid k) \log \frac{\hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid k)}{\sum_{j \neq k} \hat{f}_j\, \hat{Q}(Y_{t+1}^{t+N}, V_{t+1}^{t+N} \mid j)} + o(1) \geq -N(C_1 + \epsilon), \qquad (39)$$
where the last inequality is due to the definition of $C_1$.

APPENDIX C
PROOF OF LEMMA 3

We can always choose a sufficiently small positive $\lambda$ such that
$$\frac{e^y - 1}{K_1} < \frac{y}{K_2} + f(y), \quad -K_3 < y < 0, \qquad (40a)$$
$$\frac{e^y - 1}{K_1} > \frac{y}{K_2} + f(y), \quad 0 < y < K_3, \qquad (40b)$$
$$\frac{1}{K_2} + f'(y) > 0, \quad -K_3 < y < 0, \qquad (40c)$$
$$\frac{\lambda e^{2\lambda K_3} K_3^2}{2 K_2} < 1. \qquad (40d)$$
We first consider the case $H_t > 1$. Then
$$Z_{t+1} = \Big(\frac{H_{t+1} - 1}{K_1} + t + 1\Big)\mathbb{1}_{\{H_{t+1} > 1\}} + \Big(\frac{\log H_{t+1}}{K_2} + t + 1 + f(\log H_{t+1})\Big)\mathbb{1}_{\{H_{t+1} \leq 1\}} \stackrel{(a)}{\geq} \frac{H_{t+1} - 1}{K_1} + t + 1, \qquad (41)$$
where $(a)$ is due to (40a). Therefore we have
$$E[Z_{t+1} - Z_t \mid \mathcal{F}_t] \geq E\Big[\frac{H_{t+1} - H_t}{K_1} \,\Big|\, \mathcal{F}_t\Big] + 1 \geq 0, \qquad (42)$$
where the last inequality is due to (8a). Similarly, for the case $H_t \leq 1$, from (40b) we have
$$Z_{t+1} \geq \frac{\log H_{t+1}}{K_2} + t + 1 + f(\log H_{t+1}), \qquad (43)$$
and therefore
$$E[Z_{t+1} - Z_t \mid \mathcal{F}_t] \geq E\Big[\frac{\log H_{t+1} - \log H_t}{K_2} + 1 + f(\log H_{t+1}) - f(\log H_t) \,\Big|\, \mathcal{F}_t\Big]$$
$$\stackrel{(a)}{=} 1 + E\Big[\Big(\frac{1}{K_2} + f'(\log H_t)\Big)(\log H_{t+1} - \log H_t) + \frac{f''(\Xi)}{2}(\log H_{t+1} - \log H_t)^2 \,\Big|\, \mathcal{F}_t\Big]$$
$$\stackrel{(b)}{\geq} E\Big[e^{\lambda \log H_t} + \frac{f''(\Xi)}{2}(\log H_{t+1} - \log H_t)^2 \,\Big|\, \mathcal{F}_t\Big] \stackrel{(c)}{\geq} E\Big[e^{\lambda \log H_t} - \frac{\lambda e^{\lambda K_3}}{2 K_2}\, e^{\lambda \log H_{t+1}} (\log H_{t+1} - \log H_t)^2 \,\Big|\, \mathcal{F}_t\Big]$$
$$\stackrel{(d)}{\geq} \Big(e^{-\lambda K_3} - \frac{\lambda e^{\lambda K_3} K_3^2}{2 K_2}\Big) E\big[e^{\lambda \log H_{t+1}} \mid \mathcal{F}_t\big] \stackrel{(e)}{\geq} 0, \qquad (44)$$
where $(a)$ is from the second-order Taylor expansion of $f$ at $\log H_t$, $(b)$ is due to (8b) and (40c) together with $-K_2 f'(\log H_t) = e^{\lambda \log H_t}$, $(c)$ is due to the fact that $\Xi$ lies between $\log H_t$ and $\log H_{t+1}$, so that by (8c) $\Xi \leq \log H_{t+1} + K_3$ and $f''(\Xi) \geq -(\lambda / K_2) e^{\lambda(K_3 + \log H_{t+1})}$, $(d)$ is due to (8c), and $(e)$ is due to (40d). From (42) and (44), we have $E[Z_{t+1} - Z_t \mid \mathcal{F}_t] \geq 0$, and thus $\{Z_t\}_{t \geq 0}$ is a submartingale.

APPENDIX D
PROOF OF PROPOSITION 1

The proof essentially applies Lemma 3 to the block submartingale. Given any $\epsilon > 0$, there exists an $N = N(\epsilon)$ such that, by Lemma 1 and Lemma 2,
$$E[H_{N(t+1)} - H_{Nt} \mid \mathcal{F}_{Nt}] \geq -N(C + \epsilon), \qquad (45a)$$
$$E[\log H_{N(t+1)} - \log H_{Nt} \mid \mathcal{F}_{Nt}] \geq -N(C_1 + \epsilon) \quad \text{if } H_{Nt} < \epsilon, \qquad (45b)$$
$$|\log H_{N(t+1)} - \log H_{Nt}| < N C_2 \quad \text{if } H_{Nt} < \epsilon. \qquad (45c)$$

Define $M_t' = Z_{Nt}$, where $Z_t$ is defined in (9) (with the entropy threshold 1 replaced by $\epsilon$ and with constants $K_1 = N(C + \epsilon)$, $K_2 = N(C_1 + \epsilon)$, $K_3 = N C_2$), and the filtration $\mathcal{F}_t' = \sigma(Y^{Nt}, V^{Nt}, S_1)$. Then $\{M_t'\}_{t \geq 0}$ is a submartingale w.r.t. $\{\mathcal{F}_t'\}_{t \geq 0}$ by Lemma 3. Notice that the index $t$ here indicates the order of the block of $N$ consecutive transmissions. Furthermore, define the stopping time $\hat{T}$ w.r.t. $\{\mathcal{F}_t'\}_{t \geq 0}$ by $\hat{T} = \min\{k : T \leq Nk\}$. By the definition of $\hat{T}$, we have
$$(\hat{T} - 1)N \leq T \quad \text{a.s.} \qquad (46)$$
Now we essentially apply the optional sampling theorem to the submartingale $\{M_t'\}_{t \geq 0}$ as follows:
$$\frac{K - \epsilon}{N(C + \epsilon)} = M_0' \leq E[M_{\hat{T}}'] = E\Big[\Big(\frac{\log H_{N\hat{T}} - \log \epsilon}{N(C_1 + \epsilon)} + \hat{T} + f(\log H_{N\hat{T}} - \log \epsilon)\Big)\mathbb{1}_{\{H_{N\hat{T}} \leq \epsilon\}}\Big] + E\Big[\Big(\frac{H_{N\hat{T}} - \epsilon}{N(C + \epsilon)} + \hat{T}\Big)\mathbb{1}_{\{H_{N\hat{T}} > \epsilon\}}\Big]$$
$$\leq E\Big[\frac{\log H_{N\hat{T}} - \log \epsilon}{N(C_1 + \epsilon)} + f(\log H_{N\hat{T}} - \log \epsilon)\Big] + E\Big[\frac{H_{N\hat{T}} + \epsilon}{N(C + \epsilon)}\Big] + E[\hat{T}]$$
$$\stackrel{(a)}{=} E\Big[\frac{\log H_T - \log \epsilon}{N(C_1 + \epsilon)} + f(\log H_T - \log \epsilon)\Big] + E\Big[\frac{H_T + \epsilon}{N(C + \epsilon)}\Big] + E[\hat{T}]$$
$$\stackrel{(b)}{\leq} \frac{\log E[H_T] - \log \epsilon}{N(C_1 + \epsilon)} + E\big[f(\log H_T - \log \epsilon)\big] + \frac{E[H_T] + \epsilon}{N(C + \epsilon)} + E[\hat{T}]$$
$$\stackrel{(c)}{\leq} \frac{\log E[H_T] - \log \epsilon}{N(C_1 + \epsilon)} + E\big[f(\log H_T - \log \epsilon)\big] + \frac{E[H_T] + \epsilon}{N(C + \epsilon)} + \frac{E[T]}{N} + 1$$
$$\stackrel{(d)}{\leq} \frac{\log E[H_T] - \log \epsilon}{N(C_1 + \epsilon)} + \frac{1}{\lambda N(C_1 + \epsilon)} + \frac{E[H_T] + \epsilon}{N(C + \epsilon)} + \frac{E[T]}{N} + 1$$
$$\stackrel{(e)}{\leq} \frac{\log P_e + \log(K - \log P_e) - \log \epsilon}{N(C_1 + \epsilon)} + \frac{-P_e \log P_e - (1 - P_e)\log(1 - P_e) + P_e K + \epsilon}{N(C + \epsilon)} + \frac{E[T]}{N} + 1 + \frac{1}{\lambda N(C_1 + \epsilon)}, \qquad (47)$$
where $(a)$ is due to the fact that the receiver no longer performs actions after time $T$ (so that $H_{N\hat{T}} = H_T$), $(b)$ is due to the concavity of $\log(\cdot)$, $(c)$ is due to (46), $(d)$ is due to the fact that $f$ is upper bounded by $1/(\lambda N(C_1 + \epsilon))$, and $(e)$ is due to the generalized Fano's lemma. Multiplying both sides of the above inequality by $N$ and rearranging, we get
$$\frac{-\log P_e}{E[T]} \leq C_1\Big(1 - \frac{R}{C}\Big) + U(\epsilon, K, R, C, C_1, C_2), \qquad (48)$$
where $U(\epsilon, K, R, C, C_1, C_2)$ collects the remaining lower-order terms and satisfies (11), which proves the result.

REFERENCES

[1] J. P. M. Schalkwijk and T. Kailath, "A coding scheme for additive noise channels with feedback I: No bandwidth constraint," IEEE Trans. Inform. Theory, vol. 12, no. 2, pp. 172-182, Apr. 1966.
[2] R. L. Dobrushin, "An asymptotic bound for the probability error of information transmission through a channel without memory using the feedback," Problemy Peredachi Informatsii, vol. 8, pp. 161-168, 1962.
[3] E. A. Haroutunian, "Lower bound for error probability in channels with feedback," Problemy Peredachi Informatsii, vol. 13, pp. 36-44, 1977.
[4] M. V. Burnashev, "Data transmission over a discrete channel with feedback. Random transmission time," Problemy Peredachi Informatsii, vol. 12, no. 4, pp. 10-30, Oct.-Dec. 1976.
[5] P. Berlin, B. Nakiboglu, B. Rimoldi, and E. Telatar, "A simple converse of Burnashev's reliability function," IEEE Trans. Inform. Theory, vol. 55, no. 7, pp. 3074-3080, July 2009.
[6] M. Horstein, "Sequential transmission using noiseless feedback," IEEE Trans. Inform. Theory, vol. 9, no. 3, pp. 136-143, July 1963.
[7] H. Yamamoto and K. Itoh, "Asymptotic performance of a modified Schalkwijk-Barron scheme for channels with noiseless feedback," IEEE Trans. Inform. Theory, vol. 25, no. 6, pp. 729-733, Nov. 1979.
[8] O. Shayevitz and M. Feder, "Optimal feedback communication via posterior matching," IEEE Trans. Inform. Theory, vol. 57, no. 3, pp. 1186-1222, Mar. 2011.
[9] S. Tatikonda and S. Mitter, "The capacity of channels with feedback," IEEE Trans. Inform. Theory, vol. 55, no. 1, pp. 323-349, Jan. 2009.
[10] H. Permuter, P. Cuff, B. Van Roy, and T. Weissman, "Capacity of the trapdoor channel with feedback," IEEE Trans. Inform. Theory, vol. 54, no. 7, pp. 3150-3165, July 2008.
[11] J. H. Bae and A. Anastasopoulos, "A posterior matching scheme for finite-state channels with feedback," in Proc. International Symposium on Information Theory, Austin, TX, June 2010, pp. 2338-2342.
[12] G. Como, S. Yuksel, and S.
Tatikonda, "The error exponent of variable-length codes over Markov channels with feedback," IEEE Trans. Inform. Theory, vol. 55, no. 5, pp. 2139-2160, May 2009.
[13] A. Anastasopoulos, "A sequential transmission scheme for unifilar finite-state channels with feedback based on posterior matching," in Proc. International Symposium on Information Theory, July 2012, pp. 2914-2918.
[14] A. Anastasopoulos and J. Wu, "Variable-length codes for channels with memory and feedback: error-exponent lower bounds," in Proc. International Symposium on Information Theory, Aachen, Germany, Jan. 2017 (submitted; extended version available online at https://arxiv.org/a/0000-0003-3795-654 and at http://web.eecs.umich.edu/~anastas/preprints.html).
[15] M. V. Burnashev and K. Sh. Zigangirov, "On one problem of observation control," Problemy Peredachi Informatsii, vol. 11, no. 3, pp. 44-52, 1975.