On Concentration of Martingales and Applications in Information Theory, Communication & Coding


On Concentration of Martingales and Applications in Information Theory, Communication & Coding

Igal Sason
Department of Electrical Engineering
Technion - Israel Institute of Technology, Haifa 32000, Israel

Seminar talk, ETH Zurich, Switzerland, August 22-23, 2012.

Concentration of Measures

Concentration of measures is a central issue in probability theory, and it is strongly related to information theory and coding. Roughly speaking, the concentration of measure phenomenon can be stated in the following simple way: "A random variable that depends in a smooth way on many independent random variables (but not too much on any of them) is essentially constant" (Talagrand, 1996).

Concentration Inequalities

Concentration inequalities provide upper bounds on tail probabilities of the type $P(|X - \bar{x}| \geq t)$ (or $P(X - \bar{x} \geq t)$) for a random variable (RV) $X$, where $\bar{x}$ denotes the expectation or median of $X$. Several techniques were developed to prove concentration.

Survey Papers on Concentration Inequalities for Martingales

1. C. McDiarmid, "Concentration," Probabilistic Methods for Algorithmic Discrete Mathematics, pp. 195-248, Springer, 1998.
2. F. Chung and L. Lu, "Concentration inequalities and martingale inequalities: a survey," Internet Mathematics, vol. 3, no. 1, pp. 79-127, March 2006. Available at http://www.ucsd.edu/~fan/wp/concen.pdf.
3. I. S., "On Refined Versions of the Azuma-Hoeffding Inequality with Applications in Information Theory, Communications and Coding," a tutorial paper with some original results, July 2011. Last updated in July 2012. See http://arxiv.org/abs/1111.1977.

Concentration via the Martingale Approach

Definition [Discrete-Time Martingales]
Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots$ be a sequence of sub $\sigma$-algebras of $\mathcal{F}$ (which is called a filtration). A sequence $X_0, X_1, \ldots$ of RVs is a martingale if, for every $i$:
1. $X_i : \Omega \to \mathbb{R}$ is $\mathcal{F}_i$-measurable, i.e., $\{\omega \in \Omega : X_i(\omega) \leq t\} \in \mathcal{F}_i$ for all $i \in \{0, 1, \ldots\}$ and $t \in \mathbb{R}$.
2. $E[|X_i|] < \infty$.
3. $X_i = E[X_{i+1} \mid \mathcal{F}_i]$ almost surely.

A Simple Example: Random Walk
$X_n = \sum_{i=0}^{n} U_i$, where the $U_i$ are i.i.d. RVs with $P(U_i = +1) = P(U_i = -1) = \frac{1}{2}$.

Martingales - Remarks

Remark 1: Given a RV $X \in L^1(\Omega, \mathcal{F}, P)$ and a filtration of sub $\sigma$-algebras $\{\mathcal{F}_i\}$, let $X_i = E[X \mid \mathcal{F}_i]$ for $i = 0, 1, \ldots$. Then the sequence $X_0, X_1, \ldots$ forms a martingale.

Remark 2: Choose $\mathcal{F}_0 = \{\Omega, \emptyset\}$ and $\mathcal{F}_n = \mathcal{F}$; then Remark 1 gives a martingale with
$X_0 = E[X \mid \mathcal{F}_0] = E[X]$ ($\mathcal{F}_0$ doesn't provide information about $X$),
$X_n = E[X \mid \mathcal{F}_n] = X$ a.s. ($\mathcal{F}_n$ provides full information about $X$).

Theorem [Azuma-Hoeffding Inequality]
Let $\{X_k, \mathcal{F}_k\}_{k=0}^{\infty}$ be a discrete-parameter real-valued martingale. If the jumps of the sequence $\{X_k\}$ are bounded almost surely (a.s.), i.e.,
$|X_i - X_{i-1}| \leq d_i$, $i = 1, 2, \ldots, n$ a.s.,
then
$P(|X_n - X_0| \geq r) \leq 2 \exp\left(-\frac{r^2}{2 \sum_{i=1}^{n} d_i^2}\right)$ for every $r > 0$.

Concentration Inequalities Around the Average
The Azuma-Hoeffding inequality and Remark 2 enable one to get a concentration inequality for $X$ around its expected value $E[X]$.
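As a quick sanity check (not part of the original slides; the function names are ours), the Azuma-Hoeffding bound can be compared against the empirical tail of a simple $\pm 1$ random walk, which is a martingale with jumps bounded by $d = 1$:

```python
import math
import random

def azuma_bound(r, d, n):
    # Two-sided Azuma-Hoeffding bound for jumps |X_i - X_{i-1}| <= d:
    # P(|X_n - X_0| >= r) <= 2*exp(-r^2 / (2*n*d^2)).
    return 2.0 * math.exp(-r * r / (2.0 * n * d * d))

def empirical_tail(n, r, trials, seed=0):
    # Empirical tail of a simple +/-1 random walk (a martingale with d = 1).
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(x) >= r:
            hits += 1
    return hits / trials

n, r = 100, 30
bound = azuma_bound(r, d=1, n=n)
freq = empirical_tail(n, r, trials=20000)
print(f"Azuma-Hoeffding bound: {bound:.4f}, empirical tail: {freq:.4f}")
```

The empirical tail stays well below the bound, which already hints at the looseness discussed on the next slides.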

Refined Versions of the Azuma-Hoeffding Inequality

But the Azuma-Hoeffding inequality is not tight! For example, if $r > \sum_{i=1}^{n} d_i$, then $P(|X_n - X_0| \geq r) = 0$, whereas the bound is strictly positive.

Theorem (Th. 2, McDiarmid '89)
Let $\{X_k, \mathcal{F}_k\}_{k=0}^{\infty}$ be a discrete-parameter real-valued martingale. Assume that, for some constants $d, \sigma > 0$, the following two requirements hold a.s.:
$|X_k - X_{k-1}| \leq d$,
$\mathrm{Var}(X_k \mid \mathcal{F}_{k-1}) = E[(X_k - X_{k-1})^2 \mid \mathcal{F}_{k-1}] \leq \sigma^2$
for every $k \in \{1, \ldots, n\}$. Then, for every $\alpha \geq 0$,
$P(|X_n - X_0| \geq \alpha n) \leq 2 \exp\left(-n\, D\!\left(\frac{\delta + \gamma}{1 + \gamma} \,\Big\|\, \frac{\gamma}{1 + \gamma}\right)\right)$
where $\gamma \triangleq \frac{\sigma^2}{d^2}$, $\delta \triangleq \frac{\alpha}{d}$, and $D(p \| q) \triangleq p \ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$.

Corollary
Under the conditions of Theorem 2, for every $\alpha \geq 0$,
$P(|X_n - X_0| \geq \alpha n) \leq 2 \exp(-n f(\delta))$
where
$f(\delta) = \ln(2)\left[1 - h_2\left(\frac{1-\delta}{2}\right)\right]$ for $0 \leq \delta \leq 1$, and $f(\delta) = +\infty$ for $\delta > 1$,
and $h_2(x) \triangleq -x \log_2(x) - (1-x)\log_2(1-x)$ for $0 \leq x \leq 1$ is the binary entropy function.

Proof: Set $\gamma = 1$ in Theorem 2.

Observation (first set $\gamma = 1$, and then use Pinsker's inequality):
Theorem 2 $\Rightarrow$ Corollary $\Rightarrow$ Azuma-Hoeffding inequality, i.e., each bound is at least as tight as the next.
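The chain of implications can be checked numerically (a small sketch of ours, not from the talk): by Pinsker's inequality, the corollary's exponent $f(\delta)$ dominates the Azuma exponent $\delta^2/2$ at every $\delta$.

```python
import math

def h2(x):
    # Binary entropy function (base 2), with the convention 0*log(0) = 0.
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def f(delta):
    # Exponent from the corollary (the gamma = 1 case of Theorem 2):
    # f(delta) = ln(2) * [1 - h2((1 - delta)/2)] for 0 <= delta <= 1.
    if delta > 1:
        return math.inf
    return math.log(2) * (1.0 - h2((1.0 - delta) / 2.0))

# Pinsker's inequality implies f(delta) >= delta^2 / 2 (the Azuma exponent),
# so the corollary is never weaker than the Azuma-Hoeffding inequality.
for delta in [0.1, 0.3, 0.5, 0.7, 0.9]:
    azuma = delta ** 2 / 2.0
    print(f"delta={delta:.1f}  f(delta)={f(delta):.4f}  Azuma exponent={azuma:.4f}")
```

The gap between the two exponents grows with $\delta$, consistent with the plot on the next slide.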

[Figure: lower bounds on the exponents versus $\delta = \alpha/d$, comparing Theorem 2 for $\gamma = 1/8, 1/4, 1/2$, the corollary's exponent $f(\delta)$, and the exponent of the Azuma inequality, $\delta^2/2$.]

A Simple Example to Motivate Further Improvements

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $d > 0$ and $\gamma \in (0, 1]$ be fixed numbers. Let $\{U_k\}_{k \in \mathbb{N}}$ be i.i.d. random variables with
$P(U_k = +d) = P(U_k = -d) = \frac{\gamma}{2}$, $P(U_k = 0) = 1 - \gamma$, $k \in \mathbb{N}$.
Consider the Markov process $\{X_k\}_{k \geq 0}$ where $X_0 = 0$ and $X_k = X_{k-1} + U_k$ for $k \in \mathbb{N}$.
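This "lazy" random walk is easy to simulate; a quick sketch (function names ours) confirms the two moments that drive all the bounds below, $E[X_n] = 0$ and $\mathrm{Var}(X_n) = n \gamma d^2$:

```python
import random

def simulate_lazy_walk(n, gamma, d, rng):
    # One path of X_n = U_1 + ... + U_n with
    # P(U = +d) = P(U = -d) = gamma/2 and P(U = 0) = 1 - gamma.
    x = 0.0
    for _ in range(n):
        u = rng.random()
        if u < gamma / 2:
            x += d
        elif u < gamma:
            x -= d
    return x

rng = random.Random(1)
n, gamma, d, trials = 200, 0.25, 1.0, 20000
samples = [simulate_lazy_walk(n, gamma, d, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(f"empirical mean {mean:.3f} (theory 0), variance {var:.1f} (theory {n * gamma * d * d:.1f})")
```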

A Simple Example (Cont.) - Large Deviation Analysis

From Cramér's theorem in $\mathbb{R}$, for every $\alpha \geq E[U_1] = 0$,
$\lim_{n \to \infty} \frac{1}{n} \ln P(X_n \geq \alpha n) = -I(\alpha)$
where the rate function is given by
$I(\alpha) = \sup_{t \geq 0} \{t\alpha - \ln E[\exp(tU_1)]\}$.
Let $\delta \triangleq \frac{\alpha}{d}$. Calculation shows that
$I(\alpha) = \delta x - \ln\left(1 + \gamma\left[\cosh(x) - 1\right]\right)$
where
$x \triangleq \ln\left(\frac{\delta(1-\gamma) + \sqrt{\delta^2 (1-\gamma)^2 + \gamma^2 (1-\delta^2)}}{\gamma(1-\delta)}\right)$.
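The closed form for $I(\alpha)$ can be verified against a brute-force evaluation of the supremum, using $E[\exp(tU_1)] = 1 + \gamma[\cosh(td) - 1]$. A numerical sketch of ours (not from the talk):

```python
import math

def cgf(t, gamma, d):
    # log-MGF of U: ln E[exp(tU)] = ln(1 + gamma*(cosh(t*d) - 1)).
    return math.log(1.0 + gamma * (math.cosh(t * d) - 1.0))

def rate_numeric(alpha, gamma, d, t_max=20.0, steps=200000):
    # Brute-force sup over t >= 0 of {t*alpha - ln E[exp(tU)]} on a grid.
    best = 0.0
    for i in range(steps + 1):
        t = t_max * i / steps
        best = max(best, t * alpha - cgf(t, gamma, d))
    return best

def rate_closed_form(alpha, gamma, d):
    # Closed form: I(alpha) = delta*x - ln(1 + gamma*(cosh(x) - 1)), where
    # x = ln( (delta*(1-gamma) + sqrt(delta^2*(1-gamma)^2 + gamma^2*(1-delta^2)))
    #         / (gamma*(1-delta)) )  and  delta = alpha/d.
    delta = alpha / d
    x = math.log((delta * (1 - gamma)
                  + math.sqrt(delta ** 2 * (1 - gamma) ** 2 + gamma ** 2 * (1 - delta ** 2)))
                 / (gamma * (1 - delta)))
    return delta * x - math.log(1.0 + gamma * (math.cosh(x) - 1.0))

gamma, d, alpha = 0.25, 1.0, 0.4
rn = rate_numeric(alpha, gamma, d)
rc = rate_closed_form(alpha, gamma, d)
print(f"numeric sup: {rn:.6f}, closed form: {rc:.6f}")
```

The two values agree to within the grid resolution, confirming that $x$ is the maximizing point of the Legendre transform.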

A Simple Example (Cont.) - Large Deviation Analysis

Let's examine the martingale approach in this simple case, where
$X_k = \sum_{i=1}^{k} U_i$, $k \in \mathbb{N}$, $X_0 = 0$,
with the natural filtration
$\mathcal{F}_k = \sigma(U_1, \ldots, U_k)$, $k \in \mathbb{N}$, $\mathcal{F}_0 = \{\emptyset, \Omega\}$.
In this case, the martingale has bounded increments, and for every $k \in \mathbb{N}$,
$|X_k - X_{k-1}| \leq d$, $E[(X_k - X_{k-1})^2 \mid \mathcal{F}_{k-1}] = \mathrm{Var}(U_k) = \gamma d^2$
almost surely.

A Simple Example (Cont.) - Large Deviation Analysis

Via the martingale approach, the following exponential inequality follows:
$P(X_n - X_0 \geq \alpha n) \leq \exp\left(-n\, D\!\left(\frac{\delta + \gamma}{1 + \gamma} \,\Big\|\, \frac{\gamma}{1 + \gamma}\right)\right)$
where $\delta \triangleq \frac{\alpha}{d}$, and
$D(p \| q) \triangleq p \ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$, $p, q \in [0, 1]$,
is the divergence (relative entropy) between the probability distributions $(p, 1-p)$ and $(q, 1-q)$. If $\delta > 1$, then the above probability is zero.

This exponential bound is not tight (unless $\gamma = 1$), and the gap to the exact asymptotic exponent increases as the value of $\gamma \in (0, 1]$ is decreased.

Comparison of the Exact Exponent and Its Lower Bound
[Figure: comparison between the exact exponent and its lower bound as a function of $\delta = \alpha/d$, for $\gamma = 9/10$.]

Comparison of the Exact Exponent and Its Lower Bound (Cont.)
[Figure: the same comparison for $\gamma = 1/10$, where the gap between the exact exponent and its lower bound is larger.]

Questions
Why is the lower bound on the exponent not asymptotically tight?
Why does this gap increase as the value of $\gamma$ is decreased?
How can this situation be improved via the martingale approach?

This exponential bound relies on Bennett's inequality: since $U_k \leq d$, $E[U_k] = 0$, and $\mathrm{Var}(U_k) = \gamma d^2$, then
$E[\exp(tU_k)] \leq \frac{\gamma \exp(td) + \exp(-\gamma td)}{1 + \gamma}$, $t \geq 0$.
The probability distribution that achieves Bennett's inequality with equality is asymmetric (unless $\gamma = 1$), and is equal to
$P(\tilde{U}_k = d) = \frac{\gamma}{1 + \gamma}$, $P(\tilde{U}_k = -\gamma d) = \frac{1}{1 + \gamma}$.
By reducing the value of $\gamma \in (0, 1]$, the above asymmetry grows. Indeed, this enlarges the gap to the exact asymptotic exponent.
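The source of the looseness can be made concrete by comparing Bennett's bound with the exact MGFs of the symmetric law $U_k$ and the asymmetric extremal law $\tilde{U}_k$. A small numerical sketch (function names ours):

```python
import math

def mgf_symmetric(t, gamma, d):
    # Exact MGF of U with P(U=+d)=P(U=-d)=gamma/2, P(U=0)=1-gamma.
    return 1.0 + gamma * (math.cosh(t * d) - 1.0)

def bennett_bound(t, gamma, d):
    # Bennett's bound on E[exp(tU)] for any U <= d with E[U]=0, Var(U)=gamma*d^2.
    return (gamma * math.exp(t * d) + math.exp(-gamma * t * d)) / (1.0 + gamma)

def mgf_extremal(t, gamma, d):
    # Exact MGF of the asymmetric extremal law:
    # P(U = d) = gamma/(1+gamma), P(U = -gamma*d) = 1/(1+gamma).
    return (gamma * math.exp(t * d) + math.exp(-gamma * t * d)) / (1.0 + gamma)

gamma, d = 0.2, 1.0
for t in [0.1, 0.5, 1.0, 2.0]:
    # Bennett is loose for the symmetric law, tight for the asymmetric one.
    assert mgf_symmetric(t, gamma, d) <= bennett_bound(t, gamma, d) + 1e-12
    assert abs(mgf_extremal(t, gamma, d) - bennett_bound(t, gamma, d)) < 1e-12
    print(f"t={t}: symmetric MGF {mgf_symmetric(t, gamma, d):.5f} "
          f"<= Bennett bound {bennett_bound(t, gamma, d):.5f}")
```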

Conditionally Symmetric Martingales

Interim Conclusion: An improvement is likely to be obtained by a refinement of Bennett's inequality when $X_k$ is conditionally symmetric given $\mathcal{F}_{k-1}$.

Definition: Conditionally Symmetric Martingales
Let $\{X_k, \mathcal{F}_k\}_{k \in \mathbb{N}_0}$, where $\mathbb{N}_0 \triangleq \mathbb{N} \cup \{0\}$, be a discrete-time real-valued martingale, and let $\xi_k \triangleq X_k - X_{k-1}$ for every $k \in \mathbb{N}$. Then $\{X_k, \mathcal{F}_k\}_{k \in \mathbb{N}_0}$ is a conditionally symmetric martingale if, conditioned on $\mathcal{F}_{k-1}$, the RV $\xi_k$ is symmetrically distributed around zero.

Definition: Conditionally Symmetric Sub/Supermartingales
Let $\{X_k, \mathcal{F}_k\}_{k \in \mathbb{N}_0}$ be a discrete-time real-valued sub- or supermartingale, and let $\eta_k \triangleq X_k - E[X_k \mid \mathcal{F}_{k-1}]$ for every $k \in \mathbb{N}$. Then it is conditionally symmetric if, conditioned on $\mathcal{F}_{k-1}$, the RV $\eta_k$ is symmetrically distributed around zero.

Construction of Conditionally Symmetric Martingales

Example 1: Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $\{U_k\}_{k \in \mathbb{N}} \subseteq L^1(\Omega, \mathcal{F}, P)$ be independent random variables that are symmetrically distributed around zero, and let $\{\mathcal{F}_k\}_{k \geq 0}$ be the natural filtration, where $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $\mathcal{F}_k = \sigma(U_1, \ldots, U_k)$ for $k \in \mathbb{N}$. For $k \in \mathbb{N}$, let $A_k \in L^{\infty}(\Omega, \mathcal{F}_{k-1}, P)$ be an $\mathcal{F}_{k-1}$-measurable random variable with a finite essential supremum. Define a new sequence of random variables in $L^1(\Omega, \mathcal{F}, P)$ by
$X_n = \sum_{k=1}^{n} A_k U_k$, $n \in \mathbb{N}$,
and $X_0 = 0$. Then $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is a conditionally symmetric martingale.

Construction of Conditionally Symmetric Martingales (Cont.)

Example 2: Let $\{X_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ be a conditionally symmetric martingale, and let $A_k \in L^{\infty}(\Omega, \mathcal{F}_{k-1}, P)$ be an $\mathcal{F}_{k-1}$-measurable random variable with a finite essential supremum. Define $Y_0 = 0$ and $Y_n = \sum_{k=1}^{n} A_k (X_k - X_{k-1})$, $n \in \mathbb{N}$. Then $\{Y_n, \mathcal{F}_n\}_{n \in \mathbb{N}_0}$ is a conditionally symmetric martingale.

Example 3: A sampled Brownian motion is a discrete-time, conditionally symmetric martingale with unbounded increments.

Goal: Our next goal is to demonstrate how the assumption of conditional symmetry improves existing exponential inequalities for discrete-time real-valued martingales with bounded increments.
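The construction in Example 1 is easy to instantiate in code. The sketch below (our own illustration, with an arbitrarily chosen predictable coefficient $A_k$) builds $X_n = \sum_k A_k U_k$ with $U_k$ uniform on $\{-1, +1\}$ and $A_k$ a bounded function of the past; by conditional symmetry, the law of $X_n$ is symmetric around $X_0 = 0$:

```python
import random

def conditionally_symmetric_path(n, rng):
    # X_n = sum_k A_k * U_k, where the U_k are i.i.d. symmetric (uniform on
    # {-1, +1}) and A_k is predictable: it depends only on U_1, ..., U_{k-1}
    # (here, through a bounded function of the running sum).
    x = 0.0
    for _ in range(n):
        a = 1.0 + min(abs(x), 2.0)   # F_{k-1}-measurable, bounded coefficient
        u = rng.choice((-1.0, 1.0))  # symmetric around zero, independent of past
        x += a * u
    return x

rng = random.Random(7)
trials = 40000
samples = [conditionally_symmetric_path(50, rng) for _ in range(trials)]
mean = sum(samples) / trials
# The symmetry of the law of X_50 around zero forces E[X_50] = 0.
print(f"empirical mean of X_50: {mean:.3f} (theory: 0)")
```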

Exponential Inequalities for Conditionally Symmetric Martingales

Lemma
Let $X$ be a real-valued RV with a symmetric distribution around zero, support in $[-d, d]$, and assume $\mathrm{Var}(X) \leq \gamma d^2$ for some $d > 0$ and $\gamma \in [0, 1]$. Let $h$ be a real-valued convex function, and assume that $h(d^2) \geq h(0)$. Then
$E[h(X^2)] \leq (1 - \gamma)\, h(0) + \gamma\, h(d^2)$
with equality if $P(X = \pm d) = \frac{\gamma}{2}$, $P(X = 0) = 1 - \gamma$.

Proof: Since $h$ is convex and $X \in [-d, d]$ a.s.,
$h(X^2) \leq h(0) + \frac{X^2}{d^2}\left(h(d^2) - h(0)\right)$.
Taking expectations on both sides gives the inequality, with equality for the above symmetric distribution.

Exp. Inequalities for Conditionally Symmetric Martingales (Cont.)

Corollary
Let $X$ be a real-valued RV with a symmetric distribution around zero, support in $[-d, d]$, and assume $\mathrm{Var}(X) \leq \gamma d^2$ for some $d > 0$ and $\gamma \in [0, 1]$. Then
$E[\exp(\lambda X)] \leq 1 + \gamma\left[\cosh(\lambda d) - 1\right]$, $\lambda \in \mathbb{R}$,
with equality if $P(X = \pm d) = \frac{\gamma}{2}$, $P(X = 0) = 1 - \gamma$.

Proof: By the symmetric distribution of $X$, $E[\exp(\lambda X)] = E[\cosh(\lambda X)]$. The corollary follows from the lemma since, for every $x \in \mathbb{R}$, $\cosh(\lambda x) = h(x^2)$ where
$h(x) \triangleq \sum_{n=0}^{\infty} \frac{\lambda^{2n} x^n}{(2n)!}$
is a convex function, and $h(d^2) = \cosh(\lambda d) \geq 1 = h(0)$.
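The corollary can be exercised on a non-discrete case of our choosing: the uniform distribution on $[-d, d]$ is symmetric with variance $d^2/3$, so it must satisfy the bound with $\gamma = 1/3$; its exact MGF $\sinh(\lambda d)/(\lambda d)$ is standard. A small numerical sketch:

```python
import math

def mgf_bound(lmbda, gamma, d):
    # Right-hand side of the corollary: 1 + gamma*(cosh(lambda*d) - 1).
    return 1.0 + gamma * (math.cosh(lmbda * d) - 1.0)

def mgf_uniform(lmbda, d):
    # Exact MGF of the uniform distribution on [-d, d]: sinh(lambda*d)/(lambda*d).
    x = lmbda * d
    return math.sinh(x) / x if x != 0 else 1.0

d = 1.0
gamma = 1.0 / 3.0  # Var(Uniform[-d, d]) = d^2/3 = gamma * d^2
for lmbda in [-2.0, -0.5, 0.5, 1.0, 2.0]:
    exact, bound = mgf_uniform(lmbda, d), mgf_bound(lmbda, gamma, d)
    assert exact <= bound + 1e-12
    print(f"lambda={lmbda:+.1f}  E[exp(lambda X)]={exact:.5f}  bound={bound:.5f}")
```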

Theorem 2 (I. S., 2012)
Let $\{X_k, \mathcal{F}_k\}_{k \in \mathbb{N}_0}$ be a discrete-time real-valued and conditionally symmetric martingale. Assume that, for some fixed numbers $d, \sigma > 0$, the following two requirements are satisfied a.s.:
$|X_k - X_{k-1}| \leq d$,
$\mathrm{Var}(X_k \mid \mathcal{F}_{k-1}) = E[(X_k - X_{k-1})^2 \mid \mathcal{F}_{k-1}] \leq \sigma^2$
for every $k \in \mathbb{N}$. Then, for every $\alpha \geq 0$ and $n \in \mathbb{N}$,
$P\left(\max_{1 \leq k \leq n} |X_k - X_0| \geq \alpha n\right) \leq 2 \exp\left(-n\, E(\gamma, \delta)\right)$
where $\gamma \triangleq \frac{\sigma^2}{d^2}$, $\delta \triangleq \frac{\alpha}{d}$, and for $\gamma \in (0, 1]$ and $\delta \in [0, 1)$ the exponent $E(\gamma, \delta)$ is given as follows:

Theorem 2 (Cont.)
$E(\gamma, \delta) \triangleq \delta x - \ln\left(1 + \gamma\left[\cosh(x) - 1\right]\right)$, where
$x \triangleq \ln\left(\frac{\delta(1-\gamma) + \sqrt{\delta^2 (1-\gamma)^2 + \gamma^2 (1-\delta^2)}}{\gamma(1-\delta)}\right)$.
If $\delta > 1$, then the probability is zero (so $E(\gamma, \delta) = +\infty$), and $E(\gamma, 1) = \ln\left(\frac{2}{\gamma}\right)$.

Asymptotic Optimality of the Exponent
The example we studied earlier shows that the exponent of the bound in Theorem 2 is asymptotically optimal. That is, there exists a conditionally symmetric martingale, satisfying the conditions in this theorem, that attains the exponent $E(\gamma, \delta)$ in the limit where $n \to \infty$.

Theorem 2 should be compared to the bound in Theorem 1, which does not require the conditional symmetry property.

Theorem 1 (Reminder) [McDiarmid 1989; book of Dembo & Zeitouni, Corollary 2.4.7]
Let $\{X_k, \mathcal{F}_k\}_{k \in \mathbb{N}_0}$ be a discrete-time real-valued martingale with bounded jumps. Assume that the same two conditions on the bounded increments and the conditional variance as in Theorem 2 are satisfied a.s. for every $k \in \mathbb{N}$. Then, for every $\alpha \geq 0$ and $n \in \mathbb{N}$,
$P\left(\max_{1 \leq k \leq n} |X_k - X_0| \geq \alpha n\right) \leq 2 \exp\left(-n\, D\!\left(\frac{\delta + \gamma}{1 + \gamma} \,\Big\|\, \frac{\gamma}{1 + \gamma}\right)\right)$
where $\gamma \triangleq \frac{\sigma^2}{d^2}$ and $\delta \triangleq \frac{\alpha}{d}$ were introduced earlier, and
$D(p \| q) \triangleq p \ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$, $p, q \in [0, 1]$.
If $\delta > 1$, then the probability is zero.
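The two exponents can be compared directly in code (a sketch of ours, not from the talk): conditional symmetry can only increase the exponent, with near-equality as $\gamma \to 1$.

```python
import math

def exponent_sym(gamma, delta):
    # Exponent E(gamma, delta) from Theorem 2 (conditionally symmetric case):
    # E = delta*x - ln(1 + gamma*(cosh(x) - 1)), with
    # x = ln( (delta*(1-gamma) + sqrt(delta^2*(1-gamma)^2 + gamma^2*(1-delta^2)))
    #         / (gamma*(1-delta)) ).
    x = math.log((delta * (1 - gamma)
                  + math.sqrt(delta ** 2 * (1 - gamma) ** 2 + gamma ** 2 * (1 - delta ** 2)))
                 / (gamma * (1 - delta)))
    return delta * x - math.log(1.0 + gamma * (math.cosh(x) - 1.0))

def exponent_bennett(gamma, delta):
    # Exponent from Theorem 1 (no symmetry assumed):
    # D( (delta+gamma)/(1+gamma) || gamma/(1+gamma) ) in nats.
    p = (delta + gamma) / (1.0 + gamma)
    q = gamma / (1.0 + gamma)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

for gamma in [0.1, 0.5, 0.9]:
    for delta in [0.2, 0.5, 0.8]:
        e2, e1 = exponent_sym(gamma, delta), exponent_bennett(gamma, delta)
        assert e2 >= e1 - 1e-12   # conditional symmetry can only help
        print(f"gamma={gamma} delta={delta}  E_sym={e2:.4f}  E_Bennett={e1:.4f}")
```

As expected from the earlier discussion, the improvement is most pronounced for small $\gamma$, where the extremal distribution of Bennett's inequality is most asymmetric.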

Proof Technique for Theorems 1 and 2

Write $X_n - X_0 = \sum_{k=1}^{n} \xi_k$ where $\xi_k \triangleq X_k - X_{k-1}$. Then
$|\xi_k| \leq d$, $E[\xi_k \mid \mathcal{F}_{k-1}] = 0$, $\mathrm{Var}(\xi_k \mid \mathcal{F}_{k-1}) \leq \gamma d^2$,
and the RV $\xi_k$ is $\mathcal{F}_k$-measurable.
Exponentiation and the maximal inequality for submartingales give that, for every $\alpha \geq 0$,
$P\left(\max_{1 \leq k \leq n} (X_k - X_0) \geq n\alpha\right) \leq e^{-n\alpha t}\, E\left[\exp\left(t \sum_{k=1}^{n} \xi_k\right)\right]$, $t \geq 0$.
Since $\{\mathcal{F}_i\}$ forms a filtration, for every $t \geq 0$,
$E\left[\exp\left(t \sum_{k=1}^{n} \xi_k\right)\right] = E\left[\exp\left(t \sum_{k=1}^{n-1} \xi_k\right) E\left[\exp(t\xi_n) \mid \mathcal{F}_{n-1}\right]\right]$.

Exponential Inequalities for Conditionally Symmetric Martingales

Proof Technique for Theorems 1 and 2 (Cont.)
- For proving Theorem 2, the use of Bennett's inequality gives
$$E[\exp(t\xi_k) \mid \mathcal{F}_{k-1}] \le \frac{\gamma \exp(td) + \exp(-\gamma t d)}{1+\gamma}.$$
- For the refinement in Theorem 1 for conditionally symmetric martingales with bounded increments, use of Lemma 1 gives
$$E[\exp(t\xi_k) \mid \mathcal{F}_{k-1}] \le 1 + \gamma\bigl[\cosh(td) - 1\bigr].$$
- In both cases, one gets exponential bounds with the free parameter $t$.
- Optimization over $t \ge 0$ and the union bound give two-sided inequalities.
- The asymptotic optimality of the exponent in Theorem 1 follows from the simple example that was shown earlier.
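The Chernoff optimization over $t$ can be carried out numerically: for the Bennett-type MGF bound above, the optimized per-step exponent coincides with the divergence $D\bigl(\frac{\delta+\gamma}{1+\gamma}\big\|\frac{\gamma}{1+\gamma}\bigr)$ of Theorem 1. The parameter values are illustrative choices; a sketch:

```python
import math

def exponent(t, gamma, delta, d=1.0):
    """Normalized per-step exponent: -delta*d*t + ln((gamma*e^{td} + e^{-gamma*td})/(1+gamma))."""
    return (-delta * d * t
            + math.log((gamma * math.exp(t * d) + math.exp(-gamma * t * d))
                       / (1 + gamma)))

def optimized_exponent(gamma, delta, d=1.0):
    """Grid search over t in [0, 3] for the tightest (most negative) exponent."""
    return min(exponent(k * 1e-4, gamma, delta, d) for k in range(30001))

def kl_div(p, q):
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

gamma, delta = 0.25, 0.5   # illustrative values
best = optimized_exponent(gamma, delta)
target = -kl_div((delta + gamma) / (1 + gamma), gamma / (1 + gamma))
```

The grid minimum and the closed-form divergence exponent agree to within the grid resolution, which is exactly how the optimized Chernoff bound collapses to the stated theorem.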

Relation to Classical Results in Probability Theory

The above concentration inequalities are linked to:
- the central limit theorem for martingales,
- the law of the iterated logarithm,
- the moderate deviations principle for real-valued i.i.d. RVs.
For proofs and discussions of these relations, see Section IV in http://arxiv.org/abs/1111.1977.

Application 1: MDP for Binary Hypothesis Testing

Binary Hypothesis Testing
- Let the RVs $X_1, X_2, \ldots$ be i.i.d. $\sim Q$, and consider two hypotheses: $H_1$: $Q = P_1$, and $H_2$: $Q = P_2$.
- For simplicity, assume that the RVs are discrete and take their values on a finite alphabet $\mathcal{X}$ where $P_1(x), P_2(x) > 0$ for every $x \in \mathcal{X}$.
- The log-likelihood ratio (LLR): $L(X_1^n) = \sum_{i=1}^n \ln\frac{P_1(X_i)}{P_2(X_i)}$.
- By the strong law of large numbers (SLLN), if $H_1$ is true then a.s. $\lim_{n\to\infty} \frac{L(X_1^n)}{n} = D(P_1\|P_2)$, and if $H_2$ is true then a.s. $\lim_{n\to\infty} \frac{L(X_1^n)}{n} = -D(P_2\|P_1)$.
- Let $\underline{\lambda}, \overline{\lambda} \in \mathbb{R}$ satisfy $-D(P_2\|P_1) < \underline{\lambda} \le \overline{\lambda} < D(P_1\|P_2)$.
- Decide on $H_1$ if $L(X_1^n) > n\overline{\lambda}$, and on $H_2$ if $L(X_1^n) < n\underline{\lambda}$.
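The SLLN limit of the normalized LLR is easy to observe in simulation. The two-point distributions $P_1, P_2$ below are made-up examples, not from the talk; a sketch:

```python
import math
import random

# Illustrative two-point distributions on the alphabet {0, 1}
P1 = [0.7, 0.3]
P2 = [0.4, 0.6]

# D(P1||P2), the a.s. limit of L(X_1^n)/n under H_1
D12 = sum(p1 * math.log(p1 / p2) for p1, p2 in zip(P1, P2))

random.seed(0)
n = 200_000
# Draw X_1, ..., X_n i.i.d. from P1 and accumulate the log-likelihood ratio
llr = sum(math.log(P1[x] / P2[x])
          for x in (0 if random.random() < P1[0] else 1 for _ in range(n)))

# llr / n should be close to D(P1||P2) for large n
```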

Application 1: MDP for Binary Hypothesis Testing

Let
$$\alpha_n^{(1)} \triangleq P_1^n\bigl(L(X_1^n) \le n\overline{\lambda}\bigr), \qquad \beta_n^{(1)} \triangleq P_2^n\bigl(L(X_1^n) \ge n\underline{\lambda}\bigr),$$
$$\alpha_n^{(2)} \triangleq P_1^n\bigl(L(X_1^n) \le n\underline{\lambda}\bigr), \qquad \beta_n^{(2)} \triangleq P_2^n\bigl(L(X_1^n) \ge n\overline{\lambda}\bigr).$$
Then:
- $\alpha_n^{(1)}$ and $\beta_n^{(1)}$ are the probabilities of either making an error or declaring an erasure under hypotheses $H_1$ and $H_2$, respectively.
- $\alpha_n^{(2)}$ and $\beta_n^{(2)}$ are the probabilities of making an error under hypotheses $H_1$ and $H_2$, respectively.
- Large deviations analysis: the $\alpha_n$'s and $\beta_n$'s all decay exponentially to zero as a function of the block length $n$.

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis for Binary Hypothesis Testing
Instead of the setting where the thresholds are kept fixed, independently of $n$, let these thresholds tend to their asymptotic limits (due to the SLLN), i.e.,
$$\lim_{n\to\infty} \overline{\lambda}^{(n)} = D(P_1\|P_2), \qquad \lim_{n\to\infty} \underline{\lambda}^{(n)} = -D(P_2\|P_1).$$
In moderate-deviations analysis of binary hypothesis testing, we are interested in the case where
1. the block length $n$ of the input sequence tends to infinity, and
2. the thresholds tend to their asymptotic limits simultaneously.

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis for Binary Hypothesis Testing (Cont.)
To this end, let $\eta \in (\frac{1}{2}, 1)$, and let $\varepsilon_1, \varepsilon_2 > 0$ be arbitrary fixed numbers. Set the upper and lower thresholds to
$$\overline{\lambda}^{(n)} = D(P_1\|P_2) - \varepsilon_1 n^{-(1-\eta)}, \qquad \underline{\lambda}^{(n)} = -D(P_2\|P_1) + \varepsilon_2 n^{-(1-\eta)}.$$

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis via the Martingale Approach
Under hypothesis $H_1$, construct the martingale $\{U_k, \mathcal{F}_k\}_{k=0}^n$ where $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \ldots \subseteq \mathcal{F}_n$ is the natural filtration:
$$\mathcal{F}_0 = \{\emptyset, \Omega\}, \quad \mathcal{F}_k = \sigma(X_1, \ldots, X_k), \; k \in \{1, \ldots, n\},$$
$$U_k = E_{P_1^n}\bigl[L(X_1^n) \mid \mathcal{F}_k\bigr].$$
For every $k \in \{0, \ldots, n\}$:
$$U_k = \sum_{i=1}^k \ln\frac{P_1(X_i)}{P_2(X_i)} + (n-k)\, D(P_1\|P_2).$$

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis via the Martingale Approach (Cont.)
- $U_0 = n D(P_1\|P_2)$ and $U_n = L(X_1^n)$.
- Let $d_1 \triangleq \max_{x \in \mathcal{X}} \bigl|\ln\frac{P_1(x)}{P_2(x)} - D(P_1\|P_2)\bigr|$; then $d_1 < \infty$.
- $|U_k - U_{k-1}| \le d_1$ holds a.s. for every $k \in \{1, \ldots, n\}$.
- Due to the independence of the RVs in the sequence $\{X_i\}$,
$$E_{P_1^n}\bigl[(U_k - U_{k-1})^2 \mid \mathcal{F}_{k-1}\bigr] = \sum_{x \in \mathcal{X}} P_1(x)\Bigl\{\ln\frac{P_1(x)}{P_2(x)} - D(P_1\|P_2)\Bigr\}^2 \triangleq \sigma_1^2.$$
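For a concrete pair of distributions, the constants $d_1$ and $\sigma_1^2$ are straightforward to evaluate; the distributions below are made-up examples, not from the talk. A sketch:

```python
import math

# Illustrative distributions on a binary alphabet
P1 = [0.7, 0.3]
P2 = [0.4, 0.6]

D12 = sum(p1 * math.log(p1 / p2) for p1, p2 in zip(P1, P2))      # D(P1||P2)
centered = [math.log(p1 / p2) - D12 for p1, p2 in zip(P1, P2)]   # ln(P1/P2) - D(P1||P2)
d1 = max(abs(c) for c in centered)                               # a.s. bound on the jumps
sigma1_sq = sum(p1 * c ** 2 for p1, c in zip(P1, centered))      # conditional variance
```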

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis via the Martingale Approach (Cont.)
Let $\varepsilon_1 > 0$ and $\eta \in (\frac{1}{2}, 1)$ be two arbitrarily fixed numbers as above. Under hypothesis $H_1$, it follows from Theorem 2 and the above construction of a martingale that
$$P_1^n\bigl(L(X_1^n) \le n\overline{\lambda}^{(n)}\bigr) \le \exp\biggl(-n \, D\biggl(\frac{\delta_1^{(\eta,n)} + \gamma_1}{1+\gamma_1} \,\bigg\|\, \frac{\gamma_1}{1+\gamma_1}\biggr)\biggr)$$
where
$$\delta_1^{(\eta,n)} \triangleq \frac{\varepsilon_1 n^{-(1-\eta)}}{d_1}, \qquad \gamma_1 \triangleq \frac{\sigma_1^2}{d_1^2}$$
with $d_1$ and $\sigma_1^2$ as defined on the previous slide.
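The bound can be evaluated numerically. With an illustrative choice of $P_1, P_2$, $\eta$, and $\varepsilon_1$ (none taken from the talk), the following sketch evaluates the bound and checks that it decays as $n$ grows:

```python
import math

def kl_div(p, q):
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def mdp_bound(n, eta, eps1, P1, P2):
    """Evaluate exp(-n*D((delta1+gamma1)/(1+gamma1) || gamma1/(1+gamma1)))."""
    D12 = sum(p * math.log(p / q) for p, q in zip(P1, P2))
    centered = [math.log(p / q) - D12 for p, q in zip(P1, P2)]
    d1 = max(abs(c) for c in centered)
    sigma1_sq = sum(p * c ** 2 for p, c in zip(P1, centered))
    gamma1 = sigma1_sq / d1 ** 2
    delta1 = eps1 * n ** (-(1 - eta)) / d1
    return math.exp(-n * kl_div((delta1 + gamma1) / (1 + gamma1),
                                gamma1 / (1 + gamma1)))

P1, P2 = [0.7, 0.3], [0.4, 0.6]   # illustrative distributions
b4 = mdp_bound(10 ** 4, eta=0.7, eps1=0.1, P1=P1, P2=P2)
b6 = mdp_bound(10 ** 6, eta=0.7, eps1=0.1, P1=P1, P2=P2)
```

Since the exponent grows like $n^{2\eta-1}$, the bound for $n = 10^6$ is strictly smaller than the one for $n = 10^4$.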

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis via the Martingale Approach (Cont.)
The concentration inequality in Theorem 2 and a simple inequality give that, under hypothesis $H_1$, for every $\eta \in (\frac{1}{2}, 1)$,
$$\alpha_n^{(1)} \le \exp\biggl(-\frac{\varepsilon_1^2\, n^{2\eta-1}}{2\sigma_1^2}\Bigl(1 - \frac{\varepsilon_1 d_1}{3\sigma_1^2(1+\gamma_1)} \cdot \frac{1}{n^{1-\eta}}\Bigr)\biggr)$$
so this upper bound on the overall probability of either making an error or declaring an erasure under hypothesis $H_1$ decays sub-exponentially to zero. It improves by increasing $\eta \in (\frac{1}{2}, 1)$.
On the other hand, the exponential decay of the probability of error $\alpha_n^{(2)}$ improves as the value of $\eta \in (\frac{1}{2}, 1)$ is decreased (since the margin for making an error is increased).

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis via the Martingale Approach (Cont.)
Moderate-deviations analysis reflects, as can be expected, a tradeoff between the two $\alpha_n$'s and also between the two $\beta_n$'s. However, all of them decay asymptotically to zero as $n$ tends to infinity (which is indeed consistent with the SLLN).

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis via the Martingale Approach (Cont.)
The upper bound above implies that
$$\limsup_{n\to\infty} n^{1-2\eta} \ln P_1^n\bigl(L(X_1^n) \le n\overline{\lambda}^{(n)}\bigr) \le -\frac{\varepsilon_1^2}{2\sigma_1^2}.$$
Question: But does the upper bound reflect the correct asymptotic scaling of this probability?
Reply: Indeed it does; this follows from the moderate-deviations principle (MDP) for real-valued RVs.

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Principle (MDP) for Real-Valued RVs
Theorem
Let $\{X_i\}$ be real-valued i.i.d. RVs satisfying $E[X_i] = 0$, $\mathrm{Var}(X_i) = \sigma^2$, $|X_i| \le d$ a.s., and let $\eta \in (\frac{1}{2}, 1)$. Then, for every $\alpha \ge 0$,
$$\lim_{n\to\infty} n^{1-2\eta} \ln P\biggl(\Bigl|\sum_{i=1}^n X_i\Bigr| \ge \alpha n^{\eta}\biggr) = -\frac{\alpha^2}{2\sigma^2}.$$
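The scaling in the MDP can be sanity-checked against a Gaussian tail (the limiting distribution of $\frac{1}{\sqrt{n}}\sum X_i$): for $\sigma = 1$, $\alpha = 1$, $\eta = 0.6$ and large $n$, the quantity $n^{1-2\eta}\ln P$ is already close to $-\alpha^2/(2\sigma^2) = -0.5$. This only illustrates the limiting exponent (it is not a proof for bounded RVs); the parameter values are arbitrary choices. A sketch:

```python
import math

alpha, sigma, eta = 1.0, 1.0, 0.6
n = 10 ** 10

# Gaussian approximation: P(|sum X_i| >= alpha*n^eta) ~ P(|N(0, n*sigma^2)| >= alpha*n^eta)
#                                                     = erfc(alpha*n^(eta-0.5) / (sigma*sqrt(2)))
tail = math.erfc(alpha * n ** (eta - 0.5) / (sigma * math.sqrt(2)))
scaled = n ** (1 - 2 * eta) * math.log(tail)
# 'scaled' should be close to -alpha^2/(2*sigma^2) = -0.5 for large n
```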

Application 1: MDP for Binary Hypothesis Testing

Moderate-Deviations Analysis via the Martingale Approach (Cont.)
- The MDP shows that this inequality in fact holds with equality.
- The refined inequality in Theorem 2 gives the correct asymptotic scaling in this case. It also gives an analytical bound for finite $n$.
- This is in contrast to the analysis that follows from the Azuma-Hoeffding inequality, which does not coincide with the correct asymptotic scaling (since $\frac{\varepsilon_1^2}{2\sigma_1^2}$ is replaced by $\frac{\varepsilon_1^2}{2d_1^2}$, and $\sigma_1^2 \le d_1^2$).
- In the considered setting of moderate-deviations analysis for binary hypothesis testing, the error probability decays sub-exponentially to 0.

Application 1: MDP for Binary Hypothesis Testing

Papers on Moderate-Deviations Analysis in Information-Theoretic Aspects
1. E. A. Abbe, "Local to Global Geometric Methods in Information Theory," Ph.D. dissertation, MIT, Boston, MA, USA, June 2008.
2. Y. Altuǧ and A. B. Wagner, "Moderate-deviations analysis of channel coding: discrete memoryless case," Proc. ISIT 2010, pp. 265–269, Austin, Texas, USA, June 2010.
3. D. He, L. A. Lastras-Montaño, E. Yang, A. Jagmohan and J. Chen, "On the redundancy of Slepian-Wolf coding," IEEE Trans. on Information Theory, vol. 55, no. 12, pp. 5607–5627, December 2009.
4. Y. Polyanskiy and S. Verdú, "Channel dispersion and moderate-deviations limits of memoryless channels," Proceedings of the Forty-Eighth Annual Allerton Conference, pp. 1334–1339, UIUC, Illinois, USA, October 2010.
5. I. Sason, "Moderate-deviations analysis of binary hypothesis testing," http://arxiv.org/abs/1111.1995, November 2011.
6. V. Y. F. Tan, "Moderate-deviations of lossy source coding for discrete and Gaussian channels," http://arxiv.org/abs/1111.2217, November 2011.

Application 2: Concentration of Martingales in Coding Theory

Concentration Phenomena for Codes Defined on Graphs
Motivation & Background
- The performance analysis of a particular code is difficult, especially for codes of large block lengths.
- Does the performance concentrate around the average performance of the ensemble?
- The existence of such a concentration validates the use of the density evolution technique as an analytical tool to assess the performance of long enough codes (e.g., LDPC codes) and to assess their asymptotic gap to capacity.
- The current concentration results for codes defined on graphs, which mainly rely on the Azuma-Hoeffding inequality, are weak, since in practice concentration is observed at much shorter block lengths.

Application 2: Concentration of Martingales in Coding Theory

Performance under Message-Passing Decoding
Theorem I [Concentration of performance under iterative message-passing decoding (Richardson and Urbanke, 2001)]
Let $\mathcal{C}$, a code chosen uniformly at random from the ensemble $\mathrm{LDPC}(n, \lambda, \rho)$, be used for transmission over a memoryless binary-input output-symmetric (MBIOS) channel. Assume that the decoder performs $l$ iterations of message-passing decoding, and let $P_b(\mathcal{C}, l)$ denote the resulting bit error probability. Then, for every $\delta > 0$, there exists an $\alpha > 0$, where $\alpha = \alpha(\lambda, \rho, \delta, l)$ is independent of the block length $n$, such that
$$P\bigl(\bigl|P_b(\mathcal{C}, l) - E_{\mathrm{LDPC}(n,\lambda,\rho)}[P_b(\mathcal{C}, l)]\bigr| \ge \delta\bigr) \le e^{-\alpha n}.$$
Proof: The proof applies Azuma's inequality to a martingale sequence with bounded differences (IEEE Trans. on IT, Feb. 2001).

Application 2: Concentration of Martingales in Coding Theory

Conditional Entropy of LDPC Code Ensembles
Theorem II [Concentration of the conditional entropy of LDPC code ensembles (Méasson et al., 2008)]
Let $\mathcal{C}$ be chosen uniformly at random from the ensemble $\mathrm{LDPC}(n, \lambda, \rho)$. Assume that the transmission of the code $\mathcal{C}$ takes place over an MBIOS channel. Let $H(\mathbf{X}|\mathbf{Y})$ designate the conditional entropy of the transmitted codeword $\mathbf{X}$ given the received sequence $\mathbf{Y}$ from the channel. Then, for any $\xi > 0$,
$$P\bigl(\bigl|H(\mathbf{X}|\mathbf{Y}) - E_{\mathrm{LDPC}(n,\lambda,\rho)}[H(\mathbf{X}|\mathbf{Y})]\bigr| \ge n\xi\bigr) \le 2\exp(-B n \xi^2)$$
where $B \triangleq \frac{1}{2(d_c^{\max}+1)^2(1-R_d)}$, $d_c^{\max}$ is the maximal check-node degree, and $R_d$ is the design rate of the ensemble.

Application 2: Concentration of Martingales in Coding Theory

Proof (outline)
1. Introduce a martingale sequence with bounded differences: define the RV $Z = H_G(\mathbf{X}|\mathbf{Y})$, where $G$ is the graph of a code chosen uniformly at random from the ensemble $\mathrm{LDPC}(n, \lambda, \rho)$. Define the martingale sequence $Z_t = E[Z \mid \mathcal{F}_t]$, $t \in \{0, 1, \ldots, m\}$, where the filtration is the sequence of σ-algebras generated by revealing, at each time, another parity-check equation of the code.
2. Upper bound the differences $|Z_{t+1} - Z_t|$: it was proved that $|Z_{t+1} - Z_t| \le (r+1)\, H_G(\tilde{X}|\mathbf{Y})$, where $r$ is the degree of the parity-check equation revealed at time $t$, and $\tilde{X} = X_{i_1} \oplus \ldots \oplus X_{i_r}$ (i.e., $\tilde{X}$ is the modulo-2 sum of some $r$ bits of the codeword $\mathbf{X}$). Then $r \le d_c^{\max}$ and $H_G(\tilde{X}|\mathbf{Y}) \le 1$.
3. Apply Azuma's inequality to get a concentration inequality, using $|Z_{t+1} - Z_t| \le d_c^{\max} + 1$ for every $t = 0, \ldots, m-1$, where $m = n(1-R_d)$ is the number of parity-check nodes.

Application 2: Concentration of Martingales in Coding Theory

Improvements to Theorem II
Improvement 1: a tightened upper bound on the conditional entropy. Instead of upper bounding $H_G(\tilde{X}|\mathbf{Y})$ by 1, which is independent of the channel capacity $C$, it can be proved that
$$H_G(\tilde{X}|\mathbf{Y}) \le h_2\Bigl(\frac{1 - C^{r/2}}{2}\Bigr)$$
where $h_2$ is the binary entropy function to the base 2. For a BSC or a BEC, this bound can be further improved.

Application 2: Concentration of Martingales in Coding Theory

Improvement 2 (trivial): instead of taking the trivial bound $r \le d_c^{\max}$ for all $m$ terms in Azuma's inequality, one can rely on the degree distribution of the parity-check nodes: the number of parity-check nodes of degree $r$ is $n(1-R_d)\Gamma_r$.

Theorem III [Tightened Expressions for $B$]
Considering the terms of Theorem II, applying these two improvements yields the tightened expressions:
- General MBIOS: $B \triangleq \Bigl[2(1-R_d)\sum_{i=1}^{d_c^{\max}} (i+1)^2\, \Gamma_i \Bigl[h_2\Bigl(\frac{1-C^{i/2}}{2}\Bigr)\Bigr]^2\Bigr]^{-1}$
- BSC: $B \triangleq \Bigl[2(1-R_d)\sum_{i=1}^{d_c^{\max}} (i+1)^2\, \Gamma_i \Bigl[h_2\Bigl(\frac{1-[1-2h_2^{-1}(1-C)]^i}{2}\Bigr)\Bigr]^2\Bigr]^{-1}$
- BEC: $B \triangleq \Bigl[2(1-R_d)\sum_{i=1}^{d_c^{\max}} (i+1)^2\, \Gamma_i\, (1-C^i)^2\Bigr]^{-1}$

Application 2: Concentration of Martingales in Coding Theory

Numerical Comparison for the BEC and BIAWGN
- Consider the $(2,20)$-regular LDPC code ensemble.
- Communication takes place over a BEC or a BIAWGN channel with a capacity of 0.98 bits per channel use.
Compared to Theorem II, applying Theorem III results in tighter expressions for $B$:
- BIAWGN: improvement by a factor $\bigl[h_2\bigl(\frac{1-C^{d_c/2}}{2}\bigr)\bigr]^{-2} = 5.134$
- BEC: improvement by a factor $(1-C^{d_c})^{-2} = 9.051$
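The two improvement factors can be reproduced directly: for a check-regular ensemble ($\Gamma_{d_c} = 1$) the factor is the ratio between the original summand of Theorem II and the tightened one of Theorem III. A sketch reproducing the numbers above:

```python
import math

def h2(p):
    """Binary entropy function (base 2)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

C, dc = 0.98, 20   # capacity (bits/channel use) and check-node degree of the (2,20) ensemble

# Improvement factors of Theorem III over Theorem II for a regular ensemble
factor_biawgn = h2((1 - C ** (dc / 2)) / 2) ** -2
factor_bec = (1 - C ** dc) ** -2
```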

Application 2: Concentration of Martingales in Coding Theory

Comparison for the Heavy-Tail Poisson Distribution (Tornado Codes)
Consider the capacity-achieving Tornado LDPC code ensemble for a BEC with erasure probability $p$. We wish to design a code ensemble that achieves a fraction $1-\varepsilon$ of the capacity.
- Theorem II: the parity-check degrees are Poisson distributed, so $d_c^{\max} = \infty$. Hence $B = 0$, and this result is useless.
- Theorem III: $B$ scales (at least) like $O\Bigl(\frac{1}{d_c^{\max} \log^2\frac{1}{\varepsilon}}\Bigr)$.
The parameter $B$ tends to zero slowly as we let the fractional gap $\varepsilon$ tend to zero; this demonstrates a rather fast concentration.

Application 2: Concentration of Martingales in Coding Theory

Interim Conclusions
- The use of Azuma's inequality was addressed in the context of proving concentration phenomena for code ensembles defined on graphs and iterative decoding algorithms.
- A concentration inequality for the conditional entropy of LDPC ensembles was tightened by deriving an improved upper bound on the jumps of the martingale sequence before applying Azuma's inequality.
- The improved inequality makes it possible to prove concentration of the conditional entropy for ensembles of Tornado codes (in contrast to the original concentration inequality).

Application 2: Concentration of Martingales in Coding Theory

Concentration of Martingales in Coding Theory: Selected Papers Applying Azuma's Inequality to LDPC Ensembles
- M. Sipser and D. A. Spielman, "Expander codes," IEEE Trans. on Information Theory, vol. 42, no. 6, pp. 1710–1722, November 1996.
- M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi and D. A. Spielman, "Efficient erasure correcting codes," IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 569–584, February 2001.
- T. Richardson and R. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 599–618, February 2001.
- A. Kavcic, X. Ma and M. Mitzenmacher, "Binary intersymbol interference channels: Gallager bounds, density evolution, and code performance bounds," IEEE Trans. on Information Theory, vol. 49, no. 7, pp. 1636–1652, July 2003.

Application 2: Concentration of Martingales in Coding Theory (Cont.)

- A. Montanari, "Tight bounds for LDPC and LDGM codes under MAP decoding," IEEE Trans. on Information Theory, vol. 51, no. 9, pp. 3247–3261, September 2005.
- C. Méasson, A. Montanari and R. Urbanke, "Maxwell construction: the hidden bridge between iterative and maximum a-posteriori decoding," IEEE Trans. on Information Theory, vol. 54, pp. 5277–5307, December 2008.
- L. R. Varshney, "Performance of LDPC codes under faulty iterative decoding," IEEE Trans. on Information Theory, vol. 57, no. 7, pp. 4427–4444, July 2011.

Application 3: Concentration for OFDM Signals

Orthogonal Frequency Division Multiplexing (OFDM)
- OFDM modulation converts a high-rate data stream into a number of low-rate streams that are transmitted over parallel narrow-band channels.
- One of the problems of OFDM is that the peak amplitude of the signal can be significantly higher than the average amplitude.
- This causes sensitivity to non-linear devices in the communication path (e.g., digital-to-analog converters, mixers and high-power amplifiers).
- The result is an increase in the symbol error rate and also a reduction in the power efficiency as compared to single-carrier systems.

Application 3: Concentration for OFDM Signals

OFDM (Cont.)
- Given an $n$-length codeword $\{X_i\}_{i=0}^{n-1}$, a single OFDM baseband symbol is described by
$$s(t; X_0, \ldots, X_{n-1}) = \frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} X_i \exp\Bigl(\frac{j 2\pi i t}{T}\Bigr), \quad 0 \le t \le T.$$
- Assume that $X_0, \ldots, X_{n-1}$ are i.i.d. complex RVs with $|X_i| = 1$. Since the sub-carriers are orthonormal over $[0, T]$, the power of the signal $s$ over this interval is a.s. equal to 1.
- The crest factor (CF) of the signal $s$, composed of $n$ sub-carriers, is defined as
$$\mathrm{CF}_n(s) \triangleq \max_{0 \le t \le T} |s(t)|.$$
- The CF scales with high probability like $\sqrt{\log n}$ for large $n$.

Application 3: Concentration for OFDM Signals

Concentration of Measure
In the following, we consider two of the main approaches for proving concentration inequalities, and apply them to prove concentration for the crest factor of OFDM signals:
1. The first approach is based on martingales (the Azuma-Hoeffding inequality and some refinements).
2. The second approach is based on Talagrand's inequalities.

Application 3: Concentration for OFDM Signals

A Previously Reported Result
A concentration inequality for the CF of OFDM signals was derived by Litsyn and Wunder (IEEE Trans. on IT, 2006). It states that, for every $c \ge 2.5$,
$$P\biggl(\bigl|\mathrm{CF}_n(s) - \sqrt{\log n}\bigr| < \frac{c \log\log n}{\sqrt{\log n}}\biggr) = 1 - O\Bigl(\frac{1}{(\log n)^4}\Bigr).$$
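The $\sqrt{\log n}$ scaling of the crest factor is easy to observe numerically. The sketch below draws i.i.d. BPSK symbols (one convenient choice satisfying $|X_i| = 1$; the symbol constellation, block length, and grid resolution are all arbitrary choices), evaluates $|s(t)|$ on a fine time grid with $T$ normalized to 1, and compares the resulting crest factor with $\sqrt{\log n}$:

```python
import cmath
import math
import random

random.seed(1)
n = 256
X = [random.choice([-1.0, 1.0]) for _ in range(n)]   # BPSK symbols, |X_i| = 1

def s(tau):
    """OFDM baseband symbol sampled at normalized time tau in [0, 1)."""
    return sum(X[i] * cmath.exp(2j * math.pi * i * tau)
               for i in range(n)) / math.sqrt(n)

# Crest factor estimated on a grid of 4n time samples
cf = max(abs(s(k / (4 * n))) for k in range(4 * n))
# cf is typically on the order of sqrt(log n), i.e., around 2.4 for n = 256
```

Since the average of $|s|^2$ over the grid equals the unit signal power, the crest factor is always at least 1, while for i.i.d. unimodular symbols it stays far below the worst case $\sqrt{n}$.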

Application 3: Concentration for OFDM Signals

Theorem [McDiarmid's Inequality]
Let $X = (X_1, \ldots, X_n)$ be a vector of independent random variables, with $X_k$ taking values in a set $A_k$ for each $k$. Suppose that a real-valued function $f$, defined on $\prod_k A_k$, satisfies $|f(x) - f(x')| \le c_k$ whenever the vectors $x$ and $x'$ differ only in the $k$-th coordinate. Let $\mu \triangleq E[f(X)]$ be the expected value of $f(X)$. Then, for every $\alpha \ge 0$,
$$P\bigl(|f(X) - \mu| \ge \alpha\bigr) \le 2\exp\biggl(-\frac{2\alpha^2}{\sum_k c_k^2}\biggr).$$
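As a quick illustration, take $f$ to be the empirical mean of $n$ independent Uniform[0,1] variables, so that each $c_k = 1/n$, $\sum_k c_k^2 = 1/n$, and the bound reads $2\exp(-2\alpha^2 n)$. The sample sizes and $\alpha$ below are arbitrary choices; a Monte Carlo sketch:

```python
import math
import random

random.seed(2)
n, trials, alpha = 1000, 2000, 0.02
mu = 0.5   # E[f(X)] for the empirical mean of Uniform[0,1] samples

# Empirical frequency of the event |f(X) - mu| >= alpha
hits = sum(abs(sum(random.random() for _ in range(n)) / n - mu) >= alpha
           for _ in range(trials))
empirical = hits / trials

# McDiarmid bound with c_k = 1/n for every coordinate: sum_k c_k^2 = 1/n
bound = 2 * math.exp(-2 * alpha ** 2 * n)
```

The empirical deviation frequency stays below the analytical bound, as the theorem guarantees for any bounded-difference $f$.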