Lecture 14 (03/27/18). Channels. Decoding. Preview of the Capacity Theorem. A. Barg

The concept of a communication channel in information theory is an abstraction for transmitting digital (and analog) information from the sender to the recipient over a noisy medium. Examples of physical channels are wireless links, cable communications (optical, coaxial, etc.), writing onto digital media (flash, magnetic), and many more.

Let $X, Y$ be finite sets. A mapping $W : X \to Y$ is called stochastic if the image of $x \in X$ is a random variable taking values in $Y$. Denote by $W(y|x)$ the probability of $y$ conditional on the given input $x$.

Definition: A discrete memoryless channel (DMC) is a stochastic mapping $W : X \to Y$. We use the letter $W$ to refer both to the channel itself and to the probability distribution $W(y|x)$. The sets $X$ and $Y$ are called the input alphabet and the output alphabet of $W$, respectively. The channel is represented by a stochastic matrix whose rows are labelled by the elements of $X$ (input letters) and columns by the elements of $Y$ (output letters). By definition $\sum_{y\in Y} W(y|x)=1$ for any $x\in X$.

Examples.
1. Z-channel (called so because its diagram resembles the letter Z):
$$W=\begin{pmatrix} 1-\varepsilon & \varepsilon\\ 0 & 1\end{pmatrix}.$$
2. Binary symmetric channel (BSC(p)), $W:\{0,1\}\to\{0,1\}$:
$$W=\begin{pmatrix} 1-p & p\\ p & 1-p\end{pmatrix}.$$
3. Binary erasure channel (BEC(p)), $W:\{0,1\}\to\{0,1,?\}$ (the middle column corresponds to the erasure symbol $?$):
$$W=\begin{pmatrix} 1-p & p & 0\\ 0 & p & 1-p\end{pmatrix}.$$
There are more examples in the textbook.

Definition: Let $\mathcal M$ be a finite set of cardinality $M$ and let $f:\mathcal M\to X^n$ be a mapping. A code $C$ of length $n$ over the alphabet $X$ is the image of $f$ in $X^n$. We say that a message $m\in\mathcal M$ is encoded into a codeword $x_m\in C$ if $f(m)=x_m$. The set of codewords $\{x_1,\dots,x_M\}$ is called a channel code. (Sometimes the term code is used to refer to $f$, and then the set of codewords $C$ is called the codebook.) The number $R=\frac1n\log M$ is called the rate of the code $C$.

Below we denote general $n$-vectors by $x^n, y^n$ and keep the above notation for the codewords. The codewords are transmitted over the channel. This means the following. The mapping $W$ is extended from $X$ to $X^n$ using the memoryless property of $W$:
$$W^n(y^n|x^n)=\prod_{i=1}^{n}W(y_i|x_i),\quad\text{where } x^n=(x_1,\dots,x_n),\ y^n=(y_1,\dots,y_n).$$
The result of transmitting the codeword $x_m$ over the channel $W$ is a vector $y^n\in Y^n$ with probability $W^n(y^n|x_m)$.

Messages are encoded and transmitted as codewords to provide the recipient with the functionality of correcting errors that may occur in the channel. Error correction is performed by a decoder, i.e., a mapping $g:Y^n\to\mathcal M$. The decoder is a deterministic mapping constructed so as to minimize the probability of incorrect recovery of the transmitted messages.
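To make the matrix picture concrete, here is a minimal Python sketch (not part of the notes; the function name and parameters are my own) that stores a DMC as a stochastic matrix and simulates the memoryless extension $W^n$ by corrupting each letter of a codeword independently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Channel matrices: rows are indexed by input letters, columns by output letters;
# every row sums to 1 (stochastic matrix).
p = 0.1
BSC = np.array([[1 - p, p],
                [p, 1 - p]])            # X = Y = {0, 1}
BEC = np.array([[1 - p, 0.0, p],
                [0.0, 1 - p, p]])       # Y = {0, 1, ?}; here the last column is the erasure

def transmit(W, x):
    """Send the word x (a sequence of input indices) over the DMC W.
    Memorylessness: each letter is corrupted independently, so
    W^n(y^n | x^n) = prod_i W(y_i | x_i)."""
    return np.array([rng.choice(W.shape[1], p=W[xi]) for xi in x])

x = np.array([0, 1, 1, 0, 1, 0, 0, 1])  # a codeword of length n = 8
print("sent:    ", x)
print("received:", transmit(BSC, x))    # a few positions flipped with probability p
```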

Optimal decoders. We briefly discuss optimal decoding rules. Let $\Pr(m)$ be a probability distribution on $\mathcal M$. Let $y^n$ be the received vector, i.e., the output of the channel. The posterior probability that the transmitted message was $m$ equals
$$(1)\qquad P(m|y^n)=\frac{\Pr(m)\,W^n(y^n|x_m)}{P(y^n)},$$
where $P(y^n)=\sum_{m'=1}^{M}\Pr(m')\,W^n(y^n|x_{m'})$. Assume that $g(y^n)=m$; then the error probability is $p_e=1-P(m|y^n)$. To minimize $p_e$, decode $y^n$ to the $m$ such that
$$(2)\qquad P(m|y^n)\ge P(m'|y^n)\quad\text{for all } m'\ne m$$
(ties are broken arbitrarily). This rule is called the maximum a posteriori probability (MAP) decoder. If $\Pr(m)=1/M$ is uniform, then the MAP decoder is equivalent to the maximum likelihood (ML) decoder $g_{\mathrm{ML}}$ given by $g(y^n)=m$ if $W^n(y^n|x_m)\ge W^n(y^n|x_{m'})$ for all $m'\ne m$. To see this, use the Bayes formula (1) in (2). If $\Pr(m)$ is not uniform, then the ML decoder is generally suboptimal. ML and MAP decoders are computationally very hard because of the large search involved in finding $g(y^n)$.

Preview of the Shannon capacity theorem. The following discussion is informal. It uses the simple case of the BSC to explain the nature of channel capacity in geometric terms. Consider transmission over $W=\mathrm{BSC}(p)$, $p<1/2$. Let $d_H(x^n,y^n)=|\{i: x_i\ne y_i\}|$ be the Hamming distance between the (binary $n$-dimensional) vectors $x^n$ and $y^n$. Let $x^n$ be the transmitted vector and $y^n$ the received vector. The typical value of the distance is $d_H(x^n,y^n)\approx np$. In other words, $\Pr\{|d_H(x^n,y^n)-np|\ge n\alpha\}$ is small, where $\alpha>0$ is a small number. Therefore define the decoder value $g(y^n)$ as follows: if there is a unique codevector $x_m\in C$ such that $|d_H(x_m,y^n)-np|\le n\alpha$, then $g(y^n)=x_m$; otherwise put $g(y^n)=x_1$ (or any other arbitrary codevector). Below we call vectors $y^n$ whose distance from $x_m$ is about $np$ typical for $x_m$. The number of typical vectors $y^n\in\{0,1\}^n$ for a given $x^n$ is
$$(3)\qquad |\{y^n\in\{0,1\}^n: |d_H(x^n,y^n)-np|\le n\alpha\}|=\sum_{i:\,|i-np|\le n\alpha}\binom{n}{i}.$$

Lemma 1. Let $0\le\lambda\le 1/2$; then
$$(4)\qquad \frac{1}{n+1}\,2^{nh(\lambda)}\le\sum_{i=0}^{\lfloor\lambda n\rfloor}\binom{n}{i}\le 2^{nh(\lambda)}.$$
Proof: For the upper bound,
$$1=(\lambda+(1-\lambda))^n=\sum_{i=0}^{n}\binom{n}{i}\lambda^i(1-\lambda)^{n-i}\ge\sum_{i=0}^{\lfloor\lambda n\rfloor}\binom{n}{i}\lambda^i(1-\lambda)^{n-i}\ge\sum_{i=0}^{\lfloor\lambda n\rfloor}\binom{n}{i}(1-\lambda)^n\Big(\frac{\lambda}{1-\lambda}\Big)^{\lambda n}=2^{-nh(\lambda)}\sum_{i=0}^{\lfloor\lambda n\rfloor}\binom{n}{i},$$
where the second inequality holds because $\lambda/(1-\lambda)\le 1$ and $i\le\lambda n$, and the last step uses $(1-\lambda)^n\big(\frac{\lambda}{1-\lambda}\big)^{\lambda n}=\lambda^{\lambda n}(1-\lambda)^{(1-\lambda)n}=2^{-nh(\lambda)}$. The lower bound follows from the standard estimate for the largest binomial coefficient in the sum: the largest of the $n+1$ terms $\binom{n}{i}\lambda^i(1-\lambda)^{n-i}$ is at least $1/(n+1)$, and it is attained near $i=\lambda n$, which gives $\binom{n}{\lfloor\lambda n\rfloor}\gtrsim\frac{1}{n+1}2^{nh(\lambda)}$.
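As a quick numerical sanity check of (4) (my own illustration; the notes contain no code), the sketch below compares the binomial sum with its exponential bounds for one choice of $n$ and $\lambda$; here h2 plays the role of $h(\lambda)$.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

n, lam = 200, 0.3
S = sum(math.comb(n, i) for i in range(int(lam * n) + 1))   # sum_{i <= lam*n} C(n, i)
lower = 2 ** (n * h2(lam)) / (n + 1)
upper = 2 ** (n * h2(lam))
print(lower <= S <= upper)              # True: both bounds of Lemma 1 hold
print(math.log2(S) / n, h2(lam))        # the exponents agree up to an O(log(n)/n) term
```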

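Returning to the decoders discussed above: for the BSC with $p<1/2$ we have $W^n(y^n|x_m)=p^{d}(1-p)^{n-d}$ with $d=d_H(x_m,y^n)$, which is decreasing in $d$, so ML decoding is the same as minimum Hamming distance decoding. The following brute-force sketch (mine, not from the notes; the toy code and names are hypothetical) illustrates this.

```python
import numpy as np

def ml_decode_bsc(y, codebook):
    """Brute-force ML decoding on a BSC(p), p < 1/2: since
    W^n(y|x) = p^d (1-p)^(n-d) with d = d_H(x, y) is decreasing in d,
    the ML codeword is the one closest to y in Hamming distance."""
    dists = [int((x != y).sum()) for x in codebook]
    return int(np.argmin(dists))            # index m of the decoded message

# Toy code: M = 2 codewords of length n = 5, rate R = (1/5) log2(2) = 0.2
codebook = [np.array([0, 0, 0, 0, 0]), np.array([1, 1, 1, 1, 1])]
y = np.array([0, 1, 0, 0, 1])               # received word with two flipped bits
print(ml_decode_bsc(y, codebook))            # -> 0 (the all-zeros codeword is closest)
```

The exhaustive search over all $M=2^{nR}$ codewords in this sketch is precisely the "large search" that makes ML and MAP decoding computationally hard at practical block lengths.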
We would like the sets $T_\alpha(x_m)\triangleq\{y^n\in Y^n: |d_H(x_m,y^n)-np|\le n\alpha\}$ for different $x_m$ to be disjoint. Note that $|T_\alpha(\cdot)|=\sum_{i=n(p-\alpha)}^{n(p+\alpha)}\binom{n}{i}$. Suppose first that
$$(5)\qquad M\,|T_\alpha(\cdot)|\;\overset{(4)}{\ge}\;\frac{1}{n+1}\,M\,2^{nh(p-\alpha)}>2^n;$$
then a point $y^n$ in the output space is typical on average for an exponentially large number of codevectors, namely for
$$A=2^{n(R+h_2(p-\alpha)-o(1))}/2^n=2^{n(R+h_2(p-\alpha)-1-o(1))}=2^{n\varepsilon}$$
codevectors, where we write $h_2(p-\alpha)=h_2(p)-\alpha'$ and denote $\varepsilon=R+h_2(p)-\alpha'-1$. This means that a significant proportion of points $y^n$ is typical for exponentially many codewords. In this case decoding with low error probability is impossible (for instance, the maximum error probability is close to 1). We observe that (5) implies the following inequality: $R>1-h(p)+\alpha'$, where $\alpha'$ is small if so is $\alpha$. Thus if this inequality is true, the error probability is large.

At the same time, if the cumulative volume of the typical sets around the codewords satisfies
$$M\,|T_\alpha(\cdot)|=2^{nR}\,|T_\alpha(\cdot)|\;\overset{(4)}{\le}\;2^{n(R+h_2(p+\alpha))},$$
i.e., is less than $2^n$ (the total size of the output space), then there can exist codes in which decoding is correct with large probability. We will show by random choice that such codes indeed exist.

We notice that there is a dividing point $R=1-h_2(p)$ between the low and high error probability of decoding. This value of the rate is called the capacity of the channel $W$. We can also flip the question by asking:

Threshold noise level: We are given a code $C$ of rate $R$. Assuming that $C$ is chosen optimally for transmission over a BSC(p), what is the largest $p$ for which the code can guarantee reliable transmission? The above argument shows that the maximum $p$ is $p_{\mathrm{th}}=h_2^{-1}(1-R)$. (Plot the function $h_2^{-1}(z)$ to visualize the dependence on the rate.)
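As a numerical companion to the suggested plot of $h_2^{-1}(z)$ (my own sketch, not from the notes), one can invert the binary entropy on $[0,1/2]$ by bisection and tabulate the threshold $p_{\mathrm{th}}=h_2^{-1}(1-R)$ for a few rates.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def h2_inv(z, tol=1e-12):
    """Inverse of h2 restricted to [0, 1/2], computed by bisection
    (h2 is increasing there, with h2(0) = 0 and h2(1/2) = 1)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h2(mid) < z else (lo, mid)
    return (lo + hi) / 2

# Threshold crossover probability p_th = h2^{-1}(1 - R) for several rates R
for R in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"R = {R:4.2f}   p_th = {h2_inv(1 - R):.4f}")
```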

Lecture 15 (03/29/18). Channel capacity

The following is a formalization of the discussion in the previous section. Consider BSC(p), i.e., a stochastic mapping $W:X\to Y$, $X=Y=\{0,1\}$, such that
$$W(y|x)=(1-p)\,\mathbf 1_{\{x=y\}}+p\,\mathbf 1_{\{x\ne y\}}.$$
Let
$$T_\alpha(x^n)=\{y^n: |d_H(x^n,y^n)-np|\le n\alpha\},\qquad\alpha>0.$$
Let $C=\{x_1^n,\dots,x_M^n\}$ be a code (below we omit the superscript $n$ from the notation for the codewords). Let
$$D(x_m)\triangleq\{y^n\in\{0,1\}^n: W^n(y^n|x_m)\ge W^n(y^n|x_{m'})\ \text{for all } m'\ne m\}$$
be the decision region for the codeword $x_m$. Denote by
$$\lambda_m=\sum_{y^n\in D(x_m)^c}W^n(y^n|x_m)$$
the error probability of decoding conditional on transmitting the $m$th codeword, and let
$$\lambda_{\max}(C)=\max_m\lambda_m$$
be the maximum error probability of the code $C$.

Theorem 2 (Shannon's capacity theorem for the BSC, lower bound). Given $\varepsilon>0$, $\gamma>0$, $p<1/2$ and $R\le 1-h(p)-\gamma$, there exists $n_0=n_0(\varepsilon,\gamma)$ such that for any $n\ge n_0$ there exists a code $C\subset\{0,1\}^n$ of cardinality $2^{Rn}$ whose maximum error probability of decoding on a BSC(p) satisfies $\lambda_{\max}\le\varepsilon$.

Proof: Let $M=2^{R'n}$, where $R'=R+\frac1n$. Choose $C=\{x_1,\dots,x_M\}\subset\{0,1\}^n$ (equivalently, a point in $\{0,1\}^{Mn}$) by randomly assigning codewords to the messages with uniform probability $\Pr(f(m)=x_m)=2^{-n}$, independently of each other (below we use the notation $x_m$ for codewords, omitting the superscript $n$ in $x_m^n$). Suppose that $y^n$ is the received vector. Let us use the following decoder $g:Y^n\to\mathcal M$: if there is a unique codeword $x_m\in C$ such that $y^n\in T_\alpha(x_m)$, assign $g(y^n)=m$. In all other situations put $g(y^n)=1$. (This mapping is called the typical pairs decoder.) Let
$$Z_m=\mathbf 1_{\{y^n\in T_\alpha(x_m)\}},\qquad m=1,\dots,M,$$
be the indicator random variable of the event $\{y^n$ is typical for $x_m\}$. Suppose that the transmitted vector is $x_1$. The probability of error $\lambda_1$ satisfies
$$\lambda_1\le\Pr\{Z_1=0\}+\Pr\Big\{\sum_{m=2}^{M}Z_m\ge 1\Big\}.$$
Recall the Chebyshev inequality: for any random variable $X$ with finite expectation and variance $\mathrm{Var}(X)$ we have $\Pr\{|X-\mathbf E X|\ge a\}\le\mathrm{Var}(X)/a^2$. We have
$$\Pr\{Z_1=0\}=\Pr\{y^n\notin T_\alpha(x_1)\}=\Pr\{|d_H(x_1,y^n)-np|>n\alpha\}\;\overset{\text{Chebyshev}}{\le}\;\frac{np(1-p)}{(n\alpha)^2}=p(1-p)\,n^{-\delta}$$
by taking $\alpha=n^{-(1-\delta)/2}$, $\delta>0$. By taking $n$ sufficiently large, we can guarantee that $\Pr\{Z_1=0\}\le\beta$, where $\beta>0$ is arbitrarily small. Next, for $m\ge 2$,
$$\Pr\{Z_m=1\}=|T_\alpha(\cdot)|\,2^{-n}\;\overset{(4)}{\le}\;2^{-n(1-h(p+\alpha))}.$$
Use the union bound:
$$\Pr\Big\{\sum_{m=2}^{M}Z_m\ge 1\Big\}\le\sum_{m=2}^{M}\Pr\{Z_m=1\}\le 2^{-n(1-R'-h(p+\alpha))}\le 2^{n(\alpha'-\gamma+\frac1n)},$$
where we write $h(p+\alpha)=h(p)+\alpha'$, and $\alpha'$ is small if $\alpha$ is small (note that $\alpha'>0$ since $p<1/2$). By taking a sufficiently large $n$ we can ensure that $\alpha'<\gamma-\frac1n$, and so $\Pr\{\sum_{m=2}^{M}Z_m\ge 1\}\le\beta$.

Now let us compute the average probability of error over all codeword assignments $f$:
$$(6)\qquad \bar P_e=\mathbf E_F\,\bar\lambda(C)=\sum_C\Pr(C)\,\frac1M\sum_{m=1}^{M}\lambda_m(C)=\frac1M\sum_{m=1}^{M}\sum_C\Pr(C)\,\lambda_m(C)=\sum_C\Pr(C)\,\lambda_1(C)\le 2\beta,$$
where $\Pr(C)=\prod_{m=1}^{M}\Pr\{f(m)=x_m\}$. Here $F$ is the random mapping. Since we go over all the mappings, the sum $\sum_C\Pr(C)\lambda_m(C)$ does not depend on $m$. By (6) there exists a code $C^*$ for which the error probability averaged over the $M$ codewords satisfies $\bar\lambda(C^*)\le 2\beta$. By the Markov inequality, $|\{m\in\mathcal M:\lambda_m(C^*)\le 2\bar\lambda(C^*)\}|\ge M/2$. Thus, there exist at least $M/2$ messages whose codewords in $C^*$ are decoded with error probability $\lambda_m(C^*)\le 4\beta$ (this idea is called expurgation). Denote this set of codewords by $\tilde C$ and take $\beta=\varepsilon/4$. Thus there is a code $\tilde C$ of cardinality $\frac M2=2^{n(R'-\frac1n)}$, i.e., of rate $R$, with $\lambda_{\max}(\tilde C)\le\varepsilon$.

Observe that the probability $\lambda_{\max}$ falls only polynomially in $n$ (roughly as $n^{-\delta}$). By using the optimal (i.e., MAP) decoder rather than the typical pairs decoder, it is possible to show that there exist much better codes for the BSC.

Theorem 3. For any rate $R$, $0\le R<1-h_2(p)$, there exists a sequence of codes $C_i$, $i=1,2,\dots$, of growing length $n_i$ such that $\frac{\log|C_i|}{n_i}\ge R$ and
$$(7)\qquad \lambda_{\max}(C_i)\le 2^{-n_i D(\delta(R)\,\|\,p)\,(1-o(1))},$$
where $D(x\|y)=x\log\frac{x}{y}+(1-x)\log\frac{1-x}{1-y}$, $\delta(R)\triangleq h_2^{-1}(1-R)$, and $o(1)\to 0$ as $n_i\to\infty$.

We will omit the proof. Note that the decline rate of the maximum error probability is a much faster (exponential) function of the code length $n$ than in the above argument. We took a loss by using a suboptimal decoder (and a simpler proof).
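To see the random-coding idea of Theorem 2 "in action", here is a small Monte Carlo sketch (my own illustration, not part of the notes; the function and parameter names are mine). It draws a random code of rate roughly $R$ below capacity and estimates the average block error probability of minimum-distance (i.e., ML) decoding on a BSC(p); the estimate should decrease as $n$ grows, and Theorem 3 says the true decrease is in fact exponential in $n$.

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_error(n, R, p, trials=2000):
    """Estimate the average block error probability of a random code of rate
    about R used on a BSC(p) with minimum-distance (= ML) decoding."""
    M = 2 ** max(1, round(R * n))                             # number of messages
    code = rng.integers(0, 2, size=(M, n), dtype=np.uint8)    # uniform random codebook
    errors = 0
    for _ in range(trials):
        m = rng.integers(M)                                   # message to transmit
        noise = (rng.random(n) < p).astype(np.uint8)          # i.i.d. bit flips
        y = code[m] ^ noise                                   # BSC output
        if (code ^ y).sum(axis=1).argmin() != m:              # decode to nearest codeword
            errors += 1
    return errors / trials

p, R = 0.05, 0.25     # capacity 1 - h2(0.05) is about 0.71, so R is well below it
for n in (8, 16, 24, 32):
    print(f"n = {n:2d}   P_err ~ {avg_error(n, R, p):.3f}")
```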

Finite-length scaling. The efficiency of our transmission design can be measured by the number of messages that can be transmitted reliably. Suppose that the code rate is $R=1-h_2(p)-\gamma$, where $\gamma>0$ is small and $p$ is the crossover probability of the BSC. Suppose moreover that we require that $\lambda_{\max}=\varepsilon$. We already understand that we will have to choose a sufficiently long code. What is the smallest $n$ that can guarantee this?

As $R\to 1-h(p)$, we have $\delta(R)\to p$. We have
$$D(\delta(1-h(p))\|p)=D(p\|p)=0,\qquad D'_\delta(\delta\|p)\Big|_{\delta=p}=\log\frac{\delta(1-p)}{(1-\delta)p}\Big|_{\delta=p}=0,\qquad D''_\delta(\delta\|p)=\frac{1}{\ln 2}\,\frac{1}{\delta(1-\delta)}.$$
If $R=1-h_2(p)-\gamma$, then $\delta=p+\gamma'$, where $\gamma'$ is some small number. Expanding $D$ into a power series in the neighborhood of $\delta=p$, we obtain
$$D(\delta\|p)=\frac12\,D''_\delta(\delta\|p)\Big|_{\delta=p}(\delta-p)^2+o((\delta-p)^2)=O((\delta-p)^2).$$
From (7) we obtain
$$(8)\qquad n\approx\frac{\log(1/\varepsilon)}{(\delta-p)^2}\quad\text{(constants omitted)}.$$
Let us rephrase this by finding how $\gamma$ (the gap to capacity) depends on the code length $n$. To answer this, rewrite (8) as follows:
$$\delta-p\approx(\log(1/\varepsilon))^{1/2}\,\frac{1}{\sqrt n}.$$
Now substitute $\delta=h^{-1}(1-R)$ to find that $R\approx 1-h(p+O(n^{-1/2}))$, or
$$R\approx 1-h(p)-O\Big(\frac{1}{\sqrt n}\Big).$$
To conclude:

Proposition 4. The gap to capacity for optimal codes scales as $n^{-1/2}$.

The outcome of this calculation is called finite-length scaling of the code sequence on the BSC. The same order of scaling is true for optimal codes used on any binary-input discrete memoryless channel (DMC).

We return to the textbook and state the Shannon capacity theorem for a DMC (pp. 192-195, 200). Then we consider examples (pp. 187-191), and then (next class) prove the capacity theorem.
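As a rough numerical companion to Proposition 4 (my own sketch, not part of the notes; the length estimate $n\approx\log_2(1/\varepsilon)/D(\delta(R)\|p)$ is only the order-of-magnitude reading of (7), with constants ignored), the following tabulates how the estimated required length grows as the gap $\gamma$ shrinks; halving $\gamma$ roughly quadruples $n$, consistent with the $n^{-1/2}$ gap-to-capacity scaling.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def h2_inv(z, tol=1e-12):
    """Inverse of h2 on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h2(mid) < z else (lo, mid)
    return (lo + hi) / 2

def D(x, y):
    """Binary divergence D(x||y) in bits."""
    return x * math.log2(x / y) + (1 - x) * math.log2((1 - x) / (1 - y))

p, eps = 0.11, 1e-6
print(" gamma    delta-p     D(delta||p)    n ~ log2(1/eps)/D")
for gamma in (0.08, 0.04, 0.02, 0.01):
    R = 1 - h2(p) - gamma          # rate gamma below capacity
    delta = h2_inv(1 - R)          # = p + O(gamma)
    exponent = D(delta, p)         # = O((delta - p)^2) by the expansion above
    n_est = math.log2(1 / eps) / exponent
    print(f"{gamma:6.3f} {delta - p:10.5f} {exponent:13.3e} {n_est:14.0f}")
```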