Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound

Lecture 7

Agenda for the lecture:
- Gaussian channel with average power constraints
- Capacity of the additive Gaussian noise channel and the sphere packing bound

7.1 Additive Gaussian noise channel

Up to this point, we have been considering only channels with discrete input and output alphabets. This does not include the most important channel for communication engineers, namely the Gaussian channel. This is the channel one encounters when modeling communication over a band-limited channel with additive white Gaussian noise. The full treatment of the physical continuous-time channel is beyond the scope of this course. Instead, we present the treatment for an important sub-block of the practical channel: the discrete-time, memoryless additive Gaussian noise (AGN) channel. Specifically, for $\mathcal{X} = \mathcal{Y} = \mathbb{R}$, for each input $x \in \mathcal{X}$ the AGN channel $(\mathcal{X}, W, \mathcal{Y})$ produces a random output $Y$ distributed as $N(x, \sigma^2)$, i.e., for input $x$ the channel output is given by $Y = x + Z$ where $Z \sim N(0, \sigma^2)$. The channel is stationary and memoryless, i.e., for each channel use an independent zero-mean Gaussian noise with the same variance $\sigma^2$ is added to the input.

© Himanshu Tyagi. Feel free to use with acknowledgement.
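
As a quick illustration (not part of the original notes), the following sketch simulates this channel model with numpy; the noise level and the input vector are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def agn_channel(x, sigma, rng):
    """One use of the AGN channel per coordinate: Y = x + Z with Z ~ N(0, sigma^2) i.i.d."""
    return x + rng.normal(0.0, sigma, size=len(x))

# Illustrative values: sigma and the input vector are arbitrary choices.
sigma = 2.0
x = np.ones(10_000)
y = agn_channel(x, sigma, rng)

# The noise y - x is approximately zero-mean with variance sigma^2.
print(np.mean(y - x), np.var(y - x))
```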

One can naively start by asking the question: how many messages can be sent reliably over this channel? With a little thought, we can convince ourselves that this question is not interesting. For $A > 0$, by choosing $A \cdot m$ as the $m$th codeword, for $m = 1, 2, \dots$, any error requirement can be satisfied by choosing $A$ to be large enough. Thus, any number of messages can be sent reliably, even with one channel use. But of course the scheme above is not practical. In fact, the theoretical model of the AGN channel is used to model transmission over a band-limited additive white Gaussian noise (AWGN) channel. The amplitude of the input of the AGN channel is directly related to the amplitude of the input signal, and the average power $\frac{1}{n}\sum_{i=1}^n x_i^2$ of a codeword over $n$ channel uses is related to the average power of the input signal. Since our circuits can only work with a fixed maximum amplitude and our batteries can provide only finite power, we must impose these two constraints on our codewords. In this course, we will only consider the average power constraint, which is easier to handle than the peak amplitude constraint. Specifically, we consider the following coding problem:

Definition 7.1 (Channel codes with average power constraints). For an AGN channel $(\mathbb{R}, W, \mathbb{R})$, an $(n, M, P)$ code consists of codewords $x_m \in \mathbb{R}^n$ and associated decoding sets $D_m \subset \mathbb{R}^n$, $1 \le m \le M$, such that each codeword satisfies the average power constraint
$$\frac{1}{n}\sum_{i=1}^n x_{mi}^2 \le P, \qquad 1 \le m \le M.$$
The notions of average and maximum probability of error are defined as before (see the earlier lectures). A rate $R > 0$ is $\epsilon$-achievable for $W$ with average power constraint $P$ if there exist $(n, 2^{nR}, P)$ codes with maximum probability of error less than $\epsilon$, for all $n$ sufficiently large. The supremum over all $\epsilon$-achievable rates with average power constraint $P$ is called the $\epsilon$-capacity at power $P$, denoted $C_{\epsilon,P}(W)$. The capacity of the channel at power $P$, denoted $C_P(W)$, is given by $C_P(W) = \lim_{\epsilon \to 0} C_{\epsilon,P}(W)$.

As before, we will work under the maximum probability of error criterion; we can proceed as in an earlier lecture to show that the results remain the same under the average probability of error criterion.
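
To make Definition 7.1 concrete, here is a small sketch (not from the lecture): an arbitrary random codebook whose rows are rescaled so that every codeword meets the average power constraint with equality. The codebook is purely illustrative, not a capacity-achieving construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, P = 128, 16, 5.0

# Hypothetical codebook: i.i.d. Gaussian rows rescaled so that (1/n) * sum_i x_{m,i}^2 = P.
codebook = rng.normal(size=(M, n))
codebook *= np.sqrt(n * P) / np.linalg.norm(codebook, axis=1, keepdims=True)

avg_power = np.mean(codebook**2, axis=1)
print(avg_power.max() <= P + 1e-9)   # True: every codeword meets the average power constraint
```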

7.2 Capacity of AGN and the sphere packing bound

We shall establish the following result, which characterizes $C_P(W)$.

Theorem 7.2 (Capacity of AGN). Given an AGN channel with noise variance $\sigma^2$, $P > 0$, and $0 < \epsilon < 1$, we have
$$C_{\epsilon,P}(W) = C_P(W) = \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right).$$

The ratio $P/\sigma^2$ is often referred to as the allowed signal-to-noise ratio (SNR); it can be formally shown to be equivalent to the SNR for sending a signal over a band-limited AWGN channel. Thus, the formula for capacity can be restated as
$$C_P(W) = \frac{1}{2}\log\left(1 + \mathrm{SNR}\right),$$
perhaps the most famous formula of information theory. In fact, the result above claims more: it shows that the maximum possible rate will not increase even if a nonvanishing probability of error $\epsilon \in (0,1)$ is allowed.

We begin by proving the strong converse, i.e., for every $\epsilon \in (0,1)$, $C_{\epsilon,P}(W) \le \frac{1}{2}\log(1 + \mathrm{SNR})$. Conceptually, the AGN channel is very similar to the BSC, so let us recall our proof of the sphere packing bound for the BSC. However, the details here are somewhat technically involved; nevertheless, I have included the proof to show how the ideas go through. There were two parts to that proof. First, we showed that any set $D_m$ such that $W^n(D_m \mid x_m)$ is large must have cardinality greater than roughly $2^{nh(\delta)}$, where $\delta$ is the crossover probability. Second, there are no more than $2^n$ sequences in all. Thus, the number of codewords possible is no more than $2^{n(1 - h(\delta))}$.

When trying to follow the same recipe for the AGN channel, we have no trouble in extending the first step, with the understanding that cardinalities need to be replaced by volumes. Note that the proofs for the BSC essentially worked with bounds on the pmf; for the AGN channel, we need to work with bounds on probability densities.
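
Before turning to the proof, it may help to see Theorem 7.2's formula evaluated numerically. The sketch below (not part of the notes) takes the logarithm to base 2, so the answer is in bits per channel use; the SNR values are arbitrary examples.

```python
import numpy as np

def capacity_agn(P, sigma2):
    """C_P(W) = (1/2) * log2(1 + P / sigma2), in bits per channel use."""
    return 0.5 * np.log2(1.0 + P / sigma2)

for snr_db in [0, 10, 20, 30]:
    snr = 10 ** (snr_db / 10)   # SNR = P / sigma^2
    print(f"{snr_db:>2} dB SNR -> {capacity_agn(snr, 1.0):.3f} bits per channel use")
```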

Specifically, consider an $(n, M, P)$ code with maximum probability of error less than $\epsilon$. What is the minimum possible volume of a decoding set of this code? The following lemma answers this question.

Lemma 7.3. Let $D \subset \mathbb{R}^n$, $x \in \mathbb{R}^n$, and $0 < \epsilon < 1$ be such that, for an AGN channel $W$ with noise variance $\sigma^2$,
$$W^n(D \mid x) \ge 1 - \epsilon.$$
Then, for every $\eta > 0$ and $0 < \delta < 1 - \epsilon$, we have for all $n$ sufficiently large that
$$\mathrm{vol}(D) \ge \left(2\pi e\,\sigma^2\right)^{n/2} e^{-n\eta/2}\,(1 - \epsilon - \delta).$$

Remark 7.4. The proof of the lemma is simple, but the notation makes it look difficult. Note that for $n$ sufficiently large, a ball of radius $\rho$ in Euclidean space $\mathbb{R}^n$ has volume less than (see the Wikipedia article on volumes of $n$-balls and the references therein)
$$\left(\frac{2\pi e}{n}\right)^{n/2} \rho^n. \tag{1}$$
Thus, the result above says that any large-probability set has volume at least roughly that of a ball of radius $\sqrt{n}\,\sigma$. Also, note that by Chebyshev's inequality, when we send an input sequence $x$ over the AGN channel, with large probability we receive a $Y$ in a ball of radius roughly $\sqrt{n}\,\sigma$ around it. The result above says that any large-probability set must have at least as much volume as this ball.

Proof. Denote by $f_W^n(y \mid x)$ the density of the output, evaluated at $y$, when $x$ is sent. Note that for the AGN channel
$$f_W^n(y \mid x) = \left(2\pi\sigma^2\right)^{-n/2} e^{-\|y - x\|^2/(2\sigma^2)},$$
where $\|y - x\|$ denotes the Euclidean distance between $x$ and $y$. Therefore, if $\|y - x\| \ge \rho$, we have
$$f_W^n(y \mid x) \le \left(2\pi\sigma^2\right)^{-n/2} e^{-\rho^2/(2\sigma^2)}.$$

For $\eta > 0$, $\rho^2 = n\sigma^2(1 - \eta)$, and $n$ sufficiently large, we show that with large probability the output of the AGN channel lies outside the Euclidean ball of radius $\rho$ centered at $x$ when the input to the channel is $x$. Indeed, denoting by $B_\rho(x)$ the ball of radius $\rho$ around $x$, consider a random vector $Y = (Y_1, \dots, Y_n)$ with independent entries, the $i$th coordinate generated by $N(x_i, \sigma^2)$. Then, by Chebyshev's inequality,
$$P\left(\sum_{i=1}^n (Y_i - x_i)^2 \le n\sigma^2(1 - \eta)\right) \le \frac{\mathrm{Var}\left[\sum_{i=1}^n (Y_i - x_i)^2\right]}{\left(n\sigma^2\eta\right)^2} = \frac{\mathrm{Var}\left[(Y_1 - x_1)^2\right]}{n\sigma^4\eta^2},$$
where the right side goes to zero as $n$ goes to infinity. Equivalently,
$$\lim_{n\to\infty} P\left(Y \notin B_\rho(x)\right) = 1.$$
Therefore, given a set $D$ such that $W^n(D \mid x) \ge 1 - \epsilon$, we have
$$W^n\left(D \cap B_\rho(x)^c \mid x\right) \ge 1 - \epsilon - \delta,$$
for all $n$ sufficiently large. Thus,
$$1 - \epsilon - \delta \le W^n\left(D \cap B_\rho(x)^c \mid x\right) \le \mathrm{vol}\left(D \cap B_\rho(x)^c\right)\left(2\pi\sigma^2\right)^{-n/2} e^{-n(1-\eta)/2},$$
i.e.,
$$\mathrm{vol}(D) \ge \left(2\pi e\,\sigma^2\right)^{n/2} e^{-n\eta/2}\,(1 - \epsilon - \delta),$$
which completes the proof.
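
The sketch below (not part of the notes) checks the volume bound (1) from Remark 7.4 numerically, comparing the exact log-volume of an $n$-ball, $\pi^{n/2}\rho^n/\Gamma(n/2+1)$, against the bound $(2\pi e/n)^{n/2}\rho^n$, for a ball of radius $\sqrt{n}\,\sigma$; the value of $\sigma$ is an arbitrary choice.

```python
import math

def log_vol_ball(n, rho):
    """Exact log-volume of the n-dimensional ball of radius rho: pi^(n/2) rho^n / Gamma(n/2 + 1)."""
    return 0.5 * n * math.log(math.pi) + n * math.log(rho) - math.lgamma(0.5 * n + 1)

def log_vol_bound(n, rho):
    """Log of the upper bound (1): (2*pi*e/n)^(n/2) * rho^n."""
    return 0.5 * n * math.log(2 * math.pi * math.e / n) + n * math.log(rho)

sigma = 1.5   # arbitrary noise standard deviation
for n in [10, 100, 1000]:
    rho = math.sqrt(n) * sigma
    gap = log_vol_bound(n, rho) - log_vol_ball(n, rho)
    print(n, gap > 0, round(gap, 2))   # the bound holds; the gap grows only like (1/2)*log(pi*n)
```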

This brings us to the second question: where do the received vectors lie with large probability?

This question is difficult to answer with good precision; in fact, the final answer we will give is somewhat surprising. Naively, we know that each codeword $x_m$ satisfies $\|x_m\|^2 \le nP$. As remarked above, when we send a codeword $x_m$, with large probability we see an output within a radius of roughly $\sqrt{n}\,\sigma$ of this codeword. Thus, for any codeword $x_m$, the received vectors $y$ satisfy
$$\|y\| \le \|x_m\| + \|y - x_m\| \le \sqrt{n}\left(\sqrt{P} + \sigma\right),$$
with large probability. Therefore, with $\rho = \sqrt{n}\left(\sqrt{P} + \sigma\right)$, for any codeword $x_m$ the received vectors lie in $B_\rho(0)$ with large probability, i.e.,
$$W^n\left(B_\rho(0) \mid x_m\right) \ge 1 - \delta, \tag{2}$$
for all $n$ sufficiently large.

To obtain a converse bound, we combine this observation with Lemma 7.3 as follows. Consider $D'_m = D_m \cap B_\rho(0)$. Since $W^n(D_m \mid x_m) \ge 1 - \epsilon$, (2) implies that $W^n(D'_m \mid x_m) \ge 1 - \epsilon - \delta$. But then, by Lemma 7.3, each $D'_m$ satisfies
$$\mathrm{vol}(D'_m) \ge \left(2\pi e\,\sigma^2\right)^{n/2} e^{-n\eta/2}\,(1 - \epsilon - \delta).$$
Furthermore, all the $D'_m$ are disjoint and lie within $B_\rho(0)$. Thus, by (1) the sum of their volumes is less than
$$\left(\frac{2\pi e}{n}\right)^{n/2} \rho^n = \left(2\pi e\right)^{n/2}\left(\sqrt{P} + \sigma\right)^n.$$
Therefore, dividing the volume of $B_\rho(0)$ by the minimum possible volume of each $D'_m$, the maximum number $M$ of such disjoint sets we can have satisfies, up to terms that vanish in the exponent as $\eta \to 0$ and $n \to \infty$,
$$\frac{1}{n}\log M \le \log\left(1 + \frac{\sqrt{P}}{\sigma}\right).$$

But unfortunately $\log(1 + \sqrt{x})$ is in general more than $\frac{1}{2}\log(1 + x)$! Therefore, this bound does not yield a converse for our capacity result. What did we miss?

What we missed is an interesting fact about Gaussian random variables (in fact, about the so-called noncentral chi-squared random variables): we can show (2) with $\rho^2 \approx n\left(P + \sigma^2\right)$. Indeed, we can use Chebyshev's inequality to see this. When $x$ is sent, the received vector $Y = (Y_1, \dots, Y_n)$ has independent entries with $Y_i \sim N(x_i, \sigma^2)$. Thus,
$$E\left[\sum_{i=1}^n Y_i^2\right] = \sum_{i=1}^n \left(x_i^2 + \sigma^2\right) = \|x\|^2 + n\sigma^2.$$
But what about its variance? A straightforward calculation bounds each term as
$$\mathrm{Var}\left[Y_i^2\right] \le E\left[Y_i^4\right] = x_i^4 + 6\sigma^2 x_i^2 + 3\sigma^4.$$
But now we are in trouble: we only know that $\|x\|^2 \le nP$, and have no handle over $\sum_i x_i^4$. In fact, it is not easy to bound the variance in terms of just $\|x\|^2$. It is not easy, but it is indeed possible! The so-called Gaussian Poincaré inequality gives
$$\mathrm{Var}\left[\sum_{i=1}^n Y_i^2\right] \le 4\sigma^2\, E\left[\sum_{i=1}^n Y_i^2\right] = 4\sigma^2\left(\|x\|^2 + n\sigma^2\right).$$
Therefore, by Chebyshev's inequality, for every $\eta > 0$,
$$P\left(\sum_{i=1}^n Y_i^2 > \|x\|^2 + n\sigma^2 + n\eta\right) \le \frac{4\sigma^2}{n\eta^2}\left(\frac{\|x\|^2}{n} + \sigma^2\right).$$
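
A quick Monte Carlo check of this fact (a sketch with arbitrary values of $n$, $P$, $\sigma$, and an arbitrary input of power exactly $P$, none of which come from the notes): the empirical mean of $\|Y\|^2/n$ is close to $P + \sigma^2$, not $(\sqrt{P} + \sigma)^2$, and the empirical variance of $\|Y\|^2$ stays below the Gaussian Poincaré bound $4\sigma^2(\|x\|^2 + n\sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, P, sigma, trials = 2000, 4.0, 1.0, 5000

# Hypothetical input with ||x||^2 = n * P (any such x gives the same picture).
x = np.sqrt(P) * np.ones(n)

Y = x + rng.normal(0.0, sigma, size=(trials, n))
norms_sq = np.sum(Y**2, axis=1)

print("mean of ||Y||^2 / n     :", norms_sq.mean() / n)   # ~ P + sigma^2 = 5.0, not (sqrt(P)+sigma)^2 = 9.0
print("empirical Var[||Y||^2]  :", norms_sq.var())
print("Gaussian Poincare bound :", 4 * sigma**2 * (n * P + n * sigma**2))
```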

In particular, if $\|x\|^2 \le nP$, the bound above gives
$$P\left(\sum_{i=1}^n Y_i^2 > n\left(P + \sigma^2 + \eta\right)\right) \le \frac{4\sigma^2\left(P + \sigma^2\right)}{n\eta^2}.$$
Thus, for all $n$ sufficiently large and $\rho$ given by
$$\rho^2 = n\left(P + \sigma^2 + \eta\right),$$
for every $x$ such that $\|x\|^2 \le nP$ we have
$$P\left(Y \in B_\rho(0)\right) = W^n\left(B_\rho(0) \mid x\right) \ge 1 - \delta. \tag{3}$$
We can now complete our proof with (3) in place of (2). Let $D'_m = D_m \cap B_\rho(0)$. Then, by (3) and the assumption that $W^n(D_m \mid x_m) \ge 1 - \epsilon$, we have that $W^n(D'_m \mid x_m) \ge 1 - \epsilon - \delta$. Therefore, by Lemma 7.3, each $D'_m$ satisfies
$$\mathrm{vol}(D'_m) \ge \left(2\pi e\,\sigma^2\right)^{n/2} e^{-n\eta/2}\,(1 - \epsilon - \delta), \tag{4}$$
for all $n$ sufficiently large. On the other hand, since all the $D'_m$ are subsets of $B_\rho(0)$ and disjoint,
$$\sum_{m=1}^M \mathrm{vol}(D'_m) \le \mathrm{vol}\left(B_\rho(0)\right) \le \left(\frac{2\pi e}{n}\right)^{n/2}\rho^n = \left(2\pi e\right)^{n/2}\left(P + \sigma^2 + \eta\right)^{n/2}, \tag{5}$$
where the second inequality uses (1) and holds for all $n$ sufficiently large.

Thus, by combining (4) and (5), we get
$$\frac{1}{n}\log M \le \frac{1}{2}\log\frac{P + \sigma^2 + \eta}{\sigma^2} + \frac{\eta}{2} + \frac{1}{n}\log\frac{1}{1 - \epsilon - \delta}.$$
Thus, for every $\eta > 0$, every $0 < \delta < (1 - \epsilon)/2$, and $n$ sufficiently large,
$$\frac{1}{n}\log M \le \frac{1}{2}\log\left(1 + \frac{P + \eta}{\sigma^2}\right) + \frac{\eta}{2} + \frac{1}{n}\log\frac{2}{1 - \epsilon}.$$
By letting $n \to \infty$ and then taking $\eta \to 0$, we get
$$C_{\epsilon,P}(W) \le \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right),$$
which completes the proof of the strong converse.
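
Finally, the sphere-packing picture behind this converse can be summarized by a volume-ratio heuristic (a sketch, not part of the notes, with arbitrary $P$ and $\sigma^2$): the number of disjoint "noise balls" of radius $\sqrt{n\sigma^2}$ that fit, by volume, inside the output ball of radius $\sqrt{n(P + \sigma^2)}$ has exponent exactly $\frac{1}{2}\log\left(1 + P/\sigma^2\right)$ per channel use, since the Gamma factors in the two volumes cancel.

```python
import math

def log_vol_ball(n, rho):
    """Log-volume of the n-ball of radius rho: pi^(n/2) rho^n / Gamma(n/2 + 1)."""
    return 0.5 * n * math.log(math.pi) + n * math.log(rho) - math.lgamma(0.5 * n + 1)

P, sigma2 = 4.0, 1.0   # arbitrary illustrative values
for n in [100, 1000, 10000]:
    outer = log_vol_ball(n, math.sqrt(n * (P + sigma2)))   # where the received vectors live
    inner = log_vol_ball(n, math.sqrt(n * sigma2))         # roughly the minimum decoding-set volume
    # Per-dimension packing exponent in nats; identical for every n because the Gamma terms cancel.
    print(n, round((outer - inner) / n, 4), "vs", round(0.5 * math.log(1 + P / sigma2), 4))
```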