Information Dimension

Mina Karzand
Massachusetts Institute of Technology
November 16, 2011

Let $X$ be a real-valued random variable. For $m \in \mathbb{N}$, the $m$-point uniformly quantized version of $X$ is
$$\langle X \rangle_m = \frac{\lfloor mX \rfloor}{m},$$
so that $\langle X \rangle_m \in \mathbb{Z}/m$.

Lower information dimension: $\underline{d}(X) = \liminf_{m \to \infty} \dfrac{H(\langle X \rangle_m)}{\log m}$

Upper information dimension: $\overline{d}(X) = \limsup_{m \to \infty} \dfrac{H(\langle X \rangle_m)}{\log m}$
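
This quantize-and-count definition can be checked numerically. The following minimal sketch (my own, not from the talk; the function names are made up) estimates $H(\langle X \rangle_m)/\log m$ from i.i.d. samples with a plug-in entropy estimate; it needs the number of samples to be much larger than the number of occupied quantization cells.

```python
import numpy as np

def quantized_entropy_bits(samples, m):
    """Plug-in estimate of H(<X>_m) in bits, with <X>_m = floor(m X)/m."""
    _, counts = np.unique(np.floor(m * samples), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_dim_estimates(samples, levels=range(4, 11)):
    """H(<X>_{2^l}) / l for a range of l; the trend suggests d(X)."""
    return [quantized_entropy_bits(samples, 2 ** l) / l for l in levels]

rng = np.random.default_rng(0)
print(info_dim_estimates(rng.uniform(size=1_000_000)))        # tends to 1 (absolutely continuous)
print(info_dim_estimates(rng.integers(0, 4, 1_000_000) / 4))  # tends to 0 (discrete, 4 atoms)
```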

If d(x ) = d(x ), then Entropy of dimension d(x ): H( X m ) d(x ) = lim m log m Ĥ(X ) = lim m [H( X m) d(x ) log m] 4 / 26

If $H(\lfloor X^n \rfloor) < \infty$, then $0 \le \underline{d}(X^n) \le \overline{d}(X^n) \le n$.

If $\mathbb{E}[\log(1 + |X|)] < \infty$, then $\overline{d}(X) < \infty$.

It is sufficient to restrict to the exponential subsequence $m = 2^l$. Defining $[X]_l \triangleq \langle X \rangle_{2^l}$,
$$d(X) = \lim_{l \to \infty} \frac{H([X]_l)}{l}.$$

Translation invariance: for all $x^n \in \mathbb{R}^n$, $d(x^n + X^n) = d(X^n)$.

Scale invariance: for all $\alpha \neq 0$, $d(\alpha X^n) = d(X^n)$.

If $X^n$ and $Y^n$ are independent,
$$\max\{d(X^n), d(Y^n)\} \le d(X^n + Y^n) \le d(X^n) + d(Y^n).$$

If $\{X_i\}$ are independent and $d(X_i)$ exists for all $i$,
$$d(X^n) = \sum_{i=1}^n d(X_i).$$

If $X^n$, $Y^n$ and $Z^n$ are independent, then
$$d(X^n + Y^n + Z^n) + d(Z^n) \le d(X^n + Z^n) + d(Y^n + Z^n).$$
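
A quick numerical sanity check of the sum bounds (my own sketch, reusing the plug-in estimate from the earlier snippet): with $X$ discrete ($d = 0$) and $Y$ absolutely continuous ($d = 1$) independent, both bounds force $d(X + Y) = 1$. At finite $l$ the ratio $H(\langle \cdot \rangle_{2^l})/l$ is offset by roughly $\hat{H}(\cdot)/l$, so the discrete estimate decays like $H(X)/l$ while the continuous ones sit near 1.

```python
import numpy as np

def dim_est(z, l=12):
    """Plug-in estimate of H(<Z>_{2^l}) / l in bits."""
    _, counts = np.unique(np.floor(2 ** l * z), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p)) / l

rng = np.random.default_rng(1)
x = rng.integers(0, 3, 1_000_000).astype(float)  # discrete: d(X) = 0, H(X) = log2(3) bits
y = rng.uniform(size=1_000_000)                  # absolutely continuous: d(Y) = 1
print(dim_est(x))      # ~ log2(3)/12 ~ 0.13, decaying with l
print(dim_est(y))      # ~ 1.0
print(dim_est(x + y))  # ~ 1 + log2(3)/12, consistent with d(X + Y) = 1
```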

A probability distribution can be uniquely represented as the mixture
$$\nu = p\,\nu_d + q\,\nu_c + r\,\nu_s, \qquad p + q + r = 1,$$
where
$\nu_d$: purely atomic probability measure (discrete part),
$\nu_c$: absolutely continuous probability measure,
$\nu_s$: probability measure singular with respect to Lebesgue measure and with no atoms (singular continuous part).

Theorem: Let $X$ be a random variable such that $H(\lfloor X \rfloor) < \infty$, whose distribution can be represented as
$$\nu = (1 - \rho)\,\nu_d + \rho\,\nu_c.$$
Then $d(X) = \rho$ and
$$\hat{H}(X) = (1 - \rho) H(\nu_d) + \rho\, h(\nu_c) + h_b(\rho).$$
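
Both $d(X) = \rho$ and the constant term $\hat{H}(X)$ can be checked numerically. The sketch below (my own, with made-up names) samples a mixture with continuous weight $\rho = 0.3$, a uniform discrete part on four atoms ($H(\nu_d) = 2$ bits) and a Uniform$[0,1]$ continuous part ($h(\nu_c) = 0$), so $\hat{H}(X) = 0.7 \cdot 2 + 0 + h_b(0.3) \approx 2.28$ bits; accordingly $H(\langle X \rangle_{2^l}) - \rho\, l$ should stabilize near that value while $H(\langle X \rangle_{2^l})/l$ drifts slowly toward $\rho$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 2_000_000, 0.3
is_cont = rng.random(n) < rho
x = np.where(is_cont, rng.uniform(size=n), rng.integers(0, 4, n) / 4.0)

for l in (6, 8, 10, 12):
    _, counts = np.unique(np.floor(2 ** l * x), return_counts=True)
    p = counts / counts.sum()
    H = -np.sum(p * np.log2(p))        # H(<X>_{2^l}) in bits
    print(l, H / l, H - rho * l)       # ratio drifts toward 0.3, offset ~ 2.28 bits
```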

The Rényi entropy of order $\alpha$ of a discrete random variable $Y$:
$$H_\alpha(Y) = \begin{cases} \displaystyle\sum_y p_y \log \frac{1}{p_y}, & \alpha = 1,\\[4pt] \displaystyle\log \frac{1}{\max_y p_y}, & \alpha = \infty,\\[4pt] \displaystyle\frac{1}{1-\alpha} \log \Big( \sum_y p_y^\alpha \Big), & \alpha \neq 1, \infty. \end{cases}$$

$$\underline{d}_\alpha(X) = \liminf_{m \to \infty} \frac{H_\alpha(\langle X \rangle_m)}{\log m}, \qquad \overline{d}_\alpha(X) = \limsup_{m \to \infty} \frac{H_\alpha(\langle X \rangle_m)}{\log m},$$
$$\hat{H}_\alpha(X) = \lim_{m \to \infty} \big[ H_\alpha(\langle X \rangle_m) - d_\alpha(X) \log m \big].$$
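
For concreteness, a minimal helper (my own) computing $H_\alpha$ of a pmf in bits, covering the three cases above:

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha of a pmf, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if alpha == 1:
        return -np.sum(p * np.log2(p))   # Shannon entropy
    if alpha == np.inf:
        return -np.log2(p.max())         # min-entropy
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

pmf = [0.5, 0.25, 0.25]
print(renyi_entropy(pmf, 1))       # 1.5
print(renyi_entropy(pmf, 2))       # log2(1/0.375) ~ 1.415
print(renyi_entropy(pmf, np.inf))  # 1.0
```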

Theorem: Let $X$ be a real random variable satisfying $H_\alpha(\lfloor X \rfloor) < \infty$, with distribution represented as
$$\nu = p\,\nu_d + q\,\nu_c + r\,\nu_s.$$
Then:

For $\alpha > 1$: if $p > 0$ ($X$ has a discrete component), then $d_\alpha(X) = 0$ and
$$\hat{H}_\alpha(X) = H_\alpha(\nu_d) + \frac{\alpha}{1-\alpha} \log p.$$

For $\alpha < 1$: if $q > 0$ ($X$ has an absolutely continuous component), then $d_\alpha(X) = 1$ and
$$\hat{H}_\alpha(X) = h_\alpha(\nu_c) + \frac{\alpha}{1-\alpha} \log q.$$

The dyadic expansion of $X \in [0, 1]$ can be written as
$$X = \sum_{j=1}^{\infty} (X)_j\, 2^{-j},$$
and there is a one-to-one correspondence between $X$ and the binary random process $\{(X)_j,\ j \in \mathbb{N}\}$.
$$\underline{d}(X) = \liminf_{i \to \infty} \frac{H((X)_1, (X)_2, \dots, (X)_i)}{i}, \qquad \overline{d}(X) = \limsup_{i \to \infty} \frac{H((X)_1, (X)_2, \dots, (X)_i)}{i}.$$
Random variables whose lower and upper information dimensions differ can be constructed from processes with different lower and upper entropy rates.

Cantor Distribution
$$C_0 = [0, 1]$$
$$C_1 = [0, 1/3] \cup [2/3, 1]$$
$$C_2 = [0, 1/9] \cup [2/9, 1/3] \cup [2/3, 7/9] \cup [8/9, 1]$$
$$C_3 = \dots$$
The support of the Cantor distribution is the Cantor set $\bigcap_{i=1}^{\infty} C_i$.
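
The Cantor distribution is singular (no discrete and no absolutely continuous part), and its information dimension is $\log 2 / \log 3 \approx 0.63$. The sketch below (my own) samples it via a random ternary expansion with digits uniform on $\{0, 2\}$ and estimates $H(\langle X \rangle_{2^l})/l$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, depth = 1_000_000, 30
x = np.zeros(n)
for k in range(1, depth + 1):            # X = sum_k d_k 3^{-k}, d_k uniform on {0, 2}
    x += 2 * rng.integers(0, 2, size=n) * 3.0 ** -k

for l in (6, 9, 12):
    _, counts = np.unique(np.floor(2 ** l * x), return_counts=True)
    p = counts / counts.sum()
    print(l, -np.sum(p * np.log2(p)) / l)  # ~ log(2)/log(3) ~ 0.63
```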

Degrees of freedom of the interference channel

Channel model: $K$-user real-valued memoryless Gaussian interference channel with a fixed deterministic channel matrix $H = [h_{ij}]$ (known at the encoders and decoders), where at each symbol epoch the $i$-th user transmits $X_i$ and the $i$-th decoder receives
$$Y_i = \sqrt{\mathrm{snr}} \sum_{j=1}^{K} h_{ij} X_j + N_i,$$
where $\{X_i, N_i\}_{i=1}^{K}$ are independent with $\mathbb{E}[X_i^2] \le 1$ and $N_i \sim \mathcal{N}(0, 1)$.

Sum-rate capacity:
$$C(H, \mathrm{snr}) \triangleq \max \Big\{ \sum_{i=1}^{K} R_i : R^K \in \mathcal{C}(H, \mathrm{snr}) \Big\},$$
where $\mathcal{C}(H, \mathrm{snr})$ is the capacity region.

Degrees of freedom (multiplexing gain):
$$\mathrm{DOF}(H) = \lim_{\mathrm{snr} \to \infty} \frac{C(H, \mathrm{snr})}{\frac{1}{2} \log \mathrm{snr}}.$$

Theorem: Let $X$ be independent of $N$, a standard normal random variable, and denote $I(X, \mathrm{snr}) = I(X; \sqrt{\mathrm{snr}}\, X + N)$. Then
$$\lim_{\mathrm{snr} \to \infty} \frac{I(X, \mathrm{snr})}{\frac{1}{2} \log \mathrm{snr}} = d(X).$$
The mutual information is therefore maximized asymptotically by any absolutely continuous input distribution, for which $d(X) = 1$.
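
A numerical illustration of this scaling (my own sketch): a BPSK input is discrete with $d(X) = 0$, so $I(X, \mathrm{snr})$ saturates at $\log 2$ nats and its ratio to $\frac{1}{2}\log\mathrm{snr}$ decays to $0$, while a Gaussian input is absolutely continuous with $d(X) = 1$ and $I = \frac{1}{2}\log(1 + \mathrm{snr})$, so the ratio tends to $1$.

```python
import numpy as np

def bpsk_mutual_info(snr):
    """I(X; sqrt(snr) X + N) in nats, X uniform on {-1,+1}, N ~ N(0,1), by numerical integration."""
    s = np.sqrt(snr)
    y = np.linspace(-s - 10, s + 10, 200_001)
    phi = lambda t: np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)
    p_y = 0.5 * phi(y - s) + 0.5 * phi(y + s)                       # output density
    h_y = -np.sum(p_y * np.log(p_y + 1e-300)) * (y[1] - y[0])       # differential entropy h(Y)
    return h_y - 0.5 * np.log(2 * np.pi * np.e)                     # I = h(Y) - h(N)

for snr in (1e2, 1e4, 1e6):
    half_log = 0.5 * np.log(snr)
    print(snr,
          bpsk_mutual_info(snr) / half_log,     # -> 0  (d(X) = 0)
          0.5 * np.log(1 + snr) / half_log)     # -> 1  (d(X) = 1)
```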

Information dimension under projection

Almost every projection preserves the information dimension, but computing the dimension of an individual projection is in general difficult.

Theorem: Let $A \in \mathbb{R}^{m \times n}$ with $m \le n$. Then for any $X^n$,
$$d(A X^n) \le \min\{d(X^n), \mathrm{rank}(A)\}.$$

Theorem: Let $\alpha \in (1, 2]$ and $m \le n$. Then for almost every $A \in \mathbb{R}^{m \times n}$,
$$d_\alpha(A X^n) = \min\{d_\alpha(X^n), m\}.$$

Theorem: Let
$$\mathrm{dof}(X^K, H) \triangleq \sum_{i=1}^{K} \bigg[ d\Big( \sum_{j=1}^{K} h_{ij} X_j \Big) - d\Big( \sum_{j \neq i} h_{ij} X_j \Big) \bigg].$$
Then
$$\mathrm{DOF}(H) = \sup_{X^K} \mathrm{dof}(X^K, H),$$
where the supremum is over independent $X_1, X_2, \dots, X_K$ such that $H(\lfloor X_i \rfloor) \le C$ for some fixed $C > 0$.

The result also applies to non-Gaussian noise, as long as the noise has finite non-Gaussianness, $D(N \,\|\, N_G) < \infty$.

$$\mathrm{dof}(X^K, H) = \sum_{i=1}^{K} \bigg[ \underbrace{d\Big( \sum_{j=1}^{K} h_{ij} X_j \Big)}_{\text{info. dim. of the } i\text{-th user}} - \underbrace{d\Big( \sum_{j \neq i} h_{ij} X_j \Big)}_{\text{info. dim. of the interference}} \bigg]$$

$$C(H, \mathrm{snr}) = \lim_{n \to \infty} \frac{1}{n} \sup_{X_1^n, \dots, X_K^n} \sum_{i=1}^{K} I(X_i^n; Y_i^n),$$
where $X_i^n = [X_{i,1}, X_{i,2}, \dots, X_{i,n}]$ is the $i$-th user's input and the sup is over independent $X_1^n, \dots, X_K^n$.
$$I(X_i^n; Y_i^n) = I(X_1^n, \dots, X_K^n; Y_i^n) - I(X_1^n, \dots, X_K^n; Y_i^n \mid X_i^n) = I\Big( \sum_{j=1}^{K} h_{ij} X_j^n, \mathrm{snr} \Big) - I\Big( \sum_{j \neq i} h_{ij} X_j^n, \mathrm{snr} \Big)$$

$$\mathrm{DOF}(H) = \lim_{\mathrm{snr} \to \infty} \lim_{n \to \infty} \sup_{X_1^n, \dots, X_K^n} \frac{1}{\frac{n}{2} \log \mathrm{snr}} \sum_{i=1}^{K} \bigg[ I\Big( \sum_{j=1}^{K} h_{ij} X_j^n, \mathrm{snr} \Big) - I\Big( \sum_{j \neq i} h_{ij} X_j^n, \mathrm{snr} \Big) \bigg]$$
$$= \lim_{n \to \infty} \sup_{X_1^n, \dots, X_K^n} \lim_{\mathrm{snr} \to \infty} \frac{1}{\frac{n}{2} \log \mathrm{snr}} \sum_{i=1}^{K} \bigg[ I\Big( \sum_{j=1}^{K} h_{ij} X_j^n, \mathrm{snr} \Big) - I\Big( \sum_{j \neq i} h_{ij} X_j^n, \mathrm{snr} \Big) \bigg]$$
$$I(\cdot, \mathrm{snr}) = \frac{d(\cdot)}{2} \log \mathrm{snr} + o(\log \mathrm{snr})$$

$$\mathrm{DOF}(H) = \lim_{n \to \infty} \sup_{X_1^n, \dots, X_K^n} \frac{1}{n} \sum_{i=1}^{K} \bigg[ d\Big( \sum_{j=1}^{K} h_{ij} X_j^n \Big) - d\Big( \sum_{j \neq i} h_{ij} X_j^n \Big) \bigg]$$

SINGLE LETTERIZATION AND EXAMPLES

Two-user IC:
$$\mathrm{DOF}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \sup_{X_1 \perp X_2} \big[ d(aX_1 + bX_2) + d(cX_1 + dX_2) - d(bX_2) - d(cX_1) \big] = \begin{cases} 0, & a = d = 0,\\ 2, & a \neq 0,\ d \neq 0,\ b = c = 0,\\ 1, & \text{otherwise.} \end{cases}$$
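
As a quick check of the "otherwise" case (my own worked evaluation, assuming all four gains are nonzero): take $X_1$ discrete with $H(\lfloor X_1 \rfloor) < \infty$ and $X_2$ absolutely continuous, independent. Then
$$d(aX_1 + bX_2) = 1, \qquad d(cX_1 + dX_2) = 1, \qquad d(bX_2) = 1, \qquad d(cX_1) = 0,$$
so this input choice attains $\mathrm{dof} = 1 + 1 - 1 - 0 = 1$, matching the value above.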

Many-to-one IC:
$$\mathrm{DOF}\begin{pmatrix} h_{11} & h_{12} & h_{13} & \cdots & h_{1K} \\ 0 & h_{22} & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & h_{KK} \end{pmatrix} = K - 1.$$
Achieved by choosing $X_1$ discrete and the rest absolutely continuous.

One-to-many IC:
$$\mathrm{DOF}\begin{pmatrix} h_{11} & 0 & \cdots & 0 \\ h_{21} & h_{22} & 0 & \cdots \\ \vdots & & \ddots & \\ h_{K1} & 0 & \cdots & h_{KK} \end{pmatrix} = K - 1.$$
Achieved by choosing $X_1$ discrete and the rest absolutely continuous.

MAC:
$$\mathrm{DOF}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix} = 1.$$
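
These example values can be reproduced by evaluating $\mathrm{dof}(X^K, H)$ symbolically for a given choice of which inputs are discrete and which are absolutely continuous, using the rule from the achievability arguments above: a sum of independent inputs has information dimension 1 if any input entering it with a nonzero gain is absolutely continuous, and 0 if all contributing inputs are discrete with finite $H(\lfloor X_i \rfloor)$. The sketch below (my own; only the zero pattern of $H$ matters for this rule, so the nonzero gains are set to 1) recovers $K-1$, $K-1$ and $1$; note that for the MAC the achieving choice is one absolutely continuous input and the rest discrete.

```python
import numpy as np

def d_of_sum(gains, is_cont):
    """Info. dimension of sum_j gains[j]*X_j: 1 if any active input is continuous, else 0."""
    active = [c for g, c in zip(gains, is_cont) if g != 0]
    return 1 if any(active) else 0

def dof(H, is_cont):
    """Evaluate dof(X^K, H) under the discrete/continuous assignment is_cont."""
    H = np.asarray(H)
    total = 0
    for i in range(H.shape[0]):
        interference = [0 if j == i else H[i, j] for j in range(H.shape[1])]
        total += d_of_sum(H[i], is_cont) - d_of_sum(interference, is_cont)
    return total

K = 4
many_to_one = np.eye(K); many_to_one[0, :] = 1    # receiver 1 hears everyone
one_to_many = np.eye(K); one_to_many[:, 0] = 1    # everyone hears transmitter 1
mac = np.ones((K, K))

print(dof(many_to_one, [False] + [True] * (K - 1)))   # K - 1 = 3
print(dof(one_to_many, [False] + [True] * (K - 1)))   # K - 1 = 3
print(dof(mac,         [True] + [False] * (K - 1)))   # 1
```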

Information Dimension and Rate Distortion Theory

For a scalar source and MSE distortion, whenever $d(X)$ exists and is finite, as $D \to 0$,
$$R_X(D) = \frac{d(X)}{2} \log \frac{1}{D} + o(\log D).$$

$X$ discrete with $H(X) < \infty$: $\quad R_X(D) = H(X) + o(1)$

$X$ continuous with $h(X) > -\infty$: $\quad R_X(D) = \dfrac{1}{2} \log \dfrac{1}{2\pi e D} + h(X) + o(1)$

$X$ discrete-continuous mixed: $\quad R_X(D) = \dfrac{\rho}{2} \log \dfrac{1}{D} + \hat{H}(X) + o(1)$
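
As a sanity check of the absolutely continuous case (my own worked example): for $X \sim \mathcal{N}(0, \sigma^2)$ we have $h(X) = \frac{1}{2} \log(2\pi e \sigma^2)$, so the formula gives
$$R_X(D) = \frac{1}{2} \log \frac{1}{2\pi e D} + \frac{1}{2} \log(2\pi e \sigma^2) + o(1) = \frac{1}{2} \log \frac{\sigma^2}{D} + o(1),$$
which agrees with the exact Gaussian rate-distortion function $\frac{1}{2} \log \frac{\sigma^2}{D}$ for $D \le \sigma^2$.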