Information Dimension
Mina Karzand
Massachusetts Institute of Technology
November 16, 2011
Let $X$ be a real-valued random variable. For $m \in \mathbb{N}$, the $m$-point uniformly quantized version of $X$ is
$$\langle X \rangle_m = \frac{\lfloor mX \rfloor}{m},$$
so that $\langle X \rangle_m \in \mathbb{Z}/m$.

Lower information dimension: $\underline{d}(X) = \liminf_{m \to \infty} \frac{H(\langle X \rangle_m)}{\log m}$

Upper information dimension: $\overline{d}(X) = \limsup_{m \to \infty} \frac{H(\langle X \rangle_m)}{\log m}$
If $\underline{d}(X) = \overline{d}(X)$, the information dimension of $X$ is
$$d(X) = \lim_{m \to \infty} \frac{H(\langle X \rangle_m)}{\log m},$$
and the entropy of dimension $d(X)$ is
$$\hat{H}(X) = \lim_{m \to \infty} \left[ H(\langle X \rangle_m) - d(X) \log m \right].$$
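A minimal numerical sketch of these definitions (added; not from the slides): quantize i.i.d. samples of $X$ at $m = 2^l$ and divide a plug-in entropy estimate by $\log m$. The sample size, the levels $l$, and the two test distributions are illustrative choices; the plug-in estimator is only trustworthy while the sample size is much larger than the number of occupied quantization cells.

```python
import numpy as np

def empirical_entropy(cells):
    """Plug-in Shannon entropy (in nats) of the empirical distribution."""
    _, counts = np.unique(cells, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def info_dim_estimate(x, l):
    """Estimate H(<X>_m) / log m at m = 2**l from i.i.d. samples x."""
    m = 2 ** l
    return empirical_entropy(np.floor(m * x)) / np.log(m)

rng = np.random.default_rng(0)
n = 10 ** 6
examples = {
    "Uniform[0,1] (absolutely continuous, d = 1)": rng.random(n),
    "uniform on {0, 0.1, ..., 0.9} (discrete, d = 0)": rng.integers(0, 10, n) / 10,
}
for name, x in examples.items():
    print(name, [round(info_dim_estimate(x, l), 3) for l in (4, 8, 12)])
```

For the continuous example the estimates stay near 1; for the discrete one $H(\langle X \rangle_m)$ saturates at $\log 10$, so the ratio decays (slowly) to 0.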
- If $H(\lfloor X^n \rfloor) < \infty$, then $0 \le \underline{d}(X^n) \le \overline{d}(X^n) \le n$.
- If $E[\log(1 + |X|)] < \infty$, then $\overline{d}(X) < \infty$.
- It is sufficient to restrict attention to the exponential subsequence $m = 2^l$. Defining $[X]_l \triangleq \langle X \rangle_{2^l}$ and measuring entropy in bits,
$$d(X) = \lim_{l \to \infty} \frac{H([X]_l)}{l}.$$
- Translation invariance: for all $x^n \in \mathbb{R}^n$, $d(x^n + X^n) = d(X^n)$.
- Scale invariance: for all $\alpha \neq 0$, $d(\alpha X^n) = d(X^n)$.
- If $X^n$ and $Y^n$ are independent,
$$\max\{d(X^n), d(Y^n)\} \le d(X^n + Y^n) \le d(X^n) + d(Y^n).$$
- If $\{X_i\}$ are independent and $d(X_i)$ exists for all $i$,
$$d(X^n) = \sum_{i=1}^{n} d(X_i).$$
- If $X^n$, $Y^n$ and $Z^n$ are independent, then
$$d(X^n + Y^n + Z^n) + d(Z^n) \le d(X^n + Z^n) + d(Y^n + Z^n).$$
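As an added illustration of the bounds for independent sums (not on the slides): if $X$ is uniform on $[0,1]$ and $Y$ is an independent discrete random variable with finite entropy, then $X + Y$ is absolutely continuous, so $d(X + Y) = 1 = \max\{d(X), d(Y)\}$ and the lower bound is tight; if instead $X$ and $Y$ are independent and both uniform on $[0,1]$, then $X + Y$ is still absolutely continuous and $d(X + Y) = 1 < d(X) + d(Y) = 2$, so the upper bound need not be tight.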
A probability distribution can be uniquely represented as the mixture
$$v = p\, v_d + q\, v_c + r\, v_s, \qquad p + q + r = 1,$$
where
- $v_d$ is a purely atomic probability measure (discrete part),
- $v_c$ is an absolutely continuous probability measure,
- $v_s$ is a probability measure singular with respect to Lebesgue measure.
Theorem: Let $X$ be a random variable such that $H(\lfloor X \rfloor) < \infty$ whose distribution can be represented as
$$v = (1 - \rho)\, v_d + \rho\, v_c.$$
Then $d(X) = \rho$ and
$$\hat{H}(X) = (1 - \rho)\, H(v_d) + \rho\, h(v_c) + h_b(\rho),$$
where $H(v_d)$ is the (discrete) entropy of the atomic part, $h(v_c)$ is the differential entropy of the absolutely continuous part, and $h_b(\cdot)$ is the binary entropy function.
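A quick Monte Carlo illustration of this theorem (an added sketch; the weight $\rho = 0.3$, the atoms $\{0, 1/2\}$, and the Uniform[0,1] continuous part are arbitrary choices). Since $\hat{H}(X)$ is finite here, successive differences $H([X]_{l+1}) - H([X]_l)$ converge to $d(X) \log 2$, much faster than the ratio $H(\langle X \rangle_m)/\log m$ itself:

```python
import numpy as np

def H_hat(x, l):
    """Plug-in entropy (nats) of <X>_m at m = 2**l."""
    _, counts = np.unique(np.floor(2.0 ** l * x), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(1)
n, rho = 10 ** 6, 0.3
# Mixture: with probability rho draw Uniform[0,1], otherwise an atom from {0, 1/2}.
cont = rng.random(n) < rho
x = np.where(cont, rng.random(n), rng.integers(0, 2, n) / 2)

# (H([X]_{l+1}) - H([X]_l)) / log 2 should approach d(X) = rho = 0.3.
for l in (6, 8, 10):
    print(l, round((H_hat(x, l + 1) - H_hat(x, l)) / np.log(2), 3))
```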
Rényi entropy of order $\alpha$ of a discrete random variable $Y$:
$$H_\alpha(Y) = \begin{cases} \sum_y p_y \log \frac{1}{p_y}, & \alpha = 1, \\[4pt] \log \frac{1}{\max_y p_y}, & \alpha = \infty, \\[4pt] \frac{1}{1 - \alpha} \log \Big( \sum_y p_y^\alpha \Big), & \alpha \neq 1, \infty. \end{cases}$$

Information dimension of order $\alpha$:
$$\underline{d}_\alpha(X) = \liminf_{m \to \infty} \frac{H_\alpha(\langle X \rangle_m)}{\log m}, \qquad \overline{d}_\alpha(X) = \limsup_{m \to \infty} \frac{H_\alpha(\langle X \rangle_m)}{\log m},$$
$$\hat{H}_\alpha(X) = \lim_{m \to \infty} \left[ H_\alpha(\langle X \rangle_m) - d_\alpha(X) \log m \right].$$
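A small helper (an added sketch) computing $H_\alpha$ of a probability vector, with the $\alpha = 1$ and $\alpha = \infty$ cases handled separately as in the definition above; the example also illustrates that $H_\alpha$ is non-increasing in $\alpha$:

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy (in nats) of a probability vector p, for order alpha > 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # zero-probability outcomes do not contribute
    if alpha == 1:                # Shannon entropy (the alpha -> 1 limit)
        return -np.sum(p * np.log(p))
    if alpha == np.inf:           # min-entropy
        return -np.log(p.max())
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

p = [0.5, 0.25, 0.25]
print([round(renyi_entropy(p, a), 4) for a in (0.5, 1, 2, np.inf)])
# -> [1.0696, 1.0397, 0.9808, 0.6931], decreasing in alpha
```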
Theorem: Let $X$ be a real random variable satisfying $H_\alpha(\lfloor X \rfloor) < \infty$, with distribution represented as
$$v = p\, v_d + q\, v_c + r\, v_s.$$
Then:
- For $\alpha > 1$: if $p > 0$ ($X$ has a discrete component), then $d_\alpha(X) = 0$ and
$$\hat{H}_\alpha(X) = H_\alpha(v_d) + \frac{\alpha}{1 - \alpha} \log p.$$
- For $\alpha < 1$: if $q > 0$ ($X$ has an absolutely continuous component), then $d_\alpha(X) = 1$ and
$$\hat{H}_\alpha(X) = h_\alpha(v_c) + \frac{\alpha}{1 - \alpha} \log q.$$
The dyadic expansion of $X \in [0, 1)$ can be written as
$$X = \sum_{j=1}^{\infty} (X)_j\, 2^{-j},$$
so there is a one-to-one correspondence between $X$ and the binary random process $\{(X)_j, j \in \mathbb{N}\}$. Measuring entropy in bits,
$$\underline{d}(X) = \liminf_{i \to \infty} \frac{H((X)_1, (X)_2, \ldots, (X)_i)}{i}, \qquad \overline{d}(X) = \limsup_{i \to \infty} \frac{H((X)_1, (X)_2, \ldots, (X)_i)}{i}.$$
Random variables whose lower and upper information dimensions differ can therefore be constructed from binary processes with different lower and upper entropy rates.
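For instance (an added sketch of one such construction; the doubling block lengths are an arbitrary choice): let the bits $(X)_j$ be i.i.d. fair coin flips on the blocks of length $2^j$ with $j$ even, and deterministic zeros on the blocks with $j$ odd. Then $H((X)_1, \ldots, (X)_i)$ in bits is simply the number of coin-flip positions among the first $i$:

```python
import numpy as np

# Blocks of lengths 2**j, j = 0, 1, 2, ...: i.i.d. fair bits for even j,
# deterministic zeros for odd j.  In bits, H((X)_1, ..., (X)_i) equals the
# number of coin-flip positions among the first i.
fair = np.concatenate([np.full(2 ** j, j % 2 == 0) for j in range(20)])
ratios = np.cumsum(fair) / np.arange(1, fair.size + 1)

print(round(ratios[2 ** 19 - 2], 4))   # end of random block j = 18: ~ 2/3
print(round(ratios[2 ** 20 - 2], 4))   # end of zero block  j = 19: ~ 1/3
```

Here $H((X)_1, \ldots, (X)_i)/i$ oscillates between $1/3$ and $2/3$ in the limit, so $\underline{d}(X) = 1/3 < \overline{d}(X) = 2/3$.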
Cantor Distribution

$C_0 = [0, 1]$
$C_1 = [0, 1/3] \cup [2/3, 1]$
$C_2 = [0, 1/9] \cup [2/9, 1/3] \cup [2/3, 7/9] \cup [8/9, 1]$
$C_3 = \cdots$

The support of the Cantor distribution is the Cantor set $\bigcap_{i=1}^{\infty} C_i$.
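An added aside: under the Cantor distribution the ternary digits are i.i.d. uniform on $\{0, 2\}$, so $\langle X \rangle_{3^l}$ is uniform over the $2^l$ cells of $C_l$, giving $H(\langle X \rangle_{3^l}) = l \log 2$ and $d(X) = \frac{\log 2}{\log 3} \approx 0.6309$. A minimal sampling check (sample size and truncation depth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, depth = 2 * 10 ** 5, 25

# Sample the Cantor distribution: ternary digits i.i.d. uniform on {0, 2},
# truncated at `depth` digits (truncation does not move a sample out of
# its level-8 quantization cell).
digits = 2 * rng.integers(0, 2, size=(n, depth))
x = (digits * 3.0 ** -np.arange(1, depth + 1)).sum(axis=1)

for l in (2, 4, 6, 8):
    m = 3 ** l
    _, counts = np.unique(np.floor(m * x), return_counts=True)
    p = counts / n
    print(l, round(-np.sum(p * np.log(p)) / np.log(m), 4))  # -> log 2 / log 3
```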
Degrees of freedom of the interference channel

Channel model: $K$-user real-valued memoryless Gaussian interference channel with a fixed deterministic channel matrix $H = [h_{ij}]$ (known at the encoders and decoders), where at each symbol epoch the $i$-th user transmits $X_i$ and the $i$-th decoder receives
$$Y_i = \sqrt{\mathrm{snr}} \sum_{j=1}^{K} h_{ij} X_j + N_i,$$
where $\{X_i, N_i\}_{i=1}^{K}$ are independent with $E[X_i^2] \le 1$ and $N_i \sim \mathcal{N}(0, 1)$.
Sum-rate capacity:
$$C(H, \mathrm{snr}) = \max \left\{ \sum_{i=1}^{K} R_i : R^K \in \mathcal{C}(H, \mathrm{snr}) \right\},$$
where $\mathcal{C}(H, \mathrm{snr})$ is the capacity region. Degrees of freedom, or multiplexing gain:
$$\mathrm{DOF}(H) = \lim_{\mathrm{snr} \to \infty} \frac{C(H, \mathrm{snr})}{\frac{1}{2} \log \mathrm{snr}}$$
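For orientation (added examples, not on the slides): for the point-to-point Gaussian channel ($K = 1$, $H = [1]$), $C(H, \mathrm{snr}) = \frac{1}{2}\log(1 + \mathrm{snr})$, so $\mathrm{DOF}(H) = 1$; for $K$ interference-free parallel channels ($H = I_K$), $C(H, \mathrm{snr}) = \frac{K}{2}\log(1 + \mathrm{snr})$ and $\mathrm{DOF}(H) = K$.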
Theorem: Let $X$ be independent of $N$, a standard normal random variable, and denote
$$I(X, \mathrm{snr}) = I(X;\, \sqrt{\mathrm{snr}}\, X + N).$$
Then
$$\lim_{\mathrm{snr} \to \infty} \frac{I(X, \mathrm{snr})}{\frac{1}{2} \log \mathrm{snr}} = d(X).$$
The mutual information is therefore asymptotically maximized by any absolutely continuous input distribution, for which $d(X) = 1$.
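Two sanity checks of this theorem (added; both are standard facts):
$$X \sim \mathcal{N}(0, 1): \quad I(X, \mathrm{snr}) = \tfrac{1}{2}\log(1 + \mathrm{snr}) \;\Longrightarrow\; \frac{I(X, \mathrm{snr})}{\tfrac{1}{2}\log \mathrm{snr}} \to 1 = d(X),$$
$$X \text{ discrete},\ H(X) < \infty: \quad I(X, \mathrm{snr}) \le H(X) \;\Longrightarrow\; \frac{I(X, \mathrm{snr})}{\tfrac{1}{2}\log \mathrm{snr}} \to 0 = d(X).$$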
Information dimension under projection

Almost every projection preserves the dimension, but computing the dimension of an individual projection is in general difficult.

Theorem: Let $A \in \mathbb{R}^{m \times n}$ with $m \le n$. Then for any $X^n$,
$$d(AX^n) \le \min\{d(X^n), \mathrm{rank}(A)\}.$$

Theorem: Let $\alpha \in (1, 2]$ and $m \le n$. Then for almost every $A \in \mathbb{R}^{m \times n}$,
$$d_\alpha(AX^n) = \min\{d_\alpha(X^n), m\}.$$
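An added example of why individual projections are delicate: take $X^2 = (X_1, X_2)$ with $X_1$ uniform on $[0,1]$ and $X_2$ an independent discrete random variable with finite entropy, so $d(X^2) = 1 + 0 = 1$. For $A = [a_1\ a_2]$ with $a_1 \neq 0$, $AX^2 = a_1 X_1 + a_2 X_2$ is absolutely continuous and $d(AX^2) = 1 = \min\{d(X^2), \mathrm{rank}(A)\}$; but on the exceptional set $a_1 = 0$, $a_2 \neq 0$, the projection $a_2 X_2$ is discrete and $d(AX^2) = 0$, so the inequality in the first theorem is strict.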
Theorem: Let
$$\mathrm{dof}(X^K, H) \triangleq \sum_{i=1}^{K} \left[ d\Big(\sum_{j=1}^{K} h_{ij} X_j\Big) - d\Big(\sum_{j \neq i} h_{ij} X_j\Big) \right].$$
Then
$$\mathrm{DOF}(H) = \sup_{X^K} \mathrm{dof}(X^K, H),$$
where the supremum is over independent $X_1, X_2, \ldots, X_K$ such that $H(\lfloor X_i \rfloor) \le C$ for some fixed $C > 0$.

The result also applies to non-Gaussian noise as long as the noise has finite non-Gaussianness, $D(N \,\|\, N_G) < \infty$.
$$\mathrm{dof}(X^K, H) = \sum_{i=1}^{K} \Bigg[ \underbrace{d\Big(\sum_{j=1}^{K} h_{ij} X_j\Big)}_{\text{info. dim. of the $i$-th user}} - \underbrace{d\Big(\sum_{j \neq i} h_{ij} X_j\Big)}_{\text{info. dim. of the interference}} \Bigg]$$
With $X_i^n = [X_{i,1}, X_{i,2}, \ldots, X_{i,n}]$ the $i$-th user's input,
$$C(H, \mathrm{snr}) = \lim_{n \to \infty} \frac{1}{n} \sup_{X_1^n, \ldots, X_K^n} \sum_{i=1}^{K} I(X_i^n; Y_i^n),$$
where the supremum is over independent $X_1^n, \ldots, X_K^n$. Moreover,
$$I(X_i^n; Y_i^n) = I(X_1^n, \ldots, X_K^n; Y_i^n) - I(X_1^n, \ldots, X_K^n; Y_i^n \mid X_i^n) = I\Big(\sum_{j=1}^{K} h_{ij} X_j^n, \mathrm{snr}\Big) - I\Big(\sum_{j \neq i} h_{ij} X_j^n, \mathrm{snr}\Big).$$
$$\mathrm{DOF}(H) = \lim_{\mathrm{snr} \to \infty} \lim_{n \to \infty} \sup_{X_1^n, \ldots, X_K^n} \frac{1}{n \cdot \frac{1}{2} \log \mathrm{snr}} \sum_{i=1}^{K} \left[ I\Big(\sum_{j=1}^{K} h_{ij} X_j^n, \mathrm{snr}\Big) - I\Big(\sum_{j \neq i} h_{ij} X_j^n, \mathrm{snr}\Big) \right]$$
Exchanging the order of the limits,
$$\mathrm{DOF}(H) = \lim_{n \to \infty} \sup_{X_1^n, \ldots, X_K^n} \lim_{\mathrm{snr} \to \infty} \frac{1}{n \cdot \frac{1}{2} \log \mathrm{snr}} \sum_{i=1}^{K} \left[ I\Big(\sum_{j=1}^{K} h_{ij} X_j^n, \mathrm{snr}\Big) - I\Big(\sum_{j \neq i} h_{ij} X_j^n, \mathrm{snr}\Big) \right],$$
and by the previous theorem, $I(\cdot, \mathrm{snr}) = d(\cdot)\, \frac{1}{2} \log \mathrm{snr} + o(\log \mathrm{snr})$.
$$\mathrm{DOF}(H) = \lim_{n \to \infty} \frac{1}{n} \sup_{X_1^n, \ldots, X_K^n} \sum_{i=1}^{K} \left[ d\Big(\sum_{j=1}^{K} h_{ij} X_j^n\Big) - d\Big(\sum_{j \neq i} h_{ij} X_j^n\Big) \right]$$

SINGLE LETTERIZATION AND EXAMPLES
Two-user IC:
$$\mathrm{DOF}\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \sup_{X_1 \perp X_2} \left[ d(aX_1 + bX_2) + d(cX_1 + dX_2) - d(bX_2) - d(cX_1) \right] = \begin{cases} 0, & a = d = 0, \\ 2, & a \neq 0,\ d \neq 0,\ b = c = 0, \\ 1, & \text{otherwise.} \end{cases}$$
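To see how the generic value 1 is achieved (an added remark): with all four entries nonzero, take $X_1$ absolutely continuous and $X_2$ discrete with finite entropy. Then $d(aX_1 + bX_2) = d(cX_1 + dX_2) = d(cX_1) = 1$ and $d(bX_2) = 0$, so the objective equals $1 + 1 - 0 - 1 = 1$.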
Many-to-one IC:
$$\mathrm{DOF}\left(\begin{bmatrix} h_{11} & h_{12} & h_{13} & \cdots & h_{1K} \\ 0 & h_{22} & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 0 & h_{KK} \end{bmatrix}\right) = K - 1$$
Achieved by choosing $X_1$ discrete and the rest absolutely continuous.
One-to-Many IC:
$$\mathrm{DOF}\left(\begin{bmatrix} h_{11} & 0 & \cdots & 0 \\ h_{21} & h_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ h_{K1} & 0 & \cdots & h_{KK} \end{bmatrix}\right) = K - 1$$
Achieved by choosing $X_1$ discrete and the rest absolutely continuous.
MAC:
$$\mathrm{DOF}\left(\begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}\right) = 1$$
Every receiver observes the same noisy sum of all inputs, so only one degree of freedom is available in total.
Information Dimension and Rate Distortion Theory

For a scalar source and MSE distortion, whenever $d(X)$ exists and is finite, as $D \to 0$,
$$R_X(D) = \frac{d(X)}{2} \log \frac{1}{D} + o\Big(\log \frac{1}{D}\Big).$$

- $X$ discrete with $H(X) < \infty$: $\; R_X(D) = H(X) + o(1)$
- $X$ continuous with $h(X) > -\infty$: $\; R_X(D) = \frac{1}{2} \log \frac{1}{2\pi e D} + h(X) + o(1)$
- $X$ discrete-continuous mixed: $\; R_X(D) = \frac{\rho}{2} \log \frac{1}{D} + \hat{H}(X) + o(1)$