Interactions of Information Theory and Estimation in Single- and Multi-user Communications
Dongning Guo
Department of Electrical Engineering, Princeton University
March 8, 2004
Communications

Source → S → Encoder → X → Noisy channel → Y → Decoder → Ŝ → Destination

Tension: noise causes errors (X ≠ Y), yet we want reliable transmission at a good rate (Ŝ = S).
Shannon (1948): given a noisy channel, arbitrarily reliable transmission is possible up to a certain rate.
Estimation vs. information; analog vs. digital.
This talk: interactions, (new) surprises, and applications.
Example: Mars Rover

Received power as low as 10⁻¹⁸ watt; data rate as high as 10 kbit/s.
Wireless Network

(X₁, Y₁), …, (X_k, Y_k), …

Tension: throughput vs. interference (complicated); resources are scarce and shared.
Interference is information. Think smart!
Wanted: simple laws governing the seemingly disordered system.
Example: Sensor Network
Mutual Information & MMSE

Probabilistic point of view: X ~ P_X → channel P_{Y|X} → Y, where X and Y are random variables, vectors, or processes.

Minimum mean-squared error (MMSE):
  mmse(X; Y) = min_f E[X − f(Y)]², achieved by X̂(Y) = E{X | Y}.

Mutual information:
  I(X; Y) = E{ log [ p_{XY}(X, Y) / (p_X(X) p_Y(Y)) ] }.
Multi-user Perspective

Examples: cellular telephony, sensor networks, DSL.

X₁, …, X_K → channel P_{Y|X} → Y → multiuser detector → X̂₁, …, X̂_K

Mean-square error: E[X_k − X̂_k]².
Mutual information: I(X_k; X̂_k).
Outline

Part I: canonical Gaussian channel Y = √snr · X + W
  A fundamental I-mmse relationship
  By-products and applications

Part II: multiuser (vector) channel Y = S X + W, with S an N×K signature matrix
  Large-system analysis via statistical physics
  Result: the multiuser channel can be decoupled
Part I: Y = √snr · X + W
An Observation

Scalar Gaussian channel: Y = √snr · X + W, W ~ N(0, 1).

Gaussian input X ~ N(0, 1):
  I(X; Y) = (1/2) log(1 + snr),
  mmse(snr) = E[ X − (√snr / (1 + snr)) Y ]² = 1 / (1 + snr).

Immediately (with mutual information in nats),
  d/dsnr I(X; Y) = (1/2) mmse(snr).   (⋆)
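As a sanity check, the Gaussian-input closed forms above can be differentiated numerically; a minimal sketch (mutual information in nats):

```python
import numpy as np

# Check dI/dsnr = (1/2) mmse(snr) for Gaussian input X ~ N(0,1)
# on Y = sqrt(snr) X + W, using the closed forms from the slide (nats):
#   I(snr) = (1/2) log(1 + snr),  mmse(snr) = 1/(1 + snr).
def I(snr):
    return 0.5 * np.log1p(snr)

def mmse(snr):
    return 1.0 / (1.0 + snr)

snr, h = 2.0, 1e-6
dI = (I(snr + h) - I(snr - h)) / (2 * h)     # central difference
assert abs(dI - 0.5 * mmse(snr)) < 1e-8
```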
Another Observation

Scalar Gaussian channel: Y = √snr · X + W, W ~ N(0, 1).

Binary input X = ±1 equally likely:
  I(snr) = snr − ∫ (e^{−y²/2} / √(2π)) log cosh(snr − √snr · y) dy,
  mmse(snr) = 1 − ∫ (e^{−y²/2} / √(2π)) tanh(snr − √snr · y) dy.

Again,
  d/dsnr I(snr) = (1/2) mmse(snr).   (⋆)
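The binary-input integrals can be evaluated numerically to confirm the same derivative identity; a sketch using plain rectangle-rule quadrature (the grid limits are an arbitrary choice, wide enough for the Gaussian weight):

```python
import numpy as np

# Numerical check of dI/dsnr = (1/2) mmse(snr) for binary input X = +/-1,
# using the slide's integral formulas (mutual information in nats).
y = np.linspace(-12.0, 12.0, 48001)          # wide grid for the Gaussian weight
w = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
dy = y[1] - y[0]

def I(snr):
    return snr - np.sum(w * np.log(np.cosh(snr - np.sqrt(snr) * y))) * dy

def mmse(snr):
    return 1.0 - np.sum(w * np.tanh(snr - np.sqrt(snr) * y)) * dy

snr, h = 1.5, 1e-4
dI = (I(snr + h) - I(snr - h)) / (2 * h)     # central difference
assert abs(dI - 0.5 * mmse(snr)) < 1e-5
```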
d/dsnr I(snr) = (1/2) mmse(snr)   (⋆)

[Figure: mmse(snr) and I(snr) plotted against snr (0 to 10), for Gaussian and binary inputs.]
I-mmse Theorem

Theorem 1 (Guo, Shamai & Verdú). For Y = √snr · X + W and any P_X with EX² < ∞,
  d/dsnr I(snr) = (1/2) mmse(snr).   (⋆)

(⋆) also proved for:
  Vector channel:    Y = √snr · HX + W
  Continuous-time:   Y_t = √snr · X_t + W_t,  t ∈ [0, T]
  Discrete-time:     Y_n = √snr · X_n + W_n,  n = 1, 2, …
Information vs Estimation

  Information theory: I(X; Y), the (coded) reliable rate
  Detection theory: likelihood ratio
  Estimation theory: mmse(X; Y), the (uncoded) accuracy

History:
  Wiener, Shannon, Kolmogorov
  Price, Kailath, 1950s-60s
  Duncan, 1970: mutual information = causal MMSE in continuous-time filtering
Proof of d/dsnr I(snr) = (1/2) mmse(snr)   (⋆)

Equivalent to: I(snr + δ) − I(snr) = (δ/2) mmse(snr) + o(δ).

Incremental channel:
  X → (+ σ₁W₁) → Y₁ → (+ σ₂W₂) → Y₂,
with Y₁ at SNR snr + δ and Y₂ at SNR snr.

Markov property X → Y₁ → Y₂:
  I(X; Y₁) − I(X; Y₂) = I(X; Y₁, Y₂) − I(X; Y₂) = I(X; Y₁ | Y₂).

Given Y₂, the channel X → Y₁ is Gaussian:
  (snr + δ) Y₁ = snr · Y₂ + δ · X + N(0, δ).
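The conditional-channel claim can be checked by simulation: constructing Y₁ and Y₂ with the stated SNRs, the combination (snr + δ)Y₁ − snr·Y₂ − δ·X should be N(0, δ) and uncorrelated with Y₂. A Monte Carlo sketch (the sample size, seed, and binary input law are illustrative choices):

```python
import numpy as np

# Incremental channel: Y1 = X + N(0, 1/(snr+d)) and Y2 = Y1 + an independent
# noise increment so that Y2 = X + N(0, 1/snr).  Then the combination
# (snr+d)*Y1 - snr*Y2 - d*X should be N(0, d), uncorrelated with Y2.
rng = np.random.default_rng(0)
snr, d, n = 2.0, 0.5, 1_000_000
X = rng.choice([-1.0, 1.0], size=n)          # any input law works
N1 = rng.normal(0.0, np.sqrt(1 / (snr + d)), n)
dN = rng.normal(0.0, np.sqrt(1 / snr - 1 / (snr + d)), n)
Y1 = X + N1
Y2 = Y1 + dN
U = (snr + d) * Y1 - snr * Y2 - d * X
assert abs(U.var() - d) < 0.01               # variance is delta
assert abs(np.corrcoef(U, Y2)[0, 1]) < 0.01  # uncorrelated with Y2
```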
Proof of d/dsnr I(snr) = (1/2) mmse(snr) (cont.)

Lemma 1 (Verdú 90, Lapidoth-Shamai 02, Guo et al. 04). Let Y = √δ · Z + U, U ~ N(0, 1). As δ → 0,
  I(Y; Z) = (δ/2) E(Z − EZ)² + o(δ).

Apply Lemma 1 to X → Y₁ conditioned on Y₂:
  I(X; Y₁ | Y₂) = (δ/2) E(X − E{X | Y₂})² + o(δ).

Increase due to the SNR increment:
  I(snr + δ) − I(snr) = I(X; Y₁ | Y₂) = (δ/2) mmse(snr) + o(δ).

Key property: infinite divisibility of the Gaussian law.
Why d/dsnr I(snr) = (1/2) mmse(snr)?

Chain of degraded outputs: X → Y₁ → Y₂ → Y₃ → ⋯ with snr₁ > snr₂ > snr₃ > ⋯ → 0.

Using the mutual information chain rule:
  I(snr₁) = I(X; Y₁) = I(X; Y₁ | Y₂) + I(X; Y₂)
          = Σ_{n=1}^∞ I(X; Y_n | Y_{n+1})
          ≈ (1/2) Σ_{n=1}^∞ (snr_n − snr_{n+1}) mmse(snr_n)
          → (1/2) ∫₀^{snr₁} mmse(γ) dγ.
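The Riemann-sum argument can be reproduced numerically: summing (1/2)(snr_n − snr_{n+1})·mmse(snr_n) over a fine SNR grid recovers I(snr₁). A sketch for the Gaussian input, where both sides are known in closed form:

```python
import numpy as np

# Riemann-sum version of the chain-rule argument, for Gaussian input:
#   mmse(g) = 1/(1+g)  and  I(snr) = (1/2) log(1+snr)  in nats.
mmse = lambda g: 1.0 / (1.0 + g)
snr1 = 4.0
grid = np.linspace(snr1, 0.0, 100_001)       # snr_1 > snr_2 > ... -> 0
steps = grid[:-1] - grid[1:]
I_sum = 0.5 * np.sum(steps * mmse(grid[:-1]))
assert abs(I_sum - 0.5 * np.log1p(snr1)) < 1e-4
```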
Continuous-time Channel

Channel model:
  R_t = √snr · X_t + N_t,  or equivalently  dY_t = √snr · X_t dt + dB_t,
where {B_t} is a Brownian motion (Wiener process). Consider t ∈ [0, T].

Information rate: I(snr) = (1/T) I(X₀^T; Y₀^T).

Causal and non-causal MMSEs:
  cmmse(t, snr) = E( X_t − E{X_t | Y₀^t} )²,
  mmse(t, T, snr) = E( X_t − E{X_t | Y₀^T} )².
Triangle Relationship

Theorem 2. If ∫₀^T E X_t² dt < ∞ (finite-power input),
  d/dsnr I(snr) = (1/2) ∫₀^T mmse(t, T, snr) dt/T.
Proof: Radon-Nikodym derivatives, stochastic calculus.

Theorem 3 (Duncan 1970). For finite-power input,
  I(snr) = (snr/2) ∫₀^T cmmse(t, snr) dt/T.

Theorem 4.
  ∫₀^T cmmse(t, snr) dt/T = ∫₀^T [ (1/snr) ∫₀^snr mmse(t, T, γ) dγ ] dt/T.
Stationary Inputs

Theorem 5. For stationary finite-power input,
  d/dsnr I(snr) = (1/2) mmse(snr).   (⋆)

Theorem 6. For stationary finite-power input,
  cmmse(snr) = (1/snr) ∫₀^snr mmse(γ) dγ.   (⋆⋆)

Can be checked against known results:
  Gaussian input with spectrum S_X(ω): I(snr) [Shannon 49], cmmse(snr) [Yovits-Jackson 55].
  Random telegraph waveform (2-state Markov) input: MMSEs due to Wonham 65 and Yao 85.
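A quick consistency check of the triangle: for any mmse(·) profile, defining cmmse by averaging mmse over SNR and I by integrating (1/2)·mmse forces Duncan's relation I = (snr/2)·cmmse to hold. The profile mmse(γ) = 1/(1+γ) below is a toy choice assumed purely for illustration, not a simulated continuous-time system:

```python
import numpy as np

# Triangle consistency: cmmse = SNR-average of mmse, I = (1/2) integral of
# mmse; then Duncan's I = (snr/2) cmmse holds automatically.  The assumed
# toy profile mmse(g) = 1/(1+g) integrates to log(1+snr).
mmse = lambda g: 1.0 / (1.0 + g)
snr = 3.0
g = np.linspace(0.0, snr, 100_001)
trap = np.sum((mmse(g[:-1]) + mmse(g[1:])) / 2) * (g[1] - g[0])  # trapezoid rule
I = 0.5 * trap                       # I(snr) = (1/2) * integral of mmse
cmmse = trap / snr                   # cmmse(snr) = average of mmse over [0, snr]
assert abs(I - 0.5 * snr * cmmse) < 1e-12        # Duncan's theorem
assert abs(cmmse - np.log1p(snr) / snr) < 1e-7   # closed form of the average
```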
cmmse(snr) = (1/snr) ∫₀^snr mmse(γ) dγ   (⋆⋆)

[Figure: cmmse(snr) and mmse(snr) vs. snr (0 to 30) for the random telegraph waveform input.]
Extensions

Vector channel Y = √snr · HX + W:
  d/dsnr I(X; Y) = (1/2) E ‖HX − H E{X | Y}‖².

Discrete-time model: Y_n = √snr · X_n + W_n, n = 1, 2, …
  Via piecewise-constant continuous-time input.

More general models: dY_t = √snr · h_t(X_t) dt + dB_t, t ∈ R.
  The MMSEs become errors in estimating the channel input h_t(X_t).
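For the vector extension, Gaussian input gives everything in closed form, so the identity can be verified directly: I(snr) = (1/2) log det(I + snr·HᵀH) and the error covariance of X given Y is (I + snr·HᵀH)⁻¹. A numerical sketch (H is an arbitrary random matrix chosen for illustration):

```python
import numpy as np

# Vector I-mmse check for X ~ N(0, I) on Y = sqrt(snr) H X + W:
#   I(snr) = (1/2) logdet(I + snr H^T H),
#   E||HX - H E{X|Y}||^2 = tr(H (I + snr H^T H)^{-1} H^T).
rng = np.random.default_rng(1)
H = rng.normal(size=(5, 3))
G = H.T @ H

def I(snr):
    return 0.5 * np.linalg.slogdet(np.eye(3) + snr * G)[1]

def rhs(snr):                                # (1/2) E||HX - H E{X|Y}||^2
    E = np.linalg.inv(np.eye(3) + snr * G)   # error covariance of X given Y
    return 0.5 * np.trace(H @ E @ H.T)

snr, h = 0.7, 1e-6
dI = (I(snr + h) - I(snr - h)) / (2 * h)     # central difference
assert abs(dI - rhs(snr)) < 1e-7
```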
Other Channels & Applications

The incremental-channel device works whenever the noise has independent increments (Lévy processes).
  E.g., Gaussian channel, MMSE: E{X_t²} − E{X̂_t²};
  Poisson channel: E{φ(λ_t)} − E{φ(λ̂_t)}, where φ(x) = x log x.

Bounds on MMSE → bounds on mutual information.
  Linear-estimation upper bound; intersymbol interference channel.

The MMSE estimator is often a key component of capacity-achieving receivers.

More to be discovered.
Part II: Y = SX + W
Multiuser Channel

Channel model:
  Y = Σ_{k=1}^K s_k √snr_k · X_k + W = S X + W,
where W ~ N(0, I) and S is the N×K matrix of signatures.

Assumptions:
  i.i.d. inputs: X_k ~ P_X
  Random signatures: s_k
  Fading: snr_k ~ P_snr
Examples

Code-division multiple access (CDMA):
  Y = Σ_{k=1}^K s_k √snr_k · X_k + W,  W ~ N(0, I),  i.e., Y = S X + W.

Random signatures are used in Qualcomm CDMA and 3G wireless.
Also: multiple-antenna and multi-carrier systems.
Joint Decoding

Encoders 1, …, K → X_k → (× √snr_k · s_k) → superposed with noise N(0, I) → Y → joint decoding.

Spectral efficiency: C_joint = (1/N) I(X; Y).
Separate Decoding

Multiuser detection + single-user decoding:
  Encoders 1, …, K → X_k → (× √snr_k · s_k) → (+ N(0, I)) → Y → multiuser detector → X̂_k → decoder k.

Channel for user k: P_{X̂_k | X_k}.
Capacity of user k: I(X_k; X̂_k).
Multiuser Detection

Detection function: X̂_k = f_k(Y, S).

Optimal: ⟨X_k⟩ = E{X_k | Y, S},
  w.r.t. p_{X_k|Y,S} (induced from p_X and p_{Y|X,S}).

Suboptimal: ⟨X_k⟩_q = E_q{X_k | Y, S},
  w.r.t. q_{X_k|Y,S} (induced from q_X and q_{Y|X,S}):
  postulated input q_X; postulated channel Y = S X + σ W.
Special Cases

If q_X ~ N(0, 1), then ⟨X⟩_q = (SᵀS + σ²I)⁻¹ Sᵀ Y:
  σ → ∞: matched filter;  σ = 1: linear MMSE;  σ → 0: decorrelator.

If q_X = p_X, then
  σ → 0: jointly optimal;  σ = 1: individually optimal.

In principle, the multiuser detector ⟨X⟩_q = E_q{X | Y, S} is parameterized by q_X and σ.
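The linear special cases can be sketched in a few lines; the unit-energy ±1 signatures, equal unit SNRs, and system size below are illustrative assumptions:

```python
import numpy as np

# Postulated-Gaussian detector <X>_q = (S^T S + sigma^2 I)^{-1} S^T Y and
# its limits, for a toy system (all snr_k = 1, BPSK symbols).
rng = np.random.default_rng(2)
N, K = 32, 8
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)   # random signatures
X = rng.choice([-1.0, 1.0], size=K)
Y = S @ X + rng.normal(size=N)

def linear_detector(Y, S, sigma2):
    K = S.shape[1]
    return np.linalg.solve(S.T @ S + sigma2 * np.eye(K), S.T @ Y)

x_lmmse = linear_detector(Y, S, 1.0)     # sigma = 1: linear MMSE
x_decor = linear_detector(Y, S, 1e-9)    # sigma -> 0: decorrelator
x_mf = S.T @ Y                           # sigma -> inf (rescaled by sigma^2): matched filter
```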
Problem

Multiuser detection + single-user codes: decoder k sees ⟨X_k⟩_q.

What is P_{⟨X_k⟩_q | X_k}?  I(X_k; ⟨X_k⟩_q)?  I(X; Y)?

Large system: number of users K → ∞ and dimensionality N → ∞, with K/N → β.
New Result: Decoupling

Multiuser channel: encoders → signatures → (+ N(0, I)) → Y → multiuser detector → decoders, as before.

Equivalent single-user scalar channel for user k:
  Encoder → X_k → (× √snr_k) → (+ N(0, η⁻¹)) → decision function → ⟨X_k⟩_q → decoder.
Decision Function

Implicit decision function:
  X ~ p_X → (× √snr) → (+ N(0, η⁻¹)) → Z → decision function → ⟨X⟩_q.

The decision is the best estimate of X given Z assuming a postulated channel
  X ~ q_X → (× √snr) → (+ N(0, ξ⁻¹)) → Z,
i.e., ⟨X⟩_q = E_q{X | Z, snr; ξ}.
Retrochannel

The equivalent channel & retrochannel:
  X ~ p_X → (× √snr) → (+ N(0, η⁻¹)) → Z;
  decision function: ⟨X⟩_q = E_q{X | Z, snr; ξ};
  retrochannel: draws X̃ ~ q_{X|Z,snr;ξ}.

The MSE: E(snr; η, ξ) = E[X − ⟨X⟩_q]²;
the variance: V(snr; η, ξ) = E[X̃ − ⟨X⟩_q]².
Main Result: Decoupling

Theorem 7. The equivalent channel for a user with snr_k = snr is
  X ~ p_X → (× √snr) → (+ N(0, η⁻¹)) → Z → decision function E_q{X | Z, snr; ξ} → ⟨X⟩_q,
with retrochannel X̃ ~ q_{X|Z,snr;ξ}; i.e., a Gaussian channel followed by a decision function, where
  η⁻¹ = 1 + β E{snr · E(snr; η, ξ)},
  ξ⁻¹ = σ² + β E{snr · V(snr; η, ξ)},   (⋆⋆⋆)
with E(snr; η, ξ) = E[X − ⟨X⟩_q]² and V(snr; η, ξ) = E[X̃ − ⟨X⟩_q]².
Spectral Efficiencies

Corollary 1. The capacity of user k is I(η · snr_k), the mutual information of the scalar channel
  X ~ p_X → (× √snr_k) → (+ N(0, η⁻¹)) → Z.

Corollary 2. The overall spectral efficiency under separate decoding:
  C_sep = β E{ I(η · snr) }.

Theorem 8. The spectral efficiency under joint decoding is
  C_joint = C_sep + (η − 1 − log η)/2.

Generalizes Shamai-Verdú, Tanaka, and Müller-Gerstacker.
Special Case

Postulate q_X ~ N(0, 1) and σ² = 1. Then ⟨X⟩_q is the linear MMSE detector.

MMSE: mmse(snr; η) = 1 / (1 + η · snr).

Tse-Hanly 99 is a special case of (⋆⋆⋆):
  η⁻¹ = 1 + β E{ snr / (1 + η · snr) }.

Single-user capacity: C(snr; η) = (1/2) log(1 + η · snr).
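The Tse-Hanly fixed point is easy to iterate numerically. The sketch below assumes the simplest case of a single common SNR (the slide's version averages over the SNR distribution):

```python
import numpy as np

# Fixed-point iteration for the multiuser efficiency eta under the Gaussian
# postulate with sigma^2 = 1, equal-SNR case (an illustrative assumption):
#   1/eta = 1 + beta * snr / (1 + eta * snr).
def tse_hanly_eta(beta, snr, iters=200):
    eta = 1.0
    for _ in range(iters):
        eta = 1.0 / (1.0 + beta * snr / (1.0 + eta * snr))
    return eta

beta, snr = 1.5, 4.0
eta = tse_hanly_eta(beta, snr)
assert abs(1.0 / eta - (1.0 + beta * snr / (1.0 + eta * snr))) < 1e-10
C_user = 0.5 * np.log1p(eta * snr)   # single-user capacity seen by each user
```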
Joint vs Separate

Theorem 9. For every P_X,
  C_joint(β) = ∫₀^β (1/β′) C_sep(β′) dβ′.

Proof:
  d/dβ C_joint = (1/β) C_sep + (dη/dβ) [ β E{ d/dη I(η · snr) } + (1 − η⁻¹)/2 ];
use d/dsnr I(snr) = (1/2) mmse(snr) (⋆) and the fixed-point equation (⋆⋆⋆) to show that the bracket vanishes.

Mutual information chain rule ↔ successive cancellation:
  (1/N) I(X; Y | S) = (1/N) Σ_{k=1}^K I(X_k; Y | S, X_{k+1}, …, X_K)
                    = (1/N) Σ_{k=1}^K I( η(k/N) · snr_k ).
Statistical Physics

From microscopic interactions to macroscopic properties. Model system: the spin glass.

The posterior distribution of the transmitted symbols is a configuration distribution under an external field:
  p_{X|Y,S}(x | y, S) ∝ exp[ −H_{y,S}(x) ].

Multiuser system ↔ spin glass:
  spins ↔ bits;  external field ↔ received signal & channel state.
Replica Trick

Mutual information:
  (1/K) I(X; Y | S) = −(1/K) E{ log p_{Y|S}(Y | S) | S } − const = F(S) − const.

Asymptotic equipartition property: F(S) → F, the free energy,
  F = −lim_{K→∞} (1/K) E{ log p_{Y|S}(Y | S) }.

Replica trick:
  E{log Θ} = lim_{u→0} (∂/∂u) log E{Θᵘ} = lim_{u→0} E{Θᵘ log Θ} / E{Θᵘ}.

Introducing replicas X_a, a = 1, …, u (i.i.d. given S):
  p^u_{Y|S}(y | S) = E{ ∏_{a=1}^u p_{Y|X,S}(y | X_a, S) | S }.
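The u → 0 limit behind the replica trick can be illustrated with a finite-u Monte Carlo estimate; the lognormal Θ below is an arbitrary positive random variable chosen so that E{log Θ} is known exactly. (The hard step of replica analyses, evaluating E{Θᵘ} at integer u and continuing to u → 0, is not attempted here.)

```python
import numpy as np

# Finite-u illustration of E{log T} = lim_{u->0} d/du log E{T^u}.
# T = exp(mu + G), G ~ N(0,1): here E{log T} = mu and log E{T^u} = u*mu + u^2/2.
rng = np.random.default_rng(3)
mu, u = 0.3, 1e-3
T = np.exp(mu + rng.normal(size=2_000_000))
lhs = np.mean(np.log(T))                     # Monte Carlo E{log T}
rhs = (np.log(np.mean(T**u)) - np.log(np.mean(T**(-u)))) / (2 * u)
assert abs(lhs - mu) < 5e-3                  # close to the exact value mu
assert abs(rhs - lhs) < 1e-2                 # replica limit matches
```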
Conclusion

I-mmse relationship: under an arbitrary input distribution,
  d/dsnr I(snr) = (1/2) mmse(snr).   (⋆)
  Incremental-channel proof; relationship between causal & non-causal MMSEs.

Multiuser channel:
  Multiuser detector optimal for a postulated system.
  Decoupling: equivalent Gaussian single-user channel.
  Multiuser efficiency; spectral efficiencies.
Future Work

Applications and practical implications of d/dsnr I(snr) = (1/2) mmse(snr) (⋆):
  e.g., ISI channels, coding; further extensions.
MMSE structures in capacity-achieving receivers:
  e.g., ISI channel, CDMA channel, lattice codes.
Applications of the decoupling result:
  e.g., power control, signaling optimization.
Applications of statistical-physics methodologies:
  e.g., large sensor networks.