The Maximum Entropy Principle and Applications to MIMO Channel Modeling
Eurecom CM Talk, 16/02/2006
Maxime Guillaud (maxime.guillaud@eurecom.fr)
(Joint work with Mérouane Debbah)
Overview
- Background theory
  - Plausible reasoning
  - The Maximum Entropy method and the link to modelling
- Applications to wireless channel modelling
  - General method: focus on one parameter
  - Application to channel energy
  - Application to spatial correlation
- Conclusion
Deduction, Plausible Reasoning and Probabilities
- Deduction from fact: A ⇒ B. A is true, therefore B is true. B is false, therefore A is false.
- Plausible reasoning from fact: A ⇒ B. B is true: what does this say about A? A is false: what does this say about B?
- We do plausible reasoning continuously. Example: B = {at night, a masked gentleman comes out of the broken window of a store, carrying a bag of jewelry}, A = {this gentleman is a robber}. B is true, therefore A becomes more plausible.
- Plausible reasoning can be quantified using probability theory.
- This goes beyond the statistical interpretation of probability: it takes our degree of knowledge into account.
Plausible Reasoning and Modelling
- "We already have all of Bayesian probability theory for this." Yes, but...
- Prior distributions in the Bayesian framework are "speculative" parameters: if correct, they increase the accuracy of the conclusions, but if they are wrong it is the opposite.
- Jaynes [1] proposes to solve this by
  - taking into account all constraints or information which are known for sure,
  - assuming the maximum uncertainty (or entropy) for everything else.
- This makes it a nice modelling tool in general!
- Jaynes' Maximum Entropy (MaxEnt) and Bayes methods are not contradictory.
The Maximum Entropy Method
We need a practical way to mathematically enforce the maximization of uncertainty. Jaynes defined uncertainty as a continuous function of the distribution,
- representing degrees of certainty by real numbers,
- in qualitative correspondence with common sense: uncertainty decreases when some extra knowledge is acquired,
- with a consistency property: if a conclusion can be reasoned out in more than one way, every possible way must yield the same result.
The unique solution is the Shannon entropy:
    H(P) = -\int_D p(v) \log(p(v)) dv.
The MaxEnt method:
    P_{MaxEnt} = \arg\max_{P, \text{constraints}} H(P)
Maximum Entropy: Application Examples (1)
Discrete random variable v, taking a finite number of values {v_1, ..., v_N} with probabilities P(v = v_i) = p_i.
- Entropy: H(p_1, ..., p_N) = -\sum_{i=1}^N p_i \log(p_i)
- Maximize H(p_1, ..., p_N) under the constraint \sum_{i=1}^N p_i = 1.
- Lagrange method: maximize
    L(p_1, ..., p_N) = H(p_1, ..., p_N) + \beta ( \sum_{i=1}^N p_i - 1 )
- \partial L / \partial p_i = 0 yields the uniform density p_i = e^{\beta - 1} for i = 1 ... N;
- normalization \sum_{i=1}^N p_i = 1 imposes p_i = 1/N.
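The discrete result above can be checked numerically. The following sketch (my own illustration, not part of the talk) maximizes the entropy over the probability simplex with a generic constrained optimizer and recovers the uniform density:

```python
import numpy as np
from scipy.optimize import minimize

# Maximize H(p) = -sum p_i log p_i subject to sum p_i = 1;
# the optimum should be the uniform density p_i = 1/N.
N = 5

def neg_entropy(p):
    # minimize the negative entropy, i.e. maximize H(p)
    return np.sum(p * np.log(p))

res = minimize(
    neg_entropy,
    x0=np.random.default_rng(0).dirichlet(np.ones(N)),  # random point on the simplex
    bounds=[(1e-9, 1.0)] * N,                           # keep log() well defined
    constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}],
)
print(res.x)  # each entry is close to 1/N = 0.2
```

At the optimum the entropy equals log(N), the maximum possible for N outcomes.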
Maximum Entropy: Application Examples (2)
Generalization to continuous variates is straightforward: for v \in D with PDF P_v(v),
- Entropy: H(P_v) = -\int_D P_v(v) \log(P_v(v)) dv
- Maximize H(P_v) under the constraint \int_D P_v(v) dv = 1.
- Lagrange method: maximize
    L(P_v) = -\int_D P_v(v) \log(P_v(v)) dv + \beta ( \int_D P_v(v) dv - 1 )
- Due to the integral structure of the functional L(P_v), the Euler-Lagrange equation applies [2], and maximizing L(P_v) is easy:
    \delta L(P_v) / \delta P_v = -\log(P_v(v)) - 1 + \beta = 0
- For D = [a, b], normalizing yields again the uniform law P_v(v) = 1/(b - a).
- If D is infinite (e.g. [0, +\infty)) and there are no other constraints, there is no MaxEnt distribution.
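A quick numerical illustration of the continuous case (my own example, under the assumption D = [0, 1]): the uniform law should beat any other density on the same bounded support. The Beta(a, b) family all lives on [0, 1], with Beta(1, 1) being the uniform law:

```python
from scipy import stats

# Differential entropy of the uniform law on [0, 1] is log(1 - 0) = 0;
# every non-uniform Beta density on the same support should have smaller entropy.
h_uniform = stats.uniform(0, 1).entropy()
h_beta = [stats.beta(a, b).entropy() for a, b in [(2, 2), (5, 1), (0.5, 0.5)]]
print(h_uniform, h_beta)  # 0.0, and every Beta entropy is strictly negative
```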
Maximum Entropy: Application Examples (3)
Any constraint based on an expectation (mean, covariance, ...) can be easily handled: in general, add R constraints \int_D g_r(v) P_v(v) dv = a_r, r = 1 ... R, by maximizing
    L(P_v) = -\int_D P_v(v) \log(P_v(v)) dv + \beta [ \int_D P_v(v) dv - 1 ] + \sum_{r=1}^R \gamma_r [ \int_D g_r(v) P_v(v) dv - a_r ].
- Mean constraint: g_r(v) = v, a_r = m
- Variance constraint: g_r(v) = v^2, a_r = m^2 + \sigma^2
- For D = R, with mean m and variance \sigma^2 prescribed, MaxEnt yields the normal distribution
    P_v(v) = \frac{1}{\sigma \sqrt{2\pi}} \exp[ -\frac{1}{2} ( \frac{v - m}{\sigma} )^2 ]
- For D = [0, +\infty) with mean m prescribed, MaxEnt yields the exponential distribution P_v(v) = \frac{1}{m} e^{-v/m}.
See [3] for more applications.
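The exponential result can be sanity-checked numerically (an illustration of mine, not from the talk): among nonnegative laws with the same mean m, the exponential should have the largest differential entropy. The Gamma(k, m/k) family keeps the mean fixed at m while varying the shape:

```python
from scipy import stats

# MaxEnt on [0, +inf) with mean m fixed gives the exponential law,
# whose differential entropy is 1 + log(m) in closed form.
m = 2.0
h_exp = stats.expon(scale=m).entropy()
# Gamma(k, scale=m/k) has mean m for every k; k = 1 would be the exponential itself.
h_gamma = [stats.gamma(k, scale=m / k).entropy() for k in (2.0, 5.0, 0.5)]
print(h_exp, h_gamma)  # every Gamma entropy is strictly below the exponential's
```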
MaxEnt Wireless Channel Modelling
We restrict the problem to the classical framework of MIMO frequency-flat fading channels
    y = Hx + n,
with H an n_r × n_t matrix with complex scalar coefficients.
- We seek an analytical formula for the PDF P(H) (applications to channel code optimization).
- What do we know (or not know...) for sure?
  - the SNR is not always the same;
  - the SNR is bounded in many situations;
  - the coefficients of H cannot be assumed independent.
- We incorporate spatial correlation only (realizations are assumed i.i.d. in time).
What is the MaxEnt distribution function P(H) corresponding to this?
Previous results: Energy constraint only
Debbah et al. [4] derived the MaxEnt distribution under the average energy constraint N E_0 (N = n_t n_r) only: maximizing
    L(P) = -\int_{C^N} \log(P(H)) P(H) dH       (entropy)
           + \beta [ 1 - \int_{C^N} P(H) dH ]    (PDF normalization)
           + \gamma [ N E_0 - \int_{C^N} \|H\|_F^2 P(H) dH ]   (average energy constraint)
with Lagrange multipliers \beta, \gamma yields a Gaussian i.i.d. model
    P_{H|E_0}(H, E_0) = \frac{1}{(\pi E_0)^N} \exp( -\sum_{i=1}^N \frac{|h_i|^2}{E_0} ).
Gaussianity and independence are results of the ignorance of further constraints, not assumptions.
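A minimal sketch (my own, following the slide) of what this model means in practice: draw H with i.i.d. complex Gaussian entries of per-coefficient energy E_0 and verify the average total energy constraint E[||H||_F^2] = N E_0:

```python
import numpy as np

# i.i.d. Gaussian MaxEnt model: each entry h_i ~ CN(0, E0), independent.
rng = np.random.default_rng(0)
nt, nr, E0 = 4, 2, 1.0          # illustrative dimensions, N = nt * nr = 8
trials = 20000

H = np.sqrt(E0 / 2) * (rng.standard_normal((trials, nr, nt))
                       + 1j * rng.standard_normal((trials, nr, nt)))
avg_energy = np.mean(np.sum(np.abs(H) ** 2, axis=(1, 2)))
print(avg_energy)  # close to N * E0 = 8
```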
Focusing on one parameter: the IMM method
Goal: generate a model that maximally explores the domain of a variable V. We propose the following method:
- derive P_V(V) using a MaxEnt argument and the known constraints on V;
- derive P_{H|V}(H, V) using MaxEnt and the known constraints on H;
- marginalize P_{H,V} = P_{H|V} P_V over V to obtain P(H):
    P(H) = \int P_{H|V}(H, V) P_V(V) dV
This is the Individual MaxEnt and Marginalize (IMM) method. In general, it yields distributions with less entropy, but which maximally explore the domain of V.
Applications: channel energy (SNR) and spatial correlation.
Application to Channel Energy
Derive the PDF of the channel energy E \in R according to MaxEnt, under the constraints
- 0 \le E \le E_{max};
- E_0 = \int_0^{E_{max}} E P_E(E) dE is known.
MaxEnt yields the truncated exponential law
    P_E(E) = \frac{\beta}{\exp(\beta E_{max}) - 1} \exp(\beta E),  0 \le E \le E_{max},  0 elsewhere,
with \beta < 0 the root of
    \frac{E_{max} \exp(\beta E_{max})}{\exp(\beta E_{max}) - 1} - \frac{1}{\beta} - E_0 = 0.
Marginalize over E:
    P(H) = \int_{R^+} P_{H,E}(H, E) dE = \int_{R^+} P_{H|E}(H) P_E(E) dE.
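A sketch of the Lagrange-multiplier computation (my own code, with illustrative values E_0 = 1 and E_max = 4): solve the root equation for β, then verify that the resulting truncated exponential law has mean E_0:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

E0, E_max = 1.0, 4.0   # assumed example values

def mean_minus_E0(b):
    # mean of the truncated exponential law minus the target E0
    return E_max * np.exp(b * E_max) / (np.exp(b * E_max) - 1.0) - 1.0 / b - E0

# beta < 0 here since E0 < E_max / 2 (as b -> 0 the mean tends to E_max / 2)
beta = brentq(mean_minus_E0, -50.0, -1e-6)

# Check: the mean of P_E(E) = beta * exp(beta * E) / (exp(beta * E_max) - 1) is E0.
pdf = lambda E: beta * np.exp(beta * E) / (np.exp(beta * E_max) - 1.0)
mean, _ = quad(lambda E: E * pdf(E), 0.0, E_max)
print(beta, mean)  # mean is close to E0 = 1.0
```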
Application to Channel Energy (cont'd)
Example for a SISO channel (n_t = n_r = 1): distribution of r = |h|.
- Known energy: P_r(r) = \frac{2r}{E_0} \exp( -\frac{r^2}{E_0} )
- Unknown energy: P_r(r) = \int_0^{E_{max}} \frac{\beta}{\exp(\beta E_{max}) - 1} \frac{2r}{E} \exp( \beta E - \frac{r^2}{E} ) dE
[Figures: PDF P_r(r) and mutual information CDF P(I < I_0), comparing the known-energy case (E_0 = 1) with the unknown-energy case for E_0 = 1 and E_max = +\infty, 4, and 1.5.]
Application to Spatial Correlation
The spatial covariance Q = \int_{C^N} h h^H P_H(h) dh is known to be an important channel characteristic.
Case of a deterministic Q: apply the MaxEnt method by introducing N^2 + 1 Lagrange multipliers \alpha_{a,b}, \beta, and maximizing
    L(P_{H|Q}) = -\int_{C^N} \log(P_{H|Q}(H, Q)) P_{H|Q}(H, Q) dH + \beta [ 1 - \int_{C^N} P_{H|Q}(H, Q) dH ] + \sum_{(a,b) \in [1,...,N]^2} \alpha_{a,b} [ \int_{C^N} h_a h_b^* P_{H|Q}(H) dH - q_{a,b} ].
This yields the correlated Gaussian distribution
    P_{h|Q}(h, Q) = \frac{1}{\det(\pi Q)} \exp( -h^H Q^{-1} h ).
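A minimal sketch of sampling from this correlated Gaussian law for a fixed Q (my own illustration): color an i.i.d. complex Gaussian vector with a Cholesky factor Q = L L^H, so that h = L g has covariance Q:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3
# An arbitrary positive definite covariance, for illustration only.
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Q = A @ A.conj().T / N
L = np.linalg.cholesky(Q)

trials = 50000
# g has i.i.d. CN(0, 1) entries; h = L g is CN(0, Q).
g = (rng.standard_normal((N, trials)) + 1j * rng.standard_normal((N, trials))) / np.sqrt(2)
h = L @ g
Q_hat = h @ h.conj().T / trials      # sample covariance
print(np.max(np.abs(Q_hat - Q)))     # small sampling error
```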
Application to Spatial Correlation (cont'd)
For unknown Q, derive P_Q(Q) as the MaxEnt distribution over S = {N × N positive semidefinite matrices on C}.
S is mapped [5] into the product space U(N)/T × R_+^N:
- U(N)/T: unitary N × N matrices with real, non-negative first row;
- R_+^N is the space of real non-negative non-decreasing N-tuples;
- Q = U \Lambda U^H, U \in U(N)/T, \Lambda = diag(\lambda_1, ..., \lambda_N);
- K(\Lambda) = \frac{(2\pi)^{N(N-1)/2}}{\prod_{j=1}^{N-1} j!} \prod_{i<j} (\lambda_i - \lambda_j)^2 is the Jacobian.
Under the average total energy constraint \int_S tr(Q) P_Q(Q) dQ = N E_0, MaxEnt yields
    P_{U,\Lambda}(U, \Lambda) = P_U P_\Lambda(\Lambda) K(\Lambda),
where P_U(U) is the uniform distribution on U(N)/T and P_\Lambda(\Lambda) = C \prod_{i=1}^{N} e^{\gamma \lambda_i}.
This factorization is a consequence of the MaxEnt optimization!
Application to Spatial Correlation: P(H)
Marginalize over Q to obtain P(H):
    P_H(H) = \int_S P_{H|Q}(H, Q) P_Q(Q) dQ = \int_{U(N)/T \times R_+^N} P_{H|U,\Lambda}(H) P_U P_\Lambda(\Lambda) K(\Lambda) dU d\Lambda
P_{U,\Lambda}(U, \Lambda) = P_U P_\Lambda(\Lambda) K(\Lambda) provides a way to generate realizations of Q:
- uniform P_U: P_H(H) is unitarily invariant; U is generated by orthogonalization of i.i.d. Gaussian matrices;
- the joint distribution of the eigenvalues \Lambda is C \prod_{i=1}^N e^{\gamma \lambda_i} \frac{(2\pi)^{N(N-1)/2}}{\prod_{j=1}^{N-1} j!} \prod_{i<j} (\lambda_i - \lambda_j)^2;
- P_{H|Q}(H, Q) is a correlated Gaussian r.v.
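The two-step recipe above (draw Q from its MaxEnt law, then draw H | Q as a correlated Gaussian) can be sketched as follows. One observation of mine, not stated on the slide: the joint eigenvalue density C Π e^{γλ_i} Π_{i<j}(λ_i - λ_j)^2 with Haar-uniform eigenvectors is exactly that of a square complex Wishart matrix G G^H with i.i.d. Gaussian G, so Q can be drawn directly as G G^H without sampling U and Λ separately. This is a sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
N, E0, trials = 4, 1.0, 5000     # illustrative dimensions

energies = []
for _ in range(trials):
    # Step 1: draw Q from the MaxEnt law as a square complex Wishart matrix;
    # entries of G are CN(0, E0 / N), so E[tr(Q)] = N * E0.
    G = np.sqrt(E0 / (2 * N)) * (rng.standard_normal((N, N))
                                 + 1j * rng.standard_normal((N, N)))
    Q = G @ G.conj().T
    # Step 2: draw h | Q ~ CN(0, Q) by coloring an i.i.d. Gaussian vector.
    L = np.linalg.cholesky(Q + 1e-12 * np.eye(N))   # tiny regularization for safety
    g = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    h = L @ g                     # reshape into an n_r x n_t matrix H if needed
    energies.append(np.sum(np.abs(h) ** 2))

print(np.mean(energies))  # close to E[tr(Q)] = N * E0 = 4
```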
Conclusions
- General MaxEnt-based method to generate analytical channel models with a parameter of interest [6]
- Application to channel energy
- Application to the spatial covariance matrix: importance of the distribution of the eigenvalues
- Every expectation constraint (correlation, ...) is easily incorporated
- Extension to other types of correlation (time, frequency, ...) is possible
References
[1] E. T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, 2003.
[2] B. Svetitsky, "Notes on functionals," http://julian.tau.ac.il/~bqs/functionals.pdf, Mar. 2005.
[3] J. N. Kapur and H. K. Kesavan, Entropy Optimization Principles with Applications, Academic Press, 1997.
[4] M. Debbah and R. R. Müller, "MIMO channel modelling and the principle of maximum entropy," IEEE Transactions on Information Theory, vol. 51, no. 5, pp. 1667–1690, May 2005.
[5] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy, vol. 77 of Mathematical Surveys and Monographs, American Mathematical Society, 2000.
[6] M. Guillaud and M. Debbah, "Maximum entropy MIMO wireless channel models with limited information," in Proc. MATHMOD Conference on Mathematical Modeling, Wien, Austria, Feb. 2006.