arxiv:cs/ v2 [cs.it] 1 Oct 2006

Similar documents
Gaussian Message Passing on Linear Models: An Update

Gaussian message passing on linear models: an update

Particle Methods as Message Passing

Steepest descent on factor graphs

Factor Graphs and Message Passing Algorithms Part 1: Introduction

Computation of Information Rates from Finite-State Source/Channel Models

Steepest Descent as Message Passing

An Introduction to Factor Graphs

Expectation propagation for signal detection in flat-fading channels

NUMERICAL COMPUTATION OF THE CAPACITY OF CONTINUOUS MEMORYLESS CHANNELS

SAMPLING JITTER CORRECTION USING FACTOR GRAPHS

Performance Analysis and Code Optimization of Low Density Parity-Check Codes on Rayleigh Fading Channels

Marginal Densities, Factor Graph Duality, and High-Temperature Series Expansions

Sub-Gaussian Model Based LDPC Decoder for SαS Noise Channels

Latent Tree Approximation in Linear Model

IEEE SIGNAL PROCESSING MAGAZINE

ON CONVERGENCE PROPERTIES OF MESSAGE-PASSING ESTIMATION ALGORITHMS. Justin Dauwels

Aalborg Universitet. Bounds on information combining for parity-check equations Land, Ingmar Rüdiger; Hoeher, A.; Huber, Johannes

Robust Expectation Propagation in Factor Graphs Involving Both Continuous and Binary Variables

Iterative Encoding of Low-Density Parity-Check Codes

NAME... Soc. Sec. #... Remote Location... (if on campus write campus) FINAL EXAM EE568 KUMAR. Sp ' 00

A STATE-SPACE APPROACH FOR THE ANALYSIS OF WAVE AND DIFFUSION FIELDS

Extending Monte Carlo Methods to Factor Graphs with Negative and Complex Factors

SPARSE intersymbol-interference (ISI) channels are encountered. Trellis-Based Equalization for Sparse ISI Channels Revisited

Expectation Propagation in Factor Graphs: A Tutorial

Practical Polar Code Construction Using Generalised Generator Matrices

EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE FILTER

Single-Gaussian Messages and Noise Thresholds for Low-Density Lattice Codes

Polar Codes: Graph Representation and Duality

An Introduction to Low Density Parity Check (LDPC) Codes

Soft-Output Decision-Feedback Equalization with a Priori Information

MANY papers and books are devoted to modeling fading

Asynchronous Decoding of LDPC Codes over BEC

Construction of low complexity Array based Quasi Cyclic Low density parity check (QC-LDPC) codes with low error floor

Joint Channel Estimation and Co-Channel Interference Mitigation in Wireless Networks Using Belief Propagation

Information Theoretic Imaging

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes

Structured Low-Density Parity-Check Codes: Algebraic Constructions

9 Forward-backward algorithm, sum-product on factor graphs

Gaussian Belief Propagation Solver for Systems of Linear Equations

Low Density Parity Check (LDPC) Codes and the Need for Stronger ECC. August 2011 Ravi Motwani, Zion Kwok, Scott Nelson

Binary Transmissions over Additive Gaussian Noise: A Closed-Form Expression for the Channel Capacity 1

SIGNAL detection in flat Rayleigh fading channels can be. Window-Based Expectation Propagation for Adaptive Signal Detection in Flat-Fading Channels

Message-Passing Decoding for Low-Density Parity-Check Codes Harish Jethanandani and R. Aravind, IIT Madras

RANDOM DISCRETE MEASURE OF THE PHASE POSTERIOR PDF IN TURBO SYNCHRONIZATION

Graph-based Codes for Quantize-Map-and-Forward Relaying

Exact Probability of Erasure and a Decoding Algorithm for Convolutional Codes on the Binary Erasure Channel

The Information Bottleneck Revisited or How to Choose a Good Distortion Measure

Upper Bounds on the Capacity of Binary Intermittent Communication

The Concept of Soft Channel Encoding and its Applications in Wireless Relay Networks

Joint FEC Encoder and Linear Precoder Design for MIMO Systems with Antenna Correlation

Linear Dynamical Systems

Convolutional Codes ddd, Houshou Chen. May 28, 2012

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

Power EP. Thomas Minka Microsoft Research Ltd., Cambridge, UK MSR-TR , October 4, Abstract

MMSE DECISION FEEDBACK EQUALIZER FROM CHANNEL ESTIMATE

Joint Equalization and Decoding for Nonlinear Two-Dimensional Intersymbol Interference Channels with Application to Optical Storage

1e

Fractional Belief Propagation

A FEASIBILITY STUDY OF PARTICLE FILTERS FOR MOBILE STATION RECEIVERS. Michael Lunglmayr, Martin Krueger, Mario Huemer

Probabilistic and Bayesian Machine Learning

One-Bit LDPC Message Passing Decoding Based on Maximization of Mutual Information

6.451 Principles of Digital Communication II Wednesday, May 4, 2005 MIT, Spring 2005 Handout #22. Problem Set 9 Solutions

Least Squares and Kalman Filtering on Forney Graphs

Determining the Optimal Decision Delay Parameter for a Linear Equalizer

Connecting Belief Propagation with Maximum Likelihood Detection

The Effect of Memory Order on the Capacity of Finite-State Markov and Flat-Fading Channels

Data Detection for Controlled ISI. h(nt) = 1 for n=0,1 and zero otherwise.

Expectation Propagation in Dynamical Systems

TREE-STRUCTURED EXPECTATION PROPAGATION FOR LDPC DECODING OVER THE AWGN CHANNEL

Variable-Rate Universal Slepian-Wolf Coding with Feedback

Soft-Output Trellis Waveform Coding

New Puncturing Pattern for Bad Interleavers in Turbo-Codes

Analysis of Sum-Product Decoding of Low-Density Parity-Check Codes Using a Gaussian Approximation

Online Estimation of Discrete Densities using Classifier Chains

MMSE DECODING FOR ANALOG JOINT SOURCE CHANNEL CODING USING MONTE CARLO IMPORTANCE SAMPLING

Capacity of Memoryless Channels and Block-Fading Channels With Designable Cardinality-Constrained Channel State Feedback

The Sorted-QR Chase Detector for Multiple-Input Multiple-Output Channels

Bayesian Inference Course, WTCN, UCL, March 2013

A Message-Passing Approach for Joint Channel Estimation, Interference Mitigation and Decoding

LDPC Codes. Slides originally from I. Land p.1

Decoding of LDPC codes with binary vector messages and scalable complexity

CO-OPERATION among multiple cognitive radio (CR)

Lecture 4 : Introduction to Low-density Parity-check Codes

Constellation Shaping for Communication Channels with Quantized Outputs

On the Design of Raptor Codes for Binary-Input Gaussian Channels

Recovering the Graph Structure of Restricted Structural Equation Models

STA 4273H: Statistical Machine Learning

Binary Puzzles as an Erasure Decoding Problem

Belief propagation decoding of quantum channels by passing quantum messages

RECENT advances in technology have led to increased activity

Staircase Codes. for High-Speed Optical Communications

CHAPTER 8 Viterbi Decoding of Convolutional Codes

Part 1: Expectation Propagation

These outputs can be written in a more convenient form: with y(i) = Hc m (i) n(i) y(i) = (y(i); ; y K (i)) T ; c m (i) = (c m (i); ; c m K(i)) T and n

AN EXACT SOLUTION FOR OUTAGE PROBABILITY IN CELLULAR NETWORKS

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY

Extended MinSum Algorithm for Decoding LDPC Codes over GF (q)

Performance of Multi Binary Turbo-Codes on Nakagami Flat Fading Channels

Chapter 7: Channel coding:convolutional codes

Transcription:

A General Computation Rule for Lossy Summaries/Messages with Examples from Equalization Junli Hu, Hans-Andrea Loeliger, Justin Dauwels, and Frank Kschischang arxiv:cs/060707v [cs.it] 1 Oct 006 Abstract Elaborating on prior work by Minka, we formulate a general computation rule for lossy messages. An important special case (with many applications in communications) is the conversion of soft-bit messages to Gaussian messages. By this method, the performance of a Kalman equalizer is improved, both for uncoded and coded transmission. I. INTRODUCTION We consider message passing algorithms in factor graphs [1], []. If the factor graph has no cycles, the messages computed by the basic sum-product and max-product algorithms are exact summaries of the subgraph behind the corresponding edge. However, in many applications (especially with continuous variables), complexity considerations suggest, or even dictate, the use of approximate or lossy summaries. For example, it is customary to use Gaussian messages even in cases where the true (sum-product or max-product) messages are not Gaussian, or to use scalar (i.e., single-variable) messages instead of multi-dimensional (i.e., multi-variable) messages. In this paper, we first formulate a general message update rule for lossy summaries/messages that is a nontrivial generalization of the standard sum-product or max-product rules. This rule was in essence proposed by Minka [3], [4], but our general formulation of it may not be obvious from Minka s work. We then focus on one particular application: the conversion of binary ( soft-bit ) messages into Gaussian messages, which has many uses in communications. For our numerical examples, we then further focus on equalization: we give simulation results for an iterative Kalman equalizer both for a linear FIR (finite impulse response) channel and for a linear IIR (infinite impulse response) channel. For uncoded transmission, the new algorithm almost closes the gap between the BJCR algorithm and the LMMSE (linear minimum mean squared error) equalizer; for coded transmission, the new algorithm improves the performance of the iterative Kalman equalizer at very little additional cost. It should be noted that the new message computation rule yields iterative algorithms even for cycle-free graphs. We also note that some sort of damping is usually required to stabilize the algorithm. Junli Hu and Hans-Andrea Loeliger are with the Dept. of Information Technology and Electrical Engineering, ETH Zurich, CH-809 Zurich, Switzerland. Justin Dauwels is with the Amari Research Unit of the RIKEN Brain Science Institute, -1 Hirosawa, Wako-shi, Saitama-ken, 351-198, Japan. Frank Kschischang is with the Electrical and Computer Engineering Dept., University of Toronto, Ontario M4S 3G5, Canada. µ true µ approx X µ Fig. 1. Lossy message µ approx along a general edge/variable X. In this paper, we will use Forney-style factor graphs as in [] where edges represent variables and nodes represent factors. II. A GENERAL COMPUTATION RULE FOR LOSSY MESSAGES/SUMMARIES Consider the messages along a general edge (variable) X in some factor graph as illustrated in Fig. 1. Let µ true (x) be the true sum-product or max-product message which we want (or need) to replace by a message µ approx (x) in some prescribed family of functions (e.g., Gaussians). In such cases, most writers (including these authors) used to compute µ approx as some approximation of µ true. However, the semantics of factor graphs suggests another approach. Note that the factor graph of Fig. 1 represents the function f(x) = µ true (x) µ(x), (1) which the replacement of µ true by µ approx will change into It is thus natural to first compute f(x) = µ approx (x) µ(x). () f(x) = some approximation of µ true (x) µ(x) (3) and then to compute µ approx from (). The approximation in (3) must be chosen so that solving () for µ approx yields a function in the prescribed family. Important special cases of this general approach (including the Gaussian case) were proposed as expectation propagation in [3] and [4]. The choice of a suitable approximation in (3) will, in general, depend on the application. For many applications, a natural approach (proposed and pursued by Minka) is to minimize the Kullback-Leibler divergence: f = argmin D(f f ). (4) f in chosen family In this paper, the approximate messages will always be Gaussian. However, other families of functions can be used. For example, multivariable messages with a prescribed

Fig.. or µ gm. µb µ b binary Y Gaussian µg = X µ gs µ gm Converting a soft-bit message µ b into a Gaussian message µ gs Markov-chain structure were used in [6]; with hindsight, the update rule for such messages that was proposed in [6] is indeed an example of the general scheme described here. A related idea was proposed in [5]. III. CONVERTING SOFT-BIT MESSAGES TO GAUSSIAN MESSAGES We will now apply the general scheme of the previous section to the conversion of messages defined on the finite alphabet {+1, 1} into Gaussian messages. The setup is shown in Fig., which is (a part of) a factor graph with an equality constraint between the real variable X and the {+1, 1}-valued variable Y. (The equality constraint node in Fig. may formally be viewed as representing the factor δ(x y), which is to be understood as a Dirac delta in x and a Kronecker delta in y.) The messages µ b and µ b are defined on the finite alphabet {+1, 1} and the messages µg, µ gs, and µ gm are Gaussians; µ gs denotes the standard Gaussian approximation and µ gm denotes the alternative Gaussian approximation due to Minka, as will be detailed below. Let us first recall the conversion of Gaussian messages into soft-bit messages. Let m g and g be the mean and the variance, respectively, of µ g. The (lossless) conversion from µg to µ b is an immediate and standard application of the sum-product (or max-product) rule [1], []: ( µb ) ( µg ) (+1) (+1) ; (5) µb ( 1) µg ( 1) in the standard logarithmic representation, this becomes µb (+1) ln µb ( 1) = m g. (6) g We now turn to the more interesting lossy conversion of µ b into a Gaussian. Let m b and b be the mean and the variance, respectively, of µ b, which are given by m b = µ b (+1) µ b ( 1) µ b (+1)+ µ b ( 1) (7) b = 1 ( m b ). (8) The traditional approach forms the Gaussian message µ gs (with mean m gs and variance gs) from the mean and the variance of µ b : m gs = m b and gs = b. (9) The approach of Section II yields another Gaussian message µ gm (with mean m gm and variance gm ) as follows. In Fig., the true global function corresponding to (1) is δ(x 1) µ b (+1) µ g (+1)+δ(x+1) µ b ( 1) µ g ( 1) (10) which (when properly normalized) has mean and variance m true = m b + m b 1+ m b mb (11) true = 1 (m true), (1) where m b is the mean of µ b (5), which is formed as in (7). The approximate global function (corresponding to ()) is the Gaussian µ gm (x) µ g (x) (13) with mean m g and variance g given by 1/g = 1/ gm +1/ g (14) m g /g = m gm / gm + m g / g. (15) Now a natural choice for the approximation (3) is to equate the mean and the variance of the Gaussian approximation with the corresponding moments of the true global function: m g = m true and g = true. (16) (As pointed out by Minka, this choice may be derived from (4).) The desired Gaussian message µ gm is thus obtained by first evaluating (11) and (1) and then computing gm and m gm from (14) and (15). Note that, in general, the message µ gm is not trivial even if µ b is neutral ( m b = 0 and b = 1). A. Negative Variance IV. ISSUES Solving (14) for gm may result in a negative value for gm in Section V.) In such cases,. (This indeed happens in the examples to be described µ gm is a correction factor (not itself a probability mass function) that tries to compensate for an overly confident µ g. The product (13) usually remains a valid probability mass function, up to a scale factor.

code binary X k B X 1 X... X n channel model Gaussian... A + =... Y 1 Y... Y n C Fig. 3. Joint code/channel factor graph. W k + B. Damping In our numerical experiments (Section V), simply replacing the standard Gaussian message µ gs by µ gm yielded unstable algorithms. Good results were obtained, however, by geometric mixtures of the form Fig. 4. Y k Factor graph of the channel model (one section). µ g (x) = ( µgm (x)) α ( µgs (x)) 1 α (17) with 0 α 1. The mean and the variance of the resulting Gaussian µ g are given by and m g = 1/ g = α/ gm +(1 α)/ gs (18) m gm α/ gm + m gs (1 α)/ gs α/ gm +(1 α)/ gs V. APPLICATION EXAMPLE: EQUALIZATION (19) Consider the transmission of binary ({+1, 1}-valued) symbols X 1,...,X n over a linear channel with transfer function H(z) = M l=0 h lz 1 and additive white Gaussian noise W 1,...,W n. The received channel output symbols are Y 1,...,Y n with Y k = M h l X k l +W k, (0) l=0 where we assume X k = 0 for k < 0. The binary symbols X k may or may not be coded. The joint code/channel factor graph is shown in Fig. 3 with channel-model details as in Fig. 4. (In the uncoded case, the code graph is missing.) The factor graph shown in Fig. 4 results from writing (0) in state space form with suitable matrices A, B, and C, where B is a column vector and C is a row vector, cf. []. Equalization is achieved by forward-backward Gaussian message passing (i.e., Kalman smoothing) in the factor graph of Fig. 4 according to the recipes stated in []. (See [7] for a more detailed discussion.) In this paper, we are only concerned with the messages along the edges X k (towards the channel model) in Fig. 3. Using the standard messages (9) results in an LMMSE equalizer; in the uncoded case, this algorithm terminates after a single forward-backward sweep since the factor graph of Fig. 4 has no cycles. However, using the (damped) Minka messages (17) (19) results in an iterative algorithm even in the uncoded case. Simulation results for two different channels are given in Figures 5 7. Figures 5 and 6 show the bit error rate vs. the signal-to-noise ratio () for an FIR channel with transfer function H(z) = 0.7+0.46z 1 +0.688z +0.46z 3 + 0.7z 4 ; Fig. 7 shows the bit error rate vs. the for an IIR channel with transfer function H(z) = 1/(1 0.9z 1 ). The FIR channel was used as an example in [8]; because this channel has a spectral null, the difference between a LMMSE equalizer and the optimal BCJR equalizer is large. The IIR channel was used as an example in [9]. Two different message update schedules are used: in Schedule A, the output messages (along edge X k out of the channel model) are initialized to infinite variance and are updated only after a complete forward-backward Kalman sweep; in Schedule B, these messages are updated (and immediately used for the corresponding incoming Minka message) both during the forward Kalman sweep and the backward Kalman sweep. From our simulations, Schedule B is clearly superior. It is obvious from Figures 5 and 7 that, for uncoded transmission, the Minka messages provide a very marked improvement over the standard messages (i.e., over the LMMSE equalizer). In Fig. 5, we almost achieve the performance of the BCJR (or Viterbi) equalizer (and we also outperform the decision-feedback equalizer [8, p. 643]). As for Fig. 7, we almost achieve the performance of the quasi-viterbi algorithm reported in [9]. For the coded example of Fig. 6, a rate 1/ convolutional codes with constraint length 7 was used. In this case, the iterative Kalman equalizer does quite well already with the standard input messages (9), but the Minka messages do give

Minka, Schedule A, 1st Ite (LMMSE) Minka, Schedule A, nd Ite Minka, Schedule A, 10th Ite Minka, Schedule B, nd Ite Minka, Schedule B, 10th Ite BCJR limit Minka, Schedule A, 1st Ite (LMMSE) Minka, Schedule A, nd Ite Minka, Schedule A, 10th Ite Minka, Schedule B, nd Ite Minka, Schedule B, 10th Ite Probability of bit error (BER) 10 Probability of bit error (BER) 10 5 0 5 10 15 0 5 30 35 40 Fig. 5. Bit error rate vs. for uncoded binary transmission over FIR channel with transfer function H(z) = 0.7 + 0.46z 1 + 0.688z + 0.46z 3 +0.7z 4. 10 1 14 16 18 0 Fig. 7. Bit error rate vs. for uncoded binary transmission over IIR channel with transfer function H(z) = 1/(1 0.9z 1 ). Probability of error (BER) 10 LMMSE / Kalman, 1st Ite LMMSE / Kalman, 10th Ite Minka, Schedule B, 10th Ite BCJR, 1st Ite BCJR, 10th Ite 10 = 1 db 15 db 0 4 6 8 10 1 14 16 Fig. 6. Bit error rate vs. for coded binary transmission over FIR channel. a further improvement at very small cost. A key issue with all these simulations is the choice of the damping/mixing factor α in (17) (19). The best results were obtained by changingα in every iteration. Typical good sequences of values of α are plotted in Fig. 8. We note the following observations: The initial values of α are very small. After a moderate number of iterations (typically 10... 0), the bit error rate stops decreasing. At this point, α is still very small. Many more iterations with slowly increasing α are required to reach a fixed point with α = 1. At such a fixed point with α = 1, the approximation (3) holds everywhere. 0 0 40 60 80 100 10 Fig. 8. Good sequences for α vs. the iteration number k. VI. CONCLUSION Elaborating on Minka s work, we have formulated a general computation rule for lossy messages. An important special case is the conversion of soft-bit messages to Gaussian messages. In this case, the resulting Gaussian message is non-trivial even if the soft-bit message is neutral. By this method, the performance of a Kalman equalizer is significantly improved. REFERENCES [1] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, Factor graphs and the sum-product algorithm, IEEE Trans. Information Theory, vol. 47, pp. 498 519, Feb. 001. [] H.-A. Loeliger, An introduction to factor graphs, IEEE Signal Proc. Mag., Jan. 004, pp. 8 41. [3] T. P. Minka, A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology (MIT), 001. [4] T. P. Minka, Expectation propagation for approximate Bayesian inference, in Proceedings of the 17th Annual Conference on Uncertainty in Artificial Intelligence (UAI-01), vol. 17, pp. 36 369, 001.

[5] T. Minka and Y. Qi, Tree-structured approximations by expectation propagation, in Advances in Neural Information Processing Systems 16, MIT Press, 003. [6] J. Dauwels, H.-A. Loeliger, P. Merkli, and M. Ostojic, On Markov structured summary propagation and LFSR synchronization, Proc. 4nd Allerton Conf. on Communication, Control, and Computing, (Allerton House, Monticello, Illinois), Sept. 9 Oct. 1, 004, pp. 451 460. [7] H.-A. Loeliger, J. Hu, S. Korl, and Li Ping, Gaussian message passing and related topics: an update, Proc. 4th Int. Symp. on Turbo Codes & Related Topics, Munich, Germany, April 3 7, 006. [8] J. Proakis, Digital Communications. 4th ed., McGraw-Hill, 001. [9] A. Duel-Hallen and C. Heegard, Delayed decision-feedback sequence estimation, IEEE Trans. Communications, vol. 37, pp. 48 436, May 1989.