INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS. Michael A. Lexa and Don H. Johnson

Rice University, Department of Electrical and Computer Engineering, Houston, TX 77005-1892
amlexa@rice.edu, dhj@rice.edu

ABSTRACT

This paper applies the concepts of information processing [1] to the study of binary detectors and block decoders in a single-user digital communication system. We quantify performance in terms of the information transfer ratio $\gamma$, which measures how well systems preserve discrimination information between two stochastic signals. We investigate hard-decision detectors and minimum-distance decoders in various additive noise environments. We show that likelihood ratio digital demodulators maximize $\gamma$.

1. INTRODUCTION

In our theory of information processing, information is defined only with respect to the ultimate receiver. Consequently, no single objective measure can quantify the information a signal expresses. For example, this paper (presumably) means more to a signal processing researcher than it does to a Shakespearean scholar. To probe how well systems process information, we resort to calculating how well an informational change at the input is expressed in the output. The complete theoretical basis of this theory can be found elsewhere [1]. Briefly, to quantify an informational change, we calculate the information-theoretic distance, specifically the Kullback-Leibler (KL) distance, between the probability distributions characterizing the signals that encode two pieces of information. (The word "distance" does not imply a metric, since the KL distance is not symmetric in its arguments and does not satisfy the triangle inequality.) We assume the signals, but not the information, are stochastic. The Data Processing Theorem [1] says that the KL distance between the outputs of any system responding to the two inputs must be less than or equal to the distance calculated at the input. Here, we use this framework to characterize how well likelihood ratio detectors and block decoders process the information encoded in their inputs.

(This work was supported by the National Science Foundation under Grant CCR-5558.)

We adopt the digital communication system model shown schematically in Figure 1. The input binary data word $u_\alpha$ of length $K$ represents the information the receiver ultimately wants. The encoder simply maps the data word into a code word of length $N$ ($u_\alpha \to v_\alpha$) and passes the code word on to the modulator. The modulator maps the code word into its signal representation ($v_\alpha \to s_\alpha$) and transmits a continuous-time signal using an antipodal signal set. The channel adds white noise, and the total transmission interval for each data word is $KT$ seconds. Viewed from the framework of information processing, we say that the information is encoded in the received signal vector $r_\alpha$. Obviously, we use the word "encode" in an untraditional sense. What we mean is the following. The theory of information processing assumes that information does not exist in a tangible form; rather, it is always contained within a signal. Thus, the received signal vector contains information about the data word. Normally, we would describe the received signal as a noisy version of the transmitted signal, but viewing the information as being encoded makes it easier to think about this theory in an arbitrary setting.

We calculate three KL distances. The first is between the two received signal vectors $r_{\alpha_1}$, $r_{\alpha_2}$ at the input to the detector. (The subscripts $\alpha_1$ and $\alpha_2$ distinguish the two transmitted pieces of information.) The second is between the detected binary words $w_{\alpha_1}$, $w_{\alpha_2}$ at the output of the detector (input to the decoder), and the third is between the decoded binary words $\hat{u}_{\alpha_1}$, $\hat{u}_{\alpha_2}$. We denote these distances by $D_r(\alpha_1\|\alpha_2)$, $D_w(\alpha_1\|\alpha_2)$, and $D_{\hat{u}}(\alpha_1\|\alpha_2)$, respectively. These distances represent the informational change between these particular signals. Certainly, a change in information (that is, a change in data words) induces the distance, but more importantly, through Stein's Lemma [2], the KL distance is the exponential decay rate of the false alarm probability of an optimum Neyman-Pearson detector. Thus, these distances quantify our ability to discriminate between the two information-bearing signals at the input and output of the detector and decoder. Because of the Data Processing Theorem [1], the detector and the decoder can at best preserve the distance presented at their input and, at worst, reduce it to zero, causing the ultimate recipient of the transmission to lose all ability to discern the informational change.

Fig. 1. Two binary data blocks $u_{\alpha_1}$, $u_{\alpha_2}$ are separately transmitted through the chain $u_\alpha$ → encoder → $v_\alpha$ → modulator → $s_\alpha(t)$ → channel (additive noise $n(t)$) → $r_\alpha(t)$ → demodulator → $r_\alpha$ → detector → $w_\alpha$ → decoder → $\hat{u}_\alpha$. The Kullback-Leibler distance between the distributions induced by each of the data blocks is calculated at the input and output of the detector and the decoder. The ratios of the input and output distances provide a measure of how well the detector and decoder preserve the informational change encoded in their input signals.
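To make the system model concrete, the following is a minimal simulation sketch of one pass through the chain of Figure 1. It is our own illustration, not the authors' code: it assumes Gaussian noise, the (3,1) repetition code, a 0 → positive amplitude and 1 → negative amplitude antipodal mapping, and an arbitrary SNR; all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (N, K) block code: the (3,1) repetition code. The framework
# applies to any block code; the paper also uses the (7,4) Hamming code.
K, N = 1, 3
Eb = 1.0                                  # energy per data bit (normalized)
Eb_N0_dB = 4.0
N0 = Eb / 10 ** (Eb_N0_dB / 10)

def encode(u):
    """Map a K-bit data word u_alpha to an N-bit code word v_alpha."""
    return np.repeat(u, N // K)

def modulate(v):
    """Antipodal mapping; each coded bit carries energy K*Eb/N so that the
    energy per data bit (and the data rate) matches the uncoded case."""
    return np.sqrt(K * Eb / N) * (1 - 2.0 * v)

def detect(r):
    """Hard-decision detector: compare each received sample to the threshold 0."""
    return (r < 0).astype(int)

def decode(w):
    """Minimum-Hamming-distance decoder for the repetition code (majority vote)."""
    return np.array([int(np.sum(w) > N // 2)])

u_alpha = np.array([1])                                          # data word
v_alpha = encode(u_alpha)                                        # code word
r_alpha = modulate(v_alpha) + rng.normal(scale=np.sqrt(N0 / 2), size=N)
w_alpha = detect(r_alpha)                                        # detected binary word
u_hat = decode(w_alpha)                                          # decoded data word
print(u_alpha, v_alpha, w_alpha, u_hat)
```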

The performance criterion we use is the information transfer ratio, denoted by $\gamma$ and defined as the ratio of the KL distances at the output and input of any system. It is a number between zero and one and reflects the fraction of the informational change preserved across a system. Here, we study the information transfer ratios of the detector and the decoder,

$$\gamma_{\text{det}} = \frac{D_w(\alpha_1\|\alpha_2)}{D_r(\alpha_1\|\alpha_2)} \qquad (1)$$

$$\gamma_{\text{dec}} = \frac{D_{\hat{u}}(\alpha_1\|\alpha_2)}{D_w(\alpha_1\|\alpha_2)} \qquad (2)$$

Ideally, the information transfer ratio across each of these systems would equal one, indicating no informational loss. In reality, however, we expect informational losses because the probability of error is never zero. The overall information transfer ratio across both the detector and decoder is simply the product of the individual information transfer ratios [3]:

$$\gamma_{\text{overall}} = \gamma_{\text{det}}\,\gamma_{\text{dec}}.$$

2. KULLBACK-LEIBLER DISTANCE CALCULATIONS

Each transmitted data word induces a probability distribution on the received signal vector at the output of the demodulator. For example, if the channel adds white Gaussian noise, then each element of the received vector $r_\alpha$ is normally distributed with mean $\pm\sqrt{KE_b/N}$ and variance $N_0/2$, depending upon whether a zero or a one is transmitted. ($E_b$ is the energy per data bit.) The statistical independence of the received vector elements allows us to write the KL distance at the input of the detector as a sum of the distances between the received vector elements [3]:

$$D_r(\alpha_1\|\alpha_2) = \sum_{j=1}^{N} D_{r_j}(\alpha_1\|\alpha_2). \qquad (3)$$

Simplifying this expression, we can rewrite it in terms of the Hamming distance between $v_{\alpha_1}$ and $v_{\alpha_2}$, because $D_{r_j}(\alpha_1\|\alpha_2) = 0$ if the $j$th bits in each word are the same:

$$D_r(\alpha_1\|\alpha_2) = d_H(v_{\alpha_1}, v_{\alpha_2})\, D_{r_j}(\alpha_1\|\alpha_2), \qquad (4)$$

where $D_{r_j}(\alpha_1\|\alpha_2)$ now denotes the common per-element distance at any bit position where the two code words differ. Table 1 lists these per-element KL distances for various noise distributions as a function of SNR.

The detector compares each received sample $r_{\alpha j}$ ($j = 1, \ldots, N$) to a threshold and declares as its output either a one or a zero. The detected binary word $w_\alpha$ is the collection of $N$ such outputs. The decoder maps $w_\alpha$ to estimates of the transmitted data words ($w_{\alpha_1} \to \hat{u}_{\alpha_1}$, $w_{\alpha_2} \to \hat{u}_{\alpha_2}$). Specifically, its output is the code word closest in Hamming distance to $w_\alpha$ (minimum-distance decoding). We calculate the KL distance at the output of the detector by viewing each binary vector $w_n$ ($n = 1, \ldots, 2^N$) as the output of a binary symmetric channel with error probability $P_e$. (See Table 1 for expressions of $P_e$ for different noise distributions.) Accordingly, the probability of receiving $w_n$ when we transmit $v_{\alpha_1}$ (or equivalently $u_{\alpha_1}$) is $\Pr[w_n \mid u_{\alpha_1}] = P_e^{d_H(w_n, v_{\alpha_1})}(1-P_e)^{N - d_H(w_n, v_{\alpha_1})}$. These probabilities define the discrete distribution over the output of the detector; thus, by definition, we obtain

$$D_w(\alpha_1\|\alpha_2) = \sum_{j=1}^{N} D_{w_j}(\alpha_1\|\alpha_2) = \sum_{n=1}^{2^N} \Pr[w_n \mid u_{\alpha_1}] \log \frac{\Pr[w_n \mid u_{\alpha_1}]}{\Pr[w_n \mid u_{\alpha_2}]}. \qquad (5)$$
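As a concrete check of equation (5), the following sketch (ours, using natural logarithms and the binary-symmetric-channel probabilities defined above) enumerates all $2^N$ detected words and compares the result with $d_H(v_{\alpha_1}, v_{\alpha_2})$ times the per-bit KL distance between Bernoulli($1-P_e$) and Bernoulli($P_e$), which the two expressions must agree on because the detected bits are independent.

```python
import numpy as np
from itertools import product

def kl_detector_output(v1, v2, Pe):
    """D_w(alpha1 || alpha2) from Eq. (5): KL distance between the detected-word
    distributions induced by code words v1 and v2, with each bit flipped
    independently with probability Pe (binary symmetric channel view)."""
    v1, v2 = np.asarray(v1), np.asarray(v2)
    N = v1.size
    D = 0.0
    for w in product([0, 1], repeat=N):
        w = np.array(w)
        d1 = int(np.sum(w != v1))           # Hamming distance to v1
        d2 = int(np.sum(w != v2))           # Hamming distance to v2
        p1 = Pe**d1 * (1 - Pe)**(N - d1)    # Pr[w | u_alpha1]
        p2 = Pe**d2 * (1 - Pe)**(N - d2)    # Pr[w | u_alpha2]
        D += p1 * np.log(p1 / p2)
    return D

# Sanity check: with independent bits the distance equals d_H times the
# per-bit KL distance between Bernoulli(1-Pe) and Bernoulli(Pe).
Pe = 0.05
v1, v2 = [0, 0, 0], [1, 1, 0]
per_bit = (1 - Pe) * np.log((1 - Pe) / Pe) + Pe * np.log(Pe / (1 - Pe))
print(kl_detector_output(v1, v2, Pe), 2 * per_bit)
```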

| Noise distribution | $D_{r_j}(\alpha_1\|\alpha_2)$ | $P_e$ | $\gamma_{\text{det}}$ (SNR $\to 0$) | $\gamma_{\text{det}}$ (SNR $\to \infty$) |
|---|---|---|---|---|
| Gaussian | $4\xi$ | $Q(\sqrt{2\xi})$ | $2/\pi$ | $1/4$ |
| Laplacian | … | … | … | … |
| Hyperbolic secant | … | … | … | … |
| Cauchy | … | … | … | … |

Table 1. The Kullback-Leibler distances between the received random variables $r_{\alpha_1 j}$ and $r_{\alpha_2 j}$ and the detector's hard-decision bit error probabilities appear in columns two and three for the various noise distributions. In each expression $\xi = KE_b/(NN_0)$, where the signal-to-noise ratio per bit (SNR) equals $E_b/N_0$. For the Cauchy distribution, the quantity $N_0$ is understood to be the width parameter. The fourth and fifth columns list the asymptotic values of the information transfer ratio across the detector. (See Figure 2.) When no error control coding is employed, $K = N$.

Calculation of the KL distance at the output of the decoder hinges on the decoding probabilities. Assuming $u_{\alpha_1}$ is transmitted, the probability of decoding it as $\hat{u}_m$ ($m = 1, \ldots, 2^K$) is the total probability mass of the decoding sphere of $v_m$:

$$\Pr[\hat{u}_m \mid u_{\alpha_1}] = \sum_{l=1}^{L_m} \Pr[w_l \mid u_{\alpha_1}].$$

Here, $l$ indexes the $L_m$ binary words within the decoding sphere of $v_m$. To ensure the KL distance at the output of the decoder is defined, we assume there are no failure-to-decode events; in other words, we assume each $w_n$ lies within a decoding sphere. Similar to equation (5), we have

$$D_{\hat{u}}(\alpha_1\|\alpha_2) = \sum_{m=1}^{2^K} \Pr[\hat{u}_m \mid u_{\alpha_1}] \log \frac{\Pr[\hat{u}_m \mid u_{\alpha_1}]}{\Pr[\hat{u}_m \mid u_{\alpha_2}]}. \qquad (6)$$

In the special case when no error control coding is employed, $v_\alpha = u_\alpha$, $N = K$, and the decoder performs no function. The output of the detector is the estimate of the transmitted data word. The expression for the KL distance at the input of the detector remains unchanged, except that $u_\alpha$ substitutes for $v_\alpha$ in equation (4). The KL distance at the output of the detector can be written like equation (6), but with $\Pr[\hat{u}_m \mid u_{\alpha_1}]$ replaced by $\Pr[\hat{u}_m \mid u_{\alpha_1}] = P_e^{d_H(u_m, u_{\alpha_1})}(1-P_e)^{K - d_H(u_m, u_{\alpha_1})}$. In this case we can also simplify the output KL distance in much the same way as equation (3). Because the bit estimates are statistically independent when there is no coding, we can write

$$D_{\hat{u}}(\alpha_1\|\alpha_2) = d_H(u_{\alpha_1}, u_{\alpha_2}) \left[ (1-P_e) \log \frac{1-P_e}{P_e} + P_e \log \frac{P_e}{1-P_e} \right]. \qquad (7)$$

The bracketed term is the KL distance between the binary distributions which result every time a data bit is transmitted.

3. EXAMPLES AND DISCUSSION

We study three fundamental examples. We investigate performance when no error control coding is used (the uncoded case), and then consider two Hamming codes, the (3,1) and (7,4) codes. In order to make fair comparisons between uncoded and coded cases, we maintain constant data rates. This requirement constrains the total transmission time of the $N$ coded bits of an $(N, K)$ code to $KT$ seconds. (It takes $KT$ seconds to transmit $K$ data bits in the uncoded case.)

We plot the information transfer ratios for the four noise distributions in Figure 2 for the uncoded case and list their respective asymptotic values of $\gamma_{\text{det}}$ in Table 1. These curves show the informational loss incurred by making hard decisions at the detector. Notice the decrease in performance as the SNR increases. It is not due to the output KL distances decreasing but, instead, to the growing proportional difference between the input and output distances. (See Figure 3.) This fact means the detector better preserves the informational change at lower SNR values than at higher values. However, even though the detector becomes less efficient as the SNR increases, the loss is not great.

Fig. 2. The performance of the detector in terms of the information transfer ratio, plotted versus SNR (dB), is shown for the uncoded case and each of the four noise distributions. The performance is independent of the data word length $K$.

Fig. 3. The widening gap between the Kullback-Leibler distances at the input and output of the hard-decision detector, plotted versus SNR (dB), illustrates why the information transfer ratio decreases with increasing SNR. This particular plot is generated with $K = 4$.
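The Gaussian-noise curve in Figure 2 can be reproduced from the Gaussian row of Table 1 together with the per-bit Bernoulli KL distance. The sketch below is our own (natural logarithms, SciPy's norm.sf for the Q-function); the printed ratios fall from roughly 2/π at low SNR toward 1/4 at high SNR, consistent with the trend just described.

```python
import numpy as np
from scipy.stats import norm

def gamma_det_gaussian(snr_db):
    """Information transfer ratio across the hard-decision detector for
    Gaussian noise in the uncoded case (so xi = Eb/N0). Uses the Table 1
    entries D_r = 4*xi and Pe = Q(sqrt(2*xi)) with the per-bit KL distance
    between Bernoulli(1-Pe) and Bernoulli(Pe)."""
    xi = 10 ** (np.asarray(snr_db) / 10)
    Pe = norm.sf(np.sqrt(2 * xi))                   # Q(sqrt(2*xi))
    D_w = (1 - 2 * Pe) * np.log((1 - Pe) / Pe)      # per-bit output KL distance
    D_r = 4 * xi                                    # per-bit input KL distance
    return D_w / D_r

snr_db = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
print(np.round(gamma_det_gaussian(snr_db), 3))
print("low-SNR limit 2/pi =", 2 / np.pi, " high-SNR limit = 0.25")
```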
Because the information transfer ratio across the detector is completely independent of the input data words, it is, in particular, independent of the input data word length for the uncoded case. We prove in Appendix A that a likelihood ratio digital demodulator maximizes the information transfer ratio across binary detectors. Thus, the curves in Figure 2 represent the best achievable performance across any hard-decision detector.

Figure 4 plots $\gamma_{\text{det}}$, $\gamma_{\text{dec}}$, and $\gamma_{\text{overall}}$ when we use the (3,1) and (7,4) Hamming codes. The top row exhibits the losses across the detector; the middle row, across the decoder; and the bottom row, across both systems.
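For the (3,1) code in Gaussian noise, the three ratios plotted in the left column of Figure 4 have simple closed forms, since minimum-distance decoding reduces to a majority vote. The sketch below is our own illustration under the constant-data-rate constraint (each coded bit carries energy $KE_b/N$) and natural logarithms; it is not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def bernoulli_kl(p):
    """KL distance between Bernoulli(1-p) and Bernoulli(p), natural log."""
    return (1 - 2 * p) * np.log((1 - p) / p)

def gammas_rep3(EbN0_dB):
    """gamma_det, gamma_dec, gamma_overall for the (3,1) code in Gaussian noise.
    Exact expressions under the constant-data-rate constraint; an illustration."""
    K, N, dH = 1, 3, 3                       # the two code words are 000 and 111
    xi = K * 10 ** (EbN0_dB / 10) / N        # K*Eb/(N*N0) per coded bit
    Pe = norm.sf(np.sqrt(2 * xi))            # hard-decision bit error probability
    D_r = dH * 4 * xi                        # Eq. (4) with the Gaussian row of Table 1
    D_w = dH * bernoulli_kl(Pe)              # Eq. (5); detected bits are independent
    P_block = Pe**3 + 3 * Pe**2 * (1 - Pe)   # majority-vote decoding error
    D_u = bernoulli_kl(P_block)              # Eq. (6) for the two decoded data words
    return D_w / D_r, D_u / D_w, D_u / D_r

for snr in (-10.0, 0.0, 10.0):
    g_det, g_dec, g_all = gammas_rep3(snr)
    print(f"{snr:6.1f} dB  gamma_det={g_det:.3f}  gamma_dec={g_dec:.3f}  gamma_overall={g_all:.3f}")
```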

Because of the constant data rate constraint, the information transfer ratio curves across the detector are scaled versions of the curves in Figure 2. The examples studied here show a relatively constant additional loss across the decoder. These curves are identical when plotted against probability of error. Why they are not monotonic is an issue we are studying. Apparently, a more efficient code, the (7,4) code here, yields larger information transfer ratios.

Fig. 4. Plots of the information transfer ratio across the detector and decoder for the (3,1) (left column) and the (7,4) (right column) Hamming codes, shown for the various noise distributions (rows: across detector, across decoder, overall). The detector makes hard decisions and the decoder uses minimum-distance decoding. For the (3,1) code the two code words are the all-zero and all-one words; for the (7,4) code the two code words are separated by a Hamming distance of 4. We arbitrarily reference the plots to the all-zero code word.

The performance across the decoder depends upon the choice of the transmitted code words. In general, the dependence is related in a complicated way to Hamming distance, but for the (7,4) code studied here, greater distance implies better performance. For example, instead of choosing two code words with a Hamming distance of 4 as in Figure 4, we could choose two with a Hamming distance of 7. As shown in Figure 5, compared to the right middle panel of Figure 4, better fidelity results at high SNR.

Fig. 5. The information transfer ratio across the decoder for the (7,4) code, with $\alpha_1$ the all-zero code word and $\alpha_2$ the all-one code word. The improved performance, compared with the right middle plot of Figure 4, is due to the increase in Hamming distance from 4 to 7.

Within the framework of information processing, the concept of coding gain does not exist. Because of the Data Processing Theorem, error control coding simply cannot regain the informational loss across the detector. Once the loss occurs, no post-processing can compensate for it. More powerful codes and decoding schemes could conceivably improve the informational efficiency across the decoder. At present, however, no methods or even approaches exist for designing codes and decoding schemes to maximize $\gamma$ across the decoder. Improvements can be made across the detector if we introduce soft-decision detectors. In fact, it is not difficult to think of examples in which this is the case. Such investigations could possibly lead to using $\gamma$, for example, to systematically study soft-decision decoding.
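To illustrate equation (6) and the Hamming-distance effect seen in Figure 5, the sketch below enumerates the decoding spheres of a (7,4) Hamming code and evaluates the decoder-output KL distance for a code word pair at distance 4 and one at distance 7. The particular generator matrix, the 0 dB operating point, and the natural logarithm are our choices; the paper does not specify them.

```python
import numpy as np
from itertools import product
from scipy.stats import norm

# Generator matrix of a (7,4) Hamming code in systematic form (our choice).
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
codewords = np.array([(np.array(u) @ G) % 2 for u in product([0, 1], repeat=4)])
words = np.array(list(product([0, 1], repeat=7)))
# Minimum-distance decoding: index of the nearest code word for every 7-bit word.
nearest = np.argmin([[np.sum(w != c) for c in codewords] for w in words], axis=1)

def decoder_output_kl(idx1, idx2, Pe):
    """D_uhat(alpha1 || alpha2) from Eq. (6): sum the BSC word probabilities over
    each decoding sphere, then take the KL distance between the two resulting
    distributions over the 16 decoded data words."""
    def sphere_probs(idx):
        d = np.sum(words != codewords[idx], axis=1)        # Hamming distances
        p_word = Pe**d * (1 - Pe)**(7 - d)                 # Pr[w_n | u_alpha]
        return np.bincount(nearest, weights=p_word, minlength=16)
    p1, p2 = sphere_probs(idx1), sphere_probs(idx2)
    return float(np.sum(p1 * np.log(p1 / p2)))

xi = 4 * 1.0 / 7                                   # K*Eb/(N*N0) at Eb/N0 = 0 dB
Pe = norm.sf(np.sqrt(2 * xi))                      # Gaussian row of Table 1
i0 = 0                                             # all-zero code word
i4 = int(np.argmax(codewords.sum(axis=1) == 4))    # a code word at distance 4
i7 = int(np.argmax(codewords.sum(axis=1) == 7))    # the all-one code word (distance 7)
print("distance 4:", decoder_output_kl(i0, i4, Pe))
print("distance 7:", decoder_output_kl(i0, i7, Pe))
```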

A. APPENDIX

Consider a general binary detection problem where $r_{\alpha_1}$ and $r_{\alpha_2}$ are the two possible received signal vectors presented at the input of the detector under hypotheses $\alpha_1$ and $\alpha_2$, respectively. Let $p(r \mid \alpha_1)$ and $p(r \mid \alpha_2)$ be the conditional probability density functions associated with each hypothesis. Denote the output decisions of the detector as $\Lambda_1$ and $\Lambda_2$. The information transfer ratio equals

$$\gamma = \frac{D_\Lambda(\alpha_1\|\alpha_2)}{D_r(\alpha_1\|\alpha_2)} = \frac{P_D \log(P_D/P_F) + (1-P_D)\log\bigl((1-P_D)/(1-P_F)\bigr)}{\displaystyle\int p(r \mid \alpha_1)\,\log\frac{p(r \mid \alpha_1)}{p(r \mid \alpha_2)}\,dr},$$

where $P_D$ is the probability of detection and $P_F$ is the probability of false alarm. Explicitly,

$$p(\Lambda_1 \mid \alpha_1) = P_F, \quad p(\Lambda_2 \mid \alpha_1) = 1-P_F, \quad p(\Lambda_1 \mid \alpha_2) = P_D, \quad p(\Lambda_2 \mid \alpha_2) = 1-P_D.$$

Maximizing $\gamma$ is equivalent to maximizing the numerator, which translates into finding values of $P_D$ and $P_F$ that maximize

$$P_D \log\!\left(\frac{P_D}{P_F}\right) + (1-P_D)\log\!\left(\frac{1-P_D}{1-P_F}\right) = -H(P_D) - P_D \log P_F - (1-P_D)\log(1-P_F). \qquad (8)$$

Since $P_D$ and $P_F$ are coupled, they cannot be independently optimized, so without loss of generality assume $P_F = a$ and $P_D = a + l$. Substituting these values into equation (8) and setting its derivative with respect to $l$ equal to zero, we obtain

$$\log\!\left[\frac{a + l - (a^2 + al)}{a - (a^2 + al)}\right] = 0.$$

For a given value of $a$ ($P_F$), we note that the derivative is positive for $l > 0$, negative for $l < 0$, and zero when $l = 0$ (a minimum). Thus, to maximize the numerator of equation (8), we choose the largest possible $l$, constrained to $l \le 1 - a$. The upper bound results from the fact that $P_D$ and $P_F$ are probabilities and thus must be between zero and one. Formally, for a given false-alarm probability,

$$\max_{l \le 1-a} l = \max\,(P_D - P_F) = \max_{\Lambda_1} \int_{\Lambda_1} \bigl[p(r \mid \alpha_2) - p(r \mid \alpha_1)\bigr]\,dr.$$

Therefore $\Lambda_1$ should be defined as $\Lambda_1 = \{r : p(r \mid \alpha_2) > p(r \mid \alpha_1)\}$, which is exactly the condition of the likelihood ratio test. This result is general and holds for all noise distributions.

B. REFERENCES

[1] S. Sinanović and D. H. Johnson, "Toward a theory of information processing," submitted to IEEE Transactions on Signal Processing, June 2002.

[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, 1991.

[3] S. Sinanović, Toward a Theory of Information Processing, Master of Science thesis, Rice University, Houston, TX, 1999.