Model Identification for Wireless Propagation with Control of the False Discovery Rate

Christoph F. Mecklenbräuker (TU Wien)
Joint work with Pei-Jung Chung (Univ. Edinburgh), Dirk Maiwald (Atlas Elektronik), Nicolai Czink (FTW), Bernard H. Fleury (Aalborg Univ. and FTW)
Advanced Lectures in Wireless Communications, 10.04.2008

Motivation
[Figure: Tx, Channel, Rx; labels Ĉ and C]
Risk for over-estimation

Motivation
What is interference depends on your knowledge of the channel

Uniform Linear Array ULA-8, Uniform Circular Array UCA-15

Some paths explained

Problem Formulation (1)
[Figure: Tx, Rx]

Problem Formulation (2)
An array of n antennas receives m broadband wavefronts that impinge with unknown delays and directions and are hidden in additive Gaussian noise (spatially and temporally white).
Goal: Determine the number of signals m and the associated parameters from the array output.

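Double-directional model
Transfer function in the 3-D case (DoA, DoD, delay):
$$x_{k,l,m} = \sum_{p=1}^{P} c_p \, e^{j 2\pi (k-1) d \cos\varphi_p} \, e^{j 2\pi (l-1) d \cos\psi_p} \, e^{j 2\pi (m-1) T \tau_p}$$
where $\varphi_p$, $\psi_p$, and $\tau_p$ stand for the direction of arrival, direction of departure, and delay of path p.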

Data Model
The array output $x^{(k)}(t)$ for the kth snapshot is short-time Fourier transformed:
$$X^{(k)}(\omega) = \frac{1}{T} \sum_{t=0}^{T-1} w(t)\, x^{(k)}(t)\, e^{-j\omega t}$$
For large T, we can approximately model the array output in the frequency domain as
$$X^{(k)}(\omega) = H(\omega;\theta)\, S^{(k)}(\omega) + U^{(k)}(\omega)$$
where the columns of the transfer matrix H model plane waves.
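A minimal Python sketch of the short-time Fourier transform of one sensor signal at a single frequency may help to fix ideas. The function name `windowed_dft`, the rectangular window, and the test tone are illustrative assumptions. The 1/T normalisation follows the formula as it appears on the slide; other conventions differ only by a scale factor.

```python
import cmath
import math

def windowed_dft(x, w, omega):
    """Return (1/T) * sum_t w[t] * x[t] * exp(-j*omega*t) for one sensor."""
    T = len(x)
    return sum(w[t] * x[t] * cmath.exp(-1j * omega * t) for t in range(T)) / T

# Hypothetical snapshot: a complex tone sitting exactly on DFT bin 3 of 16.
T = 16
omega0 = 2 * math.pi * 3 / T
x = [cmath.exp(1j * omega0 * t) for t in range(T)]
w = [1.0] * T  # rectangular window (assumption)

on_bin = windowed_dft(x, w, omega0)                 # tone frequency
off_bin = windowed_dft(x, w, 2 * math.pi * 5 / T)   # a different DFT bin
```

With a rectangular window, the tone is recovered with unit magnitude at its own frequency and vanishes at the other bin. Applying the function to every antenna signal gives one vector X^(k)(ω) per snapshot.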

Data Model Statistics
Linear data model:
$$X^{(k)}(\omega) = H(\omega;\theta)\, S^{(k)}(\omega) + U^{(k)}(\omega)$$
Data statistics conditioned on the signal:
$$X^{(k)} \mid S^{(k)} \sim \mathcal{N}_C\!\left(H S^{(k)},\, \sigma^2 I\right)$$


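Conditional Data Model Log-likelihood
Data statistics conditioned on the signal:
$$f_X(x) = \frac{1}{\pi^N \nu^N(\omega)} \exp\!\left(-\frac{1}{\nu(\omega)} \left\| x - H(\omega;\theta)\, S^{(k)}(\omega) \right\|^2\right)$$
where $\nu(\omega)$ denotes the noise level at frequency $\omega$.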

Wavefront Detection using a Multiple Hypotheses Test
For m = 1, 2, ..., M:
Hypothesis $H_m$: Array output contains at most $(m-1)$ wavefronts hidden in the noise.
Alternative $A_m$: Array output contains at least $m$ wavefronts hidden in the noise.

Test for model order selection
Generalized Likelihood Ratio Test
Equivalent to the Wald Test proposed by Steven Kay (1993) for parametric model order selection.
[Figure sequence labelled H_2 H_1, H_3 H_2, H_4 H_3. Image: Wikipedia]

Generalized Likelihood Ratio Test
Based on the likelihood ratio, we obtain the test statistic [equation missing in source], where
$$\hat{R} = \hat{R}(\omega_j) = \frac{1}{K} \sum_{k=1}^{K} X^{(k)}(\omega_j)\, X^{(k)}(\omega_j)^{H}$$
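The averaged outer product over K snapshots at one frequency can be sketched in plain Python as follows. The function name `sample_spectral_matrix` and the two toy snapshots are assumptions made for illustration; the Hermitian transpose on the second factor is assumed, since the result must be a matrix.

```python
def sample_spectral_matrix(snapshots):
    """Average the outer products X X^H over the K snapshots at one frequency."""
    K = len(snapshots)
    n = len(snapshots[0])
    R = [[0j] * n for _ in range(n)]
    for X in snapshots:
        for a in range(n):
            for b in range(n):
                # Entry (a, b) of X X^H is X[a] times the conjugate of X[b].
                R[a][b] += X[a] * X[b].conjugate() / K
    return R

# Two hypothetical snapshots from a 2-element array.
R_hat = sample_spectral_matrix([[1 + 0j, 1j], [1 + 0j, -1j]])
```

For these two snapshots the off-diagonal terms cancel and the estimate equals the 2x2 identity matrix. The estimate is Hermitian by construction.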

Traditional formulation
Evaluate the test statistic $T_m(\theta)$ from data and compare it with a pre-computed threshold value:
$$T_m \overset{?}{<} t_m, \qquad t_m := F_{T_m}^{-1}(1-\alpha_m)$$
The inverse of the cumulative distribution function is needed.

Formulation with p-values
Evaluate the empirical significance value (= p-value) for the test statistic $T_m(\theta)$ from data and compare it with the specified false-alarm probability:
$$T_m \overset{?}{<} t_m, \quad t_m := F_{T_m}^{-1}(1-\alpha_m) \qquad\Longleftrightarrow\qquad p_m \overset{?}{<} 1-\alpha_m, \quad p_m := F_{T_m}(T_m)$$
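The equivalence of the two formulations can be checked numerically. The sketch below uses a standard normal distribution purely as a stand-in for $F_{T_m}$, which is an assumption for illustration only; the function names and the list of observed values are hypothetical as well.

```python
from statistics import NormalDist

# Stand-in null distribution (assumption): the real F_{T_m} is not standard normal.
null_dist = NormalDist(mu=0.0, sigma=1.0)
alpha_m = 0.05

t_m = null_dist.inv_cdf(1 - alpha_m)  # threshold, needs the inverse CDF

def below_threshold(T_m):
    """Traditional formulation: compare the statistic with t_m."""
    return T_m < t_m

def p_value_below_level(T_m):
    """p-value formulation: compare p_m = F(T_m) with 1 - alpha_m."""
    return null_dist.cdf(T_m) < 1 - alpha_m

observed = [-1.0, 0.0, 1.0, 1.6, 1.7, 3.0]  # hypothetical statistic values
decisions = [(below_threshold(T), p_value_below_level(T)) for T in observed]
```

Both comparisons give the same decision for every value because the CDF is monotone. Only the first one needs the inverse CDF, which is the practical advantage of the p-value formulation.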

Test distribution
Under hypothesis $H_m$, the statistic $F_m(\omega_j;\hat{\theta})$ is $F_{n_1,n_2}$-distributed, where the degrees of freedom $n_1$, $n_2$ are given by
$$n_1 = K\,(2 + \dim(\theta_m)), \qquad n_2 = K\,(2n - 2m - \dim(\theta_{m-1}))$$
Narrowband (J = 1): GLRT is equivalent to the F-test [Shumway 1983].
Broadband (J > 1): test distribution is unknown.

Where we are now in the talk
At this point of the talk, we have a tool for computing (estimating) the p-values for all the hypotheses. That's acceptable because we don't know the exact distribution of the broadband GLRT test statistic (J being a small integer > 1).
Now, let's talk about the types of errors we can commit.

PCE, FWE, FDR definitions
Ref. [1]: the number of hypotheses m is assumed to be known in advance; R is observable; U, V, S, T are unobservable.
Here R counts the rejected hypotheses and V counts the rejected hypotheses that are actually true (false discoveries).
Control of type-one errors:
PCE = E(V/m): Per Comparison Error Rate
FWE = P(V ≥ 1): Familywise Error Rate
FDR = E(V/R): False Discovery Rate
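To make the quantities V and R concrete, the following sketch counts them for a single experiment in which the truth is known, as it would be in a simulation. The function name and the toy inputs are hypothetical. The ratio V/R is set to 0 when nothing is rejected, following the convention of Benjamini and Hochberg.

```python
def false_discovery_proportion(is_true_null, rejected):
    """Return (V, R, V/R) for one experiment; V/R is taken as 0 when R == 0."""
    # V: true null hypotheses that were rejected (false discoveries).
    V = sum(1 for h0, rej in zip(is_true_null, rejected) if h0 and rej)
    # R: all rejected hypotheses.
    R = sum(1 for rej in rejected if rej)
    return V, R, (V / R if R > 0 else 0.0)

truth = [True, True, False, False]       # which null hypotheses are true
decisions = [True, False, True, True]    # which hypotheses were rejected
V, R, Q = false_discovery_proportion(truth, decisions)
```

Averaging Q over many independent experiments estimates the FDR, and the fraction of experiments with V ≥ 1 estimates the FWE.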

Control of the false discovery rate (FDR)
The traditional approach controls the familywise error rate (FWE).
When the number of hypotheses is large, the power of Bonferroni-type procedures is substantially reduced.
Benjamini and Hochberg proposed in 1995 to control the FDR instead of the FWE.
The FDR is defined as the expected proportion of erroneously rejected hypotheses among all rejected hypotheses.

Benjamini-Hochberg procedure
When the test statistics corresponding to the true null hypotheses are independent, the following procedure controls the FDR at level q:
Sort the p-values: $p_{(1)} \le p_{(2)} \le \dots \le p_{(M)}$.
Find $k = \max\{\, m : p_{(m)} \le m q / M \,\}$.
Reject all $H_{(1)}, H_{(2)}, \dots, H_{(k)}$ (if no such k exists, then don't reject any hypothesis).
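The step-up procedure can be sketched in a few lines of Python. This is a generic illustration, not the code used for the results in this talk. The function name and the example p-values are hypothetical, and the p-values are assumed to follow the usual convention that small values speak against the null hypothesis.

```python
def benjamini_hochberg(p_values, q):
    """Return the indices of the hypotheses rejected at FDR level q."""
    M = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(M), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose ordered p-value is below rank * q / M.
        if p_values[idx] <= rank * q / M:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])

rejected = benjamini_hochberg([0.02, 0.024, 0.9, 0.5], q=0.05)
none_rejected = benjamini_hochberg([0.04, 0.03, 0.02, 0.9], q=0.05)
```

In the first call the smallest p-value (0.02) exceeds its own bound 0.0125, but the second smallest (0.024) lies below 0.025, so both hypotheses are rejected. This is the step-up behaviour that makes the procedure more powerful than a Bonferroni-type test at the same level.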



Early take-home message
The broadband test distribution under $H_m$ is not known. We apply the bootstrap technique to determine the distribution numerically.
If all null hypotheses are true, then controlling the FDR is equivalent to controlling the FWE.
Simulations show that the FDR-controlling procedure always has a higher probability of detection than the FWE-controlling procedure.
Reliability of the proposed test is not affected by the gain in power.

ULA-8, UCA-15


Receiver's View on Weikendorf Site


Closing remark
Model order selection is a problem which is asymmetric in its risks for over- or under-estimating the true model structure.
Multiple hypotheses tests let you control the various types of errors you could commit.

Happy birthday

References
[1] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B, 57:289-300, 1995.
[2] R.H. Shumway. Replicated time-series regression: An approach to signal estimation and detection. In D.R. Brillinger and P.R. Krishnaiah, editors, Handbook of Statistics, Vol. 3, pages 383-408. Elsevier Science Publishers B.V., 1983.
[3] S. Holm. A simple sequentially rejective multiple test procedure. Scand. J. Statist., 6:65-70, 1979.
[4] B. Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7:1-26, 1979.
[5] A.M. Zoubir and B. Boashash. The bootstrap and its application in signal processing. IEEE Signal Processing Magazine, 15(1):56-76, January 1998.
[6] D. Maiwald. Breitbandverfahren zur Signalentdeckung und -ortung mit Sensorgruppen in Seismik- und Sonaranwendungen. Dr.-Ing. Dissertation, Dept. of Electrical Engineering, Ruhr-Universität Bochum, Shaker Verlag, Aachen, 1995.
[7] P.-J. Chung, J.F. Böhme, A.O. Hero, and C.F. Mecklenbräuker. Signal detection using a multiple hypothesis test. In Proc. Third IEEE Sensor Array and Multichannel Signal Processing Workshop, Barcelona, Spain, July 18-21, 2004.
[8] P.-J. Chung, J.F. Böhme, C.F. Mecklenbräuker, and A.O. Hero. On signal detection using the Benjamini-Hochberg procedure. In Proc. IEEE Workshop on Statistical and Signal Processing, Bordeaux, France, July 2005.

FDR example

FDR example (continued)


120 MHz.

Bootstrap approximation: assumptions
The test statistic $T_m(\theta_m)$ is the sample mean of J samples
$$T_j = \log\!\left(1 + \frac{n_1}{n_2}\, F_m(\omega_j;\theta_m)\right)$$
We consider $T_1, T_2, \dots, T_J$ as i.i.d. samples because
the $X(\omega_j)$ are asymptotically independent as $T \to \infty$;
the $F_m(\omega_j;\theta_m)$ are asymptotically $F_{n_1,n_2}$-distributed.
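Under the i.i.d. assumption, the distribution of the sample mean can be approximated by resampling the J values with replacement. The sketch below is a generic bootstrap test for the mean and not necessarily the exact scheme used in the referenced papers. The function name, the null mean `mu0`, the number of replicates, and the toy samples are all assumptions. It returns an upper-tail p-value (small values speak against the null), which is the complement of $F_{T_m}(T_m)$.

```python
import random
from statistics import fmean

def bootstrap_p_value(samples, mu0, n_boot=2000, seed=0):
    """Upper-tail bootstrap p-value for the null hypothesis E[T_j] = mu0."""
    rng = random.Random(seed)
    J = len(samples)
    t_obs = fmean(samples)  # observed test statistic: the sample mean
    # Resample with replacement; centre each bootstrap mean at the observed
    # mean so the replicates mimic the fluctuation of the mean around its
    # expectation.
    centred = [fmean(rng.choices(samples, k=J)) - t_obs for _ in range(n_boot)]
    # Count replicates that fluctuate at least as far as the observed excess.
    exceed = sum(1 for c in centred if c >= t_obs - mu0)
    # The +1 terms keep the estimate strictly positive.
    return (exceed + 1) / (n_boot + 1)

samples = [1.0 + 0.1 * i for i in range(10)]  # hypothetical T_1, ..., T_J
p_far = bootstrap_p_value(samples, mu0=0.0)              # null mean far below the data
p_at = bootstrap_p_value(samples, mu0=fmean(samples))    # null mean equal to the sample mean
```

When the hypothesised mean is far below the data, no replicate reaches the observed excess and the p-value takes its smallest possible value 1/(n_boot + 1). When the hypothesised mean coincides with the sample mean, the p-value is close to one half.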