MULTICHANNEL BLIND SEPARATION AND DECONVOLUTION OF SOURCES WITH ARBITRARY DISTRIBUTIONS

Scott C. Douglas, Andrzej Cichocki, and Shun-ichi Amari

Department of Electrical Engineering, University of Utah, Salt Lake City, Utah 84112 USA
Brain Information Processing Group, Frontier Research Program, RIKEN, Wako-shi, Saitama 351-01 JAPAN

Abstract: Blind deconvolution and separation of linearly mixed and convolved sources is an important and challenging task for numerous applications. While several recently-developed algorithms have shown promise in these tasks, these techniques may fail to separate signal mixtures containing both sub- and super-Gaussian-distributed sources. In this paper, we present a simple and efficient extension of a family of algorithms that enables the separation and deconvolution of mixtures of arbitrary non-Gaussian sources. Our technique monitors the statistics of each of the outputs of the separator using a rigorously-derived sufficient criterion for stability and then selects the appropriate nonlinearity for each channel such that the local convergence conditions of the algorithm are satisfied. Extensive simulations show the validity and efficiency of our method for blindly extracting mixtures of arbitrarily-distributed source signals.

I. INTRODUCTION

Blind signal separation is useful for numerous problems in biomedical signal analysis, acoustics, communications, and signal and image processing. In blind source separation of instantaneous signal mixtures, a set of measured signals {x_i(k)}, 1 ≤ i ≤ n, is assumed to be generated from a set of unknown statistically-independent sources {s_i(k)}, 1 ≤ i ≤ m, with m ≤ n, as

    x(k) = H s(k)                                                    (1)

where x(k) = [x_1(k) ... x_n(k)]^T, s(k) = [s_1(k) ... s_m(k)]^T, and H is an (n × m)-dimensional matrix of unknown mixing coefficients {h_ij}. The measured sensor signals are processed by a linear single-layer feedforward network as

    y(k) = W(k) x(k)                                                 (2)
where y(k) = [y_1(k) ... y_m(k)]^T and W(k) is an (m × n)-dimensional synaptic weight matrix. Ideally, W(k) is adjusted iteratively such that

    lim_{k→∞} W(k) H = P D                                           (3)

where P is an (m × m)-dimensional permutation matrix with a single unity entry in any of its rows or columns and D is a diagonal nonsingular matrix.

Recently, several simple, efficient, and robust iterative algorithms for adjusting W(k) have been proposed for the blind signal separation task [1]-[12]. Such methods use higher-order statistical information about the source signals to iteratively adjust the coefficient matrix W(k). In this paper, we consider one class of on-line adaptive algorithms given by [1]

    W(k+1) = W(k) + μ(k) [I − f(y(k)) y^T(k)] W(k)                   (4)

where μ(k) is a step size and f(y(k)) = [f_1(y_1(k)) ... f_m(y_m(k))]^T. The optimal forms of the nonlinear functions {f_i(y)} can be shown to depend on the statistics of the source signals [2, 7, 8]. For example, if the signal mixture consists of sub-Gaussian sources with negative kurtoses, the choices f_i(y) = f_N(y) = |y|^p sgn(y) for p = {2, 3, ...} provide adequate separation capabilities. For mixtures of super-Gaussian sources with positive kurtoses, the choice f_i(y) = f_P(y) = tanh(αy) with α > 0 can be used [3, 11, 12].

A task related to blind signal separation is multichannel signal deconvolution, in which x(k) is assumed to be produced from s(k) as

    x(k) = Σ_{p=−∞}^{∞} H_p s(k−p)                                   (5)

where H_p is an (n × m)-dimensional matrix of mixing coefficients at lag p. The goal is to calculate a vector y(k) of possibly scaled and/or delayed estimates of the source signals in s(k) from x(k) using a causal linear filter given by

    y(k) = Σ_{p=0}^{L} W_p(k) x(k−p)                                 (6)

where the (m × n)-dimensional matrices {W_p(k)}, 0 ≤ p ≤ L, contain the coefficients of the multichannel filter. One algorithm that can be used in this task is described in [5, 6] and is given by

    W_p(k+1) = W_p(k) + μ(k) [W_p(k) − f(y(k−L)) u^T(k−p)]           (7)

where the n-dimensional vector u(k) is computed as

    u(k) = Σ_{q=0}^{L} W_{L−q}^T(k) y(k−q).                          (8)
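To make the update in (4) concrete, the following sketch implements one iteration of the natural-gradient separation rule in NumPy and runs it on a toy mixture. The step size, cubic nonlinearity, and mixing matrix are illustrative choices for this sketch, not values taken from the paper.

```python
import numpy as np

def natural_gradient_step(W, x, mu, f):
    """One iteration of the separation rule (4):
    W <- W + mu * (I - f(y) y^T) W, with output y = W x."""
    y = W @ x
    m = W.shape[0]
    W = W + mu * (np.eye(m) - np.outer(f(y), y)) @ W
    return W, y

# Illustrative use: two uniform (sub-Gaussian) sources and the cubic
# nonlinearity f_N(y) = y^3; H below is an assumed example mixing matrix.
rng = np.random.default_rng(1)
H = np.array([[1.0, 0.8],
              [0.8, 1.0]])
W = np.eye(2)
for _ in range(30000):
    s = rng.uniform(-1.0, 1.0, 2)
    W, _ = natural_gradient_step(W, H @ s, 0.01, lambda y: y**3)
C = W @ H   # combined system; should approach a scaled permutation P D
```

Note that the update depends on the data only through the outputs y(k), which is the equivariance property that makes the convergence behavior independent of the conditioning of H.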
This algorithm reduces to that in (4) for L = 0.

Although simulations have indicated that these algorithms are successful at separating and deconvolving linearly-mixed signals, they require knowledge about the statistics of the source signals to function properly. In particular, it must be known a priori whether the source signals are sub-Gaussian or super-Gaussian so that the nonlinearities f_i(y) can be properly chosen. Even worse, if the measured signals x_i(k) contain mixtures of both sub-Gaussian (e.g. digital data) and super-Gaussian (e.g. speech) sources, then these algorithms may fail to separate these signals reliably.

In this paper, we propose modifications to the algorithms in (4) and (7) that enable sources with arbitrary non-Gaussian distributions to be extracted from measurements of the mixed signals. Our methods use simple sufficient conditions for algorithm stability that are based on the necessary stability conditions originally derived by Amari et al. [3]. Our computationally-simple algorithms employ time-varying nonlinearities in the coefficient updates that are selected from a family of fixed nonlinearities at each iteration to best satisfy our sufficient stability conditions. Simulations show the excellent and robust convergence behavior of the proposed methods in separating mixtures of sub- and super-Gaussian sources.

II. CRITERIA FOR ALGORITHM STABILITY

The modified algorithms for separation of sources with arbitrary distributions are based on the stability analysis of (4) that is described in [3]. For brevity and simplicity of discussion, we only consider (4) and outline the extensions of the analysis that are needed to develop our modified algorithms, although we later apply the results to the multichannel deconvolution method in (7).
The algorithm in (4) can be derived as an iterative stochastic minimization procedure for the cost function

    J(W(k)) = −(1/2) log det(W(k) W^T(k)) − Σ_{i=1}^{m} E{log p_i(y_i(k))}     (9)

where E{·} denotes statistical expectation and −d log p_i(y)/dy = f_i(y). If p_i(y) is the actual probability distribution of the source extracted at the ith output, then J(W(k)) represents the negative of the maximum likelihood cost function [2, 7]. The procedure in (4) represents the natural gradient method for minimizing (9) iteratively using signal measurements. For details on the general form of the natural gradient search method, the reader is referred to [1]. In [3], the stability of (4) is analyzed by studying the expected value of the Hessian of the cost function, denoted as E{d²J(W(k))/dw_ij(k) dw_pq(k)},
in the vicinity of a source separation solution. Here, w_ij(k) is the (i,j)th element of W(k), which in [3] is assumed to be a square matrix (m = n). In what follows, we remove this restriction. In analogy with the results of [3], it is simpler to consider the form of d²J(W(k)) in terms of the modified coefficient differential dX(k) defined such that

    dX(k) = dW(k) W^T(k) [W(k) W^T(k)]^{−1}                          (10)
    dX(k) y(k) = dW(k) x(k).                                         (11)

Note that the natural gradient method automatically performs its search in the coefficient space spanned by dX(k), so that the coefficient updates remain in the original column space of W(k) for all k. We can then represent the differential d²J(W(k)) in terms of the elements of dX(k) as

    d²J(W(k)) = y^T(k) dX^T(k) F_d(y(k)) dX(k) y(k) + f^T(y(k)) dX(k) dX(k) y(k)     (12)

where F_d(y(k)) is a diagonal matrix whose (i,i)th entry is f_i'(y_i(k)). As is shown in [3], the expectations of the terms on the RHS of (12) are

    E{y^T(k) dX^T(k) F_d(y(k)) dX(k) y(k)} = Σ_{i=1}^{m} Σ_{j=1, j≠i}^{m} κ_i(k) σ_j²(k) [dX_ij(k)]²
                                             + Σ_{i=1}^{m} η_i(k) [dX_ii(k)]²        (13)

    E{f^T(y(k)) dX(k) dX(k) y(k)} = Σ_{i=1}^{m} Σ_{j=1}^{m} ζ_i(k) dX_ij(k) dX_ji(k) (14)

where κ_i(k), σ_i²(k), ζ_i(k), and η_i(k) are defined as

    κ_i(k) = E{f_i'(y_i(k))},        σ_i²(k) = E{y_i²(k)}            (15)
    ζ_i(k) = E{y_i(k) f_i(y_i(k))},  and  η_i(k) = E{y_i²(k) f_i'(y_i(k))}           (16)

respectively, and where it has been assumed that y_i(k) and y_j(k) are independent for i ≠ j. Thus, the expected value of the Hessian is

    E{d²J(W(k))} = Σ_{i=1}^{m} Σ_{j=1, j≠i}^{m} { κ_i(k) σ_j²(k) [dX_ij(k)]² + ζ_i(k) dX_ij(k) dX_ji(k) }
                   + Σ_{i=1}^{m} [η_i(k) + ζ_i(k)] [dX_ii(k)]².                      (17)
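The four statistics in (15)-(16) are ordinary moments and can be estimated directly from output samples. The helper below is a hypothetical sketch (not code from the paper) that computes them for a given nonlinearity f and its derivative.

```python
import numpy as np

def separation_statistics(y, f, fprime):
    """Sample estimates of the statistics (15)-(16) for one output channel:
    kappa = E{f'(y)}, sigma2 = E{y^2}, zeta = E{y f(y)}, eta = E{y^2 f'(y)}."""
    kappa = np.mean(fprime(y))
    sigma2 = np.mean(y**2)
    zeta = np.mean(y * f(y))
    eta = np.mean(y**2 * fprime(y))
    return kappa, sigma2, zeta, eta

# Example: a unit-variance uniform output with the cubic nonlinearity f(y) = y^3.
rng = np.random.default_rng(0)
y = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), 200000)
kappa, sigma2, zeta, eta = separation_statistics(
    y, lambda y: y**3, lambda y: 3.0 * y**2)
# Analytically, kappa = 3, sigma2 = 1, zeta = E{y^4} = 1.8, eta = 3 E{y^4} = 5.4
# for this pair, so the diagonal condition (19), eta + zeta > 0, clearly holds.
```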
For stability, E{d²J(W(k))} must be positive for all possible values of dX_ij(k). By examining the RHS of (17), one can obtain the following necessary and sufficient stability conditions on f_i(y) for all 1 ≤ i < j ≤ m:

    κ_i(k) > 0                                                       (18)
    η_i(k) + ζ_i(k) > 0                                              (19)
    κ_i(k) σ_i²(k) κ_j(k) σ_j²(k) > (1/4) [ζ_i(k) + ζ_j(k)]².        (20)

The conditions in (18) and (19) are satisfied in practice for any odd nondecreasing function f_i(y) = −f_i(−y). However, the condition in (20) is difficult to check in practice, as it involves m(m−1)/2 different combinations for 1 ≤ i < j ≤ m. For this reason, we consider a sufficient stability criterion of the form

    κ_i(k) σ_i²(k) κ_j(k) σ_j²(k) > β(k) ζ_i(k) ζ_j(k)               (21)

where β(k) > 0 satisfies for all 1 ≤ i < j ≤ m the inequality

    β(k) ζ_i(k) ζ_j(k) ≥ (1/4) [ζ_i(k) + ζ_j(k)]².                   (22)

After some algebra, we find that the smallest value of β(k) satisfying (22) is

    β(k) = (1/4) [ 2 + ζ_max(k)/ζ_min(k) + ζ_min(k)/ζ_max(k) ]       (23)

where ζ_max(k) and ζ_min(k) are the maximum and minimum values of ζ_i(k) for 1 ≤ i ≤ m, respectively. For this value of β(k), we can guarantee the stability of the algorithm in (4) if, for all 1 ≤ i ≤ m,

    κ_i(k) σ_i²(k) − β(k) ζ_i(k) > 0.                                (24)

Note that all values of ζ_i(k) converge to one as the coefficients converge to a separating solution due to the normalizing condition E{f(y(k)) y^T(k)} = I, such that β(k) → 1 near convergence.

III. THE ALGORITHM MODIFICATION

We now describe the modified algorithms that are based on the stability criteria of the last section. It is known that the algorithm in (4) can determine a separating solution for W(k) if a set of nonlinearities {f_i(y)}, 1 ≤ i ≤ m, can be properly chosen. For this reason, our modified algorithms employ a time-varying vector nonlinearity f(y(k)) = [f_1(y_1(k)) ... f_m(y_m(k))]^T,
where f_i(y) is chosen to be one of two nonlinearities f_N(y) and f_P(y) that are optimized for sub- and super-Gaussian source separation tasks, respectively. To select f_i(y) at time k, we form the time-averaged estimates

    σ_i²(k) = (1 − λ) σ_i²(k−1) + λ y_i²(k)                          (25)
    κ_ir(k) = (1 − λ) κ_ir(k−1) + λ f_r'(y_i(k))                     (26)
    ζ_ir(k) = (1 − λ) ζ_ir(k−1) + λ y_i(k) f_r(y_i(k))               (27)

for r ∈ {N, P} and 1 ≤ i ≤ m, where λ is a small positive parameter. Then, f_i(y) at time k is selected as

    f_i(y) = f_N(y)  if σ_i²(k) κ_iN(k) − β(k) ζ_iN(k) > σ_i²(k) κ_iP(k) − β(k) ζ_iP(k)
           = f_P(y)  otherwise                                       (28)

where β(k) is computed at infrequent intervals from past estimates of ζ_i(k). With these choices, the resulting time-varying vector nonlinearity is used in place of f(y(k)) to adjust the coefficient matrix in (4). In simulations, it was found that the value of β(k) did not vary significantly over time, and in fact, setting β(k) equal to one in (28) for all k appears to provide convergence of the algorithms to a separating solution. It can be seen that as the coefficients of the system converge, the quantity σ_i²(k) κ_ir(k) − ζ_ir(k) becomes a reliable estimate of the left-hand side of the inequality in (24) for f_i(y) = f_r(y). Extensive simulations have shown that, so long as a set of nonlinearity assignments exists such that the stability conditions in (18)-(20) are satisfied for one ordering of the extracted sources at the outputs, then (28) properly selects each f_i(y) over time to enable the system to reliably extract all source signals regardless of their distributions.

IV. SIMULATIONS

We now show the capabilities of our modified source separation algorithms via simulation. In our first example, we employ the signal separation method in (4) to separate ten instantaneously-mixed signals. In this case, the three signal sets {s_1(k), s_2(k), s_3(k), s_4(k)}, {s_5(k), s_6(k), s_7(k)}, and {s_8(k), s_9(k), s_10(k)} are i.i.d. with Laplacian, uniform-[−1, 1], and binary-{±1} distributions, respectively, where the Laplacian p.d.f. is given by p_s(s) = 0.5 e^{−|s|}.
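The on-line selection rule (25)-(28) applied to such sources can be sketched as follows, with β(k) fixed at one as suggested above and with f_N(y) = |y|³ sgn(y) and f_P(y) = tanh(10y). The class name, interface, and the clipping constant inside the tanh derivative are illustrative assumptions, not details from the paper.

```python
import numpy as np

class NonlinearitySelector:
    """Sketch of the nonlinearity selection mechanism (25)-(28)."""

    def __init__(self, m, lam=0.0005):
        self.lam = lam                                   # averaging parameter lambda
        self.sigma2 = np.ones(m)                         # estimates of E{y_i^2}, eq. (25)
        self.kappa = {'N': np.ones(m), 'P': np.ones(m)}  # estimates of E{f_r'(y_i)}, eq. (26)
        self.zeta = {'N': np.ones(m), 'P': np.ones(m)}   # estimates of E{y_i f_r(y_i)}, eq. (27)

    @staticmethod
    def f(y, r):
        return np.abs(y)**3 * np.sign(y) if r == 'N' else np.tanh(10.0 * y)

    @staticmethod
    def fprime(y, r):
        if r == 'N':
            return 3.0 * y**2
        z = np.clip(10.0 * y, -30.0, 30.0)               # avoid overflow in cosh
        return 10.0 / np.cosh(z)**2

    def update(self, y):
        """Update the running moment estimates (25)-(27) with the output vector y."""
        lam = self.lam
        self.sigma2 += lam * (y**2 - self.sigma2)
        for r in ('N', 'P'):
            self.kappa[r] += lam * (self.fprime(y, r) - self.kappa[r])
            self.zeta[r] += lam * (y * self.f(y, r) - self.zeta[r])

    def select(self, beta=1.0):
        """Rule (28): per channel, keep the nonlinearity with the larger
        stability margin sigma_i^2 kappa_ir - beta zeta_ir from (24)."""
        margin = {r: self.sigma2 * self.kappa[r] - beta * self.zeta[r]
                  for r in ('N', 'P')}
        return np.where(margin['N'] > margin['P'], 'N', 'P')
```

Feeding a sub-Gaussian (uniform) channel and a super-Gaussian (Laplacian) channel through `update` should steer `select` toward f_N and f_P, respectively.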
Since the first and the latter two distributions are super- and sub-Gaussian, respectively, the algorithms in [4, 8] cannot linearly separate these sources from an arbitrary mixture of them. We generate x(k) as in (1), where the entries of H are drawn from a uniform-[0, 1] distribution. The values of h_ij to four decimal places are shown in Table 1. As is clear from the table, H exhibits no particular structure, and thus the extraction of the ten sources from the measured signals x(k) is a challenging task.
Table 1: The entries of H for the ten-source separation example.

(i\j)    1       2       3       4       5       6       7       8       9       10
  1     .844    .41     .397    .6648   .973    .4364   .619    .8918   .964    .883
  2     .71     .7      .66     .461    .97     .93     .7337   .7483   .38     .691
  3     .741    .47     .77     .7443   .884    .9441   .77     .4733   .19     .393
  4     .4146   .368    .64     .998    .3      .977    .133    .1339   .3141   .93
  5     .71     .636    .3      .437    .46     .468    .38     .1748   .1137   .413
  6     .184    .499    .44     .111    .74     .971    .1367   .6777   .7476   .666
  7     .148    .81     .611    .71     .138    .7741   .4      .437    .847    .7489
  8     .816    .81     .6381   .364    .919    .6716   .73     .79     .368    .46
  9     .9349   .114    .144    .398    .497    .94     .487    .       .6368   .717
 10     .6      .68     .97     .86     .8663   .9      .9896   .4781   .3873   .

We apply the algorithm in (4), where the nonlinearity vector f(y(k)) is adapted according to our method described in (25)-(28) with W(0) = I, a small constant step size μ(k), a small averaging parameter λ, and

    f_N(y) = |y|³ sgn(y)  and  f_P(y) = tanh(10y)                    (29)

corresponding to nonlinearities for separating sub- and super-Gaussian-distributed sources, respectively. Within each on-line nonlinearity selection procedure, we set β(k) to one for all k. From the outputs of the separation system, we compute the error vector e(k) = [e_1(k) ... e_10(k)]^T as

    e(k) = s(k) − D^{−1} P^T y(k)                                    (30)

where approximate versions of the permutation matrix P and the scaling matrix D as introduced in (3) are obtained from W(k) and H at the final iteration. Figure 1 shows these ten error signals. Since each error signal decreases to a small value after a sufficient number of iterations, all of the sources are reliably extracted using our modified algorithm. Figure 2 plots the performance factor ψ(k) defined as

    ψ(k) = ||P^T W(k) H||²_E / ||[P^T W(k) H]_d||²_E − 1             (31)

where ||·||_E denotes the matrix Euclidean norm and [Q]_d is a diagonal matrix whose (i,i)th entry is q_ii. As can be seen, the value of ψ(k) decreases to approximately 0.168 in steady state, indicating that the system has adequately separated the ten sources. Moreover, a careful examination of the nonlinearities chosen for each extracted output indicates that the appropriate stabilizing nonlinearity f_N(y) or f_P(y) was eventually selected for each output signal.
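The performance factor (31) is straightforward to compute from the combined system matrix. A minimal helper (the function name is illustrative) might look like:

```python
import numpy as np

def performance_factor(W, H, P):
    """Performance factor (31): total energy of C = P^T W H divided by the
    energy of its diagonal part, minus one; zero iff C is exactly diagonal."""
    C = P.T @ W @ H
    return np.sum(C**2) / np.sum(np.diag(C)**2) - 1.0
```

A perfect separating solution W H = P D gives ψ = 0, and any residual cross-talk between channels raises ψ above zero.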
We now combine the algorithm modification in (25)-(28) with the blind deconvolution and source separation technique in (7) and apply the resulting system to a three-source separation problem. In this case, the three sources are chosen to be i.i.d. Laplacian-, uniform-, and binary-distributed, respectively, and the convolutive mixture model is given by

    x(k) = A_1 x(k−1) + A_2 x(k−2) + B_0 s(k) + B_1 s(k−1)           (32)
[Figure 1: The ten error signals for the first source separation example.]

where the (3 × 3) matrices A_1, A_2, B_0, and B_1 are given by

    A_1 = [ -.1   .3  -.1 ]        A_2 = [ .·  .·  .· ]
          [ -.·   .·  -.3 ]              [ .·  .·  .· ]              (33)
          [ -.·  -.·   .4 ]              [ .·  .·  .· ]

    B_0 = [ .·   .7   .· ]         B_1 = [ .4  .·  .3 ]
          [ .9   .8   .7 ]               [ .8  .4  .3 ]              (34)
          [ .4   .·   .· ]               [ .6  .6  .1 ]

(entries marked "·" are not legible). Figures 3(a)-(c) and 3(d)-(f) show the vector sequences s(k) and x(k) in this case. The deconvolution system with time-varying nonlinearities was applied to these signals, in which L = 6, W_p(0) = I δ(p−3), a small constant step size μ(k) and averaging parameter λ were used, and β(k) was set to one for all k within the nonlinearity selection procedures. Shown in Figures 3(g)-(i) are the error signals e_1(k) = s_1(k) − y_3(k)/d_33, e_2(k) = s_2(k) − y_2(k)/d_22, and e_3(k) = s_3(k) − y_1(k)/d_11, where d_11, d_22, and d_33 are appropriate scaling factors. As can be seen, the errors decrease to small values for each extracted output, and the signal-to-noise ratios of the three extracted outputs were empirically found to be large in each case. Figure 4 shows the actual value of β(k) − 1 on a logarithmic scale for the second source separation example. Starting from its initial value, β(k) gradually approaches unity over time. These results, combined with the successful separation capabilities of the modified systems, indicate that setting β(k) to one within the nonlinearity selection procedures does not limit the overall capabilities of the systems.
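The multichannel update (7)-(8) used in this example can be sketched as below. The buffer handling and names are illustrative assumptions; a useful sanity check is that for L = 0 the update collapses algebraically to the instantaneous rule (4).

```python
import numpy as np

def deconv_step(Wlist, xbuf, ybuf, ubuf, mu, f):
    """One iteration of (6)-(8). Wlist[p] = W_p(k) for p = 0..L;
    xbuf[p] = x(k-p); ybuf and ubuf hold the L+1 most recent y and u vectors."""
    L = len(Wlist) - 1
    y = sum(Wlist[p] @ xbuf[p] for p in range(L + 1))         # eq. (6)
    ybuf = [y] + ybuf[:-1]                                    # ybuf[q] = y(k-q)
    u = sum(Wlist[L - q].T @ ybuf[q] for q in range(L + 1))   # eq. (8)
    ubuf = [u] + ubuf[:-1]                                    # ubuf[p] = u(k-p)
    fy = f(ybuf[L])                                           # f(y(k-L))
    Wnew = [Wlist[p] + mu * (Wlist[p] - np.outer(fy, ubuf[p]))  # eq. (7)
            for p in range(L + 1)]
    return Wnew, ybuf, ubuf
```

With L = 0, u(k) = W_0^T(k) y(k), so the update becomes W_0 + μ [W_0 − f(y) y^T W_0] = W_0 + μ [I − f(y) y^T] W_0, which is exactly (4).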
[Figure 2: Performance factor ψ(k) for the first source separation example.]

[Figure 3: The three source signals ((a)-(c)), the three mixed signals ((d)-(f)), and the three error signals ((g)-(i)) for the convolutive-mixture source separation example.]

[Figure 4: Evolution of β(k) − 1 for the convolutive-mixture source separation example.]
V. CONCLUSIONS

In conclusion, we have described techniques for selecting the nonlinearities within blind source separation and deconvolution algorithms to enable the separation of sources with arbitrary distributions. The proposed methods can be easily implemented in an on-line setting. Simulations applying the techniques to instantaneous mixture separation and to multichannel deconvolution and source separation indicate the ability of the methods to accurately separate signal mixtures containing both sub- and super-Gaussian sources.

REFERENCES

[1] S. Amari, "Natural gradient works efficiently in learning," submitted to Neural Computation.

[2] S. Amari and J.-F. Cardoso, "Blind source separation: semiparametric statistical approach," to appear in IEEE Trans. Signal Processing.

[3] S. Amari, T.-P. Chen, and A. Cichocki, "Stability analysis of adaptive blind source separation," to appear in Neural Networks.

[4] S. Amari, A. Cichocki, and H.H. Yang, "A new learning algorithm for blind signal separation," Adv. Neural Inform. Proc. Sys. 8 (Cambridge, MA: MIT Press, 1996), pp. 757-763.

[5] S. Amari, S.C. Douglas, A. Cichocki, and H.H. Yang, "Multichannel blind deconvolution using the natural gradient," Proc. IEEE Workshop Signal Proc. Adv. Wireless Comm. (Piscataway, NJ: IEEE, 1997), pp. 101-104.

[6] S. Amari, S.C. Douglas, A. Cichocki, and H.H. Yang, "Novel on-line adaptive learning algorithms for blind deconvolution using the natural gradient approach," presented at 11th IFAC Symp. Syst. Ident., Kitakyushu, Japan, July 1997.

[7] A.J. Bell and T.J. Sejnowski, "An information maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129-1159, Nov. 1995.

[8] J.-F. Cardoso and B. Laheld, "Equivariant adaptive source separation," IEEE Trans. Signal Processing, vol. 44, no. 12, pp. 3017-3030, Dec. 1996.

[9] A. Cichocki, R. Unbehauen, and E. Rummert, "Robust learning algorithm for blind separation of signals," Electron. Lett., vol. 30, no. 17, pp.
1386-1387, Aug. 1994.

[10] A. Cichocki, S. Amari, M. Adachi, and W. Kasprzak, "Self-adaptive neural networks for blind separation of sources," Proc. IEEE Int. Symp. Circ. Syst., vol. 2 (New York: IEEE, 1996), pp. 157-160.

[11] P. Comon, C. Jutten, and J. Herault, "Blind separation of sources, Part II: Problems statement," Signal Processing, vol. 24, no. 1, pp. 11-20, July 1991.

[12] C. Jutten and J. Herault, "Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture," Signal Processing, vol. 24, no. 1, pp. 1-10, July 1991.