

Figure 1: Mixing coef.: H11[k], H12[k], H21[k], H22[k].
Figure 2: Overall coef.: G11[k], G12[k], G21[k], G22[k].
Figure 3: Observations and recovered sources.
Figure 4: Performance index versus iterations.

Cichocki, for their useful discussion during this period. This work has also been supported by the Grants: CICYT (TIC96-5-C-8), CICYT (TIC96-5-C-2) and Xunta de Galicia (XUGA 52A96).

9. REFERENCES

[1] L. B. Almeida and F. M. Silva. Adaptive decorrelation. Artificial Neural Networks (Elsevier), 2:149-156, 1992.
[2] S. Amari and J. F. Cardoso. Blind source separation - semiparametric statistical approach. IEEE Transactions on Signal Processing, 45(11):2692-2700, 1997.
[3] S. Amari, S. Douglas, A. Cichocki, and H. Yang. Multichannel blind deconvolution and equalization using the natural gradient. In IEEE Workshop on Wireless Communication, Paris, pages 101-104, April 1997.
[4] S.-I. Amari, T.-P. Chen, and A. Cichocki. Stability analysis of learning algorithms for blind source separation. Neural Networks, 10(8):1345-1351, 1997.
[5] A. Bell and T. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129-1159, 1995.
[6] X.-R. Cao and R. Wen Liu. General approach to blind source separation. IEEE Transactions on Signal Processing, 44(3):562-571, Mar. 1996.
[7] J. Cardoso and B. Laheld. Equivariant adaptive source separation. IEEE Transactions on Signal Processing, 44(12):3017-3030, Dec. 1996.
[8] A. Cichocki, R. Bogner, L. Moszczynski, and K. Pope. Modified H-J algorithms for blind separation of sources. Digital Signal Processing, 7(2):80-93, 1997.
[9] A. Cichocki and R. Unbehauen. Robust neural networks with on-line learning for blind identification and blind separation of sources. IEEE Transactions on Circuits and Systems-I, 43(11):894-906, 1996.
[10] A. Cichocki, R. Unbehauen, and E. Rummert. Robust learning algorithm for blind separation of signals. Electronics Letters, 30(17):1386-1387, August 1994.
[11] P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994.
[12] S. Cruces, A. Cichocki, and L. Castedo. A unified perspective of blind source separation adaptive algorithms. In Proceedings of the Learning'98 congress (to appear), Madrid, Spain, Sept. 1998.
[13] S. Douglas and A. Cichocki. Neural networks for blind decorrelation of signals. IEEE Transactions on Signal Processing, 45(11):2829-2842, Nov. 1997.
[14] R. Fletcher. Practical Methods of Optimization. John Wiley and Sons, second edition, 1987.
[15] S. Haykin. Blind Deconvolution. Prentice Hall, 1994.
[16] C. Jutten and J. Herault. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24(1):1-10, 1991.
[17] J. Karhunen, E. Oja, L. Wang, R. Vigario, and J. Joutsensalo. A class of neural networks for independent component analysis. IEEE Transactions on Neural Networks, 8(3):486-504, May 1997.

From the definition of the PSD matrix we see that ‖S_fg(ω) − I‖₂ ≤ 1 + ‖S_fg(ω)‖₂ ≤ 1 + Σ_k ‖Γ[k]‖₂. Close to convergence the approximation ‖Γ[k]‖₂ ≈ max{|diag(Γ[k])|} holds true and, choosing η sufficiently small so that

η ≤ 1 / (1 + Σ_k max{|diag(Γ[k])|}) ≈ 1 / (1 + Σ_k ‖Γ[k]‖₂),

we arrive at the practical step size for the convolutive mixture separation algorithm

η = 1 / (1 + Σ_k max{|diag(Γ[k])|})   (28)

5. STABILITY ANALYSIS

In this section we perform a stability analysis of the II method and provide the necessary and sufficient conditions that the non-linearities must satisfy in order to ensure that the separating solutions are stable points. At the separating solution the sources s[n] should be properly scaled so that Γ[0] = I holds true. To simplify the notation we define f_i = f(s_i[n]), g_i = g(s_i[n]), s_i = s_i[n], f′_i = ∂f(s_i[n])/∂s_i[n] and g′_i = ∂g(s_i[n])/∂s_i[n], for all i = 1, …, N. It can be shown (see [12]) that, for real sources, the necessary and sufficient conditions that ensure asymptotic stability of the instantaneous and convolutive II methods are

E[f′_i s_i g_i] + E[f_i s_i g′_i] > 0   (29)

E[f′_i] E[s_j g_j] + E[f′_j] E[s_i g_i] > 0   (30)

E[f′_i] E[f′_j] E[s_i g_i] E[s_j g_j] > E[g′_i] E[g′_j] E[s_i f_i] E[s_j f_j]   (31)

for all i, j with i ≠ j. The first two conditions can easily be satisfied by using strictly monotonically increasing odd functions. The third condition is not as critical since, as suggested in [4], it can be enforced simply by permuting the order of the functions f(·) and g(·). Similar results have been obtained by Amari et al. [4] and Cardoso [7] for the case g(y) = y. Our results, however, are valid for the general case of two non-linear functions. Recently, Amari and Cardoso have shown in [2] that the best local estimating function f(y)g^T(y) is obtained when one of the functions f(·) or g(·) is linear, i.e., when f(y) = y or g(y) = y.
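The stability conditions (29)-(31) are easy to check numerically for a given pair of non-linearities. The sketch below does so by Monte-Carlo for f(s) = s³ and g(s) = s; the zero-mean uniform source distribution is an illustrative (sub-Gaussian) choice, not taken from the paper, and expectations are replaced by sample means after the proper scaling Γ[0] = E[f(s)g(s)] = 1.

```python
import numpy as np

# Monte-Carlo check of the stability conditions (29)-(31), real case,
# with f(s) = s^3 and g(s) = s. Uniform sources are an illustrative choice.

rng = np.random.default_rng(4)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), 200_000)
s = s / np.mean(s ** 4) ** 0.25                  # proper scaling: E[s^4] = 1

f, df = lambda s: s ** 3, lambda s: 3 * s ** 2
g, dg = lambda s: s, lambda s: np.ones_like(s)
E = lambda v: float(np.mean(v))

# With i.i.d. sources the i- and j-indexed moments coincide.
E_df, E_dg = E(df(s)), E(dg(s))
E_sg, E_sf = E(s * g(s)), E(s * f(s))

cond29 = E(df(s) * s * g(s)) + E(f(s) * s * dg(s))        # eq. (29), must be > 0
cond30 = E_df * E_sg + E_df * E_sg                        # eq. (30), must be > 0
cond31 = E_df**2 * E_sg**2 - E_dg**2 * E_sf**2            # eq. (31), must be > 0
print(cond29, cond30, cond31)
```

All three quantities come out positive for this choice, consistent with the claim that a strictly monotonically increasing odd f paired with a linear g gives a stable separating solution for sub-Gaussian sources.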
Nevertheless, they also point out that their results characterize only the local asymptotic behavior while other estimating functions, with two non-linearities, may have preferred global properties and are more robust for noisy sensor signals. 6. COMPUTE SIMULATIONS Computer simulations were carried out to illustrate the performance of the II method for the separation of convolutive mixtures. We considered two i.i.d. 6- QAM sources and the following mixing system H[k] = [:z 3 +:2z 2 +:6z + ;:z 3 +:2z 2 ; :4z +:5 ; :4z ; +:2z ;2 ; :z ;3 :z 3 ; :3z 2 +:5z ; :6+ :5z ; ;:3z ;2 +:z ;3 +:6z ; +:2z ;2 +:z ;3 ] (see Figure ). The separating system is a two-input two-output system with four FI lters having 2 taps each. To adapt the separating system we used the algorithm given by eq. (26) and (28) in its batch version overadatawindow of samples. The chosen parameters were =,f(y) =jyj 2 y and g(y) =y. We dened the performance index P index of the separation system in terms of the overall response G[n] as P index = NX @ X M X i= + MX j= j= k NX X i= jg ij [k]j 2 max l m fjg il[m]j 2 g ; k A! jg ij [k]j 2 max l m fjg lj[m]j 2 g ; (32) Figure 4 illustrates the evolution of the performance index through the iterations, where it is clearly seen that convergence is achieved after twenty iterations. Finally, Figure 3 shows the observations of the two mixed sources and the resulting outputs after convergence. 7. CONCLUSIONS A new way to derive the generalized learning rule proposed in [] has been presented. The new approach is termed Iterative Inversion (II) method because it is an iterative procedure to invert a non-linear correlation matrix of the outputs. Although initially presented for instantaneous mixtures, the method is also extended to consider the more realistic case of convolutive mixtures. It is also noted that many existing algorithms can be interpreted as particular cases of the II method thus providing a unied perspective to BSS. 
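The performance index of eq. (32) is straightforward to implement. The sketch below assumes the overall impulse response G[k] is stored as an array of shape (K, N, M); the index is non-negative and vanishes exactly when each output carries a single source through a single dominant path. Both example responses are made up for illustration.

```python
import numpy as np

# Implementation sketch of the performance index of eq. (32).
def performance_index(G):
    A = np.abs(G) ** 2                         # |g_ij[k]|^2, shape (K, N, M)
    E = A.sum(axis=0)                          # sum_k |g_ij[k]|^2 per (i, j) path
    peak_row = A.max(axis=0).max(axis=1)       # max_{l,m} |g_il[m]|^2 for each i
    peak_col = A.max(axis=0).max(axis=0)       # max_{l,m} |g_lj[m]|^2 for each j
    return float(np.sum(E.sum(axis=1) / peak_row - 1.0)
                 + np.sum(E.sum(axis=0) / peak_col - 1.0))

G_perfect = np.zeros((3, 2, 2))
G_perfect[1] = np.diag([2.0, -0.5])            # scaled identity at lag 1: index 0
G_poor = np.ones((3, 2, 2))                    # energy spread over all paths

print(performance_index(G_perfect), performance_index(G_poor))
```

Scaling and permutation indeterminacies leave the index unchanged, which is why it is a meaningful convergence measure for blind separation.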
Finally, a stability analysis shows that choosing strictly monotonically increasing odd non-linear functions ensures local convergence to the separating solution.

8. ACKNOWLEDGMENTS

Part of this work is the result of a stay at the Brain Science Institute (RIKEN), supported by the Frontier Research Program of Japan. The first author wishes to thank the members of the Brain Information Processing Research Group and especially Profs. Amari and

the other hand, the second matrix W_b will be selected to diagonalize the skew-Hermitian part of Γ_fg, and towards this aim the on-line version of algorithm (15) can be used. However, in order to merge (15) and (22) into a single recursion for the overall separating system W, we need both adaptations to be orthogonal to first order. Since the matrix yy^H − I is always Hermitian, to orthogonalize the second adaptation at first order we can replace f(y)g^H(y) − I by its projection onto the space of skew-Hermitian matrices, and the following algorithm results

W_b^(n+1) = [ I − (f(y)g^H(y) − g(y)f^H(y)) / (1 + |f^H(y)g(y)|) ] W_b^(n)

Combining both adaptation rules as W^(n+1) = W_b^(n+1) W_a^(n+1) and keeping only the first-order terms of the expansion in η (which is a valid approximation for η small enough) we arrive at

W^(n+1) = [ I − (yy^H − I)/(1 + |y^H y|) − (f(y)g^H(y) − g(y)f^H(y))/(1 + |f^H(y)g(y)|) ] W^(n)

which is a normalized version of the family of generalized EASI algorithms [17, 7].

- Non-linear PCA algorithm: Imposing the constraint that the separation matrix W be unitary, we can derive from the II method the non-linear PCA algorithm developed by Oja and Karhunen [17]. First, let us redefine the nonlinear correlation matrix as Γ_fg = E[f(y)g^H(y)] + I, where f(y) is an odd function and g(y) = f(y) − y. Substituting the matrix Γ_fg by its stochastic approximation in the II algorithm (15) we arrive at

W^(n+1) = W^(n) − η (Γ̂_fg − I) W^(n) = W^(n) − η f(y) (f^H(y) − y^H) W^(n)   (23)

Taking into account that W is a unitary matrix, i.e., W^H W = I, we can rewrite (23) as

W^(n+1) = W^(n) + η f(y) (x^H − f^H(y) W^(n))

which is the Karhunen-Oja non-linear PCA algorithm [17].

4. CONVOLUTIVE MIXTURES

In this section we extend the ideas presented in the previous sections to the case of convolutive mixtures of temporally i.i.d. sources. In this case we propose the diagonalization of the following sequence of nonlinear correlation matrices

Γ[k] = E[f(y[n]) g^H(y[n−k])] = δ[k] I   (24)

where f(·) and g(·) are suitably chosen non-linear functions.
Let us define the nonlinear Power Spectral Density (PSD) matrix S_fg(ω) as the Discrete Time Fourier Transform (DTFT) of the correlation matrix sequence Γ[k]. Similarly, we define U(ω) = DTFT{W[k]}. In the frequency domain, therefore, equation (24) is equivalent to having a nonlinear PSD matrix equal to the identity at all frequencies, i.e., S_fg(ω) = I ∀ω. Moreover, we can define a new non-linear PSD matrix S_Fg(ω) = U^(−1)(ω) S_fg(ω), so that

S_fg(ω) = I ⟹ U(ω) = S_Fg^(−1)(ω)   (25)

It is interesting to note that, for each frequency bin ω, we have a decoupled instantaneous mixture problem under the assumption that the separation filter has an infinite number of coefficients. Then we can use the II method presented in Section 2 to calculate the separation matrix at each frequency, resulting in the following frequency-domain algorithm

U^(n+1)(ω) = U^(n)(ω) − η (S_fg(ω) − I) U^(n)(ω) = (1+η) U^(n)(ω) − η S_fg(ω) U^(n)(ω)

Working in the frequency domain has the problem that for each frequency bin there is an independent separation algorithm that may yield the sources in permuted orders. This limitation can be overcome if we compute the inverse DTFT of the above algorithm to express it in the time domain

W^(n+1)[k] = (1+η) W^(n)[k] − η Λ[k]   (26)

Λ[k] = Γ[k] * W^(n)[k] = Σ_{m=−∞}^{+∞} Γ[m] W^(n)[k−m]

The stochastic version of the algorithm (26) is obtained by means of the approximation Γ[m] ≈ f(y[n]) g^H(y[n−m]). Following a procedure similar to that of Section 3, it is possible to choose the non-linearities so as to extend the existing algorithms for instantaneous mixtures to convolutive mixtures. To complete the definition of the iteration we still have to determine the step size η. In order to ensure convergence at all frequencies, η should satisfy

η < 1 / ‖S_fg(ω) − I‖₂   ∀ω   (27)
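One batch update of the time-domain recursion (26) can be sketched directly: estimate Γ[m] by a block average, form the convolution Λ[k], and take the step of eq. (28). In the sketch below the filter length, the causal lag range and the stand-in outputs y are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

# One batch II update for convolutive mixtures, eq. (26):
#   W[k] <- (1 + eta) W[k] - eta * Lambda[k],  Lambda[k] = sum_m Gamma[m] W[k-m]
rng = np.random.default_rng(3)
N, L, K = 2, 4000, 5                       # channels, block length, taps k = 0..K-1
y = rng.standard_normal((N, L))            # stand-in separator outputs

f = lambda y: (np.abs(y) ** 2) * y         # f(y) = |y|^2 y, as in the simulations
g = lambda y: y

# Block estimates of Gamma[m] = E[f(y[n]) g(y[n-m])^H] for causal lags m = 0..K-1
Gamma = np.zeros((K, N, N))
for m in range(K):
    Gamma[m] = f(y[:, m:]) @ g(y[:, :L - m]).T / (L - m)

# Practical convolutive step size, eq. (28)
eta = 1.0 / (1.0 + sum(np.max(np.abs(np.diag(Gamma[m]))) for m in range(K)))

W = np.zeros((K, N, N))
W[0] = np.eye(N)                           # start from an identity separator
Lambda = np.zeros_like(W)
for k in range(K):                         # causal truncation of the convolution
    for m in range(k + 1):
        Lambda[k] += Gamma[m] @ W[k - m]

W_new = (1 + eta) * W - eta * Lambda       # one iteration of eq. (26)
print(np.round(W_new[0], 3))
```

Because the stand-in outputs are already independent, Γ[m] is close to δ[m]·diag(E[|y_i|⁴]), so the update mainly rescales the leading tap, which is the expected behavior near a separating solution.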

It should be noted that the algorithm (15) derived above is an extended batch version of the family of adaptive algorithms proposed originally by Cichocki et al. in [10, 9, 8] for blind source separation. This algorithm exhibits the equivariant property in the absence of noise.

2.2. On-line and batch adaptations

The adaptive implementation of the II algorithm (15) admits both on-line and batch versions.

On-line adaptation: the correlation matrix Γ_fg in (15) is replaced by its single-sample estimate at time n, i.e.,

Γ̂_fg = f(y[n]) g^H(y[n])   (16)

Batch adaptation: if we assume that the stationarity and ergodicity properties hold on a block of observations of L samples, we can replace the statistical average in (15) by the temporal average

Γ̂_fg = (1/L) Σ_{k=0}^{L−1} f(y[k]) g^H(y[k])   (17)

2.3. A practical step-size adaptation

In order to provide numerical stability of the algorithms and to ensure high convergence speed it is important to choose a learning step η as close as possible to the optimal one. Taking into account that for the separation algorithm (15) we need to ensure condition (13), and since ‖Γ_fg − I‖₂ ≤ 1 + ‖Γ_fg‖₂, we propose to estimate the learning step as

η = 1 / (1 + ‖Γ_fg‖₂)   (18)

For the on-line adaptation, where the spectral radius of the rank-one estimate Γ̂_fg equals |g^H(y[n]) f(y[n])|, this results in

η = 1 / (1 + |g^H(y[n]) f(y[n])|)   (19)

In the batch case, taking into account that the algorithm diagonalizes Γ_fg, a practical estimate of the spectral radius of the correlation matrix is given by

‖Γ_fg‖₂ ≈ max{|diag(Γ_fg)|}

This approximation may not hold at points far from convergence. However, if η is small enough, condition (13) for the expansion of the inverse still holds true. Therefore, we propose to use

η = 1 / (1 + max{|diag(Γ_fg)|})   (20)
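The step-size heuristic (18)-(20) can be verified numerically: for a batch estimate Γ_fg that is close to diagonal, the proposed η = 1/(1 + max{|diag(Γ_fg)|}) should indeed satisfy the convergence condition (13). The sketch below uses independent synthetic outputs with unequal powers as an illustrative stand-in for near-converged separator outputs.

```python
import numpy as np

# Check that eta from eq. (20) satisfies condition (13), eta < 1/||Gamma_fg - I||_2.
rng = np.random.default_rng(2)
N, L = 3, 10000
scales = np.array([[1.0], [1.2], [0.8]])
y = rng.standard_normal((N, L)) * scales       # independent, non-unit-power outputs

f = lambda y: y ** 3                            # cubic non-linearity, g(y) = y
Gamma = f(y) @ y.T / L                          # batch estimate, eq. (17)

eta = 1.0 / (1.0 + np.max(np.abs(np.diag(Gamma))))    # eq. (20)
norm2 = np.linalg.norm(Gamma - np.eye(N), 2)          # exact 2-norm in condition (13)
print(eta, 1.0 / norm2)
```

The bound is conservative: since ‖Γ_fg − I‖₂ ≤ 1 + ‖Γ_fg‖₂ and the diagonal dominates near convergence, η·‖Γ_fg − I‖₂ stays below 1, so the Taylor expansion behind (14) remains valid.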
3. RELATION WITH EXISTING ALGORITHMS

Several existing algorithms for BSS can be derived from the II method (15): the decorrelation algorithm of Almeida [1], the natural gradient algorithm for ICA [3, 2], the non-linear PCA algorithm [17] (under the requirement of having a unitary separation matrix) and the normalized and generalized EASI algorithms [17, 7]. In fact, all these models can be shown to be consistent with (15) when one or more linear/non-linear correlation matrices are diagonalized. Below we show these relations for the on-line case.

- Decorrelation algorithm: In the linear case f(y) = g(y) = y, and the adaptation rule (15) reduces to the decorrelation algorithm proposed by Almeida [1] (see also Douglas and Cichocki [13, 9])

W^(n+1) = W^(n) − (yy^H − I)/(1 + |y^H y|) W^(n)   (21)

- Amari's natural gradient algorithms for ICA: When only one of the two functions f(y) or g(y) is non-linear, the II algorithm becomes

W^(n+1) = W^(n) − (f(y) y^H − I)/(1 + |y^H f(y)|) W^(n)   (22)

or its permuted form. Both are normalized versions of the natural gradient algorithms for ICA developed by Amari et al. [3, 2] and independently by Cardoso and Laheld [7].

- Generalized EASI algorithms: The family of generalized EASI algorithms was proposed by Karhunen et al. in [17] as an extension of the family of EASI algorithms derived by Cardoso and Laheld in [7]. We can derive its normalized version from the II method following a procedure similar to that described by Cardoso and Laheld in their article. Let us split the separation stage into two blocks connected in series, i.e., W = W_b W_a. The first matrix W_a will be selected to diagonalize the Hermitian matrix yy^H and will be adapted according to algorithm (21). On

derivation of the generalized and flexible learning algorithm for instantaneous BSS is given. Section 3 shows that many existing algorithms for BSS can be derived as special cases of the II method. Section 4 extends the method to convolutive mixtures. Section 5 presents the conditions that ensure algorithm stability. Section 6 illustrates the performance through computer simulations, and Section 7 is devoted to the conclusions.

2. INSTANTANEOUS MIXTURES

2.1. Criterion

According to the Darmois-Skitovich theorem [11, 6], the sources are recovered if and only if y is a vector of statistically independent signals. When this occurs, it is apparent that the nonlinear spatial correlation matrix

Γ_fg(W) = E[f(y) g^H(y)]   (5)

will be diagonal. Here f(y) and g(y) represent nonlinear functions of the outputs. Without loss of generality we will consider that this correlation matrix is equal to the identity at the optimum separation matrices W*, i.e.,

Γ_fg(W*) = I   (6)

Next, let us define a new non-linear function of the outputs, F(y) = W^(−1) f(y), and a new spatial correlation matrix

Γ_Fg(W) = E[F(y) g^H(y)] = W^(−1) Γ_fg(W)   (7)

This allows us to rewrite condition (6) as

Γ_fg(W*) = I ⟹ W* = Γ_Fg^(−1)(W*)   (8)

Therefore, we may interpret that solving the BSS problem is equivalent to inverting the matrix Γ_Fg(W*). This inversion, however, cannot be carried out directly since we do not have access to F(y) and g(y) at the optimum separating solution W*. Instead, we will develop in the sequel an iterative procedure that overcomes this limitation. Let us denote by W^(n) the separating matrix used in the nth iteration. According to the above discussion, the best approximation to W* at the nth iteration will be the inverse of Γ_Fg(W^(n)), i.e.,

W^(n+1) = Γ_Fg^(−1)(W^(n))   (9)

In order to avoid the computational complexity caused by matrix inversions, we propose that W^(n+1) be the solution to the following minimization problem

W^-(n+1) = arg min Φ,   Φ = ‖Γ_Fg(W^(n)) − W^-(n+1)‖²_F   (10)

where ‖·‖_F denotes the Frobenius norm of a matrix.
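The criterion (5)-(6) is easy to illustrate numerically: for statistically independent outputs, the sample version of Γ_fg is close to diagonal, and the proper scaling turns it into the identity. The non-linearities and the uniform distribution below are illustrative choices, not prescribed by the paper.

```python
import numpy as np

# Numerical illustration of the criterion (5)-(6): independent outputs give a
# (nearly) diagonal nonlinear correlation matrix Gamma_fg = E[f(y) g(y)^H].
rng = np.random.default_rng(5)
N, L = 3, 100000
y = rng.uniform(-1.0, 1.0, (N, L))                      # independent zero-mean outputs
y /= np.mean(y ** 4, axis=1, keepdims=True) ** 0.25     # scale so E[y_i^3 y_i] = 1

f = lambda y: y ** 3                                    # f(y) = y^3, g(y) = y
Gamma = f(y) @ y.T / L                                  # sample version of eq. (5)
off = Gamma - np.diag(np.diag(Gamma))
print(np.diag(Gamma), np.max(np.abs(off)))
```

The off-diagonal entries shrink as O(1/√L), while the diagonal is pinned to 1 by the scaling, matching the normalization convention of eq. (6).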
Let us now resort to a Newton-Raphson recursive technique to solve the minimization problem (10). Following an argumentation similar to the development of the Bussgang techniques for blind equalization [15] and to the continuation method [14], to extend the domain of convergence of the Newton-Raphson method we will interpret Γ_Fg(W^(n)) as a rough estimate of the mixing matrix H at iteration n and, therefore, as having a constant value. Computing the gradient and the Hessian of Φ with respect to W^-(n+1) we obtain

∂Φ/∂W^-H = −(Γ_Fg(W^(n)) − W^-(n+1))   and   ∂²Φ/∂W^- ∂W^-H = I

Therefore the resulting Newton-Raphson recursion that minimizes Φ is

W^-(n+1) = W^-(n) + μ (Γ_Fg − W^-(n+1))

where, to simplify notation, Γ_Fg = Γ_Fg(W^(n)). Defining η = μ/(1+μ), we can rewrite the above iteration as

W^-(n+1) = (1−η) W^-(n) + η Γ_Fg   (11)

Note from this equation that W^-(n+1) is an estimate of an exponentially windowed version of the spatial correlation matrix Γ_Fg, which is in accordance with our interpretation of Γ_Fg as an estimate of the mixing system H. Moreover, it also provides some clues to select the nonlinearities F(·) and g(·) as those that provide a good estimate of H from the output vector y. Next, let us describe explicitly the recursion (11) in terms of W^(n) rather than W^-(n)

W^(n+1) = [I + η(Γ_fg − I)]^(−1) W^(n)   (12)

If η is chosen in such a way that

η < 1 / ‖Γ_fg − I‖₂   (13)

where ‖·‖₂ denotes the 2-norm, we can express the inverse in (12) as a Taylor series expansion and make the following approximation

[I + η(Γ_fg − I)]^(−1) = Σ_{i=0}^{∞} (−η)^i (Γ_fg − I)^i ≈ I − η(Γ_fg − I)   (14)

Finally, substituting (14) into (12) we arrive at the following iterative Newton-Raphson method for the inversion of Γ_Fg, the so-called Iterative Inversion (II) method

W^(n+1) = W^(n) − η (Γ_fg − I) W^(n)   (15)
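The full instantaneous II iteration of eq. (15) can be sketched end to end in a few lines, using the batch correlation estimate of eq. (17) and the practical step of eq. (20). The binary sources, the mixing matrix and the extra factor of 0.5 on the step (added here to damp the marginal oscillation the fixed step exhibits exactly at the solution) are illustrative choices for this demo, not the paper's setup.

```python
import numpy as np

# Runnable sketch of the II method, eq. (15), for an instantaneous mixture.
rng = np.random.default_rng(0)
N, L = 2, 5000
s = np.sign(rng.standard_normal((N, L)))       # i.i.d. +-1 (sub-Gaussian) sources
H = np.array([[1.0, 0.6], [0.5, 1.0]])         # assumed mixing matrix
x = H @ s                                      # instantaneous mixture, eq. (3)

f = lambda y: (np.abs(y) ** 2) * y             # f(y) = |y|^2 y
g = lambda y: y                                # g(y) = y

W = np.eye(N)
for _ in range(200):
    y = W @ x                                  # outputs, eq. (4)
    Gamma = f(y) @ g(y).conj().T / L           # batch estimate, eq. (17)
    eta = 0.5 / (1.0 + np.max(np.abs(np.diag(Gamma))))   # eq. (20), halved (demo choice)
    W = W - eta * (Gamma - np.eye(N)) @ W      # II update, eq. (15)

G = W @ H                                      # overall response
ratios = np.abs(G) / np.abs(G).max(axis=1, keepdims=True)
off_peak = np.sort(ratios, axis=1)[:, 0]       # should approach 0 at separation
print(np.round(G, 3))
```

At convergence each row of G is dominated by a single entry, i.e., the outputs recover the sources up to the usual scaling and permutation indeterminacies.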

AN ITERATIVE INVERSION METHOD FOR BLIND SOURCE SEPARATION

Sergio Cruces and Luis Castedo
Universities of Seville and La Coruña
Camino de los Descubrimientos, 41092-Seville, Spain
sergio@viento.us.es

Andrzej Cichocki
Laboratory for Open Information Systems
Brain Science Institute, RIKEN
Wako-Shi, Saitama 351-01, JAPAN
cia@brain.riken.go.jp

ABSTRACT

In this paper we present an Iterative Inversion (II) approach to derive the generalized learning rule for Blind Source Separation (BSS) proposed in [10]. The approach consists in the diagonalization of a non-linear spatial correlation matrix of the outputs and is first presented for instantaneous mixtures. It is shown how existing algorithms for BSS can be derived as particular cases of the resulting generalized rule. The II method is also extended to the separation of convolutive mixtures of temporally i.i.d. sources. Finally, necessary and sufficient asymptotic stability conditions for the method to converge are given.

1. INTRODUCTION

The separation of mixtures of unknown signals arises in a large number of signal processing applications such as array processing, multiuser communications and voice restoration. This problem is known as Blind Source Separation (BSS) and it has been shown [11] that it can be solved if the sources are non-Gaussian and statistically independent. Since the pioneering work of Jutten and Herault [16], a lot of novel, efficient and robust adaptive algorithms for blind source separation have been rigorously derived and their properties have been investigated. Algorithms have been developed from different points of view, such as contrast functions [11], mutual information maximization [5], Kullback-Leibler minimization using the natural gradient approach [3] and nonlinear principal component analysis [17]. In this paper we present a new approach, the Iterative Inversion (II) method, and show that most of the existing algorithms can be obtained as particular cases of this method. The BSS problem is typically formulated as follows.
Assume that an array of sensors provides a vector of N observed signals x[n] = [x₁[n] x₂[n] … x_N[n]]^T that are mixtures of N random processes s_i[n], i = 1, 2, …, N, termed sources. The exact probability density function of the sources is unknown: we only assume that they are complex-valued, zero-mean, non-Gaussian distributed and statistically independent. In the convolutive mixture case we consider that the source samples are temporally independent and identically distributed (i.i.d.), and the observations are related to the sources as follows

x[n] = Σ_{k=−∞}^{+∞} H[k] s[n−k]   (1)

where s[n] = [s₁[n] s₂[n] … s_N[n]]^T is the vector of sources and H[n] is the sequence of N×N impulse response matrices corresponding to the mixing system. To recover the sources, the observations are processed by a Multiple Input Multiple Output (MIMO) system with memory to produce the outputs

y[n] = Σ_{k=−∞}^{+∞} W[k] x[n−k]   (2)

where W[n] is the sequence of N×N impulse response matrices corresponding to the separating system. We will denote by G[n] = W[n] * H[n] the matrix impulse response of the combined mixing and separating system. The aim in BSS is to find or estimate W[n] such that the separating system retrieves the original sources up to some specific indeterminacies. The signal model is considerably simplified when instantaneous mixtures are considered. In this case H[n] = H δ[n] and W[n] = W δ[n], and equations (1) and (2) reduce to

x = H s   (3)
y = W x   (4)

(Indices with respect to n are omitted in the instantaneous case to simplify notation.) This paper is structured as follows. Section 2 presents the II method for instantaneous mixtures and a new