An Improved Cumulant Based Method for Independent Component Analysis

Tobias Blaschke and Laurenz Wiskott
Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstraße 43, D-10115 Berlin, Germany
{t.blaschke,l.wiskott}@biologie.hu-berlin.de
http://itb.biologie.hu-berlin.de/

Abstract. An improved method for independent component analysis based on the diagonalization of cumulant tensors is proposed. It is based on Comon's algorithm [1] but takes third- and fourth-order cumulant tensors into account simultaneously. The underlying contrast function is also mathematically much simpler and has a more intuitive interpretation. It is therefore easier to optimize and approximate. A comparison with Comon's algorithm, JADE [2], and FastICA [3] on different data sets demonstrates its performance.

1 Introduction

Vectorial input signals x(t) = [x_1(t), ..., x_N(t)]^T to be analyzed are often a mixture of some underlying signals s_i(t) coming from different sources. In a simple model of this data generation process it is assumed that there are as many sources as input signal components, that the source signal components s_i are statistically independent, and that the mixing is linear and noise free, yielding the relation

x = A s  (1)

with fixed mixing matrix A and source signal s(t) = [s_1(t), ..., s_N(t)]^T. In the following we will drop the reference to time and simply assume some sets of zero-mean sources and input data related by (1). The input components are usually statistically dependent due to the mixing process, while the sources are not. If one succeeds in finding a matrix R that yields statistically independent output components u_k given by

u = R x = R A s  (2)

one can recover the original sources s_i up to a permutation and constant scaling of the sources. R is called the unmixing matrix, and finding this matrix is referred to as independent component analysis (ICA) or blind source separation. There is a variety of methods for performing ICA and a large body of literature; see [4,5] for an overview.
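As a minimal numerical sketch of this model (Python is used here purely for illustration; the distributions, dimensions, and random seed are arbitrary choices, not taken from the paper):

```python
import numpy as np

# Toy illustration of the linear, noise-free ICA model x = A s of Eq. (1).
# The source distributions and dimensions are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
N, T = 3, 10000
s = np.vstack([
    rng.laplace(size=T),             # super-Gaussian source
    rng.uniform(-1.0, 1.0, size=T),  # sub-Gaussian source
    rng.gamma(2.0, size=T) - 2.0,    # skewed, zero-mean source
])
A = rng.normal(size=(N, N))          # fixed mixing matrix
x = A @ s                            # observed mixtures

# With the true A known, the unmixing matrix R = A^{-1} recovers u = R x = s
# exactly; ICA has to find such an R (up to permutation and scaling) from x alone.
u = np.linalg.inv(A) @ x
```

In practice A is unknown, so the remainder of the paper is concerned with estimating R from the statistics of x alone.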
We extend here on the cumulant based methods [6,7] and present an improved algorithm that takes third- and fourth-order cumulants into account simultaneously and at the same time is simpler and faster than Comon's algorithm [1], on which our algorithm is based. Supported by a grant from the Volkswagen Foundation.
2 Improved ICA Algorithm

2.1 Cumulants and Independence

Statistical properties of the output data set u can be described by its moments or, more conveniently, by its cumulants C^u_{...}. Cumulants of a given order form a tensor; e.g., third-order cumulants are defined by C^u_{ijk} := ⟨u_i u_j u_k⟩, with ⟨·⟩ indicating the mean over all data points. The diagonal elements characterize the distribution of single components. For example, C^u_i, C^u_{ii}, C^u_{iii}, and C^u_{iiii} are the mean, variance, skewness, and kurtosis of u_i, respectively. The off-diagonal elements, i.e. all cumulants whose indices are not all identical, characterize the statistical dependencies between components. If and only if all components u_i are statistically independent, the off-diagonal elements vanish and the cumulant tensors of all orders are diagonal (assuming an infinite amount of data). Thus ICA is equivalent to finding an unmixing matrix that diagonalizes the cumulant tensors C^u_{...} of the output data, at least approximately.

The first-order cumulant tensor is a vector and doesn't have off-diagonal elements. The second-order cumulant tensor can be diagonalized easily by whitening the input data x with an appropriate matrix W, yielding y = W x. It can be shown that this exact diagonalization of the second-order cumulant tensor of the whitened data y is preserved if and only if the final matrix Q generating the output data u = Q y is orthogonal, i.e. a pure rotation, possibly plus reflections [8]. In general there doesn't exist an orthogonal matrix that would also diagonalize the third- or fourth-order cumulant tensor; thus the diagonalization of these tensors can only be approximate, and we need to define an optimization criterion for this approximate diagonalization, which is done in the next section.

2.2 Contrast Function

Since the goal is to diagonalize the cumulant tensors, a suitable contrast function (or cost function) to be minimized would be the square sum over all third- and fourth-order off-diagonal elements.
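The whitening step described above can be sketched as follows (an illustrative Python snippet, not the paper's Matlab implementation; the eigendecomposition-based choice of W and the example mixing matrix are our assumptions):

```python
import numpy as np

# Whitening sketch: build W from the eigendecomposition of the covariance
# matrix, so that the second-order cumulant tensor of y = W x is the identity.
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 50000))
A = np.array([[1.0, 0.5], [0.3, 1.0]])
x = A @ s
x = x - x.mean(axis=1, keepdims=True)   # zero-mean data, as assumed in the text

C = np.cov(x)                           # second-order cumulants of x
d, E = np.linalg.eigh(C)
W = E @ np.diag(d ** -0.5) @ E.T        # symmetric whitening matrix
y = W @ x                               # whitened data: cov(y) = I
```

After this step, only an orthogonal matrix Q remains to be found, which preserves the identity covariance of y.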
However, because the square sum over all elements of a cumulant tensor is preserved under any orthogonal transformation Q of the underlying data y [9], one can equally well maximize the sum over the diagonal elements

Ψ_34(u) = 1/3! Σ_α (C^u_{ααα})² + 1/4! Σ_α (C^u_{αααα})².  (3)

The factors 1/3! and 1/4! arise from an expansion of the Kullback-Leibler divergence of u and y, which provides a principled derivation of this contrast function [7,10]. Due to the multilinearity of the cumulants C^u_{...} in C^y_{...}, (3) can be rewritten as

Ψ_34(Q) = 1/3! Σ_α ( Σ_{βγδ} Q_{αβ} Q_{αγ} Q_{αδ} C^y_{βγδ} )² + 1/4! Σ_α ( Σ_{βγδε} Q_{αβ} Q_{αγ} Q_{αδ} Q_{αε} C^y_{βγδε} )²  (4)

(the two inner sums being C^u_{ααα} and C^u_{αααα}, respectively),
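A numerical sketch of this contrast, using sample cumulants of zero-mean, unit-variance data (a Python illustration; the example distributions are arbitrary choices):

```python
import numpy as np

# Sketch of the contrast: sum of squared diagonal third- and fourth-order
# sample cumulants, with C_aaa = <u^3> and C_aaaa = <u^4> - 3<u^2>^2
# for zero-mean data.
def psi_34(u):
    m2 = (u ** 2).mean(axis=1)
    skew = (u ** 3).mean(axis=1)
    kurt = (u ** 4).mean(axis=1) - 3.0 * m2 ** 2
    return (skew ** 2).sum() / 6.0 + (kurt ** 2).sum() / 24.0

rng = np.random.default_rng(2)
g = rng.normal(size=(2, 100000))                  # Gaussian: cumulants beyond
                                                  # second order vanish
l = rng.laplace(size=(2, 100000)) / np.sqrt(2.0)  # unit-variance Laplacian:
                                                  # excess kurtosis 3
```

For the Gaussian data the contrast is close to zero, while for the Laplacian data it is dominated by the squared kurtoses.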
which is now subject to an optimization procedure to find the orthogonal matrix Q that maximizes it.

2.3 Givens Rotations

Any orthogonal N×N matrix such as Q can be written as a product of N(N−1)/2 or more Givens rotation matrices Q^r for the rotation part and a diagonal matrix with diagonal elements ±1 for the reflection part. Since reflections don't matter in our case, we only consider the Givens rotations. A Givens rotation is a plane rotation around the origin within a subspace spanned by two selected components. For simplicity and without loss of generality we consider only N = 2, so that the Givens rotation matrix reads

Q^r = |  cos φ   sin φ |
      | −sin φ   cos φ |.  (5)

With (5) the contrast function (4) can be rewritten as Ψ_34(φ) = Ψ_3(φ) + Ψ_4(φ) with

Ψ_n(φ) := 1/n! Σ_{i=0}^{n} d_{ni} ( cos^{2n−i}φ sin^i φ + (−1)^i cos^i φ sin^{2n−i}φ )  (6)

with some constants d_{ni} that depend only on the cumulants before rotation. The next step is to find the angle of rotation φ that maximizes Ψ_34. For N > 2 one has to perform several Givens rotations for each pair of components in order to find the global maximum.

To simplify this equation Comon [1] defined the auxiliary variables θ := tan φ and ξ := θ − 1/θ and derived

Ψ_3(θ) = 1/(3! (θ + 1/θ)³) Σ_{i=1}^{3} a_i ( θ^i + (−1)^i θ^{−i} )  (7)

Ψ_4(ξ) = 1/(4! (ξ² + 4)²) Σ_{i=0}^{4} b_i ξ^i  (8)

for (6) with some constants a_i and b_i depending on the cumulants before rotation. To maximize (7) or (8) one has to take their derivative and find the root giving the largest value for Ψ_34. With this formulation only either the third-order or the fourth-order diagonal cumulants can be maximized, but not both simultaneously.

In a more direct approach, and after some quite involved calculations using various trigonometric addition theorems, we were able to derive a contrast function that combines third- and fourth-order cumulants, is mathematically much simpler, has a more intuitive interpretation, and is therefore easier to optimize and approximate. We found

Ψ_34(φ) = A_0 + A_4 cos(4φ + φ_4) + A_8 cos(8φ + φ_8)  (9)
with some constants A_0, A_4, A_8 and φ_4, φ_8 that depend only on the cumulants before rotation (see the appendix). The third term comes from the fourth-order cumulants only, while the first two terms incorporate information from the third- and the fourth-order cumulants. We found empirically that A_8 is usually small compared to A_4, by a factor of about 1/8. In the simulations we will therefore also consider an approximate contrast function Ψ̃_34(φ) := A_0 + A_4 cos(4φ + φ_4), which is trivial to maximize. Note that Ψ̃_34 still takes third- and fourth-order cumulants into account. Contrast functions for third- or fourth-order cumulants only, i.e. Ψ_3, Ψ_4, Ψ̃_3, or Ψ̃_4, can easily be obtained by setting all fourth- or third-order cumulants to zero, respectively.

3 Simulations

We compared four variants of the improved algorithm, namely those based on Ψ_34(φ), Ψ̃_34(φ), Ψ_4(φ), and Ψ̃_4(φ), with Comon's original algorithm based on Ψ_4(ξ) [1], with the JADE algorithm [2], which diagonalizes fourth-order cumulant matrices, and with the FastICA package using a fixed-point algorithm with different nonlinearities [3].¹ We assembled three different data sets of length 4428 each, mixed by a randomly chosen mixing matrix. To quantify the performances we used the error measure E proposed by Amari et al. [11], which indicates good unmixing by low values and vanishes for perfect unmixing. Table 1 shows the performance of the different algorithms in Matlab implementation.² For underlying symmetric distributions all algorithms perform similarly well. If the source distributions are skewed, the additional third-order information is crucial, and the new method and the FastICA algorithm using a corresponding nonlinearity clearly unmix better. In the case where the sources are both symmetric and asymmetric, only the new algorithm can discriminate between the different distributions. Furthermore, the algorithms with the approximate contrast work as well as those with the exact contrast.
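This evaluation setup can be illustrated as follows (a Python sketch, not the paper's Matlab code: a brute-force angle scan stands in for the closed-form maximization, and one common form of the performance index of Amari et al. is used; all distributions and the mixing angle are arbitrary choices):

```python
import numpy as np

def contrast(u):
    # squared diagonal third- and fourth-order sample cumulants of zero-mean data
    skew = (u ** 3).mean(axis=1)
    kurt = (u ** 4).mean(axis=1) - 3.0 * (u ** 2).mean(axis=1) ** 2
    return (skew ** 2).sum() / 6.0 + (kurt ** 2).sum() / 24.0

def rot(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s], [-s, c]])

def amari_error(P):
    # one common form of the index of Amari et al.: zero iff P = R A is a
    # scaled permutation matrix, i.e. perfect unmixing
    P = np.abs(P)
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return rows.sum() + cols.sum()

rng = np.random.default_rng(5)
s = np.vstack([
    rng.laplace(size=100000) / np.sqrt(2.0),                 # symmetric source
    rng.gamma(2.0, size=100000) / np.sqrt(2.0) - np.sqrt(2.0)  # skewed source
])
A = rot(0.7)                 # orthogonal mixing of already-white sources
y = A @ s

phis = np.linspace(0.0, np.pi / 2, 500)
vals = [contrast(rot(p) @ y) for p in phis]
Q = rot(phis[int(np.argmax(vals))])   # estimated unmixing rotation
```

Here Q @ A is close to a permutation matrix, so the error index is near zero.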
4 Conclusion

We have proposed an improved cumulant based method for independent component analysis. In contrast to Comon's method [1] it diagonalizes third- and fourth-order cumulants simultaneously and is thus able to handle linear mixtures of symmetric and skew distributed source signals. Due to its mathematically simple formulation, an approximate algorithm can be derived which shows equal unmixing performance. It can also handle sources with both symmetric and asymmetric components.

¹ We have always chosen the nonlinearity yielding best performance.
² Code is available from http://itb.biologie.hu-berlin.de/~blaschke
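The Fourier form (9) also suggests a simple implementation sketch (Python; the helper names and all numerical choices here are ours, not the authors'): since the contrast contains only 4φ and 8φ harmonics, its coefficients follow from sampling the contrast at eight equidistant angles, and the approximate contrast A_0 + A_4 cos(4φ + φ_4) is maximized in closed form at φ = −φ_4/4 (mod π/2):

```python
import numpy as np

def contrast(u):
    skew = (u ** 3).mean(axis=1)
    kurt = (u ** 4).mean(axis=1) - 3.0 * (u ** 2).mean(axis=1) ** 2
    return (skew ** 2).sum() / 6.0 + (kurt ** 2).sum() / 24.0

def rot(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s], [-s, c]])

def fourier_form(y, n=8):
    # Psi(phi) = A0 + A4*cos(4*phi+phi4) + A8*cos(8*phi+phi8) has period pi/2;
    # sampling one period at n = 8 angles determines the coefficients by a DFT
    # (up to estimation noise in the sample cumulants).
    phis = np.arange(n) * (np.pi / 2.0) / n
    vals = np.array([contrast(rot(p) @ y) for p in phis])
    F = np.fft.rfft(vals) / n
    # vals[k] = A0 + A4*cos(2*pi*k/n + phi4) + A8*cos(4*pi*k/n + phi8)
    return (F[0].real,
            2 * np.abs(F[1]), np.angle(F[1]),   # A4, phi4
            2 * np.abs(F[2]), np.angle(F[2]))   # A8, phi8

rng = np.random.default_rng(4)
s = np.vstack([rng.gamma(2.0, size=200000) - 2.0,   # skewed source
               rng.laplace(size=200000)])           # symmetric source
s = s / s.std(axis=1, keepdims=True)
y = rot(0.3) @ s                     # rotated (whitened) mixture

A0, A4, phi4, A8, phi8 = fourier_form(y)
phi_star = (-phi4 / 4.0) % (np.pi / 2.0)   # maximizer of the approximate contrast
```

The recovered angle undoes the mixing rotation up to the inherent π/2 symmetry, and A_8 comes out small relative to A_4, consistent with the empirical observation above.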
Table 1. Unmixing performance for different algorithms and data sets: (i) 5 real acoustic sources [12] (mean kurtosis: 3.808) + normally distributed source N(0,1); (ii) 5 skew-normal distributed sources (mean skewness: 0.078) + normally distributed source; (iii) 3 music sources [13] (mean kurtosis: 1.636) + 3 skew-normal distributed sources + normally distributed source. Low values of E indicate good performance. Simulations have been carried out on a 1.4 GHz AMD-Athlon PC using a Matlab implementation.

contrast function / method           | unmixing performance E   | elapsed time/s
                                     |  (i)     (ii)    (iii)   |  (i)   (ii)  (iii)
Ψ_34(φ)   3rd+4th order              |  0.02    0.053   0.8     |  1.9   1.9   3.6
Ψ̃_34(φ)   3rd+4th order, approx.     |  0.03    0.054   0.8     |  1.9   1.9   3.6
Ψ_4(φ)    4th order                  |  0.01    5.8     1.74    |  1.7   1.7   6.0
Ψ̃_4(φ)    4th order, approx.         |  0.02    6.27    1.93    |  1.9   1.9   3.6
Ψ_4(ξ)    Comon [1]                  |  0.02    4.5     2.1     |  4.9   4.9   9.2
JADE [2]                             |  0.01    5.1     1.62    |  1.3   1.3   2.0
FastICA [3]                          |  0.00    0.058   1.00    |  1.9   0.49  0.7

5 Appendix

From (6) one can derive

Ψ_n(φ) = a_n0 + s_n4 sin 4φ + c_n4 cos 4φ + s_n8 sin 8φ + c_n8 cos 8φ

with

a_30 := 1/(3!·8) [ 5((C^y_111)² + (C^y_222)²) + 9((C^y_112)² + (C^y_122)²) + 6(C^y_111 C^y_122 + C^y_112 C^y_222) ],

a_40 := 1/(4!·64) [ 35((C^y_1111)² + (C^y_2222)²) + 16·5((C^y_1112)² + (C^y_1222)²) + 12·5(C^y_1111 C^y_1122 + C^y_1122 C^y_2222) + 36·3(C^y_1122)² + 32·3 C^y_1112 C^y_1222 + 2·3 C^y_1111 C^y_2222 ],

s_34 := 6/(3!·4) [ C^y_111 C^y_112 − C^y_122 C^y_222 ],

c_34 := 1/(3!·8) [ 3((C^y_111)² + (C^y_222)²) − 9((C^y_112)² + (C^y_122)²) − 6(C^y_111 C^y_122 + C^y_112 C^y_222) ],

s_44 := 1/(4!·32) [ 8·7(C^y_1111 C^y_1112 − C^y_1222 C^y_2222) + 48(C^y_1112 C^y_1122 − C^y_1122 C^y_1222) + 8(C^y_1111 C^y_1222 − C^y_1112 C^y_2222) ],

c_44 := 1/(4!·16) [ 7((C^y_1111)² + (C^y_2222)²) − 16((C^y_1112)² + (C^y_1222)²) − 12(C^y_1111 C^y_1122 + C^y_1122 C^y_2222) − 36(C^y_1122)² − 32 C^y_1112 C^y_1222 − 2 C^y_1111 C^y_2222 ],

s_48 := 1/(4!·64) [ 8(C^y_1111 C^y_1112 − C^y_1222 C^y_2222) − 48(C^y_1112 C^y_1122 − C^y_1122 C^y_1222) − 8(C^y_1111 C^y_1222 − C^y_1112 C^y_2222) ],

c_48 := 1/(4!·64) [ ((C^y_1111)² + (C^y_2222)²) − 16((C^y_1112)² + (C^y_1222)²) − 12(C^y_1111 C^y_1122 + C^y_1122 C^y_2222) + 36(C^y_1122)² + 32 C^y_1112 C^y_1222 + 2 C^y_1111 C^y_2222 ],

and s_38 := c_38 := 0. With this it is trivial to determine the constants for Ψ_34(φ) = Ψ_3(φ) + Ψ_4(φ) in the form given in (9). We find:

A_0 := a_30 + a_40,
A_4 := sqrt((c_34 + c_44)² + (s_34 + s_44)²),
A_8 := sqrt(c_48² + s_48²),
tan φ_4 := −(s_34 + s_44)/(c_34 + c_44),
tan φ_8 := −s_48/c_48.

The coefficients A_0, A_4, A_8 and φ_4, φ_8 are functions of the third- and fourth-order cumulants of the centered and whitened signals y.

References

1. Comon, P.: Tensor diagonalization, a useful tool in signal processing. In Blanke, M., Soderstrom, T., eds.: IFAC-SYSID, 10th IFAC Symposium on System Identification. Volume 1, Copenhagen, Denmark (1994) 77–82
2. Cardoso, J.F., Souloumiac, A.: Blind beamforming for non Gaussian signals. IEE Proceedings-F 140 (1993) 362–370
3. Hyvärinen, A.: Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks 10 (1999) 626–634
4. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley Series on Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons (2001)
5. Lee, T.W., Girolami, M., Bell, A., Sejnowski, T.: A unifying framework for independent component analysis. International Journal of Computers and Mathematics with Applications 39 (2000) 1–21
6. Comon, P.: Independent component analysis. In Lacoume, J., ed.: Proc. Int. Sig. Proc. Workshop on Higher-Order Statistics, Chamrousse, France (1991) 29–38
7. Cardoso, J.F.: High-order contrasts for independent component analysis. Neural Computation 11 (1999) 157–192
8. Hyvärinen, A.: Survey on independent component analysis. Neural Computing Surveys 2 (1999) 94–128
9. Deco, G., Obradovic, D.: An Information-Theoretic Approach to Neural Computing. Springer Series in Perspectives in Neural Computing. Springer (1996)
10. McCullagh, P.: Tensor Methods in Statistics. Monographs on Statistics and Applied Probability. Chapman and Hall (1987)
11. Amari, S., Cichocki, A., Yang, H.: Recurrent neural networks for blind separation of sources. In: Proc. of the Int. Symposium on Nonlinear Theory and its Applications (NOLTA-95), Las Vegas, USA (1995) 37–42
12. John F. Kennedy Library, Boston: Sound excerpts from the speeches of president John F. Kennedy. http://www.jfklibrary.org
13. Pearlmutter, B.: 16 clips sampled from audio CDs. http://sweat.cs.unm.edu/~bap