An Improved Cumulant Based Method for Independent Component Analysis


© Springer. Reprinted with permission.


Tobias Blaschke and Laurenz Wiskott
Institute for Theoretical Biology, Humboldt University Berlin
Invalidenstraße 43, D-10115 Berlin, Germany
{t.blaschke,l.wiskott}@biologie.hu-berlin.de
http://itb.biologie.hu-berlin.de/

Abstract. An improved method for independent component analysis based on the diagonalization of cumulant tensors is proposed. It is based on Comon's algorithm [1], but takes third- and fourth-order cumulant tensors into account simultaneously. The underlying contrast function is also mathematically much simpler and has a more intuitive interpretation. It is therefore easier to optimize and approximate. A comparison with Comon's algorithm, JADE [2], and FastICA [3] on different data sets demonstrates its performance.

1 Introduction

Vectorial input signals x(t) = [x_1(t), ..., x_N(t)]^T to be analyzed are often a mixture of some underlying signals s_i(t) coming from different sources. In a simple model of this data-generation process it is assumed that there are as many sources as input signal components, that the source signal components s_i are statistically independent, and that the mixing is linear and noise free, yielding the relation

    x = A s                                                            (1)

with a fixed mixing matrix A and source signal s(t) = [s_1(t), ..., s_N(t)]^T. In the following we drop the reference to time and simply assume sets of zero-mean sources and input data related by (1). The input components are usually statistically dependent due to the mixing process, while the sources are not. If one succeeds in finding a matrix R that yields statistically independent output components u_k, given by

    u = R x = R A s,                                                   (2)

one can recover the original sources s_i up to a permutation and a constant scaling. R is called the unmixing matrix, and finding it is referred to as independent component analysis (ICA) or blind source separation. There is a variety of methods for performing ICA and a large body of literature; see [4, 5] for an overview.

We extend here the cumulant based methods [6, 7] and present an improved algorithm that takes third- and fourth-order cumulants into account simultaneously and at the same time is simpler and faster than Comon's algorithm [1], on which our algorithm is based. (Supported by a grant from the Volkswagen Foundation.)
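The model (1)-(2) and the permutation/scaling ambiguity can be illustrated with a minimal sketch; the source distributions and matrices below are illustrative choices, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000

# Two independent zero-mean sources (illustrative choices).
s = np.vstack([rng.exponential(1.0, n) - 1.0,
               rng.uniform(-1.0, 1.0, n)])

A = np.array([[1.0, 0.5], [0.3, 1.0]])   # fixed mixing matrix
x = A @ s                                # observed mixtures, x = A s

# With R = A^{-1} the sources are recovered exactly (u = R A s = s) ...
u = np.linalg.inv(A) @ x
print(np.allclose(u, s))                 # True

# ... but any scaled permutation P of A^{-1} is an equally valid unmixing
# matrix: the outputs are again independent, so ICA can recover the
# sources only up to permutation and scaling.
P = np.array([[0.0, 2.0], [-1.0, 0.0]])
u2 = (P @ np.linalg.inv(A)) @ x
print(np.allclose(u2[0], 2.0 * s[1]))    # True
print(np.allclose(u2[1], -s[0]))         # True
```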

2 Improved ICA Algorithm

2.1 Cumulants and Independence

Statistical properties of the output data u can be described by its moments or, more conveniently, by its cumulants C^u_{...}. Cumulants of a given order form a tensor; e.g., third-order cumulants are defined by C^u_ijk := <u_i u_j u_k>, with <·> indicating the mean over all data points. The diagonal elements characterize the distribution of single components: C^u_i, C^u_ii, C^u_iii, and C^u_iiii are the mean, variance, skewness, and kurtosis of u_i, respectively. The off-diagonal elements (all cumulants whose indices are not all identical, e.g. ijkl ≠ iiii) characterize the statistical dependencies between components. If and only if all components u_i are statistically independent, the off-diagonal elements vanish and the cumulant tensors of all orders are diagonal (assuming an infinite amount of data). Thus ICA is equivalent to finding an unmixing matrix that diagonalizes the cumulant tensors C^u_{...} of the output data, at least approximately.

The first-order cumulant tensor is a vector and has no off-diagonal elements. The second-order cumulant tensor can be diagonalized easily by whitening the input data x with an appropriate matrix W, yielding y = W x. It can be shown that this exact diagonalization of the second-order cumulant tensor of the whitened data y is preserved if and only if the final matrix Q generating the output data u = Q y is orthogonal, i.e. a pure rotation, possibly plus reflections [8]. In general there exists no orthogonal matrix that would also diagonalize the third- or fourth-order cumulant tensor; the diagonalization of these tensors can therefore only be approximate, and we need to define an optimization criterion for this approximate diagonalization, which is done in the next section.

2.2 Contrast Function

Since the goal is to diagonalize the cumulant tensors, a suitable contrast function (or cost function) to be minimized would be the square sum over all third- and fourth-order off-diagonal elements.
However, because the square sum over all elements of a cumulant tensor is preserved under any orthogonal transformation Q of the underlying data y [9], one can equally well maximize the sum over the squared diagonal elements

    Ψ_34(u) = (1/3!) Σ_α (C^u_ααα)² + (1/4!) Σ_α (C^u_αααα)².          (3)

The factors 1/3! and 1/4! arise from an expansion of the Kullback-Leibler divergence of u and y, which provides a principled derivation of this contrast function [7, 10]. Due to the multilinearity of the cumulants C^u_{...} in C^y_{...}, (3) can be rewritten as

    Ψ_34(Q) = (1/3!) Σ_α ( Σ_βγδ Q_αβ Q_αγ Q_αδ C^y_βγδ )²
            + (1/4!) Σ_α ( Σ_βγδɛ Q_αβ Q_αγ Q_αδ Q_αɛ C^y_βγδɛ )²,     (4)

where the two inner sums are just C^u_ααα and C^u_αααα, respectively,

which is now subject to an optimization procedure to find the orthogonal matrix Q that maximizes it.

2.3 Givens Rotations

Any orthogonal N×N matrix, such as Q, can be written as a product of N(N−1)/2 or more Givens rotation matrices Q^r for the rotation part and a diagonal matrix with diagonal elements ±1 for the reflection part. Since reflections do not matter in our case, we only consider the Givens rotations. A Givens rotation is a plane rotation around the origin within the subspace spanned by two selected components. For simplicity, and without loss of generality, we consider only N = 2, so that the Givens rotation matrix reads

    Q^r = (  cos φ   sin φ )
          ( −sin φ   cos φ ).                                          (5)

With (5) the contrast function (4) can be rewritten as Ψ_34(φ) = Ψ_3(φ) + Ψ_4(φ) with

    Ψ_n(φ) := (1/n!) Σ_{i=0}^{n} d_ni ( cos^{2n−i}φ sin^i φ + cos^i φ sin^{2n−i}φ )    (6)

with some constants d_ni that depend only on the cumulants before rotation. The next step is to find the angle of rotation φ that maximizes Ψ_34. For N > 2 one has to perform several Givens rotations for each pair of components in order to find the global maximum.

To simplify this equation Comon [1] defined the auxiliary variables θ := tan φ and ξ := θ − 1/θ and derived

    Ψ_3(θ) = 1/(3! (θ + 1/θ)³) Σ_{i=1}^{3} a_i (θ^i − θ^{−i}),          (7)

    Ψ_4(ξ) = 1/(4! (ξ² + 4)²) Σ_{i=0}^{4} b_i ξ^i                       (8)

for (6), with some constants a_i and b_i depending on the cumulants before rotation. To maximize (7) or (8) one has to take the derivative and find the root giving the largest value of Ψ_34. With this formulation, only either the third-order or the fourth-order diagonal cumulants can be maximized, but not both simultaneously.

In a more direct approach, and after some quite involved calculations using various trigonometric addition theorems, we were able to derive a contrast function that combines third- and fourth-order cumulants, is mathematically much simpler, has a more intuitive interpretation, and is therefore easier to optimize and approximate. We found

    Ψ_34(φ) = A_0 + A_4 cos(4φ + φ_4) + A_8 cos(8φ + φ_8)               (9)

with some constants A_0, A_4, A_8 and φ_4, φ_8 that depend only on the cumulants before rotation (see the appendix). The third term stems from the fourth-order cumulants only, while the first two terms incorporate information from both the third- and the fourth-order cumulants. We found empirically that A_8 is usually small compared to A_4, by a factor of about 1/8. In the simulations we will therefore also consider the approximate contrast function

    Ψ̃_34(φ) := A_0 + A_4 cos(4φ + φ_4),

which is trivial to maximize. Note that Ψ̃_34 still takes third- and fourth-order cumulants into account. Contrast functions for third- or fourth-order cumulants only, i.e. Ψ_3, Ψ_4, Ψ̃_3, or Ψ̃_4, can easily be obtained by setting all fourth- or third-order cumulants to zero, respectively.

3 Simulations

We compared four variants of the improved algorithm, namely those based on Ψ_34(φ), Ψ̃_34(φ), Ψ_4(φ), and Ψ̃_4(φ), with Comon's original algorithm based on Ψ_4(ξ) [1], with the JADE algorithm [2], which diagonalizes fourth-order cumulant matrices, and with the FastICA package, which uses a fixed-point algorithm with different nonlinearities [3]. We assembled three different data sets of length 4428 each, mixed by a randomly chosen mixing matrix. To quantify the performance we used the error measure E proposed by Amari et al. [11], which indicates good unmixing by low values and vanishes for perfect unmixing.

Table 1 shows the performance of the different algorithms in Matlab implementations. For underlying symmetric distributions all algorithms perform similarly well. If the source distributions are skewed, the additional third-order information is crucial, and the new method and the FastICA algorithm using a corresponding nonlinearity clearly unmix better. In the case where the sources are both symmetric and asymmetric, only the new algorithm can discriminate between the different distributions. Furthermore, the algorithms with the approximate contrast work as well as those with the exact contrast.
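The error measure of Amari et al. can be sketched as follows; this is one common normalization (scaled to lie in [0, 1]), which may differ by a constant factor from the exact variant used in the paper:

```python
import numpy as np

def amari_error(P):
    """Amari-style separation error for the global system P = R A.

    Zero iff P is a scaled permutation matrix, i.e. iff unmixing is
    perfect up to permutation and scaling; normalized here to [0, 1].
    """
    P = np.abs(np.asarray(P, dtype=float))
    n = P.shape[0]
    # For each row/column, how far the entries are from having a single
    # dominant element (sum of ratios to the maximum, minus 1).
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (rows.sum() + cols.sum()) / (2.0 * n * (n - 1))

print(amari_error([[0.0, 3.0], [-0.2, 0.0]]))   # 0.0  (scaled permutation)
print(amari_error(np.ones((3, 3))))             # 1.0  (worst case)
```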
4 Conclusion

We have proposed an improved cumulant based method for independent component analysis. In contrast to Comon's method [1], it diagonalizes third- and fourth-order cumulants simultaneously and is thus able to handle linear mixtures of symmetrically as well as skew distributed source signals, also when both kinds of sources are present at the same time. Due to its mathematically simple formulation, an approximate algorithm can be derived that shows equal unmixing performance.

Footnotes: (1) For FastICA we have always chosen the nonlinearity yielding the best performance. (2) Code is available from http://itb.biologie.hu-berlin.de/~blaschke
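A minimal end-to-end sketch of the approach for N = 2 (not the authors' implementation): whiten the mixtures, then find the Givens angle maximizing the sum of weighted squared diagonal third- and fourth-order cumulants. The distributions and mixing matrix are illustrative, and a brute-force grid search stands in for the analytic maximization of (9):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Illustrative zero-mean sources: one skewed, one sub-Gaussian.
s = np.vstack([rng.exponential(1.0, n) - 1.0,
               rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n)])
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s        # mixtures x = A s

# Whitening: y = W (x - mean) with W = C^{-1/2}, so that cov(y) = I.
x0 = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x0))
y = (E @ np.diag(d ** -0.5) @ E.T) @ x0

def psi34(z):
    # Psi_34 = (1/3!) sum_a (C_aaa)^2 + (1/4!) sum_a (C_aaaa)^2
    m2 = (z ** 2).mean(axis=1)
    c3 = (z ** 3).mean(axis=1)                    # third-order cumulant
    c4 = (z ** 4).mean(axis=1) - 3.0 * m2 ** 2    # fourth-order cumulant
    return (c3 ** 2).sum() / 6.0 + (c4 ** 2).sum() / 24.0

def givens(p):
    return np.array([[np.cos(p), np.sin(p)], [-np.sin(p), np.cos(p)]])

# Scan the rotation angle over one period [0, pi/2) of the contrast.
phis = np.linspace(0.0, np.pi / 2, 360, endpoint=False)
best = max(phis, key=lambda p: psi34(givens(p) @ y))
u = givens(best) @ y

# Each output should match one source up to sign and scaling.
corr = np.abs(np.corrcoef(np.vstack([u, s]))[:2, 2:])
print(corr.round(2))
```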

Table 1. Unmixing performance for different algorithms and data sets: (i) 5 real acoustic sources [12] (mean kurtosis: 3.808) plus one normally distributed source; (ii) 5 skew-normally distributed sources (mean skewness: 0.078) plus one normally distributed source; (iii) 3 music sources [13] (mean kurtosis: 1.636) plus 3 skew-normally distributed sources plus one normally distributed source. Low values of E indicate good performance. Simulations have been carried out on a 1.4 GHz AMD Athlon PC using Matlab implementations.

    contrast function / method           unmixing performance E     elapsed time / s
                                          (i)    (ii)    (iii)      (i)   (ii)   (iii)
    Ψ_34(φ)   3rd+4th order              0.02   0.053    0.8       1.9   1.9    3.6
    Ψ̃_34(φ)   3rd+4th order, approx.     0.03   0.054    0.8       1.9   1.9    3.6
    Ψ_4(φ)    4th order                  0.01   5.8      1.74      1.7   1.7    6.0
    Ψ̃_4(φ)    4th order, approx.         0.02   6.27     1.93      1.9   1.9    3.6
    Ψ_4(ξ)    Comon [1]                  0.02   4.5      2.1       4.9   4.9    9.2
    JADE [2]                             0.01   5.1      1.62      1.3   1.3    2.0
    FastICA [3]                          0.00   0.058    1.00      1.9   0.49   0.7

5 Appendix

From (6) one can derive

    Ψ_n(φ) = a_n0 + s_n4 sin 4φ + c_n4 cos 4φ + s_n8 sin 8φ + c_n8 cos 8φ

with

    a_30 := 1/(3!·8) [ 5((C^y_111)² + (C^y_222)²) + 9((C^y_112)² + (C^y_122)²)
                       + 6(C^y_111 C^y_122 + C^y_112 C^y_222) ],

    s_34 := 1/(3!·4) [ 6(C^y_111 C^y_112 − C^y_122 C^y_222) ],

    c_34 := 1/(3!·8) [ 3((C^y_111)² + (C^y_222)²) − 9((C^y_112)² + (C^y_122)²)
                       − 6(C^y_111 C^y_122 + C^y_112 C^y_222) ],

    a_40 := 1/(4!·64) [ 35((C^y_1111)² + (C^y_2222)²) + 16·5((C^y_1112)² + (C^y_1222)²)
                        + 12·5(C^y_1111 C^y_1122 + C^y_1122 C^y_2222) + 36·3(C^y_1122)²
                        + 32·3 C^y_1112 C^y_1222 + 2·3 C^y_1111 C^y_2222 ],

    s_44 := 1/(4!·32) [ 8·7(C^y_1111 C^y_1112 − C^y_1222 C^y_2222)
                        + 48(C^y_1112 C^y_1122 − C^y_1122 C^y_1222)
                        + 8(C^y_1111 C^y_1222 − C^y_1112 C^y_2222) ],

    c_44 := 1/(4!·16) [ 7((C^y_1111)² + (C^y_2222)²) − 16((C^y_1112)² + (C^y_1222)²)
                        − 12(C^y_1111 C^y_1122 + C^y_1122 C^y_2222) − 36(C^y_1122)²
                        − 32 C^y_1112 C^y_1222 − 2 C^y_1111 C^y_2222 ],

    s_48 := 1/(4!·64) [ 8(C^y_1111 C^y_1112 − C^y_1222 C^y_2222)
                        − 48(C^y_1112 C^y_1122 − C^y_1122 C^y_1222)
                        − 8(C^y_1111 C^y_1222 − C^y_1112 C^y_2222) ],

    c_48 := 1/(4!·64) [ ((C^y_1111)² + (C^y_2222)²) − 16((C^y_1112)² + (C^y_1222)²)
                        − 12(C^y_1111 C^y_1122 + C^y_1122 C^y_2222) + 36(C^y_1122)²
                        + 32 C^y_1112 C^y_1222 + 2 C^y_1111 C^y_2222 ],

and s_38 := c_38 := 0. With this it is trivial to determine the constants for Ψ_34(φ) = Ψ_3(φ) + Ψ_4(φ) in the form given in (9). We find:

    A_0 := a_30 + a_40,
    A_4 := sqrt( (c_34 + c_44)² + (s_34 + s_44)² ),
    A_8 := sqrt( c_48² + s_48² ),
    tan φ_4 := −(s_34 + s_44)/(c_34 + c_44),
    tan φ_8 := −s_48/c_48.

The coefficients A_0, A_4, A_8 and φ_4, φ_8 are functions of the third- and fourth-order cumulants of the centered and whitened signals y.

References

1. Comon, P.: Tensor diagonalization, a useful tool in signal processing. In Blanke, M., Soderstrom, T., eds.: IFAC-SYSID, 10th IFAC Symposium on System Identification. Volume 1, Copenhagen, Denmark (1994) 77–82
2. Cardoso, J.-F., Souloumiac, A.: Blind beamforming for non-Gaussian signals. IEE Proceedings-F 140 (1993) 362–370
3. Hyvärinen, A.: Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks 10 (1999) 626–634
4. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. Wiley Series on Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons (2001)
5. Lee, T.-W., Girolami, M., Bell, A., Sejnowski, T.: A unifying framework for independent component analysis. International Journal of Computers and Mathematics with Applications 39 (2000) 1–21
6. Comon, P.: Independent component analysis. In Lacoume, J., ed.: Proc. Int. Sig. Proc. Workshop on Higher-Order Statistics, Chamrousse, France (1991) 29–38
7. Cardoso, J.-F.: High-order contrasts for independent component analysis. Neural Computation 11 (1999) 157–192
8. Hyvärinen, A.: Survey on independent component analysis. Neural Computing Surveys 2 (1999) 94–128
9. Deco, G., Obradovic, D.: An Information-Theoretic Approach to Neural Computing. Springer Series in Perspectives in Neural Computing. Springer (1996)
10. McCullagh, P.: Tensor Methods in Statistics. Monographs on Statistics and Applied Probability. Chapman and Hall (1987)
11. Amari, S., Cichocki, A., Yang, H.: Recurrent neural networks for blind separation of sources. In: Proc. of the Int. Symposium on Nonlinear Theory and its Applications (NOLTA-95), Las Vegas, USA (1995) 37–42
12. John F. Kennedy Library, Boston: Sound excerpts from the speeches of President John F. Kennedy. http://www.jfklibrary.org
13. Pearlmutter, B.: 16 clips sampled from audio CDs. http://sweat.cs.unm.edu/~bap
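A numerical cross-check of the harmonic structure in (9): because empirical cumulants of u = Q^r y transform multilinearly, Ψ_34(φ) computed from data is exactly a trigonometric polynomial containing only the harmonics 0, 4φ, and 8φ (and this holds even without whitening). A sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
y = np.vstack([rng.exponential(1.0, 30000) - 1.0,
               rng.normal(0.0, 1.0, 30000)])

def psi34(z):
    # (1/3!) sum_a (C_aaa)^2 + (1/4!) sum_a (C_aaaa)^2 for zero-mean z.
    m2 = (z ** 2).mean(axis=1)
    c3 = (z ** 3).mean(axis=1)
    c4 = (z ** 4).mean(axis=1) - 3.0 * m2 ** 2
    return (c3 ** 2).sum() / 6.0 + (c4 ** 2).sum() / 24.0

# Sample Psi_34 at 9 angles over one period [0, pi/2) and take a DFT:
# only bins 0, 1, 2 (harmonics 0, 4, and 8 in phi) may be non-zero.
phis = np.linspace(0.0, np.pi / 2, 9, endpoint=False)
vals = [psi34(np.array([[np.cos(p), np.sin(p)],
                        [-np.sin(p), np.cos(p)]]) @ y) for p in phis]
spec = np.abs(np.fft.rfft(vals))
print(spec[3:].max() < 1e-10 * spec[0])   # True: no higher harmonics
```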