FASTGEO - A HISTOGRAM BASED APPROACH TO LINEAR GEOMETRIC ICA

Andreas Jung, Fabian J. Theis, Carlos G. Puntonet, Elmar W. Lang

Institute for Theoretical Physics, University of Regensburg
Institute of Biophysics, University of Regensburg, D-93040 Regensburg, Germany
Dept. of Architecture and Computer Technology, University of Granada, Spain
email: Andreas.Jung@physik.uni-regensburg.de

Thanks to the Deutsche Forschungsgemeinschaft (DFG) for financial support.

ABSTRACT

Guided by the principles of neural geometric ICA, we present a new approach to linear geometric ICA based on histograms rather than basis vectors. Considering that the learning process converges to the medians, and not the maxima, of the underlying distributions restricted to the receptive fields of the corresponding neurons, we observe a considerable improvement in separation quality for different distributions and a sizable reduction in computational cost, by a factor of 100 at least. We further explore the accuracy of the algorithm depending on the number of samples and the choice of the mixing matrix. Finally, we discuss the problem of high dimensions and how it can be treated with geometrical ICA algorithms.

1 INTRODUCTION

Independent component analysis (ICA) describes the process of finding statistically independent data within a given random vector. In blind source separation (BSS) one furthermore assumes that the given vector has been mixed from a fixed set of independent sources. The advantage of applying ICA algorithms to BSS problems, in contrast to correlation-based algorithms, is that ICA tries to make the output signals as independent as possible by also including higher-order statistics. Since the first introduction of the ICA method by Jutten and Herault [1], various algorithms have been proposed to solve the blind source separation problem [2] [3] [4]. Most of them are based on information theory, but recently geometrical algorithms have gained some attention due to their relatively easy implementation. They were first proposed in [5] [6] and have since been used successfully for separating real-world data [7] [8].

As shown in [9], the geometric update step requires the signum function in the following way:

    w_i(t+1) = w_i(t) + η(t) sgn( x(t) − w_i(t) ).    (1)

Then the w_i converge to the medians of the mixture distribution restricted to their receptive fields. Note that the medians do not necessarily coincide with the maxima α_1, α_2 of the mixed density distribution on the sphere, as shown in figure 1. Therefore, in general, any algorithm searching for the maxima of the distribution will not find the medians, which are the images of the unit vectors after mixing [9]. However, with special restrictions on the sources (identical super-gaussian distribution of each component, as for example speech signals), the medians do correspond to the maxima [10].

Fig. 1. Projected density distribution on the sphere of a mixture of two Laplacian signals with different variances, with the mixing matrix mapping the unit vectors e_i to (cos α_i, sin α_i) for i = 1, 2 (dark line = theoretical density function, gray line = sample histogram).
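The setting of figure 1 is easy to reproduce numerically. The following minimal sketch (Python with numpy; the concrete angles, scales and sample count are illustrative assumptions, not values from the paper) mixes two Laplacian sources and histograms the projected angles:

    import numpy as np

    rng = np.random.default_rng(0)

    alpha = np.array([0.3, 1.5])                  # mixing angles alpha_1, alpha_2 in [0, pi)
    A = np.array([np.cos(alpha), np.sin(alpha)])  # columns are (cos alpha_i, sin alpha_i)

    S = rng.laplace(scale=[1.0, 2.0], size=(100_000, 2))  # Laplacian sources, different variances
    X = S @ A.T                                   # mixed vector X = A S

    phi = np.arctan2(X[:, 1], X[:, 0]) % np.pi    # projected variable Z: angle modulo pi
    dens, edges = np.histogram(phi, bins=180, range=(0.0, np.pi), density=True)
    # 'dens' approximates the projected density on the half-circle shown in figure 1.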

2 THEORY

Given an independent random vector S : Ω → R^n, called the source vector, with zero mean and symmetric distribution, where Ω is a fixed probability space, and a quadratic invertible matrix A ∈ R^{n×n}, we call the random variable X := A S the mixed vector. The goal of linear ICA is to recover the sources and the mixing matrix A from the given mixture X. At first we will restrict ourselves to the two-dimensional case, so let n = 2.

So far in geometric ICA, mostly neural algorithms have been applied [5] [6]. The idea is to consider the projected random variable Z := π ∘ X, where

    π : R^2 \ {0} → S^1 → [0, 2π) → [0, π)    (2)

first projects a point onto the unit sphere S^1 := {x ∈ R^2 : |x| = 1}, then takes its angle and finally maps the angle modulo π. Two arbitrary starting values θ_1(0), θ_2(0) ∈ [0, π) are chosen (usually the angles of the two unit vectors e_1 and e_2) and are then subjected to the following update rule:

    θ_i(t+1) = θ_i(t) + η(t) sgn( φ_t − θ_i(t) ),    (3)

where φ_t denotes a sample of the distribution of Z, and i is chosen such that the distance of θ_i(t) to φ_t modulo π is smaller than the distance for any other θ_j; η(t) is the learning-rate parameter, which has to converge to 0. Hence it makes sense to define the receptive field of the neuron θ_i as

    F_i := { φ ∈ [0, π) : θ_i is closest to φ (modulo π) }.    (4)

Note that since we are in two dimensions, the length of F_i is always π/2.

As shown in [9], after convergence the neurons satisfy the geometric convergence condition:

Definition 1. Two angles θ_1, θ_2 ∈ [0, π) satisfy the Geometric Convergence Condition (GCC) if θ_i is the median of Z restricted to its receptive field F_i for i = 1, 2.

Let the mixing matrix A be given as

    A = [ cos α_1   cos α_2
          sin α_1   sin α_2 ].    (5)

Then the vectors (cos α_i, sin α_i) satisfy the GCC; hence we may assume that the above algorithm converges to these two solutions, and therefore we have recovered A and the sources S. The distribution on the sphere together with the angles α_i is depicted in figure 3.

Fig. 2. Example of a two-dimensional scatter plot of a mixture of two Laplacian signals with identical variances. Dash-dotted lines show the borders of the receptive fields.

Fig. 3. Probability density function ρ of Z from figure 2 with the mixing angles α_i and their receptive fields F_i for i = 1, 2.
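The update rule (3) translates directly into code. Below is a minimal sketch of the neural geometric algorithm, assuming angle samples of Z are available (for instance from the projection sketch in section 1); the starting angles and the 1/t learning-rate schedule are illustrative choices:

    import numpy as np

    def angular_diff(a, b):
        """Signed difference of the angles a and b modulo pi, mapped to (-pi/2, pi/2]."""
        return (a - b + np.pi / 2) % np.pi - np.pi / 2

    def neural_geo_ica(phi, eta0=0.1):
        theta = np.array([0.0, np.pi / 2])      # start at the angles of the unit vectors e_1, e_2
        for t, sample in enumerate(phi, start=1):
            diffs = angular_diff(sample, theta)
            i = np.argmin(np.abs(diffs))        # winner neuron: sample lies in its receptive field
            theta[i] = (theta[i] + (eta0 / t) * np.sign(diffs[i])) % np.pi   # update rule (3)
        return theta                            # approximates the mixing angles alpha_i

After convergence the two entries of theta approximate α_1 and α_2 up to permutation, from which the mixing matrix can be rebuilt as in equation (5).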

3 HISTOGRAM BASED ALGORITHM: FASTGEO

Using the geometric convergence condition (GCC), we know that the neurons will converge to the medians in their receptive fields. This enables us to compute these positions directly by a search on the histogram of Z, which reduces the computation time by a factor of about 100 or more. In the FastGeo algorithm we scan through all different receptive fields and test the GCC. In practice this means discretizing the distribution ρ of Z using a given bin size and then testing the different receptive fields. The algorithm is formulated more precisely in the following.

For simplicity let us assume that the cumulative distribution F of Z is invertible, which means that F is nowhere constant. Define a function

    d : [0, π/2) → R,   d(ψ) := θ_1(ψ) + θ_2(ψ) − 2ψ − π,    (6)

where

    θ_i(ψ) := median of Z restricted to [ψ + (i−1)π/2, ψ + iπ/2)
            = F^{−1}( ( F(ψ + (i−1)π/2) + F(ψ + iπ/2) ) / 2 )    (7)

for i = 1, 2.

Lemma 3.1. Let ψ_0 be a zero of d in [0, π/2). Then the θ_i(ψ_0) satisfy the GCC.

Proof. By definition of the receptive fields (4), the boundary between the two receptive fields lies at the midpoint of θ_1(ψ_0) and θ_2(ψ_0) modulo π, so

    [ (θ_1(ψ_0) + θ_2(ψ_0) − π) / 2 , (θ_1(ψ_0) + θ_2(ψ_0)) / 2 )    (8)

is the receptive field of θ_1(ψ_0). Since d(ψ_0) = 0, the starting point of this interval is ψ_0, because

    θ_1(ψ_0) + θ_2(ψ_0) − π = 2ψ_0.    (9)

Hence the receptive field of θ_1(ψ_0) is [ψ_0, ψ_0 + π/2), and by construction θ_1(ψ_0) is the median of Z restricted to this interval. The claim for θ_2(ψ_0) follows analogously.

Algorithm 3.2 (FastGeo). Find the zeros of d. The function d always has at least two zeros, which represent the stable and the unstable fixpoint of the neural algorithm. In practice we extract the fixpoint ψ_0 that gives the proper demixing matrix A^{−1} by picking ψ_0 such that

    ρ(θ_1(ψ_0)) + ρ(θ_2(ψ_0))    (10)

is maximal. For unimodal and super-gaussian source distributions this yields the stable fixpoint (see Conjecture 6.7 in [9]). For sub-gaussian sources, choosing ψ_0 such that

    ρ(θ_1(ψ_0)) + ρ(θ_2(ψ_0))    (11)

is minimal induces the demixing matrix. Alternatively, using the result for the super-gaussian case, the desired angles α_i can for unimodal distributions be read off directly from the fixpoint angles θ_1(ψ_0) and θ_2(ψ_0).
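A possible implementation of Algorithm 3.2 is sketched below (Python with numpy). Instead of an explicit cumulative histogram it computes the medians θ_i(ψ) from the sorted sample array, which is an implementation choice of this sketch; the scan resolution and bin count are likewise illustrative:

    import numpy as np

    def fastgeo(phi, n_scan=900, n_bins=180):
        """FastGeo sketch; phi are angle samples of Z in [0, pi)."""
        ext = np.sort(np.concatenate([phi, phi + np.pi]))  # copy of the samples on [0, 2 pi)
        dens, _ = np.histogram(phi, bins=n_bins, range=(0.0, np.pi), density=True)

        def median_in(a, b):
            """Empirical median of Z restricted to [a, b), cf. equation (7)."""
            lo, hi = np.searchsorted(ext, [a, b])
            return ext[(lo + hi) // 2]

        def rho(t):
            """Histogram estimate of the density of Z at the angle t."""
            return dens[int((t % np.pi) / np.pi * n_bins)]

        candidates, d_prev = [], None
        for psi in np.linspace(0.0, np.pi / 2, n_scan, endpoint=False):
            th1 = median_in(psi, psi + np.pi / 2)
            th2 = median_in(psi + np.pi / 2, psi + np.pi)
            d = th1 + th2 - 2.0 * psi - np.pi              # the function d of equation (6)
            if d_prev is not None and d_prev * d <= 0.0:   # sign change: zero of d
                candidates.append((th1 % np.pi, th2 % np.pi))
            d_prev = d

        # super-gaussian sources: the proper fixpoint maximizes rho(th1) + rho(th2), equation (10)
        a1, a2 = max(candidates, key=lambda c: rho(c[0]) + rho(c[1]))
        A_est = np.array([[np.cos(a1), np.cos(a2)],
                          [np.sin(a1), np.sin(a2)]])       # mixing matrix as in equation (5)
        return (a1, a2), np.linalg.inv(A_est)

For sub-gaussian sources the candidate minimizing rho(th1) + rho(th2) would be selected instead, as in equation (11).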

4 ACCURACY

In this section we consider the dependence of the FastGeo algorithm on the number of samples once the bin size has been fixed. As seen in the previous section, the accuracy of the histogram-based algorithm then only depends on the distribution of the samples of X, respectively Z; i.e., we can estimate the error made by approximating the mixing matrix A from a finite number of samples. In the following we give some results of test runs made with this algorithm.

When choosing two arbitrary angles α_i ∈ [0, π), i = 1, 2, for the mixing matrix A, we define Δα as the distance between these two angles modulo π. This gives an angle in the range between 0 and π/2, i.e. between 0° and 90°.

First let us consider the accuracy of the recovered angles α̂_i, measured by the deviation Δα̂ of the recovered angle difference from the true difference Δα, when varying Δα for a fixed number of samples. Choosing a mixture of two Laplacian source signals with identical variances, figure 4 shows a nearly linear decrease of the error Δα̂ with decreasing Δα. The 95% confidence interval decreases similarly to the standard deviation, i.e. both parameters provide a good estimate for the error. Note that the mean of the error Δα̂ is very close to zero.

Fig. 4. Mixture of two Laplacian source signals with identical variances. Plotted are the mean, the standard deviation and the 95% confidence interval of Δα̂, calculated from repeated runs for each angle Δα.

In order to show more precisely that Δα̂ is proportional to Δα, we have plotted Δα̂/Δα versus Δα in figure 5, which should result in a horizontal line. The estimate is indeed good for a wide range of Δα and gets only slightly worse for small values.

Fig. 5. Same mixture as in figure 4, plotting Δα̂/Δα versus Δα.

Varying the number of samples used for estimating the angles α_i shows that with an increasing number of samples the estimate improves, as shown in figure 6. This fact is well known in statistics and signal processing.

Fig. 6. Mixture of two Laplacian source signals with identical variances for different numbers of samples. Plotted is the standard deviation of Δα̂/Δα, calculated from repeated runs for each angle Δα.

To get a relation between the error Δα̂ and the number of samples N, we plot, for three different angles Δα, the standard deviation of Δα̂ versus the number of samples, see figure 7. From the slope of the curve we see that the standard deviation of Δα̂ is roughly proportional to N^{−1/2}.

Fig. 7. Dependence of the standard deviation of Δα̂ on the number of samples, for three different angles Δα = 10°, 45°, 90°. Note the logarithmic scale on both axes.

The above results are summarized numerically in table 1 for a fixed mixing matrix A. Given is the standard deviation of the non-diagonal terms after normalizing each column of the mixing matrix so that the diagonal elements are unity. For comparison we also calculated the performance index E_1 proposed by Amari [11],

    E_1 = Σ_{i=1}^{n} ( Σ_{j=1}^{n} |p_ij| / max_k |p_ik| − 1 )
        + Σ_{j=1}^{n} ( Σ_{i=1}^{n} |p_ij| / max_k |p_kj| − 1 ),    (15)

where P = (p_ij) := Â^{−1} A for the estimated mixing matrix Â (a code sketch follows at the end of this section).

Table 1. Standard deviations of the non-diagonal terms and the performance index E_1 for different numbers of samples.

In statistics, an often-used technique for estimating the error made by approximating a probability density function (pdf) from a finite number of samples is the confidence interval: the interval around the estimate x̂ such that the probability that the real value x lies outside it is less than a fixed error probability ε,

    P( x ∉ [x̂ − δ, x̂ + δ] ) ≤ ε.    (16)

For estimating the median of a pdf we refer to [12]: let x_1, ..., x_N be independent identically distributed (iid) samples of the random variable X, ordered such that x_1 ≤ ... ≤ x_N. The estimated median of the samples is defined as x_{(N+1)/2} if N is odd, or as (x_{N/2} + x_{N/2+1})/2 if N is even. For large N, let Φ^{−1} be the inverse of the cumulative standard normal distribution and

    h := ⌊ ( N − Φ^{−1}(1 − ε/2) √N ) / 2 ⌋.    (17)

Then the confidence interval of the median is given by

    [ x_{h+1}, x_{N−h} ].    (18)

In our case, an approximate confidence interval can be calculated by first running the algorithm, thus obtaining approximate angles α̂_i, and then applying the above equations to the samples of Z restricted to the corresponding receptive fields F_i (a code sketch follows below).
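The performance index (15) is straightforward to compute. A minimal sketch (Python with numpy; the function and argument names are illustrative):

    import numpy as np

    def amari_index(W, A):
        """Amari performance index E_1 of equation (15) for P = W A, e.g. W = A_est^(-1)."""
        P = np.abs(W @ A)
        rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0   # first sum in (15)
        cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0   # second sum in (15)
        return rows.sum() + cols.sum()    # zero exactly when P is a scaled permutation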

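The confidence interval (17)-(18) for an empirical median also translates directly into code. A sketch, assuming scipy's norm.ppf as the standard normal quantile Φ^{−1}:

    import numpy as np
    from scipy.stats import norm

    def median_confidence_interval(x, eps=0.05):
        """Order-statistic confidence interval [x_(h+1), x_(N-h)] of equations (17)-(18)."""
        x = np.sort(np.asarray(x))
        N = len(x)
        h = max(int((N - norm.ppf(1.0 - eps / 2.0) * np.sqrt(N)) // 2), 0)   # equation (17)
        return x[h], x[N - h - 1]    # 0-based indices for the 1-based ranks h+1 and N-h

Applied to the samples of Z inside one receptive field F_i, this yields an approximate confidence interval for the corresponding recovered angle.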
5 HIGHER DIMENSIONS

In the above sections we have explicitly used two-dimensional data. In real-world problems, however, the mixed data is usually high-dimensional (for example EEG data with many channels). Therefore it would be desirable to generalize geometric algorithms to higher dimensions. The neural geometric algorithm can easily be translated to higher-dimensional cases, but a serious problem occurs in the explicit calculation: in order to approximate higher-dimensional pdfs, an exponentially growing number of samples is necessary, as will be shown in the following.

The number of samples in a ball B_ε of radius ε on the unit sphere S^{n−1} ⊂ R^n, divided by the number of samples on the whole S^{n−1}, can be calculated as follows if we assume a uniformly distributed random vector. Let B_ε := {x ∈ R^{n−1} : |x| ≤ ε} (a small cap on S^{n−1} is flat to first order) and S^{n−1} := {x ∈ R^n : |x| = 1}. Referring to [13], the volume of B_ε can be calculated as

    vol(B_ε) = π^{(n−1)/2} ε^{n−1} / Γ( (n−1)/2 + 1 ).    (19)

It follows for ε → 0 that

    (number of samples in B_ε) / (number of samples on S^{n−1}) ≈ vol(B_ε) / vol(S^{n−1}) ∝ ε^{n−1}.    (20)

Obviously the number of samples in the ball decreases like ε^{n−1} for ε → 0, which is the interesting case. To retain the same accuracy when estimating the medians, this decrease must be compensated by an exponential growth of the number of samples (a numerical sketch follows at the end of this section). In three dimensions we still found a good approximation of the demixing matrix; in four dimensions the mixing matrix could no longer be reconstructed correctly, even with a substantially larger number of samples.

A different approach for higher dimensions has been taken in [14] and [8], where A has been calculated by using projections of X from R^n onto R^2 along the different coordinate axes and reconstructing the multidimensional matrix from the two-dimensional solutions. However, this approach only works satisfactorily if the mixing matrix A is close to the unit matrix up to permutation and scaling. Otherwise, even in three dimensions, the projection approach will not give the desired results, as can be seen in figure 8, where the mixing matrix was chosen far from any scaled permutation matrix.

Fig. 8. Projection of a three-dimensional mixture of Laplacian signals onto the three coordinate planes. Note that the projection onto the x_1-x_2 plane does not show two distinct lines, which are needed for the geometric algorithms.
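The sample-fraction argument of equations (19) and (20) can be checked numerically. A small sketch (Python; the radius and the dimensions are illustrative):

    import math

    def ball_volume(d, eps):
        """Volume of a d-dimensional ball of radius eps, equation (19)."""
        return math.pi ** (d / 2) * eps ** d / math.gamma(d / 2 + 1)

    eps = 0.1
    for n in (2, 3, 4, 5, 10):
        # fraction of uniform samples near one median scales like eps^(n-1), equation (20)
        print(f"n = {n:2d}:  eps^(n-1) = {eps ** (n - 1):.1e},  vol(B_eps) = {ball_volume(n - 1, eps):.1e}")

Keeping the absolute number of samples near each median constant therefore requires a total sample count growing like eps^{-(n-1)}, i.e. exponentially in the dimension n.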
6 CONCLUSION

We have presented a new algorithm for linear geometric ICA based on histograms, which is both stable and considerably more efficient than the neural geometric ICA algorithm. The accuracy of the algorithm in estimating the relevant medians of the underlying data distributions has been explored quantitatively, varying both the mixing matrices and the sample numbers, and shows good performance. We also considered the problem of high-dimensional data sets in geometric algorithms and discussed how projections onto low-dimensional subspaces can solve this problem for a special class of mixing matrices.

Simulations with non-symmetrical and non-unimodal distributions have shown promising results so far, indicating that the new algorithm will perform equally well for almost any distribution; this is the subject of ongoing research in our group. In the future, the histogram-based algorithm could also be extended to the non-linear case similarly to [8], using multiple centered spheres for the projection, on whose surfaces the projected data histograms could then be evaluated.

7 ACKNOWLEDGMENTS

This research was supported by the Deutsche Forschungsgemeinschaft (DFG) in the Graduiertenkolleg "Nonlinearity and Nonequilibrium in Condensed Matter". We would also like to thank Christoph Bauer and Tobias Westenhuber for suggestions and comments on the neural algorithm.

8 REFERENCES

[1] C. Jutten and J. Herault, "Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture," Signal Processing, vol. 24, pp. 1-10, 1991.

[2] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287-314, 1994.

[3] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, pp. 1129-1159, 1995.

[4] J.-F. Cardoso, "Blind signal separation: statistical principles," Proceedings of the IEEE, vol. 86, pp. 2009-2025, 1998.

[5] C. G. Puntonet and A. Prieto, "An adaptive geometrical procedure for blind separation of sources," Neural Processing Letters, vol. 2, 1995.

[6] C. G. Puntonet and A. Prieto, "Neural net approach for blind separation of sources based on geometric properties," Neurocomputing, vol. 18, pp. 141-164, 1998.

[7] Ch. Bauer, M. Habl, E. W. Lang, C. G. Puntonet, and M. R. Alvarez, "Probabilistic and geometric ICA applied to the separation of EEG signals," in M. H. Hamza, ed., Signal Processing and Communication (Proc. SPC 2000), IASTED/ACTA Press, Anaheim, USA, pp. 339-346, 2000.

[8] C. G. Puntonet, Ch. Bauer, E. W. Lang, M. R. Alvarez, and B. Prieto, "Adaptive-geometric methods: application to the separation of EEG signals," in P. Pajunen and J. Karhunen, eds., Independent Component Analysis and Blind Signal Separation (Proc. ICA 2000), pp. 273-277, 2000.

[9] F. J. Theis, A. Jung, and E. W. Lang, "A theoretic model for linear geometric ICA," preprint, 2001.

[10] A. Prieto, B. Prieto, C. G. Puntonet, A. Canas, and P. Martin-Smith, "Geometric separation of linear mixtures of sources: application to speech signals," in J.-F. Cardoso, Ch. Jutten, and Ph. Loubaton, eds., Independent Component Analysis and Signal Separation (Proc. ICA'99), pp. 295-300, 1999.

[11] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," Advances in Neural Information Processing Systems, vol. 8, pp. 757-763, 1996.

[12] K. Bosch, Statistik-Taschenbuch, Oldenbourg Verlag, 1993.

[13] R. K. Pathria, Statistical Mechanics, Butterworth-Heinemann, 1998.

[14] Ch. Bauer, C. G. Puntonet, M. Rodriguez-Alvarez, and E. W. Lang, "Separation of EEG signals with geometric procedures," in C. Fyfe, ed., Engineering of Intelligent Systems (Proc. EIS 2000), 2000.