Independent Component Analysis (ICA)

Independent Component Analysis (ICA)
Université catholique de Louvain (Belgium), Machine Learning Group, http://www.dice.ucl.ac.be/mlg/

Overview
- Uncorrelation vs independence
- Blind source separation & the cocktail-party problem
- Equations, indeterminations & assumptions
- Pre-whitening step
- Some examples
- The Gaussian case
- Objective functions: how to recover independent components?
  - Non-Gaussianity approach & central limit theorem
  - Minimum dependence approach
- Real-world examples
- Extensions

What is ICA?
- PCA: finding a transformation that decorrelates the variables
- ICA: finding a transformation that makes the variables as independent as possible
- In this lecture: the transformation is constrained to be linear and instantaneous

Independence is stronger than uncorrelation
- Uncorrelation between x and y: E[xy] = E[x]E[y]
- Independence between x and y: E[f(x)g(y)] = E[f(x)]E[g(y)] for any non-linear functions f and g
  (x does not carry any information about y)
- In other words:
  - correlation measures the existence of a linear relation between variables
  - dependence measures the existence of any relation between variables

Uncorrelation vs independence: example
- Let u be a random variable with uniform distribution on [-√3, √3], so that E[u] = 0 and E[u²] = 1
- Let v = u²: then E[uv] = E[u³] = 0 and E[u]E[v] = 0
  → u and v are uncorrelated (no linear relation between u and v)
- Let f(u) = u² and g(v) = v: E[f(u)g(v)] = E[u⁴] ≠ E[f(u)]E[g(v)] = E[u²]E[u²]
  → u and v are dependent (a link exists between u and v)

PCA vs ICA?
- PCA (whitening): maximum variance projection → NO independence!
  (remark: whiteness is conserved for any rotation)
- ICA: minimum dependence directions (one cannot say anything about y knowing x)
  (remark: independence remains only for kπ/2 rotations, k in Z, i.e. up to permutation and sign!)
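
A minimal numerical sketch of the example above (assuming the uniform support [-√3, √3] that gives unit variance; the sample size and seed are arbitrary choices):

```python
# Hedged numerical check: u and v = u^2 are uncorrelated but not independent.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(-np.sqrt(3), np.sqrt(3), size=1_000_000)   # E[u]=0, E[u^2]=1
v = u ** 2

print(np.mean(u * v) - np.mean(u) * np.mean(v))        # ~0  -> uncorrelated
print(np.mean(u**2 * v) - np.mean(u**2) * np.mean(v))  # !=0 -> dependent (f(u)=u^2, g(v)=v)
```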

PCA vs ICA?
[Figure: scatter plots of the sources s, the mixtures x obtained with a mixing matrix A, the PCA outputs and the ICA outputs.]

Independent Component Analysis (ICA)
- the «source separation» or «cocktail-party» problem
- aims: to separate signals, using an independence criterion instead of variance maximisation (PCA)

Blind source separation
- Sources S (UNKNOWN) → mixing matrix A (UNKNOWN) → Mixtures X (KNOWN) → unmixing matrix W (TO ESTIMATE) → Outputs Y

Method?
- Under several assumptions: Y = Estim(S) = W_ICA X
- Cocktail-party hypotheses: linear and additive mixing, no phase delay, signals rather than data

Why ICA rather than PCA?
- Uncorrelation ≠ independence
- If W is a whitening matrix, then UW s.t. UU^T = I is also a whitening matrix
  → the whitening matrix is highly non-unique (up to any rotation matrix!)
- W_ICA is unique, up to indeterminations

The problem in equations
- Notations:
  - independent signals (unknown): s(t) = [s_1(t), ..., s_n(t)]^T
  - measured signals: x(t) = [x_1(t), ..., x_n(t)]^T
  - linear mixing: x(t) = A s(t)
- The problem: to estimate W = A⁻¹ ... but A is unknown!
  so that y = Wx = WAs will be an estimate of the sources: y = ŝ

Independence hypothesis
- A is unknown → we cannot compute W = A⁻¹
- This lack of information is compensated by the independence hypothesis

Solution - indeterminations
- Solution: we measure the independence of the signals y_i(t); when this independence is maximum, y(t) estimates s(t)
- Indeterminations:
  - order of the signals (independence is symmetric)
  - multiplying factor on each signal: x_i(t) = Σ_{j=1}^n a_ij s_j(t) = Σ_{j=1}^n (a_ij / α_j) (α_j s_j(t)) for any non-zero constants α_j
- Hence the solution has the form W = P D A⁻¹, with P a permutation matrix and D a diagonal matrix (non-zero coefficients)
- A calibration could be necessary; these indeterminations are of low importance in applications
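
A small sketch of the mixing model and of the permutation/scaling indeterminacy described above (the sources, the mixing matrix and the scaling values are made up for illustration, not taken from the lecture):

```python
# Illustrative sketch: x = A s, and any W = P D A^-1 recovers the sources
# up to permutation and scaling.
import numpy as np

rng = np.random.default_rng(1)
n, T = 2, 10_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, T))   # independent unit-variance sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                               # hypothetical mixing matrix
x = A @ s                                                # observed mixtures

P = np.array([[0.0, 1.0], [1.0, 0.0]])                   # permutation
D = np.diag([2.0, -0.5])                                 # scaling (and sign) factors
W = P @ D @ np.linalg.inv(A)                             # a valid "solution"
y = W @ x                                                # = P D s: sources up to order and scale

print(np.allclose(y, P @ D @ s))                         # True
```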

Solution - assumptions
- The source signals are mutually independent
- Since the magnitude of the s_i cannot be known, it is fixed s.t. E[s_i s_i^T] = 1; hence it is supposed that E[ss^T] = I
- The mixing matrix is supposed to be constant in time

Whitening: a preprocessing to ICA?
[Figure: sources s, centered observations x = As, whitened signals z = Vx = VAs, output signals y = Wz.]

Whitening: a preprocessing to ICA?
- Why unmix z (whitened signals) instead of x?
- If z is white, then VA is orthogonal: E[zz^T] = (VA) E[ss^T] (VA)^T = I, since E[ss^T] = I
- If VA is orthogonal, W reduces to an orthogonal matrix: E[yy^T] = W E[zz^T] W^T = I
- Hence: only n(n-1)/2 instead of n² parameters have to be estimated

Separation of uniform signals
[Figure: scatter plots of the sources and of two valid estimations: one with a permutation of s1-s2 and a double sign inversion, the other with an inversion of s1 and no permutation.]
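
A compact sketch of the whitening step described above, using an eigendecomposition of the sample covariance (the variable names V and z follow the slides; the data and mixing matrix are synthetic assumptions):

```python
# Whitening sketch: z = V x has (approximately) identity covariance,
# so ICA only has to find a remaining rotation.
import numpy as np

rng = np.random.default_rng(2)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 50_000))   # independent unit-variance sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s
x = x - x.mean(axis=1, keepdims=True)                        # centering

C = np.cov(x)                                                # sample covariance of the mixtures
d, E = np.linalg.eigh(C)                                     # C = E diag(d) E^T
V = E @ np.diag(d ** -0.5) @ E.T                             # whitening matrix
z = V @ x

print(np.round(np.cov(z), 3))                                # ~ identity
print(np.round((V @ A) @ (V @ A).T, 3))                      # VA is (approximately) orthogonal
```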

Uncorrelation and independence [1]
[Figure: sources, mixtures, whitened signals and ICA outputs (FastICA).]

Uncorrelation and independence [2]
[Figure: sources, mixtures, whitened signals and ICA outputs (SWICA); F. Vrins.]
- Warning: the mean and variance of the original images are important!

The Gaussian case
- If the sources have a Gaussian distribution:
  - the temporal structure looks (but isn't!) similar to a uniform random signal
  - the scatter plot is very different
[Figure: Gaussian sources shown as time series and as a scatter plot.]
- φ(x) = 1/(√(2π)σ) exp(-(x-µ)²/(2σ²)) (fully described by mean and variance)

The Gaussian case
- If the sources have a Gaussian distribution, independence is equivalent to uncorrelation!
[Figure: mixtures, maximum-variance projection and scaling (whitening).]
- A rotation after whitening does not change anything!
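
A brief sketch of why Gaussian sources defeat ICA, as stated above: after whitening, any rotation of Gaussian data leaves the covariance (and in fact the whole joint density) unchanged, so nothing singles out the original axes. The rotation angle below is an arbitrary choice:

```python
# Rotating whitened Gaussian data changes neither the covariance nor the marginal kurtoses:
# no contrast is left to identify the original sources.
import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal((2, 100_000))           # whitened Gaussian "sources"
theta = 0.7                                     # any rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
y = R @ z

kurt = lambda v: np.mean(v**4) - 3 * np.mean(v**2)**2
print(np.round(np.cov(y), 3))                      # still ~ identity
print(round(kurt(z[0]), 3), round(kurt(y[0]), 3))  # both ~ 0
```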

The Gaussian case (continued)
[Figure: Gaussian sources, their superimposition and white mixtures.]
- How to find the rotation corresponding to the original sources (up to permutation/scale)?
- Other information is needed (temporal structure, etc.) to separate Gaussian sources.

Main tool for ICA: independence
- Discrete case: P(A, B) = P(A) P(B), i.e. P(A | B) = P(A)
- Continuous case: f_x(x) = Π_{i=1}^n f_{x_i}(x_i)
- The problem: to measure the dependence between signals and to minimize it

ICA objective functions: Y = WX = WAS
- Non-Gaussianity approach
  - by the central limit theorem, the PDF of a sum of n independent random variables converges to a Gaussian
  - → measure of non-Gaussianity: find W such that the output PDFs are as different as possible from the Gaussian
  - one output signal at a time
- Independence approach
  - find independence measures between signals
  - estimation of the PDF or of an independence criterion
  - all output signals together

Gaussianity and CLT
[Figure: Central Limit Theorem illustrated with sums of uniform variables, for increasing n.]
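
A quick numerical illustration of the CLT argument above (sample sizes and the kurtosis measure are my own choices): sums of independent uniform variables become increasingly Gaussian, so a mixture of independent sources is "more Gaussian" than the sources themselves.

```python
# Kurtosis of normalized sums of n independent uniform variables approaches 0 (the Gaussian value).
import numpy as np

rng = np.random.default_rng(4)
kurt = lambda v: np.mean(v**4) - 3 * np.mean(v**2)**2

for n in (1, 2, 4, 16, 64):
    u = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 200_000))
    s = u.sum(axis=0) / np.sqrt(n)              # normalized to unit variance
    print(n, round(kurt(s), 3))                 # -1.2 for n=1, tending to 0 as n grows
```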

Non-Gaussianity approach
- Minimum differential entropy: Gaussians have maximum differential entropy
- Maximum negentropy: equivalent to differential entropy
- Maximum positive transform of kurtosis: Gaussians have kurtosis = 0
- Gram-Charlier expansion: measures the difference between the output pdf and the Gaussian pdf

Entropy [1/2]
- Discrete case: H(x) = -Σ_{i=1}^K p_i log(p_i)
  - H(x) = 0 (minimum) if p_i = 1 and p_j = 0 for j ≠ i
  - H(x) = log(K) (maximum) if p_i = 1/K
- Continuous case (differential entropy): h(x) = -∫ f(u) log f(u) du
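
A tiny sketch of the discrete entropy formula above, checking the two extreme cases for K = 4 outcomes (the natural logarithm is an arbitrary base choice):

```python
# Discrete entropy H(p) = -sum_i p_i log p_i: minimum 0 for a degenerate distribution,
# maximum log(K) for the uniform distribution over K outcomes.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return -np.sum(p * np.log(p))

print(entropy([1.0, 0.0, 0.0, 0.0]))          # 0.0 (minimum)
print(entropy([0.25, 0.25, 0.25, 0.25]))      # log(4) ~ 1.386 (maximum)
```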

Entropy [2/2]
- Continuous case (continued): the maximum differential entropy (for variance = σ²) is reached by the Gaussian: h_G(x) = ½ log(2πeσ²)
- Differential entropy is invariant to orthogonal transforms
- Minimizing h(x) makes the PDF of x far from the Gaussian
- → Find W s.t. the output entropies are low (the x_i are unit-variance)

Negentropy
- Negentropy: difference wrt the entropy of a Gaussian: J(x) = h(x_G) - h(x)
- Multi-dimensional case: J(x) = ∫ f_x(u) log( f_x(u) / f_{x_G}(u) ) du
- → Find W s.t. J(x) is maximum (the x_i are unit-variance)

Kurtosis: intuitive considerations
- Definition of kurtosis: κ₄(x) = E[x⁴] - 3(E[x²])²
- Interesting properties:
  - for a Gaussian PDF: κ₄(x_G) = 0
  - for most non-Gaussian PDFs: κ₄(x) ≠ 0
- → Find W s.t. Σ_{i=1}^m |κ₄(x_i)| is maximum (the x_i are unit-variance)

Kurtosis: illustration
[Figure: examples of PDFs with negative, zero and positive kurtosis.]
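
A short sketch of the kurtosis definition above, evaluated on three standard unit-variance distributions (the sample size is arbitrary): sub-Gaussian (uniform), Gaussian, and super-Gaussian (Laplacian).

```python
# Empirical kurtosis kappa_4(x) = E[x^4] - 3 (E[x^2])^2 for unit-variance samples.
import numpy as np

rng = np.random.default_rng(5)
kurt = lambda v: np.mean(v**4) - 3 * np.mean(v**2)**2

uniform = rng.uniform(-np.sqrt(3), np.sqrt(3), 500_000)       # variance 1
gauss   = rng.standard_normal(500_000)
laplace = rng.laplace(scale=1/np.sqrt(2), size=500_000)       # variance 1

print(round(kurt(uniform), 2))   # ~ -1.2 (sub-Gaussian)
print(round(kurt(gauss), 2))     # ~  0.0
print(round(kurt(laplace), 2))   # ~ +3.0 (super-Gaussian)
```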

Gram-Charlier expansion
- A Taylor expansion approximates a function f around f(x₀)
- The Gram-Charlier expansion approximates a PDF p_x around the Gaussian function φ
- Truncated at fourth order: p_x(ξ) ≈ φ(ξ) [ 1 + κ₃(x) H₃(ξ)/3! + κ₄(x) H₄(ξ)/4! ]
  where the bracketed correction is the «non-Gaussian part» of p_x

Minimum dependence approach
- Minimum mutual information
- Minimum sum of marginal entropies
- Minimum positive transform of the cross-cumulants

Mutual information and marginal entropies [1/2]
- Mutual information (MI): I(x) = ∫ f_x(u) log( f_x(u) / Π_{i=1}^n f_{x_i}(u_i) ) du
- I(x) = 0 iff all the x_i are independent
- MI and the sum of the outputs' marginal entropies:
  I(x) = Σ_{i=1}^m h(x_i) - h(x) = Σ_{i=1}^m h(x_i) - h(Wz) = Σ_{i=1}^m h(x_i) - h(z) - log|det(W)|, where log|det(W)| = 0 since WW^T = I

Mutual information and marginal entropies [2/2]
- Mutual information: difficult to estimate (joint pdf of x), high computational cost → find W s.t. I(x) is minimum
- Sum of the outputs' marginal entropies: better than MI because no estimation of the joint PDF is needed → find W s.t. WW^T = I minimizing Σ_{i=1}^m h(x_i)
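
A rough sketch of the "sum of marginal entropies" contrast above, using simple histogram entropy estimates (the bin count, sample size and rotation angle are arbitrary choices): for whitened data, the independent directions give a lower sum of marginal entropies than a mixing rotation.

```python
# Histogram-based marginal entropy estimate, comparing independent sources
# with an orthogonally mixed version: the mixture has a larger sum of marginal entropies.
import numpy as np

rng = np.random.default_rng(6)

def marg_entropy(v, bins=100):
    p, edges = np.histogram(v, bins=bins, density=True)
    w = np.diff(edges)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * w[mask])   # crude Riemann-sum estimate

s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 200_000))   # independent, white sources
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
y = R @ s                                                      # rotated (mixed) version

print(round(sum(marg_entropy(c) for c in s), 3))   # smaller: independent directions
print(round(sum(marg_entropy(c) for c in y), 3))   # larger: mixed directions
```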

Moments and cumulants
- The probability density function carries the properties of the distribution (mean, variance, ...)
- Moments:
  - order-r moment: µ'_r(x) = E[x^r]
  - centered order-r moment: µ_r(x) = E[(x - E[x])^r]

Independence, PCA and ICA
- Whitening: diagonalization of the covariance matrix (+ scaling); only moments of order ≤ 2 are taken into account
- ICA: diagonalization of a higher-order cumulant tensor (a hyper-matrix with four indices i, j, k, l): like a higher-order covariance! In order to go further than (linear) decorrelation!
- Independence: one should know all the cross-cumulants and then make them equal to zero!

The Gaussian case (cont'd)
- Gaussian distribution: perfectly defined by the mean/variance of the variable! All cumulants of order > 2 are strictly zero!
- PCA: the data are described by covariance matrices; only moments of order ≤ 2 are taken into account
- ICA = make the higher-order statistics zero; but this is already the case for a Gaussian PDF (see the Gram-Charlier expansion)!
- → Decorrelation = independence for Gaussian variables!
- The decorrelation transform is defined only up to a rotation → too many indeterminations for the BSS problem → other information is needed (temporal structure, frequency, ...)

Theory and practice
- In theory, independence measures require the knowledge of the PDFs (to compute mutual information or entropies, ...). In practice, those PDFs are unknown.
- Two possibilities: density estimation (a difficult task), or estimating the independence measures directly
- Example of independence approximation:
  - independence = all higher-order cross-cumulants must be zero
  - approximation of independence = the cross-covariances and cross-kurtoses must be zero

Dependence minimization
- How to maximize independence or non-Gaussianity? For example, through cumulants or negentropy
- Often a pre-whitening step is useful: the ICA problem reduces to finding a rotation matrix
  - (+) the number of elements to estimate reduces from n² to n(n-1)/2
  - (-) computational problems and errors (if PCA failed)
- Objective functions (OF) to minimize: estimations or approximations of non-Gaussianity and independence measures; local minima?
- Algorithms to minimize the OF: neural (gradient-based, ...) and algebraic methods, specific algorithms for specific problems

Local vs global criterion
[Figure: source scatter plot and a dependence criterion I(y) plotted against the rotation angle θ, for y = Ws with W = [[cos θ, sin θ], [-sin θ, cos θ]]; comparison of a «local criterion» and a «global criterion».]
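
A minimal sketch of the "ICA = find a rotation after whitening" idea above, sweeping the angle θ and evaluating a kurtosis-based contrast (the contrast, the grid and the sources are my own choices; the lecture's criterion I(y) may differ):

```python
# After whitening, sweep the rotation angle and evaluate sum_i |kappa_4(y_i)|:
# the contrast is maximal at the separating angle, modulo the pi/2 indeterminations.
import numpy as np

rng = np.random.default_rng(7)
kurt = lambda v: np.mean(v**4) - 3 * np.mean(v**2)**2

s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 100_000))   # independent white sources
mix_angle = 0.5                                               # "unknown" mixing rotation
M = np.array([[np.cos(mix_angle), -np.sin(mix_angle)],
              [np.sin(mix_angle),  np.cos(mix_angle)]])
z = M @ s                                                     # already-white mixtures

thetas = np.linspace(0, np.pi, 181)
contrast = []
for t in thetas:
    W = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
    y = W @ z
    contrast.append(abs(kurt(y[0])) + abs(kurt(y[1])))

best = thetas[int(np.argmax(contrast))]
print(round(best, 3))        # ~ 0.5 (up to k*pi/2): the mixing rotation is undone
```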

Image preprocessing
[Figure: image preprocessing example.]

Signal separation
[Figure: examples 1 and 2; mixtures x1, x2 and estimated outputs y1, y2, recovering the sources up to sign (-s1, -s2).]

Biomedical application: FECG extraction
- Whitening/ICA applied to signals recorded on a pregnant woman's abdomen
[Figure: recorded signals and extracted source signals, separating the maternal ECG from the fetal ECG; F. Vrins.]

Mixture of digital sources (PCA ≠ ICA)
[Figure: separation results with PCA vs ICA.]

Independent subimages
- A.J. Bell, T.J. Sejnowski, "Edges are the Independent Components of natural scenes", NIPS 96, pp. 831-836, MIT Press, 1996.

Handsfree phone in a car
- N. Charkani El Hassani, "Séparation auto-adaptative de sources pour des mélanges convolutifs. Application à la téléphonie mains-libres dans les voitures", PhD thesis, INP Grenoble, 1996.

Multiple RF tags
- Y. Deville, J. Damour, N. Charkani, "Improved multi-tag radio-frequency identification systems based on new source separation neural networks", Proc. of ICA'99, Aussois (France), January 1999, pp. 449-44.

Financial time series
- ICA reconstruction (4 ICs)
- A.D. Back, A.S. Weigend, "A First Application of Independent Component Analysis to Extracting Structure from Stock Returns", International Journal of Neural Systems, Vol. 8, October 1997.

Cocktail party
- Speech-music separation: observations → estimations
- Speech-speech separation: observations → estimations
- T.-W. Lee, Institute for Neural Computation, University of California (San Diego), http://www.cnl.salk.edu/~tewon/blind/blind_audio.html

Problem extensions [1/3]
- Basic model: x_i(t) = Σ_{j=1}^m a_ij s_j(t), i = 1, ..., n, with n measures and m sources
- Extensions:
  - n > m: first m is estimated, then a PCA stage (dimension reduction from n to m) gives m signals, then source separation (ICA); see the sketch below
  - n < m: only the n most powerful sources are estimated; the result is corrupted by the m - n other sources; other techniques exist, e.g. using sparsity, ...
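
A small sketch of the n > m extension above: a PCA stage reduces n mixtures to the m strongest components before ICA (the dimensions, mixing matrix and SVD-based reduction are illustrative assumptions, not the lecture's implementation):

```python
# PCA dimension reduction from n=4 mixtures to m=2 signals, as a preprocessing before ICA.
import numpy as np

rng = np.random.default_rng(8)
m, n, T = 2, 4, 50_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(m, T))   # m independent sources
A = rng.standard_normal((n, m))                          # tall mixing matrix (n > m)
x = A @ s                                                # n observed mixtures
x = x - x.mean(axis=1, keepdims=True)

U, sv, _ = np.linalg.svd(x, full_matrices=False)         # principal directions of the mixtures
x_red = U[:, :m].T @ x                                   # keep the m strongest components

print(x_red.shape)                                       # (2, 50000): ready for a square ICA
print(np.round(sv, 2))                                   # only m singular values are non-negligible
```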

Problem extensions [2/3]
- Extensions (cont'd): noisy observations: x_i(t) = Σ_{j=1}^m a_ij s_j(t) + n_i(t), i = 1, ..., n, i.e. x(t) = As(t) + n(t)
- Even if W is estimated perfectly: y(t) = Wx(t) = WAs(t) + Wn(t)
- If n >> m: specific algorithms (projection onto the signal subspace)
- Ill-conditioned mixings (the rows of A are similar): specific algorithms

Problem extensions [3/3]
- Extensions (cont'd): more complex mixtures (filtering) → specific algorithms
  - convolutive mixtures: x(t) = A(t) * s(t), i.e. x_i(t) = Σ_{j=1}^n Σ_{k=0}^{p-1} a_ij(k) s_j(t-k), i = 1, ..., n
  - post non-linear mixtures: x_i(t) = f_i( Σ_{j=1}^n a_ij s_j(t) ), i = 1, ..., n
  - non-linear mixtures: x_i(t) = f_i( s_1(t), ..., s_n(t) ), i = 1, ..., n

Sources and References
Some ideas and figures contained in these slides come from:
- C. Jutten & J. Hérault, "Blind separation of sources, Part I", Signal Processing, vol. 24, pp. 1-10, 1991.
- F. Vrins, J. A. Lee, V. Vigneron and C. Jutten, "Improving Independent Component Analysis Performances by Variable Selection", IEEE NNSP'03, pp. 359-368, September 17-19, 2003, Toulouse (France).
- A. Paraschiv-Ionescu et al., "High performance magnetic field smart sensor arrays with source separation", Proc. 1st Int. Conf. on Modeling and Simulation of Microsystems (MSM98), Santa Clara (USA), April 6-8, 1998.
- Cover and Thomas, "Elements of Information Theory", Wiley and Sons, New York.
- A. Hyvärinen, J. Karhunen and E. Oja, "Independent Component Analysis", Wiley series on adaptive and learning systems for signal processing, communications and control, S. Haykin (ed.), 2001.

Thanks to Frédéric Vrins for many slides!