Independent Component Analysis and Its Applications. By Qing Xue, 10/15/2004


Outline: Motivation of ICA; Applications of ICA; Principles of ICA estimation; Algorithms for ICA; Extensions of the basic ICA framework.

Motivation of ICA Cocktail-party problem

Motivation of ICA

Motivation of ICA The problem: how to separate the voice from the music using recordings from several microphones in the room, where each microphone picks up a different linear mixture of the sources with coefficients a_ij. Solution: if the a_ij are known, the linear equations can be solved by classical methods; if the a_ij are unknown, ICA can be used to estimate the a_ij and to separate the original source signals from their mixtures.
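
As an added illustration (not part of the original slides), the following minimal Python sketch mixes two synthetic sources with a random matrix A and recovers them with scikit-learn's FastICA; the specific signals, the use of scikit-learn, and all variable names are assumptions made for the example.

    # Toy cocktail-party demo: mix two synthetic sources, then unmix with FastICA.
    # Assumes NumPy and scikit-learn are available; signals are illustrative only.
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 8, 2000)
    s1 = np.sin(2 * np.pi * t)                      # "voice" stand-in
    s2 = np.sign(np.sin(3 * np.pi * t))             # "music" stand-in (square wave)
    S = np.column_stack([s1, s2])                   # true sources, shape (T, 2)

    A = rng.normal(size=(2, 2))                     # unknown mixing matrix a_ij
    X = S @ A.T                                     # microphone recordings x = A s

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)                    # estimated sources
    A_est = ica.mixing_                             # estimated mixing matrix

The recovered signals match the originals only up to permutation and scaling, which is the inherent ambiguity of ICA mentioned in the figures below.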

Applications of ICA Blind source separation (the classical application of ICA), e.g. the cocktail-party problem. Fig. 1: an illustration of blind source separation; this figure shows four source signals, or independent components.

Applications of ICA Fig. 2: due to external circumstances, only linear mixtures of the source signals in Fig. 1 can be observed. Fig. 3: the estimates of the source signals, computed using only the linear mixtures in Fig. 2; the estimates are accurate up to multiplying factors.

Applications of ICA Separation of artifacts in MEG data. Magnetoencephalography (MEG) is the measurement of the magnetic activity of the brain; it provides very good temporal resolution and moderate spatial resolution. Problem when using an MEG record: extracting the essential features of the neuromagnetic signals in the presence of artifacts. ICA provides an approach to separating brain activity from artifacts.

Applications of ICA Fig. 4: samples of MEG signals, showing artifacts produced by blinking, saccades, biting and the cardiac cycle. For each of the six positions shown, the two orthogonal directions of the sensors are plotted.

Applications of ICA Fig. 5: nine independent components found from the MEG data.

Applications of ICA Finding hidden factors in financial data. Financial data consisting of parallel time series (e.g. currency exchange rates or daily returns of stocks) may have common underlying factors. ICA decomposes the data into independent components to give insight into the structure of the data set.

Applications of ICA Problem: given the cash flow of several stores belonging to the same retail chain, find the fundamental factors common to all stores that affect their cash flow. Input data: the weekly cash flow of 40 stores in the same retail chain; the measurements cover 140 weeks. The data are prewhitened by projecting the original signal vectors onto the subspace spanned by their first five principal components.

Applications of ICA Fig. 6: five samples of the original cash flow time series. Horizontal axis: time in weeks.

Applications of ICA Fig. 7: five independent components, or fundamental factors, found from the cash flow data.

Applications of ICA Interpretation of the factors: Factors 1 and 2 follow the sudden changes caused by holidays, the most prominent being Christmas time. Factor 3 could represent a still slower variation, such as a trend. Factor 4 might follow the relative competitive position of the retail chain with respect to its competitors. Factor 5 reflects the slower seasonal variation, with the summer holidays clearly visible.

Applications of ICA Feature extraction: the columns of A represent features, and s_i is the coefficient of the i-th feature in an observed data vector x. ICA is used to find features that are as independent as possible.

Linear Representation of Multivariate Data A suitable representation of multivariate data can facilitate its subsequent analysis. A linear transformation of the original data offers computational and conceptual simplicity. The problem can be rephrased as follows: x is a random vector whose elements are the mixtures x_1, ..., x_m; y is a random vector with elements y_1, ..., y_n; W is the matrix to be determined from the statistical properties of the components y_i:

$$\begin{pmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{pmatrix} = W \begin{pmatrix} x_1(t) \\ \vdots \\ x_m(t) \end{pmatrix}, \quad \text{i.e. } y = Wx.$$

Dimension Reduction Methods Principal component analysis (PCA): choose W so that the number of components y_i is quite small, and determine W so that each component retains as much information about the data as possible.

Independence as a Guiding Principle Determine W by requiring the components y_i to be statistically independent: any one of these components gives no information about the others. This is the starting point of ICA: find statistically independent components in the general case where the data are nongaussian.

Definition of Linear ICA Definition: given a set of observations of random variables (x_1(t), x_2(t), ..., x_n(t)), where t is the time or sample index, assume that they are generated by a linear mixture of independent components (s_1(t), s_2(t), ..., s_m(t)):

$$\begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix} = A \begin{pmatrix} s_1(t) \\ \vdots \\ s_m(t) \end{pmatrix}, \quad \text{i.e. } x = As.$$

ICA consists of estimating both the unknown matrix A and the s_i(t), observing only the x_i(t).

Definition of Linear ICA Alternative definition: find a linear transformation given by a matrix W so that the random variables y_i, i = 1, ..., n, are as independent as possible. Each x_i(t) is a sample of one observed random variable:

$$\begin{pmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{pmatrix} = W \begin{pmatrix} x_1(t) \\ \vdots \\ x_m(t) \end{pmatrix}.$$

What is Independence? Statistical independence between two random variables y_1 and y_2: p(y_1, y_2) = p_1(y_1) p_2(y_2), where p(y_1, y_2) is the joint pdf of y_1 and y_2, and p_1(y_1) and p_2(y_2) are the marginal pdfs of y_1 and y_2, respectively. If y_1 and y_2 are independent, then E{g_1(y_1) g_2(y_2)} − E{g_1(y_1)} E{g_2(y_2)} = 0 for any two functions g_1 and g_2.
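
This property is easy to check numerically. The sketch below (an added illustration; the particular distributions and nonlinearities are arbitrary choices) verifies that for independently generated samples the expression above is close to zero.

    # Empirical check: for independent y1, y2,
    # E{g1(y1) g2(y2)} - E{g1(y1)} E{g2(y2)} should be close to 0.
    import numpy as np

    rng = np.random.default_rng(1)
    y1 = rng.uniform(-1, 1, 100_000)
    y2 = rng.laplace(size=100_000)        # independent of y1 by construction

    g1, g2 = np.tanh(y1), y2 ** 3         # two arbitrary nonlinear functions
    print(np.mean(g1 * g2) - np.mean(g1) * np.mean(g2))   # close to 0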

Uncorrelatedness Uncorrelated variables are only partly independent. Uncorrelatedness is a weaker form of independence: y_1 and y_2 are uncorrelated if E{y_1 y_2} − E{y_1} E{y_2} = 0. Independence implies uncorrelatedness. ICA methods constrain the estimation procedure to give uncorrelated estimates of the independent components, which reduces the number of free parameters.

Gaussian Variables are Forbidden A fundamental restriction of ICA: the independent components must be nongaussian for ICA to be possible. Suppose s_1 and s_2 have Gaussian distributions and the mixing matrix A is orthogonal; then x_1 and x_2 are Gaussian as well, their joint distribution is completely symmetric, and A cannot be estimated. (Figure: joint distribution of two independent Gaussian variables.)

ICA by Maximization of Nongaussianity Nongaussian is independent: by the central limit theorem, sums of nongaussian random variables are closer to Gaussian than the original ones. Consequently, a linear combination of the observed mixture variables is maximally nongaussian when it equals one of the independent components.

Measures of Nongaussianity Kurtosis (the fourth-order cumulant): kurt(y) = E{y^4} − 3 (E{y^2})^2. It is zero for a Gaussian random variable, negative for subgaussian random variables (flat pdf), and positive for supergaussian random variables (spiky pdf with heavy tails). Nongaussianity is measured by the absolute value of the kurtosis. Starting from some weight vector w, compute the direction in which the kurtosis of y = w^T x grows or decreases most strongly, and find a new w using gradient methods. Kurtosis is sensitive to outliers in the sample.
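
The sign convention is easy to verify numerically. The sketch below (an added example; distributions and sample sizes are arbitrary) estimates the kurtosis of uniform, Gaussian, and Laplacian samples.

    # Sample kurtosis kurt(y) = E{y^4} - 3 (E{y^2})^2, computed on centered data.
    import numpy as np

    def kurtosis(y):
        y = y - y.mean()
        return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

    rng = np.random.default_rng(2)
    n = 200_000
    print(kurtosis(rng.uniform(-1, 1, n)))   # negative: subgaussian (flat pdf)
    print(kurtosis(rng.normal(size=n)))      # approximately zero: Gaussian
    print(kurtosis(rng.laplace(size=n)))     # positive: supergaussian (heavy tails)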

Measures of Nongaussianity Negentropy: J(y) = H(y_gauss) − H(y), where H(y) is the differential entropy of a random vector y with density f(y), and y_gauss is a Gaussian random variable with the same covariance matrix as y. It is computationally very difficult, since it requires an estimate (possibly nonparametric) of the pdf, but in terms of statistical properties it is the optimal estimator of nongaussianity.

Measures of Nongaussianity Approximations of negentropy: J(y) ≈ c [E{G(y)} − E{G(v)}]^2, where G is a practical, non-quadratic function, c is a constant, and v is a Gaussian variable of zero mean and unit variance. Choosing a G that does not grow too fast helps obtain more robust estimators. These approximations give a very good compromise between the properties of kurtosis and negentropy: they are fast to compute and have appealing statistical properties, especially robustness.
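
As an added sketch, the approximation can be evaluated with the commonly used choice G(u) = log cosh(u); the constant c is simply set to 1 here, and the Gaussian expectation is estimated by sampling. For a standardized supergaussian sample the value is clearly larger than for a Gaussian one.

    # Negentropy approximation J(y) ~ [E{G(y)} - E{G(v)}]^2 with G(u) = log cosh(u),
    # where v is a standardized Gaussian variable and y is standardized first.
    import numpy as np

    def negentropy_approx(y, n_gauss=200_000, seed=0):
        y = (y - y.mean()) / y.std()
        v = np.random.default_rng(seed).normal(size=n_gauss)
        G = lambda u: np.log(np.cosh(u))
        return (np.mean(G(y)) - np.mean(G(v))) ** 2

    rng = np.random.default_rng(3)
    print(negentropy_approx(rng.laplace(size=100_000)))   # clearly positive
    print(negentropy_approx(rng.normal(size=100_000)))    # near zero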

ICA by Maximum Likelihood Estimation Formulate the likelihood of the ICA model. Denoting W = (w_1, ..., w_n)^T and letting f_i be the density functions of the s_i (here assumed known), the log-likelihood takes the form

$$L = \sum_{t=1}^{T} \sum_{i=1}^{n} \log f_i(w_i^T x(t)) + T \log |\det W|.$$

The model is then estimated by maximum likelihood methods. This requires knowledge of the probability densities of the independent components and is very sensitive to outliers.
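
A minimal sketch of evaluating this log-likelihood (an added example, not from the slides), assuming the supergaussian density f_i(s) = 1/(π cosh s), so that log f_i(s) = −log π − log cosh(s); the function name and data are illustrative only.

    # ICA log-likelihood L(W) = sum_t sum_i log f_i(w_i^T x(t)) + T log |det W|,
    # with the assumed density f_i(s) = 1 / (pi * cosh(s)).
    import numpy as np

    def ica_log_likelihood(W, X):
        """W: (n, n) unmixing matrix; X: (T, n) centered observations."""
        T = X.shape[0]
        Y = X @ W.T                                   # rows are y(t) = W x(t)
        log_f = -np.log(np.pi) - np.log(np.cosh(Y))   # log f_i(w_i^T x(t))
        return log_f.sum() + T * np.log(np.abs(np.linalg.det(W)))

    rng = np.random.default_rng(4)
    X = rng.laplace(size=(1000, 2))
    print(ica_log_likelihood(np.eye(2), X))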

ICA by Minimization of Mutual Information A natural measure of the dependence between random variables is the mutual information I, defined for m random variables y_i as

$$I(y_1, y_2, \ldots, y_m) = \sum_{i=1}^{m} H(y_i) - H(y).$$

I is always non-negative, and it is zero if and only if the y_i are statistically independent. For an invertible linear transformation y = Wx,

$$I(y_1, y_2, \ldots, y_m) = \sum_{i} H(y_i) - H(x) - \log |\det W|.$$

Choosing W to minimize the mutual information between the components y_i is a natural way of estimating the ICA model, but like maximum likelihood it is restricted by the need for the probability densities of the independent components.
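
For intuition only, mutual information between two scalar signals can be roughly estimated from histograms using I(y_1, y_2) = H(y_1) + H(y_2) − H(y_1, y_2); the crude estimator below is an added illustration (practical ICA algorithms use better estimators or the negentropy connection above, and the histogram estimate carries a small positive bias).

    # Histogram estimate of mutual information between two scalar signals.
    import numpy as np

    def mutual_information(y1, y2, bins=64):
        p_joint, _, _ = np.histogram2d(y1, y2, bins=bins)
        p_joint /= p_joint.sum()
        p1 = p_joint.sum(axis=1, keepdims=True)       # marginal of y1
        p2 = p_joint.sum(axis=0, keepdims=True)       # marginal of y2
        mask = p_joint > 0
        return np.sum(p_joint[mask] * np.log(p_joint[mask] / (p1 @ p2)[mask]))

    rng = np.random.default_rng(5)
    a = rng.normal(size=50_000)
    b = rng.normal(size=50_000)
    print(mutual_information(a, b))            # near zero: independent
    print(mutual_information(a, a + 0.1 * b))  # clearly positive: dependent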

Algorithms for ICA Introduction: choose one of the estimation principles for ICA, then use a practical algorithm to optimize the objective function corresponding to that principle. The statistical properties of the ICA method depend only on the objective function used; the algorithmic properties depend on the optimization algorithm.

Preprocessing of the Data Preprocessing makes the problem of ICA estimation simpler and better conditioned. Centering: center x by subtracting its mean vector; this implies that s is zero-mean as well. Centering the observed x serves solely to simplify the ICA algorithms: after estimating the mixing matrix with the centered data, the mean vector of s, which is given by W E{x}, can be added back.

Preprocessing of the Data Whitening: transform the observed x linearly so that its components are uncorrelated and have unit variance. Whitening transforms the mixing matrix into an orthogonal one and reduces the number of parameters to be estimated (from n^2 to n(n−1)/2). It may also reduce the dimension of the data (e.g. by discarding components with very small eigenvalues in the eigenvalue decomposition of the covariance matrix).
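
A minimal sketch of both preprocessing steps (an added example), using the eigenvalue decomposition cov(x) = E D E^T and the symmetric whitening matrix E D^{-1/2} E^T; dimension reduction would simply keep only the leading eigenvectors. Function and variable names are illustrative.

    # Centering and whitening: z = E D^{-1/2} E^T (x - mean), so that E{z z^T} = I.
    import numpy as np

    def whiten(X):
        """X: (T, n) observations; returns whitened data and the whitening matrix V."""
        Xc = X - X.mean(axis=0)                  # centering
        cov = np.cov(Xc, rowvar=False)           # sample covariance matrix
        d, E = np.linalg.eigh(cov)               # EVD: cov = E diag(d) E^T
        V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T  # whitening matrix
        return Xc @ V.T, V

    rng = np.random.default_rng(6)
    X = rng.laplace(size=(5000, 3)) @ rng.normal(size=(3, 3)).T   # mixed data
    Z, V = whiten(X)
    print(np.round(np.cov(Z, rowvar=False), 2))  # approximately the identity matrix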

Preprocessing of the Data Example of whitening: the joint distribution of the independent components s_1 and s_2 with uniform distributions (horizontal axis: s_1, vertical axis: s_2); the joint distribution of the observed mixtures x_1 and x_2 (horizontal axis: x_1, vertical axis: x_2); and the joint distribution of the whitened mixtures. The whitened mixtures are clearly a rotated version of the original data, so all that remains to estimate is the single angle that gives the rotation.

Algorithms for ICA FastICA algorithm: in many practical situations, the convergence of adaptive algorithms based on gradient descent is slow; batch (block) algorithms based on fixed-point iteration are a remedy for this problem. The FastICA algorithm is based on a fixed-point iteration scheme for finding a maximum of the nongaussianity of w^T x, measured by the negentropy approximation J(y) = [E{G(y)} − E{G(v)}]^2, where G is any non-quadratic function and v is a standardized Gaussian variable.

Algorithms for ICA One-unit FastICA algorithm: 1. Choose an initial (e.g. random) weight vector w. 2. Let w⁺ = E{x g(w^T x)} − E{g′(w^T x)} w (estimated using a large sample of x vectors). 3. Let w = w⁺ / ‖w⁺‖. 4. If not converged, go back to step 2. Convergence means that the old and new values of w point in the same direction (their dot product is almost equal to 1). To reduce computation, the averages can be estimated using a smaller sample, with the sample points chosen separately at every iteration; if the convergence is not satisfactory, one may increase the sample size.
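
A minimal implementation of the one-unit iteration (an added sketch, assuming whitened input data and the common choice of nonlinearity g(u) = tanh(u), with g′(u) = 1 − tanh²(u); the function name and tolerances are illustrative).

    # One-unit FastICA: w+ = E{x g(w^T x)} - E{g'(w^T x)} w, then normalize.
    import numpy as np

    def fastica_one_unit(Z, max_iter=200, tol=1e-6, seed=0):
        """Z: (T, n) whitened data. Returns one unit-norm weight vector w."""
        rng = np.random.default_rng(seed)
        w = rng.normal(size=Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wx = Z @ w                                    # w^T x(t) for all samples
            g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
            w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
            w_new /= np.linalg.norm(w_new)
            if abs(np.dot(w_new, w)) > 1 - tol:           # old and new w aligned
                return w_new
            w = w_new
        return w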

Algorithms for ICA FastICA algorithm for several units: to estimate several independent components, the one-unit FastICA is run several times to obtain vectors w_1, ..., w_n. To prevent different vectors from converging to the same maximum, a decorrelation step is added after each iteration, as sketched below.
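
One simple decorrelation scheme is deflation by Gram–Schmidt: after each update of w_p, subtract its projections onto the previously found vectors and renormalize. The sketch below (an added example under the same assumptions as the one-unit routine above: whitened data, g(u) = tanh(u)) folds that step into a deflation loop.

    # Deflation-based FastICA: estimate w_1, ..., w_k one by one, decorrelating each
    # new vector against the previously found ones and renormalizing.
    import numpy as np

    def fastica_deflation(Z, n_components, max_iter=200, tol=1e-6, seed=0):
        """Z: (T, n) whitened data. Returns W with rows w_1, ..., w_k."""
        rng = np.random.default_rng(seed)
        W = []
        for _ in range(n_components):
            w = rng.normal(size=Z.shape[1])
            w /= np.linalg.norm(w)
            for _ in range(max_iter):
                wx = Z @ w
                g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
                w_new = (Z * g[:, None]).mean(axis=0) - g_prime.mean() * w
                for wj in W:                          # Gram-Schmidt decorrelation
                    w_new -= np.dot(w_new, wj) * wj
                w_new /= np.linalg.norm(w_new)
                if abs(np.dot(w_new, w)) > 1 - tol:
                    break
                w = w_new
            W.append(w_new)
        return np.array(W)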

Algorithms for ICA Properties of FastICA: the convergence is cubic, in contrast to the linear convergence of ordinary ICA algorithms based on gradient descent. FastICA directly finds independent components of practically any nongaussian distribution, in contrast to many algorithms where some estimate of the pdf must first be available. The independent components can be estimated one by one, which decreases the computational load when only some of the independent components are needed.

Extensions of Basic ICA Framework Estimation of the noisy ICA model: in practice it may be unrealistic to assume that the data can be divided into signal and noise in any meaningful way; the noisy ICA model gives an estimation of an ordinary latent variable model. Estimation of the model with overcomplete bases: more independent components than observed mixtures.

Extensions of Basic ICA Framework Tailoring ICA methods to a given practical application. Example: estimate an ICA-like model in which the components are not necessarily all independent; it would be interesting to formulate the exact conditions that enable such estimation. Avoiding and detecting overlearning: overlearning occurs if there are not enough samples in the data or if a considerable amount of noise is present, and it results in the generation of spike-like signals.

Extensions of Basic ICA Framework Integrating time correlations into ICA methods: if the x(t) come from a stochastic process, instead of being samples of a random variable, blind source separation can also be accomplished by using time correlations. Combining blind source separation with some form of blind deconvolution, to account for particular phenomena such as time delays in signal propagation and echoes.

References
Survey on Independent Component Analysis, Aapo Hyvärinen, Helsinki University of Technology, Laboratory of Computer and Information Science.
Independent Component Analysis: A Tutorial, Aapo Hyvärinen, Helsinki University of Technology, Laboratory of Computer and Information Science.
Independent Component Analysis, A. Hyvärinen, J. Karhunen, E. Oja, John Wiley & Sons, 2001.
A Unifying Information-Theoretic Framework for Independent Component Analysis, Te-Won Lee, Mark Girolami, Anthony J. Bell and Terrence J. Sejnowski, International Journal on Mathematical and Computer Models, 1999.