Independent Component Analysis

A Short Introduction to Independent Component Analysis with Some Recent Advances. Aapo Hyvärinen, Dept of Computer Science / Dept of Mathematics and Statistics, University of Helsinki.

Problem of blind source separation. There are a number of source signals; due to some external circumstances, only linear mixtures of the source signals are observed. The task is to estimate (separate) the original signals.

Principal component analysis does not recover the original signals. A solution is possible: use information on statistical independence to recover them.

Independent Component Analysis (Hérault and Jutten, 1984-1991). Observed data $x_i(t)$ is modelled using hidden variables $s_i(t)$: $x_i(t) = \sum_{j=1}^{m} a_{ij} s_j(t)$, $i = 1, \dots, n$ (1), or as a matrix decomposition $X = AS$ (2). The matrix of the $a_{ij}$ is a constant parameter called the mixing matrix. The hidden random factors $s_i(t)$ are called independent components or source signals. Problem: estimate both the $a_{ij}$ and the $s_j(t)$, observing only the $x_i(t)$. An unsupervised, exploratory approach.
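
As a concrete illustration of Eqs. (1)-(2), here is a minimal numpy sketch of the generative model, assuming two uniform (hence nongaussian) sources and an arbitrary made-up mixing matrix; all names and values are illustrative, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nongaussian (uniform) source signals with unit variance: S is (n_sources, n_samples)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 1000))

# An arbitrary constant mixing matrix A (the unknown parameter of the model)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

# Observed data: each x_i(t) is a linear mixture of the sources, X = A S  (Eq. 2)
X = A @ S

print(X.shape)   # (2, 1000): two observed mixture signals
```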

When can the ICA model be estimated? Must assume: the $s_i$ are mutually statistically independent; the $s_i$ are nongaussian (non-normal); (optional:) the number of independent components is equal to the number of observed variables. Then the mixing matrix and the components can be identified (Comon, 1994). A very surprising result!

Background: principal component analysis and factor analysis. Basic idea: find directions $\sum_i w_i x_i$ of maximum variance. Goal: explain the maximum amount of variance with few components. Benefits: noise reduction, reduction in computation, easier to interpret (?). (Vain hope: finds the original underlying components.)
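
For comparison with ICA below, a small numpy sketch of this PCA idea: the maximum-variance direction $\sum_i w_i x_i$ is an eigenvector of the covariance matrix. The 2-D toy data and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated 2-D data, variables in rows
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=5000).T

# PCA: directions of maximum variance are the eigenvectors of the covariance matrix
cov = np.cov(X)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
w = eigvecs[:, -1]                        # first principal direction

print("variance along first PC :", eigvals[-1].round(2))
print("fraction of total variance:", (eigvals[-1] / eigvals.sum()).round(2))
```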

Why cannot PCA or FA find the source signals (original components)? They use only the covariances $\mathrm{cov}(x_i, x_j)$. Due to the symmetry $\mathrm{cov}(x_i, x_j) = \mathrm{cov}(x_j, x_i)$, only about $n^2/2$ of them are available, while the mixing matrix has $n^2$ parameters. So, there is not enough information in the covariances. Factor rotation problem: cannot distinguish between $(s_1, s_2)$ and a rotated pair such as $((s_1 + s_2)/\sqrt{2},\ (s_1 - s_2)/\sqrt{2})$.
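
A small numpy sketch of this factor rotation problem: with gaussian sources, an orthogonal (rotation) mixing leaves the covariance, i.e. the only information PCA and FA use, completely unchanged. The rotation angle and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two independent gaussian sources with unit variance
S = rng.standard_normal((2, 100000))

# A 45-degree rotation mixes the sources but is orthogonal
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = R @ S

# Both covariance matrices are (approximately) the identity, so second-order
# statistics cannot distinguish the sources S from the mixtures R @ S.
print(np.cov(S).round(2))
print(np.cov(X).round(2))
```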

ICA uses nongaussianity to find the source signals. The gaussian (normal) distribution is completely determined by covariances. But nongaussian data has structure beyond covariances, e.g. $E\{x_i x_j x_k x_l\}$ instead of just $E\{x_i x_j\}$. Is it reasonable to assume the data is nongaussian? Many variables may be gaussian because they are sums of many independent variables (central limit theorem), e.g. intelligence. Fortunately, signals measured by physical sensors are usually quite nongaussian.
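
A quick numerical illustration of "structure beyond covariances": three unit-variance distributions with identical second-order statistics but very different fourth-order statistics (excess kurtosis). This is only a sketch using numpy and scipy; the particular distributions are chosen for illustration.

```python
import numpy as np
from scipy.stats import kurtosis   # Fisher (excess) kurtosis: 0 for a gaussian

rng = np.random.default_rng(3)
n = 200000

gaussian = rng.standard_normal(n)
uniform  = rng.uniform(-np.sqrt(3), np.sqrt(3), n)    # unit variance, sub-gaussian
laplace  = rng.laplace(0, 1 / np.sqrt(2), n)          # unit variance, super-gaussian

# All three have (approximately) the same variance, i.e. identical second-order
# statistics, but their fourth-order statistics differ: the structure ICA exploits.
for name, x in [("gaussian", gaussian), ("uniform", uniform), ("laplace", laplace)]:
    print(f"{name:8s}  var={x.var():.2f}  excess kurtosis={kurtosis(x):+.2f}")
```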

Some examples of nongaussianity in signals. [Figure: example signal waveforms and their histograms.]

Illustration of PCA vs. ICA with nongaussian data. Two components with uniform distributions: original components, observed mixtures, PCA, ICA. PCA does not find the original coordinates; ICA does!

Illustration of PCA vs. ICA with gaussian data: original components, observed mixtures, PCA. The distribution after PCA is the same as the distribution before mixing! Factor rotation problem.

How is nongaussianity used in ICA? Classic central limit theorem: an average of many independent random variables will have a distribution that is close(r) to gaussian. So, roughly: any mixture of the components will be more gaussian than the components themselves. By maximizing the nongaussianity of $\sum_i w_i x_i$, we can find the $s_i$. This is also known as projection pursuit. Cf. principal component analysis: maximize the variance of $\sum_i w_i x_i$.
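
A minimal numpy/scipy sketch of this central-limit-theorem effect: a mixture of two uniform sources has excess kurtosis closer to zero, i.e. it is more gaussian than the sources themselves. The scaling by $\sqrt{2}$ just keeps unit variance; everything here is an illustrative toy example.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(4)

# Two independent uniform (sub-gaussian) sources with unit variance
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100000)
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100000)

# An (orthogonal) mixture of the two sources
mix = (s1 + s2) / np.sqrt(2)

# The mixture is more gaussian: its excess kurtosis is closer to zero
print("source kurtosis :", kurtosis(s1).round(2))   # about -1.2 for a uniform variable
print("mixture kurtosis:", kurtosis(mix).round(2))  # roughly half of that, closer to 0
```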

Illustration of changes in nongaussianity. [Figures: histogram and scatterplot of the original uniform distributions; histogram and scatterplot of the mixtures given by PCA.]

ICA algorithms consist of two ingredients. 1. A nongaussianity measure. Kurtosis: a classic, but sensitive to outliers. Differential entropy: statistically better, but difficult to compute. Approximations of entropy: a practical compromise. 2. An optimization algorithm. Gradient methods: natural gradient, infomax (Bell, Sejnowski, Amari, Cichocki et al., 1994-96). Fixed-point algorithm: FastICA (Hyvärinen, 1999).
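
As a practical illustration, a hedged sketch using scikit-learn's FastICA (an implementation of the fixed-point algorithm mentioned above) on simulated Laplacian sources. The whiten="unit-variance" option assumes a reasonably recent scikit-learn version, and the data and mixing matrix are made up.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(5)

# Two nongaussian sources and a mixing matrix, as in the model X = A S
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = A @ S

# FastICA expects samples in rows, so transpose; it estimates the unmixing by
# maximizing (an approximation of) the nongaussianity of the projections.
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S_est = ica.fit_transform(X.T).T   # estimated sources (up to order, sign and scale)
A_est = ica.mixing_                # estimated mixing matrix

# Correlate estimated with true sources: each row should match one source (|corr| near 1)
print(np.corrcoef(np.vstack([S, S_est]))[:2, 2:].round(2))
```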

Example of separated components in brain signals (MEG). [Figure panels, per separated component: temporal envelope (arbitrary units) over time (seconds), Fourier amplitude (arbitrary units) over frequency (Hz), distribution over channels, and phase differences.] (Hyvärinen, Ramkumar, Parkkonen, Hari, NeuroImage, 2010)

Recent Advance 1: Causal analysis. A structural equation model (SEM) analyses causal connections as $x_i = \sum_{j \neq i} b_{ij} x_j + n_i$. [Figure: example causal graph over variables $x_1, \dots, x_7$ with estimated connection strengths.] This cannot be estimated in the gaussian case without further assumptions. Again, nongaussianity solves the problem (Shimizu et al., 2006).
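
To make the connection to ICA concrete, here is a minimal simulation sketch (not the LiNGAM algorithm itself): an acyclic SEM with nongaussian disturbances is exactly an ICA model for the observed variables. The graph, coefficients and names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

# A strictly lower-triangular (acyclic) connection matrix B: x1 -> x2 -> x3
B = np.array([[0.0,  0.0, 0.0],
              [0.8,  0.0, 0.0],
              [0.0, -0.5, 0.0]])

# Nongaussian external influences n_i
N = rng.laplace(size=(3, 10000))

# Solving the SEM x = B x + n for x gives x = (I - B)^{-1} n, i.e. the observed
# variables are a linear *mixture* of the independent disturbances n_i.
X = np.linalg.solve(np.eye(3) - B, N)

# This is the ICA model with mixing matrix A = (I - B)^{-1}; estimating A by ICA
# (plus an ordering/permutation step, as in LiNGAM) recovers B when the n_i are nongaussian.
print(np.linalg.inv(np.eye(3) - B).round(2))
```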

Recent Advance 2: Analysing reliability of components (testing). Algorithmic reliability: are there local minima? This can be analysed by rerunning from different initial points (a). Statistical reliability: is the result just random/accidental? This can be analysed by bootstrap (b). Our Icasso package (Himberg et al., 2004) visualizes reliability. [Figure: Icasso visualization, panels a) and b).] New: a proper testing procedure which gives p-values (Hyvärinen, 2011).
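
A hedged sketch of the "rerun from different initial points" idea only (this is not the Icasso package): estimate the components several times with different random initializations and check that each run reproduces the components of a reference run. Data, dimensions and the matching criterion are all illustrative.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(7)
S = rng.laplace(size=(2, 5000))
X = (np.array([[1.0, 0.7], [0.4, 1.0]]) @ S).T   # samples in rows

# Algorithmic reliability: rerun the estimation from different random initial points
runs = [FastICA(n_components=2, whiten="unit-variance", random_state=seed).fit_transform(X)
        for seed in range(5)]

# For a reliable decomposition, every component of each rerun should have a near-perfect
# (absolute) correlation with some component of the reference run.
ref = runs[0]
for k, est in enumerate(runs[1:], start=1):
    c = np.abs(np.corrcoef(ref.T, est.T)[:2, 2:])
    print(f"run {k}: best matches {c.max(axis=1).round(3)}")
```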

Recent Advance 3: Three-way data ("group ICA"). Often one measures several data matrices $X_k$, $k = 1, \dots, r$, e.g. for different conditions, measurement sessions, subjects, etc. Each matrix could be modelled as $X_k = A_k S_k$, but how to connect the results? (Calhoun, 2001) a) If the mixing matrices are the same for all $X_k$, use $(X_1, X_2, \dots, X_r) = A\,(S_1, S_2, \dots, S_r)$. b) If the component values are the same for all $X_k$, use $\begin{pmatrix} X_1 \\ \vdots \\ X_r \end{pmatrix} = \begin{pmatrix} A_1 \\ \vdots \\ A_r \end{pmatrix} S$. If both the $A_k$ and the $S_k$ are the same: analyse the average data matrix, or combine ICA with PARAFAC (Beckmann and Smith, 2005).
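
A minimal sketch of case (a), concatenation along the sample axis under the assumption of a shared mixing matrix; the three simulated "subjects" and all dimensions are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(8)

# Three "subjects" sharing the same mixing matrix A but with different component time courses S_k
A = rng.standard_normal((4, 4))
X_list = [A @ rng.laplace(size=(4, 2000)) for _ in range(3)]

# Case (a) of the slide: same mixing for all X_k, so concatenate along the sample (time) axis
X_concat = np.concatenate(X_list, axis=1)    # shape (4, 3 * 2000)

ica = FastICA(n_components=4, whiten="unit-variance", random_state=0)
S_concat = ica.fit_transform(X_concat.T).T   # shared components, one block per subject

# Split the concatenated components back into per-subject time courses
S_per_subject = np.split(S_concat, 3, axis=1)
print([s.shape for s in S_per_subject])      # [(4, 2000), (4, 2000), (4, 2000)]
```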

Recent Advance 4: Dependencies between components. In fact, estimated components are often not independent: ICA does not have enough parameters to force independence. Many authors model correlations of squares, or simultaneous activity. [Figure: two signals that are uncorrelated but whose squares are correlated.] There is ongoing work on even linearly correlated components (Sasaki et al., 2011). Alternatively, in parallel time series, the innovations of a VAR model could be independent (Gómez-Herrero et al., 2008).
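
A small numpy sketch of the kind of dependency meant here: two signals driven by a common variance ("energy") signal are uncorrelated, yet their squares are clearly correlated. The construction is illustrative, not taken from the original slides.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100000

# A common (positive) variance signal modulates two otherwise independent gaussian signals
v = np.abs(rng.standard_normal(n)) + 0.1
z1, z2 = rng.standard_normal((2, n))
x1, x2 = v * z1, v * z2

# The signals themselves are (essentially) uncorrelated...
print("corr(x1, x2)    =", np.corrcoef(x1, x2)[0, 1].round(3))
# ...but their squares (energies) are positively correlated
print("corr(x1^2, x2^2)=", np.corrcoef(x1**2, x2**2)[0, 1].round(3))
```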

Recent Advance 5: Better estimation of the basic linear mixing. In the case of time signals, we can do ICA on time-frequency decompositions (Pham, 2002; Hyvärinen et al., 2010). If the data is by its very nature non-negative, we could impose the same in the model (Hoyer, 2004); zero must then have some special meaning as a baseline, e.g. for Fourier spectra. More precise modelling of the nongaussianity of the components could also improve estimation.

Summary. ICA is a very simple model: simplicity implies wide applicability. It is a nongaussian alternative to PCA or factor analysis. It finds a linear decomposition by maximizing the nongaussianity of the components; these (hopefully) correspond to the original sources. Recent advances: causal analysis, or structural equation modelling, using ICA; testing of independent components for statistical significance; group ICA, i.e. ICA on three-way data; modelling dependencies between components; improvements in estimating the basic linear mixing model. Nongaussianity is beautiful!?