A Short Introduction to Independent Component Analysis

Aapo Hyvärinen
Helsinki Institute for Information Technology and Depts. of Computer Science and Psychology, University of Helsinki

Problem of blind source separation: there are a number of source signals, but due to some external circumstances, only linear mixtures of the source signals are observed. The task is to estimate (separate) the original signals.

Principal component analysis does not recover the original signals. But a solution is possible: use information on statistical independence to recover them.

Independent Component Analysis (Hérault and Jutten, 1984-1991). Observed data $x_i$ is modelled using hidden variables $s_j$:
$$x_i = \sum_{j=1}^{m} a_{ij} s_j, \qquad i = 1, \dots, n \qquad (1)$$
The matrix of coefficients $a_{ij}$ is a constant parameter called the mixing matrix. The hidden random factors $s_j$ are called independent components or source signals. The problem is to estimate both the $a_{ij}$ and the $s_j$, observing only the $x_i$: an unsupervised, exploratory approach.
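
As a concrete illustration of model (1), here is a minimal sketch that generates observed mixtures from two nongaussian sources. The source distributions and the mixing matrix are made-up examples, not anything from the original slides.

```python
# A minimal sketch of the ICA data-generating model x = A s.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Two nongaussian, mutually independent sources (uniform and Laplacian).
s = np.vstack([
    rng.uniform(-np.sqrt(3), np.sqrt(3), n_samples),  # uniform, unit variance
    rng.laplace(0.0, 1.0 / np.sqrt(2), n_samples),    # Laplacian, unit variance
])

# An arbitrary (here, made-up) mixing matrix A.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

x = A @ s  # observed mixtures; only x is available to the analyst
```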

When can the ICA model be estimated? We must assume:
- the $s_i$ are mutually statistically independent;
- the $s_i$ are nongaussian (non-normal);
- (optional) the number of independent components equals the number of observed variables.
Then the mixing matrix and the components can be identified (Comon, 1994). A very surprising result!

Related methods: principal component analysis and factor analysis. The basic idea is to find directions $\sum_i w_i x_i$ of maximum variance; the goal is to explain the maximum amount of variance with few components. Benefits: noise reduction, reduced computation, and (arguably) easier interpretation.

Why cannot PCA or FA find the source signals (original components)? They use only the covariances $\mathrm{cov}(x_i, x_j)$. Due to the symmetry $\mathrm{cov}(x_i, x_j) = \mathrm{cov}(x_j, x_i)$, only about $n^2/2$ of them are available, while the mixing matrix has $n^2$ parameters. So there is not enough information in the covariances. This is the factor rotation problem: for example, $(s_1, s_2)$ cannot be distinguished from the rotated pair $\left(\tfrac{\sqrt{2}}{2}(s_1 + s_2),\ \tfrac{\sqrt{2}}{2}(s_1 - s_2)\right)$.
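
The following small numeric check (a sketch with made-up uniform sources) illustrates the rotation problem: a rotated pair of unit-variance independent components has exactly the same covariance matrix as the original pair, so covariance-based methods cannot tell them apart.

```python
# Rotating independent, unit-variance sources leaves the covariance unchanged.
import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10_000))  # independent sources

theta = np.pi / 4  # rotate by 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
s_rot = R @ s

print(np.cov(s))      # ~ identity matrix
print(np.cov(s_rot))  # ~ identity as well: covariances cannot distinguish them
```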

Deep connection between covariances and gaussianity: the gaussian (normal) distribution is completely determined by its covariances, so no additional information can be obtained from it. Using only covariances and treating the data as gaussian are thus closely related. Moreover, uncorrelated gaussian variables (i.e., zero covariance) are independent, and the uncorrelated, standardized gaussian distribution is the same in all directions.

ICA uses nongaussianity to find the source signals. Nongaussian data has structure beyond covariances, e.g., fourth-order moments $E\{x_i x_j x_k x_l\}$ instead of just $E\{x_i x_j\}$. The catch: this only works when the components are nongaussian. Many psychological variables may be gaussian because they are sums of many independent variables (central limit theorem). Fortunately, signals measured by sensors are usually quite nongaussian.
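
A simple fourth-order statistic of this kind is the (excess) kurtosis, $E\{x^4\} - 3(E\{x^2\})^2$ for zero-mean $x$, which is zero for a gaussian. The sketch below (with arbitrary example distributions) shows how it separates gaussian from nongaussian data.

```python
# A minimal nongaussianity measure: excess kurtosis (zero for a gaussian).
import numpy as np

def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2)**2

rng = np.random.default_rng(2)
print(excess_kurtosis(rng.normal(size=100_000)))   # ~ 0   (gaussian)
print(excess_kurtosis(rng.uniform(size=100_000)))  # < 0   (sub-gaussian)
print(excess_kurtosis(rng.laplace(size=100_000)))  # > 0   (super-gaussian)
```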

Some examples of nongaussianity in signals. [Figure: three example signals and their histograms; the histograms are clearly nongaussian.]

Illustration of using nongaussianity. Two components with uniform distributions: original components, observed mixtures, PCA, ICA. PCA does not find the original coordinates; ICA does!
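
This illustration is easy to reproduce; here is a sketch using scikit-learn (array shapes follow the sklearn convention of rows as observations; the data is made up).

```python
# Uniform-sources illustration: PCA only decorrelates, FastICA recovers sources.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(3)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(10_000, 2))  # uniform sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])                      # mixing matrix
X = S @ A.T                                                 # observed mixtures

S_pca = PCA(n_components=2).fit_transform(X)   # decorrelated, but still rotated
S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)
# Scatterplots of S_pca vs. S_ica would show that only ICA recovers the
# original square-shaped joint distribution (up to order, sign, and scale).
```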

Illustration of the problem with gaussian distributions: original components, observed mixtures, PCA. The distribution after PCA is the same as the distribution before mixing! This is the factor rotation problem.

How is nongaussianity used in ICA? By the classic central limit theorem, an average of many independent random variables has a distribution that is close(r) to gaussian. So, roughly: any mixture of components will be more gaussian than the components themselves. Hence, by maximizing the nongaussianity of $\sum_i w_i x_i$, we can find the $s_i$. This is also known as projection pursuit. Cf. principal component analysis, which maximizes the variance of $\sum_i w_i x_i$.
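
As a toy projection-pursuit sketch (our own illustration, not the slides' algorithm), one can whiten two-dimensional mixtures and scan projection directions for the one maximizing |excess kurtosis|; it aligns with a source direction.

```python
# Toy projection pursuit: maximize |excess kurtosis| over projection angle.
import numpy as np

rng = np.random.default_rng(4)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10_000))
x = np.array([[1.0, 0.6], [0.4, 1.0]]) @ s   # observed mixtures

# Whiten x so that every unit-norm projection direction has unit variance.
d, E = np.linalg.eigh(np.cov(x))
z = (E / np.sqrt(d)) @ E.T @ (x - x.mean(axis=1, keepdims=True))

def abs_kurt(y):
    return abs(np.mean(y**4) - 3.0)  # |excess kurtosis| for unit-variance y

angles = np.linspace(0, np.pi, 180)
kurts = [abs_kurt(np.cos(a) * z[0] + np.sin(a) * z[1]) for a in angles]
best = angles[int(np.argmax(kurts))]
print(f"most nongaussian direction: {best:.2f} rad")
```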

Illustration of changes in nongaussianity. [Figure: histogram and scatterplot of the original uniform distributions, and histogram and scatterplot of the mixtures given by PCA; the mixtures look visibly more gaussian.]

ICA algorithms consist of two ingredients:
1. A nongaussianity measure:
- Kurtosis: a classic, but sensitive to outliers.
- Differential entropy: statistically better, but difficult to compute; equivalent to maximizing likelihood or network information flow.
- Approximations of entropy: a good compromise.
2. An optimization algorithm:
- Gradient methods: natural gradient, infomax (Bell, Sejnowski, Amari, Cichocki et al., 1994-96).
- The fast fixed-point algorithm, FastICA (Hyvärinen, 1999); a minimal one-unit sketch is given after this list.
- One-by-one estimation vs. simultaneous estimation.
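
Here is that sketch: an illustrative implementation of the one-unit FastICA fixed-point update with the tanh nonlinearity, assuming the data has already been whitened (the function name and defaults are ours, not from the slides).

```python
# One-unit FastICA on whitened data z (shape: n_dims x n_samples).
import numpy as np

def fastica_one_unit(z, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z                                   # current projection
        g, g_prime = np.tanh(y), 1 - np.tanh(y)**2
        # Fixed-point update: w <- E{z g(w'z)} - E{g'(w'z)} w
        w_new = (z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1) < 1e-8:          # converged (up to sign)
            return w_new
        w = w_new
    return w
```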

Practical note: combining ICA with FA/PCA. This is almost always done in practice: first find a small number of factors with PCA or FA, then perform ICA on those factors. ICA is then a method of factor rotation, but very different from varimax etc., which do not use statistical structure and cannot (in most cases) find the original components. PCA also reduces the noise in the signals and reduces computation.
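
In scikit-learn terms, this pipeline might look as follows (a sketch: the data array and the number of factors are placeholders).

```python
# PCA-then-ICA pipeline: reduce dimension first, then rotate with ICA.
import numpy as np
from sklearn.decomposition import PCA, FastICA

X = np.random.default_rng(5).normal(size=(1000, 50))  # placeholder data matrix

factors = PCA(n_components=5, whiten=True).fit_transform(X)   # dim. reduction
components = FastICA(n_components=5, random_state=0).fit_transform(factors)
```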

Practical note 2: time-filtering of the data. The ICA model still holds with the same matrix $A$ if all signals are passed through a temporal filter $f$:
$$\tilde{x}_i(t) = f(t) * x_i(t) = \sum_{\tau} f(\tau)\, x_i(t - \tau) \qquad (2)$$
$$\tilde{s}_j(t) = f(t) * s_j(t) \qquad (3)$$
$$\tilde{x}_i(t) = \sum_j a_{ij}\, \tilde{s}_j(t) \qquad (4)$$
So one can try to find a frequency band in which the source signals are as independent and nongaussian as possible.
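
The key fact behind (2)-(4) is that linear filtering commutes with linear mixing; the small numeric check below (with a made-up smoothing filter) verifies it.

```python
# Filtering the mixtures equals mixing the filtered sources: A is unchanged.
import numpy as np

rng = np.random.default_rng(6)
s = rng.laplace(size=(2, 1000))                  # source signals over time
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

f = np.array([0.25, 0.5, 0.25])                  # simple smoothing filter
filt = lambda sig: np.apply_along_axis(np.convolve, 1, sig, f, mode="same")

print(np.allclose(filt(x), A @ filt(s)))         # True
```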

Recent advance: causal analysis. In a structural equation model (SEM), causal directions can be analyzed:
$$x_i = \sum_{j \neq i} b_{ij} x_j + n_i$$
[Figure: an estimated causal network over variables x1, ..., x7 with signed edge weights.]
This cannot be estimated in the gaussian case without further assumptions. Again, nongaussianity solves the problem (Shimizu et al., 2006): see our LiNGAM package.
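
A sketch of how this might be used in practice via the `lingam` Python package, assuming its DirectLiNGAM API and a toy dataset with a single causal link x1 -> x2 (all of this is our illustration, not part of the slides).

```python
# Causal discovery with LiNGAM on toy nongaussian data (x1 causes x2).
import numpy as np
import lingam  # assumed API of the lingam package

rng = np.random.default_rng(7)
x1 = rng.uniform(size=1000)                 # nongaussian exogenous variable
x2 = 0.8 * x1 + rng.uniform(size=1000)      # caused by x1 plus nongaussian noise
X = np.column_stack([x1, x2])

model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)       # estimated causal ordering of the variables
print(model.adjacency_matrix_)   # estimated connection strengths b_ij
```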

Conclusions. ICA is very simple as a model: a linear nongaussian hidden-variables model. It solves the blind source separation problem if the data (components) are nongaussian, and it works by maximizing the nongaussianity of the components, which is radically different from PCA. It can be applied in almost any field with continuous-valued variables, e.g., electro-/magnetoencephalography and functional magnetic resonance imaging.