Independent Component Analysis (ICA) Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/ seungjin 1 / 34

Outline ICA vs PCA Blind source separation Darmois theorem Algorithms Maximum likelihood ICA Natural gradient vs relative gradient Infomax ICA FastICA 2 / 34

Introduction to ICA 3 / 34

What is ICA? ICA is a statistical method whose goal is to decompose multivariate data x ∈ R^D into a linear sum of statistically independent components, i.e., x = s_1 a_1 + s_2 a_2 + ... + s_D a_D = As, where {s_i} are coefficients (sources, latent variables, encoding variables) and {a_i} are basis vectors. Constraint: the coefficients {s_i}_{i=1}^D are assumed to be statistically independent. Goal: learn the basis vectors A from the data {x_1, ..., x_N} alone. 4 / 34
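A minimal sketch of this generative model, assuming NumPy and a Laplacian (non-Gaussian) choice for the sources; the particular mixing matrix and sample size are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 2, 1000

# Independent, non-Gaussian sources; one row per source s_i.
S = rng.laplace(size=(D, N))

# Hypothetical mixing matrix A whose columns are the basis vectors a_i.
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])

# Observed data: each column is x = s_1 a_1 + ... + s_D a_D = A s.
X = A @ S
```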

ICA vs. PCA Both are linear transforms used for dimensionality reduction (compression) and feature extraction (representation learning). PCA: second-order statistics (Gaussian), linear orthogonal transform, optimal coding in the mean-square sense. ICA: higher-order statistics (non-Gaussian), linear non-orthogonal transform, related to projection pursuit (non-Gaussian is interesting), better features for classification? 5 / 34

Example: PCA vs ICA. [Figure: scatter plots of 2-D data with the estimated basis directions; (a) PCA, (b) ICA.] 6 / 34

Two Aspects of ICA Blind source separation: acoustic source separation (cocktail party speech recognition), biomedical data analysis (EEG, ECG, MEG, fMRI, PET), digital communications (multiuser detection, blind equalization, MIMO channels). Representation learning: natural sound/image statistics, computer vision (e.g. face recognition/detection), empirical data analysis (stock market returns, gene expression data, etc.), data visualization (lower-dimensional embedding). 7 / 34

Blind Source Separation 8 / 34

Blind Source Separation Mixing: x = As. Demixing: we want y = Wx to be an estimate of s. 9 / 34

An Example of EEG (a) Raw EEG (b) After ICA 10 / 34

Transparent Transformation Given a set of observed data X = [x_1, ..., x_N] that was generated from unknown sources s through an unknown linear transform A, i.e., x = As, the task of blind source separation is to restore the sources S by estimating the mixing matrix A. To this end, we construct a demixing matrix W such that the elements of y = Wx are statistically independent. Imposing independence on {y_i} leads to y = WAs = PΛs, where P is a permutation matrix and Λ is a diagonal scaling matrix. The transformation PΛ is referred to as a transparent transformation. For example, [y_1, y_2, y_3]^T = [[0, 0, λ_3], [λ_1, 0, 0], [0, λ_2, 0]] [s_1, s_2, s_3]^T. 11 / 34

Darmois Theorem Theorem: Suppose that random variables s_1, ..., s_n are mutually independent. Consider two linear combinations of the s_i, y_1 = α_1 s_1 + ... + α_n s_n and y_2 = β_1 s_1 + ... + β_n s_n. If y_1 and y_2 are statistically independent, then α_i β_i ≠ 0 only when s_i is Gaussian. Remark: In other words, assume that at most one of the {s_i} is Gaussian and that the mixing matrix has full column rank. Then pairwise independence of the {y_i} implies that WA is a transparent transformation. 12 / 34

Algorithms for ICA Mutual information minimization Maximum likelihood estimation 13 / 34

Mutual Information Minimization Build a linear model y = Wx and solve the optimization arg min_W J(W) with J(W) = I(y_1, ..., y_D), where I(y_1, ..., y_D) is the mutual information given by I(y_1, ..., y_D) = D_KL[p(y) || ∏_i p_i(y_i)] = ∫ p(y) log [ p(y) / ∏_i p_i(y_i) ] dy, which is always nonnegative and attains its minimum only when the y_i are mutually independent. Noting that p(y) = p(x) / |det W|, this leads to the loss function (to be minimized) J(W) = -log |det W| - Σ_{i=1}^D log p_i(y_i). 14 / 34
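As a concrete check of this loss, here is a small sketch that evaluates the empirical J(W) on a data matrix, assuming a unit Laplacian model for p_i (an assumption of this example, not of the slides):

```python
import numpy as np

def ica_loss(W, X):
    """Empirical J(W) = -log|det W| - mean_n sum_i log p_i(y_i), with y = Wx."""
    Y = W @ X                                    # (D, N): one column of y per sample
    log_det = np.log(np.abs(np.linalg.det(W)))   # log |det W|
    log_p = -np.abs(Y) - np.log(2.0)             # log density of Laplace(0, 1)
    return -log_det - log_p.sum(axis=0).mean()   # average over the N samples
```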

Maximum Likelihood Estimation Consider the linear model x = As, where the distribution of x is given by p(x) = r(s) / |det A| = ∏_{i=1}^D r_i(s_i) / |det A|. Then the log-likelihood of a single data point is L = log p(x | A, r) = -log |det A| + Σ_{i=1}^D log r_i(s_i). Replacing r_i(·) by p_i(·) and A by W^{-1}, the negative log-likelihood becomes -L = -log |det W| - Σ_{i=1}^D log p_i(y_i), leading to: maximum likelihood estimation = mutual information minimization in the context of ICA. 15 / 34

An Information Geometrical View of ICA 16 / 34

ICA: Gradient Descent Algorithm The loss function is given by J(W) = -log |det W| - Σ_{i=1}^D log p_i(y_i). Define the score function φ_i(y_i) = -d log p_i(y_i) / dy_i and use the relation d log |det W| / dW = W^{-T} to obtain ∇J(W) = -W^{-T} + φ(y) x^T, leading to the following update: W ← W + η (W^{-T} - φ(y) x^T), where φ(y) = [φ_1(y_1), ..., φ_D(y_D)]^T. 17 / 34
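A minimal sketch of one such gradient step, assuming tanh as the score function (the super-Gaussian choice from the next slide) and a small fixed learning rate; the function name and defaults are illustrative:

```python
import numpy as np

def gradient_step(W, x, eta=0.01):
    """One update W <- W + eta * (W^{-T} - phi(y) x^T) for a single sample x."""
    y = W @ x
    phi = np.tanh(y)                              # score function phi_i(y_i)
    return W + eta * (np.linalg.inv(W).T - np.outer(phi, x))
```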

Hypothesized Distributions The ICA algorithm requires p_i(·); hence, we use a hypothesized distribution. Super-Gaussian: φ_i(y_i) = sign(y_i) or tanh(y_i). Sub-Gaussian: φ_i(y_i) = y_i^3. Switching nonlinearity: φ_i(y_i) = y_i ± tanh(α y_i). Flexible ICA: generalized Gaussian distribution, p(y; α) = α / (2λ Γ(1/α)) exp(-|y/λ|^α), leading to φ_i(y_i) = |y_i|^{α-1} sign(y_i). 18 / 34
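These nonlinearities are simple to write down in code; a sketch of the three basic choices (taking λ = 1 in the flexible case is my simplification):

```python
import numpy as np

def phi_super(y):             # super-Gaussian sources (e.g. speech)
    return np.tanh(y)

def phi_sub(y):               # sub-Gaussian sources (e.g. uniform noise)
    return y ** 3

def phi_flexible(y, alpha):   # generalized Gaussian score with lambda = 1
    return np.abs(y) ** (alpha - 1.0) * np.sign(y)
```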

Natural Gradient 19 / 34

Natural Gradient Let S_w = {w ∈ R^D} be a parameter space on which an objective function J(w) is defined. If the coordinate system is non-orthogonal, then the squared length element is |dw|^2 = Σ_{i,j} G_{ij}(w) dw_i dw_j, where G(w) = [G_{ij}(w)] is the Riemannian metric. Theorem (Amari, 1998): The steepest descent direction of J(w) in a Riemannian space is given by the natural gradient, -∇̃J(w) = -G^{-1}(w) ∇J(w). 20 / 34

Natural Gradient ICA It turns out that the natural gradient in the context of ICA has the form ∇̃J(W) = ∇J(W) W^T W. The natural gradient ICA algorithm is of the form W ← W + η (I - φ(y) y^T) W, where φ(y) = [φ_1(y_1), ..., φ_D(y_D)]^T and φ_i(y_i) = -d log p_i(y_i) / dy_i. Advantages: relatively fast convergence (compared to the conventional gradient) and the equivariance property (uniform performance regardless of the conditioning of A). 21 / 34
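Putting the update into a small online learning loop gives a sketch like the following; the tanh score, learning rate, and epoch count are illustrative assumptions, not prescribed by the slides:

```python
import numpy as np

def natural_gradient_ica(X, eta=0.01, n_epochs=50, seed=0):
    """X: (D, N) zero-mean data matrix. Returns an estimated demixing matrix W."""
    D, N = X.shape
    rng = np.random.default_rng(seed)
    W = np.eye(D)
    for _ in range(n_epochs):
        for n in rng.permutation(N):          # online updates, random sample order
            y = W @ X[:, n]
            phi = np.tanh(y)                  # hypothesized score function
            W += eta * (np.eye(D) - np.outer(phi, y)) @ W
    return W
```

On a toy mixture such as the one sketched earlier, WA should approach a transparent transformation (a permutation of scaled rows), which is the usual way to check convergence.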

Application I Learn statistical structure of natural scenes 22 / 34

Learn Statistical Structure of Natural Scenes 23 / 34

Examples of Natural Images 24 / 34

Learned Basis Images: PCA 25 / 34

Learned Basis Images: ICA 26 / 34

Application II Face recognition 27 / 34

Eigenfaces 28 / 34

Factorial Faces [Choi and Lee, 2000] 29 / 34

AR Face Database 30 / 34

Eigenfaces vs Factorial Faces 31 / 34

Performance Comparison 32 / 34

Application III Fetal ECG 33 / 34

ECG raw data and ICs obtained with flexible ICA. [Figure: eight raw channels x_1, ..., x_8 and eight estimated components y_1, ..., y_8; (a) ECG data, (b) independent components.] 34 / 34