HST.582J/6.555J/16.456J


Blind Source Separation: PCA & ICA
HST.582J/6.555J/16.456J
Gari D. Clifford, gari [at] mit.edu, http://www.mit.edu/~gari
G. D. Clifford 2005-2009

What is BSS?
Assume an observation (signal) is a linear mix of more than one unknown, independent source signal. The mixing (not the signals) is stationary, and we have as many observations as unknown sources. To find the sources in the observations we need to define a suitable measure of independence.
For example, the cocktail party problem (the sources are speakers and background noise): find Z given only the observed mixtures X^T = A Z^T.
[Figure: cocktail party diagram; sources z_1, z_2, ..., z_N are mixed through A into observations x_1, x_2, ..., x_N, so that X^T = A Z^T.]

Formal statement of the problem: N independent sources Z (M x N), mixed through a linear, square mixing matrix A (N x N) (#sources = #sensors), produce a set of observations X (M x N):
X^T = A Z^T
Formal statement of the solution: demix the observations X^T (N x M) into
Y^T = W X^T
where Y^T (N x M) ≈ Z^T and W (N x N) ≈ A^{-1}.
How do we recover the independent sources? (We are trying to estimate W ≈ A^{-1}.) We require a measure of independence!
[Figure: signal source and noise sources Z^T, observed mixtures X^T = A Z^T, and source estimates Y^T = W X^T.]
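As a concrete illustration of the mixing model (not part of the lecture), here is a minimal MATLAB sketch that builds two independent sources, mixes them with a square matrix A, and forms the observations X^T = A Z^T; the variable names, waveforms and mixing matrix are illustrative assumptions:

% Minimal sketch of the mixing model X' = A*Z' (illustrative; not the lab code).
M = 1000;                                   % number of samples per source
t = (0:M-1)'/M;                             % time axis
Z = [sin(2*pi*5*t), sign(sin(2*pi*3*t))];   % M x N matrix of two independent sources
A = [1 0.5; 0.3 2];                         % N x N mixing matrix (#sources = #sensors)
X = (A*Z')';                                % M x N observations, X' = A*Z'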

X^T = A Z^T and Y^T = W X^T. Compare with the Fourier transform, where independence between the (frequency) components is assumed.

Recap: non-causal Wiener filtering. Let x[n] be the observation, y[n] the ideal signal and d[n] the noise component, with power spectra S_x(f), S_y(f) and S_d(f) respectively. The filtered signal has spectrum S_filt(f) = S_x(f) H(f), where H(f) is the non-causal Wiener filter, H(f) = S_y(f) / (S_y(f) + S_d(f)).
[Figure: observation spectrum S_x(f) decomposed into ideal-signal power S_y(f) and noise power S_d(f), plotted against frequency f.]
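To make the recap concrete, here is a small MATLAB sketch of the non-causal Wiener filter response; the spectra below are invented for illustration and S_y(f) and S_d(f) are assumed known:

% Non-causal Wiener filter sketch: H(f) = Sy(f)./(Sy(f)+Sd(f)) (illustrative spectra).
f  = linspace(0, 0.5, 256);      % normalised frequency axis
Sy = exp(-(f/0.05).^2);          % assumed ideal-signal power spectrum S_y(f)
Sd = 0.1*ones(size(f));          % assumed flat noise power spectrum S_d(f)
Sx = Sy + Sd;                    % observation power spectrum S_x(f)
H  = Sy ./ (Sy + Sd);            % Wiener filter frequency response
Sfilt = Sx .* H;                 % filtered spectrum, S_filt(f) = S_x(f)*H(f)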

BSS is a transform? Like Fourier, we decompose the observations into components by transforming them into another vector space which maximises the separation between interesting (signal) and unwanted (noise) components. Unlike Fourier, the separation is not based on frequency; it is based on independence. Sources can have the same frequency content, and no assumptions are made about the signals other than that they are independent and linearly mixed. So you can filter/separate in-band noise and signals with BSS.

Principal Component Analysis: second-order decorrelation as independence. Find a set of orthogonal axes in the data (the independence metric is variance) and project the data onto these axes to decorrelate it. Independence is forced onto the data through the orthogonality of the axes. PCA is the conventional noise/signal separation technique.

Singular Value Decomposition: decompose the observation X = AZ into X = U S V^T. S is a diagonal matrix of singular values with elements arranged in descending order of magnitude (the singular spectrum). The columns of V are the eigenvectors of C = X^T X (an orthogonal subspace, dot(v_i, v_j) = 0); they demix, or rotate, the data. U is the matrix of projections of X onto the eigenvectors of C: the source estimates.

Singular Value Decomposition: decompose the observation X = AZ into X = U S V^T and examine the eigenspectrum of the decomposition. S is the singular matrix: zero except on the leading diagonal, where the entries S_ii = (eigenvalue_i)^{1/2} are placed in order of descending magnitude. They correspond to the magnitude of the projected data along each eigenvector, and the eigenvectors are the axes of maximal variation in the data. The eigenspectrum is the plot of the eigenvalues [stem(diag(S).^2)]; variance = power (analogous to Fourier components in a power spectrum). SVD is a method for performing PCA.
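A minimal MATLAB sketch of SVD-based PCA and the eigenspectrum plot hinted at by stem(diag(s).^2), assuming an observation matrix X with samples in rows and channels in columns (e.g. the X from the mixing sketch above):

% SVD of the observations and the eigenspectrum (squared singular values).
[U, S, V] = svd(X, 'econ');      % X = U*S*V'; columns of V are eigenvectors of X'*X
eigvals = diag(S).^2;            % eigenvalues, in descending order
stem(eigvals);                   % eigenspectrum: variance along each eigenvector
xlabel('component'); ylabel('eigenvalue (variance)');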

SVD noise/signal separation: to perform SVD filtering of a signal, use a truncated SVD reconstruction (using only the first p eigenvectors), Y = U S_p V^T. [Reduce the dimensionality of the data by setting the noise projections to zero, S_noise = 0, then reconstruct the data from the signal subspace alone.] Most of the signal is contained in the first few principal components; discarding the remaining components and projecting back into the original observation space effects a noise filtering, or noise/signal separation.
[Figure: two-dimensional example with real data X, the eigenspectrum λ_n (S^2), and truncated reconstructions X_p = U S_p V^T for p = 2 and p = 4.]
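Continuing the sketch above (U, S and V from the SVD, with p an assumed number of retained components), truncated-SVD filtering keeps only the first p components and reconstructs in the original observation space:

% Truncated SVD reconstruction: keep the first p components, zero the rest.
p  = 1;                                    % number of principal components to keep (illustrative)
Xp = U(:,1:p) * S(1:p,1:p) * V(:,1:p)';    % equivalent to X_p = U*S_p*V' with S_noise = 0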

Independent Component Analysis: as in PCA, we are looking for N different vectors onto which we can project our observations to give a set of N maximally independent signals (sources). The dimensionality of the output data (the discovered sources) equals the dimensionality of the observations. Instead of using variance as our independence measure (i.e. decorrelating), as we do in PCA, we use a measure of how statistically independent the sources are.

ICA, the basic idea: assume the underlying source signals (Z) are independent and assume a linear mixing matrix (A), X^T = A Z^T. In order to find Y (≈ Z), find W (≈ A^{-1}) such that Y^T = W X^T. How? Initialise W and iteratively update it to minimise or maximise a cost function that measures the (statistical) independence between the columns of Y^T.

Non-Gaussianity as a route to statistical independence? From the Central Limit Theorem, adding enough independent signals together yields a Gaussian PDF, so mixtures are more Gaussian than their sources.
[Figure: sources Z with sub-Gaussian PDF P(Z); mixtures X^T = A Z^T with approximately Gaussian PDF P(X).]

Recap: moments of a distribution.

Higher-order moments: the 3rd standardised moment about the mean μ_x is the skewness; the 4th is the kurtosis, κ. Gaussians are mesokurtic with κ = 3; sub-Gaussian distributions have κ < 3 and super-Gaussian distributions have κ > 3.

Non-Gaussianity as a route to statistical independence? Central Limit Theorem: add enough independent signals together and you obtain a Gaussian PDF (κ = 3); therefore, make the data components as non-Gaussian as possible to find the independent sources.
[Figure: sources Z with P(Z) sub-Gaussian (κ < 3, here ≈ 1.8); mixtures X^T = A Z^T with P(X) nearly Gaussian (κ = 3.4).]
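A short MATLAB sketch of the sample skewness and kurtosis used above; the test signal is illustrative (a Gaussian, so the kurtosis comes out near 3):

% Sample skewness (3rd standardised moment) and kurtosis (4th standardised moment).
x     = randn(10000,1);                 % illustrative data; substitute a source or mixture
mu    = mean(x);  sigma = std(x);
skew  = mean((x - mu).^3) / sigma^3;    % skewness (0 for a Gaussian)
kappa = mean((x - mu).^4) / sigma^4;    % kurtosis (about 3 for a Gaussian: mesokurtic)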

Recall that we are trying to estimate W: assume the underlying source signals (Z) are independent and a linear mixing matrix (A), X^T = A Z^T; in order to find Y (≈ Z), find W (≈ A^{-1}) such that Y^T = W X^T. Initialise W and iteratively update it with gradient descent/ascent to maximise the kurtosis.

Gradient descent to find W: given a cost function ξ, we update each element w_ij of W at each step τ and recalculate the cost function,
w_ij(τ+1) = w_ij(τ) + η ∂ξ/∂w_ij
where η is the learning rate (~0.1), which controls the speed of convergence. (When we maximise ξ this is gradient ascent.)
[Figure: kurtosis κ and the weights w_ij plotted against iteration number for the example W = [1 3; -2 -1].]
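The sketch below illustrates this kurtosis-maximising gradient ascent in MATLAB. It is not the lab's code: the sources, mixing matrix and learning rate are invented; the data are whitened first and the rows of W are re-decorrelated after each step (standard practical additions the slide does not spell out); the gradient is estimated numerically; and super-Gaussian (κ > 3) sources are assumed, so that maximising κ recovers them.

% ICA by gradient ascent on kurtosis (illustrative sketch).
M = 5000;
Z = [-log(rand(M,1)).*sign(randn(M,1)), -log(rand(M,1)).*sign(randn(M,1))]; % super-Gaussian sources
A = [1 3; -2 -1];                                     % illustrative mixing matrix
X = (A*Z')';                                          % observations, X' = A*Z'
Xw = (inv(sqrtm(cov(X))) * (X - mean(X))')';          % whiten (zero mean, unit covariance)
kurt = @(Y) mean((Y - mean(Y)).^4) ./ std(Y,1).^4;    % per-column kurtosis
cost = @(W) sum(kurt((W*Xw')'));                      % xi = total kurtosis of Y' = W*Xw'
eta  = 0.01;                                          % learning rate (kept small for the numerical gradient)
W    = orth(randn(2));                                % initialise W
for tau = 1:300
    G = zeros(2);
    for i = 1:2                                       % numerical gradient d(xi)/d(w_ij)
        for j = 1:2
            dW = zeros(2);  dW(i,j) = 1e-4;
            G(i,j) = (cost(W + dW) - cost(W - dW)) / 2e-4;
        end
    end
    W = W + eta*G;                                    % gradient ascent step on each w_ij
    W = inv(sqrtm(W*W')) * W;                         % keep the rows of W decorrelated
end
Y = (W*Xw')';                                         % source estimates (arbitrary order and scale)

After whitening, the true de-mixing is a pure rotation, so the summed kurtosis has a clean maximum at the separating W; without the re-decorrelation step both rows could converge to the same, most kurtotic source.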

Gradient descent: the cost function ξ can be taken as κ to be maximised, or as 1/κ to be minimised; e.g. minimising ξ = min(1/κ_1, 1/κ_2) drives κ towards its maximum.

Gradient descent example: imagine a 2-channel ECG comprised of two sources, cardiac and noise, mixed with SNR = 1.
[Figure: the two observed channels X_1 and X_2.]

Iteratively update W and measure κ at each step.
[Figure sequence: source estimates Y_1 and Y_2 after successive updates of W; the estimates become progressively better separated.]
When κ is maximised, the non-Gaussian signal is recovered.
[Figure: final Y_1 and Y_2 with κ maximised.]

Outlier-insensitive ICA cost functions; measures of statistical independence. In general we require a measure of statistical independence which we maximise between each of the N components. Non-Gaussianity is one approximation, but it is sensitive to small changes in the distribution tails. Other measures include mutual information I, entropy (negentropy, J) and maximum (log) likelihood. (Note: all are related to κ.)

Entropy-based cost function: kurtosis is highly sensitive to small changes in the distribution tails. A more robust measure of Gaussianity is based on the differential entropy H(y), the negentropy
J(y) = H(y_gauss) - H(y)
where y_gauss is a Gaussian variable with the same covariance matrix as y. J(y) can be estimated from the kurtosis. Entropy is a measure of randomness; of all variables with a given covariance, the Gaussian is maximally random.
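The slide notes that J(y) can be estimated from the kurtosis; a commonly used approximation combines the squared skewness and squared excess kurtosis of a standardised estimate y (here assumed to be one column of the Y from the sketch above), sketched in MATLAB as:

% Approximate negentropy from skewness and excess kurtosis (zero for a Gaussian).
y    = Y(:,1);                          % a candidate source estimate (illustrative)
y    = (y - mean(y)) / std(y);          % standardise to zero mean, unit variance
skew = mean(y.^3);                      % 3rd standardised moment
exk  = mean(y.^4) - 3;                  % excess kurtosis, kappa - 3
Japprox = skew^2/12 + exk^2/48;         % J(y) ~ E{y^3}^2/12 + kurt(y)^2/48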

Minimising mutual information: the mutual information (MI) between two vectors x and y is
I(x, y) = H(x) + H(y) - H(x, y).
It is always non-negative and zero only if the variables are independent; therefore we want to minimise MI. MI can be re-written in terms of negentropy,
I(y_1, ..., y_N) = c - Σ_i J(y_i)
where c is a constant, so it differs from the (summed) negentropy only by a constant and a sign change.

Independent source discovery using maximum likelihood: generative latent-variable modelling. The N observables X arise from N sources z_i through a linear mapping W = (w_ij), and the latent variables are assumed to be independently distributed. Find the elements of W by gradient ascent, with the iterative update
w_ij ← w_ij + η ∂L/∂w_ij
where η is some (constant) learning rate and the objective cost function L is the log likelihood.

The cocktail party problem revisited: some real examples using ICA.
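One standard concrete form of this maximum-likelihood gradient ascent is the natural-gradient (infomax-style) rule with a tanh score function for super-Gaussian sources; the MATLAB sketch below assumes the whitened observations Xw from the earlier sketch and is illustrative rather than the exact update used in the lecture:

% Natural-gradient maximum-likelihood ICA update (one standard form; illustrative).
% W <- W + eta*(I - tanh(Y)'*Y/M)*W, with tanh as the score for super-Gaussian sources.
eta = 0.01;                               % constant learning rate
W   = eye(2);                             % initialise W
for tau = 1:500
    Y = (W*Xw')';                         % current source estimates, M x 2
    W = W + eta * (eye(2) - (tanh(Y')*Y)/size(Y,1)) * W;   % log-likelihood ascent step
end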


Observations: the separation of the mixed observations into source estimates is excellent apart from two things: the order of the sources has changed, and the signals have been scaled. Why? In X^T = A Z^T, insert a permutation (and scaling) matrix B: X^T = A B B^{-1} Z^T. Then B^{-1} Z^T gives the sources with a different column order and scaling, while the mixing matrix changes from A to A B. ICA solutions are order- and scale-independent because κ is dimensionless.

Separation of sources in the ECG.
[Figure: separated ECG sources labelled with skewness ζ and kurtosis κ: ζ = 3, κ = 11; ζ = 0, κ = 3; ζ = 1, κ = 5; ζ = 0, κ = 1.5.]
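This ambiguity is easy to verify numerically; using the Z, A and X from either mixing sketch above and an invented permutation-with-scaling matrix B, the observations are unchanged while the "sources" B^{-1} Z^T are reordered and rescaled:

% Inserting a permutation/scaling matrix B leaves the observations unchanged:
% X' = A*Z' = (A*B)*(B^-1*Z').
B      = [0 2; -3 0];                  % illustrative permutation with scaling
Xcheck = ((A*B) * (B\Z'))';            % same observations, different "mixing" and "sources"
max(abs(Xcheck(:) - X(:)))             % numerically zero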

Transformation inversion for filtering. Problem: we can never know whether the sources are really reflective of the actual source generators (there is no gold standard), and de-mixing might alter the clinical relevance of the ECG features. Solution: identify the unwanted sources, set the corresponding (p) columns of W^{-1} to zero (giving W_p^{-1}), then multiply back through to remove the noise sources and transform back into the original observation space.

Transformation inversion for filtering, on real data:
[Figure: real data X, source estimates Y = WX (≈ Z), and filtered reconstruction X_filt = W_p^{-1} Y.]
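A minimal MATLAB sketch of this inversion-based filtering, assuming a de-mixing matrix W and source estimates Y with Y^T = W X^T as in the lecture notation, and (purely for illustration) treating source 2 as the unwanted one:

% Zero the columns of inv(W) that correspond to unwanted sources, then project back.
Winv       = inv(W);                   % inverse of the de-mixing matrix
Winv(:,2)  = 0;                        % discard the unwanted source (column p = 2, illustrative)
Xfilt      = (Winv * Y')';             % X_filt' = W_p^{-1} * Y', back in the observation space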

[Figure: ICA results on real data: observations X, source estimates Y = WX (≈ Z), and filtered reconstruction X_filt = W_p^{-1} Y.]

Summary:
- PCA is good for Gaussian noise separation; ICA is good for non-Gaussian noise separation.
- PCs have an obvious meaning: they are the highest-energy components. ICA-derived sources have arbitrary scaling/inversion and ordering, so an energy-independent heuristic is needed to identify signals versus noise.
- The order of the ICs changes, because the IC space is derived from the data; the PC space only changes if the SNR changes.
- ICA assumes a linear mixing matrix and stationary mixing.
- De-mixing performance is a function of lead position.
- ICA requires as many sensors (ECG leads) as sources.
- Filtering: discard certain dimensions, then invert the transformation. In-band noise can be removed, unlike with Fourier!

Fetal ECG lab preparation.

Fetal abdominal recordings: the maternal ECG is much larger in amplitude than the fetal ECG, and the maternal and fetal ECGs overlap in the time domain. The maternal features are broader, but the fetal ECG is in-band with the maternal ECG (they overlap in the frequency domain).
[Figure: 5-second window of an abdominal recording showing the maternal QRS and fetal QRS complexes; maternal HR = 72 bpm, fetal HR = 156 bpm.]

MECG & FECG spectral properties.
[Figure: power spectra of the fetal/maternal mixture and of the maternal, fetal and noise components, indicating the fetal QRS power region relative to the ECG envelope, movement artifact, QRS complex, P & T waves, muscle noise and baseline wander.]