Independent Component Analysis
PhD Seminar, Jörgen Ungh
Agenda: Background & motivation, Independence, ICA vs. PCA, Gaussian data, ICA theory, Examples
Background & motivation: the cocktail party problem. [Figure: three speakers s1, s2, s3 talk simultaneously ("Bla bla", "Hi hi"); three microphones record the mixtures x1, x2, x3.]
Cocktail party problem. Let s1(t), s2(t) and s3(t) be the original spoken signals, and let x1(t), x2(t) and x3(t) be the recorded signals. The connection between s and x can be written
x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)
Goal: estimate s1, s2 and s3 from x1, x2 and x3. Problem: we know nothing about the right-hand side (neither the sources nor the mixing coefficients a_ij). A small simulation of this mixing model follows.
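As a minimal sketch of the model above (NumPy assumed; the sine, square-wave and Laplacian "sources" are illustrative stand-ins for speech, not from the seminar):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 1000)

# Three hypothetical source signals s1(t), s2(t), s3(t).
s = np.vstack([np.sin(2 * t),               # s1: smooth tone
               np.sign(np.sin(3 * t)),      # s2: square wave
               rng.laplace(size=t.size)])   # s3: spiky noise

# Unknown square mixing matrix A with entries a_ij.
A = rng.uniform(0.5, 2.0, size=(3, 3))

# Each microphone records a weighted sum of all sources:
# x_i(t) = a_i1*s1(t) + a_i2*s2(t) + a_i3*s3(t)
x = A @ s   # shape (3, 1000): the only data ICA gets to see
```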
Cocktail party problem, example: Microphone 1, Microphone 2, Separated 1, Separated 2. Audio demo: http://www.cnl.salk.edu/~tewon/blind/blind_audio.html
"Today we celebrate our Independence Day!" - US President Thomas J. Whitmore (Bill Pullman) in Independence Day (1996)
Independence: what is it? A first guess: independence = uncorrelatedness?
Definitions. Covariance: $C_{xy} = E\{(x - m_x)(y - m_y)^T\}$. Correlation: $R_{xy} = E\{x y^T\}$. If $m_x = m_y = 0$, then $C_{xy} = R_{xy}$.
Uncorrelated. Two vectors are uncorrelated if $C_{xy} = E\{(x - m_x)(y - m_y)^T\} = 0$, equivalently $R_{xy} = E\{x y^T\} = E\{x\} E\{y\}^T = m_x m_y^T$. If $m_x = m_y = 0$, then $C_{xy} = R_{xy} = 0$. From now on we assume zero-mean variables.
Independent. Vectors x, y are independent if $p_{x,y}(x, y) = p_x(x)\, p_y(y)$, which also gives $E\{g_x(x)\, g_y(y)\} = E\{g_x(x)\}\, E\{g_y(y)\}$, where $g_x$ and $g_y$ are arbitrary functions of x and y.
Independent. Independence is stronger than uncorrelatedness! Uncorrelated: $E\{x y^T\} = E\{x\} E\{y^T\}$. Independent: $E\{g_x(x)\, g_y(y)\} = E\{g_x(x)\}\, E\{g_y(y)\}$ for arbitrary $g_x$, $g_y$. The two conditions coincide only for linear functions of x and y.
Independent vs. uncorrelated. [Figure: two scatter plots of (x, y) samples.] Are x and y uncorrelated? Answer: YES in both plots.
[Same two scatter plots.] Are x and y independent? Answer: YES in the first plot, NO in the second.
Relations: independence implies uncorrelatedness, BUT uncorrelatedness does not imply independence. (A numerical illustration follows.)
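A minimal NumPy check of this asymmetry (my own illustrative example, not from the slides): with x symmetric around zero and y = x^2, the pair is uncorrelated yet fully dependent, which a nonlinear test function reveals:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100_000)
y = x**2                        # y is a deterministic function of x

# Uncorrelated: E{xy} - E{x}E{y} = E{x^3} = 0 for symmetric x.
print(np.cov(x, y)[0, 1])       # ~0

# Not independent: choose nonlinear g_x, g_y and compare
# E{g_x(x) g_y(y)} against E{g_x(x)} E{g_y(y)}.
gx, gy = x**2, y
print(np.mean(gx * gy), np.mean(gx) * np.mean(gy))   # ~0.20 vs ~0.11
```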
ICA vs. PCA: Independent components vs. Principal components.
PCA. Goal: project the data onto an orthonormal basis with maximum variance; the data are explained by the principal components. [Figure: data cloud with principal directions e1, e2.]
PCA uses information up to the second moment, i.e. the mean and variance/covariance. It can reduce the dimension of the data, and it yields an orthonormal basis of uncorrelated vectors.
ICA. Goal: find the independent sources; the data are explained by independent components. [Figure: data cloud with independent directions e1, e2 in the (x, y) plane.]
ICA uses information beyond the second moment, i.e. higher-order statistics such as kurtosis and skewness. It does not reduce the dimension of the data, and it yields a basis of independent vectors.
ICA vs. PCA: independence is the stronger requirement. In the case of Gaussian data, ICA reduces to PCA. (A sketch contrasting the two follows.)
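A sketch of the contrast (scikit-learn assumed available; the uniform sources and mixing matrix are my own illustration): PCA finds orthogonal maximum-variance directions from second-order statistics only, while FastICA rotates further using non-Gaussianity:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(2)
s = rng.uniform(-1, 1, size=(5000, 2))        # independent, non-Gaussian sources
x = s @ np.array([[2.0, 1.0], [1.0, 1.0]])    # linearly mixed observations

pca = PCA(n_components=2)
ica = FastICA(n_components=2, random_state=0)

y_pca = pca.fit_transform(x)   # uncorrelated, variance-ordered components
y_ica = ica.fit_transform(x)   # components made as independent as possible
```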
Gaussian data
Gaussian distribution. Definition: $f_x(x) = \frac{1}{(2\pi)^{N/2} |C|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu)^T C^{-1} (x-\mu)\right)$, where $C$ is the covariance matrix and $\mu$ the mean vector. It is explained completely by first- and second-order statistics, i.e. the mean and the (co)variances.
Gaussian data: because of the distribution's symmetry, a rotation of the basis cannot be identified.
Gaussian distribution: completely defined by its first and second moments, so uncorrelated Gaussian data are also independent. Why assume Gaussian data at all?
Central limit theorem. Definition: a sum of independent random variables tends to be Gaussian. This is the argument behind many Gaussianity assumptions.
Central limit theorem. Definition: a sum of independent random variables tends to be Gaussian. What if we put it the other way around?
Central limit theorem, 2nd formulation: a mixture of two or more independent random variables is more Gaussian than the random variables themselves.
[Figure: histogram of a single uniformly distributed random variable.]
[Figure: histogram of a mixture of two uniformly distributed variables.]
Idea! The observed mixtures should be more Gaussian than the original components; equivalently, the original components are less Gaussian than the mixtures. So by maximizing the non-Gaussianity of the data we should get closer to the original components. (A numerical check follows.)
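This intuition is easy to check numerically. A minimal sketch (NumPy assumed; the uniform sources and mixing weights are my own illustration): the excess kurtosis of a mixture lies closer to the Gaussian value 0 than that of the sources:

```python
import numpy as np

def excess_kurtosis(y):
    """kurt(y) = E{y^4} - 3*(E{y^2})^2, which is zero for Gaussian y."""
    y = y - y.mean()
    return np.mean(y**4) - 3 * np.mean(y**2) ** 2

rng = np.random.default_rng(3)
s1 = rng.uniform(-1, 1, 100_000)   # uniform source: kurt < 0
s2 = rng.uniform(-1, 1, 100_000)
mix = 0.6 * s1 + 0.8 * s2          # weights chosen so the variance is unchanged

print(excess_kurtosis(s1))    # about -0.13
print(excess_kurtosis(s2))    # about -0.13
print(excess_kurtosis(mix))   # about -0.07: closer to the Gaussian value 0
```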
ICA theory: problem definition, solution, preprocessing, different methods, examples.
ICA: definition of the problem. Let s1(t), s2(t) and s3(t) be the original signals, and let x1(t), x2(t) and x3(t) be the collected signals, with
x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)
Goal: estimate s1, s2 and s3 from x1, x2 and x3.
ICA assumptions: the sources are independent and non-Gaussian, and the mixing matrix is square.
ICA idea: maximize the non-Gaussianity of the data! For that we need a measure of Gaussianity (or non-Gaussianity).
Measures of Gaussianity, 1. Kurtosis: $\text{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$, assuming zero-mean variables.
Assuming zero mean and unit variance, this simplifies to $\text{kurt}(y) = E\{y^4\} - 3$.
For Gaussian data we have $E\{y^4\} = 3\left(E\{y^2\}\right)^2$, which gives kurt(y) = 0. For most other distributions, kurt(y) is nonzero, either positive or negative.
Measures of Gaussianity, 1. Kurtosis: maximize |kurt(y)|. Advantages: easy to compute. Drawbacks: sensitive to outliers. (A fixed-point sketch follows.)
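As a sketch of how kurtosis extremization can drive a one-unit ICA algorithm, here is a minimal implementation of the well-known kurtosis-based fixed-point update on whitened data (my own illustration, not necessarily the exact algorithm discussed in the seminar):

```python
import numpy as np

def kurtosis_ica_one_unit(z, n_iter=200, tol=1e-8, seed=0):
    """Extract one component from whitened data z (dims x samples)
    via the kurtosis-based fixed-point update w <- E{z (w'z)^3} - 3w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z                                  # current projection y = w'z
        w_new = (z * y**3).mean(axis=1) - 3 * w    # fixed-point step
        w_new /= np.linalg.norm(w_new)             # keep unit norm
        if abs(abs(w_new @ w) - 1.0) < tol:        # converged (up to sign flip)
            return w_new
        w = w_new
    return w
```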
Measures of Gaussianity, 2. Negentropy: $J(y) = H(y_{\text{gauss}}) - H(y)$, where $H$ is the (differential) entropy, $H(y) = -\int p_y(\eta) \log p_y(\eta)\, d\eta$.
Among distributions of equal variance, the Gaussian has the largest entropy, meaning it is the most random distribution. Hence J(y) >= 0, and J(y) = 0 exactly when y is Gaussian.
Measures of Gaussianity, 2. Negentropy: maximize J(y). Advantages: robust. Drawbacks: computationally hard. (An approximation sketch follows.)
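Exact negentropy requires the full density of y, which is why it is computationally hard. In practice it is commonly approximated with a nonquadratic contrast, e.g. $J(y) \approx \left(E\{G(y)\} - E\{G(\nu)\}\right)^2$ with $G(u) = \log\cosh u$ and $\nu$ standard Gaussian. A sketch (NumPy assumed; estimating the Gaussian reference term by Monte Carlo is my shortcut, since it is really just a constant):

```python
import numpy as np

def negentropy_approx(y, n_ref=1_000_000, seed=0):
    """J(y) ~ (E{G(y)} - E{G(nu)})^2 with G(u) = log cosh u, nu ~ N(0, 1).
    y is standardized first; larger values mean less Gaussian."""
    y = (y - y.mean()) / y.std()
    nu = np.random.default_rng(seed).standard_normal(n_ref)
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(nu).mean()) ** 2
```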
ICA solutions: kurtosis, negentropy, maximum likelihood, infomax, mutual information. All are based on independence and/or non-Gaussianity.
ICA restrictions: the data must be non-Gaussian*; the scaling, sign and order of the components cannot be determined; and the number of components must be known. *If some components are Gaussian, the non-Gaussian independent components will still be found, but the Gaussian ones remain mixed.
ICA preprocessing: ICA itself does not reduce dimension, and the number of components must be known. But we already have a method for dimension reduction and for estimating the probable number of components: use PCA as a preprocessing step! (A whitening sketch follows.)
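A minimal PCA-whitening sketch (NumPy, eigendecomposition of the sample covariance): keeping the leading principal directions both reduces the dimension and hands ICA unit-variance, uncorrelated inputs:

```python
import numpy as np

def pca_whiten(x, n_components):
    """PCA-whiten x (dims x samples): keep the n_components leading
    principal directions and rescale them to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))   # ascending eigenvalues
    idx = np.argsort(eigval)[::-1][:n_components]
    d, e = eigval[idx], eigvec[:, idx]
    return np.diag(1.0 / np.sqrt(d)) @ e.T @ x   # whitened z with E{zz'} = I
```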
ICA preprocessing, filtering. Low-pass filtering: + reduces noise, - reduces independence. High-pass filtering: + increases independence, - increases noise.
ICA overlearning: with many more mixtures than independent components, the estimated components take on a spiky character.
Examples: cocktail party, music separation, image analysis, separation of recorded brain-activity signals, process data, noise/signal separation, process monitoring.
Cocktail party problem. [Figure: the speakers-and-microphones illustration from the introduction.]
Music separation: four sources (Source 1-4), four mixtures (Mix 1-4), four estimates (Est 1-4). Audio demo: http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi
Image analysis - NLPCA
Brain activity. [Figure: recorded signals separated into sources S1-S4.]
Process data. [Figure: four mixed signals over ~1200 samples.]
Process data. [Figure: the four whitened signals.]
Process data. [Figure: the four estimated independent components.]
Process data. [Figure: four further signal panels over the same range.]
Noise removal: four different noise sources are considered: Laplacian, Gaussian, uniform, exponential. [Figure: the clean signal to be recovered.] (A generation sketch follows.)
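For reference, the four noise types can be drawn directly from NumPy's generator (a sketch; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
noise = {
    "laplacian":   rng.laplace(size=n),       # heavy-tailed, kurt > 0
    "gaussian":    rng.standard_normal(n),    # kurt = 0
    "uniform":     rng.uniform(-1, 1, n),     # flat, kurt < 0
    "exponential": rng.exponential(size=n),   # skewed, one-sided
}
```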
Noise removal, Laplacian noise. [Figures: mixed signals, whitened signals, independent components.]
Noise removal, Gaussian noise. [Figures: mixed signals, whitened signals, independent components.]
Noise removal, uniform noise. [Figures: mixed signals, whitened signals, independent components.]
Noise removal, exponential noise. [Figures: mixed signals, whitened signals, independent components.]
Process monitoring: often done with PCA (example: monitoring variables F1, F2). One step further: use ICA!
Practical considerations: noise reduction (filtering), dimension reduction (PCA?), overlearning, choice of algorithm.
What about time signals? So far no information about time has been used: in the original ICA formulation, x is a random variable. What if x is a time signal x(t)? [Figure: an example time signal x(t).]
Time signal x(t): the time ordering is extra information, not random. We can exploit autocorrelation and cross-correlation; this extra information relaxes the assumptions, and even Gaussian data becomes tractable. (A sketch of a classical second-order approach follows.)
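One classical way to exploit time structure is an AMUSE-style method: eigendecompose a lagged covariance of the whitened signals, which separates sources with distinct autocorrelations even if they are Gaussian. A sketch under that assumption (my own illustration, not from the slides):

```python
import numpy as np

def amuse(z, lag=1):
    """Separate whitened time signals z (dims x samples) whose sources
    have distinct autocorrelations, via a lagged-covariance eigenbasis."""
    n = z.shape[1] - lag
    c = z[:, :n] @ z[:, lag:].T / n      # lagged covariance E{z(t) z(t+lag)'}
    c = (c + c.T) / 2                    # symmetrize before eigendecomposition
    _, w = np.linalg.eigh(c)             # rotation that diagonalizes it
    return w.T @ z                       # estimated sources (order/sign free)
```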
Extensions: non-linear ICA, independent subspace analysis.
Further information:
Book: Independent Component Analysis, A. Hyvärinen, J. Karhunen, E. Oja. Covers everything from novice to expert level.
Homepage: http://www.cis.hut.fi/projects/ica/ (tutorials, material, contacts, Matlab code)
Journal of Machine Learning Research special issue: http://jmlr.csail.mit.edu/papers/special/ica03.html (papers and publications)
Toolboxes and code: http://mole.imm.dtu.dk/toolbox/ica/index.html, http://www.bsp.brain.riken.jp/icalab/, http://www.cis.hut.fi/projects/ica/book/links.html