Independent Component Analysis
PhD Seminar, Jörgen Ungh
Agenda: Background & motivation, Independence, ICA vs. PCA, Gaussian data, ICA theory, Examples
Background & motivation: the cocktail party problem. [Figure: three speakers s1, s2, s3 talk simultaneously ("Bla bla", "Hi hi"); three microphones record the mixtures x1, x2, x3.]
Cocktail party problem. Let s1(t), s2(t) and s3(t) be the original spoken signals, and let x1(t), x2(t) and x3(t) be the recorded signals. The connection between s and x can be written
x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)
Goal: estimate s1, s2 and s3 from x1, x2 and x3. Problem: we know nothing about the right-hand side (neither the sources nor the mixing coefficients a_ij). A small simulation of this mixing model follows.
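As a minimal sketch of the model above (NumPy assumed; the sine, square-wave and Laplacian "sources" are illustrative stand-ins for speech, not from the seminar):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 1000)

# Three hypothetical source signals s1(t), s2(t), s3(t).
s = np.vstack([np.sin(2 * t),               # s1: smooth tone
               np.sign(np.sin(3 * t)),      # s2: square wave
               rng.laplace(size=t.size)])   # s3: spiky noise

# Unknown square mixing matrix A with entries a_ij.
A = rng.uniform(0.5, 2.0, size=(3, 3))

# Each microphone records a weighted sum of all sources:
# x_i(t) = a_i1*s1(t) + a_i2*s2(t) + a_i3*s3(t)
x = A @ s   # shape (3, 1000): the only data ICA gets to see
```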
Cocktail party problem, example: Microphone 1, Microphone 2, Separated 1, Separated 2. Audio demo: http://www.cnl.salk.edu/~tewon/blind/blind_audio.html
"Today we celebrate our Independence Day!" - US President Thomas J. Whitmore (Bill Pullman) in Independence Day (1996)
Independence: what is it? A first guess: independence = uncorrelatedness?
Definitions. Covariance: $C_{xy} = E\{(x - m_x)(y - m_y)^T\}$. Correlation: $R_{xy} = E\{x y^T\}$. If $m_x = m_y = 0$, then $C_{xy} = R_{xy}$.
Uncorrelated. Two vectors are uncorrelated if $C_{xy} = E\{(x - m_x)(y - m_y)^T\} = 0$, equivalently $R_{xy} = E\{x y^T\} = E\{x\} E\{y\}^T = m_x m_y^T$. If $m_x = m_y = 0$, then $C_{xy} = R_{xy} = 0$. From now on we assume zero-mean variables.
Independent. Vectors x, y are independent if $p_{x,y}(x, y) = p_x(x)\, p_y(y)$, which also gives $E\{g_x(x)\, g_y(y)\} = E\{g_x(x)\}\, E\{g_y(y)\}$, where $g_x$ and $g_y$ are arbitrary functions of x and y.
Independent. Independence is stronger than uncorrelatedness! Uncorrelated: $E\{x y^T\} = E\{x\} E\{y^T\}$. Independent: $E\{g_x(x)\, g_y(y)\} = E\{g_x(x)\}\, E\{g_y(y)\}$ for arbitrary $g_x$, $g_y$. The two conditions coincide only for linear functions of x and y.
Independent vs. uncorrelated. [Figure: two scatter plots of (x, y) samples.] Are x and y uncorrelated? Answer: YES in both plots.
[Same two scatter plots.] Are x and y independent? Answer: YES in the first plot, NO in the second.
Relations: independence implies uncorrelatedness, BUT uncorrelatedness does not imply independence. (A numerical illustration follows.)
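A minimal NumPy check of this asymmetry (my own illustrative example, not from the slides): with x symmetric around zero and y = x^2, the pair is uncorrelated yet fully dependent, which a nonlinear test function reveals:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100_000)
y = x**2                        # y is a deterministic function of x

# Uncorrelated: E{xy} - E{x}E{y} = E{x^3} = 0 for symmetric x.
print(np.cov(x, y)[0, 1])       # ~0

# Not independent: choose nonlinear g_x, g_y and compare
# E{g_x(x) g_y(y)} against E{g_x(x)} E{g_y(y)}.
gx, gy = x**2, y
print(np.mean(gx * gy), np.mean(gx) * np.mean(gy))   # ~0.20 vs ~0.11
```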
ICA vs. PCA: Independent components vs. Principal components.
PCA. Goal: project the data onto an orthonormal basis with maximum variance; the data are explained by the principal components. [Figure: data cloud with principal directions e1, e2.]
PCA uses information up to the second moment, i.e. the mean and variance/covariance. It can reduce the dimension of the data, and it yields an orthonormal basis of uncorrelated vectors.
ICA. Goal: find the independent sources; the data are explained by independent components. [Figure: data cloud with independent directions e1, e2 in the (x, y) plane.]
ICA uses information beyond the second moment, i.e. higher-order statistics such as kurtosis and skewness. It does not reduce the dimension of the data, and it yields a basis of independent vectors.
ICA vs. PCA: independence is the stronger requirement. In the case of Gaussian data, ICA reduces to PCA. (A sketch contrasting the two follows.)
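A sketch of the contrast (scikit-learn assumed available; the uniform sources and mixing matrix are my own illustration): PCA finds orthogonal maximum-variance directions from second-order statistics only, while FastICA rotates further using non-Gaussianity:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(2)
s = rng.uniform(-1, 1, size=(5000, 2))        # independent, non-Gaussian sources
x = s @ np.array([[2.0, 1.0], [1.0, 1.0]])    # linearly mixed observations

pca = PCA(n_components=2)
ica = FastICA(n_components=2, random_state=0)

y_pca = pca.fit_transform(x)   # uncorrelated, variance-ordered components
y_ica = ica.fit_transform(x)   # components made as independent as possible
```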
Gaussian data
Gaussian distribution. Definition: $f_x(x) = \frac{1}{(2\pi)^{N/2} |C|^{1/2}} \exp\left(-\tfrac{1}{2}(x-\mu)^T C^{-1} (x-\mu)\right)$, where $C$ is the covariance matrix and $\mu$ the mean vector. It is explained completely by first- and second-order statistics, i.e. the mean and the (co)variances.
Gaussian data: because of the distribution's symmetry, a rotation of the basis cannot be identified.
Gaussian distribution: completely defined by its first and second moments, so uncorrelated Gaussian data are also independent. Why assume Gaussian data at all?
Central limit theorem. Definition: a sum of independent random variables tends to be Gaussian. This is the argument behind many Gaussianity assumptions.
Central limit theorem. Definition: a sum of independent random variables tends to be Gaussian. What if we put it the other way around?
Central limit theorem, 2nd formulation: a mixture of two or more independent random variables is more Gaussian than the random variables themselves.
[Figure: histogram of a single uniformly distributed random variable.]
[Figure: histogram of a mixture of two uniformly distributed variables.]
Idea! The observed mixtures should be more Gaussian than the original components; equivalently, the original components are less Gaussian than the mixtures. So by maximizing the non-Gaussianity of the data we should get closer to the original components. (A numerical check follows.)
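This intuition is easy to check numerically. A minimal sketch (NumPy assumed; the uniform sources and mixing weights are my own illustration): the excess kurtosis of a mixture lies closer to the Gaussian value 0 than that of the sources:

```python
import numpy as np

def excess_kurtosis(y):
    """kurt(y) = E{y^4} - 3*(E{y^2})^2, which is zero for Gaussian y."""
    y = y - y.mean()
    return np.mean(y**4) - 3 * np.mean(y**2) ** 2

rng = np.random.default_rng(3)
s1 = rng.uniform(-1, 1, 100_000)   # uniform source: kurt < 0
s2 = rng.uniform(-1, 1, 100_000)
mix = 0.6 * s1 + 0.8 * s2          # weights chosen so the variance is unchanged

print(excess_kurtosis(s1))    # about -0.13
print(excess_kurtosis(s2))    # about -0.13
print(excess_kurtosis(mix))   # about -0.07: closer to the Gaussian value 0
```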
ICA theory: problem definition, solution, preprocessing, different methods, examples.
ICA: definition of the problem. Let s1(t), s2(t) and s3(t) be the original signals, and let x1(t), x2(t) and x3(t) be the collected signals, with
x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)
Goal: estimate s1, s2 and s3 from x1, x2 and x3.
ICA assumptions: the sources are independent and non-Gaussian, and the mixing matrix is square.
ICA idea: maximize the non-Gaussianity of the data! For that we need a measure of Gaussianity (or non-Gaussianity).
Measures of Gaussianity, 1. Kurtosis: $\text{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$, assuming zero-mean variables.
Assuming zero mean and unit variance, this simplifies to $\text{kurt}(y) = E\{y^4\} - 3$.
For Gaussian data we have $E\{y^4\} = 3\left(E\{y^2\}\right)^2$, which gives kurt(y) = 0. For most other distributions, kurt(y) is nonzero, either positive or negative.
Measures of Gaussianity, 1. Kurtosis: maximize |kurt(y)|. Advantages: easy to compute. Drawbacks: sensitive to outliers. (A fixed-point sketch follows.)
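As a sketch of how kurtosis extremization can drive a one-unit ICA algorithm, here is a minimal implementation of the well-known kurtosis-based fixed-point update on whitened data (my own illustration, not necessarily the exact algorithm discussed in the seminar):

```python
import numpy as np

def kurtosis_ica_one_unit(z, n_iter=200, tol=1e-8, seed=0):
    """Extract one component from whitened data z (dims x samples)
    via the kurtosis-based fixed-point update w <- E{z (w'z)^3} - 3w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z                                  # current projection y = w'z
        w_new = (z * y**3).mean(axis=1) - 3 * w    # fixed-point step
        w_new /= np.linalg.norm(w_new)             # keep unit norm
        if abs(abs(w_new @ w) - 1.0) < tol:        # converged (up to sign flip)
            return w_new
        w = w_new
    return w
```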
Measures of Gaussianity, 2. Negentropy: $J(y) = H(y_{\text{gauss}}) - H(y)$, where $H$ is the (differential) entropy, $H(y) = -\int p_y(\eta) \log p_y(\eta)\, d\eta$.
Among distributions of equal variance, the Gaussian has the largest entropy, meaning it is the most random distribution. Hence J(y) >= 0, and J(y) = 0 exactly when y is Gaussian.
Measures of Gaussianity, 2. Negentropy: maximize J(y). Advantages: robust. Drawbacks: computationally hard. (An approximation sketch follows.)
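Exact negentropy requires the full density of y, which is why it is computationally hard. In practice it is commonly approximated with a nonquadratic contrast, e.g. $J(y) \approx \left(E\{G(y)\} - E\{G(\nu)\}\right)^2$ with $G(u) = \log\cosh u$ and $\nu$ standard Gaussian. A sketch (NumPy assumed; estimating the Gaussian reference term by Monte Carlo is my shortcut, since it is really just a constant):

```python
import numpy as np

def negentropy_approx(y, n_ref=1_000_000, seed=0):
    """J(y) ~ (E{G(y)} - E{G(nu)})^2 with G(u) = log cosh u, nu ~ N(0, 1).
    y is standardized first; larger values mean less Gaussian."""
    y = (y - y.mean()) / y.std()
    nu = np.random.default_rng(seed).standard_normal(n_ref)
    G = lambda u: np.log(np.cosh(u))
    return (G(y).mean() - G(nu).mean()) ** 2
```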
ICA solutions: kurtosis, negentropy, maximum likelihood, infomax, mutual information. All are based on independence and/or non-Gaussianity.
ICA restrictions: the data must be non-Gaussian*; the scaling, sign and order of the components cannot be determined; and the number of components must be known. *If some components are Gaussian, the non-Gaussian independent components will still be found, but the Gaussian ones remain mixed.
ICA preprocessing: ICA itself does not reduce dimension, and the number of components must be known. But we already have a method for dimension reduction and for estimating the probable number of components: use PCA as a preprocessing step! (A whitening sketch follows.)
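A minimal PCA-whitening sketch (NumPy, eigendecomposition of the sample covariance): keeping the leading principal directions both reduces the dimension and hands ICA unit-variance, uncorrelated inputs:

```python
import numpy as np

def pca_whiten(x, n_components):
    """PCA-whiten x (dims x samples): keep the n_components leading
    principal directions and rescale them to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))   # ascending eigenvalues
    idx = np.argsort(eigval)[::-1][:n_components]
    d, e = eigval[idx], eigvec[:, idx]
    return np.diag(1.0 / np.sqrt(d)) @ e.T @ x   # whitened z with E{zz'} = I
```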
ICA preprocessing, filtering. Low-pass filtering: + reduces noise, - reduces independence. High-pass filtering: + increases independence, - increases noise.
ICA overlearning: with many more mixtures than independent components, the estimated components take on a spiky character.
Examples: cocktail party, music separation, image analysis, separation of recorded brain-activity signals, process data, noise/signal separation, process monitoring.
Cocktail party problem. [Figure: the speakers-and-microphones illustration from the introduction.]
Music separation: four sources (Source 1-4), four mixtures (Mix 1-4), four estimates (Est 1-4). Audio demo: http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi
Image analysis - NLPCA
Brain activity. [Figure: recorded signals separated into sources S1-S4.]
Process data. [Figure: four mixed signals over ~1200 samples.]
Process data. [Figure: the four whitened signals.]
Process data. [Figure: the four estimated independent components.]
Process data. [Figure: four further signal panels over the same range.]
Noise removal: four different noise sources are considered: Laplacian, Gaussian, uniform, exponential. [Figure: the clean signal to be recovered.] (A generation sketch follows.)
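For reference, the four noise types can be drawn directly from NumPy's generator (a sketch; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
noise = {
    "laplacian":   rng.laplace(size=n),       # heavy-tailed, kurt > 0
    "gaussian":    rng.standard_normal(n),    # kurt = 0
    "uniform":     rng.uniform(-1, 1, n),     # flat, kurt < 0
    "exponential": rng.exponential(size=n),   # skewed, one-sided
}
```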
Noise removal, Laplacian noise. [Figures: mixed signals, whitened signals, independent components.]
Noise removal, Gaussian noise. [Figures: mixed signals, whitened signals, independent components.]
Noise removal, uniform noise. [Figures: mixed signals, whitened signals, independent components.]
Noise removal, exponential noise. [Figures: mixed signals, whitened signals, independent components.]
Process monitoring: often done with PCA (example: monitoring variables F1, F2). One step further: use ICA!
Practical considerations: noise reduction (filtering), dimension reduction (PCA?), overlearning, choice of algorithm.
What about time signals? So far no information about time has been used: in the original ICA formulation, x is a random variable. What if x is a time signal x(t)? [Figure: an example time signal x(t).]
Time signal x(t): the time ordering is extra information, not random. We can exploit autocorrelation and cross-correlation; this extra information relaxes the assumptions, and even Gaussian data becomes tractable. (A sketch of a classical second-order approach follows.)
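One classical way to exploit time structure is an AMUSE-style method: eigendecompose a lagged covariance of the whitened signals, which separates sources with distinct autocorrelations even if they are Gaussian. A sketch under that assumption (my own illustration, not from the slides):

```python
import numpy as np

def amuse(z, lag=1):
    """Separate whitened time signals z (dims x samples) whose sources
    have distinct autocorrelations, via a lagged-covariance eigenbasis."""
    n = z.shape[1] - lag
    c = z[:, :n] @ z[:, lag:].T / n      # lagged covariance E{z(t) z(t+lag)'}
    c = (c + c.T) / 2                    # symmetrize before eigendecomposition
    _, w = np.linalg.eigh(c)             # rotation that diagonalizes it
    return w.T @ z                       # estimated sources (order/sign free)
```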
Extensions: non-linear ICA, independent subspace analysis.
Further information:
Book: Independent Component Analysis, A. Hyvärinen, J. Karhunen, E. Oja. Covers everything from novice to expert level.
Homepage: http://www.cis.hut.fi/projects/ica/ (tutorials, material, contacts, Matlab code)
Journal of Machine Learning Research special issue: http://jmlr.csail.mit.edu/papers/special/ica03.html (papers and publications)
Toolboxes and code: http://mole.imm.dtu.dk/toolbox/ica/index.html, http://www.bsp.brain.riken.jp/icalab/, http://www.cis.hut.fi/projects/ica/book/links.html