SECTION 5A: Independent Component Analysis (ICA)

Independent Component Analysis (see "Independent Component Analysis: Algorithms and Applications", Hyvarinen and Oja (2000)) is a variant of Principal Component Analysis (PCA) and a strong competitor to Factor Analysis. ICA attempts to decompose complex data into independent subparts; this is also known as the blind source separation problem, or the "cocktail party" problem. It attempts to recover the source signals $S$ given only the observed mixtures $X$. It is necessary to assume independence of the source signals, i.e. the value of one signal gives no information about the values of the others.

Using the singular value decomposition $X = U D V^T$ and writing $S = \sqrt{N}\, U$ and $A^T = D V^T / \sqrt{N}$, we can write $X = S A^T$; thus each column of $X$ is a linear combination of the columns of $S$. Since $U$ is orthogonal, and assuming that the columns of $X$ each have mean zero, the columns of $S$ have zero mean, are uncorrelated, and have unit variance.

We have
$$X_i = \sum_{j=1}^{p} a_{ij} S_j, \qquad i = 1, \ldots, p,$$
or, writing $X$ and $S$ as column vectors,
$$X = A S = A R^T R S = A^* S^*$$
for any orthogonal $p \times p$ matrix $R$.

ICA assumes the $S_i$ are statistically independent (which determines all of the cross moments) rather than merely uncorrelated (which determines only the second-order cross moments). Independence implies uncorrelatedness, so ICA constrains the estimation procedure to give uncorrelated estimates of the independent components; this reduces the number of free parameters and simplifies the problem. The extra moment conditions identify $A$ uniquely.

NOTE: In Factor Analysis with $q < p$ we have
$$X_i = \sum_{j=1}^{q} a_{ij} S_j + \varepsilon_i, \qquad i = 1, \ldots, p,$$
or $X = A S + \varepsilon$, where the $S_j$ are the common factors and $\varepsilon$ represents the unique factors. ICA can be viewed as another Factor Analysis rotation method (just like varimax or quartimax): it starts essentially from a Factor Analysis solution and looks for rotations that lead to independent components.
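To make the matrix notation above concrete, here is a small R sketch (not part of the original course code; the names S.toy, A.toy, X.toy and X.white are purely illustrative). It generates two non-Gaussian sources, mixes them, centres the mixtures, and whitens them with the SVD exactly as described above, so that the whitened data have (approximately) identity covariance:

# A minimal sketch of the model X = S A^T and of SVD whitening.
set.seed(1)
N <- 1000
S.toy <- cbind(runif(N, -1, 1), sin((1:N)/20))        # two non-Gaussian sources
A.toy <- matrix(c(1, 2, 0.5, 1.5), 2, 2)              # an arbitrary mixing matrix
X.toy <- S.toy %*% t(A.toy)                           # observed mixtures
X.toy <- scale(X.toy, center = TRUE, scale = FALSE)   # give each column mean zero
sv <- svd(X.toy)                                      # X = U D V^T
X.white <- sqrt(N) * sv$u                             # sphered data, S = sqrt(N) U
round(cov(X.white), 3)                                # approximately the 2 x 2 identity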

In Factor Analysis the $S_j$ and $\varepsilon_i$ are generally assumed to be Gaussian, and orthogonal transformations of Gaussians are still Gaussian; hence the model can be estimated only up to an orthogonal transformation, and $A$ is not identifiable for independent Gaussian components. (If only one component is Gaussian the ICA model can still be estimated.)

In fact we do not want Gaussian source variables (at most one Gaussian source is allowed): if the $S_i$ are Gaussian and the mixing matrix $A$ is orthogonal, the $X_i$ will also be Gaussian, uncorrelated, and of unit variance, so the joint density is completely symmetric and carries no information about the directions of the columns of $A$; hence $A$ cannot be estimated. We avoid this identifiability problem by assuming the $S_i$ are independent and non-Gaussian, so that (because $A$ is orthogonal)
$$S = A^{-1} X = A^T X.$$

We assume $X$ has been whitened (i.e. sphered) via the SVD so that $\mathrm{Cov}(X) = I$; then $A$ is orthogonal, and solving the ICA problem means finding an orthogonal $A$ such that the components of $S = A^T X$ are independent and non-Gaussian.

Write $Y = w^T X$ with $X = A S$, and set $z = A^T w$; then
$$Y = w^T X = w^T A S = z^T S,$$
a linear combination of the $S_i$, which (by the central limit theorem) can be more Gaussian than any of the $S_i$, and is least Gaussian when it equals one of the $S_i$ (i.e. when only one of the elements of $z$ is nonzero). We therefore want to find $w$ so as to maximize the non-Gaussianity of $Y$; this corresponds (in the transformed coordinate system) to a $z$ with only one nonzero component, and then $Y = w^T X = z^T S$ is one of the independent components.

Equivalently, finding $A$ to minimize the mutual information $I(S) = I(A^T X)$ means looking for the orthogonal transformation that gives the greatest independence between components. This is equivalent to minimizing the sum of the entropies of the separate components of $Y$, which in turn is equivalent to maximizing their departures from Gaussianity (since, for a given variance, Gaussian variables have maximum entropy).
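As an illustration of "maximize the non-Gaussianity of $Y = w^T X$" (again not part of the original notes; FastICA itself uses a log cosh approximation to negentropy, introduced below), the following sketch reuses X.white from the previous sketch, scans unit vectors $w$ in the plane, and scores each projection by the absolute value of its excess kurtosis, one simple measure of non-Gaussianity:

# Score projections of the whitened toy data by |excess kurtosis|.
excess.kurtosis <- function(y) { y <- y - mean(y); mean(y^4) / mean(y^2)^2 - 3 }
angles <- seq(0, pi, length.out = 181)
scores <- sapply(angles, function(a) {
  w <- c(cos(a), sin(a))                  # a unit vector in the plane
  abs(excess.kurtosis(X.white %*% w))     # non-Gaussianity of the projection
})
angles[which.max(scores)]                 # direction of maximum non-Gaussianity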

There are two inherent ambiguities:

1. We cannot determine the variances of the independent components. Since $S$ and $A$ are both unknown, a scalar multiple of one $S_i$ can be cancelled out by dividing the corresponding column $a_i$ of $A$ by the same scalar. We therefore fix the magnitude of the independent components: since they are random variables we assume each has unit variance, and since they have been centered this means $E[S_i^2] = 1$. Note that we can multiply an independent component by $-1$ without affecting the model, so there is also an ambiguity of sign.

2. We cannot determine the order of the independent components. Since $S$ and $A$ are both unknown, we are free to change the order of the terms, putting any one of them first. A permutation matrix $P$ and its inverse can be substituted into the model to give
$$X = (A P^{-1})(P S),$$
where $A P^{-1}$ is the new unknown mixing matrix to be solved for by ICA and the elements of $P S$ are the original independent $S_i$ in a different (i.e. permuted) order.

Read some required files:

drive <- "D:"
code.dir <- paste(drive, "DATA/Data Mining R-Code", sep = "/")
data.dir <- paste(drive, "DATA/Data Mining Data", sep = "/")
source(paste(code.dir, "BorderHist.r", sep = "/"))
source(paste(code.dir, "WaveIO.r", sep = "/"))
library(fastICA)

We will create and display two signals (Figure 16):

S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200) - 100)/100), 5)
S <- cbind(S.1, S.2)
plot(S.1)
plot(S.2)

Figure 16. Original signals
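Before rotating the signals, here is a quick numeric check (not in the original notes; A.chk and P are purely illustrative) of the permutation ambiguity described above: permuting the sources and compensating in the mixing matrix leaves the observed mixtures unchanged.

# X = S A = (S P^-1)(P A): swap the two sources and compensate in the mixing matrix.
A.chk <- matrix(c(1, 0.5, 0.3, 2), 2, 2)       # a hypothetical 2 x 2 mixing matrix
P <- matrix(c(0, 1, 1, 0), 2, 2)               # permutation matrix: swap the sources
X1 <- S %*% A.chk                              # mixtures of the original sources
X2 <- (S %*% solve(P)) %*% (P %*% A.chk)       # permuted sources, compensated mixing
max(abs(X1 - X2))                              # 0: the observed data are identical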

Now rotate the two signals:

a <- pi/4
A <- matrix(c(cos(a), sin(a), -sin(a), cos(a)), 2, 2)
X <- S %*% A
plot(X[,1])
plot(X[,2])

Figure 17. Rotated signals

We then display the original and the rotated pair together with their marginal histograms (using the border.hist helper loaded earlier):

border.hist(S.1, S.2)
border.hist(X[,1], X[,2])

Figure 18. Border histograms of the original (left) and rotated (right) signals.

Now start with the mixed signals and observe what happens to the histograms as we rotate the axes onto which the signals are projected:

b <- pi/36
W <- matrix(c(cos(b), -sin(b), sin(b), cos(b)), 2, 2)
XX <- X
for (i in 1:9) {
  XX <- XX %*% W
  border.hist(XX[,1], XX[,2])
  readline("Press Enter...")
}

Figure 19. Effect of rotating the projection plane
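A rough quantitative companion to Figure 19 (not part of the original notes): re-run the same rotation, this time printing the absolute excess kurtosis of each column (the illustrative excess.kurtosis() helper is repeated here so the snippet is self-contained). The values should be smallest for the fully mixed signals and grow as the projections approach the original sources.

excess.kurtosis <- function(y) { y <- y - mean(y); mean(y^4) / mean(y^2)^2 - 3 }
XX <- X
for (i in 1:9) {
  XX <- XX %*% W
  cat(sprintf("step %d: |kurtosis| = %.3f  %.3f\n", i,
              abs(excess.kurtosis(XX[,1])), abs(excess.kurtosis(XX[,2]))))
}
# XX ends in the same state as after the plotting loop above, so the
# code that follows can be run unchanged.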

We see that for the fully mixed signals the histograms appear nearly Gaussian, and as we move through the different projections the histograms move away from normality. The resulting signals are:

plot(XX[,1])
plot(XX[,2])

Figure 20. Result of the ICA

Now consider what happens with 3 signals: a sine function, a sawtooth, and a pair of exponentials.

S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200) - 100)/100), 5)
S.3 <- rep(c(exp(seq(0, .99, .01)) - 1.845617, -exp(seq(0, .99, .01)) + 1.845617), 5)
S <- cbind(S.1, S.2, S.3)
A <- matrix(runif(9), 3, 3)    # Set a random mixing matrix
X <- S %*% A

Do an ICA on the mixed data:

a <- fastICA(X, 3, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.1086564
Iteration 2 tol = 0.004629528
Iteration 3 tol = 0.0001178137
Iteration 4 tol = 5.028182e-06

We then plot the original, mixed, and recovered data:

oldpar <- par(mfcol = c(3, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[,1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:3) { plot(1:1000, S[,i], type = "l", xlab = "", ylab = "") }
plot(1:1000, X[,1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:3) { plot(1:1000, X[,i], type = "l", xlab = "", ylab = "") }
plot(1:1000, a$S[,1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:3) { plot(1:1000, a$S[,i], type = "l", xlab = "", ylab = "") }
par(oldpar)

Figure 21. Original, mixed and recovered signals
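Because of the sign and order ambiguities discussed earlier, the recovered components in Figure 21 need not appear in the same order, sign, or scale as the originals. A quick check (not part of the original notes) matches each estimate to a source by maximum absolute correlation:

# Correlations between the ICA estimates (rows) and the original sources (columns).
cc <- cor(a$S, S)
round(cc, 2)
apply(abs(cc), 2, which.max)    # which estimate best matches each source
apply(abs(cc), 2, max)          # strength of the match; values near 1 mean good recovery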

Repeat the process with four signals:

S.1 <- sin((1:1000)/20)
S.2 <- rep((((1:200) - 100)/100), 5)
s.3 <- tan(seq(-pi/2 + .1, pi/2 - .1, .0118))
S.3 <- rep(s.3, 4)
S.4 <- rep(c(exp(seq(0, .99, .01)) - 1.845617, -exp(seq(0, .99, .01)) + 1.845617), 5)
S <- cbind(S.1, S.2, S.3, S.4)
(A <- matrix(runif(16), 4, 4))
          [,1]       [,2]        [,3]      [,4]
[1,] 0.4091777 0.79526756 0.773487999 0.7201944
[2,] 0.1084712 0.03256865 0.151097684 0.2899303
[3,] 0.8920621 0.69775810 0.281228361 0.1156242
[4,] 0.4683415 0.91346105 0.003911073 0.1033929
X <- S %*% A
a <- fastICA(X, 4, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
Centering
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.3458911
Iteration 2 tol = 0.007638039
Iteration 3 tol = 0.001150413
Iteration 4 tol = 0.0003499578
Iteration 5 tol = 9.909304e-05

oldpar <- par(mfcol = c(4, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[,1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:4) { plot(1:1000, S[,i], type = "l", xlab = "", ylab = "") }
plot(1:1000, X[,1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:4) { plot(1:1000, X[,i], type = "l", xlab = "", ylab = "") }
plot(1:1000, a$S[,1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:4) { plot(1:1000, a$S[,i], type = "l", xlab = "", ylab = "") }
par(oldpar)

Figure 22. Original, mixed, and ICA-recovered signals for the four-source example

For this example we will look at three mixtures of the 4 signals (note the warning messages):

A <- matrix(runif(12), 4, 3)
X <- S %*% A
a <- fastICA(X, 4, alg.typ = "parallel", fun = "logcosh", alpha = 1,
             method = "R", row.norm = FALSE, maxit = 200, tol = 0.0001,
             verbose = TRUE)
n.comp is too large
n.comp set to 3
Centering
Whitening
Symmetric FastICA using logcosh approx. to neg-entropy function
Iteration 1 tol = 0.1473840
Iteration 2 tol = 0.003145043
Iteration 3 tol = 1.781576e-05

oldpar <- par(mfcol = c(4, 3), mar = c(2, 2, 2, 1))
plot(1:1000, S[,1], type = "l", main = "Original Signals", xlab = "", ylab = "")
for (i in 2:4) { plot(1:1000, S[,i], type = "l", xlab = "", ylab = "") }
plot(1:1000, X[,1], type = "l", main = "Mixed Signals", xlab = "", ylab = "")
for (i in 2:3) { plot(1:1000, X[,i], type = "l", xlab = "", ylab = "") }
plot(0, type = "n")    # Dummy to fill the panel
plot(1:1000, a$S[,1], type = "l", main = "ICA source estimates", xlab = "", ylab = "")
for (i in 2:3) { plot(1:1000, a$S[,i], type = "l", xlab = "", ylab = "") }
plot(0, type = "n")    # Dummy to fill the panel
par(oldpar)

Figure 23. Original signals, the three mixtures, and the three ICA source estimates
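With only three mixtures of four sources the problem is under-determined, so at most three components can be estimated and they cannot all correspond cleanly to individual sources. The same correlation check as before (not part of the original notes) shows how the three estimates relate to the four originals; typically at least one estimate has no correlation close to ±1 with any single source:

cc <- cor(a$S, S)                 # 3 estimates (rows) vs 4 sources (columns)
round(cc, 2)
apply(abs(cc), 1, max)            # best absolute correlation achieved by each estimate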

The next example uses ICA on sounds. It is based on a demonstration from the Laboratory of Computer and Information Science (CIS) of the Department of Computer Science and Engineering at Helsinki University of Technology:
http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi

For this example we will need to read and write .wav files. A .wav file has the basic structure described in the next function:

read.wav <- function(d.file) {
  zz <- file(d.file, "rb")                    # Open binary file for reading
  # RIFF chunk
  RIFF <- readChar(zz, 4)                     # Word "RIFF" (4)
  file.len <- readBin(zz, integer(), 1)       # Number of bytes in file (4)
  WAVE <- readChar(zz, 4)                     # Word "WAVE" (4)
  # FORMAT chunk
  fmt <- readChar(zz, 4)                      # Word "fmt " (4)
  len.of.format <- readBin(zz, integer(), 1)  # Format length (4)
  f.one <- readBin(zz, integer(), 1, size = 2)            # Number 1 (2)
  Channel.numbs <- readBin(zz, integer(), 1, size = 2)    # Number of channels (2)
  Sample.Rate <- readBin(zz, integer(), 1)                # Sample rate (4)
  Bytes.P.Sec <- readBin(zz, integer(), 1)                # Bytes/sec (4)
  Bytes.P.Sample <- readBin(zz, integer(), 1, size = 2)   # Bytes/sample (2)
  Bits.P.Sample <- readBin(zz, integer(), 1, size = 2)    # Bits/sample (2)
  # DATA chunk
  DATA <- readChar(zz, 4)                     # Word "data" (4)
  data.len <- readBin(zz, integer(), 1)       # Length of data (4)
  bias <- 2^(Bits.P.Sample - 1)
  wav.data <- rep(0, data.len)                # Create a place to store the data
  # Read the data based on the above parameters
  wav.data <- readBin(zz, integer(), data.len, size = Bytes.P.Sample, signed = FALSE)
  close(zz)                                   # Close the file
  wav.data <- wav.data - bias                 # Shift based on the bias
  # Return the information to R
  list(RIFF = RIFF, File.Len = file.len, WAVE = WAVE, format = fmt,
       len.of.format = len.of.format, f.one = f.one, Channel.numbs = Channel.numbs,
       Sample.Rate = Sample.Rate, Bytes.P.Sec = Bytes.P.Sec,
       Bytes.P.Sample = Bytes.P.Sample, Bits.P.Sample = Bits.P.Sample,
       DATA = DATA, data.len = data.len, data = wav.data)
}

Set up variables for the data and create the file names for the input, mixed, and output files:

numb.source <- 9
in.file <- matrix(0, numb.source, 1)
mix.file <- matrix(0, numb.source, 1)
out.file <- matrix(0, numb.source, 1)
for (i in 1:numb.source) {
  in.file[i,] <- paste(data.dir, "/source", i, ".wav", sep = "")
  mix.file[i,] <- paste(data.dir, "/m", i, ".wav", sep = "")
  out.file[i,] <- paste(data.dir, "/s", i, ".wav", sep = "")
}
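The companion write.wav() function used later in this example is loaded from WaveIO.r and is not reproduced in the notes. The following is only a rough sketch of what such a writer might look like for the 8-bit mono PCM files used here, built from the same header fields that read.wav returns; it is an illustration, not the actual WaveIO.r code (hence the different name):

write.wav.sketch <- function(d.file, wav) {
  # Rough sketch only; assumes 8-bit mono PCM (Bits.P.Sample = 8) and a
  # little-endian platform, as read.wav above does.  Not the WaveIO.r code.
  zz <- file(d.file, "wb")
  bias <- 2^(wav$Bits.P.Sample - 1)
  dat <- round(wav$data + bias)                        # undo the bias applied on read
  dat <- as.integer(pmin(pmax(dat, 0), 2 * bias - 1))  # clip to the valid 0..255 range
  # RIFF chunk
  writeChar("RIFF", zz, 4, eos = NULL)
  writeBin(as.integer(36 + length(dat)), zz, size = 4) # file length field
  writeChar("WAVE", zz, 4, eos = NULL)
  # FORMAT chunk
  writeChar("fmt ", zz, 4, eos = NULL)
  writeBin(as.integer(16), zz, size = 4)               # format chunk length
  writeBin(as.integer(1), zz, size = 2)                # PCM format tag
  writeBin(as.integer(wav$Channel.numbs), zz, size = 2)
  writeBin(as.integer(wav$Sample.Rate), zz, size = 4)
  writeBin(as.integer(wav$Bytes.P.Sec), zz, size = 4)
  writeBin(as.integer(wav$Bytes.P.Sample), zz, size = 2)
  writeBin(as.integer(wav$Bits.P.Sample), zz, size = 2)
  # DATA chunk
  writeChar("data", zz, 4, eos = NULL)
  writeBin(as.integer(length(dat)), zz, size = 4)      # data length in bytes
  writeBin(as.raw(dat), zz)                            # the 8-bit samples themselves
  close(zz)
}

The notes themselves call write.wav() from WaveIO.r, so this sketch is not needed to run the example.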

Read each of the nine source files into a list:

in.wav <- {}
for (m in 1:numb.source) {
  in.wav <- c(in.wav, list(read.wav(in.file[m,])))
}

We can look at the characteristics of a file with:

wav.char <- function(wav) {
  cat("RIFF", wav$RIFF, "\n")
  cat("Length", wav$File.Len, "\n")
  cat("Wave", wav$WAVE, "\n")
  cat("Format", wav$format, "\n")
  cat("Format Length", wav$len.of.format, "\n")
  cat("One", wav$f.one, "\n")
  cat("Number of Channels", wav$Channel.numbs, "\n")
  cat("Sample Rate", wav$Sample.Rate, "\n")
  cat("Bytes/Sec", wav$Bytes.P.Sec, "\n")
  cat("Bytes/Sample", wav$Bytes.P.Sample, "\n")
  cat("Bits/Sample", wav$Bits.P.Sample, "\n")
  cat("Data", wav$DATA, "\n")
  cat("Data Length", wav$data.len, "\n")
}

wav.char(in.wav[[1]])
RIFF RIFF
Length 50036
Wave WAVE
Format fmt
Format Length 16
One 1
Number of Channels 1
Sample Rate 8000
Bytes/Sec 8000
Bytes/Sample 1
Bits/Sample 8
Data data
Data Length 50000

Set up a random matrix for mixing:

A <- matrix(runif(numb.source * numb.source), numb.source, numb.source)

We will create a matrix (50000 x 9) that has one source in each column:

mixed <- {}
for (i in 1:numb.source) {
  mixed <- cbind(mixed, in.wav[[i]]$data)
}

We multiply by the 9 x 9 mixing matrix to produce a new (50000 x 9) matrix in which each column is a mixture of the 9 columns of the original matrix:

mixed <- mixed %*% A

We now plot the resulting waveforms (Figure 24).

old.par <- par(mfcol = c(numb.source, 1))
par(mar = c(2, 2, 2, 2) + 0.1)
plot(mixed[,1], type = "l", main = "Mixed")
for (m in 2:numb.source) {
  plot(mixed[,m], type = "l")
}
if (dev.cur()[[1]] != 1) bringToTop(which = dev.cur())
par(old.par)

Figure 24. The 9 mixed signals

In order to save a signal as a .wav file we need the header information. We cheat a little by simply reusing the in.wav headers and replacing the data part with the mixed data. The first part of the following code creates the mix.wav list from the in.wav list, and the second part replaces the data and writes out each mixed file:

mix.wav <- {}
for (m in 1:numb.source) {
  mix.wav <- c(mix.wav, list(in.wav[[m]]))
}
for (m in 1:numb.source) {
  mix.wav[[m]]$data <- mixed[,m]
  write.wav(mix.file[m,], mix.wav[[m]])
}

# Play them
Use the sound library to play the mixed sounds:

library(sound)
play(mix.file[1,])
play(mix.file[2,])
play(mix.file[3,])
play(mix.file[4,])
play(mix.file[5,])
play(mix.file[6,])
play(mix.file[7,])
play(mix.file[8,])
play(mix.file[9,])

# Unmix them
We will use fastICA to unmix the signals, then save and play the results as we did for the mixed signals:

mixed.all <- {}
for (i in 1:numb.source) {
  mixed.all <- cbind(mixed.all, mixed[,i])
}
ICA.wavs <- fastICA(mixed.all, numb.source, alg.typ = "parallel", fun = "logcosh",
                    alpha = 1, method = "R", row.norm = FALSE, maxit = 200,
                    tol = 0.0001, verbose = TRUE)

# Save them
new.wav <- {}
for (m in 1:numb.source) {
  new.wav <- c(new.wav, list(in.wav[[m]]))
}
for (m in 1:numb.source) {
  new.wav[[m]]$data <- 5 * ICA.wavs$S[,m]
  write.wav(out.file[m,], new.wav[[m]])
}

# Play them
play(out.file[1,])

play(out.file[2,])
play(out.file[3,])
play(out.file[4,])
play(out.file[5,])
play(out.file[6,])
play(out.file[7,])
play(out.file[8,])
play(out.file[9,])

# Plot them
old.par <- par(mfcol = c(numb.source, 3))
par(mar = c(2, 2, 2, 2) + 0.1)
plot(in.wav[[1]]$data, type = "l", main = "Original")
for (m in 2:numb.source) {
  plot(in.wav[[m]]$data, type = "l")
}
plot(mixed[,1], type = "l", main = "Mixed")
for (m in 2:numb.source) {
  plot(mixed[,m], type = "l")
}
plot(ICA.wavs$S[,1], type = "l")
if (dev.cur()[[1]] != 1) bringToTop(which = dev.cur())
for (m in 2:numb.source) {
  plot(ICA.wavs$S[,m], type = "l")
}
par(old.par)

Figure 25. Original, mixed, and ICA-recovered sound signals

# Original - play
play(in.file[1,])
play(in.file[2,])
play(in.file[3,])
play(in.file[4,])
play(in.file[5,])
play(in.file[6,])
play(in.file[7,])
play(in.file[8,])
play(in.file[9,])
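A closing remark on the factor of 5 used when writing the unmixed signals above: ICA returns components of (roughly) unit variance, so their amplitude bears no relation to the original recording level, and a fixed factor is somewhat arbitrary. An alternative, sketched below under the assumption that write.wav() adds the 2^(Bits.P.Sample - 1) bias back just as read.wav() removed it, is to rescale each recovered component to nearly the full 8-bit range before writing it out:

# Illustrative alternative to the fixed factor of 5 (not in the original notes).
for (m in 1:numb.source) {
  s.hat <- ICA.wavs$S[,m]
  new.wav[[m]]$data <- round(s.hat / max(abs(s.hat)) * 120)   # keep within +/- 127
  write.wav(out.file[m,], new.wav[[m]])
}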