LECTURE: ICA. Rita Osadchy. Based on Lecture Notes by A. Ng


Cocktail Party
Person 1, Person 2, and Person 3 speak simultaneously; Mike 1, Mike 2, and Mike 3 record. The microphone signals are mixed speech signals:
$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t)$
$x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t)$
$x_3(t) = a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)$
Input: the microphone signals $x_1(t), x_2(t), x_3(t)$. Goal: recover the speech signals $s_1(t), s_2(t), s_3(t)$.
Demo: http://research.ics.tkk.fi/ica/cocktail/cocktail_en.cgi
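As a small illustration of this mixing model (not part of the original lecture), the NumPy sketch below simulates three hypothetical sources and mixes them with an assumed random 3x3 matrix A; the signals and matrix are invented for illustration only.

```python
# Minimal sketch (not from the lecture): simulate the cocktail-party mixing
# model x(t) = A s(t). The three "speech" sources and the mixing matrix A
# below are illustrative assumptions, not data from the linked demo.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)                  # 1000 time samples

s = np.vstack([
    np.sin(2 * np.pi * 5 * t),                   # source 1: sine
    np.sign(np.sin(2 * np.pi * 3 * t)),          # source 2: square wave
    rng.laplace(size=t.size),                    # source 3: noise
])                                               # shape (3, 1000): one row per source

A = rng.uniform(0.5, 1.5, size=(3, 3))           # unknown 3x3 mixing matrix
x = A @ s                                        # microphone signals: x_j(t) = sum_k A_jk s_k(t)
```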

ICA vs. PCA
Similar to PCA: finds a new basis to represent the data.
Different from PCA: PCA removes only correlations; ICA removes correlations and higher-order dependence. In PCA some components are more important than others (based on eigenvalues); in ICA the components are equally important.

ICA vs. PCA
PCA: principal components are orthogonal. ICA: independent components are not!

ICA vs. PCA (figure): maximal variance directions (PCA) vs. independent components (ICA).

Model
Assume data $x \in \mathbb{R}^n$ is generated by $n$ independent sources: $x = As$, where the mixing matrix $A \in \mathbb{R}^{n \times n}$ is unknown.

Model
Assume data $x^{(i)} \in \mathbb{R}^n$ is generated by $n$ independent sources (figure: two source signals plotted over time $t$). Here $s_j^{(i)}$ is the signal from source $j$ at time $i$, and $x_j^{(i)}$ is the signal at microphone $j$ at time $i$:
$x_j^{(i)} = \sum_{k=1}^{n} A_{jk}\, s_k^{(i)}$ (a sum over sources).

Problem Definition
We observe the data $x^{(i)} = A s^{(i)}$, $i = 1, \dots, m$. Goal: recover the sources $s^{(i)}$ that generated the data.
Let $W = A^{-1}$ be the unmixing matrix, with rows $w_1^T, \dots, w_n^T$. The goal is to find $W$ such that $s^{(i)} = W x^{(i)}$; then the $j$-th source can be recovered by $s_j^{(i)} = w_j^T x^{(i)}$.
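Continuing the toy simulation above, a minimal check of the unmixing step: in this sketch $A$ is known, so $W = A^{-1}$ recovers the sources exactly, whereas in real ICA $W$ must be estimated from $x$ alone.

```python
# Sketch continuing the toy example above: with A known, the unmixing matrix
# W = A^{-1} recovers each source as s_j = w_j^T x. In practice A is unknown
# and W has to be estimated from the observations x alone.
W_true = np.linalg.inv(A)            # unmixing matrix (available only in this toy setting)
s_rec = W_true @ x                   # rows: recovered sources, s_j^(i) = w_j^T x^(i)
print(np.allclose(s_rec, s))         # True: exact recovery when A is known
```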

ICA Intuition (figure): sources $s_j \sim \mathrm{Uniform}[-1, 1]$.

ICA Ambiguities
If we have no prior knowledge about the mixing matrix, then there are inherent ambiguities in $A$ that are impossible to resolve. The sources can be recovered only up to: permutation, scaling, and sign.

Permutation Ambiguity
Assume that $P$ is an $n \times n$ permutation matrix. Examples:
$P = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, $\quad P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.
If $W = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}$, then $PW = \begin{pmatrix} w_{21} & w_{22} \\ w_{11} & w_{12} \end{pmatrix}$.
Given only the $x^{(i)}$'s, we cannot distinguish between $W$ and $PW$: the permutation of the original sources is ambiguous. Not important in most applications.

Scaling Ambiguity
If $x^{(i)} = A s^{(i)}$, then replacing $A$ by $2A$ and $s^{(i)}$ by $0.5\, s^{(i)}$ gives the same observations: $x^{(i)} = (2A)(0.5\, s^{(i)})$. More generally, replacing a column $a_j$ by $\alpha a_j$ and $s_j^{(i)}$ by $s_j^{(i)}/\alpha$ leaves $x^{(i)}$ unchanged, so we cannot recover the correct scaling of the sources.
Not important in most applications: scaling a speaker's speech signal by some positive factor affects only the volume of that speaker's speech. Also, sign changes do not matter: $s_j$ and $-s_j$ sound identical when played on a speaker.
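A quick numerical confirmation of both ambiguities, again continuing the toy example: permuting or rescaling the sources while adjusting $A$ accordingly leaves the observations $x$ unchanged. The particular permutation and scaling factors below are arbitrary choices.

```python
# Sketch: a permuted or rescaled source/mixing-matrix pair produces exactly
# the same observations x, so these ambiguities cannot be resolved from x.
P = np.eye(3)[[1, 0, 2]]                               # permutation matrix (swap sources 1 and 2)
D = np.diag([2.0, 0.5, -1.0])                          # rescaling, including one sign flip

x_perm  = (A @ P.T) @ (P @ s)                          # permuted sources, adjusted mixing matrix
x_scale = (A @ np.linalg.inv(D)) @ (D @ s)             # rescaled sources, adjusted mixing matrix
print(np.allclose(x_perm, x), np.allclose(x_scale, x)) # True True
```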

Gaussian Sources are Problematic
Let $n = 2$ and $s \sim N(0, I)$. Then $x = As \sim N(0, AA^T)$, since $E[xx^T] = E[A s s^T A^T] = AA^T$.
Let $R$ be an arbitrary orthogonal matrix, so that $RR^T = R^T R = I$, and let $A' = AR$. Then $x' = A's \sim N(0, AA^T)$, since
$E[x' x'^T] = E[A' s s^T A'^T] = E[AR s s^T (AR)^T] = A R R^T A^T = AA^T$.

Gaussian Sources are Problematic
Whether the mixing matrix is $A$ or $A' = AR$, we would observe data from a $N(0, AA^T)$ distribution. Thus, there is no way to tell whether the sources were mixed using $A$ or $A'$: there is an arbitrary rotational component in the mixing matrix that cannot be determined from the data, and we cannot recover the original sources.
Reason: the Gaussian distribution is spherically symmetric. For non-Gaussian data it is possible, given enough data, to recover the $n$ independent sources.
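This rotational ambiguity can be checked numerically: for any orthogonal $R$, the matrices $A$ and $A' = AR$ induce the same covariance, so Gaussian sources mixed by either are indistinguishable. A small sketch, reusing the toy $A$ from above:

```python
# Sketch: A and A' = AR (R orthogonal) induce the same covariance AA^T,
# so Gaussian sources mixed by either matrix are indistinguishable.
R_orth, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random orthogonal matrix
A_prime = A @ R_orth
print(np.allclose(A @ A.T, A_prime @ A_prime.T))       # True: same N(0, AA^T) distribution
```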

Densities and Linear Transformations
Suppose $s \in \mathbb{R}^n$ is a random variable drawn according to $p_s(s)$, and let $x$ be the random variable defined by $x = As$. The density of $x$ is given by
$p_x(x) = p_s(Wx)\,|W|$, where $W = A^{-1}$ ($A$ is a square invertible matrix).
Example: $s \sim \mathrm{Uniform}[0,1]$, so $p_s(s) = 1\{0 \le s \le 1\}$. Let $A = 2$; then $x = 2s$. Clearly $x \sim \mathrm{Uniform}[0,2]$, and thus $p_x(x) = 0.5 \cdot 1\{0 \le x \le 2\}$.
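The 1-D example can be verified by a Monte Carlo sketch (sample size and bin count below are arbitrary choices): the empirical density of $x = 2s$ should sit close to $0.5$ on $[0, 2]$, matching $p_s(Wx)\,|W|$.

```python
# Sketch: Monte Carlo check of p_x(x) = p_s(Wx)|W| for the 1-D example,
# s ~ Uniform[0,1], x = 2s. The histogram should be close to 0.5 on [0, 2].
a, w = 2.0, 0.5                                        # A = 2, W = A^{-1} = 0.5
samples_x = a * rng.uniform(0.0, 1.0, size=100_000)
hist, _ = np.histogram(samples_x, bins=20, range=(0.0, 2.0), density=True)
print(hist.round(2))                                   # all bins ~ 0.5 = p_s(w*x) * |w|
```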

ICA Algorithm
Assume that the distribution of each source $s_i$ is $p_s(s_i)$. Since the sources are independent, the joint distribution is
$p(s) = \prod_{j=1}^{n} p_s(s_j)$.
Using the previous formulation, with $x = As$ and $W = A^{-1}$ (so $s = Wx$), we can derive
$p(x) = p_s(Wx)\,|W| = \left( \prod_{j=1}^{n} p_s(w_j^T x) \right) |W|$.
We must specify a density $p_s$ for the individual sources.

ICA Algorithm
The cumulative distribution function (cdf) of a real random variable $z$ is defined by
$F(z_0) = P(z \le z_0) = \int_{-\infty}^{z_0} p_z(z)\, dz$,
and the density of $z$ can be found by $p_z(z) = F'(z)$. So to specify a density for the $s_j$, it suffices to specify its cdf. If you have prior knowledge that the sources' densities take a certain form, use it here; otherwise, make an assumption about the cdf.

Density of $s_j$
The cdf has to be a monotonic function that increases from zero to one (figure: Gaussian CDF vs. sigmoid). We choose the sigmoid, $g(s) = 1/(1 + e^{-s})$, and set $p(s) = g'(s)$.
We assume that the data has zero mean. This is necessary because our assumption $p(s) = g'(s)$ implies $E(s) = 0$, and thus $E(x) = E(As) = 0$.

ICA Algorithm
$W$ is a parameter of our model that we want to estimate. Given a training set $\{x^{(i)};\ i = 1, \dots, m\}$, the log likelihood is
$\ell(W) = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} \log g'(w_j^T x^{(i)}) + \log |W| \right)$.
Maximize $\ell(W)$ using gradient ascent: $W := W + \alpha \nabla_W \ell(W)$, where $\alpha$ is the learning rate. Equivalently, $w_j := w_j + \alpha\, \partial \ell(W)/\partial w_j$.
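The log likelihood translates directly into code. The sketch below (my own layout, with $X$ holding one example per row and a small epsilon added purely for numerical stability) computes $\ell(W)$ under the sigmoid source model:

```python
# Sketch: l(W) = sum_i [ sum_j log g'(w_j^T x^(i)) + log|W| ] under the
# sigmoid-cdf source model. X holds one training example x^(i) per row.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(W, X):
    Z = X @ W.T                                   # Z[i, j] = w_j^T x^(i)
    g = sigmoid(Z)
    log_gprime = np.log(g * (1.0 - g) + 1e-12)    # g'(z) = g(z)(1 - g(z)); eps for stability
    return np.sum(log_gprime) + X.shape[0] * np.log(np.abs(np.linalg.det(W)))
```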

ICA Algorithm
By taking the derivatives of $\ell(W)$, using $g(s) = 1/(1 + e^{-s})$ and $g'(s) = g(s)(1 - g(s))$, we obtain the stochastic gradient ascent update rule:
$W := W + \alpha \left( \begin{pmatrix} 1 - 2 g(w_1^T x^{(i)}) \\ 1 - 2 g(w_2^T x^{(i)}) \\ \vdots \\ 1 - 2 g(w_n^T x^{(i)}) \end{pmatrix} x^{(i)T} + (W^T)^{-1} \right)$.
When the algorithm converges, compute $s^{(i)} = W x^{(i)}$ to recover the original sources.
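A minimal stochastic gradient ascent sketch of this update, reusing the sigmoid helper and the toy mixed signals from the earlier snippets; the step size, number of passes, and identity initialization are arbitrary choices, not values from the lecture.

```python
# Sketch of the stochastic gradient ascent update from the slide:
#   W := W + alpha * ( (1 - 2 g(W x^(i))) x^(i)^T + (W^T)^{-1} ),
# applied one example at a time over shuffled passes (see the Remarks slide).
def ica(X, alpha=0.01, n_passes=50, seed=0):
    shuffler = np.random.default_rng(seed)
    m, n = X.shape
    W = np.eye(n)                                 # arbitrary initialization
    for _ in range(n_passes):
        for i in shuffler.permutation(m):         # randomly shuffled pass over the data
            x_i = X[i]                            # one example, shape (n,)
            grad = np.outer(1.0 - 2.0 * sigmoid(W @ x_i), x_i) + np.linalg.inv(W.T)
            W = W + alpha * grad
    return W

W_est = ica(x.T)                                  # x from the toy example (samples as columns)
s_est = W_est @ x                                 # recovered sources, up to permutation/scale/sign
```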

Remarks
We assumed that the $x^{(i)}$, $i = 1, \dots, m$, are independent of each other. This assumption is incorrect for time series where the $x^{(i)}$'s are dependent (e.g. speech data). However, it can be shown that having correlated training examples will not hurt the performance of the algorithm if we have sufficient data. Tip: run stochastic gradient ascent on a randomly shuffled copy of the training set.

Application Domains of ICA
Blind source separation; image denoising; medical signal processing (fMRI, ECG, EEG); modelling of the hippocampus and visual cortex; feature extraction and face recognition; compression and redundancy reduction; watermarking; clustering; time series analysis (stock market, microarray data); topic extraction; econometrics (finding hidden factors in financial data).
Slide due to B. Poczos.

(Figure slide, due to B. Poczos.)

Fig. from Jung

Image Denoising (figure): original image, noisy image, Wiener filtering, ICA filtering.