Independent Component Analysis. PhD Seminar Jörgen Ungh


Agenda: Background & motivation; Independence; ICA vs. PCA; Gaussian data; ICA theory; Examples

Background & motivation: the cocktail party problem. Several people speak at the same time (source signals s1, s2, s3) while several microphones record the mixed sound (recordings x1, x2, x3).

Cocktail party problem. Let s1(t), s2(t) and s3(t) be the original spoken signals and let x1(t), x2(t) and x3(t) be the recorded signals. The connection between s and x can be written:
x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)
Goal: estimate s1, s2 and s3 from x1, x2 and x3. Problem: we know nothing about the right-hand side, neither the sources sj nor the mixing coefficients aij.
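As a minimal numerical sketch of this mixing model (the signal shapes, sample count and mixing matrix below are made up purely for illustration), the recordings can be simulated as x = A s:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Three hypothetical source signals standing in for the spoken signals s1, s2, s3
s1 = np.sin(2 * t)                       # sinusoid
s2 = np.sign(np.sin(3 * t))              # square wave
s3 = rng.laplace(size=t.size)            # spiky, super-Gaussian noise

S = np.c_[s1, s2, s3]                    # sources, shape (n_samples, 3)
A = np.array([[1.0, 0.5, 0.3],           # unknown mixing coefficients a_ij
              [0.6, 1.0, 0.4],
              [0.2, 0.7, 1.0]])

X = S @ A.T                              # recorded signals: x_i(t) = sum_j a_ij * s_j(t)
```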

Cocktail party problem, example: Microphone 1, Microphone 2, Separated 1, Separated 2 (audio demo: http://www.cnl.salk.edu/~tewon/blind/blind_audio.html)

"Today we celebrate our independence day" - US President Thomas J. Whitmore (Bill Pullman) in Independence Day (1996)

Independence: what is it? Is independence the same thing as uncorrelatedness?

Definitions. Covariance: C_xy = E{(x - m_x)(y - m_y)^T}. Correlation: R_xy = E{x y^T}. If m_x = m_y = 0, then C_xy = R_xy.

Uncorrelated. Two vectors are uncorrelated if C_xy = E{(x - m_x)(y - m_y)^T} = 0, or equivalently R_xy = E{x y^T} = E{x} E{y}^T = m_x m_y^T. If m_x = m_y = 0, then C_xy = R_xy = 0. From now on we assume zero-mean variables.

Independent. Vectors x, y are independent if p_{x,y}(x, y) = p_x(x) p_y(y), which also gives E{g_x(x) g_y(y)} = E{g_x(x)} E{g_y(y)}, where g_x and g_y are arbitrary functions of x and y.

Independent. Independence is stronger than uncorrelatedness! Uncorrelatedness only requires E{x y^T} = E{x} E{y^T}, while independence requires E{g_x(x) g_y(y)} = E{g_x(x)} E{g_y(y)} for arbitrary functions g_x and g_y; the two conditions coincide when g_x and g_y are linear.
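A small numeric illustration of the difference (my own example, not from the seminar): with x uniform on [-1, 1] and y = x^2, the pair is uncorrelated but obviously dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2                                # y is a deterministic function of x

print(np.cov(x, y)[0, 1])                 # ~0: x and y are uncorrelated
# Choose g_x(x) = x^2 and g_y(y) = y: the expectation no longer factorizes,
# so x and y are not independent
print(np.mean(x**2 * y) - np.mean(x**2) * np.mean(y))   # ~0.089, clearly nonzero
```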

Independent vs. uncorrelated (scatterplot quiz). Two scatterplots of (x, y) are shown: one generated from independent variables and one from uncorrelated but dependent variables. Are x and y uncorrelated? Yes in both cases. Are x and y independent? Yes in the first case, no in the second.

Relations: Independent implies Uncorrelated, BUT Uncorrelated does NOT imply Independent.

ICA vs. PCA: Independent components vs. Principal components.

PCA. Goal: project the data onto an orthonormal basis that captures maximum variance. The data are explained by the principal components e1, e2, ...

PCA uses information only up to the second moment, i.e. the mean and the variance/covariance. It can reduce the dimension of the data, and it yields an orthonormal basis of uncorrelated components.

ICA. Goal: find the independent sources. The data are explained by independent components.

ICA uses information beyond the second moment, i.e. higher-order statistics such as kurtosis and skewness. It does not reduce the dimension of the data, and it yields a basis of independent components.
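As a quick look at the statistics involved (a sketch using scipy; X is assumed to be the mixed-signal array from the mixing sketch above):

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Second-order statistics: all that PCA uses
print(np.mean(X, axis=0))
print(np.cov(X, rowvar=False))

# Higher-order statistics: what ICA additionally exploits
print(skew(X, axis=0))        # third-order: asymmetry
print(kurtosis(X, axis=0))    # fourth-order: excess kurtosis, 0 for a Gaussian
```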

ICA vs. PCA: independence is the stronger requirement. In the case of Gaussian data, ICA = PCA.

Gaussian data

Gaussian distribution. Definition: f_x(x) = (2π)^(-N/2) |C|^(-1/2) exp( -(1/2) (x - µ)^T C^(-1) (x - µ) ), where C is the covariance matrix and µ is the mean vector. The distribution is explained completely by first- and second-order statistics, i.e. the mean and the (co)variances.

Gaussian data: a rotation of the basis cannot be identified, because the Gaussian density is symmetric under rotation.

Gaussian distribution: completely defined by its first and second moments. For Gaussian data, uncorrelatedness already implies independence. So why assume Gaussian data?

Central limit theorem. Definition: a sum of independent random variables tends towards a Gaussian distribution. This is the argument behind many Gaussianity assumptions.

Central limit theorem, read the other way around (second formulation): a mixture of two or more independent random variables is more Gaussian than the original variables themselves.

Figures: a single uniformly distributed random variable, and a mixture of two uniformly distributed variables.

Idea! The observed mixtures should be more Gaussian than the original components; equivalently, the original components are less Gaussian than the mixtures. So if we maximize the non-Gaussianity of the data, we should get closer to the original components.
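A quick numerical check of this idea (again using the hypothetical S and X arrays from the mixing sketch): the excess kurtosis of the mixtures should lie closer to the Gaussian value 0 than that of the sources.

```python
from scipy.stats import kurtosis

print(kurtosis(S, axis=0))   # excess kurtosis of the original sources
print(kurtosis(X, axis=0))   # excess kurtosis of the mixtures: typically closer to 0
```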

ICA theory: problem definition, solution, preprocessing, different methods, examples.

ICA: definition of the problem. Let s1(t), s2(t) and s3(t) be the original signals and let x1(t), x2(t) and x3(t) be the collected signals. The connection between s and x can be written:
x1(t) = a11*s1(t) + a12*s2(t) + a13*s3(t)
x2(t) = a21*s1(t) + a22*s2(t) + a23*s3(t)
x3(t) = a31*s1(t) + a32*s2(t) + a33*s3(t)
or in matrix form x(t) = A s(t). Goal: estimate s1, s2 and s3 from x1, x2 and x3.

ICA assumptions: the sources are independent, the sources are non-Gaussian, and the mixing matrix A is square.

ICA idea: maximize the non-Gaussianity of the estimated components! For this we need a measure of Gaussianity (or non-Gaussianity).

Measures of Gaussianity. 1. Kurtosis: kurt(y) = E{y^4} - 3 (E{y^2})^2, assuming zero-mean variables.

Measures of Gaussianity. 1. Kurtosis: kurt(y) = E{y^4} - 3, assuming zero-mean, unit-variance variables.

Measures of Gaussianity. 1. Kurtosis: kurt(y) = E{y^4} - 3 (E{y^2})^2. For Gaussian data E{y^4} = 3 (E{y^2})^2, which gives kurt(y) = 0. For most other distributions kurt(y) ≠ 0, either positive or negative.

Measures of Gaussianity. 1. Kurtosis: maximize |kurt(y)|. Advantage: easy to compute. Drawback: sensitive to outliers.
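A minimal sketch of a sample kurtosis estimator and of the outlier sensitivity mentioned above (my own illustration):

```python
import numpy as np

def kurt(y):
    """Sample excess kurtosis: E{y^4} - 3 after standardizing to zero mean, unit variance."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

rng = np.random.default_rng(0)
g = rng.normal(size=10_000)
print(kurt(g))                         # ~0 for Gaussian data

print(kurt(np.append(g, 20.0)))        # a single outlier inflates the estimate badly
```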

Measures of Gaussianity. 2. Negentropy: J(y) = H(y_gauss) - H(y), where H is the (differential) entropy, defined as H(y) = -∫ p_y(η) log p_y(η) dη.

Measures of Gaussianity. 2. Negentropy: J(y) = H(y_gauss) - H(y). Among all distributions with the same variance, the Gaussian has the largest entropy, i.e. it is the most random distribution. Hence J(y) ≥ 0, and J(y) = 0 exactly when y is Gaussian.

Measures of Gaussianity. 2. Negentropy: maximize J(y). Advantage: robust. Drawback: computationally hard, since it requires an estimate of the density.
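In practice (for instance in FastICA) negentropy is therefore approximated with a nonquadratic contrast function; a sketch of one common approximation, J(y) ≈ (E{G(y)} - E{G(v)})^2 with G(u) = log cosh(u) and v a standard Gaussian variable:

```python
import numpy as np

def negentropy_approx(y, n_ref=100_000, seed=0):
    """Approximate negentropy of a 1-D sample y via (E{G(y)} - E{G(v)})^2, G(u) = log cosh(u)."""
    rng = np.random.default_rng(seed)
    y = (y - y.mean()) / y.std()             # zero mean, unit variance
    v = rng.normal(size=n_ref)               # Gaussian reference with the same variance
    G = lambda u: np.log(np.cosh(u))
    return (np.mean(G(y)) - np.mean(G(v))) ** 2
```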

ICA solutions: kurtosis, negentropy, maximum likelihood, infomax, mutual information. All of these objective functions are based on independence and/or non-Gaussianity.

ICA restrictions: the data must be non-Gaussian*; the scaling, sign and order of the components cannot be determined; and the number of components must be known. * If some of the data are Gaussian, the non-Gaussian independent components will still be found, but the Gaussian ones will remain mixed.

ICA preprocessing: ICA itself does not reduce the dimension, yet the number of components must be known. But we already have a method for dimension reduction and for estimating a probable number of components: use PCA as a preprocessing step!
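A sketch of PCA whitening as a preprocessing step (one standard way to do it, not necessarily the exact procedure used in the seminar; X is the hypothetical mixed-signal array from earlier):

```python
import numpy as np

Xc = X - X.mean(axis=0)                  # center the mixtures
C = np.cov(Xc, rowvar=False)             # covariance matrix of the mixtures
eigvals, E = np.linalg.eigh(C)           # eigendecomposition C = E diag(eigvals) E^T

# Directions with tiny eigenvalues could be dropped here to reduce the dimension
Z = Xc @ E @ np.diag(1.0 / np.sqrt(eigvals))   # whitened data: cov(Z) = I
```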

ICA preprocessing by filtering. Low-pass filtering: + reduces noise, - reduces independence. High-pass filtering: + increases independence, - increases noise.

ICA overlearning: when there are many more mixtures than independent components, the estimated components take on a spiky, overfitted character.

Examples: cocktail party, music separation, image analysis, separation of recorded brain-activity signals, process data, noise/signal separation, process monitoring.

Cocktail party problem (illustration of the speakers and microphones).

Music separation: four sources, four mixtures and four estimated components (audio demo: http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi)

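Separating mixtures like these takes only a few lines with an off-the-shelf FastICA implementation; a sketch using scikit-learn on the hypothetical mixed-signal array X from above:

```python
from sklearn.decomposition import FastICA

ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X)    # estimated independent components (the sources)
A_est = ica.mixing_             # estimated mixing matrix
# Note: the sources are recovered only up to scaling, sign and ordering
```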

Image analysis - NLPCA

Brain activity: separation of recorded brain signals into sources S1, S2, S3 and S4 (figure).

Process data example (figures): time-series plots of the mixed signals, the whitened signals and the estimated independent components (four channels, about 1200 samples each).

Noise removal: different noise sources are considered, namely Laplacian, Gaussian, uniform and exponential noise (figure: example signal).

Noise removal (figures): for each noise type (Laplacian, Gaussian, uniform, exponential) the mixed signals, the whitened signals and the estimated independent components are plotted.

Process monitoring: often done with PCA (example variables F1, F2). One step further: use ICA!

Practical considerations: noise reduction (filtering), dimension reduction (PCA?), overlearning, choice of algorithm.

What about time signals? So far no information about time has been used: in the original ICA model, x is a random variable. What if x is a time signal x(t)? (figure: example time signal)

Time signal x(t): the ordering of the samples is not random and carries extra information (autocorrelation, cross-correlation). With this extra information the assumptions can be relaxed; in particular, Gaussian data become acceptable.
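One simple way to exploit this time structure is an AMUSE-style step: after whitening, diagonalize a time-lagged covariance matrix instead of maximizing non-Gaussianity. A sketch (assuming the whitened array Z from the preprocessing sketch; the function and variable names are mine):

```python
import numpy as np

def amuse(Z, lag=1):
    """Estimate sources with distinct autocorrelations from whitened data Z (n_samples, n_channels)."""
    C_lag = (Z[:-lag].T @ Z[lag:]) / (Z.shape[0] - lag)   # time-lagged covariance
    C_lag = (C_lag + C_lag.T) / 2.0                       # symmetrize
    _, W = np.linalg.eigh(C_lag)                          # eigenvectors give the rotation
    return Z @ W                                          # sources, up to sign and order

S_time = amuse(Z, lag=1)
```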

Extensions: non-linear ICA, independent subspace analysis.

Further information:
Book: Independent Component Analysis, A. Hyvärinen, J. Karhunen, E. Oja; covers everything from novice to expert.
Homepage: http://www.cis.hut.fi/projects/ica/ (tutorials, material, contacts, Matlab code)
Journal of Machine Learning Research special issue: http://jmlr.csail.mit.edu/papers/special/ica03.html (papers and publications)
Toolboxes and code: http://mole.imm.dtu.dk/toolbox/ica/index.html , http://www.bsp.brain.riken.jp/icalab/ , http://www.cis.hut.fi/projects/ica/book/links.html