Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien


TABLE OF CONTENTS
1. Independent Component Analysis
2. Case Study I: Speech Recognition
   - Independent voices
   - Nonparametric likelihood ratio ICA
3. Case Study II: Blind Source Separation
   - Convex divergence ICA
   - Nonstationary Bayesian ICA
   - Online Gaussian process ICA
4. Summary

Introduction
Independent component analysis (ICA) is essential for blind source separation. ICA is applied to separate mixed signals and find the independent components. The demixed components can be grouped into clusters, where intra-cluster elements are dependent and inter-cluster elements are independent. ICA provides an unsupervised learning approach to acoustic modeling, signal separation, and many other tasks.

Blind Source Separation
The cocktail-party problem: observed mixtures are x = As, where both the mixing matrix A and the sources s are unknown. The goal is to reconstruct the source signals via a demixing matrix W, i.e. y = Wx. The mixing matrix A is assumed to be fixed.

Independent Component Analysis
Three assumptions:
- the sources are statistically independent
- each independent component has a non-Gaussian distribution
- the mixing matrix is square

The mixing model $\mathbf{x} = \mathbf{A}\mathbf{s}$, written element-wise over $t$ samples of $m$ channels:
$$\begin{bmatrix} x_{11} & \cdots & x_{1t} \\ \vdots & & \vdots \\ x_{m1} & \cdots & x_{mt} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mm} \end{bmatrix} \begin{bmatrix} s_{11} & \cdots & s_{1t} \\ \vdots & & \vdots \\ s_{m1} & \cdots & s_{mt} \end{bmatrix}$$
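As a small illustration of this mixing model (a minimal numpy sketch; the source distributions, matrix values, and variable names are illustrative, not from the lecture), two independent non-Gaussian sources are mixed by a square A, and the idealized demixing W = A^{-1} recovers them; ICA must estimate such a W blindly from X alone:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000

# Two independent, non-Gaussian sources (rows of S), as ICA assumes.
S = np.vstack([rng.laplace(size=T),          # super-Gaussian source
               rng.uniform(-1, 1, size=T)])  # sub-Gaussian source

A = np.array([[0.8, 0.3],                    # square mixing matrix (unknown in practice)
              [0.2, 0.7]])
X = A @ S                                    # observed mixtures: x = A s

W = np.linalg.inv(A)                         # idealized demixing matrix
Y = W @ X                                    # recovered sources
print(np.allclose(Y, S))                     # True
```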

ICA Objective Function

ICA Learning Rule
The ICA demixing matrix can be estimated by optimizing an objective function via the gradient descent algorithm or the natural gradient algorithm.
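To make the learning rule concrete, here is a minimal sketch of a natural-gradient update; the Infomax-style rule $\Delta\mathbf{W} = \eta(\mathbf{I} - E[\phi(\mathbf{y})\mathbf{y}^{\top}])\mathbf{W}$ with $\phi = \tanh$ is one common choice, assumed here rather than taken from the lecture's specific objective:

```python
import numpy as np

def natural_gradient_ica(X, lr=0.01, n_iter=500, seed=0):
    """Estimate a demixing matrix W with the natural-gradient rule
    dW = (I - E[phi(y) y^T]) W, using phi(y) = tanh(y), a common
    choice for super-Gaussian sources. X is (m, T) and zero-mean."""
    rng = np.random.default_rng(seed)
    m, T = X.shape
    W = np.eye(m) + 0.01 * rng.standard_normal((m, m))
    for _ in range(n_iter):
        Y = W @ X                              # current source estimates
        grad = np.eye(m) - (np.tanh(Y) @ Y.T) / T
        W += lr * grad @ W                     # natural-gradient step
    return W
```

The natural gradient right-multiplies the ordinary gradient by $\mathbf{W}^{\top}\mathbf{W}$, which avoids a matrix inversion per step and respects the geometry of the parameter space.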

CASE STUDY I: SPEECH RECOGNITION - INDEPENDENT VOICES

ICA for Speech Recognition
Mismatch between training and test data always exists, so adaptation of HMM parameters is important.
Eigenvoice (PCA) versus independent voice (ICA):
- PCA performs a linear de-correlation process (uncorrelatedness): $E[e_1 e_2 \cdots e_M] = E[e_1]\,E[e_2] \cdots E[e_M]$
- ICA extracts the higher-order statistics (higher-order correlations are zero): $E[s_1^r s_2^r \cdots s_M^r] = E[s_1^r]\,E[s_2^r] \cdots E[s_M^r]$

Sparseness & Information Redundancy
The degree of sparseness in the distribution of the transformed signals is proportional to the amount of information conveyed by the transformation.
Sparseness measurement: fourth-order statistics (kurtosis) as a measure of non-Gaussianity,
$$\mathrm{kurt}(s) = E[s^4] - 3E^2[s^2]$$
Information redundancy reduction using ICA is greater than that using PCA.
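A short numeric check of this measure (a sketch; the function name and test distributions are illustrative):

```python
import numpy as np

def kurtosis(s):
    """Fourth-order sparseness measure kurt(s) = E[s^4] - 3 E^2[s^2];
    zero for Gaussian, positive for super-Gaussian (sparse) signals,
    negative for sub-Gaussian signals."""
    s = np.asarray(s, float)
    s = s - s.mean()                       # use central moments
    return float(np.mean(s**4) - 3.0 * np.mean(s**2)**2)

rng = np.random.default_rng(0)
print(kurtosis(rng.laplace(size=100_000)))         # > 0: super-Gaussian
print(kurtosis(rng.uniform(-1, 1, size=100_000)))  # < 0: sub-Gaussian
print(kurtosis(rng.normal(size=100_000)))          # ~ 0: Gaussian
```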

Eigenvoices versus Independent Voices

Evaluation of Kurtosis
[Figure: kurtosis of each basis voice (voice index 1-35), comparing independent voices against eigenvoices]

Word Error Rates on Aurora2
[Figure: word error rates (%) for no adaptation, eigenvoice, and independent voice with K = 10 and K = 15 components and L = 5, 10, 15 adaptation sentences; K: number of components, L: number of adaptation sentences]

CASE STUDY I: SPEECH RECOGNITION - NONPARAMETRIC LIKELIHOOD RATIO ICA

Test of Independence
Given the demixed signals, the null hypothesis is that the components of y are mutually independent, and the alternative hypothesis is that they are not. If y is Gaussian distributed, we are simply testing whether the correlation between each pair of components $y_i$ and $y_j$ is equal to zero.

Likelihood Ratio
The likelihood ratio (LR) serves as a test statistic that measures the confidence for the null hypothesis against the alternative. The LR is thus a measure of independence and can act as an objective function for finding the ICA demixing matrix. However, Gaussianity cannot be assumed in the ICA problem.

Nonparametric Approach
Let each sample $\mathbf{x}_t$ be transformed by the demixing matrix, $\mathbf{y}_t = \mathbf{W}\mathbf{x}_t$. Instead of assuming Gaussianity, we apply kernel density estimation with a Gaussian kernel, using the transformed samples as the kernel centroids.
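A minimal sketch of such a Gaussian-kernel (Parzen-window) density estimate; the bandwidth h and all names are illustrative assumptions:

```python
import numpy as np

def parzen_density(y, centers, h):
    """Parzen-window density estimate with a Gaussian kernel:
    p_hat(y) = (1/N) * sum_n N(y; y_n, h^2 I), where the kernel
    centroids y_n are the demixed samples and h is the bandwidth."""
    y = np.atleast_2d(y)                   # (Q, d) query points
    centers = np.atleast_2d(centers)       # (N, d) kernel centroids
    N, d = centers.shape
    diff = y[:, None, :] - centers[None, :, :]   # (Q, N, d)
    sq = np.sum(diff**2, axis=-1)                # squared distances (Q, N)
    norm = (2 * np.pi * h**2) ** (d / 2)         # Gaussian normalizer
    return np.exp(-sq / (2 * h**2)).sum(axis=1) / (N * norm)
```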

Nonparametric Likelihood Ratio
The NLR objective function is formed with a multivariate Gaussian kernel.

ICA Learning Procedure
Form the log likelihood ratio for the null and alternative hypotheses, then maximize it with respect to the demixing matrix; the output is the estimated W.

System flow: training data → Viterbi alignment → segment-based supervector collection for a subword unit (X) → NLR-ICA (Y) → K-means clustering into clusters 1, 2, ..., M → hidden Markov model training with the lexicon (HMM 1, HMM 2, ..., HMM M) → speech recognizer applied to test data → recognition result.

Segment-Based Supervector
Each aligned utterance contributes segment vectors $\mathbf{x}_{s_1}, \mathbf{x}_{s_2}, \dots, \mathbf{x}_{s_N}$, which are collected over all utterances into the segment-based supervector matrix $\mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_T]$.

CASE STUDY II: BLIND SOURCE SEPARATION - CONVEX DIVERGENCE ICA

ICA Objective Functions
- Maximum likelihood
- Kurtosis
- Negentropy
- Divergence measures: α-divergence, Kullback-Leibler (KL) divergence, Euclidean divergence, Cauchy-Schwarz divergence, convex divergence

Mutual Information & KL Divergence
The mutual information between two variables $y_1$ and $y_2$ is defined using the Shannon entropy: $I(y_1; y_2) = H(y_1) + H(y_2) - H(y_1, y_2)$. It can equivalently be formulated as the KL divergence (relative entropy) between the joint distribution and the product of the marginal distributions: $I(y_1; y_2) = D_{\mathrm{KL}}\big(p(y_1, y_2)\,\|\,p(y_1)\,p(y_2)\big)$.
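For a discrete toy case, this identity can be checked directly (a sketch; function names are illustrative):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(p || q) = sum_i p_i log(p_i / q_i)."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mutual_information(joint):
    """I(y1; y2) as the KL divergence between the joint distribution
    and the product of its marginals (discrete case)."""
    joint = np.asarray(joint, float)
    p1 = joint.sum(axis=1, keepdims=True)   # marginal of y1
    p2 = joint.sum(axis=0, keepdims=True)   # marginal of y2
    return kl_divergence(joint.ravel(), (p1 * p2).ravel())

# When the joint factorizes, the variables are independent and I ~ 0.
pj = np.outer([0.5, 0.5], [0.25, 0.75])
print(mutual_information(pj))   # ~ 0.0
```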

Divergence Measures
Euclidean divergence, Cauchy-Schwarz divergence, and α-divergence.

Divergence Measures
f-divergence and Jensen-Shannon divergence. With equal weights, the Jensen-Shannon divergence is $\mathrm{JS}(p, q) = H\!\big(\tfrac{p+q}{2}\big) - \tfrac{1}{2}\big(H(p) + H(q)\big)$, where $H$ is the Shannon entropy. Entropy is a concave function, so JS is nonnegative by Jensen's inequality.
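A minimal sketch of the equal-weight Jensen-Shannon divergence for discrete distributions (names are illustrative):

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy H(p) = -sum_i p_i log p_i (a concave function)."""
    p = np.asarray(p, float)
    return float(-np.sum(p * np.log(p + eps)))

def js_divergence(p, q):
    """Equal-weight Jensen-Shannon divergence:
    JS(p, q) = H((p + q)/2) - (H(p) + H(q))/2 >= 0 by concavity of H."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    return entropy((p + q) / 2) - 0.5 * (entropy(p) + entropy(q))

print(js_divergence([0.9, 0.1], [0.1, 0.9]))  # > 0 for distinct distributions
print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0 for identical distributions
```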

Convex Function
A convex function $f(\cdot)$ satisfies Jensen's inequality: $f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$ for $0 \le \lambda \le 1$. A general parametric convex function is then defined to construct the convex divergence.

Convex Divergence
Assuming equal weights, the convex divergence (C-DIV) is obtained. For particular values of the convexity parameter α, C-DIV is derived as a special case with the corresponding convex function.

Different Divergence Measures
[Figures: comparison of the different divergence measures]

Convex Divergence ICA
The C-ICA learning algorithm optimizes C-DIV. A nonparametric C-ICA is established by using the Parzen window density function.

Simulated Experiments
A parametric demixing matrix
$$\mathbf{W}(\theta_1, \theta_2) = \begin{bmatrix} \cos\theta_1 & \sin\theta_1 \\ \cos\theta_2 & \sin\theta_2 \end{bmatrix}$$
is estimated for two sources, one sub-Gaussian and one super-Gaussian: a uniform density $p(s_1) = \tfrac{1}{2}$ for $s_1 \in [-1, 1]$ (0 otherwise), and a Laplacian-type density $p(s_2) = \tfrac{1}{\sqrt{2}\,\sigma}\exp\!\big(-\tfrac{\sqrt{2}\,|s_2|}{\sigma}\big)$.
Kurtosis: source 1: -1.13, source 2: 2.23.

[Figure: comparison of KL-DIV, C-DIV with α = 1, and C-DIV with α = -1]

Learning Curves

Experiments on Blind Source Separation
One music signal and two speech signals from two male speakers were sampled from the ICA '99 BSS Test Sets at http://sound.media.mit.edu/ica-bench/
Mixing matrix:
$$\mathbf{A} = \begin{bmatrix} 0.8 & 0.3 & 0.3 \\ 0.2 & 0.8 & 0.7 \\ 0.3 & 0.2 & 0.3 \end{bmatrix}$$
Evaluation metric: signal-to-interference ratio (SIR),
$$\mathrm{SIR}(\mathrm{dB}) = 10 \log_{10} \frac{\sum_{t=1}^{T} s_t^2}{\sum_{t=1}^{T} (y_t - s_t)^2}$$
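The metric is straightforward to compute (a sketch; alignment and scaling of the demixed signal against its reference are assumed handled elsewhere):

```python
import numpy as np

def sir_db(s, y):
    """Signal-to-interference ratio in dB:
    SIR = 10 log10( sum_t s_t^2 / sum_t (y_t - s_t)^2 ),
    comparing a demixed signal y against its reference source s."""
    s = np.asarray(s, float); y = np.asarray(y, float)
    return 10.0 * np.log10(np.sum(s**2) / np.sum((y - s)**2))
```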

Comparison of Different Methods
[Figure: comparison of PC-ICA and NC-ICA separation results]

CASE STUDY II: BLIND SOURCE SEPARATION - NONSTATIONARY BAYESIAN ICA

Why Nonstationary Source Separation?
Real-world blind source separation: the number of sources is unknown, BSS is a dynamic time-varying system, and the mixing process is nonstationary.
Why a nonstationary treatment?
- A Bayesian method using automatic relevance determination (ARD) can determine the changing number of sources.
- Recursive Bayesian learning enables online tracking of nonstationary conditions.
- A Gaussian process provides a nonparametric solution to represent the temporal structure of the time-varying mixing system.

Nonstationary Mixing Systems
The mixing matrix is time-varying, and source signals may abruptly appear or disappear.
[Figure: moving sources S1, S2, S3 with time-varying mixing coefficients such as a_21(t), a_22(t), a_23(t) evolving across time steps]

Nonstationary Bayesian (NB) Learning
Maximum a posteriori estimation of the NB-ICA parameters with updating of compensation parameters: the posterior obtained at learning epoch t serves as the prior for learning epoch t+1 (sequential prior updating).
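The prior-updating mechanism can be illustrated on a toy conjugate model; this is only a sketch of recursive Bayesian learning for a Gaussian mean, not the NB-ICA model itself, and all names and values are illustrative:

```python
import numpy as np

def sequential_update(mu0, tau0, batch, noise_var):
    """One epoch of recursive Bayesian learning for a Gaussian mean:
    the posterior N(mu, 1/tau) of this epoch becomes the prior of the
    next. tau denotes precision (inverse variance)."""
    n = len(batch)
    tau = tau0 + n / noise_var                           # precision accumulates
    mu = (tau0 * mu0 + np.sum(batch) / noise_var) / tau  # precision-weighted mean
    return mu, tau

# Posterior of epoch t is the prior of epoch t+1.
rng = np.random.default_rng(0)
mu, tau = 0.0, 1e-2                        # broad initial prior
for _ in range(5):                         # five learning epochs
    batch = rng.normal(2.0, 1.0, size=50)  # incoming observations
    mu, tau = sequential_update(mu, tau, batch, noise_var=1.0)
print(mu)                                  # approaches the true mean 2.0
```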

Model Construction
The noisy ICA model, $\mathbf{x}_t = \mathbf{A}\mathbf{s}_t + \boldsymbol{\varepsilon}_t$, defines the likelihood function of an observation. Distributions are specified for the model parameters: the sources, the mixing matrix, and the noise.

Prior & Marginal Distributions
Prior distributions are placed on the precision of the noise and the precision of the mixing matrix. Integrating over the parameters yields the marginal likelihood of the NB-ICA model.

Automatic Relevance Determination
ARD detects which source signals are active, so the number of sources can be determined automatically.

Compensation for Nonstationary ICA
The prior density of the compensation parameter is a conjugate prior (Wishart distribution).

Graphical Model for NB-ICA
[Figure: directed graphical model relating observations x_t, sources s_t, noise ε_t, mixing matrix A, and their hyperparameters across learning epochs l-1 and l]

Experiments
Nonstationary blind source separation using the ICA '99 benchmark (http://sound.media.mit.edu/ica-bench/).
Scenarios:
- the state of the source signals changes (active or inactive)
- source signals or sensors are moving, giving a nonstationary mixing matrix

Source Signals and ARD Curves
[Figure: source signals and their ARD curves; blue: first source signal, red: second source signal]

CASE STUDY II: BLIND SOURCE SEPARATION - ONLINE GAUSSIAN PROCESS ICA

Online Gaussian Process (OLGP)
Basic ideas:
- Incrementally detect the status of the source signals and estimate the corresponding distributions from online observation data.
- The temporal structure of the time-varying mixing coefficients is characterized by a Gaussian process.
A Gaussian process is a nonparametric model that defines a prior distribution over functions for Bayesian inference.

Model Construction
A noisy ICA model with a time-varying mixing matrix defines the likelihood function. Distributions are specified for the source and noise parameters.

Gaussian Process
The mixing matrix is generated by a latent function, and a GP is adopted to describe the distribution of this function; the kernel function carries the hyperparameters.
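As a sketch of how a GP prior captures the temporal structure of a single mixing coefficient, the following draws one smooth trajectory from a zero-mean GP; the squared-exponential kernel and its hyperparameter values are assumptions for illustration, not the lecture's exact kernel:

```python
import numpy as np

def rbf_kernel(t1, t2, length_scale=10.0, variance=1.0):
    """Squared-exponential kernel k(t, t') = v * exp(-(t - t')^2 / (2 l^2));
    length_scale and variance are the kernel hyperparameters."""
    d = np.subtract.outer(t1, t2)
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

# Sample a smooth trajectory for one mixing coefficient a_ij(t)
# from a zero-mean GP prior over T time steps.
T = 200
t = np.arange(T, dtype=float)
K = rbf_kernel(t, t) + 1e-8 * np.eye(T)    # jitter for numerical stability
rng = np.random.default_rng(0)
a_ij = rng.multivariate_normal(np.zeros(T), K)
```

A longer length scale yields slowly drifting mixing coefficients; a shorter one allows faster nonstationary changes.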

Graphical Model for OLGP-ICA
[Figure: directed graphical model relating observations x_t, sources s_t, noise ε_t, the time-varying mixing matrix A_t, and the GP hyperparameters across learning epochs l-1 and l]

Experimental Setup
Nonstationary source separation using source signals from http://www.kecl.ntt.co.jp/icl/signal/
Nonstationary scenarios:
- the status of the source signals changes (active or inactive)
- source signals or sensors are moving, giving a nonstationary mixing matrix

[Figure: source signals: male speech, music, and female speech]

Comparison of Different Methods
Signal-to-interference ratios (SIRs) in dB:

                    VB-ICA   BICA-HMM   Switching-ICA   Online VB-ICA   OLGP-ICA
Demixed signal 1      7.97       9.04           12.06           11.26      17.24
Demixed signal 2     -3.23      -1.50           -4.82            4.47       9.96

Summary
- We presented a speaker adaptation method based on independent voices, developed from an ICA perspective.
- A nonparametric likelihood ratio ICA was proposed based on hypothesis testing theory.
- A convex divergence was developed as an optimization metric for the ICA algorithm.
- A nonstationary Bayesian ICA was proposed to deal with nonstationary mixing systems.
- An online Gaussian process ICA was presented for nonstationary and temporally correlated source separation.
- ICA methods could be extended to nonnegative matrix factorization and single-channel separation.

References
J.-T. Chien and B.-C. Chen, "A new independent component analysis for speech recognition and separation," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1245-1254, 2006.
J.-T. Chien, H.-L. Hsieh and S. Furui, "A new mutual information measure for independent component analysis," in Proc. ICASSP, pp. 1817-1820, 2008.
H.-L. Hsieh, J.-T. Chien, K. Shinoda and S. Furui, "Independent component analysis for noisy speech recognition," in Proc. ICASSP, pp. 4369-4372, 2009.
H.-L. Hsieh and J.-T. Chien, "Online Bayesian learning for dynamic source separation," in Proc. ICASSP, pp. 1950-1953, 2010.
H.-L. Hsieh and J.-T. Chien, "Online Gaussian process for nonstationary speech separation," in Proc. INTERSPEECH, pp. 394-397, 2010.
H.-L. Hsieh and J.-T. Chien, "Nonstationary and temporally-correlated source separation using Gaussian process," in Proc. ICASSP, pp. 2120-2123, 2011.
J.-T. Chien and H.-L. Hsieh, "Convex divergence ICA for blind source separation," IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, pp. 290-301, 2012.

Thanks to H.-L. Hsieh, K. Shinoda, and S. Furui. Contact: jtchien@nctu.edu.tw