Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien
1 Independent Component Analysis and Unsupervised Learning Jen-Tzung Chien
2 TABLE OF CONTENTS
1. Independent Component Analysis
2. Case Study I: Speech Recognition
   - Independent voices
   - Nonparametric likelihood ratio ICA
3. Case Study II: Blind Source Separation
   - Convex divergence ICA
   - Nonstationary Bayesian ICA
   - Online Gaussian process ICA
4. Summary
3 Introduction
Independent component analysis (ICA) is essential for blind source separation. ICA is applied to separate the mixed signals and find the independent components. The demixed components can be grouped into clusters where the intra-cluster elements are dependent and the inter-cluster elements are independent. ICA provides an unsupervised learning approach to acoustic modeling, signal separation, and many other tasks.
4 Blind Source Separation
The cocktail-party problem: the observations are mixtures of unknown sources. Both the mixing matrix A and the sources s are unknown; the goal is to reconstruct the source signals via a demixing matrix W. The mixing matrix A is assumed to be fixed.
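A minimal numerical sketch of the mixing/demixing setup (the toy signals, the matrix A, and all names below are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy non-Gaussian sources: a sine wave and uniform noise.
t = np.linspace(0, 1, 1000)
s = np.vstack([np.sin(2 * np.pi * 5 * t),
               rng.uniform(-1, 1, size=t.shape)])

A = np.array([[1.0, 0.6],    # unknown, fixed mixing matrix
              [0.5, 1.0]])
x = A @ s                    # observed mixtures, x = As

# BSS seeks W such that y = Wx recovers s (up to permutation and scaling).
# If A were known, W would simply be its inverse:
W = np.linalg.inv(A)
y = W @ x
assert np.allclose(y, s)
```

ICA estimates such a W blindly, without access to A.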
5 Independent Component Analysis
Three assumptions:
- the sources are statistically independent
- each independent component has a non-Gaussian distribution
- the mixing matrix is a square matrix
In matrix form, x_t = A s_t, i.e.
[ x_{1t}, ..., x_{mt} ]^T = [ a_{11} ... a_{1m} ; ... ; a_{m1} ... a_{mm} ] [ s_{1t}, ..., s_{mt} ]^T
6 ICA Objective Function
7 ICA Learning Rule
The ICA demixing matrix can be estimated by optimizing an objective function via the gradient descent algorithm or the natural gradient algorithm.
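A compact sketch of one such update, assuming the standard maximum-likelihood objective with a tanh score function for super-Gaussian sources (a common textbook choice, not necessarily the exact rule used in the talk):

```python
import numpy as np

def natural_gradient_ica(x, n_iter=200, lr=0.1, seed=0):
    """Estimate a demixing matrix W by the natural gradient update
    W <- W + lr * (I - E[phi(y) y^T]) W, with score phi(y) = tanh(y)."""
    rng = np.random.default_rng(seed)
    m, T = x.shape
    W = np.eye(m) + 0.1 * rng.standard_normal((m, m))
    for _ in range(n_iter):
        y = W @ x
        phi = np.tanh(y)                         # score function
        W += lr * (np.eye(m) - (phi @ y.T) / T) @ W
    return W
```

The natural gradient avoids the matrix inverse that appears in the ordinary gradient and usually converges faster.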
8 TABLE OF CONTENTS
1. Independent Component Analysis
2. Case Study I: Speech Recognition
   - Independent voices
   - Nonparametric likelihood ratio ICA
3. Case Study II: Blind Source Separation
   - Convex divergence ICA
   - Nonstationary Bayesian ICA
   - Online Gaussian process ICA
4. Summary
9 ICA for Speech Recognition
Mismatch between training and test data always exists, so adaptation of the HMM parameters is important. Eigenvoice (PCA) versus independent voice (ICA):
- PCA performs a linear de-correlation process:
  E[e_1 e_2 ... e_M] = E[e_1] E[e_2] ... E[e_M]  (uncorrelation)
- ICA extracts the higher-order statistics:
  E[s_1^r s_2^r ... s_M^r] = E[s_1^r] E[s_2^r] ... E[s_M^r]  (higher-order correlations are zero)
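A quick numerical illustration of this distinction (toy data, illustrative only): PCA whitening removes second-order correlation, yet a fourth-order cross statistic stays nonzero until an ICA rotation is applied.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=(2, 100_000))      # independent non-Gaussian sources
x = np.array([[1.0, 0.8],
              [0.3, 1.0]]) @ s                 # correlated mixtures

# PCA (ZCA) whitening: second-order decorrelation only
vals, vecs = np.linalg.eigh(np.cov(x))
e = vecs @ np.diag(vals ** -0.5) @ vecs.T @ x
print(np.cov(e).round(3))                      # ~ identity: uncorrelated

# A fourth-order cross statistic is still nonzero after whitening:
print(np.mean(e[0]**2 * e[1]**2)
      - np.mean(e[0]**2) * np.mean(e[1]**2))   # != 0 -> still dependent
```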
10 Sparseness & Information Redundancy
The degree of sparseness in the distribution of the transformed signals is proportional to the amount of information conveyed by the transformation. Sparseness is measured by the fourth-order statistic (kurtosis), a measure of non-Gaussianity:
kurt(s) = E[s^4] - 3 (E[s^2])^2
Information redundancy reduction using ICA is higher than that using PCA.
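For concreteness, kurtosis can be computed as follows (a toy check, not the talk's code): a Gaussian gives roughly zero, a sparse Laplacian is positive (super-Gaussian), and a uniform source is negative (sub-Gaussian).

```python
import numpy as np

def kurtosis(s):
    """kurt(s) = E[s^4] - 3 (E[s^2])^2, zero for Gaussian signals."""
    s = s - s.mean()
    return np.mean(s**4) - 3 * np.mean(s**2) ** 2

rng = np.random.default_rng(0)
print(kurtosis(rng.standard_normal(100_000)))   # ~ 0     Gaussian
print(kurtosis(rng.laplace(size=100_000)))      # ~ +12   super-Gaussian (sparse)
print(kurtosis(rng.uniform(-1, 1, 100_000)))    # ~ -0.13 sub-Gaussian
```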
11 Eigenvoices versus Independent Voices
12 Evaluation of Kurtosis
[Figure: kurtosis versus voice index for independent voices and eigenvoices]
13 Word Error Rates on Aurora2
[Table: word error rates (%) for no adaptation, eigenvoice (L = 5, 10, 15), and independent voice (L = 5, 10, 15), each under K = 10 and K = 15; K: number of components, L: number of adaptation sentences]
14 TABLE OF CONTENTS
1. Independent Component Analysis
2. Case Study I: Speech Recognition
   - Independent voices
   - Nonparametric likelihood ratio ICA
3. Case Study II: Blind Source Separation
   - Convex divergence ICA
   - Nonstationary Bayesian ICA
   - Online Gaussian process ICA
4. Summary
15 Test of Independence
Given the demixed signals y, the null hypothesis is that the components are mutually independent, H0: p(y) = prod_i p(y_i), and the alternative is that they are not, H1: p(y) != prod_i p(y_i). If y is Gaussian distributed, this reduces to testing whether the correlation between each pair y_i and y_j is zero, i.e. whether the covariance matrix of y is diagonal.
16 Likelihood Ratio
The likelihood ratio (LR) serves as a test statistic which measures the confidence for the alternative hypothesis against the null. LR is therefore a measure of independence for the demixed signals and can act as an objective function for finding the ICA demixing matrix. However, Gaussianity cannot be assumed in the ICA problem.
17 Nonparametric Approach
Let each sample x_t be transformed by the demixing matrix, y_t = W x_t. Instead of assuming Gaussianity, we apply kernel density estimation with a Gaussian kernel, with a kernel centroid placed at each transformed sample.
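A minimal sketch of such a Parzen estimate in one dimension (the bandwidth and the names below are illustrative assumptions):

```python
import numpy as np

def parzen_density(y_eval, y_train, h=0.3):
    """p_hat(y) = (1/T) sum_t N(y | y_t, h^2): a Gaussian kernel
    centred on every transformed sample y_t."""
    d = y_eval[:, None] - y_train[None, :]
    k = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return k.mean(axis=1)

rng = np.random.default_rng(0)
y = rng.laplace(size=500)                 # demixed samples y_t = W x_t
grid = np.linspace(-5, 5, 101)
p_hat = parzen_density(grid, y)           # nonparametric density estimate
```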
18 Nonparametric Likelihood Ratio
NLR objective function with multivariate Gaussian kernel
19 ICA Learning Procedure
The log-likelihood ratio for the null and alternative hypotheses is maximized with respect to the demixing matrix W; the procedure outputs the estimated W.
20 NLR-ICA acoustic modeling pipeline: training data → Viterbi alignment → segment-based supervector collection for a subword unit (X) → NLR-ICA (Y) → K-means clustering into clusters 1, 2, ..., M → hidden Markov model training with a lexicon (HMM 1, HMM 2, ..., HMM M) → speech recognizer applied to test data → recognition result.
21 Segment-Based Supervector
Each aligned utterance is segmented into subword segments x_{s_1}, x_{s_2}, ..., x_{s_N}; collecting the segments from all aligned utterances forms the segment-based supervector matrix X = [x_1, x_2, ..., x_T].
22 TABLE OF CONTENTS
1. Independent Component Analysis
2. Case Study I: Speech Recognition
   - Independent voices
   - Nonparametric likelihood ratio ICA
3. Case Study II: Blind Source Separation
   - Convex divergence ICA
   - Nonstationary Bayesian ICA
   - Online Gaussian process ICA
4. Summary
23 ICA Objective Function
Families of ICA objective functions:
- Maximum likelihood
- Kurtosis
- Negentropy
- Divergence measures: α-divergence, Kullback-Leibler (KL) divergence, Euclidean divergence, Cauchy-Schwartz divergence, convex divergence
24 Mutual Information & KL Divergence
Mutual information between two variables is defined using the Shannon entropy. It can be formulated as the KL divergence, or relative entropy, between the joint distribution and the product of the marginal distributions:
I(x, y) = D_KL( p(x, y) || p(x) p(y) )
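For discrete distributions this quantity is easy to verify numerically (a toy check, not part of the talk):

```python
import numpy as np

def mutual_information(joint):
    """I(x, y) = KL( p(x, y) || p(x) p(y) ) for a discrete joint pmf."""
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    m = joint > 0
    return float(np.sum(joint[m] * np.log(joint[m] / (px @ py)[m])))

print(mutual_information(np.outer([0.5, 0.5], [0.3, 0.7])))  # 0.0, independent
print(mutual_information(np.array([[0.4, 0.1],
                                   [0.1, 0.4]])))            # > 0, dependent
```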
25 Divergence Measures
- Euclidean divergence
- Cauchy-Schwartz divergence
- α-divergence
26 Divergence Measures
- f-divergence
- Jensen-Shannon divergence, which exploits the fact that entropy is a concave function
27 Convex Function
A convex function f(·) satisfies Jensen's inequality:
f( sum_i λ_i x_i ) <= sum_i λ_i f(x_i), where λ_i >= 0 and sum_i λ_i = 1.
A general convex function of this kind is used to define the divergence.
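A two-line numeric check of Jensen's inequality with f(x) = x² (illustrative only):

```python
import numpy as np

f = np.square                      # a convex function
x = np.array([1.0, 4.0])
lam = np.array([0.3, 0.7])         # lam_i >= 0, sum(lam) == 1
print(f(lam @ x))                  # f(0.3*1 + 0.7*4) = 3.1**2 = 9.61
print(lam @ f(x))                  # 0.3*1 + 0.7*16   = 11.5  (>= 9.61)
```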
28 Convex Divergence
By assuming equal weights λ_i in Jensen's inequality, the convex divergence (C-DIV) is obtained; choosing a particular convex function, indexed by a parameter α, yields specific cases of C-DIV.
29 Different Divergence Measures
30 Different Divergence Measures
31 Convex Divergence ICA
The C-ICA learning algorithm optimizes C-DIV with respect to the demixing matrix. A nonparametric C-ICA is established by using the Parzen window density function.
32 Simulated Experiments
A parametric demixing matrix
W = [ cos θ_1  sin θ_1 ; cos θ_2  sin θ_2 ].
Two sources, one sub-Gaussian and one super-Gaussian: source 1 has a uniform density (constant on an interval, 0 otherwise); source 2 has an exponential-type density. Kurtosis of source 1: -1.13.
33 [Figure: separation results for KL-DIV, C-DIV with α = 1, and C-DIV with α = -1]
34 Learning Curves
35 Experiments on Blind Source Separation
One music signal and two speech signals from two male speakers were sampled from the ICA '99 BSS Test Sets and mixed with a mixing matrix A. Evaluation metric: signal-to-interference ratio (SIR),
SIR(dB) = 10 log_10 ( sum_{t=1}^T s_t^2 / sum_{t=1}^T (y_t - s_t)^2 )
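The SIR metric translates directly into code (the signals below are illustrative):

```python
import numpy as np

def sir_db(s, y):
    """SIR(dB) = 10 log10( sum_t s_t^2 / sum_t (y_t - s_t)^2 )."""
    return 10 * np.log10(np.sum(s ** 2) / np.sum((y - s) ** 2))

rng = np.random.default_rng(0)
s = np.sin(np.linspace(0, 20 * np.pi, 4000))    # reference source
y = s + 0.01 * rng.standard_normal(s.shape)     # demixed estimate
print(f"SIR = {sir_db(s, y):.1f} dB")           # high SIR = little interference
```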
36 Comparison of Different Methods
[Figure: SIR comparison of PC-ICA and NC-ICA]
37 TABLE OF CONTENTS
1. Independent Component Analysis
2. Case Study I: Speech Recognition
   - Independent voices
   - Nonparametric likelihood ratio ICA
3. Case Study II: Blind Source Separation
   - Convex divergence ICA
   - Nonstationary Bayesian ICA
   - Online Gaussian process ICA
4. Summary
38 Why Nonstationary Source Separation?
Real-world blind source separation:
- the number of sources is unknown
- BSS is a dynamic, time-varying system
- the mixing process is nonstationary
How to handle nonstationarity?
- a Bayesian method using ARD can determine the changing number of sources
- recursive Bayesian learning enables online tracking of nonstationary conditions
- a Gaussian process provides a nonparametric solution to represent the temporal structure of the time-varying mixing system
39 Nonstationary Mixing Systems
The mixing matrix is time-varying: entries such as a_{22}(t) and a_{23}(t) drift by small increments ±d from one instant to the next. Source signals may abruptly appear or disappear. [Figure: sources S_1, S_2, S_3 moving relative to the sensors]
40 Nonstationary Bayesian (NB) Learning
Maximum a posteriori estimation of the NB-ICA parameters, with updating of compensation parameters. [Diagram: at each learning epoch, the posterior at time t becomes the prior at time t+1, so the priors are updated recursively across epochs t-1, t, t+1.]
41 Model Construction
- Noisy ICA model: x_t = A s_t + ε_t
- Likelihood function of an observation
- Distributions of model parameters: sources, mixing matrix, and noise
42 Prior & Marginal Distributions
Prior distributions:
- precision of the noise
- precision of the mixing matrix
Marginal likelihood of the NB-ICA model
43 Automatic Relevance Determination
Detection of active source signals: the number of sources can be determined automatically from the ARD parameters.
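One hedged sketch of the idea (the precision formula and threshold below are illustrative, not the talk's exact ARD update): each column of the mixing matrix carries a relevance precision; a column whose precision blows up has shrunk to zero, so the corresponding source is declared inactive.

```python
import numpy as np

def active_sources(A, threshold=1e3):
    """Toy ARD check: alpha_m ~ M / ||a_m||^2 for column m of A;
    a huge precision means source m has been pruned."""
    M = A.shape[0]
    alpha = M / (np.sum(A ** 2, axis=0) + 1e-12)
    return alpha < threshold              # True where source is active

A = np.array([[0.9, 1e-6],
              [0.7, -1e-6]])              # column 2 has collapsed
print(active_sources(A))                  # [ True False ]
```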
44 Compensation for Nonstationary ICA
Prior density of the compensation parameter: a conjugate prior (Wishart distribution).
45 Graphical Model for NB-ICA
[Figure: graphical model linking sources s_t, mixing matrix A, noise ε_t, and observations x_t with their hyperparameters across learning epochs l-1 and l]
46 Experiments
Nonstationary blind source separation on ICA'99 data. Scenarios:
- state of the source signals: active or inactive
- source signals or sensors are moving: nonstationary mixing matrix
47 Source Signals and ARD Curves
[Figure: source signals and their ARD curves; blue: first source signal, red: second source signal]
48 TABLE OF CONTENTS
1. Independent Component Analysis
2. Case Study I: Speech Recognition
   - Independent voices
   - Nonparametric likelihood ratio ICA
3. Case Study II: Blind Source Separation
   - Convex divergence ICA
   - Nonstationary Bayesian ICA
   - Online Gaussian process ICA
4. Summary
49 Online Gaussian Process (OLGP)
Basic ideas:
- incrementally detect the status of the source signals and estimate the corresponding distributions from online observation data
- the temporal structure of the time-varying mixing coefficients is characterized by a Gaussian process
A Gaussian process is a nonparametric model which defines a prior distribution over functions for Bayesian inference.
50 Model Construction
- Noisy ICA model
- Likelihood function
- Distributions of model parameters: sources and noise
51 Gaussian Process
The mixing matrix is generated by a latent function, and a GP prior is adopted to describe the distribution of that function; the hyperparameters of the kernel function govern its behavior, as sketched below.
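The transcript does not specify the kernel; as an illustration, the squared-exponential kernel is a common GP covariance for smooth time-varying mixing coefficients:

```python
import numpy as np

def rbf_kernel(t1, t2, variance=1.0, lengthscale=1.0):
    """k(t, t') = v * exp(-(t - t')^2 / (2 l^2))."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Sample prior draws of a time-varying mixing coefficient a(t) ~ GP(0, k)
t = np.linspace(0, 10, 200)
K = rbf_kernel(t, t) + 1e-8 * np.eye(t.size)    # jitter for stability
rng = np.random.default_rng(0)
a_t = rng.multivariate_normal(np.zeros(t.size), K, size=3)  # three draws
```

Larger lengthscales encode slower variation of the mixing process.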
52 Graphical Model for OLGP-ICA
[Figure: graphical model linking sources s_t, time-varying mixing matrix A_t, noise ε_t, and observations x_t with GP hyperparameters across learning epochs l-1 and l]
53 Experimental Setup
Nonstationary source separation. Nonstationary scenarios:
- status of the source signals: active or inactive
- source signals or sensors are moving: nonstationary mixing matrix
54 [Figure: waveforms of the male speech, music, and female speech sources]
55 Comparison of Different Methods
[Table: signal-to-interference ratios (SIRs, dB) of demixed signals 1 and 2 for VB-ICA, BICA-HMM, switching ICA, online VB-ICA, and OLGP-ICA]
56 Summary
- We presented a speaker adaptation method based on independent voices from an ICA perspective.
- A nonparametric likelihood ratio ICA was proposed according to hypothesis test theory.
- A convex divergence was developed as an optimization metric for ICA algorithms.
- A nonstationary Bayesian ICA was proposed to deal with nonstationary mixing systems.
- An online Gaussian process ICA was presented for nonstationary and temporally correlated source separation.
- ICA methods could be extended to solve nonnegative matrix factorization and single-channel separation.
57 References
- J.-T. Chien and B.-C. Chen, "A new independent component analysis for speech recognition and separation," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4.
- J.-T. Chien, H.-L. Hsieh and S. Furui, "A new mutual information measure for independent component analysis," in Proc. ICASSP.
- H.-L. Hsieh, J.-T. Chien, K. Shinoda and S. Furui, "Independent component analysis for noisy speech recognition," in Proc. ICASSP.
- H.-L. Hsieh and J.-T. Chien, "Online Bayesian learning for dynamic source separation," in Proc. ICASSP.
- H.-L. Hsieh and J.-T. Chien, "Online Gaussian process for nonstationary speech separation," in Proc. INTERSPEECH.
- H.-L. Hsieh and J.-T. Chien, "Nonstationary and temporally-correlated source separation using Gaussian process," in Proc. ICASSP.
- J.-T. Chien and H.-L. Hsieh, "Convex divergence ICA for blind source separation," IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1.
58 Thanks to H.-L. Hsieh, K. Shinoda, and S. Furui.