Pattern Classification
Slide 1: Pattern Classification
- Introduction
- Parametric classifiers
- Semi-parametric classifiers
- Dimensionality reduction
- Significance testing
Slide 2: Semi-Parametric Classifiers
- Mixture densities
- ML parameter estimation
- Mixture implementations
- Expectation maximization (EM)
Slide 3: Mixture Densities
- The PDF is composed of a mixture of m component densities {ω_1, ..., ω_m}:
      p(x) = Σ_{j=1}^{m} p(x|ω_j) P(ω_j)
- The component PDF parameters and the mixture weights P(ω_j) are typically unknown, making parameter estimation a form of unsupervised learning
- Gaussian mixtures assume Normal components:  p(x|ω_k) ~ N(µ_k, Σ_k)
Slide 4: Gaussian Mixture Example: One Dimension
      p(x) = 0.6 p_1(x) + 0.4 p_2(x)
      p_1(x) ~ N(−σ, σ²)        p_2(x) ~ N(1.5σ, σ²)
(Figure: the resulting bimodal mixture density plotted against x.)
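As a hedged illustration (not part of the original slides), this NumPy snippet evaluates the mixture density above; σ = 1 is an arbitrary choice made here for concreteness:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Univariate Normal density N(mu, var) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

sigma = 1.0                                  # arbitrary scale for illustration
x = np.linspace(-4 * sigma, 5 * sigma, 400)

# p(x) = 0.6 p1(x) + 0.4 p2(x), with p1 ~ N(-sigma, sigma^2), p2 ~ N(1.5 sigma, sigma^2)
p = 0.6 * gauss_pdf(x, -sigma, sigma**2) + 0.4 * gauss_pdf(x, 1.5 * sigma, sigma**2)

# The mixture weights sum to one, so the density integrates to approximately 1
print(np.trapz(p, x))
```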
Slide 5: Gaussian Example
(Figure: Gaussian PDFs fit to the first 9 MFCCs of [s], one panel per dimension.)

Slide 6: Independent Mixtures
(Figure: [s] modeled with 2 Gaussian mixture components per dimension, same nine panels.)

Slide 7: Mixture Components
(Figure: the individual mixture components of the 2-component-per-dimension models for [s], same nine panels.)
Slide 8: ML Parameter Estimation: 1D Gaussian Mixture Means
      log L(µ_k) = Σ_{i=1}^{n} log p(x_i) = Σ_{i=1}^{n} log Σ_{j=1}^{m} p(x_i|ω_j) P(ω_j)

      ∂log L(µ_k)/∂µ_k = Σ_i ∂log p(x_i)/∂µ_k = Σ_i [1/p(x_i)] P(ω_k) ∂p(x_i|ω_k)/∂µ_k

      ∂p(x_i|ω_k)/∂µ_k = ∂/∂µ_k [ (1/(√(2π) σ_k)) e^{−(x_i−µ_k)²/(2σ_k²)} ] = p(x_i|ω_k) (x_i − µ_k)/σ_k²

      ∂log L(µ_k)/∂µ_k = Σ_i [p(x_i|ω_k) P(ω_k)/p(x_i)] (x_i − µ_k)/σ_k² = 0

Since p(x_i|ω_k) P(ω_k)/p(x_i) = P(ω_k|x_i):

      µ̂_k = Σ_i P(ω_k|x_i) x_i / Σ_i P(ω_k|x_i)
Slide 9: Gaussian Mixtures: ML Parameter Estimation
The maximum likelihood solutions are of the form:

      µ̂_k = Σ_i P̂(ω_k|x_i) x_i / Σ_i P̂(ω_k|x_i)

      Σ̂_k = Σ_i P̂(ω_k|x_i) (x_i − µ̂_k)(x_i − µ̂_k)^t / Σ_i P̂(ω_k|x_i)

      P̂(ω_k) = (1/n) Σ_i P̂(ω_k|x_i)
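As an illustration (not part of the original slides), here is a minimal NumPy sketch of one pass of these re-estimation equations for a one-dimensional Gaussian mixture; the function and variable names are hypothetical:

```python
import numpy as np

def reestimate(x, means, variances, priors):
    """One ML re-estimation pass for a 1D Gaussian mixture (a sketch of the
    update equations above; names are illustrative)."""
    x = np.asarray(x, dtype=float)
    # Component likelihoods p(x_i | w_k): one row per sample, one column per component
    lik = np.exp(-(x[:, None] - means) ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)
    # Posteriors P(w_k | x_i) via Bayes rule
    post = lik * priors
    post /= post.sum(axis=1, keepdims=True)
    # Weighted re-estimates of means, variances, and mixture weights
    weights = post.sum(axis=0)
    new_means = (post * x[:, None]).sum(axis=0) / weights
    new_vars = (post * (x[:, None] - new_means) ** 2).sum(axis=0) / weights
    new_priors = weights / len(x)
    return new_means, new_vars, new_priors
```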
Slide 10: Gaussian Mixtures: ML Parameter Estimation
- The ML solutions are typically found iteratively:
  - Select a set of initial estimates for P̂(ω_k), µ̂_k, Σ̂_k
  - Use a set of n samples to re-estimate the mixture parameters until some convergence criterion is met
- Clustering procedures are often used to provide the initial parameter estimates (similar to the K-means clustering procedure)
Slide 11: Example: 4 Samples, 2 Densities
1. Data:  X = {x_1, x_2, x_3, x_4} = {2, 1, −1, −2}
2. Init:  p(x|ω_1) ~ N(1, 1),  p(x|ω_2) ~ N(−1, 1),  P(ω_i) = 0.5
3. Estimate P(ω_1|x) and P(ω_2|x) for each sample:
      x_1 = 2:   P(ω_1|x) = 0.98   P(ω_2|x) = 0.02
      x_2 = 1:   P(ω_1|x) = 0.88   P(ω_2|x) = 0.12
      x_3 = −1:  P(ω_1|x) = 0.12   P(ω_2|x) = 0.88
      x_4 = −2:  P(ω_1|x) = 0.02   P(ω_2|x) = 0.98
   The denominators p(x_i) are proportional to (e^−0.5 + e^−4.5), (e^0 + e^−2), (e^0 + e^−2), (e^−0.5 + e^−4.5)
4. Recompute the mixture parameters (shown only for ω_1):
      P̂(ω_1) = (0.98 + 0.88 + 0.12 + 0.02)/4 = 0.5
      µ̂_1 = [0.98(2) + 0.88(1) + 0.12(−1) + 0.02(−2)] / 2 = 1.34
      σ̂²_1 = [0.98(2 − 1.34)² + 0.88(1 − 1.34)² + 0.12(−1 − 1.34)² + 0.02(−2 − 1.34)²] / 2 ≈ 0.70
5. Repeat steps 3 and 4 until convergence
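Reusing the `reestimate` sketch above on this slide's toy problem, one pass should reproduce the quantities worked out in step 4 (a hedged illustration, not part of the slides):

```python
# One re-estimation pass on the 4-sample example from the slide
x = [2, 1, -1, -2]
means, variances, priors = np.array([1.0, -1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
means, variances, priors = reestimate(x, means, variances, priors)
print(means[0], variances[0], priors[0])   # approximately 1.34, 0.70, 0.5
```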
Slide 12: [s] Duration: 2 Densities
(Table and figure: per-iteration estimates of µ_1, µ_2, σ_1, σ_2, P(ω_1), P(ω_2), and log p(X) for a two-density fit to [s] duration, with the fitted densities plotted against duration in seconds.)
Slide 13: Gaussian Mixture Example: Two Dimensions
(Figure: a two-dimensional Gaussian mixture shown as a 3-dimensional PDF surface and as a PDF contour plot.)
Slide 14: Two Dimensional Mixtures
(Figure: mixture fits to [s] over MFCC dimensions 5 and 6, comparing diagonal versus full covariance and two versus three mixture components.)
Slide 15: Two Dimensional Components
(Figure: the individual mixture components for [s] over MFCC dimensions 5 and 6.)
Slide 16: Mixture of Gaussians: Implementation Variations
- Diagonal Gaussians are often used instead of full-covariance Gaussians
  - Can reduce the number of parameters
  - Can potentially model the underlying PDF just as well if enough components are used
- Mixture parameters are often constrained to be the same in order to reduce the number of parameters which need to be estimated
  - Richter Gaussians share the same mean in order to better model the PDF tails
  - Tied mixtures share the same Gaussian parameters across all classes; only the mixture weights P̂(ω_i) are class specific (also known as semi-continuous)
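To make the parameter-count argument concrete, here is a small illustrative calculation (not from the slides); the 39-dimensional feature vector is just a typical MFCC-based example:

```python
def gmm_param_count(d, m, full_covariance):
    """Free parameters in an m-component Gaussian mixture over d-dimensional
    features (illustrative bookkeeping, not taken from the slides)."""
    mean_params = m * d
    cov_params = m * (d * (d + 1) // 2 if full_covariance else d)
    weight_params = m - 1                     # mixture weights sum to one
    return mean_params + cov_params + weight_params

# e.g. a typical 39-dimensional MFCC-based feature vector
for m in (2, 8, 32):
    print(m, gmm_param_count(39, m, False), gmm_param_count(39, m, True))
```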
Slide 17: Richter Gaussian Mixtures
(Figure: [s] log duration modeled with 2 Richter Gaussians; one panel shows the combined Richter density, the other the individual Richter components, both plotted against log duration in seconds.)
Slide 18: Expectation-Maximization (EM)
- Used for determining parameters, θ, for incomplete data, X = {x_i} (i.e., unsupervised learning problems)
- Introduces a variable, Z = {z_j}, to make the data complete, so that θ can be solved for using conventional ML techniques:
      log L(θ) = log p(X, Z|θ) = Σ_{i,j} log p(x_i, z_j|θ)
- In reality, z_j can only be estimated by P(z_j|x_i, θ), so we can only compute the expectation of log L(θ):
      E = E(log L(θ)) = Σ_i Σ_j P(z_j|x_i, θ) log p(x_i, z_j|θ)
- EM solutions are computed iteratively until convergence:
  1. Compute the expectation of log L(θ)
  2. Compute the values θ which maximize E
Slide 19: EM Parameter Estimation: 1D Gaussian Mixture Means
- Let z_i be the component id, {ω_j}, which x_i belongs to:
      E = E(log L(θ)) = Σ_i Σ_j P(z_j|x_i, θ) log p(x_i, z_j|θ)
- Convert to mixture component notation:
      E = E(log L(µ_k)) = Σ_i Σ_j P(ω_j|x_i) log p(x_i, ω_j)
- Differentiate with respect to µ_k:
      ∂E/∂µ_k = Σ_i P(ω_k|x_i) ∂log p(x_i, ω_k)/∂µ_k = Σ_i P(ω_k|x_i) (x_i − µ_k)/σ_k² = 0

      µ̂_k = Σ_i P(ω_k|x_i) x_i / Σ_i P(ω_k|x_i)
Slide 20: EM Properties
- Each iteration of EM will increase the likelihood of X:
      log [p(X|θ′)/p(X|θ)] = Σ_i log [p(x_i|θ′)/p(x_i|θ)]
        = Σ_i Σ_j P(z_j|x_i, θ) ( log [p(x_i|θ′) p(x_i, z_j|θ) / (p(x_i, z_j|θ′) p(x_i|θ))] + log [p(x_i, z_j|θ′)/p(x_i, z_j|θ)] )
- Using Bayes rule, p(x_i, z_j|θ)/p(x_i|θ) = P(z_j|x_i, θ), and the Kullback-Leibler distance metric:
      Σ_j P(z_j|x_i, θ) log [P(z_j|x_i, θ)/P(z_j|x_i, θ′)] ≥ 0
- Since θ′ was determined to maximize E(log L(θ)):
      Σ_i Σ_j P(z_j|x_i, θ) log [p(x_i, z_j|θ′)/p(x_i, z_j|θ)] ≥ 0
- Combining these two properties:  p(X|θ′) ≥ p(X|θ)
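A quick numerical check of this monotonicity property (an illustration, not from the slides), reusing the `reestimate` sketch and toy data from above: the data log-likelihood should never decrease across iterations.

```python
import numpy as np

def log_likelihood(x, means, variances, priors):
    """log p(X | theta) for a 1D Gaussian mixture."""
    x = np.asarray(x, dtype=float)
    lik = np.exp(-(x[:, None] - means) ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)
    return float(np.log((lik * priors).sum(axis=1)).sum())

x = [2, 1, -1, -2]
means, variances, priors = np.array([1.0, -1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
prev = log_likelihood(x, means, variances, priors)
for _ in range(20):
    means, variances, priors = reestimate(x, means, variances, priors)
    cur = log_likelihood(x, means, variances, priors)
    assert cur >= prev - 1e-9   # p(X | theta') >= p(X | theta)
    prev = cur
```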
Slide 21: Dimensionality Reduction
- For a given training set, PDF parameter estimation becomes less robust as dimensionality increases
- Increasing dimensions can make it more difficult to obtain insights into any underlying structure
- Analytical techniques exist which can transform a sample space to a different set of dimensions
  - If the original dimensions are correlated, the same information may require fewer dimensions
  - The transformed space will often have a more Normal distribution than the original space
  - If the new dimensions are orthogonal, it can be easier to model the transformed space
Slide 22: Principal Components Analysis
- Linearly transforms a d-dimensional vector, x, to a d′-dimensional vector, y, via orthonormal vectors, W:
      y = W^t x        W = {w_1, ..., w_{d′}}        W^t W = I
- If d′ < d, x can only be partially reconstructed from y:
      x̂ = W y
- The principal components, W, minimize the distortion, D, between x and x̂ on the training data, X = {x_1, ..., x_n}:
      D = Σ_{i=1}^{n} ||x_i − x̂_i||²
- Also known as the Karhunen-Loève (K-L) expansion (the w_i are sinusoids for some stochastic processes)
Slide 23: PCA Computation
(Figure: original axes x_1, x_2 and rotated principal axes y_1, y_2.)
- W corresponds to the first d′ eigenvectors, P, of Σ:
      P = {e_1, ..., e_d}        Σ = P Λ P^t        w_i = e_i
- The full covariance structure of the original space, Σ, is transformed to a diagonal covariance structure, Λ
- The eigenvalues, {λ_1, ..., λ_d}, represent the variances in Λ
- The axes in the d′-space contain the maximum amount of variance:
      D = Σ_{i=d′+1}^{d} λ_i
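A minimal sketch (not from the slides) of the computation just described, run on synthetic data: eigendecomposition of the sample covariance, projection onto the first d′ eigenvectors, and a check that the average reconstruction distortion roughly equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, correlated 10-dimensional data (illustrative only)
X = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 10))

# Eigendecomposition of the sample covariance: Sigma = P Lambda P^t
Sigma = np.cov(X, rowvar=False)
eigvals, P = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]           # sort by decreasing variance
eigvals, P = eigvals[order], P[:, order]

d_prime = 3
W = P[:, :d_prime]                          # first d' eigenvectors
Xc = X - X.mean(axis=0)                     # work with mean-removed data
Y = Xc @ W                                  # y = W^t x (projection)
X_hat = Y @ W.T                             # x_hat = W y (partial reconstruction)

# Average distortion per sample is approximately the sum of the discarded eigenvalues
print(np.mean(np.sum((Xc - X_hat) ** 2, axis=1)), eigvals[d_prime:].sum())
```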
Slide 24: PCA Example
- Original feature vector: mean-rate response (d = 40)
- Data obtained from 100 speakers from the TIMIT corpus
- The first 10 components explain 98% of the total variance
(Figure: percent of total variance versus number of components, for PCA and a cosine basis.)
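A statement such as "the first 10 components explain 98% of the total variance" comes directly from the eigenvalues; continuing the sketch above (the percentages printed here are for the synthetic data, not the TIMIT mean-rate responses):

```python
# Continuing from the eigendecomposition sketch above: cumulative fraction of
# total variance explained by the leading components
explained = np.cumsum(eigvals) / eigvals.sum()
for k in (1, 2, 3, 5):
    print(k, round(100 * explained[k - 1], 1), "%")
```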
Slide 25: PCA Example
(Figure: principal-component values plotted against frequency (Bark).)
Slide 26: PCA for Boundary Classification
- Eight non-uniform averages of the 14 MFCCs (C0–C13), taken over regions Left 8, 4, 2, 1 and Right 1, 2, 4, 8 around the boundary
- The first 50 dimensions are used for classification
(Figure: the second and seventh principal components displayed over the eight regions and the 14 coefficients.)
Slide 27: PCA Issues
- PCA can be performed using:
  - the covariance matrix, Σ
  - the correlation-coefficient matrix, P, where  ρ_ij = σ_ij / √(σ_ii σ_jj)  and  |ρ_ij| ≤ 1
- P is usually preferred when the input dimensions have significantly different ranges
- PCA can be used to normalize or whiten the original d-dimensional space to simplify subsequent processing, so that  Σ = P = Λ = I  in the transformed space
- The whitening operation can be done in one step:  z = V^t x
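A sketch of the whitening step (not from the slides); the one-step matrix is assumed here to be V = P Λ^(−1/2), a standard choice which the slide itself leaves unspecified:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 5))  # correlated synthetic data

Sigma = np.cov(X, rowvar=False)
lam, P = np.linalg.eigh(Sigma)              # Sigma = P diag(lam) P^t

# One-step whitening: z = V^t x with V = P diag(lam)^(-1/2)  (assumed form of V)
V = P / np.sqrt(lam)
Z = (X - X.mean(axis=0)) @ V

# The whitened data has (approximately) identity covariance
print(np.round(np.cov(Z, rowvar=False), 2))
```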
Slide 28: Significance Testing
- To properly compare results from different classifier algorithms, A_1 and A_2, it is necessary to perform significance tests
  - Large differences can be insignificant for small test sets
  - Small differences can be significant for large test sets
- General significance tests evaluate the hypothesis that the probability of being correct, p_i, is the same for both algorithms
- The most powerful comparisons can be made using common train and test corpora and a common evaluation criterion
  - Results then reflect differences in the algorithms rather than accidental differences in the test sets
- Significance tests can be more precise when identical data are used, since they can focus on tokens misclassified by only one algorithm rather than on all tokens
Slide 29: McNemar's Significance Test
- When algorithms A_1 and A_2 are tested on identical data, we can collapse the results into a 2x2 matrix of counts:

      A_1 \ A_2    Correct    Incorrect
      Correct       n_00       n_01
      Incorrect     n_10       n_11

- To compare the algorithms, we test the null hypothesis H_0 that p_1 = p_2, or n_01 = n_10, or  q = n_01 / (n_01 + n_10) = 1/2
- Given H_0, the probability of observing k tokens asymmetrically classified out of n = n_01 + n_10 follows a Binomial PMF:
      P(k) = (n choose k) (1/2)^n
- McNemar's Test measures the probability, P, of all cases that meet or exceed the observed asymmetric distribution, and tests whether P < α
Slide 30: McNemar's Significance Test (cont.)
- The probability, P, is computed by summing the two PMF tails:
      P = Σ_{k=0}^{l} P(k) + Σ_{k=m}^{n} P(k),    where  l = min(n_01, n_10),  m = max(n_01, n_10)
(Figure: the Binomial probability distribution over 0, ..., n, with the tails below the minimum and above the maximum shaded.)
- For large n, a Normal distribution is often assumed
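A direct implementation sketch of this two-tailed computation (illustrative, not from the slides):

```python
from math import comb

def mcnemar_p(n01, n10):
    """Two-tailed exact McNemar probability: both Binomial(n, 1/2) tails at or
    beyond the observed asymmetric counts (a sketch of the computation above)."""
    n = n01 + n10
    lo, hi = min(n01, n10), max(n01, n10)
    tail_low = sum(comb(n, k) for k in range(0, lo + 1))
    tail_high = sum(comb(n, k) for k in range(hi, n + 1))
    # If n01 == n10 the two tails overlap at k = lo = hi, so cap at 1
    return min(1.0, (tail_low + tail_high) / 2 ** n)
```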
Slide 31: Significance Test Example (Gillick and Cox, 1989)
- Common test set of 1400 tokens
- Algorithms A_1 and A_2 make 72 and 62 errors
- Are the differences significant? Three 2x2 tables, differing in how many errors the two algorithms share, give:
      n = 134,  m = 72:   P = 0.437
      n = 16,   m = 13:   P = 0.0213
      n = 10,   m = 10:   P ≈ 0.002
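Applying the `mcnemar_p` sketch above to this example; note that the individual (n_01, n_10) splits below are inferred from the stated totals (62 vs. 72 errors imply n_10 − n_01 = 10) and are assumptions rather than values quoted from the slide:

```python
# Reusing mcnemar_p() from the sketch above. The (n01, n10) splits are inferred
# from the error totals (62 vs 72 errors => n10 - n01 = 10); they are assumptions,
# not values quoted from the slide.
for n01, n10 in [(62, 72), (3, 13), (0, 10)]:
    print(n01 + n10, max(n01, n10), round(mcnemar_p(n01, n10), 4))
# prints approximately: 134 72 0.437 / 16 13 0.0213 / 10 10 0.002
```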
Slide 32: References
- Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001
- Duda, Hart, and Stork, Pattern Classification, John Wiley & Sons, 2001
- Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997
- Bishop, Neural Networks for Pattern Recognition, Clarendon Press, 1995
- Gillick and Cox, "Some Statistical Issues in the Comparison of Speech Recognition Algorithms," Proc. ICASSP, 1989