Session 1: Pattern Recognition
Proc. Digital del Continguts Musicals
Session 1: Pattern Recognition
- Music Content Analysis
- Pattern Classification
- The Statistical Approach
- Distribution Models
- Singing Detection
Dan Ellis <dpwe@ee.columbia.edu>
Laboratory for Recognition and Organization of Speech and Audio
Columbia University, New York, Spring 2003
1. Music Content Analysis
Music contains information at many levels - what is it?
We'd like to get this information out automatically:
- fine-level transcription of events
- broad-level classification of pieces
Information extraction can be framed as pattern classification / recognition, or machine learning:
- build systems based on (labeled) training data
Music analysis
What might we want to get out of music?
Instrument identification
- different levels of specificity
- registers within instruments
Score recovery
- transcribe the note sequence
- extract the performance
Ensemble performance
- gestalts: chords, tone colors
Broader timescales
- phrasing & musical structure
- artist / genre clustering and classification
Course Outline
Session 1: Pattern Classification
- MLP neural networks
- GMM distribution models
- Singing detection
Session 2: Sequence Recognition
- The hidden Markov model
- Chord sequence recognition
Session 3: Musical Applications
- Note transcription
- Music summarization
- Music information retrieval
- Similarity browsing
Outline
Music Content Analysis
Pattern Classification
- classification
- decision boundaries
- neural networks
The Statistical Approach
Distribution Models
Singing Detection
2. Pattern Classification
Classification means finding categorical (discrete) labels for real-world (continuous) observations.
[Figure: spectrogram and F1/F2 formant-space plot of vowel observations "ay" and "ao"]
Why?
- information extraction
- conditional processing
- detect exceptions
Building a classifier
Define classes/attributes
- could state explicit rules
- better to define through training examples
Define feature space
Define decision algorithm
- set parameters from examples
Measure performance
- calculate (weighted) error rate
[Figure: Pols vowel formants, "u" (x) vs. "o" (o), in the F1/Hz vs. F2/Hz plane]
Classification system parts
sensor -> signal -> pre-processing/segmentation (e.g. STFT, locate vowels) -> segment -> feature extraction (e.g. formant extraction) -> feature vector -> classification -> class -> post-processing (context constraints, costs/risk)
Feature extraction
The right features are critical:
- waveform vs. formants vs. cepstra
- invariance under irrelevant modifications
Theoretically equivalent features may act very differently in practice:
- representations make important aspects explicit
- remove irrelevant information
Feature design incorporates domain knowledge
- more data -> less need for cleverness?
Smaller feature space (fewer dimensions) -> simpler models (fewer parameters) -> less training data needed -> faster training
Minimum distance classification
[Figure: Pols vowel formants "u" (x), "o" (o), "a" (+) in the F1/F2 plane, with a new observation x*]
Find the closest match (nearest neighbor): min over examples y_ω,i of D(x, y_ω,i)
- the choice of distance metric D(x, y) is important
Find the representative match (class prototypes): min over classes ω of D(x, z_ω)
- class data {y_ω,i} -> class model z_ω
Linear classifiers
Minimum Euclidean distance is equivalent to a linear discriminant function:
- examples {y_ω,i} -> template z_ω = mean{y_ω,i}
- observed feature vector x = [x_1 x_2 ...]^T
- Euclidean distance metric D[x, y] = sqrt( (x - y)^T (x - y) )
- minimum distance rule: class i = argmin_i D[x, z_i] = argmin_i D^2[x, z_i]
- i.e. choose class O over U if D^2[x, z_O] < D^2[x, z_U] ...
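A minimal sketch of the nearest-mean rule and its equivalent linear discriminant; the formant-like feature values below are invented for illustration.

```python
import numpy as np

def fit_class_means(X, labels):
    """Return a dict of class -> mean vector (the template z_w)."""
    return {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_nearest_mean(x, means):
    """Pick the class whose mean is closest in squared Euclidean distance."""
    return min(means, key=lambda c: np.sum((x - means[c]) ** 2))

def classify_linear(x, z_o, z_u):
    """Equivalent linear rule: choose 'o' if (x - midpoint) . (z_o - z_u) > 0."""
    w = z_o - z_u
    midpoint = 0.5 * (z_o + z_u)
    return "o" if np.dot(x - midpoint, w) > 0 else "u"

# toy usage with invented (F1, F2) values
X = np.array([[350.0, 800.0], [400.0, 850.0], [450.0, 1000.0], [500.0, 1050.0]])
labels = np.array(["u", "u", "o", "o"])
means = fit_class_means(X, labels)
x_new = np.array([430.0, 950.0])
print(classify_nearest_mean(x_new, means), classify_linear(x_new, means["o"], means["u"]))
```

Both functions give the same decision; scaling either feature axis would move the implied boundary, as the next slide illustrates.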
Linear classification boundaries
Min-distance divides the space normal to the class centers: the decision boundary is where the projection of (x - z_U) onto (z_O - z_U), i.e. (x - z_U)^T (z_O - z_U) / ||z_O - z_U||, equals half the distance between the centers, (1/2) ||z_O - z_U||.
Scaling the feature axes changes the boundary:
[Figure: the same two-class data squeezed horizontally or vertically gives a different decision boundary]
Decision boundaries
Linear functions give linear boundaries
- planes and hyperplanes as the number of dimensions increases
Linear boundaries can become curves under feature transformations
- e.g. a linear discriminant in x' = [x_1^2 x_2]^T
What about more complex boundaries?
- i.e. if a linear discriminant is sum_i w_i x_i, how about a nonlinear function F[ sum_i w_i x_i ]?
- that changes only the threshold, but a sum of nonlinearities sum_j F[ sum_i w_ij x_i ] offers more flexibility, and F[ sum_j w_jk F[ sum_i w_ij x_i ] ] has even more...
Neural networks
Sums over nonlinear functions of sums -> a large range of decision surfaces
e.g. multi-layer perceptron (MLP) with one hidden layer:
  y_k = F[ sum_j w_jk F[ sum_i w_ij x_i ] ]
[Figure: network diagram with input layer x_i, weights w_ij, hidden layer h_j, weights w_jk, output layer y_k]
The problem is finding the weights w_ij ... (training)
Back-propagation training
Find the w_ij to minimize e.g. E = sum_n | y(x_n) - t_n |^2 for a training set of patterns {x_n} and desired (target) outputs {t_n}.
Differentiate the error with respect to the weights; the contribution from each pattern x_n, t_n, with y_k = F[ sum_j w_jk h_j ], is
  dE/dw_jk = (dE/dy)(dy/dw_jk) = 2 (y - t) F'[ sum_j w_jk h_j ] h_j
Update w_jk <- w_jk - α dE/dw_jk, i.e. gradient descent with learning rate α.
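A minimal sketch of this gradient-descent training for a 2:5:3-style MLP with sigmoid units and squared error; data shapes, values, and the learning rate are invented, and bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W_ih, W_ho):
    h = sigmoid(x @ W_ih)   # hidden activations h_j = F[sum_i w_ij x_i]
    y = sigmoid(h @ W_ho)   # outputs y_k = F[sum_j w_jk h_j]
    return h, y

def train_step(X, T, W_ih, W_ho, alpha=0.1):
    H, Y = forward(X, W_ih, W_ho)
    # dE/dy = 2(y - t); F' = F(1 - F) for the sigmoid
    delta_out = 2.0 * (Y - T) * Y * (1.0 - Y)
    delta_hid = (delta_out @ W_ho.T) * H * (1.0 - H)
    W_ho -= alpha * H.T @ delta_out   # w_jk <- w_jk - alpha * dE/dw_jk
    W_ih -= alpha * X.T @ delta_hid
    return np.mean((Y - T) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                 # 2 inputs (e.g. normalized F1, F2)
T = np.eye(3)[rng.integers(0, 3, size=20)]   # 3 one-hot targets (U, O, A)
W_ih = rng.normal(scale=0.5, size=(2, 5))
W_ho = rng.normal(scale=0.5, size=(5, 3))
for epoch in range(100):
    mse = train_step(X, T, W_ih, W_ho)
print("final MSE:", mse)
```

With random targets the error only falls so far; on real labelled formant data the mean squared error drops over epochs as on the training-curve slide below.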
Neural net example
- 2 input units (normalized F1, F2)
- 5 hidden units
- 3 output units ("U", "O", "A")
Sigmoid nonlinearity: F[x] = 1 / (1 + e^-x), dF/dx = F (1 - F)
[Figure: 2:5:3 network diagram; plots of sigm(x) and d sigm(x)/dx]
Neural net training
[Figure: 2:5:3 net, mean squared error vs. training epoch; decision regions after different numbers of training iterations]
Outline
Music Content Analysis
Pattern Classification
The Statistical Approach
- Conditional distributions
- Bayes rule
Distribution Models
Singing Detection
3. The Statistical Approach
Observations are random variables whose distribution depends on the class:
- class ω_i (hidden, discrete) -> observation x (continuous)
Source distributions p(x | ω_i):
- reflect variability in the feature
- reflect noise in the observation
- generally have to be estimated from data (rather than known in advance)
[Figure: overlapping class-conditional densities p(x | ω_i) and posteriors Pr(ω_i | x) for classes ω_1 ... ω_4]
Random variables review
Random variables have joint distributions (pdfs) p(x, y):
[Figure: joint density p(x, y) with marginals p(x), p(y), means µ_x, µ_y, and spreads σ_x, σ_y, σ_xy]
- marginal: p(x) = ∫ p(x, y) dy
- covariance: Σ = E[ (o - µ)(o - µ)^T ] = [ σ_x^2  σ_xy ; σ_xy  σ_y^2 ]
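A small sketch of estimating the mean vector and covariance matrix from samples; the 2-D data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic 2-D observations with some correlation between the dimensions
obs = rng.multivariate_normal([500.0, 1500.0], [[900.0, 300.0], [300.0, 2500.0]], size=200)
mu = obs.mean(axis=0)
Sigma = (obs - mu).T @ (obs - mu) / len(obs)   # E[(o - mu)(o - mu)^T]
print("mean:", mu)
print("covariance:\n", Sigma)
```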
Conditional probability
Knowing one value in a joint distribution constrains the remainder:
[Figure: a slice of the joint p(x, y) at y = Y gives the conditional p(x | y = Y)]
Bayes' rule: p(x, y) = p(x | y) p(y), so p(y | x) = p(x | y) p(y) / p(x)
- can reverse the conditioning given priors/marginal
- either term can be discrete or continuous
Priors and posteriors
Bayesian inference can be interpreted as updating prior beliefs with new information x:
  Pr(ω_i | x) = p(x | ω_i) Pr(ω_i) / sum_j p(x | ω_j) Pr(ω_j)
- Pr(ω_i): prior probability; p(x | ω_i): likelihood; the denominator is the evidence p(x); Pr(ω_i | x): posterior probability
The posterior is the prior scaled by the likelihood and normalized by the evidence (so the posteriors sum to 1).
Objection: priors are often unknown
- but omitting them amounts to assuming they are all equal
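A toy numeric check of this update, with all numbers invented: two classes, their priors, and per-class likelihoods of one observation x.

```python
priors = {"sing": 0.3, "no_sing": 0.7}
likelihood = {"sing": 0.8, "no_sing": 0.2}                     # p(x | class)
evidence = sum(priors[c] * likelihood[c] for c in priors)      # p(x) = 0.38
posterior = {c: priors[c] * likelihood[c] / evidence for c in priors}
print(posterior)   # {'sing': ~0.63, 'no_sing': ~0.37}; posteriors sum to 1
```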
Bayesian (MAP) classifier
Minimize the probability of error by choosing the maximum a posteriori (MAP) class:
  ω̂ = argmax_ωi Pr(ω_i | x)
Intuitively right: choose the most probable class in light of the observation.
Given models p(x | ω_i) for each class distribution, the search for ω̂ = argmax_ωi Pr(ω_i | x) becomes
  ω̂ = argmax_ωi p(x | ω_i) Pr(ω_i) / sum_j p(x | ω_j) Pr(ω_j)
but the denominator p(x) is the same over all ω_i, so
  ω̂ = argmax_ωi p(x | ω_i) Pr(ω_i)
Practical implementation
The optimal classifier is ω̂ = argmax_ωi Pr(ω_i | x), but we don't know Pr(ω_i | x).
Can model it directly, e.g. train a neural net to map from inputs x to a set of outputs Pr(ω_i | x)
- a discriminative model
Often easier to model the conditional distributions p(x | ω_i), then use Bayes' rule to find the MAP class:
- labeled training examples {x_n, ω_xn} -> sort according to class -> estimate the conditional pdf for each class, e.g. p(x | ω_1)
Outline
Music Content Analysis
Pattern Classification
The Statistical Approach
Distribution Models
- Gaussian models
- Gaussian mixtures
Singing Detection
4. Distribution Models
The easiest way to model distributions is via a parametric model
- assume a known form, estimate a few parameters
The Gaussian model is simple and useful. In one dimension:
  p(x | ω_i) = 1 / (sqrt(2π) σ_i) * exp( -(x - µ_i)^2 / (2 σ_i^2) )
- the 1 / (sqrt(2π) σ_i) factor is the normalization that makes it integrate to 1
- parameters: mean µ_i and variance σ_i^2, fit to the data
[Figure: 1-D Gaussian p(x | ω_i) centered at µ_i with width σ_i]
Gaussians in d dimensions:
  p(x | ω_i) = 1 / ( (2π)^(d/2) |Σ_i|^(1/2) ) * exp( -(1/2) (x - µ_i)^T Σ_i^-1 (x - µ_i) )
Described by a d-dimensional mean vector µ_i and a d x d covariance matrix Σ_i.
Classify by maximizing the log likelihood, i.e.
  argmax_ωi [ -(1/2) (x - µ_i)^T Σ_i^-1 (x - µ_i) - (1/2) log |Σ_i| + log Pr(ω_i) ]
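A minimal sketch of the d-dimensional Gaussian log-likelihood and the MAP rule above; the class means, covariances, and priors are invented placeholders.

```python
import numpy as np

def gaussian_loglik(x, mu, Sigma):
    """log N(x; mu, Sigma) for a d-dimensional Gaussian."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (diff @ np.linalg.solve(Sigma, diff) + logdet + d * np.log(2 * np.pi))

def map_classify(x, params, priors):
    """params: class -> (mu, Sigma); priors: class -> Pr(w_i)."""
    return max(params, key=lambda c: gaussian_loglik(x, *params[c]) + np.log(priors[c]))

# toy usage with invented formant-space classes
params = {"u": (np.array([350.0, 800.0]), np.eye(2) * 900.0),
          "o": (np.array([480.0, 1000.0]), np.eye(2) * 900.0)}
priors = {"u": 0.5, "o": 0.5}
print(map_classify(np.array([420.0, 930.0]), params, priors))
```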
Gaussian mixture models (GMMs)
Single Gaussians cannot model:
- distributions with multiple modes
- distributions with nonlinear correlation
What about a weighted sum? i.e.
  p(x) = sum_k c_k p(x | m_k)
where {c_k} is a set of weights and {p(x | m_k)} is a set of Gaussian components
- can fit anything given enough components
Interpretation: each observation is generated by one of the Gaussians, chosen at random with priors c_k = Pr(m_k).
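A minimal sketch of evaluating such a mixture density as a weighted sum of Gaussian components; the weights, means, and covariances are invented.

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def gmm_density(x, weights, mus, Sigmas):
    # p(x) = sum_k c_k p(x | m_k)
    return sum(c * gauss_pdf(x, mu, S) for c, mu, S in zip(weights, mus, Sigmas))

# two components can capture a bimodal spread that a single Gaussian cannot
weights = [0.4, 0.6]
mus = [np.array([300.0, 800.0]), np.array([600.0, 1400.0])]
Sigmas = [np.eye(2) * 400.0, np.eye(2) * 900.0]
print(gmm_density(np.array([580.0, 1350.0]), weights, mus, Sigmas))
```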
Gaussian mixtures (2)
e.g. nonlinearly correlated data:
[Figure: original data, the fitted Gaussian components, and the resulting mixture density surface]
Problem: finding the c_k and m_k parameters
- easy if we knew which m_k generated each x
Expectation-maximization (EM)
A general procedure for estimating a model when some parameters are unknown
- e.g. which component generated each value in a Gaussian mixture model
Procedure: iteratively update the model parameters Θ to maximize Q, the expected log-probability of the known training data x_trn and the unknown parameters u:
  Q(Θ, Θ_old) = sum_u Pr(u | x_trn, Θ_old) log p(u, x_trn | Θ)
- iterative because Pr(u | x_trn, Θ_old) changes each time we change Θ
- one can prove this increases the data likelihood, hence a maximum-likelihood (ML) model
- local optimum: depends on initialization
Fitting GMMs with EM
We want to find the component Gaussians m_k = {µ_k, Σ_k} plus the weights c_k = Pr(m_k) that maximize the likelihood of the training data x_trn.
If we could assign each x to a particular m_k, estimation would be direct.
Hence, treat the k's (mixture indices) as unknown, form Q = E[ log p(x, k | Θ) ], and differentiate with respect to the model parameters -> equations for µ_k, Σ_k and c_k that maximize Q ...
GMM EM update equations
Solving for the parameters that maximize Q:
  µ_k = sum_n p(k | x_n, Θ) x_n / sum_n p(k | x_n, Θ)
  Σ_k = sum_n p(k | x_n, Θ) (x_n - µ_k)(x_n - µ_k)^T / sum_n p(k | x_n, Θ)
  c_k = (1/N) sum_n p(k | x_n, Θ)
Each involves p(k | x_n, Θ), the "fuzzy membership" of x_n to Gaussian k.
Each parameter is just a sample average, weighted by the fuzzy membership.
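A minimal full-covariance EM sketch of these updates on synthetic 2-D data: an E-step computing the responsibilities p(k | x_n, Θ), then an M-step of membership-weighted averages. Initialization and iteration count are arbitrary choices.

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(Sigma))
    return np.exp(-0.5 * np.einsum("nd,de,ne->n", diff, inv, diff)) / norm

def em_step(X, c, mus, Sigmas):
    K, N = len(c), len(X)
    # E-step: fuzzy membership p(k | x_n, Theta)
    resp = np.stack([c[k] * gauss_pdf(X, mus[k], Sigmas[k]) for k in range(K)], axis=1)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: membership-weighted sample averages
    Nk = resp.sum(axis=0)
    mus = [(resp[:, k:k + 1] * X).sum(axis=0) / Nk[k] for k in range(K)]
    Sigmas = [((X - mus[k]).T * resp[:, k]) @ (X - mus[k]) / Nk[k] for k in range(K)]
    c = Nk / N
    return c, mus, Sigmas

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([4.0, 4.0], 1.0, size=(100, 2))])
c = np.array([0.5, 0.5])
mus = [X[0].copy(), X[-1].copy()]
Sigmas = [np.eye(2), np.eye(2)]
for _ in range(20):
    c, mus, Sigmas = em_step(X, c, mus, Sigmas)
print(c, mus[0], mus[1])
```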
GMM examples
Vowel data fit with different mixture counts:
[Figure: fits with increasing numbers of Gaussian components, each labelled with its log p(x) on the training data]
Methodological remarks: training and test data
A rich model can learn every training example (overtraining).
[Figure: error rate vs. training time or number of parameters; training error keeps falling while test error rises once overfitting sets in]
But the goal is to classify new, unseen data, i.e. generalization
- sometimes use a cross-validation set to decide when to stop training
For evaluation results to be meaningful:
- don't test with training data!
- don't train on test data (even indirectly...)
Model complexity
More model parameters -> better fit
- more Gaussian mixture components
- more MLP hidden units
More training data -> can use larger models
For best generalization (no overfitting), there will be some optimal model size:
[Figure: word error rate (WER%) for PLP12N-8k nets vs. hidden-layer size and training-set hours, showing an optimal parameter/data ratio at constant training time]
Outline
Music Content Analysis
Pattern Classification
The Statistical Approach
Distribution Models
Singing Detection
- Motivation
- Features
- Classifiers
5. Singing Detection (Berenzweig et al. 2001)
Can we automatically detect when singing is present?
[Figure: spectrogram of a pop excerpt (aimee.wav) with hand-marked alternating "mus" / "vox" segments]
- for further processing (lyrics recognition?)
- as a song signature?
- as a basis for classification?
Singing detection: requirements
Labelled training examples
- 60 x 15 sec. radio excerpts
- hand-mark the sung phrases
Labelled test data
- several complete tracks from CDs, hand-labelled
[Figure: spectrograms of training excerpts (trn/mus/...) with hand-labelled vox regions]
Feature choice
- Mel-frequency cepstral coefficients (MFCCs): popular for speech; maybe for the sung voice too?
- separation of voices?
- temporal dimension
Classifier choice
- MLP neural net
- GMMs for singing / music
- SVM?
MLP neural net
Directly estimate p(singing | x):
- music -> MFCC calculation -> C_0, C_1, ..., C_12 (+ deltas) -> net -> singing / not singing
- the net has 26 inputs, 15 hidden units, 2 outputs (26:15:2)
How many hidden units?
- depends on the amount of data and the boundary complexity
Feature context window?
- useful in speech
Delta features?
- useful in speech
Training parameters...
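A minimal sketch of a comparable frame-level pipeline (not the authors' code): MFCC features per frame and an MLP estimating p(singing | x). The file paths, the random frame labels, and the hyperparameters are placeholders; real use needs the hand-marked vox labels aligned to frames.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def frame_mfccs(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    # (n_frames, n_mfcc): one feature vector per ~16 ms hop at 16 kHz
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=256).T

# stacked MFCC frames from labelled excerpts (placeholder file names)
X_train = np.vstack([frame_mfccs(p) for p in ["excerpt01.wav", "excerpt02.wav"]])
# placeholder random labels; replace with real per-frame singing/no-singing marks
y_train = np.random.default_rng(0).integers(0, 2, size=len(X_train))

net = MLPClassifier(hidden_layer_sizes=(15,), max_iter=500)  # cf. the 26:15:2 net
net.fit(X_train, y_train)
p_sing = net.predict_proba(frame_mfccs("test_track.wav"))[:, 1]  # p(singing) per frame
```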
MLP results
Raw net outputs on a CD track (frame accuracy, FA, 74.1%):
[Figure: Aimee Mann, "Build That Wall", spectrogram with hand-labelled vox plus p(singing) from the net on 16 ms frames, threshold 0.5]
Smoothed for continuity: best FA = 90.5%
[Figure: the same p(singing) after smoothing]
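A small sketch of that post-hoc smoothing of the per-frame posteriors (the GMM slide below mentions a 61-point, roughly 1 s, Hann window); the window length, threshold, and placeholder posterior trace here are assumed values.

```python
import numpy as np

def smooth_posteriors(p_sing, win_len=61):
    """Smooth a frame-level p(singing) trace with a normalized Hann window."""
    win = np.hanning(win_len)
    win /= win.sum()
    return np.convolve(p_sing, win, mode="same")

# placeholder posterior trace; in practice this comes from the MLP or GMM system
p_sing = np.clip(np.random.default_rng(0).normal(0.5, 0.2, size=2000), 0.0, 1.0)
labels = smooth_posteriors(p_sing) > 0.5   # frame decisions after smoothing
```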
GMM system
Separate models for p(x | singing) and p(x | no singing), combined via a likelihood ratio test:
- music -> MFCC calculation -> C_0, C_1, ..., C_12 -> GMM1 p(x | singing) and GMM2 p(x | no singing) -> log-likelihood-ratio test log[ p(x | singing) / p(x | not singing) ] -> singing?
How many Gaussians for each?
- say 20; depends on data & complexity
What kind of covariance?
- diagonal (spherical?)
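A minimal sketch of the two-GMM likelihood-ratio detector using scikit-learn; the component count and covariance type follow the slide, but the training matrices, test frames, and decision threshold are invented placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# placeholder feature matrices; in practice these are MFCC frames sorted by label
rng = np.random.default_rng(0)
X_sing = rng.normal(0.5, 1.0, size=(500, 13))
X_nosing = rng.normal(-0.5, 1.0, size=(500, 13))
X_test = rng.normal(0.0, 1.0, size=(100, 13))

gmm_sing = GaussianMixture(n_components=20, covariance_type="diag").fit(X_sing)
gmm_nosing = GaussianMixture(n_components=20, covariance_type="diag").fit(X_nosing)

def llr(X):
    # log p(x | singing) - log p(x | not singing), per frame
    return gmm_sing.score_samples(X) - gmm_nosing.score_samples(X)

decisions = llr(X_test) > 0.0   # "singing" where the (optionally smoothed) ratio exceeds a threshold
print(decisions.mean())
```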
GMM results
Raw and smoothed results (best FA = 84.9%):
[Figure: Aimee Mann, "Build That Wall", with hand-labelled vox; log-likelihood ratio from 26-d, 20-mixture GMMs on 16 ms frames (threshold 0), raw and smoothed by a 61-point (1 sec) Hann window (threshold -0.8)]
The MLP has the advantage of discriminant training.
Each GMM trains only on its data subset -> faster to train? (2 x 10 min vs. 20 min)
Summary
Music content analysis: pattern classification
Basic machine learning methods: neural nets, GMMs
Singing detection: a classic application, but... what about the time dimension?
References
A.L. Berenzweig and D.P.W. Ellis (2001), "Locating Singing Voice Segments within Music Signals," Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, NY, October 2001.
R.O. Duda, P.E. Hart, and D.G. Stork (2001), Pattern Classification, 2nd ed., Wiley.
E. Scheirer and M. Slaney (1997), "Construction and evaluation of a robust multifeature speech/music discriminator," Proc. IEEE ICASSP, Munich, April 1997.