Session 1: Pattern Recognition

Pattern Recognition Applied to Music Signals

Digital Processing of Musical Content (Proc. Digital del Continguts Musicals)
Session 1: Pattern Recognition

1 Music Content Analysis
2 Pattern Classification
3 The Statistical Approach
4 Distribution Models
5 Singing Detection

Dan Ellis <dpwe@ee.columbia.edu>
http://www.ee.columbia.edu/~dpwe/muscontent/
Laboratory for Recognition and Organization of Speech and Audio
Columbia University, New York - Spring 2003

1 Music Content Analysis

Music contains information at many levels
- what is it?
We'd like to get this information out automatically
- fine-level transcription of events
- broad-level classification of pieces
Information extraction can be framed as pattern classification / recognition, or machine learning
- build systems based on (labeled) training data

Music analysis

What might we want to get out of music?
Instrument identification
- different levels of specificity
- registers within instruments
Score recovery
- transcribe the note sequence
- extract the performance
Ensemble performance
- "gestalts": chords, tone colors
Broader timescales
- phrasing & musical structure
- artist / genre clustering and classification

Course Outline

Session 1: Pattern Classification
- MLP Neural Networks
- GMM distribution models
- Singing detection
Session 2: Sequence Recognition
- The Hidden Markov Model
- Chord sequence recognition
Session 3: Musical Applications
- Note transcription
- Music summarization
- Music Information Retrieval
- Similarity browsing

Outline

1 Music Content Analysis
2 Pattern Classification
- classification
- decision boundaries
- neural networks
3 The Statistical Approach
4 Distribution Models
5 Singing Detection

2 Pattern Classification

Classification means finding categorical (discrete) labels for real-world (continuous) observations.
[Figure: speech spectrogram (f/Hz vs. time/s) and an F1/F2 formant scatter plot separating the vowels "ay" and "ao"]
Why?
- information extraction
- conditional processing
- detect exceptions

Building a classifier

Define classes/attributes
- could state explicit rules
- better to define through training examples
Define feature space
Define decision algorithm
- set parameters from examples
Measure performance
- calculate (weighted) error rate
[Figure: Pols vowel formants, "u" (x) vs. "o" (o), plotted as F2/Hz against F1/Hz]

Classification system parts

Sensor -> signal
Pre-processing / segmentation (e.g. STFT, locate vowels) -> segment
Feature extraction (e.g. formant extraction) -> feature vector
Classification -> class
Post-processing (context constraints, costs/risk)

Feature extraction

The right features are critical
- waveform vs. formants vs. cepstra
- invariance under irrelevant modifications
Theoretically equivalent features may act very differently in practice
- representations make important aspects explicit
- remove irrelevant information
Feature design incorporates domain knowledge
- more data -> less need for cleverness?
Smaller feature space (fewer dimensions) -> simpler models (fewer parameters) -> less training data needed -> faster training

Minimum distance classification

[Figure: Pols vowel formants "u" (x), "o" (o), "a" (+) in the F1/F2 plane, with a new observation $x_0$ marked by *]
Find the closest match (nearest neighbor):
$\min_i [ D(x, y_{\omega,i}) ]$
- the choice of distance metric $D(x, y)$ is important
Find a representative match (class prototypes):
$\min_\omega [ D(x, z_\omega) ]$
- class data $\{ y_{\omega,i} \}$ -> class model $z_\omega$

Linear classifiers

Minimum Euclidean distance is equivalent to a linear discriminant function:
- examples $\{ y_{\omega,i} \}$ -> template $z_\omega = \mathrm{mean}\{ y_{\omega,i} \}$
- observed feature vector $x = [x_1 \; x_2 \; \ldots]^T$
- Euclidean distance metric $D[x, y] = \sqrt{(x - y)^T (x - y)}$
- minimum distance rule: class $i = \arg\min_i D[x, z_i] = \arg\min_i D^2[x, z_i]$
- i.e. choose class O over U if $D^2[x, z_O] < D^2[x, z_U]$ ...
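As a minimal sketch of this nearest-mean (minimum-distance) rule in NumPy: the class templates are just per-class means, and classification picks the closest template. The toy formant-like data and the function names are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def fit_templates(X, y):
    """Compute one mean vector (template z_w) per class from labeled data."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def classify_nearest_mean(X, classes, templates):
    """Assign each row of X to the class whose template is closest (squared Euclidean)."""
    # squared distance (x - z)^T (x - z) for every observation/template pair
    d2 = ((X[:, None, :] - templates[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]

# toy example: two vowel-like clusters in a 2-D (F1, F2) space, hypothetical values
rng = np.random.default_rng(0)
X_u = rng.normal([300, 700], 40, size=(50, 2))
X_o = rng.normal([450, 900], 40, size=(50, 2))
X = np.vstack([X_u, X_o])
y = np.array(["u"] * 50 + ["o"] * 50)

classes, templates = fit_templates(X, y)
pred = classify_nearest_mean(X, classes, templates)
print("training accuracy:", (pred == y).mean())
```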

Linear classification boundaries

Minimum distance divides the space normal to the line joining the class centers: the decision boundary is where the projection of $(x - z_U)$ onto $(z_O - z_U)$ equals half the distance between the centers,
$\frac{(x - z_U)^T (z_O - z_U)}{\| z_O - z_U \|} = \frac{1}{2} \frac{(z_O - z_U)^T (z_O - z_U)}{\| z_O - z_U \|}$
Scaling the axes changes the boundary:
[Figure: the same two classes squeezed horizontally or vertically give different decision boundaries]

Decision boundaries

Linear functions give linear boundaries
- planes and hyperplanes as the dimensions increase
Linear boundaries can become curves under feature transformations
- e.g. a linear discriminant in $x' = [x_1^2 \; x_2^2 \; \ldots]^T$
What about more complex boundaries?
- i.e. if a linear discriminant is $\sum_i w_i x_i$, how about a nonlinear function $F[\sum_i w_i x_i]$?
- that changes only the threshold, but $\sum_j w_j F[\sum_i w_{ij} x_i]$ offers more flexibility, and $F[\sum_j w_{jk} F[\sum_i w_{ij} x_i]]$ has even more...

Neural networks

Sums over nonlinear functions of sums give a large range of decision surfaces, e.g. a multi-layer perceptron (MLP) with 1 hidden layer:
$y_k = F[ \sum_j w_{jk} \, F[ \sum_i w_{ij} x_i ] ]$
[Figure: MLP with input layer $x_i$, weights $w_{ij}$, hidden layer $h_j$ with nonlinearity $F[\cdot]$, weights $w_{jk}$, and output layer $y_k$]
The problem is finding the weights $w_{ij}$... (training)

Back-propagation training

Find the weights to minimize e.g. $E = \sum_n \| y(x_n) - t_n \|^2$ for a training set of patterns $\{x_n\}$ and desired (target) outputs $\{t_n\}$.
Differentiate the error with respect to the weights; the contribution from each pattern $x_n, t_n$, with $y_k = F[ \sum_j w_{jk} h_j ]$, is
$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial y} \cdot \frac{\partial y}{\partial w_{jk}} = 2 (y - t) \, F'[ \sum_j w_{jk} h_j ] \, h_j$
$w_{jk} \leftarrow w_{jk} - \alpha \frac{\partial E}{\partial w_{jk}}$
i.e. gradient descent with learning rate $\alpha$.
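A compact NumPy sketch of this gradient-descent loop for a 2:5:3 sigmoid MLP (the same shape as the vowel example on the next slide). The random data, learning rate, and epoch count are illustrative assumptions, not the original experiment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
# toy data: N 2-D feature vectors (e.g. normalized F1, F2) and one-hot targets for 3 classes
N, n_in, n_hid, n_out = 150, 2, 5, 3
X = rng.normal(size=(N, n_in))
T = np.eye(n_out)[rng.integers(0, n_out, size=N)]

W1 = rng.normal(scale=0.5, size=(n_in, n_hid))   # input -> hidden weights w_ij
W2 = rng.normal(scale=0.5, size=(n_hid, n_out))  # hidden -> output weights w_jk
alpha = 0.1                                      # learning rate

for epoch in range(100):
    # forward pass: y_k = F[ sum_j w_jk F[ sum_i w_ij x_i ] ]
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)
    E = np.sum((Y - T) ** 2)

    # backward pass: chain rule gives dE/dW2 and dE/dW1
    dY = 2 * (Y - T) * Y * (1 - Y)          # 2(y - t) F'(.) at the output
    dH = (dY @ W2.T) * H * (1 - H)          # error propagated back to the hidden layer
    W2 -= alpha * (H.T @ dY) / N            # gradient-descent updates, averaged over patterns
    W1 -= alpha * (X.T @ dH) / N

    if epoch % 20 == 0:
        print(f"epoch {epoch:3d}  mean squared error per pattern {E / N:.3f}")
```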

Neural net example

- 2 input units (normalized F1, F2)
- 5 hidden units, 3 output units ("U", "O", "A")
Sigmoid nonlinearity:
$F[x] = \frac{1}{1 + e^{-x}}, \qquad \frac{dF}{dx} = F (1 - F)$
[Figure: plot of sigm(x) and its derivative d sigm(x)/dx]

Neural net training

[Figure: 2:5:3 net, mean squared error by training epoch (0-100), and decision contours in the F1/F2 plane after 10 and 100 iterations]

Outline

1 Music Content Analysis
2 Pattern Classification
3 The Statistical Approach
- Conditional distributions
- Bayes rule
4 Distribution Models
5 Singing Detection

3 The Statistical Approach

Observations are random variables whose distribution depends on the class:
- class $\omega_i$ (hidden, discrete) -> observation $x$ (continuous)
Source distributions $p(x \mid \omega_i)$
- reflect variability in the feature
- reflect noise in the observation
- generally have to be estimated from data (rather than known in advance)
[Figure: class-conditional densities $p(x \mid \omega_i)$ and posteriors $\Pr(\omega_i \mid x)$ for classes $\omega_1 \ldots \omega_4$]

Random variables review

Random variables have joint distributions (pdfs): $p(x, y)$
[Figure: joint density $p(x, y)$ with marginals $p(x)$, $p(y)$, means $\mu_x, \mu_y$, and spreads $\sigma_x, \sigma_y, \sigma_{xy}$]
- marginal: $p(x) = \int p(x, y) \, dy$
- covariance: $\Sigma = E[(o - \mu)(o - \mu)^T] = \begin{bmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{bmatrix}$

Conditional probability

Knowing one value in a joint distribution constrains the remainder:
[Figure: a slice of $p(x, y)$ at $y = Y$ gives $p(x \mid y = Y)$]
Bayes rule:
$p(x, y) = p(x \mid y)\, p(y) \;\Rightarrow\; p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$
can reverse the conditioning given priors/marginal
- either term can be discrete or continuous

Priors and posteriors

Bayesian inference can be interpreted as updating prior beliefs with new information, x. Bayes rule:
$\Pr(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, \Pr(\omega_i)}{\sum_j p(x \mid \omega_j)\, \Pr(\omega_j)}$
- $\Pr(\omega_i)$ is the prior probability, $p(x \mid \omega_i)$ the likelihood, the denominator the evidence $= p(x)$, and $\Pr(\omega_i \mid x)$ the posterior probability
The posterior is the prior scaled by the likelihood and normalized by the evidence (so the posteriors sum to 1).
Objection: priors are often unknown
- but omitting them amounts to assuming they are all equal
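A tiny numeric sketch of this update for a made-up two-class case (the priors and likelihood values are purely illustrative):

```python
import numpy as np

# hypothetical priors Pr(w_i) and class-conditional likelihoods p(x | w_i) at one observed x
priors = np.array([0.7, 0.3])
likelihoods = np.array([0.02, 0.05])

evidence = np.sum(likelihoods * priors)        # p(x)
posteriors = likelihoods * priors / evidence   # Pr(w_i | x), scaled and normalized

print("evidence p(x):", evidence)
print("posteriors:", posteriors, "sum =", posteriors.sum())
```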

Bayesian (MAP) classifier

Minimize the probability of error by choosing the maximum a posteriori (MAP) class:
$\hat\omega = \arg\max_{\omega_i} \Pr(\omega_i \mid x)$
Intuitively right
- choose the most probable class in light of the observation
Given models $p(x \mid \omega_i)$ for each class distribution, the search for $\hat\omega = \arg\max_{\omega_i} \Pr(\omega_i \mid x)$ becomes
$\arg\max_{\omega_i} \frac{p(x \mid \omega_i)\, \Pr(\omega_i)}{\sum_j p(x \mid \omega_j)\, \Pr(\omega_j)}$
but the denominator $= p(x)$ is the same over all $\omega_i$, so
$\hat\omega = \arg\max_{\omega_i} p(x \mid \omega_i)\, \Pr(\omega_i)$

Practical implementation

The optimal classifier is $\hat\omega = \arg\max_{\omega_i} \Pr(\omega_i \mid x)$, but we don't know $\Pr(\omega_i \mid x)$.
Can model it directly, e.g. train a neural net to map from inputs $x$ to a set of outputs $\Pr(\omega_i \mid x)$
- a discriminative model
Often easier to model the conditional distributions $p(x \mid \omega_i)$, then use Bayes rule to find the MAP class:
- take labeled training examples $\{x_n, \omega_{x_n}\}$, sort according to class, and estimate the conditional pdf for each class, e.g. $p(x \mid \omega_1)$

Outline

1 Music Content Analysis
2 Pattern Classification
3 The Statistical Approach
4 Distribution Models
- Gaussian models
- Gaussian mixtures
5 Singing Detection

4 Distribution Models

The easiest way to model distributions is via a parametric model
- assume a known form, estimate a few parameters
The Gaussian model is simple & useful. In 1 dimension:
$p(x \mid \omega_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu_i}{\sigma_i} \right)^2 \right]$
- the leading factor is the normalization that makes it integrate to 1
- parameters: mean $\mu_i$ and variance $\sigma_i^2$
[Figure: 1-D Gaussian $p(x \mid \omega_i)$ centered at $\mu_i$; at $\mu_i \pm \sigma_i$ the density falls to $e^{-1/2}$ of its peak $P_M$]

Gaussians in d dimensions

$p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right]$
Described by a d-dimensional mean vector $\mu_i$ and a d x d covariance matrix $\Sigma_i$.
[Figure: surface and contour plots of a 2-D Gaussian]
Classify by maximizing the log likelihood, i.e.
$\hat\omega = \arg\max_{\omega_i} \left[ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \log |\Sigma_i| + \log \Pr(\omega_i) \right]$
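A sketch of this Gaussian MAP classifier, assuming labeled data from which to estimate $\mu_i$, $\Sigma_i$, and the priors; scipy.stats.multivariate_normal supplies the log density. The toy data and helper names are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_classes(X, y):
    """Estimate mean, covariance, and prior for each class from labeled data."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        models[c] = dict(mean=Xc.mean(axis=0),
                         cov=np.cov(Xc, rowvar=False),
                         logprior=np.log(len(Xc) / len(X)))
    return models

def classify_map(X, models):
    """MAP rule: argmax_i [ log p(x | w_i) + log Pr(w_i) ]."""
    classes = list(models)
    scores = np.column_stack([
        multivariate_normal.logpdf(X, m["mean"], m["cov"]) + m["logprior"]
        for m in models.values()
    ])
    return np.array(classes)[np.argmax(scores, axis=1)]

# toy 2-D example with made-up class means
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 1, (100, 2)), rng.normal([3, 2], 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
models = fit_gaussian_classes(X, y)
print("training accuracy:", (classify_map(X, models) == y).mean())
```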

Gaussian mixture models (GMMs)

Single Gaussians cannot model
- distributions with multiple modes
- distributions with nonlinear correlation
What about a weighted sum? i.e.
$p(x) \approx \sum_k c_k \, p(x \mid m_k)$
where $\{c_k\}$ is a set of weights and $\{p(x \mid m_k)\}$ is a set of Gaussian components
- can fit anything given enough components
Interpretation: each observation is generated by one of the Gaussians, chosen at random with priors $c_k = \Pr(m_k)$.

Gaussian mixtures (2)

e.g. nonlinear correlation:
[Figure: original data with nonlinear correlation, the Gaussian components fit to it, and the resulting mixture surface]
Problem: finding the $c_k$ and $m_k$ parameters
- easy if we knew which $m_k$ generated each $x$

Expectation-maximization (EM)

A general procedure for estimating a model when some parameters are unknown
- e.g. which component generated a value in a Gaussian mixture model
Procedure: iteratively update the model parameters $\Theta$ to maximize Q, the expected value of the log-probability of the known training data $x_{trn}$ and the unknown parameters $u$:
$Q(\Theta, \Theta^{old}) = \sum_u \Pr(u \mid x_{trn}, \Theta^{old}) \log p(u, x_{trn} \mid \Theta)$
- iterative because $\Pr(u \mid x_{trn}, \Theta^{old})$ changes each time we change $\Theta$
- can prove this increases the data likelihood, hence a maximum-likelihood (ML) model
- local optimum - depends on initialization

Fitting GMMs with EM

Want to find the component Gaussians $m_k = \{\mu_k, \Sigma_k\}$ plus the weights $c_k = \Pr(m_k)$ that maximize the likelihood of the training data $x_{trn}$.
If we could assign each $x$ to a particular $m_k$, estimation would be direct.
Hence, treat the $k$s (mixture indices) as unknown, form $Q = E[\log p(x, k \mid \Theta)]$, and differentiate with respect to the model parameters -> equations for $\mu_k$, $\Sigma_k$ and $c_k$ that maximize Q...

GMM EM update equations

Solve for the parameters that maximize Q:
$\mu_k = \frac{\sum_n p(k \mid x_n, \Theta)\, x_n}{\sum_n p(k \mid x_n, \Theta)}$
$\Sigma_k = \frac{\sum_n p(k \mid x_n, \Theta)\, (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_n p(k \mid x_n, \Theta)}$
$c_k = \frac{1}{N} \sum_n p(k \mid x_n, \Theta)$
Each involves $p(k \mid x_n, \Theta)$, the fuzzy membership of $x_n$ in Gaussian $k$; each parameter is just a sample average, weighted by that fuzzy membership.
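A minimal NumPy/SciPy sketch of these E and M steps for a full-covariance GMM. The initialization, iteration count, regularization term, and toy bimodal data are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iter=50, seed=0):
    """Fit a K-component GMM to X (N x d) with the EM updates above."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X[rng.choice(N, K, replace=False)]          # initialize means from the data
    cov = np.array([np.cov(X, rowvar=False)] * K)    # shared initial covariance
    c = np.full(K, 1.0 / K)                          # equal initial weights

    for _ in range(n_iter):
        # E-step: fuzzy memberships p(k | x_n, Theta)
        lik = np.column_stack([c[k] * multivariate_normal.pdf(X, mu[k], cov[k])
                               for k in range(K)])
        gamma = lik / lik.sum(axis=1, keepdims=True)

        # M-step: membership-weighted sample averages
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        c = Nk / N
    return c, mu, cov

# toy bimodal data (illustrative)
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0, 0], 0.5, (200, 2)), rng.normal([3, 3], 0.8, (200, 2))])
weights, means, covs = gmm_em(X, K=2)
print("weights:", weights, "\nmeans:\n", means)
```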

GMM examples

Vowel data fit with different mixture counts:
[Figure: F1/F2 vowel data fit with 1, 2, 3, and 4 Gaussians; log p(x) = -1911, -1864, -1849, and -1840 respectively]

Methodological remarks: training and test data

A rich model can learn every training example (overtraining).
[Figure: error rate vs. training time or number of parameters; training-data error keeps falling while test-data error turns upward at the point of overfitting]
But the goal is to classify new, unseen data, i.e. generalization
- sometimes use a cross-validation set to decide when to stop training
For evaluation results to be meaningful:
- don't test with training data!
- don't train on test data (even indirectly...)
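A small sketch of the cross-validation-set idea as a generic early-stopping loop: keep training while the error on held-out data improves, stop when it has not improved for a while. The loop structure, `patience` parameter, and simulated error curve are all illustrative assumptions.

```python
import numpy as np

def early_stopping(train_step, val_error, max_epochs=200, patience=10):
    """Stop training when validation error hasn't improved for `patience` epochs."""
    best_err, best_epoch, history = np.inf, 0, []
    for epoch in range(max_epochs):
        train_step(epoch)              # one pass of parameter updates on the training set
        err = val_error()              # error measured on held-out validation data
        history.append(err)
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                      # validation error stopped improving: likely overfitting
    return best_epoch, history

# toy usage: simulate a validation error that decreases, then rises (overfitting)
errs = iter(np.concatenate([np.linspace(1.0, 0.3, 40), np.linspace(0.3, 0.6, 160)]))
stop_epoch, hist = early_stopping(train_step=lambda e: None, val_error=lambda: next(errs))
print("stopped after epoch", len(hist) - 1, "; best validation error at epoch", stop_epoch)
```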

Model complexity

More model parameters -> better fit
- more Gaussian mixture components
- more MLP hidden units
More training data -> can use larger models
For best generalization (no overfitting), there will be some optimal model size:
[Figure: word error rate (WER%) for PLP12N-8k nets vs. hidden-layer size (500-4000 units) and training-set size (9.25-74 hours); an optimal parameter/data ratio emerges along lines of constant training time]

Outline

1 Music Content Analysis
2 Pattern Classification
3 The Statistical Approach
4 Distribution Models
5 Singing Detection
- Motivation
- Features
- Classifiers

5 Singing Detection (Berenzweig et al. 01)

Can we automatically detect when singing is present?
[Figure: spectrogram of aimee.wav (0:02-0:28) with segments hand-labelled mus / vox / mus / vox / mus]
- for further processing (lyrics recognition?)
- as a song signature?
- as a basis for classification?

Singing Detection: Requirements

Labelled training examples
- 60 x 15 sec. radio excerpts
- hand-mark sung phrases
[Figure: spectrograms of training excerpts trn/mus/3, trn/mus/8, trn/mus/19, trn/mus/28 with hand-labelled vox segments]
Labelled test data
- several complete tracks from CDs, hand-labelled
Feature choice
- Mel-frequency Cepstral Coefficients (MFCCs): popular for speech; maybe for sung voice too?
- separation of voices?
- temporal dimension
Classifier choice
- MLP Neural Net
- GMMs for singing / music
- SVM?
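The slides do not name a toolchain for the MFCC features; purely as an illustration, here is one way to compute 13 MFCCs per frame with the librosa library (an assumed dependency, not part of the original course; the file path is a placeholder).

```python
import librosa

# load an audio file (placeholder path) and compute 13 MFCCs per ~16 ms frame at 16 kHz
y, sr = librosa.load("example.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=256)  # shape: (13, n_frames)

# per-coefficient mean/variance normalization is a common optional preprocessing step
mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
print(mfcc.shape)
```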

MLP Neural Net

Directly estimate p(singing | x):
[Diagram: music -> MFCC calculation -> C0, C1, ..., C12 -> MLP -> "singing" / "not singing"]
- the net has 26 inputs (C0-C12 plus deltas), 15 hidden units, 2 outputs (26:15:2)
How many hidden units?
- depends on data amount, boundary complexity
Feature context window?
- useful in speech
Delta features?
- useful in speech
Training parameters...

MLP Results

Raw net outputs on a CD track (FA 74.1%):
[Figure: Aimee Mann, "Build That Wall", with hand-labelled vox; spectrogram and raw p(singing) from a 26:15:1 netlab net on 16 ms frames (FA = 74.1% at threshold 0.5)]
Smoothed for continuity: best FA = 90.5%
[Figure: smoothed p(singing) over the same 30 s excerpt]

GMM System

Separate models for p(x | singing) and p(x | no singing)
- combined via a likelihood ratio test
[Diagram: music -> MFCC calculation -> C0, C1, ..., C12 -> GMM1 p(x | singing) and GMM2 p(x | no singing) -> log likelihood ratio test: log p(x | singing) - log p(x | not singing) > threshold? -> singing?]
How many Gaussians for each?
- say 20; depends on data & complexity
What kind of covariance?
- diagonal (spherical?)
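A sketch of this two-GMM likelihood-ratio detector using scikit-learn's GaussianMixture (an assumed library choice); the random "MFCC" arrays, component count, and threshold are placeholders, not the original experiment.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# stand-in feature matrices for frames hand-labelled "singing" / "no singing"
rng = np.random.default_rng(4)
mfcc_sing = rng.normal(0.5, 1.0, size=(2000, 26))
mfcc_nosing = rng.normal(-0.5, 1.0, size=(2000, 26))

gmm_sing = GaussianMixture(n_components=20, covariance_type="diag", random_state=0).fit(mfcc_sing)
gmm_nosing = GaussianMixture(n_components=20, covariance_type="diag", random_state=0).fit(mfcc_nosing)

def detect_singing(frames, threshold=0.0):
    """Log likelihood ratio test: log p(x|sing) - log p(x|no sing) > threshold."""
    llr = gmm_sing.score_samples(frames) - gmm_nosing.score_samples(frames)
    return llr > threshold, llr

test_frames = rng.normal(0.5, 1.0, size=(100, 26))
decisions, llr = detect_singing(test_frames)
print("fraction of frames classified as singing:", decisions.mean())
```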

GMM Results

Raw and smoothed results (best FA = 84.9%):
[Figure: Aimee Mann, "Build That Wall", with hand-labelled vox; raw log likelihood ratio from a 26-d, 20-mixture GMM on 16 ms frames (FA = 65.8% at threshold 0); smoothed by a 61-point (1 sec) Hann window (FA = 84.9% at threshold -0.8)]
The MLP has the advantage of discriminant training.
Each GMM trains only on its data subset -> faster to train? (2 x 10 min vs. 20 min)
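The 1-second smoothing mentioned above can be done by convolving the frame-level scores with a normalized Hann window. A small sketch, with a made-up likelihood-ratio trace for illustration:

```python
import numpy as np

def smooth_scores(scores, win_len=61):
    """Smooth per-frame scores with a normalized Hann window (~1 s at 16 ms frames)."""
    win = np.hanning(win_len)
    win /= win.sum()
    return np.convolve(scores, win, mode="same")

# toy frame-level log-likelihood-ratio trace: noisy, with a "singing" region in the middle
rng = np.random.default_rng(5)
llr = rng.normal(-1.0, 2.0, 600)
llr[200:400] += 3.0
smoothed = smooth_scores(llr)
print("fraction above threshold 0, raw vs. smoothed:",
      (llr > 0).mean(), (smoothed > 0).mean())
```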

Summary

Music content analysis: pattern classification
Basic machine learning methods: neural nets, GMMs
Singing detection: a classic application, but... what about the time dimension?

References

A.L. Berenzweig and D.P.W. Ellis (2001), "Locating Singing Voice Segments within Music Signals," Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, NY, October 2001. http://www.ee.columbia.edu/~dpwe/pubs/waspaa01-singing.pdf

R.O. Duda, P.E. Hart, and D.G. Stork (2001), Pattern Classification, 2nd ed., Wiley, 2001.

E. Scheirer and M. Slaney (1997), "Construction and evaluation of a robust multifeature speech/music discriminator," Proc. IEEE ICASSP, Munich, April 1997. http://www.ee.columbia.edu/~dpwe/e6820/papers/scheis97-mussp.pdf