Lecture 7: Pitch and Chord (2) HMM, pitch detection functions. Li Su 2016/03/31

Similar documents
Introduction Basic Audio Feature Extraction

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

Topic 7. Convolution, Filters, Correlation, Representation. Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 9: Acoustic Models

Short-Time Fourier Transform and Chroma Features

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi

The Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models

Statistical NLP Spring The Noisy Channel Model

Topic 6. Timbre Representations

Hidden Markov Model and Speech Recognition

Robert Collins CSE586 CSE 586, Spring 2015 Computer Vision II

Recall: Modeling Time Series. CSE 586, Spring 2015 Computer Vision II. Hidden Markov Model and Kalman Filter. Modeling Time Series

Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription

ROBUST REALTIME POLYPHONIC PITCH DETECTION

Statistical NLP Spring Digitizing Speech

Digitizing Speech. Statistical NLP Spring Frame Extraction. Gaussian Emissions. Vector Quantization. HMMs for Continuous Observations? ...

ON SCALABLE CODING OF HIDDEN MARKOV SOURCES. Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

University of Colorado at Boulder ECEN 4/5532. Lab 2 Lab report due on February 16, 2015

The Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech

PUBLISHED IN IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL. 19, NO. 6, AUGUST

Short-Time Fourier Transform and Chroma Features

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

Feature Extraction for ASR: Pitch

CS 188: Artificial Intelligence Fall 2011

PreFEst: A Predominant-F0 Estimation Method for Polyphonic Musical Audio Signals

Cochlear modeling and its role in human speech recognition

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Lecture 5: GMM Acoustic Modeling and Feature Extraction

Temporal Modeling and Basic Speech Recognition

Hidden Markov Models. Dr. Naomi Harte

Lecture 9: Speech Recognition. Recognizing Speech

Lecture 9: Speech Recognition

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg

Lecture 8: Pitch and Chord (3) pitch detection and music transcription. Li Su 2016/03/31

Independent Component Analysis and Unsupervised Learning

Lecture 5. The Digital Fourier Transform. (Based, in part, on The Scientist and Engineer's Guide to Digital Signal Processing by Steven Smith)

Timbral, Scale, Pitch modifications

Statistical NLP: Hidden Markov Models. Updated 12/15

Research Article A Combined Mathematical Treatment for a Special Automatic Music Transcription System

Lecture 4: Hidden Markov Models: An Introduction to Dynamic Decision Making. November 11, 2010

Proc. of NCC 2010, Chennai, India

Nonnegative Matrix Factorization with Markov-Chained Bases for Modeling Time-Varying Patterns in Music Spectrograms

Hidden Markov Models

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

Linear Prediction 1 / 41

Speech and Language Processing. Chapter 9 of SLP Automatic Speech Recognition (II)

Automatic Speech Recognition (CS753)

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Linear Dynamical Systems (Kalman filter)

Sequence modelling. Marco Saerens (UCL) Slides references

BILEVEL SPARSE MODELS FOR POLYPHONIC MUSIC TRANSCRIPTION

Intelligent Systems (AI-2)

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391

David Weenink. First semester 2007

On Optimal Coding of Hidden Markov Sources

Intelligent Systems (AI-2)

Independent Component Analysis and Unsupervised Learning. Jen-Tzung Chien

Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers

'L. E. Dickson, Introduction to the Theory of Numbers, Chap. V (1929).

Audio Features. Fourier Transform. Fourier Transform. Fourier Transform. Short Time Fourier Transform. Fourier Transform.

Bayesian harmonic models for musical signal analysis. Simon Godsill and Manuel Davy

Audio Features. Fourier Transform. Short Time Fourier Transform. Short Time Fourier Transform. Short Time Fourier Transform

p(d θ ) l(θ ) 1.2 x x x

Expectation Maximization (EM)

State-Space Methods for Inferring Spike Trains from Calcium Imaging

Robust Speaker Identification

Feature extraction 1

Introduction to Artificial Intelligence (AI)

Session 1: Pattern Recognition

Physical Acoustics. Hearing is the result of a complex interaction of physics, physiology, perception and cognition.

What s an HMM? Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) Hidden Markov Models (HMMs) for Information Extraction

Hidden Markov Modelling

Introduction to Machine Learning CMU-10701

Brief Introduction of Machine Learning Techniques for Content Analysis

ESTIMATING TRAFFIC NOISE LEVELS USING ACOUSTIC MONITORING: A PRELIMINARY STUDY

Dept. of Linguistics, Indiana University Fall 2009

CS 343: Artificial Intelligence

Voice Activity Detection Using Pitch Feature

Hidden Markov models

LONG-TERM REVERBERATION MODELING FOR UNDER-DETERMINED AUDIO SOURCE SEPARATION WITH APPLICATION TO VOCAL MELODY EXTRACTION.

Pitch Estimation and Tracking with Harmonic Emphasis On The Acoustic Spectrum

BAYESIAN NONNEGATIVE HARMONIC-TEMPORAL FACTORIZATION AND ITS APPLICATION TO MULTIPITCH ANALYSIS

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms

ISOLATED WORD RECOGNITION FOR ENGLISH LANGUAGE USING LPC,VQ AND HMM

Monitoring System Of Phytoplankton Blooms By Using Unsupervised Classifier And Time Modelling

Artificial Intelligence Markov Chains

Hidden Markov models 1

15-381: Artificial Intelligence. Hidden Markov Models (HMMs)

Hidden Markov Models in Language Processing

Statistical Machine Learning from Data

CS229 Project: Musical Alignment Discovery

Multiscale Systems Engineering Research Group

CS 5522: Artificial Intelligence II

A gentle introduction to Hidden Markov Models

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Transcription:

Lecture 7: Pitch and Chord (2) HMM, pitch detection functions Li Su 2016/03/31

Chord progressions Chord progressions are not arbitrary Example 1: I-IV-I-V-I (C-F-C-G-C) Example 2: I-V-VI-III-IV-I-II-V (C-G-Am-Em-F-C-Dm-G) From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Markov chains of chord progressions Markov states α 1, α 2, α 3 in a sequence s 1 s 2 s 3 Markov property: P s n+1 = α j s n = α i, s n 1 = α k, = P(s n+1 = α j s n = α i ) From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

HMM model Observations Chroma features Or template-based result Hidden states Refined chord sequence The answer we want Transition probability From training data Emission probability From training data From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Discrete HMM components Map an arbitrary chroma features in the test data to one of a finite set of prototype vectors (codebook) Quantization: map the feature Clustering: train the codebook From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

A naïve HMM model training method I states (e.g., I = 24), K observation symbols For the training data: c i = number of transition form α i at time (n = 1) a ij = number of transitions from α i to α j number of transitions from α i b ik = number of transitions from α iand observing β k number of transitions from α i From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

The uncovering problem of HMM Given: An HMM specified by Θ = (A, A, C, B, B) An observation sequence O = (o 1, o 2,, o N ) Find: The single state sequence S = (s 1, s 2,, s N ), s i A that best explain the observation sequence S = argmax S I states, N time frames -> total I N possible paths How to solve this problem? P(O, S Θ)

Viterbi s algorithm (1) Based on dynamic programming: the optimal result for a problem is built on the optimal result for the sub-problems From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Viterbi s algorithm (2) For backtracking From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

An example of Viterbi s algorithm (1) From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

An example of Viterbi s algorithm (2) From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Result Better than temporal smoothing From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015

Pitch detection

Pitch detection Pitch detection from the spectrum Problem 1: missing fundamental Problem 2: inharmonicity Periodicity-based pitch detection? From: http://sites.sinauer.com/wolfe4e/wa10.02.html From: http://www.21harmony.com/blog/illuminating-inharmonicity

Periodicity detection We have discussed some techniques in spectrum estimation / frequency detection What is the difference between frequency and periodicity? Formally, a periodic signal is defined as x t = x t + T 0, t What is the definition of frequency? Find the fundamental frequency/period Application: pitch detection, transcription, beat tracking

Pitch detection theory: a historical remark August Seebeck (1805-1849) Georg Simon Ohm (1789-1854) Herman von Helmholtz (1821-1894) Harvey Fletcher (1884-1981) Jan Frederik Schouten (1910-1980)

Seebeck s experiment (1841) and Ohm s second law Ohm s second law: a pitch could be heard only if the wave contains power at the frequency ( Fourierism perspective) Ohm: Seebeck s finding is just an illusion pitch is periodicity! pitch is frequency!

Helmholtz s theory On the Sensations of Tone as a Physiological Basis for the Theory of Music (1877) Fourierism perspective: distortion products generated in the ear so we can hear that weak fundamental Fletcher: discover missing fundamental using high-pass filter on audio signal I support Ohm s position, and I have a beautiful explanation I support Helmholtz s position!

Schouten s experiment I (1938) Input signal: 400Hz, 600Hz, 800Hz,, with distortion product at 200Hz (Helmholtz s theory) Add a pure tone of 206 Hz, beats should be heard No beats were heard Things are not quite so simple

Schouten s experiment II (1938) Input signal: 1000Hz, 1200Hz, 1400Hz A clear pitch at 200 Hz should be heard (Helmholtz s theory) Input signal: 1040Hz, 1240Hz, 1440Hz Also a clear pitch at 200 Hz should be heard (Helmholtz s theory) Experiment: ~207 Hz Things are not quite so simple

Challenges Quasi-periodicity Multiple periodicity (polyphonic: overlap and harmonic) Transient

Basic idea of periodicity detection Formally, a periodic signal is defined as x t = x t + T 0, t Formally, the frequency spectrum of a signal is defined as Frequency analysis: the relationship between the signal and the sinusoidal basis Periodicity analysis: the relationship between the signal and itself

Basic periodicity detection functions Autocorrelation function (ACF) Average magnitude difference function (AMDF) YIN and its periodicity detector Generalized ACF and Cepstrum

Autocorrelation function (ACF) Cross product measures similarity across time Cross correlation: R xy τ = 1 σ N 1 t=0 N 1 τ x t y(t + τ) Autocorrelation: R xx τ = 1 σ N 1 t=0 N 1 τ x t x(t + τ) t: time-domain τ: lag-domain

Other relevant pitch detection functions Average magnitude difference function (AMDF) AMDF xx τ = 1 σ N 1 t=0 N 1 τ x t x t + τ The pitch detection function used in YIN YIN xx τ = 1 σ N 1 t=0 N 1 τ x t x t + τ 2 Ref: Alain de Cheveigné et al, YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am. 111 (4), April 2002 Pre-processing Pitch detection function Post-processing

Result 0.1 0 Time-domain signal A violin D4 f 0 = 293 Hz T = 3.41 msec Pitch indicator: Discarding zero-lag term (for zero lag the signal matches the signal itself) p = argmax p ACF(p) p = argmin p AMDF(p) -0.1 0 0.02 0.04 0.06 0.08 0.1 Time (s) 2 x 10-3 ACF 1 0-1 0 0.005 0.01 0.015 0.02 Lag (s) AMDF 0.06 0.04 0.02 0 0 0.005 0.01 0.015 0.02 Lag (s) 4 x 10-3 YIN 2 0 0 0.005 0.01 0.015 0.02 Lag (s)

Wiener-Khinchin Theorem The computational complexity of a N-point ACF: O(N N) Is there any way to accelerate it? Wiener-Khinchin theorem: the ACF is the inverse Fourier transform of the power spectrum R xx τ = IFFT( FFT x t Complexity: O(N log N) 2 )

Generalized ACF Consider a generalization of ACF: R xx τ = IFFT( FFT x t γ ), 0 < γ < 2 Or, R xx τ = IFFT(log FFT x t )? What are the advantages of generalized ACF? Recall the logarithmic compression part of the chromagram! Reference: Helge Indefrey, Wolfgang Hess, and Günter Seeser. "Design and evaluation of double-transform pitch determination algorithms with nonlinear distortion in the frequency domain-preliminary results." in Proc, ICASSP, 1985. Anssi Klapuri, "Multipitch analysis of polyphonic music and speech signals using an auditory model." IEEE Transaction on Audio, Speech and Language Processing, Vol.16, No.2, pp. 255-266, 2008.

Preliminary result 0.1 0 Time-domain signal A violin D4 (f 0 = 293 Hz, T = 3.41 msec) Pitch indicator: γ = 2 (ACF) γ = 0.2 Logarithm -0.1 0 0.02 0.04 0.06 0.08 0.1 Time (s) ACF 5 0-5 0 0.005 0.01 0.015 0.02 Lag (s) AMDF 0.05 0-0.05 0 0.005 0.01 0.015 0.02 Lag (s) YIN 0.2 0-0.2 0 0.005 0.01 0.015 0.02 Lag (s)