Lecture 7: Pitch and Chord (2) HMM, pitch detection functions Li Su 2016/03/31
Chord progressions Chord progressions are not arbitrary Example 1: I-IV-I-V-I (C-F-C-G-C) Example 2: I-V-VI-III-IV-I-II-V (C-G-Am-Em-F-C-Dm-G) From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
Markov chains of chord progressions Markov states α 1, α 2, α 3 in a sequence s 1 s 2 s 3 Markov property: P s n+1 = α j s n = α i, s n 1 = α k, = P(s n+1 = α j s n = α i ) From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
HMM model Observations Chroma features Or template-based result Hidden states Refined chord sequence The answer we want Transition probability From training data Emission probability From training data From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
Discrete HMM components Map an arbitrary chroma features in the test data to one of a finite set of prototype vectors (codebook) Quantization: map the feature Clustering: train the codebook From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
A naïve HMM model training method I states (e.g., I = 24), K observation symbols For the training data: c i = number of transition form α i at time (n = 1) a ij = number of transitions from α i to α j number of transitions from α i b ik = number of transitions from α iand observing β k number of transitions from α i From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
The uncovering problem of HMM Given: An HMM specified by Θ = (A, A, C, B, B) An observation sequence O = (o 1, o 2,, o N ) Find: The single state sequence S = (s 1, s 2,, s N ), s i A that best explain the observation sequence S = argmax S I states, N time frames -> total I N possible paths How to solve this problem? P(O, S Θ)
Viterbi s algorithm (1) Based on dynamic programming: the optimal result for a problem is built on the optimal result for the sub-problems From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
Viterbi s algorithm (2) For backtracking From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
An example of Viterbi s algorithm (1) From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
An example of Viterbi s algorithm (2) From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
Result Better than temporal smoothing From: M. Mueller, Fundamentals of Music Processing, Chapter 5, Springer 2015
Pitch detection
Pitch detection Pitch detection from the spectrum Problem 1: missing fundamental Problem 2: inharmonicity Periodicity-based pitch detection? From: http://sites.sinauer.com/wolfe4e/wa10.02.html From: http://www.21harmony.com/blog/illuminating-inharmonicity
Periodicity detection We have discussed some techniques in spectrum estimation / frequency detection What is the difference between frequency and periodicity? Formally, a periodic signal is defined as x t = x t + T 0, t What is the definition of frequency? Find the fundamental frequency/period Application: pitch detection, transcription, beat tracking
Pitch detection theory: a historical remark August Seebeck (1805-1849) Georg Simon Ohm (1789-1854) Herman von Helmholtz (1821-1894) Harvey Fletcher (1884-1981) Jan Frederik Schouten (1910-1980)
Seebeck s experiment (1841) and Ohm s second law Ohm s second law: a pitch could be heard only if the wave contains power at the frequency ( Fourierism perspective) Ohm: Seebeck s finding is just an illusion pitch is periodicity! pitch is frequency!
Helmholtz s theory On the Sensations of Tone as a Physiological Basis for the Theory of Music (1877) Fourierism perspective: distortion products generated in the ear so we can hear that weak fundamental Fletcher: discover missing fundamental using high-pass filter on audio signal I support Ohm s position, and I have a beautiful explanation I support Helmholtz s position!
Schouten s experiment I (1938) Input signal: 400Hz, 600Hz, 800Hz,, with distortion product at 200Hz (Helmholtz s theory) Add a pure tone of 206 Hz, beats should be heard No beats were heard Things are not quite so simple
Schouten s experiment II (1938) Input signal: 1000Hz, 1200Hz, 1400Hz A clear pitch at 200 Hz should be heard (Helmholtz s theory) Input signal: 1040Hz, 1240Hz, 1440Hz Also a clear pitch at 200 Hz should be heard (Helmholtz s theory) Experiment: ~207 Hz Things are not quite so simple
Challenges Quasi-periodicity Multiple periodicity (polyphonic: overlap and harmonic) Transient
Basic idea of periodicity detection Formally, a periodic signal is defined as x t = x t + T 0, t Formally, the frequency spectrum of a signal is defined as Frequency analysis: the relationship between the signal and the sinusoidal basis Periodicity analysis: the relationship between the signal and itself
Basic periodicity detection functions Autocorrelation function (ACF) Average magnitude difference function (AMDF) YIN and its periodicity detector Generalized ACF and Cepstrum
Autocorrelation function (ACF) Cross product measures similarity across time Cross correlation: R xy τ = 1 σ N 1 t=0 N 1 τ x t y(t + τ) Autocorrelation: R xx τ = 1 σ N 1 t=0 N 1 τ x t x(t + τ) t: time-domain τ: lag-domain
Other relevant pitch detection functions Average magnitude difference function (AMDF) AMDF xx τ = 1 σ N 1 t=0 N 1 τ x t x t + τ The pitch detection function used in YIN YIN xx τ = 1 σ N 1 t=0 N 1 τ x t x t + τ 2 Ref: Alain de Cheveigné et al, YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am. 111 (4), April 2002 Pre-processing Pitch detection function Post-processing
Result 0.1 0 Time-domain signal A violin D4 f 0 = 293 Hz T = 3.41 msec Pitch indicator: Discarding zero-lag term (for zero lag the signal matches the signal itself) p = argmax p ACF(p) p = argmin p AMDF(p) -0.1 0 0.02 0.04 0.06 0.08 0.1 Time (s) 2 x 10-3 ACF 1 0-1 0 0.005 0.01 0.015 0.02 Lag (s) AMDF 0.06 0.04 0.02 0 0 0.005 0.01 0.015 0.02 Lag (s) 4 x 10-3 YIN 2 0 0 0.005 0.01 0.015 0.02 Lag (s)
Wiener-Khinchin Theorem The computational complexity of a N-point ACF: O(N N) Is there any way to accelerate it? Wiener-Khinchin theorem: the ACF is the inverse Fourier transform of the power spectrum R xx τ = IFFT( FFT x t Complexity: O(N log N) 2 )
Generalized ACF Consider a generalization of ACF: R xx τ = IFFT( FFT x t γ ), 0 < γ < 2 Or, R xx τ = IFFT(log FFT x t )? What are the advantages of generalized ACF? Recall the logarithmic compression part of the chromagram! Reference: Helge Indefrey, Wolfgang Hess, and Günter Seeser. "Design and evaluation of double-transform pitch determination algorithms with nonlinear distortion in the frequency domain-preliminary results." in Proc, ICASSP, 1985. Anssi Klapuri, "Multipitch analysis of polyphonic music and speech signals using an auditory model." IEEE Transaction on Audio, Speech and Language Processing, Vol.16, No.2, pp. 255-266, 2008.
Preliminary result 0.1 0 Time-domain signal A violin D4 (f 0 = 293 Hz, T = 3.41 msec) Pitch indicator: γ = 2 (ACF) γ = 0.2 Logarithm -0.1 0 0.02 0.04 0.06 0.08 0.1 Time (s) ACF 5 0-5 0 0.005 0.01 0.015 0.02 Lag (s) AMDF 0.05 0-0.05 0 0.005 0.01 0.015 0.02 Lag (s) YIN 0.2 0-0.2 0 0.005 0.01 0.015 0.02 Lag (s)