Lecture 8: Pitch and Chord (3) pitch detection and music transcription. Li Su 2016/03/31

Similar documents
Lecture 7: Pitch and Chord (2) HMM, pitch detection functions. Li Su 2016/03/31

0 0 = 1 0 = 0 1 = = 1 1 = 0 0 = 1

授課大綱 課號課程名稱選別開課系級學分 結果預視

Chapter 20 Cell Division Summary

Chapter 22 Lecture. Essential University Physics Richard Wolfson 2 nd Edition. Electric Potential 電位 Pearson Education, Inc.

生物統計教育訓練 - 課程. Introduction to equivalence, superior, inferior studies in RCT 謝宗成副教授慈濟大學醫學科學研究所. TEL: ext 2015

國立中正大學八十一學年度應用數學研究所 碩士班研究生招生考試試題

Candidates Performance in Paper I (Q1-4, )

Between Homomorphic Signal Processing and Deep Neural Networks: Constructing Deep Algorithms for Polyphonic Music Transcription

Advanced Engineering Mathematics 長榮大學科工系 105 級

= lim(x + 1) lim x 1 x 1 (x 2 + 1) 2 (for the latter let y = x2 + 1) lim

2019 年第 51 屆國際化學奧林匹亞競賽 國內初選筆試 - 選擇題答案卷

Linear Regression. Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) SDA Regression 1 / 34

Frequency Response (Bode Plot) with MATLAB

EXPERMENT 9. To determination of Quinine by fluorescence spectroscopy. Introduction

Chapter 1 Linear Regression with One Predictor Variable

Multiple sequence alignment (MSA)

MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara

Chapter 6. Series-Parallel Circuits ISU EE. C.Y. Lee

Algorithms and Complexity

REAXYS NEW REAXYS. RAEXYS 教育訓練 PPT HOW YOU THINK HOW YOU WORK

MECHANICS OF MATERIALS

Ch.9 Liquids and Solids

在雲層閃光放電之前就開始提前釋放出離子是非常重要的因素 所有 FOREND 放電式避雷針都有離子加速裝置支援離子產生器 在產品設計時, 為增加電場更大範圍, 使用電極支援大氣離子化,

Differential Equations (DE)

FACTORS IN FACTORIZATION: DOES BETTER AUDIO SOURCE SEPARATION IMPLY BETTER POLYPHONIC MUSIC TRANSCRIPTION?

雷射原理. The Principle of Laser. 授課教授 : 林彥勝博士 Contents

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation

HKDSE Chemistry Paper 2 Q.1 & Q.3

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

台灣大學開放式課程 有機化學乙 蔡蘊明教授 本著作除另有註明, 作者皆為蔡蘊明教授, 所有內容皆採用創用 CC 姓名標示 - 非商業使用 - 相同方式分享 3.0 台灣授權條款釋出

磁振影像原理與臨床研究應用 課程內容介紹 課程內容 參考書籍. Introduction of MRI course 磁振成像原理 ( 前 8 週 ) 射頻脈衝 組織對比 影像重建 脈衝波序 影像假影與安全 等

Topic 6. Timbre Representations

第二章 : Hydrostatics and Atmospheric Stability. Ben Jong-Dao Jou Autumn 2010

邏輯設計 Hw#6 請於 6/13( 五 ) 下課前繳交

Oracle Analysis of Sparse Automatic Music Transcription

Short-Time Fourier Transform and Chroma Features

GSAS 安裝使用簡介 楊仲準中原大學物理系. Department of Physics, Chung Yuan Christian University

Introduction Basic Audio Feature Extraction

Scalable audio separation with light Kernel Additive Modelling

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

原子模型 Atomic Model 有了正確的原子模型, 才會發明了雷射

Novelty detection. Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University

適應控制與反覆控制應用在壓電致動器之研究 Adaptive and Repetitive Control of Piezoelectric Actuators

KWUN TONG GOVERNMENT SECONDARY SCHOOL 觀塘官立中學 (Office) Shun Lee Estate Kwun Tong, Kowloon 上學期測驗

Timbral, Scale, Pitch modifications

tan θ(t) = 5 [3 points] And, we are given that d [1 points] Therefore, the velocity of the plane is dx [4 points] (km/min.) [2 points] (The other way)

University of Colorado at Boulder ECEN 4/5532. Lab 2 Lab report due on February 16, 2015

CHAPTER 2. Energy Bands and Carrier Concentration in Thermal Equilibrium

Feature extraction 1

課程名稱 : 電路學 (2) 授課教師 : 楊武智 期 :96 學年度第 2 學期

14-A Orthogonal and Dual Orthogonal Y = A X

Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization

pseudo-code-2012.docx 2013/5/9

Research Article A Combined Mathematical Treatment for a Special Automatic Music Transcription System

Chapter 2 the z-transform. 2.1 definition 2.2 properties of ROC 2.3 the inverse z-transform 2.4 z-transform properties

Chapter 1 Linear Regression with One Predictor Variable

ANSYS 17 應用於半導體設備和製程的應用技術

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

Chapter 13. Enzyme Kinetics ( 動力學 ) and Specificity ( 特異性 專一性 ) Biochemistry by. Reginald Garrett and Charles Grisham

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION

Ch2. Atoms, Molecules and Ions

ApTutorGroup. SAT II Chemistry Guides: Test Basics Scoring, Timing, Number of Questions Points Minutes Questions (Multiple Choice)

PreFEst: A Predominant-F0 Estimation Method for Polyphonic Musical Audio Signals

Ch2 Linear Transformations and Matrices

Chapter 7. The Quantum- Mechanical Model of the Atom. Chapter 7 Lecture Lecture Presentation. Sherril Soman Grand Valley State University

A Direct Simulation Method for Continuous Variable Transmission with Component-wise Design Specifications

Chapter 1 Physics and Measurement

ROBUST REALTIME POLYPHONIC PITCH DETECTION

Finite Interval( 有限區間 ) open interval ( a, closed interval [ ab, ] = { xa x b} half open( or half closed) interval. Infinite Interval( 無限區間 )

BILEVEL SPARSE MODELS FOR POLYPHONIC MUSIC TRANSCRIPTION

Short-Time Fourier Transform and Chroma Features

Learning to Recommend with Location and Context

FUNDAMENTALS OF FLUID MECHANICS Chapter 3 Fluids in Motion - The Bernoulli Equation

Topic 7. Convolution, Filters, Correlation, Representation. Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

Novelty detection. Juan Pablo Bello EL9173 Selected Topics in Signal Processing: Audio Content Analysis NYU Poly

醫用磁振學 MRM 課程介紹與原理複習. Congratulations! Syllabus. From Basics to Bedside. You are HERE! License of Radiological Technologist 盧家鋒助理教授國立陽明大學生物醫學影像暨放射科學系

數位系統設計 Digital System Design

SHIFTED AND CONVOLUTIVE SOURCE-FILTER NON-NEGATIVE MATRIX FACTORIZATION FOR MONAURAL AUDIO SOURCE SEPARATION. Tomohiko Nakamura and Hirokazu Kameoka,

壓差式迴路式均熱片之研製 Fabrication of Pressure-Difference Loop Heat Spreader

論文與專利寫作暨學術 倫理期末報告 班級 : 碩化一甲學號 :MA 姓名 : 林郡澤老師 : 黃常寧

大原利明 算法点竄指南 点竄術 算 額 絵馬堂

d) There is a Web page that includes links to both Web page A and Web page B.

Sparse Learning Under Regularization Framework

Hybrid Simulations of Ground Motions for the Earthquake Scenarios of Shanchiao and Hengchun faults

Statistics and Econometrics I

1 dx (5%) andˆ x dx converges. x2 +1 a

Elementary Number Theory An Algebraic Apporach

Signal representations: Cepstrum

CH 5 More on the analysis of consumer behavior

Candidates Performance in Paper I (Q1-4, )

Chapter 8 Lecture. Essential University Physics Richard Wolfson 2 nd Edition. Gravity 重力 Pearson Education, Inc. Slide 8-1

NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing

On Spectral Basis Selection for Single Channel Polyphonic Music Separation

Ch. 6 Electronic Structure and The Periodic Table

Harmonic Adaptive Latent Component Analysis of Audio and Application to Music Transcription

Work Energy And Power 功, 能量及功率

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

Lecture Notes on Propensity Score Matching

Transcription:

Lecture 8: Pitch and Chord (3) pitch detection and music transcription Li Su 2016/03/31

Pitch detection Pitch detection from the spectrum Problem 1: missing fundamental Problem 2: inharmonicity Periodicity-based pitch detection?

Periodicity detection We have discussed some techniques in spectrum estimation / frequency detection What is the difference between frequency and periodicity? Formally, a periodic signal is defined as x t = x t + T 0, t What is the definition of frequency? Find the fundamental frequency/period Application: pitch detection, transcription, beat tracking

Basic idea of periodicity detection Formally, a periodic signal is defined as x t = x t + T 0, t Formally, the frequency spectrum of a signal is defined as Frequency analysis: the relationship between the signal and the sinusoidal basis Periodicity analysis: the relationship between the signal and itself

Basic periodicity detection functions Autocorrelation function (ACF) Average magnitude difference function (AMDF) YIN and its periodicity detector Generalized ACF and Cepstrum

Autocorrelation function (ACF) Cross product measures similarity across time Cross correlation: R xy τ = 1 σ N 1 t=0 N 1 τ x t y(t + τ) Autocorrelation: R xx τ = 1 σ N 1 t=0 N 1 τ x t x(t + τ) t: time-domain τ: lag-domain

Other relevant pitch detection functions Average magnitude difference function (AMDF) AMDF xx τ = 1 σ N 1 t=0 N 1 τ x t x t + τ The pitch detection function used in YIN YIN xx τ = 1 σ N 1 t=0 N 1 τ x t x t + τ 2 Ref: Alain de Cheveigné et al, YIN, a fundamental frequency estimator for speech and music, J. Acoust. Soc. Am. 111 (4), April 2002 Pre-processing Pitch detection function Post-processing

Result 0.1 0 Time-domain signal A violin D4 f 0 = 293 Hz T = 3.41 msec Pitch indicator: Discarding zero-lag term (for zero lag the signal matches the signal itself) p = argmax p ACF(p) p = argmin p AMDF(p) -0.1 0 0.02 0.04 0.06 0.08 0.1 Time (s) 2 x 10-3 ACF 1 0-1 0 0.005 0.01 0.015 0.02 Lag (s) AMDF 0.06 0.04 0.02 0 0 0.005 0.01 0.015 0.02 Lag (s) 4 x 10-3 YIN 2 0 0 0.005 0.01 0.015 0.02 Lag (s)

Wiener-Khinchin Theorem The computational complexity of a N-point ACF: O(N N) Is there any way to accelerate it? Wiener-Khinchin theorem: the ACF is the inverse Fourier transform of the power spectrum R xx τ = IFFT( FFT x t Complexity: O(N log N) 2 )

Generalized ACF Consider a generalization of ACF: R xx τ = IFFT( FFT x t γ ), 0 < γ < 2 Or, R xx τ = IFFT(log FFT x t )? What are the advantages of generalized ACF? Recall the logarithmic compression part of the chromagram! Reference: Helge Indefrey, Wolfgang Hess, and Günter Seeser. "Design and evaluation of double-transform pitch determination algorithms with nonlinear distortion in the frequency domain-preliminary results." in Proc, ICASSP, 1985. Anssi Klapuri, "Multipitch analysis of polyphonic music and speech signals using an auditory model." IEEE Transaction on Audio, Speech and Language Processing, Vol.16, No.2, pp. 255-266, 2008.

Preliminary result 0.1 0 Time-domain signal A violin D4 (f 0 = 293 Hz, T = 3.41 msec) Pitch indicator: γ = 2 (ACF) γ = 0.2 Logarithm -0.1 0 0.02 0.04 0.06 0.08 0.1 Time (s) ACF 5 0-5 0 0.005 0.01 0.015 0.02 Lag (s) AMDF 0.05 0-0.05 0 0.005 0.01 0.015 0.02 Lag (s) YIN 0.2 0-0.2 0 0.005 0.01 0.015 0.02 Lag (s)

Cepstrum From spectrum to cepstrum ( 倒頻譜 ) Spectrum computed by fast Fourier transform (FFT): X f = FFT(x(t)) Cepstrum: തX q = IFFT(log X(f) ) q: quefrency ( 倒頻率 ) (not frequency) Quefrency in the cepstrum, and lag in the ACF are both measured in time (but not in the time domain) FT[ ] log[ ] FT 1 [ ] x(t) X(f) log X f തX(f) s t f(t) S f F(f) log S f + log F f log ҧ S f + log തF f convolution product addition addition

The meaning of the cepstrum What is the meaning for the spectrum of a spectrum? It extracts the oscillatory behaviors of the spectrum It measures how many oscillatory shapes per frequency -> fundamental period! We can also think ACF in this way! log X(f) log S(f) log F(f) = + തX(q) f X S (q) f rahmonic peak X F (q) f = + q q 0 q q

A closer look to the cepstrum A physical system: convolution of the excitation and the impulse response Excitation: fast spectral variation Impulse response: slow spectral variation How to do deconvolution? Homomorphic signal processing Homomorphism: to carry over operations from one algebra system to another Convert complicated operation to simple ones Example: Fourier transform logarithm Convolution Multiplication Addition

Homomorphic signal processing for pitch detection Source-filter model: pitch signal as an impulse train convolved by an impulse response Separation of oscillatory part and the impulse response part Example of homomorphic filtering: a long-pass lifter for capturing pitch information High-pass vs. long-pass Filter vs. lifter തX(q) X S (q) X F (q) = + q q 0 q q

x(t) s(t) f(t) = t t t X(f) S(f) F(f) = log X(f) f f 0 log S(f) f log F(f) f = + f f f തX(q) X S (q) X F (q) = + q q 0 q q

Generalized logarithm and cepstrum If we just care about the effect of nonlinear scaling (i.e., logarithm) when computing cepstrum Pros: simulate human s perception by compression Cons: sensitive to noise and zeros in the spectrum Generalized logarithm: x γ 1 g γ x = ቐ γ, 0 < γ < 2 ln x, γ = 0 Generalized cepstrum: തX γ q = IFFT(g γ (X(f))) Similar to the generalized ACF: R xx τ = IFFT( X f 2 ) Useful when there are multiple pitches Implication: our perception may be neither linear scale (ACF) nor log scale (cepstrum)

Example A complicated example: 5-polyphony piano sample A4+C#5+F5+G#5+B5 T. Tolonen and M. Karjalainen, A computationally efficient multipitch analysis model, IEEE Speech Audio Process., vol. 8, no. 6, pp. 708 716, Nov. 2000. L. Su and Y.-H. Yang, Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music, IEEE/ACM Speech Audio Language Process., vol. 23, no. 10, pp. 1600 1612, Oct. 2015.

Automatic music transcription Can a machine beat a music genius?

Digital music formats 訊號層面 (signal) 原始格式 :.wav 壓縮格式 :.mp3,.mp4, 符號層面 (symbolic) 音樂數位介面 (MIDI) 文字層面 (lexical) 原始格式 : 紙本 掃描成.pdf 可編輯 :musicxml 主要的音樂轉譜問題 WAV to MIDI: automatic music transcription MIDI to musicxml: automatic music transcription (note parsing) PDF to musicxml: optical music recognition (OMR) Synthesize Physical (Audio signal) Symbolic (MIDI) Transcribe Play Lexical (Score)

音樂數位介面 (MIDI) 實際製作 播放 辨識數位音樂時, 單純給電腦樂譜資訊是不夠的 舉例 : 如何定義漸快或漸慢? 反過來問 : 給定一個 C4, 從第 1.73 秒到第 2.24 秒, 它是四分音符還是八分音符? 音樂數位介面 (Musical Instrument Digital Interface, MIDI) 於 1980 年代問世 相容於 ( 所有的 ) 鍵盤樂器 音效卡 合成器 電子鼓等裝置, 內含以下資訊 : Onset Duration (offset) Pitch Velocity ( 力度 )

MusicXML 相容於多數樂譜編輯軟體如 Finale, Sibelius, MuseScore 等 基本格式 <part> <measure> <attributes> <divisions> <key> <fifths><mode> <time> <beats><beat-type> <clef> <sign><line>

WAV to MIDI 自動採譜的三個層次 Multi-pitch estimation (MPE): collectively estimate pitch values of all concurrent sources at each individual time frame, without determining their sources Note tracking (NT): estimate continuous segments that typically correspond to individual notes or syllables Timbre tracking, streaming: stream pitch estimates into a single pitch trajectory over an entire conversation or music performance for each of the concurrent sources 其他 WAV to MIDI 的相關問題 Onset detection Beat tracking / downbeat tracking Meter recognition Chord recognition Structure segmentation

Important papers M. Müller, D. P. W. Ellis, A. Klapuri, and G. Richard, Signal processing for music analysis, IEEE J. Sel. Topics Signal Process., vol. 5, no. 6, pp. 1088 1110, Dec. 2011. E. Benetos, S. Dixon, D. Giannoulis, H. Kirchhoff, and A. Klapuri, Automatic music transcription: Challenges and future directions, J. Intell. Inf. Syst., vol. 41, no. 3, pp. 407 434, 2013. Z. Duan, J. Han, and B. Pardo, Multi-pitch streaming of harmonic sound mixtures, IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 1, pp. 138 150, Jan. 2014.

Challenges of multipitch signal (1) Example: C major triad (C4+E4+G4)

Challenges of multipitch signal (2) Octave dual-tone (C4+C5) The harmonic series of C5 is fully overlapped with C4 Ill-posed problem: hard to determine whether C4 or C4+C5 Can be found by the ratio of even and odd harmonics, but not very efficient in experiment 8ve

Challenges of multipitch signal (3) 複雜的音型 ( 海頓 / 驚愕交響曲第一樂章 / 第 25.5 秒 ) Who is who? 8ve 8ve 8ve 8ve 8ve

Challenges in multipitch estimation The best grade in Music Information Retrieval Evaluation exchange (MIREX) Multi-F0 challenges: Multi-pitch estimation: 72.3 % accuracy (Anders Elowsson et. al, 2014) Note tracking: 58.2 % accuracy (Anders Elowsson et. al, 2014) Evaluated on only 30 woodwind quintets and 10 piano clips Challenge (1):overlapped harmonics (octaves, fifths) Challenge (2):noise Challenge (3):threshold of detection Challenge (4):labeled dataset We have not enough labeled data!! Challenge (5):timbre complexity One pitch played by multiple instruments Challenge (6):efficiency

State of the art Feature-based (expert knowledge-based) Using audio features derived from the input time-frequency representation (e.g., spectrum, autocorrelation, ), and designing pitch salience function Statistical model-based Maximum a posteriori (MAP) estimation problem Matrix factorization-based Non-negative matrix factorization (NMF) Dictionary-based methods

Dictionary-based pitch detection: basic From frequency representation A dictionary D R m n be a set of spectral features D = d 1, d 2,, d n, column d k R m called an atom or template Input feature vector: x R m Encoding process: template matching Solve linear equations / linear approximation, α R m x = Dα or x Dα

Basic template matching: single pitch detection Input x, dictionary D = d 1, d 2,, d 88, each d k represents one pitch (e.g., d 1 is the spectral pattern of A0, d 40 is the spectral pattern of C4) Find a d k such that x d k is maximum Vector quantization (VQ): sparsest approximation x Dα s.t. α 0 = 1 C4 Templates from A0 to C8

How about polyphonic signals? Find the atoms having the k-th largest x d k k-nearest neighbor (knn) Or, how about the following formulation? min α 0 s.t. x Dα 2 < ε From: E. Vincent et. al, Adaptive Harmonic Spectral Decomposition for Multiple Pitch Estimation, IEEE TASLP 2010

Sparse coding Perfect reconstruction minimize α 0 s.t. x = Dα Approximation, hard constraint minimize α 0 s.t. x Dα 2 < ε Approximation, soft constraint minimize x Dα 2 + λ α 0

Some concepts revisited Assume D full rank, Condition Example Solution Application m>n: skinny D x = Dα: an overdetermined system D: an under-complete dictionary Least square error: α = D T D 1 D T x Regression Curve fitting m<n: fat D x = Dα: an underdetermined system D: an over-complete dictionary Least norm solution: α = D T DD T 1 x Sparse solution: min α 0 s. t. x = Dα Signal recovery Feature selection

Some points in sparse coding Overcompleteness For n > 2m, sparse solution is guaranteed L1-norm regularization L0-norm is non-convex (no guarantee of global optimal solution) Use L1-norm instead of L0-norm (a compromise between convexity and sparsity) argmin α x Dα 2 + λ α 1

Pitch tracking and streaming 偵測音符的起始點 (onset) 音高 (pitch) 和終止點 (offset) 完成 pitch tracking, 才算是具備實際用途的自動採譜演算法 比 MPE 更精細複雜的問題 Repeating note Tie, fermata Legato, portato, staccato Trill, mordant, grace note Vibrato Tremolo Slide The challenge of streaming Clustering of timbre feature Multiple instrument recognition

Example: vibrato Pitch contours of the first note of Mozart s Variationen (C5) interpreted in 6 different musical terms: None, Scherzando, Tranquillo, Con Brio, Maestoso, and Risoluto

Instantaneous frequency estimation How accurate can we estimate the fundamental frequency? The finest grid in frequency (Δf) and periodicity (Δτ) representation Frequency-based method: Δf = f s /N Periodicity-based method: Δτ = 1/f s Super-resolution methods Interpolation Higher-order physical quantities Example: use the temporal difference of STFT phase to calibrate the instantaneous frequency Reference: Justin Salamon and Emilia Gómez. "Melody extraction from polyphonic music signals using pitch contour characteristics." IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759-1770, 2012.