Introduction Basic Audio Feature Extraction Vincent Koops (with slides by Meinhard Müller) Sound and Music Technology, December 6th, 2016 1 28 November 2017
Today g Main modules A. Sound and music for games B. Analysis, classification, and retrieval of sound and music for media C. Generation and manipulation of sound and music for games and media 2
Introduction Aim of this lecture: g Show you the basics of digital signal processing g Show you the basic of audio feature extraction g You should be able to solve simple tasks using audio feature extraction toolboxes g Important concepts are marked in italics g Have some fun with sound! 3
Course Module Module B: g 1. Introduction to Digital Signal Processing: g From natural sound to digital audio g 2. Basic audio feature extraction: g From digital audio to features Case studies: practical examples Toolboxes, etc. 4
What are Features? g If we want to compare/find/classify things, we need a way to describe them in a useful way g Visual: colors, shapes, etc 5
What are Features? g Audio: frequencies, amplitudes: g Low/high sounds: ex. Tuba vs Flute instrument g Loud/quiet sounds: ex. Metal vs Acoustic music g (A manipulation of) these elements is called a feature. 6
Recap: What are Features? g Important feature properties: g Level of abstraction: Low: Frequency -> low/high music instruments Mid: Genre -> metal/country music High: Emotion -> sad/happy music g Locality: Global feature: Local feature: entire signal part of a signal (for time representation) 7
Book: Fundamentals of Music Processing Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., hardcover ISBN: 978-3-319-21944-8 Springer, 2015 Accompanying website: www.music-processing.de
Fourier Transform Idea: Decompose a given signal into a superposition of sinusoids (elementary signals). Sinusoids Signal
Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Sinusoids
Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Sinusoids Signal
Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Signal Fouier transform 1 0.5 1 2 3 4 5 6 7 8 Frequency (Hz)
Fourier Transform Role of phase 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase
Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = 0.05 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase
Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = 0.24 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase
Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = 0.45 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase
Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = 0.6 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase
Fourier Transform Example: Superposition of two sinusoids Frequency (Hz)
Fourier Transform Example: C4 played by piano Frequency (Hz)
Fourier Transform Example: Piano tone (C4, 261.6 Hz)
Fourier Transform Example: Piano tone (C4, 261.6 Hz) Analysis using sinusoid with 262 Hz high correlation large Fourier coefficient
Fourier Transform Example: Piano tone (C4, 261.6 Hz) Analysis using sinusoid with 400 Hz low correlation small Fourier coefficient
Fourier Transform Example: Piano tone (C4, 261.6 Hz) Analysis using sinusoid with 523 Hz high correlation large Fourier coefficient
Fourier Transform Example: C4 played by trumpet Frequency (Hz)
Fourier Transform Example: C4 played by violine Frequency (Hz)
Fourier Transform Example: C4 played by flute Frequency (Hz)
Fourier Transform Example: Speech Bonn Frequency (Hz)
Fourier Transform Example: Speech Zürich Frequency (Hz)
Fourier Transform Example: C-major scale (piano) Frequency (Hz)
Fourier Transform Example: Chirp signal Frequency (Hz)
Fourier Transform Signal Fourier representation Fourier transform
Fourier Transform Signal Fourier representation Fourier transform Tells which frequencies occur, but does not tell when the frequencies occur. Frequency information is averaged over the entire time interval. Time information is hidden in the phase
Fourier Transform Frequency (Hz) Frequency (Hz)
Short Time Fourier Transform Idea (Dennis Gabor, 1946): Consider only a small section of the signal for the spectral analysis recovery of time information Short Time Fourier Transform (STFT) Section is determined by pointwise multiplication of the signal with a localizing window function
Short Time Fourier Transform Frequency (Hz)
Short Time Fourier Transform Frequency (Hz)
Short Time Fourier Transform Frequency (Hz)
Short Time Fourier Transform Frequency (Hz)
Short Time Fourier Transform Frequency (Hz)
Short Time Fourier Transform Frequency (Hz)
Short Time Fourier Transform Frequency (Hz)
Short Time Fourier Transform Frequency (Hz)
Short Time Fourier Transform Window functions Rectangular window Triangular window Hann window Frequency (Hz)
Short Time Fourier Transform Window functions Frequency (Hz) Trade off between smoothing and ringing
Short Time Fourier Transform Definition Signal Window function (, ) STFT with for
Short Time Fourier Transform Intuition: is musical note of frequency ω centered at time t Inner product measures the correlation between the musical note and the signal Frequency (Hz) 12 8 4 0 0 1 2 3 4 5 6
Time-Frequency Representation Spectrogram Frequency (Hz)
Time-Frequency Representation Spectrogram Frequency (Hz) Intensity (db)
Time-Frequency Representation Chirp signal and STFT with Hann window of length 50 ms Frequency (Hz)
Time-Frequency Representation Chirp signal and STFT with box window of length 50 ms Frequency (Hz)
Time-Frequency Representation Time-Frequency Localization Size of window constitutes a trade-off between time resolution and frequency resolution: Large window : Small window : poor time resolution good frequency resolution good time resolution poor frequency resolution Heisenberg Uncertainty Principle: there is no window function that localizes in time and frequency with arbitrary position.
Time-Frequency Representation Signal and STFT with Hann window of length 20 ms Frequency (Hz)
Time-Frequency Representation Signal and STFT with Hann window of length 100 ms Frequency (Hz)
Audio Features Example: C-major scale (piano) Spectrogram Frequency (Hz)
Audio Features Example: Chromatic scale Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) C1 24 Spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108
Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) Spectrogram
Audio Features Model assumption: Equal-tempered scale MIDI pitches: Piano notes: p = 21 (A0) to p = 108 (C8) Concert pitch: p = 69 (A4) 440 Hz Center frequency: Hz Logarithmic frequency distribution Octave: doubling of frequency
Audio Features Idea: Binning of Fourier coefficients Divide up the fequency axis into logarithmically spaced pitch regions and combine spectral coefficients of each region to a single pitch coefficient.
Audio Features Time-frequency representation Windowing in the time domain Windowing in the frequency domain
Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) Spectrogram
Audio Features Example: Chromatic scale C1 24 Spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 C8: 4186 Hz C7: 2093 Hz C6: 1046 Hz C5: 523 Hz C4: 262 Hz C3: 131 Hz Intensity (db)
Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 C8: 4186 Hz C7: 2093 Hz C6: 1046 Hz C5: 523 Hz C4: 262 Hz C3: 131 Hz Intensity (db)
Audio Features Chroma features Chromatic circle Shepard s helix of pitch
Audio Features Chroma features Human perception of pitch is periodic in the sense that two pitches are perceived as similar in color if they differ by an octave. Seperation of pitch into two components: tone height (octave number) and chroma. Chroma : 12 traditional pitch classes of the equaltempered scale. For example: Chroma C Computation: pitch features chroma features Add up all pitches belonging to the same class Result: 12-dimensional chroma vector.
Audio Features Chroma features
Audio Features Chroma features C2 C3 C4 Chroma C
Audio Features Chroma features C # 2 C # 3 C # 4 Chroma C #
Audio Features Chroma features D2 D3 D4 Chroma D
Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db)
Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db) Chroma C
Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db) Chroma C #
Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Chromagram Chroma Intensity (db)
Audio Features Chroma features Frequency (pitch)
Audio Features Chroma features Frequency (pitch)
Audio Features Chroma features Chroma
Audio Features Chroma features Sequence of chroma vectors correlates to the harmonic progression Normalization to changes in dynamics makes features invariant Further denoising and smoothing Taking logarithm before adding up pitch coefficients accounts for logarithmic sensation of intensity
Audio Features Chroma features (normalized) Karajan Scherbakov
Audio Features Automatic Chord Estimation G:maj D:maj See: http://chordify.net
Mid-level representation Self-distance matrix (self-similarity matrix) Common mid-level representation Comparing each frame with all other frames Each element describing the dissimilarity of two frames (or a sequence of frames) Informative patterns Stripes for repeated sequences Blocks for homogenous segments Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010
Mid-level representation Self-distance matrix examples MFCC Chroma Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010
Approaches Categorization Proposed categorization Novelty-based approaches (points of high contrast) Repetition-based approaches Homogeneity-based approaches An earlier division into Sequence approaches: There exists sequences that are repeated during the piece (stripes in SDMs) State approaches: Piece is produced by a state machine, each state produces distinct observations Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010
Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010
Audio Features There are many ways to implement chroma features Properties may differ significantly Appropriateness depends on respective application http://www.mpi-inf.mpg.de/resources/mir/chromatoolbox/ MATLAB implementations for various chroma variants
Tool boxes g Many tool boxes exist that can help you develop your own MIR application g Toolboxes have their pro s and con s depending on: g Language of choice: Matlab, Python, C, C++, etc. g I will only list the large frameworks, a lot of other code is available: g Use you favourite search engine! g See also: g http://www.music-ir.org/evaluation/tools.html (rather outdated) g https://wiki.python.org/moin/pythoninmusic 16
VAMP plugins g http://www.vamp-plugins.org/ g You need a host and a plugin: g Sonic visualizer g Sonic annotator g A lot of plugins for high and low-level feature extraction are available g Outputs text files or XML (RDF) g Has Python bindings (VamPy) 17
Sonic Visualizer Waveform Spectrogram Chromagram 18
Matlab toolboxes g g g g g MIR toolbox g https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materi als/mirtoolbox Similarity Matrix Toolbox: g http://www.audiolabs-erlangen.de/resources/mir/smtoolbox/ Dan Ellis Matlab resources: g http://www.ee.columbia.edu/~dpwe/resources/matlab/ Chroma toolbox g http://www.mpi-inf.mpg.de/resources/mir/chromatoolbox/ g All sorts of different chroma variants Constant Q toolbox: g http://www.eecs.qmul.ac.uk/~anssik/cqt/ 19
Other Frameworks g Essentia: g http://essentia.upf.edu/ g C++ library g Marsyas: g http://marsyas.info/ g C++ (and Java) g Aubio: g http://aubio.org/ g Written in C g Has Python bindings g Clam g http://clam-project.org/ g Written in C++ g Has Python bindings g Yaafe g http://yaafe.sourceforge.net/ g Python 20
Additional Literature Park, Tae Hong (2010). Introduction to digital signal processing: Computer musically speaking. World Scientific Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications ISBN: 978-3-319-21944-8 Springer, 2015 22