Introduction Basic Audio Feature Extraction

Similar documents
Audio Features. Fourier Transform. Fourier Transform. Fourier Transform. Short Time Fourier Transform. Fourier Transform.

Audio Features. Fourier Transform. Short Time Fourier Transform. Short Time Fourier Transform. Short Time Fourier Transform

Short-Time Fourier Transform and Chroma Features

Short-Time Fourier Transform and Chroma Features

University of Colorado at Boulder ECEN 4/5532. Lab 2 Lab report due on February 16, 2015

Fourier Analysis of Signals

Fourier Analysis of Signals

Lecture 7: Pitch and Chord (2) HMM, pitch detection functions. Li Su 2016/03/31

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

Sound Recognition in Mixtures

Frequency Domain Speech Analysis

EXAMINING MUSICAL MEANING IN SIMILARITY THRESHOLDS

6.003 Signal Processing

Chapter 17. Superposition & Standing Waves

E : Lecture 1 Introduction

CS 188: Artificial Intelligence Fall 2011

L6: Short-time Fourier analysis and synthesis

Introduction to Biomedical Engineering

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION

Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation

CS229 Project: Musical Alignment Discovery

Topic 7. Convolution, Filters, Correlation, Representation. Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio

FACTORS IN FACTORIZATION: DOES BETTER AUDIO SOURCE SEPARATION IMPLY BETTER POLYPHONIC MUSIC TRANSCRIPTION?

Scalable audio separation with light Kernel Additive Modelling

Scattering.m Documentation

Chirp Transform for FFT

Lecture Notes 5: Multiresolution Analysis

Introduction to Time-Frequency Distributions

1. Calculation of the DFT

Automatic Speech Recognition (CS753)

Novelty detection. Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

Gaussian Processes for Audio Feature Extraction

MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka

Physics General Physics. Lecture 25 Waves. Fall 2016 Semester Prof. Matthew Jones

Feature extraction 2

Time-Frequency Analysis

PreFEst: A Predominant-F0 Estimation Method for Polyphonic Musical Audio Signals

EE123 Digital Signal Processing

y = log b Exponential and Logarithmic Functions LESSON THREE - Logarithmic Functions Lesson Notes Example 1 Graphing Logarithms

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

Instrument-specific harmonic atoms for mid-level music representation

6.003 Signal Processing

Timbre Similarity. Perception and Computation. Prof. Michael Casey. Dartmouth College. Thursday 7th February, 2008

IMISOUND: An Unsupervised System for Sound Query by Vocal Imitation

QUERY-BY-EXAMPLE MUSIC RETRIEVAL APPROACH BASED ON MUSICAL GENRE SHIFT BY CHANGING INSTRUMENT VOLUME

Convolutive Non-Negative Matrix Factorization for CQT Transform using Itakura-Saito Divergence

CONSTRUCTING AN INVERTIBLE CONSTANT-Q TRANSFORM WITH NONSTATIONARY GABOR FRAMES

Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers

I. Signals & Sinusoids

Oracle Analysis of Sparse Automatic Music Transcription

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm

Musical Genre Classication

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION

Lecture Wigner-Ville Distributions

Response-Field Dynamics in the Auditory Pathway

MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara

COMP 546, Winter 2018 lecture 19 - sound 2

Chapters 11 and 12. Sound and Standing Waves

! Spectral Analysis with DFT. ! Windowing. ! Effect of zero-padding. ! Time-dependent Fourier transform. " Aka short-time Fourier transform

TIME-FREQUENCY ANALYSIS EE3528 REPORT. N.Krishnamurthy. Department of ECE University of Pittsburgh Pittsburgh, PA 15261

Topic 3: Fourier Series (FS)

Bayesian harmonic models for musical signal analysis. Simon Godsill and Manuel Davy

Machine Recognition of Sounds in Mixtures

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais and Hakan Erdogan

Extraction of Individual Tracks from Polyphonic Music

Linear Prediction 1 / 41

Reference Text: The evolution of Applied harmonics analysis by Elena Prestini

Sound 2: frequency analysis

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

Lecture 5. The Digital Fourier Transform. (Based, in part, on The Scientist and Engineer's Guide to Digital Signal Processing by Steven Smith)

10ème Congrès Français d Acoustique

Power-Scaled Spectral Flux and Peak-Valley Group-Delay Methods for Robust Musical Onset Detection

Temporal Modeling and Basic Speech Recognition

200Pa 10million. Overview. Acoustics of Speech and Hearing. Loudness. Terms to describe sound. Matching Pressure to Loudness. Loudness vs.

Origin of Sound. Those vibrations compress and decompress the air (or other medium) around the vibrating object

1. Personal Audio 2. Features 3. Segmentation 4. Clustering 5. Future Work

Speech Signal Representations

TIME-FREQUENCY ANALYSIS: TUTORIAL. Werner Kozek & Götz Pfander

Superposition and Standing Waves

ACCOUNTING FOR PHASE CANCELLATIONS IN NON-NEGATIVE MATRIX FACTORIZATION USING WEIGHTED DISTANCES. Sebastian Ewert Mark D. Plumbley Mark Sandler

Wavelet Transform. Figure 1: Non stationary signal f(t) = sin(100 t 2 ).

Jean Morlet and the Continuous Wavelet Transform

Review &The hardest MC questions in media, waves, A guide for the perplexed

Real-Time Spectrogram Inversion Using Phase Gradient Heap Integration

Constrained Nonnegative Matrix Factorization with Applications to Music Transcription

Multiresolution Analysis

FOURIER ANALYSIS. (a) Fourier Series

Physical Acoustics. Hearing is the result of a complex interaction of physics, physiology, perception and cognition.

BAYESIAN NONNEGATIVE HARMONIC-TEMPORAL FACTORIZATION AND ITS APPLICATION TO MULTIPITCH ANALYSIS

Basics on 2-D 2 D Random Signal

Environmental Effects and Control of Noise Lecture 2

SPEECH COMMUNICATION 6.541J J-HST710J Spring 2004

Pattern Recognition Applied to Music Signals

INTRODUCTION TO DELTA-SIGMA ADCS

Introduction to time-frequency analysis. From linear to energy-based representations

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

L29: Fourier analysis

Musical Modulation by Symmetries. By Rob Burnham

Music 270a: Complex Exponentials and Spectrum Representation

Transcription:

Introduction Basic Audio Feature Extraction Vincent Koops (with slides by Meinhard Müller) Sound and Music Technology, December 6th, 2016 1 28 November 2017

Today g Main modules A. Sound and music for games B. Analysis, classification, and retrieval of sound and music for media C. Generation and manipulation of sound and music for games and media 2

Introduction Aim of this lecture: g Show you the basics of digital signal processing g Show you the basic of audio feature extraction g You should be able to solve simple tasks using audio feature extraction toolboxes g Important concepts are marked in italics g Have some fun with sound! 3

Course Module Module B: g 1. Introduction to Digital Signal Processing: g From natural sound to digital audio g 2. Basic audio feature extraction: g From digital audio to features Case studies: practical examples Toolboxes, etc. 4

What are Features? g If we want to compare/find/classify things, we need a way to describe them in a useful way g Visual: colors, shapes, etc 5

What are Features? g Audio: frequencies, amplitudes: g Low/high sounds: ex. Tuba vs Flute instrument g Loud/quiet sounds: ex. Metal vs Acoustic music g (A manipulation of) these elements is called a feature. 6

Recap: What are Features? g Important feature properties: g Level of abstraction: Low: Frequency -> low/high music instruments Mid: Genre -> metal/country music High: Emotion -> sad/happy music g Locality: Global feature: Local feature: entire signal part of a signal (for time representation) 7

Book: Fundamentals of Music Processing Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., hardcover ISBN: 978-3-319-21944-8 Springer, 2015 Accompanying website: www.music-processing.de

Fourier Transform Idea: Decompose a given signal into a superposition of sinusoids (elementary signals). Sinusoids Signal

Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Sinusoids

Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Sinusoids Signal

Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Signal Fouier transform 1 0.5 1 2 3 4 5 6 7 8 Frequency (Hz)

Fourier Transform Role of phase 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase

Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = 0.05 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase

Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = 0.24 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase

Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = 0.45 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase

Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = 0.6 0.5 025 Magnitude 0-0.25-0.5 0 0.25 0.5 0.75 1 Phase

Fourier Transform Example: Superposition of two sinusoids Frequency (Hz)

Fourier Transform Example: C4 played by piano Frequency (Hz)

Fourier Transform Example: Piano tone (C4, 261.6 Hz)

Fourier Transform Example: Piano tone (C4, 261.6 Hz) Analysis using sinusoid with 262 Hz high correlation large Fourier coefficient

Fourier Transform Example: Piano tone (C4, 261.6 Hz) Analysis using sinusoid with 400 Hz low correlation small Fourier coefficient

Fourier Transform Example: Piano tone (C4, 261.6 Hz) Analysis using sinusoid with 523 Hz high correlation large Fourier coefficient

Fourier Transform Example: C4 played by trumpet Frequency (Hz)

Fourier Transform Example: C4 played by violine Frequency (Hz)

Fourier Transform Example: C4 played by flute Frequency (Hz)

Fourier Transform Example: Speech Bonn Frequency (Hz)

Fourier Transform Example: Speech Zürich Frequency (Hz)

Fourier Transform Example: C-major scale (piano) Frequency (Hz)

Fourier Transform Example: Chirp signal Frequency (Hz)

Fourier Transform Signal Fourier representation Fourier transform

Fourier Transform Signal Fourier representation Fourier transform Tells which frequencies occur, but does not tell when the frequencies occur. Frequency information is averaged over the entire time interval. Time information is hidden in the phase

Fourier Transform Frequency (Hz) Frequency (Hz)

Short Time Fourier Transform Idea (Dennis Gabor, 1946): Consider only a small section of the signal for the spectral analysis recovery of time information Short Time Fourier Transform (STFT) Section is determined by pointwise multiplication of the signal with a localizing window function

Short Time Fourier Transform Frequency (Hz)

Short Time Fourier Transform Frequency (Hz)

Short Time Fourier Transform Frequency (Hz)

Short Time Fourier Transform Frequency (Hz)

Short Time Fourier Transform Frequency (Hz)

Short Time Fourier Transform Frequency (Hz)

Short Time Fourier Transform Frequency (Hz)

Short Time Fourier Transform Frequency (Hz)

Short Time Fourier Transform Window functions Rectangular window Triangular window Hann window Frequency (Hz)

Short Time Fourier Transform Window functions Frequency (Hz) Trade off between smoothing and ringing

Short Time Fourier Transform Definition Signal Window function (, ) STFT with for

Short Time Fourier Transform Intuition: is musical note of frequency ω centered at time t Inner product measures the correlation between the musical note and the signal Frequency (Hz) 12 8 4 0 0 1 2 3 4 5 6

Time-Frequency Representation Spectrogram Frequency (Hz)

Time-Frequency Representation Spectrogram Frequency (Hz) Intensity (db)

Time-Frequency Representation Chirp signal and STFT with Hann window of length 50 ms Frequency (Hz)

Time-Frequency Representation Chirp signal and STFT with box window of length 50 ms Frequency (Hz)

Time-Frequency Representation Time-Frequency Localization Size of window constitutes a trade-off between time resolution and frequency resolution: Large window : Small window : poor time resolution good frequency resolution good time resolution poor frequency resolution Heisenberg Uncertainty Principle: there is no window function that localizes in time and frequency with arbitrary position.

Time-Frequency Representation Signal and STFT with Hann window of length 20 ms Frequency (Hz)

Time-Frequency Representation Signal and STFT with Hann window of length 100 ms Frequency (Hz)

Audio Features Example: C-major scale (piano) Spectrogram Frequency (Hz)

Audio Features Example: Chromatic scale Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) C1 24 Spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108

Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) Spectrogram

Audio Features Model assumption: Equal-tempered scale MIDI pitches: Piano notes: p = 21 (A0) to p = 108 (C8) Concert pitch: p = 69 (A4) 440 Hz Center frequency: Hz Logarithmic frequency distribution Octave: doubling of frequency

Audio Features Idea: Binning of Fourier coefficients Divide up the fequency axis into logarithmically spaced pitch regions and combine spectral coefficients of each region to a single pitch coefficient.

Audio Features Time-frequency representation Windowing in the time domain Windowing in the frequency domain

Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) Spectrogram

Audio Features Example: Chromatic scale C1 24 Spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 C8: 4186 Hz C7: 2093 Hz C6: 1046 Hz C5: 523 Hz C4: 262 Hz C3: 131 Hz Intensity (db)

Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 C8: 4186 Hz C7: 2093 Hz C6: 1046 Hz C5: 523 Hz C4: 262 Hz C3: 131 Hz Intensity (db)

Audio Features Chroma features Chromatic circle Shepard s helix of pitch

Audio Features Chroma features Human perception of pitch is periodic in the sense that two pitches are perceived as similar in color if they differ by an octave. Seperation of pitch into two components: tone height (octave number) and chroma. Chroma : 12 traditional pitch classes of the equaltempered scale. For example: Chroma C Computation: pitch features chroma features Add up all pitches belonging to the same class Result: 12-dimensional chroma vector.

Audio Features Chroma features

Audio Features Chroma features C2 C3 C4 Chroma C

Audio Features Chroma features C # 2 C # 3 C # 4 Chroma C #

Audio Features Chroma features D2 D3 D4 Chroma D

Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db)

Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db) Chroma C

Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db) Chroma C #

Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Chromagram Chroma Intensity (db)

Audio Features Chroma features Frequency (pitch)

Audio Features Chroma features Frequency (pitch)

Audio Features Chroma features Chroma

Audio Features Chroma features Sequence of chroma vectors correlates to the harmonic progression Normalization to changes in dynamics makes features invariant Further denoising and smoothing Taking logarithm before adding up pitch coefficients accounts for logarithmic sensation of intensity

Audio Features Chroma features (normalized) Karajan Scherbakov

Audio Features Automatic Chord Estimation G:maj D:maj See: http://chordify.net

Mid-level representation Self-distance matrix (self-similarity matrix) Common mid-level representation Comparing each frame with all other frames Each element describing the dissimilarity of two frames (or a sequence of frames) Informative patterns Stripes for repeated sequences Blocks for homogenous segments Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010

Mid-level representation Self-distance matrix examples MFCC Chroma Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010

Approaches Categorization Proposed categorization Novelty-based approaches (points of high contrast) Repetition-based approaches Homogeneity-based approaches An earlier division into Sequence approaches: There exists sequences that are repeated during the piece (stripes in SDMs) State approaches: Piece is produced by a state machine, each state produces distinct observations Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010

Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010

Audio Features There are many ways to implement chroma features Properties may differ significantly Appropriateness depends on respective application http://www.mpi-inf.mpg.de/resources/mir/chromatoolbox/ MATLAB implementations for various chroma variants

Tool boxes g Many tool boxes exist that can help you develop your own MIR application g Toolboxes have their pro s and con s depending on: g Language of choice: Matlab, Python, C, C++, etc. g I will only list the large frameworks, a lot of other code is available: g Use you favourite search engine! g See also: g http://www.music-ir.org/evaluation/tools.html (rather outdated) g https://wiki.python.org/moin/pythoninmusic 16

VAMP plugins g http://www.vamp-plugins.org/ g You need a host and a plugin: g Sonic visualizer g Sonic annotator g A lot of plugins for high and low-level feature extraction are available g Outputs text files or XML (RDF) g Has Python bindings (VamPy) 17

Sonic Visualizer Waveform Spectrogram Chromagram 18

Matlab toolboxes g g g g g MIR toolbox g https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materi als/mirtoolbox Similarity Matrix Toolbox: g http://www.audiolabs-erlangen.de/resources/mir/smtoolbox/ Dan Ellis Matlab resources: g http://www.ee.columbia.edu/~dpwe/resources/matlab/ Chroma toolbox g http://www.mpi-inf.mpg.de/resources/mir/chromatoolbox/ g All sorts of different chroma variants Constant Q toolbox: g http://www.eecs.qmul.ac.uk/~anssik/cqt/ 19

Other Frameworks g Essentia: g http://essentia.upf.edu/ g C++ library g Marsyas: g http://marsyas.info/ g C++ (and Java) g Aubio: g http://aubio.org/ g Written in C g Has Python bindings g Clam g http://clam-project.org/ g Written in C++ g Has Python bindings g Yaafe g http://yaafe.sourceforge.net/ g Python 20

Additional Literature Park, Tae Hong (2010). Introduction to digital signal processing: Computer musically speaking. World Scientific Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications ISBN: 978-3-319-21944-8 Springer, 2015 22