Introduction Basic Audio Feature Extraction

Size: px

Start display at page:

Download "Introduction Basic Audio Feature Extraction"

Shana Barrett
5 years ago
Views:

1 Introduction Basic Audio Feature Extraction Vincent Koops (with slides by Meinhard Müller) Sound and Music Technology, December 6th, November 2017

2 Today g Main modules A. Sound and music for games B. Analysis, classification, and retrieval of sound and music for media C. Generation and manipulation of sound and music for games and media 2

3 Introduction Aim of this lecture: g Show you the basics of digital signal processing g Show you the basic of audio feature extraction g You should be able to solve simple tasks using audio feature extraction toolboxes g Important concepts are marked in italics g Have some fun with sound! 3

4 Course Module Module B: g 1. Introduction to Digital Signal Processing: g From natural sound to digital audio g 2. Basic audio feature extraction: g From digital audio to features Case studies: practical examples Toolboxes, etc. 4

5 What are Features? g If we want to compare/find/classify things, we need a way to describe them in a useful way g Visual: colors, shapes, etc 5

6 What are Features? g Audio: frequencies, amplitudes: g Low/high sounds: ex. Tuba vs Flute instrument g Loud/quiet sounds: ex. Metal vs Acoustic music g (A manipulation of) these elements is called a feature. 6

7 Recap: What are Features? g Important feature properties: g Level of abstraction: Low: Frequency -> low/high music instruments Mid: Genre -> metal/country music High: Emotion -> sad/happy music g Locality: Global feature: Local feature: entire signal part of a signal (for time representation) 7

8 Book: Fundamentals of Music Processing Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., hardcover ISBN: Springer, 2015 Accompanying website:

9 Fourier Transform Idea: Decompose a given signal into a superposition of sinusoids (elementary signals). Sinusoids Signal

10 Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Sinusoids

11 Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Sinusoids Signal

12 Fourier Transform Each sinusoid has a physical meaning and can be described by three parameters: Signal Fouier transform Frequency (Hz)

13 Fourier Transform Role of phase Magnitude Phase

14 Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = Magnitude Phase

15 Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = Magnitude Phase

16 Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = Magnitude Phase

17 Fourier Transform Role of phase Analysis with sinusoid having frequency 262 Hz and phase φ = Magnitude Phase

18 Fourier Transform Example: Superposition of two sinusoids Frequency (Hz)

19 Fourier Transform Example: C4 played by piano Frequency (Hz)

20 Fourier Transform Example: Piano tone (C4, Hz)

Fourier Transform Example: Piano tone (C4, 261.

21 Fourier Transform Example: Piano tone (C4, Hz) Analysis using sinusoid with 262 Hz high correlation large Fourier coefficient

22 Fourier Transform Example: Piano tone (C4, Hz) Analysis using sinusoid with 400 Hz low correlation small Fourier coefficient

23 Fourier Transform Example: Piano tone (C4, Hz) Analysis using sinusoid with 523 Hz high correlation large Fourier coefficient

24 Fourier Transform Example: C4 played by trumpet Frequency (Hz)

25 Fourier Transform Example: C4 played by violine Frequency (Hz)

26 Fourier Transform Example: C4 played by flute Frequency (Hz)

27 Fourier Transform Example: Speech Bonn Frequency (Hz)

28 Fourier Transform Example: Speech Zürich Frequency (Hz)

29 Fourier Transform Example: C-major scale (piano) Frequency (Hz)

30 Fourier Transform Example: Chirp signal Frequency (Hz)

31 Fourier Transform Signal Fourier representation Fourier transform

32 Fourier Transform Signal Fourier representation Fourier transform Tells which frequencies occur, but does not tell when the frequencies occur. Frequency information is averaged over the entire time interval. Time information is hidden in the phase

33 Fourier Transform Frequency (Hz) Frequency (Hz)

34 Short Time Fourier Transform Idea (Dennis Gabor, 1946): Consider only a small section of the signal for the spectral analysis recovery of time information Short Time Fourier Transform (STFT) Section is determined by pointwise multiplication of the signal with a localizing window function

35 Short Time Fourier Transform Frequency (Hz)

36 Short Time Fourier Transform Frequency (Hz)

37 Short Time Fourier Transform Frequency (Hz)

38 Short Time Fourier Transform Frequency (Hz)

39 Short Time Fourier Transform Frequency (Hz)

40 Short Time Fourier Transform Frequency (Hz)

41 Short Time Fourier Transform Frequency (Hz)

42 Short Time Fourier Transform Frequency (Hz)

43 Short Time Fourier Transform Window functions Rectangular window Triangular window Hann window Frequency (Hz)

44 Short Time Fourier Transform Window functions Frequency (Hz) Trade off between smoothing and ringing

45 Short Time Fourier Transform Definition Signal Window function (, ) STFT with for

46 Short Time Fourier Transform Intuition: is musical note of frequency ω centered at time t Inner product measures the correlation between the musical note and the signal Frequency (Hz)

47 Time-Frequency Representation Spectrogram Frequency (Hz)

48 Time-Frequency Representation Spectrogram Frequency (Hz) Intensity (db)

49 Time-Frequency Representation Chirp signal and STFT with Hann window of length 50 ms Frequency (Hz)

50 Time-Frequency Representation Chirp signal and STFT with box window of length 50 ms Frequency (Hz)

51 Time-Frequency Representation Time-Frequency Localization Size of window constitutes a trade-off between time resolution and frequency resolution: Large window : Small window : poor time resolution good frequency resolution good time resolution poor frequency resolution Heisenberg Uncertainty Principle: there is no window function that localizes in time and frequency with arbitrary position.

52 Time-Frequency Representation Signal and STFT with Hann window of length 20 ms Frequency (Hz)

53 Time-Frequency Representation Signal and STFT with Hann window of length 100 ms Frequency (Hz)

54 Audio Features Example: C-major scale (piano) Spectrogram Frequency (Hz)

55 Audio Features Example: Chromatic scale Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) C1 24 Spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108

56 Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) Spectrogram

57 Audio Features Model assumption: Equal-tempered scale MIDI pitches: Piano notes: p = 21 (A0) to p = 108 (C8) Concert pitch: p = 69 (A4) 440 Hz Center frequency: Hz Logarithmic frequency distribution Octave: doubling of frequency

58 Audio Features Idea: Binning of Fourier coefficients Divide up the fequency axis into logarithmically spaced pitch regions and combine spectral coefficients of each region to a single pitch coefficient.

59 Audio Features Time-frequency representation Windowing in the time domain Windowing in the frequency domain

60 Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Frequency (Hz) Frequency (Hz) Intensity (db) Intensity (db) Spectrogram

61 Audio Features Example: Chromatic scale C1 24 Spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 C8: 4186 Hz C7: 2093 Hz C6: 1046 Hz C5: 523 Hz C4: 262 Hz C3: 131 Hz Intensity (db)

62 Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 C8: 4186 Hz C7: 2093 Hz C6: 1046 Hz C5: 523 Hz C4: 262 Hz C3: 131 Hz Intensity (db)

63 Audio Features Chroma features Chromatic circle Shepard s helix of pitch

64 Audio Features Chroma features Human perception of pitch is periodic in the sense that two pitches are perceived as similar in color if they differ by an octave. Seperation of pitch into two components: tone height (octave number) and chroma. Chroma : 12 traditional pitch classes of the equaltempered scale. For example: Chroma C Computation: pitch features chroma features Add up all pitches belonging to the same class Result: 12-dimensional chroma vector.

65 Audio Features Chroma features

66 Audio Features Chroma features C2 C3 C4 Chroma C

67 Audio Features Chroma features C # 2 C # 3 C # 4 Chroma C #

68 Audio Features Chroma features D2 D3 D4 Chroma D

69 Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db)

70 Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db) Chroma C

71 Audio Features Example: Chromatic scale C1 24 Log-frequency spectrogram C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Pitch (MIDI note number) Intensity (db) Chroma C #

72 Audio Features Example: Chromatic scale C1 24 C2 36 C3 48 C4 60 C5 72 C6 84 C7 96 C8 108 Chromagram Chroma Intensity (db)

73 Audio Features Chroma features Frequency (pitch)

74 Audio Features Chroma features Frequency (pitch)

75 Audio Features Chroma features Chroma

76 Audio Features Chroma features Sequence of chroma vectors correlates to the harmonic progression Normalization to changes in dynamics makes features invariant Further denoising and smoothing Taking logarithm before adding up pitch coefficients accounts for logarithmic sensation of intensity

77 Audio Features Chroma features (normalized) Karajan Scherbakov

78 Audio Features Automatic Chord Estimation G:maj D:maj See:

79 Mid-level representation Self-distance matrix (self-similarity matrix) Common mid-level representation Comparing each frame with all other frames Each element describing the dissimilarity of two frames (or a sequence of frames) Informative patterns Stripes for repeated sequences Blocks for homogenous segments Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010

80 Mid-level representation Self-distance matrix examples MFCC Chroma Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010

81 Approaches Categorization Proposed categorization Novelty-based approaches (points of high contrast) Repetition-based approaches Homogeneity-based approaches An earlier division into Sequence approaches: There exists sequences that are repeated during the piece (stripes in SDMs) State approaches: Piece is produced by a state machine, each state produces distinct observations Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010

82 Paulus, Müller, Klapuri Audio-Based Music Structure Analysis ISMIR 2010

83 Audio Features There are many ways to implement chroma features Properties may differ significantly Appropriateness depends on respective application MATLAB implementations for various chroma variants

84 Tool boxes g Many tool boxes exist that can help you develop your own MIR application g Toolboxes have their pro s and con s depending on: g Language of choice: Matlab, Python, C, C++, etc. g I will only list the large frameworks, a lot of other code is available: g Use you favourite search engine! g See also: g (rather outdated) g 16

85 VAMP plugins g g You need a host and a plugin: g Sonic visualizer g Sonic annotator g A lot of plugins for high and low-level feature extraction are available g Outputs text files or XML (RDF) g Has Python bindings (VamPy) 17

86 Sonic Visualizer Waveform Spectrogram Chromagram 18

87 Matlab toolboxes g g g g g MIR toolbox g als/mirtoolbox Similarity Matrix Toolbox: g Dan Ellis Matlab resources: g Chroma toolbox g g All sorts of different chroma variants Constant Q toolbox: g 19

88 Other Frameworks g Essentia: g g C++ library g Marsyas: g g C++ (and Java) g Aubio: g g Written in C g Has Python bindings g Clam g g Written in C++ g Has Python bindings g Yaafe g g Python 20

89 Additional Literature Park, Tae Hong (2010). Introduction to digital signal processing: Computer musically speaking. World Scientific Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications ISBN: Springer,

Audio Features. Fourier Transform. Fourier Transform. Fourier Transform. Short Time Fourier Transform. Fourier Transform.

Audio Features. Fourier Transform. Fourier Transform. Fourier Transform. Short Time Fourier Transform. Fourier Transform. Advanced Course Computer Science Music Processing Summer Term 2010 Fourier Transform Meinard Müller Saarland University and MPI Informatik meinard@mpi-inf.mpg.de Audio Features Fourier Transform Fourier