Timbre Similarity. Perception and Computation. Prof. Michael Casey. Dartmouth College. Thursday 7th February, 2008

2 Audio Similarity Computation
Metric Spaces
Euclidean and Cosine Metrics
The S-Matrix
Timbre Spaces
Sound Objects and Textures

3 Metric Spaces
A metric space is a set (here the vector space $\mathbb{R}^d$) together with a distance function, called a metric.
For $a, b \in \mathbb{R}^d$, the metric induced by the $L_p$ norm is
$$\delta_p(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^p \right)^{1/p}$$
$L_1$ is the City Block metric.
$L_2$ is the Euclidean distance.
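As a small worked example of the $L_p$ definition, the following Python sketch evaluates the City Block and Euclidean cases on a 3-4-5 triangle. The function name `lp_distance` and the sample vectors are illustrative assumptions, not part of the lecture material.

```python
def lp_distance(a, b, p):
    """Minkowski (L_p) distance between two equal-length vectors, for p >= 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Hypothetical sample vectors: the legs of a 3-4-5 right triangle.
a = [0.0, 0.0]
b = [3.0, 4.0]

city_block = lp_distance(a, b, 1)  # |3| + |4|
euclidean = lp_distance(a, b, 2)   # sqrt(3^2 + 4^2)
```

The two metrics disagree on the same pair of points, which is why the choice of $p$ matters when comparing feature vectors.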

4 Euclidean and Cosine Distance
Euclidean distance converts rectangular coordinates to magnitude (length).
Cosine distance converts rectangular coordinates to the cosine of the angle between vectors.
Dot product (for row vectors $a$ and $b$):
$$ab^T = \sum_{i=1}^{d} a_i b_i$$
Cosine distance:
$$\cos(\theta) = \frac{ab^T}{\|a\| \, \|b\|}$$
If $\|a\| = 1$ and $\|b\| = 1$ then $\cos(\theta) = ab^T$ and
$$\delta_2(a, b)^2 = 2 - 2ab^T$$
Else:
$$\delta_2(a, b)^2 = \|a\|^2 + \|b\|^2 - 2ab^T$$
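The relation between the two measures can be checked numerically. The sketch below, with hypothetical helper names and sample vectors, computes the cosine of the angle between two vectors and confirms that the squared Euclidean distance equals $\|a\|^2 + \|b\|^2 - 2ab^T$.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean (L_2) length of a vector."""
    return math.sqrt(dot(a, a))


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (norm(a) * norm(b))


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical sample vectors, both of length 5.
a = [3.0, 4.0]
b = [4.0, 3.0]

lhs = euclidean(a, b) ** 2                            # squared distance
rhs = norm(a) ** 2 + norm(b) ** 2 - 2.0 * dot(a, b)   # expansion in norms and dot product
cos_theta = cosine(a, b)
```

Here `lhs` and `rhs` agree up to floating-point rounding, and `cos_theta` is 24/25 because the dot product is 24 and both norms are 5.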

5 Audio Feature Vectors
We use a feature extractor to compute a set of feature vectors.
We obtain a new vector of dimensionality $d$ every $N$ samples.
The collection of $t$ vectors forms an observation matrix $X \in \mathbb{R}^{t \times d}$:
$$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1d} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2d} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{t1} & x_{t2} & x_{t3} & \cdots & x_{td} \end{bmatrix}$$
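To make the shape of the observation matrix concrete, here is a minimal Python sketch that emits one feature row every `hop` samples. The two features used (RMS energy and zero-crossing rate) are stand-ins chosen for brevity; they are an assumption of this example and not the features computed by the lecture's extractor.

```python
import math


def frame_features(samples, hop):
    """Return a t-by-2 observation matrix: one [RMS, zero-crossing rate] row per hop.

    Frames are non-overlapping and `hop` samples long (hop >= 2);
    a trailing partial frame is dropped.
    """
    rows = []
    for start in range(0, len(samples) - hop + 1, hop):
        frame = samples[start:start + hop]
        rms = math.sqrt(sum(s * s for s in frame) / hop)
        # Count sign changes between neighbouring samples in the frame.
        crossings = sum(1 for u, v in zip(frame, frame[1:]) if (u < 0) != (v < 0))
        rows.append([rms, crossings / (hop - 1)])
    return rows


# Hypothetical 8-sample signal: an alternating burst followed by a constant level.
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
X = frame_features(signal, hop=4)  # t = 2 rows, d = 2 columns
```

With a hop of 4 samples the 8-sample signal yields two rows, so `X` has shape 2 by 2: the first row describes the alternating burst and the second the constant tail.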

6 Feature Vector Norming
Often we want to make the features invariant to scaling.
To do this we make each vector unit norm:
$$\hat{x} = \frac{x}{\|x\|} = \frac{x}{\sqrt{\sum_{i=1}^{d} x_i^2}}$$
By doing this, squared Euclidean distance becomes a decreasing linear function of cosine distance, so both measures rank pairs of vectors in the same order:
$$\delta_2(\hat{x}, \hat{y})^2 = 2 - 2\hat{x}\hat{y}^T$$
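A short Python sketch of unit norming follows, assuming hypothetical vectors in which `y` is `x` scaled by 10 (the same spectral shape, only louder) and `z` points in a different direction. After norming, `x` and `y` coincide, and the squared distance between unit vectors equals 2 minus twice their dot product.

```python
import math


def unit_norm(x):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    length = math.sqrt(sum(v * v for v in x))
    if length == 0.0:
        return list(x)
    return [v / length for v in x]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


# Hypothetical feature vectors.
x = [3.0, 4.0]
y = [30.0, 40.0]   # same direction as x, ten times the magnitude
z = [4.0, 3.0]     # different direction

x_hat, y_hat, z_hat = unit_norm(x), unit_norm(y), unit_norm(z)

same_shape = squared_distance(x_hat, y_hat)   # scaling has been removed
different = squared_distance(x_hat, z_hat)
identity = 2.0 - 2.0 * dot(x_hat, z_hat)      # should match `different`
```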

7 Self-Similarity Matrix: S-Matrix
Jonathan Foote, "Visualizing music and audio using self-similarity", Proceedings of the Seventh ACM International Conference on Multimedia, Orlando.
The S-Matrix is based on the cosine distance between normed matrices:
$$S = \hat{X}\hat{X}^T$$
We implement the S-Matrix in Octave using matrix multiplication:
octave> X = readadb('myfeatures.mfcc20');
octave> X = nmmtx(X);
octave> S = X*X';
octave> imagesc(S);
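For readers without Octave, the same computation can be sketched in plain Python. This is an illustrative counterpart under assumed names (`normalize_rows`, `self_similarity`) and a made-up 4-by-2 observation matrix; it does not reproduce the lecture's helper files.

```python
import math


def normalize_rows(X):
    """Scale every row of X to unit Euclidean length (all-zero rows are left as is)."""
    normed = []
    for row in X:
        length = math.sqrt(sum(v * v for v in row))
        normed.append([v / length for v in row] if length > 0.0 else list(row))
    return normed


def self_similarity(X):
    """S = X_hat * X_hat^T: entry (i, j) is the cosine between rows i and j of X."""
    Xn = normalize_rows(X)
    return [[sum(a * b for a, b in zip(r, s)) for s in Xn] for r in Xn]


# Hypothetical observation matrix: 4 frames, 2 features each.
X = [
    [1.0, 0.0],
    [2.0, 0.0],   # same direction as frame 0
    [0.0, 3.0],   # orthogonal to frame 0
    [1.0, 1.0],   # 45 degrees from frame 0
]
S = self_similarity(X)
```

Frames 0 and 1 differ only in scale, so their entry is 1; frames 0 and 2 are orthogonal, so their entry is 0. The matrix is symmetric with ones on the diagonal, which is what produces the characteristic diagonal stripe when it is displayed as an image.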

8 S-Matrix: Chopin Mazurka Opus 6 No. 1
[Figure: S-Matrix image.] 13 MFCC coefficients; 100 ms hop; normed cosine distance.

9 Audio Feature Extraction
fftextract is available for download.
It is easy to extract features from .wav, .aiff, .ogg and .snd files using this program.
If you have compressed audio (.mp3, .mp4, .aac, .wma, .flac), you will first need to decode it to PCM format.
Also download the following files and put them in your Octave/Matlab folder:
nmmtx.m: unit-norm each vector in an observation matrix
dist.m: Euclidean distance between observation matrices
readadb.m: load an observation matrix

10 fftextract
Feature example: 12 bands-per-octave constant-Q spectrum:
fftextract -q 12 file.wav file.cqt12

11 fftextract
Feature example: 24-band Pitch-Class Profile (PCP):
fftextract -c 24 file.wav file.chr24

12 fftextract
Feature example: 20 Mel-frequency cepstral coefficients (MFCC):
fftextract -m 20 file.wav file.mfcc20

13 Sound Object Similarity
Sound objects are short individual events.
Make an observation matrix out of sound object features.
Each sound object's features form one row of the matrix.
The S-Matrix is a distance matrix of each sound to each other sound.
Can we derive a map of the similarity space between sounds?

14 Kruskal's Multidimensional Scaling (MDS) Algorithm
MDS takes a matrix of distances as input.
The output is a map of points in d-dimensional space.
d is chosen to be as low as possible with minimal Kruskal stress.
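The stress criterion can be illustrated with a short Python sketch. It computes Kruskal's stress-1 under a simplifying assumption: the input distances are used directly in place of the monotone-regression disparities of the full non-metric algorithm. The function names and the triangle data are hypothetical.

```python
import math


def pairwise_distances(points):
    """Full symmetric matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress1(target, points):
    """Stress-1 of a configuration against a target distance matrix.

    Simplification: raw target distances stand in for Kruskal's disparities.
    """
    d = pairwise_distances(points)
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    numerator = sum((d[i][j] - target[i][j]) ** 2 for i, j in pairs)
    denominator = sum(d[i][j] ** 2 for i, j in pairs)
    return math.sqrt(numerator / denominator)


# Hypothetical data: distances measured on a 3-4-5 triangle in the plane.
triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
target = pairwise_distances(triangle)

stress_2d = stress1(target, triangle)                  # exact 2-D map
stress_1d = stress1(target, [(0.0,), (3.0,), (4.0,)])  # forced onto a line
```

The two-dimensional map reproduces every distance, so its stress is zero, while the one-dimensional map cannot preserve the hypotenuse and has clearly non-zero stress. Comparing stress across candidate dimensionalities in this way is how a low but adequate d is chosen.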

15 Distances between 10 major UK Cities
[Figure: table of distances between the cities.]

16 MDS Example: recovering the map of the UK
Measure the distances between cities in your favourite country.
Make a symmetric distance matrix of these distances (like in your AAA books).
Run the MDS algorithm on the distance matrix.
The result is a recovered map of the positions.
[Figure: MDS Solution for 10 Cities in the UK.]
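The recipe above can be sketched in Python. Two assumptions are worth stating loudly: the sketch swaps in SMACOF (iterated Guttman transforms), a metric MDS method, rather than Kruskal's non-metric algorithm, and it uses five made-up planar "city" coordinates instead of real UK distances. All names are hypothetical.

```python
import math
import random


def pairwise_distances(points):
    """Full symmetric matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def raw_stress(target, points):
    """Sum of squared differences between target and configuration distances."""
    d = pairwise_distances(points)
    n = len(points)
    return sum((d[i][j] - target[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))


def smacof_step(target, points):
    """One Guttman transform with unit weights; raw stress never increases."""
    n, dim = len(points), len(points[0])
    d = pairwise_distances(points)
    updated = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if i == j or d[i][j] == 0.0:
                continue  # coincident points contribute nothing
            ratio = target[i][j] / d[i][j]
            for k in range(dim):
                row[k] += ratio * (points[i][k] - points[j][k])
        updated.append([v / n for v in row])
    return updated


# Hypothetical "cities" on a plane; only their distances are given to the algorithm.
true_map = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]
target = pairwise_distances(true_map)

rng = random.Random(0)  # fixed seed so the run is repeatable
recovered = [[rng.random(), rng.random()] for _ in true_map]
initial_stress = raw_stress(target, recovered)

for _ in range(200):
    recovered = smacof_step(target, recovered)
final_stress = raw_stress(target, recovered)
```

Because distances are unchanged by rotation, reflection and translation, the recovered map matches the original only up to those transformations; a recovered map of real cities may come out upside down or mirrored.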

17 Sound Object Similarity
Which sound objects are perceived as similar?
Is sound perception continuous, categorical or both?
How can we compute the similarity of two sounds?
How can we compute the category of a sound?
What identifies a sound as belonging to a category?

18 Timbre Space of Musical Instrument Relationships
Make a collection of features for a set of musical instruments.
Fix: volume / duration / pitch.
Measure the distances using an S-Matrix.
Run the MDS algorithm on the distance matrix.
The result is a recovered map of the positions in a timbre space.

19 Texture Similarity
Music recordings and everyday sounds consist of sound textures.
Textures are simultaneously sounding events.
It is common to think of textures as noisy, but counterpoint is a texture.
The same questions apply to sound textures as to sound objects:
Which textures are perceived as similar?
Is texture perception continuous, categorical or both?
How can we compute the similarity of sound textures?
How can we compute the category of a sound texture?
What identifies a sound texture as belonging to a category?

20 Musical Instrument Timbre Perception
John Grey (1975): Multidimensional Scaling (MDS) of Musical Instruments
David Wessel (1979): Perceptual Control Spaces
Jean-Claude Risset (1979): MDS of Re-Synthesized Instrument Tones

21 Everyday Audio Perception
Bill Gaver (1983): Ecological Audio Perception
Warren and Verbrugge (1989): Perception of Breaking-Bouncing Events
Clarkson (1995): Classification of Ambulatory Audio
