Novelty detection. Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University

Similar documents
Novelty detection. Juan Pablo Bello EL9173 Selected Topics in Signal Processing: Audio Content Analysis NYU Poly

Power-Scaled Spectral Flux and Peak-Valley Group-Delay Methods for Robust Musical Onset Detection

t Tao Group Limited, Reading, RG6 IAZ, U.K. wenwu.wang(ieee.org t Samsung Electronics Research Institute, Staines, TW1 8 4QE, U.K.

Topic 6. Timbre Representations

Feature extraction 2

Sparse Time-Frequency Transforms and Applications.

Introduction Basic Audio Feature Extraction

Design Criteria for the Quadratically Interpolated FFT Method (I): Bias due to Interpolation

A NEW DISSIMILARITY METRIC FOR THE CLUSTERING OF PARTIALS USING THE COMMON VARIATION CUE

PERCEPTUAL MATCHING PURSUIT WITH GABOR DICTIONARIES AND TIME-FREQUENCY MASKING. Gilles Chardon, Thibaud Necciari, and Peter Balazs

University of Colorado at Boulder ECEN 4/5532. Lab 2 Lab report due on February 16, 2015

July 6, Applause Identification and its relevance to Archival of Carnatic Music. Padi Sarala, Vignesh Ishwar, Ashwin Bellur and Hema A.

FACTORS IN FACTORIZATION: DOES BETTER AUDIO SOURCE SEPARATION IMPLY BETTER POLYPHONIC MUSIC TRANSCRIPTION?

Transients Detection in the Time-Scale Domain

On Spectral Basis Selection for Single Channel Polyphonic Music Separation

200Pa 10million. Overview. Acoustics of Speech and Hearing. Loudness. Terms to describe sound. Matching Pressure to Loudness. Loudness vs.

PROBABILISTIC EXTRACTION OF BEAT POSITIONS FROM A BEAT ACTIVATION FUNCTION

Automatic Speech Recognition (CS753)

Scalable audio separation with light Kernel Additive Modelling

NON-NEGATIVE MATRIX FACTORIZATION WITH SELECTIVE SPARSITY CONSTRAINTS FOR TRANSCRIPTION OF BELL CHIMING RECORDINGS

E : Lecture 1 Introduction

Non-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology

Improved Method for Epoch Extraction in High Pass Filtered Speech

LECTURE NOTES IN AUDIO ANALYSIS: PITCH ESTIMATION FOR DUMMIES

Timbral, Scale, Pitch modifications

Speech Signal Representations

Time-Series Analysis for Ear-Related and Psychoacoustic Metrics

STATISTICAL APPROACH FOR SOUND MODELING

Lecture 7: Pitch and Chord (2) HMM, pitch detection functions. Li Su 2016/03/31

arxiv: v2 [cs.sd] 8 Feb 2017

Spread Spectrum Watermarking. Information Technologies for IPR Protection

ROBUST REALTIME POLYPHONIC PITCH DETECTION

Data Characterization in Gravitational Waves

Feature Extraction for ASR: Pitch

Physical Modeling Synthesis

MULTI-RESOLUTION SIGNAL DECOMPOSITION WITH TIME-DOMAIN SPECTROGRAM FACTORIZATION. Hirokazu Kameoka

Optimal Speech Enhancement Under Signal Presence Uncertainty Using Log-Spectral Amplitude Estimator

DETECTION AND MODELING OF TRANSIENT AUDIO SIGNALS WITH PRIOR INFORMATION

Elec4621 Advanced Digital Signal Processing Chapter 11: Time-Frequency Analysis

Non-Stationary Noise Power Spectral Density Estimation Based on Regional Statistics

Oracle Analysis of Sparse Automatic Music Transcription

Short-Time Fourier Transform and Chroma Features

Signal Modeling Techniques in Speech Recognition. Hassan A. Kingravi

Chirp Transform for FFT

NOISE ROBUST RELATIVE TRANSFER FUNCTION ESTIMATION. M. Schwab, P. Noll, and T. Sikora. Technical University Berlin, Germany Communication System Group

PARAMETRIC coding has proven to be very effective

Proc. of NCC 2010, Chennai, India

Aalborg Universitet. On Perceptual Distortion Measures and Parametric Modeling Christensen, Mads Græsbøll. Published in: Proceedings of Acoustics'08

Noise Robust Isolated Words Recognition Problem Solving Based on Simultaneous Perturbation Stochastic Approximation Algorithm

CRYSTALLIZATION SONIFICATION OF HIGH-DIMENSIONAL DATASETS

Convention Paper Presented at the 128th Convention 2010 May London, UK

2D Spectrogram Filter for Single Channel Speech Enhancement

When Dictionary Learning Meets Classification

Covariance smoothing and consistent Wiener filtering for artifact reduction in audio source separation

CEPSTRAL ANALYSIS SYNTHESIS ON THE MEL FREQUENCY SCALE, AND AN ADAPTATIVE ALGORITHM FOR IT.

Short-Time Fourier Transform and Chroma Features

EIGENFILTERS FOR SIGNAL CANCELLATION. Sunil Bharitkar and Chris Kyriakakis

Techniques for Machine Understanding of Live Drum Performances. Eric Dean Battenberg

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION

Lecture 7: Feature Extraction

TIME-DEPENDENT PARAMETRIC AND HARMONIC TEMPLATES IN NON-NEGATIVE MATRIX FACTORIZATION

The Choice of MPEG-4 AAC encoding parameters as a direct function of the perceptual entropy of the audio signal

Chapters 11 and 12. Sound and Standing Waves

An Efficient Pitch-Tracking Algorithm Using a Combination of Fourier Transforms

Linear Prediction 1 / 41

MANY digital speech communication applications, e.g.,

A Robust Event-Triggered Consensus Strategy for Linear Multi-Agent Systems with Uncertain Network Topology

New Recursive-Least-Squares Algorithms for Nonlinear Active Control of Sound and Vibration Using Neural Networks

Computational Auditory Scene Analysis Based on Loudness/Pitch/Timbre Decomposition

AN EFFICIENT MULTI-RESOLUTION SPECTRAL TRANSFORM FOR MUSIC ANALYSIS

Signal types. Signal characteristics: RMS, power, db Probability Density Function (PDF). Analogue-to-Digital Conversion (ADC).

Estimation of Relative Operating Characteristics of Text Independent Speaker Verification

An information-geometric approach to real-time audio segmentation

Study of nonlinear phenomena in a tokamak plasma using a novel Hilbert transform technique

arxiv:math/ v1 [math.na] 7 Mar 2006

Analysis of polyphonic audio using source-filter model and non-negative matrix factorization

Topic 3: Fourier Series (FS)

Fourier Analysis of Signals

Fourier Analysis of Signals

Robust Sparse Recovery via Non-Convex Optimization

Singer Identification using MFCC and LPC and its comparison for ANN and Naïve Bayes Classifiers

MULTIPITCH ESTIMATION AND INSTRUMENT RECOGNITION BY EXEMPLAR-BASED SPARSE REPRESENTATION. Ikuo Degawa, Kei Sato, Masaaki Ikehara

International Journal of Advanced Research in Computer Science and Software Engineering

On-off Control: Audio Applications

CS229 Project: Musical Alignment Discovery

Single Channel Music Sound Separation Based on Spectrogram Decomposition and Note Classification

Shankar Shivappa University of California, San Diego April 26, CSE 254 Seminar in learning algorithms

Module 1: Signals & System

Research Article A Combined Mathematical Treatment for a Special Automatic Music Transcription System

Environmental Effects and Control of Noise Lecture 2

Physical Acoustics. Hearing is the result of a complex interaction of physics, physiology, perception and cognition.

Melody Extraction and Musical Onset Detection

MITIGATING UNCORRELATED PERIODIC DISTURBANCE IN NARROWBAND ACTIVE NOISE CONTROL SYSTEMS

The effect of speaking rate and vowel context on the perception of consonants. in babble noise

PUBLISHED IN IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, VOL. 19, NO. 6, AUGUST

SPEECH ANALYSIS AND SYNTHESIS

A POSTERIORI SPEECH PRESENCE PROBABILITY ESTIMATION BASED ON AVERAGED OBSERVATIONS AND A SUPER-GAUSSIAN SPEECH MODEL

Convention Paper Presented at the 121st Convention 2006 October 5 8 San Francisco, CA, USA

MULTISCALE SCATTERING FOR AUDIO CLASSIFICATION

10ème Congrès Français d Acoustique

Transcription:

Novelty detection Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University

Novelty detection Energy burst Find the start time (onset) of new events (notes) in the music signal. Short duration Steady state Onset: single instant chosen to mark the start of the (attack) transient. Unstability

Applications MIR: can be used for segmentation; one layer of rhythmic analysis (e.g. beat). Computational Musicology: see the groundbreaking work at ÖFAI (1, 2) Sound manipulation and synthesis: http://www.music.mcgill.ca/~hockman/ projects/artma/index.html Computational biology?!! http://isophonics.net/content/calcium-signalanalyser

Difficulties extended transient (e.g. violin, flute) ambiguous events (vibrato, tremolo, glissandi) polyphonies -> (a)synchronous onsets perceptual vs physical

Difficulties extended transient (e.g. violin, flute) ambiguous events (vibrato, tremolo, glissandi) polyphonies -> (a)synchronous onsets perceptual vs physical

Difficulties extended transient (e.g. violin, flute) ambiguous events (vibrato, tremolo, glissandi) polyphonies -> (a)synchronous onsets perceptual vs physical

Architecture Audio signal Reduction Novelty (detection) function Detection (peak picking) Onsets

Time-domain Onsets: often characterized by an amplitude increase Envelope following (full-wave rectification + smoothing): E 0 (m) = 1 N N/2 X n= N/2 x(n + mh) w(n) Squaring instead of rectifying, results in the local energy: E(m) = 1 N N/2 X n= N/2 (x(n + mh)) 2 w(n)

Time-domain

Time-domain We can use the derivative of energy w.r.t. time -> sharp peaks during energy rise Detectable changes in loudness are proportional to the overall loudness of the sound. @E(m)/@m E(m) = @log(e(m)) @m Simulates the ear s perception of loudness (Klapuri, 1999)

Time-domain

Frequency-domain Impulsive noise in time -> wide band noise in frequency More noticeable in high frequencies. Linear weighting of Energy HFC(m) = 2 N N/2 X k=0 X k (m) 2 k

Frequency-domain We can also measure change (flux) in spectral content (Duxbury, 02). SF(m) = 2 N Use half-wave rectification to only take energy increases into account SF R (m) = 2 N N/2 X k=0 H(x) =(x + x )/2 ( X k (m) X k (m 1) ) Use the (squared) L2 norm of the flux vector P N/2 k=0 H ( X k(m) X k (m 1) ) SF RL2 (m) = 2 N N/2 X k=0 [H ( X k (m) X k (m 1) )] 2

Spectral Flux

Phase deviation The change in instantaneous frequency in bin k can be used to detect onsets (Bello, 03) k(m 1) k(m) PD(m) = 2 N = 2 N N/2 X k=0 N/2 X k=0 00 k(m) = 2 N N/2 X k=0 k(m) k(m 1) princarg ( k (m) 2 k (m 1) + k (m 2))

Phase deviation This function can be improved by weighting frequency bins by their magnitude (Dixon, 06): PD W (m) = 2 N N/2 X k=0 X k (m) 00 k(m) and normalized: PD WN (m) = P N/2 k=0 X k(m) 00 k P (m) N/2 k=0 X k(m)

Phase deviation

Complex domain We can combine the spectral flux and phase deviation strategies, such that: ˆX k (m) = ˆX k (m) e j ˆk(m) where, ˆX k (m) = X k (m 1) ˆk(m) =princarg (2 k (m 1) k(m 2)) such that, CD(m) = 2 N P N/2 k=0 X k(m) ˆXk (m)

Complex domain As before, we can use half-wave rectification to improve the function (Dixon, 06): CD(m) = 2 N P N/2 k=0 RCD k(m) where, RCD k (m) = Xk (m) ˆXk (m) if X k (m) X k (m 1) 0 otherwise

Complex domain

Peak picking The function is post-processed to facilitate peak picking: Smoothing -> decrease jaggedness Normalization -> generalization of threshold values Thresholding -> eliminate spurious peaks

Peak picking Adaptive thresholding is a more robust choice, typically defined as: (m) = + f( m), m L apple m apple m + L where f is a function, e.g. the local mean or median, of the detection function; β increases the window length before the peak; and α is an offset value Peak picking reduces to selecting local maxima above the threshold

Peak picking

Comparing detection functions

Comparing detection functions

Comparing detection functions

Benchmarking correct c false false P = positive f - f + R = negative F = c c + f + c c + f 2PR P + R

References Dixon, S. Onset Detection Revisited. Proceedings of the 9th International Conference on Digital Audio Effects (DAFx06), Montreal, Canada, 2006. Collins, N. A Comparison of Sound Onset Detection Algorithms with Emphasis on Psycho-Acoustically Motivated Detection Functions. Journal of the Audio Engineering Society, 2005. Bello, J.P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M. and Sandler, M.B. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing. 13(5), Part 2, pages 1035-1047, September, 2005. Bello, J.P., Duxbury, C., Davies, M. and Sandler, M. On the use of phase and energy for musical onset detection in the complex domain. IEEE Signal Processing Letters. 11(6), pages 553-556, June, 2004. Klapuri, A. Sound Onset Detection by Applying Psychoacoustic Knowledge. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Phoenix, Arizona, 1999.