Gaussian Processes for Audio Feature Extraction

1 Gaussian Processes for Audio Feature Extraction. Dr. Richard E. Turner, Computational and Biological Learning Lab, Department of Engineering, University of Cambridge

2 Machine hearing pipeline: signal (T samples)

3 Machine hearing pipeline: signal (T samples) → time-frequency (TF) analysis (non-linear; T' × D > T samples): short-time Fourier transform, spectrogram, wavelet, filter bank

4 Machine hearing pipeline: signal (T samples) → time-frequency (TF) analysis (non-linear; T' × D > T samples): short-time Fourier transform, spectrogram, wavelet, filter bank → probabilistic model: HMM (speech recognition), NMF (source separation, denoising), ICA (source separation, denoising)

5 Problems with conventional pipeline. Probabilistic model: noise (source mixtures) is hard to model in the TF domain (hard to propagate uncertainty, e.g. noise/missing data, from the signal to the TF domain).

6 Problems with conventional pipeline. Probabilistic model: noise (source mixtures) is hard to model in the TF domain (hard to propagate uncertainty, e.g. noise/missing data, from the signal to the TF domain). Time-frequency (TF) analysis (T' × D > T samples): hard to enforce/learn the dependencies intrinsic to the FT analysis (the image of the injective mapping).

7 Problems with conventional pipeline. Probabilistic model: noise (source mixtures) is hard to model in the TF domain (hard to propagate uncertainty, e.g. noise/missing data, from the signal to the TF domain). Time-frequency (TF) analysis (T' × D > T samples): hard to enforce/learn the dependencies intrinsic to the FT analysis (the image of the injective mapping). Non-linear mapping: learning based on the time-frequency representation ignores the Jacobian.

9 Problems with conventional pipeline. Probabilistic model: noise (source mixtures) is hard to model in the TF domain (hard to propagate uncertainty, e.g. noise/missing data, from the signal to the TF domain). Time-frequency (TF) analysis (T' × D > T samples): hard to enforce/learn the dependencies intrinsic to the FT analysis (the image of the injective mapping). Non-linear mapping: learning based on the time-frequency representation ignores the Jacobian. Overall: hard to adapt both top and bottom layers.

10 Goal of this talk: probabilise time-frequency analysis (construct a generative model in which inference corresponds to classical time-frequency analysis), and build a hierarchical model that incorporates the downstream processing module. [Diagram: signal (T samples) → time-frequency (TF) analysis (non-linear; T' × D > T samples; classical signal processing) → probabilistic model (machine learning)]

11 A typical audio pipeline: signal y (time /s)

12 A typical audio pipeline: signal y (time /s) → short-time Fourier transform (window, Fourier transform, magnitude) → spectrogram (frequency /kHz)

13 A typical audio pipeline: signal y (time /s) → short-time Fourier transform (window, Fourier transform, magnitude) → spectrogram (frequency /kHz) → NMF
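
The windowed-Fourier-transform-then-magnitude step of this pipeline can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `stft_magnitude`, the Hann window, the window length and the hop size are assumptions made for the example and are not taken from the slides.

```python
import cmath
import math

def stft_magnitude(y, window_len, hop):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    # Periodic Hann window (an assumed choice; the slides do not name a window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_len)
              for n in range(window_len)]
    frames = []
    for start in range(0, len(y) - window_len + 1, hop):
        seg = [w * v for w, v in zip(window, y[start:start + window_len])]
        # Direct DFT of the windowed segment, keeping non-negative frequencies.
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                    for n in range(window_len)))
            for k in range(window_len // 2 + 1)
        ])
    return frames

# Toy input: an 8 Hz tone sampled at 64 Hz, so bins are 2 Hz apart.
fs = 64
y = [math.sin(2 * math.pi * 8 * t / fs) for t in range(256)]
spec = stft_magnitude(y, window_len=32, hop=16)
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

With these settings every frame peaks in the bin centred on 8 Hz. An NMF stage would then factorise `spec` into non-negative spectral templates and activations.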

15 What form of generative model corresponds to the STFT? Desire: expected value of the latent time-frequency coefficients $s_{d,1:T}$ = STFT. Assume y is formed by a (weighted) superposition of band-limited signals $s_{d,1:T}$. Linearity of inference can be assured by setting the distributions of each $s_{d,1:T}$ and of the noise to be Gaussian. Time-invariance = generative model statistically stationary = GP prior over the STFT coefficients:
$$p(s_{d,1:T}) = \mathcal{G}(s_{d,1:T};\, 0,\, \Gamma), \qquad \Gamma_{t,t'} = \sum_{k=1}^{T} \mathrm{FT}^{-1}_{t,k}\, \gamma_k\, \mathrm{FT}_{k,t'}, \qquad \text{where } \mathrm{FT}_{k,t} = e^{-2\pi i (k-1)(t-1)/T}.$$
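
To make the stationary covariance concrete, here is a minimal sketch that builds Γ from a spectrum γ via the discrete Fourier transform and its inverse. The function name `stationary_covariance` and the example spectrum are hypothetical, indices are 0-based rather than the slide's 1-based convention, and the usual 1/T factor is assumed to sit in the inverse transform.

```python
import cmath

def stationary_covariance(gamma):
    """Gamma[t][u] = sum_k FTinv[t][k] * gamma[k] * FT[k][u] (0-based indices).

    With FT[k][u] = exp(-2*pi*i*k*u/T) and FTinv[t][k] = exp(+2*pi*i*k*t/T) / T,
    each entry depends only on the lag (t - u) mod T.
    """
    T = len(gamma)
    return [
        [sum(g * cmath.exp(2j * cmath.pi * k * (t - u) / T)
             for k, g in enumerate(gamma)) / T
         for u in range(T)]
        for t in range(T)
    ]

# Hypothetical band-pass spectrum concentrated around bin 2.
gamma = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
Gamma = stationary_covariance(gamma)
T = len(gamma)
```

Because each entry depends only on the lag, the matrix is circulant, which is the finite-length counterpart of stationarity. A spectrum placed on positive frequencies only, as here, gives a complex-valued covariance, in keeping with complex time-frequency coefficients.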

16 Time-frequency analysis as inference. Generation: complex sinusoids weighted by time-varying (complex) coefficients.

18 Time-frequency analysis as inference. Generation: complex sinusoids weighted by time-varying (complex) coefficients. Inference: recover the coefficients from the signal.

19 Time-frequency analysis as inference. Generation: complex sinusoids weighted by time-varying (complex) coefficients. Inference: the most probable coefficients given the signal are the STFT. STFT window = prior covariance, frequency-shifted inverse signal covariance.

20 Time-frequency analysis as inference: generation, inference

24 Time-frequency analysis as inference: generation, inference [labels: depends on; independent of]

25 Time-frequency analysis as inference: generation, inference [labels: depends on; independent of; signal; noise]

26 Time-frequency analysis as inference: generation, inference [labels: depends on; independent of; signal; noise; Wiener filter]

28 Time-frequency analysis as inference: generation, inference [labels: depends on; independent of; signal; noise; Wiener filter]. STFT window = prior covariance, frequency-shifted inverse signal covariance.

29 Time-frequency analysis as inference: generation, inference; probabilistic filter bank and probabilistic STFT [labels: depends on; independent of; signal; noise; Wiener filter]. STFT window = prior covariance, frequency-shifted inverse signal covariance.
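
The Wiener filter named on these slides can be illustrated in its simplest setting. The sketch below assumes a circularly stationary Gaussian prior with spectrum γ and white Gaussian noise of variance σ²; under those assumptions the posterior mean of the clean signal is a linear filter with per-frequency gain γ_k / (γ_k + σ²). The function names, the toy spectrum and the noise level are hypothetical, and this is not the specific filter derived in the talk.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(T^2) discrete Fourier transform; the inverse carries the 1/T factor."""
    T = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / T) for t in range(T))
           for k in range(T)]
    return [v / T for v in out] if inverse else out

def wiener_posterior_mean(y, gamma, noise_var):
    """Posterior mean of the signal given y = signal + white noise.

    The gain depends only on the prior spectrum and the noise variance,
    not on the observed values, so inference is a fixed linear filter.
    """
    gains = [g / (g + noise_var) for g in gamma]
    filtered = dft([gain * Yk for gain, Yk in zip(gains, dft(y))], inverse=True)
    return [v.real for v in filtered]

# Toy example: the prior puts power only at bins 1 and 7 (one real sinusoid).
T = 8
gamma = [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
y = [math.cos(2 * math.pi * t / T) + math.cos(2 * math.pi * 3 * t / T) for t in range(T)]
denoised = wiener_posterior_mean(y, gamma, noise_var=1.0)
```

Here the component at bin 3, which the prior gives no power, is removed, and the component at bin 1 is shrunk by the gain 4 / (4 + 1) = 0.8.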

30 Time-frequency analysis as inference: probabilistic models in which inference recovers the STFT, filter bank and wavelet analysis; unifies a number of existing probabilistic time-series models and connects them to traditional signal processing; can learn the window of the STFT and the frequencies (equivalently, the filter properties); the frequency-shift relationship mimics the classical relationship between these time-frequency representations; the hops/down-sampling and finite windows used correspond to FITC (uniformly spaced pseudo-points) and sparse-covariance approximations; rediscover Nyquist in the context of approximate GPs

31 Probabilistic audio processing pipeline [Figure: signal (time /ms) = envelopes × carriers, per frequency channel (freq /kHz), with mean spectrum; carriers = bandpass Gaussian noise]

33 Probabilistic audio processing pipeline [Figure: signal (time /ms) = envelopes × carriers, per frequency channel (freq /kHz), with mean spectrum; carriers = bandpass Gaussian noise; envelope patterns = slow Gaussian process]
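
A rough sketch of sampling from a model of this shape is given below: each channel is a positive, slowly varying envelope multiplied by a band-pass Gaussian carrier, and the channels are summed. Several pieces are assumptions made for illustration rather than the talk's model: the carrier is an AR(2) resonator driven by Gaussian noise, the envelope is the exponential of a slow AR(1) Gaussian process, and the function name and parameter values (`r`, `lam`, the centre frequencies) are hypothetical.

```python
import math
import random

def sample_signal(n, centre_freqs, seed=0):
    """Draw y[t] = sum_d envelope_d[t] * carrier_d[t] for n samples."""
    rng = random.Random(seed)
    y = [0.0] * n
    envelopes = []
    for omega in centre_freqs:          # centre frequency in radians per sample
        r, lam = 0.98, 0.999            # assumed pole radius and envelope memory
        x1 = x2 = 0.0                   # previous two carrier samples
        z = rng.gauss(0.0, 1.0)         # log-envelope, started from its stationary law
        env = []
        for t in range(n):
            # Band-pass Gaussian carrier: AR(2) resonator centred on omega.
            x = 2 * r * math.cos(omega) * x1 - r * r * x2 + rng.gauss(0.0, 1.0)
            x2, x1 = x1, x
            # Slow Gaussian process for the log-envelope (unit stationary variance).
            z = lam * z + math.sqrt(1 - lam * lam) * rng.gauss(0.0, 1.0)
            env.append(math.exp(z))
            y[t] += env[-1] * x
        envelopes.append(env)
    return y, envelopes

y, envelopes = sample_signal(500, centre_freqs=[0.3, 1.2])
```

Exponentiating the slow process is one simple way to keep the envelopes positive; it mirrors the log-transformed envelope variable that appears on the inference slide.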

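34 Inference and Learning. Key observation: with the envelopes fixed, the posterior over the carriers is Gaussian, and the posterior mean is given by an (adaptive) filter. This leads to MAP estimation of the envelopes (or HMCMC). Let $z_{l,t} = \log h_{l,t}$; then
$$\mathbf{Z}_{\mathrm{MAP}} = \arg\max_{\mathbf{Z}}\, p(\mathbf{Z} \mid \mathbf{Y}), \qquad p(\mathbf{Z} \mid \mathbf{Y}) = \frac{1}{Z}\, p(\mathbf{Z}, \mathbf{Y}) = \frac{1}{Z} \int d\mathbf{X}\; p(\mathbf{Z}, \mathbf{Y}, \mathbf{X}) = \frac{1}{Z}\, p(\mathbf{Z}) \int d\mathbf{X}\; p(\mathbf{Y} \mid \mathbf{A}, \mathbf{X})\, p(\mathbf{X}),$$
where the scalar $Z$ is the normalising constant. Compute the integral efficiently using a chain-structured approximation and Kalman smoothing. This leads to gradient-based optimisation for the transformed amplitudes. Learning: approximate maximum likelihood, $\theta = \arg\max_{\theta}\, p(\mathbf{Y} \mid \theta)$. NMF: zero-temperature EM, one E-step, initialise with constant envelopes.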
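
The integral over the carriers is tractable because, for fixed amplitudes, the model is linear and Gaussian. The sketch below shows the idea in a deliberately reduced setting, which is an assumption of this example and not the talk's model: a single scalar AR(1) carrier x_t observed as y_t = a_t x_t + noise. A Kalman filter then accumulates log p(Y | A) one time step at a time. The function name and all parameter values are hypothetical.

```python
import math

def log_marginal_likelihood(y, a, lam, q, noise_var):
    """log p(y | a) with x_t = lam * x_{t-1} + N(0, q) and y_t = a_t * x_t + N(0, noise_var).

    The carrier x is integrated out analytically by a scalar Kalman filter.
    """
    mean, var = 0.0, q / (1.0 - lam ** 2)     # stationary prior on the first state
    total = 0.0
    for y_t, a_t in zip(y, a):
        s = a_t * a_t * var + noise_var       # predictive variance of y_t
        resid = y_t - a_t * mean              # one-step prediction error
        total += -0.5 * (math.log(2 * math.pi * s) + resid * resid / s)
        gain = var * a_t / s                  # Kalman gain
        mean, var = mean + gain * resid, var - gain * a_t * var
        mean, var = lam * mean, lam * lam * var + q   # predict the next state
    return total

# Two observations with different (fixed) amplitudes.
ll = log_marginal_likelihood([0.5, -1.0], [1.0, 2.0], lam=0.9, q=0.19, noise_var=0.1)
```

In a MAP scheme of the kind described on the slide, this quantity plus the log prior over the transformed amplitudes would be the objective handed to a gradient-based optimiser.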

35 Audio modelling

36 Audio modelling [Figure: spectrograms (frequency /kHz against time /s) of fire, stream, wind, rain, footstep and tent-zip sounds; Turner, 21]

38 Audio modelling [Figure: spectrogram, frequency /kHz against time /s; Turner, 21]

39 Statistical texture synthesis. Old approach: build detailed physical models (e.g. rain drops). New approach: train the model on your favourite texture, then sample from the prior and then from the likelihood. The waveform is unique but statistically matched to the original, and often perceptually indistinguishable.

40 Audio denoising [Figure: SNR improvement /dB against SNR before /dB; PESQ improvement against PESQ before; SNR log spec improvement /dB against SNR log spec before /dB. Curves: NMF, tNMF, GTF, GTFtNMF, adapted filters, unadapted filters, Wiener, spectral subtraction, block threshold. Example waveforms y_t against time /ms.]

41 Audio missing data imputation [Figure: SNR /dB, PESQ and SNR log spec /dB, each against missing region /ms. Curves: tNMF, GTF, GTFtNMF, unadapted filters, adapted filters. Example waveforms y_t against time /ms.]

42 Unifying classical and probabilistic audio signal processing Probabilistic signal processing robustness adaptation fast methods important variables Classical signal processing

43 Probabilistic signal processing Classical signal processing Cemgil & Godsill freq shift Qi & Minka Filter Bank & Hilbert freq shift STFT estimation Amplitudes Spectrogram

44 Additional slides
