Voice Activity Detection Using Pitch Feature

1 Voice Activity Detection Using Pitch Feature. Presented by: Shay Perera

2 CONTENTS Introduction; Related work; Proposed improvement; References; Questions

3 PROBLEM Label each region of a signal as speech or non-speech. (Figure: waveform annotated with a speech region and a non-speech region.)

4 MOTIVATION Speech compression; discontinuous transmission (cell phones); speech recognition; speech enhancement (noise reduction)

5 CHALLENGES Noisy environment (stationary noise, transient noise); real-time operation; voiced/unvoiced speech

6 RELATED WORK
Voice Activity Detection in Presence of Transient Noise Using Spectral Clustering, Saman Mousazadeh and Israel Cohen, Senior Member, IEEE.
VAD scheme:
Speech is divided into short overlapping frames (32 ms frames with ½ overlap in this work).
Features are extracted from the preprocessed frames (feature extraction).
Spectral clustering is performed on the data points.
Two GMMs are found, one modeling speech data and one modeling non-speech data.
The likelihood ratio is computed from the speech GMM and the non-speech GMM (likelihood ratio test, LRT).
A supervised learning algorithm finds the optimum parameters of the GMMs and of the spectral clustering.
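The framing step and the likelihood ratio decision in the scheme above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the 16 kHz sample rate, the function names, and the toy 1-D GMM parameters are all assumptions, and the spectral clustering stage that would produce the value being tested is omitted.

```python
import math

def split_into_frames(samples, sample_rate=16000, frame_ms=32, overlap=0.5):
    """Cut a list of samples into fixed-length frames with fractional overlap."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 512 samples at the assumed 16 kHz
    hop = max(1, int(frame_len * (1.0 - overlap)))   # 256 samples for 1/2 overlap
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture."""
    density = sum(w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                  for w, m, v in zip(weights, means, variances))
    return math.log(max(density, 1e-300))            # floor avoids log(0)

def is_speech(x, speech_gmm, nonspeech_gmm, threshold=0.0):
    """Likelihood ratio test: speech if the log-likelihood ratio exceeds the threshold."""
    ratio = gmm_log_likelihood(x, *speech_gmm) - gmm_log_likelihood(x, *nonspeech_gmm)
    return ratio > threshold

# Toy models with made-up numbers: each GMM is (weights, means, variances).
SPEECH_GMM = ([0.5, 0.5], [4.0, 6.0], [1.0, 1.0])
NONSPEECH_GMM = ([1.0], [0.0], [1.0])

frames = split_into_frames([0.0] * 16000)            # one second of placeholder samples
print(len(frames), len(frames[0]))
print(is_speech(5.0, SPEECH_GMM, NONSPEECH_GMM), is_speech(0.2, SPEECH_GMM, NONSPEECH_GMM))
```

In a full system the scalar passed to `is_speech` would be replaced by the embedded feature of a frame, and the GMM parameters would be learned from labeled training data.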

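7 FEATURE EXTRACTION
Goal: find a metric with good separation between speech and non-speech frames.
Feature space:
1. Absolute value of the MFCCs (mel-frequency cepstral coefficients), $Y_m(:,t)$
2. Arithmetic mean of the log-likelihood ratios for the individual frequency bins, $\Lambda_t$
Feature vector: $Y(:,t) = [\,Y_m(:,t);\ \Lambda_t\,]$, with
$$\Lambda_t = \frac{1}{K_s}\sum_{k=1}^{K_s}\log\frac{P_G(X_k \mid H_1)}{P_G(X_k \mid H_0)} = \frac{1}{K_s}\sum_{k=1}^{K_s}\left(\frac{\gamma_k(t)\,\xi_k(t)}{1+\xi_k(t)} - \log\bigl(1+\xi_k(t)\bigr)\right)$$
Metric:
$$W^l(i,j) = \exp\left(-\sum_{p=-P}^{P}\alpha_p\,Q^l(i+p,\,j+p)\right)$$
$$Q^l(i,j) = \left\| Y_m^l(:,i)\bigl(1-\exp(-\Lambda_i^l/\varepsilon)\bigr) - Y_m^l(:,j)\bigl(1-\exp(-\Lambda_j^l/\varepsilon)\bigr)\right\|_2^2$$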
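As a rough sketch of how these quantities could be computed, the code below evaluates the mean log-likelihood ratio, the weighted MFCC distance Q, and the similarity W for a handful of frames. It assumes the forms Λ_t = mean over bins of γξ/(1+ξ) − log(1+ξ), Q(i,j) = ‖Y_m(:,i)(1 − exp(−Λ_i/ε)) − Y_m(:,j)(1 − exp(−Λ_j/ε))‖², and W(i,j) = exp(−Σ_p α_p Q(i+p, j+p)). The function names, the value ε = 1, and the choice to skip neighbors that fall outside the sequence are assumptions for illustration.

```python
import math

def mean_log_likelihood_ratio(gamma, xi):
    """Lambda_t: average over frequency bins of gamma*xi/(1+xi) - log(1+xi)."""
    terms = [g * x / (1.0 + x) - math.log(1.0 + x) for g, x in zip(gamma, xi)]
    return sum(terms) / len(terms)

def weighted_feature(mfcc_abs, lam, eps=1.0):
    """Scale |MFCC| by (1 - exp(-Lambda/eps)), so low-likelihood frames shrink."""
    scale = 1.0 - math.exp(-lam / eps)
    return [c * scale for c in mfcc_abs]

def q_distance(feat_i, feat_j):
    """Squared Euclidean distance between two weighted feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(feat_i, feat_j))

def similarity(feats, i, j, alphas):
    """W(i,j) = exp(-sum_p alpha_p * Q(i+p, j+p)) for p = -P..P (2P+1 weights)."""
    half = (len(alphas) - 1) // 2
    total = 0.0
    for p in range(-half, half + 1):
        a, b = i + p, j + p
        # Assumption: neighbors that fall off either end are simply skipped.
        if 0 <= a < len(feats) and 0 <= b < len(feats):
            total += alphas[p + half] * q_distance(feats[a], feats[b])
    return math.exp(-total)

# Three toy frames: (|MFCC| vector, Lambda); the numbers are invented.
toy_frames = [([1.0, 2.0], 1.0), ([1.0, 2.0], 1.0), ([3.0, 0.5], 0.2)]
feats = [weighted_feature(m, lam) for m, lam in toy_frames]
alphas = [0.25, 0.5, 0.25]
print(similarity(feats, 0, 1, alphas), similarity(feats, 0, 2, alphas))
```

Including neighboring frames through the α_p weights makes the similarity depend on short-term context rather than on a single frame pair.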

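8 PROPOSED IMPROVEMENT
Using pitch information. (Figure panels: transient noise (typing), stationary noise (white noise), speech.)
Speech model: $x[n] = (g[n] * p[n]) * h[n]$, with $p[n] = \sum_k \delta[n - kP]$, where $g[n]$ is the glottal airflow, $p[n]$ is an impulse train with pitch period $P$, and $h[n]$ models the formants.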
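To make the speech model x[n] = (g[n] * p[n]) * h[n] concrete, the sketch below synthesizes a short voiced-like signal from an impulse train, a crude glottal pulse, and a single damped resonance standing in for the formants. The pulse shape, the resonance, the 8 kHz rate, and the 80-sample period are illustrative assumptions, not values from the slides.

```python
import math

def convolve(a, b):
    """Direct (full) linear convolution of two lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_val in enumerate(a):
        if a_val == 0.0:
            continue                      # skip zeros of the sparse impulse train
        for j, b_val in enumerate(b):
            out[i + j] += a_val * b_val
    return out

def impulse_train(length, period):
    """p[n]: one at every multiple of the pitch period, zero elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

SAMPLE_RATE = 8000                        # Hz, assumed
PERIOD = 80                               # samples, i.e. a 100 Hz pitch at 8 kHz

p = impulse_train(400, PERIOD)
g = [0.0, 0.5, 1.0, 0.5, 0.0]             # crude glottal pulse (assumption)
h = [0.9 ** n * math.cos(2.0 * math.pi * 0.1 * n) for n in range(40)]  # one damped resonance

x = convolve(convolve(g, p), h)           # x[n] = (g * p) * h
print(len(x), SAMPLE_RATE / PERIOD)
```

Because the combined pulse response here is shorter than the period, the output repeats exactly every 80 samples, which is the periodicity a pitch feature is meant to pick up and that typing or white noise lacks.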

9 PITCH ESTIMATION Methods. Time domain: autocorrelation, RAPT, YIN. Time-frequency domain: cepstral, HPS, LPC, J&W.
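As an example of the simplest time-domain method listed, here is a minimal autocorrelation pitch estimator. It is a sketch under stated assumptions: the 50 Hz to 400 Hz search range, the function name, and the use of the plain (unnormalized) autocorrelation are choices made for this illustration, and the test signal is a clean sine rather than real speech.

```python
import math

def autocorrelation_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Return the pitch in Hz whose lag maximizes the raw autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = None, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized sum: longer lags use fewer terms, which favors the
        # shortest period over its integer multiples.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("frame too short for the requested pitch range")
    return sample_rate / best_lag

SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(512)]
print(autocorrelation_pitch(tone, SAMPLE_RATE))
```

On noisy or transient-heavy input this estimator degrades quickly, which is one motivation for noise-robust alternatives.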

10 PITCH ESTIMATION A Pitch Estimation Filter Robust to High Levels of Noise (PEFAC), Sira Gonzalez and Mike Brookes, Imperial College London, UK. EUSIPCO.

11 PITCH ESTIMATION
PEFAC method:
(a) Calculate the STFT: $Y_t(f) = \sum_{k=1}^{K} a_{k,t}\,\delta(f - k f_0) + N_t(f)$
(b) Move to a log-spaced frequency grid, $q = \log f$: $Y_t(q) = \sum_{k=1}^{K} a_{k,t}\,\delta(q - \log k - \log f_0) + N_t(q)$
(c) Compress the amplitude using the LTASS (long-term average speech spectrum).
(d) Convolve with the analysis filter $h(q) = \sum_{k=1}^{K}\delta(q - \log k)$ and select the highest peak in the feasible range, 50 Hz to 400 Hz.
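Steps (b) and (d) can be illustrated with a toy version of the log-frequency matching idea. The sketch assumes an idealized filter h(q) = Σ_k δ(q − log k): on a log-spaced grid the harmonics of any pitch sit at fixed offsets log k from the fundamental, so one fixed comb can be slid along the grid. This is a heavily simplified stand-in and not the PEFAC algorithm itself: there is no STFT, no LTASS step, and no peak broadening, and the grid resolution of 48 bins per octave, the helper names, and the synthetic spectrum are all assumptions.

```python
import math

BINS_PER_OCTAVE = 48          # log-grid resolution (assumption)
F_LOW = 50.0                  # frequency of grid bin 0 in Hz (assumption)

def bin_of(freq_hz):
    """Nearest log-grid bin for a frequency."""
    return int(round(math.log2(freq_hz / F_LOW) * BINS_PER_OCTAVE))

def freq_of(bin_index):
    """Center frequency of a log-grid bin."""
    return F_LOW * 2.0 ** (bin_index / BINS_PER_OCTAVE)

def harmonic_spectrum(f0, amplitudes, n_bins):
    """Synthetic Y_t(q): amplitude a_k placed at q = log(k * f0)."""
    spectrum = [0.0] * n_bins
    for k, amp in enumerate(amplitudes, start=1):
        index = bin_of(k * f0)
        if 0 <= index < n_bins:
            spectrum[index] += amp
    return spectrum

def estimate_pitch(spectrum, n_harmonics=8, f_min=50.0, f_max=400.0):
    """Slide a comb with teeth at log k over the grid and keep the best match."""
    offsets = [int(round(math.log2(k) * BINS_PER_OCTAVE))
               for k in range(1, n_harmonics + 1)]
    best_bin, best_score = None, float("-inf")
    for candidate in range(len(spectrum)):
        if not f_min <= freq_of(candidate) <= f_max:
            continue                                  # outside the feasible range
        score = sum(spectrum[candidate + off] for off in offsets
                    if candidate + off < len(spectrum))
        if score > best_score:
            best_bin, best_score = candidate, score
    return freq_of(best_bin)

n_bins = bin_of(4000.0) + 1
spectrum = harmonic_spectrum(100.0, [1.0 / k for k in range(1, 9)], n_bins)
print(estimate_pitch(spectrum))
```

With a pure delta comb, a candidate at half the true pitch still collects some of the harmonics, so a practical filter needs extra shaping to penalize such sub-harmonic matches.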

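12 PITCH FEATURE
Feature vector: $Y(:,t) = [\,Y_m(:,t);\ \Lambda_t;\ S_t\,]$, with
$$S_t = \frac{1}{1+\exp(p_{\text{nonspeech}} - p_{\text{speech}})}$$
where $p_{\text{speech}}$ and $p_{\text{nonspeech}}$ come from GMMs over the mean spectrum power and the summed power of the first 3 peaks.
New metric:
$$W^l(i,j) = \exp\left(-\sum_{p=-P}^{P}\alpha_p\,Q^l(i+p,\,j+p)\right)$$
$$Q^l(i,j) = \left\| Y_m^l(:,i)\bigl(1-\exp(-\Lambda_i^l/\varepsilon)\bigr)\bigl(1-\exp(-S_i^l/\varepsilon_s)\bigr) - Y_m^l(:,j)\bigl(1-\exp(-\Lambda_j^l/\varepsilon)\bigr)\bigl(1-\exp(-S_j^l/\varepsilon_s)\bigr)\right\|_2^2$$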
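The effect of the extra pitch term can be seen in a small sketch. It assumes S_t is a logistic function of the difference between a speech score and a non-speech score, S_t = 1 / (1 + exp(p_nonspeech − p_speech)), and that the new distance multiplies each |MFCC| vector by both (1 − exp(−Λ/ε)) and (1 − exp(−S/ε_s)) before taking the squared norm. The function names, the toy scores, and ε = ε_s = 1 are assumptions for illustration.

```python
import math

def speech_probability(p_speech, p_nonspeech):
    """S_t: logistic function of the score difference (assumed form)."""
    return 1.0 / (1.0 + math.exp(p_nonspeech - p_speech))

def pitch_weighted_feature(mfcc_abs, lam, s, eps=1.0, eps_s=1.0):
    """Scale |MFCC| by (1 - exp(-Lambda/eps)) * (1 - exp(-S/eps_s))."""
    scale = (1.0 - math.exp(-lam / eps)) * (1.0 - math.exp(-s / eps_s))
    return [c * scale for c in mfcc_abs]

def q_distance(feat_i, feat_j):
    """Squared Euclidean distance between two weighted feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(feat_i, feat_j))

# Two frames with identical MFCCs and likelihood ratio but different pitch
# evidence: one looks voiced, the other looks like a transient (toy scores).
mfcc = [1.0, 2.0]
s_voiced = speech_probability(p_speech=4.0, p_nonspeech=0.0)
s_transient = speech_probability(p_speech=-4.0, p_nonspeech=0.0)

voiced = pitch_weighted_feature(mfcc, lam=1.0, s=s_voiced)
transient = pitch_weighted_feature(mfcc, lam=1.0, s=s_transient)
print(s_voiced, s_transient, q_distance(voiced, transient))
```

Under the original weighting the two frames would be at distance zero; the pitch factor pulls them apart, which is the intended benefit when a transient has speech-like spectral energy.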

13 RESULTS (on the TIMIT database)
Test conditions: typing + white noise, SNR = 20 dB; door knock + colored noise, SNR = 10 dB; typing + babble noise, SNR = 5 dB.
Training: 20 sequences. Testing: 40 sequences.

14 FUTURE WORK Different metric; different pitch feature

15 REFERENCES
Saman Mousazadeh and Israel Cohen, Senior Member, IEEE, "Voice Activity Detection in Presence of Transient Noise Using Spectral Clustering."
F. R. Bach and M. I. Jordan, "Learning spectral clustering, with application to speech separation," Journal of Machine Learning Research, vol. 7.
Francis R. Bach and Michael I. Jordan, "Discriminative Training of Hidden Markov Models for Multiple Pitch Tracking."
J. H. Chang and N. S. Kim, "Voice activity detection based on complex Laplacian model," Electron. Lett., vol. 39, no. 7.
Naotoshi Seo (sonots@umd.edu), "ENEE632 Project4 Part I: Pitch Detection," March 24, 2008.
Sira Gonzalez and Mike Brookes, "A Pitch Estimation Filter Robust to High Levels of Noise (PEFAC)," Imperial College London, Department of Electrical and Electronic Engineering, London SW7 2AZ, UK.

16 Questions?
