Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals
1 Markov-Switching GARCH Models and Applications to Digital Processing of Speech Signals Electrical Engineering Department Technion - Israel Institute of Technology Supervised by: Prof. Israel Cohen
2 Outline: Introduction - Spectral Enhancement for Speech Signals; the Markov-Switching GARCH Model
3 Speech Enhancement - Main Goal: Given a noisy observation of a speech signal, y = x + d, find the best estimate x̂ of the speech signal.
4 Short-Time Fourier Transform (STFT): Since the signal is highly non-stationary, the STFT is applied to yield a time-frequency representation, and we get Y_tk = X_tk + D_tk. [Figure: clean speech vs. noisy speech, SNR = 5 dB.]
6 Spectral Enhancement: The expansion coefficients are sparse in the STFT domain:
H1_tk (speech present): Y_tk = X_tk + D_tk
H0_tk (speech absent): Y_tk = D_tk
and the spectral enhancement problem can be formulated as
min_{X̂_tk} E{ d(X_tk, X̂_tk) | q̂_tk, λ̂_tk, λ̂_d,tk, Y_tk },
where
d(X_tk, X̂_tk) = |g(X_tk) − g(X̂_tk)|² - distortion measure
q_tk = P(H1_tk) - speech presence probability
λ_tk = E{ |X_tk|² | H1_tk } - speech spectral variance
λ_d,tk = E{ |D_tk|² } - noise spectral variance
7 Spectral Enhancement (cont.): A spectral enhancement system requires: a fidelity criterion, i.e., g(X) = X, |X|, or log|X|; a detector for the speech spectral coefficients / the speech presence probability; and a spectral model: statistical models for {X_tk} and {D_tk}, with spectral variance estimators λ̂_tk and λ̂_d,tk.
8 Spectral Models: Decision-directed spectral variance estimator
λ̂_tk = max{ μ |X̂_{t−1,k}|² + (1 − μ)(|Y_tk|² − λ̂_d,tk), λ_min }, 0 ≤ μ ≤ 1 [Ephraim & Malah 84].
Hidden-Markov Model: a vector model, a hidden Markov chain with mixtures of AR processes [Ephraim et al. 89, 92].
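As a concrete sketch, the decision-directed recursion above fits in a few lines of code; the Wiener-type amplitude feedback and the parameter values (μ = 0.98, λ_min = 1e-4) are illustrative assumptions, not values fixed by the slides:

```python
import numpy as np

def decision_directed(Y, lambda_d, mu=0.98, lambda_min=1e-4):
    """Decision-directed estimate of the speech spectral variance for one
    frequency bin, in the style of Ephraim & Malah (1984)."""
    T = len(Y)
    lam = np.empty(T)
    X_prev = 0.0  # |X-hat_{t-1,k}|; assume silence before the first frame
    for t in range(T):
        lam[t] = max(mu * X_prev**2 + (1 - mu) * (abs(Y[t])**2 - lambda_d),
                     lambda_min)
        # Wiener-type amplitude estimate feeding the next recursion step
        gain = lam[t] / (lam[t] + lambda_d)
        X_prev = gain * abs(Y[t])
    return lam
```

With |Y_tk|² equal to the noise variance, the estimate stays pinned at the λ_min floor, which is the intended spectral-floor behavior.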
10 Existing Algorithms: Spectral subtraction: detection (power thresholding) and independent estimation (power subtraction). Subspace approach: detection followed by estimation. MMSE, STSA, OM-LSA (and others): estimation under uncertainty (e.g., X̂_tk = E{X_tk | Y_tk, H1_tk} p(H1_tk | Y_tk)). Drawbacks: estimation under uncertainty degrades speech quality and prevents sufficient attenuation of noise-only coefficients compared with using an ideal detector; erroneous detection may remove desired speech components or result in annoying musical noise.
12 Section outline: GARCH Formulation; Markov-Switching GARCH Model; Asymptotic Stationarity; Spectral Modeling; State Smoothing. GARCH: Generalized Autoregressive Conditional Heteroscedasticity.
13 Generalized Autoregressive Conditional Heteroscedasticity (GARCH) Model: Robert F. Engle received the 2003 Nobel Prize in Economic Sciences for methods of analyzing economic time series with time-varying volatility (ARCH).
14 Generalized Autoregressive Conditional Heteroscedasticity (GARCH) Model: GARCH models [Engle 82; Bollerslev 86] are widely used in financial applications such as risk management, option pricing, and foreign exchange. They explicitly parameterize the time-varying volatility in terms of past conditional variances and past squared innovations, while accounting for excess kurtosis and volatility clustering. Modeling speech expansion coefficients as GARCH processes offers a reasonable model on which to base the variance estimation [Cohen 04, 06].
16 GARCH Model (cont.): [Figure: Deutschmark/British Pound foreign exchange rate and daily returns (Jan 1984, Jan 1986, Jan 1988, …); compare with the real value of speech spectral coefficients at 3 kHz.]
17 Speech Spectral Analysis: When observing a time series of successive expansion coefficients in a fixed frequency bin, successive magnitudes of the expansion coefficients are highly correlated, whereas successive phases are nearly uncorrelated. Speech signals in the STFT domain are characterized by volatility clustering and a heavy-tailed distribution. [Figure: real value of speech spectral coefficients at 3 kHz.]
18 GARCH Model (cont.): Let {y_t} denote a real-valued discrete-time stochastic process, and let ψ_t denote the information set available at time t. The innovation (prediction error) ε_t at time t in the MMSE sense is given by
ε_t = y_t − E{y_t | ψ_{t−1}}.
The conditional variance (volatility) of y_t is defined by
σ²_t = Var{y_t | ψ_{t−1}} = E{ε²_t | ψ_{t−1}}.
19 GARCH Model (cont.): A linear GARCH model of order (p, q), ε_t ~ GARCH(p, q):
ε_t = σ_t v_t,
σ²_t = ξ + Σ_{i=1}^{q} α_i ε²_{t−i} + Σ_{j=1}^{p} β_j σ²_{t−j},
where {v_t} is a zero-mean, unit-variance white-noise process. To ensure positive conditional variances: ξ > 0, α_i ≥ 0, β_j ≥ 0, i = 1, …, q, j = 1, …, p; and for the existence of a finite unconditional variance:
Σ_{i=1}^{q} α_i + Σ_{j=1}^{p} β_j < 1.
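The GARCH(p, q) definition above is easy to simulate for (p, q) = (1, 1); the following sketch (parameter values are arbitrary) draws Gaussian innovations and enforces the finite-variance condition α + β < 1:

```python
import numpy as np

def simulate_garch(T, xi=0.1, alpha=0.2, beta=0.7, seed=0):
    """Simulate eps_t ~ GARCH(1,1):
    sigma2_t = xi + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}."""
    assert xi > 0 and alpha >= 0 and beta >= 0
    assert alpha + beta < 1, "finite unconditional variance needs alpha + beta < 1"
    rng = np.random.default_rng(seed)
    sigma2 = np.empty(T)
    eps = np.empty(T)
    sigma2[0] = xi / (1 - alpha - beta)  # start at the stationary variance
    for t in range(T):
        if t > 0:
            sigma2[t] = xi + alpha * eps[t - 1]**2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return eps, sigma2
```

The sample variance of a long realization should hover around the stationary value ξ / (1 − α − β), here 1.0, while the series still exhibits the volatility clustering the slides describe.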
21 Markov-Switching GARCH Model: In a Markov-switching GARCH model, the parameters are allowed to change in time according to a Markovian state, to enable tracking of shocks. Multi-state models are used for speech recognition and for speech enhancement [Drucker 68; Ephraim et al. 89, 92]. For speech signals the parameters may change in time, e.g., to represent different speech phonemes or speakers. A special state may be defined for the speech-absence hypothesis; this may result in a soft voice activity detector.
22 Markov-Switching GARCH Model: S_t is an m-state, first-order hidden Markov chain with transition probability matrix A. A general (p, q)-order MS-GARCH model follows: given that S_t = s_t,
ε_t = σ_{t,s_t} v_t,
σ²_{t,s_t} = ξ_{s_t} + Σ_{i=1}^{q} α_{i,s_t} ε²_{t−i} + Σ_{j=1}^{p} β_{j,s_t} σ²_{t−j,s_{t−j}}.
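A minimal simulation of the MS-GARCH(1,1) recursion above, with a first-order Markov chain switching the parameters; the two-state parameter values used below are illustrative:

```python
import numpy as np

def simulate_ms_garch(T, A, xi, alpha, beta, seed=0):
    """Simulate an m-state MS-GARCH(1,1):
    sigma2_{t,s_t} = xi[s_t] + alpha[s_t]*eps_{t-1}^2 + beta[s_t]*sigma2_{t-1,s_{t-1}}."""
    rng = np.random.default_rng(seed)
    m = len(xi)
    s = np.empty(T, dtype=int)
    sigma2 = np.empty(T)
    eps = np.empty(T)
    s[0] = 0
    sigma2[0] = xi[0] / (1 - alpha[0] - beta[0])
    for t in range(T):
        if t > 0:
            s[t] = rng.choice(m, p=A[s[t - 1]])          # Markov state transition
            sigma2[t] = (xi[s[t]] + alpha[s[t]] * eps[t - 1]**2
                         + beta[s[t]] * sigma2[t - 1])   # path-dependent variance
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return eps, sigma2, s
```

Note that sigma2[t − 1] carries the variance of the previously active state, which is exactly the path dependence that makes exact inference in this model grow with the number of regime paths.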
23 Asymptotic Stationarity: For a single-state model, Σ_i α_i + Σ_j β_j < 1 is necessary and sufficient for an asymptotically stationary variance and a finite second-order moment [Bollerslev 86]. Clearly, for an MS-GARCH model, Σ_i α_{i,s} + Σ_j β_{j,s} < 1 ∀s is a sufficient condition. Necessary and sufficient conditions for stationarity are known only for some degenerate cases.
24 Asymptotic Stationarity (cont.): The unconditional variance follows
E[ε²_t] = E_{ψ_{t−1},S_t}[E(ε²_t | ψ_{t−1}, s_t)] = E_{S_t}[E_{ψ_{t−1}}(σ²_{t,s_t} | s_t)] = Σ_{s_t=1}^{m} π_{s_t} E_{ψ_{t−1}}(σ²_{t,s_t} | s_t),
where π_{s_t} ≜ p(S_t = s_t).
25 Asymptotic Stationarity (cont.): Define the block-companion matrix
Ψ ≜ [ K^(1) K^(2) ⋯ K^(r) ; I_m 0_m ⋯ 0_m ; 0_m I_m ⋯ 0_m ; ⋮ ],
where
{K^(i)}_{s̄,s} ≜ (α_{i,s} + β_{i,s}) (π_s / π_{s̄}) a_{s,s̄}, s, s̄ = 1, …, m,
and r ≜ max{p, q}.
26 Asymptotic Stationarity (cont.): Let Λ be an m × m square matrix with {Λ}_ij = {(I − Ψ)^{−1}}_ij, i, j = 1, …, m. Then:
Theorem: the process is asymptotically wide-sense stationary, with asymptotic variance lim_{t→∞} E(ε²_t) = π′ Λ ξ, if and only if ρ(Ψ) < 1 [Abramson and Cohen, Econometric Theory 07].
Some regimes may allow the volatility to grow over time while the second-order moment remains finite. A similar method was applied to other MS-GARCH formulations.
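The spectral-radius condition ρ(Ψ) < 1 can be checked numerically; for a first-order model (r = 1), Ψ reduces to the single block K^(1). The sketch below assumes the {K^(i)} entries as written above and an ergodic chain with stationary distribution π:

```python
import numpy as np

def ms_garch_stationary(A, alpha, beta):
    """Check asymptotic wide-sense stationarity of an m-state MS-GARCH(1,1)
    via the spectral-radius condition rho(Psi) < 1."""
    A = np.asarray(A, float)
    m = A.shape[0]
    # stationary distribution pi: left eigenvector of A for eigenvalue 1
    w, V = np.linalg.eig(A.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    Psi = np.empty((m, m))
    for sbar in range(m):
        for s in range(m):
            Psi[sbar, s] = (alpha[s] + beta[s]) * pi[s] * A[s, sbar] / pi[sbar]
    rho = np.max(np.abs(np.linalg.eigvals(Psi)))
    return rho, rho < 1.0
```

This also illustrates the remark in the theorem: a regime with α_s + β_s > 1 (locally explosive volatility) can still yield ρ(Ψ) < 1 overall when the chain spends enough time in calmer regimes.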
28 Back to Speech: Spectral Modeling of Speech Signals Using MS-GARCH
29 MS-GARCH Model in the Time-Frequency Domain: A frame- (or subband-) dependent Markov chain S_t with realization s_t ∈ {0, 1, …, m}. X_tk ~ MS-GARCH(1,1):
X_tk = √(λ_{tk|t−1,s_t}) V_tk,
λ_{tk|t−1,s_t} = λ_min,s_t + α_{s_t} |X_{t−1,k}|² + β_{s_t} (λ_{t−1,k|t−2,s_{t−1}} − λ_min,s_{t−1}),
with λ_min,s_t > 0 and α_{s_t}, β_{s_t} ≥ 0. {V_tk} are i.i.d. complex random variables with zero mean and unit variance. {X_tk} are statistically dependent, but {X_tk | λ_{tk|t−1,s_t}, s_t} are zero-mean statistically independent random variables.
31 Model Estimation: Parameters are traditionally estimated from a training set using an ML approach. Alternatively, the parameters may be chosen such that each state represents a different level of the spectral energy. Setting α_s = β_s = 0 for a specific s results in a constant variance, and hence may be used for the speech-absence state.
32 Model Estimation (cont.): The conditional variances in each subband (indexed n) are limited to a dynamic range of η_g dB. For the speech-absence state (namely, s_t = 0):
λ_min,n,0 = ζ_g 10^{−η_g/10}, α_n,0 = β_n,0 = 0,
where ζ_g ≜ max_{t,k} |X_tk|². Under speech presence, η_t dB is the local dynamic range of the conditional variances, such that
λ_min,n,1 = max{ λ_min,n,0, ζ_n 10^{−η_t/10} },
and λ_min,n,s, s = 2, …, m, are log-spaced between λ_min,n,1 and ζ_n ≜ max_{t, k∈κ_n} |X_tk|².
33 Model Estimation (cont.): The parameters α_n,s, β_n,s for s > 0 set the volatility level of the conditional variance. Under an immutable state s, the stationary variance follows
λ_{∞,n,s} ≜ lim_{t→∞, k∈κ_n} λ_{tk|t−1,s} = λ_min,n,s (1 − β_n,s) / (1 − α_n,s − β_n,s),
assuming α_n,s + β_n,s < 1. Different states are related to different dynamic ranges in ascending order, thus λ_{∞,n,s} ≤ λ_min,n,s+1 and therefore
(1 − β_n,s) / (1 − α_n,s − β_n,s) ≤ λ_min,n,s+1 / λ_min,n,s.
34 Relation to HMM: In a standard HMM,
p(X_t | X_{t−1}, X_{t−2}, …, s_t, s_{t−1}, s_{t−2}, …) = p(X_t | s_t),
i.e., each state specifies a specific pdf. In an MS-GARCH process, both the past observations and the regime path are required for the conditional density; each state formulates the evolution of the conditional variance.
35 Reconstruction of the Conditional Variance: In a noiseless (econometric) environment the conditional variance is recursively reconstructed. In our case, the conditional variance is estimated in two steps [Abramson and Cohen, Trans. SP 07].
Propagation:
λ̂_{tk|t−1,s_t} = λ_min,s_t + α_{s_t} E{|X_{t−1,k}|² | Y^{t−1}, s_t} + β_{s_t} E{λ_{t−1,k|t−2,S_{t−1}} | Y^{t−1}, s_t} − β_{s_t} E{λ_min,S_{t−1} | Y^{t−1}, s_t},
where
E{|X_{t−1,k}|² | Y^{t−1}, s_t} = Σ_{s_{t−1}} p(s_{t−1} | s_t, Y^{t−1}) E{|X_{t−1,k}|² | Y^{t−1}, s_{t−1}}.
36 Reconstruction of the Conditional Variance (cont.): and the update step,
λ̂_{tk|t,s_t} = E{|X_tk|² | s_t, λ̂_{tk|t−1,s_t}, Y_t}
= λ̂_{tk|t−1,s_t} σ²_tk / (λ̂_{tk|t−1,s_t} + σ²_tk) + [λ̂_{tk|t−1,s_t} / (λ̂_{tk|t−1,s_t} + σ²_tk)]² |Y_tk|².
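A single-state (m = 1) sketch of the propagate/update pair above, where the mixture over s_{t−1} collapses and the update is the Gaussian posterior second moment of X_t given Y_t; the parameter values are illustrative:

```python
import numpy as np

def garch_variance_filter(Y, sigma2_d, lam_min=1e-3, alpha=0.3, beta=0.6):
    """Single-state sketch of the propagate/update recursion:
    propagate lam_{t|t-1} from past estimates, then update with the noisy Y_t."""
    T = len(Y)
    lam_pred = np.empty(T)   # lam_{t|t-1}
    lam_filt = np.empty(T)   # lam_{t|t} = E{|X_t|^2 | Y^t}
    E_X2_prev = lam_min      # E{|X_{t-1}|^2 | Y^{t-1}}
    lam_prev = lam_min       # lam_{t-1|t-2}
    for t in range(T):
        # propagation step
        lam_pred[t] = lam_min + alpha * E_X2_prev + beta * (lam_prev - lam_min)
        # update step: Gaussian posterior second moment of X_t given Y_t
        g = lam_pred[t] / (lam_pred[t] + sigma2_d)
        lam_filt[t] = g * sigma2_d + g**2 * np.abs(Y[t])**2
        E_X2_prev, lam_prev = lam_filt[t], lam_pred[t]
    return lam_pred, lam_filt
```

The update line is algebraically the same expression as the slide's λσ²/(λ+σ²) + (λ/(λ+σ²))²|Y|², written via the Wiener gain g.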
37 Relation to Decision-Directed Estimation: Recall the decision-directed estimator
λ̂_tk^DD = max{ μ |X̂_{t−1,k}|² + (1 − μ)(|Y_tk|² − σ²_tk), ξ_min }, 0 ≤ μ ≤ 1.
For an ARCH(1) model (i.e., β = 0) with α = 1, the update step can be written as [Abramson & Cohen, Trans. SP]
λ̂_{tk|t} = μ_tk E{|X_{t−1,k}|² | Y^{t−1}} + (1 − μ_tk)(|Y_tk|² − σ²_tk) + μ_tk λ_min,
with μ_tk ≜ 1 − λ̂²_{tk|t−1} / (λ̂_{tk|t−1} + σ²_tk)².
38 Relation to Decision-Directed Estimation (cont.): For λ_min ≪ E{|X_{t−1,k}|² | Y^{t−1}}, the degenerate ARCH-based variance estimator with α = 1 is similar to the decision-directed approach with
μ = μ_tk |X̂_{t−1,k}|² / E{|X_{t−1,k}|² | Y^{t−1}}.
GARCH modeling allows α < 1 and β > 0, and it treats the spectral variances as a random process rather than as a sequence of random parameters that are heuristically evaluated.
39 State Probability: The conditional state probability is derived by
p(s_t | Y^t) = p(Y_t | s_t, λ̂_{t|t−1,s_t}) p(s_t | Y^{t−1}) / Σ_{s_t′} p(Y_t | s_t′, λ̂_{t|t−1,s_t′}) p(s_t′ | Y^{t−1}),
where p(s_t | Y^{t−1}) = Σ_{s_{t−1}} p(s_{t−1} | Y^{t−1}) a_{s_{t−1},s_t}. Accordingly, 1 − p(s_t = 0 | Y^t) is a soft voice activity detector for the subband. The problem of state smoothing, p(s_t | Y^{t+L}) with L > 0, is also addressed [Abramson and Cohen, SPL 06].
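The causal state filter above is a standard HMM forward step with state-dependent Gaussian likelihoods; this sketch assumes the per-state predicted variances λ̂_{t|t−1,s_t} are supplied in an array (in the full algorithm they come from the propagate/update recursion):

```python
import numpy as np

def state_probabilities(Y, lam, sigma2_d, A, p0):
    """Causal state filter p(s_t | Y^t) under a complex-Gaussian observation
    model Y_t | s_t ~ CN(0, lam[t, s_t] + sigma2_d). lam holds the per-state
    conditional speech variances lam_{t|t-1, s_t}."""
    T, m = lam.shape
    p = np.empty((T, m))
    prior = np.asarray(p0, float)
    for t in range(T):
        var = lam[t] + sigma2_d
        lik = np.exp(-np.abs(Y[t])**2 / var) / (np.pi * var)  # CN(0, var) density
        post = lik * prior
        p[t] = post / post.sum()
        prior = p[t] @ A  # p(s_{t+1} | Y^t)
    return p
```

When state 0 models speech absence, 1 − p[:, 0] is the soft voice-activity detector mentioned on the slide.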
40 State Smoothing: State smoothing, i.e., p(s_t | Y^τ) for τ > t, in an HMP relies on f(X_t | ψ_{t−1}, s_t) = f(X_t | s_t) [Chang and Hancock 66; Lindgren 78; Askar and Derin 81]. In an AR-HMP a finite set of past observations is required [Kim 94]. In a path-dependent MS-GARCH process the density is a function of all past values and active states. We derive generalizations of both the forward-backward recursions and the stable backward recursion to the path-dependent model [Abramson and Cohen, IEEE Signal Processing Letters 06].
41 State Smoothing (cont.): Define
Λ̂_t ≜ {λ̂_{t|t−1,S_t} : S_t = 1, …, m},
the generalized forward density
α(s_t, Y^t) ≜ f(s_t, Λ̂_t, Y^t),
and the generalized backward density
β(Y_{t+1}^{t+L} | S_t, Y^t) ≜ f(Y_{t+1}^{t+L} | S_t, Λ̂_t, Y^t).
The noncausal state probability can then be obtained by
p(s_t | Y^{t+L}) = α(s_t, Y^t) β(Y_{t+1}^{t+L} | s_t, Y^t) / Σ_{s_t} α(s_t, Y^t) β(Y_{t+1}^{t+L} | s_t, Y^t).
42 Generalized Forward Recursion: The generalized forward density satisfies the following recursion:
α(s_t, Y^t) = π_{s_0} f(Y_0 | s_0), t = 0,
α(s_t, Y^t) = f(Y_t | s_t, λ̂_{t|t−1,s_t}) Σ_{s_{t−1}} α(s_{t−1}, Y^{t−1}) a_{s_{t−1} s_t}, t = 1, 2, ….
43 Generalized Backward Recursion: The generalized backward density requires two recursive steps.
Step I: for l = 1, …, L and every state path S_t^{t+l}, propagate and update the conditional variances:
λ̂_{t+l|t+l−1, S_t^{t+l}} = ξ_{s_{t+l}} + α_{s_{t+l}} λ̂_{t+l−1|t+l−1, S_t^{t+l−1}} + β_{s_{t+l}} λ̂_{t+l−1|t+l−2, S_t^{t+l−1}},
λ̂_{t+l|t+l, S_t^{t+l}} = g(λ̂_{t+l|t+l−1, S_t^{t+l}}, Y_{t+l}, σ²).
Step II: for l = L, …, 1 and every state path S_t^{t+l−1}:
β(Y_{t+l}^{t+L} | S_t^{t+l−1}, Y^{t+l−1}) = Σ_{s_{t+l}} f(Y_{t+l} | S_t^{t+l}, λ̂_{t|t−1,s_t}, Y_t^{t+l−1}) β(Y_{t+l+1}^{t+L} | S_t^{t+l}, Y^{t+l}) a_{s_{t+l−1} s_{t+l}}.
44 State Smoothing - Results: The generalized recursions capture the non-memoryless structure of the Markov-switching GARCH model. Each step of the generalized backward recursion is calculated for m^{L+1} regime sequences. [Figure: state-smoothing error rate vs. delay L (samples) for 3-state MSTF-GARCH models with SNRs of 5 dB (triangle), 10 dB (asterisk), and 15 dB (circle).]
45 Spectral Enhancement: Minimizing
E{ |g(X_tk) − g(X̂_tk)|² | Y^t }
yields
g(X̂_tk) = E{ g(X_tk) | Y^t },
where E{ g(X_tk) | Y^t } = Σ_{s_t} p(S_t = s_t | Y^t) E{ g(X_tk) | s_t, Y^t }.
46 Spectral Enhancement: Having the set {λ̂_{tk|t,s_t}}_{s_t=0}^{m}, minimizing the mean-square error of the log-spectral amplitude (LSA),
E{ (log|X_tk| − log|X̂_tk|)² | Y^t },
yields
X̂_tk = Y_tk Π_{s_t} G_LSA(ξ̂_{tk,s_t}, γ̂_{tk,s_t})^{p(s_t | Y^t)},
where G_LSA(ξ, γ) is the LSA gain function [Ephraim and Malah 85].
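A numeric sketch of the state-weighted LSA estimator above, using the standard LSA gain G_LSA(ξ, γ) = ξ/(1+ξ) · exp(½ E₁(υ)) with υ = ξγ/(1+ξ) (scipy's `exp1` provides the exponential integral E₁); the state-conditional SNRs and state probabilities passed in are assumed to be given by the MS-GARCH filter:

```python
import numpy as np
from scipy.special import exp1

def G_lsa(xi, gamma):
    """LSA gain of Ephraim & Malah (1985): G = xi/(1+xi) * exp(0.5 * E1(upsilon))."""
    upsilon = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(upsilon))

def ms_lsa_estimate(Y, xi_s, gamma_s, p_s):
    """Weighted-geometric-mean gain over states:
    X-hat = Y * prod_s G_LSA(xi_s, gamma_s) ** p(s | Y^t)."""
    gains = np.array([G_lsa(x, g) for x, g in zip(xi_s, gamma_s)])
    return Y * np.prod(gains ** np.asarray(p_s))
```

The geometric (rather than arithmetic) weighting over states is what the exponent p(s_t | Y^t) in the slide's product formula expresses.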
47 Section outline: Guidelines; Spectral Enhancement - Reformulation; Detection and Estimation of Speech Signals; Simultaneous Detection and Estimation.
48 Recall: Find an estimate X̂_tk which minimizes
E{ |g(X_tk) − g̃(X̂_tk)|² | ψ_t },
where g(X) and g̃(X) are specific functions, e.g., X, |X|, or log|X|. Alternatively, in a Bayesian formulation:
argmin_{X̂_tk} ∫ C(X_tk, X̂_tk) p(Y_tk | X_tk) p(X_tk) dX_tk,
where C(X, X̂) = |g(X) − g̃(X̂)|².
50 [Diagram: independent detection and estimation vs. simultaneous detection and estimation.]
51 Guidelines [Middleton et al. 68, 72]: An integrated detector with decision space {η_0^tk, η_1^tk}. Under η_j^tk, hypothesis H_j^tk is accepted and the estimate X̂_tk = X̂_tk,j is applied. The cost for the hypothesis-decision pair {H_i, η_j}:
C_ij(X, X̂) = b_ij d_ij(X, X̂),
where d_ij(X, X̂) is an appropriate distortion measure between the signal and its estimate under η_j, and b_ij is a trade-off parameter weighting the cost associated with the pair {H_i, η_j} against the other pairs.
52 For the signal-observation pair {X, Y}, the cost associated with the decision η_j is
C_j(X, Y) = Σ_{i=0}^{1} C_ij(X, X̂) p(H_i | X) = Σ_{i=0}^{1} p(H_i) C_ij(X, X̂) p(X | H_i) / p(X).
The combined Bayes risk of simultaneous detection and estimation:
R = Σ_{j=0}^{1} ∫_Y ∫_X C_j(X, Y) p(η_j | Y) p(Y | X) p(X) dX dY.
53 The goal: Find the optimal detector and estimator which jointly minimize the combined Bayes risk
R = Σ_{j=0}^{1} ∫_Y p(η_j | Y) Σ_{i=0}^{1} ∫_X C_ij(X, X̂) p(H_i) p(Y | X) p(X | H_i) dX dY.
54 Combined Solution: The optimal estimator under a decision η_j:
X̂_j = argmin_{X̂} Σ_{i=0}^{1} p(H_i) r_ij(Y).
The optimal decision rule:
q [r_10(Y) − r_11(Y)] ≷_{η_0}^{η_1} (1 − q) [r_01(Y) − r_00(Y)],
where r_ij(Y) ≜ ∫ C_ij(X, X̂) p(Y | X) p(X | H_i) dX and q ≜ p(H_1).
55 Recall C_ij(X, X̂) = b_ij d_ij(X, X̂). Normalize b_00 = b_11 = 1; b_10 is associated with missed detection and b_01 with false detection. d_ii(·) is not necessarily zero (estimation error). Under H_0, a natural background noise level is desired.
56 Distortion Functions: Quadratic distortion measure:
d_ij(X, X̂_j) = |X − X̂_j|², i = 1; |G_f Y − X̂_j|², i = 0.
Quadratic spectral amplitude (QSA) distortion measure:
d_ij(X, X̂_j) = (|X| − |X̂_j|)², i = 1; (G_f |Y| − |X̂_j|)², i = 0.
G_f ≪ 1 is a constant attenuation floor.
57 Definitions: The a priori and a posteriori SNRs,
ξ ≜ λ / λ_d, γ ≜ |Y|² / λ_d, υ ≜ ξγ / (1 + ξ).
Generalized likelihood ratio:
Λ(ξ, γ) = q p(Y | H_1) / [(1 − q) p(Y | H_0)].
φ_j(ξ, γ) ≜ b_0j + b_1j Λ(ξ, γ).
58 QSA Distortion Measure: Under a Gaussian model [Abramson and Cohen, Trans. ASLP 07], the estimator under decision η_j is
X̂_j = [b_1j Λ(ξ, γ) G_STSA(ξ, γ) + b_0j G_f] / φ_j(ξ, γ) · Y ≜ G_j(ξ, γ) Y, j = 0, 1,
where
G_STSA(ξ, γ) = (√(πυ) / (2γ)) exp(−υ/2) [(1 + υ) I_0(υ/2) + υ I_1(υ/2)]
is the STSA gain of [Ephraim and Malah 84]. The detector is a threshold test η_1 ≷ η_0 comparing q- and (1 − q)-weighted risk terms involving Λ(ξ, γ), the gains G_0, G_1, G_STSA, G_f, and the cost parameters b_10, b_01; the closed-form expression is given in [Abramson and Cohen, Trans. ASLP 07].
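The estimator side of the rule above can be evaluated directly; this sketch uses the Gaussian-model likelihood ratio Λ(ξ, γ) = q eᵘ / [(1 − q)(1 + ξ)] and illustrative cost values (b_10 = 1.1 as on the later gain-curve slide; b_01 = 1.5 is a hypothetical default):

```python
import numpy as np
from scipy.special import i0, i1

def G_stsa(xi, gamma):
    """STSA gain (Ephraim & Malah 1984)."""
    u = xi * gamma / (1.0 + xi)
    return (np.sqrt(np.pi * u) / (2.0 * gamma) * np.exp(-u / 2.0)
            * ((1.0 + u) * i0(u / 2.0) + u * i1(u / 2.0)))

def combined_gains(xi, gamma, q=0.8, Gf=0.1, b10=1.1, b01=1.5):
    """Gains G_0 (decide speech absent) and G_1 (decide speech present):
    G_j = (b_1j * Lam * G_STSA + b_0j * Gf) / (b_0j + b_1j * Lam),
    with the Gaussian-model likelihood ratio Lam = q/(1-q) * exp(u)/(1+xi)."""
    u = xi * gamma / (1.0 + xi)
    Lam = q / (1.0 - q) * np.exp(u) / (1.0 + xi)
    Gs = G_stsa(xi, gamma)
    G1 = (1.0 * Lam * Gs + b01 * Gf) / (b01 + 1.0 * Lam)   # b11 = 1
    G0 = (b10 * Lam * Gs + 1.0 * Gf) / (1.0 + b10 * Lam)   # b00 = 1
    return G0, G1
```

Consistent with the asymptotic-behavior slide, both gains tend to G_STSA when Λ is large and G_0 tends to the floor G_f when Λ is small.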
59 Detection and Estimation of Speech Signals: The estimator X̂_tk = G_j Y_tk is optimal for any given detector.
61 QSA Distortion Measure: [Figure: gains G_1, G_0, G (detection & estimation), and G_STSA (p < 1), in dB, vs. instantaneous SNR (γ − 1) [dB], for four a priori SNRs ξ (15, 5, 5, 15 dB panels); q = 0.8, G_f = −20 dB, b_10 = 1.1, b_01 = ….]
62 Asymptotic Behavior: If the false-alarm parameter b_01 ≪ Λ(ξ, γ), we get G_1(ξ, γ) → G_STSA(ξ, γ). If b_01 ≫ Λ(ξ, γ) (a wrong decision might be expensive), then G_1(ξ, γ) → G_f. If the missed-detection parameter b_10 ≪ Λ(ξ, γ)⁻¹, then G_0(ξ, γ) → G_f; but b_10 ≫ Λ(ξ, γ)⁻¹ implies G_0(ξ, γ) → G_STSA(ξ, γ), to overcome the high cost of missed detection.
63 Relation to Classical Algorithms: Recall that Λ(ξ, γ) / (1 + Λ(ξ, γ)) = p(H_1 | Y) is the a posteriori speech presence probability. For equal cost parameters b_ij = 1 ∀ i, j we have
X̂_1 = X̂_0 = [p(H_1 | Y) G_STSA(ξ, γ) + (1 − p(H_1 | Y)) G_f] Y,
which is the estimator proposed by [Cohen 06]. In case G_f = 0 we get the STSA suppression rule of [Ephraim and Malah 84].
64 Design of a coupled detector and estimator, with cost parameters that control the trade-off between musical residual noise and speech distortion. The method enables incorporating an independent detector for speech or transient noise with the optimal estimator, yielding a suboptimal solution. Low computational complexity.
65 Blind Source Separation
66 Problem Formulation: Given a single mixture x = s_1 + s_2, find estimates ŝ_1 and ŝ_2. Since the problem is ill-posed, some a priori information is required, such as statistical codebooks. In the STFT domain, at a specific time index we have x = s_1 + s_2.
67 Problem Formulation (cont.): q_1 ∈ {1, …, m_1} and q_2 ∈ {1, …, m_2} denote the active states of the codebooks, with known a priori probabilities p_1(i) ≜ p(q_1 = i) and p_2(j) ≜ p(q_2 = j). Given that q_1 = i and q_2 = j, we assume s_1 ~ CN(0, Σ_1^(i)) and s_2 ~ CN(0, Σ_2^(j)). Hence, we need to find the active states at each time frame and subsequently estimate each signal.
68 Existing Solutions: Sequential classification and estimation.
MAP classification: {î, ĵ} = argmax_{i,j} p(x | i, j) p(i, j).
MMSE estimation: ŝ_1 = E{s_1 | x, î, ĵ} = Σ_1^(î) (Σ_1^(î) + Σ_2^(ĵ))⁻¹ x ≜ W_{îĵ} x.
A classification error may significantly distort the signal or result in residual interference.
69 Existing Solutions (cont.): Direct estimation:
ŝ_1 = E{s_1 | x} = Σ_{i,j} p(i, j | x) W_ij x.
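The direct-estimation formula above can be sketched for diagonal (per-bin) Gaussian codebooks, where W_ij reduces to elementwise Wiener weights; the codebook variances used in the test are toy values, not trained codebooks:

```python
import numpy as np

def separate(x, cov1, cov2, p1, p2):
    """Direct MMSE estimate of s1 from x = s1 + s2 with Gaussian codebooks:
    s1-hat = sum_{i,j} p(i,j | x) W_ij x, with W_ij = cov1_i / (cov1_i + cov2_j)
    per frequency bin (diagonal covariances)."""
    m1, m2 = len(cov1), len(cov2)
    post = np.empty((m1, m2))
    shat = np.zeros_like(x)
    for i in range(m1):
        for j in range(m2):
            var = cov1[i] + cov2[j]  # per-bin mixture variance under (i, j)
            loglik = -np.sum(np.abs(x)**2 / var + np.log(np.pi * var))
            post[i, j] = p1[i] * p2[j] * np.exp(loglik)
    post /= post.sum()               # p(i, j | x)
    for i in range(m1):
        for j in range(m2):
            W = cov1[i] / (cov1[i] + cov2[j])  # per-bin Wiener weights
            shat += post[i, j] * W * x
    return shat, post
```

Replacing the posterior-weighted sum with a hard argmax over `post` gives the sequential MAP-then-MMSE scheme of the previous slide, which is exactly where classification errors become audible.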
70 Simultaneous Classification and Estimation: Consider a classifier η_ij and a combined cost function
C_ij^{īj̄}(s, ŝ) ≜ b_ij^{īj̄} ||s − ŝ||²,
where b_ij^{īj̄} > 0 are cost parameters for deciding that {i, j} is the active pair while q_1 = ī and q_2 = j̄. The combined risk:
R = Σ_{i,j} Σ_{ī,j̄} ∫∫ C_ij^{īj̄}(s_1, ŝ_1) p(x | s_1, ī, j̄) p(s_1 | ī, j̄) p(ī, j̄) p(η_ij | x) ds_1 dx.
71 Simultaneous Classification and Estimation (cont.): Using
r_ij^{īj̄}(x, ŝ_1) = ∫ C_ij^{īj̄}(s_1, ŝ_1) p(x | s_1, j̄) p(s_1 | ī) ds_1,
we can write
R = Σ_{i,j} ∫ p(η_ij | x) Σ_{ī,j̄} p(ī, j̄) r_ij^{īj̄}(x, ŝ_1) dx,
and the combined solution is obtained by minimizing min_{η_ij, ŝ_1} R.
72 Optimal Solution: The estimator under the decision η_ij:
ŝ_{1,ij} = argmin_{ŝ_1} Σ_{ī,j̄} p(ī, j̄) r_ij^{īj̄}(x, ŝ_1)
= [Σ_{ī,j̄} b_ij^{īj̄} p(x | ī, j̄) p(ī, j̄) W_{īj̄}] x / [Σ_{ī,j̄} b_ij^{īj̄} p(x | ī, j̄) p(ī, j̄)]
≜ G_ij x. (1)
73 Optimal Solution (cont.): The optimal classifier is obtained by
min_{ij} Σ_{ī,j̄} p(ī, j̄) r_ij^{īj̄}(x, ŝ_1),
where
r_ij^{īj̄}(x, ŝ_1) = b_ij^{īj̄} p(x | ī, j̄) [ x^H (W_{īj̄}² − 2 W_{īj̄} G_ij) x + tr(Σ_2^{(j̄)} W_{īj̄}) ].
74 Parameter Selection: A specific state is defined for signal absence, e.g., q = 0. We may choose b_ij^{īj̄} = b_{īi} b_{j̄j}, with
b_{īi} = b_{1,m} if i = 0, ī ≠ 0 (missed detection); b_{1,f} if i ≠ 0, ī = 0 (false detection); 1 otherwise.
75 Joint Classification and Estimation: Now we consider a cascade of classification and estimation. [Diagram: x = s_1 + s_2 → classifier (η_ij) → estimator → ŝ_{1,ij}.] Under a signal-presence decision we wish to control the residual interference, and under a signal-absence decision we limit the distortion level.
76 Joint Classification and Estimation (cont.): Under a signal-presence decision (η_1j):
ŝ_{1,1j} = argmin_{ŝ_1} p(q_1 = 1 | x) E{ ||s_1 − ŝ||² | q_1 = 1, x } s.t. ε_r²(x) ≤ σ_r²,
where
ε_r²(x) ≜ p(q_1 = 0 | x) E{ ||s_1 − ŝ||² | q_1 = 0, x }.
77 Joint Classification and Estimation (cont.): Under a signal-absence decision (η_0j):
ŝ_{1,0j} = argmin_{ŝ_1} p(q_1 = 0 | x) E{ ||s_1 − ŝ||² | q_1 = 0, x } s.t. ε_d²(x) ≤ σ_d²,
where
ε_d²(x) ≜ p(q_1 = 1 | x) E{ ||s_1 − ŝ||² | q_1 = 1, x }.
78 Joint Classification and Estimation (cont.): The Lagrangian for η_0j is given by
L_d(ŝ_1, μ_d) = p(q_1 = 0 | x) E{ ||s_1 − ŝ||² | q_1 = 0, x } + μ_d (ε_d²(x) − σ_d²),
with the complementary slackness condition μ_d (ε_d²(x) − σ_d²) = 0 for μ_d ≥ 0. Similarly, under η_1j we have L_r(ŝ_1, μ_r) with multiplier μ_r.
79 Joint Classification and Estimation (cont.): Joining the solutions for both cases, we obtain
ŝ_{1,ij} = [Σ_{ī,j̄} μ_{īi} p(ī, j̄ | x) W_{īj̄} x] / [Σ_{ī,j̄} μ_{īi} p(ī, j̄ | x)],
with μ_{īi} = μ_d if i = 0, ī = 1; μ_r if i = 1, ī = 0; 1 otherwise,
which is similar to the solution of the simultaneous classification and estimation with b_{j̄j} = 1, b_{1,m} = μ_d, and b_{1,f} = μ_r.
80 GARCH-based codebook
81 Block diagram
82 Distortion vs. interference reduction
84 Applications of GARCH modeling: Speech Enhancement; Voice Activity Detection; Speech Dereverberation; Blind Source Separation.
85 Signal Estimation - Synthetic Signals: [Figure: SNR improvement (dB) vs. input SNR (dB) for 3-state and 5-state models; curves: theoretical limit, MSTF-GARCH with true parameters, MSTF-GARCH with m = 1, 3, 5 (and larger m for the 5-state model), and constant variance.]
86 Estimated conditional variances at various frequencies: [Figure: magnitude (dB) vs. time (sec) at three frequencies (kHz); solid line: signal's squared absolute value; dash-dotted line: single-state model; dotted line: 3-state model; dashed line: 5-state model.]
87 Variance Estimation (cont.): Estimated squared absolute values at a frequency of 2 kHz, at SNRs of 10 dB and 0 dB. [Figure: solid line: speech signal's squared absolute value; dash-dotted line: 5-state MSTF-GARCH model; dotted line: decision-directed approach.]
88 Speech Enhancement - MS-GARCH Modeling: [Figure: spectrograms of the clean signal; speech corrupted by factory noise at 5 dB SNR (LSD = 6.68 dB, SegSNR = 0.05 dB); speech reconstructed using a 4-state model (LSD = 3.14 dB, SegSNR = 6.76 dB); comparison with the decision-directed approach on reverberated speech at 20 dB SNR (SegSNR and LSD curves, decision-directed vs. MS-GARCH).]
89 Keyboard-Typing Noise, Detection vs. Estimation: [Figure: spectrograms of the clean signal; noisy signal at 0.8 dB SNR (LSD = 7.69 dB, SegSNR = −2.23 dB); OM-LSA (LSD = 6.77 dB, SegSNR = −1.31 dB); detection and estimation (LSD = 2.52 dB, SegSNR = 4.41 dB).]
90 Car Environment with a Siren, Detection vs. Estimation
[Spectrograms: clean signal; noisy signal with 3 dB SNR (LSD = 6.56 dB, SegSNR = -5.95 dB); OM-LSA (LSD = 4.609 dB, SegSNR = -0.81 dB); STSA with a siren-noise detector (LSD = 4.37 dB); simultaneous detection and estimation (LSD = 3.14 dB, SegSNR = 6.50 dB).]
91 Voice Activity Detection Using MS-GARCH Model
[Figure: speech presence probability vs. instantaneous SNR (dB) at SNR = 15 dB, MSTF-GARCH vs. decision-directed.]
Higher dynamic range for the speech presence probabilities. Much lower probabilities for low-energy coefficients.
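Under complex-Gaussian models for speech and noise, the speech presence probability follows from the likelihood ratio of the two hypotheses. A minimal sketch with an illustrative prior q:

```python
import numpy as np

def speech_presence_prob(xi, gamma, q=0.8):
    """Posterior speech presence probability under complex-Gaussian
    models for the speech and noise coefficients:
        p = 1 / (1 + (1-q)/q * (1+xi) * exp(-v)),   v = gamma*xi/(1+xi),
    where xi is the a priori SNR, gamma the a posteriori SNR, and
    q the prior speech presence probability."""
    v = gamma * xi / (1.0 + xi)
    return 1.0 / (1.0 + (1.0 - q) / q * (1.0 + xi) * np.exp(-v))

p_high = speech_presence_prob(xi=10.0, gamma=20.0)  # strong coefficient
p_low = speech_presence_prob(xi=0.05, gamma=0.5)    # weak coefficient
```

With a GARCH-based xi that reacts quickly to onsets, the posterior swings over a wider range than with the heavily smoothed decision-directed xi, which is the behavior summarized on the slide.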
92 Speech Dereverberation
Noisy and reverberant speech:
93 Speech Dereverberation
[Spectrograms: clean signal; corrupted signal (LSD = 4.87 dB, SegSNR = 5.85 dB); DD-based algorithm (LSD = 1.99 dB, SegSNR = 8.35 dB); proposed algorithm (LSD = 1.70 dB, SegSNR = 9.01 dB).]
* Joint work with Habets and Gannot.
94 Single-Channel Blind Source Separation
GARCH modeling with simultaneous classification and estimation:
[Figure: clean and mixed signals (left) and estimated speech and music signals (right), comparing GMM with the proposed method.]
95 Summary
A new formulation of the speech enhancement problem based on a simultaneous detection and estimation approach.
A novel statistical model for speech signals in the STFT domain.
The statistical model and the detection and estimation approach may be applied to speech enhancement in highly nonstationary noise environments, to speech dereverberation, and to single-sensor blind source separation.
Future research:
Improving multichannel algorithms using GARCH modeling and the detection and estimation approach.
A multivariate GARCH model with cross-band correlation.
Optimizing the cost function and the trade-off parameters in the multi-hypothesis case.
96 Thank you!