TRINICON: A Versatile Framework for Multichannel Blind Signal Processing Herbert Buchner, Robert Aichner, Walter Kellermann {buchner,aichner,wk}@lnt.de Telecommunications Laboratory University of Erlangen-Nuremberg
Contents Introduction: blind source separation and dereverberation A generic broadband algorithm Special cases and illustration Incorporation of models for spherically invariant random processes (SIRPs) Comparison of some examples Summary and outlook Page 1
Introduction (1) Scenario: MIMO FIR model (assuming number Q of sources number P of sensors) s 1 s Q h 11 h Q1 h 1P h QP x 1 sensor 1 x P sensor P w 11 w P1 w 1Q w PQ y 1 y Q mixing system H demixing system W Page 2
Introduction (1) Scenario: MIMO FIR model (assuming number Q of sources number P of sensors) s 1 s Q h 11 h Q1 h 1P h QP x 1 sensor 1 x P sensor P w 11 w P1 w 1Q w PQ y 1 y Q mixing system H demixing system W We consider here two general classes of problems: Page 2
Introduction (1) Scenario: MIMO FIR model (assuming number Q of sources number P of sensors) s 1 s Q h 11 h Q1 h 1P h QP x 1 sensor 1 x P sensor P w 11 w P1 w 1Q w PQ y 1 y Q mixing system H demixing system W We consider here two general classes of problems: Blind source separation (BSS) for convolutive mixtures Separate signals by forcing the outputs to be mutually independent (output signals may be still filtered and channel-wise permuted) Page 2
Introduction (1) Scenario: MIMO FIR model (assuming number Q of sources number P of sensors) s 1 s Q h 11 h Q1 h 1P h QP x 1 sensor 1 x P sensor P w 11 w P1 w 1Q w PQ y 1 y Q mixing system H demixing system W We consider here two general classes of problems: Blind source separation (BSS) for convolutive mixtures Separate signals by forcing the outputs to be mutually independent (output signals may be still filtered and channel-wise permuted) Multichannel blind deconvolution dereverberation In addition to BSS: recover original source signals up to scaling and permutation Page 2
Introduction (2) How to estimate the P Q L MIMO coefficients w pq (κ), κ = 0,..., L 1? Page 3
Introduction (2) How to estimate the P Q L MIMO coefficients w pq (κ), κ = 0,..., L 1? Exploitation of Nonwhiteness Nonstationarity Nongaussianity Page 3
Introduction (2) How to estimate the P Q L MIMO coefficients w pq (κ), κ = 0,..., L 1? Exploitation of by using Nonwhiteness Nonstationarity Nongaussianity Second-order statistics (SOS) simultaneous output decorrelation for multiple time-lags SOS simultaneous output decorrelation for multiple time-intervals Higher-order statistics (HOS) Page 3
Introduction (2) How to estimate the P Q L MIMO coefficients w pq (κ), κ = 0,..., L 1? Exploitation of by using Nonwhiteness Nonstationarity Nongaussianity Second-order statistics (SOS) simultaneous output decorrelation for multiple time-lags SOS simultaneous output decorrelation for multiple time-intervals Higher-order statistics (HOS) In practice: Combined use of properties may lead to improved performance and more general applicability. Page 3
Introduction (2) How to estimate the P Q L MIMO coefficients w pq (κ), κ = 0,..., L 1? Exploitation of by using Nonwhiteness Nonstationarity Nongaussianity Second-order statistics (SOS) simultaneous output decorrelation for multiple time-lags SOS simultaneous output decorrelation for multiple time-intervals Higher-order statistics (HOS) In practice: Combined use of properties may lead to improved performance and more general applicability. Here: First rigorous derivation of a generic broadband adaptation algorithm - exploiting all properties and - avoiding problems of narrowband approaches (permutations, circularity, ) TRINICON ( Triple-N ICA for Convolutive Mixtures ) Page 3
Optimization Criterion Motivation (1) Nongaussianity: higher-order statistics for Independent Component Analysis (ICA) Page 4
Optimization Criterion Motivation (1) Nongaussianity: higher-order statistics for Independent Component Analysis (ICA) Current ICA approaches for BSS of instantaneous mixtures (or for independent application on individual frequency bins, i.e., narrowband approaches) Maximum likelihood (ML) Minimum mutual information (MMI) Maximum Entropy (ME) / Infomax It can be shown: MMI is the most general approach of these classes. Mutual information [Shannon]: I(Y 1, Y 2 ) = p(y 1, y 2 )ld ( ) p(y1, y 2 ) dy 1 dy 2 p(y 1 )p(y 2 ) Page 4
Nongaussianity (cont d): Optimization Criterion Motivation (2) Page 5
Nongaussianity (cont d): BSS for convolutive mixtures: Optimization Criterion Motivation (2) s 1 s P h 11 h P1 h 1P h PP x 1 sensor 1 x P sensor P w 11 w P1 w 1P w PP y 1 y P mixing system H demixing system W Minimize mutual information between the output channels I(Y 1, Y 2 ) = E{ld(p(y 1, y 2 )) ld(p(y 1 ) p(y 2 ))} Page 5
Optimization Criterion Motivation (2) Nongaussianity (cont d): BSS for convolutive mixtures: Multichannel Blind Deconvolution s 1 s P h 11 h P1 h 1P h PP x 1 sensor 1 x P sensor P w 11 w P1 w 1P w PP y 1 y P s 1 s P h 11 h P1 h 1P h PP x 1 sensor 1 x P sensor P w 11 w P1 w 1P w PP y 1 y P mixing system H demixing system W mixing system H demixing system W Minimize mutual information between the output channels I(Y 1, Y 2 ) = E{ld(p(y 1, y 2 )) ld(p(y 1 ) p(y 2 ))} Most of the current approaches: make outputs spatially and temporally independent Problem for speech and audio: i.i.d. assumption whitening Page 5
Optimization Criterion Motivation (2) Nongaussianity (cont d): BSS for convolutive mixtures: Multichannel Blind Deconvolution s 1 s P h 11 h P1 h 1P h PP x 1 sensor 1 x P sensor P w 11 w P1 w 1P w PP y 1 y P s 1 s P h 11 h P1 h 1P h PP x 1 sensor 1 x P sensor P w 11 w P1 w 1P w PP y 1 y P mixing system H demixing system W mixing system H demixing system W Minimize mutual information between the output channels I(Y 1, Y 2 ) = E{ld(p(y 1, y 2 )) ld(p(y 1 ) p(y 2 ))} Most of the current approaches: make outputs spatially and temporally independent Problem for speech and audio: i.i.d. assumption whitening Obvious generalization: factorization p(y 1 ) p(y 2 ) arbitrary target pdf Minimize Kullback-Leibler distance between a certain source model and output Page 5
Optimization Criterion Nongaussianity: Kullback-Leibler distance between PDFs of source model and output. Page 6
Optimization Criterion Nongaussianity: Kullback-Leibler distance between PDFs of source model and output. Nonstationarity: Ensemble average is replaced by averages over multiple blocks of length N. Page 6
Optimization Criterion Nongaussianity: Kullback-Leibler distance between PDFs of source model and output. Nonstationarity: Ensemble average is replaced by averages over multiple blocks of length N. Nonwhiteness: We consider multivariate PDFs, i.e., densities including D time-lags Page 6
Optimization Criterion Nongaussianity: Kullback-Leibler distance between PDFs of source model and output. Nonstationarity: Ensemble average is replaced by averages over multiple blocks of length N. Nonwhiteness: We consider multivariate PDFs, i.e., densities including D time-lags General TRINICON cost function J (m) = β(i, m) 1 N i=0 N 1 j=0 {log(ˆp s,p D (y(i, j))) log(ˆp y,p D (y(i, j)))} β: window function for online, offline, and block-online algorithms ˆp s,p D ( ), ˆp y,p D ( ): assumed or estimated multivariate source model PDF and output PDF, respectively. y(i, j): concatenated length-d output blocks of the P channels (i-th block, shifted by j samples), y q (i, j) = [y q (il+j),..., y q (il D + 1+j)] Page 6
FIR model with multiple time lags Matrix formulation Nonwhiteness: simultaneously optimize MIMO coefficients for D time lags Formulation of the linear convolution in matrix notation: P y q (m, j) = [y q (ml+j),..., y q (ml D + 1+j)] = x p (m, j)w pq (m) p=1 Page 7
FIR model with multiple time lags Matrix formulation Nonwhiteness: simultaneously optimize MIMO coefficients for D time lags Formulation of the linear convolution in matrix notation: P y q (m, j) = [y q (ml+j),..., y q (ml D + 1+j)] = x p (m, j)w pq (m) p=1 with the 2L D Sylvester matrices w pq,0 0 0 w pq,1 w pq,0.. w pq,1 0 w pq,l 1. w pq,0 W pq (m) = 0 w pq,l 1 w pq,1.. 0 0 w pq,l 1 0 0 0... 0 0 0. Page 7
General Coefficient Update Natural gradient WW H W J of the cost function as generic coefficient update: W(m) = 2 N β(i, m) i=0 N 1 j=0 W(i)y H (i, j) {Φ s,p D (y(i, j)) Φ y,p D (y(i, j))} Page 8
General Coefficient Update Natural gradient WW H W J of the cost function as generic coefficient update: W(m) = 2 N β(i, m) i=0 N 1 j=0 To select a practical algorithm: desired score function (hypothesized) W(i)y H (i, j) {Φ s,p D (y(i, j)) Φ y,p D (y(i, j))} Φ s,p D (y(i, j)) = log ˆp s,p D(y(i, j)) y(i, j) where the model (i.e., desired) PDF is factorized among the channels ( BSS) and/or factorized among a certain number of time lags ( full or partial MCBD) actual score function (observed) Φ y,p D (y(i, j)) = log ˆp y,p D(y(i, j)). y(i, j) Page 8
General Coefficient Update Natural gradient WW H W J of the cost function as generic coefficient update: W(m) = 2 N β(i, m) i=0 N 1 j=0 To select a practical algorithm: desired score function (hypothesized) W(i)y H (i, j) {Φ s,p D (y(i, j)) Φ y,p D (y(i, j))} Φ s,p D (y(i, j)) = log ˆp s,p D(y(i, j)) y(i, j) where the model (i.e., desired) PDF is factorized among the channels ( BSS) and/or factorized among a certain number of time lags ( full or partial MCBD) actual score function (observed) Φ y,p D (y(i, j)) = log ˆp y,p D(y(i, j)). y(i, j) How to obtain these sequences of high-dimensional multivariate PDFs? Page 8
Special Cases and Illustration SOS case SOS case exploiting only non-stationarity and non-whiteness Page 9
Special Cases and Illustration SOS case SOS case exploiting only non-stationarity and non-whiteness by assuming multivariate Gaussian PDFs in both score functions: 1 ˆp D (y p (i, j)) = (2π) D det(r yp y p (i)) e 1 2 y p(i,j)r 1 ypyp (i)yh p (i,j) W(m) = 2 β(i, m)w i=0 { ˆRyy ˆR ss } ˆR 1 ss Page 9
Special Cases and Illustration SOS case SOS case exploiting only non-stationarity and non-whiteness by assuming multivariate Gaussian PDFs in both score functions: 1 ˆp D (y p (i, j)) = (2π) D det(r yp y p (i)) e 1 2 y p(i,j)r 1 ypyp (i)yh p (i,j) W(m) = 2 β(i, m)w i=0 { ˆRyy ˆR ss } ˆR 1 ss Generic SOS-based BSS: ˆR ss = bdiag D ˆRyy [ ] Ry1 y 1 R y1 y 2 : PSfrag R y2 y 1 replacements R y2 y 2 D D Page 9
Special Cases and Illustration SOS case SOS case exploiting only non-stationarity and non-whiteness by assuming multivariate Gaussian PDFs in both score functions: 1 ˆp D (y p (i, j)) = (2π) D det(r yp y p (i)) e 1 2 y p(i,j)r 1 ypyp (i)yh p (i,j) W(m) = 2 β(i, m)w i=0 { ˆRyy ˆR ss } ˆR 1 ss Generic SOS-based BSS: ˆR ss = bdiag D ˆRyy [ ] Ry1 y 1 R y1 y 2 : PSfrag R y2 y 1 replacements R y2 y 2 Inherent RLS-like normalization Robust adaptation D D Page 9
Special Cases and Illustration SOS case (2) W(m) = 2 β(i, m)w i=0 { ˆRyy ˆR ss } ˆR 1 ss Desired correlation matrices ˆR ss for BSS and dereverberation: (a) BSS ˆR ss = bdiag D ˆRyy spectral content untouched no dereverberation Page 10
Special Cases and Illustration SOS case (2) W(m) = 2 β(i, m)w i=0 { ˆRyy ˆR ss } ˆR 1 ss Desired correlation matrices ˆR ss for BSS and dereverberation: (a) BSS (b) MCBD ˆR ss = bdiag D ˆRyy spectral content untouched no dereverberation ˆR ss = diag D ˆRyy output de-cross-correlated and de-auto-correlated undesirable for speech and audio Page 10
Special Cases and Illustration SOS case (2) W(m) = 2 β(i, m)w i=0 { ˆRyy ˆR ss } ˆR 1 ss Desired correlation matrices ˆR ss for BSS and dereverberation: (a) BSS (b) MCBD (c) MCBPD ˆR ss = bdiag D ˆRyy spectral content untouched no dereverberation ˆR ss = diag D ˆRyy remove only the influence output de-cross-correlated of the room and de-auto-correlated vocal tract untouched undesirable for speech room and audio PSfrag replacements vocal tract Page 10
Models for Spherically Invariant Processes (SIRPs) in TRINICON SIRPs are described by multivariate PDFs of the form (with suitable function f D ): ˆp D (y p ) = 1 ( π D det( ˆR f D y ˆR 1 p pp yp H pp ) ) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Several attractive properties: Good model for speech signals Closed-form representation 0 1 0.5 y(n 1) 0 0.5 1 1 0.5 0 y(n) 0.5 1 Page 11
Models for Spherically Invariant Processes (SIRPs) in TRINICON SIRPs are described by multivariate PDFs of the form (with suitable function f D ): ˆp D (y p ) = 1 ( π D det( ˆR f D y ˆR 1 p pp yp H pp ) ) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Several attractive properties: Good model for speech signals Closed-form representation 0 1 0.5 y(n 1) 0 0.5 1 1 0.5 0 y(n) 0.5 1 Reduced number of parameters to estimate for BSS or Dereverberation Page 11
Models for Spherically Invariant Processes (SIRPs) in TRINICON SIRPs are described by multivariate PDFs of the form (with suitable function f D ): ˆp D (y p ) = 1 ( π D det( ˆR f D y ˆR 1 p pp yp H pp ) ) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Several attractive properties: Good model for speech signals Closed-form representation 0 1 0.5 y(n 1) 0 0.5 1 1 0.5 0 y(n) 0.5 1 Reduced number of parameters to estimate for BSS or Dereverberation Multivariate PDFs can be derived analytically from corresponding univariate PDFs Page 11
Models for Spherically Invariant Processes (SIRPs) in TRINICON SIRPs are described by multivariate PDFs of the form (with suitable function f D ): ˆp D (y p ) = 1 ( π D det( ˆR f D y ˆR 1 p pp yp H pp ) ) 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Several attractive properties: Good model for speech signals Closed-form representation 0 1 0.5 y(n 1) 0 0.5 1 1 0.5 0 y(n) 0.5 1 Reduced number of parameters to estimate for BSS or Dereverberation Multivariate PDFs can be derived analytically from corresponding univariate PDFs Incorporation into TRINICON leads to inherent stepsize normalization of the update equation Page 11
Incorporation of SIRPs into Generic BSS BSS: coefficient update for 2 sources with SIRPs reads W(m) = m [ β(i, m)w(i) i=0 0 Ry1 y 2 R 1 y 2 y 2 R y2 y 1 R 1 y 1 y 1 0 ] Page 12
Incorporation of SIRPs into Generic BSS BSS: coefficient update for 2 sources with SIRPs reads W(m) = m [ β(i, m)w(i) i=0 0 Ry1 y 2 R 1 y 2 y 2 R y2 y 1 R 1 y 1 y 1 0 ] RLS-like normalization (by lagged auto-correlation matrices R yq y q ) Matrices R yp y q PDFs: R yp y q (i) = 1 N of cross-relations include nonlinear terms derived from multivariate N 1 j=0 φ D (s) = f D (s) f D (s) ( ) φ D y q (i, j)r 1 y q y q (i)yq H (i, j) yp H (i, j)y q (i, j), HOS-SIRP realizations: φ D (s) may be derived analytically from the univariate PDF Page 12
Examples BSS performance: Conditions: Reverberation time T 60 200ms, 2 2 case, speech signals, microphone spacing 16cm, sampling rate 16kHz, estimation of R yy by correlation method SOS Algorithms (L = 1024) 22 20 Generic algorithm Signal to Interference Ratio in db 18 16 14 12 10 8 6 4 2 Broadband alg. (approx. normalization) Narrowband algorithm after Fancourt and Parra 0 0 50 100 150 200 Number of iterations Page 13
Examples BSS performance: Conditions: Reverberation time T 60 200ms, 2 2 case, speech signals, microphone spacing 16cm, sampling rate 16kHz, estimation of R yy by correlation method Signal to Interference Ratio in db SOS Algorithms (L = 1024) 22 20 18 16 14 12 10 8 6 4 2 Generic algorithm 0 0 50 100 150 200 Number of iterations Broadband alg. (approx. normalization) Narrowband algorithm after Fancourt and Parra Signal to Interference Ratio in db Generic SOS vs. TRINICON 16 14 12 10 8 TRINICON SIRP 6 4 Generic SOS 2 0 10 20 30 40 50 60 70 80 Number of iterations Page 13
Examples (2) SOS-dereverberation performance: Conditions: Reverberation time T 60 200ms, 2 2 case, speech signals, microphone spacing 16cm, unmixing filter length 1024 taps, sampling rate 16kHz, estimation of R yy by correlation method SIR improvement 22 20 MCBPD (SOS) Signal to Interference Ratio in db 18 16 14 12 10 8 6 4 2 Narrowband BSS after Fancourt and Parra 0 0 50 100 150 200 Number of iterations Page 14
Examples (2) SOS-dereverberation performance: Conditions: Reverberation time T 60 200ms, 2 2 case, speech signals, microphone spacing 16cm, unmixing filter length 1024 taps, sampling rate 16kHz, estimation of R yy by correlation method Signal to Interference Ratio in db SIR improvement 22 20 18 16 14 12 10 8 6 4 2 MCBPD (SOS) Narrowband BSS after Fancourt and Parra 0 0 50 100 150 200 Number of iterations Signal to Reverberation Ratio in db SRR improvement 8 7 6 MCBPD (SOS) 5 4 3 pure BSS (SOS) 2 1 Delay and sum beamformer 0 0 50 100 150 Number of iterations Page 14
Summary and Outlook Versatile framework for blind MIMO signal processing exploiting nongaussianity, nonwhiteness, and nonstationarity in a rigorous way Various novel algorithms can be derived from it (time-domain and frequency-domain) Examples show: separation gain more than 20dB, dereverberation gain more than 7dB This broadband approach has led to efficient realtime BSS on a laptop PC More sophisticated source models (e.g., HMM-based) may be incorporated in the framework Page 15