AR PROCESSES AND SOURCES CAN BE RECONSTRUCTED FROM DEGENERATE MIXTURES

Radu Balan, Alexander Jourjine, Justinian Rosca
Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540
{radu,jourjine,rosca}@scr.siemens.com

ABSTRACT

When mixing of sources is degenerate the known blind source separation methods fail, since in general the degenerate BSS is an ill-posed problem. Here we report that if signal transmission is modeled by AR($p$) processes one can reconstruct the processes and estimate the sources from their degenerate mixture using only second order statistics. We also prove that the approach fails for a general ARMA($p$,$q$) model. The theoretical results are verified in the case of degenerate mixing of two voices and on synthetic data.

1. INTRODUCTION AND STATEMENT OF THE PROBLEM

Current Blind Source Separation (BSS) literature addresses the case when the number of sources is equal to the number of microphones [JH9, Com9, BS9, Ama9, Tor9, Car97, PP9]. Little work has been done to address the degenerate case when this constraint is not satisfied. Particularly hard is the case, of interest for many BSS applications, when there are more sources than microphones. This report demonstrates that separation in such a degenerate case is feasible.

We propose a source separation architecture where sources are modeled as AR processes. We solve a special case of the singular multivariate AR identification problem, namely when the measurement is scalar but the noise term is a 2-dimensional vector. Our current approach is based on the second order statistics only. Methods based on second order statistics have been used for regular multivariate AR identification and signal separation (see for instance [S.M88, S.N9, WFO9]). In contrast to these studies, our work concerns the singular case for both the multivariate AR identification and signal estimation in the BSS problem.
© Siemens Corporate Research 1998

We apply this approach to the degenerate case of the BSS problem, specifically when a scalar mixture of independent source signals is recorded with one microphone. The theory for singular multivariate AR process identification that is developed here can be extended to higher dimensions (i.e. more sources than two voices).

Let us consider two independent univariate AR($p$) processes of order $p$ and the measurement given by the sum of the two outputs (see Figure 1). The time-domain evolution equations are the following:

$$
\begin{aligned}
s_1(n) &= -\sum_{k=1}^{p} a_k\, s_1(n-k) + G_1\,\nu_1(n) \\
s_2(n) &= -\sum_{k=1}^{p} b_k\, s_2(n-k) + G_2\,\nu_2(n) \\
x(n) &= s_1(n) + s_2(n)
\end{aligned}
\qquad (1)
$$

where $\nu_1$ and $\nu_2$ are two independent unit variance white noises, $a_1,\dots,a_p$ and $b_1,\dots,b_p$ are the parameters of the first and second AR process respectively, and $G_1$ and $G_2$ are real constants. The problem is to identify the $2p+2$ real parameters $a_1,\dots,a_p$, $b_1,\dots,b_p$, $G_1$ and $G_2$ based on the finite record of measurements $\{x(n)\}$, with $n$ running up to $N$, of a realisation of (1). Our solution is based on the second order statistics of the measurements, practically given by the sampled autocovariance coefficients

$$\hat r(l) = \frac{1}{N}\sum_{k=l}^{N} x(k)\,x(k-l).$$

The organization of the report is the following. Section 2 presents the main theoretical results: first we show how the spectral density of $x$ can be decomposed; second we derive a modified ARMA estimator given by a polynomial system that involves second order statistics of the measurements. Section 3 presents a gradient algorithm to solve these equations, together with an algorithm that addresses the estimation problem. Section 4 contains numerical experiments showing a successful application of the theory and is followed by conclusions.
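The setting of the problem can be reproduced numerically. The following Python sketch simulates two independent AR processes driven by unit variance white noise, sums them into a single measurement, and computes the sampled autocovariance coefficients. It is only an illustration: the function names, the AR(1) coefficients, the seed and the burn-in length are assumptions made for this example, and indices are zero-based.

```python
import random


def simulate_ar(coeffs, gain, n_samples, rng, burn_in=500):
    """Simulate s(n) = -sum_k coeffs[k-1] * s(n-k) + gain * nu(n).

    nu is unit-variance Gaussian white noise. The first burn_in samples are
    discarded so that the zero initial conditions are forgotten.
    """
    p = len(coeffs)
    s = [0.0] * p
    for _ in range(burn_in + n_samples):
        past = sum(c * s[-k] for k, c in enumerate(coeffs, start=1))
        s.append(-past + gain * rng.gauss(0.0, 1.0))
    return s[len(s) - n_samples:]


def sampled_autocovariance(x, max_lag):
    """Biased estimate r_hat(l) = (1/N) sum_{k=l}^{N-1} x[k] x[k-l], l = 0..max_lag."""
    n = len(x)
    return [sum(x[k] * x[k - l] for k in range(l, n)) / n
            for l in range(max_lag + 1)]


rng = random.Random(0)            # fixed seed, for repeatability only
a = [-0.5]                        # hypothetical AR(1) source, pole at 0.5
b = [0.6]                         # hypothetical AR(1) source, pole at -0.6
s1 = simulate_ar(a, 1.0, 4000, rng)
s2 = simulate_ar(b, 1.0, 4000, rng)
x = [u + v for u, v in zip(s1, s2)]   # the scalar (one-microphone) mixture
r_hat = sampled_autocovariance(x, max_lag=4)
```

Only `x`, and hence `r_hat`, would be available to an identification procedure; `s1` and `s2` are kept here so that estimates can later be compared with the true sources.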
[Figure 1: The Singular Multivariate AR Model. The outputs $s_1$ and $s_2$ of two AR($p$) blocks are summed to give the measurement $x$.]

2. THE MAIN RESULTS

Since the two signals $s_1$ and $s_2$ are independent, the process (1) has the spectral power density given by the following formula:

$$R_x(z) = \frac{G_1^2}{P_1(z)\,P_1(z^{-1})} + \frac{G_2^2}{P_2(z)\,P_2(z^{-1})} \qquad (2)$$

where $P_1(z) = 1 + \sum_{k=1}^{p} a_k z^k$ and $P_2(z) = 1 + \sum_{k=1}^{p} b_k z^k$. Now it is easy to prove the following decomposition (factorization) result:

THEOREM 1. Suppose we are given the sum $x$ of two independent and stable AR($p$) process outputs $s_1$ and $s_2$. Furthermore suppose the processes have no common poles. Then the second order statistics is generically enough to uniquely identify the two AR processes.

Remarks

1. By generical we mean that the set of "bad" AR processes forms an algebraic manifold of positive codimension in the $2p+2$ dimensional space of parameters. Actually we can say a lot more about this algebraic manifold. These results will appear shortly in a full-length report.

2. We point out that the uniqueness of the decomposition (2) holds only for AR processes. If we replace them by ARMA processes, the result no longer holds true, as can be easily seen.

3. Equation (2) shows that $x(n)$ is second-order statistics equivalent with an ARMA($2p$,$p$) process whose transfer function $Q(z)/P(z)$ is related to our AR($p$) processes by:

$$
\begin{aligned}
P(z) &= P_1(z)\,P_2(z) \\
Q(z)\,Q(z^{-1}) &= G_1^2\,P_2(z)\,P_2(z^{-1}) + G_2^2\,P_1(z)\,P_1(z^{-1})
\end{aligned}
$$

This would suggest the following identification algorithm:

ALGORITHM 1 (ARMA($2p$,$p$) Identification)

1. Identify the process $\{x(n)\}$ as an ARMA($2p$,$p$) process $Q(z)/P(z)$;

2. For each partition of the roots of $P$ into two subsets of $p$ zeros each, construct the polynomials $P_1$ and $P_2$ that have these roots and compute $G_1$ and $G_2$ that best fit the second equation above (we explain what we mean by best fit in the next section, after Algorithm 2);

3. Choose the partition that gives the smallest error, and that will be an estimate of $G_1, P_1$, $G_2, P_2$.

We tried this algorithm but it does not give acceptable estimates, particularly for large $p$. A second approach to this problem is to look for a Modified ARMA estimator (MARMA estimator), adapted to our special form.
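The polynomial relations of Remark 3 are easy to evaluate numerically. The Python sketch below forms the coefficients of $P = P_1 P_2$ and of $G_1^2 P_2(z)P_2(z^{-1}) + G_2^2 P_1(z)P_1(z^{-1})$ from given AR parameters. The helper names are hypothetical, the coefficient convention $P_1(z) = 1 + \sum_k a_k z^k$ is an assumption of this example, and the spectral factorization that would recover $Q$ itself is not attempted.

```python
def convolve(u, v):
    """Coefficients of the product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def autocorrelate(c):
    """Coefficients of C(z) C(1/z) for the powers z^-d .. z^d, with d = deg C."""
    return convolve(c, c[::-1])


def arma_equivalent(a, b, g1, g2):
    """Return (P, QQ) for two AR processes of equal order.

    P  : coefficients of P1(z) P2(z), powers z^0 .. z^(2p)
    QQ : coefficients of G1^2 P2(z)P2(1/z) + G2^2 P1(z)P1(1/z), powers z^-p .. z^p
    """
    if len(a) != len(b):
        raise ValueError("both AR processes must have the same order p")
    p1 = [1.0] + list(a)
    p2 = [1.0] + list(b)
    denom = convolve(p1, p2)
    qq = [g1 ** 2 * u + g2 ** 2 * v
          for u, v in zip(autocorrelate(p2), autocorrelate(p1))]
    return denom, qq


# Hypothetical AR(1) example: a_1 = -0.5, b_1 = 0.6, G_1 = G_2 = 1.
P, QQ = arma_equivalent([-0.5], [0.6], 1.0, 1.0)
```

The list `QQ` is symmetric about its middle entry, as expected of the coefficients of $Q(z)Q(z^{-1})$, and it has $2p+1$ entries, which is why the equivalent process has a moving average part of order $p$.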
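To do this we need to obtain the time-domain evolution equation of the measurement. In the $z$ transform domain we have:

$$P_1(z^{-1})\,P_2(z^{-1})\,x(z) = G_1\,P_2(z^{-1})\,\nu_1(z) + G_2\,P_1(z^{-1})\,\nu_2(z)$$

which turns into the following equation:

$$x(n) + \sum_{k=1}^{2p} (a*b)_k\, x(n-k) = G_1\,\nu_1(n) + G_2\,\nu_2(n) + \sum_{k=1}^{p} \bigl(G_1 b_k\,\nu_1(n-k) + G_2 a_k\,\nu_2(n-k)\bigr)$$

where $(a*b)_k = \sum_{l=0}^{k} a_l\, b_{k-l}$ with the convention $a_0 = b_0 = 1$ (and $a_k = b_k = 0$ for $k > p$).

To obtain the second order statistics evolution, we correlate $x(n)$ with $x(n-l)$, and $s_1(n)$ with $\nu_1(n-l)$, respectively $s_2(n)$ with $\nu_2(n-l)$, in (1). Let us denote $r(l) = E[x(n)x(n-l)]$, $\rho_1(l) = E[s_1(n)\nu_1(n-l)]$, $\rho_2(l) = E[s_2(n)\nu_2(n-l)]$, where $E[X]$ is the expected value of the random variable $X$. Then we obtain the following system of polynomial equations:

$$
\begin{aligned}
r(l) + \sum_{k=1}^{2p} (a*b)_k\, r(l-k) &= G_1\,\rho_1(-l) + G_2\,\rho_2(-l) + G_1 \sum_{k=1}^{p} b_k\,\rho_1(k-l) + G_2 \sum_{k=1}^{p} a_k\,\rho_2(k-l) \\
\rho_1(l) &= -\sum_{k=1}^{p} a_k\,\rho_1(l-k) + G_1\,\delta_{l,0} \\
\rho_2(l) &= -\sum_{k=1}^{p} b_k\,\rho_2(l-k) + G_2\,\delta_{l,0}
\end{aligned}
\qquad (3)
$$

where $\delta$ is the Dirac impulse. Now note two things. First, we do not know the theoretical autocovariance coefficients, so we have to replace $r(l)$ by the sampled values $\hat r(l)$. Second, note the causality relations between $s_1$, $s_2$ and the noise inputs. This causality implies $\rho_1(l) = \rho_2(l) = 0$ for every $l < 0$, so that $G_1\rho_1(-l) + G_2\rho_2(-l) = (G_1^2 + G_2^2)\,\delta_{l,0}$ and only the terms with $k \ge l$ survive in the two sums on the right. Therefore the system (3) becomes:

$$
\begin{aligned}
\hat r(l) + \sum_{k=1}^{2p} (a*b)_k\, \hat r(l-k) - (G_1^2 + G_2^2)\,\delta_{l,0} - \sum_{k=\max(l,1)}^{p} \bigl(G_1 b_k\,\rho_1(k-l) + G_2 a_k\,\rho_2(k-l)\bigr) &= 0 \\
\rho_1(l) = -\sum_{k=1}^{\min(l,p)} a_k\,\rho_1(l-k), \qquad \rho_1(0) &= G_1 \\
\rho_2(l) = -\sum_{k=1}^{\min(l,p)} b_k\,\rho_2(l-k), \qquad \rho_2(0) &= G_2
\end{aligned}
\qquad (4)
$$

We solve this nonlinear system in $G_1, G_2, a, b$ by looking for the least square solution that minimizes a quadratic criterion of the form:

$$J = \sum_{l=0}^{L} \alpha_l \,\Bigl|\, \hat r(l) + \sum_{k=1}^{2p} (a*b)_k\, \hat r(l-k) - (G_1^2 + G_2^2)\,\delta_{l,0} - \sum_{k=\max(l,1)}^{p} \bigl(G_1 b_k\,\rho_1(k-l) + G_2 a_k\,\rho_2(k-l)\bigr) \Bigr|^2 \qquad (5)$$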
where $L$ is sufficiently large and $(\alpha_l)_l$ are some positive weights. Thus the Least Square estimator (LS estimator) is given by solving the following optimization problem:

$$(\hat G_1, \hat G_2, \hat a, \hat b) = \arg\min J(\hat r) \qquad (6)$$

3. IDENTIFICATION AND SEPARATION ALGORITHMS

In this section we present an algorithm to solve the identification issue and then we discuss the degenerate case of the BSS problem. Here we report only one algorithm we tried so far. A longer discussion will follow in an extended version of this report.

3.1. The Least Square Estimator

The Least Square estimator presented before is based on a gradient descent scheme. One issue related to this algorithm is how to choose an initial point $(G_1, G_2, a, b)$. We present here an algorithm for obtaining this initialization. The idea is the following: we first identify the time series $\{x(n)\}$, with $n$ running up to $N$, as a "long" AR process, say AR($L_0$), and then we approximate its spectral power density by a decomposition of the type (2).

ALGORITHM 2 (Initialization of $G_1, G_2, a, b$)

1. Choose $L_0$ sufficiently large and find an AR($L_0$) estimator of the time series $\{x(n)\}$, say $\tilde G / \tilde P(z)$.

2. For each partition of the $L_0$ roots of $\tilde P$ into two groups of $p$ zeros, construct $P_1(z)$ and $P_2(z)$, the polynomials corresponding to these zeros. Let $S(z)$ be the remainder polynomial in $\tilde P$, so that $\tilde P = P_1 P_2 S$. Find $G_1$ and $G_2$ that best approximate the equation:

$$\tilde G^2 = G_1^2\,P_2(z)S(z)P_2(z^{-1})S(z^{-1}) + G_2^2\,P_1(z)S(z)P_1(z^{-1})S(z^{-1}) \qquad (7)$$

(we indicate below how to obtain $G_1$ and $G_2$).

3. Choose the best partition with respect to the approximation error and obtain the corresponding estimates for $G_1$, $G_2$, $a$, $b$.

To choose $G_1$ and $G_2$ in (7) we have tried both a Padé approximation [K.K87] and a least 2-norm solution. Both seem to work equally well. We describe here the 2-norm approximation.
Let us denote

$$P_2(z)S(z)P_2(z^{-1})S(z^{-1}) = \sum_{l=-L_0+p}^{L_0-p} f^{(1)}_l z^l, \qquad P_1(z)S(z)P_1(z^{-1})S(z^{-1}) = \sum_{l=-L_0+p}^{L_0-p} f^{(2)}_l z^l.$$

Then the 2-norm error computed on the unit circle in the complex plane is given by:

$$\mathrm{Error} = \sum_{l=-L_0+p}^{L_0-p} \bigl| G_1^2 f^{(1)}_l + G_2^2 f^{(2)}_l - \tilde G^2\,\delta_{l,0} \bigr|^2$$

Then we easily obtain a linear system in $G_1^2$ and $G_2^2$ by setting to zero the derivatives of the Error with respect to $G_1^2$, respectively $G_2^2$.

3.2. The Estimation Problem

Recall the problem is the following: we have two voice signals recorded by the same microphone and we want, based on this mixed signal, to estimate the original two signals. The solution we propose is represented in Figure 2 and consists of two stages: an identification part and a linear estimation part. For identification, we assume the two voices are approximated respectively by AR($p$) processes and our task is to identify the processes' parameters. For the linear estimation we tried both the Wiener filtering [Poo9] and the causal part of the Wiener filter. It seems the causal part gives better results in terms of the sound quality. The Wiener filter formulae are given by:

$$
\begin{aligned}
F_1(z) &= \frac{G_1^2\,P_2(z)P_2(z^{-1})}{G_1^2\,P_2(z)P_2(z^{-1}) + G_2^2\,P_1(z)P_1(z^{-1})} \\
F_2(z) &= \frac{G_2^2\,P_1(z)P_1(z^{-1})}{G_1^2\,P_2(z)P_2(z^{-1}) + G_2^2\,P_1(z)P_1(z^{-1})}
\end{aligned}
\qquad (8)
$$

and the causal parts are then:

$$F_1^c(z^{-1}) = \frac{G_1\,P_2(z^{-1})}{T(z^{-1})}, \qquad F_2^c(z^{-1}) = \frac{G_2\,P_1(z^{-1})}{T(z^{-1})} \qquad (9)$$

where $T(z)$ is the spectral factor in the factorization $T(z)T(z^{-1}) = G_1^2\,P_2(z)P_2(z^{-1}) + G_2^2\,P_1(z)P_1(z^{-1})$. The adaptation algorithm is the following:

ALGORITHM 3 (On-line Adaptation)

1. Initialize the parameter estimation on the first $N$ samples using the previous algorithm.

2. Apply a couple of gradient descent steps to "polish" the approximation.

3. At each new sample, update the sampled autocovariance coefficients by using a rectangular window (or an exponential window) and apply a gradient step to adapt the estimation of $\hat G_1, \hat G_2, \hat a, \hat b$. Estimate $\hat s_1, \hat s_2$ using the updated (causal) Wiener filters.

Experimentally, the gradient correction at Step 3 seems not to track well the actual values of the parameters (obtained by an AR($p$) estimator on the actual signals).
Nonetheless, the more computationally expensive algorithm that simply applies the estimation on a sliding nonoverlapping window gives better results. Future work is needed to obtain a better on-line algorithm.

[Figure 2: The Adaptive Estimation Diagram. The mixture $x = s_1 + s_2$ feeds an identification block that sets the filters $F_1$ and $F_2$, whose outputs are the estimates $\hat s_1$ and $\hat s_2$.]

4. NUMERICAL EXPERIMENTS

We report experiments on both synthetic and voice data. First we describe AR identification experiments of singular multivariate AR processes on synthetic data. Second we describe an application of the theory to the estimation of two voices from one scalar mixture. All experiments were performed in Matlab.

4.1. Experiments on Synthetic Data

We constructed two stable and independent AR($p$) processes. Then we estimated the processes from their sum by filtering the observed signal with Wiener filters defined by parameters estimated by applying gradient descent with the initialization given by Algorithm 2. Here we report only one set of results, those corresponding to $p$ = . We considered $N$ = samples to estimate the autocovariance coefficients at various lags $\hat r(l)$; this corresponded to a ms window of a speech signal sampled at Hz, on which the signal may be considered stationary (see [LJ9]). For the criterion (5) we took $L$ = to avoid the contribution of the fluctuations in $\hat r(l)$ for large $l$. For different values of $L_0$ we obtained different initializations. Surprisingly, the best initialization has been given by the lowest value $L_0$ = = .

In Figure 3 we plot the initial spectral power obtained with Algorithm 2 using the 2-norm minimization, for several values of $L_0$. The first plot gives the Yule-Walker estimation [GH9] of the spectral powers of the two AR($p$) processes. The theoretical spectral power is depicted in Figure 4, top plots, using the actual values of the parameters. In Figure 4 we also present the spectral power densities where the gradient algorithm converged after steps. Different initializations implied different limiting densities, each of them corresponding to a different local minimum of the criterion $J$. For $L_0$ = we show in Figure 5 the convergence of the parameters. Figure 6 plots the decimal logarithm of the criterion. Note how fast $J$ decays during the first steps. The limiting spectral power densities obtained in Figure 4, second row, approximate very well the original spectral densities. The gradient descent steps decreased the criterion to about times less than the initial value. The parameters of the two AR($p$) processes used were the following: $G_1 = G_2$ = , $a_1$ = ., $a_2$ = -., $a_3$ = -., $a_4$ = . and $b_1$ = -., $b_2$ = ., $b_3$ = -., $b_4$ = . The identification algorithm gave a better estimate for the second process, which was the most powerful process.

[Figure 3: Spectral power densities for the Yule-Walker estimations (first row), initial spectral powers for $L_0$ = (second row), $L_0$ = (third row) and $L_0$ = (fourth row). Panel titles: "Estimated from s1", "Estimated from s2", "Initialization".]

From these experiments we conclude that the algorithm we proposed gives a fairly good estimate of the spectral power densities of the two sources. The parameters of the most powerful process are estimated better.
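The quadratic criterion $J$ minimized in these experiments can be checked on a toy case. The Python sketch below evaluates the weighted squared residuals of the causal second order system of Section 2, as we read it, and verifies that they vanish when the exact autocovariance of the mixture and the true parameters are supplied. All function names, the AR(1) test parameters and the truncation length of the impulse responses are assumptions made for this example; it does not perform the gradient descent itself.

```python
def convolve(u, v):
    """Coefficients of the product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def impulse_response(coeffs, n_terms):
    """h(0) = 1, h(n) = -sum_k coeffs[k-1] * h(n-k): unit-gain AR impulse response."""
    h = []
    for n in range(n_terms):
        acc = 1.0 if n == 0 else 0.0
        for k, c in enumerate(coeffs, start=1):
            if k <= n:
                acc -= c * h[n - k]
        h.append(acc)
    return h


def exact_autocovariance(a, b, g1, g2, max_lag, n_terms=400):
    """Autocovariance of x = s1 + s2 from truncated impulse responses (stable AR only)."""
    h1 = impulse_response(a, n_terms)
    h2 = impulse_response(b, n_terms)

    def acov(h, g, lag):
        return g * g * sum(h[n] * h[n + lag] for n in range(n_terms - lag))

    return [acov(h1, g1, l) + acov(h2, g2, l) for l in range(max_lag + 1)]


def criterion(r, a, b, g1, g2, weights):
    """J = sum_l weights[l] * residual(l)^2 for l = 0..len(weights)-1.

    r[m] is the autocovariance at lag m >= 0 (r(-m) = r(m) is used), and it
    must cover lags up to max(len(weights) - 1, 2p).
    """
    p = len(a)
    ab = convolve([1.0] + list(a), [1.0] + list(b))       # (a*b)_k, k = 0..2p
    rho1 = [g1 * h for h in impulse_response(a, p + 1)]   # rho_1(l), l = 0..p
    rho2 = [g2 * h for h in impulse_response(b, p + 1)]   # rho_2(l), l = 0..p
    total = 0.0
    for l, w in enumerate(weights):
        res = sum(ab[k] * r[abs(l - k)] for k in range(2 * p + 1))
        if l == 0:
            res -= g1 ** 2 + g2 ** 2
        for k in range(max(l, 1), p + 1):
            res -= g1 * b[k - 1] * rho1[k - l] + g2 * a[k - 1] * rho2[k - l]
        total += w * res ** 2
    return total


a_true, b_true = [-0.5], [0.6]                 # hypothetical AR(1) pair
r = exact_autocovariance(a_true, b_true, 1.0, 2.0, max_lag=6)
weights = [1.0] * 5                            # lags l = 0..4, equal weights
j_true = criterion(r, a_true, b_true, 1.0, 2.0, weights)
j_off = criterion(r, [-0.3], b_true, 1.0, 2.0, weights)
```

With exact statistics `j_true` is zero up to rounding while the perturbed parameters give a clearly positive `j_off`. With sampled autocovariances the minimum is no longer exactly zero, which is one reason to keep the number of lags moderate, as noted above for large $l$.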
[Figure 4: The theoretical spectral power densities (first row), limiting spectral powers for $L_0$ = (second row), $L_0$ = (third row) and $L_0$ = (fourth row). Panel titles: "Signal", "Identified".]

[Figure 5: The parameters of the two processes. Panels show the system values and the estimates of $G$, $a$ and $b$ at each iteration.]

[Figure 6: The criterion log J at each iteration.]

4.2. Experiments on Voice Data

We performed experiments with voices from the TIMIT database. The two voice signals (called A1 and A2 here) consist of 78 samples at kHz sampling frequency (about seconds of data). We tested how feasible the estimation problem is. We identified the two voices as AR($p$) processes, directly on the actual signals, and then we estimated the two voices from their sum using the filters from equations (8) and (9). In Figure 7 we show the time-series of the original voices (upper graphs), of their sum (the middle plot), and the estimated signals (lower plots). We used $p$ = and $N$ = . The quality of the outputs is good for the rather low dimensional AR models with which we approximate the voices. We experimented with longer AR processes as well, but the quality of the outputs does not improve significantly. These experiments were meant to show that the estimation problem can be solved reasonably well when we identify the two voices as AR processes.

[Figure 7: Voice experiments: source voices (upper row), mixture calculated as sum (middle row), and estimated outputs (bottom row). Panel titles: "Voice A1", "Voice A2", "The Mixture of the two voices", "Estimated Voice A1", "Estimated Voice A2".]

5. CONCLUSIONS

In this report we solved the identification problem of a sum of two independent AR processes. First we proved that this system is identifiable, next we deduced an estimator for the processes' parameters and finally we presented a family of algorithms to implement this estimator. As a direct application we considered the degenerate case of the Blind Source Separation problem where a mixture of two voices is given, as recorded with one microphone. From this one measurement (more specifically one sequence of samples) we estimated the original two signals. This application raised the adaptation problem of the singular AR identification algorithm. We showed how to adapt the previous algorithm to an on-line procedure.

The present study shows that the second order statistics is sufficient both for the identification of singular multivariate AR processes of the particular form considered here and for the estimation of independent signals in a scalar mixture of two voices, when these voices can be well approximated by AR processes. Future work will deal with various issues raised in the on-line implementation, such as faster and more reliable algorithms.

6. REFERENCES

[Ama9] S. Amari. Minimum mutual information blind separation. Neural Computation, 99.

[BS9] A.J. Bell and T.J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:9-9, 99.

[Car97] J.F. Cardoso. Infomax and maximum likelihood for blind source separation. IEEE Signal Processing Letters, ():-, April 1997.

[Com9] P. Comon. Independent component analysis, a new concept? Signal Processing, ():87-, 99.

[GH9] Arthur A. Giordano and Frank M. Hsu. Least Square Estimation with Applications to Digital Signal Processing. Springer-Verlag, 99.

[JH9] C. Jutten and J. Herault. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, ():-, 99.

[K.K87] K. Kumar. Identification of autoregressive-moving average (ARMA) models using Padé approximations. Bull. Inst. Statis. Inst., Proc. th Session, ():77-89, 7 1987.

[LJ9] L. Rabiner and B-H. Juang. Fundamentals of Speech Recognition. PTR Prentice Hall, 99.

[Poo9] H. Vincent Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag, 99.

[PP9] B. A. Pearlmutter and L. C. Parra. A context-sensitive generalization of ICA. In International Conference on Neural Information Processing, Hong Kong, 99.

[S.M88] S.M. Kay. Modern Spectral Estimation. Prentice Hall, 1988.

[S.N9] S. Nakamori. Estimation of multivariate signals by output autocovariance data in linear discrete-time systems. Math. and Comp. Modelling, ():97-, 7 99.

[Tor9] K. Torkkola. Blind separation of convolved sources based on information maximization. In IEEE Workshop on Neural Networks for Signal Processing, Kyoto, Japan, 99.

[WFO9] E. Weinstein, M. Feder, and A. Oppenheim. Multi-channel signal separation by decorrelation. IEEE Trans. on Speech and Audio Processing, ():-, 99.