Proc. th Asilomar Conf. Signals, Syst. Comput., Pacific Grove, CA, Nov. 1-, 199 10

Blind Deconvolution of Discrete-Valued Signals

Ta-Hsin Li
Department of Statistics, Texas A&M University, College Station, Texas 8

Abstract

This paper shows that when the input signal to a linear system is discrete-valued, the blind deconvolution problem of simultaneously estimating the system and recovering the input can be solved more efficiently by taking into account the discreteness of the input signal. Two situations are considered. One deals with noiseless data by an inverse-filtering procedure which minimizes a cost function that measures the discreteness of the output of an inverse filter. For noisy data observed from FIR systems, the Gibbs sampling approach is employed to simulate the posteriors of the unknowns under the assumption that the input signal is a Markov chain. It is shown that in the noiseless case the method leads to a highly efficient estimator for parametric systems, so that the estimation error decays exponentially as the sample size grows. The Gibbs sampling approach also provides rather precise results for noisy data, even if the initial and transition probabilities of the input signal and the variance of the noise are completely unknown.

1. Introduction

Blind deconvolution in general deals with the simultaneous estimation of a linear system $\{s_j\}$ and reconstruction of its random input $\{x_t\}$ on the basis of the data $\{y_t\}$ obtained from the convolution

$$y_t = \sum_{j=-\infty}^{\infty} s_j x_{t-j}. \tag{1}$$

Partial information about the statistical properties of $\{x_t\}$ is usually required in order to obtain a sensible solution. How well the knowledge of $\{x_t\}$ can be incorporated into the solution evidently plays an important role in this problem. The current paper is concerned with a special problem of blind deconvolution in which the input signal takes discrete values from a known alphabet, a typical situation encountered frequently in digital communications [8].
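As a concrete illustration of the convolution model (1), the following minimal sketch passes a discrete-valued input through a short system. The two-tap system `s`, the four-level alphabet, and the function name are hypothetical choices made only for this example; they are not taken from the paper.

```python
import numpy as np

def convolve_discrete_input(s, x):
    """Return y_t = sum_j s_j x_{t-j} for t = 0..len(x)-1, taking x_t = 0 for t < 0."""
    return np.convolve(x, s)[: len(x)]

rng = np.random.default_rng(0)
alphabet = np.array([-3.0, -1.0, 1.0, 3.0])  # hypothetical known alphabet
x = rng.choice(alphabet, size=8)             # discrete-valued input
s = np.array([1.0, 0.5])                     # hypothetical causal FIR system
y = convolve_discrete_input(s, x)            # observed data; generally off the alphabet
```

The observed sequence `y` no longer takes values in the alphabet, which is the distortion that blind deconvolution has to undo without knowing `s`.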
Based on the inverse filtering approach, a cost function is employed to measure the closeness of the filtered data to a discrete-valued sequence and is minimized to obtain an estimate of the unknown system. In the parametric case, where the system is characterized by a finite-dimensional parameter (e.g., ARMA models), the method is proved to yield highly efficient estimates, so that the estimation error may decay exponentially as the sample size grows. When the data $\{y_t\}$ are contaminated by Gaussian white noise and the system $\{s_j\}$ has finite length (FIR), the current paper shows that the Gibbs sampling procedure can be used to deal with the estimation of $\{s_j\}$ and $\{x_t\}$ under the assumption that $\{x_t\}$ is a Markov chain [2]. This method presents an avenue to incorporate colored input signals into the blind deconvolution problem. All these results provide yet another piece of evidence that a digital signal is capable of resisting distortion and contamination if its discreteness can be judiciously utilized in the restoration procedure.

2. An Inverse Filtering Procedure

When $\{x_t\}$ is an i.i.d. sequence of zero-mean random variables and $\{s_j\}$ is a minimum-phase ARMA($p$, $q$) system, so that

$$\sum_{j=0}^{p} a_j^* y_{t-j} = \sum_{j=0}^{q} b_j^* x_{t-j} \tag{2}$$

with $a_0^* = b_0^* = 1$, the classical least-squares method [1] provides a solution to the problem by seeking the coefficients $\{a_1, \ldots, a_p, b_1, \ldots, b_q\}$ that minimize the sample variance of the linear prediction error $\{u_t\}$ given by

$$u_t = \sum_{j=0}^{p} a_j y_{t-j} - \sum_{j=1}^{q} b_j u_{t-j}, \qquad a_0 = 1. \tag{3}$$

Since the method approximates the maximum likelihood estimation by ignoring the end-point effect, it is not surprising that minimizing the sample variance of $\{u_t\}$ leads to asymptotically efficient estimates [1] for the ARMA system (2). An alternative to least squares is the method of moments. Although computationally
appealing, it does not, however, provide efficient estimates except for pure AR systems. The variance of the estimates in both methods is usually proportional to the reciprocal of the sample size, i.e., $O(1/n)$.

To generalize the idea of least squares, it is crucial to observe that $\{u_t\}$ in (3) is the output of an inverse filter corresponding to the ARMA system in (2). Therefore, the least-squares method calls for the minimization of the variance of the output sequence obtained by filtering the data $\{y_t\}$ with an inverse filter. For an arbitrary parametric system with $s_j = s_j(\theta^*)$ in (1), one may consider the output sequence

$$u_t(\theta) = \sum_{j=-\infty}^{\infty} s_j^{-1}(\theta)\, y_{t-j}, \tag{4}$$

where $\{s_j^{-1}(\theta)\}$ is the inverse of $\{s_j(\theta)\}$. Since the variance alone is no longer sufficient for the discrimination of nonminimum-phase systems, higher-order moments of $\{u_t\}$ have to be involved in the selection of optimal filters [3], [4], [5], [9], [10]. Minimization of $E(|u_t|^k - r_k)^2$ with $r_k = E(|x_t|^{2k})/E(|x_t|^k)$ and $k > 1$, for example, was suggested in [5], whereas maximization of $|c_k(u_t)|/(c_2(u_t))^{k/2}$, with $c_k(u_t)$ being the $k$-th order cumulant of $u_t$ for $k > 2$, was discussed in [3], [4]. The stationarity of $\{x_t\}$ is a crucial requirement in all these procedures, and many of them further require some moments of $\{x_t\}$ to be available. The estimation accuracy of these procedures is usually $O(1/\sqrt{n})$ [ ].

This accuracy limit, however, can be significantly improved when the discreteness of the input signal is taken into account. In fact, for an $m$-ary signal whose alphabet is $A = \{a_i,\ i = 1, \ldots, m\}$, a highly efficient estimator can be obtained by minimizing

$$\hat{J}_n(\theta) = \frac{1}{2n+1} \sum_{t=-n}^{n} \prod_{i=1}^{m} |\hat{u}_t(\theta) - a_i|^2, \tag{5}$$

where $\{\hat{u}_t(\theta)\}$ results from the inverse filtering

$$\hat{u}_t(\theta) = \sum_{j=-n}^{n} s_{t-j}^{-1}(\theta)\, y_j \tag{6}$$

using only the observed data $\{y_t,\ t = -n, \ldots, n\}$ (assuming $y_t = 0$ for all $|t| > n$). This criterion measures how close $\{\hat{u}_t(\theta)\}$ is to being an $A$-valued discrete sequence: each product vanishes exactly when $\hat{u}_t(\theta)$ coincides with one of the symbols $a_i$.
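The discreteness criterion can be tried out numerically. The sketch below is an illustration under stated assumptions, not the paper's implementation: it takes a binary alphabet {0, 1}, a hypothetical AR(1) system written in the sign convention of (2), y_t + a y_{t-1} = x_t, whose inverse filter is the two-tap FIR filter u_t(a) = y_t + a y_{t-1}, and it minimizes the sample cost over a coarse grid rather than with a proper optimizer. All function names and the value of `a_true` are made up for the example.

```python
import numpy as np

def discreteness_cost(u, alphabet):
    """Sample mean of prod_i |u_t - a_i|^2; zero iff every u_t lies in the alphabet."""
    diffs = np.asarray(u)[:, None] - np.asarray(alphabet)[None, :]
    return float(np.mean(np.prod(diffs ** 2, axis=1)))

def ar1_inverse_filter(y, a):
    """u_t(a) = y_t + a * y_{t-1}, with y_{-1} taken as 0."""
    y_prev = np.concatenate(([0.0], y[:-1]))
    return y + a * y_prev

rng = np.random.default_rng(1)
alphabet = [0.0, 1.0]                        # binary input
x = rng.choice(alphabet, size=200)
a_true = -0.6                                # hypothetical AR(1) coefficient
y = np.zeros_like(x)
for t in range(len(x)):                      # y_t + a_true * y_{t-1} = x_t
    y[t] = x[t] - a_true * (y[t - 1] if t > 0 else 0.0)

# Coarse grid search over candidate coefficients (step 0.01).
grid = np.round(np.linspace(-0.9, 0.9, 181), 2)
costs = [discreteness_cost(ar1_inverse_filter(y, a), alphabet) for a in grid]
a_hat = float(grid[int(np.argmin(costs))])
```

Because the inverse of a finite-order AR system is itself a finite filter, the filtered output at the true coefficient reproduces the binary input up to rounding, so the cost is essentially zero there and strictly positive elsewhere on the grid.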
It can be shown [6], [7] that the minimizer of $\hat{J}_n(\theta)$, denoted by $\hat{\theta}_n$, is a consistent estimator for the true parameter $\theta^*$ and, more importantly, that the estimation error $\|\hat{\theta}_n - \theta^*\|$ is bounded by the tail behavior of the true inverse system, so that

$$\|\hat{\theta}_n - \theta^*\| \le c \sum_{|j| \ge n} |s_j^{-1}(\theta^*)|, \tag{7}$$

where $c > 0$ is a constant. For ARMA systems, whose inverse coefficients decay geometrically, this implies that the error of $\hat{\theta}_n$ decays as an exponential function rather than the square-root reciprocal of the sample size $n$. In other words, minimization of $\hat{J}_n(\theta)$ would produce "super-efficient" estimates for the blind deconvolution problem. If the system is autoregressive with finite order, the super-efficiency yields

$$\lim_{n \to \infty} \Pr(\hat{\theta}_n = \theta^*) = 1,$$

that is, the minimizer of $\hat{J}_n(\theta)$ equals the true value of the parameter with probability tending to unity as $n$ increases. It is also important to point out that all these results can be obtained without requiring the $x_t$ to have the same distribution, as long as they are independent [6], [7]. Therefore the super-efficiency applies even to nonstationary signals.

To demonstrate these results, let us consider a simple nonminimum-phase MA(2) system [ ]

$$y_t = -1.5\,x_t + 3.5\,x_{t-1} - x_{t-2},$$

where $\{x_t\}$ is a binary sequence with $\Pr(x_t = 0) = p_t$ and $\Pr(x_t = 1) = 1 - p_t$. For the general MA(2) model $y_t = b_0 x_t + b_1 x_{t-1} + b_2 x_{t-2}$, we assume $b_0 + b_1 + b_2 = 1$ and reparametrize the resulting two-parameter system with the zeros of the polynomial $b_0 z^2 + b_1 z + b_2$, denoted by $\theta = (z_1, z_2)$. Therefore, in this example, $\theta^* = (z_1^*, z_2^*) = (1/3, 2)$. To compare with other methods which make no use of the discreteness of the input signal, we consider the well-known procedure of maximizing the standardized skewness [3], [4], [9]

$$\hat{S}_n(\theta) = \frac{|\hat{c}_3(\hat{u}_t)|}{(\hat{c}_2(\hat{u}_t))^{3/2}},$$

where $\hat{u}_t = \hat{u}_t(\theta)$ is the output of the inverse filter in (6) and $\hat{c}_k(\hat{u}_t)$ is the $k$-th order sample cumulant of $\hat{u}_t$.
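The reparametrization by zeros and the standardized skewness can both be checked with a few lines of code. The sketch below makes two assumptions: the MA(2) coefficients are read as (-1.5, 3.5, -1), which is consistent with the normalization b_0 + b_1 + b_2 = 1, and the sample cumulants are replaced by plain central-moment estimates (the second and third cumulants equal the variance and the third central moment). The function name is illustrative only.

```python
import numpy as np

def standardized_skewness(u):
    """|c3| / c2^(3/2), with c2 and c3 estimated by sample central moments."""
    d = np.asarray(u, dtype=float) - np.mean(u)
    c2 = np.mean(d ** 2)          # second cumulant (variance)
    c3 = np.mean(d ** 3)          # third cumulant (third central moment)
    return float(abs(c3) / c2 ** 1.5)

# Assumed coefficients (b0, b1, b2) of the MA(2) example.
b = np.array([-1.5, 3.5, -1.0])
zeros = np.sort(np.roots(b).real)            # zeros of b0 z^2 + b1 z + b2

skew_asymmetric = standardized_skewness([0, 0, 0, 1])   # binary sample with unequal frequencies
skew_symmetric = standardized_skewness([0, 1, 0, 1])    # balanced binary sample
```

The zeros come out as 1/3 and 2, one inside and one outside the unit circle, which is what makes the system nonminimum-phase. The balanced binary sample has zero skewness, a reminder that a skewness-based criterion needs an asymmetric input distribution to carry any information.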
Two cases are considered. In Case 1 the input signal $\{x_t\}$ is stationary with $p_t$ = 0. for all $t$, while in Case 2 it is nonstationary with $p_t = \Phi(\sin(t\pi/18))$, where $\Phi(\cdot)$ is the distribution function of the standard normal random variable. In both cases a random sample of size $n = 1000$ is used in the computation of $\hat{J}_n(\theta)$ and $\hat{S}_n(\theta)$, and the contour plots of these criteria are presented in Figures 1–4.

As we can see from Figs. 1 and 3, the "binariness" criterion $\hat{J}_n(\theta)$ has a very sharp valley near the true value $\theta^*$ (indicated by +) in both the stationary and nonstationary cases. This implies that minimizing $\hat{J}_n(\theta)$ will produce very precise estimates for both stationary and nonstationary input signals. On the other hand,
the standardized skewness $\hat{S}_n(\theta)$ has a rather broad peak near $\theta^*$ in the stationary case (Fig. 2). Although a solution to the deconvolution problem is provided by maximizing $\hat{S}_n(\theta)$, the broad peak in $\hat{S}_n(\theta)$ shown in Fig. 2 may yield inaccurate estimates of $\theta^*$. To make things even worse, the peak completely disappears in $\hat{S}_n(\theta)$ for the nonstationary signal (Fig. 4). This reveals how crucial stationarity may be to the successful implementation of procedures like maximization of the standardized skewness. It is evident that the advantage of $\hat{J}_n(\theta)$ comes primarily from its utilization of the discreteness of the input signal.

Fig. 1. Contour of $\hat{J}_n(\theta)$: Stationary case.
Fig. 2. Contour of $\hat{S}_n(\theta)$: Stationary case.
Fig. 3. Contour of $\hat{J}_n(\theta)$: Nonstationary case.
Fig. 4. Contour of $\hat{S}_n(\theta)$: Nonstationary case.

3. A Gibbs Sampling Procedure

Suppose $\{s_j\}$ in (1) is an FIR system operated in a noisy environment, so that $\{y_t\}$ is obtained from

$$y_t = \sum_{j=0}^{q} \phi_j x_{t-j} + \epsilon_t, \tag{8}$$

where $\{\epsilon_t\}$ is Gaussian white noise with unknown variance $\sigma^2$. For the input signal, we assume that $\{x_t\}$ is a first-order Markov chain with state space $A$, unknown initial probabilities $\pi_i = \Pr(x_{1-q} = a_i)$, and unknown transition probabilities $\pi_{ij} = \Pr(x_t = a_j \mid x_{t-1} = a_i)$. The blind deconvolution (or restoration) problem becomes the joint estimation of all the unknown parameters $\boldsymbol{\phi} = [\phi_0, \ldots, \phi_q]^T$, $\boldsymbol{\pi} = \{\pi_i, \pi_{ij}\}$, and $\sigma^2$, and the recovery of the unknown input $\mathbf{x} = \{x_{1-q}, \ldots, x_n\}$, solely from a finite data set $\mathbf{y} = \{y_1, \ldots, y_n\}$. It should
be pointed out that most of the previously mentioned methods of blind deconvolution do not directly apply to this situation, since the input signal $\{x_t\}$ is colored and its moments are unknown.

To deal with this problem, Chen and this author have recently combined the Bayesian approach with a Gibbs sampling procedure [2]. The gist of the method can be summarized as follows. Upon regarding all the unknowns as independent random variables (vectors), a multivariate Gaussian distribution and an inverse chi-square distribution are used as priors for $\boldsymbol{\phi}$ and $\sigma^2$, respectively, so that $\boldsymbol{\phi} \sim N(\boldsymbol{\mu}_0, \Sigma_0)$ and $\sigma^2 \sim \chi^{-2}(\nu, \lambda)$ (i.e., $\nu\lambda/\sigma^2 \sim \chi^2(\nu)$). Dirichlet distributions are employed as priors for the $\pi$'s, namely

$$(\pi_1, \ldots, \pi_m) \sim D(\alpha_1, \ldots, \alpha_m) \quad \text{and} \quad (\pi_{i1}, \ldots, \pi_{im}) \sim D(\alpha_{i1}, \ldots, \alpha_{im}),$$

so that $p(\pi_1, \ldots, \pi_m) \propto \prod_i \pi_i^{\alpha_i}$ with $\sum_i \pi_i = 1$ and $p(\pi_{i1}, \ldots, \pi_{im}) \propto \prod_j \pi_{ij}^{\alpha_{ij}}$ with $\sum_j \pi_{ij} = 1$. Selection of the parameters in these priors reflects the a priori information about the unknowns. For instance, small values of $\nu$ and $\lambda$ or large variances in $\Sigma_0$ correspond to less informative priors, suitable for situations where information about $\boldsymbol{\phi}$ and $\sigma^2$ is limited. Jeffreys' non-informative Dirichlet prior for $(\pi_1, \ldots, \pi_m)$ corresponds to $\alpha_i = -1/2$, while in general $\alpha_i > -1$.

According to the Bayesian approach, one is interested in seeking the conditional expectation $E(x_t \mid \mathbf{y})$ or the mode of the conditional probability $p(x_t \mid \mathbf{y})$, for instance, as estimates of $x_t$. The difficulty is that any direct computation of these estimates seems impossible because of the complexity of the problem (more unknowns than observations). Alternatively, one may employ the Monte Carlo method with a Gibbs sampler. The idea of Gibbs sampling is to construct a Markov chain by recursively generating random samples from the conditional posterior distribution of an individual unknown, or a subset of the unknowns, given the data $\mathbf{y}$ and the rest of the unknowns. This procedure continues until the sampling Markov chain converges in distribution.
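To make the recursive sampling idea concrete, here is a deliberately reduced sketch that alternates between only two blocks, the FIR coefficients and the noise variance, while holding the input sequence fixed and known. It relies on the standard conjugate updates for a Gaussian prior on the coefficients and a scaled inverse chi-square prior on the variance. The full procedure described in the text also samples the input sequence and the Markov chain probabilities, which is omitted here. The coefficient values, hyperparameters, sample size, and function names are all hypothetical.

```python
import numpy as np

def lagged_design(x, q):
    """Rows [x_t, x_{t-1}, ..., x_{t-q}] for t = q, ..., len(x)-1."""
    n = len(x) - q
    return np.column_stack([x[q - j : q - j + n] for j in range(q + 1)])

def gibbs_phi_sigma2(y, X, mu0, Sigma0, nu, lam, n_iter, rng):
    """Alternate draws of phi | sigma2 and sigma2 | phi, with the input held fixed."""
    n, k = X.shape
    Sigma0_inv = np.linalg.inv(Sigma0)
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = 1.0                                   # arbitrary starting value
    draws = np.empty((n_iter, k + 1))
    for it in range(n_iter):
        # phi | sigma2, y  ~  N(mu1, Sigma1)
        Sigma1 = np.linalg.inv(XtX / sigma2 + Sigma0_inv)
        mu1 = Sigma1 @ (Xty / sigma2 + Sigma0_inv @ mu0)
        phi = rng.multivariate_normal(mu1, Sigma1)
        # sigma2 | phi, y:  (nu*lam + s2) / sigma2  ~  chi-square(nu + n)
        s2 = np.sum((y - X @ phi) ** 2)
        sigma2 = (nu * lam + s2) / rng.chisquare(nu + n)
        draws[it] = np.append(phi, sigma2)
    return draws

rng = np.random.default_rng(2)
q = 1
phi_true = np.array([1.0, 0.5])                    # hypothetical FIR coefficients
x = rng.choice([-3.0, -1.0, 1.0, 3.0], size=201)   # input, treated as known here
X = lagged_design(x, q)                            # 200 rows
y = X @ phi_true + 0.3 * rng.standard_normal(len(X))

draws = gibbs_phi_sigma2(y, X, np.zeros(2), 1000.0 * np.eye(2), 2.0, 0.3, 600, rng)
phi_mean = draws[200:, :2].mean(axis=0)            # discard the first 200 draws as burn-in
sigma2_mean = draws[200:, 2].mean()
```

Averaging the retained draws approximates the posterior means. With the input known, this reduces to Bayesian linear regression, so the sketch shows only the mechanics of cycling through conditional posteriors, not the difficulty of the blind problem.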
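Once this happens, the random samples generated by the Gibbs sampler can be regarded as ergodic samples from the joint posterior distribution $p(\mathbf{x}, \boldsymbol{\phi}, \boldsymbol{\pi}, \sigma^2 \mid \mathbf{y})$, so the simple average of the $x_t$ components and the maximum relative frequency of $x_t = a_i$ obtained from these samples, for example, will approximate the conditional expectation (MMSE estimator) $E(x_t \mid \mathbf{y})$ and the MAP estimator $\mathrm{mode}\{p(x_t \mid \mathbf{y})\}$, respectively.

It is not too difficult to derive for the Gibbs sampler the conditional posterior distributions of the unknowns in our problem. As a matter of fact, it can be shown [2] that the conditional posterior distribution of $\boldsymbol{\phi}$ given $\mathbf{y}$ and the rest of the unknowns is Gaussian with mean vector $\boldsymbol{\mu}_1$ and covariance matrix $\Sigma_1$, i.e.,

$$p(\boldsymbol{\phi} \mid \text{rest}, \mathbf{y}) \sim N(\boldsymbol{\mu}_1, \Sigma_1),$$

where

$$\Sigma_1^{-1} = \sum_{t=1}^{n} \mathbf{x}_t \mathbf{x}_t^T / \sigma^2 + \Sigma_0^{-1} \quad \text{and} \quad \boldsymbol{\mu}_1 = \Sigma_1 \left( \sum_{t=1}^{n} \mathbf{x}_t y_t / \sigma^2 + \Sigma_0^{-1} \boldsymbol{\mu}_0 \right)$$

with $\mathbf{x}_t = [x_t, \ldots, x_{t-q}]^T$. Similarly, it can be shown [2] that

$$p(\sigma^2 \mid \text{rest}, \mathbf{y}) \sim \chi^{-2}\!\left(\nu + n,\ \frac{\nu\lambda + s^2}{\nu + n}\right), \quad \text{i.e.,} \quad (\nu\lambda + s^2)/\sigma^2 \sim \chi^2(\nu + n),$$

$$p(\pi_1, \ldots, \pi_m \mid \text{rest}, \mathbf{y}) \sim D(\alpha_1 + \delta_1, \ldots, \alpha_m + \delta_m),$$

$$p(\pi_{i1}, \ldots, \pi_{im} \mid \text{rest}, \mathbf{y}) \sim D(\alpha_{i1} + n_{i1}, \ldots, \alpha_{im} + n_{im}),$$

where $s^2 = \sum_{t=1}^{n} \bigl(y_t - \sum_{j=0}^{q} \phi_j x_{t-j}\bigr)^2$, $n_{ij} = \#\{t : (x_{t-1}, x_t) = (a_i, a_j)\}$ is the number of transitions from $a_i$ to $a_j$, and $\delta_i = 1$ if $x_{1-q} = a_i$ and $\delta_i = 0$ if $x_{1-q} \ne a_i$. For any fixed $t_0 \in \{1-q, \ldots, n\}$, the conditional posterior distribution of $x_{t_0}$ can be expressed as

$$\Pr(x_{t_0} = a_i \mid \text{rest}, \mathbf{y}) \propto p(\mathbf{x}' \mid \boldsymbol{\pi}) \exp\{-s'^2/(2\sigma^2)\},$$

where $\mathbf{x}' = \{x'_{1-q}, \ldots, x'_n\}$ with $x'_{t_0} = a_i$ and $x'_t = x_t$ for $t \ne t_0$, and $s'^2 = \sum_{t=1}^{n} \bigl(y_t - \sum_{j=0}^{q} \phi_j x'_{t-j}\bigr)^2$. Note that under the Markovian assumption on $\{x_t\}$ we have $p(\mathbf{x} \mid \boldsymbol{\pi}) = \bigl(\prod_i \pi_i^{\delta_i}\bigr)\bigl(\prod_{i,j} \pi_{ij}^{n_{ij}}\bigr)$.

As an example of the Gibbs sampling procedure, let us consider the MA(3) system

$$y_t = -0.18\,x_t + 0.91\,x_{t-1} + 0.81\,x_{t-2} - 0.198\,x_{t-3} + \epsilon_t,$$

where $\{x_t\}$ is a four-level Markov chain with $A = \{-3, -1, 1, 3\}$, $\pi_i = 1/4$, and a 4 × 4 transition matrix $[\pi_{ij}]$ whose entries are not legible in this copy. A realization of $\{x_t\}$ with $n = 100$ is shown in Fig. 5(a) and the corresponding $\{y_t\}$ in Fig. 5(b). The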
sample variance of $\{\epsilon_t\}$ is adjusted so that the signal-to-noise ratio in $\{y_t\}$ equals 1 dB. The parameters in the prior distributions are chosen as follows: $\boldsymbol{\mu}_0 = \mathbf{0}$, $\Sigma_0 = 1000\,I$, $\nu$ = , $\lambda$ = 0., and $\alpha_i = \alpha_{ij} = 1$. Fig. 5(c) shows the i.i.d. uniform initial guess for $\{x_t\}$ in the Gibbs sampler, and Figs. 5(d) and 5(e) present the conditional mean and mode of $x_t$ given $\mathbf{y}$, i.e., $E(x_t \mid \mathbf{y})$ and $\mathrm{mode}\{p(x_t \mid \mathbf{y})\}$, respectively, calculated from the last 00 samples of the total 1000 iterations of Gibbs sampling. The constraints $\phi_1 \ge$ 0. and $\phi_1 \ge |\phi_i| +$ 0. for $i \ne 1$ are used to remove the sign and shift ambiguities in the solution. Estimates of $\boldsymbol{\phi}$ and $\pi_{ij}$ are given in the form $E(\cdot \mid \mathbf{y}) \pm \sqrt{V(\cdot \mid \mathbf{y})}$ by, respectively,

(-0.19, 0.89, 0.9, -0.181) ± (0.0191, 0.019, 0.00, 0.011)

and

.     .1    .     .18          .11   .08   .09   .08
.1    .     .0    .1     ±     .0    .08   .08   .0
.0    .     .     .19          .0    .0    .08   .0
.1    .1    .9    .            .08   .0    .10   .09

It is evident by comparing Figs. 5(d) and 5(e) with Fig. 5(a) that the MAP estimator $\mathrm{mode}\{p(x_t \mid \mathbf{y})\}$ completely recovers the input signal $\{x_t\}$ from the noisy data, while the recovery by the MMSE estimator $E(x_t \mid \mathbf{y})$ is almost complete except for the last point, even though the sample size is relatively small. The estimates for the system parameters and the transition probabilities are reasonably accurate given that $n$ is merely 100. This demonstrates again the impact of the discreteness of input signals on the improvement of blind deconvolution solutions.

References

[1] P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, 2nd Ed., New York: Springer, 1991.
[2] R. Chen and T.H. Li, "Blind restoration of linearly degraded discrete signals by Gibbs sampler," Tech. Rep. 19, Dept. of Statist., Texas A&M University, College Station, 199.
[3] Q. Cheng, "Maximum standardized cumulant deconvolution of non-Gaussian processes," Ann. Statist., vol. 18, pp. 1–18, 1990.
[4] D. Donoho, "On minimum entropy deconvolution," in Applied Time Series Analysis II, D. Findley, Ed., New York: Academic, 1981.
[5] D.N.
Godard, "Self-recovering equalization and carrier tracking in two-dimensional data communication systems," IEEE Trans. Commun., vol. COM-8, pp. 18–18, Nov. 1980.
[6] T.H. Li, "Blind identification and deconvolution of linear systems driven by binary random sequences," IEEE Trans. Inform. Theory, vol. IT-8, pp. –8, Jan. 199.
[7] T.H. Li, "Blind deconvolution of linear systems with nonstationary multilevel inputs," Proc. IEEE Signal Process. Workshop on Higher-Order Statist., S. Lake Tahoe, CA, pp. 10–1, June 199.
[8] J.G. Proakis, Digital Communications, 2nd Edn, New York: McGraw-Hill, 1989.
[9] O. Shalvi and E. Weinstein, "New criteria for blind deconvolution of nonminimum phase systems (channels)," IEEE Trans. Inform. Theory, vol. IT-, pp. 1–1, Mar. 1990.
[10] J.K. Tugnait, "Inverse filter criteria for estimation of linear parametric models using higher order statistics," Proc. ICASSP-91, pp. 101–10.

Fig. 5. Deconvolution by Gibbs sampling. Panels: (a) x(t) [Markov]; (b) y(t) [1dB]; (c) x0(t) [i.i.d. uniform]; (d) E(x(t)|y); (e) mode of p(x(t)|y).