Journal of Circuits, Systems, and Computers
Vol. 21, No. 1 (2012) 1250006 (16 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218126612500065

STOCHASTIC INFORMATION GRADIENT ALGORITHM WITH GENERALIZED GAUSSIAN DISTRIBUTION MODEL*

BADONG CHEN, JOSE C. PRINCIPE, JINCHUN HU and YU ZHU

Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA

Department of Precision Instruments and Mechanology, Tsinghua University, Beijing 100084, P. R. China

Received 26 October 2009
Accepted 3 August 2010
Published 29 February 2012

This paper presents a parameterized version of the stochastic information gradient (SIG) algorithm, in which the error distribution is modeled by a generalized Gaussian density (GGD) with location, shape, and dispersion parameters. Compared with the kernel-based SIG (SIG-Kernel) algorithm, the GGD-based SIG (SIG-GGD) algorithm does not involve kernel width selection. If the error is zero-mean, the SIG-GGD algorithm becomes a least mean p-power (LMP) algorithm with adaptive order and variable step-size. Owing to its well-matched density estimation and automatic switching capability, the proposed algorithm compares favorably with existing algorithms.

Keywords: Minimum error entropy (MEE) criterion; stochastic information gradient (SIG) algorithm; generalized Gaussian density (GGD); least mean p-power (LMP).

1. Introduction

Stochastic gradient (SG) algorithms under various error criteria have been extensively studied and widely used over the last decades [1, 2]. The least mean square (LMS) algorithm, which is based on the minimum mean-square error (MSE) criterion, is perhaps the most popular one because of its low computational requirements and its optimality in the Gaussian case. However, in the presence of non-Gaussian noises, the LMS algorithm suffers significant performance degradation.

*This paper was recommended by Regional Editor Majid Ahmadi.

To improve the learning performance (degree of stability, convergence speed, misadjustment error, etc.) in non-Gaussian situations, many stochastic gradient algorithms under non-mean-square-error (non-MSE) criteria have been studied [3-5]. Among these algorithms, the least mean p-power (LMP) algorithm [5, 6] deserves special attention owing to its mathematical tractability and satisfactory performance. For adaptive system training, the LMP algorithm of order p is given by

$$W(k+1) = W(k) - \eta\,\frac{\partial\,(|e(k)|^{p})}{\partial W} = W(k) - \eta\,p\,|e(k)|^{p-1}\operatorname{sign}(e(k))\,\frac{\partial e(k)}{\partial W}\,, \qquad (1)$$

where $W(k) = [w_{1}(k), w_{2}(k), \ldots, w_{M}(k)]^{T}$ is the weight vector of the adaptive model at iteration $k$, $\eta > 0$ is the step-size, $e(k)$ denotes the prediction error, and the gradient $\partial e(k)/\partial W$ depends on the topology of the adaptive system under consideration. Well-known special cases of the LMP algorithm include the least absolute difference (LAD) algorithm ($p = 1$) [7], the LMS algorithm ($p = 2$), and the least mean fourth (LMF) algorithm ($p = 4$) [3].

More recently, researchers have proposed the minimum error entropy (MEE) criterion as an information-theoretic alternative to traditional error criteria in supervised adaptive system training [8, 23]. As shown in Fig. 1, under the MEE criterion, the weight vector of the adaptive model $h(x, W)$ is adjusted online so that the error entropy $H(e(k))$ is minimized. The error entropy measures the average uncertainty contained in the error signal, and its minimization forces the error samples to concentrate. Numerical examples suggest that the MEE criterion can achieve a better error distribution, especially with respect to the higher-order statistics of the error, in adaptive system training [12].

Fig. 1. Scheme of adaptive system training under the MEE criterion, where $d(k)$ is the noisy observation of the system output and is considered as the desired response of the adaptive model, which is driven by the input signal $x(k)$ and perturbed by the noise samples $n(k)$.
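For concreteness, here is a minimal Python sketch (ours, not from the paper) of the LMP update (1) for an FIR filter, for which $\partial e(k)/\partial W = -X(k)$; the function name, default order, and step-size value are placeholders:

```python
import numpy as np

def lmp_update(W, x, d, p=2.0, eta=0.01):
    """One LMP step, Eq. (1), for an FIR model y(k) = W^T x(k).

    Since e(k) = d(k) - W^T x(k), we have de(k)/dW = -x(k), so the minus
    sign in Eq. (1) turns into a plus.  p = 1 gives LAD, p = 2 LMS, p = 4 LMF.
    """
    e = d - W @ x                                          # prediction error e(k)
    W = W + eta * p * np.abs(e) ** (p - 1) * np.sign(e) * x
    return W, e
```

With p = 2 the update reduces to the familiar LMS correction W + 2*eta*e*x.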

Under the MEE criterion, the optimal weight vector is given by

$$W^{*} = \arg\min_{W \in \mathbb{R}^{M}} H(e) = \arg\min_{W \in \mathbb{R}^{M}} \left\{ -\int_{\mathbb{R}} p_{e}(e) \log p_{e}(e)\, de \right\} = \arg\min_{W \in \mathbb{R}^{M}} E_{e}\left(-\log p_{e}(e)\right), \qquad (2)$$

where $p_{e}(\cdot)$ denotes the PDF of the error and $E_{e}(\cdot)$ denotes the expectation operator. The optimal weight vector can be searched for by the stochastic information gradient (SIG) algorithm [9]:

$$W(k+1) = W(k) + \eta\,\frac{\partial}{\partial W}\log \hat{p}_{e}(e(k))\,. \qquad (3)$$

The key problem of the SIG algorithm is how to estimate the PDF of the error $e(k)$. The nonparametric kernel approach has been widely used owing to its smoothing properties [13]. By kernel density estimation (KDE), the estimated error distribution is given by

$$\hat{p}_{e}(e(k)) = \frac{1}{L} \sum_{i=k-L+1}^{k} K_{h}\left(e(k) - e(i)\right), \qquad (4)$$

where $K_{h}(\cdot)$ represents the kernel function, $h$ is the kernel width, and $L$ denotes the sliding window length. With the kernel approach, one is often confronted with the problem of choosing a suitable kernel width: an inappropriate width will significantly deteriorate the behavior of the algorithm [8, 10]. Although the effects of the kernel width on the shape of the performance surface and on the eigenvalues of the Hessian at and around the optimal solution have been carefully investigated, the choice of the kernel width remains a difficult task. In Ref. 24, an adaptive kernel width is proposed; however, its computation relies on a gradient search that is about as complex as the adaptive algorithm itself. Hence, a parameterized density estimate, which does not involve the choice of a kernel width, can be more practical. Furthermore, if some prior knowledge about the structure of the density is available, parameterized density estimation may achieve better accuracy than nonparametric estimation. These facts motivate us to develop a parameterized SIG algorithm, in which the error distribution is modeled by a parametric family of probability density functions.

In this paper, the generalized Gaussian density (GGD) [14] is used to match the error distribution. The reasons for this choice are at least twofold: (1) the GGD densities form a family of maximum entropy densities, which have simple yet flexible functional forms and can approximate a large number of different signals covering a wide range of statistical distributions [15-18]; (2) the GGD densities have only three parameters, i.e., location, shape, and dispersion, which can easily be estimated from sample data [17, 18].
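As a reference point for the kernel-based variant, the following sketch (our illustration, with a Gaussian kernel and placeholder names) evaluates the score $\partial \log \hat{p}_{e}(e(k))/\partial e(k)$ of the KDE in Eq. (4), treating the past errors in the window as constants with respect to $W$; an FIR SIG-Kernel update would then multiply this score by $\partial e(k)/\partial W = -X(k)$:

```python
import numpy as np

def gaussian_kernel(u, h):
    """Gaussian kernel K_h(u) with width h."""
    return np.exp(-u ** 2 / (2 * h ** 2)) / (np.sqrt(2 * np.pi) * h)

def sig_kernel_score(err_window, h):
    """d log p_hat(e(k)) / d e(k) for the KDE of Eq. (4).

    err_window = [e(k-L+1), ..., e(k)]; the last entry is the current
    error, and past errors are treated as constants.
    """
    u = err_window[-1] - np.asarray(err_window)   # differences e(k) - e(i)
    K = gaussian_kernel(u, h)                     # kernel values
    dK = -(u / h ** 2) * K                        # derivative K_h'(u)
    return dK.sum() / K.sum()                     # score = p_hat' / p_hat
```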

The paper is organized as follows. In Sec. 2, the generalized Gaussian distribution and its parametric estimation are introduced. In Sec. 3, the GGD-based SIG algorithm is derived, and its connections with the LMP algorithm are discussed. In Sec. 4, a series of Monte Carlo simulation experiments demonstrates the performance of the new algorithm. Finally, concluding remarks are presented in Sec. 5.

2. Generalized Gaussian PDF and Its Parametric Estimation

The generalized Gaussian density (GGD) with location $\mu$ and standard deviation $\sigma > 0$ is defined as

$$p_{X}(x) = \frac{\alpha}{2\beta\,\Gamma(1/\alpha)} \exp\left(-\frac{|x-\mu|^{\alpha}}{\beta^{\alpha}}\right), \qquad (5)$$

in which

$$\beta = \sigma\sqrt{\frac{\Gamma(1/\alpha)}{\Gamma(3/\alpha)}}\,, \qquad (6)$$

where $\Gamma(\cdot)$ denotes the Gamma function [19], given by

$$\Gamma(z) = \int_{0}^{\infty} x^{z-1} e^{-x}\, dx\,, \quad z > 0\,. \qquad (7)$$

In Eq. (5), $\alpha$ describes the exponential rate of decay, and hence the shape of the PDF, while $\beta$ models the dispersion of the distribution. Usually, $\alpha$ is called the shape parameter and $\beta$ is referred to as the dispersion or scale parameter. The GGD family includes the Gaussian ($\alpha = 2$) and Laplacian ($\alpha = 1$) distributions as special cases. Figure 2 shows GGD distributions for several shape parameters with the same variance ($\sigma = 1$). From Fig. 2, it is evident that smaller values of the shape parameter correspond to heavier tails and therefore to sharper distributions. In the limiting cases, as $\alpha \to \infty$ the GGD becomes close to the uniform distribution, whereas as $\alpha \to 0^{+}$ it approaches an impulse function ($\delta$-distribution).

The GGD densities are the maximum entropy (Maxent) distributions that solve the following constrained optimization problem (see Appendix A):

$$\max_{p}\ \left\{-\int_{-\infty}^{+\infty} p(x)\log p(x)\,dx\right\} \quad \text{s.t.} \quad \int_{-\infty}^{+\infty} p(x)\,dx = 1\,, \quad \int_{-\infty}^{+\infty} |x-\mu|^{\alpha}\, p(x)\,dx = m_{\alpha}\,, \qquad (8)$$

where $m_{\alpha}$ denotes the $\alpha$-order absolute central moment.
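A minimal sketch (ours, not the authors' code) of the GGD of Eqs. (5) and (6), parameterized by the standard deviation so that the Gaussian case $\alpha = 2$ can serve as a sanity check:

```python
import numpy as np
from scipy.special import gamma

def ggd_pdf(x, mu=0.0, alpha=2.0, sigma=1.0):
    """GGD of Eq. (5), with the dispersion beta tied to sigma by Eq. (6)."""
    beta = sigma * np.sqrt(gamma(1.0 / alpha) / gamma(3.0 / alpha))
    return (alpha / (2.0 * beta * gamma(1.0 / alpha))
            * np.exp(-(np.abs(x - mu) / beta) ** alpha))
```

For alpha = 2 this reduces to the N(mu, sigma^2) density, and for alpha = 1 to a Laplace density with variance sigma^2.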

Fig. 2. Generalized Gaussian distribution for different shape parameters with unit variance.

According to Jaynes's maximum entropy principle (MEP) [20], the Maxent distribution is "uniquely determined as the one which is maximally noncommittal with regard to missing information", one that "agrees with what is known, but expresses maximum uncertainty with respect to all other matters". Therefore, the GGD distribution is the most unbiased distribution that agrees with a given absolute central moment constraint.

Until now, a number of methods have been proposed for the parametric estimation of a GGD distribution; the state of the art can be found in Refs. 14 and 18. Here, we only introduce the moment-matching estimators (MMEs). A complete statistical description of GGD-modeled distributions, obtained by simply matching the underlying moments of the data set with those of the assumed distribution, was first given by Mallat [21]. For a GGD distribution, the $r$-order absolute central moment is calculated as

$$m_{r} = E\left[|X-\mu|^{r}\right] = \int_{-\infty}^{+\infty} |x-\mu|^{r}\, p_{X}(x)\,dx = \beta^{r}\,\frac{\Gamma\left(\frac{r+1}{\alpha}\right)}{\Gamma\left(\frac{1}{\alpha}\right)}\,. \qquad (9)$$

The ratio of the mean absolute deviation ($m_{1}$) to the standard deviation ($\sqrt{m_{2}}$) is

$$M(\alpha) = \frac{m_{1}}{\sqrt{m_{2}}} = \frac{\Gamma(2/\alpha)}{\sqrt{\Gamma(1/\alpha)\,\Gamma(3/\alpha)}}\,, \qquad (10)$$

which is a monotonically increasing function of the shape parameter $\alpha$.
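A quick numerical sketch of the ratio in Eq. (10) (our illustration); its monotonicity in $\alpha$ is what makes the inversion used below well defined:

```python
import numpy as np
from scipy.special import gamma

def M(alpha):
    """Moment ratio of Eq. (10): Gamma(2/a) / sqrt(Gamma(1/a) * Gamma(3/a))."""
    a = np.asarray(alpha, dtype=float)
    return gamma(2.0 / a) / np.sqrt(gamma(1.0 / a) * gamma(3.0 / a))

# M is increasing in alpha:
# M(1) = 1/sqrt(2) ~ 0.707 (Laplace),  M(2) = sqrt(2/pi) ~ 0.798 (Gaussian)
```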

An estimate of $\alpha$ can therefore be expressed as

$$\hat{\alpha} = M^{-1}\left(\frac{\hat{m}_{1}}{\sqrt{\hat{m}_{2}}}\right), \qquad (11)$$

where $M^{-1}(\cdot)$ denotes the inverse function of $M(\alpha)$, and $\hat{m}_{1}$ and $\hat{m}_{2}$ represent the sample mean absolute deviation and the sample second central moment, respectively. Given independent, identically distributed (i.i.d.) random samples $\{x_{1}, x_{2}, \ldots, x_{N}\}$, we have

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_{i}\,, \qquad \hat{m}_{1} = \frac{1}{N}\sum_{i=1}^{N} |x_{i}-\hat{\mu}|\,, \qquad \hat{m}_{2} = \frac{1}{N}\sum_{i=1}^{N} (x_{i}-\hat{\mu})^{2}\,. \qquad (12)$$

The shape parameter can also be expressed in terms of the kurtosis of the GGD distribution [22]. A similar ratio, equal to the reciprocal of the square root of the kurtosis of the GGD distribution, is given by

$$K(\alpha) = \frac{m_{2}}{\sqrt{m_{4}}} = \frac{\Gamma(3/\alpha)}{\sqrt{\Gamma(1/\alpha)\,\Gamma(5/\alpha)}}\,. \qquad (13)$$

Therefore, another estimate of $\alpha$ is

$$\hat{\alpha} = K^{-1}\left(\frac{\hat{m}_{2}}{\sqrt{\hat{m}_{4}}}\right), \qquad (14)$$

where $K^{-1}(\cdot)$ denotes the inverse function of $K(\alpha)$, and $\hat{m}_{4}$ is the sample fourth-order central moment, that is,

$$\hat{m}_{4} = \frac{1}{N}\sum_{i=1}^{N} (x_{i}-\hat{\mu})^{4}\,. \qquad (15)$$

To estimate the shape parameter, we have to evaluate the inverse function $M^{-1}(\cdot)$ or $K^{-1}(\cdot)$, whose curves are depicted in Fig. 3. As neither inverse function has an explicit form, some interpolation method should be used in practical applications. Plugging the estimate of $\alpha$ into (6) and substituting $\sqrt{\hat{m}_{2}}$ for $\sigma$, we obtain the estimate of the dispersion parameter:

$$\hat{\beta} = \sqrt{\hat{m}_{2}\,\frac{\Gamma(1/\hat{\alpha})}{\Gamma(3/\hat{\alpha})}}\,. \qquad (16)$$
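The following sketch (ours) implements the kurtosis-based estimator of Eqs. (14)-(16), inverting $K(\alpha)$ by linear interpolation on a grid in the spirit of the paper's simulations; the grid bounds, step, and the degenerate-window guard are assumptions:

```python
import numpy as np
from scipy.special import gamma

def K(alpha):
    """Kurtosis-based ratio of Eq. (13)."""
    a = np.asarray(alpha, dtype=float)
    return gamma(3.0 / a) / np.sqrt(gamma(1.0 / a) * gamma(5.0 / a))

ALPHA_GRID = np.arange(0.05, 4.05, 0.05)   # K is increasing, so interp inverts it

def estimate_ggd_params(x):
    """Moment-matching GGD estimates (mu, alpha, beta) via Eqs. (12)-(16)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    m2 = np.mean((x - mu) ** 2)
    m4 = np.mean((x - mu) ** 4)
    r = m2 / np.sqrt(m4) if m4 > 0 else float(K(2.0))  # guard: degenerate window
    alpha = np.interp(r, K(ALPHA_GRID), ALPHA_GRID)    # Eq. (14) by interpolation
    beta = np.sqrt(m2 * gamma(1.0 / alpha) / gamma(3.0 / alpha))  # Eq. (16)
    return mu, alpha, beta
```

Because np.interp clips at the grid ends, the estimate is automatically confined to [0.05, 4], which also realizes the cap on the shape parameter used later in Sec. 4.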

Fig. 3. Curves of the inverse functions $M^{-1}(x)$ and $K^{-1}(x)$.

3. GGD-Based Stochastic Information Gradient Algorithm

Using a GGD density to approximate the error distribution, we have

$$\hat{p}_{e}(e(k)) = \frac{\hat{\alpha}_{k}}{2\hat{\beta}_{k}\,\Gamma(1/\hat{\alpha}_{k})} \exp\left(-\frac{|e(k)-\hat{\mu}_{k}|^{\hat{\alpha}_{k}}}{\hat{\beta}_{k}^{\hat{\alpha}_{k}}}\right), \qquad (17)$$

where $\hat{\mu}_{k}$, $\hat{\alpha}_{k}$, and $\hat{\beta}_{k}$ denote the estimated location, shape, and dispersion parameters at iteration $k$. By moment-matching estimation, say the kurtosis-based approach, we have

$$\hat{\mu}_{k} = \frac{1}{L}\sum_{i=k-L+1}^{k} e(i)\,, \qquad \hat{\alpha}_{k} = K^{-1}\left(\frac{\frac{1}{L}\sum_{i=k-L+1}^{k}\left(e(i)-\hat{\mu}_{k}\right)^{2}}{\sqrt{\frac{1}{L}\sum_{i=k-L+1}^{k}\left(e(i)-\hat{\mu}_{k}\right)^{4}}}\right), \qquad \hat{\beta}_{k} = \sqrt{\left(\frac{1}{L}\sum_{i=k-L+1}^{k}\left(e(i)-\hat{\mu}_{k}\right)^{2}\right)\frac{\Gamma(1/\hat{\alpha}_{k})}{\Gamma(3/\hat{\alpha}_{k})}}\,, \qquad (18)$$

where $L$ denotes the sliding window length.
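Tracking Eq. (18) online just means re-running the moment-matching estimator on the last $L$ errors; a small sketch (ours), reusing estimate_ggd_params() from the block above, with a placeholder window length:

```python
import numpy as np
from collections import deque

class GGDTracker:
    """Sliding-window GGD parameter estimates (mu_k, alpha_k, beta_k), Eq. (18)."""

    def __init__(self, L=100):
        self.window = deque(maxlen=L)   # holds e(k-L+1), ..., e(k)

    def update(self, e_k):
        self.window.append(e_k)
        return estimate_ggd_params(np.array(self.window))
```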

Thus, the GGD-based SIG algorithm can be derived as

$$W(k+1) = W(k) + \eta\,\frac{\partial}{\partial W}\log\left\{\frac{\hat{\alpha}_{k}}{2\hat{\beta}_{k}\,\Gamma(1/\hat{\alpha}_{k})}\exp\left(-\frac{|e(k)-\hat{\mu}_{k}|^{\hat{\alpha}_{k}}}{\hat{\beta}_{k}^{\hat{\alpha}_{k}}}\right)\right\} = W(k) - \eta\,\frac{\hat{\alpha}_{k}}{\hat{\beta}_{k}^{\hat{\alpha}_{k}}}\,|e(k)-\hat{\mu}_{k}|^{\hat{\alpha}_{k}-1}\,\operatorname{sign}\left(e(k)-\hat{\mu}_{k}\right)\frac{\partial e(k)}{\partial W}\,. \qquad (19)$$

In this paper, to make a distinction, we denote by "SIG-Kernel" and "SIG-GGD" the SIG algorithms based on the kernel approach and on GGD densities, respectively. Compared with the SIG-Kernel algorithm, the SIG-GGD algorithm just needs to estimate the three parameters of the GGD density, without resorting to the choice of a kernel width and the calculation of kernel functions.

Comparing (1) and (19), we find that when $\hat{\mu}_{k} \approx 0$, the SIG-GGD algorithm can be regarded as an LMP algorithm with adaptive order $\hat{\alpha}_{k}$ and variable step-size $\eta/\hat{\beta}_{k}^{\hat{\alpha}_{k}}$. In fact, under certain conditions, the SIG-GGD algorithm converges to a specific LMP algorithm. Consider the finite impulse response (FIR) system identification case, in which the plant and the adaptive model are both FIR filters of the same order. In this case, the error signal $e(k)$ is given by

$$e(k) = W^{*T}X(k) + n(k) - W^{T}(k)X(k) = V^{T}(k)X(k) + n(k)\,, \qquad (20)$$

where $W^{*} = [w_{1}^{*}, w_{2}^{*}, \ldots, w_{M}^{*}]^{T}$ is the weight vector of the unknown system, $n(k)$ is the interference noise, $X(k) = [x(k), x(k-1), \ldots, x(k-M+1)]^{T}$ is the input vector, and $V(k) = W^{*} - W(k)$ is the weight error vector. Then, near convergence, we have $W(k) \approx W^{*}$, and hence

$$e(k) = \left(W^{*} - W(k)\right)^{T}X(k) + n(k) \approx n(k)\,. \qquad (21)$$

If $\{n(k)\}$ is zero-mean white noise, we have $\hat{\mu}_{k} \approx 0$ and

$$\hat{\alpha}_{k} \approx K^{-1}\left(\frac{\frac{1}{L}\sum_{i=k-L+1}^{k} n^{2}(i)}{\sqrt{\frac{1}{L}\sum_{i=k-L+1}^{k} n^{4}(i)}}\right), \qquad \hat{\beta}_{k} \approx \sqrt{\left(\frac{1}{L}\sum_{i=k-L+1}^{k} n^{2}(i)\right)\frac{\Gamma(1/\hat{\alpha}_{k})}{\Gamma(3/\hat{\alpha}_{k})}}\,. \qquad (22)$$
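A minimal sketch (ours) of one SIG-GGD step, Eq. (19), for the FIR case where $\partial e(k)/\partial W = -X(k)$; the step-size value is a placeholder, and the cap on the shape parameter mirrors the setting described in Sec. 4:

```python
import numpy as np

def sig_ggd_step(W, x, d, mu_k, alpha_k, beta_k, eta=0.01, alpha_max=4.0):
    """One SIG-GGD update, Eq. (19), for an FIR model y(k) = W^T x(k)."""
    alpha_k = min(alpha_k, alpha_max)       # cap on alpha, as in Sec. 4
    e = d - W @ x                           # e(k); de(k)/dW = -x(k)
    u = e - mu_k
    grad = (alpha_k / beta_k ** alpha_k) * np.abs(u) ** (alpha_k - 1) * np.sign(u)
    return W + eta * grad * x, e            # minus sign of Eq. (19) times -x(k)
```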

Assuming $L$ is large enough, by the limit theorems of probability,

$$\frac{1}{L}\sum_{i=k-L+1}^{k} n^{2}(i) \to E\left[n^{2}(k)\right], \qquad \frac{1}{L}\sum_{i=k-L+1}^{k} n^{4}(i) \to E\left[n^{4}(k)\right]. \qquad (23)$$

Thus, the estimated parameters $\hat{\alpha}_{k}$ and $\hat{\beta}_{k}$ approach certain constants; i.e., the SIG-GGD algorithm converges to a certain LMP algorithm with fixed order and step-size. For example, if $n(k)$ is zero-mean Gaussian distributed (shape parameter $\alpha = 2$), we have $\hat{\alpha}_{k} \approx 2$ near convergence, and consequently the SIG-GGD algorithm converges to the LMS algorithm. Similarly, for Laplace noise, the SIG-GGD algorithm converges to the LAD algorithm. In Ref. 4, it has been shown that under slow adaptation, the LMS and LAD algorithms are, respectively, the optimum algorithms for Gaussian and Laplace interfering noises. We may therefore conclude that the SIG-GGD algorithm has the ability to adjust its parameter pair ($\hat{\alpha}_{k}$, $\hat{\beta}_{k}$) so as to switch to a certain optimum algorithm. This automatic switching property is verified in the following section through a set of Monte Carlo simulations.

4. Simulation Results

To demonstrate the performance of the SIG-GGD algorithm, we perform a series of Monte Carlo simulations of FIR channel identification, in comparison with the SIG-Kernel, LAD, LMS, and LMF algorithms. Let the weight vector of the unknown channel be $W^{*} = [0.1, 0.3, 0.5, 0.3, 0.1]^{T}$, and let the input signal be white Gaussian noise with unit power. For the SIG-Kernel algorithm, the Gaussian kernel is used, and the kernel width is chosen according to the well-known Silverman's rule of thumb [13]. For the SIG-GGD algorithm, the kurtosis-based moment-matching estimator is used (see Ref. 18). To evaluate the inverse function $K^{-1}(\cdot)$, we adopt basic linear interpolation with step 0.05. In addition, to avoid very large gradients, we set the upper bound of $\hat{\alpha}_{k}$ to 4; that is, if $\hat{\alpha}_{k} > 4$, we set $\hat{\alpha}_{k} = 4$.
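Putting the pieces together, here is a condensed sketch (ours; it reuses GGDTracker and sig_ggd_step from above, and the run length, window length, step-size, warm-up length, and Laplace noise scale are placeholders) of one identification run measuring the mean-square deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
W_true = np.array([0.1, 0.3, 0.5, 0.3, 0.1])        # unknown channel
M, N, L = len(W_true), 5000, 100
x_sig = rng.standard_normal(N)                      # unit-power white Gaussian input
noise = rng.laplace(scale=1 / np.sqrt(2), size=N)   # unit-variance Laplace noise

W, tracker, msd = np.zeros(M), GGDTracker(L=L), []
for k in range(M, N):
    x = x_sig[k - M + 1:k + 1][::-1]                # input vector X(k)
    d = W_true @ x + noise[k]                       # noisy desired response d(k)
    mu_k, a_k, b_k = tracker.update(d - W @ x)      # sliding-window GGD estimates
    if k > M + 10:                                  # short warm-up for the window
        W, _ = sig_ggd_step(W, x, d, mu_k, a_k, b_k, eta=0.005)
    msd.append(np.sum((W_true - W) ** 2))           # MSD = V(k)^T V(k)
```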

In the experiments below, we focus on four special distributions of the disturbance noise, whose density functions are depicted in Fig. 4. Figures 5-8 illustrate the convergence curves of the mean-square deviation (MSD), $E[V^{T}(k)V(k)]$, averaged over independent Monte Carlo runs for each disturbance noise. Figure 9 shows the averaged parameter $\hat{\alpha}_{k}$ for the Laplace, Gaussian, and Uniform noises. To compare the statistical results, we also summarize in Table 1 the sample mean and standard deviation of the weight $w_{3}$ ($w_{3}^{*} = 0.5$).

Fig. 4. Probability density functions of the four disturbance noises (Gaussian, Laplace, Uniform, and MixNorm).

Fig. 5. Convergence curves of the mean-square deviation, MSD (dB) versus iterations, for the SIG-Kernel, SIG-GGD, LAD, LMS, and LMF algorithms (Laplace noise case).

Fig. 6. Convergence curves of the mean-square deviation, MSD (dB) versus iterations, for the SIG-Kernel, SIG-GGD, LAD, LMS, and LMF algorithms (Gaussian noise case).

Fig. 7. Convergence curves of the mean-square deviation, MSD (dB) versus iterations, for the SIG-Kernel, SIG-GGD, LAD, LMS, and LMF algorithms (Uniform noise case).

Fig. 8. Convergence curves of the mean-square deviation, MSD (dB) versus iterations, for the SIG-Kernel, SIG-GGD, LAD, LMS, and LMF algorithms (MixNorm noise case).

Table 1. Sample mean and standard deviation of $w_{3}$ over the Monte Carlo simulations, for the SIG-Kernel, SIG-GGD, LAD, LMS, and LMF algorithms under the Laplace, Gaussian, Uniform, and MixNorm noises.

From the figures and the table, we have the following observations:

(1) The performance of the LAD, LMS, and LMF algorithms depends crucially on the distribution of the disturbance noise. Each of the three may achieve the smallest misadjustment for a certain noise distribution (e.g., the LMS algorithm for Gaussian noise); however, for other noise distributions, their performance deteriorates dramatically.

(2) The SIG-GGD and SIG-Kernel algorithms are both robust to the noise distribution. In the case of symmetric and unimodal noise (e.g., Laplace, Gaussian, Uniform), the SIG-GGD algorithm may achieve a smaller misadjustment than the SIG-Kernel algorithm. Although in the case of nonsymmetric and nonunimodal noise (e.g., MixNorm) the performance of the SIG-GGD algorithm may not be as good as that of the SIG-Kernel algorithm, it is still better than that of most of the LMP algorithms.

(3) Near convergence, the parameter $\hat{\alpha}_{k}$ converges approximately to 1, 2, and 4 (note that $\hat{\alpha}_{k}$ is artificially restricted to $\hat{\alpha}_{k} \leq 4$) when disturbed by the Laplace, Gaussian, and Uniform noises, respectively, which implies that the SIG-GGD algorithm switches approximately to the LAD, LMS, and LMF algorithms for these noises. This agrees with the discussion in Sec. 3.

Fig. 9. Parameter $\hat{\alpha}_{k}$ averaged across independent Monte Carlo runs: (a) Laplace noise; (b) Gaussian noise; (c) Uniform noise.

5. Conclusions

We have presented in this paper a new stochastic information gradient (SIG) algorithm, in which the error distribution is modeled by the generalized Gaussian density (GGD). Unlike the kernel-based SIG algorithm, the GGD-based algorithm does not resort to the choice of a kernel width, relying only on estimates of the location, shape, and dispersion parameters during adaptation. The relationships between the proposed algorithm and the least mean p-power (LMP)

algorithm are investigated. The new algorithm has the ability to adjust its parameters so as to switch approximately to a certain optimal LMP algorithm. Simulation results suggest that when the noise has a symmetric and unimodal distribution, the SIG-GGD algorithm may outperform the SIG-Kernel algorithm. In the case of a nonsymmetric or nonunimodal disturbance, the performance of the SIG-GGD algorithm may not be as good as that of the SIG-Kernel algorithm; however, it still outperforms most of the LMP algorithms.

Acknowledgments

This work was supported by the National Natural Science Foundation of China, and was partially supported by NSF Grants ECCS 0856441 and IIS 0964197, and by ONR Grant N00014.

Appendix A

To solve the optimization problem (8), we create an unconstrained expression for the entropy using Lagrange multipliers:

$$L = -\int_{-\infty}^{+\infty} p(x)\log p(x)\,dx - (\lambda_{0}-1)\left(\int_{-\infty}^{+\infty} p(x)\,dx - 1\right) - \lambda_{1}\left(\int_{-\infty}^{+\infty} |x-\mu|^{\alpha}\,p(x)\,dx - m_{\alpha}\right). \qquad (A.1)$$

Using the calculus of variations, we maximize $L$ with respect to $p(x)$:

$$\frac{\partial}{\partial p}\left\{-p\log p - (\lambda_{0}-1)p - \lambda_{1}|x-\mu|^{\alpha}p\right\} = 0 \;\Rightarrow\; -\log p - \lambda_{0} - \lambda_{1}|x-\mu|^{\alpha} = 0 \;\Rightarrow\; p = \exp\left\{-\lambda_{0} - \lambda_{1}|x-\mu|^{\alpha}\right\}, \qquad (A.2)$$

where $\lambda_{0}$ and $\lambda_{1}$ are determined by

$$\int_{-\infty}^{+\infty} \exp\left\{-\lambda_{0} - \lambda_{1}|x-\mu|^{\alpha}\right\}dx = 1\,, \qquad \int_{-\infty}^{+\infty} |x-\mu|^{\alpha}\exp\left\{-\lambda_{0} - \lambda_{1}|x-\mu|^{\alpha}\right\}dx = m_{\alpha}\,. \qquad (A.3)$$

Hence, it is easy to derive $\lambda_{0}$ and $\lambda_{1}$ as follows:

$$\lambda_{0} = \log\left\{\frac{2\beta\,\Gamma(1/\alpha)}{\alpha}\right\}, \qquad \lambda_{1} = \frac{1}{\beta^{\alpha}}\,. \qquad (A.4)$$

Combining (A.2) and (A.4) yields the GGD distribution (5). This completes the proof.

References

1. S. Haykin, Adaptive Filter Theory, 3rd edn. (Prentice Hall, Upper Saddle River, NJ, 1996).
2. B. Widrow and S. D. Stearns, Adaptive Signal Processing (Prentice Hall, Englewood Cliffs, NJ, 1985).
3. E. Walach and B. Widrow, The least mean fourth (LMF) adaptive algorithm and its family, IEEE Trans. Inform. Theory 30 (1984).
4. S. C. Douglas and H.-Y. Meng, Stochastic gradient adaptation under general error criteria, IEEE Trans. Signal Process. 42 (1994).
5. S. C. Pei and C. C. Tseng, Least mean p-power error criterion for adaptive FIR filter, IEEE J. Select. Areas Commun. 12 (1994).
6. Y. Xiao, Y. Tadokoro and K. Shida, Adaptive algorithm based on least mean p-power error criterion for Fourier analysis in additive noise, IEEE Trans. Signal Process. 47 (1999).
7. V. J. Mathews and S. H. Cho, Improved convergence analysis of stochastic gradient adaptive filters using the sign algorithm, IEEE Trans. Acoust. Speech Signal Process. 35 (1987).
8. D. Erdogmus and J. C. Principe, Generalized information potential criterion for adaptive system training, IEEE Trans. Neural Netw. 13 (2002).
9. D. Erdogmus, K. E. Hild II and J. C. Principe, Online entropy manipulation: Stochastic information gradient, IEEE Signal Process. Lett. 10 (2003).
10. D. Erdogmus and J. C. Principe, Convergence properties and data efficiency of the minimum error entropy criterion in Adaline training, IEEE Trans. Signal Process. 51 (2003).
11. T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley, New York, 1991).
12. D. Erdogmus and J. C. Principe, Comparison of entropy and mean square error criteria in adaptive system training using higher order statistics, Proc. Second Int. Workshop on Independent Component Analysis and Blind Signal Separation (2000).
13. B. W. Silverman, Density Estimation for Statistics and Data Analysis (Chapman & Hall, New York, 1986).
14. M. K. Varanasi and B. Aazhang, Parametric generalized Gaussian density estimation, J. Acoust. Soc. Amer. 86 (1989).
15. R. L. Joshi and T. R. Fischer, Comparison of generalized Gaussian and Laplacian modeling in DCT image coding, IEEE Signal Process. Lett. 2 (1995).
16. F. Wang, H. Li and R. Li, Unified parametric and nonparametric ICA algorithms for hybrid source signals and stability analysis, Int. J. Innovative Comput. Inform. Control 4 (2008).
17. K. Sharifi and A. Leon-Garcia, Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video, IEEE Trans. Circuits Syst. Video Technol. 5 (1995).
18. K. Kokkinakis and A. K. Nandi, Exponent parameter estimation for generalized Gaussian probability density functions with application to speech modeling, Signal Process. 85 (2005).
19. M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions (Dover, New York, 1970).
20. E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev. 106 (1957).
21. S. Mallat, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989).

22. S. Choi, A. Cichocki and S. Amari, Flexible independent component analysis, J. VLSI Signal Process. 26 (2000).
23. B. Chen, J. Hu, L. Pu and Z. Sun, Stochastic gradient algorithm under (h, φ)-entropy criterion, Circuits Syst. Signal Process. 26 (2007).
24. A. Singh and J. C. Principe, Information theoretic learning with adaptive kernels, Signal Process. 91 (2011).
