Entropy 2015, 17, 5549-5560; doi:10.3390/e17085549

OPEN ACCESS
entropy
ISSN 1099-4300
www.mdpi.com/journal/entropy

Article

Convergence of a Fixed-Point Minimum Error Entropy Algorithm

Yu Zhang 1, Badong Chen 2,*, Xi Liu 2, Zejian Yuan 2 and Jose C. Principe 2,3

1 School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China; E-Mail: zhangyu80@zju.edu.cn
2 School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; E-Mails: lxi0@163.com (X.L.); yzejian@gmail.com (Z.Y.)
3 Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA; E-Mail: principe@cnel.ufl.edu

* Author to whom correspondence should be addressed; E-Mail: chenbd@mail.xjtu.edu.cn.

Academic Editor: Kevin H. Knuth

Received: 3 May 2015 / Accepted: 8 July 2015 / Published: 3 August 2015

Abstract: The minimum error entropy (MEE) criterion is an important learning criterion in information theoretic learning (ITL). However, the MEE solution cannot be obtained in closed form even for a simple linear regression problem, and one has to search for it, usually, in an iterative manner. The fixed-point iteration is an efficient way to solve the MEE solution. In this work, we study a fixed-point MEE algorithm for linear regression, and our focus is mainly on the convergence issue. We provide a sufficient condition (although a little loose) that guarantees the convergence of the fixed-point MEE algorithm. An illustrative example is also presented.

Keywords: information theoretic learning (ITL); minimum error entropy (MEE) criterion; fixed-point algorithm

MSC Codes: 62B10

1. Introduction

In recent years, information theoretic measures, such as entropy and mutual information, have been widely applied in domains of machine learning (so-called information theoretic learning (ITL) [1]) and
signal processing [1,2]. A possible main reason for the success of ITL is that information theoretic quantities can capture higher-order statistics of the data and offer potentially significant performance improvements in machine learning applications [1]. Based on the Parzen window method [3], the smooth and nonparametric information theoretic estimators can be applied directly to the data without imposing any a priori assumptions (say, the Gaussian assumption) about the underlying probability density function (PDF). In particular, Renyi's quadratic entropy estimator can be easily calculated by a double sum over samples [4-7]. The entropy in supervised learning serves as a measure of similarity and follows a framework similar to that of the well-known mean square error (MSE) [1,2]. An adaptive system can be trained by minimizing the entropy of the error over the training dataset [4]. This learning criterion is called the minimum error entropy (MEE) criterion [1,2,8-10]. MEE may achieve much better performance than MSE, especially when the data are heavy-tailed or multimodal non-Gaussian [1,2,10]. However, the MEE solution cannot be obtained in closed form even when the system is a simple linear model such as a finite impulse response (FIR) filter. A practical approach is to search the solution over the performance surface by an iterative algorithm. Usually, a simple gradient based search algorithm is adopted. With a gradient based learning algorithm, however, one has to select a proper learning rate (or step-size) to ensure stability and achieve a good tradeoff between misadjustment and convergence speed [4-7]. Another, more promising, search algorithm is the fixed-point iterative algorithm, which is step-size free and is often much faster than gradient based methods [12]. Fixed-point algorithms have received considerable attention in machine learning and signal processing due to their desirable properties of low computational requirements and fast convergence speed [12-17].

Convergence is a key issue for an iterative learning algorithm. For the gradient based MEE algorithms, the convergence problem has already been studied and some theoretical results have been obtained [6,7]. For the fixed-point MEE algorithm, up to now there has been no study concerning the convergence. The goal of this paper is to study the convergence of a fixed-point MEE algorithm and provide a sufficient condition that ensures convergence to a unique solution (the fixed point). It is worth noting that the convergence of a fixed-point maximum correntropy criterion (MCC) algorithm has been studied in [18].

The remainder of the paper is organized as follows. In Section 2, we derive a fixed-point MEE algorithm. In Section 3, we prove a sufficient condition to guarantee the convergence. In Section 4, we present an illustrative example. Finally, in Section 5, we give the conclusion.

2. Fixed-Point MEE Algorithm

Consider a simple linear regression (filtering) case where the error signal is

e(i) = d(i) - y(i) = d(i) - W^T X(i)    (1)

with d(i) being a desired value at time i, y(i) = W^T X(i) the output of the linear model, W = [w_1, w_2, ..., w_m]^T the weight vector, and X(i) = [x_1(i), x_2(i), ..., x_m(i)]^T the input vector (i.e., the regressor). The goal is to find a weight vector such that the error signal is as small as possible. Under the MEE criterion, the optimal weight vector is obtained by minimizing the error entropy [1,2]. With Renyi's quadratic entropy, the MEE solution can be expressed as
W_{MEE} = \arg\min_W \left( -\log \int p_e^2(x)\,dx \right) = \arg\max_W \int p_e^2(x)\,dx    (2)

where p_e(\cdot) denotes the PDF of the error signal. In ITL, the quantity \int p_e^2(x)\,dx is also called the quadratic information potential (QIP) [1]. In a practical situation, however, the error distribution is usually unknown, and one has to estimate it from the error samples \{e(1), e(2), \ldots, e(N)\}, where N denotes the sample number. Based on the Parzen window approach [3], the estimated PDF takes the form

\hat{p}_e(x) = \frac{1}{N} \sum_{i=1}^{N} \kappa_\sigma(x - e(i))    (3)

where \kappa_\sigma(\cdot) stands for a kernel function (not necessarily a Mercer kernel), satisfying \kappa_\sigma(x) \ge 0 and \int \kappa_\sigma(x)\,dx = 1. Unless mentioned otherwise, the kernel function is selected as a Gaussian kernel, given by

\kappa_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{x^2}{2\sigma^2} \right)    (4)

where \sigma denotes the kernel bandwidth. With a Gaussian kernel, the QIP can be simply estimated as

\int \hat{p}_e^2(x)\,dx = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \int \kappa_\sigma(x - e(i))\,\kappa_\sigma(x - e(j))\,dx = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))    (5)

Therefore, in practical situations, the MEE solution of (2) becomes

W_{MEE} = \arg\max_W \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))    (6)

Unfortunately, there is no closed form solution of (6). One can apply a gradient based iterative algorithm to search for the solution, starting from an initial point. Below we derive a fixed-point iterative algorithm, which is, in general, much faster than a gradient based method (although a gradient method can be viewed as a special case of the fixed-point method, it involves a step-size parameter). Let us take the following first order derivative:

\frac{\partial}{\partial W} \left[ \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j)) \right]
= \frac{1}{2\sigma^2 N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(e(i) - e(j))\,[X(i) - X(j)]
= \frac{1}{2\sigma^2 N^2} \left\{ \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)] - \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T W \right\}
= \frac{1}{2\sigma^2 N^2} \left\{ P_{dX} - R_{EE} W \right\}    (7)

where
R_{EE} = \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T,
P_{dX} = \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)]    (8)

Setting the derivative in (7) to zero, i.e., \frac{\partial}{\partial W} \left[ \frac{1}{N^2} \sum_{i}\sum_{j} \kappa_{\sqrt{2}\sigma}(e(i) - e(j)) \right] = 0, and assuming that the matrix R_{EE} is invertible, we obtain the following solution [15]:

W = R_{EE}^{-1} P_{dX}    (9)

The above solution is, in form, very similar to the well-known Wiener solution [19]. However, it is not a closed form solution, since both the matrix R_{EE} and the vector P_{dX} depend on the weight vector W (note that e(i) depends on W). Therefore, the solution of (9) is actually a fixed-point equation, which can also be expressed as W = f(W), where

f(W) = R_{EE}^{-1} P_{dX}    (10)

The solution (fixed point) of the equation W = f(W) can be found by the following iterative fixed-point algorithm:

W_{k+1} = f(W_k)    (11)

where W_k denotes the estimated weight vector at iteration k. This algorithm is called the fixed-point MEE algorithm [15]. An online fixed-point MEE algorithm was also derived in [15]. In the next section, we will prove a sufficient condition under which the algorithm (11) surely converges to a unique fixed point.
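To make the update concrete, the following Python sketch implements Equations (8), (9) and (11) in batch form. It is only a minimal illustration: the names fixed_point_mee and gauss_kernel are not from the original paper, and the kernel's normalization constant, which cancels between R_{EE} and P_{dX}, is kept merely for fidelity to (5).

import numpy as np

def gauss_kernel(u, sigma):
    # Gaussian kernel with bandwidth sqrt(2)*sigma, as in Equation (5)
    return np.exp(-u**2 / (4.0 * sigma**2)) / (2.0 * np.sqrt(np.pi) * sigma)

def fixed_point_mee(X, d, sigma, W0, n_iter=50):
    # X: (N, m) inputs X(i); d: (N,) desired responses d(i); W0: initial weight vector
    N, m = X.shape
    W = np.asarray(W0, dtype=float)
    dX = X[:, None, :] - X[None, :, :]        # pairwise X(i) - X(j), shape (N, N, m)
    dd = d[:, None] - d[None, :]              # pairwise d(i) - d(j), shape (N, N)
    for _ in range(n_iter):
        e = d - X @ W                         # error signal, Equation (1)
        de = e[:, None] - e[None, :]          # e(i) - e(j)
        k = gauss_kernel(de, sigma)           # kernel weights
        R = np.einsum('ij,ijk,ijl->kl', k, dX, dX)   # R_EE of Equation (8)
        P = np.einsum('ij,ij,ijk->k', k, dd, dX)     # P_dX of Equation (8)
        W = np.linalg.solve(R, P)             # W <- R_EE^{-1} P_dX, Equations (9) and (11)
    return W

Solving the linear system instead of forming R_{EE}^{-1} explicitly is only a standard numerical choice; it does not change the iteration itself.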
3. Convergence of the Fixed-Point MEE Algorithm

The convergence of a fixed-point algorithm can be proved by the well-known contraction mapping theorem (also known as the Banach fixed-point theorem) [11]. According to the contraction mapping theorem, the convergence of the fixed-point MEE algorithm (11) is guaranteed if there exist \beta > 0 and 0 < \alpha < 1 such that the initial weight vector satisfies \|W_0\|_p \le \beta, and for all W \in \{W : \|W\|_p \le \beta\}, it holds that

\| f(W) \|_p \le \beta, \qquad \| \nabla_W f(W) \|_p \le \alpha    (12)

where \|\cdot\|_p denotes an l_p-norm of a vector or an induced norm of a matrix, defined by \|A\|_p = \max_{X \ne 0} \|AX\|_p / \|X\|_p, with A \in \mathbb{R}^{m \times m} and X \in \mathbb{R}^m, and \nabla_W f(W) denotes the m \times m Jacobian matrix of f(W) with respect to W, given by

\nabla_W f(W) = \left[ \frac{\partial}{\partial w_1} f(W), \frac{\partial}{\partial w_2} f(W), \ldots, \frac{\partial}{\partial w_m} f(W) \right]    (13)

where

\frac{\partial}{\partial w_n} f(W) = \frac{\partial}{\partial w_n} \left( R_{EE}^{-1} P_{dX} \right) = -R_{EE}^{-1} \left( \frac{\partial R_{EE}}{\partial w_n} \right) R_{EE}^{-1} P_{dX} + R_{EE}^{-1} \frac{\partial P_{dX}}{\partial w_n}
= -\frac{1}{2\sigma^2} R_{EE}^{-1} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} (e(i) - e(j))\,(x_n(i) - x_n(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T \right] f(W)
+ \frac{1}{2\sigma^2} R_{EE}^{-1} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} (e(i) - e(j))\,(x_n(i) - x_n(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)] \right]    (14)
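The column \partial f(W)/\partial w_n in (14) can also be evaluated numerically, which is convenient for checking the contraction condition (12) on concrete data. Below is a sketch following the same pairwise-difference conventions as the earlier snippet; the helper name mee_jacobian_column is an assumption of this illustration, and the output can be cross-checked against a finite-difference approximation of f(W).

import numpy as np

def mee_jacobian_column(X, d, W, sigma, n):
    # Evaluates the n-th column (0-based index) of the Jacobian in Equation (14)
    N, m = X.shape
    dX = X[:, None, :] - X[None, :, :]                    # X(i) - X(j)
    dd = d[:, None] - d[None, :]                          # d(i) - d(j)
    e = d - X @ W
    de = e[:, None] - e[None, :]                          # e(i) - e(j)
    k = np.exp(-de**2 / (4 * sigma**2)) / (2 * np.sqrt(np.pi) * sigma)
    R = np.einsum('ij,ijk,ijl->kl', k, dX, dX)            # R_EE
    P = np.einsum('ij,ij,ijk->k', k, dd, dX)              # P_dX
    fW = np.linalg.solve(R, P)                            # f(W)
    w = de * dX[:, :, n] * k / (2 * sigma**2)             # common pairwise weight in (14)
    dR = np.einsum('ij,ijk,ijl->kl', w, dX, dX)           # dR_EE / dw_n
    dP = np.einsum('ij,ij,ijk->k', w, dd, dX)             # dP_dX / dw_n
    return np.linalg.solve(R, dP - dR @ fW)               # Equation (14)

Stacking these columns for n = 1, ..., m gives \nabla_W f(W), whose induced 1-norm is the quantity bounded in Theorem 2 below.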
To obtain a sufficient condition that guarantees the convergence of the fixed-point MEE algorithm (11), we prove two theorems below.

Theorem 1. If

\beta > \xi \triangleq \frac{ \sqrt{m}\, \sum_{i=1}^{N} \sum_{j=1}^{N} |d(i) - d(j)|\, \| X(i) - X(j) \|_1 }{ \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} [X(i) - X(j)][X(i) - X(j)]^T \right) }

and \sigma \ge \sigma^*, where \sigma^* is the solution of the equation \varphi(\sigma) = \beta, with

\varphi(\sigma) \triangleq \frac{ \sqrt{m}\, \sum_{i=1}^{N} \sum_{j=1}^{N} |d(i) - d(j)|\, \| X(i) - X(j) \|_1 }{ \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \exp\left( -\frac{ \left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right)^2 }{ 4\sigma^2 } \right) [X(i) - X(j)][X(i) - X(j)]^T \right) }, \quad \sigma \in (0, \infty)    (15)

then \| f(W) \|_1 \le \beta for all W \in \{ W : \| W \|_1 \le \beta \}.

Proof. The induced matrix norm is compatible with the corresponding vector l_p-norm, hence

\| f(W) \|_1 = \| R_{EE}^{-1} P_{dX} \|_1 \le \| R_{EE}^{-1} \|_1 \| P_{dX} \|_1    (16)

where \| R_{EE}^{-1} \|_1 is the 1-norm (also referred to as the column-sum norm) of the inverse matrix R_{EE}^{-1}, which is simply the maximum absolute column sum of the matrix. According to matrix theory, the following inequality holds:

\| R_{EE}^{-1} \|_1 \le \sqrt{m}\, \| R_{EE}^{-1} \|_2 = \sqrt{m}\, \lambda_{\max}\left( R_{EE}^{-1} \right) = \frac{ \sqrt{m} }{ \lambda_{\min}(R_{EE}) }    (17)

where \| R_{EE}^{-1} \|_2 is the 2-norm (also referred to as the spectral norm) of R_{EE}^{-1}, which equals the maximum eigenvalue of the matrix R_{EE}^{-1} since R_{EE}^{-1} is symmetric positive definite. Further, we have

\lambda_{\min}(R_{EE}) = \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T \right)
\overset{(a)}{\ge} \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}\left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right) [X(i) - X(j)][X(i) - X(j)]^T \right)    (18)

where (a) comes from

| e(i) - e(j) | = \left| d(i) - d(j) - W^T \left( X(i) - X(j) \right) \right| \le \left| W^T \left( X(i) - X(j) \right) \right| + | d(i) - d(j) | \le \beta \| X(i) - X(j) \|_1 + | d(i) - d(j) |    (19)

In addition, it holds that

\| P_{dX} \|_1 = \left\| \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)] \right\|_1
\overset{(b)}{\le} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\, | d(i) - d(j) |\, \| X(i) - X(j) \|_1
\overset{(c)}{\le} \frac{1}{2\sqrt{\pi}\,\sigma} \sum_{i=1}^{N} \sum_{j=1}^{N} | d(i) - d(j) |\, \| X(i) - X(j) \|_1    (20)

where (b) follows from the convexity of the vector l_1-norm, and (c) is because \kappa_{\sqrt{2}\sigma}(x) \le \frac{1}{2\sqrt{\pi}\sigma} for any x. Combining (16)-(18) and (20), we derive

\| f(W) \|_1 \le \frac{ \frac{\sqrt{m}}{2\sqrt{\pi}\sigma} \sum_{i=1}^{N} \sum_{j=1}^{N} |d(i) - d(j)|\, \| X(i) - X(j) \|_1 }{ \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}\left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right) [X(i) - X(j)][X(i) - X(j)]^T \right) }
= \frac{ \sqrt{m}\, \sum_{i=1}^{N} \sum_{j=1}^{N} |d(i) - d(j)|\, \| X(i) - X(j) \|_1 }{ \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \exp\left( -\frac{ \left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right)^2 }{ 4\sigma^2 } \right) [X(i) - X(j)][X(i) - X(j)]^T \right) } = \varphi(\sigma)    (21)
Clearly, the function \varphi(\sigma) is a continuous and monotonically decreasing function of \sigma over (0, \infty), satisfying \lim_{\sigma \to 0^+} \varphi(\sigma) = \infty and \lim_{\sigma \to \infty} \varphi(\sigma) = \xi. Therefore, if \beta > \xi, the equation \varphi(\sigma) = \beta has a unique solution \sigma^* over (0, \infty), and if \sigma \ge \sigma^*, we have \varphi(\sigma) \le \beta, which completes the proof.

Theorem 2. If \beta > \xi, with \xi defined as in Theorem 1, and \sigma \ge \max\{\sigma^*, \sigma^\dagger\}, where \sigma^* is the solution of the equation \varphi(\sigma) = \beta and \sigma^\dagger is the solution of the equation \psi(\sigma) = \alpha (0 < \alpha < 1), with

\psi(\sigma) \triangleq \frac{ \sqrt{m}\, \gamma }{ 2\sigma^2\, \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \exp\left( -\frac{ \left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right)^2 }{ 4\sigma^2 } \right) [X(i) - X(j)][X(i) - X(j)]^T \right) }, \quad \sigma \in (0, \infty)    (22)

in which

\gamma \triangleq \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right) \| X(i) - X(j) \|_1 \left( \beta \left\| [X(i) - X(j)][X(i) - X(j)]^T \right\|_1 + |d(i) - d(j)|\, \| X(i) - X(j) \|_1 \right)    (23)

then it holds that \| f(W) \|_1 \le \beta and \| \nabla_W f(W) \|_1 \le \alpha for all W \in \{ W : \| W \|_1 \le \beta \}.

Proof. By Theorem 1, we have \| f(W) \|_1 \le \beta. To prove \| \nabla_W f(W) \|_1 \le \alpha, it suffices to prove \left\| \frac{\partial}{\partial w_n} f(W) \right\|_1 \le \alpha for every n. By (14), we have

\left\| \frac{\partial}{\partial w_n} f(W) \right\|_1 \le \left\| \frac{1}{2\sigma^2} R_{EE}^{-1} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} (e(i) - e(j))\,(x_n(i) - x_n(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T \right] f(W) \right\|_1
+ \left\| \frac{1}{2\sigma^2} R_{EE}^{-1} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} (e(i) - e(j))\,(x_n(i) - x_n(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)] \right] \right\|_1    (24)
It is easy to derive

\left\| \frac{1}{2\sigma^2} R_{EE}^{-1} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} (e(i) - e(j))\,(x_n(i) - x_n(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T \right] f(W) \right\|_1
\overset{(d)}{\le} \frac{\beta}{2\sigma^2} \left\| R_{EE}^{-1} \right\|_1 \left\| \sum_{i=1}^{N} \sum_{j=1}^{N} (e(i) - e(j))\,(x_n(i) - x_n(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T \right\|_1
\overset{(e)}{\le} \frac{\beta \left\| R_{EE}^{-1} \right\|_1}{4\sqrt{\pi}\,\sigma^3} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right) \| X(i) - X(j) \|_1 \left\| [X(i) - X(j)][X(i) - X(j)]^T \right\|_1    (25)

where (d) follows from the convexity of the vector l_1-norm and \| f(W) \|_1 \le \beta, and (e) is due to the facts that | e(i) - e(j) |\,| x_n(i) - x_n(j) | \le \left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right) \| X(i) - X(j) \|_1 and \kappa_{\sqrt{2}\sigma}(x) \le \frac{1}{2\sqrt{\pi}\sigma} for any x. In a similar way, one can derive

\left\| \frac{1}{2\sigma^2} R_{EE}^{-1} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} (e(i) - e(j))\,(x_n(i) - x_n(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)] \right] \right\|_1
\le \frac{ \left\| R_{EE}^{-1} \right\|_1 }{4\sqrt{\pi}\,\sigma^3} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right) \| X(i) - X(j) \|_1\, |d(i) - d(j)|\, \| X(i) - X(j) \|_1    (26)

Then, combining (24)-(26), (17) and (18), we have

\left\| \frac{\partial}{\partial w_n} f(W) \right\|_1 \le \frac{ \left\| R_{EE}^{-1} \right\|_1 }{4\sqrt{\pi}\,\sigma^3}\, \gamma
\le \frac{ \sqrt{m}\, \gamma }{ 4\sqrt{\pi}\,\sigma^3\, \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}\left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right) [X(i) - X(j)][X(i) - X(j)]^T \right) }
= \frac{ \sqrt{m}\, \gamma }{ 2\sigma^2\, \lambda_{\min}\left( \sum_{i=1}^{N} \sum_{j=1}^{N} \exp\left( -\frac{ \left( \beta \| X(i) - X(j) \|_1 + |d(i) - d(j)| \right)^2 }{ 4\sigma^2 } \right) [X(i) - X(j)][X(i) - X(j)]^T \right) } = \psi(\sigma)    (27)
Obviously, \psi(\sigma) is also a continuous and monotonically decreasing function of \sigma over (0, \infty), satisfying \lim_{\sigma \to 0^+} \psi(\sigma) = \infty and \lim_{\sigma \to \infty} \psi(\sigma) = 0. Therefore, given 0 < \alpha < 1, the equation \psi(\sigma) = \alpha has a unique solution \sigma^\dagger over (0, \infty), and if \sigma \ge \sigma^\dagger, we have \psi(\sigma) \le \alpha. This completes the proof.

According to Theorem 2 and the Banach fixed-point theorem [11], given an initial weight vector satisfying \| W_0 \|_1 \le \beta, the fixed-point MEE algorithm (11) will surely converge to a unique fixed point in the range W \in \{ W : \| W \|_1 \le \beta \} provided that the kernel bandwidth \sigma is larger than a certain value. Moreover, the value of \alpha (0 < \alpha < 1) controls the convergence speed (a smaller \alpha implies a faster contraction). It is worth noting that the derived sufficient condition will certainly be a little loose, due to the zooming out (repeated loosening of the bounds) in the proof process.

4. Illustrative Example

In the following, we give an illustrative example to verify the derived sufficient condition that guarantees the convergence of the fixed-point MEE algorithm. Let us consider a simple linear model:

d(i) = X(i) + v(i)    (28)

where X(i) is a scalar input and v(i) is an additive noise. Assume that X(i) is uniformly distributed over [-3, 3] and v(i) is zero-mean Gaussian with variance 0.01. There are 200 training samples \{X(i), d(i)\}_{i=1}^{200} generated from the system (28). Based on these data we calculate

\xi = \frac{ \sum_{i=1}^{200} \sum_{j=1}^{200} |d(i) - d(j)|\, |X(i) - X(j)| }{ \sum_{i=1}^{200} \sum_{j=1}^{200} |X(i) - X(j)|^2 } = 1.1974    (29)

We choose \beta = 3 > \xi and \alpha = 0.9938 < 1. Then, by solving the equations \varphi(\sigma) = \beta and \psi(\sigma) = \alpha, we obtain \sigma^* = 1.38 and \sigma^\dagger = 2.68. Therefore, by Theorem 2, if \sigma \ge 2.68 the fixed-point MEE algorithm will converge to a unique solution in the range -3 \le W \le 3.
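For this scalar case (m = 1), \varphi(\sigma) and \psi(\sigma) can be evaluated directly from the training pairs. The Python sketch below follows the expressions of Theorems 1 and 2; the helper names phi_psi and smallest_sigma are illustrative only, and since the bounds depend on the particular random draw of the samples, the thresholds obtained this way need not coincide exactly with the values reported above.

import numpy as np

def phi_psi(X, d, beta, sigma):
    # phi(sigma) and psi(sigma) of Theorems 1 and 2 for a scalar input (m = 1)
    dX = np.abs(X[:, None] - X[None, :])      # |X(i) - X(j)|
    dd = np.abs(d[:, None] - d[None, :])      # |d(i) - d(j)|
    lam = np.sum(np.exp(-(beta * dX + dd)**2 / (4 * sigma**2)) * dX**2)   # lambda_min of a 1x1 matrix
    phi = np.sum(dd * dX) / lam
    gamma = np.sum((beta * dX + dd) * dX * (beta * dX**2 + dd * dX))
    psi = gamma / (2 * sigma**2 * lam)
    return phi, psi

def smallest_sigma(X, d, beta, alpha):
    # phi and psi are both decreasing in sigma, so scan a coarse grid of bandwidths
    for s in np.linspace(0.05, 50.0, 2000):
        phi, psi = phi_psi(X, d, beta, s)
        if phi <= beta and psi <= alpha:
            return s                          # smallest grid value satisfying Theorem 2
    return None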
Figures 1-3 illustrate the curves of the functions W, f(W) = R_{EE}^{-1} P_{dX} and df(W)/dW when \sigma = 3.0, 0.1 and 0.01, respectively. From the figures we observe: (i) when \sigma = 3.0 > 2.68, we have |f(W)| < 3 and |df(W)/dW| < \alpha for -3 \le W \le 3; (ii) when \sigma = 0.1 < 2.68, we still have |f(W)| < 3 and |df(W)/dW| < \alpha for -3 \le W \le 3. In this case, the algorithm will still converge to a unique solution in the range -3 \le W \le 3. This result confirms the fact that the derived sufficient condition is a little loose (i.e., far from being necessary). The main reason for this is that there is a lot of zooming out in the derivation process; (iii) however, when \sigma is too small, say 0.01, the condition |df(W)/dW| < \alpha will not hold for some W with -3 \le W \le 3. In this case, the algorithm may diverge.

Figure 1. Plot of the functions W, f(W) and df(W)/dW when \sigma = 3.0.

Figure 2. Plot of the functions W, f(W) and df(W)/dW when \sigma = 0.1.

Figure 3. Plot of the functions W, f(W) and df(W)/dW when \sigma = 0.01.

Table 1 shows the number of iterations for convergence with different kernel bandwidths (3.0, 1.0, 0.1, 0.05). The initial weight vector is set at W_0 = 0.1, and the stop condition for the convergence is

\frac{ \| W_k - W_{k-1} \| }{ \| W_{k-1} \| } < 10^{-6}    (30)
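A minimal, self-contained sketch of this experiment is given below. The random seed, the iteration cap and the printout format are choices of this illustration rather than values from the paper; the noise standard deviation 0.1 corresponds to the variance 0.01 of model (28).

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, 200)                 # scalar inputs of model (28)
d = X + rng.normal(0.0, 0.1, 200)               # desired responses, noise std 0.1

for sigma in (3.0, 1.0, 0.1, 0.05):
    w, iters = 0.1, 0                           # initial weight W0 = 0.1
    while True:
        e = d - w * X
        de = e[:, None] - e[None, :]
        k = np.exp(-de**2 / (4 * sigma**2))     # kernel constant cancels in the ratio below
        dX = X[:, None] - X[None, :]
        dd = d[:, None] - d[None, :]
        w_new = np.sum(k * dd * dX) / np.sum(k * dX**2)    # scalar form of Equation (9)
        iters += 1
        if abs(w_new - w) / abs(w) < 1e-6 or iters > 500:  # stopping rule (30), plus a cap
            break
        w = w_new
    print(f"sigma = {sigma}: {iters} iterations, w = {w_new:.4f}")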
As one can see, when \sigma = 3.0 \ge \max\{\sigma^*, \sigma^\dagger\}, the fixed-point MEE algorithm surely converges to a solution within a few iterations. When \sigma becomes smaller, the algorithm may still converge, but the convergence speed becomes much slower. Note that when \sigma is too small (e.g., \sigma = 0.01), the algorithm will diverge (the corresponding results are not shown in Table 1).

Table 1. Number of iterations for convergence with different kernel bandwidths.

\sigma         3.0    1.0    0.1    0.05
Iterations      3      4      6      43

5. Conclusion

The MEE criterion has received increasing attention in signal processing and machine learning due to its desirable performance in adaptive system training, especially with non-Gaussian data. Many iterative optimization methods have been developed to minimize the error entropy for practical use, but fixed-point algorithms have been seldom studied and, in particular, too little attention has been paid to the convergence issue of the fixed-point MEE algorithm. This paper presented a theoretical study of this problem and proved a sufficient condition to guarantee the convergence of a fixed-point MEE algorithm. The results of this study may provide a possible range for choosing a kernel bandwidth for MEE learning. However, the derived sufficient condition may give a much larger kernel bandwidth than a desired one due to the zooming out in the formula derivation process. In future studies, we will try to derive a tighter sufficient condition that ensures the convergence of the fixed-point MEE algorithm.

Acknowledgments

This work was supported by the 973 Program (No. 2015CB351703) and the National NSF of China (No. 61372152).

Author Contributions

Yu Zhang and Badong Chen proved the main theorems in this paper, Xi Liu presented the illustrative example, Zejian Yuan and Jose C. Principe polished the language and were in charge of the technical checking. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Principe, J.C. Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives; Springer: New York, NY, USA, 2010.
2. Chen, B.; Zhu, Y.; Hu, J.; Principe, J.C. System Parameter Identification: Information Criteria and Algorithms; Elsevier: Amsterdam, The Netherlands, 2013.
3. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman & Hall: New York, NY, USA, 1986.
4. Erdogmus, D.; Principe, J.C. An error-entropy minimization algorithm for supervised training of nonlinear adaptive systems. IEEE Trans. Signal Process. 2002, 50, 1780-1786.
5. Erdogmus, D.; Principe, J.C. Generalized information potential criterion for adaptive system training. IEEE Trans. Neural Netw. 2002, 13, 1035-1044.
6. Erdogmus, D.; Principe, J.C. Convergence properties and data efficiency of the minimum error entropy criterion in ADALINE training. IEEE Trans. Signal Process. 2003, 51, 1966-1978.
7. Chen, B.; Zhu, Y.; Hu, J. Mean-square convergence analysis of ADALINE training with minimum error entropy criterion. IEEE Trans. Neural Netw. 2010, 21, 1168-1179.
8. Chen, B.; Principe, J.C. Some further results on the minimum error entropy estimation. Entropy 2012, 14, 966-977.
9. Chen, B.; Principe, J.C. On the smoothed minimum error entropy criterion. Entropy 2012, 14, 2311-2323.
10. Marques de Sá, J.P.; Silva, L.M.A.; Santos, J.M.F.; Alexandre, L.A. Minimum Error Entropy Classification; Springer: London, UK, 2013.
11. Agarwal, R.P.; Meehan, M.; O'Regan, D. Fixed Point Theory and Applications; Cambridge University Press: Cambridge, UK, 2001.
12. Cichocki, A.; Amari, S. Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications; Wiley: New York, NY, USA, 2002.
13. Regalia, P.A.; Kofidis, E. Monotonic convergence of fixed-point algorithms for ICA. IEEE Trans. Neural Netw. 2003, 14, 943-949.
14. Fiori, S. Fast fixed-point neural blind-deconvolution algorithm. IEEE Trans. Neural Netw. 2004, 15, 455-459.
15. Han, S.; Principe, J.C. A fixed-point minimum error entropy algorithm. In Proceedings of the 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, Arlington, VA, USA, 6-8 September 2006; pp. 167-172.
16. Chen, J.; Richard, C.; Bermudez, J.C.M.; Honeine, P. Non-negative least-mean-square algorithm. IEEE Trans. Signal Process. 2011, 59, 5225-5235.
17. Chen, J.; Richard, C.; Bermudez, J.C.M.; Honeine, P. Variants of non-negative least-mean-square algorithm and convergence analysis. IEEE Trans. Signal Process. 2014, 62, 3990-4005.
18. Chen, B.; Wang, J.; Zhao, H.; Zheng, N.; Principe, J.C. Convergence of a fixed-point algorithm under maximum correntropy criterion. IEEE Signal Process. Lett. 2015, 22, 1723-1727.
19. Kailath, T.; Sayed, A.H.; Hassibi, B. Linear Estimation; Prentice Hall: Upper Saddle River, NJ, USA, 2000.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).