Key Diagonal Blocks of the Fisher Information Matrix on Neural Manifold of Full-Parametrised Multilayer Perceptrons

Xiongzhi Chen
Mathematical College, Sichuan University, Chengdu 610064, P. R. China

Abstract: It is well known that natural gradient learning (NGL) ([1]) may avoid local optima or plateau phenomena in the training process, since it takes into consideration the intrinsic geometric structure of the parameter space. However, the natural gradient ([1]) is itself induced by the Fisher information matrix (FIM) ([2]) defined on the 1-form tangent space ([3]); therefore, calculation of the relevant FIM is key to the realization of NGL. This paper gives an explicit derivation and a compact matrix representation of the diagonal blocks, and of their inverses as well, of the FIM-based Riemannian metric on the neural manifold of full-parametrised multilayer perceptrons (MLPs), thus extending and complementing the results partially given in [1] and [3].

Keywords: Neural manifold, Natural gradient learning, Fisher information matrix (FIM), Full-parametrised multilayer perceptrons (FP-MLPs).

1 Introduction

The simplest neural manifold is the space of all single feedforward neurons (i.e., perceptrons with input nodes and a single output node). We use the same settings and notations proposed in [1], without restating the detailed assumptions on the variables that occur in the model

$$y = f(w\cdot x) + \varepsilon, \tag{I}$$

where $x \sim N(0, I_n)$, $\varepsilon \sim N(0,\sigma^2)$, $x$ and $\varepsilon$ are statistically independent, $w = (w_1,\dots,w_n)^T \in \mathbb{R}^n$, and $|w|$ denotes the Euclidean norm of $w$ throughout this paper. It is obtained in [1] that the FIM is

$$G(w) = |w|^2 c_1(w)\, I + \big(c_2(w) - c_1(w)\big)\, w w^T. \tag{1}$$

However, the mathematical details of how (1) is derived were not fully shown there.
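A quick numerical check of formula (1) can be sketched as follows. Everything specific in it is an assumption made for illustration: the activation is taken to be $f = \tanh$, $\sigma^2 = 1$, the input dimension is $n = 2$, and $c_1, c_2$ are normalised as $c_k(w) = \frac{1}{\sqrt{2\pi}\,\sigma^2 |w|^2}\int f'(|w|s)^2 s^{2(k-1)} e^{-s^2/2}\,ds$, which is the normalisation under which (1) agrees with $\frac{1}{\sigma^2}E[f'(w\cdot x)^2\, x x^T]$. The function names are hypothetical.

```python
import math

def fprime(u):
    # Derivative of the assumed activation f = tanh.
    t = math.tanh(u)
    return 1.0 - t * t

def c_coeff(norm_w, power, sigma2=1.0, half_width=8.0, steps=3200):
    """Trapezoid rule for
    (1 / (sqrt(2*pi) * sigma2 * |w|^2)) * int f'(|w| s)^2 s^power exp(-s^2/2) ds.
    power=0 gives c1, power=2 gives c2 (assumed normalisation)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        s = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * fprime(norm_w * s) ** 2 * s ** power * math.exp(-0.5 * s * s)
    return total * h / (math.sqrt(2.0 * math.pi) * sigma2 * norm_w ** 2)

def fim_closed_form(w, sigma2=1.0):
    """G(w) = |w|^2 c1 I + (c2 - c1) w w^T, i.e. formula (1)."""
    n = len(w)
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    c1 = c_coeff(norm_w, 0, sigma2)
    c2 = c_coeff(norm_w, 2, sigma2)
    return [[(norm_w ** 2 * c1 if i == j else 0.0) + (c2 - c1) * w[i] * w[j]
             for j in range(n)] for i in range(n)]

def fim_direct_2d(w, sigma2=1.0, half_width=8.0, steps=320):
    """(1/sigma2) * E[f'(w.x)^2 x x^T] for x ~ N(0, I_2),
    evaluated by a tensor-product trapezoid rule on a square grid."""
    h = 2.0 * half_width / steps
    g = [[0.0, 0.0], [0.0, 0.0]]
    for a in range(steps + 1):
        x1 = -half_width + a * h
        wa = 0.5 if a in (0, steps) else 1.0
        for b in range(steps + 1):
            x2 = -half_width + b * h
            wb = 0.5 if b in (0, steps) else 1.0
            density = math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2.0 * math.pi)
            val = wa * wb * density * fprime(w[0] * x1 + w[1] * x2) ** 2
            x = (x1, x2)
            for i in range(2):
                for j in range(2):
                    g[i][j] += val * x[i] * x[j]
    return [[g[i][j] * h * h / sigma2 for j in range(2)] for i in range(2)]

if __name__ == "__main__":
    w = [0.9, -1.2]  # arbitrary test weight vector, |w| = 1.5
    closed = fim_closed_form(w)
    direct = fim_direct_2d(w)
    gap = max(abs(closed[i][j] - direct[i][j]) for i in range(2) for j in range(2))
    print("max entrywise gap between (1) and the direct expectation:", gap)
```

The closed form only needs two one-dimensional integrals whatever the dimension $n$, whereas the direct expectation is an $n$-dimensional integral; this is the practical value of a representation such as (1).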
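To investigate the FIM on neural manifolds $M$ of full-parametrised MLPs (which are MLPs with additional threshold variables), paper [4] shows the block form of this FIM. For the neural manifold $M$ of all fully connected n-p-1 type MLPs whose input-output relation is

$$y = \mathbf{f}(x) + \varepsilon = \sum_{i=1}^{p} v_i\, f(w_i\cdot x + \theta_i) + \varepsilon,$$

where $x$ and $\varepsilon$ are mutually independent and normally distributed as $N(0, I)$ and $N(0,\sigma^2)$ respectively, the Fisher information matrix $G(W,\theta,v)$ in blocks is

$$G(W,\theta,v) = E\begin{pmatrix}
\nabla_W \ln p\,[\nabla_W \ln p]^T & \nabla_W \ln p\,[\nabla_\theta \ln p]^T & \nabla_W \ln p\,[\nabla_v \ln p]^T\\
\nabla_\theta \ln p\,[\nabla_W \ln p]^T & \nabla_\theta \ln p\,[\nabla_\theta \ln p]^T & \nabla_\theta \ln p\,[\nabla_v \ln p]^T\\
\nabla_v \ln p\,[\nabla_W \ln p]^T & \nabla_v \ln p\,[\nabla_\theta \ln p]^T & \nabla_v \ln p\,[\nabla_v \ln p]^T
\end{pmatrix}, \tag{2}$$

where $\nabla_W \ln p = \big((\nabla_{w_1}\ln p)^T,\dots,(\nabla_{w_p}\ln p)^T\big)^T \in \mathbb{R}^{np}$. Though [1] and [3] partially touch the blocks of the diagonal block $G(W) = E\big[\nabla_W \ln p\,[\nabla_W \ln p]^T\big]$, the details and compact representations are missing; hereunder we provide the explicit representations of these diagonal blocks and of their inverses.

2 Key Diagonal Blocks of the Riemannian Metric on M

To provide a concise mathematical statement herein, we use almost the same settings given in [4] unless otherwise stated or supplemented. Here it is supposed that $f \in L^2(\mathbb{R},\tau)$, i.e., $f$ belongs to the family of functions square integrable with respect to the Lebesgue measure $\tau$ on the Borel $\sigma$-algebra over the real line $\mathbb{R}$, and that the probability densities appearing here are regular in the sense specified in [3].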
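The block form quoted from [4] rests on one reduction that is worth writing out once. The following worked step is a sketch for a generic Gaussian regression model; the symbols $F$ and $\rho$ are placeholders introduced here only for illustration, standing for any regression function and its parameter vector.

```latex
% Generic Gaussian regression: y = F(x;\rho) + \varepsilon,
% \varepsilon \sim N(0,\sigma^2), x independent of \varepsilon.
\begin{align*}
\ln p(x,y;\rho) &= \ln p(x) - \ln\!\big(\sqrt{2\pi}\,\sigma\big)
                   - \frac{[\,y - F(x;\rho)\,]^2}{2\sigma^2},\\[2pt]
\nabla_\rho \ln p &= \frac{y - F(x;\rho)}{\sigma^2}\,\nabla_\rho F(x;\rho),\\[2pt]
G(\rho) &= E\big[\nabla_\rho \ln p\,(\nabla_\rho \ln p)^T\big]
         = \frac{1}{\sigma^4}\,
           E_x\Big[\,E\big[(y-F)^2 \,\big|\, x\big]\;
                   \nabla_\rho F\,(\nabla_\rho F)^T\Big]
         = \frac{1}{\sigma^2}\,
           E_x\big[\nabla_\rho F\,(\nabla_\rho F)^T\big].
\end{align*}
```

The residual $y - F$ drops out because $E[(y-F)^2 \mid x] = \sigma^2$, so every block of the FIM is an expectation over the input $x$ alone of products of derivatives of the regression function.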
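2.1 First, through Theorem 1, we provide full details on how formula (1) was derived and give its compact representation in matrix notation; then we proceed to obtain the main targets of this paper.

Theorem 1 For the model $y = f(w\cdot x) + \varepsilon$ under the setting given in [1], the Fisher information matrix and its inverse at $w$ are respectively

$$G(w) = |w|^2\,\Omega\begin{pmatrix} c_2(w) & 0\\ 0 & c_1(w)\, I_{n-1}\end{pmatrix}\Omega^T = |w|^2 c_1(w)\, I + \big(c_2(w) - c_1(w)\big)\, w w^T, \tag{3}$$

$$G^{-1}(w) = |w|^{-2}\,\Omega\begin{pmatrix} c_2^{-1}(w) & 0\\ 0 & c_1^{-1}(w)\, I_{n-1}\end{pmatrix}\Omega^T = \frac{1}{|w|^2 c_1(w)}\, I + |w|^{-4}\Big(\frac{1}{c_2(w)} - \frac{1}{c_1(w)}\Big) w w^T, \tag{4}$$

where the symbols bear the meanings given in the following proof.

Proof: Suppose $e_1 = w/|w|, e_2, \dots, e_n$ form an orthonormal basis for $\mathbb{R}^n$, and write $G(a,b) = a^T G(w)\, b$ for $a, b \in \mathbb{R}^n$. Then $x\cdot e_1,\dots,x\cdot e_n$ are mutually statistically independent, with $u = w\cdot x = |w|\varsigma$, $\varsigma = x\cdot e_1 \sim N(0,1)$ and $\xi_k = x\cdot e_k \sim N(0,1)$ for $k \ge 2$, and consequently

$$\begin{aligned}
G(e_1,e_1) &= \frac{1}{\sigma^2}E\big[f'(w\cdot x)^2 (x\cdot e_1)^2\big] = \frac{1}{\sqrt{2\pi}\,\sigma^2}\int f'(|w|\varsigma)^2\,\varsigma^2 \exp\!\Big(-\frac{\varsigma^2}{2}\Big)\,d\varsigma = |w|^2 c_2(w),\\
G(e_k,e_k) &= \frac{1}{\sigma^2}E\big[f'(w\cdot x)^2 (x\cdot e_k)^2\big] = \frac{1}{\sqrt{2\pi}\,\sigma^2}\int f'(|w|\varsigma)^2 \exp\!\Big(-\frac{\varsigma^2}{2}\Big)\,d\varsigma = |w|^2 c_1(w),\quad k \ge 2,\\
G(e_k,e_l) &= \frac{1}{\sigma^2}E\big[f'(w\cdot x)^2 (x\cdot e_k)(x\cdot e_l)\big] = 0,\quad k \ne l.
\end{aligned} \tag{2-1}$$

Set $\Omega = (e_1,\dots,e_n)$ and $C = \operatorname{diag}(c_2, c_1,\dots,c_1) = \operatorname{diag}(c_2, c_1 I_{n-1})$. Then, from the orthogonality of $\Omega$, it is inferred that

$$G(w) = |w|^2\,\Omega\, C\,\Omega^T = |w|^2\,\Omega\begin{pmatrix} c_2 & 0\\ 0 & c_1 I_{n-1}\end{pmatrix}\Omega^T. \tag{5}$$

Since $\Omega\Omega^T = e_1e_1^T + \sum_{k\ge 2} e_ke_k^T = I$, we have $\sum_{k\ge 2} e_ke_k^T = I - e_1e_1^T$ and hence

$$G(w) = |w|^2\big[c_2\, e_1e_1^T + c_1 (I - e_1e_1^T)\big] = |w|^2 c_1(w)\, I + \big(c_2(w) - c_1(w)\big)\, w w^T.$$

Further, the inverse $G^{-1}(w)$ is

$$G^{-1}(w) = |w|^{-2}\,\Omega\, C^{-1}\Omega^T = \frac{1}{|w|^2 c_1(w)}\, I + |w|^{-4}\Big(\frac{1}{c_2(w)} - \frac{1}{c_1(w)}\Big) w w^T,$$

which completes the proof.

2.2 Now, let us consider the full-parameter MLPs proposed in [4] as

$$y = \mathbf{f}(x) + \varepsilon = \sum_{i=1}^{p} v_i\, f(w_i\cdot x + \theta_i) + \varepsilon, \tag{II}$$

where $x \sim N(0, I)$ and $\varepsilon \sim N(0,\sigma^2)$. For model (II), the logarithm of the joint probability density function is

$$\ln p(x,y;W,\theta,v) = \ln p(x) + \ln p(y\,|\,x;W,\theta,v) = \ln p(x) - \ln\!\big(\sqrt{2\pi}\,\sigma\big) - \frac{[\,y - \mathbf{f}(x)\,]^2}{2\sigma^2},$$

so that its gradients are as follows.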
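$$\begin{aligned}
\nabla_{w_i} \ln p &= \frac{1}{\sigma^2}\,[\,y - \mathbf{f}(x)\,]\; v_i\, f'(w_i\cdot x + \theta_i)\, x,\\
\frac{\partial \ln p}{\partial \theta_i} &= \frac{1}{\sigma^2}\,[\,y - \mathbf{f}(x)\,]\; v_i\, f'(w_i\cdot x + \theta_i),\\
\frac{\partial \ln p}{\partial v_i} &= \frac{1}{\sigma^2}\,[\,y - \mathbf{f}(x)\,]\; f(w_i\cdot x + \theta_i),
\end{aligned} \tag{2-2}$$

where $p = p(x,y;W,\theta,v)$ satisfies the regularity conditions specified in [5] and $p(y\,|\,x) = p(y\,|\,x;W,\theta,v)$. By the definition given in [2], for model (II) the corresponding FIM should be

$$G(\rho) = G(W,\theta,v) = E\big[\nabla_\rho \ln p\,[\nabla_\rho \ln p]^T\big],\qquad \rho = (W,\theta,v). \tag{6}$$

For simplicity of notation, we specifically set

$$G(W) = \frac{1}{\sigma^2}\,E\big[(\hat f \hat f^T)\otimes(x x^T)\big] = (G_{ij})_{p\times p},\qquad \hat f = \big(v_1 f'(w_1\cdot x+\theta_1),\dots,v_p f'(w_p\cdot x+\theta_p)\big)^T,$$

where $\otimes$ is the symbol of tensor product and

$$G_{ij} = E\big[\nabla_{w_i}\ln p\,[\nabla_{w_j}\ln p]^T\big] = \frac{v_i v_j}{\sigma^2}\,E\big[f'(w_i\cdot x+\theta_i)\, f'(w_j\cdot x+\theta_j)\, x x^T\big].$$

Hereunder we obtain explicitly the diagonal blocks of $G$ and their inverses, whose compact representations are also shown.

2.2.1 For the key blocks $G_{ij}$, $i \ne j$, of $G(W) = (G_{ij})$ and their inverses, we have

Theorem 2 Suppose $\operatorname{rank}(w_i, w_j) = 2$. Then for the model $y = \mathbf{f}(x) + \varepsilon = \sum_{i=1}^{p} v_i f(w_i\cdot x+\theta_i) + \varepsilon$, the blocks $G_{ij}$, $i \ne j$, of the FIM and their inverses are respectively

$$G_{ij} = \Omega\begin{pmatrix}\Lambda & 0\\ 0 & \beta I\end{pmatrix}\Omega^T = \beta I + D(w_i,w_j), \tag{7}$$

$$G_{ij}^{-1} = \Omega\begin{pmatrix}\Lambda^{-1} & 0\\ 0 & \beta^{-1} I\end{pmatrix}\Omega^T = \beta^{-1} I + H(w_i,w_j), \tag{8}$$

where the meanings of the symbols and values are given in the proof.

Proof: For any pair $w_i, w_j$ there exists an orthonormal basis $e_1,\dots,e_n$ for $\mathbb{R}^n$ such that $L_1 = \operatorname{span}\{w_i,w_j\} = \operatorname{span}\{e_1,e_2\}$ and $L_1 \perp L_2$ in the Euclidean sense, where $L_2 = \operatorname{span}\{e_3,\dots,e_n\}$; so all the variables $x\cdot e_1,\dots,x\cdot e_n$ are statistically independent and distributed normally as $N(0,1)$. We set

$$\begin{pmatrix} w_i^T\\ w_j^T\end{pmatrix} = K\begin{pmatrix} e_1^T\\ e_2^T\end{pmatrix},\quad K = \begin{pmatrix} k_{11} & k_{12}\\ k_{21} & k_{22}\end{pmatrix};\qquad \begin{pmatrix} e_1^T\\ e_2^T\end{pmatrix} = P\begin{pmatrix} w_i^T\\ w_j^T\end{pmatrix},\quad P = K^{-1} = \begin{pmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{pmatrix}.$$

Writing $G_{ij}(a,b) = a^T G_{ij}\, b$ for $a,b \in \mathbb{R}^n$, we first obtain two groups of four values, relevant respectively to $e_1, e_2$ and to $w_i, w_j$ (see (A.1), (A.2) of the Appendix for more details), which are

$$\Lambda = \begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22}\end{pmatrix} = \begin{pmatrix} G_{ij}(e_1,e_1) & G_{ij}(e_1,e_2)\\ G_{ij}(e_2,e_1) & G_{ij}(e_2,e_2)\end{pmatrix},\qquad \Gamma = \begin{pmatrix}\gamma_{11} & \gamma_{12}\\ \gamma_{21} & \gamma_{22}\end{pmatrix} = \begin{pmatrix} G_{ij}(w_i,w_i) & G_{ij}(w_i,w_j)\\ G_{ij}(w_j,w_i) & G_{ij}(w_j,w_j)\end{pmatrix}. \tag{2-3}$$

Obviously, $\Gamma = K\Lambda K^T$, or equivalently,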
Λ PΓ P. (9 For l,< l n, e mmeately hae By β l G ( (, l E f x+ θ f x+ θ < l n (-4 Λlm, l G 0,,, m < l m n l m settng ( ag, ag, β ( n ( n R ; (, (, ( 3 n ; β ( n ( n C I, C Λ C Λ I. hen PR, e coul obtan from (-3, (-4 thatg C, hch nuces compactly Λ 0 G ( C β. (0 0 I( n ( n Expanng (0 nto Λ 0 G ( Λ + an nferrng from (-, e hae β 0 β I λ λ Λ,,,,,, (,, λ + λ + λ + λ λ λ Further, also by the orthogonalty of (, e hae,. ( I. ( (he exact expressons of ( an ( are gen by (A-3, (A-4 of Appenx. hch s Hence, e hae G λ λ λ λ β,,,, ( + + + + I, (3 θ θ β ( ( G E f ( x+ f ( x+ xx I + D, (4 here D P Λ P β P P an β are gen n (A.4, (A.5 of the Appenx. As for the nerse of G, e hae smply the compact form G Λ 0 ( β Λ + 0 β I. (5 Settng Λ et Λ, e then hae Λ,, λ λ,, Λ λ λ an consequently an λ λ λ + ( λ + ( λ + λ,,,,,, (,, Λ λ λ Λ Λ (6 4
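Substituting (12) and (16) into (15), we have

$$G_{ij}^{-1} = \Omega\begin{pmatrix}\Lambda^{-1} & 0\\ 0 & \beta^{-1} I\end{pmatrix}\Omega^T = \beta^{-1} I + H(w_i,w_j), \tag{17}$$

where, with $\Omega_1 = (e_1,e_2)$ the $n\times 2$ matrix of the first two basis vectors, $H = \Omega_1\Lambda^{-1}\Omega_1^T - \beta^{-1}\,\Omega_1\Omega_1^T$, and its coefficients $h_{ab}$ with respect to $w_i, w_j$ are defined by (A.6) in the Appendix. This completes the proof.

For the blocks $G_{ii}$ of $G(W)$, we have similarly

$$G_{ii} = |w_i|^2 c_{i,1}\, I + (c_{i,2} - c_{i,1})\, w_i w_i^T \quad\text{and}\quad G_{ii}^{-1} = \frac{1}{|w_i|^2 c_{i,1}}\, I + |w_i|^{-4}\Big(\frac{1}{c_{i,2}} - \frac{1}{c_{i,1}}\Big) w_i w_i^T,$$

where $c_{i,1}, c_{i,2}$ are obtained by the same method used in Theorem 1.

3 Concluding Remarks

In the sections above, we have striven to give a detailed and compact representation of the diagonal blocks $G_{ij}$ of the FIM $G(\rho)$ on the neural manifold $M$ of full-parametrised MLPs, thus extending results on similar blocks in simpler neural manifolds to more versatile settings. The increased complexity of calculating the FIM in most cases permits only a block matrix representation of the desired metric, since activation functions are of vast variety and the independence assumptions needed to obtain the entries of $G(\rho)$ and its inverse explicitly are too restrictive. Although an entry-wise presentation of $G(\rho)$ and $G^{-1}(\rho)$ remains out of reach, we have nonetheless gone a step further in calculating some key blocks; we will probably resort to the eigenstructure of the square blocks of $G(\rho)$ under stringent conditions and try to find out more if feasible.

4 Appendix

To keep the paper concentrated on its core contents, we have moved the complicated computations here, with reference remarks given at the appropriate places in the previous sections.

4.1 We present here as a lemma the decomposition of an orthogonal matrix with respect to some of its column/row vectors, which will be useful in future applications.

Lemma For any orthogonal matrix $\Omega = (\Omega_l, \Omega_m) \in \mathbb{R}^{n\times n}$, where $\Omega_l \in \mathbb{R}^{n\times l}$, $\Omega_m \in \mathbb{R}^{n\times m}$ and $l + m = n$, the identity

$$\Omega_m\Omega_m^T = I - \Omega_l\Omega_l^T$$

holds.

Proof: From the orthogonality of $\Omega$, it is obtained that $I = \Omega\Omega^T = \Omega_l\Omega_l^T + \Omega_m\Omega_m^T$, hence $\Omega_m\Omega_m^T = I - \Omega_l\Omega_l^T$.

4.2 Exact expressions of $\Lambda$, $\Gamma$:

$$\begin{aligned}
\gamma_{11} &= G_{ij}(w_i,w_i) = \frac{v_iv_j}{\sigma^2}E\big[f'(w_i\cdot x+\theta_i)\, f'(w_j\cdot x+\theta_j)\,(x\cdot w_i)^2\big],\\
\gamma_{22} &= G_{ij}(w_j,w_j) = \frac{v_iv_j}{\sigma^2}E\big[f'(w_i\cdot x+\theta_i)\, f'(w_j\cdot x+\theta_j)\,(x\cdot w_j)^2\big],\\
\gamma_{12} &= G_{ij}(w_i,w_j) = \frac{v_iv_j}{\sigma^2}E\big[f'(w_i\cdot x+\theta_i)\, f'(w_j\cdot x+\theta_j)\,(x\cdot w_i)(x\cdot w_j)\big],\\
\gamma_{21} &= G_{ij}(w_j,w_i) = \gamma_{12}.
\end{aligned} \tag{A.1}$$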
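$$\begin{aligned}
\lambda_{11} &= G_{ij}(e_1,e_1) = \frac{v_iv_j}{\sigma^2}E\big[f'(w_i\cdot x+\theta_i)\, f'(w_j\cdot x+\theta_j)\,(x\cdot e_1)^2\big],\\
\lambda_{22} &= G_{ij}(e_2,e_2) = \frac{v_iv_j}{\sigma^2}E\big[f'(w_i\cdot x+\theta_i)\, f'(w_j\cdot x+\theta_j)\,(x\cdot e_2)^2\big],\\
\lambda_{12} &= G_{ij}(e_1,e_2) = \frac{v_iv_j}{\sigma^2}E\big[f'(w_i\cdot x+\theta_i)\, f'(w_j\cdot x+\theta_j)\,(x\cdot e_1)(x\cdot e_2)\big],\\
\lambda_{21} &= G_{ij}(e_2,e_1) = \lambda_{12}.
\end{aligned} \tag{A.2}$$

4.3 The exact expressions of $\beta$, $D$, $H$, etc.:

Here $\beta = \frac{v_iv_j}{\sigma^2}E\big[f'(w_i\cdot x+\theta_i)\, f'(w_j\cdot x+\theta_j)\big]$, and $e_1, e_2$ denote the orthonormal basis of $\operatorname{span}\{w_i,w_j\}$ with $(w_i,w_j)^T = K\,(e_1,e_2)^T$. We set $|K| = \det K$, so that

$$P = K^{-1} = \frac{1}{|K|}\begin{pmatrix} k_{22} & -k_{12}\\ -k_{21} & k_{11}\end{pmatrix}.$$

Writing $W_{ij} = (w_i, w_j) \in \mathbb{R}^{n\times 2}$ and $\Omega_1 = (e_1,e_2) = W_{ij}P^T$, we then have

$$e_1e_1^T + e_2e_2^T = \Omega_1\Omega_1^T = W_{ij}\,P^TP\,W_{ij}^T, \tag{A.3}$$

$$I - e_1e_1^T - e_2e_2^T = I - W_{ij}\,P^TP\,W_{ij}^T, \tag{A.4}$$

$$D = W_{ij}\begin{pmatrix} d_{11} & d_{12}\\ d_{21} & d_{22}\end{pmatrix}W_{ij}^T,\qquad \begin{pmatrix} d_{11} & d_{12}\\ d_{21} & d_{22}\end{pmatrix} = P^T\big(\Lambda - \beta I_2\big)P, \tag{A.5}$$

$$H = W_{ij}\begin{pmatrix} h_{11} & h_{12}\\ h_{21} & h_{22}\end{pmatrix}W_{ij}^T,\qquad \begin{pmatrix} h_{11} & h_{12}\\ h_{21} & h_{22}\end{pmatrix} = P^T\big(\Lambda^{-1} - \beta^{-1} I_2\big)P,\qquad \Lambda^{-1} = \frac{1}{|\Lambda|}\begin{pmatrix}\lambda_{22} & -\lambda_{12}\\ -\lambda_{21} & \lambda_{11}\end{pmatrix}. \tag{A.6}$$

The entries $d_{ab}$ and $h_{ab}$ follow by multiplying out with the explicit form of $P$; for example, $d_{11} = |K|^{-2}\big[(\lambda_{11}-\beta)\,k_{22}^2 - (\lambda_{12}+\lambda_{21})\,k_{22}k_{21} + (\lambda_{22}-\beta)\,k_{21}^2\big]$.

5 Acknowledgements

The author would like to thank Mr. Guoun Zhao for discovering the relationship in the Lemma of the Appendix, which leads to the final detail of (3).

References

[1] Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Computation, MIT Press, 1998, Vol. 10, 251-276.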
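[2] C. R. Rao. Linear Statistical Inference and Its Applications, ed., John Wiley & Sons, 97.
[3] Shun-ichi Amari. Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, Vol. 28. Springer-Verlag, New York, 1985.
[4] Changlin Cai, Zhongzhi Shi, Xiongzhi Chen. The Fisher Information Matrix on Neural Manifolds of Multilayer Perceptrons.
[5] Amari, S., Park, H. and Fukumizu, K. Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons. Submitted to Neural Computation.

Key Diagonal Sub-Blocks of the Fisher Information Matrix on the Manifold of Multilayer Perceptrons (translated from the Chinese title and abstract)

Chen Xiongzhi, Cai Changlin
Mathematical College, Sichuan University, Chengdu 610064

Abstract: It is well known that natural gradient learning ([1]) can avoid local optima or plateau phenomena during training, because it takes the intrinsic geometric properties of the parameter space into account. However, the natural gradient ([1]) is itself induced by the Fisher information matrix (FIM) defined on the 1-form tangent space, so computing the FIM becomes the key step in realizing natural gradient learning. This paper gives explicit expressions and compact matrix expressions for the key diagonal sub-blocks of the FIM on the manifold of full-parameter MLPs and for their inverses, thereby generalizing and supplementing the results partially given in [1] and [3].

Keywords: neural manifold, natural gradient learning, Fisher information matrix (FIM), full-parametrised multilayer perceptrons (Full-parametrised MLP)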