Key Diagonal Blocks of the Fisher Information Matrix on Neural Manifold of Full-Parametrised Multilayer Perceptrons

Xiongzhi Chen
Mathematical College, Sichuan University, Chengdu 610064, P. R. China

Abstract: It is well known that natural gradient learning (NGL) ([1]) may avoid local optima or the plateau phenomenon in the training process, since it takes into consideration the intrinsic geometric structure of the parameter space. However, the natural gradient ([1]) is itself induced by the Fisher information matrix (FIM) ([2]) defined on the 1-form tangent space ([3]). The calculation of the relevant FIM is therefore key to the realization of NGL. This paper gives an explicit derivation and a compact matrix representation of the diagonal blocks, and of their inverses, of the FIM-based Riemannian metric on the neural manifold of full-parametrised multilayer perceptrons (MLPs). These results extend and complement those partially given in [1] and [3].

Keywords: Neural manifold, Natural gradient learning, Fisher information matrix (FIM), Full-parametrised multilayer perceptrons (FP-MLPs).

1 Introduction

The simplest neural manifold is the space of all single feedforward neurons (i.e., perceptrons) with n input nodes and a single output node. We use the same settings and notation as [1], without restating the detailed assumptions on the variables that occur in the model

$$y = f(w\cdot x) + \varepsilon, \tag{I}$$

where $x \sim N(0, I_n)$, $\varepsilon \sim N(0,1)$, $x$ and $\varepsilon$ are statistically independent, $w = (w_1,\dots,w_n)^T \in \mathbb{R}^n$, and $|w|$ denotes the Euclidean norm of $w$. It is shown in [1] that the FIM is

$$G(w) = c_1(w)\,I + \{c_2(w) - c_1(w)\}\,\frac{1}{|w|^2}\,ww^T. \tag{1}$$

However, the mathematical details of how (1) is derived were not fully shown.
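To make formula (1) concrete, here is a minimal sketch. It assumes the activation f = tanh, which is an illustrative choice because the paper keeps f general. It evaluates c_1 and c_2 as one-dimensional Gaussian integrals with Simpson's rule and compares (1) with a Monte Carlo average of f'(w·x)² xx^T. All helper names are hypothetical. G(w) is a multiple of the identity plus a rank-one term, so the closed-form inverse given in [1] lets G^{-1}(w) act on a gradient in O(n) operations, which is all a natural-gradient step needs.

```python
import math
import random

def fprime(u):
    """Derivative of the assumed activation f = tanh."""
    return 1.0 - math.tanh(u) ** 2

def gaussian_expect(g, a=-10.0, b=10.0, m=2000):
    """E[g(s)] for s ~ N(0, 1), by composite Simpson's rule on [a, b] (m even)."""
    h = (b - a) / m
    total = 0.0
    for k in range(m + 1):
        s = a + k * h
        weight = 1 if k in (0, m) else (4 if k % 2 else 2)
        total += weight * g(s) * math.exp(-0.5 * s * s)
    return total * h / (3.0 * math.sqrt(2.0 * math.pi))

def c1_c2(wnorm):
    """c_1(w) = E[f'(|w| s)^2] and c_2(w) = E[f'(|w| s)^2 s^2], s ~ N(0, 1)."""
    c1 = gaussian_expect(lambda s: fprime(wnorm * s) ** 2)
    c2 = gaussian_expect(lambda s: fprime(wnorm * s) ** 2 * s * s)
    return c1, c2

def fisher_formula(w):
    """G(w) = c_1 I + (c_2 - c_1) w w^T / |w|^2, i.e. formula (1)."""
    norm2 = sum(t * t for t in w)
    c1, c2 = c1_c2(math.sqrt(norm2))
    n = len(w)
    return [[c1 * (i == j) + (c2 - c1) * w[i] * w[j] / norm2 for j in range(n)]
            for i in range(n)]

def fisher_monte_carlo(w, samples=40000, seed=2):
    """Sample average of f'(w.x)^2 x x^T, x ~ N(0, I_n); uses E[eps^2] = 1."""
    rng = random.Random(seed)
    n = len(w)
    G = [[0.0] * n for _ in range(n)]
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = fprime(sum(a * b for a, b in zip(w, x))) ** 2
        for i in range(n):
            for j in range(n):
                G[i][j] += s * x[i] * x[j]
    return [[val / samples for val in row] for row in G]

def natural_gradient_direction(w, grad):
    """G(w)^{-1} grad in O(n): grad / c_1 + (1/c_2 - 1/c_1) (w.grad / |w|^2) w."""
    norm2 = sum(t * t for t in w)
    c1, c2 = c1_c2(math.sqrt(norm2))
    proj = sum(a * b for a, b in zip(w, grad)) / norm2
    return [g / c1 + (1.0 / c2 - 1.0 / c1) * proj * wi for g, wi in zip(grad, w)]
```

As an example, a natural-gradient update with a hypothetical learning rate 0.1 would be `[wi - 0.1 * d for wi, d in zip(w, natural_gradient_direction(w, grad))]`. Multiplying the returned direction by `fisher_formula(w)` recovers `grad`.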
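To investigate the FIM on neural manifolds M of full-parametrised MLPs (MLPs with additional threshold variables), paper [4] gives the block form of this FIM. Consider the neural manifold M of all fully connected n-p-1 type MLPs whose input-output relation is

$$y = f(x) + \varepsilon = \sum_{i=1}^{p} v_i\, f(w_i\cdot x + \theta_i) + \varepsilon,$$

where $x$ and $\varepsilon$ are mutually independent and normally distributed as $N(0, I_n)$ and $N(0,1)$ respectively. The Fisher information matrix $G(W, \theta, v)$ in blocks is then

$$G(W,\theta,v) = E\begin{pmatrix}
\nabla_W \ln p\,[\nabla_W \ln p]^T & \nabla_W \ln p\,[\nabla_\theta \ln p]^T & \nabla_W \ln p\,[\nabla_v \ln p]^T\\
\nabla_\theta \ln p\,[\nabla_W \ln p]^T & \nabla_\theta \ln p\,[\nabla_\theta \ln p]^T & \nabla_\theta \ln p\,[\nabla_v \ln p]^T\\
\nabla_v \ln p\,[\nabla_W \ln p]^T & \nabla_v \ln p\,[\nabla_\theta \ln p]^T & \nabla_v \ln p\,[\nabla_v \ln p]^T
\end{pmatrix}, \tag{2}$$

where $\nabla_W \ln p(z) = \big(\nabla_{w_1}\ln p(z)^T, \dots, \nabla_{w_p}\ln p(z)^T\big)^T \in \mathbb{R}^{np}$. [1] and [3] partially treat the blocks of the diagonal block $G_{WW} = E\{\nabla_W \ln p\,[\nabla_W \ln p]^T\}$, but the details and compact representations are missing. Below we provide explicit representations of these blocks and of their inverses.

2 Key Diagonal Blocks of the Riemannian Metric on M

To keep the mathematical statements concise, we use almost the same settings as [4] unless otherwise stated or supplemented. We suppose that $f \in L^2(\mathbb{R}, \tau)$, i.e., $f$ belongs to the family of square-integrable functions with respect to the Lebesgue measure $\tau$ on the Borel $\sigma$-algebra over the real line $\mathbb{R}$. We also suppose that the probability densities appearing here are regular in the sense specified in [3].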
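The block form (2) can be estimated numerically. The following minimal Monte Carlo sketch assumes f = tanh and the parameter ordering ρ = (w_1, ..., w_p, θ_1, ..., θ_p, v_1, ..., v_p). Both are illustrative choices, and the function names are hypothetical. The score equals ε times the gradient of f(x) with respect to ρ, and ε is independent of x with E[ε²] = 1. The sketch therefore averages outer products of ∇_ρ f(x) over sampled inputs only and does not sample ε as well, which lowers the variance.

```python
import math
import random

def grad_f(W, theta, v, x):
    """Gradient of f(x) = sum_i v_i tanh(w_i.x + theta_i) w.r.t. (W, theta, v)."""
    gw, gt, gv = [], [], []
    for wi, ti, vi in zip(W, theta, v):
        u = sum(a * b for a, b in zip(wi, x)) + ti
        d = 1.0 - math.tanh(u) ** 2            # f'(u) for f = tanh
        gw.extend(vi * d * xk for xk in x)     # df / dw_i
        gt.append(vi * d)                      # df / dtheta_i
        gv.append(math.tanh(u))                # df / dv_i
    return gw + gt + gv

def fisher_mc(W, theta, v, samples=20000, seed=0):
    """Monte Carlo estimate of G(W, theta, v) = E_x[grad_f grad_f^T]."""
    rng = random.Random(seed)
    n, p = len(W[0]), len(W)
    dim = n * p + 2 * p
    G = [[0.0] * dim for _ in range(dim)]
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g = grad_f(W, theta, v, x)
        for a in range(dim):
            for b in range(dim):
                G[a][b] += g[a] * g[b]
    return [[val / samples for val in row] for row in G]

def block(G, rows, cols):
    """Extract the sub-block of G with the given row and column index ranges."""
    return [[G[r][c] for c in cols] for r in rows]
```

For n = 3 and p = 2, `block(G, range(6), range(6))` is the estimate of $G_{WW}$, `block(G, range(6, 8), range(6, 8))` that of the θθ block, and `block(G, range(8, 10), range(8, 10))` that of the vv block.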

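2.1 First, in Theorem 1 we give full details of how formula (1) is derived, together with its compact representation in matrix notation. We then proceed to the main targets of this paper.

Theorem 1 For the model $y = f(w\cdot x) + \varepsilon$ under the setting given in [1], the Fisher information matrix and its inverse at $w$ are respectively

$$G(w) = O\begin{pmatrix} c_2(w) & 0\\ 0 & c_1(w)\,I_{n-1}\end{pmatrix}O^T = c_1(w)\,I + \{c_2(w) - c_1(w)\}\,\frac{ww^T}{|w|^2}, \tag{3}$$

$$G^{-1}(w) = O\begin{pmatrix} c_2^{-1}(w) & 0\\ 0 & c_1^{-1}(w)\,I_{n-1}\end{pmatrix}O^T = \frac{1}{c_1(w)}\,I + \Big\{\frac{1}{c_2(w)} - \frac{1}{c_1(w)}\Big\}\frac{ww^T}{|w|^2}, \tag{4}$$

where the symbols are defined in the following proof.

Proof: Let $e_1 = w/|w|, e_2, \dots, e_n$ be an orthonormal basis of $\mathbb{R}^n$. Then $x\cdot e_1, \dots, x\cdot e_n$ are mutually statistically independent, with $u = w\cdot x = |w|\varsigma$, $\varsigma = x\cdot e_1 \sim N(0,1)$ and $\xi_k = x\cdot e_k \sim N(0,1)$ for $2 \le k \le n$. Since $\nabla_w \ln p = \varepsilon f'(w\cdot x)\,x$ and $E[\varepsilon^2] = 1$, we have $G(w) = E[f'(w\cdot x)^2\,xx^T]$. Writing $G_{kl} = e_k^T G(w)\, e_l$, it follows that

$$G_{11} = E[f'(w\cdot x)^2 (x\cdot e_1)^2] = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f'(|w|\varsigma)^2\,\varsigma^2 \exp(-\varsigma^2/2)\,d\varsigma =: c_2(w),$$
$$G_{kk} = E[f'(w\cdot x)^2 (x\cdot e_k)^2] = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} f'(|w|\varsigma)^2 \exp(-\varsigma^2/2)\,d\varsigma =: c_1(w), \quad 2 \le k \le n,$$
$$G_{1k} = E[f'(w\cdot x)^2 (x\cdot e_1)(x\cdot e_k)] = 0, \quad 2 \le k \le n,$$
$$G_{kl} = E[f'(w\cdot x)^2 (x\cdot e_k)(x\cdot e_l)] = 0, \quad 2 \le k \ne l \le n.$$

Set $O = (e_1, \dots, e_n)$ and $C = \mathrm{diag}(c_2, c_1, \dots, c_1) = \mathrm{diag}(c_2, c_1 I_{n-1})$. The orthogonality of $O$ then gives

$$G(w) = OCO^T = O\begin{pmatrix} c_2 & 0\\ 0 & c_1 I_{n-1}\end{pmatrix}O^T. \tag{5}$$

Since $OO^T = I$, we have $\sum_{k=2}^{n} e_k e_k^T = I - e_1e_1^T$, and hence

$$G(w) = c_2\, e_1e_1^T + c_1 (I - e_1e_1^T) = c_1(w)\,I + \{c_2(w) - c_1(w)\}\frac{ww^T}{|w|^2}.$$

Further, the inverse $G^{-1}(w)$ is

$$G^{-1}(w) = OC^{-1}O^T = O\begin{pmatrix} c_2^{-1} & 0\\ 0 & c_1^{-1} I_{n-1}\end{pmatrix}O^T = \frac{1}{c_1}\,I + \Big(\frac{1}{c_2} - \frac{1}{c_1}\Big)\frac{ww^T}{|w|^2},$$

which completes the proof.

2.2 Now consider the full-parameter MLPs proposed in [4]:

$$y = f(x) + \varepsilon = \sum_{i=1}^{p} v_i\, f(w_i\cdot x + \theta_i) + \varepsilon, \tag{II}$$

where $x \sim N(0, I_n)$ and $\varepsilon \sim N(0,1)$. For model (II), the logarithm of the joint probability density function is

$$\ln p(x, y; W, \theta, v) = \ln p(x) + \ln p(y \mid x; W, \theta, v) = \ln p(x) - \ln\sqrt{2\pi} - \tfrac{1}{2}[y - f(x)]^2,$$

where $p(x)$ is the density of $N(0, I_n)$.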
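Differentiating this log-density with respect to w_i gives the score that enters the FIM. The minimal finite-difference sketch below assumes f = tanh and uses hypothetical parameter values and helper names. It compares central differences of ln p with the closed form ε v_i f'(w_i·x + θ_i) x, where ε = y - f(x).

```python
import math

def log_density(W, theta, v, x, y):
    """ln p(x, y; W, theta, v) for model (II) with the assumed f = tanh."""
    fx = sum(vi * math.tanh(sum(a * b for a, b in zip(wi, x)) + ti)
             for wi, ti, vi in zip(W, theta, v))
    log_px = -0.5 * sum(t * t for t in x) - 0.5 * len(x) * math.log(2 * math.pi)
    return log_px - 0.5 * math.log(2 * math.pi) - 0.5 * (y - fx) ** 2

def score_w(W, theta, v, x, y, i):
    """Closed-form gradient w.r.t. w_i: eps * v_i * f'(w_i.x + theta_i) * x."""
    fx = sum(vk * math.tanh(sum(a * b for a, b in zip(wk, x)) + tk)
             for wk, tk, vk in zip(W, theta, v))
    eps = y - fx
    u = sum(a * b for a, b in zip(W[i], x)) + theta[i]
    return [eps * v[i] * (1.0 - math.tanh(u) ** 2) * xk for xk in x]

def finite_diff_w(W, theta, v, x, y, i, h=1e-6):
    """Central differences of log_density w.r.t. each coordinate of w_i."""
    grad = []
    for k in range(len(x)):
        Wp = [list(row) for row in W]
        Wm = [list(row) for row in W]
        Wp[i][k] += h
        Wm[i][k] -= h
        diff = log_density(Wp, theta, v, x, y) - log_density(Wm, theta, v, x, y)
        grad.append(diff / (2 * h))
    return grad
```

The input term ln p(x) does not depend on the parameters, so it drops out of the difference. It is kept only so that `log_density` is the full joint log-density.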

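Consequently, its gradients are

$$\nabla_{w_i}\ln p = [y - f(x)]\,\nabla_{w_i} f(x) = \varepsilon\, v_i f'(w_i\cdot x + \theta_i)\,x,$$
$$\nabla_{\theta_i}\ln p = [y - f(x)]\,\nabla_{\theta_i} f(x) = \varepsilon\, v_i f'(w_i\cdot x + \theta_i),$$
$$\nabla_{v_i}\ln p = [y - f(x)]\,\nabla_{v_i} f(x) = \varepsilon\, f(w_i\cdot x + \theta_i), \qquad 1 \le i \le p, \tag{2-1}$$

where $p(z) = p(x, y; W, \theta, v)$ satisfies the regularity conditions specified in [5] and $p(y\mid x) = p(y \mid x; W, \theta, v)$. By the definition given in [2], the FIM for model (II) is

$$G(\rho) = G(W, \theta, v) = E\big\{\nabla_\rho \ln p(z)\,[\nabla_\rho \ln p(z)]^T\big\}. \tag{6}$$

For simplicity of notation, we write $G(\rho)$ in blocks $G_{\rho_k\rho_l} = E[\nabla_{\rho_k}\ln p \otimes \nabla_{\rho_l}\ln p]$, where $\otimes$ denotes the tensor product. In particular, since $\varepsilon$ is independent of $x$ and $E[\varepsilon^2] = 1$,

$$G_{w_iw_j} = E\big\{\nabla_{w_i}\ln p\,[\nabla_{w_j}\ln p]^T\big\} = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)\,f'(w_j\cdot x + \theta_j)\,xx^T\big].$$

Below we obtain explicitly the blocks $G_{w_iw_j}$ of the diagonal block $G_{WW}$ and their inverses, together with their compact representations.

2.2.1 For the key blocks $G_{w_iw_j}$ of $G_{WW}$ and their inverses, we have the following result.

Theorem 2 Suppose $\mathrm{rank}(w_i, w_j) = 2$. Then for model (II), the blocks $G_{w_iw_j}$ of the FIM and their inverses are respectively

$$G_{w_iw_j} = O\begin{pmatrix}\Lambda & 0\\ 0 & \beta I_{n-2}\end{pmatrix}O^T = \beta I + D(w_i, w_j), \tag{7}$$

$$G_{w_iw_j}^{-1} = O\begin{pmatrix}\Lambda^{-1} & 0\\ 0 & \beta^{-1} I_{n-2}\end{pmatrix}O^T = \beta^{-1} I + H(w_i, w_j), \tag{8}$$

where the symbols and values are defined in the proof.

Proof: For any pair $w_i, w_j$ there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ such that $L = \mathrm{span}\{e_1, e_2\} = \mathrm{span}\{w_i, w_j\}$ and $L^\perp = \mathrm{span}\{e_3, \dots, e_n\}$ is its Euclidean orthogonal complement. Hence the variables $x\cdot e_1, \dots, x\cdot e_n$ are statistically independent and distributed as $N(0,1)$. We set

$$(w_i, w_j) = (e_1, e_2)K,\quad K = \begin{pmatrix}k_{11} & k_{12}\\ k_{21} & k_{22}\end{pmatrix};\qquad (e_1, e_2) = (w_i, w_j)P,\quad P = K^{-1} = \begin{pmatrix}p_{11} & p_{12}\\ p_{21} & p_{22}\end{pmatrix}. \tag{2-2}$$

First, we obtain two groups of four values, relevant to $e_1, e_2$ and to $w_i, w_j$ respectively (see (A.1) and (A.2) of the Appendix for details):

$$\Lambda = \begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22}\end{pmatrix},\quad \lambda_{kl} = e_k^T G_{w_iw_j} e_l;\qquad \Gamma = \begin{pmatrix}\gamma_{11} & \gamma_{12}\\ \gamma_{21} & \gamma_{22}\end{pmatrix} = (w_i, w_j)^T G_{w_iw_j} (w_i, w_j). \tag{2-3}$$

Obviously, $\Gamma = K^T\Lambda K$.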
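The relation between Γ and Λ is purely algebraic: it holds for any matrix in place of $G_{w_iw_j}$. The minimal check below uses a hypothetical symmetric 3×3 matrix as a stand-in for $G_{w_iw_j}$, and its helper names are also hypothetical. It builds e_1, e_2 by Gram-Schmidt on (w_i, w_j), forms K, P = K^{-1}, Λ and Γ as defined in the proof of Theorem 2, and then confirms both Γ = K^TΛK and the equivalent form Λ = P^TΓP.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def plane_basis(wi, wj):
    """n x 2 matrix O_1 = (e_1, e_2), an orthonormal basis of span{w_i, w_j}."""
    e1 = [a / math.sqrt(sum(t * t for t in wi)) for a in wi]
    c = sum(a * b for a, b in zip(wj, e1))
    r = [a - c * b for a, b in zip(wj, e1)]
    e2 = [a / math.sqrt(sum(t * t for t in r)) for a in r]
    return transpose([e1, e2])

def lambda_gamma(G, wi, wj):
    """K, P = K^{-1}, Lambda = O_1^T G O_1 and Gamma = (w_i, w_j)^T G (w_i, w_j)."""
    O1 = plane_basis(wi, wj)
    Wt = transpose([wi, wj])                      # n x 2 matrix (w_i, w_j)
    K = matmul(transpose(O1), Wt)                 # (w_i, w_j) = (e_1, e_2) K
    P = inv2(K)                                   # (e_1, e_2) = (w_i, w_j) P
    Lam = matmul(matmul(transpose(O1), G), O1)    # lambda_kl = e_k^T G e_l
    Gam = matmul(matmul(transpose(Wt), G), Wt)    # gamma_kl = w_k^T G w_l
    return K, P, Lam, Gam
```

Because e_1 is taken parallel to w_i here, this particular K is upper triangular (k_21 = 0). Any other orthonormal basis of the plane would serve equally well.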

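Equivalently,

$$\Lambda = P^T\Gamma P. \tag{9}$$

For $3 \le l \le n$, we immediately have

$$\beta_l = e_l^T G_{w_iw_j} e_l = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)\,f'(w_j\cdot x + \theta_j)\big] =: \beta, \qquad 3 \le l \le n, \tag{2-4}$$
$$e_l^T G_{w_iw_j} e_m = 0, \qquad l \ne m,\ 1 \le l, m \le n,\ \max(l, m) \ge 3.$$

Both identities hold because, for $l \ge 3$, $x\cdot e_l$ is independent of $w_i\cdot x$, $w_j\cdot x$ and of the other coordinates. Set $O = (O_1, O_2)$ with $O_1 = (e_1, e_2)$ and $O_2 = (e_3, \dots, e_n)$, and let $C = \mathrm{diag}(\Lambda, \beta I_{n-2})$. From (2-3) and (2-4) we obtain $G_{w_iw_j} = OCO^T$, which in compact form reads

$$G_{w_iw_j} = O\begin{pmatrix}\Lambda & 0\\ 0 & \beta I_{(n-2)\times(n-2)}\end{pmatrix}O^T. \tag{10}$$

Expanding (10) into $G_{w_iw_j} = O_1\Lambda O_1^T + \beta O_2O_2^T$ and using (2-2), we have

$$O_1\Lambda O_1^T = \lambda_{11}e_1e_1^T + \lambda_{12}e_1e_2^T + \lambda_{21}e_2e_1^T + \lambda_{22}e_2e_2^T. \tag{11}$$

Further, also by the orthogonality of $O$, we have

$$O_2O_2^T = I - O_1O_1^T = I - e_1e_1^T - e_2e_2^T. \tag{12}$$

(The exact expressions of (11) and (12) are given by (A-3) and (A-4) of the Appendix.) Hence

$$G_{w_iw_j} = \lambda_{11}e_1e_1^T + \lambda_{12}e_1e_2^T + \lambda_{21}e_2e_1^T + \lambda_{22}e_2e_2^T + \beta\,(I - e_1e_1^T - e_2e_2^T), \tag{13}$$

that is,

$$G_{w_iw_j} = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)\,f'(w_j\cdot x + \theta_j)\,xx^T\big] = \beta I + D, \tag{14}$$

where $D = (w_i, w_j)\,(P\Lambda P^T - \beta PP^T)\,(w_i, w_j)^T$. The entries of $D$ and the value of $\beta$ are given in (A-4) and (A.5) of the Appendix. The inverse of $G_{w_iw_j}$ has simply the compact form

$$G_{w_iw_j}^{-1} = O\begin{pmatrix}\Lambda^{-1} & 0\\ 0 & \beta^{-1} I\end{pmatrix}O^T. \tag{15}$$

Setting $\Lambda^* = \det\Lambda$, we then have

$$\Lambda^{-1} = \frac{1}{\Lambda^*}\begin{pmatrix}\lambda_{22} & -\lambda_{12}\\ -\lambda_{21} & \lambda_{11}\end{pmatrix}$$

and consequently

$$O_1\Lambda^{-1}O_1^T = \frac{1}{\Lambda^*}\big(\lambda_{22}e_1e_1^T - \lambda_{12}e_1e_2^T - \lambda_{21}e_2e_1^T + \lambda_{11}e_2e_2^T\big). \tag{16}$$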
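The structural claims (2-4) and (10) can be checked by simulation. They say that $O^TG_{w_iw_j}O$ is block diagonal, with the same value β on the last n-2 diagonal entries. The formulas (14) and (15) can be checked the same way. The sketch below assumes f = tanh, n = 4 and illustrative values of w_i, w_j, θ_i, θ_j, v_i, v_j, and its helper names are hypothetical. It estimates $G_{w_iw_j}$ by Monte Carlo, reads Λ and β off the rotated matrix, rebuilds βI + D through P and (w_i, w_j), and builds the inverse from (15).

```python
import math
import random

def fprime(u):
    return 1.0 - math.tanh(u) ** 2                    # f' for the assumed f = tanh

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def orthonormal_completion(wi, wj):
    """Rows e_1..e_n: e_1, e_2 span {w_i, w_j}; the rest complete the basis."""
    n = len(wi)
    basis = []
    for cand in [wi, wj] + [[float(r == k) for r in range(n)] for k in range(n)]:
        vec = list(cand)
        for e in basis:
            c = dot(vec, e)
            vec = [a - c * b for a, b in zip(vec, e)]
        norm = math.sqrt(dot(vec, vec))
        if norm > 1e-10 and len(basis) < n:
            basis.append([a / norm for a in vec])
    return basis

def mc_block(wi, wj, ti, tj, vi, vj, samples=40000, seed=1):
    """Monte Carlo estimate of G_{w_i w_j} = v_i v_j E[f'(u_i) f'(u_j) x x^T]."""
    rng = random.Random(seed)
    n = len(wi)
    G = [[0.0] * n for _ in range(n)]
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = vi * vj * fprime(dot(wi, x) + ti) * fprime(dot(wj, x) + tj)
        for a in range(n):
            for b in range(n):
                G[a][b] += s * x[a] * x[b]
    return [[g / samples for g in row] for row in G]

def structured_parts(G, wi, wj):
    """C = O^T G O, beta, beta I + D from (14), and the inverse from (15)."""
    n = len(wi)
    E = orthonormal_completion(wi, wj)                # rows are e_1, ..., e_n
    C = matmul(matmul(E, G), transpose(E))            # C[k][l] = e_k^T G e_l
    Lam = [row[:2] for row in C[:2]]
    beta = sum(C[l][l] for l in range(2, n)) / (n - 2)   # estimate of beta in (2-4)
    O1 = transpose(E[:2])                             # n x 2 matrix (e_1, e_2)
    Wt = transpose([wi, wj])                          # n x 2 matrix (w_i, w_j)
    P = inv2(matmul(transpose(O1), Wt))               # P = K^{-1}
    core = matmul(matmul(P, Lam), transpose(P))
    PPt = matmul(P, transpose(P))
    Dm = matmul(matmul(Wt, [[core[a][b] - beta * PPt[a][b] for b in range(2)]
                            for a in range(2)]), transpose(Wt))
    G_model = [[beta * (a == b) + Dm[a][b] for b in range(n)] for a in range(n)]
    Q = matmul(matmul(O1, inv2(Lam)), transpose(O1))  # O_1 Lambda^{-1} O_1^T
    R = matmul(O1, transpose(O1))                     # O_1 O_1^T
    G_inv = [[Q[a][b] + ((a == b) - R[a][b]) / beta for b in range(n)] for a in range(n)]
    return C, beta, G_model, G_inv
```

The Monte Carlo part only confirms the structure up to sampling noise. The product of `G_model` and `G_inv` is the identity up to rounding, as (14) and (15) imply.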

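Therefore,

$$G_{w_iw_j}^{-1} = O\begin{pmatrix}\Lambda^{-1} & 0\\ 0 & \beta^{-1} I\end{pmatrix}O^T = \beta^{-1} I + H(w_i, w_j), \tag{17}$$

where $H = (w_i, w_j)\,(P\Lambda^{-1}P^T - \beta^{-1}PP^T)\,(w_i, w_j)^T$, with entries $h_{kl}$ defined by (A.6) in the Appendix.

Similarly, for the blocks $G_{w_iw_i}$ of $G_{WW}$ we have

$$G_{w_iw_i} = c_{1,i}\,I + (c_{2,i} - c_{1,i})\frac{w_iw_i^T}{|w_i|^2}, \qquad G_{w_iw_i}^{-1} = c_{1,i}^{-1}\,I + (c_{2,i}^{-1} - c_{1,i}^{-1})\frac{w_iw_i^T}{|w_i|^2},$$

where $c_{1,i}$ and $c_{2,i}$ are obtained by the same method as in Theorem 1.

3 Concluding Remarks

In the sections above, we have given a detailed and compact representation of the blocks of the diagonal block $G_{WW}$ of the FIM $G(\rho)$ on the neural manifold M of full-parametrised MLPs. This extends results on similar blocks in simpler neural manifolds to more versatile settings. The complexity of calculating the FIM in most cases permits only a block-matrix representation of the desired metric. There are two reasons: the activation functions vary widely, and the independence assumptions needed to obtain the entries of $G(\rho)$ and its inverse explicitly are too restrictive. Although an entry-wise presentation of $G(\rho)$ and $G^{-1}(\rho)$ remains out of reach, we have gone a step further by calculating some key blocks. In future work we will probably turn to the eigenstructure of the square blocks of $G(\rho)$ under stringent conditions and try to find out more, if feasible.

4 Appendix

To keep the paper focused on its core contents, we have moved the complicated computations here. Reference remarks are given at the appropriate places in the previous sections.

4.1 We present here, as a lemma, the decomposition of an orthogonal matrix with respect to some of its column/row vectors, which will be useful in future applications.

Lemma 1 Let $O = (t_1, \dots, t_n)$ be any orthogonal matrix, and let $l, m \le n$ with $l + m = n$. Then

$$\sum_{k=l+1}^{n} t_kt_k^T = I - \sum_{k=1}^{l} t_kt_k^T.$$

Proof: From the orthogonality of $O$, we obtain $I = OO^T = \sum_{k=1}^{l} t_kt_k^T + \sum_{k=l+1}^{n} t_kt_k^T$; hence $\sum_{k=l+1}^{n} t_kt_k^T = I - \sum_{k=1}^{l} t_kt_k^T$.

4.2 Exact expressions of $\Lambda$ and $\Gamma$:

$$\gamma_{11} = w_i^TG_{w_iw_j}w_i = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)f'(w_j\cdot x + \theta_j)(w_i\cdot x)^2\big],\quad \gamma_{22} = w_j^TG_{w_iw_j}w_j = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)f'(w_j\cdot x + \theta_j)(w_j\cdot x)^2\big],$$
$$\gamma_{12} = w_i^TG_{w_iw_j}w_j = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)f'(w_j\cdot x + \theta_j)(w_i\cdot x)(w_j\cdot x)\big],\quad \gamma_{21} = \gamma_{12}, \tag{A.1}$$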

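$$\lambda_{11} = e_1^TG_{w_iw_j}e_1 = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)f'(w_j\cdot x + \theta_j)(e_1\cdot x)^2\big],\quad \lambda_{22} = e_2^TG_{w_iw_j}e_2 = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)f'(w_j\cdot x + \theta_j)(e_2\cdot x)^2\big],$$
$$\lambda_{12} = e_1^TG_{w_iw_j}e_2 = v_iv_j\,E\big[f'(w_i\cdot x + \theta_i)f'(w_j\cdot x + \theta_j)(e_1\cdot x)(e_2\cdot x)\big],\quad \lambda_{21} = \lambda_{12}. \tag{A.2}$$

4.3 Exact expressions of $\beta$, $D$, etc.: We set $k = \det K$. Then $P = K^{-1} = k^{-1}\begin{pmatrix}k_{22} & -k_{12}\\ -k_{21} & k_{11}\end{pmatrix}$, so that $e_1 = k^{-1}(k_{22}w_i - k_{21}w_j)$ and $e_2 = k^{-1}(k_{11}w_j - k_{12}w_i)$. Hence

$$O_1\Lambda O_1^T = (w_i, w_j)\,P\Lambda P^T\,(w_i, w_j)^T, \tag{A-3}$$

$$O_2O_2^T = I - \frac{1}{k^2}\Big[(k_{12}^2 + k_{22}^2)\,w_iw_i^T - (k_{11}k_{12} + k_{21}k_{22})(w_iw_j^T + w_jw_i^T) + (k_{11}^2 + k_{21}^2)\,w_jw_j^T\Big], \tag{A-4}$$

and $D = d_{11}w_iw_i^T + d_{12}w_iw_j^T + d_{21}w_jw_i^T + d_{22}w_jw_j^T$ with $(d_{kl}) = P(\Lambda - \beta I_2)P^T$, i.e.,

$$d_{11} = \frac{1}{k^2}\big[\lambda_{11}k_{22}^2 - (\lambda_{12} + \lambda_{21})k_{12}k_{22} + \lambda_{22}k_{12}^2 - \beta(k_{12}^2 + k_{22}^2)\big],$$
$$d_{22} = \frac{1}{k^2}\big[\lambda_{11}k_{21}^2 - (\lambda_{12} + \lambda_{21})k_{11}k_{21} + \lambda_{22}k_{11}^2 - \beta(k_{11}^2 + k_{21}^2)\big],$$
$$d_{12} = \frac{1}{k^2}\big[-\lambda_{11}k_{21}k_{22} + \lambda_{12}k_{11}k_{22} + \lambda_{21}k_{12}k_{21} - \lambda_{22}k_{11}k_{12} + \beta(k_{11}k_{12} + k_{21}k_{22})\big],$$
$$d_{21} = \frac{1}{k^2}\big[-\lambda_{11}k_{21}k_{22} + \lambda_{21}k_{11}k_{22} + \lambda_{12}k_{12}k_{21} - \lambda_{22}k_{11}k_{12} + \beta(k_{11}k_{12} + k_{21}k_{22})\big], \tag{A.5}$$

with $\beta$ given by (2-4). Similarly, $H = h_{11}w_iw_i^T + h_{12}w_iw_j^T + h_{21}w_jw_i^T + h_{22}w_jw_j^T$ with

$$(h_{kl}) = P(\Lambda^{-1} - \beta^{-1} I_2)P^T. \tag{A.6}$$

That is, each $h_{kl}$ is obtained from the corresponding $d_{kl}$ in (A.5) by two substitutions. First, $\lambda_{11}, \lambda_{12}, \lambda_{21}, \lambda_{22}$ are replaced with $\lambda_{22}/\Lambda^*, -\lambda_{12}/\Lambda^*, -\lambda_{21}/\Lambda^*, \lambda_{11}/\Lambda^*$. Second, $\beta$ is replaced with $\beta^{-1}$.

5 Acknowledgements

The author would like to thank Mr. Guojun Zhao for discovering the relationship in Lemma 1 of the Appendix, which leads to the final detail of (13).

References

[1] Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Computation, Vol. 10, pp. 251-276. MIT Press, 1998.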

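[2] C. R. Rao. Linear Statistical Inference and Its Applications, 2nd ed. John Wiley & Sons, 1973.
[3] Shun-ichi Amari. Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, Vol. 28. Springer-Verlag, New York, 1985.
[4] Changlin Cai, Zhongzhi Shi, Xiongzhi Chen. The Fisher Information Matrix on Neural Manifolds of Multilayer Perceptrons.
[5] S. Amari, H. Park and K. Fukumizu. Adaptive Method of Realizing Natural Gradient Learning for Multilayer Perceptrons. Submitted to Neural Computation.

Key Block Submatrices of the Fisher Information Matrix on Multilayer Perceptron Manifolds
Chen Xiongzhi (陈雄志), Cai Changlin (蔡长林)
School of Mathematics, Sichuan University, Chengdu 610064

Abstract: It is well known that natural gradient learning ([1]) can avoid local optima or plateau phenomena in the training process, because it takes into account the intrinsic geometric properties of the parameter space. However, the natural gradient ([1]) is itself induced by the Fisher information matrix (FIM) defined on the 1-form tangent space. Computing the FIM is therefore a key step in realizing natural gradient learning. This paper gives explicit expressions and compact matrix representations of the key diagonal sub-blocks of the FIM on the manifold of full-parametrised MLPs, and of their inverses. In this way, the paper extends and supplements the results partially given in [1] and [3].

Keywords: Neural manifold, Natural gradient learning, Fisher information matrix (FIM), Full-parametrised multilayer perceptron (Full-parametrised MLP)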