Why feed-forward networks are in a bad shape
Patrick van der Smagt, Gerd Hirzinger
Institute of Robotics and System Dynamics
German Aerospace Center (DLR Oberpfaffenhofen)
Wessling, GERMANY
email: smagt@dlr.de

In: L. Niklasson, M. Bodén, and T. Ziemke, editors, Proceedings of the 8th International Conference on Artificial Neural Networks. Springer Verlag, 1998.

Abstract. It has often been noted that the learning problem in feed-forward neural networks is very badly conditioned. Although the special form of the transfer function is usually taken to be the cause of this condition, we show that it is caused by the manner in which neurons are connected. By analyzing the expected values of the Hessian in a feed-forward network it is shown that, even in a network where all the learning samples are well chosen and the transfer function is not in its saturated state, the system has a non-optimal condition. We subsequently propose a change in the feed-forward network structure which alleviates this problem. We finally demonstrate the positive influence of this approach.

1 Introduction

It has long been known [1, 3, 4, 6] that learning in feed-forward networks is a difficult problem, and that this is intrinsic to the structure of such networks. The cause of the learning difficulties is reflected in the Hessian matrix of the learning problem, which consists of the second derivatives of the error function. When the Hessian is very badly conditioned, the error function has a very strongly elongated shape; indeed, extremely large condition numbers are no exception in feed-forward network learning, and can in fact mean that the problem exceeds the representational accuracy of the computer. We will show that this problem is caused by the structure of the feed-forward network. An adaptation to the learning rule is shown to improve this condition.

2 The learning problem

We define a feed-forward neural network with a single layer of hidden units, where $i$ indicates the $i$-th input, $h$ the $h$-th hidden unit, $o$ the $o$-th output, and $\vec{x}$ is an input vector $(x_1, x_2, \ldots, x_N)$:

$$\mathcal{N}(\vec{x}; W)_o = \sum_h w_{ho}\, s\Big(\sum_i w_{ih} x_i + \theta_h\Big). \tag{1}$$
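As a concrete illustration of Eq. (1), the following is a minimal NumPy sketch of the forward pass, assuming tanh for the transfer function $s()$; the variable names (`W_ih`, `theta_h`, `W_ho`) are illustrative only and not taken from the paper.

```python
import numpy as np

def forward(x, W_ih, theta_h, W_ho):
    """N(x; W)_o = sum_h w_ho * s(sum_i w_ih * x_i + theta_h), with s = tanh."""
    a_h = np.tanh(W_ih @ x + theta_h)   # hidden unit activations
    return W_ho @ a_h                   # weighted sum into the output units

# Example: 2 inputs, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
W_ih = rng.normal(size=(3, 2))    # weights input -> hidden
theta_h = np.zeros(3)             # hidden unit biases
W_ho = rng.normal(size=(1, 3))    # weights hidden -> output
print(forward(np.array([0.5, -0.2]), W_ih, theta_h, W_ho))
```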
The total number of weights $w_{ij}$ is $n$. W.l.o.g. we will assume $N_o = 1$ in the sequel. The learning task consists of minimizing an approximation error

$$E_{\mathcal{N}}(W) = \frac{1}{2} \sum_{p=1}^{P} \big\| \mathcal{N}(\vec{x}^{(p)}; W) - \vec{y}^{(p)} \big\|^2 \tag{2}$$

where $P$ is the number of learning samples and $\vec{y}$ is an output vector $(y_1, y_2, \ldots, y_{N_o})$. $\|\cdot\|$ is usually taken to be the $L_2$ norm. The error $E$ can be written as a Taylor expansion around $W_0$:

$$E_{\mathcal{N}}(W_0 + \Delta W) = E(W_0) - J^T \Delta W + \tfrac{1}{2}\, \Delta W^T H \Delta W + r(W_0). \tag{3}$$

Here, $H$ is the Hessian and $J$ is the Jacobian of $E$ around $W_0$. In the case that $E$ is quadratic, the rest term is 0 and a Newton-based second-order search technique can then be used to find the extremum of the ellipsoid $E$. A problem arises, however, when the axes of this ellipsoid are very different. When the ratio of the lengths of the largest and smallest axes is very large (close to the computational precision of the computer used in optimization), the computation of the exact local gradient will be imprecise, such that the system is difficult to minimize. Now, since $H$ is a real symmetric matrix, its eigenvectors span an orthogonal basis, and the directions of the axes of $E$ are equal to the eigenvectors of $H$. Furthermore, the corresponding eigenvalues are inversely proportional to the squared lengths of the axes. Therefore, the condition number of the Hessian matrix, which is defined as the ratio of the largest and smallest singular values (and therefore, for a positive definite matrix, of the largest and smallest eigenvalues), determines how well the error surface $E$ can be minimized.

2.1 The linear network

In the case that $s()$ is the identity, $\mathcal{N}(W)$ is equivalent to a linear feed-forward network $\mathcal{N}'(W')$ without hidden units, and we can write

$$E_{\mathcal{N}'}(W'_0 + \Delta W') = E(W'_0) - J^T \Delta W' + \tfrac{1}{2}\, \Delta W'^T H \Delta W'. \tag{4}$$

In the linear case the Hessian reduces to (leaving the index $(p)$ out)

$$H_{jk} = P^{-1} \sum_p x_j x_k, \qquad 1 \le j, k \le n = N + 1, \tag{5}$$

where, for notational simplicity, we set $x_{N+1} \equiv 1$. In this case $H$ is the covariance matrix of the input patterns. It is instantly clear that $H$ is a positive definite symmetric matrix. Le Cun et al. [1] show that, when the input patterns are uncorrelated, $H$ has a continuous spectrum of eigenvalues $\lambda_- < \lambda < \lambda_+$. Furthermore, there is one large eigenvalue, of order $n$, present only in the case that $\langle x_k \rangle \ne 0$. Therefore, the Hessian for a linear feed-forward network is optimally conditioned when $\langle x_k \rangle = 0$.
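This can be checked numerically. The sketch below (an assumed setup: uncorrelated normal inputs plus a constant bias input $x_{N+1} = 1$, as in Eq. (5)) compares the condition number of the linear-network Hessian for centered and uncentered inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
P, N = 10_000, 5   # learning samples, inputs

def hessian_cond(mean):
    X = rng.normal(loc=mean, scale=1.0, size=(P, N))  # P samples, N inputs
    Xb = np.hstack([X, np.ones((P, 1))])              # x_{N+1} = 1 (bias input)
    H = Xb.T @ Xb / P                                 # Eq. (5): H_jk = <x_j x_k>
    return np.linalg.cond(H)

print("centered inputs (mean 0):   cond(H) =", hessian_cond(0.0))
print("uncentered inputs (mean 2): cond(H) =", hessian_cond(2.0))
```

With centered inputs the covariance matrix is close to the identity and the condition number stays near 1; shifting the input mean away from zero makes it grow sharply.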
The reason for this behaviour is very understandable. As $P \to \infty$, the summation of uncorrelated elements $x_i x_j$ will cancel out when $\langle x \rangle = 0$, except where $i = j$, i.e., on the diagonal of the covariance matrix. In the limit these diagonal elements go towards the variance of the input data, $\sigma^2(x_i) = P^{-1} \sum_p \big(x_i^{(p)}\big)^2$. From Gerschgorin's theorem we know that the eigenvalues of a diagonal matrix equal the elements on the diagonal.

2.2 Multi-layer feed-forward networks

In the case that a nonlinear feed-forward network is used, i.e., $s()$ is a nonlinear transfer function, the rest term in Eq. (3) cannot be neglected in general. However, it is a well-known fact [2] that the rest term $r()$ is negligible close enough to a minimum. From the definition of $E_{\mathcal{N}}$ we can compute that

$$H_{j,k} = P^{-1} \sum_p \left[ \big(\mathcal{N}(\vec{x}) - y\big)\, \frac{\partial^2 \mathcal{N}(\vec{x})}{\partial w_j\, \partial w_k} + \frac{\partial \mathcal{N}(\vec{x})}{\partial w_j}\, \frac{\partial \mathcal{N}(\vec{x})}{\partial w_k} \right]. \tag{6}$$

We investigate the properties of $H_{j,k}$ of Eq. (6). The first term of the Hessian has a factor $[\mathcal{N}(\vec{x}) - y]$. Close to a minimum, this factor is close to zero such that it can be neglected. Also, when summed over many learning samples, this factor equals the random measurement error and cancels out in the summation. Therefore we can write

$$H_{j,k} \approx P^{-1} \sum_p \frac{\partial \mathcal{N}(\vec{x})}{\partial w_j}\, \frac{\partial \mathcal{N}(\vec{x})}{\partial w_k}. \tag{7}$$

2.3 Properties of the Hessian

Every Hessian of a feed-forward network with one layer of hidden units can be partitioned into four parts, depending on whether the derivative is taken with respect to a weight from input to hidden or from hidden to output unit. We take the simplification of Eq. (7) as a starting point. The partial derivatives of $\mathcal{N}$ can be computed as

$$\frac{\partial \mathcal{N}}{\partial w_{ho}} = s\Big(\sum_i w_{ih} x_i + \theta_h\Big) \equiv a_h, \qquad \frac{\partial \mathcal{N}}{\partial w_{ih}} = x_i\, w_{ho}\, s'(a_h) \equiv x_i\, w_{ho}\, a'_h,$$

where $s'()$ is the derivative of $s()$. We can write the Hessian as a block matrix

$$H = \begin{pmatrix} {}^{00}H & {}^{01}H \\ {}^{10}H & {}^{11}H \end{pmatrix}$$

with

$${}^{00}H = P^{-1} \sum_p (x_{i_1} w_{h_1 o} a'_{h_1})(x_{i_2} w_{h_2 o} a'_{h_2}), \qquad {}^{10}H = P^{-1} \sum_p (x_i w_{h_1 o} a'_{h_1})\, a_{h_2},$$
$${}^{01}H = P^{-1} \sum_p (x_i w_{h_2 o} a'_{h_2})\, a_{h_1}, \qquad {}^{11}H = P^{-1} \sum_p a_{h_1} a_{h_2}$$

(note that ${}^{10}H^T = {}^{01}H$).
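The following sketch shows how the four blocks arise in practice under the Gauss-Newton simplification of Eq. (7). The settings (tanh units, standard-normal inputs, a single output, moderate weights) are assumptions for illustration; under them the ${}^{11}H$ entries typically come out larger than the ${}^{00}H$ entries, and the gap widens as the input-to-hidden weights grow and the hidden units saturate.

```python
import numpy as np

rng = np.random.default_rng(2)
P, N, Nh = 1000, 3, 4                   # samples, inputs, hidden units
X = rng.normal(size=(P, N))             # centered normal(0,1) inputs
W_ih = 0.5 * rng.normal(size=(Nh, N))   # input -> hidden weights (raise the
theta_h = np.zeros(Nh)                  # 0.5 factor to see saturation effects)
w_ho = rng.normal(size=Nh)              # hidden -> (single) output weights

rows = []
for x in X:
    a_h = np.tanh(W_ih @ x + theta_h)         # hidden activations
    da_h = 1.0 - a_h ** 2                     # a'_h for s = tanh
    g_ih = np.outer(w_ho * da_h, x).ravel()   # dN/dw_ih = x_i w_ho a'_h
    g_ho = a_h                                # dN/dw_ho = a_h
    rows.append(np.concatenate([g_ih, g_ho]))
G = np.asarray(rows)
H = G.T @ G / P                               # Eq. (7), Gauss-Newton form

k = Nh * N                                    # boundary between weight groups
print("mean |00H| =", np.abs(H[:k, :k]).mean())
print("mean |11H| =", np.abs(H[k:, k:]).mean())  # typically larger here
print("cond(H)    =", np.linalg.cond(H))
```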
Assuming that the input samples have a normal (0,1) distribution, we can analytically compute the expectations and variances of the elements in $H$ by determining the distribution functions of these elements. Figure 1 (left) depicts the expectations and standard deviations of the elements of $H$ for $P = 100$. From the figure it can be seen that, even though the network is in an optimal state, the elements of ${}^{11}H$ are much larger than those of ${}^{00}H$. Naturally, this effect is much stronger when the weights from input to hidden units are large, such that the hidden units are close to their saturated state. In that case, $a'_h \approx 0$ and the elements of ${}^{00}H$ (as well as those of ${}^{01}H$) tend to 0.

The centering method as proposed in [1], when also applied to the activation values of the hidden units, ensures an optimal condition for ${}^{11}H$. Schraudolph and Sejnowski [4] have shown that centering in backpropagation further improves the learning problem. Using argumentation similar to Le Cun et al. [1], they furthermore suggest that centering the $\delta_o = y_o - a_o$ as well as the $\delta_h = \sum_o \delta_o w_{ho}\, s'(a_h)$ improves the condition of $H$. Although this will improve the condition of ${}^{00}H$ and ${}^{01}H$, the problem that the elements of ${}^{00}H$ and ${}^{11}H$ are very different in size remains. We suggest that this approach alone is not sufficient to improve the learning problem.

3 An adapted learning rule

To understand why the elements of $H$ are so different, we have to consider the back-propagation learning rule. First, Saarinen et al. [3] list a few cases in which the Hessian may become (nearly) singular. The listed reasons are associated with the ill character of the transfer functions which are customarily used: the sigmoidal function $s(x)$ saturates (i.e., its derivative becomes zero) for large input. However, another problem exists: when a network has a small weight leaving from a hidden unit, the influence of the weights that feed into this hidden unit is significantly reduced. This touches a characteristic problem in feed-forward network learning: the gradients in the lower-layer weights are influenced by the higher-layer weights. Why this is so can be seen from the back-propagation learning method, which works as follows. For each learning sample:

1. compute $\delta_o = y_o - a_o$, where $a_o$ is the activation value for output unit $o$;
2. compute $\Delta w_{ho} = \delta_o a_h$, where $a_h$ is the activation for hidden unit $h$;
3. compute $\delta_h = \sum_o \delta_o w_{ho}\, s'(a_h)$;
4. compute $\Delta w_{ih} = \delta_h x_i = \sum_o \delta_o w_{ho}\, s'(a_h)\, x_i$.

The gradient is then computed as the summation of the $\Delta w$'s. The gradient for a weight from an input to a hidden unit becomes negligible when $\delta_o$ is small (i.e., the network correctly represents this sample), $x_i$ is small (i.e., the network input is close to 0), $w_{ho}$ is small, or $s'(a_h)$ is small (because $w_{ih}$ is large). The latter two of these cases are undesirable, and lead to paralysis of the weights from input to hidden units.
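For reference, the four steps read as follows in NumPy for a single learning sample, again assuming $s = \tanh$; the function and variable names are illustrative.

```python
import numpy as np

def backprop_sample(x, y, W_ih, theta_h, W_ho):
    """One backpropagation pass for a single learning sample (steps 1-4)."""
    a_h = np.tanh(W_ih @ x + theta_h)       # hidden activations
    a_o = W_ho @ a_h                        # output activations
    delta_o = y - a_o                       # step 1
    dW_ho = np.outer(delta_o, a_h)          # step 2: delta_o * a_h
    s_prime = 1.0 - a_h ** 2                # s'(a_h) for s = tanh
    delta_h = (W_ho.T @ delta_o) * s_prime  # step 3: sum_o delta_o w_ho s'(a_h)
    dW_ih = np.outer(delta_h, x)            # step 4: delta_h * x_i
    return dW_ih, dW_ho
```

Note how `dW_ih` carries the factors `W_ho` and `s_prime`: if either is small, the gradient of the input-to-hidden weights vanishes, which is exactly the paralysis described above.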
Figure 1: (left) The distribution of the elements of $H$ for $P = 100$. (right) An exemplary linearly augmented feed-forward neural network.

In order to alleviate these problems, we propose a change to the learning rule as follows:

$$\Delta w_{ih} = \sum_o \delta_o \big( w_{ho}\, s'(a_h) + a_h \big) x_i = \delta_h x_i + a_h x_i \sum_o \delta_o. \tag{8}$$

By adding $a_h$ to the middle term, we can solve both paralysis problems. In effect, an extra connection from each input unit to each output unit is created, with a weight value coupled to the weight from the input to hidden unit. The $o$-th output of the neural network is now computed as

$$\mathcal{M}(\vec{x}; W)_o = \sum_h w_{ho}\, s\Big(\sum_i w_{ih} x_i + \theta_h\Big) + \sum_h S\Big(\sum_i w_{ih} x_i + \theta_h\Big), \tag{9}$$

where $dS(x)/dx \equiv s(x)$. In the case that $s(x)$ is the tanh function we find that $S(x) = \log\cosh x$; note that this function asymptotically goes to $|x| - \log 2$ for large $x$. In effect, we add the absolute of the hidden unit activation values to each output unit. We call the new network the linearly augmented feed-forward network. The structure of this network is depicted in figure 1 (right) for one input and output unit and two hidden units.

Analysis of the Hessian's condition. We can compute the approximation error $E'$ for $\mathcal{M}$ similar to (2) and construct the Hessian $H'$ for $\mathcal{M}$. We can relate the quadrants of $H'$ to those of $H$ as follows. First, ${}^{11}H' = {}^{11}H$, and

$${}^{00}H'_{i_1+h_1 N,\; i_2+h_2 N} = {}^{00}H_{i_1+h_1 N,\; i_2+h_2 N} + P^{-1} \sum_p x_{i_1} a_{h_1} x_{i_2} a_{h_2} \big(1 + w_{h_1 o} a'_{h_1} + w_{h_2 o} a'_{h_2}\big),$$
$${}^{01}H'_{i+h_1 N,\; h_2} = {}^{01}H_{i+h_1 N,\; h_2} + P^{-1} \sum_p x_i\, a_{h_1} a_{h_2}.$$

Using the same line of argument as in section 2.1 we note that it is important that the $a$'s be centered, i.e., (a) the input values should be centered, and (b) it is advantageous to use a centered hidden unit activation function (e.g., $\tanh(x)$ rather than $1/(1+e^{-x})$).
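A sketch of the linearly augmented network and the adapted rule, under the reconstruction of Eqs. (8) and (9) above (tanh units, hence $S(x) = \log\cosh x$, computed in an overflow-safe form); the function and variable names are illustrative.

```python
import numpy as np

def forward_augmented(x, W_ih, theta_h, W_ho):
    """M(x; W)_o of Eq. (9) with s = tanh, hence S(x) = log cosh x."""
    net_h = W_ih @ x + theta_h
    a_h = np.tanh(net_h)                             # ordinary hidden term
    S_h = np.logaddexp(net_h, -net_h) - np.log(2.0)  # log cosh, overflow-safe
    return W_ho @ a_h + S_h.sum()    # every output receives sum_h S(net_h)

def grad_W_ih_augmented(x, delta_o, W_ih, theta_h, W_ho):
    """Adapted rule, Eq. (8): dw_ih = sum_o delta_o (w_ho s'(a_h) + a_h) x_i."""
    a_h = np.tanh(W_ih @ x + theta_h)
    s_prime = 1.0 - a_h ** 2
    coeff = (W_ho.T @ delta_o) * s_prime + a_h * delta_o.sum()
    return np.outer(coeff, x)
```

The added `a_h * delta_o.sum()` term keeps the input-to-hidden gradient alive even when `W_ho` or `s_prime` is close to zero, which is precisely the point of the augmentation.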
$\mathcal{M}$ and the Universal Approximation Theorems. It has been shown in various publications that the ordinary feed-forward neural network $\mathcal{N}$ can represent any Borel-measurable function with a single layer of hidden units which have sigmoidal or Gaussian activation functions. It can be easily shown [6] that $\mathcal{N}$ and $\mathcal{M}$ are equivalent, such that all representation theorems that hold for $\mathcal{N}$ also hold for $\mathcal{M}$.

4 Examples

The new method has been tested on a few problems. First, OR classification with two hidden units. Secondly, the approximation of $\sin(x)$ with 3 hidden units and 11 learning samples randomly chosen between 0 and $2\pi$. Third, the approximation of the inverse kinematics plus inverse perspective transform for a 3-DoF robot arm with a camera fixed in the end-effector. The network consisted of 5 inputs, 8 hidden units, and 3 outputs; the 1107 learning samples were gathered using a Manutec R3 robot. All networks were trained with Polak-Ribière conjugate gradient with Powell restarts [5]. All experiments were run 1000 times with different initial weights. The results are summarized below. Note that, for the chosen problems, $\mathcal{M}$ effectively smoothes away local minima and saddle points (% stuck goes to 0.0). The reported $E$ was measured only over those runs which did not get stuck.

                                            N             M
  OR     % stuck                            -             0.0
         # steps to reach E = 0.…           -             -
  sin    % stuck                            -             0.0
         E after 1000 iterations       29 · 10⁻⁵     7.3 · 10⁻⁵
  robot  E after 1000 iterations      5.3 · 10⁻³     1.8 · 10⁻³

References

[1] Y. Le Cun, I. Kanter, and S. A. Solla. Eigenvalues of covariance matrices: Application to neural network learning. Physical Review Letters, 66(18):2396-2399, 1991.
[2] E. Polak. Computational Methods in Optimization. Academic Press, New York, 1971.
[3] S. Saarinen, R. Bramley, and G. Cybenko. Ill-conditioning in neural network training problems. SIAM Journal of Scientific Computing, 14(3), May 1993.
[4] N. N. Schraudolph and T. J. Sejnowski. Tempering backpropagation networks: Not all weights are created equal. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, 1996.
[5] P. van der Smagt. Minimisation methods for training feed-forward networks. Neural Networks, 7(1):1-11, 1994.
[6] P. van der Smagt and G. Hirzinger. Solving the ill-conditioning in neural network learning. In G. Orr, K. Müller, and R. Caruana, editors, Tricks of the Trade: How to Make Neural Networks Really Work. Lecture Notes in Computer Science, Springer Verlag. In print.