On Topology, Size and Generalization of Non-linear Feed-Forward Neural Networks

Stephan Rudolph
Institute for Statics and Dynamics of Aerospace Structures
University of Stuttgart, Pfaffenwaldring 27, D-70569 Stuttgart, Germany

Published in: Neurocomputing, vol. 16, no. 1 (July 1997), pp. 1-22.

Abstract. The use of similarity transforms in the design and the interpretation of feed-forward neural networks is proposed. The method is based on the so-called Buckingham-Theorem or Pi-Theorem and is valid for all neural network function approximation problems which belong to the class of dimensionally homogeneous equations. The new design method allows the a priori determination of a minimal topology size of the first and last network layer. Finally, the correct and unique pointwise generalization capability of the new so-called similarity network topology is proved and illustrated using two examples.

Keywords: Pi-Theorem, similarity transforms, similarity functions, dimensional homogeneity, neural network generalization, neural network topology.

1 Introduction

The potential of feed-forward neural networks to approximate the functional relationship g implicitly encoded in a certain number of p training patterns $\{x_1, \ldots, x_n\}_p$ has originated much research in the understanding, the training and the generalization performance of neural networks [23, 22, 33, 34, 37]. Usually such feed-forward neural networks are composed of $k$ layers with up to $j$ summation units $s_{j,k}$ which sum their inputs $x_{i,k}$. Each input is hereby multiplied by an adjustable weight $w_{ij,k}$. The summation result $s_{j,k}$ with

$$s_{j,k} = \sum_i w_{ij,k} x_{i,k} + w_{0,k} \qquad (1)$$

is then propagated through a non-linear function $h(s_{j,k})$ to the output, which serves as input for the nodes of the next network layer. Often non-linear functions of the type

$$h(s_{j,k}) = \frac{e^{s_{j,k}}}{1 + e^{s_{j,k}}} \qquad (2)$$

are chosen, which are commonly referred to as sigmoidal functions [33, 34]. The computational power attributed to these networks originates mainly from these non-linear functions $h(s_{j,k})$ of the weighted sums, since the limitations of linear neural network models are now well understood and documented in the literature [21, 24, 33]. On the other hand, it is mainly this non-linearity which makes any deeper mathematical analysis of the network properties and performance very difficult [20, 34]. Neural network research has therefore mainly concentrated on the establishment of certain classes of non-linear multi-layered feed-forward neural networks for which theorems and theoretical bounds for the approximation properties can be stated [1, 2, 7, 15, 36]. But until today no general theory has been presented for the a priori topology design, the explanation of the generalization properties and the a posteriori interpretation of non-linear multi-layered feed-forward neural networks.
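As a quick illustration of equations (1) and (2), the following minimal Python sketch (not part of the original paper; the inputs, weights and bias are purely illustrative values) evaluates one summation unit followed by the sigmoidal transfer function.

```python
import numpy as np

def unit_output(x, w, w0):
    """One summation unit: the weighted sum of eq. (1) followed by the
    sigmoidal transfer function of eq. (2)."""
    s = np.dot(w, x) + w0                  # s = sum_i w_i * x_i + w_0
    return np.exp(s) / (1.0 + np.exp(s))   # h(s) = e^s / (1 + e^s)

# purely illustrative inputs and weights
print(unit_output(np.array([0.2, -1.0, 0.5]), np.array([0.4, 0.1, -0.3]), 0.05))
```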

1.1 Motivation

The search for a general theory for the design of feed-forward neural networks means that one seeks additional constraints to reduce the class of all possibly imaginable functional relationships g to a smaller subclass f, for which certain general properties can be derived. Inherently such additional constraints are difficult to select a priori, since the form of g is naturally unknown when using feed-forward neural networks to approximate the functional relationship g encoded in a set of p training patterns $\{x_1, \ldots, x_n\}_p$, as shown in Fig. 1.

[Figure 1: Classical neural network: the inputs $(x_1, x_2, \ldots, x_{n-1})$ are mapped by the approximation function g onto the output $x_n$.]

It is evident that a too weak assumption upon the unknown function g may not result in a very significant restriction of the remaining function class properties of f, while a too strong assumption may reduce the range of validity to a too narrow, possibly unimportant function class f. Ideally, the assumption should also be immediately justifiable by inspection of the a priori available knowledge, e.g. from the available set of training patterns $\{x_1, \ldots, x_n\}_p$ only. A well suited a priori assumption in all technical applications of feed-forward neural networks, made without loss of generality, is the restriction of all possible neural network functions g to the class of all dimensionally homogeneous functions f. Therefore, in later parts of the paper the terms dimensional homogeneity, dimensionally homogeneous equation, dimension, dimensionless product, similarity transform and similarity function will be of major importance for the understanding of the subsequent derivations using the Pi-Theorem [5, 6]. The definitions and usage of these terms in the natural sciences are recalled for this reason in the following paragraph.

1.2 Definition of terms

The term dimensional homogeneity simply means that in any equation $f(x_1, \ldots, x_n) = 0$ (this is the implicit notation of $x_n = f(x_1, \ldots, x_{n-1})$ only, where $x_n$ is commonly denoted as $y$) the functional relationship of the physical variables $x_1, \ldots, x_n$ has to apply to the physical dimensions of the variables (usually expressed in SI-units) as well (e.g. from $[\mathrm{Nm}] = [\mathrm{kg}]([\mathrm{m}]/[\mathrm{s}])^2$ it follows that $E = mc^2$ fulfills the dimension check and is possibly correct, if the quantitative validity of the equation can also be shown). This means that any physical dimension (i.e. an SI-unit like [kg]) cannot be created from or disappear into the void. This so-called principle of dimensional homogeneity guarantees that in every possible and correct physical equation the dimensions on the left hand side of the equal sign are always identical to those on the right hand side. All (the known as well as the still unknown) equations in physics therefore belong to the so-called class of dimensionally homogeneous equations. The term dimension thus has multiple meanings in mathematics and physics. A vector $x = \{x_1, x_2, x_3\}^T$ is called a 3-dimensional vector since it has 3 components. If the vector components are the three physical variables of the previous example (i.e. $x = \{m, c, E\}^T$), each of the vector components also has physical dimensions. The term dimensionless product stands for a special class of monomial expressions of the form $\Pi_j = x_j \prod_i x_i^{\alpha_{ji}}$ which have no physical dimensions (i.e. are dimensionless) and are formed out of a (sub)set of physical variables $x_i, x_j$.
These physical variables $x_i, x_j$ are again elements of the set of variables in the very same existing dimensionally homogeneous equation $f(x_1, \ldots, x_n) = 0$. The monomial expression $\Pi_j$ can also be interpreted as a mapping $x_i, x_j \mapsto \Pi_j$ which belongs to the class of so-called similarity transforms. A dimensionless function $F(\Pi_1, \ldots, \Pi_m) = 0$ in these variables $\Pi_j$ is also called a similarity function. The principle of dimensional homogeneity is purely epistemologically based and is the axiomatic foundation of group theoretic methods in mathematical physics [3, 4]. Its philosophical foundation is used in the validation of any theoretical model building in physics and engineering. The general validity of this principle has led to the establishment of the commonly known statement that one cannot compare apples with oranges (i.e. 5 [apples] $\neq$ 5 [oranges]).

In all natural sciences it is therefore generally agreed that any dimensionally non-homogeneous model cannot be correct [6, 9, 11]. The principle of dimensional homogeneity must therefore always be observed.

2 Theoretical Foundation

This section introduces without proof the so-called Buckingham- or Pi-Theorem.

Pi-Theorem [5, 6]. From the existence of a dimensionally homogeneous and complete equation f of n physical quantities $x_i$, the existence of an equation F of only m dimensionless quantities $\Pi_j$ can be shown:

$$f(x_1, \ldots, x_n) = 0 \qquad (3)$$

$$F(\Pi_1, \ldots, \Pi_m) = 0 \qquad (4)$$

where $r = n - m$ is the rank of the dimensional matrix constructed from the $x_i$, and with dimensionless quantities $\Pi_j$ of the form

$$\Pi_j = x_j \prod_{i=1}^{r} x_i^{\alpha_{ji}} \qquad (5)$$

with $j = 1, \ldots, m \in \mathbb{N}$ and $\alpha_{ji} \in \mathbb{R}$ as constants.

The implicit form $f(x_1, \ldots, x_n) = 0$ includes here all explicit notations $x_n = f(x_1, \ldots, x_{n-1})$, since any explicit equation can always be written in an implicit form. Furthermore, modern proofs of the Pi-Theorem [9, 11] impose no special assumptions on the specific nature of the operator f. It can be proved that every dimensionally homogeneous equation f in physics can be subjected to the Pi-Theorem (i.e. algebraic equations, differential equations, integro-differential equations, and so on).

2.1 Dimensional Matrix

The so-called dimensional matrix associated with each set of p training patterns $x_1, \ldots, x_n$ is shown on the left hand side of Fig. 2. This dimensional matrix has n rows for the variables $x_i$ and up to k columns for the representation of the dimensional exponents $e_{ij}$ of the variables $x_i$ in the k base dimensions $s_k$ of the employed unit system. In the currently known SI-unit system seven dimensions (mass, length, time, temperature, current, amount of substance and intensity of light) are distinguished, thus $k \leq 7$. Further examples of dimensional matrices are given in Figs. 4 and 16. To calculate the dimensionless products $\Pi_j$ in equation (5), the dimensional matrix of the pattern $x_1, \ldots, x_n$ as shown on the left hand side of Fig. 2 needs to be created. By rank preserving operations the upper diagonal form of the dimensional matrix as shown on the right hand side of Fig. 2 is obtained.

[Figure 2: Definition of the dimensional matrix: left, the original matrix with rows $x_1, \ldots, x_n$, columns $s_1, \ldots, s_k$ and entries $e_{ij}$; right, the transformed matrix with the row groups $x_1, \ldots, x_r$ (r rows) and $x_{r+1}, \ldots, x_{r+m}$ (m rows), columns $s_1, \ldots, s_r$, an upper diagonal part and the exponents $\alpha_{ji}$ in the lower, hatched part.]

This means that either multiples of matrix columns may be added to each other or that matrix rows may be interchanged. The unknown exponents $\alpha_{ji}$ of the dimensionless products in equation (5) are then automatically determined by negation of the values of the resulting matrix elements in the hatched part of the matrix on the lower right hand side of Fig. 2. (Note: with respect to equation (5) the index $j = 1, \ldots, m$ of the variables $x_j$ refers hereby to the group of variables $x_i$ with $i = r+1, \ldots, r+m$, i.e. $x_{r+1}, \ldots, x_{r+m}$ as shown in the hatched part of the dimensional matrix in Fig. 2, right. This index transform greatly simplifies the notation of the dimensionless groups and is used consistently in the following.) The addition of matrix columns which contain the physical dimension exponents $e_{ij}$ to one another means that the dimensions of the original dimensional representation system $s_1, \ldots, s_k$ will be combined with each other by multiplication.
This signifies that the dimensional representations of the variables $x_1, \ldots, x_n$ will be transformed into a representation in another, equivalent dimensional representation system $s_1, \ldots, s_r$, with $r \leq k$ [28, 29]. A physical interpretation of such a representation change is given in the next section.
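The construction of section 2.1 can be mechanized. The sketch below is an illustration added here, not part of the original paper; it uses the fact that an exponent vector $a$ with $D^T a = 0$ defines a dimensionless product $\prod_i x_i^{a_i}$, so the null space of the transposed dimensional matrix yields one valid, although not necessarily unique, set of exponent vectors. The matrix shown is the bending-bar matrix of Fig. 4 (section 2.2); sympy is assumed to be available.

```python
import sympy as sp

# Rows of the dimensional matrix D hold the exponents of each variable in the
# k base dimensions; an exponent vector a with D^T a = 0 defines a
# dimensionless product prod_i x_i^(a_i).  The matrix below is the
# bending-bar matrix of Fig. 4 (columns [M], [L], [T]).
D = sp.Matrix([
    [1,  1, -2],   # P
    [0,  1,  0],   # l
    [1, -1, -2],   # E
    [0,  4,  0],   # I
    [0,  1,  0],   # u
])

n, k = D.shape
r = D.rank()
m = n - r                       # number of dimensionless products (Pi-Theorem)
exponents = D.T.nullspace()     # one exponent vector per dimensionless product
print(f"n={n}, r={r}, m={m}")   # -> n=5, r=2, m=3
for a in exponents:
    print(list(a))              # exponents over (P, l, E, I, u)
```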

2.2 Dimensionless Products

A practical example of the derivation of dimensionless products $\Pi_j$ from a dimensional matrix is shown in the following. The simple example of a bending truss bar is used to illustrate the theoretical concept of the Pi-Theorem introduced above and to show the straightforward transformation of a dimensionally homogeneous function f into a dimensionless function F. The bending of a truss bar of length l, with a material of Young's modulus E and a cross sectional moment I, exhibits a deflection u under a load P. This is shown in Fig. 3.

[Figure 3: Bending of a truss bar of length l, Young's modulus E and cross sectional moment I under a load P with deflection u.]

Since the underlying differential equation of linear bending theory is solvable, a closed analytical solution in the form of f exists and is well known in the literature [19, 38] to be equal to

$$f(l, E, P, I, u) = 0 \qquad (6)$$

with

$$u = \frac{P l^3}{3 E I} \qquad (7)$$

Ignoring for a moment the final result in equation (7) and starting only with the knowledge of the physical dimensions of the variables $l, E, P, I, u$ in the relevance list of equation (6), the following dimensional matrix can immediately be established. In Fig. 4, left, the dimension exponents of the physical variables are given in a ([M]ass, [L]ength, [T]ime)-system.

[Figure 4: Dimensional Matrix Computations]

Variable   SI-units        [M] [L] [T]        [F] [L]
P          [kg m/s^2]       1   1  -2    =>    1   0
l          [m]              0   1   0          0   1
E          [kg/(m s^2)]     1  -1  -2          1  -2
I          [m^4]            0   4   0          0   4
u          [m]              0   1   0          0   1

By adding multiples of the matrix columns to each other on the left hand side of Fig. 4, the modified dimensional matrix on the right hand side of Fig. 4 is obtained. There the dimensional representations of the physical variables in the former ([M]ass, [L]ength, [T]ime)-system have now been transformed into their equivalent dimensional representation in a ([F]orce, [L]ength)-system [28, 29]. (Note: in similarity theory it can be shown that there exists an infinite number of equivalent dimensional representation systems which can be transformed into one another by simple matrix operations [11]. The derivation of this proof lies however outside the main scope of this paper. The choice of a set of k fundamental dimensions to describe n physical variables can be seen as analogous to the choice of a set of k linearly independent base vectors of a linear vector space. If the original base vectors are then replaced by a linear combination of the latter to form a new vector base, the coordinate representations of the n vectors described in the original base system will change accordingly. This is exactly what is shown by the operations in the dimensional matrix in Fig. 4. A representation change may thus lead to a more or less dense coordinate description, according to a more or less appropriate choice of the base vectors.) Concerning the right hand side of Fig. 4, the third column (i.e. [T]ime) has been omitted, since it contains only zeros. Since the shown bending problem is static, it requires no explicit modeling of time. The rank of both dimensional matrices (in Fig. 4 left as well as in Fig. 4 right) is $r = 2$, so from $n = 5$ physical variables only $m = n - r = 3$ dimensionless products

$$\Pi_1 = E P^{-1} l^2 = \frac{E l^2}{P} \qquad (8)$$

$$\Pi_2 = I l^{-4} = \frac{I}{l^4} \qquad (9)$$

$$\Pi_3 = u l^{-1} = \frac{u}{l} \qquad (10)$$

are obtained, as guaranteed by the Pi-Theorem. According to the definitions in Fig. 2 and equation (5), the values of the $\alpha_{ji}$ can be determined by visual inspection of the coefficients of the lower part of the modified dimensional matrix; compare also Fig. 2 and Fig. 4.
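As a quick check (an illustration added here, not taken from the paper), the following Python snippet verifies that the three products of equations (8)-(10) are indeed dimensionless by summing the weighted dimension exponents of Fig. 4:

```python
import numpy as np

# Dimension exponents in the [M], [L], [T] system of Fig. 4 (left).
dims = {"P": [1, 1, -2], "l": [0, 1, 0], "E": [1, -1, -2], "I": [0, 4, 0], "u": [0, 1, 0]}

def dimension_of_product(exponents):
    """Sum of the variable dimension vectors weighted by the monomial exponents."""
    return sum(e * np.array(dims[name]) for name, e in exponents.items())

# Exponents of the products of eqs. (8)-(10): Pi1 = E l^2 / P, Pi2 = I / l^4, Pi3 = u / l.
for name, expo in {"Pi1": {"E": 1, "l": 2, "P": -1},
                   "Pi2": {"I": 1, "l": -4},
                   "Pi3": {"u": 1, "l": -1}}.items():
    print(name, dimension_of_product(expo))   # each prints [0 0 0]: dimensionless
```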
Taking now a look back at the exact solution in the form of equation (7), an algebraic manipulation of elementary calculus (multiplication of the right hand side of equation (7) by unity in the form of the factor $l^2/l^2$) leads to

$$\frac{u}{l} = \frac{1}{3} \left( \frac{P}{E l^2} \right) \left( \frac{l^4}{I} \right) \qquad (11)$$

Substituting now equations (8), (9) and (10) into equation (11) yields the dimensionless equation

$$\Pi_3 = \frac{1}{3 \, \Pi_1 \Pi_2} \qquad (12)$$

Equation (12) is a practical example of the fact that every complete and dimensionally homogeneous equation of n physical variables can be written in the form of a dimensionless equation of its $m = n - r$ dimensionless groups (i.e. the dimensionless products). The dimensionless products can thus be interpreted as the necessary and sufficient building blocks of the correct solution. This is stated by the Pi-Theorem and written in the general form of equations (3), (4) and (5).

2.3 Relevance to Neural Networks

The above considerations of dimensional homogeneity have multiple conceptual consequences for the design and the generalization properties of non-linear multi-layered feed-forward neural networks. In the following it is described how the Pi-Theorem is used for the topology design and for the proof of generalization of non-linear multi-layered feed-forward neural networks. This will be discussed in direct comparison to the classical neural network approach, where the principle of dimensional homogeneity is ignored. The current approach to the problem of function approximation by neural networks as shown in Fig. 1 is considered first.

[Figure 5: Original pattern data: training examples $e_1, \ldots, e_p$ with the numerical values $x_{1p}, \ldots, x_{5p}$ of the variables P, l, E, I, u.]

Classically, a set of p numerical training pattern data as shown in Fig. 5 is used in the training of the neural network to approximate the unknown functional relationship g encoded in the patterns $x_{1,p}, \ldots, x_{n,p}$, which has in the current example with $n = 5$ the form $g(P, l, E, I, u) = 0$. From the data in Fig. 5, as well as from the knowledge of the exact analytical solution from linear bending theory in equation (7), it is quite clear that the approximation problem during the training phases of the neural network consists in the identification of the correct n-dimensional hyper-surface $g(x_1, \ldots, x_n) = 0$ with $n = 5$. This is shown in Fig. 6, which represents the computation sequence inside such a neural network with five adjustable weights $w_0, w_1, w_2, w_3$ and $w_4$.

[Figure 6: Computation sequence for g: the inputs (P, l, E, I) are combined via the weights $w_0, \ldots, w_4$ into the output u.]

According to the definitions in equation (1), this network sums the weighted logarithms of its inputs in s and propagates the sum through the exponential $h(s) = e^s$ as output function. The topology of this network is shown in Fig. 7.

[Figure 7: Neural network topology for g: input nodes P, l, E, I with weights $w_1, w_2, w_3, w_4$, a bias $w_0$, a summation node s and the output u.]

According to equation (7) the correct solution after training should be $w_1 = 1$, $w_2 = 3$, $w_3 = -1$, $w_4 = -1$ and $w_0 = 1/3$ for the neural network to generalize correctly. In this context it is important to note that in classical neural network function approximation the weights $w_0, w_1, w_2, w_3, w_4$ are initialized at random before the training and are iteratively updated according to the employed learning rule. This however means that the initial neural network state, as well as most of the intermediate neural network states, encodes dimensionally non-homogeneous functions.
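To make the computation sequence of Figs. 6 and 7 concrete, here is a minimal Python sketch (added for illustration; the numerical values are hypothetical, and the treatment of the bias constant $w_0$ as passing through the logarithm, so that it acts as a multiplicative factor, is an assumption made for this sketch in line with the node description given for Fig. 11 below):

```python
import numpy as np

def g_network(P, l, E, I, w, w0):
    """Forward pass of the monomial network of Figs. 6/7: the bias w0 and the
    inputs pass through a logarithm, are summed with weights, and the sum is
    exponentiated, i.e. u = w0 * P^w1 * l^w2 * E^w3 * I^w4."""
    s = np.log(w0) + np.dot(w, np.log([P, l, E, I]))
    return np.exp(s)

# Hypothetical input values; with w = (1, 3, -1, -1) and w0 = 1/3 as demanded
# by eq. (7), the network output coincides with the analytical deflection.
P, l, E, I = 5.0e3, 2.0, 2.1e11, 3.0e-6
u_net = g_network(P, l, E, I, w=np.array([1.0, 3.0, -1.0, -1.0]), w0=1.0 / 3.0)
print(u_net, P * l**3 / (3.0 * E * I))   # the two values agree
```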

To highlight this fact, the condition of dimensional homogeneity of the current weights in the neural network of Figs. 6 and 7 is written as

$$[u] = [P]^{w_1} [l]^{w_2} [E]^{w_3} [I]^{w_4} \qquad (13)$$

where $[u], [P], \ldots$ denote the dimensional representations of the variables $u, P, \ldots$. From the general equation (13) one obtains two equations in the two dimensions [F]orce and [L]ength used to represent the variables in the dimensional matrix. This leads to a linear equation system in the weights:

$$\text{in } [F]: \quad w_1 + w_3 = 0 \qquad (14)$$

$$\text{in } [L]: \quad w_2 - 2 w_3 + 4 w_4 = 1 \qquad (15)$$

since the dimensional representation of u is $[u] = [F]^0 [L]^1$. This means that all numerical values of the weights $w_1, \ldots, w_4$ in Figs. 6 and 7 which do not satisfy equations (14) and (15) do not represent dimensionally homogeneous equations. In the context of universal function approximation by neural networks, many neural network states (i.e. the randomly initialized neural network weights as well as most of the intermediate neural network states during the training) thus violate the principle of dimensional homogeneity (here equations (14) and (15)) and encode dimensionally non-homogeneous functions g. The neural network is then in a physically illegal and meaningless state, regardless of whether by accident some neural network responses might be numerically correct (e.g. numerically 5 [apples] are equal to 5 [oranges]). This is shown in Fig. 8.

[Figure 8: Properties of the function classes g and f.]

This means that the sufficient search space of all possible physical solutions f is artificially enlarged to the set of all dimensionally non-homogeneous functions g, which cannot represent physically correct solutions by definition. By ignoring the property of dimensional homogeneity, the original problem of correct function approximation might thus even have worsened, especially in such cases where only very few training patterns are known, since the now added, purely numerically correct solutions of the form 5 [apples] = 5 [oranges] are in the training numerically indistinguishable from the numerically and dimensionally correct solutions of the form 5 [apples] = 5 [apples]. Taking therefore the a priori property of dimensional homogeneity of the unknown and sought after function f into account, the dimensionless groups can be determined from the dimensional matrix as shown in the previous example of Fig. 4. This means that every data point in $x_1, \ldots, x_n$ corresponds to a data point in $\Pi_1, \ldots, \Pi_m$. In Fig. 9 the numerical values of this mapping for the data points in Fig. 5 are shown.

[Figure 9: Transformed pattern data: training examples $e_1, \ldots, e_p$ with the values $\Pi_{1,p}, \Pi_{2,p}, \Pi_{3,p}$.]

From this transformed data table, as well as from the knowledge of the exact dimensionless solution in the form of equation (12), it is clear that the approximation problem during the training phases of the neural network has now been transformed into the problem of the identification of the correct m-dimensional hyper-surface $F(\Pi_1, \ldots, \Pi_m) = 0$ with only $m = 3$. This is shown in Fig. 10, which represents the computation sequence inside such a neural network with three adjustable weights $v_0, v_1$ and $v_2$.

[Figure 10: Modified computation of f via F: the inputs (P, l, E, I) are mapped onto $\Pi_1, \Pi_2$, combined via $v_0, v_1, v_2$ into $\Pi_3$, and back-transformed into the output u.]
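The homogeneity conditions of equations (14)-(15) can be checked mechanically for any current weight vector. The following sketch is an illustration added here (not from the paper); it simply compares the weighted sum of the input dimension vectors with the dimension vector of the output:

```python
import numpy as np

# Dimension vectors of P, l, E, I and u in the [F], [L] system of Fig. 4 (right).
DIM = {"P": np.array([1, 0]), "l": np.array([0, 1]),
       "E": np.array([1, -2]), "I": np.array([0, 4]), "u": np.array([0, 1])}

def is_dimensionally_homogeneous(w, tol=1e-12):
    """Check eqs. (14)-(15): the weighted dimension vectors of the inputs
    must add up to the dimension vector of the output u."""
    lhs = sum(wi * DIM[name] for wi, name in zip(w, ["P", "l", "E", "I"]))
    return np.allclose(lhs, DIM["u"], atol=tol)

print(is_dimensionally_homogeneous([1.0, 3.0, -1.0, -1.0]))   # True: the weights of eq. (7)
print(is_dimensionally_homogeneous([0.7, 2.1, -0.3, 0.4]))    # False: a typical random
                                                              # intermediate training state
```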

According to equation (12) the correct solution after training is $v_1 = -1$, $v_2 = -1$ and $v_0 = 1/3$ for the neural network to generalize correctly. A network topology for this is shown in Fig. 11.

[Figure 11: Network topology for f via F: input nodes P, l, E, I and a bias $v_0$, intermediate nodes $\Pi_1, \Pi_2$, a node $\Pi_3$ and the output u.]

With respect to the definitions in equation (1), this network first computes in the first hidden layer the intermediate dimensionless products $\Pi_1, \Pi_2$ as sums of the weighted logarithms of the inputs (in fact the logarithms of $\Pi_1, \Pi_2$ are computed). The weighted sum $\Pi_3$ in the second hidden layer is then propagated through the exponential $h(s) = e^s$ as output function. (Remark: the output function of the nodes P, l, E, I and $v_0$ is the logarithm, the output function of $\Pi_3$ is the exponential, while $\Pi_1$ and $\Pi_2$ have the identity as output function.) In contrast to the previous classical neural network function approximation of g, it is important to note in this context that the neural network can now permanently represent only dimensionally homogeneous functions f, regardless of the random initialization and the iterative update of $v_0, v_1, v_2$ during the training. The validity of the Pi-Theorem for all dimensionally homogeneous equations in physics can thus be interpreted in such a way that, since for every function $f(x_1, \ldots, x_n) = 0$ a function $F(\Pi_1, \ldots, \Pi_m) = 0$ exists, the function F can be seen as being nested inside of f, enclosed by the appropriate similarity mappings as indicated in Fig. 10. This mapping scheme based on the existence proof of the Pi-Theorem can thus be generalized to the new neural network similarity topology design and interpretation scheme as shown in Fig. 12. This means that the first and last network layer represent the (here predetermined and fixed) for- and back-transform into and from dimensionless space, while the learning during the training is done through adjustment of the weights of the sought after similarity function F only.

[Figure 12: Similarity network for f via F: the inputs $(x_1, x_2, \ldots, x_{n-1})$ are transformed into dimensionless space, the function F is approximated there, and the output $x_n$ is recovered by the back-transform.]

This exploitation of the principle of dimensional homogeneity in the design of feed-forward neural networks, which is advantageous in direct comparison to Fig. 1, as well as the derivation of several important neural network properties which can be proved with the help of this theory, are stated in the following.

2.4 Proof of Generalization

Based on this neural network topology design scheme as shown in Fig. 12, a generally valid proof of the two necessary and sufficient conditions for the correct generalization in neural networks can now be established in the form of the two following consecutive steps.

The formerly unresolved generalization capability of non-linear multi-layered feed-forward neural networks can now be proven to be pointwise correct, if and only if a training pattern p can be learned and recalled error-free by the new similarity neural network topology F. This proof is due to the fact that well distinct data points in x-space may fall onto the very same point in $\Pi$-space. This is known in physics as the phenomenon of complete similarity.

[Figure 13: Complete similarity condition: distinct points $(x_1, \ldots, x_n)_{p,1}, (x_1, \ldots, x_n)_{p,2}, \ldots$ in x-space are mapped onto the very same point $(\Pi_1, \ldots, \Pi_m)_p = \mathrm{const}$ in dimensionless space.]
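A minimal Python sketch of the similarity topology of Figs. 11 and 12 follows (an illustration added here; the numerical values are hypothetical, and the trainable part F is written as a monomial only because the exact solution (12) happens to be one, whereas in general F is approximated by the adjustable middle layers):

```python
import numpy as np

def similarity_network(P, l, E, I, v, v0):
    """Forward pass of the similarity topology of Figs. 11/12:
    fixed forward transform x -> Pi, trainable F, fixed back-transform."""
    # fixed forward transform into dimensionless space, eqs. (8)-(9)
    Pi1 = E * l**2 / P
    Pi2 = I / l**4
    # trainable part: here F is itself a monomial, Pi3 = v0 * Pi1^v1 * Pi2^v2
    Pi3 = v0 * Pi1**v[0] * Pi2**v[1]
    # fixed back-transform from Pi3 = u / l to the physical output u
    return Pi3 * l

P, l, E, I = 5.0e3, 2.0, 2.1e11, 3.0e-6                 # hypothetical values
u = similarity_network(P, l, E, I, v=[-1.0, -1.0], v0=1.0 / 3.0)
print(u, P * l**3 / (3.0 * E * I))                      # identical deflections
```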

For each training pattern p there exists an infinite number of completely similar points on n-dimensional hyper-surfaces, which are defined by the specific constant numerical values of the dimensionless variables $\Pi_{j,p} = \mathrm{const}$ as shown in Fig. 13. A numerical example of two such completely similar points, lying on such a hyper-surface defined by

$$x_j = \Pi_{j,p} \prod_{i=1}^{r} x_i^{-\alpha_{ji}}, \qquad x_i \in \mathbb{R}^{+}, \quad j = 1, \ldots, m \qquad (16)$$

are the two pattern data sets p = 1 and p = 5 given in Fig. 5, which have been computed for this purpose with the specific constant values of $\Pi_{j,p}$ in Fig. 9. Equations (16) stem from equations (5), which in engineering are commonly known as similarity laws [11, 6]. The pointwise correct generalization of the p pattern data is in a mathematical sense a necessary condition for the overall correct generalization capability of the neural network.

The generalization is furthermore totally correct, if and only if the neural network approximates after the training the correct similarity function $F(\Pi_1, \ldots, \Pi_m) = 0$ which is associated with $f(x_1, \ldots, x_n) = 0$. The correct similarity function F is approximated if and only if the correct pointwise generalization property is fulfilled for each point in the whole domain of definition of F. This is in a mathematical sense the necessary and sufficient condition for the correct generalization capability of the neural network.

At this point three remarks are in order to put the above proof of generalization into perspective. First, it is evident that the above statements represent the theoretical result which one would obtain with ideal noise-free data. It is however important to realize that the permanent presence of noise and/or measurement error requires no fundamental methodological change in the principal approach of modeling real physical behavior as dimensionally homogeneous models (see also section 4). Second, it is important to see that the correct pointwise generalization is automatically achieved for every point in the training data set which can be reproduced by the neural network within reasonable error bounds. This is a consequence of the transformation sequence inside the similarity network topology only and does not depend on the identification of the overall correct dimensionless function F. As a further advantage, the error of the pointwise correct approximation can be verified by a simple recall of each of the training patterns after the training phase. Third, it should be clear that because of the necessary and sufficient condition for the totally correct generalization in the form of the identification of the correct similarity function F, no explicit or implicit claim is made that this correct result is in fact achieved after the training of the neural network. It should be clear that the underlying basic problem of function approximation based on sparse data samples remains. It is however claimed that every correct result can always be written in this form, and that the search space of intermediate stages of the sought after approximation of g without the use of the dimensionless groups is very likely to violate the property of dimensional homogeneity. In the following section the resulting consequences are summarized which stem from the necessary and sufficient conditions for the correct generalization property of the new similarity topology design scheme.

2.5 Design Consequences

Multi-layered non-linear feed-forward neural networks used as a tool for universal function approximation may now be designed according to the Pi-Theorem.
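The complete similarity condition can be demonstrated numerically. The following sketch is an illustration added here, with hypothetical values and an arbitrary scale factor rather than the patterns p = 1 and p = 5 of Fig. 5: scaling a bending-bar configuration according to the similarity laws (16) produces a well distinct point in x-space that falls onto the very same point in Pi-space.

```python
import numpy as np

def pi_groups(P, l, E, I, u):
    """Map one bending-bar data point onto (Pi1, Pi2, Pi3) of eqs. (8)-(10)."""
    return np.array([E * l**2 / P, I / l**4, u / l])

# a first (hypothetical) configuration and its exact deflection from eq. (7)
l1, E1, P1, I1 = 2.0, 2.1e11, 5.0e3, 3.0e-6
u1 = P1 * l1**3 / (3.0 * E1 * I1)

# a completely similar configuration: scale l by s and adapt P, I so that
# the dimensionless products of eq. (16) keep their values
s = 4.0
l2, E2, P2, I2 = s * l1, E1, s**2 * P1, s**4 * I1
u2 = P2 * l2**3 / (3.0 * E2 * I2)

print(pi_groups(P1, l1, E1, I1, u1))   # same Pi-values for both points,
print(pi_groups(P2, l2, E2, I2, u2))   # although the x-space points differ
```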
As a consequence, the first and last network layer have to encode a similarity transformation of the training patterns as indicated in Fig. 12. Because of $m = n - r$ as stated in the Pi-Theorem, this leads to the neural network layer sizes shown in Fig. 14. The similarity transforms $\Pi_j$ can be determined before the training phase and depend on the dimensional information of the training patterns only. Most importantly, the feed-forward neural network is now no longer able to encode dimensionally non-homogeneous functions g, but can only encode and approximate dimensionally homogeneous functions $f(x_1, \ldots, x_n) = 0$ via the adjustment of the weights of the corresponding similarity function $F(\Pi_1, \ldots, \Pi_m) = 0$.

[Figure 14: General similarity topology: a fixed first part with (n-1) x-nodes and (m-1) Pi-nodes, a variable middle part approximating F, and a fixed last part with one $\Pi_m$-node and one $x_n$-node.]

About the necessary minimum number of weights, layers and nodes to encode the similarity function F nothing can be stated in full generality from the Pi-Theorem, since this depends on the physics of the modeled phenomenon and the available set of node transfer functions according to equation (2). This still unresolved question and the therefore necessary variable topology are indicated by the variable middle part in Fig. 14. Since the modern proofs of the Pi-Theorem make no special assumptions on the specific nature of the operator f, as stated in section 2, the similarity topology design method can be imposed a priori on any feed-forward neural network for arbitrary non-linear function approximation problems without any loss of generality, as long as the approximation problem belongs to the class of dimensionally homogeneous models. The following properties can then be proven straightforwardly:

The transformation sequence inside the neural network is $x \mapsto \Pi \mapsto F(\Pi) \mapsto x$, as shown in Fig. 14. The first and last transformation layer hereby have a simple precomputed product form as shown in equation (5). The last computation step consists of the back-transform from the resulting $\Pi_m$ to the sought after $x_n$ contained therein. To compute this, up to r current input values of the so-called basis variables $x_i$ in the resulting dimensionless product $\Pi_m = x_n \prod_{i=1}^{r} x_i^{\alpha_{mi}}$ need to be propagated directly from the first to this last hidden layer node. The general form of the back-transform is thus $x_n = \Pi_m \prod_{i=1}^{r} x_i^{-\alpha_{mi}}$. This is at the same time an effortless model correspondence to the so-called information shortcuts [33, 34] observed in neuro-biological systems.

If and only if the neural network learns during the training phase the corresponding dimensionless similarity function F of the dimensionally homogeneous function f, the correct and unique generalization capability of the network over the whole range of definition of f can be proved based on the properties of the similarity function F. Any other feed-forward neural network which cannot be shown analytically to be equivalent to the minimal topology generated by the new method based on the Pi-Theorem is principally unable to generalize correctly over the whole range of definition of f, because it does not encode the correct similarity function F.

The weights and transfer functions of the internal nodes can a posteriori be interpreted as dimensionless similarity variables and similarity functions, thus supporting the engineering analysis, discussion, interpretation and understanding of the now explicit functional relationship formerly implicitly encoded in the training patterns.

The original approximation problem (i.e. the learning process) has not been worsened, since the dimensionality of F with respect to f is reduced by r. The ratio of the number of patterns p to the number of independent variables involved (n versus m = n - r) has thus been increased, since $(p/n) \leq (p/m)$.

In contexts other than neural networks, the equivalents of many of the above stated mathematical properties have a proven record of usefulness in many other fields of engineering and science [3, 4, 5, 6, 16]. Mainly heat transfer [14] and fluid mechanics [17] account for the traditional strengths of similarity theory.
The new interpretation of the results of similarity theory in the context of neural network topology design and interpretation is now briefly demonstrated in the comparison of a classical and the new neural network topology in the following real world problem setting of a non-linear function approximation problem, whose solution is generally attributed to PRANDTL [26].

3 Application Example

The drag w exerted by a fluid (velocity v, density $\rho$ and viscosity $\nu$) on a sphere with diameter l is a highly non-linear physical phenomenon. Since the discovery of the governing Navier-Stokes equations, which cannot be solved analytically, approximate solutions have been experimentally justified [26]. From the measurements in these experiments, a functional relationship $f(l, v, \rho, \nu, w) = 0$ can be expected. This is shown in Fig. 15.

[Figure 15: Flow around a sphere of diameter l in a fluid of density $\rho$ and viscosity $\nu$ at velocity v, exerting the drag w.]

Two different neural network topology concepts are compared: a traditional neural network with 4 input nodes, two hidden layers with 4 nodes each and one output node, all having sigmoidal transfer functions as in equation (2), as well as a neural network constructed with the similarity principle and polynomials as transfer functions. Both networks were presented the same patterns in the numerical simulation using the SNNS package [35]. To determine the new neural network topology, the dimensional matrix shown on the left hand side of Fig. 16 is constructed from the dimensional information of the patterns.

[Figure 16: Dimensional Matrix Computations]

Variable   SI-units       [M] [L] [T]
rho        [kg/m^3]        1  -3   0
l          [m]             0   1   0
v          [m/s]           0   1  -1
nu         [m^2/s]         0   2  -1
w          [kg m/s^2]      1   1  -2

Then, by using rank preserving operations and adding multiples of the matrix columns to each other, the modified dimensional matrix on the right hand side of Fig. 16 is obtained. According to equation (5), two dimensionless products $\Pi_1$ and $\Pi_2$ can immediately be derived by inspection of the lower diagonal coefficients $\alpha_{ji}$ of this modified dimensional matrix to be equal to

$$\Pi_1 = \frac{v\,l}{\nu} \quad (= Re) \qquad (17)$$

$$\Pi_2 = \frac{w}{\rho\,v^2 l^2} \quad (= c_w) \qquad (18)$$

This means that a functional relationship in the form $F(\Pi_1, \Pi_2) = 0$ with only two dimensionless similarity variables exists and corresponds to the expected functional relationship $f(x_1, \ldots, x_5) = 0$. Both dimensionless products occur so often in fluid dynamics that $\Pi_1$ is called the REYNOLDS number Re and $\Pi_2$ is called the drag coefficient $c_w$. The determination of F is traditionally done using statistical methods [26, 18], but it can also be determined iteratively by a neural network during the training period. From the experimental data [26] available, 44 data points in the range $Re = 10^3, \ldots$ have been selected as training patterns, as shown in Fig. 17.

[Figure 17: Approximation results: the drag coefficient $c_w$ over the Reynolds number Re for the training patterns ("pattern.set"), the similarity network F ("similarity.net") and the conventional network f ("konventional.net").]

The figure shows the projection of the training result of both the classical neural network learning f and the new neural network topology learning F. A closer look at the error distribution in Fig. 18 shows that the relative error of the neural network constructed with the new topology design method is better balanced over the $p = 1, \ldots, 44$ training patterns.

[Figure 18: Relative approximation error in $c_w$ in percent over the training patterns p, for the conventional network f ("konv.plt") and the similarity network F ("pi.plt").]
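For illustration (not part of the paper; the measurement values below are made up and are not taken from the Prandtl data set), the mapping of a raw sphere-drag measurement onto the two similarity variables of equations (17)-(18) looks as follows:

```python
import numpy as np

def to_similarity_space(rho, l, v, nu, w):
    """Map one raw sphere-drag measurement (rho, l, v, nu, w) onto the two
    dimensionless products of eqs. (17)-(18): Reynolds number and drag coefficient."""
    Re = v * l / nu                  # Pi_1, eq. (17)
    cw = w / (rho * v**2 * l**2)     # Pi_2, eq. (18)
    return Re, cw

# hypothetical measurement: air flow around a 0.1 m sphere
rho, l, v, nu, w = 1.2, 0.1, 15.0, 1.5e-5, 0.9     # kg/m^3, m, m/s, m^2/s, N
print(to_similarity_space(rho, l, v, nu, w))       # -> Re = 1e5, cw ~ 0.33
```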

As can be observed from this data, the neural network with the new topology concept obtains better (or at least as good) approximation results over the whole range of training data. However, this is not the main point here at all, and error propagation is one of the topics of our ongoing research. Therefore no further numerical data are presented here, because none of the claims made in this paper is based on numerical or statistical observations. Most importantly, the new neural network topology guarantees an optimal generalization, since (an approximation of) the similarity function F of the physical phenomenon is encoded in the network. Since all elements of the hyper-planes with $v\,l/\nu = \mathrm{const} = Re \in [10^3, \ldots]$ are projected onto the very same point in dimensionless space, the generalization of the network F is guaranteed to be correct for p = 44 times infinitely many points not included in the original training set p, with exactly the same relative approximation error as for the p = 44 data points in Fig. 18. This is the main advantage over the classical neural network approximation g and a unique property inherent in the new similarity topology design method.

4 Discussion

The problem of universal function approximation and of the a priori estimation of the size of the middle layer(s), which now necessarily approximate F instead of f (necessary number of nodes, links or kinds of transfer functions), is not dealt with here, since this issue is already addressed in the literature [1, 2, 7, 15, 36]. Imposing the new topology concept does not change nor worsen the original problem of universal function approximation, but it clearly identifies the well distinct origin and separation of the problem of universal function approximation from the problem of correct generalization in feed-forward neural networks. From the key idea of dimensional homogeneity highlighted in this paper it should be quite clear that one should be very cautious not to jump to numerical simulations too early. It is evident that a major part of the cognitive effort and of the scientific achievement in understanding physical phenomena lies in the discovery of the qualitatively correct relevance list of the expected functional relationship, and only finally in the successful establishment of a quantitative model description. In this respect similarity theory is one of the keys to a meaningful a posteriori interpretation of the inner nodes of the neural network after successful training. The epistemological concept of dimensions thus has much more significant consequences than one might expect when looking at numerical values of data only. After more than 2000 years of scientific effort of mankind, just 7 independent dimensions have been established in our presently known and used SI-unit system. It is evident that the range of validity of similarity theory is limited to the kind of problems which can be described with functional relationships of variables represented in these 7 dimensions. The range of validity is thus typically the area of engineering and physics. It is an open question in the philosophy of the natural sciences whether there are more (but still unknown) dimensions out there to be discovered, or whether there exist classes of problems which inherently lie outside the domain of dimensional representations.
In this respect it is just mentioned that the generalization and further extension of the Pi-Theorem to areas other than physics and engineering has already been suggested and seems to lead to fruitful extensions [27]. The Pi-Theorem has also already been successfully applied to the problem of pattern recognition with neural networks [10] as well as in the broad field of economics [8]. A clear understanding of all dimensions involved in a certain problem is therefore one of the crucial basic steps in any effort of theoretical model building and can significantly facilitate the theoretical analysis and the conclusions drawn from the investigated model, as shown in the examples.

4.1 Related Issues

Noise. Real world data are commonly affected by errors and noise when measured. While systematic measurement errors can be compensated for, measurement noise can commonly only be handled and compensated for if a certain noise distribution model is assumed. In this respect it is stated without proof that similarity theory in the form of dimensional analysis is often the method of choice to process and display experimental data affected with noise.

The real world example of the flow around a sphere in section 3 shows best the advantageous projection of noise-affected experimental data into a physically meaningful lower-dimensional space. The mapping of several well distinct data points in x-space onto principally the same point (or into its proximity) in $\Pi$-space by similarity transforms even helps to deal with statistically distributed noise effects [18]. Dimensional analysis has therefore always been one of the methods of choice of experimenters [26]. The influence of noise is judged to be so important that it is one of the main topics of our ongoing work.

Non-uniqueness of solutions. Without further consequences for the main arguments in this paper, it is mentioned that the existence proof of solutions of equation (5) is constructive but not unique. Since the dimensionless products form a free Abelian group, all possible solutions to equation (4) are of the form

$$\hat{\Pi}_k = \prod_{j=1}^{m} \Pi_j^{\kappa_{kj}} \qquad (19)$$

with $k = 1, \ldots, m \in \mathbb{N}$ and $\kappa_{kj} \in \mathbb{R}$. The general solution may thus consist of any arbitrary combination of the m original $\Pi_j$ which also satisfies the condition of structural independence [11, 25]. This means that the square matrix with elements $\kappa_{kj}$ has to be of full rank to guarantee the equivalence of both dimensionless parameter sets $\Pi = \{\Pi_1, \ldots, \Pi_m\}$ and $\hat{\Pi} = \{\hat{\Pi}_1, \ldots, \hat{\Pi}_m\}$. With respect to the dimensionless equation $F(\Pi_1, \ldots, \Pi_m) = 0$ as guaranteed by the Pi-Theorem, this means that the form of F depends on the choice of a specific set $\hat{\Pi} = \{\hat{\Pi}_1, \ldots, \hat{\Pi}_m\}$. In terms of the bending bar example in section 2.2, choosing the full rank matrix

$$\kappa = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (20)$$

results in the following 3 modified dimensionless products

$$\hat{\Pi}_1 = \Pi_1^{-1} = \frac{P}{E l^2} \qquad (21)$$

$$\hat{\Pi}_2 = \Pi_2^{-1} = \frac{l^4}{I} \qquad (22)$$

$$\hat{\Pi}_3 = \Pi_3 = \frac{u}{l} \qquad (23)$$

which in turn changes the form of the resulting similarity function F to

$$\hat{\Pi}_3 = \frac{1}{3} \hat{\Pi}_1 \hat{\Pi}_2 \qquad (24)$$

4.2 Related Works

In a recent paper by GUNARATNAM and GERO [12], the effect of dimensionless representations (i.e. dimensionless products) on neural network generalization performance in the presence of noise was tested numerically in comparison to classical neural network approaches. The numerical experiments reported in [12] fit perfectly well into the systematic presentation of the underlying theoretical framework given here. However, no general explanation and/or proof in the form of the necessary and sufficient conditions for the correct generalization as in section 2.4, nor further details of the topology design of similarity networks as enumerated in section 2.5, were provided in [12]. In another recent paper by RUDOLPH [32], the practical implementation of the presented similarity theory into the fitness function of a genetic algorithm is described. There the condition of complete similarity according to equation (16) is used to construct a fitness function of a genetic algorithm which enables the selection of correctly generalizing neural network topologies out of a sequence of randomly mutated populations of neural network individuals. Despite the fact that the neural network topologies were left completely unconstrained on a so-called genetic grid, which defined the maximally possible network size in terms of layers and nodes, the theoretical result in the form of the general topology design scheme as shown in Figs. 12 and 14 was always achieved in the numerous numerical computer simulations.
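The re-parameterization of equation (19) can also be checked numerically; the following sketch (an illustration added here, with hypothetical input values) applies the exponent matrix of equation (20) to the original products and confirms equation (24):

```python
import numpy as np

# Equivalent Pi-sets via eq. (19): hat_Pi_k = prod_j Pi_j^(kappa_kj) for any
# full-rank exponent matrix kappa; kappa = diag(-1, -1, 1) reproduces the
# modified products of eqs. (21)-(23) of the bending-bar example.
l, E, P, I = 2.0, 2.1e11, 5.0e3, 3.0e-6                 # hypothetical values
u = P * l**3 / (3.0 * E * I)

Pi = np.array([E * l**2 / P, I / l**4, u / l])          # original set, eqs. (8)-(10)
kappa = np.diag([-1.0, -1.0, 1.0])
assert np.linalg.matrix_rank(kappa) == 3                # structural independence

Pi_hat = np.prod(Pi ** kappa, axis=1)                   # eq. (19)
# eq. (24): hat_Pi_3 = (1/3) * hat_Pi_1 * hat_Pi_2
print(np.isclose(Pi_hat[2], Pi_hat[0] * Pi_hat[1] / 3.0))   # True
```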
Besides the two recent short presentations of the principle of dimensional homogeneity in feed-forward neural networks at workshops [30, 31] with limited audience, it is claimed, based on an extensive literature search, that this is the first time that the principle of dimensional homogeneity is theoretically applied to the a priori topology design of non-linear feed-forward neural networks, and that a proof in the form of necessary and sufficient conditions for the correct generalization in non-linear feed-forward neural networks is presented.

The author has also communicated his original idea prior to publication to the graduate students G. EMRICH, H.-G. HERRMANN and O. BARTH of the Institut für Statik und Dynamik der Luft- und Raumfahrtkonstruktionen (ISD) to stimulate and encourage further work [10, 13].

5 Summary

The principle of dimensional homogeneity is a necessary and sufficient condition for the mapping of the n dimensional variables $x_i$ into the m-dimensional dimensionless space. The proof of the Pi-Theorem hereby guarantees the reduction to a minimum of $m = n - r$ independent dimensionless variables. Thus, the topology of any feed-forward neural network designed with this method consists of a predetermined node reduction and special transfer functions in the first layer of the neural network and a predetermined node expansion and special transfer functions in the last layer of the neural network. Only these topology properties guarantee the unique feature of correct pointwise generalization of non-linear feed-forward neural networks based on the correct approximation of the given training data only. No other theoretical way is known today to substitute for this unique feature. This result implies that all other neural networks which cannot be shown to be analytically equivalent to the new neural network topology design scheme are incapable of generalizing correctly, and it therefore suggests the systematic use of this topology design method.

Acknowledgments

The support of G. EMRICH in the numerical simulations and the proof-reading of several previous drafts was very valuable and has been appreciated. The financial support of this work by the Deutsche Forschungsgemeinschaft (DFG) is acknowledged.

References

[1] A. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory, 39 (3) (1993).
[2] E. Blum and L. Li, Approximation theory and neural networks, Neural Networks, 4 (4) (1991).
[3] G. Bluman and J. Cole, Similarity Methods for Differential Equations (Springer, New York, 1974).
[4] G. Bluman and S. Kumei, Symmetries and Differential Equations (Springer, New York, 1989).
[5] P. Bridgman, Dimensional Analysis (Yale University Press, New Haven, 1922).
[6] E. Buckingham, The principle of similitude, Nature 96 (1915).
[7] G. Cybenko, Approximations by superpositions of a sigmoidal function, Math. Control, Signals, Systems, 2 (1989).
[8] F. de Jong, Dimensional Analysis for Economists (North-Holland, Amsterdam, 1967).
[9] B. Elvers, S. Hawkins, G. Schulz (eds.), Ullmanns Encyclopedia of Industrial Chemistry. Volume B: Fundamentals of Chemical Engineering (Verlagsgesellschaft, Weinheim, 1990).
[10] G. Emrich, Bilderkennung mit neuen, multiskaleninvarianten Zentralmomenten, in: Kröplin, B. (ed.): Internationales Workshop Neuronale Netze in Ingenieuranwendungen, Institut für Statik und Dynamik der Luft- und Raumfahrtkonstruktionen, Universität Stuttgart, Februar 1996.
[11] H. Görtler, Dimensionsanalyse (Springer, Berlin, 1975).
[12] D. Gunaratnam and J. Gero, Effect of Representation on the Performance of Neural Networks in Structural Engineering Applications, Microcomputers in Civil Engineering 9, 97-108, 1994.
[13] H.-G. Herrmann, Untersuchungen zur Anwendbarkeit von Neuronalen Netzen in der Strukturmechanik (PhD in preparation), Institut für Statik und Dynamik (ISD), Universität Stuttgart.

[14] J. Holman, Heat Transfer (McGraw-Hill, New York, 1986).
[15] K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks, 2 (5) (1989).
[16] H. Huntley, Dimensional Analysis (MacDonald, London, 1952).
[17] S. Kline, Similitude and Approximation Theory (Springer, New York, 1986).
[18] C. Li and Y. Lee, A statistical procedure for model building in dimensional analysis, International Journal of Heat and Mass Transfer, 33 (7) (1990).
[19] L. Malvern, Introduction to the Mechanics of a Continuous Medium (Prentice Hall, London, 1969).
[20] T. Masters, Practical Neural Network Recipes in C++ (chapter 6: Multilayer Feedforward Networks, 85-90, Academic Press, Boston, 1993).
[21] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry (MIT Press, Cambridge, MA, 1969).
[22] K. Möller and G. Paaß (eds.), Künstliche Neuronale Netze: Eine Bestandsaufnahme, Künstliche Intelligenz KI, 8 (4) (1994).
[23] G. Paaß, Assessing and improving neural network predictions by the bootstrap algorithm, in: J. Cowan, S. Hanson and C. Giles (eds.), Advances in Neural Information Processing Systems 5 (NIPS 5) (Morgan Kaufmann, San Mateo, CA, 1993).
[24] Y.-H. Pao, Adaptive Pattern Recognition and Neural Networks (Addison-Wesley, Reading, MA, 1989).
[25] J. Pawlowski, Die Ähnlichkeitstheorie in der physikalisch-technischen Forschung. Grundlagen und Anwendungen (Springer, Berlin, 1971).
[26] L. Prandtl, C. Wieselsberger, and A. Betz, Ergebnisse der Aerodynamischen Versuchsanstalt zu Göttingen (Oldenbourg, Berlin, 1923).
[27] Z. Rosenbaum, Foundations and Techniques of Generalized Dimensional Analysis, Ph.D. dissertation, The State University of New Jersey, 1990.
[28] S. Rudolph, Eine Methodik zur systematischen Bewertung von Konstruktionen, Ph.D. dissertation, Universität Stuttgart, VDI Fortschrittsberichte, Reihe, Nummer 25, Düsseldorf, 1995.
[29] S. Rudolph, A Methodology for the Systematic Evaluation of Engineering Design Objects, Ph.D. dissertation, translation of the original German PhD thesis [28] into English. A copy of this translated Ph.D. thesis is available on request by e-mail from: rudolph@isd.uni-stuttgart.de, ISD Verlag, Number 02-94, Stuttgart University, 1995.
[30] S. Rudolph, Entwurf, Anwendung und Interpretation Neuronaler Netze im Ingenieurwesen, in: Berkhan, V., Egly, H. und Olbrich, M. (eds.): Forum Bauinformatik, Junge Wissenschaftler forschen, Hannover '95, VDI Fortschrittsberichte Reihe 20, Nummer 73, VDI-Verlag, Düsseldorf, 24-30, 1995.
[31] S. Rudolph, On Topology and Generalization in Feed-Forward Neural Networks, in: Kröplin, B.-H. (ed.): Neuronale Netze in Ingenieuranwendungen, Internationales Workshop, Institut für Statik und Dynamik der Luft- und Raumfahrtkonstruktionen, Universität Stuttgart, Februar 1996, 7-26, 1996.
[32] S. Rudolph, On a Genetic Algorithm for the Selection of Optimally Generalizing Neural Network Topologies, Proceedings of the 2nd International Conference on Adaptive Computing in Engineering Design and Control '96, I. C. Parmee (ed.), University of Plymouth, March 26th-28th, Plymouth, United Kingdom, 79-86, 1996.
[33] D. Rumelhart and J. McClelland, Parallel Distributed Processing. Volume I and II (MIT Press, Cambridge, MA, 1986).
[34] E. Sanchez-Sinencio and C. Lau (eds.), Artificial Neural Networks (IEEE Press, New York, 1992).
[35] SNNS (Stuttgart Neural Network Simulator), User Manual, Version 3.2, Institute for Parallel and Distributed High Performance Systems, Stuttgart University, Germany, 1994.
[36] K.-Y. Siu, V. Roychowdhury, and T. Kailath, Rational approximation techniques for analysis of neural networks, IEEE Transactions on Information Theory, 40 (2) (1994).
[37] M. Stone, An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion, Journal of the Royal Statistical Society, Ser. B, 39 (1) (1977).
[38] S. Timoshenko and J. Goodier, Theory of Elasticity (McGraw-Hill, London).


More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks Threshold units Gradient descent Multilayer networks Backpropagation Hidden layer representations Example: Face Recognition Advanced topics 1 Connectionist Models Consider humans:

More information

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Stochastic Stability - H.J. Kushner

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Stochastic Stability - H.J. Kushner STOCHASTIC STABILITY H.J. Kushner Applied Mathematics, Brown University, Providence, RI, USA. Keywords: stability, stochastic stability, random perturbations, Markov systems, robustness, perturbed systems,

More information

On Information Maximization and Blind Signal Deconvolution

On Information Maximization and Blind Signal Deconvolution On Information Maximization and Blind Signal Deconvolution A Röbel Technical University of Berlin, Institute of Communication Sciences email: roebel@kgwtu-berlinde Abstract: In the following paper we investigate

More information

Section 3.2. Multiplication of Matrices and Multiplication of Vectors and Matrices

Section 3.2. Multiplication of Matrices and Multiplication of Vectors and Matrices 3.2. Multiplication of Matrices and Multiplication of Vectors and Matrices 1 Section 3.2. Multiplication of Matrices and Multiplication of Vectors and Matrices Note. In this section, we define the product

More information

Short Term Memory and Pattern Matching with Simple Echo State Networks

Short Term Memory and Pattern Matching with Simple Echo State Networks Short Term Memory and Pattern Matching with Simple Echo State Networks Georg Fette (fette@in.tum.de), Julian Eggert (julian.eggert@honda-ri.de) Technische Universität München; Boltzmannstr. 3, 85748 Garching/München,

More information

Two alternative derivations of Bridgman s theorem

Two alternative derivations of Bridgman s theorem Journal of Mathematical Chemistry 26 (1999) 255 261 255 Two alternative derivations of Bridgman s theorem Mário N. Berberan-Santos a and Lionello Pogliani b a Centro de Química-Física Molecular, Instituto

More information

On the complexity of shallow and deep neural network classifiers

On the complexity of shallow and deep neural network classifiers On the complexity of shallow and deep neural network classifiers Monica Bianchini and Franco Scarselli Department of Information Engineering and Mathematics University of Siena Via Roma 56, I-53100, Siena,

More information

Quantum Computation via Sparse Distributed Representation

Quantum Computation via Sparse Distributed Representation 1 Quantum Computation via Sparse Distributed Representation Gerard J. Rinkus* ABSTRACT Quantum superposition states that any physical system simultaneously exists in all of its possible states, the number

More information

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000

More information

Input layer. Weight matrix [ ] Output layer

Input layer. Weight matrix [ ] Output layer MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2003 Recitation 10, November 4 th & 5 th 2003 Learning by perceptrons

More information

Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems

Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems Stable Adaptive Momentum for Rapid Online Learning in Nonlinear Systems Thore Graepel and Nicol N. Schraudolph Institute of Computational Science ETH Zürich, Switzerland {graepel,schraudo}@inf.ethz.ch

More information

In: Proc. BENELEARN-98, 8th Belgian-Dutch Conference on Machine Learning, pp 9-46, 998 Linear Quadratic Regulation using Reinforcement Learning Stephan ten Hagen? and Ben Krose Department of Mathematics,

More information

Cover Page. The handle holds various files of this Leiden University dissertation

Cover Page. The handle  holds various files of this Leiden University dissertation Cover Page The handle http://hdl.handle.net/1887/39637 holds various files of this Leiden University dissertation Author: Smit, Laurens Title: Steady-state analysis of large scale systems : the successive

More information

Lecture 4: Perceptrons and Multilayer Perceptrons

Lecture 4: Perceptrons and Multilayer Perceptrons Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons

More information

Approximation of Functions by Multivariable Hermite Basis: A Hybrid Method

Approximation of Functions by Multivariable Hermite Basis: A Hybrid Method Approximation of Functions by Multivariable Hermite Basis: A Hybrid Method Bartlomiej Beliczynski Warsaw University of Technology, Institute of Control and Industrial Electronics, ul. Koszykowa 75, -66

More information

Virtual distortions applied to structural modelling and sensitivity analysis. Damage identification testing example

Virtual distortions applied to structural modelling and sensitivity analysis. Damage identification testing example AMAS Workshop on Smart Materials and Structures SMART 03 (pp.313 324) Jadwisin, September 2-5, 2003 Virtual distortions applied to structural modelling and sensitivity analysis. Damage identification testing

More information

CHAPTER 0 PRELIMINARY MATERIAL. Paul Vojta. University of California, Berkeley. 18 February 1998

CHAPTER 0 PRELIMINARY MATERIAL. Paul Vojta. University of California, Berkeley. 18 February 1998 CHAPTER 0 PRELIMINARY MATERIAL Paul Vojta University of California, Berkeley 18 February 1998 This chapter gives some preliminary material on number theory and algebraic geometry. Section 1 gives basic

More information

On the minimal free resolution of a monomial ideal.

On the minimal free resolution of a monomial ideal. On the minimal free resolution of a monomial ideal. Caitlin M c Auley August 2012 Abstract Given a monomial ideal I in the polynomial ring S = k[x 1,..., x n ] over a field k, we construct a minimal free

More information

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal

More information

Vector, Matrix, and Tensor Derivatives

Vector, Matrix, and Tensor Derivatives Vector, Matrix, and Tensor Derivatives Erik Learned-Miller The purpose of this document is to help you learn to take derivatives of vectors, matrices, and higher order tensors (arrays with three dimensions

More information

On the Lebesgue constant of barycentric rational interpolation at equidistant nodes

On the Lebesgue constant of barycentric rational interpolation at equidistant nodes On the Lebesgue constant of barycentric rational interpolation at equidistant nodes by Len Bos, Stefano De Marchi, Kai Hormann and Georges Klein Report No. 0- May 0 Université de Fribourg (Suisse Département

More information

Discrete Projection Methods for Incompressible Fluid Flow Problems and Application to a Fluid-Structure Interaction

Discrete Projection Methods for Incompressible Fluid Flow Problems and Application to a Fluid-Structure Interaction Discrete Projection Methods for Incompressible Fluid Flow Problems and Application to a Fluid-Structure Interaction Problem Jörg-M. Sautter Mathematisches Institut, Universität Düsseldorf, Germany, sautter@am.uni-duesseldorf.de

More information

Outliers Treatment in Support Vector Regression for Financial Time Series Prediction

Outliers Treatment in Support Vector Regression for Financial Time Series Prediction Outliers Treatment in Support Vector Regression for Financial Time Series Prediction Haiqin Yang, Kaizhu Huang, Laiwan Chan, Irwin King, and Michael R. Lyu Department of Computer Science and Engineering

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table

More information

2 Systems of Linear Equations

2 Systems of Linear Equations 2 Systems of Linear Equations A system of equations of the form or is called a system of linear equations. x + 2y = 7 2x y = 4 5p 6q + r = 4 2p + 3q 5r = 7 6p q + 4r = 2 Definition. An equation involving

More information

Convergence Acceleration of Logarithmically Convergent Series Avoiding Summation

Convergence Acceleration of Logarithmically Convergent Series Avoiding Summation Convergence Acceleration of Logarithmically Convergent Series Avoiding Summation Herbert H. H. Homeier Institut für Physikalische und Theoretische Chemie Universität Regensburg, D-93040 Regensburg, Germany

More information

Computational Complexity and Genetic Algorithms

Computational Complexity and Genetic Algorithms Computational Complexity and Genetic Algorithms BART RYLANDER JAMES FOSTER School of Engineering Department of Computer Science University of Portland University of Idaho Portland, Or 97203 Moscow, Idaho

More information

An Exact Solution of the Differential Equation For flow-loaded Ropes

An Exact Solution of the Differential Equation For flow-loaded Ropes International Journal of Science and Technology Volume 5 No. 11, November, 2016 An Exact Solution of the Differential Equation For flow-loaded Ropes Mathias Paschen Chair of Ocean Engineering, University

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

Artificial Neural Network

Artificial Neural Network Artificial Neural Network Contents 2 What is ANN? Biological Neuron Structure of Neuron Types of Neuron Models of Neuron Analogy with human NN Perceptron OCR Multilayer Neural Network Back propagation

More information

Reinforcement Learning, Neural Networks and PI Control Applied to a Heating Coil

Reinforcement Learning, Neural Networks and PI Control Applied to a Heating Coil Reinforcement Learning, Neural Networks and PI Control Applied to a Heating Coil Charles W. Anderson 1, Douglas C. Hittle 2, Alon D. Katz 2, and R. Matt Kretchmar 1 1 Department of Computer Science Colorado

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

λ-universe: Introduction and Preliminary Study

λ-universe: Introduction and Preliminary Study λ-universe: Introduction and Preliminary Study ABDOLREZA JOGHATAIE CE College Sharif University of Technology Azadi Avenue, Tehran IRAN Abstract: - Interactions between the members of an imaginary universe,

More information

Small sample size generalization

Small sample size generalization 9th Scandinavian Conference on Image Analysis, June 6-9, 1995, Uppsala, Sweden, Preprint Small sample size generalization Robert P.W. Duin Pattern Recognition Group, Faculty of Applied Physics Delft University

More information

On the convergence speed of artificial neural networks in the solving of linear systems

On the convergence speed of artificial neural networks in the solving of linear systems Available online at http://ijimsrbiauacir/ Int J Industrial Mathematics (ISSN 8-56) Vol 7, No, 5 Article ID IJIM-479, 9 pages Research Article On the convergence speed of artificial neural networks in

More information

General Properties for Determining Power Loss and Efficiency of Passive Multi-Port Microwave Networks

General Properties for Determining Power Loss and Efficiency of Passive Multi-Port Microwave Networks University of Massachusetts Amherst From the SelectedWorks of Ramakrishna Janaswamy 015 General Properties for Determining Power Loss and Efficiency of Passive Multi-Port Microwave Networks Ramakrishna

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters

Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters Kyriaki Kitikidou, Elias Milios, Lazaros Iliadis, and Minas Kaymakis Democritus University of Thrace,

More information

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)

Convex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs) ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality

More information

Numerical methods for the Navier- Stokes equations

Numerical methods for the Navier- Stokes equations Numerical methods for the Navier- Stokes equations Hans Petter Langtangen 1,2 1 Center for Biomedical Computing, Simula Research Laboratory 2 Department of Informatics, University of Oslo Dec 6, 2012 Note:

More information

Algebra and Trigonometry 2006 (Foerster) Correlated to: Washington Mathematics Standards, Algebra 2 (2008)

Algebra and Trigonometry 2006 (Foerster) Correlated to: Washington Mathematics Standards, Algebra 2 (2008) A2.1. Core Content: Solving problems The first core content area highlights the type of problems students will be able to solve by the end of, as they extend their ability to solve problems with additional

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Neural Networks for Two-Group Classification Problems with Monotonicity Hints

Neural Networks for Two-Group Classification Problems with Monotonicity Hints Neural Networks for Two-Group Classification Problems with Monotonicity Hints P. Lory 1, D. Gietl Institut für Wirtschaftsinformatik, Universität Regensburg, D-93040 Regensburg, Germany Abstract: Neural

More information

PREDICTION OF FATIGUE LIFE OF COLD FORGING TOOLS BY FE SIMULATION AND COMPARISON OF APPLICABILITY OF DIFFERENT DAMAGE MODELS

PREDICTION OF FATIGUE LIFE OF COLD FORGING TOOLS BY FE SIMULATION AND COMPARISON OF APPLICABILITY OF DIFFERENT DAMAGE MODELS PREDICTION OF FATIGUE LIFE OF COLD FORGING TOOLS BY FE SIMULATION AND COMPARISON OF APPLICABILITY OF DIFFERENT DAMAGE MODELS M. Meidert and C. Walter Thyssen/Krupp Presta AG Liechtenstein FL-9492 Eschen

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

INVARIANT SUBSETS OF THE SEARCH SPACE AND THE UNIVERSALITY OF A GENERALIZED GENETIC ALGORITHM

INVARIANT SUBSETS OF THE SEARCH SPACE AND THE UNIVERSALITY OF A GENERALIZED GENETIC ALGORITHM INVARIANT SUBSETS OF THE SEARCH SPACE AND THE UNIVERSALITY OF A GENERALIZED GENETIC ALGORITHM BORIS MITAVSKIY Abstract In this paper we shall give a mathematical description of a general evolutionary heuristic

More information

Linearly-solvable Markov decision problems

Linearly-solvable Markov decision problems Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu

More information

Linear Discrimination Functions

Linear Discrimination Functions Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach

More information

12. Lecture Stochastic Optimization

12. Lecture Stochastic Optimization Soft Control (AT 3, RMA) 12. Lecture Stochastic Optimization Differential Evolution 12. Structure of the lecture 1. Soft control: the definition and limitations, basics of expert" systems 2. Knowledge

More information

Determinants of Partition Matrices

Determinants of Partition Matrices journal of number theory 56, 283297 (1996) article no. 0018 Determinants of Partition Matrices Georg Martin Reinhart Wellesley College Communicated by A. Hildebrand Received February 14, 1994; revised

More information

Deep Belief Networks are compact universal approximators

Deep Belief Networks are compact universal approximators 1 Deep Belief Networks are compact universal approximators Nicolas Le Roux 1, Yoshua Bengio 2 1 Microsoft Research Cambridge 2 University of Montreal Keywords: Deep Belief Networks, Universal Approximation

More information

Proceedings of 12th International Heat Pipe Conference, pp , Moscow, Russia, 2002.

Proceedings of 12th International Heat Pipe Conference, pp , Moscow, Russia, 2002. 7KHUPDO3HUIRUPDQFH0RGHOLQJRI3XOVDWLQJ+HDW3LSHVE\$UWLILFLDO1HXUDO1HWZRUN Sameer Khandekar (a), Xiaoyu Cui (b), Manfred Groll (a) (a) IKE, University of Stuttgart, Pfaffenwaldring 31, 70569, Stuttgart, Germany.

More information

L p Approximation of Sigma Pi Neural Networks

L p Approximation of Sigma Pi Neural Networks IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 6, NOVEMBER 2000 1485 L p Approximation of Sigma Pi Neural Networks Yue-hu Luo and Shi-yi Shen Abstract A feedforward Sigma Pi neural networks with a

More information

Taylor series. Chapter Introduction From geometric series to Taylor polynomials

Taylor series. Chapter Introduction From geometric series to Taylor polynomials Chapter 2 Taylor series 2. Introduction The topic of this chapter is find approximations of functions in terms of power series, also called Taylor series. Such series can be described informally as infinite

More information

Address for Correspondence

Address for Correspondence Research Article APPLICATION OF ARTIFICIAL NEURAL NETWORK FOR INTERFERENCE STUDIES OF LOW-RISE BUILDINGS 1 Narayan K*, 2 Gairola A Address for Correspondence 1 Associate Professor, Department of Civil

More information

On Elementary and Algebraic Cellular Automata

On Elementary and Algebraic Cellular Automata Chapter On Elementary and Algebraic Cellular Automata Yuriy Gulak Center for Structures in Extreme Environments, Mechanical and Aerospace Engineering, Rutgers University, New Jersey ygulak@jove.rutgers.edu

More information

Hybrid HMM/MLP models for time series prediction

Hybrid HMM/MLP models for time series prediction Bruges (Belgium), 2-23 April 999, D-Facto public., ISBN 2-649-9-X, pp. 455-462 Hybrid HMM/MLP models for time series prediction Joseph Rynkiewicz SAMOS, Université Paris I - Panthéon Sorbonne Paris, France

More information

Nonlinear Coordinate Transformations for Unconstrained Optimization I. Basic Transformations

Nonlinear Coordinate Transformations for Unconstrained Optimization I. Basic Transformations Nonlinear Coordinate Transformations for Unconstrained Optimization I. Basic Transformations TIBOR CSENDES Kalmár Laboratory, József Attila University, Szeged, Hungary and TAMÁS RAPCSÁK Computer and Automation

More information

Neural Network Based Response Surface Methods a Comparative Study

Neural Network Based Response Surface Methods a Comparative Study . LS-DYNA Anwenderforum, Ulm Robustheit / Optimierung II Neural Network Based Response Surface Methods a Comparative Study Wolfram Beyer, Martin Liebscher, Michael Beer, Wolfgang Graf TU Dresden, Germany

More information

Lecture 7 Artificial neural networks: Supervised learning

Lecture 7 Artificial neural networks: Supervised learning Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in

More information

Notes on the Matrix-Tree theorem and Cayley s tree enumerator

Notes on the Matrix-Tree theorem and Cayley s tree enumerator Notes on the Matrix-Tree theorem and Cayley s tree enumerator 1 Cayley s tree enumerator Recall that the degree of a vertex in a tree (or in any graph) is the number of edges emanating from it We will

More information

AI Programming CS F-20 Neural Networks

AI Programming CS F-20 Neural Networks AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols

More information

A Novel Activity Detection Method

A Novel Activity Detection Method A Novel Activity Detection Method Gismy George P.G. Student, Department of ECE, Ilahia College of,muvattupuzha, Kerala, India ABSTRACT: This paper presents an approach for activity state recognition of

More information

Artificial Neural Network Method of Rock Mass Blastability Classification

Artificial Neural Network Method of Rock Mass Blastability Classification Artificial Neural Network Method of Rock Mass Blastability Classification Jiang Han, Xu Weiya, Xie Shouyi Research Institute of Geotechnical Engineering, Hohai University, Nanjing, Jiangshu, P.R.China

More information

CS 4700: Foundations of Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons

More information

Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.

Mark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer. University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

arxiv:quant-ph/ v1 20 Apr 1995

arxiv:quant-ph/ v1 20 Apr 1995 Combinatorial Computation of Clebsch-Gordan Coefficients Klaus Schertler and Markus H. Thoma Institut für Theoretische Physik, Universität Giessen, 3539 Giessen, Germany (February, 008 The addition of

More information

Chapter Two Elements of Linear Algebra

Chapter Two Elements of Linear Algebra Chapter Two Elements of Linear Algebra Previously, in chapter one, we have considered single first order differential equations involving a single unknown function. In the next chapter we will begin to

More information

A New Approach to Estimating the Expected First Hitting Time of Evolutionary Algorithms

A New Approach to Estimating the Expected First Hitting Time of Evolutionary Algorithms A New Approach to Estimating the Expected First Hitting Time of Evolutionary Algorithms Yang Yu and Zhi-Hua Zhou National Laboratory for Novel Software Technology Nanjing University, Nanjing 20093, China

More information

Artificial Neural Network Simulation of Battery Performance

Artificial Neural Network Simulation of Battery Performance Artificial work Simulation of Battery Performance C.C. O Gorman, D. Ingersoll, R.G. Jungst and T.L. Paez Sandia National Laboratories PO Box 58 Albuquerque, NM 8785 Abstract Although they appear deceptively

More information