Universität Würzburg
Institut für Theoretische Physik
Am Hubland, D-97074 Würzburg, Germany

Selection of Examples for a Linear Classifier

Georg Jung and Manfred Opper

Ref.: WUE-ITP-95-
ftp: ftp.physik.uni-wuerzburg.de
georgju@physik.uni-wuerzburg.de


Selection of Examples for a Linear Classifier

Georg Jung† and Manfred Opper‡

† Physikalisches Institut, Julius-Maximilians-Universität, Am Hubland, D-97074 Würzburg, Federal Republic of Germany
‡ The Baskin Center for Computer Engineering & Information Sciences, University of California, Santa Cruz, CA 95064, USA

Abstract. We investigate the problem of selecting an informative subsample out of a neural network's training data. Using the replica method of statistical mechanics, we calculate the performance of a heuristic selection algorithm for a linear neural network which avoids overfitting.

PACS numbers: 87.10.+e, 05.90.+m

Short title: Selection of Examples

December 1995

1. Introduction

Finding the optimal complexity of a neural network for learning an unknown task is one of the most interesting problems in the theory of neural computation. A popular strategy is to start with a complex network having more connections than are actually needed. Then, after training, by deleting couplings which are too small or seem to be of less importance, one hopes to end up with a network which has reasonable performance (see e.g. [23]). The strategy of clipping network weights has also been investigated in the framework of statistical mechanics (see e.g. [20, 24, 25]).

In this paper, we will look at a problem which is in some sense dual to the problem of pruning the network weights: we consider the problem of pruning the training examples. Besides its interest from a purely theoretical viewpoint, such a problem is motivated by the so-called phenomenon of overfitting in network learning. If the complexity of a network is too high, it may be able to fit all training examples perfectly, but the probability of predicting the outputs on new data may drop to the value of random guessing. As a result, the learning curve, which displays the generalization error as a function of the number of examples, may show a nonmonotonic behaviour. This means that an increase in the number of examples can in some regions lead to a decrease in generalization ability. Recently, Garces has found in numerical studies that overfitting can be avoided by deleting examples from the training set [18]. In this respect, it is interesting to find out from which selection of examples the performance of the network will benefit most.

In this article, we will study a heuristic strategy for the selection of examples in the case of a linear classifier applied to two toy problems. From an analytical viewpoint, it is simple enough to be free of the problems of replica symmetry breaking. This approach should not be confused with the well studied problem of learning with queries [14, 15, 16]. In the latter case, one selects inputs with respect to some criterion before their outputs are observed. In our case, all training examples, i.e. both inputs and outputs, are known to the learner.

The paper is organized as follows: In the second section, we briefly review some properties of the Adaline algorithm which will be basic to our treatment. The third section explains the heuristic strategy for example selection. In section four, we introduce two different rules to be learnt by the network. Section five contains a new approach to the statistical mechanics of the problem. Finally, in sections six and seven, the results of our calculations are presented and discussed.

2. Adaline Learning

As has been shown e.g. in [9, 3, 10, 21], the effect of overfitting can already be observed for the case of a simple linear classifier, the so-called Adaline model [13, 11, 12], which we will discuss in the following sections. For any N-dimensional input vector $\boldsymbol{\xi}$ this linear classifier is defined by the output

$$S = \frac{1}{\sqrt{N}}\,\mathbf{J}\cdot\boldsymbol{\xi}. \qquad (1)$$

For p linearly independent input vectors $\boldsymbol{\xi}^\mu$ and arbitrary real valued target outputs $S_B^\mu$, $\mu = 1,\ldots,p$, the system of equations

$$S_B^\mu = \frac{1}{\sqrt{N}}\,\mathbf{J}\cdot\boldsymbol{\xi}^\mu \qquad (2)$$

will always have solutions. Hence, for random inputs, where linear dependencies are unlikely, it will be possible to adjust the vector $\mathbf{J}$ of the N network weights $J_j$, $j = 1,\ldots,N$, in such a way that, if p < N, all training examples are perfectly learnt by the network. An explicit solution for such a weight vector is given by the pseudo-inverse solution (PSI) [11]:

$$\mathbf{J} = \frac{1}{\sqrt{N}}\sum_{\mu,\nu=1}^{p} S_B^\mu\, (C^{-1})_{\mu\nu}\,\boldsymbol{\xi}^\nu, \qquad (3)$$

where $C_{\mu\nu} = \frac{1}{N}\sum_j \xi_j^\mu \xi_j^\nu$ is the correlation matrix of the training patterns. Out of the linear space of solutions to equation (2), this is the one which has minimal squared norm $q = \mathbf{J}^2/N$. The restriction of our treatment to the case p < N is mainly for mathematical convenience; however, it is especially in this region that the effect of overfitting can be observed.

3. Weighting of Examples

Our basic strategy for the selection of examples is based on the fact that most iterative learning algorithms for a single layer net result in a coupling vector which has the form of a weighted Hebbian rule [4, 9]. To see this, consider a learning algorithm of the backpropagation type which is based on the minimization of a quadratic training energy

$$E = \sum_{\mu=1}^{p}\bigl(S_B^\mu - g_J(h_J^\mu)\bigr)^2 \qquad (4)$$

by gradient descent. Here we assume a smooth output $g_J(h_J)$ of the student net, which is defined through the internal field $h_J^\mu = \frac{1}{\sqrt{N}}\,\mathbf{J}\cdot\boldsymbol{\xi}^\mu$.

Hence, during a learning step, the network couplings are changed by an amount

$$\Delta J_j \propto -\frac{\partial E}{\partial J_j}. \qquad (5)$$

It is not hard to show that the algorithm (5) for $g_J(h) = h$, when started with a zero initial vector, converges to the Adaline rule (3). To rewrite (5) as a weighted Hebbian sum, we set

$$\Delta J_j = \frac{1}{\sqrt{N}}\sum_{\mu=1}^{p} x^\mu(t)\, S_B^\mu\, \xi_j^\mu, \qquad (6)$$

where

$$x^\mu(t) \propto \bigl(1 - S_B^\mu\, g_J(h_J^\mu)\bigr)\, g_J'(h_J^\mu)$$

represents the weighting (in the following called the embedding strength) of the $\mu$'th example in the t'th learning step.

Our selection of examples will be based on the following heuristic ansatz: Intuitively, we expect that examples which have small or even negative total embedding strengths $x^\mu = \sum_t x^\mu(t)$ after learning could be cast out of the training set. The latter ones would have an output that already has the correct sign without being learnt. This strategy may also be understood as an approximation to the AdaTron algorithm [5] which, by construction, allows for nonnegative embedding strengths only. Another possibility was discussed in a paper by Garces [18] for the case of AdaTron learning: only examples which are not too hard to learn, i.e. which have positive embedding strengths below a certain value, were left in the training set.

To find an expression for the total embedding strengths, it is not necessary to solve the dynamics (5). We can get the same information directly from the final coupling vector. For the Adaline case, with $\alpha = p/N < 1$, the form of the coupling vector (3) provides us with an explicit expression for the embedding strengths, which is given by

$$x^\mu = S_B^\mu \sum_{\nu} S_B^\nu\, (C^{-1})_{\mu\nu}, \qquad (7)$$

valid for p < N. To treat the statistical mechanics of the problem, however, a simpler implicit definition of the $x^\mu$'s using a suitable Lagrangean will be given in section 5. Although it is possible to obtain the coupling vector of the linear classifier for $\alpha > 1$, useful expressions for the corresponding embedding strengths are harder to find and to treat within the framework of statistical mechanics. Hence, we will restrict ourselves to the region $\alpha < 1$.
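The quantities above are easy to check numerically. The following minimal numpy sketch (ours, not part of the paper; the sizes, the random seed and the sign teacher are arbitrary choices) builds the pseudo-inverse solution (3) for one random training set, verifies that it reproduces all targets, and confirms that it equals the weighted Hebbian sum with the embedding strengths (7).

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 120                      # alpha = p/N < 1
xi = rng.standard_normal((p, N))     # training inputs xi^mu
B = rng.standard_normal(N)
B *= np.sqrt(N) / np.linalg.norm(B)  # teacher norm |B| = sqrt(N)

h_B = xi @ B / np.sqrt(N)            # teacher fields
S_B = np.sign(h_B)                   # binary teacher outputs g_B(h) = sign(h)

# correlation matrix C_{mu nu} = xi^mu . xi^nu / N and its inverse
C = xi @ xi.T / N
C_inv = np.linalg.inv(C)

# pseudo-inverse solution, eq (3)
J = (S_B @ C_inv @ xi) / np.sqrt(N)

# embedding strengths, eq (7):  x^mu = S_B^mu sum_nu S_B^nu (C^-1)_{mu nu}
x = S_B * (C_inv @ S_B)

# checks: all training examples are reproduced exactly,
# and J equals the weighted Hebbian sum over the examples
assert np.allclose(xi @ J / np.sqrt(N), S_B)
assert np.allclose(J, (x * S_B) @ xi / np.sqrt(N))
print("fraction of negative embedding strengths:", np.mean(x < 0))
```

The printed fraction of negative embedding strengths is exactly the fraction of examples that the selection heuristic of this section would discard.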

In order to keep the subsequent analysis simple, we will not assume that the Adaline algorithm has to be rerun a second time on the reduced training set. We will rather make the ansatz that the new weight vector is given a priori, via single shot learning, by a weighted Hebbian sum of the form

$$\tilde{\mathbf{J}} = \frac{1}{\sqrt{N}}\sum_{\mu} f(x^\mu)\, S_B^\mu\, \boldsymbol{\xi}^\mu. \qquad (8)$$

In the following, we will consider two choices for the weighting function f(x). One choice will be called the modified Adaline rule, $f(x) = x\,\theta(x)$, where examples with negative embedding strengths are abandoned and the positive ones keep their original weights. This may be considered as an approximation to the more complicated algorithm of relearning the remaining examples with the Adaline method. For comparison, we will also study the simpler choice $f(x) = \theta(x)$ (modified Hebbian rule), where the remaining examples have equal weights.
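A companion sketch (again ours, not the paper's) wires the two weighting functions into the single-shot rule (8); the helper names, the sizes and the illustrative threshold of the teacher are our own choices.

```python
import numpy as np

def embedding_strengths(xi, S_B):
    """x^mu = S_B^mu sum_nu S_B^nu (C^-1)_{mu nu}, valid for p < N (eq (7))."""
    C = xi @ xi.T / xi.shape[1]
    return S_B * np.linalg.solve(C, S_B)

def pruned_weights(xi, S_B, rule="modified_adaline"):
    """Single-shot weighted Hebbian vector of eq (8) for the two weighting functions."""
    x = embedding_strengths(xi, S_B)
    if rule == "modified_adaline":    # f(x) = x * theta(x): drop negative x, keep weights
        f = np.where(x > 0, x, 0.0)
    elif rule == "modified_hebb":     # f(x) = theta(x): remaining examples get equal weight
        f = (x > 0).astype(float)
    else:
        raise ValueError(rule)
    return (f * S_B) @ xi / np.sqrt(xi.shape[1])

# tiny usage example with a threshold teacher
rng = np.random.default_rng(1)
N, p = 200, 120
xi = rng.standard_normal((p, N))
B = rng.standard_normal(N); B *= np.sqrt(N) / np.linalg.norm(B)
S_B = np.sign(xi @ B / np.sqrt(N) - 0.4)   # illustrative threshold 0.4
x = embedding_strengths(xi, S_B)
print("examples kept:", int((x > 0).sum()), "of", p)
for rule in ("modified_adaline", "modified_hebb"):
    J_new = pruned_weights(xi, S_B, rule)
    print(rule, "-> |J~|^2/N =", float(J_new @ J_new) / N)
```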

4. Learning Tasks

As in most theoretical studies of neural networks, the task to be learnt will be defined by a teacher network. It provides the correct outputs $S_B$ for a given input $\boldsymbol{\xi}$. For the simplest case, the teacher is given by a single layer network with fixed vector of couplings $\mathbf{B}$ and output

$$S_B = g_B(h_B), \qquad (9)$$

with $h_B = \frac{1}{\sqrt{N}}\,\mathbf{B}\cdot\boldsymbol{\xi}$. The complexity of such teacher networks can be tuned by suitably varying the output function $g_B$. In all the following cases, an explicit mismatch between teacher and student is assumed by choosing $g_B$ as a nonlinear function. For simplicity, we specialize to the binary case $g_B = \pm 1$. Hence, even when the teacher problem is of the simple type $g_B(h) = \mathrm{sign}(h)$, the linear output function of the student used for the training process will cause overfitting. For a better comparison with the teacher net, we will measure the performance of the student network after training by its clipped output $\mathrm{sign}(h_J)$. Hence, we define the generalization error as the following average:

$$\varepsilon_g = \tfrac{1}{2}\,\bigl\langle\,\lvert g_B(h_B) - \mathrm{sign}(h_J)\rvert\,\bigr\rangle_{\boldsymbol{\xi}}. \qquad (10)$$

In the simplest case, where the inputs are drawn from a spherically symmetric density,

$$P(\boldsymbol{\xi}) = (2\pi)^{-N/2}\, e^{-\lvert\boldsymbol{\xi}\rvert^2/2},$$

it is possible to describe the generalization error in terms of the angle between the two vectors $\mathbf{B}$ and $\tilde{\mathbf{J}}$,

$$\cos\vartheta = \frac{\mathbf{B}\cdot\tilde{\mathbf{J}}}{\lvert\tilde{\mathbf{J}}\rvert\,\lvert\mathbf{B}\rvert} = \frac{\tilde{R}}{\sqrt{\tilde{q}}}.$$

Here $\tilde{R} = \mathbf{B}\cdot\tilde{\mathbf{J}}/N$ defines the overlap between teacher and student, and $\tilde{q} = \tilde{\mathbf{J}}^2/N$. For simplicity, we have chosen the norm of the teacher to be $\sqrt{N}$. We will investigate two different rules.

(i) We begin with a linearly separable task, defined by a teacher perceptron with a nonzero threshold $\theta$. The output function is

$$g_B(h_B) = \mathrm{sign}(h_B - \theta).$$

The shaded regions in Figure 1 display the fraction of input space where the answers of teacher and student differ. The generalization error can be easily calculated to be

$$\varepsilon_g = \frac{1}{\pi}\arccos\frac{\tilde{R}}{\sqrt{\tilde{q}}} + \int_0^{\theta} Dh\,\Bigl[1 - 2\Phi\Bigl(\frac{\tilde{R}h}{\sqrt{\tilde{q}-\tilde{R}^2}}\Bigr)\Bigr], \qquad (11)$$

where $Dh$ is the Gaussian measure, $Dh = \frac{dh}{\sqrt{2\pi}}\, e^{-h^2/2}$, and $\Phi(x) = \int_x^{\infty} Dh$.

(ii) The second rule is the so-called reversed-wedge problem [19],

$$g_B(h_B) = \mathrm{sign}\bigl(h_B\,(h_B^2 - \gamma^2)\bigr).$$

The shaded regions in Figure 2 display, in the same way as before, the fraction of input space where the answers of teacher and student differ. The generalization error can be calculated similarly:

$$\varepsilon_g = \frac{1}{\pi}\arccos\frac{\tilde{R}}{\sqrt{\tilde{q}}} + 2\int_0^{\gamma} Dh\,\Bigl[1 - 2\Phi\Bigl(\frac{\tilde{R}h}{\sqrt{\tilde{q}-\tilde{R}^2}}\Bigr)\Bigr]. \qquad (12)$$
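The closed forms (11) and (12) can be cross-checked against a direct Monte Carlo evaluation of the definition (10): for spherically symmetric inputs the pair (h_B, h_J) is zero-mean Gaussian with unit teacher variance, student variance q~ and covariance R~, which is all that the generalization error depends on. The sketch below (ours; the parameter values are illustrative only) estimates eps_g this way for both teacher rules.

```python
import numpy as np

def gen_error_mc(R_t, q_t, g_B, n=1_000_000, seed=0):
    """Monte Carlo estimate of eq (10): probability that the clipped student output
    sign(h_J) differs from the teacher output g_B(h_B).  (h_B, h_J) is zero-mean
    Gaussian with <h_B^2> = 1, <h_J^2> = q_t and <h_B h_J> = R_t."""
    rng = np.random.default_rng(seed)
    h_B = rng.standard_normal(n)
    h_J = R_t * h_B + np.sqrt(q_t - R_t**2) * rng.standard_normal(n)
    return np.mean(g_B(h_B) != np.sign(h_J))

theta, gamma = 0.4, 1.2        # illustrative threshold / wedge parameters (our choice)
teachers = {
    "threshold":      lambda h: np.sign(h - theta),
    "reversed wedge": lambda h: np.sign(h * (h**2 - gamma**2)),
}
for name, g_B in teachers.items():
    for R_t, q_t in [(0.3, 1.0), (0.6, 1.0), (0.9, 1.0)]:
        print(f"{name:14s}  R~={R_t:.1f}  q~={q_t:.1f}  eps_g ~ {gen_error_mc(R_t, q_t, g_B):.4f}")
```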

5. Statistical Mechanics: A Lagrangean Approach

Following the approach of Elizabeth Gardner [1, 2], the application of statistical mechanics to network learning (for a review see [6, 7, 8]) is often based on the fact that the network configurations $J_j$ obtained from a learning algorithm are minima of a suitable training energy E. In this case, by introducing a canonical ensemble of networks at inverse temperature $\beta$, the desired configuration appears as the one with maximal weight in the partition function $Z = \int d\mathbf{J}\, e^{-\beta E}$, in the limit $\beta \to \infty$. This procedure works fine [3], e.g. in order to determine the order parameters of the standard Adaline rule, which are explicit functions of the couplings. However, it does not give us any direct information on the embedding strengths, which are only implicitly related to the couplings. One possibility to obtain these quantities would be to introduce their explicit definitions within $\delta$ functions, see e.g. [17]. In our paper, we will use a new, technically more elegant approach which is based on a Lagrangean formulation of the optimization problem rather than on a Hamiltonian E.

We will make explicit use of the fact that the coupling vector is the solution of the following constrained optimization problem: minimize $\tfrac{1}{2}\mathbf{J}^2$ under the constraints

$$\frac{1}{\sqrt{N}}\,\mathbf{J}\cdot\boldsymbol{\xi}^\mu = S_B^\mu, \qquad \forall\mu.$$

By introducing Lagrange parameters $x^\mu$ together with the Lagrange function

$$L\bigl(\mathbf{J},\{S_B^\mu x^\mu\}\bigr) = \frac{1}{2}\,\mathbf{J}^2 - \sum_{\mu} S_B^\mu x^\mu\Bigl(\frac{1}{\sqrt{N}}\,\mathbf{J}\cdot\boldsymbol{\xi}^\mu - S_B^\mu\Bigr), \qquad (13)$$

we can solve the optimization problem by finding the point in the (N+p)-dimensional space of the $J_j$'s and $x^\mu$'s where L is stationary. Setting the partial derivative of L with respect to the $J_j$'s equal to zero, we see that

$$\mathbf{J} = \frac{1}{\sqrt{N}}\sum_{\mu} x^\mu\, S_B^\mu\, \boldsymbol{\xi}^\mu. \qquad (14)$$

Thus, the $x^\mu$'s actually coincide with the embedding strengths. However, the stationary point of L is a saddle point, not a minimum. In order to keep integrals finite, we will work with the following complex partition function

$$Z = \int \prod_j dJ_j \prod_{\mu} dy^\mu\, \exp\bigl[-\beta L\bigl(\mathbf{J},\{i y^\mu\}\bigr)\bigr]. \qquad (15)$$

Using the saddle-point method, by suitably deforming the contour of integration of the $y^\mu$'s, we find that in the limit $\beta \to \infty$ the dominant contribution to Z comes from the stationary point of the Lagrangean L. Using the complex distribution in Z, we are able to calculate any average of functions of the embedding strengths by identifying $y^\mu$ with $-i S_B^\mu x^\mu$ at the saddle point. In particular, we are interested in the distribution $W(\tilde{J}_l)$ of an arbitrary component of the new student vector $\tilde{J}_l(\{x^\mu\})$. This enables us to calculate the order parameters necessary to describe the generalization ability of a network with the new couplings $\tilde{J}_l$.
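Returning for a moment to the constrained problem (13): a quick numerical check (ours, not part of the paper; all names are our own) sets it up as a KKT linear system and confirms that the resulting Lagrange multipliers reproduce the embedding strengths (7) and the weighted Hebbian form (14).

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 150, 90
xi = rng.standard_normal((p, N))
B = rng.standard_normal(N); B *= np.sqrt(N) / np.linalg.norm(B)
S_B = np.sign(xi @ B / np.sqrt(N))

A = xi / np.sqrt(N)                  # constraint matrix: A @ J = S_B

# KKT system for: minimize (1/2)|J|^2  subject to  A @ J = S_B
#   [ I   -A^T ] [ J      ]   [ 0   ]
#   [ A    0   ] [ lambda ] = [ S_B ]
KKT = np.block([[np.eye(N), -A.T],
                [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([np.zeros(N), S_B]))
J, lam = sol[:N], sol[N:]

# the multipliers are S_B^mu x^mu, so the embedding strengths are S_B * lambda
x_from_kkt = S_B * lam
x_explicit = S_B * np.linalg.solve(xi @ xi.T / N, S_B)   # eq (7)
print(np.allclose(x_from_kkt, x_explicit))               # True
print(np.allclose(J, (lam @ xi) / np.sqrt(N)))           # eq (14): True
```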

The characteristic function $\tilde{\Phi}(k) = \langle e^{i k \tilde{J}_l(\{x^\mu\})}\rangle$ of this random variable is expressed as a further average over the complex distribution defined by the partition function (15). We will denote this average by $\langle\cdots\rangle$ and get

$$\tilde{\Phi}(k) = \lim_{\beta\to\infty}\bigl\langle\exp\bigl[i k\,\tilde{J}_l\bigl(\{i y^\mu S_B^\mu\}\bigr)\bigr]\bigr\rangle
= \lim_{\beta\to\infty} Z^{-1}\int \prod_j dJ_j \prod_\mu dy^\mu\,
\exp\Bigl[i\beta\sum_\mu y^\mu\Bigl(\tfrac{1}{\sqrt{N}}\,\mathbf{J}\cdot\boldsymbol{\xi}^\mu - S_B^\mu\Bigr) - \tfrac{\beta}{2}\,\mathbf{J}^2 + i k\,\tilde{J}_l\Bigr]. \qquad (16)$$

Introducing n replicas in order to represent the normalization, together with the local field of the teacher and its conjugate variable, one finds that in the limit $n \to 0$ only the l-th component of the student vector contributes. After averaging over the inputs and introducing appropriate order parameters, assuming replica symmetry, we obtain in the limit $n \to 0$

$$\tilde{\Phi}(k) = \exp\Bigl(-\frac{k^2}{2}\,A + i k\, b\, B_l\Bigr), \qquad (18)$$

with constants A and b which are combinations of the following order parameters, calculated within the replica framework:

$$Q_{ab} = \langle y_a y_b\rangle, \qquad F = \bigl\langle f^2\bigl(i y\, g_B(h)\bigr)\bigr\rangle, \qquad G_{ab} = \bigl\langle i y_a\, g_B(h)\, f\bigl(i y_b\, g_B(h)\bigr)\bigr\rangle,$$
$$S = \bigl\langle i y\, g_B(h)\bigr\rangle, \qquad T = \bigl\langle i v\, g_B(h)\, f\bigl(i y\, g_B(h)\bigr)\bigr\rangle, \qquad U = \langle y v\rangle. \qquad (19)$$

Finally, $\Delta = \beta(G_{aa} - G_{ab})$ and $\Gamma = 1 + \beta(Q_{aa} - Q_{ab})$. Explicit expressions for these quantities are given in Appendix B. By a Fourier transform of (18) we obtain the Gaussian distribution

$$W(\tilde{J}_l) = \frac{1}{\sqrt{2\pi A}}\,\exp\Bigl(-\frac{(\tilde{J}_l - B_l\, b)^2}{2A}\Bigr). \qquad (20)$$
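The Gaussian form (20) can be illustrated by a small simulation (ours; the sizes and sample counts are arbitrary): for a fixed teacher, the construction of J~ is repeated over many random training sets; the component-wise mean of J~ should then be proportional to B with component-independent fluctuations, and the fitted b and A determine the order parameters introduced below.

```python
import numpy as np

rng = np.random.default_rng(3)
N, alpha, n_sets = 100, 0.6, 400
p = int(alpha * N)
B = rng.standard_normal(N); B *= np.sqrt(N) / np.linalg.norm(B)

def pruned_vector(xi, S_B):
    x = S_B * np.linalg.solve(xi @ xi.T / xi.shape[1], S_B)   # embedding strengths (7)
    f = np.where(x > 0, x, 0.0)                               # modified Adaline weights
    return (f * S_B) @ xi / np.sqrt(xi.shape[1])              # eq (8)

J_tilde = np.empty((n_sets, N))
for t in range(n_sets):
    xi = rng.standard_normal((p, N))
    S_B = np.sign(xi @ B / np.sqrt(N))
    J_tilde[t] = pruned_vector(xi, S_B)

mean_J = J_tilde.mean(axis=0)          # should be close to b * B_l  (eq (20))
var_J = J_tilde.var(axis=0)            # should be close to the l-independent constant A
b_hat = (mean_J @ B) / N
A_hat = var_J.mean()
print("corr(mean_J, B)  =", np.corrcoef(mean_J, B)[0, 1])
print("b  (i.e. R~)     =", b_hat)
print("A, spread over l =", A_hat, var_J.std())
print("A + b^2          =", A_hat + b_hat**2, "vs q~ =", (J_tilde**2).sum(axis=1).mean() / N)
```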

Hence, using the self-averaging property of order parameters, the overlap of the new weight vector with the teacher and the corresponding norm are given by

$$\tilde{R} = \frac{1}{N}\sum_{l=1}^{N}\langle\tilde{J}_l\rangle\, B_l = b$$

and

$$\tilde{q} = \frac{1}{N}\sum_{l=1}^{N}\langle\tilde{J}_l^2\rangle = A + b^2.$$

Using these order parameters, the generalization error for the different tasks of section 4 can now be calculated.

6. Results

In this section we present the learning curves of our algorithms applied to the learning tasks of section 4. For both learning tasks, the distribution P(x) of embedding strengths (see Figure 3 and Appendix A) broadens as $\alpha$ increases and shows an increasing fraction of negative x. Hence, using our algorithm, a mostly increasing fraction of examples, which approaches 1/2 as $\alpha \to 1$, will be cast out of the training set. In Figure 4, we have displayed the relative number of remaining examples $\alpha_{\mathrm{eff}} = p_{\mathrm{eff}}/N$ for both learning tasks. Here

$$\alpha_{\mathrm{eff}} = \alpha \int_0^{\infty} P(x)\, dx.$$

Figure 5 shows the generalization error for the modified Adaline algorithm with weight function $f(x) = x\,\theta(x)$, learning a perceptron with threshold. The dashed curve was obtained for the standard Adaline algorithm, where for $\alpha \to 1$ only the trivial generalization $\varepsilon_g = 1/2$ is achieved. Obviously, by selecting examples, this overfitting phenomenon vanishes. The little symbols on the curves are results of numerical simulations of the algorithm. Since the modified algorithm achieves smaller errors than the minimum of the dashed curve, it performs better than an Adaline algorithm applied to a random selection of $\alpha_{\mathrm{eff}} N$ examples. It is interesting to note that both $\tilde{R}$ and $\tilde{q}$ diverge for $\alpha \to 1$, as in the case of normal Adaline learning [3]. The ratio $\tilde{R}/\sqrt{\tilde{q}}$, however, which enters the formulae for $\varepsilon_g$, remains finite.

For the same task, Figure 7 shows the result for the modified Hebb method $f(x) = \theta(x)$. This yields, except for $\alpha$ close to 1, a slight reduction of performance compared to the modified Adaline case. The dashed curves are the results for Hebbian learning of a random selection of the same number of examples $p_{\mathrm{eff}} = \alpha_{\mathrm{eff}} N$. However, both sets of curves are displayed as functions of the initial size of the training set. This comparison again proves that the selected examples contain more information about the learning task than a random subset of examples.
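The qualitative behaviour of these learning curves can be reproduced with a few lines of numpy. The sketch below (ours; it uses a single training set per value of alpha rather than the averaged simulations shown in the figures, and the threshold is an illustrative choice) compares the clipped standard Adaline solution with the modified Adaline rule for the threshold teacher. It should show the standard Adaline error climbing towards 1/2 near alpha = 1 while the pruned rule stays well below it, in line with Figure 5.

```python
import numpy as np

def gen_error(J, B, g_B, n_test=20000, rng=None):
    """Empirical eq (10): fraction of random test inputs where the clipped student
    sign(h_J) differs from the teacher output g_B(h_B)."""
    rng = rng or np.random.default_rng(0)
    xi = rng.standard_normal((n_test, len(B)))
    h_B, h_J = xi @ B / np.sqrt(len(B)), xi @ J / np.sqrt(len(B))
    return np.mean(g_B(h_B) != np.sign(h_J))

N, theta = 200, 0.4
g_B = lambda h: np.sign(h - theta)          # teacher perceptron with threshold
rng = np.random.default_rng(4)
B = rng.standard_normal(N); B *= np.sqrt(N) / np.linalg.norm(B)

for alpha in (0.2, 0.5, 0.8, 0.95):
    p = int(alpha * N)
    xi = rng.standard_normal((p, N))
    S_B = g_B(xi @ B / np.sqrt(N))
    x = S_B * np.linalg.solve(xi @ xi.T / N, S_B)             # embedding strengths (7)
    J_psi = (x * S_B) @ xi / np.sqrt(N)                       # standard Adaline / PSI (3)
    J_mod = (np.where(x > 0, x, 0) * S_B) @ xi / np.sqrt(N)   # modified Adaline (8)
    print(f"alpha={alpha:.2f}  eps_g(Adaline)={gen_error(J_psi, B, g_B, rng=rng):.3f}"
          f"  eps_g(modified)={gen_error(J_mod, B, g_B, rng=rng):.3f}")
```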

Figures 6 and 8 display the performance of the algorithms in the case of the reversed-wedge problem. The overall behaviour is roughly the same as in the case of the threshold perceptron.

7. Conclusion

Using statistical mechanics, we have analysed a simple strategy for pruning the training set of examples in the case of a linear classifier network. By rewriting the coupling vector as a Hebbian sum, the natural concept of the weight of an example is introduced. Deleting the examples with negative weights from the training set avoids the overfitting phenomenon. By using different weight functions f(x), our analysis might be extended to other types of example selection, e.g. erasing the examples which are too hard to learn [18]. In this case, we expect that the network's performance will benefit most from such a procedure when $\alpha$ is sufficiently larger than 1. However, such an analysis requires a different mathematical treatment from that of section 5. It might further be interesting to see whether similar strategies can be developed for more complicated networks.

Acknowledgement

We are grateful to Wolfgang Kinzel and Ronny Meir for stimulating discussions. G. Jung was supported by a DFG grant and M. Opper by a Heisenberg fellowship of the DFG.

Appendix A. Distribution of Embedding Strengths

In this appendix we briefly derive the calculation of the distribution of embedding strengths $x^\mu$. Defining the characteristic function

$$\phi(k) = \bigl\langle e^{i k x^\mu}\bigr\rangle = \lim_{\beta\to\infty}\bigl\langle e^{-k y^\mu S_B^\mu}\bigr\rangle, \qquad (A1)$$

and introducing replicas for the normalization as in section 5, the average over the inputs can be performed after the teacher field $h^\mu$ and its conjugate variable $v^\mu$ have been introduced. Defining the order parameters $R_a = \langle \mathbf{B}\cdot\mathbf{J}_a\rangle/N$ and $q_{ab} = \langle \mathbf{J}_a\cdot\mathbf{J}_b\rangle/N$, we obtain in replica symmetry a Gaussian integral over a single field h and a single auxiliary variable z, with the response parameter $\chi = \beta(q_{aa} - q_{ab})$. Assuming binary outputs, $\lvert g_B(h)\rvert = 1$, this finally yields

$$P(x) = \frac{\chi}{\sqrt{2\pi(q - R^2)}}\int Dh\, \exp\Bigl[-\frac{\bigl(1 - \chi x - R h\, g_B(h)\bigr)^2}{2(q - R^2)}\Bigr]. \qquad (A2)$$

The order parameters R and q can be obtained by the techniques introduced in [3, 10] and read

$$R = \alpha\int Dh\, h\, g_B(h), \qquad q = \frac{\alpha\int Dh\, g_B^2(h) - R^2}{1 - \alpha}$$

for $\alpha < 1$. Explicit results for continuous functions $g_B(h)$ are given in [10]. Finally, the parameter $\chi$ can be determined using eq. (3) together with (A2):

$$q = \frac{\mathbf{J}^2}{N} = \frac{1}{N}\sum_{\mu,\nu} g_B(h^\mu)\,(C^{-1})_{\mu\nu}\, g_B(h^\nu) = \frac{1}{N}\sum_{\mu} g_B(h^\mu)\, x^\mu \qquad (A3)$$
$$\phantom{q} = \alpha\int dx\, x\, P(x). \qquad (A4)$$

In the last equality we have specialised to binary outputs only. Calculating the integral with the help of (A2) and comparing with (A3) leads to $\chi = 1 - \alpha$.
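The distribution (A2) can be compared with a direct simulation. The sketch below (ours, and relying on the binary-output expressions for R, q and chi given above) draws one training set for the threshold teacher, computes the embedding strengths from eq (7), and evaluates (A2) by numerical integration over h; at finite N the agreement is only approximate.

```python
import numpy as np

N, alpha, theta = 400, 0.6, 0.4          # arbitrary illustrative choices
p = int(alpha * N)
rng = np.random.default_rng(5)
g_B = lambda h: np.sign(h - theta)

# embedding strengths of one large training set (eq (7))
B = rng.standard_normal(N); B *= np.sqrt(N) / np.linalg.norm(B)
xi = rng.standard_normal((p, N))
S_B = g_B(xi @ B / np.sqrt(N))
x_sim = S_B * np.linalg.solve(xi @ xi.T / N, S_B)

# order parameters of the plain Adaline solution for binary outputs (Appendix A)
R = alpha * np.sqrt(2 / np.pi) * np.exp(-theta**2 / 2)    # alpha * int Dh h g_B(h)
q = (alpha - R**2) / (1 - alpha)
chi = 1 - alpha

def P_theory(x):
    """Numerical h-integration of the distribution (A2) as reconstructed above."""
    h = np.linspace(-8.0, 8.0, 4001)
    Dh = np.exp(-h**2 / 2) / np.sqrt(2 * np.pi) * (h[1] - h[0])
    arg = 1 - chi * np.asarray(x)[:, None] - R * h * g_B(h)
    return chi / np.sqrt(2 * np.pi * (q - R**2)) * (Dh * np.exp(-arg**2 / (2 * (q - R**2)))).sum(axis=1)

edges = np.linspace(-4, 8, 13)
hist, _ = np.histogram(x_sim, bins=edges, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])
for c, h_emp, p_th in zip(centers, hist, P_theory(centers)):
    print(f"x = {c:5.2f}   simulation {h_emp:.3f}   theory {p_th:.3f}")
```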

Appendix B. Order Parameters

The explicit results for the order parameters (19) are obtained from the saddle point of section 5. There the embedding strength of an example with teacher field h can be written as

$$x(h,z) = -\frac{g_B(h)\,\bigl[R h - g_B(h) + z\sqrt{q - R^2}\bigr]}{\chi},$$

with a standard Gaussian variable z, so that for a general weighting function f

$$F = \int Dh \int Dz\; f^2\bigl(x(h,z)\bigr), \qquad
G = \int Dh \int Dz\; x(h,z)\, f\bigl(x(h,z)\bigr), \qquad
T = \int Dh \int Dz\; h\, g_B(h)\, f\bigl(x(h,z)\bigr),$$

while

$$Q = -\frac{1}{\chi^2}\int Dh \int Dz\; \bigl[R h - g_B(h) + z\sqrt{q - R^2}\bigr]^2, \qquad
U = -\int Dh\; h\, g_B(h).$$

The parameters Q and U depend only on the learning task and are independent of the choice of the weight function f(x).

(i) Perceptron with threshold:

$$Q = -\frac{1}{\chi^2}\Bigl[1 + q - 2R\sqrt{\tfrac{2}{\pi}}\, e^{-\theta^2/2}\Bigr], \qquad
U = -\sqrt{\tfrac{2}{\pi}}\, e^{-\theta^2/2}. \qquad (B5)$$

(ii) Reversed-wedge problem:

$$Q = -\frac{1}{\chi^2}\Bigl[1 + q - 2R\sqrt{\tfrac{2}{\pi}}\,\bigl(2 e^{-\gamma^2/2} - 1\bigr)\Bigr], \qquad
U = -\sqrt{\tfrac{2}{\pi}}\,\bigl[2 e^{-\gamma^2/2} - 1\bigr].$$

For modified Hebbian learning, $f(x) = \theta(x)$, the z-integrations can be carried out explicitly and F, G and T reduce to single Gaussian integrals over h containing the function $\Phi\bigl(g_B(h)\,[R h - g_B(h)]/\sqrt{q - R^2}\bigr)$; the corresponding expressions for modified Adaline learning, $f(x) = x\,\theta(x)$, follow in the same way.

References

[1] Gardner E 1988 J. Phys. A: Math. Gen. 21 257
[2] Gardner E and Derrida B 1988 J. Phys. A: Math. Gen. 21 271
[3] Opper M, Kinzel W, Kleinz J and Nehl R 1990 J. Phys. A: Math. Gen. 23 L581
[4] Hebb D O 1949 The Organization of Behaviour (New York: Wiley)
[5] Anlauf J and Biehl M 1989 Europhys. Lett. 10 687
[6] Seung H S, Sompolinsky H and Tishby N 1992 Phys. Rev. A 45 6056
[7] Watkin T L H, Rau A and Biehl M 1993 Rev. Mod. Phys. 65 499
[8] Opper M and Kinzel W Statistical Mechanics of Generalization, in: Physics of Neural Networks III, edited by van Hemmen J L, Domany E and Schulten K (Berlin: Springer, to be published)
[9] Vallet F 1989 Europhys. Lett. 8 747
[10] Bös S, Kinzel W and Opper M 1993 Phys. Rev. E 47 1384
[11] Kohonen T 1988 Self-Organization and Associative Memory (Berlin: Springer)

[12] Diederich S and Opper M 1987 Phys. Rev. Lett. 59 949
[13] Widrow B and Hoff M E 1960 WESCON Convention Report IV 96
[14] Kinzel W and Rujan P 1990 Europhys. Lett. 13 473
[15] Watkin T L H and Rau A 1992 J. Phys. A: Math. Gen. 25 113
[16] Seung H S, Opper M and Sompolinsky H 1992 contribution to the Vth Annual Workshop on Computational Learning Theory (COLT92) (Pittsburgh 1992) pp 287-294, published by the Association for Computing Machinery, New York
[17] Opper M 1988 Phys. Rev. A 38 3824
[18] Garces R 1993 Scheme to improve the generalization error, in: Proceedings of the 1993 Connectionist Models Summer School, edited by Mozer M, Smolensky P, Touretzky D, Elman J and Weigend A
[19] Watkin T L H and Rau A 1992 Phys. Rev. A 45 4102
[20] Garces R, Kuhlmann P and Eissfeller H 1992 J. Phys. A: Math. Gen. 25 L1335
[21] Kinzel W and Opper M 1991 Dynamics of Learning, in: Physics of Neural Networks, edited by van Hemmen J L, Domany E and Schulten K (Berlin: Springer) p 149
[22] Krogh A and Hertz J 1991 in: Advances in Neural Information Processing Systems III (San Mateo: Morgan Kaufmann)
[23] LeCun Y, Denker J S and Solla S 1990 Optimal Brain Damage, in: Advances in Neural Information Processing Systems II (Denver 1989), edited by Touretzky D, p 598 (San Mateo: Morgan Kaufmann)
[24] Bouten M, Engel A, Komoda A and Serneels R 1990 J. Phys. A: Math. Gen. 23 4643
[25] Kuhlmann P and Müller K R 1994 J. Phys. A: Math. Gen. 27 3759

Figure Captions

Figure 1: Geometry of input space for the perceptron with threshold.

Figure 2: Geometry of input space for the reversed-wedge problem.

Figure 3: Distribution of embedding strengths for the signum-teacher with nonzero threshold. a) Theoretical results for α = 0.4, 0.6, 0.8, 0.9 (upper to lower curves), b) theoretical results (dashed) and simulations (histogram) for α = 13/32, c) same as b) but for α = 28/32.

Figure 4: Relative size α_eff = p_eff/N of the pruned training set versus relative size α = p/N of the original training set for a) the perceptron with threshold (full line θ = 0, dashed line θ = 0.4) and b) the reversed-wedge problem (full line γ = 0, dashed line γ = 1.2).

Figure 5: Generalization error for the modified Adaline algorithm (full lines) compared with regular Adaline (dashed lines). Task: perceptron with threshold. The symbols with errorbars represent results achieved by a perceptron with N = 32 input units, averaged over many draws of different learning sets.

Figure 6: Same as in Figure 5, but for the reversed-wedge problem.

Figure 7: Generalization error for modified Hebb learning (full lines). Task: a perceptron with threshold. The dashed lines show regular Hebb learning using a random selection of p_eff = α_eff N examples. Simulations are done analogously to Figure 5.

Figure 8: Same as in Figure 7, but for the reversed-wedge problem. In most cases the modified Hebb rule achieves better results than the modified Adaline algorithm, except for the realizable problem γ = 0.



More information

Lecture 4: Perceptrons and Multilayer Perceptrons

Lecture 4: Perceptrons and Multilayer Perceptrons Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons

More information

Automatic Differentiation and Neural Networks

Automatic Differentiation and Neural Networks Statistical Machine Learning Notes 7 Automatic Differentiation and Neural Networks Instructor: Justin Domke 1 Introduction The name neural network is sometimes used to refer to many things (e.g. Hopfield

More information
