Slow Dynamics Due to Singularities of Hierarchical Learning Machines
Progress of Theoretical Physics Supplement No. 157

Hyeyoung Park (1), Masato Inoue (2), and Masato Okada (3)

(1) Computer Science Dept., Kyungpook National Univ., Daegu, Korea
(2) Department of Computational Intelligence and Systems Science, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Japan
(3) Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan

Slow dynamics in the learning of neural networks has recently been shown to be closely related to singularities, which exist in the parameter spaces of hierarchical learning models. To show the influence of singular structure on learning dynamics, we take a statistical mechanical approach and investigate on-line learning dynamics under various learning scenarios with different relationships between the optimum and the singularities. From this investigation, we found a quasi-plateau phenomenon that differs from the well-known plateau. The quasi-plateau and plateau become extremely serious when the optimal point is in the neighborhood of a singularity. Both disappear in natural gradient learning, which takes singular structures into account and uses a Riemannian measure for the parameter space.

1. Introduction

Parameter spaces of hierarchical learning models such as multilayer perceptrons have complex singular structures, which are responsible for various nontrivial properties in estimation performance and learning dynamics [1, 3]. Even though many statistical mechanical analyses of slow learning dynamics have been carried out [2, 5, 6], the singular structure and its influence on learning dynamics have not been treated. Therefore, some interesting phenomena caused by singularities, which we discuss in this paper, have not been observed in previous works.
To see the influence of singularities in various situations, it is important to choose the learning model and the learning scenarios carefully, considering the geometrical structure of the parameter space. We use a multilayer perceptron (MLP) with one hidden layer. This is the simplest model that has a typical singular structure in its parameter space. Note that soft-committee machines do not have it. We investigate three types of learning task, classified by the positional relationship between the optimal point and the singularity. We analyze the dynamics of natural gradient learning as well as that of standard gradient learning to see the different properties of the two methods.

Author notes: E-mail: hypark@knu.ac.kr. Also at RIKEN BSI, Wako, Japan. E-mail: inoue@sp.dis.titech.ac.jp. Also at RIKEN BSI, Wako, Japan, and at Intelligent Cooperation and Control, PRESTO, JST, c/o RIKEN BSI, Wako, Japan. E-mail: okada@brain.riken.jp
2. Model with singularity and its learning

We use a simple MLP $M_K$ with $K$ hidden units, defined as

$$\zeta = f_{J,w}(\xi) = \sum_{i=1}^{K} w_i\, g(J_i \cdot \xi).$$

Here, $\xi \in R^N$ is the input vector; $J_i \in R^N$ and $w_i \in R$ are the weight parameters connected to the $i$-th hidden unit; and $N$ is the number of input nodes. We also assume a teacher network $\zeta = f_{B,v}(\xi)$ of the same architecture with $M$ hidden units and parameters $B_n$ and $v_n$.

The space of $M_K$ has a typical hierarchical structure with singularities. Consider the space $S_2$ of MLPs with two hidden units. Each point in $S_2$ is specified by the parameters $J_i$ and $w_i$ ($i = 1, 2$). In this space, the set of all points satisfying $J_1 = J_2 = J_0$ and $w_1 + w_2 = w_0$ represents the same MLP with one hidden unit, specified by $J_0$ and $w_0$. Since all the points on the line $w_1 + w_2 = w_0$ have the same entropy, the Fisher information matrix becomes singular at these points. In addition, if we consider the functional space of $M_K$, all those points shrink into one point, making an intrinsic singularity in the space. These singular subspaces, in which all the points have the same energy level, are ubiquitous in the space of MLPs and may cause unpleasant phenomena in learning dynamics. We can also expect the influence of a singularity to intensify when the optimum is located at or around it.

We investigate two on-line gradient descent learning algorithms: standard gradient learning and natural gradient learning. At each learning step, a new training example $(\xi, \zeta)$ is generated from the teacher network. The student network is trained to decrease the squared error between its output and the teacher output $\zeta$. The update term of standard gradient learning is given by

$$\Delta J_i = -\frac{\eta}{N}\,\delta_i\,\xi, \qquad \Delta w_i = -\frac{\eta}{N}\, g(x_i)\, e, \qquad e = \sum_{j=1}^{K} w_j\, g(x_j) - \sum_{n=1}^{M} v_n\, g(y_n), \qquad (2.1)$$

where $\delta_i = w_i\, g'(x_i)\, e$, $x_i = J_i \cdot \xi$, $y_n = B_n \cdot \xi$, and $\eta$ is the learning rate.
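The student network and one on-line standard gradient step of Eq. (2.1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, the activation is taken to be $g(u) = \mathrm{erf}(u/\sqrt{2})$ (the choice used later in the text), and the minus signs are written explicitly so that a step decreases the squared error.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise erf without extra dependencies


def g(u):
    """Activation g(u) = erf(u / sqrt(2))."""
    return _erf(np.asarray(u, dtype=float) / math.sqrt(2.0))


def g_prime(u):
    """Derivative g'(u) = sqrt(2/pi) * exp(-u^2 / 2)."""
    u = np.asarray(u, dtype=float)
    return math.sqrt(2.0 / math.pi) * np.exp(-0.5 * u ** 2)


def mlp_output(J, w, xi):
    """f_{J,w}(xi) = sum_i w_i g(J_i . xi); J has shape (K, N), w shape (K,)."""
    return float(w @ g(J @ xi))


def online_gradient_step(J, w, B, v, xi, eta):
    """One on-line standard gradient step on a single teacher example."""
    N = xi.shape[0]
    x = J @ xi                      # student hidden fields x_i
    y = B @ xi                      # teacher hidden fields y_n
    e = w @ g(x) - v @ g(y)         # output error: student minus teacher
    delta = w * g_prime(x) * e      # delta_i = w_i g'(x_i) e
    J_new = J - (eta / N) * np.outer(delta, xi)
    w_new = w - (eta / N) * g(x) * e
    return J_new, w_new, float(e)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 5
    J0 = rng.standard_normal(N) / math.sqrt(N)
    xi = rng.standard_normal(N)
    # Points with J_1 = J_2 = J_0 and w_1 + w_2 = w_0 give the same output.
    J_sing = np.stack([J0, J0])
    print(mlp_output(J_sing, np.array([0.3, 0.7]), xi))
    print(mlp_output(J_sing, np.array([0.9, 0.1]), xi))
```

The two printed values coincide, which is the degeneracy along the line $w_1 + w_2 = w_0$ described above.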
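For natural gradient learning, we need the Fisher information matrix $G$ of a stochastic model of the student network and its inverse $G^{-1}$, which we denote in block form as

$$G^{-1} = \begin{bmatrix} G^{ww} & G^{wJ} \\ (G^{wJ})^{T} & G^{JJ} \end{bmatrix} = \begin{bmatrix} \left[G^{w_i w_j}\right]_{i,j=1..K} & \left[G^{w_i J_j}\right]_{i,j=1..K} \\ \left[G^{w_i J_j}\right]^{T}_{i,j=1..K} & \left[G^{J_i J_j}\right]_{i,j=1..K} \end{bmatrix}. \qquad (2.2)$$

Each block of the matrix can be written as

$$G^{w_i J_j} = \phi_{ij}\, U^{T}, \qquad G^{J_i J_j} = \theta_{ij}\, I + U\, \Theta_{ij}\, U^{T}, \qquad U = [J_1, \ldots, J_K],$$

where the scalars $G^{w_i w_j}$ and $\theta_{ij}$, the $1 \times K$ vector $\phi_{ij}$, and the $K \times K$ matrix $\Theta_{ij}$ can be calculated deterministically for given $J$ and $w$ [2]. Using the obtained $G^{-1}$, the update term of natural gradient learning is given by

$$\tilde{\Delta} J_i = \sum_{k=1}^{K} \left[ \left(G^{w_k J_i}\right)^{T} \Delta w_k + G^{J_i J_k}\, \Delta J_k \right], \qquad \tilde{\Delta} w_i = \sum_{k=1}^{K} \left[ G^{w_i w_k}\, \Delta w_k + G^{w_i J_k}\, \Delta J_k \right], \qquad (2.3)$$

where the tilde marks the natural gradient update and $\Delta J_k$, $\Delta w_k$ on the right-hand side are the standard gradient terms of Eq. (2.1).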
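The singularity of the Fisher information matrix on the subspace $J_1 = J_2$ can be checked numerically for a small network. The sketch below is an illustration under stated assumptions, not the analytic block form of Eq. (2.2): it assumes a Gaussian output-noise model with unit variance, so that $G = \langle \nabla f\, \nabla f^{T} \rangle$, estimates the average by Monte Carlo over $\xi \sim N(0, I)$, and uses a small damping term when inverting. All function names and the damping parameter are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def g(u):
    """Activation g(u) = erf(u / sqrt(2))."""
    return _erf(np.asarray(u, dtype=float) / math.sqrt(2.0))


def g_prime(u):
    """Derivative g'(u) = sqrt(2/pi) * exp(-u^2 / 2)."""
    u = np.asarray(u, dtype=float)
    return math.sqrt(2.0 / math.pi) * np.exp(-0.5 * u ** 2)


def output_gradient(J, w, xi):
    """Gradient of f_{J,w}(xi) w.r.t. (w_1..w_K, J_1..J_K), flattened."""
    x = J @ xi
    grad_w = g(x)                              # df/dw_i = g(x_i)
    grad_J = (w * g_prime(x))[:, None] * xi    # df/dJ_i = w_i g'(x_i) xi
    return np.concatenate([grad_w, grad_J.ravel()])


def empirical_fisher(J, w, n_samples=2000, seed=0):
    """Monte Carlo estimate of G = <grad f grad f^T> (assumed noise model)."""
    rng = np.random.default_rng(seed)
    N = J.shape[1]
    grads = np.stack([output_gradient(J, w, rng.standard_normal(N))
                      for _ in range(n_samples)])
    return grads.T @ grads / n_samples


def natural_gradient_direction(G, grad, damping=1e-8):
    """Solve (G + damping * I) d = grad; the damping is an assumption."""
    return np.linalg.solve(G + damping * np.eye(G.shape[0]), grad)


if __name__ == "__main__":
    w = np.array([0.3, 0.7])
    J_regular = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    J_singular = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    for name, J in [("regular", J_regular), ("singular", J_singular)]:
        G = empirical_fisher(J, w)
        print(name, np.linalg.matrix_rank(G, tol=1e-8), "of", G.shape[0])
```

At the point with $J_1 = J_2$ the gradients with respect to the two hidden units are proportional to each other, so the estimated matrix loses rank, whereas at a generic point it is invertible and the natural gradient direction is well defined.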
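3. Statistical mechanical method for analyzing dynamics

Using a statistical mechanical approach [5, 6], we investigate the average dynamics in the thermodynamic limit, i.e., the limit $N \to \infty$. The estimation accuracy of learning is evaluated by the generalization error, defined as

$$E_{gen} = \frac{1}{2} \left\langle \left\{ f_{B,v}(\xi) - f_{J,w}(\xi) \right\}^{2} \right\rangle,$$

where $\langle \cdot \rangle$ denotes the average over the input $\xi$. In the thermodynamic limit, the generalization error can be described using order parameters, defined as $R_{in} \equiv J_i^{T} B_n$, $Q_{ij} \equiv J_i^{T} J_j$, and $T_{nm} \equiv B_n^{T} B_m$. In particular, when $g(u) = \mathrm{erf}(u/\sqrt{2})$, the explicit form of $E_{gen}$ is

$$E_{gen} = \frac{1}{\pi} \left[ \sum_{i,j}^{K} w_i w_j \arcsin \frac{Q_{ij}}{\sqrt{1+Q_{ii}}\,\sqrt{1+Q_{jj}}} + \sum_{m,n}^{M} v_m v_n \arcsin \frac{T_{mn}}{\sqrt{1+T_{mm}}\,\sqrt{1+T_{nn}}} - 2 \sum_{i}^{K} \sum_{n}^{M} w_i v_n \arcsin \frac{R_{in}}{\sqrt{1+Q_{ii}}\,\sqrt{1+T_{nn}}} \right]. \qquad (3.1)$$

From this, we know that the equations of motion of $R_{in}$, $Q_{ij}$, and $w_i$ are sufficient to trace the dynamics of the learning model. In the thermodynamic limit $N \to \infty$, the equations of motion are given by

$$\frac{dR_{in}}{d\alpha} = -\eta \langle \delta_i y_n \rangle, \qquad \frac{dQ_{ij}}{d\alpha} = -\eta \langle \delta_i x_j + \delta_j x_i \rangle + \eta^{2} \langle \delta_i \delta_j \rangle, \qquad \frac{dw_i}{d\alpha} = -\eta \langle g(x_i)\, e \rangle, \qquad (3.2)$$

where $\alpha$ is a continuous time variable. In the case of $g(u) = \mathrm{erf}(u/\sqrt{2})$, the equations of motion can be written in compact form in terms of $Q_{ij}$, $R_{in}$, $T_{nm}$, $w_i$, and $v_n$ [3, 5].

For natural gradient learning, we can apply the same method to obtain the equations of motion:

$$\frac{dR_{in}}{d\alpha} = -\eta \sum_{k=1}^{K} \left[ \theta_{ik} \langle \delta_k y_n \rangle + R_n \left( \phi_{ki}^{T} \langle g_k e \rangle + \Theta_{ik} \langle \delta_k x \rangle \right) \right], \qquad (3.3)$$

$$\frac{dQ_{ij}}{d\alpha} = -\eta \sum_{k=1}^{K} \left[ Q_i \left( \phi_{kj}^{T} \langle g_k e \rangle + \Theta_{jk} \langle \delta_k x \rangle \right) + Q_j \left( \phi_{ki}^{T} \langle g_k e \rangle + \Theta_{ik} \langle \delta_k x \rangle \right) + \theta_{ik} \langle \delta_k x_j \rangle + \theta_{jk} \langle \delta_k x_i \rangle \right] + \eta^{2} \sum_{k,l}^{K} \theta_{ik}\, \theta_{jl} \langle \delta_k \delta_l \rangle, \qquad (3.4)$$

$$\frac{dw_i}{d\alpha} = -\eta \sum_{k=1}^{K} \left[ G^{w_i w_k} \langle g_k e \rangle + \phi_{ik} \langle \delta_k x \rangle \right]. \qquad (3.5)$$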
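As a worked check of the closed-form generalization error for $g(u) = \mathrm{erf}(u/\sqrt{2})$, the arcsine expression can be evaluated directly from the order parameters. This is a minimal sketch; the function name and the example values are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np


def generalization_error(Q, R, T, w, v):
    """Arcsine-form generalization error for g(u) = erf(u / sqrt(2)).

    Q: (K, K) student-student overlaps, R: (K, M) student-teacher overlaps,
    T: (M, M) teacher-teacher overlaps, w: (K,) and v: (M,) output weights.
    """
    q = np.sqrt(1.0 + np.diag(Q))
    t = np.sqrt(1.0 + np.diag(T))
    student = w @ np.arcsin(Q / np.outer(q, q)) @ w
    teacher = v @ np.arcsin(T / np.outer(t, t)) @ v
    cross = w @ np.arcsin(R / np.outer(q, t)) @ v
    return (student + teacher - 2.0 * cross) / np.pi


if __name__ == "__main__":
    I2 = np.eye(2)
    half = np.array([0.5, 0.5])
    # Student identical to the teacher: the error vanishes.
    print(generalization_error(I2, I2, I2, half, half))
    # Student orthogonal to the teacher (R = 0): only the two self terms remain.
    print(generalization_error(I2, np.zeros((2, 2)), I2, half, half))
```

With unit-norm, mutually orthogonal weight vectors and all output weights equal to 0.5, each self term equals $\pi/12$, so the orthogonal case evaluates to $1/6$ and the matched case to zero.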
In Eqs. (3.3)-(3.5), $R_n = [R_{1n}, \ldots, R_{Kn}]$, $Q_i = [Q_{1i}, \ldots, Q_{Ki}]$, $x = [x_1, \ldots, x_K]^{T}$, and $g = g(x)$, so that $g_k = g(x_k)$. This is a generalization of the equations of motion for soft-committee machines [2]. A detailed description will be given in Ref. 4.

4. Results and conclusions

We analyzed the dynamics of standard gradient and natural gradient learning for the case $K = M = 2$. We used three conditions on the teacher parameters to represent three types of learning task: $B_1 \perp B_2$ for the regular case, $B_1 = B_2$ for the singular case, and $B_1 \cdot B_2 / (\|B_1\| \|B_2\|) = 0.9$ for the near-singular case. In all cases, we set $\|B_i\| = 1$ and $v_i = 0.5$ ($i = 1, 2$). The initial condition of learning is specified by the matrices $Q = [Q_{ij}]_{i,j=1,2}$ and $R = [R_{in}]_{i,n=1,2}$ and the vector $w$ in Eq. (4.1), which involves a small parameter $\varepsilon$.

[Eq. (4.1): the numerical entries of $Q$, $R$, and $w$ are not recoverable from the transcription.]

For $\varepsilon$, we tried three different values, including 0.02 and 0.04.

[Fig. 1. Evolution of generalization error in standard gradient learning. Annotated regions: plateau, quasi-plateau.]

[Fig. 2. Evolution of generalization error in natural gradient learning.]

The time evolution of the generalization error in standard gradient learning for the three types of learning task is shown in Fig. 1. For the regular case (a), we can see the well-known plateau caused by the permutation symmetry. Note that the permutation symmetry satisfies the singularity condition discussed in Section 2. For the singular case (b), in which symmetry breaking is not necessary, we can still see a different type of slow dynamics. We call it a quasi-plateau [3]. The quasi-plateau is caused by the singular subspace $w_1 + w_2 = 1$ in this experiment, which does not exist in soft-committee machines. This is the reason why conventional studies using soft-committee machines did not observe the quasi-plateau.

Another important and interesting phenomenon is shown in the near-singular case (c). In the near-singular case, we can see both the plateau and the quasi-plateau, which makes learning extremely slow. Since the near-singular case occurs frequently in practical applications, this phenomenon is of considerable practical importance. Moreover, the slow convergence cannot be avoided by changing the initial value $\varepsilon$ in the near-singular case. On the other hand, neither the plateau nor the quasi-plateau appears in natural gradient learning. In addition, natural gradient learning hardly depends on the initial condition (Fig. 2).

By taking a geometrical viewpoint and a statistical mechanical approach to learning dynamics, we found the existence of the quasi-plateau and the severity of slow dynamics in the near-singular case, which is interesting in both a theoretical and a practical sense. The mechanism of the slow dynamics in standard gradient learning and its resolution by natural gradient learning will be discussed, with a detailed explanation of the properties of the singular structure, in Ref. 4.

References

[1] S. Amari, T. Ozeki and H. Park, Sys. and Comm. in Jpn.
[2] M. Inoue, H. Park and M. Okada, J. Phys. Soc. Jpn.
[3] H. Park, M. Inoue and M. Okada, J. of Phys. A.
[4] H. Park, M. Inoue and M. Okada, in preparation.
[5] P. Riegler and M. Biehl, J. of Phys. A, L.
[6] D. Saad and A. Solla, Phys. Rev. E, 4225.
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationVasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks
C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationMultilayer Neural Networks
Multilayer Neural Networks Introduction Goal: Classify objects by learning nonlinearity There are many problems for which linear discriminants are insufficient for minimum error In previous methods, the
More informationOptimization and Gradient Descent
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function
More informationA Gradient-Based Algorithm Competitive with Variational Bayesian EM for Mixture of Gaussians
A Gradient-Based Algorithm Competitive with Variational Bayesian EM for Mixture of Gaussians Miael Kuusela, Tapani Raio, Antti Honela, and Juha Karhunen Abstract While variational Bayesian (VB) inference
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationNeural Networks. Mark van Rossum. January 15, School of Informatics, University of Edinburgh 1 / 28
1 / 28 Neural Networks Mark van Rossum School of Informatics, University of Edinburgh January 15, 2018 2 / 28 Goals: Understand how (recurrent) networks behave Find a way to teach networks to do a certain
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationComputing Neural Network Gradients
Computing Neural Network Gradients Kevin Clark 1 Introduction The purpose of these notes is to demonstrate how to quickly compute neural network gradients in a completely vectorized way. It is complementary
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationOn the saddle point problem for non-convex optimization
On the saddle point problem for non-convex optimization Razvan Pascanu Université de Montréal r.pascanu@gmail.com Surya Ganguli Stanford University sganguli@standford.edu Yann N. Dauphin Université de
More informationIntroduction to Machine Learning Spring 2018 Note Neural Networks
CS 189 Introduction to Machine Learning Spring 2018 Note 14 1 Neural Networks Neural networks are a class of compositional function approximators. They come in a variety of shapes and sizes. In this class,
More information