Chapter 15. Dynamically Driven Recurrent Networks


Neural Networks and Learning Machines (Haykin)
Lecture Notes on Self-learning Neural Algorithms
Byoung-Tak Zhang
School of Computer Science and Engineering, Seoul National University
Version 20171120

Contents
15.1 Introduction
15.2 Recurrent Network Architectures
15.3 Universal Approximation Theorem
15.5 Computational Power of Recurrent Networks
15.6 Learning Algorithms
15.7 Back Propagation Through Time
15.8 Real-Time Recurrent Learning
15.9 Vanishing Gradients in Recurrent Networks
15.10 Supervised Training Framework for Recurrent Networks
15.11 Computer Experiment: Mackey-Glass Attractor
15.12 Adaptivity Considerations
15.13 Case Study: Model Reference Applied to Neurocontrol
Summary and Discussion

15.1 Introduction

Global feedback is a facilitator of computational intelligence. In previous chapters we studied how the use of global feedback in a recurrent network makes it possible to achieve some useful tasks:
o Content-addressable memory
o Autoassociation
o Dynamic reconstruction of a chaotic process

In this chapter, we study other important applications of recurrent networks:
o Input-output mapping, the study of which naturally benefits from Chapter 14 on sequential state estimation
o Applying feedback from the output layer to the input of the hidden layer
o Combining all possible feedback loops in a single recurrent network structure
o Other configurations as building blocks for the construction of recurrent networks

Recurrent networks have a very rich repertoire of architectural layouts, which makes them all the more powerful in computational terms. A recurrent network responds temporally to an externally applied input signal; we may therefore speak of the recurrent networks considered in this chapter as dynamically driven recurrent networks.

15.2 Recurrent Network Architectures (1/8)

Four specific network architectures:
1) Input-Output Recurrent Model
2) State-Space Model
3) Recurrent Multilayer Perceptrons
4) Second-Order Network

They all incorporate a static multilayer perceptron, or parts thereof, and they all exploit the nonlinear mapping capability of the multilayer perceptron.

15.2 Recurrent Network Architectures (2/8)

1) Input-Output Recurrent Model
1. The model has a single input that is applied to a tapped-delay-line memory of q units.
2. It has a single output that is fed back to the input via another tapped-delay-line memory, also of q units.
3. The present value of the model input is denoted by $u_n$, and the corresponding value of the model output is denoted by $y_{n+1}$.
4. The dynamic behavior of the nonlinear autoregressive with exogenous inputs (NARX) model is described by
$y_{n+1} = F(y_n, \ldots, y_{n-q+1};\, u_n, \ldots, u_{n-q+1})$
where F is a nonlinear function of its arguments.

Figure 15.1 Nonlinear autoregressive with exogenous inputs (NARX) model; the feedback part of the network is shown in blue.
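To make the NARX recursion concrete, here is a minimal Python/NumPy sketch (not taken from the book): the nonlinear function F is assumed, for illustration only, to be a one-hidden-layer MLP with arbitrary random weights, and the two tapped-delay lines are plain NumPy arrays.

```python
import numpy as np

def narx_step(y_hist, u_hist, W_in, b_in, w_out, b_out):
    """One step of a NARX model: y_{n+1} = F(y_n,...,y_{n-q+1}; u_n,...,u_{n-q+1}).
    F is realized here as a one-hidden-layer MLP; y_hist and u_hist each hold the
    q most recent values stored in the two tapped-delay-line memories."""
    z = np.concatenate([y_hist, u_hist])      # regressor vector of length 2q
    h = np.tanh(W_in @ z + b_in)              # hidden layer of the static MLP
    return float(w_out @ h + b_out)           # scalar prediction y_{n+1}

# Usage with q = 3 delays and random (untrained) weights, run in closed loop.
rng = np.random.default_rng(0)
q, hidden = 3, 8
W_in, b_in = rng.normal(size=(hidden, 2 * q)), np.zeros(hidden)
w_out, b_out = rng.normal(size=hidden), 0.0

y_hist, u_hist = np.zeros(q), np.zeros(q)
for n in range(20):
    u_hist = np.roll(u_hist, 1); u_hist[0] = np.sin(0.2 * n)      # exogenous input u_n
    y_next = narx_step(y_hist, u_hist, W_in, b_in, w_out, b_out)
    y_hist = np.roll(y_hist, 1); y_hist[0] = y_next               # output fed back
```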

15.2 Recurrent Network Architectures (3/8)

2) State-Space Model

Figure 15.2 State-space model; the feedback part of the model is shown in blue.
Figure 15.3 Simple recurrent network (SRN); the feedback part of the network is shown in blue.

15.2 Recurrent Network Architectures (4/8)

2) State-Space Model
1. A state-space model, the basic idea of which was discussed in Chapter 14.
2. The output is fed back to the input layer via a bank of unit-time delays.
3. The input layer consists of a concatenation of feedback nodes and source nodes.
$x_{n+1} = a(x_n, u_n)$
$y_n = B x_n$
4. Elman's network (Fig. 15.3) contains recurrent connections from the hidden neurons to a layer of context units consisting of unit-time delays. These context units store the outputs of the hidden neurons for one time-step and then feed them back to the input layer.
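The state-space recursion can be written in a few lines. In the sketch below, the nonlinearity a(·,·) is assumed to be a single tanh layer acting on the concatenated feedback and source nodes; all sizes and weights are illustrative, not from the text.

```python
import numpy as np

def state_space_step(x, u, W_a, W_b, B):
    """State-space model: x_{n+1} = a(x_n, u_n), y_n = B x_n,
    with a(.,.) chosen here as a tanh layer over the feedback and source nodes."""
    y = B @ x                                  # linear read-out from the current state
    x_next = np.tanh(W_a @ x + W_b @ u)        # next state from concatenated inputs
    return x_next, y

q, m, p = 4, 2, 1                              # state, input, and output dimensions
rng = np.random.default_rng(1)
W_a, W_b = 0.5 * rng.normal(size=(q, q)), rng.normal(size=(q, m))
B = rng.normal(size=(p, q))

x = np.zeros(q)
for n in range(10):
    u = np.array([np.sin(0.3 * n), 1.0])       # second component acts as a bias input
    x, y = state_space_step(x, u, W_a, W_b, B)
```

An Elman (SRN) network fits the same template: the context units correspond to x_n, and the recurrence runs from the hidden layer back to the input layer.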

15.2 Recurrent Network Architectures (5/8)

3) Recurrent Multilayer Perceptrons

Figure 15.4 Recurrent multilayer perceptron; feedback paths in the network are printed in blue.

15.2 Recurrent Network Architectures (6/8)

3) Recurrent Multilayer Perceptrons
1. A recurrent multilayer perceptron (RMLP) has one or more hidden layers, basically for the same reasons that static multilayer perceptrons are often more effective and parsimonious than those using a single hidden layer.
2. Each computation layer of an RMLP has feedback around it, as illustrated in Fig. 15.4 for the case of an RMLP with two hidden layers:
$x_{I,n+1} = \phi_I(x_{I,n}, u_n)$
$x_{II,n+1} = \phi_{II}(x_{II,n}, x_{I,n+1})$
$\vdots$
$x_{o,n+1} = \phi_o(x_{o,n}, x_{K,n+1})$
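The layer-by-layer feedback of the RMLP can be sketched as follows; the choice of tanh layers and the weight shapes are assumptions made only for illustration.

```python
import numpy as np

def rmlp_step(states, u, layers):
    """One time-step of an RMLP: every computation layer (I, II, ..., output)
    feeds its own previous output back to its input and passes its new output
    forward to the next layer, matching the recursion on this slide."""
    new_states, forward = [], u
    for x_prev, (W_rec, W_in) in zip(states, layers):
        x_new = np.tanh(W_rec @ x_prev + W_in @ forward)   # phi(own feedback, layer input)
        new_states.append(x_new)
        forward = x_new                                    # input to the next layer
    return new_states

rng = np.random.default_rng(2)
sizes, m = [5, 4, 1], 2                        # two hidden layers plus an output layer
layers, prev = [], m
for s in sizes:
    layers.append((0.3 * rng.normal(size=(s, s)), rng.normal(size=(s, prev))))
    prev = s

states = [np.zeros(s) for s in sizes]
for n in range(10):
    states = rmlp_step(states, np.array([np.sin(0.1 * n), 1.0]), layers)
```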

15.2 Recurrent Network Architectures (7/8)

4) Second-Order Network

Figure 15.5 Second-order recurrent network; bias connections to the neurons are omitted to simplify the presentation. The network has 2 inputs and 3 state neurons, hence the need for 3 x 2 = 6 multipliers. The feedback links in the figure are printed in blue to emphasize their global role.

15.2 Recurrent Network Architectures (8/8)

4) Second-Order Network

First-order neuron:
$v_k = \sum_j w_{a,kj} x_j + \sum_i w_{b,ki} u_i$

Second-order neuron:
$v_k = \sum_i \sum_j w_{kij} x_i u_j$

Second-order recurrent network:
$v_{k,n} = b_k + \sum_i \sum_j w_{kij} x_{i,n} u_{j,n}$
$x_{k,n+1} = \varphi(v_{k,n}) = \dfrac{1}{1 + \exp(-v_{k,n})}$

State transition: $\delta(x_i, u_j) = x_k$
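A second-order recurrent update is just a bilinear form per neuron. The sketch below mirrors the equations above for the 3-state, 2-input network of Fig. 15.5, with arbitrary weights.

```python
import numpy as np

def second_order_step(x, u, W, b):
    """Second-order recurrent network:
    v_{k,n} = b_k + sum_i sum_j w_{kij} x_{i,n} u_{j,n};  x_{k,n+1} = 1/(1+exp(-v_{k,n}))."""
    v = b + np.einsum('kij,i,j->k', W, x, u)   # one multiplier per (state, input) pair
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(3)
K, M = 3, 2                                    # 3 state neurons x 2 inputs = 6 multipliers
W, b = rng.normal(size=(K, K, M)), np.zeros(K)

x = rng.uniform(size=K)                        # initial state in (0, 1)
u = np.array([1.0, 0.0])                       # one-hot encoding of the current input symbol
x = second_order_step(x, u, W, b)              # realizes the transition delta(x_i, u_j) = x_k
```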

15.3 Universal Approximation Theorem (1/2)

Any nonlinear dynamic system may be approximated by a recurrent neural network to any desired degree of accuracy and with no restrictions imposed on the compactness of the state space, provided that the network is equipped with an adequate number of hidden neurons.

$x_{n+1} = \phi(W_a x_n + W_b u_n)$
$y_n = W_c x_n$

$\phi: \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_q \end{bmatrix} \mapsto \begin{bmatrix} \varphi(x_1) \\ \varphi(x_2) \\ \vdots \\ \varphi(x_q) \end{bmatrix}$

with, for example,
$\varphi(x) = \tanh(x) = \dfrac{1 - e^{-2x}}{1 + e^{-2x}}$ or $\varphi(x) = \dfrac{1}{1 + e^{-x}}$

15.3 Universal Approximation Theorem (2/2)

Example 1: Fully Connected Recurrent Network
m = 2, q = 3, and p = 1

$W_a = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix}, \quad W_b = \begin{bmatrix} b_1 & w_{14} & w_{15} \\ b_2 & w_{24} & w_{25} \\ b_3 & w_{34} & w_{35} \end{bmatrix}, \quad W_c = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$

Figure 15.6 Fully connected recurrent network with two inputs, two hidden neurons, and one output neuron. The feedback connections are shown in blue to emphasize their global role.
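The example's weight layout can be checked with a short script. Here the bias is assumed (as in W_b above) to enter through the first column, so the input vector is [+1, u_1, u_2]^T; the numerical weight values are arbitrary.

```python
import numpy as np

q, m, p = 3, 2, 1                          # three state neurons, two inputs, one output
rng = np.random.default_rng(4)
W_a = 0.5 * rng.normal(size=(q, q))        # state-to-state weights w_11 ... w_33
W_b = rng.normal(size=(q, m + 1))          # first column carries the biases b_1, b_2, b_3
W_c = np.array([[1.0, 0.0, 0.0]])          # the output is the first state neuron

x = np.zeros(q)
for n in range(5):
    u_aug = np.array([1.0, np.sin(0.2 * n), np.cos(0.2 * n)])   # [+1, u_1, u_2]
    x = np.tanh(W_a @ x + W_b @ u_aug)     # x_{n+1} = phi(W_a x_n + W_b u_n)
    y = W_c @ x                            # y_n = W_c x_n
```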

15.5 Computational Power of Recurrent Networks (1/3)

"Every finite-state machine is equivalent to, and can be simulated by, some neural net. That is, given any finite-state machine M, we can build a certain neural net N_M which, regarded as a black-box machine, will behave precisely like M!"

Theorem I (Siegelmann and Sontag, 1991). All Turing machines may be simulated by fully connected recurrent networks built on neurons with sigmoidal activation functions.

Three functional blocks of a Turing machine:
o a control unit, which can assume any one of a finite number of possible states;
o a linear tape, assumed to be infinitely long in both directions, which is marked off into discrete squares, where each square is available to store a single symbol taken from a finite set of symbols;
o a read-write head, which moves along the tape and transmits information to and from the control unit.

Figure 15.7 Turing machine.

15.5 Computational Power of Recurrent Networks (2/3)

Figure 15.8 Illustration of Theorems I and II, and the corollary to them.

15.5 Computational Power of Recurrent Networks (3/3)

Theorem II (Siegelmann et al., 1997). NARX networks with one layer of hidden neurons with bounded, one-sided saturated (BOSS) activation functions and a linear output neuron can simulate fully connected recurrent networks with bounded, one-sided saturated activation functions, except for a linear slowdown.

Three conditions on the activation function:
1. The function $\varphi(\cdot)$ has a bounded range; that is, $a \le \varphi(x) \le b$, $a \ne b$, for all $x \in \mathbb{R}$.
2. The function $\varphi(\cdot)$ is saturated on the left side; that is, there exist values s and S such that $\varphi(x) = S$ for all $x \le s$.
3. The function $\varphi(\cdot)$ is nonconstant; that is, $\varphi(x_1) \ne \varphi(x_2)$ for some $x_1$ and $x_2$.

Example of a BOSS function:
$\varphi(x) = \begin{cases} \dfrac{1}{1 + \exp(-x)} & \text{for } x > s \\ 0 & \text{for } x \le s \end{cases}$
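The example BOSS function above is easy to implement and check numerically; the threshold s below is an arbitrary choice.

```python
import numpy as np

def boss(x, s=-3.0):
    """Bounded, One-sided Saturated (BOSS) activation from the slide:
    logistic for x > s, exactly zero (left saturation) for x <= s."""
    x = np.asarray(x, dtype=float)
    return np.where(x > s, 1.0 / (1.0 + np.exp(-x)), 0.0)

print(boss(np.linspace(-10, 10, 9)))   # bounded in [0, 1), constant 0 left of s, nonconstant
```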

15.6 Learning Algorithms

Two modes of training a recurrent network:
1. Epochwise training. For a given epoch, the recurrent network uses a temporal sequence of input-target response pairs and starts running from some initial state until it reaches a new state, at which point the training is stopped and the network is reset to an initial state for the next epoch.
2. Continuous training. This second method of training is suitable for situations where there are no reset states available or on-line learning is required. The distinguishing feature of continuous training is that the network learns while performing signal processing. Simply put, the learning process never stops.

Two different learning algorithms:
1. Back-propagation through time (BPTT), Section 15.7 - epochwise, continuous, or combined.
2. Real-time recurrent learning (RTRL), Section 15.8 - derived from the state-space model.

15.7 Back Propagation Through Time (1/3)

The back-propagation-through-time (BPTT) algorithm for training a recurrent network is an extension of the standard back-propagation algorithm. It may be derived by unfolding the temporal operation of the network into a layered feedforward network, the topology of which grows by one layer at every time-step.

Figure 15.9 (a) Architectural graph of a two-neuron recurrent network N. (b) Signal-flow graph of the network N unfolded in time.

15.7 Back Propagation Through Time (2/3)

Epochwise Back Propagation Through Time

$E_{\text{total}} = \dfrac{1}{2} \sum_{n=n_0}^{n_1} \sum_{j \in \mathcal{A}} e_{j,n}^2$

$\delta_{j,n} = -\dfrac{\partial E_{\text{total}}}{\partial v_{j,n}} = \begin{cases} \varphi'(v_{j,n})\, e_{j,n} & \text{for } n = n_1 \\ \varphi'(v_{j,n}) \left[ e_{j,n} + \sum_{k \in \mathcal{A}} w_{jk}\, \delta_{k,n+1} \right] & \text{for } n_0 < n < n_1 \end{cases}$

$\Delta w_{ji} = -\eta \dfrac{\partial E_{\text{total}}}{\partial w_{ji}} = \eta \sum_{n=n_0+1}^{n_1} \delta_{j,n}\, x_{i,n-1}$

15.7 Back Propagation Through Time (3/3)

Truncated Back Propagation Through Time

$E_l = \dfrac{1}{2} \sum_{j \in \mathcal{A}} e_{j,l}^2$

$\delta_{j,l} = -\dfrac{\partial E_l}{\partial v_{j,l}} \quad \text{for all } j \in \mathcal{A} \text{ and } n - h < l \le n$

$\delta_{j,l} = \begin{cases} \varphi'(v_{j,l})\, e_{j,l} & \text{for } l = n \\ \varphi'(v_{j,l}) \sum_{k \in \mathcal{A}} w_{jk,l}\, \delta_{k,l+1} & \text{for } n - h < l < n \end{cases}$

$\Delta w_{ji,n} = \eta \sum_{l=n-h+1}^{n} \delta_{j,l}\, x_{i,l-1}$

The Ordered Derivative Approach: if $a = \varphi(b, c)$, then $F^-_b = \dfrac{\partial \varphi}{\partial b} F^-_a$ and $F^-_c = \dfrac{\partial \varphi}{\partial c} F^-_a$.
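The sketch below turns the truncated-BPTT(h) equations into code for a simple fully connected network x_{l+1} = tanh(W_a x_l + W_b [1; u_l]); error is injected only at the newest step, as in the delta recursion above. The network sizes, targets, and learning rate are assumptions for illustration.

```python
import numpy as np

def tbptt_update(W_a, W_b, xi_hist, v_hist, e_vec, eta):
    """One truncated-BPTT(h) update. xi_hist[l] = (x_l, u_aug_l) and v_hist[l] are the
    stored inputs/potentials over the last h steps (oldest first); e_vec is the error
    injected at the newest step only."""
    dW_a, dW_b = np.zeros_like(W_a), np.zeros_like(W_b)
    delta = (1.0 - np.tanh(v_hist[-1]) ** 2) * e_vec            # delta at l = n
    for l in range(len(v_hist) - 1, -1, -1):
        if l < len(v_hist) - 1:                                 # n - h < l < n
            delta = (1.0 - np.tanh(v_hist[l]) ** 2) * (W_a.T @ delta)
        x_l, u_l = xi_hist[l]
        dW_a += np.outer(delta, x_l)                            # delta_{j,l} times the state that fed v_l
        dW_b += np.outer(delta, u_l)
    return W_a + eta * dW_a, W_b + eta * dW_b

# Driver: run forward keeping an h-step history, update the weights at every step.
rng = np.random.default_rng(5)
q, m, h, eta = 3, 1, 4, 0.05
W_a, W_b = 0.3 * rng.normal(size=(q, q)), rng.normal(size=(q, m + 1))

x, xi_hist, v_hist = np.zeros(q), [], []
for n in range(50):
    u_aug = np.array([1.0, np.sin(0.2 * n)])                    # bias + scalar input
    v = W_a @ x + W_b @ u_aug
    xi_hist.append((x.copy(), u_aug)); v_hist.append(v)
    xi_hist, v_hist = xi_hist[-h:], v_hist[-h:]                 # keep only h time-steps
    x = np.tanh(v)
    e_vec = np.zeros(q); e_vec[0] = np.sin(0.2 * (n + 1)) - x[0]  # error at the output neuron
    W_a, W_b = tbptt_update(W_a, W_b, xi_hist, v_hist, e_vec, eta)
```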

15.8 Real-Time Recurrent Learning (1/5)

Real-time recurrent learning (RTRL): adjustments are made to the synaptic weights of a fully connected recurrent network in real time, that is, while the network continues to perform its signal-processing function.

$x_{n+1} = \begin{bmatrix} \varphi(w_1^T \xi_n) \\ \vdots \\ \varphi(w_j^T \xi_n) \\ \vdots \\ \varphi(w_q^T \xi_n) \end{bmatrix}, \quad \xi_n = \begin{bmatrix} x_n \\ u_n \end{bmatrix}, \quad w_j = \begin{bmatrix} w_{a,j} \\ w_{b,j} \end{bmatrix}, \quad j = 1, 2, \ldots, q$

Figure 15.10 Fully connected recurrent network for formulation of the RTRL algorithm; the feedback connections are all shown in blue.

15.8 Real-Time Recurrent Learning (2/5)

$\Lambda_{j,n} = \dfrac{\partial x_n}{\partial w_j}, \quad j = 1, 2, \ldots, q$

$U_{j,n} = \begin{bmatrix} 0 \\ \xi_n^T \\ 0 \end{bmatrix} \leftarrow j\text{th row}, \quad j = 1, 2, \ldots, q$

$\Phi_n = \mathrm{diag}\big(\varphi'(w_1^T \xi_n), \ldots, \varphi'(w_j^T \xi_n), \ldots, \varphi'(w_q^T \xi_n)\big)$

$\Lambda_{j,n+1} = \Phi_n \big(W_{a,n} \Lambda_{j,n} + U_{j,n}\big), \quad j = 1, 2, \ldots, q$

15.8 Real-Time Recurrent Learning (3/5)

$e_n = d_n - y_n = d_n - W_c x_n$

$E_n = \dfrac{1}{2} e_n^T e_n$

$\dfrac{\partial E_n}{\partial w_j} = \left( \dfrac{\partial e_n}{\partial w_j} \right) e_n = -W_c \left( \dfrac{\partial x_n}{\partial w_j} \right) e_n = -W_c \Lambda_{j,n} e_n, \quad j = 1, 2, \ldots, q$

$\Delta w_{j,n} = -\eta \dfrac{\partial E_n}{\partial w_j} = \eta\, W_c \Lambda_{j,n} e_n, \quad j = 1, 2, \ldots, q$

$\Lambda_{j,0} = 0 \text{ for all } j$
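A direct transcription of the RTRL recursion is sketched below; the network follows the formulation of Fig. 15.10 with a bias appended to the vector ξ_n, and the sizes, targets, and learning rate are illustrative assumptions. The update writes the slide's η W_c Λ_{j,n} e_n explicitly as a column gradient for w_j.

```python
import numpy as np

def rtrl_step(W, W_c, Lambdas, x, u, d, eta):
    """One step of real-time recurrent learning for x_{n+1} = phi(W xi_n),
    xi_n = [x_n; 1; u_n], y_n = W_c x_n.  W stacks the rows w_j^T (j = 1..q) and
    Lambdas[j] holds the sensitivity matrix Lambda_{j,n} = dx_n/dw_j."""
    q = W.shape[0]
    xi = np.concatenate([x, [1.0], u])              # xi_n = [x_n; bias; u_n]
    v = W @ xi
    e = d - W_c @ x                                 # e_n = d_n - W_c x_n
    phi_prime = 1.0 - np.tanh(v) ** 2               # diagonal entries of Phi_n
    W_a = W[:, :q]                                  # recurrent part of the weights
    new_Lambdas, dW = [], np.zeros_like(W)
    for j in range(q):
        dW[j] = eta * (W_c @ Lambdas[j]).T @ e      # Delta w_{j,n} = eta W_c Lambda_{j,n} e_n
        U_j = np.zeros_like(Lambdas[j]); U_j[j] = xi            # xi_n^T in the j-th row
        new_Lambdas.append(phi_prime[:, None] * (W_a @ Lambdas[j] + U_j))
    return W + dW, new_Lambdas, np.tanh(v), e

# Driver: q state neurons, one input, one output read from the first neuron.
rng = np.random.default_rng(6)
q, m, eta = 3, 1, 0.05
W = 0.3 * rng.normal(size=(q, q + 1 + m))
W_c = np.array([[1.0, 0.0, 0.0]])
Lambdas = [np.zeros((q, q + 1 + m)) for _ in range(q)]          # Lambda_{j,0} = 0

x = np.zeros(q)
for n in range(200):
    u = np.array([np.sin(0.2 * n)])
    d = np.array([np.sin(0.2 * (n + 1))])                       # one-step-ahead target
    W, Lambdas, x, e = rtrl_step(W, W_c, Lambdas, x, u, d, eta)
```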

15.8 Real-Time Recurrent Learning (4/5)

15.8 Real-Time Recurrent Learning (5/5)

Teacher forcing, or the equation-error method, involves replacing the actual output of a neuron, during training of the network, with the corresponding desired response (i.e., target signal) in subsequent computation of the dynamic behavior of the network, whenever that desired response is available. It provides faster training and acts as a corrective mechanism.

$x_{j,n+1} = \varphi(v_{j,n}) = \tanh(v_{j,n})$

$\varphi'(v_{j,n}) = \dfrac{\partial \varphi(v_{j,n})}{\partial v_{j,n}} = \mathrm{sech}^2(v_{j,n}) = 1 - x_{j,n+1}^2$

Figure 15.11 Sensitivity graph of the fully recurrent network of Fig. 15.6. Note: the three nodes labeled $\xi_{l,n}$ are all to be viewed as a single input.
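In code, teacher forcing is a one-line change to the state update: whenever the desired response is available, it replaces the fed-back output before the next state is computed. This sketch reuses the state-update form of the earlier examples; the variable names are assumptions.

```python
import numpy as np

def step_with_teacher_forcing(W_a, W_b, x, u_aug, d=None, out_idx=0):
    """One state update with optional teacher forcing (equation-error method):
    if the desired response d is available, it replaces the fed-back output neuron."""
    x_fb = x.copy()
    if d is not None:
        x_fb[out_idx] = d                 # feed back the target instead of the actual output
    return np.tanh(W_a @ x_fb + W_b @ u_aug)
```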

15.9 Vanishing Gradients in Recurrent Networks (1/2)

The vanishing-gradients problem arises in the training of the network to produce a desired response at the current time that depends on input data in the distant past.

Robust information latching in a recurrent network is accomplished if the states of the network are contained in the reduced attracting set of a hyperbolic attractor.

Figure 15.12 Illustration of the vanishing-gradient problem: (a) state $x_n$ resides in the basin of attraction $\beta$, but outside the reduced attracting set $\gamma$; (b) state $x_n$ resides inside the reduced attracting set $\gamma$.

15.9 Vanishing Gradients in Recurrent Networks (2/2)

Long-Term Dependencies

$E_{\text{total}} = \dfrac{1}{2} \sum_i (d_{i,n} - y_{i,n})^2$

$\Delta w_n = -\eta \dfrac{\partial E_{\text{total}}}{\partial w} = \eta \sum_i (d_{i,n} - y_{i,n}) \dfrac{\partial y_{i,n}}{\partial w}$

$\dfrac{\partial y_{i,n}}{\partial w} = \dfrac{\partial y_{i,n}}{\partial x_{i,n}} \dfrac{\partial x_{i,n}}{\partial w}, \qquad x_{i,n+1} = \varphi_i(x_{i,n}, u_n)$

$\dfrac{\partial x_{i,n}}{\partial x_{i,k}} = J_{x,(n,k)}$

$\Delta w_n = \eta \sum_i (d_{i,n} - y_{i,n}) \dfrac{\partial y_{i,n}}{\partial x_{i,n}} \left( \sum_{k=1}^{n} \dfrac{\partial x_{i,n}}{\partial x_{i,k}} \dfrac{\partial x_{i,k}}{\partial w_k} \right)$

$\det(J_{x,(n,k)}) \to 0 \text{ as } k \to \infty \text{ for all } n$

The network is not robust to the presence of noise in the input signal, or else the network is unable to discover long-term dependencies (i.e., relationships between target outputs and inputs that occur in the distant past).
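The vanishing-gradient effect can be observed directly by accumulating the state Jacobian of a contractive tanh network over many time-steps; the weights below are arbitrary and scaled to keep the dynamics stable.

```python
import numpy as np

rng = np.random.default_rng(7)
q = 5
W_a = 0.4 * rng.normal(size=(q, q))         # contractive recurrent weights (illustrative)

x = rng.normal(size=q)
J = np.eye(q)                               # Jacobian of the current state w.r.t. the state k steps back
for step in range(1, 31):
    v = W_a @ x
    J = np.diag(1.0 - np.tanh(v) ** 2) @ W_a @ J   # chain rule through one time-step
    x = np.tanh(v)
    if step % 10 == 0:
        print(step, np.linalg.norm(J), np.linalg.det(J))
# Both the norm and the determinant of J shrink roughly geometrically with the number
# of steps, so the gradient contribution from inputs in the distant past vanishes.
```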

15.10 Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators (1/4)

Figure 15.13 Nonlinear state-space model depicting the underlying dynamics of a recurrent network undergoing supervised training.

$w_{n+1} = w_n + \omega_n$
$d_n = b(w_n, v_n, u_n) + \nu_n$

15.10 Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators (2/4)

Description of the Supervised-Training Framework Using the Extended Kalman Filter

The recurrent neural network, undergoing training, performs the role of the predictor; the extended Kalman filter, providing the supervision, performs the role of the corrector.

15.10 Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators (3/4)

Description of the Supervised-Training Framework Using the Extended Kalman Filter

$\alpha_n = d_n - b(\hat{w}_{n|n-1}, v_n, u_n)$

$\hat{w}_{n|n} = \hat{w}_{n|n-1} + G_n \alpha_n$

$\hat{w}_{n|n} = \hat{w}_{n|n-1} + G_n (d_n - y_n)$
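A generic EKF corrector for the weight vector looks as follows. This is a sketch, not the book's full algorithm: the measurement Jacobian B (the derivative of the network output with respect to the weights, obtained by linearizing the network with BPTT or RTRL) is assumed to be supplied, and Q and R are design parameters.

```python
import numpy as np

def ekf_weight_update(w_hat, P, d, y, B, Q, R):
    """One EKF corrector step for supervised training of a recurrent network.
    w_hat : predicted weight estimate w_{n|n-1}
    P     : prediction-error covariance of the weights
    d, y  : desired and actual network outputs at time n
    B     : measurement Jacobian dy/dw from linearizing the network (BPTT/RTRL)
    Q, R  : process- and measurement-noise covariances."""
    P_pred = P + Q                                  # random-walk model w_{n+1} = w_n + omega_n
    S = B @ P_pred @ B.T + R                        # innovation covariance
    G = P_pred @ B.T @ np.linalg.inv(S)             # Kalman gain G_n
    alpha = d - y                                   # innovation alpha_n = d_n - y_n
    w_new = w_hat + G @ alpha                       # w_{n|n} = w_{n|n-1} + G_n alpha_n
    P_new = (np.eye(len(w_hat)) - G @ B) @ P_pred   # updated filtering-error covariance
    return w_new, P_new
```

The decoupled EKF of the next slide applies this same update independently to each disjoint group of weights, each with its own smaller covariance block.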

15.10 Supervised Training Framework for Recurrent Networks Using Nonlinear Sequential State Estimators (4/4)

Decoupled Extended Kalman Filter

Figure 15.15 Block-diagonal representation of the filtering-error covariance matrix pertaining to the decoupled extended Kalman filter (DEKF). The shaded parts of the square represent the nonzero blocks of the covariance matrix, indexed by i = 1, 2, 3, 4 for the example illustrated in the figure.

As we make the number of disjoint weight groups, g, larger, more zeros are created in the covariance matrix $P_{n|n}$; in other words, the matrix $P_{n|n}$ becomes more sparse. The computational burden is therefore reduced, but the numerical accuracy of the state estimation becomes degraded.

15.11 Computer Experiment: Dynamic Reconstruction of the Mackey-Glass Attractor

Figure 15.16 Ensemble-averaged cumulative absolute error curves during the autonomous prediction phase of dynamic reconstruction of the Mackey-Glass attractor.

$\dfrac{dx_t}{dt} = -b x_t + \dfrac{a x_{t-\Delta t}}{1 + x_{t-\Delta t}^{10}}$

Filters compared:
o Extended Kalman filter (EKF)
o Central-difference Kalman filter (CDKF)
o Cubature Kalman filter (CKF)
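The Mackey-Glass series used for the reconstruction experiment can be generated by simple Euler integration of the delay-differential equation above; the parameter values below are common choices for a chaotic regime and are not necessarily the exact settings behind Fig. 15.16.

```python
import numpy as np

def mackey_glass(a=0.2, b=0.1, delay=30.0, dt=0.1, n_steps=5000, x0=1.2):
    """Euler integration of dx/dt = -b x(t) + a x(t - delay) / (1 + x(t - delay)^10),
    started from a constant history x(t) = x0 for t <= 0."""
    lag = int(delay / dt)
    x = np.full(n_steps + lag, x0)
    for t in range(lag, n_steps + lag - 1):
        x_del = x[t - lag]
        x[t + 1] = x[t] + dt * (-b * x[t] + a * x_del / (1.0 + x_del ** 10))
    return x[lag:]                      # drop the constant-history warm-up

series = mackey_glass()
# 'series' is then sampled into input-target pairs for one-step prediction during
# training and used for autonomous (free-running) prediction during reconstruction.
```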

15.12 Adaptivity Considerations

Adaptive Critic

Figure 15.17 Block diagram illustrating the use of an adaptive critic for the control of recurrent node activities $v_n$ in a recurrent neural network (assumed to have a single output); the part of the figure involving the critic is shown in blue.

Consider a recurrent neural network embedded in a stochastic environment with relatively small variability in its statistical behavior. Provided that the underlying probability distribution of the environment is fully represented in the supervised-training sample supplied to the network, it is possible for the network to adapt to the relatively small statistical variations in the environment without any further on-line adjustments being made to the synaptic weights of the network.

15.13 Case Study: Model Reference Applied to Neurocontrol

Figure 15.18 Model-reference adaptive control system; the feedback loop of the system is printed in blue.

$J(w_k, \theta_k) = \dfrac{1}{T} \sum_{n=1}^{T} \sum_i \big( y_{i,r}(n) - y_i(n, w_k, \theta_k) \big)^2$

Summary and Discussion (1/2)

Four main recurrent network models with global feedback:
o Nonlinear autoregressive networks with exogenous inputs (NARX networks), which use feedback from the output layer to the input layer
o Fully connected recurrent networks, which use feedback from the hidden layer to the input layer
o Recurrent multilayer perceptrons with more than one hidden layer, which use feedback from the output of each computation layer to its own input
o Second-order recurrent networks, which use second-order neurons

Properties of recurrent neural networks:
o They are universal approximators of nonlinear dynamic systems, provided that they are equipped with an adequate number of hidden neurons.
o They are locally controllable and observable, provided that their linearized versions satisfy certain conditions around the equilibrium point.
o Given any finite-state machine, we can build a recurrent neural network which, regarded as a black-box machine, will behave like that finite-state machine.
o Recurrent neural networks exhibit a meta-learning (i.e., learning-to-learn) capability.

Summary and Discussion (2/2)

Gradient-based learning algorithms:
o Back propagation through time (BPTT) - off-line learning
o Real-time recurrent learning (RTRL) - on-line learning

Supervised-learning algorithms based on nonlinear sequential state estimation:
o Extended Kalman filter (EKF), with the linearization of the measurement model pertaining to the recurrent neural network obtained by using the BPTT or RTRL algorithm.
o Derivative-free nonlinear sequential state estimators (CKF / CDKF). In so doing, not only is the applicability of this approach to supervised learning broadened, but numerical accuracy is also improved (at the cost of increased computational requirements).