
A Scalable Recurrent Neural Network Framework for Model-free POMDPs
April 3, 2007
Zhenzhen Liu, Itamar Elhanany
Machine Intelligence Lab
Department of Electrical and Computer Engineering
The University of Tennessee
http://mil.engr.utk.edu

Outline
- Introduction
- Background and motivation
- TRTRL/SMD
- Simulation results
- Summary and future work

Ingredients for building Intelligent Machines
- Implementation platform? Must scale
- Mammal brain as a reference model? Massively parallel architecture; operates at (relatively) low speeds; fault-tolerant
- Software vs. hardware: if hardware, what technology? (FPGA, VC VLSI, analog VLSI)
- (Nonlinear) function approximation: dealing with high-dimensional problems; the optimal policy is unattainable; capturing spatiotemporal dependencies
- RNNs, Bayesian networks, fuzzy systems? Biologically-inspired schemes

Scaling ADP
Goals:
- Address high-dimensional state and/or action spaces
- Support online learning
- Deal with partially observable scenarios (e.g. POMDPs)
- Hardware realizable
Approach taken:
- Employ recurrent neural networks (RNNs)
- Improved learning algorithm that scales
- Devised a hardware-efficient architecture
- Embed within an approximate Q-Learning framework

The Real-Time Recurrent Learning (RTRL) Algorithm
- Originally proposed in 1989 for an arbitrary RNN topology
- Stochastic gradient-based online algorithm
- The activation of neuron k is defined by $y_k(t+1) = f_k(s_k(t))$, where $s_k$ is the weighted sum of all activations leading to neuron k, taken over $z_k(t) = x_k(t)$ if k is an input node and $z_k(t) = y_k(t)$ if k is a neuron output
- The network error at time t is defined by $J(t) = \frac{1}{2}\sum_{m\in\text{outputs}}[d_m(t) - y_m(t)]^2 = \frac{1}{2}\sum_{m\in\text{outputs}}[e_m(t)]^2$, where $d_m(t)$ denotes the desired target value for output neuron m
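
To make the notation concrete, here is a minimal Python sketch of the forward step and error (assuming a fully connected network of logistic neurons; all names and shapes are illustrative, not the authors'):

```python
import numpy as np

def forward_step(W, x, y):
    """One time step: y_k(t+1) = f_k(s_k(t)), s_k(t) = sum_l w_kl * z_l(t).
    W is N x M with M = len(x) + N; z concatenates inputs and activations."""
    z = np.concatenate([x, y])        # z_l = x_l for inputs, y_l for neurons
    s = W @ z                         # weighted sums s_k(t)
    return 1.0 / (1.0 + np.exp(-s))   # logistic activation f

def network_error(d, y_out):
    """J(t) = 1/2 * sum over output neurons of (d_m(t) - y_m(t))^2."""
    e = d - y_out
    return 0.5 * float(np.sum(e ** 2))
```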

Updating the weights
The error is minimized along a positive multiple of the gradient of the performance measure, such that $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$, where
$$\Delta w_{ij}(t) = -\alpha\,\frac{\partial J(t)}{\partial w_{ij}} = \alpha \sum_{k\in\text{outputs}} e_k(t)\,\frac{\partial y_k(t)}{\partial w_{ij}}.$$
The partial derivatives of the activations with respect to the weights are identified as sensitivity elements and denoted by
$$p^k_{ij}(t) = \frac{\partial y_k(t)}{\partial w_{ij}}.$$

Updating the Sensitivities in RTRL
The sensitivities of node k with respect to a change in weight $w_{ij}$ are updated using the recursive expression
$$p^k_{ij}(t+1) = f'_k(s_k(t))\Big[\sum_{l\in N} w_{kl}\,p^l_{ij}(t) + \delta_{ik}\,z_j(t)\Big].$$
- Each neuron performs O(N^3) multiplications, yielding a total computational complexity of O(N^4)
- The storage requirements are dominated by the weights and the sensitivities, resulting in O(N^3) storage
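
A hedged sketch of this recursion, mainly to make the cost visible: every one of the N x N x M sensitivities is refreshed with a sum over all N units. Array layout and names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def rtrl_step(W, z, s, p, e, output_idx, alpha=0.1):
    """One full RTRL update.  W is N x M (M = N_in + N), p[k, i, j]
    approximates dy_k/dw_ij, and e is the error vector over output_idx.
    The sum over l inside the recursion drives the O(N^4) per-step cost."""
    N, M = W.shape
    y = 1.0 / (1.0 + np.exp(-s))
    fprime = y * (1.0 - y)                             # f'_k(s_k(t)) for logistic f
    p_new = np.einsum('kl,lij->kij', W[:, -N:], p)     # sum_l w_kl * p^l_ij
    for i in range(N):
        p_new[i, i, :] += z                            # delta_ik * z_j(t) term
    p_new *= fprime[:, None, None]
    grad = np.einsum('k,kij->ij', e, p_new[output_idx])  # sum_k e_k * p^k_ij
    return W + alpha * grad, p_new                     # dw = alpha * sum_k e_k p^k
```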

Truncated RTRL (TRTRL)
- Motivation: obtain a scalable version of the RTRL algorithm while minimizing performance degradation
- How? Biologically-inspired approach: limit the sensitivities of each neuron to its ingress (incoming) and egress (outgoing) links

Revising Sensitivity Updates for TRTRL
For all nodes not in the output set, the ingress sensitivity function for node i is given by
$$p^i_{ij}(t+1) = f'_i(s_i(t))\big[w_{ii}\,p^i_{ij}(t) + z_j(t)\big].$$
The egress sensitivities for node i (pertaining to the link from node i into node j) are updated by
$$p^j_{ji}(t+1) = f'_j(s_j(t))\big[w_{jj}\,p^j_{ji}(t) + y_i(t)\big].$$
For the output neurons, a nonzero sensitivity element must exist in order to update the weights, yielding
$$p^k_{ij}(t+1) = f'_k(s_k(t))\big[w_{ki}\,p^i_{ij}(t) + w_{kk}\,p^k_{ij}(t) + \delta_{ik}\,z_j(t)\big].$$
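
Under the same illustrative notation as above, the truncated updates collapse to cheap per-node recursions. A sketch, assuming logistic neurons and the reconstructed equations on this slide:

```python
import numpy as np

def trtrl_ingress(w_ii, fprime_i, p_in, z):
    """Ingress sensitivities of node i, p^i_ij, kept as one length-M vector:
    p^i_ij(t+1) = f'_i(s_i(t)) * (w_ii * p^i_ij(t) + z_j(t)).
    Only the self-recurrent term w_ii survives the truncation of the RTRL sum."""
    return fprime_i * (w_ii * p_in + z)

def trtrl_egress(w_jj, fprime_j, p_eg, y_i):
    """Egress sensitivity p^j_ji for the link from node i into node j:
    p^j_ji(t+1) = f'_j(s_j(t)) * (w_jj * p^j_ji(t) + y_i(t))."""
    return fprime_j * (w_jj * p_eg + y_i)
```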

Storage and Computational Complexity of TRTRL
- The network architecture remains the same with TRTRL (there is a weight between each pair of neurons); only the calculation of sensitivities is reduced
- The computational load for each neuron becomes O(KN), where K denotes the number of output neurons
- The computational complexity was reduced from O(N^4) to O(KN^2)
- The storage requirement was reduced from O(N^3) to O(N^2)

Stochastic Meta-Descent (SMD, N. Schraudolph et al.)
- Gradient descent techniques often suffer from slow convergence, particularly for ill-conditioned problems
- Mainstream approach: utilize second-order information, e.g. LM (Levenberg-Marquardt) or Newton methods (all utilize the Hessian matrix); however, these are computationally heavy
- Stochastic meta-descent (SMD) has recently been proposed as a cheap second-order gradient technique
- Employs an independent learning rate for each weight
- Utilizes Hessian information in the local step size: $w_{ij}(t+1) = w_{ij}(t) + \lambda_{ij}(t)\,\delta_{ij}(t)$

SMD adapted for TRTRL
- We adapted SMD for TRTRL (the first work applying SMD to RNNs)
- Approach: adapt the learning rate along the exponentiated gradient descent direction:
$$\ln\lambda(t) = \ln\lambda(t-1) - \mu\,\frac{\partial J(t)}{\partial \ln\lambda} = \ln\lambda(t-1) - \mu\,\frac{\partial J(t)}{\partial w(t)}\,\frac{\partial w(t)}{\partial \ln\lambda} = \ln\lambda(t-1) + \mu\,\delta(t)\,v(t)$$
- As a safeguard against unreasonably small, or negative, values,
$$\lambda(t) = \lambda(t-1)\,\max\big(\rho,\;1 + \mu\,\delta(t)\,v(t)\big),$$
using the relationship $e^x \approx 1 + x$.
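
A minimal sketch of this safeguarded step-size update, vectorized over all weights (`rho` and the variable names are assumptions):

```python
import numpy as np

def smd_step_sizes(lam, mu, delta, v, rho=0.1):
    """Per-weight step-size update from the slide:
    lambda(t) = lambda(t-1) * max(rho, 1 + mu * delta * v),
    i.e. the exponentiated-gradient step e^(mu*delta*v) ~ 1 + mu*delta*v,
    clipped below at rho so a step size can never become negative."""
    return lam * np.maximum(rho, 1.0 + mu * delta * v)
```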

SMD adapted for TRTRL (cont.)
- Adapt the gradient trace:
$$v(t+1) = \beta\,v(t) + \lambda\big(\delta(t) - \beta\,H(t)\,v(t)\big)$$
- $H(t)$ is the instantaneous Hessian (the matrix of second derivatives $\partial^2 J/\partial w_{ij}\,\partial w_{kl}$ of the error J with respect to each pair of weights) at time t
- The product of the Hessian and an arbitrary vector v is obtained with the R-operator, $R_v\{g(w)\} = \frac{\partial}{\partial r}\,g(w + r\,v)\big|_{r=0}$, so that $H_t v = R_v\{\delta(t)\}$, which for TRTRL yields
$$(H_t v)_{ij} = R_v\{\delta_{ij}\} = \sum_{k\in\text{outputs}}\big[e_k\,R_v\{p^k_{ij}\} - R_v\{y_k(t)\}\,p^k_{ij}\big]$$
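
And the corresponding trace update, assuming `Hv` has already been computed by an R-operator pass (a sketch, not the authors' code):

```python
import numpy as np

def smd_trace(v, lam, delta, Hv, beta=0.9):
    """Gradient-trace update v(t+1) = beta*v(t) + lam*(delta - beta*H(t)v(t)).
    Hv is the instantaneous Hessian-vector product from the R-operator, so the
    Hessian matrix itself is never formed or stored."""
    return beta * v + lam * (delta - beta * Hv)
```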

SMD in TRTRL
To complete the analysis, the R-operator is applied to s, y and p:
$$R_v\{y_k(t)\} = f'_k(s_k(t))\,R_v\{s_k(t)\}, \qquad R_v\{s_k(t)\} = \sum_{l\in U\cup I} v_{kl}\,z_l(t),$$
$$R_v\{p^k_{ij}(t+1)\} = f''_k(s_k(t))\,R_v\{s_k(t)\}\big[w_{ki}\,p^i_{ij}(t) + w_{kk}\,p^k_{ij}(t) + \delta_{ik}\,z_j(t)\big] + f'_k(s_k(t))\big[v_{ki}\,p^i_{ij}(t) + v_{kk}\,p^k_{ij}(t)\big]$$
We also added an adaptive global meta-learning rate by defining
$$\varphi(t) = \frac{\partial J}{\partial \ln\lambda} = \delta(t)\,v(t),$$
to yield
$$\mu(t) = \mu(t-1)\big(1 + \eta\,\varphi(t)\,\varphi(t-1)\big)$$

Using TRTRL/RNNs for Solving POMDPs
Recall that the motivation for using RNNs was to solve POMDPs.
[Block diagram: the observation o(t) and candidate action a(t) feed the RNN, which outputs the value estimate J(t); a soft-max selects the action applied to the environment, and the error formed from the reward r(t), J(t) and J(t-1) drives learning.]
In each step: (1) feed forward all actions, (2) find the one with the maximal (soft-max) value J, (3) apply the corresponding action to the environment, (4) get the next reward and update the weights.
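
A sketch of this four-step loop. Every interface here (`rnn.evaluate`, `rnn.update`, `env.step`) is a hypothetical placeholder standing in for the slide's block diagram, and the discount factor is omitted:

```python
import numpy as np

def interaction_step(rnn, env, obs, actions, prev_J, temperature=1.0):
    """One agent-environment interaction, mirroring the slide's four steps.
    rnn.evaluate(o, a) -> value estimate J; rnn.update(td) applies a
    TRTRL/SMD weight update; env.step(a) -> (next_obs, reward).
    All three are assumed interfaces, not the authors' API."""
    J = np.array([rnn.evaluate(obs, a) for a in actions])  # (1) feed forward all actions
    probs = np.exp(J / temperature)
    probs /= probs.sum()                                   # (2) soft-max over values J
    a = np.random.choice(len(actions), p=probs)
    next_obs, r = env.step(actions[a])                     # (3) apply chosen action
    rnn.update(r + J[a] - prev_J)                          # (4) TD-style error drives update
    return next_obs, J[a]
```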

Example 2 - Four-State POMDP
[State-transition diagram: four states with identical observations; transitions carry rewards r = 0 and r = 8.]
- 4-state POMDP with identical (confusing) observations
- The agent needs to remember the prior observation to infer the state
- 15 internal neurons and 1 output neuron

Summary
- Scalable, efficient RNNs: a vital tool for addressing high-dimensional POMDPs
- Introduced a fast, hardware-efficient learning algorithm and architecture
- Slightly improved the SMD technique (adaptive global learning rate)
- Successfully applied TRTRL-SMD to solving POMDPs
- Pathway for addressing practical problems: a scalable framework for ADP with RNNs