Modelling Time Series with Neural Networks. Volker Tresp Summer 2017


Modelling of Time Series

The next figure shows a time series (DAX). Other interesting time series: energy price, energy consumption, gas consumption, copper price, ...

Neural Networks for Time-Series Modelling

Let z_t, t = 1, 2, ... be the discrete-time time series of interest (example: DAX). Let x_t, t = 1, 2, ... denote a second time series that contains information on z_t (example: Dow Jones). For simplicity, we assume that both z_t and x_t are scalar. The goal is the prediction of the next value of the time series. We assume a system of the form

z_t = f(z_{t-1}, ..., z_{t-T}, x_{t-1}, ..., x_{t-T}) + ε_t

with i.i.d. random numbers ε_t, t = 1, 2, ..., which model unknown disturbances.

Neural Networks for Time-Series Modelling (cont'd)

We approximate f with a neural network,

f(z_{t-1}, ..., z_{t-T}, x_{t-1}, ..., x_{t-T}) ≈ f_{w,V}(z_{t-1}, ..., z_{t-T}, x_{t-1}, ..., x_{t-T}),

and obtain the cost function

cost(w, V) = Σ_{t=1}^{N} (z_t - f_{w,V}(z_{t-1}, ..., z_{t-T}, x_{t-1}, ..., x_{t-T}))^2.

The neural network can be trained as before with simple backpropagation, if all z_t and all x_t are known in training! This is a NARX model: Nonlinear AutoRegressive model with eXternal inputs. Another name: TDNN (time-delay neural network). Note the convolutional idea in TDNNs.
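
As an illustration, here is a minimal sketch of such a NARX/TDNN model in PyTorch; the layer sizes, optimizer settings, and the random stand-in series are illustrative assumptions, not part of the lecture.

```python
import torch
import torch.nn as nn

class NARX(nn.Module):
    """f_{w,V}: maps the last T values of z and x to a prediction of z_t."""
    def __init__(self, T, hidden=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * T, hidden),   # input: (z_{t-1},...,z_{t-T}, x_{t-1},...,x_{t-T})
            nn.Tanh(),
            nn.Linear(hidden, 1),       # output: prediction of z_t
        )

    def forward(self, lagged):          # lagged: (batch, 2*T)
        return self.net(lagged).squeeze(-1)

def make_lagged_dataset(z, x, T):
    """Stack lag windows so the squared-error cost can be minimized by backprop."""
    inputs, targets = [], []
    for t in range(T, len(z)):
        inputs.append(torch.cat([z[t - T:t].flip(0), x[t - T:t].flip(0)]))
        targets.append(z[t])
    return torch.stack(inputs), torch.stack(targets)

# Training with the squared-error cost from the slide
T = 5
z = torch.randn(200)   # stand-in for the target series (e.g. DAX)
x = torch.randn(200)   # stand-in for the external series (e.g. Dow Jones)
model = NARX(T)
inp, tgt = make_lagged_dataset(z, x, T)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = ((tgt - model(inp)) ** 2).sum()   # cost(w, V)
    loss.backward()
    opt.step()
```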

Language model: the last T words are the input (one-hot encoded); the task is to predict the next word in a sentence.
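
A small sketch of the one-hot input encoding; the vocabulary size and word indices are made-up values for illustration.

```python
import torch

vocab_size, T = 10, 3
words = torch.tensor([4, 7, 1])                            # indices of the last T words
one_hot = torch.nn.functional.one_hot(words, vocab_size)   # (T, vocab_size)
net_input = one_hot.float().flatten()                      # length T * vocab_size input vector
```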

Multiple-Step Prediction

Predicting more than one time step into the future is not trivial: the future inputs are not available, and the model noise needs to be properly considered in multiple-step prediction (for example by a stochastic simulation); if possible, one could also simulate the future inputs (multivariate prediction). A simple option is to feed the predicted output back in as the next input (iterated prediction), which can be risky; note that this is the opposite of teacher forcing, where the true previous values are used.
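
Below is a sketch of such iterated multi-step prediction, reusing the NARX model sketched above; holding the unknown future external inputs at their last observed value is a crude assumption made purely for illustration.

```python
import torch

def predict_k_steps(model, z_hist, x_hist, T, k):
    """Iterated multi-step prediction: feed each prediction back as the next input.
    z_hist, x_hist: Python lists of observed floats. Future x values are unknown and
    are crudely held at the last observed value (a stand-in for simulating them)."""
    z_hist, x_hist = list(z_hist), list(x_hist)
    preds = []
    for _ in range(k):
        lagged = torch.tensor(z_hist[-T:][::-1] + x_hist[-T:][::-1])
        preds.append(model(lagged.unsqueeze(0)).item())
        z_hist.append(preds[-1])
        x_hist.append(x_hist[-1])   # assumed future external input
    return preds

# e.g. predict_k_steps(model, z[:100].tolist(), x[:100].tolist(), T=5, k=10)
```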

Recurrent Neural Network

Recurrent Neural Networks are powerful methods for time series and sequence modelling.

Generic Recurrent Neural Network Architecture

Consider a feedforward neural network where there are connections between the hidden units and, as before,

z_{t,h} = sig(z_{t-1}^T a_h + x_t^T v_h),   ŷ_t = sig(z_t^T w).

Here, z_t = (z_{t,1}, z_{t,2}, ..., z_{t,H})^T and x_t = (x_{t,0}, x_{t,1}, ..., x_{t,M-1})^T. In Recurrent Neural Networks (RNNs) the next state of the neurons in the hidden layer depends on their last state, and both are not directly measured. a_h, w, v_h are weight vectors. Note that in most applications one is interested in the output y_t (and not in z_{t,h}). The next figure shows an example. Only some of the recurrent connections are shown (blue). The blue connections also model a time lag. Without recurrent connections (a_h = 0 for all h), we obtain a regular feedforward network.

Note that a recurrent neural network has an internal memory.

A Recurrent Neural Network Architecture Unfolded in Time

The same RNN, but with a different intuition: consider that at each time step a feedforward neural network predicts outputs based on some inputs. In addition, the hidden layer also receives input from the hidden layer of the previous time step. Without the nonlinearities in the transfer functions, this is a linear state-space model; thus an RNN is a nonlinear state-space model.
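
A minimal sketch of this nonlinear state-space view, with the weight vectors a_h and v_h stacked into matrices A and V; the dimensions and random weights are illustrative assumptions.

```python
import torch

def rnn_forward(A, V, w, xs, z0):
    """State-space recursion: z_t = sig(A z_{t-1} + V x_t),  y_t = sig(w^T z_t).
    Dropping the sigmoids would turn this into a linear state-space model."""
    z, ys = z0, []
    for x_t in xs:                       # xs: sequence of input vectors
        z = torch.sigmoid(A @ z + V @ x_t)
        ys.append(torch.sigmoid(w @ z))
    return torch.stack(ys), z

H, M, Tlen = 4, 3, 10                    # hidden units, input dimension, sequence length
A, V, w = torch.randn(H, H), torch.randn(H, M), torch.randn(H)
xs = torch.randn(Tlen, M)
ys, z_final = rnn_forward(A, V, w, xs, torch.zeros(H))
```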

Training of Recurrent Neural Network Architectures

Backpropagation through time (BPTT): essentially backpropagation applied to the unfolded network. Note that everything that happened before time t influences ŷ_t, so the error needs to be propagated backwards in time, in principle until the beginning of the experiment! In practice, one typically truncates the gradient calculation (review in: Werbos (1990)).

Real-Time Recurrent Learning (RTRL) (Williams and Zipser (1989)).

Time-Dependent Recurrent Back-Propagation: learning with continuous time (Lagrangian approach) (Pearlmutter 1998).
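
A sketch of truncated BPTT for the state-space recursion above: unroll the network K steps, backpropagate the squared error, then detach the hidden state so no gradient flows further back. K, the learning rate, and the random data are illustrative assumptions.

```python
import torch

def train_truncated_bptt(A, V, w, xs, targets, K=5, lr=1e-3, epochs=50):
    """Truncated backpropagation through time: unroll K steps, backprop, detach."""
    params = [A, V, w]
    for p in params:
        p.requires_grad_(True)
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        z = torch.zeros(A.shape[0])
        for start in range(0, len(xs), K):
            opt.zero_grad()
            loss = torch.zeros(())
            for t in range(start, min(start + K, len(xs))):
                z = torch.sigmoid(A @ z + V @ xs[t])
                y = torch.sigmoid(w @ z)
                loss = loss + (targets[t] - y) ** 2
            loss.backward()              # gradients flow back at most K steps
            opt.step()
            z = z.detach()               # truncate: no gradient past this point
    return A, V, w

H, M, Tlen = 4, 3, 30
A, V, w = torch.randn(H, H), torch.randn(H, M), torch.randn(H)
xs, targets = torch.randn(Tlen, M), torch.rand(Tlen)   # targets in (0, 1) to match the sigmoid output
A, V, w = train_truncated_bptt(A, V, w, xs, targets)
```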

Echo-State Network

Recurrent Neural Networks are notoriously difficult to train. A simple alternative is to initialize A and V randomly (according to some recipe) and to train only w, e.g., with the ADALINE learning rule. This works surprisingly well and is the idea behind the Echo-State Network (ESN).
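
A sketch of this idea: the recurrent and input weights are drawn at random and rescaled, and only the linear readout is fitted. Fitting the readout by least squares (rather than the ADALINE rule) and omitting the output nonlinearity are simplifying assumptions made here; the reservoir size and spectral radius are also illustrative.

```python
import torch

def train_esn_readout(xs, targets, H=50, spectral_radius=0.9):
    """Echo-state network sketch: A and V are random and fixed; only the linear
    readout w is trained, here by least squares on the collected reservoir states."""
    M = xs.shape[1]
    A = torch.randn(H, H)
    A = A * (spectral_radius / torch.linalg.eigvals(A).abs().max())  # rescale toward the echo-state property
    V = torch.randn(H, M)
    z, states = torch.zeros(H), []
    for x_t in xs:                                   # run the fixed reservoir over the data
        z = torch.tanh(A @ z + V @ x_t)
        states.append(z)
    Z = torch.stack(states)                          # (T, H) collected reservoir states
    w = torch.linalg.lstsq(Z, targets.unsqueeze(1)).solution.squeeze(1)
    return A, V, w

xs, targets = torch.randn(200, 3), torch.randn(200)  # stand-in data
A, V, w = train_esn_readout(xs, targets)
```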

Iterative Prediction

Assume a trained model: we predict ŷ_t, then obtain the measurement (x_t, y_t), then predict ŷ_{t+1}, and so on. Thus we predict (e.g., the DAX of the next day) and then obtain a measurement of the next day. An RNN would ignore the new measurement. What can be done?

1: In probabilistic models, the measurement can change the hidden state estimates accordingly (HMM, Kalman filter, particle filter, ...).
2: We can use y_t as an input to the RNN (as in the TDNN).
3: We add a (linear) noise model.

Bidirectional RNNs

The predictions in bidirectional RNNs depend on past and future inputs. Useful for sequence labelling problems: handwriting recognition, speech recognition, bioinformatics, ...
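
A brief sketch using PyTorch's built-in bidirectional RNN for per-step sequence labelling; the sizes and the number of labels are made-up values.

```python
import torch
import torch.nn as nn

birnn = nn.RNN(input_size=3, hidden_size=8, bidirectional=True, batch_first=True)
readout = nn.Linear(2 * 8, 5)      # 5 labels per time step; forward and backward states are concatenated
x = torch.randn(1, 20, 3)          # (batch, time, features)
h, _ = birnn(x)                    # (1, 20, 16): each step sees past and future inputs
labels = readout(h)                # per-step label scores for sequence labelling
```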

Long Short-Term Memory (LSTM)

As a recurrent structure, the Long Short-Term Memory (LSTM) approach has been very successful. Basic idea: at time T a newspaper announces that the Siemens stock is labelled as "buy". This information will influence the development of the stock over the next days. A standard RNN will not remember this information for very long. One solution is to define an extra input that represents this fact and that is on as long as "buy" is valid. But this is handcrafted and does not exploit the flexibility of the RNN. A flexible construct which can hold the information is a Long Short-Term Memory (LSTM) block. The LSTM has been used very successfully for reading handwritten text and is the basis for many applications involving sequential data (NLP, translation of text, ...).

LSTM in Detail

The LSTM block replaces one hidden unit z_h, together with its input weights a_h and v_h. In general, all H hidden units are replaced by H LSTM blocks. It produces one output z_h (in the figure it is called y). All inputs in the figure are weighted inputs. Thus, in the figure, z would be the regular RNN-neuron output with a tanh transfer function. Three gates are used that control the information flow: the input gate (one parameter) determines if z should be attenuated; the forget gate (one parameter) determines if the last z should be added and with which weight; then another tanh is applied, modulated by the output gate. See http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementinga-grulstm-rnn-with-python-and-theano/
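
A sketch of one step of an LSTM block in the standard vectorized formulation (all H blocks at once, with weight matrices rather than the single per-gate parameters of the figure); the dimensions and random weights are illustrative assumptions.

```python
import torch

def lstm_block_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: input, forget and output gates control how the cell state c
    (the memory) is updated and read out. W: (4H, M), U: (4H, H), b: (4H,)."""
    H = h_prev.shape[0]
    gates = W @ x_t + U @ h_prev + b
    i = torch.sigmoid(gates[0:H])        # input gate: attenuates the new candidate
    f = torch.sigmoid(gates[H:2*H])      # forget gate: weights the previous cell state
    o = torch.sigmoid(gates[2*H:3*H])    # output gate: modulates what is emitted
    g = torch.tanh(gates[3*H:4*H])       # candidate (the regular tanh RNN-neuron output z)
    c = f * c_prev + i * g               # memory can be carried over many time steps
    h = o * torch.tanh(c)                # block output (called y in the figure)
    return h, c

M, H = 3, 8
W, U, b = torch.randn(4*H, M), torch.randn(4*H, H), torch.zeros(4*H)
h, c = torch.zeros(H), torch.zeros(H)
for x_t in torch.randn(20, M):
    h, c = lstm_block_step(x_t, h, c, W, U, b)
```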

LSTM Applications

Wiki: LSTM achieved the best known results in unsegmented connected handwriting recognition, and in 2009 won the ICDAR handwriting competition. LSTM networks have also been used for automatic speech recognition, and were a major component of a network that in 2013 achieved a record 17.7% phoneme error rate on the classic TIMIT natural speech dataset.

Applications: robot control, time series prediction, speech recognition, rhythm learning, music composition, grammar learning, handwriting recognition, human action recognition, protein homology detection.

Gated Recurrent Units (GRUs)

Some people found LSTMs too complicated and invented GRUs, which use fewer gates (an update gate and a reset gate, with no separate cell state).
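
A sketch of one GRU step in the same style as the LSTM block above; the stacked weight matrices W: (3H, M), U: (3H, H), b: (3H,) and the dimensions are illustrative assumptions.

```python
import torch

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step: only an update gate and a reset gate, no separate cell state."""
    H = h_prev.shape[0]
    gates = W[:2*H] @ x_t + U[:2*H] @ h_prev + b[:2*H]
    u = torch.sigmoid(gates[0:H])        # update gate: blends old state and candidate
    r = torch.sigmoid(gates[H:2*H])      # reset gate: how much of the old state enters the candidate
    h_tilde = torch.tanh(W[2*H:] @ x_t + U[2*H:] @ (r * h_prev) + b[2*H:])
    return (1 - u) * h_prev + u * h_tilde

M, H = 3, 8
W, U, b = torch.randn(3*H, M), torch.randn(3*H, H), torch.zeros(3*H)
h = torch.zeros(H)
for x_t in torch.randn(20, M):
    h = gru_step(x_t, h, W, U, b)
```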

Encoder-Decoder Architecture

For example, used in machine translation.
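
A minimal encoder-decoder sketch using PyTorch's built-in GRU modules: the encoder summarizes the source sequence in its final hidden state, which initializes the decoder that produces the target sequence. All sizes and the random inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

enc = nn.GRU(input_size=7, hidden_size=16, batch_first=True)
dec = nn.GRU(input_size=9, hidden_size=16, batch_first=True)
out = nn.Linear(16, 9)          # e.g. scores over a 9-symbol target vocabulary

src = torch.randn(1, 12, 7)     # source sequence (e.g. sentence to translate)
tgt_in = torch.randn(1, 10, 9)  # shifted target sequence fed to the decoder during training
_, h = enc(src)                 # h: (1, 1, 16), the summary of the source sequence
dec_states, _ = dec(tgt_in, h)  # decoder starts from the encoder's summary
scores = out(dec_states)        # (1, 10, 9) per-position output scores
```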