A New Look at Nonlinear Time Series Prediction with NARX Recurrent Neural Network José Maria P. Menezes Jr. and Guilherme A. Barreto Department of Teleinformatics Engineering Federal University of Ceará, Centro de Tecnologia Fortaleza-CE, Brazil October 23-27, 2006
Contents
Motivation
Objectives
Theoretical Foundations: Time Series Prediction (TSP) Tasks, Recurrent Neural Networks (RNN), NARX Architecture
Simulations: VBR Video Traffic, Laser Time Series
Conclusion
Motivation
1. Long-term dependence occurs very often in real-world time series (e.g. traffic series).
2. The theory of dynamical systems provides the theoretical basis for analyzing nonlinear systems with chaotic behavior.
3. Recurrent neural networks are capable of representing arbitrary nonlinear dynamical mappings, such as those commonly found in nonlinear time series prediction.
4. The NARX model is a recurrent neural network capable of efficiently modeling time series with long-term dependencies.
Objectives of this Work
1. To evaluate the performance of standard dynamic neural networks on difficult time series prediction tasks.
2. To propose a new field of application for NARX networks: prediction of univariate time series with long-range dependencies.
Theoretical Foundations
Time Series Prediction (TSP) Tasks
One-step-ahead prediction: neural network models are commonly used to estimate only the next value of a time series.
Multi-step-ahead prediction: if the user is interested in a wider prediction horizon, the model's output should be fed back to the input regressor for a fixed but finite number of time steps.
Dynamic modeling: if the prediction horizon tends to infinity, the neural network acts as an autonomous system, modeling the long-term dynamics of the system that generated the observed time series.
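The multi-step-ahead scheme above can be sketched as an iterated one-step predictor whose own estimates are fed back into the input regressor. This is a minimal illustration, not code from the paper; the `model` callable and parameter names are hypothetical.

```python
import numpy as np

def multi_step_ahead(model, history, horizon, d_E, tau=1):
    """Iterated (multi-step-ahead) prediction: at each step the model's
    own output is appended to the buffer and reused in the regressor."""
    buf = list(history)  # observed samples up to the current time n
    preds = []
    for _ in range(horizon):
        # input regressor: [x(n), x(n - tau), ..., x(n - (d_E - 1) tau)]
        regressor = np.array([buf[-1 - k * tau] for k in range(d_E)])
        y_hat = model(regressor)   # one-step-ahead estimate
        preds.append(y_hat)
        buf.append(y_hat)          # feed the estimate back
    return np.array(preds)
```

As the horizon grows, the regressor is filled entirely with estimates, which is exactly the dynamic-modeling regime described above.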
Recurrent Neural Networks (RNN)
Feedforward MLP-like networks can be easily adapted to process time series through an input tapped delay line (e.g. the FTDNN model).
Recurrent neural networks (RNN) have local and/or global feedback loops in their structure (e.g. the Elman, Jordan and NARX models) [1].
RNNs are capable of representing arbitrary nonlinear dynamical mappings, such as those commonly found in nonlinear time series prediction tasks.
Recurrent Neural Networks (RNN): Takens Embedding Theorem
Takens [3] has shown that the state of a deterministic dynamical system can be accurately reconstructed by a time window of finite length sliding over the observed time series as follows:

x_1(n) = [x(n), x(n - \tau), \ldots, x(n - (d_E - 1)\tau)]^T,

where x(n) is the value of the time series at time n, d_E is the embedding dimension and \tau is the embedding delay.
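The sliding-window reconstruction above can be written as a small function that stacks delay vectors. A minimal sketch, assuming the series is a 1-D array; the function name is illustrative only.

```python
import numpy as np

def delay_embedding(x, d_E, tau):
    """Build delay vectors x_1(n) = [x(n), x(n-tau), ..., x(n-(d_E-1)tau)]^T
    for every n at which a full window is available."""
    x = np.asarray(x)
    first = (d_E - 1) * tau  # earliest n with a complete regressor
    return np.array([[x[n - k * tau] for k in range(d_E)]
                     for n in range(first, len(x))])
```

Each row of the result is one reconstructed state vector, ready to be used as the network's input regressor.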
Recurrent Neural Networks (RNN) FTDNN Focused Time Delay Neural Network
Recurrent Neural Networks (RNN) Elman Network
NARX Architecture: NARX Model in System Identification
Nonlinear autoregressive model with exogenous input (NARX) [2]:

y(n + 1) = f[y(n), \ldots, y(n - d_y + 1); u(n), u(n - 1), \ldots, u(n - d_u + 1)] = f[\mathbf{y}(n); \mathbf{u}(n)],

where u(n) and y(n) denote, respectively, the input and the output of the model at discrete time n. The parameters d_u \geq 1 and d_y \geq 1, d_u \leq d_y, are the memory delays.
NARX Architecture NARX Neural Network Architecture
NARX Architecture: NARX Network in Nonlinear Time Series Prediction
Using Takens' theorem to build the input regressor:

\mathbf{u}(n) = [x(n), x(n - \tau), \ldots, x(n - (d_E - 1)\tau)]^T,

where we set d_u = d_E. The output regressor \mathbf{y}(n) can be written in two different ways, depending on the training mode of the NARX network:

\mathbf{y}_p(n) = [\hat{x}(n), \ldots, \hat{x}(n - d_y + 1)],
\mathbf{y}_{sp}(n) = [x(n), \ldots, x(n - d_y + 1)],

where the P-mode regressor contains d_y past values of the estimated time series, while the SP-mode regressor contains d_y past values of the actual time series.
NARX Architecture: Parallel Mode (NARX-P)
NARX Architecture: Series-Parallel Mode (NARX-SP)
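The difference between the two modes comes down to which series supplies the output regressor. A minimal sketch under the definitions above; the function and argument names are hypothetical, with `x` the actual series and `x_hat` the network's own estimates.

```python
import numpy as np

def output_regressor(x, x_hat, n, d_y, mode):
    """Output regressor at time n for the two NARX training modes.
    P-mode (parallel) feeds back the network's own estimates;
    SP-mode (series-parallel) uses the actual measured values."""
    src = x_hat if mode == "P" else x
    return np.array([src[n - k] for k in range(d_y)])
```

During SP-mode training the regressor is always noise-free teacher data, which is why SP training is typically easier; at test time, long-horizon prediction necessarily runs in P-mode.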
Simulations
Evaluated Networks
NARX-P, NARX-SP, FTDNN and Elman networks.
All networks have two hidden layers and one output neuron.
All neurons use the hyperbolic tangent activation function.
The standard backpropagation algorithm is used to train the networks.
Summary Table - Training Parameters

Task   | 1st hidden layer (N_h,1) | 2nd hidden layer (N_h,2) | learning rate | epochs
Task 1 | 2d_E + 1                 | sqrt(N_h,1)              | 0.001         | 300
Task 2 | 2d_E + 1                 | sqrt(N_h,1)              | 0.01          | 3000
Performance Evaluation Metric
The networks are evaluated in multi-step-ahead prediction tasks. Quantitatively, we compute the normalized mean squared error (NMSE):

NMSE(N) = \frac{1}{\sigma_x^2 N} \sum_{n=1}^{N} e^2(n),

where N is the prediction horizon, \sigma_x^2 is the sample variance of the actual time series and e(n) = y(n) - \hat{y}(n) is the prediction error at time n.
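The NMSE above is a one-liner in practice. A minimal sketch; note that `np.var` computes the population variance (ddof=0), while a sample variance with ddof=1 would scale the result slightly differently.

```python
import numpy as np

def nmse(y_true, y_pred):
    """NMSE(N) = (1 / (sigma_x^2 * N)) * sum_{n=1}^{N} e(n)^2,
    i.e. the MSE normalized by the variance of the actual series."""
    y_true = np.asarray(y_true, dtype=float)
    e = y_true - np.asarray(y_pred, dtype=float)
    return np.mean(e ** 2) / np.var(y_true)
```

An NMSE of 1 corresponds to the trivial predictor that always outputs the series mean; values well below 1 indicate genuine predictive power.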
Task 1: Long-term Prediction of VBR Video Traffic
Variable bit rate (VBR) video traffic (Jurassic Park) [4]. This video traffic trace was encoded with MPEG-I.
VBR video traffic typically exhibits burstiness over multiple time scales [5], [6].
2000 sample points, rescaled to the range [-1, 1].
1500 samples for training and 500 samples for testing.
VBR Video Traffic: Empirical Sensitivity Analysis - 1
[Figure: NMSE versus embedding dimension (order 5 to 25) for the FTDNN, Elman, NARX-P and NARX-SP networks.]
VBR Video Traffic: Empirical Sensitivity Analysis - 2
[Figure: NMSE versus number of training epochs (0 to 600) for the FTDNN, Elman, NARX-P and NARX-SP networks.]
VBR Video Traffic: Multi-Step-Ahead Predictions - 1
[Figure: FTDNN predicted versus original bit counts over frames 0 to 300.]
VBR Video Traffic: Multi-Step-Ahead Predictions - 2
[Figure: Elman predicted versus original bit counts over frames 0 to 300.]
VBR Video Traffic: Multi-Step-Ahead Predictions - 3
[Figure: NARX-SP predicted versus original bit counts over frames 0 to 300.]
Task 2: Long-term Prediction of Chaotic Laser Intensities
Chaotic laser time series: comprises measurements of the intensity pulsations of a single-mode far-infrared NH3 laser in a chaotic state [7].
Available worldwide since a TSP competition organized by the Santa Fe Institute [8].
1500 sample points, rescaled to the range [-1, 1].
1000 samples for training and 500 samples for testing.
Laser Time Series: Dynamic Modeling - 1
[Figure: FTDNN and Elman network predictions versus the original laser series over 500 time steps.]
Laser Time Series: Dynamic Modeling - 2
[Figure: NARX-SP network predictions versus the original laser series over 500 time steps.]
Laser Time Series: Sensitivity Analysis
[Figure: Arv error versus length of the prediction horizon N (0 to 100) for the FTDNN, Elman, NARX-P and NARX-SP networks.]
Laser Time Series: Recurrence Plots
[Figure: recurrence plots of the original series and of the NARX-SP, FTDNN and Elman predictions.]
Conclusion
Conclusion
The results have shown that the NARX network can be successfully applied to complex univariate time series modeling and prediction tasks.
The proposed approach consistently outperforms standard neural-network-based predictors, such as the FTDNN and Elman architectures.
References
[1] J. F. Kolen and S. C. Kremer, A Field Guide to Dynamical Recurrent Networks, Wiley-IEEE Press, 2001.
[2] T. Lin, B. G. Horne, P. Tino, and C. L. Giles, "Learning long-term dependencies in NARX recurrent neural networks," IEEE Transactions on Neural Networks, vol. 7, no. 6, pp. 1424-1438, 1996.
[3] F. Takens, "Detecting strange attractors in turbulence," in Dynamical Systems and Turbulence, D. A. Rand and L.-S. Young, Eds., vol. 898 of Lecture Notes in Mathematics, pp. 366-381, Springer, 1981.
[4] O. Rose, "Statistical properties of MPEG video traffic and their impact on traffic modeling in ATM systems," in Proceedings of the 20th Annual IEEE Conference on Local.
[5] J. Beran, R. Sherman, M. S. Taqqu, and W. Willinger, "Long-range dependence in variable-bit-rate video traffic," IEEE Transactions on Communications, vol. 43, no. 234, pp. 1566-1579, 1995.
[6] D. Heyman and T. Lakshman, "What are the implications of long-range dependence for VBR video traffic engineering?," IEEE/ACM Transactions on Networking, vol. 4, no. 3, pp. 301-317, 1996.
[7] U. Huebner, N. B. Abraham, and C. O. Weiss, "Dimensions and entropies of chaotic intensity pulsations in a single-mode far-infrared NH3 laser," Physical Review A, vol. 40, no. 11, pp. 6354-6365, 1989.
[8] A. Weigend and N. Gershenfeld, Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley, Reading, 1994.