Modeling Economic Time Series Using a Focused Time Lagged FeedForward Neural Network

Proceedings of Student Research Day, CSIS, Pace University, May 9th, 2003

Modeling Economic Time Series Using a Focused Time Lagged FeedForward Neural Network

N. Moseley

ABSTRACT

Artificial neural networks (ANNs) are simplified mathematical representations of some aspects of the functioning of the human brain. ANNs based on the Multilayer Perceptron (MLP), the basic architecture of layered networks, have been shown to be a powerful tool for input-output mapping and have been used extensively in many disciplines. In this paper we demonstrate the use of a neural network to model univariate economic time series. For this thesis, an MLP-based network simulator was designed and implemented in the C programming language. Specifically, a focused time lagged feedforward network (TFLN) trained with the standard back-propagation algorithm with momentum is the chosen architecture. Focused time lagged feedforward networks acquire temporal processing ability through the realization of short-term memory. The neural network generates estimates of the time series after training; additionally, the ability of the network to discover nonlinear relationships was used to investigate the interaction between two key economic indicators through their representation as total sales and total inventories time series. A model validation regime predicated on digital signal processing methodology was developed. The results of these studies demonstrate that the application of neural networks to time series data holds promise as an effective tool for analysis and forecasting.

1. INTRODUCTION

A discrete-time signal or time series x(n) is basically a sequence of real or complex number samples. The key characteristics of a time series are that the observations are ordered in time and that adjacent observations are dependent (related). When successive observations of the series are dependent, we may use past observations to predict future values.

Modeling and predicting economic data using traditional statistical approaches has been only partially successful. Accordingly, researchers in recent times have turned to alternative approaches, most notably artificial neural networks (ANNs), which constitute a class of non-linear models. Non-linear models, by definition, have more meaningful applicability, but they present an added difficulty: the supplementary degrees of freedom that allow a better fit of the model to the data may result in a reduction in generalization capability. This learning-generalization dilemma is the main limitation of ANNs. A set of input and target samples is presented to the learning system, which is used to discover the statistical behavior of the input environment. After training, the fitted model must be validated with a validation set: a set of data not contained in the training set that provides a way to measure the capacity of the model to generalize what it has learned to other data sets.

Real-world economic data is often nonlinear, comprising high-frequency multipolynomial components, and is piecewise continuous. Modeling economic data therefore presents difficulties. A number of issues arise when working with the traditional techniques of linear function approximation: the system of interest may be intrinsically nonlinear, or the wrong linear model may be selected; and as the number of input variables increases, the number of free parameters grows, which requires many more samples in order to prevent the model from specializing in the noise or other features of the training data.
Additionally, polynomials become less efficient predictors as the number of input variables increases. A study found that the sum-of-squares error falls off as O(1/M), where M is the number of hidden units in a neural network, regardless of the number of input variables, whereas the error decreases only as O(1/M^{2/d}), where d is the number of input variables, for polynomials or any other series expansion [2].

The motivation for the analysis of time series using neural networks in this thesis is driven by the following features: (a) neural networks rely purely on the input observations (the data is allowed to speak); (b) multilayer feedforward networks with at least one hidden layer and a sufficient number of hidden units are capable of approximating any measurable function [2], which makes them versatile enough to represent any form of time series; (c) the capacity to generalize allows ANNs to extract statistical information even in the case of missing or noisy data; and (d) ANNs have the capacity to represent nonlinearities in time series.

2. NEURAL NETWORKS

2.1 Multilayer Perceptron

Neural networks generally consist of a number of interconnected nonlinear processing elements, or neurons, in which the nonlinearity is distributed throughout the network. The manner in which the interneuron connections are arranged and the nature of the connections determine the structure of the network. Fig. 1 is a representation of a feedforward Multilayer Perceptron.

Figure 1. Representation of a feedforward neural network, showing input source nodes and feedforward propagation through processing elements in the hidden layer and on to processing elements in the output layer.

The learning algorithm of the MLP determines the degree to which the connections are adjusted during training in order to achieve a desired network behavior. In a feedforward Multilayer Perceptron, the neurons are arranged in a feedforward mode so that the outputs of nodes in one layer form the inputs to nodes in subsequent layers. Signals therefore flow unidirectionally from the input layer, through each internal layer of the network, to the output layer. Between the input and the output layers are the hidden layers. The network is given nonlinear properties through the use of a nonlinear transfer function associated with each processing element. The hidden layer may be visualized as creating a map relating an input pattern to its desired response. This ability allows MLPs to discriminate between nonlinearly separable categories. Masters (1993) suggests that if a function consists of a finite collection of points, a three-layer network is capable of learning it.

For an ANN to model its environment, it is necessary that the strengths (weights) of its interneuron connections be adjusted according to the difference between the desired and actual outputs corresponding to a given input condition. The adjustments to the weights are effected under the influence of a learning algorithm with the following points in mind: the algorithm starts from an arbitrary setting of the neurons' synaptic weights; adjustments to the synaptic weights in response to statistical variations in the system's behavior are made on a continuous basis; and computations of adjustments to the synaptic weights are completed inside a time interval that is one sampling period long.
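To make the layer-by-layer signal flow described above concrete, the sketch below propagates one input pattern through a single-hidden-layer MLP with logistic activations. This is a minimal NumPy illustration, not the C simulator used in this work; the layer sizes, random weights, and function names are assumptions chosen only for demonstration.

    # Illustrative forward pass through a single-hidden-layer MLP (not the paper's code).
    import numpy as np

    def logistic(v, a=1.0):
        # Logistic activation phi(v) = 1 / (1 + exp(-a*v))
        return 1.0 / (1.0 + np.exp(-a * v))

    def forward(x, W_hidden, b_hidden, W_out, b_out):
        # Signals flow unidirectionally: input -> hidden layer -> output layer.
        hidden = logistic(W_hidden @ x + b_hidden)   # hidden-layer function signals
        output = logistic(W_out @ hidden + b_out)    # output-layer responses
        return hidden, output

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)                           # 3 input source nodes (assumed)
    W_hidden = rng.normal(scale=0.5, size=(4, 3))    # 4 hidden neurons (assumed)
    b_hidden = np.zeros(4)
    W_out = rng.normal(scale=0.5, size=(1, 4))       # single output neuron
    b_out = np.zeros(1)
    _, y = forward(x, W_hidden, b_hidden, W_out, b_out)
    print(y)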

2.2 BACKPROPAGATION ALGORITHM

The back-propagation (BP) learning algorithm has emerged as the standard for the training of MLPs. The partial derivatives of the cost function (performance measure) with respect to the free parameters (synaptic weights and biases) of the network are determined by back-propagating the error signals (computed by the output neurons) through the network, layer by layer. In the application of the BP algorithm there are two distinct passes. In the forward pass the synaptic weights remain unaltered throughout the network, and the function signals of the network are computed on a neuron-by-neuron basis. The function signal appearing at the output of neuron j is computed as

y_j = φ(v_j)    (1)

where v_j is the induced local field of neuron j, defined by

v_j = Σ_{i=1}^{m} w_{ji} y_i    (2)

where m is the total number of inputs applied to neuron j, w_{ji} is the synaptic weight connecting neuron i to neuron j, and y_i is the input signal of neuron j. If neuron j is in the first hidden layer of the network, the index i refers to the ith input terminal of the network, for which

y_i = x_i    (3)

where x_i is the ith element of the input vector (pattern). If neuron j is in the output layer, the index j refers to the jth output terminal of the network, for which

y_j = o_j    (4)

where o_j is the jth element of the output vector (pattern). The output o_j is compared with the desired response d_j, yielding the error signal e_j for the jth output neuron.

The backward pass starts at the output layer by passing the error signals leftward through the network, layer by layer, and recursively computing the local gradient δ_j for each neuron. This recursive process permits the synaptic weights of the network to undergo changes in accordance with the delta rule

Δw_{ji} = η δ_j y_i    (5)

that is, weight correction = learning rate × local gradient × input signal of neuron j. The local gradient depends on the location of the neuron. For neurons located in the output layer,

δ_j = e_j φ'(v_j)    (6)

where φ'(·) is the derivative of the activation function evaluated at the induced local field; the activation function φ can be a simple threshold function, a sigmoid, or a hyperbolic tangent. For neurons located in a hidden layer,

δ_j = φ'(v_j) Σ_k δ_k w_{kj}    (7)

where the sum is the weighted sum of the δ's computed for the neurons in the next hidden or output layer that are connected to neuron j. For the presentation of each training example, the input pattern is held fixed throughout the round-trip process encompassing the forward pass followed by the backward pass.
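The two-pass procedure and the momentum-augmented delta rule can be sketched as follows for a single-hidden-layer network with logistic activations. This is a hedged NumPy illustration of Eqs. (5) to (7) plus a momentum term, not the paper's C implementation; all function names and shapes are assumptions (the learning rate and momentum values shown only echo those reported later in the Results section).

    # One back-propagation step with momentum for a single-hidden-layer MLP (illustrative).
    import numpy as np

    def logistic(v, a=1.0):
        return 1.0 / (1.0 + np.exp(-a * v))

    def train_step(x, d, W1, b1, W2, b2, dW1, db1, dW2, db2, eta=0.1, alpha=0.9, a=1.0):
        # Forward pass: weights held fixed, function signals computed neuron by neuron.
        v1 = W1 @ x + b1
        y1 = logistic(v1, a)
        v2 = W2 @ y1 + b2
        o = logistic(v2, a)

        # Backward pass: local gradients (deltas).
        e = d - o                                         # error signal at the output neurons
        delta2 = e * a * o * (1.0 - o)                    # Eq. (6): delta = e * phi'(v)
        delta1 = (W2.T @ delta2) * a * y1 * (1.0 - y1)    # Eq. (7): back-propagated deltas

        # Delta rule with momentum: dw = alpha*dw_prev + eta*delta*input (Eq. (5) plus momentum).
        dW2 = alpha * dW2 + eta * np.outer(delta2, y1)
        db2 = alpha * db2 + eta * delta2
        dW1 = alpha * dW1 + eta * np.outer(delta1, x)
        db1 = alpha * db1 + eta * delta1
        return W1 + dW1, b1 + db1, W2 + dW2, b2 + db2, dW1, db1, dW2, db2

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(scale=0.5, size=(4, 3)), np.zeros(4)
    W2, b2 = rng.normal(scale=0.5, size=(1, 4)), np.zeros(1)
    state = (W1, b1, W2, b2, np.zeros_like(W1), np.zeros_like(b1), np.zeros_like(W2), np.zeros_like(b2))
    state = train_step(rng.normal(size=3), np.array([0.5]), *state, eta=0.65, alpha=0.9)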

2.3 A FOCUSED TIME LAGGED FEEDFORWARD NETWORK

In the focused time lagged feedforward network (TFLN) a static MLP acquires temporal processing capability: it sees the time series x(1), ..., x(n) in the form of many mappings of an input vector to an output value. This technique was presented by Haykin [2]. The TFLN is a nonlinear filter consisting of a tapped-delay-line memory of order p and a multilayer perceptron. The TFLN used in this project had a sigmoid activation function, the logistic function

φ(v) = 1 / (1 + exp(−a v)),  a > 0 and −∞ < v < ∞    (8)

where v is the induced local field of the neuron. The input vector for each iteration of the algorithm is represented as

x(n) = [x(n), x(n−1), ..., x(n−p)]^T    (9)

and the network output is represented as

y(n) = Σ_{j=1}^{m} w_j φ( Σ_{l=0}^{p} w_j(l) x(n−l) + b_j ) + b_o    (10)

where m is the number of hidden neurons.
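As a concrete illustration of the tapped-delay-line memory of Eq. (9), the sketch below turns a univariate series into lagged input patterns and one-step-ahead targets. The function name, the delay order p = 2, and the stand-in series are illustrative assumptions, not taken from the paper.

    # Building TFLN input patterns [x(n), x(n-1), ..., x(n-p)] with next-sample targets (illustrative).
    import numpy as np

    def make_lagged_patterns(series, p):
        # Each row of X is one input vector of Eq. (9); t holds the desired responses x(n+1).
        X, t = [], []
        for n in range(p, len(series) - 1):
            X.append(series[n - p:n + 1][::-1])   # [x(n), x(n-1), ..., x(n-p)]
            t.append(series[n + 1])               # one-step-ahead target
        return np.array(X), np.array(t)

    series = np.sin(np.linspace(0, 20, 400))      # stand-in for the detrended, scaled data
    X, t = make_lagged_patterns(series, p=2)      # order-2 delay line -> 3 input nodes
    print(X.shape, t.shape)                       # (397, 3) (397,)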

Figures 2a and 2b show sample sets of the data utilized, showing the time variation over the period of interest. The sizes of the layers in the network were determined using the constructive method. Constructive methods determine the topology of the network during training as an integral part of the learning algorithm: the approach is to begin with a small network, train it until the performance criterion has been reached, and continue adding nodes and training until a global performance level has been reached in terms of an acceptable error criterion.

Figure 2. Sales and inventory time series (panels a and b) after transformation to remove seasonality and trend; both series are plotted as scaled values against months.

3. MODELING THE DATA

Four sets of time series from the Federal Reserve archives were employed in this investigation: the monthly total inventories and sales (billions of dollars) from January 1970 to December 2001, and the monthly total inventories and sales (billions of dollars) from January 1967 to December 2001. Two of the sets are shown above (see Figures 2a and 2b). All data sets were preprocessed in order to eliminate trend and seasonality influences from the data. The resulting data was partitioned into estimation and validation sets, and the estimation set was further partitioned into a training and a test subset. The training and test sets were used for modeling, and the validation set was used for extrapolation over unseen data points for validation of the model.

It is necessary to investigate how well the obtained model captures the key features of the data, as demonstrated by the agreement between the model output and the observed data in a least-squares error statistical sense. The existence of any structure in the residual, or prediction-error, signal indicates a misfit between the model and the data. Hence, a key validation technique is to check whether the residual process is a realization of white noise.

Autocorrelation test (ACT). The autocorrelation sequence of a stationary random signal is given [1] by

r_x(l) = lim_{N→∞} (1 / (2N+1)) Σ_{n=−N}^{N} x(n) x*(n−l)    (11)

It was shown (Kendall and Stuart 1983) that when N is sufficiently large, the distribution of the estimated autocorrelation coefficients ρ(l) = r(l)/r(0) is approximately Gaussian with zero mean and variance 1/N. The approximate 95 percent confidence limits are ±1.96/√N. Any estimated values of ρ(l) that fall outside these limits are significantly different from zero with 95 percent confidence. Values well beyond these limits indicate nonwhiteness of the residual signal.

Power spectrum density test (PSDT). Given a data set {x(n)}, n = 1, ..., N, the standardized cumulative periodogram is defined by

I(k) = ( Σ_{i=1}^{k} R_x(e^{j2πi/N}) ) / ( Σ_{i=1}^{K} R_x(e^{j2πi/N}) ),  k = 1, 2, ..., K    (12)

where R_x(e^{j2πi/N}) is the periodogram of the data and K is the integer part of N/2. If the process x(n) is white Gaussian noise (WGN), then the random variables I(k), k = 1, 2, ..., K, are independently and uniformly distributed in the interval (0, 1), and the plot of I(k) should be approximately linear with respect to k (Jenkins and Watts 1968). The hypothesis is rejected at level 0.05 if I(k) exits the boundaries specified by

I_b(k) = (k − 1)/(K − 1) ± 1.36/√(K − 1)    (13)

Figure 5 includes a plot of the standardized cumulative periodogram for the residuals under consideration; the plot is approximately linear in the least-squares sense, approaching a monotonically increasing function lying within the limits.

Partial autocorrelation test (PACT). Given the residual process x(n), it was shown (Kendall and Stuart 1983) that when N is sufficiently large, the partial autocorrelation sequence (PACS) values {k_l} for lag l are approximately independent with distribution WN(0, 1/N). This means that roughly 95 percent of the PACS values fall within the bounds ±1.96/√N. If we observe values consistently well beyond this range for N sufficiently large, it indicates nonwhiteness of the signal.
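The residual whiteness checks of Eqs. (11) to (13) can be sketched as follows: a normalized autocorrelation estimate with the ±1.96/√N limits, and a standardized cumulative periodogram with the 0.05-level boundaries. This is an assumed NumPy illustration rather than the validation code actually used; the stand-in residual series is synthetic white noise.

    # Whiteness tests on a residual series (illustrative sketch of Eqs. (11)-(13)).
    import numpy as np

    def autocorrelation_test(res, max_lag=20):
        # Normalized autocorrelation rho(l) = r(l)/r(0) and the 95% confidence limit.
        N = len(res)
        res = res - res.mean()
        r = np.array([np.dot(res[l:], res[:N - l]) for l in range(max_lag + 1)]) / N
        rho = r / r[0]
        bound = 1.96 / np.sqrt(N)
        return rho, bound

    def cumulative_periodogram_test(res):
        # Standardized cumulative periodogram I(k) with the 0.05-level boundaries of Eq. (13).
        N = len(res)
        K = N // 2
        P = np.abs(np.fft.fft(res - res.mean()))[1:K + 1] ** 2   # periodogram ordinates
        I = np.cumsum(P) / P.sum()
        k = np.arange(1, K + 1)
        lower = (k - 1) / (K - 1) - 1.36 / np.sqrt(K - 1)
        upper = (k - 1) / (K - 1) + 1.36 / np.sqrt(K - 1)
        return I, lower, upper

    residual = np.random.default_rng(1).normal(size=400)   # stand-in for model residuals
    rho, b = autocorrelation_test(residual)
    print(np.mean(np.abs(rho[1:]) < b))        # fraction of lags inside the 95% bounds
    I, lo_b, up_b = cumulative_periodogram_test(residual)
    print(np.all((I >= lo_b) & (I <= up_b)))   # True if the cumulative periodogram stays in bounds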

4. RESULTS

Figure 3. Inventory training data and modeled output representing the network prediction.

The performance of an MLP for predictive modeling (function approximation) was observed using two sets of economic data. The neural network was a feedforward multilayer perceptron with a sigmoid activation function. The supervised training of this artificial neural network was conducted using a set of 270 data points randomly chosen from a distribution of 400 points, with a network architecture consisting of an input layer of 3 nodes, a single hidden layer with 2 nodes, and a single output node. Standard back-propagation with a combination of learning rate eta = 0.65 and momentum constant alpha = 0.9 was found to give the best result. Figures 3 and 4 display the network's modeling of the two sets of time series; Figure 5 displays the results of the model validation tests.

Figure 4. Sales training data and modeled output representing the network prediction.

5. DISCUSSION

In order to make any definitive statements about the model's performance capabilities, the model validation tests presented earlier were used as the criteria. The results of these statistical tests, after Brockwell and Davis (1991) and Bendat and Piersol (1986), presented the following picture. Figure 5 shows the model validation results for inventory: both the autocorrelation sequence and PACS tests for lags 1 to 20 showed over 90% of the values falling within the confidence limits, discounting the unity value at lag 0. The PSD test showed approximately linear behavior, with the cumulative values occurring within the confidence limits. The sales plot presented similar results for the time series with 400 data points. The plots for the ACS, PACS and PSD are all in compliance with the pre-defined validation criteria for the residuals to be characterized as instances of white Gaussian noise (WGN). On the basis of these tests there was a high probability that the residual generated by the NN model when modeling sales and inventory, having been trained on these individual sets, was a white noise source. The residual arising from the difference between the prediction of inventory from sales and the original data showed no significant structure, which indicated that the NN model parameterization adequately encoded the statistical information contained in the sales-inventory training sample. The model output therefore is a reasonable representation of the original signal.

6. CONCLUSIONS

Forecasts are a prerequisite for most decisions that are based on planning; consequently, the quality of the forecast must be evaluated considering its possible impact on the decision.

Invariably, the quality of decision making is measured in monetary cost. Costs from over- and under-production can be significant, with far-reaching consequences. In medical inventory management, the cost of erroneously predicting the number of units of blood of a specific blood group that will be needed can be devastating: over-prediction may result in inventory holding costs, while under-prediction may be fatal.

A focused time lagged feedforward neural network has been presented as being capable of predictively modeling univariate economic time series. The neural network simulator proved to be viable and adequate to learn the statistics of a nonlinear environment and to act as a predictor. The spectral analysis was able to identify similarities between the modeled output and the source data. It was observed that, on average, the neural network exhibited excellent generalization and function approximation capabilities.

Figure 5. Model validation test results: residuals (the difference between actual and network output), autocorrelation and partial autocorrelation tests, and the power spectral density (standardized cumulative periodogram) test.

REFERENCES

1. Manolakis, D., Ingle, V. and Kogon, S. 2000. Statistical and Adaptive Signal Processing. McGraw-Hill.
2. Haykin, S. 1999. Neural Networks: A Comprehensive Foundation.
3. Haykin, S. 1996. Adaptive Filter Theory, third edition.
4. Puskorius, G. and Feldkamp, L. 1994. Neurocontrol of Nonlinear Dynamical Systems with Kalman Filter Trained Recurrent Networks. IEEE Transactions on Neural Networks, vol. 5, no. 2.
5. Puskorius, G. and Feldkamp, L. 1991. Decoupled Extended Kalman Filter Training of Feedforward Layered Networks. IEEE.
6. Koopmans, L. 1995. The Spectral Analysis of Time Series. Academic Press (first published 1974).
7. Bloomfield, P. 1996. Fourier Analysis of Time Series: An Introduction. John Wiley & Sons.
8. Grover Brown, R. and Hwang, Y.C. 1997. Introduction to Random Signals and Applied Kalman Filtering. John Wiley & Sons.
9. Elman, J.L. 1990. Finding Structure in Time. Cognitive Science, 14, 179-211.
10. Spatial Predictive Modeling: A Neural Network Approach. www.cobblestoneconcepts.com/ucgis2summer/Paulson/Paulson.htm.
11. Zhang, M. and Xu, S. 2002. Neuron Adaptive Higher Order Neural Network Models for Automated Financial Data Modeling. IEEE Transactions on Neural Networks, vol. 13, no. 1.
12. Feldkamp, L. and Puskorius, G. 1998. A Signal Processing Framework Based on Dynamic Neural Networks with Application to Problems in Adaptation, Filtering and Classification. Proceedings of the IEEE, vol. 86.
13. Proakis, J. and Manolakis, D. Digital Signal Processing.
14. Chakraborty, K., Mehrotra, K., Mohan, C. and Ranka, S. Forecasting the Behavior of Multivariate Time Series Using Neural Networks.