Modeling Economic Time Series Using a Focused Time Lagged FeedForward Neural Network

Proceedings of Student Research Day, CSIS, Pace University, May 9th, 23

N. Moseley

ABSTRACT - Artificial neural networks (ANNs) are simplified mathematical representations of some aspects of the functioning of the human brain. ANNs based on the Multilayer Perceptron (MLP), the base architecture of layered networks, have been shown to be a powerful tool for input-output mapping and have been used extensively in many disciplines. In this paper we demonstrate the use of a neural network to model univariate economic time series. For this thesis, an MLP-based network simulator was designed and implemented in the C programming language. The chosen architecture is a focused time lagged feedforward layered network (TFLN) trained with the standard backpropagation algorithm with momentum. Focused time lagged feedforward networks acquire temporal processing ability through the realization of short-term memory. After training, the neural network generates estimates of the time series; in addition, the network's ability to discover nonlinear relationships was used to investigate the interaction between two key economic indicators, represented as total sales and total inventories time series. A model validation regime predicated on digital signal processing methodology was developed. The results of these studies suggest that neural networks hold promise as an effective tool for the analysis and forecasting of time series data.

1. INTRODUCTION

A discrete-time signal, or time series, x(n) is a sequence of real or complex number samples. The key characteristics of a time series are that the observations are ordered in time and that adjacent observations are dependent (related). When successive observations are dependent, past observations can be used to predict future values.
Modeling and predicting economic data using traditional statistical approaches has been only partially successful. Accordingly, researchers have recently turned to alternative approaches, most notably artificial neural networks, which constitute a class of nonlinear models. Nonlinear models are more widely applicable, but they present an added difficulty: the supplementary degrees of freedom that allow a closer fit of the model to the data may reduce its generalization capability. This learning-generalization dilemma is the main limitation of ANNs. A set of input and target samples is presented to the learning system, which uses them to discover the statistical behavior of the input environment. After training, the fitted model must be validated with a validation set: a set of data not contained in the training set, which provides a way to measure the capacity of the model to generalize what it has learned to other data.

Real-world economic data is often nonlinear, comprises high-frequency multipolynomial components, and is piecewise continuous, so modeling it presents difficulties. A number of issues arise when working with the traditional techniques of linear function approximation: the system of interest may be intrinsically nonlinear, or the wrong linear model may be selected. As the number of input variables increases, the number of free parameters grows, which requires many more samples to prevent the model from specializing in the noise or other features of the training data. Additionally, polynomials become less efficient predictors as the number of input variables increases. A study found that for a neural network the sum-of-squares error falls off as $O(1/M)$, where $M$ is the number of hidden units, regardless of the number of input variables, whereas for polynomials or any other series expansion with $M$ terms the error decreases only as $O(1/M^{2/d})$, where $d$ is the number of input variables [2].
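To make the scaling argument concrete, the two rates can be compared by asking how many hidden units (or expansion terms) $M'$ are needed to halve the error. This worked example treats the $O(\cdot)$ rates as exact and ignores constant factors, which is a simplifying assumption:

```latex
\begin{aligned}
\text{neural network:}\quad & \frac{1}{M'} = \frac{1}{2}\cdot\frac{1}{M}
  &&\Rightarrow\quad M' = 2M \\
\text{series expansion:}\quad & \frac{1}{M'^{\,2/d}} = \frac{1}{2}\cdot\frac{1}{M^{2/d}}
  &&\Rightarrow\quad M' = 2^{d/2}M \\
d = 10:\quad & M' = 2^{5}M = 32M
\end{aligned}
```

With ten input variables, halving the error of a series expansion would take roughly 32 times as many terms, whereas the network needs only twice as many hidden units.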
The motivation for analyzing time series with neural networks in this thesis is driven by the following features: (a) neural networks rely purely on the input observations, so the data is allowed to speak; (b) multilayer feedforward networks with at least one hidden layer and a sufficient number of hidden units are capable of approximating any measurable function to any desired degree of accuracy [2], which makes them versatile enough to represent any form of time series; (c) the capacity to generalize allows ANNs to extract statistical information even in the case of missing or noisy data; and (d) ANNs have the capacity to represent nonlinearities in time series [4].

2. NEURAL NETWORKS

2.1 MULTILAYER PERCEPTRON

Neural networks generally consist of a number of interconnected nonlinear processing elements, or neurons, in which the nonlinearity is distributed throughout the network. The manner in which the interneuron connections are arranged and the nature of the connections determine the structure of the network. Fig. 1 is a representation of a feedforward multilayer perceptron.

Figure 1. Representation of a feedforward neural network (input layer, hidden layer, output layer), showing input source nodes and feedforward propagation through processing elements in the hidden layer to processing elements in the output layer.

The learning algorithm of the MLP determines the degree to which the connections are adjusted during training in order to achieve a desired network behavior. In a feedforward multilayer perceptron, the neurons are arranged in a feedforward mode, so that the outputs of the nodes in one layer form the inputs to the nodes in subsequent layers. Signals therefore flow unidirectionally from the input layer through each internal layer of the network to the output layer. Between the input and output layers are the hidden layers. The network is given nonlinear properties through the use of a nonlinear transfer function associated with each processing element. The hidden layer may be visualized as creating a map relating an input pattern to its desired response; this ability allows MLPs to discriminate between nonlinearly separable categories. Masters (1993) suggests that if a function consists of a finite collection of points, a three-layer network is capable of learning it. For an ANN to model its environment, the strengths (weights) of its interneuron connections must be adjusted according to the difference between the desired and actual outputs corresponding to a given input condition.
The adjustments to the weights are effected under the influence of a learning algorithm with the following points in mind: the algorithm starts from an arbitrary setting of the neurons' synaptic weights; adjustments to the synaptic weights in response to statistical variations in the system's behavior are made on a continuous basis; and the computation of the adjustments is completed inside a time interval that is one sampling period long.

2.2 BACKPROPAGATION ALGORITHM

The backpropagation (BP) learning algorithm has emerged as the standard for training the MLP. The partial derivatives of the cost function (performance measure) with respect to the free parameters (synaptic weights and biases) of the network are determined by back-propagating the error signals (computed by the output neurons) through the network, layer by layer. The BP algorithm has two distinct passes. In the forward pass the synaptic weights remain unaltered throughout the network, and the function signals of the network are computed on a neuron-by-neuron basis. The function signal appearing at the output of neuron $j$ is computed as

$$y_j(n) = \varphi_j\big(v_j(n)\big) \qquad (1)$$

where $v_j(n)$ is the induced local field of neuron $j$, defined by

$$v_j(n) = \sum_{i=0}^{m} w_{ji}(n)\, y_i(n) \qquad (2)$$

where $m$ is the total number of inputs (excluding the bias) applied to neuron $j$, $w_{ji}(n)$ is the synaptic weight connecting neuron $i$ to neuron $j$, and $y_i(n)$ is the input signal of neuron $j$; the $i = 0$ term, with $y_0(n) = +1$, carries the bias. If neuron $j$ is in the first hidden layer of the network, the index $i$ refers to the $i$th input terminal of the network, for which

$$y_i(n) = x_i(n) \qquad (3)$$

where $x_i(n)$ is the $i$th element of the input vector (pattern). If neuron $j$ is in the output layer, the index $j$ refers to the $j$th output terminal of the network, for which

$$y_j(n) = o_j(n) \qquad (4)$$

where $o_j(n)$ is the $j$th element of the output vector (pattern). The output is compared with the desired response $d_j(n)$, giving the error signal $e_j(n) = d_j(n) - o_j(n)$ for the $j$th output neuron.

The backward pass starts at the output layer by passing the error signals leftward through the network, layer by layer, and recursively computing the local gradient $\delta$ for each neuron. This recursive process permits the synaptic weights of the network to undergo changes in accordance with the delta rule

$$\underbrace{\Delta w_{ji}(n)}_{\text{weight correction}} = \underbrace{\eta}_{\text{learning rate}} \cdot \underbrace{\delta_j(n)}_{\text{local gradient}} \cdot \underbrace{y_i(n)}_{\text{input signal of neuron } j} \qquad (5)$$

The local gradient depends on the location of the neuron. For neuron $j$ in the output layer, the local gradient is

$$\delta_j(n) = e_j(n)\, \varphi_j'\big(v_j(n)\big) \qquad (6)$$

where $\varphi_j'(\cdot)$ is the derivative of the activation function evaluated at the induced local field. Because the rule needs this derivative, the activation function must be differentiable: a sigmoid or a hyperbolic tangent function is typically used, whereas a simple threshold function is not suitable. For neuron $j$ in a hidden layer,

$$\delta_j(n) = \varphi_j'\big(v_j(n)\big) \sum_k \delta_k(n)\, w_{kj}(n) \qquad (7)$$

where the sum is the weighted sum of the $\delta$s computed for the neurons $k$ in the next hidden or output layer that are connected to neuron $j$.
For the presentation of each training example, the input pattern is held fixed throughout the round-trip process, which encompasses the forward pass followed by the backward pass.

2.3 A FOCUSED TIME LAGGED FEEDFORWARD NETWORK

In the focused time lagged feedforward network (TFLN), a static MLP acquires temporal processing capability: it sees the time series $x(n)$ in the form of many mappings of an input vector to an output value. This technique was presented by Haykin [2]. The TFLN is a nonlinear filter consisting of a tapped-delay-line memory of order $p$ followed by a multilayer perceptron. The TFLN used in the project had a sigmoid activation function, the logistic function

$$\varphi(v) = \frac{1}{1 + \exp(-a v)}, \qquad a > 0, \quad -\infty < v < \infty \qquad (8)$$
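The way the tapped delay line turns one series into many input-output mappings can be sketched as follows. The line of order $p$ holds the $p+1$ most recent samples $x(n), x(n-1), \ldots, x(n-p)$. Assumptions, since the paper does not specify its simulator's interface: one-step-ahead prediction (the target paired with the window ending at $x(n)$ is $x(n+1)$), zero-based array indexing, and the hypothetical helper names `delay_line_pattern` and `delay_line_count`.

```c
#include <assert.h>
#include <stddef.h>

/* Tapped delay line of order p: copy the p + 1 most recent samples
 *   window[l] = x(n - l),  l = 0..p,
 * into the input vector and return the one-step-ahead target x(n + 1).
 * Requires p <= n and n + 1 < len. */
static double delay_line_pattern(const double *series, size_t len,
                                 size_t n, size_t p, double *window)
{
    assert(n >= p && n + 1 < len);
    for (size_t l = 0; l <= p; l++)
        window[l] = series[n - l];        /* tap l holds the sample delayed by l steps */
    return series[n + 1];
}

/* Number of (input vector, target) pairs a series of length len yields:
 * n runs from p to len - 2. */
static size_t delay_line_count(size_t len, size_t p)
{
    return len > p + 1 ? len - p - 1 : 0;
}
```

Each window is presented to the MLP as an ordinary static input pattern, so the temporal structure lives entirely in the delay-line memory rather than in the network itself.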

In Eq. (8), $v$ is the induced local field of the neuron and $a$ is the slope parameter. The input vector for each iteration of the algorithm is

$$\mathbf{x}(n) = [x(n),\, x(n-1),\, \ldots,\, x(n-p)]^T \qquad (9)$$

and the network output is

$$y(n) = \sum_{j=1}^{m} w_j\, y_j(n) + b_0 = \sum_{j=1}^{m} w_j\, \varphi\!\left(\sum_{l=0}^{p} w_j(l)\, x(n-l) + b_j\right) + b_0 \qquad (10)$$

where $m$ is the number of hidden neurons, $w_j(l)$ is the weight connecting tap $l$ of the delay line to hidden neuron $j$, $b_j$ is the bias of hidden neuron $j$, $w_j$ is the weight from hidden neuron $j$ to the output, and $b_0$ is the output bias.

Figures 2a and 2b show sample sets of the data utilized, with their variation over the period of interest. The size of the layers in the network was determined using the constructive method. Constructive methods determine the topology of the network during training as an integral part of the learning algorithm. The approach is to begin with a small network, train it until the performance criterion has been reached, and continue adding nodes and training until the global performance meets an acceptable error criterion. The training and test subsets were used for modeling, and the validation set was used for extrapolation over unseen data points to validate the model.

Figure 2. Sales and inventory time series after transformation to remove seasonality and trend; panels (a) and (b), scaled value vs. months.

3. MODELING THE DATA

Four sets of time series from the Federal Reserve Archives were employed in this investigation: the monthly total inventories and sales (billion $) from Jan 97 to December 2, and the monthly total inventories and sales (billion $) from Jan 1967 to December 2. Two of the sets are shown in Figures 2a and 2b. All data sets were preprocessed in order
to eliminate trends and seasonal influences from the data. The resulting data were partitioned into estimation and validation sets, and the estimation set was further partitioned into training and test subsets.

It is necessary to investigate how well the obtained model captures the key features of the data, as demonstrated by the agreement between the model output and the observed data in a least-squares statistical sense. The existence of any structure in the residual, or prediction error, signal indicates a misfit between the model and the data. Hence, a key validation technique is to check whether the residual process is a realization of white noise.

Autocorrelation test (ACT). The autocorrelation sequence of a stationary random signal is given [1] by

$$r_x(l) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\, x^*(n-l) \qquad (11)$$

It was shown (Kendall and Stuart 1983) that when $N$ is sufficiently large, the distribution of the estimated autocorrelation coefficients $\hat\rho(l) = \hat r(l)/\hat r(0)$ is approximately Gaussian with zero mean and variance $1/N$. The approximate 95 percent confidence limits are therefore $\pm 1.96/\sqrt{N}$. Any estimated values of $\hat\rho(l)$ that fall outside these limits are significantly different from zero with 95 percent confidence; values well beyond these limits indicate nonwhiteness of the residual signal.

Power spectrum density test (PSDT). Given a set of data $\{x(n)\}_0^{N-1}$, the standardized cumulative periodogram is defined by

$$I(k) = \frac{\sum_{i=1}^{k} \hat R_x\!\left(e^{j2\pi i/N}\right)}{\sum_{i=1}^{K} \hat R_x\!\left(e^{j2\pi i/N}\right)} \qquad (12)$$

where $\hat R_x(e^{j2\pi i/N})$ is the periodogram of the data at frequency $i/N$ and $K$ is the integer part of $N/2$. If the process $x(n)$ is white Gaussian noise (WGN), then the random variables $I(k)$, $k = 1, 2, \ldots, K-1$, are independently and uniformly distributed in the interval $(0, 1)$, and the plot of $I(k)$ should be approximately linear with respect to $k$ (Jenkins and Watts 1968). The hypothesis is rejected at level 0.05 if $I(k)$ exits the boundaries specified by

$$I_b(k) = \frac{k-1}{K-1} \pm \frac{1.36}{\sqrt{K-1}} \qquad (13)$$

Figure 5c is a plot of the standardized cumulative periodogram for the residuals being considered.
The plot shows an approximately linear relationship in the least-squares sense, approaching a monotonically increasing function that lies within the limits.

Partial autocorrelation test (PACT). Given the residual process $x(n)$, it was shown (Kendall and Stuart 1983) that when $N$ is sufficiently large, the partial autocorrelation sequence (PACS) values $\{k_l\}$ for lag $l$ are approximately independent with distribution WN$(0, 1/N)$. This means that roughly 95 percent of the PACS values fall within the bounds $\pm 1.96/\sqrt{N}$. Values consistently well beyond this range for sufficiently large $N$ indicate nonwhiteness of the signal.

3.1 RESULTS

Figure 3. Inventory training data and modeled output representing the network prediction (training-data region followed by the modeling region).

The performance of an MLP for predictive modeling (function approximation) was observed using two sets of economic data. The neural network was a feedforward multilayer perceptron with a sigmoid activation function. The supervised training of this network was conducted using a set of 27 data points randomly chosen from a distribution of 4 points, with a network architecture consisting of an input layer of 3 nodes, a single hidden layer with 2 nodes, and a single output node. Standard backpropagation with a learning rate η = .65 and a momentum constant α = .9 was found to give the best result. Figures 3 and 4 display the network's modeling of the two sets of time series, and Figure 5 displays the results of the model validation tests.
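The momentum form of the delta rule used in these experiments can be sketched as follows, assuming the usual formulation $\Delta w(n) = \alpha\,\Delta w(n-1) + \eta\,\delta(n)\,y(n)$ (Haykin's generalized delta rule). The name `momentum_update` and its flat weight, correction, and gradient arrays are hypothetical, not the author's simulator interface; `grad[i]` is assumed to hold the product $\delta_j y_i$ produced by the backward pass for weight `i`.

```c
#include <assert.h>
#include <stddef.h>

/* Generalized delta rule with momentum, applied to n weights:
 *   dw(n) = alpha * dw(n-1) + eta * grad(n),   w <- w + dw(n).
 * dw keeps the previous corrections between calls (start it at zero). */
static void momentum_update(double *w, double *dw, const double *grad,
                            size_t n, double eta, double alpha)
{
    assert(eta > 0.0 && alpha >= 0.0 && alpha < 1.0);
    for (size_t i = 0; i < n; i++) {
        dw[i] = alpha * dw[i] + eta * grad[i];   /* blend old correction with new */
        w[i] += dw[i];
    }
}
```

With α = .9, a gradient that keeps the same sign is amplified toward a steady-state correction of η·g/(1 − α), ten times the plain delta-rule step, which is why a large momentum constant is usually paired with a moderate learning rate.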

Figure 4. Sales training data and modeled output representing the network prediction (training-data region followed by the modeling region).

4. DISCUSSION

To make any definitive statements about the model's performance, the model validation tests presented earlier were used as the criteria. The results of these statistical tests, which follow Brockwell and Davis (1991) and Bendat and Piersol (1986), presented the following picture. Figure 5 shows the model validation results for inventory: for both the autocorrelation sequence (ACS) and PACS tests over lags 1 to 20, over 90% of the values fell within the confidence limits, discounting the unity value at lag 0. The PSD test showed approximately linear behavior, with the cumulative values occurring within the confidence limits. The sales series, with 4 data points, presented similar results. The ACS, PACS and PSD plots all comply with the predefined validation criteria for the residuals to be characterized as instances of white Gaussian noise (WGN). On the basis of these tests, there was a high probability that the residual generated by the NN model when modeling sales and inventory, having been trained on these individual sets, was a white noise source. The residual arising from the difference between the prediction of inventory from sales and the original data showed no significant structure, which indicated that the NN model parameterization adequately encoded the statistical information contained in the sales-inventory training sample. The model output is therefore a reasonable representation of the original signal.

5. CONCLUSIONS

Forecasts are a prerequisite for most decisions that are based on planning; consequently, the quality of a forecast must be evaluated by considering its possible impact on the decision. Invariably, the quality of decision making is measured in
monetary cost. Costs from over- and underproduction can be significant, with far-reaching consequences. In medical inventory management, the cost of erroneously predicting the number of units needed of a specific blood group can be devastating: overprediction may result in inventory holding costs, while underprediction may be fatal. A focused time lagged feedforward neural network has been presented as capable of predictively modeling univariate economic time series. The neural network simulator proved viable and adequate for learning the statistics of a nonlinear environment and for acting as a predictor. The spectral analysis identified similarities between the modeled output and the source data. On average, the neural network was observed to exhibit excellent generalization and function approximation capabilities.

Figure 5. Model validation test results: (a) residuals, the difference between the actual and network output, plotted against samples; (b) autocorrelation and (d) partial autocorrelation tests, plotted against lag; (c) power spectral density test, the standardized cumulative periodogram I(k) plotted against frequency (cycles/sample).

REFERENCES
1. Manolakis, D., Ingle, V. and Kogon, S. Statistical and Adaptive Signal Processing. McGraw-Hill, 2000.
2. Haykin, S. Neural Networks: A Comprehensive Foundation. 1999. Haykin, S. Adaptive Filter Theory, third edition. 1996.
2. Neurocontrol of Nonlinear Dynamical Systems with Kalman Filter Trained Recurrent Networks. IEEE Transactions on Neural Networks, vol. 5, no. 2, March 1994.
3. Puskorius, G. and Feldkamp, L. Decoupled Extended Kalman Filter Training of Feedforward Layered Networks. IEEE, 1991.
4. Koopmans, L. The Spectral Analysis of Time Series. Academic Press, 1974 (1995).
5. Fourier Analysis of Time Series: An Introduction. John Wiley & Sons, 1996.
6. Grover Brown, R. and Hwang, Y.C. Introduction to Random Signals and Applied Kalman Filtering. John Wiley & Sons, 1997.
7. Elman, J.L. Finding Structure in Time. Cognitive Science, 179-211, 1990.
8. Spatial Predictive Modeling: A Neural Network Approach. www.cobblestoneconcepts,com/ucgis2summer/Paulson/Paulson.htm
9. Ning Zhang and Shuxiang Xu. Neuron-Adaptive Higher Order Neural Network Models for Automated Financial Data Modeling. IEEE Transactions on Neural Networks, vol. 13, no. 1, January 2002.
10. Feldkamp, L. and Puskorius, G. A Signal Processing Framework Based on Dynamic Neural Networks with Application to Problems in Adaptation, Filtering and Classification. Proceedings of the IEEE, vol. 86.
11. Proakis, J. and Manolakis, D. Digital Signal Processing.
12. Chakraborty, K., Mehrotra, K., Mohan, C. and Ranka, S. Forecasting the Behaviour of Multivariate Time Series Using Neural Networks.