Reservoir Computing in Forecasting Financial Markets


April 9, 2015

Reservoir Computing in Forecasting Financial Markets

Jenny Su

Committee Members: Professor Daniel Gauthier, Adviser; Professor Kate Scholberg; Professor Joshua Socolar

Defense held on Wednesday, April 15, 2015, in Physics Building Room 298

Abstract

The ability of the echo state network to learn chaotic time series makes it an interesting tool for financial forecasting, where data is very nonlinear and complex. In this study I initially examine the Mackey-Glass system to determine how different global parameters can optimize training in an echo state network. In order to optimize multiple parameters simultaneously, I conduct a grid search to explore the mean squared error surface. In the grid search I find that the error is relatively stable over certain ranges of the leaking rate and spectral radius. However, the ranges over which the Mackey-Glass system minimizes error do not correspond with an error surface minimum for financial data, as a result of intrinsic qualities such as step size and the timescale of the dynamics in the data. The study of chaos in financial time series data leads me to alternate descriptions of the distribution of relative stock price changes over time. I find that the Lorentzian distribution and the Voigt profile are good models for explaining the thick tails that characterize large returns and losses, which are not captured by the common Gaussian model. These distributions act as an untrained random model to benchmark the predictions of the echo state network trained on the historical price changes in the S&P 500. The global reservoir parameters, optimized in a grid search given financial input data, do not lead to significant predictive abilities. Committees of multiple reservoirs are shown to give forecasts similar to those of single reservoirs. Compared to a benchmark random sample from the fitted distribution of previous input, the echo state network is not able to make significantly better forecasts, suggesting the necessity of more sophisticated statistical techniques and the need to better understand chaotic dynamics in finance.

Contents

1 Introduction
    Background
    Approach
2 Network Concepts
    Basic Concept
    Input
    The Reservoir and Echo State Property
    Training and Output
    Reservoir Optimization
    Summary
3 Echo State Network and Mackey-Glass
    Mackey-Glass System
    Constant Bias
    Leaking Rate, Spectral Radius, and Reservoir Size
    Summary
4 Financial Forecasting and Neural Networks
    Nonlinear Characteristics
    History of Neural Networks in Finance
    Neural Forecasting Competition
5 Network Predictions
    S&P 500 Data
    Data Processing
    Benchmarking
    Parameter Optimization
    Reservoir Results
    Committee
6 Conclusion
    Analysis of Results
    Conclusion

Chapter 1 Introduction

1.1 Background

Artificial neural networks are trainable systems with powerful learning capabilities and many applications in forecasting and classification. Their learning process has many similarities to that of the human brain because their design was inspired by research into biological nervous systems. Like the central nervous system of animals, the artificial neural network is composed of many neurons, or artificial nodes, which are connected in a defined network. Both biological and artificial neural networks send and receive feedback signals through their connections, which partially determines their expressed state. These networks are therefore constantly updating with information from external sources as well as internally from other neurons in the network. The development of artificial neural networks began in 1943 when McCulloch and Pitts [23] first proposed a network in which the state of each neuron was determined by a combination of all the signals received from connected neurons in the network. In their model, simple artificial neurons i = 1, ..., n could only have binary activations or states, $n_i = 0, 1$, where $n_i = 0$ represents the resting state and $n_i = 1$ represents the activated state of the neuron. The binary state of each neuron i is dictated by a linear combination of action potentials, $\sum_j w_{ij} n_j(t)$, where $w_{ij}$ is a matrix representation of the connection weights between neurons i and j, as seen in figure 1.1. If this value exceeds a certain threshold, then neuron i becomes activated and transmits the signal $n_i = 1$.
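For concreteness, the threshold rule described above can be written out in a few lines of Python; the weights, states, and threshold below are arbitrary illustrative values, not anything taken from the original 1943 model.

```python
import numpy as np

def mcculloch_pitts_step(weights, states, threshold):
    """Return the new binary state of neuron i given the states of its inputs."""
    # Weighted sum of incoming signals, sum_j w_ij * n_j(t)
    activation = np.dot(weights, states)
    # Fire (1) only if the summed input exceeds the threshold
    return 1 if activation > threshold else 0

# Example: three input neurons, arbitrary weights and threshold
w_i = np.array([0.5, -0.3, 0.8])   # row i of the weight matrix w_ij
n_t = np.array([1, 0, 1])          # binary states of the connected neurons
print(mcculloch_pitts_step(w_i, n_t, threshold=1.0))  # -> 1
```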

Figure 1.1: Network model of a neuron: inputs are summed as a linear combination, which must exceed a certain threshold for the neuron to be activated.

Another important foundational step occurred in 1949 when Hebb published The Organization of Behavior, in which he asserts that simultaneous activation of neurons leads to increased connection strengths between those neurons [8]. This established the basis for Hebb's learning rule, which states that the synaptic connection between neurons is adaptive. The connection between neurons can be strengthened as a result of repeated activations between the neurons: if neuron $x_1$ activations consistently lead to neuron $x_2$ activations, then the connection weight between them will increase. This idea was incorporated in 1961 by Eduardo Caianiello into the learning algorithm used to determine the weight matrix $w_{ij}$ connecting neurons [1], where the connection weight matrix had adaptive weights for neurons with similar activations; the connections between neurons that activate simultaneously are stronger. All of these discoveries contributed to the development of the first simple feed-forward network, which Rosenblatt and his collaborators called a perceptron [29]. The perceptron consisted of two layers of neurons: the input layer and the output layer. Signals travel from the input to the output in a single direction, as seen in figure 1.2. This first feed-forward network was used in a simple classification problem between two classes. After each classification, if the model predicts the class incorrectly, the weights are adjusted and the network is run again until the weights converge to the correct values that

6 allow the model to predict the class correctly. Figure 1.2: Single-layer perceptrons have adjustable weights which are trained to predict correct classification. Years later in 1982, Hopfield published the basis for a recurrent network where neurons can be updated sequentially based on information stored within the network [9]. As opposed to a feedforward network, recurrent neural networks have neurons whose connections are multidirectional as shown in figure 1.3. This allows the network to display dynamical behavior based on internal memory of signals throughout the network. In Hopfield s first network, there was no self-connection, the neuron did not receive input from itself, and the connections between neurons were symmetric so that w i j = w ji. Hopfield showed that networks with these characteristics which iteratively adjusted the state of each neuron would reach a local minimum and evolve to a final state. 4

7 Figure 1.3: Hopfield network neurons connections are multidirectional with symmetric weights. Within the field of networks, many different learning rules that have been developed are classified as supervised learning, where the network is adjusted by comparing the network output with the desired output. One of the most common learning rules, error backpropagation, was first proposed in 1974 by Werbos [37]. The learning algorithm makes small iterative adjustments to network connections to minimize the difference between the target and network output. Backpropagation has been especially successful in feed forward networks but the method is only partially successful with recurrent neural networks. Output of recurrent neural networks can bifurcate unlike the smooth continuous output of feed-forward networks. This bifurcation can lead to discontinuous error surfaces[3]. An alternative to backpropagation was introduced with echo state networks (ESNs) and liquid state machines [13, 19]. These two training models developed independently within the context of machine learning and computational neuroscience respectively share the same basic idea of a randomly connected reservoir of neurons with a trained output feedback. This developed into the current research field of Reservoir Computing. The specific network concepts defined by the echo state network approach I will be using in this project is explained in detail in chapter 2. Applications of reservoir computing are useful in tasks for function approximation, signal processing, classification, and data processing. They have been applied to system identification, natural language 5

processing, medical diagnosis, as well as spam filters. In this study, I am interested in the application of these network models to the financial sector, which historically has had great incentive to find models capable of predicting and forecasting changes in a complex, widely fluctuating market.

1.2 Approach

The goal of this project is to study the echo state network and its modeling capacity in forecasting financial data series. Within the scope of this project I first forecast a known dynamical system, the Mackey-Glass system, to better understand reservoir behavior given specified global parameters. I then apply ESNs to financial data to determine whether they may provide useful predictions of stock market trajectories. The paper is organized as follows. The second chapter discusses network concepts and the mathematical description of the echo state network. In order to determine best practices, in chapter three I study the network using the Mackey-Glass system, a nonlinear delay differential equation that is commonly used to test the modelling of complex systems. The ESN is trained on Mackey-Glass data under varying global parameters of the reservoir, and these global parameters are tuned to optimize network performance. The fourth chapter introduces the dynamics of the stock market and how previous forecasting approaches have dealt with these widely fluctuating time series. In the fifth chapter I apply the echo state network approach to financial data. I study the impact of global parameters and data processing techniques. I benchmark the reservoir predictions using random samples from distributions that best fit the historical input data. I also study the impact of committee methods, which combine output from multiple reservoirs, in comparison to single reservoir predictions as well as the random sample. The sixth and final chapter concludes with a comparative discussion of the echo state network approach in financial forecasting, as well as limitations of the model and further studies.

9 Chapter 2 Network Concepts 2.1 Basic Concept There are many supervised learning algorithms for training recurrent neural networks. In the echo state network (ESN) approach used in this project, only the output weights from the reservoir are trained. The connections of the neurons within the reservoir are randomly generated at the outset as in figure 2.1. Training all network connections is unnecessary, which makes this faster than previous learning algorithms in which all connections are trained. Because the network has recurrent loops, it maintains a memory of the past input and the output consists of echoes of the initial input time series. 7

Figure 2.1: The echo state network approach has a combination of trained and untrained connections between neurons or nodes.

In implementing the echo state network, an input teacher data series u(n) is used to train a reservoir of size $N_x$ with neuron connections determined by a randomly generated matrix $W \in \mathbb{R}^{N_x \times N_x}$. The output node of the reservoir gives a readout of a linear combination of all or a portion of the neuron activations, $W^{out} x(n)$. This matrix $W^{out}$ is computed so that the output y(n) corresponds as closely as possible to a defined target data series $y^{target}(n)$. This last step is the training portion of the learning algorithm. Once the best weight matrix $W^{out}$ is determined, new input data u(n) can be used in the reservoir to generate output, or reservoir predictions, y(n) beyond the target data. The rest of this chapter describes the mathematical properties of the different components involved in the ESN approach and also introduces the tunable global parameters used to optimize this learning architecture, as first developed by Jaeger in 2001 [13].

2.2 Input

The input data serves as the driving mechanism for the reservoir. Not only does the reservoir exhibit nonlinear dynamics as a result of the input, but the reservoir will also retain a memory

of previous input. This ability to remember, which will later be defined as the echo state property in section 2.3, occurs as a result of the recurrence of the network; the nodes form recurrent cycles through their connections in W. Typically, the teacher input series u(n) consists of $N_u$ series of data points at discrete time steps (n, n + 1, n + 2, ...). In my studies, part of the input teacher series is used as the target data $y^{target}(n)$ that the output is trained on. There are no limitations on the starting point of the data series, and therefore shifting the initial starting input does not significantly impact the ability of the reservoir to learn the input series. While in many contexts the input is one-dimensional, there is no limit to the number of arrays that can be used as input to the reservoir. The input is fed into the reservoir using a randomly generated input weight matrix $W^{in} \in \mathbb{R}^{N_x \times (1+N_u)}$. In addition to the specified input data u(n), there is also a constant bias value input to the reservoir. This bias input is weighted by a randomly generated column of values in $W^{in}$, as seen in figure 2.2. The purpose of the bias constant is to increase the variability of the input dynamics [13]. The impact of this bias constant is further studied in section 3.2. All these input values directly impact the reservoir, which is studied in the next section.

Figure 2.2: $W^{in}$ determines the connection weights for both the input data and the bias input.

2.3 The Reservoir and Echo State Property

The reservoir is defined by the neurons within it, whose states all follow the same update equation as a function of the input and feedback from other neurons. The state of each neuron, or node, depends on previous states in the reservoir according to

$$x(n + 1) = (1 - \alpha)x(n) + \alpha \tanh\left(W x(n) + W^{in} u(n + 1) + W^{fb} y(n) + v(n)\right), \qquad (2.1)$$

where $\alpha$ is the leaking rate of the network. Technically the state update can be governed by any sigmoidal function, but in the context of this project I use the tanh function because it is the standard sigmoid function used across all encountered literature in reservoir computing. According to Eq. 2.1, each new neuron state x(n + 1) is determined by its current state x(n) as well as a nonlinear expression of the other current nodes, W x(n), and the input data, $W^{in} u(n + 1)$. The state x(n + 1) can also depend on the output, adjusted by the randomly generated feedback weight matrix $W^{fb}$ acting on y(n), as well as a noise function v(n). However, both of these terms are optional and are omitted in this project. Jaeger found that noise was useful in maintaining stability of the reservoir activations when driven by highly chaotic time series such as the Mackey-Glass system [13]. He also concluded that noise insertion negatively impacted the precision of predictions, which is undesirable in my project. In this project, I do not use a separate $W^{fb}$ to feed into the reservoir but rather feed the output y(n) back as input u(n), so that effectively $W^{fb} = W^{in}$. This allows the reservoir to exhibit dynamical behavior as a result of the output as well as the input. One of the most important parameters in determining reservoir behavior is the random weight matrix W that describes the neuron connections. The recurrent loops that occur due to these connections lead to the echo state property of the network, which gives the reservoir a finite memory. The echo state property states that the reservoir retains a memory of previous states, as the input data is stored in the recurrent loops of the reservoir connections. The echo state property can be ensured in practice if the spectral radius, denoted by $\rho$ and defined as the largest absolute eigenvalue of the weight matrix W, is less than 1. However, more recent research has determined that the echo state property holds even under the more relaxed condition that the maximum absolute value of the entries $w_{ij}$ of W is less than 1 [6]. Jaeger, more recently, also noted that the echo state property can exist even when $\rho > 1$, but it may not exist for all input and never exists for the null input [22]. Increasing $\rho$ may even increase network performance by increasing the memory of the input, since the spectral radius determines how long input is remembered. He also found that the echo state property might be defined with respect to the input u(n).
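As an illustration of Eq. 2.1, a minimal sketch of the state update (omitting the optional feedback and noise terms, as in this project) is given below. The matrix shapes, random ranges, and parameter values are assumptions chosen for the example, not the exact settings used later in the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

N_x, N_u = 100, 1          # reservoir size and input dimension
alpha = 0.3                # leaking rate
W = rng.uniform(-0.5, 0.5, (N_x, N_x))          # recurrent weights (spectral radius not yet adjusted; see section 2.5)
W_in = rng.uniform(-1.0, 1.0, (N_x, 1 + N_u))   # input weights; first column drives the constant bias

def update_state(x, u, bias=1.0):
    """One step of Eq. 2.1: leaky integration of a tanh neuron layer."""
    pre = W @ x + W_in @ np.concatenate(([bias], np.atleast_1d(u)))
    return (1 - alpha) * x + alpha * np.tanh(pre)

# Drive the reservoir with a toy input sequence and collect the states
u_seq = np.sin(0.1 * np.arange(500))
x = np.zeros(N_x)
states = []
for u in u_seq:
    x = update_state(x, u)
    states.append(x)
X = np.array(states).T     # shape (N_x, time); used for training the readout
```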

Since the reservoir exhibits memory, the initial activations should not be used in training (training is defined in the next section) because of any transients that may exist in the reservoir. Since $\rho$ in most of the studies in this project is close to unity, there is slow forgetting, and many initial states need to be dismissed before actual predictions can take place. This makes the learning process input-intensive and data-wasteful, and other techniques have been developed that incorporate an auxiliary initiator network to compute an appropriate starting state for the recurrent model network [13]. These techniques are beyond the scope of my project because data length is not a major limitation for financial data.

2.4 Training and Output

Given all the reservoir activations resulting from the input series, the goal is for the output of the reservoir, y(n), to approximate the target data $y^{target}(n)$ as closely as possible; that is, to minimize the difference between the target data and the reservoir output. In this project, I use the mean squared error (MSE), defined as

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y^{out}(i) - y^{target}(i)\right)^2. \qquad (2.2)$$

Because the reservoir output is a linear combination of the activations,

$$Y = W^{out} X, \qquad (2.3)$$

where Y and X are matrix representations of the reservoir output y(n) and reservoir states x(n) respectively, minimizing the MSE becomes a simple linear regression. Substituting the matrix forms of Eq. 2.3 into Eq. 2.2, I find

$$\mathrm{MSE} = \left(W^{out} X - Y^{target}\right)^2. \qquad (2.4)$$

Minimizing the MSE gives the solution for $W^{out}$ as

$$W^{out} = Y^{target} X^{-1}. \qquad (2.5)$$

However, because the number of input points in u(n) is often larger than the reservoir size, the system is overdetermined, and taking the inverse of an overdetermined matrix can give unstable solutions. Therefore, instead of computing the output matrix as a simple function of the target and the inverse of the activations, I implement Tikhonov regularization, also known as ridge regression, in order to find a stable solution for $W^{out}$ [17]. In Tikhonov regularization, a regularization term is added to Eq. 2.2, resulting in

$$\left(W^{out} X - Y^{target}\right)^2 + \beta \left(W^{out}\right)^2, \qquad (2.6)$$

where $\beta$ is the regularization coefficient used to penalize larger norms. Minimizing Eq. 2.6, I obtain the solution

$$W^{out} = Y^{target} X^T \left(X X^T + \beta I\right)^{-1}. \qquad (2.7)$$

Alternatively, as mentioned previously in section 2.2, the noise function v(n) can also be used to stabilize solutions of overdetermined systems. However, the ridge regression method is a more computationally efficient solution that does not penalize the precision of reservoir predictions the way added noise can.
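A minimal sketch of the ridge-regression readout of Eq. 2.7 is shown below; it assumes the reservoir states are collected as columns of X, and the random placeholder data only demonstrates the shapes involved.

```python
import numpy as np

def train_readout(X, Y_target, beta=1e-8):
    """Solve W_out = Y_target X^T (X X^T + beta I)^(-1)  (Eq. 2.7).

    X        : (N_x, T) matrix of reservoir states, one column per time step
    Y_target : (N_y, T) matrix of target outputs
    beta     : Tikhonov regularization coefficient
    """
    N_x = X.shape[0]
    # np.linalg.solve is preferred over an explicit inverse for numerical stability
    return np.linalg.solve(X @ X.T + beta * np.eye(N_x), X @ Y_target.T).T

# Example with random placeholder data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2000))
Y_target = rng.standard_normal((1, 2000))
W_out = train_readout(X, Y_target)
print(W_out.shape)   # (1, 100)
```

Solving the linear system rather than inverting $X X^T + \beta I$ directly is a standard numerical choice and gives the same result as Eq. 2.7.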

2.5 Reservoir Optimization

In determining the optimal reservoir, there are many adjustable global parameters that influence the dynamical behavior. The goal in reservoir optimization is therefore to generate dynamical behavior that is as similar as possible to the system the reservoir is attempting to model. The reservoir dynamics, as seen in Eq. 2.1, are governed by W, $W^{in}$, and $\alpha$. These parameters are related to different quantities that characterize the network: the reservoir size $N_x$, the spectral radius $\rho$, the input scaling, and the leaking rate $\alpha$. These global parameters and their effect on the reservoir are the topic of this section. The current standard practice involves intuitive manual adjustment of each of these parameters to optimize the reservoir [17]. I will study parameter optimization in section 5.3.

Reservoir Size

The reservoir size $N_x$, or the number of neurons, dictates the model capacity of the network. The current intuition states that, in general, bigger reservoirs lead to better performance. The training method (Eq. 2.7) used for echo state networks is computationally efficient enough to generate very large reservoirs, which have been found useful in automatic speech recognition [32].

A benchmark for the lower bound of the reservoir size is given by Lukoševičius [17]: the reservoir size $N_x$ should be at least equal to an estimate of the number of independent real values that the reservoir needs to retain in memory of the input. This is a result of the echo state property described in section 2.3: the maximum length of the reservoir memory is limited by its size.

Spectral Radius

The spectral radius $\rho$ is one of the most central parameters in the echo state network because it plays an important role in governing the echo state property of the reservoir. The radius $\rho$ is defined as the maximum absolute eigenvalue of the reservoir connection matrix W, and it determines the length of the reservoir memory; a larger spectral radius corresponds to a longer memory of the input history. In this project, the spectral radius essentially scales the weight matrix W. After a random matrix W is generated, its spectral radius $\rho(W)$ is calculated. W is then divided by $\rho(W)$ to normalize its spectral radius to 1, and the connection matrix is multiplied by the selected spectral radius $\rho$ so that $\rho(W) = \rho$. In other words, the connection weight matrix is first randomly generated and its spectral radius is then adjusted to the desired value. In practice the condition $\rho < 1$ ensures the echo state property in most applications [17]. The theoretical limit established in Ref. [6], however, shows that the echo state property is guaranteed for every input only under the tighter condition $\rho < 1/2$. In application, the echo state property often holds even for $\rho \geq 1$ given nonzero input. Large $\rho$ values may push the reservoir into spontaneous chaotic behavior, which would violate the echo state property; this is shown by Jaeger [12], where the trivial zero input leads to linearly unstable solutions given $\rho > 1$. Generally the spectral radius should be tuned to optimize reservoir performance, starting from a benchmark value of $\rho = 1$ and exploring other values close to 1. When modelling systems that depend on more recent input history, $\rho$ should be smaller; a larger $\rho$ is appropriate when a more extensive memory is required.
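The rescaling procedure just described amounts to a few lines of linear algebra; a sketch, assuming numpy and a dense reservoir matrix, follows.

```python
import numpy as np

def scale_spectral_radius(W, rho_desired):
    """Rescale a random reservoir matrix W so that its spectral radius equals rho_desired."""
    # Spectral radius = largest absolute eigenvalue of W
    rho_W = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (rho_desired / rho_W)

rng = np.random.default_rng(1)
W = rng.uniform(-0.5, 0.5, (500, 500))
W = scale_spectral_radius(W, rho_desired=0.95)
print(np.max(np.abs(np.linalg.eigvals(W))))   # ~0.95
```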

Input Scaling

Input scaling is an important parameter, as previously noted in section 2.3, because it also plays a role in determining the echo state property [22]. The input scaling is typically determined by the input weight matrix $W^{in}$, which is randomly sampled on a range $[-a, a]$, where a is the scaling parameter. While not strictly required, using a single scaling parameter for the entire input series, scaling all the columns of $W^{in}$ together, reduces the number of free parameters that need to be tuned to optimize reservoir performance [17]. Theoretically it is possible to scale individual input units of u(n); increasing the number of input scaling parameters could expand the components of the input that favorably drive the reservoir dynamics. However, there is no algorithm that easily scales individual input components, and exploration of this topic is beyond the scope of this project. In this project, I determine the input scaling through $W^{in}$, which is chosen randomly from the range $[-1, 1]$; the input scaling parameter a is then multiplied into the input data u(n), which is analogous to sampling $W^{in}$ on the range $[-a, a]$. $W^{in}$ not only scales the input u(n) but also scales the constant bias. Input scaling determines the nonlinearity of the network dynamics. Lukoševičius [17] advises ensuring that the inputs to the state update Eq. 2.1 are bounded. Because of the tanh function that defines the neuron activations, inputs very close to 0 lead to nearly linear behavior of the neurons, inputs close to 1 or -1 may cause binary switching behavior, and inputs in between lead to nonlinear dynamical behavior of the neurons.

Leaking Rate

The leaking rate $\alpha$, which is bounded between [0, 1], determines the speed at which the input affects, or leaks into, the reservoir, and it approximates a discrete Euler integration of the state over time. This discretization comes from an Euler integration of a state equation of the form $\dot{x} = -x + f(t)$:

$$\frac{\Delta x}{\Delta t} = \frac{x(n + 1) - x(n)}{\Delta t} \approx -x(n) + f(n), \qquad (2.8)$$

so the solution for the new state x(n + 1) is

$$x(n + 1) = (1 - \Delta t)\, x(n) + \Delta t\, f(n). \qquad (2.9)$$

In the reservoir, the leaking rate $\alpha$ plays the role of $\Delta t$ in the Euler integration of an ordinary differential equation.

The leaking rate also functions as a form of exponential smoothing of the time series, in which previous states x(n) are weighted exponentially less over time. Substituting into the state update equation (Eq. 2.1), one sees

$$x(n + 1) = (1 - \alpha)^2 x(n - 1) + \alpha(1 - \alpha) f(x(n - 1)) + \alpha f(x(n)). \qquad (2.10)$$

In general, the leaking rate is set to match the speed of the dynamics the reservoir is attempting to model from $y^{target}$. A recent study [12] has shown that the leaking rate can also impact the short-term memory of echo state networks: in some reservoirs, given a small leaking rate $\alpha$, the slow dynamics of the reservoir states x(n) can increase the length of the short-term memory of the reservoir. I study the impact of the leaking rate along with other parameters more explicitly in section 5.3. Further studies of multiple time scales may be useful for systems whose components have dynamics on different timescales, but they are beyond the scope of this paper.

2.6 Summary

In this chapter, I outlined the specific components of the echo state network as well as the specific parameters that are useful in optimizing its performance. In the following chapter I apply this knowledge in practice by studying the modelling capacity of the reservoir using a known time series.

Chapter 3 Echo State Network and Mackey-Glass

3.1 Mackey-Glass System

The Mackey-Glass equation is a nonlinear time delay differential equation. It was originally derived as a model of chaotic dynamics in blood cell generation in a collaboration between Mackey and Glass at McGill University [20]. The equation expands on a simple feedback system whose dynamics are given by

$$\frac{dx}{dt} = \lambda - \gamma x, \qquad (3.1)$$

which has a stable equilibrium at $\lambda/\gamma$ in the limit $t \to \infty$, given that $\gamma$ is positive. In this simple feedback system, the rate of change of the control variable, dx/dt, is influenced by the value of the control variable: the system increases at a constant rate $\lambda$ and decreases at a rate $\gamma x$. To better model real physiological systems, modifications to this simple feedback system include introducing a time delay component. The form of the Mackey-Glass equation studied today is

$$\frac{dx}{dt} = \beta \frac{x_\tau}{1 + x_\tau^n} - \gamma x. \qquad (3.2)$$

The equation can generate periodic behavior, bifurcations, and chaotic dynamics given specified parameters $\beta$, $\gamma$, $\tau$, and n. The derivatives of a time delay differential equation depend on solutions at previous times: $x_\tau$ represents the state x at a time delayed by the constant $\tau$, that is, $x(t - \tau)$. There may be a significant time lag between determining the control variable x and responding with an updated change. Additionally, in real physiological models, the parameters $\gamma$ and $\lambda$ are not necessarily constant for all time t, and may vary with $x(t - \tau)$, also denoted $x_\tau$.

This is true for Eq. 3.2, which also exhibits period-doubling bifurcations leading up to a chaotic regime and has been extensively studied in relation to chaos theory [7]. Because the system has been so well characterized by previous studies, the Mackey-Glass system is often used to benchmark time series prediction studies. Previous studies by Jaeger [14] have shown that ESNs improve upon the best previous techniques for modeling the Mackey-Glass chaotic system [36] by a factor of 700, which he attributes to the ESN's ability to store and remember previous states, defined as the echo state property in section 2.3. In Jaeger's article, he uses Mackey-Glass parameters $\beta = 0.2$, n = 10, $\gamma = 0.1$, and $\tau = 17$, which I reproduced in figure 3.1 using a dde23 solver [5] in Python. For $\tau > 16.8$ the system has a chaotic attractor; since I use $\tau = 17$, the system exhibits a chaotic attractor, which can be observed in the time delay embedding in figure 3.2. I use the same parameters to begin training my echo state networks.

Figure 3.1: Mackey-Glass system over time t = 3000 given $\beta = 0.2$, n = 10, $\gamma = 0.1$, and $\tau = 17$.

The Mackey-Glass system, because it has been well studied in the context of the reservoir, is a natural starting point for preparing reservoir training tasks. The rest of this chapter details studies of the constant bias global parameter discussed in section 2.5 and its impact on the optimization of the echo state network, using source code made publicly available by Lukoševičius [18]. These studies will give insight into the basic routine used in training networks, which is applied to financial data sets in subsequent chapters.
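Since the exact solver script is not reproduced here, the sketch below shows one simple way to generate a comparable Mackey-Glass series: a fixed-step Euler integration of Eq. 3.2 with the parameters above. The step size and constant-history initialization are illustrative assumptions, not the settings of the dde23 solver used for figure 3.1.

```python
import numpy as np

def mackey_glass(T=3000, beta=0.2, gamma=0.1, n=10, tau=17, dt=0.1, x0=1.2):
    """Euler integration of dx/dt = beta*x_tau/(1 + x_tau^n) - gamma*x."""
    steps = int(round(T / dt))
    delay = int(round(tau / dt))
    x = np.full(steps + delay, x0)          # constant history for t <= 0
    for t in range(delay, steps + delay - 1):
        x_tau = x[t - delay]                # delayed state x(t - tau)
        dxdt = beta * x_tau / (1.0 + x_tau**n) - gamma * x[t]
        x[t + 1] = x[t] + dt * dxdt
    # Keep one sample per unit time, matching the discrete steps fed to the reservoir
    return x[delay::int(round(1 / dt))]

series = mackey_glass()
print(series.shape)   # about 3000 samples
```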

Figure 3.2: Phase space diagram of the Mackey-Glass attractor plotted by time delay embedding given $\beta = 0.2$, n = 10, $\gamma = 0.1$, and $\tau = 17$.

3.2 Constant Bias

In determining the states of the activations (see Eq. 2.1), there is an additional bias input, randomly generated within the $W^{in}$ matrix, that is multiplied by a constant scaling parameter. In order to better understand the impact of this bias input on the reservoir dynamics, I use Mackey-Glass data and study the behavior of the reservoir as the bias is changed. I generate input Mackey-Glass data for training and testing, construct an echo state network, and train the output matrix of the network to make predictions for the next step. The rest of this section illustrates the impact of the bias on the mean squared error. As I study the MSE, I also examine the impact that the output weight matrix $W^{out}$ may have on the error, as it indirectly affects the MSE by determining y(n).

Input preparation

Using Mackey-Glass parameters $\beta = 0.2$, n = 10, $\gamma = 0.1$, and $\tau = 17$, I generate a time series as shown in figure 3.1. This data is split into training data and testing data. In this numerical study,

I use a training length of 2000 and reserve a separate portion of the series for testing. The experimental data is shifted to lie between -1 and 1 so that it falls within the nonlinear regime of the tanh function. No additional input scaling factor is needed to adjust the input in this case because the range of the series is less than 1. The training data is weighted according to the randomly generated input weight matrix used in the reservoir to drive the activation states in Eq. 2.1. The first 100 inputs are used to initialize the reservoir and are not used to train the output. The rest of the training data is used as the target data $Y^{target}$ in Eq. 2.7. The testing data is compared with the reservoir output to determine the accuracy of the reservoir predictions, as in Eq. 2.2.

Reservoir Construction

A reservoir was constructed as described in section 2.3. In order to study the constant bias, I varied the scaling parameter for the bias from 0 to 10 in increments of 0.1 and examined the impact this has on the reservoir activations by isolating the parameter. I hold all other parameters constant: reservoir size = 1000, leaking rate = 0.3, spectral radius = 1.25, input scaling = 1. Using the first 2000 data points in the training sequence to drive the reservoir according to the reservoir update Eq. 2.1, the states of all the reservoir nodes are collected in X. As seen in figure 3.3, which shows some reservoir activations, the initial activations resulting from the first few random states of the network are highly variable in the first few steps, t < 50. Therefore the first 100 activations are ignored to remove any initial transience of the random reservoir. Since reservoir connections are randomly generated, I created 10 reservoirs for each scaling parameter to test the impact of the bias on the MSE.

Figure 3.3: The activations of a few nodes for t < 50 are more variable than the stable activations for t > 50.

Training and Prediction

Training the output feedback of the reservoir requires the target data $Y^{target}$ and the reservoir activations X in Eq. 2.7. The regularization coefficient is set to $10^{-8}$ for all the reservoir training throughout this project in order to minimize the number of adjusted parameters. Once the output matrix is determined, the prediction y(n) is calculated from the output weight matrix and the reservoir states. This output y(n) is then used as the subsequent input to continue to drive the reservoir and make the following prediction. This series of predictions is compared to the test data and the MSE, given by Eq. 2.2, is determined.
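The generative prediction loop described above can be sketched as follows; `update_state` and `W_out` refer to the illustrative snippets of chapter 2, not to the actual project code.

```python
import numpy as np

def free_run(update_state, W_out, x, u_last, n_steps):
    """Feed each prediction back as the next input and collect the outputs."""
    u = u_last
    predictions = []
    for _ in range(n_steps):
        x = update_state(x, u)                 # Eq. 2.1, driven by the previous output
        y = (W_out @ x).item()                 # linear readout, Eq. 2.3
        predictions.append(y)
        u = y                                  # output becomes the next input
    return np.array(predictions)

def mse(y_pred, y_true):
    """Mean squared error of Eq. 2.2."""
    return np.mean((y_pred - y_true) ** 2)
```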

Results

After training a number of reservoirs over a range of scaling constants, I compare the prediction results of the reservoirs to the test data. Figure 3.4 shows an example of the actual data compared to the reservoir predictions for one of the runs, using a bias scaling constant of 1.

Figure 3.4: Reservoir predictions after training compared to test data points.

To compare the results across the varying scaling constants, I calculate the MSE using Eq. 2.2 for reservoir predictions over a length of 500 steps for each reservoir. The average MSE of the 10 reservoirs for each bias scaling is shown in figure 3.5.

Figure 3.5: MSE across varying scaling constants for the bias, given parameters reservoir size = 1000, leaking rate = 0.3, spectral radius = 1.25, input scaling = 1.

As seen in figure 3.5, the smallest MSE occurred when the scaling constant was kept below 3.0, but the error increased when the scaling constant was zero or close to it. In the range from 0.8 to 2.2, the MSE was on the order of $10^{-5}$ or smaller. Typically in the literature this scaling constant is kept at 1 [17]. The purpose of the constant bias input, as explained by Jaeger and in section 2.2, is to increase the variability of the reservoir dynamics. In application, the bias may help the reservoir deal with offset or non-centered data; the mean of the shifted input data is close to, but not quite, zero. While the bias input does seem to have an impact on the MSE, there may be other indicators of large error in the echo state network model. Since $W^{out}$ has a significant impact on the output y(n), I compare the different weight matrices as they affect the MSE. Higher weights in the output matrix could possibly be the result of an unstable solution to Eq. 2.5. To study whether or not the output weight matrix would be

correlated with the MSE, figure 3.6 was produced to compare the mean of the output weight matrix to the MSE.

Figure 3.6: MSE as a function of the output weight matrix $W^{out}$.

Figure 3.6 indicates no correlation between the mean of the weight matrix and the MSE. A correlation might have been expected because weight matrices with larger, more unstable values could indicate poor linear fits for $W^{out}$ from Eq. 2.5. However, because Tikhonov regularization is intended to penalize large, unstable matrix solutions, such solutions may not manifest in the final output weight matrix; the regularization already acts as a threshold that dismisses highly unstable solutions derived from Eq. 2.7. Other studies of the output matrix could examine other statistical features. In the following section I study the impact of a few other reservoir parameters.

3.3 Leaking Rate, Spectral Radius, and Reservoir Size

There are many other important parameters that can be optimized in the network, as discussed in section 5.3. In this section I study the interrelated impact of multiple parameters. One of the flaws of optimizing each parameter individually is that the MSE may have a local minimum as a function of multiple parameters. In order to better optimize across multiple parameters, I study more advanced optimization techniques.

Gradient Descent

Gradient descent is a method commonly used to minimize error functions. In gradient descent optimization, also known as the method of steepest descent, an iterative approach is taken to converge to a local minimum. From an initial starting vector of parameters, the gradient of the MSE is calculated across the multiple parameters. The direction in which the MSE falls fastest is the negative gradient at that initial starting set of parameters. The parameters are then adjusted in the direction of steepest descent so that the MSE is reduced, and the gradient is calculated again. This process is repeated to minimize the MSE until the gradient converges to 0, at which point there is a local minimum. While gradient descent algorithms are useful in finding the minimum of smooth convex functions, not much is known about the error surface in reservoir computing. The error surface could be discontinuous and contain multiple local minima in which the gradient descent algorithm could get stuck and miss the absolute minimum. In order to get a better sense of the kind of error surface that exists in reservoir optimization, I conduct a grid search, detailed in the next subsection.

Grid Search

The basic grid search explores the MSE outcome for multiple reservoirs across several ranges of global parameters. Grid searches are a computationally expensive, brute-force method of finding optimal parameters, since multiple calculations need to be made at each point along the grid. However, for the purposes of this study, it provides the ability to visualize the error surface over many reservoir parameters. In this grid search, I explore the error surface as a function of reservoir size, leaking rate, and spectral radius. Data processing follows the same procedures as described in section 3.2. However, instead of simply adjusting one parameter, I am able to tune three different parameters. I study the slices of the

error surface at different reservoir sizes $N_x$ = 100, 400, 700, and 1000. The surface plots show the MSE at different values of the leaking rate, $0.05 \leq \alpha \leq 0.3$, and the spectral radius, $0.8 \leq \rho \leq 1.2$. At each point on the grid defined by the global parameters, 10 different randomly connected reservoirs are created. Because of the computational costs associated with a grid search, I take large step sizes in all of my parameters to minimize the number of searches that need to be run.

Results

While I am not able to conduct an exhaustive grid search, the purpose of the grid search is to better understand the error surface across reservoir parameters. The surface plots in figures 3.7, 3.8, 3.9, and 3.10 show the mean of the MSE outcomes for the 10 reservoirs created at each point. Figure 3.7 excludes values with leaking rate $\alpha < 0.2$ because the MSE exploded below that value. I have already discussed in section 2.5 that smaller reservoir sizes do not have the capacity for a large memory of the input, so it makes sense that many parameter values in the grid search returned extremely high MSE values. The MSE surface also tends to be lower for the $N_x$ = 1000 reservoirs compared to the other three plots. Studying these surface plots across multiple reservoir sizes, the MSE seems to be minimized for higher leaking rates, $\alpha > 0.2$, which corresponds to the values of $\alpha$ I have been using in my studies of Mackey-Glass.
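Structurally, the grid search is a set of nested loops over the global parameters, averaging the test MSE of several randomly generated reservoirs at each grid point. In the sketch below, the `evaluate` callable is a placeholder standing in for the build-train-predict routine of chapter 2, and the dummy function in the usage example exists only to make the snippet runnable.

```python
import numpy as np
from itertools import product

def grid_search(evaluate, sizes, alphas, rhos, n_trials=10):
    """Average the MSE of n_trials random reservoirs at every (N_x, alpha, rho) grid point.

    evaluate(N_x, alpha, rho, seed) is assumed to build, train, and test one
    reservoir and return its test MSE.
    """
    results = {}
    for N_x, alpha, rho in product(sizes, alphas, rhos):
        errors = [evaluate(N_x, alpha, rho, seed) for seed in range(n_trials)]
        results[(N_x, alpha, rho)] = np.mean(errors)
    return results

# Example usage with a dummy evaluation function standing in for the real ESN run
dummy = lambda N_x, alpha, rho, seed: 1.0 / N_x + abs(alpha - 0.3) + abs(rho - 1.0)
results = grid_search(dummy,
                      sizes=[100, 400, 700, 1000],
                      alphas=np.arange(0.05, 0.31, 0.05),   # 0.05 <= alpha <= 0.3
                      rhos=np.arange(0.8, 1.21, 0.1))       # 0.8  <= rho  <= 1.2
print(min(results, key=results.get))   # grid point with the lowest mean MSE
```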

Figure 3.7: MSE surface for $N_x$ = 100 reservoirs across $\alpha$ and $\rho$.

Figure 3.8: MSE surface for $N_x$ = 400 reservoirs across $\alpha$ and $\rho$.

Figure 3.9: MSE surface for $N_x$ = 700 reservoirs across $\alpha$ and $\rho$.

Figure 3.10: MSE surface for $N_x$ = 1000 reservoirs across $\alpha$ and $\rho$.

3.4 Summary

In this chapter I examined the Mackey-Glass system and how its numerical solutions could be used as input to the echo state network and as a demonstration of the network training process. I specifically studied the bias input to the reservoir and how the scaling constant for the bias input affected the reservoir predictions and error, and determined a wide optimal range for the bias input. Studying the output weight matrix, I determined there was no direct relationship between the mean of the output weights and the prediction ability of the reservoir. I was also able to study the multivariate impact of the reservoir parameters on the MSE in a grid search across multiple reservoir sizes, leaking rates, and spectral radii. The grid search suggests using leaking rate values $\alpha > 0.2$; the leaking rate used in the Mackey-Glass studies is $\alpha = 0.3$, which confirms I have been operating in an optimal range of reservoir parameters. As I move forward, these initial studies will guide my understanding of the reservoir predictions of financial data.

Chapter 4 Financial Forecasting and Neural Networks

This chapter provides a cursory examination of the history of analysis in financial markets as it relates to forecasting and nonlinear dynamics. In section 4.1, I describe the complex dynamics of the financial input. As a result of this nonlinearity, neural networks have historically been used as forecasting models with some success, and the following section elaborates on some of the results that artificial neural networks have achieved in finance. Finally, in the last section, I discuss the specific echo state network approach used as part of the Neural Forecasting Competition in 2007, in which different neural networks were used to predict unknown financial data.

4.1 Nonlinear Characteristics

In finance, many qualitative and quantitative forecasting techniques have been applied to try to predict fluctuations in share prices. However, according to the efficient market hypothesis, one cannot outperform the market and achieve returns in excess of the average market returns based on empirical information, because the current market price reflects all known information about the future value of the stock, and investors cannot purchase undervalued stock or sell at inflated rates. This hypothesis is highly controversial and heavily debated, and an entire body of study exists to produce methods and models that aim to have substantial predictive ability for stock prices. Prediction is extremely complex and difficult, as the stock market is the result of interactions of

many variables and players. Many previous techniques used various linear models to attempt to explain these numerous complex interactions. Chaos theory offers an explanation of the underlying generating process, suggesting the stock market may be a deterministic system. This section outlines the history of chaos in the description of financial markets, the determination of chaos in a time series, and possible causes for nonlinear behavior in the financial model. As a result of the failure of linear models, models of complex systems were developed to explain nonlinear processes in financial markets. The foundation for chaos theory in the realm of finance was established by Mandelbrot in 1963 in a study of cotton spot price data [21], where he found that price changes did not fit a normal distribution. The theoretical justification for price changes fitting a Gaussian was given by Osborne [27] using the central limit theorem: if transactions are a true random walk, they are independently and identically distributed, and price changes should be normally distributed since the price changes are the simple sum of the IID transactions. Mandelbrot discovered that the price changes differed from a normal distribution [21]. In particular, he found long tails in the distribution of price changes: there are more observations at the extreme ends than a normal distribution would predict. Furthermore, his studies of cotton prices did not indicate finite variance. He expected to see the variance converge as he sampled more cotton price changes, since he was increasing the number of observations; increasing the sample size did not cause the variance to converge as would be expected for a Gaussian distribution, in which the variance of each observation is identical. He suggested instead that the distribution followed a stable Paretian distribution. Fama confirmed that a stable Paretian distribution was a better fit for price returns than a Gaussian in his studies of thirty stocks in the Dow-Jones Industrial Index [4]. Other studies, by Praetz in 1972, suggested a scaled t-distribution as an alternative [28]. Later, in section 5.2, I fit the financial data to both a Gaussian and a Lorentzian distribution, which is a stable Paretian distribution with no skew and no defined variance, the latter being an important characteristic Mandelbrot observed in cotton prices. Mandelbrot's work established the basis of chaos theory in the world of finance. The rejection of the random walk theory brought chaos and determinism into the study of economics [24]. Studies applying correlation dimension analysis have found evidence of nonlinearities in share price data [25]; these studies sought to determine whether a time series is independently and identically distributed, as would be expected for a random walk. Studies using the correlation dimension have gone on to reject the Gaussian hypothesis offered by Osborne and to argue for the existence of chaos in

financial markets [10]. Many reasons have been offered to suggest chaotic dynamics within the stock market. Deterministic processes could be the result of behavioral economics, which dictates irrationality in investment decisions [35]: rather than making purely logical decisions based on the current market, the psychology of fear and emotion plays a deterministic role in risk-taking and investment decisions. A Paretian distribution implies a Paretian market, which is inherently more risky because there are more abrupt changes, the variability is higher, and the probability of loss is greater [4]. In general, the complex interactions of the many variables and players in the stock market could behave nonlinearly. Many researchers claim there exists some degree of determinism in the market, driven by high-dimensional attractors. This debate on chaos will be important as I study the use of neural networks in the financial domain.

4.2 History of Neural Networks in Finance

Throughout the years of research on stock prices, studies have indicated that a major roadblock is the lack of mathematical and statistical techniques in the field [35]. Some of the benefits of neural networks lie in their ability to learn to approximate functions. Given the vast amount of data available in the financial world, neural networks have become invaluable tools for detecting complex processes and modeling these relationships. Networks were introduced as models for time series forecasting as early as 1990, and most early studies focused on predicting returns from indices. In the prediction of returns on the Tokyo Stock Price Index (TOPIX), using data covering the period January 1985 to September 1989, Kimoto and collaborators [16] compared the performance of modular neural networks trained through backpropagation with multiple regression analysis, a traditional method of forecasting. The modular neural networks they created were series of independent feed-forward neural networks with three layers: an input layer, a hidden layer of nodes, and an output layer. The output of these networks was combined to inform buy or sell decisions for TOPIX. They were able to determine a higher correlation coefficient between the target data and the system of networks (0.527) than for the individual networks. In 1990, Kamijo and Tanigawa [15] used a recurrent neural network approach to model candlestick charts, charts combining line charts and bar charts to illustrate the high, low, open, and close price for each day. They were able to use backpropagation to train the RNN to recognize specific patterns in the price chart but

were unable to draw conclusions about the predictive abilities of the network. Neural networks were used to predict the daily change in direction of the S&P 500 in studies conducted by Trippi and DeSieno [33] and by Choi, Lee, and Lee [2]. Trippi and DeSieno [33] developed composite rules, which are different ways to combine the trained output from multiple networks into Boolean (rise or fall) signals. Their results show that they are 99% confident their best composite rule would outperform a randomly generated trading position, and they estimated a potential annual return of $60,000. Choi, Lee, and Lee [2] used neural networks to make rise-or-fall predictions for the stock index and were able to make higher annualized gains than previous methods. To address the limitations of backpropagation in dealing with noise and non-stationary data [30], other research uses a hybrid method incorporating a rules-based system from machine learning along with recurrent neural networks. The rules-based technique categorizes the stock data into cases based on empirical rules derived from past observations. Studies by Tsaih, Hsu, and Lai, incorporating a hybrid method that uses a rules-based system to generate input data for the neural networks, found better returns over a six-year period compared to a strategy where the stock was bought and held over the same period [34]. Studies are still working to improve the training algorithms for neural networks. In the next section I will study how the echo state network training algorithm, described in earlier chapters, can be implemented to handle financial data.

4.3 Neural Forecasting Competition

In the spring of 2007, echo state networks outperformed many other neural network training algorithms in the NN3 Artificial Neural Network and Computational Intelligence Forecasting Competition, where the objective was to forecast 111 monthly financial time series by 18 months. The submission using an echo state network approach by Ilies, Jaeger, Kosuchinas, Rincon, Sakenas, and Vaskevicius was ranked first in forecasting the competition time series [26]. The team used the same recurrent neural network architecture described in chapter 2 to train blocks of the 111 competition time series with high levels of success [11]. Based on their report, the 111 time series were divided into 6 temporal blocks. Each of these blocks was preprocessed using seasonal decomposition methods before being used to train a collective of 500 echo state networks. The reservoir parameters were manually tuned using part of the time series as a validation set [11]. The promising results

from the competition give rise to many more questions about the application of echo state networks in finance, which I will address in the next chapter.

Chapter 5 Network Predictions

In this chapter, I apply all I have learned about echo state networks and finance to test the ability of the echo state network to handle financial data. In the first section, I examine the data set I will be using, the daily closing price of the S&P 500 (GSPC), as well as the data processing measures implemented in the project. In section 5.2 I examine how I benchmark the reservoir performance against a random guess drawn from a distribution that accurately models the input data u(n). In order to optimize parameters simultaneously, I conduct a series of grid searches in section 5.3. I present the results of the reservoir predictions in section 5.5. I then study other methods which may improve prediction, such as the committee method.

5.1 S&P 500 Data

The data that I use as both input data and target data is the S&P 500, an American stock index of the top 500 corporations in the New York Stock Exchange. Specifically, I used the daily close price before dividend adjustments for each day, retrieved from Yahoo Finance [38]. The time period of data used ranges from March 2007 to March. The figure below shows the raw input data from Yahoo Finance.

Figure 5.1: S&P 500 daily close prices from March 2007 to March.

Data Processing

In order to use stock prices as input, I need to apply data processing techniques that allow the reservoir to run efficiently. Raw input could lead to unstable reservoir dynamics because the raw input data could lie in a range that does not produce meaningful reservoir activations. Important considerations in processing the data include converting the non-stationary financial time series into a stationary set. Once the data set is detrended, I also need to process the data to ensure the scale is within the correct range for the reservoir dynamics. Using a stationary representation of the financial data is important in ensuring proper reservoir performance [31]. A stationary process is one whose distribution is constant over time; in other words, the mean and the variance of the data remain constant over time. Several methods for detrending are considered for this project, including simple differencing, logarithmic differencing, and relative differencing. For the original n-length time series $y(t) = y_1, y_2, \ldots, y_n$, the simple difference is

$$y_{diff}(t) = y_2 - y_1,\; y_3 - y_2,\; \ldots,\; y_n - y_{n-1}. \qquad (5.1)$$

Another method to detrend the data is logarithmic differencing,

$$y_{log}(t) = \log(y_2/y_1),\; \log(y_3/y_2),\; \ldots,\; \log(y_n/y_{n-1}). \qquad (5.2)$$

The last method, similar to the simple difference, is the relative difference,

$$y_{rel}(t) = (y_2 - y_1)/y_1,\; (y_3 - y_2)/y_2,\; \ldots,\; (y_n - y_{n-1})/y_{n-1}. \qquad (5.3)$$

I chose to implement the relative difference between data points, as it indicates the change in stock price, which has explicit implications for stock returns; the sign of the relative difference gives the direction of the price trajectory. The following is a plot of the detrended data used in the project.

Figure 5.2: Relative difference of the S&P 500 data from figure 5.1.
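The three detrending transformations of Eqs. 5.1-5.3 are one-liners with numpy; the sketch below assumes the closing prices have already been loaded into an array, and the short price series at the end is a placeholder.

```python
import numpy as np

def simple_diff(y):
    """Eq. 5.1: consecutive differences y_{t+1} - y_t."""
    return np.diff(y)

def log_diff(y):
    """Eq. 5.2: logarithmic differences log(y_{t+1} / y_t)."""
    return np.diff(np.log(y))

def relative_diff(y):
    """Eq. 5.3: relative differences (y_{t+1} - y_t) / y_t, i.e. daily returns."""
    return np.diff(y) / y[:-1]

# Example with a short placeholder price series
prices = np.array([1500.0, 1512.5, 1498.0, 1503.2])
print(relative_diff(prices))   # signs give the direction of the price moves
```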

Another important consideration in data processing is the scaling of the input data. As discussed in section 2.5, the input should fall within the nonlinear range of the tanh function. Since many of the relative differences were below 10%, an input scaling factor between 1 and 50 was used. Figure 5.3 shows the MSE, averaged over 10 reservoirs, for scaling factors between 1 and 50. This is the result for a reservoir with spectral radius $\rho = 0.8$, $N_x$ = 500 neurons, and leaking rate $\alpha = 0.1$. The plot shows that the input scale that minimizes the MSE is 1, which implies that the unscaled data may be a valid input to the reservoir. The range of input scales that causes a peak in the MSE should be avoided. The shape of the MSE curve is very interesting, but further studies are needed to understand the dynamics underlying the bell-shaped MSE curve.

Figure 5.3: MSE of the reservoir over input scaling factors between 1 and 50, for a reservoir with spectral radius $\rho = 1.1$, a reservoir size of 500 neurons, and leaking rate $\alpha = 0.1$.

5.2 Benchmarking

In order to test the reservoir performance, I compare the reservoir output to a random draw from a distribution. As discussed in section 4.1, according to random walk theory the distribution of price changes should be Gaussian: a random, independent, identically distributed variable should have a normal distribution. Therefore I fit a histogram of S&P 500 price changes to a Gaussian in figure 5.4, where the Gaussian distribution is defined by a mean $\mu$ and standard deviation $\sigma$ in

$$G(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}. \qquad (5.4)$$

The fitted parameters $\mu$ and $\sigma$ are given in table 5.1.

$\mu$ = …e-05 ± …
$\sigma$ = … ± …

Table 5.1: Gaussian fit parameters for the distribution of S&P 500 data.

In figure 5.4, the Gaussian does not fit the distribution very well; the reduced chi-square is much larger than 1, which means the distribution is not fully capturing the data. There are too many extreme values in the long tail of the distribution for the Gaussian to be a good fit. Furthermore, because the Gaussian drops off exponentially from the mean, the slope is too steep to capture all the points in the bell curve. In table 5.1, the fitted $\mu$ has a very high uncertainty, as the center may be variable. The Gaussian is not a good fit for the distribution of price changes, which also leads me to reject the Gaussian distribution as a way to model the input data.

Figure 5.4: Gaussian fit to the histogram of relative price changes in the S&P 500.

In order to find a distribution with a better fit, I attempted a few other fits using the Lorentzian distribution, also known as the Cauchy distribution, and the Voigt profile, which is a convolution of the normal and the Lorentzian distributions. Both of these distributions have thicker tails than the Gaussian and are therefore better able to capture the larger price changes within the distribution. The Lorentzian distribution is a special case of the stable Paretian distribution discussed in section 4.1, with no skewness. The probability density function of the Lorentzian is defined by the location x_0 and width γ as

$$ L(x, x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^{2}\right]}. \qquad (5.5) $$

Figure 5.5 shows the Lorentzian fit, and the parameters for the distribution are listed in table 5.2. The reduced chi-square for the Lorentzian fit is very close to 1, which indicates a good fit between the data and the distribution. The thicker tails of the Lorentzian are better able to capture the larger price changes that occur in the distribution. The uncertainty in the center location x_0 is relatively high compared to the value of the center location itself. However, since there was also trouble fitting µ in the Gaussian model, I assume that the center of the historical price change distribution is variable and difficult to fit.
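The exact fitting procedure is not reproduced here, but a simple way to estimate x_0 and γ is a maximum-likelihood fit with scipy.stats.cauchy; the sketch below also compares the probability of a move larger than 3% under the Lorentzian and Gaussian fits, using the same stand-in data as above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
returns = 0.01 * rng.standard_t(df=3, size=2000)   # heavy-tailed stand-in for the price changes

# Maximum-likelihood estimates of the Lorentzian (Cauchy) location x_0 and scale gamma.
x0, gamma = stats.cauchy.fit(returns)
print(f"x0 = {x0:.2e}, gamma = {gamma:.2e}")

# Tail-weight comparison: probability of a daily move larger than 3% under each fitted model.
mu, sigma = returns.mean(), returns.std()
p_lorentz = stats.cauchy.sf(0.03, loc=x0, scale=gamma) + stats.cauchy.cdf(-0.03, loc=x0, scale=gamma)
p_gauss = stats.norm.sf(0.03, loc=mu, scale=sigma) + stats.norm.cdf(-0.03, loc=mu, scale=sigma)
print("P(|x| > 0.03), Lorentzian:", p_lorentz)
print("P(|x| > 0.03), Gaussian:  ", p_gauss)
```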

Figure 5.5: Lorentzian fit to the histogram of relative price changes in the S&P 500.

x_0   …e-06 ± 7.81e-05
γ     … ± 7.81e-05
Table 5.2: Lorentzian fit parameters for the distribution of S&P 500 data.

Another view of the Lorentzian fit to the data, focused on the tails, is shown in figure 5.6. This figure shows that the tail of the Lorentzian distribution fits the data well. However, the Lorentzian does have slightly thicker tails than the data, so it would predict somewhat larger price fluctuations than actually occur.

Figure 5.6: A closer view of the tail of the relative price distribution and the Lorentzian fit; the data show fewer large price fluctuations than the Lorentzian would predict.

The next distribution that I attempt to fit to the data is the Voigt profile, a convolution of the Gaussian distribution, given by Eq. 5.4, and the Lorentzian distribution, given by Eq. 5.5. The convolution is defined by

$$ V(x, \sigma, \gamma) = \int G(x', \sigma)\, L(x - x', \gamma)\, dx'. \qquad (5.6) $$

The Voigt profile in figure 5.7 has almost the same reduced chi-square as the Lorentzian fit in figure 5.5; its parameters are shown in table 5.3. The Voigt profile has three parameters to fit, compared to two for both the Gaussian and the Lorentzian. Even with this additional parameter, the goodness of fit is almost identical to that of the Lorentzian fit. As with the Lorentzian distribution, the Voigt profile is better able to capture the thicker tail of the price change distribution, where the large fluctuations lie.
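scipy.special.voigt_profile evaluates the convolution in Eq. (5.6) directly, so the three-parameter fit can be set up the same way as the histogram fits above; the starting values and bounds here are guesses, and the data remain a stand-in.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import voigt_profile

def voigt(x, x0, sigma, gamma):
    # Voigt profile centered at x0: Gaussian of width sigma convolved with a Lorentzian
    # of half-width gamma, as in Eq. (5.6).
    return voigt_profile(x - x0, sigma, gamma)

rng = np.random.default_rng(1)
returns = 0.01 * rng.standard_t(df=3, size=2000)

density, edges = np.histogram(returns, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

p0 = [0.0, 0.5 * returns.std(), 0.5 * returns.std()]   # rough starting values
bounds = ([-0.1, 1e-6, 1e-6], [0.1, 1.0, 1.0])         # keep sigma and gamma positive
popt, pcov = curve_fit(voigt, centers, density, p0=p0, bounds=bounds)
perr = np.sqrt(np.diag(pcov))
for name, val, err in zip(("x0", "sigma", "gamma"), popt, perr):
    print(f"{name} = {val:.2e} +/- {err:.2e}")
```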

As with the previous two models, the fitted parameter for the center x_0 is uncertain in the Voigt model. The σ value is also relatively uncertain; the uncertainty of σ in the fit is 13.57% of the σ value.

Figure 5.7: Voigt profile fit to the histogram of relative price changes in the S&P 500.

x_0   …e-05 ± 7.70e-05
γ     … ± …
σ     … ± …
Table 5.3: Voigt profile fit parameters for the distribution of S&P 500 data.

Ultimately, to measure the performance of the reservoir, I use values drawn randomly from the Lorentzian distribution that best approximates the inputs. The reduced chi-square for the Lorentzian fit is very similar to that of the Voigt profile, and implementing a random selection from the Lorentzian is more straightforward. Each such draw simulates a random guess for the price change on a given day, based on the distribution of all previous days.
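A sketch of this baseline, assuming a simple split into historical and test days: fit the Lorentzian to the historical changes, draw one random value per test day, and compute the MSE against the actual changes. The data and split are stand-ins. Note that because the Lorentzian has infinite variance, occasional draws far out in the tail can dominate the benchmark MSE.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
returns = 0.01 * rng.standard_t(df=3, size=1500)   # stand-in for the relative-difference series

# Fit the Lorentzian to the "historical" part and guess randomly for each test day.
train, test = returns[:1000], returns[1000:]
x0, gamma = stats.cauchy.fit(train)
random_guess = stats.cauchy.rvs(loc=x0, scale=gamma, size=len(test), random_state=3)

mse_random = np.mean((random_guess - test) ** 2)
print(f"random-draw baseline MSE = {mse_random:.3e}")
```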

This random guess establishes the baseline for a randomly determined forecasting model. Comparing this baseline to an echo state network, I can measure how well the trained network predicts price changes by comparing the MSE of each method.

5.3 Parameter Optimization

In order to optimize the reservoir to make the best predictions possible, there are a large number of global parameters that can be tuned. I discussed these parameters in section 2.5. Initially, I optimized these parameters individually to find the value of each that reduces the MSE, which is the standard practice [17]. However, many of these parameters may affect each other, so the best optimization strategy would determine the parameters simultaneously. I considered a gradient descent algorithm that would calculate the gradient of the MSE with respect to the different parameters and move in the direction of steepest descent to find a local minimum. However, this method may not be effective where there are many local minima or where the surface is discontinuous. Therefore, to better understand the landscape of the MSE as a function of multiple parameters, I conduct a coarse grid search across the leaking rate, spectral radius, and reservoir size. For each set of parameters I create 10 reservoirs and average the MSE across them. Figures 5.8, 5.9, 5.10, and 5.11 show the surface of the MSE as a function of spectral radius and leaking rate for reservoir sizes of 100, 400, 700, and 1000, respectively. These figures highlight that while the MSE does not vary greatly over much of the parameter space, certain choices of parameters cause the reservoir to fluctuate strongly and return output values with extremely high errors. Compared to the grid search in section 3.3, where α ≈ 0.3 gave the best MSE results, the MSE for financial input data is minimized for leaking rate α ≈ 0.1 and spectral radius ρ ≈ 1. There are many differences between the Mackey-Glass data and the stock data that could affect the optimal parameters for the reservoir output. In the case of Mackey-Glass, the solution to Eq. 3.2 was taken with very small time steps of Δt = 0.1. This makes the Mackey-Glass input much smoother than the financial data series, which, as seen in figure 5.2, experiences many shocks. The smaller leaking rate α could minimize the impact of these sudden shocks by integrating the input updates more slowly over time. The smaller optimal spectral radius implies a shorter memory is required for the system: financial markets may be less dependent on price changes in the distant past than on more recent price shifts.
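A skeleton of such a coarse grid search is sketched below. The grid ranges are illustrative, and average_mse is a stand-in for the actual evaluation (build 10 random reservoirs with the given parameters, train each on the price-change series, and average the test MSE); a reservoir-training sketch appears in section 5.4.

```python
import itertools
import numpy as np

def average_mse(leak_rate, spectral_radius, size, n_trials=10):
    """Stand-in scoring function: in the real search this would build `n_trials` random
    reservoirs with these global parameters, train each on the data, and return the mean MSE."""
    rng = np.random.default_rng()
    return float(np.mean(rng.random(n_trials)))   # dummy value so the skeleton runs

leak_rates = np.round(np.arange(0.1, 1.01, 0.1), 2)       # illustrative grid
spectral_radii = np.round(np.arange(0.2, 1.61, 0.2), 2)
sizes = (100, 400, 700, 1000)

results = {}
for size in sizes:
    for alpha, rho in itertools.product(leak_rates, spectral_radii):
        results[(size, alpha, rho)] = average_mse(alpha, rho, size)

best = min(results, key=results.get)
print("lowest average MSE at (size, alpha, rho) =", best)
```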

However, despite the differences in how the spectral radius and leaking rate affect the Mackey-Glass and financial inputs, the reservoir size functions similarly in both cases: increasing the reservoir size increases the stability of the MSE. The slope of the MSE surface for reservoir size N_x = 1000 is lower than that for N_x = 100, for both the Mackey-Glass input and the S&P 500 price-change input. In preparing the reservoir model to forecast S&P 500 price changes, I therefore keep the parameters in the optimal range, with α ≈ 0.1, ρ ≈ 1, and a reservoir size N_x as large as is computationally feasible.

Figure 5.8: MSE surface across spectral radius and leaking rate with reservoir size N_x = 100.

Figure 5.9: MSE surface across spectral radius and leaking rate with reservoir size N_x = 400.

Figure 5.10: MSE surface across spectral radius and leaking rate with reservoir size N_x = 700.

Figure 5.11: MSE surface across spectral radius and leaking rate with reservoir size N_x = 1000.

5.4 Reservoir

The reservoir used in forecasting the S&P 500 data is very similar to the reservoir system used in section 3.2. From figure 5.3, the optimal scaling factor is 1, which I set for all reservoirs forecasting the market index. Based on section 5.3, I use a reservoir size N_x = 1000, leaking rate α = 0.1, and spectral radius ρ = 0.8. For the input, I use the first 1000 data points u_0, u_1, ..., u_1000 of the training sequence, computed as the relative difference in Eq. 5.3, to drive the reservoir. The first 20 activations are disregarded and not used in training the network, since they serve to initialize the reservoir. The weighted output of the reservoir nodes is then trained on the target data, which comes from the relative difference of the input S&P 500 data, y_target(n) = u(n). This gives the trained output weight matrix W_out, which is used to predict the price change one day ahead. Because the market exhibits such complex behavior, it is hard to expect even a trained reservoir to make highly accurate predictions.
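A compact, self-contained sketch of the training procedure just described: a leaky-integrator tanh reservoir driven by the first 1000 relative differences, a 20-step washout, and a ridge-regression readout. The weight initialization, the ridge parameter, and the one-day-ahead target convention used here are assumptions for illustration, and the input series is a stand-in for the S&P 500 data.

```python
import numpy as np

rng = np.random.default_rng(4)
u = 0.01 * rng.standard_t(df=3, size=1101)    # stand-in for the relative-difference series

# Global parameters from sections 5.1 and 5.3; the ridge strength is an added assumption.
N_x, alpha, rho, washout, ridge = 1000, 0.1, 0.8, 20, 1e-6

# Random input and recurrent weights; W is rescaled so its spectral radius equals rho.
W_in = rng.uniform(-0.5, 0.5, size=(N_x, 2))              # column 0 multiplies a constant bias
W = rng.uniform(-0.5, 0.5, size=(N_x, N_x))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(inputs):
    """Leaky-integrator tanh update; returns the state x(n) after each input u(n)."""
    x = np.zeros(N_x)
    states = []
    for u_n in inputs:
        x_tilde = np.tanh(W_in @ np.array([1.0, u_n]) + W @ x)
        x = (1 - alpha) * x + alpha * x_tilde
        states.append(x.copy())
    return np.array(states)

# Drive the reservoir with the first 1000 points; here the target is taken as the next
# day's relative change (one-day-ahead prediction).
train_u, target = u[:1000], u[1:1001]
X = run_reservoir(train_u)[washout:]                      # discard the first 20 activations
Y = target[washout:]

# Ridge-regression readout for row-stacked states: W_out = (X^T X + beta I)^(-1) X^T Y.
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(N_x), X.T @ Y)
print("training MSE =", np.mean((X @ W_out - Y) ** 2))
```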
