
Locally Connected Recurrent Networks

Lai-Wan CHAN and Evan Fung-Yu YOUNG
Computer Science Department, The Chinese University of Hong Kong
New Territories, Hong Kong
lwchan@cs.cuhk.hk

Technical Report: CS-TR

Abstract

The fully connected recurrent network (FRN) using the on-line training method, Real Time Recurrent Learning (RTRL), is computationally expensive. It has a computational complexity of O(N^4) and a storage complexity of O(N^3), where N is the number of non-input units. We have devised a locally connected recurrent model which has a much lower complexity in both computational time and storage space. The ring-structured recurrent network (RRN), the simplest kind of locally connected network, has the corresponding complexities of O(mn+np) and O(np) respectively, where p, n and m are the numbers of input, hidden and output units respectively. We compare the performance of RRN and FRN in sequence recognition and time series prediction. In the sequence recognition task we tested the networks' temporal memorizing power and time warping ability. In the time series prediction task, we used both networks to train on and predict three series: a periodic series with white noise, a deterministic chaotic series and the sunspots data. Both tasks show that RRN needs a much shorter training time and that the performance of RRN is comparable to that of FRN.

1 Introduction

Recurrent neural networks are attractive models because they can carry information through time. However, the problem of training recurrent models to encode temporal information has not been solved satisfactorily. To make them popular in real-life applications, we need an efficient and powerful training method that taps the full power of a recurrent network. Many researchers have worked on this problem over the past decades and there are some exciting results on training procedures. Back Propagation Through Time (BPTT) [4] is an efficient learning algorithm, but it cannot be run on-line and is impractical for tasks with input signals of unknown length. Real Time Recurrent Learning (RTRL) [9] is a powerful on-line learning algorithm, but it is extremely inefficient and is also non-local in space. There are also some other variants of RTRL [12, 7].

Alternatively, non-gradient-descent methods like NBB [5] and TD [6] have many desirable features, but they do not guarantee convergence. The challenge of devising new learning algorithms for recurrent models is thus an attractive, interesting and useful task.

We usually consider a fully connected architecture for a recurrent model because this very general architecture gives the model great freedom to build its own internal representations for encoding temporal information. This general architecture, on the other hand, imposes a great burden on the training process. In a fully recurrent network, any weight w_ij can affect the activation of any unit u_k within one time step, since unit u_i is connected to unit u_k directly. Therefore the weight w_ij is "temporally" close to unit u_k although it may not be "spatially" close. In fact, in RTRL we need information from every other unit in order to update the weight w_ij. This inherent property of being fully connected slows down the training process and also makes the process non-local in space. The fully connected architecture is also unnatural as a neural system, since every unit can affect every other unit directly within one time step. Furthermore, the model is not well suited to parallelization: the communication time will be long, as each processing unit must communicate with every other one at each epoch.

In our work, we abandon the fully connected architecture and consider one in which every hidden unit is connected to only a number of its neighbors; we call this a "locally connected" recurrent network. We have devised an on-line learning algorithm for this new kind of network model such that information can be carried indefinitely through time, although the effect of a weight is not propagated forward in full. The new learning process has a much lower computational complexity in both time and space in comparison with the other approaches, and it is also more local in space in the sense that a weight update needs only the information from those units in its vicinity. Besides, this algorithm has much flexibility for mapping onto different parallel architectures, and the parameters can be chosen such that the computational process is totally local in space, i.e. the computations in a processing unit rely only on those units directly connected to it.

In this paper, we first describe the network architecture in section 2. Then we describe the learning algorithm and discuss its computational complexity, storage complexity and other important aspects in section 3. Section 4 describes the simplest kind of locally connected recurrent model, called the ring-structured recurrent network (RRN). Sections 5 and 6 compare RRN with the fully recurrent network (FRN) [9] in the tasks of sequence recognition and time series prediction respectively, in terms of their performance in both training and testing.

2 Locally Connected Recurrent Networks

2.1 Network Topology

The model is made up of three layers of units: the input layer, the hidden layer and the output layer. Each hidden node is connected to every input unit and every output unit, as shown in Figure 1. The inter-layer connections all run forward (the solid lines in Figure 1), and recurrent links (the dotted curves in Figure 1) exist in the hidden layer only.

Figure 1: Architecture of a Locally Connected Recurrent Network

In the hidden layer, each unit is connected to some of the other hidden units, forming a partially connected structure. Different structures can be formed depending on the way in which the hidden nodes are connected. For simplicity, we consider only homogeneous structures, so every unit in the structure is identical in terms of connections. Three common examples are shown in Figure 2. For example, each unit in a ring is connected to two others (Figure 2a), while each unit in a grid is connected to four (Figure 2b) and each unit in an n-dimensional cube (hypercube) is connected to n (Figure 2c).
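The structures of Figure 2 are easy to write down explicitly. The sketch below is our own illustration, not code from the paper: it builds the incoming-neighbour set of each hidden unit for a ring and for a grid. Wrap-around is assumed for the grid so that every unit has exactly the same number of links; without it, boundary units have fewer neighbours.

```python
# Illustrative sketch (not from the paper): incoming neighbour sets for two of
# the homogeneous hidden-layer structures of Figure 2.  Each hidden unit is
# linked to itself and to its nearest neighbours.

def ring_neighbors(n):
    """Ring structure: unit i is linked to itself and its two neighbours (a 3-net)."""
    return [{i, (i - 1) % n, (i + 1) % n} for i in range(n)]

def grid_neighbors(rows, cols):
    """Grid structure with wrap-around: self plus left/right/up/down (a 5-net)."""
    def idx(r, c):
        return (r % rows) * cols + (c % cols)
    nbrs = []
    for r in range(rows):
        for c in range(cols):
            nbrs.append({idx(r, c), idx(r - 1, c), idx(r + 1, c),
                         idx(r, c - 1), idx(r, c + 1)})
    return nbrs

if __name__ == "__main__":
    print(ring_neighbors(6)[0])          # {0, 1, 5}: three incoming links
    print(len(grid_neighbors(4, 4)[5]))  # 5 incoming links per unit
```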

Figure 2: Connections in the Hidden Layer of a Locally Connected Recurrent Network. The black dots are the hidden neurons and the solid lines are the recurrent weights (for simplicity, only one line is drawn between two nodes). (a) ring structure (b) grid structure (c) cubic structure

Figure 3: Locally Connected Recurrent Networks after Unfolding

We define the term "neighborhood" of a hidden unit u_j as follows:

    Neighborhood of unit u_j:  Q_j = { u_i | u_i is a hidden unit and u_i is connected to u_j directly }

and the degree of connectivity:

    q = max_j [ cardinality of Q_j ]

The neighborhood of a hidden node is thus the set of all hidden units having a direct connection to that node. A network is called a q-net if each hidden unit has q incoming recurrent connections (including the self-feedback loop). Therefore a ring is a 3-net, a grid is a 5-net and an n-dimensional cube is an (n+1)-net.

2.2 Subgrouping

The hidden layer can be unfolded in time to form a feedforward structure, for ease of visualising the subgroups. Figure 3 shows the connections in a 3-net and a 5-net after unfolding. We divide the hidden units into overlapping subgroups such that each unit u_j is at the center of one subgroup G_{j,τ,q}:

    G_{j,τ,q} = { u_i | u_i is a hidden unit in a q-net and u_i affects u_j in at most τ time steps }

τ is a parameter which determines how far an effect propagates among the units. Figure 4 gives an example of a subgroup in a 3-net and a 5-net when τ is equal to two. These subgroups have the following properties:

- The subgroups overlap with one another.
- All subgroups have the same size, and the size depends on τ and the degree of connectivity q (except in boundary cases). The maximum size of a subgroup is denoted by r, and r = q + (q-1)(τ-1).
- There are n subgroups in total, where n is the number of hidden units.
- Each hidden unit belongs to r subgroups, where r is the size of one subgroup, and it is the center of exactly one of them.
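The subgroups can be computed directly from the neighbour structure. The short sketch below (our own illustration; the function names are placeholders) finds G_j for a ring by walking at most τ (here `tau`) steps backwards through the incoming links, and checks the size formula r = q + (q-1)(τ-1) stated above.

```python
# Hedged sketch: computing the overlapping subgroups G_{j,tau,q} of Section 2.2
# for a given neighbour structure.  u_i belongs to G_j if u_i can affect u_j in
# at most tau time steps, i.e. if u_j is reachable from u_i in <= tau hops.

def ring_neighbors(n):
    # incoming neighbours of each unit in a ring, self-loop included (a 3-net)
    return [{i, (i - 1) % n, (i + 1) % n} for i in range(n)]

def subgroup(j, incoming, tau):
    """All hidden units that can affect unit j in at most tau time steps."""
    reached = {j}
    frontier = {j}
    for _ in range(tau):
        frontier = set().union(*(incoming[k] for k in frontier)) - reached
        reached |= frontier
    return reached

if __name__ == "__main__":
    q, tau, n = 3, 2, 10
    incoming = ring_neighbors(n)
    g0 = subgroup(0, incoming, tau)
    r = q + (q - 1) * (tau - 1)        # maximum subgroup size from Section 2.2
    assert len(g0) == r == 5           # e.g. {8, 9, 0, 1, 2} on a ring of ten units
    print(sorted(g0), r)
```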

Figure 4: Subgrouping in Locally Connected Recurrent Networks. (a) 3-net (b) 5-net; the subgroup centered at the unfilled unit.

Since τ and q are fixed in a particular network model, we will omit these subscripts from G_{j,τ,q} in the following derivation.

2.3 Learning Algorithm

The training algorithm obeys the following rule:

    If a weight w_ij can only affect the activation of a unit u_k after at least τ+1 time steps, its effect on u_k is neglected.

where τ is the fixed parameter described in the previous section. From the derivation of the backpropagation of errors in a feedforward network, the error terms in each consecutive layer are diminished by a factor equal to the derivative of the sigmoid function f(x), i.e. f(x)[1-f(x)]. This factor is at most 0.25, so the effect of w_ij diminishes as it is propagated through time, provided the connection weights are not exceptionally large. Therefore, this rule of neglecting temporally distant terms is justified. Now w_ij affects only a fixed set of units, { u_k | u_k ∈ G_i }, and u_k is affected only by a fixed set of weights, { w_ij | u_i ∈ G_k }. An example of this is shown in Figure 5.
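A tiny numerical illustration of the decay argument (our own arithmetic, not a result from the paper): each additional time step multiplies the propagated effect by a sigmoid derivative of at most 0.25, so the contribution of a temporally distant weight shrinks at least geometrically when the weights are moderate.

```python
# Attenuation bound of an effect propagated through tau extra time steps,
# assuming sigmoid units and weights that are not exceptionally large.
for tau in range(1, 5):
    print(tau, 0.25 ** tau)   # 0.25, 0.0625, 0.015625, 0.00390625
```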

Figure 5: Effect Propagation in a Locally Connected Recurrent Network. (Information about the dotted-line weight is kept in five units; the unfilled unit keeps information about the solid weights only.)

Suppose there are p input units, n hidden units and m output units in a q-net, and let the capital letters P, N and M denote the set of input units, the set of hidden units and the set of output units respectively. Each hidden unit is connected to all input units and all output units, while the hidden units themselves are interconnected as described above. At time step t, each hidden unit u_i computes its output as:

    y_i(t) = g(x_i(t))                                                      (1)
           = g( \sum_{u_j \in Q_i \cup P} w_{ij} z_j(t) ),   u_i \in N      (2)

where
    z_j(t) = y_j(t-1) if u_j \in Q_i, and z_j(t) = I_j(t) if u_j \in P,
    g can be any differentiable function,
    x_i(t) is the total input to u_i at time t,
    w_{ij} is the weight running from u_j to u_i,
    Q_i is the neighborhood of u_i,
    I_k(t) is the k-th input at time t.

There is no direct connection between the input layer and the output layer. Each output unit u_i simply computes its activation at time t as:

    y_i(t) = g(x_i(t))                                                      (3)
           = g( \sum_{u_j \in N} w_{ij} z_j(t) ),            u_i \in M      (4)
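To make the dynamics concrete, here is a minimal forward-pass sketch of equations (1)-(4). It assumes g is the logistic sigmoid and a particular array layout for the weights; the names (`step`, `W_hh`, `W_hx`, `W_oh`) are ours, not the paper's.

```python
# Minimal sketch of equations (1)-(4) for a locally connected recurrent net.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step(I_t, y_prev, incoming, W_hh, W_hx, W_oh):
    """One time step: returns hidden activations y(t) and output activations."""
    n = len(incoming)
    y = np.empty(n)
    for i in range(n):
        nbrs = sorted(incoming[i])                    # Q_i, self-loop included
        x_i = W_hh[i] @ y_prev[nbrs] + W_hx[i] @ I_t  # total input to u_i
        y[i] = sigmoid(x_i)                           # equations (1)-(2)
    out = sigmoid(W_oh @ y)                           # equations (3)-(4)
    return y, out

if __name__ == "__main__":
    p, n_hidden, m = 2, 6, 1
    incoming = [{i, (i - 1) % n_hidden, (i + 1) % n_hidden} for i in range(n_hidden)]
    rng = np.random.default_rng(0)
    W_hh = rng.normal(0, 0.1, (n_hidden, 3))   # recurrent weights, one row per Q_i
    W_hx = rng.normal(0, 0.1, (n_hidden, p))   # input -> hidden weights
    W_oh = rng.normal(0, 0.1, (m, n_hidden))   # hidden -> output weights
    y, out = step(np.array([1.0, 0.0]), np.zeros(n_hidden), incoming, W_hh, W_hx, W_oh)
    print(y, out)
```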

Only the output units contribute to the cost function:

    G = \sum_t E(t)                                                         (5)
    E(t) = (1/2) \sum_{u_k \in M} E_k(t)^2                                  (6)
    E_k(t) = d_k(t) - y_k(t)                                                (7)

where G is the total error, E(t) is the error of the whole network at time t, E_k(t) is the error of the k-th output unit at time t, and d_k(t) is the k-th target output at time t. Minimizing G teaches the network to have the y_i's imitating the d_i's. This can be done using gradient descent with a momentum term:

    \Delta w_{ij}(t) = -\eta \frac{\partial E(t)}{\partial w_{ij}} + \alpha \Delta w_{ij}(t-1)                                   (8)
                     = \eta \sum_{u_k \in M} E_k(t) \frac{\partial y_k(t)}{\partial w_{ij}} + \alpha \Delta w_{ij}(t-1)          (9)

where \eta is the learning rate, \alpha is the momentum coefficient, and \Delta w_{ij}(t) is the change in w_{ij} at time t.

The weights between the hidden layer and the output layer are non-recurrent, and their updates can easily be done by error backpropagation as in ordinary feedforward networks:

    \Delta w_{ij}(t) = \eta E_i(t) g'(x_i(t)) y_j(t) + \alpha \Delta w_{ij}(t-1),   u_i \in M, u_j \in N                        (10)
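As an illustration of equation (10), the following hedged sketch updates the hidden-to-output weights by ordinary backpropagation with a momentum term. The learning rate and momentum values are placeholders, not settings taken from the paper.

```python
# Sketch of the output-layer update of equation (10).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_output_weights(W_oh, dW_prev, d_t, y_hidden, eta=0.1, alpha=0.9):
    """One step of equation (10) for all hidden->output weights."""
    x_out = W_oh @ y_hidden
    y_out = sigmoid(x_out)
    E = d_t - y_out                              # equation (7)
    delta = E * y_out * (1.0 - y_out)            # E_i(t) * g'(x_i(t))
    dW = eta * np.outer(delta, y_hidden) + alpha * dW_prev
    return W_oh + dW, dW
```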

For the incoming connections to the hidden units, we can propagate the error terms one step back from the output layer to the hidden layer, as in equation (11), and update the weights according to equation (12):

    e_k(t) = \sum_{u_l \in M} E_l(t) g'(x_l(t)) w_{lk}(t),   u_k \in N                                                          (11)
    \Delta w_{ij}(t) = \eta \sum_{u_k \in G_i} e_k(t) \frac{\partial y_k(t)}{\partial w_{ij}} + \alpha \Delta w_{ij}(t-1),   u_i \in N   (12)

since w_{ij} can only affect the units in G_i. The derivatives \partial y_k(t)/\partial w_{ij} in equation (12) can be obtained by differentiating the dynamic rule in equation (1):

    \frac{\partial y_k(t)}{\partial w_{ij}} = g'(x_k(t)) \left[ \sum_{u_p \in Q_k} w_{kp} \frac{\partial y_p(t-1)}{\partial w_{ij}} + \delta_{ki} z_j(t-1) \right]   (13)

where \delta_{ki} = 1 if u_i = u_k and 0 otherwise, for u_k \in G_i and u_i \in N. This relates the \partial y_k(t)/\partial w_{ij} at time t to those at time t-1. We can thus iterate it forward from the initial conditions

    \frac{\partial y_k(0)}{\partial w_{ij}} = 0                                                                                 (14)
    \Delta w(0) = 0                                                                                                             (15)

The derivatives and weight changes can be calculated at each epoch along the way, and the full change in w_{ij} is taken as the time-sum of \Delta w_{ij}(t) at the end of the training sequence.
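The recursion in equation (13) and the update in equations (11)-(12) are the heart of the algorithm, so a compact sketch may help. In the code below the sensitivities p[(k, i, j)] stand for ∂y_k(t)/∂w_ij and are kept only for u_k in the subgroup G_i, as the algorithm prescribes; the dictionary layout, argument names and the omission of the momentum term are our own simplifications.

```python
# Hedged sketch of the truncated sensitivity recursion (eq. 13) and the
# hidden-weight update (eqs. 11-12).  Not the authors' implementation.

def update_sensitivities(p_old, groups, incoming, w_rec, g_prime, z_prev):
    """Equation (13): dy_k(t)/dw_ij from the dy(t-1)/dw_ij terms.

    p_old[(k, i, j)] -- sensitivity at time t-1 (missing entries are 0, eq. 14)
    groups[i]        -- subgroup G_i: the hidden units that w_ij can still affect
    incoming[k]      -- neighbourhood Q_k of hidden unit k (self-loop included)
    w_rec[(k, u)]    -- recurrent weight w_ku for u in Q_k
    g_prime[k]       -- g'(x_k(t))
    z_prev[(i, j)]   -- z_j(t-1), the signal that weight w_ij multiplied at t-1
    """
    p_new = {}
    for (i, j), z in z_prev.items():      # every weight w_ij into hidden unit i
        for k in groups[i]:               # only the units w_ij can affect
            s = sum(w_rec[(k, u)] * p_old.get((u, i, j), 0.0) for u in incoming[k])
            if k == i:                    # Kronecker delta term of eq. (13)
                s += z
            p_new[(k, i, j)] = g_prime[k] * s
    return p_new

def hidden_weight_deltas(p_sens, e_hidden, eta=0.1):
    """Equation (12), momentum term omitted for brevity.

    e_hidden[k] is the error back-propagated to hidden unit k by equation (11),
    i.e. the sum over output units l of E_l(t) * g'(x_l(t)) * w_lk(t).
    """
    dW = {}
    for (k, i, j), sens in p_sens.items():
        dW[(i, j)] = dW.get((i, j), 0.0) + eta * e_hidden[k] * sens
    return dW
```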

3 Analysis

3.1 Time Complexity

In the whole training process, the most time-consuming operation is the update of the sensitivity matrix [p^k_{ij}] = [\partial y_k(t)/\partial w_{ij}] according to equation (13). For each element p^k_{ij} of the matrix, we need to go through equation (13) once in each time step to update its value, and this requires q operations to calculate the summation on the right-hand side, where q is the degree of connectivity. Since there are n(q+p) weights in the hidden layer and each weight has r such p-terms, where r is the size of one subgroup, there are nr(q+p) p-terms in total. Updating the sensitivity matrix in each epoch thus needs a computational complexity of order O(nqr(q+p)) operations.

In a locally connected recurrent network, q and r are constants which are much smaller than n, so the computational complexity of the above algorithm is greatly improved in comparison with the O(N^4) of FRN, the fully recurrent network derived by Williams and Zipser [9], where N is the number of non-input units (i.e. N = m+n).* It is true that the n here may differ from that in FRN, i.e. we may need more hidden units in a locally connected recurrent model, but experimental results show that this increase is far too small to counterbalance the O(N^4) of FRN. Taking the other operations into account, the total computational complexity is of order O(mn+nqr(p+q)) per epoch. If we assume that the parameters q and r are fixed in a recurrent model, the complexity becomes O(mn+np), which is optimal since the network already has this order of number of connection weights.

3.2 Space Complexity

We have a p^k_{ij}-term for a particular weight w_{ij} if and only if the unit u_k is in the subgroup G_i centered at the unit u_i. Since there are r units in each subgroup and there are n(q+p) weights in the hidden layer in total, the total amount of storage needed is O(nr(q+p)). For fixed q and r, the complexity becomes O(np). This space complexity is again smaller than the O(N^3) of FRN.†

3.3 Local Computations in Time and Space

This learning algorithm is local in time, since the computations at time t depend only on the information at time t and time t-1. Training can thus be done on-line. Furthermore, this algorithm is more local in space than FRN, as the update of a weight w_{ij} according to equation (13) needs only the p-terms from those units u_k which are in the vicinity of unit u_i, i.e. u_k ∈ G_i. Thus, unlike FRN, in which updating a weight needs some information from every other unit in the network, only those units in the proximity are involved here.

* The recurrent weights occur between all hidden and output units in a FRN, and this complexity does not include the effect of the input units, whose number is assumed to be much smaller than N.
† If the weights between the input units and the non-input units are taken into account, the complexity should contain an extra term of Np.
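As a rough, purely illustrative check of these orders of magnitude (the sizes below are made up for illustration, not the paper's experimental settings):

```python
# Back-of-the-envelope comparison of the operation counts in Section 3.1:
# roughly n*q*r*(q+p) sensitivity updates per step for the locally connected
# net, versus order (m+n)**4 for RTRL on a fully recurrent network.

def local_ops(n, p, q, tau):
    r = q + (q - 1) * (tau - 1)       # subgroup size
    return n * q * r * (q + p)

def rtrl_ops(n, m):
    return (m + n) ** 4               # order-of-magnitude only

if __name__ == "__main__":
    n, m, p, q, tau = 50, 5, 5, 3, 1  # illustrative sizes
    print(local_ops(n, p, q, tau))    # 3600
    print(rtrl_ops(n, m))             # 9150625
```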

Figure 6: Analogy between a Fully Recurrent Network and a Ring-Structured Recurrent Network

4 Ring-Structured Recurrent Network (RRN)

A ring-structured recurrent network (RRN) is a particular type of locally connected recurrent network and is the simplest kind. It is a model with the degree of connectivity q equal to three and τ equal to one. Each hidden unit is connected to itself and its two nearest neighbors only. The 3-net in Figure 2 shows its connections in the hidden layer. In spite of its simplicity, it becomes more powerful as we keep increasing the number of hidden units. To explain this intuitively, we may imagine an analogy between a fully connected recurrent network and a RRN by tracing out a Hamiltonian circuit in the former model (Figure 6). One unit in the former is represented by several in the latter. Increasing the number of hidden units in a RRN thus acts as if we were increasing the number of units in a fully connected recurrent model, which will eventually increase the memory capacity of the network. Another advantage of using RRN is that its simplicity reduces the computational complexity to the minimum of O(mn+27n), since q and r are now both equal to three.
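A small illustration of the RRN case (again our own sketch): with q = 3 and τ = 1 the subgroup of every hidden unit is just its own neighbourhood of size r = 3, so every quantity the update needs lives on the unit itself and its two ring neighbours.

```python
# RRN setting: q = 3, tau = 1, so r = q + (q-1)*(tau-1) = 3.
def rrn_neighbors(n):
    """Each hidden unit feeds itself and its two nearest neighbours on the ring."""
    return [{i, (i - 1) % n, (i + 1) % n} for i in range(n)]

q, tau = 3, 1
r = q + (q - 1) * (tau - 1)
assert r == 3                        # subgroup size equals the neighbourhood size
print(rrn_neighbors(8)[0], r)
```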

5 Comparison between RRN and FRN in Sequence Recognition

We compare the performance of RRN and FRN in sequence recognition. We trained both networks to learn a set of alphabet sequences and compared their training speed and recalling power. The letters in a sequence were input into the network one at a time, and the network was required to recognize the sequence after seeing all of the letters. In each training cycle, all the sequences were fed into the network once, and this was repeated until the mean square error dropped below a small threshold. We then tested the trained network with a long stream of letters made up of words that had already been learned. However, the embedded words might be time warped to make recalling more difficult. We compare RRN and FRN in both training speed and recalling power.

5.1 Training Sets and Testing Sequences

We used nine sets of training data (Table 1), of which three contained time-warped sequences, and we used the alphabet streams of Table 2 to compare the recalling power of RRN and FRN. The first six training sets aimed at testing the network's temporal memorizing power. There were six to twenty-four words in each set, and the average word length varied from 3 to 5.9. The third training set was the most difficult one, as it contained all twenty-four possible combinations of "a", "b", "c" and "d". Some words share the same sub-sequence, either at the beginning or at the end of the sequence; for example, in training set 5, the words "iran" and "iraq", "poland", "ireland" and "iceland", and "algeria" and "nigeria". The corresponding testing sequences of these sets were formed by concatenating all the words in the training set together, with each word separated by a dot, which was the reset signal.

The remaining three tests (numbers 7, 8 and 9) aimed at testing the network's time-warping performance. We added several time-warped samples to the training set, to see whether the network could generalize to other arbitrarily time-warped sequences. The seventh and eighth tests were more difficult, since the training sequences were just permutations of the same set of letters. The testing alphabet streams were formed by concatenating a number of time-warped words together, again separated by dots, and each character in a word might persist for one to three time steps unpredictably.
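The paper does not spell out how the letters were presented to the network, so the following sketch is only one plausible encoding: each character becomes a one-hot input vector, with the dot "." included as the reset symbol between words.

```python
# Hypothetical input encoding for the letter streams (our assumption).
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz."

def one_hot(ch):
    v = np.zeros(len(ALPHABET))
    v[ALPHABET.index(ch)] = 1.0
    return v

def encode_stream(stream):
    """Turn e.g. 'abc.acb.' into a sequence of one-hot input vectors."""
    return [one_hot(ch) for ch in stream]

if __name__ == "__main__":
    xs = encode_stream("abc.acb.")
    print(len(xs), ALPHABET[xs[0].argmax()])   # 8 vectors, first one encodes 'a'
```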

Test  Training Sets
1     abc acb bac bca cab cba
2     abcd dabc acbd dacb bacd dbac bcad dbca cabd dcab cbad dcba
3     abcd abdc acbd acdb adbc adcb bacd badc bcad bcda bdac bdca cabd cadb cbad cbda cdab cdba dabc dacb dbac dbca dcab dcba
4     cat dog pig man hen bat cow fly tod duck fish bird cock swam sheep
5     iran iraq niger swedan norway poland ireland iceland algeria nigeria
6     alpha beta gamma delta epsilon zeta theta lambda mu nu xi omicron pi rho sigma
7*    abc acb bac bca cab cba aaaaabbbbbccccc aaaaacccccbbbbb bbbbbaaaaaccccc bbbbbcccccaaaaa cccccaaaaabbbbb cccccbbbbbaaaaa
8*    stop tops spot pots ssstttoooppp tttooopppsss ssspppooottt pppoootttsss
9*    kick jump clap ride it beat walk swim kkkiiiccckkk jjjuuummmppp ccclllaaappp rrriiidddeee lliiittt bbbeeeaaattt wwwaaalllkkk ssswwwiiimmm

Table 1: Training Sets for Comparing RRN and RTRL in Sequence Recognition (sets marked * contain time-warped samples)

5.2 Comparison in Training Speed

The performance of RRN and FRN in the training phase is summarized in Tables 3 and 4. The training time of RRN was always much shorter than that of FRN, although the number of iterations taken by RRN might be greater. The low efficiency of FRN was most apparent in test three, which was a difficult training set with all twenty-four possible combinations of "a", "b", "c" and "d". We could not even train the network to learn these sequences with FRN within a reasonable amount of time (about 3.2 days for 100 iterations).

14 Test Testing Sequences 1 abc.acb.bac.bca.cab.cba 2 abcd.dabc.acbd.dacb.bacd.dbac.bcad.dbca.cabd.dcab.cbad.dcba. 3 abcd.abdc.acbd.acdb.adbc.adcb.bacd.badc.bcad.bcda.bdac.bdca.cabd. cadb.cbad.cbda.cdab.cdba.dabc.dacb.dbac.dbca.dcab.dcba. 4 cat.dog.pig.man.hen.bat.cow.y.tod.duck.sh.bird.cock.swam.sheep. 5 iran.iraq.niger.swedan.norway.poland.ireland.iceland.algeria.nigeria. 6 alpha.beta.gamma.delta.epsilon.zeta.theta.lambda.mu.nu.xi.omicron. pi.rho.sigma. 7 abc.aabbbcc.aaabbc.aaabccc.acb.aaacbbb.aaccbb.accccb.bac.bbbaaac. bbaaccc.baccc.bca.bbbcaaa.bbccca.bbcccaa.cab.caaab.cccab.cabbb. cba.cccbbbaaa.ccbba.cbbbaa. 8 stop.sssttoopp.stttooopp.sstooppp.tops.tttoopppss.ttoooppsss.tooopps. spot.ssspott.sppooot.sspppoottt.pots.pppoottss.ppooottsss.potttss. 9 kick.kkiicckk.kkkicckkk.kiiiccckk.jump.jjjuummp.jjuuummmpp. jummppp.clap.ccllappp.clllaap.cccllaaap.ride.rrriiddeee.rriiidde. riidddee.it.liiit.lliittt.iit.beat.bbbeeat.beeaaatt.bbeeeattt.walk. wwaaalllk.wwaallkkk.wwwalk.swim.ssswwiiimm.sswwwiim.swiiimmm. Table 2: Testing Sequence for Comparing RRN and RTRL in Sequence Recognition four possible combinations of "a", "b", "c" and "d". We could not even train the network to learn the sequences by FRN within a reasonable amount of time (about 3.2 days for 100 iterations). Test No. of No. of No. of Time Taken Final rms Hidden Units Recurrent Links Iterations (sec) Error Table 3: Performance of RRN in the Training Phase of Sequence Recognition 14

Table 4: Performance of FRN in the Training Phase of Sequence Recognition (columns: Test, No. of Hidden Units, No. of Recurrent Links, No. of Iterations, Time Taken (sec), Final rms Error)

5.3 Comparison in Recalling Power

The performance of RRN and FRN in the testing phase is summarized in Table 5. Both RRN and FRN performed very satisfactorily in all these tests, and the percentage of successful recalls was greater than 90% in all cases. The network can be trained to have both a strong temporal memorizing power and a good time-warping performance with either learning algorithm, although RRN can be trained at a much faster rate.

Table 5: Performance of RRN and FRN in the Recalling Phase (columns: Test, Percentage of Successful Recalls (%) in RRN, Percentage of Successful Recalls (%) in FRN)

6 Comparison between RRN and FRN in Time Series Prediction

Another comparison between RRN and FRN was made on the task of time series prediction. Time series prediction is to forecast the future values of a series based on the history of that series. Predicting the future is important in a variety of fields, and the non-linear signal processing of neural networks has been a new and promising approach for this purpose [8, 10, 3, 2, 1]. Recurrent networks have an advantage over feedforward nets in this problem, as the recurrent links in the network can bring the appropriate previous values back in order to forecast the next one. In this paper, we compare the performance of RRN and FRN in these prediction tasks.

A periodic series with white noise (series 1) - points on a sine curve of unit magnitude, with a uniformly distributed random variable in the interval [-0.5, +0.5] added at each step:

    y(t) = sin(10t) + random[-0.5, +0.5],   t = 0, 1, 2, ...

This series was periodic (Figure 7), but the regularities were masked by noise. The whole series had 120 data points; the first 100 points were for training while the remaining twenty were for testing.

A deterministic chaotic series (series 2) - A chaotic series looks completely random although it is produced by a deterministic and noise-free system. Chaos has many interesting and surprising properties. A simple example of deterministic chaos is:

    y_t = 1 - 2 y_{t-1}^2,   t = 1, 2, 3, ...

All the values generated by this iterative quadratic formula lie within the interval [-1, +1] if the initial value is between -1 and +1 (Figure 8). Again the whole series had 120 data points; the first 100 points were training data and the remaining twenty were for testing.

The sunspots numbers (series 3) - Sunspots are dark blotches on the sun. They were first observed around 1610, and yearly averages have been recorded since 1700.
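For the two synthetic series, a short generation sketch follows. The noisy sine uses the formula exactly as printed above (a symbol may have been lost from the sine argument in the original typesetting, so treat it as indicative); the random seed, the initial value of series 2 and the variable names are our own choices. The 100/20 train/test split follows the text.

```python
# Sketch for generating series 1 and series 2 of Section 6.
import numpy as np

rng = np.random.default_rng(0)          # seed is an arbitrary choice

def series1(length=120):
    """Series 1: sine curve plus uniform noise in [-0.5, 0.5]."""
    t = np.arange(length)
    return np.sin(10 * t) + rng.uniform(-0.5, 0.5, size=length)

def series2(length=120, y0=0.3):
    """Series 2: the iterative quadratic map y_t = 1 - 2*y_{t-1}**2."""
    ys = [y0]
    for _ in range(length - 1):
        ys.append(1.0 - 2.0 * ys[-1] ** 2)
    return np.array(ys)

s1, s2 = series1(), series2()
train1, test1 = s1[:100], s1[100:]      # first 100 points train, last 20 test
train2, test2 = s2[:100], s2[100:]
```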

This sunspots series is often used as a yardstick to compare new modeling and forecasting methods. We used the data from 1700 to 1979 (Figure 9); the data from 1700 through 1920 were for training while the remainder was for testing.

The comparison emphasizes both training speed and predictive power. In order to have a fair comparison, FRN was slightly modified such that only the hidden layer was fully connected; the inter-layer connections were just ordinary feedforward links. Figure 10 shows the network topology of RRN and of the modified FRN for these time series prediction tasks. In addition, the output units in these models were linear, since a sigmoid function gives only values between zero and one. They computed their activations as the weighted sum of their inputs from the hidden layer, and the non-linearity in these models was due to the sigmoid hidden units only.

In the training phase, the data points were fed into the input unit one by one consecutively, and the network forecast the next value in the series at the output unit. The training series was fed into the network repeatedly until the root mean square error of the predictions dropped below a very small threshold. The testing data were then used to evaluate the network's predictive power. There are two ways to do prediction, single-step prediction and multi-step prediction:

Single-step prediction - The input unit is given the observed values of the time series.

Multi-step prediction - The predicted output is fed back as the input for the next prediction. Hence the input consists of predicted values rather than actual observations of the original time series.

We compare RRN and FRN in their training speed and in their performance on single-step prediction and multi-step prediction.
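The two testing modes can be sketched as follows, assuming a trained one-step predictor with the placeholder interface `net(prev_input, state) -> (prediction, state)`; that interface is our own convention, not the paper's.

```python
# Hedged sketch of single-step versus multi-step (closed-loop) prediction.

def single_step(net, state, observed):
    """Feed true observations; predict each next value."""
    preds = []
    for x in observed[:-1]:
        yhat, state = net(x, state)
        preds.append(yhat)
    return preds, state

def multi_step(net, state, first_value, horizon):
    """Feed the network's own predictions back as inputs."""
    preds, x = [], first_value
    for _ in range(horizon):
        yhat, state = net(x, state)
        preds.append(yhat)
        x = yhat                      # closed loop: the prediction becomes the input
    return preds, state
```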

Figure 7: Series 1 - A Sine Function with White Noise

Figure 8: Series 2 - An Iterative Quadratic Series

Figure 9: Series 3 - Sunspots Activity from 1700 to 1979

Figure 10: FRN and RRN for Time Series Prediction (one input unit and one output unit; the hidden units are fully connected in FRN and form a ring in RRN)

6.1 Comparison in Training Speed

The performance of RRN and FRN in the training phase is shown in Tables 6 and 7. The learning speed of RRN was again much better than that of FRN, and the speedup was even more obvious than in sequence recognition, because the large training set (at least 100 data points, 220 in the sunspots data) increased the computational complexity of FRN significantly. Training FRN to learn the sunspots numbers needed about four days, while RRN needed only about nine hours. RRN was thus practically more useful than FRN for large-scale problems. However, the final root mean square error of FRN was smaller than that of RRN, so we might expect FRN to have a stronger predictive power.

Table 6: Performance of RRN in the Training Phase of Time Series Prediction (columns: Series, No. of Hidden Units, No. of Recurrent Links, No. of Iterations, Time Taken (sec), Final rms Error)

Table 7: Performance of FRN in the Training Phase of Time Series Prediction (columns: Series, No. of Hidden Units, No. of Recurrent Links, No. of Iterations, Time Taken (sec), Final rms Error)

6.2 Comparison in Predictive Power

Table 8 compares the predictive power of RRN and FRN in single-step prediction. The performance of FRN was slightly better than that of RRN in single-step prediction. This was expected, because FRN has more recurrent links and can actually do more work than RRN. However, there were still many factors affecting the results; one obvious factor is that the number of hidden units used, i.e. ten in this case, might not be optimal for RRN or FRN.

To compare RRN and FRN in multi-step prediction, we plotted the correlation coefficient against prediction time for both methods on the same graph (Figures 11, 12 and 13). The correlation coefficient of FRN dropped more slowly than that of RRN in series 2, but vice versa in series 3. Hence the performance of RRN and FRN is similar, although the former is simpler and needs a much shorter training time.
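The paper does not describe exactly how the correlation-versus-prediction-time curves of Figures 11-13 were computed, so the sketch below shows only one plausible protocol (our assumption): restart the closed-loop predictor from every test point and, for each horizon h, correlate the h-step-ahead predictions with the observed values.

```python
# One plausible (assumed) way to obtain a correlation-vs-horizon curve.
import numpy as np

def correlation_by_horizon(predict_multi, test_series, max_horizon):
    """predict_multi(start_index, h) -> h-step-ahead prediction (placeholder)."""
    corrs = []
    for h in range(1, max_horizon + 1):
        preds, actual = [], []
        for s in range(len(test_series) - h):
            preds.append(predict_multi(s, h))
            actual.append(test_series[s + h])
        corrs.append(np.corrcoef(preds, actual)[0, 1])
    return corrs
```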

Table 8: Performance of single-step prediction by RRN and FRN in Time Series Prediction (columns: Series, rms Error of RRN, rms Error of FRN)

Figure 11: RRN vs. FRN in Multi-step Prediction of Series 1 (solid line - RRN; dotted line - FRN)

7 Conclusion

Recurrent networks are powerful for solving problems with temporal extent, as the recurrent links in the network can carry past information through time. Unfortunately, there is still no fully satisfactory algorithm for training recurrent models. The FRN using the Real Time Recurrent Learning rule (RTRL) by Williams and Zipser [9] is powerful, as it performs exact gradient descent for a fully recurrent network in an on-line manner. However, FRN is inefficient: it has a computational complexity of O(N^4), where N is the total number of non-input units.

Figure 12: RRN vs. FRN in Multi-step Prediction of Series 2 (solid line - RRN; dotted line - FRN)

Figure 13: RRN vs. FRN in Multi-step Prediction of Series 3 (solid line - RRN; dotted line - FRN)

Here we start from FRN and devise a new learning procedure for a locally connected recurrent network. This new on-line learning rule has a much lower computational complexity, O(mn+np), and storage complexity, O(np), than FRN, where p, n and m are the numbers of input units, hidden units and output units respectively. The algorithm has much flexibility for implementation on parallel architectures [11]. The algorithm is local in space if the parameter τ in the model is fixed to one; each processing element in this case needs to communicate only with the directly connected processing elements.

We compared FRN with the simplest kind of locally connected recurrent network, called the ring-structured recurrent network (RRN), in temporal sequence recognition. RRN was much more efficient than FRN, and there were even some large-scale problems which could not be solved by FRN within an acceptable amount of time. However, both methods performed satisfactorily in recalling, and the percentage of successful recalls was above 90% in all trials. Thus RRN could perform as well as FRN in these sequence recognition tasks, while its training time was much shorter than that of FRN.

We also compared RRN with FRN in time series prediction. Three typical examples were used in our experiments: a periodic series with white noise, a deterministic chaotic series and the sunspots activity data. RRN, again, needed a much smaller amount of training time, while both networks showed strong predictive power and performed satisfactorily in both single-step and multi-step prediction.

To conclude, RRN can be run at a much faster speed than FRN, although the performance of the two is comparable. RRN is only the simplest kind of locally connected recurrent model, and we can increase the number of recurrent links by using a more densely connected network. In sum, the new learning algorithm introduced in this paper is O(N^2) times faster than FRN, and it should be preferred especially in large-scale applications.

References

[1] K. Chakraborty, K. Mehrotra, C. K. Mohan, and S. Ranka. Forecasting the behaviour of multivariate time series using neural networks. Neural Networks, 5:961-970, 1992.

[2] C. de Groot and D. Wurtz. Analysis of univariate time series with connectionist nets: a case study of two classical examples. Neurocomputing, 3:177-192, 1991.

[3] J. B. Elsner. Predicting time series using a neural network as a method of distinguishing chaos from noise. J. Phys. A: Math. Gen., 25:843-850, 1992.

[4] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1, 1986.

[5] J. H. Schmidhuber. A local learning algorithm for dynamic feedforward and recurrent networks. Report FKI.

[6] J. H. Schmidhuber. Temporal-difference-driven learning in recurrent networks. Parallel Processing Neural Systems and Computers, 15:626-629.

[7] J. H. Schmidhuber. A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, 4(2):243-248, 1992.

[8] A. S. Weigend, B. A. Huberman, and D. E. Rumelhart. Predicting the future: a connectionist approach. International Journal of Neural Systems, 1(3):193-209, 1990.

[9] R. J. Williams and D. Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270-280, 1989.

[10] F. S. Wong. Time series forecasting using backpropagation neural networks. Technical Report NU-CCS-90-9.

[11] E. F. Y. Young and L. W. Chan. Parallel implementation of partially connected recurrent network. In IEEE Conference on Neural Networks 1994, Orlando, volume IV, pages 2058-2063, 1994.

[12] D. Zipser. Subgrouping reduces complexity and speeds up learning in recurrent networks. In Advances in Neural Information Processing Systems, pages 638-641, 1990.


More information

The error-backpropagation algorithm is one of the most important and widely used (and some would say wildly used) learning techniques for neural

The error-backpropagation algorithm is one of the most important and widely used (and some would say wildly used) learning techniques for neural 1 2 The error-backpropagation algorithm is one of the most important and widely used (and some would say wildly used) learning techniques for neural networks. First we will look at the algorithm itself

More information

Backpropagation Neural Net

Backpropagation Neural Net Backpropagation Neural Net As is the case with most neural networks, the aim of Backpropagation is to train the net to achieve a balance between the ability to respond correctly to the input patterns that

More information

From perceptrons to word embeddings. Simon Šuster University of Groningen

From perceptrons to word embeddings. Simon Šuster University of Groningen From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written

More information

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation

A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation 1 Introduction A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation J Wesley Hines Nuclear Engineering Department The University of Tennessee Knoxville, Tennessee,

More information

LSTM CAN SOLVE HARD. Jurgen Schmidhuber Lugano, Switzerland. Abstract. guessing than by the proposed algorithms.

LSTM CAN SOLVE HARD. Jurgen Schmidhuber Lugano, Switzerland. Abstract. guessing than by the proposed algorithms. LSTM CAN SOLVE HARD LONG TIME LAG PROBLEMS Sepp Hochreiter Fakultat fur Informatik Technische Universitat Munchen 80290 Munchen, Germany Jurgen Schmidhuber IDSIA Corso Elvezia 36 6900 Lugano, Switzerland

More information

COMP-4360 Machine Learning Neural Networks

COMP-4360 Machine Learning Neural Networks COMP-4360 Machine Learning Neural Networks Jacky Baltes Autonomous Agents Lab University of Manitoba Winnipeg, Canada R3T 2N2 Email: jacky@cs.umanitoba.ca WWW: http://www.cs.umanitoba.ca/~jacky http://aalab.cs.umanitoba.ca

More information

Neural networks. Chapter 20. Chapter 20 1

Neural networks. Chapter 20. Chapter 20 1 Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms

More information

Neural Networks DWML, /25

Neural Networks DWML, /25 DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.00 second Number of neurons 0 0 Connections per neuron 0 4-0 5 Scene recognition time 0. sec 00 inference

More information

CS 4700: Foundations of Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons

More information

Learning and Memory in Neural Networks

Learning and Memory in Neural Networks Learning and Memory in Neural Networks Guy Billings, Neuroinformatics Doctoral Training Centre, The School of Informatics, The University of Edinburgh, UK. Neural networks consist of computational units

More information

Multilayer Perceptrons and Backpropagation

Multilayer Perceptrons and Backpropagation Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:

More information

IN the field of ElectroMagnetics (EM), Boundary Value

IN the field of ElectroMagnetics (EM), Boundary Value 1 A Neural Network Based ElectroMagnetic Solver Sethu Hareesh Kolluru (hareesh@stanford.edu) General Machine Learning I. BACKGROUND AND MOTIVATION IN the field of ElectroMagnetics (EM), Boundary Value

More information

Neural Networks. Nicholas Ruozzi University of Texas at Dallas

Neural Networks. Nicholas Ruozzi University of Texas at Dallas Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify

More information

y(n) Time Series Data

y(n) Time Series Data Recurrent SOM with Local Linear Models in Time Series Prediction Timo Koskela, Markus Varsta, Jukka Heikkonen, and Kimmo Kaski Helsinki University of Technology Laboratory of Computational Engineering

More information

Introduction to Machine Learning Spring 2018 Note Neural Networks

Introduction to Machine Learning Spring 2018 Note Neural Networks CS 189 Introduction to Machine Learning Spring 2018 Note 14 1 Neural Networks Neural networks are a class of compositional function approximators. They come in a variety of shapes and sizes. In this class,

More information

Artificial Neural Networks Examination, June 2004

Artificial Neural Networks Examination, June 2004 Artificial Neural Networks Examination, June 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

Error Empirical error. Generalization error. Time (number of iteration)

Error Empirical error. Generalization error. Time (number of iteration) Submitted to Neural Networks. Dynamics of Batch Learning in Multilayer Networks { Overrealizability and Overtraining { Kenji Fukumizu The Institute of Physical and Chemical Research (RIKEN) E-mail: fuku@brain.riken.go.jp

More information

Mathematics for large scale tensor computations

Mathematics for large scale tensor computations Mathematics for large scale tensor computations José M. Martín-García Institut d Astrophysique de Paris, Laboratoire Universe et Théories Meudon, October 2, 2009 What we consider Tensors in a single vector

More information

Public Key Exchange by Neural Networks

Public Key Exchange by Neural Networks Public Key Exchange by Neural Networks Zahir Tezcan Computer Engineering, Bilkent University, 06532 Ankara zahir@cs.bilkent.edu.tr Abstract. This work is a survey on the concept of neural cryptography,

More information

CSC242: Intro to AI. Lecture 21

CSC242: Intro to AI. Lecture 21 CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages

More information

C4 Phenomenological Modeling - Regression & Neural Networks : Computational Modeling and Simulation Instructor: Linwei Wang

C4 Phenomenological Modeling - Regression & Neural Networks : Computational Modeling and Simulation Instructor: Linwei Wang C4 Phenomenological Modeling - Regression & Neural Networks 4040-849-03: Computational Modeling and Simulation Instructor: Linwei Wang Recall.. The simple, multiple linear regression function ŷ(x) = a

More information

Artificial Neural Networks

Artificial Neural Networks Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks

More information

Long-Term Prediction, Chaos and Artificial Neural Networks. Where is the Meeting Point?

Long-Term Prediction, Chaos and Artificial Neural Networks. Where is the Meeting Point? Engineering Letters, 5:, EL_5 Long-Term Prediction, Chaos and Artificial Neural Networks. Where is the Meeting Point? Pilar Gómez-Gil Abstract This paper presents the advances of a research using a combination

More information