Railway passenger train delay prediction via neural network model


Masoud Yaghini*, Mohammad M. Khoshraftar and Masoud Seyedabadi
School of Railway Engineering, Iran University of Science and Technology, Tehran, Iran

SUMMARY

The aim of this paper is to present an artificial neural network model that predicts the delays of passenger trains in Iranian Railways with high accuracy. In the proposed model, we use three different methods to define the inputs: normalized real number, binary coding, and binary set encoding. One of the great challenges in using neural networks is designing a suitable network for a specific task. To find an appropriate architecture, three strategies, called quick, dynamic, and multiple, are investigated. To prevent the proposed model from overfitting, we follow cross validation and divide the existing passenger train delay data set into three subsets: a training set, a validation set, and a test set. To evaluate the proposed model, we compare the results of the three input methods and the three architectures with each other and with common prediction methods such as decision trees and multinomial logistic regression. When comparing the different neural networks, we consider the training time, the accuracy on the test data set, and the network size. When comparing the neural networks with other well-known prediction methods, we consider the training time and the accuracy on the test data set. To make a fair comparison among all models, we sketch a time-accuracy graph. The results reveal that the proposed model has higher accuracy.

KEYWORDS: Neural network; Prediction model; Passenger train delays; Iranian Railways

1. INTRODUCTION

Delay is one of the major issues in railway systems all over the world. According to the British National Audit Office (NAO) [11], incidents such as infrastructure faults, fleet problems, fatalities, and trespass still cause significant delays to the travelling public and great cost to the railway. For example, in a single year, 0.8 million incidents led to 14 million minutes of delay to franchised passenger rail services in Great Britain, costing a minimum of £1 billion (averaging around £73 for each minute of delay) in the time lost to passengers; of these incidents, 1,376 each led to over 1,000 minutes of delay. Managing the consequences of incidents and getting trains running normally again is vital to reducing delays, so predicting passenger train delays is an important but very difficult task [11]. Because of the importance of this matter, Iranian Railways always registers and analyzes its delay data with respect to the date, cause, and duration of delays. In this research, the registered data of passenger train delays in Iranian Railways from 2005 to the end of 2009 are used. According to this database, the average delay from 2005 to the end of 2009 was 18,174 hours per year, or 30 minutes per passenger train.

A literature review revealed that little research has been done on forecasting passenger train delays. Carey and Kwiecinski [2] develop a simple stochastic method for knock-on train delays; a knock-on delay is the part of a train's delay that is caused by other trains in front of it. Huisman and Boucherie [8] develop a stochastic model to predict the delays of trains with different speeds. Their model can capture both scheduled and unscheduled train movements, and a case study of a railway section in the Dutch railway network illustrates its practical value for both long- and short-term railway planning [8]. Peters et al. [12] develop an intelligent delay predictor for real-time delay monitoring and timetable optimization in train networks. This system processes the existing delays in the network to generate delay predictions for dependent trains in the near future. Their rule-based system was compared with a specially developed neural network in order to evaluate the accuracy and the abstraction ability of such an artificially intelligent component.

Yuan [16] develops an improved stochastic model for train delays and delay propagation in stations. The most important scientific contribution of this research is an innovative analytical probability model that accurately predicts the knock-on delays of trains, including the impact on train punctuality at stations, based on an extension of the blocking time theory of railway operations to stochastic phenomena [16]. Yuan [17] develops a model that deals with stochastic dependence in the modeling of train delays and delay propagation. The proposed model can be used for assessing timetable stability and predicting train punctuality given primary delays, and model validation reveals that the delay estimates match real-world data very well. Briggs and Beck [1] demonstrate that the distribution of train delays on the British railway network is accurately described by q-exponential functions, using data on departure times at 23 major stations for the period September 2005 to October 2006. Daamen et al. [5] propose a method to predict knock-on delays in an accurate and non-discriminative way, distinguishing two main classes of knock-on delays: hindrance at conflicting track sections and waiting for scheduled connections in stations.

Flight delay prediction is a topic related to this research. Zonglei et al. [18] develop a new method based on machine learning to predict large-scale flight delays. The method first applies k-means, an unsupervised learning algorithm, to the flight delay data to obtain a standard for each class of delay. With these delay classes, a supervised learning method can then be used to build an alarm model; for the supervised learning they use decision trees, BP neural networks, and naive Bayes. Zonglei et al. [19] develop another method to forecast flight delays, based on a content-based recommendation system. In the forecast model, flight delay events and airports are mapped to users and items, respectively, the core concepts of a recommendation system. Following the propagation of a delay, the method alerts the target airport by monitoring the status of related airports, and the observed status is compared with historical data in order to predict the seriousness of the delay. Jianli et al. [9] develop a new method to predict flight delays and delay propagation at an airport.

Their method is based on extended cellular automata (ECA), which extend cellular automata (CA) by extending the components of a cell and the definition of cellular neighbors. Long and Hasan [10] develop a simulation model to estimate flight delays and cancellations in all operating conditions, including off-nominal conditions such as inclement weather. It also explicitly models the impacts of flight delays on downstream operations, such as delayed departures, flight cancellations, and ground delay programs (GDP).

Another related topic is the prediction of bus arrival times. Effective prediction of bus arrival times is important to advanced traveler information systems (ATIS). Automatic passenger counter (APC) systems have been implemented in various public transit systems to obtain bus occupancy along with other information such as location and travel time. Such information has great potential as input data for a variety of applications, including performance evaluation, operations management, and service planning. Chen et al. [3] propose a dynamic model for predicting bus arrival times that uses data collected by a real-world APC system. It consists of two major elements: an artificial neural network model for predicting bus travel time between time points for a trip occurring at a given time of day, day of week, and weather condition, and a Kalman-filter-based dynamic algorithm that adjusts the arrival time prediction using up-to-the-minute bus location information. Their test runs show that this model is quite powerful in modeling variations in bus arrival times along the service route. Chen et al. [4] use automatic passenger counter data collected by New Jersey Transit and propose neural network based models to predict bus arrival times; their test runs show that the predicted travel times generated by the models are reasonably close to the actual arrival times. Yu et al. [15] propose a hybrid model, based on a support vector machine (SVM) and a Kalman filtering technique, to predict bus arrival times. In the model, the SVM predicts the baseline travel times on the basis of historical trips occurring at a given time of day, the weather conditions, the route segment, the travel times on the current segment, and the latest travel times on the predicted segment.

The Kalman-filter-based dynamic algorithm then uses the latest bus arrival information, together with the estimated baseline travel times, to predict the arrival times at the next point. The results show that the hybrid model is feasible and applicable to bus arrival time forecasting, and that it generally provides better performance than artificial neural network (ANN) based methods.

With accurate predictions of train delays, the railway operator can construct a suitable timetable. This minimizes delays and prevents errors and problems in railway plans, so passenger trains can be expected to have minimum travel times. Developing a highly accurate neural network based prediction of passenger train delays is the aim of our research. To evaluate the proposed model, a comparison among the accuracies of the proposed model, a decision tree, and multinomial logistic regression has been made; the evaluation reveals that the proposed model has the highest accuracy.

The outline of this paper is as follows. Section 2 presents an explanation of the gathered passenger delay data. Section 3 gives a brief description of the weight update method for neural networks and then explains how the different methods find a suitable structure for the proposed neural network model. In Section 4, data partitioning, neural network structures, data discretization, and the transformation of data into binary set encoding are explained. Section 5 presents the results obtained with the proposed model. Section 6 gives a summary, conclusions, and some hints for future research.

2. DATA UNDERSTANDING

Delay in a passenger train means that the train has not arrived at its prescheduled time. Train delay does not include scheduled stopping time; for example, in Iranian Railways, stopping time at intermediate stations for praying or for boarding and alighting passengers is excluded from the delay time. The important causes of passenger train delays are:

- Delay at the origin. The difference between the actual and the scheduled departure time of the train.
- Crossing with another passenger or freight train. This happens when trains running in opposing directions pass each other at places where loops or sidings are available; it is essentially the train's waiting time for line clearance.
- Unscheduled waiting time at overtaking points. The train's waiting time for the arrival and passing of another train that shares its path, according to its priority.
- Engine breakdown. Many kinds of problems can cause the passenger train's engine to stop working properly during travel.
- Other trains' engine breakdowns. Engine damage on other trains, which has a negative effect on the travel time of this train.
- Other causes. Wagon breakdowns, infrastructure faults such as track and signal failures, and non-scheduled stops for praying are the other causes of delay time.

In Iranian Railways, the data of passenger train delays are registered and recorded every day. At the end of each week, these files are merged to create weekly delay data, and at the end of each month the weekly files are merged to create monthly delay data. In this research, the monthly files from 2005 to the end of 2009 are used; in each year, the number of patterns represents the number of trains dispatched that year. Table 1 summarizes these data. As shown in table 1, 2008 has the maximum total delay, and 2007 also has a large total delay. The average delay per train in 2008 is the quotient of the total delay divided by the number of dispatched trains, which equals 39.7 minutes according to table 1; concerning average delay, 2008 has the greatest value. Note that 2009 also has a large total delay but, by comparison, a better average delay. Figure 1 shows the monthly average delay per passenger train from 2005 to 2009; the maximum and minimum monthly averages belong to the 10th and the 1st month, respectively.

Figure 2 shows the seasonal average delay per passenger train from 2005 to 2009; the maximum and minimum seasonal averages are related to the summer and spring seasons, respectively.

3. THE BASICS OF THE PROPOSED MODEL

3.1. Notations

In this section we introduce the basics of the proposed model. The notations used in this section are as follows.

$\sigma(x)$: the activation function for hidden and output neurons.
$w_{ij}$: the weight from unit $i$ to unit $j$.
$\Delta w_{ij}(t)$: the change value for $w_{ij}$ at iteration $t$.
$\eta$: the learning rate.
$\delta_{pj}$: the propagated error at unit $j$ for pattern $p$.
$o_{pi}$: the output of unit $i$ for pattern $p$.
$\alpha$: the momentum parameter.
$d$: the eta decay (the number of decay epochs).
$t_{pj}$: the target value of output $j$ for pattern $p$.
$\Theta_i$: the number of input units.
$\Theta_o$: the number of output units.
$M(t)$: the movement vector, based on the changes to the weights over cycle $t$.
$C(t)$: the change vector, based on the momentum at cycle $t$.
$W(t)$: the vector of weights at cycle $t$.
$m(t)$: the index of training acceleration at cycle $t$.

3.2. Weight updating procedure

Artificial neural networks are among the most important data mining techniques and are used with both supervised and unsupervised learning. In this paper, a special kind of feedforward neural network, called the multilayer perceptron (MLP), is used. The network is trained with the back-propagation algorithm with a momentum term. The activation function for hidden and output units is the standard sigmoid function, shown in equation (1).

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$. (1)

The initial weights of the network are set to random values in the interval $-0.5 \le w_{ij} \le 0.5$. For each pattern of the training data set, information flows through the network to generate a prediction. The prediction is compared to the target value found in the training data for the current pattern, and the difference is propagated back through the network to update the weights. More precisely, the change value $\Delta w_{ij}$ for updating the weights is calculated as in equation (2).

$\Delta w_{ij}(t+1) = \eta \, \delta_{pj} \, o_{pi} + \alpha \, \Delta w_{ij}(t)$. (2)

where $\eta$ is the learning rate, $\delta_{pj}$ is the propagated error, $o_{pi}$ is the output of unit $i$ for pattern $p$, $\alpha$ is the momentum parameter, and $\Delta w_{ij}(t)$ is the change value for $w_{ij}$ at the previous epoch. The value of $\alpha$ is fixed during training, but the value of $\eta$ varies across epochs. $\eta$ starts at the user-specified initial value, decreases logarithmically to the value $\eta_{low}$, reverts to the value $\eta_{high}$, and then decreases again toward $\eta_{low}$. The value of $\eta$ is calculated as in equation (3).

$\eta(t) = \eta(t-1) \cdot \exp\big(\log(\eta_{low} / \eta_{high}) / d\big)$. (3)

where $d$ is the user-specified number of eta decay epochs. If $\eta(t-1) < \eta_{low}$, then $\eta(t) = \eta_{high}$, and this cycle continues epoch by epoch until training is complete. The back-propagated error value $\delta_{pj}$ is calculated based on where the connection lies in the network.
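The update rule and the cyclic eta schedule can be summarized in a few lines of code. The following is a minimal sketch in Python/NumPy, not the implementation used in the paper; the function names and the assumed array shapes are ours.

```python
# A minimal sketch of equations (2) and (3); the weight matrix is assumed to
# be a NumPy array of shape (n_upstream, n_downstream), and names are ours.
import numpy as np

def update_weights(w, prev_change, eta, alpha, delta_pj, o_pi):
    """One momentum-based back-propagation step, equation (2)."""
    change = eta * np.outer(o_pi, delta_pj) + alpha * prev_change
    return w + change, change

def next_eta(eta_prev, eta_low, eta_high, d):
    """Cyclic logarithmic eta decay, equation (3)."""
    if eta_prev < eta_low:
        return eta_high                  # revert and start a new decay cycle
    return eta_prev * np.exp(np.log(eta_low / eta_high) / d)
```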

For connections to output units, it is calculated as in equation (4).

$\delta_{pj} = (t_{pj} - o_{pj}) \, o_{pj} (1 - o_{pj})$. (4)

For the other units, it is calculated as in equation (5).

$\delta_{pj} = o_{pj} (1 - o_{pj}) \sum_k \delta_{pk} w_{kj}$. (5)

where $t_{pj}$ is the target value of output $j$ for pattern $p$. Weights are updated immediately as each pattern is presented to the network during training.

3.3. The architecture of the proposed model

Finding a network architecture suited to the existing data is one of the controversial issues in neural networks. Several methods are used in this research to cope with this problem, and the method with the highest accuracy is finally selected as the best one. These methods are named quick, dynamic, and multiple.

In the quick method, only a single neural network is trained. By default, the network has one hidden layer containing $\max((\Theta_i + \Theta_o)/20, 3)$ units, where $\Theta_i$ is the number of input units and $\Theta_o$ is the number of output units.

In the dynamic method, the topology of the network changes during training, with units added to improve performance until the network achieves the desired accuracy. There are two stages to dynamic training: finding the topology and training the final network. In the first stage, a network with two hidden layers of two units each is built, the initial learning rate is set to 0.05 with $\alpha = 0.09$, and the initial network is trained as usual through one epoch. Two copies of the initial network, a left and a right network, are then created, and one unit is added to the second hidden layer of the right network. Both augmented networks are trained through one epoch, and the overall error of each is determined. If the left network has the lower error, it is kept and one unit is added to the first hidden layer of the right network. If the right network has the lower error, the left network is replaced with a copy of the right network, and a unit is added to the second hidden layer of the right network. Both networks are trained through another epoch, and the training/augmentation epoch is repeated until the stopping criteria are met.
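The grow-and-compare search of the dynamic method can be rendered schematically as below. This is a sketch of the procedure just described, not the actual tool's implementation; every helper passed in (make_net, train_one_epoch, overall_error, stop) and the add_unit method are hypothetical stand-ins.

```python
# A schematic sketch of the dynamic topology search; all helpers and the
# add_unit method are assumed stand-ins, not a real API.
import copy

def dynamic_search(make_net, train_one_epoch, overall_error, stop):
    left = make_net(hidden=[2, 2])       # two hidden layers, two units each
    train_one_epoch(left)                # initial epoch, eta=0.05, alpha=0.09
    right = copy.deepcopy(left)
    right.add_unit(layer=2)              # grow the second hidden layer
    while not stop(left, right):
        train_one_epoch(left)
        train_one_epoch(right)
        if overall_error(left) <= overall_error(right):
            right.add_unit(layer=1)      # left wins: grow right's first layer
        else:
            left = copy.deepcopy(right)  # right wins: adopt it as the new left
            right.add_unit(layer=2)      # and grow its second layer again
    return left
```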

To adjust the learning rate at each epoch, two vectors are computed. The first is the movement vector, $M(t)$, based on the changes to the weights over the epoch. The second is the change vector, $C(t)$, based on the momentum at the current epoch. $M(t)$ and $C(t)$ are computed according to equations (6) and (7).

$M(t) = 2\,(W(t) - W(t-1))$. (6)

$C(t) = 0.8\, C(t-1) + M(t)$. (7)

where $W(t)$ is the vector of weights at epoch $t$ and $W(t-1)$ is the vector of weights at the previous epoch. The ratio of the magnitudes of these vectors is defined as in equation (8).

$m(t) = \| M(t) \| \,/\, \| C(t) \|$. (8)

$m(t)$ is an index of the acceleration of training. If the index is less than $1 + \|C(t)\|/10$, training is slowing and the learning rate is increased by a factor of 1.2. If the index is greater than 5.0, training is accelerating, and eta is decreased by a factor of $m(t)/4$. After a good topology has been found, the final network is trained in the normal back-propagation manner, with the initial learning rate set to 0.02 and $\alpha = 0.09$.

In the multiple method, multiple networks are trained in a pseudo-parallel fashion and the network with the highest accuracy is considered as the final model. In the first step, several single-layer networks are generated with different numbers of hidden units, from 3 up to the number of input units. These networks always go up to 12 units (even if there are fewer than 12 input units) and never grow larger than 60 units. A network is generated for each number of hidden units in the sequence 3, 4, 7, 12, and so on, with the increment at each step being two larger than the previous increment. For each single-layer network, a set of two-layer networks is also created. The first layer has the same number of hidden units as the single-layer network, and the number of units in the second layer varies across networks: a network is generated for each number of second-layer units in the sequence 2, 5, 10, 17, and so on, up to the number of units in the first hidden layer, again with the increment at each step being two larger than the previous increment [14].
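The unit-count sequences of the multiple method follow a simple rule: the increment grows by two at each step. A small sketch, with a purely illustrative number of input units:

```python
# Sketch of the hidden-unit sequences of the multiple method; n_inputs = 25
# is illustrative only.
def unit_sequence(start, first_step, limit):
    """Yield start, then keep adding a step that grows by 2 each time."""
    value, step = start, first_step
    while value <= limit:
        yield value
        value += step
        step += 2

n_inputs = 25
limit = max(min(n_inputs, 60), 12)          # always reach 12, never exceed 60
for h1 in unit_sequence(3, 1, limit):       # 3, 4, 7, 12, 19, ...
    for h2 in unit_sequence(2, 3, h1):      # 2, 5, 10, 17, ... up to h1
        print(f"candidate network: hidden layers ({h1}, {h2})")
```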

4. DATA PARTITIONING AND NEURAL NETWORK STRUCTURE

4.1. Data partitioning

When training a prediction model, overfitting, i.e. learning an over-specific representation of the training data, is one of the common problems, especially with large data sets [13]. Overfitting harms the ability of the prediction model to forecast new patterns and causes incorrect or lower-than-expected predictions. Given the very large volume of data, more than 30,000 patterns per year, a cross-validation technique can prevent such negative effects. This technique partitions the data into three samples, called the training set, the validation set, and the test set. The validation set is used as a pseudo test set to evaluate the quality of the network during training; such an evaluation is called cross validation. It allows training the model with the training set, refining the model with the second set, and testing the results with the third set. This reduces the size of each partition accordingly, but it is well suited to a very large data set. Training proceeds toward a minimum of the error on the training set, but only until the error on the validation set reaches its minimum; this identifies the point of overfitting. In this paper, 60% of the data is used as the training set, 10% as the validation set, and the remaining 30% as the test set.
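As an illustration, the three-way split can be coded as below. This is a minimal sketch assuming the patterns are rows of a NumPy array; it mirrors the 60/10/30 partition but is not tied to the tool used in the paper.

```python
# A minimal sketch of the 60/10/30 partition; data is any NumPy array of
# patterns, and the seed is illustrative.
import numpy as np

def partition(data, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(0.6 * len(data))
    n_valid = int(0.1 * len(data))
    train = data[idx[:n_train]]
    valid = data[idx[n_train:n_train + n_valid]]  # monitored to stop training
    test = data[idx[n_train + n_valid:]]          # held out for final accuracy
    return train, valid, test
```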

4.2. Neural network output units

In the existing data set, the passenger train delay attribute has real number values, saved in minutes. A data discretization technique is used to reduce the number of values of this continuous attribute by dividing its range into intervals; interval labels are then used to replace the actual data values. Replacing the numerous values of a continuous attribute with a small number of interval labels reduces and simplifies the original data and leads to a concise, easy-to-use, knowledge-level representation of the mining results. Discretization techniques can be categorized by how the discretization is performed, for example whether it uses class information or in which direction it proceeds (top-down vs. bottom-up). If the discretization process uses class information, it is called supervised discretization; otherwise, it is unsupervised. If the process starts by first finding one or a few points (called split points or cut points) to split the entire attribute range and then repeats this recursively on the resulting intervals, it is called top-down discretization or splitting. This contrasts with bottom-up discretization or merging, which starts by considering all of the continuous values as potential split points, removes some by merging neighboring values to form intervals, and then applies this process recursively to the resulting intervals [7].

In this research, binning is used as the discretization technique. Binning does not use class information and is therefore an unsupervised discretization technique; it is a top-down splitting technique based on a specified number of bins. We use equal-width binning to discretize the delay values: the data are partitioned into 10 classes, and each class constitutes one output unit of the neural network. Figure 3 shows the histogram of these classes. The proposed model thus predicts one of these classes, i.e. an approximation of the delay time.
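For concreteness, equal-width binning into ten classes can be sketched as follows. The example delay values and the clamping of the maximum into the last bin are illustrative details of ours, not taken from the paper.

```python
# A minimal sketch of unsupervised equal-width binning into 10 classes;
# the example delay values (in minutes) are illustrative.
import numpy as np

def equal_width_labels(delays, n_bins=10):
    lo, hi = delays.min(), delays.max()
    width = (hi - lo) / n_bins
    labels = ((delays - lo) // width).astype(int)
    return np.minimum(labels, n_bins - 1)   # put the maximum into the last bin

delays = np.array([5.0, 12.0, 45.0, 90.0, 300.0])
print(equal_width_labels(delays))           # class label per delay value
```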

The origin-destination (O-D) pair of the train, the day, month, and year of dispatching (i.e. the dispatching date), and the railway corridor that the train passes through on its route to the destination are the fields used as the input of the model. In the existing database there are 278 O-D pairs, and an exclusive integer number is assigned to each of them. There are also 9 railway corridors, and, as for the O-D pairs, an integer number is assigned to each. One of the aforementioned ten delay classes is the network output: the output layer has 10 binary output units, each representing one class of delay.

With Pearson's chi-square, which tests the dependency between two categorical attributes, one can measure the pairwise dependencies among the attributes [6]. If the value of the chi-square statistic lies in the interval $[0, \chi^2_{\alpha}((r-1)(c-1))]$, the hypothesis of independence between the two attributes is accepted; otherwise it is rejected. Here $\alpha$ is the level of significance (here $\alpha = 0.05$) and $r$ and $c$ are the numbers of rows and columns, respectively. The dependency between two categorical attributes is considered high if the probability of independence falls below 0.05. The values of Pearson's chi-square test statistics for delay versus corridor, day, month, year, and O-D pair are given in table 2.

4.3. Input units of neural network

To find the best input forms for the proposed neural network, three different approaches to the input attributes are considered: normalized real number, binary coding, and binary set encoding inputs. In the first approach, called normalized real number, five input units are considered: input unit one is allotted to the O-D pair, two to the corridor, three to the day, four to the month, and five to the year. In the second approach, called binary set encoding, the integer value assigned to each attribute value is first encoded as a binary number; $\log_2(\text{max number} + 1)$ binary attributes are considered, where max number is the greatest integer among the different values, and if this number of binary attributes is not an integer, it is rounded up to the nearest integer. The encoded digits are then assigned to new attributes such that each attribute takes exactly one digit of the binary value. For example, for the value 50, figure 4 shows the procedure of encoding to a binary string (supposing that the maximum value of the attribute is 70). In the third approach, binary coding, one binary input unit marks the presence or absence of each distinct value of a categorical attribute; e.g. for an attribute with 10 distinct values, 10 separate binary attributes (one per value) are considered, and for each pattern exactly one of them takes the value 1 while the others take 0. The O-D pair attribute has 278 distinct values, so it needs 278 binary input units, whereas with binary set encoding it needs just nine.
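The figure 4 example (the value 50 with an attribute maximum of 70) can be reproduced with a short sketch; the ceiling operation rounds the bit count up exactly as described above.

```python
# A small sketch of binary set encoding: an integer category value becomes
# ceil(log2(max_value + 1)) binary attributes.
import math

def binary_set_encode(value, max_value):
    n_bits = math.ceil(math.log2(max_value + 1))   # round up if fractional
    return [(value >> i) & 1 for i in reversed(range(n_bits))]

print(binary_set_encode(50, 70))   # 7 bits: [0, 1, 1, 0, 0, 1, 0]
```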

Binary coding thus causes a dramatic growth of the neural network structure, whereas the network size decreases significantly with normalized input values. Table 3 compares the neural network sizes for the different input approaches. Note that the comparison in table 3 is made only for the quick method; the other two methods do not have a fixed, predefined structure, so these computations are impossible in advance. One can only say that, because of the two hidden layers, the final network structure will be larger.

5. EXPERIMENTAL RESULTS

5.1. The results on the whole data set

A personal computer with a 2.83 GHz 64-bit quad-core CPU and 4 GB of main memory, running SPSS Clementine 12.0 [14], is used to obtain all the results for the proposed model and the other prediction methods. Table 4 represents the results of the three structural methods of the neural network models on the database. Of the 179,982 patterns, 107,918 are used as the training set, 53,837 as the test set, and the remaining 18,227 as the validation set. To reduce the effect of random parameter initialization on the prediction ability of the models, we run each model 100 times independently and take the average of the results. As observed from table 4, the greatest prediction accuracy on the test set, 92.18%, is achieved by the binary inputs with the quick method. However, binary inputs make the network structure very large; the model therefore needs more memory and, consequently, more time to solve the problem. The lowest prediction accuracy on the test set, 60.30%, belongs to the normalized real number inputs with the multiple method. As the results reveal, one cannot say that a single method is definitely the best; an analysis of accuracy, time, and network structure size is needed. To evaluate the acquired results, decision tree and multinomial logistic regression models are used. In the predictions with the decision tree and logistic regression, the normalized input values are used as the model input and, instead of ten output units, a single output is used to predict the ten classes, with the integer numbers 1 to 10 representing the class values. For decision tree induction, the C5.0 algorithm is applied.
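The 100-run averaging protocol used throughout the experiments can be sketched generically as below; build_and_train and evaluate are stand-ins for whatever modeling tool is used, not an actual Clementine API.

```python
# A hedged sketch of averaging over 100 independent runs to damp the effect
# of random weight initialization; both callables are hypothetical stand-ins.
import statistics

def average_accuracy(build_and_train, evaluate, runs=100):
    scores = []
    for seed in range(runs):
        model = build_and_train(seed)    # seed varies the weight initialization
        scores.append(evaluate(model))   # accuracy on the held-out test set
    return statistics.mean(scores)
```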

Table 5 represents the results of the decision tree and logistic regression models. To make the comparison easier, table 6 presents the results of the different models in terms of training time and accuracy on the test set. Two integer numbers are attached to each result in table 6: one shows the rank of the model's accuracy and the other the rank of its training time among the other models. To make a fair comparison among the models, a training time versus test accuracy graph is sketched in figure 5, in which the horizontal axis represents the model training time and the vertical axis the accuracy of the model on the test set; the position of each model can thus be seen at a glance. According to the dispersion of the points, the prediction methods can be divided into three classes. In figure 5, the neural network models with binary set encoding inputs (for all structural methods) and with normalized real number inputs (for the quick and dynamic methods) have low training times and high prediction accuracies. The decision tree, the multinomial logistic regression, and the neural network model with normalized real number inputs and the multiple method have low training times and low prediction accuracies. All three structural methods of the neural network model with binary set encoding inputs lie in the first region, which shows the efficiency of the proposed model. Despite their high prediction accuracy, the neural network models with binary inputs lie in the second region because of their long training times. Because of their low prediction accuracy, all the other models lie in the fourth region.

5.2. The periodical analysis

In order to see how well a trained model can predict the delay times of trains in subsequent periods, four models are constructed in this section. Note that in this research one year is considered as one period; other periods, such as a month or a week, could also be used.

Each of these models, called models one to four, is trained and fitted with the data of the preceding years in order to predict the expected delays in the following year. In this section, the binary set encoding scheme is used. The first model is trained with the data of 2005, i.e. the data of 2005 are used as the training and validation set, and this trained model is then used for prediction in 2006, i.e. the data of 2006 are used as the test data set. Model two is trained with the data of 2005 to the end of 2006 and used for prediction in 2007. The third model is trained with the data of 2005 to the end of 2007 and used for prediction in 2008. The fourth model is trained with the data of 2005 to the end of 2008 and used for prediction in 2009. In each case, 75% of the training patterns are used for training and the remaining 25% for validation. Table 7 shows the number of patterns for each model. Table 8 represents the results of models one to four; each model is trained for at most 250 epochs over the training set. To reduce the effect of random parameter initialization on the prediction ability of the models, all results are averaged over 100 independent runs. These results reveal the effectiveness and efficiency of the proposed model in predicting the delay times of trains in subsequent periods.
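The rolling train-on-the-past, test-on-the-next-year protocol of this subsection can be sketched as follows; the DataFrame layout and the train_mlp/accuracy callables are assumptions of ours, supplied for illustration.

```python
# A schematic sketch of the periodical evaluation: train on all years before
# the test year (75/25 train/validation split) and test on the following year.
import pandas as pd

def periodical_eval(df, train_mlp, accuracy,
                    test_years=(2006, 2007, 2008, 2009)):
    """df needs a 'year' column; train_mlp and accuracy are supplied callables."""
    for year in test_years:
        history = df[df["year"] < year].sample(frac=1, random_state=0)
        cut = int(0.75 * len(history))           # 75% training patterns
        train, valid = history.iloc[:cut], history.iloc[cut:]
        model = train_mlp(train, valid)
        yield year, accuracy(model, df[df["year"] == year])
```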

5.3. Analysis on disaggregated data

To improve the understanding of the quality of the results, the main data set is partitioned into k different categories, called category 1 to category k, and k different neural network models, one per category, are trained. According to table 1, the year 2009 does not follow the previous delay pattern; therefore, to prove the fitting ability of the models on Iranian Railways data, 2009 is considered as the test set and none of the delay patterns of this year are used to train the models. The patterns of 2005 to 2008 are used as the training and validation sets (the periods preceding this test data set); e.g. model 1, which is trained with the patterns of category 1 for 2005 to the end of 2008, is used to predict one of the ten delay class labels in category 1 for 2009. As mentioned in Section 4.2, there are nine railway corridors in the data set, and these corridors are used to create nine different categories. Note that another attribute (such as the O-D pair, day, month, or year, or a combination of them) could also be used to form the k categories. With respect to the number of railway corridors, nine disaggregated test data sets are created from the 2009 data and nine disaggregated training and validation data sets from the 2005 to 2008 data. Nine different models, one for each training and validation set, are trained in order to predict the expected delays in the corresponding test data set. 75 percent of each training data set is used for training and the remaining 25 percent for validation. Table 9 shows the number of patterns for each model, and table 10 represents the results of models one to nine. Each model is trained for at most 250 epochs over the training set. To reduce the effect of random parameter initialization on the prediction ability of the models, all results are averaged over 100 independent runs. The averages over the structural methods reveal that the disaggregated models improve the quality of the results in comparison with the whole-data-set results (table 4) and the periodical results (table 8).

6. CONCLUSIONS

In this paper, a neural network model with high accuracy for predicting passenger train delays in Iranian Railways is presented. In the proposed model, three different approaches are used to define the inputs: normalized real number, binary coding, and binary set encoding values. To find an appropriate architecture for the passenger train delay prediction neural network, various strategies are investigated. For predicting passenger train delays, the registered data of Iranian Railways from 2005 to the end of 2009 are used. To evaluate the quality of the results, decision tree and multinomial logistic regression models are used for comparison; the comparisons reveal that the proposed model has high accuracy and low training time and, consequently, good solution quality. Delay prediction makes it easier for railway operators to construct a suitable timetable and minimizes delays, errors, and problems in future railway plans, so that passenger trains can be expected to have minimum travel times. Future research will progress in two directions: improving training time and improving prediction accuracy. The model accuracy may be improved through metaheuristic methods, such as genetic algorithms, simulated annealing, or hybrid algorithms, to find a better network architecture. The training time may be improved through other metaheuristic methods, such as particle swarm optimization or continuous ant colony optimization.

REFERENCES

[1] Briggs, K., Beck, C. (2007). Modelling train delays with q-exponential functions, Statistical Mechanics and its Applications, vol. 378, 15 May.
[2] Carey, M., Kwiecinski, A. (1994). Stochastic approximation to the effects of headways on knock-on delays of trains, Transportation Research Part B, vol. 28, August.
[3] Chen, M., Liu, X., Xia, J., Chien, S. (2004). A dynamic bus-arrival time prediction model based on APC data, Computer-Aided Civil and Infrastructure Engineering, vol. 19, no. 5.
[4] Chen, M., Yaw, J., Chien, S., Liu, X. (2007). Using automatic passenger counter data in bus arrival time prediction, Journal of Advanced Transportation, vol. 41, no. 3.
[5] Daamen, W., Goverde, R., Hansen, I. (2009). Non-discriminatory automatic registration of knock-on train delays, Networks and Spatial Economics, vol. 9, November.
[6] Freedman, D., Pisani, R., Purves, R. (2007). Statistics, 4th edition, W. W. Norton & Company.
[7] Han, J., Kamber, M. (2006). Data Mining: Concepts and Techniques, 2nd edition, Morgan Kaufmann Publishers.
[8] Huisman, T., Boucherie, R. (2001). Running times on railway sections with heterogeneous train traffic, Transportation Research Part B, vol. 35, March.

18 makes it easier for railway operators to do a suitable timetable and minimizes delays, errors, and problems for the future railway plans and hope passenger trains have a minimum travel time. Future research will progress in two directions: improving training time and improving prediction accuracy. The model accuracy may be improved through metaheuristic methods such as genetic algorithms or simulated annealing or hybrid algorithms to find a better network architecture. Training time can be improved through other meta-heuristic methods such as particle swarm optimization or continuous ant colony optimization. [1] [2] [3] [4] [5] [6] [7] [8] REFERENCE Briggs, K., Beck, C. (2007). Modeling train delays with q exponential functions, Statistical Mechanics and its Applications, 15, May, vol. 378, pp Carey, M., Kwiecinski, A. (2007). Stochastic approximation to the effects of headways on knock on delays of trains, Transportation Research, vol. 28, August, pp Chen, M., Liu, X., Xia, J., Chien, S.,(2004). A Dynamic Bus-Arrival Time Prediction Model Based on APC Data, Computer-Aided Civil and Infrastructure Engineering, vol. 19, no.5, p.p Chen, M., Yaw, J., Chien, S., Liu, X. (2007). Using automatic passenger counter data in bus arrival time prediction, Journal of Advanced Transportation, vol. 41, no.3, p.p Daamen, W., Goverde, R., Hansen. (2009). Non Discriminatory Automatic Registration of Knock On Train Delays, Networks and Spatial Economics, vol. 9, 23, November, pp Freedman, D., Pisani, R., Puves, R.(2007). Statistics, 4th edition, W. W. Norton & Company. Han, J., Kamber, M. (2006). Data Mining Concepts and Techniques, 2nd edition, Morgan Kaufmann Publishers. Huisman T., Boucherie, R. (2001). Running times on railway sections 18

19 [9] [10] [11] [12] [13] [14] [15] [16] [17] with heterogeneous traintraffic, Transportation Research, vol. 35, March, pp Jianli, D., Yuecheng, Y., Jiandong, W. (2009). A Model for Predicting Flight Delay and Delay Propagation Based on ParallelCellular Automata, International Colloquium on Computing, Communication, Control, and Management, vol. 1, 29, September, pp Long, D., Hasan, S.(2009). Improved Prediction of Flight Delays Using the LMINET2System Wide Simulation Model, Aviation Technology, Integration, and Operations Conference (ATIO), 23, September. National Audit Office. (2008). Reducing passenger rail delays by better management of incidents, HC 308 Session , Report by the Comptroller and Auditor General. Peters, J., Emig, B., Jung, M., Schmidt, S. (2005). Prediction of Delays in Public Transportation using Neural Networks, International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, vol. 2, pp Prechelt, L. (1994). PROBEN1 a set of neural network benchmark problems and benchmarking rules, Faculty Informatics, University of Karlsruhe, Germany, Technical Report. 21/94. SPSS, Clementine 12.0 Algorithms Guide, Yu, B., Yang, Z-Z., Chen, K., Yu, B.(2010). Hybrid model for prediction of bus arrival times at next station, Journal of Advanced Transportation, vol. 44, no.3, p.p Yuan, J. (2006). Stochastic Modeling of Train Delays and Delay Propagation in Stations, PhD dissertation, Delft University of Technology, Faculty of Civil Engineering and Geosciences, Department of Transportation and Planning. Yuan, J. (2009). Dealing with stochastic dependence in the modeling of train delays and delay propagation, International Conference on Transportation Engineering. 19

[18] Zonglei, L., Jiandong, W., Guansheng, Z. (2008). A new method to alarm large scale of flights delay based on machine learning, International Symposium on Knowledge Acquisition and Modeling.
[19] Zonglei, L., Jiandong, W., Tao, X. (2009). A new method for flight delays forecast based on the recommendation system, International Conference on Computing, Communication, Control and Management, vol. 1.

Table 1: The summary of passenger train delay data. Columns: year (2005 to 2009, with a sum/average row); number of dispatched trains; total delay (min); average total delay (min/train).

Table 2: Pearson's chi-square test statistics. Columns: attribute pair; chi-square test statistic value; degrees of freedom; probability of independence. Rows: delay vs. corridor, day, month, year, and origin-destination.

Table 3: Comparison of neural network sizes for the three approaches to defining the input units (quick method). Rows: units at the input layer; units at the hidden layer; units at the output layer; connections between the input and hidden layers; connections between the hidden and output layers; total connections. Columns: normalized input; binary set encoding input; binary input.

Table 4: The average results of 100 independent runs of each neural network model. Columns: input value type (numeric, binary, binary set); structural method (quick, dynamic, multiple); training time (sec); number correct and accuracy on the training, test, and validation sets.

Table 5: The results of prediction with the decision tree and multinomial logistic regression algorithms. Columns: algorithm; training time (sec); number correct and accuracy on the training, test, and validation sets.

Table 6: A comparison among all results for ranking. Columns: method (neural network, decision tree, logistic regression); input value type; structural method; method code; training time (sec); test accuracy; accuracy rank; training time rank.

Table 7: The number of patterns for each model. Columns: model (one to four); training and validation data set; test data set; number of training patterns; number of validation patterns; number of test patterns.

Table 8: The results of models one to four. Columns: model; structural method (quick, dynamic, multiple); training time (sec); number correct and accuracy on the training, validation, and test sets.

Table 9: The number of patterns for each model. Columns: model (one to nine); number of training patterns; number of validation patterns; number of test patterns.

Table 10: The results of models one to nine. Columns: model (one to nine, plus the average); structural method (quick, dynamic, multiple); training time (sec); number correct and accuracy on the training, validation, and test sets.


More information

Machine Learning (CS 567) Lecture 2

Machine Learning (CS 567) Lecture 2 Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

More information

Artificial Neural Network

Artificial Neural Network Artificial Neural Network Eung Je Woo Department of Biomedical Engineering Impedance Imaging Research Center (IIRC) Kyung Hee University Korea ejwoo@khu.ac.kr Neuron and Neuron Model McCulloch and Pitts

More information

MLPR: Logistic Regression and Neural Networks

MLPR: Logistic Regression and Neural Networks MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition Amos Storkey Amos Storkey MLPR: Logistic Regression and Neural Networks 1/28 Outline 1 Logistic Regression 2 Multi-layer

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)

More information

18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

More information

Outline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap.

Outline. MLPR: Logistic Regression and Neural Networks Machine Learning and Pattern Recognition. Which is the correct model? Recap. Outline MLPR: and Neural Networks Machine Learning and Pattern Recognition 2 Amos Storkey Amos Storkey MLPR: and Neural Networks /28 Recap Amos Storkey MLPR: and Neural Networks 2/28 Which is the correct

More information

Prediction of Citations for Academic Papers

Prediction of Citations for Academic Papers 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Part 8: Neural Networks

Part 8: Neural Networks METU Informatics Institute Min720 Pattern Classification ith Bio-Medical Applications Part 8: Neural Netors - INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron: - A nerve cell as

More information

CHAPTER-17. Decision Tree Induction

CHAPTER-17. Decision Tree Induction CHAPTER-17 Decision Tree Induction 17.1 Introduction 17.2 Attribute selection measure 17.3 Tree Pruning 17.4 Extracting Classification Rules from Decision Trees 17.5 Bayesian Classification 17.6 Bayes

More information

Classification with Perceptrons. Reading:

Classification with Perceptrons. Reading: Classification with Perceptrons Reading: Chapters 1-3 of Michael Nielsen's online book on neural networks covers the basics of perceptrons and multilayer neural networks We will cover material in Chapters

More information

Supplementary Technical Details and Results

Supplementary Technical Details and Results Supplementary Technical Details and Results April 6, 2016 1 Introduction This document provides additional details to augment the paper Efficient Calibration Techniques for Large-scale Traffic Simulators.

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Travel Time Calculation With GIS in Rail Station Location Optimization

Travel Time Calculation With GIS in Rail Station Location Optimization Travel Time Calculation With GIS in Rail Station Location Optimization Topic Scope: Transit II: Bus and Rail Stop Information and Analysis Paper: # UC8 by Sutapa Samanta Doctoral Student Department of

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

Learning Tetris. 1 Tetris. February 3, 2009

Learning Tetris. 1 Tetris. February 3, 2009 Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION

CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION 4.1 Overview This chapter contains the description about the data that is used in this research. In this research time series data is used. A time

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

michele piana dipartimento di matematica, universita di genova cnr spin, genova

michele piana dipartimento di matematica, universita di genova cnr spin, genova michele piana dipartimento di matematica, universita di genova cnr spin, genova first question why so many space instruments since we may have telescopes on earth? atmospheric blurring if you want to

More information

Analysis of Disruption Causes and Effects in a Heavy Rail System

Analysis of Disruption Causes and Effects in a Heavy Rail System Third LACCEI International Latin American and Caribbean Conference for Engineering and Technology (LACCEI 25) Advances in Engineering and Technology: A Global Perspective, 8-1 June 25, Cartagena de Indias,

More information

Real-Time Travel Time Prediction Using Multi-level k-nearest Neighbor Algorithm and Data Fusion Method

Real-Time Travel Time Prediction Using Multi-level k-nearest Neighbor Algorithm and Data Fusion Method 1861 Real-Time Travel Time Prediction Using Multi-level k-nearest Neighbor Algorithm and Data Fusion Method Sehyun Tak 1, Sunghoon Kim 2, Kiate Jang 3 and Hwasoo Yeo 4 1 Smart Transportation System Laboratory,

More information

Internet Engineering Jacek Mazurkiewicz, PhD

Internet Engineering Jacek Mazurkiewicz, PhD Internet Engineering Jacek Mazurkiewicz, PhD Softcomputing Part 11: SoftComputing Used for Big Data Problems Agenda Climate Changes Prediction System Based on Weather Big Data Visualisation Natural Language

More information

Metaheuristics and Local Search

Metaheuristics and Local Search Metaheuristics and Local Search 8000 Discrete optimization problems Variables x 1,..., x n. Variable domains D 1,..., D n, with D j Z. Constraints C 1,..., C m, with C i D 1 D n. Objective function f :

More information

Classification Using Decision Trees

Classification Using Decision Trees Classification Using Decision Trees 1. Introduction Data mining term is mainly used for the specific set of six activities namely Classification, Estimation, Prediction, Affinity grouping or Association

More information

Induction of Decision Trees

Induction of Decision Trees Induction of Decision Trees Peter Waiganjo Wagacha This notes are for ICS320 Foundations of Learning and Adaptive Systems Institute of Computer Science University of Nairobi PO Box 30197, 00200 Nairobi.

More information

Chapter 10 Logistic Regression

Chapter 10 Logistic Regression Chapter 10 Logistic Regression Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Logistic Regression Extends idea of linear regression to situation where outcome

More information

Chapter 1 Introduction

Chapter 1 Introduction Chapter 1 Introduction 1.1 Introduction to Chapter This chapter starts by describing the problems addressed by the project. The aims and objectives of the research are outlined and novel ideas discovered

More information

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs) Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w 1 x 1 + w 2 x 2 + w 0 = 0 Feature 1 x 2 = w 1 w 2 x 1 w 0 w 2 Feature 2 A perceptron

More information

DM534 - Introduction to Computer Science

DM534 - Introduction to Computer Science Department of Mathematics and Computer Science University of Southern Denmark, Odense October 21, 2016 Marco Chiarandini DM534 - Introduction to Computer Science Training Session, Week 41-43, Autumn 2016

More information

Pattern Recognition and Machine Learning. Learning and Evaluation of Pattern Recognition Processes

Pattern Recognition and Machine Learning. Learning and Evaluation of Pattern Recognition Processes Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lesson 1 5 October 2016 Learning and Evaluation of Pattern Recognition Processes Outline Notation...2 1. The

More information

Risk Analysis for Assessment of Vegetation Impact on Outages in Electric Power Systems. T. DOKIC, P.-C. CHEN, M. KEZUNOVIC Texas A&M University USA

Risk Analysis for Assessment of Vegetation Impact on Outages in Electric Power Systems. T. DOKIC, P.-C. CHEN, M. KEZUNOVIC Texas A&M University USA 21, rue d Artois, F-75008 PARIS CIGRE US National Committee http : //www.cigre.org 2016 Grid of the Future Symposium Risk Analysis for Assessment of Vegetation Impact on Outages in Electric Power Systems

More information

Neural Networks Lecture 4: Radial Bases Function Networks

Neural Networks Lecture 4: Radial Bases Function Networks Neural Networks Lecture 4: Radial Bases Function Networks H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi

More information

Neural Networks DWML, /25

Neural Networks DWML, /25 DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.00 second Number of neurons 0 0 Connections per neuron 0 4-0 5 Scene recognition time 0. sec 00 inference

More information

is called an integer programming (IP) problem. model is called a mixed integer programming (MIP)

is called an integer programming (IP) problem. model is called a mixed integer programming (MIP) INTEGER PROGRAMMING Integer Programming g In many problems the decision variables must have integer values. Example: assign people, machines, and vehicles to activities in integer quantities. If this is

More information

From statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu

From statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu From statistics to data science BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Why? How? What? How much? How many? Individual facts (quantities, characters, or symbols) The Data-Information-Knowledge-Wisdom

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE 4: Linear Systems Summary # 3: Introduction to artificial neural networks DISTRIBUTED REPRESENTATION An ANN consists of simple processing units communicating with each other. The basic elements of

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

A Hybrid Method of CART and Artificial Neural Network for Short-term term Load Forecasting in Power Systems

A Hybrid Method of CART and Artificial Neural Network for Short-term term Load Forecasting in Power Systems A Hybrid Method of CART and Artificial Neural Network for Short-term term Load Forecasting in Power Systems Hiroyuki Mori Dept. of Electrical & Electronics Engineering Meiji University Tama-ku, Kawasaki

More information

Metaheuristics and Local Search. Discrete optimization problems. Solution approaches

Metaheuristics and Local Search. Discrete optimization problems. Solution approaches Discrete Mathematics for Bioinformatics WS 07/08, G. W. Klau, 31. Januar 2008, 11:55 1 Metaheuristics and Local Search Discrete optimization problems Variables x 1,...,x n. Variable domains D 1,...,D n,

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 14, 2014 Today s Schedule Course Project Introduction Linear Regression Model Decision Tree 2 Methods

More information

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data

More information

Neural Networks Task Sheet 2. Due date: May

Neural Networks Task Sheet 2. Due date: May Neural Networks 2007 Task Sheet 2 1/6 University of Zurich Prof. Dr. Rolf Pfeifer, pfeifer@ifi.unizh.ch Department of Informatics, AI Lab Matej Hoffmann, hoffmann@ifi.unizh.ch Andreasstrasse 15 Marc Ziegler,

More information

Advanced statistical methods for data analysis Lecture 2

Advanced statistical methods for data analysis Lecture 2 Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

CS 570: Machine Learning Seminar. Fall 2016

CS 570: Machine Learning Seminar. Fall 2016 CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or

More information

Integer weight training by differential evolution algorithms

Integer weight training by differential evolution algorithms Integer weight training by differential evolution algorithms V.P. Plagianakos, D.G. Sotiropoulos, and M.N. Vrahatis University of Patras, Department of Mathematics, GR-265 00, Patras, Greece. e-mail: vpp

More information

Feedforward Neural Nets and Backpropagation

Feedforward Neural Nets and Backpropagation Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features

More information

Artificial neural networks

Artificial neural networks Artificial neural networks Chapter 8, Section 7 Artificial Intelligence, spring 203, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 8, Section 7 Outline Brains Neural

More information