DeepSD: Supply-Demand Prediction for Online Car-hailing Services using Deep Neural Networks


DeepSD: Supply-Demand Prediction for Online Car-hailing Services using Deep Neural Networks

Dong Wang 1,2, Wei Cao 1,2, Jian Li 1, Jieping Ye 2
1 Institute for Interdisciplinary Information Sciences, Tsinghua University
2 Bigdata Research Lab, Didi Chuxing
{cao-w13@mails, wang-dong12@mails, lijian83@mail}.tsinghua.edu.cn, yejieping@didichuxing.com

Abstract: The online car-hailing service has gained great popularity all over the world. As more passengers and more drivers use the service, it becomes increasingly important for car-hailing service providers to effectively schedule the drivers, so as to minimize the waiting time of passengers and maximize driver utilization, thus improving the overall user experience. In this paper, we study the problem of predicting the real-time car-hailing supply-demand, which is one of the most important components of an effective scheduling system. Our objective is to predict the gap between the car-hailing supply and demand in a certain area in the next few minutes. Based on the prediction, we can balance the supply and demand by scheduling the drivers in advance. We present an end-to-end framework called Deep Supply-Demand (DeepSD) using a novel deep neural network structure. Our approach can automatically discover complicated supply-demand patterns from the car-hailing service data while requiring only a minimal amount of hand-crafted features. Moreover, our framework is highly flexible and extendable. Based on our framework, it is very easy to utilize multiple data sources (e.g., car-hailing orders, weather and traffic data) to achieve a high accuracy. We conduct extensive experimental evaluations, which show that our framework provides more accurate prediction results than the existing methods.

I. INTRODUCTION

Online car-hailing apps/platforms have emerged as a novel and popular means to provide on-demand transportation services via mobile apps. To hire a vehicle, a passenger simply types in her/his desired pick-up location and destination in the app and sends the request to the service provider, who either forwards the request to some drivers close to the pick-up location, or directly schedules a close-by driver to take the order. Compared with traditional transportation such as subways and buses, the online car-hailing service is much more convenient and flexible for passengers. Furthermore, by incentivizing private car owners to provide car-hailing services, it promotes the sharing economy and enlarges the transportation capacities of cities. Several car-hailing mobile apps have gained great popularity all over the world, such as Uber, Didi, and Lyft. A large number of passengers are served and a large volume of car-hailing orders are generated routinely every day. For example, Didi, the largest online car-hailing service provider in China, handles around 11 million orders per day all over China.

As a large number of drivers and passengers use the service, several issues arise. Sometimes, some drivers experience a hard time getting any request since few people nearby call for rides; at the same time, it can be very difficult for some passengers to get a ride, in bad weather or rush hours, because the demand in the surrounding areas significantly exceeds the supply. Hence, it is a very important yet challenging task for the service providers to schedule the drivers in order to minimize the waiting time of passengers and maximize the driver utilization.
One of the most important ingredients of an effective driver scheduler is supply-demand prediction. If one could predict/estimate how many passengers need the ride service in a certain area in some future time slot and how many close-by drivers are available, it would be possible to balance the supply and demand in advance by dispatching cars, dynamically adjusting the price, or recommending popular pick-up locations to some drivers. In this paper, we study the problem of predicting the car-hailing supply-demand. More concretely, our goal is to predict the gap between the car-hailing supply and demand (i.e., max(0, demand − supply)) for a certain area in the next few minutes. Our research is conducted based on the online car-hailing order data of Didi.

To motivate our approach, we first present some challenges of the problem and discuss the drawbacks of the current standard practice for such problems.

The car-hailing supply-demand varies dynamically across geographic locations and time intervals. For example, in the morning the demand tends to surge in residential areas, whereas in the evening the demand usually tends to surge in business areas. Furthermore, the supply-demand patterns on different days of a week can be extremely different. Prior work usually distinguishes different geographic locations, time intervals or days of week and builds several sub-models respectively [1]–[4]. Treating the order data separately and creating many sub-models is tedious, and may suffer from the lack of training data, since each sub-model is trained over only a small part of the data.

The order data contains multiple attributes such as the timestamp, passenger ID, start location, destination, etc., as well as several environment factors, such as the traffic condition, weather condition, etc. These attributes together provide a wealth of information for supply-demand prediction. However, it is nontrivial to use all the attributes in a unified model. Currently, the most standard approach is to come up with many hand-crafted features (i.e., feature engineering) and feed them into an off-the-shelf learning algorithm such as logistic regression or random forest. However, feature engineering typically requires substantial human effort
(it is not unusual to see data science/machine learning practitioners creating hundreds of different features in order to achieve a competitive performance), and there is little general principle for how this should be done. Some prior work only keeps a subset of attributes for training, such as the timestamp and start location, and drops the other attributes [2]–[6]. While this makes the training easier, discarding attributes leads to information loss and reduces the prediction accuracy.

To provide some intuition for the readers and to illustrate the challenges, we provide an example in Fig. 1.

[Fig. 1. Car-hailing demands under four different situations: (a) first area on March 9th; (b) first area on March 13th; (c) second area on March 9th; (d) second area on March 13th.]

Example 1: Fig. 1 shows the demand curves for two areas on March 9th (Wednesday) and March 13th (Sunday). From the figure, we can see very different patterns across timeslots for the two areas. For the first area, few people require the car-hailing service on Wednesday; however, the demand increases sharply on Sunday. Such a pattern usually occurs in entertainment areas. For the second area, we observe a heavy demand on Wednesday, especially during the two peak hours around 8 o'clock and 19 o'clock (which are the commute times for most people during the weekdays); on Sunday, the demand for car-hailing services in this area reduces significantly.

Moreover, the supply-demand patterns change from day to day. There are many other complicated factors that can affect the pattern, and it is impossible to list them exhaustively. Hence, simply using the average value of historical data or empirical supply-demand patterns can lead to quite inaccurate prediction results, as we show in our experiments (see Section VI).

To address the above challenges, we propose an end-to-end framework for supply-demand prediction, called Deep Supply-Demand (DeepSD). Our framework is based on the deep learning technique, which has successfully demonstrated its power in a number of application domains such as vision, speech and natural language processing [7]–[9]. In particular, we develop a new neural network architecture that is tailored to our supply-demand prediction task. Our model demonstrates a high prediction accuracy, requires few hand-crafted features, and can be easily extended to incorporate new datasets and features. A preliminary version of our model achieved the 2nd place among 1648 teams in the Didi supply-demand prediction competition (DiTech 2016). The preliminary model we used for the competition was almost the same as the basic version of our model described in Section IV. Our final model, described in Section V, further refines the basic model by introducing a few new ideas, and is more stable and accurate. We are currently working to deploy the model and incorporate it into the scheduling system at Didi.

Our technical contributions are summarized below:

- We propose an end-to-end framework based on a deep learning approach. Our approach can automatically learn the patterns across different spatio-temporal attributes (e.g., geographic locations, time intervals and days of week), which allows us to process all the data in a unified model, instead of separating it into sub-models manually. Compared with other off-the-shelf methods (e.g., gradient boosting, random forest [10]), our model requires a minimal amount of feature engineering (i.e., hand-crafted features), but produces more accurate prediction results.
- We devise a novel neural network architecture, inspired by the deep residual network (ResNet) proposed recently by He et al. [11] for image classification. The new network structure allows one to incorporate the environment factor data, such as the weather and traffic data, very easily into our model; at the same time, we can easily utilize the multiple attributes contained in the order data without much information loss.

- We utilize the embedding method [9], [12], a popular technique from natural language processing, to map the high-dimensional categorical features into a smaller subspace. In the experiments, we show that the embedding method enhances the prediction accuracy significantly. Furthermore, with embedding, our model automatically discovers the similarities among the supply-demand patterns of different areas and timeslots.

- We further study the extendability of our model. In real applications, it is very common to incorporate extra attributes or data sources into an already trained model, which typically requires re-training the model from scratch. In contrast, the residual learning component of our model can reuse the already trained parameters via a simple fine-tuning strategy. In the experiments, we show that fine-tuning can accelerate the convergence of the model significantly.

- Finally, we conduct extensive experiments on a large-scale real dataset of car-hailing orders from Didi. The experimental results show that our algorithm outperforms the existing methods significantly: the prediction error of our algorithm is 11.9% lower than that of the best existing method.

II. FORMULATION AND OVERVIEW

We present a formal definition of our problem. We divide a city into N non-overlapping square areas a_1, a_2, ..., a_N, and each day into 1440 timeslots (one minute per timeslot). Then we define the car-hailing orders in Definition 1.
Definition 1 (Car-hailing Order): A car-hailing order o is defined as a tuple: the date o.d on which the car-hailing request was sent, the corresponding timeslot o.ts ∈ [1, 1440], the passenger ID o.pid, the area ID of the start location o.loc_s ∈ [N], and the area ID of the destination o.loc_d ∈ [N]. If a driver answered the request, we say it is a valid order; otherwise, if no driver answered the request, we say it is an invalid order.

Definition 2 (Supply-demand Gap): For the d-th day, the supply-demand gap of the time interval [t, t + C) in area a is defined as the total number of invalid orders in this time interval. We fix the constant C to be 10 in this paper (the constant 10 minutes is due to a business requirement and can be replaced by any other constant), and we denote the corresponding gap as gap_a^{d,t}.

We further collected the weather condition data and the traffic condition data of the different areas, which we refer to as the environment data.

Definition 3 (Weather Condition): For a specific area a at timeslot t on the d-th day, the weather condition (denoted wc) is defined as a tuple: the weather type (e.g., sunny, rainy, cloudy) wc.type, the temperature wc.temp, and the PM2.5 wc.pm. All areas share the same weather condition at the same timeslot.

Definition 4 (Traffic Condition): The traffic condition describes the congestion level of the road segments in each area, from Level 1 (most congested) to Level 4 (least congested). For a specific area a at timeslot t on the d-th day, the traffic condition is defined as a quadruple: the total number of road segments in area a at each of the four congestion levels.

Now we can define our problem.

Problem: Suppose the current date is the d-th day and the current timeslot is t. Given the past order data and the past environment data, our goal is to predict the supply-demand gap gap_a^{d,t} for every area a, i.e., the supply-demand gap in the next 10 minutes.

Our paper is organized as follows. We first present several preliminaries in Section III, including the embedding method and the residual network mentioned in the introduction. We then describe the basic version of our model in Section IV; it adopts a simple network structure and only uses the order data of the current day. In Section V, we present an advanced version, which extends the basic version, utilizes more attributes of the order data, and further incorporates the historical order data to enhance the prediction accuracy. In Section VI we conduct extensive experimental evaluations. Finally, we briefly review related work in Section VII and conclude in Section VIII.

III. PRELIMINARIES

A. Embedding Method

The embedding method is a feature learning technique widely used in deep learning, especially in natural language processing (NLP) tasks [9], [12], [13]. It is a parameterized function which maps categorical values to real numbers. Neural networks treat every input as a real value, and a simple way to transform a categorical value to real numbers is the one-hot representation. For example, suppose the value of a categorical feature is 3 and the corresponding vocabulary size (the highest possible value) is 5; then its one-hot representation is (0, 0, 1, 0, 0). However, such a representation can be computationally expensive when the vocabulary size is huge. Moreover, it does not capture the similarity between different categories. The embedding method overcomes these issues by mapping each categorical value into a low-dimensional space (relative to the vocabulary size). For example, the categorical value with one-hot representation (0, 0, 1, 0, 0) can be represented in a form such as (0.2, 1.4, 0.5). Formally, for each categorical feature, we build an embedding layer with a parameter matrix W ∈ R^{I×O}, where I is the vocabulary size of the input categorical value and O is the dimension of the output space (which we refer to as the embedding space). For a specific categorical value i ∈ [I], we use onehot(i) ∈ R^{1×I} to denote its one-hot representation; its embedded vector embed(i) ∈ R^{1×O} is then equal to onehot(i) multiplied by the matrix W, i.e., the i-th row of W. We usually have O ≪ I, so even when the vocabulary size is very large, we can still handle these categorical values efficiently. Furthermore, an important property of the embedding method is that categorical values with similar semantic meanings are usually very close in the embedding space. For example, in our problem we find that if two different areas share similar supply-demand patterns, then their area IDs are close in the embedding space; see Section VI for details. We stress that the parameter matrix W of an embedding layer is optimized together with the other parameters of the network; we do not train the embedding layers separately.
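To make the lookup concrete, here is a minimal numpy sketch (our illustration, not the paper's implementation) showing that embedding a categorical value i amounts to selecting the i-th row of W, which is exactly onehot(i) multiplied by W:

```python
import numpy as np

I, O = 5, 3                     # vocabulary size and embedding dimension (O << I in practice)
rng = np.random.default_rng(0)
W = rng.normal(size=(I, O))     # embedding matrix; in DeepSD it is learned with the network

def onehot(i, size):
    """One-hot representation of categorical value i (0-indexed here)."""
    v = np.zeros(size)
    v[i] = 1.0
    return v

i = 2                                      # one-hot representation (0, 0, 1, 0, 0)
embedded_matmul = onehot(i, I) @ W         # onehot(i) multiplied by W
embedded_lookup = W[i]                     # equivalent row lookup, much cheaper
assert np.allclose(embedded_matmul, embedded_lookup)
```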
B. Residual Network

Many non-trivial tasks have greatly benefited from very deep neural networks, which reveals that network depth is of crucial importance [14]–[16]. However, an obstacle to training a very deep model is the gradient vanishing/exploding problem: the gradient vanishes or explodes after passing through several layers during backpropagation [17], [18]. To overcome this issue, He et al. [11] proposed a new network architecture called the residual network (ResNet), which allows one to train very deep convolutional neural networks successfully. Residual learning adds shortcut connections and direct connections between different layers, so that the input vector can be passed directly to the following layers through the shortcut connections. Let x denote the input vector and H(x) the desired mapping after two stacked layers. In a residual network, instead of learning the mapping H(x) directly, we learn the residual mapping F(x) = H(x) − x and broadcast F(x) + x to the following layers. It has been shown that optimizing the residual mapping is much easier than optimizing the original mapping [11], which is the key to the success of the deep residual network.
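A minimal numpy sketch of a residual unit under the notation above (our illustration, not code from the paper): the stacked layers learn the residual mapping F(x), and the shortcut adds the input back so the unit outputs F(x) + x:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 32
W1, b1 = rng.normal(size=(D, D)) * 0.1, np.zeros(D)
W2, b2 = rng.normal(size=(D, D)) * 0.1, np.zeros(D)

def f(x):
    # a generic nonlinearity; DeepSD uses a leaky ReLU (Section VI-B)
    return np.maximum(0.01 * x, x)

def residual_unit(x):
    """Two stacked layers learn F(x); the shortcut supplies the identity."""
    Fx = f(f(x @ W1 + b1) @ W2 + b2)   # residual mapping F(x) = H(x) - x
    return x + Fx                      # broadcast F(x) + x to the following layers

y = residual_unit(rng.normal(size=D))
```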

IV. BASIC VERSION

We first present the basic version of our model in this section; in Section V, we extend it with a few new ideas into the advanced version. The basic model consists of three parts, and each part consists of one or more blocks (the block is the base unit of our model). In Section IV-A, we first process the identity features (area ID, timeslot, day of week) in the identity part. Next, in Section IV-B, we describe the order part, which processes the order data and is the most important part of our model. In Section IV-C, we present the environment part, which processes the weather data and the traffic data. Finally, in Section IV-D, we illustrate how we connect the different blocks. The structure of our basic model is shown in Fig. 2.

[Fig. 2. Structure of basic DeepSD.]
[Fig. 3. Identity Block.]

A. Identity Part

The identity part consists of one block called the identity block. We call the features which identify the data item we want to predict the identity features. They include the ID of the area (AreaID), the ID of the timeslot (TimeID) and the day of week (Monday, Tuesday, ..., Sunday) (WeekID). For example, if we want to predict the supply-demand gap of area a in the time interval [t, t + 10) on the d-th day and that day is Monday, then AreaID = a, TimeID = t and WeekID = 0.

Note that the features in the identity block are categorical. As mentioned in Section III, we can use either the one-hot representation or the embedding representation to transform categorical values to real numbers. In our problem, since the vocabularies of AreaID and TimeID are very large, the one-hot representation leads to a high cost. Moreover, the one-hot representation treats different areas or timeslots independently, while we find that different areas at different times can share similar supply-demand patterns, especially when they are spatio-temporally close; for example, the demands of car-hailing are usually very heavy for all the areas around the business center at 19:00. Clustering these similar data items helps enhance the prediction accuracy. In our model, we use the embedding method to reduce the feature dimensions and to discover the similarities among different areas and timeslots.

Formally, the structure of the identity part is shown in Fig. 3. We use three Embedding Layers to embed AreaID, TimeID and WeekID respectively, and then concatenate the outputs of the three Embedding Layers with a Concatenate Layer. The Concatenate Layer takes a list of vectors as input and simply outputs their concatenation. We use the output of the Concatenate Layer as the output of the identity block, denoted X_id.

Furthermore, we stress that prior work [1], [3], [19] also clusters similar data items to enhance the prediction accuracy. However, it treats the clustering stage as a separate sub-task and has to manually design the distance measure, which is a non-trivial task. Our model is end-to-end: we optimize the embedding parameters together with the other parameters of the neural network, and hence we do not need to design any distance measure separately; the parameters are optimized through backpropagation towards minimizing the final prediction loss.

B. Order Part

The order part of the basic version consists of only one block, called the supply-demand block. The supply-demand block can be regarded as a three-layer perceptron which processes the order data. For a specific area a, to predict the supply-demand gap gap_a^{d,t} of the time interval [t, t + 10) on the d-th day, we consider the set of orders with timestamps in [t − L, t) on the d-th day, which we denote as S. Here L is the window size, which is set to 30 minutes in the experiment section (Section VI). We then aggregate S into a real-time supply-demand vector.

Definition 5 (Real-time supply-demand vector): For a specific area a, we define the real-time supply-demand vector on the d-th day at timeslot t as V^{d,t}. V^{d,t} is a 2L-dimensional vector consisting of two parts. We denote the first L dimensions of V^{d,t} as V_A^{d,t}.
The l-th dimension of V_A^{d,t} is defined as

V_A^{d,t}(l) = |{o ∈ S : o is valid and o.ts = t − l}|.

In other words, V_A^{d,t}(l) is the number of valid orders at timeslot t − l on the current day. Similarly, we define the remaining part V_B^{d,t}, which corresponds to the invalid orders in the previous L minutes.

We use V^{d,t} as the Input Layer of the supply-demand block and pass it through two Fully-Connected (abbr. FC) layers. A Fully-Connected Layer with input x is defined as FC_sz(x) = f(x W + b), where sz is the output size, W and b are the parameters, and f is the activation function (specified in Section VI). We use FC_64 as the first Fully-Connected Layer and FC_32 as the second. The output of the supply-demand block is the output of FC_32, denoted X_sd. See Fig. 4 for an illustration.

[Fig. 4. Structure of the Supply-demand Block.]
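Putting Definition 5 and the FC layers together, the supply-demand block is a small multilayer perceptron over the 2L-dimensional vector. Below is a hypothetical numpy sketch (random weights stand in for learned parameters; the leaky slope 0.01 is our assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
L = 30                                   # window size in minutes (Section VI)

def f(x):
    # leaky ReLU activation; the 0.01 slope is assumed, see Section VI-B
    return np.maximum(0.01 * x, x)

def FC(x, W, b):
    """Fully-Connected layer: FC_sz(x) = f(x W + b)."""
    return f(x @ W + b)

# V^{d,t}: first L dims count valid orders at t - l, last L dims invalid orders
V = rng.poisson(5.0, size=2 * L).astype(float)

W1, b1 = rng.normal(size=(2 * L, 64)) * 0.1, np.zeros(64)   # FC_64
W2, b2 = rng.normal(size=(64, 32)) * 0.1, np.zeros(32)      # FC_32
X_sd = FC(FC(V, W1, b1), W2, b2)         # output of the supply-demand block
```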

C. Environment Part

In the environment part, we incorporate the information from the weather data by adding a weather block to the network, and the information from the traffic data through a traffic block. For the weather condition, we first create a weather condition vector V_wc^{d,t}; the structure of the weather block is shown in Fig. 5. The vector V_wc^{d,t} consists of L parts. For a specific l ∈ [L], we take the weather condition wc at timeslot t − l on the d-th day and embed the weather type wc.type into a low-dimensional space; the l-th part of V_wc^{d,t} is then the concatenation of the embedded weather type wc.type, the temperature wc.temp and the PM2.5 wc.pm.

[Fig. 5. Weather Block and Traffic Block.]

Furthermore, note that the weather block also receives the output of the supply-demand block, X_sd, through a direct connection. We concatenate X_sd and V_wc^{d,t} with a Concatenate Layer and pass the result through two Fully-Connected layers FC_64 and FC_32. Denoting the output of FC_32 as R_wc, the output of the weather block is

X_wc = X_sd ⊕ R_wc,

where ⊕ is the element-wise add operation and X_sd is obtained through the shortcut connection. Note that this structure is similar to ResNet (Section III), with two main differences. First, instead of adding shortcut connections between layers, we add them between different blocks. Second, in ResNet a layer only receives the input of previous layers through a direct connection, whereas in our model a block receives inputs from both the previous block and the dataset. Such a structure is, on the one hand, more suitable for handling data from multiple sources; on the other hand, as we show in Section VI-H, it is highly extendable, allowing us to easily incorporate new datasets or attributes into our model.

For the traffic condition, recall that at each timeslot the traffic condition of a specific area is represented as the total number of road segments at each of the four congestion levels. We thus create a traffic condition vector V_tc^{d,t} with L parts, each consisting of four real values corresponding to the traffic condition at that timeslot. We construct the traffic block in the same way as the weather block, and use X_tc = X_wc ⊕ R_tc as its output, as shown in Fig. 5.

D. Block Connections

We then connect all the blocks. The supply-demand block, the weather block and the traffic block are already connected through residual learning, and the output vector of these stacked blocks is X_tc. We concatenate the output of the identity block, X_id, with X_tc using a Concatenate Layer, and append a Fully-Connected Layer FC_32 followed by a single neuron. The single neuron outputs the predicted supply-demand gap with a linear activation function, as shown in Fig. 2. We stress that our model is end-to-end: once we obtain the predicted value, we can calculate the loss with the loss function and update each parameter with its gradient through backpropagation.

We further illustrate the intuition behind our model. To predict the supply-demand gap, the most relevant and important data is the car-hailing order data, and we use the supply-demand block to learn a useful feature vector, X_sd, from it. The environment data can be regarded as supplementary to the learnt features: we add the weather block to extract the residual R_wc and adjust the previously learnt features by adding R_wc to X_sd. The same argument holds for the traffic block.

V. ADVANCED VERSION

In this section, we present an advanced version of our model.
Compared with the basic model, the advanced model replaces the order part of Fig. 2 with an extended order part, as shown in Fig. 6, which is composed of three blocks. The first block, the extended supply-demand block, extends the original supply-demand block with a carefully designed structure that enables our model to automatically learn the dependence of the supply-demand on the history of different days; we present it in Section V-A. In Section V-B, we present the remaining two blocks, the last call block and the waiting time block, which have the same structure as the extended supply-demand block. Compared with the basic version, where we only use the numbers of orders, the new blocks contain passenger information as well.

[Fig. 6. Extended Order Part.]

A. Extended Supply-demand Block

Recall that in the basic version, we use the real-time supply-demand vector V^{d,t} to predict the supply-demand gap. In the extended order part, we further incorporate the historical order data, i.e., the car-hailing orders with dates prior to the d-th day, to enhance the prediction accuracy. We present the extended supply-demand block in two stages.
In the first stage, we obtain an empirical supply-demand vector for the time interval [t − L, t) on the d-th day; this vector is an estimation of V^{d,t} based on the historical order data. In the second stage, we use the real-time supply-demand vector and the empirical supply-demand vector to construct our extended supply-demand block.

1) First Stage: We first extract the empirical supply-demand vector in [t − L, t) on the d-th day, denoted E^{d,t}. It has been shown that, due to the regularity of human mobility, the patterns of a traffic system usually show a strong weekly periodicity [1]–[3], [5], [20], [21]. However, the supply-demand patterns on different days of week can be very different. For example, in Huilongguan, a district in Beijing where many IT employees live, the demand for car-hailing services on Monday morning is usually much higher than on Sunday morning. Motivated by this, we first consider the historical supply-demands on each day of week. Formally, we use M to denote the set of all Mondays prior to the d-th day. For each day m ∈ M, we calculate the corresponding real-time supply-demand vector on that day, denoted V^{m,t} (as defined in Definition 5). We average the vectors V^{m,t} over all m ∈ M and call the average the historical supply-demand vector on Monday, denoted H^{(Mon),t}:

H^{(Mon),t} = (1 / |M|) Σ_{m ∈ M} V^{m,t}.

Similarly, we define the historical supply-demand vectors for the other days of week: H^{(Tue),t}, H^{(Wed),t}, ..., H^{(Sun),t}. The empirical supply-demand vector E^{d,t} is defined as a weighted combination of {H^{(Mon),t}, ..., H^{(Sun),t}}; we refer to the weight vector p as the combining weights of the different weekdays. In our model, p is learnt automatically by the neural network from the current AreaID and WeekID. The network structure is shown in Fig. 7: we first embed the current AreaID and WeekID into a low-dimensional space, concatenate the embedded vectors into x, and pass x into a Softmax Layer, which outputs the weight vector p by

p(i) = e^{x W_{.i}} / Σ_j e^{x W_{.j}}, for i = 1, ..., 7,

where W_{.j} is the j-th column of the parameter matrix W of the Softmax Layer. Then we have

E^{d,t} = p(1) H^{(Mon),t} + ... + p(7) H^{(Sun),t}.    (1)

[Fig. 7. Learning the combining weights of the historical supply-demand vectors H.]

We stress that most prior work simply treats the historical data of weekdays and of weekends separately [1]–[3], [6], [22]. On the one hand, such a method may suffer from a lack of training data, since only part of the data is utilized when calculating the historical supply-demand vector. On the other hand, different areas can show different dependences over the days of week. For example, in our experiments (Section VI), we find that for some areas the supply-demands on Tuesdays are very different from those on the other days of week; to predict the supply-demand on a Tuesday in such an area, we should mainly rely on the historical data of past Tuesdays. For some other areas, the supply-demands on all days of week are very similar, and taking all the historical data into consideration leads to a more accurate result. Obviously, simply separating the historical data into weekdays and weekends cannot capture such patterns.
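The first stage boils down to a softmax over the concatenated embeddings followed by the weighted average of Equ. (1); a hypothetical numpy sketch (dimensions and names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
L = 30                                                        # window size in minutes
emb_area, emb_week = rng.normal(size=8), rng.normal(size=3)   # embedded AreaID / WeekID
x = np.concatenate([emb_area, emb_week])                      # input of the Softmax Layer

W = rng.normal(size=(x.size, 7))        # parameter matrix of the Softmax Layer
logits = x @ W
p = np.exp(logits - logits.max())       # max-shift for numerical stability
p /= p.sum()                            # p(i) = e^{x W_.i} / sum_j e^{x W_.j}

# H[k]: historical supply-demand vector for day-of-week k (Mon, ..., Sun)
H = rng.poisson(5.0, size=(7, 2 * L)).astype(float)
E = p @ H                               # Equ. (1): weighted combination of H^(Mon..Sun)
```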
2) Second Stage: Next, we use the obtained empirical supply-demand vector and the real-time supply-demand vector to construct our block. First, using the same method by which we obtained E^{d,t}, we calculate another empirical supply-demand vector for the time interval [t − L + 10, t + 10) of the current day, denoted E^{d,t+10}. Note that E^{d,t+10} is the empirical estimation of the real-time supply-demand vector V^{d,t+10}. If we could estimate V^{d,t+10} accurately, we could easily predict the current supply-demand gap. In our model, we use the empirical estimations E^{d,t}, E^{d,t+10} and the real-time supply-demand vector V^{d,t} to estimate V^{d,t+10}. We first use Fully-Connected Layers to project these three vectors onto the same low-dimensional space (in our experiments we fix the dimensionality to 16), and denote the projected vectors Proj(V^{d,t}), Proj(E^{d,t}) and Proj(E^{d,t+10}). Instead of estimating V^{d,t+10} directly, we estimate its projection, denoted Proj(V̂^{d,t+10}):

Proj(V̂^{d,t+10}) = Proj(V^{d,t}) − Proj(E^{d,t}) + Proj(E^{d,t+10}).

Finally, we concatenate Proj(V^{d,t}), Proj(E^{d,t}), Proj(E^{d,t+10}) and Proj(V̂^{d,t+10}) with a Concatenate Layer and pass the result through two Fully-Connected layers FC_64 and FC_32. We use the output of FC_32 as the output of the extended supply-demand block. See Fig. 8 for an illustration.

We explain the reason for estimating V^{d,t+10} in this way. The vector Proj(V^{d,t}) − Proj(E^{d,t}) indicates how the real-time supply-demand in [t − L, t) deviates from its empirical estimation. We thus estimate Proj(V^{d,t+10}) by adding this deviation to the projection of the empirical estimation, Proj(E^{d,t+10}).
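The second stage is then a handful of matrix products; a hypothetical numpy sketch (we use a single shared projection matrix for readability, whereas the paper projects with Fully-Connected Layers):

```python
import numpy as np

rng = np.random.default_rng(4)
L, P = 30, 16                            # window size; projection dimensionality (Section V-A)
V_t   = rng.poisson(5.0, size=2 * L).astype(float)   # real-time vector V^{d,t}
E_t   = rng.poisson(5.0, size=2 * L).astype(float)   # empirical vector E^{d,t}
E_t10 = rng.poisson(5.0, size=2 * L).astype(float)   # empirical vector E^{d,t+10}

W_proj = rng.normal(size=(2 * L, P)) * 0.1           # stands in for the learned projection

def proj(v):
    """Project a 2L-dimensional supply-demand vector onto the 16-dim space."""
    return v @ W_proj

# carry the real-time deviation from the empirical estimate forward by 10 minutes:
proj_v_t10_hat = proj(V_t) - proj(E_t) + proj(E_t10)

# the block concatenates all four projected vectors for the following FC_64 / FC_32 layers
features = np.concatenate([proj(V_t), proj(E_t), proj(E_t10), proj_v_t10_hat])
```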

[Fig. 8. Extended Supply-demand Block.]

Moreover, the projection operation, on the one hand, reduces the dimension of each supply-demand vector from 2L to 16; on the other hand, we find in our experiments that it makes our model more stable.

B. Last Call Block and Waiting Time Block

In this section, we present two additional blocks, called the last call block and the waiting time block. The order data contains multiple attributes, but when calculating the supply-demand vector we did not consider the attribute o.pid; thus, the supply-demand vector V^{d,t} does not contain any passenger information. From V^{d,t}, we cannot answer questions such as how many unique passengers did not get rides in the last 5 minutes, or how many passengers waited for more than 3 minutes. However, we find that passenger information is also very important for supply-demand gap prediction. For example, if many passengers failed to call rides or waited for a long time, this reflects that the current demand exceeds the supply significantly, which can lead to a large supply-demand gap in the next few minutes. We use the last call block and the waiting time block to provide the passenger information. Both blocks have the same structure as the extended supply-demand block; in other words, we just replace the real-time supply-demand vector V^{d,t} in the extended supply-demand block with the real-time last call vector and the real-time waiting time vector, respectively.

For the last call block, we define the last call vector as follows.

Definition 6 (Real-time last call vector): For a specific area a at timeslot t on the d-th day, we first pick out the last call orders in [t − L, t) for all passengers (i.e., for a specific passenger pid, we only keep the last order sent by pid), and denote this order set as S_L. The real-time last call vector V_lc^{d,t} is then a 2L-dimensional vector whose first L dimensions we denote as V_{lc,A}^{d,t}, with

V_{lc,A}^{d,t}(l) = |{pid : ∃ o ∈ S_L s.t. o is valid, o.pid = pid and o.ts = t − l}|.

That is, V_{lc,A}^{d,t}(l) is the number of passengers whose last call was at t − l and who successfully got the ride. Similarly, we define V_{lc,B}^{d,t}, which corresponds to the passengers who did not get rides.

We explain the reason for defining the real-time last call vector. In our data, we find that if a passenger failed to call a ride, she/he is likely to send the car-hailing request again within the next few minutes; in particular, the last calls near timeslot t are highly relevant to the supply-demand gap in [t, t + 10). Based on V_lc^{d,t}, we can further obtain the empirical last call vector E_lc^{d,t} in the same way as we obtain E^{d,t}, and we construct the last call block with the same structure as the extended supply-demand block.

For the waiting time block, we define the real-time waiting time vector V_wt^{d,t} ∈ R^{2L} in the same way as we define V^{d,t} and V_lc^{d,t}.

Definition 7 (Real-time waiting time vector): For a specific area a at timeslot t on the d-th day, we define the real-time waiting time vector V_wt^{d,t} as follows. The l-th dimension of the first part V_{wt,A}^{d,t} (the first L dimensions) is the total number of passengers who waited for l minutes (from her/his first call in [t − L, t) to the last call) and did get rides in the end. Similarly, we define the second part V_{wt,B}^{d,t}, which corresponds to the waiting times of the passengers who did not get rides. We construct the waiting time block with the same structure as the extended supply-demand block.
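To make Definition 6 concrete, here is a small hypothetical Python sketch (the order representation and the 1-based offset convention are our assumptions) that aggregates an order list into the 2L-dimensional last call vector:

```python
import numpy as np

L = 30  # window size in minutes

def last_call_vector(orders, t):
    """orders: dicts with keys 'pid', 'ts', 'valid', all with ts in [t - L, t).
    Returns the 2L-dim vector; dims 0..L-1 count valid last calls at t - l."""
    last = {}
    for o in sorted(orders, key=lambda o: o["ts"]):
        last[o["pid"]] = o                 # keep only each passenger's last order
    v = np.zeros(2 * L)
    for o in last.values():
        l = t - o["ts"]                    # minutes before t, 1 <= l <= L
        v[(l - 1) if o["valid"] else (L + l - 1)] += 1
    return v

orders = [
    {"pid": "p1", "ts": 95, "valid": False},
    {"pid": "p1", "ts": 98, "valid": True},   # p1's last call, answered
    {"pid": "p2", "ts": 97, "valid": False},  # p2's last call, unanswered
]
v = last_call_vector(orders, t=100)        # v[1] == 1 (p1), v[L + 2] == 1 (p2)
```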
Finally, we connect the extended supply-demand block, the last call block and the waiting time block through residual learning, as shown in Fig. 6. These three blocks together form the extended order part of the advanced model. We use the extended order part to replace the original order part, and we thus obtain the advanced version of DeepSD.

C. Extendability

Finally, in this section we discuss the extendability of our model. In real applications, it is very common to incorporate extra attributes or data sources into a previously trained model. For example, imagine that we have already trained a model based on the order data and the weather data, and we now obtain the traffic data and want to incorporate it to enhance the prediction accuracy. Typically, one has to discard the already trained parameters and re-train the model from the beginning. Our model, however, makes good use of the already trained parameters. In our model, this scenario corresponds to having trained a model with the order block and the weather block; to incorporate the traffic data, we construct the traffic block and connect it with the previous blocks through residual learning, as shown in Fig. 2. Instead of re-training the model from scratch, we use the already trained parameters as the initial parameters and keep optimizing the parameters of the new model through backpropagation. We refer to this strategy as fine-tuning. In the experiments (Section VI), we show that fine-tuning accelerates the convergence significantly and makes our model highly extendable.

VI. EXPERIMENT

In this section, we report our experimental results on a real dataset from Didi. We first describe the details of our dataset in Section VI-A and the experimental settings in Section VI-B. Then, we compare our models with several popular machine learning algorithms in Section VI-C. In Sections VI-D to VI-F, we show the effects of the different components of our model. The advanced DeepSD can automatically extract the weights for combining the features of the different days of a week; we present some interesting properties of these weights in Section VI-G. Finally, we show some results on the extendability of our model in Section VI-H.
A. Data Description

In our experiments, we use the public dataset released by Didi for the Di-tech supply-demand prediction competition. The order dataset contains the car-hailing orders from Didi over more than 7 weeks for 58 square areas in Hangzhou, China; each area is about 3 km × 3 km. The order dataset consists of 11,467,117 orders. The gaps in our dataset are approximately power-law distributed: a small number of items have very large gaps, while around 48% of the test items are supply-demand balanced, i.e., gap = 0. Auxiliary information includes the weather conditions (weather type, temperature, PM2.5) and the traffic conditions (the total number of road segments at the different congestion levels in each area).

The training data is from 23rd Feb to 17th March (24 days in total). To construct the training set, for each area on each training day, we generate one training item every 5 minutes from 0:00 to 24:00. Thus, we have 58 (areas) × 24 (days) × 283 (items) = 393,936 training items in total. Due to the restriction of the test data, we set the window size L = 30. The test data is from 18th March to 14th April (28 days in total). During the test days, the first timeslot is 7:30 and the last timeslot is 23:30. We select one timeslot t every 2 hours from the first timeslot until the last, i.e., t = 7:30, 9:30, 11:30, ..., 23:30, and generate one test item for each selected timeslot. We use T to denote the set of test items.

1) Error Metrics: We evaluate the predicted results using the mean absolute error (MAE) and the root mean squared error (RMSE). Formally, we use pred_a^{d,t} to denote the predicted value of gap_a^{d,t}. Then,

MAE = (1 / |T|) Σ_{(a,d,t) ∈ T} |gap_a^{d,t} − pred_a^{d,t}|,

RMSE = sqrt( (1 / |T|) Σ_{(a,d,t) ∈ T} (gap_a^{d,t} − pred_a^{d,t})^2 ).
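Both metrics are straightforward to compute; a minimal numpy sketch (function names ours):

```python
import numpy as np

def mae(gap, pred):
    """Mean absolute error over the test set."""
    gap, pred = np.asarray(gap, float), np.asarray(pred, float)
    return np.mean(np.abs(gap - pred))

def rmse(gap, pred):
    """Root mean squared error over the test set."""
    gap, pred = np.asarray(gap, float), np.asarray(pred, float)
    return np.sqrt(np.mean((gap - pred) ** 2))

gap, pred = [0, 3, 12, 0, 7], [1, 2, 10, 0, 9]
print(mae(gap, pred), rmse(gap, pred))
```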
B. Model Details

We describe the model settings in this section.

1) Embedding: Recall that we map all the categorical values to low-dimensional vectors via embedding (Sections IV-A and IV-C). The detailed settings of the different embedding layers are shown in Table I.

TABLE I. EMBEDDING SETTINGS
Embedding Layer | Setting | Occurring Parts
Embedding of AreaID | R^58 → R^8 | Identity Part, Extended Order Part
Embedding of TimeID | R^1440 → R^6 | Identity Part
Embedding of WeekID | R^7 → R^3 | Identity Part, Extended Order Part
Embedding of wc.type | R^10 → R^3 | Environment Part

2) Activation Function: For all Fully-Connected layers, we use the leaky rectified linear function (LReL) [23] as the activation function: LReL(x) = max{0.01x, x}. For the final output neuron, we simply use the linear activation.

3) Optimization Method: We apply the Adaptive Moment Estimation (Adam) method [24], a robust mini-batch gradient descent algorithm, to train our model. We fix the batch size to 64. To prevent overfitting, we further apply dropout [25] with probability 0.5 after each block (except the identity block).

4) Platform: Our model is trained on a GPU server with one GeForce GTX 1080 GPU (8 GB GDDR5) and 24 CPU cores (2.1 GHz) on the CentOS 6.5 platform. We implement our model with Theano 0.8.2, a widely used deep learning Python library [26].

C. Performance Comparison

We train both the basic model and the advanced model for 50 epochs and evaluate the model after each epoch. To make our model more robust, our final model is the average of the models from the best 10 epochs. To illustrate the effectiveness of our model, we further compare it with several existing methods; the parameters of all models are tuned through grid search.

Empirical Average: For a specific timeslot t in area a, we simply use the empirical average gap (1 / |D_train|) Σ_{d ∈ D_train} gap_a^{d,t} as the prediction for the supply-demand gap in time interval [t, t + 10).

LASSO [10]: LASSO is a linear model that estimates sparse coefficients and usually produces better prediction results than simple linear regression. Since LASSO cannot handle categorical variables, we transform each categorical variable into its one-hot representation. We use the LASSO implementation from the scikit-learn library [27].

Gradient Boosting Decision Tree: Gradient Boosting Decision Tree (GBDT) is a powerful ensemble method which is widely used in data mining applications. In our experiments, we use a well-tuned and efficient GBDT implementation, XGBoost [28].

Random Forest: Random Forest (RF) is another widely used ensemble method, offering performance comparable to GBDT. We use the RF implementation from the scikit-learn library [27].

For a fair comparison, we use the same input features for the above methods (except the empirical average) as those used in DeepSD, including:

- AreaID, TimeID, WeekID;
- the real-time supply-demand vector V^{d,t}, and the historical supply-demand vectors of the different days of week H^{(Mon),t}, ..., H^{(Sun),t};
- the real-time last call vector V_lc^{d,t}, and the historical last call vectors of the different days of week H_lc^{(Mon),t}, ..., H_lc^{(Sun),t};
- the real-time waiting time vector V_wt^{d,t}, and the historical waiting time vectors of the different days of week H_wt^{(Mon),t}, ..., H_wt^{(Sun),t};
- the weather conditions and the traffic conditions.

[TABLE II. PERFORMANCE COMPARISON: MAE and RMSE of Average, LASSO, GBDT, RF, Basic DeepSD and Advanced DeepSD.]

Table II shows the comparison results. From Table II, we can see that the error of the empirical average is much larger than that of the other methods. By carefully tuning the parameters, LASSO provides a much better prediction result than the empirical average. GBDT achieves the best prediction accuracy among all existing methods, for both MAE and RMSE, while the overall error of RF is somewhat worse than that of LASSO. Our models significantly outperform all existing methods. Basic DeepSD only uses the real-time order data, yet it already outperforms the other methods even though they use more input features. Advanced DeepSD achieves the best prediction results for both MAE and RMSE, which demonstrates its prediction power; its RMSE is 11.9% lower than that of the best existing method.

[Fig. 9. Accuracy under different thresholds: (a) MAE and (b) RMSE of LASSO, GBDT, RF, Basic DeepSD and Advanced DeepSD.]

In Fig. 9, we further enumerate a threshold and compare the models under different thresholds. For a specific threshold, we evaluate the models on the subset of the test data whose gaps are smaller than the threshold. Basic DeepSD shows a result comparable to GBDT for RMSE; for MAE, however, Basic DeepSD is significantly better than GBDT. For all thresholds, Advanced DeepSD gives the best results under both metrics.

Fig. 10 shows the prediction curves of the advanced model and of GBDT (which performs the best among all other methods). The figure shows that GBDT is more likely to overestimate or underestimate the supply-demand gap under rapid variations (see the curves in the circles in the figure), while our model provides relatively accurate predictions even under very rapid variations.

D. Effects of Embedding

Our model uses the embedding representation instead of the one-hot representation for categorical values. To show the effectiveness of embedding, we list in Table III the errors of the different models with the embedding representation and with the one-hot representation. The experimental results show that utilizing the embedding method improves both the time cost and the accuracy.

[TABLE III. EFFECTS OF EMBEDDING: MAE, RMSE and time per epoch of Basic and Advanced DeepSD with one-hot and embedding representations.]

Moreover, recall that in Section IV-A we claimed that the embedding technique can cluster the data with similar supply-demand patterns to enhance the prediction accuracy. To verify this, we consider the embedded vectors of different areas and compare the supply-demand curves of these areas. We find that if two area IDs are close in the embedding space, their supply-demand patterns are very similar. As an example, we show the pairwise Euclidean distances among four different areas in the embedding space in Table IV: Area 3 is very close to Area 19, and Area 4 is very close to Area 24.

[TABLE IV. PAIRWISE EUCLIDEAN DISTANCES OF EMBEDDED AREAS.]

We plot the car-hailing demand curves of these areas on 1st March, as shown in Fig. 11(a) and Fig. 11(b).
From the figure, we can see that for areas which are close in the embedding space, the demand curves are very similar; meanwhile, for areas which are far apart from each other, the corresponding demand curves are very different. More importantly, in the experiments we find that our model is able to discover the supply-demand similarity across different scales; in other words, our model discovers the similarity of supply-demand trends regardless of scale. For example, Fig. 11(c) shows the demand curves of Area 4 and Area 46. The demands in these two areas are at different scales and the demand curves do not even overlap; however, the distance between these two areas in the learnt embedding space is very small. Indeed, if we plot the two demand curves at the same scale (as shown in Fig. 11(d)), we can see that the curves are very similar, i.e., they have similar supply-demand trends.

[Fig. 10. Comparison of GBDT and DeepSD; the circled regions mark where the ground truth changes drastically.]
[Fig. 11. Effects of embedding. (a) Area 3 and Area 19; (b) Area 4 and Area 24: areas that have similar patterns are also closer in Euclidean distance in the embedding space. (c) Supply-demand before scaling; (d) supply-demand after scaling: Areas 46 and 4 have similar demand patterns, but at different scales.]

E. Effects of the Environment Part

In our model, we incorporate the environment data (e.g., weather, traffic) to further improve the prediction accuracy. To show the effectiveness of this supplementary part, we compare the performance of the models in three cases. In Case A, we only use the order part (or the extended order part). In Case B, we further incorporate the weather block. In Case C, we use all the blocks, as presented in our paper. Fig. 12 shows the prediction accuracies in the different cases. Clearly, incorporating the environment data further reduces the prediction error for both the basic and the advanced versions of DeepSD.

[Fig. 12. Effects of the Environment Part: (a) MAE and (b) RMSE of the basic and advanced models under Cases A, B and C.]

F. Effects of Residual Learning

We adopt the residual learning technique to connect the different blocks. To show its effects, we eliminate all the shortcut/direct connections and simply concatenate all the blocks with a Concatenate Layer. We show the structure of basic DeepSD without residual learning in Fig. 13; the advanced DeepSD without residual learning can be constructed in the same way. The experimental results are shown in Table V. We find that residual learning improves the prediction accuracy effectively; in contrast, simply concatenating the different blocks leads to a larger error.

[Fig. 13. The network structure of Basic DeepSD without residual learning.]
[TABLE V. EFFECTS OF RESIDUAL LEARNING: MAE and RMSE of Basic and Advanced DeepSD with and without residual learning.]

G. Combining Weights of Different Weekdays

Our DeepSD model learns the relative importance of the different days of a week and uses a weight vector to combine the features of the different days. Specifically, from the current AreaID and WeekID, we obtain a 7-dimensional vector p which indicates the weights of the different days of week (see Equ. (1)). We visualize the weight vectors of two different areas on different days of week in Fig. 14; the blue bars correspond to the weight vector on Tuesday, and the red bars to the weight vector on Sunday. As we can see, the weight vector on Tuesday is extremely different from that on Sunday. If the current day is Sunday, the weight is concentrated only on the weekends; this also explains the effectiveness of separating the data of weekdays and weekends, as done in prior work [1]–[3], [22]. However, even for the same day of week, the weights in different areas can differ. For example, in Fig. 14(a) the weight of Tuesday is significantly higher than those of the other days, whereas in Fig. 14(b) the weights of all days are relatively uniform.


More information

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, et al. Google arxiv:1609.08144v2 Reviewed by : Bill

More information

arxiv: v2 [cs.ne] 22 Feb 2013

arxiv: v2 [cs.ne] 22 Feb 2013 Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint arxiv:1301.3533v2 [cs.ne] 22 Feb 2013 Xanadu C. Halkias DYNI, LSIS, Universitè du Sud, Avenue de l Université - BP20132, 83957 LA

More information

Short-term water demand forecast based on deep neural network ABSTRACT

Short-term water demand forecast based on deep neural network ABSTRACT Short-term water demand forecast based on deep neural network Guancheng Guo 1, Shuming Liu 2 1,2 School of Environment, Tsinghua University, 100084, Beijing, China 2 shumingliu@tsinghua.edu.cn ABSTRACT

More information

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17 3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural

More information

Comments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms

Comments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:

More information

Deep Learning Lab Course 2017 (Deep Learning Practical)

Deep Learning Lab Course 2017 (Deep Learning Practical) Deep Learning Lab Course 207 (Deep Learning Practical) Labs: (Computer Vision) Thomas Brox, (Robotics) Wolfram Burgard, (Machine Learning) Frank Hutter, (Neurorobotics) Joschka Boedecker University of

More information

Lecture 6: Neural Networks for Representing Word Meaning

Lecture 6: Neural Networks for Representing Word Meaning Lecture 6: Neural Networks for Representing Word Meaning Mirella Lapata School of Informatics University of Edinburgh mlap@inf.ed.ac.uk February 7, 2017 1 / 28 Logistic Regression Input is a feature vector,

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

Stochastic Gradient Descent. CS 584: Big Data Analytics

Stochastic Gradient Descent. CS 584: Big Data Analytics Stochastic Gradient Descent CS 584: Big Data Analytics Gradient Descent Recap Simplest and extremely popular Main Idea: take a step proportional to the negative of the gradient Easy to implement Each iteration

More information

Predicting New Search-Query Cluster Volume

Predicting New Search-Query Cluster Volume Predicting New Search-Query Cluster Volume Jacob Sisk, Cory Barr December 14, 2007 1 Problem Statement Search engines allow people to find information important to them, and search engine companies derive

More information

Introduction to Logistic Regression

Introduction to Logistic Regression Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the

More information

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35 Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How

More information

CSCI567 Machine Learning (Fall 2018)

CSCI567 Machine Learning (Fall 2018) CSCI567 Machine Learning (Fall 2018) Prof. Haipeng Luo U of Southern California Sep 12, 2018 September 12, 2018 1 / 49 Administration GitHub repos are setup (ask TA Chi Zhang for any issues) HW 1 is due

More information

Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction

Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction Huaxiu Yao, Fei Wu Pennsylvania State University {huaxiuyao, fxw133}@ist.psu.edu Jintao Ke Hong Kong University of Science and Technology

More information

The Research of Urban Rail Transit Sectional Passenger Flow Prediction Method

The Research of Urban Rail Transit Sectional Passenger Flow Prediction Method Journal of Intelligent Learning Systems and Applications, 2013, 5, 227-231 Published Online November 2013 (http://www.scirp.org/journal/jilsa) http://dx.doi.org/10.4236/jilsa.2013.54026 227 The Research

More information

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017 Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient

More information

Deep Learning: a gentle introduction

Deep Learning: a gentle introduction Deep Learning: a gentle introduction Jamal Atif jamal.atif@dauphine.fr PSL, Université Paris-Dauphine, LAMSADE February 8, 206 Jamal Atif (Université Paris-Dauphine) Deep Learning February 8, 206 / Why

More information

DANIEL WILSON AND BEN CONKLIN. Integrating AI with Foundation Intelligence for Actionable Intelligence

DANIEL WILSON AND BEN CONKLIN. Integrating AI with Foundation Intelligence for Actionable Intelligence DANIEL WILSON AND BEN CONKLIN Integrating AI with Foundation Intelligence for Actionable Intelligence INTEGRATING AI WITH FOUNDATION INTELLIGENCE FOR ACTIONABLE INTELLIGENCE in an arms race for artificial

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Lecture 9 Numerical optimization and deep learning Niklas Wahlström Division of Systems and Control Department of Information Technology Uppsala University niklas.wahlstrom@it.uu.se

More information

Lecture 17: Neural Networks and Deep Learning

Lecture 17: Neural Networks and Deep Learning UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions

More information

Forecasting demand in the National Electricity Market. October 2017

Forecasting demand in the National Electricity Market. October 2017 Forecasting demand in the National Electricity Market October 2017 Agenda Trends in the National Electricity Market A review of AEMO s forecasting methods Long short-term memory (LSTM) neural networks

More information

Learning Deep Architectures for AI. Part II - Vijay Chakilam

Learning Deep Architectures for AI. Part II - Vijay Chakilam Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

Machine Learning. Boris

Machine Learning. Boris Machine Learning Boris Nadion boris@astrails.com @borisnadion @borisnadion boris@astrails.com astrails http://astrails.com awesome web and mobile apps since 2005 terms AI (artificial intelligence)

More information

Lecture 5 Neural models for NLP

Lecture 5 Neural models for NLP CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm

More information

UAPD: Predicting Urban Anomalies from Spatial-Temporal Data

UAPD: Predicting Urban Anomalies from Spatial-Temporal Data UAPD: Predicting Urban Anomalies from Spatial-Temporal Data Xian Wu, Yuxiao Dong, Chao Huang, Jian Xu, Dong Wang and Nitesh V. Chawla* Department of Computer Science and Engineering University of Notre

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 12: Probability 3/2/2011 Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. 1 Announcements P3 due on Monday (3/7) at 4:59pm W3 going out

More information

Interpreting Deep Classifiers

Interpreting Deep Classifiers Ruprecht-Karls-University Heidelberg Faculty of Mathematics and Computer Science Seminar: Explainable Machine Learning Interpreting Deep Classifiers by Visual Distillation of Dark Knowledge Author: Daniela

More information

Introduction to Neural Networks

Introduction to Neural Networks CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character

More information

18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

The prediction of passenger flow under transport disturbance using accumulated passenger data

The prediction of passenger flow under transport disturbance using accumulated passenger data Computers in Railways XIV 623 The prediction of passenger flow under transport disturbance using accumulated passenger data T. Kunimatsu & C. Hirai Signalling and Transport Information Technology Division,

More information

Based on the original slides of Hung-yi Lee

Based on the original slides of Hung-yi Lee Based on the original slides of Hung-yi Lee Google Trends Deep learning obtains many exciting results. Can contribute to new Smart Services in the Context of the Internet of Things (IoT). IoT Services

More information

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6 Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Encoder Based Lifelong Learning - Supplementary materials

Encoder Based Lifelong Learning - Supplementary materials Encoder Based Lifelong Learning - Supplementary materials Amal Rannen Rahaf Aljundi Mathew B. Blaschko Tinne Tuytelaars KU Leuven KU Leuven, ESAT-PSI, IMEC, Belgium firstname.lastname@esat.kuleuven.be

More information

Linear Models for Regression

Linear Models for Regression Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Regression Problem Training data: A set of input-output

More information

Predicting flight on-time performance

Predicting flight on-time performance 1 Predicting flight on-time performance Arjun Mathur, Aaron Nagao, Kenny Ng I. INTRODUCTION Time is money, and delayed flights are a frequent cause of frustration for both travellers and airline companies.

More information

Algorithms for Learning Good Step Sizes

Algorithms for Learning Good Step Sizes 1 Algorithms for Learning Good Step Sizes Brian Zhang (bhz) and Manikant Tiwari (manikant) with the guidance of Prof. Tim Roughgarden I. MOTIVATION AND PREVIOUS WORK Many common algorithms in machine learning,

More information

The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems

The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems Weinan E 1 and Bing Yu 2 arxiv:1710.00211v1 [cs.lg] 30 Sep 2017 1 The Beijing Institute of Big Data Research,

More information

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 3: Introduction to Deep Learning (continued)

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 3: Introduction to Deep Learning (continued) Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 3: Introduction to Deep Learning (continued) Course Logistics - Update on course registrations - 6 seats left now -

More information

Deep Residual. Variations

Deep Residual. Variations Deep Residual Network and Its Variations Diyu Yang (Originally prepared by Kaiming He from Microsoft Research) Advantages of Depth Degradation Problem Possible Causes? Vanishing/Exploding Gradients. Overfitting

More information

ECE521 Lecture 7/8. Logistic Regression

ECE521 Lecture 7/8. Logistic Regression ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression

More information

Exploiting Geographic Dependencies for Real Estate Appraisal

Exploiting Geographic Dependencies for Real Estate Appraisal Exploiting Geographic Dependencies for Real Estate Appraisal Yanjie Fu Joint work with Hui Xiong, Yu Zheng, Yong Ge, Zhihua Zhou, Zijun Yao Rutgers, the State University of New Jersey Microsoft Research

More information

Improving the travel time prediction by using the real-time floating car data

Improving the travel time prediction by using the real-time floating car data Improving the travel time prediction by using the real-time floating car data Krzysztof Dembczyński Przemys law Gawe l Andrzej Jaszkiewicz Wojciech Kot lowski Adam Szarecki Institute of Computing Science,

More information

Where to Find My Next Passenger?

Where to Find My Next Passenger? Where to Find My Next Passenger? Jing Yuan 1 Yu Zheng 2 Liuhang Zhang 1 Guangzhong Sun 1 1 University of Science and Technology of China 2 Microsoft Research Asia September 19, 2011 Jing Yuan et al. (USTC,MSRA)

More information

DATA MINING AND MACHINE LEARNING

DATA MINING AND MACHINE LEARNING DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems

More information

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Hujia Yu, Jiafu Wu [hujiay, jiafuwu]@stanford.edu 1. Introduction Housing prices are an important

More information

Predictive analysis on Multivariate, Time Series datasets using Shapelets

Predictive analysis on Multivariate, Time Series datasets using Shapelets 1 Predictive analysis on Multivariate, Time Series datasets using Shapelets Hemal Thakkar Department of Computer Science, Stanford University hemal@stanford.edu hemal.tt@gmail.com Abstract Multivariate,

More information

INTRODUCTION TO DATA SCIENCE

INTRODUCTION TO DATA SCIENCE INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture #13 3/9/2017 CMSC320 Tuesdays & Thursdays 3:30pm 4:45pm ANNOUNCEMENTS Mini-Project #1 is due Saturday night (3/11): Seems like people are able to do

More information

Predicting freeway traffic in the Bay Area

Predicting freeway traffic in the Bay Area Predicting freeway traffic in the Bay Area Jacob Baldwin Email: jtb5np@stanford.edu Chen-Hsuan Sun Email: chsun@stanford.edu Ya-Ting Wang Email: yatingw@stanford.edu Abstract The hourly occupancy rate

More information

CME323 Distributed Algorithms and Optimization. GloVe on Spark. Alex Adamson SUNet ID: aadamson. June 6, 2016

CME323 Distributed Algorithms and Optimization. GloVe on Spark. Alex Adamson SUNet ID: aadamson. June 6, 2016 GloVe on Spark Alex Adamson SUNet ID: aadamson June 6, 2016 Introduction Pennington et al. proposes a novel word representation algorithm called GloVe (Global Vectors for Word Representation) that synthesizes

More information

Parking Study MAIN ST

Parking Study MAIN ST Parking Study This parking study was initiated to help understand parking supply and parking demand within Oneida City Center. The parking study was performed and analyzed by the Madison County Planning

More information

ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple

More information

Dreem Challenge report (team Bussanati)

Dreem Challenge report (team Bussanati) Wavelet course, MVA 04-05 Simon Bussy, simon.bussy@gmail.com Antoine Recanati, arecanat@ens-cachan.fr Dreem Challenge report (team Bussanati) Description and specifics of the challenge We worked on the

More information

Deep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017

Deep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017 Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion

More information

Feature Design. Feature Design. Feature Design. & Deep Learning

Feature Design. Feature Design. Feature Design. & Deep Learning Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately

More information

Source localization in an ocean waveguide using supervised machine learning

Source localization in an ocean waveguide using supervised machine learning Source localization in an ocean waveguide using supervised machine learning Haiqiang Niu, Emma Reeves, and Peter Gerstoft Scripps Institution of Oceanography, UC San Diego Part I Localization on Noise09

More information

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann (Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for

More information

Machine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Machine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/

More information

VISUAL EXPLORATION OF SPATIAL-TEMPORAL TRAFFIC CONGESTION PATTERNS USING FLOATING CAR DATA. Candra Kartika 2015

VISUAL EXPLORATION OF SPATIAL-TEMPORAL TRAFFIC CONGESTION PATTERNS USING FLOATING CAR DATA. Candra Kartika 2015 VISUAL EXPLORATION OF SPATIAL-TEMPORAL TRAFFIC CONGESTION PATTERNS USING FLOATING CAR DATA Candra Kartika 2015 OVERVIEW Motivation Background and State of The Art Test data Visualization methods Result

More information

Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks

Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks Classification of Higgs Boson Tau-Tau decays using GPU accelerated Neural Networks Mohit Shridhar Stanford University mohits@stanford.edu, mohit@u.nus.edu Abstract In particle physics, Higgs Boson to tau-tau

More information

Jakub Hajic Artificial Intelligence Seminar I

Jakub Hajic Artificial Intelligence Seminar I Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network

More information

Introduction to Deep Neural Networks

Introduction to Deep Neural Networks Introduction to Deep Neural Networks Presenter: Chunyuan Li Pattern Classification and Recognition (ECE 681.01) Duke University April, 2016 Outline 1 Background and Preliminaries Why DNNs? Model: Logistic

More information

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run

More information

Explaining Results of Neural Networks by Contextual Importance and Utility

Explaining Results of Neural Networks by Contextual Importance and Utility Explaining Results of Neural Networks by Contextual Importance and Utility Kary FRÄMLING Dep. SIMADE, Ecole des Mines, 158 cours Fauriel, 42023 Saint-Etienne Cedex 2, FRANCE framling@emse.fr, tel.: +33-77.42.66.09

More information

Neural Network Language Modeling

Neural Network Language Modeling Neural Network Language Modeling Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Marek Rei, Philipp Koehn and Noah Smith Course Project Sign up your course project In-class presentation

More information

Boosting: Algorithms and Applications

Boosting: Algorithms and Applications Boosting: Algorithms and Applications Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision ANU 2 nd Semester, 2008 Chunhua Shen, NICTA/RSISE Boosting Definition

More information

What s Cooking? Predicting Cuisines from Recipe Ingredients

What s Cooking? Predicting Cuisines from Recipe Ingredients What s Cooking? Predicting Cuisines from Recipe Ingredients Kevin K. Do Department of Computer Science Duke University Durham, NC 27708 kevin.kydat.do@gmail.com Abstract Kaggle is an online platform for

More information

Journal of Chemical and Pharmaceutical Research, 2014, 6(5): Research Article

Journal of Chemical and Pharmaceutical Research, 2014, 6(5): Research Article Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(5):266-270 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Anomaly detection of cigarette sales using ARIMA

More information

Prediction and Uncertainty Quantification of Daily Airport Flight Delays

Prediction and Uncertainty Quantification of Daily Airport Flight Delays Proceedings of Machine Learning Research 82 (2017) 45-51, 4th International Conference on Predictive Applications and APIs Prediction and Uncertainty Quantification of Daily Airport Flight Delays Thomas

More information

How to shape future met-services: a seamless perspective

How to shape future met-services: a seamless perspective How to shape future met-services: a seamless perspective Paolo Ruti, Chief World Weather Research Division Sarah Jones, Chair Scientific Steering Committee Improving the skill big resources ECMWF s forecast

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable

More information