Proactive MDP-based Collision Avoidance Algorithm for Autonomous Cars


Denis Osipychev, Duy Tran, Weihua Sheng — School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, Oklahoma
Girish Chowdhary — School of Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, Oklahoma
Ruili Zeng — Department of Automobile Engineering, Military Transportation University, Tianjin, China

Abstract — This paper considers the decision-making problem of an autonomous car driving through an intersection in the presence of human-driven cars. A proactive collision avoidance system based on a learning-based MDP model is proposed, in contrast to a reactive system. This approach allows us to pose the question as an optimization problem. The proposed learning algorithm explicitly describes the interaction with the environment through a probabilistic transition model. The effectiveness of this concept is supported by a variety of simulations covering driving behaviors with Gaussian-distributed velocity, random actions, and real human driving.

I. INTRODUCTION

The high risk of collisions and the severity of their possible consequences remain defining properties of land transportation. Driving in the presence of other road users is a complex task mastered only by human drivers, and even they make wrong decisions, leading to lamentable statistics. Safe and reliable decision making is a major challenge for the use and popularization of autonomous robotic vehicles. To fit into existing traffic, modern autonomous cars are expected to combine a fast reactive safety system with a proactive, predictive control algorithm [1], [2]. Reactive safety features warn the driver about difficulties on the road or even take urgent actions to avoid accidents. They were developed to surpass humans in reaction time or sensing quality. Thanks to modern detectors and fast computer logic, such systems have had many successful implementations and prevented up to 80% of simulated collisions [3], [4]. For example, the completely reactive robotic system ALVINN uses camera images and neural networks for reactive decision making [5]. Reactive safety systems have improved road safety by helping avoid collisions and accidents in the short term. However, further safety improvements require increasing the sensitivity of the reactive systems, which leads to an increase in the number of false alarms. Also, most of those systems were non-optimal and annoying to the passengers. Proactive safety achieves a higher sensitivity to potentially dangerous situations while taking softer actions. Despite the use of both proactive and reactive methods in mobile robotics research, their adoption in transportation vehicles remains a challenge. There are existing works that recognize a driver's activities and act according to the likelihood of those or other activities.

This project is supported by the National Science Foundation NSF Grants CISE/IIS , CISE/IIS/ , National Natural Science Foundation of China NSFC Grants , and the Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China, No. ICT1408.

Fig. 1. Collision avoidance with the use of an intersection's infrastructure is possible in the time domain by changing speed in advance.
Most of these works consider the world as partially observed or completely hidden, where the motivation and dynamics of the processes are not available and only the effects of certain actions can be observed [6], [7]. These works give an approximate solution or use heuristic approaches such as if-else rules preprogrammed by the developer, which does not allow the solution to be optimized. This paper proposes the use of a classical Markov Decision Process (MDP) to solve the problem. In this way, it allows us to find the best actions given full knowledge of the speed, direction and position of all involved vehicles. This condition can be satisfied by establishing RF connections between all cars and transferring the data to each other using V2V or V2I communication, as explained in [8]. Because up to 50% of accidents occur at intersections, this paper introduces and verifies the use of the MDP framework for planning the actions of an autonomous vehicle (the agent) and checks the sufficiency of proactive actions for avoiding collisions. Fig. 1 illustrates that an early, small change in speed is sufficient to avoid a collision in the time domain.

II. METHODOLOGY

A. Learning-based MDP model: optimization over expected reward

In this section, we formulate the proactive decision-making problem as an optimization problem. For this purpose, the autonomous collision avoidance task is posed as an MDP tuple (S, A, T, R) that captures the Markovian transition of the car in the real world [9], [10]. Here, S is the set of discrete states of the car, A is the set of desired actions, T(s, a, s') is the transition model from any state s ∈ S to any other state s' ∈ S when the action a ∈ A is taken, and denotes the conditional probability of transition p(s' | s, a). R is the model of the reward obtained by the transition (s, a, s'). The value of each state is given by the value of the next state discounted by the discount factor γ plus the cost of the transition, as described by the Bellman equation:

V(s) = max_{a∈A} Σ_{s'∈S} T(s, a, s') · [ R(s, a, s') + γ·V(s') ]   (1)

The optimal policy π* is the set of actions, one for each state, that maximizes the expected discounted reward:

π* = argmax_π E_π[ Σ_{s∈S} R(s, a, s') ]   (2)

There are many approaches to solving MDPs, some of which were surveyed in recent papers [10], [7]. We chose the value-iteration algorithm due to its convergence guarantees. The proposed method is decomposed into the following steps: creating a dynamical model of a car; learning transition rules for the list of actions over dynamical simulations; solving the MDP in order to find the optimal solution for passing the intersection; and building a dynamical simulation of an intersection to validate the method.

B. Dynamic model of a vehicle

In order to simulate the dynamics of a car, a simplified Dubins-car-style dynamical model is described by equations of motion based on the dynamic vehicle model [11]. It uses six parameters to describe the real vehicle and environment:

m : Mass of vehicle [kg]
a : Distance from front axle to Center of Gravity [m]
b : Distance from rear axle to Center of Gravity [m]
C_x : Longitudinal tire stiffness [N]
C_y : Lateral tire stiffness [N/rad]
C_A : Air resistance coefficient [1/m]

In this simulation, we chose coefficients according to the Volvo V70 model as follows: m = 1700, a = 1.5, b = 1.5, C_x = , C_y = 4000, C_A = 0.5. Three states of the model were taken into consideration:

x_1(t) = v_x(t) : longitudinal velocity [m/s]   (3)
x_2(t) = v_y(t) : lateral velocity [m/s]   (4)
x_3(t) = r(t) : yaw rate [rad/s]   (5)

where v_x(t) and v_y(t) represent the longitudinal and lateral velocity and r(t) is the yaw rate at time t. The state-space structure of the model is given by the following differential equations:

dx_1(t)/dt = x_2(t)·x_3(t) + (1/m)·[ C_x·(u_1(t)+u_2(t))·cos u_5(t) − 2·C_y·(u_5(t) − (x_2(t)+a·x_3(t))/x_1(t))·sin u_5(t) + C_x·(u_3(t)+u_4(t)) − C_A·x_1(t)^2 ]   (6)

dx_2(t)/dt = −x_1(t)·x_3(t) + (1/m)·[ C_x·(u_1(t)+u_2(t))·sin u_5(t) + 2·C_y·(u_5(t) − (x_2(t)+a·x_3(t))/x_1(t))·cos u_5(t) + 2·C_y·(b·x_3(t) − x_2(t))/x_1(t) ]   (7)

dx_3(t)/dt = 1/(0.5·(a+b)^2·m) · { a·[ C_x·(u_1(t)+u_2(t))·sin u_5(t) + 2·C_y·(u_5(t) − (x_2(t)+a·x_3(t))/x_1(t))·cos u_5(t) ] − 2·b·C_y·(b·x_3(t) − x_2(t))/x_1(t) }   (8)

Solving these ordinary differential equations (ODEs), Eqs. (6)–(8), explicitly is difficult. However, the Runge-Kutta method [12] provides a numerical solution for the state of the vehicle (velocity, acceleration and yaw rate) in every iteration.

Fig. 2. An example of MDP formulation showing that some actions lead to the collision state. These actions should be marked by a highly negative reward (penalty).

To utilize the discrete-state MDP framework described in Section II-A, the continuous-time dynamic model of the car has to be translated into a discrete-state transition model.
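Before turning to that discretization, the following is a minimal sketch of the single-track model of Eqs. (6)–(8) integrated with a classical Runge-Kutta step. The inputs u_1..u_4 are treated as tire traction inputs and u_5 as the steering angle, and the value of C_x (which is not readable in the source) is a placeholder assumption; this is an illustrative reconstruction, not the authors' simulation code.

```python
import numpy as np

# Parameters from the paper (Volvo V70-like values); the C_x value below is an
# assumed placeholder, since the original number is missing in the source text.
m, a, b = 1700.0, 1.5, 1.5      # mass [kg], axle-to-CoG distances [m]
C_x = 1.5e5                      # longitudinal tire stiffness [N] (assumed)
C_y = 4000.0                     # lateral tire stiffness [N/rad]
C_A = 0.5                        # air resistance coefficient [1/m]

def f(x, u):
    """Right-hand side of Eqs. (6)-(8): x = [v_x, v_y, yaw rate], u = traction inputs and steering."""
    x1, x2, x3 = x
    u1, u2, u3, u4, u5 = u
    slip = u5 - (x2 + a * x3) / x1                       # front-axle slip term
    dx1 = x2 * x3 + (1.0 / m) * (C_x * (u1 + u2) * np.cos(u5)
                                 - 2 * C_y * slip * np.sin(u5)
                                 + C_x * (u3 + u4) - C_A * x1 ** 2)
    dx2 = -x1 * x3 + (1.0 / m) * (C_x * (u1 + u2) * np.sin(u5)
                                  + 2 * C_y * slip * np.cos(u5)
                                  + 2 * C_y * (b * x3 - x2) / x1)
    dx3 = (a * (C_x * (u1 + u2) * np.sin(u5) + 2 * C_y * slip * np.cos(u5))
           - 2 * b * C_y * (b * x3 - x2) / x1) / (0.5 * (a + b) ** 2 * m)
    return np.array([dx1, dx2, dx3])

def rk4_step(x, u, dt=0.1):
    """One classical Runge-Kutta (RK4) step, as used to integrate the ODEs numerically."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: coast at 14 m/s (~30 mph) with a small steering input for one CAS step (1 s).
x = np.array([14.0, 0.0, 0.0])
for _ in range(10):                                      # 10 increments of 0.1 s
    x = rk4_step(x, u=(0.0, 0.0, 0.0, 0.0, 0.02))
print(x)
```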
An example of this translation is shown in Fig. 2. The collision state is defined as a state in the grid world which is occupied by two cars at the same time.

As can be inferred from the example, the only way to avoid a collision at the junction of two paths is to reach this point at a time different from the other vehicle. This approach enables time to be used as one of the states of the car and allows dynamic states to be separated into static states by time steps. To maintain the connection between states, a transition model is required. The uncertainty in the transitions s → s' shown in Fig. 3 has to be described in terms of transition probabilities p(s' | s, a). The distribution of these probabilities has to be found for each of the discrete actions performed by the agent.

TABLE I. ACTION DESCRIPTIONS AND PENALTIES

N  | Description of action | Penalty
1  | Keep going            | 0
2  | Soft speed up         | 0
3  | Soft slow down        | 0
4  | Soft turn left        | 0
5  | Soft turn right       | 0
6  | Emergency stop        |
7  | Speed up              |
8  | Slow down             |
9  | Turn left             |
10 | Turn right            | -30

The set of actions can be decomposed into two main subsets: so-called soft actions and firm actions. The soft actions are numbered 1 to 5 in Table I. Because of their smoothness and passenger-friendliness, they are grouped as the preferred actions and defined as zero-cost actions. The firm actions, numbered 6 to 10 in Table I, are rough actions used when the soft actions are not sufficient to prevent a collision, with costs defined according to their preference. The durations of all actions are identical and defined by the time step of the CAS algorithm, equal to 1 second.

C. Learning of a discrete transition and reward model

In this paper, to represent a dynamical state of the agent as a static state, we choose 4 parameters: time, longitudinal and lateral locations on the road, and velocity of the vehicle. These parameters form a 4-dimensional set of non-overlapping states, while other parameters such as acceleration and orientation of the vehicle are neglected to reduce the number of states. These ignored parameters are assumed to be relatively small and to return to their initial values within a very short duration. Any state of the autonomous car can be classified by this discrete model of the world and represented as a tuple:

s = [time, loc_x, loc_y, velocity]

The resulting state-action transition matrix T(s, s', a) is very large and grows with the number of states. For the case considered in this paper, the set of all states forms a 10 × 10 × 3 × 10 grid, i.e., 3000 initial states and the same number of possible resulting states for each of the 10 actions. This leads to a very high-dimensional MDP with 90 million elements (3000 × 3000 × 10). It should be noted that the dimensionality of the discretized state space can be reduced by increasing the range over which the states are discretized, but this leads to other complexities, such as high uncertainty in the transitions.

To learn the transition model, this paper proposes the learning Algorithm 1, in which one time step of the CAS is divided into 10 incremental steps of 0.1 second each. The Dynamic Simulation function described in Section II-B simulates the path with these steps and returns the [x, y] data of all 10 steps. These coordinates are applied linearly to all possible initial points [Loc_x, Loc_y ∈ Road] distributed equally inside one discrete location state, giving the expected paths from these points. The obtained paths are then classified into discrete states. The numbers of visits to these discrete states when taking one action give the conditional probability distribution of the vehicle within one time step of the CAS.
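The state tuple s = [time, loc_x, loc_y, velocity] implies a mapping from continuous car states onto the 10 × 10 × 3 × 10 grid. The sketch below shows one way such a mapping could look; the assignment of grid sizes to dimensions, the cell sizes, and the velocity range are assumptions for illustration, since the paper only states the product of the dimensions.

```python
import numpy as np

# Grid sizes per state dimension (10 x 10 x 3 x 10 = 3000 states).  The assignment
# below (10 time steps, 10 longitudinal cells, 3 lateral cells, 10 velocity bins)
# and the physical cell sizes are assumptions, not values given in the paper.
N_T, N_X, N_Y, N_V = 10, 10, 3, 10
CELL_X, CELL_Y = 5.0, 3.5          # assumed cell sizes [m]
DT, V_MAX = 1.0, 30.0              # CAS time step [s], assumed maximum speed [m/s]

def discretize(t, loc_x, loc_y, v):
    """Map a continuous car state onto the discrete tuple [time, loc_x, loc_y, velocity]."""
    i_t = min(int(t / DT), N_T - 1)
    i_x = min(int(loc_x / CELL_X), N_X - 1)
    i_y = min(int(loc_y / CELL_Y), N_Y - 1)
    i_v = min(int(v / V_MAX * N_V), N_V - 1)
    return i_t, i_x, i_y, i_v

def flat_index(s):
    """Flatten the 4-D tuple so T can be stored as a (3000 x 3000 x |A|) array."""
    return np.ravel_multi_index(s, (N_T, N_X, N_Y, N_V))

s = discretize(t=2.3, loc_x=12.0, loc_y=4.0, v=14.0)
print(s, flat_index(s))            # e.g. (2, 2, 1, 4) and its flat index
```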
Learning the transition model in this way requires a lot of computational work, but the T matrix has to be obtained only once and remains valid as long as the dynamic model and the parameters of the grid world do not change.

Data: Car dynamic model D
Result: Transition model T
for every action a ∈ A do
    for every velocity v ∈ R do
        x = 0, y = 0, t = 0
        while t_inc ≤ t_CAS do
            [x_n, y_n, v_n, t_n] = D(x, y, t, t_inc, v)
            t_inc = t_inc + t_CAS / 10
            for every (Loc_x, Loc_y, time) ∈ R do
                s = [Loc_x, Loc_y, v, time]
                s_n = [x_n + Loc_x, y_n + Loc_y, v_n, t_n + t]
                T(s, a, s_n) = n(s_n) / Σ n(S_n)
Algorithm 1: Learning the Transition Model

Fig. 3. Uncertainties in the transitions from one state under one action may result in different states due to stochasticity inside the initial state.
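In the spirit of Algorithm 1, the following sketch estimates T(s, a, s') as normalized visit counts obtained by simulating from many starting points spread over one discrete cell. Here `simulate_step` stands in for one 0.1 s call to the dynamic model D and `discretize` for the grid mapping sketched earlier; both names are illustrative placeholders, not the paper's code.

```python
from collections import defaultdict

def learn_transition_model(actions, velocities, start_offsets, simulate_step, discretize):
    """Estimate T(s, a, s') as normalized visit counts over simulated one-second paths."""
    counts = defaultdict(lambda: defaultdict(int))       # (s, a) -> {s': number of visits}
    for a in actions:
        for v0 in velocities:
            for (dx, dy) in start_offsets:               # starting points spread over one location cell
                x, y, v, t = dx, dy, v0, 0.0
                for _ in range(10):                      # 10 increments of t_CAS / 10 = 0.1 s
                    x, y, v, t = simulate_step(x, y, v, t, a)
                s  = discretize(0.0, dx, dy, v0)         # initial discrete state (time bin 0)
                sn = discretize(t, x, y, v)              # discrete state reached after one CAS step
                counts[(s, a)][sn] += 1
    # Normalize the visit counts into conditional probabilities p(s' | s, a).
    T = {}
    for (s, a), visits in counts.items():
        total = sum(visits.values())
        T[(s, a)] = {sn: n / total for sn, n in visits.items()}
    return T
```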

The reward function is designed to show the agent which states should be preferred. We give a large negative reward to collisions, or to be more precise to the states in which a collision happens. To motivate the agent to move towards the intersection, the states on the other side of the intersection receive a positive reward. This positive reward is reduced by the time at which it is obtained, to discourage a policy that is very slow but safe. All other states obtain a reward according to the cost of actions shown in Table I. This formulation provides a great degree of flexibility in defining the priorities of actions and states.

R(s, s_collision, a) = R_penalty   (9)
R(s, s_final, a) = 50 − s_time   (10)
R(s, s', a) = Cost(a)   (11)

where R_penalty denotes the large negative collision penalty.

D. CAS algorithm description

The decision-making Algorithm 2 for the CAS is based on the Bellman equation shown in Eq. (1). We calculate the vector V(s) of the maximum values of each state s using the T(s, s', a) and R(s, s', a) matrices with respect to the probability of the transition from this state to any resulting state and the cost of this transition. The output matrix P(s) gives the best policy of actions. When the allocation of the penalty states in the matrix R is known, we have a map of actions for any state of the agent, regardless of where it has actually been. This policy is relevant only for the specific location of penalties, i.e., the distribution of the reward in the space. We could say that, regardless of other factors, a policy calculated once should fit any similar distribution of the rewards. Therefore, there is no need to constantly compute the policies on-line; they can be precomputed in advance and stored as ready-made solutions in a database, which saves computation time.

Data: Transition model T, Reward model R
Result: Optimal policy π
while Δ > η do
    for every s ∈ S do
        v = V(s)
        V(s) = max_{a∈A} Σ_{s'} T(s, a, s') · [ R(s, a, s') + γ·V(s') ]
        π(s) = argmax_{a∈A} Σ_{s'} T(s, a, s') · [ R(s, a, s') + γ·V(s') ]
        Δ = max(Δ, |v − V(s)|)
Algorithm 2: Value-iteration algorithm

E. Simulation description

To prove the viability of the concept, a computer simulation has been built describing an intersection where an autonomous vehicle moves from south to north. The simulation environment has been designed in the Matlab computing environment as an intersection where both autonomous and human-driven vehicles are involved. Fig. 4 illustrates the simulation of the vehicles passing the intersection, where the green, blue and yellow rectangles represent the human-driven vehicles, while the red one is the autonomous vehicle.

Fig. 4. Simulation of the autonomous car (at the bottom) coming through the intersection with other human-driving cars (on the right).

Algorithm 3 utilizes the dynamical equations of all vehicles and updates their positions with a time interval of 10 ms. The short update interval is used to eliminate the possibility of skipping discrete states and to avoid one vehicle jumping over another. The frequency of the CAS decision-making algorithm has been set to 1 Hz (once every second). Therefore, after each decision the agent continues to move by inertia for 1 second, until the next action is computed based on the evaluation of the environment. A delay in the implementation of the action is not taken into consideration, since the dynamics of the car can be defined as a black box. Two generalized cases of the problem have been elaborated: movement in the same direction as the traffic and movement in the transverse direction. The states of collisions are determined by classifying the visited states of the human-driven cars under the assumption that they continue to move with fixed velocity.
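Returning to the CAS solver of Section II-D, the sketch below shows a compact dense-array form of the value-iteration step of Algorithm 2. The array shapes, discount factor and stopping threshold η are assumptions for illustration; with the 3000-state, 10-action problem above, a dense T would hold the 90 million elements mentioned earlier, so a sparse representation would be preferable in practice.

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, eta=1e-4):
    """T[s, a, s'] = p(s' | s, a); R[s, a, s'] = reward of the transition (s, a, s')."""
    n_states, n_actions, _ = T.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = sum_{s'} T(s, a, s') * (R(s, a, s') + gamma * V(s'))   -- Eq. (1)
        Q = np.einsum('sap,sap->sa', T, R + gamma * V[None, None, :])
        V_new = Q.max(axis=1)
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < eta:                      # Delta < eta: converged
            break
    return V, Q.argmax(axis=1)               # value function and greedy policy pi(s)

# Tiny usage example with a random 4-state, 2-action MDP.
rng = np.random.default_rng(0)
T = rng.random((4, 2, 4)); T /= T.sum(axis=2, keepdims=True)
R = rng.random((4, 2, 4)) - 0.5
print(value_iteration(T, R))
```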
Classifying the predicted states in this way makes it possible to obtain the probability distribution of the intermediate states of all vehicles and to assign penalty values to these states in proportion to their probabilities. Three role models have been created to simulate a human-driven car. The first one reproduces a human driver holding a constant speed: the car is given an initial velocity, and its speed at each step is drawn from a Gaussian probability distribution centered on the velocity of the previous step. The second model emulates a random selection of an action every second from the list of soft actions, unified with the list of the agent's actions; it reproduces the intentional actions of a driver while driving. The third model uses real human driving. For this purpose, data have been obtained by driving through the simulated intersection using a Logitech G27 steering wheel and pedals to control the model of the car. Due to the large computational delay in calculating the transition matrix and the policy, the human driving cannot be executed in real time. The raw data produced by the human driver has been saved to a data file and replayed step by step during the CAS simulation. Thus, when the value-iteration algorithm is calculating the policy, the manually driven vehicle stops until the calculations are finished. This allows us to simulate the interaction with real drivers as closely as possible.
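A minimal sketch of the three driver role models described above is given below. The Gaussian standard deviation, the soft-action names, and the trace format are assumptions used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
SOFT_ACTIONS = ["keep", "soft_speed_up", "soft_slow_down", "soft_left", "soft_right"]

def gaussian_speed_driver(v_prev, sigma=0.5):
    """Model 1: next speed drawn from a Gaussian centered on the previous speed (sigma assumed)."""
    return max(0.0, rng.normal(v_prev, sigma))

def random_action_driver():
    """Model 2: one soft action picked at random every second."""
    return rng.choice(SOFT_ACTIONS)

def replayed_human_driver(trace, step):
    """Model 3: replay a recorded human-driving trace (e.g. from the steering wheel), one entry per CAS tick."""
    return trace[min(step, len(trace) - 1)]

# Example usage: evolve the Gaussian-speed driver for three CAS steps.
v = 14.0                                   # ~30 mph
for k in range(3):
    v = gaussian_speed_driver(v)
    print(k, round(v, 2), random_action_driver())
```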

It should be noted that none of these models performs actions in a very aggressive manner aimed at committing an intentional crash.

Data: Transition model T, Dynamic function D
Result: Result of collision
car_n = [x_n, y_n, v_n], t = 0
while y ≤ y_final ∈ R do
    [x_n, y_n, v_n, t_n] = D_n(x_n, y_n, v_n, t_n), n = 0..3
    if t ≥ t_CAS then
        S_collision,n = S_Agent ∩ S_car_n
        S_collision → R
        if R ≠ R_prev then
            π = CAS(x, y, v, t, T, R)
        a_{n=0} = π(s)
        switch Human behavior model do
            case 1: v_{n=1..3} = Gaussian(v_n)
            case 2: a_{n=1..3} = random(a ∈ A)
            case 3: v_{n=1..3} = load human.model
        apply a_{n=0..3} to D_n
    t = t + Δt
Algorithm 3: Simulation algorithm

Fig. 5. Transition model for actions 1 (keep going), 6 (emergency brake), 7 (speed up), 9 (turn left), 10 (turn right) and speeds 1, 30 and 60 mph. The probability of transition from the state marked by * is shown in gradations of red color.

III. EVALUATION

The transition matrix has been obtained by simulating the dynamic function of the agent. Ten small incremental steps within each time state have been checked and classified into discrete states, defining the conditional probability of ending up in any of these states. Thereby, 10 interim states have been tested for each of the 10 actions in each of the 3000 states, resulting in the classification of the values of the dynamic function. This process was the most computationally intensive despite the use of a simplified dynamic model. The resulting states of each action are shown in Fig. 5 in tonal gradations with respect to their probability. As can be seen, this probability depends not only on the selected action, but also on the vehicle's speed and location on the roadway.

The quantitative simulations provided data sufficient to compare the behavior of the reactive and proactive systems over 100 trials of 8 simulations each, covering 2 different initial velocities (30 and 60 miles per hour) and the presence of one or two human-driven cars with Gaussian-distributed speed. This quantitative simulation did not consider the random-action and real-human models due to difficulties in the comparison. In all cases, no car collisions occurred, and the travel time through the intersection improved significantly in contrast to the reactive system. Fig. 6 shows the velocities of the agent (denoted as Car1) and the human-driven car (denoted as Car2) moving in transverse directions.

Fig. 6. Agent (Car1) and human (Car2) velocities in a random example; the simulation stops when the agent passes the intersection. Both the human-driven and the autonomous car had initial velocities of 30 mph (14 mps), shown in the top figure, and mps, shown in the bottom one.

As can be inferred from the figure, the time required to pass the intersection for the proactive algorithm (6.1 and 4.5 seconds) is less than for the reactive

algorithm (7.1 and 9.8 seconds). The actions performed by the proactive system were smoother and required less change in speed, which caused less discomfort to the passengers. The cases with two human-driven cars are shown in Fig. 7. In all simulations the travel time of the proactive system was 25-30% shorter, and the agent avoided a complete stop in most cases when the use of soft actions was sufficient.

Fig. 7. Agent (Car1) and human (Car2, Car3) velocities in a random example; the simulation stops when the agent passes the intersection.

Fig. 8. Maximum acceleration used and travel time comparison for the MDP and reactive methods. The higher variances of the MDP results are due to the variety of solutions.

The statistical data over all 100 trials, shown in Fig. 8, demonstrate the significantly lower maximum acceleration used to avoid collisions and the improvement in travel time. The wider range of travel times and accelerations resulted from the originality of each solution found by the MDP for each particular allocation of the cars.

IV. CONCLUSIONS

Simulations of this approach proved the possibility of long-term planning of actions that avoid collisions with other cars. The CAS algorithm proposed in this paper avoided collisions in all considered cases. Significant advantages in travel time were achieved over the reactive methods that use a full-stop algorithm programmed with if-else rules. Simulations showed that the delay was reduced by 25-50% for the case of cross-traffic. The car performed a full stop only when there was not enough distance to maintain a lower speed while other cars were passing through. However, the calculation of the optimal policy carried out on-line significantly delays the CAS algorithm and cannot be implemented as an on-line process on a real car. The only way to reduce the computation time is to avoid changing the allocation of the penalties. This can be done by predicting the intentions of other drivers: human behavior can be learned and classified into several models which can then be used for the allocation of the penalty states. Another way is based on off-line calculation of all possible allocations of the penalties and combining them into groups with a unified solution that satisfies the whole group. This list of solutions can be used as a ready-made policy and can be considered on-line.

REFERENCES

[1] G. Leen and D. Heffernan, "Expanding automotive electronic systems," Computer, vol. 35, no. 1.
[2] J. Levinson et al., "Towards fully autonomous driving: Systems and algorithms," in Intelligent Vehicles Symposium (IV), 2011 IEEE. IEEE, 2011.
[3] T. Li, S.-J. Chang, and Y.-X. Chen, "Implementation of human-like driving skills by autonomous fuzzy behavior control on an FPGA-based car-like mobile robot," Industrial Electronics, IEEE Transactions on, vol. 50, no. 5.
[4] R. Sukthankar, "Raccoon: A real-time autonomous car chaser operating optimally at night," DTIC Document, Tech. Rep.
[5] D. A. Pomerleau, "ALVINN: An autonomous land vehicle in a neural network," DTIC Document, Tech. Rep.
[6] T. Bandyopadhyay et al., "Intention-aware pedestrian avoidance," in Experimental Robotics.
[7] S. Brechtel, T. Gindele, and R. Dillmann, "Probabilistic MDP-behavior planning for cars," in 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), 2011.
[8] J. Santa, A. F. Gomez-Skarmeta, and M. Sanchez-Artigas, "Architecture and evaluation of a unified V2V and V2I communication system based on cellular networks," Computer Communications, vol. 31, no. 12.
[9] R. Bellman, "A Markovian decision process," DTIC Document, Tech. Rep.
[10] A. Geramifard et al., "A tutorial on linear function approximators for dynamic programming and reinforcement learning." [Online].
[11] MathWorks, "Modeling a vehicle dynamics system." [Online].
[12] E. Hairer, C. Lubich, and M. Roche, The Numerical Solution of Differential-Algebraic Systems by Runge-Kutta Methods. Springer, 1989.
