Approximate active fault detection and control

1 Approximate active fault detection and control
Jan Škach, Ivo Punčochář, Miroslav Šimandl
Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
11th European Workshop on Advanced Control and Diagnosis (ACD)

2 Outline
1 Introduction
2 Problem formulation
3 Optimal active fault detector and controller
4 Approximate active fault detector and controller
5 Air handling unit example
6 Conclusion

3 Introduction: passive fault detector and controller
A passive fault detector uses the input and output data [u, y] to generate a decision d about faults in the system; no input signal that would improve the quality of detection is generated.
The controller works separately: it uses the output data y to generate an input signal u that controls the system.

5 Introduction: active fault detector and controller
An active fault detector uses the output data y to generate a decision d and an input signal u that probes the system to ensure improved quality of detection.
An active fault detector and controller uses the output data y to generate a decision d and an input signal u that both probes and controls the system.

7 Introduction: goal of the paper
To design an active fault detector and controller (AFDC) for nonlinear systems over an infinite time horizon with a discounted criterion.
To demonstrate the proposed AFDC in a numerical example.

8 Problem formulation: system description
The multiple-model approach is considered (one model is fault-free, the others are faulty; $\mu_k \in M = \{1, 2, \dots, N\}$ is the unknown model index). A system with perfect state information is described by the time-invariant model
$$s_{k+1} = \phi(s_k, u_k, x_{k+1}), \qquad (1)$$
where
$s_k = [x_k^T, b_k^T]^T \in S$ is the hyper-state (perfect state information),
$x_k \in \mathbb{R}^{n_x}$ is the common state ($x_{k+1}$ is described by $p(x_{k+1} \mid s_k, u_k)$),
$b_k = [b_{k,1}, \dots, b_{k,i}, \dots, b_{k,N-1}]^T \in B$ is the belief state over the system models, with $b_{k,i} = P(\mu_k = i \mid x_0^k, u_0^{k-1})$,
$\phi$ is a nonlinear vector function,
$u_k \in U = \{\bar u_1, \dots, \bar u_m\} \subset \mathbb{R}^{n_u}$ is an admissible control,
$P_{i,j} = P(\mu_{k+1} = j \mid \mu_k = i)$, $x_0$, and $P(\mu_0)$ are known.
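The slide does not spell out how the belief state evolves, but it follows the standard Bayesian recursion over the model set. A minimal sketch of one common form of this update, assuming hypothetical model-conditioned likelihoods `p_x(x_next, x, u, i)` and the transition matrix `P` defined above; the sketch keeps the full N-dimensional probability vector, whereas the hyper-state stores only its first N-1 entries:

```python
import numpy as np

def belief_update(b, x, u, x_next, p_x, P):
    """One-step Bayesian update of the belief over the system models.

    b   -- current belief vector over all N models
    p_x -- p_x(x_next, x, u, i): likelihood of x_next under model i (assumed given)
    P   -- model transition matrix, P[i, j] = P(mu_{k+1} = j | mu_k = i)
    """
    N = len(b)
    b_pred = P.T @ b                                   # predict through the Markov chain of models
    lik = np.array([p_x(x_next, x, u, i) for i in range(N)])
    b_new = lik * b_pred                               # weight by the transition likelihood
    return b_new / b_new.sum()                         # normalize to a probability vector
```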

9 Problem formulation: active fault detector and controller
Two actions are generated: a decision $d_k \in M$ and a control $u_k \in U$,
$$\begin{bmatrix} d_k \\ u_k \end{bmatrix} = \begin{bmatrix} \sigma(s_k) \\ \gamma(s_k) \end{bmatrix} = \rho(s_k), \qquad (2)$$
where $\rho: S \to M \times U$ is an unknown policy, $\sigma: S \to M$ is the detector, and $\gamma: S \to U$ is the controller.

10 Problem formulation: optimality criterion
The optimality criterion is given by
$$J(\rho, s_0) = \lim_{F \to \infty} \mathrm{E}\left\{ \sum_{k=0}^{F} \lambda^k L(d_k, s_k, u_k) \,\Big|\, s_0 \right\}, \qquad (3)$$
where
$L(d_k, s_k, u_k) = \alpha \bar L_d(d_k, s_k) + (1 - \alpha) \bar L_c(s_k, u_k)$ is the cost function (CF),
$\alpha \in [0, 1]$ is a weighting factor and $\lambda \in (0, 1)$ is a discount factor,
$\bar L_d(d_k, s_k) = \mathrm{E}\{L_d(\mu_k, d_k) \mid d_k, x_0^k, u_0^{k-1}\}$ is the detection CF ($L_d: M \times M \to \mathbb{R}^+$ is the original detection CF),
$\bar L_c(s_k, u_k) = L_c([s_{k,1}, \dots, s_{k,n_x}]^T, u_k)$ is the control CF ($L_c: \mathbb{R}^{n_x} \times U \to \mathbb{R}^+$ is the original control CF).
The functions $L_d$, $L_c$, and $L$ are assumed to be bounded, which makes the criterion (3) well defined for any policy $\rho$.
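The hyper-state cost thus mixes a belief-weighted detection cost with a control cost. A small sketch of how $L(d_k, s_k, u_k)$ can be evaluated, assuming hypothetical callables `Ld` and `Lc` for the original cost functions:

```python
def cost(d, x, b, u, alpha, Ld, Lc):
    """L(d_k, s_k, u_k) = alpha * Ld_bar + (1 - alpha) * Lc_bar for s_k = [x_k, b_k]."""
    b_full = list(b) + [1.0 - sum(b)]          # belief stores N-1 entries; the last is implied
    N = len(b_full)
    Ld_bar = sum(b_full[i] * Ld(i + 1, d) for i in range(N))   # E{Ld(mu_k, d_k) | data}
    Lc_bar = Lc(x, u)                                          # control cost on the common state
    return alpha * Ld_bar + (1.0 - alpha) * Lc_bar
```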

11 Optimal active fault detector and controller: design
The goal is to find the Bellman function $V^*$ that solves the Bellman functional equation
$$V^*(s_k) = \min_{d_k \in M,\ u_k \in U} \mathrm{E}\left\{ L(d_k, s_k, u_k) + \lambda V^*(s_{k+1}) \mid d_k, s_k, u_k \right\}. \qquad (4)$$
The optimal detector $\sigma^*$ and the optimal controller $\gamma^*$ are given by
$$d_k^* = \sigma^*(s_k) = \arg\min_{d_k \in M} \alpha \bar L_d(d_k, s_k), \qquad (5)$$
$$u_k^* = \gamma^*(s_k) = \arg\min_{u_k \in U} \mathrm{E}\left\{ (1 - \alpha) \bar L_c(s_k, u_k) + \lambda V^*(s_{k+1}) \mid s_k, u_k \right\}. \qquad (6)$$
The Bellman function $V^*$ is computed offline by solving (4); the AFDC is then implemented online by means of (5) and (6).
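Once the Bellman function is available, the online part of the AFDC simply evaluates (5) and (6) at the current hyper-state. A sketch under the assumption that a routine `expect_V(s, u)` approximating $\mathrm{E}\{V(s_{k+1}) \mid s_k, u_k\}$ is given (for instance by the unscented transform discussed next):

```python
def afdc_step(x, b, U, M, alpha, lam, Ld, Lc, expect_V):
    """One online step of the AFDC: evaluate (5) and (6) at the current hyper-state."""
    b_full = list(b) + [1.0 - sum(b)]
    # (5): the decision minimizes the belief-weighted detection cost and is independent of u_k
    d_star = min(M, key=lambda d: alpha * sum(p * Ld(m, d) for m, p in zip(M, b_full)))
    # (6): the input minimizes the control cost plus the discounted expected Bellman function
    u_star = min(U, key=lambda u: (1.0 - alpha) * Lc(x, u) + lam * expect_V((x, b), u))
    return d_star, u_star
```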

12 Approximate active fault detector and controller
An analytical solution to the Bellman equation cannot be found in this case, so numerical methods are employed.
Numerical solution to the Bellman equation:
The hyper-state space is quantized by a uniform grid with grid points $\bar s \in \mathbb{R}^{n_s}$; hyper-states are projected onto the grid using an aggregation function.
The Bellman function is approximated by a piecewise constant function $\hat V$ found by a numerical dynamic programming method (e.g. the value iteration method).
Due to the nonlinearity of the system, the expectation in the Bellman equation is approximated by the unscented transform.
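A compact sketch of this offline computation: value iteration over the quantized hyper-state grid, with the expectation replaced by a weighted sum over sigma points. The helper names `transition`, `sigma_points`, and `project` are assumptions made for illustration, not the authors' implementation:

```python
import numpy as np

def value_iteration(grid, U, M, alpha, lam, Ld_bar, Lc_bar, transition, sigma_points,
                    project, n_iter=30):
    """Piecewise constant approximation of the Bellman function on a uniform grid.

    Ld_bar(s, d)        -- belief-weighted detection cost on the hyper-state
    Lc_bar(s, u)        -- control cost on the hyper-state
    transition(s, u, w) -- hyper-state transition phi for one noise realization w
    sigma_points(s, u)  -- sigma points and weights approximating the noise distribution
    project(s)          -- index of the nearest grid point (aggregation function)
    """
    V = np.zeros(len(grid))
    for _ in range(n_iter):
        V_new = np.empty_like(V)
        for idx, s in enumerate(grid):
            ld = min(Ld_bar(s, d) for d in M)            # decision part, independent of u
            best = np.inf
            for u in U:
                pts, w = sigma_points(s, u)
                ev = sum(wi * V[project(transition(s, u, p))] for p, wi in zip(pts, w))
                best = min(best, (1.0 - alpha) * Lc_bar(s, u) + lam * ev)
            V_new[idx] = alpha * ld + best
        V = V_new
    return V
```

Because the decision enters only the immediate detection cost, the minimization over $d_k$ and $u_k$ in (4) separates, which is exactly what (5) and (6) express.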

13 Air handling unit
An example of an air handling unit (AHU) is considered. The AHU mixes the ambient air and the indoor air in a ratio proportional to a damper position. The mixed air is then heated or cooled by a coil before entering a lecture hall.
Two nonlinear discretized state-space models are considered: a fault-free model and a faulty model with the damper stuck in the fully open position. Discretization is performed by the forward Euler method with the sampling period $T_s = 300\,\mathrm{s}$.

14 Air handling unit
The goal is to detect a stuck damper in the AHU and to control the indoor air temperature (AT) in a lecture hall.
$x_k = [x_{k,1}, x_{k,2}]^T$, where $x_{k,1}$ is the indoor AT and $x_{k,2}$ is the ambient AT,
$u_k = [u_{k,1}, u_{k,2}]^T$, where $u_{k,1} \in U_L = \{-1, 0, 1\}$ is the coil control and $u_{k,2} \in U_N = \{0, 0.1, \dots, 0.9\}$ is the damper position.

15 Detection
The original detection cost function aims at correct detection of the system model:
$$L_d(\mu_k, d_k) = \begin{cases} 0 & \text{if } d_k = \mu_k, \\ 1 & \text{otherwise.} \end{cases}$$
Control
The objective of the original control cost function is to drive the indoor air temperature $x_{k,1}$ to the reference temperature $x^{\mathrm{ref}} = 23\,^{\circ}\mathrm{C}$:
$$L_c(x_k, u_k) = \sum_{i=1}^{n_u} p^{hc}_i |u_{k,i}| + q_1 \left(1 - e^{-q_2 (x_{k,1} - x^{\mathrm{ref}})^2}\right),$$
where $p^{hc} = [p^{hc}_1, p^{hc}_2]^T = [1, 0]^T$ and $q = [q_1, q_2]^T = [60, 1]^T$ are parameters. Note that the control actions $u_k$ are penalized as well.
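The two original cost functions translate directly into code. A sketch with the parameter values stated above; the absolute value on the control term is an assumption, made because $u_{k,1}$ can be negative while the slide states that control actions are penalized:

```python
import math

P_HC = [1.0, 0.0]    # p^hc: the coil control is penalized, the damper position is not
Q = [60.0, 1.0]      # q = [q1, q2]
X_REF = 23.0         # reference indoor air temperature [deg C]

def Ld(mu, d):
    """Original detection cost: 0 for a correct decision, 1 otherwise."""
    return 0.0 if d == mu else 1.0

def Lc(x, u):
    """Original control cost: control effort plus a bounded temperature-tracking penalty."""
    effort = sum(p * abs(ui) for p, ui in zip(P_HC, u))
    tracking = Q[0] * (1.0 - math.exp(-Q[1] * (x[0] - X_REF) ** 2))
    return effort + tracking
```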

16 Simulation conditions
The ambient air temperature is constant, $x_{k,2} = 21\,^{\circ}\mathrm{C}$.
The common state $x_{k,1}$ is affected by a state noise with the Laplace distribution with location parameter $\varpi = 0$ and scale parameter $\beta$.
The hyper-state $s_k$ is quantized by a uniform grid defined by the Cartesian product $S^g = S^g_1 \times S^g_2 \times S^g_3 = \{5, 5.1, \dots, 30\} \times \{21\} \times \{0, 0.01, \dots, 1\}$.
The weighting factor is set to $\alpha = 0.99$, which compromises between the detection and control objectives, as shown later.
The value iteration was stopped after 30 iterations, $\lambda = 0.98$, $P(\mu_0 = 1) = 1$, and $P(\mu_{k+1} = i \mid \mu_k = j) = 0.02$ for $i, j \in M$, $i \neq j$.
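The quantization grid and the state noise can be reproduced in a few lines; a sketch, with the Laplace scale parameter kept as an argument because its value does not appear in the transcription:

```python
import itertools
import numpy as np

# uniform hyper-state grid S^g = S^g_1 x S^g_2 x S^g_3
S1 = np.arange(5.0, 30.0 + 1e-9, 0.1)     # indoor air temperature [deg C]
S2 = np.array([21.0])                     # constant ambient air temperature [deg C]
S3 = np.arange(0.0, 1.0 + 1e-9, 0.01)     # belief in the fault-free model
grid = list(itertools.product(S1, S2, S3))

rng = np.random.default_rng(0)

def state_noise(beta):
    """Laplace state noise with location 0; beta is the scale parameter (value not shown above)."""
    return rng.laplace(loc=0.0, scale=beta)
```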

17 Typical state trajectories and system control for the time horizon of 12 hours
(Figure: states $x^{\mathrm{ref}}$, $s_{k,1}$, $x_{k,1}$, $s_{k,2}$, $x_{k,2}$ and controls $u_{k,1}$, $u_{k,2}$ plotted against the time step $k$.)
The indoor air temperature $x_{k,1}$ follows $x^{\mathrm{ref}}$ with oscillations caused by the discrete amount of power delivered during $T_s$. The damper is almost closed to the ambient air because $x_{k,2} < x^{\mathrm{ref}}$.

18 Typical true model, probability of the fault-free model, and the decision of the AFDC for the time horizon of 12 hours
(Figure: true model $\mu_k$, probability of the fault-free model, and decision $d_k$ plotted against the time step $k$.)
The actual model is correctly detected with a delay of approximately 5 steps. The detection and control aims are fulfilled.

19 Consider the detection part $J_d = \lim_{F \to \infty} \mathrm{E}\{\sum_{k=0}^{F} \lambda^k L_d(\mu_k, d_k)\}$ and the control part $J_c = \lim_{F \to \infty} \mathrm{E}\{\sum_{k=0}^{F} \lambda^k L_c(x_k, u_k)\}$ of the criterion (3).
(Figure: Pareto front for the AHU optimization problem indicating the influence of the weighting factor $\alpha$ on the estimates of the detection and control parts of the criterion (3); F = 42 hours and Monte Carlo simulations were used.)
The estimate $\hat J_c$ remains approximately the same for $\alpha \in [0, 0.99]$; only the quality of detection changes.
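The Pareto front is obtained by sweeping the weighting factor $\alpha$, redesigning the AFDC for each value, and estimating $J_d$ and $J_c$ from Monte Carlo runs. A schematic sketch, assuming hypothetical `design_afdc` and `simulate` routines:

```python
def pareto_front(alphas, lam, n_runs, horizon, design_afdc, simulate, Ld, Lc):
    """Estimate (J_d, J_c) for each weighting factor alpha by Monte Carlo simulation."""
    front = []
    for alpha in alphas:
        policy = design_afdc(alpha)                    # offline design for this alpha
        Jd = Jc = 0.0
        for _ in range(n_runs):
            traj = list(simulate(policy, horizon))     # tuples (mu_k, d_k, x_k, u_k)
            Jd += sum(lam ** k * Ld(mu, d) for k, (mu, d, x, u) in enumerate(traj))
            Jc += sum(lam ** k * Lc(x, u) for k, (mu, d, x, u) in enumerate(traj))
        front.append((alpha, Jd / n_runs, Jc / n_runs))
    return front
```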

20 Conclusion
The problem formulation allows a compromise between the detection and control aims.
An approximate active fault detector and controller for nonlinear stochastic systems over an infinite time horizon was designed.
The quality of the AFDC depends on the approximations employed.
The utility of the presented approach was demonstrated in the numerical example of an air handling unit.

21 References
A list of references relevant to the topic.
Bertsekas, D.P. and Tsitsiklis, J.N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
Šimandl, M. and Punčochář, I. (2009). Active fault detection and control: Unified formulation and optimal design. Automatica, 45(9).
Šimandl, M., Škach, J., and Punčochář, I. (2014). Approximation methods for optimal active fault detection. In Proceedings of the 22nd Mediterranean Conference on Control and Automation (MED), Palermo, Italy.
