Approximate active fault detection and control
1 Approximate active fault detection and control
Jan Škach, Ivo Punčochář, Miroslav Šimandl
Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
11th European Workshop on Advanced Control and Diagnosis (ACD)
2 Outline: 1 Introduction, 2 Problem formulation, 3 Optimal active fault detector and controller, 4 Approximate active fault detector and controller, 5 Numerical example, 6 Conclusion
3 Introduction: Passive fault detector and controller
A passive fault detector uses the input and output data $[u, y]$ to generate a decision $d$ about faults in the system; no input signal improving the quality of detection is generated. The controller works separately: it uses the output data $y$ to generate an input signal $u$ that controls the system.
5 Introduction: Active fault detector and controller
An active fault detector uses the output data $y$ to generate a decision $d$ and an input signal $u$ that probes the system to ensure improved quality of detection. An active fault detector and controller uses the output data $y$ to generate a decision $d$ and an input signal $u$ that both probes and controls the system.
7 Introduction: Goal of the paper
To design an active fault detector and controller (AFDC) for nonlinear systems over an infinite time horizon with a discounted criterion, and to demonstrate the proposed AFDC in a numerical example.
8 Problem formulation: System description
The multiple-model approach is considered (one model is fault-free, the others are faulty; $\mu_k \in \mathcal{M} = \{1, 2, \ldots, N\}$ is the unknown model index). A system with perfect state information is described by the time-invariant model
$$s_{k+1} = \phi(s_k, u_k, x_{k+1}), \quad (1)$$
where $s_k = [x_k^T, b_k^T]^T \in \mathcal{S}$ is a hyper-state (perfect state information), $x_k \in \mathbb{R}^{n_x}$ is a common state ($x_{k+1}$ is defined by $p(x_{k+1} \mid s_k, u_k)$), $b_k = [b_{k,1}, \ldots, b_{k,i}, \ldots, b_{k,N-1}]^T \in \mathcal{B}$ is a belief state over the system models with $b_{k,i} = P(\mu_k = i \mid x_0^k, u_0^{k-1})$, $\phi$ is a nonlinear vector function, and $u_k \in \mathcal{U} = \{\bar{u}^1, \ldots, \bar{u}^M\} \subset \mathbb{R}^{n_u}$ is an admissible control. The transition probabilities $P_{i,j} = P(\mu_{k+1} = j \mid \mu_k = i)$, the initial state $x_0$, and $P(\mu_0)$ are known.
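As an illustration of how the belief state $b_k$ can be propagated, the following is a minimal sketch of the standard multiple-model Bayesian update, assuming per-model likelihoods $p_i(x_{k+1} \mid x_k, u_k)$ and the transition matrix $P$. The exact update used in the paper is not shown on the slide, so the function names and signature here are illustrative assumptions.

```python
import numpy as np

def belief_update(b, x_next, x, u, likelihoods, P):
    """One multiple-model belief update step (illustrative sketch).

    b           -- current belief over the N models, shape (N,)
    x_next, x, u -- observed successor state, current state, applied input
    likelihoods -- list of N functions p_i(x_next, x, u), one per model
    P           -- model transition matrix, P[i, j] = P(mu_{k+1}=j | mu_k=i)
    """
    # Predict: propagate the belief through the model transition matrix.
    b_pred = P.T @ b
    # Correct: weight each model by how well it explains the observed state.
    lik = np.array([p(x_next, x, u) for p in likelihoods])
    b_new = lik * b_pred
    return b_new / b_new.sum()  # normalize back to a probability vector
```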
9 Problem formulation: Active fault detector and controller
Two actions are generated at each step: a decision $d_k \in \mathcal{M}$ and a control $u_k \in \mathcal{U}$,
$$\begin{bmatrix} d_k \\ u_k \end{bmatrix} = \begin{bmatrix} \sigma(s_k) \\ \gamma(s_k) \end{bmatrix} = \rho(s_k), \quad (2)$$
where $\rho : \mathcal{S} \to \mathcal{M} \times \mathcal{U}$ is an unknown policy, $\sigma : \mathcal{S} \to \mathcal{M}$ is a detector, and $\gamma : \mathcal{S} \to \mathcal{U}$ is a controller.
10 Problem formulation: Optimality criterion
The optimality criterion is given by
$$J(\rho, s_0) = \lim_{F \to \infty} \mathrm{E}\left\{ \sum_{k=0}^{F} \lambda^k L(d_k, s_k, u_k) \,\Big|\, s_0 \right\}, \quad (3)$$
where $L(d_k, s_k, u_k) = \alpha \bar{L}_d(d_k, s_k) + (1 - \alpha) \bar{L}_c(s_k, u_k)$ is a cost function (CF), $\alpha \in [0, 1]$ is a weighting factor, $\bar{L}_d(d_k, s_k) = \mathrm{E}\{L_d(\mu_k, d_k) \mid d_k, x_0^k, u_0^{k-1}\}$ is a detection CF ($L_d : \mathcal{M} \times \mathcal{M} \to \mathbb{R}^+$ is the original detection CF), and $\bar{L}_c(s_k, u_k) = L_c([s_{k,1}, \ldots, s_{k,n_x}]^T, u_k)$ is a control CF ($L_c : \mathbb{R}^{n_x} \times \mathcal{U} \to \mathbb{R}^+$ is the original control CF). Assume $L_d$, $L_c$, and $L$ are bounded, making the criterion (3) well defined for any policy $\rho$.
11 Optimal active fault detector and controller: Design
The goal is to find the Bellman function $V^*$ that solves the following Bellman functional equation
$$V^*(s_k) = \min_{d_k \in \mathcal{M},\, u_k \in \mathcal{U}} \mathrm{E}\left\{ L(d_k, s_k, u_k) + \lambda V^*(s_{k+1}) \mid d_k, s_k, u_k \right\}. \quad (4)$$
The optimal detector $\sigma^*$ and optimal controller $\gamma^*$ are given as
$$d_k^* = \sigma^*(s_k) = \arg\min_{d_k \in \mathcal{M}} \alpha \bar{L}_d(s_k, d_k), \quad (5)$$
$$u_k^* = \gamma^*(s_k) = \arg\min_{u_k \in \mathcal{U}} \mathrm{E}\left\{ (1 - \alpha) \bar{L}_c(s_k, u_k) + \lambda V^*(s_{k+1}) \mid s_k, u_k \right\}. \quad (6)$$
The Bellman function $V^*$ is computed offline by solving (4); the AFDC is then implemented online by means of (5) and (6).
12 Approximate active fault detector and controller
An analytical solution to the Bellman equation is impossible to find in this case, so numerical methods are employed.
Numerical solution to the Bellman equation: The hyper-state space is quantized by a uniform grid with grid points in $\mathbb{R}^{n_s}$, and hyper-states are projected onto the grid using an aggregation function. The Bellman function is approximated by a piecewise constant function $\hat{V}$ found by a numerical dynamic programming method (e.g., the value iteration method). Due to the nonlinearity of the system, the expectation in the Bellman equation is approximated by the unscented transform.
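A minimal sketch of the numerical scheme just described, assuming a finite grid, a combined cost `L`, and a `sigma_transitions` function that stands in for the unscented-transform approximation of the expectation; nearest-grid-point lookup plays the role of the aggregation function. This is an illustrative implementation under those assumptions, not the authors' code.

```python
import numpy as np

def value_iteration(grid, actions, decisions, L, sigma_transitions,
                    lam=0.98, n_iter=30):
    """Piecewise-constant approximation of the Bellman function on a grid.

    grid               -- array of grid points, shape (G, n_s)
    actions, decisions -- finite sets U and M
    L(d, s, u)         -- combined cost function from criterion (3)
    sigma_transitions(s, u) -- list of (weight, s_next) pairs approximating
                          the expectation over s_{k+1} (e.g. sigma points
                          from an unscented transform)
    """
    G = len(grid)
    V = np.zeros(G)

    def project(s):
        # Aggregation function: map a hyper-state to its nearest grid point.
        return np.argmin(np.linalg.norm(grid - s, axis=1))

    for _ in range(n_iter):
        V_new = np.empty(G)
        for g, s in enumerate(grid):
            q_values = []
            for u in actions:
                # Approximate E{V(s_{k+1}) | s, u} by a weighted sum
                # over the sigma-point successors.
                ev = sum(w * V[project(s_next)]
                         for w, s_next in sigma_transitions(s, u))
                for d in decisions:
                    q_values.append(L(d, s, u) + lam * ev)
            V_new[g] = min(q_values)  # Bellman backup (4) on the grid
        V = V_new
    return V
```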
13 Air handling unit
An example of an air handling unit (AHU) is considered. The AHU mixes ambient air and indoor air together in a ratio proportional to a damper position; the mixed air is then heated or cooled by a coil before entering a lecture hall. Two nonlinear discretized state-space models are considered: a fault-free model and a faulty model with the damper stuck in the fully open position. The models are discretized by the forward Euler method with sampling period $T_s = 300\,\mathrm{s}$.
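The continuous-time AHU equations themselves are not reproduced in the transcription; a generic forward Euler step with the stated sampling period, with `f` a placeholder for the AHU state equations, illustrates the discretization:

```python
T_S = 300.0  # sampling period [s], as stated on the slide

def euler_step(f, x, u, T_s=T_S):
    """Forward Euler discretization: x_{k+1} = x_k + T_s * f(x_k, u_k).

    f -- placeholder for the continuous-time AHU state equations dx/dt = f(x, u)
    """
    return x + T_s * f(x, u)
```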
14 Air handling unit
The goal is to detect a stuck damper in the AHU and to control the indoor air temperature (AT) in a lecture hall. The state is $x_k = [x_{k,1}, x_{k,2}]^T$, where $x_{k,1}$ is the indoor AT and $x_{k,2}$ is the ambient AT. The input is $u_k = [u_{k,1}, u_{k,2}]^T$, where $u_{k,1} \in \mathcal{U}_L = \{-1, 0, 1\}$ is the coil control and $u_{k,2} \in \mathcal{U}_N = \{0, 0.1, \ldots, 0.9\}$ is the damper position.
15 Detection
The original detection cost function aims at correct system model detection,
$$L_d(\mu_k, d_k) = \begin{cases} 0 & \text{if } d_k = \mu_k, \\ 1 & \text{otherwise.} \end{cases}$$
Control
The objective of the original control cost function is to control the indoor air temperature $x_{k,1}$ to a reference temperature $x_{\mathrm{ref}} = 23\,^{\circ}\mathrm{C}$,
$$L_c(x_k, u_k) = \sum_{i=1}^{n_u} p^{hc}_i |u_{k,i}| + q_1 \left( 1 - e^{-q_2 (x_{k,1} - x_{\mathrm{ref}})^2} \right),$$
where $p^{hc} = [p^{hc}_1, p^{hc}_2]^T = [1, 0]^T$ and $q = [q_1, q_2]^T = [60, 1]^T$ are parameters. Note that the control actions $u_k$ are penalized as well.
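The two cost functions translate directly into code. The sketch below uses the parameter values stated on the slide; the absolute value on the control penalty is an assumption, since the sign convention is not explicit in the transcription.

```python
import numpy as np

P_HC = np.array([1.0, 0.0])   # control penalty weights p^hc from the slide
Q = np.array([60.0, 1.0])     # tracking penalty parameters q from the slide
X_REF = 23.0                  # reference indoor temperature [deg C]

def L_d(mu, d):
    """Detection cost: 0 for a correct model decision, 1 otherwise."""
    return 0.0 if d == mu else 1.0

def L_c(x, u):
    """Control cost: input penalty plus bounded exponential tracking penalty."""
    input_cost = P_HC @ np.abs(u)  # assumption: |u_{k,i}| is penalized
    tracking = Q[0] * (1.0 - np.exp(-Q[1] * (x[0] - X_REF) ** 2))
    return input_cost + tracking
```

Note that the tracking term is zero at $x_{k,1} = x_{\mathrm{ref}}$ and saturates at $q_1$ far from it, which keeps $L_c$ bounded as required by the problem formulation.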
16 Simulation conditions
The ambient air temperature is constant, $x_{k,2} = 21\,^{\circ}\mathrm{C}$. The common state $x_{k,1}$ is influenced by a state noise with the Laplace distribution with location parameter $\varpi = 0$ and a scale parameter $\beta$. The hyper-state $s_k$ is quantized by a uniform grid defined by the Cartesian product $\mathcal{S}^g = \mathcal{S}^g_1 \times \mathcal{S}^g_2 \times \mathcal{S}^g_3 = \{5, 5.1, \ldots, 30\} \times \{21\} \times \{0, 0.01, \ldots, 1\}$. The weighting factor is set to $\alpha = 0.99$, which compromises between the detection and control objectives as shown later. Value iteration is stopped after 30 iterations; $\lambda = 0.98$, $P(\mu_0 = 1) = 1$, and $P(\mu_{k+1} = i \mid \mu_k = j) = 0.02$ for $i, j \in \mathcal{M}$, $i \neq j$.
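For illustration, the stated quantization grid can be built as a Cartesian product of its three component sets:

```python
import itertools
import numpy as np

# Component sets stated on the slide: indoor AT, ambient AT, belief b_{k,1}.
S1 = np.linspace(5.0, 30.0, 251)   # {5, 5.1, ..., 30}
S2 = np.array([21.0])              # {21}
S3 = np.linspace(0.0, 1.0, 101)    # {0, 0.01, ..., 1}

# Uniform grid S^g = S^g_1 x S^g_2 x S^g_3 over the hyper-state space.
grid = np.array(list(itertools.product(S1, S2, S3)))
print(grid.shape)  # (25351, 3) = 251 * 1 * 101 grid points
```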
17 [Figure: typical state trajectories and system control for a time horizon of 12 hours; top panel: state ($x_{\mathrm{ref}}$, $s_{k,1}$, $x_{k,1}$, $s_{k,2}$, $x_{k,2}$); bottom panel: control ($u_{k,1}$, $u_{k,2}$) versus time step $k$.]
The indoor air temperature $x_{k,1}$ follows $x_{\mathrm{ref}}$ with oscillations caused by the discrete amount of power delivered during $T_s$. The damper is almost closed to the ambient air because $x_{k,2} < x_{\mathrm{ref}}$.
18 [Figure: typical true model $\mu_k$, probability of the fault-free model, and the decision $d_k$ of the AFDC for a time horizon of 12 hours, versus time step $k$.]
The actual model is correctly detected with a delay of approximately 5 steps. The detection and control aims are fulfilled.
19 Consider the detection part $J_d = \lim_{F \to \infty} \mathrm{E}\{\sum_{k=0}^{F} \lambda^k L_d(\mu_k, d_k)\}$ and the control part $J_c = \lim_{F \to \infty} \mathrm{E}\{\sum_{k=0}^{F} \lambda^k L_c(x_k, u_k)\}$ of the criterion (3).
[Figure: Pareto front for the AHU optimization problem indicating the influence of the weighting factor $\alpha$ on estimates of the detection and control parts of the criterion (3); $F = 42$ hours, and Monte Carlo simulations were used.]
The values of the estimate $\hat{J}_c$ remain approximately the same for $\alpha \in [0, 0.99]$; only the quality of detection changes.
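A hedged sketch of how the plotted estimates can be obtained: for each $\alpha$, simulate the closed loop repeatedly and average the discounted detection and control costs separately. The simulator `run_episode` and its return values are assumptions for illustration.

```python
import numpy as np

def estimate_criteria(run_episode, alpha, lam=0.98, n_mc=100, F=504):
    """Monte Carlo estimates of J_d and J_c for a given weighting factor.

    run_episode(alpha, F) -- assumed simulator returning per-step detection
                             and control costs (two arrays of length F+1)
    F = 504 steps corresponds to 42 hours with T_s = 300 s.
    """
    discounts = lam ** np.arange(F + 1)
    Jd, Jc = 0.0, 0.0
    for _ in range(n_mc):
        ld, lc = run_episode(alpha, F)
        Jd += discounts @ ld  # discounted detection cost of this run
        Jc += discounts @ lc  # discounted control cost of this run
    return Jd / n_mc, Jc / n_mc
```

Sweeping $\alpha$ over $[0, 1]$ and plotting the resulting $(\hat{J}_d, \hat{J}_c)$ pairs traces out the Pareto front described above.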
20 Conclusion
The problem formulation allows a compromise between the detection and control aims. An approximate active fault detector and controller for nonlinear stochastic systems over an infinite time horizon was designed. The quality of the AFDC depends on the approximations employed. The utility of the presented approach was demonstrated in a numerical example of an air handling unit.
21 References
A list of references relevant to the topic.
Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
Šimandl, M. and Punčochář, I. (2009). Active fault detection and control: Unified formulation and optimal design. Automatica, 45(9).
Šimandl, M., Škach, J., and Punčochář, I. (2014). Approximation methods for optimal active fault detection. In Proceedings of the 22nd Mediterranean Conference on Control and Automation (MED), Palermo, Italy.