Reinforcement Learning


1 Reinforcement Learning Yishay Mansour Google Inc. & Tel-Aviv University

2 Outline Goal of Reinforcement Learning. Mathematical Model (MDP). Planning. Learning. Current research issues.

3 Goal of Reinforcement Learning Goal-oriented learning through interaction. Control of large-scale stochastic environments with partial knowledge. Supervised / Unsupervised Learning: learn from labeled / unlabeled examples.

4 Reinforcement Learning - origins Artificial Intelligence, Control Theory, Operations Research, Cognitive Science & Psychology. Solid foundations; well-established research.

5 Typical Applications Robotics: elevator control [CB], robo-soccer [SV]. Board games: backgammon [T], checkers [S], chess [B]. Scheduling: dynamic channel allocation [SB], inventory problems.

6 Contrast with Supervised Learning The system has a state. The algorithm influences the state distribution. Inherent tradeoff: Exploration versus Exploitation.

7 Mathematical Model - Motivation Model of uncertainty: environment, actions, our knowledge. Focus on decision making. Maximize long-term reward. Markov Decision Process (MDP).

8 Mathematical Model - MDP Markov decision processes: S - set of states, A - set of actions, δ - transition probability, R - reward function. Similar to a DFA!

9 MDP model - states and actions Environment = states s; actions a. An action induces a transition δ(s, a, s').

10 MDP model - rewards R(s,a) = reward at state s for doing action a (a random variable). Example: R(s,a) = -1 with probability …, … with probability …, … with probability ….

11 MDP model - trajectories A trajectory: s_0 a_0 r_0 s_1 a_1 r_1 s_2 a_2 r_2 …

12 MDP - Return function Combining all the immediate rewards into a single value. Modeling issues: Are early rewards more valuable than later rewards? Is the system terminating or continuous? Usually the return is linear in the immediate rewards.

13 MDP model - return functions
Finite Horizon - parameter H: return = Σ_{i=1}^{H} R(s_i, a_i)
Infinite Horizon, discounted - parameter γ < 1: return = Σ_{i=0}^{∞} γ^i R(s_i, a_i)
Infinite Horizon, undiscounted: return = lim_{N→∞} (1/N) Σ_{i=0}^{N-1} R(s_i, a_i)
Terminating MDP.
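A minimal sketch of the three return definitions above, assuming the immediate rewards of a trajectory are given as a plain Python list (the function names are illustrative, not from the slides):

```python
def finite_horizon_return(rewards, H):
    """Finite horizon: sum of the first H immediate rewards."""
    return sum(rewards[:H])

def discounted_return(rewards, gamma):
    """Discounted: sum of gamma^i * r_i over the observed prefix of the trajectory."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

def average_return(rewards):
    """Undiscounted: (1/N) * sum of the first N observed rewards."""
    return sum(rewards) / len(rewards)

rewards = [0, 1, 2, 3]
print(finite_horizon_return(rewards, H=2))    # 1
print(discounted_return(rewards, gamma=0.5))  # 1.375
print(average_return(rewards))                # 1.5
```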

14 MDP model - action selection AIM: maximize the expected return. Fully Observable - can see the entire state. Policy - mapping from states to actions. Optimal policy: optimal from any start state. THEOREM: There exists a deterministic optimal policy.

15 Contrast with Supervised Learning Supervised Learning: fixed distribution on examples. Reinforcement Learning: the state distribution is policy dependent!!! A small local change in the policy can make a huge global change in the return.

16 MDP model - summary
S - set of states, |S| = n.
A - set of k actions, |A| = k.
δ(s_1, a, s_2) - transition function.
R(s, a) - immediate reward function.
π : S → A - policy.
Σ_{i=0}^{∞} γ^i r_i - discounted cumulative return.
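To make the summary concrete, here is a minimal tabular representation of such an MDP, a sketch assuming δ is stored as an n×k×n array and R as an n×k array; the class name and the mod-4 wraparound of the ring example (used later in the policy-evaluation slides) are my own assumptions, not from the slides:

```python
import numpy as np

class TabularMDP:
    """Tabular MDP: delta[s, a, s'] = transition probability, R[s, a] = expected reward."""
    def __init__(self, delta, R, gamma):
        self.delta = np.asarray(delta, dtype=float)  # shape (n, k, n); each delta[s, a] sums to 1
        self.R = np.asarray(R, dtype=float)          # shape (n, k)
        self.gamma = gamma
        self.n, self.k, _ = self.delta.shape

# The 4-state ring from the policy-evaluation slides (assuming s_{i+a} wraps mod 4):
# A = {+1, -1}, delta(s_i, a) = s_{i+a}, R(s_i, a) = i, gamma = 1/2.
n, actions = 4, [+1, -1]
delta = np.zeros((n, len(actions), n))
R = np.zeros((n, len(actions)))
for s in range(n):
    for ai, a in enumerate(actions):
        delta[s, ai, (s + a) % n] = 1.0
        R[s, ai] = s
ring = TabularMDP(delta, R, gamma=0.5)
```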

17 Simple example: N-armed bandit A single state with actions a_1, a_2, a_3 (the arms). Goal: maximize the sum of immediate rewards. Given the model: greedy action. Difficulty: unknown model.

18 N-Armed Bandit: Highlights
Algorithm (near greedy): Exponential weights: G_i = sum of rewards of action a_i, w_i = β^{G_i}; or Follow the (perturbed) leader.
Result: For any sequence of T rewards: E[online] > max_i {G_i} − sqrt(T log N).
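A minimal sketch of the exponential-weights rule named above; it only credits the arm actually pulled (omitting the importance-weighting correction a full bandit analysis would use), and the pull function, β, and horizon are illustrative assumptions:

```python
import random

def exp_weights_bandit(pull, N, T, beta=1.05):
    """Keep G[i] = cumulative reward observed for arm i, weight arm i by beta**G[i],
    and sample an arm proportionally to its weight (a near-greedy randomized choice)."""
    G = [0.0] * N
    total = 0.0
    for _ in range(T):
        weights = [beta ** g for g in G]
        arm = random.choices(range(N), weights=weights)[0]
        r = pull(arm)        # observed reward of the chosen arm only
        G[arm] += r
        total += r
    return total, G

# Hypothetical arms: arm i pays 1 with probability (i + 1) / N.
N = 5
total, G = exp_weights_bandit(lambda i: float(random.random() < (i + 1) / N), N, T=1000)
```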

19 Planning - Basic Problems Given a complete MDP model. Policy evaluation - given a policy π, estimate its return. Optimal control - find an optimal policy π* (maximize the return from any start state).

20 Planning - Value Functions
V^π(s): the expected return starting at state s and following π.
Q^π(s,a): the expected return starting at state s with action a, and then following π.
V*(s) and Q*(s,a) are defined using an optimal policy π*: V*(s) = max_π V^π(s).

21 Planning - Policy Evaluation
Discounted infinite horizon (Bellman Eq.):
V^π(s) = E_{s' ~ π(s)} [ R(s, π(s)) + γ V^π(s') ]
Rewrite the expectation:
V^π(s) = E[R(s, π(s))] + γ Σ_{s'} δ(s, π(s), s') V^π(s')
This is a linear system of equations.

22 Algorithm - Policy Evaluation Example
A = {+1, −1}, γ = 1/2, δ(s_i, a) = s_{i+a}, π random, for every a: R(s_i, a) = i.
V^π(s_0) = 0 + γ [ π(s_0, +1) V^π(s_1) + π(s_0, −1) V^π(s_3) ]

23 Algorithm - Policy Evaluation Example
A = {+1, −1}, γ = 1/2, δ(s_i, a) = s_{i+a}, π random, for every a: R(s_i, a) = i.
Solution: V^π(s_0) = 5/3, V^π(s_1) = 7/3, V^π(s_2) = 11/3, V^π(s_3) = 13/3.
Check: V^π(s_0) = 0 + (V^π(s_1) + V^π(s_3))/4.
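The linear system from the previous slide can be solved directly; this sketch reproduces the values above for the 4-state ring, assuming the transition s_{i+a} wraps around mod 4 (consistent with the stated solution):

```python
import numpy as np

# Random policy on the 4-state ring: V(s_i) = i + (gamma/2) * (V(s_{i+1}) + V(s_{i-1})),
# i.e. the linear system (I - gamma * P_pi) V = R_pi with gamma = 1/2.
n, gamma = 4, 0.5
P_pi = np.zeros((n, n))
R_pi = np.zeros(n)
for s in range(n):
    P_pi[s, (s + 1) % n] = 0.5   # action +1 taken with probability 1/2
    P_pi[s, (s - 1) % n] = 0.5   # action -1 taken with probability 1/2
    R_pi[s] = s                  # R(s_i, a) = i for both actions

V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
print(V)   # [1.667, 2.333, 3.667, 4.333] = [5/3, 7/3, 11/3, 13/3]
```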

24 Algorithm - optimal control
State-Action Value function: Q^π(s, a) = E[R(s, a)] + γ E_{s' ~ (s,a)} [ V^π(s') ]
Note: V^π(s) = Q^π(s, π(s)) for a deterministic policy π.

25 Algorithm - Optimal control Example
A = {+1, −1}, γ = 1/2, δ(s_i, a) = s_{i+a}, π random, R(s_i, a) = i.
Q^π(s_0, +1) = 0 + γ V^π(s_1) = 7/6; Q^π(s_0, −1) = 0 + γ V^π(s_3) = 13/6.

26 Algorithm - optimal control
CLAIM: A policy π is optimal if and only if at each state s: V^π(s) = max_a {Q^π(s, a)} (Bellman Eq.)
PROOF: Assume there is a state s and an action a s.t. V^π(s) < Q^π(s, a). Then the strategy of performing a at state s (the first time) is better than π. This is true each time we visit s, so the policy that always performs action a at state s is better than π.

27 Algorithm - optimal control Example
A = {+1, −1}, γ = 1/2, δ(s_i, a) = s_{i+a}, π random, R(s_i, a) = i.
Changing the policy using the state-action value function.

28 MDP - computing the optimal policy
1. Linear Programming.
2. Value Iteration method: V_{i+1}(s) = max_a { R(s, a) + γ Σ_{s'} δ(s, a, s') V_i(s') }
3. Policy Iteration method: π_{i+1}(s) = arg max_a { Q^{π_i}(s, a) }
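A compact sketch of the value-iteration update from item 2, using the tabular arrays (delta of shape n×k×n, R of shape n×k) from the earlier sketch; the fixed iteration count is an arbitrary choice here:

```python
import numpy as np

def value_iteration(delta, R, gamma, iters=100):
    """V_{i+1}(s) = max_a { R(s, a) + gamma * sum_s' delta(s, a, s') * V_i(s') }."""
    n = delta.shape[0]
    V = np.zeros(n)
    for _ in range(iters):
        Q = R + gamma * (delta @ V)   # Q[s, a] for the current V
        V = Q.max(axis=1)
    greedy_policy = Q.argmax(axis=1)  # greedy policy w.r.t. the final Q
    return V, greedy_policy
```

For instance, `value_iteration(ring.delta, ring.R, gamma=0.5)` on the ring example above returns its (approximate) optimal values and a greedy policy.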

29 Convergence: Value Iteration
Distance of V_i from the optimal V* (in L_∞):
Q_{i+1}(s, a) = R(s, a) + γ Σ_{s'} δ(s, a, s') V_i(s')
Q*(s, a) = R(s, a) + γ Σ_{s'} δ(s, a, s') V*(s')
|Q_{i+1}(s, a) − Q*(s, a)| = γ | Σ_{s'} δ(s, a, s') [V_i(s') − V*(s')] | ≤ γ ||V_i − V*||_∞
so ||V_{i+1} − V*||_∞ ≤ γ ||V_i − V*||_∞.
Convergence rate: 1/(1 − γ). ONLY pseudo-polynomial.

30 Convergence: Policy Iteration
Policy Iteration Algorithm: compute Q^π(s, a); set π(s) = arg max_a Q^π(s, a); reiterate.
Convergence: the policy can only improve: V_{t+1}(s) ≥ V_t(s).
Fewer iterations than Value Iteration, but each iteration is more expensive.
OPEN: How many iterations does it require?! LB: linear; UB: 2^n / n (2-action MDPs) [MS]
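A sketch of the policy-iteration loop described above, under the same tabular-array assumptions; each iteration evaluates the current policy exactly (a linear solve) and then improves it greedily:

```python
import numpy as np

def policy_iteration(delta, R, gamma):
    """Alternate exact policy evaluation with greedy policy improvement until the
    policy stops changing; by the improvement property, V_{t+1}(s) >= V_t(s)."""
    n = delta.shape[0]
    policy = np.zeros(n, dtype=int)
    while True:
        P_pi = delta[np.arange(n), policy]    # P_pi[s, s'] under the current policy
        R_pi = R[np.arange(n), policy]        # expected reward under the current policy
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)   # policy evaluation
        Q = R + gamma * (delta @ V)
        new_policy = Q.argmax(axis=1)         # pi(s) = argmax_a Q^pi(s, a)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy
```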

31 Outline Done: Goal of Reinforcement Learning, Mathematical Model (MDP), Planning (Value iteration, Policy iteration). Now: Learning Algorithms (Model based, Model free).

32 Planning versus Learning Tightly coupled in Reinforcement Learning. Goal: maximize the return while learning.

33 Example - Elevator Control Learning (alone): learn the arrival model well. Planning (alone): given an arrival model, build a schedule. Real objective: construct a schedule while updating the model.

34 Learning Algorithms Given access only to the actions performed: 1. policy evaluation. 2. control - find an optimal policy. Two approaches: 1. Model based (Dynamic Programming). 2. Model free (Q-Learning).
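The slides only name Q-Learning as the model-free approach; for completeness, here is a standard tabular Q-learning sketch (not taken from this deck): the env_step interface, the ε-greedy exploration, and the learning rate α are assumptions for illustration:

```python
import random

def q_learning(env_step, n_states, n_actions, start_state,
               gamma=0.5, alpha=0.1, epsilon=0.1, steps=10000):
    """Model-free update: Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = start_state
    for _ in range(steps):
        if random.random() < epsilon:                       # explore
            a = random.randrange(n_actions)
        else:                                               # exploit
            a = max(range(n_actions), key=lambda x: Q[s][x])
        s_next, r = env_step(s, a)      # assumed interface: returns (next state, reward)
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
    return Q
```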

35 Learning - Model Based Estimate the model from the observations (both transition probabilities and rewards). Use the estimated model as the true model, and find the optimal policy. If we have a good estimate of the model, we should get a good approximation of the optimal policy.
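A minimal sketch of the model-based estimate described above, assuming the observations are available as (s, a, r, s') tuples; the resulting empirical model can then be handed to value or policy iteration as if it were the true model:

```python
from collections import defaultdict

def estimate_model(transitions, n_states, n_actions):
    """Build the observed model: empirical transition probabilities and average rewards."""
    counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s': count}
    reward_sum = defaultdict(float)                  # (s, a) -> summed reward
    visits = defaultdict(int)                        # (s, a) -> number of samples
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1

    delta_hat = [[{} for _ in range(n_actions)] for _ in range(n_states)]
    R_hat = [[0.0] * n_actions for _ in range(n_states)]
    for (s, a), m in visits.items():
        delta_hat[s][a] = {s2: c / m for s2, c in counts[(s, a)].items()}
        R_hat[s][a] = reward_sum[(s, a)] / m
    return delta_hat, R_hat
```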

36 Learning - Model Based: off policy Let the policy run for a long time. (What is long?!) Assuming some exploration: build an observed model (transition probabilities and rewards). Use the observed model to estimate the value of the policy.

37 Learning - Model Based: sample size Sample size (optimal policy): Naive: O(|S|^2 |A| log(|S| |A|)) samples (approximate each transition δ(s, a, s') well). Better: O(|S| |A| log(|S| |A|)) samples (sufficient to approximate an optimal policy). [KS, NIPS 98]

38 Learning - Model Based: on policy The learner has control over the actions. The immediate goal is to learn a model. As before: build an observed model (transition probabilities and rewards) and use it to estimate the value of the policy. Accelerating the learning: how to reach new places?!

39 Learning - Model Based: on policy (Diagram): well-sampled nodes versus relatively unknown nodes.

40 Learning - Model Based: on policy (Diagram): well-sampled nodes versus relatively unknown nodes; assigning a HIGH REWARD to the unknown nodes drives exploration; plan in the new MDP.

41 Learning: Policy improvement Assume that we can perform: given a policy π, estimate the V and Q functions of π. Then we can run policy improvement: π' = Greedy(Q). The process converges if the estimations are accurate.
