Reinforcement Learning
Transcription
1 Reinforcement Learning Yishay Mansour Google Inc. & Tel-Aviv University
2 Outline Goal of Reinforcement Learning Mathematical Model (MDP) Planning Learning Current Research Issues
3 Goal of Reinforcement Learning Goal-oriented learning through interaction. Control of large-scale stochastic environments with partial knowledge. Supervised / Unsupervised Learning: learn from labeled / unlabeled examples.
4 Reinforcement Learning - origins Artificial Intelligence, Control Theory, Operations Research, Cognitive Science & Psychology. Solid foundations; well-established research.
5 Typical Applications Robotics: Elevator control [CB], Robo-soccer [SV]. Board games: backgammon [T], checkers [S], Chess [B]. Scheduling: Dynamic channel allocation [SB], Inventory problems.
6 Contrast with Supervised Learning The system has a state. The algorithm influences the state distribution. Inherent Tradeoff: Exploration versus Exploitation.
7 Mathematical Model - Motivation Model of uncertainty: Environment, actions, our knowledge. Focus on decision making. Maximize long-term reward. Markov Decision Process (MDP)
8 Mathematical Model - MDP Markov decision processes: S - set of states, A - set of actions, δ - transition probability, R - reward function. Similar to a DFA!
9 MDP model - states and actions Environment = states; actions = transitions δ(s, a, s').
10 MDP model - reward R(s,a) = reward at state s for doing action a (a random variable). Example: R(s,a) = -1 with some probability, and other values with the remaining probability.
11 MDP model - trajectories trajectory: s0 a0 r0 s1 a1 r1 s2 a2 r2 ...
12 MDP - Return function. Combining all the immediate rewards into a single value. Modeling Issues: Are early rewards more valuable than later rewards? Is the system terminating or continuous? Usually the return is linear in the immediate rewards.
13 MDP model - return functions Finite Horizon - parameter H: return = Σ_{i=1}^{H} R(s_i, a_i). Infinite Horizon: discounted - parameter γ < 1: return = Σ_{i=0}^{∞} γ^i R(s_i, a_i); undiscounted: return = lim_{N→∞} (1/N) Σ_{i=0}^{N-1} R(s_i, a_i). Terminating MDP.
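The three return criteria can be sketched numerically; this is a minimal illustration in Python, and the reward sequence itself is made up:

```python
# Sketch of the three return criteria on a fixed reward sequence
# (the rewards here are made up for illustration).
rewards = [1.0, 0.0, 2.0, 1.0]

# Finite horizon with parameter H: sum of the first H rewards.
H = 3
finite_return = sum(rewards[:H])

# Discounted infinite horizon with parameter gamma < 1:
# sum of gamma^i * r_i over the (here truncated) sequence.
gamma = 0.5
discounted_return = sum(gamma**i * r for i, r in enumerate(rewards))

# Undiscounted (average reward): (1/N) * sum of the first N rewards.
N = len(rewards)
average_return = sum(rewards[:N]) / N

print(finite_return, discounted_return, average_return)
```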
14 MDP model - action selection AIM: Maximize the expected return. Fully Observable - can see the entire state. Policy - mapping from states to actions. Optimal policy: optimal from any start state. THEOREM: There exists a deterministic optimal policy.
15 Contrast with Supervised Learning Supervised Learning: fixed distribution on the examples. Reinforcement Learning: the state distribution is policy dependent!!! A small local change in the policy can make a huge global change in the return.
16 MDP model - summary S - set of states, |S| = n. A - set of k actions, |A| = k. δ(s1, a, s2) - transition function. R(s,a) - immediate reward function. π: S → A - policy. Σ_{i=0}^{∞} γ^i r_i - discounted cumulative return.
17 Simple example: N-armed bandit Single state; arms a1, a2, a3. Goal: Maximize the sum of immediate rewards. Given the model: play the greedy action. Difficulty: unknown model.
18 N-Armed Bandit: Highlights Algorithm (near greedy): Exponential weights: G_i is the sum of rewards of action a_i, w_i = β^{G_i}. Follow the (perturbed) leader. Result: For any sequence of T rewards: E[online] > max_i {G_i} - sqrt{T log N}.
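A minimal sketch of the exponential-weight rule w_i = β^{G_i}, writing β as exp(η). For simplicity this is the full-information variant (all arms' rewards are observed each round, sidestepping the bandit estimation issue), and the reward distributions are made up:

```python
import math
import random

# Sketch of the exponential-weights rule w_i = beta^{G_i} from the slide,
# written with beta = exp(eta). Full-information variant: after each round
# the reward of every arm is observed (made-up reward distributions below).
random.seed(0)
N, T, eta = 3, 1000, 0.1
G = [0.0] * N  # cumulative reward of each arm

def choose(G):
    """Pick an arm with probability proportional to exp(eta * G_i)."""
    m = max(G)
    w = [math.exp(eta * (g - m)) for g in G]  # shift by max for stability
    total = random.uniform(0, sum(w))
    for i, wi in enumerate(w):
        total -= wi
        if total <= 0:
            return i
    return N - 1

online = 0.0
for t in range(T):
    # Made-up rewards: higher-indexed arms pay more on average.
    rewards = [random.random() * (i + 1) / N for i in range(N)]
    online += rewards[choose(G)]
    for i in range(N):
        G[i] += rewards[i]

# The slide's guarantee: E[online] >= max_i G_i - O(sqrt(T log N)).
print(online, max(G))
```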
19 Planning - Basic Problems Given a complete MDP model: Policy evaluation - given a policy π, estimate its return. Optimal control - find an optimal policy π* (maximizing the return from any start state).
20 Planning - Value Functions V^π(s): the expected return starting at state s and following π. Q^π(s,a): the expected return starting at state s with action a and then following π. V*(s) and Q*(s,a) are defined using an optimal policy π*: V*(s) = max_π V^π(s).
21 Planning - Policy Evaluation Discounted infinite horizon (Bellman Eq.): V^π(s) = E_{s'~π(s)}[ R(s,π(s)) + γ V^π(s') ]. Rewriting the expectation: V^π(s) = E[R(s,π(s))] + γ Σ_{s'} δ(s,π(s),s') V^π(s'). A linear system of equations.
22 Algorithm - Policy Evaluation Example A = {+1,-1}, γ = 1/2, δ(s_i, a) = s_{i+a}, π random, R(s_i, a) = i. V^π(s_0) = 0 + γ[ π(s_0,+1) V^π(s_1) + π(s_0,-1) V^π(s_3) ]
23 Algorithm - Policy Evaluation Example A = {+1,-1}, γ = 1/2, δ(s_i, a) = s_{i+a}, π random, R(s_i, a) = i. V^π(s_0) = 5/3, V^π(s_1) = 7/3, V^π(s_2) = 11/3, V^π(s_3) = 13/3. V^π(s_0) = 0 + (V^π(s_1) + V^π(s_3))/4
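The values above can be verified by solving the linear Bellman system directly. A quick NumPy check (the wrap-around δ(s_i, a) = s_{i+a mod 4} is an assumption, read off from s_3 appearing as a neighbor of s_0 in the example):

```python
import numpy as np

# Four-state cycle MDP from the example: states 0..3, actions +1/-1
# (transitions assumed mod 4), gamma = 1/2, R(s_i, a) = i, random policy.
gamma = 0.5
n = 4
# P[s, s'] = probability of moving s -> s' under the random policy.
P = np.zeros((n, n))
for s in range(n):
    P[s, (s + 1) % n] += 0.5
    P[s, (s - 1) % n] += 0.5
r = np.arange(n, dtype=float)  # expected immediate reward E[R(s, pi(s))] = s

# Policy evaluation: solve the linear Bellman system (I - gamma * P) V = r.
V = np.linalg.solve(np.eye(n) - gamma * P, r)
print(V)  # matches V = [5/3, 7/3, 11/3, 13/3]
```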
24 Algorithms - optimal control State-Action Value function: Q^π(s,a) = E[R(s,a)] + γ E_{s'~(s,a)}[ V^π(s') ]. Note: V^π(s) = Q^π(s, π(s)) for a deterministic policy π.
25 Algorithms - Optimal control Example A = {+1,-1}, γ = 1/2, δ(s_i, a) = s_{i+a}, π random, R(s_i, a) = i. Q^π(s_0,+1) = 7/6, Q^π(s_0,-1) = 13/6. Q^π(s_0,+1) = 0 + γ V^π(s_1)
26 Algorithms - optimal control CLAIM: A policy π is optimal if and only if at each state s: V^π(s) = max_a {Q^π(s,a)} (Bellman Eq.). PROOF: Assume there is a state s and action a s.t. V^π(s) < Q^π(s,a). Then the strategy of performing a at state s (the first time) is better than π. This is true each time we visit s, so the policy that always performs action a at state s is better than π.
27 Algorithms - optimal control Example A = {+1,-1}, γ = 1/2, δ(s_i, a) = s_{i+a}, π random, R(s_i, a) = i. Changing the policy using the state-action value function.
28 MDP - computing the optimal policy 1. Linear Programming. 2. Value Iteration method: V_{i+1}(s) = max_a { R(s,a) + γ Σ_{s'} δ(s,a,s') V_i(s') }. 3. Policy Iteration method: π_{i+1}(s) = arg max_a { Q^{π_i}(s,a) }.
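The value-iteration update can be sketched on the same four-state cycle example used earlier (mod-4 transitions assumed, as in those slides):

```python
import numpy as np

# Value iteration on the four-state cycle MDP from the examples:
# states 0..3, actions +1/-1 (transitions assumed mod 4),
# R(s, a) = s for both actions, gamma = 1/2.
gamma = 0.5
n = 4
actions = (+1, -1)

V = np.zeros(n)
for _ in range(100):
    # V_{i+1}(s) = max_a { R(s,a) + gamma * V_i(delta(s,a)) }
    V = np.array([max(s + gamma * V[(s + a) % n] for a in actions)
                  for s in range(n)])

# Greedy policy w.r.t. V; R(s,a) is the same for both actions here,
# so the argmax reduces to the discounted next-state value.
pi = [max(actions, key=lambda a: gamma * V[(s + a) % n]) for s in range(n)]
print(V, pi)  # V -> [8/3, 10/3, 14/3, 16/3]
```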
29 Convergence: Value Iteration Distance of V_i from the optimal V* (in L∞): Q*(s,a) = R(s,a) + γ Σ_{s'} δ(s,a,s') V*(s'). V_{i+1}(s) - V*(s) = max_a { R(s,a) + γ Σ_{s'} δ(s,a,s') V_i(s') } - max_a Q*(s,a) ≤ max_a γ Σ_{s'} δ(s,a,s') [V_i(s') - V*(s')] ≤ γ ||V_i - V*||∞. Convergence Rate: 1/(1-γ). ONLY Pseudo-Polynomial.
30 Convergence: Policy Iteration Policy Iteration Algorithm: Compute Q^π(s,a). Set π(s) = arg max_a Q^π(s,a). Reiterate. Convergence: the policy can only improve: V_{t+1}(s) ≥ V_t(s). Fewer iterations than Value Iteration, but more expensive iterations. OPEN: How many iterations does it require?! LB: linear; UB: 2^n/n (2-action MDPs) [MS]
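The two steps above (evaluate Q^π exactly, then greedify) can be sketched on the same cycle MDP (a sketch; mod-4 transitions assumed):

```python
import numpy as np

# Policy iteration on the four-state cycle MDP: states 0..3,
# actions +1/-1 (transitions assumed mod 4), R(s, a) = s, gamma = 1/2.
gamma = 0.5
n = 4
actions = (+1, -1)

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi."""
    P = np.zeros((n, n))
    for s in range(n):
        P[s, (s + pi[s]) % n] = 1.0
    r = np.arange(n, dtype=float)  # R(s, a) = s for every action
    return np.linalg.solve(np.eye(n) - gamma * P, r)

pi = [+1] * n  # arbitrary initial deterministic policy
while True:
    V = evaluate(pi)
    # Improvement step: pi'(s) = argmax_a Q^pi(s, a); rewards are equal
    # across actions here, so only the discounted next-state value matters.
    new_pi = [max(actions, key=lambda a: gamma * V[(s + a) % n])
              for s in range(n)]
    if new_pi == pi:
        break  # policy stopped improving: Bellman optimality holds
    pi = new_pi

print(pi, V)  # converges to pi = [-1, 1, 1, -1]
```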
31 Outline Done: Goal of Reinforcement Learning; Mathematical Model (MDP); Planning: Value iteration, Policy iteration. Now: Learning Algorithms: Model based, Model free.
32 Planning versus Learning Tightly coupled in Reinforcement Learning. Goal: maximize return while learning.
33 Example - Elevator Control Learning (alone): model the arrival process well. Planning (alone): given an arrival model, build a schedule. Real objective: construct a schedule while updating the model.
34 Learning Algorithms Given access only to actions performed: 1. policy evaluation; 2. control - find an optimal policy. Two approaches: 1. Model based (Dynamic Programming). 2. Model free (Q-Learning).
35 Learning - Model Based Estimate the model from the observations. (Both transition probabilities and rewards.) Use the estimated model as the true model, and find the optimal policy. If we have a good estimated model, we should get a good estimate of the values.
36 Learning - Model Based: off policy Let the policy run for a long time. What is long?! Assuming some exploration: build an observed model (transition probabilities and rewards), and use the observed model to estimate the value of the policy.
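Building the observed model can be sketched as follows; the (s, a, r, s') log is made up for illustration, and the estimator is just empirical frequencies and averages:

```python
from collections import defaultdict

# Sketch: estimate transition probabilities and mean rewards from a
# logged trajectory of (s, a, r, s') tuples. The tiny log is made up.
trajectory = [
    (0, 'left', 0.0, 1), (1, 'left', 1.0, 0),
    (0, 'left', 0.0, 1), (0, 'right', 0.5, 0),
    (1, 'left', 1.0, 2), (0, 'left', 0.0, 2),
]

counts = defaultdict(lambda: defaultdict(int))  # (s,a) -> s' -> count
reward_sum = defaultdict(float)                 # (s,a) -> total reward
visits = defaultdict(int)                       # (s,a) -> #times taken

for s, a, r, s2 in trajectory:
    counts[(s, a)][s2] += 1
    reward_sum[(s, a)] += r
    visits[(s, a)] += 1

# Observed model: empirical delta_hat(s,a,s') and mean R_hat(s,a).
delta_hat = {sa: {s2: c / visits[sa] for s2, c in cs.items()}
             for sa, cs in counts.items()}
R_hat = {sa: reward_sum[sa] / visits[sa] for sa in visits}

print(delta_hat[(0, 'left')])  # state 1 with prob 2/3, state 2 with prob 1/3
print(R_hat[(0, 'left')])      # 0.0
```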
37 Learning - Model Based: sample size Sample size (optimal policy): Naive: O(|S|^2 |A| log(|S| |A|)) samples (approximate each transition δ(s,a,s') well). Better: O(|S| |A| log(|S| |A|)) samples (sufficient to approximate an optimal policy). [KS, NIPS 98]
38 Learning - Model Based: on policy The learner has control over the actions. The immediate goal is to learn a model. As before: build an observed model (transition probabilities and rewards), and use it to estimate the value of the policy. Accelerating the learning: how to reach new places?!
39 Learning - Model Based: on policy [Diagram: well sampled nodes versus relatively unknown nodes.]
40 Learning - Model Based: on policy [Diagram: HIGH REWARD on the relatively unknown nodes, well sampled nodes elsewhere. Exploration: planning in the new MDP.]
41 Learning: Policy improvement Assume that we can perform: given a policy π, estimate the V and Q functions of π. Then we can run policy improvement: π' = Greedy(Q). The process converges if the estimates are accurate.
More informationReinforcement Learning
Reinforcement Learning Dipendra Misra Cornell University dkm@cs.cornell.edu https://dipendramisra.wordpress.com/ Task Grasp the green cup. Output: Sequence of controller actions Setup from Lenz et. al.
More informationMulti Constrained Optimization model of Supply Chain Based on Intelligent Algorithm Han Juan School of Management Shanghai University
5th nternational Conference on Advanced Material and Computer cience (CAMC 206) Multi Contrained Optimization model of upply Chain Baed on ntelligent Algorithm Han Juan chool of Management hanghai Univerity
More informationArtificial Intelligence
Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important
More informationList coloring hypergraphs
Lit coloring hypergraph Penny Haxell Jacque Vertraete Department of Combinatoric and Optimization Univerity of Waterloo Waterloo, Ontario, Canada pehaxell@uwaterloo.ca Department of Mathematic Univerity
More informationAutonomous Helicopter Flight via Reinforcement Learning
Autonomous Helicopter Flight via Reinforcement Learning Authors: Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, Shankar Sastry Presenters: Shiv Ballianda, Jerrolyn Hebert, Shuiwang Ji, Kenley Malveaux, Huy
More informationSTOCHASTIC GENERALIZED TRANSPORTATION PROBLEM WITH DISCRETE DISTRIBUTION OF DEMAND
OPERATIONS RESEARCH AND DECISIONS No. 4 203 DOI: 0.5277/ord30402 Marcin ANHOLCER STOCHASTIC GENERALIZED TRANSPORTATION PROBLEM WITH DISCRETE DISTRIBUTION OF DEMAND The generalized tranportation problem
More informationHidden Markov Models (HMM) and Support Vector Machine (SVM)
Hidden Markov Models (HMM) and Support Vector Machine (SVM) Professor Joongheon Kim School of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea 1 Hidden Markov Models (HMM)
More informationThis question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer.
This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. 1. Suppose you have a policy and its action-value function, q, then you
More informationChannel Selection in Multi-channel Opportunistic Spectrum Access Networks with Perfect Sensing
Channel Selection in Multi-channel Opportunitic Spectrum Acce Networ with Perfect Sening Xin Liu Univerity of California Davi, CA 95616 liu@c.ucdavi.edu Bhaar Krihnamachari Univerity of Southern California
More information