Reinforcement Learning


Tom Mitchell, Machine Learning, chapter 13. Slides copyright Facundo Bromberg.

Outline
- Introduction
- Comparison with inductive learning
- Markov Decision Processes: the model
- Optimal policy: the task
- Q learning: the Q function, the algorithm, convergence proof, exploration vs exploitation, non-deterministic rewards and actions

Introduction
How can an autonomous agent that senses and acts in its environment learn to choose optimal actions to achieve its goal?
(Diagram: the agent–environment loop. The agent emits an Action, the Environment returns a State, and a Critic returns a Reward; the interaction produces states s0, s1, s2, ... and rewards r0, r1, r2, ...)
Goal: learn to choose actions that maximize r0 + γ r1 + γ² r2 + ..., where 0 ≤ γ < 1.

Introduction
This generic problem is one of learning to control sequential processes, such as:
- learning to control a mobile robot,
- learning to optimize operations in factories, and
- learning to play board games (world-class backgammon player, Tesauro 1995).
It is suitable for scenarios with unpredictable (e.g. highly dynamic) or complex environments, where there is a lack of domain theory and it is impossible to build in optimal behavior (as in planning, optimization algorithms, etc.).

Introduction
Assumption: goals can be defined by a reward function that assigns numerical values to action-state pairs. This reward function is known by the critic, which could be external or built-in. The task of the agent is to perform sequences of actions, observe their consequences, and learn a control policy π : S → A that chooses actions that maximize the accumulated reward.

Differences with inductive learning
Differences with function approximation (inductive learning) of π : S → A:
- Delayed reward. Instead of pairs <s, π(s)> (the optimal action for the current state s), the agent receives a sequence of rewards and faces the problem of temporal credit assignment.
- Exploration. The agent influences the distribution of training examples through the action sequence it chooses. This raises the problem of exploration vs exploitation.

Differences with inductive learning
- Partially observable states. In many practical situations sensors provide only partial information about the environment's state. An optimal policy may therefore include actions chosen specifically to improve the observability of the environment.
- Life-long learning. Unlike isolated inductive learning tasks, agent learning often requires learning several related tasks within the same environment. Prior knowledge or experience become relevant.

The model (or the task of making compromises)
- Deterministic or non-deterministic actions?
- Prior knowledge or not about the effects of its actions on the environment (domain theory)?
- Does a trainer give examples of optimal action sequences (inductive learning), or must the agent train itself?
The choice: Markov Decision Process (MDP).

Markov Decision Processes
An MDP is a tuple (S, A, s0, δ, r), where
- S is the set of states,
- A is the set of actions available to the agent,
- s0 is the initial state,
- δ : S × A → S is the transition function, and
- r : S × A → ℝ⁺ is the reward function.
Both r and δ depend only on the current state and action (Markov property), and they are deterministic.
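The slides define the MDP abstractly; as a concrete point of reference for the sketches that follow, here is one way to encode a small deterministic MDP as plain Python dictionaries. The chain of states s0 → s1 → s2 → G, the two actions, and the single reward of 100 for entering G are invented for illustration and are not taken from the slides.

```python
# A minimal sketch of a deterministic MDP as plain data structures.
# The specific states, actions, and rewards below are illustrative only.
STATES = ["s0", "s1", "s2", "G"]          # S (G is an absorbing goal state)
ACTIONS = ["left", "right"]               # A
START = "s0"                              # s0

# delta : S x A -> S, the transition function
DELTA = {
    ("s0", "right"): "s1", ("s0", "left"): "s0",
    ("s1", "right"): "s2", ("s1", "left"): "s0",
    ("s2", "right"): "G",  ("s2", "left"): "s1",
    ("G",  "right"): "G",  ("G",  "left"): "G",   # absorbing
}

# r : S x A -> R+, the reward function (100 for entering the goal, 0 elsewhere)
REWARD = {(s, a): 0.0 for s in STATES for a in ACTIONS}
REWARD[("s2", "right")] = 100.0
```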

The task
The task of the agent is to learn a policy π : S → A that selects the next action a_t based on the current observed state s_t, i.e. π(s_t) = a_t. How? By finding a policy that maximizes the cumulative reward over time, that is, a policy that maximizes
V^π(s_t) = r_t + γ r_{t+1} + γ² r_{t+2} + ... = Σ_{i=0}^{∞} γ^i r_{t+i},  0 ≤ γ < 1,
where the sequence of rewards is generated by starting at s_t and repeatedly applying π: a_t = π(s_t), a_{t+1} = π(s_{t+1}), a_{t+2} = π(s_{t+2}), ...

The task (2)
Alternative definitions of total reward:
- Finite horizon reward: Σ_{i=0}^{h} r_{t+i}
- Average reward: lim_{h→∞} (1/h) Σ_{i=0}^{h} r_{t+i}

The optimal policy
The task of the agent is thus to learn the optimal policy, given by
π* = argmax_π V^π(s), for all s,
and we denote by V*(s) = V^{π*}(s) the maximum reward the agent can obtain starting at s.
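The discounted sum defining V^π(s_t) can be computed directly once the reward sequence is known; a minimal sketch, using a hypothetical reward sequence that ends at an absorbing goal so the infinite sum reduces to a finite one:

```python
def discounted_return(rewards, gamma=0.9):
    """Compute sum_i gamma^i * r_i for a finite sequence of rewards."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

# Rewards along a path that reaches the absorbing goal after three steps;
# all later rewards are zero, so the infinite sum reduces to a finite one.
print(discounted_return([0.0, 0.0, 100.0]))   # 0 + 0.9*0 + 0.81*100 = 81.0
```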

Example
(Figure: a grid world. S: the cells; A: the arrows between cells; r: the numbers next to the arrows; G is the goal state.) With γ = 0.9:
V*(s_bottom-right) = 100
V*(s_bottom-center) = 0 + 0.9 · 100 = 90
V*(s_bottom-left) = 0 + 0.9 · 0 + 0.9² · 100 = 81
Since G is absorbing, the infinite sum becomes finite.

Q Learning
The agent wants to maximize cumulative reward, thus it should prefer state s1 over s2 whenever V*(s1) > V*(s2). However, the agent's policy must choose among actions, not states. No problem: the optimal action in state s is the action a that maximizes the sum of the immediate reward r(s, a) plus the value V* of the immediate successor state, discounted by γ:
π*(s) = argmax_a [ r(s, a) + γ V*(δ(s, a)) ]
(immediate reward + value of successor)

Q Learning
π*(s) = argmax_a [ r(s, a) + γ V*(δ(s, a)) ]
Thus, if the agent knows the functions r and δ, it can learn the optimal policy by learning the value V* offline (value iteration algorithm, skipped). For all but a few cases, however, δ is unknown: this requires precise knowledge of the domain, and sometimes the domain is even non-deterministic!
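If r, δ and an estimate of V* were all available, the rule π*(s) = argmax_a [ r(s, a) + γ V*(δ(s, a)) ] could be applied directly. A minimal sketch under that assumption, reusing the hypothetical chain MDP from the earlier sketch and a hand-filled V* table (γ = 0.9):

```python
GAMMA = 0.9

# Hypothetical optimal values for the chain MDP (G is absorbing, so V*(G) = 0).
V_STAR = {"s0": 81.0, "s1": 90.0, "s2": 100.0, "G": 0.0}

def greedy_action(s, actions=ACTIONS, delta=DELTA, reward=REWARD, v=V_STAR):
    """pi*(s) = argmax_a [ r(s,a) + gamma * V*(delta(s,a)) ]."""
    return max(actions, key=lambda a: reward[(s, a)] + GAMMA * v[delta[(s, a)]])

print(greedy_action("s1"))   # "right": 0 + 0.9*100 = 90 beats "left": 0 + 0.9*81 = 72.9
```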

The Q function
If r or δ are unknown, what evaluation function should the agent use? The evaluation function Q:
Q(s, a) = r(s, a) + γ V*(δ(s, a))
π*(s) = argmax_a Q(s, a)
Thus, if the agent is capable of learning Q, it will be able to select optimal actions even when it has no knowledge of r and δ.

The Q function
π*(s) = argmax_a Q(s, a), with Q(s, a) = r(s, a) + γ V*(δ(s, a))
Surprisingly, the agent can choose optimal actions without ever conducting a lookahead search to explicitly consider what states result from each action. Yet the Q function has exactly that property: Q(s, a) summarizes in a single value all the information needed to determine the discounted cumulative reward that will be gained in the future if action a is chosen in state s.

Example
Q(s, a) = r(s, a) + γ V*(δ(s, a))
(Figure: the same grid world. S: the cells; A: the arrows; r: the numbers next to the arrows.)
Q(s_bottom-right, a toward G) = 100 + 0.9 · 0 = 100
Q(s_bottom-center, a_right) = 0 + 0.9 · 100 = 90
Q(s_bottom-left, a_right) = 0 + (0.9)² · 100 = 81
Since G is absorbing, the infinite sum becomes finite.
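Continuing the same hypothetical chain MDP, the sketch below builds a Q table from its defining relation Q(s, a) = r(s, a) + γ V*(δ(s, a)) and then selects actions with argmax_a Q(s, a) alone, which is the point of the slide: once Q is known, neither r nor δ is consulted at decision time.

```python
# Build the full Q table from the (hypothetical) r, delta, and V* above:
# Q(s,a) = r(s,a) + gamma * V*(delta(s,a)).
Q = {(s, a): REWARD[(s, a)] + GAMMA * V_STAR[DELTA[(s, a)]]
     for s in STATES for a in ACTIONS}

# Once Q is known, action selection needs neither r nor delta:
def best_action(s, actions=ACTIONS, q=Q):
    """pi*(s) = argmax_a Q(s, a)."""
    return max(actions, key=lambda a: q[(s, a)])

print(Q[("s2", "right")], Q[("s1", "right")], Q[("s0", "right")])  # 100.0 90.0 81.0
print(best_action("s0"))                                           # "right"
```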

Algorithm for learning Q (Watkins 1989)
π*(s) = argmax_a Q(s, a), with Q(s, a) = r(s, a) + γ V*(δ(s, a))
Learning Q is equivalent to learning the optimal policy. Note that
V*(s) = max_{a'} [ r(s, a') + γ V*(δ(s, a')) ] = max_{a'} Q(s, a'),
so we obtain a recursive definition of Q:
Q(s, a) = r(s, a) + γ max_{a'} Q(δ(s, a), a')
The algorithm learns an approximation Q̂ of Q, represented as a table with a separate entry for each state-action pair.

Algorithm for learning Q (cont.)
For each s, a initialize the table entry Q̂(s, a) to zero. Observe the current state s. Do forever:
- the agent observes its current state s,
- chooses some action a and executes it,
- observes the resulting reward r = r(s, a) and the new state s' = δ(s, a),
- updates the table entry for Q̂(s, a) according to the rule
  Q̂(s, a) ← r + γ max_{a'} Q̂(s', a'),
- s ← s'.
Note that Q-learning propagates Q̂ estimates backwards from the new state s' to the old state s.

Example 2. Q learning
(Figure: the agent moves right from s1 to s2; the entries Q̂(s2, a') for the actions out of s2 are 63, 81, and 100.)
Q̂(s1, a_right) ← r + γ max_{a'} Q̂(s2, a') = 0 + 0.9 · max{63, 81, 100} = 90
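A runnable sketch of the tabular algorithm above for the deterministic setting, again on the hypothetical chain MDP (DELTA, REWARD, ACTIONS) from the earlier sketch; actions are chosen at random here simply as one way of visiting every state-action pair often.

```python
import random
from collections import defaultdict

def q_learning(episodes=200, gamma=0.9, start="s0", goal="G"):
    """Deterministic tabular Q-learning: Q_hat(s,a) <- r + gamma * max_a' Q_hat(s',a')."""
    q_hat = defaultdict(float)                  # every Q_hat(s, a) starts at zero
    for _ in range(episodes):
        s = start
        while s != goal:                        # one episode, ends at the absorbing goal
            a = random.choice(ACTIONS)          # exploration policy (random here)
            r = REWARD[(s, a)]                  # observed reward r(s, a)
            s_next = DELTA[(s, a)]              # observed next state delta(s, a)
            q_hat[(s, a)] = r + gamma * max(q_hat[(s_next, a2)] for a2 in ACTIONS)
            s = s_next
    return q_hat

q_hat = q_learning()
print(round(q_hat[("s2", "right")]), round(q_hat[("s1", "right")]), round(q_hat[("s0", "right")]))
# With enough episodes these approach the true values 100, 90, 81.
```

Each pass of the inner loop is exactly the update rule on the slide: observe (s, a, r, s'), overwrite Q̂(s, a) with r + γ max_{a'} Q̂(s', a'), then move on to s'.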

Example 3. Q learning
Proceeding in episodes from s0 to G, always through s1 and s2. Initially all entries along the path are zero: Q̂(s0, ·) = 0, Q̂(s1, ·) = 0, Q̂(s2, ·) = 0.

Example 3. Q learning, episode 1
Only the entry for the final transition is updated: Q̂(s2, a→G) = 100; Q̂(s1, ·) and Q̂(s0, ·) remain 0.

Example 3. Q learning, episode 2
Q̂(s1, a→s2) = 0 + 0.9 · 100 = 90; Q̂(s2, a→G) = 100; Q̂(s0, ·) remains 0.

Example 3. Q learning, episode 3
Q̂(s0, a→s1) = 0 + 0.9 · 90 = 81; Q̂(s1, a→s2) = 90; Q̂(s2, a→G) = 100.

Convergence of Q learning
Will the algorithm converge toward a Q̂ equal to the true Q function?

Convergence of Q learning
Yes, in this setting: Q̂ converges to Q provided the system is a deterministic MDP, rewards are bounded, and every state-action pair is visited infinitely often. (Proof idea: over any interval in which every state-action pair is visited, the largest error in the Q̂ table is reduced by at least a factor of γ.)

Experimentation strategies in Q learning
The algorithm does not specify how actions are chosen!
Exploitation: one possibility is for the agent in state s to choose the action a that maximizes Q̂(s, a), thereby exploiting its current approximation. With this strategy the agent risks failing to explore other actions, in other states, that have even higher values but haven't been visited yet. Moreover, the convergence theorem requires that state-action pairs be visited infinitely often.

Experimentation strategies in Q learning (2)
Exploration: a probabilistic approach that gives higher probabilities to actions with higher Q̂ values:
P(a_i | s) = k^{Q̂(s, a_i)} / Σ_j k^{Q̂(s, a_j)},
where k > 0 determines how strongly the selection favors actions with high Q̂ values. High k → exploit; low k → explore.

Nondeterministic rewards and actions
Noisy effectors, games with dice, etc. Here δ(s, a) first produces a probability distribution over next states and then draws an outcome at random from that distribution; similarly for r. We assume these probabilities satisfy the Markov property. We retrace the line of argument that led to the deterministic algorithm, revising it where needed.
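Returning to the exploration strategy above, a small sketch of the probabilistic selection rule P(a_i | s) = k^{Q̂(s, a_i)} / Σ_j k^{Q̂(s, a_j)}; the state, actions, and Q̂ values are hypothetical, and k is kept small because k^{Q̂} grows quickly when Q̂ values are large.

```python
import random

def choose_action(s, q_hat, actions, k=1.5):
    """Sample a with P(a_i | s) = k ** Q_hat(s, a_i) / sum_j k ** Q_hat(s, a_j).

    Larger k favors high-Q_hat actions (exploit); k close to 1 is nearly uniform (explore).
    """
    weights = [k ** q_hat[(s, a)] for a in actions]   # random.choices normalizes these
    return random.choices(actions, weights=weights)[0]

# Hypothetical Q_hat values for one state with three actions.
q_demo = {("s1", "left"): 1.0, ("s1", "right"): 3.0, ("s1", "up"): 2.0}
print(choose_action("s1", q_demo, ["left", "right", "up"]))
```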

Nondeterministic value function
We define the nondeterministic value function V^π(s_t) for policy π as the expected value of the discounted cumulative reward:
V^π(s_t) ≡ E[ Σ_{i=0}^{∞} γ^i r_{t+i} ]
As before, we define the optimal policy π* to be the policy that maximizes V^π(s) for all states s:
π* = argmax_π V^π(s), for all s.

Nondeterministic value function
And we generalize the earlier definition of Q by taking its expected value:
Q(s, a) ≡ E[ r(s, a) + γ V*(δ(s, a)) ]
        = E[ r(s, a) ] + γ E[ V*(δ(s, a)) ]
        = E[ r(s, a) ] + γ Σ_{s'} P(s' | s, a) V*(s')
As before, we can express Q recursively:
Q(s, a) = E[ r(s, a) ] + γ Σ_{s'} P(s' | s, a) max_{a'} Q(s', a')

Convergence and training rule
The convergence proof holds for the deterministic case, but the previous learning rule does not converge in the nondeterministic case. The following training rule is sufficient to assure convergence of Q̂ to Q:
Q̂_n(s, a) ← (1 − α_n) Q̂_{n−1}(s, a) + α_n [ r + γ max_{a'} Q̂_{n−1}(s', a') ],
where α_n = 1 / (1 + visits_n(s, a)) and the bracketed term is the deterministic training rule.
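When the transition distribution P(s' | s, a) and the expected reward E[r(s, a)] are known, the recursive definition of Q above is a one-step expected backup. A minimal sketch with hypothetical two-state tables (this is the model-based view; the training rule on the slide is its sample-based counterpart):

```python
def expected_q_backup(s, a, q, p, expected_r, actions, gamma=0.9):
    """Q(s,a) = E[r(s,a)] + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')."""
    future = sum(prob * max(q[(s_next, a2)] for a2 in actions)
                 for s_next, prob in p[(s, a)].items())
    return expected_r[(s, a)] + gamma * future

# Hypothetical two-state example: from "s", action "go" reaches "t" or stays at "s".
P = {("s", "go"): {"t": 0.8, "s": 0.2}, ("t", "go"): {"t": 1.0}}
E_R = {("s", "go"): 1.0, ("t", "go"): 0.0}
Q0 = {("s", "go"): 0.0, ("t", "go"): 10.0}
print(expected_q_backup("s", "go", Q0, P, E_R, ["go"]))   # 1.0 + 0.9*(0.8*10 + 0.2*0) = 8.2
```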

Convergence and training rule
Key idea: revisions to Q̂ are made more gradually than in the deterministic case. With α_n = 1 we recover the deterministic learning rule. The choice of α_n given above is one of many that satisfy the conditions for convergence, according to a theorem by Watkins and Dayan (1992) (see Mitchell; not included here).
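A sketch of the nondeterministic training rule with the decaying learning rate α_n = 1/(1 + visits_n(s, a)); it would be called once per observed transition (s, a, r, s'), in place of the deterministic overwrite used earlier, and with α fixed at 1 it reduces to the deterministic rule.

```python
from collections import defaultdict

q_hat = defaultdict(float)     # Q_hat table, initialized to zero
visits = defaultdict(int)      # visits_n(s, a), the number of times (s, a) has been updated

def nondeterministic_update(s, a, r, s_next, actions, gamma=0.9):
    """Q_hat(s,a) <- (1 - alpha_n) Q_hat(s,a) + alpha_n [ r + gamma * max_a' Q_hat(s',a') ]."""
    visits[(s, a)] += 1
    alpha = 1.0 / (1 + visits[(s, a)])          # decays as (s, a) is revisited
    target = r + gamma * max(q_hat[(s_next, a2)] for a2 in actions)
    q_hat[(s, a)] = (1 - alpha) * q_hat[(s, a)] + alpha * target
```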
