Reinforcement learning II


CS 1675 Introduction to Machine Learning
Lecture 26: Reinforcement learning II
Milos Hauskrecht, 5329 Sennott Square

Reinforcement learning basics: the learner receives an input x, produces an output (an action), and a critic supplies a reinforcement signal r. The learner interacts with the environment:
- It receives an input with information about the environment (e.g. from sensors).
- It makes actions that (may) affect the environment.
- It receives a reinforcement signal that provides feedback on how well it performed.

Reinforcement learning

Objective: learn how to act in the environment in order to maximize the reinforcement signal. The selection of actions should depend on the input. A policy $\pi: X \to A$ maps inputs to actions. Goal: find the optimal policy $\pi^*: X \to A$ that gives the best expected reinforcements. Example: learning how to play games (AlphaGo).

Gambling example

Game: 3 biased coins. The coin to be tossed is selected randomly from the three coin options. The agent always sees which coin is going to be played next. The agent makes a bet on either head or tail with a wager of $1. If, after the coin toss, the outcome agrees with the bet, the agent wins $1; otherwise it loses $1.

RL model:
- Input: X, the coin chosen for the next toss
- Action: A, the choice of head or tail the agent bets on
- Reinforcements: {1, -1}
- A policy $\pi: X \to A$, e.g. $\pi$: Coin1 -> head, Coin2 -> tail, Coin3 -> head

Gambling example

RL model:
- Input: X, the coin chosen for the next toss
- Action: A, the choice of head or tail the agent bets on
- Reinforcements: {1, -1}
- A policy $\pi$: Coin1 -> head, Coin2 -> tail, Coin3 -> head

State, action, reward trajectories:

            Step 0   Step 1   Step 2   ...   Step k
    state   Coin2    Coin1    Coin2    ...   Coin1
    action  Tail     Head     Tail     ...   Head
    reward  r_0      r_1      r_2      ...   r_k

RL learning: objective functions

Objective: find a policy $\pi: X \to A$ (Coin1 -> ?, Coin2 -> ?, Coin3 -> ?) that maximizes some combination of future reinforcements (rewards) received over time.

Valuation models (quantify how good the policy is):
- Finite horizon model: $E\left(\sum_{t=0}^{T} r_t\right)$, with time horizon $T \ge 0$
- Infinite horizon discounted model: $E\left(\sum_{t=0}^{\infty} \gamma^t r_t\right)$, with discount factor $0 \le \gamma < 1$
- Average reward model: $\lim_{T \to \infty} \frac{1}{T} E\left(\sum_{t=0}^{T} r_t\right)$
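
To make the three valuation models concrete, here is a minimal Python sketch (not from the lecture; the reward sequence and discount factor are made-up) that evaluates all three criteria on one observed trajectory:

```python
# Illustrative sketch: the three objective functions evaluated
# on a single observed reward trajectory (values are made up).
rewards = [1, -1, 1, 1, -1]   # r_0 ... r_T from one run
gamma = 0.9                   # discount factor, 0 <= gamma < 1

finite_horizon = sum(rewards)                                  # sum_{t=0}^{T} r_t
discounted = sum(gamma**t * r for t, r in enumerate(rewards))  # sum_t gamma^t r_t
average = sum(rewards) / len(rewards)                          # (1/T) sum_t r_t

print(finite_horizon, discounted, average)
```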

RL with immediate rewards

Expected reward: $E\left(\sum_{t=0}^{\infty} \gamma^t r_t\right)$, with $0 \le \gamma < 1$.

Immediate reward case: the reward depends only on the input x and the action choice a; the action does not affect the environment, and hence does not affect future inputs (states) and future rewards. Let $R(x, a)$ be the expected one-step reward for input x (the coin to play next) and choice a.

The expected reward then decomposes step by step:
$E\left(\sum_{t=0}^{\infty} \gamma^t r_t\right) = E(r_0) + \gamma E(r_1) + \gamma^2 E(r_2) + \ldots$

Optimal strategy $\pi^*: X \to A$:
$\pi^*(x) = \arg\max_{a} R(x, a)$

RL with immediate rewards

The optimal choice assumes we know the expected reward $R(x, a)$; then $\pi^*(x) = \arg\max_a R(x, a)$.

Caveats:
- We do not know the expected reward; we need to estimate it from interaction.
- We cannot determine the optimal policy if the estimate of the expected reward is not good.
- We also need to try actions that look suboptimal with respect to the current estimates $\tilde{R}(x, a)$.

Estimating $\tilde{R}(x, a)$:

Solution 1: for each input x try different actions a, and estimate $\tilde{R}(x, a)$ using the average of the observed rewards:
$\tilde{R}(x, a) = \frac{1}{N_{x,a}} \sum_{i=1}^{N_{x,a}} r_i$

Solution 2: online approximation. Update the estimate after performing action a in x and observing reward $r_i$:
$\tilde{R}^{(i)}(x, a) = (1 - \alpha(i))\, \tilde{R}^{(i-1)}(x, a) + \alpha(i)\, r_i$
where $\alpha(i)$ is a learning rate.
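
A minimal Python sketch of the two estimation strategies for a single fixed (x, a) pair; the function names and sample rewards are illustrative, not from the lecture:

```python
# Solution 1: batch average of all observed rewards for (x, a).
def average_estimate(observed_rewards):
    return sum(observed_rewards) / len(observed_rewards)

# Solution 2: online approximation -- update the running estimate
# after each new reward r_i, with learning rate alpha(i).
def online_update(r_estimate, r_i, alpha):
    return (1 - alpha) * r_estimate + alpha * r_i

# With alpha(i) = 1/i the online update reproduces the running average.
estimate = 0.0
for i, r in enumerate([1, -1, 1, 1], start=1):
    estimate = online_update(estimate, r, alpha=1.0 / i)
print(estimate)  # equals the plain average of the four rewards
```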

RL with immediate rewards

At any point i in time during the experiment we have estimates of the expected rewards for each (coin, action) pair: $\tilde{R}(\text{coin1, head})$, $\tilde{R}(\text{coin1, tail})$, $\tilde{R}(\text{coin2, head})$, $\tilde{R}(\text{coin2, tail})$, $\tilde{R}(\text{coin3, head})$, $\tilde{R}(\text{coin3, tail})$.

Assume the next coin to play in step (i+1) is coin 2 and we pick head as our bet. Then we update $\tilde{R}^{(i+1)}(\text{coin2, head})$ using the observed reward and one of the update strategies above, and keep the reward estimates for the remaining (coin, action) pairs unchanged, e.g. $\tilde{R}^{(i+1)}(\text{coin2, tail}) = \tilde{R}^{(i)}(\text{coin2, tail})$.

Exploration vs. exploitation

Uniform exploration: uses an exploration parameter $0 \le \epsilon \le 1$.
- The current best choice $\hat{\pi}(x) = \arg\max_a \tilde{R}(x, a)$ is selected with probability $1 - \epsilon$.
- All other choices are selected with uniform probability $\frac{\epsilon}{|A| - 1}$.

Advantages: simple, easy to implement. Disadvantages: exploration is more appropriate at the beginning, when we do not have good estimates of $\tilde{R}$, while exploitation is more appropriate later, when we have good estimates.
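
A minimal sketch of the uniform (epsilon-greedy) rule above; the reward estimates and the epsilon value are illustrative:

```python
import random

def epsilon_greedy(r_estimates, actions, epsilon):
    """With probability 1 - epsilon pick the current best action;
    otherwise pick one of the remaining actions uniformly, so each
    non-greedy action has probability epsilon / (|A| - 1)."""
    best = max(actions, key=lambda a: r_estimates[a])
    others = [a for a in actions if a != best]
    if random.random() > epsilon or not others:
        return best
    return random.choice(others)

# Illustrative estimates for the coin-betting example.
r_hat = {"head": 0.4, "tail": -0.1}
print(epsilon_greedy(r_hat, ["head", "tail"], epsilon=0.1))
```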

Exploration vs. exploitation

Boltzmann exploration: the action is chosen randomly, but proportionally to its current expected reward estimate. It can be tuned with a temperature parameter T to promote exploration or exploitation. The probability of choosing action a is
$p(a \mid x) = \frac{\exp\left(\tilde{R}(x, a)/T\right)}{\sum_{a' \in A} \exp\left(\tilde{R}(x, a')/T\right)}$

Effect of T:
- For high values of T, $p(a \mid x)$ is close to uniform over all actions.
- For low values of T, $p(a \mid x)$ for the action with the highest value of $\tilde{R}$ approaches 1.

Agent navigation example

Agent navigation in a maze: 4 moves in compass directions. The effects of moves are stochastic: we may wind up in a location other than the intended one with non-zero probability. Objective: learn how to reach the goal state G in the shortest expected time.
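
A minimal sketch of Boltzmann exploration, assuming the illustrative estimates below; the two calls show the high- and low-temperature limiting behaviors:

```python
import math
import random

def boltzmann_action(r_estimates, temperature):
    """Sample an action with probability proportional to
    exp(R_hat(x, a) / T)."""
    actions = list(r_estimates)
    weights = [math.exp(r_estimates[a] / temperature) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

# Illustrative estimates; high T is near-uniform, low T is near-greedy.
r_hat = {"head": 0.4, "tail": -0.1}
print(boltzmann_action(r_hat, temperature=10.0))
print(boltzmann_action(r_hat, temperature=0.01))
```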

Agent navigation example

The RL model:
- Input: X, the position of the agent
- Output: A, the next move
- Reinforcements: R, -1 for each move and +100 for reaching the goal
- A policy $\pi: X \to A$, e.g. $\pi$: Position 1 -> right, Position 2 -> right, ..., Position 25 -> left

Goal: find the policy maximizing future expected rewards $E\left(\sum_{t=0}^{\infty} \gamma^t r_t\right)$, with $0 \le \gamma < 1$.

State, action, reward trajectories under the policy $\pi$:

            Step 0   Step 1   Step 2   ...   Step k
    state   Pos1     Pos2     Pos3     ...   Pos15
    action  Right    Right    Up       ...   Up
    reward  r_0      r_1      r_2      ...   r_k

Learning with delayed rewards

Actions, in addition to producing immediate rewards, affect the next state of the environment, and thus indirectly also future rewards. We need a model to represent environment changes and the effect of actions on states and on the rewards associated with them: the Markov decision process (MDP), frequently used in AI, operations research, and control theory. At each step, the action $a_{t-1}$ taken in state $s_{t-1}$ leads to the next state $s_t$ and yields the reward $r_{t-1}$.

Markov decision process

Formal definition: a 4-tuple $(S, A, T, R)$ with
- a set of states S (the X from before), e.g. the locations of a robot,
- a set of actions A, e.g. the move actions,
- a transition model $T: S \times A \times S \to [0, 1]$: where can I get with different moves,
- a reward model $R: S \times A \times S \to \mathbb{R}$: the reward/cost for a transition.

MDP problem

We want to find the best policy $\pi^*: S \to A$. A value function $V^\pi$ for a policy $\pi$ quantifies the goodness of the policy through, e.g., the infinite horizon discounted model $E\left(\sum_{t=0}^{\infty} \gamma^t r_t\right)$. It:
1. combines future rewards over a trajectory,
2. combines rewards for multiple trajectories (through expectation-based measures).

Value of a policy for an MDP

Assume a fixed policy $\pi: S \to A$. How do we compute the value of the policy under the infinite horizon discounted model? Via a fixed-point equation:
$V^\pi(s) = R(s, \pi(s)) + \gamma \sum_{s' \in S} P(s' \mid s, \pi(s))\, V^\pi(s')$
The first term is the expected one-step reward for the first action; the second is the expected discounted reward for following the policy for the rest of the steps. In vector form:
$v = r + \gamma U v$, hence $v = (I - \gamma U)^{-1} r$
For a finite state space we get a set of linear equations.
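
Since the fixed-point equation is linear, a small MDP can be evaluated with a single linear solve. A minimal NumPy sketch with an illustrative two-state chain (not the lecture's maze):

```python
import numpy as np

gamma = 0.9
# U[s, s'] = P(s' | s, pi(s)) under the fixed policy pi (illustrative).
U = np.array([[0.8, 0.2],
              [0.1, 0.9]])
# r[s] = expected one-step reward R(s, pi(s)) (illustrative).
r = np.array([-1.0, 10.0])

# Solve v = r + gamma * U v, i.e. v = (I - gamma U)^{-1} r.
v = np.linalg.solve(np.eye(2) - gamma * U, r)
print(v)  # value of each state under pi
```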

Optimal policy

The value of the optimal policy:
$V^*(s) = \max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \right]$
i.e. the expected one-step reward for the first action plus the expected discounted reward for following the optimal policy for the rest of the steps.

The optimal policy $\pi^*: S \to A$:
$\pi^*(s) = \arg\max_{a \in A} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \right]$

Computing the optimal policy

Dynamic programming. Value iteration computes the optimal value function first, and then the policy; the iterative approximation converges to the optimal value function.

Value iteration:
    initialize V                ;; V is a vector of values for all states
    repeat
        set V' <- V
        set V(s) <- max_{a in A} [ R(s, a) + gamma * sum_{s'} P(s'|s, a) V'(s') ]  for all s
    until V' ≈ V                ;; the change between iterations falls below a tolerance
    output pi(s) = argmax_{a in A} [ R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s') ]
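
A minimal NumPy sketch of the value-iteration pseudocode above, run on an illustrative two-state, two-action toy MDP:

```python
import numpy as np

gamma, tol = 0.9, 1e-6

# Illustrative toy MDP: P[s, a, s'] transition probabilities and
# R[s, a] expected one-step rewards for 2 states and 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[0.0, -1.0],
              [5.0, 1.0]])

V = np.zeros(2)
while True:
    V_new = (R + gamma * (P @ V)).max(axis=1)  # Bellman backup over actions
    if np.max(np.abs(V_new - V)) < tol:        # stop when V' is close to V
        break
    V = V_new

policy = (R + gamma * (P @ V)).argmax(axis=1)  # greedy policy from V
print(V, policy)
```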

Reinforcement learning of optimal policies

In the RL framework we do not know the MDP model! Goal: learn the optimal policy $\pi^*: S \to A$. Two basic approaches:
- Model-based learning: learn the MDP model (probabilities, rewards) first, and solve the MDP afterwards.
- Model-free learning: learn how to act directly; there is no need to learn the parameters of the MDP.
A number of variants of the two appear in the literature.

Model-based learning

We need to learn the transition probabilities and rewards.

Learning the probabilities: ML parameter estimates using counts:
$\tilde{P}(s' \mid s, a) = \frac{N_{s,a,s'}}{N_{s,a}}$

Learning the rewards: similar to learning with immediate rewards:
$\tilde{R}(s, a, s') = \frac{1}{N_{s,a,s'}} \sum_{i=1}^{N_{s,a,s'}} r_i$
or the online solution.

Problem: changes in the probability and reward estimates would require us to solve an MDP from scratch (after every action and reward seen)!
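
A minimal bookkeeping sketch of these count-based estimates; all names (`record`, `p_estimate`, `r_estimate`) are illustrative, not from the lecture:

```python
from collections import defaultdict

N_sa = defaultdict(int)      # N(s, a): times action a was taken in s
N_sas = defaultdict(int)     # N(s, a, s'): times that transition led to s'
R_sum = defaultdict(float)   # summed rewards observed for (s, a, s')

def record(s, a, s_next, r):
    """Update the counts after one observed transition."""
    N_sa[(s, a)] += 1
    N_sas[(s, a, s_next)] += 1
    R_sum[(s, a, s_next)] += r

def p_estimate(s, a, s_next):
    """ML estimate: P(s'|s,a) = N(s,a,s') / N(s,a)."""
    return N_sas[(s, a, s_next)] / N_sa[(s, a)]

def r_estimate(s, a, s_next):
    """Average of the rewards observed for the transition (s, a, s')."""
    return R_sum[(s, a, s_next)] / N_sas[(s, a, s_next)]
```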

Model-free learning

Motivation: the value function update (value iteration):
$V^*(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s') \right]$

Let
$Q(s, a) = R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^*(s')$
Then $V^*(s) = \max_a Q(s, a)$. Note that the update can be defined purely in terms of Q-functions:
$Q(s, a) = R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \max_{a'} Q(s', a')$

Q-learning

Q-learning uses the Q-value update idea, but relies on a stochastic (online, sample-by-sample) update. The model-based part $R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \max_{a'} Q(s', a')$ is replaced with
$\hat{Q}(s, a) \leftarrow (1 - \alpha)\, \hat{Q}(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} \hat{Q}(s', a') \right]$
where $r(s, a)$ is the reward received from the environment after performing action a in state s, $s'$ is the new state reached after action a, and $\alpha$ is a learning rate, a function of $N_{s,a}$, the number of times a has been executed at s.

Q-function updates in Q-learning

At any point i in time during the experiment we have estimates of the Q-functions for each (state, action) pair: $Q(\text{position 1, up})$, $Q(\text{position 1, left})$, $Q(\text{position 1, right})$, $Q(\text{position 1, down})$, $Q(\text{position 2, up})$, ...

Assume the current state is position 1 and we pick the up action to be performed next. After we observe the reward, we update $\tilde{Q}(\text{position 1, up})$ and keep the Q-function estimates for the remaining (state, action) pairs unchanged.

Q-learning

The online update rule is applied repeatedly during direct interaction with the environment.

Q-learning:
    initialize Q(s, a) = 0 for all (s, a) pairs
    observe the current state s
    repeat
        select action a            ;; use some exploration/exploitation schedule
        receive reward r
        observe the next state s'
        update Q(s, a) <- (1 - alpha) Q(s, a) + alpha [ r + gamma * max_{a'} Q(s', a') ]
        set s to s'
    end repeat
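
A minimal tabular Q-learning sketch following the algorithm above. The environment interface (`env.reset`, `env.step`) is a hypothetical stand-in in the style of common RL toolkits, not something defined in the lecture:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy schedule. Assumes a
    hypothetical environment with env.reset() -> state and
    env.step(a) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # Q[(s, a)], initialized to 0 for all pairs
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # select action a: exploration/exploitation schedule
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)  # receive reward r, observe s'
            target = r + gamma * max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
            s = s_next                     # set s to s'
    return Q
```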

Q-learning convergence

Q-learning is guaranteed to converge to the optimal Q-values under the following conditions:
- Every state is visited and every action in that state is tried an infinite number of times. This is assured via the exploration/exploitation schedule.
- The sequence of learning rates for each Q(s, a) satisfies
  1. $\sum_{n=1}^{\infty} \alpha_n(s, a) = \infty$
  2. $\sum_{n=1}^{\infty} \alpha_n(s, a)^2 < \infty$
  where $\alpha_n(s, a)$ is the learning rate for the nth trial of (s, a).

RL with delayed rewards

The optimal choice is $\pi^*(s) = \arg\max_a Q(s, a)$, much like what we had for the immediate rewards: $\pi^*(x) = \arg\max_a R(x, a)$. Instead of the exact values $Q(s, a)$ we use the estimates $\hat{Q}(s, a)$ obtained via
$\hat{Q}(s, a) \leftarrow (1 - \alpha)\, \hat{Q}(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} \hat{Q}(s', a') \right]$
Since we have only estimates of $\hat{Q}(s, a)$, we also need to try actions that look suboptimal with respect to the current estimates. Exploration/exploitation strategies: uniform exploration, Boltzmann exploration.

Q-learning speed-ups

The basic Q-learning rule updates may propagate distant (delayed) rewards very slowly. Example: a maze with a high-reward goal state G. To make the correct decision we need all Q-values for the current position to be good. Problem: in each run we back-propagate values only one step back, so it takes multiple trials to back-propagate values multiple steps.

Remedy: back up values for a larger number of steps. The rewards from applying the policy are
$q_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \ldots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$

We can substitute the immediate rewards with n-step rewards:
$q_t^{(n)} = \sum_{i=0}^{n-1} \gamma^i r_{t+i} + \gamma^n \max_{a'} Q_{t+n}(s', a')$

Postpone the update for n steps and update with the longer trajectory rewards:
$Q_{t+n+1}(s, a) \leftarrow Q_{t+n}(s, a) + \alpha \left[ q_t^{(n)} - Q_{t+n}(s, a) \right]$

Problems: larger variance, exploration/exploitation switching, and we must wait n steps to update.
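
A minimal sketch of the n-step return $q_t^{(n)}$ used as the longer-trajectory target; the reward values and bootstrap term below are illustrative:

```python
def n_step_return(rewards, gamma, q_bootstrap):
    """q_t^(n) = sum_{i=0}^{n-1} gamma^i r_{t+i}
                 + gamma^n max_{a'} Q_{t+n}(s', a').
    `rewards` holds r_t ... r_{t+n-1}; `q_bootstrap` stands in for
    the bootstrap term max_{a'} Q_{t+n}(s', a')."""
    n = len(rewards)
    partial = sum(gamma**i * r for i, r in enumerate(rewards))
    return partial + gamma**n * q_bootstrap

# Example: a 3-step return where a large reward arrives 3 steps away,
# reaching the update in one shot instead of over multiple trials.
print(n_step_return([-1, -1, 100], gamma=0.9, q_bootstrap=0.0))
```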

Q-learning speed-ups

One-step vs. n-step backup (illustrated in the lecture on two copies of the maze with goal state G). Problems with n-step backups: larger variance, exploration/exploitation switching, and we must wait n steps to update.

Temporal difference (TD) methods remedy the wait-n-steps problem: a partial back-up is performed after every simulation step. A similar idea: weather forecast adjustment. Different versions of this idea have been implemented.

RL successes

Reinforcement learning is relatively simple, and online techniques can track non-stationary environments and adapt to their changes. Successful applications:
- DeepMind's AlphaGo (AlphaZero)
- TD-Gammon, which learned to play backgammon at championship level
- Elevator control
- Dynamic channel allocation in mobile telephony
- Robot navigation in the environment
