Efficient Planning. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction
- Heather Hoover
- 5 years ago
Transcription
1 Efficient Planning
2 Tuesday class summary: Planning: any computational process that uses a model to create or improve a policy. Dyna framework: (diagram)
3 Questions during class: Why use simulated experience? Can't you directly compute a solution based on the model? Wouldn't it be better to plan backwards from the goal?
4 How to Achieve Efficient Planning? What type of backup is better? Sample vs. full backups. Incremental vs. less incremental backups. How to order the backups?
5 What is Efficient Planning? Planning algorithm A is more efficient than planning algorithm B if: it can compute the optimal policy (or value function) in less time; or, given the same amount of computation time, it improves the policy (or value function) more.
6 What backup type is best?
7 Full vs. Sample Backups
Value estimated | Full backups (DP) | Sample backups (one-step TD)
$v_\pi(s)$ | policy evaluation | TD(0)
$v_*(s)$ | value iteration | (none)
$q_\pi(a, s)$ | Q-policy evaluation | Sarsa
$q_*(a, s)$ | Q-value iteration | Q-learning
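The bottom row of the table (Q-value iteration versus Q-learning) can be sketched in a few lines. This is only an illustration: the model `P`, the discount `GAMMA`, and the state and action names are hypothetical and do not come from the slides.

```python
import random

GAMMA = 0.9  # assumed discount factor
ACTIONS = ["left", "right"]
# Hypothetical model: P[(s, a)] is a list of (probability, reward, next_state).
P = {
    ("s", "left"): [(0.5, 1.0, "x"), (0.5, 0.0, "y")],
    ("s", "right"): [(1.0, 0.2, "y")],
}


def max_q(Q, state):
    """Greedy value of a state; unseen pairs default to 0."""
    return max(Q.get((state, a), 0.0) for a in ACTIONS)


def full_backup(Q, s, a):
    """Q-value iteration: expectation over every successor in the model."""
    Q[(s, a)] = sum(p * (r + GAMMA * max_q(Q, s2)) for p, r, s2 in P[(s, a)])


def sample_backup(Q, s, a, alpha, rng):
    """Q-learning: one successor sampled from the model, step size alpha."""
    outcomes = P[(s, a)]
    weights = [p for p, _, _ in outcomes]
    _, r, s2 = rng.choices(outcomes, weights=weights, k=1)[0]
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + GAMMA * max_q(Q, s2) - old)


# Usage: one full backup versus many sample backups with alpha = 1/t.
Q_full, Q_sample = {}, {}
full_backup(Q_full, "s", "left")
rng = random.Random(0)
for t in range(1, 10001):
    sample_backup(Q_sample, "s", "left", 1.0 / t, rng)
```

The full backup costs one pass over all successors and is exact given the model; each sample backup touches a single successor but carries sampling error that only averages out over many backups.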
8 Full vs. Sample Backups [Figure: RMS error in value estimate versus number of max Q(s', a') computations (from 0 to 2b), comparing full backups with sample backups for several branching factors b.] b successor states, equally likely; initial error = 1; assume all next states' values are correct.
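As a worked check on this comparison: in the book's analysis of the same setup (b equally likely successors, initial error 1, correct successor values), a full backup removes the error after b computations, while sample backups reduce it gradually. Assuming sample-average step sizes ($\alpha = 1/t$, which the slide does not state), the error after t sample backups is

```latex
\[
  \text{RMS error}(t) \;=\; \sqrt{\frac{b-1}{b\,t}},
  \qquad\text{e.g. } b = 100,\ t = 10:\quad
  \sqrt{\frac{99}{1000}} \approx 0.31 .
\]
```

So with b = 100, spending a tenth of the cost of one full backup already removes roughly two thirds of the initial error, which is why sample backups look attractive when the branching factor is large.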
9 Small Backups: Small backups are single-successor backups based on the model. Small backups have the same computational complexity as sample backups. Small backups have no sampling error. Small backups require storage for old values.
10 Small Backups: Main Idea. What can we do if we know that only a single successor has changed value since the last backup? Consider an estimate A that is constructed from a sum of other estimates $X_i$. The estimate A can be computed using a full backup: $A \leftarrow \sum_i X_i$. If the estimates $X_i$ are updated, A can be recomputed by redoing the above backup. Alternatively, if we know that only $X_j$ received a significant value change, we might want to update A for only $X_j$. Let us indicate the old value of $X_j$, used to construct the current value of A, as $x_j$. A can then be updated by subtracting this old value and adding the new value: $A \leftarrow A - x_j + X_j$. For a weighted sum $A \leftarrow \sum_i w_i X_i$, the small backup for a single successor is $A \leftarrow A + w_j (X_j - x_j)$.
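The weighted small backup can be checked numerically against a full recomputation. The sketch below uses made-up weights and component values (they are not from the slides) and keeps the old values in a separate list, which is the extra storage a small backup needs.

```python
# Hypothetical weights w_i and component estimates X_i.
w = [0.2, 0.5, 0.3]
X = [1.0, 4.0, -2.0]

# Full backup: touches every component.
A = sum(wi * Xi for wi, Xi in zip(w, X))
x_old = list(X)  # values of X_i that the current A was built from


def small_backup(A, j):
    """Fold the change in component j into A at O(1) cost."""
    A = A + w[j] * (X[j] - x_old[j])
    x_old[j] = X[j]  # remember what A now reflects
    return A


# Only X_1 changes, so only X_1 is backed up.
X[1] = 6.0
A = small_backup(A, 1)
A_recomputed = sum(wi * Xi for wi, Xi in zip(w, X))
```

After the small backup, `A` agrees with `A_recomputed` up to floating-point rounding, without iterating over the unchanged components.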
11 Small vs. Sample Backups. Small backup: single-successor backup with a cost that is a fraction of the cost of a full backup. Advantage of small backups over sample backups: no step size is required. [Figure: normalized RMS error versus step size / step size decay on 2 evaluation tasks with random transitions (one with r_left = +1, r_right = -1; one with r_left = +1, r_right = +1), comparing sample backup: TD(0), constant step size; sample backup: TD(0), decaying step size; and small backup.] Take-Home Message: smaller backups, more planning.
12 Small vs. Sample Backups [Figure: three-state example (A, B, C) showing the transition probability and the state values of state A and state B.]
13 Backup Ordering
14 Backup Ordering: Asynchronous Value Iteration. Do forever: 1) Select a state $s \in S$ according to some selection strategy H. 2) Apply a full backup to s: $V(s) \leftarrow \max_a \big[ \hat{r}(s,a) + \gamma \sum_{s'} \hat{p}(s' \mid s,a)\, V(s') \big]$. For every selection strategy H that selects each state infinitely often, the values V converge to the optimal value function $V^*$. The rate of convergence depends strongly on the selection strategy H.
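A minimal sketch of asynchronous value iteration makes the dependence on the selection strategy concrete. The three-state chain, the discount, and the two orderings below are assumptions chosen for illustration only.

```python
GAMMA = 0.9  # assumed discount factor
# Hypothetical model: model[s][a] = (reward, {next_state: probability}).
model = {
    0: {"stay": (0.0, {0: 1.0}), "go": (0.0, {1: 1.0})},
    1: {"stay": (0.0, {1: 1.0}), "go": (1.0, {2: 1.0})},
    2: {"stay": (0.0, {2: 1.0})},  # absorbing state
}


def full_backup(V, s):
    """V(s) <- max_a [ r(s,a) + gamma * sum_s' p(s'|s,a) V(s') ]."""
    V[s] = max(
        r + GAMMA * sum(p * V[s2] for s2, p in nxt.items())
        for r, nxt in model[s].values()
    )


def async_value_iteration(select, n_backups):
    """Apply n_backups full backups; select(k) is the strategy H."""
    V = {s: 0.0 for s in model}
    for k in range(n_backups):
        full_backup(V, select(k))
    return V


# Two strategies that both visit every state once in three backups.
forward = async_value_iteration(lambda k: k % 3, 3)       # order 0, 1, 2
backward = async_value_iteration(lambda k: 2 - k % 3, 3)  # order 2, 1, 0
```

With the same budget of three backups, the backward ordering already reaches the optimal values on this chain, while the forward ordering still has V(0) at its initial value because state 0 was backed up before its successor had changed.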
15 The Trade-Off: For any effective ordering strategy, the cost that is saved by having to perform fewer backups should outweigh the cost of maintaining the ordering: cost to maintain ordering < cost savings due to fewer backups.
16 Prioritized Sweeping: Which states or state-action pairs should be generated during planning? Work backwards from states whose values have just changed: Maintain a queue of state-action pairs whose values would change a lot if backed up, prioritized by the size of the change. When a new backup occurs, insert predecessors according to their priorities. Always perform backups from first in queue. Moore & Atkeson 1993; Peng & Williams 1993; improved by McMahan & Gordon 2005; Van Seijen
17 Moore and Atkeson's Prioritized Sweeping. Published in
18 Prioritized Sweeping vs. Dyna-Q: Both use n=5 backups per environmental interaction.
19 Bellman Error Ordering: Bellman error is a measure for the difference between the current value and the value after a full backup: $BE(s) = V(s) - \max_a \big[ \hat{r}(s,a) + \gamma \sum_{s'} \hat{p}(s' \mid s,a)\, V(s') \big]$
20 Bellman Error Ordering
initialize V(s) arbitrarily for all s
compute BE(s) for all s
loop {until convergence}
    select state s' with worst Bellman error
    perform full backup of s'
    BE(s') ← 0
    for all predecessor states s̄ of s' do
        recompute BE(s̄)
    end for
end loop
To get a positive trade-off: comp. time Bellman error << comp. time full backup
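The loop above can be sketched as follows. Several things here are assumptions rather than part of the slide: the four-state chain, the discount, the use of the absolute Bellman error as the "worst" criterion, and the linear scan with `max` where a real implementation would use a priority queue.

```python
GAMMA = 0.9  # assumed discount factor
# Hypothetical model: model[s][a] = (reward, {next_state: probability}).
model = {
    0: {"go": (0.0, {1: 1.0})},
    1: {"go": (0.0, {2: 1.0})},
    2: {"go": (1.0, {3: 1.0})},
    3: {"stay": (0.0, {3: 1.0})},
}

# Predecessor sets derived from the model.
preds = {s: set() for s in model}
for s, acts in model.items():
    for _, nxt in acts.values():
        for s2 in nxt:
            preds[s2].add(s)


def backed_up_value(V, s):
    """Value of s after one full backup (V itself is not modified)."""
    return max(
        r + GAMMA * sum(p * V[s2] for s2, p in nxt.items())
        for r, nxt in model[s].values()
    )


def bellman_error_ordering(tol=1e-9):
    V = {s: 0.0 for s in model}
    BE = {s: abs(V[s] - backed_up_value(V, s)) for s in model}
    n_backups = 0
    while max(BE.values()) > tol:
        s = max(BE, key=BE.get)        # state with the worst Bellman error
        V[s] = backed_up_value(V, s)   # full backup
        BE[s] = 0.0
        n_backups += 1
        for sp in preds[s]:            # only predecessors can have changed
            BE[sp] = abs(V[sp] - backed_up_value(V, sp))
    return V, n_backups


V, n_backups = bellman_error_ordering()
```

On this chain the ordering works backwards from the rewarding transition and converges in three backups. Note that recomputing a Bellman error here costs as much as a full backup, so the sketch does not by itself satisfy the trade-off condition stated on the slide.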
21 Prioritized Sweeping with Small Backups
initialize V(s) arbitrarily for all s
initialize U(s) = V(s) for all s
initialize Q(s, a) = V(s) for all s, a
initialize N_sa, N_sa^s' to 0 for all s, a, s'
loop {over episodes}
    initialize s
    repeat {for each step in the episode}
        select action a, based on Q(s, ·)
        take action a, observe r and s'
        N_sa ← N_sa + 1; N_sa^s' ← N_sa^s' + 1
        Q(s, a) ← [Q(s, a)(N_sa − 1) + r + γ V(s')] / N_sa
        V(s) ← max_b Q(s, b)
        p ← |V(s) − U(s)|
        if s is on queue, set its priority to p; otherwise, add it with priority p
        for a number of update cycles do
            remove top state s̄' from queue
            ΔU ← V(s̄') − U(s̄')
            U(s̄') ← V(s̄')
            for all (s̄, ā) pairs with N_s̄ā^s̄' > 0 do
                Q(s̄, ā) ← Q(s̄, ā) + γ (N_s̄ā^s̄' / N_s̄ā) ΔU
                V(s̄) ← max_b Q(s̄, b)
                p ← |V(s̄) − U(s̄)|
                if s̄ is on queue, set its priority to p; otherwise, add it with priority p
            end for
        end for
        s ← s'
    until s is terminal
end loop
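A compact sketch of the per-step update in this pseudocode is given below. It is a reading of the slide, not a reference implementation: the class name `SmallBackupPS`, the discount value, the default of five update cycles, zero initial values, and the dictionary-with-`max` stand-in for the priority queue are all assumptions made for brevity.

```python
from collections import defaultdict

GAMMA = 0.9  # assumed discount factor


class SmallBackupPS:
    """Hypothetical helper holding the tables named in the pseudocode."""

    def __init__(self, actions):
        self.actions = actions
        self.V = defaultdict(float)
        self.U = defaultdict(float)       # value of s last used by predecessors
        self.Q = defaultdict(float)
        self.N_sa = defaultdict(int)
        self.N_sas = defaultdict(int)     # keyed by (s, a, s')
        self.preds = defaultdict(set)     # s' -> {(s, a)} with N_sas > 0
        self.priority = {}                # stand-in queue: state -> priority

    def _refresh(self, s):
        """Recompute V(s) and (re)insert s with priority |V(s) - U(s)|."""
        self.V[s] = max(self.Q[(s, b)] for b in self.actions)
        self.priority[s] = abs(self.V[s] - self.U[s])

    def observe(self, s, a, r, s2, cycles=5):
        # Model and Q update from the real transition (s, a, r, s2).
        self.N_sa[(s, a)] += 1
        self.N_sas[(s, a, s2)] += 1
        self.preds[s2].add((s, a))
        n = self.N_sa[(s, a)]
        self.Q[(s, a)] = (self.Q[(s, a)] * (n - 1) + r + GAMMA * self.V[s2]) / n
        self._refresh(s)
        # Planning: small backups from the highest-priority states.
        for _ in range(cycles):
            if not self.priority:
                break
            top = max(self.priority, key=self.priority.get)
            del self.priority[top]
            delta_u = self.V[top] - self.U[top]
            self.U[top] = self.V[top]
            for sb, ab in self.preds[top]:
                frac = self.N_sas[(sb, ab, top)] / self.N_sa[(sb, ab)]
                self.Q[(sb, ab)] += GAMMA * frac * delta_u  # small backup
                self._refresh(sb)


# Usage on a hypothetical chain 0 -> 1 -> 2 with reward 1 on the last step.
agent = SmallBackupPS(actions=["go"])
agent.observe(0, "go", 0.0, 1)
agent.observe(1, "go", 1.0, 2)
```

After the second observation, the change in V(1) is pushed back to state 0 by a single small backup, so both states hold their correct values after one pass through the chain.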
22 Empirical Comparison: Which ordering results in the best planning efficiency? Prioritized Sweeping (PS) with small backups outperform ... [Figure: RMS error (averaged over the first observations) versus computation time per observation [s], with the initial error marked, comparing PS, Moore & Atkeson; PS, Wiering & Schmidhuber; PS, Peng & Williams; PS, small backups; and value iteration.]
23 Trajectory Sampling: Trajectory sampling: perform backups along simulated trajectories. This samples from the on-policy distribution. Advantages when function approximation is used (Chapter 8). Focusing of computation: can cause vast uninteresting parts of the state space to be (usefully) ignored. [Figure: initial states; states reachable under optimal control; irrelevant states.]
24 Trajectory Sampling Experiment: one-step full tabular backups. uniform: cycled through all state-action pairs. on-policy: backed up along simulated trajectories. 200 randomly generated undiscounted episodic tasks. 2 actions for each state, each with b equally likely next states. 0.1 prob of transition to terminal state. expected reward on each transition selected from mean 0 variance 1 Gaussian.
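The experimental setup can be sketched as a task generator plus the two ways of choosing which pair to back up. Details the slide leaves open are filled with labeled assumptions: the terminal transition is given reward and value zero, the on-policy trajectories are epsilon-greedy with epsilon = 0.1, and every trajectory starts in state 0.

```python
import random

P_TERMINATE = 0.1  # probability of a transition to the terminal state


def make_task(n_states, b, seed=0):
    """Random task: task[(s, a)] is a list of b (next_state, reward) pairs."""
    rng = random.Random(seed)
    task = {}
    for s in range(n_states):
        for a in (0, 1):
            task[(s, a)] = [
                (rng.randrange(n_states), rng.gauss(0.0, 1.0)) for _ in range(b)
            ]
    return task


def full_backup(Q, task, s, a):
    """One-step full tabular backup, undiscounted.

    Assumption: terminating contributes zero reward and zero value.
    """
    total = sum(r + max(Q[(s2, 0)], Q[(s2, 1)]) for s2, r in task[(s, a)])
    Q[(s, a)] = (1 - P_TERMINATE) * total / len(task[(s, a)])


def uniform_pairs(n_states):
    """Uniform case: cycle through all state-action pairs forever."""
    while True:
        for s in range(n_states):
            for a in (0, 1):
                yield s, a


def on_policy_pairs(Q, task, rng, epsilon=0.1, start=0):
    """On-policy case: pairs met along simulated epsilon-greedy trajectories."""
    s = start
    while True:
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = 0 if Q[(s, 0)] >= Q[(s, 1)] else 1
        yield s, a
        if rng.random() < P_TERMINATE:
            s = start                        # episode ended, restart
        else:
            s = rng.choice(task[(s, a)])[0]  # one of b equally likely successors


# Usage: spend the same number of backups under each distribution.
task = make_task(n_states=10, b=3, seed=1)
Q_uniform = {k: 0.0 for k in task}
Q_on_policy = {k: 0.0 for k in task}
pairs_u = uniform_pairs(10)
pairs_p = on_policy_pairs(Q_on_policy, task, random.Random(2))
for _ in range(200):
    full_backup(Q_uniform, task, *next(pairs_u))
    full_backup(Q_on_policy, task, *next(pairs_p))
```

The on-policy generator reads `Q_on_policy` each time it is advanced, so the trajectories follow the current greedy policy as the values change.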
25 Heuristic Search: Used for action selection, not for changing a value function (= heuristic evaluation function). Backed-up values are computed, but typically discarded. Extension of the idea of a greedy policy, only deeper. Also suggests ways to select states to backup: smart focusing.
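The "greedy policy, only deeper" idea can be sketched as a depth-limited lookahead that backs values up the search tree, picks an action, and then throws the backed-up values away. The model, the zero heuristic `V`, and the function names are hypothetical.

```python
# Hypothetical model: model[s][a] = (reward, {next_state: probability}).
# State "T" is terminal and therefore absent from the model.
model = {
    0: {"a": (0.5, {1: 1.0}), "b": (0.0, {2: 1.0})},
    1: {"x": (0.0, {"T": 1.0})},
    2: {"x": (2.0, {"T": 1.0})},
}
V = {}  # heuristic evaluation function; missing states evaluate to 0


def backed_up(model, V, s, a, depth, gamma):
    """Value of taking a in s, searching depth steps ahead."""
    r, nxt = model[s][a]
    return r + gamma * sum(
        p * state_value(model, V, s2, depth - 1, gamma) for s2, p in nxt.items()
    )


def state_value(model, V, s, depth, gamma):
    """Heuristic value at the leaves, max over backed-up values above them."""
    if depth == 0 or s not in model:
        return V.get(s, 0.0)
    return max(backed_up(model, V, s, a, depth, gamma) for a in model[s])


def select_action(model, V, s, depth, gamma=0.9):
    """Greedy action with respect to a depth-limited search; V is unchanged."""
    return max(model[s], key=lambda a: backed_up(model, V, s, a, depth, gamma))


shallow = select_action(model, V, 0, depth=1)  # ordinary greedy policy
deep = select_action(model, V, 0, depth=2)     # sees the delayed reward
```

With depth 1 the search is the usual greedy choice and prefers the immediate reward of action "a"; with depth 2 it finds the larger delayed reward behind action "b". In both cases `V` is left untouched.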
26 Summary: Efficient planning is about trying to spend the available computation time in the most effective way. Backup types: full/sample/small. Backup ordering: gain/loss trade-off; prioritized sweeping; prioritized sweeping with small backups: Bellman error ordering; trajectory sampling: backup along trajectories; heuristic search.
Plnning to Be Surprised: Optiml Byesin Explortion in Dynmic Environments Yi Sun, Fustino Gomez, nd Jürgen Schmidhuber IDSIA, Glleri 2, Mnno, CH-6928, Switzerlnd {yi,tino,juergen}@idsi.ch Abstrct. To mximize
More informationAdding and Subtracting Rational Expressions
6.4 Adding nd Subtrcting Rtionl Epressions Essentil Question How cn you determine the domin of the sum or difference of two rtionl epressions? You cn dd nd subtrct rtionl epressions in much the sme wy
More informationImproper Integrals. Type I Improper Integrals How do we evaluate an integral such as
Improper Integrls Two different types of integrls cn qulify s improper. The first type of improper integrl (which we will refer to s Type I) involves evluting n integrl over n infinite region. In the grph
More informationChapter 6 Notes, Larson/Hostetler 3e
Contents 6. Antiderivtives nd the Rules of Integrtion.......................... 6. Are nd the Definite Integrl.................................. 6.. Are............................................ 6. Reimnn
More informationFundamentals of Analytical Chemistry
Homework Fundmentls of nlyticl hemistry hpter 9 0, 1, 5, 7, 9 cids, Bses, nd hpter 9(b) Definitions cid Releses H ions in wter (rrhenius) Proton donor (Bronsted( Lowry) Electron-pir cceptor (Lewis) hrcteristic
More informationLearning to Serve and Bounce a Ball
Sndr Amend Gregor Gebhrdt Technische Universität Drmstdt Abstrct In this pper we investigte lerning the tsks of bll serving nd bll bouncing. These tsks disply chrcteristics which re common in vriety of
More informationToday. Recap: Reasoning Over Time. Demo Bonanza! CS 188: Artificial Intelligence. Advanced HMMs. Speech recognition. HMMs. Start machine learning
CS 188: Artificil Intelligence Advnced HMMs Dn Klein, Pieter Aeel University of Cliforni, Berkeley Demo Bonnz! Tody HMMs Demo onnz! Most likely explntion queries Speech recognition A mssive HMM! Detils
More informationNon-Linear & Logistic Regression
Non-Liner & Logistic Regression If the sttistics re boring, then you've got the wrong numbers. Edwrd R. Tufte (Sttistics Professor, Yle University) Regression Anlyses When do we use these? PART 1: find
More information7.1 Integral as Net Change and 7.2 Areas in the Plane Calculus
7.1 Integrl s Net Chnge nd 7. Ares in the Plne Clculus 7.1 INTEGRAL AS NET CHANGE Notecrds from 7.1: Displcement vs Totl Distnce, Integrl s Net Chnge We hve lredy seen how the position of n oject cn e
More informationAQA Further Pure 1. Complex Numbers. Section 1: Introduction to Complex Numbers. The number system
Complex Numbers Section 1: Introduction to Complex Numbers Notes nd Exmples These notes contin subsections on The number system Adding nd subtrcting complex numbers Multiplying complex numbers Complex
More informationKNOWLEDGE-BASED AGENTS INFERENCE
AGENTS THAT REASON LOGICALLY KNOWLEDGE-BASED AGENTS Two components: knowledge bse, nd n inference engine. Declrtive pproch to building n gent. We tell it wht it needs to know, nd It cn sk itself wht to
More informationChapter 5 Plan-Space Planning
Lecture slides for Automted Plnning: Theory nd Prctice Chpter 5 Pln-Spce Plnning Dn S. Nu CMSC 722, AI Plnning University of Mrylnd, Spring 2008 1 Stte-Spce Plnning Motivtion g 1 1 g 4 4 s 0 g 5 5 g 2
More informationMath 8 Winter 2015 Applications of Integration
Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl
More informationChapters 4 & 5 Integrals & Applications
Contents Chpters 4 & 5 Integrls & Applictions Motivtion to Chpters 4 & 5 2 Chpter 4 3 Ares nd Distnces 3. VIDEO - Ares Under Functions............................................ 3.2 VIDEO - Applictions
More informationProblem Set 3 Solutions
Chemistry 36 Dr Jen M Stndrd Problem Set 3 Solutions 1 Verify for the prticle in one-dimensionl box by explicit integrtion tht the wvefunction ψ ( x) π x is normlized To verify tht ψ ( x) is normlized,
More informationChapter 7 Notes, Stewart 8e. 7.1 Integration by Parts Trigonometric Integrals Evaluating sin m x cos n (x) dx...
Contents 7.1 Integrtion by Prts................................... 2 7.2 Trigonometric Integrls.................................. 8 7.2.1 Evluting sin m x cos n (x)......................... 8 7.2.2 Evluting
More information