Efficient Planning. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction


1 Efficient Planning

2 Tuesday class summary. Planning: any computational process that uses a model to create or improve a policy. Dyna framework.

3 Questions during class. Why use simulated experience? Can't you directly compute a solution based on the model? Wouldn't it be better to plan backwards from the goal?

4 How to Achieve Efficient Planning? What type of backup is better? Sample vs. full backups. Incremental vs. less incremental backups. How to order the backups?

5 What is Efficient Planning? Planning algorithm A is more efficient than planning algorithm B if: it can compute the optimal policy (or value function) in less time; or, given the same amount of computation time, it improves the policy (or value function) more.

6 What backup type is best?

7 Full vs. Sample Backups
Value estimated $v_\pi(s)$: full backup (DP) is policy evaluation; sample backup (one-step TD) is TD(0).
Value estimated $v_*(s)$: full backup (DP) is value iteration; no sample backup is shown.
Value estimated $q_\pi(s,a)$: full backup (DP) is Q-policy evaluation; sample backup is Sarsa.
Value estimated $q_*(s,a)$: full backup (DP) is Q-value iteration; sample backup is Q-learning.
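The difference between the two backup types for $v_\pi$ can be sketched in a few lines of Python. This is an illustration only: the two-successor model, the state names, and the values below are hypothetical and do not come from the slides.

```python
import random

# Hypothetical toy model (not from the slides): from state "s" the policy's
# action leads to "x" or "y" with equal probability.
# model[s] is a list of (probability, reward, next_state) triples.
model = {"s": [(0.5, 1.0, "x"), (0.5, 0.0, "y")]}
V = {"s": 0.0, "x": 2.0, "y": 4.0}
gamma = 1.0

def full_backup(V, s):
    """Expected update: weighs every successor by its probability."""
    return sum(p * (r + gamma * V[s2]) for p, r, s2 in model[s])

def sample_backup(V, s, alpha, rng):
    """TD(0)-style update from a single successor sampled from the model."""
    probs = [p for p, _, _ in model[s]]
    _, r, s2 = rng.choices(model[s], weights=probs, k=1)[0]
    return V[s] + alpha * (r + gamma * V[s2] - V[s])

target = full_backup(V, "s")  # 0.5 * (1 + 2) + 0.5 * (0 + 4)

rng = random.Random(0)
estimate = dict(V)
for t in range(1, 2001):
    # Step size 1/t turns the repeated sample backups into a sample average.
    estimate["s"] = sample_backup(estimate, "s", 1.0 / t, rng)
```

The full backup reaches its target in one sweep over the successors, while the sample backup only approaches it as samples accumulate; each sample backup, however, touches a single successor.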

8 Full vs. Sample Backups
[Figure: RMS error in value estimate (starting at 1) versus number of $\max_{a'} Q(s', a')$ computations (0 to 2b), comparing full backups with sample backups for several branching factors b.]
Assumptions: b successor states, equally likely; initial error = 1; all next states' values are assumed correct.
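For this idealized setting, Sutton and Barto's text gives the error of sample backups with step size 1/t as $\sqrt{(b-1)/(bt)}$ after t backups, while a full backup leaves the error at 1 until all b computations are done. A minimal sketch of the two curves, with function names chosen here for illustration:

```python
import math

def sample_backup_error(b, t):
    """RMS error after t sample backups with step size 1/t (initial error 1)."""
    return 1.0 if t == 0 else math.sqrt((b - 1) / (b * t))

def full_backup_error(b, t):
    """A full backup needs b computations; the error stays at 1 until then."""
    return 1.0 if t < b else 0.0

# With a large branching factor, a small fraction of the full-backup cost
# already removes most of the error.
b = 100
early_sample_error = sample_backup_error(b, 10)
early_full_error = full_backup_error(b, 10)
```

Under these assumptions, sample backups win whenever there is not enough time to complete a full backup, which is the common case for large b.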

9 Small Backups. Small backups are single-successor backups based on the model. Small backups have the same computational complexity as sample backups. Small backups have no sampling error. Small backups require storage for old values.

10 Main Idea behind Small Backups
Consider an estimate $A$ that is constructed from a sum of other estimates $X_i$. The estimate $A$ can be computed using a full backup: $A \leftarrow \sum_i X_i$. If the estimates $X_i$ are updated, $A$ can be recomputed by redoing the above backup.
What can we do if we know that only a single successor, $X_j$, changed value since the last backup? If only $X_j$ received a significant value change, we might want to update $A$ for only $X_j$. Let us indicate the old value of $X_j$, used to construct the current value of $A$, as $x_j$. $A$ can then be updated by subtracting this old value and adding the new value: $A \leftarrow A - x_j + X_j$.
For a weighted sum, $A \leftarrow \sum_i w_i X_i$, the small backup adds the weighted difference between the new and the old value: $A \leftarrow A + w_j (X_j - x_j)$.
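The small-backup update can be checked numerically against a full recomputation. The weights and values below are made up for illustration; only the update rule itself comes from the slide.

```python
# Hypothetical weights and component estimates (illustrative values only).
weights = [0.2, 0.5, 0.3]
X = [1.0, 4.0, 2.0]
x_old = list(X)  # stored values of X that the current A was built from

def full_backup(weights, X):
    """Recompute A from every component: cost grows with the number of terms."""
    return sum(w * xi for w, xi in zip(weights, X))

def small_backup(A, w_j, X_j, x_j):
    """Correct A for a change in one component only: constant cost."""
    return A + w_j * (X_j - x_j)

A = full_backup(weights, X)

# Only X[1] changes; patch A using the stored old value, then refresh the store.
X[1] = 6.0
A = small_backup(A, weights[1], X[1], x_old[1])
x_old[1] = X[1]

recomputed = full_backup(weights, X)
```

The price of the constant-cost update is the extra storage for `x_old`, which matches the slide's remark that small backups require storage for old values.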

11 Small vs. Sample Backups
Small backup: a single-successor backup with a cost that is a fraction of the cost of a full backup.
Advantage of Small Backups over Sample Backups: No Step-Size is Required.
[Figure: normalized RMS error versus step size / step size decay on 2 evaluation tasks with random transitions (one with r_left = +1, r_right = -1; one with r_left = +1, r_right = +1), comparing sample backup TD(0) with constant step size, sample backup TD(0) with decaying step size, and small backup.]
Take-Home Message: smaller backups, more planning [...]

12 Small vs. Sample Backups
[Figure: example with states A, B, and C; panels show the transition probability and the state values of state A and state B.]

13 Backup Ordering

14 Backup Ordering: Asynchronous Value Iteration
Do forever:
1) Select state $s \in S$ according to some selection strategy $H$.
2) Apply a full backup to $s$: $V(s) \leftarrow \max_a \left[ \hat{r}(s,a) + \gamma \sum_{s'} \hat{p}(s' \mid s,a) V(s') \right]$, where $\gamma$ is the discount factor.
For every selection strategy $H$ that selects each state infinitely often, the values $V$ converge to the optimal value function $V^*$.
The rate of convergence depends strongly on the selection strategy $H$.
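A minimal sketch of asynchronous value iteration follows. The three-state chain, the discount factor of 0.9, and the uniformly random selection strategy are assumptions made for the example, not part of the slides.

```python
import random

gamma = 0.9  # assumed discount factor

# Hypothetical model: model[s][a] is a list of (probability, reward, next_state).
model = {
    0: {"stay": [(1.0, 0.0, 0)], "go": [(1.0, 0.0, 1)]},
    1: {"stay": [(1.0, 0.0, 1)], "go": [(1.0, 1.0, 2)]},
    2: {},  # terminal state: no actions, value 0
}

def full_backup(V, s):
    """max over actions of the expected one-step return under the model."""
    if not model[s]:
        return 0.0
    return max(
        sum(p * (r + gamma * V[s2]) for p, r, s2 in outcomes)
        for outcomes in model[s].values()
    )

def async_value_iteration(select, n_backups):
    """Back up one state at a time, in the order given by `select`."""
    V = {s: 0.0 for s in model}
    for _ in range(n_backups):
        s = select()
        V[s] = full_backup(V, s)
    return V

# Selection strategy H: uniformly random, so every state keeps being selected.
rng = random.Random(0)
states = list(model)
V = async_value_iteration(lambda: rng.choice(states), 200)
```

On this chain the best order would back up state 1 before state 0; a random strategy reaches the same values but wastes backups, which is the point the slide makes about convergence rate.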

15 The Trade-Off. For any effective ordering strategy, the cost that is saved by having to perform fewer backups should outweigh the cost of maintaining the ordering: cost to maintain ordering < cost savings due to fewer backups.

16 Prioritized Sweeping. Which states or state-action pairs should be generated during planning? Work backwards from states whose values have just changed: maintain a queue of state-action pairs whose values would change a lot if backed up, prioritized by the size of the change. When a new backup occurs, insert predecessors according to their priorities. Always perform backups from first in queue. Moore & Atkeson 1993; Peng & Williams 1993; improved by McMahan & Gordon 2005; Van Seijen

17 Moore and Atkeson's Prioritized Sweeping. Published in

18 Prioritized Sweeping vs. Dyna-Q. Both use n=5 backups per environmental interaction.

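19 Bellman Error Ordering. Bellman error is a measure for the difference between the current value and the value after a full backup: $BE(s) = \left| V(s) - \max_a \left[ \hat{r}(s,a) + \gamma \sum_{s'} \hat{p}(s' \mid s,a) V(s') \right] \right|$, where $\gamma$ is the discount factor.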

20 Bellman Error Ordering
initialize $V(s)$ arbitrarily for all $s$
compute $BE(s)$ for all $s$
loop {until convergence}
    select state $s'$ with worst Bellman error
    perform full backup of $s'$
    $BE(s') \leftarrow 0$
    for all predecessor states $\bar{s}$ of $s'$ do
        recompute $BE(\bar{s})$
    end for
end loop
To get a positive trade-off: comp. time Bellman error << comp. time full backup
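The loop above can be sketched as follows. The chain model, the discount factor, and the convergence tolerance are assumptions for the example; a dictionary scan stands in for a proper priority queue.

```python
gamma = 0.9  # assumed discount factor

# Hypothetical model: model[s][a] is a list of (probability, reward, next_state).
model = {
    0: {"stay": [(1.0, 0.0, 0)], "go": [(1.0, 0.0, 1)]},
    1: {"stay": [(1.0, 0.0, 1)], "go": [(1.0, 1.0, 2)]},
    2: {},  # terminal state
}

def backup_value(V, s):
    """Value of s after a full backup."""
    if not model[s]:
        return 0.0
    return max(
        sum(p * (r + gamma * V[s2]) for p, r, s2 in outcomes)
        for outcomes in model[s].values()
    )

def bellman_error(V, s):
    return abs(V[s] - backup_value(V, s))

def predecessors(target):
    """States with at least one action that can lead to `target`."""
    return {
        s
        for s, actions in model.items()
        for outcomes in actions.values()
        for p, _, s2 in outcomes
        if s2 == target and p > 0
    }

def bellman_error_ordering(tol=1e-9, max_backups=10_000):
    V = {s: 0.0 for s in model}
    BE = {s: bellman_error(V, s) for s in model}
    n_backups = 0
    while max(BE.values()) > tol and n_backups < max_backups:
        s = max(BE, key=BE.get)          # state with the worst Bellman error
        V[s] = backup_value(V, s)
        BE[s] = 0.0
        for sp in predecessors(s):       # only these errors can have changed
            BE[sp] = bellman_error(V, sp)
        n_backups += 1
    return V, n_backups

V, n_backups = bellman_error_ordering()
```

Note that recomputing a Bellman error here costs as much as a full backup, so this sketch does not achieve the positive trade-off the slide asks for; it only shows the ordering.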

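21 Prioritized Sweeping with Small Backups
initialize $V(s)$ arbitrarily for all $s$
initialize $U(s) = V(s)$ for all $s$
initialize $Q(s,a) = V(s)$ for all $s, a$
initialize $N_{sa}$, $N_{sa}^{s'}$ to 0 for all $s, a, s'$
loop {over episodes}
    initialize $s$
    repeat {for each step in the episode}
        select action $a$, based on $Q(s, \cdot)$
        take action $a$, observe $r$ and $s'$
        $N_{sa} \leftarrow N_{sa} + 1$; $N_{sa}^{s'} \leftarrow N_{sa}^{s'} + 1$
        $Q(s,a) \leftarrow \left[ Q(s,a)(N_{sa} - 1) + r + \gamma V(s') \right] / N_{sa}$
        $V(s) \leftarrow \max_b Q(s,b)$
        $p \leftarrow |V(s) - U(s)|$
        if $s$ is on queue, set its priority to $p$; otherwise, add it with priority $p$
        for a number of update cycles do
            remove top state $\bar{s}'$ from queue
            $\Delta U \leftarrow V(\bar{s}') - U(\bar{s}')$
            $U(\bar{s}') \leftarrow V(\bar{s}')$
            for all $(\bar{s}, \bar{a})$ pairs with $N_{\bar{s}\bar{a}}^{\bar{s}'} > 0$ do
                $Q(\bar{s}, \bar{a}) \leftarrow Q(\bar{s}, \bar{a}) + \gamma \, N_{\bar{s}\bar{a}}^{\bar{s}'} / N_{\bar{s}\bar{a}} \cdot \Delta U$
                $V(\bar{s}) \leftarrow \max_b Q(\bar{s}, b)$
                $p \leftarrow |V(\bar{s}) - U(\bar{s})|$
                if $\bar{s}$ is on queue, set its priority to $p$; otherwise, add it with priority $p$
            end for
        end for
        $s \leftarrow s'$
    until $s$ is terminal
end loop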
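The per-step part of this algorithm might look as follows in Python. This is a sketch under assumptions: all values start at 0, the action set and the two hand-fed transitions are hypothetical, action selection and the episode loop are left out, and a plain dictionary scanned for its maximum stands in for a real priority queue.

```python
from collections import defaultdict

gamma = 0.9                 # assumed discount factor
ACTIONS = ("stay", "go")    # hypothetical action set

N_sa = defaultdict(int)     # visit counts N_{sa}
N_sas = defaultdict(int)    # transition counts N_{sa}^{s'}
pred = defaultdict(set)     # s' -> set of (s, a) with N_{sa}^{s'} > 0
Q = defaultdict(float)      # Q(s, a), initialised to 0
V = defaultdict(float)      # V(s), initialised to 0
U = defaultdict(float)      # value of s last propagated to its predecessors
queue = {}                  # state -> priority (stand-in for a heap)

def observe(s, a, r, s2, n_cycles=5):
    """Model update for one real transition, then a few planning cycles."""
    N_sa[s, a] += 1
    N_sas[s, a, s2] += 1
    pred[s2].add((s, a))
    Q[s, a] = (Q[s, a] * (N_sa[s, a] - 1) + r + gamma * V[s2]) / N_sa[s, a]
    V[s] = max(Q[s, b] for b in ACTIONS)
    queue[s] = abs(V[s] - U[s])
    for _ in range(n_cycles):
        if not queue:
            break
        top = max(queue, key=queue.get)   # highest-priority state
        del queue[top]
        delta_u = V[top] - U[top]
        U[top] = V[top]
        for sp, ap in pred[top]:
            # Small backup: adjust Q for the single changed successor `top`.
            Q[sp, ap] += gamma * N_sas[sp, ap, top] / N_sa[sp, ap] * delta_u
            V[sp] = max(Q[sp, b] for b in ACTIONS)
            queue[sp] = abs(V[sp] - U[sp])

# Two hand-fed transitions on a chain 0 -> 1 -> 2 (2 is terminal, value 0).
observe(0, "go", 0.0, 1)   # nothing to learn yet: all values are 0
observe(1, "go", 1.0, 2)   # reward found; planning propagates it back to 0
```

After the second call, the reward observed at state 1 has already reached state 0 through a single small backup, without any further interaction with the environment.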

22 Empirical Comparison
[...] results in the best planning efficiency? Prioritized Sweeping (PS) with small backups outperform [...]
[Figure: RMS error (avg. over initial observations) versus comp. time per observation [s], with the initial error marked, for PS (Moore & Atkeson), PS (Wiering & Schmidhuber), PS (Peng & Williams), PS with small backups, and value iteration.]

23 Trajectory Sampling. Trajectory sampling: perform backups along simulated trajectories. This samples from the on-policy distribution. Advantages when function approximation is used (Chapter 8). Focusing of computation: can cause vast uninteresting parts of the state space to be (usefully) ignored.
[Figure: regions of the state space labeled initial states, reachable under optimal control, and irrelevant states.]

24 Trajectory Sampling Experiment. One-step full tabular backups. Uniform: cycled through all state-action pairs. On-policy: backed up along simulated trajectories. 200 randomly generated undiscounted episodic tasks. 2 actions for each state, each with b equally likely next states. 0.1 prob of transition to terminal state. Expected reward on each transition selected from mean 0 variance 1 Gaussian.

25 Heuristic Search. Used for action selection, not for changing a value function (= heuristic evaluation function). Backed-up values are computed, but typically discarded. Extension of the idea of a greedy policy, only deeper. Also suggests ways to select states to backup: smart focusing.
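The idea of a greedy policy that looks deeper can be sketched as a depth-limited lookahead. The model and the deliberately misleading heuristic values below are hypothetical; the lookahead values are used once to pick an action and are not stored.

```python
gamma = 0.9  # assumed discount factor

# Hypothetical model: model[s][a] is a list of (probability, reward, next_state).
model = {
    0: {"left": [(1.0, 0.0, 1)], "right": [(1.0, 0.0, 2)]},
    1: {"left": [(1.0, 0.0, 1)], "right": [(1.0, 0.0, 1)]},
    2: {"left": [(1.0, 1.0, 3)], "right": [(1.0, 0.0, 2)]},
    3: {},  # terminal state
}

# Heuristic evaluation function; it overrates state 1 on purpose.
heuristic = {0: 0.0, 1: 0.5, 2: 0.0, 3: 0.0}

def lookahead_value(s, depth):
    """Backed-up value of s from a search tree of the given depth."""
    if depth == 0 or not model[s]:
        return heuristic[s]
    return max(q_value(s, a, depth) for a in model[s])

def q_value(s, a, depth):
    return sum(
        p * (r + gamma * lookahead_value(s2, depth - 1))
        for p, r, s2 in model[s][a]
    )

def select_action(s, depth):
    """Pick the action with the best backed-up value; nothing is stored."""
    return max(model[s], key=lambda a: q_value(s, a, depth))

greedy_choice = select_action(0, depth=1)   # trusts the heuristic directly
deeper_choice = select_action(0, depth=2)   # sees the reward behind state 2
```

With depth 1 the agent is misled by the heuristic, while depth 2 uncovers the reward one step further on, at the cost of a larger search tree.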

26 Summary. Efficient planning is about trying to spend the available computation time in the most effective way. Backup types: full/sample/small. Backup ordering: gain/loss trade-off; prioritized sweeping; prioritized sweeping with small backups; Bellman error ordering; trajectory sampling (backup along trajectories); heuristic search.


More information

Recitation 3: More Applications of the Derivative

Recitation 3: More Applications of the Derivative Mth 1c TA: Pdric Brtlett Recittion 3: More Applictions of the Derivtive Week 3 Cltech 2012 1 Rndom Question Question 1 A grph consists of the following: A set V of vertices. A set E of edges where ech

More information

Multi-Armed Bandits: Non-adaptive and Adaptive Sampling

Multi-Armed Bandits: Non-adaptive and Adaptive Sampling CSE 547/Stt 548: Mchine Lerning for Big Dt Lecture Multi-Armed Bndits: Non-dptive nd Adptive Smpling Instructor: Shm Kkde 1 The (stochstic) multi-rmed bndit problem The bsic prdigm is s follows: K Independent

More information

13: Diffusion in 2 Energy Groups

13: Diffusion in 2 Energy Groups 3: Diffusion in Energy Groups B. Rouben McMster University Course EP 4D3/6D3 Nucler Rector Anlysis (Rector Physics) 5 Sept.-Dec. 5 September Contents We study the diffusion eqution in two energy groups

More information

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral Improper Integrls Every time tht we hve evluted definite integrl such s f(x) dx, we hve mde two implicit ssumptions bout the integrl:. The intervl [, b] is finite, nd. f(x) is continuous on [, b]. If one

More information

Mathcad Lecture #1 In-class Worksheet Mathcad Basics

Mathcad Lecture #1 In-class Worksheet Mathcad Basics Mthcd Lecture #1 In-clss Worksheet Mthcd Bsics At the end of this lecture, you will be ble to: Evlute mthemticl epression numericlly Assign vrible nd use them in subsequent clcultions Distinguish between

More information

Riemann is the Mann! (But Lebesgue may besgue to differ.)

Riemann is the Mann! (But Lebesgue may besgue to differ.) Riemnn is the Mnn! (But Lebesgue my besgue to differ.) Leo Livshits My 2, 2008 1 For finite intervls in R We hve seen in clss tht every continuous function f : [, b] R hs the property tht for every ɛ >

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

A-Level Mathematics Transition Task (compulsory for all maths students and all further maths student)

A-Level Mathematics Transition Task (compulsory for all maths students and all further maths student) A-Level Mthemtics Trnsition Tsk (compulsory for ll mths students nd ll further mths student) Due: st Lesson of the yer. Length: - hours work (depending on prior knowledge) This trnsition tsk provides revision

More information

Chapter 9: Inferences based on Two samples: Confidence intervals and tests of hypotheses

Chapter 9: Inferences based on Two samples: Confidence intervals and tests of hypotheses Chpter 9: Inferences bsed on Two smples: Confidence intervls nd tests of hypotheses 9.1 The trget prmeter : difference between two popultion mens : difference between two popultion proportions : rtio of

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

Concepts of Concurrent Computation Spring 2015 Lecture 9: Petri Nets

Concepts of Concurrent Computation Spring 2015 Lecture 9: Petri Nets Concepts of Concurrent Computtion Spring 205 Lecture 9: Petri Nets Sebstin Nnz Chris Poskitt Chir of Softwre Engineering Petri nets Petri nets re mthemticl models for describing systems with concurrency

More information

Fingerprint idea. Assume:

Fingerprint idea. Assume: Fingerprint ide Assume: We cn compute fingerprint f(p) of P in O(m) time. If f(p) f(t[s.. s+m 1]), then P T[s.. s+m 1] We cn compre fingerprints in O(1) We cn compute f = f(t[s+1.. s+m]) from f(t[s.. s+m

More information

PHYS Summer Professor Caillault Homework Solutions. Chapter 2

PHYS Summer Professor Caillault Homework Solutions. Chapter 2 PHYS 1111 - Summer 2007 - Professor Cillult Homework Solutions Chpter 2 5. Picture the Problem: The runner moves long the ovl trck. Strtegy: The distnce is the totl length of trvel, nd the displcement

More information

Applying Q-Learning to Flappy Bird

Applying Q-Learning to Flappy Bird Applying Q-Lerning to Flppy Bird Moritz Ebeling-Rump, Mnfred Ko, Zchry Hervieux-Moore Abstrct The field of mchine lerning is n interesting nd reltively new re of reserch in rtificil intelligence. In this

More information

Numerical Analysis. 10th ed. R L Burden, J D Faires, and A M Burden

Numerical Analysis. 10th ed. R L Burden, J D Faires, and A M Burden Numericl Anlysis 10th ed R L Burden, J D Fires, nd A M Burden Bemer Presenttion Slides Prepred by Dr. Annette M. Burden Youngstown Stte University July 9, 2015 Chpter 4.1: Numericl Differentition 1 Three-Point

More information

Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments

Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments Plnning to Be Surprised: Optiml Byesin Explortion in Dynmic Environments Yi Sun, Fustino Gomez, nd Jürgen Schmidhuber IDSIA, Glleri 2, Mnno, CH-6928, Switzerlnd {yi,tino,juergen}@idsi.ch Abstrct. To mximize

More information

Adding and Subtracting Rational Expressions

Adding and Subtracting Rational Expressions 6.4 Adding nd Subtrcting Rtionl Epressions Essentil Question How cn you determine the domin of the sum or difference of two rtionl epressions? You cn dd nd subtrct rtionl epressions in much the sme wy

More information

Improper Integrals. Type I Improper Integrals How do we evaluate an integral such as

Improper Integrals. Type I Improper Integrals How do we evaluate an integral such as Improper Integrls Two different types of integrls cn qulify s improper. The first type of improper integrl (which we will refer to s Type I) involves evluting n integrl over n infinite region. In the grph

More information

Chapter 6 Notes, Larson/Hostetler 3e

Chapter 6 Notes, Larson/Hostetler 3e Contents 6. Antiderivtives nd the Rules of Integrtion.......................... 6. Are nd the Definite Integrl.................................. 6.. Are............................................ 6. Reimnn

More information

Fundamentals of Analytical Chemistry

Fundamentals of Analytical Chemistry Homework Fundmentls of nlyticl hemistry hpter 9 0, 1, 5, 7, 9 cids, Bses, nd hpter 9(b) Definitions cid Releses H ions in wter (rrhenius) Proton donor (Bronsted( Lowry) Electron-pir cceptor (Lewis) hrcteristic

More information

Learning to Serve and Bounce a Ball

Learning to Serve and Bounce a Ball Sndr Amend Gregor Gebhrdt Technische Universität Drmstdt Abstrct In this pper we investigte lerning the tsks of bll serving nd bll bouncing. These tsks disply chrcteristics which re common in vriety of

More information

Today. Recap: Reasoning Over Time. Demo Bonanza! CS 188: Artificial Intelligence. Advanced HMMs. Speech recognition. HMMs. Start machine learning

Today. Recap: Reasoning Over Time. Demo Bonanza! CS 188: Artificial Intelligence. Advanced HMMs. Speech recognition. HMMs. Start machine learning CS 188: Artificil Intelligence Advnced HMMs Dn Klein, Pieter Aeel University of Cliforni, Berkeley Demo Bonnz! Tody HMMs Demo onnz! Most likely explntion queries Speech recognition A mssive HMM! Detils

More information

Non-Linear & Logistic Regression

Non-Linear & Logistic Regression Non-Liner & Logistic Regression If the sttistics re boring, then you've got the wrong numbers. Edwrd R. Tufte (Sttistics Professor, Yle University) Regression Anlyses When do we use these? PART 1: find

More information

7.1 Integral as Net Change and 7.2 Areas in the Plane Calculus

7.1 Integral as Net Change and 7.2 Areas in the Plane Calculus 7.1 Integrl s Net Chnge nd 7. Ares in the Plne Clculus 7.1 INTEGRAL AS NET CHANGE Notecrds from 7.1: Displcement vs Totl Distnce, Integrl s Net Chnge We hve lredy seen how the position of n oject cn e

More information

AQA Further Pure 1. Complex Numbers. Section 1: Introduction to Complex Numbers. The number system

AQA Further Pure 1. Complex Numbers. Section 1: Introduction to Complex Numbers. The number system Complex Numbers Section 1: Introduction to Complex Numbers Notes nd Exmples These notes contin subsections on The number system Adding nd subtrcting complex numbers Multiplying complex numbers Complex

More information

KNOWLEDGE-BASED AGENTS INFERENCE

KNOWLEDGE-BASED AGENTS INFERENCE AGENTS THAT REASON LOGICALLY KNOWLEDGE-BASED AGENTS Two components: knowledge bse, nd n inference engine. Declrtive pproch to building n gent. We tell it wht it needs to know, nd It cn sk itself wht to

More information

Chapter 5 Plan-Space Planning

Chapter 5 Plan-Space Planning Lecture slides for Automted Plnning: Theory nd Prctice Chpter 5 Pln-Spce Plnning Dn S. Nu CMSC 722, AI Plnning University of Mrylnd, Spring 2008 1 Stte-Spce Plnning Motivtion g 1 1 g 4 4 s 0 g 5 5 g 2

More information

Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information

Chapters 4 & 5 Integrals & Applications

Chapters 4 & 5 Integrals & Applications Contents Chpters 4 & 5 Integrls & Applictions Motivtion to Chpters 4 & 5 2 Chpter 4 3 Ares nd Distnces 3. VIDEO - Ares Under Functions............................................ 3.2 VIDEO - Applictions

More information

Problem Set 3 Solutions

Problem Set 3 Solutions Chemistry 36 Dr Jen M Stndrd Problem Set 3 Solutions 1 Verify for the prticle in one-dimensionl box by explicit integrtion tht the wvefunction ψ ( x) π x is normlized To verify tht ψ ( x) is normlized,

More information

Chapter 7 Notes, Stewart 8e. 7.1 Integration by Parts Trigonometric Integrals Evaluating sin m x cos n (x) dx...

Chapter 7 Notes, Stewart 8e. 7.1 Integration by Parts Trigonometric Integrals Evaluating sin m x cos n (x) dx... Contents 7.1 Integrtion by Prts................................... 2 7.2 Trigonometric Integrls.................................. 8 7.2.1 Evluting sin m x cos n (x)......................... 8 7.2.2 Evluting

More information