Bellman Optimality Equation for V*
|
|
- Augustine Curtis
- 5 years ago
- Views:
Transcription
1 Bellmn Optimlity Eqution for V* The vlue of stte under n optiml policy must equl the expected return for the best ction from tht stte: V (s) mx Q (s,) A(s) mx A(s) mx A(s) Er t 1 V (s t 1 ) s t s, t s P ss The relevnt bckup digrm: R ss V ( s ) V is the unique solution of this system of nonliner equtions. R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 1
2 Bellmn Optimlity Eqution for Q* Q (s,) E r t 1 mx P ss s R ss Q (s t 1, ) s t s, t mx Q ( s, ) The relevnt bckup digrm: Q * is the unique solution of this system of nonliner equtions. R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 2
3 Why Optiml Stte-Vlue Functions re Useful Any policy tht is greedy with respect to V is n optiml policy. V Therefore, given, one-step-hed serch produces the long-term optiml ctions. E.g., bck to the gridworld: * R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 3
4 Wht About Optiml Action-Vlue Functions? Q * Given, the gent does not even hve to do one-step-hed serch: (s) rg mx Q (s,) A(s) R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 4
5 Solving the Bellmn Optimlity Eqution Finding n optiml policy by solving the Bellmn Optimlity Eqution requires the following: ccurte knowledge of environment dynmics; we hve enough spce nd time to do the computtion; the Mrkov Property. How much spce nd time do we need? polynomil in number of sttes (vi dynmic progrmming methods; Chpter 4), BUT, number of sttes is often huge (e.g., bckgmmon hs bout sttes). We usully hve to settle for pproximtions. Mny RL methods cn be understood s pproximtely solving the Bellmn Optimlity Eqution. R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 5
6 Summry Agent-environment interction Sttes Actions Rewrds Policy: stochstic rule for selecting ctions Return: the function of future rewrds gent tries to mximize Episodic nd continuing tsks Mrkov Property Mrkov Decision Process Trnsition probbilities Expected rewrds Vlue functions Stte-vlue function for policy Action-vlue function for policy Optiml stte-vlue function Optiml ction-vlue function Optiml vlue functions Optiml policies Bellmn Equtions The need for pproximtion R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 6
7 R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 7
8 Gridworld Actions: north, south, est, west; deterministic. If would tke gent off the grid: no move but rewrd = 1 Other ctions produce rewrd = 0, except ctions tht move gent out of specil sttes A nd B s shown. Wht if ll rewrds re shifted by constnt? R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 8
9 R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 9
10 Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence, optiml policies Discuss efficiency nd utility of DP R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 10
11 Policy Evlution Policy Evlution: for given policy, compute the stte-vlue function V Recll: Stte- vlue function for policy : V (s) E R t s t s E k r t k 1 s t s k 0 Bellmnequtionfor V V ( s) ( s, ) s P ss systemof S simultneous liner equtions R : ss V ( s) R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 11
12 Itertive Methods V 0 V 1 V k V k1 V sweep A sweep consists of pplying bckup opertion to ech stte. A full policy-evlution bckup: s V k1 (s) (s,) P s s R ss V k ( s ) R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 12
13 Itertive Policy Evlution R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 13
14 A Smll Gridworld An undiscounted episodic tsk Nonterminl sttes: 1, 2,..., 14; One terminl stte (shown twice s shded squres) Actions tht would tke gent off the grid leve stte unchnged Rewrd is 1 until the terminl stte is reched R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 14
15 Itertive Policy Evl for the Smll Gridworld equiprobble rndom ction choices R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 15
16 Policy Improvement Suppose we hve computed V for deterministic policy. For given stte s, would it be better to do n ction (s)? The vlue of doing in stte s is : Q (s,) E s r t 1 V (s t 1 ) s t s, t P ss R ss V ( s ) It is better to switch to ction for stte s if nd only if Q (s,) V (s) R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 16
17 Policy Improvement Cont. Do this for ll sttes to get new policy tht is greedy with respect to V : Then V V (s) rgmx Q (s,) rgmx s P s R s V ( s ) R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 17
18 Policy Improvement Cont. Wht if V V? i.e., for ll s S, V (s) mx s V ( s )? P R ss ss But this is the Bellmn Optimlity Eqution. So V V nd both nd re optiml policies. R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 18
19 Policy Itertion 0 V 0 1 V 1 * V * * policy evlution policy improvement greedifiction R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 19
20 Policy Itertion R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 20
21 Jck s Cr Rentl $10 for ech cr rented (must be vilble when request rec d) Two loctions, mximum of 20 crs t ech Crs returned nd requested rndomly Poisson distribution, n returns/requests with prob 1st loction: verge requests = 3, verge returns = 3 2nd loction: verge requests = 4, verge returns = 2 Cn move up to 5 crs between loctions overnight n n! e Sttes, Actions, Rewrds? Trnsition probbilities? R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 21
22 Jck s Cr Rentl R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 22
23 Jck s CR Exercise Suppose the first cr moved is free From 1st to 2nd loction Becuse n employee trvels tht wy nywy (by bus) Suppose only 10 crs cn be prked for free t ech loction More thn 10 cost $4 for using n extr prking lot Such rbitrry nonlinerities re common in rel problems R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 23
24 Vlue Itertion Recll the full policy-evlution bckup: s V k1 (s) (s,) P ss R ss V k ( s ) Here is the full vlue-itertion bckup: V k1 (s) mx s P ss R ss V k ( s ) R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 24
25 Vlue Itertion Cont. R. S. Sutton nd A. G. Brto: Reinforcement Lerning: An Introduction 25
Chapter 4: Dynamic Programming
Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,
More information{ } = E! & $ " k r t +k +1
Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,
More informationAdministrivia CSE 190: Reinforcement Learning: An Introduction
Administrivi CSE 190: Reinforcement Lerning: An Introduction Any emil sent to me bout the course should hve CSE 190 in the subject line! Chpter 4: Dynmic Progrmming Acknowledgment: A good number of these
More information2D1431 Machine Learning Lab 3: Reinforcement Learning
2D1431 Mchine Lerning Lb 3: Reinforcement Lerning Frnk Hoffmnn modified by Örjn Ekeberg December 7, 2004 1 Introduction In this lb you will lern bout dynmic progrmming nd reinforcement lerning. It is ssumed
More informationReinforcement learning II
CS 1675 Introduction to Mchine Lerning Lecture 26 Reinforcement lerning II Milos Huskrecht milos@cs.pitt.edu 5329 Sennott Squre Reinforcement lerning Bsics: Input x Lerner Output Reinforcement r Critic
More informationModule 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo
Module 6 Vlue Itertion CS 886 Sequentil Decision Mking nd Reinforcement Lerning University of Wterloo Mrkov Decision Process Definition Set of sttes: S Set of ctions (i.e., decisions): A Trnsition model:
More informationReinforcement Learning
Reinforcement Lerning Tom Mitchell, Mchine Lerning, chpter 13 Outline Introduction Comprison with inductive lerning Mrkov Decision Processes: the model Optiml policy: The tsk Q Lerning: Q function Algorithm
More information19 Optimal behavior: Game theory
Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,
More informationReinforcement learning
Reinforcement lerning Regulr MDP Given: Trnition model P Rewrd function R Find: Policy π Reinforcement lerning Trnition model nd rewrd function initilly unknown Still need to find the right policy Lern
More informationReinforcement Learning and Policy Reuse
Reinforcement Lerning nd Policy Reue Mnuel M. Veloo PEL Fll 206 Reding: Reinforcement Lerning: An Introduction R. Sutton nd A. Brto Probbilitic policy reue in reinforcement lerning gent Fernndo Fernndez
More informationCf. Linn Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems, Wiley Series in Probability & Statistics, 1999.
Cf. Linn Sennott, Stochstic Dynmic Progrmming nd the Control of Queueing Systems, Wiley Series in Probbility & Sttistics, 1999. D.L.Bricker, 2001 Dept of Industril Engineering The University of Iow MDP
More informationChapter 4: Dynamic Programming
Chapter 4: Dynamic Programming Objectives of this chapter: Overview of a collection of classical solution methods for MDPs known as dynamic programming (DP) Show how DP can be used to compute value functions,
More informationCS 188: Artificial Intelligence Spring 2007
CS 188: Artificil Intelligence Spring 2007 Lecture 3: Queue-Bsed Serch 1/23/2007 Srini Nrynn UC Berkeley Mny slides over the course dpted from Dn Klein, Sturt Russell or Andrew Moore Announcements Assignment
More informationIntroduction to Reinforcement Learning. Part 6: Core Theory II: Bellman Equations and Dynamic Programming
Introduction to Reinforcement Learning Part 6: Core Theory II: Bellman Equations and Dynamic Programming Bellman Equations Recursive relationships among values that can be used to compute values The tree
More informationCS 188: Artificial Intelligence
CS 188: Artificil Intelligence Lecture 19: Decision Digrms Pieter Abbeel --- C Berkeley Mny slides over this course dpted from Dn Klein, Sturt Russell, Andrew Moore Decision Networks ME: choose the ction
More informationEfficient Planning. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction
Efficient Plnning 1 Tuesdy clss summry: Plnning: ny computtionl process tht uses model to crete or improve policy Dyn frmework: 2 Questions during clss Why use simulted experience? Cn t you directly compute
More informationCS 188: Artificial Intelligence Fall 2010
CS 188: Artificil Intelligence Fll 2010 Lecture 18: Decision Digrms 10/28/2010 Dn Klein C Berkeley Vlue of Informtion 1 Decision Networks ME: choose the ction which mximizes the expected utility given
More informationDecision Networks. CS 188: Artificial Intelligence. Decision Networks. Decision Networks. Decision Networks and Value of Information
CS 188: Artificil Intelligence nd Vlue of Informtion Instructors: Dn Klein nd Pieter Abbeel niversity of Cliforni, Berkeley [These slides were creted by Dn Klein nd Pieter Abbeel for CS188 Intro to AI
More informationBayesian Networks: Approximate Inference
pproches to inference yesin Networks: pproximte Inference xct inference Vrillimintion Join tree lgorithm pproximte inference Simplify the structure of the network to mkxct inferencfficient (vritionl methods,
More informationDecision Networks. CS 188: Artificial Intelligence Fall Example: Decision Networks. Decision Networks. Decisions as Outcome Trees
CS 188: Artificil Intelligence Fll 2011 Decision Networks ME: choose the ction which mximizes the expected utility given the evidence mbrell Lecture 17: Decision Digrms 10/27/2011 Cn directly opertionlize
More informationReview of Calculus, cont d
Jim Lmbers MAT 460 Fll Semester 2009-10 Lecture 3 Notes These notes correspond to Section 1.1 in the text. Review of Clculus, cont d Riemnn Sums nd the Definite Integrl There re mny cses in which some
More informationOperations with Polynomials
38 Chpter P Prerequisites P.4 Opertions with Polynomils Wht you should lern: How to identify the leding coefficients nd degrees of polynomils How to dd nd subtrct polynomils How to multiply polynomils
More informationContinuous Random Variables
STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 217 Néhémy Lim Continuous Rndom Vribles Nottion. The indictor function of set S is rel-vlued function defined by : { 1 if x S 1 S (x) if x S Suppose tht
More informationThe Regulated and Riemann Integrals
Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue
More informationArtificial Intelligence Markov Decision Problems
rtificil Intelligence Mrkov eciion Problem ilon - briefly mentioned in hpter Ruell nd orvig - hpter 7 Mrkov eciion Problem; pge of Mrkov eciion Problem; pge of exmple: probbilitic blockworld ction outcome
More informationMATH 115 FINAL EXAM. April 25, 2005
MATH 115 FINAL EXAM April 25, 2005 NAME: Solution Key INSTRUCTOR: SECTION NO: 1. Do not open this exm until you re told to begin. 2. This exm hs 9 pges including this cover. There re 9 questions. 3. Do
More informationLECTURE NOTE #12 PROF. ALAN YUILLE
LECTURE NOTE #12 PROF. ALAN YUILLE 1. Clustering, K-mens, nd EM Tsk: set of unlbeled dt D = {x 1,..., x n } Decompose into clsses w 1,..., w M where M is unknown. Lern clss models p(x w)) Discovery of
More informationNUMERICAL INTEGRATION. The inverse process to differentiation in calculus is integration. Mathematically, integration is represented by.
NUMERICAL INTEGRATION 1 Introduction The inverse process to differentition in clculus is integrtion. Mthemticlly, integrtion is represented by f(x) dx which stnds for the integrl of the function f(x) with
More informationWe will see what is meant by standard form very shortly
THEOREM: For fesible liner progrm in its stndrd form, the optimum vlue of the objective over its nonempty fesible region is () either unbounded or (b) is chievble t lest t one extreme point of the fesible
More information1B40 Practical Skills
B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need
More informationLecture 3 Gaussian Probability Distribution
Introduction Lecture 3 Gussin Probbility Distribution Gussin probbility distribution is perhps the most used distribution in ll of science. lso clled bell shped curve or norml distribution Unlike the binomil
More informationA Fast and Reliable Policy Improvement Algorithm
A Fst nd Relible Policy Improvement Algorithm Ysin Abbsi-Ydkori Peter L. Brtlett Stephen J. Wright Queenslnd University of Technology UC Berkeley nd QUT University of Wisconsin-Mdison Abstrct We introduce
More informationMath 1B, lecture 4: Error bounds for numerical methods
Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the
More informationBest Approximation. Chapter The General Case
Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given
More informationMORE FUNCTION GRAPHING; OPTIMIZATION. (Last edited October 28, 2013 at 11:09pm.)
MORE FUNCTION GRAPHING; OPTIMIZATION FRI, OCT 25, 203 (Lst edited October 28, 203 t :09pm.) Exercise. Let n be n rbitrry positive integer. Give n exmple of function with exctly n verticl symptotes. Give
More informationNondeterminism and Nodeterministic Automata
Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely
More informationMatrix Solution to Linear Equations and Markov Chains
Trding Systems nd Methods, Fifth Edition By Perry J. Kufmn Copyright 2005, 2013 by Perry J. Kufmn APPENDIX 2 Mtrix Solution to Liner Equtions nd Mrkov Chins DIRECT SOLUTION AND CONVERGENCE METHOD Before
More informationWhere did dynamic programming come from?
Where did dynmic progrmming come from? String lgorithms Dvid Kuchk cs302 Spring 2012 Richrd ellmn On the irth of Dynmic Progrmming Sturt Dreyfus http://www.eng.tu.c.il/~mi/cd/ or50/1526-5463-2002-50-01-0048.pdf
More informationHow do you know you have SLE?
Simultneous Liner Equtions Simultneous Liner Equtions nd Liner Algebr Simultneous liner equtions (SLE s) occur frequently in Sttics, Dynmics, Circuits nd other engineering clsses Need to be ble to, nd
More informationProblem Set 3 Solutions
Chemistry 36 Dr Jen M Stndrd Problem Set 3 Solutions 1 Verify for the prticle in one-dimensionl box by explicit integrtion tht the wvefunction ψ ( x) π x is normlized To verify tht ψ ( x) is normlized,
More informationChapter 14. Matrix Representations of Linear Transformations
Chpter 4 Mtrix Representtions of Liner Trnsformtions When considering the Het Stte Evolution, we found tht we could describe this process using multipliction by mtrix. This ws nice becuse computers cn
More informationState space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies
Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response
More informationComputing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n )
Alignment Grph Alignment Mtrix Computing the Optiml Globl Alignment Vlue An Introduction to Bioinformtics Algorithms A = n c t 2 3 c c 4 g 5 g 6 7 8 9 B = n 0 c g c g 2 3 4 5 6 7 8 t 9 0 2 3 4 5 6 7 8
More information5.2 Exponent Properties Involving Quotients
5. Eponent Properties Involving Quotients Lerning Objectives Use the quotient of powers property. Use the power of quotient property. Simplify epressions involving quotient properties of eponents. Use
More informationChapter 5 : Continuous Random Variables
STAT/MATH 395 A - PROBABILITY II UW Winter Qurter 216 Néhémy Lim Chpter 5 : Continuous Rndom Vribles Nottions. N {, 1, 2,...}, set of nturl numbers (i.e. ll nonnegtive integers); N {1, 2,...}, set of ll
More informationEngineering Analysis ENG 3420 Fall Dan C. Marinescu Office: HEC 439 B Office hours: Tu-Th 11:00-12:00
Engineering Anlysis ENG 3420 Fll 2009 Dn C. Mrinescu Office: HEC 439 B Office hours: Tu-Th 11:00-12:00 Lecture 13 Lst time: Problem solving in preprtion for the quiz Liner Algebr Concepts Vector Spces,
More informationReview of Gaussian Quadrature method
Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge
More informationJack Simons, Henry Eyring Scientist and Professor Chemistry Department University of Utah
1. Born-Oppenheimer pprox.- energy surfces 2. Men-field (Hrtree-Fock) theory- orbitls 3. Pros nd cons of HF- RHF, UHF 4. Beyond HF- why? 5. First, one usully does HF-how? 6. Bsis sets nd nottions 7. MPn,
More information20 MATHEMATICS POLYNOMIALS
0 MATHEMATICS POLYNOMIALS.1 Introduction In Clss IX, you hve studied polynomils in one vrible nd their degrees. Recll tht if p(x) is polynomil in x, the highest power of x in p(x) is clled the degree of
More informationA-Level Mathematics Transition Task (compulsory for all maths students and all further maths student)
A-Level Mthemtics Trnsition Tsk (compulsory for ll mths students nd ll further mths student) Due: st Lesson of the yer. Length: - hours work (depending on prior knowledge) This trnsition tsk provides revision
More informationMath& 152 Section Integration by Parts
Mth& 5 Section 7. - Integrtion by Prts Integrtion by prts is rule tht trnsforms the integrl of the product of two functions into other (idelly simpler) integrls. Recll from Clculus I tht given two differentible
More informationMonte Carlo method in solving numerical integration and differential equation
Monte Crlo method in solving numericl integrtion nd differentil eqution Ye Jin Chemistry Deprtment Duke University yj66@duke.edu Abstrct: Monte Crlo method is commonly used in rel physics problem. The
More informationA REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H. Thomas Shores Department of Mathematics University of Nebraska Spring 2007
A REVIEW OF CALCULUS CONCEPTS FOR JDEP 384H Thoms Shores Deprtment of Mthemtics University of Nebrsk Spring 2007 Contents Rtes of Chnge nd Derivtives 1 Dierentils 4 Are nd Integrls 5 Multivrite Clculus
More information1 Online Learning and Regret Minimization
2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in
More informationSTEP FUNCTIONS, DELTA FUNCTIONS, AND THE VARIATION OF PARAMETERS FORMULA. 0 if t < 0, 1 if t > 0.
STEP FUNCTIONS, DELTA FUNCTIONS, AND THE VARIATION OF PARAMETERS FORMULA STEPHEN SCHECTER. The unit step function nd piecewise continuous functions The Heviside unit step function u(t) is given by if t
More informationDATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018
DATA620006 魏忠钰 Serch I Mrch 7 th, 2018 Outline Serch Problems Uninformed Serch Depth-First Serch Bredth-First Serch Uniform-Cost Serch Rel world tsk - Pc-mn Serch problems A serch problem consists of:
More informationSample pages. 9:04 Equations with grouping symbols
Equtions 9 Contents I know the nswer is here somewhere! 9:01 Inverse opertions 9:0 Solving equtions Fun spot 9:0 Why did the tooth get dressed up? 9:0 Equtions with pronumerls on both sides GeoGebr ctivity
More informationSection 7.2 Velocity. Solution
Section 7.2 Velocity In the previous chpter, we showed tht velocity is vector becuse it hd both mgnitude (speed) nd direction. In this section, we will demonstrte how two velocities cn be combined to determine
More informationTHE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.
THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem
More informationApplying Q-Learning to Flappy Bird
Applying Q-Lerning to Flppy Bird Moritz Ebeling-Rump, Mnfred Ko, Zchry Hervieux-Moore Abstrct The field of mchine lerning is n interesting nd reltively new re of reserch in rtificil intelligence. In this
More informationChapters 4 & 5 Integrals & Applications
Contents Chpters 4 & 5 Integrls & Applictions Motivtion to Chpters 4 & 5 2 Chpter 4 3 Ares nd Distnces 3. VIDEO - Ares Under Functions............................................ 3.2 VIDEO - Applictions
More informationBefore we can begin Ch. 3 on Radicals, we need to be familiar with perfect squares, cubes, etc. Try and do as many as you can without a calculator!!!
Nme: Algebr II Honors Pre-Chpter Homework Before we cn begin Ch on Rdicls, we need to be fmilir with perfect squres, cubes, etc Try nd do s mny s you cn without clcultor!!! n The nth root of n n Be ble
More informationCMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature
CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy
More information(See Notes on Spontaneous Emission)
ECE 240 for Cvity from ECE 240 (See Notes on ) Quntum Rdition in ECE 240 Lsers - Fll 2017 Lecture 11 1 Free Spce ECE 240 for Cvity from Quntum Rdition in The electromgnetic mode density in free spce is
More information1 Probability Density Functions
Lis Yn CS 9 Continuous Distributions Lecture Notes #9 July 6, 28 Bsed on chpter by Chris Piech So fr, ll rndom vribles we hve seen hve been discrete. In ll the cses we hve seen in CS 9, this ment tht our
More informationSearch: The Core of Planning
Serch: The Core of Plnning Dr. Neil T. Dntm CSCI-498/598 RPM, Colordo School of Mines Spring 208 Dntm (Mines CSCI, RPM) Serch Spring 208 / 75 Outline Plnning nd Serch Problems Bsic Serch Depth-First Serch
More information3.4 Numerical integration
3.4. Numericl integrtion 63 3.4 Numericl integrtion In mny economic pplictions it is necessry to compute the definite integrl of relvlued function f with respect to "weight" function w over n intervl [,
More informationReview of basic calculus
Review of bsic clculus This brief review reclls some of the most importnt concepts, definitions, nd theorems from bsic clculus. It is not intended to tech bsic clculus from scrtch. If ny of the items below
More informationMatrices and Determinants
Nme Chpter 8 Mtrices nd Determinnts Section 8.1 Mtrices nd Systems of Equtions Objective: In this lesson you lerned how to use mtrices, Gussin elimintion, nd Guss-Jordn elimintion to solve systems of liner
More informationNumerical Integration
Chpter 5 Numericl Integrtion Numericl integrtion is the study of how the numericl vlue of n integrl cn be found. Methods of function pproximtion discussed in Chpter??, i.e., function pproximtion vi the
More informationChapter 4 Contravariance, Covariance, and Spacetime Diagrams
Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz
More informationJonathan Mugan. July 15, 2013
Jonthn Mugn July 15, 2013 Imgine rt in Skinner box. The rt cn see screen of imges, nd dot in the lower-right corner determines if there will be shock. Bottom-up methods my not find this dot, but top-down
More informationAdding and Subtracting Rational Expressions
6.4 Adding nd Subtrcting Rtionl Epressions Essentil Question How cn you determine the domin of the sum or difference of two rtionl epressions? You cn dd nd subtrct rtionl epressions in much the sme wy
More informationLinear Inequalities. Work Sheet 1
Work Sheet 1 Liner Inequlities Rent--Hep, cr rentl compny,chrges $ 15 per week plus $ 0.0 per mile to rent one of their crs. Suppose you re limited y how much money you cn spend for the week : You cn spend
More informationRead section 3.3, 3.4 Announcements:
Dte: 3/1/13 Objective: SWBAT pply properties of exponentil functions nd will pply properties of rithms. Bell Ringer: 1. f x = 3x 6, find the inverse, f 1 x., Using your grphing clcultor, Grph 1. f x,f
More informationMulti-Armed Bandits: Non-adaptive and Adaptive Sampling
CSE 547/Stt 548: Mchine Lerning for Big Dt Lecture Multi-Armed Bndits: Non-dptive nd Adptive Smpling Instructor: Shm Kkde 1 The (stochstic) multi-rmed bndit problem The bsic prdigm is s follows: K Independent
More informationUNIFORM CONVERGENCE. Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3
UNIFORM CONVERGENCE Contents 1. Uniform Convergence 1 2. Properties of uniform convergence 3 Suppose f n : Ω R or f n : Ω C is sequence of rel or complex functions, nd f n f s n in some sense. Furthermore,
More informationMath 270A: Numerical Linear Algebra
Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner
More informationPHYS Summer Professor Caillault Homework Solutions. Chapter 2
PHYS 1111 - Summer 2007 - Professor Cillult Homework Solutions Chpter 2 5. Picture the Problem: The runner moves long the ovl trck. Strtegy: The distnce is the totl length of trvel, nd the displcement
More information4 7x =250; 5 3x =500; Read section 3.3, 3.4 Announcements: Bell Ringer: Use your calculator to solve
Dte: 3/14/13 Objective: SWBAT pply properties of exponentil functions nd will pply properties of rithms. Bell Ringer: Use your clcultor to solve 4 7x =250; 5 3x =500; HW Requests: Properties of Log Equtions
More informationMath 113 Exam 2 Practice
Mth Em Prctice Februry, 8 Em will cover sections 6.5, 7.-7.5 nd 7.8. This sheet hs three sections. The first section will remind you bout techniques nd formuls tht you should know. The second gives number
More informationEnergy Bands Energy Bands and Band Gap. Phys463.nb Phenomenon
Phys463.nb 49 7 Energy Bnds Ref: textbook, Chpter 7 Q: Why re there insultors nd conductors? Q: Wht will hppen when n electron moves in crystl? In the previous chpter, we discussed free electron gses,
More informationAbstract inner product spaces
WEEK 4 Abstrct inner product spces Definition An inner product spce is vector spce V over the rel field R equipped with rule for multiplying vectors, such tht the product of two vectors is sclr, nd the
More informationCHM Physical Chemistry I Chapter 1 - Supplementary Material
CHM 3410 - Physicl Chemistry I Chpter 1 - Supplementry Mteril For review of some bsic concepts in mth, see Atkins "Mthemticl Bckground 1 (pp 59-6), nd "Mthemticl Bckground " (pp 109-111). 1. Derivtion
More informationPre-Calculus TMTA Test 2018
. For the function f ( x) ( x )( x )( x 4) find the verge rte of chnge from x to x. ) 70 4 8.4 8.4 4 7 logb 8. If logb.07, logb 4.96, nd logb.60, then ).08..867.9.48. For, ) sec (sin ) is equivlent to
More informationTutorial 4. b a. h(f) = a b a ln 1. b a dx = ln(b a) nats = log(b a) bits. = ln λ + 1 nats. = log e λ bits. = ln 1 2 ln λ + 1. nats. = ln 2e. bits.
Tutoril 4 Exercises on Differentil Entropy. Evlute the differentil entropy h(x) f ln f for the following: () The uniform distribution, f(x) b. (b) The exponentil density, f(x) λe λx, x 0. (c) The Lplce
More informationCS 188 Introduction to Artificial Intelligence Fall 2018 Note 7
CS 188 Introduction to Artificil Intelligence Fll 2018 Note 7 These lecture notes re hevily bsed on notes originlly written by Nikhil Shrm. Decision Networks In the third note, we lerned bout gme trees
More informationLecture 6 Regular Grammars
Lecture 6 Regulr Grmmrs COT 4420 Theory of Computtion Section 3.3 Grmmr A grmmr G is defined s qudruple G = (V, T, S, P) V is finite set of vribles T is finite set of terminl symbols S V is specil vrible
More informationLecture 19: Continuous Least Squares Approximation
Lecture 19: Continuous Lest Squres Approximtion 33 Continuous lest squres pproximtion We begn 31 with the problem of pproximting some f C[, b] with polynomil p P n t the discrete points x, x 1,, x m for
More informationAdvanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004
Advnced Clculus: MATH 410 Notes on Integrls nd Integrbility Professor Dvid Levermore 17 October 2004 1. Definite Integrls In this section we revisit the definite integrl tht you were introduced to when
More informationUnit #9 : Definite Integral Properties; Fundamental Theorem of Calculus
Unit #9 : Definite Integrl Properties; Fundmentl Theorem of Clculus Gols: Identify properties of definite integrls Define odd nd even functions, nd reltionship to integrl vlues Introduce the Fundmentl
More informationSection 4.8. D v(t j 1 ) t. (4.8.1) j=1
Difference Equtions to Differentil Equtions Section.8 Distnce, Position, nd the Length of Curves Although we motivted the definition of the definite integrl with the notion of re, there re mny pplictions
More informationIf deg(num) deg(denom), then we should use long-division of polynomials to rewrite: p(x) = s(x) + r(x) q(x), q(x)
Mth 50 The method of prtil frction decomposition (PFD is used to integrte some rtionl functions of the form p(x, where p/q is in lowest terms nd deg(num < deg(denom. q(x If deg(num deg(denom, then we should
More informationMarkov Decision Processes
Mrkov Deciion Procee A Brief Introduction nd Overview Jck L. King Ph.D. Geno UK Limited Preenttion Outline Introduction to MDP Motivtion for Study Definition Key Point of Interet Solution Technique Prtilly
More informationCS 310 (sec 20) - Winter Final Exam (solutions) SOLUTIONS
CS 310 (sec 20) - Winter 2003 - Finl Exm (solutions) SOLUTIONS 1. (Logic) Use truth tles to prove the following logicl equivlences: () p q (p p) (q q) () p q (p q) (p q) () p q p q p p q q (q q) (p p)
More informationThis lecture covers Chapter 8 of HMU: Properties of CFLs
This lecture covers Chpter 8 of HMU: Properties of CFLs Turing Mchine Extensions of Turing Mchines Restrictions of Turing Mchines Additionl Reding: Chpter 8 of HMU. Turing Mchine: Informl Definition B
More informationNon-Linear & Logistic Regression
Non-Liner & Logistic Regression If the sttistics re boring, then you've got the wrong numbers. Edwrd R. Tufte (Sttistics Professor, Yle University) Regression Anlyses When do we use these? PART 1: find
More informationdifferent methods (left endpoint, right endpoint, midpoint, trapezoid, Simpson s).
Mth 1A with Professor Stnkov Worksheet, Discussion #41; Wednesdy, 12/6/217 GSI nme: Roy Zho Problems 1. Write the integrl 3 dx s limit of Riemnn sums. Write it using 2 intervls using the 1 x different
More informationNormal Distribution. Lecture 6: More Binomial Distribution. Properties of the Unit Normal Distribution. Unit Normal Distribution
Norml Distribution Lecture 6: More Binomil Distribution If X is rndom vrible with norml distribution with men µ nd vrince σ 2, X N (µ, σ 2, then P(X = x = f (x = 1 e 1 (x µ 2 2 σ 2 σ Sttistics 104 Colin
More informationAutonomous Learning of High-Level States and Actions in Continuous Environments. Jonathan Mugan and Benjamin Kuipers, Fellow, IEEE
Autonomous Lerning of High-Level Sttes nd s in Continuous Environments Jonthn Mugn nd Benjmin Kuipers, Fellow, IEEE Abstrct How cn n gent bootstrp up from low-level representtion to utonomously lern high-level
More information