Reinforcement Learning
Tom Mitchell, Machine Learning, chapter 13
Copyright Facundo Bromberg

Outline
- Introduction
- Comparison with inductive learning
- Markov Decision Processes: the model
- Optimal policy: the task
- Q learning: Q function, algorithm, convergence proof
- Exploration vs. exploitation
- Non-deterministic rewards and actions

Introduction
How can an autonomous agent that senses and acts in its environment learn to choose optimal actions to achieve its goals?
[Figure: agent-environment loop with the labels Agent, Environment, Critic, State, Action and Reward, unrolled as the sequence $s_0, r_0, s_1, r_1, s_2, r_2, \ldots$]
Goal: learn to choose actions that maximize $r_0 + \gamma r_1 + \gamma^2 r_2 + \cdots$, where $0 \le \gamma < 1$.

Introduction (cont.)
This generic problem is one of learning to control sequential processes, such as:
- learning to control a mobile robot,
- learning to optimize operations in factories, and
- learning to play board games (world-class backgammon player, Tesauro 1995).
It is suitable for scenarios with unpredictable (e.g. highly dynamic) or complex environments:
- there is no domain theory;
- it is impossible to build in optimal behavior (as in planning, optimization algorithms, etc.).

Assumption: goals can be defined by a reward function that assigns numerical values to action-state pairs. This reward function is known by the critic, who may be external or built in.
The task of the agent is to perform sequences of actions, observe their consequences, and learn a control policy $\pi : S \to A$ that chooses actions that maximize the accumulated reward.

Differences with inductive learning
Differences with function approximation (inductive learning) of $\pi : S \to A$:
- Delayed reward. Instead of training pairs $\langle s, \pi(s) \rangle$, where $\pi(s)$ is the optimal action for the current state $s$, the agent receives a sequence of rewards and faces the problem of temporal credit assignment.
- Exploration. The agent influences the distribution of training examples through the action sequence it chooses. This raises the problem of exploration vs. exploitation.

Differences with inductive learning (cont.)
- Partially observable states. In many practical situations sensors provide only partial information about the environment's state. An optimal policy may therefore specifically include actions that improve the observability of the environment.
- Life-long learning. Unlike an isolated inductive learning task, agent learning often requires learning several related tasks within the same environment. Prior knowledge or experience becomes relevant.

The model (or the task of making compromises)
- Deterministic or non-deterministic actions?
- Prior knowledge or not about the effects of the agent's actions on the environment (domain theory)?
- Does a trainer give examples of optimal action sequences (inductive learning), or must the agent train itself?
The choice: Markov Decision Process (MDP).

Markov Decision Processes
An MDP is a tuple $(S, A, s_0, \delta, r)$, where
- $S$ is the set of states,
- $A$ is the set of actions available to the agent,
- $s_0$ is the initial state,
- $\delta : S \times A \to S$ is the transition function, and
- $r : S \times A \to \mathbb{R}^+$ is the reward function.
Both $r$ and $\delta$ depend only on the current state and action (Markov property), and they are deterministic.
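
As a rough illustration of the tuple $(S, A, s_0, \delta, r)$, the sketch below stores a deterministic MDP in Python. The class name `DeterministicMDP`, the three-state corridor and its action names are assumptions made for this example; they do not come from the slides.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class DeterministicMDP:
    """Container for the tuple (S, A, s0, delta, r) of a deterministic MDP."""
    states: Set[str]
    actions: Set[str]
    s0: str
    delta: Dict[Tuple[str, str], str]     # transition function S x A -> S
    reward: Dict[Tuple[str, str], float]  # reward function S x A -> R+

    def step(self, s: str, a: str) -> Tuple[str, float]:
        """Return (next state, reward); both depend only on (s, a)."""
        return self.delta[(s, a)], self.reward[(s, a)]


# Hypothetical corridor: left -> center -> G, with reward 100 for entering G.
corridor = DeterministicMDP(
    states={"left", "center", "G"},
    actions={"right", "stay"},
    s0="left",
    delta={
        ("left", "right"): "center", ("left", "stay"): "left",
        ("center", "right"): "G", ("center", "stay"): "center",
        ("G", "right"): "G", ("G", "stay"): "G",  # G is absorbing
    },
    reward={
        ("left", "right"): 0.0, ("left", "stay"): 0.0,
        ("center", "right"): 100.0, ("center", "stay"): 0.0,
        ("G", "right"): 0.0, ("G", "stay"): 0.0,
    },
)

state, r = corridor.step(corridor.s0, "right")
```

Because `step` looks only at the pair `(s, a)`, the Markov property and determinism are built into the representation.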

The task
The task of the agent is to learn a policy $\pi : S \to A$ that selects the next action $a_t$ based on the current observed state $s_t$, i.e. $\pi(s_t) = a_t$.
How? By choosing the policy that maximizes the cumulative reward over time, that is, the policy that maximizes
$$V^\pi(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}, \qquad 0 \le \gamma < 1,$$
where the sequence of rewards is generated by following $\pi$: $a_0 = \pi(s_0)$, $a_1 = \pi(s_1)$, $a_2 = \pi(s_2)$, and so on, giving $s_0 \xrightarrow{a_0,\ r_0} s_1 \xrightarrow{a_1,\ r_1} s_2 \xrightarrow{a_2,\ r_2} \cdots$

The task (2)
Alternative definitions of total reward:
- Finite horizon reward: $\sum_{i=0}^{h} r_{t+i}$
- Average reward: $\lim_{h \to \infty} \frac{1}{h} \sum_{i=0}^{h} r_{t+i}$

The optimal policy
The task of the agent is thus to learn the optimal policy, given by
$$\pi^* \equiv \arg\max_{\pi} V^\pi(s), \quad \forall s,$$
and we denote by $V^*(s) = V^{\pi^*}(s)$ the maximum reward the agent can obtain starting at $s$.
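
The three reward definitions above can be compared on a concrete reward sequence. The following is a minimal sketch; the function names and the sample sequence are assumptions for illustration, and the average reward is evaluated at a finite $h$ rather than in the limit.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**i * rewards[i], accumulated from the last reward backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def finite_horizon_return(rewards: Sequence[float], h: int) -> float:
    """Undiscounted sum of r_t ... r_{t+h}."""
    return sum(rewards[: h + 1])


def average_reward(rewards: Sequence[float], h: int) -> float:
    """(1/h) * sum of r_t ... r_{t+h}; the definition above takes h to infinity."""
    return finite_horizon_return(rewards, h) / h


rewards = [0.0, 0.0, 100.0]  # hypothetical reward sequence
v = discounted_return(rewards, gamma=0.9)  # 0 + 0.9*0 + 0.9**2 * 100
```

With $\gamma = 0.9$ the reward of 100 received two steps in the future is worth about 81 now, which is the effect of discounting.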

Example
S: cells. A: arrows. r: numbers next to the arrows (grid-world figure with goal state G). With $\gamma = 0.9$:
- $V^*(s_{\text{bottom-right}}) = 100$
- $V^*(s_{\text{bottom-center}}) = 0 + 0.9 \cdot 100 = 90$
- $V^*(s_{\text{bottom-left}}) = 0 + 0.9 \cdot 0 + (0.9)^2 \cdot 100 = 81$
Since G is absorbing, the infinite sum becomes finite.

Q Learning
The agent wants to maximize cumulative reward, thus it should prefer state $s_1$ over $s_2$ whenever $V^*(s_1) > V^*(s_2)$. However, the agent's policy must choose among actions, not states.
No problem: the optimal action in state $s$ is the action $a$ that maximizes the sum of the immediate reward $r(s,a)$ plus the value $V^*$ of the immediate successor state, discounted by $\gamma$:
$$\pi^*(s) = \arg\max_{a} \left[ r(s,a) + \gamma V^*(\delta(s,a)) \right]$$
Here $r(s,a)$ is the immediate reward and $V^*(\delta(s,a))$ is the value of the successor.

Q Learning (cont.)
$$\pi^*(s) = \arg\max_{a} \left[ r(s,a) + \gamma V^*(\delta(s,a)) \right]$$
Thus, if the agent knows the functions $r$ and $\delta$, it can learn the optimal policy $\pi^*$ by learning the value $V^*$ offline (value iteration algorithm, skipped).
- For all but a few cases, $\delta$ is unknown.
- Knowing it requires precise knowledge of the domain.
- Sometimes the domain is even non-deterministic!
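
The slides skip value iteration, so the following is only a sketch of what such an offline computation could look like when $r$ and $\delta$ are known. The corridor layout and the action names `fwd` and `back` are assumptions; it is not the grid from the figure, but it is set up so that the three values of the example come out the same way.

```python
GAMMA = 0.9

# Hypothetical corridor: bottom_left <-> bottom_center <-> bottom_right -> G.
DELTA = {
    ("bottom_left", "fwd"): "bottom_center",
    ("bottom_center", "fwd"): "bottom_right",
    ("bottom_center", "back"): "bottom_left",
    ("bottom_right", "fwd"): "G",
    ("bottom_right", "back"): "bottom_center",
}
REWARD = {key: 0.0 for key in DELTA}
REWARD[("bottom_right", "fwd")] = 100.0  # only entering G is rewarded


def value_iteration(delta, reward, gamma, sweeps=50):
    """Repeat V(s) <- max_a [r(s, a) + gamma * V(delta(s, a))] for a fixed number of sweeps."""
    states = {s for s, _ in delta} | set(delta.values())
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            backups = [reward[(s2, a)] + gamma * v[delta[(s2, a)]]
                       for (s2, a) in delta if s2 == s]
            if backups:  # absorbing states (no actions) keep value 0
                v[s] = max(backups)
    return v


def greedy_policy(delta, reward, gamma, v):
    """pi(s) = argmax_a [r(s, a) + gamma * V(delta(s, a))]."""
    best = {}
    for (s, a), s_next in delta.items():
        score = reward[(s, a)] + gamma * v[s_next]
        if s not in best or score > best[s][1]:
            best[s] = (a, score)
    return {s: a for s, (a, _) in best.items()}


V = value_iteration(DELTA, REWARD, GAMMA)
PI = greedy_policy(DELTA, REWARD, GAMMA, V)
```

Note that `greedy_policy` needs both `delta` and `reward`, which is exactly the knowledge the agent usually lacks and the motivation for the Q function.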

The Q function
If $r$ or $\delta$ are unknown, what evaluation function should the agent use? The evaluation function Q. Starting from
$$\pi^*(s) = \arg\max_{a} \left[ r(s,a) + \gamma V^*(\delta(s,a)) \right],$$
define $Q(s,a) \equiv r(s,a) + \gamma V^*(\delta(s,a))$, so that
$$\pi^*(s) = \arg\max_{a} Q(s,a).$$
Thus, if the agent is capable of learning Q, it will be able to select optimal actions even when it has no knowledge of $r$ and $\delta$.

The Q function (cont.)
$$\pi^*(s) = \arg\max_{a} Q(s,a), \qquad Q(s,a) = r(s,a) + \gamma V^*(\delta(s,a))$$
Surprisingly, the agent can choose the optimal action without ever conducting a lookahead search to explicitly consider what states result from the action. Yes, the Q function has exactly that property.
$Q(s,a)$ summarizes in a single value all the information needed to determine the discounted cumulative reward that will be gained in the future if action $a$ is chosen in state $s$.

Example
$$Q(s,a) = r(s,a) + \gamma V^*(\delta(s,a))$$
S: cells. A: arrows. r: numbers next to the arrows. In each line, $a$ is the action on the path toward G, and $\gamma = 0.9$:
- $Q(s_G, a) = 0$
- $Q(s_{\text{bottom-right}}, a) = 100 + 0.9 \cdot 0 = 100$
- $Q(s_{\text{bottom-center}}, a) = 0 + 0.9 \cdot 100 = 90$
- $Q(s_{\text{bottom-left}}, a) = 0 + 0.9 \cdot 90 = (0.9)^2 \cdot 100 = 81$
Since G is absorbing, the infinite sum becomes finite.
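
To make the "no lookahead" point concrete, here is a small sketch that picks the optimal action and reads off $V^*(s)$ using nothing but a Q table. The table layout (a dictionary keyed by state-action pairs), the function names and the numbers are assumptions for this example.

```python
from typing import Dict, Tuple

QTable = Dict[Tuple[str, str], float]


def greedy_action(q: QTable, s: str) -> str:
    """pi*(s) = argmax_a Q(s, a), read directly from the table."""
    candidates = {a: value for (state, a), value in q.items() if state == s}
    return max(candidates, key=candidates.get)


def state_value(q: QTable, s: str) -> float:
    """V*(s) = max_a Q(s, a)."""
    return max(value for (state, _), value in q.items() if state == s)


# Hypothetical Q values for one state with three available actions.
q_table = {("s", "left"): 72.9, ("s", "up"): 81.0, ("s", "right"): 90.0}
best = greedy_action(q_table, "s")
```

Neither function refers to $r$ or $\delta$; everything the agent needs about the future is already folded into the table entries.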
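
Algorithm for learning Q (Watkins 1989)
$$\pi^*(s) = \arg\max_{a} Q(s,a), \qquad Q(s,a) = r(s,a) + \gamma V^*(\delta(s,a))$$
Learning Q is equivalent to learning the optimal policy. Note that
$$V^*(s) = \max_{a'} \left[ r(s,a') + \gamma V^*(\delta(s,a')) \right] = \max_{a'} Q(s,a'),$$
so we obtain a recursive definition of Q:
$$Q(s,a) = r(s,a) + \gamma \max_{a'} Q(\delta(s,a), a')$$
The algorithm learns an approximation $\hat{Q}$ of Q, represented as a table with a separate entry for each state-action pair.

Algorithm for learning Q (cont.)
- For each $s, a$, initialize the table entry $\hat{Q}(s,a)$ to zero.
- Observe the current state $s$.
- Do forever:
  - The agent observes its current state $s$.
  - It chooses some action $a$ and executes it.
  - It observes the resulting reward $r = r(s,a)$ and the new state $s' = \delta(s,a)$.
  - It updates the table entry for $\hat{Q}(s,a)$ according to the rule $\hat{Q}(s,a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s',a')$.
  - $s \leftarrow s'$
Note that Q-learning propagates $\hat{Q}$ estimates backwards from the new state $s'$ to the old state $s$.

Example 2. Q learning
[Figure: the agent moves from $s_1$ to $s_2$ with action $a_{right}$; the actions available in $s_2$ have $\hat{Q}$ values 63, 81 and 100.]
$$\hat{Q}(s_1, a_{right}) \leftarrow r + \gamma \max_{a'} \hat{Q}(s_2, a') \leftarrow 0 + 0.9 \max\{63, 81, 100\} \leftarrow 90$$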
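
The update rule can be written as a single function and checked against Example 2. This is a sketch under assumptions: the dictionary representation of $\hat{Q}$, the function name `q_update` and the action names at $s_2$ are made up for illustration; only the values 63, 81, 100 and the result 90 come from the example.

```python
from typing import Dict, Iterable, Tuple

QTable = Dict[Tuple[str, str], float]


def q_update(q: QTable, s: str, a: str, r: float, s_next: str,
             actions: Iterable[str], gamma: float) -> float:
    """Apply Q(s, a) <- r + gamma * max_a' Q(s_next, a') and return the new entry."""
    # Entries that were never written count as zero, matching the zero initialization.
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    q[(s, a)] = r + gamma * best_next
    return q[(s, a)]


# Example 2: the estimates at s2 are 63, 81 and 100; the action names are assumed.
actions = ["left", "up", "right"]
q = {("s2", "left"): 63.0, ("s2", "up"): 81.0, ("s2", "right"): 100.0}
new_value = q_update(q, "s1", "right", r=0.0, s_next="s2", actions=actions, gamma=0.9)
```

The new entry for $(s_1, a_{right})$ depends only on the observed reward and on the current estimates at $s_2$, which is how values flow backwards from successor to predecessor.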
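
Example 3. Q learning
Proceeding in episodes from $s_0$ to G, always through $s_1$ and $s_2$. [Figure: path $s_0 \to s_1 \to s_2 \to G$; the number shown next to each state is the current $\hat{Q}$ estimate for the move out of that state. The value for $s_2$ after the first episode is taken to be 100, consistent with $90 = 0.9 \cdot 100$.]
- Initially: $s_0$: 0, $s_1$: 0, $s_2$: 0.
- After episode 1: $s_0$: 0, $s_1$: 0, $s_2$: 100.
- After episode 2: $s_0$: 0, $s_1$: 90, $s_2$: 100.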
- After episode 3: $s_0$: 81, $s_1$: 90, $s_2$: 100.

Convergence of Q learning
Will the algorithm converge toward $\hat{Q}$ equal to the true Q function?
[The slide stating the convergence result was not captured in the transcript.]
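
The backward propagation in Example 3 can be replayed with a short simulation. This is a sketch under assumptions: each state has a single move (so the max over next actions is trivial and $\hat{Q}$ is keyed by state), entering G yields a reward of 100, and $\gamma = 0.9$.

```python
GAMMA = 0.9
PATH = ["s0", "s1", "s2", "G"]  # every episode follows this path


def run_episode(q):
    """Walk the path once, applying Q(s) <- r + gamma * Q(s') at every step."""
    for s, s_next in zip(PATH, PATH[1:]):
        r = 100.0 if s_next == "G" else 0.0  # assumed: only entering G is rewarded
        q[s] = r + GAMMA * q[s_next]


q = {s: 0.0 for s in PATH}  # all estimates start at zero; G stays at zero
history = []
for _ in range(3):
    run_episode(q)
    history.append(dict(q))  # snapshot of the table after each episode
```

Each episode pushes the information one step further from the goal: after the first pass only the move into G has a nonzero estimate, and it takes three passes before $s_0$ is updated.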

Experimentation strategies in Q learning
The algorithm does not specify how actions are chosen!
Exploitation: one possibility is for the agent at state $s$ to choose the action $a$ that maximizes $\hat{Q}(s,a)$, thereby exploiting the current approximation.
- With this strategy, the agent risks failing to explore other actions in other states that have even higher values but have not been visited yet.
- Moreover, the convergence theorem requires every action-state pair to be visited infinitely often.

Experimentation strategies in Q learning (2)
Exploration: a probabilistic approach that gives higher probabilities to actions with higher $\hat{Q}$ values:
$$P(a_i \mid s) = \frac{k^{\hat{Q}(s,a_i)}}{\sum_j k^{\hat{Q}(s,a_j)}}$$
where $k > 0$ determines how strongly the selection favors actions with high $\hat{Q}$ values.
- High $k$: exploit.
- Low $k$: explore.

Nondeterministic rewards and actions
Noisy effectors, games with dice, etc.
$\delta(s,a)$ first produces a probability distribution $P(s' \mid s, a)$ over successor states and then draws an outcome at random from it; similarly for $r(s,a)$.
We assume these probabilities follow the Markov property.
We retrace the line of argument that led to the deterministic algorithm, revising it where needed.
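
The probabilistic selection rule $P(a_i \mid s) = k^{\hat{Q}(s,a_i)} / \sum_j k^{\hat{Q}(s,a_j)}$ can be sketched as follows. The function names and the sample $\hat{Q}$ values are assumptions; the sketch also uses small values on purpose, since `k ** q` can overflow for large estimates.

```python
import random
from typing import Dict


def action_probabilities(q_values: Dict[str, float], k: float) -> Dict[str, float]:
    """P(a_i | s) = k**Q(s, a_i) / sum_j k**Q(s, a_j) for the actions of one state."""
    weights = {a: k ** q for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def choose_action(q_values: Dict[str, float], k: float, rng=random) -> str:
    """Draw one action at random according to the probabilities above."""
    probs = action_probabilities(q_values, k)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions])[0]


# Hypothetical Q-hat estimates for the actions available in one state.
q_values = {"left": 0.0, "up": 1.0, "right": 2.0}
favoring = action_probabilities(q_values, k=2.0)  # weights 1, 2, 4
uniform = action_probabilities(q_values, k=1.0)   # every weight is 1
action = choose_action(q_values, k=2.0, rng=random.Random(0))
```

With $k = 1$ every action is equally likely (pure exploration), and raising $k$ shifts more probability onto the action with the highest estimate while never ruling the others out entirely.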

Nondeterministic value function
We define the nondeterministic value function $V^\pi(s_t)$ for policy $\pi$ as the expected value of the discounted cumulative reward:
$$V^\pi(s_t) \equiv E\left[ \sum_{i=0}^{\infty} \gamma^i r_{t+i} \right]$$
As before, we define the optimal policy $\pi^*$ to be the policy that maximizes $V^\pi(s)$ for all states $s$:
$$\pi^* \equiv \arg\max_{\pi} V^\pi(s), \quad \forall s$$

Nondeterministic value function (cont.)
We generalize the earlier definition of Q by taking its expected value:
$$Q(s,a) \equiv E\left[ r(s,a) + \gamma V^*(\delta(s,a)) \right] = E[r(s,a)] + \gamma E[V^*(\delta(s,a))] = E[r(s,a)] + \gamma \sum_{s'} P(s' \mid s,a)\, V^*(s')$$
As before, we can express Q recursively:
$$Q(s,a) = E[r(s,a)] + \gamma \sum_{s'} P(s' \mid s,a) \max_{a'} Q(s',a')$$

Convergence and training rule
The convergence proof holds for the deterministic case, but the previous learning rule does not converge in the nondeterministic case. The following training rule is sufficient to assure convergence of $\hat{Q}$ to Q:
$$\hat{Q}_n(s,a) \leftarrow (1 - \alpha_n)\, \hat{Q}_{n-1}(s,a) + \alpha_n \left[ r + \gamma \max_{a'} \hat{Q}_{n-1}(s',a') \right], \qquad \alpha_n = \frac{1}{1 + \mathit{visits}_n(s,a)}$$
The bracketed term is the deterministic training rule.

Convergence and training rule (cont.)
Key idea: revisions to $\hat{Q}$ are made more gradually than in the deterministic case.
With $\alpha_n = 1$ we recover the deterministic learning rule.
The choice of $\alpha_n$ given above is one of many that satisfy the conditions for convergence according to the theorem by Watkins and Dayan (1992) (see Mitchell; not included here).
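
A minimal sketch of the nondeterministic training rule with $\alpha_n = 1 / (1 + \mathit{visits}_n(s,a))$ follows. The class name, the single-action setup and the two sample rewards are assumptions; the sketch also assumes that the visit count includes the current visit, so the first update uses $\alpha = 1/2$.

```python
from collections import defaultdict


class NondeterministicQLearner:
    """Tabular Q-hat with a learning rate that decays with the visit count."""

    def __init__(self, actions, gamma):
        self.actions = list(actions)
        self.gamma = gamma
        self.q = defaultdict(float)     # Q-hat table, entries start at zero
        self.visits = defaultdict(int)  # visits_n(s, a)

    def update(self, s, a, r, s_next):
        self.visits[(s, a)] += 1  # assumed: the count includes the current visit
        alpha = 1.0 / (1.0 + self.visits[(s, a)])
        # The bracketed term of the rule: the deterministic update target.
        target = r + self.gamma * max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] = (1.0 - alpha) * self.q[(s, a)] + alpha * target
        return self.q[(s, a)]


# Hypothetical noisy reward: the same move pays 100 once and 0 the next time.
learner = NondeterministicQLearner(actions=["go"], gamma=0.9)
first = learner.update("s", "go", r=100.0, s_next="G")   # alpha = 1/2
second = learner.update("s", "go", r=0.0, s_next="G")    # alpha = 1/3
```

Because each new sample is blended with the old estimate instead of overwriting it, a single unlucky or lucky outcome moves $\hat{Q}$ less and less as the pair is visited more often.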
More informationMetrics for Finite Markov Decision Processes
Metrics for Finite Mrkov Decision Processes Norm Ferns chool of Computer cience McGill University Montrél, Cnd, H3 27 nferns@cs.mcgill.c Prksh Pnngden chool of Computer cience McGill University Montrél,
More informationAutonomous Learning of High-Level States and Actions in Continuous Environments. Jonathan Mugan and Benjamin Kuipers, Fellow, IEEE
Autonomous Lerning of High-Level Sttes nd s in Continuous Environments Jonthn Mugn nd Benjmin Kuipers, Fellow, IEEE Abstrct How cn n gent bootstrp up from low-level representtion to utonomously lern high-level
More informationBest Approximation. Chapter The General Case
Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given
More informationActor-Critic. Hung-yi Lee
Actor-Critic Hung-yi Lee Asynchronous Advntge Actor-Critic (A3C) Volodymyr Mnih, Adrià Puigdomènech Bdi, Mehdi Mirz, Alex Grves, Timothy P. Lillicrp, Tim Hrley, Dvid Silver, Kory Kvukcuoglu, Asynchronous
More informationMATH 144: Business Calculus Final Review
MATH 144: Business Clculus Finl Review 1 Skills 1. Clculte severl limits. 2. Find verticl nd horizontl symptotes for given rtionl function. 3. Clculte derivtive by definition. 4. Clculte severl derivtives
More informationA Fast and Reliable Policy Improvement Algorithm
A Fst nd Relible Policy Improvement Algorithm Ysin Abbsi-Ydkori Peter L. Brtlett Stephen J. Wright Queenslnd University of Technology UC Berkeley nd QUT University of Wisconsin-Mdison Abstrct We introduce
More information1. Gauss-Jacobi quadrature and Legendre polynomials. p(t)w(t)dt, p {p(x 0 ),...p(x n )} p(t)w(t)dt = w k p(x k ),
1. Guss-Jcobi qudrture nd Legendre polynomils Simpson s rule for evluting n integrl f(t)dt gives the correct nswer with error of bout O(n 4 ) (with constnt tht depends on f, in prticulr, it depends on
More informationNon-Linear & Logistic Regression
Non-Liner & Logistic Regression If the sttistics re boring, then you've got the wrong numbers. Edwrd R. Tufte (Sttistics Professor, Yle University) Regression Anlyses When do we use these? PART 1: find
More informationChapter 14. Matrix Representations of Linear Transformations
Chpter 4 Mtrix Representtions of Liner Trnsformtions When considering the Het Stte Evolution, we found tht we could describe this process using multipliction by mtrix. This ws nice becuse computers cn
More informationMath 270A: Numerical Linear Algebra
Mth 70A: Numericl Liner Algebr Instructor: Michel Holst Fll Qurter 014 Homework Assignment #3 Due Give to TA t lest few dys before finl if you wnt feedbck. Exercise 3.1. (The Bsic Liner Method for Liner
More informationDo the one-dimensional kinetic energy and momentum operators commute? If not, what operator does their commutator represent?
1 Problem 1 Do the one-dimensionl kinetic energy nd momentum opertors commute? If not, wht opertor does their commuttor represent? KE ˆ h m d ˆP i h d 1.1 Solution This question requires clculting the
More informationLearning Moore Machines from Input-Output Traces
Lerning Moore Mchines from Input-Output Trces Georgios Gintmidis 1 nd Stvros Tripkis 1,2 1 Alto University, Finlnd 2 UC Berkeley, USA Motivtion: lerning models from blck boxes Inputs? Lerner Forml Model
More informationBisimulation. R.J. van Glabbeek
Bisimultion R.J. vn Glbbeek NICTA, Sydney, Austrli. School of Computer Science nd Engineering, The University of New South Wles, Sydney, Austrli. Computer Science Deprtment, Stnford University, CA 94305-9045,
More information