2D1431 Machine Learning Lab 3: Reinforcement Learning
2D1431 Machine Learning Lab 3: Reinforcement Learning

Frank Hoffmann, modified by Örjan Ekeberg

December 7

1 Introduction

In this lab you will learn about dynamic programming and reinforcement learning. It is assumed that you are familiar with the basic concepts of reinforcement learning and that you have read chapter 13 in the course book Machine Learning (Mitchell, 1997). The first four chapters of the survey on reinforcement learning by Kaelbling et al. (1996) are good supplementary material. For further reading and a detailed discussion of policy iteration and reinforcement learning, the textbook Reinforcement Learning is highly recommendable (Sutton and Barto, 1999). In particular, studying chapters 3, 4 and 6 is of immense help for this lab. The predefined Matlab functions for this lab are located in the course directory /info/mi04/labs/lab3.

Dynamic programming refers to a class of algorithms that can be used to compute optimal policies given a complete model of the environment. Dynamic programming solves problems that can be formulated as Markov decision processes. Unlike in the reinforcement learning case, dynamic programming assumes that the state transition and reward functions are known. The central idea of dynamic programming and reinforcement learning is to learn value functions, which in turn can be used to identify the optimal policy.

2 Policy Evaluation and Policy Iteration

First we consider policy evaluation, namely how to compute the state-value function V^π for an arbitrary policy π. For the deterministic case the value
function has to obey the Bellman equation

V^π(s) = r(s, π(s)) + γ V^π(δ(s, π(s)))    (1)

where δ : S × A → S and r : S × A → R are the deterministic state transition and reward functions. This equation can either be solved directly, by solving a linear equation of the type

V = R + BV    (2)

where V and R are vectors and B is a matrix. An alternative is to solve equation (1) by successive approximation, considering the Bellman equation as an update rule

V^π_{k+1}(s) = r(s, π(s)) + γ V^π_k(δ(s, π(s)))    (3)

The sequence of V^π_k can be shown to converge to V^π as k → ∞. This method is called iterative policy evaluation.

If the policy is stochastic, i.e., the action a in a given situation s is drawn from a probability distribution over possible actions, then we will use π(s, a) to denote the probability of taking action a. The iterative Bellman equation then has the following form:

V^π_{k+1}(s) = Σ_a π(s, a) ( r(s, a) + γ V^π_k(δ(s, a)) )    (4)

For the non-deterministic case, the transition and reward functions have to be replaced by probabilistic functions. In that case the Bellman equation becomes:

V^π(s) = Σ_{s′} P(s′ | s, π(s)) ( R(s, s′, π(s)) + γ V^π(s′) )    (5)

where P(s′ | s, a) is the probability that the next state is s′ when executing action a in state s, and R(s, s′, a) is the reward when executing action a in state s and transitioning to the next state s′. Policy evaluation for the non-deterministic case can be formulated as an update rule similar to equation (3) by

V^π_{k+1}(s) = Σ_{s′} P(s′ | s, π(s)) ( R(s, s′, π(s)) + γ V^π_k(s′) )    (6)

Our main motivation for computing the value function for a policy is to improve on our current policy. For some state s we can improve our current policy by picking an alternative action a ≠ π(s) that deviates from our current policy π(s) if it has a higher action value function Q(s, a) >
Q(s, π(s)). This process is called policy improvement. In other words, for each state s we greedily choose the action a that maximizes Q^π(s, a)

π′(s) = argmax_a Q^π(s, a) = argmax_a ( r(s, a) + γ V^π(δ(s, a)) )    (7)

Once a policy π has been improved using V^π to yield a better policy π′, we can then compute V^π′ and improve it again to yield an even better π′′. Policy iteration intertwines policy evaluation and policy improvement according to

V^π_{k+1}(s) = max_a Q(s, a) = max_a ( r(s, a) + γ V^π_k(δ(s, a)) )
π_{k+1}(s) = argmax_a Q(s, a) = argmax_a ( r(s, a) + γ V^π_k(δ(s, a)) )    (8)

For the non-deterministic case we obtain

V^π_{k+1}(s) = max_a Q(s, a) = max_a Σ_{s′} P(s′ | s, a) ( R(s, s′, a) + γ V^π_k(s′) )
π_{k+1}(s) = argmax_a Q(s, a) = argmax_a Σ_{s′} P(s′ | s, a) ( R(s, s′, a) + γ V^π_k(s′) )    (9)

It can be shown that policy iteration converges to the optimal policy. Notice that each policy evaluation, itself an iterative computation, is started with the value function for the previous policy.

Assume a grid world of 4 × 4 cells that correspond to 16 states s_1, …, s_16 as shown in Figure 1. In each state the agent can choose one of the four possible actions (North, West, South, East) in order to move to a neighboring cell. If the agent attempts to move beyond the limits of the grid world, for example going east in state s_8 located at the right edge, it remains in the original cell but incurs a penalty of -1. There are two special cells A (s_1) and B (s_3) from which the agent is beamed to the cells A′ (s_13) and B′ (s_11) respectively, independent of the action it chooses. When being beamed it receives a reward of +10 for the transition from A to A′ and a reward of +5 for the transportation from B to B′. For all other moves that do not attempt to lead outside the grid world the reward is zero. There are no terminal states, and the agent tries to maximize its future discounted rewards over an infinite horizon. Assume a discount factor of γ = 0.9. Due to the discount factor the accumulated reward remains finite even if the
problem has an infinite horizon. Notice that returning from B′ to B takes a minimum of two steps, whereas going back to A from A′ takes at least three steps. Therefore, it is not immediately obvious which policy is optimal.

Figure 1: Grid world. Independent of the action taken by the agent in cell A, it is beamed to cell A′ and receives a reward of +10. The same applies to B and B′ with a reward of +5.

Assignment 1:

Use value iteration to compute the value function V^π(s) for an equiprobable policy in which at each state all four possible actions (including the ones that attempt to cross the boundary of the grid world) have the same uniform probability π(s, a) = 1/4. Assume a discount factor γ = 0.9. Use value iteration according to the Bellman equation (4) to approximate the value function. You can either use two arrays, one for the old values V^π_k(s) and one for the new values V^π_{k+1}(s); this way the new values can be computed one by one from the old values without the old values being changed. It turns out, however, that it is easier to use asynchronous (in-place) updates, with each new value immediately overwriting the old one. Asynchronous updating also converges to V^π; in fact it usually converges faster than the synchronous two-array version. As an example we compute the new value of state s_8. For the four possible actions
North, West, South, East the successor states are δ(s_8, North) = s_4, δ(s_8, South) = s_12, δ(s_8, West) = s_7 and δ(s_8, East) = s_8 (the agent attempts to leave the grid world and remains in the same square). The rewards are all zero except for the penalty r(s_8, East) = -1 when taking the East action. All actions are equally likely, therefore π(s_8, North) = π(s_8, South) = π(s_8, West) = π(s_8, East) = 1/4. In Matlab we use a vector of length 16 to store the value function. The update rule for state s_8 would look like:

>> gamma = 0.9;
>> V = zeros(16,1);
>> V(8) = 1/4 * (-1 + gamma * (V(4) + V(7) + V(12) + V(8)))

The Matlab function plot_v(V,range,pi) plots the state value function as a color plot. The first argument V is a 16 × 1 vector with the state values V(s_i). The second optional argument range is a 2 × 1 vector to specify the lower and upper bound of the value function for scaling the color plot. The default range is [-10 30]. The third optional argument pi is a 16 × 1 vector for specifying the current policy π(s) : S → A, where by definition the actions North, East, South, West are clockwise enumerated from 1 to 4.

Use policy iteration based on equation (8) to compute the optimal value function V* and policy π*. It might be easier to use the action value function Q(s, a) rather than the state value function V(s). In Matlab you represent Q(s, a) by a 16 × 4 matrix, where the first dimension corresponds to the state and the second dimension to the action. Visualize the optimal value function and policy using plot_v. After how many iterations does the algorithm find an optimal policy, assuming the initial state values are zero? Is the optimal policy unique? What happens if you initialize the state value function with random values rather than zero

>> V = 10.0*rand(16,1);

Does the algorithm converge to a different policy?

Assignment 2:

Assume that the transition function is no longer deterministic, but given by the probability P(s′ | s, a). Compute the optimal value function V* and
policy π* using policy iteration according to equations (9) for a nondeterministic state transition function. Assume that with probability p = 0.7 the agent moves to the correct square as indicated by the desired action, but with probability 1 - p = 0.3 a random action is taken that pushes the agent to a random neighboring square. The random square can coincidentally be the very same cell that was originally preferred by the action. A random action can also be an illegal move that incurs a penalty of -1. Visualize the optimal value function and policy using plot_v. After how many iterations does the algorithm find an optimal policy, assuming the initial state values are zero? Is the optimal policy unique?

3 Temporal Difference Learning

This assignment deals with the general reinforcement learning problem, in that we no longer assume that the state transition and reward functions are known. Temporal difference (TD) learning learns directly from experience and does not rely on a model of the environment's dynamics. TD methods update the estimate of the action value function based on learned estimates; in other words, unlike Monte Carlo methods, which update their estimates only at the end of an episode, they bootstrap and update their beliefs immediately after each state transition. For more details on temporal difference learning read chapters six and seven of the reinforcement learning book by Sutton and Barto (1999).

Temporal difference learning is easier formulated using the action value function Q(s, a) rather than the state value function V(s), which are related through

Q^π(s, a) = Σ_{s′} P(s′ | s, a) ( R(s, s′, a) + γ V^π(s′) )    (10)

In contrast to dynamic programming, the agent learns through interaction with the environment. There is a need for active exploration of the state space and the possible actions. At each state s the agent chooses an action a according to its current policy, and observes an immediate reward r and a new state s′. This sequence of state, action, reward, state, action motivates the name SARSA for this form of learning. The action value function can be learned by means of off-policy TD learning, also called Q-learning.
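Equation (10) is straightforward to implement once the model is available. The following sketch is in Python/NumPy for illustration only (the lab itself uses Matlab), and the array layout `P[s, a, s']`, `R[s, a, s']` is our own convention, not part of the lab's interface:

```python
import numpy as np

def q_from_v(P, R, V, gamma):
    """Equation (10): Q^pi(s,a) = sum_s' P(s'|s,a) (R(s,s',a) + gamma V^pi(s')).

    P[s, a, s2] holds transition probabilities, R[s, a, s2] the rewards,
    and V[s] the state values under the current policy.
    """
    # Broadcast V over (s, a), then sum out the successor-state axis.
    return np.einsum('kas,kas->ka', P, R + gamma * V[None, None, :])

# Tiny hand-checkable example: 2 states, 1 action.
# The action moves deterministically from state 0 to state 1 (reward 1)
# and keeps state 1 in place (reward 0).
P = np.zeros((2, 1, 2)); P[0, 0, 1] = 1.0; P[1, 0, 1] = 1.0
R = np.zeros((2, 1, 2)); R[0, 0, 1] = 1.0
V = np.array([0.0, 2.0])
Q = q_from_v(P, R, V, gamma=0.9)
# Q(0,0) = 1 + 0.9*V(1) = 2.8 and Q(1,0) = 0 + 0.9*V(1) = 1.8
```

The same one-liner also covers equations (5), (6) and (9): evaluating or improving a policy reduces to combining this Q with π, a max, or an argmax.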
In its simplest form, one-step Q-learning, it is defined by the update rule

Q(s, a) ← Q(s, a) + α ( r + γ max_{a′} Q(s′, a′) - Q(s, a) )    (11)
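As a sanity check, update rule (11) can be exercised on a small hypothetical Q-table; the numbers below are not from the lab, just hand-computed from the rule (Python rather than Matlab, for illustration):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha, gamma):
    """One-step Q-learning update, equation (11), applied in place."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# Hand-checked example with a 2-state, 2-action table.
Q = np.zeros((2, 2))
Q[1] = [0.5, 1.0]                        # current estimates for state s' = 1
q_update(Q, s=0, a=0, r=1.0, s_next=1, alpha=0.1, gamma=0.9)
# New Q(0,0) = 0 + 0.1 * (1 + 0.9 * 1.0 - 0) = 0.19
```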
In this case, the learned action-value function Q(s, a) directly approximates the optimal value function Q*(s, a), independent of the policy followed, hence off-policy learning. However, the policy π(s, a) : S × A → R (π(s, a) is the probability of taking action a in state s) still has an effect in that it determines which state-action pairs are visited and updated. All temporal difference methods have a need for active exploration, which requires that the agent every now and then tries alternative actions that are not necessarily optimal according to its current estimates of Q(s, a). The policy is generally soft, meaning that π(s, a) > 0 for all states and actions. An ε-greedy policy satisfies this requirement, in that most of the time, with probability 1 - ε, it picks the optimal action according to

π(s) = argmax_a Q(s, a)    (12)

but with a small probability ε it takes a random action. Therefore, all nongreedy actions are taken with the probability π(s, a) = ε/|A(s)|, where |A(s)| is the number of alternative actions in state s. As the agent collects more and more evidence the policy shifts towards a deterministic optimal policy. This can be achieved by decreasing ε with an increasing number of observations, for example according to

ε(t) = ε_0 (1 - t/T)    (13)

where T is the total number of iterations. Reasonable values for the learning and exploration rate are α = 0.1 and ε_0 = 0.2. The off-policy TD algorithm can be summarized as

Initialize Q(s, a) arbitrarily
Initialize s
Repeat for each step:
    Choose a from s using the ε-greedy policy based on Q(s, ·)
    Take action a, observe the reward r and the next state s′
    Update Q(s, a) ← Q(s, a) + α ( r + γ max_{a′} Q(s′, a′) - Q(s, a) )
    Replace s with s′
until T steps
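The loop above can be sketched in a few lines. This Python version substitutes a made-up two-state chain environment for the lab's env and startstate functions, so the environment, its rewards, and the function names here are assumptions for illustration, not the lab's actual setup:

```python
import random

# Stand-in for the lab's env/startstate functions: a two-state chain in
# which action 1 moves toward state 1 and earns a reward when leaving state 0.
def env(s, a):
    s_new = 1 if a == 1 else 0
    reward = 1.0 if (s == 0 and a == 1) else 0.0
    return s_new, reward

def q_learning(n_states=2, n_actions=2, T=2000, alpha=0.1, gamma=0.9, eps0=0.2):
    Q = [[0.1] * n_actions for _ in range(n_states)]   # small optimistic init
    s = 0                                              # startstate
    for t in range(T):
        eps = eps0 * (1 - t / T)                       # decaying epsilon, eq. (13)
        if random.random() < eps:                      # epsilon-greedy choice
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: Q[s][i])
        s_new, r = env(s, a)
        # One-step Q-learning update, equation (11)
        Q[s][a] += alpha * (r + gamma * max(Q[s_new]) - Q[s][a])
        s = s_new
    return Q

random.seed(0)
Q = q_learning()
# After training, the greedy policy should prefer action 1 in state 0,
# since that is the only transition that pays a reward.
```

Note that only the agent side mirrors the algorithm box; in Assignment 3 the environment is hidden behind the provided Matlab env function instead.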
Assignment 3:

For an unknown environment the agent is supposed to learn the optimal policy by means of off-policy temporal difference learning. The state space consists of 25 states s_1, …, s_25, corresponding to a 5 × 5 grid world. In each state the agent has the choice between four possible actions a_1, …, a_4, which can be associated with the four directions North, East, South, West. However, the transition function is not deterministic, which means the agent sometimes ends up in a non-neighboring square. Assume that the exact model of the environment and the rewards are unknown.

The dynamics of the environment are determined by the Matlab functions s = startstate and [s_new reward] = env(s_old,action). The function startstate returns the initial state. The states s_1, …, s_25 are represented by the integers 1, …, 25, and the actions a_1, …, a_4 are enumerated by 1, …, 4. The function [s_new reward] = env(s_old,action) computes the next state s_new and the reward reward when executing action action in the current state s_old.

Represent the action value function Q(s, a) by a 25 × 4 matrix Q. Given Q you can compute the optimal policy pi(s) and state value function V and visualize them with plot_v_td(V,range,pi) using the following code

>> [V pi] = max(Q,[],2);
>> plot_v_td(V,[-5 15],pi);

The function plot_v_td(V,range,pi) is the counterpart to the Matlab function plot_v(V,range,pi) for the 4 × 4 grid world used in the earlier assignments. The function plot_trace(states,actions,tlength) can be used to plot a trace of the most recently visited states. The parameter states is an N × 1 vector that contains the history of recent states s(t), …, s(t + N), the parameter actions is an N × 1 vector that stores the history of recent actions a(t - 1), …, a(t + N - 1), and tlength determines how many states from the past are plotted. Build a history of states, actions and rewards when iterating the TD-learning algorithm, by appending the new state s, action a and reward r to the history of previous states, actions and rewards.

>> for k=1:iterations
>> ...
>> states = [states s];
>> actions = [actions a];
>> rewards = [rewards r];
>> ...
>> end
>> plot_trace(states,actions,12);

Run the off-policy TD learning algorithm for T steps. Initialize Q(s, a) with small positive values (e.g. 0.1) in order to bias the TD-learning to explore alternative actions in the early stages, when most of the time the rewards are zero. Every 500 steps visualize the current state value function V(s), the optimal policy π(s) and a plot of the trace of the recently visited states and actions. Compute the average reward over the past 500 steps and plot the evolution of the average and accumulated reward as a function of the number of iterations. Experiment with different settings for the exploration parameter ε_0 and the learning rate α.

Can you think of an extension to the one-step TD-learning algorithm that would help to learn the optimal policy in fewer iterations? If you have time, try to implement this extension.

References

L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.

T. M. Mitchell. Machine Learning. McGraw Hill, 1997.

R. Sutton and A. Barto. Reinforcement Learning. MIT Press, 1999. Also available online.
More informationConvergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Mchine Lerning, 39, 287 308, 2000. c 2000 Kluwer Acdemic Publishers. Printed in The Netherlnds. Convergence Results for Single-Step On-Policy Reinforcement-Lerning Algorithms SATINDER SINGH AT&T Lbs-Reserch,
More informationMATH34032: Green s Functions, Integral Equations and the Calculus of Variations 1
MATH34032: Green s Functions, Integrl Equtions nd the Clculus of Vritions 1 Section 1 Function spces nd opertors Here we gives some brief detils nd definitions, prticulrly relting to opertors. For further
More informationHow do you know you have SLE?
Simultneous Liner Equtions Simultneous Liner Equtions nd Liner Algebr Simultneous liner equtions (SLE s) occur frequently in Sttics, Dynmics, Circuits nd other engineering clsses Need to be ble to, nd
More informationStudent Activity 3: Single Factor ANOVA
MATH 40 Student Activity 3: Single Fctor ANOVA Some Bsic Concepts In designed experiment, two or more tretments, or combintions of tretments, is pplied to experimentl units The number of tretments, whether
More informationp-adic Egyptian Fractions
p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction
More informationChapter 14. Matrix Representations of Linear Transformations
Chpter 4 Mtrix Representtions of Liner Trnsformtions When considering the Het Stte Evolution, we found tht we could describe this process using multipliction by mtrix. This ws nice becuse computers cn
More informationHow do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?
XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk bout solving systems of liner equtions. These re problems tht give couple of equtions with couple of unknowns, like: 6 2 3 7 4
More informationActor-Critic. Hung-yi Lee
Actor-Critic Hung-yi Lee Asynchronous Advntge Actor-Critic (A3C) Volodymyr Mnih, Adrià Puigdomènech Bdi, Mehdi Mirz, Alex Grves, Timothy P. Lillicrp, Tim Hrley, Dvid Silver, Kory Kvukcuoglu, Asynchronous
More informationCS 275 Automata and Formal Language Theory
CS 275 Automt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Problem (II) Chpter II.5.: Properties of Context Free Grmmrs (14) Anton Setzer (Bsed on book drft by J. V. Tucker nd K. Stephenson)
More informationCS 188: Artificial Intelligence Fall Announcements
CS 188: Artificil Intelligence Fll 2009 Lecture 20: Prticle Filtering 11/5/2009 Dn Klein UC Berkeley Announcements Written 3 out: due 10/12 Project 4 out: due 10/19 Written 4 proly xed, Project 5 moving
More informationMain topics for the First Midterm
Min topics for the First Midterm The Midterm will cover Section 1.8, Chpters 2-3, Sections 4.1-4.8, nd Sections 5.1-5.3 (essentilly ll of the mteril covered in clss). Be sure to know the results of the
More informationPhysics 202H - Introductory Quantum Physics I Homework #08 - Solutions Fall 2004 Due 5:01 PM, Monday 2004/11/15
Physics H - Introductory Quntum Physics I Homework #8 - Solutions Fll 4 Due 5:1 PM, Mondy 4/11/15 [55 points totl] Journl questions. Briefly shre your thoughts on the following questions: Of the mteril
More informationCS 188: Artificial Intelligence Fall 2010
CS 188: Artificil Intelligence Fll 2010 Lecture 18: Decision Digrms 10/28/2010 Dn Klein C Berkeley Vlue of Informtion 1 Decision Networks ME: choose the ction which mximizes the expected utility given
More informationLecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.
Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one
More informationPlanning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments
Plnning to Be Surprised: Optiml Byesin Explortion in Dynmic Environments Yi Sun, Fustino Gomez, nd Jürgen Schmidhuber IDSIA, Glleri 2, Mnno, CH-6928, Switzerlnd {yi,tino,juergen}@idsi.ch Abstrct. To mximize
More informationNUMERICAL INTEGRATION
NUMERICAL INTEGRATION How do we evlute I = f (x) dx By the fundmentl theorem of clculus, if F (x) is n ntiderivtive of f (x), then I = f (x) dx = F (x) b = F (b) F () However, in prctice most integrls
More informationBest Approximation. Chapter The General Case
Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given
More information1 The Riemann Integral
The Riemnn Integrl. An exmple leding to the notion of integrl (res) We know how to find (i.e. define) the re of rectngle (bse height), tringle ( (sum of res of tringles). But how do we find/define n re
More information1B40 Practical Skills
B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need
More informationBayesian Networks: Approximate Inference
pproches to inference yesin Networks: pproximte Inference xct inference Vrillimintion Join tree lgorithm pproximte inference Simplify the structure of the network to mkxct inferencfficient (vritionl methods,
More informationJim Lambers MAT 169 Fall Semester Lecture 4 Notes
Jim Lmbers MAT 169 Fll Semester 2009-10 Lecture 4 Notes These notes correspond to Section 8.2 in the text. Series Wht is Series? An infinte series, usully referred to simply s series, is n sum of ll of
More informationCBE 291b - Computation And Optimization For Engineers
The University of Western Ontrio Fculty of Engineering Science Deprtment of Chemicl nd Biochemicl Engineering CBE 9b - Computtion And Optimiztion For Engineers Mtlb Project Introduction Prof. A. Jutn Jn
More informationNumerical Linear Algebra Assignment 008
Numericl Liner Algebr Assignment 008 Nguyen Qun B Hong Students t Fculty of Mth nd Computer Science, Ho Chi Minh University of Science, Vietnm emil. nguyenqunbhong@gmil.com blog. http://hongnguyenqunb.wordpress.com
More informationW. We shall do so one by one, starting with I 1, and we shall do it greedily, trying
Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)
More informationPhysics 116C Solution of inhomogeneous ordinary differential equations using Green s functions
Physics 6C Solution of inhomogeneous ordinry differentil equtions using Green s functions Peter Young November 5, 29 Homogeneous Equtions We hve studied, especilly in long HW problem, second order liner
More informationReview of Gaussian Quadrature method
Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge
More informationMath 520 Final Exam Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008
Mth 520 Finl Exm Topic Outline Sections 1 3 (Xio/Dums/Liw) Spring 2008 The finl exm will be held on Tuesdy, My 13, 2-5pm in 117 McMilln Wht will be covered The finl exm will cover the mteril from ll of
More informationThe solutions of the single electron Hamiltonian were shown to be Bloch wave of the form: ( ) ( ) ikr
Lecture #1 Progrm 1. Bloch solutions. Reciprocl spce 3. Alternte derivtion of Bloch s theorem 4. Trnsforming the serch for egenfunctions nd eigenvlues from solving PDE to finding the e-vectors nd e-vlues
More informationf(x)dx . Show that there 1, 0 < x 1 does not exist a differentiable function g : [ 1, 1] R such that g (x) = f(x) for all
3 Definite Integrl 3.1 Introduction In school one comes cross the definition of the integrl of rel vlued function defined on closed nd bounded intervl [, b] between the limits nd b, i.e., f(x)dx s the
More informationGenetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary
Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed
More informationHere we study square linear systems and properties of their coefficient matrices as they relate to the solution set of the linear system.
Section 24 Nonsingulr Liner Systems Here we study squre liner systems nd properties of their coefficient mtrices s they relte to the solution set of the liner system Let A be n n Then we know from previous
More informationChapter 3. Vector Spaces
3.4 Liner Trnsformtions 1 Chpter 3. Vector Spces 3.4 Liner Trnsformtions Note. We hve lredy studied liner trnsformtions from R n into R m. Now we look t liner trnsformtions from one generl vector spce
More informationEfficient Planning. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction
Efficient Plnning 1 Tuesdy clss summry: Plnning: ny computtionl process tht uses model to crete or improve policy Dyn frmework: 2 Questions during clss Why use simulted experience? Cn t you directly compute
More information