Bellman goes Relational


Kristian Kersting¹ (kersting@informatik.uni-freiburg.de)
University of Freiburg, Machine Learning Lab, Georges-Koehler-Allee 079, Freiburg, Germany

Martijn Van Otterlo¹ (otterlo@cs.utwente.nl)
University of Freiburg, Machine Learning Lab, Georges-Koehler-Allee 079, Freiburg, Germany
Twente University, Department of Computer Science, TKI, P.O. Box 217, 7500 AE Enschede, The Netherlands

Luc De Raedt (deraedt@informatik.uni-freiburg.de)
University of Freiburg, Machine Learning Lab, Georges-Koehler-Allee 079, Freiburg, Germany

Appearing in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada. Copyright by the authors.

Abstract

Motivated by the interest in relational reinforcement learning, we introduce a novel relational Bellman update operator called ReBel. It employs a constraint logic programming language to compactly represent Markov decision processes over relational domains. Using ReBel, a novel value iteration algorithm is developed in which abstraction (over states and actions) plays a major role. This framework provides new insights into relational reinforcement learning. Convergence results as well as experiments are presented.

1. Introduction

There has been a lot of attention and progress in reinforcement learning (RL) and Markov decision processes (MDPs) recently. Several basic algorithms have been proposed and their behavior is relatively well understood today (Sutton & Barto, 1998). This has led to an increased interest in the effects of generalization and to new challenges. One of them concerns the use of RL in relational domains (Džeroski et al., 2001). Even though a number of relational RL algorithms have been developed, essentially through varying the underlying function approximators (Driessens & Ramon, 2003; Gärtner et al., 2003), the problem of relational RL is still not well understood and a theory of relational RL is lacking.

In traditional RL, the Bellman backup operator is one of the central concepts.
A particularly interesting approach is that of Dietterich and Flann (1997), who showed that value backups in model-based RL can be upgraded to region-based backups, where multiple states are updated simultaneously using a backup operator that reverses the action operators. Inspired by this work, the key contribution of this paper is the introduction of a relational Bellman backup operator, called ReBel. ReBel is developed within a simple probabilistic STRIPS-like relational formalism that incorporates several elements of relational and logical Markov Decision Programming (Kersting & De Raedt, 2003; Van Otterlo, 2004) such as abstract states that are represented using relational queries. Using ReBel, we then develop a model-based relational RL algorithm and demonstrate it on a number of experiments.

The approach is also related to that by Boutilier et al. (2001) who employ a situation calculus based language. Although their work is certainly elegant and principled, due to the complexity of the language, they neither report on a complete implementation nor present automated experiments. In contrast, our approach is simpler and therefore fully automated. It deals fully automatically with the same experimental example that Boutilier et al. report on.

Outline: Section 2 briefly reviews relational logic and MDPs. After discussing value iteration (VI) for MDPs in Section 3, we introduce a language to compactly specify MDPs over relational domains in Section 4. In Section 5, we develop a relational VI algorithm based on ReBel. It is empirically validated in Section 6. Before concluding we discuss related work.

¹ Both authors contributed equally to the paper.

2. Preliminaries

Relational Logic, cf. (Nienhuys-Cheng & de Wolf, 1997): An alphabet Σ is a set of relation symbols p with arity $m \geq 0$, and a set of constants c. An atom $p(t_1, \ldots, t_m)$ is a relation symbol p followed by a bracketed m-tuple of terms $t_i$. A term is a variable X or a constant c. A conjunction A is a set of atoms. The set of variables in a conjunction A is denoted as vars(A). A substitution θ is a set of assignments of terms to variables $\{X_1/t_1, \ldots, X_n/t_n\}$ where the $X_i$ are variables and all $t_i$ are terms. A term, atom or conjunction is called ground if it contains no variables. Conjunctions are implicitly assumed to be existentially quantified. A conjunction A is said to be θ-subsumed by a conjunction B, denoted by $A \leq_\theta B$, if there exists a substitution θ such that $B\theta \subseteq A$. The most general unifier (mgu) for atoms a and b is denoted by mgu(a, b). A (Horn) clause $H \leftarrow B$ consists of a positive atom H and a conjunction B and can be read as "H is true if B is true". The greatest lower bound (glb) of two conjunctions A and B is the most general conjunction that is subsumed by both A and B. Both subsumption and glb are also defined for clauses. The Herbrand base of Σ, $HB_\Sigma$, is the set of all ground atoms which can be constructed with the predicate symbols and constants of Σ. An interpretation is a subset of $HB_\Sigma$.

Our running example will be a blocks world. Here, a block X can be moved on top of another block Y, denoted as move(X, Y). Valid relations are on(X, Y), i.e. block X is on Y, and cl(Z), i.e. block Z is clear. To model the floor, we follow a common approach. It is a set of blocks which cannot be on top of other blocks.

Markov Decision Processes, cf. (Sutton & Barto, 1998): A Markov Decision Process (MDP) is a tuple $M = \langle S, A, T, R \rangle$, where S is a set of states, A a set of actions, $T : S \times A \times S \to [0, 1]$ a transition model and $R : S \times A \times S \to [0, 1]$ a reward model. The set of actions applicable in a state $s \in S$ is denoted A(s). A transition from state $i \in S$ to $j \in S$ caused by some action $a \in A(i)$ occurs with probability T(i, a, j) and a reward R(i, a, j) is received.
T defines a proper probability distribution if for all states $i \in S$ and all actions $a \in A(i)$: $\sum_{j \in S} T(i, a, j) = 1$. A deterministic policy $\pi : S \to A$ for M specifies which action $a \in A(s)$ will be executed when the agent is in some state $s \in S$, i.e. $\pi(s) = a$.

3. Value Iteration

Given some MDP $M = \langle S, A, T, R \rangle$, a policy π for M, and a discount factor $\gamma \in [0, 1]$, the state value function $V^\pi : S \to \mathbb{R}$ represents the value of being in a state following policy π, w.r.t. expected rewards. A similar state-action value function $Q^\pi : S \times A \to \mathbb{R}$ can be defined. A policy $\pi^*$ is optimal if $V^{\pi^*}(s) \geq V^{\pi'}(s)$ for all $s \in S$ and all $\pi'$. Optimal value functions are denoted $V^*$ and $Q^*$. Bellman's (1957) optimality equation states:

$$V^*(s) = \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^*(s') \right] \quad (1)$$

From this equation, basically all methods for solving MDPs can be derived. For example, the well-known exact solution technique value iteration (VI) is obtained from (1) by turning it into an update rule:

$$V_{t+1}(s) = \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_t(s') \right] \quad (2)$$

$$\phantom{V_{t+1}(s)} = \max_a Q_{t+1}(s, a). \quad (3)$$

Based on Equation (2), the VI algorithm can be stated as follows: starting with a value function $V_0$ over all states, we iteratively update the value of each state according to (2) to get the next value functions $V_t$ (t = 1, 2, 3, ...). VI is guaranteed to converge in the limit towards $V^*$, i.e. the Bellman optimality equation (1) holds for each state.

Traditional VI as expressed by Equation (2) assumes that all states and values are represented explicitly in a table. This is impractical for all but the smallest state spaces. Furthermore, for relational domains, where the number of states can grow very large (even infinitely large), this is infeasible. Therefore, methods that abstract from specific states are needed. Such a method is developed in the next sections.

4. Markov Decision Programs

Traditional MDPs are essentially propositional in that each state can be represented using a separate proposition. In Markov decision programs these propositional symbols are replaced by abstract states:

Definition 1 An abstract state is a conjunction Z of logical atoms, i.e., a logical query.

Abstract states represent sets of states.
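As a point of comparison for the abstract machinery that follows, the tabular update rule (2) of Section 3 can be sketched in a few lines of Python. This is a minimal sketch and not part of ReBel: the three-state MDP, its state and action names, and the convergence tolerance are all made up for illustration. Only the discount factor 0.9 and the goal reward 10 are borrowed from the experimental set-up described later.

```python
# Tabular value iteration for a tiny, hypothetical MDP (illustrative only).
GAMMA = 0.9

# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "far": {"step": [(0.9, "near", 0.0), (0.1, "far", 0.0)]},
    "near": {"step": [(0.9, "goal", 10.0), (0.1, "near", 0.0)]},
    "goal": {"stay": [(1.0, "goal", 0.0)]},  # absorbing, zero reward
}


def q_value(values, state, action):
    """Expected one-step return of `action` in `state`, cf. Equation (3)."""
    return sum(p * (r + GAMMA * values[nxt])
               for p, nxt, r in TRANSITIONS[state][action])


def value_iteration(tolerance=1e-10):
    """Apply update rule (2) to all states until the values stop changing."""
    values = {state: 0.0 for state in TRANSITIONS}
    while True:
        new_values = {
            state: max(q_value(values, state, action)
                       for action in TRANSITIONS[state])
            for state in TRANSITIONS
        }
        change = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if change < tolerance:
            return values


if __name__ == "__main__":
    for state, value in value_iteration().items():
        print(state, round(value, 4))
```

The table `values` holds one entry per ground state, which is exactly the representation that becomes infeasible in relational domains.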
More formally, a state is an interpretation, i.e. a set of ground facts. Consider e.g. the state $z \equiv cl(a), cl(b), on(a, c)$ in the blocks world. An abstract state Z is, e.g., cl(X). It represents all states that are subsumed by Z, i.e., all interpretations in which there exists something that is clear. We can now introduce the basic ingredients of Markov decision programs, namely, abstract actions, abstract rewards, and integrity constraints.
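The statement that a ground state is covered by an abstract state can be checked mechanically. The following is a minimal sketch under stated assumptions: atoms are encoded as Python tuples such as `("on", "a", "c")`, terms starting with an uppercase letter are variables, and the helper names `match_atom` and `find_substitution` are hypothetical. Inequality constraints are ignored.

```python
def is_variable(term):
    """Convention of this sketch: variables start with an uppercase letter."""
    return term[0].isupper()


def match_atom(pattern, fact, theta):
    """Extend theta so that pattern*theta equals fact, or return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_variable(p):
            if theta.setdefault(p, f) != f:  # conflicting earlier binding
                return None
        elif p != f:
            return None
    return theta


def find_substitution(abstract_state, state, theta=None):
    """Return a substitution theta with abstract_state*theta a subset of
    state, i.e. a witness that state is subsumed, or None if none exists."""
    theta = {} if theta is None else theta
    if not abstract_state:
        return theta
    first, rest = abstract_state[0], abstract_state[1:]
    for fact in state:
        extended = match_atom(first, fact, theta)
        if extended is not None:
            result = find_substitution(rest, state, extended)
            if result is not None:
                return result
    return None


# The state z and the abstract state cl(X) from the text.
z = [("cl", "a"), ("cl", "b"), ("on", "a", "c")]
print(find_substitution([("cl", "X")], z))
```

The search backtracks over all facts, so it is exponential in the worst case. That is acceptable for illustration but says nothing about how an actual implementation performs the test.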

An abstract action is defined as follows.

Definition 2 An action² is a finite set of action rules $H_i \xleftarrow{p_i : A} B$ where A is an atom representing the name and the arguments of the action and B is an abstract state denoting the preconditions of A. $H_i$ is the i-th possible outcome of A. It holds that $\sum_i p_i = 1$. We assume that $vars(A) = (vars(H_i) \cup vars(B))$.

The semantics of the action definition are: If the current state b is subsumed by B, i.e., $b \leq_\theta B$, then taking action A will result in $[b \setminus B\theta] \cup H_i\theta$ with probability $p_i$. So, if the preconditions are fulfilled, all outcomes are possible. As an illustration, consider

$$on(X, Y), cl(X), cl(Z), X \neq Y, Y \neq Z, X \neq Z \xleftarrow{0.9 : move(X, Y, Z)} cl(X), cl(Y), on(X, Z), X \neq Y, Y \neq Z, X \neq Z.$$

$$cl(X), cl(Y), on(X, Z), X \neq Y, Y \neq Z, X \neq Z \xleftarrow{0.1 : move(X, Y, Z)} cl(X), cl(Y), on(X, Z), X \neq Y, Y \neq Z, X \neq Z.$$

which moves block X on Y with probability 0.9. With probability 0.1 the action fails, i.e., we do not change the state. Applied to the above state z the action tells us that move(a, b, c) will result in $z' \equiv on(a, b), cl(a), cl(c)$ with probability 0.9 and with probability 0.1 we stay in z. This type of action definition implements a kind of probabilistic STRIPS operator.

The model R of abstract rewards specifies the rewards generated by entering abstract states. In our framework it coincides with our initial abstract state value function $V_0$.

Definition 3 An abstract state value function V is a finite list of value rules of the form $c \leftarrow B$ where B is an abstract state and $c \in \mathbb{R}$.

To any abstract state Z, V assigns the maximal value c of all matching value rules $c \leftarrow B$ to Z as value. A rule matches if $Z \leq_\theta B$. Consider e.g. $R = V_0$ as $10.0 \leftarrow on(a, b)$ and $0.0 \leftarrow true$. It assigns 0 to z but 10 to z'. Using true in the last value rule assures that all states are assigned a value. To develop ReBel, we will also employ abstract action-state value functions, which are similar to abstract state value functions and of which an example can be found in Section 5.2.

Definition 4 An abstract state action value function Q is a finite set of Q-rules of the form $c : A \leftarrow B$ where B is an abstract state, A is an action and $c \in \mathbb{R}$.
To any abstract state-action pair B and A, Q assigns the maximal value c of all abstract state action rules subsumed by $A \leftarrow B$.

² For the sake of simplicity, we consider cost-free actions. The framework can be adapted to the case of action costs. Note also that the meaning of abstract action here differs from that sometimes used in the context of hierarchical RL.

Rewards are specified over queries, i.e., existentially quantified goals. Although these are simple, they are expressive enough to specify many interesting problems studied by the (relational) RL community such as shortest-path problems. Here, the goal is to reach certain (abstract) states. When a goal state is entered, the process ends. In RL, episodic tasks are encoded using absorbing states. We encode it by artificial deterministic actions such as $on(a, b) \xleftarrow{1.0 : absorbing} on(a, b)$, which denotes that all states that are subsumed by on(a, b) transition only to themselves and generate only zero rewards. For example, z is not absorbing but z' is.

Finally, we need a way to cope with the integrity constraints imposed by our domain. For instance, in the move definition above we employed symmetry of $\neq$. This can be modeled by a set C of integrity constraints. Each integrity constraint is a Horn clause. For instance in the blocks world, no block can be free if there is a block on top of it and no block can be on itself: $false \leftarrow on(X, Y), cl(Y)$ and $X \neq Y \leftarrow on(X, Y)$. The completion of an abstract state Z is the least fixpoint of $C \cup \{Z\}$, i.e., all facts deducible from $C \cup \{Z\}$. For example, on(a, b) does not encode that a is not b. Using the rules above, this state is completed to $on(a, b), a \neq b$. Furthermore, if the completion includes false, the state does not satisfy the constraints, i.e., it is an illegal state.

To deal with integrity constraints, we also have to adapt our notions of action definitions and generality. Action definitions are now constrained so that they cannot lead to illegal states. For subsumption we employ the integrity constraints as background theory and use Buntine's generalized subsumption framework (Buntine, 1988).
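The completion and legality test for the two example constraints can be sketched directly. This is a hand-coded stand-in for those two blocks-world constraints only, not a general least-fixpoint computation or constraint solver. The tuple encoding of atoms, the predicate name `neq` for $\neq$, and the function name `complete` are assumptions of the sketch, and terms are compared purely syntactically.

```python
def complete(state):
    """Complete a state with respect to the two example constraints.

    Adds neq(X, Y) for every on(X, Y) and returns None when `false`
    would be derived, i.e. when the state is illegal.
    """
    atoms = set(state)
    for atom in state:
        if atom[0] == "on":
            _, upper, lower = atom
            if upper == lower:          # X != Y <- on(X, Y) cannot hold
                return None
            if ("cl", lower) in atoms:  # false <- on(X, Y), cl(Y)
                return None
            atoms.add(("neq", upper, lower))
    return atoms


# on(a, b) is completed to on(a, b), a != b, as in the text.
print(complete([("on", "a", "b")]))
# A clear block with something on top of it is illegal.
print(complete([("on", "a", "b"), ("cl", "b")]))
```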
Along the lines of (Kersting & De Raedt, 2003; Van Otterlo, 2004), it can be proven that any Markov decision program induces a (possibly infinite) MDP.

5. Relational Value Iteration: ReBel

We will now develop a value iteration algorithm for Markov decision programs, i.e., given an abstract reward model R, i.e., initial abstract state value function $V_0$, compute the next abstract state value functions $V_t$, t = 1, 2, ... The main idea is to upgrade Bellman's traditional backup operator in Equation (2). Therefore, we iterate over:

1) Regress all preceding abstract states from $V_t$.
2) Compute $Q_{t+1}$ over the regressed states.
3) Compute $V_{t+1}$ by maximizing over $Q_{t+1}$.

We will now discuss each step in turn.

5.1. Regression

Let $V_t$ be the current abstract state value function, say $V_0$, and consider the abstract action move. For a single Bellman backup, all abstract states S which lead to a condition in $V_0$ when taking action move have to be computed. Thus, we have to reason from post- to preconditions. For example, the first outcome of move(a, b, c) can lead from state $S \equiv (cl(a), cl(b), on(a, c), on(b, d))$ (inequality constraints omitted) to the abstract state $S' \equiv on(a, b)$. Thus, we have to compute the weakest preconditions for the outcomes of move and S'.

Definition 5 All abstract states which lead to S' when following some action rule $H_i \xleftarrow{p_i : A} B$ constitute the so called weakest precondition $wp_i(A, S')$ of the i-th outcome of A.

For example, S lies in the weakest precondition of S', i.e., $S \in wp_1(move(X, Y, Z), S')$ but it does not lie in $wp_2(move(X, Y, Z), S')$. To compute $wp_1(move(X, Y, Z), S')$ we can assume that we moved from S to S'. Thus, 1) the preconditions of the action (rule) are fulfilled in S, and 2) S' is partially caused by the first outcome of the action. As an illustration of 2), consider on(a, b):

move caused on(a, b): We have been in abstract state $S_1 \equiv (cl(a), cl(b), on(a, Z), a \neq b, a \neq Z, b \neq Z)$ and moved X = a on Y = b.

move did not cause on(a, b): We moved X on Y but not a on b. Therefore, we have been in abstract states $T \equiv (cl(X), cl(Y), on(X, Z), on(a, b), X \neq Y, X \neq Z, Y \neq Z)$ satisfying that we did not move a on b, i.e., $on(X, Y) \neq on(a, b)$, and that we did not move a away from b, i.e., $on(X, Z) \neq on(a, b)$. The constraints guarantee that applying move(X, Y, Z) in T preserves on(a, b). The definition of S simplifies to $S_2 \equiv (T \wedge X \neq a)$, $S_3 \equiv (T \wedge X \neq a \wedge Z \neq b)$, $S_4 \equiv (T \wedge Y \neq b \wedge X \neq a)$, and $S_5 \equiv (T \wedge Y \neq b \wedge Z \neq b)$. All $S_i$ are completed to the same state, namely $S_6 \equiv cl(A), cl(B), on(a, b), on(A, C)$ where all variables and constants are mutually different.

The abstract states $S_1$, $S_6$ together logically define $wp_1(move(X, Y, Z), S') \equiv (S_1 \vee S_6)$.
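The membership claims $S \in wp_1(move(X, Y, Z), S')$ and $S \notin wp_2(move(X, Y, Z), S')$ can be checked for the ground example by simulating the action forwards with the semantics $[b \setminus B\theta] \cup H_i\theta$. The sketch below only tests membership for one hand-supplied substitution. It does not compute the abstract weakest precondition itself. The tuple encoding and all function names are assumptions, and inequality constraints are omitted as in the example.

```python
def substitute(atoms, theta):
    """Apply a substitution (dict from variables to terms) to a list of atoms."""
    return {(a[0],) + tuple(theta.get(t, t) for t in a[1:]) for a in atoms}


# The move action of Section 4, inequality constraints omitted.
PRECONDITION = [("cl", "X"), ("cl", "Y"), ("on", "X", "Z")]
OUTCOMES = [
    (0.9, [("on", "X", "Y"), ("cl", "X"), ("cl", "Z")]),  # block is moved
    (0.1, PRECONDITION),                                   # action fails
]


def apply_outcome(state, theta, outcome_index):
    """Return (state minus B*theta) united with H_i*theta, or None if the
    preconditions do not hold in `state` under `theta`."""
    pre = substitute(PRECONDITION, theta)
    if not pre <= state:
        return None
    _, head = OUTCOMES[outcome_index]
    return (state - pre) | substitute(head, theta)


def leads_to(state, theta, outcome_index, goal):
    """True if the chosen outcome takes `state` into a state containing `goal`."""
    successor = apply_outcome(state, theta, outcome_index)
    return successor is not None and goal <= successor


S = {("cl", "a"), ("cl", "b"), ("on", "a", "c"), ("on", "b", "d")}
theta = {"X": "a", "Y": "b", "Z": "c"}
goal = {("on", "a", "b")}
print(leads_to(S, theta, 0, goal))  # first outcome: block a ends up on b
print(leads_to(S, theta, 1, goal))  # second outcome: state unchanged
```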
So far, we considered a single effect only, namely on(a, b). In general, however, there can be multiple (combined) effects that are or that are not caused by taking action move, cf. WeakestPre in Procedure 1.

1: initialize $wp_i$ to be the empty list
2: for each subset S'' of S' and subset P of $H_i$ such that θ = mgu(S'', P) exists or $S'' == \emptyset \wedge P == H_i$, i.e., $\theta = \emptyset$ do
3: $S := (S'\theta \setminus P\theta) \cup B\theta$
4: for all pairs (l, l') in $\{(l, l') \mid l \in (S'\theta \setminus P\theta) \wedge l' \in H_i\theta \cup B\theta\}$ do
5: if mgu(l, l') exists then
6: add $l \neq l'$ to S
7: add all simplifications of S to $wp_i$
8: return $wp_i$

Procedure 1: WeakestPre returns the weakest precondition of action rule $H_i \xleftarrow{p_i : A} B$ and abstract state S' given a set of integrity constraints C. We omitted that only legal and completed abstract states are inserted in $wp_i$.

Consider for example $S' \equiv (on(a, b), on(c, d))$. Moving a block on some other block can have caused either on(a, b) or on(c, d), or neither of them, cf. line 2. Assume that no effect was caused. Then, S'' is empty and $P = H_1$, cf. line 2. Therefore, θ is the empty substitution and $S \equiv (on(a, b), on(c, d), cl(X), cl(Y), on(X, Z))$ (inequality constraints omitted) is a possible preimage, cf. line 3. However, we know that move did not cause on(a, b), on(c, d). Therefore, it holds that $on(X, Z) \neq on(a, b) \wedge on(X, Z) \neq on(c, d) \wedge on(X, Y) \neq on(a, b) \wedge on(X, Y) \neq on(c, d)$, cf. lines 4-6. S can be simplified for instance to $S, X \neq a, X \neq c$, which is a legal abstract state. The case that the action caused some effects is covered by the "mgu(S'', P) exists" condition in line 2. It is treated analogously.

5.2. Computing Abstract State Action Values

Given the regressed abstract states and the current abstract state value function $V_t$, we now compute an abstract state-action value function $Q_{t+1}$ according to Procedure 2. To do so, (A) we treat each outcome of an action A as though it would be a single action and compute its abstract state action value, cf. line 4. Then, (B) we combine the values of all outcomes to an abstract state action value for A, cf. lines 5-12.³

³ For the sake of brevity, we will not state constraints in the examples till the end of Section 5.3.
For step (A), consider again the first outcome of move. The weakest precondition was $wp_1(move(X, Y, Z), S') \equiv S_1 \vee S_6$. Because $S_6$ is absorbing, we assign an abstract state action value of 10 for taking action move, i.e., $10 : move(X, Y, Z) \leftarrow S_6$. The value of $S_1$, however, is dependent on $V_t(S')$, i.e. in our example $V_0$. Assuming a discount factor of 0.9 this yields $R(S) + p_1 \cdot \gamma \cdot V_0(S') = 0 + 0.9 \cdot 0.9 \cdot 10 = 8.1$, i.e., $8.1 : move(a, b, Z) \leftarrow S_1$.

1: initialize Qrules to the empty set.
2: for each action rule $H_i \xleftarrow{p_i : A} B$ for A do
3: for each $v \leftarrow V'$ in $V_t$ do
4: partialQ := $\{ q : \tilde{A} \leftarrow S \mid S \in wp_i(A, V') \}$ where $q := R(S)$ if S is absorbing and $q := R(S) + p_i \gamma V_t(V')$ otherwise
5: if Qrules $\equiv \emptyset$ then
6: Qrules := partialQ
7: else
8: newQ := $\emptyset$
9: for all pairs $q' : \tilde{A}' \leftarrow S' \in$ Qrules and $q'' : \tilde{A}'' \leftarrow S'' \in$ partialQ do
10: if $G := glb(\tilde{A}' \leftarrow S', \tilde{A}'' \leftarrow S'')$ exists then
11: add q : G to newQ with q = q' + q''
12: Qrules := newQ
13: return Qrules

Procedure 2: QRules returns the Q-rules of an action A given the reward model R, the current value function $V_t$ and a discount factor γ. Note that $\tilde{A}$ denotes the action head where we keep the substitution made by $wp_i$. We also omitted that only legal and completed abstract states are inserted in Qrules.

Doing the same for all other rules in $V_0$ results in:

⟨a⟩ $10 : move(X, Y, Z) \leftarrow cl(X), cl(Y), on(a, b), on(X, Z)$
⟨b⟩ $8.1 : move(a, b, Z) \leftarrow cl(a), cl(b), on(a, Z)$
⟨c⟩ $0.0 : move(X, Y, Z) \leftarrow cl(X), cl(Y), on(X, Z)$

For the second outcome of move, step (A) leads to:

⟨d⟩ $1.0 : move(a, X, b) \leftarrow cl(a), cl(X), on(a, b)$
⟨e⟩ $1.0 : move(X, Y, Z) \leftarrow cl(X), cl(Y), on(a, b), on(X, Z)$
⟨f⟩ $0.0 : move(X, Y, Z) \leftarrow cl(X), cl(Y), on(X, Z)$

For step (B), we note that each of these rules describes situations such as "if we are in a state then we can get some value for achieving the i-th outcome of action A". This information has to be combined to abstract state action values for A. To do so, we select a rule from ⟨a⟩-⟨c⟩, say ⟨b⟩, and a rule from ⟨d⟩-⟨f⟩, say ⟨f⟩, and check whether we can be in both abstract states at the same time and whether we can apply the same action. In other words, we compute the greatest lower bound (glb) of the logical clauses underlying both value rules. If the glb (where the actions have to unify) exists and it is a legal state, then it is inserted as a new rule, cf. line 11. The value of the new rule is the sum of values of the combined rules. For ⟨b⟩ and ⟨f⟩ this yields $8.1 : move(a, b, X) \leftarrow cl(a), cl(b), on(a, X)$. In contrast, ⟨b⟩ and ⟨d⟩ do not give a new rule.
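The combination step for the pairs ⟨b⟩, ⟨f⟩ and ⟨b⟩, ⟨d⟩ can be reproduced with a small sketch. It makes several simplifying assumptions that are not taken from the paper: the glb is approximated by unifying the two action heads and taking the union of the instantiated bodies, legality is checked only against the two blocks-world constraints of Section 4, inequality constraints are omitted, and all function names are hypothetical.

```python
GAMMA = 0.9


def is_variable(term):
    return term[0].isupper()


def walk(term, subst):
    """Follow variable bindings until a constant or an unbound variable."""
    while is_variable(term) and term in subst:
        term = subst[term]
    return term


def unify(atom1, atom2):
    """Most general unifier of two function-free atoms, or None."""
    if atom1[0] != atom2[0] or len(atom1) != len(atom2):
        return None
    subst = {}
    for left, right in zip(atom1[1:], atom2[1:]):
        left, right = walk(left, subst), walk(right, subst)
        if left == right:
            continue
        if is_variable(left):
            subst[left] = right
        elif is_variable(right):
            subst[right] = left
        else:
            return None
    return subst


def rename(rule, suffix):
    """Rename the variables of a rule apart by appending a suffix."""
    def fresh(atom):
        return (atom[0],) + tuple(t + suffix if is_variable(t) else t
                                  for t in atom[1:])
    value, head, body = rule
    return value, fresh(head), [fresh(a) for a in body]


def legal(body):
    """No block on itself and no clear block with something on top of it."""
    return not any(a[0] == "on" and (a[1] == a[2] or ("cl", a[2]) in body)
                   for a in body)


def combine(rule1, rule2):
    """Simplified stand-in for the glb step of QRules (lines 9-11)."""
    q1, head1, body1 = rename(rule1, "1")
    q2, head2, body2 = rename(rule2, "2")
    subst = unify(head1, head2)
    if subst is None:
        return None

    def instantiate(atom):
        return (atom[0],) + tuple(walk(t, subst) for t in atom[1:])

    body = {instantiate(a) for a in body1 + body2}
    return (q1 + q2, instantiate(head1), body) if legal(body) else None


def partial_q(reward, probability, next_value):
    """Value of a single non-absorbing outcome, cf. line 4 of Procedure 2."""
    return reward + probability * GAMMA * next_value


RULE_B = (partial_q(0.0, 0.9, 10.0), ("move", "a", "b", "Z"),
          [("cl", "a"), ("cl", "b"), ("on", "a", "Z")])
RULE_D = (1.0, ("move", "a", "X", "b"),
          [("cl", "a"), ("cl", "X"), ("on", "a", "b")])
RULE_F = (partial_q(0.0, 0.1, 0.0), ("move", "X", "Y", "Z"),
          [("cl", "X"), ("cl", "Y"), ("on", "X", "Z")])

print(combine(RULE_B, RULE_F))  # a combined rule with value about 8.1
print(combine(RULE_B, RULE_D))  # None: on(a, b) and cl(b) clash
```

In this sketch the pair ⟨b⟩, ⟨d⟩ is rejected because unifying the action heads forces Z = b, and the merged body then contains both on(a, b) and cl(b), which violates the first integrity constraint.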
In our blocks world example, QRules yields the following abstract state action value function when applied on $V_0$ and move and absorbing:

⟨1⟩ $10 : absorbing \leftarrow on(a, b)$
⟨2⟩ $10 : move(X, Y, Z) \leftarrow cl(X), cl(Y), on(a, b), on(X, Z)$
⟨3⟩ $8.1 : move(a, b, X) \leftarrow cl(a), cl(b), on(a, X)$
⟨4⟩ $0.0 : move(X, Y, Z) \leftarrow cl(X), cl(Y), on(X, Z)$

Note that we have sorted the Q-rules in descending order only for the sake of readability.

5.3. Computing Abstract State Values

The set of Q-rules enables one to compute the next abstract state value function $V_{t+1}$. In contrast to the traditional case, Q-rules, i.e., values of abstract state action pairs, can overlap such as Q-rules ⟨1⟩ and ⟨2⟩. To compute abstract state values we make use of the fact that $V_{t+1}(S) = \max_A Q_{t+1}(S, A)$ due to Eq. (3). In general, any value-preserving transformation can be applied. In this paper, we use a simple separate-and-conquer rule learning approach where the rules to learn and the examples to learn from coincide, see VRules in Procedure 3. We search for a Q-rule m having maximal Q-value among Qrules, lines 3-4, separate the covered Q-rules, line 5, and recursively conquer the remaining Q-rules by selecting more rules until no Q-rules remain, line 6. The main difference is that we select m and add it to $V_{t+1}$ only if there is no other Q-rule left in Qrules with the same value whose body subsumes the body of m, cf. line 8.

1: initialize $V_{t+1}$ to the empty set of V-rules.
2: sort Qrules in decreasing order of Q-values
3: while Qrules not empty do
4: remove top element $d : A \leftarrow B$ of Qrules
5: if no other rule $d : A' \leftarrow B'$ in Qrules exists such that B' subsumes B then
6: add $d \leftarrow B$ to $V_{t+1}$
7: remove all rules $d' \leftarrow B'$ from Qrules such that B' is subsumed by B
8: return $V_{t+1}$

Procedure 3: VRules returns the value function $V_{t+1}$ given the Q-rules computed from $V_t$ for all actions.

In our running example, we start with rule ⟨1⟩. Because it is not subsumed by any other rule having the same value, we add $10 \leftarrow on(a, b)$ to $V_1$ and, because it subsumes ⟨2⟩, we remove ⟨2⟩ from Qrules. The remaining highest valued rule is ⟨3⟩, and we iterate.
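The walk-through of VRules on the four Q-rules can be replayed with a short sketch. It is an illustration under assumptions rather than the authors' implementation: subsumption is plain θ-subsumption without integrity constraints, variables of the more specific rule are treated as constants, rules that fail the test in line 5 are simply dropped, and the tuple encoding and function names are hypothetical.

```python
def is_variable(term):
    return term[0].isupper()


def subsumes(general, specific, theta=None):
    """True if some substitution maps every atom of `general` into `specific`."""
    theta = {} if theta is None else theta
    if not general:
        return True
    first, rest = general[0], general[1:]
    for atom in specific:
        if first[0] != atom[0] or len(first) != len(atom):
            continue
        extended = dict(theta)
        if all(extended.setdefault(p, t) == t if is_variable(p) else p == t
               for p, t in zip(first[1:], atom[1:])):
            if subsumes(rest, specific, extended):
                return True
    return False


def v_rules(q_rules):
    """Separate-and-conquer maximisation over Q-rules, after Procedure 3."""
    queue = sorted(q_rules, key=lambda rule: rule[0], reverse=True)
    value_rules = []
    while queue:
        value, _action, body = queue.pop(0)
        # Skip m if an equally valued, more general rule is still queued.
        if any(v == value and subsumes(b, body) for v, _a, b in queue):
            continue
        value_rules.append((value, body))
        # Remove every remaining rule whose body is subsumed by the new one.
        queue = [rule for rule in queue if not subsumes(body, rule[2])]
    return value_rules


Q_RULES = [
    (10.0, ("absorbing",), [("on", "a", "b")]),
    (10.0, ("move", "X", "Y", "Z"),
     [("cl", "X"), ("cl", "Y"), ("on", "a", "b"), ("on", "X", "Z")]),
    (8.1, ("move", "a", "b", "X"), [("cl", "a"), ("cl", "b"), ("on", "a", "X")]),
    (0.0, ("move", "X", "Y", "Z"), [("cl", "X"), ("cl", "Y"), ("on", "X", "Z")]),
]

for value, body in v_rules(Q_RULES):
    print(value, "<-", body)
```

With these inputs the sketch keeps rules ⟨1⟩, ⟨3⟩ and ⟨4⟩ and discards ⟨2⟩, matching the walk-through.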
After completing, this yields the new value function $V_1$ (constraints listed again):

$10 \leftarrow on(a, b), a \neq b.$
$8.1 \leftarrow cl(a), cl(b), on(a, X), a \neq b, a \neq X, b \neq X.$
$0 \leftarrow cl(X), cl(Y), on(X, Z), X \neq Y, X \neq Z, Y \neq Z.$

5.4. Relational Bellman Backup Operator

To summarize, the general scheme of ReBel is: 1) Compute the weakest precondition of each action outcome for each abstract state in $V_t$ using WeakestPre. As done in QRules, 2a) assign to each abstract state action outcome pair computed in 1) a Q-value and 2b) combine them using the glb. 3) Maximize the Q-rules to compute $V_{t+1}$ using VRules.

Note that in 2b), if there are n > 1 many outcomes of an action, then the Q-values of the n-th outcome are combined with the already combined Q-values of the n - 1 previous outcomes. Thus, there are n - 1 many combinations per action. This might produce many rules. To overcome this, one can adapt VRules maximizing Q-rules to compress Q-rules: if we are in a state with different currently combined values for compatible actions, then we select only the higher one. This is safe because the higher valued Q-rule subsumes the lower valued one. Therefore, it would have been selected in any case later on.

Formally, this Bellman backup requires an infinite number of iterations to converge to $V^*$, cf. Section 6. In practice, we stop when the abstract value function changes by only a small amount.

Figure 1. Blocks World Experiment I: Abstract state value function for the cl(a) goal after 10 iterations. It applies for any number of blocks. Values are rounded to the second digit. $F_i$ can be a block or a floor block. States structurally different from the depicted ones get value 0.0.

6. Experiments

In this section we empirically validate ReBel. We implemented ReBel with compressing Qrules in the Prolog system YAP version [...] and we used the supplied constraint handling rules library (Frühwirth, 1998). In all experiments we assume a discount factor of 0.9 and a goal reward of 10, i.e., in all other states we receive 0 reward. Only goal states are absorbing. Experiments were run on a 3.1 GHz Linux machine. The running times were estimated using YAP's built-in statistics(runtime, _).
We focused on standard examples known from the relational RL literature.

Figure 2. Blocks World Experiment II: Parts of the abstract value function for on(a, b) after 10 iterations (values rounded to the second digit). It applies for any number of blocks. We omitted the inequality constraints: All blocks are mutually different. $F_i$ can be a block or a floor block. States more than 10 steps away from the goal get value 0.0.

Blocks World Experiment I: We consider cl(a) as goal in our probabilistic blocks world setting. The experiment shows that even on simple problems ReBel is not guaranteed to converge on the structural level. Figure 1 shows the abstract state value function after 10 iterations. It took ReBel roughly 1 minute to iterate ten times. Figure 1 highlights that states that are one step further away from the goal get the same value. The value, however, is lower because of the additional block on top of the stack of a. Thus, because the number of blocks is not restricted, value iteration will never stop.

Proposition: Abstraction does not guarantee convergence in infinite domains because an infinite number of abstract states can be required.

This is interesting, because infinite state spaces easily arise when relational representations are used and relational abstraction was hoped to be a solution. Nevertheless, relational value iteration can converge even for infinite domains as our third experiment will show.

Blocks World Experiment II: We consider the goal on(a, b) in a deterministic blocks world because it is reported to be a hard problem for model-free relational RL (RRL) approaches (Džeroski et al., 2001; Driessens & Ramon, 2003). For instance, Driessens and Ramon (2003) report that on average the learned policies did not reach optimal performance even for 5 blocks. Using the same experimental set-up as in our first experiment but a deterministic move action, ReBel computed $V_{10}$ in less than 12 minutes. The abstract value function is partially shown in Figure 2. Because the move action is deterministic, $V_{10}$ is optimal for 10 blocks (more than 58 million ground states). The optimal policy can directly be extracted by computing the maximizing Q-rules for each abstract state. In our example, this results in removing the top elements from the stacks on top of a and b. However, to compactly represent this strategy, one needs to define the predicate ontop. In the experiments Driessens and Ramon (2003) reported on, this was always the case. The policy based on ReBel is optimal no matter how many blocks there are.

Table 1. Load-Unload Experiment: The t-th column shows the abstract state value function after the t-th iteration. When no value is given, the abstract state has value 0.0. Bold numbers highlight changed values. The abstract states listed are: bin(b, p); tin(A, p), on(b, A), not rain; tin(A, p), on(b, A), rain; tin(A, B), on(b, A), not rain; tin(A, B), on(b, A), rain; tin(A, B), bin(b, B), not rain; tin(A, B), bin(b, B), rain; tin(A, B), bin(b, C), not rain; tin(A, B), bin(b, C), rain; tin(A, B).

Load-Unload Experiment: Our final experiment considers the logistics domain which Boutilier et al. (2001) solved semi-automatically. The domain consists of cities, trucks and boxes. Boxes can be loaded onto and unloaded from trucks, and trucks can be driven between cities. The predicate on(B, T) denotes that box B is on the truck T, bin(B, C) denotes that box B is in some city C and tin(T, C) denotes that truck T is in city C. The actions that can be performed are: load(B, T) and unload(B, T) specifying how a box B can be loaded onto or unloaded from truck T and drive(T, C) specifying that the truck T is driven to city C. The actions in this domain have probabilistic effects. The probability of a failing load or unload action, i.e., staying in the current state, depends on whether it rains or not, denoted by rain.
The action specification is as follows (we omit the failing specifications for the sake of brevity):

$$bin(B, C), tin(T, C), R \xleftarrow{pr : unload(B, T)} on(B, T), tin(T, C), R$$

$$on(B, T), tin(T, C), R \xleftarrow{pr : load(B, T)} bin(B, C), tin(T, C), R$$

$$tin(T, C'), C \neq C' \xleftarrow{1.0 : drive(T, C')} tin(T, C)$$

where the probability pr is 0.9 if R is rain and 0.7 if R is not rain. To correctly handle the explicit negation we used for rain, we provided $false \leftarrow rain, not\ rain$ as a constraint. The goal in this domain is to get some box b in p where p stands for Paris, i.e., in bin(b, p) we get a reward of 10. ReBel ran for less than 6 seconds to compute the results summarized in Table 1. In contrast to the blocks world examples, the solution converges both at the value level and at the structural level. E.g., take the situation in which a truck is in a city different from Paris and the box is there too. Then, it will take three steps (load, drive, unload) to reach the goal state and the state value in $V_{10}$ is [...] in case it rains. The abstract state value function applies no matter how many trucks, boxes and cities are present.

7. Related Work

In the past few years, there has been an increased and significant interest in using rich relational representations for modeling and learning MDPs. In model-free relational RL, different relational learners have been studied for function approximation (Džeroski et al., 2001; Lecoeuche, 2001; Driessens & Ramon, 2003; Gärtner et al., 2003). Others have applied Q-learning based on pre-specified abstract state spaces: Kersting and De Raedt (2003) investigate pure Q-learning, Van Otterlo (2004) learns the Q-function via learning the underlying transition model. Fern et al. (2003) extended previous work on upgrading learned policies for small relational MDPs (RMDPs) with approximated policy iteration. Finally, Guestrin et al. (2003) recently reported on class-based, approximate value functions for RMDPs. For model-based approaches, there has been a surprising lack of research on exact solution methods.
From a general point of view, ReBel is closely related to decision theoretic regression (DTR) (Boutilier et al., 1999) and, because of that, it is also related to regression planning in the same way as DTR is. Within DTR, most algorithms are designed to work with propositional representations. Actually, the only exception the authors know of is that of Boutilier et al. (2001). ReBel relates to this in that it also is a model-based exact solution method for RMDPs. One key difference is that Boutilier et al. employ situation calculus for representing RMDPs. Situation calculus is very expressive and as a consequence it is harder to simplify the logical descriptions of the abstract value function states that are obtained. This may also explain why, to the best of the authors' knowledge, that approach has not been fully implemented and experimented with. In contrast, because of the use of a simpler logical language, the simplification in ReBel is computationally feasible. As shown in the experiments, ReBel successfully and fully automatically implements relational value iteration. Finally, the work by Dietterich and Flann (1997) is also concerned with generalizing Bellman backups but no relational representation is used.

8. Conclusions

The key contribution of this paper is the introduction of ReBel, a relational upgrade of the Bellman update operator. It has been used to implement a relational value iteration algorithm. It has been shown to be effective in a number of simple though significant examples. This in turn has led to a number of novel insights into relational MDPs. First, it has been shown that value-based methods for relational MDPs may not converge because an infinite number of abstract states has to be represented. Second, we highlighted that in such cases background knowledge may enable the learning of optimal policies. So, depending on the representation of the problem, one can or cannot learn the optimal policy. Therefore, using background knowledge in RMDPs is not only an interesting feature, but in some cases also a necessity for successful learning. In this way, we have given an explanation for and confirmed some of the experimental insights of the early relational RL work (Džeroski et al., 2001).

Further work could address combining ReBel with other types of value-based methods, extending the representation language, efficient data structures, complexity analysis, and employing other learning algorithms to compress value functions.
The authors hope that the theoretical insights, as well as the algorithm developed in this paper, will be helpful in advancing the field of relational RL as well as contribute to an improved understanding of the problems involved.

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful comments. This research was supported by the European Union IST programme, contract no. FP [...], APrIL II. Martijn Van Otterlo was supported by a Marie Curie fellowship at DAISY, HPMT-CT [...].

References

Bellman, R. E. (1957). Dynamic programming. Princeton, New Jersey: Princeton University Press.

Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. J. Art. Intel. Res., 11.

Boutilier, C., Reiter, R., & Price, B. (2001). Symbolic dynamic programming for first-order MDPs. Proc. of IJCAI'01.

Buntine, W. (1988). Generalized subsumption and its applications to induction and redundancy. Artificial Intelligence, 36.

Dietterich, T. G., & Flann, N. S. (1997). Explanation-based learning and reinforcement learning: A unified view. Machine Learning, 28.

Driessens, K., & Ramon, J. (2003). Relational instance based regression for relational reinforcement learning. Proc. of ICML.

Džeroski, S., De Raedt, L., & Driessens, K. (2001). Relational reinforcement learning. Machine Learning, 43.

Fern, A., Yoon, S., & Givan, R. (2003). Approximate policy iteration with a policy language bias. Proc. of NIPS'03.

Frühwirth, T. (1998). Theory and practice of constraint handling rules. Journal of Logic Programming, 37.

Gärtner, T., Driessens, K., & Ramon, J. (2003). Graph kernels and Gaussian processes for relational reinforcement learning. Proc. of ILP'03.

Guestrin, C., Koller, D., Gearhart, C., & Kanodia, N. (2003). Generalizing plans to new environments in relational MDPs. Proc. of IJCAI'03.

Kersting, K., & De Raedt, L. (2003). Logical Markov decision programs. Proc. of the IJCAI'03 Workshop on Learning Statistical Models of Relational Data.

Lecoeuche, R. (2001). Learning optimal dialogue management rules by using reinforcement learning and inductive logic programming. Proc. of the North American Chapter of the Association for Computational Linguistics (NAACL). Pittsburgh.

Nienhuys-Cheng, S.-H., & de Wolf, R. (1997). Foundations of inductive logic programming. Lecture Notes in Artificial Intelligence. Springer-Verlag.

Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge: The MIT Press.

Van Otterlo, M. (2004). Reinforcement learning for relational MDPs. Machine Learning Conference of Belgium and the Netherlands (BeNeLearn'04).


More information

7.2 The Definite Integral

7.2 The Definite Integral 7.2 The Definite Integrl the definite integrl In the previous section, it ws found tht if function f is continuous nd nonnegtive, then the re under the grph of f on [, b] is given by F (b) F (), where

More information

1.9 C 2 inner variations

1.9 C 2 inner variations 46 CHAPTER 1. INDIRECT METHODS 1.9 C 2 inner vritions So fr, we hve restricted ttention to liner vritions. These re vritions of the form vx; ǫ = ux + ǫφx where φ is in some liner perturbtion clss P, for

More information