PPCP: The Proofs


Maxim Likhachev
Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104

Anthony Stentz
The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA

1 Notations and Assumptions

In this section we introduce some notation and formalize mathematically the class of problems our algorithm is suitable for. We assume that the environment is fully deterministic and can be modeled as a graph. That is, if we were to know the true value of each variable that represents the missing information about the environment, then there would be no uncertainty in the outcome of any action. There are certain elements of the environment, however, whose status we are uncertain about and which affect the outcomes (and/or possible costs) of one or more actions. In the following we re-phrase this mathematically.

Let X be a full state-vector (a belief state). We assume it can be split into two sets of variables, S(X) and H(X): X = [S(X); H(X)]. S(X) is the finite set of variables whose values are always observed, and the number of their possible values is also finite. H(X) is the set of (hidden) variables that initially represented the missing information about the environment. The variables in H(X) are never moved into S(X). X_start is used to denote the start state; all the values of the variables in H(X_start) are unknown. The goal of the planner is to construct a policy that reaches any state X such that S(X) = S_goal, where S_goal is given, while minimizing the expected cost of execution.

We assume perfect sensing. For the sake of easier notation let us introduce an additional value u for each variable h_i in H. The setting h_i(X) = u at state X represents the fact that the value of h_i is unknown at X. If h_i(X) ≠ u, then the true value of h_i is known at X, since sensing is perfect. We restrict all the variables that make up X to take only a finite number of distinct values.

We assume at most one hidden variable per action. Let A(S(X)) denote the finite set of actions available at any state Y whose S(Y) = S(X). Each action a ∈ A(S(X)) taken at state X may have one or more outcomes.
If the execution of the action does not depend on any of the variables h_i whose values are not yet known, then there is only one outcome of a. Otherwise, there can be more than one outcome. We assume that each such action cannot be controlled by more than one hidden variable. (The value of one hidden variable can affect more than one action, though.) We use h_{S(X),a} to represent the hidden variable that controls the outcomes and costs of action a taken at state X. By h_{S(X),a} = null we denote the case when there was never any uncertainty about the outcome of action a taken at state X. The set of possible outcomes of action a taken in S(X) is denoted by succ(S(X), a), whereas c(S(X), a, S(Y)) such that S(Y) ∈ succ(S(X), a) denotes the cost of the action and the outcome S(Y). The costs are assumed to be bounded from below by a (small) positive constant.

Sometimes we will need to refer to the set of successors in the belief state-space. In these cases we will use the notation succ(X, a) to denote the set of belief states Y such that S(Y) ∈ succ(S(X), a) and H(Y) is the same as H(X) except for h_{S(X),a}, which also remains the same if it was known at X and is different otherwise. The function P_{X,a}(succ(X, a)), the probability distribution of outcomes of a executed at X, follows the probability distribution of h_{S(X),a}, P(h_{S(X),a}). Once action a was executed at state X, the actual value of h_{S(X),a} can be deduced, since we assumed that sensing is perfect and the environment is deterministic.

We assume independence of the hidden variables. For the sake of efficient planning we assume that the variables in H can be considered independent of each other and therefore P(H) = Π_{i=1}^{|H|} P(h_i).

We assume clear preferences on the values of the hidden variables are available. We require that for each variable h_i ∈ H we are given its preferred value, denoted by b (i.e., best). This value must satisfy the following property. Given any state X and any action a such that h_{S(X),a} is not known (that is, h_{S(X),a}(X) = u), there exists a successor state X' such that h_{S(X),a}(X') = b and X' = argmin_{Y ∈ succ(X,a)} c(S(X), a, S(Y)) + v*(Y), where v*(Y) is the expected cost of executing an optimal policy at state Y (Def. 1). We will use the notation succ(X, a)^b (i.e., the best successor) to denote the state X' whose h_{S(X),a}(X') = b if h_{S(X),a}(X) = u and whose h_{S(X),a}(X') = h_{S(X),a}(X) otherwise.
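The belief-state conventions above (the unknown value u, the preferred value b, and the best successor succ(X, a)^b) can be illustrated with a small sketch. This is not part of the paper; the names U, B, best_successor and succ_fn are assumptions made for illustration only.

```python
# A minimal sketch (not from the paper) of the belief-state conventions of
# Section 1. Only the u/b semantics mirror the text above; all names here
# are illustrative assumptions.

U = "u"  # a hidden variable whose value is still unknown
B = "b"  # the preferred ("best") value of a hidden variable

def best_successor(S, H, a, hidden_var, succ_fn):
    """Return succ(X, a)^b for X = [S; H]: the successor in which the hidden
    variable controlling action a takes its preferred value b if it was
    unknown, and keeps its already-sensed value otherwise."""
    H2 = dict(H)  # perfect sensing: after executing a, the variable is known
    if hidden_var is not None and H2.get(hidden_var, U) == U:
        H2[hidden_var] = B  # optimistically assume the preferred value
    # once the controlling variable is fixed, the outcome is deterministic
    S2 = succ_fn(S, a, H2.get(hidden_var))
    return S2, H2
```

Under these conventions an action with h_{S(X),a} = null simply passes hidden_var = None and stays deterministic.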

A Appendix: The Proofs

The pseudocode below assumes the following:

1. Every state X̃ in the search state-space is initially assumed to have v(X̃) = g(X̃) = ∞ and besta(X̃) = null.

1 procedure ComputePath(X_pivot)
2   X̃_searchgoal = GetStateSearchGraph([S(X_pivot); H(X_pivot)]);
3   g(X̃_searchgoal) = v(X̃_searchgoal) = ∞;
4   OPEN = ∅;
5   for every H whose every element h_i satisfies: [(h_i = u OR h_i = b) AND h_i(X_pivot) = u] OR [h_i = h_i(X_pivot) AND h_i(X_pivot) ≠ u]
6     X̃ = GetStateSearchGraph([S_goal; H]);
7     v(X̃) = ∞, g(X̃) = 0, besta(X̃) = null;
8     insert X̃ into OPEN with priority g(X̃) + h(X̃);
9   while (g(X̃_searchgoal) > min_{X̃' ∈ OPEN} g(X̃') + h(X̃'))
10    remove X̃ with the smallest g(X̃) + h(X̃) from OPEN;
11    v(X̃) = g(X̃);
12    for each action a and X' = [S(X'); H(X'); H^u(X_pivot)] s.t. X̃ = [S(succ(X', a)^b); H(succ(X', a)^b)]
13      X̃' = GetStateSearchGraph([S(X'); H(X')]);
14      Q_a = Σ_{Y ∈ succ(X',a)} P(X', a, Y) · max(c(S(X'), a, S(Y)) + w(Y), c(S(X'), a, S(X̃)) + v(X̃));
15      if g(X̃') > Q_a
16        g(X̃') = Q_a;
17        besta(X̃') = a;
18        insert/update X̃' in OPEN with priority equal to g(X̃') + h(X̃');

Figure 1: ComputePath function

The pseudocode below assumes the following:

1. Every state X initially has 0 ≤ w(X) ≤ w^b(X) and besta(X) = null.

1 procedure UpdateMDP(X_pivot)
2   X = X_pivot; X̃ = GetStateSearchGraph([S(X_pivot); H(X_pivot)]);
3   while (S(X) ≠ S_goal)
4     w(X) = g(X̃); w([S(X); H(X); H^u(X_pivot)]) = g(X̃); besta(X) = besta(X̃);
5     if (besta(X) = null) break;
6     X = succ(X, besta(X))^b; X̃ = GetStateSearchGraph([S(X); H(X)]);

7 procedure Main()
8   X_pivot = X_start;
9   while (X_pivot ≠ null)
10    ComputePath(X_pivot);
11    UpdateMDP(X_pivot);
12    find a state X on the current policy that has w(X) < E_{X' ∈ succ(X,besta(X))}(c(S(X), besta(X), S(X')) + w(X'));
13    if found, set X_pivot to X;
14    otherwise, set X_pivot to null;

Figure 2: Main function
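For intuition, the backward A*-like loop of ComputePath (lines 9-18 of Figure 1) can be condensed into the following runnable sketch. It is an assumption-laden simplification, not the paper's implementation: states are plain hashable ids, preds(X, g, v) stands in for the predecessor update of lines 12-14 (returning candidate (X', Q_a) pairs), and h is a consistent heuristic.

```python
# A condensed, illustrative sketch (assumptions throughout) of the
# priority-queue loop of ComputePath, Figure 1, lines 9-18.
import heapq

INF = float("inf")

def compute_path(goal_states, search_goal, preds, h):
    g = {s: 0.0 for s in goal_states}             # lines 5-7
    v = {}                                        # v = infinity by default
    open_heap = [(h(s), s) for s in goal_states]  # line 8
    heapq.heapify(open_heap)
    while open_heap and g.get(search_goal, INF) > open_heap[0][0]:  # line 9
        _, X = heapq.heappop(open_heap)           # line 10
        if v.get(X, INF) <= g.get(X, INF):
            continue                              # stale queue entry, skip
        v[X] = g[X]                               # line 11
        for Xp, Qa in preds(X, g, v):             # lines 12-14
            if g.get(Xp, INF) > Qa:               # line 15
                g[Xp] = Qa                        # lines 16-17
                heapq.heappush(open_heap, (Qa + h(Xp), Xp))  # line 18
    return g, v
```

With a consistent h each state is expanded at most once (as Theorem 2 below proves for the full algorithm), so stale duplicate heap entries are simply skipped.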

Let us first define several variables that we will use during the proofs. Let H^b be defined as H with each h_i equal to u replaced by b. X^b is then defined as [S(X); H(X); H^b(X)]. Let H^u(X) be H(X) but with each h_i = b replaced by u. For every state X we then define the state X^u as follows: X^u = [S(X); H(X); H^u(X)].

We now introduce optimistic Q-values. Every state-action pair X and a ∈ A(S(X)) has a Q_{f,w}(X, a) > 0 associated with it that is calculated from the action costs c(S(X), a, S(Y)) for all states Y ∈ succ(X, a), the non-negative f-value of the state X' = succ(X, a)^b and the non-negative values w(Y) for all states Y ∈ succ(X, a). Q_{f,w}(X, a) is defined as follows:

Q_{f,w}(X, a) = Σ_{Y ∈ succ(X,a)} P(X, a, Y) · max(c(S(X), a, S(Y)) + w(Y), c(S(X), a, S(X')) + f(X'))   (1)

We now define an optimistic path from X_n to X_0 whose S(X_0) = S_goal as follows: π = [{X_n, a_n, X_{n-1}}, ..., {X_1, a_1, X_0}], where every time a_i is stochastic, the outcome is X_{i-1} = succ(X_i, a_i)^b. We define the optimistic cost of an optimistic path π = [{X_n, a_n, X_{n-1}}, ..., {X_1, a_1, X_0}] under a non-negative value function w recursively as follows:

φ_π(X_i, X_0) = { 0 if i = 0; Q_{f(X_{i-1}) = φ_π(X_{i-1}, X_0), w}(X_i, a_i) if i > 0   (2)

We define a path given by besta pointers from X_n to X_0 as follows: π_best = [{X_n, a_n, X_{n-1}}, ..., {X_1, a_1, X_0}], where a_i = besta(X_i) and X_{i-1} = succ(X_i, a_i)^b. We define a greedy path π_{greedy,f,w}(X_n, X_0) = [{X_n, a_n, X_{n-1}}, ..., {X_1, a_1, X_0}] with respect to functions f and w that map each state X onto non-negative real values. It is defined as a path π from X_n to X_0 where for every i ≥ 1, a_i = argmin_{a ∈ A(S(X_i))} Q_{f,w}(X_i, a) and the outcome is X_{i-1} = succ(X_i, a_i)^b.

We also define w^b-values of states as the costs of reaching a goal state under the assumption that the values of the missing variables are all set to b:

w^b(X) = { 0 if S(X) = S_goal; min_{a ∈ A(S(X))} (c(S(X), a, succ(X, a)^b) + w^b(succ(X, a)^b)) otherwise   (3)

A.1 ComputePath Function

In this section we prove theorems that mainly concern the ComputePath function. We consider a single execution of the ComputePath function.
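As a concrete reading of equation 1, the sketch below computes Q_{f,w}(X, a) for one state-action pair. The representation is an illustrative assumption (outcomes given as (probability, cost, outcome) triples, w and f as dictionaries); only the formula itself comes from the text.

```python
# A hedged sketch of the optimistic Q-value in equation 1; every name here
# (q_value, outcomes, best) is an assumption made for illustration.

def q_value(outcomes, best, w, f):
    """Q_{f,w}(X, a) = sum_{Y in succ(X,a)} P(X,a,Y) *
    max(c(S(X),a,S(Y)) + w(Y), c(S(X),a,S(X')) + f(X')),
    where X' = succ(X, a)^b is the preferred (best) outcome."""
    # cost of reaching the preferred outcome X'
    c_best = next(c for (p, c, y) in outcomes if y == best)
    return sum(p * max(c + w[y], c_best + f[best])
               for (p, c, y) in outcomes)
```

Note how the max keeps Q_{f,w}(X, a) no smaller than the optimistic continuation c(S(X), a, S(X')) + f(X') through succ(X, a)^b, which is the property the lemmas below repeatedly exploit.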
We adopt the following convention: the search state-space at any particular execution of ComputePath is denoted by S̃, and any state in S̃ is denoted by a letter with a tilde above it. The states in the original MDP do not use the tilde sign. Thus, if X is a full state, then X̃ = [S(X), H(X)]. We also reserve the notation X^{X̃+H^u} to denote the full state [S(X̃); H(X̃); H^u(X_pivot)].

Similarly to the definition of π in a full state-space, an optimistic path from X̃_n to X̃_0 is defined as π̃ = [{X^{X̃_n+H^u}, a_n, succ(X^{X̃_n+H^u}, a_n)^b}, {X^{X̃_{n-1}+H^u}, a_{n-1}, succ(X^{X̃_{n-1}+H^u}, a_{n-1})^b}, ..., {X^{X̃_1+H^u}, a_1, succ(X^{X̃_1+H^u}, a_1)^b}], where for every i ≥ 1, X̃_{i-1} = [S(succ(X^{X̃_i+H^u}, a_i)^b); H(succ(X^{X̃_i+H^u}, a_i)^b)].

Similarly to the definition of π_best, a path π̃_best from X̃_n to X̃_0 in a full state-space is defined as a path π̃ from X̃_n to X̃_0 where for every i ≥ 1, a_i = besta(X̃_i). In addition, we define a greedy path π̃_{greedy,f,w} with respect to functions f and w that map each state X̃ onto non-negative real values. It is defined as a path π̃ from X̃_n to X̃_0 where for every i ≥ 1, a_i = argmin_{a ∈ A(S(X̃_i))} Q_{f,w}(X^{X̃_i+H^u}, a).

We define goal distances, g*-values, under a function w recursively as follows:

g*(X̃) = { 0 if S(X̃) = S_goal; min_{a ∈ A(S(X̃))} Q_{f(Y = succ(X^{X̃+H^u}, a)^b) = g*(Ỹ), w}(X^{X̃+H^u}, a) otherwise   (4)

Finally, we require that the heuristics are consistent in the following sense: h(X̃_searchgoal) = 0, and for every other state X̃, a ∈ A(S(X̃)) and Ỹ s.t. Y = succ(X^{X̃+H^u}, a)^b, h(Ỹ) ≤ h(X̃) + c(S(X̃), a, S(Ỹ)).

A.1.1 Low-level Correctness

Lemma 1 Given a non-negative function w, for any state X̃_n,

g*(X̃_n) = φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0) = min_{π̃ from X̃_n to X̃_0} φ_π̃(X̃_n, X̃_0),

where X̃_0 is the only state on π̃_{greedy,g*,w}(X̃_n, X̃_0) that has S(X̃_0) = S_goal. In addition, it holds that H(X̃_0) satisfies the equation on line 5.

Proof: Let us first prove that g*(X̃_n) = φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0). Let us write out the formula for φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0). If n = 0, then φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0) = g*(X̃_n) = 0 since S(X̃_n) = S(X̃_0) = S_goal. Suppose now n ≠ 0. Then

φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0) = min_{a ∈ A(S(X̃_n))} Q_{f(X̃_{n-1}) = φ_{π̃_{greedy,g*,w}}(X̃_{n-1}, X̃_0), w}(X^{X̃_n+H^u}, a)

According to the definition of an optimistic path π̃, X̃_{n-1} = succ(X^{X̃_n+H^u}, a_n)^b. It is thus the exact same formula as for the g*-values (equation 4).

Let us now prove that φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0) = min_{π̃ from X̃_n to X̃_0} φ_π̃(X̃_n, X̃_0).
Let us denote argmin_{π̃ from X̃_n to X̃_0} φ_π̃(X̃_n, X̃_0) by π̃*(X̃_n, X̃_0) and min_{π̃ from X̃_n to X̃_0} φ_π̃(X̃_n, X̃_0) by φ_{π̃*}(X̃_n, X̃_0). Since π̃*(X̃_n, X̃_0) is an optimal optimistic path, φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0) ≥ φ_{π̃*}(X̃_n, X̃_0). We therefore need to show that φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0) ≤ φ_{π̃*}(X̃_n, X̃_0) also.

The proof is a simple proof by contradiction. Let us assume that φ_{π̃_{greedy,g*,w}}(X̃_n, X̃_0) > φ_{π̃*}(X̃_n, X̃_0). This implies that φ_{π̃*}(X̃_n, X̃_0) is finite, and therefore the path π̃*(X̃_n, X̃_0) is finite (since φ_{π̃*}(X̃_i, X̃_0) > φ_{π̃*}(X̃_{i-1}, X̃_0) for all i > 0 and φ_{π̃*}(X̃_0, X̃_0) = 0). Consider a pair of states X̃_i and X̃_{i-1} on the path π̃*(X̃_n, X̃_0) such that φ_{π̃_{greedy,g*,w}}(X̃_i, X̃_0) > φ_{π̃*}(X̃_i, X̃_0) but φ_{π̃_{greedy,g*,w}}(X̃_{i-1}, X̃_0) ≤ φ_{π̃*}(X̃_{i-1}, X̃_0). Such a pair must exist since at least for X̃_0, φ_{π̃_{greedy,g*,w}}(X̃_0, X̃_0) = φ_{π̃*}(X̃_0, X̃_0) = 0. Then we get the following contradiction:

φ_{π̃_{greedy,g*,w}}(X̃_i, X̃_0) ≤ Q_{f(X̃_{i-1}) = φ_{π̃_{greedy,g*,w}}(X̃_{i-1}, X̃_0), w}(X_i, a_i) ≤ Q_{f(X̃_{i-1}) = φ_{π̃*}(X̃_{i-1}, X̃_0), w}(X_i, a_i) = φ_{π̃*}(X̃_i, X̃_0)

We now show that H(X̃_0) satisfies the equation on line 5. Consider any h_i. Until the path π̃ involves executing an action whose outcomes depend on h_i, any state X̃_j on the path will have h_i(X̃_j) = h_i(X_pivot). Suppose now at state X̃_j an action a is executed whose outcomes depend on h_i. Then, if h_i(X_pivot) ≠ u, the action is deterministic and h_i(X̃_{j-1}) = h_i(X_pivot), which is consistent with the equation on line 5; h_i(X̃_{j-1}) remains such until the end of the path. On the other hand, if h_i(X_pivot) = u, the action a may have multiple outcomes, but an optimistic path always chooses the preferred outcome: X̃_{j-1} = succ(X^{X̃_j+H^u}, a)^b. Therefore, h_i(X̃_{j-1}) = b and remains such until the end of the path. This is again consistent with the equation on line 5. Finally, if the path π̃ does not involve executing an action whose outcomes depend on h_i, then h_i(X̃_0) = h_i(X_pivot), which is also consistent with the equation on line 5.

Lemma 2 Given a non-negative function w and a path π̃_{greedy,g*,w} from X̃_n to any state X̃_0 with S(X̃_0) = S_goal, it holds that

g*(X̃_n) ≥ Σ_{j=i+1}^{n} c(S(X̃_j), a_j, S(X̃_{j-1})) + g*(X̃_i) for any n ≥ i ≥ 0.

Proof: The following is the proof that the theorem holds for i = n − 1.
g*(X̃_n) = min_{a ∈ A(S(X̃_n))} Q_{f(Y = succ(X^{X̃_n+H^u}, a)^b) = g*(Ỹ), w}(X^{X̃_n+H^u}, a)
 = Q_{f(Y = succ(X^{X̃_n+H^u}, a_n)^b) = g*(Ỹ), w}(X^{X̃_n+H^u}, a_n)
 = Σ_{Y ∈ succ(X^{X̃_n+H^u}, a_n)} P(X^{X̃_n+H^u}, a_n, Y) · max(c(S(X̃_n), a_n, S(Ỹ)) + w(Y), c(S(X̃_n), a_n, S(succ(X^{X̃_n+H^u}, a_n)^b)) + g*(succ(X^{X̃_n+H^u}, a_n)^b))
 ≥ c(S(X̃_n), a_n, S(succ(X^{X̃_n+H^u}, a_n)^b)) + g*(succ(X^{X̃_n+H^u}, a_n)^b)
 = c(S(X̃_n), a_n, S(X̃_{n-1})) + g*(X̃_{n-1})

The proof for n − 1 > i ≥ 0 holds by induction on i.

Lemma 3 At any point in time, for any state X̃ it holds that v(X̃) ≥ g(X̃).

Proof: The theorem clearly holds before line 9 is executed for the first time, since each state X̃ has v(X̃) = ∞. Afterwards, the g-values can only decrease (lines 15-16). For any state X̃, on the other hand, v(X̃) only changes on line 11, where it is set to g(X̃). Thus, it is always true that v(X̃) ≥ g(X̃).

Lemma 4 Assuming the function w is non-negative, at line 9 the following holds:

- g(X̃) = 0 and besta(X̃) = null for every state X̃ whose S(X̃) = S_goal and whose H(X̃) satisfies the equation on line 5;
- g(X̃) = Q_{f(Y) = v(Ỹ), w}(X^{X̃+H^u}, besta(X̃)) and besta(X̃) = argmin_{a ∈ A(S(X̃))} Q_{f(Y) = v(Ỹ), w}(X^{X̃+H^u}, a) for every other state X̃;
- if g(X̃) = ∞, then besta(X̃) = null.

Proof: The theorem holds the first time line 9 is executed. This is so because every state X̃ ∈ S̃ has v(X̃) = ∞. As a result, the right-hand side of equation 1 evaluated under the function f = ∞ is equal to ∞, independently of the action a. This is correct, since after the initialization every state X̃ with S(X̃) ≠ S_goal, or whose H(X̃) does not satisfy the equation on line 5, has g(X̃) = ∞ and besta(X̃) = null, and every state X̃ with S(X̃) = S_goal and H(X̃) satisfying the equation on line 5 has g(X̃) = 0 and besta(X̃) = null.

The only places where g- and v-values are changed afterwards are lines 11 and 16. If v(X̃) is changed on line 11, then it is decreased according to Lemma 3. Thus, it may only decrease the g-values of its successors. The test on line 15 checks this and updates the g-values and besta pointers as necessary. Since all costs are positive and never change, the g-value of a state X̃ with S(X̃) = S_goal and H(X̃) satisfying the equation on line 5 can never be changed: it will never pass the test on line 15, and thus is always 0. Also, since g-values do not increase, it continues to hold that if g(X̃) = ∞, then besta(X̃) = null.

Lemma 5 At line 9, OPEN contains all and only states X̃ whose v(X̃) ≠ g(X̃).

Proof: The first time line 9 is executed the theorem holds, since after the initialization the only states in OPEN are the states X̃ with v(X̃) = ∞ ≠ 0 = g(X̃). The rest of the states have both values infinite.
During the following execution, whenever we decrease g(X̃) (line 16) and as a result make g(X̃) < v(X̃) (Lemma 3), we insert it into OPEN; whenever we remove X̃ from OPEN (line 10), we set v(X̃) = g(X̃) (line 11), making the state consistent. We never modify v(X̃) or g(X̃) elsewhere.

Lemma 6 Assuming the function w is non-negative, suppose X̃ is selected for expansion on line 10. Then the next time line 9 is executed, v(X̃) = g(X̃), where g(X̃) before and after the expansion of X̃ is the same.

Proof: Suppose X̃ is selected for expansion. Then on line 11, v(X̃) = g(X̃), and that is the only place where a v-value changes. We thus only need to show that g(X̃) does not change. It could only change if X̃' = X̃ and g(X̃') > Q_a at one of the executions of line 15. The former condition means that there exists an a such that X̃ = [S(succ(X^{X̃+H^u}, a)^b); H(succ(X^{X̃+H^u}, a)^b)]. The latter condition means that g(X̃) > Q_{f(Y) = v(Ỹ), w}(X^{X̃+H^u}, a). Since X̃ = [S(succ(X^{X̃+H^u}, a)^b); H(succ(X^{X̃+H^u}, a)^b)], f(succ(X^{X̃+H^u}, a)^b) = v(X̃) = g(X̃). Hence, g(X̃) > Q_{f(succ(X^{X̃+H^u}, a)^b) = g(X̃), w}(X^{X̃+H^u}, a). This means that g(X̃) > c(S(X̃), a, S(X̃)) + g(X̃), which is impossible since costs are positive.

Lemma 7 Assuming the function w is non-negative, at line 9, for any state X̃, the optimistic cost of the path defined by besta pointers, π̃_best, from X̃ to a state X̃_0 whose S(X̃_0) = S_goal is no larger than g(X̃), that is, φ_{π̃_best}(X̃, X̃_0) ≤ g(X̃). In addition, v(X̃) ≥ g(X̃) ≥ g*(X̃).

Proof: v(X̃) ≥ g(X̃) holds according to Lemma 3. We thus need to show that φ_{π̃_best}(X̃, X̃_0) ≤ g(X̃) and g(X̃) ≥ g*(X̃). The statement follows if g(X̃) = ∞. We thus can restrict our proof to a finite g-value. Consider a path π̃_best from X̃_n = X̃ to a state X̃_0: π̃_best = [{X^{X̃_n+H^u}, a_n, succ(X^{X̃_n+H^u}, a_n)^b}, {X^{X̃_{n-1}+H^u}, a_{n-1}, succ(X^{X̃_{n-1}+H^u}, a_{n-1})^b}, ..., {X^{X̃_1+H^u}, a_1, succ(X^{X̃_1+H^u}, a_1)^b}], where a_i = besta(X̃_i) and X̃_{i-1} = [S(succ(X^{X̃_i+H^u}, a_i)^b); H(succ(X^{X̃_i+H^u}, a_i)^b)].

We now show that φ_{π̃_best}(X̃, X̃_0) ≤ g(X̃) by contradiction. Suppose it does not hold. Let us then pick a state X̃_k on the path that is closest to X̃_0 and for which φ_{π̃_best}(X̃_k, X̃_0) > g(X̃_k). S(X̃_k) ≠ S_goal, because otherwise φ_{π̃_best}(X̃_k, X̃_0) = 0 from the definition of the φ-values. Consequently, φ_{π̃_best}(X̃_k, X̃_0) = Q_{f(succ(X^{X̃_k+H^u}, a_k)^b) = φ_{π̃_best}(X̃_{k-1}, X̃_0), w}(X^{X̃_k+H^u}, a_k). According to Lemma 4, g(X̃_k) = Q_{f(Y) = v(Ỹ), w}(X^{X̃_k+H^u}, a_k), where a_k = besta(X̃_k).
From Lemma 3 it then also follows that g(X̃_k) ≥ Q_{f(Y) = g(Ỹ), w}(X^{X̃_k+H^u}, a_k). Hence, g(X̃_k) ≥ Q_{f(succ(X^{X̃_k+H^u}, a_k)^b) = g(X̃_{k-1}), w}(X^{X̃_k+H^u}, a_k). Finally, because of the way we picked the state X̃_k, φ_{π̃_best}(X̃_{k-1}, X̃_0) ≤ g(X̃_{k-1}). As a result,

g(X̃_k) ≥ Q_{f(succ(X^{X̃_k+H^u}, a_k)^b) = g(X̃_{k-1}), w}(X^{X̃_k+H^u}, a_k) ≥ Q_{f(succ(X^{X̃_k+H^u}, a_k)^b) = φ_{π̃_best}(X̃_{k-1}, X̃_0), w}(X^{X̃_k+H^u}, a_k) = φ_{π̃_best}(X̃_k, X̃_0)

This is a contradiction to the assumption that φ_{π̃_best}(X̃_k, X̃_0) > g(X̃_k).

Since φ_{π̃_best}(X̃, X̃_0) ≤ g(X̃), the proof that g(X̃) ≥ g*(X̃) follows directly from Lemma 1.

A.1.2 Main Theorems

Theorem 1 Assuming the function w is non-negative, at line 9, for any state X̃ with (h(X̃) < ∞ AND g(X̃) + h(X̃) ≤ g(Ũ) + h(Ũ) ∀ Ũ ∈ OPEN), it holds that g(X̃) = g*(X̃).

Proof: We prove by contradiction. Suppose there exists X̃ such that h(X̃) < ∞ and g(X̃) + h(X̃) ≤ g(Ũ) + h(Ũ) ∀ Ũ ∈ OPEN, but g(X̃) ≠ g*(X̃). According to Lemma 7 it then follows that g(X̃) > g*(X̃). This also implies that g*(X̃) < ∞. We also assume that S(X̃) ≠ S_goal or that H(X̃) does not satisfy the equation on line 5, since otherwise g(X̃) = 0 = g*(X̃) from Lemma 4.

Consider a path π̃_{greedy,g*,w} from X̃_n = X̃ to a state X̃_0 whose S(X̃_0) = S_goal. According to Lemma 1, the cost of this path is g*(X̃) and H(X̃_0) satisfies the equation on line 5. Such a path must exist, since g*(X̃) < ∞ and from equation 4 it is clear that g*(X̃_i) > g*(X̃_{i-1}) for each i ≥ 1 on the path.

Our assumption that g(X̃) > g*(X̃) means that there exists at least one X̃_i on the path π̃_{greedy,g*,w}, with i ≤ n − 1, whose v(X̃_i) > g*(X̃_i). Otherwise,

g(X̃) = g(X̃_n) =(Lemma 4) min_{a ∈ A(S(X̃_n))} Q_{f(Y) = v(Ỹ), w}(X^{X̃_n+H^u}, a)
 ≤ Q_{f(Y) = v(Ỹ), w}(X^{X̃_n+H^u}, a_n) =(def. of π̃) Q_{f(Y) = v(X̃_{n-1}), w}(X^{X̃_n+H^u}, a_n)
 ≤ Q_{f(Y) = g*(X̃_{n-1}), w}(X^{X̃_n+H^u}, a_n) =(def. of g*) g*(X̃_n) = g*(X̃)

Let us now consider the X̃_i on the path with the smallest index i ≥ 0 (that is, closest to X̃_0) such that v(X̃_i) > g*(X̃_i). We will first show that g*(X̃_i) ≥ g(X̃_i). It is clearly so when i = 0, according to Lemma 4, which says that g(X̃_i) = 0 whenever S(X̃_i) = S_goal and H(X̃_i) satisfies the equation on line 5. For i > 0 we use the fact that v(X̃_{i-1}) ≤ g*(X̃_{i-1}) from the way X̃_i was chosen:

g(X̃_i) =(Lemma 4) min_{a ∈ A(S(X̃_i))} Q_{f(Y) = v(Ỹ), w}(X_i, a)
 ≤ Q_{f(Y) = v(Ỹ), w}(X_i, a_i) =(def. of π̃) Q_{f(Y) = v(X̃_{i-1}), w}(X_i, a_i)
 ≤ Q_{f(Y) = g*(X̃_{i-1}), w}(X_i, a_i) =(def. of g*) g*(X̃_i)

We thus have v(X̃_i) > g*(X̃_i) ≥ g(X̃_i), which implies that X̃_i ∈ OPEN according to Lemma 5. We will now show that g(X̃) + h(X̃) > g(X̃_i) + h(X̃_i), and finally arrive at a contradiction. According to our assumption, g(X̃) > g*(X̃) and h(X̃) < ∞; therefore

g(X̃) + h(X̃) = g(X̃_n) + h(X̃_n) > g*(X̃_n) + h(X̃_n)
 ≥(Lemma 2) Σ_{j=i+1}^{n} c(S(X̃_j), a_j, S(X̃_{j-1})) + g*(X̃_i) + h(X̃_n)
 ≥(property of h) Σ_{j=i+1}^{n-1} c(S(X̃_j), a_j, S(X̃_{j-1})) + g*(X̃_i) + h(X̃_{n-1})
 ≥ ... ≥ g*(X̃_i) + h(X̃_i) ≥ g(X̃_i) + h(X̃_i)

This inequality, however, implies that X̃_i ∉ OPEN, since according to the conditions of the theorem g(X̃) + h(X̃) ≤ g(Ũ) + h(Ũ) ∀ Ũ ∈ OPEN. But this contradicts what we have proven earlier.

A.1.3 Correctness

The corollaries in this section show how the theorems in the previous section lead quite trivially to the correctness of ComputePath. We also show that each state is expanded at most once, similar to the guarantee that A* makes for deterministic graphs whenever heuristics are consistent.

Corollary 1 When the ComputePath function exits, the following holds for any state X̃ with h(X̃) < ∞ and g(X̃) + h(X̃) ≤ min_{X̃' ∈ OPEN}(g(X̃') + h(X̃')): the optimistic cost of the path defined by besta pointers, π̃_best, from X̃ to a state X̃_0 whose S(X̃_0) = S_goal is equal to g*(X̃), that is, φ_{π̃_best}(X̃, X̃_0) = g*(X̃).

Proof: According to Theorem 1, the condition h(X̃) < ∞ and g(X̃) + h(X̃) ≤ min_{X̃' ∈ OPEN}(g(X̃') + h(X̃')) implies that g(X̃) = g*(X̃). From Lemma 7 it then follows that φ_{π̃_best}(X̃, X̃_0) ≤ g*(X̃). Since g*(X̃) is the optimistic cost of a least-cost optimistic path from X̃ to X̃_0 according to Lemma 1, φ_{π̃_best}(X̃, X̃_0) = g*(X̃).

Corollary 2 When the ComputePath function exits, the following holds: the optimistic cost of the path defined by besta pointers, π̃_best, from X̃_searchgoal to a state X̃_0 whose S(X̃_0) = S_goal is equal to g*(X̃_searchgoal), that is, φ_{π̃_best}(X̃_searchgoal, X̃_0) = g*(X̃_searchgoal). The length of this path is finite.

Proof: According to the termination condition of the ComputePath function, upon its exit g(X̃_searchgoal) ≤ min_{X̃' ∈ OPEN}(g(X̃') + h(X̃')). Since h(X̃_searchgoal) = 0, the proof that the cost of the path is equal to g*(X̃_searchgoal) then follows directly from Corollary 1.

To prove that the path defined by besta pointers is always finite, first consider the case of g(X̃_searchgoal) = ∞. According to Lemma 4, then, besta(X̃_searchgoal) = null and the path defined by besta pointers is therefore empty. Suppose now g(X̃_searchgoal) ≠ ∞. Since g(X̃_searchgoal) ≤ min_{X̃' ∈ OPEN}(g(X̃') + h(X̃')) and h(X̃_searchgoal) = 0, Theorem 1 applies and therefore ∞ > g(X̃_searchgoal) = g*(X̃_searchgoal). As a result, the optimistic cost of the path defined by besta pointers is also finite according to Lemma 7. Considering that the costs are bounded from below by a positive constant, this shows that the path is of finite length.

Corollary 3 When the ComputePath function exits, the following holds for each state X̃ on the path π̃_best(X̃_searchgoal, X̃_0): g(X̃) = g*(X̃).

Proof: At the time ComputePath terminates, g(X̃_searchgoal) ≤ min_{X̃' ∈ OPEN}(g(X̃') + h(X̃')) and h(X̃_searchgoal) = 0. Thus, according to Theorem 1, g(X̃_searchgoal) = g*(X̃_searchgoal). We now prove that the theorem holds for the rest of the states on the path defined by besta pointers. The case when g(X̃_searchgoal) = ∞ is trivially proven by noting that in this case besta(X̃_searchgoal) = null according to Lemma 4. We therefore consider the case when g(X̃_searchgoal) ≠ ∞. We prove the theorem for this case by induction. Suppose g(X̃_i) = g*(X̃_i), g(X̃_i) + h(X̃_i) ≤ min_{X̃' ∈ OPEN}(g(X̃') + h(X̃')) and h(X̃_i) < ∞. This is true at least for the first state on the path, namely X̃_searchgoal. We will show that g(X̃_{i-1}) = g*(X̃_{i-1}), g(X̃_{i-1}) + h(X̃_{i-1}) ≤ min_{X̃' ∈ OPEN}(g(X̃') + h(X̃')) and h(X̃_{i-1}) < ∞. This induction step will prove the statement of the theorem.

The property h(X̃_{i-1}) < ∞ follows from the consistency of the heuristics and the fact that h(X̃_i) < ∞. By consistency, h(X̃_{i-1}) ≤ h(X̃_i) + c(S(X̃_i), besta(X̃_i), S(X̃_{i-1})).
h(X̃_i) is finite according to our induction assumption, whereas the costs are finite because ∞ > g(X̃_searchgoal) = g*(X̃_searchgoal). Thus, h(X̃_{i-1}) < ∞.

To prove that g(X̃_{i-1}) + h(X̃_{i-1}) ≤ min_{X̃' ∈ OPEN}(g(X̃') + h(X̃')), we will show that g(X̃_{i-1}) + h(X̃_{i-1}) ≤ g(X̃_i) + h(X̃_i) as follows:

g(X̃_{i-1}) + h(X̃_{i-1})
 ≤(consistency of heuristics) g(X̃_{i-1}) + h(X̃_i) + c(S(X̃_i), besta(X̃_i), S(X̃_{i-1}))
 ≤(Lemma 3) v(X̃_{i-1}) + c(S(X̃_i), besta(X̃_i), S(X̃_{i-1})) + h(X̃_i)
 ≤ Σ_{Y ∈ succ(X^{X̃_i+H^u}, besta(X̃_i))} P(X^{X̃_i+H^u}, besta(X̃_i), Y) · max(c(S(X̃_i), besta(X̃_i), S(Ỹ)) + w(Y), c(S(X̃_i), besta(X̃_i), S(X̃_{i-1})) + v(X̃_{i-1})) + h(X̃_i)
 =(eq. 1) Q_{f(Y) = v(Ỹ), w}(X^{X̃_i+H^u}, besta(X̃_i)) + h(X̃_i)
 =(Lemma 4) g(X̃_i) + h(X̃_i)
 ≤(inductive assumption) min_{X̃' ∈ OPEN}(g(X̃') + h(X̃'))

Finally, the fact that g(X̃_{i-1}) = g*(X̃_{i-1}) now comes directly from Theorem 1.

Theorem 2 No state is expanded more than once during the execution of the ComputePath function.

Proof: Suppose a state X̃ is selected for expansion for the first time during the execution of the ComputePath function. Then, it is removed from the OPEN set on line 10. According to Theorem 1, its g-value at this point is equal to g*(X̃). On line 11 the state is made consistent by setting its v-value to its g-value. The only way X̃ can be chosen for expansion again is if it is inserted into OPEN, but this only happens if its g-value is decreased. This, however, is impossible, since g(X̃) is already equal to g*(X̃) = min_{π̃ from X̃ to X̃_0} φ_π̃(X̃, X̃_0), where X̃_0 has S(X̃_0) = S_goal (according to Lemma 1), and g(X̃) must always remain an upper bound on φ_{π̃_best}(X̃, X̃_0) (according to Lemma 7).

A.2 Main Function

In this section we present the theorems about the main function of the algorithm. All references to line numbers are to figure 2 unless explicitly specified otherwise. By w*(X) we denote the minimum expected cost of a policy for reaching a goal state from state X. We also introduce w^u-values, defined recursively as follows:

w^u(X) = { 0 if S(X) = S_goal; min_{a ∈ A(S(X))} Q_{w^u,w^u}(X^u, a) otherwise   (5)

We also define goal distances for full states, g*-values, under a function w recursively as follows:

g*(X) = { 0 if S(X) = S_goal; min_{a ∈ A(S(X))} Q_{f(Y = succ(X^u, a)^b) = g*(Y), w}(X^u, a) otherwise   (6)

Lemma 8 For each X, w^u(X) = w^u(X^u).

Proof: According to equation 5, if S(X) = S(X^u) = S_goal, then w^u(X) = w^u(X^u) = 0. Otherwise, w^u(X) = min_{a ∈ A(S(X))} Q_{w^u,w^u}(X^u, a) = min_{a ∈ A(S(X))} Q_{w^u,w^u}((X^u)^u, a) = w^u(X^u).

Lemma 9 For each X, g*(X) = g*(X^u).

Proof: According to the definition, X^u = [S(X); H(X); H^u(X)], and therefore S(X) = S(X^u). Suppose first S(X) = S_goal. Then, according to equation 6, g*(X) = 0 and g*(X^u) = 0.

Now suppose S(X) = S(X^u) ≠ S_goal. Then, according to equation 6, g*(X) = min_{a ∈ A(S(X))} Q_{f(Y = succ(X^u, a)^b) = g*(Y), w}(X^u, a) and g*(X^u) = min_{a ∈ A(S(X^u))} Q_{f(Y = succ((X^u)^u, a)^b) = g*(Y), w}((X^u)^u, a). (X^u)^u = X^u because H^u(X) does not contain any h_i elements equal to b, and therefore H^u(X^u) = H^u(X). Also, S(X) = S(X^u). Consequently, g*(X^u) = min_{a ∈ A(S(X))} Q_{f(Y = succ(X^u, a)^b) = g*(Y), w}(X^u, a) = g*(X).

Lemma 10 For each X and a ∈ A(S(X)), h_{S(X),a}(succ(X, a)^b) = h_{S(X),a}(succ(X^u, a)^b) and g*(succ(X^u, a)^b) = g*(succ(X, a)^b).

Proof: We consider all possible cases for h_{S(X),a}(X). Suppose first h_{S(X),a} = null. That is, action a is (and always was) deterministic. Then h_{S(X),a}(X^u) = null also, and therefore h_{S(X),a}(succ(X, a)^b) = h_{S(X),a}(succ(X^u, a)^b) = null. Also, succ(X^u, a)^b = (succ(X, a)^b)^u because the h-values are not affected by action a, and therefore g*(succ(X^u, a)^b) = g*(succ(X, a)^b) according to Lemma 9.

Suppose now h_{S(X),a}(X) ≠ b. Then again h_{S(X),a}(X^u) = h_{S(X),a}(X), and therefore h_{S(X),a}(succ(X, a)^b) = h_{S(X),a}(succ(X^u, a)^b). Also, succ(X^u, a)^b = (succ(X, a)^b)^u because the h-values are not affected by action a, and therefore g*(succ(X^u, a)^b) = g*(succ(X, a)^b) according to Lemma 9.

Now suppose h_{S(X),a}(X) = b. If h_{S(X),a} ∉ H, then h_{S(X),a}(X^u) = b, whereas if h_{S(X),a} ∈ H, then h_{S(X),a}(X^u) = u. In either case, however, h_{S(X),a}(succ(X, a)^b) = h_{S(X),a}(succ(X^u, a)^b) = b. Also, g*(succ(X, a)^b) = g*((succ(X, a)^b)^u) and g*(succ(X^u, a)^b) = g*((succ(X^u, a)^b)^u) according to Lemma 9. But (succ(X, a)^b)^u = (succ(X^u, a)^b)^u, and therefore g*(succ(X, a)^b) = g*(succ(X^u, a)^b) as stated in the theorem.

Theorem 3 Suppose that before line 10 is executed, for every state X it is true that 0 ≤ w(X) ≤ w(X^u). Then after line 11 is executed, for each state X on π_best from X_pivot to a goal state it holds that w(X) ≥ E_{X' ∈ succ(X,besta(X))}(c(S(X), besta(X), S(X')) + w(X')) if S(X) ≠ S_goal, and w(X) = 0 otherwise.
Proof: We first prove that after line 11 is executed, for each state X_i on π_best from X_pivot = X_n to a goal state X_0, it is true that X̃_i = [S(X_i); H(X_i)], where X̃_i is the i-th state on π̃_best from X̃_pivot = X̃_n to a goal state X̃_0. We prove this by induction. It certainly holds for i = n, since X̃_n = [S(X_pivot); H(X_pivot)] = [S(X_n); H(X_n)]. We now prove that it continues to hold for i − 1. On line 6 we pick X_{i-1} to be equal to succ(X_i, a_i)^b, where a_i = besta(X_i) = besta(X̃_i). We thus need to show that [S(succ(X_i, a_i)^b); H(succ(X_i, a_i)^b)] is the (i − 1)-th state on π̃_best. According to the definition of π̃_best, the (i − 1)-th state on it is defined as X̃_{i-1} = [S(succ(X^{X̃_i+H^u}, a_i)^b); H(succ(X^{X̃_i+H^u}, a_i)^b)]. We thus need to show that S(succ(X_i, a_i)^b) = S(succ(X^{X̃_i+H^u}, a_i)^b) and H(succ(X_i, a_i)^b) = H(succ(X^{X̃_i+H^u}, a_i)^b), where X_i = [S(X_i); H(X_i); H(X_i)] and X^{X̃_i+H^u} = [S(X_i); H(X_i); H^u(X_pivot)].

Since, according to the definition of π_best, X_i = succ(X_{i+1}, a_{i+1})^b = succ(succ(X_{i+2}, a_{i+2})^b, a_{i+1})^b and so on, h_{S(X_i),a_i}(X_i) can only be different from h_{S(X_i),a_i}(X^{X̃_i+H^u}) if h_{S(X_i),a_i}(X^{X̃_i+H^u}) = u and h_{S(X_i),a_i}(X_i) = b. Consequently, the same preferred outcome of action a_i exists for both X_i and X^{X̃_i+H^u}, namely the one that has h_{S(X_i),a_i} = b. In other words, [S(succ(X_i, a_i)^b); H(succ(X_i, a_i)^b)] = [S(succ(X^{X̃_i+H^u}, a_i)^b); H(succ(X^{X̃_i+H^u}, a_i)^b)]. That is, [S(X_{i-1}); H(X_{i-1})] = X̃_{i-1}.

We now prove the statement of the theorem itself. Consider an arbitrary X_i on π_best from X_pivot = X_n to a goal state X_0. Because of the statement we have just proven and the execution of line 4, w(X_i) = g(X̃_i), where X̃_i = [S(X_i); H(X_i)] is the i-th state on π̃_best from X̃_pivot to a goal state X̃_0. If i = 0, then w(X_i) = g(X̃_i) = 0 according to Lemma 4. Suppose now i > 0. According to Lemma 4, then, w(X_i) = g(X̃_i) = Q_{f(Y) = v(Ỹ), w_old}(X^{X̃_i+H^u}, a_i), where w_old is the w-function before the execution of the ComputePath function. In addition, the w-value of each X' ∈ succ(X_i, a_i) such that X' ≠ X_{i-1} remains the same as before the ComputePath function was called. This is so because UpdateMDP does not update the w-values of states with at least one h_j-value that is neither equal to h_j(X^{X̃+H^u}) nor equal to b. Moreover, from Lemma 3, v(X̃_{i-1}) ≥ g(X̃_{i-1}) = w(X_{i-1}). Hence w(X_i) ≥ Q_{f(Y) = w(X_{i-1}), w}(X^{X̃_i+H^u}, a_i). Thus,

w(X_i) ≥ Σ_{Y ∈ succ(X^{X̃_i+H^u}, a_i)} P(X^{X̃_i+H^u}, a_i, Y) · max(c(S(X^{X̃_i+H^u}), a_i, S(Y)) + w(Y), c(S(X^{X̃_i+H^u}), a_i, S(succ(X^{X̃_i+H^u}, a_i)^b)) + w(X_{i-1}))

We distinguish two cases. First, suppose h_{S(X_i),a_i}(X_i) is different from h_{S(X_i),a_i}(X^{X̃_i+H^u}). This is only possible if h_{S(X_i),a_i}(X^{X̃_i+H^u}) = u and h_{S(X_i),a_i}(X_i) = b. The latter implies that there is only one outcome in succ(X_i, a_i), namely, X_{i-1}.
Hence,

w(X_i) ≥ Σ_{Y ∈ succ(X^{X̃_i+H^u}, a_i)} P(X^{X̃_i+H^u}, a_i, Y) · (c(S(X^{X̃_i+H^u}), a_i, S(succ(X^{X̃_i+H^u}, a_i)^b)) + w(X_{i-1}))
 = c(S(X^{X̃_i+H^u}), a_i, S(succ(X^{X̃_i+H^u}, a_i)^b)) + w(X_{i-1})
 = c(S(X_i), a_i, S(X_{i-1})) + w(X_{i-1})
 = c(S(X_i), besta(X_i), S(X_{i-1})) + w(X_{i-1})
 = E_{X' ∈ succ(X_i, besta(X_i))}(c(S(X_i), besta(X_i), S(X')) + w(X'))

Now suppose h_{S(X_i),a_i}(X_i) = h_{S(X_i),a_i}(X^{X̃_i+H^u}). Then the probability distribution is the same for succ(X^{X̃_i+H^u}, a_i) and succ(X_i, a_i). Hence,

w(X_i) ≥ P(X^{X̃_i+H^u}, a_i, succ(X^{X̃_i+H^u}, a_i)^b) · (c(S(X^{X̃_i+H^u}), a_i, S(succ(X^{X̃_i+H^u}, a_i)^b)) + w(X_{i-1}))
 + Σ_{Y ∈ succ(X^{X̃_i+H^u}, a_i) s.t. Y ≠ succ(X^{X̃_i+H^u}, a_i)^b} P(X^{X̃_i+H^u}, a_i, Y) · (c(S(X^{X̃_i+H^u}), a_i, S(Y)) + w(Y))
 = P(X_i, a_i, X_{i-1}) · (c(S(X_i), a_i, S(X_{i-1})) + w(X_{i-1}))
 + Σ_{Y ∈ succ(X^{X̃_i+H^u}, a_i) s.t. Y ≠ succ(X^{X̃_i+H^u}, a_i)^b} (P(X^{X̃_i+H^u}, a_i, Y) · (c(S(X_i), a_i, S(Y)) + w(Y)))

Consider now Y ∈ succ(X^{X̃_i+H^u}, a_i) such that Y ≠ succ(X^{X̃_i+H^u}, a_i)^b. Consider also Z ∈ succ(X_i, a_i) such that h_{S(X_i),a_i}(Y) = h_{S(X_i),a_i}(Z) (that is, Y and Z are corresponding outcomes). Then Y = [S(Z); H(Z); H^u(Z)] = Z^u. Consequently, w(Y) = w_old(Y) ≥ w_old(Z) = w(Z) according to the assumptions of the theorem. As a result,

w(X_i) ≥ Σ_{Y ∈ succ(X_i, a_i)} (P(X_i, a_i, Y) · (c(S(X_i), a_i, S(Y)) + w(Y))) = E_{X' ∈ succ(X_i, besta(X_i))}(c(S(X_i), besta(X_i), S(X')) + w(X'))

Theorem 4 For each state X, it holds that w^b(X) ≤ g*(X).

Proof: The case of g*(X) = ∞ is trivial. We therefore assume that g*(X) is finite and prove by contradiction. Suppose there exists an X such that w^b(X) > g*(X). It could not have been a state whose S(X) = S_goal, since according to the definitions of w^b(X) and g*(X) they are both equal to 0. We therefore assume that S(X) ≠ S_goal. Then

w^b(X) = min_{a ∈ A(S(X))} (c(S(X), a, succ(X, a)^b) + w^b(succ(X, a)^b))

and

g*(X) = min_{a ∈ A(S(X))} Q_{f(Y = succ(X^u, a)^b) = g*(Y), w}(X^u, a)
 = min_{a ∈ A(S(X))} Σ_{Y ∈ succ(X^u, a)} P(X^u, a, Y) · max(c(S(X^u), a, S(Y)) + w(Y), c(S(X^u), a, S(succ(X^u, a)^b)) + g*(succ(X^u, a)^b))
 ≥ min_{a ∈ A(S(X))} c(S(X^u), a, S(succ(X^u, a)^b)) + g*(succ(X^u, a)^b)
 = min_{a ∈ A(S(X))} c(S(X), a, S(succ(X, a)^b)) + g*(succ(X, a)^b)

The last line is due to the fact that S(X) = S(X^u), S(succ(X^u, a)^b) = S(succ(X, a)^b) and g*(succ(X^u, a)^b) = g*(succ(X, a)^b) according to Lemma 10.

Let us consider a path π = [{X_n, a_n, X_{n-1}}, ..., {X_1, a_1, X_0}], where X_n = X, S(X_0) = S_goal, and for every tuple {X_i, a_i, X_{i-1}}, X_{i-1} = succ(X_i, a_i)^b and a_i = argmin_{a ∈ A(S(X_i))} c(S(X_i), a, succ(X_i, a)^b) + g*(succ(X_i, a)^b). Since g*(X) is finite and all costs are bounded from below by a positive constant, the path π is finite. Because w^b(X_n) > g*(X_n) and w^b(X_0) = g*(X_0) = 0, there must be a tuple {X_i, a_i, X_{i-1}} ∈ π such that w^b(X_i) > g*(X_i) whereas w^b(X_{i-1}) ≤ g*(X_{i-1}). But then we get the following contradiction:

g*(X_i) ≥ c(S(X_i), a_i, succ(X_i, a_i)^b) + g*(succ(X_i, a_i)^b)
 ≥ c(S(X_i), a_i, succ(X_i, a_i)^b) + w^b(succ(X_i, a_i)^b)
 ≥ min_{a ∈ A(S(X_i))} c(S(X_i), a, succ(X_i, a)^b) + w^b(succ(X_i, a)^b)
 = w^b(X_i)

Theorem 5 After each execution of the UpdateMDP function, for each state X it holds that w_old(X) ≤ w(X) ≤ g*_old(X) ≤ g*(X), where the w_old-values are the w-values before the execution of UpdateMDP, the g*_old(X) are the g*-values under the w_old-values, and the g*(X) are the g*-values under the w-values.

Proof: First, let us show that before the first execution of UpdateMDP, for every X it holds that w(X) ≤ g*(X). It holds because, according to the assumptions about state initialization before the main function is executed, w(X) ≤ w^b(X). On the other hand, according to Theorem 4, w^b(X) ≤ g*(X).

We now prove by induction. Suppose w_old(X) ≤ g*_old(X) before the call to UpdateMDP. We need to show that after the UpdateMDP function returns, for each state X we have w_old(X) ≤ w(X) ≤ g*(X). Let us first prove that w_old(X) ≤ w(X). We only need to consider the states updated by the UpdateMDP function, since the w-values of all other states remain unchanged. We first prove, by induction on the executions of line 4, that for each state X updated by UpdateMDP it holds that X^u = X^{X̃+H^u}. Consider the first time line 4 is executed. Then X = X_pivot. Therefore, X^u = [S(X); H(X); H^u(X_pivot)] = X^{X̃+H^u}. UpdateMDP also directly updates [S(X); H(X); H^u(X_pivot)] = X^{X̃+H^u}.
Now consider the $i$-th execution of line 4, whereas on all previous executions it held that $X^u = X^{X+H^u}$. At the $i$-th execution, state $X$ is a state which is equal to some $succ(Y, besta(Y))^b$, where $Y$ is a state that was updated during the $(i-1)$-th execution of line 4. Thus, $H^u(X) = H^u(Y)$ and therefore $X^u = [S(X); H(X); H^u(X_{pivot})] = X^{X+H^u}$. Once again, UpdateMDP also updates $[S(X); H(X); H^u(X_{pivot})] = X^{X+H^u}$. Thus, for each state updated by UpdateMDP, it holds that $X^u = [S(X); H(X); H^u(X_{pivot})]$. As a result, if $S(X) = S_{goal}$, then according to the definition of g-values, $g^*_{old}(\tilde{X}) = g^*_{old}(X) = 0$. On the other hand, if $S(X) \ne S_{goal}$, then

$g^*_{old}(\tilde{X}) = \min_{a \in A(S(\tilde{X}))} Q_{f(Y = succ(X^{X+H^u}, a)^b) = g^*(\tilde{Y}), w_{old}}(X^{X+H^u}, a) = \min_{a \in A(S(X))} Q_{f(Y = succ(X^u, a)^b) = g^*(\tilde{Y}), w_{old}}(X^u, a)$

Because $g^*(\tilde{Y}) = g^*(Y) = 0$ if $S(Y) = S_{goal}$, it then holds that

$g^*_{old}(\tilde{X}) = \min_{a \in A(S(X))} Q_{f(Y = succ(X^u, a)^b) = g^*(Y), w_{old}}(X^u, a) = g^*_{old}(X)$

Thus, for each state $X$ updated by UpdateMDP, it holds that $g^*_{old}(\tilde{X}) = g^*_{old}(X)$. Also, according to corollary 3, $g(\tilde{X}) = g^*_{old}(\tilde{X})$, and from the induction assumption $w_{old}(X) \le g^*_{old}(X)$. Thus, when UpdateMDP executes $w(X) = g(\tilde{X})$ on line 4, then

$w(X) = g(\tilde{X}) = g^*_{old}(\tilde{X}) = g^*_{old}(X) \ge w_{old}(X)$

Suppose now UpdateMDP executes $w([S(X); H(X); H^u(X_{pivot})]) = g(\tilde{X})$ on line 4. According to lemma 9, $g^*_{old}(X) = g^*_{old}(X^u) = g^*_{old}(X^{X+H^u})$, and since $w_{old}(X^{X+H^u}) \le g^*_{old}(X^{X+H^u})$, it follows that:

$w(X^{X+H^u}) = g(\tilde{X}) = g^*_{old}(\tilde{X}) = g^*_{old}(X) = g^*_{old}(X^{X+H^u}) \ge w_{old}(X^{X+H^u})$

We now prove that for every state $X$, $w(X) \le g^*_{old}(X) \le g^*(X)$. We first note that since, as we have just proved, none of the w-values decreased, it holds that for every state $X$, $g^*_{old}(X) \le g^*(X)$. Suppose $X$ was not updated by UpdateMDP, that is, $w(X) = w_{old}(X)$. Then $w(X) = w_{old}(X) \le g^*_{old}(X) \le g^*(X)$. Now suppose $w(X)$ was updated by UpdateMDP. Once again, suppose the update is $w(X) = g(\tilde{X})$ on line 4. Then

$w(X) = g(\tilde{X}) = g^*_{old}(\tilde{X}) = g^*_{old}(X) \le g^*(X)$

Now suppose the update is $w([S(X); H(X); H^u(X_{pivot})]) = g(\tilde{X})$ on line 4. Then

$w(X^{X+H^u}) = g(\tilde{X}) = g^*_{old}(\tilde{X}) = g^*_{old}(X) = g^*_{old}(X^{X+H^u}) \le g^*(X^{X+H^u})$

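The proofs above repeatedly evaluate Q-values of the form $Q_{v_1,v_2}(X,a)$: every outcome $Y$ of action $a$ contributes its probability times the larger of the cost to $Y$ plus $v_1(Y)$ and the cost to the preferred outcome $succ(X,a)^b$ plus $v_2(succ(X,a)^b)$. As a purely illustrative sketch of that notation (the outcome names and numbers below are our own, not taken from the paper):

```python
# Toy evaluation of Q_{v1,v2}(X, a): non-preferred outcomes are scored with
# v1, the preferred outcome succ(X, a)^b with v2.  Everything here is a
# hypothetical example that only illustrates the notation.
def q_value(outcomes, best, v1, v2):
    """outcomes: dict mapping outcome -> (probability, action cost);
    best: the preferred (clearly best) outcome succ(X, a)^b."""
    cost_b = outcomes[best][1]
    return sum(p * max(c + v1[y], cost_b + v2[best])
               for y, (p, c) in outcomes.items())

# One stochastic action with a preferred outcome "free" (hidden variable
# turned out favorable) and a non-preferred outcome "blocked".
outcomes = {"free": (0.7, 1.0), "blocked": (0.3, 1.0)}
v1 = {"free": 2.0, "blocked": 5.0}   # w-values of the outcome states
v2 = {"free": 2.0}                   # g-value of the preferred outcome
q = q_value(outcomes, "free", v1, v2)   # 0.7*max(3,3) + 0.3*max(6,3) = 3.9
```

Note that when the preferred-outcome term is no larger than each non-preferred term, as clear preferences guarantee, the max contributes nothing extra; this is the collapse used at the end of the Theorem 9 proof below.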
Theorem 6 For a non-negative function $w \le w^u$, the following holds: for each state $X$, $g^*(X)$ is bounded from above by $w^u(X)$.

Proof: This certainly holds for $X$ whose $S(X) = S_{goal}$. Now suppose $S(X) \ne S_{goal}$. We prove by contradiction and assume that there exist one or more states $X$ whose $g^*(X) > w^u(X)$, which implies that $w^u(X)$ is finite. Let us consider a path $\pi_{greedy, w^u, w^u}(X, X_0)$, where $X_n = X$ and $S(X_0) = S_{goal}$. According to its definition, for every pair of states $X_i, X_{i-1}$ on this path it holds that $X_{i-1} = succ(X_i, a_i)^b$, where $a_i = \arg\min_{a \in A(S(X_i))} Q_{w^u, w^u}(X_i^u, a)$. Since all costs are positive and $w^u(X_i)$ is finite, $w^u(X_i) > w^u(X_{i-1})$. Also, since the costs are bounded from below by a positive constant, $w^u(X_0) = 0$ and $w^u(X)$ is finite, it holds that the path $\pi_{greedy, w^u, w^u}(X, X_0)$ is finite. This means that there must exist such $X_i$ on the path that $g^*(X_i) > w^u(X_i)$, while $g^*(X_{i-1}) \le w^u(X_{i-1})$, where $X_{i-1} = succ(X_i, a_i)^b$. Then we arrive at the following contradiction:

$g^*(X_i) \le Q_{f(X_{i-1} = succ(X_i^u, a_i)^b) = g^*(X_{i-1}), w}(X_i^u, a_i)$
$\le Q_{f(X_{i-1} = succ(X_i^u, a_i)^b) = g^*(X_{i-1}), w^u}(X_i^u, a_i)$
$= \sum_{Y \in succ(X_i^u, a_i)} P(X_i^u, a_i, Y) \max(c(S(X_i), a_i, S(Y)) + w^u(Y), \; c(S(X_i), a_i, S(X_{i-1})) + g^*(X_{i-1}))$
$\le \sum_{Y \in succ(X_i^u, a_i)} P(X_i^u, a_i, Y) \max(c(S(X_i), a_i, S(Y)) + w^u(Y), \; c(S(X_i), a_i, S(X_{i-1})) + w^u(X_{i-1}))$
$= Q_{w^u, w^u}(X_i^u, a_i) = \min_{a \in A(S(X_i))} Q_{w^u, w^u}(X_i^u, a) = w^u(X_i)$

Theorem 7 For each state $X$, $w(X)$ is bounded from above by $w^u(X)$.

Proof: Before the first execution of UpdateMDP, according to the initialization assumptions, for every state $X$, $0 \le w(X) \le w^b(X)$. Also, according to theorem 4, for each state $X$, $w^b(X) \le g^*(X)$, and according to theorem 6, $g^*(X) \le w^u(X)$. Thus, $0 \le w(X) \le w^u(X)$.

We now prove the theorem by induction. Suppose before the $i$-th execution of UpdateMDP it holds that $0 \le w_{old}(X) \le w^u(X)$, where $w_{old}$-values are w-values right before the $i$-th execution of the UpdateMDP function. We need to show that after the $i$-th execution of the UpdateMDP function, the inequality $0 \le w(X) \le w^u(X)$ holds. According to theorem 6, for every state $X$, $g^*_{old}(X) \le w^u(X)$. At the same time, according to theorem 5, $w(X) \le g^*_{old}(X)$. Thus, $w(X) \le w^u(X)$.

The inequality $0 \le w(X)$ follows from the fact that $0 \le w_{old}(X)$ and theorem 5, according to which $w_{old}(X) \le w(X)$.

Theorem 8 PPCP terminates, and at that time $w(X_{start}) \le w^u(X_{start})$, and the expected cost of the policy of always taking action $besta(X)$ at any state $X$, starting at $X_{start}$ until a state $X_0$ whose $S(X_0) = S_{goal}$ is reached, is no more than $w(X_{start})$.

Proof: We will first show that the algorithm terminates. For this, let us first show that the set of all possible policies $\pi(X_{start})$ ever considered by PPCP is guaranteed to be finite. To prove this we need to show that any policy considered by PPCP is acyclic. Then the fact that the set of all such policies is finite will be due to the belief state-space itself being finite. Any policy PPCP has at any point of time is acyclic because after each stochastic action $a$ at any state $X$, the corresponding $h_{S(X),a}$ is set to a value not equal to $u$ in the outcome states and remains such in all of their descendants, whereas all the ancestors of $X$ and $X$ itself had $h_{S(X),a} = u$. The deterministic paths between any two stochastic actions on the policy, or between $X_{start}$ and the first stochastic action on the policy, on the other hand, are all segments of the paths returned by the ComputePath function, and these paths are finite according to corollary 2. Thus, the set of all possible policies considered by PPCP is finite.

The termination criterion for the algorithm is that all states on its current policy have w-values at least as large as the expectation over the action cost plus the w-values of the successors of the action defined by the besta pointer, except for the goal states, whose w-values are 0 because they are bounded by $w^u$-values according to theorem 7 and these are zeroes for goal states. In other words, for every $X$ on the current policy s.t. $S(X) \ne S_{goal}$ it holds that

$w(X) \ge E_{X' \in succ(X, besta(X))} \{ c(S(X), besta(X), S(X')) + w(X') \}$ (7)

At each iteration, PPCP fixes at least one state $X$ on the policy to satisfy this equation. While fixing the equation, PPCP may change the besta action for state $X$ and/or change the w-value of $X$.
There is a finite number of possible subtrees below $X$ that PPCP can consider, since the set of all possible policies considered by PPCP is finite. The change in the w-value of $X$ may potentially affect other states, but since the policy is acyclic it cannot affect the states that are descendants of $X$. The number of ancestors of $X$, on the other hand, is finite since the policy is acyclic and the belief state-space is finite. Therefore, PPCP is bound to arrive, in a finite number of iterations, at a policy for which all of the states that belong to it satisfy equation 7.

Each iteration is also guaranteed to be finite for the following reasons. First, the ComputePath function is guaranteed to return because each state is expanded no more than once per search according to theorem 2. Second, the UpdateMDP function is guaranteed to return because the path it processes is guaranteed to be of finite length according to corollary 2.

We now show that after PPCP terminates, $w(X_{start}) \le w^u(X_{start})$ and the expected cost of the policy of always taking action $besta(X)$ at any state $X$, starting at

$X_{start}$ until a state $Y$ whose $S(Y) = S_{goal}$ is reached, is no more than $w(X_{start})$. The first part comes directly from theorem 7. The second part can be proved as follows. Consider the following potential function that we maintain while executing the policy defined by besta actions starting with $X_{start}$: $F(t) = costsofar(t) + w(X_t)$, where $t$ is the current time-step. So, initially $F(t=0) = 0 + w(X_{start}) = w(X_{start})$. We execute the policy until we reach a state $Y$ such that $S(Y) = S_{goal}$. Suppose it happens at timestep $t = k$; that is, $Y = X_k$. Then $F(t=k) = costsofar(k) + 0 = costsofar(k)$. We need to show that the expected value of $F(t=k)$ is bounded above by $w(X_{start})$. Initially, $E\{F(t=0)\} = w(X_0)$, where $X_0 = X_{start}$. Now consider the expectation at the $i$-th step:

$E\{F(t=i)\} = E\{costsofar(i) + w(X_i)\} = E\{costsofar(i-1) + cost(i) + w(X_i)\} = E\{costsofar(i-1)\} + E\{cost(i) + w(X_i)\}$

Since all states on the policy (except for the goal states) satisfy equation 7, we have $w(X_{i-1}) \ge E\{cost(i) + w(X_i)\}$. After taking an additional expectation we have $E\{w(X_{i-1}) - cost(i)\} \ge E\{w(X_i)\}$. Hence,

$E\{F(t=i)\} = E\{costsofar(i-1)\} + E\{cost(i) + w(X_i)\}$
$\le E\{costsofar(i-1)\} + E\{cost(i) + (w(X_{i-1}) - cost(i))\}$
$= E\{costsofar(i-1) + w(X_{i-1})\} = E\{F(t=i-1)\}$

By induction, then, $E\{F(t=k)\} \le E\{F(t=0)\} = w(X_{start})$.

Theorem 9 Suppose there exists a minimum expected cost policy $\rho^*$ that satisfies the following condition: for every pair of states $X_1 \in \rho^*$ and $X_2 \in \rho^*$ such that $X_2$ can be reached with a non-zero probability from $X_1$ when following policy $\rho^*$, it holds that either $h_{S(X_1),\rho^*(X_1)} \ne h_{S(X_2),\rho^*(X_2)}$ or $h_{S(X_1),\rho^*(X_1)} = h_{S(X_2),\rho^*(X_2)} = null$. Then the policy defined by besta pointers at the time PPCP terminates is also a minimum expected cost policy.

Proof: Let us assume that there exists a minimum expected cost policy $\rho^*$ that satisfies the conditions of the theorem. That is, for every pair of states $X_1 \in \rho^*$ and $X_2 \in \rho^*$ such that $X_2$ can be reached with a non-zero probability from $X_1$ when following policy $\rho^*$, it holds that either $h_{S(X_1),\rho^*(X_1)} \ne h_{S(X_2),\rho^*(X_2)}$ or $h_{S(X_1),\rho^*(X_1)} = h_{S(X_2),\rho^*(X_2)} = null$.
Since $\rho^*$ is an optimal policy, its expected cost is $w^*(X_{start})$. We will show that $w^u(X_{start}) \le w^*(X_{start})$. This will prove the theorem, since the expected cost of the policy returned by PPCP is bounded from above by $w^u(X_{start})$. The expected cost of the policy will then be exactly equal to $w^*(X_{start})$, since $\rho^*$ is already an optimal policy.

Let us prove by contradiction and assume that $w^u(X_{start}) > w^*(X_{start})$. This also means that $w^*(X_{start})$ is finite, and therefore all branches on the policy $\rho^*$ end up at states $X$ whose $S(X) = S_{goal}$, since an optimal policy when sensing is perfect is acyclic. Let us now pick a state $X \in \rho^*$ such that $w^u(X) > w^*(X)$, but all the successor states $Y$ of action $\rho^*(X)$ executed at state $X$ have $w^u(Y) \le w^*(Y)$. Such a state $X$ must exist because at least for $X = X_{start}$ it holds that $w^u(X) > w^*(X)$, and all branches of the policy end up at states $Y$ whose $S(Y) = S_{goal}$, and for these states $w^u(Y) = w^*(Y) = 0$ according to the definition of $w^u$- and $w^*$-values. By definition,

$w^u(X) = \min_{a \in A(S(X))} Q_{w^u, w^u}(X^u, a) \le Q_{w^u, w^u}(X^u, \rho^*(X))$
$= \sum_{Z \in succ(X^u, \rho^*(X))} P(X^u, \rho^*(X), Z) \max(c(S(X), \rho^*(X), S(Z)) + w^u(Z), \; c(S(X), \rho^*(X), S(succ(X^u, \rho^*(X))^b)) + w^u(succ(X^u, \rho^*(X))^b))$

Let us now consider $h_{S(X),\rho^*(X)}(X)$. It must be the case that either $h_{S(X),\rho^*(X)}(X) = u$ or $h_{S(X),\rho^*(X)}(X) = null$ since, according to the assumptions of the theorem, no action whose outcome depends on $h_{S(X),\rho^*(X)}$ could have been executed before. Thus, $h_{S(X),\rho^*(X)}(X^u) = h_{S(X),\rho^*(X)}(X)$. This property has an important implication that we will use. For any pair of states $Y \in succ(X, \rho^*(X))$ and $Z \in succ(X^u, \rho^*(X))$ such that $h_{S(X),\rho^*(X)}(Z^u) = h_{S(X),\rho^*(X)}(Y^u)$ (in other words, $Y$ and $Z$ are corresponding outcomes of action $\rho^*(X)$ executed at $X$ and $X^u$ respectively), it holds that $P(X, \rho^*(X), Y) = P(X^u, \rho^*(X), Z)$ and $Y^u = Z^u$. Using this fact and lemma 8, we can derive the following:

$w^u(X) \le \sum_{Z \in succ(X^u, \rho^*(X))} P(X^u, \rho^*(X), Z) \max(c(S(X), \rho^*(X), S(Z)) + w^u(Z), \; c(S(X), \rho^*(X), S(succ(X^u, \rho^*(X))^b)) + w^u(succ(X^u, \rho^*(X))^b))$
$= \sum_{Z \in succ(X^u, \rho^*(X))} P(X^u, \rho^*(X), Z) \max(c(S(X), \rho^*(X), S(Z)) + w^u(Z^u), \; c(S(X), \rho^*(X), S(succ(X^u, \rho^*(X))^b)) + w^u((succ(X^u, \rho^*(X))^b)^u))$
$= \sum_{Y \in succ(X, \rho^*(X))} P(X, \rho^*(X), Y) \max(c(S(X), \rho^*(X), S(Y)) + w^u(Y^u), \; c(S(X), \rho^*(X), S(succ(X, \rho^*(X))^b)) + w^u((succ(X, \rho^*(X))^b)^u))$
$= \sum_{Y \in succ(X, \rho^*(X))} P(X, \rho^*(X), Y) \max(c(S(X), \rho^*(X), S(Y)) + w^u(Y),$

22 c(s(x), a, S(Y )) + w u (succ(x, ρ (X)) b )) Accordg to the way we pcked X, w u (Y ) w (Y ) for every Y succ(x, ρ (X)). Moreover, from the defto of clear prefereces t follows that c(s(x), a, S(Y )) + w u (succ(x, ρ (X)) b ) c(s(x), a, S(Y )) + w u (Y ) for all Y succ(x, ρ (X)). Hece, we obta the followg cotradcto w u (X) P (X, a, Y ) max(c(s(x), a, S(Y )) + w (Y ), Y succ(x,ρ (X)) c(s(x), a, S(Y )) + w (succ(x, ρ (X)) b )) = P (X, a, Y ) (c(s(x), a, S(Y )) + w (Y )) Y succ(x,ρ (X)) = w (X) 22

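The potential-function argument of Theorem 8 can be sanity-checked numerically. The sketch below is our own toy construction, not code from the paper: a single stochastic action whose two outcomes are both goal states, with made-up probabilities and costs. Once the non-goal state satisfies inequality (7), the simulated average execution cost cannot exceed $w(X_{start})$.

```python
import random

# Hypothetical one-step policy: besta(X_start) has two outcomes, both goals.
# Inequality (7) requires w(X) >= sum_Y P(Y) * (cost(Y) + w(Y)) at non-goals.
outcomes = [("Y1", 0.5, 1.0), ("Y2", 0.5, 2.0)]   # (state, prob, action cost)
w = {"X_start": 1.6, "Y1": 0.0, "Y2": 0.0}        # goal w-values are 0

expectation = sum(p * (c + w[y]) for y, p, c in outcomes)   # 1.5
assert w["X_start"] >= expectation   # inequality (7) holds at X_start

def execute_once():
    """Sample one execution of the policy and return its total cost."""
    r, acc = random.random(), 0.0
    for y, p, c in outcomes:
        acc += p
        if r < acc:
            return c
    return outcomes[-1][2]

random.seed(0)
runs = 100_000
avg = sum(execute_once() for _ in range(runs)) / runs
# Theorem 8: the expected execution cost (about 1.5 here) stays below
# w(X_start) = 1.6, mirroring E{F(t=k)} <= F(t=0) = w(X_start).
```

This mirrors the proof's supermartingale-style potential $F(t)$: each step trades accumulated cost against the drop in $w(X_t)$, so the expectation of the final cost never rises above the initial $w(X_{start})$.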

More information

Lecture Note to Rice Chapter 8

Lecture Note to Rice Chapter 8 ECON 430 HG revsed Nov 06 Lecture Note to Rce Chapter 8 Radom matrces Let Y, =,,, m, =,,, be radom varables (r.v. s). The matrx Y Y Y Y Y Y Y Y Y Y = m m m s called a radom matrx ( wth a ot m-dmesoal dstrbuto,

More information

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity

Strong Convergence of Weighted Averaged Approximants of Asymptotically Nonexpansive Mappings in Banach Spaces without Uniform Convexity BULLETIN of the MALAYSIAN MATHEMATICAL SCIENCES SOCIETY Bull. Malays. Math. Sc. Soc. () 7 (004), 5 35 Strog Covergece of Weghted Averaged Appromats of Asymptotcally Noepasve Mappgs Baach Spaces wthout

More information

The Occupancy and Coupon Collector problems

The Occupancy and Coupon Collector problems Chapter 4 The Occupacy ad Coupo Collector problems By Sarel Har-Peled, Jauary 9, 08 4 Prelmares [ Defto 4 Varace ad Stadard Devato For a radom varable X, let V E [ X [ µ X deote the varace of X, where

More information

Class 13,14 June 17, 19, 2015

Class 13,14 June 17, 19, 2015 Class 3,4 Jue 7, 9, 05 Pla for Class3,4:. Samplg dstrbuto of sample mea. The Cetral Lmt Theorem (CLT). Cofdece terval for ukow mea.. Samplg Dstrbuto for Sample mea. Methods used are based o CLT ( Cetral

More information

1 Onto functions and bijections Applications to Counting

1 Onto functions and bijections Applications to Counting 1 Oto fuctos ad bectos Applcatos to Coutg Now we move o to a ew topc. Defto 1.1 (Surecto. A fucto f : A B s sad to be surectve or oto f for each b B there s some a A so that f(a B. What are examples of

More information

Department of Agricultural Economics. PhD Qualifier Examination. August 2011

Department of Agricultural Economics. PhD Qualifier Examination. August 2011 Departmet of Agrcultural Ecoomcs PhD Qualfer Examato August 0 Istructos: The exam cossts of sx questos You must aswer all questos If you eed a assumpto to complete a questo, state the assumpto clearly

More information

7.0 Equality Contraints: Lagrange Multipliers

7.0 Equality Contraints: Lagrange Multipliers Systes Optzato 7.0 Equalty Cotrats: Lagrage Multplers Cosder the zato of a o-lear fucto subject to equalty costrats: g f() R ( ) 0 ( ) (7.) where the g ( ) are possbly also olear fuctos, ad < otherwse

More information

means the first term, a2 means the term, etc. Infinite Sequences: follow the same pattern forever.

means the first term, a2 means the term, etc. Infinite Sequences: follow the same pattern forever. 9.4 Sequeces ad Seres Pre Calculus 9.4 SEQUENCES AND SERIES Learg Targets:. Wrte the terms of a explctly defed sequece.. Wrte the terms of a recursvely defed sequece. 3. Determe whether a sequece s arthmetc,

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

. The set of these sums. be a partition of [ ab, ]. Consider the sum f( x) f( x 1)

. The set of these sums. be a partition of [ ab, ]. Consider the sum f( x) f( x 1) Chapter 7 Fuctos o Bouded Varato. Subject: Real Aalyss Level: M.Sc. Source: Syed Gul Shah (Charma, Departmet o Mathematcs, US Sargodha Collected & Composed by: Atq ur Rehma (atq@mathcty.org, http://www.mathcty.org

More information

MA 524 Homework 6 Solutions

MA 524 Homework 6 Solutions MA 524 Homework 6 Solutos. Sce S(, s the umber of ways to partto [] to k oempty blocks, ad c(, s the umber of ways to partto to k oempty blocks ad also the arrage each block to a cycle, we must have S(,

More information

TESTS BASED ON MAXIMUM LIKELIHOOD

TESTS BASED ON MAXIMUM LIKELIHOOD ESE 5 Toy E. Smth. The Basc Example. TESTS BASED ON MAXIMUM LIKELIHOOD To llustrate the propertes of maxmum lkelhood estmates ad tests, we cosder the smplest possble case of estmatg the mea of the ormal

More information

Assignment 5/MATH 247/Winter Due: Friday, February 19 in class (!) (answers will be posted right after class)

Assignment 5/MATH 247/Winter Due: Friday, February 19 in class (!) (answers will be posted right after class) Assgmet 5/MATH 7/Wter 00 Due: Frday, February 9 class (!) (aswers wll be posted rght after class) As usual, there are peces of text, before the questos [], [], themselves. Recall: For the quadratc form

More information

Introduction to Probability

Introduction to Probability Itroducto to Probablty Nader H Bshouty Departmet of Computer Scece Techo 32000 Israel e-mal: bshouty@cstechoacl 1 Combatorcs 11 Smple Rules I Combatorcs The rule of sum says that the umber of ways to choose

More information

CHAPTER 3 POSTERIOR DISTRIBUTIONS

CHAPTER 3 POSTERIOR DISTRIBUTIONS CHAPTER 3 POSTERIOR DISTRIBUTIONS If scece caot measure the degree of probablt volved, so much the worse for scece. The practcal ma wll stck to hs apprecatve methods utl t does, or wll accept the results

More information

Bayes (Naïve or not) Classifiers: Generative Approach

Bayes (Naïve or not) Classifiers: Generative Approach Logstc regresso Bayes (Naïve or ot) Classfers: Geeratve Approach What do we mea by Geeratve approach: Lear p(y), p(x y) ad the apply bayes rule to compute p(y x) for makg predctos Ths s essetally makg

More information

Algorithms Theory, Solution for Assignment 2

Algorithms Theory, Solution for Assignment 2 Juor-Prof. Dr. Robert Elsässer, Marco Muñz, Phllp Hedegger WS 2009/200 Algorthms Theory, Soluto for Assgmet 2 http://lak.formatk.u-freburg.de/lak_teachg/ws09_0/algo090.php Exercse 2. - Fast Fourer Trasform

More information

Q-analogue of a Linear Transformation Preserving Log-concavity

Q-analogue of a Linear Transformation Preserving Log-concavity Iteratoal Joural of Algebra, Vol. 1, 2007, o. 2, 87-94 Q-aalogue of a Lear Trasformato Preservg Log-cocavty Daozhog Luo Departmet of Mathematcs, Huaqao Uversty Quazhou, Fua 362021, P. R. Cha ldzblue@163.com

More information

arxiv:math/ v1 [math.gm] 8 Dec 2005

arxiv:math/ v1 [math.gm] 8 Dec 2005 arxv:math/05272v [math.gm] 8 Dec 2005 A GENERALIZATION OF AN INEQUALITY FROM IMO 2005 NIKOLAI NIKOLOV The preset paper was spred by the thrd problem from the IMO 2005. A specal award was gve to Yure Boreko

More information

ρ < 1 be five real numbers. The

ρ < 1 be five real numbers. The Lecture o BST 63: Statstcal Theory I Ku Zhag, /0/006 Revew for the prevous lecture Deftos: covarace, correlato Examples: How to calculate covarace ad correlato Theorems: propertes of correlato ad covarace

More information

MULTIDIMENSIONAL HETEROGENEOUS VARIABLE PREDICTION BASED ON EXPERTS STATEMENTS. Gennadiy Lbov, Maxim Gerasimov

MULTIDIMENSIONAL HETEROGENEOUS VARIABLE PREDICTION BASED ON EXPERTS STATEMENTS. Gennadiy Lbov, Maxim Gerasimov Iteratoal Boo Seres "Iformato Scece ad Computg" 97 MULTIIMNSIONAL HTROGNOUS VARIABL PRICTION BAS ON PRTS STATMNTS Geady Lbov Maxm Gerasmov Abstract: I the wors [ ] we proposed a approach of formg a cosesus

More information

Multiple Linear Regression Analysis

Multiple Linear Regression Analysis LINEA EGESSION ANALYSIS MODULE III Lecture - 4 Multple Lear egresso Aalyss Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Cofdece terval estmato The cofdece tervals multple

More information

1 Lyapunov Stability Theory

1 Lyapunov Stability Theory Lyapuov Stablty heory I ths secto we cosder proofs of stablty of equlbra of autoomous systems. hs s stadard theory for olear systems, ad oe of the most mportat tools the aalyss of olear systems. It may

More information

8.1 Hashing Algorithms

8.1 Hashing Algorithms CS787: Advaced Algorthms Scrbe: Mayak Maheshwar, Chrs Hrchs Lecturer: Shuch Chawla Topc: Hashg ad NP-Completeess Date: September 21 2007 Prevously we looked at applcatos of radomzed algorthms, ad bega

More information

NP!= P. By Liu Ran. Table of Contents. The P versus NP problem is a major unsolved problem in computer

NP!= P. By Liu Ran. Table of Contents. The P versus NP problem is a major unsolved problem in computer NP!= P By Lu Ra Table of Cotets. Itroduce 2. Prelmary theorem 3. Proof 4. Expla 5. Cocluso. Itroduce The P versus NP problem s a major usolved problem computer scece. Iformally, t asks whether a computer

More information

Qualifying Exam Statistical Theory Problem Solutions August 2005

Qualifying Exam Statistical Theory Problem Solutions August 2005 Qualfyg Exam Statstcal Theory Problem Solutos August 5. Let X, X,..., X be d uform U(,),

More information

Beam Warming Second-Order Upwind Method

Beam Warming Second-Order Upwind Method Beam Warmg Secod-Order Upwd Method Petr Valeta Jauary 6, 015 Ths documet s a part of the assessmet work for the subject 1DRP Dfferetal Equatos o Computer lectured o FNSPE CTU Prague. Abstract Ths documet

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

STRONG CONSISTENCY OF LEAST SQUARES ESTIMATE IN MULTIPLE REGRESSION WHEN THE ERROR VARIANCE IS INFINITE

STRONG CONSISTENCY OF LEAST SQUARES ESTIMATE IN MULTIPLE REGRESSION WHEN THE ERROR VARIANCE IS INFINITE Statstca Sca 9(1999), 289-296 STRONG CONSISTENCY OF LEAST SQUARES ESTIMATE IN MULTIPLE REGRESSION WHEN THE ERROR VARIANCE IS INFINITE J Mgzhog ad Che Xru GuZhou Natoal College ad Graduate School, Chese

More information

Analysis of Lagrange Interpolation Formula

Analysis of Lagrange Interpolation Formula P IJISET - Iteratoal Joural of Iovatve Scece, Egeerg & Techology, Vol. Issue, December 4. www.jset.com ISS 348 7968 Aalyss of Lagrage Iterpolato Formula Vjay Dahya PDepartmet of MathematcsMaharaja Surajmal

More information

PROJECTION PROBLEM FOR REGULAR POLYGONS

PROJECTION PROBLEM FOR REGULAR POLYGONS Joural of Mathematcal Sceces: Advaces ad Applcatos Volume, Number, 008, Pages 95-50 PROJECTION PROBLEM FOR REGULAR POLYGONS College of Scece Bejg Forestry Uversty Bejg 0008 P. R. Cha e-mal: sl@bjfu.edu.c

More information

Johns Hopkins University Department of Biostatistics Math Review for Introductory Courses

Johns Hopkins University Department of Biostatistics Math Review for Introductory Courses Johs Hopks Uverst Departmet of Bostatstcs Math Revew for Itroductor Courses Ratoale Bostatstcs courses wll rel o some fudametal mathematcal relatoshps, fuctos ad otato. The purpose of ths Math Revew s

More information

NP!= P. By Liu Ran. Table of Contents. The P vs. NP problem is a major unsolved problem in computer

NP!= P. By Liu Ran. Table of Contents. The P vs. NP problem is a major unsolved problem in computer NP!= P By Lu Ra Table of Cotets. Itroduce 2. Strategy 3. Prelmary theorem 4. Proof 5. Expla 6. Cocluso. Itroduce The P vs. NP problem s a major usolved problem computer scece. Iformally, t asks whether

More information

ON THE LOGARITHMIC INTEGRAL

ON THE LOGARITHMIC INTEGRAL Hacettepe Joural of Mathematcs ad Statstcs Volume 39(3) (21), 393 41 ON THE LOGARITHMIC INTEGRAL Bra Fsher ad Bljaa Jolevska-Tueska Receved 29:9 :29 : Accepted 2 :3 :21 Abstract The logarthmc tegral l(x)

More information

Johns Hopkins University Department of Biostatistics Math Review for Introductory Courses

Johns Hopkins University Department of Biostatistics Math Review for Introductory Courses Johs Hopks Uverst Departmet of Bostatstcs Math Revew for Itroductor Courses Ratoale Bostatstcs courses wll rel o some fudametal mathematcal relatoshps, fuctos ad otato. The purpose of ths Math Revew s

More information

ANALYSIS ON THE NATURE OF THE BASIC EQUATIONS IN SYNERGETIC INTER-REPRESENTATION NETWORK

ANALYSIS ON THE NATURE OF THE BASIC EQUATIONS IN SYNERGETIC INTER-REPRESENTATION NETWORK Far East Joural of Appled Mathematcs Volume, Number, 2008, Pages Ths paper s avalable ole at http://www.pphm.com 2008 Pushpa Publshg House ANALYSIS ON THE NATURE OF THE ASI EQUATIONS IN SYNERGETI INTER-REPRESENTATION

More information