Making Complex Decisions: Markov Decision Processes
Making Complex Decisions: Markov Decision Processes
Vasant Honavar
Bioinformatics and Computational Biology Program
Center for Computational Intelligence, Learning, & Discovery
Vasant Honavar, 2006.

Making Complex Decisions: the Markov Decision Problem
How to use knowledge about the world to make decisions when there is uncertainty about the consequences of actions and rewards are delayed.

The Solution
Sequential decision problems in uncertain environments can be solved by calculating a policy that associates an optimal decision with every environmental state: a Markov Decision Process (MDP).
Example
[Figure: a 4x3 grid world with a start state and terminal states; actions have uncertain consequences.]
Cumulative Discounted Reward
Suppose rewards are bounded by M. The cumulative discounted reward is then bounded by
  M + γM + γ²M + ... = M / (1 − γ)
Note: for the geometric series to converge, 0 ≤ γ < 1.

Utility of a State Sequence
Additive rewards: U_h([s_0, s_1, s_2, ...]) = R(s_0) + R(s_1) + R(s_2) + ...
Discounted rewards: U_h([s_0, s_1, s_2, ...]) = R(s_0) + γR(s_1) + γ²R(s_2) + ...
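The bound above can be checked numerically. A minimal sketch (the reward sequence and γ are illustrative):

```python
def discounted_return(rewards, gamma):
    """Discounted sum r_0 + gamma*r_1 + gamma^2*r_2 + ... (Horner form)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# With every reward equal to M, the return approaches the bound M / (1 - gamma).
M, gamma = 1.0, 0.9
approx = discounted_return([M] * 1000, gamma)
bound = M / (1 - gamma)
```

For γ = 0.9 the truncated sum over 1000 steps is already indistinguishable from the bound 10.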
Utility of a State
The utility of each state is the expected sum of discounted rewards if the agent executes the policy π:
  U^π(s) = E[ Σ_{t=0}^∞ γ^t R(s_t) | π, s_0 = s ]
The true utility of a state corresponds to the optimal policy π*.
Calculating the Optimal Policy
Two approaches: value iteration and policy iteration.

Value Iteration
Calculate the utility of each state, then use the state utilities to select an optimal action in each state:
  π*(s) = argmax_a Σ_{s'} T(s, a, s') U(s')

Value Iteration Algorithm
function value-iteration(MDP) returns a utility function
  local variables: U, U', initially identical to R
  repeat
    U ← U'
    for each state s do
      U'(s) ← R(s) + γ max_a Σ_{s'} T(s, a, s') U(s')   (Bellman update)
    end
  until close-enough(U, U')
  return U
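The algorithm above can be sketched in a few lines. The two-state MDP here is made up for illustration; `T`, `R`, and the state names are assumptions, not part of the lecture's grid-world example:

```python
# A minimal value-iteration sketch on a toy two-state MDP.
# T[s][a] is a list of (prob, next_state) pairs; R gives state rewards.
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-8):
    U = {s: R[s] for s in states}
    while True:
        U_new = {}
        for s in states:
            # Bellman update: R(s) + gamma * max_a sum_s' T(s,a,s') U(s')
            U_new[s] = R[s] + gamma * max(
                sum(p * U[s2] for p, s2 in T[s][a]) for a in actions
            )
        if max(abs(U_new[s] - U[s]) for s in states) < eps:
            return U_new
        U = U_new

# Toy MDP: 'stay' keeps the current state, 'move' switches states.
states, actions = ['A', 'B'], ['stay', 'move']
T = {'A': {'stay': [(1.0, 'A')], 'move': [(1.0, 'B')]},
     'B': {'stay': [(1.0, 'B')], 'move': [(1.0, 'A')]}}
R = {'A': 0.0, 'B': 1.0}
U = value_iteration(states, actions, T, R)
```

At the fixed point U(B) = 1/(1 − γ) = 10 and U(A) = γ·U(B) = 9, which the loop recovers to within the stopping tolerance.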
Value Iteration Algorithm: Example
[Figure: the utilities of the states obtained after value iteration.]

Policy Iteration
Pick a policy, then calculate the utility of each state given that policy (the value determination step). Update the policy at each state using the utilities of the successor states. Repeat until the policy stabilizes.

Policy Iteration Algorithm
function policy-iteration(MDP) returns a policy
  local variables: U, a utility function; π, a policy
  repeat
    U ← value-determination(π, U, MDP, R)
    unchanged? ← true
    for each state s do
      if max_a Σ_{s'} T(s, a, s') U(s') > Σ_{s'} T(s, π(s), s') U(s') then
        π(s) ← argmax_a Σ_{s'} T(s, a, s') U(s')
        unchanged? ← false
    end
  until unchanged?
  return π
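The loop above can be sketched as follows. The toy MDP is the same illustrative two-state example as before (an assumption, not the lecture's grid world), and value determination is done here by simple iteration rather than an exact linear solve:

```python
# Policy-iteration sketch: alternate value determination (evaluate the
# fixed policy) with greedy policy improvement.
def policy_iteration(states, actions, T, R, gamma=0.9):
    pi = {s: actions[0] for s in states}
    while True:
        # Value determination: iterate the fixed-policy Bellman equation.
        U = {s: 0.0 for s in states}
        for _ in range(1000):
            U = {s: R[s] + gamma * sum(p * U[s2] for p, s2 in T[s][pi[s]])
                 for s in states}
        # Policy improvement: switch only on a strict improvement.
        changed = False
        for s in states:
            best = max(actions,
                       key=lambda a: sum(p * U[s2] for p, s2 in T[s][a]))
            if sum(p * U[s2] for p, s2 in T[s][best]) > \
               sum(p * U[s2] for p, s2 in T[s][pi[s]]) + 1e-12:
                pi[s], changed = best, True
        if not changed:
            return pi, U

states, actions = ['A', 'B'], ['stay', 'move']
T = {'A': {'stay': [(1.0, 'A')], 'move': [(1.0, 'B')]},
     'B': {'stay': [(1.0, 'B')], 'move': [(1.0, 'A')]}}
R = {'A': 0.0, 'B': 1.0}
pi, U = policy_iteration(states, actions, T, R)
```

The optimal policy moves from A to the rewarding state B and then stays, and the loop terminates as soon as no state's action changes.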
Value Determination
A simplification of the value iteration algorithm, because the policy is fixed. The equations are linear because the max() operator has been removed, so we can solve exactly for the utilities using standard linear algebra.

Optimal Policy (policy iteration with linear equations)
For the grid world, the value-determination equations take the form
  u(1,1) = 0.8 u(1,2) + 0.1 u(2,1) + 0.1 u(1,1)
  u(1,2) = 0.8 u(1,3) + 0.2 u(1,2)

Partially Observable MDP (POMDP)
In an inaccessible environment, the percept does not provide enough information to determine the state or the transition probability.
A POMDP has:
  State transition function: P(s_{t+1} | s_t, a_t)
  Observation function: P(o_t | s_t, a_t)
  Reward function: E(r_t | s_t, a_t)
Approach: calculate a probability distribution over the possible states given all previous percepts, and base decisions on this distribution.
Learning from Interaction with the World
An agent receives sensations or percepts from the environment through its sensors, acts on the environment through its effectors, and occasionally receives rewards or punishments from the environment. The goal of the agent is to maximize its reward (pleasure) or minimize its punishment (pain) as it stumbles along in an a-priori unknown, uncertain environment.

Supervised Learning
Experience = labeled examples: inputs go into the supervised learning system, which produces outputs.
Objective: minimize the error between desired and actual outputs.
Reinforcement Learning
Experience = action-induced state transitions and rewards: inputs go into the reinforcement learning system, whose outputs are actions.
Objective: maximize reward.

Reinforcement learning:
- The learner is not told which actions to take
- Rewards and punishments may be delayed
- Short-term gains may be sacrificed for greater long-term gains
- There is a tradeoff between exploration and exploitation
- The environment may be unobservable or only partially observable
- The environment may be deterministic or stochastic

[Diagram: agent and environment in a loop; the environment supplies state and reward, the agent supplies actions.]
Key Elements of an RL System
- Policy: what to do
- Reward: what is good
- Value: what is good because it predicts reward
- Model of environment: what follows what

An Extended Example: Tic-Tac-Toe
[Figure: a game tree of positions with alternating levels of X moves and O moves.]
Assume an imperfect opponent: he/she sometimes makes mistakes.

A Simple RL Approach to Tic-Tac-Toe
Make a table with one entry per state. V(s) is the estimated probability of winning: 1 for a win, 0 for a loss, 0.5 for a draw. Now play lots of games. To pick our moves, look ahead one step from the current state to the possible next states. Pick the next state with the highest estimated probability of winning, i.e., the largest V(s): a greedy move. Occasionally pick a move at random: an exploratory move.
RL Learning Rule for Tic-Tac-Toe
[Figure: a game trajectory alternating our moves and the opponent's moves, with greedy moves marked * and an exploratory move branching off.]
Let s be the state before our greedy move and s' the state after it. We increment each V(s) toward V(s'), a backup:
  V(s) ← V(s) + α[V(s') − V(s)]

Why is Tic-Tac-Toe Too Easy?
- The number of states is small and finite
- One-step look-ahead is always possible
- The state is completely observable

Some Notable RL Applications
- TD-Gammon: the world's best backgammon program (Tesauro)
- Elevator control (Crites & Barto)
- Inventory management: 10-15% improvement over industry-standard methods (Van Roy, Bertsekas, Lee, and Tsitsiklis)
- Dynamic channel assignment: high-performance assignment of radio channels to mobile telephone calls (Singh & Bertsekas)
13 The n-armed Bndi Problem Choose repeedly from one of n cions; ech choice is clled ply Afer ech ply, you ge rewrd r, where Er = Q * ( ) Disribuion of r depends only on Objecive is o mimize he rewrd in he long erm, e.g., over 000 plys The Eplorion Eploiion Dilemm Suppose you form Q ( Q * ( cion vlue esimes * = rgmq ( The greedy cion is = * eploiion * eplorion You cn eploi ll he ime; you cn eplore ll he ime You cn never sop eploring; bu you could reduce eploring Acion-Vlue Mehods Seless Adp cion-vlue esimes nd nohing else. Suppose by he -h ply, cion hd been chosen imes, producing rewrds r, r, K, r k, hen k Q ( = r + r + Lr k k lim Q ( = k Q* ( 3
Greedy and ε-Greedy Action Selection
Greedy: a_t = a*_t = argmax_a Q_t(a).
ε-greedy: a*_t with probability 1 − ε; a random action with probability ε.
Boltzmann (softmax):
  Pr(choosing action a at time t) = e^{Q_t(a)/τ} / Σ_{b=1}^n e^{Q_t(b)/τ}
where τ is a computational temperature.

Incremental Implementation
Recall the sample-average estimation method: the average of the first k rewards is
  Q_k = (r_1 + r_2 + ... + r_k) / k
An incremental update rule that does not require storing past rewards:
  Q_{k+1} = Q_k + (1/(k+1)) [r_{k+1} − Q_k]

Tracking a Nonstationary Environment
Choosing Q_k to be a sample average is appropriate in a stationary environment, in which the dependence of rewards on actions is time-invariant, i.e., when none of the Q*(a) change over time. In a nonstationary environment, it is better to use an exponential, recency-weighted average:
  Q_k = Q_{k−1} + α[r_k − Q_{k−1}]
for constant α, 0 < α ≤ 1, which expands to
  Q_k = (1 − α)^k Q_0 + Σ_{i=1}^k α(1 − α)^{k−i} r_i
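Both update rules can be checked against their closed forms. A short sketch; the reward sequence and α are arbitrary illustrative data:

```python
# Incremental sample average: Q_{k+1} = Q_k + (r_{k+1} - Q_k)/(k+1).
def sample_average_incremental(rewards):
    Q = 0.0
    for k, r in enumerate(rewards):   # k = 0, 1, 2, ...
        Q += (r - Q) / (k + 1)
    return Q

# Recency-weighted average: Q_k = Q_{k-1} + alpha (r_k - Q_{k-1}).
def recency_weighted(rewards, alpha, Q0=0.0):
    Q = Q0
    for r in rewards:
        Q += alpha * (r - Q)
    return Q

rewards = [1.0, 0.0, 2.0, 4.0]
inc = sample_average_incremental(rewards)   # equals the plain mean
rw = recency_weighted(rewards, alpha=0.5)
# Closed form with Q0 = 0: sum_i alpha (1-alpha)^(k-i) r_i.
closed = sum(0.5 * 0.5 ** (4 - i) * r for i, r in enumerate(rewards, start=1))
```

The incremental rule reproduces the ordinary mean exactly, and the constant-α rule matches its exponential closed form, with recent rewards (here the final 4.0) weighted most heavily.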
Reinforcement Learning with Environmental State
When the agent can sense and respond to environmental states:
[Diagram: the agent receives state s_t and reward r_t from the environment and emits action a_t; the environment returns r_{t+1} and s_{t+1}.]
Agent and environment interact at discrete time steps t = 0, 1, 2, ... At step t the agent observes state s_t ∈ S, produces action a_t ∈ A(s_t), gets the resulting reward r_{t+1} ∈ R, and observes the resulting next state s_{t+1}.

The Agent Learns a Policy
Policy at step t: π_t(s, a) = probability that a_t = a when s_t = s, a mapping from states to action probabilities. Reinforcement learning methods specify how the agent changes its policy as a result of experience. Roughly, the agent's goal is to get as much reward as it can over the long run.

Agent-Environment Interface: Goals and Rewards
Is a scalar reward signal an adequate notion of a goal? Maybe not, but it is surprisingly flexible. A goal should specify what we want to achieve, not how we want to achieve it. A goal is typically outside the agent's direct control. The agent must be able to measure success: explicitly, and frequently during its lifespan.
Returns
Suppose the sequence of rewards after step t is r_{t+1}, r_{t+2}, r_{t+3}, ... What do we want to maximize? In general, we want to maximize the expected return E{R_t} for each step t.

Episodic tasks: interaction breaks naturally into episodes, e.g., plays of a game, trips through a maze.
  R_t = r_{t+1} + r_{t+2} + ... + r_T
where T is the final time step, at which a terminal state is reached, ending an episode.

Returns for Continuing Tasks
Continuing tasks: interaction does not have natural episodes. Discounted return:
  R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... = Σ_{k=0}^∞ γ^k r_{t+k+1}
where γ, 0 ≤ γ ≤ 1, is the discount rate: γ near 0 is shortsighted, γ near 1 is farsighted.

Example: Pole-Balancing Task
Avoid failure: the pole falling beyond a critical angle, or the cart hitting the end of the track.
As an episodic task, where the episode ends upon failure: reward = +1 for each step before failure, so return = number of steps before failure.
As a continuing task with discounted return: reward = −1 upon failure, 0 otherwise, so return = −γ^k for k steps before failure.
In either case, the return is maximized by avoiding failure for as long as possible.
Example: Driving Task
Get to the top of the hill as quickly as possible: reward = −1 for each step when not at the top of the hill, so return = −(number of steps) before reaching the top of the hill. The return is maximized by minimizing the number of steps taken to reach the top of the hill.

The Markov Property
By "the state" at step t, we mean whatever information is available to the agent at step t about its environment. The state can include immediate sensations, highly processed sensations, and structures built up over time from sequences of sensations. Ideally, a state should summarize past sensations so as to retain all essential information; it should have the Markov property:
  Pr{s_{t+1} = s', r_{t+1} = r | s_t, a_t, r_t, s_{t−1}, a_{t−1}, ..., r_1, s_0, a_0} = Pr{s_{t+1} = s', r_{t+1} = r | s_t, a_t}
for all s', r, and histories s_t, a_t, r_t, s_{t−1}, a_{t−1}, ..., r_1, s_0, a_0.

Markov Decision Processes
If a reinforcement learning task has the Markov property, it is called a Markov Decision Process (MDP). If the state and action sets are finite, it is a finite MDP. To define a finite MDP, you need to specify the state and action sets and the one-step dynamics, defined by
  transition probabilities: P^a_{ss'} = Pr{s_{t+1} = s' | s_t = s, a_t = a} for all s, s' ∈ S, a ∈ A(s)
  reward expectations: R^a_{ss'} = E{r_{t+1} | s_t = s, a_t = a, s_{t+1} = s'} for all s, s' ∈ S, a ∈ A(s)
Recycling Robot: a Finite MDP Example
At each step, the robot has to decide whether it should (a) actively search for a can, (b) wait for someone to bring it a can, or (c) go to home base and recharge. Searching is better but runs down the battery; if the robot runs out of power while searching, it has to be rescued (which is bad). Decisions are made on the basis of the current energy level: high or low. Reward = number of cans collected.

Value Functions
The value of a state is the expected return starting from that state; it depends on the agent's policy.
State-value function for policy π:
  V^π(s) = E_π{R_t | s_t = s} = E_π{ Σ_{k=0}^∞ γ^k r_{t+k+1} | s_t = s }
The value of taking an action in a state under policy π is the expected return starting from that state, taking that action, and thereafter following π.
Action-value function for policy π:
  Q^π(s, a) = E_π{R_t | s_t = s, a_t = a} = E_π{ Σ_{k=0}^∞ γ^k r_{t+k+1} | s_t = s, a_t = a }

Bellman Equation for a Policy π
The basic idea:
  R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... = r_{t+1} + γ(r_{t+2} + γ r_{t+3} + ...) = r_{t+1} + γ R_{t+1}
So:
  V^π(s) = E_π{R_t | s_t = s} = E_π{r_{t+1} + γ V^π(s_{t+1}) | s_t = s}
Or, without the expectation operator:
  V^π(s) = Σ_a π(s, a) Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V^π(s')]
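For a fixed policy the Bellman equation is a linear system, (I − γP_π)V = R_π, which can be solved exactly. A sketch using a tiny made-up two-state chain (the chain, and solving by hand-rolled elimination rather than a library, are illustrative choices):

```python
# Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi.
def evaluate_policy(P_pi, R_pi, gamma):
    """P_pi[s][s'] = transition prob under pi; R_pi[s] = expected reward.
    Solves the linear system by Gauss-Jordan elimination (no dependencies)."""
    n = len(P_pi)
    # Augmented matrix [I - gamma*P | R].
    A = [[(1.0 if i == j else 0.0) - gamma * P_pi[i][j] for j in range(n)]
         + [R_pi[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))  # partial pivot
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Two states: state 0 always moves to state 1 (reward 0);
# state 1 stays in state 1 (reward 1).
V = evaluate_policy([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], gamma=0.9)
```

The solution matches the hand calculation V(1) = 1/(1 − γ) = 10 and V(0) = γ·V(1) = 9.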
Optimal Value Functions
For finite MDPs, policies can be partially ordered: π ≥ π' if and only if V^π(s) ≥ V^π'(s) for all s ∈ S. There is always at least one (and possibly many) policy that is better than or equal to all the others; this is an optimal policy. We denote them all π*.
Optimal policies share the same optimal state-value function:
  V*(s) = max_π V^π(s) for all s ∈ S
Optimal policies also share the same optimal action-value function:
  Q*(s, a) = max_π Q^π(s, a) for all s ∈ S and a ∈ A(s)
This is the expected return for taking action a in state s and thereafter following an optimal policy.

Bellman Optimality Equation for V*
The value of a state under an optimal policy must equal the expected return for the best action from that state:
  V*(s) = max_{a ∈ A(s)} Q^{π*}(s, a)
        = max_{a ∈ A(s)} E{r_{t+1} + γ V*(s_{t+1}) | s_t = s, a_t = a}
        = max_{a ∈ A(s)} Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ V*(s')]
[Backup diagram: from state s, a max over actions a, then an expectation over rewards r and next states s'.]
V* is the unique solution of this system of nonlinear equations.

Bellman Optimality Equation for Q*
  Q*(s, a) = E{r_{t+1} + γ max_{a'} Q*(s_{t+1}, a') | s_t = s, a_t = a}
           = Σ_{s'} P^a_{ss'} [R^a_{ss'} + γ max_{a'} Q*(s', a')]
[Backup diagram: from the pair (s, a), an expectation over rewards r and next states s', then a max over a'.]
Q* is the unique solution of this system of nonlinear equations.
Why Optimal State-Value Functions are Useful
Any policy that is greedy with respect to V* is an optimal policy. Therefore, given V*, one-step-ahead search produces the long-term optimal actions.

What About Optimal Action-Value Functions?
Given Q*, the agent does not even have to do a one-step-ahead search:
  π*(s) = argmax_{a ∈ A(s)} Q*(s, a)

Solving the Bellman Optimality Equation
Finding an optimal policy by solving the Bellman optimality equation requires: accurate knowledge of the environment dynamics; enough space and time to do the computation; and the Markov property. How much space and time do we need? Polynomial in the number of states (via dynamic programming methods), BUT the number of states is often huge. We usually have to settle for approximations. Many RL methods can be understood as approximately solving the Bellman optimality equation.
Efficiency of DP
Finding an optimal policy is polynomial in the number of states, BUT the number of states often grows exponentially with the number of state variables. In practice, classical DP can be applied to problems with a few million states. Asynchronous DP can be applied to larger problems and is appropriate for parallel computation. It is surprisingly easy to come up with MDPs for which DP methods are not practical.

Reinforcement Learning
[Diagram: the environment supplies state and reward, the agent supplies actions.]

Markov Decision Processes
Assume a finite set of states S and a set of actions A. At each discrete time t, the agent observes state s_t ∈ S and chooses action a_t ∈ A, then receives immediate reward r_t, and the state changes to s_{t+1}.
Markov assumption: s_{t+1} = δ(s_t, a_t) and r_t = r(s_t, a_t), i.e., r_t and s_{t+1} depend only on the current state and action. The functions δ and r may be nondeterministic, and are not necessarily known to the agent.
Agent's Learning Task
Execute actions in the environment, observe the results, and learn an action policy π : S → A that maximizes
  E[r_t + γ r_{t+1} + γ² r_{t+2} + ...]
from any starting state in S. Here 0 ≤ γ < 1 is the discount factor for future rewards.
Note something new: the target function is π : S → A, but we have no training examples of the form ⟨s, a⟩; training examples are of the form ⟨⟨s, a⟩, r⟩.

Reinforcement Learning Problem
Goal: learn to choose actions that maximize r_0 + γ r_1 + γ² r_2 + ..., where 0 ≤ γ < 1.

Learning an Action-Value Function
Estimate Q^π for the current behavior policy π. After every transition from a nonterminal state s_t, do:
  Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]
If s_{t+1} is terminal, then Q(s_{t+1}, a_{t+1}) = 0.
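The update above (Sarsa-style on-policy learning) can be sketched on a tiny made-up corridor task: states 0..3, state 3 terminal with reward +1 on entry, actions move left or right. The task and all parameters are illustrative assumptions:

```python
import random

# On-policy action-value learning with the update
# Q(s,a) <- Q(s,a) + alpha [r + gamma Q(s',a') - Q(s,a)], Q at terminal = 0.
def learn_q(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(4) for a in (-1, +1)}

    def policy(s):                    # epsilon-greedy behavior policy
        if rng.random() < epsilon:
            return rng.choice((-1, +1))
        return max((-1, +1), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, a = 0, policy(0)
        while True:
            s2 = min(max(s + a, 0), 3)
            r = 1.0 if s2 == 3 else 0.0
            if s2 == 3:               # terminal: Q(s', a') is taken to be 0
                Q[(s, a)] += alpha * (r - Q[(s, a)])
                break
            a2 = policy(s2)
            Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])
            s, a = s2, a2
    return Q

Q = learn_q()
```

After training, the move into the terminal state is valued at the full reward, and moving right is preferred to moving left from the start state.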
Value Function
To begin, consider deterministic worlds. For each possible policy π the agent might adopt, we can define an evaluation function over states:
  V^π(s) ≡ r_t + γ r_{t+1} + γ² r_{t+2} + ... ≡ Σ_{i=0}^∞ γ^i r_{t+i}
where r_t, r_{t+1}, ... are generated by following policy π starting at state s. Restated, the task is to learn the optimal policy π*:
  π* ≡ argmax_π V^π(s), for all s

What to Learn
We might try to have the agent learn the evaluation function V^{π*} (which we write as V*). It could then do a look-ahead search to choose the best action from any state s, because
  π*(s) = argmax_a [r(s, a) + γ V*(δ(s, a))]
A problem: this works well if the agent knows δ : S × A → S and r : S × A → ℝ. But when it doesn't, it can't choose actions this way.
Action-Value Function: the Q Function
Define a new function very similar to V*:
  Q(s, a) ≡ r(s, a) + γ V*(δ(s, a))
If the agent learns Q, it can choose the optimal action even without knowing δ:
  π*(s) = argmax_a [r(s, a) + γ V*(δ(s, a))] = argmax_a Q(s, a)
Q is the evaluation function the agent will learn.

Training Rule to Learn Q
Note that Q and V* are closely related:
  V*(s) = max_{a'} Q(s, a')
which allows us to write Q recursively as
  Q(s_t, a_t) = r(s_t, a_t) + γ V*(δ(s_t, a_t)) = r(s_t, a_t) + γ max_{a'} Q(s_{t+1}, a')
Let Q̂ denote the learner's current approximation to Q. Consider the training rule
  Q̂(s, a) ← r + γ max_{a'} Q̂(s', a')
where s' is the state resulting from applying action a in state s.

Q-Learning
  Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]
Q Learning for Deterministic Worlds
For each s, a, initialize the table entry Q̂(s, a) ← 0. Observe the current state s. Do forever:
- Select an action a and execute it
- Receive immediate reward r
- Observe the new state s'
- Update the table entry for Q̂(s, a) as follows: Q̂(s, a) ← r + γ max_{a'} Q̂(s', a')
- s ← s'

Updating Q̂: Example
  Q̂(s_1, a_right) ← r + γ max_{a'} Q̂(s_2, a') = 0 + 0.9 max{63, 81, 100} = 90
Notice that if rewards are non-negative, then for all s, a, and n,
  Q̂_{n+1}(s, a) ≥ Q̂_n(s, a)  and  0 ≤ Q̂_n(s, a) ≤ Q(s, a)

Convergence Theorem
Theorem: Q̂ converges to Q. Consider the case of a deterministic world with bounded immediate rewards, where each ⟨s, a⟩ is visited infinitely often.
Proof: Define a full interval to be an interval during which each ⟨s, a⟩ is visited. During each full interval, the largest error in the Q̂ table is reduced by a factor of γ. Let Q̂_n be the table after n updates, and Δ_n be the maximum error in Q̂_n; that is,
  Δ_n = max_{s,a} |Q̂_n(s, a) − Q(s, a)|
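The deterministic table update can be sketched on a small made-up 1-D grid with a goal that pays 100 (the grid and sweep schedule are illustrative; a real agent would visit state-action pairs by acting, not by sweeping):

```python
# Deterministic-world Q-learning: Q(s,a) <- r + gamma * max_a' Q(s',a').
def q_learn_deterministic(n_states=4, gamma=0.9, sweeps=50):
    goal = n_states - 1
    actions = (-1, +1)
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(sweeps):            # visit every (s, a) repeatedly
        for s in range(n_states - 1):  # the goal state is absorbing
            for a in actions:
                s2 = min(max(s + a, 0), n_states - 1)
                r = 100.0 if s2 == goal else 0.0
                Q[(s, a)] = r + gamma * max(Q[(s2, b)] for b in actions)
    return Q

Q = q_learn_deterministic()
```

With γ = 0.9 this reproduces the characteristic 100 / 90 / 81 pattern of values radiating back from the goal, illustrating how each full sweep propagates values one more step.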
Convergence Theorem (continued)
For any table entry Q̂_n(s, a) updated on iteration n + 1, the error in the revised estimate Q̂_{n+1}(s, a) is
  |Q̂_{n+1}(s, a) − Q(s, a)| = |(r + γ max_{a'} Q̂_n(s', a')) − (r + γ max_{a'} Q(s', a'))|
                            = γ |max_{a'} Q̂_n(s', a') − max_{a'} Q(s', a')|
                            ≤ γ max_{a'} |Q̂_n(s', a') − Q(s', a')|
                            ≤ γ max_{s'', a'} |Q̂_n(s'', a') − Q(s'', a')|
                            = γ Δ_n
Note that we used the general fact that
  |max_a f_1(a) − max_a f_2(a)| ≤ max_a |f_1(a) − f_2(a)|

The Nondeterministic Case
What if the reward and next state are nondeterministic? We redefine V and Q by taking expected values:
  V^π(s) ≡ E[r_t + γ r_{t+1} + γ² r_{t+2} + ...] = E[ Σ_{i=0}^∞ γ^i r_{t+i} ]
  Q(s, a) ≡ E[r(s, a) + γ V*(δ(s, a))]
The Nondeterministic Case (continued)
Q learning generalizes to nondeterministic worlds. Alter the training rule to
  Q̂_n(s, a) ← (1 − α_n) Q̂_{n−1}(s, a) + α_n [r + γ max_{a'} Q̂_{n−1}(s', a')]
where
  α_n = 1 / (1 + visits_n(s, a))
Convergence of Q̂ to Q can be proved [Watkins and Dayan, 1992].

Temporal Difference Learning
Temporal Difference (TD) learning methods:
- Can be used when accurate models of the environment are unavailable (neither the state transition function nor the reward function is known)
- Can be extended to work with implicit representations of action-value functions
- Are among the most useful reinforcement learning methods

Example: TD-Gammon
Learned to play backgammon (Tesauro, 1995). Immediate reward: +100 if win, −100 if lose, 0 for all other states. Trained by playing 1.5 million games against itself; now comparable to the best human player.
Temporal Difference Learning: TD(λ)
Q learning reduces the discrepancy between successive Q estimates. One-step time difference:
  Q^(1)(s_t, a_t) ≡ r_t + γ max_a Q̂(s_{t+1}, a)
Why not two steps?
  Q^(2)(s_t, a_t) ≡ r_t + γ r_{t+1} + γ² max_a Q̂(s_{t+2}, a)
Or n?
  Q^(n)(s_t, a_t) ≡ r_t + γ r_{t+1} + ... + γ^{n−1} r_{t+n−1} + γ^n max_a Q̂(s_{t+n}, a)
Blend all of these:
  Q^λ(s_t, a_t) ≡ (1 − λ)[Q^(1)(s_t, a_t) + λ Q^(2)(s_t, a_t) + λ² Q^(3)(s_t, a_t) + ...]
Equivalent expression:
  Q^λ(s_t, a_t) = r_t + γ[(1 − λ) max_a Q̂(s_{t+1}, a) + λ Q^λ(s_{t+1}, a_{t+1})]
The TD(λ) algorithm uses the above training rule. It sometimes converges faster than Q learning, and it converges for learning V* for any 0 ≤ λ ≤ 1 (Dayan, 1992). Tesauro's TD-Gammon uses this algorithm.

Handling Large State Spaces
Replace the Q̂ table with a neural net or other function approximator. Virtually any function approximator would work, provided it can be updated in an online fashion.
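The equivalence between the blended sum and the recursive expression can be checked numerically on a short fixed trajectory. All numbers below (rewards, bootstrap values max_a Q̂, γ, λ) are made-up illustrative data, and the infinite blend is truncated by closing the tail with the three-step return:

```python
gamma, lam = 0.9, 0.5
rewards = [1.0, 0.0, 2.0]   # r_t, r_{t+1}, r_{t+2}
boot = [5.0, 4.0, 3.0]      # max_a Qhat(s_{t+k}, a) for k = 1, 2, 3

# n-step returns Q^(n) for n = 1..3.
def n_step(n):
    g = sum(gamma ** k * rewards[k] for k in range(n))
    return g + gamma ** n * boot[n - 1]

# Truncated blend: (1-lam)[Q^(1) + lam Q^(2)] + lam^2 Q^(3).
q_lambda = (1 - lam) * (n_step(1) + lam * n_step(2)) + lam ** 2 * n_step(3)

# Recursive form: Q^lam_t = r_t + gamma[(1-lam) max_a Qhat + lam Q^lam_{t+1}],
# terminated by a plain one-step bootstrap at the last step.
q3 = rewards[2] + gamma * boot[2]
q2 = rewards[1] + gamma * ((1 - lam) * boot[1] + lam * q3)
q1 = rewards[0] + gamma * ((1 - lam) * boot[0] + lam * q2)
```

Unrolling the recursion from the end of the trajectory yields exactly the truncated λ-weighted blend.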
Learning State-Action Values with Function Approximation
Training examples are of the form ⟨description of (s_t, a_t), v_t⟩. The general gradient-descent rule:
  θ_{t+1} = θ_t + α[v_t − Q_t(s_t, a_t)] ∇_θ Q_t(s_t, a_t)
[Figure: the linear, gradient-descent version of Watkins's Q(λ) algorithm.]
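For a linear approximator Q(s, a) = θ · φ(s, a), the gradient ∇_θ Q is just the feature vector φ(s, a), so the rule above becomes a simple per-feature update. A sketch; the feature vectors, targets, and step size are illustrative:

```python
# One gradient-descent step of theta <- theta + alpha (v - Q) grad_theta Q,
# with a linear Q so that grad_theta Q = phi(s, a).
def sgd_q_update(theta, phi, target, alpha):
    q = sum(t * f for t, f in zip(theta, phi))   # Q_t(s_t, a_t) = theta . phi
    return [t + alpha * (target - q) * f
            for t, f in zip(theta, phi)]

theta = [0.0, 0.0]
# Repeatedly present two (feature, target) examples a linear Q can fit exactly.
for _ in range(200):
    theta = sgd_q_update(theta, [1.0, 0.0], 2.0, alpha=0.5)
    theta = sgd_q_update(theta, [0.0, 1.0], -1.0, alpha=0.5)
```

Because the two feature vectors are orthogonal here, each weight converges independently to its target value, shrinking its error by a factor of (1 − α) per presentation.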
More informationA new model for limit order book dynamics
Anewmodelforlimiorderbookdynmics JeffreyR.Russell UniversiyofChicgo,GrdueSchoolofBusiness TejinKim UniversiyofChicgo,DeprmenofSisics Absrc:Thispperproposesnewmodelforlimiorderbookdynmics.Thelimiorderbookconsiss
More informationf(x) dx with An integral having either an infinite limit of integration or an unbounded integrand is called improper. Here are two examples dx x x 2
Impope Inegls To his poin we hve only consideed inegls f() wih he is of inegion nd b finie nd he inegnd f() bounded (nd in fc coninuous ecep possibly fo finiely mny jump disconinuiies) An inegl hving eihe
More informationCSE/NB 528 Lecture 14: Reinforcement Learning (Chapter 9)
CSE/NB 528 Lecure 14: Reinforcemen Learning Chaper 9 Image from hp://clasdean.la.asu.edu/news/images/ubep2001/neuron3.jpg Lecure figures are from Dayan & Abbo s book hp://people.brandeis.edu/~abbo/book/index.hml
More informationANSWERS TO EVEN NUMBERED EXERCISES IN CHAPTER 2
ANSWERS TO EVEN NUMBERED EXERCISES IN CHAPTER Seion Eerise -: Coninuiy of he uiliy funion Le λ ( ) be he monooni uiliy funion defined in he proof of eisene of uiliy funion If his funion is oninuous y hen
More information2D Motion WS. A horizontally launched projectile s initial vertical velocity is zero. Solve the following problems with this information.
Nme D Moion WS The equions of moion h rele o projeciles were discussed in he Projecile Moion Anlsis Acii. ou found h projecile moes wih consn eloci in he horizonl direcion nd consn ccelerion in he ericl
More informationPhys 110. Answers to even numbered problems on Midterm Map
Phys Answers o een numbered problems on Miderm Mp. REASONING The word per indices rio, so.35 mm per dy mens.35 mm/d, which is o be epressed s re in f/cenury. These unis differ from he gien unis in boh
More informationQuestion Details Int Vocab 1 [ ] Question Details Int Vocab 2 [ ]
/3/5 Assignmen Previewer 3 Bsic: Definie Inegrls (67795) Due: Wed Apr 5 5 9: AM MDT Quesion 3 5 6 7 8 9 3 5 6 7 8 9 3 5 6 Insrucions Red ody's Noes nd Lerning Gols. Quesion Deils In Vocb [37897] The chnge
More informationMachine Learning Reinforcement Learning
Mchine Lerning Reinforcemen Lerning Leon 2 Mchine Lerning Mchine Lerning Supervied Lerning Techer ell lerner wh o remember Reinforcemen Lerning Environmen provide hin o lerner Unupervied Lerning Lerner
More informationNMR Spectroscopy: Principles and Applications. Nagarajan Murali Advanced Tools Lecture 4
NMR Specroscop: Principles nd Applicions Ngrjn Murli Advnced Tools Lecure 4 Advnced Tools Qunum Approch We know now h NMR is rnch of Specroscop nd he MNR specrum is n oucome of nucler spin inercion wih
More informationOn the Pseudo-Spectral Method of Solving Linear Ordinary Differential Equations
Journl of Mhemics nd Sisics 5 ():136-14, 9 ISS 1549-3644 9 Science Publicions On he Pseudo-Specrl Mehod of Solving Liner Ordinry Differenil Equions B.S. Ogundre Deprmen of Pure nd Applied Mhemics, Universiy
More informationNeural assembly binding in linguistic representation
Neurl ssembly binding in linguisic represenion Frnk vn der Velde & Mrc de Kmps Cogniive Psychology Uni, Universiy of Leiden, Wssenrseweg 52, 2333 AK Leiden, The Neherlnds, vdvelde@fsw.leidenuniv.nl Absrc.
More informationVersion 001 test-1 swinney (57010) 1. is constant at m/s.
Version 001 es-1 swinne (57010) 1 This prin-ou should hve 20 quesions. Muliple-choice quesions m coninue on he nex column or pge find ll choices before nswering. CubeUniVec1x76 001 10.0 poins Acubeis1.4fee
More informationP441 Analytical Mechanics - I. Coupled Oscillators. c Alex R. Dzierba
Lecure 3 Mondy - Deceber 5, 005 Wrien or ls upded: Deceber 3, 005 P44 Anlyicl Mechnics - I oupled Oscillors c Alex R. Dzierb oupled oscillors - rix echnique In Figure we show n exple of wo coupled oscillors,
More informationINVESTIGATION OF REINFORCEMENT LEARNING FOR BUILDING THERMAL MASS CONTROL
INVESTIGATION OF REINFORCEMENT LEARNING FOR BUILDING THERMAL MASS CONTROL Simeng Liu nd Gregor P. Henze, Ph.D., P.E. Universiy of Nebrsk Lincoln, Archiecurl Engineering 1110 Souh 67 h Sree, Peer Kiewi
More informationPART V. Wavelets & Multiresolution Analysis
Wveles 65 PART V Wveles & Muliresoluion Anlysis ADDITIONAL REFERENCES: A. Cohen, Numericl Anlysis o Wvele Mehods, Norh-Hollnd, (003) S. Mll, A Wvele Tour o Signl Processing, Acdemic Press, (999) I. Dubechies,
More informationName: Per: L o s A l t o s H i g h S c h o o l. Physics Unit 1 Workbook. 1D Kinematics. Mr. Randall Room 705
Nme: Per: L o s A l o s H i g h S c h o o l Physics Uni 1 Workbook 1D Kinemics Mr. Rndll Room 705 Adm.Rndll@ml.ne www.laphysics.com Uni 1 - Objecies Te: Physics 6 h Ediion Cunel & Johnson The objecies
More informationIntroduction to LoggerPro
Inroducion o LoggerPro Sr/Sop collecion Define zero Se d collecion prmeers Auoscle D Browser Open file Sensor seup window To sr d collecion, click he green Collec buon on he ool br. There is dely of second
More informationECE Microwave Engineering
EE 537-635 Microwve Engineering Adped from noes y Prof. Jeffery T. Willims Fll 8 Prof. Dvid R. Jcson Dep. of EE Noes Wveguiding Srucures Pr 7: Trnsverse Equivlen Newor (N) Wveguide Trnsmission Line Model
More informationAn integral having either an infinite limit of integration or an unbounded integrand is called improper. Here are two examples.
Improper Inegrls To his poin we hve only considered inegrls f(x) wih he is of inegrion nd b finie nd he inegrnd f(x) bounded (nd in fc coninuous excep possibly for finiely mny jump disconinuiies) An inegrl
More informationMathematics 805 Final Examination Answers
. 5 poins Se he Weiersrss M-es. Mhemics 85 Finl Eminion Answers Answer: Suppose h A R, nd f n : A R. Suppose furher h f n M n for ll A, nd h Mn converges. Then f n converges uniformly on A.. 5 poins Se
More informationBipartite Matching. Matching. Bipartite Matching. Maxflow Formulation
Mching Inpu: undireced grph G = (V, E). Biprie Mching Inpu: undireced, biprie grph G = (, E).. Mching Ern Myr, Hrld äcke Biprie Mching Inpu: undireced, biprie grph G = (, E). Mflow Formulion Inpu: undireced,
More informationAho-Corasick Automata
Aho-Corsick Auom Sring D Srucures Over he nex few dys, we're going o be exploring d srucures specificlly designed for sring processing. These d srucures nd heir vrins re frequenly used in prcice Looking
More informationA LOG IS AN EXPONENT.
Ojeives: n nlze nd inerpre he ehvior of rihmi funions, inluding end ehvior nd smpoes. n solve rihmi equions nlill nd grphill. n grph rihmi funions. n deermine he domin nd rnge of rihmi funions. n deermine
More informationThe Finite Element Method for the Analysis of Non-Linear and Dynamic Systems
Swiss Federl Insiue of Pge 1 The Finie Elemen Mehod for he Anlysis of Non-Liner nd Dynmic Sysems Prof. Dr. Michel Hvbro Fber Dr. Nebojs Mojsilovic Swiss Federl Insiue of ETH Zurich, Swizerlnd Mehod of
More informationZürich. ETH Master Course: L Autonomous Mobile Robots Localization II
Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),
More information22.615, MHD Theory of Fusion Systems Prof. Freidberg Lecture 9: The High Beta Tokamak
.65, MHD Theory of Fusion Sysems Prof. Freidberg Lecure 9: The High e Tokmk Summry of he Properies of n Ohmic Tokmk. Advnges:. good euilibrium (smll shif) b. good sbiliy ( ) c. good confinemen ( τ nr )
More informationSome Inequalities variations on a common theme Lecture I, UL 2007
Some Inequliies vriions on common heme Lecure I, UL 2007 Finbrr Hollnd, Deprmen of Mhemics, Universiy College Cork, fhollnd@uccie; July 2, 2007 Three Problems Problem Assume i, b i, c i, i =, 2, 3 re rel
More informationIX.1.1 The Laplace Transform Definition 700. IX.1.2 Properties 701. IX.1.3 Examples 702. IX.1.4 Solution of IVP for ODEs 704
Chper IX The Inegrl Trnform Mehod IX. The plce Trnform November 4, 7 699 IX. THE APACE TRANSFORM IX.. The plce Trnform Definiion 7 IX.. Properie 7 IX..3 Emple 7 IX..4 Soluion of IVP for ODE 74 IX..5 Soluion
More information3 Motion with constant acceleration: Linear and projectile motion
3 Moion wih consn ccelerion: Liner nd projecile moion cons, In he precedin Lecure we he considered moion wih consn ccelerion lon he is: Noe h,, cn be posiie nd neie h leds o rie of behiors. Clerl similr
More informationPresentation Overview
Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion
More informationModule 6 Value Iteration. CS 886 Sequential Decision Making and Reinforcement Learning University of Waterloo
Module 6 Vlue Itertion CS 886 Sequentil Decision Mking nd Reinforcement Lerning University of Wterloo Mrkov Decision Process Definition Set of sttes: S Set of ctions (i.e., decisions): A Trnsition model:
More informationSection P.1 Notes Page 1 Section P.1 Precalculus and Trigonometry Review
Secion P Noe Pge Secion P Preclculu nd Trigonomer Review ALGEBRA AND PRECALCULUS Eponen Lw: Emple: 8 Emple: Emple: Emple: b b Emple: 9 EXAMPLE: Simplif: nd wrie wi poiive eponen Fir I will flip e frcion
More informationgraph of unit step function t
.5 Piecewie coninuou forcing funcion...e.g. urning he forcing on nd off. The following Lplce rnform meril i ueful in yem where we urn forcing funcion on nd off, nd when we hve righ hnd ide "forcing funcion"
More informationInventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions
Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.
More information6. Gas dynamics. Ideal gases Speed of infinitesimal disturbances in still gas
6. Gs dynmics Dr. Gergely Krisóf De. of Fluid echnics, BE Februry, 009. Seed of infiniesiml disurbnces in sill gs dv d, dv d, Coninuiy: ( dv)( ) dv omenum r r heorem: ( ( dv) ) d 3443 4 q m dv d dv llievi
More information1 Sterile Resources. This is the simplest case of exhaustion of a finite resource. We will use the terminology
Cmbridge Universiy Press 978--5-8997-7 - Susinble Nurl Resource Mngemen for Scieniss nd Engineers Excerp More informion Serile Resources In his chper, we inroduce he simple concepion of scrce resource,
More information3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon
3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of
More information1 Review of Zero-Sum Games
COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any
More informationTax Audit and Vertical Externalities
T Audi nd Vericl Eernliies Hidey Ko Misuyoshi Yngihr Ngoy Keizi Universiy Ngoy Universiy 1. Inroducion The vericl fiscl eernliies rise when he differen levels of governmens, such s he federl nd se governmens,
More informationLaplace Transforms. Examples. Is this equation differential? y 2 2y + 1 = 0, y 2 2y + 1 = 0, (y ) 2 2y + 1 = cos x,
Laplace Transforms Definiion. An ordinary differenial equaion is an equaion ha conains one or several derivaives of an unknown funcion which we call y and which we wan o deermine from he equaion. The equaion
More informationTHREE IMPORTANT CONCEPTS IN TIME SERIES ANALYSIS: STATIONARITY, CROSSING RATES, AND THE WOLD REPRESENTATION THEOREM
THR IMPORTANT CONCPTS IN TIM SRIS ANALYSIS: STATIONARITY, CROSSING RATS, AND TH WOLD RPRSNTATION THORM Prof. Thoms B. Fomb Deprmen of conomics Souhern Mehodis Universi June 8 I. Definiion of Covrince Sionri
More informationLecture Notes 2. The Hilbert Space Approach to Time Series
Time Series Seven N. Durlauf Universiy of Wisconsin. Basic ideas Lecure Noes. The Hilber Space Approach o Time Series The Hilber space framework provides a very powerful language for discussing he relaionship
More informationRESPONSE UNDER A GENERAL PERIODIC FORCE. When the external force F(t) is periodic with periodτ = 2π
RESPONSE UNDER A GENERAL PERIODIC FORCE When he exernl force F() is periodic wih periodτ / ω,i cn be expnded in Fourier series F( ) o α ω α b ω () where τ F( ) ω d, τ,,,... () nd b τ F( ) ω d, τ,,... (3)
More informationExact Minimization of # of Joins
A Quer Rewriing Algorihm: Ec Minimizion of # of Joins Emple (movie bse) selec.irecor from movie, movie, movie m3, scheule, scheule s2 where.irecor =.irecor n.cor = m3.cor n.ile =.ile n m3.ile = s2.ile
More informationCSE/NB 528 Lecture 14: From Supervised to Reinforcement Learning (Chapter 9) R. Rao, 528: Lecture 14
CSE/NB 58 Lecure 14: From Supervised o Reinforcemen Learning Chaper 9 1 Recall from las ime: Sigmoid Neworks Oupu v T g w u g wiui w Inpu nodes u = u 1 u u 3 T i Sigmoid oupu funcion: 1 g a 1 a e 1 ga
More information