An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem


Juliana M. Nascimento and Warren B. Powell
Department of Operations Research and Financial Engineering, Princeton University
August 21, 2006

Abstract

We consider a multistage asset acquisition problem, where assets are purchased now, at a price that varies randomly over time, to be used to satisfy a random demand at a particular point in time in the future. We provide a rare proof of convergence for an approximate dynamic programming algorithm using pure exploitation, where the states we visit depend on the decisions produced by solving the approximate problem. The resulting algorithm does not require knowing the probability distribution of prices or demands, nor does it require any assumptions about its functional form. The algorithm and its proof rely on the fact that the true value function is a family of piecewise linear concave functions.

1 Introduction

We consider a class of multistage problems called the lagged asset acquisition problem. An integer amount $x_t$ of a single asset is purchased at time $t$, $t = 0, \dots, T-1$, to be used to satisfy a demand that occurs only at a fixed time $T$. The price $P_t$ that we pay to acquire assets at time $t$ is a Markov process. In most practical applications, the price trends upward, but downward fluctuations create buying opportunities. We do not realize the demand $\hat D$ until time $T$, at which point we receive a random revenue $\hat r$ multiplied by the smaller of $\hat D$ and the total we have ordered up to this point. In our problem, $x_t$ is a scalar quantity. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be our probability space. The goal is to find an $\mathcal{F}_t$-measurable sequence $x = (x_t)_{t=0}^{T-1}$ that maximizes

$$\max_x \; \mathbb{E}\left[-\sum_{t=0}^{T-1} P_t x_t + \hat r \min\Big(\hat D, \sum_{t=0}^{T-1} x_t\Big)\right].$$

This problem arises in a number of settings. An energy company may be purchasing futures contracts for oil or gas to lock in a lower price now. Companies purchasing expensive equipment (aircraft, locomotives, power transformers) can often pay less if they place orders further in the future. Shipping companies purchase space on container ships for a year or more in advance to guarantee space. All of these decisions are made before knowing the true demand, the prices and the revenues in the future.

Our problem could be solved using classical backward dynamic programming, but two issues can prevent this. First, we may not know the probability distribution of prices, demands and revenues. There has been increasing interest in solving stochastic optimization problems using a distribution-free, nonparametric approach. Distribution-free revenue management and multiproduct pricing applications can be found in van Ryzin & McGill (2000) and Rusmevichientong et al. (2006), respectively. A single-period newsvendor problem and its multi-period extension, when the demand distribution is unknown, are considered in Levi et al. (2006).
The authors established bounds on the number of samples required to guarantee that, with high probability, the expected cost of the sampling-based policies is arbitrarily close to that of the optimal policy. Second, even though the state variable only has two dimensions (price and quantity, which we assume are discrete), the state space

can still be quite large. In section 7, we report on experiments where the state space has as many as 16 million possible values. If we assume the probability distributions are known, exact solutions using classical methods require up to 6.7 hours to compute. Even with one-dimensional state spaces, the curse of dimensionality might present itself. In the context of a single-item stochastic lot-sizing problem with known distribution, Halman et al. (2006) develop approximation algorithms to deal with it. The authors also prove that finding an optimal policy is NP-hard.

The goal of this paper is to prove convergence of an algorithm which proceeds by solving problems of the form

$$x_t^n = \arg\max_{0 \le x \le M_t} \left(-P_t^n x + \bar V_t^{n-1}(P_t^n, R_{t-1}^n + x)\right),$$

where $R_t^n = R_{t-1}^n + x_t^n$ captures cumulative past purchases, $\bar V_t^{n-1}(P_t^n, R_t^n)$ is an approximation to the dynamic programming optimal value function, and $P_t^n$ is a sample realization of the price we must pay for purchases at time $t$. Our convergence proof requires $P_t^n$ to be discrete, as would occur in any practical application. However, our algorithm allows prices to be continuous even though we discretize the value function approximation. In addition, our numerical experiments show that our algorithm produces very accurate results even when we use a relatively coarse discretization of the value function.

Our algorithm and its convergence proof rely on the fact that both the optimal and the approximated value functions are piecewise linear and concave in the asset dimension with breakpoints on the integers. If we define

$$F_t^{n-1,P_t^n,R_{t-1}^n}(x) = -P_t^n x + \bar V_t^{n-1}(P_t^n, R_{t-1}^n + x), \qquad (1)$$

then the slopes of $F_t^{n-1,P_t^n,R_{t-1}^n} : \mathbb{R} \to \mathbb{R}$ to the left and right of $x_t^n$ (which is an integer breakpoint of $F_t^{n-1,P_t^n,R_{t-1}^n}$) are used to update $\bar V_{t-1}^{n-1}$, obtaining $\bar V_{t-1}^n$. As we can see from (1), the slopes depend both on the sample information given by $P_t^n$ and on $\bar V_t^{n-1}$, which at iteration $n$ is only an approximation of future profits. As a result, the slopes are biased, causing complications in the convergence proof.
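Because $F_t^{n-1,P_t^n,R_{t-1}^n}$ is concave with integer breakpoints, the argmax above can be computed by a marginal scan: keep buying one more unit while the slope of the approximation exceeds the price. A minimal Python sketch (the function name and the slope layout are illustrative choices, not the paper's code):

```python
def best_purchase(price, R_prev, slopes, M):
    """Greedy decision x = argmax_{0<=x<=M} (-price*x + Vbar(R_prev+x)),
    where Vbar is piecewise linear concave with integer breakpoints and
    slopes[R-1] stores the decreasing slope Vbar(R) - Vbar(R-1)."""
    x = 0
    # Concavity: once a marginal unit is not worth its price, no later
    # unit is; a strict comparison returns the smallest maximizer on ties.
    while x < M and R_prev + x < len(slopes) and slopes[R_prev + x] > price:
        x += 1
    return x
```

For example, with slopes (10, 8, 5, 2) and a price of 6, the scan buys two units; lowering the price to 1 buys up to the bound M.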
The dependence on sample information and on the approximation of the value function in the future is common in approximate dynamic programming algorithms (see Bertsekas

& Tsitsiklis (1996), Sutton & Barto (1998)), where an approximation of the future is used to make decisions now, stepping forward in time. The use of separable, piecewise linear approximations has already proven effective on very difficult classes of stochastic resource allocation problems (see Godfrey & Powell (2002) and Topaloglu & Powell (2006)), but as of this writing there are no convergence results for multistage problems. Our proof technique combines ideas from the field of approximate dynamic programming (notably Bertsekas & Tsitsiklis (1996)) as well as the proof of the SPAR algorithm in Powell et al. (2004). Our algorithm is modeled after the SPAR algorithm, which is presented in the context of a two-stage problem. The result is a rare instance of a provably convergent approximate dynamic programming algorithm that uses pure exploitation, which is to say that the decision $x_t^n$ that we make now (based on the value function approximation $\bar V_t^{n-1}$) determines the state we visit at $t+1$. Current proofs of convergence for approximate dynamic programming algorithms such as Q-learning (Tsitsiklis (1994), Jaakkola et al. (1994)) and optimistic policy iteration (Tsitsiklis (2002)) require that we visit states (and possibly actions) infinitely often. A convergence proof for a Real Time Dynamic Programming (Barto et al. (1995)) algorithm that considers a pure exploitation scheme is provided in Bertsekas & Tsitsiklis (1996)[Props. 5.3 and 5.4], but it assumes that the distributions of the random variables are known. We make no such assumptions, but it is important to emphasize that our result depends on the concavity of the objective function.

There are a number of competing approaches to this problem. Since our problem requires integer solutions, we can use any of a vast range of approximate dynamic programming algorithms (Bertsekas & Tsitsiklis (1996)), but these lack provable convergence without computationally expensive steps that require forcing the algorithm to sample states and actions infinitely often.
From the field of stochastic programming, there are several flavors of Benders decomposition that can be used (Van Slyke & Wets (1969), Higle & Sen (1991), Chen & Powell (1999)). However, these methods will not handle the random price issue. Another powerful technique is sample average approximation (SAA) (Shapiro (2003)), which relies on generating random samples outside of the optimization problems and then solving the corresponding deterministic problems using an appropriate optimization algorithm. Numerical experiments with the SAA approach applied to problems where an integer solution is required can be found in Ahmed & Shapiro (2002).

The contributions of the paper are: (a) we propose an approximate dynamic programming algorithm for the lagged asset acquisition problem using pure exploitation; (b) we prove convergence of the algorithm; and (c) we demonstrate experimentally that our algorithm outperforms competing algorithms, and in particular dramatically outperforms standard backward dynamic programming when the distributions are assumed known.

This paper is organized as follows. Section 2 defines the problem and the corresponding dynamic programming model. Section 3 describes the algorithmic strategy. Section 4 introduces notation and assumptions for the convergence analysis. Section 5 presents a sketch of the convergence proofs, while section 6 provides the full proofs. Finally, section 7 provides some experimental comparisons against the optimal policy and other approaches, and section 8 presents the conclusions.

2 Problem Formulation and Model

In this section we give a precise description of the problem considered in this paper as well as the assumptions taken. We also provide the dynamic programming model associated with the problem and identify the structural properties that are exploited in our proof.

The problem is to determine, in each time period $t = 0, \dots, T-1$, how much should be purchased of a given asset to meet a positive discrete integer random demand $\hat D$ at time $T$. A strictly positive price $P_t$ is charged for each unit of asset purchased at $t$ and a strictly positive bounded random reward $\hat r$ is received for each unit of satisfied demand. The demand is independent of the price and reward. We denote by $x_t$ the amount purchased at each period $t$ and we require that $x_t \in \{0, \dots, M_t\}$, where $M_t$ is a natural number. Moreover, $x_t \in \mathcal{F}_t$ and the price process $P = (P_0, \dots, P_{T-1})$ is a Markov process with finite support $\mathcal{P} = \mathcal{P}_0 \times \cdots \times \mathcal{P}_{T-1}$. We can write $P_{t+1} = f(P_t, \hat P_{t+1})$, where $f$ is a deterministic function of the previous price $P_t$ and the exogenous price information $\hat P_{t+1}$, which might be dependent on $P_t$. The objective is to maximize the expected profit. We show in section 7 that the finite support assumption is not that restrictive, as the algorithm works well for arbitrarily fine levels of discretization.

The decision $x_t$ at each period $t$ depends both on the current unit price of the asset and on the amount of assets purchased up until time $t-1$ (inclusive), which is denoted by $R_{t-1}$. We assume that $R_{-1} = 0$. Clearly, $R_t = R_{t-1} + x_t$, for $t = 0, \dots, T-1$. Note that $R_{T-1}$ denotes the total number of assets acquired over all time periods, which is used to satisfy demand $\hat D$ at $T$. The state variable is thus given by $S_t = (P_t, R_t)$. We let $S = (S_0, \dots, S_{T-1})$ be our state vector and $\mathcal{S} = \mathcal{S}_0 \times \cdots \times \mathcal{S}_{T-1}$ be the state space.

The problem can be formulated as a dynamic program. For $t = 0, \dots, T-2$, the optimality equations $V_t : \mathcal{P}_t \times [0, B_t] \to \mathbb{R}$, where $B_t = \sum_{i=0}^{t} M_i$, are given by

$$V_t(P, R) = \mathbb{E}\left[\max_{0 \le x_{t+1} \le M_{t+1}} \left(-P_{t+1} x_{t+1} + V_{t+1}(P_{t+1}, R + x_{t+1})\right) \,\Big|\, P_t = P\right].$$

For $t = T-1$, $V_{T-1} : \mathcal{P}_{T-1} \times [0, B_{T-1}] \to \mathbb{R}$ is given by

$$V_{T-1}(P, R) = \mathbb{E}\left[\hat r \min(\hat D, R) \,\Big|\, P_{T-1} = P\right].$$

Note that we are using a post-decision state variable. This is the state of the system after the decision $x_t$ is taken. See Powell & Van Roy (2004) and Van Roy et al. (1997) for a discussion and an application. Post-decision states lead to an inversion of the optimization/expectation order in the value function formula. This inversion allows for more effective computational strategies.

We can show that the optimal value functions are concave and piecewise linear with integer breakpoints in the asset dimension. Therefore, the value function $V_t(P, \cdot)$, for $t = 0, \dots, T-1$ and $P \in \mathcal{P}_t$, can be identified uniquely by its decreasing slopes $(v_t(P, 1), \dots, v_t(P, B_t))$. Moreover, if $R$ is an integer, the optimal decision

$$x_{t+1} = \arg\max_{0 \le x \le M_{t+1}} \left(-P_{t+1} x + V_{t+1}(P_{t+1}, R + x)\right)$$

is an integer, without having to enforce integrality. We disregard the values at $(P, 0)$ because the optimal decisions $x_{t+1}$ do not change when $V_t(P, \cdot)$ is shifted by a constant. In order

to simplify notation, let $\tilde{\mathcal{S}}_t = \mathcal{P}_t \times \{1, \dots, B_t\}$. Note that $\tilde{\mathcal{S}} = (\tilde{\mathcal{S}}_0, \dots, \tilde{\mathcal{S}}_{T-1})$ is the state space minus all the state pairs $(P, 0)$.

We close the section by summarizing the important properties of the optimal value functions and their slopes that are used throughout the paper. The proof is postponed to the appendix.

Proposition 1. The optimal value functions are piecewise linear, with integer breakpoints, and concave in the asset dimension. Moreover, for $t = 0, \dots, T-1$ and $(P, R) \in \tilde{\mathcal{S}}_t$, the optimal slope $v_t(P, R) = V_t(P, R) - V_t(P, R-1)$ is given by

$$v_t(P, R) = \mathbb{E}\left[\max\left(\min\left(P_{t+1}, v_{t+1}(P_{t+1}, R)\right), v_{t+1}(P_{t+1}, R + M_{t+1})\right) \,\Big|\, P_t = P\right] 1_{\{t < T-1\}} + \bar r\, \mathbb{P}\{\hat D \ge R\}\, 1_{\{t = T-1\}}, \qquad (2)$$

where $\bar r = \mathbb{E}[\hat r \mid P_{T-1} = P]$. Thus, $v_t(P, R)$ is bounded between 0 and $\max \hat r$, which is the maximum of the support for the reward $\hat r$. Furthermore, $(v_t(P, 1), \dots, v_t(P, B_t)) \in \mathcal{C}_t$, where

$$\mathcal{C}_t = \left\{ v \in \mathbb{R}^{B_t} : v_1 \le \max \hat r,\; v_{B_t} \ge 0,\; v_{R+1} \le v_R \text{ for } R = 1, \dots, B_t - 1 \right\}.$$

3 Algorithmic Strategy

Our approach to the problem consists of learning the optimal decision given the time period, the amount of assets already available and the current price. However, the objective is to learn the optimal decision only for asset levels that can be generated by an optimal policy. Figure 1 describes the ADP-Lagged algorithm, a modified version of the SPAR algorithm (Powell et al. (2004)). The algorithm starts with initial piecewise linear value function approximations represented by their slopes $\bar v^0$. As discussed in the previous section, optimal decisions depend only on the slopes of the value functions, thus the algorithm only deals with the slopes instead of the value functions themselves. The initial approximations of the slopes are only required to be decreasing and bounded between 0 and $\max \hat r$.

At each iteration $n$ and time $t$, a decision $x_t^n$ is made. This decision is optimal with respect to the sample realization of the price sequence up to time $t$, the asset level $R_{t-1}^n$ and

the current approximation of the slopes $\bar v^{n-1}$. It will bring the system to the new asset level $R_t^n = R_{t-1}^n + x_t^n$. Just after this transition, a sample realization of the slopes of $\bar V_t^{n-1}$ to the left and right of $(P_t^n, R_t^n)$ is observed. These samples, denoted by $\hat v_{t+1}^n(R_t^n)$ and $\hat v_{t+1}^n(R_t^n + 1)$, are used to update the slope approximations $\bar v_t^{n-1}(P_t^n, R_t^n)$ and $\bar v_t^{n-1}(P_t^n, R_t^n + 1)$. After that, a projection operation $\Pi_C$ is performed in case a violation of the concavity property occurs. For completeness, we assume that $R_{-1}^n = 0$ for all $n$. Also assume $P_0^n = \hat P_0^n$.

We denote by $S_t = (P_t, R_t)$ a general state at time $t$, while $S_t^n = (P_t^n, R_t^n)$ represents the actual state visited by the algorithm at iteration $n$ and time $t$. Moreover, $\{S_t^n\}_{n \ge 0} = \{(P_t^n, R_t^n)\}_{n \ge 0}$ is the sequence of states generated by the algorithm. The same notation holds for the decisions $x_t$, $x_t^n$ and $\{x_t^n\}_{n \ge 0}$. The algorithm also generates the $\{\bar v^n\}_{n \ge 0}$ sequences, that is, the sequences of slopes of the value function approximations. It is important to realize that there is one sequence $\{\bar v_t^n(P_t, R_t)\}_{n \ge 0}$ for each time $t < T$ and state $S_t \in \tilde{\mathcal{S}}_t$. The notation $\{\bar v^n\}_{n \ge 0}$ represents the family of all such sequences.

Remember that, for $(P, R) \in \tilde{\mathcal{S}}_t$, the optimal slope is given by (2). A sample slope is obtained by replacing the expectation by a sample realization and by replacing $v_{t+1}$ by its current approximation. Therefore, the sample slope, for $R = 1, \dots, B_t$, is given by

$$\hat v_{t+1}^n(R) = \max\left(\min\left(P_{t+1}^n, \bar v_{t+1}^{n-1}(P_{t+1}^n, R)\right), \bar v_{t+1}^{n-1}(P_{t+1}^n, R + M_{t+1})\right) 1_{\{t < T-1\}} + \hat r^n 1_{\{R \le \hat D^n\}} 1_{\{t = T-1\}}. \qquad (3)$$

Note that, for all $t$, $\hat v_{t+1}^n(R) \ge \hat v_{t+1}^n(R+1)$. When $t = T-1$, the sample slope does not depend on a current slope approximation, as is the case for $t < T-1$. This fact is important for the convergence analysis of the algorithm, since it implies that $\hat v_{t+1}^n(R)$ is an unbiased estimator of $v_t(P_t, R)$ for $t = T-1$, and it is biased for $t < T-1$.

The projection operator $\Pi_C$ maps a vector $z^n$ that may not be monotone decreasing in the asset dimension (the concavity property) into another vector $\bar v^n$, such that, for $P \in \mathcal{P}_t$, $\bar v_t^n(P) = (\bar v_t^n(P, 1), \dots, \bar v_t^n(P, B_t)) \in \mathcal{C}_t$.
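(Before specifying $\Pi_C$, we note that the sample slope (3) is simple to compute from the stored slopes. The dict-based storage below is an illustrative choice, not the paper's.)

```python
def sample_slope(t, T, R, price_next, vbar_next, M_next, r_hat, D_hat):
    """Sample slope vhat_{t+1}^n(R) from equation (3). For t < T-1 it
    combines next period's sampled price with the current slope
    approximation vbar_next (a dict mapping (price, R) -> slope); for
    t = T-1 it is the realized reward r_hat if the R-th unit falls
    within the realized demand D_hat, and 0 otherwise."""
    if t < T - 1:
        return max(min(price_next, vbar_next[(price_next, R)]),
                   vbar_next[(price_next, R + M_next)])
    return r_hat if R <= D_hat else 0.0
```

As claimed in the text, when the stored slopes are decreasing in R the returned sample slopes are decreasing in R as well.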
In this paper, we consider the Level projection operator introduced in Topaloglu & Powell (2003). It imposes concavity by simply forcing

STEP 0: Initialize $\bar v_t^0(P, R)$ for all $t$ and $(P, R)$ to be monotone decreasing in $R$. Set $n = 1$.
STEP 1: Sample the price sequence $P^n = (P_0^n, \dots, P_{T-1}^n)$, the demand $\hat D^n$ and reward $\hat r^n$.
STEP 2: Do for $t = 0, \dots, T-1$:
STEP 2a: $x_t^n = \arg\max_{0 \le x \le M_t} \left(-P_t^n x + \bar V_t^{n-1}(P_t^n, R_{t-1}^n + x)\right)$.
STEP 2b: $S_t^n = (P_t^n, R_{t-1}^n + x_t^n)$.
STEP 2c: Observe $\hat v_{t+1}^n(R_t^n)$ and $\hat v_{t+1}^n(R_t^n + 1)$ according to (3).
STEP 2d: For $(P, R) \in \tilde{\mathcal{S}}_t$,
$$z_t^n(P, R) = \begin{cases} (1 - \alpha_t^n) \bar v_t^{n-1}(P, R) + \alpha_t^n \hat v_{t+1}^n(R), & \text{if } P = P_t^n,\; R = R_t^n \text{ or } R_t^n + 1, \\ \bar v_t^{n-1}(P, R), & \text{else.} \end{cases}$$
STEP 2e: $\bar v_t^n = \Pi_C(z_t^n)$. See (4) for the details.
STEP 3: Increase $n$ by one and go to step 1.

Figure 1: ADP-Lagged Algorithm

the violating slopes to be equal to the newly updated ones. For $(P, R) \in \tilde{\mathcal{S}}_t$, the operator is given by

$$\Pi_C(z_t^n)(P, R) = \begin{cases} z_t^n(P_t^n, R_t^n), & \text{if } P = P_t^n,\; R \le R_t^n,\; z_t^n(P, R) \le z_t^n(P_t^n, R_t^n), \\ z_t^n(P_t^n, R_t^n + 1), & \text{if } P = P_t^n,\; R \ge R_t^n + 1,\; z_t^n(P, R) \ge z_t^n(P_t^n, R_t^n + 1), \\ z_t^n(P, R), & \text{else.} \end{cases} \qquad (4)$$

Figure 2 helps us visualize one iteration $n$ of the algorithm at time $t$. After the algorithm has sampled the price sequence, demand and reward, and has made the decisions up until time $t-1$, the current price is $P_t^n$ and the total amount of assets purchased so far is $R_{t-1}^n$. Based on the slope approximation $\bar v_t^{n-1}$, the algorithm determines the amount of assets $x_t^n$ to acquire at time $t$ and samples the slopes at $R_t^n = R_{t-1}^n + x_t^n$ and $R_t^n + 1$, as illustrated in figure 2a. The decision $x_t^n$ maximizes the function $F_t^{n-1,P_t^n,R_{t-1}^n}(x) = -P_t^n x + \bar V_t^{n-1}(P_t^n, R_{t-1}^n + x)$, where the value function approximation $\bar V_t^{n-1}$ is completely determined by the slopes $\bar v_t^{n-1}$, assuming $\bar V_t^{n-1}(P_t^n, 0) = 0$. After the current slope approximations are updated using the sampled slopes, a violation of the concavity property may occur, as shown in figure 2b. In this

[Figure 2: Iteration $n$ of the algorithm at time $t$. 2a: current approximate function $F_t^{n-1,P_t^n,R_{t-1}^n}(x)$, optimal decision $x_t^n$ and sampled slopes $\hat v_{t+1}^n(R_t^n)$, $\hat v_{t+1}^n(R_t^n+1)$. 2b: temporary approximate function $z_t^n$ with a violation of concavity. 2c: Level projection operation: updated approximate function with concavity restored.]

case, the projection operation $\Pi_C$ is performed and concavity is restored, as in figure 2c.

4 Theoretical Conditions and Assumptions

We start this section by pointing out that the sequence of states $\{S_t^n\}_{n \ge 0} = \{(P_t^n, R_t^n)\}_{n \ge 0}$ and the sequence of decisions $\{x_t^n\}_{n \ge 0}$ generated by the algorithm have at least one accumulation

point, as the price sequence has finite support and the decisions are integer and bounded, which implies that the resource sequence has finite support as well. Let $\mathcal{S}_t^*$ be the set of all states that are either equal to an accumulation point $(P^*, R^*)$ of $\{(P_t^n, R_t^n)\}_{n \ge 0}$ or are equal to $(P^*, R^* + 1)$. Moreover, we only consider accumulation points $(P^*, R^*)$ such that $R^* > 0$ and $R^* < B_t$. The slope sequences $\{\bar v^n\}_{n \ge 0}$ also have an accumulation point, as the set $\mathcal{C}_t$ (defined in proposition 1) is compact and the projection operation guarantees, for all iterations $n$ and prices $P \in \mathcal{P}_t$, that $\bar v_t^n(P) = (\bar v_t^n(P, 1), \dots, \bar v_t^n(P, B_t)) \in \mathcal{C}_t$.

Let $\mathcal{F}$ be the sigma-algebra generated by the algorithm. We denote by $\mathcal{F}_t^n$, for $t = 0, \dots, T$, the sigma-algebra generated by the algorithm up until iteration $n$ and time period $t$. Moreover, we denote by $\mathcal{F}^n$ the sigma-algebra generated by the algorithm up until the end of iteration $n$. Clearly, for $t = 0, \dots, T-1$, $\mathcal{F}_t^n \subseteq \mathcal{F}_{t+1}^n$ and $\mathcal{F}_T^n \subseteq \mathcal{F}^n \subseteq \mathcal{F}_0^{n+1}$. Furthermore, $\bar v_t^n(P, R)$ and $z_t^n(P, R)$ are $\mathcal{F}^n$-measurable, while, for all $t < T$, $P_t^n$, $x_t^n$ and $\hat v_t^n(R)$ are $\mathcal{F}_t^n$-measurable. We also have that $\hat D^n$, $\hat r^n$ and $\hat v_T^n(R)$ are $\mathcal{F}_T^n$-measurable.

We introduce the integer random variable $N$, which is used to indicate when an iteration of the algorithm is large enough for convergence analysis purposes. Let $N$ be the smallest integer such that all accumulation points $(P^*, R^*, x^*) = ((P_0^*, R_0^*, x_0^*), \dots, (P_{T-1}^*, R_{T-1}^*, x_{T-1}^*))$ of $\{(P^m, R^m, x^m)\}_{m \ge 0}$ have been observed at least once. Moreover, $N$ is also the smallest integer such that, if a mathematical statement regarding the sequences of slopes, states and decisions generated by the algorithm is true only for finitely many iterations, then it is false for all iterations $n \ge N$. For example:

$$\text{If } \sum_{n=1}^{\infty} 1_{\{(R_{t-1}^n, P_t^n, x_t^n) = (R, P, x)\}} < \infty, \text{ then } \sum_{n=N}^{\infty} 1_{\{(R_{t-1}^n, P_t^n, x_t^n) = (R, P, x)\}} = 0;$$

$$\text{If } \sum_{n=1}^{\infty} 1_{\{\bar v_t^n(P, R) < P\}} < \infty, \text{ then } \sum_{n=N}^{\infty} 1_{\{\bar v_t^n(P, R) < P\}} = 0.$$

It is trivial to see that $N$ is finite almost surely.

For $(P, R) \in \tilde{\mathcal{S}}_t$, we present the sets of iterations $\mathcal{N}_t^-(P, R)$ and $\mathcal{N}_t^+(P, R)$. These sets keep track of the effects produced by the projection operation. Let $\mathcal{N}_t^-(P, R)$ ($\mathcal{N}_t^+(P, R)$) be the set of iterations in which the unprojected slope corresponding to state $(P, R)$ was too small (large) and had to be increased (decreased) by the projection operation. Formally,

$$\mathcal{N}_t^-(P, R) = \{n \in \mathbb{N} : z_t^n(P, R) < \bar v_t^n(P, R)\}, \qquad \mathcal{N}_t^+(P, R) = \{n \in \mathbb{N} : z_t^n(P, R) > \bar v_t^n(P, R)\}.$$

For example, based on figure 2c, $n \in \mathcal{N}_t^-(P_t^n, R_t^n - 1)$ and $n \in \mathcal{N}_t^+(P_t^n, R_t^n + 2)$.

We now introduce the sets of states $\mathcal{S}_t^-$ and $\mathcal{S}_t^+$. The states in $\mathcal{S}_t^-$ ($\mathcal{S}_t^+$) are the ones for which the projection operation decreased (increased) or kept the same the corresponding unprojected slopes infinitely often; that is, for $(P, R) \in \tilde{\mathcal{S}}_t$, $\mathcal{N}_t^-(P, R)$ ($\mathcal{N}_t^+(P, R)$) is finite if and only if $(P, R) \in \mathcal{S}_t^-$ ($\mathcal{S}_t^+$). That is,

$$\mathcal{S}_t^- = \{(P, R) \in \tilde{\mathcal{S}}_t : z_t^n(P, R) \ge \bar v_t^n(P, R) \text{ for all } n \ge N\},$$
$$\mathcal{S}_t^+ = \{(P, R) \in \tilde{\mathcal{S}}_t : z_t^n(P, R) \le \bar v_t^n(P, R) \text{ for all } n \ge N\}.$$

Due to the definition of the projection operator, $\mathcal{S}_t^+$ is not empty, since $(P, R^{\mathrm{Min}}) \in \mathcal{S}_t^+$, where $R^{\mathrm{Min}}$ is the minimum asset level such that $(P, R) \in \mathcal{S}_t^*$. We can use a similar argument to show that $\mathcal{S}_t^-$ is not empty.

Finally, we impose the conditions that must be satisfied by the stepsizes $\alpha_t^n$ used to update the value function approximations. For $t < T$, the stepsizes satisfy the following conditions:

$$\alpha_t^n \in (0, 1] \quad \text{and} \quad \alpha_t^n \in \mathcal{F}_t^n, \qquad (5)$$

$$\sum_{n=0}^{\infty} (\alpha_t^n)^2 \le B < \infty \quad \text{a.s.}, \qquad (6)$$

where $B$ is a constant. These are standard conditions for stochastic approximation proofs of convergence. We also require that

$$\sum_{n=0}^{\infty} \alpha_t^n 1_{\{P_t^n = P^*,\, R_t^n = R^*\}} = \infty \quad \text{a.s.}, \qquad (7)$$

where $(P^*, R^*)$ is an accumulation point of the sequence $\{(P_t^n, R_t^n)\}_{n \ge 0}$. For example, the stepsize rule $\alpha_t^n = 1/N(P_t^n, R_t^n)$ satisfies conditions (5)–(7), where $N(P_t^n, R_t^n)$ is the number of visits to state $(P_t^n, R_t^n)$ up until iteration $n$.

For ease of notation in the next sections, we define a new stepsize sequence $\bar\alpha^n$ based on the previous one. For $t < T$ and $S_t \in \tilde{\mathcal{S}}_t$, let

$$\bar\alpha_t^n(P, R) = \alpha_t^n \left(1_{\{P = P_t^n,\, R = R_t^n\}} + 1_{\{P = P_t^n,\, R = R_t^n + 1\}}\right).$$

Note that while $\alpha_t^n$ is a scalar, $\bar\alpha_t^n$ is a vector with arguments $(P, R) \in \tilde{\mathcal{S}}_t$. Based on assumptions (5)–(7), we can trivially prove that $\bar\alpha_t^n(P, R) \in [0, 1]$, is $\mathcal{F}_t^n$-measurable and, for $(P^*, R^*) \in \mathcal{S}_t^*$,

$$\sum_{n=0}^{\infty} \left(\bar\alpha_t^n(P^*, R^*)\right)^2 \le B \quad \text{a.s.} \quad \text{and} \quad \sum_{n=0}^{\infty} \bar\alpha_t^n(P^*, R^*) = \infty \quad \text{a.s.} \qquad (8)$$

Furthermore, for all positive integers $N$,

$$\prod_{n=N}^{\infty} \left(1 - \bar\alpha_t^n(P^*, R^*)\right) = 0 \quad \text{a.s.} \qquad (9)$$

The proof of (9) follows directly from the fact that $\log(1 + x) \le x$. As a final remark, we can easily see that $\hat v_t^n(R)$, $z_t^n(P, R)$ and $\bar v_t^n(P, R)$ are bounded by 0 and $\max \hat r$ for all iterations $n$, because the initial approximations are bounded by 0 and $\max \hat r$ and the stepsizes are between 0 and 1.

5 Sketch of Convergence Analysis

We introduce the convergence results we want to prove and sketch the proofs, summarizing the steps that will be used. The full proofs are given in section 6. We are after two main convergence results. The first one is, for each $t < T$ and $(P^*, R^*) \in \mathcal{S}_t^*$,

$$\bar v_t^n(P^*, R^*) \to v_t(P^*, R^*) \quad \text{a.s.} \qquad (10)$$

The second result says that

$$x_t^* = \arg\max_{0 \le x \le M_t} \left(-P_t^* x + V_t(P_t^*, R_{t-1}^* + x)\right) \quad \text{a.s.}, \qquad (11)$$

where $(R_{t-1}^*, P_t^*, x_t^*)$ is an accumulation point of the sequence $\{(R_{t-1}^n, P_t^n, x_t^n)\}_{n \ge 0}$ generated by the algorithm and $V_t$ is the optimal value function. Equation (11) shows that indeed the algorithm has learned the optimal decision for all states that can be reached by an optimal policy. It is easy to see this implication. Starting with $t = 0$, we have by assumption that $R_{-1}^* = 0$, as $R_{-1}^n = 0$ for all iterations of the algorithm. Moreover, all prices in $\mathcal{P}_0$ are accumulation points of $\{P_0^n\}_{n \ge 0}$. Thus, (11) tells us that the accumulation points $x_0^*$ of the sequence $\{x_0^n\}$ along the iterations with initial price $P_0^*$ are in fact an optimal policy for period 0 when the price is $P_0^*$. This implies that all accumulation points $R_0^* = x_0^*$ of $\{R_0^n\}_{n \ge 0}$ are asset levels that can be reached by an optimal policy. By the same token, for $t = 1$, every price in $\mathcal{P}_1$ is an accumulation point of $\{P_1^n\}_{n \ge 0}$. Hence, the second result tells us that the accumulation points $x_1^*$ of the sequence $\{x_1^n\}$ along iterations with $(R_0^n, P_1^n) = (R_0^*, P_1^*)$ are indeed an optimal policy for period 1 when the asset level is $R_0^*$ and the price is $P_1^*$. As before, the accumulation points $R_1^* = R_0^* + x_1^*$ of $\{R_1^n\}_{n \ge 0}$ are asset levels that can be reached by an optimal policy. The same reasoning can be applied for $t = 2, \dots, T-1$.

The main idea to achieve (10) is to define, for each $t < T$ and $(P, R) \in \tilde{\mathcal{S}}_t$ (introduced in section 2), deterministic sequences $\{L_t^k(P, R)\}_{k \ge 0}$ and $\{U_t^k(P, R)\}_{k \ge 0}$ that are provably convergent to $v_t(P, R)$ and then prove, for all $k \ge 0$ and each $(P^*, R^*) \in \mathcal{S}_t^*$, that

$$L_t^k(P^*, R^*) \le \bar v_t^n(P^*, R^*) \le U_t^k(P^*, R^*) \qquad (12)$$

for all $n$ big enough. Establishing these inequalities is nontrivial, and draws on a proof technique in Bertsekas & Tsitsiklis (1996, Section 4.3.6) (B&T). In our proof, however, we have to handle two significant differences. First, our algorithm uses a pure exploitation strategy, whereas B&T assumed that all states are visited infinitely often. Second, we introduce a projection operator to maintain concavity of the approximation, which is not the case in B&T.
In order to establish (12), we need to present the dynamic programming operator $H$ associated with the asset acquisition problem and the deterministic bounding sequences

$\{L^k\}_{k \ge 0}$ and $\{U^k\}_{k \ge 0}$. It is noteworthy that these sequences are completely independent of the algorithm. We also define four stochastic sequences, $\{\bar s_-^n\}_{n \ge 0}$, $\{\bar s_+^n\}_{n \ge 0}$, $\{\bar l^n\}_{n \ge 0}$ and $\{\bar u^n\}_{n \ge 0}$, which do depend on the iterations of the algorithm. The first two sequences are called stochastic noise sequences and the last two sequences are called stochastic bounding sequences. All these elements are combined to obtain (12), and the concavity of the value functions plays a major role in the proofs. Roughly speaking, using properties of the operator $H$ and concavity, we prove

$$(HL^k)_t(P_t^n, R_t^n) \le (H\bar v^{n-1})_t(P_t^n, R_t^n) \le (HU^k)_t(P_t^n, R_t^n),$$
$$(HL^k)_t(P_t^n, R_t^n + 1) \le (H\bar v^{n-1})_t(P_t^n, R_t^n + 1) \le (HU^k)_t(P_t^n, R_t^n + 1).$$

These inequalities enable us to prove, for $n$ big enough, that

$$\bar v_t^{n-1}(P^*, R^*) \le \bar u_t^{n-1}(P^*, R^*) + \bar s_{t,-}^{n-1}(P^*, R^*), \quad \text{if } (P^*, R^*) \in \mathcal{S}_t^-,$$
$$\bar v_t^{n-1}(P^*, R^*) \ge \bar l_t^{n-1}(P^*, R^*) - \bar s_{t,+}^{n-1}(P^*, R^*), \quad \text{if } (P^*, R^*) \in \mathcal{S}_t^+.$$

Then, convergence to zero of the noise sequences, a convex combination property of the stochastic bounding sequences and concavity will give us

$$\bar v_t^{n-1}(P^*, R^*) \le U_t^k(P^*, R^*), \quad \text{if } (P^*, R^*) \in \mathcal{S}_t^-,$$
$$\bar v_t^{n-1}(P^*, R^*) \ge L_t^k(P^*, R^*), \quad \text{if } (P^*, R^*) \in \mathcal{S}_t^+.$$

Finally, concavity plays a role again and we obtain (12). The second convergence result, the optimality of the decisions with respect to the optimal value functions represented by (11), is a byproduct of the convergence of the approximate slopes. It is discussed in detail in the next section.

6 Convergence Analysis

We present formally the dynamic programming operator $H$ and the deterministic bounding sequences $\{U^k\}_{k \ge 0}$ and $\{L^k\}_{k \ge 0}$ in section 6.1. After that, in section 6.2, we state and prove our major

theorem, the almost sure convergence of the approximate slopes to the optimal slopes. As part of the proof, we define the stochastic sequences and state technical lemmas as they are needed. In order to focus on the main ideas of the theorem proof, the proofs of the lemmas are deferred to the appendix. Finally, in section 6.3 we prove the almost sure convergence to the optimal decisions. Since we deal with almost sure convergence proofs, throughout this section we only consider the sets in the sigma-algebra $\mathcal{F}$ that have strictly positive measure.

6.1 The Operator $H$ and the Bounding Sequences

We start by defining the dynamic programming operator $H$ that maps a vector $v$ into a new vector $Hv$ according to the formula

$$(Hv)_t(P, R) = \mathbb{E}\Big[\max\big(\min(P_{t+1}, v_{t+1}(P_{t+1}, R)), v_{t+1}(P_{t+1}, R + M_{t+1})\big) 1_{\{t < T-1\}} + \hat r 1_{\{R \le \hat D\}} 1_{\{t = T-1\}} \,\Big|\, P_t = P\Big] \qquad (13)$$

for $t = 0, \dots, T-1$ and $(P, R) \in \tilde{\mathcal{S}}_t$. The following properties can be easily proved:

1. $H$ has a unique fixed point $v$, where $v$ is the vector of slopes of the optimal value functions.
2. $H$ is monotone; that is, if $v \le \tilde v$ componentwise, then $Hv \le H\tilde v$.
3. $Hv - \eta e \le H(v - \eta e) \le H(v + \eta e) \le Hv + \eta e$, where $\eta$ is a positive constant and $e$ is a vector with all components equal to 1. The inequalities are considered componentwise.
4. $H$ is continuous.

We introduce the deterministic bounding sequences $\{U^k\}_{k \ge 0}$ and $\{L^k\}_{k \ge 0}$ and establish three important properties. When we refer to the sequence $\{U^k\}_{k \ge 0}$ without mentioning the time index $t$ and the state $(P, R) \in \tilde{\mathcal{S}}_t$, we are referring to the family of sequences

$\{U_t^k(P, R)\}_{k \ge 0}$, one for each time $t < T$ and state $(P, R)$. The same is true for the other deterministic sequence $\{L^k\}_{k \ge 0}$. Let

$$U^0 = v + \max \hat r\, e \quad \text{and} \quad U^{k+1} = \frac{U^k + HU^k}{2}, \quad k \ge 0, \qquad (14)$$
$$L^0 = v - \max \hat r\, e \quad \text{and} \quad L^{k+1} = \frac{L^k + HL^k}{2}, \quad k \ge 0. \qquad (15)$$

Note that, just like the slopes $v$, for all $k \ge 0$, $L^k$ and $U^k$ are both monotone decreasing in the asset dimension.

Lemma 1. The sequences $\{U^k\}_{k \ge 0}$ and $\{L^k\}_{k \ge 0}$ satisfy

$$HU^k \le U^{k+1} \le U^k, \qquad (16)$$
$$HL^k \ge L^{k+1} \ge L^k, \qquad (17)$$

and both converge to $v$. Furthermore, $U^k > v$ and $L^k < v$ for all $k \ge 0$.

Proof. The proof of inequalities (16) and (17), as well as the proof of convergence of the sequences to $v$, is given in Bertsekas & Tsitsiklis (1996, Lemmas 4.5 and 4.6). They just require the first four properties of the operator $H$. In order to show that $L^k < v$ for all $k \ge 0$, we begin by analyzing $L_{T-1}^k$. By definition of $H$, for all $(P, R) \in \tilde{\mathcal{S}}_{T-1}$, $(HL^k)_{T-1}(P, R) = v_{T-1}(P, R)$ for all $k \ge 0$. We also have that $L_{T-1}^0(P, R) = v_{T-1}(P, R) - \max \hat r < v_{T-1}(P, R)$. Thus, $L_{T-1}^1(P, R) < v_{T-1}(P, R)$, and an induction argument on $k$ shows that $L_{T-1}^k(P, R) < v_{T-1}(P, R)$ for all $k \ge 0$. Now, assume that $L_{t+1}^k(P, R) < v_{t+1}(P, R)$ for all $k \ge 0$ and $(P, R) \in \tilde{\mathcal{S}}_{t+1}$. We prove $(HL^k)_t(P, R) \le v_t(P, R)$, for $t = 0, \dots, T-2$. We have, for $(P, R) \in \tilde{\mathcal{S}}_t$,

$$(HL^k)_t(P, R) = \mathbb{E}\left[\max\left(\min(P_{t+1}, L_{t+1}^k(P_{t+1}, R)), L_{t+1}^k(P_{t+1}, R + M_{t+1})\right) \,\Big|\, P_t = P\right]$$
$$\le \mathbb{E}\left[\max\left(\min(P_{t+1}, v_{t+1}(P_{t+1}, R)), v_{t+1}(P_{t+1}, R + M_{t+1})\right) \,\Big|\, P_t = P\right] = v_t(P, R).$$

Furthermore, $L_t^0(P, R) = v_t(P, R) - \max \hat r < v_t(P, R)$, which implies $L_t^1(P, R) < v_t(P, R)$. Again, an induction argument on $k$ shows that $L_t^k(P, R) < v_t(P, R)$ for all $k \ge 0$. The proof for $U^k$ follows by a symmetrical argument.
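The averaging recursions (14)-(15) are easy to exercise numerically. Below is a sketch on a tiny hypothetical two-period instance (deterministic price, point-mass demand; the instance is illustrative and not from the paper's experiments). For $T = 2$ the $t = T-1$ component of $Hv$ is constant in $v$, so applying $H$ twice from any start yields the fixed point, after which the bounding sequences collapse onto it:

```python
import numpy as np

# Hypothetical instance: price p = 5 at t = 1, demand Dhat = 1, reward
# rhat = 10, at most M1 = 1 extra unit at t = 1, asset levels R = 1..B.
p, rhat, Dhat, M1, B = 5.0, 10.0, 1, 1, 3

def H(v):
    """Operator (13) for this instance; v = (v0, v1), slopes indexed by
    R = 1..B stored at positions 0..B-1. The R + M1 index is clamped at
    B, a boundary simplification for illustration."""
    v0, v1 = v
    Hv1 = np.array([rhat if R <= Dhat else 0.0 for R in range(1, B + 1)])
    Hv0 = np.array([max(min(p, v1[R - 1]), v1[min(R + M1, B) - 1])
                    for R in range(1, B + 1)])
    return Hv0, Hv1

vstar = H(H((np.zeros(B), np.zeros(B))))  # fixed point of H when T = 2

# Bounding sequences (14)-(15): start max(rhat) above/below v, then average.
U = tuple(c + rhat for c in vstar)
L = tuple(c - rhat for c in vstar)
for k in range(60):
    HU, HL = H(U), H(L)
    U = tuple((c + h) / 2 for c, h in zip(U, HU))
    L = tuple((c + h) / 2 for c, h in zip(L, HL))
```

Consistent with Lemma 1, each $U^k$ stays above the fixed point and each $L^k$ below it while both converge, and the monotone decrease in the asset dimension is preserved.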

6.2 Convergence of $\bar v_t^n(P^*, R^*)$

In this section, we prove almost sure convergence of the slopes of the approximate functions to the slopes of the optimal ones for states in $\mathcal{S}_t^*$. In the process, we present the noise and the bounding stochastic sequences. We also introduce three technical lemmas. Their proofs are given in the appendix. We assume, for all possible states $(P, R)$, that $v_T(P, R) = U_T^k(P, R) = L_T^k(P, R) = \bar v_T^n(P, R) = 0$ for integers $k \ge 0$ and iterations $n \ge 0$.

Theorem 1. Assume the stepsize conditions (5)-(7). Then, for all $k \ge 0$ and $t = 0, \dots, T$, there exists a positive integer $N_t^{*,k}$ such that, for all $n \ge N_t^{*,k}$ and states $(P^*, R^*) \in \mathcal{S}_t^*$,

$$L_t^k(P^*, R^*) \le \bar v_t^{n-1}(P^*, R^*) \le U_t^k(P^*, R^*). \qquad (18)$$

Therefore,

$$\bar v_t^n(P^*, R^*) \to v_t(P^*, R^*) \quad \text{a.s.} \qquad (19)$$

Proof. The proof of the theorem is by backward induction on $t$. The base case is $t = T$. As $v_T(P, R) = U_T^k(P, R) = L_T^k(P, R) = \bar v_T^n(P, R) = 0$ for all states $(P, R)$ and integers $k \ge 0$ and $n \ge 0$, the inequalities in (18) are trivial for $t = T$. Thus, we can pick, for example, $N_T^{*,k} = N$, where $N$, as defined in section 4, is a random variable that denotes when an iteration of the algorithm is large enough for convergence analysis purposes. The backward induction proof is completed when we prove (19) for a general $t$, $t = 0, \dots, T-1$.

Given the induction hypothesis for $t+1$, the proof for time period $t$ is divided into two parts. We prove for all $k \ge 0$ that there exists an integer $N_t^k$ such that, for all $n \ge N_t^k$,

$$\bar v_t^{n-1}(P^*, R^*) \le U_t^k(P^*, R^*), \quad \text{if } (P^*, R^*) \in \mathcal{S}_t^-, \qquad (20)$$
$$\bar v_t^{n-1}(P^*, R^*) \ge L_t^k(P^*, R^*), \quad \text{if } (P^*, R^*) \in \mathcal{S}_t^+. \qquad (21)$$

This is the first part. Its proof is by induction on $k$. Note that this part only applies to states in the sets $\mathcal{S}_t^-$ and $\mathcal{S}_t^+$. Then, again for $t$, we take on the second part, which proves the existence of an integer $N_t^{*,k}$ such that (18) is true for all states in $\mathcal{S}_t^*$ and iterations $n \ge N_t^{*,k}$. Note that the second part takes care of the states in $\mathcal{S}_t^*$ not covered by the first

part. Consequently, (19) is true for $t$. Figure 3 shows the relationship between the sets of states.

[Figure 3: Relationship between the sets of states, nested as follows: $\mathcal{S}$ (state space) $\supseteq$ $\tilde{\mathcal{S}}$ (state space minus $(P, 0)$ pairs) $\supseteq$ $\mathcal{S}^*$ (accumulation points $(P^*, R^*)$ or $(P^*, R^*+1)$ of $\{(P_t^n, R_t^n)\}$) $\supseteq$ $\mathcal{S}^-$ (corresponding slope is increased finitely often due to the projection operation) and $\mathcal{S}^+$ (corresponding slope is decreased finitely often due to the projection operation).]

We start the backward induction on $t$. Remember that the base case $t = T$ is trivial and we pick $N_T^{*,k} = N$. We also pick, for completeness, $N_T^k = N$.

Induction Hypothesis: Given $t = 0, \dots, T-1$, assume, for $t+1$ and all $k \ge 0$, the existence of integers $N_{t+1}^k$ and $N_{t+1}^{*,k}$ such that, for all $n \ge N_{t+1}^k$, (20) and (21) are true, and, for all $n \ge N_{t+1}^{*,k}$, the inequalities in (18) hold true for all states $(P^*, R^*) \in \mathcal{S}_{t+1}^*$.

Part 1: We now prove, for any $k$, the existence of an integer $N_t^k$ such that, for $n \ge N_t^k$, inequalities (20) and (21) are true. For a particular time $t$, the proof is by forward induction on $k$. We start with $k = 0$. For every $(P, R) \in \tilde{\mathcal{S}}_t$, $0 \le v_t(P, R) \le \max \hat r$ implies that, by definition, $U_t^0(P, R) \ge \max \hat r$ and $L_t^0(P, R) \le 0$. Therefore, (20) and (21) are satisfied for all $n \ge 1$, since we know that $\bar v_t^{n-1}$ is bounded by 0 and $\max \hat r$ for all iterations. Thus, $N_t^0 = \max(1, N_{t+1}^{*,0}) = N_{t+1}^{*,0}$.

The induction hypothesis on $k$ assumes that there exists $N_t^k$ such that, for all $n \ge N_t^k$, (20) and (21) are true. Note that we can always make $N_t^k$ larger than $N_{t+1}^{*,k}$, thus we assume that $N_t^k \ge N_{t+1}^{*,k}$. The next step is the proof for $k+1$. Before we move on, we define the variables $\hat s_{t+1}^{n,-}$ and $\hat s_{t+1}^{n,+}$ to be the error incurred by

observing a sample slope. For R = 1, ..., B,

ŝ^{n-}_{t+1}(R) = ˆv^n_{t+1}(R) - (H v̄^{n-1}_t)(P^n_t, R)  and  ŝ^{n+}_{t+1}(R) = -ŝ^{n-}_{t+1}(R).

Using ŝ^{n-}_{t+1} and ŝ^{n+}_{t+1}, we also define the stochastic noise sequences {s̄^{n-}_t}_{n≥0} and {s̄^{n+}_t}_{n≥0}. For (P, R) ∈ S̃, s̄^{n-}_t(P, R) = 0 and s̄^{n+}_t(P, R) = 0 for n < N^k_t, and, for n ≥ N^k_t,

s̄^{n-}_t(P, R) = max(0, (1 - ᾱ^n_t(P, R)) s̄^{n-1,-}_t(P, R) + ᾱ^n_t(P, R) ŝ^{n-}_{t+1}(R^n_t 1_{R ≤ R^n_t} + (R^n_t + 1) 1_{R > R^n_t}))
s̄^{n+}_t(P, R) = max(0, (1 - ᾱ^n_t(P, R)) s̄^{n-1,+}_t(P, R) + ᾱ^n_t(P, R) ŝ^{n+}_{t+1}(R^n_t 1_{R ≤ R^n_t} + (R^n_t + 1) 1_{R > R^n_t})).

The sample slopes are defined in a way such that

IE[ŝ^{n-}_{t+1}(R) | F^n] = 0. (22)

This conditional expectation is called the unbiasedness property. This property, together with the martingale convergence theorem and the boundedness of both the sample slopes and the approximate slopes, is crucial for proving that the noise introduced by the observation of the sample slopes, which replace the observation of true expectations, goes to zero as the number of iterations of the algorithm goes to infinity, as is stated in the next lemma.

Lemma 2. For (P, R*) ∈ S*_t,

{s̄^{n-}_t(P, R*)}_{n≥0} → 0 and {s̄^{n+}_t(P, R*)}_{n≥0} → 0 a.s. (23)

Proof of lemma 2. Given in the appendix.

Using the convention that the minimum of an empty set is +∞, let

δ^k_L = min{ ((HL^k)_t(P, R*) - L^k_t(P, R*)) / 4 : (P, R*) ∈ S^+_t, (HL^k)_t(P, R*) > L^k_t(P, R*) }.

If δ^k_L < +∞ we define an integer N_L ≥ N^k_t to be such that

∏_{m=N^k_t}^{n-1} (1 - ᾱ^m_t(P, R*)) ≤ 1/4  and  s̄^{n-1,+}_t(P, R*) ≤ δ^k_L, (24)

for all n ≥ N_L and states (P, R*) ∈ S^+_t. Such an N_L exists because both (9) and (23) are true. If δ^k_L = +∞, then, for all states (P, R*) ∈ S^+_t, (HL^k)_t(P, R*) = L^k_t(P, R*), since (17) tells us that HL^k ≥ L^k. Thus, L^{k+1}_t(P, R*) = L^k_t(P, R*) and we define the integer N_L to be equal to N^k_t.

We can apply symmetric reasoning to determine δ^k_U and N_U. We just need to consider the deterministic bounding sequence {U^k}_{k≥0}, the set S^-_t and the noise sequence {s̄^{n-}_t}_{n≥0} instead of {L^k}_{k≥0}, S^+_t and {s̄^{n+}_t}_{n≥0}, respectively. Finally, let N^{k+1}_t = max(N_L, N_U, N^{*,k+1}_{t+1}).

First, pick a state (P, R*) ∈ S^+_t. If L^{k+1}_t(P, R*) = L^k_t(P, R*), then inequality L^{k+1}_t(P, R*) ≤ v̄^{n-1}_t(P, R*) follows from the induction hypothesis. We therefore concentrate on the case where L^{k+1}_t(P, R*) > L^k_t(P, R*).

First, we define the stochastic bounding sequences {l̄^n_t}_{n≥0} and {ū^n_t}_{n≥0}. For each (P, R) ∈ S̃, we have l̄^n_t(P, R) = L^k_t(P, R) and ū^n_t(P, R) = U^k_t(P, R) for n < N^k_t, and, for n ≥ N^k_t,

l̄^n_t(P, R) = (1 - ᾱ^n_t(P, R)) l̄^{n-1}_t(P, R) + ᾱ^n_t(P, R)(HL^k)_t(P, R)
ū^n_t(P, R) = (1 - ᾱ^n_t(P, R)) ū^{n-1}_t(P, R) + ᾱ^n_t(P, R)(HU^k)_t(P, R).

A simple inductive argument proves that ū^n_t(P, R) is a convex combination of U^k_t(P, R) and (HU^k)_t(P, R), while l̄^n_t(P, R) is a convex combination of L^k_t(P, R) and (HL^k)_t(P, R). Therefore we can write, with b^{n-1} = ∏_{m=N^k_t}^{n-1} (1 - ᾱ^m_t(P, R*)),

l̄^{n-1}_t(P, R*) = b^{n-1} L^k_t(P, R*) + (1 - b^{n-1})(HL^k)_t(P, R*).

For n ≥ N^{k+1}_t ≥ N_L, we have b^{n-1} ≤ 1/4. Moreover, L^k_t(P, R*) ≤ (HL^k)_t(P, R*). Thus, using (15) and the definition of δ^k_L, we obtain

l̄^{n-1}_t(P, R*) ≥ (1/4) L^k_t(P, R*) + (3/4)(HL^k)_t(P, R*)
= (1/2) L^k_t(P, R*) + (1/2)(HL^k)_t(P, R*) + (1/4)((HL^k)_t(P, R*) - L^k_t(P, R*))
≥ L^{k+1}_t(P, R*) + δ^k_L. (25)
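The recursions defining l̄^n_t, ū^n_t and the noise sequences are all instances of a single stochastic-approximation smoothing step. The sketch below is an illustration only, with hypothetical names (`prev`, `sample`, `alpha` are not the paper's notation):

```python
def smooth(prev, sample, alpha, clip_at_zero=False):
    """One smoothing step: (1 - alpha) * prev + alpha * sample.

    The bounding sequences l-bar and u-bar use the step as-is; the noise
    sequences additionally clip the result at zero, as in their definition.
    """
    val = (1.0 - alpha) * prev + alpha * sample
    return max(0.0, val) if clip_at_zero else val
```

Iterated with a fixed `sample`, the result is a convex combination of the starting value and that sample, with weight on the start equal to the product of the (1 - alpha) factors; this is exactly the property exploited in the bound above.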

Again for n ≥ N^{k+1}_t ≥ N_L, the following lemma is used to show that

v̄^{n-1}_t(P, R*) ≥ l̄^{n-1}_t(P, R*) - s̄^{n-1,+}_t(P, R*). (26)

Lemma 3. For n ≥ N^k_t,

(HL^k)_t(P^n_t, R^n_t) ≤ (H v̄^{n-1}_t)(P^n_t, R^n_t) ≤ (HU^k)_t(P^n_t, R^n_t), if R^n_t > 0,
(HL^k)_t(P^n_t, R^n_t + 1) ≤ (H v̄^{n-1}_t)(P^n_t, R^n_t + 1) ≤ (HU^k)_t(P^n_t, R^n_t + 1), if R^n_t < M.

Moreover,

v̄^{n-1}_t(P, R*) ≤ ū^{n-1}_t(P, R*) + s̄^{n-1,-}_t(P, R*), if (P, R*) ∈ S^-_t,
v̄^{n-1}_t(P, R*) ≥ l̄^{n-1}_t(P, R*) - s̄^{n-1,+}_t(P, R*), if (P, R*) ∈ S^+_t.

Proof of lemma 3. Given in the appendix.

Combining (25) and (26), we obtain, for all n ≥ N^{k+1}_t ≥ N_L,

v̄^{n-1}_t(P, R*) ≥ L^{k+1}_t(P, R*) + δ^k_L - s̄^{n-1,+}_t(P, R*) ≥ L^{k+1}_t(P, R*) + δ^k_L - δ^k_L = L^{k+1}_t(P, R*),

where the last inequality follows from (24).

To finish the proof of part 1, we pick a state (P, R*) ∈ S^-_t. The reasoning for U^{k+1}_t(P, R*) is symmetrical to that for L^{k+1}_t(P, R*), which completes our induction. Thus, we have proved that, for all k ≥ 0, there exists N^k_t such that (20) and (21) hold for all n ≥ N^k_t. This concludes the first part of the proof.

Part 2: In this part, we take care of the states (P, R*) ∈ S*_t \ (S^+_t ∪ S^-_t), because if (P, R*) ∈ S^+_t (S^-_t), we have already proved in part 1 that, for all k ≥ 0, there exists N^k_t such that if n ≥ N^k_t, then v̄^{n-1}_t(P, R*) ≥ L^k_t(P, R*) (≤ U^k_t(P, R*)). In contrast to part 1, the proof technique here is not by forward induction on k.

A discussion about the projection operation is in order, as this part of the proof is all about states for which the projection operation decreased or increased the corresponding approximate slopes infinitely often. If for all (P, R*) ∈ S*_t the corresponding optimal slopes v_t(P, R*) are distinct, then S*_t = S^+_t = S^-_t and Part 2 is not necessary. However, this fact is not verifiable. Figure 4 illustrates a typical situation where S*_t \ (S^+_t ∪ S^-_t) ≠ ∅.

[Figure 4: Optimal slopes that can lead to S*_t \ (S^+_t ∪ S^-_t) ≠ ∅; in the picture, N^+(P, R* - 2) = N^+(P, R* - 1) = N^+(P, R*) = ∞.]

An important property of the projection operator is that all the slopes to the left of R^n_t changed by the projection operation are increased to be equal to the new slope at R^n_t. Similarly, all the slopes to the right of R^n_t + 1 changed by the projection operation are decreased to be equal to the slope at R^n_t + 1 (see figure 2c).

There is another interesting property that is necessary for the proof of Part 2. Let (P, R*) ∈ S*_t \ S^+_t. We argued in section 4 that the state (P, R^Min) is an element of S^+_t, where R^Min is the minimum asset level of the set {R : (P, R) ∈ S*_t}. Therefore, the state (P, R*+), where R*+ is the maximum asset level smaller than R* such that (P, R*+) ∈ S^+_t, is well defined. We show next that for all asset levels R between R*+ and R* (inclusive), N^+(P, R) is also equal to infinity.

By definition of the set S^+_t, N^+(P, R*) = ∞. If (P, R* - 1) = (P, R*+) we are done. Otherwise, we have to consider two cases. First, if (P, R* - 1) ∈ S*_t, then N^+(P, R* - 1) is infinite either by the definition of the set S^+_t or from the fact that, in this case, (P, R* - 1) ∈ S*_t \ S^+_t. Second, if (P, R* - 1) ∉ S*_t, then the corresponding slope is never updated due to a direct observation of sample slopes

for n ≥ N. Moreover, every time the slope of (P, R*) is decreased due to a projection (which is coming from the left), the slope of (P, R* - 1) is decreased as well. Therefore, N^+(P, R*) ∩ {n ≥ N} ⊆ N^+(P, R* - 1) ∩ {n ≥ N}, implying that N^+(P, R* - 1) is infinite. We then apply the same reasoning for states (P, R* - 2), (P, R* - 3), ..., until we reach state (P, R*+). A symmetrical argument handles the states (P, R*) ∈ S*_t \ S^-_t.

With these properties in mind we go back to the proof of Part 2. We introduce the lemma that is the key element for the proof.

Lemma 4. If, for all k ≥ 0, there exists an integer N^k(P, R) such that v̄^{n-1}_t(P, R) ≥ L^k_t(P, R) for all n ≥ N^k(P, R), and N^+(P, R + 1) is infinite, then, for all k ≥ 0, there exists an integer N^k(P, R + 1) such that v̄^{n-1}_t(P, R + 1) ≥ L^k_t(P, R + 1) for all n ≥ N^k(P, R + 1). Similarly, if, for all k ≥ 0, there exists an integer N^k(P, R) such that v̄^{n-1}_t(P, R) ≤ U^k_t(P, R) for all n ≥ N^k(P, R), and N^-(P, R - 1) is infinite, then, for all k ≥ 0, there exists an integer N^k(P, R - 1) such that v̄^{n-1}_t(P, R - 1) ≤ U^k_t(P, R - 1) for all n ≥ N^k(P, R - 1).

Proof of lemma 4. Given in the appendix.

Pick k ≥ 0, a state (P, R*) ∈ S*_t \ S^+_t, and the state (P, R*+) ∈ S^+_t introduced in the projection discussion. Note that we can apply lemma 4 considering states (P, R*+) and (P, R*+ + 1) in order to obtain, for all k ≥ 0, an integer N^k(P, R*+ + 1) such that v̄^{n-1}_t(P, R*+ + 1) ≥ L^k_t(P, R*+ + 1) for all n ≥ N^k(P, R*+ + 1). After that, we make use of lemma 4 again, this time for states (P, R*+ + 1) and (P, R*+ + 2). Note that the first application of lemma 4 gave us the integer N^k(P, R*+ + 1), necessary to fulfill the conditions of this second usage of the lemma. We repeat the same reasoning, applying lemma 4 successively to the pairs of states (P, R*+ + 2) and (P, R*+ + 3), (P, R*+ + 3) and (P, R*+ + 4), ..., (P, R* - 1) and (P, R*). In the end, we obtain, for each k ≥ 0, an integer N^k(P, R*), such that v̄^{n-1}_t(P, R*) ≥ L^k_t(P, R*) for all n ≥ N^k(P, R*). Figure 5 illustrates this process.
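The two projection properties discussed above can be sketched in code. This is a minimal illustration, not the paper's implementation; `slopes` is a hypothetical list indexed by asset level, and we assume the stochastic update has just changed the entries at positions `r` and `r + 1`:

```python
def project_concave(slopes, r):
    """Restore non-increasing slopes after an update at levels r and r+1.

    Violating slopes to the left of r are raised to the new slope at r;
    violating slopes to the right of r+1 are lowered to the slope at r+1.
    """
    projected = list(slopes)
    # Raise slopes left of r that fell below the new slope at r.
    for i in range(r):
        if projected[i] < projected[r]:
            projected[i] = projected[r]
    # Lower slopes right of r+1 that rose above the slope at r+1.
    for i in range(r + 2, len(projected)):
        if projected[i] > projected[r + 1]:
            projected[i] = projected[r + 1]
    return projected
```

Provided the update leaves slopes[r] ≥ slopes[r + 1], the returned vector is non-increasing, i.e. the piecewise linear value function approximation stays concave.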

[Figure 5: Successive applications of lemma 4, moving from (P, R*+) through (P, R*+ + 1), (P, R*+ + 2), ..., up to (P, R*).]

Similarly, pick (P, R*) ∈ S*_t \ S^-_t. By successive applications of the second part of lemma 4 we obtain, for each k ≥ 0, an integer N^k(P, R*) such that v̄^{n-1}_t(P, R*) ≤ U^k_t(P, R*) for all n ≥ N^k(P, R*).

Finally, if we consider N^{*,k}_t = max(N^k_t, max_{(P, R*) ∈ S*_t \ (S^+_t ∪ S^-_t)} N^k(P, R*)), then (18) is true for all states (P_t, R_t) ∈ S*_t and n ≥ N^{*,k}_t. Consequently, (19) is also true for all states (P_t, R_t) ∈ S*_t.

6.3 Optimality of the Decisions

We are ready to prove (11), the second convergence result.

Theorem 2. For t = 0, ..., T-1, let (v*_t, R*_{t-1}, P*_t, x*_t) be an accumulation point of the sequence {(v̄^{n-1}_t, R^n_{t-1}, P^n_t, x^n_t)}_{n≥1} generated by the algorithm. Assume all conditions of theorem 1 are satisfied. Then, with probability one, x*_t is an optimal solution of

max_{0 ≤ x ≤ M} -P*_t x + V_t(P*_t, R*_{t-1} + x). (27)

Proof. At each iteration n and time t of the algorithm, the decision x^n_t is optimal with respect to the sample price P^n_t, the current asset level R^n_{t-1} and the value function approximation for price P^n_t, which is piecewise linear with integer breakpoints and is represented by its slopes

v̄^{n-1}_t(P^n_t, 1), ..., v̄^{n-1}_t(P^n_t, B). Therefore, it follows that

-P^n_t + v̄^{n-1}_t(P^n_t, R^n_{t-1} + x^n_t) > 0 and -P^n_t + v̄^{n-1}_t(P^n_t, R^n_{t-1} + x^n_t + 1) ≤ 0.

Then, by passing to the limit, we can conclude that each accumulation point (v*_t, R*_{t-1}, P*_t, x*_t) of the sequence {(v̄^{n-1}_t, R^n_{t-1}, P^n_t, x^n_t)}_{n≥1} satisfies

-P*_t + v*_t(P*_t, R*_{t-1} + x*_t) > 0 and -P*_t + v*_t(P*_t, R*_{t-1} + x*_t + 1) ≤ 0. (28)

Since states (P*_t, R*_{t-1} + x*_t) and (P*_t, R*_{t-1} + x*_t + 1) are elements of S*_t, it follows from theorem 1 that

v*_t(P*_t, R*_{t-1} + x*_t) = v_t(P*_t, R*_{t-1} + x*_t) a.s.

and

v*_t(P*_t, R*_{t-1} + x*_t + 1) = v_t(P*_t, R*_{t-1} + x*_t + 1) a.s.

This fact combined with (28) is sufficient to conclude the proof.

7 Experimental Results

The purpose of this section is to establish the computational benefits relative to other Monte Carlo-based algorithms as well as classical backward dynamic programming (where we have to assume the distribution is known). We start by giving a brief description of each approach to which we compare our algorithm.

In a batch-mode Monte-Carlo-based value iteration algorithm (Batch), at each iteration n, once a sample for the price process, reward and demand is gathered, sample slopes at all possible asset levels R are observed and used to update the corresponding slopes for the observed sampled prices P^n = (P^n_0, ..., P^n_{T-1}). That is, steps 2c and 2d of the algorithm described in figure 1 are replaced by

STEP 2c: Observe ˆv^n_{t+1}(R) according to (3) for all R such that (P^n_t, R) ∈ S̃.

STEP 2d: For (P, R) ∈ S̃,

z^n_t(P, R) = [(1 - α^n_t) v̄^{n-1}_t(P, R) + α^n_t ˆv^n_{t+1}(R)] 1_{P = P^n_t} + v̄^{n-1}_t(P, R) 1_{P ≠ P^n_t}.

One can argue that such a batch algorithm would make better use of the information in the sample realizations. Applying this method, which is synchronous in the sense that all the slopes for the observed prices are updated at once, we wish to see how it compares to an asynchronous approach (our algorithm). Our method is asynchronous in the sense that only two slopes are updated at each iteration n and time t (it can be more if a violation of concavity occurs).

A Real Time Dynamic Programming (RTDP) approach (Barto et al. (1995)) assumes that the distribution of the random variables is known, which is not the case for either our algorithm or the Batch one. We could consider an Approximate RTDP (ARTDP) approach (Barto et al. (1995)), which starts with an initial estimate of a distribution and then updates it during the iterations of the algorithm. However, ARTDP is at most as good as RTDP, so we assume the distribution is known and implement an RTDP method. Instead of using the sample slope given by (3), the RTDP algorithm uses

ˆv^n_{t+1}(R) = IE[ max(min(P_{t+1}, v̄^n_{t+1}(P_{t+1}, R)), v̄^n_{t+1}(P_{t+1}, M_{t+1})) 1_{t < T-1} + ˆr^n 1_{R ≤ ˆD^n} 1_{t = T-1} | P_t = P^n_t ]. (29)

That is, step 2c of the algorithm described in figure 1 is replaced by

STEP 2c: Observe ˆv^n_{t+1}(R^n_t) and ˆv^n_{t+1}(R^n_t + 1) according to (29).

Due to its known-distribution feature, Propositions 5.3 and 5.4 in Bertsekas & Tsitsiklis (1996) can be applied to obtain almost sure convergence results similar to ours. When we compare the computational results of this method to the computational results of our approach, we are measuring the tradeoff between the extra information given by the expectation versus the time spent to do this operation.

A very popular approach in the approximate dynamic programming literature is Q-learning (Abounadi et al. (2002), Rummery & Niranjan (1994), Even-Dar & Mansour (2004), Cybenko et al.

(1997), Tsitsiklis (1994), Duff (1995)), which, like our algorithm, is also often used as a model-free algorithmic strategy. A standard Q-learning algorithm (Watkins & Dayan (1992)) stores all possible state-action pairs and a proof of convergence requires all the corresponding Q-values to be sampled infinitely many times. One can argue that Q-values share the same characteristics as the value function around a post-decision state (which is the state considered in this paper), that is, the optimization is inside the expectation. Computationally, however, the two approaches are quite different. Instead of S̃, the state space in Q-learning would be S̃ × X, where X is our action space. Moreover, in Q-learning, first a decision is taken based only on the Q-values and then the realization of the random information is observed. In our case, first the realization of the random information is observed, then a decision is taken, which of course is also dependent on the value function for the future.

The size of the Q-learning state space makes this approach impossible to implement for this problem. The program would have taken an unaffordable amount of time and virtual memory to run. Therefore, instead of implementing a Q-learning approach, we propose and implement an algorithm that only stores the state after the decision is made and samples all possible actions infinitely often in a uniform way. This implies that all states are sampled infinitely often as well. This algorithm should be at least as good as standard Q-learning, due to a smaller state space. Moreover, theorem 1 can be used to show that it converges almost surely to the optimal slopes. Using this approach, step 2a of the algorithm described in figure 1 is replaced by

STEP 2a: Sample x^n_t according to a discrete uniform distribution between 0 and M.

With this algorithm, we try to infer how a pure exploitation scheme (our approach) compares to a pure exploration one. Experimental results (omitted) showed that this algorithm worked terribly, implying that using pure exploitation instead of pure exploration pays off.
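For contrast with the uniform-sampling STEP 2a above, the pure-exploitation decision itself is cheap to compute. The sketch below uses assumed names (`slopes[R]` standing in for the approximate slope v̄^{n-1}_t(P^n_t, R), non-increasing in R): the greedy order quantity is the largest x ≤ M for which the marginal unit still has approximate value above the current price, matching the optimality conditions used in the proof of theorem 2.

```python
def greedy_order(price, r_prev, slopes, m_max):
    """Largest x in {0, ..., m_max} whose last unit has slope above price.

    slopes: non-increasing list; slopes[R] approximates the marginal value
    of holding the R-th unit.  r_prev is the pre-decision asset level.
    """
    x = 0
    # Buy one more unit while its marginal value exceeds its purchase price.
    while (x < m_max and r_prev + x + 1 < len(slopes)
           and slopes[r_prev + x + 1] > price):
        x += 1
    return x
```

Because the approximation is concave, this one-pass rule recovers exactly the x satisfying -P + v̄(P, R + x) > 0 and -P + v̄(P, R + x + 1) ≤ 0.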
The instances considered in the experiments are described in table 1. Problems were randomly generated using different distributions for the rewards ˆr and initial prices P_0.

Moreover, both discrete uniform (DiscU) and Poisson demand distributions with different parameters were used. We also created different price processes, namely random walk (RW), mean reversion (MR) and geometric Brownian motion (GBM), all of which are described below. Even though all price processes are continuous, we use a discretization increment of 0.1 for all instances. It is important to emphasize that when using our algorithm, the discretization only occurs when representing the value functions - the prices in the simulated paths are still continuous. For all instances, the number of time periods considered is 10. That is, the random demand ˆD and the random reward ˆr are observed at T = 10. Furthermore, the upper bound on the decision quantity x_t, for t = 0, ..., T-1, is set to M_t = M = 400. Table 1 also conveys the size of the state space of each instance.

Ins.  State Space  Initial Price  Reward ˆr               Demand ˆD        Price Proc.
1     -            Constant 20    U(50, 60)               DiscU(180, 250)  RW
2     -            Constant 20    U(50, 60)               Poisson(200)     RW
3     -            ·U(1, 12)      P_{T-1}·U(1.03, 1.15)   Poisson(250)     MR
4     -            ·U(1, 12)      P_{T-1}·U(1.03, 1.15)   DiscU(180, 220)  MR
5     -            Constant 40    Constant 25             Poisson(300)     GBM
6     -            Constant 45    Constant 15             DiscU(225, 375)  GBM

Table 1: Instances description - T = 10, M_t = M = 400, discretization = 0.1.

Next, we give the details of the different price processes. The random walk price process is given by P_t = P_{t-1} + ˆP_t, where the price increment ˆP_t has the normal distribution with mean µ = 0.02 and standard deviation σ = 1.5. The mean reversion price process is given by P_t = P_{t-1} + ˆP_t + 0.5(B_t - P_{t-1}), where ˆP_t is uniformly distributed between 0.9 and 1.2, B_0 = 1.7 Ū(1, 12) and B_t = B_{t-1} Ū(0.9, 1.2), where Ū is the mean of the corresponding uniform distribution. Finally, the geometric Brownian motion process is given by P_t = P_{t-1} e^{ˆP_t}, where ˆP_t is normally distributed with mean µ and standard deviation σ. It is easy to see that when the random walk and the geometric Brownian motion are

considered, the slopes v_t(P, R) given by (2) are monotone increasing in the price dimension. Therefore, for all the different methods and instances 1, 2, 5, 6, this property is going to be imposed in order to speed up the rate of convergence.

We now describe how the experiments were conducted, then present and analyze the results. Knowing the underlying distributions, as described in table 1, we computed the optimal policy using a classical backward dynamic programming technique, assuming that prices were discretized to the nearest 0.01. We next randomly generated 50 sets Ω_i, i = 1, ..., 50, where each set Ω_i consisted of sample paths. Each approximation algorithm was trained using the sample paths in Ω_i, generating a policy. This exercise was repeated 50 times, and the quality of the solution is based on an average over the 50 policies obtained using the 50 different training datasets.

In order to evaluate the policies, for each instance, we randomly generated a set Ω̂ of 800 sample paths. For each ω ∈ Ω̂, let X_t(ω) be the decision of the optimal policy (computed exactly using backward dynamic programming) at time t for sample path ω and let ˆX^{n,i}_t(ω) be the decision of the approximate policy i = 1, ..., 50, at time t for sample path ω ∈ Ω̂. The approximate policy i was obtained after n iterations of a given approximation algorithm when set Ω_i was the training dataset.

From this, for the exact policy, we computed, for ω ∈ Ω̂,

F(ω) = Σ_{t=0}^{T-1} -P_t(ω) X_t(ω) + ˆr(ω) min(ˆD_T(ω), R_{T-1}(ω))

and

F = Σ_{ω∈Ω̂} F(ω).
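The sample paths in Ω_i and Ω̂ are draws of the price processes described earlier. A minimal simulator is sketched below; the random-walk parameters (µ = 0.02, σ = 1.5) are the ones given in the text, the mean-reversion step uses the means of the stated uniform ranges, and the GBM drift and volatility are left as arguments (`gbm_mu`, `gbm_sigma`) with placeholder defaults, since their values are not pinned down above:

```python
import math
import random

def simulate_prices(p0, T, process, mu=0.02, sigma=1.5,
                    gbm_mu=0.003, gbm_sigma=0.04):
    """Simulate one price path P_0, ..., P_{T-1} for 'RW', 'MR' or 'GBM'."""
    prices = [p0]
    b = 1.7 * (1 + 12) / 2.0              # B_0 = 1.7 * mean of U(1, 12)
    for _ in range(T - 1):
        p = prices[-1]
        if process == 'RW':               # random walk with normal increments
            p_next = p + random.gauss(mu, sigma)
        elif process == 'MR':             # mean reversion toward B_t
            b *= (0.9 + 1.2) / 2.0        # B_t = B_{t-1} * mean of U(0.9, 1.2)
            p_next = p + random.uniform(0.9, 1.2) + 0.5 * (b - p)
        else:                             # geometric Brownian motion
            p_next = p * math.exp(random.gauss(gbm_mu, gbm_sigma))
        prices.append(p_next)
    return prices
```

Note that the paths are kept continuous, as in the text; discretization to increments of 0.1 would apply only when indexing the value function approximation.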

For the approximate policy, we computed, for ω ∈ Ω̂,

ˆF^{n,i}(ω) = Σ_{t=0}^{T-1} -P_t(ω) ˆX^{n,i}_t(ω) + ˆr(ω) min(ˆD(ω), R_{T-1}(ω)).

Next, these values are averaged to obtain

F^n(ω) = (1/50) Σ_{i=1}^{50} ˆF^{n,i}(ω).

Finally, we computed

F^n = Σ_{ω∈Ω̂} F^n(ω).

Table 2 shows the time (in seconds) that it took each method to be 10%, 1%, ..., 10^{-4}% away from the optimal solution given by the classical dynamic programming (CDP) technique. It also shows the time to compute the CDP solution. The error is measured according to

η^n = 100 (F - F^n) / F.

All methods were limited to 2 million iterations. Note that instances 3 and 4 did not reach the 10^{-2}% level. This is due to the fact that these instances use the mean reversion price process, and the monotone increasing property of the slopes in the price dimension does not apply to this process. Hence, this property could not be imposed in order to speed up convergence.

Table 2 also conveys that the computational time for the Batch approach is much higher than the computational time of the ADP approach. It follows that even though the Batch method makes better use of the information in each sample realization, this does not translate into better solutions in competitive time, showing that our asynchronous algorithm performs better than the synchronous one. The same is true for the RTDP approach. More information, given by the expectation instead of a sample realization, does not result in an improvement in the solutions when the same amount of time is considered for both the ADP and RTDP approaches.
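The final comparison reduces to one number per method and iteration count. A sketch with hypothetical names (`f_by_policy` holding the 50 totals ˆF^{n,i} summed over Ω̂), assuming the percentage scaling implied by the % levels of table 2:

```python
def optimality_gap(f_opt, f_by_policy):
    """Percentage gap eta^n = 100 * (F - F^n) / F.

    f_opt: total objective of the exact (CDP) policy over the test paths.
    f_by_policy: per-trained-policy totals; their mean plays the role of F^n.
    """
    f_bar = sum(f_by_policy) / len(f_by_policy)
    return 100.0 * (f_opt - f_bar) / f_opt
```

For example, an exact total of 200 against trained-policy totals of 180 and 190 gives a gap of 7.5%.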


More information

The Strong Law of Large Numbers

The Strong Law of Large Numbers Lecure 9 The Srong Law of Large Numbers Reading: Grimme-Sirzaker 7.2; David Williams Probabiliy wih Maringales 7.2 Furher reading: Grimme-Sirzaker 7.1, 7.3-7.5 Wih he Convergence Theorem (Theorem 54) and

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

O Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t.

O Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t. Econ. 5b Spring 999 C. Sims Discree-Time Sochasic Dynamic Programming 995, 996 by Chrisopher Sims. This maerial may be freely reproduced for educaional and research purposes, so long as i is no alered,

More information

Optimality Conditions for Unconstrained Problems

Optimality Conditions for Unconstrained Problems 62 CHAPTER 6 Opimaliy Condiions for Unconsrained Problems 1 Unconsrained Opimizaion 11 Exisence Consider he problem of minimizing he funcion f : R n R where f is coninuous on all of R n : P min f(x) x

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updaes Michael Kearns AT&T Labs mkearns@research.a.com Sainder Singh AT&T Labs baveja@research.a.com Absrac We give he firs rigorous upper bounds on he error

More information

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux

Guest Lectures for Dr. MacFarlane s EE3350 Part Deux Gues Lecures for Dr. MacFarlane s EE3350 Par Deux Michael Plane Mon., 08-30-2010 Wrie name in corner. Poin ou his is a review, so I will go faser. Remind hem o go lisen o online lecure abou geing an A

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon

3.1.3 INTRODUCTION TO DYNAMIC OPTIMIZATION: DISCRETE TIME PROBLEMS. A. The Hamiltonian and First-Order Conditions in a Finite Time Horizon 3..3 INRODUCION O DYNAMIC OPIMIZAION: DISCREE IME PROBLEMS A. he Hamilonian and Firs-Order Condiions in a Finie ime Horizon Define a new funcion, he Hamilonian funcion, H. H he change in he oal value of

More information

A Hop Constrained Min-Sum Arborescence with Outage Costs

A Hop Constrained Min-Sum Arborescence with Outage Costs A Hop Consrained Min-Sum Arborescence wih Ouage Coss Rakesh Kawara Minnesoa Sae Universiy, Mankao, MN 56001 Email: Kawara@mnsu.edu Absrac The hop consrained min-sum arborescence wih ouage coss problem

More information

Estimation of Poses with Particle Filters

Estimation of Poses with Particle Filters Esimaion of Poses wih Paricle Filers Dr.-Ing. Bernd Ludwig Chair for Arificial Inelligence Deparmen of Compuer Science Friedrich-Alexander-Universiä Erlangen-Nürnberg 12/05/2008 Dr.-Ing. Bernd Ludwig (FAU

More information

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes

23.2. Representing Periodic Functions by Fourier Series. Introduction. Prerequisites. Learning Outcomes Represening Periodic Funcions by Fourier Series 3. Inroducion In his Secion we show how a periodic funcion can be expressed as a series of sines and cosines. We begin by obaining some sandard inegrals

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

Let us start with a two dimensional case. We consider a vector ( x,

Let us start with a two dimensional case. We consider a vector ( x, Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our

More information

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1 SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision

More information

Some Basic Information about M-S-D Systems

Some Basic Information about M-S-D Systems Some Basic Informaion abou M-S-D Sysems 1 Inroducion We wan o give some summary of he facs concerning unforced (homogeneous) and forced (non-homogeneous) models for linear oscillaors governed by second-order,

More information

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes Half-Range Series 2.5 Inroducion In his Secion we address he following problem: Can we find a Fourier series expansion of a funcion defined over a finie inerval? Of course we recognise ha such a funcion

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

Echocardiography Project and Finite Fourier Series

Echocardiography Project and Finite Fourier Series Echocardiography Projec and Finie Fourier Series 1 U M An echocardiagram is a plo of how a porion of he hear moves as he funcion of ime over he one or more hearbea cycles If he hearbea repeas iself every

More information

14 Autoregressive Moving Average Models

14 Autoregressive Moving Average Models 14 Auoregressive Moving Average Models In his chaper an imporan parameric family of saionary ime series is inroduced, he family of he auoregressive moving average, or ARMA, processes. For a large class

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

arxiv: v1 [math.pr] 19 Feb 2011

arxiv: v1 [math.pr] 19 Feb 2011 A NOTE ON FELLER SEMIGROUPS AND RESOLVENTS VADIM KOSTRYKIN, JÜRGEN POTTHOFF, AND ROBERT SCHRADER ABSTRACT. Various equivalen condiions for a semigroup or a resolven generaed by a Markov process o be of

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

The Asymptotic Behavior of Nonoscillatory Solutions of Some Nonlinear Dynamic Equations on Time Scales

The Asymptotic Behavior of Nonoscillatory Solutions of Some Nonlinear Dynamic Equations on Time Scales Advances in Dynamical Sysems and Applicaions. ISSN 0973-5321 Volume 1 Number 1 (2006, pp. 103 112 c Research India Publicaions hp://www.ripublicaion.com/adsa.hm The Asympoic Behavior of Nonoscillaory Soluions

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

Solutions from Chapter 9.1 and 9.2

Solutions from Chapter 9.1 and 9.2 Soluions from Chaper 9 and 92 Secion 9 Problem # This basically boils down o an exercise in he chain rule from calculus We are looking for soluions of he form: u( x) = f( k x c) where k x R 3 and k is

More information

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits

Exponential Weighted Moving Average (EWMA) Chart Under The Assumption of Moderateness And Its 3 Control Limits DOI: 0.545/mjis.07.5009 Exponenial Weighed Moving Average (EWMA) Char Under The Assumpion of Moderaeness And Is 3 Conrol Limis KALPESH S TAILOR Assisan Professor, Deparmen of Saisics, M. K. Bhavnagar Universiy,

More information

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms

L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS. NA568 Mobile Robotics: Methods & Algorithms L07. KALMAN FILTERING FOR NON-LINEAR SYSTEMS NA568 Mobile Roboics: Mehods & Algorihms Today s Topic Quick review on (Linear) Kalman Filer Kalman Filering for Non-Linear Sysems Exended Kalman Filer (EKF)

More information

Single and Double Pendulum Models

Single and Double Pendulum Models Single and Double Pendulum Models Mah 596 Projec Summary Spring 2016 Jarod Har 1 Overview Differen ypes of pendulums are used o model many phenomena in various disciplines. In paricular, single and double

More information

Matlab and Python programming: how to get started

Matlab and Python programming: how to get started Malab and Pyhon programming: how o ge sared Equipping readers he skills o wrie programs o explore complex sysems and discover ineresing paerns from big daa is one of he main goals of his book. In his chaper,

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

20. Applications of the Genetic-Drift Model

20. Applications of the Genetic-Drift Model 0. Applicaions of he Geneic-Drif Model 1) Deermining he probabiliy of forming any paricular combinaion of genoypes in he nex generaion: Example: If he parenal allele frequencies are p 0 = 0.35 and q 0

More information

SOLUTIONS TO ECE 3084

SOLUTIONS TO ECE 3084 SOLUTIONS TO ECE 384 PROBLEM 2.. For each sysem below, specify wheher or no i is: (i) memoryless; (ii) causal; (iii) inverible; (iv) linear; (v) ime invarian; Explain your reasoning. If he propery is no

More information

Convergence of the Neumann series in higher norms

Convergence of the Neumann series in higher norms Convergence of he Neumann series in higher norms Charles L. Epsein Deparmen of Mahemaics, Universiy of Pennsylvania Version 1.0 Augus 1, 003 Absrac Naural condiions on an operaor A are given so ha he Neumann

More information

An Introduction to Malliavin calculus and its applications

An Introduction to Malliavin calculus and its applications An Inroducion o Malliavin calculus and is applicaions Lecure 5: Smoohness of he densiy and Hörmander s heorem David Nualar Deparmen of Mahemaics Kansas Universiy Universiy of Wyoming Summer School 214

More information

BU Macro BU Macro Fall 2008, Lecture 4

BU Macro BU Macro Fall 2008, Lecture 4 Dynamic Programming BU Macro 2008 Lecure 4 1 Ouline 1. Cerainy opimizaion problem used o illusrae: a. Resricions on exogenous variables b. Value funcion c. Policy funcion d. The Bellman equaion and an

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model

Modal identification of structures from roving input data by means of maximum likelihood estimation of the state space model Modal idenificaion of srucures from roving inpu daa by means of maximum likelihood esimaion of he sae space model J. Cara, J. Juan, E. Alarcón Absrac The usual way o perform a forced vibraion es is o fix

More information

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t Exercise 7 C P = α + β R P + u C = αp + βr + v (a) (b) C R = α P R + β + w (c) Assumpions abou he disurbances u, v, w : Classical assumions on he disurbance of one of he equaions, eg. on (b): E(v v s P,

More information

Operations Research. An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Operations Research. An Approximate Dynamic Programming Algorithm for Monotone Value Functions This aricle was downloaded by: [140.1.241.64] On: 05 January 2016, A: 21:41 Publisher: Insiue for Operaions Research and he Managemen Sciences (INFORMS) INFORMS is locaed in Maryland, USA Operaions Research

More information

A Shooting Method for A Node Generation Algorithm

A Shooting Method for A Node Generation Algorithm A Shooing Mehod for A Node Generaion Algorihm Hiroaki Nishikawa W.M.Keck Foundaion Laboraory for Compuaional Fluid Dynamics Deparmen of Aerospace Engineering, Universiy of Michigan, Ann Arbor, Michigan

More information

A Dynamic Model of Economic Fluctuations

A Dynamic Model of Economic Fluctuations CHAPTER 15 A Dynamic Model of Economic Flucuaions Modified for ECON 2204 by Bob Murphy 2016 Worh Publishers, all righs reserved IN THIS CHAPTER, OU WILL LEARN: how o incorporae dynamics ino he AD-AS model

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

6. Stochastic calculus with jump processes

6. Stochastic calculus with jump processes A) Trading sraegies (1/3) Marke wih d asses S = (S 1,, S d ) A rading sraegy can be modelled wih a vecor φ describing he quaniies invesed in each asse a each insan : φ = (φ 1,, φ d ) The value a of a porfolio

More information

A Forward-Backward Splitting Method with Component-wise Lazy Evaluation for Online Structured Convex Optimization

A Forward-Backward Splitting Method with Component-wise Lazy Evaluation for Online Structured Convex Optimization A Forward-Backward Spliing Mehod wih Componen-wise Lazy Evaluaion for Online Srucured Convex Opimizaion Yukihiro Togari and Nobuo Yamashia March 28, 2016 Absrac: We consider large-scale opimizaion problems

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems

Essential Microeconomics : OPTIMAL CONTROL 1. Consider the following class of optimization problems Essenial Microeconomics -- 6.5: OPIMAL CONROL Consider he following class of opimizaion problems Max{ U( k, x) + U+ ( k+ ) k+ k F( k, x)}. { x, k+ } = In he language of conrol heory, he vecor k is he vecor

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

Planning in POMDPs. Dominik Schoenberger Abstract

Planning in POMDPs. Dominik Schoenberger Abstract Planning in POMDPs Dominik Schoenberger d.schoenberger@sud.u-darmsad.de Absrac This documen briefly explains wha a Parially Observable Markov Decision Process is. Furhermore i inroduces he differen approaches

More information

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H.

ACE 562 Fall Lecture 8: The Simple Linear Regression Model: R 2, Reporting the Results and Prediction. by Professor Scott H. ACE 56 Fall 5 Lecure 8: The Simple Linear Regression Model: R, Reporing he Resuls and Predicion by Professor Sco H. Irwin Required Readings: Griffihs, Hill and Judge. "Explaining Variaion in he Dependen

More information

Lecture 9: September 25

Lecture 9: September 25 0-725: Opimizaion Fall 202 Lecure 9: Sepember 25 Lecurer: Geoff Gordon/Ryan Tibshirani Scribes: Xuezhi Wang, Subhodeep Moira, Abhimanu Kumar Noe: LaTeX emplae couresy of UC Berkeley EECS dep. Disclaimer:

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information