A Comparison of Approximate Dynamic Programming Techniques on Benchmark Energy Storage Problems: Does Anything Work?

Daniel R. Jiang, Thuy V. Pham, Warren B. Powell, Daniel F. Salas, and Warren R. Scott

Abstract: As more renewable, yet volatile, forms of energy like solar and wind are being incorporated into the grid, the problem of finding optimal control policies for energy storage is becoming increasingly important. These sequential decision problems are often modeled as stochastic dynamic programs, but when the state space becomes large, traditional (exact) techniques such as backward induction, policy iteration, or value iteration quickly become computationally intractable. Approximate dynamic programming (ADP) thus becomes a natural solution technique for solving these problems to near optimality using significantly fewer computational resources. In this paper, we compare the performance of the following: various approximation architectures with approximate policy iteration (API), approximate value iteration (AVI) with structured lookup table, and direct policy search on a benchmarked energy storage problem (i.e., the optimal solution is computable).

I. INTRODUCTION

In this paper, we investigate the effectiveness of several techniques that fall under the realm of approximate dynamic programming (ADP) on a simple energy storage and allocation problem (previously described in [1] and [2]): we seek to optimally control (profit maximization) a storage device that interacts with both the grid and an uncertain energy supply (i.e., wind) in order to meet demand. In our benchmarks, we consider a stochastic wind supply, stochastic electricity prices (from the grid), and a deterministic demand. We use this problem class because it can be simplified through discretization (and possibly dimensionality reduction) to obtain benchmark problems that can be solved optimally. The idea is to use these benchmark problems to provide insights into the performance of a variety of ADP strategies (for an overview of traditional methods in ADP, see, e.g., [3], [4], [5]). A precise formulation of this problem is given in Section III.

Algorithmically, we consider solution techniques that are variants of approximate policy iteration (API) and approximate value iteration (AVI). The basis for both of these algorithms is a value function approximation (VFA) (the value function is also known as the cost-to-go function), and thus, by altering the approximation architecture, we arrive at a family of ADP algorithms. For API, we test several methods typically found in the machine learning (ML) literature to approximate the value function: support vector regression (SVR), Gaussian process regression (GPR), local parametric methods (LPR), and a clustering method called Dirichlet clouds with radial basis functions (DCR). In the case of AVI, we consider lookup table techniques that exploit the structural properties of the problem at hand: monotonicity (the use of the natural concavity in this problem was studied previously in [2]). Although lookup table by itself can be a very limited method, we find that the additional knowledge of problem structure makes it an extremely effective solution method, even when compared to more advanced statistical estimation methods.

This paper reports on the performance of a variety of approximation methods that have been developed in the approximate dynamic programming community, tested using a series of optimal benchmark problems drawn from a relatively simple energy storage application. These suggest that methods based on Bellman error minimization, using both approximate value iteration and approximate policy iteration, work surprisingly poorly if we use approximation methods drawn from machine learning. Pure lookup table also works poorly.
By contrast, a simple cost function approximation estimated using policy search works remarkably well, hinting that the problem is not the approximation architecture (though this method does not scale to more complex policies). In addition, lookup table methods that exploit convexity or monotonicity (if applicable) work extremely well, but do not scale to complex state-of-the-world variables. The implications for many current ADP algorithms are not encouraging, which signals the need for further work in this area.

The paper is organized as follows. In Section II, we give a brief literature review. Section III provides the mathematical formulation for the problem and discusses its inherent structure. Next, Sections IV-V give an overview of the algorithmic techniques that we employ, followed by numerical work (including previous work) in Sections VI-X. We conclude in Section XI.

II. LITERATURE REVIEW

The problem of energy storage, and its closely related problems in inventory and asset management, has been widely studied. For example, in [6], the authors derive, under an assumption on the distribution of wind energy, an analytical solution to an energy commitment problem in the presence of storage. The mathematical formulation is similar regardless of the exact application; [7] and [8], for example, present different techniques (including optimal switching and ADP) to study control policies of natural gas storage facilities. Moreover, [9] and [10] study the optimization of a hydroelectric reservoir, with the additional complication of bidding day-ahead. The second paper, [10], uses a method based on stochastic dual dynamic programming (SDDP).

SDDP and its related methods use Benders cuts, but the theoretical work in this area relies on the assumption that random variables only have a finite set of outcomes [11] (and is thus difficult to scale to larger problems). Taking a slightly different point of view, [12] considers the capacity value of energy storage by solving a dynamic program. Broader works include [13], [14], [15], and [16], all of which solve related problems that involve storage and a generic asset or commodity. Simple, scalar storage (or inventory) problems can be easily solved using backward dynamic programming (see [17]), but these methods quickly become intractable as we add additional state-of-the-world variables, leading us to consider the use of approximate dynamic programming. [1] uses approximate policy iteration with a parametric linear model (i.e., basis functions), least squares temporal difference (LSTD) learning, and Bellman error minimization to solve the same energy allocation problem that we consider here. [18] takes an alternative approach to the policy evaluation step and uses neural networks (in this paper, we use nonparametric models). [2] uses the natural concavity of the value functions to speed up the convergence of a TD(1) algorithm (see [19]). [16] takes a similar approach of exploiting concavity for a generic problem with a scalar resource, but within an approximate value iteration framework. Moreover, [20] considers a simple storage problem motivated by mutual fund management and solves it using a lookup table approach exploiting concavity. Also taking advantage of structure, [21] exploits the monotonicity in the value functions in a lookup table approach to solving an optimal bidding and storage problem. In both the cases of [20] and [21], pure lookup table without structure does not work in practice within reasonable time constraints. [7] solves the natural gas storage control problem through the discretization of a continuous time model and the application of a basis function approximation of the value function. One of the few works to consider a nonparametric approximation of the value function, [15] employs Dirichlet process mixture models to cluster states and then uses a convex model within each cluster. As can be seen from the literature, it is generally the case that a specific algorithm is applied to a specific application. The contribution of this paper is to empirically compare the effectiveness of several popular ADP methods on a common set of problems derived from an energy storage application.

III. MATHEMATICAL FORMULATION

We now formulate the energy storage and allocation problem as a Markov decision process (MDP). Let t ∈ N be a discrete time index representing the decision epochs of the MDP (in this problem, t could be measured in hours or days). Over a finite horizon from t = 0 to t = T, our goal is to find a policy that maximizes expected profits. Let R_t ∈ R = [0, R^max] be the level of energy in storage at time t, where the storage device has charge and discharge efficiencies denoted by β^c and β^d, respectively, with both β^c and β^d in (0, 1). Also, let γ^c and γ^d be the maximum amounts of energy that can be charged or discharged, respectively, from the storage device. For example, suppose that our storage device is a 1 MW battery (meaning that it can charge and discharge at a rate of 1 MW) and we make allocation decisions every hour. In this case, we have that γ^c = γ^d = 1 MWh. Let E_t be the amount of energy available from wind at time t and P_t be the spot price of electricity. Finally, suppose D_t is the amount of demand that must be satisfied at time t. To allow for different models (either deterministic or stochastic), we also define E^S_t, P^S_t, and D^S_t to be the state variables associated with the respective processes at time t.
As an example, if E_t is modeled as a Markov process, then E^S_t = E_t, and if D_t is modeled as a deterministic process, then D^S_t = {}. Hence, the state variable for the problem is S_t = (R_t, E^S_t, P^S_t, D^S_t). To abbreviate, let W_t = (E^S_t, P^S_t, D^S_t) ∈ W and S_t = (R_t, W_t). Throughout this paper, we operate under the assumption that the process W_t is independent of R_t. Next, we define the exogenous information, Ŵ_{t+1}, to be the change in W_t:

W_{t+1} = W_t + Ŵ_{t+1},

which of course is model dependent (the specific processes we use for benchmarking are defined in Section VI).

The decision problem is that, while anticipating the future value of storage, we must combine energy from the following three sources in order to fully satisfy the demand: 1) energy currently in storage, constrained by γ^c, γ^d, and R_t (represented by a decision x^rd_t); 2) newly available wind energy, constrained by E_t (represented by a decision x^wd_t); and 3) energy from the grid, at a spot price of P_t (represented by a decision x^gd_t). Additional allocation decisions are x^wr_t, the amount of wind energy to store; x^rg_t, the amount of energy to sell to the grid at price P_t; and x^gr_t, the amount of energy to buy from the grid and store. These allocation decisions are summarized by the six-dimensional, nonnegative decision vector

x_t = (x^wd_t, x^gd_t, x^rd_t, x^wr_t, x^gr_t, x^rg_t)^T ≥ 0,   (1)

and the constraints are as follows:

x^wd_t + β^d x^rd_t + x^gd_t = D_t,   (2)
x^rd_t + x^rg_t ≤ R_t,   (3)
x^wr_t + x^gr_t ≤ R^max − R_t,   (4)
x^wr_t + x^wd_t ≤ E_t,   (5)
x^wr_t + x^gr_t ≤ γ^c,   (6)
x^rd_t + x^rg_t ≤ γ^d.   (7)

The first constraint guarantees that demand is fully satisfied; (3) and (4) are storage capacity constraints; (5) states that the maximum amount of energy used from wind is bounded by E_t; and finally, (6) and (7) constrain the decisions to within the storage transfer rates. Let us denote the feasible set, determined by the constraints (1)-(7), by X_t(S_t). Suppressing the dependence on S_t for ease of notation, define X_t = X_t(S_t). See Figure 1 below for an illustrative summary of the problem, annotated with the components of x_t.

Fig. 1. Illustration of the energy storage/allocation problem: a flow network connecting the wind supply E_t, the grid, the storage device R_t, and the demand D_t, with arcs labeled by the decision components x^wr_t, x^gr_t, x^rd_t, x^wd_t, and x^gd_t.
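To make the single-period decision problem concrete, here is a minimal sketch (our own, not the authors' implementation) of the feasible region defined by (2)-(7), as reconstructed above, encoded as a linear program with scipy.optimize.linprog. The decision ordering and the helper name are assumptions, the default parameters mirror the benchmark settings of Section VI, and the objective uses only the immediate contribution; the algorithms in this paper would add the downstream term V̄^x_t(S^x_t) before maximizing.

```python
import numpy as np
from scipy.optimize import linprog

# Decision ordering (an assumption for this sketch): x = (x_wd, x_gd, x_rd, x_wr, x_gr, x_rg).
def solve_myopic_decision(R, E, P, D, R_max=30.0, beta_c=1.0, beta_d=1.0,
                          gamma_c=5.0, gamma_d=5.0):
    """Maximize the immediate contribution P*(D + beta_d*x_rg - x_gr - x_gd)
    over the feasible set defined by constraints (2)-(7)."""
    # linprog minimizes, so negate the decision-dependent part of the objective.
    c = -P * np.array([0.0, -1.0, 0.0, 0.0, -1.0, beta_d])
    A_eq = [[1.0, 1.0, beta_d, 0.0, 0.0, 0.0]]   # (2) x_wd + beta_d*x_rd + x_gd = D
    b_eq = [D]
    A_ub = [
        [0, 0, 1, 0, 0, 1],   # (3) x_rd + x_rg <= R
        [0, 0, 0, 1, 1, 0],   # (4) x_wr + x_gr <= R_max - R
        [1, 0, 0, 1, 0, 0],   # (5) x_wr + x_wd <= E
        [0, 0, 0, 1, 1, 0],   # (6) x_wr + x_gr <= gamma_c
        [0, 0, 1, 0, 0, 1],   # (7) x_rd + x_rg <= gamma_d
    ]
    b_ub = [R, R_max - R, E, gamma_c, gamma_d]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * 6, method="highs")
    if not res.success:
        raise ValueError("decision problem could not be solved: " + res.message)
    contribution = P * D - float(c @ res.x)   # add back the constant P*D term
    return res.x, contribution
```

Buying from the grid (x_gd) is always available, so the feasible set is never empty; the interesting trade-offs appear once a value-of-storage term is added to the objective.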

Let φ = (0, 0, −1, β^c, β^c, −1)^T be a vector containing the flow coefficients of a decision x_t with respect to the storage device. Then, the transition function is

R_{t+1} = R_t + φ^T x_t.   (8)

Note that there is no dependence on any random information, allowing us to easily take advantage of the post-decision formulation of this problem, to be made clear below. Now, we define the contribution function. For a given state S_t and decision x_t, we define

C(S_t, x_t) = P_t (D_t + β^d x^rg_t − x^gr_t − x^gd_t),

the profit realized at time t (we get paid for satisfying demand and for selling to the grid, but we must pay for any energy that originates from the grid). Using Bellman's optimality equation, we define value functions through the following set of recursive equations. Let V_T(S_T) = 0 and, for t ≤ T − 1,

V_t(S_t) = max_{x_t ∈ X_t} [ C(S_t, x_t) + E( V_{t+1}(S_{t+1}) | S_t ) ],   (9)

where it is understood that S_{t+1} depends on both S_t and x_t. For simulation and computational purposes, it is often troublesome to deal with an expectation operator within a maximum operator. As described in detail in [3], this can be remedied by appealing to the post-decision formulation of Bellman's equation. Essentially, the post-decision state S^x_t ∈ S^x is the state immediately after the decision x_t is made but before any new information has arrived, where S^x is the post-decision state space. The canonical form of the post-decision state is simply S^x_t = (S_t, x_t), but oftentimes it can be written in a more condensed way. Mathematically, it must be the case that S_{t+1} | S^x_t =_d S_{t+1} | (S_t, x_t) (equal in distribution). Let R^x_t = R_{t+1} as defined in (8); for our problem, due to the fact that R_{t+1} depends solely on R_t and x_t, the post-decision state is given by

S^x_t = (R^x_t, E^S_t, P^S_t, D^S_t) = (R^x_t, W_t).

We define the post-decision value function as V^x_t(S^x_t) = E( V_{t+1}(S_{t+1}) | S^x_t ), which gives us the following two relations:

V_t(S_t) = max_{x_t ∈ X_t} [ C(S_t, x_t) + V^x_t(S^x_t) ]   (10)

and

V^x_{t−1}(S^x_{t−1}) = E[ max_{x_t ∈ X_t} [ C(S_t, x_t) + V^x_t(S^x_t) ] | S^x_{t−1} ].   (11)

Equation (10) is useful for simulating a policy induced by a set of value functions, and equation (11) is used in the simulation steps of the various ADP algorithms. In [2], the concavity of the post-decision value functions is exploited as the VFAs are learned by the ADP algorithm. For this paper, in addition to the API variants, we consider for comparison a more recent algorithm that takes advantage of monotonicity, called Monotone ADP (see [22]). To do so, we give the following proposition.

Proposition 1. For each time t ≤ T − 1, the post-decision value function V^x_t(R^x_t, W_t) is nondecreasing in R^x_t.

Proof. We proceed by induction. Since V_T(S_T) = 0, it is clear that V^x_{T−1}(S^x_{T−1}) = 0 by definition and hence satisfies monotonicity. Assume that V^x_t(S^x_t) satisfies the monotonicity property (induction hypothesis) and consider (11). At time t − 1, fix two states S^x_{t−1} = (R^x_{t−1}, W_{t−1}) and S̃^x_{t−1} = (R̃^x_{t−1}, W_{t−1}), with both R^x_{t−1}, R̃^x_{t−1} ∈ R, such that R^x_{t−1} < R̃^x_{t−1}. Let ε = R̃^x_{t−1} − R^x_{t−1}. Denote S_t = (R_t, W_t) = (R^x_{t−1}, W_t) and S̃_t = (R̃_t, W_t) = (R̃^x_{t−1}, W_t), with S^x_t and S̃^x_t being the corresponding post-decision states. As before, let X_t = X_t(S_t), but also let X̃_t = X_t(S̃_t). We aim to show that the following inequality holds for any outcome of W_t | W_{t−1} (note that W_t | S^x_{t−1} =_d W_t | S̃^x_{t−1}, so the distribution of the exogenous information is the same in both situations):

max_{x_t ∈ X_t} [ C(S_t, x_t) + V^x_t(S^x_t) ] ≤ max_{x_t ∈ X̃_t} [ C(S̃_t, x_t) + V^x_t(S̃^x_t) ].

Note the differing feasible sets X_t and X̃_t. Denote the optimal solution to the left-hand side of the inequality by x*_t and the optimal value of the objective by F*_t. Now, there are two cases:
1) x*_t ∈ X̃_t. Using this same decision on the right-hand side as well, we see that since R_t < R̃_t, we have R^x_t < R̃^x_t. Using C(S_t, x*_t) = C(S̃_t, x*_t) and the induction hypothesis, we conclude that C(S̃_t, x*_t) + V^x_t(S̃^x_t) ≥ F*_t. Since there exists a feasible solution, namely x*_t, in the new decision space X̃_t that achieves an objective value greater than or equal to F*_t, the inequality is verified.

2) x*_t ∉ X̃_t. To get from X_t to X̃_t, constraint (3) is relaxed by ε and constraint (4) is tightened by ε. Therefore, it must be the case that constraint (4) is violated by x*_t:

x^wr,*_t + x^gr,*_t > R^max − R̃_t = R^max − R_t − ε.

To construct a feasible solution x̃_t ∈ X̃_t from x*_t, let us simply decrease x^wr_t + x^gr_t until (4) is satisfied. That is, choose x̃^wr_t and x̃^gr_t such that x̃^wr_t + x̃^gr_t = R^max − R̃_t. It is clear that

(x^wr,*_t + x^gr,*_t) − (x̃^wr_t + x̃^gr_t) ≤ ε,

and thus, from the resource transition function (8), we see that R̃^x_t ≥ R^x_t. Also, C(S_t, x*_t) = C(S̃_t, x̃_t), so by the induction hypothesis, we have shown the existence of a feasible solution x̃_t in X̃_t such that C(S̃_t, x̃_t) + V^x_t(S̃^x_t) ≥ F*_t, and the original inequality is verified.

Because this is true for any realization of W_t, monotonicity holds in expectation as well and the proof is complete.
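For reference, a small sketch (ours, not from the paper) of the storage transition (8) and the contribution function, using the flow-coefficient vector φ = (0, 0, −1, β^c, β^c, −1)^T and the decision ordering of (1) as reconstructed above:

```python
import numpy as np

# Decision ordering as in (1): x = (x_wd, x_gd, x_rd, x_wr, x_gr, x_rg).
def storage_transition(R, x, beta_c=1.0):
    """Post-decision resource level R^x_t = R_t + phi^T x_t, equation (8)."""
    phi = np.array([0.0, 0.0, -1.0, beta_c, beta_c, -1.0])
    return R + phi @ x

def contribution(P, D, x, beta_d=1.0):
    """Immediate profit C(S_t, x_t) = P_t*(D_t + beta_d*x_rg - x_gr - x_gd)."""
    x_wd, x_gd, x_rd, x_wr, x_gr, x_rg = x
    return P * (D + beta_d * x_rg - x_gr - x_gd)
```

Both functions are deterministic given the decision, which is what makes the post-decision state (R^x_t, W_t) so compact for this problem.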

Monotonicity often exists in the state-of-the-world dimensions as well, but this depends on the model of the random processes used. We show how to take advantage of this structural property in Section V.

IV. APPROXIMATE POLICY ITERATION

Exact policy iteration involves two main steps: policy evaluation and policy improvement (see, e.g., [5]). The (exact) policy evaluation step can be completed using a matrix inversion (solving Bellman's optimality equations), but this is often intractable. One option for approximating the policy evaluation step is to apply exact value iteration for a large number of iterations, but even this is difficult for complex problems with large state spaces and impossible when the problem admits a continuous state space. In our implementation of the approximate policy evaluation step (for a finite horizon model), we take a simulation approach where we first generate observations of the value function for the fixed policy and then fit a model to the observations. Consider a fixed policy π = (π_1, π_2, ..., π_T) and a general approximation architecture Q, which takes a set of samples Z = {(x_i, y_i)}_{i=1}^M (with x_i ∈ X and y_i ∈ Y) and produces a model Q(Z, ·) that maps from X to Y. Figure 2 provides the precise steps taken to perform approximate policy evaluation, given π, Q, and the number of samples desired, M. The idea is that we simulate the policy π from various initial states, keeping track of both the (post-decision) states S^{x,m}_t that we visit and the contributions C^m_t = C(S^m_t, x^m_t) that we receive. From this, we produce a set of samples Z_t (see Step 5 of Figure 2) that is used by Q to produce an approximation; a condensed code sketch of this loop is given below.

Approximate Policy Evaluation (Inputs: policy π, approximation Q, sample size M)
Step 0. Set m = 1.
Step 1. Select an initial post-decision state S^{x,m}_0.
Step 2. For t = 1, 2, ..., (T − 1):
  Step 2a. Sample W_t and set the pre-decision state: S^m_t = (R^{x,m}_{t−1}, W_t).
  Step 2b. Apply the policy to receive a decision: x^m_t = π_t(S^m_t).
  Step 2c. Compute the contribution: C^m_t = C(S^m_t, x^m_t).
  Step 2d. Compute the next post-decision state S^{x,m}_t using (8).
Step 3. Compute observations of the time-dependent value function. For each t, set v^m_t = Σ_{τ=t}^{T−1} C^m_τ.
Step 4. If m < M, increment m and return to Step 1.
Step 5. Denote the set of samples by Z_t = {(S^{x,m}_t, v^m_t)}_{m=1}^M. Using the approximation model, return V̄^x_t(·) = Q(Z_t, ·).

Fig. 2. Approximate Policy Evaluation Step for API

With the policy evaluation step defined, we define the API algorithm by essentially replacing the exact policy evaluation step in traditional policy iteration with the approximate version. As mentioned above, the algorithm iterates the two steps of policy evaluation and improvement, shown in Figure 3.

Approximate Policy Iteration (Inputs: approximation Q, sample size M, iterations N)
Step 0. Set an initial policy π^0; set n = 1.
Step 1. Use Approximate Policy Evaluation with arguments (π^{n−1}, Q, M) to compute V̄^{x,n−1}_t for each t.
Step 2. Policy improvement step:
  π^n_t(S_t) = arg max_{x_t ∈ X_t} [ C(S_t, x_t) + V̄^{x,n−1}_t(S^x_t) ].
Step 3. If n < N, increment n and return to Step 1.

Fig. 3. Approximate Policy Iteration Algorithm

A. Choices of Approximation Architecture Q

In this section, we give a brief introduction to each of the following approximation architectures Q (for a detailed treatment, see the corresponding literature). The motivation for choosing nonparametric estimators is that not only have they received the most attention in the statistics and machine learning communities, but they also require little to no problem-specific tuning, as opposed to parametric variants. The more traditional technique of LSTD was tested on the same problem in [1].
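Before turning to the individual architectures, the following is a condensed Python sketch of the approximate policy evaluation loop of Figure 2. It is our own illustration, not the paper's code: the callables policy, sample_W, initial_post_state, contribution, and transition are assumed to be supplied by the user and to encode the model of Section III, and scikit-learn's SVR is used as a stand-in for the R-based learners described below (any of them could be substituted in Step 5).

```python
import numpy as np
from sklearn.svm import SVR  # stand-in for the R-based learners used in the paper

def approximate_policy_evaluation(policy, sample_W, initial_post_state,
                                  contribution, transition, T, M):
    """Figure 2 (sketch): simulate the fixed policy M times, then fit one
    regression model per time period mapping visited post-decision states to
    the observed cumulative downstream contributions."""
    samples = {t: ([], []) for t in range(1, T)}            # t -> (states, values)
    for m in range(M):
        R_post, W = initial_post_state()                    # Step 1
        rewards, visited = [], []
        for t in range(1, T):                               # Step 2
            W = sample_W(t, W)                              # 2a: new exogenous information
            S = (R_post, W)                                 # pre-decision state
            x = policy(t, S)                                # 2b: decision from the policy
            rewards.append(contribution(S, x))              # 2c: contribution C^m_t
            R_post = transition(S, x)                       # 2d: post-decision resource, eq. (8)
            visited.append((t, np.r_[R_post, W]))
        # Step 3: v^m_t is the sum of contributions from t onward
        values = np.cumsum(rewards[::-1])[::-1]
        for (t, s_post), v in zip(visited, values):
            samples[t][0].append(s_post)
            samples[t][1].append(v)
    vfas = {}                                               # Step 5: fit Q for each t
    for t, (X, y) in samples.items():
        vfas[t] = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(np.array(X), np.array(y))
    return vfas  # vfas[t].predict(...) approximates the post-decision value at time t
```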
Assume that the notation is self-contained for each of the following subsections; in addition, for purposes of presentation, we have removed the subscript and superscript from the notation V̄^x_t(s) and use V̄(s) instead.

Support vector regression (SVR), originally introduced in [23], is an extension of the well-known support vector machine (SVM) algorithm for classification to the problem of regression. Also see [24] for an overview of SVR and implementations. We are given a linear model

V̄(s) = Σ_{f ∈ F} θ_f φ_f(s) = ⟨θ, φ(s)⟩,

where F is a set of features, the φ_f are basis functions, and the θ_f are weights. Let the training dataset be represented by (s_m, y_m) for m = 1, 2, ..., M. The essential idea of SVR is to choose a hyperplane defined by the weights θ_f, so that most of the training pairs fall within ε of the hyperplane while keeping the hyperplane as flat, or as simple, as possible, by minimizing ||θ|| = √⟨θ, θ⟩ (given two models that explain the training data, we prefer the simpler one that is less affected by noise in s_m). The optimization problem can be written as follows:

minimize    (1/2) ||θ||² + η Σ_{m=1}^M (ξ_m + ξ*_m)
subject to  y_m − ⟨θ, φ(s_m)⟩ ≤ ε + ξ_m,
            ⟨θ, φ(s_m)⟩ − y_m ≤ ε + ξ*_m,
            ξ_m, ξ*_m ≥ 0.

In the numerical work, we leverage the often-used and versatile Gaussian radial basis kernel. SVR is implemented using svm of the R package e1071 (with λ = 10 and ε = 0.01).

Gaussian process regression (GPR) is a Bayesian machine learning technique (see [25] for a thorough description) that allows us to model the unknown value function by a Gaussian process indexed by elements s of the state space S. A Gaussian process V̄ ∼ GP(m, k), specified by a mean function m(s) = E[V̄(s)] and a covariance function k(s, s′) = Cov[V̄(s), V̄(s′)], is a (possibly infinite) collection of random variables such that any finite set of them is jointly Gaussian. For the prior, a typical choice of mean function is m(s) = 0 (note that the posterior mean is not necessarily zero).

In our work, we choose k to be the Gaussian radial basis function

k(s, s′) = (1 / (2πσ)) exp( −||s − s′||² / (2σ²) ),

and we assume that we observe y = V̄(s) + ε, where ε ∼ N(0, σ_ε²). The essential step in GPR is computing the posterior Gaussian process by conditioning on the observed values. We implement GPR using gausspr of the R package kernlab.

Local polynomial regression (LPR), or more specifically, locally weighted scatterplot smoothing (LOESS), is a nonparametric technique used for estimating smooth functions [26]. As before, let (S, y) be the training set and suppose we are interested in estimating the value of V̄(s). Let θ(s) = (V̄(s), V̄′(s), ..., V̄^(l)(s)) and U(u) = (1, u, u²/2!, ..., u^l/l!). For any s_i ∈ S near s, the value V̄(s_i) can be approximated using the Taylor expansion θ(s)^T U(s_i − s). The LOESS estimator of θ is defined by

θ̂(s) = arg min_{θ ∈ R^{l+1}} Σ_{s_i ∈ S} [ y_i − θ^T U(s_i − s) ]² K( (s_i − s) / h ),

where K is a kernel function and h is a bandwidth parameter. In our numerical work, we use a second-order local polynomial fit (l = 2). LOESS is implemented using loess of the R package stats.

Dirichlet clouds with radial basis functions (DCR) is a method, developed in [27], that performs local regressions on clusters of data. As each training point is processed, a cluster for the point is chosen and the local (low-degree) polynomial function is updated recursively. Once again, we use the Gaussian radial basis function; using the notation from [27], let φ(r) = (1 / (2π)) exp(−r²/2). Let N_c be the total number of clusters, c_i be the centroid of the i-th cluster, and p_i be the polynomial fitted to the i-th cluster. The model can be summarized by the following equation; for a new state s,

V̄(s) = Σ_{i=1}^{N_c} p_i(s) φ(||s − c_i||) / Σ_{i=1}^{N_c} φ(||s − c_i||),

a weighted average of the predictions of the individual clusters. For a detailed description of when and how new clusters are created and the precise equations for the fitting of local polynomials, see [27]. This method is implemented in R.

As we can see, LPR and DCR are similarly motivated by local approximations, while SVR and GPR are significantly different: SVR is a more sophisticated version of the basis function technique, while GPR is a Bayesian method of modeling a function as a random process.

V. APPROXIMATE VALUE ITERATION WITH MONOTONICITY PRESERVATION

We now move away from API and consider another main ADP technique, approximate value iteration (AVI). The version of AVI for finite horizon problems that we consider is a forward simulation method that iteratively updates the VFA based on each new observation. A weakness of this method is that it requires a lookup table representation of the state space, something that the API methods do not require. Nevertheless, in this paper, all problems are discretized in order to compare against an optimal benchmark. We present a version of AVI that exploits the monotone structure of the problem (see Proposition 1), called Monotone ADP (MADP) [22].

First, we define some notation. Let V̄^{x,n}_t(s) be the estimate of the (post-decision) value function evaluated at s ∈ S at iteration n of the algorithm. The state that the algorithm visits at iteration n and time t is denoted S^{x,n}_t. Also, let α^n_t be a possibly stochastic stepsize sequence, with α^n_t(s) = α^n_t 1{S^{x,n}_t = s}. Consider two states s = (r, w) ∈ S and s′ = (r′, w′) ∈ S, with r, r′ ∈ R and w, w′ ∈ W. We say that s ⪯ s′ if and only if r ≤ r′ and w = w′, which are the necessary conditions to invoke Proposition 1. Now we define the monotonicity preservation operator Π_M (see Figure 4 for an illustration). In the following definition, suppose that v is the previous estimate of the value of a particular state s and that z^{x,n}_t is a new observation of the value of the currently visited state S^{x,n}_t. We define:

Π_M(S^{x,n}_t, z^{x,n}_t, s, v) =
  z^{x,n}_t            if s = S^{x,n}_t,
  max(z^{x,n}_t, v)    if S^{x,n}_t ⪯ s and s ≠ S^{x,n}_t,
  min(z^{x,n}_t, v)    if s ⪯ S^{x,n}_t and s ≠ S^{x,n}_t,
  v                    otherwise.
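As an illustration of Π_M (our own sketch), assume the lookup table for a fixed time t and a fixed state-of-the-world value w is stored as a numpy array indexed by the discretized resource level. The projection then amounts to raising every entry above the observed resource level to at least the new observation and capping every entry below it:

```python
import numpy as np

def enforce_monotonicity(values, r_idx, z):
    """Apply Pi_M to one slice of the lookup table: values[i] is the current
    estimate for the i-th discretized resource level (with w held fixed),
    r_idx is the index of the state just observed, and z is its smoothed value."""
    values = values.copy()
    values[r_idx] = z                                        # s = S^{x,n}: take z
    values[r_idx + 1:] = np.maximum(values[r_idx + 1:], z)   # more resource: raise to >= z
    values[:r_idx] = np.minimum(values[:r_idx], z)           # less resource: cap at <= z
    return values

# Example: an update at resource index 2 with new value 3.0.
print(enforce_monotonicity(np.array([0.0, 2.0, 1.0, 5.0, 4.0]), r_idx=2, z=3.0))
# -> [0. 2. 3. 5. 4.]
```

Note that Π_M only enforces consistency relative to the newly observed state; it does not sort the entire slice, so remaining violations elsewhere (such as the final two entries above) are left for later iterations to correct.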
The precise description of the algorithm is given in Figure 5.

Fig. 4. Illustration of the monotonicity preservation operator in the resource dimension (i.e., for a fixed t and a fixed outcome of W_t): new observations that introduce monotonicity violations in the discretized VFA are projected back so that the estimate remains nondecreasing in the resource level.

Monotone ADP Algorithm
Step 0a. Initialize V̄^{x,0}_t(s) = 0 for each t ≤ T − 1 and s ∈ S.
Step 0b. Set V̄^{x,n}_T(s) = 0 for each s ∈ S and n ≤ N.
Step 0c. Set n = 1.
Step 1. Select an initial state S^{x,n}_0 = (R^{x,n}_0, W_0).
Step 2. For t = 0, ..., (T − 1):
  Step 2a. Sample S^n_{t+1} and get a noisy observation:
    v̂^{x,n}_t(S^{x,n}_t) = max_{x_{t+1} ∈ X_{t+1}} { C(S^n_{t+1}, x_{t+1}) + V̄^{x,n−1}_{t+1}(S^x_{t+1}) }.
  Step 2b. Smooth the new observation with the previous value:
    z^{x,n}_t(S^{x,n}_t) = (1 − α^{n−1}_t(S^{x,n}_t)) V̄^{x,n−1}_t(S^{x,n}_t) + α^{n−1}_t(S^{x,n}_t) v̂^{x,n}_t(S^{x,n}_t).
  Step 2c. Enforce monotonicity. For each s ∈ S:
    V̄^{x,n}_t(s) = Π_M( S^{x,n}_t, z^{x,n}_t, s, V̄^{x,n−1}_t(s) ).
  Step 2d. Choose the next state S^{x,n}_{t+1}.
Step 3. If n < N, increment n and return to Step 1.

Fig. 5. Monotone ADP Algorithm using Post-Decision States

We remark that Monotone ADP is a provably convergent algorithm under certain technical conditions (see [22]). Although we do not describe the details of the convergence theory in this paper, it can be easily checked that the problem at hand, after discretization, satisfies the conditions for convergence.

VI. BENCHMARK PROBLEMS

The problems that we use as optimal benchmarks for the proposed algorithms originated from [2]. For all of the benchmark problems, we choose R^max = 30, β^c = β^d = 1, γ^c = γ^d = 5, and T = 100. The deterministic demand is assumed to have a seasonal structure:

D_t = max{ 0, 3 − 4 sin(2πt/T) }.

We now define two parameters that determine the support of the price processes, P_min = 30 and P_max = 70. Moreover, [2] defines a discrete distribution called the pseudonormal distribution, characterized by five parameters: µ, σ², a, b, and a discretization increment δ. Let X be pseudonormally distributed (written X ∼ PN(µ, σ², a, b, δ)).

The support of X is defined to be X = {a, a + δ, a + 2δ, ..., b}, and for x_i ∈ X we have P(X = x_i) = f(x_i; µ, σ²) / Σ_{x_j ∈ X} f(x_j; µ, σ²), where f(·; µ, σ²) is the pdf of a normal random variable with mean µ and variance σ².

Three types of price processes are considered. Let ε^P_t ∼ PN(µ_P, σ_P², −8, 8, 1), ε^J_t ∼ PN(0, 50², −40, 40, 1) (for jumps), and u_t ∼ U(0, 1) be i.i.d. random variables.

1) Sinusoidal. P_t = min{ max{ sin(5πt/(2T)) + ε^P_t, P_min }, P_max }.
2) Markov chain. Let P_0 = P_min and P_{t+1} = min{ max{ P_t + ε^P_{t+1}, P_min }, P_max }.
3) Markov chain with jumps. Let P_0 = P_min and P_{t+1} = min{ max{ P_t + ε^P_{t+1} + 1{u_{t+1} ≤ p} ε^J_{t+1}, P_min }, P_max }.

We consider a Markov chain model for the wind process E_t. Define E_min = 1 and E_max = 7. The support of E_t consists of the values between E_min and E_max, discretized at a level given by a parameter δ_E. Let the ε^E_t be i.i.d. random variables that can be either uniformly or pseudonormally distributed, ε^E_t ∼ PN(µ_E, σ_E², −3, 3, δ_E), and set

E_{t+1} = min{ max{ E_t + ε^E_{t+1}, E_min }, E_max }.

Lastly, suppose that R^x_t takes values between 0 and R^max, discretized at a level δ_R. Table I summarizes the stochastic benchmark problems; for ε^E_t and ε^P_t, since a, b, and δ are defined the same way across all problems, we use PN(µ, σ²) as shorthand.

TABLE I. Parameter choices for the stochastic benchmark problems [2]. Each problem S1-S17 specifies the discretization levels δ_R and δ_E, the wind noise ε^E (uniform U(−1, 1) or pseudonormal PN(0, σ²)), the price process, and the price noise ε^P ∼ PN(0, σ²): problems S1-S4 use the sinusoidal price process with ε^P ∼ PN(0, 25²), S5-S15 use the Markov chain with jumps, and S16-S17 use the Markov chain without jumps.

VII. NUMERICAL RESULTS

Due to the more complex nature of the various approximation architectures, there are more computational issues associated with the API algorithms than with the AVI algorithm. The main difficulty arises in the policy improvement step (given in Step 2 of Figure 3, but the computational cost is actually realized in Step 2b of Figure 2). Due to the existence of local optima when solving Step 2 of Figure 3, the maximization problem is solved using grid search, a computationally expensive method. Because of these limitations, we are able to use approximately 12.5% of the state space for policy evaluation purposes and 10 policy improvement steps. On the other hand, the MADP algorithm uses matrix operations to manipulate a simple lookup table representation of states and can finish its iterations within 2 days of computation time. For a given post-decision VFA, V̄^x_t, we define the approximate policy as

X^π_t(S_t) = arg max_{x_t ∈ X_t} [ C(S_t, x_t) + V̄^x_t(S^x_t) ].
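A rough sketch (ours) of this grid-search policy is shown below; feasible_decisions, contribution, and transition are assumed user-supplied helpers consistent with Section III, and vfa is any callable estimate of the post-decision value function, such as one of the fitted models from the earlier policy evaluation sketch or a slice of the MADP lookup table.

```python
import numpy as np

def greedy_policy(S, vfa, feasible_decisions, contribution, transition):
    """Evaluate X^pi_t(S_t) = argmax_x [ C(S_t, x) + Vbar^x_t(S^x_t) ] by
    enumerating a finite grid of candidate decisions (grid search)."""
    _, W = S                                   # pre-decision state (resource, world)
    best_x, best_val = None, -np.inf
    for x in feasible_decisions(S):            # finite grid of feasible decisions
        R_post = transition(S, x)              # post-decision resource level, eq. (8)
        val = contribution(S, x) + vfa(R_post, W)
        if val > best_val:
            best_x, best_val = x, val
    return best_x
```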
To compute the value of the policy, we generate 1000 sample paths of the wind and price processes, and for each sample path, we follow the policy and sum the contributions. The value of the policy is then the average contribution over the 1000 sample paths. The percent of optimality is defined to be the value of the approximate policy divided by the value of the optimal (backward dynamic programming) policy. See [3, Section 4.9.4] for a detailed description of determining a policy's value. The results are given in Figure 6. SVR and MADP generate the highest quality policies, but it is noteworthy that SVR does not use problem-specific information while MADP does. Despite this, when considering the relative simplicity of the energy storage problem when compared to other real-world problems, results of 90% are not necessarily encouraging (GPR, LPR, and DCR often perform significantly worse than 90%). This suggests that care needs to be taken when combining API with a general-purpose approximation architecture: not just any approximation method will work.

Fig. 6. Benchmark results: percent of optimality attained by SVR, GPR, LPR, DCR, and MADP on the stochastic benchmark problems S1-S17.

VIII. EXPLOITING CONCAVITY

It needs to be pointed out that neither SVR nor MADP performs at the level of the ADP algorithm of [2] (98-99% optimality, see Figure 7), which exploits the piecewise linear concave nature of the value functions; the algorithm of [2] also uses a specific backward pass designed with this energy storage application in mind.

Although experiments are not shown in this paper, we want to stress that, while convergence theory exists, unstructured lookup table with AVI does not work for any reasonably large problems (the convergence rate is far too slow to be of any practical use). [20] and [21] show the benefits of taking advantage of structure, respectively. We would also like to note that AVI can also be used with other approximations beyond lookup table (with or without structure), such as basis functions, but it is shown in [28] that there is often a lack of a fixed point. Our results in this paper suggest that structured, problem-specific lookup table techniques also outperform other, more general approaches, such as API paired with a generic approximation technique. At this point, the numerical results suggest that structured lookup table is consistently effective on moderately sized problems, unlike any of the other methods that we tested. The caveat, of course, is that lookup table techniques do not scale to larger state spaces due to the requirement of storing a value estimate for every state. Not only that, it is typically the case that structure exists in only 1 or 2 dimensions of a higher-dimensional state variable (state-of-the-world variables quickly add dimensionality and there is no guarantee that they contain structure).

Fig. 7. Results from the ADP algorithm of [2], which exploits concavity and attains 98-99% optimality on the benchmark problems.

IX. DIRECT POLICY SEARCH

In this section, we review the somewhat surprising result from [1] that direct policy search (over a low-dimensional parametrized space of policies) yields better results than API-based algorithms. Several versions of API are discussed in the original paper [1]; here, we only reproduce the results for the best performing version, API with instrumental variables (IVAPI). The other version considered in [1] is least squares API (LSAPI). Quadratic basis functions are used for the approximation. Direct policy search is implemented using the knowledge gradient for continuous parameters (KGCP, see [29]). The structure of the policy is

X^π_t(S_t | θ) = arg max_{x_t ∈ X_t} [ C(S_t, x_t) + φ(S^x_t)^T θ ],

where φ is the vector of basis functions and θ is a vector of weights (the parameter of the policy). Note that although the second term resembles a VFA, the policy search technique has no notion of minimizing the distance between φ(S^x_t)^T θ and V^x_t(S^x_t). The reproduced results are shown in Figure 8. Although direct policy search seems robust in this application, we emphasize that this type of direct search does not easily scale to higher-dimensional parameter spaces.

Fig. 8. Direct policy search vs. IVAPI: percent of optimality on the benchmark problems of [1].

X. API SAMPLING DISTRIBUTION

In the implementations of API discussed in this paper, the sampling distribution used for Step 1 of the approximate policy evaluation step of Figure 2 is chosen to be a uniform distribution over the state space. One hypothesis to explain API's relatively poor performance is that instead of sampling uniformly, we can sample from the distribution of states visited under the optimal policy (say, given a deterministic initial state, S^x_0).

In most cases, this distribution is unknown; however, we are able to test this hypothesis on a simple problem with a computable optimal policy. We consider a version of our energy storage problem where the state variable is the scalar resource state R^x_t, combined with a quadratic approximation. When sampling uniformly, we consistently achieve policies that are 90%-95% optimal, but when sampling from the optimal policy's state distribution, we obtain policies that are anywhere from 40%-70% optimal. The primary reason that we observed for such low quality policies is that the optimal policy visits some states with very low to zero probability, causing the quadratic approximation to be very accurate for a portion of the state space but at the same time very poor in other portions of the state space. This often leads to policy oscillations (or chattering; see [30] for a discussion of this issue). Besides these preliminary observations, the issue of the correct sampling distribution remains a work in progress.

XI. CONCLUSION

In this paper, we describe a simple finite horizon energy storage and allocation problem that is subject to stochastic prices and wind supply, with the purpose of comparing the performance of several ADP algorithms. We consider API algorithms that take advantage of the following approximation architectures: SVR, GPR, LPR, and DCR. In addition, we test an AVI algorithm that exploits the known monotonicity of the problem, MADP. We draw the following conclusions from this and related papers. API performs decently well with SVR, but poorly with the other approximation architectures that we considered; however, given the simplicity of the problem, even the results from SVR are not too encouraging. Pure lookup table AVI performs poorly in practice, despite the convergence theory (see [21], [20]). Structured lookup table AVI (concavity or monotonicity, but especially concavity) works extremely well, but is limited to a low-dimensional state-of-the-world variable (see [2]). Direct policy search also displays superior performance compared to API-based methods (see [1]), but cannot scale to policies requiring a large number of parameters; in particular, direct policy search is generally not suitable for time-dependent policies. From this, we can conclude that none of these techniques works reliably in a way that would scale to more complex problems. Therefore, we believe that new theory and methodology need to be developed in order to solve real-world sequential decision problems, which are becoming increasingly difficult.

REFERENCES

[1] W. Scott and W. B. Powell, "Approximate dynamic programming for energy storage with new results on instrumental variables and projected Bellman errors," working paper.
[2] D. Salas and W. B. Powell, "Benchmarking a scalable approximate dynamic programming algorithm for stochastic control of multidimensional energy storage problems," working paper.
[3] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. Wiley.
[4] F. L. Lewis and D. Vrabie, "Learning and adaptive dynamic programming for feedback control," IEEE Circuits Syst. Mag., vol. 9, no. 3.
[5] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.
[6] J. H. Kim and W. B. Powell, "Optimal energy commitments with storage and intermittent supply," Operations Research, vol. 59, no. 6.
[7] R. Carmona and M. Ludkovski, "Valuation of energy storage: An optimal switching approach," Quantitative Finance, vol. 10, no. 4.
[8] M. Thompson, M. Davison, and H. Rasmussen, "Natural gas storage valuation and optimization: A real options application," Naval Research Logistics, vol. 56, no. 3.
[9] G. Pritchard, B. Philpott, and J. Neame, "Hydroelectric reservoir optimization in a pool market."
[10] N. Löhndorf, D. Wozabal, and S. Minner, "Optimizing trading decisions for hydro storage systems using approximate dual dynamic programming," Operations Research, vol. 61, no. 4.
[11] A. Philpott and Z. Guan, "On the convergence of stochastic dual dynamic programming and related methods," Operations Research Letters, vol. 36, no. 4.
[12] R. Sioshansi, S. H. Madaeni, and P. Denholm, "A dynamic programming approach to estimate the capacity value of energy storage," IEEE Transactions on Power Systems, vol. 29, no. 1.
[13] J. M. Nascimento and W. B. Powell, "An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem," Mathematics of Operations Research, vol. 34, no. 1.
[14] N. Secomandi, "Optimal commodity trading with a capacitated storage asset," Management Science, vol. 56, no. 3.
[15] L. Hannah and D. Dunson, "Approximate dynamic programming for storage problems," in Proceedings of the 29th International Conference on Machine Learning.
[16] J. M. Nascimento and W. B. Powell, "An optimal approximate dynamic programming algorithm for concave, scalar storage problems with vector-valued controls," IEEE Transactions on Automatic Control, vol. 58, no. 12.
[17] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley.
[18] D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," vol. 25, no. 3.
[19] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning. MIT Press.
[20] J. M. Nascimento and W. B. Powell, "Dynamic programming models and algorithms for the mutual fund cash balance problem," Management Science, vol. 56, no. 5.
[21] D. R. Jiang and W. B. Powell, "Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming," arXiv preprint.
[22] D. R. Jiang and W. B. Powell, "An approximate dynamic programming algorithm for monotone value functions," arXiv preprint.
[23] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems, no. 9.
[24] A. J. Smola and B. Schölkopf, "A tutorial on support vector regression."
[25] C. E. Rasmussen, Gaussian Processes for Machine Learning.
[26] W. Cleveland and S. Devlin, "Locally weighted regression: An approach to regression analysis by local fitting," Journal of the American Statistical Association, vol. 83, no. 403.
[27] A. A. Jamshidi and W. B. Powell, "A recursive local polynomial approximation method using Dirichlet clouds and radial basis functions," working paper.
[28] D. De Farias and B. Van Roy, "On the existence of fixed points for approximate value iteration and temporal-difference learning," Journal of Optimization Theory and Applications, vol. 105, no. 3.
[29] W. Scott, P. I. Frazier, and W. B. Powell, "The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression," SIAM Journal on Optimization, vol. 21, no. 3, p. 996.
[30] D. P. Bertsekas, "Approximate policy iteration: A survey and some new methods," Journal of Control Theory and Applications, 2011.


Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Capacity Constraints IJCSI Inernaional Journal of Compuer Science Issues, Vol 9, Issue 1, No 1, January 2012 wwwijcsiorg 18 Applying Geneic Algorihms for Invenory Lo-Sizing Problem wih Supplier Selecion under Sorage Capaciy

More information

Operations Research. An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Operations Research. An Approximate Dynamic Programming Algorithm for Monotone Value Functions This aricle was downloaded by: [140.1.241.64] On: 05 January 2016, A: 21:41 Publisher: Insiue for Operaions Research and he Managemen Sciences (INFORMS) INFORMS is locaed in Maryland, USA Operaions Research

More information

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach 1 Decenralized Sochasic Conrol wih Parial Hisory Sharing: A Common Informaion Approach Ashuosh Nayyar, Adiya Mahajan and Demoshenis Tenekezis arxiv:1209.1695v1 [cs.sy] 8 Sep 2012 Absrac A general model

More information

Random Walk with Anti-Correlated Steps

Random Walk with Anti-Correlated Steps Random Walk wih Ani-Correlaed Seps John Noga Dirk Wagner 2 Absrac We conjecure he expeced value of random walks wih ani-correlaed seps o be exacly. We suppor his conjecure wih 2 plausibiliy argumens and

More information

Západočeská Univerzita v Plzni, Czech Republic and Groupe ESIEE Paris, France

Západočeská Univerzita v Plzni, Czech Republic and Groupe ESIEE Paris, France ADAPTIVE SIGNAL PROCESSING USING MAXIMUM ENTROPY ON THE MEAN METHOD AND MONTE CARLO ANALYSIS Pavla Holejšovsá, Ing. *), Z. Peroua, Ing. **), J.-F. Bercher, Prof. Assis. ***) Západočesá Univerzia v Plzni,

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Tom Heskes and Onno Zoeter. Presented by Mark Buller

Tom Heskes and Onno Zoeter. Presented by Mark Buller Tom Heskes and Onno Zoeer Presened by Mark Buller Dynamic Bayesian Neworks Direced graphical models of sochasic processes Represen hidden and observed variables wih differen dependencies Generalize Hidden

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

Scheduling of Crude Oil Movements at Refinery Front-end

Scheduling of Crude Oil Movements at Refinery Front-end Scheduling of Crude Oil Movemens a Refinery Fron-end Ramkumar Karuppiah and Ignacio Grossmann Carnegie Mellon Universiy ExxonMobil Case Sudy: Dr. Kevin Furman Enerprise-wide Opimizaion Projec March 15,

More information

A Dynamic Model of Economic Fluctuations

A Dynamic Model of Economic Fluctuations CHAPTER 15 A Dynamic Model of Economic Flucuaions Modified for ECON 2204 by Bob Murphy 2016 Worh Publishers, all righs reserved IN THIS CHAPTER, OU WILL LEARN: how o incorporae dynamics ino he AD-AS model

More information

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite American Journal of Operaions Research, 08, 8, 8-9 hp://wwwscirporg/journal/ajor ISSN Online: 60-8849 ISSN Prin: 60-8830 The Opimal Sopping Time for Selling an Asse When I Is Uncerain Wheher he Price Process

More information

3.1 More on model selection

3.1 More on model selection 3. More on Model selecion 3. Comparing models AIC, BIC, Adjused R squared. 3. Over Fiing problem. 3.3 Sample spliing. 3. More on model selecion crieria Ofen afer model fiing you are lef wih a handful of

More information

O Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t.

O Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t. Econ. 5b Spring 999 C. Sims Discree-Time Sochasic Dynamic Programming 995, 996 by Chrisopher Sims. This maerial may be freely reproduced for educaional and research purposes, so long as i is no alered,

More information

Pade and Laguerre Approximations Applied. to the Active Queue Management Model. of Internet Protocol

Pade and Laguerre Approximations Applied. to the Active Queue Management Model. of Internet Protocol Applied Mahemaical Sciences, Vol. 7, 013, no. 16, 663-673 HIKARI Ld, www.m-hikari.com hp://dx.doi.org/10.1988/ams.013.39499 Pade and Laguerre Approximaions Applied o he Acive Queue Managemen Model of Inerne

More information

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems Paricle Swarm Opimizaion Combining Diversificaion and Inensificaion for Nonlinear Ineger Programming Problems Takeshi Masui, Masaoshi Sakawa, Kosuke Kao and Koichi Masumoo Hiroshima Universiy 1-4-1, Kagamiyama,

More information

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8)

Econ107 Applied Econometrics Topic 7: Multicollinearity (Studenmund, Chapter 8) I. Definiions and Problems A. Perfec Mulicollineariy Econ7 Applied Economerics Topic 7: Mulicollineariy (Sudenmund, Chaper 8) Definiion: Perfec mulicollineariy exiss in a following K-variable regression

More information

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS CENRALIZED VERSUS DECENRALIZED PRODUCION PLANNING IN SUPPLY CHAINS Georges SAHARIDIS* a, Yves DALLERY* a, Fikri KARAESMEN* b * a Ecole Cenrale Paris Deparmen of Indusial Engineering (LGI), +3343388, saharidis,dallery@lgi.ecp.fr

More information

A Shooting Method for A Node Generation Algorithm

A Shooting Method for A Node Generation Algorithm A Shooing Mehod for A Node Generaion Algorihm Hiroaki Nishikawa W.M.Keck Foundaion Laboraory for Compuaional Fluid Dynamics Deparmen of Aerospace Engineering, Universiy of Michigan, Ann Arbor, Michigan

More information

On-line Adaptive Optimal Timing Control of Switched Systems

On-line Adaptive Optimal Timing Control of Switched Systems On-line Adapive Opimal Timing Conrol of Swiched Sysems X.C. Ding, Y. Wardi and M. Egersed Absrac In his paper we consider he problem of opimizing over he swiching imes for a muli-modal dynamic sysem when

More information

Online Convex Optimization Example And Follow-The-Leader

Online Convex Optimization Example And Follow-The-Leader CSE599s, Spring 2014, Online Learning Lecure 2-04/03/2014 Online Convex Opimizaion Example And Follow-The-Leader Lecurer: Brendan McMahan Scribe: Sephen Joe Jonany 1 Review of Online Convex Opimizaion

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

A Hop Constrained Min-Sum Arborescence with Outage Costs

A Hop Constrained Min-Sum Arborescence with Outage Costs A Hop Consrained Min-Sum Arborescence wih Ouage Coss Rakesh Kawara Minnesoa Sae Universiy, Mankao, MN 56001 Email: Kawara@mnsu.edu Absrac The hop consrained min-sum arborescence wih ouage coss problem

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN Inernaional Journal of Scienific & Engineering Research, Volume 4, Issue 10, Ocober-2013 900 FUZZY MEAN RESIDUAL LIFE ORDERING OF FUZZY RANDOM VARIABLES J. EARNEST LAZARUS PIRIYAKUMAR 1, A. YAMUNA 2 1.

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé

Bias in Conditional and Unconditional Fixed Effects Logit Estimation: a Correction * Tom Coupé Bias in Condiional and Uncondiional Fixed Effecs Logi Esimaion: a Correcion * Tom Coupé Economics Educaion and Research Consorium, Naional Universiy of Kyiv Mohyla Academy Address: Vul Voloska 10, 04070

More information

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006

2.160 System Identification, Estimation, and Learning. Lecture Notes No. 8. March 6, 2006 2.160 Sysem Idenificaion, Esimaion, and Learning Lecure Noes No. 8 March 6, 2006 4.9 Eended Kalman Filer In many pracical problems, he process dynamics are nonlinear. w Process Dynamics v y u Model (Linearized)

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Control of Stochastic Systems - P.R. Kumar

CONTROL SYSTEMS, ROBOTICS AND AUTOMATION Vol. XI Control of Stochastic Systems - P.R. Kumar CONROL OF SOCHASIC SYSEMS P.R. Kumar Deparmen of Elecrical and Compuer Engineering, and Coordinaed Science Laboraory, Universiy of Illinois, Urbana-Champaign, USA. Keywords: Markov chains, ransiion probabiliies,

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION DOI: 0.038/NCLIMATE893 Temporal resoluion and DICE * Supplemenal Informaion Alex L. Maren and Sephen C. Newbold Naional Cener for Environmenal Economics, US Environmenal Proecion

More information

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t

R t. C t P t. + u t. C t = αp t + βr t + v t. + β + w t Exercise 7 C P = α + β R P + u C = αp + βr + v (a) (b) C R = α P R + β + w (c) Assumpions abou he disurbances u, v, w : Classical assumions on he disurbance of one of he equaions, eg. on (b): E(v v s P,

More information

ONLINE SUPPLEMENT: AN APPROXIMATE DYNAMIC PROGRAMMING ALGORITHM FOR MONOTONE VALUE FUNCTIONS. 1. Preliminaries

ONLINE SUPPLEMENT: AN APPROXIMATE DYNAMIC PROGRAMMING ALGORITHM FOR MONOTONE VALUE FUNCTIONS. 1. Preliminaries ONLINE SUPPLEMENT: AN APPROXIMATE DYNAMIC PROGRAMMING ALGORITHM FOR MONOTONE VALUE FUNCTIONS DANIEL R. JIANG AND WARREN B. POWELL Absrac. In his online supplemen we provide he proofs o a condiion for monooniciy

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

Global Optimization for Scheduling Refinery Crude Oil Operations

Global Optimization for Scheduling Refinery Crude Oil Operations Global Opimizaion for Scheduling Refinery Crude Oil Operaions Ramkumar Karuppiah 1, Kevin C. Furman 2 and Ignacio E. Grossmann 1 (1) Deparmen of Chemical Engineering Carnegie Mellon Universiy (2) Corporae

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

Sequential Importance Resampling (SIR) Particle Filter

Sequential Importance Resampling (SIR) Particle Filter Paricle Filers++ Pieer Abbeel UC Berkeley EECS Many slides adaped from Thrun, Burgard and Fox, Probabilisic Roboics 1. Algorihm paricle_filer( S -1, u, z ): 2. Sequenial Imporance Resampling (SIR) Paricle

More information

A Forward-Backward Splitting Method with Component-wise Lazy Evaluation for Online Structured Convex Optimization

A Forward-Backward Splitting Method with Component-wise Lazy Evaluation for Online Structured Convex Optimization A Forward-Backward Spliing Mehod wih Componen-wise Lazy Evaluaion for Online Srucured Convex Opimizaion Yukihiro Togari and Nobuo Yamashia March 28, 2016 Absrac: We consider large-scale opimizaion problems

More information

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems.

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems. di ernardo, M. (995). A purely adapive conroller o synchronize and conrol chaoic sysems. hps://doi.org/.6/375-96(96)8-x Early version, also known as pre-prin Link o published version (if available):.6/375-96(96)8-x

More information

OBJECTIVES OF TIME SERIES ANALYSIS

OBJECTIVES OF TIME SERIES ANALYSIS OBJECTIVES OF TIME SERIES ANALYSIS Undersanding he dynamic or imedependen srucure of he observaions of a single series (univariae analysis) Forecasing of fuure observaions Asceraining he leading, lagging

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

MATHEMATICAL DESCRIPTION OF THEORETICAL METHODS OF RESERVE ECONOMY OF CONSIGNMENT STORES

MATHEMATICAL DESCRIPTION OF THEORETICAL METHODS OF RESERVE ECONOMY OF CONSIGNMENT STORES MAHEMAICAL DESCIPION OF HEOEICAL MEHODS OF ESEVE ECONOMY OF CONSIGNMEN SOES Péer elek, József Cselényi, György Demeer Universiy of Miskolc, Deparmen of Maerials Handling and Logisics Absrac: Opimizaion

More information

DEPARTMENT OF STATISTICS

DEPARTMENT OF STATISTICS A Tes for Mulivariae ARCH Effecs R. Sco Hacker and Abdulnasser Haemi-J 004: DEPARTMENT OF STATISTICS S-0 07 LUND SWEDEN A Tes for Mulivariae ARCH Effecs R. Sco Hacker Jönköping Inernaional Business School

More information

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem.

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem. Noes, M. Krause.. Problem Se 9: Exercise on FTPL Same model as in paper and lecure, only ha one-period govenmen bonds are replaced by consols, which are bonds ha pay one dollar forever. I has curren marke

More information