We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and

Size: px
Start display at page:

Download "We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and"

Transcription

1 MANAGEMENT SCIENCE Vol. 53, No. 2, Februry 2007, pp ssn essn nforms do /mnsc INFORMS Bs nd Vrnce Approxmton n Vlue Functon Estmtes She Mnnor Deprtment of Electrcl nd Computer Engneerng, McGll Unversty, Montrel, Quebec, Cnd H3A 2A7, she@ece.mcgll.c Duncn Smester Slon School of Mngement, Msschusetts Insttute of Technology, Cmbrdge, Msschusetts 02139, smester@mt.edu Peng Sun Fuqu School of Busness, Duke Unversty, Durhm, North Croln 27708, psun@duke.edu John N. Tstskls Lbortory for Informton nd Decson Systems, Msschusetts Insttute of Technology, Cmbrdge, Msschusetts 02139, nt@mt.edu We consder fnte-stte, fnte-cton, nfnte-horzon, dscounted rewrd Mrkov decson process nd study the bs nd vrnce n the vlue functon estmtes tht result from emprcl estmtes of the model prmeters. We provde closed-form pproxmtons for the bs nd vrnce, whch cn then be used to derve confdence ntervls round the vlue functon estmtes. We llustrte nd vldte our fndngs usng lrge dtbse descrbng the trnscton nd mlng hstores for customers of ml-order ctlog frm. Key words: vlue functon; confdence ntervl; vrnce; bs Hstory: Accepted by Wllce J. Hopp, stochstc models nd smulton; receved July 14, Ths pper ws wth the uthors 4 1 months for 4 revsons Introducton Bellmn s vlue functon plys centrl role n the optmzton of dynmc decson-mkng models, s well s n the structurl estmton of dynmc models of rtonl gents. For the mportnt cse of fnte-stte Mrkov decson process (MDP), the vlue functon depends on two types of model prmeters: the trnston probbltes between sttes nd the expected one-step rewrds from ech stte. In mny pplctons n the socl scences nd n engneerng, the trnston probbltes nd expected rewrds re not known nd nsted must be estmted from fnte smples of dt. The estmton errors for these prmeters ntroduce bs nd vrnce n the vlue functon estmtes. In ths pper, we present methodology for evlutng ths bs nd vrnce. Ths, n turn, llows the clculton of confdence ntervls round the vlue functon estmtes. The confdence ntervls re themselves pproxmtons. For nlytcl nd computtonl trctblty, they rely on second-order Tylor seres pproxmtons. Moreover, becuse the expressons for the bs nd the vrnce pproxmton requre the true but unknown model prmeters, we replce these unknown prmeters by ther estmtes. We evlute the ccurcy of these pproxmtons nd vldte the expressons usng lrge smple of rel dt obtned from ml-order ctlog compny Sources of Vrnce We strt by dstngushng between two types of vrnce tht cn rse n n MDP: nternl nd prmetrc. Internl vrnce reflects the stochstcty n the trnstons nd rewrds. For exmple, n mrketng settng there s rrely certnty s to whether n ndvdul customer wll purchse, resultng n genunely stochstc trnstons nd rewrds. Prmetrc vrnce rses f the true trnston probbltes nd expected rewrds re estmted rther thn known; the potentl for error n the estmtes of these prmeters ntroduces vrnce n the vlue functon estmtes. The two types of vrnce hve dfferent sources nd cn be llustrted through dfferent experments. To llustrte nternl vrnce, we cn fx the model prmeters nd then generte number of fntelength smple trectores (wth ll trectores hvng the sme length, strtng from the sme stte, nd usng common control polcy). The vrton cross smple trectores n the totl rewrds nd/or the dentty of the fnl stte reflects nternl vrnce. In 308

2 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes Mngement Scence 53(2), pp , 2007 INFORMS 309 contrst, ggregton cross smples does not mtgte prmetrc vrnce. The ltter cn be llustrted by comprng the verge outcomes from lrge number of smples generted under dfferent estmtes for the model prmeters. The vrton n the verge outcomes under dfferent estmtes reflects prmetrc vrnce. Internl vrnce hs lredy been consdered n the lterture. In prtculr, Sobel (1982) provdes n expresson for the nternl vrnce n n MDP wth dscounted rewrds, whle Flr et l. (1989) nd Bukl-Gursoy nd Ross (1992) consder the verge rewrd crteron. In ths pper, we focus on prmetrc vrnce. Our motvton s tht n mny contexts the underlyng obectve nvolves vergng outcomes cross lrge number of smples, n whch cse the nternl vrnce s verged out. For exmple, n mrketng pplcton, frm profts typclly represent the ggregton of outcomes cross lrge number of customers. Smlrly, n lbor economcs settng, frm often ggregtes cross lrge number of employees. Of course, there re settngs where nternl vrnce s lso mportnt. For exmple, when lloctng fnncl portfolos, the (nternl) vrnce of the return on sngle fnncl portfolo s mportnt n ts own rght Lterture Mrkov decson problems nd the ssocted methodology of dynmc progrmmng hve found brod rnge of pplctons n numerous felds n the socl scences nd n engneerng. These pplctons cn be brodly dvded n two ctegores bsed on the reserch obectves. The frst nd more trdtonl ctegory of pplctons focuses on optmzng the operton of humn or engneerng systems nd on provdng tools for effectve decson mkng. The pplcton res re vst, nd nclude fnnce (Luenberger 1997, Cmpbell nd Vcer 2002), economcs (Dxt nd Pndyck 1994), nventory control nd supply chn mngement (Zpkn 2000), revenue nd yeld mngement (McGll nd vn Ryzn 1999), trnsportton (Godfrey nd Powell 2002), communctons, wter resource mngement, nd electrc power systems. The vst morty of ths lterture ssumes tht n ccurte system model s vlble. There s n underlyng mplct ssumpton tht the true model wll be estmted usng sttstcl methods on the bss of whtever dt re vlble. However, the sttstcl rmfctons of workng wth fnte dt records hve receved lttle ttenton. An excepton s the lterture delng wth onlne lernng of optml polces (dptve control of Mrkov chns, renforcement lernng) (Sutton nd Brto 1998, Bertseks nd Tstskls 1996). However, ths lterture s concerned wth symptotc convergence s opposed to the common sttstcl questons of stndrd errors nd confdence ntervls. The second ctegory of pplctons focuses on explnng observed phenomen. Among the most wdely cted exmples s the work of Rust (1987), who develops dscrete dynmc progrmmng model of the optml replcement polcy for bus engnes. Accordng to ths pproch, the resercher strts by ssumng tht ndvduls or frms behve optmlly, but tht the prmeters of the frm or the customer decson problem re unknown. By mxmzng the lkelhood of the emprclly observed ctons of ndvduls or frms under the optml polces for dfferent sets of prmeters, the resercher seeks to dentfy these unobserved prmeters. Smlr pplctons of dscrete dynmc progrmmng models hve become ncresngly common, prtculrly n the lbor (Kene nd Wolpn 1994), ndustrl orgnzton (Hendel nd Nevo 2002), nd mrketng (Gönül nd Sh 1998) ltertures. Whle these methods use vrety of pproches to clculte or pproxmte the vlue functon, the vlue functon reles on pont estmtes of the model prmeters. Prevous ttempts to consder the mpct of prmeter error on the clculted vlue functon hve been lmted to smulton-bsed pproches. We fnlly note tht the mpct of uncertnty n the model prmeters on the ccurcy of the vlue functon estmtes hs receved ttenton n the fnnce lterture. For exmple, X (2001) nd Brbers (2000) nvestgte how dynmc lernng bout stock return predctblty ffects optml portfolo lloctons. The generl problem consdered n these studes s smlr to the one ddressed n ths pper. However, the sources of vrnce re dfferent. In prtculr, the fnnce lterture s concerned wth nternl vrnce due to the stochstcty n the underlyng process nd prmetrc vrnce due to nonsttonrty of the model prmeters, ncludng chnges n the nvestment horzon nd/or dynmc lernng. In contrst, we bstrct wy from the problem of nternl vrnce, ssume tht the model prmeters re sttonry, nd focus on the prmetrc vrnce tht results from estmtng the model prmeters from fnte smple of dt Overvew As fr s we know, ths s the frst pper to study prmetrc bs nd vrnce n MDPs. It serves two purposes: Frst, to llustrte the potentl for error n vlue functon estmtes nd to hghlght the potentl mgntude of these errors; second, to provde formuls nd methodology for estmtng the bs nd vrnce n vlue functon estmtes, whch cn then be used to construct confdence ntervls round the vlue functon estmtes.

3 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes 310 Mngement Scence 53(2), pp , 2007 INFORMS We begn wth some nottons nd bckground mterl n 2. In 3, we llustrte the reltonshp between errors n the model prmeters nd the ccurcy of vlue functon estmtes, usng ctul dt from ctlog mlng context. In 4, we present methodology for estmtng the bs nd vrnce n the vlue functon estmtes. In 5, we vldte our methodology usng the ctlog mlng dt. We conclude n 6 wth revew of the fndngs nd dscusson of opportuntes for future reserch. 2. A Forml Descrpton of the Problem We consder fnte-stte, fnte-cton, nfnte-horzon, dscounted rewrd MDP, where S denotes the set of sttes of crdnlty m, A s the set of ctons, 0 1 s the dscount fctor, nd P nd R S A denote trnston probbltes nd the condtonl expected rewrds. The sclrs P nd R re nterpreted s follows: f the current stte s nd cton s ppled, then the next stte s wth probblty P ; furthermore, gven tht trnston from to occurs followng n cton equl to, rndom rewrd s obtned, whose condtonl expectton s equl to R. Note tht f cton s ppled t stte, the expected rewrd, denoted by R, s equl to P R. We re nterested n the vlue functon ssocted wth sttonry, Mrkovn, possbly rndomzed, fxed polcy. The ssumpton tht the polcy s fxed llows us to ntlly bstrct wy from the control problem. As we dscuss n 4.2, the mpct of prmeter uncertnty on the soluton to the control problem rses ddtonl ssues. We use to denote the condtonl probblty of pplyng cton when t stte. Let P = P, whch s the trnston probblty from to. The superscrpt heres slght buse of notton. It wll be cler from the context tht the Greek letter s superscrpt ndctes tht the prmeter s functon wth polcy s n rgument. Smlrly, denote R = R = P R (1) whch s the expected rewrd t stte, under the polcy. We use P to denote the m m mtrx wth entres P, nd R to denote the m-dmensonl vector wth components R. Defne the vlue functon ssocted wth polcy to be the m-dmensonl vector gven by Y = k P k R = I P 1 R R In our settng, the true model prmeters, P nd, re not known. Insted, we hve ccess to fnte smple of dt from whch these prmeters cn be estmted. Specfclly, ssume tht for every nd, we hve record of N trnstons out of stte, under cton, nd the ssocted rewrds. We tret the numbers N s fxed (not s rndom vrbles), nd ssume tht N > 0 for every nd. Ths lst ssumpton restrcts ttenton to ctons tht hve been tred before. For t lest two resons, we ntcpte tht ths wll be reltvely wek ssumpton n prctce. Frst, the nblty to evlute ctons n one stte does not restrct our blty to evlute the sme cton n other sttes becuse we cn stll evlute n cton t ny stte where the cton hs been tred before. Thus, the restrcton only pples to sttes n whch there s no pst nformton bout the outcome. Second, there s tremendous mount of vrton n hstorcl polces n mny rel-world pplctons. Ths vrton my rse for lot of resons ncludng expermentton, mplementton errors, or nonsttonrty n the polcy. If there s nterest n untred ctons nd there re prors vlble to help predct the outcome, then Byesn pproch cn be used. For completeness, we detl such n pproch n Appendx D (provded n the e-compnon). 1 Furthermore, we do not ssume ny relton between the smplng process nd the polcy of nterest; n prtculr, the N for dfferent need not be proportonl to, nd the number N = N of trnstons out of stte need not be relted to the stedy-stte probblty of stte under polcy. For the N trnstons out of stte under cton n the smple dt, let N be the number of trnstons tht led to stte. Furthermore, let C be the sum of the rewrds ssocted wth these N trnstons (for completeness, we defne C = 0fN = 0). We defne P = N R = C N N whch wll be our estmtes of P nd R, respectvely. When N = 0, we defne R = 0. The possblty of N beng zero for fesble trnstons ntroduces some ddtonl bs, whch wll not be ccounted for. However, n our nlyss, we wll ssume tht ny trnston wth N = 0 s nfesble. In ddton, we defne nd R = P P R = = C N P R = R (2) 1 An electronc compnon to ths pper s vlble s prt of the onlne verson tht cn be found t nforms.org/.

4 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes Mngement Scence 53(2), pp , 2007 INFORMS 311 whch wll be our estmtes of P, R, nd R, respectvely. We fnlly defne mtrx P nd vector R, wth entres P nd R, respectvely, whch wll be our estmtes of P nd R. Bsed on these estmtes, we obtn n estmted vlue functon Y, gven by Y = I P 1 R (3) We ssume tht the smple dt reflect the true process n the followng sense. The vector N1 N m follows multnoml dstrbuton wth prmeters N P 1 P m. Let Ɛ denote expectton under the true model. We then hve Ɛ N = N P. A lst ssumpton tht reflects our erler ssumptons tht N s fxed nd tht ech smple rewrd s condtonlly ndependent from the pst, s tht Ɛ C N =. Under these ssumptons, t s esly verfed N R tht P nd R re unbsed estmtes of P nd R. Bsed on Equton (3), we cn ntcpte the mpct of errors n P nd R on Y. Note frst tht Y s lner n R, so tht f P were observed wthout error (.e., f P = P ), the vrnce of R would led to vrnce n Y but not to bs (becuse R s unbsed). In contrst, Y s nonlner n P, so tht errors n P led to both bs nd vrnce n Y. Moreover, due to the mtrx nverson the nonlnerty s substntl, so tht ny error n P cn trnslte to lrge error n Y. Ths s prtculrly true when s close to one. Furthermore, f the errors n P nd R re correlted, the nonlnerty mples tht errors n R wll lso led to bs n Y. 3. An Illustrton To llustrte the bs nd vrnce tht cn be ntroduced to vlue functon estmtes by errors n the model prmeters, we use rel dt from ml-order ctlog compny. Whle ths pplcton serves s useful cse study, our fndngs re not lmted to ths pplcton. Decdng who should receve ctlog s mongst the most mportnt decsons tht ml-order compnes must ddress. Yet, dentfyng n optml mlng polcy s dffcult tsk. Customer response functons re hghly stochstc, reflectng n prt the reltve pucty of nformton tht frms hve bout ech customer. Moreover, the problem s dynmc one. Purchsng decsons re nfluenced not ust by the frm s most recent mlng decson, but lso by pror mlng decsons. As result, the optml mlng decson depends on pst nd future mlng decsons. A typcl ctlog compny mght ml 25 ctlogs per yer. The number of ctlogs, the dtes tht they re mled, nd the content of the ctlogs re determned up to yer before the frm decdes to whom ech ctlog wll be mled. For ths reson, these decsons re typclly treted s fxed when decdng who to ml to. Accordngly, the frm only needs to decde whch customers to ml to, on ech exogenously determned mlng dte ( dscrete nfntehorzon problem). The frm s obectve s to mxmze ts expected totl dscounted profts. Rewrds (profts) n ech perod re clculted s the revenue erned from customer purchses (f ny) less the cost of the goods sold nd the mlng costs (pproxmtely 65 cents per ctlog mled). To support ther mlng decsons, ctlog frms typclly mntn lrge dtbses descrbng the ndvdul purchse nd mlng hstores for ech customer. We re fortunte to hve ccess to lrge dtbse descrbng the trnscton nd mlng hstores for the women s pprel dvson of modertely lrge ctlog compny. Ths dt s descrbed n detl n Smester et l. (2006). It ncludes the complete trnscton hstores for pproxmtely 1.72 mllon customers. The mlng hstores re complete for the sx-yer perod from 1996 through 2002 (the compny dd not mntn record of the mlng hstory pror to 1996). Ctlogs were mled on 133 occsons n ths sx-yer perod, so tht on verge mlng decson occurred every two to three weeks. The ctlog mlng problem cn be modelled s n MDP (s n Gönül nd Sh 1998), where the stte s summry of the customer s hstory, nd the cton t ech perod s to ether ml or not ml. The constructon of the stte spce s n nterestng problem tht we wll not consder here. We wll nsted follow stndrd ndustry pproch to ths problem tht uses three stte vrbles, the so-clled RFM mesures (e.g., Bult nd Wnsbeek 1995, Btrn nd Mondschen 1996). These mesures descrbe the recency, frequency, nd monetry vlue of customers pror purchses. Recency s mesured s the number of dys (n hundreds) snce customer s lst purchse. Frequency mesures the number of tems tht customers prevously purchsed. Monetry vlue mesures the verge prce (n dollrs) of the tems ordered by ech customer. For the purposes of ths llustrton, we constructed stte spce by quntzng ech of the RFM vrbles to four dscrete levels, yeldng stte spce wth S = 4 3 = 64 sttes. At ech hstorcl mlng epoch, we evlute the RFM vrbles of ech customer (regrdless of whether the customer receved ctlog or mde purchse) nd chrcterze hm/her nto one of the 64 sttes. We lso tret the purchse mount (zero f no purchse n the epoch) less the mlng cost s rewrd smple. Therefore, ech customer s hstorcl dt over tme serves s smple trectory. Followng the procedure descrbed n the prevous secton, we my then estmte the model prmeters P nd R nd clculte Y for the current polcy embedded n dt.

5 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes 312 Mngement Scence 53(2), pp , 2007 INFORMS Becuse the frm s nterested n the verge proft per customer rther thn the proft erned from n ndvdul customer, nternl vrnce verges out. However, prmetrc vrnce s of nterest becuse t ffects the comprson of dfferent polces. In prtculr, when evlutng new polcy, the frm would lke both predcton of the expected profts from doptng the new polcy, together wth confdence bounds round tht predcton. To llustrte the mpct of prmetrc vrnce, we rndomly dvded the 1.72 mllon customers nd 164 mllon observtons nto 250 eqully-szed subsmples, ech contnng pproxmtely 657,000 observtons. By observton, we men mlng perod nd n ssocted stte trnston n the hstory of customer, rrespectve of whether ctlog ws mled or purchse ws mde durng tht tme perod. We then seprtely estmted the model prmeters P nd R followng 2 usng the observtons from ech of these subsmples. Here, we consdered the polcy to be the sme s the smplng polcy tht generted the dt. Usng Equton (3), we clculted 250 estmtes of the vlue functon. As benchmrk, we lso estmted the model prmeters usng the full smple of 1.72 mllon customers. For the purposes of ths llustrton, we wll nterpret the model estmted usng the full smple s the true model, whch s essentlly equvlent to ssumng tht the 1.72 mllon customers re the full populton. Thus, wthn typcl subsmple, the expected rewrd n ech stte R ws estmted usng n verge of pproxmtely 10,000 observtons (N ), whle the trnston mtrx P ws estmted usng n verge of 160 observtons per trnston. In prctce, most of the trnstons re nfesble; for exmple, customer cnnot trnston from hvng three pror purchses to only hvng two pror purchses. When lmtng ttenton to only those trnstons tht re fesble, the verge number of observtons per trnston ws pproxmtely 1,400. (The verge of the postve N s s round 1,400.) In Fgure 1, we report the emprcl dstrbuton (hstogrm) of the vlue functon Y cross ll 250 subsmples under the hstorcl polcy used by the frm (s clculted usng the whole smple). To summrze n estmted vlue functon wth sngle number, we verge the estmtes cross sttes for ech subsmple, weghng ech stte eqully. We wll refer to ths mesure s the verge vlue functon (AVF). We use equl weghts to ncrese the clrty of llustrton. By usng equl weghts (s wth ny fxed weghts), we vod potentlly ntroducng n ddtonl source of vrnce due to the weghts themselves beng rndom vrbles. The true AVF, computed from the prmeters estmted for the full smple, s $ In comprson, the verge of the Fgure 1 Number of subsmples Ml Ctlog Problem: A Hstogrm of the AVF of the Hstorcl Polcy for Prtton of the Customers to 250 Subsmples AVF per subsmple Note. The dscount fctor per perod s = The polcy used s the hstorcl (mxed) polcy used by the frm, nd the vlue functon s weghted unformly cross sttes. The AVF obtned from the full dt s $28 54, nd s plotted s vertcl lne. The emprcl stndrd devton s $ estmtes s $28 65, wth n emprcl stndrd devton of $0 97. The dfference between $28 54 nd $28 65 s not sttstclly sgnfcnt nd s of seemngly lttle mngerl mportnce. However, the vrnce s potentlly very mportnt. The 95% confdence ntervl round the 250 AVF estmtes rnges from $26 59 to $30 49, or roughly 14% of the true men. Of course, we were ble to estmte the $0 97 stndrd devton only becuse we hd ccess to mny subsmples. In rel-world settng, where only sngle smple s vlble, the resercher generlly reles on smultons or ck-knfng technques to estmte the stndrd devton. In ths pper, we wll present procedure for dervng closed-form pproxmtons of the stndrd devton drectly from the dt. We cn demonstrte the robustness of the bove descrbed results by vryng both the sze of the subsmples nd the dscount fctor. In Tble 1, we present the emprcl bs nd stndrd devton for dfferent dscount fctors (verged over 10 repettons). In ech repetton, we dvde the dt set nto 100 subsmples nd compute the AVF for ech subsmple. We clculte the verge bsolute vlue of the bs nd the emprcl stndrd devton of the AVF estmtes cross subsmples. It cn be seen tht the verge bs s smll for dscount fctors tht re not too close to one. For dscount fctors tht re close to one, the bs becomes more menngful but stll remns much smller thn the stndrd devton. In nother experment, we vred the precson of the estmtes by chngng the sze of the

6 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes Mngement Scence 53(2), pp , 2007 INFORMS 313 Tble 1 Bs nd Vrnce s Functon of the Dscount Fctor Bs/AVF (%) STD/AVF (%) Note. For ech dscount fctor, we prtton the dt 10 tmes, wth ech prtton resultng n 100 subsmples (ech wth roughly 1.6 mllon observtons). In the tble, we present the men bsolute vlue of the bs nd the men emprcl stndrd devton ech verged cross the 10 repettons. Both of these mens re stndrdzed by dvdng by the AVF ssocted wth the hstorcl polcy (s mesured on the whole dt set). subsmples nd repeted the nlyss usng subsmples wth dfferent number of observtons. In Fgure 2, we report emprcl stndrd devtons of the AVF estmtes under the dfferent-szed subsmples. Ech cross n Fgure 2 represents rndom ssgnment of the observtons to subsmples (the dfferent ssgnments led to vrton n the subsmples between repettons). Whle ncresng the sze of the subsmples ncreses the ccurcy of the model prmeters, nd n turn reduces the vrnce n the AVF estmtes, the rte t whch the vrnce pproches zero slows down s the subsmples ncrese n sze. It seems tht even when estmtng the model prmeters wth very lrge mounts of dt, prmetrc vrnce leds to nonneglgble vrnce n the vlue functon estmtes. Fgure 2 STD of AVF (%) Ml Ctlog Problem: The Emprcl Stndrd Devton of the AVF s Functon of the Smple Sze Number of observtons per subsmple (mllons) Note. Ech cross represents sngle (rndom) prtton of the observtons nto subsmples. 4. Anlyss In ths secton, we provde closed-form pproxmtons for the bs nd vrnce of the estmted vlue functon usng second-order pproxmtons. We then brefly dscuss the control problem where n ddton to the estmton process, we look for n optml polcy. In 4.1, we wll drop the superscrpt becuse we consder fxed polcy Approxmtons for Bs nd Vrnce n the Estmted Vlue Functon We now derve closed-form pproxmtons for the (prmetrc) bs nd vrnce of Y. The nlyss follows clsscl (non-byesn) pproch, where the bs nd vrnce re expressed n terms of the (unknown) true prmeters. Becuse the true model prmeters re unknown, we substtute the estmted prmeters, whch s stndrd prctce. However, s result of ths substtuton, the vlues obtned for the bs nd vrnce re themselves estmtes. For completeness, we lso provde n Appendx D Byesn nlyss. Under the Byesn pproch, P nd R re treted s rndom vrbles wth known pror dstrbutons, nd we deduce pproxmtons for the condtonl bs nd vrnce, gven the vlues of P nd R. The expressons obtned usng the Byesn pproch re lmost dentcl to the ones n the clsscl pproch (unless n nformtve pror s vlble). Our gol s to clculte Ɛ Y nd the covrnce mtrx for Y, defned by cov Y = Ɛ Y Y Ɛ Y Ɛ Y We defne rndom m m mtrx P = P P nd rndom m-vector R = R R. Note tht P nd R re zero men rndom vrbles tht represent the dfference between the true model nd the estmted model. To help nterpret some of the lter nlyss, t wll be helpful to hve sense of the mgntudes of P nd R. Becuse the trnston probbltes re bounded by zero nd one, the errors n these probbltes re lso bounded between zero nd one. The trnston probbltes themselves wll tend to be smller the lrger the number of sttes to whch trnstons re fesble, whle the errors n these probbltes wll be smller the more observtons there re reltve to the number of fesble trnstons. In the exmple dscussed n 3 nd Fgure 1, the mxmum error n the trnston probbltes n subsmple (mx P ) hs men of nd stndrd devton of Furthermore, the verge (verged over ll prs wth nonzero trnston probblty) bsolute error n the trnston probblty estmtes, P, hs men of nd n emprcl stndrd devton of

7 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes 314 Mngement Scence 53(2), pp , 2007 INFORMS Note tht n tht exmple, the fesble trnstons consst of less thn 10% of the 64 2 entres n P. The expected rewrds re not bounded pror nd so the errors re lso unbounded. In the ctlog exmple, the verge bsolute error n the rewrd estmtes, R, hs men of $4.25 nd stndrd devton of $1.82. The mxml error n the rewrd estmtes, mx R, hs men of $56.3 nd stndrd devton of $43.2. We now wrte the expectton of Y (cf. Equton (3)) s Ɛ Y = Ɛ I P + P 1 R + R [ ] = Ɛ k P + P k R + R (4) where the geometrc seres expnson of I P+ P 1 ws used to obtn the second equlty. We use the notton X = I P 1 nd f k P = X ( PX ) k = ( X P ) k X. The followng lemm wll be useful. Lemm 4.1. l=0 l P + P l = k f k P. Proof. k f k P = k X P k X = I X P 1 X = X 1 X 1 X P 1 = I P P 1 = l P + P l l=0 where we repetedly used the defnton of X nd the fct tht X s nvertble. Usng Lemm 4.1 n Equton (4), we obtn ( ) Ɛ Y = I P 1 R + k Ɛ f k P R + k=1 k Ɛ f k P R (5) There re three terms on the rght-hnd sde of Equton (5). The frst term s the vlue functon for the true model. The second term reflects the bs ntroduced by the uncertnty n P lone, nd the thrd term represents the bs ntroduced by the correlton between the errors n P nd R. Equton (5) provdes seres expnson of the error n terms of hgh-order moments nd cross moments of the errors n P nd R. The clculton of the bs s tedous becuse the term Ɛ f k P nvolves kth-order moments of multnoml dstrbutons. But becuse P s typclly smll, P k s generlly close to zero for lrge k. For ths reson, we lmt our ttenton to second-order pproxmton nd we wll ssume tht Ɛ f k P 0 for k>2, nd tht Ɛ f k P R 0 for k>1. We use the ctlog dt to nvestgte the pproprteness of ths ssumpton n 5. Therefore, we cn wrte Equton (5) s Ɛ Y = I P 1 R + Ɛ f 1 P R + 2 Ɛ f 2 P R + XƐ R + Ɛ f 1 P R + L exp (6) where we represent ll the terms of order greter thn two n L exp = k Ɛ f k P R + k Ɛ f k P R (7) k=3 Gven tht we wll be usng second-order pproxmtons, we expect tht the men nd vrnce of Y cn be clculted s long s we re ble to compute the covrnce between vrous entres of R nd P. We strt wth P. Frst, we ntroduce some notton. We use the notton A nd A to denote the th row nd column, respectvely, of mtrx A, nd dg A to denote dgonl mtrx wth the entres of A long the dgonl. We note tht P nd P re ndependent when. To fnd the covrnce mtrx of P, we consder the row vectors P nd P wth the estmted nd true trnston probbltes, nd defne P to be ther dfference. Note tht P = P For ech stte-cton pr, we defne M = dg P P P whch s symmetrc postve semdefnte mtrx. Recll tht for ech, we hve P = N /N, where the N re drwn from multnoml dstrbuton. The covrnce mtrx of P s M /N, nd the covrnce mtrx of P s cov = Ɛ P P = k=2 2 M Now we consder R. Becuse C s ndependent of C kl whenever k, we hve Ɛ R R k = 0 for k. Furthermore, Ɛ R 2 = 2 Ɛ R 2 In the followng, we use N to represent the vector wth components N, = 1 m, nd R to represent the vector wth components R, = 1 m. Note tht C nd C k re ndependent gven N nd N k, so tht [ Ɛ R 2 C ] = vr = 1 [ ] vr C N N 2 = 1 { ( [ ]) vr Ɛ C N 2 N ( [ ])} + Ɛ vr C N N

8 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes Mngement Scence 53(2), pp , 2007 INFORMS 315 = 1 { ( ) [ ]} vr R N 2 N + Ɛ V N = 1 N 2 = 1 N { N 2 vr R P + N } V P R M R + V P (8) Here, V s the vrnce of the rewrds ssocted wth trnston from to, under cton. To ccount for the correlton between P nd R,we use Equton (2) to obtn R = R P = R P + R P + R P + R P (9) where R = R R. Comprng wth Equton (1), we hve R = R R = R P + R P + R P (10) We use to denote Hdmrd multplcton: for ny two mtrces A nd B wth the sme dmensons, A B s mtrx (gn wth the sme dmensons) wth entres A B = A B. We lso use e to denote the m-dmensonl vector wth ll components equl to one, nd we use to denote the m-dmensonl vector wth the th component beng. Wth ths notton, Equton (10) becomes ( ) R = P R + R P + R P e (11) We defne n m m mtrx Q wth entres Q = cov X (12) (Recll the defnton X = I P 1, nd tht Y = XR s the true vlue functon.) We defne n m-dmensonl vector B wth ts th component defned s B = 2 R M X N The followng proposton quntfes the bs under the second-order pproxmton ssumpton. The proof s gven n Appendx A. Proposton 4.1. The expectton of the estmted vlue functon Y stsfes Ɛ Y = Y + 2 XQY + XB + L exp where L exp s defned n Equton (7) nd ( ) 1 L exp = o where N = mn >0 N lm N o 1/N N = 0. N nd the term o stsfes In the bove proposton, represents the lest smpled stte-cton pr tht s used by the polcy. The term L exp decreses to zero fster tht 1/N, wheres Q nd B cn be shown to decrese lke 1/N. Therefore, our pproxmton for the bs n the vlue functon estmtes wll be 2 XQY + XB For the purposes of the next proposton, we ntroduce more notton. We defne the dgonl mtrx W, whose dgonl entres re gven by W = 2 Y + R N M Y + R + V P (13) The next proposton provdes n expresson for the second moment, Ɛ Y Y. Together wth the expresson for Ɛ Y n the precedng proposton, t leds to n pproxmton for the covrnce mtrx of Y. The proof s gven n Appendx B. Proposton 4.2. The second moment of Y stsfes Ɛ Y Y = YY +X { 2 QYR +RY Q + BR +RB +W } X +L vr where L vr s gven by L vr = k+l Ɛ f k P RR + R R f l P k l k+l>2 + Ɛ X R R f 1 P + Ɛ f 1 P R R X ( ) 1 = o N By tkng the dfference between Ɛ Y Y, s gven by Proposton 4.2, nd Ɛ Y Ɛ Y, s prescrbed by Proposton 4.1, the followng corollry s esly derved. Corollry 4.1. The covrnce mtrx of the estmted vlue functon stsfes ( 1 cov Y = XWX + o N The expressons n Propostons 4.1 nd 4.2 nd Corollry 4.1 yeld severl nsghts. Frst, s the counts N ncrese to nfnty, cov pproches zero, nd thus ll the terms nvolvng the mtrces Q, B, nd W converge to zero. As expected, ths mples tht s the smple sze ncreses nd the ccurcy of the estmted prmeters mproves, both the bs nd the vrnce decrese to zero. Second, the expressons for the bs nd vrnce rely on the true model prmeters, whch re unknown. As dscussed n the ntroducton, to obtn computble pproxmtons of the )

9 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes 316 Mngement Scence 53(2), pp , 2007 INFORMS bs nd vrnce, we wll use nsted P, R, nd the emprcl vrnce of ech R k. In prncple, we could lso estmte the bs nd vrnce due to ths pproxmton, but ths s tedous nd, s suggested by the expermentl results n the next secton, generlly unnecessry. Thrd, when mn N s lrge, t follows tht the nonzero entres of B, W, nd Q decrese to zero lke 1/N. Therefore, the stndrd devton decreses to zero lke 1/ N, whch s the usul behvor of emprcl estmtes. The expressons n Proposton 4.1 nd Corollry 4.1 llow us to qulttvely compre the mgntude of the bs nd vrnce. Accordng to Corollry 4.1, the stndrd devton of Y cn be pproxmtely estmted s Y = X WX (14) The next proposton, proved n Appendx C, quntfes the rto between the stndrd devton nd the bs. Recll tht for two postve functons f nd g (defned on the rel numbers), we wrte f n = g n f there exst constnts N 0 nd C>0 such tht f n Cg n for n N 0. N Proposton 4.3. Suppose tht Y >0 nd N >c>0 for ll nd. Then, / Y ( ) Ɛ Y Y = N for ll Proposton 4.3 mples tht the errors ntroduced by the prmetrc vrnce wll generlly be much lrger thn the bs. Note tht becuse W s postve semdefnte mtrx, Y >0 s very wek nondegenercy ssumpton. The condton N /N >c>0 requres tht smple szes ncrese unformly. Whle the expresson n Corollry 4.1 llows us to pproxmte the covrnce mtrx of the estmted vlue functon, the fndngs on ther own do not llow us to clculte confdence ntervls round these estmtes. Clcultng confdence ntervl requres tht we know the dstrbuton of the vlue functon estmtes. A centrl lmt theorem (Serflng 1980, p. 122, Theorem A) speks to ths ssue. Theorem 4.1 (Serflng 1980). Suppose tht sequence of rndom vectors X n = X n1 X nk s bn 2 (symptotclly norml,tht s, X n / d b n 0 ); nd the sequence of sclrs b n converges to 0. Let g x = g 1 x g m x, x = x 1 x k,be vector-vlued functon for whch ech component functon g x s rel-vlued functon nd hs nonzero grdent t x =. Let [ ] g D = x x= m k Then, g X n s g bn 2D D. Becuse P nd R re ll estmtors tht symptotclly follow norml dstrbutons, we my consder Y s the functon g n the bove theorem nd conclude tht Y s symptotclly norml. We further nvestgte ths ssue usng ctlog mlng dt n 5, where we report tht Kolmogorov-Smrnov test cnnot reect the hypothess tht Y s normlly dstrbuted. Reders my wonder whether we could hve used Serflng s (1980) result to derve our erler fndngs. It s technclly possble to do so. Indeed, under the ssumpton tht ll of the N s re dentcl, we were ble to show tht the two pproches yeld the sme result, nd observed tht the two dervtons were of comprble length nd complexty. However, f smplng occurs t dfferent rtes n dfferent sttes, the rte t whch the N s pproch nfnty wll generlly vry. In ths cse, use of the Serflng theorem, or ny relted centrl lmt theorem, requres extensve ddtonl dervton. Moreover, these theorems do not ddress the ssue of bs. The sme pproch cn be used for nfnte-horzon verge rewrd MDPs. Under mld ssumptons on the structure of the Mrkov chn, we get smlr pproxmtons to the bs nd vrnce for the verge rewrd. The de cn lso be extended to sem- Mrkov processes, where the trnston tmes between tme epochs re rndom nd estmted from smpled dt The Control Problem To ths pont, we hve focused on the vlue functon under fxed polcy. In mny pplctons, we re nterested n comprng n exstng polcy wth n lterntve polcy, possbly derved through polcy optmzton process. We know from the MDP theory tht there exsts n optml polcy such tht Y for ll dmssble polces nd ll sttes S. An Y optml polcy my be obtned usng vlue terton, polcy terton, or lner progrmmng lgorthms. See, for exmple, Bertseks (2000). Becuse we do not hve ccess to the true model prmeters P nd R, optmzton bsed on the estmted prmeters P nd R produces n optml polcy such tht Y Y for ll dmssble polces. In generl, polcy s dfferent from. Moreover, becuse the polcy s obtned through n optmzton process, the estmtes of the model prmeters for tht polcy ( P nd R ) wll no longer be unbsed estmtes of the true model prmeters (P nd R ). Therefore, we cnnot use the pproxmton derved n Proposton 4.1 (for fxed polcy) to evlute the bs n the optml vlue functon nor cn we use the pproxmtons n Proposton 4.2 nd Corollry 4.1 to estmte the covrnce mtrx.

10 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes Mngement Scence 53(2), pp , 2007 INFORMS 317 We cn llustrte the problem through smple exmple. Consder sngle-stte MDP wth two ctons, tht s, S = 1 nd A = 0 1. Both ctons yeld dentcl zero-men rndom rewrds. Clerly, n such problem could be ether cton 0 or 1, wth vlue functons Y = Y = 0 Now ssume tht we hve n smples to estmte the expected rewrd R for ether cton. Indeed, both R follow (pproxmtely) norml dstrbuton 0 1/n. The polcy optmzton procedure chooses the cton wth the lrgest R.Ifweuse R to denote the mxmum of R 0 nd R 1, we know from Jensen s nequlty tht Ɛ R >0, nd so the vlue functon estmted for the chosen polcy wll on verge be postvely bsed: Ɛ Y = Ɛ R = Ɛ mx R 0 R 1 > mx Ɛ R 0 Ɛ R 1 = 0 The mgntude of Ɛ Y, nd therefore the bs n ths exmple, s studed n the order sttstcs lterture (Ledbetter et l. 1983). We lso refer reders to Clrk (1961), whch presents procedure to pproxmte moments of the mxmum of fnte number of correlted Gussn rndom vrbles. Ths problem rses two ssues. Frst, how cn we de-bs the estmtes of P nd R so tht we cn use our erler results to estmte the bs nd covrnce mtrx of vlue functon when the polcy s derved from n optmzton procedure? Second, becuse the optmzton procedures themselves rely on estmtes P nd R, the polces derved from stndrd dynmc progrmmng lgorthms wll generlly not be truly optml ( ). In the remnder of ths secton, we propose cross-vldton pproch tht cn help to ddress the frst ssue. Unfortuntely, we do not hve soluton to the second ssue. Indeed, t seems unlkely tht generl procedure cn be found tht resolves the second ssue s the suboptmlty reflects the bsence of complete nformton n the trnng dt. The bs n the estmtes of P nd R rses becuse optmzton methods tend to fvor ctons for whch the estmton errors n P nd R led to nflted estmtes of the vlue functon. As long s the errors n P nd R re ndependent cross smples, we cn derve unbsed estmtes of P nd R f we use dfferent smple of dt to evlute the polcy thn the smple we used to desgn the polcy. In prtculr, consder the followng pproch: Strt by dvdng the trnng dt nto two subsmples clbrton smple nd vldton smple. Use the clbrton smple to estmte the model prmeters P cl nd R cl nd obtn the optml polcy cl = rg mx I P cl 1 R cl Then, estmte model prmeters P vl nd R vl from the vldton smple nd (followng Equton (3)) evlute the polcy usng these new prmeters: Y cl vl = ( I P ) cl 1 vl R cl vl Through ths procedure, we cn de-bs the vlue functon estmtes by reportng Y cl vl nsted of Y cl cl, where Y cl cl = I P cl cl 1 R cl cl. Accordngly, we my lso pproxmte the bs nd vrnce nd therefore the confdence bounds of Y cl vl followng Proposton 4.1 nd Corollry 4.1. The ssumpton tht the estmton errors n P nd R re ndependent cross the clbrton nd vldton subsmples s obvously crtcl. In ths pper, we hve ssumed tht estmtes P nd R re derved from strghtforwrd nonprmetrc ggregtes of the vlble dt. Under ths pproch, the estmton errors re ndependent cross the subsmples s long s ny mesurement errors re ndependent cross observtons. However, n some settngs, t s common to estmte the model prmeters from mxmum lkelhood estmtes tht requre functonl form nd dstrbuton ssumptons (ths s prtculrly common n the economcs lterture). Under ths lterntve pproch, ny errors ntroduced by the functonl form nd dstrbuton ssumptons wll be correlted cross the subsmples. As result, the cross-vldton procedure tht we hve proposed wll not de-bs the estmtes of P nd R, even f the mesurement errors re ndependent cross the observtons. 5. Experments The relnce on second-order expnson n dervng the pproxmtons for the bs nd vrnce presumes tht hgher-order terms re reltvely unmportnt. We now exmne ths ssumpton n further detl by usng the ctlog mlng dt to vldte the fndngs. These dt lso enble us to nvestgte the mpct (f ny) of usng estmtes of the model prmeters n these expressons (n the bsence of the true model prmeters). If the vlue functon estmtes follow norml dstrbuton, the vrnce nd bs expressons derved n the prevous secton fcltte clculton of confdence ntervls round the de-bsed vlue functon estmtes. We cn nvestgte the ccurcy of these confdence ntervls by comprng how frequently the true vlue functon flls wthn the confdence ntervls. We would expect tht on verge the true vlue

11 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes 318 Mngement Scence 53(2), pp , 2007 INFORMS wll fll wthn one stndrd devton of the unbsed men 68% of the tme nd wthn two stndrd devtons 95% of the tme. We begn by nvestgtng whether the vlue functon estmtes follow norml dstrbuton. We do so by usng Kolmogorov-Smrnov test on ech of the dt ponts reported n 3. The hypothess tht the rewrd s two-sded Gussn could not be reected wth confdence 0.05 t ny nstnce. The verge p-vlue ws wth mnmum of nd mxmum of Ths ndctes tht t cnnot be determned tht the dt do not follow Gussn rule. We use the sme prttons of the dt s n 2. In Fgure 3, the percentge of tmes tht the true vlue functon ws wthn one stndrd devton s denoted by + nd wthn two stndrd devtons by n. For exmple, for the 250 subsmples (wth bout 657,000 observtons ech), we report the percentge of the 250 estmtes n whch the true AVF (s estmted on the full smple) ws wthn the estmted confdence ntervl. By redrwng the 250 subsmples 10 tmes, we report 10 nstnces of ths percentge. An nlogous process ws used wth other choces of the subsmple sze. The fndngs n Fgure 3 confrm tht the percentge of estmtes tht fll wthn one nd two stndrd devtons of the true AVF re close to the trgets of 68% nd 95%, respectvely. We next consder the mportnce of the secondorder pproxmtons. We do so by tkng dvntge of the role plyed by the dscount fctor. The mportnce of hgher-order terms n the seres expnsons Fgure 3 Percentge below 1 (+) or 2 (+) STDs The Percentge of the AVF Estmtes tht Fll Wthn One + nd Two Stndrd Devtons from the Vlue Clculted Bsed on the Full Dt Set Observtons per subsmple (mllons) Note. Ech + nd represents rndom prtton of the full dt to subsmples. The dscount fctor s = Tble 2 Percentge of the AVF Estmtes tht Fll Wthn One nd Two Stndrd Devtons Smples wth Smples wth one STD (%) two STDs (%) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) Note. We rndomly prttoned the dt whle vryng the dscount fctor. For ech dscount fctor, we performed the prtton 10 tmes; ech prtton ws to 250 subsmples (ech wth roughly 657,000 mllon observtons). We present the percentge of smples n whch the estmted AVF s wthn one stndrd devton (s predcted by Proposton 4.2) of the vlue s mesured on ll the dt; the mnmum nd mxmum percentges over the 10 runs re provded n prentheses. The sme sttstcs re presented for two stndrd devtons. ncreses s the dscount fctor pproches one. In Tble 2, we repet the nlyss for 250 subsmples of fxed sze, but for dfferent dscount fctors (sme settngs s n Tble 1). As expected, s pproches one, the ccurcy of the confdence ntervls degrdes. We ttrbute ths to the error ntroduced by the secondorder pproxmton The Control Problem As dscussed n 4.2, n obvous pplcton of our nlyss s the comprson of current polcy wth new polcy generted through some optmzton process. We cutoned tht before pplyng the expressons for the bs nd the vrnce to polcy derved from such process, we should frst obtn unbsed estmtes of the model prmeters, usng n ndependent vldton smple. We wll use the ctlog mlng dt to llustrte the mportnce of ths frst step. We begn by rndomly selectng porton of the vlble dt to be used s clbrton smple, nd retn the remnng dt s vldton smple. To demonstrte how the sze of the clbrton smple ffects the fndngs, we repet ths process for clbrton smples of dfferent szes. The clbrton smple s used to estmte model prmeters P cl nd R cl. Then, we run polcy terton lgorthm to dentfy n optml polcy cl from P cl nd R cl. We wll compre two AVF estmtes for ths polcy: the AVF clculted on the bss of the model estmted usng the clbrton smple (denoted by Y cl ), nd the AVF of tht polcy s estmted usng the vldton smple (denoted by Y vl ). The dfference between the two estmtes represents the bs ntroduced by the error n the model prmeters (the errors no longer hve zero expectton due to the optmzton process).

12 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes Mngement Scence 53(2), pp , 2007 INFORMS 319 Fgure 4 The Dfferences (Mrked by + ) Between the AVF Estmtes (n Dollrs, nd Averged Over All Sttes) Bsed on the Clbrton Smple nd the Vldton Smple for the Polcy Identfed Through n Optmzton Process Fgure 5 The Dfferences (Mrked by + ) Between the AVF Estmtes (n Dollrs) of the Optml Polcy Bsed on the Clbrton Smple nd the AVF of the Optml Polcy Found by Optmzng on the Vldton Smple $5 $0 $4 Bs: Y cl Y vl $3 $2 $1 Suboptmlty: Y vl Y * $0.5 $1.0 $0 $ Sze of clbrton smple (% of dt set) Note. Ech + ws generted by rndomly prttonng the dt to clbrton nd vldton smple. The horzontl xs corresponds to the sze of the clbrton smple, s percentge of the full dt smple. Here, = 0 98 for whch the true optml AVF s pproxmtely $ Ths bs s llustrted n Fgure 4 for clbrton smples of vryng szes. It cn be seen tht vlue functon estmtes from the clbrton smple re lmost unformly greter thn the estmtes from the vldton smple. Ths bs s sttstclly sgnfcnt. It s lso mngerlly relevnt, vergng round 6.3% of the true optml AVF ($33.59) for clbrton smple tht conssts of pproxmtely 1.6 mllon observtons (1% of the dt). In ddton, the $33.59 AVF for the optml polcy cn be compred wth the $28.54 AVF for the hstorcl polcy (reported n Fgure 1). These results ndcte tht the optml polcy offers potentl proft mprovement of pproxmtely 17%. We cn lso use the ctlog dt to nvestgte the extent to whch prmetrc vrnce leds to suboptml polces. To do so, we compred the optml polcy derved usng ech subsmple, wth the true optml polcy derved usng the entre dt set. Both polces re evluted on the vldton smple. We use Y to denote the AVF for the optml polcy found by optmzng on the entre dt set. The fndngs re reported n Fgure 5. As expected, the optml polcy lwys outperforms the polcy derved from the clbrton subsmple. The dfferences re gn sttstclly sgnfcnt. Note tht the computton of Y nd Y vl uses the sme dt, whch my ntroduce correlton between the two qunttes. Ths wll tend to dmnsh our estmtes of the suboptmlty. We lso computed Y vl, the optml AVF over the vldton set, n plce of Y for Tble 3 nd Fgures 4 nd 5. The results re smlr. $ Sze of clbrton smple (% of dt set) Note. Ech + ws generted by rndomly prttonng the dt to clbrton nd vldton smple. The horzontl xs corresponds to the sze of the clbrton smple, s percentge of the full dt smple. Here, = 0 98 for whch the true optml AVF s pproxmtely $ To demonstrte the robustness of the fndngs, we performed n experment smlr to the one reported n Tble 2. In Tble 3, we present the bs nd suboptmlty ntroduced by the optmzton process for dfferent vlues of. Specfclly, the bs ws clculted s Y cl Y vl /Y ; the suboptmlty ws clculted s Y vl Y /Y. From Tble 3, we cn esly obtn the men stndrd errors s the smple stndrd devtons dvded by 10 (the squre root of the smple sze, 100). It s cler tht both the bs nd the suboptmlty re generlly sgnfcntly greter thn zero, wth the bs vergng round 2% of the AVF nd the suboptmlty vergng round 1%. Tble 3 Optmzton Bs for Dfferent Vlues of Bs n percent Suboptmlty n percent Men STD Men STD Note. For ech dscount fctor, we performed rndom smplng of the dt 100 tmes. Ech tme we use rndom clbrton smple of 20% of the entre dt set (ech wth roughly eght mllon observtons) nd the other 80% s vldton smple. We found the optml polcy n ech such MDP nd present n the tble the bs, Y cl Y vl, normlzed by Y. We lso present the suboptmlty, Y vl Y, normlzed smlrly. The mens of the bses re sgnfcntly greter thn zero.

13 Mnnor et l.: Bs nd Vrnce Approxmton n Vlue Functon Estmtes 320 Mngement Scence 53(2), pp , 2007 INFORMS We conclude tht prmetrc vrnce ntroduces two ssues n polcy optmzton. Frst, the estmtes of the trnston probbltes nd the rewrds for the optml polcy re bsed, ledng to postve bs n the vlue functon estmtes. Ths problem cn be remeded reltvely esly by evlutng the polcy on seprte vldton smple. The second problem s more dffcult to resolve: errors n the model prmeters lso led to suboptml polces. As we dscussed n 4.2, ths second problem s t lest to some extent nevtble n the bsence of the true model prmeters. There s n nterestng queston rsed by referee: Gven fxed mount of dt, how do we dvde t nto the clbrton nd vldton smples? Includng more dt n the clbrton smple potentlly leds to better polcy, whle more dt n the vldton smple mens tghter confdence ntervl when we evlute the polcy. Ths trde-off cn often be resolved emprclly. 6. Concludng Remrks We hve provded closed-form pproxmtons for the bs nd vrnce of estmted vlue functons cused by uncertnty n the true model prmeters. For smll nd md-szed MDP models, the expressons cn be esly clculted nd used to evlute exstng polces or to compre new polces wth exstng ones. For the cse where new polcy s derved through polcy optmzton process, we lso demonstrted how to remove the ddtonl bs ntroduced by the optmzton process, by usng vldton smple. The expressons re bsed on second-order pproxmtons. Moreover, n the bsence of the true model prmeters, the expressons re evluted by relyng on estmtes of the model prmeters nd re therefore themselves estmtes, subect to prmetrc vrnce. We used lrge smple of dt from ctlog mlng compny to nvestgte the mpct of these pproxmtons. The fndngs ndcte tht the confdence ntervls obtned on the bss of the bs nd vrnce expressons re ressurngly ccurte. Both the ctlog mlng dt nd our theoretcl nlyss provde comprson of the reltve mgntude of the vrnce nd the dfferent bses. The vrnce ntroduced by prmetrc uncertnty s consderble, suggestng both prctcl nd sttstcl mportnce. Of the two bses, only the bs n optml polces ntroduced by the optmzton process s sgnfcnt. For fxed polcy, the bs ntroduced by prmetrc uncertnty wll generlly be neglgble when compred to the vrnce. Whle we report the verge of the vlue functons (verged over ll sttes), the dsggregte results my lso be of nterest. In prtculr, the vrnce of the vlue functons for lterntve polces could be used to gude future expermentton. Future expermentton my fvor ctons tht mght hve lrge effect on the vrnce of the vlue functon estmte. In ths mnner, the fndngs my contrbute to our understndng of the trde-off between explorton nd explotton. The fndngs my lso help to mprove the polcy optmzton process. The polcy mprovement portons of stndrd lgorthms focus on pont estmtes of the vlue functons nd overlook the vrnce round these estmtes. Fnlly, we cuton tht, s wth ll nlyses of MDPs, our fndngs rely on n ssumpton tht the dt re smpled from Mrkov process. In our experments, we ensured stsfcton of ths condton by smplng observtons rther thn trectores ( trectory here would be the complete hstory of customer). 7. Electronc Compnon An electronc compnon to ths pper s vlble s prt of the onlne verson tht cn be found t mnsc.ournl.nforms.org/. Acknowledgments Ths reserch ws prtlly supported by NSF Grnt DMI Ths pper hs benefted from comments by workshop prtcpnts t Duke Unversty, Unversty of Pennsylvn, Wshngton Unversty t St. Lous, the 2004 Interntonl Conference on Mchne Lernng, nd the INFORMS Annul Meetng The uthors re thnkful to Ynn Le Tllec for fndng n error n prevous verson nd to the referees for constructve comments. The uthors re especlly thnkful to the deprtment edtor for detled revew nd mny constructve suggestons. Appendx A Proof of Proposton 4.1. Lemm A.1. For ny cton,we hve Ɛ P = 0, Ɛ R = 0,nd furthermore, R s uncorrelted wth ny functon of P 1 P A,n whch A s the crdnlty of the cton spce A. Proof. The property Ɛ P = 0 s obvous. Let N stnd for the collecton of rndom vrbles N b k for every, k, nd b. We hve Ɛ R R N = 0. The fct Ɛ R = 0 follows by tkng the uncondtonl expectton. Furthermore, becuse ny P s completely determned by N, t follows tht for ny functon g, we hve Ɛ g P 1 P A R N = g P 1 P A Ɛ R N = 0 By tkng uncondtonl expecttons, we obtn the lst prt of the lemm. For the proof of Proposton 4.1, we strt from Equton (5) nd substtute the expresson from Equton (11) for R, to obtn ( ) Ɛ Y = I P 1 R + k Ɛ f k P R k=1

Dennis Bricker, 2001 Dept of Industrial Engineering The University of Iowa. MDP: Taxi page 1

Dennis Bricker, 2001 Dept of Industrial Engineering The University of Iowa. MDP: Taxi page 1 Denns Brcker, 2001 Dept of Industrl Engneerng The Unversty of Iow MDP: Tx pge 1 A tx serves three djcent towns: A, B, nd C. Ech tme the tx dschrges pssenger, the drver must choose from three possble ctons:

More information

Applied Statistics Qualifier Examination

Applied Statistics Qualifier Examination Appled Sttstcs Qulfer Exmnton Qul_june_8 Fll 8 Instructons: () The exmnton contns 4 Questons. You re to nswer 3 out of 4 of them. () You my use ny books nd clss notes tht you mght fnd helpful n solvng

More information

Remember: Project Proposals are due April 11.

Remember: Project Proposals are due April 11. Bonformtcs ecture Notes Announcements Remember: Project Proposls re due Aprl. Clss 22 Aprl 4, 2002 A. Hdden Mrov Models. Defntons Emple - Consder the emple we tled bout n clss lst tme wth the cons. However,

More information

UNIVERSITY OF IOANNINA DEPARTMENT OF ECONOMICS. M.Sc. in Economics MICROECONOMIC THEORY I. Problem Set II

UNIVERSITY OF IOANNINA DEPARTMENT OF ECONOMICS. M.Sc. in Economics MICROECONOMIC THEORY I. Problem Set II Mcroeconomc Theory I UNIVERSITY OF IOANNINA DEPARTMENT OF ECONOMICS MSc n Economcs MICROECONOMIC THEORY I Techng: A Lptns (Note: The number of ndctes exercse s dffculty level) ()True or flse? If V( y )

More information

Chapter Newton-Raphson Method of Solving a Nonlinear Equation

Chapter Newton-Raphson Method of Solving a Nonlinear Equation Chpter.4 Newton-Rphson Method of Solvng Nonlner Equton After redng ths chpter, you should be ble to:. derve the Newton-Rphson method formul,. develop the lgorthm of the Newton-Rphson method,. use the Newton-Rphson

More information

Rank One Update And the Google Matrix by Al Bernstein Signal Science, LLC

Rank One Update And the Google Matrix by Al Bernstein Signal Science, LLC Introducton Rnk One Updte And the Google Mtrx y Al Bernsten Sgnl Scence, LLC www.sgnlscence.net here re two dfferent wys to perform mtrx multplctons. he frst uses dot product formulton nd the second uses

More information

Principle Component Analysis

Principle Component Analysis Prncple Component Anlyss Jng Go SUNY Bufflo Why Dmensonlty Reducton? We hve too mny dmensons o reson bout or obtn nsghts from o vsulze oo much nose n the dt Need to reduce them to smller set of fctors

More information

Online Appendix to. Mandating Behavioral Conformity in Social Groups with Conformist Members

Online Appendix to. Mandating Behavioral Conformity in Social Groups with Conformist Members Onlne Appendx to Mndtng Behvorl Conformty n Socl Groups wth Conformst Members Peter Grzl Andrze Bnk (Correspondng uthor) Deprtment of Economcs, The Wllms School, Wshngton nd Lee Unversty, Lexngton, 4450

More information

Statistics and Probability Letters

Statistics and Probability Letters Sttstcs nd Probblty Letters 79 (2009) 105 111 Contents lsts vlble t ScenceDrect Sttstcs nd Probblty Letters journl homepge: www.elsever.com/locte/stpro Lmtng behvour of movng verge processes under ϕ-mxng

More information

Quiz: Experimental Physics Lab-I

Quiz: Experimental Physics Lab-I Mxmum Mrks: 18 Totl tme llowed: 35 mn Quz: Expermentl Physcs Lb-I Nme: Roll no: Attempt ll questons. 1. In n experment, bll of mss 100 g s dropped from heght of 65 cm nto the snd contner, the mpct s clled

More information

Partially Observable Systems. 1 Partially Observable Markov Decision Process (POMDP) Formalism

Partially Observable Systems. 1 Partially Observable Markov Decision Process (POMDP) Formalism CS294-40 Lernng for Rootcs nd Control Lecture 10-9/30/2008 Lecturer: Peter Aeel Prtlly Oservle Systems Scre: Dvd Nchum Lecture outlne POMDP formlsm Pont-sed vlue terton Glol methods: polytree, enumerton,

More information

4. Eccentric axial loading, cross-section core

4. Eccentric axial loading, cross-section core . Eccentrc xl lodng, cross-secton core Introducton We re strtng to consder more generl cse when the xl force nd bxl bendng ct smultneousl n the cross-secton of the br. B vrtue of Snt-Vennt s prncple we

More information

Statistics 423 Midterm Examination Winter 2009

Statistics 423 Midterm Examination Winter 2009 Sttstcs 43 Mdterm Exmnton Wnter 009 Nme: e-ml: 1. Plese prnt your nme nd e-ml ddress n the bove spces.. Do not turn ths pge untl nstructed to do so. 3. Ths s closed book exmnton. You my hve your hnd clcultor

More information

Jean Fernand Nguema LAMETA UFR Sciences Economiques Montpellier. Abstract

Jean Fernand Nguema LAMETA UFR Sciences Economiques Montpellier. Abstract Stochstc domnnce on optml portfolo wth one rsk less nd two rsky ssets Jen Fernnd Nguem LAMETA UFR Scences Economques Montpeller Abstrct The pper provdes restrctons on the nvestor's utlty functon whch re

More information

INTRODUCTION TO COMPLEX NUMBERS

INTRODUCTION TO COMPLEX NUMBERS INTRODUCTION TO COMPLEX NUMBERS The numers -4, -3, -, -1, 0, 1,, 3, 4 represent the negtve nd postve rel numers termed ntegers. As one frst lerns n mddle school they cn e thought of s unt dstnce spced

More information

Definition of Tracking

Definition of Tracking Trckng Defnton of Trckng Trckng: Generte some conclusons bout the moton of the scene, objects, or the cmer, gven sequence of mges. Knowng ths moton, predct where thngs re gong to project n the net mge,

More information

Dynamic Power Management in a Mobile Multimedia System with Guaranteed Quality-of-Service

Dynamic Power Management in a Mobile Multimedia System with Guaranteed Quality-of-Service Dynmc Power Mngement n Moble Multmed System wth Gurnteed Qulty-of-Servce Qnru Qu, Qng Wu, nd Mssoud Pedrm Dept. of Electrcl Engneerng-Systems Unversty of Southern Clforn Los Angeles CA 90089 Outlne! Introducton

More information

Chapter Newton-Raphson Method of Solving a Nonlinear Equation

Chapter Newton-Raphson Method of Solving a Nonlinear Equation Chpter 0.04 Newton-Rphson Method o Solvng Nonlner Equton Ater redng ths chpter, you should be ble to:. derve the Newton-Rphson method ormul,. develop the lgorthm o the Newton-Rphson method,. use the Newton-Rphson

More information

Chapter 5 Supplemental Text Material R S T. ij i j ij ijk

Chapter 5 Supplemental Text Material R S T. ij i j ij ijk Chpter 5 Supplementl Text Mterl 5-. Expected Men Squres n the Two-fctor Fctorl Consder the two-fctor fxed effects model y = µ + τ + β + ( τβ) + ε k R S T =,,, =,,, k =,,, n gven s Equton (5-) n the textook.

More information

523 P a g e. is measured through p. should be slower for lesser values of p and faster for greater values of p. If we set p*

523 P a g e. is measured through p. should be slower for lesser values of p and faster for greater values of p. If we set p* R. Smpth Kumr, R. Kruthk, R. Rdhkrshnn / Interntonl Journl of Engneerng Reserch nd Applctons (IJERA) ISSN: 48-96 www.jer.com Vol., Issue 4, July-August 0, pp.5-58 Constructon Of Mxed Smplng Plns Indexed

More information

Lecture 4: Piecewise Cubic Interpolation

Lecture 4: Piecewise Cubic Interpolation Lecture notes on Vrtonl nd Approxmte Methods n Appled Mthemtcs - A Perce UBC Lecture 4: Pecewse Cubc Interpolton Compled 6 August 7 In ths lecture we consder pecewse cubc nterpolton n whch cubc polynoml

More information

DCDM BUSINESS SCHOOL NUMERICAL METHODS (COS 233-8) Solutions to Assignment 3. x f(x)

DCDM BUSINESS SCHOOL NUMERICAL METHODS (COS 233-8) Solutions to Assignment 3. x f(x) DCDM BUSINESS SCHOOL NUMEICAL METHODS (COS -8) Solutons to Assgnment Queston Consder the followng dt: 5 f() 8 7 5 () Set up dfference tble through fourth dfferences. (b) Wht s the mnmum degree tht n nterpoltng

More information

PLEASE SCROLL DOWN FOR ARTICLE

PLEASE SCROLL DOWN FOR ARTICLE Ths rtcle ws downloded by:ntonl Cheng Kung Unversty] On: 1 September 7 Access Detls: subscrpton number 7765748] Publsher: Tylor & Frncs Inform Ltd Regstered n Englnd nd Wles Regstered Number: 17954 Regstered

More information

GAUSS ELIMINATION. Consider the following system of algebraic linear equations

GAUSS ELIMINATION. Consider the following system of algebraic linear equations Numercl Anlyss for Engneers Germn Jordnn Unversty GAUSS ELIMINATION Consder the followng system of lgebrc lner equtons To solve the bove system usng clsscl methods, equton () s subtrcted from equton ()

More information

CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION. Indu Manickam, Andrew S. Lan, and Richard G.

CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION. Indu Manickam, Andrew S. Lan, and Richard G. CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION Indu Mnckm, Andrew S. Ln, nd Rchrd G. Brnuk Rce Unversty ABSTRACT Optmzng the selecton of lernng resources nd prctce

More information

International Journal of Pure and Applied Sciences and Technology

International Journal of Pure and Applied Sciences and Technology Int. J. Pure Appl. Sc. Technol., () (), pp. 44-49 Interntonl Journl of Pure nd Appled Scences nd Technolog ISSN 9-67 Avlle onlne t www.jopst.n Reserch Pper Numercl Soluton for Non-Lner Fredholm Integrl

More information

CALIBRATION OF SMALL AREA ESTIMATES IN BUSINESS SURVEYS

CALIBRATION OF SMALL AREA ESTIMATES IN BUSINESS SURVEYS CALIBRATION OF SMALL AREA ESTIMATES IN BUSINESS SURVES Rodolphe Prm, Ntle Shlomo Southmpton Sttstcl Scences Reserch Insttute Unverst of Southmpton Unted Kngdom SAE, August 20 The BLUE-ETS Project s fnnced

More information

In this Chapter. Chap. 3 Markov chains and hidden Markov models. Probabilistic Models. Example: CpG Islands

In this Chapter. Chap. 3 Markov chains and hidden Markov models. Probabilistic Models. Example: CpG Islands In ths Chpter Chp. 3 Mrov chns nd hdden Mrov models Bontellgence bortory School of Computer Sc. & Eng. Seoul Ntonl Unversty Seoul 5-74, Kore The probblstc model for sequence nlyss HMM (hdden Mrov model)

More information

Variable time amplitude amplification and quantum algorithms for linear algebra. Andris Ambainis University of Latvia

Variable time amplitude amplification and quantum algorithms for linear algebra. Andris Ambainis University of Latvia Vrble tme mpltude mplfcton nd quntum lgorthms for lner lgebr Andrs Ambns Unversty of Ltv Tlk outlne. ew verson of mpltude mplfcton;. Quntum lgorthm for testng f A s sngulr; 3. Quntum lgorthm for solvng

More information

Two Coefficients of the Dyson Product

Two Coefficients of the Dyson Product Two Coeffcents of the Dyson Product rxv:07.460v mth.co 7 Nov 007 Lun Lv, Guoce Xn, nd Yue Zhou 3,,3 Center for Combntorcs, LPMC TJKLC Nnk Unversty, Tnjn 30007, P.R. Chn lvlun@cfc.nnk.edu.cn gn@nnk.edu.cn

More information

CHI-SQUARE DIVERGENCE AND MINIMIZATION PROBLEM

CHI-SQUARE DIVERGENCE AND MINIMIZATION PROBLEM CHI-SQUARE DIVERGENCE AND MINIMIZATION PROBLEM PRANESH KUMAR AND INDER JEET TANEJA Abstrct The mnmum dcrmnton nformton prncple for the Kullbck-Lebler cross-entropy well known n the lterture In th pper

More information

A Family of Multivariate Abel Series Distributions. of Order k

A Family of Multivariate Abel Series Distributions. of Order k Appled Mthemtcl Scences, Vol. 2, 2008, no. 45, 2239-2246 A Fmly of Multvrte Abel Seres Dstrbutons of Order k Rupk Gupt & Kshore K. Ds 2 Fculty of Scence & Technology, The Icf Unversty, Agrtl, Trpur, Ind

More information

Many-Body Calculations of the Isotope Shift

Many-Body Calculations of the Isotope Shift Mny-Body Clcultons of the Isotope Shft W. R. Johnson Mrch 11, 1 1 Introducton Atomc energy levels re commonly evluted ssumng tht the nucler mss s nfnte. In ths report, we consder correctons to tomc levels

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 9

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 9 CS434/541: Pttern Recognton Prof. Olg Veksler Lecture 9 Announcements Fnl project proposl due Nov. 1 1-2 prgrph descrpton Lte Penlt: s 1 pont off for ech d lte Assgnment 3 due November 10 Dt for fnl project

More information

Using Predictions in Online Optimization: Looking Forward with an Eye on the Past

Using Predictions in Online Optimization: Looking Forward with an Eye on the Past Usng Predctons n Onlne Optmzton: Lookng Forwrd wth n Eye on the Pst Nngjun Chen Jont work wth Joshu Comden, Zhenhu Lu, Anshul Gndh, nd Adm Wermn 1 Predctons re crucl for decson mkng 2 Predctons re crucl

More information

Katholieke Universiteit Leuven Department of Computer Science

Katholieke Universiteit Leuven Department of Computer Science Updte Rules for Weghted Non-negtve FH*G Fctorzton Peter Peers Phlp Dutré Report CW 440, Aprl 006 Ktholeke Unverstet Leuven Deprtment of Computer Scence Celestjnenln 00A B-3001 Heverlee (Belgum) Updte Rules

More information

Demand. Demand and Comparative Statics. Graphically. Marshallian Demand. ECON 370: Microeconomic Theory Summer 2004 Rice University Stanley Gilbert

Demand. Demand and Comparative Statics. Graphically. Marshallian Demand. ECON 370: Microeconomic Theory Summer 2004 Rice University Stanley Gilbert Demnd Demnd nd Comrtve Sttcs ECON 370: Mcroeconomc Theory Summer 004 Rce Unversty Stnley Glbert Usng the tools we hve develoed u to ths ont, we cn now determne demnd for n ndvdul consumer We seek demnd

More information

A Tri-Valued Belief Network Model for Information Retrieval

A Tri-Valued Belief Network Model for Information Retrieval December 200 A Tr-Vlued Belef Networ Model for Informton Retrevl Fernndo Ds-Neves Computer Scence Dept. Vrgn Polytechnc Insttute nd Stte Unversty Blcsburg, VA 24060. IR models t Combnng Evdence Grphcl

More information

Bi-level models for OD matrix estimation

Bi-level models for OD matrix estimation TNK084 Trffc Theory seres Vol.4, number. My 2008 B-level models for OD mtrx estmton Hn Zhng, Quyng Meng Abstrct- Ths pper ntroduces two types of O/D mtrx estmton model: ME2 nd Grdent. ME2 s mxmum-entropy

More information

AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio

AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio 7 CHANGE-POINT METHODS FOR OVERDISPERSED COUNT DATA THESIS Brn A. Wlken, Cptn, Unted Sttes Ar Force AFIT/GOR/ENS/7-26 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY Wrght-Ptterson

More information

Research Article On the Upper Bounds of Eigenvalues for a Class of Systems of Ordinary Differential Equations with Higher Order

Research Article On the Upper Bounds of Eigenvalues for a Class of Systems of Ordinary Differential Equations with Higher Order Hndw Publshng Corporton Interntonl Journl of Dfferentl Equtons Volume 0, Artcle ID 7703, pges do:055/0/7703 Reserch Artcle On the Upper Bounds of Egenvlues for Clss of Systems of Ordnry Dfferentl Equtons

More information

THE COMBINED SHEPARD ABEL GONCHAROV UNIVARIATE OPERATOR

THE COMBINED SHEPARD ABEL GONCHAROV UNIVARIATE OPERATOR REVUE D ANALYSE NUMÉRIQUE ET DE THÉORIE DE L APPROXIMATION Tome 32, N o 1, 2003, pp 11 20 THE COMBINED SHEPARD ABEL GONCHAROV UNIVARIATE OPERATOR TEODORA CĂTINAŞ Abstrct We extend the Sheprd opertor by

More information

18.7 Artificial Neural Networks

18.7 Artificial Neural Networks 310 18.7 Artfcl Neurl Networks Neuroscence hs hypotheszed tht mentl ctvty conssts prmrly of electrochemcl ctvty n networks of brn cells clled neurons Ths led McCulloch nd Ptts to devse ther mthemtcl model

More information

arxiv: v2 [cs.lg] 9 Nov 2017

arxiv: v2 [cs.lg] 9 Nov 2017 Renforcement Lernng under Model Msmtch Aurko Roy 1, Hun Xu 2, nd Sebstn Pokutt 2 rxv:1706.04711v2 cs.lg 9 Nov 2017 1 Google Eml: urkor@google.com 2 ISyE, Georg Insttute of Technology, Atlnt, GA, USA. Eml:

More information

Stratified Extreme Ranked Set Sample With Application To Ratio Estimators

Stratified Extreme Ranked Set Sample With Application To Ratio Estimators Journl of Modern Appled Sttstcl Metods Volume 3 Issue Artcle 5--004 Strtfed Extreme Rned Set Smple Wt Applcton To Rto Estmtors Hn M. Smw Sultn Qboos Unversty, smw@squ.edu.om t J. Sed Sultn Qboos Unversty

More information

ESCI 342 Atmospheric Dynamics I Lesson 1 Vectors and Vector Calculus

ESCI 342 Atmospheric Dynamics I Lesson 1 Vectors and Vector Calculus ESI 34 tmospherc Dnmcs I Lesson 1 Vectors nd Vector lculus Reference: Schum s Outlne Seres: Mthemtcl Hndbook of Formuls nd Tbles Suggested Redng: Mrtn Secton 1 OORDINTE SYSTEMS n orthonorml coordnte sstem

More information

Study of Trapezoidal Fuzzy Linear System of Equations S. M. Bargir 1, *, M. S. Bapat 2, J. D. Yadav 3 1

Study of Trapezoidal Fuzzy Linear System of Equations S. M. Bargir 1, *, M. S. Bapat 2, J. D. Yadav 3 1 mercn Interntonl Journl of Reserch n cence Technology Engneerng & Mthemtcs vlble onlne t http://wwwsrnet IN (Prnt: 38-349 IN (Onlne: 38-3580 IN (CD-ROM: 38-369 IJRTEM s refereed ndexed peer-revewed multdscplnry

More information

Introduction to Numerical Integration Part II

Introduction to Numerical Integration Part II Introducton to umercl Integrton Prt II CS 75/Mth 75 Brn T. Smth, UM, CS Dept. Sprng, 998 4/9/998 qud_ Intro to Gussn Qudrture s eore, the generl tretment chnges the ntegrton prolem to ndng the ntegrl w

More information

Lecture 36. Finite Element Methods

Lecture 36. Finite Element Methods CE 60: Numercl Methods Lecture 36 Fnte Element Methods Course Coordntor: Dr. Suresh A. Krth, Assocte Professor, Deprtment of Cvl Engneerng, IIT Guwht. In the lst clss, we dscussed on the ppromte methods

More information

Improving Anytime Point-Based Value Iteration Using Principled Point Selections

Improving Anytime Point-Based Value Iteration Using Principled Point Selections In In Proceedngs of the Twenteth Interntonl Jont Conference on Artfcl Intellgence (IJCAI-7) Improvng Anytme Pont-Bsed Vlue Iterton Usng Prncpled Pont Selectons Mchel R. Jmes, Mchel E. Smples, nd Dmtr A.

More information

THE SPEED OF SOCIAL LEARNING

THE SPEED OF SOCIAL LEARNING THE SPEED OF SOCIAL LEARNING MATAN HAREL, ELCHANAN MOSSEL 2, PHILIPP STRACK 3, AND OMER TAMUZ 4 Abstrct. We study how effectvely group of rtonl gents lerns from repetedly observng ech others ctons. We

More information

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede. with respect to λ. 1. χ λ χ λ ( ) λ, and thus:

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede. with respect to λ. 1. χ λ χ λ ( ) λ, and thus: More on χ nd errors : uppose tht we re fttng for sngle -prmeter, mnmzng: If we epnd The vlue χ ( ( ( ; ( wth respect to. χ n Tlor seres n the vcnt of ts mnmum vlue χ ( mn χ χ χ χ + + + mn mnmzes χ, nd

More information

Online Stochastic Matching: New Algorithms with Better Bounds

Online Stochastic Matching: New Algorithms with Better Bounds Onlne Stochstc Mtchng: New Algorthms wth Better Bounds Ptrck Jllet Xn Lu My 202; revsed Jnury 203, June 203 Abstrct We consder vrnts of the onlne stochstc bprtte mtchng problem motvted by Internet dvertsng

More information

6 Roots of Equations: Open Methods

6 Roots of Equations: Open Methods HK Km Slghtly modfed 3//9, /8/6 Frstly wrtten t Mrch 5 6 Roots of Equtons: Open Methods Smple Fed-Pont Iterton Newton-Rphson Secnt Methods MATLAB Functon: fzero Polynomls Cse Study: Ppe Frcton Brcketng

More information

4. More general extremum principles and thermodynamic potentials

4. More general extremum principles and thermodynamic potentials 4. More generl etremum prncples nd thermodynmc potentls We hve seen tht mn{u(s, X )} nd m{s(u, X)} mply one nother. Under certn condtons, these prncples re very convenent. For emple, ds = 1 T du T dv +

More information

Heteroscedastic One-Way ANOVA and Lack-of-Fit Tests

Heteroscedastic One-Way ANOVA and Lack-of-Fit Tests M. G. AKRITAS nd N. PAPAATOS Heteroscedstc One-Wy ANOVA nd Lck-of-Ft Tests Recent rtcles hve consdered the symptotc behvor of the one-wy nlyss of vrnce (ANOVA) F sttstc when the number of levels or groups

More information

A Regression-Based Approach for Scaling-Up Personalized Recommender Systems in E-Commerce

A Regression-Based Approach for Scaling-Up Personalized Recommender Systems in E-Commerce A Regresson-Bsed Approch for Sclng-Up Personlzed Recommender Systems n E-Commerce Slobodn Vucetc 1 nd Zorn Obrdovc 1, svucetc@eecs.wsu.edu, zorn@cs.temple.edu 1 Electrcl Engneerng nd Computer Scence, Wshngton

More information

Review of linear algebra. Nuno Vasconcelos UCSD

Review of linear algebra. Nuno Vasconcelos UCSD Revew of lner lgebr Nuno Vsconcelos UCSD Vector spces Defnton: vector spce s set H where ddton nd sclr multplcton re defned nd stsf: ) +( + ) (+ )+ 5) λ H 2) + + H 6) 3) H, + 7) λ(λ ) (λλ ) 4) H, - + 8)

More information

Simultaneous estimation of rewards and dynamics from noisy expert demonstrations

Simultaneous estimation of rewards and dynamics from noisy expert demonstrations Smultneous estmton of rewrds nd dynmcs from nosy expert demonstrtons Mchel Hermn,2, Tobs Gndele, Jo rg Wgner, Felx Schmtt, nd Wolfrm Burgrd2 - Robert Bosch GmbH - 70442 Stuttgrt - Germny 2- Unversty of

More information

3/6/00. Reading Assignments. Outline. Hidden Markov Models: Explanation and Model Learning

3/6/00. Reading Assignments. Outline. Hidden Markov Models: Explanation and Model Learning 3/6/ Hdden Mrkov Models: Explnton nd Model Lernng Brn C. Wllms 6.4/6.43 Sesson 2 9/3/ courtesy of JPL copyrght Brn Wllms, 2 Brn C. Wllms, copyrght 2 Redng Assgnments AIMA (Russell nd Norvg) Ch 5.-.3, 2.3

More information

Electrochemical Thermodynamics. Interfaces and Energy Conversion

Electrochemical Thermodynamics. Interfaces and Energy Conversion CHE465/865, 2006-3, Lecture 6, 18 th Sep., 2006 Electrochemcl Thermodynmcs Interfces nd Energy Converson Where does the energy contrbuton F zϕ dn come from? Frst lw of thermodynmcs (conservton of energy):

More information

Altitude Estimation for 3-D Tracking with Two 2-D Radars

Altitude Estimation for 3-D Tracking with Two 2-D Radars th Interntonl Conference on Informton Fuson Chcgo Illnos USA July -8 Alttude Estmton for -D Trckng wth Two -D Rdrs Yothn Rkvongth Jfeng Ru Sv Svnnthn nd Soontorn Orntr Deprtment of Electrcl Engneerng Unversty

More information

Estimation of Normal Mixtures in a Nested Error Model with an Application to Small Area Estimation of Poverty and Inequality

Estimation of Normal Mixtures in a Nested Error Model with an Application to Small Area Estimation of Poverty and Inequality Publc Dsclosure Authorzed Publc Dsclosure Authorzed Publc Dsclosure Authorzed Publc Dsclosure Authorzed Polcy Reserch Workng Pper 696 Estmton of Norml Mxtures n Nested Error Model wth n Applcton to Smll

More information

ON SIMPSON S INEQUALITY AND APPLICATIONS. 1. Introduction The following inequality is well known in the literature as Simpson s inequality : 2 1 f (4)

ON SIMPSON S INEQUALITY AND APPLICATIONS. 1. Introduction The following inequality is well known in the literature as Simpson s inequality : 2 1 f (4) ON SIMPSON S INEQUALITY AND APPLICATIONS SS DRAGOMIR, RP AGARWAL, AND P CERONE Abstrct New neultes of Smpson type nd ther pplcton to udrture formule n Numercl Anlyss re gven Introducton The followng neulty

More information

Computing a complete histogram of an image in Log(n) steps and minimum expected memory requirements using hypercubes

Computing a complete histogram of an image in Log(n) steps and minimum expected memory requirements using hypercubes Computng complete hstogrm of n mge n Log(n) steps nd mnmum expected memory requrements usng hypercubes TAREK M. SOBH School of Engneerng, Unversty of Brdgeport, Connectcut, USA. Abstrct Ths work frst revews

More information

Math 497C Sep 17, Curves and Surfaces Fall 2004, PSU

Math 497C Sep 17, Curves and Surfaces Fall 2004, PSU Mth 497C Sep 17, 004 1 Curves nd Surfces Fll 004, PSU Lecture Notes 3 1.8 The generl defnton of curvture; Fox-Mlnor s Theorem Let α: [, b] R n be curve nd P = {t 0,...,t n } be prtton of [, b], then the

More information

CHOVER-TYPE LAWS OF THE ITERATED LOGARITHM FOR WEIGHTED SUMS OF ρ -MIXING SEQUENCES

CHOVER-TYPE LAWS OF THE ITERATED LOGARITHM FOR WEIGHTED SUMS OF ρ -MIXING SEQUENCES CHOVER-TYPE LAWS OF THE ITERATED LOGARITHM FOR WEIGHTED SUMS OF ρ -MIXING SEQUENCES GUANG-HUI CAI Receved 24 September 2004; Revsed 3 My 2005; Accepted 3 My 2005 To derve Bum-Ktz-type result, we estblsh

More information

State Estimation in TPN and PPN Guidance Laws by Using Unscented and Extended Kalman Filters

State Estimation in TPN and PPN Guidance Laws by Using Unscented and Extended Kalman Filters Stte Estmton n PN nd PPN Gudnce Lws by Usng Unscented nd Extended Klmn Flters S.H. oospour*, S. oospour**, mostf.sdollh*** Fculty of Electrcl nd Computer Engneerng, Unversty of brz, brz, Irn, *s.h.moospour@gml.com

More information

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede

Fall 2012 Analysis of Experimental Measurements B. Eisenstein/rev. S. Errede Fll Anlss of Epermentl Mesurements B. Esensten/rev. S. Errede Monte Crlo Methods/Technques: These re mong the most powerful tools for dt nlss nd smulton of eperments. The growth of ther mportnce s closel

More information

CIS587 - Artificial Intelligence. Uncertainty CIS587 - AI. KB for medical diagnosis. Example.

CIS587 - Artificial Intelligence. Uncertainty CIS587 - AI. KB for medical diagnosis. Example. CIS587 - rtfcl Intellgence Uncertnty K for medcl dgnoss. Exmple. We wnt to uld K system for the dgnoss of pneumon. rolem descrpton: Dsese: pneumon tent symptoms fndngs, l tests: Fever, Cough, leness, WC

More information

7.2 Volume. A cross section is the shape we get when cutting straight through an object.

7.2 Volume. A cross section is the shape we get when cutting straight through an object. 7. Volume Let s revew the volume of smple sold, cylnder frst. Cylnder s volume=se re heght. As llustrted n Fgure (). Fgure ( nd (c) re specl cylnders. Fgure () s rght crculr cylnder. Fgure (c) s ox. A

More information

Pyramid Algorithms for Barycentric Rational Interpolation

Pyramid Algorithms for Barycentric Rational Interpolation Pyrmd Algorthms for Brycentrc Rtonl Interpolton K Hormnn Scott Schefer Astrct We present new perspectve on the Floter Hormnn nterpolnt. Ths nterpolnt s rtonl of degree (n, d), reproduces polynomls of degree

More information

M/G/1/GD/ / System. ! Pollaczek-Khinchin (PK) Equation. ! Steady-state probabilities. ! Finding L, W q, W. ! π 0 = 1 ρ

M/G/1/GD/ / System. ! Pollaczek-Khinchin (PK) Equation. ! Steady-state probabilities. ! Finding L, W q, W. ! π 0 = 1 ρ M/G//GD/ / System! Pollcze-Khnchn (PK) Equton L q 2 2 λ σ s 2( + ρ ρ! Stedy-stte probbltes! π 0 ρ! Fndng L, q, ) 2 2 M/M/R/GD/K/K System! Drw the trnston dgrm! Derve the stedy-stte probbltes:! Fnd L,L

More information

Linear and Nonlinear Optimization

Linear and Nonlinear Optimization Lner nd Nonlner Optmzton Ynyu Ye Deprtment of Mngement Scence nd Engneerng Stnford Unversty Stnford, CA 9430, U.S.A. http://www.stnford.edu/~yyye http://www.stnford.edu/clss/msnde/ Ynyu Ye, Stnford, MS&E

More information

Reinforcement Learning with a Gaussian Mixture Model

Reinforcement Learning with a Gaussian Mixture Model Renforcement Lernng wth Gussn Mxture Model Alejndro Agostn, Member, IEEE nd Enrc Cely Abstrct Recent pproches to Renforcement Lernng (RL) wth functon pproxmton nclude Neurl Ftted Q Iterton nd the use of

More information

A New Markov Chain Based Acceptance Sampling Policy via the Minimum Angle Method

A New Markov Chain Based Acceptance Sampling Policy via the Minimum Angle Method Irnn Journl of Opertons Reserch Vol. 3, No., 202, pp. 04- A New Mrkov Chn Bsed Acceptnce Smplng Polcy v the Mnmum Angle Method M. S. Fllh Nezhd * We develop n optmzton model bsed on Mrkovn pproch to determne

More information

The Schur-Cohn Algorithm

The Schur-Cohn Algorithm Modelng, Estmton nd Otml Flterng n Sgnl Processng Mohmed Njm Coyrght 8, ISTE Ltd. Aendx F The Schur-Cohn Algorthm In ths endx, our m s to resent the Schur-Cohn lgorthm [] whch s often used s crteron for

More information

CHAPTER - 7. Firefly Algorithm based Strategic Bidding to Maximize Profit of IPPs in Competitive Electricity Market

CHAPTER - 7. Firefly Algorithm based Strategic Bidding to Maximize Profit of IPPs in Competitive Electricity Market CHAPTER - 7 Frefly Algorthm sed Strtegc Bddng to Mxmze Proft of IPPs n Compettve Electrcty Mrket 7. Introducton The renovton of electrc power systems plys mjor role on economc nd relle operton of power

More information

Investigation phase in case of Bragg coupling

Investigation phase in case of Bragg coupling Journl of Th-Qr Unversty No.3 Vol.4 December/008 Investgton phse n cse of Brgg couplng Hder K. Mouhmd Deprtment of Physcs, College of Scence, Th-Qr, Unv. Mouhmd H. Abdullh Deprtment of Physcs, College

More information

Solubilities and Thermodynamic Properties of SO 2 in Ionic

Solubilities and Thermodynamic Properties of SO 2 in Ionic Solubltes nd Therodync Propertes of SO n Ionc Lquds Men Jn, Yucu Hou, b Weze Wu, *, Shuhng Ren nd Shdong Tn, L Xo, nd Zhgng Le Stte Key Lbortory of Checl Resource Engneerng, Beng Unversty of Checl Technology,

More information

Modeling zero response data from willingness to pay surveys A semi-parametric estimation

Modeling zero response data from willingness to pay surveys A semi-parametric estimation Economcs Letters 71 (001) 191 196 www.elsever.com/ locte/ econbse Modelng zero response dt from wllngness to py surveys A sem-prmetrc estmton Seung-Hoon Yoo *, T-Yoo Km, J-K Lee, b c Insttute of Economc

More information

Investigation of Fatality Probability Function Associated with Injury Severity and Age. Toshiyuki Yanaoka, Akihiko Akiyama, Yukou Takahashi

Investigation of Fatality Probability Function Associated with Injury Severity and Age. Toshiyuki Yanaoka, Akihiko Akiyama, Yukou Takahashi Investgton of Ftlty Probblty Functon Assocted wth Injury Severty nd Age Toshyuk Ynok, Akhko Akym, Yukou Tkhsh Abstrct The gol of ths study s to develop ftlty probblty functon ssocted wth njury severty

More information

Two Activation Function Wavelet Network for the Identification of Functions with High Nonlinearity

Two Activation Function Wavelet Network for the Identification of Functions with High Nonlinearity Interntonl Journl of Engneerng & Computer Scence IJECS-IJENS Vol:1 No:04 81 Two Actvton Functon Wvelet Network for the Identfcton of Functons wth Hgh Nonlnerty Wsm Khld Abdulkder Abstrct-- The ntegrton

More information

Group-based active query selection. for rapid diagnosis in time-critical situations

Group-based active query selection. for rapid diagnosis in time-critical situations Group-bsed ctve query selecton for rpd dgnoss n tme-crtcl stutons *Gowthm Belll, Student Member, IEEE, Suresh K. Bhvnn, nd Clyton Scott, Member, IEEE Abstrct In pplctons such s ctve lernng or dsese/fult

More information

Zbus 1.0 Introduction The Zbus is the inverse of the Ybus, i.e., (1) Since we know that

Zbus 1.0 Introduction The Zbus is the inverse of the Ybus, i.e., (1) Since we know that us. Introducton he us s the nverse of the us,.e., () Snce we now tht nd therefore then I V () V I () V I (4) So us reltes the nodl current njectons to the nodl voltges, s seen n (4). In developng the power

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

GMM Estimation of the GB2 class of Income Distributions from Grouped Data

GMM Estimation of the GB2 class of Income Distributions from Grouped Data GMM Estmton of the GB clss of Income Dstrbutons from Grouped Dt Gholmre Hjrgsht Deprtment of Economcs Unversty of Melbourne, Austrl Wllm E. Grffths Deprtment of Economcs Unversty of Melbourne, Austrl Joseph

More information

Soft Set Theoretic Approach for Dimensionality Reduction 1

Soft Set Theoretic Approach for Dimensionality Reduction 1 Interntonl Journl of Dtbse Theory nd pplcton Vol No June 00 Soft Set Theoretc pproch for Dmensonlty Reducton Tutut Herwn Rozd Ghzl Mustf Mt Ders Deprtment of Mthemtcs Educton nversts hmd Dhln Yogykrt Indones

More information

The solution of transport problems by the method of structural optimization

The solution of transport problems by the method of structural optimization Scentfc Journls Mrtme Unversty of Szczecn Zeszyty Nukowe Akdem Morsk w Szczecne 2013, 34(106) pp. 59 64 2013, 34(106) s. 59 64 ISSN 1733-8670 The soluton of trnsport problems by the method of structurl

More information

Identification of Robot Arm s Joints Time-Varying Stiffness Under Loads

Identification of Robot Arm s Joints Time-Varying Stiffness Under Loads TELKOMNIKA, Vol.10, No.8, December 2012, pp. 2081~2087 e-issn: 2087-278X ccredted by DGHE (DIKTI), Decree No: 51/Dkt/Kep/2010 2081 Identfcton of Robot Arm s Jonts Tme-Vryng Stffness Under Lods Ru Xu 1,

More information

Reproducing Kernel Hilbert Space for. Penalized Regression Multi-Predictors: Case in Longitudinal Data

Reproducing Kernel Hilbert Space for. Penalized Regression Multi-Predictors: Case in Longitudinal Data Interntonl Journl of Mthemtcl Anlyss Vol. 8, 04, no. 40, 95-96 HIKARI Ltd, www.m-hkr.com http://dx.do.org/0.988/jm.04.47 Reproducng Kernel Hlbert Spce for Penlzed Regresson Mult-Predctors: Cse n Longudnl

More information

The Dynamic Multi-Task Supply Chain Principal-Agent Analysis

The Dynamic Multi-Task Supply Chain Principal-Agent Analysis J. Servce Scence & Mngement 009 : 9- do:0.46/jssm.009.409 Publshed Onlne December 009 www.scp.org/journl/jssm) 9 he Dynmc Mult-sk Supply Chn Prncpl-Agent Anlyss Shnlng LI Chunhu WANG Dol ZHU Mngement School

More information

On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization On-lne Renforcement Lernng Usng Incrementl Kernel-Bsed Stochstc Fctorzton André M. S. Brreto School of Computer Scence McGll Unversty Montrel, Cnd msb@cs.mcgll.c Don Precup School of Computer Scence McGll

More information

MEASURING THE EFFECT OF PRODUCTION FACTORS ON YIELD OF GREENHOUSE TOMATO PRODUCTION USING MULTIVARIATE MODEL

MEASURING THE EFFECT OF PRODUCTION FACTORS ON YIELD OF GREENHOUSE TOMATO PRODUCTION USING MULTIVARIATE MODEL MEASURING THE EFFECT OF PRODUCTION FACTORS ON YIELD OF GREENHOUSE TOMATO PRODUCTION USING MULTIVARIATE MODEL Mrn Nkoll, MSc Ilr Kpj, PhD An Kpj (Mne), Prof. As. Jon Mullr Agrculture Unversty of Trn Alfons

More information

Utility function estimation: The entropy approach

Utility function estimation: The entropy approach Physc A 387 (28) 3862 3867 www.elsever.com/locte/phys Utlty functon estmton: The entropy pproch Andre Donso,, A. Hetor Res b,c, Lus Coelho Unversty of Evor, Center of Busness Studes, CEFAGE-UE, Lrgo Colegs,

More information

Online Learning Algorithms for Stochastic Water-Filling

Online Learning Algorithms for Stochastic Water-Filling Onlne Lernng Algorthms for Stochstc Wter-Fllng Y G nd Bhskr Krshnmchr Mng Hseh Deprtment of Electrcl Engneerng Unversty of Southern Clforn Los Angeles, CA 90089, USA Eml: {yg, bkrshn}@usc.edu Abstrct Wter-fllng

More information

Model Fitting and Robust Regression Methods

Model Fitting and Robust Regression Methods Dertment o Comuter Engneerng Unverst o Clorn t Snt Cruz Model Fttng nd Robust Regresson Methods CMPE 64: Imge Anlss nd Comuter Vson H o Fttng lnes nd ellses to mge dt Dertment o Comuter Engneerng Unverst

More information

Lecture 5 Single factor design and analysis

Lecture 5 Single factor design and analysis Lectue 5 Sngle fcto desgn nd nlss Completel ndomzed desgn (CRD Completel ndomzed desgn In the desgn of expements, completel ndomzed desgns e fo studng the effects of one pm fcto wthout the need to tke

More information

A Theoretical Study on the Rank of the Integral Operators for Large- Scale Electrodynamic Analysis

A Theoretical Study on the Rank of the Integral Operators for Large- Scale Electrodynamic Analysis Purdue Unversty Purdue e-pubs ECE Techncl Reports Electrcl nd Computer Engneerng -22-2 A Theoretcl Study on the Rnk of the Integrl Opertors for Lrge- Scle Electrodynmc Anlyss Wenwen Ch Purdue Unversty,

More information

Cramer-Rao Lower Bound for a Nonlinear Filtering Problem with Multiplicative Measurement Errors and Forcing Noise

Cramer-Rao Lower Bound for a Nonlinear Filtering Problem with Multiplicative Measurement Errors and Forcing Noise Preprnts of the 9th World Congress he Interntonl Federton of Automtc Control Crmer-Ro Lower Bound for Nonlner Flterng Problem wth Multplctve Mesurement Errors Forcng Nose Stepnov О.А. Vslyev V.А. Concern

More information