Bias and Variance Approximation in Value Function Estimates


Shie Mannor, Duncan Simester, Peng Sun, John N. Tsitsiklis

July 11, 2004. Revised: July 5, 2005.

Abstract

We consider a Markov Decision Process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.

Acknowledgments. This research was partially supported by NSF grant DMI-[…]. The paper has benefited from comments by workshop participants at Duke University, University of Pennsylvania, Washington University at St. Louis, the 2004 International Conference on Machine Learning, and the INFORMS Annual Meeting. The authors are thankful to Yann Le Tallec for finding an error in a previous version and to the referees for constructive comments. The authors are especially thankful to the department editor for a detailed review and many constructive suggestions.

Author affiliations:
Shie Mannor: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139; current address: Department of Electrical and Computer Engineering, McGill University, Montreal, Quebec H3A 2A7, Canada, shie@ece.mcgill.ca
Duncan Simester: Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139, simester@mit.edu
Peng Sun: Fuqua School of Business, Duke University, Durham, NC 27708, psun@duke.edu
John N. Tsitsiklis: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, jnt@mit.edu

1 Introduction

Bellman's value function plays a central role in the optimization of dynamic decision-making models, as well as in the structural estimation of dynamic models of rational agents. For the important case of a finite-state Markov Decision Process (MDP), the value function depends on two types of model parameters: the transition probabilities between states and the expected one-step rewards from each state. In many applications in the social sciences and in engineering, the transition probabilities and expected rewards are not known and instead must be estimated from finite samples of data. The estimation errors for these parameters introduce errors and biases in the value function estimates.

In this paper, we present a methodology for evaluating the bias and variance in value function estimates caused by errors in the model parameters. This, in turn, allows the calculation of confidence intervals around the value function estimates. The confidence intervals are themselves approximations. For analytical and computational tractability, they rely on second order Taylor series approximations. Moreover, because the expressions for the bias and the variance approximation require the true but unknown model parameters, we replace these unknown parameters by their estimates. We evaluate the accuracy of these approximations and validate the expressions using a large sample of real data obtained from a mail-order catalog company.

Sources of Variance

We start by distinguishing between two types of variance that can arise in an MDP: internal and parametric. Internal variance reflects the stochasticity in the transitions and rewards. For example, in a marketing setting there is rarely certainty as to whether an individual customer will purchase, resulting in genuinely stochastic transitions and rewards. Parametric variance arises if the true transition probabilities and expected rewards are estimated rather than known; the potential for error in the estimates of these parameters introduces variance in the value function estimates. The two types of variance have different sources and can be illustrated through different experiments.

To illustrate internal variance, we can fix the model parameters and then generate a number of finite-length sample trajectories (with all trajectories having the same length, starting from the same state, and using a common control policy). The variation across sample trajectories in the total rewards and/or the identity of the final state reflects internal variance.
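The internal-variance experiment can be sketched in a few lines of Python. The two-state chain, the rewards, the horizon, the discount factor, and the function name `simulate_total_reward` are illustrative assumptions, not quantities from the catalog data; the sketch only shows that total rewards vary across trajectories even when the model parameters are held fixed.

```python
import random
import statistics

def simulate_total_reward(P, R, start, horizon, alpha, rng):
    """Discounted total reward along one trajectory of a fixed, known chain.

    P[i][j] is the transition probability and R[i][j] the reward earned on a
    transition from i to j. Both are hypothetical values for illustration.
    """
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        nxt = rng.choices(range(len(P)), weights=P[state])[0]
        total += discount * R[state][nxt]
        discount *= alpha
        state = nxt
    return total

rng = random.Random(0)
R = [[1.0, 0.0], [0.0, 2.0]]

# Stochastic chain: same parameters, same start state, same length,
# yet the total reward differs from trajectory to trajectory.
P = [[0.7, 0.3], [0.4, 0.6]]
totals = [simulate_total_reward(P, R, 0, 20, 0.9, rng) for _ in range(2000)]
internal_variance = statistics.pvariance(totals)

# Deterministic chain: every trajectory is identical, so there is no
# internal variance at all.
P_det = [[0.0, 1.0], [1.0, 0.0]]
totals_det = [simulate_total_reward(P_det, R, 0, 20, 0.9, rng) for _ in range(50)]
internal_variance_det = statistics.pvariance(totals_det)

print(internal_variance, internal_variance_det)
```

Averaging `totals` over many trajectories shrinks this variation, which is why internal variance matters little when outcomes are aggregated over a large population.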

Unlike internal variance, parametric variance is not mitigated by aggregation across samples. It can be illustrated by comparing the average outcomes from a large number of samples generated under different estimates for the model parameters. The variation in the average outcomes under different estimates reflects parametric variance.

Internal variance has already been considered in the literature. In particular, Sobel (1982) provides an expression for the internal variance in a Markov Decision Process with discounted rewards, while Filar et al. (1989) and Baykal-Gursoy and Ross (1992) consider the average reward criterion. In this paper we focus on parametric variance. Our motivation is that in many contexts the underlying objective involves averaging outcomes across a large number of samples, in which case the internal variance is averaged out. For example, in a marketing application, firm profits typically represent the aggregation of outcomes across a large number of customers. Similarly, in a labor economics setting, a firm often aggregates across a large number of employees. Of course, there are settings where internal variance is also important. For example, when allocating financial portfolios, the (internal) variance of the return on a single financial portfolio is important in its own right.

Literature

Markov Decision Problems, and the associated methodology of Dynamic Programming, have found a broad range of applications in numerous fields in the social sciences and in engineering. These applications can be broadly divided into two categories, based upon the research objectives.

The first and more traditional category of applications focuses on optimizing the operation of human or engineering systems, and on providing tools for effective decision-making. The application areas are vast, and include finance (Luenberger, 1997; Campbell and Viceira, 2002), economics (Dixit and Pindyck, 1994), inventory control and supply chain management (Zipkin, 2000), revenue and yield management (McGill and van Ryzin, 1999), transportation (Godfrey and Powell, 2002), communications, water resource management, and electric power systems.
The vast majority of this literature assumes that an accurate system model is available. There is an underlying implicit assumption that the true model will be estimated using statistical methods on the basis of whatever data are available. However, the statistical ramifications of working with finite data records have received little attention. An exception is the literature dealing with on-line learning of optimal policies (adaptive control of Markov chains, reinforcement learning) (Sutton and Barto, 1998; Bertsekas and Tsitsiklis, 1996). However, this literature is concerned with asymptotic convergence as opposed to the common statistical questions of standard errors and confidence intervals.

The second category of applications focuses on explaining observed phenomena. Amongst the most widely cited examples is the work of Rust (1987), who develops a discrete dynamic programming model of the optimal replacement policy for bus engines. According to this approach the researcher starts by assuming that individuals or firms behave optimally, but that the parameters of the firm or customer decision problem are unknown. By maximizing the likelihood of the empirically observed actions of individuals or firms under the optimal policies for different sets of parameters, the researcher seeks to identify these unobserved parameters. Similar applications of discrete dynamic programming models have become increasingly common, particularly in the labor (Keane and Wolpin, 1994), industrial organization (Hendel and Nevo, 2002), and marketing (Gönül and Shi, 1998) literatures. While these methods use a variety of approaches to calculate or approximate the value function, the value function relies upon point estimates of the model parameters. Previous attempts to consider the impact of parameter error on the calculated value function have been limited to simulation-based approaches.

We finally note that the impact of uncertainty in the model parameters on the accuracy of the value function estimates has received attention in the finance literature. For example, Xia (2001) and Barberis (2000) investigate how dynamic learning about stock return predictability affects optimal portfolio allocations. The general problem considered in these studies is similar to the one addressed in this paper. However, the sources of variance are different. In particular, the finance literature is concerned with internal variance due to the stochasticity in the underlying process, and parametric variance due to non-stationarity of the model parameters, including changes in the investment horizon and/or dynamic learning.
In contrast, we abstract away from the problem of internal variance, assume that the model parameters are stationary, and focus on the parametric variance that results from estimating the model parameters from a finite sample of data.

Overview

As far as we know, this is the first paper to study parametric bias and variance in Markov Decision Processes. It serves two purposes. First, to illustrate the potential for error in value function estimates and to highlight the potential magnitude of these errors. Second, to provide formulas and a methodology for estimating the bias and variance in value function estimates, which can then be used to construct confidence intervals around the value function estimates.

We begin with some notation and background material in Section 2. In Section 3 we illustrate the relationship between errors in the model parameters and the accuracy of value function estimates using actual data from a catalog mailing context. In Section 4, we present a methodology for estimating the bias and variance in the value function estimates. In Section 5, we validate our methodology using the catalog mailing data. We conclude in Section 6 with a review of the findings and a discussion of opportunities for future research.

2 A Formal Description of the Problem

We consider Markov Decision Processes (MDP) with a fixed policy, where both the MDP and the policy are assumed stationary. The assumption that the policy is fixed allows us to initially abstract away from the control problem. As we discuss in Section 4.2, the impact of parameter uncertainty on the solution to the control problem raises additional issues.

An MDP is specified by a finite set \(S\) of states, of cardinality \(m\), a finite set \(A\) of actions, and two scalars, \(P^a_{ij}\) and \(R^a_{ij}\), for every \(i, j \in S\) and every \(a \in A\). These scalars are interpreted as follows: if the current state is \(i\) and action \(a\) is applied, then the next state is \(j\) with probability \(P^a_{ij}\); furthermore, given that a transition from \(i\) to \(j\) occurs following an action equal to \(a\), a random reward is obtained, whose conditional expectation is equal to \(R^a_{ij}\). We make the usual Markovian assumptions, namely, that given \(i\) and \(a\), the next state is conditionally independent from the past history of the process; also, that given \(i\), \(a\), and \(j\), the associated reward is again conditionally independent from the past history of the process. Note that if action \(a\) is applied at state \(i\), the expected reward, denoted by \(R^a_i\), is equal to \(\sum_j P^a_{ij} R^a_{ij}\).

We are interested in the value function associated with a stationary, Markovian, possibly randomized, policy \(\pi\).
We use \(\pi(a|i)\) to denote the conditional probability of applying action \(a\) when at state \(i\). Let \(P^\pi_{ij} = \sum_a \pi(a|i) P^a_{ij}\), which is the transition probability from \(i\) to \(j\), and

\[ R^\pi_i = \sum_a \pi(a|i) R^a_i = \sum_a \pi(a|i) \sum_j P^a_{ij} R^a_{ij}, \tag{1} \]

which is the expected reward at state \(i\), under the policy \(\pi\). We use \(P^\pi\) to denote the \(m \times m\) matrix with entries \(P^\pi_{ij}\), and \(R^\pi\) to denote the \(m\)-dimensional vector with components \(R^\pi_i\).

We restrict our attention to the infinite horizon, discounted reward criterion for a fixed discount factor \(\alpha \in (0, 1)\). Define the value function associated with policy \(\pi\) to be the \(m\)-dimensional vector given by

\[ Y^\pi = \sum_{k=0}^{\infty} \alpha^k (P^\pi)^k R^\pi. \]

Using the geometric series formula, the value function is given by (Bellman, 1957)

\[ Y^\pi = (I - \alpha P^\pi)^{-1} R^\pi. \]

In our setting the true model parameters, \(P^a_{ij}\) and \(R^a_{ij}\), are not known. Instead, we have access to a finite sample of data, from which these parameters can be estimated. Specifically, assume that for every \(i\) and \(a\), we have a record of \(N^a_i\) transitions out of state \(i\), under action \(a\), and the associated rewards. We treat the numbers \(N^a_i\) as fixed (not as random variables), and assume that \(N^a_i > 0\) for every \(i\) and \(a\).

This last assumption restricts attention to actions that have been tried before. For at least two reasons we anticipate that this will be a relatively weak assumption in practice. First, the inability to evaluate actions in one state does not restrict our ability to evaluate the same action in other states, because we can still evaluate an action at any state where the action has been tried before. Thus the restriction only applies to states in which there is no past information about the outcome. Second, there is a tremendous amount of variation in historical policies in many real-world applications. This variation may arise for a lot of reasons including experimentation, implementation errors, or non-stationarity in the policy. If there is interest in untried actions, and there are priors available to help predict the outcome, then a Bayesian approach can be used. For completeness we detail such an approach in the online Appendix D (Mannor et al., 2005).
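As a concrete illustration of Eq. (1) and the two expressions for the value function, here is a minimal Python sketch. The two-state, two-action model, the policy, the discount factor, and the helper names (`solve`, `policy_model`, `value_function`) are all assumptions made for this example; only the formulas come from the text.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def policy_model(P, R, pi):
    """Mix per-action parameters P[a][i][j], R[a][i][j] with pi[i][a] (Eq. 1)."""
    m, actions = len(pi), range(len(P))
    P_pi = [[sum(pi[i][a] * P[a][i][j] for a in actions) for j in range(m)]
            for i in range(m)]
    R_pi = [sum(pi[i][a] * sum(P[a][i][j] * R[a][i][j] for j in range(m))
                for a in actions) for i in range(m)]
    return P_pi, R_pi

def value_function(P_pi, R_pi, alpha):
    """Y = (I - alpha P)^{-1} R, computed as a linear solve."""
    m = len(R_pi)
    A = [[(1.0 if i == j else 0.0) - alpha * P_pi[i][j] for j in range(m)]
         for i in range(m)]
    return solve(A, R_pi)

# Hypothetical model: 2 states, 2 actions, randomized policy in state 0.
P = [[[0.9, 0.1], [0.5, 0.5]], [[0.6, 0.4], [0.2, 0.8]]]
R = [[[1.0, 4.0], [0.0, 2.0]], [[0.5, 3.0], [1.0, 2.5]]]
pi = [[0.3, 0.7], [1.0, 0.0]]
alpha = 0.9

P_pi, R_pi = policy_model(P, R, pi)
Y = value_function(P_pi, R_pi, alpha)

# Partial sums of the series sum_k alpha^k P^k R, built by repeated substitution.
Y_series = [0.0, 0.0]
for _ in range(500):
    Y_series = [R_pi[i] + alpha * sum(P_pi[i][j] * Y_series[j] for j in range(2))
                for i in range(2)]

series_gap = max(abs(Y[i] - Y_series[i]) for i in range(2))
bellman_residual = max(
    abs(Y[i] - (R_pi[i] + alpha * sum(P_pi[i][j] * Y[j] for j in range(2))))
    for i in range(2))
print(Y, series_gap, bellman_residual)
```

The linear solve and the truncated series agree to numerical precision, which is the content of the geometric series formula above.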
Furthermore, we do not assume any relation between the sampling process and the policy \(\pi\) of interest; in particular, the \(N^a_i\), for different \(a\), need not be proportional to the \(\pi(a|i)\), and the number \(N_i = \sum_a N^a_i\) of transitions out of state \(i\) need not be related to the steady-state probability of state \(i\) under policy \(\pi\).

For the \(N^a_i\) transitions out of state \(i\) under action \(a\) in the sample data, let \(N^a_{ij}\) be the number of transitions that led to state \(j\). Furthermore, let \(C^a_{ij}\) be the sum of the rewards associated with these \(N^a_{ij}\) transitions (for completeness we define \(C^a_{ij} = 0\) if \(N^a_{ij} = 0\)). We define

\[ \hat P^a_{ij} = \frac{N^a_{ij}}{N^a_i}, \qquad \hat R^a_{ij} = \frac{C^a_{ij}}{N^a_{ij}}, \]

which will be our estimates of \(P^a_{ij}\) and \(R^a_{ij}\), respectively. When \(N^a_{ij} = 0\), we define \(\hat R^a_{ij} = 0\) (see Footnote 1). In addition, we define \(\hat P^\pi_{ij} = \sum_a \pi(a|i) \hat P^a_{ij}\), and

\[ \hat R^a_i = \sum_j \hat P^a_{ij} \hat R^a_{ij} = \frac{\sum_j C^a_{ij}}{N^a_i}, \qquad \hat R^\pi_i = \sum_a \pi(a|i) \hat R^a_i, \tag{2} \]

which will be our estimates of \(P^\pi_{ij}\), \(R^a_i\), and \(R^\pi_i\), respectively. We finally define a matrix \(\hat P^\pi\) and a vector \(\hat R^\pi\), with entries \(\hat P^\pi_{ij}\) and \(\hat R^\pi_i\), respectively, which will be our estimates of \(P^\pi\) and \(R^\pi\). Based on these estimates, we obtain an estimated value function \(\hat Y^\pi\), given by

\[ \hat Y^\pi = (I - \alpha \hat P^\pi)^{-1} \hat R^\pi. \tag{3} \]

Footnote 1: The possibility of \(N^a_{ij}\) being zero for feasible transitions introduces some additional bias, which will not be accounted for. However, in our analysis, we will assume that any transition with \(N^a_{ij} = 0\) is infeasible.

We assume that the sample data reflect the true process, in the following sense. The vector \((N^a_{i1}, \ldots, N^a_{im})\) follows a multinomial distribution with parameters \((N^a_i; P^a_{i1}, \ldots, P^a_{im})\). Let \(\mathbb{E}\) denote expectation under the true model. We then have \(\mathbb{E}[N^a_{ij}] = N^a_i P^a_{ij}\). A last assumption, which reflects our earlier assumptions that \(N^a_i\) is fixed and that each sample reward is conditionally independent from the past, is that \(\mathbb{E}[C^a_{ij} \mid N^a_{ij}] = N^a_{ij} R^a_{ij}\). Under these assumptions it is easily verified that \(\hat P^\pi\) and \(\hat R^\pi\) are unbiased estimates of \(P^\pi\) and \(R^\pi\).

Based on Eq. (3), we can anticipate the impact of errors in \(\hat P^\pi\) and \(\hat R^\pi\) on \(\hat Y^\pi\). Notice first that \(\hat Y^\pi\) is linear in \(\hat R^\pi\), so that if \(P^\pi\) were observed without error (i.e., if \(\hat P^\pi = P^\pi\)), the variance of \(\hat R^\pi\) would lead to variance in \(\hat Y^\pi\) but not to bias (since \(\hat R^\pi\) is unbiased). In contrast, \(\hat Y^\pi\) is nonlinear in \(\hat P^\pi\), so that errors in \(\hat P^\pi\) lead to both bias and variance in \(\hat Y^\pi\). Moreover, due to the matrix inversion the nonlinearity is substantial, so that any error in \(\hat P^\pi\) can translate to a large error in \(\hat Y^\pi\). This is particularly true when \(\alpha\) is close to one. Furthermore, if the errors in \(\hat P^\pi\) and \(\hat R^\pi\) are correlated, the nonlinearity implies that errors in \(\hat R^\pi\) will also lead to bias in \(\hat Y^\pi\).

3 An Illustration

To illustrate the bias and variance that can be introduced to value function estimates by errors in the model parameters, we use real data from a mail-order catalog company. While this application serves as a useful case study, our findings are not limited to this application.

Deciding who should receive a catalog is amongst the most important decisions that mail-order companies must address. Yet, identifying an optimal mailing policy is a difficult task. Customer response functions are highly stochastic, reflecting in part the relative paucity of information that firms have about each customer. Moreover, the problem is a dynamic one. Purchasing decisions are influenced not just by the firm's most recent mailing decision, but also by prior mailing decisions. As a result, the optimal mailing decision depends upon past and future mailing decisions.

A typical catalog company might mail 25 catalogs per year. The number of catalogs, the dates that they are mailed, and the content of the catalogs are determined up to a year before the firm decides to whom each catalog will be mailed. For this reason, these decisions are typically treated as fixed when deciding whom to mail to. Accordingly, the firm only needs to decide which customers to mail to, on each exogenously determined mailing date (a discrete infinite horizon problem). The firm's objective is to maximize its expected total discounted profits. Rewards (profits) in each period are calculated as the revenue earned from customer purchases (if any) less the cost of the goods sold and the mailing costs (approximately 65 cents per catalog mailed).

To support their mailing decisions, catalog firms typically maintain large databases describing the individual purchase and mailing histories for each customer. We are fortunate to have access to a large database describing the transaction and mailing histories for the women's apparel division of a moderately large catalog company. This data is described in detail in Simester et al. (2004). It includes the complete transaction histories for approximately 1.72 million customers.
The mailing histories are complete for the six-year period from 1996 through 2002 (the company did not maintain a record of the mailing history prior to 1996). Catalogs were mailed on 133 occasions in this six-year period, so that on average a mailing decision occurred every 2-3 weeks.

The catalog mailing problem can be modelled as an MDP (as in Gönül and Shi, 1998), where the state is a summary of the customer's history, and the action at each period is to either mail or not mail. The construction of the state space is an interesting problem that we will not consider here. We will instead follow a standard industry approach to this problem that uses three state variables, the so-called RFM measures (e.g., Bult and Wansbeek, 1995; Bitran and Mondschein, 1996). These measures describe the recency, frequency, and monetary value of customers' prior purchases (see Footnote 2). For the purposes of this illustration, we constructed a state space by quantizing each of the RFM variables to 4 discrete levels, yielding a state space with \(|S| = 4^3 = 64\) states. At each historical mailing epoch, we evaluate the RFM variables of each customer (regardless of whether the customer received a catalog or made a purchase) and characterize him/her into one of the 64 states. We also treat the purchase amount (zero if no purchase in the epoch) less the mailing cost as a reward sample. Therefore each customer's historical data over time serves as a sample trajectory. Following the procedure described in the previous section, we may then estimate the model parameters \(\hat P\) and \(\hat R\) and calculate \(\hat Y\) for the current policy embedded in the data.

Since the firm is interested in the average profit per customer, rather than the profit earned from an individual customer, internal variance is averaged out. However, parametric variance is of interest because it affects the comparison of different policies. In particular, when evaluating a new policy, the firm would like both a prediction of the expected profits from adopting the new policy, together with confidence bounds around that prediction.

In order to illustrate the impact of parametric variance, we randomly divided the 1.72 million customers and 164 million observations into 250 equally sized sub-samples, each containing approximately 657 thousand observations.
By observation we mean a mailing period and an associated state transition in the history of a customer, irrespective of whether a catalog was mailed or a purchase was made during that time period. We then separately estimated the model parameters \(\hat P^\pi\) and \(\hat R^\pi\) following Section 2 using the observations from each of these sub-samples. Here we considered the policy \(\pi\) to be the same as the sampling policy that generated the data. Using Eq. (3) we calculated 250 estimates of the value function. As a benchmark, we also estimated the model parameters using the full sample of 1.72 million customers. For the purposes of this illustration, we will interpret the model estimated using the full sample as the true model, which is essentially equivalent to assuming that the 1.72 million customers are the full population.

Footnote 2: Recency is measured as the number of days (in hundreds) since a customer's last purchase. Frequency measures the number of items that customers previously purchased. Monetary Value measures the average price (in dollars) of the items ordered by each customer.

Thus, within a typical sub-sample, the expected rewards in each state, \(\hat R^\pi\), were estimated using an average of approximately 10 thousand observations (\(N_i\)), while the transition matrix \(\hat P^\pi\) was estimated using an average of 160 observations per transition. In practice, most of the transitions are infeasible; for example, a customer cannot transition from having 3 prior purchases to only having 2 prior purchases. When limiting attention to only those transitions that are feasible, the average number of observations per transition was approximately 1,400. (That is, the average of the positive \(N_{ij}\)'s is around 1,400.)

In Figure 1 we report the empirical distribution (histogram) of the value function \(\hat Y^\pi\) across all 250 sub-samples under the historical policy used by the firm (as calculated using the whole sample). In order to summarize an estimated value function with a single number (to be referred to as the average value function, or AVF) for each sub-sample, we average the estimates across states (weighing each state equally). The true AVF, computed from the parameters estimated for the full sample, is $28.54. In comparison, the average of the 250 estimates is $28.65, with an empirical standard deviation of $0.97. The difference between $28.54 and $28.65 is not statistically significant and is of seemingly little managerial importance. However, the variance is potentially very important. The 95% confidence interval around the 250 AVF estimates ranges from $26.59 to $30.49, or roughly 14% of the true mean.

Of course, we were able to estimate the $0.97 standard deviation only because we had access to many sub-samples.
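The sub-sample experiment can be mimicked on synthetic data. The sketch below is not the catalog analysis: the three-state chain, the Gaussian reward noise, the sample sizes, and the function names are hypothetical choices, and a single action is assumed so that the policy plays no role. It draws a fixed number of transitions out of every state (matching the assumption that the counts are fixed), forms the estimates of Section 2, and reports the spread of the resulting AVF estimates.

```python
import random
import statistics

def value_function(P, R, alpha, sweeps=400):
    """Approximate (I - alpha P)^{-1} R by summing the geometric series."""
    m = len(R)
    Y = [0.0] * m
    for _ in range(sweeps):
        Y = [R[i] + alpha * sum(P[i][j] * Y[j] for j in range(m)) for i in range(m)]
    return Y

def estimate_from_sample(P, mean_reward, n_per_state, rng):
    """Draw n_per_state transitions out of every state; return (P_hat, R_hat)."""
    m = len(P)
    P_hat, R_hat = [], []
    for i in range(m):
        counts, reward_sum = [0] * m, 0.0
        for _ in range(n_per_state):
            j = rng.choices(range(m), weights=P[i])[0]
            counts[j] += 1
            # Noisy reward whose conditional mean is mean_reward[i][j].
            reward_sum += rng.gauss(mean_reward[i][j], 1.0)
        P_hat.append([c / n_per_state for c in counts])   # N_ij / N_i
        R_hat.append(reward_sum / n_per_state)            # sum_j C_ij / N_i
    return P_hat, R_hat

def avf(Y):
    """Average value function: states weighted equally."""
    return sum(Y) / len(Y)

rng = random.Random(1)
alpha = 0.9
P_true = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
mean_reward = [[0.0, 1.0, 4.0], [0.5, 1.0, 2.0], [0.0, 0.5, 3.0]]
R_true = [sum(P_true[i][j] * mean_reward[i][j] for j in range(3)) for i in range(3)]
true_avf = avf(value_function(P_true, R_true, alpha))

avfs = []
for _ in range(20):  # 20 independent "sub-samples"
    P_hat, R_hat = estimate_from_sample(P_true, mean_reward, 500, rng)
    avfs.append(avf(value_function(P_hat, R_hat, alpha)))

mean_avf = statistics.mean(avfs)
spread = statistics.stdev(avfs)
print(true_avf, mean_avf, spread)
```

The spread across sub-samples is pure parametric variance: every sub-sample is evaluated exactly, and only the estimated parameters differ.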
In a real-world setting, where only a single sample is available, the researcher generally relies on simulations or jackknifing techniques to estimate the standard deviation. In this paper, we will present a procedure for deriving closed-form approximations of the standard deviation directly from the data.

Figure 1: Mail catalog problem: histogram of the AVF of the historical policy for a partition of the customers into 250 sub-samples (horizontal axis: AVF per sub-sample; vertical axis: number of sub-samples). The discount factor per period is \(\alpha\) = […]. The policy used is the historical (mixed) policy used by the firm, and the value function is weighted uniformly across states. The AVF obtained from the full data is $28.54, and is plotted as a vertical line. The empirical standard deviation is $0.97.

We can demonstrate the robustness of the above described results by varying both the size of the sub-samples and the discount factor. In Table 1 we present the empirical bias and standard deviation for different discount factors (averaged over 10 repetitions). In each repetition, we divide the data set into 100 sub-samples and compute the AVF for each sub-sample. We calculate the average absolute value of the bias and the empirical standard deviation of the AVF estimates across sub-samples. It can be seen that the average bias is small for discount factors that are not too close to 1. For discount factors that are close to 1, the bias becomes more meaningful but still remains much smaller than the standard deviation.

Table 1: Bias and variance as a function of the discount factor. For each discount factor, we partition the data 10 times, with each partition resulting in 100 sub-samples (each with roughly 1.6 million observations). We present in the table the mean absolute value of the bias and the mean empirical standard deviation, each averaged across the ten repetitions. Both of these means are standardized by dividing by the AVF associated with the historical policy (as measured on the whole data set).

    α       bias/AVF    STD/AVF
    […]     […]%        3.57%
    […]     […]%        3.37%
    […]     […]%        3.32%
    […]     […]%        3.26%
    […]     […]%        3.33%
    […]     […]%        3.88%
    […]     […]%        5.26%

In another experiment we varied the precision of the estimates by changing the size of the sub-samples and repeated the analysis using sub-samples with different numbers of observations. In Figure 2 we report empirical standard deviations of the AVF estimates under the different sized sub-samples. Each cross in Figure 2 represents a random assignment of the observations to sub-samples (the different assignments lead to variation in the sub-samples between repetitions). While increasing the size of the sub-samples increases the accuracy of the model parameters, and in turn reduces the variance in the AVF estimates, the rate at which the variance approaches zero slows down as the sub-samples increase in size. It seems that even when estimating the model parameters with very large amounts of data, parametric variance leads to non-negligible variance in the value function estimates.

Figure 2: Mail catalog problem: the empirical standard deviation of the AVF as a function of the sample size (horizontal axis: number of observations per sub-sample, in millions; vertical axis: STD of AVF, in dollars). Each cross represents a single (random) partition of the observations into sub-samples.

4 Analysis

In this section we provide closed-form approximations for the bias and variance of the estimated value function using second order approximations. We then briefly discuss the control problem where, in addition to the estimation process, we look for an optimal policy. In Section 4.1 we will drop the superscript \(\pi\), because we consider a fixed policy \(\pi\).

4.1 Approximations for Bias and Variance in the Estimated Value Function

We now derive closed-form approximations for the (parametric) bias and variance of \(\hat Y\). The analysis follows a classical (non-Bayesian) approach, where the bias and variance are expressed in terms of the (unknown) true parameters. Since the true model parameters are unknown, we substitute the estimated parameters, which is standard practice. However, as a result of this substitution, the values obtained for the bias and variance are themselves estimates. For completeness we also provide in the online Appendix D (Mannor et al., 2005) a Bayesian analysis. Under the Bayesian approach \(P\) and \(R\) are treated as random variables with known prior distributions, and we deduce approximations for the conditional bias and variance, given the values of \(\hat P\) and \(\hat R\). The expressions obtained using the Bayesian approach are almost identical to the ones in the classical approach (unless an informative prior is available).

Our goal is to calculate \(\mathbb{E}[\hat Y]\) and the covariance matrix for \(\hat Y\), defined by

\[ \operatorname{cov}(\hat Y) = \mathbb{E}[\hat Y \hat Y^\top] - \mathbb{E}[\hat Y]\,\mathbb{E}[\hat Y]^\top. \]

We define a random \(m \times m\) matrix \(\tilde P = \hat P - P\) and a random \(m\)-vector \(\tilde R = \hat R - R\). Note that \(\tilde P\) and \(\tilde R\) are zero mean random variables that represent the difference between the true model and the estimated model.

To help interpret some of the later analysis, it will be helpful to have a sense of the magnitudes of \(\tilde P\) and \(\tilde R\). Because the transition probabilities are bounded by zero and one, the absolute errors in these probabilities are also bounded between zero and one. The transition probabilities themselves will tend to be smaller the larger the number of states to which transitions are feasible, while the errors in these probabilities will be smaller the more observations there are relative to the number of feasible transitions. In the example discussed in Section 3 and Figure 1, the maximum error in the transition probabilities in a sub-sample (\(\max_{ij} |\tilde P_{ij}|\)) has a mean of 0.011 and a standard deviation of […]. Furthermore, the average (averaged over all pairs \((i, j)\) with nonzero transition probability) absolute error in the transition probability estimates, \(|\tilde P_{ij}|\), has a mean of […] and an empirical standard deviation of […]. Note that in that example, the feasible transitions consist of less than 10% of the \(64^2\) entries in \(P\). The expected rewards are not bounded a priori and so the errors are also unbounded. In the catalog example, the average absolute error in the reward estimates, \(|\tilde R_{ij}|\), has a mean of $4.25 and a standard deviation of $1.82. The maximal error in the reward estimates, \(\max_{ij} |\tilde R_{ij}|\), has a mean of $56.3 and a standard deviation of $43.2.

We now write the expectation of \(\hat Y\) (cf. Eq. (3)) as:

\[ \mathbb{E}[\hat Y] = \mathbb{E}\big[(I - \alpha(P + \tilde P))^{-1}(R + \tilde R)\big] = \mathbb{E}\Big[\sum_{k=0}^{\infty} \alpha^k (P + \tilde P)^k (R + \tilde R)\Big], \tag{4} \]

where the geometric series expansion of \((I - \alpha(P + \tilde P))^{-1}\) was used to obtain the second equality. We use the notation \(X = (I - \alpha P)^{-1}\) and

\[ f_k(\tilde P) = X(\tilde P X)^k = (X \tilde P)^k X. \]

The following lemma will be useful.

Lemma 4.1 \(\sum_{l=0}^{\infty} \alpha^l (P + \tilde P)^l = \sum_{k=0}^{\infty} \alpha^k f_k(\tilde P)\).

Proof:

\[ \sum_{k=0}^{\infty} \alpha^k f_k(\tilde P) = \sum_{k=0}^{\infty} \alpha^k (X \tilde P)^k X = (I - \alpha X \tilde P)^{-1} X = (X^{-1} - X^{-1} \alpha X \tilde P)^{-1} = (I - \alpha P - \alpha \tilde P)^{-1} = \sum_{l=0}^{\infty} \alpha^l (P + \tilde P)^l, \]

where we repeatedly used the definition of \(X\), and the fact that \(X\) is invertible.

Using Lemma 4.1 in Eq. (4), we obtain:

\[ \mathbb{E}[\hat Y] = (I - \alpha P)^{-1} R + \Big(\sum_{k=1}^{\infty} \alpha^k \mathbb{E}[f_k(\tilde P)]\Big) R + \sum_{k=0}^{\infty} \alpha^k \mathbb{E}[f_k(\tilde P) \tilde R]. \tag{5} \]

There are three terms on the right-hand side of Eq. (5). The first term is the value function for the true model. The second term reflects the bias introduced by the uncertainty in \(\hat P\) alone, and the third term represents the bias introduced by the correlation between the errors in \(\hat P\) and \(\hat R\).

Equation (5) provides a series expansion of the error in terms of high order moments and cross moments of the errors in \(\hat P\) and \(\hat R\). The calculation of the bias is tedious because the term \(\mathbb{E}[f_k(\tilde P)]\) involves \(k\)th order moments of multinomial distributions. But since \(\tilde P_{ij}\) is typically small, \(\tilde P^k\) is generally close to zero for large \(k\). For this reason we limit our attention to a second order approximation and we will assume that \(\mathbb{E}[f_k(\tilde P)] \approx 0\) for \(k > 2\), and that \(\mathbb{E}[f_k(\tilde P) \tilde R] \approx 0\) for \(k > 1\). We use the catalog data to investigate the appropriateness of this assumption in Section 5. Under this assumption, we can write Eq. (5) as:

\[ \mathbb{E}[\hat Y] = (I - \alpha P)^{-1} R + \alpha \mathbb{E}[f_1(\tilde P)] R + \alpha^2 \mathbb{E}[f_2(\tilde P)] R + X \mathbb{E}[\tilde R] + \alpha \mathbb{E}[f_1(\tilde P) \tilde R] + L_{\exp}, \tag{6} \]

where we represent all the terms of order greater than 2 in

\[ L_{\exp} = \sum_{k=3}^{\infty} \alpha^k \mathbb{E}[f_k(\tilde P)] R + \sum_{k=2}^{\infty} \alpha^k \mathbb{E}[f_k(\tilde P) \tilde R]. \]
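Lemma 4.1 is easy to check numerically. The following sketch compares the closed form \((I - \alpha(P + \tilde P))^{-1}\) with a truncated version of the series \(\sum_k \alpha^k (X \tilde P)^k X\) for a two-state chain. The matrices, the error term `P_err` (standing in for \(\tilde P\)), and the truncation at 60 terms are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def resolvent(P, alpha):
    """(I - alpha P)^{-1} for a 2x2 matrix P."""
    return inv2([[(1.0 if i == j else 0.0) - alpha * P[i][j] for j in range(2)]
                 for i in range(2)])

alpha = 0.9
P = [[0.7, 0.3], [0.4, 0.6]]                 # hypothetical true model
P_err = [[0.05, -0.05], [-0.02, 0.02]]       # hypothetical error; rows sum to 0
P_hat = [[P[i][j] + P_err[i][j] for j in range(2)] for i in range(2)]

X = resolvent(P, alpha)
closed_form = resolvent(P_hat, alpha)        # left-hand side of the lemma

# Right-hand side: sum_k alpha^k (X P_err)^k X, truncated after 60 terms.
XE = matmul(X, P_err)
term = X                                     # k = 0 term
series = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(60):
    series = [[series[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    term = [[alpha * v for v in row] for row in matmul(XE, term)]

max_gap = max(abs(closed_form[i][j] - series[i][j])
              for i in range(2) for j in range(2))
print(max_gap)
```

Because the error matrix is small, successive terms shrink quickly, which is the same observation that motivates truncating the expansion at second order.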

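Given that we will be using second order approximations, we expect that the mean and variance of \(\hat Y\) can be calculated as long as we are able to compute the covariance between various entries of \(\tilde R\) and \(\tilde P\). We start with \(\tilde P\).

First we introduce some notation. We use the notation \(A_{i\cdot}\) and \(A_{\cdot i}\) to denote the \(i\)th row and column, respectively, of a matrix \(A\), and \(\operatorname{diag}(A_{i\cdot})\) to denote a diagonal matrix with the entries of \(A_{i\cdot}\) along the diagonal. We note that \(\tilde P_{i\cdot}\) and \(\tilde P_{j\cdot}\) are independent when \(i \neq j\). To find the covariance matrix of \(\tilde P_{i\cdot}\), we consider the row vectors \(\hat P^a_{i\cdot}\) and \(P^a_{i\cdot}\) with the estimated and true transition probabilities, and define \(\tilde P^a_{i\cdot}\) to be their difference. Note that \(\tilde P_{i\cdot} = \sum_a \pi(a|i) \tilde P^a_{i\cdot}\). For each state-action pair \((i, a)\), we define

\[ M^a_i = \operatorname{diag}(P^a_{i\cdot}) - (P^a_{i\cdot})^\top P^a_{i\cdot}, \]

which is a symmetric positive semi-definite matrix. Recall that for each \((i, a)\), we have \(\hat P^a_{ij} = N^a_{ij}/N^a_i\), where the \(N^a_{ij}\) are drawn from a multinomial distribution. The covariance matrix of \(\hat P^a_{i\cdot}\) is \(M^a_i / N^a_i\), and the covariance matrix of \(\hat P_{i\cdot}\) is

\[ \operatorname{cov}^{(i)} = \mathbb{E}[\tilde P_{i\cdot}^\top \tilde P_{i\cdot}] = \sum_a \pi(a|i)^2 \frac{M^a_i}{N^a_i}. \]

Now we consider \(\tilde R\). Since \(C^a_{ij}\) is independent of \(C^b_{kl}\) whenever \(i \neq k\), we have \(\mathbb{E}[\tilde R_i \tilde R_k] = 0\) for \(i \neq k\). Furthermore, writing \(\tilde R^a_i = \hat R^a_i - R^a_i\),

\[ \mathbb{E}[\tilde R_i^2] = \sum_a \pi(a|i)^2\, \mathbb{E}[(\tilde R^a_i)^2]. \]

In the following we use \(N^a_{i\cdot}\) to represent the vector with components \(N^a_{ij}\), \(j = 1, \ldots, m\), and \(R^a_{i\cdot}\) to represent the vector with components \(R^a_{ij}\), \(j = 1, \ldots, m\). Note that \(C^a_{ij}\) and \(C^a_{ik}\) are independent given \(N^a_{ij}\) and \(N^a_{ik}\), so that

\[
\begin{aligned}
\mathbb{E}[(\tilde R^a_i)^2]
&= \operatorname{var}\Big(\frac{\sum_j C^a_{ij}}{N^a_i}\Big)
 = \frac{1}{(N^a_i)^2} \operatorname{var}\Big(\sum_j C^a_{ij}\Big) \\
&= \frac{1}{(N^a_i)^2} \Big\{ \operatorname{var}\Big(\mathbb{E}\Big[\sum_j C^a_{ij} \,\Big|\, N^a_{i\cdot}\Big]\Big) + \mathbb{E}\Big[\operatorname{var}\Big(\sum_j C^a_{ij} \,\Big|\, N^a_{i\cdot}\Big)\Big] \Big\} \\
&= \frac{1}{(N^a_i)^2} \Big\{ \operatorname{var}\Big(\sum_j R^a_{ij} N^a_{ij}\Big) + \mathbb{E}\Big[\sum_j V^a_{ij} N^a_{ij}\Big] \Big\} \\
&= \frac{1}{(N^a_i)^2} \Big\{ (N^a_i)^2 \operatorname{var}\big(R^a_{i\cdot} (\hat P^a_{i\cdot})^\top\big) + N^a_i \sum_j V^a_{ij} P^a_{ij} \Big\} \\
&= \frac{1}{N^a_i} \Big( R^a_{i\cdot} M^a_i (R^a_{i\cdot})^\top + V^a_{i\cdot} (P^a_{i\cdot})^\top \Big).
\end{aligned}
\tag{7}
\]

Here, \(V^a_{ij}\) is the variance of the rewards associated with a transition from \(i\) to \(j\), under action \(a\), and \(V^a_{i\cdot}\) is the vector with components \(V^a_{ij}\).

In order to account for the correlation between \(\tilde P\) and \(\tilde R\), we use Eq. (2) to obtain

\[ \hat R_i = \sum_a \pi(a|i) \sum_j \hat R^a_{ij} \hat P^a_{ij} = \sum_a \pi(a|i) \sum_j \big( R^a_{ij} P^a_{ij} + R^a_{ij} \tilde P^a_{ij} + \tilde R^a_{ij} P^a_{ij} + \tilde R^a_{ij} \tilde P^a_{ij} \big), \tag{8} \]

where \(\tilde R^a_{ij} = \hat R^a_{ij} - R^a_{ij}\). Comparing with Eq. (1), we have

\[ \tilde R_i = \hat R_i - R_i = \sum_a \pi(a|i) \sum_j \big( R^a_{ij} \tilde P^a_{ij} + \tilde R^a_{ij} P^a_{ij} + \tilde R^a_{ij} \tilde P^a_{ij} \big). \tag{9} \]

We use \(\circ\) to denote Hadamard multiplication: for any two matrices \(A\) and \(B\) with the same dimensions, \(A \circ B\) is a matrix (again with the same dimensions) with entries \((A \circ B)_{ij} = A_{ij} B_{ij}\). We also use \(e\) to denote the \(m\)-dimensional vector with all components equal to one, and \(\pi^a\) to denote the \(m\)-dimensional vector with components \(\pi^a_i = \pi(a|i)\). With this notation, Eq. (9) becomes

\[ \tilde R = \sum_a \pi^a \circ \big[ (\tilde P^a \circ R^a + \tilde R^a \circ P^a + \tilde R^a \circ \tilde P^a)\, e \big]. \tag{10} \]

We define an \(m \times m\) matrix \(Q\) with entries

\[ Q_{ij} = \operatorname{cov}^{(i)}_{j\cdot}\, X_{\cdot i}. \tag{11} \]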
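The multinomial covariance \(M^a_i / N^a_i\) used above can be verified by brute force for a small case. The sketch below enumerates every outcome of a three-category multinomial with six trials, computes the exact covariance matrix of the estimated probabilities, and compares it with \((\operatorname{diag}(p) - p^\top p)/N\). The probabilities, the sample size, and the function names are assumptions chosen so the enumeration stays tiny.

```python
from itertools import product
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of observing `counts` in sum(counts) multinomial trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

def exact_covariance(probs, n):
    """Covariance of the estimates N_j / n, by enumerating all outcomes."""
    m = len(probs)
    cov = [[0.0] * m for _ in range(m)]
    for counts in product(range(n + 1), repeat=m):
        if sum(counts) != n:
            continue
        weight = multinomial_pmf(counts, probs)
        dev = [c / n - p for c, p in zip(counts, probs)]
        for j in range(m):
            for k in range(m):
                cov[j][k] += weight * dev[j] * dev[k]
    return cov

def formula_covariance(probs, n):
    """(diag(p) - p^T p) / n, i.e. M / N in the notation of the text."""
    m = len(probs)
    return [[((probs[j] if j == k else 0.0) - probs[j] * probs[k]) / n
             for k in range(m)] for j in range(m)]

probs, n = [0.5, 0.3, 0.2], 6      # one hypothetical row of P and its sample size
exact = exact_covariance(probs, n)
formula = formula_covariance(probs, n)
max_gap = max(abs(exact[j][k] - formula[j][k]) for j in range(3) for k in range(3))
row_sums = [sum(row) for row in formula]
print(max_gap, row_sums)
```

Each row of the covariance matrix sums to zero because the estimated probabilities in a row always sum to one, so their errors must cancel.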

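(Recall the definition \(X = (I - \alpha P)^{-1}\), and that \(Y = XR\) is the true value function.) And we define an \(m\)-dimensional vector \(B\) with its \(i\)th component defined as

\[ B_i = \sum_a \frac{\pi(a|i)^2}{N^a_i}\, R^a_{i\cdot} M^a_i X_{\cdot i}. \]

The following proposition quantifies the bias under the second order approximation assumption. The proof is given in Appendix A.

Proposition 4.1 The expectation of the estimated value function \(\hat Y\) satisfies

\[ \mathbb{E}[\hat Y] = Y + \alpha^2 X Q Y + \alpha X B + L_{\exp}, \]

where

\[ L_{\exp} = \sum_{k=3}^{\infty} \alpha^k \mathbb{E}[f_k(\tilde P)]\, R + \sum_{k=2}^{\infty} \alpha^k \mathbb{E}\Big[ f_k(\tilde P) \Big( \sum_a \pi^a \circ \big( (\tilde P^a \circ R^a)\, e \big) \Big) \Big] = o\Big(\frac{1}{N^{a^*}_{i^*}}\Big), \]

where \(N^{a^*}_{i^*} = \min_{(i,a):\, \pi(a|i) > 0} N^a_i\) and the term \(o(\cdot)\) satisfies \(\lim_{N \to \infty} o(1/N)\, N = 0\).

In the above proposition, \((i^*, a^*)\) represents the least sampled state-action pair that is used by the policy. The term \(L_{\exp}\) decreases to 0 faster than \(1/N^{a^*}_{i^*}\), whereas \(Q\) and \(B\) can be shown to decrease like \(1/N^{a^*}_{i^*}\). Therefore, our approximation of the bias in the value function estimates will be \(\alpha^2 X Q Y + \alpha X B\).

For the purposes of the next proposition, we introduce some more notation. We define the diagonal matrix \(W\) whose diagonal entries are given by

\[ W_{ii} = \sum_a \frac{\pi(a|i)^2}{N^a_i} \Big[ \big(\alpha Y^\top + R^a_{i\cdot}\big) M^a_i \big(\alpha Y + (R^a_{i\cdot})^\top\big) + V^a_{i\cdot} (P^a_{i\cdot})^\top \Big]. \tag{12} \]

The next proposition provides an expression for the second moment, \(\mathbb{E}[\hat Y \hat Y^\top]\). Together with the expression for \(\mathbb{E}[\hat Y]\) in the preceding proposition, it leads to an approximation for the covariance matrix of \(\hat Y\). The proof is given in Appendix B.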
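Before turning to the second moment, the bias approximation \(\alpha^2 X Q Y + \alpha X B\) of Proposition 4.1 can be sanity-checked in a case small enough to enumerate exactly. The sketch below assumes a two-state chain, a single action (so \(\pi\) drops out), deterministic rewards per transition (so that the estimated reward in state \(i\) is just \(\sum_j \hat P_{ij} R_{ij}\)), and the same count `n` in both states. These are simplifying assumptions made for the example, not the paper's setting, and the function names are hypothetical.

```python
from math import comb

def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def resolvent(P, alpha):
    """(I - alpha P)^{-1} for a 2x2 matrix P."""
    return inv2([[(1.0 if i == j else 0.0) - alpha * P[i][j] for j in range(2)]
                 for i in range(2)])

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

def true_value(P, r, alpha):
    R = [sum(P[i][j] * r[i][j] for j in range(2)) for i in range(2)]
    return matvec(resolvent(P, alpha), R)

def exact_mean_value(P, r, alpha, n):
    """E[Y_hat], enumerating the binomial counts in both rows."""
    p0, p1 = P[0][1], P[1][0]   # P(0 -> 1) and P(1 -> 0)
    mean = [0.0, 0.0]
    for n0 in range(n + 1):     # number of observed 0 -> 1 transitions
        w0 = comb(n, n0) * p0 ** n0 * (1 - p0) ** (n - n0)
        for n1 in range(n + 1): # number of observed 1 -> 0 transitions
            w1 = comb(n, n1) * p1 ** n1 * (1 - p1) ** (n - n1)
            P_hat = [[1 - n0 / n, n0 / n], [n1 / n, 1 - n1 / n]]
            Y_hat = true_value(P_hat, r, alpha)
            mean = [mean[i] + w0 * w1 * Y_hat[i] for i in range(2)]
    return mean

def approx_bias(P, r, alpha, n):
    """Second order bias alpha^2 X Q Y + alpha X B for this special case."""
    X, Y = resolvent(P, alpha), true_value(P, r, alpha)
    inner = []
    for i in range(2):
        M = [[(P[i][j] if j == k else 0.0) - P[i][j] * P[i][k] for k in range(2)]
             for j in range(2)]
        MX = [sum(M[j][k] * X[k][i] for k in range(2)) for j in range(2)]
        QY_i = sum(MX[j] * Y[j] for j in range(2)) / n    # (Q Y)_i
        B_i = sum(r[i][j] * MX[j] for j in range(2)) / n  # B_i
        inner.append(alpha ** 2 * QY_i + alpha * B_i)
    return matvec(X, inner)

alpha = 0.5
P = [[0.7, 0.3], [0.4, 0.6]]
r = [[1.0, 0.0], [0.0, 2.0]]
Y = true_value(P, r, alpha)

gaps, approx = {}, {}
for n in (50, 200):
    exact_bias = [m - y for m, y in zip(exact_mean_value(P, r, alpha, n), Y)]
    approx[n] = approx_bias(P, r, alpha, n)
    gaps[n] = max(abs(e - a) for e, a in zip(exact_bias, approx[n]))
print(approx, gaps)
```

The gap between the exact bias and the approximation should shrink faster than the bias itself as `n` grows, which is the sense in which the remainder term is of smaller order.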

Proposition 4.2 The second moment of $\hat Y$ satisfies

$$
\mathbb{E}[\hat Y\hat Y^\top] = YY^\top + X\Big\{\alpha^2\big(QYR^\top + RY^\top Q^\top\big) + \alpha\big(BR^\top + RB^\top\big) + W\Big\}X^\top + L_{\mathrm{var}},
$$

where $L_{\mathrm{var}}$ is given by

$$
L_{\mathrm{var}} = \sum_{k,l:\,k+l>2} \alpha^{k+l}\,\mathbb{E}\Big[f_k(\tilde P)\big(RR^\top + \tilde R\tilde R^\top\big)f_l(\tilde P)^\top\Big] + \alpha\,\mathbb{E}\Big[X\tilde R\tilde R^\top f_1(\tilde P)^\top\Big] + \alpha\,\mathbb{E}\Big[f_1(\tilde P)\tilde R\tilde R^\top X^\top\Big] = o\Big(\frac{1}{N^{a^*}_{i^*}}\Big).
$$

By taking the difference between $\mathbb{E}[\hat Y\hat Y^\top]$, as given by Proposition 4.2, and $\mathbb{E}[\hat Y]\,\mathbb{E}[\hat Y]^\top$, as prescribed by Proposition 4.1, the following corollary is easily derived.

Corollary 4.1 The covariance matrix of the estimated value function satisfies

$$
\mathrm{cov}(\hat Y) = XWX^\top + o\Big(\frac{1}{N^{a^*}_{i^*}}\Big).
$$

The expressions in Propositions 4.1, 4.2 and Corollary 4.1 yield several insights. First, as the counts $N^a_i$ increase to infinity, $\mathrm{cov}^{(i)}$ approaches 0, and thus all the terms involving the matrices $Q$, $B$ and $W$ converge to 0. As expected, this implies that as the sample size increases and the accuracy of the estimated parameters improves, both the bias and the variance decrease to 0. Second, the expressions for the bias and variance rely on the true model parameters, which are unknown. As discussed in the introduction, to obtain computable approximations of the bias and variance, we will use instead $\hat P$, $\hat R$, and the empirical variance of each $R^a_{ik}$. In principle, we could also estimate the bias and variance due to this approximation, but this is tedious and, as suggested by the experimental results in the next section, generally unnecessary. Third, when $\min_{i,a} N^a_i$ is large, it follows that the nonzero entries of $B$, $W$, and $Q$ decrease to 0 like $1/N^{a^*}_{i^*}$. Therefore the standard deviation decreases to 0 like $1/\sqrt{N}$, which is the usual behavior of empirical estimates.

The expressions in Proposition 4.1 and Corollary 4.1 allow us to qualitatively compare the magnitude of the bias and variance. According to Corollary 4.1, the standard deviation of $\hat Y_i$

can be approximately estimated as

$$
\sigma(\hat Y_i) = \sqrt{X_{i\cdot}\,W X_{i\cdot}^\top}. \qquad (13)
$$

The next proposition, proved in Appendix C, quantifies the ratio between the standard deviation and the bias. Recall that for two functions $f$ and $g$ (defined on the real numbers) we write $f(n) = \Omega(g(n))$ if there exist constants $N_0$ and $C$ such that $f(n) \ge C g(n)$ for $n \ge N_0$.

Proposition 4.3 Suppose that $\sigma(\hat Y_i) > 0$ and $N^{a^*}_{i^*}/N^a_i > c > 0$ for all $a$ and $i$. Then

$$
\frac{\sigma(\hat Y_i)}{\big|\mathbb{E}[\hat Y_i] - Y_i\big|} = \Omega\Big(\sqrt{N^{a^*}_{i^*}}\Big)
$$

for all $i$.

Proposition 4.3 implies that the errors introduced by the parametric variance will generally be much larger than the bias. Note that since $W$ is a positive semi-definite matrix, $\sigma(\hat Y_i) > 0$ is a very weak non-degeneracy assumption. The condition $N^{a^*}_{i^*}/N^a_i > c > 0$ requires that sample sizes increase uniformly. The conditions in this proposition are somewhat stronger than necessary, for simplicity of exposition.

While the expression in Corollary 4.1 allows us to approximate the covariance matrix of the estimated value function, the findings on their own do not allow us to calculate confidence intervals around these estimates. Calculating a confidence interval requires that we know the distribution of the value function estimates. A central limit theorem (Serfling, 1980, page 122, Theorem A) speaks to this issue.³

³ We thank the department editor for directing our attention to this theorem.

Theorem 4.1 (Serfling, 1980) Suppose that $X_n := (X_{n1}, \ldots, X_{nk})$ asymptotically approaches $N(\mu, b_n^2\Sigma)$ with $b_n \to 0$. Let $g(x) = (g_1(x), \ldots, g_m(x))$, $x = (x_1, \ldots, x_k)$, be a vector-valued function for which each component function $g_i(x)$ is a real-valued function and has a non-zero gradient at $x = \mu$. Let

$$
D = \Big[\frac{\partial g_i}{\partial x_j}\Big|_{x=\mu}\Big]_{m \times k}.
$$

Then $g(X_n)$ asymptotically approaches $N\big(g(\mu), b_n^2 D\Sigma D^\top\big)$.

Because $\hat P^a_{ij}$ and $\hat R^a_{ij}$ are all estimators that asymptotically follow normal distributions, we may consider $\hat Y$ as the function $g$ in the above theorem and conclude that $\hat Y$ is asymptotically

normal. We further investigate this issue using catalog mailing data in Section 5, where we report that a Kolmogorov-Smirnov test cannot reject the hypothesis that $\hat Y$ is normally distributed.

Readers may wonder whether we could have used the Serfling result to derive our earlier findings. It is technically possible to do so. Indeed, under the assumption that all of the $N^a_i$'s are identical, we were able to show that the two approaches yield the same result, and observed that the two derivations were of comparable length and complexity. However, if sampling occurs at different rates in different states, the rate at which the $N^a_i$'s approach infinity will generally vary. In this case use of the Serfling theorem, or any related central limit theorem, requires extensive additional derivation. Moreover, these theorems do not address the issue of bias.

4.2 The Control Problem

To this point we have focused on the value function under a fixed policy. In many applications we are interested in comparing an existing policy with an alternative policy, possibly derived through a policy optimization process. We know from the MDP theory that there exists an optimal policy $\pi^*$ such that $Y^{\pi^*}_i \ge Y^{\pi}_i$ for all admissible policies $\pi$ and all states $i \in S$. The optimal policy may be obtained from value iteration, policy iteration or linear programming algorithms. See, for example, Bertsekas (2000).

Since we do not have access to the true model parameters $P$ and $R$, optimization based on the estimated parameters $\hat P$ and $\hat R$ produces an optimal policy $\hat\pi$ such that $\hat Y^{\hat\pi} \ge \hat Y^{\pi}$ for all admissible policies $\pi$. In general, policy $\hat\pi$ is different from $\pi^*$. Moreover, since the policy $\hat\pi$ is obtained through an optimization process, the estimates of the model parameters for that policy ($\hat P^{\hat\pi}$ and $\hat R^{\hat\pi}$) will no longer be unbiased estimates of the true model parameters ($P^{\hat\pi}$ and $R^{\hat\pi}$). Therefore we cannot use the approximation derived in Proposition 4.1 (for a fixed policy) to evaluate the bias in the optimal value function. Nor can we use the approximations in Proposition 4.2 and Corollary 4.1 to estimate the covariance matrix.
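The optimization step described above, applied to estimated rather than true parameters, can be sketched with plain value iteration. This is only a minimal illustration: the function name, the array layout `P_hat[a][i][j]` and `R_hat[a][i][j]`, and the tiny two-state, two-action model are hypothetical, and the text does not say which of the listed algorithms would be used in any particular setting.

```python
def value_iteration(P_hat, R_hat, alpha, tol=1e-10, max_iter=100_000):
    """Greedy policy and values for estimated parameters.

    P_hat[a][i][j]: estimated probability of moving i -> j under action a
    R_hat[a][i][j]: estimated expected reward on that transition
    """
    n_actions, n_states = len(P_hat), len(P_hat[0])
    Y = [0.0] * n_states
    for _ in range(max_iter):
        # Action values for every state under the current value estimate.
        Q = [[sum(P_hat[a][i][j] * (R_hat[a][i][j] + alpha * Y[j])
                  for j in range(n_states))
              for a in range(n_actions)]
             for i in range(n_states)]
        Y_new = [max(q) for q in Q]
        converged = max(abs(x - y) for x, y in zip(Y_new, Y)) < tol
        Y = Y_new
        if converged:
            break
    # The policy that is optimal for the ESTIMATED model (pi-hat in the text).
    policy = [max(range(n_actions), key=lambda a: Q[i][a])
              for i in range(n_states)]
    return Y, policy


# Hypothetical estimates: action 0 stays put, action 1 switches state.
P_hat = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
R_hat = [[[1.0, 0.0], [0.0, 2.0]],
         [[0.0, 0.0], [0.0, 0.0]]]
Y_hat, pi_hat = value_iteration(P_hat, R_hat, alpha=0.9)
print(Y_hat, pi_hat)
```

Whatever policy this returns is optimal only with respect to `P_hat` and `R_hat`; the values it reports for that policy inherit the selection effect discussed in this section.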
We can illustrate the problem through a simple example. Consider a single-state MDP with two actions, that is, $S = \{1\}$ and $A = \{0, 1\}$. Both actions yield identical zero-mean random rewards. Clearly in such a problem $\pi^*$ could be either action 0 or 1, with value

functions $Y^{\pi^*} = Y^{\hat\pi} = 0$. Now assume that we have $n$ samples to estimate the expected reward $\hat R^a$ for either action. Indeed both $\hat R^a$ follow (approximately) a normal distribution $N(0, 1/n)$. The policy optimization procedure chooses the action with the largest $\hat R^a$. If we use $\hat R^*$ to denote the maximum of $\hat R^0$ and $\hat R^1$, we know from Jensen's Inequality that $\mathbb{E}[\hat R^*] > 0$, and so the value function estimated for the chosen policy will on average be positively biased:

$$
\mathbb{E}[\hat Y^{\hat\pi}] = \mathbb{E}[\hat R^*] = \mathbb{E}\big[\max\{\hat R^0, \hat R^1\}\big] > \max\big\{\mathbb{E}[\hat R^0], \mathbb{E}[\hat R^1]\big\} = 0.
$$

The magnitude of $\mathbb{E}[\hat Y^{\hat\pi}]$, and therefore the bias in this example, is studied in the order statistics literature (Leadbetter et al., 1983). We also refer readers to Clark (1961), where the author presents a procedure to approximate moments of the maximum of a finite number of correlated Gaussian random variables.

This problem raises two issues. First, how can we de-bias the estimates of $\hat P^{\hat\pi}$ and $\hat R^{\hat\pi}$ so that we can use our earlier results to estimate the bias and covariance matrix of a value function when the policy is derived from an optimization procedure? Second, because the optimization procedures themselves rely on estimates $\hat P^{\pi}$ and $\hat R^{\pi}$, the policies derived from standard dynamic programming algorithms will generally not be truly optimal ($\hat\pi \ne \pi^*$). In the remainder of this section we propose a cross-validation approach that can help to address the first issue. Unfortunately, we do not have a solution to the second issue. Indeed, it seems unlikely that a general procedure can be found that resolves the second issue, as the sub-optimality reflects the absence of complete information in the training data.

The bias in the estimates of $\hat P^{\hat\pi}$ and $\hat R^{\hat\pi}$ arises because optimization methods tend to favor actions for which the estimation errors in $\hat P^{\pi}$ and $\hat R^{\pi}$ lead to inflated estimates of the value function. As long as the errors in $\hat P$ and $\hat R$ are independent across samples, we can derive unbiased estimates of $P$ and $R$ if we use a different sample of data to evaluate the policy $\hat\pi$ than the sample we used to design the policy. In particular, consider the following approach.
Start by dividing the training data into two sub-samples: a calibration sample and a validation sample. Use the calibration sample to estimate the model parameters $\hat P_{cal}$ and $\hat R_{cal}$ and obtain

the optimal policy

$$
\hat\pi_{cal} = \arg\max_{\pi}\,\big(I - \alpha\hat P^{\pi}_{cal}\big)^{-1}\hat R^{\pi}_{cal}.
$$

Then estimate model parameters $\hat P_{val}$ and $\hat R_{val}$ from the validation sample and (following Equation (3)) evaluate the policy using these new parameters:

$$
\hat Y^{\hat\pi_{cal}}_{val} = \big(I - \alpha\hat P^{\hat\pi_{cal}}_{val}\big)^{-1}\hat R^{\hat\pi_{cal}}_{val}.
$$

Through this procedure we can de-bias the value function estimates by reporting $\hat Y^{\hat\pi_{cal}}_{val}$ instead of $\hat Y^{\hat\pi_{cal}}_{cal}$, where

$$
\hat Y^{\hat\pi_{cal}}_{cal} = \big(I - \alpha\hat P^{\hat\pi_{cal}}_{cal}\big)^{-1}\hat R^{\hat\pi_{cal}}_{cal}.
$$

Accordingly, we may also approximate the bias and variance and therefore the confidence bounds of $\hat Y^{\hat\pi_{cal}}_{val}$ following Proposition 4.1 and Corollary 4.1.

The assumption that the estimation errors in $\hat P$ and $\hat R$ are independent across the calibration and validation sub-samples is obviously critical. In this paper we have assumed that estimates $\hat P$ and $\hat R$ are derived from straightforward non-parametric aggregates of the available data. Under this approach the estimation errors are independent across the sub-samples as long as any measurement errors are independent across observations. However, in some settings, it is common to estimate the model parameters from maximum likelihood estimates that require functional form and distribution assumptions (this is particularly common in the economics literature). Under this alternative approach, any errors introduced by the functional form and distribution assumptions will be correlated across the sub-samples. As a result, the cross-validation procedure that we have proposed will not de-bias the estimates of $\hat P^{\hat\pi}$ and $\hat R^{\hat\pi}$, even if the measurement errors are independent across the observations.

5 Experiments

The reliance on a second order expansion in deriving the approximations for the bias and variance presumes that higher order terms are relatively unimportant. We now examine this assumption in further detail by using the catalog mailing data to validate the findings. These data also enable us to investigate the impact (if any) of using estimates of the model parameters in these expressions (in the absence of the true model parameters).

If the value function estimates follow a normal distribution, the variance and bias expressions derived in the previous section facilitate calculation of confidence intervals around the de-biased value function estimates. We can investigate the accuracy of these confidence intervals by comparing how frequently the true value function falls within the confidence intervals. We would expect that on average the true value will fall within one standard deviation of the unbiased mean 68% of the time and within two standard deviations 95% of the time.

We begin by investigating whether the value function estimates follow a normal distribution. We do so by using a Kolmogorov-Smirnov test on each of the data points reported in Section 3. The hypothesis that the reward is two-sided Gaussian could not be rejected with confidence 0.05 at any instance. The average P-value was … with a minimum of … and a maximum of …. This indicates that it cannot be determined that the data do not follow a Gaussian rule.

We use the same partitions of the data as in Section 2. In Figure 3 the percentage of times that the true value function was within one standard deviation is denoted by + and within two standard deviations by an x. For example, for the 250 sub-samples (with about 657,000 observations each), we report the percentage of the 250 estimates in which the true average value function (AVF) (as estimated on the full sample) was within the estimated confidence interval. By re-drawing the 250 sub-samples ten times, we report ten instances of this percentage. An analogous process was used with other choices of the sub-sample size. The findings in Figure 3 confirm that the percentage of estimates that fall within one and two standard deviations of the true AVF are close to the targets of 68% and 95% respectively.

We next consider the importance of the second order approximations. We do so by taking advantage of the role played by the discount factor $\alpha$. The importance of higher order terms in the series expansions increases as the discount factor approaches one. In Table 2 we repeat the analysis for 250 sub-samples of fixed size, but for different discount factors (same settings as in Table 1).
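The coverage check used in this section (how often the reference value lies within one or two estimated standard deviations of an estimate) can be sketched in a few lines of Python. The data below are simulated draws from a normal distribution rather than the catalog data, the number of simulated estimates is much larger than the 250 sub-samples used in the text, and the names `coverage`, `truth` and `sigma` are hypothetical.

```python
import random


def coverage(estimates, stds, truth, k):
    """Fraction of estimates whose +/- k standard deviation band contains truth."""
    hits = sum(1 for y, s in zip(estimates, stds) if abs(y - truth) <= k * s)
    return hits / len(estimates)


rng = random.Random(0)        # fixed seed so the sketch is reproducible
truth, sigma = 10.0, 1.0      # made-up reference value and standard deviation
estimates = [rng.gauss(truth, sigma) for _ in range(20_000)]
stds = [sigma] * len(estimates)

within_one = coverage(estimates, stds, truth, 1)
within_two = coverage(estimates, stds, truth, 2)
print(within_one, within_two)
```

For exactly normal estimates with correctly estimated standard deviations, the two fractions should land near the 68% and 95% targets; a systematic shortfall would point to non-normality or to an understated variance.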
As expected, as $\alpha$ approaches 1, the accuracy of the confidence intervals degrades. We attribute this to the error introduced by the second order approximation.

5.1 The Control Problem

As discussed in Section 4.2, an obvious application of our analysis is the comparison of a current policy with a new policy generated through some optimization process. We cautioned that before applying the expressions for the bias and the variance to a policy derived from

such a process, we should first obtain unbiased estimates of the model parameters, using an independent validation sample. We will use the catalog mailing data to illustrate the importance of this first step.

[Figure 3. Vertical axis: percentage below 1 (+) or 2 (x) STDs. Horizontal axis: observations per sub-sample (millions).]

Figure 3: The percentage of the AVF estimates that fall within one (+) and two (x) standard deviations from the value calculated based on the full data set. Each + and x represents a random partition of the full data to sub-samples. The discount factor was $\alpha$ = ….

We begin by randomly selecting a portion of the available data, to be used as a calibration sample, and retain the remaining data as a validation sample. To demonstrate how the size of the calibration sample affects the findings, we repeat this process for calibration samples of different sizes. The calibration sample is used to estimate model parameters $\hat P_{cal}$ and $\hat R_{cal}$. Then we run a policy iteration algorithm to identify an optimal policy $\hat\pi_{cal}$ from $\hat P_{cal}$ and $\hat R_{cal}$. We will compare two AVF estimates for this policy: the AVF calculated on the basis of the model estimated using the calibration sample (denoted by $Y_{cal}$); and the AVF of that policy as estimated using the validation sample (denoted by $Y_{val}$). The difference between the two estimates represents the bias introduced by the error in the model parameters (the errors no longer have zero expectation due to the optimization process). This bias is illustrated in Figure 4 for calibration samples of varying sizes. It can be seen that value function estimates from

the calibration sample are almost uniformly greater than the estimates from the validation sample. This bias is statistically significant. It is also managerially relevant, averaging around 6.3% of the true optimal AVF ($33.59) for a calibration sample that consists of approximately 1.6 million observations (1% of the data). As an aside, the $33.59 AVF for the optimal policy can be compared with the $28.54 AVF for the historical policy (reported in Figure 1). These results indicate that the optimal policy offers a potential profit improvement of approximately 17%.

α | Samples within 1 STD | Samples within 2 STD
… | …% (…) | 95.44% (…)
… | …% (…) | 94.84% (…)
… | …% (…) | 95.08% (…)
… | …% (…) | 94.76% (…)
… | …% (…) | 95.52% (…)
… | …% (…) | 94.92% (…)
… | …% (…) | 92.20% (…)

Table 2: We randomly partitioned the data while varying the discount factor. For each discount factor, we performed the partition 10 times; each partition was to 250 sub-samples (each with roughly 657,000 observations). We present the percentage of samples in which the estimated AVF is within one standard deviation (as predicted by Proposition 4.2) of the value as measured on all the data; the minimum and maximum percentages over the 10 runs are provided in parentheses. The same statistics are presented for two standard deviations.

[Figure 4. Vertical axis: bias, $Y_{cal} - Y_{val}$ (dollars). Horizontal axis: size of calibration sample (% of data set).]

Figure 4: The differences (marked by +) between the AVF estimates (in dollars, and averaged over all states) based on the calibration sample and the validation sample, for the policy identified through an optimization process. Each + was generated by randomly partitioning the data to a calibration and a validation sample. The horizontal axis corresponds to the size of the calibration sample, as a percentage of the full data sample. Here $\alpha = 0.98$, for which the true optimal AVF is approximately $….
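The gap between calibration and validation estimates can be mimicked in the single-state, two-action example of Section 4.2. The sketch below is a simulation under stated assumptions, not the catalog experiment: both actions have true expected reward 0, each estimated reward is drawn from $N(0, 1/n)$, and the function name and parameters are hypothetical. For two independent $N(0, \sigma^2)$ variables the expected maximum is $\sigma/\sqrt{\pi}$, so the calibration estimate of the chosen action should average about $1/\sqrt{\pi n}$, while an independent validation estimate of the same action should average about 0.

```python
import math
import random


def calibration_vs_validation(n, trials, seed=0):
    """Average calibration and validation estimates for the action picked on calibration data."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(n)            # std of each estimated mean reward
    cal_total = 0.0
    val_total = 0.0
    for _ in range(trials):
        # Estimated rewards of actions 0 and 1 from the calibration sample.
        cal = [rng.gauss(0.0, s), rng.gauss(0.0, s)]
        best = 0 if cal[0] >= cal[1] else 1      # "optimal" action on calibration data
        # Independent estimates of the same two actions from the validation sample.
        val = [rng.gauss(0.0, s), rng.gauss(0.0, s)]
        cal_total += cal[best]
        val_total += val[best]
    return cal_total / trials, val_total / trials


cal_mean, val_mean = calibration_vs_validation(n=100, trials=100_000)
print(cal_mean, val_mean, 1.0 / math.sqrt(math.pi * 100))
```

The positive average on the calibration side comes purely from selecting the larger of two noisy estimates; evaluating the selected action on fresh data removes that effect because the validation noise is independent of the selection.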

We can also use the catalog data to investigate the extent to which parametric variance leads to sub-optimal policies. To do so, we compared the optimal policy derived using each sub-sample with the true optimal policy derived using the entire data set. Both policies are evaluated on the validation sample. We use $Y^*$ to denote the AVF for the optimal policy found by optimizing on the entire data set.⁴ The findings are reported in Figure 5. As expected, the optimal policy always outperforms the policy derived from the calibration sub-sample. The differences are again statistically significant.

[Figure 5. Vertical axis: suboptimality, $Y_{val} - Y^*$ (dollars). Horizontal axis: size of calibration sample (% of data set).]

Figure 5: The differences (marked by +) between the AVF estimates (in dollars) of the optimal policy based on the calibration sample and the AVF of the optimal policy found by optimizing on the validation sample. Each + was generated by randomly partitioning the data to a calibration and a validation sample. The horizontal axis corresponds to the size of the calibration sample, as a percentage of the full data sample. Here $\alpha = 0.98$, for which the true optimal AVF is approximately $….

In order to demonstrate the robustness of the findings, we performed an experiment similar to the one reported in Table 2. In Table 3 we present the bias and sub-optimality introduced by the optimization process, for different values of $\alpha$.⁵ From Table 3 we can easily obtain the mean standard errors as the sample standard deviations divided by 10 (the square root of the sample size, 100). It is clear that both the bias and the sub-optimality are generally significantly greater than zero, with the bias averaging around 2% of the AVF and the sub-optimality averaging around 1%.

We conclude that parametric variance introduces two issues in policy optimization. First,

⁴ Note that the computation of $Y^*$ and $Y_{val}$ uses the same data, which may introduce correlation between the two quantities. This will tend to diminish our estimates of the sub-optimality. We also computed $Y^*_{val}$, the optimal AVF over the validation set, in place of $Y^*$ for Table 3 and Figures 4 and 5. The results are similar.
⁵ The bias was calculated as $(Y_{cal} - Y_{val})/Y^*$; the sub-optimality was calculated as $(Y_{val} - Y^*)/Y^*$.


More information

Lecture 1. Functional series. Pointwise and uniform convergence.

Lecture 1. Functional series. Pointwise and uniform convergence. 1 Introduction. Lecture 1. Functionl series. Pointwise nd uniform convergence. In this course we study mongst other things Fourier series. The Fourier series for periodic function f(x) with period 2π is

More information

Reversals of Signal-Posterior Monotonicity for Any Bounded Prior

Reversals of Signal-Posterior Monotonicity for Any Bounded Prior Reversls of Signl-Posterior Monotonicity for Any Bounded Prior Christopher P. Chmbers Pul J. Hely Abstrct Pul Milgrom (The Bell Journl of Economics, 12(2): 380 391) showed tht if the strict monotone likelihood

More information

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite

Goals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

{ } = E! & $ " k r t +k +1

{ } = E! & $  k r t +k +1 Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Chapter 4: Dynamic Programming

Chapter 4: Dynamic Programming Chpter 4: Dynmic Progrmming Objectives of this chpter: Overview of collection of clssicl solution methods for MDPs known s dynmic progrmming (DP) Show how DP cn be used to compute vlue functions, nd hence,

More information

Chapter 2 Fundamental Concepts

Chapter 2 Fundamental Concepts Chpter 2 Fundmentl Concepts This chpter describes the fundmentl concepts in the theory of time series models In prticulr we introduce the concepts of stochstic process, men nd covrince function, sttionry

More information

MIXED MODELS (Sections ) I) In the unrestricted model, interactions are treated as in the random effects model:

MIXED MODELS (Sections ) I) In the unrestricted model, interactions are treated as in the random effects model: 1 2 MIXED MODELS (Sections 17.7 17.8) Exmple: Suppose tht in the fiber breking strength exmple, the four mchines used were the only ones of interest, but the interest ws over wide rnge of opertors, nd

More information

Data Assimilation. Alan O Neill Data Assimilation Research Centre University of Reading

Data Assimilation. Alan O Neill Data Assimilation Research Centre University of Reading Dt Assimiltion Aln O Neill Dt Assimiltion Reserch Centre University of Reding Contents Motivtion Univrite sclr dt ssimiltion Multivrite vector dt ssimiltion Optiml Interpoltion BLUE 3d-Vritionl Method

More information

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS. THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS RADON ROSBOROUGH https://intuitiveexplntionscom/picrd-lindelof-theorem/ This document is proof of the existence-uniqueness theorem

More information

Lecture 3 Gaussian Probability Distribution

Lecture 3 Gaussian Probability Distribution Introduction Lecture 3 Gussin Probbility Distribution Gussin probbility distribution is perhps the most used distribution in ll of science. lso clled bell shped curve or norml distribution Unlike the binomil

More information

Synoptic Meteorology I: Finite Differences September Partial Derivatives (or, Why Do We Care About Finite Differences?

Synoptic Meteorology I: Finite Differences September Partial Derivatives (or, Why Do We Care About Finite Differences? Synoptic Meteorology I: Finite Differences 16-18 September 2014 Prtil Derivtives (or, Why Do We Cre About Finite Differences?) With the exception of the idel gs lw, the equtions tht govern the evolution

More information

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17 CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking

More information

An instructive toy model: two paradoxes

An instructive toy model: two paradoxes Tel Aviv University, 2006 Gussin rndom vectors 27 3 Level crossings... the fmous ice formul, undoubtedly one of the most importnt results in the ppliction of smooth stochstic processes..j. Adler nd J.E.

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

ECO 317 Economics of Uncertainty Fall Term 2007 Notes for lectures 4. Stochastic Dominance

ECO 317 Economics of Uncertainty Fall Term 2007 Notes for lectures 4. Stochastic Dominance Generl structure ECO 37 Economics of Uncertinty Fll Term 007 Notes for lectures 4. Stochstic Dominnce Here we suppose tht the consequences re welth mounts denoted by W, which cn tke on ny vlue between

More information

ADVANCEMENT OF THE CLOSELY COUPLED PROBES POTENTIAL DROP TECHNIQUE FOR NDE OF SURFACE CRACKS

ADVANCEMENT OF THE CLOSELY COUPLED PROBES POTENTIAL DROP TECHNIQUE FOR NDE OF SURFACE CRACKS ADVANCEMENT OF THE CLOSELY COUPLED PROBES POTENTIAL DROP TECHNIQUE FOR NDE OF SURFACE CRACKS F. Tkeo 1 nd M. Sk 1 Hchinohe Ntionl College of Technology, Hchinohe, Jpn; Tohoku University, Sendi, Jpn Abstrct:

More information

THERMAL EXPANSION COEFFICIENT OF WATER FOR VOLUMETRIC CALIBRATION

THERMAL EXPANSION COEFFICIENT OF WATER FOR VOLUMETRIC CALIBRATION XX IMEKO World Congress Metrology for Green Growth September 9,, Busn, Republic of Kore THERMAL EXPANSION COEFFICIENT OF WATER FOR OLUMETRIC CALIBRATION Nieves Medin Hed of Mss Division, CEM, Spin, mnmedin@mityc.es

More information

13: Diffusion in 2 Energy Groups

13: Diffusion in 2 Energy Groups 3: Diffusion in Energy Groups B. Rouben McMster University Course EP 4D3/6D3 Nucler Rector Anlysis (Rector Physics) 5 Sept.-Dec. 5 September Contents We study the diffusion eqution in two energy groups

More information

APPROXIMATE INTEGRATION

APPROXIMATE INTEGRATION APPROXIMATE INTEGRATION. Introduction We hve seen tht there re functions whose nti-derivtives cnnot be expressed in closed form. For these resons ny definite integrl involving these integrnds cnnot be

More information

Cf. Linn Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems, Wiley Series in Probability & Statistics, 1999.

Cf. Linn Sennott, Stochastic Dynamic Programming and the Control of Queueing Systems, Wiley Series in Probability & Statistics, 1999. Cf. Linn Sennott, Stochstic Dynmic Progrmming nd the Control of Queueing Systems, Wiley Series in Probbility & Sttistics, 1999. D.L.Bricker, 2001 Dept of Industril Engineering The University of Iow MDP

More information

Generation of Lyapunov Functions by Neural Networks

Generation of Lyapunov Functions by Neural Networks WCE 28, July 2-4, 28, London, U.K. Genertion of Lypunov Functions by Neurl Networks Nvid Noroozi, Pknoosh Krimghee, Ftemeh Sfei, nd Hmed Jvdi Abstrct Lypunov function is generlly obtined bsed on tril nd

More information

Acceptance Sampling by Attributes

Acceptance Sampling by Attributes Introduction Acceptnce Smpling by Attributes Acceptnce smpling is concerned with inspection nd decision mking regrding products. Three spects of smpling re importnt: o Involves rndom smpling of n entire

More information

Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams

Chapter 4 Contravariance, Covariance, and Spacetime Diagrams Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz

More information

CS667 Lecture 6: Monte Carlo Integration 02/10/05

CS667 Lecture 6: Monte Carlo Integration 02/10/05 CS667 Lecture 6: Monte Crlo Integrtion 02/10/05 Venkt Krishnrj Lecturer: Steve Mrschner 1 Ide The min ide of Monte Crlo Integrtion is tht we cn estimte the vlue of n integrl by looking t lrge number of

More information

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral

f(x) dx, If one of these two conditions is not met, we call the integral improper. Our usual definition for the value for the definite integral Improper Integrls Every time tht we hve evluted definite integrl such s f(x) dx, we hve mde two implicit ssumptions bout the integrl:. The intervl [, b] is finite, nd. f(x) is continuous on [, b]. If one

More information

1.9 C 2 inner variations

1.9 C 2 inner variations 46 CHAPTER 1. INDIRECT METHODS 1.9 C 2 inner vritions So fr, we hve restricted ttention to liner vritions. These re vritions of the form vx; ǫ = ux + ǫφx where φ is in some liner perturbtion clss P, for

More information

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique? XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk bout solving systems of liner equtions. These re problems tht give couple of equtions with couple of unknowns, like: 6 2 3 7 4

More information

Best Approximation. Chapter The General Case

Best Approximation. Chapter The General Case Chpter 4 Best Approximtion 4.1 The Generl Cse In the previous chpter, we hve seen how n interpolting polynomil cn be used s n pproximtion to given function. We now wnt to find the best pproximtion to given

More information

For the percentage of full time students at RCC the symbols would be:

For the percentage of full time students at RCC the symbols would be: Mth 17/171 Chpter 7- ypothesis Testing with One Smple This chpter is s simple s the previous one, except it is more interesting In this chpter we will test clims concerning the sme prmeters tht we worked

More information

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0)

P 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0) 1 Tylor polynomils In Section 3.5, we discussed how to pproximte function f(x) round point in terms of its first derivtive f (x) evluted t, tht is using the liner pproximtion f() + f ()(x ). We clled this

More information

ODE: Existence and Uniqueness of a Solution

ODE: Existence and Uniqueness of a Solution Mth 22 Fll 213 Jerry Kzdn ODE: Existence nd Uniqueness of Solution The Fundmentl Theorem of Clculus tells us how to solve the ordinry differentil eqution (ODE) du = f(t) dt with initil condition u() =

More information

Euler, Ioachimescu and the trapezium rule. G.J.O. Jameson (Math. Gazette 96 (2012), )

Euler, Ioachimescu and the trapezium rule. G.J.O. Jameson (Math. Gazette 96 (2012), ) Euler, Iochimescu nd the trpezium rule G.J.O. Jmeson (Mth. Gzette 96 (0), 36 4) The following results were estblished in recent Gzette rticle [, Theorems, 3, 4]. Given > 0 nd 0 < s

More information

CBE 291b - Computation And Optimization For Engineers

CBE 291b - Computation And Optimization For Engineers The University of Western Ontrio Fculty of Engineering Science Deprtment of Chemicl nd Biochemicl Engineering CBE 9b - Computtion And Optimiztion For Engineers Mtlb Project Introduction Prof. A. Jutn Jn

More information

Abstract inner product spaces

Abstract inner product spaces WEEK 4 Abstrct inner product spces Definition An inner product spce is vector spce V over the rel field R equipped with rule for multiplying vectors, such tht the product of two vectors is sclr, nd the

More information

Session 13

Session 13 780.20 Session 3 (lst revised: Februry 25, 202) 3 3. 780.20 Session 3. Follow-ups to Session 2 Histogrms of Uniform Rndom Number Distributions. Here is typicl figure you might get when histogrmming uniform

More information

2008 Mathematical Methods (CAS) GA 3: Examination 2

2008 Mathematical Methods (CAS) GA 3: Examination 2 Mthemticl Methods (CAS) GA : Exmintion GENERAL COMMENTS There were 406 students who st the Mthemticl Methods (CAS) exmintion in. Mrks rnged from to 79 out of possible score of 80. Student responses showed

More information

Section 11.5 Estimation of difference of two proportions

Section 11.5 Estimation of difference of two proportions ection.5 Estimtion of difference of two proportions As seen in estimtion of difference of two mens for nonnorml popultion bsed on lrge smple sizes, one cn use CLT in the pproximtion of the distribution

More information

Chapters 4 & 5 Integrals & Applications

Chapters 4 & 5 Integrals & Applications Contents Chpters 4 & 5 Integrls & Applictions Motivtion to Chpters 4 & 5 2 Chpter 4 3 Ares nd Distnces 3. VIDEO - Ares Under Functions............................................ 3.2 VIDEO - Applictions

More information

1 Probability Density Functions

1 Probability Density Functions Lis Yn CS 9 Continuous Distributions Lecture Notes #9 July 6, 28 Bsed on chpter by Chris Piech So fr, ll rndom vribles we hve seen hve been discrete. In ll the cses we hve seen in CS 9, this ment tht our

More information

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature

CMDA 4604: Intermediate Topics in Mathematical Modeling Lecture 19: Interpolation and Quadrature CMDA 4604: Intermedite Topics in Mthemticl Modeling Lecture 19: Interpoltion nd Qudrture In this lecture we mke brief diversion into the res of interpoltion nd qudrture. Given function f C[, b], we sy

More information

Natural examples of rings are the ring of integers, a ring of polynomials in one variable, the ring

Natural examples of rings are the ring of integers, a ring of polynomials in one variable, the ring More generlly, we define ring to be non-empty set R hving two binry opertions (we ll think of these s ddition nd multipliction) which is n Abelin group under + (we ll denote the dditive identity by 0),

More information

Stuff You Need to Know From Calculus

Stuff You Need to Know From Calculus Stuff You Need to Know From Clculus For the first time in the semester, the stuff we re doing is finlly going to look like clculus (with vector slnt, of course). This mens tht in order to succeed, you

More information

A Signal-Level Fusion Model for Image-Based Change Detection in DARPA's Dynamic Database System

A Signal-Level Fusion Model for Image-Based Change Detection in DARPA's Dynamic Database System SPIE Aerosense 001 Conference on Signl Processing, Sensor Fusion, nd Trget Recognition X, April 16-0, Orlndo FL. (Minor errors in published version corrected.) A Signl-Level Fusion Model for Imge-Bsed

More information

Numerical Analysis: Trapezoidal and Simpson s Rule

Numerical Analysis: Trapezoidal and Simpson s Rule nd Simpson s Mthemticl question we re interested in numericlly nswering How to we evlute I = f (x) dx? Clculus tells us tht if F(x) is the ntiderivtive of function f (x) on the intervl [, b], then I =

More information

Predict Global Earth Temperature using Linier Regression

Predict Global Earth Temperature using Linier Regression Predict Globl Erth Temperture using Linier Regression Edwin Swndi Sijbt (23516012) Progrm Studi Mgister Informtik Sekolh Teknik Elektro dn Informtik ITB Jl. Gnesh 10 Bndung 40132, Indonesi 23516012@std.stei.itb.c.id

More information

Tangent Line and Tangent Plane Approximations of Definite Integral

Tangent Line and Tangent Plane Approximations of Definite Integral Rose-Hulmn Undergrdute Mthemtics Journl Volume 16 Issue 2 Article 8 Tngent Line nd Tngent Plne Approximtions of Definite Integrl Meghn Peer Sginw Vlley Stte University Follow this nd dditionl works t:

More information

Numerical Integration

Numerical Integration Chpter 1 Numericl Integrtion Numericl differentition methods compute pproximtions to the derivtive of function from known vlues of the function. Numericl integrtion uses the sme informtion to compute numericl

More information

MAA 4212 Improper Integrals

MAA 4212 Improper Integrals Notes by Dvid Groisser, Copyright c 1995; revised 2002, 2009, 2014 MAA 4212 Improper Integrls The Riemnn integrl, while perfectly well-defined, is too restrictive for mny purposes; there re functions which

More information