Importance Sampling with Unequal Support

Philip S. Thomas and Emma Brunskill
Carnegie Mellon University

Abstract
Importance sampling is often used in machine learning when training and testing data come from different distributions. In this paper we propose a new variant of importance sampling that can reduce the variance of importance sampling-based estimates by orders of magnitude when the supports of the training and testing distributions differ. After motivating and presenting our new importance sampling estimator, we provide a detailed theoretical analysis that characterizes both its bias and variance relative to the ordinary importance sampling estimator in various settings. These settings include cases where ordinary importance sampling is biased while our new estimator is not, and vice versa. We conclude with an example of how our new importance sampling estimator can be used to improve estimates of how well a new treatment policy for diabetes will work for an individual, using only data from when the individual used a previous treatment policy.

Introduction
A key challenge in artificial intelligence is to estimate the expectation of a random variable. Instances of this problem arise in areas ranging from planning and decision making (e.g., estimating the expected sum of rewards produced by a policy for decision making under uncertainty) to probabilistic inference. Estimating an expected value is straightforward if we can generate many independent and identically distributed (i.i.d.) samples from the relevant probability distribution, which we refer to as the target distribution. However, we may not have generative access to the target distribution. Instead, we might only have data from a different distribution that we call the sampling distribution. For example, in off-policy evaluation for reinforcement learning, the goal is to estimate the expected sum of rewards that a decision policy will produce, given only data gathered using some other policy.
Similarly, in supervised learning, we may wish to predict the performance of a regressor or classifier if it were applied to data from a distribution that differs from the distribution of the available data. For example, we might predict the accuracy of a classifier for hand-written letters given that observed letter frequencies come from English, using a corpus of labeled letters collected from German documents.

Copyright © 2017, Association for the Advancement of Artificial Intelligence. All rights reserved.

More precisely, we consider the problem of estimating $\theta := \mathbf{E}[h(X)]$, where $h$ is a real-valued function and the expectation is over the random variable $X$, which is a sample from the target distribution. As input we assume access to $n$ i.i.d. samples from a sampling distribution that is different from the target distribution. A classical approach to this problem is to use importance sampling (IS), which reweights the observed samples to account for the difference between the target and sampling distributions (Kahn, 1955). Importance sampling produces an unbiased but often high-variance estimate of $\theta$.

We introduce importance sampling with unequal support (US), a simple new importance sampling estimator that can drastically reduce the variance of importance sampling when the supports of the sampling and target distributions differ. Unequal support can occur, for example, in our earlier example, where German documents might include symbols, like ß, that the classifier will not encounter. US essentially performs importance sampling only on the data that falls within the support of the target distribution. It then scales this estimate by a constant that reflects the relative support of the target and sampling distributions. US typically has lower variance than ordinary importance sampling, sometimes by orders of magnitude. It is also unbiased in the important setting where at least one sample falls within the support of the target distribution. If no samples do, then none of the available data could have been generated by the target distribution, and so it is unclear what would make for a reasonable estimate.
Furthermore, because US is conditionally unbiased, it can be used with concentration inequalities like Hoeffding's inequality to construct confidence bounds on $\theta$. By contrast, weighted importance sampling (Rubinstein, 1981) is another variant of importance sampling that can reduce variance, but it introduces bias that makes it incompatible with Hoeffding's inequality.

Problem Setting and Importance Sampling
Let $f$ and $g$ be probability density functions (PDFs) for two distributions that we call the target distribution and sampling distribution, respectively. Let $h : \mathbb{R} \to \mathbb{R}$ be called the evaluation function. Let $\theta := \mathbf{E}_f[h(X)]$, where $\mathbf{E}_f$ denotes the expected value given that $f$ is the PDF of the random variables in the expectation (in this case, just $X$). Let $F := \{x \in \mathbb{R} : f(x) \neq 0\}$, $G := \{x \in \mathbb{R} : g(x) \neq 0\}$, and $H := \{x \in \mathbb{R} : h(x) \neq 0\}$ be the supports of the target distribution, the sampling distribution, and the evaluation function, respectively. In this paper we discuss techniques for estimating $\theta$ given $n \in \mathbb{N}_{>0}$ i.i.d. samples, $X := \{X_1, \dots, X_n\}$, from the sampling distribution. We focus on the setting where $F \cap H \subset G$, i.e., where the joint support of $F$ and $H$ is a strict subset of the support of $G$.

The importance sampling estimator,
$$\operatorname{IS}(X) := t + \frac{1}{n}\sum_{i=1}^n \frac{f(X_i)}{g(X_i)}\big(h(X_i) - t\big),$$
is a widely used estimator of $\theta$, where $t = 0$ (we consider non-zero values of $t$ later). If $F \cap H \subseteq G$, then $\operatorname{IS}(X)$ is a consistent and unbiased estimator of $\theta$. That is, $\operatorname{IS}(X) \xrightarrow{\text{a.s.}} \theta$ and $\mathbf{E}_g[\operatorname{IS}(X)] = \theta$ (we review this latter result in Property 1 in the supplemental document).

A control variate is a constant, $t \in \mathbb{R}$, that is subtracted from each $h(X_i)$ and then added back to the final estimate (Hammersley, 1960; Hammersley and Handscomb, 1964). Control variates $t(X_i)$ that depend on the sample $X_i$ can be beneficial, but for our later purposes we only consider constant control variates. Intuitively, including a constant control variate is equivalent to two steps. First, estimate $\theta' := \mathbf{E}_f[h'(X)]$ using importance sampling without a control variate, where $h'(x) = h(x) - t$. Then add $t$ to the resulting estimate to get an estimate of $\theta$. Later we show that the variance of importance sampling increases with $\theta^2$. Applying importance sampling to $h$ therefore results in higher variance than applying it to $h'$ with $t \approx \theta$, since then $\theta' \approx 0$. By inducing a kind of normalization, a control variate can reduce the variance of estimates without introducing bias. This property has made control variates a popular topic in some recent works using importance sampling (Dudík et al., 2011; Jiang and Li, 2016; Thomas and Brunskill, 2016). Although we discuss control variates further later, for simplicity our derivations focus on importance sampling estimators without control variates. Other extensions of the importance sampling estimator can also reduce variance. Notably, the weighted importance sampling estimator, which we compare to later, can provide large reductions of variance and mean squared error, but it introduces bias.
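The IS estimator with a constant control variate takes only a few lines of Python. This is a minimal illustration, not code from the paper. The function name `importance_sampling` is hypothetical, and the example densities (those of the illustrative example in the next section) are assumed test data.

```python
from typing import Callable, Sequence

Density = Callable[[float], float]


def importance_sampling(
    xs: Sequence[float], f: Density, g: Density, h: Density, t: float = 0.0
) -> float:
    """IS(X) = t + (1/n) * sum_i f(X_i)/g(X_i) * (h(X_i) - t).

    xs are i.i.d. samples from the sampling density g; f is the target
    density, h the evaluation function, and t a constant control variate
    (t = 0 recovers the plain importance sampling estimator).
    """
    n = len(xs)
    if n == 0:
        raise ValueError("importance sampling needs at least one sample")
    total = sum(f(x) / g(x) * (h(x) - t) for x in xs)
    return t + total / n


# Densities of the illustrative example (used here only as test data):
# g is uniform on [0, 2], f is uniform on [0, 1], and h = 1 on [0, 1].
def f_example(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def g_example(x: float) -> float:
    return 0.5 if 0.0 <= x <= 2.0 else 0.0


def h_example(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


samples = [0.2, 0.5, 1.5, 1.8, 0.9]  # three of the five samples fall in F
print(importance_sampling(samples, f_example, g_example, h_example))         # 1.2
print(importance_sampling(samples, f_example, g_example, h_example, t=1.0))  # 1.0
```

In this toy case, choosing $t = \theta = 1$ makes every weighted term vanish. This is an extreme illustration of why a control variate close to $\theta$ reduces variance.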
An Illustrative Example
In this section we present an example that highlights the peculiar behavior of the IS estimator when $F \cap H \subset G$. The illustrative example, depicted in Figure 1, is defined as follows. Let $g(x) = 0.5$ if $x \in [0, 2]$ and $g(x) = 0$ otherwise, and let $f(x) = 1$ if $x \in [0, 1]$ and $f(x) = 0$ otherwise. So, $F = [0, 1]$ and $G = [0, 2]$. Let $h(x) = 1$ if $x \in [0, 1]$ and $h(x) = 0$ otherwise, so that $H = [0, 1]$. Notice that $\theta = 1$. Since the sampling and target distributions are both uniform, an obvious estimator of $\theta$ (if $f$ and $g$ are known but $h$ is not) would be the average value of $h$ over the points that fall within $F$.

Figure 1: Depiction of the illustrative example. The evaluation function is not shown because $h = f$ and $H = F$.

Let $\#\{X_i \in F\}$ denote the number of samples in $X$ that are in $F$. Formally, the obvious estimator is
$$\hat\theta := \frac{\sum_{i=1}^n \mathbf{1}_F(X_i)\, h(X_i)}{\#\{X_i \in F\}},$$
where $\mathbf{1}_A(x) = 1$ if $x \in A$ and $\mathbf{1}_A(x) = 0$ otherwise. Given our knowledge of $h$, it is straightforward to show that this estimator equals $1$ if $\#\{X_i \in F\} > 0$ and is undefined otherwise. In other words, it is exactly correct (it has zero bias and zero variance) as long as at least one sample falls within $F$. If no samples fall within $F$, then we have only observed data that will never occur under the target distribution, and so we have no useful information about $\theta$. In this case, we might define our obvious estimator to return an arbitrary value, e.g., zero.

Perhaps surprisingly, the importance sampling estimator does not degenerate to this obvious estimator:
$$\operatorname{IS}(X) = \frac{1}{n}\sum_{i=1}^n \frac{\mathbf{1}_F(X_i)\, h(X_i)}{0.5} = \frac{2\,\#\{X_i \in F\}}{n}.$$
Since $\mathbf{E}_g[\#\{X_i \in F\}/n] = 1/2$, this estimate is correct in expectation. However, it does not have zero variance given that at least one sample falls within $F$. If more than $1/2$ of the samples fall within $F$, this estimate will be an over-estimate of $\theta$, and if fewer than $1/2$ of the samples fall within $F$, it will be an under-estimate. Although correct on average, the importance sampling estimator has unnecessary additional variance relative to the obvious estimator.

Importance Sampling with Unequal Support
We propose a new importance sampling estimator, importance sampling with unequal support (ISUS, or US for brevity), that does degenerate to the obvious estimator for our illustrative example.
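The extra variance of IS can be seen directly by simulation. The following sketch is not from the paper; the helper names `obvious_estimate`, `is_estimate`, and `simulate` are hypothetical. It draws repeated data sets from the illustrative example's sampling distribution, conditions on at least one sample landing in $F$, and compares the obvious estimator (always exactly 1 here) with IS, which returns $2\,\#\{X_i \in F\}/n$.

```python
import random
import statistics


def obvious_estimate(xs):
    # Average of h over the samples in F = [0, 1]; h = 1 there, so this is 1.
    inside = [1.0 for x in xs if x <= 1.0]
    return sum(inside) / len(inside) if inside else 0.0  # arbitrary 0 if none


def is_estimate(xs):
    # IS for the example: weight f/g = 1/0.5 = 2 on F, and f = 0 outside F.
    return sum(2.0 for x in xs if x <= 1.0) / len(xs)


def simulate(n=10, trials=5000, seed=0):
    rng = random.Random(seed)
    obvious, importance = [], []
    for _ in range(trials):
        xs = [rng.uniform(0.0, 2.0) for _ in range(n)]  # samples from g
        if any(x <= 1.0 for x in xs):  # condition on at least one sample in F
            obvious.append(obvious_estimate(xs))
            importance.append(is_estimate(xs))
    return obvious, importance


obvious, importance = simulate()
print(statistics.pvariance(obvious))     # 0.0: exactly correct whenever defined
print(statistics.pvariance(importance))  # positive: IS fluctuates around 1
```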
Intuitively, US prunes from $X$ the samples that are outside $F$ (or, more generally, outside some set $C$ that we define later) to construct a new, smaller data set, $X'$. This new data set can be viewed as $\#\{X_i \in F\}$ i.i.d. samples from a different sampling distribution. That distribution has PDF $g'$, which is simply $g$ truncated to have support only on $F$ and re-normalized to integrate to one. US then applies ordinary importance sampling to this new data set. For generality, we allow US to prune from $X$ all of the points that are not in a set, $C$, which can be defined in many different ways, including $C := F$ as in our previous example. Our only requirement is that $F \cap H \subseteq C \subseteq G$.

In order to compute US, we must compute a value,
$$c := \int_C g(x)\,dx,$$
which is the probability that a sample from the sampling distribution will be in $C$. In general, $C$ should be chosen to be as small as possible while still ensuring both that $F \cap H \subseteq C \subseteq G$ (so that informative samples are not discarded) and that $c$ can be computed. Ideally, we would select $C = F \cap H$; however, in some cases $c$ cannot be computed for this value of $C$. For example, in our later experiments we consider a problem where $h$ and $H$ are not known but $F$ is, and so we can compute $c$ using $C = F$, but not using $C = F \cap H$.

Let $k(X) := \sum_{i=1}^n \mathbf{1}_C(X_i)$ be the number of $X_i$ that are in $C$. The US estimator is then defined as
$$\operatorname{US}(X) := \frac{c}{k(X)} \sum_{i=1}^n \frac{f(X_i)}{g(X_i)}\, h(X_i) \qquad (1)$$
if $k(X) > 0$, and $\operatorname{US}(X) := 0$ if $k(X) = 0$. This is equivalent to applying importance sampling to the pruned data set, $X'$, since $g'(x) = g(x)/c$ for $x \in C$. Also, in (1) we sum over all $n$ samples rather than just the $k(X)$ samples in $C$ because $f(X_i)h(X_i) = 0$ for all $X_i$ not in $C$.

Although we analyze the US estimator as defined in (1), it can be generalized to use measure-theoretic probability and to incorporate a control variate. In this more general setting, $f$ and $g$ are probability measures, $f$ is absolutely continuous with respect to $g$, $t(X_i)$ denotes a real-valued sample-dependent control variate, and
$$\operatorname{US}(X) := \frac{g(C)}{k(X)} \sum_{i=1}^n \frac{df}{dg}(X_i)\big(h(X_i) - t(X_i)\big) + \mathbf{E}_g[t(X)].$$

Theoretical Analysis of US
We begin with two simple theorems that elucidate the relationship between IS and US. The proofs of both theorems are straightforward and are deferred to the supplemental document. First, Theorem 1 shows that, when $C = G$, US degenerates to IS. One case where $C = G$ is when the supports of the target distribution and evaluation function are both equal to the support of the sampling distribution, i.e., when $F = H = G$. In that case $C = G$ necessarily.

Theorem 1. If $C = G$, then $\operatorname{US}(X) = \operatorname{IS}(X)$.

Theorem 2 shows that, if we replace $c$ in the definition of US with an empirical estimate, $\hat c(X) := k(X)/n$, then US and IS are equivalent. This provides some intuition for why US tends to outperform IS when $C \subset G$: IS is US, but with an empirical estimate of $c$ (the probability that a sample falls within $C$) used in place of its known value.

Theorem 2.
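Equation (1) translates directly into code. The sketch below is an illustrative implementation, not the authors' code; the name `us_estimate` and the membership test `in_c` are hypothetical. It uses the illustrative example with $C = F$ (so $c = 0.5$). It also checks Theorem 1 by setting $C = G$ (so $c = 1$), in which case US reproduces the IS value.

```python
from typing import Callable, Sequence

Func = Callable[[float], float]


def us_estimate(
    xs: Sequence[float], f: Func, g: Func, h: Func,
    in_c: Callable[[float], bool], c: float,
) -> float:
    """US(X) from equation (1), assuming F ∩ H ⊆ C ⊆ G and c = ∫_C g(x) dx."""
    k = sum(1 for x in xs if in_c(x))  # k(X): number of samples in C
    if k == 0:
        return 0.0  # US(X) := 0 when no sample lies in C
    # Terms with x outside C vanish because f(x) h(x) = 0 there.
    total = sum(f(x) / g(x) * h(x) for x in xs)
    return c * total / k


def f(x):  # target: uniform on F = [0, 1]
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def g(x):  # sampling: uniform on G = [0, 2]
    return 0.5 if 0.0 <= x <= 2.0 else 0.0


def h(x):  # evaluation function, H = [0, 1]
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


xs = [0.2, 0.5, 1.5, 1.8, 0.9]
print(us_estimate(xs, f, g, h, in_c=lambda x: x <= 1.0, c=0.5))  # 1.0 (C = F)
print(us_estimate(xs, f, g, h, in_c=lambda x: x <= 2.0, c=1.0))  # 1.2, equals IS
```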
If we replace $c$ with an empirical estimate, $\hat c(X) := k(X)/n$, then $\operatorname{US}(X) = \operatorname{IS}(X)$.

In Table 1 we summarize further theoretical results that clarify the differences between IS and US in several settings. The first setting, denoted by (a) in Table 1, is the standard setting, where we consider the ordinary expected value and variance of the two estimators. The second setting, denoted by (b) in Table 1, conditions on the event that at least one sample falls within $C$, that is, the event that $k(X) > 0$. This is a reasonable setting to consider if one takes the view that no estimate should be returned if all of the samples are outside $C$. That is, if the pruned data set, $X'$, is empty, then no estimate should be produced or considered, just as IS does not produce an estimate when $n = 0$ (when there are no samples at all). Finally, the third setting, denoted by (c) in Table 1, conditions on the event that $k(X) = \kappa$, that is, that a specific constant number, $\kappa$, of the samples are in $C$.

Table 1 and the theorems that it references use additional symbols that we review here. Let $\rho := \Pr(k(X) > 0) = 1 - (1 - c)^n$ be the probability that at least one of the $n$ samples is in $C$. Let $\operatorname{Var}_g$ denote the variance given that the random variables within the parentheses are sampled from the distribution with PDF $g$. Let
$$v := \operatorname{Var}_g\!\left(\frac{f(X)}{g(X)}\, h(X) \,\middle|\, X \in C\right)$$
be the conditional variance of the importance sampling estimate when using a single sample, given that the sample is in $C$. Let $B(n, c)$ denote the binomial distribution with parameters $n$ and $c$, and let $\mathbf{E}_{\kappa \sim B(n,c)}$ denote the expected value given that $\kappa \sim B(n, c)$.

The proofs of the claims in Table 1 are among the primary contributions of this work. We nevertheless defer them to the supplemental document because they are straightforward (though lengthy) and do not provide further insight into the results. The primary result of Table 1 is that US is unbiased and often has lower variance in the key setting of interest: when at least one sample is in the support of the target distribution ($k(X) > 0$). We find this setting compelling because, when no samples are in $F$, little can be inferred about $\mathbf{E}_f[h(X)]$. In this setting, denoted by (b) in Table 1, US is an unbiased estimator while IS is not, although the bias of IS does go to zero as $n \to \infty$.
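The quantities $\rho$ and $\mathbf{E}_{\kappa \sim B(n,c)}[1/\kappa \mid \kappa > 0]$ appear throughout Table 1, and both depend only on $n$ and $c$ because $k(X) \sim B(n, c)$. A minimal Python sketch computes both exactly from the binomial probability mass function. The function names are illustrative assumptions, not from the paper.

```python
from math import comb


def rho(n: int, c: float) -> float:
    """Pr(k(X) > 0) for k(X) ~ Binomial(n, c)."""
    return 1.0 - (1.0 - c) ** n


def expected_inverse_kappa(n: int, c: float) -> float:
    """E[1/kappa | kappa > 0] for kappa ~ Binomial(n, c), summed exactly."""
    mass = sum(
        comb(n, k) * c**k * (1.0 - c) ** (n - k) / k for k in range(1, n + 1)
    )
    return mass / rho(n, c)


print(rho(5, 0.1))                     # ≈ 0.40951
print(expected_inverse_kappa(1, 0.3))  # ≈ 1.0: with one sample, kappa = 1
print(expected_inverse_kappa(5, 1.0))  # 0.2: every sample lands in C
```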
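To understand the source of this bias, consider the bias of IS given that $k(X) = \kappa$ (setting (c) in Table 1). In this case, $\mathbf{E}_g[\operatorname{IS}(X) \mid k(X) = \kappa] = \frac{\kappa}{nc}\theta$. Recall that IS uses an empirical estimate of $c$, namely $\hat c(X) = k(X)/n$, as discussed in Theorem 2. When this estimate is correct ($\kappa/n = c$), the terms cancel, making IS unbiased. Thus, the bias of IS when conditioning on the event that $k(X) > 0$ stems from IS's use of an estimate of $c$.

Next we discuss the variance of the two estimators given that at least one sample falls within $C$, i.e., in setting (b). First consider how the variances of IS and US change as $c \to 0$, that is, as the difference between the supports of the sampling and target distributions increases. Specifically, let $c_i := 2^{-i}$ for $i \in \mathbb{N}_{>0}$. We then have
$$\operatorname{Var}(\operatorname{IS}(X) \mid k(X) > 0, c_i) \ge \frac{c_i v}{n\rho} = \frac{v}{n\rho\, 2^{i}} \ge \frac{v}{n\, 2^{i}},$$
since $\rho \in (0, 1]$, and
$$\operatorname{Var}(\operatorname{US}(X) \mid k(X) > 0, c_i) = \frac{v}{2^{2i}}\,\mathbf{E}_{\kappa \sim B(n,c_i)}\!\left[\frac{1}{\kappa} \,\middle|\, \kappa > 0\right] \le \frac{v}{2^{2i}},$$
since $\mathbf{E}_{\kappa \sim B(n,c_i)}[1/\kappa \mid \kappa > 0] \le 1$. Consider $i \to \infty$, i.e., $c_i \to 0$ with $i$ growing only logarithmically in $1/c_i$. For any fixed $\theta$ and $v$, the variance of US goes to zero much faster than the variance of IS. As a function of $i$, the variance of US converges to zero linearly or faster (with a rate of at most $1/4$), while the variance of IS converges to zero sublinearly at best (logarithmically).

Footnote 1: If we do not condition on the event that $k(X) > 0$, then US is a biased estimator of $\theta$ (by Theorem 7, its bias is $(\rho - 1)\theta$). This is because it is unclear how to define $\operatorname{US}(X)$ when $k(X) = 0$, and we chose arbitrarily to define it to be $0$. However, the bias of $\operatorname{US}(X)$ in this setting converges quickly to zero, since $\rho$ (the probability that at least one sample falls within $C$) converges quickly to one as $n \to \infty$.

Table 1: Theoretical properties of IS and US estimators.

Quantity | IS | US
$\mathbf{E}_g[\cdot]$ (a) | $\theta$ (Property 1) | $\rho\theta$ (Theorem 7)
$\mathbf{E}_g[\cdot \mid k(X) > 0]$ (b) | $\theta/\rho$ (Theorem 6) | $\theta$ (Theorem 4)
$\mathbf{E}_g[\cdot \mid k(X) = \kappa]$ (c) | $\frac{\kappa}{nc}\theta$ (Theorem 5) | $\theta$ (Theorem 3)
Variance (a) | $\frac{1}{n}\left(cv + \frac{1-c}{c}\theta^2\right)$ (supplemental document) | $\rho c^2 v\,\mathbf{E}_{\kappa \sim B(n,c)}[1/\kappa \mid \kappa > 0] + \rho(1-\rho)\theta^2$ (Theorem 10)
Variance (b) | $\frac{cv}{n\rho} + \theta^2\left(\frac{1-c}{nc\rho} + \frac{\rho - 1}{\rho^2}\right)$ (Theorem 9) | $c^2 v\,\mathbf{E}_{\kappa \sim B(n,c)}[1/\kappa \mid \kappa > 0]$ (Theorem 8)
Strongly consistent | Yes | Yes

(a) Given no conditions. (b) Conditioned on the event that $k(X) > 0$ (that at least one sample is in $C$). (c) Conditioned on the event that $k(X) = \kappa$ (that exactly $\kappa$ of the $n$ samples are in $C$). All theorems require the assumption that $F \cap H \subseteq G$. The consistency results follow immediately from the fact that the biases and variances all converge to zero as $n \to \infty$ (Thomas and Brunskill, 2016, Lemma 3).

Next note that, in this setting, the variance of US is independent of $\theta$, but the variance of IS increases with $\theta^2$ (see Property 3 in the supplemental document, applied to Theorem 9). To ameliorate this issue, a control variate, $t$, can be used to center the data so that $\theta \approx 0$. However, since $\theta$ is not known a priori, selecting $t = \theta$ is not practical. Because the variance of IS given $k(X) > 0$ contains a term that scales with $\theta^2$, it depends on the quality of the control variate: poor control variates can cause IS to have high variance. By contrast, the variance of US in this setting has no term that scales with $\theta^2$, and so the quality of the control variate is less important.

There is a rare case when IS can have lower variance than US. First, assume that the control variate is perfect so that $\theta = 0$ (which, as discussed before, is impractical), and consider the term that scales with $v$. From this term, it is clear that US will have lower variance than IS if
$$c\,\mathbf{E}_{\kappa \sim B(n,c)}\!\left[\frac{1}{\kappa} \,\middle|\, \kappa > 0\right] < \frac{1}{n\rho}. \qquad (3)$$
Notice that this inequality depends only on $c$ and $n$, which must both be known in order to implement US. We can therefore test a priori whether US will have lower variance than IS: if (3) holds, then US will have lower variance than IS, given that $k(X) > 0$.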
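The setting (b) variances in Table 1, and the a priori test (3), can be evaluated numerically. The sketch below only codes the Table 1 expressions; it does not re-derive them, and its function names are assumptions. The example parameters are those of the modified illustrative example used in the next section: $F_{\max} = 0.5$, $\theta = 10$, $n = 50$, with $c = F_{\max}/2$ and $v = 4/F_{\max}^2$.

```python
from math import comb


def rho(n, c):
    return 1.0 - (1.0 - c) ** n


def e_inv_kappa(n, c):
    # E[1/kappa | kappa > 0] with kappa ~ Binomial(n, c).
    s = sum(comb(n, k) * c**k * (1 - c) ** (n - k) / k for k in range(1, n + 1))
    return s / rho(n, c)


def var_is_given_some(theta, v, n, c):
    # Table 1, IS, setting (b) (Theorem 9).
    r = rho(n, c)
    return c * v / (n * r) + theta**2 * ((1 - c) / (n * c * r) + (r - 1) / r**2)


def var_us_given_some(v, n, c):
    # Table 1, US, setting (b) (Theorem 8): no theta^2 term.
    return c**2 * v * e_inv_kappa(n, c)


def us_lower_variance_a_priori(n, c):
    # Inequality (3): depends only on n and c, so it can be checked before sampling.
    return c * e_inv_kappa(n, c) < 1.0 / (n * rho(n, c))


# Modified illustrative example with F_max = 0.5: c = F_max/2, v = 4/F_max^2.
f_max = 0.5
c, v = f_max / 2, 4 / f_max**2
print(var_is_given_some(10.0, v, 50, c))    # ≈ 6.08
print(var_us_given_some(v, 50, c))          # ≈ 0.086
print(us_lower_variance_a_priori(2, 0.5))   # True
print(us_lower_variance_a_priori(50, 0.5))  # False: (3) can fail for large n
```

These values match the MSEs of roughly 6.08 (IS) and 0.086 (US) quoted for the right-most plot of Figure 2, where both estimators are essentially unbiased because $\rho \approx 1$. The last line shows that (3) is only a sufficient condition for US to have lower variance. With $\theta = 0$ and large $n$, IS can come out marginally ahead.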
However, if (3) does not hold, it does not follow that IS will have lower variance than US, unless the perfect (and typically unknown) control variate is used so that $\theta = 0$.

Footnote 2: The quality of the control variate can still affect the variance of estimates, though, since it can change $v$.

Application to Illustrative Example
Because neither method is always superior, here we apply IS and US to the illustrative example to see when each method works best, and by how much. We consider the setting where $C = F$, but modify the example slightly. First, although the target distribution is always uniform, we allow its support to be scaled. Specifically, we define the support of $f$ to be $[0, F_{\max}]$, where $F_{\max} \in (0, 2]$. Small $F_{\max}$ corresponds to significant differences in support, while large $F_{\max}$ corresponds to small differences. When $F_{\max} = 2$, $C = F = G$, and so the two estimators are equivalent. We also modify $h$ to allow for various values of $\theta$. Specifically, we define $h(x) = \theta + 1$ if $x < F_{\max}/2$ and $h(x) = \theta - 1$ if $x \ge F_{\max}/2$. Notice that, although we defined $h$ in terms of $\theta$, $\theta$ remains equal to $\mathbf{E}_f[h(X)]$. Note also that this definition of $h$ with $\theta = 0$ is an instance that is particularly favorable to IS. For this example, it is straightforward to verify that $v = 4/F_{\max}^2$ for any value of $\theta$, and that $c = F_{\max}/2$. Given these two values and $n$, we can compute the bias and variance of each estimator.

The biases and variances of the two estimators for various settings are depicted in Figure 2. Notice that US is always competitive with IS, although the reverse is not true. In particular, when $F_{\max}$ is small (so that $c$ is small) or when $\theta$ is large, US can have orders of magnitude lower variance than IS. Also, as $n$ increases, the two estimators become increasingly similar, since the empirical estimate of $c$ used by IS becomes increasingly accurate. Even so, US is still vastly superior to IS when $n$ is large if $c$ is correspondingly small.
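The modified example can also be simulated directly. The sketch below is an assumed setup written for illustration, not the authors' experiment code, and the helper names are hypothetical. It draws data sets with $F_{\max} = 0.5$, $\theta = 10$, and $n = 50$, keeps only those with $k(X) > 0$, and compares the empirical variances of IS and US.

```python
import random
import statistics


def one_trial(rng, n, f_max, theta):
    """Draw n samples from g = Uniform[0, 2]; return (k, IS, US) with C = F."""
    c = f_max / 2.0  # probability that a sample from g lands in F = [0, F_max]
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    terms = []
    for x in xs:
        if x > f_max:  # outside F: f(x) = 0, so the weighted term is 0
            terms.append(0.0)
        else:
            hx = theta + 1.0 if x < f_max / 2.0 else theta - 1.0
            terms.append((1.0 / f_max) / 0.5 * hx)  # f(x)/g(x) * h(x)
    k = sum(1 for x in xs if x <= f_max)
    is_est = sum(terms) / n
    us_est = c * sum(terms) / k if k > 0 else 0.0
    return k, is_est, us_est


def conditional_variances(n=50, f_max=0.5, theta=10.0, trials=4000, seed=1):
    rng = random.Random(seed)
    is_vals, us_vals = [], []
    for _ in range(trials):
        k, is_est, us_est = one_trial(rng, n, f_max, theta)
        if k > 0:  # setting (b): condition on at least one sample in C
            is_vals.append(is_est)
            us_vals.append(us_est)
    return statistics.variance(is_vals), statistics.variance(us_vals)


var_is, var_us = conditional_variances()
print(f"Var(IS | k>0) ~ {var_is:.3f}, Var(US | k>0) ~ {var_us:.3f}")
```

With these parameters the US variance should come out roughly seventy times smaller than the IS variance, which is in line with the comparison reported for Figure 2.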
This matches our theoretical analysis from the previous section. We expect US to perform better when $c$ is small (by our convergence rate analysis) or when $\theta$ is large (due to US's lesser dependence on the quality of the control variate). We also expect the two estimators to become increasingly similar as $n \to \infty$, because $\hat c$ becomes increasingly similar to $c$. Notice also that the gains are not limited to the relatively uninteresting setting where $c$ is so small relative to $n$ that no samples are expected to fall within $C$. For example, the right-most plot in Figure 2 shows the case $F_{\max} = 0.5$, where $\Pr(k(X) > 0) = \rho \approx 1$. There the MSE of US is approximately 0.086, while the MSE of IS is approximately 6.08: US has roughly 1/70 the MSE of IS (1/8 the RMSE).

Perhaps surprisingly, there are cases where IS has lower variance than US even when both are unbiased (because $\theta = 0$). For example, consider the plot with $\theta = 0$ and $n = 10$, at the position on the horizontal axis that corresponds to $F_{\max} = 1.0$. This is one case where IS is marginally better than US: it has lower variance in both settings, and neither estimator is biased. Intuitively, the IS estimator includes the points outside $F$, even though each contributes $\frac{f(X_i)}{g(X_i)}h(X_i) = 0$, and these zeros pull the importance sampling estimate towards zero. When $\theta = 0$, this extra pull towards zero happens to be beneficial. However, to remain unbiased despite the pull towards zero, IS also increases the magnitudes of the weights associated with points in $F$, which incurs additional variance. When $F_{\max}$ is small enough, this additional variance outweighs the variance reduction from the pull towards zero, and so US is again superior. This intuition is supported by the fact that in Figure 2 IS does not outperform US for small $F_{\max}$ or for nonzero $\theta$, since then a pull towards zero is detrimental.

Figure 2: The variances of IS and US across various settings of $\theta$ and $n$ (denoted along the left and top). At a glance, notice that the red and green curves (US) tend to be below the black curves (IS), particularly when considering the logarithmic scale of the vertical axes. The dotted lines show the variance conditioned on the event that $k(X) > 0$. The green line shows the mean squared error of the US estimator without any conditions, which shows that the variance reduction of US is not completely offset by increased bias (compare the solid black and green curves). When $\theta = 0$ the green line obscures the solid red line. The plot on the right shows a zoomed-in view of the $\theta = 10$, $n = 50$ plot without the logarithmic vertical axis.

Finally, we consider the use of IS and US to create high-confidence upper and lower bounds on $\theta$ using a concentration inequality (Massart, 2007) like Hoeffding's inequality (Hoeffding, 1963). Let $b$ denote the range of the function $f(x)h(x)/g(x)$ for $x \in G$. Using Hoeffding's inequality,
$$\operatorname{IS}(X) - b\sqrt{\frac{\ln(1/\delta)}{2n}}$$
is a $1-\delta$ confidence lower bound on $\theta$. Similarly, we can use US with Hoeffding's inequality to create a $1-\delta$ confidence lower bound,
$$\operatorname{US}(X) - cb\sqrt{\frac{\ln(1/\delta)}{2k(X)}},$$
since the range of the $k(X)$ i.i.d. random variables averaged by $\operatorname{US}(X)$ is $cb$. Notice that, if $k(X) = 0$, then this second bound is undefined; one might define it to be a known lower bound on $\theta$ in this setting. We expect that $k(X) \approx cn$. The resulting $c$ in the denominator of the US-based bound is inside the square root, while the $c$ in the numerator is not, so the bound constructed using US should tend to be tighter when $c$ is small.

Application to Diabetes Treatment
We applied US and IS to the problem of predicting the effectiveness of altering the treatment policy for a particular person with type 1 diabetes.
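Both Hoeffding-based lower bounds are one-liners. The sketch below assumes that the estimates, the range $b$, and (for US) $c$ and $k(X)$ have already been computed. The function names are hypothetical. Returning `None` when $k(X) = 0$ is one possible convention, reflecting the text's note that the US bound is undefined in that case.

```python
import math
from typing import Optional


def is_lower_bound(is_value: float, b: float, n: int, delta: float) -> float:
    """1 - delta confidence lower bound on theta from IS(X) via Hoeffding."""
    return is_value - b * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def us_lower_bound(
    us_value: float, b: float, c: float, k: int, delta: float
) -> Optional[float]:
    """1 - delta confidence lower bound from US(X); the averaged terms have range c*b."""
    if k == 0:
        return None  # undefined; a known lower bound on theta could be substituted
    return us_value - c * b * math.sqrt(math.log(1.0 / delta) / (2.0 * k))


# With k(X) close to c*n, the US width scales like b*sqrt(c) rather than b:
n, c, b, delta = 100, 0.1, 4.0, 0.05
is_width = -is_lower_bound(0.0, b, n, delta)
us_width = -us_lower_bound(0.0, b, c, round(c * n), delta)
print(round(is_width, 3), round(us_width, 3))
```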
That is, we would like to use prior data from when the individual was treated with one treatment policy to estimate how well a related policy would work. The treatment policy is parameterized by two numbers, CR and CF. It dictates how much insulin a person should inject before eating a meal in order to keep his or her blood glucose close to optimum levels. CR and CF are typically specified by a diabetologist and adjusted during follow-up visits every 3–6 months. If follow-up visits are not an option, recent research has suggested using reinforcement learning algorithms to tune CR and CF (Bastani, 2014). Here we focus on a sub-problem of improving CR and CF: using data collected from an initial range of admissible values of CR and CF to predict how well a new range of values would perform.

When collecting data, CR and CF are drawn uniformly from an initial admissible range and then used for one day, which we view as one episode of a Markov decision process. The performance during each day is measured using an objective function similar to the reward function proposed by Bastani (2014). It measures the deviation of blood glucose from optimum levels, with larger penalties for low blood glucose levels. We refer to this measure of how good the outcome was on one day as the return associated with that day, with larger values being better. Using approximately 30 days of data, our goal is to estimate the expected return if a different distribution of CR and CF were used.

We consider a specific in silico person: a person simulated using a metabolic simulator. We used the subject Adult#003 in the Type 1 Diabetes Metabolic Simulator (T1DMS) (Dalla Man et al., 2014). The US Food and Drug Administration has approved this simulator as a substitute for animal trials in pre-clinical testing of treatment policies for type 1 diabetes. During each day, the subject is given three or four meals of randomized sizes at randomized times, similar to the experimental setup proposed by Bastani (2014). Because of this randomness and the stochastic nature of the T1DMS model, the same values of CR and CF can produce different returns when used on different days.

Figure 3: The first and second plots show an estimate of the expected return for various CR and CF, from two different angles (the second is a side view of the first). The second plot also includes blue points depicting the Monte Carlo returns observed from using different values of CR and CF for a day (notice the high variance). The two plots on the right depict the bias, variance, and MSE of IS, US, and WIS without any conditioning for various values of $c$, both without (third plot) and with (fourth plot) a control variate. The curves for US are largely obscured by the corresponding curves for WIS. Notice that the variance of IS approaches 0.06, which is enormous given that the difference between the best and worst CR and CF pairs possible under the sampling policy is approximately … .

After analyzing the performance of many CR and CF pairs, we selected an initial range that results in good performance: $CR \in [8.5, 11]$ and $CF \in [10, 15]$. Using a large number of samples, we computed a Monte Carlo estimate of the expected return when different CR and CF values are used for a single day; this estimate is depicted in Figure 3. As described by Bastani (2014), when the value of CR is set appropriately, performance is robust to changes in CF. We therefore focus on possible changes to CR. Specifically, we consider new treatment policies where CF remains sampled from the uniform distribution over $[10, 15]$, but CR is sampled from the truncated normal distribution over $[CR_{\min}, 11]$, with mean 11 and standard deviation $11 - CR_{\min}$. This distribution places the largest probability densities at the upper end of the range of CR, which favors better policies. As $CR_{\min}$ increases towards 11, the expected return increases. At the same time, the supports of the sampling and target distributions become increasingly different ($c = (11 - CR_{\min})/2.5$).
For each value of $CR_{\min}$ (each of which corresponds to a value of $c$), we performed …,433 trials. Each trial involved generating the returns from 30 days, with the values of CR and CF for each day sampled uniformly from $CR \in [8.5, 11]$ and $CF \in [10, 15]$. We then used IS, US, and weighted importance sampling (WIS) to estimate the expected return if CR and CF were sampled from the target distribution (the truncated Gaussian parameterized by $CR_{\min}$). Figure 3 displays the bias, variance, and mean squared error (MSE) of these …,433 estimates, using an estimate of ground truth computed by Monte Carlo sampling. Figure 3 also shows the impact of providing a constant control variate to all the estimators; the chosen control variate was the expected return under the sampling distribution. Notice that we see the same trend as in the illustrative example: for small $c$ (the best treatment policies, which have small ranges of CR), US significantly outperforms IS. Furthermore, when a decent control variate is not used, the benefits of US increase, even when the resulting bias is accounted for by measuring the mean squared error. We also computed the biases and variances given that $k(X) > 0$ and observed similar results (not shown), which favored US slightly more.

Notice that WIS and US perform very similarly. Indeed, if the sampling and target distributions are both uniform, it is straightforward to verify that WIS and US are equivalent. In other experiments (not shown) we found that WIS yields lower variance than US when the target distribution is modified to be even less like the uniform distribution. However, it is often important to be able to produce confidence intervals around estimates, especially when data is limited. Since WIS is biased, it cannot be used with standard concentration inequalities.

We used Hoeffding's inequality to compute a 90% confidence interval around the estimates produced by IS and US. These intervals used no control variates, used $CR_{\min} = 10.375$ (so that $c = 1/4$), and were computed for various numbers of samples (days of data).
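The claim that WIS and US coincide when both distributions are uniform (taking $C = F$) can be checked on a small example. The sketch below uses the modified illustrative example as assumed test data, and its function names are hypothetical. Every sample in $F$ has the same importance weight, so WIS reduces to the average of $h$ over the samples in $F$, which is exactly what US computes.

```python
def wis_estimate(xs, f, g, h):
    """Weighted IS: sum_i w_i h(X_i) / sum_i w_i with w_i = f(X_i)/g(X_i)."""
    weights = [f(x) / g(x) for x in xs]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0  # assumed convention when no sample has positive weight
    return sum(w * h(x) for w, x in zip(weights, xs)) / total_weight


def us_estimate(xs, f, g, h, in_c, c):
    k = sum(1 for x in xs if in_c(x))
    if k == 0:
        return 0.0
    return c * sum(f(x) / g(x) * h(x) for x in xs) / k


F_MAX, THETA = 0.5, 10.0


def f(x):  # target: uniform on [0, F_MAX]
    return 1.0 / F_MAX if 0.0 <= x <= F_MAX else 0.0


def g(x):  # sampling: uniform on [0, 2]
    return 0.5 if 0.0 <= x <= 2.0 else 0.0


def h(x):
    return THETA + 1.0 if x < F_MAX / 2 else THETA - 1.0


xs = [0.1, 0.3, 0.45, 1.2, 1.9]
print(wis_estimate(xs, f, g, h), us_estimate(xs, f, g, h, lambda x: x <= F_MAX, F_MAX / 2))
```

When the target density is not constant on $F$, the WIS weights no longer cancel and the two estimators differ. This is consistent with the text's observation that WIS can yield lower variance than US in that regime.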
The mean confidence intervals are depicted in Figure 4. The figure also shows a Monte Carlo estimate of $\theta$ and deterministic domain-specific upper and lower bounds on $h(x)$ (denoted by "h range" in the legend). If $k(X) = 0$, then US is not defined, and so the confidence intervals shown for US are averaged only over the instances where $k(X) > 0$. To show how often US returns a solution, Figure 4 also shows $\rho$, the probability that US will produce a confidence bound (using the right vertical axis for scale). US produces a much tighter confidence interval than IS in all cases. Furthermore, the settings where US often does not return a bound are the settings where IS produces a trivial confidence interval, one that lies outside the deterministic bounds on $h(x)$. In additional experiments (not shown), we truncated the bounds to always lie within the deterministic bounds on $h(x)$. We also defined the bound produced by US to be conservative (equal to the deterministic bounds) when $k(X) = 0$. In these experiments we saw similar results: the confidence intervals produced using US were much tighter than those produced using IS.

Figure 4: Confidence bounds using IS and US.

Should One Use US or WIS in Practice?
The results presented in the previous section might raise the question: when should one use US rather than WIS? Previously we hinted at the problem with WIS: it is a biased estimator. Here we discuss why this theoretical property has important practical ramifications that rule out the use of WIS (but not US) for many high-risk problems.

First we list the troublesome theoretical properties of the WIS estimator, which are discussed by Thomas (2015, Section 3.8). When there is only a single sample, i.e., when $n = 1$, WIS is an unbiased estimator of $\mathbf{E}_g[h(X)]$. As $n$ increases, the expected value of the WIS estimator shifts towards the target value, $\mathbf{E}_f[h(X)]$. If the samples that are likely under $g$ are extremely unlikely under $f$, then this shift of the expected value of the WIS estimator from $\mathbf{E}_g[h(X)]$ to $\mathbf{E}_f[h(X)]$ can be exceedingly slow.

Consider what this would mean for our diabetes experiment. Here the behavior policy (sampling distribution) is a relatively decent policy that we might be considering changing. The evaluation policy (target distribution) might be a new treatment policy that is both dangerously worse than the behavior policy and quite different from it. To determine whether the evaluation policy should be deployed, we might rely on high-confidence guarantees, as has been suggested for similar problems (Thomas et al., 2015a). That is, we might use Hoeffding's inequality to construct a high-confidence lower bound on the expected value of the WIS estimator. We would then require this bound to be not far below the performance of the behavior policy. Because the behavior and evaluation policies are quite different, the WIS estimator will produce relatively low-variance estimates centered near the performance of the reasonable behavior policy. It will not produce estimates centered near the dangerously poor performance of the evaluation policy. Consequently, the lower bound that we compute will be a lower bound on the performance of the decent behavior policy, rather than on the true, poor performance of the evaluation policy.
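The drift of WIS from $\mathbf{E}_g[h(X)]$ toward $\mathbf{E}_f[h(X)]$ is easy to observe numerically. The sketch below uses an assumed toy problem that is not from the paper: $g$ is uniform on $[0, 1]$, $f(x) = 2x$ on $[0, 1]$, and $h(x) = x$. Under these choices $\mathbf{E}_g[h(X)] = 1/2$ and $\mathbf{E}_f[h(X)] = 2/3$. With $n = 1$ the weight cancels and WIS simply returns $h(X_1)$.

```python
import random
import statistics


def wis(xs):
    # Weights f(x)/g(x) = 2x for the toy problem; h(x) = x.
    weights = [2.0 * x for x in xs]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


def mean_wis(n, trials, seed=0):
    rng = random.Random(seed)
    # 1.0 - rng.random() lies in (0, 1], so every weight is strictly positive.
    return statistics.fmean(
        wis([1.0 - rng.random() for _ in range(n)]) for _ in range(trials)
    )


print(mean_wis(n=1, trials=20000))   # close to E_g[h(X)] = 0.5
print(mean_wis(n=100, trials=2000))  # close to E_f[h(X)] = 2/3
```

In this toy problem $f$ and $g$ overlap heavily, so the drift is fast. The text's concern is the opposite regime, where the samples likely under $g$ are very unlikely under $f$.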
Moreover, if one uses Student's t-test or a bootstrap method to construct the confidence interval, as has been suggested when using WIS (Thomas et al., 2015b), we might obtain a very tight confidence interval around the performance of the behavior policy. This exemplifies the problem with using WIS for high-risk problems: the bias of the WIS estimator can often cause us to erroneously conclude that dangerous policies are safe to deploy.

Conclusion and Future Work
We have presented a simple new variant of importance sampling, US. Our analytical and empirical results suggest that US can significantly outperform ordinary importance sampling when the supports of the sampling and target distributions differ. We also provide an inequality that can be evaluated prior to observing any data and that, if satisfied, guarantees that US will have lower variance than ordinary importance sampling. Unlike some other importance sampling estimators that have been developed to reduce variance (like WIS), US is unbiased under mild conditions, which still permit the easy computation of confidence intervals.

References
M. Bastani. Model-free intelligent diabetes management using machine learning. Master's thesis, Department of Computing Science, University of Alberta, 2014.
C. Dalla Man, F. Micheletto, D. Lv, M. Breton, B. Kovatchev, and C. Cobelli. The UVA/Padova type 1 diabetes simulator: new features. Journal of Diabetes Science and Technology, 8(1):26–34, 2014.
M. Dudík, J. Langford, and L. Li. Doubly robust policy evaluation and learning. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, 2011.
J. M. Hammersley. Monte Carlo methods for solving multivariable problems. Annals of the New York Academy of Sciences, 86(3), 1960.
J. M. Hammersley and D. C. Handscomb. Monte Carlo Methods. Methuen & Co. Ltd., London, page 40, 1964.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.
N. Jiang and L. Li. Doubly robust off-policy value evaluation for reinforcement learning. In International Conference on Machine Learning, 2016.
H. Kahn. Use of different Monte Carlo sampling techniques. Technical Report P-766, The RAND Corporation, September 1955.
P. Massart. Concentration Inequalities and Model Selection. Springer, 2007.
R. Rubinstein. Simulation and the Monte Carlo Method. Wiley, New York, 1981.
P. S. Thomas. Safe Reinforcement Learning. PhD thesis, University of Massachusetts Amherst, 2015.
P. S. Thomas and E. Brunskill. Data-efficient off-policy policy evaluation for reinforcement learning. In International Conference on Machine Learning, 2016.
P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High confidence off-policy evaluation. In Proceedings of the Twenty-Ninth Conference on Artificial Intelligence, 2015a.
P. S. Thomas, G. Theocharous, and M. Ghavamzadeh. High confidence policy improvement. In International Conference on Machine Learning, 2015b.

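Supplemental Document

In this supplemental document we prove the various properties and theorems referenced earlier, particularly those in Table 1.

Property 1. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$, then $\mathbf E_g[\operatorname{IS}(X)] = \theta$.

$$\mathbf E_g[\operatorname{IS}(X)] \overset{(a)}{=} \mathbf E_g\!\left[\frac{f(X)}{g(X)}h(X)\right] = \int_{\mathcal G} g(x)\frac{f(x)}{g(x)}h(x)\,dx \overset{(b)}{=} \int_{\mathcal F\cap\mathcal H} f(x)h(x)\,dx = \mathbf E_f[h(X)] = \theta,$$

where (a) holds because $\operatorname{IS}(X)$ is the mean of $n$ independent and identically distributed random variables, and (b) holds because $\mathcal F \cap \mathcal H \subseteq \mathcal G$ and $f(x)h(x) = 0$ for all $x \in \mathcal G \setminus (\mathcal F \cap \mathcal H)$.

We now provide a proof of Theorem 1, which states that if $\mathcal C = \mathcal G$, then $\operatorname{US}(X) = \operatorname{IS}(X)$. In this setting, $c = \int_{\mathcal G} g(x)\,dx = 1$ and, since every $X_i \in \mathcal G$ must be within $\mathcal C$, $k(X) = n$. So,

$$\operatorname{US}(X) = \frac{c}{k(X)}\sum_{i=1}^{n}\frac{f(X_i)}{g(X_i)}h(X_i) = \frac{1}{n}\sum_{i=1}^{n}\frac{f(X_i)}{g(X_i)}h(X_i) = \operatorname{IS}(X).$$

We now provide a proof of Theorem 2, which states that if we replace $c$ with an empirical estimate, $\hat c(X) := k(X)/n$, then $\operatorname{US}(X) = \operatorname{IS}(X)$. Using the empirical estimate $\hat c(X)$ in place of $c$ within US, we have:

$$\operatorname{US}(X) = \frac{\hat c(X)}{k(X)}\sum_{i=1}^{n}\frac{f(X_i)}{g(X_i)}h(X_i) = \frac{k(X)/n}{k(X)}\sum_{i=1}^{n}\frac{f(X_i)}{g(X_i)}h(X_i) = \frac{1}{n}\sum_{i=1}^{n}\frac{f(X_i)}{g(X_i)}h(X_i) = \operatorname{IS}(X).$$

Theorem 3. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$ and $\kappa \in \mathbb N_{>0}$, then $\mathbf E_g[\operatorname{US}(X) \mid k(X) = \kappa] = \theta$.

Let $\Pr_g(X \in \mathcal C)$ denote the probability that a sample, $X$, from the sampling distribution is in $\mathcal C$. Then

$$\begin{aligned}
\mathbf E_g[\operatorname{US}(X)\mid k(X)=\kappa] &= \mathbf E_g\!\left[\frac{c}{\kappa}\sum_{i=1}^{n}\frac{f(X_i)}{g(X_i)}h(X_i)\,\middle|\,k(X)=\kappa\right]\\
&\overset{(a)}{=} \mathbf E_g\!\left[\frac{c}{\kappa}\sum_{i=1}^{\kappa}\frac{f(X_i)}{g(X_i)}h(X_i)\,\middle|\,\forall i\in\{1,\dots,\kappa\},\ X_i\in\mathcal C\right]\\
&\overset{(b)}{=} c\,\mathbf E_g\!\left[\frac{f(X)}{g(X)}h(X)\,\middle|\,X\in\mathcal C\right]\\
&\overset{(c)}{=} c\int_{\mathcal C}\frac{g(x)}{\Pr_g(X\in\mathcal C)}\,\frac{f(x)}{g(x)}h(x)\,dx\\
&\overset{(d)}{=} \int_{\mathcal C} f(x)h(x)\,dx\\
&\overset{(e)}{=} \mathbf E_f[h(X)] = \theta,
\end{aligned}$$

where (a) holds because exactly $\kappa$ of the $X_i$ lie in $\mathcal C$ and every other term has $f(X_i)h(X_i) = 0$, so by re-ordering the $X_i$ so that the non-zero terms have indices $1,\dots,\kappa$ we need only sum to $\kappa$ rather than $n$; (b) holds because the summation is over $\kappa$ independent and identically distributed random variables; (c) holds by the definition of conditional expectations; (d) holds because $\Pr_g(X \in \mathcal C) = c$; and (e) holds because $\mathcal F \cap \mathcal H \subseteq \mathcal C$.

Theorem 4. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$, then $\mathbf E_g[\operatorname{US}(X) \mid k(X) > 0] = \theta$.

$$\mathbf E_g[\operatorname{US}(X)\mid k(X)>0] = \sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\Pr(k(X)>0)}\,\mathbf E_g[\operatorname{US}(X)\mid k(X)=\kappa] \overset{(a)}{=} \theta\sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\Pr(k(X)>0)} = \theta\,\frac{\Pr(k(X)>0)}{\Pr(k(X)>0)} = \theta,$$

where (a) holds because, by Theorem 3, $\mathbf E_g[\operatorname{US}(X) \mid k(X) = \kappa] = \theta$ for every $\kappa \geq 1$.

Theorem 5. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$ and $\kappa \in \mathbb N_{>0}$, then

$$\mathbf E_g[\operatorname{IS}(X)\mid k(X)=\kappa] = \frac{\kappa}{nc}\,\theta. \tag{4}$$

Following roughly the same steps as used to prove Theorem 3, we have that:

$$\begin{aligned}
\mathbf E_g[\operatorname{IS}(X)\mid k(X)=\kappa] &= \mathbf E_g\!\left[\frac{1}{n}\sum_{i=1}^{n}\frac{f(X_i)}{g(X_i)}h(X_i)\,\middle|\,k(X)=\kappa\right]\\
&= \mathbf E_g\!\left[\frac{1}{n}\sum_{i=1}^{\kappa}\frac{f(X_i)}{g(X_i)}h(X_i)\,\middle|\,\forall i\in\{1,\dots,\kappa\},\ X_i\in\mathcal C\right]\\
&= \frac{\kappa}{n}\,\mathbf E_g\!\left[\frac{f(X)}{g(X)}h(X)\,\middle|\,X\in\mathcal C\right]\\
&= \frac{\kappa}{n}\int_{\mathcal C}\frac{g(x)}{c}\,\frac{f(x)}{g(x)}h(x)\,dx\\
&= \frac{\kappa}{nc}\,\mathbf E_f[h(X)],
\end{aligned}$$

and so (4) follows.

Theorem 6. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$, then $\mathbf E_g[\operatorname{IS}(X) \mid k(X) > 0] = \theta / \Pr(k(X) > 0)$.

Recall from Property 1 that $\mathbf E_g[\operatorname{IS}(X)] = \theta$. By marginalizing over whether or not $k(X) > 0$, we also have that:

$$\mathbf E_g[\operatorname{IS}(X)] = \Pr(k(X)>0)\,\mathbf E_g[\operatorname{IS}(X)\mid k(X)>0] + \Pr(k(X)=0)\,\mathbf E_g[\operatorname{IS}(X)\mid k(X)=0].$$

So,

$$\mathbf E_g[\operatorname{IS}(X)\mid k(X)>0] = \frac{\theta - \Pr(k(X)=0)\,\mathbf E_g[\operatorname{IS}(X)\mid k(X)=0]}{1-\Pr(k(X)=0)} \overset{(a)}{=} \frac{\theta}{\Pr(k(X)>0)},$$

where (a) holds because $\mathbf E_g[\operatorname{IS}(X) \mid k(X) = 0] = 0$ and $\Pr(k(X) > 0) = 1 - \Pr(k(X) = 0)$.

Theorem 7. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$, then $\mathbf E_g[\operatorname{US}(X)] = \Pr(k(X) > 0)\,\theta$.

$$\mathbf E_g[\operatorname{US}(X)] = \Pr(k(X)>0)\underbrace{\mathbf E_g[\operatorname{US}(X)\mid k(X)>0]}_{=\theta\text{, by Theorem 4}} + \Pr(k(X)=0)\underbrace{\mathbf E_g[\operatorname{US}(X)\mid k(X)=0]}_{=0} = \Pr(k(X)>0)\,\theta.$$

Before continuing, recall the following property, which we prove for completeness:

Property 2. Let $X_1,\dots,X_n$ be independent and identically distributed random variables, each with finite mean and variance. Then

$$\mathbf E\!\left[\Big(\sum_{i=1}^{n}X_i\Big)^2\right] = n\operatorname{Var}(X_1) + n^2\,\mathbf E[X_1]^2.$$

Recall that

$$\operatorname{Var}\Big(\sum_{i=1}^{n}X_i\Big) = \mathbf E\!\left[\Big(\sum_{i=1}^{n}X_i\Big)^2\right] - \mathbf E\!\left[\sum_{i=1}^{n}X_i\right]^2.$$

So, by rearranging terms:

$$\mathbf E\!\left[\Big(\sum_{i=1}^{n}X_i\Big)^2\right] = \operatorname{Var}\Big(\sum_{i=1}^{n}X_i\Big) + \mathbf E\!\left[\sum_{i=1}^{n}X_i\right]^2.$$

Since the $X_i$ are independent and identically distributed, we therefore have that:

$$\mathbf E\!\left[\Big(\sum_{i=1}^{n}X_i\Big)^2\right] = n\operatorname{Var}(X_1) + \big(n\,\mathbf E[X_1]\big)^2 = n\operatorname{Var}(X_1) + n^2\,\mathbf E[X_1]^2.$$

Theorem 8. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$, then

$$\operatorname{Var}_g(\operatorname{US}(X)\mid k(X)>0) = c^2 v\,\mathbf E\!\left[\frac{1}{B(n,c)}\,\middle|\,B(n,c)>0\right],$$

where $B(n,c)$ denotes a binomial random variable with parameters $n$ and $c$ (so that $k(X) \sim B(n,c)$), and $v := \operatorname{Var}_g\!\big(f(X)h(X)/g(X) \mid X \in \mathcal C\big)$.

$$\begin{aligned}
\operatorname{Var}_g(\operatorname{US}(X)\mid k(X)>0) &= \mathbf E_g[\operatorname{US}(X)^2\mid k(X)>0] - \mathbf E_g[\operatorname{US}(X)\mid k(X)>0]^2\\
&\overset{(a)}{=} \mathbf E_g[\operatorname{US}(X)^2\mid k(X)>0] - \theta^2\\
&= \sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\Pr(k(X)>0)}\,\mathbf E_g[\operatorname{US}(X)^2\mid k(X)=\kappa] - \theta^2,
\end{aligned}\tag{5}$$

where (a) comes from Theorem 4.

We will write $y$ to denote a vector in $\mathbb R^n$, the elements of which are $y_1,\dots,y_n \in \mathbb R$. We also write $y_{i:j}$ to denote the $i$th through $j$th entries of $y$, i.e., $y_{i:j} := [y_i, y_{i+1}, \dots, y_j]$. Let $\mathcal G_\kappa := \{y \in \mathcal G^n : k(y) = \kappa\}$ be the set of all possible tuples of $n$ samples where exactly $\kappa$ are in $\mathcal C$. We also overload the definition of $g$ by defining $g(y) := \prod_i g(y_i)$, where the product is over all entries of $y$. Using this notation, we have that

$$\begin{aligned}
\mathbf E_g[\operatorname{US}(X)^2\mid k(X)=\kappa] &= \int_{\mathcal G_\kappa}\frac{g(y)}{\Pr(k(X)=\kappa)}\operatorname{US}(y)^2\,dy\\
&\overset{(a)}{=} \frac{\binom{n}{\kappa}}{\Pr(k(X)=\kappa)}\int_{\mathcal C^\kappa}\int_{(\mathcal G\setminus\mathcal C)^{n-\kappa}} g(y)\operatorname{US}(y)^2\,dy_{\kappa+1:n}\,dy_{1:\kappa}\\
&\overset{(b)}{=} \frac{\binom{n}{\kappa}}{\Pr(k(X)=\kappa)}\int_{\mathcal C^\kappa} g(y_{1:\kappa})\operatorname{US}(y_{1:\kappa})^2\,dy_{1:\kappa}\int_{(\mathcal G\setminus\mathcal C)^{n-\kappa}} g(y_{\kappa+1:n})\,dy_{\kappa+1:n}\\
&\overset{(c)}{=} \int_{\mathcal C^\kappa}\frac{g(y_{1:\kappa})}{c^\kappa}\left(\frac{c}{\kappa}\sum_{i=1}^{\kappa}\frac{f(y_i)}{g(y_i)}h(y_i)\right)^2 dy_{1:\kappa}\\
&= \mathbf E_g\!\left[\left(\frac{c}{\kappa}\sum_{i=1}^{\kappa}\frac{f(X_i)}{g(X_i)}h(X_i)\right)^2\,\middle|\,\forall i\in\{1,\dots,\kappa\},\ X_i\in\mathcal C\right]\\
&\overset{(d)}{=} \frac{c^2}{\kappa^2}\left(\kappa v + \kappa^2\,\mathbf E_g\!\left[\frac{f(X)}{g(X)}h(X)\,\middle|\,X\in\mathcal C\right]^2\right)\\
&= \frac{c^2 v}{\kappa} + c^2\left(\int_{\mathcal C}\frac{g(x)}{c}\,\frac{f(x)}{g(x)}h(x)\,dx\right)^2\\
&= \frac{c^2 v}{\kappa} + \theta^2,
\end{aligned}\tag{6}$$

where (a) comes from the fact that there are $\binom{n}{\kappa}$ ways of ordering $n$ elements such that $\kappa$ are in $\mathcal C$ and $n-\kappa$ are in $\mathcal G \setminus \mathcal C$, and the fact that US does not depend on the order of its inputs; (b) comes from the property that $\operatorname{US}(y)$ does not change if additional samples that are not in $\mathcal C$ are appended to $y$, and the fact that $g(y)$ can be decomposed into $g(y_{1:\kappa})\,g(y_{\kappa+1:n})$ since it represents the joint probability density function for $n$ independent and identically distributed random variables; (c) comes from the fact that $\Pr(k(X)=\kappa) = \binom{n}{\kappa}c^\kappa(1-c)^{n-\kappa}$ together with $\int_{(\mathcal G\setminus\mathcal C)^{n-\kappa}} g(y_{\kappa+1:n})\,dy_{\kappa+1:n} = (1-c)^{n-\kappa}$; and (d) comes from Property 2, since each term in the sum has conditional variance $v$.

Combining (5) with (6), we have that

$$\begin{aligned}
\operatorname{Var}_g(\operatorname{US}(X)\mid k(X)>0) &= \sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\Pr(k(X)>0)}\left(\frac{c^2 v}{\kappa}+\theta^2\right) - \theta^2\\
&= c^2 v\sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\Pr(k(X)>0)}\,\frac{1}{\kappa} + \theta^2\underbrace{\sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\Pr(k(X)>0)}}_{=1} - \theta^2\\
&= c^2 v\,\mathbf E\!\left[\frac{1}{B(n,c)}\,\middle|\,B(n,c)>0\right].
\end{aligned}$$

Theorem 9. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$, then

$$\operatorname{Var}_g(\operatorname{IS}(X)\mid k(X)>0) = \frac{cv}{n\rho} + \frac{\theta^2(1-c)}{nc\rho} + \frac{\theta^2(\rho-1)}{\rho^2},$$

where $\rho := \Pr(k(X) > 0)$.

At a high level, this proof is similar to the proof of Theorem 8, but uses the property that $\operatorname{IS}(X) = \frac{k(X)}{nc}\operatorname{US}(X)$.

$$\begin{aligned}
\operatorname{Var}_g(\operatorname{IS}(X)\mid k(X)>0) &= \mathbf E_g[\operatorname{IS}(X)^2\mid k(X)>0] - \mathbf E_g[\operatorname{IS}(X)\mid k(X)>0]^2\\
&\overset{(a)}{=} \mathbf E_g[\operatorname{IS}(X)^2\mid k(X)>0] - \frac{\theta^2}{\Pr(k(X)>0)^2}\\
&= \sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\Pr(k(X)>0)}\,\mathbf E_g[\operatorname{IS}(X)^2\mid k(X)=\kappa] - \frac{\theta^2}{\Pr(k(X)>0)^2},
\end{aligned}\tag{7}$$

where (a) comes from Theorem 6. Also,

$$\mathbf E_g[\operatorname{IS}(X)^2\mid k(X)=\kappa] \overset{(a)}{=} \mathbf E_g\!\left[\frac{k(X)^2}{n^2c^2}\operatorname{US}(X)^2\,\middle|\,k(X)=\kappa\right] = \frac{\kappa^2}{n^2c^2}\,\mathbf E_g[\operatorname{US}(X)^2\mid k(X)=\kappa] \overset{(b)}{=} \frac{\kappa v}{n^2} + \frac{\kappa^2\theta^2}{n^2c^2}, \tag{8}$$

where (a) holds because $\operatorname{IS}(X) = \frac{k(X)}{nc}\operatorname{US}(X)$ and (b) follows from (6). Using the shorthand $\rho := \Pr(k(X) > 0)$ and combining (7) with (8), together with $\mathbf E[B(n,c)] = nc$ and $\mathbf E[B(n,c)^2] = nc(1-c) + n^2c^2$ (the $\kappa = 0$ terms of both expectations are zero), we have that:

$$\begin{aligned}
\operatorname{Var}_g(\operatorname{IS}(X)\mid k(X)>0) &= \sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\rho}\left(\frac{\kappa v}{n^2}+\frac{\kappa^2\theta^2}{n^2c^2}\right) - \frac{\theta^2}{\rho^2}\\
&= \frac{v}{n^2\rho}\,\mathbf E[B(n,c)] + \frac{\theta^2}{n^2c^2\rho}\,\mathbf E[B(n,c)^2] - \frac{\theta^2}{\rho^2}\\
&= \frac{v\,nc}{n^2\rho} + \frac{\theta^2\big(nc(1-c)+n^2c^2\big)}{n^2c^2\rho} - \frac{\theta^2}{\rho^2}\\
&= \frac{cv}{n\rho} + \frac{\theta^2(1-c)}{nc\rho} + \frac{\theta^2(\rho-1)}{\rho^2}.
\end{aligned}$$

Theorem 10. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$, then

$$\operatorname{Var}_g(\operatorname{US}(X)) = \rho c^2 v\,\mathbf E\!\left[\frac{1}{B(n,c)}\,\middle|\,B(n,c)>0\right] + \theta^2\rho(1-\rho).$$

$$\begin{aligned}
\operatorname{Var}_g(\operatorname{US}(X)) &= \mathbf E_g[\operatorname{US}(X)^2] - \mathbf E_g[\operatorname{US}(X)]^2\\
&\overset{(a)}{=} \mathbf E_g[\operatorname{US}(X)^2] - \rho^2\theta^2\\
&= \Pr(k(X)=0)\underbrace{\mathbf E_g[\operatorname{US}(X)^2\mid k(X)=0]}_{=0} + \sum_{\kappa=1}^{n}\Pr(k(X)=\kappa)\,\mathbf E_g[\operatorname{US}(X)^2\mid k(X)=\kappa] - \rho^2\theta^2\\
&\overset{(b)}{=} \rho\sum_{\kappa=1}^{n}\frac{\Pr(k(X)=\kappa)}{\rho}\left(\frac{c^2 v}{\kappa}+\theta^2\right) - \rho^2\theta^2\\
&= \rho c^2 v\,\mathbf E\!\left[\frac{1}{B(n,c)}\,\middle|\,B(n,c)>0\right] + \theta^2\rho(1-\rho),
\end{aligned}$$

where (a) comes from Theorem 7, and (b) comes from (6) and from multiplying one term by $\rho/\rho$.

Theorem 11. If $\mathcal F \cap \mathcal H \subseteq \mathcal G$, then

$$\operatorname{Var}_g(\operatorname{IS}(X)) = \frac{1}{n}\left(cv + \frac{\theta^2(1-c)}{c}\right).$$

Below, $\operatorname{IS}(X_1) = f(X_1)h(X_1)/g(X_1)$ denotes the IS estimate computed from the single sample $X_1 \sim g$.

$$\begin{aligned}
\operatorname{Var}_g(\operatorname{IS}(X)) &\overset{(a)}{=} \frac{1}{n}\operatorname{Var}_g(\operatorname{IS}(X_1)) = \frac{1}{n}\left(\mathbf E_g[\operatorname{IS}(X_1)^2] - \mathbf E_g[\operatorname{IS}(X_1)]^2\right)\\
&\overset{(b)}{=} \frac{1}{n}\left(\mathbf E_g[\operatorname{IS}(X_1)^2] - \theta^2\right)\\
&= \frac{1}{n}\Big(\Pr_g(X_1\in\mathcal C)\,\mathbf E_g[\operatorname{IS}(X_1)^2\mid X_1\in\mathcal C] + \Pr_g(X_1\notin\mathcal C)\underbrace{\mathbf E_g[\operatorname{IS}(X_1)^2\mid X_1\notin\mathcal C]}_{=0} - \theta^2\Big)\\
&\overset{(c)}{=} \frac{1}{n}\left(c\left(v + \frac{\theta^2}{c^2}\right) - \theta^2\right)\\
&= \frac{1}{n}\left(cv + \frac{\theta^2(1-c)}{c}\right),
\end{aligned}$$

where (a) holds because $\operatorname{IS}(X)$ is the mean of $n$ independent and identically distributed random variables, (b) comes from Property 1, and (c) comes from applying (8) with $n = 1$ and $\kappa = 1$.

Property 3. For all $n \in \mathbb N_{>0}$ and $c \in (0, 1]$,

$$\frac{1-c}{nc} + \frac{\rho-1}{\rho} \geq 0.$$

Recall that $\rho := \Pr(k(X) > 0) = 1 - (1-c)^n$, so we have that:

$$\frac{1-c}{nc} + \frac{\rho-1}{\rho} = \frac{1-c}{nc} - \frac{(1-c)^n}{1-(1-c)^n}. \tag{9}$$

We will show by induction that (9) is non-negative for all $n \geq 1$. First, notice that for the base case where $n = 1$, (9) is equal to zero. For the inductive step we will show that (9) is non-negative for $n + 1$ given that it is non-negative for $n$. Writing $q := 1 - c \in [0, 1)$,

$$\frac{q}{(n+1)c} - \frac{q^{n+1}}{1-q^{n+1}} = \frac{n}{n+1}\underbrace{\left(\frac{q}{nc} - \frac{q^n}{1-q^n}\right)}_{(a)} + \frac{n}{n+1}\,\frac{q^n}{1-q^n} - \frac{q^{n+1}}{1-q^{n+1}},$$

where (a) is non-negative by the inductive hypothesis, and so we need only show that

$$\frac{n\,q^n}{(n+1)(1-q^n)} \geq \frac{q^{n+1}}{1-q^{n+1}}.$$

If $q = 0$, both sides are zero. Otherwise, multiplying both sides by the positive quantity $(n+1)(1-q^n)(1-q^{n+1})/q^n$ shows that this is equivalent to $n(1-q^{n+1}) \geq (n+1)\,q\,(1-q^n)$, which rearranges to $n(1-q) \geq q(1-q^n) = q(1-q)(1 + q + \dots + q^{n-1})$, i.e., since $1 - q = c > 0$, to

$$n \geq q + q^2 + \dots + q^n.$$

Since the right-hand side has $n$ terms, and $q^j \leq 1$ for every $j$ because $c \in (0, 1]$, we conclude that (9) is non-negative for $n + 1$.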
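The expectation results above (Property 1 and Theorems 2, 4, 6, and 7) lend themselves to a quick numerical sanity check. The Python sketch below is a minimal Monte Carlo check, not the authors' code, and it assumes a hypothetical toy problem: the sampling distribution g is Uniform(0, 2), the target distribution f is Uniform(0, 1), h(x) = x, and C = [0, 1], so that c = 0.5 and θ = 0.5. The function names (`is_estimate`, `us_estimate`, `simulate`) and the choices n = 5 and 100,000 trials are illustrative assumptions.

```python
import math
import random

# Hypothetical toy setup (an assumption, not from the paper):
# g = Uniform(0, 2), f = Uniform(0, 1), h(x) = x, and C = [0, 1],
# so (F intersect H) is a subset of C, and C is a subset of G.
def f(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def g(x):
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def h(x):
    return x

def in_c(x):
    return 0.0 <= x <= 1.0

C_PROB = 0.5  # c = Pr_g(X in C)
THETA = 0.5   # theta = E_f[h(X)] = E[U] for U ~ Uniform(0, 1)

def is_estimate(xs):
    """Ordinary importance sampling: average of f(x)/g(x) * h(x)."""
    return sum(f(x) / g(x) * h(x) for x in xs) / len(xs)

def us_estimate(xs, c=C_PROB):
    """US: (c / k(X)) * sum of f(x)/g(x) * h(x); defined as 0 when k(X) = 0."""
    k = sum(1 for x in xs if in_c(x))
    if k == 0:
        return 0.0
    return c / k * sum(f(x) / g(x) * h(x) for x in xs)

def mean(vals):
    return sum(vals) / len(vals)

def simulate(n, trials=100_000, seed=0):
    """Monte Carlo means of IS and US, unconditionally and given k(X) > 0."""
    rng = random.Random(seed)
    is_all, us_all, is_pos, us_pos = [], [], [], []
    for _ in range(trials):
        xs = [rng.uniform(0.0, 2.0) for _ in range(n)]  # n i.i.d. draws from g
        is_val, us_val = is_estimate(xs), us_estimate(xs)
        is_all.append(is_val)
        us_all.append(us_val)
        if any(in_c(x) for x in xs):  # the event k(X) > 0
            is_pos.append(is_val)
            us_pos.append(us_val)
    return mean(is_all), mean(us_all), mean(is_pos), mean(us_pos)

n = 5
rho = 1.0 - (1.0 - C_PROB) ** n  # Pr(k(X) > 0), since k(X) ~ Binomial(n, c)
e_is, e_us, e_is_pos, e_us_pos = simulate(n)
print(f"E[IS]         {e_is:.4f}  vs theta       = {THETA:.4f}  (Property 1)")
print(f"E[US | k > 0] {e_us_pos:.4f}  vs theta       = {THETA:.4f}  (Theorem 4)")
print(f"E[IS | k > 0] {e_is_pos:.4f}  vs theta / rho = {THETA / rho:.4f}  (Theorem 6)")
print(f"E[US]         {e_us:.4f}  vs rho * theta = {rho * THETA:.4f}  (Theorem 7)")

# Theorem 2: replacing c with c_hat = k(X)/n makes US coincide with IS.
xs_demo = [0.3, 1.7, 0.9, 1.2]
k_demo = sum(1 for x in xs_demo if in_c(x))
print(us_estimate(xs_demo, c=k_demo / len(xs_demo)), is_estimate(xs_demo))
```

With these settings ρ = 1 − 0.5^5 = 31/32, so Theorem 6 predicts E[IS | k(X) > 0] = 16/31 ≈ 0.516 and Theorem 7 predicts E[US] = 31/64 ≈ 0.484; the simulated means should agree with these to within Monte Carlo error (on the order of a thousandth with 100,000 trials).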

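The variance expressions in Theorems 8 through 11, together with Property 3, can also be checked numerically. This sketch is self-contained and again assumes a hypothetical toy problem (g = Uniform(0, 2), f = Uniform(0, 1), h(x) = x, C = [0, 1]); there the weighted term f(x)h(x)/g(x) equals 2x on C and 0 elsewhere, so c = 0.5, θ = 0.5, and v = Var(2U) = 1/3 for U ~ Uniform(0, 1). It evaluates the closed forms using the Binomial(n, c) probabilities of k(X) and compares them with simulated variances. The function names (`theory`, `simulate`, `property3_bracket`) are hypothetical.

```python
import math
import random

# Hypothetical toy setup (an assumption, not from the paper): g = Uniform(0, 2),
# f = Uniform(0, 1), h(x) = x, C = [0, 1]. The weighted term f(x)/g(x)*h(x)
# is 2x on C and 0 elsewhere, so c = 0.5, theta = 0.5, and v = Var(2U) = 1/3.
C, THETA, V = 0.5, 0.5, 1.0 / 3.0

def weighted_term(x):
    return 2.0 * x if x <= 1.0 else 0.0

def estimators(xs):
    """Return (IS(X), US(X), k(X)) for one batch of samples xs."""
    k = sum(1 for x in xs if x <= 1.0)
    total = sum(weighted_term(x) for x in xs)
    us = C / k * total if k > 0 else 0.0
    return total / len(xs), us, k

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((x - m) ** 2 for x in vals) / len(vals)

def theory(n, c=C, theta=THETA, v=V):
    """Closed-form variances from Theorems 8-11, with k(X) ~ Binomial(n, c)."""
    pmf = [math.comb(n, j) * c**j * (1 - c) ** (n - j) for j in range(n + 1)]
    rho = 1.0 - pmf[0]  # Pr(k(X) > 0)
    e_inv_b = sum(pmf[j] / j for j in range(1, n + 1)) / rho  # E[1/B | B > 0]
    return {
        "var_us_pos": c**2 * v * e_inv_b,  # Theorem 8
        "var_is_pos": (c * v / (n * rho) + theta**2 * (1 - c) / (n * c * rho)
                       + theta**2 * (rho - 1) / rho**2),  # Theorem 9
        "var_us": rho * c**2 * v * e_inv_b + theta**2 * rho * (1 - rho),  # Theorem 10
        "var_is": (c * v + theta**2 * (1 - c) / c) / n,  # Theorem 11
    }

def simulate(n, trials=100_000, seed=1):
    """Empirical variances of IS and US, unconditionally and given k(X) > 0."""
    rng = random.Random(seed)
    is_all, us_all, is_pos, us_pos = [], [], [], []
    for _ in range(trials):
        is_val, us_val, k = estimators([rng.uniform(0.0, 2.0) for _ in range(n)])
        is_all.append(is_val)
        us_all.append(us_val)
        if k > 0:
            is_pos.append(is_val)
            us_pos.append(us_val)
    return {
        "var_us_pos": variance(us_pos),
        "var_is_pos": variance(is_pos),
        "var_us": variance(us_all),
        "var_is": variance(is_all),
    }

def property3_bracket(n, c):
    """(1-c)/(nc) + (rho-1)/rho with rho = 1-(1-c)^n; Property 3 says this is >= 0."""
    rho = 1.0 - (1.0 - c) ** n
    return (1 - c) / (n * c) + (rho - 1) / rho

n = 5
predicted, observed = theory(n), simulate(n)
for name in predicted:
    print(f"{name:>10}: theory {predicted[name]:.5f}, simulation {observed[name]:.5f}")
```

In this example Theorem 11 gives Var(IS) = (1/5)(1/6 + 1/4) = 1/12 ≈ 0.083, and the conditional US variance from Theorem 8 comes out at roughly half of the conditional IS variance from Theorem 9. The exact binomial sums are used for the closed forms so that any mismatch points at the formulas rather than at a second layer of sampling noise.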
arXiv v1 [cs.LG], 10 Nov 2016


More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound

Lecture 27. Capacity of additive Gaussian noise channel and the sphere packing bound Lecture 7 Ageda for the lecture Gaussia chael with average power costraits Capacity of additive Gaussia oise chael ad the sphere packig boud 7. Additive Gaussia oise chael Up to this poit, we have bee

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 19

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 19 CS 70 Discrete Mathematics ad Probability Theory Sprig 2016 Rao ad Walrad Note 19 Some Importat Distributios Recall our basic probabilistic experimet of tossig a biased coi times. This is a very simple

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Calculus 2 TAYLOR SERIES CONVERGENCE AND TAYLOR REMAINDER

Calculus 2 TAYLOR SERIES CONVERGENCE AND TAYLOR REMAINDER Calulus TAYLO SEIES CONVEGENCE AND TAYLO EMAINDE Let the differee betwee f () ad its Taylor polyomial approimatio of order be (). f ( ) P ( ) + ( ) Cosider to be the remaider with the eat value ad the

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Lecture 10 October Minimaxity and least favorable prior sequences

Lecture 10 October Minimaxity and least favorable prior sequences STATS 300A: Theory of Statistics Fall 205 Lecture 0 October 22 Lecturer: Lester Mackey Scribe: Brya He, Rahul Makhijai Warig: These otes may cotai factual ad/or typographic errors. 0. Miimaxity ad least

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

A Question. Output Analysis. Example. What Are We Doing Wrong? Result from throwing a die. Let X be the random variable

A Question. Output Analysis. Example. What Are We Doing Wrong? Result from throwing a die. Let X be the random variable A Questio Output Aalysis Let X be the radom variable Result from throwig a die 5.. Questio: What is E (X? Would you throw just oce ad take the result as your aswer? Itroductio to Simulatio WS/ - L 7 /

More information

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 5: Parametric Hypothesis Testig: Comparig Meas GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review from last week What is a cofidece iterval? 2 Review from last week What is a cofidece

More information

The Use of L-Moments in the Peak Over Threshold Approach for Estimating Extreme Quantiles of Wind Velocity

The Use of L-Moments in the Peak Over Threshold Approach for Estimating Extreme Quantiles of Wind Velocity The Use of L-Momets i the Pea Over Threshold Approah for Estimatig Extreme Quatiles of Wid Veloity M.D. Padey Uiversity of Waterloo, Otario, Caada P.H.A.J.M. va Gelder & J.K. Vrijlig Delft Uiversity of

More information

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n,

n outcome is (+1,+1, 1,..., 1). Let the r.v. X denote our position (relative to our starting point 0) after n moves. Thus X = X 1 + X 2 + +X n, CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 9 Variace Questio: At each time step, I flip a fair coi. If it comes up Heads, I walk oe step to the right; if it comes up Tails, I walk oe

More information

Module 1 Fundamentals in statistics

Module 1 Fundamentals in statistics Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly

More information