
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 62, NO. 8, AUGUST 2017

Satisficing in Multi-Armed Bandit Problems

Paul Reverdy, Member, IEEE, Vaibhav Srivastava, and Naomi Ehrich Leonard, Fellow, IEEE

Abstract: Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.

Index Terms: Multi-armed bandit, upper credible limit (UCL).

Manuscript received July 20, 2016; accepted November 27. Date of publication December 22, 2016; date of current version July 26. This work was supported in part by ONR grant N and ARO grant W911NF. P. Reverdy was supported through an NDSEG Fellowship. Recommended by Associate Editor Q.-S. Jia.

P. Reverdy is with the Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA USA (e-mail: preverdy@seas.upenn.edu).

V. Srivastava is with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI USA (e-mail: vaibhav@egr.msu.edu).

N. E. Leonard is with the Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ USA (e-mail: naomi@princeton.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier /TAC

IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http:// standards/publications/rights/index.html for more information.

I. INTRODUCTION

ENGINEERING solutions to decision-making problems are often designed to maximize an objective function. However, in many contexts maximization of an objective function is an unreasonable goal, either because the objective itself is poorly defined or because solving the resulting optimization problem is intractable or costly. In these contexts, it is valuable to consider alternative decision-making frameworks. Herbert Simon considered alternative models of rational decision-making [30] with the goal of making them "compatible with the access to information and the computational capacities that are actually possessed by organisms, including man, in the kinds of environments in which such organisms exist." A major feature of the models he considered is what he called "satisficing." In [30], he discussed in very broad terms a variety of simplifications to the classical economic concept of rationality, most importantly the idea that payoffs should be simple, defined by doing well relative to some threshold value. In [31], he introduced the word "satisficing," a combination of the words "satisfy" and "suffice," to refer to this thresholding concept and illustrated it using a mathematical model of foraging. He also briefly discussed how satisficing relates to problems in inventory control and more complicated decision processes like playing chess.

Since Simon's pioneering work, satisficing has been studied in many fields such as psychology [29], economics [6], management science [23], [37], and ecology [36], [8]. In engineering, satisficing is of interest for the same reasons that motivated its introduction in the social science literature, specifically that it can simplify decision-making problems: as compared to maximizing, it allows for less risky decision making in the face of uncertainty. Furthermore, many engineering problems are naturally posed using a satisficing objective, such as choosing a design that meets given specifications, but where the designers may be indifferent among any such designs. Satisficing is well defined even if there are several competing performance measures that trade off in complicated ways, whereas maximizing may be poorly defined without additional information about preferences.

Satisficing has been studied in the engineering literature in several contexts. In [25], the authors studied design optimization using a satisficing objective and found that it is effective in many practical fields. In [14], the authors studied control theory using a satisficing objective function, and in [38], the authors used satisficing to study optimal software design. In [10], the authors used a multi-armed bandit algorithm to construct robots that actively adapt their control policies to mitigate damage, such as actuator failures. In order to speed the convergence of their algorithm, they only sought to identify control policies with performance above a set threshold, rather than to identify an optimal policy. The theory that we develop in this paper formalizes their notion of thresholding and provides bounds on performance.

In this paper, we consider satisficing in the stochastic multi-armed bandit problem [28], for which a decision maker sequentially chooses one of a set of alternative options, called arms, and earns a reward drawn from a stationary probability distribution associated with that arm. The standard multi-armed bandit problem uses a maximizing objective on accumulated reward. For this objective there is a known performance bound in terms of expected regret, which is the expected difference between the reward received by the decision maker and the maximum reward possible.

Since the standard notion of regret is defined relative to the unknown optimum, it can only be computed by an omniscient agent; this notion of regret is not computable by a decision maker faced with a multi-armed bandit problem. Nevertheless, it is a useful theoretical concept, which facilitates the analysis of algorithms designed to solve bandit problems. We extend the notion of regret to satisficing objectives and use it to analyze new algorithms.

In contrast to the standard stochastic multi-armed bandit problem, in which the agent seeks to determine, with certainty, the option with maximum mean reward, the satisficing multi-armed bandit problem seeks to determine, with a desired confidence, a satisfying option. We characterize satisficing multi-armed bandit problems using three separate features of the satisficing objective. The first feature selects the quantity on which the satisficing objective is defined. We consider two such quantities: the unknown mean reward of the selected option, and the instantaneous observed reward. The second feature pertains to the satisfaction aspect of the satisficing problem. In particular, it selects if the objective function should be optimizing, or if it should be satisfying. The third feature pertains to the sufficing aspect of the satisficing problem. In particular, it selects if the decision-making algorithm should be certain that the optimizing/satisfying criterion is met, or if it is sufficient for the algorithm to meet a desired threshold in confidence about the criterion.

Different combinations of the above three features of satisficing lead to eight satisficing objectives that we discuss in this paper. We begin by defining the four objectives for the case where the satisficing quantity is the unknown mean reward. We show that the bandit problem with each of these four objectives is equivalent to a previously studied bandit problem and use the equivalence to derive a performance bound for the satisficing problems. These four objectives seek an arm with satisfyingly high mean reward without regard to that reward's dispersion. To develop objectives with improved robustness properties, we then consider the case where the satisficing quantity is the instantaneous observed reward. We extend the first four objectives to this case by adding an additional layer of thresholding, which defines four more objectives. When the reward distributions belong to location-scale families, there is an equivalence between the objectives defined in terms of mean reward and the robust objectives defined in terms of instantaneous reward, which we prove for Gaussian rewards.

For simplicity of exposition, we then specialize to Gaussian multi-armed bandit problems, where the reward distributions are Gaussian with unknown mean and known variance. For such problems, we develop several modifications of the UCL algorithm that we developed in previous work [27]. These algorithms solve the problem with the satisficing-in-mean-reward objectives and thus also with the robust objectives. We show that these algorithms achieve efficient performance. These results extend our previous work [26] by incorporating the concept of sufficiency into the satisficing objective, as well as by adding several new algorithms and their associated analysis.

The assumption of Gaussian rewards with known variance is not required, but allows us to focus on the different notions of regret, which is the main contribution of this paper. We later show how the known variance assumption can be relaxed. Our methods also extend immediately to many other important classes of reward distributions, including distributions with bounded support and sub-Gaussian distributions. We show how to extend our methods in these cases and provide references to the relevant literature for other extensions.

The remainder of the paper is structured as follows. In Section II we review the standard stochastic multi-armed bandit problem and the associated performance bounds. In Section III we propose the satisficing objectives and bound performance in terms of these objectives. In Section IV we specialize to the case of Gaussian rewards and show the equivalence between the satisficing mean reward objectives and the satisficing instantaneous observed reward objectives. In Section V we review the UCL algorithm, and in Section VI we design modified versions of the UCL algorithm for the satisficing objectives. We show that these modified algorithms achieve efficient performance for Gaussian rewards. We show the results of numerical simulations in Section VII and in Section VIII we conclude.

II. THE STOCHASTIC MULTI-ARMED BANDIT PROBLEM

In the stochastic multi-armed bandit problem a decision-making agent sequentially chooses one among a set of $N$ options called arms in analogy with the lever of a slot machine. A single-levered slot machine is called a one-armed bandit, so the case of $N \ge 2$ options is called a multi-armed bandit.

The decision-making agent collects reward $r_t \in \mathbb{R}$ by choosing arm $i_t$ at each time $t \in \{1, \ldots, T\}$, where $T \in \mathbb{N}$ is the horizon length for the sequential decision process. The reward from option $i \in \{1, \ldots, N\}$ is sampled from a stationary probability distribution $\nu_i$ which has an unknown mean $m_i \in \mathbb{R}$. The decision-maker's objective is to maximize some function of the sequence of rewards $\{r_t\}$ by sequentially picking arms $i_t$ using only the information available at time $t$.

A. Maximization Objective

In the standard multi-armed bandit problem, the agent's objective is to maximize the expected cumulative reward

$$J = E\left[\sum_{t=1}^{T} r_t\right] = \sum_{t=1}^{T} m_{i_t}. \tag{1}$$

Equivalently, by defining $m_{i^*} = \max_i m_i$ and $R_t = m_{i^*} - m_{i_t}$, the expected regret at time $t$, maximizing (1) can be formulated as minimizing the cumulative expected regret defined by

$$\sum_{t=1}^{T} R_t = T m_{i^*} - \sum_{i=1}^{N} m_i E\left[n_i^T\right] = \sum_{i=1}^{N} \Delta_i E\left[n_i^T\right], \tag{2}$$

where $n_i^T$ is the number of times arm $i$ has been chosen up to time $T$, $\Delta_i = m_{i^*} - m_i$ is the expected regret due to picking arm $i$ instead of arm $i^*$, and the expectation is over the possible rewards and decisions made by the agent.
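As a small numerical illustration of the regret decomposition (2), the following Python sketch evaluates both forms of the cumulative expected regret for fixed pull counts. The function name `cumulative_expected_regret` and the example values are illustrative assumptions and do not come from the paper; the pull counts stand in for the expectations $E[n_i^T]$.

```python
def cumulative_expected_regret(means, pull_counts):
    """Evaluate the right-hand side of (2): sum_i Delta_i * n_i.

    means[i] is the mean reward m_i and pull_counts[i] stands in for
    E[n_i^T]; both lists are hypothetical inputs for illustration.
    """
    if len(means) != len(pull_counts):
        raise ValueError("means and pull_counts must have the same length")
    best_mean = max(means)  # m_{i*}
    gaps = [best_mean - m for m in means]  # Delta_i = m_{i*} - m_i
    return sum(gap * count for gap, count in zip(gaps, pull_counts))


# Hypothetical three-armed example with horizon T = 100.
means = [1.0, 0.5, 0.0]
pulls = [80, 15, 5]
regret = cumulative_expected_regret(means, pulls)

# The middle expression in (2): T * m_{i*} - sum_i m_i * n_i.
horizon = sum(pulls)
regret_alt = horizon * max(means) - sum(m * n for m, n in zip(means, pulls))
```

Both expressions give the same value because the pull counts sum to the horizon $T$, which is the identity used to pass from the middle to the right-hand form of (2).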

The interpretation of (2) is that suboptimal arms should be chosen as rarely as possible. This is a non-trivial task since the mean rewards $m_i$ are initially unknown to the decision-maker, who must try arms to learn about their rewards while preferentially picking arms that appear more rewarding. The tension between these requirements is known as the explore-exploit tradeoff and is common to many problems in machine learning and adaptive control.

B. Bound on Optimal Performance

Optimal performance in a bandit problem corresponds to picking suboptimal arms as rarely as possible, as shown by (2). Lai and Robbins [20] studied the standard stochastic multi-armed bandit problem and showed that any policy solving the problem must pick each suboptimal arm $i \neq i^*$ a number of times that is at least logarithmic in the time horizon $T$, i.e.,

$$E\left[n_i^T\right] \ge \left(\frac{1}{D(\nu_i \,\|\, \nu_{i^*})} + o(1)\right) \log T, \tag{3}$$

where $o(1) \to 0$ as $T \to +\infty$. The quantity $D(\nu_i \,\|\, \nu_{i^*}) := \int \nu_i(r) \log \frac{\nu_i(r)}{\nu_{i^*}(r)} \, dr$ is the Kullback-Leibler divergence between the reward density $\nu_i$ of any suboptimal arm and the reward density $\nu_{i^*}$ of the optimal arm. Equation (3) implies that cumulative expected regret must grow at least logarithmically in time.

The bound (3) is asymptotic in time, but researchers (e.g., [4], [13], [27]) have developed algorithms that achieve cumulative expected regret that is bounded by a logarithmic term uniformly in time, sometimes with the same constant as in (3). Cumulative expected regret that is uniformly bounded in time by a logarithmic term is often called logarithmic regret for short. In the literature, algorithms that achieve logarithmic regret with a leading term that is within a constant factor of that in (3) are considered to have optimal performance.

C. Multiple Plays

Anantharam et al. [2] studied a generalization of the multi-armed bandit problem in which the agent picks $k \ge 1$ arms at each time $t$, which they called the multi-armed bandit problem with multiple plays. The case $k = 1$ corresponds to the standard multi-armed bandit problem defined above. In the spirit of [2], let $\sigma$ be a permutation of $\{1, \ldots, N\}$ such that $m_{\sigma(1)} \ge m_{\sigma(2)} \ge \cdots \ge m_{\sigma(N)}$. For the multi-armed bandit problem with $k$ plays, the optimal policy with full information corresponds to picking the arms $\sigma(1), \ldots, \sigma(k)$, called the $k$-best arms [2]. In the case $k = 1$, $\sigma(1) = i^*$, the optimal arm defined above.

For the case of general $k \ge 1$, the cumulative expected regret for the multi-armed bandit problem with multiple plays is defined as follows [2]:

$$T \sum_{i=1}^{k} m_{\sigma(i)} - \sum_{i=1}^{N} m_i E\left[n_i^T\right], \tag{4}$$

which is a straightforward generalization of the regret (2). The suboptimal arms $\sigma(k+1), \ldots, \sigma(N)$ are called the $k$-worst arms [2]. Define $\Delta_i^k = m_{\sigma(k)} - m_i$ for each $k$-worst arm $i$. The quantity $\Delta_i^k$ is the generalization of the expected regret $\Delta_i$ for the problem with multiple plays, where the expected value of the optimal policy is that of the $k$ best arms.

As in the case of a single play, optimal performance corresponds to picking suboptimal (i.e., $k$-worst) arms as rarely as possible. By [2] each $k$-worst arm $i$ must be picked a number of times that is at least logarithmic in the time horizon $T$, i.e.,

$$E\left[n_i^T\right] \ge \left(\frac{1}{D(\nu_i \,\|\, \nu_{\sigma(k)})} + o(1)\right) \log T. \tag{5}$$

This bound can be interpreted as a generalization of the Lai-Robbins bound (3) where the Kullback-Leibler divergence is taken with respect to the $k$-th best arm $\sigma(k)$ rather than the first best arm $\sigma(1)$ (i.e., the case $k = 1$).

D. PAC Bounds

In the standard multi-armed bandit problem and the multi-armed bandit problem with multiple plays, regret is defined in terms of the unknown mean reward values $m_i$. These regret definitions imply that avoiding regret requires identifying optimal arms with certainty. The requirement to identify optimal arms with certainty is characteristic of a maximizing decision-making strategy. In contrast, a satisficing decision-making agent should seek arms that are "good enough." In this context, satisficing corresponds to finding arms that are optimal with high probability rather than with certainty.

The Probably Approximately Correct (PAC) model for learning introduced by Valiant [34] provides a natural way to capture this aspect of satisficing. Even-Dar et al. [11], [12] and Mannor and Tsitsiklis [22] studied the multi-armed bandit problem using the PAC model and defined an $\epsilon$-optimal arm $i$ as one for which $m_i > m_{i^*} - \epsilon$, i.e., the mean reward is within $\epsilon$ of the optimum value. Equivalently, an $\epsilon$-optimal arm is an arm for which the expected regret $\Delta_i$ is at most $\epsilon$. Under the PAC model one wishes to find an $\epsilon$-optimal arm with probability of at least $1 - \delta$. With probability one, this can be achieved in a finite number of samples, so performance guarantees take the form of bounds on the number of samples required, which is referred to as sample complexity.
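To make the PAC objective concrete, here is a minimal Python sketch of a naive uniform-sampling strategy for Bernoulli arms. It is not an algorithm from this paper, and it is not claimed to reproduce the procedures of [11], [12], or [22]. It assumes rewards in $[0, 1]$ and chooses the per-arm sample count from Hoeffding's inequality with a union bound, so that every empirical mean lies within $\epsilon/2$ of its true mean with probability at least $1 - \delta$. The function names and the example instance are hypothetical.

```python
import math
import random


def naive_sample_count(num_arms, epsilon, delta):
    """Per-arm sample count from Hoeffding's inequality plus a union bound.

    With this many samples of each arm, all empirical means are within
    epsilon / 2 of the true means with probability at least 1 - delta,
    assuming rewards lie in [0, 1].
    """
    return math.ceil((2.0 / epsilon**2) * math.log(2.0 * num_arms / delta))


def naive_pac_arm(success_probs, epsilon, delta, rng):
    """Sample every Bernoulli arm equally often; return (chosen arm, samples per arm)."""
    per_arm = naive_sample_count(len(success_probs), epsilon, delta)
    estimates = []
    for p in success_probs:
        wins = sum(1 for _ in range(per_arm) if rng.random() < p)
        estimates.append(wins / per_arm)
    # Pick the arm with the highest empirical success rate.
    chosen = max(range(len(estimates)), key=estimates.__getitem__)
    return chosen, per_arm


rng = random.Random(0)  # fixed seed so the run is reproducible
probs = [0.9, 0.2, 0.1]  # hypothetical success probabilities p_i
arm, per_arm = naive_pac_arm(probs, epsilon=0.2, delta=0.05, rng=rng)
```

The total number of samples used by this sketch is the number of arms times `per_arm`, which does not depend on any horizon length. That is the sense in which PAC guarantees are stated as sample complexity rather than as regret growing with $T$.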
In our notation, we denote sample complexity by $T$, as it is the value of the horizon length at which sampling terminates. When the rewards are Bernoulli distributed with unknown success probabilities $p_i$, the following lower bound holds [22]:

$$E[T] \ge O\left(\frac{1}{\epsilon^2} \log(1/\delta)\right). \tag{6}$$

A similar result was reported in [11] for $T$, rather than its expected value. In other words, one must sample an arm at least $\log(1/\delta)/\epsilon^2$ times to be able to declare that it is $\epsilon$-optimal with probability at least $1 - \delta$.

Similar to the work of [2] extending Lai and Robbins' bounds [20] to the case of multiple plays, Kalyanakrishnan et al. [15] extended the work of [12] from finding the $\epsilon$-optimal arm to finding the $m$ $\epsilon$-best arms with probability at least $1 - \delta$. In [15] this problem is called Explore-$m$, and an algorithm that solves it is called $(\epsilon, m, \delta)$-optimal. Note that the problem in [12] is the special case Explore-1. The Explore-$m$ problem is studied in [15] for rewards that are Bernoulli distributed. It is proved that, for every $(\epsilon, m, \delta)$-optimal algorithm, there exists a bandit problem on which that algorithm has worst-case sample complexity of at least order $(N/\epsilon^2) \log(m/8\delta)$. Specifically, it is shown that there exists a bandit problem such that the number of samples $T$ required to identify $m$ $\epsilon$-best arms obeys

$$T \ge \frac{1}{18375} \frac{N}{\epsilon^2} \log \frac{m}{8\delta}. \tag{7}$$

This gives a worst-case bound on the number of times all arms need to be sampled to achieve $(\epsilon, m, \delta)$-optimality.

The bounds (6) and (7) were both formulated for the case of Bernoulli rewards, but it is straightforward to extend them to the case where the rewards are Gaussian distributed with unknown mean and known variance.

E. Gaussian Rewards

In this paper we focus on the case of Gaussian reward distributions, where the distribution $\nu_i$ of rewards associated with arm $i$ is Gaussian with mean $m_i$, which is unknown to the decision maker, and variance $\sigma_{s,i}^2$, which is known to the decision maker from, e.g., previous observations or known measurement characteristics. Relaxation of the assumption of known variance is discussed in Remark 12. For the given case, the Kullback-Leibler divergence in (3) takes the value

$$D(\nu_i \,\|\, \nu_{i^*}) = \frac{1}{2}\left(\frac{\Delta_i^2}{\sigma_{s,i^*}^2} + \frac{\sigma_{s,i}^2}{\sigma_{s,i^*}^2} - 1 - \log \frac{\sigma_{s,i}^2}{\sigma_{s,i^*}^2}\right). \tag{8}$$

This equation is more easily interpreted when the reward variances are uniform, i.e., $\sigma_{s,i}^2 = \sigma_s^2$ for each $i$. In some cases we assume uniform variance for simplicity of exposition, but the relevant results are readily generalized to the case of non-uniform variance. Assuming uniform variance, $D(\nu_i \,\|\, \nu_{i^*}) = \Delta_i^2 / 2\sigma_s^2$, so the bound (3) is

$$E\left[n_i^T\right] \ge \left(\frac{2\sigma_s^2}{\Delta_i^2} + o(1)\right) \log T. \tag{9}$$

This result can be interpreted as follows. For a given value of $\Delta_i$, a larger variance $\sigma_s^2$ makes the rewards more variable and therefore it is more difficult to distinguish between the arms. For a given value of $\sigma_s^2$, a larger value of $\Delta_i$ makes it easier to distinguish arm $i$ from the optimal arm. The expressions for the problem with multiple plays (i.e., (5)) are identical except for substituting $\sigma(k)$ for $i^*$ and $\Delta_i^k$ for $\Delta_i$.

III. THE MULTI-ARMED BANDIT PROBLEM WITH SATISFICING OBJECTIVES

We now define the multi-armed bandit problem with satisficing objectives. We propose several new satisficing notions of regret and find associated bounds on optimal performance. These notions capture two dimensions of the satisficing problem: satisfaction, i.e., the agent's desire to obtain a reward that is above a certain threshold, and sufficiency, i.e., the agent's desire to attain a level of confidence that its choice of a given arm will bring them satisfaction. We define these notions first for satisficing in mean reward and then extend them to satisficing in instantaneous reward, which we refer to as robust satisficing.

A. Satisficing in Mean Reward

We define satisfaction in mean reward as having an expected reward $m_{i_t}$ that is above a specified threshold value $M$. Formally, we represent satisfaction in mean reward at time $t$ by the variable $s_t$, defined as

$$s_t = \mathbb{1}\left(m_{i_t} > M\right), \tag{10}$$

where $\mathbb{1}(\cdot)$ is the indicator function, equal to one if the argument is true and zero otherwise. The threshold $M$ is a free parameter that must be specified by the decision-making agent. Let $m_{i^*} = \max_i m_i$ be the maximum expected reward from any arm. The agent can never be satisfied if $M$ is greater than $m_{i^*}$, so we assume that $M \le m_{i^*}$ to make the problem feasible. If $M > m_{\sigma(2)}$, i.e., greater than the mean reward of the second-best arm, the arm $\sigma(1) = i^*$ is the only one that is satisfying in mean reward.

As in the multi-armed bandit problem with multiple plays, let $\sigma$ be a permutation of $\{1, \ldots, N\}$ such that $m_{\sigma(1)} \ge m_{\sigma(2)} \ge \cdots \ge m_{\sigma(N)}$. Let $k$ be the largest integer such that $m_{\sigma(k)} \ge M$. The arms $\{\sigma(1), \ldots, \sigma(k)\}$ are the $k$-best arms defined by the satisfaction threshold $M$. For each arm $i$, define the thresholded expected regret $\Delta_i^M = \max\{M - m_i, 0\}$. For each $k$-best arm, the thresholded regret is zero, and for each $k$-worst arm $i \in \{\sigma(k+1), \ldots, \sigma(N)\}$, the value $\Delta_i^M > 0$ quantifies the extent to which the arm is unsatisfying in mean rewards. Note that if $M = m_{i^*}$, $\Delta_i^M = \Delta_i$, which is the standard measure of expected regret. We refer to the $k$-best and $k$-worst arms as satisfying and non-satisfying arms, respectively.

The satisfaction variable $s_t$ defined in (10) can be written as a function of the sign of $\Delta_{i_t}^M$: $s_t = \mathbb{1}(\Delta_{i_t}^M = 0)$. The quantity $s_t$ is deterministic. However, since the agent does not know the value of $\Delta_i^M$ associated with any given arm, they must learn it by sampling rewards from the various arms and updating their beliefs accordingly. Adopting a Bayesian framework, we assume $s_t$ is a realization of a binary random variable $S_t$. Due to the stochastic nature of the rewards the agent will have less than perfect confidence in their beliefs about the value of $s_t$.

We distinguish satisficing objectives in mean reward according to the degree $\delta \in [0, 1]$ of confidence the agent seeks in their beliefs, which we call sufficiency in mean reward. We define an arm $i_t$ to be $\delta$-sufficing in mean reward if $\Pr[S_t = 1] \ge 1 - \delta$, where the probability is evaluated based on the agent's current beliefs. For non-zero values of $\delta$, the agent finds it sufficient to have finite confidence that they are satisfied, while for $\delta = 0$, the agent wants certainty that they are satisfied. The agent cannot achieve certainty in finite time, so these two cases result in qualitatively different behavior: $\delta = 0$ means the agent will never stop exploring, while $\delta > 0$ means the agent will settle on a set of acceptable options after finite time.
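The notions of thresholded regret and $\delta$-sufficiency can be sketched in a few lines of Python. This sketch assumes, for illustration only, that the agent's current belief about the mean reward of an arm is Gaussian with a given mean and standard deviation, so that $\Pr[S_t = 1]$ is a Gaussian tail probability. The function names and numerical values are hypothetical.

```python
from statistics import NormalDist


def thresholded_regret(mean, threshold):
    """Thresholded expected regret: max{M - m_i, 0}."""
    return max(threshold - mean, 0.0)


def prob_satisfying(belief_mean, belief_std, threshold):
    """Pr[m_i >= M] under an assumed Gaussian belief about m_i."""
    return 1.0 - NormalDist(belief_mean, belief_std).cdf(threshold)


def is_delta_sufficing(belief_mean, belief_std, threshold, delta):
    """Check the sufficiency condition Pr[S = 1] >= 1 - delta."""
    return prob_satisfying(belief_mean, belief_std, threshold) >= 1.0 - delta


M = 1.0  # hypothetical satisfaction threshold
# Belief centered two standard deviations above the threshold.
confidence = prob_satisfying(belief_mean=1.5, belief_std=0.25, threshold=M)
sufficing_loose = is_delta_sufficing(1.5, 0.25, M, delta=0.05)
sufficing_tight = is_delta_sufficing(1.5, 0.25, M, delta=0.01)
sufficing_certain = is_delta_sufficing(1.5, 0.25, M, delta=0.0)
```

With a belief of finite width the tail probability is strictly below one, so the $\delta = 0$ check fails however favorable the belief is. This matches the observation that an agent seeking certainty never stops exploring.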

TABLE I
THE FOUR DIFFERENT REGRET CONCEPTS, AND RESULTING PROBLEMS, ASSOCIATED WITH THE SATISFICING-IN-MEAN-REWARD MULTI-ARMED BANDIT PROBLEM

Threshold level          Seek certainty (δ = 0)               Suffice (δ > 0)
M > m_σ(2)               (1) Standard bandit                  (3) δ-sufficing
M ≤ m_σ(2)               (2) Satisfaction-in-mean-reward      (4) (M,δ)-satisficing

The satisficing-in-mean-reward objective is

$$\sum_{t=1}^{T} \mathbb{1}\left(s_t = 1 \text{ or } \Pr[S_t = 1] > 1 - \delta\right). \tag{11}$$

The objective (11) is maximized if, at each time $t$, a satisfying option is selected, or the probability that the option is satisfying is sufficiently high. The event that an option is satisfying is not known a priori and must be learned by exploration. This results in an explore-exploit tradeoff as in the standard multi-armed bandit problem. To quantify the optimal explore-exploit tradeoff in the spirit of the Lai-Robbins bound we introduce the following notion of the expected satisficing regret at time $t$, $R_t$, defined by

$$R_t = \Delta_{i_t}^M \, \mathbb{1}\left(\Pr[S_t = 1] < 1 - \delta\right). \tag{12}$$

If the agent is insufficiently certain of being satisfied by the choice of $i_t$, they incur expected regret of $\Delta_{i_t}^M$. Otherwise, they incur no regret. We define the satisficing-in-mean-reward multi-armed bandit problem in terms of minimizing cumulative expected satisficing regret.

Definition 1 (Satisficing-in-Mean-Reward Multi-Armed Bandit Problem): The satisficing-in-mean-reward multi-armed bandit problem is to minimize the cumulative sum of the expected satisficing regret (12):

$$J_R = E\left[\sum_{t=1}^{T} R_t\right]. \tag{13}$$

The satisficing-in-mean-reward bandit problem has two parameters: $M$ and $\delta$. These parameters characterize the agent's thresholds for satisfaction and sufficiency, respectively. For purposes of analysis we distinguish four cases as a function of the parameter values. For the satisfaction threshold $M \in \mathbb{R}$, the first case is setting $M > m_{\sigma(2)}$, while the second case is setting $M \le m_{\sigma(2)}$. For the sufficiency threshold $\delta \in [0, 1]$, the first case is the certainty value $\delta = 0$, while the second case is $\delta \in (0, 1]$.

Table I summarizes the four problems that result from the interaction of the two dimensions of satisfaction and sufficiency. Problem 1 sets the satisfaction threshold $M > m_{\sigma(2)}$ and the sufficiency threshold $\delta = 0$, which results in a standard bandit problem. We call Problem 2 with $M \le m_{\sigma(2)}$ and $\delta = 0$ satisfaction-in-mean-reward. We call Problem 3 with $M > m_{\sigma(2)}$ and $\delta \in (0, 1]$ $\delta$-sufficing. Finally, we call Problem 4 with $M \le m_{\sigma(2)}$ and $\delta \in (0, 1]$, $(M, \delta)$-satisficing.
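The case distinction of Table I and the per-step regret (12) can be written out as a short Python sketch. The helper names `classify_problem` and `satisficing_regret` and the example values are illustrative assumptions. The classifier uses the true mean rewards, which only an omniscient observer knows, so it describes a problem instance rather than something the agent could compute.

```python
def classify_problem(threshold, delta, means):
    """Return the Table I problem number (1-4) for threshold M and delta.

    means holds the true mean rewards m_i of at least two arms.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    ordered = sorted(means, reverse=True)  # m_sigma(1) >= m_sigma(2) >= ...
    if threshold > ordered[0]:
        raise ValueError("infeasible: M exceeds the best mean reward")
    above_second_best = threshold > ordered[1]
    if delta == 0.0:
        return 1 if above_second_best else 2
    return 3 if above_second_best else 4


def satisficing_regret(mean, threshold, prob_satisfied, delta):
    """Expected satisficing regret (12) for a single choice.

    prob_satisfied stands in for Pr[S_t = 1] under the agent's beliefs.
    """
    gap = max(threshold - mean, 0.0)  # thresholded regret of the chosen arm
    return gap if prob_satisfied < 1.0 - delta else 0.0


means = [1.0, 0.6, 0.2]  # hypothetical mean rewards
cases = [
    classify_problem(0.8, 0.0, means),
    classify_problem(0.5, 0.0, means),
    classify_problem(0.8, 0.1, means),
    classify_problem(0.5, 0.1, means),
]
regret_unsure = satisficing_regret(0.2, 0.5, prob_satisfied=0.3, delta=0.1)
regret_confident = satisficing_regret(0.2, 0.5, prob_satisfied=0.95, delta=0.1)
```

In the last call the chosen arm is in fact non-satisfying, yet no regret is charged because the agent's confidence exceeds $1 - \delta$. Regret in (12) is defined relative to the agent's beliefs as well as to the true means.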
Remark 1: We note that the distinction between Problems 1 and 2 and between Problems 3 and 4 is only due to the range of values $M$ can take. These problems can be thought of as a single problem in which the choice of $M$ dictates the cardinality of the set of satisfying arms. However, the two ranges of thresholds $M > m_{\sigma(2)}$ and $M \le m_{\sigma(2)}$ allow us to clearly contrast the satisficing problem with the standard problem. Assuming $M > m_{\sigma(2)}$ in Problems 1 and 3 is equivalent to assuming that the agent seeks the unknown highest mean reward, which is consistent with the standard problem. The policies we define for Problems 1 and 3 do not rely on a known threshold $M$. Assuming $M \le m_{\sigma(2)}$ is equivalent to assuming that the agent seeks to meet a known desired mean reward threshold. The policies we define for Problems 2 and 4 do rely on the threshold $M$. These same assumptions analogously distinguish Problems 5 and 7 from Problems 6 and 8 defined in Section III-B. However, unlike the policies for Problems 1 and 3, the policies defined for Problems 5 and 7 do rely on $M > m_{\sigma(2)}$ being known. We do not assume in any of the problems that the agent knows the permutation $\sigma$, so no policies depend on $\sigma$.

We develop performance bounds for each of these problems in terms of corollaries of the performance bounds presented in Section II. For the problems with $\delta = 0$, these bounds show that cumulative expected regret must grow at least at a logarithmic rate, while for the problems with $\delta > 0$, finite regret is possible.

Problem 1 (Standard Bandit): The satisficing-in-mean-reward multi-armed bandit problem with $M > m_{\sigma(2)}$ and $\delta = 0$ is a standard multi-armed bandit problem. Therefore, for this problem, the Lai-Robbins bound (3) holds, and the expected number of times a suboptimal arm $i$ is chosen obeys

$$E\left[n_i^T\right] \ge \left(\frac{1}{D(\nu_i \,\|\, \nu_{i^*})} + o(1)\right) \log T.$$

As a direct consequence, the cumulative expected satisficing regret (13) grows at least logarithmically with time horizon $T$:

$$J_R \ge \sum_{i=1}^{N} \left(\frac{\Delta_i^M}{D(\nu_i \,\|\, \nu_{i^*})} + o(1)\right) \log T.$$

Problem 2 (Satisfaction-in-Mean-Reward): The satisfaction-in-mean-reward problem, defined as the satisficing-in-mean-reward multi-armed bandit problem where $M \le m_{\sigma(2)}$ and $\delta = 0$, also has a logarithmic lower bound on the cumulative expected satisficing regret:

Corollary 2 (Satisfaction-in-Mean-Reward Regret Bound): The satisfaction-in-mean-reward problem is a satisficing-in-mean-reward multi-armed bandit problem where the objective (13) is defined with $M \le m_{\sigma(2)}$ and $\delta = 0$. Any policy solving the satisfaction-in-mean-reward problem obeys

$$E\left[n_i^T\right] \ge \left(\frac{1}{D(\nu_i \,\|\, \nu_{\sigma(k)})} + o(1)\right) \log T \tag{14}$$

for each non-satisfying arm $i$, where $\sigma$ is a permutation of $\{1, \ldots, N\}$ such that $m_{\sigma(1)} \ge m_{\sigma(2)} \ge \cdots \ge m_{\sigma(N)}$ and $k$ is the largest integer such that $m_{\sigma(k)} \ge M$.

Proof: The definition of satisfaction (10) implies that performance bounds for the satisfaction-in-mean-reward problem and the multi-armed bandit problem with multiple plays are equivalent. Given a problem instance, the threshold $M$ induces the number $k$ of satisfying arms, so performance can be analyzed as in the problem with multiple plays. The bound (5) applies to the problem with multiple plays and the equivalence implies the result.

Problem 3 ($\delta$-Sufficing): The $\delta$-sufficing problem, defined as the satisficing-in-mean-reward multi-armed bandit problem where $M > m_{\sigma(2)}$ and $\delta \in (0, 1]$, admits policies that achieve cumulative expected regret that is a bounded function of $T$:

Corollary 3 ($\delta$-Sufficing Regret Bound): The $\delta$-sufficing problem is a satisficing-in-mean-reward multi-armed bandit problem where the objective (13) is defined with $M > m_{\sigma(2)}$ and $\delta \in (0, 1]$. Any policy solving the $\delta$-sufficing problem obeys

$$n_i^T \ge O\left(\frac{1}{\epsilon^2} \log(1/\delta)\right) \tag{15}$$

for each suboptimal arm $i$, where $\epsilon = \Delta_i^M = M - m_i$.

Proof: The definition of satisfaction (10) in the $\delta$-sufficing problem implies that the agent incurs regret if the arm selected is not $(\epsilon = 0, \delta)$-optimal. The bound (6) thus provides a lower bound on the number of times the agent must incur regret.

Problem 4 ($(M, \delta)$-Satisficing): The $(M, \delta)$-satisficing problem, defined as the satisficing-in-mean-reward multi-armed bandit problem where $M \le m_{\sigma(2)}$ and $\delta \in (0, 1]$, admits policies that achieve cumulative expected regret that is a bounded function of $T$:

Corollary 4 ($(M, \delta)$-Satisficing Regret Bound): The $(M, \delta)$-satisficing problem is a satisficing-in-mean-reward multi-armed bandit problem where the objective (13) is defined with $M \le m_{\sigma(2)}$ and $\delta \in (0, 1]$. Any policy solving the $(M, \delta)$-satisficing multi-armed bandit problem obeys

$$\sum_{i=1}^{N} n_i^T = T \ge \frac{1}{18375} \frac{N}{\epsilon^2} \log \frac{k}{8\delta}, \tag{16}$$

where $\sigma$ is a permutation of $\{1, \ldots, N\}$ such that $m_{\sigma(1)} \ge m_{\sigma(2)} \ge \cdots \ge m_{\sigma(N)}$, $k$ is the largest integer such that $m_{\sigma(k)} \ge M$, and $\epsilon = m_{\sigma(k)} - M$. Since only arms $\{\sigma(k+1), \ldots, \sigma(N)\}$ result in regret, the left hand side of (16) is an upper bound on the expected satisficing regret (13).

Proof: The definition of satisfaction (10) in the $(M, \delta)$-satisficing problem implies that an algorithm that minimizes satisficing regret is equivalent to an $(\epsilon = m_{\sigma(k)} - M, k, \delta)$-optimal algorithm in the sense of [15]. Therefore, the bound (7) applies to the $(M, \delta)$-satisficing problem.

Recall that $T$ is the number of times all arms (including the optimal one) should be cumulatively sampled such that following $T$ an $(M, \epsilon)$-optimal decision can be made. The lower bounds (15) on $n_i^T$ and (16) on $T$ are independent of the horizon length, suggesting that for $(M, \epsilon)$-satisficing, a bounded regret can be achieved.

Corollaries 3 and 4 show that the worst-case regret is a bounded function of $T$ for the sufficing problems, where $\delta > 0$. Therefore we can conclude that the expected regret for such problems can also be a bounded function of $T$. This is an important distinction from the maximizing problems, where $\delta = 0$: in such problems, the Lai-Robbins bound (3) implies that the expected regret must grow logarithmically with $T$. As is standard in the bandit literature, we say an algorithm has efficient performance if its regret matches, up to constant factors, the relevant growth rates: $\log T$ for maximizing problems and $\log(k/\delta)/\epsilon^2$ for sufficing problems.

B. Robust Satisficing in Instantaneous Reward

The four objectives defined in Section III-A above define satisfaction (10) in terms of the mean reward $m_i$ from an arm $i$. This captures situations where the time scale for satisfaction spans numerous decision times. For example, consider foraging, where an animal must consume a minimum amount of food each day. If each decision time represents a small portion of the day, the total food consumed during the day represents the sum of numerous small rewards from each decision time. As long as the mean reward at each decision time is sufficiently high, the animal will meet its daily food requirement.

If, instead, the decision time scale is the same as the satisfaction time scale, it is more appropriate to define satisfaction at time $t$ in terms of the reward $r_t$ received at that time. This requires more robust algorithms, in the sense that they must ensure that each reward, rather than simply the mean reward, is satisfying with high probability.

In this context we define satisfaction in two stages. First, we define happiness as receiving a reward $r_t$ that is at least a threshold value $M \in \mathbb{R}$. We represent happiness at time $t$ as the Bernoulli random variable $h_t$, defined as

$$h_t = \mathbb{1}\left(r_t > M\right). \tag{17}$$

We define the success probability of the happiness random variable $h_t$ as

$$p_i = \Pr\left[h_t = 1 \mid i_t = i\right]. \tag{18}$$

The success probability $p_i$ is the expected rate of happiness due to picking arm $i$. This defines a Bernoulli multi-armed bandit problem where the mean reward (i.e., happiness rate) is $p_i$. We then define satisfaction in terms of a threshold $\Pi$ for this Bernoulli multi-armed bandit problem as we did in (10):

$$s_t = \mathbb{1}\left(p_{i_t} > \Pi\right). \tag{19}$$

Given the happiness threshold $M$, this definition is identical to the definition (10) of satisfaction where $m_i = p_i$, $p_{i^*} = \max_i p_i$, and $M = \Pi$. Therefore the four satisficing multi-armed bandit problems defined in Table I can be used to define four additional problems in this context, which we call robust satisficing.
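The two-stage definition (17)-(19) can be illustrated with a plug-in estimate of the happiness rate from observed rewards. The following Python sketch is only an illustration: the function names and reward samples are hypothetical, and replacing $p_i$ by an empirical fraction ignores estimation uncertainty, so it addresses satisfaction but not sufficiency.

```python
def happiness_rate(rewards, happiness_threshold):
    """Fraction of rewards above M: an empirical stand-in for p_i in (18)."""
    if not rewards:
        raise ValueError("need at least one observed reward")
    happy = sum(1 for r in rewards if r > happiness_threshold)  # h_t in (17)
    return happy / len(rewards)


def robustly_satisfying(rewards, happiness_threshold, rate_threshold):
    """Plug-in version of (19): is the estimated happiness rate above Pi?"""
    return happiness_rate(rewards, happiness_threshold) > rate_threshold


M = 1.0   # hypothetical happiness threshold on individual rewards
PI = 0.6  # hypothetical threshold on the happiness rate

# A low-dispersion arm and a high-dispersion arm (made-up samples).
steady_arm = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.4, 1.0]
volatile_arm = [3.0, -1.0, 2.5, -0.5]

steady_rate = happiness_rate(steady_arm, M)
volatile_rate = happiness_rate(volatile_arm, M)
steady_ok = robustly_satisfying(steady_arm, M, PI)
volatile_ok = robustly_satisfying(volatile_arm, M, PI)
```

The volatile arm occasionally pays very well but clears the happiness threshold only half the time. The robust objective is designed to capture this dependence on how often individual rewards clear $M$.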
Definition 2 (Robust Satisficing Multi-Armed Bandit Problem): The robust satisficing multi-armed bandit problem is to minimize the cumulative sum of the expected satisficing regret (12):

$$J_R = E\left[\sum_{t=1}^{T} R_t\right],$$

where the regret $R_t$ is defined using the notion of satisfaction defined by (19).

A robust satisficing multi-armed bandit problem has three parameters: $M$, $\Pi$, and $\delta$. We assume that $M$ and $\Pi$ are chosen such that there is at least one satisfying arm; otherwise, the expected regret must grow indefinitely. Table II summarizes the four robust satisficing multi-armed bandit problems that result from the interaction of the two dimensions of satisfaction and sufficiency, which we list below. We assume that $\varsigma$ is a permutation of $\{1, \ldots, N\}$ such that $p_{\varsigma(1)} \ge p_{\varsigma(2)} \ge \cdots \ge p_{\varsigma(N)}$.

TABLE II
THE FOUR DIFFERENT REGRET CONCEPTS, AND RESULTING PROBLEMS, ASSOCIATED WITH THE ROBUST SATISFICING MULTI-ARMED BANDIT PROBLEM

Threshold level          Seek certainty (δ = 0)       Suffice (δ > 0)
Π > p_ς(2)               (5) Robust bandit            (7) δ-robust sufficing
Π ≤ p_ς(2)               (6) Robust satisfaction      (8) (Π,δ)-robust satisficing

The quantity $p_i$ represents the probability of happiness (i.e., receiving a reward of at least $M$) due to choosing arm $i$.

Problem 5 (Robust Bandit): The robust bandit problem is defined as the robust satisficing multi-armed bandit problem where $\Pi > p_{\varsigma(2)}$ and $\delta = 0$.

Problem 6 (Robust Satisfaction): The robust satisfaction problem is defined as the robust satisficing multi-armed bandit problem where $\Pi \le p_{\varsigma(2)}$ and $\delta = 0$.

Problem 7 ($\delta$-Robust Sufficing): The $\delta$-robust sufficing problem is defined as the robust satisficing multi-armed bandit problem where $\Pi > p_{\varsigma(2)}$ and $\delta \in (0, 1]$.

Problem 8 ($(\Pi, \delta)$-Robust Satisficing): The $(\Pi, \delta)$-robust satisficing problem is defined as the robust satisficing multi-armed bandit problem where $\Pi \le p_{\varsigma(2)}$ and $\delta \in (0, 1]$.

For a large class of reward distributions, there is an equivalence between Problems 5-8 defined in terms of $r_t$ and Problems 1-4 defined in terms of $m_i$. By Lemma 5 below, when the rewards $r_t$ follow a Gaussian distribution with unknown mean $m_i$ and known variance $\sigma_{s,i}^2$, each problem in Table II is equivalent to the analogous problem in Table I.

IV. SATISFICING WITH GAUSSIAN REWARDS

In this section we study the Gaussian satisficing multi-armed bandit problem. This is the satisficing multi-armed bandit problem where the reward $r_t$ due to selecting arm $i_t$ is $r_t \sim \mathcal{N}(m_{i_t}, \sigma_{s,i_t}^2)$ and $\sigma_{s,i}^2$ is the known variance of arm $i$. In this case, we show a formal equivalence between the satisficing-in-mean-reward multi-armed bandit problems and the robust satisficing multi-armed bandit problems. The choice of Gaussian rewards facilitates modeling correlation dependencies among arms, which can be useful in applications.

A. Equivalence Lemma for Gaussian Rewards

For the Gaussian robust satisficing multi-armed bandit problem, define the quantity

$$x_i = \frac{m_i - M}{\sigma_{s,i}}, \tag{20}$$

which we call the standardized mean reward, for each arm $i$. The following lemma states that each Gaussian robust satisficing multi-armed bandit problem where satisfaction is defined by (19) is equivalent to a Gaussian satisficing-in-mean-reward multi-armed bandit problem where satisfaction is defined by (10) with standardized reward distributions.

Lemma 5 (Equivalence for Gaussian Rewards): Each Gaussian robust satisficing multi-armed bandit problem is equivalent to a Gaussian satisficing-in-mean-reward multi-armed bandit problem with rewards $r_t \sim \mathcal{N}(x_{i_t}, 1)$ with $x_i$ given by (20). That is, the ordering of the arms in terms of $x_i$ is identical to the ordering in terms of $p_i$, and, in particular, the arm with maximal $x_i$ is the arm with maximal $p_i$.

Proof: With Gaussian rewards, the probability (18) of happiness due to choosing arm $i$ is

$$p_i = \Pr\left[m_i + \sigma_{s,i} z \ge M\right] = \Phi\left(\frac{m_i - M}{\sigma_{s,i}}\right) = \Phi(x_i), \tag{21}$$

where $z \sim \mathcal{N}(0, 1)$ is a standard normal random variable and $\Phi(z)$ is its cumulative distribution function. Let $i^* = \arg\max_i p_i$. The key insight is that $\Phi$ is a monotonically increasing function, which implies that the ordering of arms in terms of $p_i$ is identical to the ordering in terms of $x_i$. In particular, arm $i^*$ is the arm with maximal $x_i$. Therefore, satisfaction in terms of $r_t$ is equivalent to satisfaction in terms of the mean reward $x_i$.

This is again a Gaussian bandit problem: consider the standardized reward

$$\tilde{r}_t = \frac{r_t - M}{\sigma_{s,i_t}}, \tag{22}$$

which is a Gaussian random variable $\tilde{r}_t \sim \mathcal{N}(x_{i_t}, 1)$. The quantity $x_i$ plays the role of the mean reward $m_i$ and the transformed rewards have uniform variance $\sigma_s^2 = 1$. Minimizing the robust satisficing regret in terms of $r_t$ is equivalent to minimizing the satisficing regret in terms of $x_i$.

Lemma 5 has two implications for the relationship between Problems 5-8 and Problems 1-4 when rewards are Gaussian distributed. First, each Problem 5-8 inherits a regret bound from the corresponding Problem 1-4. Second, each Problem 5-8 can be solved by applying the algorithm developed for Problem 1-4 by first applying the standardization transformation (22) to the observed rewards.

Remark 6 (Location-Scale Families): Lemma 5 is easily generalized to reward distributions belonging to location-scale families. A location-scale family is a set of probability distributions closed under affine transformations, i.e., if the random variable $X$ is in the family, so is the variable $Y = a + bX$, where $a, b \in \mathbb{R}$. Any random variable $X$ in such a family with mean $\mu$ and standard deviation $\sigma$ can be written as $X = \mu + \sigma Z$, where $Z$ is a zero-mean, unit-variance member of the family. Examples include the uniform distribution and Student's $t$-distribution.

B. Application to the Gaussian Robust Satisficing Problems

In this section we show how to use the equivalence result of Lemma 5 for the full set of robust satisficing problems in the case of Gaussian rewards. Recall from Lemma 5 that the probability of happiness (18) due to picking an arm $i$ is $p_i$. In the proof of the lemma, we show that maximizing the probability of happiness is equivalent to maximizing the mean reward in a Gaussian multi-armed bandit problem with mean rewards $x_i = \Phi^{-1}(p_i)$, where $x_i$ is the standardized mean reward $(m_i - M)/\sigma_{s,i}$.
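The equivalence in Lemma 5 is easy to check numerically. The Python sketch below computes the standardized mean rewards (20) and the happiness probabilities (21) for a hypothetical set of Gaussian arms, and confirms that the two quantities order the arms identically. It uses `statistics.NormalDist` from the Python standard library for $\Phi$ and $\Phi^{-1}$; the arm parameters and helper names are made up for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # zero mean, unit variance


def standardized_mean(mean, std, happiness_threshold):
    """Standardized mean reward x_i = (m_i - M) / sigma_{s,i}, as in (20)."""
    return (mean - happiness_threshold) / std


def happiness_probability(mean, std, happiness_threshold):
    """Happiness probability p_i = Phi(x_i) for Gaussian rewards, as in (21)."""
    return STANDARD_NORMAL.cdf(standardized_mean(mean, std, happiness_threshold))


# Hypothetical arms given as (m_i, sigma_{s,i}); arm 0 has the highest mean
# but also the largest standard deviation.
arms = [(1.0, 2.0), (0.6, 0.2), (0.2, 0.5)]
M = 0.5  # hypothetical happiness threshold

x = [standardized_mean(m, s, M) for m, s in arms]
p = [happiness_probability(m, s, M) for m, s in arms]

order_by_x = sorted(range(len(arms)), key=lambda i: x[i], reverse=True)
order_by_p = sorted(range(len(arms)), key=lambda i: p[i], reverse=True)

# Inverting Phi recovers the standardized means: x_i = Phi^{-1}(p_i).
recovered_x = [STANDARD_NORMAL.inv_cdf(p_i) for p_i in p]
```

In this example the arm with the highest mean reward is not the arm with the highest happiness probability, because its large variance leaves more probability mass below $M$. The robust problem therefore ranks arms by $x_i$ rather than by $m_i$.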

Given an algorithm developed for one of the Problems 1–4 defined in Table I, it can be applied to the corresponding Problem 5–8 defined in Table II as follows. Standardize the observed rewards $r_t$ and run the algorithm using the standardized rewards $\tilde r_t = (r_t - M)/\sigma_{s,i_t}$ as input. For example, Problem 5, the robust multi-armed bandit problem, can be solved by an algorithm designed to solve Problem 1, the standard bandit problem, where rewards are transformed according to (22) before being input to the algorithm. The same procedure allows one to apply algorithms developed for Problem 3, δ-sufficing, to Problem 7, δ-robust sufficing.

For Problem 6, robust satisfaction, and Problem 8, (Π,δ)-robust satisficing, we need a threshold $X$ that is analogous to the threshold $M$ defined for Problem 2, satisfaction-in-mean-reward, and Problem 4, (M,δ)-satisficing. We use the relationship between $x_i$ and $p_i$ to derive the threshold. In particular, for a robust satisficing problem with probability of happiness threshold $\Pi$, define the threshold $X$ by

$$X = \Phi^{-1}(\Pi). \tag{23}$$

When the rewards are Gaussian distributed, we can apply algorithms developed for Problems 2 and 4 to the corresponding robust satisficing Problems 6 and 8 by standardizing rewards and using the threshold $X$ defined in (23) in place of the threshold $M$. Lemma 5 implies that the efficient performance guarantees for algorithms designed for Problems 1–4 also hold when they are used to solve the robust satisficing Problems 5–8.

V. THE UCL ALGORITHM FOR GAUSSIAN MULTI-ARMED BANDIT PROBLEMS

In this section we review the UCL algorithm, a Bayesian algorithm we developed and analyzed in [27] to solve the standard Gaussian multi-armed bandit problem. The UCL algorithm was developed by applying the Bayesian upper confidence bound approach of [16] to the case of Gaussian rewards; the choice of Gaussian rewards facilitated the modeling of human decision-making behavior. The UCL algorithm maintains a belief about the mean rewards $m$ by starting with a prior and updating using Bayesian inference as new rewards are received. At each time the algorithm chooses arm $i_t$ using a heuristic that is a simple function of the current belief state. For uninformative priors, the UCL algorithm achieves logarithmic regret, i.e., optimal performance.
Uninformative priors correspond to having no information about the mean rewards. A major advantage of the UCL algorithm is its ability to incorporate information about the mean rewards through the use of a so-called informative prior. In [27], we showed that an appropriately chosen prior can significantly increase the performance of the UCL algorithm. Several different UCL algorithms were developed in [27], including a stochastic decision rule to model human behavior; here we cover only the deterministic UCL algorithm, which, for brevity, we refer to as the UCL algorithm.

A. Prior

The prior distribution captures the agent's knowledge about the vector of mean rewards $m$ before beginning the task. We assume that the prior distribution is multivariate Gaussian with mean $\mu_0 \in \mathbb{R}^N$ and covariance $\Sigma_0 \in \mathbb{R}^{N \times N}$:

$$m \sim \mathcal{N}(\mu_0, \Sigma_0). \tag{24}$$

The $i$th element of $\mu_0$, denoted by $\mu_i^0$, represents the agent's mean belief of the reward $m_i$ associated with arm $i$. The $(i,i)$ element of $\Sigma_0$, denoted by $(\sigma_i^0)^2$, represents the agent's uncertainty associated with that belief. Off-diagonal elements of $\Sigma_0$, e.g., $\sigma_{ij}^0$, represent the agent's perceived relationship between $m_i$ and $m_j$: if $\sigma_{ij}^0$ is positive, high values of $m_i$ are correlated with high values of $m_j$, while if it is negative, high values of $m_i$ correlate with low values of $m_j$.

Any positive-definite matrix can be used as $\Sigma_0$, but it is often useful to consider a structured parametrization, such as $\Sigma_0 = \sigma_0^2 \Sigma$, where $\sigma_0^2 > 0$ encodes the agent's uncertainty. One important special case is an uncorrelated prior, where $\Sigma$ is diagonal, which corresponds to the agent perceiving the rewards associated with different arms to be independent. Another important special case is an uninformative prior, which corresponds to complete uncertainty, i.e., the limit $\sigma_0^2 \to +\infty$; an uninformative prior can be thought of as a special case of an uncorrelated prior.

B. Inference Update

At each time $t$ the agent picks an arm $i_t$ and receives a reward $r_t$ that is Gaussian distributed: $r_t \sim \mathcal{N}(m_{i_t}, \sigma_{s,i_t}^2)$. Bayesian inference provides an optimal solution to the problem of updating the belief state $(\mu_t, \Sigma_t)$ (i.e., the sufficient statistics for estimating $m$) to incorporate this new information. Let $\Lambda_t = \Sigma_t^{-1}$, and let $\phi_t \in \mathbb{R}^N$ be the vector with element $i_t$ equal to 1 and all other elements equal to zero. Then given the Gaussian prior (24), the Bayesian update equations are linear [17]:

$$q_t = \frac{r_t \phi_t}{\sigma_{s,i_t}^2} + \Lambda_{t-1}\mu_{t-1}, \qquad \Lambda_t = \frac{\phi_t \phi_t^T}{\sigma_{s,i_t}^2} + \Lambda_{t-1}, \qquad \mu_t = \Sigma_t q_t. \tag{25}$$

C. Decision Heuristic

At each time $t$ the UCL algorithm computes a value $Q_i^t$ for each arm $i$. The algorithm then picks the arm that maximizes $Q_i^t$. That is, it picks

$$i_t = \arg\max_i Q_i^t. \tag{26}$$

The heuristic value $Q_i^t$ is

$$Q_i^t = \mu_i^t + \sigma_i^t \Phi^{-1}(1 - \alpha_t), \tag{27}$$

where $\mu_i^t = (\mu_t)_i$, $(\sigma_i^t)^2 = (\Sigma_t)_{ii}$, $\alpha_t = 1/(Kt)$, $K > 0$ is a tunable parameter, and $\Phi^{-1}$ is the quantile function of the standard normal random variable. The heuristic $Q_i^t$ is a Bayesian upper limit for the value of $m_i$ based on the information available at time $t$. It represents an optimistic assessment of the value of $m_i$. The decision made can be thought of as the most optimistic one consistent with the current information.

D. Performance

In [27], we studied the case of homogeneous sampling noise (i.e., $\sigma_{s,i}^2 = \sigma_s^2$ for each $i$) and showed that the UCL algorithm achieves logarithmic cumulative expected regret uniformly in time. In particular, we proved that the following theorem holds. We define $\{R_t^{\mathrm{UCL}}\}_{t \in \{1,\ldots,T\}}$ as the sequence of expected regret for the deterministic UCL algorithm.

Theorem 7 (Regret of the Deterministic UCL Algorithm [27]): The following statements hold for the Gaussian multi-armed bandit problem and the deterministic UCL algorithm with uncorrelated uninformative prior and $K = 1$:

1) the expected number of times a suboptimal arm $i$ is chosen until time $T$ satisfies

$$\mathbb{E}\left[n_i^T\right] \le \left(\frac{8\sigma_s^2}{\Delta_i^2} + 2\right)\log T + 3;$$

2) the cumulative expected regret until time $T$ satisfies

$$J_R = \sum_{t=1}^{T} R_t^{\mathrm{UCL}} \le \sum_{i=1}^{N} \Delta_i \left(\left(\frac{8\sigma_s^2}{\Delta_i^2} + 2\right)\log T + 3\right).$$

The implication of this theorem can be seen by comparing 1) with the Lai-Robbins bound (9): the UCL algorithm achieves logarithmic regret uniformly in time with a constant that differs from the optimal asymptotic one by a constant factor, and thus is considered to have optimal performance.

VI. ALGORITHMS FOR SATISFICING GAUSSIAN MULTI-ARMED BANDIT PROBLEMS

In this section we develop algorithms for solving Gaussian multi-armed bandit problems with the satisficing objectives proposed in Section III. All the algorithms consist of modified versions of the UCL algorithm. We analyze the algorithms and show that they achieve efficient performance. The UCL algorithm solves the standard Gaussian multi-armed bandit problem, i.e., the satisficing Gaussian multi-armed bandit problem with $M > m_{i^*}$ and $\delta = 0$ (Problem 1). We develop three new UCL variants for Problems 2–4 in Table I. These algorithms can then be applied to Problems 5–8 in Table II. At the end of the section, we consider extensions to reward distributions other than the Gaussian with known variance.

A. Problem 2: Satisfaction-In-Mean-Reward UCL Algorithm

A simple modification of the UCL algorithm achieves logarithmic regret for the Gaussian satisfaction-in-mean-reward problem, which is the satisficing-in-mean-reward multi-armed bandit problem with $M \le m_{i^*}$ and $\delta = 0$ (Problem 2). We define this algorithm, which we refer to as the satisfaction-in-mean-reward UCL algorithm, as follows.
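As in (27), define the heuristic value $Q_i^t$ as $Q_i^t = \mu_i^t + \sigma_i^t \Phi^{-1}(1 - \alpha_t)$, where $\alpha_t = 1/(Kt)$ and $K > 0$ is again a tunable parameter. Let $M \in \mathbb{R}$ be the satisfaction threshold, so the agent is satisfied if it picks an arm $i$ with $m_i \ge M$. Let the eligible set at time $t$ be $\{i \mid Q_i^t \ge M\}$. In contrast to the UCL selection scheme (26) that picks the arm with maximal $Q_i^t$, satisfaction-in-mean-reward UCL picks any arm in the eligible set. That is, if the eligible set is non-empty, then

$$i_t \in \{i \mid Q_i^t \ge M\}, \tag{28}$$

or if the eligible set is empty, then satisfaction-in-mean-reward UCL picks the arm $i$ with maximal $Q_i^t$. Thus, if the most recently selected arm is in the eligible set, it may be selected again even if it does not have the maximal $Q_i^t$.

The satisfaction-in-mean-reward UCL algorithm achieves logarithmic cumulative expected satisfaction-in-mean-reward regret, as guaranteed by the following theorem.

Theorem 8 (Regret of the Satisfaction-In-Mean-Reward UCL Algorithm): Let a Gaussian multi-armed bandit problem with the satisfaction-in-mean-reward objective have at least one arm $i$ that obeys $m_i > M$, and, without loss of generality, assume $\sigma_{s,i}^2 = 1$ for each arm $i$. Then, the following statements hold for the satisfaction-in-mean-reward UCL algorithm with uncorrelated uninformative prior and $K = 1$:

1) the expected number of times a non-satisfying arm $i$ is chosen until time $T$ satisfies

$$\mathbb{E}\left[n_i^T\right] \le \left(\frac{8}{(\Delta_i^M)^2} + 3\right)\log T + 4;$$

2) the cumulative expected satisfaction-in-mean-reward regret until time $T$ satisfies

$$J_{SM} \le \sum_{i=1}^{N} \Delta_i^M \left(\left(\frac{8}{(\Delta_i^M)^2} + 3\right)\log T + 4\right).$$

To prove Theorem 8 we use the following bound from [1].

Lemma 9 (Bounds on the Inverse Gaussian cdf): For the standard normal (i.e., Gaussian) random variable $z$ and a constant $w \in \mathbb{R}_{\ge 0}$,

$$\Pr[z \ge w] \le \frac{2e^{-w^2/2}}{\sqrt{2\pi}\left(w + \sqrt{w^2 + 8/\pi}\right)} \le \frac{1}{2}e^{-w^2/2}. \tag{29}$$

It follows from (29), applied with $w = \Phi^{-1}(1 - \alpha) \ge 0$, that for any $\alpha \in (0, 0.5]$,

$$\Phi^{-1}(1 - \alpha) \le \sqrt{-2\log\alpha}. \tag{30}$$

Proof of Theorem 8: The proof proceeds as in the proof of Theorem 7 in [27], which itself follows the proofs in [4]. Let $i$ be a non-satisfying arm, i.e., $m_i < M$, and recall that $m_{i^*}$ designates the maximum mean reward. Then

$$\begin{aligned}
\mathbb{E}\left[n_i^T\right] &= \sum_{t=1}^{T} \Pr[i_t = i] \\
&\le \sum_{t=1}^{T} \left( \Pr\left[Q_i^t \ge M\right] + \Pr\left[Q_i^t \ge Q_{i^*}^t \ \&\ \max_j Q_j^t < M\right] \right) \\
&\le \eta + \sum_{t=1}^{T} \left( \Pr\left[Q_i^t \ge M,\ n_i^t \ge \eta\right] + \Pr\left[Q_i^t \ge Q_{i^*}^t,\ n_i^t \ge \eta\right] \right).
\end{aligned}$$

The first term in the summand corresponds to the probability that the non-satisfying arm $i$ is in the eligible set, while the second term corresponds to the probability that the eligible set is empty and that a non-satisfying arm appears better than an optimal arm. The statement $Q_i^t \ge Q_{i^*}^t$ implies that at least one of the following inequalities holds:

$$\mu_i^t \ge m_i + C_i^t \tag{31}$$
$$\mu_{i^*}^t \le m_{i^*} - C_{i^*}^t \tag{32}$$
$$m_{i^*} < m_i + 2C_i^t, \tag{33}$$

where $C_i^t = \sigma_i^t \Phi^{-1}(1 - \alpha_t)$ and $\alpha_t = 1/(Kt)$. Otherwise, if none of (31)–(33) holds, then

$$Q_{i^*}^t = \mu_{i^*}^t + C_{i^*}^t > m_{i^*} \ge m_i + 2C_i^t > \mu_i^t + C_i^t = Q_i^t.$$

We first analyze the probability that (31) holds. For an uncorrelated uninformative prior, $\mu_i^t$ is equal to $\bar m_i^t$, the empirical mean reward observed at arm $i$ until time $t$, and $\sigma_i^t = 1/\sqrt{n_i^t}$. Therefore, for an uncorrelated uninformative prior,

$$Q_i^t = \bar m_i^t + \frac{1}{\sqrt{n_i^t}}\Phi^{-1}(1 - \alpha_t).$$

Conditional on $n_i^t$, the empirical mean reward $\bar m_i^t$ is itself a Gaussian random variable with mean $m_i$ and standard deviation $1/\sqrt{n_i^t}$, so (31) holds if

$$\begin{aligned}
\bar m_i^t &\ge m_i + \frac{1}{\sqrt{n_i^t}}\Phi^{-1}(1 - \alpha_t) \\
\iff m_i + \frac{z}{\sqrt{n_i^t}} &\ge m_i + \frac{1}{\sqrt{n_i^t}}\Phi^{-1}(1 - \alpha_t) \\
\iff z &\ge \Phi^{-1}(1 - \alpha_t),
\end{aligned}$$

where $z \sim \mathcal{N}(0, 1)$ is a standard normal random variable. Thus, for an uninformative prior,

$$\Pr[\text{(31) holds}] = \alpha_t = \frac{1}{Kt}.$$

Similarly, (32) holds if

$$\begin{aligned}
\bar m_{i^*}^t &\le m_{i^*} - C_{i^*}^t \\
\iff m_{i^*} + \frac{z}{\sqrt{n_{i^*}^t}} &\le m_{i^*} - \frac{1}{\sqrt{n_{i^*}^t}}\Phi^{-1}(1 - \alpha_t) \\
\iff z &\le -\Phi^{-1}(1 - \alpha_t),
\end{aligned}$$

where $z \sim \mathcal{N}(0, 1)$ is a standard normal random variable. Thus, for an uninformative prior,

$$\Pr[\text{(32) holds}] = \alpha_t = \frac{1}{Kt}.$$

Inequality (33) holds if

$$\begin{aligned}
m_{i^*} &< m_i + \frac{2}{\sqrt{n_i^t}}\Phi^{-1}(1 - \alpha_t) \\
\implies \Delta_i &< \frac{2}{\sqrt{n_i^t}}\Phi^{-1}(1 - \alpha_t) \\
\implies \frac{\Delta_i^2}{4} &< -\frac{2}{n_i^t}\log\alpha_t \qquad (34) \\
\implies \frac{\Delta_i^2}{4} &< \frac{2}{n_i^t}\log t \le \frac{2}{n_i^t}\log T,
\end{aligned}$$

where $\Delta_i = m_{i^*} - m_i$ and inequality (34) follows from bound (30). Thus, for an uninformative prior, (33) never holds if

$$n_i^t \ge \frac{8}{\Delta_i^2}\log T. \tag{35}$$

Thus, for $n_i^t$ sufficiently large, $\Pr[Q_i^t \ge Q_{i^*}^t] \le 2/(Kt)$.

We now bound the probability $\Pr[Q_i^t \ge M]$ that a non-satisfying arm $i$ is in the eligible set. Note that $Q_i^t \ge M$ implies that at least one of the following inequalities holds:

$$\mu_i^t \ge m_i + C_i^t \tag{36}$$
$$M < m_i + 2C_i^t. \tag{37}$$

Otherwise, if neither (36) nor (37) holds, $M \ge m_i + 2C_i^t > \mu_i^t + C_i^t = Q_i^t$ and arm $i$ is not in the eligible set. Inequality (36) is identical to (31), and (37) is (33) with $M$ in place of $m_{i^*}$. For an uninformative prior,

$$\Pr[\text{(36) holds}] = \alpha_t = \frac{1}{Kt}.$$

And (37) holds if

$$\begin{aligned}
M &< m_i + \frac{2}{\sqrt{n_i^t}}\Phi^{-1}(1 - \alpha_t) \\
\implies \Delta_i^M &< \frac{2}{\sqrt{n_i^t}}\Phi^{-1}(1 - \alpha_t) \\
\implies \frac{(\Delta_i^M)^2}{4} &< -\frac{2}{n_i^t}\log\alpha_t \\
\implies \frac{(\Delta_i^M)^2}{4} &< \frac{2}{n_i^t}\log t \le \frac{2}{n_i^t}\log T.
\end{aligned}$$

Thus, for an uninformative prior, (37) never holds if

$$n_i^t \ge \frac{8}{(\Delta_i^M)^2}\log T. \tag{38}$$

Since $m_{i^*} \ge M$, for each non-satisfying arm $i$, $\Delta_i \ge \Delta_i^M$. Thus, $1/(\Delta_i^M)^2 \ge 1/\Delta_i^2$ and (38) implies (35).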
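So setting

$$\eta = \left\lceil \frac{8}{(\Delta_i^M)^2}\log T \right\rceil \tag{39}$$

yields the bound

$$\begin{aligned}
\mathbb{E}\left[n_i^T\right] &\le \eta + \sum_{t=1}^{T} \left( \Pr\left[Q_i^t \ge M,\ n_i^t \ge \eta\right] + \Pr\left[Q_i^t \ge Q_{i^*}^t,\ n_i^t \ge \eta\right] \right) \\
&< \frac{8}{(\Delta_i^M)^2}\log T + 1 + 3\sum_{t=1}^{T}\frac{1}{t}.
\end{aligned}$$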

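The sum can be bounded by the integral

$$\sum_{t=1}^{T}\frac{1}{t} \le 1 + \int_{1}^{T}\frac{1}{t}\,dt = 1 + \log T,$$

yielding the bound in the first statement of the theorem:

$$\mathbb{E}\left[n_i^T\right] \le \left(\frac{8}{(\Delta_i^M)^2} + 3\right)\log T + 4.$$

The second statement of the theorem follows from the definition (12) of expected satisficing regret.

B. Problem 3: δ-Sufficing UCL Algorithm

An alternative modification of the UCL algorithm achieves finite satisficing regret in the Gaussian δ-sufficing problem, which is the satisficing-in-mean-reward multi-armed bandit with $M > m_{i^*}$ and $\delta \in (0, 1]$ (Problem 3). For the agent, this can be thought of as wanting to have finite confidence that it has found the unknown optimal arm $i^*$. For the δ-sufficing problem, define the heuristic function

$$Q_i^t = \mu_i^t + \sigma_i^t \Phi^{-1}\left(1 - \frac{\delta}{2}\right).$$

We define the δ-sufficing UCL algorithm as the algorithm that selects arm $i_t = \arg\max_i Q_i^t$ at each decision time $t$. The δ-sufficing UCL algorithm achieves finite cumulative satisficing regret, as guaranteed by the following theorem.

Theorem 10: Consider the δ-sufficing UCL algorithm with an uninformative prior. The number of times the picked arm $i$ is non-satisfying with probability greater than $\delta$ is upper bounded as

$$n_i^T < \frac{4\sigma_s^2}{\Delta_i^2}\left(\Phi^{-1}(1 - \delta/2)\right)^2 + 1.$$

Proof: We bound $n_i^T$ by noting that a non-satisfying arm $i$ is picked only if $Q_i^t \ge Q_{i^*}^t$, which can be decomposed as in the proof of Theorem 8 into the three conditions

$$\mu_i^t \ge m_i + C_i^t \tag{40}$$
$$\mu_{i^*}^t \le m_{i^*} - C_{i^*}^t \tag{41}$$
$$m_{i^*} < m_i + 2C_i^t. \tag{42}$$

Equation (42) is equivalent to

$$\Delta_i = m_{i^*} - m_i < 2C_i^t = \frac{2\sigma_s}{\sqrt{n_i^t}}\Phi^{-1}(1 - \delta/2).$$

Squaring and rearranging, we see that this never holds if

$$n_i^t > \frac{4\sigma_s^2}{\Delta_i^2}\left(\Phi^{-1}(1 - \delta/2)\right)^2 = \eta.$$

The same argument as in the proof of Theorem 8 shows that for $n_i^t \ge 1$, (40) and (41) each hold with probability at most $\delta/2$. Therefore, for $n_i^t > \eta + 1$, a non-satisfying arm is selected with probability at most $\delta$.

Theorem 10 guarantees that the δ-sufficing UCL algorithm achieves finite regret. Furthermore, the algorithm is efficient in that the regret matches the dependence on $\epsilon$ and $\delta$ in the bound (15). To see this, note that a non-satisfying arm $i$ with $\Delta_i$ is an $\epsilon = \Delta_i$-suboptimal arm, so Corollary 3 implies that $n_i^T$ is lower bounded by $O(\log(1/\delta)/\epsilon^2)$. The statement of Theorem 10 combined with the bound (30) on the inverse Gaussian cdf implies that $n_i^T$ is upper bounded by $8\sigma_s^2\log(2/\delta)/\Delta_i^2 + 1 = 8\sigma_s^2\log(2/\delta)/\epsilon^2 + 1$, which matches the lower bound (15) up to constant factors.

C. Problem 4: (M,δ)-Satisficing UCL Algorithm

A third modification of the UCL algorithm achieves finite satisficing regret in the Gaussian (M,δ)-satisficing problem, which is the satisficing-in-mean-reward multi-armed bandit with $M \le m_{i^*}$ and $\delta \in (0, 1]$ (Problem 4). For the agent, this can be thought of as wanting to have finite confidence that it has found an arm whose mean reward is above a known threshold. For the (M,δ)-satisficing problem, define the heuristic function

$$Q_i^t = \mu_i^t + \sigma_i^t \Phi^{-1}\left(1 - \frac{\delta}{3}\right).$$

Let the eligible set at time $t$ be $\{i \mid Q_i^t \ge M\}$. We define the (M,δ)-satisficing UCL algorithm as the algorithm that selects arm $i_t \in \{i \mid Q_i^t \ge M\}$, if the eligible set at time $t$ is non-empty. Otherwise, if the eligible set is empty, the algorithm picks the arm $i$ with maximal $Q_i^t$. The (M,δ)-satisficing UCL algorithm achieves efficient performance as guaranteed by the following theorem.

Theorem 11: Consider the (M,δ)-satisficing UCL algorithm with an uninformative prior. The number of times the picked arm $i$ is non-satisfying with probability greater than $\delta$ is upper bounded as

$$n_i^T < \frac{4\sigma_s^2}{(\Delta_i^M)^2}\left(\Phi^{-1}(1 - \delta/3)\right)^2 + 1.$$

Proof: The proof is very similar to the proofs of Theorems 8 and 10. As in Theorem 8, we bound $n_i^T$ by

$$n_i^T = \sum_{t=1}^{T} \mathbf{1}(i_t = i) \le \eta + \sum_{t=1}^{T} \left( \mathbf{1}\left(Q_i^t \ge M,\ n_i^t \ge \eta\right) + \mathbf{1}\left(Q_i^t \ge Q_{i^*}^t,\ n_i^t \ge \eta\right) \right).$$

The condition $Q_i^t \ge M$, which means arm $i$ is in the eligible set, can be decomposed into the two conditions

$$\mu_i^t \ge m_i + C_i^t \tag{43}$$
$$M < m_i + 2C_i^t. \tag{44}$$

Equation (44) is equivalent to

$$\Delta_i^M = M - m_i < 2C_i^t = \frac{2\sigma_s}{\sqrt{n_i^t}}\Phi^{-1}(1 - \delta/3).$$

Squaring and rearranging, we see that (44) never holds if

$$n_i^t > \frac{4\sigma_s^2}{(\Delta_i^M)^2}\left(\Phi^{-1}(1 - \delta/3)\right)^2 = \eta.$$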

The same argument as in the proof of Theorem 10 shows that for $n_i^t \ge 1$, (43) holds with probability at most $\delta/3$, so $n_i^t > \eta$ implies that a non-satisfying arm $i$ is in the eligible set with probability at most $\delta/3$. As in the proof of Theorem 10, a non-satisfying arm $i$ is picked due to the eligible set being empty only if $Q_i^t \ge Q_{i^*}^t$, where $i^*$ is the arm with maximal mean reward. This condition can again be decomposed into the three conditions (40)–(42). Equation (42) does not hold if $n_i^t > \eta$, so the probability that $Q_i^t \ge Q_{i^*}^t$ is bounded by the probability that either (40) or (41) holds. For $n_i^t > 1$, each of these holds with probability $\delta/3$, so the probability of a non-satisfying arm being chosen due to the eligible set being empty is at most $2\delta/3$. Thus, for $n_i^t > \eta + 1$, a non-satisfying arm is selected with probability at most $\delta$.

Theorem 11 guarantees that the (M,δ)-satisficing UCL algorithm achieves finite regret. Furthermore, the algorithm is efficient in that the regret matches the dependence on $\epsilon$ and $\delta$ in the bound (16). Applying the bound (30) on the inverse Gaussian cdf to the statement in the theorem, we see that $n_i^T$ is upper bounded by $8\sigma_s^2\log(3/\delta)/(\Delta_i^M)^2$. Summing this bound over non-satisfying arms shows that the total number of times the algorithm incurs regret is at most $8\sigma_s^2\log(3/\delta)\sum_{\{i \mid \Delta_i^M > 0\}} 1/(\Delta_i^M)^2$. This matches the dependence on $\epsilon$ and $\delta$ in the bound (16) up to constant factors. Note that lower bound (16) counts the number of selections of all arms including the optimal arm, while the upper bound counts only the suboptimal arms. Hence, we can only claim that we achieve cumulative regret bounded in $T$. With a better lower bound on $n_i^T$, we may be able to claim that, similar to δ-sufficing UCL, (M,δ)-sufficing UCL achieves the optimal dependence on $\epsilon$ and $\delta$. However, this remains an open problem to pursue.

D. Robust Satisficing UCL Algorithms

The UCL algorithm solves Problem 1, the Gaussian standard problem. The modified versions of the UCL algorithm in Sections VI-A, VI-B, and VI-C solve the other three Gaussian satisficing-in-mean-reward Problems 2–4. All four UCL algorithms achieve efficient performance solving their respective problems, as guaranteed by Theorems 8, 10, and 11. The equivalence result of Lemma 5 shows for Gaussian distributed rewards that we can modify the four UCL algorithms developed for Problems 1–4 to solve Problems 5–8 as follows.
The modified UCL algorithms make decisions based on the standardized mean reward (20) using priors on the standardized mean rewards. A prior belief $m \sim \mathcal{N}(\mu_0, \Sigma_0)$ on the mean rewards $m$ is transformed into a prior belief on the standardized mean rewards $x \sim \mathcal{N}(\tilde\mu_0, \tilde\Sigma_0)$ by

$$(\tilde\mu_0)_i = \frac{(\mu_0)_i - M}{\sigma_{s,i}}, \qquad (\tilde\Sigma_0)_{ij} = \frac{(\Sigma_0)_{ij}}{\sigma_{s,i}\,\sigma_{s,j}}.$$

Problem 5: Robust UCL Algorithm: The robust UCL algorithm is the UCL algorithm where the prior is given in terms of the standardized mean rewards, and the observed reward $r_t$ is standardized according to the transformation (22) before being input to the inference equations (25).

Problem 6: Robust Satisfaction UCL Algorithm: The robust satisfaction UCL algorithm is the satisfaction-in-mean-reward UCL algorithm where the prior is given in terms of the standardized mean rewards, the observed reward $r_t$ is standardized according to the transformation (22) before being input to the inference equations (25), and the parameter $M$ is set equal to $X = \Phi^{-1}(\Pi)$ defined in (23).

Problem 7: δ-Robust Sufficing Algorithm: The δ-robust sufficing UCL algorithm is the δ-sufficing UCL algorithm where the prior is given in terms of the standardized mean rewards, and the observed reward $r_t$ is standardized according to the transformation (22) before being input to the inference equations (25).

Problem 8: (Π,δ)-Robust Sufficing Algorithm: The (Π,δ)-robust sufficing UCL algorithm is the (M,δ)-satisficing UCL algorithm where the prior is given in terms of the standardized mean rewards, the observed reward $r_t$ is standardized according to the transformation (22) before being input to the inference equations (25), and the parameter $M$ is set equal to $X = \Phi^{-1}(\Pi)$ defined in (23).

Lemma 5 implies that the performance guarantees that hold for the UCL algorithms developed for Problems 1–4 also hold for the four new UCL algorithms defined above when applied to Problems 5–8.

E. Relaxations of Gaussian and Known Variance Assumptions

The algorithms presented so far have been developed assuming that the reward distribution associated with each arm $i$ is Gaussian with unknown mean $m_i$ and known variance $\sigma_{s,i}^2$. The reward variance may be known, e.g., estimated from known sensor characteristics or prior data. When the reward variance is not known, a simple modification to the heuristic (27) yields an algorithm that achieves efficient performance. Similar simple modifications extend our results to the case where the reward distribution is sub-Gaussian, which includes distributions with bounded support.
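We state modifications for the case of an uninformative prior. Prior information can be incorporated using a conjugate prior, as discussed in [24].

Remark 12 (Gaussian Rewards With Unknown Variance): When the reward distribution is Gaussian with unknown variance, the heuristic developed by Auer et al. [4] for their algorithm UCB1-NORMAL results in algorithms that achieve efficient performance. Recall that $n_i^t$ is the number of times arm $i$ has been selected up to time $t$, and $\bar m_i^t$ is the empirical mean reward observed at arm $i$ up to time $t$. Define $q_i^t = \sum_{\tau=1}^{t} r_\tau^2\,\mathbf{1}(i_\tau = i)$ as the sum of the squared rewards obtained from arm $i$. The UCB1-NORMAL algorithm is composed of two rules: if there is an arm that has been played less than $\lceil 8\log t\rceil$ times, it selects that arm. Otherwise it selects the arm $i$ that maximizes the heuristic

$$Q^t_{i,\mathrm{UCB1\text{-}NORMAL}} = \bar m_i^t + \sqrt{16\,\frac{q_i^t - n_i^t(\bar m_i^t)^2}{n_i^t - 1}\,\frac{\log t}{n_i^t}}.$$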
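The two UCB1-NORMAL rules described in Remark 12 can be sketched as a single selection function. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `ucb1_normal_select` and its arguments are hypothetical, the logarithmic term is taken to be $\log t$, and a minimum of two plays per arm is added (an assumption) so that the sample variance is always defined.

```python
import math


def ucb1_normal_select(t, counts, sums, sq_sums):
    """Pick an arm at time t following the two UCB1-NORMAL rules.

    counts[i]  : number of times arm i has been selected so far (n_i^t)
    sums[i]    : sum of rewards from arm i, so sums[i] / counts[i] is the empirical mean
    sq_sums[i] : sum of squared rewards from arm i (q_i^t)
    """
    # Rule 1: play any arm selected fewer than ceil(8 log t) times.
    # Assumption: also require at least 2 plays so the variance estimate exists.
    required = max(2, math.ceil(8.0 * math.log(t)))
    for i, n in enumerate(counts):
        if n < required:
            return i

    # Rule 2: maximize mean + sqrt(16 * sample_variance * log(t) / n).
    def index(i):
        n = counts[i]
        mean = sums[i] / n
        # (q - n * mean^2) / (n - 1) is the unbiased sample variance;
        # clamp at zero to absorb floating-point rounding.
        var = max(0.0, (sq_sums[i] - n * mean * mean) / (n - 1))
        return mean + math.sqrt(16.0 * var * math.log(t) / n)

    return max(range(len(counts)), key=index)
```

For the δ-based variants, the same structure would apply with the $\log t$ term replaced by a term in $k/\delta$; that substitution is not shown here.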

The UCB1-NORMAL heuristic can be used directly in the standard and satisfaction-in-mean-reward UCL algorithms. For the δ-sufficing and (M,δ)-satisficing UCL algorithms, use $k = 2$ and $k = 3$, respectively, in the heuristic

$$Q_i^t = \bar m_i^t + \sqrt{4\,\frac{q_i^t - n_i^t(\bar m_i^t)^2}{n_i^t - 1}\,\frac{\log(k/\delta)}{n_i^t}}.$$

The Gaussian distribution with unknown mean and variance is again a location-scale family, so Lemma 5 implies that these modified algorithms can be used to solve the robust satisficing problems as well. Prior information can be incorporated by means of a conjugate prior, as discussed in [24].

For the following remarks, define the generalized heuristic

$$Q_i^t(\beta) = \bar m_i^t + \sqrt{\frac{\beta\log t}{n_i^t}}.$$

Remark 13 (Sub-Gaussian Rewards): Another generalization of Gaussian rewards with known variance is the case where the reward distribution is sub-Gaussian, also known as light-tailed. The distribution of a random variable $X$ is called sub-Gaussian if its moment generating function $M(u) = \mathbb{E}[\exp(uX)]$ is finite for all $u \in \mathbb{R}$. Then, one can find a constant $\zeta$ such that $M(u) \le \exp(\zeta u^2/2)$ [9]. In this case, a heuristic function due to Liu and Zhou [21], $Q^t_{i,\mathrm{SG}} = Q_i^t(8\zeta)$, can be used to achieve efficient performance.

Remark 14 (Reward Distributions With Bounded Support): Another common assumption in the bandit literature is that the reward distributions are arbitrary but have a known bounded support $[a, b] \subset \mathbb{R}$. Without loss of generality, we assume that the support is contained in the unit interval $[0, 1]$. In this case the UCB1 heuristic due to Auer et al. [4], $Q^t_{i,\mathrm{UCB1}} = Q_i^t(2)$, can be used in the standard and satisfaction-in-mean-reward UCL algorithms. For the δ-sufficing and (M,δ)-satisficing UCL algorithms, with $k = 2$ and $k = 3$, respectively, a heuristic depending on $k/\delta$ can be used to achieve efficient performance.

For the robust satisficing problems the relevant reward, happiness $h_t$ (17), is a Bernoulli random variable which is supported on $[0, 1]$. Therefore, each robust satisficing problem can be solved by the appropriate variant of UCB1. However, if additional information is available about the distribution of the raw rewards $r_t$, e.g., that they are Gaussian with known variance, then the robust UCL algorithms can achieve improved performance relative to UCB1, for example if the Kullback-Leibler divergence between the $r_t$ distributions is larger than that between the Bernoulli distributions associated with $h_t$. Additional extensions to heavy-tailed distributions may be possible using the techniques of [7].

VII. NUMERICAL EXAMPLES

In this section, we present the results of numerical simulations of the modified UCL algorithms solving multi-armed bandit problems with Gaussian rewards and satisficing objectives. We consider both thresholding in the mean rewards $m_i$, as in Problems 1–4 in Table I, and thresholding in the instantaneous rewards, as in Problems 5–8 in Table II. In all of the cases presented, the algorithms used an uninformative prior. We use the simulations to illustrate performance of the algorithms relative to the bounds proved in the theorems of Section IV. We also use the simulations to compare how the different algorithms trade off accumulation of reward with reduction in exploration cost as measured by number of switches among arms. As shown in the figures, satisficing can significantly decrease the exploration cost while incurring little cost in terms of the rewards received by the agent.

We first consider the satisficing objectives with thresholding in the mean rewards. We illustrate how the objectives of Problems 1 and 2 yield logarithmic regret (Fig. 1) whereas the objectives of Problems 3 and 4 yield finite regret (Fig. 2), as predicted by the bounds proved in Theorems 7, 8, 10 and 11.

Fig. 1. Comparison of regret incurred by the UCL algorithms when solving the standard problem (Problem 1) and satisfaction-in-mean-reward problem (Problem 2). Both problems define regret by thresholding mean reward values; the standard bandit objective incurs regret when the mean reward of the chosen option is less than the maximum reward $m_{i^*}$, while the satisfaction-in-mean-reward problem incurs regret when the mean reward is less than $M \le m_{i^*}$, here set equal to 2.5. For both problems, the cumulative expected regret and its upper bound increase at a logarithmic rate since the agent seeks certainty that its threshold is met, which it cannot achieve in finite time.

Fig. 2. Comparison of regret incurred by the UCL algorithms when solving the δ- and (M,δ)-satisficing problems, Problems 3 and 4, respectively. As in Fig. 1, the problems define regret by thresholding the mean reward values; the δ-sufficing objective incurs regret when the mean reward of the chosen option is less than the maximum reward $m_{i^*}$, while the (M,δ)-sufficing problem incurs regret when the mean reward is less than $M \le m_{i^*}$, here set equal to 2.5. In contrast to Fig. 1, the agent only seeks to have $1 - \delta = 95\%$ confidence that its threshold is met, which it can achieve in finite time. Thus, the upper bounds on cumulative expected regret are constant functions of horizon length and the mean regret plateaus at a finite value.
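A simulation of the kind described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: it assumes homogeneous noise $\sigma_s$ and an uncorrelated uninformative prior, so that $\mu_i^t$ is the empirical mean and $\sigma_i^t = \sigma_s/\sqrt{n_i^t}$. The function names, arm means, and horizon are hypothetical. The rule for choosing within the eligible set (stay on the previous arm if it is eligible, otherwise take the eligible arm with largest $Q_i^t$) is one possible choice; the algorithm definitions only require picking some eligible arm.

```python
import math
import random
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # quantile function of the standard normal


def run_ucl(means, sigma_s, T, M=None, delta=None, K=1.0, seed=0):
    """Simulate a UCL variant and return the list of chosen arms.

    M is None     -> pick the arm with maximal Q (standard or delta-sufficing).
    M is a number -> pick an eligible arm (Q >= M) if any, else the arm with maximal Q.
    delta is None -> time-varying quantile level 1 - 1/(K t), as in (27).
    delta given   -> fixed level 1 - delta/2 (no threshold) or 1 - delta/3 (with threshold).
    """
    rng = random.Random(seed)
    N = len(means)
    counts = [0] * N      # n_i^t
    sums = [0.0] * N      # reward sums; sums[i] / counts[i] is the empirical mean
    choices, last = [], None
    for t in range(1, T + 1):
        if delta is None:
            alpha = 1.0 / (K * t)
        else:
            alpha = delta / (2.0 if M is None else 3.0)
        # Clamp so the quantile stays finite (alpha = 1 at t = 1 when K = 1).
        alpha = min(max(alpha, 1e-12), 1.0 - 1e-12)
        z = PHI_INV(1.0 - alpha)
        # Unsampled arms have unbounded uncertainty under the uninformative prior.
        Q = [sums[i] / counts[i] + sigma_s / math.sqrt(counts[i]) * z
             if counts[i] > 0 else math.inf for i in range(N)]
        eligible = [i for i in range(N) if M is not None and Q[i] >= M]
        if last in eligible:
            arm = last                                   # stay put: no switch
        elif eligible:
            arm = max(eligible, key=lambda i: Q[i])
        else:
            arm = max(range(N), key=lambda i: Q[i])
        counts[arm] += 1
        sums[arm] += rng.gauss(means[arm], sigma_s)
        choices.append(arm)
        last = arm
    return choices


def standard_regret(means, choices):
    """Cumulative regret relative to the best arm."""
    best = max(means)
    return sum(best - means[i] for i in choices)


def satisfaction_regret(means, choices, M):
    """Cumulative regret counting only arms whose mean is below the threshold M."""
    return sum(max(0.0, M - means[i]) for i in choices)


def switches(choices):
    """Exploration cost measured as the number of switches among arms."""
    return sum(1 for a, b in zip(choices, choices[1:]) if a != b)


means = [1.0, 2.0, 3.0, 2.7]   # made-up mean rewards
T = 500
std = run_ucl(means, 1.0, T)                      # Problem 1
sat = run_ucl(means, 1.0, T, M=2.5)               # Problem 2
suf = run_ucl(means, 1.0, T, delta=0.05)          # Problem 3
msat = run_ucl(means, 1.0, T, M=2.5, delta=0.05)  # Problem 4
```

Comparing `standard_regret`, `satisfaction_regret`, and `switches` across the four runs, averaged over many seeds, gives the kind of regret and exploration-cost comparison discussed above; a single run is only one noisy sample.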

arXiv:1512.07638v2 [cs.LG] 19 Dec 2016


More information

Quantitative Portfolio Theory & Performance Analysis

Quantitative Portfolio Theory & Performance Analysis 550.447 Quaave Porfolo heory & Performace Aalyss Week February 4 203 Coceps. Assgme For February 4 (hs Week) ead: A&L Chaper Iroduco & Chaper (PF Maageme Evrome) Chaper 2 ( Coceps) Seco (Basc eur Calculaos)

More information

The Linear Regression Of Weighted Segments

The Linear Regression Of Weighted Segments The Lear Regresso Of Weghed Segmes George Dael Maeescu Absrac. We proposed a regresso model where he depede varable s made o up of pos bu segmes. Ths suao correspods o he markes hroughou he da are observed

More information

Fully Fuzzy Linear Systems Solving Using MOLP

Fully Fuzzy Linear Systems Solving Using MOLP World Appled Sceces Joural 12 (12): 2268-2273, 2011 ISSN 1818-4952 IDOSI Publcaos, 2011 Fully Fuzzy Lear Sysems Solvg Usg MOLP Tofgh Allahvraloo ad Nasser Mkaelvad Deparme of Mahemacs, Islamc Azad Uversy,

More information

The textbook expresses the stock price as the present discounted value of the dividend paid and the price of the stock next period.

The textbook expresses the stock price as the present discounted value of the dividend paid and the price of the stock next period. ublc Affars 974 Meze D. Ch Fall Socal Sceces 748 Uversy of Wscos-Madso Sock rces, News ad he Effce Markes Hypohess (rev d //) The rese Value Model Approach o Asse rcg The exbook expresses he sock prce

More information

Midterm Exam. Tuesday, September hour, 15 minutes

Midterm Exam. Tuesday, September hour, 15 minutes Ecoomcs of Growh, ECON560 Sa Fracsco Sae Uvers Mchael Bar Fall 203 Mderm Exam Tuesda, Sepember 24 hour, 5 mues Name: Isrucos. Ths s closed boo, closed oes exam. 2. No calculaors of a d are allowed. 3.

More information

Probability Bracket Notation and Probability Modeling. Xing M. Wang Sherman Visual Lab, Sunnyvale, CA 94087, USA. Abstract

Probability Bracket Notation and Probability Modeling. Xing M. Wang Sherman Visual Lab, Sunnyvale, CA 94087, USA. Abstract Probably Bracke Noao ad Probably Modelg Xg M. Wag Sherma Vsual Lab, Suyvale, CA 94087, USA Absrac Ispred by he Drac oao, a ew se of symbols, he Probably Bracke Noao (PBN) s proposed for probably modelg.

More information

COMPARISON OF ESTIMATORS OF PARAMETERS FOR THE RAYLEIGH DISTRIBUTION

COMPARISON OF ESTIMATORS OF PARAMETERS FOR THE RAYLEIGH DISTRIBUTION COMPARISON OF ESTIMATORS OF PARAMETERS FOR THE RAYLEIGH DISTRIBUTION Eldesoky E. Affy. Faculy of Eg. Shbee El kom Meoufa Uv. Key word : Raylegh dsrbuo, leas squares mehod, relave leas squares, leas absolue

More information

Lecture 3 Topic 2: Distributions, hypothesis testing, and sample size determination

Lecture 3 Topic 2: Distributions, hypothesis testing, and sample size determination Lecure 3 Topc : Drbuo, hypohe eg, ad ample ze deermao The Sude - drbuo Coder a repeaed drawg of ample of ze from a ormal drbuo of mea. For each ample, compue,,, ad aoher ac,, where: The ac he devao of

More information

Comparison of the Bayesian and Maximum Likelihood Estimation for Weibull Distribution

Comparison of the Bayesian and Maximum Likelihood Estimation for Weibull Distribution Joural of Mahemacs ad Sascs 6 (2): 1-14, 21 ISSN 1549-3644 21 Scece Publcaos Comarso of he Bayesa ad Maxmum Lkelhood Esmao for Webull Dsrbuo Al Omar Mohammed Ahmed, Hadeel Salm Al-Kuub ad Noor Akma Ibrahm

More information

Least squares and motion. Nuno Vasconcelos ECE Department, UCSD

Least squares and motion. Nuno Vasconcelos ECE Department, UCSD Leas squares ad moo uo Vascocelos ECE Deparme UCSD Pla for oda oda we wll dscuss moo esmao hs s eresg wo was moo s ver useful as a cue for recogo segmeao compresso ec. s a grea eample of leas squares problem

More information

VARIATIONAL ITERATION METHOD FOR DELAY DIFFERENTIAL-ALGEBRAIC EQUATIONS. Hunan , China,

VARIATIONAL ITERATION METHOD FOR DELAY DIFFERENTIAL-ALGEBRAIC EQUATIONS. Hunan , China, Mahemacal ad Compuaoal Applcaos Vol. 5 No. 5 pp. 834-839. Assocao for Scefc Research VARIATIONAL ITERATION METHOD FOR DELAY DIFFERENTIAL-ALGEBRAIC EQUATIONS Hoglag Lu Aguo Xao Yogxag Zhao School of Mahemacs

More information

Cyclone. Anti-cyclone

Cyclone. Anti-cyclone Adveco Cycloe A-cycloe Lorez (963) Low dmesoal aracors. Uclear f hey are a good aalogy o he rue clmae sysem, bu hey have some appealg characerscs. Dscusso Is he al codo balaced? Is here a al adjusme

More information

Continuous Indexed Variable Systems

Continuous Indexed Variable Systems Ieraoal Joural o Compuaoal cece ad Mahemacs. IN 0974-389 Volume 3, Number 4 (20), pp. 40-409 Ieraoal Research Publcao House hp://www.rphouse.com Couous Idexed Varable ysems. Pouhassa ad F. Mohammad ghjeh

More information

Available online Journal of Scientific and Engineering Research, 2014, 1(1): Research Article

Available online  Journal of Scientific and Engineering Research, 2014, 1(1): Research Article Avalable ole wwwjsaercom Joural o Scec ad Egeerg Research, 0, ():0-9 Research Arcle ISSN: 39-630 CODEN(USA): JSERBR NEW INFORMATION INEUALITIES ON DIFFERENCE OF GENERALIZED DIVERGENCES AND ITS APPLICATION

More information

Partial Molar Properties of solutions

Partial Molar Properties of solutions Paral Molar Properes of soluos A soluo s a homogeeous mxure; ha s, a soluo s a oephase sysem wh more ha oe compoe. A homogeeous mxures of wo or more compoes he gas, lqud or sold phase The properes of a

More information

The textbook expresses the stock price as the present discounted value of the dividend paid and the price of the stock next period.

The textbook expresses the stock price as the present discounted value of the dividend paid and the price of the stock next period. coomcs 435 Meze. Ch Fall 07 Socal Sceces 748 Uversy of Wscos-Madso Sock rces, News ad he ffce Markes Hypohess The rese Value Model Approach o Asse rcg The exbook expresses he sock prce as he prese dscoued

More information

Solution set Stat 471/Spring 06. Homework 2

Solution set Stat 471/Spring 06. Homework 2 oluo se a 47/prg 06 Homework a Whe he upper ragular elemes are suppressed due o smmer b Le Y Y Y Y A weep o he frs colum o oba: A ˆ b chagg he oao eg ad ec YY weep o he secod colum o oba: Aˆ YY weep o

More information

For the plane motion of a rigid body, an additional equation is needed to specify the state of rotation of the body.

For the plane motion of a rigid body, an additional equation is needed to specify the state of rotation of the body. The kecs of rgd bodes reas he relaoshps bewee he exeral forces acg o a body ad he correspodg raslaoal ad roaoal moos of he body. he kecs of he parcle, we foud ha wo force equaos of moo were requred o defe

More information

RATIO ESTIMATORS USING CHARACTERISTICS OF POISSON DISTRIBUTION WITH APPLICATION TO EARTHQUAKE DATA

RATIO ESTIMATORS USING CHARACTERISTICS OF POISSON DISTRIBUTION WITH APPLICATION TO EARTHQUAKE DATA The 7 h Ieraoal as of Sascs ad Ecoomcs Prague Sepember 9-0 Absrac RATIO ESTIMATORS USING HARATERISTIS OF POISSON ISTRIBUTION WITH APPLIATION TO EARTHQUAKE ATA Gamze Özel Naural pulaos bolog geecs educao

More information

Cyclically Interval Total Colorings of Cycles and Middle Graphs of Cycles

Cyclically Interval Total Colorings of Cycles and Middle Graphs of Cycles Ope Joural of Dsree Mahemas 2017 7 200-217 hp://wwwsrporg/joural/ojdm ISSN Ole: 2161-7643 ISSN Pr: 2161-7635 Cylally Ierval Toal Colorgs of Cyles Mddle Graphs of Cyles Yogqag Zhao 1 Shju Su 2 1 Shool of

More information

Supplement Material for Inverse Probability Weighted Estimation of Local Average Treatment Effects: A Higher Order MSE Expansion

Supplement Material for Inverse Probability Weighted Estimation of Local Average Treatment Effects: A Higher Order MSE Expansion Suppleme Maeral for Iverse Probably Weged Esmao of Local Average Treame Effecs: A Hger Order MSE Expaso Sepe G. Doald Deparme of Ecoomcs Uversy of Texas a Aus Yu-C Hsu Isue of Ecoomcs Academa Sca Rober

More information

Regression Approach to Parameter Estimation of an Exponential Software Reliability Model

Regression Approach to Parameter Estimation of an Exponential Software Reliability Model Amerca Joural of Theorecal ad Appled Sascs 06; 5(3): 80-86 hp://www.scecepublshggroup.com/j/ajas do: 0.648/j.ajas.060503. ISSN: 36-8999 (Pr); ISSN: 36-9006 (Ole) Regresso Approach o Parameer Esmao of a

More information

Density estimation. Density estimations. CS 2750 Machine Learning. Lecture 5. Milos Hauskrecht 5329 Sennott Square

Density estimation. Density estimations. CS 2750 Machine Learning. Lecture 5. Milos Hauskrecht 5329 Sennott Square Lecure 5 esy esmao Mlos Hauskrec mlos@cs..edu 539 Seo Square esy esmaos ocs: esy esmao: Mamum lkelood ML Bayesa arameer esmaes M Beroull dsrbuo. Bomal dsrbuo Mulomal dsrbuo Normal dsrbuo Eoeal famly Noaramerc

More information

Complete Identification of Isotropic Configurations of a Caster Wheeled Mobile Robot with Nonredundant/Redundant Actuation

Complete Identification of Isotropic Configurations of a Caster Wheeled Mobile Robot with Nonredundant/Redundant Actuation 486 Ieraoal Joural Sugbok of Corol Km Auomao ad Byugkwo ad Sysems Moo vol 4 o 4 pp 486-494 Augus 006 Complee Idefcao of Isoropc Cofguraos of a Caser Wheeled Moble Robo wh Noreduda/Reduda Acuao Sugbok Km

More information

Complementary Tree Paired Domination in Graphs

Complementary Tree Paired Domination in Graphs IOSR Joural of Mahemacs (IOSR-JM) e-issn: 2278-5728, p-issn: 239-765X Volume 2, Issue 6 Ver II (Nov - Dec206), PP 26-3 wwwosrjouralsorg Complemeary Tree Pared Domao Graphs A Meeaksh, J Baskar Babujee 2

More information

The ray paths and travel times for multiple layers can be computed using ray-tracing, as demonstrated in Lab 3.

The ray paths and travel times for multiple layers can be computed using ray-tracing, as demonstrated in Lab 3. C. Trael me cures for mulple reflecors The ray pahs ad rael mes for mulple layers ca be compued usg ray-racg, as demosraed Lab. MATLAB scrp reflec_layers_.m performs smple ray racg. (m) ref(ms) ref(ms)

More information

General Complex Fuzzy Transformation Semigroups in Automata

General Complex Fuzzy Transformation Semigroups in Automata Joural of Advaces Compuer Research Quarerly pissn: 345-606x eissn: 345-6078 Sar Brach Islamc Azad Uversy Sar IRIra Vol 7 No May 06 Pages: 7-37 wwwacrausaracr Geeral Complex uzzy Trasformao Semgroups Auomaa

More information

Asymptotic Behavior of Solutions of Nonlinear Delay Differential Equations With Impulse

Asymptotic Behavior of Solutions of Nonlinear Delay Differential Equations With Impulse P a g e Vol Issue7Ver,oveber Global Joural of Scece Froer Research Asypoc Behavor of Soluos of olear Delay Dffereal Equaos Wh Ipulse Zhag xog GJSFR Classfcao - F FOR 3 Absrac Ths paper sudes he asypoc

More information

Redundancy System Fault Sampling Under Imperfect Maintenance

Redundancy System Fault Sampling Under Imperfect Maintenance A publcao of CHEMICAL EGIEERIG TRASACTIOS VOL. 33, 03 Gues Edors: Erco Zo, Pero Barald Copyrgh 03, AIDIC Servz S.r.l., ISB 978-88-95608-4-; ISS 974-979 The Iala Assocao of Chemcal Egeerg Ole a: www.adc./ce

More information

Pricing of CDO s Based on the Multivariate Wang Transform*

Pricing of CDO s Based on the Multivariate Wang Transform* Prcg of DO s Based o he Mulvarae Wag Trasform* ASTIN 2009 olloquum @ Helsk 02 Jue 2009 Masaak Kma Tokyo Meropola versy/ Kyoo versy Emal: kma@mu.ac.p hp://www.comp.mu.ac.p/kmam * Jo Work wh Sh-ch Moomya

More information

Mixed Integral Equation of Contact Problem in Position and Time

Mixed Integral Equation of Contact Problem in Position and Time Ieraoal Joural of Basc & Appled Sceces IJBAS-IJENS Vol: No: 3 ed Iegral Equao of Coac Problem Poso ad me. A. Abdou S. J. oaquel Deparme of ahemacs Faculy of Educao Aleadra Uversy Egyp Deparme of ahemacs

More information

Research on portfolio model based on information entropy theory

Research on portfolio model based on information entropy theory Avalable ole www.jocpr.com Joural of Chemcal ad Pharmaceucal esearch, 204, 6(6):286-290 esearch Arcle ISSN : 0975-7384 CODEN(USA) : JCPC5 esearch o porfolo model based o formao eropy heory Zhag Jusha,

More information

The Optimal Combination Forecasting Based on ARIMA,VAR and SSM

The Optimal Combination Forecasting Based on ARIMA,VAR and SSM Advaces Compuer, Sgals ad Sysems (206) : 3-7 Clausus Scefc Press, Caada The Opmal Combao Forecasg Based o ARIMA,VAR ad SSM Bebe Che,a, Mgya Jag,b* School of Iformao Scece ad Egeerg, Shadog Uversy, Ja,

More information

Stabilization of LTI Switched Systems with Input Time Delay. Engineering Letters, 14:2, EL_14_2_14 (Advance online publication: 16 May 2007) Lin Lin

Stabilization of LTI Switched Systems with Input Time Delay. Engineering Letters, 14:2, EL_14_2_14 (Advance online publication: 16 May 2007) Lin Lin Egeerg Leers, 4:2, EL_4_2_4 (Advace ole publcao: 6 May 27) Sablzao of LTI Swched Sysems wh Ipu Tme Delay L L Absrac Ths paper deals wh sablzao of LTI swched sysems wh pu me delay. A descrpo of sysems sablzao

More information

Density estimation III. Linear regression.

Density estimation III. Linear regression. Lecure 6 Mlos Hauskrec mlos@cs.p.eu 539 Seo Square Des esmao III. Lear regresso. Daa: Des esmao D { D D.. D} D a vecor of arbue values Obecve: r o esmae e uerlg rue probabl srbuo over varables X px usg

More information

The Bernstein Operational Matrix of Integration

The Bernstein Operational Matrix of Integration Appled Mahemacal Sceces, Vol. 3, 29, o. 49, 2427-2436 he Berse Operaoal Marx of Iegrao Am K. Sgh, Vee K. Sgh, Om P. Sgh Deparme of Appled Mahemacs Isue of echology, Baaras Hdu Uversy Varaas -225, Ida Asrac

More information

Model for Optimal Management of the Spare Parts Stock at an Irregular Distribution of Spare Parts

Model for Optimal Management of the Spare Parts Stock at an Irregular Distribution of Spare Parts Joural of Evromeal cece ad Egeerg A 7 (08) 8-45 do:0.765/6-598/08.06.00 D DAVID UBLIHING Model for Opmal Maageme of he pare ars ock a a Irregular Dsrbuo of pare ars veozar Madzhov Fores Research Isue,

More information

JORIND 9(2) December, ISSN

JORIND 9(2) December, ISSN JORIND 9() December, 011. ISSN 1596 8308. www.rascampus.org., www.ajol.o/jourals/jord THE EXONENTIAL DISTRIBUTION AND THE ALICATION TO MARKOV MODELS Usma Yusu Abubakar Deparme o Mahemacs/Sascs Federal

More information

NOTE ON SIMPLE AND LOGARITHMIC RETURN

NOTE ON SIMPLE AND LOGARITHMIC RETURN Appled udes Agrbusess ad Commerce AAC Ceer-r ublshg House, Debrece DOI:.94/AAC/27/-2/6 CIENIFIC AE NOE ON IME AND OGAIHMIC EUN aa Mskolcz Uversy of Debrece, Isue of Accoug ad Face mskolczpaa@gmal.com Absrac:

More information

Brownian Motion and Stochastic Calculus. Brownian Motion and Stochastic Calculus

Brownian Motion and Stochastic Calculus. Brownian Motion and Stochastic Calculus Browa Moo Sochasc Calculus Xogzh Che Uversy of Hawa a Maoa earme of Mahemacs Seember, 8 Absrac Ths oe s abou oob decomoso he bascs of Suare egrable margales Coes oob-meyer ecomoso Suare Iegrable Margales

More information

ASYMPTOTIC EQUIVALENCE OF NONPARAMETRIC REGRESSION AND WHITE NOISE. BY LAWRENCE D. BROWN 1 AND MARK G. LOW 2 University of Pennsylvania

ASYMPTOTIC EQUIVALENCE OF NONPARAMETRIC REGRESSION AND WHITE NOISE. BY LAWRENCE D. BROWN 1 AND MARK G. LOW 2 University of Pennsylvania The Aals of Sascs 996, Vol., No. 6, 38398 ASYMPTOTIC EQUIVALENCE OF NONPARAMETRIC REGRESSION AND WITE NOISE BY LAWRENCE D. BROWN AND MARK G. LOW Uversy of Pesylvaa The prcpal resul s ha, uder codos, o

More information

Solving fuzzy linear programming problems with piecewise linear membership functions by the determination of a crisp maximizing decision

Solving fuzzy linear programming problems with piecewise linear membership functions by the determination of a crisp maximizing decision Frs Jo Cogress o Fuzzy ad Iellge Sysems Ferdows Uversy of Mashhad Ira 9-3 Aug 7 Iellge Sysems Scefc Socey of Ira Solvg fuzzy lear programmg problems wh pecewse lear membershp fucos by he deermao of a crsp

More information

Application of the stochastic self-training procedure for the modelling of extreme floods

Application of the stochastic self-training procedure for the modelling of extreme floods The Exremes of he Exremes: Exraordary Floods (Proceedgs of a symposum held a Reyjav, Icelad, July 000). IAHS Publ. o. 7, 00. 37 Applcao of he sochasc self-rag procedure for he modellg of exreme floods

More information

Solving Non-Linear Rational Expectations Models: Approximations based on Taylor Expansions

Solving Non-Linear Rational Expectations Models: Approximations based on Taylor Expansions Work progress Solvg No-Lear Raoal Expecaos Models: Approxmaos based o Taylor Expasos Rober Kollma (*) Deparme of Ecoomcs, Uversy of Pars XII 6, Av. du Gééral de Gaulle; F-94 Créel Cedex; Frace rober_kollma@yahoo.com;

More information

EE 6885 Statistical Pattern Recognition

EE 6885 Statistical Pattern Recognition EE 6885 Sascal Paer Recogo Fall 005 Prof. Shh-Fu Chag hp://.ee.columba.edu/~sfchag Lecure 8 (/8/05 8- Readg Feaure Dmeso Reduco PCA, ICA, LDA, Chaper 3.8, 0.3 ICA Tuoral: Fal Exam Aapo Hyväre ad Erkk Oja,

More information

USING INPUT PROCESS INDICATORS FOR DYNAMIC DECISION MAKING

USING INPUT PROCESS INDICATORS FOR DYNAMIC DECISION MAKING Proceedgs of he 999 Wer Smulao Coferece P. A. Farrgo, H. B. Nembhard, D. T. Surrock, ad G. W. Evas, eds. USING INPUT PROCESS INDICATORS FOR DYNAMIC DECISION MAKING Mchael Fremer School of Operaos Research

More information

Asymptotic Regional Boundary Observer in Distributed Parameter Systems via Sensors Structures

Asymptotic Regional Boundary Observer in Distributed Parameter Systems via Sensors Structures Sesors,, 37-5 sesors ISSN 44-8 by MDPI hp://www.mdp.e/sesors Asympoc Regoal Boudary Observer Dsrbued Parameer Sysems va Sesors Srucures Raheam Al-Saphory Sysems Theory Laboraory, Uversy of Perpga, 5, aveue

More information

Other Topics in Kernel Method Statistical Inference with Reproducing Kernel Hilbert Space

Other Topics in Kernel Method Statistical Inference with Reproducing Kernel Hilbert Space Oher Topcs Kerel Mehod Sascal Iferece wh Reproducg Kerel Hlber Space Kej Fukumzu Isue of Sascal Mahemacs, ROIS Deparme of Sascal Scece, Graduae Uversy for Advaced Sudes Sepember 6, 008 / Sascal Learg Theory

More information

An Efficient Dual to Ratio and Product Estimator of Population Variance in Sample Surveys

An Efficient Dual to Ratio and Product Estimator of Population Variance in Sample Surveys "cece as True Here" Joural of Mahemacs ad ascal cece, Volume 06, 78-88 cece gpos Publshg A Effce Dual o Rao ad Produc Esmaor of Populao Varace ample urves ubhash Kumar Yadav Deparme of Mahemacs ad ascs

More information

Efficient Estimators for Population Variance using Auxiliary Information

Efficient Estimators for Population Variance using Auxiliary Information Global Joural of Mahemacal cece: Theor ad Praccal. IN 97-3 Volume 3, Number (), pp. 39-37 Ieraoal Reearch Publcao Houe hp://www.rphoue.com Effce Emaor for Populao Varace ug Aular Iformao ubhah Kumar Yadav

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) Aoucemes Reags o E-reserves Proec roosal ue oay Parameer Esmao Bomercs CSE 9-a Lecure 6 CSE9a Fall 6 CSE9a Fall 6 Paer Classfcao Chaer 3: Mamum-Lelhoo & Bayesa Parameer Esmao ar All maerals hese sles were

More information

Synchronization of Complex Network System with Time-Varying Delay Via Periodically Intermittent Control

Synchronization of Complex Network System with Time-Varying Delay Via Periodically Intermittent Control Sychrozao of Complex ework Sysem wh me-varyg Delay Va Perodcally Ierme Corol JIAG Ya Deparme of Elecrcal ad Iformao Egeerg Hua Elecrcal College of echology Xaga 4, Cha Absrac he sychrozao corol problem

More information

Use of Non-Conventional Measures of Dispersion for Improved Estimation of Population Mean

Use of Non-Conventional Measures of Dispersion for Improved Estimation of Population Mean Amerca Joural of Operaoal esearch 06 6(: 69-75 DOI: 0.59/.aor.06060.0 Use of o-coveoal Measures of Dsperso for Improve Esmao of Populao Mea ubhash Kumar aav.. Mshra * Alok Kumar hukla hak Kumar am agar

More information

EDUCATION COMMITTEE OF THE SOCIETY OF ACTUARIES ADVANCED TOPICS IN GENERAL INSURANCE STUDY NOTE CREDIBILITY WITH SHIFTING RISK PARAMETERS

EDUCATION COMMITTEE OF THE SOCIETY OF ACTUARIES ADVANCED TOPICS IN GENERAL INSURANCE STUDY NOTE CREDIBILITY WITH SHIFTING RISK PARAMETERS EDUCATION COMMITTEE OF THE SOCIETY OF ACTUARIES ADVANCED TOPICS IN GENERAL INSURANCE STUDY NOTE CREDIBILITY WITH SHIFTING RISK PARAMETERS Suar Klugma, FSA, CERA, PhD Copyrgh 04 Socey of Acuares The Educao

More information

AN INCREMENTAL QUASI-NEWTON METHOD WITH A LOCAL SUPERLINEAR CONVERGENCE RATE. Aryan Mokhtari Mark Eisen Alejandro Ribeiro

AN INCREMENTAL QUASI-NEWTON METHOD WITH A LOCAL SUPERLINEAR CONVERGENCE RATE. Aryan Mokhtari Mark Eisen Alejandro Ribeiro AN INCREMENTAL QUASI-NEWTON METHOD WITH A LOCAL SUPERLINEAR CONVERGENCE RATE Arya Mokhar Mark Ese Alejadro Rbero Deparme of Elecrcal ad Sysems Egeerg, Uversy of Pesylvaa ABSTRACT We prese a cremeal Broyde-Flecher-Goldfarb-Shao

More information

Final Exam Applied Econometrics

Final Exam Applied Econometrics Fal Eam Appled Ecoomercs. 0 Sppose we have he followg regresso resl: Depede Varable: SAT Sample: 437 Iclded observaos: 437 Whe heeroskedasc-cosse sadard errors & covarace Varable Coeffce Sd. Error -Sasc

More information

Density estimation III.

Density estimation III. Lecure 4 esy esmao III. Mlos Hauskrec mlos@cs..edu 539 Seo Square Oule Oule: esy esmao: Mamum lkelood ML Bayesa arameer esmaes MP Beroull dsrbuo. Bomal dsrbuo Mulomal dsrbuo Normal dsrbuo Eoeal famly Eoeal

More information

Density estimation III.

Density estimation III. Lecure 6 esy esmao III. Mlos Hausrec mlos@cs..eu 539 Seo Square Oule Oule: esy esmao: Bomal srbuo Mulomal srbuo ormal srbuo Eoeal famly aa: esy esmao {.. } a vecor of arbue values Objecve: ry o esmae e

More information

Pricing Asian Options with Fourier Convolution

Pricing Asian Options with Fourier Convolution Prcg Asa Opos wh Fourer Covoluo Cheg-Hsug Shu Deparme of Compuer Scece ad Iformao Egeerg Naoal Tawa Uversy Coes. Iroduco. Backgroud 3. The Fourer Covoluo Mehod 3. Seward ad Hodges facorzao 3. Re-ceerg

More information

Nature and Science, 5(1), 2007, Han and Xu, Multi-variable Grey Model based on Genetic Algorithm and its Application in Urban Water Consumption

Nature and Science, 5(1), 2007, Han and Xu, Multi-variable Grey Model based on Genetic Algorithm and its Application in Urban Water Consumption Naure ad Scece, 5, 7, Ha ad u, ul-varable Grey odel based o Geec Algorhm ad s Applcao Urba Waer Cosumpo ul-varable Grey odel based o Geec Algorhm ad s Applcao Urba Waer Cosumpo Ha Ya*, u Shguo School of

More information

Voltage Sensitivity Analysis in MV Distribution Networks

Voltage Sensitivity Analysis in MV Distribution Networks Proceedgs of he 6h WSEAS/IASME I. Cof. o Elecrc Power Sysems, Hgh olages, Elecrc Maches, Teerfe, Spa, December 6-8, 2006 34 olage Sesvy Aalyss M Dsrbuo Neworks S. CONTI, A.M. GRECO, S. RAITI Dparmeo d

More information

New Guaranteed H Performance State Estimation for Delayed Neural Networks

New Guaranteed H Performance State Estimation for Delayed Neural Networks Ieraoal Joural of Iformao ad Elecrocs Egeerg Vol. o. 6 ovember ew Guaraeed H Performace ae Esmao for Delayed eural eworks Wo Il Lee ad PooGyeo Park Absrac I hs paper a ew guaraeed performace sae esmao

More information

Time-Dependent Perturbation Theory

Time-Dependent Perturbation Theory Tme-Depede Perurbao Theory Mchael Fowler 7/6/7 Iroduco: Geeral Formalsm We look a a Hamloa H H + V( ), wh V( ) some me-depede perurbao, so ow he wave fuco wll have perurbao-duced me depedece Our sarg po

More information

Learning of Graphical Models Parameter Estimation and Structure Learning

Learning of Graphical Models Parameter Estimation and Structure Learning Learg of Grahal Models Parameer Esmao ad Sruure Learg e Fukumzu he Isue of Sasal Mahemas Comuaoal Mehodology Sasal Iferee II Work wh Grahal Models Deermg sruure Sruure gve by modelg d e.g. Mxure model

More information

Solution of Impulsive Differential Equations with Boundary Conditions in Terms of Integral Equations

Solution of Impulsive Differential Equations with Boundary Conditions in Terms of Integral Equations Joural of aheacs ad copuer Scece (4 39-38 Soluo of Ipulsve Dffereal Equaos wh Boudary Codos Ters of Iegral Equaos Arcle hsory: Receved Ocober 3 Acceped February 4 Avalable ole July 4 ohse Rabba Depare

More information

Point Estimation: definition of estimators

Point Estimation: definition of estimators Pot Estmato: defto of estmators Pot estmator: ay fucto W (X,..., X ) of a data sample. The exercse of pot estmato s to use partcular fuctos of the data order to estmate certa ukow populato parameters.

More information

Interval Estimation. Consider a random variable X with a mean of X. Let X be distributed as X X

Interval Estimation. Consider a random variable X with a mean of X. Let X be distributed as X X ECON 37: Ecoomercs Hypohess Tesg Iervl Esmo Wh we hve doe so fr s o udersd how we c ob esmors of ecoomcs reloshp we wsh o sudy. The queso s how comforble re we wh our esmors? We frs exme how o produce

More information

A note on Turán number Tk ( 1, kn, )

A note on Turán number Tk ( 1, kn, ) A oe o Turá umber T (,, ) L A-Pg Beg 00085, P.R. Cha apl000@sa.com Absrac: Turá umber s oe of prmary opcs he combaorcs of fe ses, hs paper, we wll prese a ew upper boud for Turá umber T (,, ). . Iroduco

More information