Sequential Allocation with Minimal Switching

In Computing Science and Statistics 28 (1996), pp. 567-572

Quentin F. Stout¹    Janis Hardwick¹
EECS Dept., University of Michigan    Statistics Dept., Purdue University

Abstract

This paper describes algorithms for the design of sequential experiments where extensive switching is undesirable. Given an objective function to minimize by sampling between Bernoulli populations, two different models are considered. The constraint model optimizes the tradeoff of the maximum number of switches vs. the objective function, while the cost model optimizes the tradeoff for the expected number of switches. For each model, an algorithm is developed which produces the optimal sequential experiment. The algorithms are quite general, and give users flexibility in incorporating practical considerations in the design of experiments. To show the usability of these algorithms, they are applied to a bandit problem and an estimation problem. It is observed that the expected number of switches grows approximately as the square root of the sample size, for sample sizes up to a few hundred. It is also observed that one can dramatically reduce the number of switches without substantially affecting the expected value of the objective function. Thus one need sacrifice only a small amount of statistical objective in order to achieve significant gains in practicality.

Keywords: adaptive sampling, switching costs, constraints, bandit, estimation, dynamic programming, optimal tradeoffs

1 Introduction

In situations where data is collected over time, adaptive sampling or allocation, in which decisions are made based on accruing data, is more efficient than fixed sampling, where all decisions are made in advance. Adaptive allocations can reduce costs or time, or improve the results for a given sample size. However, fully adaptive designs are rarely used, due to various concerns over their design, analysis, and implementation.
¹ Research supported in part by National Science Foundation under grants DMS-9157715 and DMS-9504980.

In some settings, one concern is that sequential designs switch repeatedly between the alternatives, a design attribute that may be costly or impossible [6]. For example, in an industrial setting, one may need to reconfigure fixtures each time a switch occurs. In a clinical setting, similar setup or training costs may be required to switch between treatment alternatives. Important alternatives for modeling the ill effects of switching include:

1. You have setup cost $\alpha_i$ to change to population $i$, and incur incremental cost $\beta_i$ for each observation as long as you stay on that population, where $\alpha_i \gg \beta_i$.
2. You have to make up a treatment in batches, or need to set up fixtures to conduct several identical tests at the same time, so when you decide that the next $m$ observations are on population $i$ you incur a cost $\alpha_i + \beta_i m$. Here you must specify $m$ in advance.
3. You can switch at most $S$ times. For example, you may need to use a special apparatus to which you have only $S$ accesses.

Such cost structures are quite important, although rarely directly incorporated in designs. One exception is in control theory, where the cost structure is as in 1 (see [1] and the references therein). Unfortunately, their results apply only when a geometric discount structure is used with an infinite horizon and no terminal objective. Another analysis of switching costs appears in [2], where the setting is quite specialized with one arm having a known probability of success and the objective is to minimize the expected number of failures. None of this prior work applies to the fairly general sequential problems we have in mind, which allow for an arbitrary objective function, finite horizon, and flexible methods for incorporating switching considerations.

In Section 2, basic definitions are given. In Section 3 a constraint model is defined, corresponding to alternative 3 above, and an algorithm is given that determines the optimal sequential design for this model.
In Section 4 a cost model is defined, generalizing alternative 1. There it is shown that this model determines the optimal tradeoff of expected switches vs. objective, and an algorithm that determines the optimal sequential design is given. The algorithms in Sections 3 and 4 are based on dynamic programming, and the crux of the computational problem is to determine a minimal state space and manner of evaluation for these models that makes the computations feasible. In Section 5 the algorithms are applied to a bandit problem and an estimation problem, and the switching behavior of the optimal allocation procedures is determined computationally. Section 6 concludes with remarks concerning generalizations and observations concerning the results. While no work will be done here on cost alternative 2, note that it can be viewed as a staged-allocation problem, and can be optimized by the techniques developed in [4].

2 Definitions

Throughout, we assume that the sample size $N$ is fixed. This assumption merely simplifies our analyses and examples, and the algorithms can easily be adapted to include stopping rules. If optional stopping is desired (and it is quite natural to incorporate stopping rules in cost models), then $N$ should be interpreted as the maximum possible sample size. Often knowledge of the stopping rules can be used to significantly reduce the state space, but since this is quite application-dependent we will omit such improvements here and analyze only the worst case in which no stopping occurs.

There are $P$ Bernoulli populations, and at any point the only decision required is to choose which of these to observe. We use a Bayesian approach, where the success parameters of the populations have independent prior distributions. Suppose that at some point we have observed $s_i$ successes and $f_i$ failures on population $i$. Then the vector $(s_1, f_1, \ldots, s_P, f_P)$ is a sufficient statistic, and forms a natural index for the state space describing the experiment. States, denoted as $v$, will be treated as vectors so that one can add observations in a natural manner. We use $p_i(v)$ to denote the probability of success if the next observation is from population $i$, given that the experiment is at state $v$.

There is an objective function $R^*(v)$ which is the value of each final state $v$ (i.e., states for which $|v| = N$), and the goal is to minimize the expected value of $R^*$. The expected value of allocation $A$, denoted $R_A$, is the sum, over all final states $v$, of $R^*(v)$ times the probability that $A$ reaches $v$. For an arbitrary state $v$, let $R(v)$ denote the expected value of $R^*$ when starting at $v$ and proceeding optimally to the end.
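As a minimal sketch of these definitions, the following Python code evaluates $R(v)$ by memoized recursion with no switching considerations. It is an illustration under stated assumptions, not the authors' program: the priors are taken to be Beta distributions (so that $p_i(v)$ is a posterior mean), $R^*$ is taken to be the total number of failures, and the names `optimal_value`, `p_success`, and `observe` are hypothetical.

```python
from functools import lru_cache


def optimal_value(N, priors=((1, 1), (1, 1))):
    """Return R(0) for sample size N, assuming Beta(a, b) priors and
    R*(v) = total number of failures (both are assumptions of this sketch)."""
    P = len(priors)

    def p_success(i, v):
        # p_i(v): posterior mean of population i's success probability.
        a, b = priors[i]
        s, f = v[2 * i], v[2 * i + 1]
        return (a + s) / (a + b + s + f)

    def observe(v, i, success):
        # v + s_i or v + f_i: add one success or failure on population i.
        w = list(v)
        w[2 * i + (0 if success else 1)] += 1
        return tuple(w)

    @lru_cache(maxsize=None)
    def R(v):
        if sum(v) == N:                      # final state, |v| = N
            return float(sum(v[1::2]))       # R*(v): failures f_1 + ... + f_P
        # Otherwise take the best of the P alternatives.
        return min(
            p_success(i, v) * R(observe(v, i, True))
            + (1 - p_success(i, v)) * R(observe(v, i, False))
            for i in range(P)
        )

    return R((0,) * (2 * P))
```

With uniform priors, `optimal_value(2)` evaluates to 11/12, compared with 1 expected failure when both observations are fixed in advance on one population. The recursion depth grows with `N`, so this form is only suitable for small sample sizes.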
The value $R(0)$ is then the expected value of the optimal sequential experiment, i.e., it is $R_{opt}$. The efficiency of allocation procedure $A$ is $R_{opt}/R_A$.

By using standard dynamic programming, $R_{opt}$ can be determined in $\Theta(P N^{2P}/(2P)!)$ time since there are $\Theta(N^{2P}/(2P)!)$ states, each requiring the evaluation of $P$ alternatives. Here, as in all of our timing analyses, we assume that $R^*$ can be computed for all terminal states in time proportional to the number of terminal states, and that for each population $i$, $p_i$ can be determined for all states in time proportional to the number of states. Even when $R^*$ or $p_i$ are complicated to compute for a single state, our assumption typically holds because one can reuse parts of the calculations for one state to assist those for other states.

    $N$: sample size
    $P$: number of populations
    $S$: maximum number of switches possible (constraint model)
    $R^*$: terminal objective function
    $R_A$: expected value of $R^*$ for allocation $A$
    $R(v)$: expected value of $R^*$, starting at state $v$ and proceeding optimally (no cost considerations)
    $R_{opt}$: expected value of $R^*$ for optimal sequential allocation (i.e., $R(0)$)
    $R_i(v)$: expected value of objective + switching costs obtained by starting at state $v$, sampling from population $i$, and proceeding optimally (cost model)
    $R_i^\alpha(v)$: expected objective obtained by starting at state $v$, sampling from pop. $i$, and proceeding optimally using no more than $\alpha$ switches (constraint model)
    $c(i,j)$: the cost of switching from pop. $i$ to pop. $j$
    $C_i(v)$: expected value of total switching costs obtained by starting at state $v$, sampling from population $i$, and proceeding optimally (cost model)
    $v$: a state, that is, a vector denoting number of successes and failures observed on each population
    $|v|$: the total number of observations at state $v$
    $s_i$, $f_i$: vectors denoting 1 success or failure on pop. $i$
    $p_i(v)$: probability of success on next observation of population $i$, when experiment is in state $v$

Figure 1: Notation

3 Constraint Model

In the constraint model, there is an upper bound $S$ on the number of times that the populations to be sampled can be switched (the initial sampling is not counted as a switch). The goal is to minimize the expected value of the objective function, subject to this constraint. When $S \ge N-1$, the problem is equivalent to the standard optimization problem without switching considerations.

Results for $k$-stage allocation immediately give a lower bound on the efficiency obtainable for $S = (P-1)k$, since a $k$-stage allocation can be scheduled by allocating all stage 1 observations on populations 1 through $P$, then all stage 2 observations on populations $P$ through 1, etc. In particular, work in [4] shows that for $P = 2$, values of $S$ as small as 3 will give very high efficiency on some problems, and this is observed in Section 5. The observed performance with an average of $k$ switches is significantly better than that of the $k$-stage rules, due to the increased adaptiveness of cost model 1 over model 2.

To determine the optimal allocation algorithm under the constraint model, there is a difficulty in that the natural state space does not uniquely determine the number of switches that occurred in reaching that state (except in trivial cases), nor can one determine if a switch occurs when going from one given state to its successor (again, except in trivial cases). To determine these, one apparently needs to add information specifying the number of switches and the most recent population sampled. There are several ways this can be done. Here, the values $R_i^\alpha(v)$ are computed at each state, where $R_i^\alpha(v)$ is the expected value of the objective function if one is at state $v$, observes population $i$, and proceeds optimally under the constraint that at most $\alpha$ more switches occur. This is equivalent to extending the state $v$ to a collection of states $(v, i, \alpha)$, with the above interpretations for $i$ and $\alpha$, and then determining the value of the optimal continuation at each state. The optimal expected value of a sequential allocation procedure satisfying the switching constraint is the minimum, over all populations $i$, of $R_i^S(0)$.

The critical dynamic programming equations determining the values of $R_i^\alpha(v)$ are based on noting that the successor states can only be $v + s_i$ and $v + f_i$, depending on whether a success or failure, respectively, occurs. Upon reaching a successor state, either population $i$ is sampled again, in which case the number of future switches possible does not change, or else a new population is sampled and the maximum number of switches remaining decreases by one. The detailed equations appear in Figure 2. Note that if $\alpha = 0$, then no further switches are possible.
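The recursion just described can be sketched in Python. This is a minimal illustration rather than the authors' implementation: it assumes at least two populations with Beta priors, takes the bandit objective (total failures) as $R^*$, and uses memoized recursion over the extended states $(v, i, \alpha)$ instead of a level-by-level sweep with space reuse. The names `constrained_values`, `p_success`, `observe`, and `cont` are hypothetical.

```python
from functools import lru_cache


def constrained_values(N, S, priors=((1, 1), (1, 1))):
    """Return [min_i R_i^a(0) for a = 0..S], assuming Beta priors, P >= 2,
    N >= 1, and R*(v) = total failures (assumptions of this sketch)."""
    P = len(priors)

    def p_success(i, v):
        a, b = priors[i]
        return (a + v[2 * i]) / (a + b + v[2 * i] + v[2 * i + 1])

    def observe(v, i, success):
        w = list(v)
        w[2 * i + (0 if success else 1)] += 1
        return tuple(w)

    def cont(i, a, w):
        """Best value on reaching successor w after sampling i, with at
        most a switches still allowed."""
        if sum(w) == N:
            return float(sum(w[1::2]))       # R*(w)
        best = R(i, a, w)                    # stay on population i
        if a > 0:                            # or switch, using one switch
            best = min(best, min(R(j, a - 1, w) for j in range(P) if j != i))
        return best

    @lru_cache(maxsize=None)
    def R(i, a, v):
        """R_i^a(v): observe population i at v, then proceed optimally."""
        p = p_success(i, v)
        return (p * cont(i, a, observe(v, i, True))
                + (1 - p) * cont(i, a, observe(v, i, False)))

    start = (0,) * (2 * P)
    return [min(R(i, a, start) for i in range(P)) for a in range(S + 1)]
```

For `N = 2` with uniform priors, `constrained_values(2, 1)` gives 1 expected failure with no switches allowed and 11/12 with one switch allowed. A single call returns the values for every bound from 0 through `S`.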
Analyzing the time and space of the algorithm in Figure 2, one arrives at the results in Theorem 3.1. The space analysis assumes that space is reused for each new value of $m$. However, this requires that one be careful with the ordering in which one goes through the states with $|v| = m$ to make sure that values are not overwritten before all of their uses have occurred. The wavefront dependencies can be satisfied by proceeding through each component of $v$ in increasing fashion, and proceeding through the number of switches in decreasing order. The ordering of the populations is irrelevant.

Theorem 3.1 The value of the optimal experiment of $N$ observations with $P$ Bernoulli populations, with an objective function and a constraint of at most $S$ switches, can be determined in $\Theta(S P^2 N^{2P}/(2P)!)$ time and $\Theta(S P N^{2P-1}/(2P-1)!)$ space by the algorithm in Figure 2. $\Box$

    for all terminal states $v$ (i.e., states where $|v| = N$)
        for all populations $i \in \{1,\ldots,P\}$
            for all switches $\alpha \in \{0,\ldots,S\}$
                initialize $R_i^\alpha(v) = R^*(v)$
    for $m = N-1$ downto 0
        for all states $v$ with $|v| = m$
            for all switches $\alpha \in \{0,\ldots,S\}$
                for all populations $i \in \{1,\ldots,P\}$
                    $r_{suc} = R_i^\alpha(v + s_i)$
                    $r_{fail} = R_i^\alpha(v + f_i)$
                    if $\alpha > 0$ then
                        $r_{suc} = \min\{r_{suc}, \min\{R_j^{\alpha-1}(v + s_i) : j \ne i\}\}$
                        $r_{fail} = \min\{r_{fail}, \min\{R_j^{\alpha-1}(v + f_i) : j \ne i\}\}$
                    $R_i^\alpha(v) = p_i(v) \, r_{suc} + (1 - p_i(v)) \, r_{fail}$
    Expected value $= \min\{R_i^S(0) : i \in \{1,\ldots,P\}\}$

Figure 2: Optimal Experiment for Constraint Model

In practice one would usually store the $R_i^\alpha(v)$ values in a single array. In Fortran, it is best to do this as the array $R(i, \alpha, v)$, since there are multiple innermost loops (the minimum operations) that run through the populations for fixed values of $\alpha$ and $v$.

Figure 2 shows only how to compute the values of $R_i^S(0)$, not what the allocation algorithm is that achieves these values. To determine the algorithm, one needs to record the value of $j$ that achieves the minimum in each place where a minimum operation is performed.
Then one can follow the standard procedure of reconstructing the optimal solution from the beginning of the experiment towards the end by following these pointers.

Note that the algorithm actually finds the value of the optimal experiments corresponding to all constraints less than or equal to $S$, not just the optimal experiment for $S$. This is quite useful, since it allows one to examine the range of tradeoffs between maximum switches vs. expected objective function all from a single run.

In general the most interesting cases are those in which $S$ is quite small, and hence reductions in the effective value of $S$ have a large percentage change in the runtime or space required. For example, $R_i^0(v)$ can often be determined algebraically, and thus it need not be stored. For $\alpha = S$, there have only been observations on a single population, and hence most states cannot occur. There are only approximately $P N^2/2$ states that can occur, as opposed to the approximately $P N^{2P}/(2P)!$ total states, so the time and space requirements for determining $R_i^S$ can be significantly reduced.

4 Cost Model

In the cost model, if the last population sampled was $i$, and we now sample $j$, then we pay a cost $c(i,j)$ (if the first observation is on $i$, then we pay $c(i,i)$). The goal is to minimize the expected value of the terminal objective plus costs. This is a flexible model that includes cost alternative 1 of Section 1 as a special case. A particularly important special case occurs when $c(i,i) = 0$ and $c(i,j) = \alpha$, $i \ne j$, for then the cost component is proportional to the number of switches. Let $R^\alpha$ denote the optimal value obtained using this cost function (where value is now terminal objective plus cost), and let $C^\alpha$ denote the expected cost of the allocation procedure achieving $R^\alpha$. Then $R^\alpha - C^\alpha$ is the optimal expected value of the objective function, under the constraint that the expected cost is no more than $C^\alpha$, i.e., under the constraint that the average number of switches is no more than $C^\alpha/\alpha$. Thus the cost model achieves optimal objective function vs. expected number of switches tradeoffs, but does so through an indirect control parameter ($\alpha$). To investigate a specific expected number of switches, one must search through $\alpha$, though one can exploit the fact that $C^\alpha/\alpha$ is monotone decreasing in $\alpha$.

One could determine the optimal sequential allocation procedure under the cost model by using the same approach as for the constraint model, with only minor changes to add the switching costs. However, this would be impractical, because one would need to set $S = N-1$ to ensure that all possibilities are analyzed, which would greatly reduce the size of experiment that could be analyzed. Instead, at each state $v$, we determine only $P$ quantities. Let $R_i(v)$ denote the expected value of the objective function plus all costs from $v$ to the end of the experiment, given that the next observation is on population $i$ and that one then proceeds optimally to minimize this quantity. Note that $R_i(v)$ does not include any costs incurred in reaching $v$, merely those in proceeding onward from $v$. It is straightforward to see that $\min\{R_i(0) : i \in \{1,\ldots,P\}\}$ is the value of the optimal sequential allocation procedure satisfying the cost model, and that $R_i(v)$ is correctly determined by the recursive equations in Figure 3.
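The search through $\alpha$ mentioned above can be organized as a bisection, since the expected number of switches $C^\alpha/\alpha$ does not increase as $\alpha$ grows. The sketch below is only an illustration: `expected_switches` stands for a caller-supplied function that solves the cost model for a given $\alpha$ and returns $C^\alpha/\alpha$, and `toy_expected_switches` is a made-up stand-in used to exercise the search, not a result from the paper.

```python
from typing import Callable


def cost_for_target_switches(expected_switches: Callable[[float], float],
                             target: float, tol: float = 1e-6) -> float:
    """Bisect for a per-switch cost alpha whose expected number of switches
    is at most `target`, assuming `expected_switches` is nonincreasing."""
    lo, hi = 0.0, 1.0
    # Double hi until the target is met, so that [lo, hi] brackets the answer.
    for _ in range(60):
        if expected_switches(hi) <= target:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise ValueError("target not reached for any tested cost")
    # Invariant: expected_switches(hi) <= target.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_switches(mid) <= target:
            hi = mid
        else:
            lo = mid
    return hi


def toy_expected_switches(alpha: float) -> float:
    """Hypothetical decreasing curve standing in for C^alpha / alpha."""
    return 6.0 / (1.0 + 50.0 * alpha)
```

Calling `cost_for_target_switches(toy_expected_switches, 2.0)` returns a cost close to 0.04 for the toy curve. In an actual problem the expected number of switches presumably changes in jumps as the optimal procedure changes, so a given target may be met only approximately.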
Theorem 4.1 The value of the optimal experiment of $N$ observations with $P$ Bernoulli populations, minimizing the objective function plus costs, can be determined in $\Theta(P^2 N^{2P}/(2P)!)$ time and $\Theta(P N^{2P-1}/(2P-1)!)$ space by the algorithm in Figure 3. $\Box$

    for all terminal states $v$ (i.e., states where $|v| = N$)
        for all populations $i \in \{1,\ldots,P\}$
            initialize $R_i(v) = R^*(v)$
            initialize $C_i(v) = 0$
    for $m = N-1$ downto 0
        for all states $v$ with $|v| = m$
            for all populations $i \in \{1,\ldots,P\}$
                $j_s = \arg\min\{c(i,j) + R_j(v + s_i) : j \in \{1,\ldots,P\}\}$
                $j_f = \arg\min\{c(i,j) + R_j(v + f_i) : j \in \{1,\ldots,P\}\}$
                $R_i(v) = p_i(v) \, [c(i,j_s) + R_{j_s}(v + s_i)] + (1 - p_i(v)) \, [c(i,j_f) + R_{j_f}(v + f_i)]$
                $C_i(v) = p_i(v) \, [c(i,j_s) + C_{j_s}(v + s_i)] + (1 - p_i(v)) \, [c(i,j_f) + C_{j_f}(v + f_i)]$
    Expected value $= \min\{R_i(0) - C_i(0) : i \in \{1,\ldots,P\}\}$

Figure 3: Optimal Experiment for Cost Model

5 Examples

To show that the algorithms of the previous sections are practical, they are applied to two illustrative problems. For both of these, $P = 2$ and beta distributions are used as the priors on both populations. Our primary example is the 2-armed bandit problem with finite horizon, no discounting, and uniform weights. We chose this example because it is well known and has been widely studied, although we could find no work on the expected number of switches. Our goal is to show that the general algorithms can be applied to problems of interesting sizes. (We have solved problems as large as $N = 600$ using standard workstations.) No special properties of the bandit problem were exploited to reduce the run-time or space required. To emphasize this, we applied the same programs to our second example. By changing only the objective function, we addressed the problem of minimizing the mean squared error of the estimate of the product $P_1 P_2$, where $P_i$ is the success probability of population $i$. Note that for the bandit problem the goal is to stay on the better population, while for the estimation problem one should have extensive observations from both populations.
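The cost-model recursion of Figure 3, and the idea of changing only the objective function, can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: costs are $c(i,i) = 0$ and $c(i,j) = \alpha$, priors are Beta distributions, no cost is charged at terminal states or for the first observation, and the reported objective is $R_i(0) - C_i(0)$ for the population $i$ that minimizes $R_i(0)$. For the product problem the terminal estimate is assumed to be the posterior mean, so that $R^*(v)$ is the posterior variance of $P_1 P_2$. All function names are hypothetical.

```python
from functools import lru_cache


def failures(v, priors):
    """Bandit objective: total number of failures at terminal state v."""
    return float(sum(v[1::2]))


def product_mse(v, priors):
    """Posterior variance of the product of the success probabilities,
    assuming independent Beta posteriors (an assumption of this sketch)."""
    m1 = m2 = 1.0                      # E[product] and E[product^2]
    for i, (a, b) in enumerate(priors):
        a, b = a + v[2 * i], b + v[2 * i + 1]
        m1 *= a / (a + b)
        m2 *= a * (a + 1) / ((a + b) * (a + b + 1))
    return m2 - m1 * m1


def cost_model(N, alpha, terminal, priors=((1, 1), (1, 1))):
    """Return (expected objective, expected switching cost) for N >= 1."""
    P = len(priors)

    def c(i, j):
        return 0.0 if i == j else alpha

    def p_success(i, v):
        a, b = priors[i]
        return (a + v[2 * i]) / (a + b + v[2 * i] + v[2 * i + 1])

    def observe(v, i, success):
        w = list(v)
        w[2 * i + (0 if success else 1)] += 1
        return tuple(w)

    @lru_cache(maxsize=None)
    def RC(i, v):
        """Pair (R_i(v), C_i(v)) for sampling population i at state v."""
        p = p_success(i, v)
        r_tot = c_tot = 0.0
        for w, prob in ((observe(v, i, True), p), (observe(v, i, False), 1 - p)):
            if sum(w) == N:            # terminal: objective only, no cost
                r, cost = terminal(w, priors), 0.0
            else:                      # choose the next population j
                j = min(range(P), key=lambda k: c(i, k) + RC(k, w)[0])
                r, cost = c(i, j) + RC(j, w)[0], c(i, j) + RC(j, w)[1]
            r_tot += prob * r
            c_tot += prob * cost
        return r_tot, c_tot

    r, cost = min(RC(i, (0,) * (2 * P)) for i in range(P))
    return r - cost, cost
```

For example, `cost_model(2, 0.1, failures)` gives an expected objective of 11/12 with expected cost 0.05, i.e., an expected 0.5 switches. Passing `product_mse` instead of `failures` reuses the same recursion for the estimation problem.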
5.1 2-Armed Bandit

For the 2-armed bandit with binary response, horizon $N$, and no discounting, the objective is to minimize the total number of failures. In Figure 4, the expected number of switches when using the optimal sequential solution, with no switching considerations, is plotted as a function of the sample size $N$. Note that the number of switches grows roughly as the square root of $N$, for the range of $N$ and values of the priors considered, but that the growth rate seems to be slowing. The maximum number of switches is nearly $N$ in all cases, and is not shown.

[Figure 4: 2-Armed Bandit, No Switching Considerations. Log(Expected Number Switches) vs. Log(Sample Size), for population priors Be(1,9), Be(1,9); Be(1,1), Be(1,1); and Be(1,1), Be(2,2).]

In Figure 5 a), the optimal tradeoffs between the expected value of the objective function and the expected number of switches are plotted. In Figure 5 b) the optimal tradeoffs involve the maximum number of switches. These tradeoffs are expressed in terms of the efficiency of the objective function. For a), the tradeoffs are obtained using the algorithm in Figure 3 with cost structures of the form $c(i,i) = 0$ and $c(i,j) = \alpha$, $i \ne j$, while for b) the tradeoffs are obtained using the algorithm in Figure 2. Note that, in both cases, one obtains very high efficiency with relatively few switches.

[Figure 5: N = 200, Uniform priors on both populations. Efficiency vs. a) Expected Number of Switches and b) Maximum Number of Switches.]

5.2 Product Estimation

The problem of estimating the product of two success probabilities arises in reliability settings, and has been studied several times (see [3, 5] and the references therein). The objective function is the mean squared error of the terminal estimate of $P_1 P_2$, where $P_i$ is the success probability of population $i$. In Figure 6 the optimal tradeoffs of objective function vs. expected number of switches are shown. Due to space constraints, other behaviors of this estimation problem cannot be shown here, but the basic properties observed for the bandit problem have also been seen to occur for this problem. In particular, sequential allocation procedures which do not incorporate switching considerations exhibit high numbers of switches, and the switches are dramatically reduced, with only minuscule loss of efficiency, by the optimal procedures that do incorporate switching considerations.

6 Final Remarks

Practical considerations are important in the conduct of experiments, so it is useful to give investigators ways to directly address such considerations in the design of their experiments. One such consideration is the extensive switching that commonly occurs with sequential designs. This paper has addressed this concern by giving algorithms that optimize objective function vs. switching consideration tradeoffs. Given an arbitrary objective function, and given either switching costs or switching constraints, the algorithms herein determine the optimal sequential experiment for the resulting model.
In some cases an investigator may not utilize the sequential allocation procedure that optimizes a tradeoff, but may want to use it as a benchmark against which suboptimal designs are evaluated.

[Figure 6: Optimal Tradeoffs, Estimating $P_1 P_2$. Efficiency vs. Expected Number of Switches; N = 200, both populations have prior Be(9,1).]

The cost model can easily be extended to depend on the number of observations so far, and on the outcome of the observation. This would allow one to optimize interesting cases such as bandit problems with non-uniform weights. For such bandits the objective function cannot be evaluated with just a knowledge of the standard state space, but can be evaluated by the path-based approach used to evaluate switching costs. For arbitrary objective functions, with trivial changes either algorithm could be adapted to optimize the worst-case values of the objective function, rather than the expected case. One can also merge the two algorithms to optimize the expected objective function plus switching costs, under a constraint on the maximal number of switches allowed.

The algorithms were applied to two examples to show their utility, and to show some of the behavior that occurs. It was observed that sequential designs optimized without regard for switching considerations tend to have extensive switches, but that the number of switches can be dramatically reduced with only minor loss of efficiency in the objective function. It was also observed that, for the problems considered, the expected number of switches for standard optimal sequential designs grows fairly rapidly, roughly on the order of the square root of the sample size, for sample sizes of a few hundred. This growth seems to slow down as the sample size increases, however, and this leads us to believe that the asymptotic rate is far slower. Thus it is expected that purely asymptotic results would poorly predict the observed behavior.

Because asymptotics often give weak guidance for the design of specific experiments, we believe that computational insight and optimization fill an important role. However, to better fill that role, it helps to have computational approaches that model all of the factors that are relevant to the experimenter. This work is just a small piece in a larger project to develop such models and programs.

References

[1] Assawa, M. and Teneketzis, D. (1996), Multi-armed bandits with switching penalties, IEEE Trans. Auto. Control 41: 328-348.
[2] Benzing, H., Kalin, D., Theodorescu, R. (1987), Optimal policies for sequential Bernoulli experiments with switching costs, J. Inform. Process. Cybernet. 23: 599-607.
[3] Hardwick, J. and Stout, Q.F. (1993), Optimal allocation for estimating the product of two means, Computing Science and Stat. 24: 592-596.
[4] Hardwick, J. and Stout, Q.F. (1995), Determining optimal few-stage allocation procedures, Computing Science and Stat. 27.
[5] Page, C. (1987), Sequential designs for estimating products of parameters, Seq. Anal. 6: 351-371.
[6] Schmitz, N. (1993), Optimal Sequentially Planned Decision Procedures, Springer-Verlag Lecture Notes.