Optimality of Myopic Policy for a Class of Monotone Affine Restless Multi-Armed Bandit

University of Southern California
Optimality of Myopic Policy for a Class of Monotone Affine Restless Multi-Armed Bandit
Parisa Mansourifard (USC), Tara Javidi (UCSD), Bhaskar Krishnamachari (USC)
Dec 10, 2012

Introduction
- Multi-Armed Bandits (MAB): a stochastic decision problem
- Select from several alternative arms at each time
- Playing an arm yields an immediate reward
- Goal: play the arms so as to maximize the expected discounted or average reward over a horizon
- Trade-off between exploration and exploitation
- Two categories: rested and restless

Rested MAB: Introduction
- The state of the played arm changes according to a known Markovian rule (Bayesian)
- The remaining arms stay frozen
- Optimal policy: an index can be assigned to the state of each arm, and playing the arm with the largest index at each time is optimal
- Referred to as the Gittins index

Restless MAB: Introduction
- The states of all arms, even those that are not selected, evolve in a Markovian fashion at each time
- An index policy is not optimal in general, while the Whittle index policy is optimal under a relaxed constraint on the average number of arms that can be played at each time
- A PSPACE-hard problem
- In the literature: special classes of RMAB for which particular heuristics are optimal
- Our contribution: a general class of RMAB for which a simple index policy, the myopic policy, is optimal

Myopic policy: Introduction
- Select an arm with the highest immediate reward at each time, ignoring the impact of the current action on future rewards
- Recently, several researchers have shown optimality of the myopic policy under certain conditions for multiple arms evolving as i.i.d. two-state discrete-time Markov chains
- Our contribution: generalizing beyond the specific setting of two-state Markov chains to real-valued states
[Figure: two-state Markov chain with states "bad" (0) and "good" (1) and transition probabilities p01, p11, p10, p00]
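For concreteness, a minimal Python sketch of this two-state setting (the transition values p01, p11 are hypothetical, not from the talk): the belief of a passive arm updates through an affine map, the played arm resets stochastically based on its observation, and the myopic rule plays the largest belief.

    import numpy as np

    # Illustrative two-state setting: each arm is a Markov chain with states
    # "bad" (0) and "good" (1); the arm's real-valued state is the belief
    # that it is currently "good". Values below are hypothetical.
    p01, p11 = 0.2, 0.8            # positively correlated case: p11 >= p01

    def tau(belief):
        # Belief update of an arm that is NOT played: affine in the belief,
        # and a contraction since |p11 - p01| < 1.
        return p01 + (p11 - p01) * belief

    def myopic_arm(beliefs):
        # Myopic rule: highest immediate expected reward; with R(s) = s
        # this is simply the largest belief.
        return int(np.argmax(beliefs))

    beliefs = np.array([0.3, 0.6, 0.5])
    a = myopic_arm(beliefs)                     # plays arm 1
    obs_good = np.random.rand() < beliefs[a]    # observe the played arm
    beliefs = tau(beliefs)                      # passive arms evolve deterministically
    beliefs[a] = p11 if obs_good else p01       # played arm resets stochastically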

A class of RMAB: Problem Formulation
- n independent and stochastically identical arms
- Finite horizon T, time steps t = 1, ..., T
- Only one arm can be played at each time
- Each arm is in a real-valued state s in [s_0, s_max]
- Playing an arm with state s yields an immediate reward with expectation R(s)

Problem Formulation
- s_j(t): the state of arm j at time t
- The state of the selected arm resets stochastically
- The states of the not-played arms evolve according to a deterministic function τ
- State transition of arm j:
    s_j(t+1) = stochastic reset, if arm j is played at time t
    s_j(t+1) = τ(s_j(t)), if arm j is not played at time t
- Prior work is a specific setting of our formulation: τ(s) = p01 + (p11 - p01)s, s_0 = p01, s_max = p11, R(s) = s, and the played arm resets to p11 with probability s_j(t) and to p01 with probability 1 - s_j(t)
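A minimal sketch of one time step of the general model. All parameter values are assumed for illustration, and since the slides leave the reset law unspecified, a uniform draw on [s_0, s_max] stands in for it.

    import numpy as np

    rng = np.random.default_rng(0)
    s0, s_max = 0.2, 0.8        # hypothetical state interval [s_0, s_max]
    lam, b = 0.6, 0.32          # hypothetical affine dynamics tau(s) = lam*s + b

    def tau(s):
        # Deterministic evolution of a not-played arm: monotone increasing,
        # affine, and a contraction (0 <= lam < 1). With these values the
        # fixed point b / (1 - lam) equals s_max, so states stay in range.
        return lam * s + b

    def step(states, a):
        # One transition of the state vector when arm `a` is played.
        nxt = tau(states)                # passive arms evolve deterministically
        nxt[a] = rng.uniform(s0, s_max)  # played arm resets (stand-in reset law)
        return nxt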

Problem Formulation
- Policy vector: π = [π_1, ..., π_T]
- The policy π_t maps the current state vector to the action a(t) in {1, ..., n} of selecting an arm at time t
- The current state vector is a sufficient statistic due to the Markovian dynamics
- Goal: maximize the total discounted expected reward:
    max_π E[ Σ_{t=1}^{T} β^{t-1} R(s_{a(t)}(t)) ]
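Given the step() sketch above, this objective can be estimated for any policy by Monte Carlo rollouts. A minimal sketch, taking R(s) = s as an assumed increasing reward:

    def evaluate(policy, init_states, T=20, beta=0.9, n_runs=2000):
        # Monte Carlo estimate of E[ sum_{t=1..T} beta^(t-1) * R(s_{a(t)}(t)) ].
        total = 0.0
        for _ in range(n_runs):
            s = np.array(init_states, dtype=float)
            for t in range(T):
                a = policy(s)
                total += (beta ** t) * s[a]   # immediate reward, R(s) = s
                s = step(s, a)
        return total / n_runs

    myopic = lambda s: int(np.argmax(s))      # play the largest state (R increasing)
    print(evaluate(myopic, [0.3, 0.5, 0.7]))

Swapping any other policy into evaluate() gives a direct comparison against the myopic baseline.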

Problem Formulation
- Value function V_t(s_1, ..., s_n): the maximum expected remaining reward starting from time t
- Recursive equation (DP):
    V_t(s_1, ..., s_n) = max_{a in {1,...,n}} { R(s_a) + β E[ V_{t+1}(τ(s_1), ..., s'_a, ..., τ(s_n)) ] }
    V_{T+1}(·) = 0
  where the played arm a transitions to its stochastic reset s'_a and every other arm i transitions to τ(s_i)
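For a toy two-arm instance, this recursion can be solved numerically by backward induction on a discretized state grid. The sketch below reuses the assumed dynamics and stand-in uniform reset from above, and approximates the expectation over resets by an average over grid points; grid size, horizon, and discount are arbitrary choices.

    from itertools import product

    K = 41                                   # grid resolution (arbitrary)
    grid = np.linspace(s0, s_max, K)
    T_h, beta = 10, 0.9                      # horizon and discount (assumed)

    def project(x):
        # Nearest grid index: a crude discretization of the real-valued state.
        return int(np.clip(round((x - s0) / (s_max - s0) * (K - 1)), 0, K - 1))

    # V[t][i][j] approximates V_t(grid[i], grid[j]); V[T_h + 1] = 0.
    V = np.zeros((T_h + 2, K, K))
    for t in range(T_h, 0, -1):
        for i, j in product(range(K), repeat=2):
            s = (grid[i], grid[j])
            q = []
            for a in (0, 1):
                p = project(tau(s[1 - a]))   # passive arm's next grid index
                # expectation over the uniform reset of the played arm,
                # approximated by averaging V over the reset axis
                cont = V[t + 1][:, p].mean() if a == 0 else V[t + 1][p, :].mean()
                q.append(s[a] + beta * cont) # R(s) = s
            V[t][i][j] = max(q)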

Problem Formulation
- Optimal policy: a*(t) = argmax_{a in {1,...,n}} { R(s_a(t)) + β E[ V_{t+1}(s(t+1)) ] }
- Myopic policy: a_Myopic(t) = argmax_{a in {1,...,n}} R(s_a(t)), maximizing only the current expected reward
- R(s) is assumed monotonically increasing in s, so the myopic policy simply plays the arm with the largest state

Main Result
Conditions:
- τ is a monotonically increasing and affine function of the state: τ(s) = λs + b
- τ is a contraction mapping, i.e. 0 ≤ λ < 1
Theorem: Under the above conditions, the myopic policy is optimal at every time t = 1, ..., T.
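A rough numerical sanity check of the theorem on the toy instance above (not a proof): extract the greedy action from the backward-induction values V and count where it disagrees with the myopic choice. Under the stated conditions the two should coincide, up to discretization error near ties.

    def dp_action(t, i, j):
        # Greedy action with respect to the values V[t+1] from the sketch above.
        s = (grid[i], grid[j])
        q = []
        for a in (0, 1):
            p = project(tau(s[1 - a]))
            cont = V[t + 1][:, p].mean() if a == 0 else V[t + 1][p, :].mean()
            q.append(s[a] + beta * cont)
        return int(np.argmax(q))

    # Myopic plays the larger state; count disagreements off the diagonal.
    mismatches = sum(dp_action(1, i, j) != (0 if grid[i] > grid[j] else 1)
                     for i, j in product(range(K), repeat=2) if i != j)
    print("disagreements at t = 1:", mismatches)  # expected 0 under the conditions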

Conclusions
- We proved the optimality of the myopic policy for a general class of restless multi-armed bandits
Future work:
- Generalizing to non-identical arms and non-affine evolutions
- Generalizing to multi-dimensional states
- Identifying conditions for problems where the myopic policy is not optimal but another efficient, possibly index-based, policy is optimal
