
Golrezaei, Nazerzadeh, and Rusmevichientong: Real-time Optimization of Personalized Assortments
Management Science 00(0), pp. 000–000, © 0000 INFORMS

Online Appendix

Appendix A: Numerical Experiments: Appendix to Section 6

A.1. Known Length of the Horizon

In this section, we compare the performance of the EIB, LIB, and Hybrid algorithms with that of the myopic policy and the LP-based heuristics when the length of the horizon is known in advance. We set the initial inventory levels to 100, i.e., $c_i = 100$, $i = 1, 2, \ldots, 73$.

Performance Evaluation: In Table 6, we present the average revenue of each algorithm as a percentage of the upper bound, averaged over all 250 problem instances, for loading factors 1.4, 1.6, and 1.8 and coefficients of variation of 0.1, 1, and 2. As the table shows, when the number of customers is known in advance, the LPR 500 algorithm obtains more than 99% of the upper bound for all the considered problem classes, which implies that more resolving periods are not necessary.

Table 6  Revenue comparison when the length of horizon is known in advance. The standard errors of all numbers are less than 0.1%.

Problem class   Upper bound   Avg. revenue under different policies (as % of the upper bound)
                              Inventory-Balancing   Myopic   One-shot LP    LP Resolving   Hybrid
LF    CV        (in $1000)    EIB     LIB           Policy   LPO    ALPO    LPR 500        H1.5   H2
1.4   2.0       173           97.3    97.2          96.2     69.3   77.9    99.3           98.6   98.9
      1.0       179           97.6    97.8          95.4     83.5   89.2    99.3           98.5   98.8
      0.1       182           98.1    98.5          95.8     95.8   98.0    99.5           98.9   99.1
1.6   2.0       175           98.2    98.3          97.2     68.4   75.4    99.4           98.9   99.0
      1.0       181           98.7    98.9          97.4     83.6   88.3    99.6           99.0   99.0
      0.1       183           99.3    99.3          97.8     95.8   97.8    99.7           99.4   99.5
1.8   2.0       177           98.8    98.9          98.0     65.5   72.1    99.4           99.1   99.1
      1.0       182           99.2    99.3          98.4     79.8   84.6    99.5           99.3   99.4
      0.1       183           99.7    99.8          99.4     95.5   97.6    99.8           99.8   99.8

We note that both the LIB and EIB algorithms outperform the myopic and LPO algorithms. Moreover, the revenue of the EIB and LIB algorithms is within about 2 percentage points of that of the resolving heuristics. Comparing the performance of LPR 500 in Tables 6 and 3 shows that the LPR heuristics are sensitive to uncertainty in the number of customers.
More precisely, the performance of the LPR heuristic decreases significantly when it does not know the exact length of the horizon. In all problem classes, the Hybrid algorithms yield at least as much revenue as the IB policies, since they incorporate additional information about the arrival sequence by using the LP resolving heuristic. Again, in all cases, the LPO algorithm has the lowest revenue, and its performance generally decreases as CV and the loading factor increase. Observe that when CV = 0.1, the One-shot LP heuristics obtain more than 95% of the optimal clairvoyant solution. For small values of CV, the number of customers of each type is highly concentrated around its average; therefore, these heuristics do not suffer from fixing their strategies at the beginning of the horizon. Note that even for small values of CV, our IB algorithms perform better than the One-shot LP heuristics.
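As a hedged illustration of how entries like those in Table 6 can be produced, the sketch below averages per-instance revenue as a percentage of the upper bound and reports the standard error of the mean. It assumes, which the text does not state, that the percentage is computed per problem instance and then averaged (an average of ratios); the function name `pct_of_upper_bound` is hypothetical.

```python
import math
from statistics import mean, stdev


def pct_of_upper_bound(revenues, upper_bounds):
    """Mean of per-instance revenue / upper-bound ratios (in %) and its standard error.

    Assumption: ratios are averaged across instances (average of ratios), and the
    standard error is the sample standard deviation divided by sqrt(N).
    """
    if len(revenues) != len(upper_bounds) or len(revenues) < 2:
        raise ValueError("need at least two paired problem instances")
    ratios = [100.0 * rev / ub for rev, ub in zip(revenues, upper_bounds)]
    return mean(ratios), stdev(ratios) / math.sqrt(len(ratios))
```

For three instances with revenues 97, 98, 99 and an upper bound of 100 each, this returns a mean of 98.0% with a standard error of $1/\sqrt{3} \approx 0.58$ percentage points.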

Transient Behavior: Figure 2 shows the cumulative revenue over time for the myopic, LIB, LPR 500, LPO, and ALPO algorithms with LF = 1.8 and CV = 2. We observe that the myopic policy and the LIB algorithm are very aggressive during the initial periods, resulting in higher cumulative revenues than the One-shot LP and resolving heuristics. Since the resolving heuristics know the exact number of customers in advance, they earn revenue linearly over time. This implies that knowing the true length of the horizon (the number of customers) is essential for the resolving heuristics: if the number of customers is less than its estimated value, these heuristics suffer a significant revenue loss (see Section 6.2).

[Figure 2 (plot): cumulative revenue (×10^5) versus period (0 to 12,000) for the policies listed above.]

Figure 2  The cumulative revenue over time for LF = 1.8 and CV = 2 when the length of the horizon is known in advance.

A.2. Worst-Case Performance

In Section 6.2, we compared different policies in terms of their average performance. Here, we investigate the worst-case performance of different policies. To this aim, we consider 250 random arrival sequences. For each of them, we compute the ratio of the revenue collected by each policy to the corresponding optimal clairvoyant solution. The worst-case performance of a policy is then defined as the minimum of these ratios. Table 7 presents the worst-case performance of all policies for LF = 1.4, 1.6, and 1.8 and CV = 0.1, 1, and 2 when the length of the horizon is drawn from the uniform distribution with $\bar{T} = \mathbb{E}[T]$. Our IB policies outperform the other policies in terms of worst-case performance: they obtain at least 91% of the optimal clairvoyant solution, which is much higher than the theoretical bounds, i.e., 63% for the EIB policy and 50% for the LIB policy. We observe that the LP resolving heuristics perform poorly compared to the IB and Hybrid algorithms.
Furthermore, the One-shot LP heuristics are very sensitive to uncertainty in the arrival sequence (large CV). For instance, when LF = 1.8 and CV = 2, there is an arrival sequence in which they obtain only 3.8% of the optimal clairvoyant solution.
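The worst-case measure defined above (the minimum, over sampled arrival sequences, of a policy's revenue divided by the optimal clairvoyant revenue) can be sketched as follows. The helper `worst_case_performance` is hypothetical, and the two input lists are assumed to be paired by arrival sequence.

```python
from typing import Sequence, Tuple


def worst_case_performance(policy_revenue: Sequence[float],
                           clairvoyant_revenue: Sequence[float]) -> Tuple[float, int]:
    """Return (minimum ratio in %, index of the worst arrival sequence).

    policy_revenue[k] and clairvoyant_revenue[k] are assumed to refer to the
    same sampled arrival sequence k.
    """
    if not policy_revenue or len(policy_revenue) != len(clairvoyant_revenue):
        raise ValueError("need one clairvoyant revenue per arrival sequence")
    ratios = [100.0 * p / opt for p, opt in zip(policy_revenue, clairvoyant_revenue)]
    # min() returns the first index attaining the minimum ratio
    worst = min(range(len(ratios)), key=ratios.__getitem__)
    return ratios[worst], worst
```

Returning the index alongside the ratio makes it easy to inspect the arrival sequence that drives a low entry in Table 7.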

Table 7  Worst-case performance comparison when the length of the horizon is unknown.

Problem class   Worst-case revenue under different policies (as % of the upper bound)
                Inventory-Balancing   Myopic   One-shot LP    LP Resolving       Hybrid
LF    CV        EIB     LIB           Policy   LPO    ALPO    LPR 500   LPR 50   H1.5   H2
1.4   2.0       91.8    91.4          91.0     3.2    4.4     75.8      76.7     88.4   83.2
      1.0       92.2    92.0          91.9     6.8    7.8     76.7      77.4     88.8   84.7
      0.1       92.2    91.8          91.4     72.6   73.3    78.9      79.1     89.5   85.4
1.6   2.0       92.5    92.0          92.0     8.6    8.8     70.9      73.3     90.0   86.9
      1.0       93.2    91.7          91.2     4.8    4.9     72.7      73.1     89.8   87.3
      0.1       92.7    92.8          91.3     66.5   67.4    73.4      74.5     91.1   87.3
1.8   2.0       92.4    92.3          91.2     3.8    3.8     66.2      67.4     90.5   87.1
      1.0       92.8    92.5          92.0     20.9   20.3    68.4      69.0     91.8   88.9
      0.1       93.1    93.2          91.6     60.3   60.5    67.8      68.2     92.6   90.3

A.3. Learning the Customer Types

Here we investigate the performance of the IB algorithms when we do not know the exact value of the selection probability $\phi_i^{z_t}(S)$. Rather, we only have an estimate $\hat\phi_i^{z}(S)$ based on data collected in the previous periods. Since we assume the multinomial logit choice model for each customer type, we maintain an estimate $\hat V^z(t) = (\hat V_0^z(t), \hat V_1^z(t), \ldots, \hat V_n^z(t))$ of the preference weight parameters, where, for each product $i$, we set $\hat V_i^z(t)$ to be proportional to the number of times that a customer of type $z$ purchased DVD $i$ during the previous $t$ periods, and we normalize $\hat V^z(t)$ so that $\hat V_0^z(t) = 1$. As in the previous section, we have 10 customer types and 73 products, here with initial inventory $c_i = 30$.

Table 8  The average revenue for the LIB and EIB algorithms when the underlying parameters are unknown and each algorithm uses the estimated parameters based on data collected in the previous periods.

Problem class                    Upper bound   Revenue (as % of the upper bound)
Loading factor (LF)   CV         (in $100)     LIB     EIB
1.2                   0.2        526           91.3    91.4
                      0.5        524           89.6    89.6
                      0.8        522           86.2    86.4
1.4                   0.2        546           95.4    95.7
                      0.5        544           92.9    93.2
                      0.8        541           89.9    90.6
1.6                   0.2        549           98.6    98.5
                      0.5        548           96.7    96.9
                      0.8        546           93.8    93.9
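A minimal sketch of the estimation step described in A.3, combined with the IB selection rule used throughout this appendix ($S_t \in \arg\max_S \sum_{i\in S} r_i\Psi(I_i/c_i)\hat\phi_i(S)$), is shown below. Several choices are assumptions rather than details from the text: the proportionality constant for $\hat V_i^z$ is fixed by the count of no-purchases (so $\hat V_i^z = N_i/N_0$ and $\hat V_0^z = 1$); the optional `prior` pseudo-count is a hypothetical smoothing knob; and assortments are enumerated by brute force, which is only practical for a handful of products. Ties are broken in favor of the smaller assortment, as in the proof of Lemma 2.

```python
import math
from itertools import combinations


def estimate_mnl_weights(purchase_counts, no_purchase_count, prior=0.0):
    """Hypothetical estimator: V_i proportional to purchases of product i,
    scaled so that the no-purchase weight V_0 equals 1 (V_i = N_i / N_0)."""
    denom = no_purchase_count + prior
    if denom <= 0:
        raise ValueError("need a no-purchase observation or a positive prior")
    return {i: (count + prior) / denom for i, count in purchase_counts.items()}


def mnl_choice_probs(weights, assortment):
    """MNL selection probabilities phi_i(S) = V_i / (1 + sum_{j in S} V_j)."""
    total = 1.0 + sum(weights[j] for j in assortment)
    return {i: weights[i] / total for i in assortment}


def exp_penalty(x):
    """Exponential penalty Psi(x) = (e - e^{1-x}) / (e - 1)."""
    return (math.e - math.exp(1.0 - x)) / (math.e - 1.0)


def ib_assortment(revenues, inventory, capacity, weights, penalty=exp_penalty):
    """Brute-force argmax of sum_{i in S} r_i * Psi(I_i / c_i) * phi_i(S)."""
    available = [i for i in revenues if inventory[i] > 0]  # never offer stock-outs
    best_set, best_value = (), 0.0
    for size in range(1, len(available) + 1):
        for subset in combinations(available, size):
            probs = mnl_choice_probs(weights, subset)
            value = sum(revenues[i] * penalty(inventory[i] / capacity[i]) * probs[i]
                        for i in subset)
            if value > best_value + 1e-12:  # strict improvement: ties keep the smaller set
                best_set, best_value = subset, value
    return best_set, best_value
```

For example, with purchase counts {"a": 3, "b": 1} and 3 no-purchases, the estimated weights are 1 and 1/3; with revenues 10 and 20, inventories 5 and 10, and capacities 10, the sketch offers both products.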
Table 8 shows the revenue of the IB algorithms when these algorithms only have estimates of the preference weight parameters. In absolute terms, the IB algorithms perform well despite not knowing the true parameter values; they obtain about 86%–99% of the upper bound, depending on the coefficient of variation and the loading factor. We observe better performance for a loading factor of 1.6 than for smaller loading factors. The reason is that a larger loading factor, i.e., a longer horizon, allows the algorithms to obtain better estimates of the unknown parameters. Note that the IB algorithms perform well even with few observations. One reason is that, in the setting above, we do not impose any constraint on the size of the assortment that the policies can offer to each customer. This can compensate for inaccuracy in the estimated choice model, since the algorithm can offer large assortments. Furthermore, by Proposition 1, we expect the IB algorithms to be robust with respect to the preference weight parameters.

Appendix B: Relegated Proofs

B.1. Online Appendix to Section 3

Proof of Lemma 2: When the customer arrives in period $t$, if product $j$ has no remaining inventory, then $I_j^{t-1} = 0$, which implies that $r_j\Psi(I_j^{t-1}/c_j) = 0$. By Assumption 1, under our choice model, adding product $j$ to an assortment does not increase the probability that a customer selects any other product. Recall that when both sets $S$ and $S\cup\{j\}$ have the maximum discounted revenue, we choose the set with the smaller number of products. Therefore, product $j$ is never included in the optimal assortment.

Proof of Corollary 1: First, observe that $\Psi(x) \ge x$. This is because $\Psi$ is increasing and concave, with $\Psi(0) = 0$ and $\Psi(1) = 1$. By this observation, we have
$$
\begin{aligned}
\alpha_\Psi(c_{\min})
&= \min_{x\in[0,\,1-1/c_{\min}]}\frac{1-x}{1-\Psi(x)+\frac{1}{c_{\min}}+\int_{x+1/c_{\min}}^{1}\Psi(y)\,dy}\\
&\ge \min_{x\in[0,\,1-1/c_{\min}]}\frac{1-x}{1-x+\frac{1}{c_{\min}}+\int_{x+1/c_{\min}}^{1}\Psi(y)\,dy}\\
&\ge \min_{x\in[0,\,1-1/c_{\min}]}\frac{1-x}{1-x+\frac{1}{c_{\min}}+\int_{x+1/c_{\min}}^{1}dy}
= \min_{x\in[0,\,1-1/c_{\min}]}\frac{1-x}{2(1-x)} = \frac{1}{2}.
\end{aligned}
$$
The second inequality follows from the fact that, for any $x\in[0, 1-1/c_{\min}]$, the lower limit of the integral, $x + 1/c_{\min}$, is at most the upper limit, 1, and $\Psi(y) \le 1$.

In the following, we show that the competitive ratio of the IB algorithm with an increasing, strictly concave, and differentiable penalty function is strictly greater than 1/2.
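First, note that for $x = 0$, the lower bound on the competitive ratio,
$$\frac{1}{1-\Psi(0)+\frac{1}{c_{\min}}+\int_{1/c_{\min}}^{1}\Psi(y)\,dy},$$
is strictly greater than 1/2. This is because $\Psi(0) = 0$ and, since $\Psi$ is increasing and strictly concave with $\Psi(1) = 1$, we have $\int_{1/c_{\min}}^{1}\Psi(y)\,dy < 1 - 1/c_{\min}$; hence the denominator is strictly less than 2. Thus, the result holds because
$$\min_{x\in(0,\,1-1/c_{\min}]}\frac{1-x}{1-\Psi(x)+\frac{1}{c_{\min}}+\int_{x+1/c_{\min}}^{1}\Psi(y)\,dy} > \min_{x\in(0,\,1-1/c_{\min}]}\frac{1-x}{1-x+\frac{1}{c_{\min}}+\int_{x+1/c_{\min}}^{1}dy} = \frac{1}{2}.$$
The inequality holds because, for any differentiable and strictly concave penalty function $\Psi(x)$, we have $\Psi(x) > x$ for all $x\in(0,1)$.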
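The bound in Corollary 1 can also be checked numerically. The sketch below evaluates $\min_x (1-x)/(1-\Psi(x)+1/c_{\min}+\int_{x+1/c_{\min}}^{1}\Psi(y)\,dy)$ on a grid (trapezoid rule for the integral) for the linear penalty $\Psi(x) = x$ and the exponential penalty $\Psi(x) = (e-e^{1-x})/(e-1)$; with $c_{\min} = 1000$ the values come out close to $1/2$ and $1-1/e$, respectively. It also computes the revenue ratio $\frac{1}{n}\sum_{i=1}^{n}\min\{1, \sum_{j=1}^{i}\frac{1}{n-j+1}\}$ of the balancing policy on the instance used in the proof of Lemma 3 (Appendix B.2), which approaches $1-1/e$ as $n$ grows. Grid sizes and function names are illustrative choices.

```python
import math


def trapezoid(f, a, b, m=200):
    """Trapezoid-rule approximation of int_a^b f(y) dy (zero if b <= a)."""
    if b <= a:
        return 0.0
    h = (b - a) / m
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, m)))


def alpha_bound(psi, c_min, grid=1000):
    """Grid evaluation of min over x in [0, 1 - 1/c_min] of
    (1 - x) / (1 - psi(x) + 1/c_min + int_{x + 1/c_min}^1 psi(y) dy)."""
    x_max = 1.0 - 1.0 / c_min
    best = math.inf
    for k in range(grid + 1):
        x = x_max * k / grid
        denom = 1.0 - psi(x) + 1.0 / c_min + trapezoid(psi, x + 1.0 / c_min, 1.0)
        best = min(best, (1.0 - x) / denom)
    return best


def linear_penalty(x):
    return x


def exp_penalty(x):
    return (math.e - math.exp(1.0 - x)) / (math.e - 1.0)


def lemma3_balancing_ratio(n):
    """(1/n) * sum_i min(1, sum_{j=1}^{i} 1/(n - j + 1)) for the n-phase instance."""
    total, partial = 0.0, 0.0
    for i in range(1, n + 1):
        partial += 1.0 / (n - i + 1)  # running sum_{j=1}^{i} 1/(n - j + 1)
        total += min(1.0, partial)
    return total / n
```

For the linear penalty the minimum is attained at $x = 1 - 1/c_{\min}$, where the ratio equals exactly $1/2$.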

B.2. Online Appendix to Section 3.1

Proof of Theorem 2: Applying Theorem 1 with the exponential penalty function $\Psi(x) = \frac{e-e^{1-x}}{e-1}$ and letting $c_{\min}\to\infty$, we have
$$
\begin{aligned}
\alpha(\Psi) &= \min_{x\in[0,1)}\frac{1-x}{1-\frac{e-e^{1-x}}{e-1}+\int_x^1\frac{e-e^{1-y}}{e-1}\,dy}\\
&= \min_{x\in[0,1)}\frac{1-x}{\frac{1}{e-1}\left(e^{1-x}-1+e(1-x)-e^{1-x}+1\right)}\\
&= \min_{x\in[0,1)}\frac{1-x}{\frac{e}{e-1}(1-x)} = \frac{e-1}{e} = 1-\frac{1}{e}.
\end{aligned}
$$
The second part of the theorem follows from Lemma 3.

Proof of Lemma 3:²⁹ Consider a setting with $n$ products, indexed by $1,\ldots,n$, all with revenue equal to 1 and initial inventory of $T/n$. Think of $T$, the length of the horizon, as a very large number (that would tend to infinity) and a multiple of $n$. The number of types is equal to $2^n$. Each type corresponds to a set $\Theta$ of products that a customer of that type likes equally; the no-purchase probability for all types is equal to zero. The arrival process is defined as follows: customers arrive in $n$ phases of equal length, that is, the number of customers in each phase is $T/n$. All the customers in each phase have the same type. We denote the type of the customers in phase $j$ by $\Theta_j$. We have $\Theta_1 = \{1, 2, \ldots, n\}$, and for $2 \le j \le n$, $\Theta_j = \Theta_{j-1}\setminus\{\theta_{j-1}\}$, where $\theta_{j-1}$ is a randomly chosen element of $\Theta_{j-1}$. In other words, the set of products of interest to customers during phase $j$ is the set of products of interest to customers in phase $j-1$ minus one of those products, and $\theta_n$ is the only product of interest to customers in phase $n$; i.e., customers in phase $j$ randomly lose interest in one of the products of interest in phase $j-1$. An example of a sequence of customer types in the $n$ phases is $\{\{1,2,\ldots,n\}, \{1,2,\ldots,n-1\}, \ldots, \{1,2\}, \{1\}\}$. Therefore, there are $n!$ sequences of customer arrivals, each with equal probability. In Lemma 6 in Appendix B, we show that the following Inventory-Balancing policy is optimal among all deterministic policies: offer to each customer all the products with the highest (positive) remaining inventory that are of interest to her.³⁰ Each customer purchases one of the products (if any) offered to her because the no-purchase probability is zero.
Hence, in each phase, the policy described above sells an equal portion of the remaining inventory of each product that is of interest to the customers in that phase (who are all of the same type). For instance, in the first phase, a $1/n$ fraction of the inventory of every product is sold. Note that the rounding error is negligible since $T$ is large. Recall that $\theta_i$ denotes the product that is of no more interest to the customers arriving in phase $i+1$ and after. Let $q_{i,j}$ be the fraction of customers in phase $j$ that bought product $\theta_i$. We have
$$q_{i,j} = \begin{cases}\frac{1}{n-j+1}, & j \le i,\\ 0, & j > i,\end{cases}$$
where $n-j+1$ is the number of products of interest to customers in phase $j$. Therefore, the revenue obtained from product $\theta_i$ is $\min\left\{\frac{T}{n}, \sum_{j=1}^{i}\frac{T}{n}\cdot\frac{1}{n-j+1}\right\}$, and consequently the total revenue of the policy above is equal to
$$\frac{T}{n}\sum_{i=1}^{n}\min\left\{1, \sum_{j=1}^{i}\frac{1}{n-j+1}\right\}.$$
On the other hand, the optimal clairvoyant solution, which knows the customer types in advance, sells all units of product $\theta_i$ to customers in phase $i$ and obtains in total a revenue of $T$. As $n\to\infty$, the ratio of the two revenues, $\frac{1}{n}\sum_{i=1}^{n}\min\{1, \sum_{j=1}^{i}\frac{1}{n-j+1}\}$, converges to $1 - 1/e$. This completes the proof.

²⁹ The proof builds upon ideas from Mehta et al. (2007). Our analysis is different, more rigorous, and applies to a smaller number of products. For instance, their analysis omits the corresponding proof of Lemma 6, which we establish via induction, using the dynamic programming formulation of the problem.

³⁰ We do not investigate whether a randomized algorithm could outperform the aforementioned policy; we do not study that question in this paper.

Lemma 6. For the arrival process described in the proof of Lemma 3, the following inventory-balancing algorithm is optimal among all deterministic policies: offer to the customer all the products with the highest (positive) remaining inventory that are of interest to her.

Proof: Since the revenue from all the products is the same and the no-purchase probability is zero, by Eq. (18), to prove the lemma it suffices to show the following claim.

Claim 1: Consider any two products $i$ and $j$ and remaining inventory levels $(x_1,\ldots,x_n)$ such that $x_i > x_j$. If, with $t$ periods remaining, the type of the arriving customer is $z$ such that $i\in z$ and $j\in z$, then
$$V(t, x_1,\ldots,x_i,\ldots,x_j,\ldots,x_n \mid z) \le V(t, x_1,\ldots,x_i-1,\ldots,x_j+1,\ldots,x_n \mid z).$$
This claim implies that it is better to equalize the inventory levels. Namely, if the inventory of product $i$ is higher than that of product $j$, the value (i.e., expected revenue) of the DP policy would increase if instead we had one additional unit of product $j$ and one less unit of product $i$. We prove the claim by induction on the inventory levels, fixing products $i$ and $j$ (any two products $i$ and $j$). The induction basis is when $x_i = x_j + 1$ (with no restriction on the inventories of the other products). In this case, because of the symmetry in the problem, the value function does not change if we replace one unit of product $i$ with one unit of product $j$.
The reason is that the current customer is interested in both $i$ and $j$, and, because of the symmetry in the arrival process, the probability that a future customer is interested in product $i$ but not in product $j$ is the same as the probability that a future customer is interested in product $j$ but not in product $i$.

Induction step: Consider initial inventory levels $(y_1,\ldots,y_n)$ such that $y_i > y_j + 1$. Assume Claim 1 holds for any other initial inventory levels $(x_1,\ldots,x_n)$ such that $x_k \le y_k$, $1 \le k \le n$, where at least one of these inequalities is strict. To prove the induction step, suppose that the optimal dynamic program starting with inventory levels $(y_1, y_2, \ldots, y_n)$ offers set $S$ to the arriving customer. Hence, by conditioning on the type of the customer in the next period, denoted by $z'$, we have
$$V(t, y_1,\ldots,y_n \mid z) = \sum_{z'\in Z}\Pr[\text{next customer is of type } z' \mid z]\sum_{k\in S}\frac{1}{|S|}\left(1 + V(t-1, y_k-1, y_{-\{k\}} \mid z')\right), \quad (12)$$

where $(y_k-1, y_{-\{k\}})$ represents the same inventory levels as before, only with one less unit of product $k$. Now, by applying the induction hypothesis, we get
$$
\begin{aligned}
V(t, y_1,\ldots,y_n \mid z) &\le \sum_{z'\in Z}\Pr[\text{next customer is of type } z' \mid z]\sum_{k\in S}\frac{1}{|S|}\left(1 + V(t-1, y_k-1, y_i-1, y_j+1, y_{-\{i,j,k\}} \mid z')\right)\\
&\le V(t, y_i-1, y_j+1, y_{-\{i,j\}} \mid z).
\end{aligned}
$$
The last inequality follows from the properties of the optimal dynamic program: the optimal policy starting with inventory levels $(y_i-1, y_j+1, y_{-\{i,j\}})$ may find a set more profitable than $S$ to offer to the customer. Finally, we point out that if $i\in S$ and $k = i$ is chosen by the customer, then $V(t-1, y_k-1, y_i-1, y_j+1, y_{-\{i,j,k\}} \mid z')$ is defined as $V(t-1, y_i-2, y_j+1, y_{-\{i,j\}} \mid z')$; note that in the induction step we assume $y_i \ge y_j + 2$.

B.3. Online Appendix to Section 4: Proof of Proposition 3

In the following, we prove the claim for the EIB algorithm; a similar argument applies to the LIB algorithm. First, we divide the horizon $T$ into $1/\epsilon$ time slots such that $T\epsilon$ is an integer. We only observe the remaining inventory of each product in periods $kT\epsilon$, $0 < k \le 1/\epsilon$. Consider the solution (allocation) of the EIB algorithm for a sequence of customers $\{z_t\}_{t=1}^T$. For this solution, let $\rho_{j,k}$ be the sum of revenue times capacity over all products whose fraction of remaining inventory $I_i^{kT\epsilon}/c_i$ in the $k$th time slot (period $kT\epsilon$) lies in $(1-(j+1)\epsilon,\, 1-j\epsilon]$, where $0 \le j \le 1/\epsilon$ and $0 < k \le 1/\epsilon$. Similarly, let $\gamma_{j,k}$ be the total revenue obtained in the optimal solution (of Primal-S) from the products $i$ with $I_i^{kT\epsilon}/c_i \in (1-(j+1)\epsilon,\, 1-j\epsilon]$, i.e.,
$$\rho_{j,k} = \sum_{i:\ I_i^{kT\epsilon}/c_i\in(1-(j+1)\epsilon,\ 1-j\epsilon]} r_i c_i, \qquad \gamma_{j,k} = \sum_{i:\ I_i^{kT\epsilon}/c_i\in(1-(j+1)\epsilon,\ 1-j\epsilon]} o_i,$$
where $o_i$ is the revenue obtained from selling product $i$ in the optimal solution. Note that $o_i \le r_i c_i$. For any time slot $k$, we define
$$\chi(k) = \sum_{i=1}^{n} c_i r_i\int_{y=I_i^{kT\epsilon}/c_i}^{1}\Psi(y)\,dy = \sum_{i=1}^{n} r_i\int_{y=I_i^{kT\epsilon}}^{c_i}\Psi(y/c_i)\,dy.$$
Notice that $\chi(k)$ is an increasing function of $k$.
Using the fact that in any period $t$ the IB algorithm chooses an assortment $S_t$ that maximizes $\sum_{i\in S} r_i\Psi(I_i^{t-1}/c_i)\,\phi_i^{z_t}(S)$, we will bound the change in the function $\chi$ between two consecutive time slots; see the fifth set of constraints in linear program FRLP. Next, we show that the objective function of FRLP is less than the total revenue of the EIB algorithm. Since the penalty function is exponential, $\Psi(x) = \frac{e-e^{1-x}}{e-1}$, $x\in[0,1]$, the first term in the objective function is given by
$$\chi(1/\epsilon) = \sum_{i=1}^{n} c_i r_i\int_{y=I_i^T/c_i}^{1}\frac{e-e^{1-y}}{e-1}\,dy = \sum_{i=1}^{n} c_i r_i\left(\frac{e}{e-1}\left(1-\frac{I_i^T}{c_i}\right) + \frac{1-e^{1-I_i^T/c_i}}{e-1}\right).$$
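To double-check the closed form for $\chi(1/\epsilon)$ just derived, the sketch below compares it with a direct trapezoid-rule integration of $c_i r_i\int_{I_i^T/c_i}^{1}\Psi(y)\,dy$ for the exponential penalty. The revenues, capacities, and final inventories used in the check are made-up illustrative numbers, not data from the experiments.

```python
import math


def exp_penalty(y):
    """Exponential penalty Psi(y) = (e - e^{1-y}) / (e - 1) on [0, 1]."""
    return (math.e - math.exp(1.0 - y)) / (math.e - 1.0)


def chi_closed_form(r, c, final_inventory):
    """Closed form of chi(1/eps) = sum_i c_i r_i * int_{I_i^T/c_i}^1 Psi(y) dy."""
    total = 0.0
    for r_i, c_i, inv in zip(r, c, final_inventory):
        a = inv / c_i  # fraction of inventory left at the end of the horizon
        total += c_i * r_i * (math.e * (1.0 - a) + 1.0 - math.exp(1.0 - a)) / (math.e - 1.0)
    return total


def chi_numeric(r, c, final_inventory, m=2000):
    """Same quantity, integrating Psi with the trapezoid rule (m subintervals)."""
    total = 0.0
    for r_i, c_i, inv in zip(r, c, final_inventory):
        a = inv / c_i
        if a >= 1.0:
            continue  # nothing sold: the integral over [1, 1] is zero
        h = (1.0 - a) / m
        inner = sum(exp_penalty(a + k * h) for k in range(1, m))
        total += c_i * r_i * h * (0.5 * (exp_penalty(a) + exp_penalty(1.0)) + inner)
    return total
```

When a product sells out ($I_i^T = 0$), its term reduces to $c_i r_i/(e-1)$, since $\int_0^1\Psi(y)\,dy = 1/(e-1)$.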

By the definition of $\rho_{j,1/\epsilon}$, the second term of the objective function is lower bounded as follows:
$$\frac{1}{e-1}\sum_{j=0}^{1/\epsilon}\rho_{j,1/\epsilon}\left(1 + j\epsilon - e^{j\epsilon}\right) \ge \frac{1}{e-1}\sum_{i=1}^{n} c_i r_i\left(1 + \left(1-\frac{I_i^T}{c_i}\right) - e^{1-I_i^T/c_i}\right).$$
The inequality holds because $x - e^{x}$ is a decreasing function of $x$ for $x \ge 0$, and $1 - I_i^T/c_i \ge j\epsilon$ for every product $i$ counted in $\rho_{j,1/\epsilon}$. Therefore,
$$\chi(1/\epsilon) - \frac{1}{e-1}\sum_{j=0}^{1/\epsilon}\rho_{j,1/\epsilon}\left(1 + j\epsilon - e^{j\epsilon}\right) \le \sum_{i=1}^{n} r_i\left(c_i - I_i^T\right),$$
where the left-hand side is the objective function of linear program FRLP and the right-hand side is the revenue of the EIB algorithm.

The next step is to show that any solution of the EIB algorithm corresponds to a feasible solution of linear program FRLP. Without loss of generality, we normalize the revenue of the optimal solution, Primal-S, to 1, which implies the first set of constraints. The second set of constraints holds because $o_i \le r_i c_i$. The third set of constraints follows from the definition of $\rho_{j,k}$ and the fact that $I_i^{kT\epsilon}$ is a decreasing function of $k$. The fourth set of constraints holds because of the definition of $\chi$ and the fact that $\int_{y=x}^{1}\Psi(y)\,dy$ is a decreasing function of $x$. In the following, we prove Lemma 7 stated below, which leads to the last set of constraints. This set of constraints gives a lower bound on the difference between $\chi(k+1)$ and $\chi(k)$ as a function of the optimal solution $\gamma_{j,k}$. To prove the lower bound, we show that under a uniform permutation, the revenue obtained from product $i$ in the optimal solution during periods in $[kT\epsilon, (k+1)T\epsilon)$, denoted by $o_{i,k}$, is concentrated around its average $\epsilon o_i$. Using this concentration and the fact that the IB algorithm chooses a set that maximizes discounted revenue, we obtain the desired bound on the change in the function $\chi$. Hence, any solution of the EIB algorithm corresponds to a feasible solution of the linear program FRLP whose objective is less than the revenue of the algorithm.
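Since the optimal solution, Primal-S, is normalized to one, the optimal value of the linear program FRLP is a number in $[0, 1]$, and by minimizing its objective function we obtain a lower bound on the performance of the EIB algorithm, namely on the ratio of $\mathbb{E}_{\{z_t\}_{t=1}^T}[\text{Rev}_{\text{EIB}}(\{z_t\}_{t=1}^T)]$ to Primal-S.

Lemma 7. Suppose that $\Psi$ is increasing, concave, and twice differentiable with a bounded derivative in $[0, 1]$, and that $\frac{1}{\epsilon c_{\min}} = O(1)$. Then, with high probability, for any $0 < k < 1/\epsilon$,
$$\epsilon\sum_{j=0}^{1/\epsilon}\gamma_{j,k+1}\,\Psi\big(1-(j+1)\epsilon\big) \le \chi(k+1) - \chi(k) + O\!\left(\frac{1}{c_{\min}}\right).$$

Proof of Lemma 7: Note that the contribution of product $i$ to $\chi(k+1) - \chi(k)$ is equal to $r_i c_i\int_{I_i^{t_1}/c_i}^{I_i^{t_0}/c_i}\Psi(y)\,dy$, where $t_0 = kT\epsilon$ and $t_1 = (k+1)T\epsilon$. By the assumption that the derivative of $\Psi$ is bounded, we can replace the integral with a sum and get
$$r_i c_i\int_{I_i^{t_1}/c_i}^{I_i^{t_0}/c_i}\Psi(y)\,dy = r_i\int_{I_i^{t_1}}^{I_i^{t_0}}\Psi(y/c_i)\,dy \ge r_i\sum_{z=I_i^{t_1}+1}^{I_i^{t_0}}\left(\Psi(z/c_i) - O\!\left(\frac{1}{c_{\min}}\right)\right). \quad (13)$$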

Now let us consider the optimal solution. Let $S^t_{\text{opt}}$ denote the set shown to the customer in period $t$ by the optimal solution. Since the Inventory-Balancing algorithm shows each customer an assortment that maximizes $\sum_{i\in S} r_i\Psi(I_i^{t-1}/c_i)\,\phi_i^{z_t}(S)$, we have
$$\sum_{i=1}^{n} r_i\sum_{z=I_i^{t_1}+1}^{I_i^{t_0}}\Psi(z/c_i) \ge \sum_{t=t_0}^{t_1}\sum_{i\in S^t_{\text{opt}}} r_i\Psi(I_i^{t-1}/c_i)\,\phi_i^{z_t}(S^t_{\text{opt}}) \ge \sum_{t=t_0}^{t_1}\sum_{i\in S^t_{\text{opt}}} r_i\Psi(I_i^{t_1}/c_i)\,\phi_i^{z_t}(S^t_{\text{opt}}). \quad (14)$$
The last inequality holds because $\Psi$ is an increasing function. Recall that $o_{i,k}$ is the revenue obtained by the optimal solution from product $i$ during periods in $[kT\epsilon, (k+1)T\epsilon)$. By this definition and the inequality above, we have
$$\sum_{i=1}^{n} r_i\sum_{z=I_i^{t_1}+1}^{I_i^{t_0}}\Psi(z/c_i) \ge \sum_{i=1}^{n}\Psi(I_i^{t_1}/c_i)\,o_{i,k}. \quad (15)$$
In Lemma 8, which is borrowed from Mirrokni et al. (2012) (with some modifications), we show that under a uniform permutation, with high probability, $o_{i,k}$ is concentrated around its average $\epsilon o_i$, where $1/\epsilon$ is the number of time slots. Note that this concentration holds under the uniform permutation and, as we discuss below, it is useful when $\frac{1}{\epsilon c_{\min}} = O(1)$. By this lemma and the above equation,
$$\sum_{i=1}^{n} r_i\sum_{z=I_i^{t_1}+1}^{I_i^{t_0}}\Psi(z/c_i) \gtrsim \sum_{i=1}^{n}\Psi(I_i^{t_1}/c_i)\,\epsilon o_i \quad (16)$$
with high probability, up to the deviation bounded in Lemma 8. Therefore, by Equation (16) and the definition of $\gamma_{j,k+1}$, we have
$$\sum_{i=1}^{n} r_i\sum_{z=I_i^{t_1}+1}^{I_i^{t_0}}\left(\Psi(z/c_i) - O\!\left(\frac{1}{c_{\min}}\right)\right) \ge \epsilon\sum_{j=0}^{1/\epsilon}\Psi\big(1-(j+1)\epsilon\big)\,\gamma_{j,k+1} - O\!\left(\frac{1}{c_{\min}}\right). \quad (17)$$
Since, by (13), the left-hand side is at most $\chi(k+1) - \chi(k)$, the proof is complete.

Lemma 8. If the customers arrive according to a random order (i.e., a permutation chosen uniformly at random) and $\sum_{i=1}^{n} o_i = 1$, then for any $\delta > 0$ and any time slot $k$,
$$\Pr\left[\sum_{i=1}^{n}\left|o_{i,k} - \epsilon o_i\right| > \frac{5}{\epsilon c_{\min}\delta}\right] < \delta.$$
The assumption $\sum_{i=1}^{n} o_i = 1$ means that we have normalized Primal-S to 1. In this lemma, we need $\frac{5}{\epsilon c_{\min}}$ to be either constant or go to 0, which justifies the assumption $\frac{1}{\epsilon c_{\min}} = O(1)$ in Lemma 7.

B.4. Online Appendix to Section 5.2

Proof of Proposition 1: We prove the claim by revisiting the steps of the proof of Theorem 1. Let $\{z_t\}_{t=1}^T$ be the sequence of customers. By Lemma 2, we never offer any product that has no inventory.
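We now construct a solution for Dual$(\{z_t\}_{t=1}^T)$, with the true selection probabilities, as follows:
$$\theta_i = r_i\left(1 - \Psi\left(I_i^T/c_i\right)\right), \qquad \lambda_t = \sum_{i=1}^{n} r_i\left[\Psi\left(I_i^{t-1}/c_i\right)\phi_i^{z_t}(S_t) + 2\epsilon_t\right],$$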

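where $S_t = \arg\max_{S\in\mathcal{S}}\sum_{i\in S} r_i\Psi(I_i^{t-1}/c_i)\,\hat\phi_i^{z_t}(S)$ is the assortment offered by the IB algorithm. Note that we add the error term $\epsilon_t$ to the value of $\lambda_t$ because the assortment $S_t$ is computed using the estimated selection probability $\hat\phi^{z_t}$. This construction gives a feasible dual solution. Indeed, for all $i = 1, 2, \ldots, n$ and $S\in\mathcal{S}$ we have $\phi_i^{z_t}(S) \ge \hat\phi_i^{z_t}(S) - \epsilon_t$ and $\hat\phi_i^{z_t}(S) \ge \phi_i^{z_t}(S) - \epsilon_t$. Hence,
$$\lambda_t \ge \sum_{i=1}^{n} r_i\Psi(I_i^{t-1}/c_i)\left(\hat\phi_i^{z_t}(S_t) - \epsilon_t\right) + 2\sum_{i=1}^{n} r_i\epsilon_t \ge \sum_{i=1}^{n} r_i\Psi(I_i^{t-1}/c_i)\,\hat\phi_i^{z_t}(S_t) + \sum_{i=1}^{n} r_i\epsilon_t,$$
where the first inequality uses $\phi_i^{z_t}(S_t) \ge \hat\phi_i^{z_t}(S_t) - \epsilon_t$ and the second uses $\Psi \le 1$. By the definition of $S_t$, and since $\Psi$ is increasing and $I_i^{t-1} \ge I_i^T$,
$$
\begin{aligned}
\lambda_t &\ge \max_{S\in\mathcal{S}}\Big\{\sum_{i\in S} r_i\Psi(I_i^{t-1}/c_i)\,\hat\phi_i^{z_t}(S)\Big\} + \sum_{i=1}^{n} r_i\epsilon_t \ge \max_{S\in\mathcal{S}}\Big\{\sum_{i\in S} r_i\Psi(I_i^{T}/c_i)\,\hat\phi_i^{z_t}(S)\Big\} + \sum_{i=1}^{n} r_i\epsilon_t\\
&\ge \max_{S\in\mathcal{S}}\Big\{\sum_{i\in S} r_i\Psi(I_i^{T}/c_i)\left(\phi_i^{z_t}(S) - \epsilon_t\right)\Big\} + \sum_{i=1}^{n} r_i\epsilon_t \ge \max_{S\in\mathcal{S}}\sum_{i\in S}(r_i - \theta_i)\,\phi_i^{z_t}(S),
\end{aligned}
$$
where the third inequality uses $\hat\phi_i^{z_t}(S) \ge \phi_i^{z_t}(S) - \epsilon_t$, and the last one follows from the definition of $\theta_i$ and the fact that $\Psi(I_i^T/c_i) \le 1$. It follows from the Weak Duality Theorem that
$$\text{Primal}(\{z_t\}_{t=1}^T) \le \mathbb{E}\left[\sum_{t=1}^{T}\lambda_t + \sum_{i=1}^{n} c_i\theta_i\right] = \mathbb{E}\left[\sum_{i=1}^{n} r_i c_i\left(1 - \Psi(I_i^T/c_i) + \frac{1}{c_i}\sum_{t=I_i^T+1}^{c_i}\Psi(t/c_i) + \frac{2}{c_i}\sum_{t=1}^{T}\epsilon_t\right)\right].$$
Hence,
$$\frac{\mathbb{E}\left[\sum_{i=1}^{n} r_i(c_i - I_i^T)\right]}{\text{Primal}(\{z_t\}_{t=1}^T)} \ge \frac{\mathbb{E}\left[\sum_{i=1}^{n} r_i(c_i - I_i^T)\right]}{\mathbb{E}\left[\sum_{i=1}^{n} r_i c_i\left(1 - \Psi(I_i^T/c_i) + \frac{1}{c_i}\sum_{t=I_i^T+1}^{c_i}\Psi(t/c_i) + \frac{2}{c_i}\sum_{t=1}^{T}\epsilon_t\right)\right]},$$
where $\mathbb{E}[\sum_{i=1}^{n} r_i(c_i - I_i^T)]$ is the revenue of the Inventory-Balancing algorithm. Note that the contribution to Primal$(\{z_t\}_{t=1}^T)$ of any product that is not sold by the optimal solution is zero. Thus, to find the competitive ratio, we only consider products that are sold by the optimal solution. Since the IB algorithm is assumed to sell at least one unit of any product that is sold by the optimal solution, the competitive ratio of the algorithm is at least
$$\min_{(c_i,x):\ x\le c_i-1}\frac{1 - x/c_i}{\frac{1}{c_i}\sum_{t=x+1}^{c_i}\Psi(t/c_i) + 1 - \Psi(x/c_i) + \frac{2}{c_i}\mathbb{E}\left[\sum_{t=1}^{T}\epsilon_t\right]},$$
where $i$ is a product sold by the optimal solution. Therefore, by the same argument as in the proof of Theorem 1, the competitive ratio is at least
$$\min_{x\in[0,\,1-1/c_{\min}]}\frac{1-x}{1-\Psi(x)+\frac{1}{c_{\min}}+\int_{x+1/c_{\min}}^{1}\Psi(y)\,dy+\frac{2}{c_{\min}}\mathbb{E}\left[\sum_{t=1}^{T}\epsilon_t\right]}.$$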
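The final bound in the proof of Proposition 1 differs from the expression in Theorem 1 only through the term $\frac{2}{c_{\min}}\mathbb{E}[\sum_{t}\epsilon_t]$. The sketch below evaluates it on a grid (trapezoid rule for the integral) so that the effect of estimation error can be seen; the parameter `expected_error_sum` stands for $\mathbb{E}[\sum_{t=1}^{T}\epsilon_t]$ and must be supplied by the user, and all names are illustrative. For the linear penalty $\Psi(x) = x$, the minimum is attained at $x = 1 - 1/c_{\min}$ and equals $1/(2 + 2\,\mathbb{E}[\sum_t\epsilon_t])$, e.g., 1/2 without error and 1/4 when the expected error sum is 1.

```python
import math


def trapezoid(f, a, b, m=200):
    """Trapezoid-rule approximation of int_a^b f(y) dy (zero if b <= a)."""
    if b <= a:
        return 0.0
    h = (b - a) / m
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, m)))


def robust_ratio_bound(psi, c_min, expected_error_sum, grid=1000):
    """Grid evaluation of the lower bound at the end of the proof of Proposition 1:
    min over x in [0, 1 - 1/c_min] of
    (1 - x) / (1 - psi(x) + 1/c_min + int_{x+1/c_min}^1 psi + (2/c_min) * E[sum_t eps_t]).
    """
    x_max = 1.0 - 1.0 / c_min
    error_term = 2.0 * expected_error_sum / c_min
    best = math.inf
    for k in range(grid + 1):
        x = x_max * k / grid
        denom = (1.0 - psi(x) + 1.0 / c_min
                 + trapezoid(psi, x + 1.0 / c_min, 1.0) + error_term)
        best = min(best, (1.0 - x) / denom)
    return best
```

Because the error term enters the denominator additively, the bound decreases monotonically as the expected cumulative estimation error grows.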

Appendix C: Asymptotic Optimality of the Dynamic Programming Policy

In this section, we show the asymptotic optimality of the dynamic programming (DP) policy when the types of customers are drawn independently from a known distribution. Namely, we show that the value obtained by the DP policy approaches Primal-S asymptotically when both the capacities and the horizon scale proportionally. Let $\eta^z > 0$, $z\in Z$, be the probability that a customer of type $z$ arrives in each period.³¹ Let $V(t, x_1,\ldots,x_n \mid z)$ denote the maximum expected revenue with $t$ periods remaining, given that a customer of type $z\in Z$ arrives and the remaining inventories are $(x_1,\ldots,x_n)$. Then, the dynamic programming formulation of this problem is given by
$$V(t, x_1,\ldots,x_n \mid z) = \max_{S\in\mathcal{S}:\ x_i > 0\ \forall i\in S}\left\{\sum_{i\in S}\phi_i^z(S)\left[r_i + V(t-1, x_1,\ldots,x_i-1,\ldots,x_n)\right] + \phi_0^z(S)\,V(t-1, x_1,\ldots,x_n)\right\}, \quad (18)$$
where $V(t, x_1,\ldots,x_n) = \sum_{z\in Z}\eta^z V(t, x_1,\ldots,x_n \mid z)$. The terminal condition is $V(0,\cdot) = 0$. We denote the optimal revenue under the dynamic programming formulation by $V(T, c)$, where $c$ is the vector of initial inventories. In computing $V(T, c)$, we take the expectation with respect to the sequence of customers and the customers' choices. For simplicity, we assume that the policy can always offer the empty assortment $S = \emptyset$; thus, the maximum in the dynamic programming equation is always well-defined. The asymptotic optimality result is stated in the following proposition.

Proposition 4 (Asymptotic Optimality of DP). Suppose that customer types are drawn independently from a known distribution such that the probability that a customer of type $z\in Z$ arrives in any period $t$ is $\eta^z$. Then
$$\lim_{\beta\to\infty}\frac{V(\beta T,\beta c)}{\text{Primal-S}(\beta T,\beta c)} = 1,$$
where Primal-S$(\beta T,\beta c)$ is the linear program Primal-S with initial inventories $\beta c$ and horizon length $\beta T$.

In the above proposition, we scale both the horizon and the initial inventory by a scalar $\beta$. The corresponding problem is called the $\beta$-scaled stochastic problem.
To study the asymptotic behavior of dynamic programming, we let $\beta$ go to infinity. We note that Proposition 4 does not imply that the dynamic programming policy is asymptotically optimal for every sequence of customer types; it shows asymptotic optimality only on average over all sequences.

Proof of Proposition 4: By Lemma 4, $V(\beta T,\beta c) \le \text{Primal-S}(\beta T,\beta c)$ for all $T$, $c$, and $\beta > 0$. Now, let $\{\bar y^z(S) : S\in\mathcal{S}, z\in Z\}$ denote an optimal solution of the (unscaled) Primal-S$(T, c)$. Then, it is easy to verify that $\{\beta\bar y^z(S) : S\in\mathcal{S}, z\in Z\}$ is an optimal solution to Primal-S$(\beta T,\beta c)$. To show that $\lim_{\beta\to\infty} V(\beta T,\beta c)/\text{Primal-S}(\beta T,\beta c) = 1$, we construct a deterministic policy $\mu$ for the $\beta$-scaled stochastic problem whose expected revenue approaches Primal-S$(\beta T,\beta c)$ as $\beta$ increases toward infinity. We show that this policy is admissible, that is, the total sales of each product $i$ do not exceed its initial inventory. Therefore, $V(\beta T,\beta c)/\text{Primal-S}(\beta T,\beta c)$ also approaches 1 as $\beta\to\infty$.

³¹ This is an abuse of notation, made for the sake of economy: we previously used $\eta^z$ as the expected number of customers of type $z$, not the probability.

The policy $\mu$ operates as follows: offer a set $S\in\mathcal{S}$ to customers of type $z$ up to $\beta\eta^z\bar y^z(S)$ times. The order in which the sets are offered is arbitrary. Under this policy, we do NOT accept all of the demand generated by offering $S$. Rather, we limit the sales of product $i$ from offering $S$ to customers of type $z$ to at most $\beta\eta^z\phi_i^z(S)\bar y^z(S)$. Let $N(\beta T) = (N^z(\beta T) : z\in Z)$ be a multinomial random vector, where $N^z(\beta T)$ denotes the total number of customers of type $z$ over $\beta T$ periods. Note that $N^z(\beta T)$ has a binomial distribution with parameters $\beta T$ and $\eta^z$. We define the random variable $D_i^z(S, q)$ as the total number of customers of type $z$ who select product $i$ when $S$ is offered under the policy $\mu$, given that there are $q$ customers of type $z$. Since under the policy $\mu$ we do not accept all the demand, the total sales of product $i$ to customers of type $z$ generated by offering $S$ under the policy $\mu$ is given by
$$\text{Sale}_i^{\mu,z}(S) = \min\left\{D_i^z\left(S, N^z(\beta T)\right),\ \beta\eta^z\phi_i^z(S)\bar y^z(S)\right\}.$$
We point out that $\text{Sale}_i^{\mu,z}(S)$ is a random variable because $D_i^z(S, N^z(\beta T))$ is a random variable. Since $\beta\bar y^z(S)$ is a feasible solution of linear program Primal-S$(\beta T,\beta c)$, we have
$$\beta\sum_{z\in Z}\sum_{S\in\mathcal{S}}\eta^z\phi_i^z(S)\bar y^z(S) \le \beta c_i, \qquad i = 1,\ldots,n,$$
which implies that, with probability one,
$$\sum_{z\in Z}\sum_{S\in\mathcal{S}}\text{Sale}_i^{\mu,z}(S) \le \beta\sum_{z\in Z}\sum_{S\in\mathcal{S}}\eta^z\phi_i^z(S)\bar y^z(S) \le \beta c_i, \qquad i = 1,\ldots,n.$$
Therefore, the policy $\mu$ is admissible because the total sales of product $i$ do not exceed its initial inventory. The total revenue over $\beta T$ periods under policy $\mu$ is given by the random variable $\sum_{i=1}^{n} r_i\sum_{S\in\mathcal{S}}\sum_{z\in Z}\text{Sale}_i^{\mu,z}(S)$. Then,
$$
\begin{aligned}
\lim_{\beta\to\infty}\frac{1}{\beta}\sum_{i=1}^{n} r_i\sum_{S\in\mathcal{S}}\sum_{z\in Z}\text{Sale}_i^{\mu,z}(S)
&= \lim_{\beta\to\infty}\sum_{i=1}^{n} r_i\sum_{S\in\mathcal{S}}\sum_{z\in Z}\min\left\{\frac{1}{\beta}D_i^z\left(S, N^z(\beta T)\right),\ \eta^z\bar y^z(S)\phi_i^z(S)\right\}\\
&= \sum_{i=1}^{n} r_i\sum_{S\in\mathcal{S}}\sum_{z\in Z}\min\left\{\lim_{\beta\to\infty}\frac{1}{\beta}D_i^z\left(S, N^z(\beta T)\right),\ \eta^z\bar y^z(S)\phi_i^z(S)\right\}\\
&= \sum_{i=1}^{n} r_i\sum_{S\in\mathcal{S}}\sum_{z\in Z}\eta^z\bar y^z(S)\phi_i^z(S) = \text{Primal-S}(T, c).
\end{aligned}
$$
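To establish the third equality above, note that
$$\frac{1}{\beta}D_i^z\left(S, N^z(\beta T)\right) = \frac{1}{\beta}\sum_{t=1}^{M^z}\mathbb{1}\{B_i^{t,z}(S) = 1\} = \frac{M^z}{\beta}\cdot\frac{1}{M^z}\sum_{t=1}^{M^z}\mathbb{1}\{B_i^{t,z}(S) = 1\},$$
where $M^z := \min\{N^z(\beta T),\ \beta\eta^z\bar y^z(S)\}$ and $B_i^{t,z}(S) = 1$ denotes the event that the $t$th customer of type $z$ selects product $i$ when $S$ is offered, with $\mathbb{E}[\mathbb{1}\{B_i^{t,z}(S) = 1\}] = \phi_i^z(S)$. By the SLLN, we know that $\lim_{\beta\to\infty} N^z(\beta T)/\beta = \eta^z T$ almost surely (a.s.). Since under the policy $\mu$ we only offer $S$ to up to $\beta\eta^z\bar y^z(S)$ customers of type $z$,
$$\lim_{\beta\to\infty}\frac{M^z}{\beta} = \lim_{\beta\to\infty}\frac{\min\{N^z(\beta T),\ \beta\eta^z\bar y^z(S)\}}{\beta} = \eta^z\bar y^z(S) \quad \text{a.s.}$$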

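By a similar argument, $\lim_{\beta\to\infty}\frac{1}{M^z}\sum_{t=1}^{M^z}\mathbb{1}\{B_i^{t,z}(S) = 1\} = \phi_i^z(S)$ a.s. Thus, with probability one, $\lim_{\beta\to\infty}\frac{1}{\beta}D_i^z(S, N^z(\beta T)) = \eta^z\bar y^z(S)\phi_i^z(S)$, which gives the desired result. Since the policy $\mu$ is admissible, $\frac{1}{\beta}\sum_{i=1}^{n} r_i\sum_{S\in\mathcal{S}}\sum_{z\in Z}\text{Sale}_i^{\mu,z}(S) \le \sum_{i=1}^{n} r_i c_i$ for every $\beta$, so by the Dominated Convergence Theorem,
$$\lim_{\beta\to\infty}\mathbb{E}\left[\frac{1}{\beta}\sum_{i=1}^{n} r_i\sum_{S\in\mathcal{S}}\sum_{z\in Z}\text{Sale}_i^{\mu,z}(S)\right] = \text{Primal-S}(T, c).$$
Since the policy $\mu$ is admissible, its expected revenue is at most $V(\beta T,\beta c)$; using $\text{Primal-S}(\beta T,\beta c) = \beta\,\text{Primal-S}(T, c)$, we obtain
$$\lim_{\beta\to\infty}\frac{V(\beta T,\beta c)}{\text{Primal-S}(\beta T,\beta c)} \ge \lim_{\beta\to\infty}\frac{\mathbb{E}\left[\sum_{i=1}^{n} r_i\sum_{S\in\mathcal{S}}\sum_{z\in Z}\text{Sale}_i^{\mu,z}(S)\right]}{\beta\,\text{Primal-S}(T, c)} = 1,$$
which, together with $V(\beta T,\beta c) \le \text{Primal-S}(\beta T,\beta c)$, completes the proof.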
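For very small instances, recursion (18) can be evaluated exactly, which is one way to observe the scaling behavior behind Proposition 4. The sketch below memoizes $V(t, x \mid z)$ and $V(t, x)$ and enumerates all assortments, including the empty one. It assumes an MNL choice model with the no-purchase weight normalized to 1 and i.i.d. types with probabilities $\eta^z$; the function and argument names are hypothetical, and brute-force enumeration limits it to a few products and short horizons.

```python
from functools import lru_cache
from itertools import combinations


def solve_dp(T, capacities, revenues, type_probs, weights):
    """Value V(T, c) of recursion (18) under an assumed MNL choice model.

    type_probs[z] = eta^z, the probability that a period's customer has type z;
    weights[z][i] = MNL preference weight of product i for type z
                    (the no-purchase weight is fixed to 1, an assumption).
    """
    n = len(capacities)
    # every subset of products, including the empty assortment
    assortments = [S for k in range(n + 1) for S in combinations(range(n), k)]

    def choice_probs(z, S):
        total = 1.0 + sum(weights[z][i] for i in S)
        return [weights[z][i] / total for i in S], 1.0 / total

    @lru_cache(maxsize=None)
    def value(t, x):
        # V(t, x) = sum_z eta^z V(t, x | z), with terminal condition V(0, .) = 0
        if t == 0:
            return 0.0
        return sum(eta * value_given(t, x, z) for z, eta in enumerate(type_probs))

    @lru_cache(maxsize=None)
    def value_given(t, x, z):
        best = float("-inf")
        for S in assortments:
            if any(x[i] == 0 for i in S):  # only offer products with inventory
                continue
            probs, p0 = choice_probs(z, S)
            v = p0 * value(t - 1, x)  # no purchase: inventory unchanged
            for p, i in zip(probs, S):
                y = list(x)
                y[i] -= 1
                v += p * (revenues[i] + value(t - 1, tuple(y)))
            best = max(best, v)
        return best

    return value(T, tuple(capacities))
```

In a toy instance with one product, one customer type, unit revenue, preference weight 1, horizon $2\beta$, and inventory $\beta$, the exact value $V(2\beta, \beta)/\beta$ is 0.75 for $\beta = 1$ and 0.8125 for $\beta = 2$.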