
Technical Note: An Expectation-Maximization Algorithm to Estimate the Parameters of the Markov Chain Choice Model

A. Serdar Şimşek [1], Huseyin Topaloglu [2]

August 1, 2017

Abstract

We develop an expectation-maximization algorithm to estimate the parameters of the Markov chain choice model. In this choice model, a customer arrives into the system to purchase a certain product. If this product is available for purchase, then the customer purchases it. Otherwise, the customer transitions between the products according to a transition probability matrix until she reaches an available one and purchases this product. The parameters of the Markov chain choice model are the probability that the customer arrives into the system to purchase each one of the products and the entries of the transition probability matrix. In our expectation-maximization algorithm, we treat the path that a customer follows in the Markov chain as the missing piece of the data. Conditional on the final purchase decision of a customer, we show how to compute the probability that the customer arrives into the system to purchase a certain product and the expected number of times that the customer transitions from a certain product to another one. These results allow us to execute the expectation step of our algorithm. Also, we show how to solve the optimization problem that appears in the maximization step of our algorithm. Our computational experiments show that the Markov chain choice model, coupled with our expectation-maximization algorithm, can yield better predictions of customer choice behavior when compared with other commonly used alternatives.

[1] Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, TX 75080, USA, serdar.simsek@utdallas.edu
[2] School of Operations Research and Information Engineering, Cornell Tech, New York, NY 10011, USA, topaloglu@orie.cornell.edu

1 Introduction

Incorporating customer choice behavior into revenue management models has been receiving considerable attention. Traditional revenue management models capture the customer demand for a product through an exogenous random variable whose distribution does not depend on what other products are available. In reality, however, many customers choose and substitute among the available products in a particular product category, either because the customer arrives with no specific product in mind and makes a choice among the offered products or because the customer arrives with a specific product in mind and this product is not available. Field experiments, customer surveys and controlled studies in Zinn and Liu (2001), Chong et al. (2001), Campo et al. (2003), Sloot et al. (2005) and Guadagni and Little (2008) indicate that customers indeed make a choice among the offered products after comparing them with respect to various features and substitute for another product when the product that they originally have in mind is not available. When customers choose and substitute, the demand for a particular product depends on what other products are available. Discrete choice models become useful to model the customer demand when the customers choose and substitute among the available products. Although discrete choice models can provide a more realistic model of the customer demand when compared with using an exogenous random variable, estimating the parameters of a discrete choice model can be challenging.

In this paper, we consider the problem of estimating the parameters of the Markov chain choice model. In this choice model, with a certain probability, a customer arriving into the system is interested in purchasing a certain product. If this product is available for purchase, then the customer purchases it and leaves the system. Otherwise, the customer transitions into another product with a certain transition probability and checks the availability of this next product. In this way, the customer transitions between the products until she reaches an available one. Therefore, the parameters of the Markov chain choice model are the probability that a customer arrives into the system to purchase each one of the products and the probability that a customer transitions from the current product to the next one when the current product is not available.

We develop an expectation-maximization algorithm to estimate the parameters of the Markov chain choice model from the past purchase history of the customers. The expectation-maximization algorithm dates back to Dempster et al. (1977) and it is useful for solving parameter estimation problems when the data available for estimation has a missing piece. In this algorithm, we start from some initial parameter estimates and iterate between the expectation and maximization steps. We focus on the so-called complete log-likelihood function, which is constructed under the assumption that we have access to the missing piece of the data. In the expectation step, we compute the expectation of the complete log-likelihood function, when the distribution of the missing piece of the data is driven by the current parameter estimates. In the maximization step, we maximize the expectation of the complete log-likelihood function to obtain new parameter estimates and repeat the process starting from the new parameter estimates. In parameter estimation problems, our goal is to find a maximizer of the so-called incomplete log-likelihood function, which is constructed under the assumption that we do not have access to the missing piece of the data. Dempster et al. (1977) show that the successive parameter estimates from the expectation-maximization algorithm monotonically improve the value of the incomplete log-likelihood function. Wu (1983) and Nettleton (1999) give regularity conditions to ensure convergence to a local maximum of the incomplete log-likelihood function.

Main Contributions. The data for estimating the parameters of the Markov chain choice model are the set of products made available to each customer and the product purchased by each customer. In our expectation-maximization algorithm, we treat the path that a customer follows in the Markov chain choice model as the missing piece of the data. In the expectation step, we need to compute two quantities, when the distribution of the missing piece of the data is driven by the current parameter estimates. The first quantity is the probability that the customer arrives into the system to purchase a certain product, conditional on the final purchase decision of the customer in the data. The second quantity is the expected number of times that a customer transitions from a certain product to another one, also conditional on the final purchase decision of the customer in the data. We show how to compute these quantities by solving linear systems of equations. Also, we show that the optimization problem in the maximization step has a closed-form solution.

We give a convergence result for our expectation-maximization algorithm. In particular, we show that the value of the incomplete log-likelihood function at the parameter estimates generated by our algorithm monotonically increases and converges to the value at a local maximizer, under the assumption that the parameters of the Markov chain choice model that we are trying to estimate are bounded away from zero. This assumption is arguably mild since we can put a small but strictly positive lower bound on the parameters with a negligible effect on the choice probabilities. In our computational experiments, we fit a Markov chain choice model to different data sets by using our expectation-maximization algorithm and compare the fitted Markov chain choice model with a benchmark that captures the choice process of the customers by using the multinomial logit model. The out-of-sample log-likelihoods of the Markov chain choice model can improve those of the benchmark by as much as 2%.

Related Literature. The Markov chain choice model was proposed by Blanchet et al. (2013). The authors study assortment problems, where there is a fixed revenue associated with each product and the goal is to find a set of products to offer to maximize the expected revenue from a customer. They give a polynomial-time algorithm to solve the assortment problem under the Markov chain choice model exactly. Feldman and Topaloglu (2014) focus on a deterministic approximation for network revenue management problems, where the decision variables correspond to the durations of time during which different subsets of products are offered to the customers. Thus, the number of decision variables increases exponentially with the number of products. The authors show that if the customers choose under the Markov chain choice model, then the number of decision variables in the deterministic approximation increases linearly with the number of products. Desir et al. (2015) study assortment problems under the Markov chain choice model with a constraint on the total space consumption of the offered products. They give a constant-factor approximation algorithm and show that it is NP-hard to approximate the problem better than a fixed constant factor.

The expectation-maximization algorithm is used to estimate the parameters of various choice models. Vulcano et al. (2012) focus on estimating the parameters of the multinomial logit model when the demand is censored so that the customers who do not make a purchase are not recorded in the data. Following their work, we can also deal with demand censorship, as discussed in our conclusions section. Farias et al. (2013), van Ryzin and Vulcano (2015), van Ryzin and Vulcano (2016), Jagabathula and Vulcano (2016) and Jagabathula and Rusmevichientong (2016) consider the ranking-based choice model, where each customer has a ranked list of products in mind and she purchases the most preferred available product. The authors focus on estimating the parameters and coming up with ranked lists supported by the data. Chong et al. (2001), Kök and Fisher (2007), Misra (2008), Vulcano et al. (2010) and Dai et al. (2014) use real data to quantify the revenue improvements when one accounts for the customer choice process in assortment decisions.

Outline. In Section 2, we describe the Markov chain choice model. In Section 3, we provide the incomplete and complete likelihood functions. In Section 4, we give our expectation-maximization algorithm, show how to execute the expectation and maximization steps and discuss convergence. In Section 5, we give computational experiments. In Section 6, we conclude.

2 Markov Chain Choice Model

In the Markov chain choice model, we have n products indexed by N = {1, ..., n}. A customer arriving into the system is interested in purchasing product i with probability λ_i. If this product is available for purchase, then the customer purchases it and leaves the system. If this product is not available for purchase, then the customer transitions from product i to product j with probability ρ_{i,j}. The customer visits different products in this fashion until she visits a product that is available for purchase and purchases it. Naturally, we assume that Σ_{i ∈ N} λ_i = 1 and Σ_{j ∈ N} ρ_{i,j} = 1 for all i ∈ N. We note that a customer may have the option of leaving the system without a purchase in certain settings. To capture such a setting, we can assume that the option of leaving the system without a purchase corresponds to one of the products in N. This product is always available for purchase and if a customer visits this product, then she leaves the system without a purchase.
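The choice process above is straightforward to simulate, which also gives a way to sanity-check the formulas that follow. The sketch below is ours and not from the paper; the function name, the 0-indexed encoding and the toy parameters are illustrative.

```python
# Minimal simulation sketch of the Markov chain choice model: lam[i] is the
# arrival probability for product i and rho[i, j] the transition probability.
import numpy as np

def simulate_purchase(lam, rho, S, rng):
    """Sample the product purchased by one customer offered the set S."""
    i = rng.choice(len(lam), p=lam)   # product the customer arrives to buy
    while i not in S:                 # unavailable: transition and try again
        i = rng.choice(len(lam), p=rho[i])
    return i                          # first available product is purchased

rng = np.random.default_rng(0)
lam = np.array([0.5, 0.3, 0.2])
rho = np.array([[0.0, 0.6, 0.4],
                [0.5, 0.0, 0.5],
                [0.7, 0.3, 0.0]])
samples = [simulate_purchase(lam, rho, {0, 2}, rng) for _ in range(10000)]
print(np.bincount(samples, minlength=3) / 10000)  # empirical purchase frequencies
```

With these toy parameters, a customer interested in product 1 moves to product 0 or product 2 with equal probability, so the empirical frequencies should be close to (0.65, 0, 0.35), matching the probabilities computed from the linear system in (1) below.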

Blanchet et al. (2013) and Feldman and Topaloglu (2014) show that we can solve a linear system of equations to compute the probability that a customer purchases a certain product when we offer the subset S ⊆ N of products. In particular, we let Θ_i(S) be the expected number of times that a customer visits product i during the course of her choice process given that we offer the subset S of products. We can compute Θ(S) = (Θ_1(S), ..., Θ_n(S)) by solving

    Θ_i(S) = λ_i + Σ_{j ∈ N \ S} ρ_{j,i} Θ_j(S)    for all i ∈ N.    (1)

We interpret (1) as follows. On the left side, Θ_i(S) is the expected number of times that a customer visits product i. The expected number of times that a customer visits product i when she arrives into the system is λ_i, yielding the term λ_i on the right side. The expected number of times that a customer visits some product j ∈ N \ S is Θ_j(S). In each one of these visits, she transitions from product j to product i with probability ρ_{j,i}, yielding the term Σ_{j ∈ N \ S} ρ_{j,i} Θ_j(S) on the right side. If product i is available for purchase so that i ∈ S, then a customer can visit this product at most once, since she purchases this product whenever she visits it. So, the expected number of times that a customer visits product i ∈ S is the same as the probability that a customer visits this product, which is, in turn, the same as the probability that a customer purchases this product. Thus, if Θ(S) = (Θ_1(S), ..., Θ_n(S)) is the solution to (1), then the probability that a customer purchases product i ∈ S is Θ_i(S).

Using the vector λ = {λ_i : i ∈ N \ S} and the matrix ρ = {ρ_{i,j} : i, j ∈ N \ S}, by (1), the vector Θ(S) = {Θ_i(S) : i ∈ N \ S} satisfies Θ(S) = λ + ρᵀ Θ(S). Thus, we have (I − ρᵀ) Θ(S) = λ, where I is the identity matrix with the appropriate dimension. By Corollary C.4 in Puterman (1994), if we have Σ_{j ∈ N \ S} ρ_{i,j} < 1 for all i ∈ N \ S, then (I − ρ)⁻¹ exists and has non-negative entries. So, since ((I − ρ)ᵀ)⁻¹ = ((I − ρ)⁻¹)ᵀ, we obtain Θ(S) = ((I − ρ)⁻¹)ᵀ λ. Once we compute Θ(S) = {Θ_i(S) : i ∈ N \ S} by using the last equality, (1) implies that we can compute {Θ_i(S) : i ∈ S} as Θ_i(S) = λ_i + Σ_{j ∈ N \ S} ρ_{j,i} Θ_j(S) for all i ∈ S. This discussion implies that if we have Σ_{j ∈ N \ S} ρ_{i,j} < 1 for all i ∈ N \ S, then there exists a unique value of Θ(S) that satisfies (1) and this value of Θ(S) has non-negative entries.
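A minimal sketch of this computation, assuming the substochasticity condition above holds: restrict ρ to the unavailable products, solve (I − ρᵀ)Θ = λ, and then recover Θ_i(S) for the offered products from (1). The function name and encoding are ours.

```python
# Sketch: purchase probabilities under the Markov chain choice model via the
# linear system (1); assumes sum_j rho[i, j] < 1 for every i in N \ S.
import numpy as np

def purchase_probs(lam, rho, S):
    n = len(lam)
    out = sorted(set(range(n)) - set(S))          # N \ S: unavailable products
    theta = np.zeros(n)
    A = np.eye(len(out)) - rho[np.ix_(out, out)].T
    theta[out] = np.linalg.solve(A, lam[out])     # (I - rho^T) Theta = lambda
    for i in S:                                   # recover offered products via (1)
        theta[i] = lam[i] + sum(rho[j, i] * theta[j] for j in out)
    return theta                                  # theta[i] = purchase prob for i in S

lam = np.array([0.5, 0.3, 0.2])
rho = np.array([[0.0, 0.6, 0.4],
                [0.5, 0.0, 0.5],
                [0.7, 0.3, 0.0]])
print(purchase_probs(lam, rho, {0, 2}))           # [0.65, 0.3, 0.35]; sums to 1 on S
```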

3 Incomplete and Complete Likelihood Functions

Our goal is to estimate the parameters λ = (λ_1, ..., λ_n) and ρ = {ρ_{i,j} : i, j ∈ N} of the Markov chain choice model by using the data on the subsets of products offered to the customers and the purchase decisions of the customers. We use the maximum likelihood method to estimate the parameters (λ, ρ). To capture the product that a customer purchases, we define the random variable Z_i(S) ∈ {0, 1} such that Z_i(S) = 1 if and only if a customer purchases product i when we offer the subset S of products. Therefore, using e_i ∈ R^n to denote the unit vector with a one in the i-th component, we have Z(S) = (Z_1(S), ..., Z_n(S)) = e_i with probability Θ_i(S).

In the data that we have available to estimate the parameters of the Markov chain choice model, there are τ customers indexed by T = {1, ..., τ}. We use Ŝ^t ⊆ N to denote the subset of products offered to customer t. To capture the product purchased by this customer, we define Ẑ^t_i ∈ {0, 1} such that Ẑ^t_i = 1 if and only if customer t purchased product i, which implies that Ẑ^t = (Ẑ^t_1, ..., Ẑ^t_n) is a sample of the random variable Z(Ŝ^t) = (Z_1(Ŝ^t), ..., Z_n(Ŝ^t)). Thus, the data that is available to estimate the parameters of the Markov chain choice model is {(Ŝ^t, Ẑ^t) : t ∈ T}. The probability that customer t purchases product i is Θ_i(Ŝ^t | λ, ρ), where we explicitly show that the solution Θ(S | λ, ρ) = (Θ_1(S | λ, ρ), ..., Θ_n(S | λ, ρ)) to (1) depends on the parameters (λ, ρ) of the Markov chain choice model. In this case, the likelihood of the purchase decision of customer t is given by ∏_{i ∈ N} Θ_i(Ŝ^t | λ, ρ)^{Ẑ^t_i}, where we follow the convention that 0⁰ = 1. The log-likelihood of this purchase decision is Σ_{i ∈ N} Ẑ^t_i log Θ_i(Ŝ^t | λ, ρ). Assuming that the purchase decisions of the different customers are independent of each other, the log-likelihood of the data {(Ŝ^t, Ẑ^t) : t ∈ T} is

    L_I(λ, ρ) = Σ_{t ∈ T} Σ_{i ∈ N} Ẑ^t_i log Θ_i(Ŝ^t | λ, ρ).    (2)

To estimate the parameters of the Markov chain choice model, we can maximize L_I(λ, ρ) subject to the constraint that Σ_{i ∈ N} λ_i = 1, Σ_{j ∈ N} ρ_{i,j} = 1 for all i ∈ N, λ ∈ R^n_+ and ρ ∈ R^{n×n}_+. The difficulty with this approach is that there is no closed-form expression for Θ_i(Ŝ^t | λ, ρ) in the definition of L_I(λ, ρ), which is the main motivation for our expectation-maximization algorithm.
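Given the solver for (1), evaluating L_I(λ, ρ) is direct. The sketch below reuses the purchase_probs helper from the previous sketch and assumes each observation is stored as a pair (offer set, purchased product), which is our encoding of {(Ŝ^t, Ẑ^t) : t ∈ T}.

```python
# Sketch: incomplete log-likelihood (2); reuses purchase_probs from above.
import numpy as np

def incomplete_log_likelihood(lam, rho, data):
    """data: list of (S, k) pairs, S the offered set, k the purchased product."""
    ll = 0.0
    for S, k in data:
        theta = purchase_probs(lam, rho, S)   # solve system (1) for this offer set
        ll += np.log(theta[k])
    return ll
```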

In our expectation-maximization algorithm, we use a likelihood function constructed under the assumption that we have access to additional data for each customer. In particular, we define the random variable F_i ∈ {0, 1} such that F_i = 1 if and only if a customer arriving into the system is interested in purchasing product i. Thus, we have F = (F_1, ..., F_n) = e_i with probability λ_i. Also, we use the random variable X_{i,j}(S) to denote the number of times that a customer transitions from product i to product j during the course of her choice process when we offer the subset S of products. We do not give the probability law for the random variable X_{i,j}(S) explicitly, but we compute certain expectations involving this random variable in the next section.

For each customer t, we assume that we have access to additional data so that we know the product that this customer was interested in purchasing when she arrived into the system, as well as the number of times that she transitioned from each product i to each product j during the course of her choice process. (This assumption is temporary to facilitate our analysis and our expectation-maximization algorithm will not require having access to the additional data.) In particular, we define F̂^t_i ∈ {0, 1} such that F̂^t_i = 1 if and only if customer t was interested in purchasing product i when she arrived into the system. We use X̂^t_{i,j} to denote the number of times that customer t transitioned from product i to product j during the course of her choice process. So, F̂^t = (F̂^t_1, ..., F̂^t_n) and X̂^t = {X̂^t_{i,j} : i, j ∈ N} are respectively samples of the random variables F = (F_1, ..., F_n) and X(Ŝ^t) = {X_{i,j}(Ŝ^t) : i, j ∈ N}.

We construct a likelihood function under the assumption that the data that is available to estimate the parameters of the Markov chain choice model is {(Ŝ^t, Ẑ^t, F̂^t, X̂^t) : t ∈ T}. The probability that customer t is interested in purchasing product i when she arrives into the system is λ_i. Also, given that customer t is interested in purchasing product i upon arrival, the probability that she visits products i, i_1, i_2, ..., i_{k−1}, i_k, j to purchase product j is given by ρ_{i,i_1} ρ_{i_1,i_2} ⋯ ρ_{i_{k−1},i_k} ρ_{i_k,j}. In this case, the likelihood of the purchase decision of customer t is ∏_{i ∈ N} λ_i^{F̂^t_i} ∏_{i,j ∈ N} ρ_{i,j}^{X̂^t_{i,j}}. The log-likelihood of this purchase decision is Σ_{i ∈ N} F̂^t_i log λ_i + Σ_{i,j ∈ N} X̂^t_{i,j} log ρ_{i,j}. Thus, the log-likelihood of the data {(Ŝ^t, Ẑ^t, F̂^t, X̂^t) : t ∈ T} is

    L_C(λ, ρ) = Σ_{t ∈ T} Σ_{i ∈ N} F̂^t_i log λ_i + Σ_{t ∈ T} Σ_{i,j ∈ N} X̂^t_{i,j} log ρ_{i,j}.    (3)

Note that once we know {(F̂^t, X̂^t) : t ∈ T}, {(Ŝ^t, Ẑ^t) : t ∈ T} does not play a role in (3). To estimate the parameters of the Markov chain choice model, knowing {(F̂^t, X̂^t) : t ∈ T} is equivalent to knowing the path that a customer follows in the Markov chain, since the second term on the right side of (3) does not depend on the order in which the transitions take place.

The likelihood function L_C(λ, ρ) in (3) is constructed under the assumption that we have access to additional data for each customer. This likelihood function is known as the complete likelihood function and the subscript C in L_C(λ, ρ) stands for complete. In contrast, the likelihood function L_I(λ, ρ) in (2) is constructed under the assumption that we do not have access to the additional data for each customer. This likelihood function is known as the incomplete likelihood function and the subscript I in L_I(λ, ρ) stands for incomplete. Noting that log x is concave in x, the likelihood function L_C(λ, ρ) is concave in (λ, ρ) and it has a closed-form expression. However, this likelihood function is not immediately useful when estimating the parameters of the Markov chain choice model, since we do not have access to the data {(F̂^t, X̂^t) : t ∈ T} in practice. In the next section, we give an expectation-maximization algorithm that uses the likelihood function L_C(λ, ρ) to estimate the parameters of the Markov chain choice model, while making sure that we do not need to have access to the data {(F̂^t, X̂^t) : t ∈ T}.
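For completeness, a sketch of (3) under the temporary assumption that the complete data is observed; here F[t] is the one-hot arrival indicator F̂^t and X[t] the transition-count matrix X̂^t, both our encodings, and the 0 log 0 = 0 convention is handled explicitly.

```python
# Sketch: complete log-likelihood (3) from (hypothetical) complete data.
import numpy as np

def complete_log_likelihood(lam, rho, F, X):
    F_tot = np.asarray(F, dtype=float).sum(axis=0)   # shape (n,): arrival counts
    X_tot = np.asarray(X, dtype=float).sum(axis=0)   # shape (n, n): transition counts
    with np.errstate(divide="ignore", invalid="ignore"):
        term1 = np.where(F_tot > 0, F_tot * np.log(lam), 0.0)   # 0 log 0 = 0
        term2 = np.where(X_tot > 0, X_tot * np.log(rho), 0.0)
    return term1.sum() + term2.sum()
```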

4 Expectation-Maximization Algorithm

In this section, we describe our expectation-maximization algorithm. We show how to execute the expectation and maximization steps of this algorithm in detail. Lastly, we discuss the convergence properties of the iterates of our expectation-maximization algorithm.

4.1 Overview of the Algorithm

Our expectation-maximization algorithm estimates the parameters of the Markov chain choice model by using the likelihood function L_C(λ, ρ) in (3). Although this algorithm works with the likelihood function L_C(λ, ρ) in (3), it requires having access to the data {(Ŝ^t, Ẑ^t) : t ∈ T}, but not to the data {(F̂^t, X̂^t) : t ∈ T}. In our expectation-maximization algorithm, we start with some estimate of the parameters (λ^1, ρ^1) at the first iteration. At iteration l, we estimate F̂^t_i as the expectation of the random variable F_i conditional on the fact that customer t chooses according to the Markov chain choice model with parameters (λ^l, ρ^l) and her purchase decision is given by Ẑ^t = (Ẑ^t_1, ..., Ẑ^t_n). Similarly, we estimate X̂^t_{i,j} as the expectation of the random variable X_{i,j}(Ŝ^t) conditional on the fact that customer t chooses according to the Markov chain choice model with parameters (λ^l, ρ^l) and her purchase decision is given by Ẑ^t = (Ẑ^t_1, ..., Ẑ^t_n). Computing these conditional expectations to estimate F̂^t_i and X̂^t_{i,j} for all i, j ∈ N, t ∈ T is known as the expectation step. Next, we plug these estimates into (3) to construct the likelihood function L_C(λ, ρ) and maximize this likelihood function subject to the constraint that Σ_{i ∈ N} λ_i = 1, Σ_{j ∈ N} ρ_{i,j} = 1 for all i ∈ N, λ ∈ R^n_+ and ρ ∈ R^{n×n}_+. The optimal solution to this problem yields parameters (λ^{l+1}, ρ^{l+1}) that we use at iteration l + 1. Maximizing the likelihood function L_C(λ, ρ) in this fashion is known as the maximization step. Using the parameters (λ^{l+1}, ρ^{l+1}), we can go back to the expectation step to estimate F̂^t_i and X̂^t_{i,j} for all i, j ∈ N, t ∈ T. The expectation-maximization algorithm iteratively carries out the expectation and maximization steps to generate a sequence of parameters {(λ^l, ρ^l) : l = 1, 2, ...}. We state the expectation-maximization algorithm below and give a code sketch at the end of this subsection.

Step 1. Choose the initial estimates (λ^1, ρ^1) of the parameters of the Markov chain choice model arbitrarily, as long as they satisfy Σ_{i ∈ N} λ^1_i = 1, Σ_{j ∈ N} ρ^1_{i,j} = 1 for all i ∈ N, λ^1 ∈ R^n_+ and ρ^1 ∈ R^{n×n}_+. Initialize the iteration counter by setting l = 1.

Step 2. (Expectation) Assuming that the customers choose according to the Markov chain choice model with parameters (λ^l, ρ^l), set F̂^t_i = E{F_i | Z(Ŝ^t) = Ẑ^t} and X̂^t_{i,j} = E{X_{i,j}(Ŝ^t) | Z(Ŝ^t) = Ẑ^t} for all i, j ∈ N, t ∈ T.

Step 3. (Maximization) Let (λ^{l+1}, ρ^{l+1}) be the maximizer of L^l_C(λ, ρ) = Σ_{t ∈ T} Σ_{i ∈ N} F̂^t_i log λ_i + Σ_{t ∈ T} Σ_{i,j ∈ N} X̂^t_{i,j} log ρ_{i,j} subject to the constraint that Σ_{i ∈ N} λ_i = 1, Σ_{j ∈ N} ρ_{i,j} = 1 for all i ∈ N, λ ∈ R^n_+ and ρ ∈ R^{n×n}_+. Increase l by one and go to Step 2.

In the expectation step, we compute non-trivial conditional expectations. In Section 4.2, we show that we can compute these conditional expectations by solving linear systems of equations. In the maximization step, we maximize the function L^l_C(λ, ρ) subject to linear constraints. In Section 4.3, we show that this optimization problem has a closed-form solution. To estimate the parameters of the Markov chain choice model, we need to maximize the likelihood function L_I(λ, ρ). In Section 4.4, we consider our expectation-maximization algorithm under the assumption that the parameters that we are trying to estimate are bounded away from zero. We show that the value of the likelihood function {L_I(λ^l, ρ^l) : l = 1, 2, ...} at the successive parameter estimates {(λ^l, ρ^l) : l = 1, 2, ...} generated by our algorithm monotonically increases and converges to the value of the likelihood function L_I(λ, ρ) at a local maximizer. In that section, we precisely define what we mean by a local maximizer. We also show that we can still solve the optimization problem in the maximization step in polynomial time when we have a strictly positive lower bound on the parameters.
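Putting the three steps together gives the loop sketched below; e_step and m_step stand for the computations detailed in Sections 4.2 and 4.3 (sketched there as well), the initialization is the uniform one used in Section 5, and the stopping rule mirrors the 0.01% relative-improvement criterion mentioned there. All names are ours.

```python
# Sketch: the overall EM iteration of Steps 1-3.
import numpy as np

def em_markov_chain(data, n, max_iters=100, rel_tol=1e-4):
    lam = np.full(n, 1.0 / n)                      # Step 1: uniform initial estimates
    rho = np.full((n, n), 1.0 / n)
    prev = -np.inf
    for _ in range(max_iters):
        F_hat, X_hat = e_step(lam, rho, data)      # Step 2: conditional expectations
        lam, rho = m_step(F_hat, X_hat)            # Step 3: closed-form maximizer
        ll = incomplete_log_likelihood(lam, rho, data)
        if np.isfinite(prev) and ll - prev < rel_tol * abs(prev):
            break                                  # incomplete LL barely improved
        prev = ll
    return lam, rho
```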

4.2 Expectation Step

In the expectation step of the expectation-maximization algorithm, customer t is offered the subset Ŝ^t of products. She chooses among these products according to the Markov chain choice model with parameters (λ^l, ρ^l). We know that the purchase decision of this customer is given by the vector Ẑ^t, which is to say that if we know that customer t purchased product k, then Ẑ^t = e_k. We need to compute the conditional expectations E{F_i | Z(Ŝ^t) = Ẑ^t} and E{X_{i,j}(Ŝ^t) | Z(Ŝ^t) = Ẑ^t}. In this section, we show that we can compute these conditional expectations by solving linear systems of equations.

For notational brevity, we omit the superscript t indexing the customer and the superscript l indexing the iteration counter. In particular, we consider the case where a customer is offered the subset S of products. She chooses among these products according to the Markov chain choice model with parameters (λ, ρ). We know that the customer purchased some product k. In other words, noting that the purchase decision of a customer is captured by the random variable Z(S) = (Z_1(S), ..., Z_n(S)), we know that Z(S) = e_k. We want to compute the conditional expectations E{F_i | Z(S) = e_k} and E{X_{i,j}(S) | Z(S) = e_k}.

Computation of E{F_i | Z(S) = e_k}. The expectation E{F_i | Z(S) = e_k} is conditional on having Z(S) = e_k. Thus, we know that a customer purchased product k out of the subset S of products, which implies that we must have k ∈ S. Therefore, we assume that k ∈ S in our discussion. Using the Bayes rule, we have

    E{F_i | Z(S) = e_k} = P{F_i = 1 | Z(S) = e_k} = P{Z_k(S) = 1 | F_i = 1} P{F_i = 1} / P{Z_k(S) = 1}.    (4)

On the right side of (4), P{Z_k(S) = 1 | F_i = 1} is the probability that a customer purchases product k out of the subset S of products given that she is interested in purchasing product i when she arrives into the system. This probability is simple to compute when i ∈ S. In particular, if product i is offered and a customer is interested in purchasing product i when she arrives into the system, then this customer definitely purchases product i. Thus, letting 1(·) be the indicator function, we have P{Z_k(S) = 1 | F_i = 1} = 1(i = k) for all i ∈ S. We focus on computing P{Z_k(S) = 1 | F_i = 1} for all i ∈ N \ S. Letting Ψ_k(i, S) = P{Z_k(S) = 1 | F_i = 1} for all i ∈ N \ S for notational brevity, we can compute {Ψ_k(i, S) : i ∈ N \ S} by solving the linear system of equations

    Ψ_k(i, S) = ρ_{i,k} + Σ_{j ∈ N \ S} ρ_{i,j} Ψ_k(j, S)    for all i ∈ N \ S.    (5)

We interpret (5) as follows. On the left side, Ψ_k(i, S) is the probability that a customer purchases product k out of the subset S of products given that she is interested in purchasing product i upon arrival. For this customer to purchase product k, she may transition from product i to product k, yielding ρ_{i,k} on the right side. Alternatively, the customer may transition from product i to some product j ∈ N \ S, at which point, she is identical to a customer interested in purchasing product j upon arrival and this customer purchases product k with probability Ψ_k(j, S). This reasoning yields Σ_{j ∈ N \ S} ρ_{i,j} Ψ_k(j, S) on the right side. By the same discussion at the end of Section 2, if Σ_{j ∈ N \ S} ρ_{i,j} < 1 for all i ∈ N \ S, then there exists a unique solution to the system of equations in (5). Thus, if {Ψ_k(i, S) : i ∈ N \ S} solve (5), then we have Ψ_k(i, S) = P{Z_k(S) = 1 | F_i = 1} for all i ∈ N \ S. For notational uniformity, noting the discussion right before (5), we let Ψ_k(i, S) = 1(i = k) for all i ∈ S. In this case, we have Ψ_k(i, S) = P{Z_k(S) = 1 | F_i = 1} for all i ∈ N.
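A sketch of solving (5), in the same spirit as the purchase_probs sketch in Section 2: restrict ρ to the unavailable products and solve a small dense system. Setting Ψ_k(i, S) = 1(i = k) for i ∈ S follows the notational convention introduced above; the function name is ours.

```python
# Sketch: conditional purchase probabilities Psi_k(., S) via system (5).
import numpy as np

def psi(rho, S, k):
    n = rho.shape[0]
    out = sorted(set(range(n)) - set(S))      # N \ S
    A = np.eye(len(out)) - rho[np.ix_(out, out)]
    b = rho[out, k]                           # one-step transitions into product k
    psi_full = np.zeros(n)
    psi_full[out] = np.linalg.solve(A, b)     # Psi_k(i, S) for unavailable i
    psi_full[k] = 1.0                         # Psi_k(i, S) = 1(i = k) for i in S
    return psi_full
```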

The other probabilities on the right side of (4) are simple to compute. Noting that P{F_i = 1} is the probability that a customer arriving into the system is interested in purchasing product i, we have P{F_i = 1} = λ_i. Similarly, since P{Z_k(S) = 1} is the probability that a customer purchases product k out of the subset S of products, we have P{Z_k(S) = 1} = Θ_k(S), where (Θ_1(S), ..., Θ_n(S)) solve the system of equations in (1). Putting the discussion so far together, we compute {Ψ_k(i, S) : i ∈ N \ S} by solving the system of equations in (5). Also, letting Ψ_k(i, S) = 1(i = k) for all i ∈ S, by (4), for all i ∈ N, we have

    E{F_i | Z(S) = e_k} = P{Z_k(S) = 1 | F_i = 1} P{F_i = 1} / P{Z_k(S) = 1} = Ψ_k(i, S) λ_i / Θ_k(S).    (6)

When we offer the subset S of products, Ψ_k(i, S) is the purchase probability of product k conditional on the fact that the customer is interested in purchasing product i when she arrives into the system, which can be computed by solving (5), whereas Θ_k(S) is the unconditional purchase probability of product k, which can be computed by solving (1). The systems of equations in (1) and (5) are similar to each other. Blanchet et al. (2013) and Feldman and Topaloglu (2014) use (1) to compute unconditional purchase probabilities, but it is interesting that a slight variation of (1) allows computing conditional purchase probabilities.

Computation of E{X_{i,j}(S) | Z(S) = e_k}. Similar to our earlier argument, since the expectation E{X_{i,j}(S) | Z(S) = e_k} is conditional on having Z(S) = e_k, we know that a customer purchased product k out of the subset S of products, which implies that k ∈ S. Therefore, we assume that k ∈ S in our discussion. The random variable X_{i,j}(S) captures the number of times that a customer transitions from product i to product j during the course of her choice process when we offer the subset S of products. If we have i ∈ S, then a customer cannot transition from product i to another product, since the customer purchases product i whenever she visits it. So, X_{i,j}(S) = 0 with probability one for all i ∈ S, which implies that E{X_{i,j}(S) | Z(S) = e_k} = 0. We focus on computing the expectation E{X_{i,j}(S) | Z(S) = e_k} for all i ∈ N \ S.

We define the random variable Y^m_i(S) ∈ {0, 1} such that Y^m_i(S) = 1 if and only if the m-th product that a customer visits during the course of her choice process is product i when we offer the subset S of products. Since X_{i,j}(S) is the number of times a customer transitions from product i to product j when we offer the subset S of products, we have X_{i,j}(S) = Σ_{m=1}^∞ 1(Y^m_i(S) = 1, Y^{m+1}_j(S) = 1). So, we have

    E{X_{i,j}(S) | Z(S) = e_k}
        = Σ_{m=1}^∞ P{Y^m_i(S) = 1, Y^{m+1}_j(S) = 1 | Z(S) = e_k}
        = Σ_{m=1}^∞ P{Y^{m+1}_j(S) = 1 | Y^m_i(S) = 1, Z(S) = e_k} P{Y^m_i(S) = 1 | Z(S) = e_k},    (7)

where the second equality is by the Bayes rule. We focus on each one of the probabilities P{Y^{m+1}_j(S) = 1 | Y^m_i(S) = 1, Z(S) = e_k} and P{Y^m_i(S) = 1 | Z(S) = e_k} on the right side of (7) separately.

Considering the probability P{Y^m_i(S) = 1 | Z(S) = e_k}, from the perspective of the final purchase decision, a customer that visits product i as the m-th product is indistinguishable from a customer that visits product i as the first product. Thus, we have P{Z(S) = e_k | Y^m_i(S) = 1} = P{Z(S) = e_k | F_i = 1}. In this case, by using the Bayes rule once more, we have

    P{Y^m_i(S) = 1 | Z(S) = e_k}
        = P{Z_k(S) = 1 | Y^m_i(S) = 1} P{Y^m_i(S) = 1} / P{Z_k(S) = 1}
        = P{Z_k(S) = 1 | F_i = 1} P{Y^m_i(S) = 1} / P{Z_k(S) = 1}
        = Ψ_k(i, S) P{Y^m_i(S) = 1} / Θ_k(S),    (8)

where we compute {Ψ_k(i, S) : i ∈ N \ S} by solving (5). On the other hand, considering the probability P{Y^{m+1}_j(S) = 1 | Y^m_i(S) = 1, Z(S) = e_k}, by the Bayes rule, we also have

    P{Y^{m+1}_j(S) = 1 | Z(S) = e_k, Y^m_i(S) = 1}
        = P{Z_k(S) = 1 | Y^{m+1}_j(S) = 1, Y^m_i(S) = 1} P{Y^{m+1}_j(S) = 1 | Y^m_i(S) = 1} / P{Z_k(S) = 1 | Y^m_i(S) = 1}
        = P{Z_k(S) = 1 | Y^{m+1}_j(S) = 1} P{Y^{m+1}_j(S) = 1 | Y^m_i(S) = 1} / P{Z_k(S) = 1 | Y^m_i(S) = 1}
        = P{Z_k(S) = 1 | F_j = 1} ρ_{i,j} / P{Z_k(S) = 1 | F_i = 1}
        = Ψ_k(j, S) ρ_{i,j} / Ψ_k(i, S).    (9)

In the chain of equalities above, the second equality uses the fact that if we know the (m + 1)-st product that a customer visits, then the distribution of the product that she purchases does not depend on the m-th product that this customer visits. The third equality uses the fact that a customer that visits product j as the (m + 1)-st product is indistinguishable from a customer that visits product j as the first product from the perspective of the final purchase decision. The fourth equality is by the fact that given that a customer visits product i as the m-th product, the probability that she visits product j next is given by the transition probability ρ_{i,j}.

To compute the conditional expectation E{X_{i,j}(S) | Z(S) = e_k}, we use (8) and (9) in (7) to get

    E{X_{i,j}(S) | Z(S) = e_k}
        = Σ_{m=1}^∞ P{Y^{m+1}_j(S) = 1 | Y^m_i(S) = 1, Z(S) = e_k} P{Y^m_i(S) = 1 | Z(S) = e_k}
        = Σ_{m=1}^∞ [Ψ_k(j, S) ρ_{i,j} / Ψ_k(i, S)] [Ψ_k(i, S) P{Y^m_i(S) = 1} / Θ_k(S)]
        = [Ψ_k(j, S) ρ_{i,j} / Θ_k(S)] Σ_{m=1}^∞ P{Y^m_i(S) = 1}
        = Ψ_k(j, S) ρ_{i,j} Θ_i(S) / Θ_k(S).    (10)

In the last equality above, we use the fact that Σ_{m=1}^∞ P{Y^m_i(S) = 1} corresponds to the expected number of times that a customer visits product i given that we offer the subset S of products, in which case, by the discussion in Section 2, this quantity is given by Θ_i(S).

The discussion in this section shows how we can compute the conditional expectations E{F_i | Z(S) = e_k} and E{X_{i,j}(S) | Z(S) = e_k}. The main bulk of the work involves solving the systems of equations in (1) and (5) to obtain (Θ_1(S), ..., Θ_n(S)) and {Ψ_k(i, S) : i ∈ N \ S}. Using (6) and (10), we can give explicit expressions to execute the expectation step of our expectation-maximization algorithm. We replace (λ, ρ) with (λ^l, ρ^l) and S with Ŝ^t in (1) and solve this system of equations. We use (Θ^l_1(Ŝ^t), ..., Θ^l_n(Ŝ^t)) to denote the solution. Similarly, we replace (λ, ρ) with (λ^l, ρ^l) and S with Ŝ^t in (5) and solve this system of equations. We use {Ψ^l_k(i, Ŝ^t) : i ∈ N \ Ŝ^t} to denote the solution. Also, we let Ψ^l_k(i, Ŝ^t) = 1(i = k) for all i ∈ Ŝ^t, where k is the product purchased by customer t so that Ẑ^t = e_k. In this case, by (6), for all i ∈ N and t ∈ T, we have F̂^t_i = E{F_i | Z(Ŝ^t) = e_k} = Ψ^l_k(i, Ŝ^t) λ^l_i / Θ^l_k(Ŝ^t). Also, by (10), for all i ∈ N \ Ŝ^t, j ∈ N and t ∈ T, we have X̂^t_{i,j} = E{X_{i,j}(Ŝ^t) | Z(Ŝ^t) = e_k} = Ψ^l_k(j, Ŝ^t) ρ^l_{i,j} Θ^l_i(Ŝ^t) / Θ^l_k(Ŝ^t). Finally, we set X̂^t_{i,j} = 0 for all i ∈ Ŝ^t, j ∈ N and t ∈ T.
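The explicit expressions above translate into a short expectation step. The sketch below reuses purchase_probs and psi from the earlier sketches, encodes customer t's data as (Ŝ^t, k) with k the purchased product, and accumulates the sums over t that the maximization step needs.

```python
# Sketch: expectation step via (6) and (10), summed over customers.
import numpy as np

def e_step(lam, rho, data):
    n = len(lam)
    F_hat = np.zeros(n)                        # sum over t of F_hat^t
    X_hat = np.zeros((n, n))                   # sum over t of X_hat^t
    for S, k in data:
        theta = purchase_probs(lam, rho, S)    # system (1)
        psi_k = psi(rho, S, k)                 # system (5), extended to all of N
        F_hat += psi_k * lam / theta[k]                       # equation (6)
        for i in set(range(n)) - set(S):                      # equation (10)
            X_hat[i] += psi_k * rho[i] * theta[i] / theta[k]
    return F_hat, X_hat
```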

4.3 Maximization Step

In the maximization step of the expectation-maximization algorithm, we need to maximize the function L^l_C(λ, ρ) = Σ_{t ∈ T} Σ_{i ∈ N} F̂^t_i log λ_i + Σ_{t ∈ T} Σ_{i,j ∈ N} X̂^t_{i,j} log ρ_{i,j} subject to the constraint that Σ_{i ∈ N} λ_i = 1, Σ_{j ∈ N} ρ_{i,j} = 1 for all i ∈ N, λ ∈ R^n_+ and ρ ∈ R^{n×n}_+. This optimization problem decomposes into 1 + n problems given by

    max { Σ_{t ∈ T} Σ_{i ∈ N} F̂^t_i log λ_i : Σ_{i ∈ N} λ_i = 1, λ_i ≥ 0 for all i ∈ N }    (11)

and

    max { Σ_{t ∈ T} Σ_{j ∈ N} X̂^t_{i,j} log ρ_{i,j} : Σ_{j ∈ N} ρ_{i,j} = 1, ρ_{i,j} ≥ 0 for all j ∈ N }    for all i ∈ N.    (12)

Problem (11) corresponds to the problem of computing the maximum likelihood estimators of the parameters (λ_1, ..., λ_n) of the multinomial distribution, where λ_i is the probability of observing outcome i in each trial, we have a total of Σ_{t ∈ T} Σ_{i ∈ N} F̂^t_i trials and we observe outcome i in Σ_{t ∈ T} F̂^t_i trials. In this case, the maximum likelihood estimator of λ_i is known to be Σ_{t ∈ T} F̂^t_i / Σ_{t ∈ T} Σ_{j ∈ N} F̂^t_j; see Section 2.2 in Bishop (2006). Therefore, the optimal solution to problem (11) is obtained by setting λ_i = Σ_{t ∈ T} F̂^t_i / Σ_{t ∈ T} Σ_{j ∈ N} F̂^t_j for all i ∈ N. In the next section, we focus on our expectation-maximization algorithm under the assumption that the parameters of the Markov chain choice model are known to be bounded away from zero by some ε > 0. In this case, the maximization step requires solving the first problem above with a lower bound of ε on the decision variables (λ_1, ..., λ_n). In Online Appendix A, we discuss how to solve this optimization problem. Repeating this discussion with ε = 0 also shows that we can obtain the optimal solution to problem (11) by setting λ_i = Σ_{t ∈ T} F̂^t_i / Σ_{t ∈ T} Σ_{j ∈ N} F̂^t_j for all i ∈ N.

Each one of the n problems in (12) has the same structure as problem (11). Following the same argument used to find the optimal value of λ_i, the optimal solution to each one of the n problems in (12) is obtained by setting ρ_{i,j} = Σ_{t ∈ T} X̂^t_{i,j} / Σ_{t ∈ T} Σ_{k ∈ N} X̂^t_{i,k} for all j ∈ N. Thus, to execute the maximization step of our expectation-maximization algorithm, we simply set λ^{l+1}_i = Σ_{t ∈ T} F̂^t_i / Σ_{t ∈ T} Σ_{j ∈ N} F̂^t_j and ρ^{l+1}_{i,j} = Σ_{t ∈ T} X̂^t_{i,j} / Σ_{t ∈ T} Σ_{k ∈ N} X̂^t_{i,k} for all i, j ∈ N.
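In code, the maximization step is just a normalization of the expected counts accumulated in the expectation step. The sketch below implements the closed-form solutions of (11) and (12) and omits the ε lower bound of Section 4.4; the zero-row guard is our own safeguard for products that are always offered and hence never generate transitions.

```python
# Sketch: closed-form maximization step from the solutions of (11) and (12).
import numpy as np

def m_step(F_hat, X_hat):
    lam = F_hat / F_hat.sum()                          # maximizer of (11)
    row_sums = X_hat.sum(axis=1, keepdims=True)
    rho = np.divide(X_hat, row_sums,                   # maximizer of (12), row by row
                    out=np.zeros_like(X_hat), where=row_sums > 0)
    return lam, rho
```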

4.4 Convergence of the Algorithm

We can give a convergence result for our expectation-maximization algorithm under the assumption that the parameters of the Markov chain choice model that we are trying to estimate are known to be bounded away from zero by some ε > 0. Under this assumption, we execute the maximization step of the expectation-maximization algorithm slightly differently. In particular, we let (λ^{l+1}, ρ^{l+1}) be the maximizer of L^l_C(λ, ρ) = Σ_{t ∈ T} Σ_{i ∈ N} F̂^t_i log λ_i + Σ_{t ∈ T} Σ_{i,j ∈ N} X̂^t_{i,j} log ρ_{i,j} subject to the constraint that Σ_{i ∈ N} λ_i = 1, Σ_{j ∈ N} ρ_{i,j} = 1 for all i ∈ N, λ_i ≥ ε for all i ∈ N and ρ_{i,j} ≥ ε for all i, j ∈ N. In other words, we impose a lower bound of ε on the decision variables. In Online Appendix A, we show that we can still solve the last optimization problem in polynomial time. The assumption that the parameters of the Markov chain choice model are known to be bounded away from zero by some ε > 0 allows us to satisfy certain regularity conditions when we study the convergence of our expectation-maximization algorithm. This assumption is arguably mild, since we can put a small lower bound of ε > 0 on the parameters with negligible effect on the choice probabilities.

To give a convergence result for our expectation-maximization algorithm, we let

    Ω = {(λ, ρ) ∈ R^n_+ × R^{n×n}_+ : Σ_{i ∈ N} λ_i = 1, Σ_{j ∈ N} ρ_{i,j} = 1 for all i ∈ N, λ_i ≥ ε for all i ∈ N, ρ_{i,j} ≥ ε for all i, j ∈ N},

capturing the set of possible parameter values when we have a lower bound of ε on the parameters. Also, we define the set of parameters

    Φ = { (λ⁰, ρ⁰) ∈ Ω : d/dγ L_I((1 − γ)(λ⁰, ρ⁰) + γ(λ, ρ)) |_{γ=0} ≤ 0 for all (λ, ρ) ∈ Ω }.

Roughly speaking, having (λ⁰, ρ⁰) ∈ Φ implies that if we start from the point (λ⁰, ρ⁰) and move towards any point (λ, ρ) ∈ Ω for an infinitesimal step size, then the value of the likelihood function L_I(λ, ρ) does not improve. In the next theorem, we give a convergence result for our expectation-maximization algorithm when we know that the parameters that we are trying to estimate are bounded away from zero by some ε > 0. The proof is in Online Appendix B.

Theorem 1 Assume that the sequence {(λ^l, ρ^l) : l = 1, 2, ...} is generated by our expectation-maximization algorithm when we impose a lower bound of ε > 0 on the parameters of the Markov chain choice model. Then, we have L_I(λ^{l+1}, ρ^{l+1}) ≥ L_I(λ^l, ρ^l) for all l = 1, 2, .... Furthermore, all limit points of the sequence {(λ^l, ρ^l) : l = 1, 2, ...} are in Φ and the sequence {L_I(λ^l, ρ^l) : l = 1, 2, ...} converges to L_I(λ̂, ρ̂) for some (λ̂, ρ̂) ∈ Φ.

Note that L_I(λ, ρ) is the function that we need to maximize to estimate the parameters. By the theorem above, the sequence of parameters generated by our algorithm monotonically improves L_I(λ, ρ) and we have convergence to some form of local maximum of L_I(λ, ρ). Since L_I(λ, ρ) is not necessarily concave, we are not guaranteed to reach the global maximum. Nettleton (1999) gives regularity conditions to ensure convergence of the expectation-maximization algorithm. The proof of Theorem 1 follows by verifying these regularity conditions. In Online Appendix C, we give an example to show that the regularity conditions in Nettleton (1999) may not hold without a lower bound of ε > 0 on the parameters. Wu (1983) gives other regularity conditions but he assumes that the parameters generated by the algorithm are in the interior of the set of possible parameter values, which is difficult to satisfy for the Markov chain choice model. Also, Theorem 1 does not rule out the possibility of multiple limit points for the sequence {(λ^l, ρ^l) : l = 1, 2, ...}, but all limit points are in Φ. Lastly, we are not able to give a convergence result without a lower bound of ε > 0 on the parameters. In Online Appendix D, however, we show that as long as the initial parameter estimates are strictly positive, even if we do not impose a lower bound on the parameters, our algorithm always generates a sequence of parameters {(λ^l, ρ^l) : l = 1, 2, ...} such that there exist unique solutions to the systems of equations in (1) and (5) when we solve these systems of equations after replacing (λ, ρ) with (λ^l, ρ^l) and S with Ŝ^t for any subset in the data {(Ŝ^t, Ẑ^t) : t ∈ T}. So, we do not encounter parameters that render the systems of equations in (1) or (5) unsolvable.
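The monotonicity claim in Theorem 1 is easy to observe numerically. The sketch below, built on the helpers from the earlier sketches (simulate_purchase, e_step, m_step and incomplete_log_likelihood), fits a small synthetic data set and checks that the incomplete log-likelihood never decreases; the instance is ours and purely illustrative.

```python
# Sketch: numerically checking that EM iterates never decrease L_I.
import numpy as np

rng = np.random.default_rng(1)
n = 4
lam0 = rng.dirichlet(np.ones(n))                 # ground-truth parameters
rho0 = rng.dirichlet(np.ones(n), size=n)
data = []
for _ in range(500):                             # product 0 plays the always-available role
    S = {0} | {i for i in range(1, n) if rng.random() < 0.5}
    data.append((S, simulate_purchase(lam0, rho0, S, rng)))

lam, rho = np.full(n, 1 / n), np.full((n, n), 1 / n)
lls = []
for _ in range(20):
    F_hat, X_hat = e_step(lam, rho, data)
    lam, rho = m_step(F_hat, X_hat)
    lls.append(incomplete_log_likelihood(lam, rho, data))
assert all(b >= a - 1e-6 for a, b in zip(lls, lls[1:]))  # monotone increase
```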

5 Computational Experiments

We test the performance of our expectation-maximization algorithm on randomly generated data, as well as on a data set coming from a hotel revenue management application.

5.1 Benchmark Strategies

In our first benchmark, referred to as EM, we estimate the parameters of the Markov chain choice model by using our expectation-maximization algorithm. In our second benchmark, referred to as DM, we continue using the Markov chain choice model to capture the customer choices, but we estimate the parameters by directly maximizing the likelihood function L_I(λ, ρ) in (2) through continuous optimization software. In our third benchmark, referred to as ML, we use the multinomial logit model to capture the customer choices and estimate its parameters by using maximum likelihood.

We briefly describe the multinomial logit model. In the multinomial logit model, the mean utility of product i is η_i. If we offer the subset S of products, then a customer purchases product i ∈ S with probability e^{η_i} / Σ_{j ∈ S} e^{η_j}. As mentioned in Section 2, we represent the no-purchase option as a product always available for purchase. We denote this product by φ ∈ N. In other words, a customer purchasing product φ corresponds to a customer leaving the system without a purchase. If we add the same constant to the mean utilities of all products, then the choice probability e^{η_i} / Σ_{j ∈ S} e^{η_j} of each product does not change. So, we normalize the mean utility of the no-purchase option to zero. In this case, the parameters of the multinomial logit model are η = {η_i : i ∈ N \ {φ}}. Assume that we offer the subset Ŝ^t of products to customer t and the purchase decision of this customer is given by Ẑ^t = (Ẑ^t_1, ..., Ẑ^t_n), where Ẑ^t_i = 1 if and only if the customer purchases product i. The likelihood of the purchase decision of customer t is ∏_{i ∈ N} (e^{η_i} / Σ_{j ∈ Ŝ^t} e^{η_j})^{Ẑ^t_i}. Noting that Σ_{i ∈ N} Ẑ^t_i = 1, the log-likelihood of this purchase decision is Σ_{i ∈ N} Ẑ^t_i η_i − Σ_{i ∈ N} Ẑ^t_i log(Σ_{j ∈ Ŝ^t} e^{η_j}) = Σ_{i ∈ N} Ẑ^t_i η_i − log(Σ_{i ∈ Ŝ^t} e^{η_i}); see Section II in McFadden (1974). Thus, the log-likelihood of the data {(Ŝ^t, Ẑ^t) : t ∈ T} is given by

    L(η) = Σ_{t ∈ T} Σ_{i ∈ N} Ẑ^t_i η_i − Σ_{t ∈ T} log( Σ_{j ∈ Ŝ^t} e^{η_j} ).    (13)

In ML, we estimate the parameters of the multinomial logit model by maximizing the log-likelihood function in (13) through the Matlab routine fmincon. Boyd and Vandenberghe (2005) show that log(Σ_{i=1}^n e^{x_i}) is convex in (x_1, ..., x_n) ∈ R^n. Thus, the log-likelihood function L(η) in (13) is concave in η. Letting w_i = e^{η_i}, we can also express the choice probability of product i out of the subset S as w_i / Σ_{j ∈ S} w_j, but the log-likelihood function L(w) = Σ_{t ∈ T} Σ_{i ∈ N} Ẑ^t_i log w_i − Σ_{t ∈ T} log(Σ_{i ∈ Ŝ^t} w_i) is not concave in w = {w_i : i ∈ N \ {φ}}.

In DM, we directly maximize the log-likelihood function L_I(λ, ρ) in (2) also by using the Matlab routine fmincon. Since the function L_I(λ, ρ) is not necessarily concave in (λ, ρ), the parameters estimated by DM may depend on the initial solution, but exploratory trials indicated that the performance of DM is rather insensitive to the initial solution. We use the initial solution λ_i = 1/n for all i ∈ N, ρ_{i,j} = 1/n for all i ∈ N \ {φ}, j ∈ N, and ρ_{φ,φ} = 1. In EM, we use this initial solution as well. In our expectation-maximization algorithm, we do not impose a strictly positive lower bound on the parameters and stop when the incomplete log-likelihood increases by less than 0.01% in two successive iterations. We give the pseudo-code for our algorithm in Online Appendix E. We also used the so-called independent demand model as a benchmark. This model performed consistently worse than EM, DM and ML and we will comment on its performance only briefly.
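A sketch of (13) and its maximization follows; since our sketches are in Python, we substitute scipy.optimize.minimize for the paper's Matlab fmincon, which is our choice and not the paper's. Product index 0 stands in for the no-purchase option φ, with η_φ normalized to zero.

```python
# Sketch: multinomial logit log-likelihood (13), maximized over eta.
import numpy as np
from scipy.optimize import minimize

def neg_mnl_log_likelihood(eta_free, data, n):
    eta = np.concatenate(([0.0], eta_free))   # eta for product 0 (phi) fixed at 0
    ll = 0.0
    for S, k in data:                         # S: offered set, k: purchased product
        ll += eta[k] - np.log(np.exp(eta[sorted(S)]).sum())
    return -ll                                # minimize the negative log-likelihood

# Usage with the (S, k) data encoding of the earlier sketches:
# res = minimize(neg_mnl_log_likelihood, x0=np.zeros(n - 1), args=(data, n))
# eta_hat = np.concatenate(([0.0], res.x))
```

Because L(η) is concave, the default quasi-Newton method in minimize converges to the global maximizer of (13).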

5.2 Known Ground Choice Model

We provide computational experiments on randomly generated data where we have access to the exact ground choice model that governs the customer choice process.

Experimental Setup. We assume that the ground choice model that governs the customer choice process is the ranking-based choice model. In this choice model, each arriving customer has a ranked list of products in mind and she purchases the most preferred available product in her ranked list. The ranking-based choice model is used in Mahajan and van Ryzin (2001), van Ryzin and Vulcano (2008), Smith et al. (2009), Honhon et al. (2010), Honhon et al. (2012), Farias et al. (2013), van Ryzin and Vulcano (2015), Jagabathula and Vulcano (2016), Jagabathula and Rusmevichientong (2016) and van Ryzin and Vulcano (2016). We have m possible ranked lists indexed by M = {1, ..., m}. We denote the possible ranked lists that an arriving customer can have in mind by using {(σ^g_1, ..., σ^g_n) : g ∈ M}, where σ^g_i is the preference order of product i in the ranked list σ^g = (σ^g_1, ..., σ^g_n). For example, if n = 3 and σ^g_1 = 2, σ^g_2 = 3, σ^g_3 = 1, then product 3 is the most preferred product, product 1 is the second most preferred product and product 2 is the third most preferred product. The probability that an arriving customer has the ranked list σ^g in mind is β^g. If we offer the subset S of products, then an arriving customer purchases product i with probability Σ_{g ∈ M} β^g 1(i = arg min_{j ∈ S} σ^g_j), which is the probability that an arriving customer has product i as her most preferred available product.

In our computational experiments, to come up with the possible ranked lists that a customer can have in mind, we generate m random permutations of the products. To come up with the probability β^g that an arriving customer has the ranked list σ^g in mind, following van Ryzin and Vulcano (2016), we generate γ^g from the uniform distribution over [0, 1] and set β^g = γ^g / Σ_{h ∈ M} γ^h. We note that (β^1, ..., β^m) generated in this fashion is not uniformly distributed over the (m − 1)-dimensional simplex. Once we generate the ground choice model that governs the customer choice process, we generate the past purchase histories {(Ŝ^t, Ẑ^t) : t ∈ T} of the customers from this ground choice model. To come up with the subset Ŝ^t of products offered to customer t, we assume that the no-purchase option is always available in the offered subset of products. Each of the other products is included in the subset Ŝ^t with probability 1/2. Once we come up with the subset of products offered to customer t, we generate the choice of a random customer out of this subset according to the ground choice model and set Ẑ^t_i = 1 if and only if customer t purchases product i. Using this approach, we generate the purchase history of 50,000 customers to use as the training data and a separate purchase history of 10,000 customers to use as the hold-out data. We use different portions of the generated training data with 2,500, 5,000, 10,000 and 50,000 customers to fit our choice models. In our test problems, the number of products is n = 11 or n = 21. One of these products corresponds to the no-purchase option. The number of possible ranked lists is m = 10 + n, m = 20 + n or m = 40 + n. For each product i ∈ N, there is one ranked list where the most preferred product in the ranked list is product i. In this way, we fix the most preferred product in n of the ranked lists. The preference orders of the other products in these n ranked lists are randomly generated. The remaining ranked lists other than these n ranked lists are fully random permutations of the products.
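A sketch of this setup: ranked lists stored as permutations listing products from most to least preferred (our encoding, equivalent to the preference orders σ^g), probabilities β^g drawn as normalized uniforms, and purchases sampled as the most preferred available product. The first n lists are adjusted so that list i has product i on top, as described above; all names are ours.

```python
# Sketch: generating the ranking-based ground choice model and its data.
import numpy as np

def make_ground_model(n, m, rng):
    lists = [rng.permutation(n) for _ in range(m)]   # most-to-least preferred
    for i in range(n):                               # force product i on top of list i
        g = lists[i]
        pos = int(np.where(g == i)[0][0])
        g[0], g[pos] = g[pos], g[0]
    gamma = rng.random(m)                            # uniform [0, 1] draws
    beta = gamma / gamma.sum()                       # list probabilities beta^g
    return lists, beta

def sample_purchase(lists, beta, S, rng):
    g = lists[rng.choice(len(beta), p=beta)]         # customer's ranked list
    return next(i for i in g if i in S)              # most preferred available product

rng = np.random.default_rng(2)
lists, beta = make_ground_model(n=11, m=21, rng=rng)
S = {0} | {i for i in range(1, 11) if rng.random() < 0.5}   # 0: no-purchase option
print(sample_purchase(lists, beta, S, rng))
```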

    Out-of-sample log-likelihood
    (n, m)      Trn. data    EM         DM         ML        EM-DM    EM-ML
    (11, 21)    2,500        -16,391    -16,407    -16,…     …        1.72%
    (11, 21)    5,000        -16,387    -16,390    -16,…     …        1.67%
    (11, 21)    10,000       -16,366    -16,349    -16,…     …        1.78%
    (11, 21)    50,000       -16,361    -16,327    -16,…     …        1.81%
    (11, 31)    2,500        -16,199    -16,222    -16,…     …        0.61%
    (11, 31)    5,000        -16,154    -16,161    -16,…     …        0.81%
    (11, 31)    10,000       -16,141    -16,138    -16,…     …        0.87%
    (11, 31)    50,000       -16,132    -16,120    -16,…     …        0.93%
    (11, 51)    2,500        -17,029    -17,059    -17,…     …        0.51%
    (11, 51)    5,000        -17,017    -17,032    -17,…     …        0.55%
    (11, 51)    10,000       -16,992    -16,993    -17,…     …        0.67%
    (11, 51)    50,000       -16,986    -16,966    -17,…     …        0.69%
    (21, 31)    2,500        -22,839    -23,053    -23,…     …        1.30%
    (21, 31)    5,000        -22,737    -22,740    -23,…     …        1.64%
    (21, 31)    10,000       -22,673    -22,648    -23,…     …        1.85%
    (21, 31)    50,000       -22,627    -22,583    -23,…     …        2.00%
    (21, 41)    2,500        -22,581    -22,783    -22,…     …        0.99%
    (21, 41)    5,000        -22,483    -22,580    -22,…     …        1.31%
    (21, 41)    10,000       -22,422    -22,408    -22,…     …        1.54%
    (21, 41)    50,000       -22,378    -22,331    -22,…     …        1.69%
    (21, 61)    2,500        -23,406    -23,562    -23,…     …        0.14%
    (21, 61)    5,000        -23,301    -23,321    -23,…     …        0.53%
    (21, 61)    10,000       -23,267    -23,271    -23,…     …        0.59%
    (21, 61)    50,000       -23,226    -23,183    -23,…     …        0.74%

    Table 1: Out-of-sample log-likelihoods obtained by EM, DM and ML.

Results. In Table 1, we show the out-of-sample log-likelihoods obtained by EM, DM and ML. The first column in this table shows the parameter combination (n, m) in the ground choice model. The second column shows the number of customers in the training data that we use to fit our choice models. The third column shows the out-of-sample log-likelihood obtained by EM, which is the value of the log-likelihood function in (2) after replacing (λ, ρ) by the parameters estimated by EM and using the hold-out data as the data {(Ŝ^t, Ẑ^t) : t ∈ T} in this log-likelihood. The fourth column shows the out-of-sample log-likelihood obtained by DM, which is the value of the same log-likelihood function as in the third column, but the parameters (λ, ρ) correspond to those estimated by DM. The fifth column shows the out-of-sample log-likelihood obtained by ML, which is the value of the log-likelihood function in (13) after replacing η by the parameters estimated by ML and using the hold-out data as the data {(Ŝ^t, Ẑ^t) : t ∈ T} in this log-likelihood. The last two columns compare the log-likelihood obtained by EM with those obtained by DM and ML by giving the percent gaps between the corresponding pairs of log-likelihoods.

The results indicate that EM provides better out-of-sample log-likelihoods than ML in our test problems. The out-of-sample log-likelihoods obtained by EM and DM are quite close. When we have 2,500 customers in the training data and 11 products, the average computation times for EM, DM and ML are respectively 12.93, … and 0.54 seconds on a 2.2 GHz Intel Core i7 CPU with 16 GB of RAM. With 50,000 customers and 21 products, the average computation times for EM, DM and ML are respectively 3,456.19, 218,… and … seconds. EM terminates in 31 to 52 iterations. The computation times for EM and ML are reasonable since we do not solve the estimation problem in real time, but DM is computationally demanding. The computation time for EM is mostly spent on solving the systems of equations in (1) and (5) for each subset in the training data. Thus, EM is drastically faster when the customers in the training data are offered a few different subsets, which is likely to happen in practice. For example, when we have 50,000 customers in the training data, if these customers are offered one of 10 different subsets, then EM takes about 20 seconds. In Online Appendix F, we give the detailed computation times.

EM improves the log-likelihoods obtained by ML, but ML may provide advantages in certain cases. EM estimates O(n²) parameters given by {λ_i : i ∈ N} and {ρ_{i,j} : i, j ∈ N}, whereas ML estimates O(n) parameters given by {η_i : i ∈ N \ {φ}}. The Markov chain choice model is more flexible due to its larger number of parameters. However, since the Markov chain choice model has a large number of parameters, EM may over-fit this choice model to the training data, especially when we have too few customers in the training data and too many products so that we need to estimate too many parameters from too little data. In this case, the out-of-sample performance of EM may be inferior. For example, if we have 1,000 customers in the training data and 21 products, so that EM estimates about 400 parameters from 1,000 data points, then the average percent gap between the out-of-sample log-likelihoods obtained by EM and ML is 0.45%, favoring ML, where the average is computed over the test problems with m ∈ {10 + n, 20 + n, 40 + n}. Clearly, it is difficult to estimate 400 parameters from 1,000 data points! If we have 1,000 customers in the training data and 11 products, so that EM estimates about 100 parameters instead of 400, then the same average percent gap is 0.57%, favoring EM back again. Thus, we should be cautious about using the Markov chain choice model when we have too little data and too many products.

To form a baseline, we also check the out-of-sample log-likelihoods when we fit a ranking-based choice model, which is the ground choice model that actually drives the choice process of the customers in the training and hold-out data. The papers by van Ryzin and Vulcano (2015) and van Ryzin and Vulcano (2016) give algorithms for estimating the parameters of the ranking-based choice model. We fit two versions of the ground choice model. In the first version, we estimate both the ranked lists {σ^g : g ∈ M} in the ground choice model and the corresponding probabilities (β^1, ..., β^m). In the second version, we assume that we know the ranked lists {σ^g : g ∈ M} and we estimate only the probabilities (β^1, ..., β^m). We refer to the first and second versions of the


Goodness of fit and Wilks theorem

Goodness of fit and Wilks theorem DRAFT 0.0 Glen Cowan 3 June, 2013 Goodness of ft and Wlks theorem Suppose we model data y wth a lkelhood L(µ) that depends on a set of N parameters µ = (µ 1,...,µ N ). Defne the statstc t µ ln L(µ) L(ˆµ),

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

Winter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan

Winter 2008 CS567 Stochastic Linear/Integer Programming Guest Lecturer: Xu, Huan Wnter 2008 CS567 Stochastc Lnear/Integer Programmng Guest Lecturer: Xu, Huan Class 2: More Modelng Examples 1 Capacty Expanson Capacty expanson models optmal choces of the tmng and levels of nvestments

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Convergence of random processes

Convergence of random processes DS-GA 12 Lecture notes 6 Fall 216 Convergence of random processes 1 Introducton In these notes we study convergence of dscrete random processes. Ths allows to characterze phenomena such as the law of large

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Lecture Space-Bounded Derandomization

Lecture Space-Bounded Derandomization Notes on Complexty Theory Last updated: October, 2008 Jonathan Katz Lecture Space-Bounded Derandomzaton 1 Space-Bounded Derandomzaton We now dscuss derandomzaton of space-bounded algorthms. Here non-trval

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

Foundations of Arithmetic

Foundations of Arithmetic Foundatons of Arthmetc Notaton We shall denote the sum and product of numbers n the usual notaton as a 2 + a 2 + a 3 + + a = a, a 1 a 2 a 3 a = a The notaton a b means a dvdes b,.e. ac = b where c s an

More information

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE

CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE CHAPTER 5 NUMERICAL EVALUATION OF DYNAMIC RESPONSE Analytcal soluton s usually not possble when exctaton vares arbtrarly wth tme or f the system s nonlnear. Such problems can be solved by numercal tmesteppng

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

Applied Stochastic Processes

Applied Stochastic Processes STAT455/855 Fall 23 Appled Stochastc Processes Fnal Exam, Bref Solutons 1. (15 marks) (a) (7 marks) The dstrbuton of Y s gven by ( ) ( ) y 2 1 5 P (Y y) for y 2, 3,... The above follows because each of

More information

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis A Appendx for Causal Interacton n Factoral Experments: Applcaton to Conjont Analyss Mathematcal Appendx: Proofs of Theorems A. Lemmas Below, we descrbe all the lemmas, whch are used to prove the man theorems

More information

Lecture 14: Bandits with Budget Constraints

Lecture 14: Bandits with Budget Constraints IEOR 8100-001: Learnng and Optmzaton for Sequental Decson Makng 03/07/16 Lecture 14: andts wth udget Constrants Instructor: Shpra Agrawal Scrbed by: Zhpeng Lu 1 Problem defnton In the regular Mult-armed

More information

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 )

Yong Joon Ryang. 1. Introduction Consider the multicommodity transportation problem with convex quadratic cost function. 1 2 (x x0 ) T Q(x x 0 ) Kangweon-Kyungk Math. Jour. 4 1996), No. 1, pp. 7 16 AN ITERATIVE ROW-ACTION METHOD FOR MULTICOMMODITY TRANSPORTATION PROBLEMS Yong Joon Ryang Abstract. The optmzaton problems wth quadratc constrants often

More information

Time-Varying Systems and Computations Lecture 6

Time-Varying Systems and Computations Lecture 6 Tme-Varyng Systems and Computatons Lecture 6 Klaus Depold 14. Januar 2014 The Kalman Flter The Kalman estmaton flter attempts to estmate the actual state of an unknown dscrete dynamcal system, gven nosy

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1

On an Extension of Stochastic Approximation EM Algorithm for Incomplete Data Problems. Vahid Tadayon 1 On an Extenson of Stochastc Approxmaton EM Algorthm for Incomplete Data Problems Vahd Tadayon Abstract: The Stochastc Approxmaton EM (SAEM algorthm, a varant stochastc approxmaton of EM, s a versatle tool

More information

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Analysis of Discrete Time Queues (Section 4.6)

Analysis of Discrete Time Queues (Section 4.6) Analyss of Dscrete Tme Queues (Secton 4.6) Copyrght 2002, Sanjay K. Bose Tme axs dvded nto slots slot slot boundares Arrvals can only occur at slot boundares Servce to a job can only start at a slot boundary

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Course 395: Machine Learning - Lectures

Course 395: Machine Learning - Lectures Course 395: Machne Learnng - Lectures Lecture 1-2: Concept Learnng (M. Pantc Lecture 3-4: Decson Trees & CC Intro (M. Pantc Lecture 5-6: Artfcal Neural Networks (S.Zaferou Lecture 7-8: Instance ased Learnng

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Composite Hypotheses testing

Composite Hypotheses testing Composte ypotheses testng In many hypothess testng problems there are many possble dstrbutons that can occur under each of the hypotheses. The output of the source s a set of parameters (ponts n a parameter

More information

Pricing Problems under the Nested Logit Model with a Quality Consistency Constraint

Pricing Problems under the Nested Logit Model with a Quality Consistency Constraint Prcng Problems under the Nested Logt Model wth a Qualty Consstency Constrant James M. Davs, Huseyn Topaloglu, Davd P. Wllamson 1 Aprl 28, 2015 Abstract We consder prcng problems when customers choose among

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

Computing MLE Bias Empirically

Computing MLE Bias Empirically Computng MLE Bas Emprcally Kar Wa Lm Australan atonal Unversty January 3, 27 Abstract Ths note studes the bas arses from the MLE estmate of the rate parameter and the mean parameter of an exponental dstrbuton.

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM An elastc wave s a deformaton of the body that travels throughout the body n all drectons. We can examne the deformaton over a perod of tme by fxng our look

More information

Chapter - 2. Distribution System Power Flow Analysis

Chapter - 2. Distribution System Power Flow Analysis Chapter - 2 Dstrbuton System Power Flow Analyss CHAPTER - 2 Radal Dstrbuton System Load Flow 2.1 Introducton Load flow s an mportant tool [66] for analyzng electrcal power system network performance. Load

More information

Lecture 21: Numerical methods for pricing American type derivatives

Lecture 21: Numerical methods for pricing American type derivatives Lecture 21: Numercal methods for prcng Amercan type dervatves Xaoguang Wang STAT 598W Aprl 10th, 2014 (STAT 598W) Lecture 21 1 / 26 Outlne 1 Fnte Dfference Method Explct Method Penalty Method (STAT 598W)

More information

Determinants Containing Powers of Generalized Fibonacci Numbers

Determinants Containing Powers of Generalized Fibonacci Numbers 1 2 3 47 6 23 11 Journal of Integer Sequences, Vol 19 (2016), Artcle 1671 Determnants Contanng Powers of Generalzed Fbonacc Numbers Aram Tangboonduangjt and Thotsaporn Thanatpanonda Mahdol Unversty Internatonal

More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

2.3 Nilpotent endomorphisms

2.3 Nilpotent endomorphisms s a block dagonal matrx, wth A Mat dm U (C) In fact, we can assume that B = B 1 B k, wth B an ordered bass of U, and that A = [f U ] B, where f U : U U s the restrcton of f to U 40 23 Nlpotent endomorphsms

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

Lecture 7: Boltzmann distribution & Thermodynamics of mixing

Lecture 7: Boltzmann distribution & Thermodynamics of mixing Prof. Tbbtt Lecture 7 etworks & Gels Lecture 7: Boltzmann dstrbuton & Thermodynamcs of mxng 1 Suggested readng Prof. Mark W. Tbbtt ETH Zürch 13 März 018 Molecular Drvng Forces Dll and Bromberg: Chapters

More information

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1 C/CS/Phy9 Problem Set 3 Solutons Out: Oct, 8 Suppose you have two qubts n some arbtrary entangled state ψ You apply the teleportaton protocol to each of the qubts separately What s the resultng state obtaned

More information

Suggested solutions for the exam in SF2863 Systems Engineering. June 12,

Suggested solutions for the exam in SF2863 Systems Engineering. June 12, Suggested solutons for the exam n SF2863 Systems Engneerng. June 12, 2012 14.00 19.00 Examner: Per Enqvst, phone: 790 62 98 1. We can thnk of the farm as a Jackson network. The strawberry feld s modelled

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

Technical Note: A Simple Greedy Algorithm for Assortment Optimization in the Two-Level Nested Logit Model

Technical Note: A Simple Greedy Algorithm for Assortment Optimization in the Two-Level Nested Logit Model Techncal Note: A Smple Greedy Algorthm for Assortment Optmzaton n the Two-Level Nested Logt Model Guang L and Paat Rusmevchentong {guangl, rusmevc}@usc.edu September 12, 2012 Abstract We consder the assortment

More information

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:

More information

Estimation: Part 2. Chapter GREG estimation

Estimation: Part 2. Chapter GREG estimation Chapter 9 Estmaton: Part 2 9. GREG estmaton In Chapter 8, we have seen that the regresson estmator s an effcent estmator when there s a lnear relatonshp between y and x. In ths chapter, we generalzed the

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

A New Refinement of Jacobi Method for Solution of Linear System Equations AX=b

A New Refinement of Jacobi Method for Solution of Linear System Equations AX=b Int J Contemp Math Scences, Vol 3, 28, no 17, 819-827 A New Refnement of Jacob Method for Soluton of Lnear System Equatons AX=b F Naem Dafchah Department of Mathematcs, Faculty of Scences Unversty of Gulan,

More information

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k. THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

Vapnik-Chervonenkis theory

Vapnik-Chervonenkis theory Vapnk-Chervonenks theory Rs Kondor June 13, 2008 For the purposes of ths lecture, we restrct ourselves to the bnary supervsed batch learnng settng. We assume that we have an nput space X, and an unknown

More information