Assortment Optimization for Parallel Flights under a Multinomial Logit Choice Model with Cheapest Fare Spikes

Yufeng Cao, Anton Kleywegt, He Wang
School of Industrial and Systems Engineering, Georgia Institute of Technology, yufeng.cao@gatech.edu, anton@isye.gatech.edu, he.wang@isye.gatech.edu

It has long been noticed by airlines that many customers tend to choose the cheapest fare class among all available fare classes. However, this phenomenon is not entirely captured by the widely used multinomial logit (MNL) choice model. In this paper, we study an assortment optimization problem for parallel flights under a spiked multinomial logit (spiked-MNL) choice model. The spiked-MNL model extends the classical MNL model by having a separate attractiveness parameter for the cheapest available fare class on each flight. We show that under the spiked-MNL choice model, the optimal dynamic assortment policy for parallel flights always selects assortments that are revenue-ordered, which implies that the optimal policy can be implemented as dynamic nested booking limit control. We also propose static booking limit control heuristics based on deterministic approximations of the problem. Finally, we evaluate different assortment policies in numerical experiments using both synthetic and real-world data provided by an airline partner.

Key words: airline revenue management; assortment optimization; discrete choice model; spike effect; booking limits

1. Introduction

Revenue management (RM) is widely adopted by airlines to improve demand forecasting, optimize inventory control and pricing strategy, and increase revenues (Belobaba 2015). A basic decision in airline RM is to select a subset of products, which consist of combinations of flights and fare classes, and offer them to customers. The subset of products made available to customers is called an assortment, and the problem of selecting such subsets to maximize revenue is known as assortment optimization.
Airlines dynamically adjust assortments based on the remaining seats on flights and the time until departure. The assortment decisions need to be considered jointly for a collection of flights in an airline network, as customers may substitute between different flights based on product availability and price. The complexity of airline networks makes solving assortment optimization problems challenging.

In this paper, we consider an assortment optimization problem for a collection of parallel flights. Parallel flights are flights with the same origin-destination pair and the same departure date. This problem is motivated by our collaboration with a major airline that competes in one of the busiest origin-destination markets in the world with over 30 parallel flights every day. Since both airports at the origin and destination involved have flights to the same collection of other airports, very few passengers on these parallel flights are connecting passengers. Therefore it is reasonable to consider RM for these parallel flights separately from the other flights in the airline's network.

Cao, Kleywegt, and Wang: Assortment for Parallel Flights under Spiked-MNL Model

For historical reasons, most airlines sell flight tickets in terms of fare classes. Each fare class is usually associated with a fixed price and certain booking restrictions (e.g., refundable or nonrefundable). Airlines then control prices indirectly by opening or closing different fare classes. We refer to a combination of a flight and a fare class as a product.

Traditional RM demand models assume that a customer comes with a request for a predetermined product. A modern approach of choice-based RM assumes that customers have heterogeneous preferences over products, and select the product that they prefer most from the set of available products. One of the most popular choice models is the multinomial logit (MNL) model. The MNL model has a simple structure, and the parameter estimation problem as well as assortment optimization problems and optimal pricing problems under the MNL model are tractable. However, the MNL model has the independence from irrelevant alternatives (IIA) property, which states that the odds of preferring one alternative over another do not depend on the presence or absence of other irrelevant alternatives. This is undesirable for modeling the choice behavior of airline customers, among others, for the following reason. It has long been noticed in the airline industry that most customers who buy anything would buy a product that is the cheapest among a considered set of available products; for example, most customers who book a ticket choose the cheapest available fare class for their chosen flight (Boyd and Kallesen 2004).
We have also observed this behavior in airline data (Dai et al. 2014). This phenomenon violates the IIA property. For example, Figure 1 shows the historical booking data for a specific flight. The fare classes are ordered such that Class 1 has the highest ticket price and Class 8 has the lowest. On the left panel, we show the fraction of bookings in each fare class for the flight when all eight fare classes are open. On the right panel, we show the fraction of bookings in each fare class when only Classes 1 to 7 are open. Note that in both cases, the cheapest available fare class (Class 8 on the left and Class 7 on the right) receives more than 60% of bookings. Moreover, the fraction of bookings in Class 7 is significantly larger than the fraction of bookings in Class 6 when Class 7 is the cheapest available fare class (right panel); but the fraction of bookings in Class 7 is smaller than the fraction of bookings in Class 6 when Class 7 is not the cheapest available fare class (left panel). Since the ratio between the fractions of bookings in Class 7 and Class 6 is affected by the inclusion of other alternatives (such as Class 8), the IIA property is violated.

Figure 1 Historical booking data for a flight when Classes 1 to 8 are open (left) and when only Classes 1 to 7 are open (right).

We refer to the phenomenon that more customers than predicted by the MNL model buy the cheapest available fare class on each flight as the spike effect. To capture the spike effect in customer behavior, we consider an extension of the classical MNL model by using separate attractiveness parameters for the cheapest available products in an assortment. We call this the spiked-MNL choice model, which was first introduced by Dai et al. (2014).

After reviewing relevant literature in Section 2, we introduce the assortment optimization problem in Section 3. In Section 4, we define the spiked-MNL choice model and discuss some of its properties that are different from other commonly used choice models. In Section 5, we explore the structure of the optimal assortment policy under the spiked-MNL model. In Section 6, we consider deterministic approximations of the problem and propose static booking limit heuristics. Section 7 examines the numerical performance of different assortment control policies through synthetic and real-world airline data.

Notation

Let $\mathbb{R}$ and $\mathbb{R}_+$ denote the set of real numbers and the set of nonnegative real numbers, respectively. Let $\mathbb{Z}$ and $\mathbb{Z}_+$ denote the set of integers and the set of nonnegative integers. We use boldface lower-case and upper-case letters to represent vectors and matrices, respectively. For a vector $\mathbf{x}$, let $x_j$ denote its $j$-th component. Given two real numbers $a \in \mathbb{R}$ and $b \in \mathbb{R}$, let $a \wedge b := \min\{a, b\}$, $a \vee b := \max\{a, b\}$, and $a^+ := a \vee 0$. Given a set $S$, let $2^S$ denote its power set, which contains all subsets of $S$, and let $S^n$ denote its $n$-th Cartesian power. An indicator function is denoted with $\mathbb{I}(\cdot)$; a.s. means almost surely; i.i.d. stands for independent and identically distributed; w.p.1 means with probability 1.

2. Literature Review

Since the deregulation of the airline industry, airlines have been seeking better ways to price and manage their products. The science of revenue management, or yield management, has developed alongside the growth of the airline industry.
Among the pioneers were Scandinavian Airlines System and American Airlines, which survived fierce competition with the help of revenue management (Andersson 1989, Smith et al. 1992).

Assortment optimization has been an active research field in revenue management. There is an extensive literature on assortment optimization for a wide range of industries, including airlines, retail, and e-commerce. We refer readers to the surveys by Hübner and Kuhn (2012) and Kök et al. (2015) for a comprehensive discussion of this stream of literature. Our literature review below focuses on the most relevant papers that study assortment optimization for airlines under customer choice behavior.

The idea of airline assortment optimization can be traced back to traditional RM methodology such as Littlewood's classical paper on controlling inventory of two fare classes (Littlewood 1972). Traditional RM demand models assume that a customer comes with a request for a predetermined product. A firm then decides whether to accept or reject the customer's request. Typical control policies use either bid-prices (e.g., Bertsimas and Popescu 2003) or booking limits (e.g., Talluri and van Ryzin 1998, Bertsimas and de Boer 2005) to make accept/reject decisions. We refer readers to McGill and van Ryzin (1999) for a survey on the early development of traditional RM under the independent demand assumption. Traditional RM does not account for customer choice behavior and may lead to cascading deterioration of demand estimation accuracy and revenue performance (Cooper et al. 2006). We also note that some remedies based on buy-downs and buy-ups have been proposed to account for restricted demand substitution patterns (see, e.g., Gallego et al. 2009, Walczak et al. 2010, Cooper and Li 2012).

A modern approach of choice-based RM has been adopted by academia and industry (Strauss et al. 2018). Talluri and van Ryzin (2005) studied the problem of assortment optimization under a general choice model for a single flight leg. They formulated the problem as a dynamic program (DP).
By introducing the concept of efficient sets, they showed that only efficient sets are used in optimal assortment controls. Zhang and Cooper (2005) considered assortment optimization for parallel flights and developed a simulation-based heuristic. An important assumption in their paper was that customers would only switch between flights, but not between fare classes within a flight. Later, van Ryzin and Vulcano (2008a) studied assortment optimization for a network revenue management problem using virtual nesting controls. They adopted a simulation-based sample path gradient method to optimize booking controls. Zhang and Adelman (2009) approximated value functions of the DP by affine functions, and developed a column generation algorithm to solve the assortment problem for the MNL model with disjoint consideration sets.

Due to the curse of dimensionality, the computational burden of DP increases significantly from single-leg to parallel flights, and then to general airline networks. Therefore, Gallego et al. (2004) proposed a choice-based deterministic linear programming (CDLP) model as a deterministic approximation of the stochastic DP problem. Liu and van Ryzin (2008) extended the concept of efficient sets from Talluri and van Ryzin (2004) and proved that the solution to the CDLP is asymptotically optimal for the DP. Even though identifying efficient sets helps reduce the number of candidate assortments, there could still be exponentially many decision variables for the CDLP. Liu and van Ryzin (2008) suggested solving the CDLP using column generation. Talluri (2014) proposed a new approach called segment-based deterministic concave program (SDCP), which is a compact relaxation of the CDLP. The SDCP formulation can be tightened with a randomized convex programming method. Recently, Gallego et al. (2015) proposed a sales-based linear programming (SBLP) model under a general attractiveness model, of which the MNL model is a special case. The SBLP model only requires a polynomial number of variables under the MNL model and is equivalent to the CDLP.

In addition to assortment optimization for general choice models, many researchers have also considered assortment planning under specific choice models. The choice models studied in the assortment optimization literature are diverse; they include, but are not limited to, MNL (Talluri and van Ryzin 2004, Liu and van Ryzin 2008, Gallego et al. 2015), robust MNL (Rusmevichientong et al. 2010, Rusmevichientong and Topaloglu 2012), nested logit model (Davis et al. 2014, Gallego and Topaloglu 2014, Feldman and Topaloglu 2015), mixed MNL (Bront et al. 2009, Rusmevichientong et al. 2014), Markov chain choice model (Feldman and Topaloglu 2017), and nonparametric choice models (Farias et al. 2013, Bertsimas and Mišić 2015). For an overview, we refer readers to a recent survey by Strauss et al. (2018).

Among all these models, the MNL model is commonly used in the literature as a benchmark. The MNL model has many favorable properties: the maximum likelihood estimation problem, the assortment optimization problem, and the optimal pricing problem are all easy to solve. Talluri and van Ryzin (2004) showed that the optimal policy of the assortment optimization problem under the MNL model is nested-by-fare-order for single-leg RM.
Liu and van Ryzin (2008) and Gallego et al. (2015) also discussed assortment optimization under the MNL model.

The phenomenon of cheapest fare spikes has long been noticed by the airline industry (Boyd and Kallesen 2004). However, we are not aware of many papers that explicitly consider the spike effect in customer choice models, with the exception of Dai et al. (2014) and Ding (2017). Dai et al. (2014) referred to the cheapest fare spike phenomenon as context effect, while Ding (2017) called it buydown effect. As we discussed in the example in Figure 1, the spike effect cannot be explained by the MNL choice model. Therefore, Dai et al. (2014) and Ding (2017) both used a variant of the MNL model to incorporate the spike effect, and proposed an SBLP formulation for this model. We will formally describe this modified MNL model in Section 4.

3. Model Formulation

We consider an assortment optimization problem over a collection of parallel flights that depart on the same day between a common origin-destination pair. Let $F$ denote the set of parallel flights operated by a host airline, which is the decision maker in our model setting. (There could be other flights offered by competing airlines in this origin-destination market.) The number of parallel flights is denoted by $m := |F|$, and the vector of seat capacities on these flights is denoted by $\mathbf{c} = (c_f, f \in F) \in \mathbb{Z}_+^m$. Let $I$ denote the set of fare classes on each flight, and let $n := |I|$ denote the number of fare classes.

A product is defined as a combination of a flight and a fare class on that flight. We use $j := (i, f)$ to denote a product, and $J := I \times F$ to denote the set of all products; the number of products is equal to $|J| = mn$. Occasionally, using discrete choice terminology, we also refer to a product as an alternative. We use $j = 0$ to represent the alternative that a customer buys nothing from the host airline, also called the no-purchase alternative or the null alternative. Let $\mathbf{a}_j = (a_{jf}, f \in F) \in \{0, 1\}^m$ denote a vector representing the resource consumption of product $j$; that is, $a_{jf} = 1$ if $j = (i, f)$ for some $i \in I$, and all other elements of $\mathbf{a}_j$ are equal to 0. We say that an assortment $S \subseteq J$ is offered when the airline makes only the products in $S$ available to customers. The null alternative is always available to customers.

Let $\mathbf{r} = (r_j, j \in J) \in \mathbb{R}_+^{mn}$ denote the vector of revenues associated with each product; that is, $r_j$ denotes the revenue of product $j = (i, f)$. Without loss of generality, we order the fare classes on each flight $f \in F$ by their revenues such that $r_{1,f} > \cdots > r_{i,f} > \cdots > r_{n,f} > 0$, with $i = 1$ being the most expensive fare class and $i = n$ being the cheapest fare class.

The selling horizon is divided into discrete periods indexed by $t = 1, \ldots, T$. We assume that the time periods are sufficiently short so that there is at most one customer arrival in each period. In other words, the probability that two or more customers arrive in the same period is negligible. We assume that the probability of arrival, denoted by $\lambda$, is the same for all periods. The arrivals form an i.i.d. sequence that is also independent of customer choices.
(In Section 7, we relax the time homogeneity assumption of the arrival process and also allow the choice probabilities to depend on time and booking channels.)

In period $t$, the host airline offers an assortment $S \subseteq J$. When an individual customer arrives, she sees the assortment $S$ and purchases a product $j \in S$ with probability $P(j, S)$ or leaves without a purchase with probability $P(0, S)$, so that, given the assortment $S$, it holds that $P(0, S) + \sum_{j \in S} P(j, S) = 1$. Therefore, having no sales in a period could be due either to no arrival or to an arriving customer who does not purchase. The probability that no sale occurs is $\lambda P(0, S) + (1 - \lambda)$.

Given the initial capacity $\mathbf{c}$, the host airline selects the assortment offered to the customers in each period $t$ in order to maximize the total expected revenue. We model the assortment optimization problem by dynamic programming. Let $\mathbf{c}_t \in \mathbb{Z}_+^m$ denote the vector of remaining seat capacities at time period $t$. Let $V_t : \mathbb{Z}_+^m \to \mathbb{R}$ denote the optimal revenue-to-go function at period $t$ given the remaining seat capacities. The optimality equation is

$$V_t(\mathbf{c}_t) = \max_{S \subseteq J} \Big\{ \sum_{j \in S} \lambda P(j, S) \big( r_j + V_{t+1}(\mathbf{c}_t - \mathbf{a}_j) \big) + \big( \lambda P(0, S) + 1 - \lambda \big) V_{t+1}(\mathbf{c}_t) \Big\}$$

$$= \max_{S \subseteq J} \Big\{ \sum_{j \in S} \lambda P(j, S) \big( r_j - ( V_{t+1}(\mathbf{c}_t) - V_{t+1}(\mathbf{c}_t - \mathbf{a}_j) ) \big) \Big\} + V_{t+1}(\mathbf{c}_t). \tag{1}$$

The boundary conditions are $V_t(\mathbf{0}) = 0$ for all $t = 1, \ldots, T$ and $V_{T+1}(\mathbf{c}) = 0$ for any $\mathbf{c} \in \mathbb{Z}_+^m$.

4. The Spiked-MNL Model

In this section, we define the spiked-MNL choice model and discuss its properties. The spiked-MNL choice model is adopted from the modified MNL model in Dai et al. (2014) and Ding (2017) to capture the effect of cheapest fare spikes.

4.1. Definition of Choice Model

For every product $j \in J$, we define two parameters $w_j > 0$, $v_j > 0$. The quantity $w_j$ represents the special attractiveness of product $j$ when it is the cheapest available fare class on its associated flight; otherwise, product $j$ has a regular attractiveness of $v_j$. We assume that the cheapest fare spikes are always nonnegative, i.e., $w_j \geq v_j$ for all products $j \in J$, unless otherwise specified. This is in general consistent with the airline data that we used. We denote the attractiveness of the null alternative by $v_0$ and call it the null attractiveness.

Suppose the firm offers an assortment $S$. Let $\mathbb{I}(j, S)$ denote an indicator function such that $\mathbb{I}(j, S) = 1$ if $j$ is the cheapest available fare class on its associated flight in assortment $S$, and $\mathbb{I}(j, S) = 0$ otherwise. The spiked-MNL model specifies that an arriving customer chooses product $j \in S$ with probability

$$P(j, S) = \frac{v_j (1 - \mathbb{I}(j, S)) + w_j \mathbb{I}(j, S)}{v_0 + \sum_{j' \in S} \big[ v_{j'} (1 - \mathbb{I}(j', S)) + w_{j'} \mathbb{I}(j', S) \big]}.$$

The probability that the customer does not make a purchase is given by

$$P(0, S) = \frac{v_0}{v_0 + \sum_{j' \in S} \big[ v_{j'} (1 - \mathbb{I}(j', S)) + w_{j'} \mathbb{I}(j', S) \big]}.$$

Note that when $w_j = v_j$ for all $j \in J$, the spiked-MNL model reduces to the classical MNL model. (Under the classical MNL model, the attractiveness of a product is constant and represented by an exponentiated utility, i.e., $v_j = w_j = e^{\rho u_j}$, where $u_j$ denotes a mean utility measure of $j$ and $\rho > 0$ is a parameter that is inversely related to the variance of the underlying Gumbel distribution.) Dai et al.
(2014) showed that the spiked-MNL model defined above fits airline booking data better than the classical MNL model. Figure 2, which is taken from Dai et al. (2014), shows both the actual and estimated fractions of bookings in different fare classes on a flight. The curve XX Actual represents the actual fraction of bookings in different fare classes, MNL no Spike corresponds to the fractions of bookings predicted by a classical MNL choice model calibrated with booking data, and MNL Spike corresponds to the fractions of bookings predicted by the above spiked-MNL model calibrated with the same data. The left panel shows the fractions when Classes 1 to 8 are open and the right panel shows the fractions when Classes 1 to 7 are open. (Recall that fare classes are ordered such that Class 1 has the highest price and Class 8 has the lowest.) It is easy to see that in both settings the prediction of the spiked-MNL model is much closer to the actual data.

Figure 2 Fraction of bookings and its estimates under MNL with and without spikes: (a) fare class 8 cheapest; (b) fare class 7 cheapest.

There are several important differences between the classical MNL model and the spiked-MNL model in terms of their properties. Below we examine some of the properties of the spiked-MNL model. First we introduce some additional notation. Let $J_f = I \times \{f\}$ denote the set of products associated with flight $f$. For any assortment $S \subseteq J$, let $S_f := S \cap J_f$ denote the products in assortment $S$ that are associated with flight $f$. For any product $j$, let $f(j)$ denote the flight that product $j$ is associated with, and let $\bar{J}(j)$ denote the set of products associated with the same flight as product $j$ and that have higher fares than product $j$. That is, $\bar{J}(j) = \{j' \in J_{f(j)} : r_{j'} > r_j\}$. Let $\bar{J}_0(j) := \bar{J}(j) \cup \{j\}$, and let $\underline{J}(j) := \{j' \in J_{f(j)} : r_{j'} < r_j\}$ denote the set of products associated with the same flight as product $j$ and that have lower fares than product $j$.

4.2. Regularity

The regularity property states that the probability of choosing any alternative, including the null alternative, from an assortment does not increase if the assortment is enlarged (Manski and McFadden 1981). More formally, the definition of a regular choice model is as follows.

Definition 1. A choice model is regular if for any two assortments $S$ and $T$ satisfying $S \subseteq T \subseteq J$ and any alternative $j \in S \cup \{0\}$, it always holds that $P(j, S) \geq P(j, T)$.

The regularity property is a common assumption in the assortment optimization literature (see, e.g., Golrezaei et al. 2014, Berbeglia and Joret 2016).
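As a minimal sketch of the choice probabilities in Section 4.1 and of Definition 1, the following Python snippet computes spiked-MNL probabilities and checks regularity by brute force over all nested pairs of assortments. The function names (`spiked_mnl`, `is_regular`), the zero-based flight index, and all parameter values are illustrative assumptions and are not taken from the paper or its data.

```python
from itertools import combinations

def spiked_mnl(S, v, w, v0):
    """Return {alternative: probability} for an assortment S of
    (fare_class, flight) pairs; key 0 is the no-purchase alternative.
    Fare class 1 is the most expensive, so the largest index is the cheapest."""
    cheapest = {}
    for i, f in S:
        cheapest[f] = max(cheapest.get(f, i), i)
    # The cheapest open class on each flight gets w_j, every other class v_j.
    attr = {(i, f): (w[i, f] if i == cheapest[f] else v[i, f]) for i, f in S}
    total = v0 + sum(attr.values())
    probs = {j: a / total for j, a in attr.items()}
    probs[0] = v0 / total
    return probs

def is_regular(products, v, w, v0, tol=1e-12):
    """Brute-force check of Definition 1: P(j, S) >= P(j, T) for S inside T."""
    subsets = [frozenset(c) for k in range(len(products) + 1)
               for c in combinations(products, k)]
    for S in subsets:
        p_S = spiked_mnl(S, v, w, v0)
        for T in subsets:
            if S < T:  # proper subset
                p_T = spiked_mnl(T, v, w, v0)
                if any(p_T[j] > p_S[j] + tol for j in p_S):
                    return False
    return True

# Hypothetical single-flight instance with three fare classes.
products = [(1, 0), (2, 0), (3, 0)]
v = {j: 1.0 for j in products}
w_a = {(1, 0): 2.0, (2, 0): 3.0, (3, 0): 4.0}   # mild spikes
w_b = {(1, 0): 2.0, (2, 0): 8.0, (3, 0): 4.0}   # large spike on class 2
v0 = 1.0

p12 = spiked_mnl(frozenset(products[:2]), v, w_a, v0)
p123 = spiked_mnl(frozenset(products), v, w_a, v0)
odds_12 = p12[(2, 0)] / p12[(1, 0)]      # class 2 is the cheapest open class
odds_123 = p123[(2, 0)] / p123[(1, 0)]   # class 3 is the cheapest open class
regular_a = is_regular(products, v, w_a, v0)
regular_b = is_regular(products, v, w_b, v0)
print(odds_12, odds_123, regular_a, regular_b)
```

In this illustrative instance the odds of Class 2 versus Class 1 depend on whether Class 3 is offered, which is the kind of IIA violation that the spike parameters are meant to capture. The enumeration is exponential in the number of products, so it is only meant for small sanity checks.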
The classical MNL choice model is regular, but the spiked-MNL choice model is not regular. Consider the following example:

Example 1. Suppose a vendor sells three products H, M, and L with revenues $r_H > r_M > r_L$. Let the attractiveness parameters of these products be $v_H = v_M = w_L = 1$ and $w_M = 8$ (we do not need to specify $w_H$ or $v_L$ in this example), and let the null attractiveness be $v_0 = 1$. Then $P(H, \{H, M\}) = v_H / (v_H + w_M + v_0) = 1/10$ and $P(H, \{H, M, L\}) = v_H / (v_H + v_M + w_L + v_0) = 1/4$. Enlarging the assortment increases the probability of choosing H, which violates the regularity property.

In order to check whether the spiked-MNL model is regular, or to enforce regularity when calibrating a spiked-MNL model, we have the following necessary and sufficient condition, the proof of which is given in the appendix.

Proposition 1. The spiked-MNL model is regular if and only if for any two products $j$ and $j'$ such that $j' \in \bar{J}(j)$, i.e., $j$ and $j'$ are associated with the same flight and $j'$ has a higher fare than $j$, it holds that $w_{j'} \leq w_j + v_{j'}$.

According to the proposition, for $m$ parallel flights and $n$ fare classes, the complexity of checking the regularity of a spiked-MNL model is no more than $O(mn^2)$.

4.3. Submodularity

Given a choice model, let the demand function of the choice model be $g(S) := \sum_{j \in S} P(j, S)$ for any assortment $S \subseteq J$. Another common property of many choice models is the submodularity of their demand functions, which implies that the marginal increment in total purchase probability decreases as the assortment enlarges (Berbeglia and Joret 2016). More formally, the definition of a submodular demand function is as follows.

Definition 2. The demand function $g$ of a choice model is submodular if

$$g(T \cup \{k\}) - g(T) \leq g(S \cup \{k\}) - g(S), \quad \forall S \subseteq T \subseteq J, \ k \in J \setminus T. \tag{2}$$

The demand function of the classical MNL choice model is submodular, but the demand function of the spiked-MNL choice model is not submodular. Consider the following example:

Example 2. Suppose a vendor sells three products H, M, and L with revenues $r_H > r_M > r_L$.
Let the attractiveness parameters of the products be $v_H = 1$, $w_H = 3$, and $v_M = w_M = w_L = 2$ (we do not need to specify $v_L$); and let the null attractiveness be $v_0 = 1$. Consider set $S = \{H\}$, set $T = \{H, L\}$, and product $k = M$. Then $g(T \cup \{k\}) - g(T) = g(\{H, M, L\}) - g(\{H, L\}) = 5/6 - 3/4 = 1/12$, and $g(S \cup \{k\}) - g(S) = g(\{H, M\}) - g(\{H\}) = 3/4 - 3/4 = 0$. Therefore, the demand function is not submodular.

Note that in Example 2, the choice model is regular, as the condition in Proposition 1 is satisfied. Therefore, regularity of the spiked-MNL model does not imply submodularity of its demand function.
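The arithmetic in Examples 1 and 2 can be reproduced exactly with rational numbers. The sketch below is a hedged illustration for a single flight with products H, M, L; the helper names (`attractiveness`, `prob`, `demand`, `satisfies_prop1`) are hypothetical, and `satisfies_prop1` encodes the condition of Proposition 1 in the form $w_{j'} \leq w_j + v_{j'}$ for every pair in which $j'$ has the higher fare. Parameters that the examples leave unspecified are simply omitted from the dictionaries.

```python
from fractions import Fraction as Fr

ORDER = ["H", "M", "L"]   # single flight, highest fare first

def attractiveness(S, v, w):
    """Weights under the spiked-MNL model: w_j for the cheapest product in S,
    v_j for every other product in S."""
    cheapest = max(S, key=ORDER.index)
    return {j: (w[j] if j == cheapest else v[j]) for j in S}

def prob(j, S, v, w, v0):
    """Exact choice probability P(j, S)."""
    a = attractiveness(S, v, w)
    return Fr(a[j], v0 + sum(a.values()))

def demand(S, v, w, v0):
    """Demand function g(S): total purchase probability."""
    total = sum(attractiveness(S, v, w).values())
    return Fr(total, v0 + total)

def satisfies_prop1(v, w):
    """Pairwise condition: w_{j'} <= w_j + v_{j'} when j' is more expensive."""
    n = len(ORDER)
    return all(w[ORDER[a]] <= w[ORDER[b]] + v[ORDER[a]]
               for a in range(n) for b in range(a + 1, n))

# Example 1: regularity fails (w_H and v_L are not needed).
v1, w1 = {"H": 1, "M": 1}, {"M": 8, "L": 1}
p_small = prob("H", {"H", "M"}, v1, w1, 1)
p_large = prob("H", {"H", "M", "L"}, v1, w1, 1)

# Example 2: regular but not submodular (v_L is not needed).
v2, w2 = {"H": 1, "M": 2}, {"H": 3, "M": 2, "L": 2}
inc_T = demand({"H", "M", "L"}, v2, w2, 1) - demand({"H", "L"}, v2, w2, 1)
inc_S = demand({"H", "M"}, v2, w2, 1) - demand({"H"}, v2, w2, 1)
regular_2 = satisfies_prop1(v2, w2)
print(p_small, p_large, inc_T, inc_S, regular_2)
```

The pairwise loop visits each ordered pair of fare classes on a flight once, which matches the $O(mn^2)$ bound stated above when repeated over $m$ flights.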

Moreover, it is well known that any random utility model has a submodular demand function and is equivalent to a certain stochastic preference model (Berbeglia and Joret 2016). Example 2 shows that a spiked-MNL model is in general not representable by any random utility model or stochastic preference model.

5. Structure of Optimal Policy under the Spiked-MNL Model

In this section, we study the structure of the optimal policy under the spiked-MNL model. As a main result, we show that in optimal assortment controls under the spiked-MNL model, every chosen assortment is revenue-ordered. That is, if a fare class is open at any given time, all fare classes on the same flight with higher fares must also be open. This result implies that the optimal policy can be implemented using nested booking limit control, a type of control that is widely used in airline RM practice. More specifically, the booking limits are adjusted dynamically in the optimal policy based on the remaining seats on flights and the time until departure.

5.1. Efficient Sets

It is well known that, for single-leg RM, the optimal assortment policy under the MNL model uses nested allocations, where the nesting is ordered by revenue of fare classes (Talluri and van Ryzin 2004). Therefore, the optimal assortment policies for single-leg RM can be implemented as dynamic nested booking limits/protection levels. The reasoning behind this result is based on the concept of efficient sets, which are a collection of assortments that provide Pareto trade-offs between expected revenue and expected resource consumption. Later, Liu and van Ryzin (2008) extended the concept of efficient sets to general network RM. They also showed that, for parallel flights, the optimal policy is composed only of efficient sets.

We first revisit the concept of efficient sets proposed by Liu and van Ryzin (2008).
Let $R(S)$ be the expected revenue given an assortment $S \subseteq J$, and let the function $Q : 2^J \to [0, 1]^m$ represent the vector of resource consumption rates of assortments. For parallel flights, given assortment $S$, the expected revenue is $R(S) = \sum_{j \in S} r_j P(j, S)$ and the resource consumption rates are $Q(S) = \sum_{j \in S} \mathbf{a}_j P(j, S)$. Recall that $\mathbf{a}_j$ is a column vector representing the resources required by product $j$.

Definition 3 (Efficient Sets). An assortment $T$ is said to be inefficient if a mixture of other assortments can be used to generate strictly higher revenue with the same or lower resource consumption rates. That is, there exists a set of weights $\{\mu(S) : S \subseteq J\}$ satisfying $\sum_{S \subseteq J} \mu(S) = 1$ and $\mu(S) \geq 0$ for all $S \subseteq J$ such that

$$R(T) < \sum_{S \subseteq J} \mu(S) R(S), \qquad Q(T) \geq \sum_{S \subseteq J} \mu(S) Q(S).$$

If no such weights exist, the assortment $T$ is said to be efficient.

To check whether a set is efficient, Liu and van Ryzin (2008) provided the following condition.

Proposition 2 (Liu and van Ryzin (2008)). A set $T$ is efficient if and only if for some $\boldsymbol{\pi} \in \mathbb{R}_+^m$, set $T$ is an optimal solution to the problem

$$\max_{S \subseteq J} \{ R(S) - \boldsymbol{\pi}^{\top} Q(S) \}.$$

We derive the following corollary, which will be used to prove a key result (Theorem 2) later. The proof of Corollary 1 is included in the appendix.

Corollary 1. For parallel flights, given that an assortment $T$ is efficient, there exists a vector $\boldsymbol{\gamma} \in \mathbb{R}^{mn}$, satisfying $\gamma_j > \gamma_{j'}$ for all $j \in J$ and all $j' \in \underline{J}(j)$ (that is, all products $j'$ on the same flight as $j$ with lower fares than $j$), such that $T$ is an optimal solution to the problem

$$\max_{S \subseteq J} \sum_{j \in S} \gamma_j P(j, S). \tag{3}$$

The coefficient $\gamma_j$ in Corollary 1 can be interpreted as the marginal profit of adding product $j$ into assortment $S$. In general we have $\gamma_j \neq r_j$, since adding product $j$ to the assortment also affects choice probabilities of other products.

It is easily verified that, for parallel flights, the optimal assortment policy obtained from the DP in Eq. (1) only uses efficient sets (cf. Liu and van Ryzin 2008). Indeed, the maximization problem in Eq. (1) has the same form as in (3). If we can characterize the structure of efficient sets, we can restrict our attention to efficient sets in the DP (1), which form a subset of the set of all assortments, $2^J$. For general choice models, efficient sets are often hard to characterize; but for the spiked-MNL model, we show next that there is a simple structure for efficient sets in the parallel-flight RM setting.

5.2. (Partially) Revenue-ordered Assortments

Talluri and van Ryzin (2004) showed that the efficient sets under the MNL model for single-leg RM are assortments of the form $A_k = \{1, 2, \ldots, k\}$ for some $k \in I$. Rusmevichientong and Topaloglu (2012) showed that the same conclusion holds even when the model parameters are uncertain, and they referred to such sets as revenue-ordered assortments.
We extend the concept of revenue-ordered assortments to parallel flights and show that the efficient sets under the spiked-MNL model are (partially) revenue-ordered.

Definition 4 (Revenue-ordered assortments). For parallel flights, an assortment $S$ is (partially) revenue-ordered if, for any product $j$ offered in $S$, the products associated with the same flight leg and with higher ranks than $j$ are also offered in the assortment. In other words, for any $j \in S$, we have $\overline{J}(j) \subseteq S$, where $\overline{J}(j)$ denotes the products on the same flight as $j$ that have higher fares.

We use the phrase (partially) revenue-ordered in the definition because, unlike in the single-leg RM setting, ranking a set of products for parallel flights by their fare classes gives only a partial order of the products, as products associated with different flight legs are incomparable. For brevity, when there is no ambiguity, we simply refer to (partially) revenue-ordered assortments as revenue-ordered.

For parallel flights, revenue-ordered assortments are indexed by the cheapest available fare class on each flight. Let $l = (i_1, \ldots, i_m)^T$ be a list of fare classes, where $m$ is the number of flights. Given the list $l \in I^m$, the associated revenue-ordered assortment is defined by

$$A_l = \bigcup_{f=1}^{m} \bigcup_{k=1}^{i_f} \{(k, f)\},$$

where $i_f$ is the cheapest available fare class offered on flight $f$. The following theorem characterizes when efficient sets are revenue-ordered for general choice models.

Theorem 1. For a parallel flight network, every efficient set under a given choice model is revenue-ordered if and only if, for any set $T$ that is not revenue-ordered, there exist constants $\mu_l \ge 0$ ($\forall\, l \in I^m$) satisfying $\sum_{l \in I^m} \mu_l = 1$ such that

$$\sum_{j' \in \overline{J}(j)} \sum_{l \in I^m} \mu_l P(j', A_l) \ge \sum_{j' \in \overline{J}(j)} P(j', T), \quad \forall\, j \in J,$$

and

$$\sum_{j \in J_f} \sum_{l \in I^m} \mu_l P(j, A_l) = \sum_{j \in J_f} P(j, T), \quad \forall\, f \in F.$$

Next, we prove the following result for the spiked-MNL model.

Theorem 2. For a parallel flight network, every efficient set under the spiked-MNL model is a revenue-ordered assortment.

By Theorem 2, when solving for optimal assortment policies for parallel flights under the spiked-MNL model, we can restrict our attention to revenue-ordered assortments. In other words, in the DP equation (1), the control space (the set of all assortments $S \subseteq J$) can be replaced by the set of all revenue-ordered assortments, $\{A_l : l \in I^m\}$. As a result, the computational complexity of the DP is reduced.

Remark 1. We have assumed that the spike effect is nonnegative in the previous analysis; namely, $\bar{v}_j \ge v_j$, where $\bar{v}_j$ is the attractiveness of product $j$ when it is the cheapest available fare class on its flight. If $\bar{v}_j < v_j$, the result of Theorem 2 may not hold. See a counterexample in Appendix A.1.

6. Deterministic Approximation and Static Booking Limit Control

According to Theorem 2 in the previous section, we can reduce the control space of the DP from the set of all assortments, which has a size of $2^{mn}$ for $m$ parallel flights and $n$ fare classes, to the set of revenue-ordered assortments, which has a size of $n^m$. Unfortunately, the reduced control space still has a size that is exponential in the number of flights, making the DP intractable for large $m$. This motivates us to consider deterministic approximations of the DP. A deterministic approximation commonly used in the RM literature is the choice-based deterministic linear program (CDLP). For both general choice models and the spiked-MNL model, the CDLP has exponentially many variables. We introduce a compact sales-based linear program (SBLP) formulation, which is equivalent to the CDLP and has only $mn$ variables. The SBLP can be used to construct static booking limit heuristics.
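To make the size comparison concrete, the following Python sketch enumerates the revenue-ordered assortments indexed by the cheapest open fare class on each flight. It is only an illustration: the function names are hypothetical, products are represented as (fare class, flight) pairs with class 1 the highest fare, and, following the count stated in the text, every flight is assumed to keep at least one class open.

```python
from itertools import product


def revenue_ordered_assortment(l):
    """Return the assortment indexed by l as a set of (fare_class, flight) pairs.

    l[f-1] is the cheapest open fare class on flight f (assumed >= 1);
    fare classes are ranked 1 (highest fare) to n (lowest fare).
    """
    return {(k, f) for f, i_f in enumerate(l, start=1) for k in range(1, i_f + 1)}


def all_revenue_ordered(n, m):
    """Yield (l, assortment) for every index vector l in {1, ..., n}^m."""
    for l in product(range(1, n + 1), repeat=m):
        yield l, revenue_ordered_assortment(l)


if __name__ == "__main__":
    n, m = 3, 2  # hypothetical toy instance: 3 fare classes, 2 flights
    count = sum(1 for _ in all_revenue_ordered(n, m))
    print(count, "revenue-ordered assortments versus", 2 ** (m * n), "arbitrary ones")
    print(sorted(revenue_ordered_assortment((2, 1))))
```

Even in this toy instance the enumeration grows as the number of classes raised to the number of flights, which is why the text turns to a compact formulation rather than enumerating the index set.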

6.1. Choice-based Deterministic Linear Programming

Choice-based Deterministic Linear Programming (CDLP) is an approximation of the original dynamic assortment optimization problem in which customer arrivals and choices are replaced by their means (Gallego et al. 2004). The decision variables of the CDLP are the fractions of time that different assortments are offered. Let $\alpha_S$ be the fraction of time that assortment $S \subseteq J$ is offered. The CDLP is given by

$$z_{\mathrm{CDLP}} = \max_{\alpha \ge 0} \; \lambda T \sum_{S \subseteq J} R(S)\, \alpha_S \tag{4a}$$

$$\text{s.t.} \quad \sum_{S \subseteq J} \alpha_S \le 1, \tag{4b}$$

$$\lambda T \sum_{S \subseteq J} Q(S)\, \alpha_S \le c. \tag{4c}$$

Recall that $R(S)$ is the expected revenue from offering assortment $S$ to a customer, and that $Q(S)$ is the vector of resource consumption rates given assortment $S$. The objective (4a) of the CDLP maximizes the total expected revenue over the horizon. Constraint (4b) specifies that the sum of the fractions of time over which different assortments are offered is bounded by 1. During the remaining fraction $1 - \sum_{S \subseteq J} \alpha_S$ of the horizon, all the fare classes are closed and only the null alternative is available. Constraint (4c) represents the seat capacity constraints.

For a parallel flight network, the number of variables in the CDLP is $2^{mn}$ for general choice models. Under the spiked-MNL model, by Theorem 2, $\alpha_S = 0$ in the optimal solution to the CDLP if assortment $S$ is not revenue-ordered, so the number of variables is reduced. However, the number of revenue-ordered assortments is $n^m$, which means that the CDLP under the spiked-MNL model can still have exponentially many variables. This motivates us to consider a deterministic LP formulation of polynomial size.

6.2. Sales-based Linear Programming

Under the classical MNL model, the CDLP can be transformed into an equivalent LP formulation called the Sales-based Linear Program (SBLP), which has polynomially many variables and constraints (Gallego et al. 2015). We propose an extension of the SBLP formulation for parallel flights under the spiked-MNL choice model.
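Before turning to the formulation, the CDLP inputs $R(S)$ and $Q(S)$ of Section 6.1 can be made concrete with a small Python sketch. It assumes the spiked-MNL form in which a product uses a separate "spiked" attractiveness when it is the cheapest open class on its flight and its regular attractiveness otherwise. All names (`v_spike`, `cdlp_value_and_load`) and all numbers are hypothetical and chosen only for illustration.

```python
def spiked_weights(S, v, v_spike):
    """Attractiveness of each product (class, flight) in assortment S.

    The cheapest open class on a flight is the one with the largest class index.
    """
    cheapest = {}
    for k, f in S:
        cheapest[f] = max(k, cheapest.get(f, 0))
    return {(k, f): (v_spike[k, f] if k == cheapest[f] else v[k, f]) for (k, f) in S}


def revenue_and_consumption(S, fares, v, v_spike, v0):
    """Return R(S) and the per-flight consumption rates Q(S) as a dict."""
    weights = spiked_weights(S, v, v_spike)
    denom = v0 + sum(weights.values())
    R = sum(fares[j] * w / denom for j, w in weights.items())
    Q = {}
    for (k, f), w in weights.items():
        Q[f] = Q.get(f, 0.0) + w / denom
    return R, Q


def cdlp_value_and_load(alpha, lam_T, fares, v, v_spike, v0):
    """Evaluate objective (4a) and the left-hand side of (4c) for given fractions alpha."""
    revenue, load = 0.0, {}
    for S, a in alpha.items():
        R, Q = revenue_and_consumption(S, fares, v, v_spike, v0)
        revenue += lam_T * a * R
        for f, q in Q.items():
            load[f] = load.get(f, 0.0) + lam_T * a * q
    return revenue, load


# Hypothetical instance: one flight, two fare classes.
fares = {(1, 1): 300.0, (2, 1): 100.0}
v = {(1, 1): 1.0, (2, 1): 2.0}
v_spike = {(1, 1): 1.5, (2, 1): 3.0}
v0, lam_T = 1.0, 10.0
S_full = frozenset({(1, 1), (2, 1)})
S_high = frozenset({(1, 1)})
alpha = {S_full: 0.5, S_high: 0.5}

revenue, load = cdlp_value_and_load(alpha, lam_T, fares, v, v_spike, v0)
print("expected revenue:", revenue, "expected seats used:", load)
```

The sketch only evaluates a given vector of fractions; solving the CDLP itself would additionally require an LP solver and a search over the exponentially many assortments.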
Let $x = (x_j : j \in J)$, where $\bar{v}_j x_j$ is the expected sales of product $j$ when it is the cheapest available on the corresponding flight. Here $\bar{v}_j$ is the attractiveness of product $j$ when it is the cheapest available fare class on its flight, $v_j$ is its attractiveness otherwise, and $I(j, S)$ indicates whether $j$ is the cheapest available fare class on its flight in assortment $S$. Let

$$w(S) := \sum_{j \in S} \left[ \bar{v}_j I(j, S) + v_j \big(1 - I(j, S)\big) \right]$$

denote the total attractiveness of products in assortment $S$, and

$$r(S) := \sum_{j \in S} r_j \left[ \bar{v}_j I(j, S) + v_j \big(1 - I(j, S)\big) \right]$$

denote the total revenue of products in assortment $S$ weighted by their attractiveness parameters. Recall that $\overline{J}(j)$ denotes the set of products associated with the same flight as product $j$ that have higher fares; we also define $\widetilde{J}(j) = \overline{J}(j) \cup \{j\}$. The SBLP under the spiked-MNL model is given by

$$z_{\mathrm{SBLP}} = \max_{x,\, x_0} \; \sum_{j \in J} r\big(\widetilde{J}(j)\big)\, x_j \tag{5a}$$

$$\text{s.t.} \quad x_0 + \sum_{j \in J} w\big(\widetilde{J}(j)\big)\, x_j = \lambda T, \tag{5b}$$

$$\sum_{j \in J_f} w\big(\widetilde{J}(j)\big)\, x_j \le c_f, \quad \forall\, f \in F, \tag{5c}$$

$$\sum_{j \in J_f} x_j \le \frac{x_0}{v_0}, \quad \forall\, f \in F, \tag{5d}$$

$$x \ge 0, \; x_0 \ge 0.$$

The objective (5a) is to maximize the total expected revenue. Constraint (5b) is due to the fact that the number of bookings plus the number of customers without purchase equals the number of arrivals. The quantity $w(\widetilde{J}(j))\, x_j$ is the expected sales on flight $f(j)$ when product $j$ is the cheapest available product on that flight. (To see this, note that by Theorem 2, when product $j$ is the cheapest available fare class on its associated flight $f(j)$, the available fare classes on flight $f(j)$ are $\widetilde{J}(j)$, since products with higher fares on the same flight must also be available.) Constraint (5c) is the seat capacity constraint for each flight. Constraint (5d) is derived from the fact that the null alternative is always available.

We show in the following theorem that the SBLP formulation (5) is equivalent to the CDLP formulation under the spiked-MNL model.

Theorem 3. Under the spiked-MNL model, given an optimal solution to CDLP (4), an optimal solution to SBLP (5) can be constructed in polynomial time, and vice versa.

The proof of Theorem 3 is constructive: we give an algorithm that converts optimal solutions between the two formulations in polynomial time (see Appendix A.3). In fact, we can further show that the optimal CDLP solution produced by the algorithm contains a sequence of nested assortments. That is, there exist a sequence of assortments $S_1 \supseteq S_2 \supseteq \cdots \supseteq S_k$ and an optimal CDLP solution $\{\alpha_S, S \subseteq J\}$ such that $\alpha_S > 0$ if and only if $S = S_j$ for some $j = 1, \ldots, k$. As a corollary to Theorem 3, we have the following result.

Corollary 2. Under the spiked-MNL model, the CDLP (4) has an optimal solution that consists of a sequence of nested assortments.

According to Corollary 2, if the CDLP has a unique optimal solution, the support of the optimal solution is a collection of nested assortments.
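The coefficients of SBLP (5) are simple running sums along each flight, which the following Python sketch illustrates for a single flight. It is a hedged illustration rather than the authors' implementation: the function names, the list-based representation (index 0 is the highest fare), and the numerical values of `x` and `x0` are all assumptions picked by hand, and the sketch checks a candidate point instead of solving the LP.

```python
def sblp_coefficients(fares, v, v_spike):
    """Coefficients of (5) for one flight, classes ordered from highest to lowest fare.

    Entry j is evaluated on the assortment "class j plus every higher-fare class",
    in which class j is the cheapest open class and therefore uses v_spike[j].
    """
    w_coef, r_coef = [], []
    w_above = r_above = 0.0  # contributions of the higher-fare classes
    for r_j, v_j, vs_j in zip(fares, v, v_spike):
        w_coef.append(vs_j + w_above)
        r_coef.append(r_j * vs_j + r_above)
        w_above += v_j
        r_above += r_j * v_j
    return w_coef, r_coef


def evaluate_sblp_point(x, x0, w_coef, r_coef, v0, lam_T, capacity, tol=1e-9):
    """Return the objective (5a) and whether (x, x0) satisfies (5b)-(5d)."""
    objective = sum(r * xj for r, xj in zip(r_coef, x))
    load = sum(w * xj for w, xj in zip(w_coef, x))
    feasible = (
        abs(x0 + load - lam_T) <= tol       # (5b) arrivals are conserved
        and load <= capacity + tol          # (5c) seat capacity
        and sum(x) <= x0 / v0 + tol         # (5d) null alternative always offered
        and x0 >= 0 and all(xj >= 0 for xj in x)
    )
    return objective, feasible


# Hypothetical single-flight instance with two fare classes.
w_coef, r_coef = sblp_coefficients([300.0, 100.0], [1.0, 2.0], [1.5, 3.0])
objective, feasible = evaluate_sblp_point(
    x=[2.0, 1.0], x0=3.0, w_coef=w_coef, r_coef=r_coef, v0=1.0, lam_T=10.0, capacity=8.0
)
print(w_coef, r_coef, objective, feasible)
```

For several flights the same computation would be repeated per flight, with (5b) summing the loads over all flights and (5c) and (5d) applied flight by flight.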
If the CDLP has multiple optimal solutions, it is possible that some of them do not have the nested assortment structure, but we can always find at least one optimal solution with the nested structure. An example for the latter case is given in Appendix A.3.

The result of Corollary 2 has an interesting implication. By Definition 3, the support of an optimal solution to the CDLP for parallel flights only contains efficient sets (see Liu and van Ryzin 2008). For a single-leg flight, Theorem 2 states that efficient sets are revenue-ordered assortments of the form $\{1, 2, \ldots, i\}$, which immediately implies Corollary 2. However, for parallel flights, a set of (partially) revenue-ordered assortments might not be nested. A simple counterexample is two parallel flights with one fare class on each flight: the assortment that offers only the first flight and the assortment that offers only the second flight are both revenue-ordered, but neither contains the other. Therefore, Corollary 2 is not directly implied by Theorem 2. Proving Corollary 2 is critical for constructing the static booking limit controls that we discuss later in this section.

We make two final remarks about the SBLP formulation. First, Dai et al. (2014) also provided an SBLP formulation under the spiked-MNL model, but their formulation has more variables and constraints than the SBLP formulation (5), as it does not take advantage of the revenue-ordered structure of optimal assortments (see Appendix A.2). Second, the SBLP formulation above assumes a time-homogeneous demand model and a single booking channel. The formulation can be extended to a time-varying demand model and multiple booking channels. This extension is used in our numerical experiments based on real-world airline data (Section 7).

6.3. Static Booking Limit Controls by Deterministic Approximation

Booking limits are widely used by airline reservation systems for controlling the availability of fare classes. With a partitioned booking limit policy, the seat capacity on a flight is divided among the fare classes, and a fare class is closed to customers once the number of sales of that class reaches its booking limit. With a nested booking limit policy, the booking limits are defined for subsets of fare classes that are nested by revenue order, so higher-ranked classes have access to the capacity reserved for lower-ranked classes. A detailed discussion of booking limit controls can be found in Talluri and Van Ryzin (2005).
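The difference between partitioned and nested control can be sketched in a few lines of Python. The example below is illustrative only: the per-class limits are hypothetical numbers, the function names are not from the paper, and the two nesting rules (standard and theft) are implemented according to one reading of the descriptions given later in this section, with the nested limit of a class covering that class and all cheaper classes on the flight.

```python
def nested_limits(s):
    """Nested limit of class j: sum of the per-class limits of j and all cheaper classes.

    Classes are indexed from 0 (highest fare) to n-1 (lowest fare).
    """
    limits, running = [], 0.0
    for s_j in reversed(s):
        running += s_j
        limits.append(running)
    return limits[::-1]


def is_open(j, bookings, s, policy):
    """Return True if class j is still open given bookings per class on one flight."""
    b = nested_limits(s)
    if policy == "partitioned":
        # Class j only sees its own allocation.
        return bookings[j] < s[j]
    if policy == "theft":
        # Every booking on the flight counts against the limit of class j.
        return sum(bookings) < b[j]
    if policy == "standard":
        # Class j closes once its own nest, or the nest of any higher class, is full;
        # a booking in class k counts only against nests that contain k.
        return all(sum(bookings[i:]) < b[i] for i in range(j + 1))
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    s = [4.0, 3.0]         # hypothetical per-class limits for two fare classes
    bookings = [3, 0]      # three high-fare seats sold so far
    for policy in ("partitioned", "standard", "theft"):
        print(policy, [is_open(j, bookings, s, policy) for j in range(len(s))])
```

In this toy state the cheaper class is still open under partitioned and standard nesting but closed under theft nesting, because theft nesting charges the three high-fare bookings against the cheaper class's limit.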
By Corollary 2, the optimal solution to SBLP (5) can be used naturally to construct booking limit policies, where the booking limit for each product is given by the expected sales of that product in the SBLP (5). In particular, let $x = (x_j : j \in J)$ be the solution to the SBLP. The expected number of sales of product $j$, denoted by $s_j$, is given by

$$s_j = \bar{v}_j x_j + v_j \sum_{j' \in \underline{J}(j)} x_{j'}. \tag{6}$$

The first term is the expected sales of product $j$ when $j$ itself is the cheapest available fare class on its flight, in which case its attractiveness is the spiked value $\bar{v}_j$. Recall that $\underline{J}(j)$ is the set of products that use the same flight leg as product $j$ and have lower fare classes. According to the definition of the SBLP, if any product $j' \in \underline{J}(j)$ is open, product $j$ must also be open. So $v_j x_{j'}$ is the expected sales of product $j$ when $j' \in \underline{J}(j)$ is the cheapest available fare class on the flight. By Eq (5c), $\sum_{j \in J_f} s_j \le c_f$. We thus define a (static) partitioned booking limit policy by setting the booking limit of product $j$ to $s_j$. We also define a (static) nested booking limit policy, where the booking limit for the subset $\underline{J}(j) \cup \{j\}$ is given by

$$b_j = \sum_{j' \in \underline{J}(j) \cup \{j\}} s_{j'}. \tag{7}$$

The nested booking limit policy defined by Eq (7) can be implemented using either standard nesting or theft nesting (Talluri and Van Ryzin 2005). Under standard nesting, product $j$ is closed when the booking limit of product $j$ or of any product ranked above $j$ has been reached. Under theft nesting, product $j$ is closed when the total bookings on flight $f(j)$ over all fare classes reach the booking limit of product $j$.

In sum, the optimal solution to the SBLP (5) defines three static booking limit heuristics: a partitioned booking limit policy, using the expected sales defined by Eq (6) as booking limits; a standard nested booking limit policy, using the booking limits defined by Eq (7); and a theft nested booking limit policy, using the booking limits defined by Eq (7).

Under any of the three booking limit policies above, once a product is closed, it remains closed until the end of the horizon. Therefore, when any of the static booking limit policies is implemented, a sequence of assortments $S_j$, $1 \le j \le k$, is offered such that $S_1 \supseteq S_2 \supseteq \cdots \supseteq S_k$. If all the random variables in the system associated with customer arrivals and choices are replaced by their expectations, the resulting sequence of assortments is the one given by Corollary 2.

7. Simulation

In this section, we conduct numerical experiments to study the performance of different assortment control policies. As is common in practice, we allow the arrival rates and the choice parameters to vary over time and among different booking channels in the numerical experiments.
Specifically, the selling horizon is divided into several phases of possibly different lengths; the set of phases is denoted by T.

Customers arrive via different booking channels, which are denoted by the set $C$. (For example, customers could book a ticket by phone, on the airline's website, or through a third-party travel agent.) We assume that the phases are divided in such a way that the arrival process in each phase $l \in T$ through each channel $c \in C$ can be viewed as a homogeneous Poisson process with a total expected number of arrivals $\lambda_{c,l}$. Likewise, the parameters of the spiked-MNL choice model also depend on the phase $l \in T$ and the booking channel $c \in C$.

7.1. Data Description

In the simulation, we test on synthetic data as well as real-world data provided by our airline partner.

7.1.1. Synthetic data. We consider a numerical example with $m = 10$ parallel flights, where each flight has $n = 4$ fare classes. Each flight $f$ has a seat capacity of $c_f = 25$. The prices of the fare classes are randomly generated between $50 and $500. The selling horizon is divided into 100 phases, and the expected number of arrivals $\lambda_{c,l}$ in each phase is sampled uniformly from $[2, 9.5]$. We randomly generate the parameters of the spiked-MNL model for each phase and each channel, while forcing the spike effect to be strictly positive (i.e., $\bar{v}_j > v_j$ for each product $j \in J$, where $\bar{v}_j$ is the attractiveness of product $j$ when it is the cheapest available fare class on its flight). We assume that the host airline faces competition and that its market share is about 50%. So, we select the arrival rates and the choice model parameters in such a way that the seat capacity of the host airline is scarce and is about half of the total number of customer arrivals.

7.1.2. Real-world data. The real-world booking data are provided by our airline partner for an anonymous origin-destination market, which has more than 30 parallel flights per day. Of these, the host airline operates $m = 20$ parallel flights per day in this market, and each flight has the same fare class structure with $n = 13$ fare classes. There are $|C| = 5$ booking channels.
The selling horizon is divided into $|T| = 200$ phases. We model and estimate customer demand as follows. Let $N$ be the set of the customers, including those who booked with the host airline and other airlines, associated with all the parallel flights on a specific departure date. (The airline data we used contain records of customers who booked with the host airline, as well as estimated numbers of customers who booked with other airlines.) When a customer $\tau \in N$ arrives via channel $c_\tau$ in phase $l_\tau$, she sees an assortment $S_\tau$ offered by the host airline and chooses alternative $j \in S_\tau \cup \{0\}$ with probability

$$P_{c_\tau, l_\tau}(j, S_\tau) = \frac{v(x_{\tau,j})}{\sum_{j' \in S_\tau} v(x_{\tau,j'}) + v_0(c_\tau, l_\tau)}. \tag{8}$$

Eq (8) represents a spiked-MNL model. Here, $x_{\tau,j}$ is a feature vector consisting of information about product $j$ and customer $\tau$. For example, product-specific features include price, change fees, and mileage gain; customer-specific features include the customer's booking channel and booking time. In addition, $x_{\tau,j}$ contains a binary variable indicating whether product $j$ is the cheapest available fare class on the associated flight. The function $v(x_{\tau,j})$ measures the attractiveness of product $j$ given the feature vector $x_{\tau,j}$. The quantity $v_0(c_\tau, l_\tau)$ is the null attractiveness, which depends on the assortments offered by the competing airlines. The parameters in Eq (8) are estimated using maximum likelihood estimation.

7.2. Assortment Policies Tested

We test the following assortment control policies in our numerical experiments.

FCFS: A naïve first-come, first-served heuristic that opens all fare classes on a flight as long as there is remaining capacity on that flight.

EMSR-b: The nested booking limit heuristic proposed by Belobaba (1989).

SBLP: The nested booking limit heuristic proposed in Section 6.3, where the booking limits are constructed from the optimal solution to the SBLP.

Updated: This policy uses a simulation-based optimization method to improve the booking limits of the SBLP policy (see details in Appendix A.4).

CDLP: Offering different assortments over the horizon with fractions specified by the optimal solution to the CDLP. The optimal solution to the CDLP can be obtained by solving the SBLP and then transforming the SBLP solution into a CDLP solution (see Appendix A.3).

Note that SBLP, Updated, and EMSR-b are all nested booking limit policies.
There are two variants of nested booking limit policies, i.e., standard nesting and theft nesting. We implement both variants for all three booking limit heuristics and use the suffixes -s and -t to distinguish them. A detailed discussion of standard versus theft nesting for booking limit policies can be found in Talluri and Van Ryzin (2005) and Haerian et al. (2006).

7.3. Simulation Results

We conduct the numerical experiments on a laptop with a 2.20 GHz CPU and 8.00 GB RAM. The assortment algorithms are coded in Matlab R2016a; we use CVX 2.1 as the modeling language and Gurobi 7.01 as the optimization package.

We consider the following performance benchmark. For a given assortment control policy $\psi$, let $E[Z_\psi]$ be the expected revenue achieved using policy $\psi$. Since the CDLP optimal value $z_{\mathrm{CDLP}}$ is an upper bound on the optimal expected revenue of the assortment optimization problem (1), we use the ratio $\rho_\psi := E[Z_\psi] / z_{\mathrm{CDLP}}$ as the performance metric of policy $\psi$. A good policy should yield a ratio $\rho_\psi$ that is close to 1.

7.3.1. Performance over a Synthetic Dataset. Figure 3 shows the ratio $\rho_\psi$ of different policies averaged over 100 simulation runs with 95% confidence intervals. We find that the CDLP-based heuristic has the best average performance among all the policies tested. Both EMSRb-s and SBLP-s achieve ratios above 0.92. The updated booking limit heuristic using standard nesting (Updated-s) improves the revenue of SBLP-s by roughly 2%. The performance under theft nesting is in general not as good as that under standard nesting. In particular, the SBLP-based heuristic using theft nesting (SBLP-t) has an average revenue that is 5% less than the revenue of SBLP-s. The first-come, first-served (FCFS) heuristic has the worst performance, with a ratio below 0.8.

7.3.2. Performance over the Real-world Dataset. Next we examine the performance of the assortment control policies with the real-world data. We train the assortment control policies based on the demand models calibrated with the data of Monday flights in year 2011. We then test their performance using the demand models calibrated with the data in year 2012.
Table 1 shows the sample means and standard errors of revenues for each control policy. Figure 4 shows the ratios $\rho_\psi$ of the different policies on the testing set over 1000 simulation runs, together with 95% confidence intervals. We again observe that the CDLP-based heuristic has the best performance among all policies tested. The FCFS heuristic performs the worst, with ratios close to