Choice-Based Revenue Management: An Empirical Study of Estimation and Optimization. Appendix. Brief description of maximum likelihood estimation


Gustavo Vulcano (Leonard N. Stern School of Business, New York University, 44 West 4th Street, Suite 8-76, New York, NY 10012, gvulcano@stern.nyu.edu)
Garrett van Ryzin (Graduate School of Business, Columbia University, New York, NY 10027, gjv1@columbia.edu)
Wassim Chaar (Sabre Holdings, Research Group, Southlake, Texas 76092, wassim.chaar@sabre-holdings.com)

In this online supplement we provide supplementary material for the paper: a brief description of maximum likelihood estimation, detailed statistics for the examples discussed in the paper, an alternative discrete choice formulation for Example 0, and the derivation of the asymptotic standard errors used in the main body of the paper.

A1 Brief description of maximum likelihood estimation

Choice models are most frequently estimated by maximum likelihood estimation (MLE). Let $Z$ be a random variable, and let $x$ be a vector of known attributes that influence the distribution of $Z$. Denote this dependence as $Z \sim f(x; \theta)$, where $f$ is the distribution of $Z$ and $\theta$ is a vector of parameters, at least some of which are unknown a priori. Using a sample of observations $z_1, \dots, z_N$ from the process being modeled, where $N$ is the number of observations drawn from the whole population, we construct an estimator, i.e., a function of the observations, to estimate the unknown parameters. We use $\hat\theta$ to denote an estimator of $\theta$. Before the sample is drawn, $\hat\theta$ is a random variable; after the sample is taken and the estimator is evaluated on it, $\hat\theta$ is simply a vector of numbers. To distinguish between the random variable $\hat\theta$ and its realization for a particular sample, we call the random variable an estimator and any given realization of it an estimate. Maximum likelihood estimators choose the parameters $\theta$ that maximize the probability of observing the given data. The likelihood of the $n$-th observation $z_n$, $1 \le n \le N$, is denoted $f(z_n \mid x_n, \theta)$. The likelihood of observing the independent draws $((z_1, x_1), \dots, (z_N, x_N))$ conditioned on
the parameters $\theta$ is simply the product

$$L(x, \theta) = \prod_{n=1}^{N} f(z_n \mid x_n, \theta).$$

The MLE problem is to find the estimate $\hat\theta$ that maximizes $L(x, \theta)$. The usual approach, though, is to exploit the strict monotonicity of the logarithm and maximize $\log L(x, \theta) = \sum_{n=1}^{N} \log f(z_n \mid x_n, \theta)$ instead, which turns the product into a sum without changing the optimal parameter estimate. For example, for the MNL model (5)-(6), if we can observe the arrival process and the $n$-th observation $z_n$ represents the fact that customer $n$ chooses alternative $i \in C_n$, then $f(z_n \mid x_n, \beta) = P_n(i)$; if customer $n$ does not purchase from our airline, then $f(z_n \mid x_n, \beta) = P_n(0)$.

A2 Supplemental material for Example 0

The tables and figures in this section provide data supporting some of the claims made in Section 4.3. Table A1 provides descriptive statistics for Example 0. This example corresponds to outbound flights from New York to Florida, so the arrival time attribute is used. In this table, the average observed bookings are computed over the H = 100 histories generated by simulation.

Table A1: Descriptive statistics for Example 0 (columns: departure day; flight number; arrival time; minimum open fare; maximum open fare; average observed bookings; time-slot indicators Mor., Noon, Aft., Ev.; departure-day indicators d = 1, d = 2, d = 3)

Table A2 confirms the goodness of fit of the estimation for the base case of Example 0. Here, we generate H = 100 new histories of bookings and compare the observed bookings with the numbers predicted by the estimated parameters. The p-value of the χ² test is one.
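A minimal Python sketch of a Pearson χ² goodness-of-fit check like the one summarized by Table A2, comparing observed bookings with the bookings predicted by the estimated parameters. The function names and the degrees-of-freedom convention (number of cells minus one, minus any estimated parameters the caller chooses to subtract) are assumptions; the text does not state how the test was configured. The upper-tail probability is computed from the Poisson-weighted series for the regularized incomplete gamma function, so only the standard library is needed.

```python
import math


def chi2_sf(x: float, df: float) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), df > 0."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Regularized lower incomplete gamma as a Poisson-weighted series:
    # P(a, z) = sum_{n>=0} exp(-z) * z**(a+n) / Gamma(a+n+1).
    # Each term is evaluated in log space to avoid overflow/underflow.
    lower, n = 0.0, 0
    while True:
        term = math.exp((a + n) * math.log(z) - z - math.lgamma(a + n + 1.0))
        lower += term
        if n > z and term < 1e-17:  # past the peak and terms are negligible
            break
        n += 1
    return min(1.0, max(0.0, 1.0 - lower))


def pearson_chi2(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over cells with E > 0."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def goodness_of_fit(observed, expected, n_estimated=0):
    """Return (statistic, degrees of freedom, p-value).

    Assumed convention: df = number of cells - 1 - n_estimated.
    """
    stat = pearson_chi2(observed, expected)
    df = len(observed) - 1 - n_estimated
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    return stat, df, chi2_sf(stat, df)
```

For instance, `goodness_of_fit([48, 52, 100], [50, 50, 100])` returns a statistic of 0.16 on 2 degrees of freedom with a p-value of about 0.923; when observed and expected counts agree in every cell, the statistic is zero and the p-value is exactly one.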

Table A2: True expected and predicted bookings for Example 0 (two side-by-side panels, each with columns: flight; observed bookings; expected bookings)

Figure A2.1 shows how well the predicted choice probabilities fit the true ones for Example 0.

[Figure A2.1: plot titled "True Expected vs. Predicted Choice Probabilities"; horizontal axis: (booking day, departure day) combinations from (B1, D1) to (B5, D3), including (B4, no purchase); vertical axis: probability; series: True Prob. and Predicted Prob.]
Figure A2.1: Goodness of fit of the probabilities for each combination (b, d), for booking days b = 1, ..., 5 in Example 0.

Table A3 shows the quality of the estimates obtained by the EM procedure starting from different initial values of $\hat\lambda$. The estimates are good in both cases, with one exception: when we start from $\hat\lambda = 0.3$, we can reject, at the 0.01 significance level, the null hypothesis that the coefficient $\beta_2$ is zero, even though its true value is indeed zero.

Table A3: Estimated parameters for Example 0 under different starting values of $\hat\lambda$ (rows: the $\hat\beta$ coefficients and $\hat\lambda$; columns: true value, then estimated value, bias, ASE and t-statistic for each of the starting values $\hat\lambda = 0.3$ and $\hat\lambda = 0.6$)

A3 Alternative preliminary example

Here we present an alternative preliminary example that does not suffer from the identifiability problem of Example 0. The input parameters are described in Table A4. The time slots are treated as in Example 0, but the departure-day indicators are treated as standard categorical variables, where a flight on day d = 3 is represented by setting the attributes $x_6 = x_7 = 0$. In symbols, for customer $n$, given $x_{in}$ and $\beta$, and noting that $x_2 + x_3 + x_4 + x_5 = 1$, we have

$$v_{in} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_7 x_7 = \beta_1 x_1 + (\beta_2 - \beta_5) x_2 + (\beta_3 - \beta_5) x_3 + (\beta_4 - \beta_5) x_4 + \beta_5 + \beta_6 x_6 + \beta_7 x_7.$$

By defining $\beta_0 = \beta_5$ and relabeling the parameters to have consecutive numbering, we can equivalently consider a mean utility of the form $v_{in} = \beta_0 + \beta^T x_{in}$ for the different purchase options.

Table A4: Input parameter values for alternative preliminary example
Attribute | Description | Value
$\beta_1$ | Base fare | -1.0
$\beta_2$ | Morning flight (before 11AM) | 0.5
$\beta_3$ | Noon flight (9AM-3PM) | 0.7
$\beta_4$ | Afternoon flight (1PM-7PM) | 0.3
$\beta_5$ | Evening flight (5PM-Midnight) | 0.5
$\beta_6$ | Indicator for flying on day d = … | …
$\beta_7$ | Indicator for flying on day d = … | …
$\lambda$ | Arrival rate | 0.3

The output of the EM method is reported in Table A5. It shows a strong bias in the
additive utility parameter $\hat\beta_0$, which shifts the utilities of all purchase alternatives. However, we verified that the true expected numbers of bookings and the true expected utilities are very close to the predicted ones.

Table A5: Output parameters for alternative preliminary example (columns: parameter; description; true value; estimated value; bias; ASE; t-statistic; rows: $\hat\beta_0$ base utility for any purchase option, $\hat\beta_1$ base fare, $\hat\beta_2$ morning flight (before 11AM), $\hat\beta_3$ noon flight (9AM-3PM), $\hat\beta_4$ afternoon flight (1PM-7PM), $\hat\beta_5$ and $\hat\beta_6$ indicators for flying on a given departure day, $\hat\lambda$ arrival rate)

A4 Supplemental material for Examples 1 and 2

Table A6 provides descriptive statistics for Examples 1 and 2. These examples correspond to outbound flights from New York to Florida.

Table A6: Descriptive statistics for Examples 1 and 2 (columns: example; flight number; arrival time; minimum open fare; maximum open fare; bookings observed; time-slot indicators Mor., Noon, Aft., Ev.)

Tables A7 and A8 show the revenues obtained under randomly perturbed $\hat\lambda$ and $(\hat\beta_0, \hat\beta)$ estimates for airline Examples 1 and 2, respectively.
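The text does not say how the perturbations or the %Gap intervals were generated. The following is a minimal Python sketch under two loudly labeled assumptions: each estimate is scaled independently by $(1 + u)$ with $u$ uniform on $(-p, p)$, and %Gap is the relative difference between the revenue obtained with perturbed and with unperturbed estimates on paired simulation runs, summarized by its mean and a normal-approximation 95% interval. The function names `perturb` and `gap_summary` are hypothetical, and the paper's %Gap may be defined with the opposite sign.

```python
import math
import random
from statistics import mean, stdev


def perturb(theta, pct, rng=None):
    """Hypothetical scheme: scale each estimate by (1 + u), u ~ Uniform(-pct, pct).

    pct = 0.10, 0.25 or 0.50 mimics the +/-10%, +/-25% and +/-50% scenarios.
    """
    if rng is None:
        rng = random.Random()
    return [t * (1.0 + rng.uniform(-pct, pct)) for t in theta]


def gap_summary(perturbed, baseline, z=1.96):
    """Mean %Gap and a normal-approximation 95% CI over paired simulation runs.

    Assumed definition: %Gap = 100 * (perturbed - baseline) / baseline per run.
    """
    if len(perturbed) != len(baseline) or len(perturbed) < 2:
        raise ValueError("need at least two paired runs of equal length")
    gaps = [100.0 * (p - b) / b for p, b in zip(perturbed, baseline)]
    m = mean(gaps)
    half = z * stdev(gaps) / math.sqrt(len(gaps))
    return m, (m - half, m + half)
```

For example, `perturb(theta_hat, 0.25, random.Random(7))` would draw one ±25% scenario for a hypothetical estimate vector `theta_hat`; the revenues from simulating the policies built on the perturbed and on the unperturbed estimates would then be passed to `gap_summary`. A fuller version would also keep the perturbed $\hat\lambda$ inside $(0, 1)$.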

Table A7: Test of revenue robustness with respect to perturbed estimates for Example 1
Perturbation | Scenario | Perturbed revenue | E[%Gap] | 95% CI range for the %Gap
±10% | 1 | 177,… | … | (-5.31%, 3.19%)
±10% | 2 | 178,… | … | (-3.56%, 2.62%)
±10% | 3 | 181,… | … | (-2.66%, 4.62%)
±10% | 4 | … | … | (-0.41%, 6.11%)
±10% | 5 | … | … | (-3.70%, 4.49%)
±25% | 1 | 142,… | … | (-10.15%, -0.93%)
±25% | 2 | 142,… | … | (-6.70%, -1.79%)
±25% | 3 | 161,… | … | (-15.72%, -4.73%)
±25% | 4 | 179,… | … | (-2.42%, 2.77%)
±25% | 5 | 176,… | … | (-4.79%, 2.02%)
±50% | 1 | 171,… | … | (-8.63%, 0.18%)
±50% | 2 | 165,… | … | (-12.56%, -2.49%)
±50% | 3 | 172,… | … | (-8.97%, 1.11%)
±50% | 4 | 165,… | … | (-14.20%, -0.75%)
±50% | 5 | 167,… | … | (-11.61%, -1.75%)

Table A8: Test of revenue robustness with respect to perturbed estimates for Example 2
Perturbation | Scenario | Perturbed revenue | E[%Gap] | 95% CI range for the %Gap
±10% | 1 | 161,… | … | (4.62%, 12.28%)
±10% | 2 | 157,… | … | (2.95%, 9.18%)
±10% | 3 | 153,… | … | (0.13%, 6.51%)
±10% | 4 | 153,… | … | (-2.73%, 9.08%)
±10% | 5 | 148,… | … | (-3.10%, 3.27%)
±25% | 1 | 160,… | … | (4.17%, 11.75%)
±25% | 2 | 150,… | … | (-4.35%, 6.85%)
±25% | 3 | 153,… | … | (0.70%, 6.21%)
±25% | 4 | 159,… | … | (2.64%, 11.38%)
±25% | 5 | 155,… | … | (1.75%, 7.19%)
±50% | 1 | 132,… | … | (-14.45%, -7.82%)
±50% | 2 | 123,… | … | (-20.91%, -12.50%)
±50% | 3 | 152,… | … | (-0.54%, 5.26%)
±50% | 4 | 122,… | … | (-21.87%, -13.33%)
±50% | 5 | 142,… | … | (-7.49%, -1.00%)

A5 Calculation of the asymptotic standard error (ASE)

The asymptotic covariance matrix of the MLE is approximated using the gradient vector of the expected complete-data log-likelihood function (9) evaluated at the MLE $\hat\theta$ (see McLachlan and Krishnan [25, Section 4.3] for a justification of the approximation, and Greene [18, Section ] for a comprehensive study of asymptotic covariance estimation for MLE). Let $\hat g_i$ be the gradient of the expected log-likelihood function (9) evaluated at observation $i$, where
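each observation corresponds to a period with either a purchase or a no-purchase transaction. In symbols:

$$\hat g_i = \nabla_{\theta} \log f(z_i \mid x_i, \hat\theta).$$

There are three cases to consider when computing the gradient of the expected log-likelihood function (9) at observation $i$:

1. If $i \in P_{b,h}$ (i.e., in small period $i$ of booking day $b$ in history $h$ there is a customer who buys from our airline), then

$$\frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \beta_0} = 1 - \frac{\sum_{j \in C_b^a} e^{\beta_0 + \beta^T x_j}}{\sum_{j \in C_b^a} e^{\beta_0 + \beta^T x_j} + 1},$$

$$\frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \beta_k} = x_{i,k} - \frac{\sum_{j \in C_b^a} e^{\beta_0 + \beta^T x_j}\, x_{j,k}}{\sum_{j \in C_b^a} e^{\beta_0 + \beta^T x_j} + 1}, \qquad k = 1, \dots, K,$$

$$\frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \lambda} = \frac{1}{\lambda}.$$

2. If $i \in \bar P_b$ (a period of booking day $b$ with no purchase from our airline, irrespective of the booking history) and a customer arrives in period $i$ but does not purchase from our airline, then

$$\frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \beta_0} = - \frac{\sum_{j \in C_b^a} e^{\beta_0 + \beta^T x_j}}{\sum_{j \in C_b^a} e^{\beta_0 + \beta^T x_j} + 1},$$

$$\frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \beta_k} = - \frac{\sum_{j \in C_b^a} e^{\beta_0 + \beta^T x_j}\, x_{j,k}}{\sum_{j \in C_b^a} e^{\beta_0 + \beta^T x_j} + 1}, \qquad k = 1, \dots, K,$$

$$\frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \lambda} = \frac{1}{\lambda}.$$

3. If $i \in \bar P_b$, irrespective of the booking history, and there is no arrival in period $i$, then

$$\frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \beta_k} = 0, \quad k = 0, 1, \dots, K, \qquad \frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \lambda} = -\frac{1}{1 - \lambda}.$$

These expressions follow from differentiating the logarithm of $\lambda$ times the MNL probability of the purchased alternative (case 1), $\lambda$ times the MNL no-purchase probability (case 2), and $1 - \lambda$ (case 3), where the no-purchase option has utility zero. From here, we build the gradient vector for observation $i$, depending on the case $j = 1, 2, 3$ just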
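described:

$$\hat g_{j,i}^T = \left( \frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \beta_0}, \frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \beta_1}, \dots, \frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \beta_K}, \frac{\partial \log f(z_i \mid x_i, \hat\theta)}{\partial \lambda} \right).$$

We approximate the information matrix by the sum of outer products $\hat I(\hat\theta) = \sum_i \hat g_i \hat g_i^T$ over all periods. Grouping the periods by case, we compute this sum of matrices as

$$\hat I(\hat\theta) = \sum_{h=1}^{H} \sum_{b=1}^{B} \sum_{i \in P_{b,h}} \hat g_{1,i} \hat g_{1,i}^T + \sum_{b=1}^{B} \Big( \sum_{h=1}^{H} |\bar P_{b,h}| \Big) \hat a_b \, \hat g_{2,i} \hat g_{2,i}^T + \sum_{b=1}^{B} \Big( \sum_{h=1}^{H} |\bar P_{b,h}| \Big) (1 - \hat a_b) \, \hat g_{3,i} \hat g_{3,i}^T,$$

where $|\bar P_{b,h}|$ is the number of periods of booking day $b$ in history $h$ without a purchase from our airline, and $\hat g_{2,i}$, $\hat g_{3,i}$ are the case-2 and case-3 gradients for booking day $b$. The asymptotic covariance matrix of $\hat\theta$ is then approximated by $[\hat I(\hat\theta)]^{-1}$. If $J$ is the number of parameters to estimate, then $\hat g_i \in \mathbb{R}^{J \times 1}$ and $[\hat I(\hat\theta)]^{-1} \in \mathbb{R}^{J \times J}$. The asymptotic standard error of each estimate (each coordinate of the vector $\hat\theta$) is the square root of the corresponding element on the main diagonal of $[\hat I(\hat\theta)]^{-1}$.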
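As a minimal, stdlib-only Python sketch (all function names and the data layout are hypothetical, not the authors' code), the block below evaluates the three per-period log-likelihood cases, their analytic gradients as listed in the three cases of this section, and the outer-product sum $\hat I(\hat\theta)$ grouped by case, then returns the square roots of the diagonal of $[\hat I(\hat\theta)]^{-1}$. Parameters are ordered $(\beta_0, \beta_1, \dots, \beta_K, \lambda)$. The weights $\hat a_b$ are simply passed in; we assume they come from the EM procedure of the main paper.

```python
import math


def utilities(beta0, beta, X):
    """Mean utilities v_j = beta0 + beta^T x_j for the alternatives offered on one day."""
    return [beta0 + sum(bk * xk for bk, xk in zip(beta, x)) for x in X]


def log_f(theta, case, X, j=None):
    """Per-period log-likelihood; theta = [beta0, beta_1, ..., beta_K, lam].

    The no-purchase option has utility 0, so its weight in the MNL denominator is 1.
    """
    beta0, beta, lam = theta[0], theta[1:-1], theta[-1]
    v = utilities(beta0, beta, X)
    log_denom = math.log(1.0 + sum(math.exp(u) for u in v))
    if case == 1:  # arrival and purchase of offered alternative j
        return math.log(lam) + v[j] - log_denom
    if case == 2:  # arrival without a purchase from our airline
        return math.log(lam) - log_denom
    return math.log(1.0 - lam)  # case 3: no arrival


def gradient(theta, case, X, j=None):
    """Analytic gradient of log_f w.r.t. (beta0, beta_1..beta_K, lam), cases 1-3."""
    beta0, beta, lam = theta[0], theta[1:-1], theta[-1]
    K = len(beta)
    if case == 3:
        return [0.0] * (K + 1) + [-1.0 / (1.0 - lam)]
    w = [math.exp(u) for u in utilities(beta0, beta, X)]
    denom = 1.0 + sum(w)
    g0 = -sum(w) / denom
    gk = [-sum(wj * x[k] for wj, x in zip(w, X)) / denom for k in range(K)]
    if case == 1:  # a purchase adds the chosen alternative's own terms
        g0 += 1.0
        gk = [g + X[j][k] for k, g in enumerate(gk)]
    return [g0] + gk + [1.0 / lam]


def add_outer(M, g, weight=1.0):
    """In place: M += weight * g g^T."""
    for r, gr in enumerate(g):
        for c, gc in enumerate(g):
            M[r][c] += weight * gr * gc


def invert(M):
    """Gauss-Jordan inversion with partial pivoting (fine for small J)."""
    n = len(M)
    A = [row[:] + [1.0 if i == k else 0.0 for k in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("information matrix is (numerically) singular")
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def asymptotic_standard_errors(theta, offer_sets, purchases, n_nopurchase, a_hat):
    """OPG approximation: I = sum_i g_i g_i^T, ASE = sqrt(diag(I^{-1})).

    offer_sets[b]  : attribute vectors of the alternatives offered on booking day b
    purchases      : (b, j) pairs, one per purchase period over all histories
    n_nopurchase[b]: number of no-purchase periods of day b, summed over histories
    a_hat[b]       : weight of case 2 for day b (taken as an input here)
    """
    J = len(theta)
    info = [[0.0] * J for _ in range(J)]
    for b, j in purchases:
        add_outer(info, gradient(theta, 1, offer_sets[b], j))
    for b, X in enumerate(offer_sets):
        add_outer(info, gradient(theta, 2, X), n_nopurchase[b] * a_hat[b])
        add_outer(info, gradient(theta, 3, X), n_nopurchase[b] * (1.0 - a_hat[b]))
    cov = invert(info)
    return [math.sqrt(cov[k][k]) for k in range(J)]
```

Checking `gradient` against central finite differences of `log_f` is a cheap guard against sign errors in the case formulas. If some parameters are not identified (for example, perfectly collinear attribute columns), $\hat I(\hat\theta)$ is singular and `invert` raises an error rather than returning meaningless standard errors.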
