OPERATIONS RESEARCH, doi 10.1287/opre.1070.0427ec, pp. ec1–ec5. e-companion ONLY AVAILABLE IN ELECTRONIC FORM. © 2007 INFORMS.

Electronic Companion — "A Learning Approach for Interactive Marketing to a Customer Segment" by Dimitris Bertsimas and Adam J. Mersereau, Operations Research, doi 10.1287/opre.1070.0427.

This document accompanies "A Learning Approach for Interactive Marketing to a Customer Segment" by Bertsimas and Mersereau. We provide proofs of some results from that paper and some additional computational results. References point to that document, and we use the notation specified there.

A. Proofs of Results

Proposition 1 can be extended to the case in which the decisions $x$ are not fixed but are decided in stages.

Corollary 1. For $\mu \ge 0$, and where $x_i$ may depend on $s_t + \sum_{j=t}^{i-1} y_j$ and $f_t + \sum_{j=t}^{i-1} (x_j - y_j)$ for $i > t$,

$$E_{y_t}\Big[ E_{y_{t+1}}\Big[ \cdots E_{y_{t+\mu}}\Big[ J_t\Big(s_t + \sum_{i=t}^{t+\mu} y_i,\; f_t + \sum_{i=t}^{t+\mu} (x_i - y_i)\Big) \,\Big|\, x_{t+\mu},\; s_t + \sum_{i=t}^{t+\mu-1} y_i,\; f_t + \sum_{i=t}^{t+\mu-1} (x_i - y_i) \Big] \cdots \,\Big|\, x_{t+1},\, s_t + y_t,\, f_t + x_t - y_t \Big] \,\Big|\, x_t, s_t, f_t \Big] \ge J_t(s_t, f_t).$$

Proof. We can apply Proposition 1 to the inner expectation to show that the left-hand side of the inequality is greater than or equal to

$$E_{y_t}\Big[ E_{y_{t+1}}\Big[ \cdots E_{y_{t+\mu-1}}\Big[ J_t\Big(s_t + \sum_{i=t}^{t+\mu-1} y_i,\; f_t + \sum_{i=t}^{t+\mu-1} (x_i - y_i)\Big) \,\Big|\, x_{t+\mu-1},\; s_t + \sum_{i=t}^{t+\mu-2} y_i,\; f_t + \sum_{i=t}^{t+\mu-2} (x_i - y_i) \Big] \cdots \,\Big|\, x_{t+1},\, s_t + y_t,\, f_t + x_t - y_t \Big] \,\Big|\, x_t, s_t, f_t \Big].$$

The same argument applied $\mu - 1$ more times yields the desired result. □

Proof of Proposition 2. Observe that a feasible policy for the $(N_A + N_B)$-stage-size problem is to set the decision $x_t$ at stage $t$ equal to the optimal decision for the $N_A$-stage-size problem plus the optimal decision for the $N_B$-stage-size problem. Let $\tilde{J}_t(s_t, f_t, N_A, N_B, T)$ denote the cost-to-go function corresponding to this policy beginning in stage $t$ and state $(s_t, f_t)$. Observe that $J_{T-1}(s_{T-1}, f_{T-1}, N, T)$ is linear in $N$, so

$$\tilde{J}_{T-1}(s_{T-1}, f_{T-1}, N_A, N_B, T) = J_{T-1}(s_{T-1}, f_{T-1}, N_A, T) + J_{T-1}(s_{T-1}, f_{T-1}, N_B, T).$$

Now consider stage $t < T - 1$.
Assume for any $s_{t+1}$ and $f_{t+1}$ that

$$\tilde{J}_{t+1}(s_{t+1}, f_{t+1}, N_A, N_B, T) \ge J_{t+1}(s_{t+1}, f_{t+1}, N_A, T) + J_{t+1}(s_{t+1}, f_{t+1}, N_B, T),$$

and let

$$x^A(s_t) = \arg\max_{x : |x| = N_A} \Big\{ \Big(\frac{s_t}{s_t + f_t}\Big)' x + E_y\big[ J_{t+1}(s_t + y, f_t + x - y, N_A, T) \,\big|\, x, s_t, f_t \big] \Big\},$$
$$x^B(s_t) = \arg\max_{x : |x| = N_B} \Big\{ \Big(\frac{s_t}{s_t + f_t}\Big)' x + E_y\big[ J_{t+1}(s_t + y, f_t + x - y, N_B, T) \,\big|\, x, s_t, f_t \big] \Big\}.$$
Then

$$\tilde{J}_t(s_t, f_t, N_A, N_B, T) = \Big(\frac{s_t}{s_t + f_t}\Big)'(x^A + x^B) + E_y\big[ \tilde{J}_{t+1}(s_t + y, f_t + x^A + x^B - y, N_A, N_B, T) \,\big|\, x^A + x^B, s_t, f_t \big]$$
$$\ge \Big(\frac{s_t}{s_t + f_t}\Big)'(x^A + x^B) + E_y\big[ J_{t+1}(s_t + y, f_t + x^A + x^B - y, N_A, T) \,\big|\, x^A + x^B, s_t, f_t \big] + E_y\big[ J_{t+1}(s_t + y, f_t + x^A + x^B - y, N_B, T) \,\big|\, x^A + x^B, s_t, f_t \big]$$
$$= \Big(\frac{s_t}{s_t + f_t}\Big)'(x^A + x^B) + E_{y^A}\big[ E_{y^B}\big[ J_{t+1}(s_t + y^A + y^B, f_t + x^A + x^B - y^A - y^B, N_A, T) \,\big|\, x^B, s_t + y^A, f_t + x^A - y^A \big] \,\big|\, x^A, s_t, f_t \big] + E_{y^B}\big[ E_{y^A}\big[ J_{t+1}(s_t + y^A + y^B, f_t + x^A + x^B - y^A - y^B, N_B, T) \,\big|\, x^A, s_t + y^B, f_t + x^B - y^B \big] \,\big|\, x^B, s_t, f_t \big]$$
$$\ge \Big(\frac{s_t}{s_t + f_t}\Big)'(x^A + x^B) + E_{y^A}\big[ J_{t+1}(s_t + y^A, f_t + x^A - y^A, N_A, T) \,\big|\, x^A, s_t, f_t \big] + E_{y^B}\big[ J_{t+1}(s_t + y^B, f_t + x^B - y^B, N_B, T) \,\big|\, x^B, s_t, f_t \big]$$
$$= J_t(s_t, f_t, N_A, T) + J_t(s_t, f_t, N_B, T),$$

where the first inequality follows from the induction assumption, the equality in the third-to-last line follows from Lemma 1, and the inequality in the second-to-last line follows from Proposition 1. The desired result follows from induction and the fact that $J_0(s_0, f_0, N_A + N_B, T) \ge \tilde{J}_0(s_0, f_0, N_A, N_B, T)$. □

Proof of Proposition 3. The optimal $(T_A + T_B)$-horizon policy yields at least as much expected reward as the policy that uses the optimal $T_A$-horizon policy for times $0, 1, \ldots, T_A - 1$, then uses the optimal $T_B$-horizon policy for times $T_A, T_A + 1, \ldots, T_A + T_B - 1$. If we denote the decisions and outcomes in periods $0, \ldots, T_A - 1$ under this policy as the vectors $x^A$ and $y^A$, respectively, then this statement is equivalent to

$$J_0(s_0, f_0, N, T_A + T_B) \ge J_0(s_0, f_0, N, T_A) + E_{x^A, y^A}\big[ J_0(s_0 + y^A, f_0 + x^A - y^A, N, T_B) \big],$$

where the expectation is over the decisions and outcomes of the $T_A$-stage problem and is shorthand for the expression on the left-hand side of the inequality in Corollary 1 with $t = 0$ and $\mu = T_A - 1$. By that corollary, then, we have

$$E_{x^A, y^A}\big[ J_0(s_0 + y^A, f_0 + x^A - y^A, N, T_B) \big] \ge J_0(s_0, f_0, N, T_B).$$

This gives the desired result, $J_0(s_0, f_0, N, T_A + T_B) \ge J_0(s_0, f_0, N, T_A) + J_0(s_0, f_0, N, T_B)$. □
Proof of Proposition 4. The problem with stage size $\mu N$ and horizon $T/\mu$ is equivalent to a problem with horizon $T$ in which stage sizes are $0$ when the stage $t$ is not divisible by $\mu$, and $\mu N$ when $t$ is divisible by $\mu$. Denote the optimal value function of this modified problem by $\bar{J}_t(s_t, f_t, \mu N, T)$. Adopt the convention $J_T(s_T, f_T, N, T) = \bar{J}_T(s_T, f_T, \mu N, T) = 0$ for all $s_T$, $f_T$. We proceed by induction. Assume for some $t$ divisible by $\mu$ that $\bar{J}_{t+\mu}(s_{t+\mu}, f_{t+\mu}, \mu N, T) \le J_{t+\mu}(s_{t+\mu}, f_{t+\mu}, N, T)$ for all $s_{t+\mu}$, $f_{t+\mu}$. Fix $s_t$, $f_t$ and let

$$\bar{x} = \arg\max_{x : |x| = \mu N} \Big\{ \Big(\frac{s_t}{s_t + f_t}\Big)' x + E_y\big[ \bar{J}_{t+\mu}(s_t + y, f_t + x - y, \mu N, T) \,\big|\, x, s_t, f_t \big] \Big\},$$

and arbitrarily assign $x_t, x_{t+1}, \ldots, x_{t+\mu-1}$ such that $\bar{x} = \sum_{\tau=t}^{t+\mu-1} x_\tau$ and $|x_t| = |x_{t+1}| = \cdots = |x_{t+\mu-1}| = N$. This represents a feasible (non-Markov) policy for stages $t, t+1, \ldots, t+\mu-1$ of the $T$-stage, $N$-stage-size problem.
By the definition of $\bar{x}$, we can write

$$\bar{J}_t(s_t, f_t, \mu N, T) = \max_{x : |x| = \mu N} \Big\{ \Big(\frac{s_t}{s_t + f_t}\Big)' x + E_y\big[ \bar{J}_{t+\mu}(s_t + y, f_t + x - y, \mu N, T) \,\big|\, x, s_t, f_t \big] \Big\}$$
$$= \Big(\frac{s_t}{s_t + f_t}\Big)' \bar{x} + E_y\big[ \bar{J}_{t+\mu}(s_t + y, f_t + \bar{x} - y, \mu N, T) \,\big|\, \bar{x}, s_t, f_t \big]$$
$$\le \Big(\frac{s_t}{s_t + f_t}\Big)' \bar{x} + E_y\big[ J_{t+\mu}(s_t + y, f_t + \bar{x} - y, N, T) \,\big|\, \bar{x}, s_t, f_t \big],$$

where the inequality follows from the induction assumption. We introduce the notation $x_{[a,b]} = \sum_{\tau=a}^{b} x_\tau$. Using this notation, we can write $\bar{x} = x_{[t,t+\mu-1]} = x_{[t,t+\mu-2]} + x_{t+\mu-1}$. Substituting yields

$$\bar{J}_t(s_t, f_t, \mu N, T) \le \Big(\frac{s_t}{s_t + f_t}\Big)' x_{[t,t+\mu-1]} + E_y\big[ J_{t+\mu}(s_t + y, f_t + x_{[t,t+\mu-1]} - y, N, T) \,\big|\, x_{[t,t+\mu-1]}, s_t, f_t \big] \quad \text{(EC.1)}$$
$$= \Big(\frac{s_t}{s_t + f_t}\Big)' x_{[t,t+\mu-2]} + E_y\Big[ \Big(\frac{s_t + y}{s_t + f_t + x_{[t,t+\mu-2]}}\Big)' x_{t+\mu-1} + E_{\bar{y}}\big[ J_{t+\mu}(s_t + y + \bar{y}, f_t + x_{[t,t+\mu-2]} + x_{t+\mu-1} - y - \bar{y}, N, T) \,\big|\, x_{t+\mu-1}, s_t + y, f_t + x_{[t,t+\mu-2]} - y \big] \,\Big|\, x_{[t,t+\mu-2]}, s_t, f_t \Big]$$
$$\le \Big(\frac{s_t}{s_t + f_t}\Big)' x_{[t,t+\mu-2]} + E_y\Big[ \max_{x : |x| = N} \Big\{ \Big(\frac{s_t + y}{s_t + f_t + x_{[t,t+\mu-2]}}\Big)' x + E_{\bar{y}}\big[ J_{t+\mu}(s_t + y + \bar{y}, f_t + x_{[t,t+\mu-2]} + x - y - \bar{y}, N, T) \,\big|\, x, s_t + y, f_t + x_{[t,t+\mu-2]} - y \big] \Big\} \,\Big|\, x_{[t,t+\mu-2]}, s_t, f_t \Big]$$
$$= \Big(\frac{s_t}{s_t + f_t}\Big)' x_{[t,t+\mu-2]} + E_y\big[ J_{t+\mu-1}(s_t + y, f_t + x_{[t,t+\mu-2]} - y, N, T) \,\big|\, x_{[t,t+\mu-2]}, s_t, f_t \big], \quad \text{(EC.2)}$$

where the first equality follows from Lemma 1 and the fact that

$$E_y\Big[ \frac{s + y}{s + f + x} \,\Big|\, x, s, f \Big] = \frac{s + E_y[y \,|\, x, s, f]}{s + f + x} = \frac{s + x\, s/(s + f)}{s + f + x} = \frac{s}{s + f} \cdot \frac{s + f + x}{s + f + x} = \frac{s}{s + f}.$$

We can repeat the arguments (EC.1)–(EC.2) $\mu - 2$ more times to get

$$\bar{J}_t(s_t, f_t, \mu N, T) \le \Big(\frac{s_t}{s_t + f_t}\Big)' x_t + E_{y_t}\big[ J_{t+1}(s_t + y_t, f_t + x_t - y_t, N, T) \,\big|\, x_t, s_t, f_t \big]$$
$$\le \max_{x : |x| = N} \Big\{ \Big(\frac{s_t}{s_t + f_t}\Big)' x + E_y\big[ J_{t+1}(s_t + y, f_t + x - y, N, T) \,\big|\, x, s_t, f_t \big] \Big\} = J_t(s_t, f_t, N, T).$$

The desired result follows by induction. □

B. Some Properties of $\tilde{J}_0^\lambda(s_0, f_0)$

The propositions proven in this section support the assertions made in §4.1 that the function $\tilde{J}_0^\lambda(s_0, f_0)$ is convex as a function of $\lambda$ and an upper bound for the true value function $J_0(s_0, f_0)$.

Proposition 6. $\tilde{J}_0^\lambda(s_0, f_0) \ge J_0(s_0, f_0)$ for all $\lambda$, $s_0$, and $f_0$.

Proof. First, we use induction to show $\tilde{J}_t^\lambda(s_t, f_t) \ge J_t(s_t, f_t)$ for all $t$, $\lambda$, $s_t$, and $f_t$. Fix $\lambda$, and consider stage $T-1$. Let $\bar{m} = \arg\max_m s_{T-1,m}/(s_{T-1,m} + f_{T-1,m})$; then $J_{T-1}(s_{T-1}, f_{T-1}) = N s_{T-1,\bar{m}}/(s_{T-1,\bar{m}} + f_{T-1,\bar{m}})$.
If $s_{T-1,\bar{m}}/(s_{T-1,\bar{m}} + f_{T-1,\bar{m}}) \ge \lambda_{T-1}$, then

$$\tilde{J}_{T-1}^\lambda(s_{T-1}, f_{T-1}) = N\lambda_{T-1} + \sum_{m=1}^{M} \hat{J}_{T-1}^\lambda(s_{T-1,m}, f_{T-1,m}) \ge N\lambda_{T-1} + \hat{J}_{T-1}^\lambda(s_{T-1,\bar{m}}, f_{T-1,\bar{m}}) = J_{T-1}(s_{T-1}, f_{T-1}).$$
If $s_{T-1,\bar{m}}/(s_{T-1,\bar{m}} + f_{T-1,\bar{m}}) < \lambda_{T-1}$, then

$$J_{T-1}(s_{T-1}, f_{T-1}) = N \frac{s_{T-1,\bar{m}}}{s_{T-1,\bar{m}} + f_{T-1,\bar{m}}} \le N\lambda_{T-1} = \tilde{J}_{T-1}^\lambda(s_{T-1}, f_{T-1}).$$

Now for some $t$ assume $\tilde{J}_{t+1}^\lambda(s_{t+1}, f_{t+1}) \ge J_{t+1}(s_{t+1}, f_{t+1})$ for all $\lambda$, $s_{t+1}$, and $f_{t+1}$. Let $x_t$ be feasible and achieve the maximum in the optimization of Equation (5). Then $x_t$ is feasible in the optimization problem of Equation (7),

$$E_{y_t}\big[ J_{t+1}(s_t + y_t, f_t + x_t - y_t) \,\big|\, x_t, s_t, f_t \big] \le E_{y_t}\big[ \tilde{J}_{t+1}^\lambda(s_t + y_t, f_t + x_t - y_t) \,\big|\, x_t, s_t, f_t \big],$$

and $\lambda_t \big(N - \sum_{m=1}^{M} x_{tm}\big) = 0$; thus, by comparison of Equations (5) and (7), we have $\tilde{J}_t^\lambda(s_t, f_t) \ge J_t(s_t, f_t)$. By induction we thus have $\tilde{J}_1^\lambda(s_1, f_1) \ge J_1(s_1, f_1)$ for all $\lambda$, $s_1$, and $f_1$; a comparison of Equations (5) and (11) then gives us $\tilde{J}_0^\lambda(s_0, f_0) \ge J_0(s_0, f_0)$. □

Proposition 7. $\tilde{J}_0^\lambda(s_0, f_0)$ is convex as a function of $\lambda$ for all $s_0$ and $f_0$.

Proof. First, we use induction to show that $\hat{J}_t^\lambda(s_t, f_t)$ is convex in $\lambda$ for all $t$, $s_t$, and $f_t$. For all $s_{T-1}$ and $f_{T-1}$, $\hat{J}_{T-1}^\lambda(s_{T-1}, f_{T-1})$ is the maximum of linear functions of $\lambda$ and is thus convex in $\lambda$. Now for arbitrary $t < T-1$ assume $\hat{J}_{t+1}^\lambda(s_{t+1}, f_{t+1})$ is convex in $\lambda$ for all $s_{t+1}$ and $f_{t+1}$. Then fix $x_t$ and observe that $E_y[\hat{J}_{t+1}^\lambda(s_t + y, f_t + x_t - y) \,|\, x_t, s_t, f_t]$ is a positively weighted sum of convex functions of $\lambda$ and is thus convex. $\hat{J}_t^\lambda(s_t, f_t)$ is then a maximum over a finite set of convex functions and is thus convex as a function of $\lambda$ for all $s_t$ and $f_t$. $\tilde{J}_t^\lambda(s_t, f_t)$ is the sum of convex functions of $\lambda$ and is thus convex for all $s_t$ and $f_t$. Thus $E[\tilde{J}_1^\lambda(s_1, f_1)]$ is convex as a function of $\lambda$ for all $s_1$ and $f_1$. Finally, $\tilde{J}_0^\lambda(s_0, f_0)$ is a maximum of convex functions and is thus convex as a function of $\lambda$ for all $s_0$ and $f_0$. □

C. Alternate Algorithms for Selecting λ

Here we present support for our choice of method for selecting the parameter $\lambda$. The results in this section make use of the subproblem approximations of §4.2 with $H = 2$ and $B = N/10$, and compare the following methods for selecting $\lambda$:

ADP: This is the method used to generate the results in §6.
$\lambda$ is assumed constant and is chosen using binary search to identify the $\lambda$ for which the constraint $\sum_{m=1}^{M} x_{0m} = N$ is satisfied in the relaxed problem. After 7 iterations of binary search, the constrained problem (11) is used to determine a feasible solution.

ADP_min: This approach assumes $\lambda$ constant and attempts to select a $\lambda$ that minimizes the value $\tilde{J}_0^\lambda(s_0, f_0)$. The numerical minimization relies on the convexity of the relaxed-problem value function (see Proposition 7). Specifically, we begin with a known interval for $\lambda$ (initially, $[0, 1]$) and subdivide the interval into 4 evenly spaced subintervals. By evaluating and comparing the relaxed value function at each of the subinterval boundaries, we can narrow the interval to at most one half of the original interval. This procedure is iterated 7 times.

ADP_λ: This implements a version of the algorithm in which $\lambda$ retains a component for each future time stage. We include variables $\lambda_1$ through $\lambda_H$, using $\lambda_H$ to estimate the relaxed value functions beyond the lookahead horizon $H$. The component parameters are chosen in a minimization procedure that performs local search on a discretized grid of $\lambda$ values. The discretization we use is 0.05, which we note is coarser than the precision of ADP_min.

Table EC.1 gives results for a few selected problems for each of the methods described. We note that the results do not give evidence that our assumption of constant $\lambda$ is a poor one, nor does it seem that significant gains can be achieved by using a minimization procedure to choose $\lambda$.

TABLE EC.1. Simulation results comparing ADP, ADP_min, and ADP_λ. Numbers represent average numbers of successes over 2,000 simulated problems.

(s, f)    T   N     k        Ideal   Greedy  Intval.  ADP     ADP_min  ADP_λ
(2, 8)    10  50    U0       478     1856    19012    18978   1890     18907
(2, 8)    10  100   U0       41268   764     8886     8860    8754     8710
(2, 50)   6   0     U0 60    9889    8740    8875     8861    8824     8851
(4, 100)  5   1000  U100 0   258     0507    070      0699    064      0688
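The interval-narrowing step used by ADP_min can be sketched as follows. This is a minimal illustration, not the implementation used in our experiments: the function name `minimize_convex` and the quadratic test objective are hypothetical stand-ins for the relaxed value function $\tilde{J}_0^\lambda(s_0, f_0)$, which is convex in $\lambda$ by Proposition 7.

```python
def minimize_convex(f, lo=0.0, hi=1.0, iters=7):
    """Narrow a bracket on the minimizer of a convex function f.

    Each iteration evaluates f at the boundaries of 4 evenly spaced
    subintervals of [lo, hi] (5 points) and keeps the subinterval pair
    surrounding the best point, which halves the bracket, as in the
    ADP_min procedure (7 iterations starting from [0, 1]).
    """
    for _ in range(iters):
        xs = [lo + k * (hi - lo) / 4.0 for k in range(5)]
        best = min(range(5), key=lambda k: f(xs[k]))
        lo, hi = xs[max(best - 1, 0)], xs[min(best + 1, 4)]
    return 0.5 * (lo + hi)
```

After 7 iterations the bracket width is at most $2^{-7}$ of the original interval, so the returned midpoint is within about 0.008 of the true minimizer on $[0, 1]$.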
TABLE EC.2. Simulation results for some randomly generated multi-segment problems. Computation times represent average CPU time per stage on an Intel Xeon 2.4 GHz processor. The Greedy and Dynamic algorithms took negligible time per stage (<0.005 seconds).

S T M N_i: { 4 25 2 8   4 25 8   4 8 2 8   { 1 4

(s, f), k (per segment)                                          Greedy  Dynamic  Info    Decomp  CPU time (Info)  CPU time (Decomp)
(1, 9) U0; (2, 8) U0                                             9695    9957     982     100.84  0.1              0.2
(2, 8) U010; (2, 8) U010                                         22521   22702    22807   229.98  0.8              0.9
(2, 8) U010; (2, 8) U010; (2, 8) U0; (2, 8) U00; (2, 8) U040     18089   18272    18242   184.0   0.4              0.44
(5, 5) U010 40                                                   17577   1764     17626   176.67  0.09             0.10

D. Computational Results for Multiple Segments with Migrating Customers

Through simulated experiments, we evaluate the effectiveness of the method described in §4.3 for accounting for the migration of customers among segments. For purposes of comparison, we have implemented the following heuristics for the problem described in §4.3:

Greedy: Sends to all customers in state $i$ the available message offering the greatest expected reward in the current stage. Thus, this method accounts for neither the customer migration dynamics nor the effects of information accumulation.

Dynamics: This heuristic fixes all reward probabilities at their expected values, then solves a simple dynamic program for each customer. In the case with known purchase probabilities, solving for each customer independently produces an optimal policy for the overall problem. This method accounts for customer migration dynamics but ignores information effects.

Info: This ignores customer dynamics entirely and makes decisions using the dynamic programming-based adaptive sampling heuristic of §4.

Decomp: This is the decomposition-based approximation described in §4.3.

We test the algorithm on a few randomly generated examples. True reward probabilities and prior distributions are generated as in §6.
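The Greedy benchmark's message choice can be sketched as follows. This is a minimal sketch, assuming (as in the single-segment model) Beta$(s_m, f_m)$ beliefs so that the expected current-stage reward of message $m$ is $s_m/(s_m + f_m)$; the helper name `greedy_message` is hypothetical.

```python
def greedy_message(s, f):
    """Index of the message with the highest posterior mean success
    probability s_m / (s_m + f_m).

    Greedy sends this one message to every customer in the state,
    ignoring both migration dynamics and the value of information.
    """
    return max(range(len(s)), key=lambda m: s[m] / (s[m] + f[m]))
```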
Transition probabilities for each segment and message are chosen by selecting an $S$-vector of uniform random deviates and normalizing so that $\sum_{j=1}^{S} P_{ij} = 1$. Average results over 2,000 randomly generated problems for a few cases are presented in Table EC.2. We observe that the Decomp heuristic outperforms all the other methods for each set of problems tried. Moreover, the improvement afforded by the Decomp heuristic is statistically significant in each case. Most notably, the decomposition approach performs as well as the best of the Dynamic and Info techniques in all of the examples, suggesting that it adequately accounts for both information value and customer dynamics. We also note that all three of the Dynamic, Info, and Decomp methods are preferable to the Greedy heuristic in all the examples. Computation times are reasonable and comparable to those observed for the single-segment problems of §6, although we point out that we have chosen instances with fewer customers per stage than in §6.
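The transition-probability generation described above can be sketched as follows; the helper name `random_transition_matrix` is hypothetical, but the construction (uniform deviates normalized row by row) is as stated.

```python
import random

def random_transition_matrix(S, seed=None):
    """Random S x S transition matrix: one row per origin segment.

    Each row is an S-vector of uniform random deviates normalized so
    that sum_j P[i][j] == 1, as in the experiments of Section D.
    """
    rng = random.Random(seed)
    P = []
    for _ in range(S):
        row = [rng.random() for _ in range(S)]
        total = sum(row)
        P.append([u / total for u in row])
    return P
```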