A New Multivariate Markov Chain Model with Applications to Sales Demand Forecasting

International Conference on Industrial Engineering and Systems Management
IESM 2007, May 30 - June 2, BEIJING - CHINA

Wai-Ki CHING, Li-Min LI, Tang LI, Shu-Qin ZHANG

Advanced Modeling and Applied Computing Laboratory and Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong

This paper was not presented at any other revue. Research supported in part by RGC Grants and HKU CRCG Grants. Corresponding author: Li-Min LI. Email: liminli@hkusua.hku.hk. Fax: (852) 2559-2225.
Email addresses: wching@hkusua.hku.hk (Wai-Ki CHING), liminli@hkusua.hku.hk (Li-Min LI), litang@hkusua.hku.hk (Tang LI), sqzhang@hkusua.hku.hk (Shu-Qin ZHANG).

Abstract

Markov chains are popular tools for modeling many practical systems such as queueing systems, manufacturing systems and categorical data sequences. Multiple categorical sequences occur in many applications such as inventory control, finance and data mining. In many situations, one would like to consider multiple categorical sequences together at the same time. The reason is that the data sequences can be correlated, and therefore by exploring their relationships one can develop better models. In this paper, we propose a new multivariate Markov chain model for modeling multiple categorical data sequences. We then test the proposed model with synthetic data and apply it to practical sales demand data.

Key words: Inventory, Markov Process, Multivariate Markov chains, Categorical Data Sequences, Demand Prediction

1 Introduction

Categorical data sequences (time series) have many applications in both applied sciences and engineering, such as inventory control problems [1,10,11], webpage prediction problems in data mining [12], credit risk problems in finance [9] and many others [2,6,8]. Categorical data sequences can be modeled by using Markov chains, see for instance [10,11,8]. On many occasions, one has to consider multiple Markov chains (categorical sequences) together at the same time, i.e., to study the chains in a holistic manner rather than individually. The reason is that the chains (data sequences) can be correlated, and therefore the information of other chains can contribute to explaining the captured chain (data sequence). Thus by exploring these relationships, one can develop better models.

We remark that the conventional Markov chain model for $s$ categorical data sequences of $m$ states has $m^s$ states. The number of parameters (transition probabilities) increases exponentially with respect to the number of categorical sequences. Because of this large number of parameters, people seldom use this kind of Markov chain model directly. In view of this, Ching et al. proposed a first-order multivariate Markov chain model in [10] for modeling the sales demands of multiple products in a soft drink company. Their model involves only $O(s^2 m^2 + s^2)$ parameters, where $s$ is the number of sequences and $m$ is the number of possible states. The model can capture both the intra- and inter-transition probabilities among the sequences. They also developed efficient estimation methods for solving the model parameters and applied the model to a problem of sales demand data sequences. A simplified multivariate Markov model based on [10] has also been proposed in [13], where the number of parameters is only $O(s m^2 + s^2)$.

However, the models in [10,13] cannot capture negative correlations among the data sequences. We will elaborate on the meaning of positive and negative correlations in the next section. Moreover, all these models have been shown to be stationary, and steady state is assumed in the estimation of model parameters. Thus the estimation methods have to be modified when the given data sequences are short. In this paper, we propose a new model which can capture both positive and negative correlations among the data sequences. The model can also easily be adjusted to handle the case when the data sequences are short.

The rest of the paper is organized as follows. In Section 2, we propose the new multivariate Markov model and discuss some important properties of the model. In Section 3, we present the method for the estimation of model parameters. In Section 4, we apply the model and the method to some synthetic data and practical sales demand data. Finally, a summary is given in Section 5 to conclude the paper.

2 The New Multivariate Markov Chain Model

In this section, we first propose our new multivariate Markov chain model and then discuss some of its properties. The properties are important for the estimation of model parameters.

The following multivariate Markov chain model has been proposed in [10]. The model assumes that there are $s$ categorical sequences and each has $m$ possible states in $M = \{1, 2, \ldots, m\}$. Here we adopt the following notations as in [10]. Let $x_n^{(k)}$ be the state vector of the $k$th sequence at time $n$. If the $k$th sequence is in State $j$ at time $n$ then we write

$$x_n^{(k)} = e_j = (0, \ldots, 0, \underbrace{1}_{j\text{th entry}}, 0, \ldots, 0)^T.$$

The following relationships among the sequences are assumed:

$$x_{n+1}^{(j)} = \lambda_{jj} P^{(jj)} x_n^{(j)} + \sum_{k=1,\,k \neq j}^{s} \lambda_{jk} P^{(jk)} x_n^{(k)}, \quad \text{for } j = 1, 2, \ldots, s, \qquad (1)$$

where

$$\lambda_{jk} \geq 0, \quad 1 \leq j, k \leq s \quad \text{and} \quad \sum_{k=1}^{s} \lambda_{jk} = 1, \quad \text{for } j = 1, 2, \ldots, s. \qquad (2)$$

Equation (1) simply means that the state probability distribution of the $j$th chain (sequence) at time $(n + 1)$ depends only on the weighted average of $P^{(jj)} x_n^{(j)}$ and $P^{(jk)} x_n^{(k)}$. Here $P^{(ij)}$ is the one-step transition probability matrix of the states from the $j$th sequence to the states of the $i$th sequence. In matrix form, one may write

$$x_{n+1} \equiv \begin{pmatrix} x_{n+1}^{(1)} \\ x_{n+1}^{(2)} \\ \vdots \\ x_{n+1}^{(s)} \end{pmatrix} = \begin{pmatrix} \lambda_{11} P^{(11)} & \lambda_{12} P^{(12)} & \cdots & \lambda_{1s} P^{(1s)} \\ \lambda_{21} P^{(21)} & \lambda_{22} P^{(22)} & \cdots & \lambda_{2s} P^{(2s)} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{s1} P^{(s1)} & \lambda_{s2} P^{(s2)} & \cdots & \lambda_{ss} P^{(ss)} \end{pmatrix} \begin{pmatrix} x_n^{(1)} \\ x_n^{(2)} \\ \vdots \\ x_n^{(s)} \end{pmatrix} \equiv Q x_n.$$
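To make the update in Equation (1) concrete, the following is a minimal sketch in plain Python of one step of the model from [10] for two sequences with two states each. The function names `mat_vec` and `step_first_order`, and all numerical values for the weights and the cross-sequence matrices, are assumptions chosen purely for illustration; they are not taken from the paper.

```python
def mat_vec(P, x):
    """Multiply a matrix P (given as a list of rows) by a vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def step_first_order(lam, P, x):
    """One step of Equation (1).

    lam[j][k] : non-negative weights; each row sums to one (condition (2))
    P[j][k]   : m-by-m column-stochastic matrix from sequence k to sequence j
    x[k]      : state probability vector of sequence k at time n
    Returns the list of state probability vectors at time n + 1.
    """
    s, m = len(x), len(x[0])
    x_next = []
    for j in range(s):
        acc = [0.0] * m
        for k in range(s):
            y = mat_vec(P[j][k], x[k])
            acc = [a + lam[j][k] * yi for a, yi in zip(acc, y)]
        x_next.append(acc)
    return x_next


# Hypothetical example values (illustration only, not from the paper).
P11 = [[0.2, 0.3], [0.8, 0.7]]
P12 = [[0.5, 0.5], [0.5, 0.5]]
P21 = [[0.9, 0.1], [0.1, 0.9]]
P22 = [[0.4, 0.1], [0.6, 0.9]]
lam = [[0.7, 0.3], [0.4, 0.6]]
x0 = [[0.0, 1.0], [1.0, 0.0]]

x1 = step_first_order(lam, [[P11, P12], [P21, P22]], x0)
print(x1)
```

Because every weight is non-negative and every matrix is column-stochastic, each returned vector is again a probability vector, which is the property the matrix form $Q x_n$ relies on.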

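We note that this model only allows positive correlation among the data sequences, as all the $\lambda_{ij}$ are assumed to be non-negative. This means that an increase in a state probability in any one of the sequences at time $n$ can only increase (but never decrease) the state probabilities at time $(n + 1)$. To extend the model so as to take care of negative correlations among the data sequences, our idea here is to use

$$z_{n+1} = \frac{1}{m-1} (\mathbf{1} - x_n)$$

to model the case when the state probability vector $x_n$ is negatively correlated to the state probability vector $z_{n+1}$. Here $\mathbf{1}$ is the vector of all ones, the factor $(m-1)^{-1}$ is the normalization constant, and the number of possible states satisfies $m \geq 2$.

We propose the following model for $s$ data sequences. In order to reduce the number of parameters, in the proposed model we assume that $P^{(ij)} = I$ when $i \neq j$. This idea has been used in [13] and has been shown to be effective. We then assume the following relationship among the data sequences:

$$\begin{pmatrix} x_{n+1}^{(1)} \\ x_{n+1}^{(2)} \\ \vdots \\ x_{n+1}^{(s)} \end{pmatrix} = \underbrace{\Lambda^{+} \begin{pmatrix} x_n^{(1)} \\ x_n^{(2)} \\ \vdots \\ x_n^{(s)} \end{pmatrix}}_{\text{Positive correlated part}} + \underbrace{\frac{1}{m-1} \Lambda^{-} \begin{pmatrix} \mathbf{1} - x_n^{(1)} \\ \mathbf{1} - x_n^{(2)} \\ \vdots \\ \mathbf{1} - x_n^{(s)} \end{pmatrix}}_{\text{Negative correlated part}}$$

where

$$\Lambda^{+} = \begin{pmatrix} \lambda_{1,1} P^{(11)} & \lambda_{1,2} I & \cdots & \lambda_{1,s} I \\ \lambda_{2,1} I & \lambda_{2,2} P^{(22)} & \cdots & \lambda_{2,s} I \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{s,1} I & \cdots & \lambda_{s,s-1} I & \lambda_{s,s} P^{(ss)} \end{pmatrix}$$

and

$$\Lambda^{-} = \begin{pmatrix} \lambda_{1,-1} P^{(11)} & \lambda_{1,-2} I & \cdots & \lambda_{1,-s} I \\ \lambda_{2,-1} I & \lambda_{2,-2} P^{(22)} & \cdots & \lambda_{2,-s} I \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{s,-1} I & \cdots & \lambda_{s,-s+1} I & \lambda_{s,-s} P^{(ss)} \end{pmatrix}.$$

Here $\lambda_{i,j} \geq 0$ for $i = 1, 2, \ldots, s$ and $j = \pm 1, \ldots, \pm s$, and

$$\sum_{j=-s}^{s} \lambda_{i,j} = 1.$$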
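The proposed relationship can be illustrated with a small sketch of one update step in plain Python. The helper names `mat_vec` and `step_new_model` are hypothetical. The numerical values are the two-sequence, two-state parameters listed for the synthetic example in Section 4.1, read with their decimal points restored; treat the code as an illustration under those assumptions rather than the authors' implementation.

```python
def mat_vec(P, x):
    """Multiply a matrix P (given as a list of rows) by a vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def step_new_model(lam_pos, lam_neg, P, x):
    """One step x_{n+1} = Lambda^+ x_n + (1/(m-1)) Lambda^- (1 - x_n).

    lam_pos[i][j] : lambda_{i,j}   (positive correlated part)
    lam_neg[i][j] : lambda_{i,-j}  (negative correlated part)
    P[i]          : column-stochastic matrix P^{(ii)}; off-diagonal blocks are I
    x[j]          : state probability vector of sequence j at time n
    """
    s, m = len(x), len(x[0])
    x_next = []
    for i in range(s):
        acc = [0.0] * m
        for j in range(s):
            # z is the negatively correlated vector (1 - x) / (m - 1).
            z = [(1.0 - v) / (m - 1) for v in x[j]]
            if i == j:
                pos, neg = mat_vec(P[i], x[j]), mat_vec(P[i], z)
            else:
                pos, neg = x[j], z
            acc = [a + lam_pos[i][j] * p + lam_neg[i][j] * q
                   for a, p, q in zip(acc, pos, neg)]
        x_next.append(acc)
    return x_next


P11 = [[0.2, 0.3], [0.8, 0.7]]
P22 = [[0.4, 0.1], [0.6, 0.9]]
lam_pos = [[0.4300, 0.0505], [0.0838, 0.3932]]
lam_neg = [[0.4401, 0.0794], [0.1187, 0.4043]]
x0 = [[0.0, 1.0], [1.0, 0.0]]

x1 = step_new_model(lam_pos, lam_neg, [P11, P22], x0)
print(x1)
```

Since $(\mathbf{1} - x_n)/(m-1)$ is itself a probability vector and the weights in each row sum to one, each block of the result again sums to one.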

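This is equivalent to

$$x_{n+1} = \begin{pmatrix} x_{n+1}^{(1)} \\ x_{n+1}^{(2)} \\ \vdots \\ x_{n+1}^{(s)} \end{pmatrix} = \begin{pmatrix} H_{1,1} & H_{1,2} & \cdots & H_{1,s} \\ H_{2,1} & H_{2,2} & \cdots & H_{2,s} \\ \vdots & \vdots & \ddots & \vdots \\ H_{s,1} & H_{s,2} & \cdots & H_{s,s} \end{pmatrix} \begin{pmatrix} x_n^{(1)} \\ x_n^{(2)} \\ \vdots \\ x_n^{(s)} \end{pmatrix} + \frac{1}{m-1} \begin{pmatrix} J_{1,-1} & J_{1,-2} & \cdots & J_{1,-s} \\ J_{2,-1} & J_{2,-2} & \cdots & J_{2,-s} \\ \vdots & \vdots & \ddots & \vdots \\ J_{s,-1} & \cdots & J_{s,-s+1} & J_{s,-s} \end{pmatrix} \mathbf{1} \equiv M_s x_n + b.$$

Here

$$H_{ij} = \begin{cases} \left( \lambda_{i,j} - \dfrac{\lambda_{i,-j}}{m-1} \right) P^{(ii)} & \text{if } i = j \\[2mm] \left( \lambda_{i,j} - \dfrac{\lambda_{i,-j}}{m-1} \right) I & \text{otherwise} \end{cases}$$

and

$$J_{i,-j} = \begin{cases} \lambda_{i,-j} P^{(ii)} & \text{if } i = j \\ \lambda_{i,-j} I & \text{otherwise.} \end{cases}$$

We note that

$$x_{n+1} = M_s^2 x_{n-1} + (I + M_s) b = M_s^3 x_{n-2} + (I + M_s + M_s^2) b = M_s^{n+1} x_0 + \sum_{k=0}^{n} M_s^k b,$$

where $M_s^0 = I$. The model has a stationary distribution if for a certain matrix norm (let us say $\|M_s\|_\infty$) we have $\|M_s\|_\infty < 1$. Here, given a real matrix $M$,

$$\|M\|_\infty = \max_i \sum_{j} |M_{ij}|.$$

In this case, we have

$$\lim_{n \to \infty} x_n = \lim_{n \to \infty} \sum_{k=0}^{n} M_s^k b = (I - M_s)^{-1} b.$$

We also note that

$$\|M_s\|_\infty \leq \max_{1 \leq k \leq s} \left\{ m \left| \lambda_{k,k} - \frac{\lambda_{k,-k}}{m-1} \right| + \sum_{i \neq k} \left| \lambda_{k,i} - \frac{\lambda_{k,-i}}{m-1} \right| \right\}.$$

Therefore the smaller $\|M_s\|_\infty$ is, the faster the convergence rate of the process will be. Thus one can control the convergence rate by setting an upper bound $\alpha < 1$ and introducing the extra constraints

$$m \left| \lambda_{k,k} - \frac{\lambda_{k,-k}}{m-1} \right| + \sum_{i \neq k} \left| \lambda_{k,i} - \frac{\lambda_{k,-i}}{m-1} \right| \leq \alpha \quad \text{for } k = 1, 2, \ldots, s.$$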
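The affine form $x_{n+1} = M_s x_n + b$ and its limit can be checked numerically. The sketch below, in plain Python, assembles $M_s$ and $b$ block by block, evaluates $\|M_s\|_\infty$, and accumulates the series $\sum_k M_s^k b$ by fixed-point iteration instead of forming $(I - M_s)^{-1}$ explicitly. The function names, the tolerance and the iteration cap are assumptions for illustration; the parameter values are those of the synthetic example in Section 4.1 with decimal points restored.

```python
def build_affine_map(lam_pos, lam_neg, P, m):
    """Return (M, b) such that x_{n+1} = M x_n + b for the stacked vector."""
    s = len(P)
    size = s * m
    M = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for i in range(s):
        for j in range(s):
            # Scalar coefficient of block H_ij.
            h = lam_pos[i][j] - lam_neg[i][j] / (m - 1)
            for r in range(m):
                for c in range(m):
                    # Block is P^{(ii)} on the diagonal and I elsewhere.
                    block = P[i][r][c] if i == j else float(r == c)
                    M[i * m + r][j * m + c] = h * block
                    # b = (1/(m-1)) J 1: accumulate row sums of block J_{i,-j}.
                    b[i * m + r] += lam_neg[i][j] * block / (m - 1)
    return M, b


def inf_norm(M):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


def stationary(M, b, tol=1e-12, max_iter=10000):
    """Iterate x <- M x + b from x = b, i.e. accumulate sum_k M^k b."""
    x = list(b)
    for _ in range(max_iter):
        x_new = [sum(mv * xv for mv, xv in zip(row, x)) + bi
                 for row, bi in zip(M, b)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence; is the norm of M below one?")


P11 = [[0.2, 0.3], [0.8, 0.7]]
P22 = [[0.4, 0.1], [0.6, 0.9]]
lam_pos = [[0.4300, 0.0505], [0.0838, 0.3932]]
lam_neg = [[0.4401, 0.0794], [0.1187, 0.4043]]

M, b = build_affine_map(lam_pos, lam_neg, [P11, P22], m=2)
norm = inf_norm(M)
x_star = stationary(M, b)
print(norm, x_star)
```

For these parameters the norm is far below one, so the iteration settles after a handful of steps; with a norm close to one, many more iterations would be needed, which is the effect the bound $\alpha$ is meant to control.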

3 Estimation of Model Parameters

In this section, we propose efficient methods for the estimation of $P^{(jj)}$ and $\lambda_{jk}$. For each data sequence, one can estimate the transition probability matrix by the following method [10,11]. Given a data sequence, one can count the transition frequencies from one arbitrary state to the other states. Hence we can construct the transition frequency matrix for the data sequence. After normalization, the estimates of the transition probability matrices can be obtained. We note that one has to estimate $O(s m^2)$ transition frequencies ($s$ matrices of size $m \times m$) for the multivariate Markov chain model. The stationary vector $x$ can be estimated from the proportion of occurrences of each state in each of the sequences; we denote this estimate by $\hat{x}$.

According to the idea at the end of the last section, if we take $\|\cdot\|$ to be $\|\cdot\|_1$ we can get the values of $\lambda_{jk}$ by solving the following optimization problem ([10,11]):

$$\min_{\lambda} \sum_{i} \left| \left[ b_{j,k} - \hat{x}^{(j)} \right]_i \right|$$

subject to

$$\begin{cases} b_{j,k} = \displaystyle\sum_{k=1}^{s} \left( \left( \lambda_{j,k} - \frac{\lambda_{j,-k}}{m-1} \right) \Delta_{jk} \hat{x}^{(k)} + \frac{1}{m-1} \lambda_{j,-k} \Delta_{jk} \mathbf{1} \right), \\[3mm] \displaystyle\sum_{k=-s}^{s} \lambda_{jk} = 1, \quad j = 1, 2, \ldots, s, \\[3mm] \lambda_{jk} \geq 0, \quad k = \pm 1, \ldots, \pm s, \quad j = 1, 2, \ldots, s, \\[2mm] m \left| \lambda_{k,k} - \dfrac{\lambda_{k,-k}}{m-1} \right| + \displaystyle\sum_{i \neq k} \left| \lambda_{k,i} - \dfrac{\lambda_{k,-i}}{m-1} \right| \leq \alpha \quad \text{for } k = 1, 2, \ldots, s. \end{cases} \qquad (3)$$

Here $\mathbf{1}$ is the vector with all entries equal to one and

$$\Delta_{jk} = \begin{cases} P^{(jj)} & \text{if } j = k \\ I & \text{if } j \neq k. \end{cases}$$

Problem (3) can be formulated as $s$ linear programming problems, one for each $j$; see for instance [3, p. 221]. We remark that other vector norms such as $\|\cdot\|_2$ and $\|\cdot\|_\infty$ can also be used, but they have different characteristics. The former results in a quadratic programming problem, while $\|\cdot\|_\infty$ still results in a linear programming problem, see for instance [3, pp. 221-226]. It is a well-known fact that, in approximating data by a linear function [3, p. 220], $\|\cdot\|_1$ gives the most robust result, while $\|\cdot\|_\infty$ avoids gross discrepancies with the data as much as possible. If the errors are known to be normally distributed, then $\|\cdot\|_2$ is the best choice. Finally, we remark that the complexity of solving a linear programming problem or a quadratic programming problem is $O(n^3 L)$, where $n$ is the number of variables and $L$ is the number of binary bits needed to record all the data of the problem [4].

4 Numerical Examples

In this section, we present both synthetic data sequences and practical sales demand data sequences to demonstrate the effectiveness of the proposed model. We first estimate all the transition probability matrices $P^{(jj)}$ and the parameters $\lambda_{ij}$ by using the method proposed in Section 3. To evaluate the performance and effectiveness of the new multivariate Markov chain model, we use the BIC (Bayesian Information Criterion) [5,7] as an indicator, which is defined as

$$\mathrm{BIC} = -2L + q \log n,$$

where

$$L = \sum_{j=1}^{s} \left( \sum_{i_0, k_1, \ldots, k_s = 1}^{m} n^{(j)}_{i_0, k_1, \ldots, k_s} \log I \right)$$

is the log-likelihood of the model,

$$I = \sum_{l=1}^{m} \sum_{k=1}^{s} \left[ \left( \lambda_{jk} - \frac{1}{m-1} \lambda_{j,-k} \right) p^{(jk)}_{i_0, k_l} + \frac{1}{m-1} \lambda_{j,-k} \right]$$

(here $I$ denotes this scalar quantity, not the identity matrix), and

$$n^{(j)}_{i_0, k_1, k_2, \ldots, k_s} = \sum x_{n+1}^{(j)}(i_0) \, x_n^{(1)}(k_1) \, x_n^{(2)}(k_2) \cdots x_n^{(s)}(k_s).$$

Here $q$ is the number of independent parameters, and $n$ is the length of the sequence. The smaller the value of BIC, the better the model is. For the sake of comparison, we also give the results for the model proposed by Ching et al. in [10].

4.1 Synthetic Data Sequences

In this subsection, we consider two binary sequences generated by the following model:

$$\begin{pmatrix} x_{n+1}^{(1)} \\ x_{n+1}^{(2)} \end{pmatrix} = \begin{pmatrix} \lambda_{1,1} P^{(11)} & \lambda_{1,2} I \\ \lambda_{2,1} I & \lambda_{2,2} P^{(22)} \end{pmatrix} \begin{pmatrix} x_n^{(1)} \\ x_n^{(2)} \end{pmatrix} + \begin{pmatrix} \lambda_{1,-1} P^{(11)} & \lambda_{1,-2} I \\ \lambda_{2,-1} I & \lambda_{2,-2} P^{(22)} \end{pmatrix} \begin{pmatrix} \mathbf{1} - x_n^{(1)} \\ \mathbf{1} - x_n^{(2)} \end{pmatrix}.$$

Here

$$P^{(11)} = \begin{pmatrix} 0.2 & 0.3 \\ 0.8 & 0.7 \end{pmatrix} \quad \text{and} \quad P^{(22)} = \begin{pmatrix} 0.4 & 0.1 \\ 0.6 & 0.9 \end{pmatrix}$$

and

$$\lambda_{1,1} = 0.4300, \quad \lambda_{1,2} = 0.0505, \quad \lambda_{1,-1} = 0.4401, \quad \lambda_{1,-2} = 0.0794,$$
$$\lambda_{2,1} = 0.0838, \quad \lambda_{2,2} = 0.3932, \quad \lambda_{2,-1} = 0.1187, \quad \lambda_{2,-2} = 0.4043.$$

Beginning with $x_0^{(1)} = (0, 1)^T$ and $x_0^{(2)} = (1, 0)^T$, one can generate sequences as follows:

A: 21221122212222212122
B: 11121212112121122111

We then apply our new model to these data sequences and compare it with the model in [10]. The results are reported in Table 1.

Table 1. The BIC for Synthetic Data Sequences

Models                                 BIC
Multivariate Markov Model in [10]      10793
The New Model (α = 1.0)                4.2130e+003
The New Model (α = 0.9)                4.2201e+003
The New Model (α = 0.8)                4.2258e+003
The New Model (α = 0.7)                4.2322e+003
The New Model (α = 0.6)                4.2410e+003

4.2 Sales Demand Data Sequences

In this subsection, we present some numerical results based on the sales demand data of a soft-drink company in Hong Kong [10]. The soft-drink company is facing an in-house problem of production planning and inventory control. A key issue is that the storage space of the central warehouse often finds itself full or near capacity. Products are categorized into six possible states according to sales volume. All products are labeled as either very fast-moving (very high sales volume), fast-moving, standard, slow-moving, very slow-moving (low sales volume) or no sales volume. The company has a big customer and would like to predict sales demand for this customer so as to minimize the in-house inventory and at the same time maximize the demand satisfaction for this customer. More importantly, the company can understand the sales pattern of this customer and then develop a marketing strategy to deal with this customer. We expect sales demand sequences generated by the same customer to be correlated with each other. Therefore, by exploring these relationships, we develop the multivariate Markov model for such demand sequences and hence obtain better prediction rules. We then build the multivariate Markov chain model based on the data in the Appendix of [10].

Table 2. The BIC for Data Sequences A, B, C, D, E

Models                                 BIC
Multivariate Markov Model in [10]      8.0215e+003
The New Model (α = 1.0)                4.1264e+003
The New Model (α = 0.9)                4.1762e+003
The New Model (α = 0.8)                4.2638e+003
The New Model (α = 0.7)                4.3749e+003
The New Model (α = 0.6)                4.5112e+003

The results above show the effectiveness of our simplified multivariate Markov model. One can see that the new multivariate Markov model is much better than the multivariate Markov model in [10] in fitting the sales demand data: its BIC is roughly half that of the model in [10] for every value of α tested. We remark that when the $\|\cdot\|_\infty$ norm is used instead of the $\|\cdot\|_1$ norm in the LP, we still get similar BIC results for both the synthetic data and the practical data.

5 Summary

In this paper, we propose a new multi-dimensional Markov chain model for modeling multiple categorical data sequences. We test the proposed model with both synthetic data and practical sales demand data. The new model can capture both positive and negative correlations among the data sequences. The model can also easily be adjusted to handle the case when the data sequences are short.

6 Appendix

Sales Demand Sequences of the Five Products (Taken from [10])

Product A: 6 6 6 6 2 6 2 6 2 2 6 2 6 6 2 6 2 4 4 4 5 6 6 1 2 2 6 6 6 2 6 2 6 6 2 6 2 2 6 2 1 2 2 6 6 6 2 1 2 6 2 6 6 2 2 6 2 2 2 6 2 6 2 2 2 2 2 6 2 2 6 6 6 6 1 2 2 6 2 2 2 2 6 2 2 2 2 3 3 2 3 2 6 6 6 6 2 6
2 6 6 2 6 2 6 6 2 6 6 2 2 3 4 3 3 1 3 1 2 1 6 1 6 6 1 6 6 2 6 2 6 2 2 2 6 6 1 6 2 6 1 2 1 6 2 6 2 2 2 2 6 6 1 6 6 2 2 6 2 2 2 3 4 4 4 6 4 6 1 6 6 1 6 6 6 6 1 6 2 2 2 6 6 6 6 2 6 6 2 2 6 2 6 2 2 2 6 2 2 2 6 6 6 6 3 2 2 6 2 2 2 2 2 2 6 2 6 2 2 2 6 2 2 6 6 2 6 6 6 2 2 2 3 3 3 4 1 6 6 1 6 6 1 6 1 6 6 6 6 1 6 6 6 2 1 2 2 2 2 2 2 3 6 6 6 6 6 2 6 Product B: 1 6 6 1 6 1 1 1 1 1 1 6 6 6 1 2 1 6 6 1 1 1 6 6 2 1 6 6 1 1 1 6 1 2 1 6 2 2 2 2 2 6 1 6 6 1 2 1 6 6 6 1 1 1 6 6 1 1 1 1 6 1 1 2 1 6 1 6 1 1 6 2 6 2 6 6 6 3 6 6 1 6 6 2 2 2 3 2 2 6 6 6 1 1 6 2 6 6 2 6 2 6 6 1 3 6 6 1 1 1 2 2 3 2 2 6 2 2 2 1 6 1 6 1 1 6 2 1 1 1 2 2 1 6 1 1 1 1 2 6 1 1 1 1 6 1 6 1 2 1 6 1 6 6 1 6 1 2 2 2 2 3 3 2 2 2 6 6 6 6 2 1 1 6 1 1 1 6 1 6 1 6 1 6 1 1 6 6 2 1 1 6 6 1 1 2 6 2 6 6 6 1 2 6 1 6 1 1 1 1 6 1 6 1 1 6 6 1 6 6 1 6 1 6 6 1 1 6 6 2 2 2 2 2 2 2 2 2 6 6 6 6 1 6 6 6 1 6 6 1 6 6 1 1 6 1 3 3 3 5 1 6 6 6 6 6 6 6 6 Product C: 6 6 6 6 6 6 6 2 6 6 6 6 6 6 6 2 6 6 6 6 2 6 6 6 2 2 6 6 6 6 6 6 6 1 6 2 6 6 6 6 6 6 6 6 2 6 6 1 2 6 1 6 6 1 6 2 6 6 6 6 6 6 6 2 6 6 6 2 6 6 1 6 6 6 6 6 6 6 3 3 6 3 2 1 2 2 1 6 6 1 6 1 6 6 6 6 6 6 1 6 6 6 1 6 6 6 6 6 6 6 6 6 6 6 2 6 6 6 6 6 6 6 6 2 2 6 6 2 6 1 2 6 6 6 2 6 6 2 6 6 2 6 1 6 2 6 2 1 2 6 6 2 2 6 2 6 2 2 6 2 6 6 6 2 2 2 6 6 2 6 6 2 2 6 1 2 1 2 6 6 2 2 6 6 1 2 2 1 6 2 6 2 2 1 1 5 6 3 6 1 6 6 1 2 2 6 1 6 2 6 6 1 6 2 6 2 6 6 6 1 6 1 6 6 2 2 2 1 2 3 6 1 6 1 6 1 6 1 6 6 6 1 1 6 6 6 6 6 1 6 6 6 1 6 1 1 6 6 6 6 6 6 6 6 1 6 6 1 6 Product D: 6 2 2 2 2 3 3 4 4 4 5 4 3 3 6 2 6 6 6 3 4 4 3 3 3 3 3 2 6 6 3 4 4 4 4 3 4 2 6 2 2 6 2 2 6 6 3 4 5 4 4 6 3 6 6 6 2 6 2 6 6 2 2 6 4 4 5 4 3 4 3 4 4 6 2 6 6 2 2 6 2 6 6 2 6 6 2 6 6 2 6 2 6 3 5 5 5 4 4 4 3 6 2 6 6 2 6 2 6 2 2 6 2 6 6 2 6 4 4 4 4 4 4 6 3 6 6 2 6 2 6 2 6 2 6 6 2 2 2 2 2 2 2 2 2 3 3 3 5 5 4 5 3 3 3 6 2 6 6 2 2 6 2 2 2 2 6 2 3 2 2 3 6 3 2 2 3 4 4 4 4 5 5 4 4 6 6 2 6 2 6 2 2 2 2 2 2 2 5 5 4 4 5 5 2 6 2 6 6 2 6 2 6 2 2 3 3 4 4 5 4 4 4 3 4 3 6 2 6 2 2 2 2 2 2 2 2 2 2 2 3 4 4 4 4 5 4 4 4 3 2 2 2 6 2 2 2 6 2 6 2 6 2 2 2 2 2 3 2 Product E: 
6 2 2 2 2 3 3 4 4 4 5 4 3 3 6 2 6 6 2 3 4 4 3 4 4 3 3 2 2 6 3 4 4 4 4 3 4 2 3 2 2 6 3 3 6 6 3 4 5 4 5 3 3 2 6 6 2 6 2 6 6 2 2 6 4 4 4 4 4 4 5 4 4 6 2 6 6 2 2 6 2 6 6 2 6 6 2 6 6 2 6 2 6 3 4 4 4 4 4 4 4 6 2 6 6 2 6 2 6 6 6 6 2 6 2 2 6 4 4 4 4 4 4 6 3 3 6 2 2 2 6 2 6 2 2 2 2 2 2 2 2 2 2 2 2 3 6 4 5 5 5 5 2 4 6 6 2 6 6 2 2 6 2 2 2 2 6 2 3 2 2 3 6 3 2 2 3 4 4 4 4 5 5 4 3 3 6 2 6 2 2 2 6 3 2 2 2 2 5 5 4 4 4 4 3 6 2 6 6 2 6 2 6 2 2 3 3 4 4 5 4 4 4 4 4 3 6 2 6 2 2 2 6 2 2 2 2 2 2 2 3 4 4 4 4 5 4 4 4 3 2 2 2 6 6 6 2 6 2 6 2 6 2 2 2 2 2 2 2

6 = very fast-moving, 5 = fast-moving, 4 = standard, 3 = slow-moving, 2 = very slow-moving and 1 = no sales volume.

References

[1] W. Ching. Markov modulated Poisson processes for multi-location inventory problems. Inter. J. Prod. Econ., 53:232-239, 1997.

[2] W. Ching and M. Ng. Markov Chains: Models, Algorithms and Applications. Springer, 2006.
[3] V. Chvátal. Linear Programming. Freeman, 1983.
[4] S. Fang and S. Puthenpura. Linear Optimization and Extensions: Theory and Algorithms. Prentice-Hall, 1993.
[5] J. Kuha. AIC and BIC: comparisons of assumptions and performance. Sociological Methods and Research, 33:188-229, 2004.
[6] I. MacDonald and W. Zucchini. Hidden Markov and Other Models for Discrete-valued Time Series. Chapman & Hall, London, 1997.
[7] A. Raftery. A model of higher-order Markov chains. J. Royal Statist. Soc., 47:528-539, 1985.
[8] A. Raftery and S. Tavare. Estimation and modelling repeated patterns in high order Markov chains with the mixture transition distribution model. Appl. Statist., 43:179-199, 1994.
[9] M. Ng, T. Siu, W. Ching and E. Fung. On multivariate credibility approach for portfolio credit risk measurement. Quantitative Finance, 5:543-556, 2005.
[10] E. Fung, W. Ching and M. Ng. A multivariate Markov chain model for categorical data sequences and its applications in demand prediction. IMA J. Manag. Math., 13:187-199, 2002.
[11] E. Fung, W. Ching and M. Ng. A higher-order Markov model for the newsboy's problem. J. Operat. Res. Soc., 54:291-298, 2003.
[12] E. Fung, W. Ching and M. Ng. Higher-order Markov chain models for categorical data sequences. Inter. J. Nav. Res. Logist., 51:557-574, 2004.
[13] S. Zhang, W. Ching and M. Ng. On higher-dimensional Markov chain models. To appear in Pacific Journal of Optimization, 2006.