RAINFALL PREDICTION BY WAVELET DECOMPOSITION

A. W. JAYAWARDENA
Department of Civil Engineering, The University of Hong Kong, Hong Kong, China

P. C. XU
Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing, China

F. L. TSANG
Department of Civil Engineering, The University of Hong Kong, Hong Kong, China

A novel approach of simulating daily rainfall data using wavelet and hidden Markov model is presented in this study. It is then applied to simulate daily rainfall data in three gauging stations in the Chao Phraya Basin in Thailand.

INTRODUCTION

Because of the strong deterministic character of the large-scale (monthly or bimonthly) rainfall data and the strong stochastic character of the small-scale (daily) data, neither deterministic nor stochastic models can produce successful simulations of daily rainfall data. In this paper, a mixed method which combines a deterministic model with a stochastic model is introduced and applied to simulate rainfall data in the Chao Phraya Basin in Thailand.

In the proposed approach, the rainfall signal is decomposed into sub-signals with different scales, i.e., a large-scale signal and several small-scale signals. For a given time series and prediction origin $t$, the date corresponding to $t$ in previous years, $t_y$ (e.g. if t, t 7 and t 3 ), is first identified, and the signals of daily data in the $2^N$-day periods $[t_y - 2^{N-1} + 1,\ t_y + 2^{N-1}]$ are decomposed into sub-signals (wavelet tree), where $N$ is required to be determined a priori. After decomposition, a deterministic model is used to describe the large-scale signal, and a hidden Markov tree model is used to simulate the small-scale signals in a wavelet tree (Crouse et al., [2]; Smyth et al., [8]). The nodes of the wavelet trees have some hidden states (Rabiner, [6]) that follow a mixed Gaussian distribution and the state probabilities are treated as Markov stochastic processes. The EM algorithm (Crouse et al., [2]; McLachlan, [5]; Ronen et al., [7]; Dempster et al., [3]) is applied to estimate the state and transition probabilities (Rabiner, [6]) for the Markov model.
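The decomposition of a daily signal into one large-scale mean plus scale-by-scale signals, and the inverse transformation that recovers the daily values, can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it assumes the normalized difference $D = (C_{\mathrm{left}} - C_{\mathrm{parent}})/(C_{\mathrm{parent}} + 1)$ between a block mean and the mean of its first half, and the function names `decompose` and `reconstruct` and the 8-day record are hypothetical.

```python
import numpy as np


def decompose(daily, levels):
    """Return the top-level block mean and the normalized differences D[k]
    (k = 1..levels) of a daily series whose length is 2**levels."""
    c = np.asarray(daily, dtype=float)
    if c.size != 2 ** levels:
        raise ValueError("series length must equal 2**levels")
    details = {}
    for k in range(1, levels + 1):
        left, right = c[0::2], c[1::2]
        parent = 0.5 * (left + right)                   # block means one scale up
        details[k] = (left - parent) / (parent + 1.0)   # assumed normalized difference
        c = parent
    return float(c[0]), details


def reconstruct(top_mean, details, levels):
    """Invert decompose(): rebuild the daily values from the top mean and all D[k]."""
    c = np.array([top_mean], dtype=float)
    for k in range(levels, 0, -1):
        offset = (c + 1.0) * details[k]
        children = np.empty(2 * c.size)
        children[0::2] = c + offset     # first half of each block
        children[1::2] = c - offset     # second half of each block
        c = children
    return c


# Made-up 8-day rainfall record (mm), so levels = 3.
rain = [0.0, 0.0, 5.0, 1.0, 0.0, 12.0, 3.0, 0.0]
top, d = decompose(rain, 3)
rebuilt = reconstruct(top, d, 3)
```

In this sketch the round trip is exact, and every normalized difference stays strictly between -1 and 1 for non-negative rainfall, so a simulation only has to supply the top-level mean and one value of D per tree node.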
Once the state probabilities are obtained, the wavelet coefficients are simulated by the Monte Carlo method. Together with the large-scale signal determined, the daily rainfall data can be simulated via inverse transformation. The approach is then used to simulate three daily rainfall time series in the Chao Phraya Basin (CPB) in Thailand. The data series are from the gauging station nos.,

and 7 respectively for the periods April, 98 to March 3, 994, April, 98 to July 3, 994, and April, 98 to July 3, 994. The statistics of the three data sets are given in Table 1.

SMALL SCALE DATA SIMULATION

For a series of given daily rainfall data $r_i$ ($1 \le i \le n_{\max}$), where $n_{\max}$ is the length of the rainfall time series, a new time series $u_n$, which gives the mean rainfalls for every $2^N$ days, can be constructed as

$$u_n = \frac{1}{2^N} \sum_{i=(n-1)2^N+1}^{n\,2^N} r_i, \qquad (1)$$

where $N$ is the given scale (its value is assigned in this study). The new time series $u_n$ can be considered as deterministic and therefore predictable for sufficiently large scale $N$.

To estimate the daily rainfall using the predicted value $C_{N,1}$ of the mean rainfall in a $2^N$-day period, a reasonable thought is to estimate the mean rainfalls $C_{N-1,1}$ and $C_{N-1,2}$ at the scale level $N-1$, then $C_{N-2,1}$, $C_{N-2,2}$, $C_{N-2,3}$ and $C_{N-2,4}$ at the scale level $N-2$, etc. successively, until the daily scale level $C_{0,1}, C_{0,2}, \ldots, C_{0,2^N}$ is reached. However, to obtain the mean rainfall at the scale level $N-1$ using the predicted rainfall at the scale level $N$, more information is needed. Since $C_{N,1}$, and $C_{N-1,1}$ and $C_{N-1,2}$, respectively are the mean rainfalls at the $2^N$ and $2^{N-1}$ scales, they satisfy the equation

$$C_{N-1,1} + C_{N-1,2} = 2C_{N,1}. \qquad (2)$$

To estimate the $2^{N-1}$ scale mean rainfall, another variable is introduced:

$$D_{N,1} = \frac{C_{N-1,1} - C_{N,1}}{C_{N,1} + 1}. \qquad (3)$$

Then, the $2^{N-1}$ scale mean rainfall can be estimated from the known $C_{N,1}$ and $D_{N,1}$ as

$$C_{N-1,1} = C_{N,1} + (C_{N,1} + 1)\,D_{N,1}, \qquad (4)$$

$$C_{N-1,2} = C_{N,1} - (C_{N,1} + 1)\,D_{N,1}. \qquad (5)$$

By the same method, if the wavelet coefficients

$$D_{k,n} = \frac{C_{k-1,2n-1} - C_{k,n}}{C_{k,n} + 1} \qquad (6)$$

are known for all the scales $k$, then $C_{k-1,2n-1}$ and $C_{k-1,2n}$ can be obtained for all $n$, $k$ using

$$C_{k-1,2n-1} + C_{k-1,2n} = 2C_{k,n}. \qquad (7)$$

The estimation of daily rainfall data, $C_{0,n}$ for all $n$, can then be done by sequential application of this procedure. In this study, 0-order scale data correspond to daily rainfall data ($2^0$), 1-order correspond to 2-day data ($2^1$), 2-order correspond to 4-day data ($2^2$), 3-order correspond to 8-day data ($2^3$), etc., and $N$-order data correspond to $2^N$-day data. The notations (for example, $C_{k,n}$ and $D_{k,n}$) without superscript in this paper are limited to the data that are to be estimated, whereas the notations (for example, $C^y_{k,n}$ and $D^y_{k,n}$) with a superscript refer to the historical data that have been used in the simulation process.

By using the above decomposition method the (observed) 0-order scale data $C^y_{0,n}$ can be decomposed as follows:

$$C^y_{0,n} \to C^y_{1,n} \to C^y_{2,n} \to \cdots \to C^y_{N-1,n} \to C^y_{N,1},$$

with the wavelet signals $D^y_{1,n}, D^y_{2,n}, \ldots, D^y_{N-1,n}, D^y_{N,1}$ split off at the successive steps. Conversely, the (simulated) 0-order scale data can be reconstructed by the wavelet signal and the large-scale signal as:

$$C_{N,1} \to C_{N-1,n} \to \cdots \to C_{2,n} \to C_{1,n} \to C_{0,n},$$

where the wavelet signals $D_{N,1}, D_{N-1,n}, \ldots, D_{2,n}, D_{1,n}$ are added back at the successive steps.

Since the large-scale signal of a daily rainfall data is assumed to be deterministic, the $N$-order scale data $C_{N,1}$ can be predicted using the historical data for large $N$. In this study, a local linear model is used for this purpose and a tree model is used to simulate the wavelet signal $D_{k,n}$, for $1 \le k < N$ and $1 \le n \le 2^{N-k}$ ($D_{N,1}$ can be calculated from the estimated $C_{N,1}$, the historical data $C_{N-1,1}$ and Eq. (3)). In the wavelet tree, the node $(k, n)$, where $k$ is the layer number and $n$ is the position number in layer $k$, has the parent $(k+1, [(n+1)/2])$ while the offsprings are $(k-1, 2n-1)$ and $(k-1, 2n)$. Here the function $[x]$ returns the largest integer not greater than $x$ (the parent-child terminology is used in related papers, e.g. Ronen et al. [7] and

Crouse et al. [2]). The data in the wavelet tree are all stochastic, and therefore a stochastic method must be used for simulation. A simple probability function such as the Gaussian distribution will not be suitable because the data in the wavelet tree, which are built from the daily rainfall data, will contain many small values (see Table 1, which gives the dry probability, an indicator of the number of days with zero rainfall) and some large values. In this study, a mixture model, a combination of several Gaussian distribution functions, is used to simulate the wavelet signal:

$$p(x) = \sum_{i=1}^{M} v_{k,n}(i)\,\frac{1}{\sqrt{2\pi\sigma_i}} \exp\!\left(-\frac{x^2}{2\sigma_i}\right), \qquad (8)$$

where $p(x)$ is the probability distribution of the wavelet signal $x$ (in this case $x = D_{k,n}$). The weighted value for each Gaussian distribution function, $v_{k,n}(i)$, is also stochastic. It is simulated by using a new random variable $S_{k,n}$, called the hidden state variable, which takes values in $\{1, 2, 3, \ldots, M\}$. The weighted value $v_{k,n}(i)$ is equal to the probability of the hidden state variable being in state $i$ (i.e. $v_{k,n}(i) = \Pr(S_{k,n} = i)$). In the above definitions, $i$ is the label of the hidden state ($i = 1, 2, 3, \ldots, M$), $M$ is the maximum number of hidden states, and $\sigma_i$ is the variance of the Gaussian distribution.

It can also be seen that the weighted values of the mixture model for the data points $D_{k,n}$ and $D_{k-1,2n-1}$ or $D_{k-1,2n}$ are dependent. A large value of $D_{k,n}$ always means that one value of either $D_{k-1,2n-1}$ or $D_{k-1,2n}$ is large. So the weighted value $v_{k-1,2n-1}(i)$ (or $v_{k-1,2n}(i)$) for the data point $D_{k-1,2n-1}$ (or $D_{k-1,2n}$), which is equal to the probability of the hidden state variable $S_{k-1,2n-1}$ (or $S_{k-1,2n}$) being in state $i$, depends on the weighted value $v_{k,n}(j)$, the probability of the hidden state variable $S_{k,n}$ being equal to $j$. Since the probability of transition of the hidden state variable from $S_{k,n}$ to $S_{k-1,2n-l}$ ($l = 0, 1$) could vary with position $(k, n)$, we introduce the transition probabilities as follows:

$$T_{k,n,l}(i, j) = \Pr(S_{k-1,2n-l} = i \mid S_{k,n} = j), \quad l = 0, 1. \qquad (9)$$

Then the weighted values $v_{k-1,2n-1}(i)$ and $v_{k-1,2n}(i)$ satisfy the Markov condition

$$v_{k-1,2n-l}(i) = \sum_{j=1}^{M} T_{k,n,l}(i, j)\,v_{k,n}(j), \quad l = 0, 1. \qquad (10)$$

In order to simulate the daily rainfall data by the above approach, we need to know the following parameters: the $N$-order scale data $C_{N,1}$; the number of hidden states $M$; the variance for each hidden state $\sigma_i$; the weighted value $v_{N,1}(i)$ for each Gaussian distribution, i.e. the probability that the hidden state random variable $S_{N,1} = i$; and the transition probabilities $T_{k,n,l}(i, j)$. All the remaining $v_{k,n}(i)$ can then be obtained by Eq. (10) and the estimated $v_{N,1}(i)$. In this study, the number of the hidden states $M$ and their

variances $\sigma_i$, $1 \le i \le M$, are fixed a priori. The weighted value $v_{N,1}(i)$ is obtained by the EM algorithm. The other weighted values $v_{k,n}(i)$ are given iteratively by Eq. (10) for $k = N-1, N-2, \ldots, 1$; $n = 1, 2, \ldots, 2^{N-k}$ and $i = 1, 2, \ldots, M$. Together with the transition probabilities $T_{k,n,l}(i, j)$ estimated by the EM algorithm using the previous years' daily rainfall data, all the hidden state probabilities can be obtained. Using the hidden state probabilities, the remaining values $D_{k,n}$ (other than $D_{N,1}$) are simulated by the Monte Carlo method.

LARGE SCALE DATA SIMULATION

The $N$-order scale data $C_{N,1}$ can be estimated by using the historical daily rainfall data. For a given prediction origin $t$, we identify the date corresponding to $t$ in the $y$-th year as $t_y$. If $u'_y$ and $u_y$ respectively denote the mean rainfalls for the $y$-th year for the periods $[t_y - 2^{N-1} + 1,\ t_y + 2^{N-1}]$ and $[t_y - 3 \cdot 2^{N-1} + 1,\ t_y - 2^{N-1}]$, then by the determinism of the $N$-order scale data, it can be assumed that $u'_y$ and $u_y$ satisfy an evolutionary equation of the form

$$u'_y = h(u_y), \qquad (11)$$

where $h$ denotes the evolutionary function, which is assumed to be linear of the form

$$u'_y = w_0 + w_1 u_y + \varepsilon, \qquad (12)$$

where the parameters $w_0$ and $w_1$ are estimated (as $\hat{w}_0$ and $\hat{w}_1$) by the least squares method. Once the coefficients $\hat{w}_0$ and $\hat{w}_1$ are known, the mean rainfall data $u'$ for the period $[t - 2^{N-1} + 1,\ t + 2^{N-1}]$ can be estimated by $\hat{u}' = \hat{w}_0 + \hat{w}_1 u$ (here $u$ will be the mean rainfall for the period $[t - 3 \cdot 2^{N-1} + 1,\ t - 2^{N-1}]$; see Fig. 1 for the results of large-scale simulation). Since the mean rainfall for the period $[t - 2^{N-1} + 1,\ t]$, denoted by $C_{N-1,1}$, is known, the $(N-1)$-order scale data $C_{N-1,2}$ and the $N$-order wavelet data $D_{N,1}$ can be obtained from Eqs. (2) and (3).

APPLICATION

The proposed method is applied to three rainfall data sets from the Chao Phraya River Basin in Thailand. As mentioned earlier, some parameters need to be determined a priori. They include the number of layers $N$ in the wavelet tree, the number of hidden states $M$ and the variances $\sigma_i$ for each hidden state.
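The large-scale step described in the preceding section (a least-squares fit of the linear evolution model, a forecast of the block mean, and the split of that block into its known first half and unknown second half) can be sketched as below. This is a hedged illustration only: the function names `fit_linear_evolution` and `split_top_block` and all numbers are made up, and the normalized difference is assumed to be (first-half mean minus block mean) divided by (block mean plus 1).

```python
import numpy as np


def fit_linear_evolution(u_before, u_after):
    """Least-squares estimates (w0, w1) of u_after ~ w0 + w1 * u_before."""
    x = np.asarray(u_before, dtype=float)
    y = np.asarray(u_after, dtype=float)
    design = np.column_stack([np.ones_like(x), x])   # columns: intercept, slope
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), float(coef[1])


def split_top_block(c_top, c_first_half):
    """From the forecast block mean and the observed first-half mean, return
    the second-half mean and the top-level normalized difference."""
    c_second_half = 2.0 * c_top - c_first_half        # the two halves average to c_top
    d_top = (c_first_half - c_top) / (c_top + 1.0)    # assumed normalized difference
    return c_second_half, d_top


# Made-up block means (mm/day), one pair per historical year.
u_before_years = [2.0, 4.0, 6.0, 8.0]
u_after_years = [2.0, 3.0, 4.0, 5.0]      # lies exactly on 1 + 0.5 * u

w0, w1 = fit_linear_evolution(u_before_years, u_after_years)
c_top = w0 + w1 * 5.0                     # forecast when the preceding block mean is 5.0
c_second, d_top = split_top_block(c_top, c_first_half=3.0)
```

With only one pair of block means per historical year the fit rests on very few points, so in practice the estimates of the two coefficients would be sensitive to individual years.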
The number of layers (or, the scale) in the wavelet tree, $N$, is determined using the False Nearest Neighbours (FNN) method that has been proposed for finding the embedding dimension $d_e$ of a deterministic system (Abarbanel [1]; Jayawardena et al., [4]). In this study, the same concept is extended to determine the best scale order

$N$ which will ensure that the data in the $N$-order scale are deterministic. It should be mentioned that $N$ is chosen to be the minimum, so that the data of scale less than $N$ are considered as stochastic, and therefore the wavelet tree coefficients are stochastic. The second parameter to be assigned a priori is the number of hidden states, which has been set at 3. The third is the variances. Since all the wavelet signals lie in the interval $(-1, 1)$, their variances will be within the range $(0, 1)$. Therefore, they are assigned the values σ., σ. 4, σ 3. 7 representing large, medium and small hidden states.

The simulation procedure for fixed $N$, $M$ and prediction origin $t$ involves four steps. The first step is to estimate $C_{N,1}$ ($C_{N,1} = u'$), the mean of rainfall data in the interval $[t - 2^{N-1} + 1,\ t + 2^{N-1}]$, by the linear model. It is determined by using the means of corresponding days in the previous years, i.e. the mean $u'_y$ in the interval $[t_y - 2^{N-1} + 1,\ t_y + 2^{N-1}]$, and the mean $u_y$ in the interval $[t_y - 3 \cdot 2^{N-1} + 1,\ t_y - 2^{N-1}]$, for $1 \le y \le Y$. Since the rainfall data of the period $[t - 2^{N-1} + 1,\ t]$ is known, $C_{N-1,1}$ is obtained and thus $D_{N,1}$ is evaluated by Eq. (3). The second step is to estimate the weighted value $v_{N,1}(i)$ for $i \le 3$ using $D_{N,1}$ obtained in step one and the EM algorithm. The third step is to estimate the transition probabilities $T_{k,n,l}(i, j)$ ($k \le N$, and $i, j \le 3$) of the wavelet trees constructed from the historical data of the period $[t_y - 2^{N-1} + 1,\ t_y + 2^{N-1}]$ and the EM algorithm. The last step is to simulate the wavelet coefficients $D_{k,n}$ (other than $D_{N,1}$) by the Monte Carlo method. The daily rainfall data for the $2^{N-1}$ days following the prediction origin $t$ can then be calculated by inverse transformation.

CONCLUSION

In this paper, a novel approach of simulating daily rainfall data using wavelet and hidden Markov model is introduced. Since the model that has been used is a mixed one, it inherently has some randomness built into it. Therefore a deterministic comparison alone is not expected to give a one to one match. Instead, a frequency distribution of the numbers of days with rainfalls of varying magnitudes is shown in Fig. 2.
Table 2 gives some parameters of comparison of the large scale simulation with those of observation. Obviously, the linear model may not be the best, but it is the simplest. Other types of local models can also be equally applicable. Table 3 gives some parameters of comparison of the simulated and the observed. Several assumptions were necessary in this study. They include the assumptions that the 4-day data are deterministic, the wavelet coefficients follow a mixed Gaussian distribution, and that the transition probabilities for the same period of time in different years are the same.

Table 1. Data summary for the Chao Phraya Basin

Region: Chao Phraya Basin
Columns: Gauging station; Number of data points; Dry probability; Average annual rainfall (mm)

No. (CPB)       .7.
No. (CPB)       3.849 88.
No. 7 (CPB7)    3.8 7.38

Table 2. Data summary for the large-scale simulation

Columns: Gauging station; Mean; Standard deviation

CPB     .8.87.4.4
CPB     .4.38.8.79
CPB7    .3.8 3.8.

Table 3. Mean and standard deviations of simulated and observed daily rainfall data

Columns: Gauging station; Prediction origin; Mean; Standard deviation

CPB     t 48.4.4.9.4
        t 48 4.7 8.48. 7.7
        t 49..83.78.
CPB     t 48 3.37.7.3.
        t 48.8 7. 9.3 8.9
        t 49 8.7 9.8.4 7.
CPB7    t 48.4.9 4.7 3.7
        t 48.4.8 8..83
        t 49 3.3 8.9 9.73 8.94

[Fig. 1(a), Fig. 1(b): mean rainfall of 4 days at t (mm) against prediction origin t (Day)]

Figure 1. Large-scale simulation using 2-parameter linear function: from t 4349 (Day ) to t 78 (Day 73) of (a) CPB, (b) CPB,

[Fig. 1(c): mean rainfall of 4 days at t (mm) against prediction origin t (Day)]

Figure 1. Large-scale simulation using 2-parameter linear function: from t 4349 (Day ) to t 78 (Day 73) of (c) CPB7. (Continued)

[Fig. 2, panels (A) to (D): number of days against rainfall (mm)]

Figure 2. Frequency distribution of daily rainfall in Chao Phraya Basin.

[Fig. 2, panels (E) and (F): number of days against rainfall (mm)]

Figure 2. Frequency distribution of daily rainfall in Chao Phraya Basin. (Continued) (A. CPB at origin 48, B. CPB at origin 48, C. CPB at origin 49, D. CPB at origin 48, E. CPB at origin 48, F. CPB at origin 49)

REFERENCES

[1] Abarbanel, H. D. I., Analysis of observed chaotic data, New York: Springer-Verlag, (1996).
[2] Crouse, M. S., Nowak, R. D., Baraniuk, R. G., Wavelet-based statistical signal processing using hidden Markov models, IEEE Transactions on Signal Processing, 46 (1998), pp 886-902.
[3] Dempster, A. P., Laird, N. M., Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. Royal Stat. Soc. B. 39 (1977), pp 1-38.
[4] Jayawardena, A. W., Li, W. K., Xu, P., Neighbourhood selection for local modelling and prediction of hydrological time series, J. Hydrol. 258 (2002), pp 40-57.
[5] McLachlan, G. J., Krishnan, T., The EM algorithm and extensions, New York: John Wiley, (1997).
[6] Rabiner, L. R., A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (1989), pp 257-286.
[7] Ronen, O., Rohlicek, J. R., Ostendorf, M., Parameter estimation of dependence tree models using the EM algorithm, IEEE Signal Proc. Lett. 2 (1995), pp 157-159.
[8] Smyth, P., Heckerman, D., Jordan, M. I., Probabilistic independence networks for hidden Markov probability models, Neural Comp. 9 (1997), pp 227-269.