RAINFALL PREDICTION BY WAVELET DECOMPOSITION


A. W. JAYAWARDENA
Department of Civil Engineering, The University of Hong Kong, Hong Kong, China

P. C. XU
Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing, China

F. L. TSANG
Department of Civil Engineering, The University of Hong Kong, Hong Kong, China

A novel approach of simulating daily rainfall data using wavelets and a hidden Markov model is presented in this study. It is then applied to simulate daily rainfall data at three gauging stations in the Chao Phraya Basin in Thailand.

INTRODUCTION

Because of the strong deterministic character of large-scale (monthly or bimonthly) rainfall data and the strong stochastic character of small-scale (daily) data, neither deterministic nor stochastic models alone can produce successful simulations of daily rainfall data. In this paper, a mixed method which combines a deterministic model with a stochastic model is introduced and applied to simulate rainfall data in the Chao Phraya Basin in Thailand.

In the proposed approach, the rainfall signal is decomposed into sub-signals with different scales, i.e., a large-scale signal and several small-scale signals. For a given time series and prediction origin $t$, the date corresponding to $t$ in previous years, $t_y$ (e.g. if $t = \ldots$, $t_{\ldots} = \ldots 7 \ldots$ and $t_{\ldots} = \ldots 3 \ldots$), is first identified, and the signal of daily data in the $2^N$-day period $[t_y - 2^{N-1} + 1,\ t_y + 2^{N-1}]$ is decomposed into sub-signals (a wavelet tree), where $N$ is required to be determined a priori. After decomposition, a deterministic model is used to describe the large-scale signal, and a hidden Markov tree model is used to simulate the small-scale signals in the wavelet tree (Crouse et al., [2]; Smyth et al., [8]). The nodes of the wavelet trees have hidden states (Rabiner, [6]) that follow a mixed Gaussian distribution, and the state probabilities are treated as Markov stochastic processes. The EM algorithm (Crouse et al., [2]; McLachlan, [5]; Ronen et al., [7]; Dempster et al., [3]) is applied to estimate the state and transition probabilities (Rabiner, [6]) for the Markov model.
Once the state probabilities are obtained, the wavelet coefficients are simulated by the Monte Carlo method. Together with the large-scale signal determined, the daily rainfall data can be simulated via inverse transformation.

The approach is then used to simulate three daily rainfall time series in the Chao Phraya Basin (CPB) in Thailand. The data series are from gauging station nos. …, … and 7 respectively, for the periods April …, 198… to March 3…, 1994, April …, 198… to July 3…, 1994, and April …, 198… to July 3…, 1994. The statistics of the three data sets are given in Table 1.

SMALL SCALE DATA SIMULATION

For a series of given daily rainfall data $r_i$ ($1 \le i \le n_{\max}$), where $n_{\max}$ is the length of the rainfall time series, a new time series $u_n$, which gives the mean rainfall for every $2^N$ days, can be constructed as

$$u_n = \frac{1}{2^N} \sum_{i=(n-1)2^N+1}^{n\,2^N} r_i, \tag{1}$$

where $N$ is the given scale (assigned to be 6 in this study). The new time series $u_n$ can be considered deterministic, and therefore predictable, for a sufficiently large scale $N$.

To estimate the daily rainfall using the predicted value $C_{N,1}$ of the mean rainfall in a $2^N$-day period, a reasonable approach is to estimate the mean rainfalls $C_{N-1,1}$ and $C_{N-1,2}$ at the $2^{N-1}$ scale level, then $C_{N-2,1}$, $C_{N-2,2}$, $C_{N-2,3}$ and $C_{N-2,4}$ at the $2^{N-2}$ scale level, and so on successively, until the daily scale level $C_{0,1}, C_{0,2}, \ldots, C_{0,2^N}$ is reached. However, to obtain the mean rainfall at the $2^{N-1}$ scale level from the predicted rainfall at the $2^N$ scale level, more information is needed. Since $C_{N,1}$ is the mean rainfall at the $2^N$ scale, and $C_{N-1,1}$ and $C_{N-1,2}$ are the mean rainfalls at the $2^{N-1}$ scale, they satisfy the equation

$$C_{N-1,1} + C_{N-1,2} = 2\,C_{N,1}. \tag{2}$$

To estimate the $2^{N-1}$ scale mean rainfall, another variable is introduced:

$$D_{N,1} = \frac{C_{N,1} - C_{N-1,1}}{C_{N,1} + 1}. \tag{3}$$

Then the $2^{N-1}$ scale mean rainfall can be estimated from known $C_{N,1}$ and $D_{N,1}$ as

$$C_{N-1,1} = C_{N,1} - (C_{N,1} + 1)\,D_{N,1}, \tag{4}$$

$$C_{N-1,2} = C_{N,1} + (C_{N,1} + 1)\,D_{N,1}. \tag{5}$$

By the same method, if the wavelet coefficients

$$D_{k,n} = \frac{C_{k,n} - C_{k-1,2n-1}}{C_{k,n} + 1} \tag{6}$$

are known for all the scales $k$, then $C_{k-1,2n-1}$ and $C_{k-1,2n}$ can be obtained for all $n$, $k$ using

$$C_{k-1,2n-1} + C_{k-1,2n} = 2\,C_{k,n}. \tag{7}$$

The estimation of the daily rainfall data, $C_{0,n}$ for all $n$, can then be done by sequential application of this procedure. In this study, 0-order scale data correspond to daily rainfall data ($2^0$), 1-order to 2-day data ($2^1$), 2-order to 4-day data ($2^2$), 3-order to 8-day data ($2^3$), etc., and $N$-order data correspond to $2^N$-day data. Notations without a superscript (for example, $C_{k,n}$ and $D_{k,n}$) are limited in this paper to the data that are to be estimated, whereas notations with a superscript (for example, $C^{y}_{k,n}$ and $D^{y}_{k,n}$) refer to the historical data that have been used in the simulation process.

By using the above decomposition method, the (observed) 0-order scale data $C^{y}_{0,n}$ can be decomposed as follows:

$$C^{y}_{0,n} \to C^{y}_{1,n} \to C^{y}_{2,n} \to \cdots \to C^{y}_{N-1,n} \to C^{y}_{N,1}, \qquad \text{with } D^{y}_{1,n},\ D^{y}_{2,n},\ \ldots,\ D^{y}_{N-1,n},\ D^{y}_{N,1} \text{ split off at successive levels.}$$

Conversely, the (simulated) 0-order scale data can be reconstructed from the wavelet signal and the large-scale signal as:

$$C_{N,1} \to C_{N-1,n} \to \cdots \to C_{2,n} \to C_{1,n} \to C_{0,n}, \qquad \text{using } D_{N,1},\ D_{N-1,n},\ \ldots,\ D_{2,n},\ D_{1,n}.$$

Since the large-scale signal of daily rainfall data is assumed to be deterministic, the $N$-order scale data $C_{N,1}$ can be predicted using the historical data for large $N$. In this study, a local linear model is used for this purpose and a tree model is used to simulate the wavelet signal $D_{k,n}$, for $1 \le k \le N-1$ and $1 \le n \le 2^{N-k}$ ($D_{N,1}$ can be calculated from the estimated $C_{N,1}$, the historical data $C_{N-1,1}$ and Eq. (3)). In the wavelet tree, the node $(k, n)$, where $k$ is the layer number and $n$ is the position number in layer $k$, has the parent $(k+1, [(n+1)/2])$ while its offspring are $(k-1, 2n-1)$ and $(k-1, 2n)$. Here the function $[x]$ returns the largest integer not exceeding $x$ (the parent-child terminology is used in related papers, e.g. Ronen et al. [7] and

Crouse et al. [2]). The data in the wavelet tree are all stochastic, and therefore a stochastic method must be used for simulation. A simple probability function such as the Gaussian distribution is not suitable, because the data in the wavelet tree, which are built from the daily rainfall data, contain many small values (see Table 1, which gives the dry probability, an indicator of the number of days with zero rainfall) and some large values. In this study, a mixture model, a combination of several Gaussian distribution functions, is used to simulate the wavelet signal:

$$p(x) = \sum_{i=1}^{M} v_{k,n}(i)\, \frac{1}{\sqrt{2\pi\sigma_i}} \exp\!\left(-\frac{x^2}{2\sigma_i}\right), \tag{8}$$

where $p(x)$ is the probability distribution of the wavelet signal $x$ (in this case $x = D_{k,n}$). The weighted value for each Gaussian distribution function, $v_{k,n}(i)$, is also stochastic. It is simulated by using a new random variable $S_{k,n}$, called the hidden state variable, which takes values in $\{1, 2, 3, \ldots, M\}$. The weighted value $v_{k,n}(i)$ is equal to the probability of the hidden state variable being in state $i$ (i.e. $v_{k,n}(i) = \Pr(S_{k,n} = i)$). In the above definitions, $i$ is the index of the hidden state ($i = 1, 2, 3, \ldots, M$), $M$ is the maximum number of hidden states, and $\sigma_i$ is the variance of the Gaussian distribution.

It can also be seen that the weighted values of the mixture model for the data point $D_{k,n}$ and for $D_{k-1,2n-1}$ or $D_{k-1,2n}$ are dependent. A large value of $D_{k,n}$ always means that one value of either $D_{k-1,2n-1}$ or $D_{k-1,2n}$ is large. So the weighted value $v_{k-1,2n-1}(i)$ (or $v_{k-1,2n}(i)$) for the data point $D_{k-1,2n-1}$ (or $D_{k-1,2n}$), which is equal to the probability of the hidden state variable $S_{k-1,2n-1}$ (or $S_{k-1,2n}$), depends on the weighted value $v_{k,n}(j)$, the probability of the hidden state variable $S_{k,n}$ being equal to $j$. Since the probability of transition of the hidden state variable from $S_{k,n}$ to $S_{k-1,2n-l}$ ($l = 0, 1$) could vary with position $(k, n)$, we introduce the transition probabilities as follows:

$$T_{k-1,2n-l}(i,j) = \Pr\left(S_{k-1,2n-l} = i \mid S_{k,n} = j\right), \quad l = 0, 1. \tag{9}$$

Then the weighted values $v_{k-1,2n-1}(i)$ and $v_{k-1,2n}(i)$ satisfy the Markov condition

$$v_{k-1,2n-l}(i) = \sum_{j=1}^{M} T_{k-1,2n-l}(i,j)\, v_{k,n}(j), \quad l = 0, 1. \tag{10}$$

In order to simulate the daily rainfall data by the above approach, we need to know the following parameters: the $N$-order scale data $C_{N,1}$; the number of hidden states $M$; the variance $\sigma_i$ for each hidden state; the weighted value $v_{N,1}(i)$ for each Gaussian distribution, i.e. the probability that the hidden state random variable $S_{N,1}$ equals $i$; and the transition probabilities $T_{k,n}(i,j)$. All the remaining $v_{k,n}(i)$ can then be obtained from Eq. (10) and the estimated $v_{N,1}(i)$. In this study, the number of the hidden states $M$ and their

variances $\sigma_i$, $1 \le i \le M$, are fixed a priori. The weighted value $v_{N,1}(i)$ is obtained by the EM algorithm. The other weighted values $v_{k,n}(i)$ are given iteratively by Eq. (10) for $k = N-1, N-2, \ldots, 1$; $n = 1, 2, \ldots, 2^{N-k}$ and $i = 1, 2, \ldots, M$. Together with the transition probabilities $T_{k,n}(i,j)$ estimated by the EM algorithm using the previous years' daily rainfall data, all the hidden state probabilities can be obtained. Using the hidden state probabilities, the remaining values $D_{k,n}$ (other than $D_{N,1}$) are simulated by the Monte Carlo method.

LARGE SCALE DATA SIMULATION

The $N$-order scale data $C_{N,1}$ can be estimated by using the historical daily rainfall data. For a given prediction origin $t$, we identify the date corresponding to $t$ in the $y$-th year as $t_y$. If $u_y$ and $u'_y$ respectively denote the mean rainfalls in the $y$-th year for the periods $[t_y - 2^{N-1} + 1,\ t_y + 2^{N-1}]$ and $[t_y - 3\cdot 2^{N-1} + 1,\ t_y - 2^{N-1}]$, then by the determinism of the $N$-order scale data, it can be assumed that $u_y$ and $u'_y$ satisfy an evolutionary equation of the form

$$u_y = h(u'_y), \tag{11}$$

where $h$ denotes the evolutionary function, which is assumed to be linear, of the form

$$u_y = w_0 + w_1 u'_y + \varepsilon, \tag{12}$$

where the parameters $w_0$ and $w_1$ are estimated (as $\hat{w}_0$ and $\hat{w}_1$) by the least squares method. Once the coefficients $\hat{w}_0$ and $\hat{w}_1$ are known, the mean rainfall $u$ for the period $[t - 2^{N-1} + 1,\ t + 2^{N-1}]$ can be estimated by $\hat{u} = \hat{w}_0 + \hat{w}_1 u'$ (here $u'$ is the mean rainfall for the period $[t - 3\cdot 2^{N-1} + 1,\ t - 2^{N-1}]$; see Fig. 1 for the results of the large-scale simulation). Since the mean rainfall for the period $[t - 2^{N-1} + 1,\ t]$, denoted by $C_{N-1,1}$, is known, the $(N-1)$-order scale data $C_{N-1,2}$ and the $N$-order wavelet data $D_{N,1}$ can be obtained from Eqs. (2) and (3).

APPLICATION

The proposed method is applied to three rainfall data sets from the Chao Phraya River Basin in Thailand. As mentioned earlier, some parameters need to be determined a priori. They include the number of layers in the wavelet tree, the number of hidden states $M$ and the variances $\sigma_i$ for each hidden state.
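The large-scale step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the sample numbers are hypothetical, the straight-line fit is ordinary least squares written in closed form, and the top wavelet coefficient is taken as D = (C_top - C_first) / (C_top + 1), which is the form of Eq. (3) assumed in this sketch.

```python
def fit_linear_model(u_prev, u_curr):
    """Ordinary least squares for u_curr ~ w0 + w1 * u_prev (linear form of Eq. 12)."""
    count = len(u_prev)
    mean_prev = sum(u_prev) / count
    mean_curr = sum(u_curr) / count
    spread = sum((x - mean_prev) ** 2 for x in u_prev)
    cross = sum((x - mean_prev) * (y - mean_curr) for x, y in zip(u_prev, u_curr))
    w1 = cross / spread
    w0 = mean_curr - w1 * mean_prev
    return w0, w1


def top_level_split(c_top, c_first_half):
    """Split the predicted window mean using the known mean of its first half.

    Returns the second-half mean (Eq. 2) and the top wavelet coefficient
    (assumed form of Eq. 3).
    """
    c_second_half = 2.0 * c_top - c_first_half
    d_top = (c_top - c_first_half) / (c_top + 1.0)
    return c_second_half, d_top


# Hypothetical window means (mm/day), one pair per previous year:
# u_prev_years holds the earlier window, u_curr_years the window around t_y.
u_prev_years = [2.0, 3.5, 1.0, 4.0]
u_curr_years = [3.0, 4.5, 2.0, 5.0]

w0_hat, w1_hat = fit_linear_model(u_prev_years, u_curr_years)
c_top = w0_hat + w1_hat * 2.5  # 2.5 is a hypothetical earlier-window mean for the current year
c_second_half, d_top = top_level_split(c_top, c_first_half=3.0)
print(w0_hat, w1_hat, c_top, c_second_half, d_top)
```

Note that a predicted window mean smaller than half of the known first-half mean would give a negative second-half mean in this sketch; the source does not say how such cases are handled.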
The number of layers (or the scale) in the wavelet tree, $N$, is determined using the False Nearest Neighbours (FNN) method, which has been proposed for finding the embedding dimension $d_e$ of a deterministic system (Abarbanel [1]; Jayawardena et al., [4]). In this study, the same concept is extended to determine the best scale order $N$ which will ensure that the data in the $N$-order scale are deterministic. It should be mentioned that $N$ is chosen to be the minimum such value, so that the data of scale less than $N$ are considered stochastic, and therefore the wavelet tree coefficients are stochastic.

The second parameter to be assigned a priori is the number of hidden states, which has been set at 3. The third is the variances. Since all the wavelet signals lie in the interval $(-1, 1)$, their variances will be within the range $(0, 1)$. Therefore, they are assigned the values $\sigma_1 = 0.\ldots$, $\sigma_2 = 0.\ldots 4$ and $\sigma_3 = 0.\ldots 7$, representing large, medium and small hidden states.

The simulation procedure for fixed $N$, $M$ and prediction origin $t$ involves four steps. The first step is to estimate $C_{N,1} = u$, the mean of the rainfall data in the interval $[t - 2^{N-1} + 1,\ t + 2^{N-1}]$, by the linear model. It is determined by using the means of the corresponding days in the previous years, i.e. the mean $u_y$ in the interval $[t_y - 2^{N-1} + 1,\ t_y + 2^{N-1}]$ and the mean $u'_y$ in the interval $[t_y - 3\cdot 2^{N-1} + 1,\ t_y - 2^{N-1}]$, for $y = 1, \ldots, Y$. Since the rainfall data of the period $[t - 2^{N-1} + 1,\ t]$ are known, $C_{N-1,1}$ is obtained and thus $D_{N,1}$ is evaluated by Eq. (3). The second step is to estimate the weighted value $v_{N,1}(i)$ for $1 \le i \le 3$ using the $D_{N,1}$ obtained in step one and the EM algorithm. The third step is to estimate the transition probabilities $T_{k,n}(i,j)$ ($1 \le k \le N-1$, $1 \le n \le 2^{N-k}$ and $1 \le i, j \le 3$) of the wavelet trees constructed from the historical data of the period $[t_y - 2^{N-1} + 1,\ t_y + 2^{N-1}]$, again with the EM algorithm. The last step is to simulate the wavelet coefficients $D_{k,n}$ (other than $D_{N,1}$) by the Monte Carlo method. The daily rainfall data for the days following the prediction origin can then be calculated by inverse transformation.

CONCLUSION

In this paper, a novel approach of simulating daily rainfall data using wavelets and a hidden Markov model is introduced. Since the model that has been used is a mixed one, it inherently has some randomness built into it. Therefore a deterministic comparison alone is not expected to give a one-to-one match. Instead, a frequency distribution of the numbers of days with rainfalls of varying magnitudes is shown in Fig. 2.
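The randomness mentioned above enters in the last step of the procedure, where the wavelet coefficients are drawn by Monte Carlo and then inverted. The sketch below illustrates that step only and rests on several assumptions that are not in the source: all names and input numbers are hypothetical, one transition matrix is shared by every node (the paper lets it vary with position), the state probabilities are propagated with the Markov condition of Eq. (10), the children are computed as C ∓ (C + 1)·D (the form assumed here for Eqs. (4) and (5)), and each draw is clipped so that no child mean becomes negative.

```python
import random


def child_probabilities(parent_probs, transition):
    """Markov condition of Eq. (10): v_child(i) = sum_j T(i, j) * v_parent(j)."""
    size = len(parent_probs)
    return [sum(transition[i][j] * parent_probs[j] for j in range(size))
            for i in range(size)]


def simulate_daily(c_top, d_top, v_top, transition, variances, n_levels, rng):
    """Walk the wavelet tree from level N down to level 0.

    Returns 2**n_levels simulated daily values whose mean equals c_top.
    """
    means = [c_top]   # C_{k,n} on the current level, starting with C_{N,1}
    probs = [v_top]   # hidden-state probabilities v_{k,n}(.) of the same nodes
    for level in range(n_levels, 0, -1):
        next_means, next_probs = [], []
        for c, v in zip(means, probs):
            if level == n_levels:
                d = d_top  # D_{N,1} is known from the large-scale step
            else:
                # Draw a hidden state, then a zero-mean Gaussian coefficient.
                state = rng.choices(range(len(v)), weights=v)[0]
                d = rng.gauss(0.0, variances[state] ** 0.5)
            # Assumption of this sketch: clip so that both children stay >= 0.
            bound = c / (c + 1.0)
            d = max(-bound, min(bound, d))
            step = (c + 1.0) * d
            next_means.extend([c - step, c + step])
            v_child = child_probabilities(v, transition)
            next_probs.extend([v_child, v_child])
        means, probs = next_means, next_probs
    return means


# Hypothetical inputs: three hidden states, six levels (64 days).
rng = random.Random(0)
variances = [0.1, 0.4, 0.7]
v_top = [0.6, 0.3, 0.1]
transition = [[0.80, 0.30, 0.20],   # transition[i][j] = Pr(child = i | parent = j)
              [0.15, 0.50, 0.30],
              [0.05, 0.20, 0.50]]   # each column sums to 1
daily = simulate_daily(3.5, 0.1, v_top, transition, variances, 6, rng)
print(len(daily), sum(daily) / len(daily))
```

Because each pair of children averages back to its parent, every run reproduces the large-scale mean while the daily values differ from run to run, which is why the comparison with observations is made through frequency distributions.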
Table 2 gives some parameters of comparison of the large-scale simulation with those of observation. Obviously, the linear model may not be the best, but it is the simplest. Other types of local models can also be equally applicable. Table 3 gives some parameters of comparison of the simulated and the observed data. Several assumptions were necessary in this study. They include the assumptions that the 64-day data are deterministic, that the wavelet coefficients follow a mixed Gaussian distribution, and that the transition probabilities for the same period of time in different years are the same.

Table 1. Data summary for the Chao Phraya Basin
Columns: Regions, Gauging station, Number of data points, Dry probability, Average annual rainfall (mm)
Region: Chao Phraya Basin
No. … (CPB…): .7.
No. … (CPB…): 3.849 88.
No. 7 (CPB7): 3.8 7.38

Table 2. Data summary for the large-scale simulation
Columns: Gauging station, Mean, Standard deviation
CPB….8.87.4.4
CPB….4.38.8.79
CPB7.3.8 3.8.

Table 3. Mean and standard deviations of simulated and observed daily rainfall data
Columns: Gauging station, Prediction origin, Mean, Standard deviation
CPB…, t 48.4.4.9.4
CPB…, t 48 4.7 8.48. 7.7
CPB…, t 49..83.78.
CPB…, t 48 3.37.7.3.
CPB…, t 48.8 7. 9.3 8.9
CPB…, t 49 8.7 9.8.4 7.
CPB7, t 48.4.9 4.7 3.7
CPB7, t 48.4.8 8..83
CPB7, t 49 3.3 8.9 9.73 8.94

Figure 1. Large-scale simulation using the 2-parameter linear function, from t = 4349 (Day 1) to t = 5078 (Day 730), for (a) CPB…, (b) CPB…, (c) CPB7. Horizontal axis: prediction origin t (day); vertical axis: mean rainfall of 64 days at t (mm).

Figure 2. Frequency distribution of daily rainfall in the Chao Phraya Basin. Horizontal axis: rainfall (mm); vertical axis: number of days. (A. CPB… at origin …48…, B. CPB… at origin …48…, C. CPB… at origin …49…, D. CPB… at origin …48…, E. CPB… at origin …48…, F. CPB… at origin …49…)

REFERENCES

[1] Abarbanel, H. D. I., Analysis of observed chaotic data, New York: Springer-Verlag, (1996).
[2] Crouse, M. S., Nowak, R. D., Baraniuk, R. G., Wavelet-based statistical signal processing using hidden Markov models, IEEE Transactions on Signal Processing, 46 (1998), pp 886-902.
[3] Dempster, A. P., Laird, N. M., Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. Royal Stat. Soc. B. 39 (1977), pp 1-38.
[4] Jayawardena, A. W., Li, W. K., Xu, P., Neighbourhood selection for local modelling and prediction of hydrological time series, J. Hydrol. 258 (2002), pp 40-57.
[5] McLachlan, G. J., Krishnan, T., The EM algorithm and extensions, New York: John Wiley, (1997).
[6] Rabiner, L. R., A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (1989), pp 257-286.
[7] Ronen, O., Rohlicek, J. R., Ostendorf, M., Parameter estimation of dependence tree models using the EM algorithm, IEEE Signal Proc. Lett. 2 (1995), pp 157-159.
[8] Smyth, P., Heckerman, D., Jordan, M. I., Probabilistic independence networks for hidden Markov probability models, Neural Comp. 9 (1997), pp 227-269.