PAijpam.eu ADAPTIVE K-S TESTS FOR WHITE NOISE IN THE FREQUENCY DOMAIN Hossein Arsham University of Baltimore Baltimore, MD 21201, USA

Similar documents
covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

HANDBOOK OF APPLICABLE MATHEMATICS

Applied Probability and Stochastic Processes

Financial Econometrics and Quantitative Risk Managenent Return Properties

8.2 Harmonic Regression and the Periodogram

Local Whittle Likelihood Estimators and Tests for non-gaussian Linear Processes

STOCHASTIC MODELING OF MONTHLY RAINFALL AT KOTA REGION

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

1. Introduction. Let the distribution of a non-negative integer-valued random variable X be defined as follows:

Chapter 3: Regression Methods for Trends

arxiv: v1 [stat.me] 2 Mar 2015

On Consistency of Tests for Stationarity in Autoregressive and Moving Average Models of Different Orders

PAijpam.eu A SHORT PROOF THAT NP IS NOT P

Distribution Theory. Comparison Between Two Quantiles: The Normal and Exponential Cases

Detection theory 101 ELEC-E5410 Signal Processing for Communications

Random Number Generation. CS1538: Introduction to simulations

Testing Goodness-of-Fit for Exponential Distribution Based on Cumulative Residual Entropy

TIMES SERIES INTRODUCTION INTRODUCTION. Page 1. A time series is a set of observations made sequentially through time

ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications

RELIABILITY TEST PLANS BASED ON BURR DISTRIBUTION FROM TRUNCATED LIFE TESTS

Chapter 9. Non-Parametric Density Function Estimation

The Spectrum of Broadway: A SAS

Business Statistics. Lecture 10: Correlation and Linear Regression

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

A COMPLETE ASYMPTOTIC SERIES FOR THE AUTOCOVARIANCE FUNCTION OF A LONG MEMORY PROCESS. OFFER LIEBERMAN and PETER C. B. PHILLIPS

INDIAN INSTITUTE OF SCIENCE STOCHASTIC HYDROLOGY. Lecture -27 Course Instructor : Prof. P. P. MUJUMDAR Department of Civil Engg., IISc.

The Goodness-of-fit Test for Gumbel Distribution: A Comparative Study

SF2943: TIME SERIES ANALYSIS COMMENTS ON SPECTRAL DENSITIES

Reliability Theory of Dynamic Loaded Structures (cont.) Calculation of Out-Crossing Frequencies Approximations to the Failure Probability.

Fractal structure in the Chinese yuan/us dollar rate. Abstract. Grande Do Sul, Brazil

Defect Detection using Nonparametric Regression

On the Application of the Generalized Pareto Distribution for Statistical Extrapolation in the Assessment of Dynamic Stability in Irregular Waves

Chapter 9. Non-Parametric Density Function Estimation

6. The econometrics of Financial Markets: Empirical Analysis of Financial Time Series. MA6622, Ernesto Mordecki, CityU, HK, 2006.

Declarative Statistics

Narrowing confidence interval width of PAC learning risk function by algorithmic inference

Time Series: Theory and Methods

Recall the Basics of Hypothesis Testing

High Breakdown Analogs of the Trimmed Mean

Exploring Granger Causality for Time series via Wald Test on Estimated Models with Guaranteed Stability

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Inference for P(Y<X) in Exponentiated Gumbel Distribution

Random Vibrations & Failure Analysis Sayan Gupta Indian Institute of Technology Madras

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

Parameter estimation in epoch folding analysis

Univariate ARIMA Models

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

Residuals in Time Series Models

Remedial Measures, Brown-Forsythe test, F test

ARIMA modeling to forecast area and production of rice in West Bengal

Generalized Method of Moments Estimation

Uniform Random Number Generators

Testing for Regime Switching in Singaporean Business Cycles

Chapter 2 Inference on Mean Residual Life-Overview

A Two Stage Group Acceptance Sampling Plans Based On Truncated Life Tests For Inverse And Generalized Rayleigh Distributions

CPSC 531: Random Numbers. Jonathan Hudson Department of Computer Science University of Calgary

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that

Time Series. Anthony Davison. c

Time Series Analysis -- An Introduction -- AMS 586

Research Note: A more powerful test statistic for reasoning about interference between units

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

If we want to analyze experimental or simulated data we might encounter the following tasks:

Guangzhou, P.R. China

MATH4427 Notebook 4 Fall Semester 2017/2018

Monitoring Random Start Forward Searches for Multivariate Data

Modeling and forecasting global mean temperature time series

Thomas J. Fisher. Research Statement. Preliminary Results

Analysis and Optimization of Discrete Event Systems using Petri Nets

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Solutions to the recurrence relation u n+1 = v n+1 + u n v n in terms of Bell polynomials

However, reliability analysis is not limited to calculation of the probability of failure.

3. Linear Regression With a Single Regressor

Discrete Simulation of Power Law Noise

A3. Statistical Inference

Lawrence D. Brown, T. Tony Cai and Anirban DasGupta

On Forecast Strength of Some Linear and Non Linear Time Series Models for Stationary Data Structure

Algorithms for generating surrogate data for sparsely quantized time series

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS

Inferring the Causal Decomposition under the Presence of Deterministic Relations.

Reliability Theory of Dynamically Loaded Structures (cont.)

Lack-of-fit Tests to Indicate Material Model Improvement or Experimental Data Noise Reduction

distributed approximately according to white noise. Likewise, for general ARMA(p,q), the residuals can be expressed as

Study Ch. 9.3, #47 53 (45 51), 55 61, (55 59)

Goodness-of-fit tests for correlated data

OPTIMIZATION OF FIRST ORDER MODELS

1 Random walks and data

Generic Stochastic Effect Model of Noise in Controlling Source to Electronically Controlled Device

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

Distribution Fitting (Censored Data)

PROBABILITY AND STOCHASTIC PROCESSES A Friendly Introduction for Electrical and Computer Engineers

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis

Journal of Statistical Research 2007, Vol. 41, No. 1, pp Bangladesh

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Journal of Environmental Statistics

Estimation of the long Memory parameter using an Infinite Source Poisson model applied to transmission rate measurements

On the Goodness-of-Fit Tests for Some Continuous Time Processes

Estimation of Parameters of Multiplicative Seasonal Autoregressive Integrated Moving Average Model Using Multiple Regression

IMPROVEMENTS IN MODAL PARAMETER EXTRACTION THROUGH POST-PROCESSING FREQUENCY RESPONSE FUNCTION ESTIMATES

Transcription:

International Journal of Pure and Applied Mathematics Volume 82 No. 4 2013, 521-529 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: http://dx.doi.org/10.12732/ijpam.v82i4.2 PAijpam.eu ADAPTIVE K-S TESTS FOR WHITE NOISE IN THE FREQUENCY DOMAIN Hossein Arsham University of Baltimore Baltimore, MD 21201, USA Abstract: Since the classical Kolmogorov-Smirnov (K-S) tests for white noise are completely insensitive to both tails it is difficult in a particular application to detect departure from whiteness. A modified version of the widely used (K- S) test of null hypothesis is constructed, that a given time series is Gaussian white noise, against the alternative hypothesis that the time series contains an added or multiplicative deterministic periodic component of unspecified frequency. The usual K-S test is treated as a special case. The proposed test is more powerful than the ordinary K-S test in detecting extreme (low or high) hidden periodicities. Computational procedures necessary for implementation are given. AMS Subject Classification: 60H40, 62F25, 60G10 Key Words: Kolmogorov-Smirnov test, frequency analysis, identification of periodic component, goodness-of-fit test for white noise, periodogram, residuals, Spectrum; Stationary process 1. Introduction A systematic approach in statistical treatments of hydrologic time series often includes testing the goodness of fit of a model. This phase of analysis needs to be checked in order to verify whether it complies with certain assumptions about the model and to verify how well it represents the historical time series. One of the model assumptions to be checked is the independence of residual (Gaussian white noise). If this check is not satisfactory, the model should Received: May 9, 2012 c 2013 Academic Publications, Ltd. url: www.acadpubl.eu

522 H. Arsham be changed and the procedure repeated until a satisfactory model is found. Since the periodogram has simpler sampling properties than auto-correlation coefficients, tests based on the periodogram are considered ideal techniques, particularly for detecting the important frequencies (in problems such as water level oscillations and vibration problems). Therefore, any hypothesis about a mathematical model of the series composition resulting from the periodogram analysis of a time series and the inference of the best fitted mathematical model of dependency of the series is equivalent to testing whether the residuals are from an independent series. Given a finite realization from a time series X t, we can represent the n observations by the trigonometric polynomial, see e.g. Wilks [6], where: and X t = a m 0 2 + (a k cosw k t+b k sinw k t), (1) k=1 w k = 2πk/n, k = 0,1,2,...,m, (2) n a k = 2/n X t cosw k t, k = 0,1,2,...,m, (3) b k = 2/n t=1 n X t sinw k t, k = 1,2,...,m, (4) t=1 where m is the smallest integer greater than or equal to (n 1)/2. The periodogram is defined by I n (w k ) = n a2 k +b2 k, k = 1,2,...,m. (5) 2 The graph of I n (w k ) against k presents a highly unstable appearance which makes it difficult for visual inspection. A better method of presenting the information in a periodogram (5) is to plot the cumulative periodogram Y k = k I n (w j ) j=1 m j=1, k = 1,2,3,...,m 1. (6) I n (w j )

ADAPTIVE K-S TESTS FOR WHITE NOISE... 523 This differs from the periodogram itself, in that the highly unstable behavior of I n (w k ) s is, to large extend, ironed out by the process of accumulation. Thus the cumulative periodogram is a useful diagnostic tool. It is well known that if {X t is Gaussian white noise, then Y k random variables will lie close to the 45 0 line on the graph of Y k against k. On the other hand, Y k for a process with excess of low frequency will tend to lie above the line. By way of contrast, a process with an excess of high frequency components will tend to have a cumulative periodogram which lies below the line. Therefore, based on cumulative periodogram(6), we shall consider a new procedure which is used to test the null hypothesis H 0, that the residual {X 1, X 2,...,X n } is a sequence generated by Gaussian white noise, against the alternative hypothesis that the data is generated by a Gaussian white noise sequence with a superimposed deterministic periodic components. The basic assumption that the periodic movements in most hydrologic time series constitute deterministic components has both physical justification and practical implication. The astromonic cycles that produce hydrologic periodicities are constant periods in all hydrologic and water resources problems. Amplitudes and phases of individual harmonics usually change when e.g. water passes through a hydrologic environment, by the system s effects on amplification and attenuation of amplitudes, and the shifts of phases. The practical implication of the assumption of deterministic periodic components in e.g. hydrologic series is basically twofold. the series can be rightfully decomposed into deterministic and stochastic components. the explanation of components by various physical and/or environmental are causative factors in the regional analysis of series. Since the periodic component is usually unknown, we will consider tests for hidden periodicities of unspecified frequency. As mentioned earlier, when residuals {X 1, X 2,...,X n } are independently normally distributed, Y 1, Y 2,...Y m 1 are distributed like order statistics from a sample of m-1 independent observation from Uniform (0, 1). The most widely used tests based on periodograms for hidden periodicities is the Kolmogorov-Smirnov (K-S) test, see e.g. Balakrishnan [2], Brockwell and Davis [2] and Stoica and Moses [5]. One problem with the K-S is that it is too wide in the tails therefore it may not detect any hidden periodicity at low or high frequencies. The next section provides a new test, which is a modification of the ordinary K-S test, to make it more powerful in detecting extreme hidden periodicities than the ordinary K-S test.

524 H. Arsham 2. A Modified K-S Test Clearly, the cumulative distribution function with jumps of size 1/(m-1) at Y k, k = 1,2,..,m 1 is the empirical distribution function of a sample of size (m 1) from a uniform distribution on [0,1]. Therefore, if we plot the Y k on a sample distribution function and apply the K-S test of the hypothesis that it is a sample distribution function for a sample of (m-1) obtained from a uniform [0,1] distribution, we have a test of the hypothesis that the original time series is white noise. This testing procedure has been discussed by e.g. Pollock, et. al. [3]. This procedure is precisely equivalent to plotting the standardized, cumulative periodogram 0, X < 1, C(X) = Y, I X < i+1, 1, X m, for i = 1,2,...,m 1, (7) and rejecting the null hypothesis at the significance level of α if for any X in [1,m], the function C(X) exits from the boundaries Y = X +K α (m 1) 1 2, (8) Y = X K α (m 1) 1 2, (9) wherek α isthecriticalvalueoftheordinaryk-sstatisticsforsampleofsizem-1 and significance level α. The advantage of this test over others e.g. the Fisher s test, is that its graphical presentation is universally appealing. However, one problem with the K-S test is that the acceptance bandwidth generated by (8) and (9) is too wide at the both ends. This weakness is caused because the bandwidth is constant for all X. Therefore, the K-S test may not detect hidden periodicities at either low or high frequencies. In fact the K-S confidence region with 100(1 α)% confidence must be truncated at both ends, i.e. P { [ ] [ ]} max X K α (m 1) 1 2,0 C(X) min X +K α (m 1) 1 2,1 = 1 α. (10) That means we may not be able to detect extreme periodicities by using this test.

ADAPTIVE K-S TESTS FOR WHITE NOISE... 525 Figure 1: A typical usual K-S acceptance region with diagonal E In the next section we suggest a modification of the K-S acceptance region to remedy this problem. We consider the acceptance regions of the following desirable forms: For c > 0 and a 0: P {A L (x,m) C(x) B L (x,m), for all x} = 1 α (11) and where: P {A R (x,m) C(x) B R (x,m), for all x} = 1 α, (12) A L (x,m) = max { [ ] ] } X/ (m 1)+ca(m 1) 1 2 a/ [(m 1) 1 2 +ca,0, (13) B L (x,m) = min { [ ] ] } X/ (m 1) ca(m 1) 1 2 +a/ [(m 1) 1 2 ca,1, (14) A R (x,m) = max { [ ] [ ] } X/ (m 1) ca(m 1) 1 2 a/ (m 1) 1 2 ca,0, (15) B R (x,m)

526 H. Arsham { [ ] [ ] } = min X/ (m 1)+ca(m 1) 1 2 +a/ (m 1) 1 2 +ca,1. (16) These particular shapes of acceptance regions were dictated by their analytical tractability and usefulness. Note that region (11) is narrower in the left (this helps to detect hidden low periodicities) and (12) narrower in the right (this helps to detect hidden high periodicities). The factor c shapes the acceptance region. Larger values of c imply a region with smaller bandwidth one side (right or left). For c =0, the modified regions (11) and (12) reduce to the usual K-S acceptance region (10). Figure 2: Desirable acceptance region sensitive to low hidden periodicities with diagonal E Figure 3: Desirable acceptance region sensitive to high hidden periodicities with diagonal E

ADAPTIVE K-S TESTS FOR WHITE NOISE... 527 In terms of U m 1, by an appropriate re-scaling the acceptance region (11) can be expressed as { { [ } P max X 1 ac(m 1) 1 2 ] a(m 1) 1 2,0 U m 1 { [ ] }} min X 1+ac(m 1) 1 2 +a(m 1) 1 2,1 = 1 α. (17) Because [1 U m 1 (1 x)] and [1 U m 1 (x)] are identical. (12) can be written as: { { [ ] } P max X 1 ac(m 1) 1 2 a(1+c)(m 1) 1 2,0 U m 1 { [ }} min X 1+ac(m 1) 1 2 ]+a(1+c)(m 1) 1 2,1 = 1 α. (18) The confidence levels of regions (17) and (18) are related to the crossing probabilities of U m 1 ( ), a path of the empirical cdf of m 1 independent uniform [0,1] random variables. There are a few algorithms available to compute α using (17). For a complete discussion of these algorithms see Shorack and Wellner [4], Chapter 9. For example in Steck s algorithm, let } G 1 (i) = min {(i 1+a(m 1) 1 1 2 )/(m 1 ac(m 1) 2 ),1, { } G 2 (i) = max (i a(m 1) 1 2)/(m 1+ac(m 1) 1 2),0, i = 0,1,2,...,m 2, (19) i = 1,2,...,m 1. (20) Then where, 1 α = (m 1)det [G 1(i) G 2 (j)] j i+1 +, (21) (j i+1)! [G 1 (i) G 2 (j)] + = { G 1 (i) G 2 (j), if G 1 (i) G 2 (j), 0, otherwise, and it is understood that all elements of the matrix having subscripts i > j+1 are zero. This algorithm performs well for small size sample. For example for m = 31, c = 1, and a = 0.8446 we get α = 0.1, while for c = 0, i.e. for

528 H. Arsham the ordinary K-S version to have the same significance level, we need to set a = 1.1916. The asymptotic critical values can be computed by using the theory of weak convergence. Theorem. For any a > 0 and c 0 as m, 1 α = 1+2 ( 1) k exp( 2k 2 a 2 (1+c)). (23) k=1 Proof. Proof follows by the weak convergency, see e.g. Shorack and Wellner [4]. As a numerical example, at α = 0.1 level for c = 0, and c = 1, we obtain a = 1.2239, 0.8654, respectively. Usually, the construction of the acceptance region of form (11) and (12) involves calculation of critical value a, given parameters m, α and a specified value of c. The choice for numerical values for these parameters naturally depends on the proposed application. One may, however, rely on actual observations in deciding what parameter is suitable. In practice one constructs several different confidence bounds with different values for parameters (a, and c) using (21), and then selects the one which looks good overall. This procedure is subjective. 3. Concluding Remarks It is often desirable to supplement informal inference from a periodogram with a formal test to compare the periodogram with the theoretical white noise. The ordinary Kolmogorov-Smirinov (K-S) test represents an attempt to uncover any hidden periodicity. One problem with this test is that it may not recognize any hidden periodicity at the extreme frequencies. We modified the ordinary K-S test to be more sensitive to either low or high frequencies. A factor c shapes the confidence regions. Larger values of c imply a region with smaller bandwidth in one tail. For c=0 these regions reduce to the ordinary K-S confidence region. These bounds are important, as they allow a more realistic way of selecting the desired confidence bounds. Recognizing that analysts are concerned with the risk implication of their decision alternatives, it is suggested that the problem of selecting the parameters of a desirable shape could be based on a

ADAPTIVE K-S TESTS FOR WHITE NOISE... 529 measurable criterion to minimize the risk factor. In certain circumstances a one-sided test may be appropriate. For example, if the alternative hypothesis is that there is an excess of low frequency, only the upper line is relevant. Since an excess of low frequency corresponds to positive serial correlation, such an alternative will often be very reasonable. For the one-sided version of the proposed modified K-S confidence regions, see Arsham [1]. Pseudo codes version of all the procedures presented in this study are available from the author upon request. Acknowledgements I am grateful to the reviewers useful comments on an earlier version. The National Science Foundation Grant CCR-9505732 supports this work. References [1] H. Arsham, Confidence regions having different shapes for the failure distribution function, Microelectronics and Reliability, 36 (1996), 1439-1457, doi: 10.1016/0026-2714(95)00206-5. [2] N. Balakrishnan, Methods and Applications of Statistics in Business, Finance, and Management Science, Wiley, New York (2010). [3] P. Brockwell, R. Davis,Time Series: Theory and Methods, Springer-Verlag (2009). [4] D. Pollock, R. Green, T. Nguyen, Handbook of Time Series Analysis, Signal Processing, and Dynamics, Academic Press, New York (1999). [5] G. Shorack, J. Wellner, Empirical Processes With Applications to Statistics, SIAM, Philadelphia (2009). [6] P. Stoica, and R. Moses, Spectral Analysis of Signals, Prentice Hall, New York (2005). [7] D. Wilks, Statistical Methods is the Atmospheric Sciences, Academic Press, New York (2011).

530