A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers

6th St. Petersburg Workshop on Simulation (2009) 1-3

A Note on Data-Adaptive Bandwidth Selection for Sequential Kernel Smoothers

Ansgar Steland 1

Abstract

Sequential kernel smoothers form a class of procedures covering various known methods for the problem of detecting a change in the mean as special cases. In applications, one often aims at estimation, prediction and detection of changes. We propose to use sequential kernel smoothers and study a sequential cross-validation algorithm to choose the bandwidth parameter, assuming that observations arrive sequentially at equidistant time instants. A uniform weak law of large numbers and a consistency result for the cross-validated bandwidth are discussed.

1 Introduction

Let us assume that observations $Y_n = Y_{T,n}$, $1 \le n \le T$, with $T$ the maximum sample size, arrive sequentially and satisfy the model equation

$$Y_n = m(n/T) + \epsilon_n, \qquad n = 1, 2, \dots, T, \; T \ge 1,$$

for some bounded and piecewise continuous function $m : [0, \infty) \to \mathbb{R}$. The errors $\{\epsilon_n : n \in \mathbb{N}\}$ form a sequence of i.i.d. random variables such that

$$E(\epsilon_n) = 0, \qquad E(\epsilon_1^4) < \infty. \tag{1}$$

Consequently, $m(t)$, $t \in [0, 1]$, models the process mean during the relevant time frame $[0, T]$. In practice, an analysis often has to solve three problems. (i) Estimation of the current process mean. (ii) One-step prediction of the process mean. (iii) Signaling when there is evidence that the process mean differs from an assumed (null) model. Usually, different statistics are used for these problems. To ease interpretation and applicability, we will base the detector on the same statistic used for estimation and prediction. Our reasoning is that a method which fits the data well and has convincing prediction properties should also possess reasonable detection properties for a large class of alternative models. We confine ourselves to closed-end procedures where monitoring stops at a (usually large) time horizon $T$. The proposed kernel smoother is controlled by a bandwidth parameter which determines the degree of smoothing.
1 RWTH Aachen University, E-mail: steland@stochastik.rwth-aachen.de

As is well known,

its selection is crucial, particularly for estimation and prediction accuracy. We propose to select the bandwidth sequentially by minimizing a sequential version of the cross-validation criterion. The topic has been studied quite extensively in the literature under the classic regression estimation framework, where the data get dense as the sample size increases. A comprehensive monograph on the general methodology is [7]. For references to the literature on estimation of regression functions that are smooth except at some discontinuity (change-) points, see the recent work of [2].

Before proceeding, let us discuss our assumptions on $m$. Often the information about the problem of interest is not sufficient to set up a (semi-) parametric model for the process mean $m$ and the distribution of the error terms, which would allow us to use methods based on, e.g., likelihood ratios. In this paper it is only assumed that

$$m \in \mathrm{Lip}, \quad m(t) > 0 \text{ for } t > 0, \quad \text{and} \quad \|m\|_\infty < \infty, \tag{2}$$

where $\mathrm{Lip}$ denotes the class of Lipschitz continuous functions. Under these general conditions, one should use detectors which avoid (semi-) parametric specifications of the shape of $m$, and nonparametric smoothers $\widehat{m}_n$ which estimate some monotone functional of the process mean and are sensitive with respect to changes of the mean. For these reasons, we confine our study to detectors of the form

$$S_T = \inf\{\lfloor s_0 T \rfloor \le t \le T : \widehat{m}_t > c\}.$$

Here $c$ is a threshold (control limit), $s_0 \in (0, 1)$ determines through $\lfloor T s_0 \rfloor$ the start of monitoring, $\lfloor x \rfloor$ denoting the integer part of $x$, and $\{\widehat{m}_n : n \in \mathbb{N}\}$ is a sequence of $\sigma(Y_1, \dots, Y_n)$-measurable statistics. Specifically, we study the following sequential kernel smoother

$$\widehat{m}_n = \widehat{m}_{n,h} = \frac{1}{h} \sum_{i=1}^{n} K([i - n]/h)\, Y_i, \qquad n = 1, 2, \dots,$$

and the associated normed version

$$\widetilde{m}_n = \widetilde{m}_{n,h} = \widehat{m}_{n,h} \Big/ \frac{1}{h} \sum_{i=1}^{n} K([i - n]/h),$$

respectively, which are related to the classic Nadaraya-Watson estimator. It is worth noting that various classic control chart statistics are obtained as special cases.
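As a concrete illustration, the sequential smoother, its normed version, and the stopping rule $S_T$ can be sketched in a few lines of Python. This is a minimal sketch, not part of the paper; the triangular kernel and all function names are illustrative choices.

```python
import numpy as np

def tri(z):
    """Triangular kernel K(z) = (1 - |z|) on [-1, 1], zero elsewhere."""
    return np.clip(1.0 - np.abs(z), 0.0, None)

def sequential_smoother(y, h, kernel=tri, normed=True):
    """m_hat_n = (1/h) * sum_{i<=n} K((i - n)/h) * Y_i for n = 1..T;
    with normed=True the Nadaraya-Watson-type normed version is returned."""
    y = np.asarray(y, dtype=float)
    out = np.empty(len(y))
    for n in range(1, len(y) + 1):
        w = kernel((np.arange(1, n + 1) - n) / h)   # weights K((i - n)/h)
        out[n - 1] = w @ y[:n] / w.sum() if normed else w @ y[:n] / h
    return out

def stopping_time(y, h, c, kernel=tri, s0=0.1):
    """Closed-end detector S_T = inf{ floor(s0*T) <= t <= T : m_t > c };
    returns the 1-based stopping time, or None if no alarm occurs."""
    m = sequential_smoother(y, h, kernel)
    T = len(y)
    for t in range(max(int(np.floor(s0 * T)), 1), T + 1):
        if m[t - 1] > c:
            return t
    return None
```

For instance, on a noise-free series that jumps from 0 to 2 at time 101, `stopping_time(y, h=20, c=1.0)` signals a few steps after the jump, once enough post-change observations carry kernel weight.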
Denoting the target value by $\mu_0$, the CUSUM chart is based on $C_n = \sum_{i=1}^{n} [X_i - (\mu_0 + K)]$, where $\{X_n\}$ denotes the observed process and $K$ is the reference value. This chart corresponds to the choice $K(z) = 1$ if $Y_n = X_n - (\mu_0 + K)$ for all $n$. The EWMA recursion, $\widetilde{m}_n = \lambda Y_n + (1 - \lambda) \widetilde{m}_{n-1}$ with starting value $\widetilde{m}_0 = Y_0$ and smoothing parameter $\lambda \in (0, 1)$, corresponds to the kernel $K(z) = e^z$ and the bandwidth $h = -1/\log(1 - \lambda)$.

Our assumptions on the smoothing kernel are as follows:

$$K \in \mathrm{Lip}(\mathbb{R}; [0, \infty)), \quad \|K\|_\infty < \infty, \quad \mathrm{supp}(K) \subseteq [-1, 1], \quad \text{and} \quad K > 0 \text{ on } (0, 1). \tag{3}$$
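The EWMA correspondence is easy to verify numerically: with $K(z) = e^z$ and $h = -1/\log(1-\lambda)$ the kernel weights become $(1-\lambda)^{n-i}$, so the normed smoother agrees with the EWMA recursion once the weight $(1-\lambda)^n$ attached to the starting value is negligible. A quick sanity check of this claim, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=300)
lam = 0.2
h = -1.0 / np.log(1.0 - lam)      # bandwidth implied by the EWMA parameter

# EWMA recursion m_n = lam * Y_n + (1 - lam) * m_{n-1}, started at m_0 = 0.
m = 0.0
for v in y:
    m = lam * v + (1.0 - lam) * m

# Normed exponential-kernel smoother: weights exp((i - n)/h) = (1 - lam)^(n - i).
n = len(y)
w = np.exp((np.arange(1, n + 1) - n) / h)
smoothed = w @ y / w.sum()

print(abs(smoothed - m))          # essentially zero for n = 300
```

The two numbers differ only by the factor $1 - (1-\lambda)^n$, which is indistinguishable from 1 at this sample size.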

For the bandwidth $h > 0$ we assume that

$$\lim_{T \to \infty} T/h = \xi \tag{4}$$

for some constant $\xi \in (0, \infty)$, which guarantees that in our design the number of observations on which $\widehat{m}_T$ depends converges to $\infty$, as $T \to \infty$. In practice, one can select $\xi$ and put $h = T/\xi$.

In [4, 5, 6] procedures based on the sequential smoother $\widehat{m}_n$ are studied, which allow us to detect changes in the mean of a stationary or random walk series of observations. The asymptotic theory was studied as well. Specifically, in [5] it is shown that under the assumptions of the present paper the process $\{\sqrt{T}\, \widetilde{m}_{\lfloor Ts \rfloor, h} : s \in [0, 1]\}$ satisfies a functional central limit theorem when $m = 0$, i.e., $\sqrt{T}\, \widetilde{m}_{\lfloor Ts \rfloor, h} \Rightarrow M(s)$, for some centered Gaussian process $\{M(s) : s \in [0, 1]\}$ which depends on $\xi$. This result can be used to construct detection procedures with pre-specified statistical properties. E.g., when choosing the control limit such that the type I error rate satisfies $P(S_T \le T) = \alpha$ when $m = 0$ for some given significance level $\alpha \in (0, 1)$, the control limit also depends on $\xi$.

The question arises how one can or should select the bandwidth $h_T$ and the parameter $\xi$, respectively. In this paper we propose to select the bandwidth $h > 0$ such that the $Y_t$ are well approximated by sequential predictions $\widehat{m}_t$ which are calculated from past data $Y_1, \dots, Y_{t-1}$. For that purpose we propose a sequential version of the cross-validation criterion based on sequential leave-one-out estimates.

2 Sequential Cross-Validation

The idea of cross-validation is to choose parameters such that the corresponding estimates provide a good fit on average. To achieve this goal, one may consider the average squared distance between observations, $Y_i$, and predictions as an approximation of the integrated squared distance. To avoid over-fitting and interpolation, the prediction of $Y_i$ is determined using the reduced sample where $Y_i$ is omitted.
Since, additionally, we aim at selecting the bandwidth $h$ to obtain a good fit when using the sequential estimate, we consider

$$\widehat{m}_{h,-i} = N_{T,-i}^{-1} \frac{1}{h} \sum_{j=1}^{i-1} K([j - i]/h)\, Y_j, \qquad i = 2, 3, \dots,$$

with the constant $N_{T,-i} = h^{-1} \sum_{j=1}^{i-1} K([j - i]/h)$. Notice that $\widehat{m}_{h,-i}$ can be regarded as a sequential leave-one-out estimate. The corresponding sequential leave-one-out cross-validation criterion is defined as

$$\mathrm{CV}_s(h) = \frac{1}{T} \sum_{i=2}^{\lfloor Ts \rfloor} (Y_i - \widehat{m}_{h,-i})^2, \qquad h > 0.$$

The cross-validation bandwidth at time $s$ is now obtained by minimizing $\mathrm{CV}_s(h)$ for fixed $s$. Notice that $\mathrm{CV}_s(h)$ is a sequential unweighted version of the criterion studied by [3] in the classic regression function estimation framework. We do not consider a weighted CV sum, since we have in mind that the selected bandwidth is used to obtain a good fit for past and current observations. However, similar results as presented here can be obtained for the weighted criterion $T^{-1} \sum_{i=1}^{n} K([i - n]/h)(Y_i - \widehat{m}_{h,-i})^2$ as well.

Notice that due to

$$\mathrm{CV}_s(h) = \frac{1}{T} \sum_{i=2}^{\lfloor Ts \rfloor} Y_i^2 - \frac{2}{T} \sum_{i=2}^{\lfloor Ts \rfloor} Y_i \widehat{m}_{h,-i} + \frac{1}{T} \sum_{i=2}^{\lfloor Ts \rfloor} \widehat{m}_{h,-i}^2,$$

minimizing $\mathrm{CV}_s(h)$ is equivalent to minimizing

$$C_{T,s}(h) = -\frac{2}{T} \sum_{i=2}^{\lfloor Ts \rfloor} Y_i \widehat{m}_{h,-i} + \frac{1}{T} \sum_{i=2}^{\lfloor Ts \rfloor} \widehat{m}_{h,-i}^2.$$

Thus, we will study $C_{T,s}(h)$ in the sequel. Cross-validation is expensive in terms of computational costs, and minimizing $C_{T,s}$ for all $s$ is not feasible in many cases. Therefore, and to simplify the exposition, let us fix a finite number of time points $0 < s_1 < \dots < s_N$, $N \in \mathbb{N}$. Later we shall relax this assumption and allow $N$ to be an increasing function of $T$. At time $s_i$ the cross-validation criterion is minimized to select the bandwidth, $h_i = h_i(Y_1, \dots, Y_{s_i})$, and that bandwidth is used during the time interval $[s_i, s_{i+1})$, $i = 1, \dots, N$.

3 Asymptotic Results

The question arises which function is estimated by $C_{T,s}(h)$. Our first result identifies the limit and shows convergence in mean.

Theorem 1. We have, as $T \to \infty$, uniformly in $s \in [s_0, 1]$,

$$E(C_{T,s}(h)) \to C_\xi(s) = -2\, \frac{\int_0^s \int_0^r \xi K(\xi(u - r))\, m(\xi u)\, du\, dr}{\int_0^s \xi K(\xi(r - s))\, dr} + \frac{\int_0^s \xi^2 \int_0^r \int_0^r K(\xi(u - r)) K(\xi(v - r))\, m(u) m(v)\, du\, dv\, dr}{\int_0^s \xi K(\xi(r - s))\, dr}. \tag{5}$$

Before proceeding, let us consider an example where the function $C_\xi(s)$ possesses a well-separated minimum.

Example 1. Suppose $K$ is given by $K(z) = (1 - |z|)\, 1_{[0,1]}(z)$ for $z \in \mathbb{R}$. Further, let us consider the nonlinear function $m(x) = x(x - 0.2)(x - 0.4)$. Clearly, $C_\xi(s)$ is a polynomial of order 4 with coefficients which depend on $s$. Figure 1 depicts $\xi \mapsto C_\xi(s)$ for some values of $s$.
The locations of the (real) roots of $\partial C_\xi(s)/\partial \xi$ depend on $s \in [0, 1]$ and are shown in Figure 1 as well.

Figure 1: Left panel: The function $\xi \mapsto C_\xi(s)$, $\xi \in (0, 20]$, for $s \in \{0.1, 0.2, 0.3, 0.4\}$. Right panel: The optimal values for $\xi$ as a function of $s \in (0, 1]$.

We will now study the uniform mean squared convergence of the random function $C_{T,s}(h)$. Define $S_N = \{s_i : 1 \le i \le N\}$.

Theorem 2. We have, as $T \to \infty$,

$$E \sup_{s \in S_N} |C_{T,s}(h) - E(C_{T,s}(h))|^2 = O(T^{-1}).$$

Current research focuses on the following generalization, which allows the number of time points where cross-validation is conducted to be a function of the maximum sample size $T$.

Conjecture 3. Assume $N = N_T$ and

$$0 < s_0 < s_{N,1} < \dots < s_{N,N} \le 1, \qquad N \ge 1, \tag{6}$$

and put $S_N = \{s_{N,i} : 1 \le i \le N\}$. Given the assumptions of Theorem 2, there exists some $\gamma > 0$ with $N_T^\gamma / T = o(1)$ such that

$$E \sup_{s \in S_N} |C_{T,s}(h) - E(C_{T,s}(h))|^2 = o(1)$$

as $T \to \infty$.

Combining the above statements, we obtain

Theorem 4. Suppose that (1) and (6) hold. Then, as $T \to \infty$,

$$E \sup_{s \in S_N} |C_{T,s}(h) - C_\xi(s)|^2 \to 0.$$

We shall now extend the above results to study weak consistency of the cross-validation bandwidth under fairly general and weak assumptions. Having in mind the fact that $h \asymp T$, let us simplify the setting by assuming that

$$h = h(\xi) = T/\xi, \qquad \xi \in [1, \Xi],$$

for some fixed $\Xi \in (1, \infty)$. This means $h$ and $\xi$ are now equivalent parameters for each $T$. We also restrict the optimization to a compact interval, which is not restrictive for applications. Now $\widehat{m}_{h,-i}$ can be written as

$$\widehat{m}_{h,-i} = \frac{1}{(i-1)h} \sum_{j=1}^{i-1} K(\xi(j - i)/T)\, Y_j.$$

With some abuse of notation, let us also write $C_{T,s}(\xi) = C_{T,s}(T/\xi)$.

Theorem 5. For any $s \in [s_0, 1]$, as $T \to \infty$,

$$\sup_{\xi \in [1, \Xi]} |C_{T,s}(\xi) - E\, C_{T,s}(\xi)| = o_P(1), \tag{7}$$

$$\sup_{\xi \in [1, \Xi]} |C_{T,s}(\xi) - C_\xi(s)| = o_P(1). \tag{8}$$

We are now in a position to formulate the following conjecture on the asymptotic behavior of the cross-validated sequential bandwidth selector.

Conjecture 6. Suppose $C_\xi(s)$ possesses a well-separated minimum $\xi^* \in [1, \Xi]$, i.e.,

$$\inf_{\xi \in [1, \Xi] : |\xi - \xi^*| \ge \varepsilon} C_\xi(s) > C_{\xi^*}(s)$$

for every $\varepsilon > 0$. Then

$$\operatorname*{argmin}_{\xi \in [1, \Xi]} C_{T,s}(\xi) \xrightarrow{P} \xi^*.$$

References

[2] Gijbels I., Goderniaux A.C. (2004) Bandwidth selection for changepoint estimation in nonparametric regression. Technometrics, 46 (1), 76-86.

[3] Härdle W., Marron J.S. (1985) Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist., 13, 1465-1481.

[4] Schmid W., Steland A. (2000) Sequential control of non-stationary processes by nonparametric kernel control charts. AStA Adv. Stat. Anal., 84, 315-336.

[5] Steland A. (2004) Sequential control of time series by functionals of kernel-weighted empirical processes under local alternatives. Metrika, 60, 229-249.

[6] Steland A. (2005) Random walks with drift - A sequential approach. J. Time Ser. Anal., 26 (6), 917-942.

[7] Wand M.P., Jones M.C. (1995) Kernel Smoothing. Chapman & Hall, Boca Raton.