Robust Nonparametric Smoothing of Nonstationary Time-Series


JSCS # Revised

Robust Nonparametric Smoothing of Nonstationary Time-Series

Carlo Grillenzoni
University IUAV of Venice, 335 Venice, ITALY (carlog@iuav.it)

Abstract. Motivated by the need to extract local trends and low-frequency components in nonstationary time series, this paper discusses methods of robust nonparametric smoothing. The basic approach is the combination of parametric M-estimation with kernel and local polynomial regression (LPR) methods. The result is an iterative estimator which retains a linear structure, but has kernel weights also in the direction of the prediction errors. The design of the smoothing coefficients is carried out with robust cross-validation (RCV) criteria and rules of thumb. The method works well both to remove the influence of patches of outliers and to detect local breaks and persistent structural change in time series.

Key Words. Additive outliers, Cross-validation, Financial data, Kernel regression, Local polynomial, M-estimation, Structural changes.

Acknowledgments: The revision has greatly benefited from the remarks of a reviewer.

1. Introduction

Nonstationary time-series are usually characterized by complex and chaotic patterns, due to the large number of causes that intervene in the real world. Economic and environmental phenomena are the fields where such data are commonly encountered. An example is provided in Figure 1, which deals with the gold sector of the USA stock market in the period January 1990 - June 2006. More precisely, it exhibits the daily volume of transactions of the stocks which compose the AUX Index (the Philadelphia Gold & Silver Sector Industries). As one can see, the pattern of the series is rather irregular and is subject to the influence of anomalous observations, heteroscedasticity, jumps and changes of regime. For such series, there is a large variety of smoothing methods to extract trend-cycle components which represent the fundamental tendencies of the phenomenon. In the parametric context, traditional decomposition techniques have been developed through state-space models, which are estimated with the Kalman filter and its extensions (e.g. Harvey and Trimbur, 2003). However, this approach involves problems of parametric identifiability and convergence of the estimates, which urge the resort to Bayesian solutions. More recently, repeated medians, trimmed regression and other robust filters have been revisited by Gather et al. (2006), providing effective methods of signal extraction in series with jumps and outliers. The comparison with nonparametric regression will be discussed in this paper.

Figure 1. Daily volume of transactions in the AUX index (Philadelphia gold and silver sector industries) in the period January 1990 - June 2006.

Nonparametric smoothers are suitable for signal extraction in messy data, because they are flexible and reduce mathematical assumptions to a minimum. One of the most general approaches is local polynomial regression (LPR, e.g. Fan and Gijbels, 1996), because it includes the kernel methods of Nadaraya-Watson and Priestley-Chao as particular cases. The main advantage of LPR is its built-in automatic boundary correction, namely its insensitivity to border conditions. This makes it possible to obtain reliable estimates of the regression function also at the beginning and at the end of the data-set; for time series, this is a necessary condition to perform forecasts. Another important feature is the possibility to express such smoothers as weighted least squares (WLS) estimators; this simplifies both numerical implementation and statistical analysis. Local properties of nonparametric estimators can be enhanced by robust statistics. Robust LPR smoothers were developed by Cleveland (1979) in a heuristic way, and by Härdle and Gasser (1984) and Fan et al. (1994) following the M-estimation approach of Huber (1981). Subsequently, Assaid and Birch (2000) and Beran et al. (2002) have discussed bandwidth selection based on modified cross-validation and plug-in, and Rue et al. (2002) and Hwang (2004) have dealt with specific issues of edge-preserving. This feature is suitable for signal denoising and change-point problems, and is allowed by the adaptive properties of robust smoothers. More precisely, the observations which are placed near a jump-point yield anomalous residuals and, therefore, they tend to be ignored by the estimators. However, the kernels of such smoothers act as threshold functions, so that discontinuities in the estimated regression surface are generated. In this paper, we derive a robust LPR algorithm based on the weighted-average form of M-type estimates (see Hampel et al., 1986, p. 5).
Using the weighting approach to robustness seems natural for nonparametric smoothers, because they are usually expressed as weighted least squares. The resulting algorithm utilizes simple kernels in place of the typical ψ-functions, and this makes its nature totally nonparametric. Its structure also lends itself to robust smoothers in recursive form, which are useful for real-time processing of data (e.g. Grillenzoni, 1997).
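The relationship between a loss ρ, its bounded score ψ and the kernel-like weight ω = ψ/ε just mentioned can be sketched numerically. The following minimal example is not the author's code; the tuning constants are the standard illustrative ones.

```python
import numpy as np

# Sketch of the loss/score/weight relations underlying M-type smoothing:
# psi(e) = d rho(e)/d e is bounded, and omega(e) = psi(e)/e acts as a kernel.

def huber_rho(e, lam=1.345):
    """Huber loss: quadratic in the center, linear in the tails."""
    a = np.abs(e)
    return np.where(a <= lam, 0.5 * e ** 2, lam * a - 0.5 * lam ** 2)

def huber_psi(e, lam=1.345):
    """Score of the Huber loss: clipped identity, hence uniformly bounded."""
    return np.clip(e, -lam, lam)

def gauss_omega(e, lam=1.0):
    """Weight implied by a smooth bounded loss: omega(e) = L(e/lam)/lam,
    with L the standard Gaussian density (redescending psi)."""
    return np.exp(-0.5 * (e / lam) ** 2) / (np.sqrt(2.0 * np.pi) * lam)

eps = np.linspace(-5.0, 5.0, 11)
omega_huber = huber_psi(eps) / np.where(eps == 0.0, 1.0, eps)
```

Large residuals receive a bounded score under the Huber loss and an essentially null weight under the Gaussian kernel, which is the mechanism exploited by the smoothers below.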

The second part of the paper deals with bandwidth selection from the perspective of robust cross-validation (RCV, e.g. Wang and Scott, 1994). This approach is MSE-optimal and can design even the coefficients which tune the degree of robustness. Extended simulation experiments, as those of Fried et al. (2006) and Bianco and Boente (2006), will show the validity of the proposed solutions.

2. Models and Smoothers

Given a nonstationary time series {y_i}, we consider an additive model based on three fundamental components: trend, stationary and random:

y_i = g(t_i) + x_{t_i} + u_i,   i = 1, ..., n   (1)
g(t_i) = ϕ(t_i) + Σ_{j=1..d} δ_j I(t_i > t_j)
x_{t_i} = Σ_{k=1..q} φ_k x_{t_{i−k}} + e_{t_i},   e_{t_i} ~ IN(0, σ_e²)
u_i ~ (1 − α) IID(0, σ₁²) + α IID(0, σ₂²)

In the fixed design case, the index t_i = i; otherwise, for random sampling or missing observations, {t_i} is a random sequence in the interval (0, n]. The trend-cycle is represented by a function g(·) which has a continuous component ϕ(·) and is affected by discontinuities at fixed but unknown locations t_j, j = 1, ..., d; the size of the jumps is finite, |δ_j| < ∞. The stationary component {x_t} has a stable autoregressive (AR) representation of order q, and Gaussian disturbances. The innovation sequence {u_i} is contaminated by outliers, with α > 0 and σ₂² ≫ σ₁². The inclusion of the piecewise constant component υ_i = Σ_j δ_j I(t_i > t_j) in the random term u_i shows the close relationship between jumps and outliers. In a certain sense, it is analogous to that between innovation and additive outliers in dynamic models (see Maronna et al., 2006, p. 95). In general, the observed process {y_t} is doubly nonstationary, because it has time-varying moments and a persistent autocorrelation function (long memory). Unlike the semi-parametric system of Bianco and Boente (2006), consistent estimation of g(t) can only be conceived in a closed interval, i.e. by replacing the time index with τ_i = t_i/t_n. This approach can exploit

the inferential framework of standard nonparametric regression (e.g. Härdle et al.), but is not realistic in time series analysis. Referring to the composite innovation ε_i = (x_{t_i} + u_i) = (y_i − g(t_i)) and following Fan and Gijbels (1996), the M-type LPR estimator is defined as

ĝ_M(t) = arg min_g { (1/nκ) Σ_{i=1..n} K((t_i − t)/κ) ρ( y_i − g − Σ_{k=1..p} β_k (t_i − t)^k ) }   (2)

where ρ(·) is a convex function that controls the influence of the observations (y_i, t_i), t ∈ (0, n] is a continuous variable, K(·) is a symmetric density function and κ > 0 is a smoothing coefficient. When ρ(·) = (·)² the estimator (2) coincides with the LPR smoother; in this case, if p = 0 one has the classical kernel regression and for p = 1 one has the local linear regression (LLR). Only for p > 0 is the automatic boundary correction property guaranteed by the auxiliary coefficients β_k. This allows reliable estimates of g(t) on the borders, namely at t = t_1, t_n, which is fundamental in forecasting. The choice of the ρ-function in (2) concerns the M-estimation theory of robust statistics. Huber (1981) states that ρ(·) should be unbounded and must achieve its maximum value asymptotically, because outlying observations may contain useful information. On the contrary, Hampel et al. (1986) claim that it should be strictly bounded, because outliers are usually extraneous to the models. The two approaches have opposite consequences on the properties of convergence and adaptivity of the estimates and, in nonparametric smoothing, they have been applied to outlier removal and edge preserving respectively. Following the Huber and Hampel philosophies, the most common unbounded (a,b) and bounded (c,d) loss functions are given by

a) ρ_a(ε) = |ε|
b) ρ_b(ε) = { ε²/2, |ε| ≤ λ ;  λ|ε| − λ²/2, |ε| > λ }
c) ρ_c(ε) = { ε²/2, |ε| ≤ λ ;  λ²/2, |ε| > λ }
d) ρ_d(ε) = [ L(0) − L(ε/λ) ]/λ   (3)

where L(·) is a unimodal density function (i.e. it is similar to K), and λ > 0 is a design constant. The common feature of the above criteria is that their score function ψ(ε) = ∂ρ(ε)/∂ε is uniformly bounded; this is the general condition of robustness. The loss function (3,a) is the simplest and was stressed by Wang and Scott (1994); (3,b) is the one preferred by Huber, and has a monotone ψ-function. Finally, (3,c) corresponds to the trimmed method and (3,d) is a smooth solution which provides redescending ψ-functions (e.g. Hampel et al., 1986). A graphical display of all of these functions is given in Figure 2.

Figure 2. Display of the loss functions (3) with L(·) Gaussian and λ = 1, together with the related score functions ψ(ε) = ∂ρ(ε)/∂ε and weight functions ω(ε) = ψ(ε)/ε.

Letting β = [g, β₁ ... β_p]′ be the vector of parameters to be estimated in (2) at every point t, the minimization of the objective function is carried out by iterative algorithms, such as steepest descent. At the k-th iteration one has

β̂_M^{(k+1)}(t) = β̂_M^{(k)}(t) + Σ_{i=1..n} K_i(t) t_i ψ[ y_i − t_i′ β̂_M^{(k)}(t) ]   (4)

where K_i(t) are the kernel weights in (2), the vector t_i = [1, (t_i − t), ..., (t_i − t)^p]′ and ψ(·) = ρ′(·). The initial value for (4) can be obtained from the simple (linear) LPR estimation, namely β̂_M^{(0)}(t) = β̂_L(t). The choice of the initial values in M-

estimators with non-monotone ψ-functions is a delicate issue, because it may affect the convergence to the global optimum point (e.g. Chu et al., 1998). However, the initial value problem for (4) is very complex (it involves p × n values) and the choice of the linear smoother has few alternatives. An alternative approach to (4), which leads to a quasi-linear solution for (2), can be obtained from the weighted-average form of M-estimates introduced by Tukey (see Hampel et al., 1986, p. 5). Applying the residual weight function ω(ε) = ψ(ε)/ε to the rescaled normal equations obtainable from (2), one has

Σ_{i=1..n} K_i(t) ψ[ (y_i − t_i′β)/σ_ε ] t_i = 0
Σ_{i=1..n} K_i(t) ω[ (y_i − t_i′β)/σ_ε ] (y_i − t_i′β) t_i = 0
Σ_{i=1..n} K_i(t) ω(ε_i/σ_ε) t_i y_i = Σ_{i=1..n} K_i(t) ω(ε_i/σ_ε) t_i t_i′ β   (5)

Notice from (3) and Figure 2 that the weights ω(·) have the same structure as kernel functions; in particular, in the case (3,d) one has ω(·) ∝ L(·). Thus, defining

W_i(t, ε_i) = (1/κ) K( (t_i − t)/κ ) (1/λ) L( (y_i − t_i′β)/λ )   (6)

and solving the system (5) for β, in iterative form, one can obtain the estimator

β̂_M^{(k+1)}(t) = [ Σ_{i=1..n} W_i(t, ε̂_i^{(k)}(t)) t_i t_i′ ]⁻¹ Σ_{i=1..n} W_i(t, ε̂_i^{(k)}(t)) t_i y_i   (7)

where ε̂_i^{(k)}(t) = [ y_i − t_i′ β̂_M^{(k)}(t) ] is the i-th residual evaluated at the point t. The estimate of g(t) is provided by the first element of the vector (7). To outline its structure, we consider the simple case p = 0; by using the notation L_λ(·) = L(·/λ)/λ, the estimator becomes

ĝ_M^{(k+1)}(t | p = 0) = Σ_{i=1..n} K_κ(t_i − t) L_λ[ y_i − ĝ_M^{(k)}(t) ] y_i / Σ_{i=1..n} K_κ(t_i − t) L_λ[ y_i − ĝ_M^{(k)}(t) ]   (8)

This looks like a kernel regression, but it is iterative and also weights data in the direction of the variable y_i; it is this weighting that enables robustness. This estimator has been derived under the assumption of L Gaussian; however, it can be adapted

to other kinds of ω-functions. For example, in the trimmed case (3,c) the kernel L is replaced by the indicator I(|y_i − ĝ_M^{(k)}(t)| < λ) (see Figure 2). The quasi-linear algorithms (7) and (8) allow important computational advantages because they avoid direct minimization. They are useful in the selection of κ, λ through methods which also involve an optimization problem; for example, (7)-(8) can be included in the cross-validation framework without yielding a double simultaneous minimization. Another advantage arises for regular time series (t_i = i) in medicine and engineering, where real-time processing of data is required. The on-line version of an iterative smoother ĝ_n^{(k)}(t) can be obtained by equating the number of iterations and the number of processed data (k = n) and then applying recursive least-squares transformations (see Grillenzoni, 1997). The robust mechanism underlying (7) has some connection with those developed by Cleveland (1979), Fan et al. (1994) and Assaid and Birch (2000). These authors focused on the Huber ψ-function ψ_H(ε) = min[λ, max(−λ, ε)], and differ mainly in the weights W_i(t, ε). In Cleveland (1979) the first iteration is as follows: the LPR residuals ε̂_i are rescaled by means of a robust estimate σ̃_ε; next, they are transformed into weights by means of ψ_H and are multiplied by tricube weights K_i based on a nearest-neighbor bandwidth. Similarly, Fan et al. (1994) use the weights K_i ψ_H(ε̂_i*)/ε̂_i*, where the star denotes rescaling, and Assaid and Birch (2000) focused on K_i Gaussian, designed with a modified cross-validation criterion. Instead, the algorithms (7)-(8) use kernel functions even for weighting the residuals, so that the robustness mechanism is similar to that of local weighting. Some remarks about the statistical properties are now in order. It can be shown that robust estimators in weighted-average form have the same influence function and asymptotic variance as M-estimates (e.g. Hampel et al., 1986, p. 6).
If the kernel weights K_i, L_i are non-negative, then the same property can be extended to robust smoothers, and one can conclude that (2) and (7) are equivalent. As regards the relationship between (7) and (8), it is important to recall that the LPR smoother of order p = 1 has the same asymptotic properties as that of order p = 0 (see Fan and Gijbels, 1996); hence, (7) and (8) are asymptotically equivalent for p ≤ 1.
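A minimal numerical sketch of the p = 0 algorithm (8) may help; it is not the author's implementation, and the bandwidths and toy data are illustrative. The fit is initialized with the linear kernel estimate and then iterates the residual weighting.

```python
import numpy as np

def gauss(u):
    # Standard Gaussian density, used here for both K and L.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def robust_kernel_smoother(t, y, kappa, lam, n_iter=10):
    """Iterative robust kernel regression in the form of eq. (8), p = 0:
    weights act both in time (K, bandwidth kappa) and in the direction of
    the residuals (L, robustness coefficient lambda)."""
    t = np.asarray(t, float)
    y = np.asarray(y, float)
    # initial value: linear (non-robust) Nadaraya-Watson estimate
    g = np.array([np.average(y, weights=gauss((t - s) / kappa)) for s in t])
    for _ in range(n_iter):
        g_new = np.empty_like(g)
        for j, s in enumerate(t):
            w = gauss((t - s) / kappa) * gauss((y - g[j]) / lam)
            g_new[j] = np.sum(w * y) / np.sum(w)
        g = g_new
    return g

# toy check: a single large outlier barely moves the robust fit
t = np.arange(50, dtype=float)
y = 0.05 * t
y[25] += 20.0
g_hat = robust_kernel_smoother(t, y, kappa=3.0, lam=1.0)
```

After a few iterations the contaminated observation receives an essentially null residual weight, so the fitted curve stays close to the underlying linear trend.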

The convergence of the estimator (4) to the globally optimal solution is guaranteed only if ψ(·) is monotone, as in the Huber ρ-function; in the other cases, multiple local solutions are possible. However, redescending ψ-functions have better adaptive properties near the jump-points. The analysis of redescending smoothers has been developed by Chu et al. (1998), Rue et al. (2002) and Hillebrand and Müller (2006). Under regularity conditions, they have shown that the asymptotic MSE of the estimator (2) with p ≤ 1, in a neighborhood of a jump-point t₀, is given by

E{ [ ĝ_M(t) − g(t) ]² } ≈ (π δ)² + π(1 − π) δ²,   t ∈ (t₀ ± κ)   (9,a)
E{ [ ĝ_M(t) − g(t) ]² } ≈ C₁ κ⁴ + C₂/(n κ λ³),   t elsewhere   (9,b)

where δ > 0 is the jump-size and π = ∫_{δ/2}^{∞} f(ε) dε. The constants C_j are given by

C₁ = (1/4) μ₂(K)² g″(t)²,   C₂ = σ_ε² S(K) S(L′)/f_ε(0)²

where μ₂ is the second moment and S is the squared norm. In equation (9,a) the MSE does not vanish asymptotically and, therefore, the M-smoother is not consistent at the jump point t₀. However, the formula of π shows that if f(ε) has a bounded support, with range less than δ, then consistency may hold (see also Hillebrand and Müller, 2006). The formula (9,b) holds for g(·) continuous, and confirms the well-known conditions of consistency κ → 0, nκ → ∞; instead, the robustness coefficient may remain fixed, λ > 0. This fact is natural in robust statistics (e.g. Maronna et al., 2006), where λ has the role of a scale parameter, but it is not well recognized in the analysis of M-smoothers, where it is considered a bandwidth, and hence it is assumed λ → 0 (e.g. Hwang, 2004). We now apply linear and robust LPR to the series in Figure 1. We focus on the period April 2003 - March 2005 (namely t_i = 3-36 on the abscissa of Figure 1), because it is particularly contaminated by outliers. All smoothers were implemented with K, L Gaussian, and the degree p was selected on the basis of stability; the choice was p = 1, because for p = 2 the pattern of ĝ_L(t) was more unstable.
The bandwidths κ, λ were selected with quadratic and absolute cross-validation (CV); the results in the linear (L) and the robust (M) case were:

CV₂ : κ̂_L = 4.7; κ̂_M = .8, λ̂_M = 6.

CV₁ : κ̂_L = 6.9; κ̂_M = 5.3, λ̂_M = .   (10)

We used the smallest ones for generating the estimates of g(t) in Figure 3. It can be seen that the robust estimates are much less sensitive to extreme values and provide a more convincing path of the trend. This result is not obvious because, in the selected period, the difference between outlying and normal data is not clear-cut.

Figure 3. Results of smoothing the AUX series in the period April 2003 - March 2005 with LPR(p=1): (a) Linear (simple smoothing) estimator with κ̂ = 4.7; (b) Robust smoothing with the estimator (7), κ̂ = 5.3 and λ̂ as selected in (10). In both cases, K, L are Gaussian kernels.

As a final exercise, we check the forecasting performance in the interval t = 450-500 of Figure 3. The one-step-ahead predictor for the system (1) with order q = 1 is ŷ_{t+1} = ĝ_M(t) + .3 x̂_t, where the latter coefficient comes from a robust estimation. The performance of this scheme is compared with that of an AR(2) model, ŷ_{t+1} = .64 y_t + .35 y_{t−1}, also estimated with the trimmed loss function (3,c). The out-of-sample MSEs of the forecasts of the two models were similar (.96, .9). This means that smoothing methods may also be useful in forecasting, although their primary usage is in trend extraction.

3. Bandwidth Selection

Bandwidth selection is a fundamental problem in the design of nonparametric smoothers. The ideal approach is plug-in, because it analyzes the theoretical MSE of the smoother, derives the expression of the optimal bandwidth, and then plugs in estimates of the unknown quantities, in particular of the second derivatives. However, this approach is analytically and computationally demanding and, in the case of discontinuous functions and complex coefficients, it cannot be applied. As an example, in the expression (9) the function g is not differentiable at t = t₀, and the minimization of (9,b) with respect to λ provides the non-robust solution λ → ∞. Under regularity conditions, the cross-validation (CV) method provides asymptotically equivalent results, but it is simpler because it works on the prediction errors ε̂_i = [ y_i − ĝ_{−i}(t_i) ]. In this section, two issues, related to the effect of outliers on CV and to the estimation of the coefficient λ, are discussed. Classical CV selects κ, λ by minimizing Σ_i ε̂_i²; therefore, it is very sensitive to outliers and jumps, even when a robust smoother is used to fit the data. Following Wang and Scott (1994) and Leung et al. (1993, 2005), the solution is provided by robust cross-validation (RCV), which considers the loss function

P_n(κ, λ) = (1/n) Σ_{i=1..n} ℓ[ y_i − ĝ_{M,−i}^{(k)}(t_i) ]   (11)

where ℓ[·] may be one of the ρ-functions in (3) and ĝ_{M,−i}(·) are the estimates (7)-(8) obtained by omitting the i-th observation (y_i, t_i). The side-effect of (11) is that the ℓ-error optimality of the estimates is not demonstrated, because the expressions of κ, λ which minimize the mean integrated robust error E{ ∫ ℓ[ ĝ_M(t) − g(t) ] dt } are unknown (e.g. Boente et al., 1997). However, under regularity conditions the RCV approach is optimal for κ.

Proposition 1. Assume that the model (1) has a continuous response function (i.e. g ≡ ϕ) and that the coefficients of the M-smoothers (2), (7) with p ≤ 1 are designed with the robust CV criterion (11), with ℓ′ bounded.
Then, the bandwidth κ̂_RCV is asymptotically MSE-optimal and is independent of the function ℓ[·].
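The robust CV criterion above can be illustrated with a short sketch: a leave-one-out score with the absolute loss ℓ = |·| is minimized over a small (κ, λ) grid. The inner smoother is a simplified iterative robust kernel fit (p = 0); grid values, data and seed are illustrative, not from the paper.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def loo_fit(t, y, i, kappa, lam, n_iter=5):
    """Robust kernel estimate at t[i], omitting the i-th observation."""
    mask = np.arange(len(t)) != i
    ts, ys = t[mask], y[mask]
    g = np.average(ys, weights=gauss((ts - t[i]) / kappa))
    for _ in range(n_iter):                 # iterate the residual weighting
        w = gauss((ts - t[i]) / kappa) * gauss((ys - g) / lam)
        g = np.sum(w * ys) / np.sum(w)
    return g

def rcv_score(t, y, kappa, lam):
    """Leave-one-out robust CV score with the absolute loss."""
    errs = [abs(y[i] - loo_fit(t, y, i, kappa, lam)) for i in range(len(t))]
    return float(np.mean(errs))

rng = np.random.default_rng(0)
t = np.arange(60, dtype=float)
y = np.sin(t / 10) + 0.1 * rng.standard_normal(60)
y[30] += 8.0                                # one additive outlier
best = min(((k, l) for k in (2.0, 4.0, 8.0) for l in (0.5, 1.0, 2.0)),
           key=lambda kl: rcv_score(t, y, *kl))
```

Since the score uses the absolute loss, the single large prediction error produced by the outlier cannot dominate the criterion, in line with the discussion above.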

Proof. Under the usual regularity conditions of nonparametric regression, Leung (2005) has shown that κ̂_RCV is asymptotically equivalent to the bandwidth which minimizes the average squared error (ASE) and its expectation

κ̂_ASE = arg min_κ E{ (1/n) Σ_{i=1..n} [ ĝ_M(t_i) − g(t_i) ]² }

Asymptotically, the expected ASE is equivalent to the mean integrated squared error (MISE), whose expression was derived by Härdle and Gasser (1984):

AMISE[ ĝ_M ] = (1/4) κ⁴ μ₂(K)² S(g″) + (nκ)⁻¹ S(K) E[ρ′(ε)²]/E[ρ″(ε)]²   (12)

where S(·) is the squared integral and μ₂ is the second moment. Notice that (12) is just the combination of components of simple kernel regression and parametric M-estimation (e.g. Huber, 1981), and it is minimized by the expression

κ_opt = { S(K) E[ρ′(ε)²] / ( μ₂(K)² S(g″) n E[ρ″(ε)]² ) }^{1/5}

Summarizing, we have κ̂_RCV ≈ κ̂_ASE ≈ κ_opt, and the remarkable fact is that κ_opt is independent of the loss function ℓ[·] used in (11). This result extends to robust CV the asymptotic optimality of the quadratic CV (e.g. Härdle et al., p. 4).

Unlike the ρ-function of the smoothers (which must be bounded in the presence of jumps), there is relative indifference in the choice of the ℓ-function of (11). Indeed, κ_opt is independent of it, and the sole necessary requirement is that ℓ′ is bounded. The simplest design is the absolute loss (3,a), because it does not involve additional robustness coefficients such as λ_ℓ (say). With the other functions (3,b-d), such a coefficient would raise a circular problem with respect to the estimation of λ through (11). Finally, Wang and Scott (1994) have shown that ℓ = |·| has sufficient robustness properties for CV problems. Proposition 1 only holds for κ, and in the nonparametric literature very little has been said about λ. In robust smoothers implemented with the Huber ρ-function, that coefficient is usually established a priori (e.g. Leung, 2005); on the other hand, the authors who tried to estimate it with a quadratic CV found that λ̂_CV → ∞ (see Hall and Jones, 1990, p. 77).
This drawback need not occur in smoothers with

bounded ρ-function or designed with RCV criteria (in fact, the estimates in (10) converged); however, an alternative solution may be considered. In robust statistics, λ has the role of scale and is selected according to the distribution of the outliers. If this is unknown, λ should just provide a compromise between efficiency and robustness of the estimates (e.g. Maronna et al., 2006, p. 65). Indeed, robustness is inversely proportional to λ, but the efficiency (in the absence of outliers) is directly proportional to it. Now, setting λ = Cσ_ε, with C > 0, it can be shown that M-estimates of the location parameter of a Gaussian model maintain 95% relative efficiency with respect to least squares if 1 < C < 5. Specifically, for the Huber loss one has C_H = 1.345, whereas for the Tukey loss C_T = 4.685. This solution can be extended to M-smoothers with bounded ρ-function.

Proposition 2. Assume that model (1) has a continuous response function (i.e. g ≡ ϕ) and Gaussian disturbances ε_i. Then, the M-smoother (7) with p ≤ 1 and L Gaussian maintains 95% asymptotic relative efficiency (ARE) with respect to the simple LPR estimator if λ = 2.11 σ_ε.

Proof. See the Appendix.

As regards the data in Figure 3(a), the sample standard deviation was σ̂_ε = .5; hence λ = 2.11 σ̂_ε, which virtually coincides with the RCV value in (10). On the other hand, robust estimates of the scale, such as the median absolute deviation (MAD) σ̂_M = med_j{ |ε̂_j − med_i(ε̂_i)| }/.6745, provided a smaller value, which tends to underestimate λ. This means that, in resorting to the solution of Proposition 2, one must pay attention to the choice of the scale estimator. As an alternative to the MAD, one can consider the unbiased estimate of Huber (1981, p. 8), based on the Winsorized residuals ε̃_i = ψ_H(ε_i/σ_ε) σ_ε.
Starting from the prediction errors of the linear smoother ε̂_i = [ y_i − ĝ_{L,−i}(t_i) ], and iterating, it is given by

ε̃_i^{(k)} = min{ |ε̂_i|, 1.345 σ̂_ε^{(k−1)} }
σ̂_ε^{(k)} = [ (1/n_k) Σ_{i=1..n} (ε̃_i^{(k)})² ]^{1/2}   (13)

where n_k is the number of unmodified errors at the k-th iteration. In the AUX data the above converged to σ̂_ε = .54, independently of the initial value σ̂_ε^{(0)}.
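The iterative scale estimate just described can be sketched as follows. The normalization by the number of unmodified errors follows the reconstruction above, and the MAD starting value is an assumption (the text reports that the result did not depend on the initialization).

```python
import numpy as np

def huber_scale(resid, tol=1e-8, max_iter=200):
    """Iterative Huber-type scale estimate: residuals are Winsorized at
    1.345 * sigma and the scale is recomputed from the modified errors
    until it stabilizes."""
    r = np.abs(np.asarray(resid, float))
    # starting value: MAD of the residuals (an assumption, see text)
    med = np.median(np.asarray(resid, float))
    sigma = np.median(np.abs(np.asarray(resid, float) - med)) / 0.6745
    for _ in range(max_iter):
        clipped = np.minimum(r, 1.345 * sigma)          # Winsorized errors
        n_k = max(int(np.sum(r <= 1.345 * sigma)), 1)   # unmodified errors
        new_sigma = float(np.sqrt(np.sum(clipped ** 2) / n_k))
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    return sigma
```

On clean residuals the iteration returns the usual root-mean-square scale, while a single gross error is truncated and barely inflates the estimate.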

4. Simulation Experiments

We now test the proposed methods on artificial series affected by systematic jumps and consecutive outliers (patches). We consider the simulation experiment of Fried et al. (2006, p. 33), whose basic signal is piecewise constant and linear, with ramps of different slopes; see Figure 4(a). The series has length n = 300, is overlaid by N(0,1) white noise, and 5% of the observations were replaced by additive outliers. Specifically, there are 3 isolated outliers, 3 pairs with the same sign (i.e. y_t^o = y_{t+1}^o) and 2 triples with alternating sign. They were placed at about the same positions as in Fried et al. (2006).

Figure 4. Results on the simulation experiment of Fried et al. (2006): (a) True signal and observed series y_t; (b-d) LPR(p) estimates for p = 0, 1, 2: linear (thin line) and robust (solid line) with ρ of the Gaussian type (3,d). Their coefficients κ, λ are selected with robust CV (see Figure 5).

On the series in Figure 4(a), we fitted LPR smoothers with orders p = 0, 1, 2, in linear and robust versions. The robust estimator (6)-(7) was implemented with L Gaussian, and the coefficients κ, λ were selected with robust CV criteria (see Figure

5). The results are displayed in Figures 4(b-d); the curve in panel (c) is better than those obtained by Fried et al. (2006, p. 333), both in jump preservation and in outlier resistance. The robust estimates clearly outperform the linear ones, and their empirical MSEs m_g(p) = (1/300) Σ_t [ ĝ_M(t) − g(t) ]² were .13, .06 and .10 respectively; it follows that the order p = 1 is the best one. These results worsen considerably when unbounded ρ-functions such as (3,a-b) are used; for example, the smoother with ρ = |·| yields m_g(1) = .56 and the Huber solution implies m_g(1) = .43. On the other hand, all linear estimates provided m_g(p) ≈ .3 for all p.

Figure 5. Paths of the CV criteria of the linear and robust LPR(1) smoothers in Figure 4(c): (a) CV₂(κ) and (b) CV₂(κ, λ), quadratic; (c) CV₁(κ) and (d) CV₁(κ, λ), absolute; (e) CVL(κ) and (f) CVL(κ, λ), robust with ℓ = Gauss and λ_ℓ = σ̂_M.

To complete the analysis, Figure 5 displays various CV functions of the LPR(1) smoother; the first column is that of the linear estimator. Panels (a,b) show that the quadratic criteria are relatively flat and nearly fail to select the best coefficients. Panels (c,e) are very similar and provide empirical evidence for Proposition 1, namely the equivalence of robust CV criteria for the coefficient κ. Those with

ℓ = Gauss were implemented with the scale σ̂_M = MAD[ y_t − ĝ_L(t) ] = .5, which was obtained from the linear estimate in Figure 4(b). Notice that its value is close to the variance of the noise added to the signal. Finally, the paths of the bivariate criteria are much less smooth than the univariate ones; this means that λ may need heuristic solutions such as that discussed in Proposition 2.

Figure 6. Behaviour of the LPR(1) smoother (7) with L Gaussian and κ, λ = 1, 3, 6: (a) κ=1, λ=1; (b) κ=1, λ=3; (c) κ=1, λ=6; (d) κ=3, λ=1; (e) κ=3, λ=3; (f) κ=3, λ=6; (g) κ=6, λ=1; (h) κ=6, λ=3; (i) κ=6, λ=6.

Besides the CV optimization, it is interesting to evaluate the separate effects of κ and λ on the estimates. We also worsened the experimental conditions of the previous simulation by introducing triples of outliers with the same sign (i.e. y_t^o = y_{t+1}^o = y_{t+2}^o); these can also be interpreted as local breaks. The robust LPR(1) estimator (7) was run with the 9 pairs of coefficients κ, λ = 1, 3, 6, and the results are reported in Figure 6. Panels (e,h) show that there are designs which can treat the triples either as jumps to be preserved, or as outliers to be removed. In the latter case, the robustness property can be improved by initializing the smoother with a robust kernel estimator. In general, as λ increases the smoother (7) behaves like a linear estimator and is less sensitive to jumps and outliers. Hence, there is a delicate trade-off between jump preservation and outlier resistance.
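The patch-outlier experiment above can be reproduced in miniature with the iterative scheme (8); the signal shape, outlier positions and sizes below are illustrative, not those of Fried et al. (2006).

```python
import numpy as np

# Miniature version of the patch-outlier experiment (illustrative values).
rng = np.random.default_rng(1)
n = 300
t = np.arange(n, dtype=float)
# piecewise constant-and-linear signal: level, ramp, level
g = np.where(t < 100, 0.0, np.where(t < 200, 0.05 * (t - 100), 5.0))
y = g + rng.standard_normal(n)
for pos in (50, 51, 150, 151, 152, 250):     # pairs and a triple of outliers
    y[pos] += 10.0

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def smooth(t, y, kappa, lam=None, n_iter=8):
    """Linear kernel smoother (lam=None) or robust iterative version (8)."""
    g_hat = np.array([np.average(y, weights=gauss((t - s) / kappa)) for s in t])
    if lam is None:
        return g_hat
    for _ in range(n_iter):
        new = np.empty_like(g_hat)
        for j, s in enumerate(t):
            w = gauss((t - s) / kappa) * gauss((y - g_hat[j]) / lam)
            new[j] = np.sum(w * y) / np.sum(w)
        g_hat = new
    return g_hat

mse_lin = float(np.mean((smooth(t, y, 4.0) - g) ** 2))
mse_rob = float(np.mean((smooth(t, y, 4.0, lam=2.0) - g) ** 2))
```

With the same time bandwidth, the robust version suppresses the patches of outliers and attains a clearly smaller empirical MSE than the linear fit.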

4.1 Monte Carlo Simulation

So far we have considered the results from a single realization of the process y_t. This was necessary to have a realistic graphical illustration of the methods we have discussed. Following Bianco and Boente (2006, p. 89), Monte Carlo simulations are necessary to confirm the validity of the proposed solutions. In particular, it is interesting to evaluate the effect of the loss functions on the fitting performance and on the bandwidth selection respectively. Generally speaking, robust smoothers should be implemented with bounded ρ-functions, because they have better adaptive properties; whereas robust CV criteria should adopt the absolute ℓ-function, to avoid circularity in the selection of the coefficient λ. Moreover, following Proposition 1 and Leung (2005), the selection of κ is asymptotically independent of the function ℓ; the sole requirement is that its derivative must be bounded. Through repeated simulations, we are interested in checking these statements for finite samples and for more complex coefficients such as λ. We have considered a signal similar to the previous one, but under more severe conditions. Its length is much shorter, n = 100; it was overlaid by the AR process x_t = .6 x_{t−1} + e_t, e_t ~ N(0, 1), and 9% of the observations were replaced by outliers of size 9, containing two pairs and one triple; see Figure 7(a). For the sake of simplicity, we restrict the analysis to the smoother of order p = 1 and to loss functions based on Gaussian kernels. The robust methods were initialized with linear estimates, both as regards the iterative smoothers and the scale coefficient of the CV functions; more precisely, we set ĝ_M^{(0)}(t) = ĝ_L(t) and λ_ℓ = σ̂_M. The main experiment consists of comparing bandwidths selected with absolute, quadratic and bounded CV criteria with those obtained by minimizing the ASE(κ, λ) = (1/n) Σ_t [ ĝ_n(t) − g(t) ]². Notice that the asymptotic equivalence between ASE and CV holds only for linear smoothers and for series without jumps and outliers.
N replications were performed, and the mean values and the mean squared errors with respect to the ASE estimates (e.g. m_κ = (1/N) Σ_h [ κ̂_h − κ̂_h^{ASE} ]²) are reported in Table 1. At the graphical level, Figure 7(b) displays the mean values of the estimated curves ḡ(t) = (1/N) Σ_h ĝ_h(t).
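A Monte Carlo skeleton of this comparison is sketched below; for brevity the inner fit is the simple (non-robust) kernel smoother, and the sample size, grid, seed and contamination are illustrative, so it shows the machinery rather than reproducing the paper's results.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def loo_error(t, y, i, kappa):
    """Leave-one-out prediction error of a kernel smoother at t[i]."""
    mask = np.arange(len(t)) != i
    w = gauss((t[mask] - t[i]) / kappa)
    return y[i] - np.sum(w * y[mask]) / np.sum(w)

def select(t, y, grid, loss):
    """Pick the bandwidth minimizing the CV criterion with the given loss."""
    scores = [np.mean([loss(loo_error(t, y, i, k)) for i in range(len(t))])
              for k in grid]
    return grid[int(np.argmin(scores))]

rng = np.random.default_rng(2)
grid = (2.0, 4.0, 8.0)
H, n = 10, 80
picks_cv2, picks_cv1 = [], []
for _ in range(H):                              # H Monte Carlo replications
    t = np.arange(n, dtype=float)
    y = np.sin(t / 12) + 0.3 * rng.standard_normal(n)
    y[rng.integers(10, 70)] += 8.0              # one additive outlier
    picks_cv2.append(select(t, y, grid, lambda e: e ** 2))  # quadratic CV
    picks_cv1.append(select(t, y, grid, abs))               # absolute CV
```

Comparing the two lists of selected bandwidths over the replications mimics, on a small scale, the comparison between the quadratic and absolute criteria discussed above.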

18 Table. Results of the Monte Carlo simulation on the system of Figure 7(a): κ, λ, σ M, m g are mean values over replications. m κ, m λ are mean squared errors with respect to the ASE (average squared error) estimates; the values in parentheses are computed with respect to the heuristic estimation λ =. ˆσ M. CVL is the robust cross-validation with = Gauss; it was tuned with λ = ˆσ M of the CV selection. Smoother Criterion κ m κ λ mλ σ M m g LPR() ASE CV (.5).73.9 CV.6.6. (.95) CVL..66. (.).6.69 Robust ASE LPR() CV (.9).3.4 with CV (.8)..95 ρ = CVL (.9).9.97 ASE CV CV ρ = L CVL ASE without CV additive CV outliers CVL The table must be read starting from the last column, which provides the average (over the replications) mean squared distances between the estimated curves and the true signal; hence, it resumes the performance of the smoothers. The values indicate that the best solution is the robust one with bounded ρ-function; in particular, we have m g ( L) <.5 m g ( ). This conclusion is also supported by Figure 7(b), which shows that the robust smoother with ρ = is close to the linear one. Within each block, there is substantial equivalence between the results of the robust CV criteria, 7

both in the mean values of the bandwidths κ̄, λ̄ and in the mean squared distances from the ASE estimates m_κ, m_λ; instead, the CV₂ estimates have greater variability. This leads us to extend Proposition 1 to the coefficient λ and to prefer the CV₁ method in view of its simplicity. Therefore, the general conclusion is: use the robust smoother with ρ = L, but select its coefficients κ, λ with ℓ = |·|; moreover, from the values of m_λ in parentheses, the heuristic solution λ = 2.11 σ̂_M is the closest one to the ASE estimates of the third block. Finally, these remarks also apply to the case without outliers (but with jumps) in the fourth block. The classical CV criterion provides the best estimate of the coefficient κ, but its fitting performance is the worst within the block. To complete the analysis, panels (c,d) provide the kernel densities of the estimates κ̂_h, λ̂_h obtained with the various CV criteria.

Figure 7. Results of the Monte Carlo simulation: (a) Basic signal and a realization of the process y_t = g(t) + AR + outliers; (b) Mean values of the curve estimates: linear LPR(1) (thin line), robust with ρ = L (solid line), robust with ρ = |·| (dotted line); (c,d) Kernel densities of the estimates κ̂_h, λ̂_h obtained in the various replications with CV₁ (solid line), CVL (thin line), CV₂ (dashed line); the bandwidths used were σ̂/n^{1/5}.

5. Conclusion

Signal extraction in nonstationary time series is an important problem in many applied fields of research (economics, ecology, engineering): it makes it possible to judge the fundamental tendency of complex processes and, therefore, to indicate the right action to take. Nonparametric smoothing is a useful technique for this purpose because it reduces a-priori assumptions to a minimum; local polynomial regression, in particular, has suitable boundary properties that allow extrapolation. The robustification of smoothers has the twofold advantage of reducing the influence of anomalous observations and of detecting and preserving structural changes. A further improvement comes from robust cross-validation, which allows optimal bandwidths to be selected without resorting to demanding plug-in strategies. Applications to real and simulated time series have shown the efficacy of the methods.

Appendix: Proof of the Proposition

We must compare the asymptotic variances of the robust and linear smoothers ĝ_M, ĝ_L; these are given, respectively, by the second term of the mean squared error expansion of ĝ_M and by (nκ)^{-1} S(K) σ_ε^2. Hence, the ARE ratio is similar to that of the parametric case,

    ARE(ĝ_M, ĝ_L) = σ_ε^2 τ_ρ ,    τ_ρ = E[ρ''(ε)]^2 / E[ρ'(ε)^2] ;    (4)

this is independent of κ and is equal to the relative efficiency of least squares and M-estimates for the location parameter of ε_i. Now, assuming Gaussianity both for f(ε) and for L(·), one has ρ'(ε) = N(ε; 0, λ) ε/λ^2; therefore

    E[ρ'(ε)^2] = ∫ [ N(ε; 0, λ) ε/λ^2 ]^2 N(ε; 0, σ_ε) dε
               = [ 2π λ^6 √(2π) σ_ε ]^{-1} ∫ ε^2 exp( -ε^2/(2α^2) ) dε ,    α = σ_ε λ / (2σ_ε^2 + λ^2)^{1/2}
               = σ_ε^2 / [ 2π λ^3 (2σ_ε^2 + λ^2)^{3/2} ] .    (5)
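The moment computations in this proof can be checked numerically. The sketch below verifies the closed forms of E[ρ'(ε)^2] and E[ρ''(ε)] by brute-force quadrature and evaluates the resulting ARE as a function of C = λ/σ_ε; the grid limits and the test values of (λ, σ_ε) are arbitrary choices.

```python
import numpy as np

def gauss(x, s):
    # N(x; 0, s) density
    return np.exp(-0.5 * (x / s) ** 2) / (np.sqrt(2 * np.pi) * s)

def moments_closed(lam, sig):
    # closed forms of E[rho'(eps)^2] and E[rho''(eps)] from the proof
    e_d1_sq = sig**2 / (2 * np.pi * lam**3 * (2 * sig**2 + lam**2) ** 1.5)
    e_d2 = 1.0 / (np.sqrt(2 * np.pi) * (lam**2 + sig**2) ** 1.5)
    return e_d1_sq, e_d2

def moments_numeric(lam, sig):
    # the same two expectations by brute-force quadrature on a grid
    e = np.linspace(-40.0, 40.0, 800_001)
    de = e[1] - e[0]
    f = gauss(e, sig)                        # Gaussian error density
    d1 = gauss(e, lam) * e / lam**2          # rho'(eps)
    d2 = gauss(e, lam) / lam**2 - gauss(e, lam) * e**2 / lam**4  # rho''(eps)
    return (d1**2 * f).sum() * de, (d2 * f).sum() * de

def are(C):
    # ARE(g_M, g_L) as a function of C = lam / sig_eps
    return C**3 * (2 + C**2) ** 1.5 / (C**2 + 1) ** 3
```

In particular, `are(2.1)` evaluates to about 0.95, which is the numerical origin of the rule of thumb relating λ to σ_ε.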

Analogously, it can be shown that ρ''(ε) = N(ε; 0, λ)/λ^2 − N(ε; 0, λ) ε^2/λ^4, and using β^2 = σ_ε^2 λ^2 / (σ_ε^2 + λ^2) one can obtain

    E[ρ''(ε)] = (1/λ^2) E[ N(ε; 0, λ) ] − (1/λ^4) E[ ε^2 N(ε; 0, λ) ]
              = (λ^2 − β^2) / [ √(2π) λ^4 (λ^2 + σ_ε^2)^{1/2} ] ,

so that

    E[ρ''(ε)]^2 = 1 / [ 2π (λ^2 + σ_ε^2)^3 ] .    (6)

Substituting (5) and (6) into (4), and letting λ = C σ_ε, it follows that

    ARE(ĝ_M, ĝ_L) = λ^3 (2σ_ε^2 + λ^2)^{3/2} / (λ^2 + σ_ε^2)^3 = C^3 (2 + C^2)^{3/2} / (C^2 + 1)^3 ,

and the level ARE = 0.95 implies C ≈ 2.1. Thus, the M-smoother (7) with L = Gauss maintains a 95% relative efficiency only if λ ≥ 2.1 σ_ε.

References

Assaid C.A. and Birch J.B. (2000), Automatic Bandwidth Selection in Robust Nonparametric Regression. Journal of Statistical Computation and Simulation, 66.

Beran J., Feng Y., Ghosh S. and Sibbertsen P. (2002), On Local Polynomial Estimation with Long-Memory Errors. International Journal of Forecasting, 18.

Bianco A. and Boente G. (2006), Robust Estimators under Semi-Parametric Partly Linear Autoregression: Asymptotic Behaviour and Bandwidth Selection. Journal of Time Series Analysis, 28.

Boente G., Fraiman R. and Meloche J. (1997), Robust Plug-in Bandwidth Estimators in Nonparametric Regression. Journal of Statistical Planning and Inference, 57.

Chu C.K., Glad I., Godtliebsen F. and Marron J.S. (1998), Edge-Preserving Smoothers for Image Processing. Journal of the American Statistical Association, 93.

Cleveland W.S. (1979), Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association, 74, 829-836.

Fan J. and Gijbels I. (1996), Local Polynomial Modelling and its Applications. Chapman & Hall, London.

Fan J., Hu C.T. and Truong Y.K. (1994), Robust Nonparametric Function Estimation. Scandinavian Journal of Statistics, 21.

Fried R., Bernholt T. and Gather U. (2006), Repeated Medians and Hybrid Filters. Computational Statistics and Data Analysis, 50.

Gather U., Fried R. and Lanius V. (2006), Robust Detail-Preserving Signal Extraction. In: Handbook of Time Series Analysis (B. Schelter, M. Winterhalder and J. Timmer, eds.), Wiley, Weinheim, Chapter 6.

Grillenzoni C. (1997), Recursive Generalized M-estimates of System Parameters. Technometrics, 39.

Hall P. and Jones M.C. (1990), Adaptive M-Estimation in Nonparametric Regression. The Annals of Statistics, 18.

Hampel F., Ronchetti E., Rousseeuw P. and Stahel W. (1986), Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.

Härdle W. and Gasser T. (1984), Robust Non-parametric Function Fitting. Journal of the Royal Statistical Society, Series B, 46, 42-52.

Härdle W., Müller M., Sperlich S. and Werwatz A. (2004), Nonparametric and Semiparametric Models. Springer, Berlin.

Harvey A.C. and Trimbur T.M. (2003), General Model-Based Filters for Extracting Cycles and Trends in Economic Time Series. Review of Economics and Statistics, 85, 244-255.

Hillebrand M. and Müller C.H. (2006), On Consistency of Redescending M-Kernel Smoothers. Metrika, 63.

Huber P.J. (1981), Robust Statistics. Wiley, New York.

Hwang R.-C. (2004), Local Polynomial M-smoothers in Nonparametric Regression. Journal of Statistical Planning and Inference, 126.

Leung D.H.-Y., Marriott F.H.C. and Wu E.K.H. (1993), Bandwidth Selection in Robust Smoothing. Journal of Nonparametric Statistics, 2.

Leung D.H.-Y. (2005), Cross-Validation in Nonparametric Regression with Outliers. The Annals of Statistics, 33, 2291-2310.

Maronna R.A., Martin R.D. and Yohai V.J. (2006), Robust Statistics: Theory and Methods. Wiley, New York.

Rue H., Chu C.-K., Godtliebsen F. and Marron J.S. (2002), M-Smoother with Local Linear Fit. Journal of Nonparametric Statistics, 14.

Wang F. and Scott D. (1994), The L1 Method for Robust Nonparametric Regression. Journal of the American Statistical Association, 89.


More information

On the Robust Modal Local Polynomial Regression

On the Robust Modal Local Polynomial Regression International Journal of Statistical Sciences ISSN 683 5603 Vol. 9(Special Issue), 2009, pp 27-23 c 2009 Dept. of Statistics, Univ. of Rajshahi, Bangladesh On the Robust Modal Local Polynomial Regression

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA This article was downloaded by: [University of New Mexico] On: 27 September 2012, At: 22:13 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Rank tests for short memory stationarity

Rank tests for short memory stationarity Rank tests for short memory stationarity Pranab K. Sen jointly with Matteo M. Pelagatti University of North Carolina at Chapel Hill Università degli Studi di Milano-Bicocca 50th Anniversary of the Department

More information

Applying the Q n Estimator Online

Applying the Q n Estimator Online Applying the Q n Estimator Online Robin Nunkesser 1 Karen Schettlinger 2 Roland Fried 2 1 Department of Computer Science, University of Dortmund 2 Department of Statistics, University of Dortmund GfKl

More information

Econometric Forecasting

Econometric Forecasting Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 1, 2014 Outline Introduction Model-free extrapolation Univariate time-series models Trend

More information

arxiv: v1 [cs.lg] 24 Feb 2014

arxiv: v1 [cs.lg] 24 Feb 2014 Journal of Machine Learning Research 1 (2013) 1-60 Submitted 12/13; Published 00/00 Predictive Interval Models for Non-parametric Regression Mohammad Ghasemi Hamed ENAC, MAIAA, F-31055 Toulouse, France

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

An Introduction to Nonstationary Time Series Analysis

An Introduction to Nonstationary Time Series Analysis An Introduction to Analysis Ting Zhang 1 tingz@bu.edu Department of Mathematics and Statistics Boston University August 15, 2016 Boston University/Keio University Workshop 2016 A Presentation Friendly

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015

More information

A nonparametric test for seasonal unit roots

A nonparametric test for seasonal unit roots Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna To be presented in Innsbruck November 7, 2007 Abstract We consider a nonparametric test for the

More information

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions

Economics Division University of Southampton Southampton SO17 1BJ, UK. Title Overlapping Sub-sampling and invariance to initial conditions Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Title Overlapping Sub-sampling and invariance to initial conditions By Maria Kyriacou

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

TIME SERIES ANALYSIS. Forecasting and Control. Wiley. Fifth Edition GWILYM M. JENKINS GEORGE E. P. BOX GREGORY C. REINSEL GRETA M.

TIME SERIES ANALYSIS. Forecasting and Control. Wiley. Fifth Edition GWILYM M. JENKINS GEORGE E. P. BOX GREGORY C. REINSEL GRETA M. TIME SERIES ANALYSIS Forecasting and Control Fifth Edition GEORGE E. P. BOX GWILYM M. JENKINS GREGORY C. REINSEL GRETA M. LJUNG Wiley CONTENTS PREFACE TO THE FIFTH EDITION PREFACE TO THE FOURTH EDITION

More information

On Model Fitting Procedures for Inhomogeneous Neyman-Scott Processes

On Model Fitting Procedures for Inhomogeneous Neyman-Scott Processes On Model Fitting Procedures for Inhomogeneous Neyman-Scott Processes Yongtao Guan July 31, 2006 ABSTRACT In this paper we study computationally efficient procedures to estimate the second-order parameters

More information

41903: Introduction to Nonparametrics

41903: Introduction to Nonparametrics 41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific

More information

robust estimation and forecasting

robust estimation and forecasting Non-linear processes for electricity prices: robust estimation and forecasting Luigi Grossi and Fany Nan University of Verona Emails: luigi.grossi@univr.it and fany.nan@univr.it Abstract In this paper

More information

Robustní monitorování stability v modelu CAPM

Robustní monitorování stability v modelu CAPM Robustní monitorování stability v modelu CAPM Ondřej Chochola, Marie Hušková, Zuzana Prášková (MFF UK) Josef Steinebach (University of Cologne) ROBUST 2012, Němčičky, 10.-14.9. 2012 Contents Introduction

More information