Nonparametric time series prediction: A semi-functional partial linear modeling


Journal of Multivariate Analysis 99 (2008) 834-857 www.elsevier.

Germán Aneiros-Pérez (a,*), Philippe Vieu (b)

(a) Universidade da Coruña, Facultad de Informática, Campus de Elviña s/n, 15071 A Coruña, Spain
(b) Université Paul Sabatier, Toulouse 3, France

Received 2 September 2006. Available online 27 April 2007.

(*) Corresponding author. E-mail address: ganeiros@udc.es (G. Aneiros-Pérez).
doi:10.1016/j.jmva

Abstract

There is a recent interest in developing new statistical methods to predict time series by taking into account a continuous set of past values as predictors. In this functional time series prediction approach, we propose a functional version of the partial linear model that makes it possible both to include additional covariates and to use a continuous path in the past to predict future values of the process. The aim of this paper is to present this model, to construct some estimates and to study their properties both from a theoretical point of view, by means of asymptotic results, and from a practical perspective, by treating some real data sets. Although the literature on the use of parametric or nonparametric functional modeling is growing, as far as we know, this is the first paper on semiparametric functional modeling for the prediction of time series.

© 2007 Elsevier Inc. All rights reserved.

AMS 1991 subject classification: 60G25; 60G08; 62M10

Keywords: Partial linear regression; Functional data; Semiparametric functional model; Dependent data; Time series prediction

1. Introduction

Predicting future values of a time series is of major interest in many fields of applied sciences, and the statistical literature in this field is quite abundant. Independently of the kind of statistical modeling used, an important parameter that has to be chosen or estimated is the number of past values used to construct the prediction method. The larger the number of past predictors, the more flexible the model will be and, in contrast, the more difficult the estimation of the parameters

of the model. This well-known phenomenon is particularly worrying in nonparametric statistics, for which the asymptotic behavior of the estimates deteriorates exponentially with the number of explanatory variables (that is, with the number of past values incorporated in the model). The literature in this field of nonparametric time series prediction is vast and we will just cite two, arbitrarily selected, recent monographs by Bosq [4] and Fan and Yao [7].

One way to overcome the problem of incorporating a high number of past values into the statistical model is to use functional ideas. The idea is to cut the observed time series into a sample of trajectories and to incorporate in the model one single past continuous trajectory rather than many single past values. Consider, for instance, the following time series, which will be studied in more detail in Section 4.2. The original time series is composed of hourly measurements of ozone concentration during 124 days (so there are exactly 2976 observed values) and is plotted in Fig. 1. The functional idea consists in cutting this time series into 124 daily paths. Finally, the time series can be represented as a set of 124 functional data, as shown in Fig. 2. Hence, one can use for prediction purposes a single functional variable (the continuous daily ozone concentration path) rather than the 24 observed points. In other words, a 24-dimensional problem can be changed into a one-dimensional but functional problem. The prediction question can be addressed through a regression problem with dependent functional covariate.

Fig. 1. The ozone concentration data (horizontal axis: time).

Recent statistical literature has attacked this functional time series problem by proposing either parametric (mainly linear) or purely nonparametric modeling. Functional linear prediction has been popularized by several works of Denis Bosq (discussions and references can be found in the monograph by Bosq [5]), while for functional nonparametric time series modeling the first advances were provided by Ferraty et al. [8] and Masry [13] (see the monograph by Ferraty and Vieu [11] for complementary bibliography).

There are often cases in practice where one has to take into account additional information. For instance, in the ozone concentration problem discussed before (see also Section 4.2), one also has at hand measurements of various other chemical quantities. In the nonfunctional setting, a popular way to incorporate additional covariates consists in using some kind of semiparametric modeling. Since the paper by Engle et al. [6], partial linear models became quite popular in this field

(see also Aneiros-Pérez et al. [1] for recent advances and references on partial linear time series modeling and estimation).

Fig. 2. Ozone concentration: 124 daily curves (horizontal axis: hour).

The aim of our paper is to develop a new model that combines the advantages of partial linearity (to incorporate additional covariates) with the advantages of functional modeling. As far as we know, the semi-functional partial linear model proposed here is presented for the first time in the field of semiparametric functional time series analysis.

Our paper is organized as follows. The semi-functional partial linear model is presented in Section 2 in the general form of regression estimation involving dependent variables, and the estimators of both linear and nonparametric components of the model are defined. Then, the asymptotic theory is provided in Section 3 and two real data examples are treated in Section 4. Technical proofs are reported in Section 5.

2. Model and estimators

2.1. Semi-functional partial linear time series modeling

Let $\{Z_t, t \in [0, +\infty[\}$ be a real-valued time series which has been observed at $N$ equispaced times. Without loss of generality, we assume that $N$ can be written in the form $N = n\tau$. To clarify this, in the ozone example discussed before we had $N = 2976$, $n = 124$ and $\tau = 24$. The observed time series $\{Z_1, \ldots, Z_N\}$ can therefore be cut into $n$ successive paths of length $\tau$ in the following way:

$$Z_i = \{Z_t,\ t \in [\tau(i-1), \tau i[\} \quad (i = 1, \ldots, n).$$

In addition to using the past values of the process, we wish to incorporate other covariates in the model. For instance, assume that we have a set of $p$ real random variables $X_1, \ldots, X_p$, each observed once for each period, meaning that we have a set of observed variables $\{X_{i1}, \ldots, X_{ip},\ i = 1, \ldots, n\}$.
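The cutting step described above can be sketched numerically. The following is a minimal illustration, assuming the series is stored as a NumPy array whose length is a multiple of $\tau$; the function name `cut_into_paths` and the placeholder series are hypothetical and are not part of the paper.

```python
import numpy as np


def cut_into_paths(z, tau):
    """Cut a series of length N = n * tau into n successive paths of length tau."""
    z = np.asarray(z, dtype=float)
    if z.size % tau != 0:
        raise ValueError("series length must be a multiple of tau")
    # Row i-1 of the result holds the i-th path (Z_t, tau*(i-1) <= t < tau*i).
    return z.reshape(-1, tau)


# Placeholder values standing in for N = 2976 hourly observations (not real data).
z = np.arange(2976, dtype=float)
paths = cut_into_paths(z, tau=24)  # n = 124 daily paths of length 24

# The path of period i is the functional predictor for period i + 1.
predictors, targets = paths[:-1], paths[1:]
```

Each row of `predictors` then plays the role of the functional covariate, and any scalar feature of the matching row of `targets` plays the role of the response.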
The problem consists in predicting some characteristic of the future of the process (let us say, for instance, some real random variable $G(Z_{i+1})$, for some fixed operator $G$) from the information

obtained in the last period (that is, from the covariates $\{Z_i, X_{i1}, \ldots, X_{ip}\}$). The semi-functional partial linear time series modeling leads us to assume that

$$G(Z_{i+1}) = \sum_{j=1}^{p} X_{ij}\beta_j + m(Z_i) + \varepsilon_i \quad (i = 1, \ldots, n), \tag{1}$$

where $\beta = (\beta_1, \ldots, \beta_p)^T$ is a vector of unknown real parameters, $m$ is an unknown smooth real function and the $\varepsilon_i$ are identically distributed random errors satisfying $E(\varepsilon_i \mid X_{i1}, \ldots, X_{ip}, Z_i) = 0$. Given such a model, the prediction problem can be treated as soon as we have at hand some estimates of the real parameters $\beta_j$ and of the functional operator $m$.

Remark 1. As written at the beginning of this section, we consider that the $N$ observations of the time series $\{Z_t, t \in [0, +\infty[\}$ are taken at equispaced times, but this is assumed without loss of generality. In fact, the really important thing is to have close enough observations (taken or not at equispaced times) in each of the $n$ periods of time considered, in such a way that the discretized curve can be reconstructed from these observations. Once we have these $n$ curves, the assumption of equispaced times is irrelevant. This question arises not only in functional time series problems but in any statistical problem involving curve data sets. Section 3.6 in Ferraty and Vieu [11] discusses this point further, including automatic R/S+ routines for transforming unbalanced data sets into balanced ones, which make it possible to apply any functional statistical method to nonequispaced time series.

2.2. Semi-functional partial linear regression model

Rather than attacking the time series prediction problem directly, we will look at it as a special case of a regression estimation problem from dependent observations. Let $\{(Y_i, X_{i1}, \ldots, X_{ip}, T_i)\}_{i=1}^{n}$ be $n$ $(p+2)$-variate random vectors identically distributed as $(Y, X_1, \ldots, X_p, T)$, where $Y$ is a real random variable which is linked both to a set of real explanatory variables $X_j$ ($j = 1, \ldots, p$) and to a functional explanatory variable $T$ by means of the so-called semi-functional partial linear regression model (SFPLR model):

$$Y_i = \sum_{j=1}^{p} X_{ij}\beta_j + m(T_i) + \varepsilon_i \quad (i = 1, \ldots, n), \tag{2}$$

where $\beta$, $m$ and $\varepsilon_i$ are defined as before. The remainder of this paper will deal with this SFPLR model, rather than with model (1). Indeed, model (1) can be seen as a special case of (2) by considering $Y_i = G(Z_{i+1})$ and $T_i = Z_i$. To allow for large classes of functional variables, we only suppose that $T$ is valued in some abstract semi-metric space $H$, and we denote the associated semi-metric by $d(\cdot, \cdot)$.

2.3. Construction of the estimators

As we will motivate below, it is natural to estimate the vector of parameters $\beta$ and the function $m$ in (2) by means of

$$\hat\beta_h = (\tilde X_h^T \tilde X_h)^{-1} \tilde X_h^T \tilde Y_h \tag{3}$$

and

$$\hat m_h(t) = \sum_{i=1}^{n} w_{n,h}(t, T_i)\,(Y_i - X_i^T \hat\beta_h), \tag{4}$$

respectively (see [15] for the corresponding estimates in the nonfunctional partial linear regression model). In these estimators, $h$ is a smoothing parameter of the kind that typically appears in any setting of nonparametric estimation. Furthermore, we have denoted $X = (X_1, \ldots, X_n)^T$ with $X_i = (X_{i1}, \ldots, X_{ip})^T$, $Y = (Y_1, \ldots, Y_n)^T$, and for any $(n \times q)$-matrix $A$ ($q \ge 1$), $\tilde A_h = (I - W_h)A$, where $W_h = (w_{n,h}(T_i, T_j))_{i,j}$ with $w_{n,h}(\cdot, \cdot)$ being a weight function that can take different forms. In this paper we will focus on the Nadaraya-Watson-type weights,

$$w_{n,h}(t, T_i) = \frac{K(d(t, T_i)/h)}{\sum_{j=1}^{n} K(d(t, T_j)/h)}, \tag{5}$$

where $K$ is a function from $[0, \infty)$ to $[0, \infty)$.

Several motivations for $\hat\beta_h$ in (3) can be given. For instance, $\hat\beta_h$ can be seen as the ordinary least squares estimator obtained by regressing the partial residual vector $\tilde Y_h$ on the partial residual matrix $\tilde X_h$ (note that $\tilde Y_h$ and $\tilde X_h$ are formed by partial residuals adjusting for $T$). Another motivation for $\hat\beta_h$ is based on the fact that

$$\hat\beta_h = \arg\min_{\beta = (\beta_1, \ldots, \beta_p)} \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} X_{ij}\beta_j - \hat m_{h,\beta}(T_i) \Big)^2,$$

where $\hat m_{h,\beta}(t) = \sum_{i=1}^{n} w_{n,h}(t, T_i)\,(Y_i - X_i^T \beta)$. Finally, observe that once $\beta$ is estimated by means of $\hat\beta_h$, it seems natural to estimate $m(t)$ by means of the kernel estimator $\hat m_h(t)$.

Estimators (3) and (4) were introduced and studied in Aneiros-Pérez and Vieu [2] for the case of independent and identically distributed vectors $\{(Y_i, X_{i1}, \ldots, X_{ip}, T_i)\}_{i=1}^{n}$. Specifically, these authors obtained the rates of convergence of (3) and (4), together with the asymptotic normality of (3). In this paper, those results will be generalized to the case where the data satisfy some strong mixing condition, and the asymptotic normality of (4) will be established, too. Hence, our model is adequate for time series prediction.

3. Asymptotic properties

In this section, we consider the SFPLR problem as stated in Section 2.2, and we present the asymptotic normality and the rates of convergence of the estimators $\hat\beta_h$ and $\hat m_h(t)$ defined in (3) and (4), respectively.
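Before turning to the asymptotics, the estimators (3), (4) and the weights (5) of Section 2.3 can be illustrated with a short numerical sketch. It assumes that the semi-metric has already been evaluated into a matrix `D` of pairwise distances between the sample curves; the quadratic kernel, the synthetic data and the names `nw_weights` and `fit_sfplr` are illustrative assumptions, not choices made in the paper.

```python
import numpy as np


def quadratic_kernel(u):
    # Assumed kernel supported on [0, 1]; the paper does not prescribe this one.
    return np.where((u >= 0) & (u <= 1), 1.0 - u ** 2, 0.0)


def nw_weights(D, h, kernel=quadratic_kernel):
    """Nadaraya-Watson weights (5): entry (i, j) is w_{n,h}(T_i, T_j)."""
    K = kernel(D / h)
    # Each row sum is positive because d(T_i, T_i) = 0 and K(0) > 0.
    return K / K.sum(axis=1, keepdims=True)


def fit_sfplr(X, Y, D, h, kernel=quadratic_kernel):
    """Return beta_hat from (3) and m_hat from (4) evaluated at the sample curves."""
    n = len(Y)
    W = nw_weights(D, h, kernel)
    X_tilde = (np.eye(n) - W) @ X  # partial residuals of X adjusting for T
    Y_tilde = (np.eye(n) - W) @ Y  # partial residuals of Y adjusting for T
    beta_hat = np.linalg.lstsq(X_tilde, Y_tilde, rcond=None)[0]
    m_hat = W @ (Y - X @ beta_hat)
    return beta_hat, m_hat


# Synthetic illustration only (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, grid = 60, np.linspace(0.0, 1.0, 24)
T = np.sin(2 * np.pi * np.outer(rng.uniform(0.5, 1.5, n), grid))
X = rng.normal(size=(n, 2))
Y = X @ np.array([1.0, -0.5]) + T.mean(axis=1) + 0.05 * rng.normal(size=n)
D = np.sqrt(((T[:, None, :] - T[None, :, :]) ** 2).mean(axis=-1))
beta_hat, m_hat = fit_sfplr(X, Y, D, h=0.3)
```

Solving the least squares problem with `np.linalg.lstsq` rather than forming the inverse in (3) explicitly is a numerical convenience; both give the same estimate when $\tilde X_h$ has full column rank.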

3.1. Technical assumptions

To state the asymptotic results, we require the following assumptions (note that the conditions on the smoothing parameter $h > 0$ are not given in this subsection but will be specified for each of the theorems stated below).

Conditions on the semi-metric space: $T$ is valued in some given compact subset $C$ of $H$, such that

$$C \subset \bigcup_{k=1}^{\tau_n} B(z_k, l_n), \tag{6}$$

where $\tau_n l_n^{\gamma} = C$ ($\gamma$ and $C$ denote real positive constants), and $\tau_n \to \infty$ and $l_n \to 0$ as $n \to \infty$ (we have denoted $B(t, h) = \{t' \in H;\ d(t', t) < h\}$).

Conditions on the kernel $K$:

$$K \text{ has support } [0, 1], \text{ is Lipschitz continuous on } [0, \infty), \text{ and } \exists \kappa \text{ such that } \forall u \in [0, 1],\ -K'(u) > \kappa > 0. \tag{7}$$

Conditions on the smoothness: Let us introduce the following notation: $g_j(t) = E(X_{ij} \mid T_i = t)$, $1 \le i \le n$, $1 \le j \le p$. We assume that all the operators to be estimated are smooth, in the sense that for some $C < \infty$ and some $\alpha > 0$ we have that

$$\forall (u, v) \in C \times C,\ \forall f \in \{m, g_1, \ldots, g_p\},\quad |f(u) - f(v)| \le C\, d(u, v)^{\alpha}. \tag{8}$$

Conditions on the distributions: The probability distribution of the infinite dimensional process $T$ is assumed to be such that there exist a positive-valued function $\varphi$ on $(0, \infty)$ and positive constants $\alpha_0$, $\alpha_1$ and $\alpha_2$ such that

$$\int_0^1 \varphi(hs)\,ds > \alpha_0 \varphi(h) \quad \text{and} \quad \alpha_1 \varphi(h) \le P(T \in B(t, h)) \le \alpha_2 \varphi(h), \quad \forall t \in C,\ h > 0, \tag{9}$$

and the joint probability distribution of $(T_i, T_j)$ is assumed to be such that there exist a function $\Psi(h) = c\,\varphi(h)^{1+\varepsilon}$ ($c > 0$, $0 \le \varepsilon \le 1$) and positive constants $\alpha_3$ and $\alpha_4$ such that

$$0 < \alpha_3 \Psi(h) \le \sup_{i \ne j} P\big[(T_i, T_j) \in B(t, h) \times B(t, h)\big] \le \alpha_4 \Psi(h), \quad \forall t \in C,\ h > 0. \tag{10}$$

Conditions on the dependence structure: We assume that $\{(Y_i, X_{i1}, \ldots, X_{ip}, T_i)\}_{i=1}^{n}$ come from some stationary strong mixing process whose mixing coefficients $\{\alpha(n)\}$ verify

$$\alpha(n) \le c\,n^{-a} \quad \text{for some } a > 4.5, \tag{11}$$

while

$$\eta_i \text{ is independent of } \varepsilon_i \quad (i = 1, \ldots, n), \tag{12}$$

where we have denoted $\eta_i = (\eta_{i1}, \ldots, \eta_{ip})^T$ with $\eta_{ij} = X_{ij} - E(X_{ij} \mid T_i)$, $j = 1, \ldots, p$.

Conditions on the moments: Let us introduce the following notation: $V_\varepsilon = E(\varepsilon \varepsilon^T)$ with $\varepsilon^T = (\varepsilon_1, \ldots, \varepsilon_n)$ and $\eta^T = (\eta_1, \ldots, \eta_n)$. We suppose that

$$E|Y_1|^r + E|X_{11}|^r + \cdots + E|X_{1p}|^r < \infty \quad \text{for some } r > 4, \tag{13}$$

$$\sup_{i,j} E\big(|Y_i Y_j| \mid T_i, T_j\big) < \infty, \tag{14}$$

$$\max_{1 \le j \le p}\ \sup_{i_1, i_2} E\big(|X_{i_1 j} X_{i_2 j}| \mid T_{i_1}, T_{i_2}\big) < \infty, \tag{15}$$

and

$$B = E(\eta_1 \eta_1^T) \quad \text{and} \quad C = \lim_{n \to \infty} n^{-1} E(\eta^T V_\varepsilon \eta) \quad \text{are positive definite matrices.} \tag{16}$$

Furthermore,

$$s_n^{-\frac{r(a+1)}{2(a+r)}} = o(n^{-\theta}) \quad \text{for some } \theta > 2, \tag{17}$$

where we have denoted

$$s_n = \sup_{t \in C} \big( s_{n,1}(t) + s_{n,2}(t) + s_{n,3}(t) \big)$$

with

$$s_{n,1}(t) = \sum_{i=1}^{n} \sum_{j=1}^{n} \big|\mathrm{Cov}(\Delta_i(t), \Delta_j(t))\big| \quad \text{with } \Delta_i(t) = K\Big(\frac{d(t, T_i)}{h}\Big),$$

$$s_{n,2}(t) = \sum_{i=1}^{n} \sum_{j=1}^{n} \big|\mathrm{Cov}(\Gamma_i(t), \Gamma_j(t))\big| \quad \text{with } \Gamma_i(t) = Y_i K\Big(\frac{d(t, T_i)}{h}\Big)$$

and

$$s_{n,3}(t) = \max_{1 \le k \le p} \sum_{i=1}^{n} \sum_{j=1}^{n} \big|\mathrm{Cov}(\Gamma_{ik}(t), \Gamma_{jk}(t))\big| \quad \text{with } \Gamma_{ik}(t) = X_{ik} K\Big(\frac{d(t, T_i)}{h}\Big).$$

The assumptions above are common in the setting of partial linear regression models (see [1]) and/or functional nonparametric models under strong mixing conditions (see [11]). Note that, for clarity, these assumptions are not presented here in their most general form. For instance, the constants $\alpha_i$ ($i = 1, 2$) in (9) and $\alpha_j$ ($j = 3, 4$) in (10) could be replaced by nonnegative functionals $\alpha_i f_1(t)$ and $\alpha_j f_2(t)$, respectively (see [13]), this change having no consequences on the final results. However, even if the condition linking $\varphi(h)$ and $\Psi(h)$ in Assumption (10) ($\Psi(h) = c\,\varphi(h)^{1+\varepsilon}$, $c > 0$, $0 \le \varepsilon \le 1$) could be slightly weakened, it should be noted that it is already quite general. For instance, if we look at how this condition would behave in the finite dimensional case, we would see that it is much less restrictive than what is usually found in the literature when the existence of a density for $(T_i, T_j)$ is assumed (that is, when it is assumed that Assumption (10) holds for $\varepsilon = 1$). The reader will find in Ferraty and Vieu [11] a discussion of the links between the semi-metric $d$ and the small ball concentration properties of $T$, as well as a discussion of how the small ball probability Assumptions (9) and (10) can be interpreted in a finite dimensional setting in terms of standard conditions on the density of $T$ and $(T_i, T_j)$, respectively.

Observe that, as a consequence of the expressions of our estimators (3) and (4), the assumptions on $Y_i$ and $m$ are similar to those on $X_{ij}$ and $g_j$, respectively.

8 G. Aneiros-Pérez, P. Vieu / Journal of Multivariate Analysis Asymptotics for the linear parameters Theorem 1. Under Assumptions 5 17, if in addition nh 4α, n h εar 2 r 1 = O1 and n r h 1 log n, θa+r 2 1 n ra+1 h log n = O1 as n where α >, ε 1, a>4.5, r>4 and θ > 2 were defined in Assumptions 8, 1, 11, 13 and 17, respectively, then n βh β d N, A where A = B 1 CB If in addition { Y i,x i1,...,x ip,t i } is strictly stationary, then lim sup n n 2 log log n βhj β j = ajj 2 a.s., where ajj = A jj Asymptotics for the functional nonparametric component Theorem 2. Under the assumptions used to obtain 19, we have that sup m h t mt = O h α log n + O a.s. 2 t C n h Theorem 3. Under the assumptions used to obtain 18, if in addition α 1 = α 2 see 9, ε = 1 see 1, nh 2α h as n and the conditions 1. s 2 u := Varm T i + ε i T i = u and s r u := E m T i + ε i mt r T i = u, u C, are independent of i and are continuous in some neighborhood of t, and su, v, t := E m T i + ε i mt m T j + εj mt T i = u, T j = v, i = j, u, v C, does not depend on i, j and is continuous in some neighborhood of t, t, 2. = and h is absolutely continuous in a neighborhood of the origin, 3. I j h C j as h j = 1, 2 for some positive constant C j, where 1 1 I j h = K j s hs ds, h /h and 4. α v n = o n/ h 1/2, where {v n } is a sequence of positive integers satisfying v n and v n = o n h 1/2 as n, hold, then where n h 1 2 m h t mt d N σ 2 t = C 2 s 2 t C1 2. α 1, σ 2 t,

9 842 G. Aneiros-Pérez, P. Vieu / Journal of Multivariate Analysis Remark 2. From Theorems 1 and 2 together with the results by Aneiros-Pérez and Vieu [2], we can say that the same rates of convergence are observed in the setting of mixing dependence as well as in the setting of independence but the effect of the dependence is present through the conditions imposed on the smoothing parameter h, i.e., through a and ε. Furthermore, from the proofs below it is easy to see that the condition n h εar 2 r 1 = O1 is the key condition to obtain these similar asymptotic behaviors. If this condition is removed, then the term [ ] log n n s O n hε 1 h 1+ε should be added to the right-hand side in 2 we have denoted s = r/ar 2a + r. Remark 3. Theorem 3 extends the asymptotic normality of the regression function estimator in the pure nonparametric functional regression model see [13] to the estimator of the nonparametric part in a SFPLR model. Similar asymptotic distributions are obtained. Note that the condition nh 2α h asn removes the bias of m h t, while the additional Condition 4 is satisfied provided h n = n b for some 2/1 + a<b<1. Finally, although Masry [13] used the additional Condition 3, it is clear that the corresponding expressions of the constants C 1 and C 2 are unfeasible in practice. Exact computation of these constants can be obtained by means of the same techniques used in Ferraty et al. [9], so the additional Condition 3 becomes a more interpretable condition. These techniques are not presented here to avoid deviating from the main purpose of our paper. 4. Applications 4.1. Introduction As in pure nonparametric functional modeling see [11, Chapter 13], the practical results stated below will put into light the fact that the topological structure of the space of functional variables is playing a crucial role. 
In any practical application of our procedure, it is necessary to control this topological structure, and this is achieved by choosing a suitable semi-metric to measure the proximity between two functional data. To illustrate this point clearly, we present in Section 4 two real data examples. The first one corresponds to the environmetrics data discussed before and is characterized by smooth functional paths (see Fig. 2), in such a way that a standard functional metric can be used in the procedure. The second example is of economic interest and is characterized by rough paths (due to the small number of observed points), so a more sophisticated modeling, involving functional principal component semi-metrics, will be necessary. We will see that with these two different real data examples, the semi-functional partial linear modeling gives good results.

4.2. Ozone concentration data

4.2.1. Presentation of the data

In this subsection, we are interested in forecasting future values of ozone concentration. The data consist of hourly measurements of ozone ($O_3$) concentration together with additional chemical measurements such as $NO_2$ and $SO_2$ concentrations ($\mu g/m^3$) in Getafe (Madrid, Spain) from May 15, 2005 to September 15, 2005 (124 days). Data are available on the website

Using the general notation introduced in Section 2.1, the original time series $Z_t = O_{3,t}$, $t = 1, \ldots, 2976$, is cut into 124 daily curves

$$Z_i = T_i = \{O_{3,24(i-1)+t},\ t \in [0, 24[\}, \quad i = 1, \ldots, 124.$$

The response variable is, for a fixed value of $s \in [0, 24[$,

$$G_s(Z_{i+1}) = Y_i^s = O_{3,24i+s}, \quad i = 1, \ldots, 123,$$

and the $p = 2$ additional real covariates are defined by

$$X_{i,1}^s = NO_{2,24(i-1)+s} \quad \text{and} \quad X_{i,2}^s = SO_{2,24(i-1)+s}, \quad i = 1, \ldots, 124.$$

B-splines with 1 knots were used to smooth the discretized curves.

4.2.2. Goals of the study

Our goal is to forecast the future values $Z_{n+1}(s)$ for each value of $s \in \{1, \ldots, 24\}$. Because these values $\{Z_{n+1}(s)\}_{s=1}^{24}$ form part of a test sample, we are able to compare the performance of the different models. To demonstrate the usefulness of the proposed model, several regression models (nonfunctional, functional and semi-functional) are used, and the comparisons are based on the quadratic errors

$$E_1(s) = \big(Z_{n+1}(s) - \hat Z_{n+1}(s)\big)^2$$

and the relative quadratic errors

$$E_2(s) = \frac{E_1(s)}{\mathrm{Var}(Z(s))},$$

where $\mathrm{Var}(Z(s))$ is the empirical variance of $\{Z_i(s)\}_{i=1}^{n}$.

4.2.3. Choosing the parameters of the SFPLR model

According to the general guidelines provided in Ferraty and Vieu [11], the smooth shape of the curves (see Fig. 2) suggests using standard $L_2$ semi-metrics. Precisely, we considered a class of semi-metrics based on the following semi-norms:

$$\|u\|_{a,q} = \Big( \int_a^{a+q} u^2(t)\,dt \Big)^{1/2},$$

where $a = 1, 2, \ldots, 24 - q$ and $q = 1, 2, \ldots, 23$. In addition, as motivated in Ferraty and Vieu [11], the values of $a$ and $q$ were selected by means of the cross-validation method. Similarly, the bandwidth $h$ was selected by cross-validation among a family of $k$-nearest neighbors-type bandwidths (see again [11]). The training samples $\{(Z_{i+1}(s), Z_i, X_{i,1}^s, X_{i,2}^s)\}_{i=1}^{122}$ ($s = 1, \ldots, 24$) were used for these selection procedures, while

the test samples $(Z_{123+1}(s), Z_{123}, X_{123,1}^s, X_{123,2}^s)$ allowed us to verify the quality of the prediction (the values $Z_{123+1}(s) = O_{3,24 \cdot 123 + s}$ were forecasted, $s = 1, \ldots, 24$).

Table 1. Statistical models used to predict the ozone concentrations

Nonfunctional models
  NP     $Z_{i+1}(s) = m_{1,s}(Z_i(s)) + \varepsilon_{1,i}$
  PL1    $Z_{i+1}(s) = X_{i,1}^s \beta_{1,s} + m_{2,s}(Z_i(s)) + \varepsilon_{2,i}$
  PL2    $Z_{i+1}(s) = X_{i,2}^s \beta_{2,s} + m_{3,s}(Z_i(s)) + \varepsilon_{3,i}$
  PL3    $Z_{i+1}(s) = X_{i,1}^s \beta_{3,s} + X_{i,2}^s \beta_{4,s} + m_{4,s}(Z_i(s)) + \varepsilon_{4,i}$
  AD1    $Z_{i+1}(s) = \mu_{1,s} + m_{5,s}(X_{i,1}^s) + m_{6,s}(Z_i(s)) + \varepsilon_{5,i}$
  AD2    $Z_{i+1}(s) = \mu_{2,s} + m_{7,s}(X_{i,2}^s) + m_{8,s}(Z_i(s)) + \varepsilon_{6,i}$
  AD3    $Z_{i+1}(s) = \mu_{3,s} + m_{9,s}(X_{i,1}^s) + m_{10,s}(X_{i,2}^s) + m_{11,s}(Z_i(s)) + \varepsilon_{7,i}$

Functional or semi-functional models
  FNP    $Z_{i+1}(s) = m_{12,s}(Z_i) + \varepsilon_{8,i}$
  SFPL1  $Z_{i+1}(s) = X_{i,1}^s \beta_{5,s} + m_{13,s}(Z_i) + \varepsilon_{9,i}$
  SFPL2  $Z_{i+1}(s) = X_{i,2}^s \beta_{6,s} + m_{14,s}(Z_i) + \varepsilon_{10,i}$
  SFPL3  $Z_{i+1}(s) = X_{i,1}^s \beta_{7,s} + X_{i,2}^s \beta_{8,s} + m_{15,s}(Z_i) + \varepsilon_{11,i}$
  SFAD1  $Z_{i+1}(s) = \mu_{4,s} + m_{16,s}(X_{i,1}^s) + m_{17,s}(Z_i) + \varepsilon_{12,i}$
  SFAD2  $Z_{i+1}(s) = \mu_{5,s} + m_{18,s}(X_{i,2}^s) + m_{19,s}(Z_i) + \varepsilon_{13,i}$
  SFAD3  $Z_{i+1}(s) = \mu_{6,s} + m_{20,s}(X_{i,1}^s) + m_{21,s}(X_{i,2}^s) + m_{22,s}(Z_i) + \varepsilon_{14,i}$

4.2.4. The results

The statistical models used to obtain the different predictions are reported in Table 1, while Fig. 3 gives the corresponding forecasted ozone concentrations. Two interesting facts can be seen in Fig. 3. On the one hand, the greatest differences between the predictions obtained by means of the two classes of models (the class of nonfunctional models and the class of functional or semi-functional ones) are found when one forecasts the ozone concentration in the first and last hours of the day. On the other hand, focusing on the class of functional or semi-functional models, we observe very bad behavior of both the functional nonparametric and the semi-functional partial linear models when $s = 6$ and $7$, in contrast to the good behavior of the semi-functional additive models.
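As a side illustration of the tools used in this example, the semi-norm $\|u\|_{a,q}$ of Section 4.2.3 and the error criteria $E_1(s)$ and $E_2(s)$ of Section 4.2.2 can be computed as in the sketch below. It assumes curves discretized on an increasing grid, a trapezoidal approximation of the integral and the variance with divisor $n$; these numerical choices and the function names are assumptions made for the sketch, not details given in the paper.

```python
import numpy as np


def seminorm_aq(u, grid, a, q):
    """Approximate (integral over [a, a+q] of u(t)^2 dt)^(1/2) by the trapezoidal rule."""
    u, grid = np.asarray(u, dtype=float), np.asarray(grid, dtype=float)
    mask = (grid >= a) & (grid <= a + q)
    t, v = grid[mask], u[mask] ** 2
    return float(np.sqrt(np.sum((v[1:] + v[:-1]) * np.diff(t)) / 2.0))


def quadratic_error(z_true, z_pred):
    """E_1(s): squared difference between observed and forecasted value."""
    return (z_true - z_pred) ** 2


def relative_quadratic_error(z_true, z_pred, z_history):
    """E_2(s): E_1(s) divided by the empirical variance of the past values at hour s."""
    return quadratic_error(z_true, z_pred) / np.var(z_history)


# Hypothetical usage on an hourly grid 1, ..., 24 with a constant curve.
grid = np.arange(1, 25, dtype=float)
u = np.ones(24)
norm_3_4 = seminorm_aq(u, grid, a=3, q=4)
```

The semi-metric between two curves is then obtained by applying `seminorm_aq` to their difference, so that candidate pairs $(a, q)$ can be compared by cross-validation.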
Hence, it appears that the flexibility of the semi-functional additive models is an advantage for these two particular predictions, but not for $s \in \{1, \ldots, 24\} \setminus \{6, 7\}$. To numerically compare the performance of the different models considered, Table 2 shows the corresponding mean values of the errors $E_1(s)$ and $E_2(s)$. These mean values were obtained over both $s \in \{1, \ldots, 24\}$ and $s \in \{1, \ldots, 24\} \setminus \{6, 7\}$. This table shows that any of the functional or semi-functional models considered is better than any of the nonfunctional ones and that, depending on the error criterion or on the values of $s$ considered, the best were the SFPL3, the SFAD1 or the

SFAD3 models. Finally, Table 3 shows, for each $s$, the values $a$ and $q$ (equivalently, the interval $(a, a + q)$) of the optimal semi-metric corresponding to the SFPL3 model.

Fig. 3. Forecasted ozone concentration data by means of nonfunctional (upper panels: true data with NP, AD1, AD2, AD3; true data with PL1, PL2, PL3) and functional or semi-functional (lower panels: true data with FNP, SFAD1, SFAD2, SFAD3; true data with SFPL1, SFPL2, SFPL3) models (horizontal axis: s).

4.3. Electricity consumption data

4.3.1. The data and the problem

We will now briefly present a second application. The main difference is the rough shape of the functional paths of the time series. The aim is to predict future values of electricity consumption ($C$). For this purpose, we have at our disposal both the US monthly electricity consumed by the residential and commercial sectors from January 1972 to January 2005 (397 months) and their annual average retail prices $P$ (cents per kilowatt-hour, including taxes; 33 years). Data are available on the websites and respectively. To treat the electricity consumption data, we eliminated the heteroscedasticity and the linear trend by differencing the ln data. Then, using the general notation introduced in Section 2.1, the

time series (see Fig. 4)

$$Z_t = \ln C_t - \ln C_{t-1}, \quad t = 1, \ldots, 396$$

Table 2. Mean value of the criterion error (quadratic error and relative quadratic error) for the nonfunctional models and for the functional or semi-functional models of Table 1, computed over (a) $s \in \{1, \ldots, 24\}$ and (b) $s \in \{1, \ldots, 24\} \setminus \{6, 7\}$.

Table 3. Optimal semi-metric $\|u\|_{a,q}$ for the model SFPL3 (columns: $s$, $a$, $q$).

is cut into 33 annual curves (see Fig. 5)

$$Z_i = T_i = \{Z_{12(i-1)+t},\ t \in [0, 12[\}, \quad i = 1, \ldots, 33.$$

Fig. 4. Electricity consumption: the differenced ln data (horizontal axis: time).

Fig. 5. Electricity consumption: 33 yearly differenced ln curves (horizontal axis: month).

The response variable is, for a fixed value of $s \in [0, 12[$,

$$G_s(Z_{i+1}) = Y_i^s = Z_{12i+s}, \quad i = 1, \ldots, 32$$

and the $p = 1$ additional real covariate is defined by $X_{i,1} = P_i$, $i = 1, \ldots, 33$.

Table 4. Statistical models used to predict electrical consumption

Nonfunctional models
  NP    $Z_{i+1}(s) = m_{1,s}(Z_i(s)) + \varepsilon_{1,i}$
  PL    $Z_{i+1}(s) = X_{i,1} \beta_{1,s} + m_{2,s}(Z_i(s)) + \varepsilon_{2,i}$
  AD    $Z_{i+1}(s) = \mu_{1,s} + m_{3,s}(X_{i,1}) + m_{4,s}(Z_i(s)) + \varepsilon_{3,i}$

Functional or semi-functional models
  FNP   $Z_{i+1}(s) = m_{5,s}(Z_i) + \varepsilon_{4,i}$
  SFPL  $Z_{i+1}(s) = X_{i,1} \beta_{2,s} + m_{6,s}(Z_i) + \varepsilon_{5,i}$
  SFAD  $Z_{i+1}(s) = \mu_{2,s} + m_{7,s}(X_{i,1}) + m_{8,s}(Z_i) + \varepsilon_{6,i}$

4.3.2. Choosing the parameters of the SFPLR model

We are in a situation in which the functional curves (see Fig. 5) are quite rough. So, it is not possible to use standard tools such as the $L_2$ distance (used, for instance, for the ozone concentration example) for measuring the proximity between two curves. As motivated in Section 3.4 of Ferraty and Vieu [11], we used a class of semi-metrics $\{d_q^{PCA}\}_{q=1}^{12}$ based on functional principal component ideas. Now, as before for the ozone concentration example, the parameters $q$ and $h$ were selected by cross-validation procedures. The training samples were $\{(Z_{i+1}(s), Z_i, X_{i,1})\}_{i=1}^{31}$, while the test samples were $(Z_{32+1}(s), Z_{32}, X_{32,1})$ ($s = 1, \ldots, 12$).

4.3.3. The results

The statistical models used to obtain the different predictions can be seen in Table 4, while Fig. 6 gives the corresponding forecasted electricity consumptions. Fig. 6 shows that the greatest differences between the predictions obtained by means of the two classes of models (the class of nonfunctional models and the class of functional or semi-functional ones) are found when $s = 1$, $9$ or $12$, that is, when one predicts the change in the electrical consumption from January to February, from August to September and from December to January, respectively. To numerically compare the performance of the different models considered, Table 5 shows the corresponding mean values of the errors $E_1(s)$ and $E_2(s)$ ($s \in \{1, \ldots, 12\}$).
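To make the semi-metrics $d_q^{PCA}$ of Section 4.3.2 more concrete, the following sketch computes pairwise distances between discretized curves from their projections on the first $q$ empirical principal directions. It is a simplified discrete version written under stated assumptions (unit grid spacing, uncentered empirical second-moment matrix); the exact definition should be taken from Ferraty and Vieu [11], and the function name `pca_semimetric` is hypothetical.

```python
import numpy as np


def pca_semimetric(curves, q):
    """Pairwise distances based on the first q empirical principal directions."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    second_moment = curves.T @ curves / n        # assumed: uncentered, unit grid spacing
    _, eigvec = np.linalg.eigh(second_moment)    # eigenvalues returned in ascending order
    directions = eigvec[:, ::-1][:, :q]          # keep the q leading eigenvectors
    scores = curves @ directions                 # coordinates of each curve
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Hypothetical rough curves: 33 yearly paths observed at 12 monthly points.
rng = np.random.default_rng(0)
curves = rng.normal(size=(33, 12))
D_q3 = pca_semimetric(curves, q=3)
```

Because only $q$ directions are kept, two different curves can be at distance zero, which is why this is a semi-metric rather than a metric; the resulting matrix can be passed to a kernel estimator in place of an $L_2$ distance matrix.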
As in the example of the ozone concentration, we observe that any of the functional or semi-functional models considered is better than any of the nonfunctional ones. Furthermore, the best was the SFPL model. Finally, Table 6 shows, for each $s$, the value $q$ of the optimal semi-metric corresponding to the SFPL model.

5. Proofs

Throughout this section, $C$ denotes a generic positive constant which may take different values from one formula to another. We first need to state some preliminary results.

16 G. Aneiros-Pérez, P. Vieu / Journal of Multivariate Analysis a TRUE DATA NP PL -.2 b s AD TRUE DATA FNP SFPL s SFAD Fig. 6. Forecasted electricity consumption data by means of nonfunctional upper side and functional or semi-functional lower side models Some technical lemmas Lemma 1 Theorem 1.32, Bosq [4]. Let {X k } be a zero-mean real-valued process such that sup 1 k n X k b. Then, for each integer q [1,n/2] and each ε> ε 2 P S n >nε 4exp 8v 2 q q b 1 [ ] 2 n qα, ε 2q where S n = n X k, v 2 q = 2 p 2 σ2 q + bε 2

17 85 G. Aneiros-Pérez, P. Vieu / Journal of Multivariate Analysis Table 5 Mean value of the criterion error Quadratic error Relative quadratic error Nonfunctional models NP PL AD Functional or semi-functional models FNP SFPL SFAD Table 6 Optimal semi-metric dq PCA for the SFPL model s q with p = n 2q and σ 2 q = max E [jp] + 1 jp X [jp]+1 + X [jp]+2 j 2q 1 [ and α n {X k }. 2q ] 2 + +X [j+1p] + j + 1p [j + 1p] X [j+1p+1], [ ] denotes the strong mixing coefficient of order n 2q corresponding to the process Lemma 2 Theorem 5, Oodaira and Yoshihara [14]. Let {V k } be a zero-mean, strictly stationary, α-mixing and real process, such that E V 1 2+δ < and n=1 αn 2+δ < < δ < δ and VarV Cov V 1,V 1+k >. Let S n = n V k and sn 2 = E Sn 2. Then, 1 lim sup S n / 2sn 2 log log 2 n s2 = 1 a.s. n Lemma 3. Let {V k } be a zero-mean, stationary, α-mixing and real process, such that for some r>4, max 1 k n E V k r C<. Assume that {a ik,i,k = 1,...,n} is a sequence of positive numbers such that max 1 i,k n a ik = O a n. If, in addition, n=1 n 41 γ α n <.5 < γ < 1, then max a ik V k = O a n n r log n a.s. 1 i n Remark 4. As a matter of fact, the conclusion of this Lemma remains unchanged when {a ik,i,,...,n} is a random sequence satisfying the conditions above almost surely. If the condition δ 5+4γ

18 G. Aneiros-Pérez, P. Vieu / Journal of Multivariate Analysis γ on the mixing coefficients is changed by n 41 γ α n asn, then the result of the Lemma holds in probability. Observe that Lemma 3 generalizes and corrects Proposition 1 in Avramidis [3]. { } Proof of Lemma 3. Let V k = V ki V k n 1 r and V k = V k V k. Considering b = Ca nn 1 r, q = [ n γ] and ε = a n n r log n in Lemma 1, we obtain that P max a 1 i n ik V k EV k >nε = O n 1 log n + n 5+4γ [ 4 α n 1 γ] 21 we have used that, as a consequence of the Billingsley inequality and the summability of the mixing coefficients αn, it verifies that v 2 q = O a 2n n 2 r +γ 1. On the other hand, the Strong Law of Large Numbers gives that n V k EV k 2 2 = Onr a.s. From this result together with the Hölder inequality we obtain that max a 1 i n ik V k EV k = O a n n r a.s. 22 Now, 21, 22 and the Borell Cantelli Lemma yield the result of the Lemma. Lemma 4. Under Assumptions 5 7 and 9, if in addition {T i } are identically distributed and come from some α-mixing process whose mixing coefficients {α n} verify and and α n cn a for some a > 1, a +1 s 2 n,1 = on θ for some θ > θ 2 n a +1 h 2 1 log n = O1, 24 then max 1 i,j n w n,h T i,t j = O n h 1 a.s. Proof of Lemma 4. We can write 1 neδ t K d t,t j /h w n,h t, T j =, 1 n K d t,t k /h neδ t

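where we have denoted $\Delta_t = K\big(d(t,T_1)/h\big)$. According to Ferraty and Vieu [10] we have a uniform almost sure rate for

$$\sup_{t \in C} \Big| \frac{1}{n E\Delta_t} \sum_{k=1}^{n} K\big(d(t,T_k)/h\big) - 1 \Big| \tag{25}$$

[rate illegible in source: "1 sn,1 ... = O log n n h"], and

$$\inf_{t \in C} E\Delta_t \ge C\phi(h) > 0. \tag{26}$$

Now, (25) and (26), together with the boundedness of $K$, give the result of the Lemma.

Lemma 5. Under Assumptions 5–10 and 15 ($m$ not included in 8), if in addition $h \to 0$, $\log n/(n\phi(h)) \to 0$ and a third bandwidth condition holds [illegible in source: "n h εar 2 r 1 = O1"] as $n \to \infty$ (where $a > 1$ and $r > 4$, and ε 1 was defined in Assumption 1), and

1. $\{(X_{i1},\dots,X_{ip},T_i)\}$ come from some stationary $\alpha$-mixing process whose mixing coefficients are $\alpha(n) \le c n^{-a}$,
2. $\max_{1 \le i \le n} \big( E|X_{i1}|^r + \cdots + E|X_{ip}|^r \big) \le C < \infty$, and
3. $\sup_{t \in C} \big( s_{n,1}(t) + s_{n,3}(t) \big)^{-r(a+1)/(2(a+r))} = o(n^{-\theta})$, for some $\theta > 2$,

then we have that $n^{-1} \tilde{X}^T \tilde{X} \to B$ a.s.

Proof of Lemma 5. The $(r,s)$th element of $n^{-1} \tilde{X}^T \tilde{X}$ can be written as

$$\big( n^{-1} \tilde{X}^T \tilde{X} \big)_{rs} = n^{-1} \sum_{k=1}^{n} \Big( \eta_{kr}\eta_{ks} + \tilde{g}_{rh}(T_k)\,\eta_{ks} + \tilde{g}_{sh}(T_k)\,\eta_{kr} + \tilde{g}_{rh}(T_k)\,\tilde{g}_{sh}(T_k) \Big), \tag{27}$$

where we have denoted $\tilde{g}_{jh}(t) = g_j(t) - \hat{g}_{jh}(t)$ with $\hat{g}_{jh}(t) = \sum_{i=1}^{n} w_{n,h}(t, T_i) X_{ij}$. Now, the Strong Law of Large Numbers gives that

$$n^{-1} \sum_{k=1}^{n} \eta_{kr}\eta_{ks} \to B_{rs} \quad \text{a.s.} \tag{28}$$

In particular,

$$n^{-1} \sum_{k=1}^{n} \eta_{ks}^2 = O(1) \quad \text{a.s.} \tag{29}$$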

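Furthermore, by applying directly the results in Ferraty and Vieu [10] and taking into account Assumption 1 and conditions on $h$, we can see that

$$\max_{1 \le j \le p} \max_{1 \le i \le n} \big| \tilde{g}_{jh}(T_i) \big| = O(h^{\alpha}) + O\Big( \sqrt{\frac{\log n}{n\phi(h)}} \Big) \quad \text{a.s.} \tag{30}$$

Note that in Ferraty and Vieu [10] Assumption 6 was forgotten but it played an obvious key role in the proof. This is the reason why we included it in this paper. We conclude this proof by using (27)–(30), the assumptions made on $h$ and $\phi(h)$, and the Cauchy-Schwarz inequality.

Proof of Theorem 1

Proof of (18). We can write

$$\sqrt{n}\,(\hat{\beta}_h - \beta) = \big( n^{-1} \tilde{X}^T \tilde{X} \big)^{-1} n^{-1/2} \Big\{ \sum_{i=1}^{n} \tilde{X}_i \tilde{m}_h(T_i) - \sum_{i=1}^{n} \tilde{X}_i \sum_{l=1}^{n} w_{n,h}(T_i,T_l)\,\varepsilon_l + \sum_{i=1}^{n} \tilde{X}_i \varepsilon_i \Big\} = \big( n^{-1} \tilde{X}^T \tilde{X} \big)^{-1} n^{-1/2} (S_{n1} - S_{n2} + S_{n3}), \tag{31}$$

where we have denoted $\tilde{m}_h(T_i) = m(T_i) - \sum_{j=1}^{n} w_{n,h}(T_i,T_j)\, m(T_j)$. Furthermore, taking into account the decomposition

$$\tilde{X}_{ij} = \tilde{g}_{jh}(T_i) + \eta_{ij} - \sum_{k=1}^{n} w_{n,h}(T_i,T_k)\,\eta_{kj} \quad (i = 1,\dots,n,\ j = 1,\dots,p),$$

where $\tilde{X}_{ij}$ is the $(i,j)$th element of $\tilde{X}$, each $j$-component $(j = 1,\dots,p)$ of $S_{nk}$ $(k = 1, 2, 3)$ can be written as the sum of three summands $S_{nk,j1}$, $S_{nk,j2}$ and $S_{nk,j3}$. Results (36)–(43) and (45) below give the asymptotic behavior of these summands.

Considering $a_{ik} = w_{n,h}(T_i,T_k)$, $a_n = (n\phi(h))^{-1}$ (see Lemma 4 with $a = r(a+1)/(a+r) - 1$), $V_k = \varepsilon_k$ ($V_k = \eta_{kj}$) and $0.5 < \gamma < 1 - 9/(4a)$ in Lemma 3, we obtain that

$$\max_{i} \Big| \sum_{k=1}^{n} w_{n,h}(T_i,T_k)\,\varepsilon_k \Big| = O\big( \phi(h)^{-1} n^{-1/2+1/r} \log n \big) \quad \text{a.s.} \tag{32}$$

and

$$\max_{i} \Big| \sum_{k=1}^{n} w_{n,h}(T_i,T_k)\,\eta_{kj} \Big| = O\big( \phi(h)^{-1} n^{-1/2+1/r} \log n \big) \quad \text{a.s.} \tag{33}$$

Furthermore, (30) and the analogous result for $m$, together with (32) and (33), give

$$\max_{i} \big| \tilde{m}_h(T_i) \big| = O(h^{\alpha}) + O\Big( \sqrt{\frac{\log n}{n\phi(h)}} \Big) + O\big( \phi(h)^{-1} n^{-1/2+1/r} \log n \big) = O(h^{\alpha}) + O\big( \phi(h)^{-1} n^{-1/2+1/r} \log n \big) \quad \text{a.s.} \tag{34}$$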

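and

$$\max_{i,j} \big| \tilde{g}_{jh}(T_i) \big| = O(h^{\alpha}) + O\big( \phi(h)^{-1} n^{-1/2+1/r} \log n \big) \quad \text{a.s.} \tag{35}$$

Now, from (34), (35) and the Abel inequality, we obtain that

$$|S_{n1,j1}| = \Big| \sum_{i=1}^{n} \tilde{g}_{jh}(T_i)\,\tilde{m}_h(T_i) \Big| \le n \max_{i} \big| \tilde{g}_{jh}(T_i) \big| \max_{i} \big| \tilde{m}_h(T_i) \big| = O\big( n h^{2\alpha} \big) + O\big( \phi(h)^{-2} n^{2/r} \log^2 n \big) \quad \text{a.s.} \tag{36}$$

Considering $a_{ki} = \tilde{m}_h(T_i)$, $a_n = h^{\alpha} + \phi(h)^{-1} n^{-1/2+1/r} \log n$ (see (34)), $V_i = \eta_{ij}$ and $0.5 < \gamma < 1 - 9/(4a)$ in Lemma 3, we obtain that

$$S_{n1,j2} = \sum_{i=1}^{n} \tilde{m}_h(T_i)\,\eta_{ij} = O\big( h^{\alpha} n^{1/2+1/r} \log n + \phi(h)^{-1} n^{2/r} \log^2 n \big) \quad \text{a.s.} \tag{37}$$

Furthermore, Abel's inequality, (33) and (34) give

$$|S_{n1,j3}| = \Big| \sum_{i=1}^{n} \tilde{m}_h(T_i) \sum_{k=1}^{n} w_{n,h}(T_i,T_k)\,\eta_{kj} \Big| \le n \max_{i} \big| \tilde{m}_h(T_i) \big| \max_{i} \Big| \sum_{k=1}^{n} w_{n,h}(T_i,T_k)\,\eta_{kj} \Big| = O\big( h^{\alpha} \phi(h)^{-1} n^{1/2+1/r} \log n + \phi(h)^{-2} n^{2/r} \log^2 n \big) \quad \text{a.s.} \tag{38}$$

By means of a similar reasoning as that used to obtain (38), but using (32) and (35) instead of (33) and (34), respectively, we have that

$$S_{n2,j1} = \sum_{i=1}^{n} \tilde{g}_{jh}(T_i) \sum_{l=1}^{n} w_{n,h}(T_i,T_l)\,\varepsilon_l = O\big( h^{\alpha} \phi(h)^{-1} n^{1/2+1/r} \log n + \phi(h)^{-2} n^{2/r} \log^2 n \big) \quad \text{a.s.} \tag{39}$$

On the other hand, considering $a_{ki} = \sum_{l=1}^{n} w_{n,h}(T_i,T_l)\,\varepsilon_l$, $a_n = \phi(h)^{-1} n^{-1/2+1/r} \log n$ (see (32)), $V_i = \eta_{ij}$ and $0.5 < \gamma < 1 - 9/(4a)$ in Lemma 3, we obtain that

$$S_{n2,j2} = \sum_{i=1}^{n} \Big( \sum_{l=1}^{n} w_{n,h}(T_i,T_l)\,\varepsilon_l \Big) \eta_{ij} = O\big( \phi(h)^{-1} n^{2/r} \log^2 n \big) \quad \text{a.s.} \tag{40}$$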

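Furthermore, Abel's inequality, (32) and (33) give

$$|S_{n2,j3}| = \Big| \sum_{i=1}^{n} \Big( \sum_{l=1}^{n} w_{n,h}(T_i,T_l)\,\varepsilon_l \Big) \Big( \sum_{k=1}^{n} w_{n,h}(T_i,T_k)\,\eta_{kj} \Big) \Big| \le n \max_{i} \Big| \sum_{k=1}^{n} w_{n,h}(T_i,T_k)\,\eta_{kj} \Big| \max_{i} \Big| \sum_{l=1}^{n} w_{n,h}(T_i,T_l)\,\varepsilon_l \Big| = O\big( \phi(h)^{-2} n^{2/r} \log^2 n \big) \quad \text{a.s.} \tag{41}$$

By means of a similar reasoning as that used to obtain (37), we have that

$$S_{n3,j1} = \sum_{i=1}^{n} \tilde{g}_{jh}(T_i)\,\varepsilon_i = O\big( h^{\alpha} n^{1/2+1/r} \log n + \phi(h)^{-1} n^{2/r} \log^2 n \big) \quad \text{a.s.} \tag{42}$$

By means of a similar reasoning as that used to obtain (40), we have that

$$S_{n3,j3} = \sum_{i=1}^{n} \Big( \sum_{k=1}^{n} w_{n,h}(T_i,T_k)\,\eta_{kj} \Big) \varepsilon_i = O\big( \phi(h)^{-1} n^{2/r} \log^2 n \big) \quad \text{a.s.} \tag{43}$$

Now, (36)–(43), together with the facts that $n h^{4\alpha} \to 0$ and $n^{1/r-1/4} \phi(h)^{-1} \log n \to 0$ as $n \to \infty$, give

$$S_{n1} - S_{n2} + S_{n3} = \sum_{i=1}^{n} \eta_i \varepsilon_i + o(n^{1/2}) \quad \text{a.s.} \tag{44}$$

Furthermore, taking into account the facts that $\{\eta_i \varepsilon_i\}$ is $\alpha$-mixing with mixing coefficients $\alpha_{\eta\varepsilon}(n) \le \alpha(n)$ and $\sum_{n=1}^{\infty} \alpha(n)^{\delta/(2+\delta)} < \infty$ (where $2/(a-1) < \delta \le r-2$), a central limit theorem (see [12]) gives

$$n^{-1/2} \sum_{i=1}^{n} \eta_i \varepsilon_i \xrightarrow{d} N(0, C), \tag{45}$$

whose $j$-th component is $n^{-1/2} S_{n3,j2}$ (remember that $C = \lim_{n \to \infty} n^{-1} E\big( \eta^T V_{\varepsilon}\, \eta \big)$). Now, Lemma 5, (31), (44) and (45) give part (i) of the Theorem.

Proof of (19). By means of (31) and (44), together with Lemma 5, it is easy to see that

$$\hat{\beta}_h - \beta = \big( n^{-1} \tilde{X}^T \tilde{X} \big)^{-1} n^{-1} (S_{n1} - S_{n2} + S_{n3}) = \big( B^{-1} + o(1) \big) \Big( n^{-1} \sum_{i=1}^{n} \eta_i \varepsilon_i + o(n^{-1/2}) \Big) \quad \text{a.s.} \tag{46}$$

Now, we will study the term

$$\Big( \frac{1}{2 n \log\log n} \Big)^{1/2} \Big| \sum_{i=1}^{n} b_j^T \eta_i \varepsilon_i \Big|,$$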

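where we have denoted $b_j = (b_{j1},\dots,b_{jp})^T$, with $b_{jk} = (B^{-1})_{jk}$. For this, we will use Lemma 2, considering $V_i = b_j^T \eta_i \varepsilon_i$ and $2/(a-1) < \delta' < \delta \le r-2$ (observe that, as a consequence of Assumption 12 and the fact that $C$ is a positive definite matrix, the condition on the autocovariance function in Lemma 2 holds). Taking into account that

$$E(S_n^2) = \mathrm{Var}\Big( \sum_{i=1}^{n} b_j^T \eta_i \varepsilon_i \Big) = b_j^T\, \mathrm{Var}\Big( \sum_{i=1}^{n} \eta_i \varepsilon_i \Big) b_j = n\, b_j^T C\, b_j\, (1 + o(1)) = n\, a_{jj}\, (1 + o(1)),$$

Lemma 2 gives

$$\limsup_{n \to \infty} \Big( \frac{1}{2 n \log\log n} \Big)^{1/2} \Big| \sum_{i=1}^{n} b_j^T \eta_i \varepsilon_i \Big| = a_{jj}^{1/2} \quad \text{a.s.} \tag{47}$$

Now, (46) and (47) give part (ii) of the Theorem.

Proof of Theorem 2

We can write

$$\hat{m}_h(t) = \sum_{i=1}^{n} w_{n,h}(t, T_i) \big( m(T_i) + \varepsilon_i \big) - \sum_{i=1}^{n} w_{n,h}(t, T_i)\, X_i^T (\hat{\beta}_h - \beta). \tag{48}$$

Therefore, we have that

$$\sup_{t \in C} \big| \hat{m}_h(t) - m(t) \big| \le \sup_{t \in C} \big| \bar{m}_h(t) - m(t) \big| + \sup_{t \in C} \Big\| \sum_{i=1}^{n} w_{n,h}(t, T_i)\, X_i^T \Big\| \, \big\| \hat{\beta}_h - \beta \big\|,$$

where $\|\cdot\|$ denotes the Euclidean norm and $\bar{m}_h(t) = \sum_{i=1}^{n} w_{n,h}(t, T_i) \big( m(T_i) + \varepsilon_i \big)$. Now, the results in Ferraty and Vieu [10] together with (19) are enough to conclude this proof.

Proof of Theorem 3

From (48), we have that

$$(n\phi(h))^{1/2} \big( \hat{m}_h(t) - m(t) \big) = (n\phi(h))^{1/2} \Big( \sum_{i=1}^{n} w_{n,h}(t, T_i) \big( m(T_i) + \varepsilon_i \big) - m(t) \Big) - (n\phi(h))^{1/2} \sum_{i=1}^{n} w_{n,h}(t, T_i)\, X_i^T (\hat{\beta}_h - \beta) \equiv S_1(t) - S_2(t).$$

Now, Corollary 2 in Masry [13] establishes that $S_1(t) \xrightarrow{d} N(0, \sigma^2(t))$, while (18) gives $S_2(t) = o(1)$ a.s. These results conclude this proof.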
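The quantities that appear in (31) and (48) can be illustrated numerically. The following is a minimal Python sketch, not the authors' implementation: it assumes that the estimator of $\beta$ is the least-squares fit of the kernel-smoothed residuals of $Y$ on those of $X$, and that $m$ is then estimated by smoothing $Y_i - X_i^T \hat{\beta}_h$ with the weights $w_{n,h}$. The function names (`semimetric_l2`, `kernel_weights`, `fit_sfpl`, `simulate`), the quadratic kernel, the L2-type semi-metric, the bandwidth and the simulated data are all hypothetical choices made for illustration only.

```python
import numpy as np


def semimetric_l2(curves_a, curves_b):
    """Pairwise L2-type distances between discretized curves stored as rows."""
    diff = curves_a[:, None, :] - curves_b[None, :, :]
    return np.sqrt(np.mean(diff ** 2, axis=2))


def kernel_weights(dist, h):
    """Weights w_{n,h}(t, T_j) = K(d(t,T_j)/h) / sum_k K(d(t,T_k)/h)."""
    u = dist / h
    # Assumed kernel (illustrative choice): quadratic, supported on [0, 1].
    k = np.where(u <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    denom = k.sum(axis=1, keepdims=True)
    if np.any(denom <= 0.0):
        raise ValueError("bandwidth h too small: a curve has no neighbours")
    return k / denom


def fit_sfpl(X, T, y, h):
    """Fit y_i = X_i^T beta + m(T_i) + eps_i by kernel partialling-out."""
    W = kernel_weights(semimetric_l2(T, T), h)
    X_tilde = X - W @ X  # X minus its kernel smooth on T
    y_tilde = y - W @ y  # y minus its kernel smooth on T
    beta_hat, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)

    def m_hat(T_new):
        # Smooth the partial residuals y - X beta_hat, as in (48).
        W_new = kernel_weights(semimetric_l2(T_new, T), h)
        return W_new @ (y - X @ beta_hat)

    return beta_hat, m_hat


def simulate(n=200, grid_size=50, seed=0):
    """Hypothetical data: sine curves with random amplitude, two covariates."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, grid_size)
    a = rng.uniform(0.0, 1.0, size=n)
    T = a[:, None] * np.sin(2.0 * np.pi * s)[None, :]
    X = rng.normal(size=(n, 2))
    beta = np.array([1.0, -2.0])
    m = np.mean(T ** 2, axis=1)  # nonparametric component m(T_i)
    y = X @ beta + m + 0.1 * rng.normal(size=n)
    return X, T, y, beta


X, T, y, beta = simulate()
beta_hat, m_hat = fit_sfpl(X, T, y, h=0.1)
print("beta_hat:", beta_hat)
print("m_hat at the first 3 curves:", m_hat(T[:3]))
```

The sketch uses independent simulated observations, so it does not exercise the mixing conditions under which the results above are proved; it only shows how the weights, the smoothed residuals and the two estimates fit together.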

Acknowledgments

Philippe Vieu wishes to thank all the participants of the working group STAPH on Functional Statistics at the Paul Sabatier University of Toulouse for their stimulating and continuous helpful comments, and special thanks to Frédéric Ferraty. The activities of this group are available online. The research of the first author was supported in part by MEC Grant (ERDF included) MTM

References

[1] G. Aneiros-Pérez, W. González-Manteiga, P. Vieu, Estimation and testing in a partial regression model under long-memory dependence, Bernoulli.
[2] G. Aneiros-Pérez, P. Vieu, Semi-functional partial linear regression, Statist. Probab. Lett.
[3] P. Avramidis, Two-step cross-validation selection method for partially linear models, Statist. Sinica.
[4] D. Bosq, Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, second ed., Lecture Notes in Statistics, vol. 110, Springer, Berlin.
[5] D. Bosq, Linear Processes in Function Spaces. Estimation and Prediction, Lecture Notes in Statistics, vol. 149, Springer, Berlin, 2000.
[6] R. Engle, C. Granger, J. Rice, A. Weiss, Nonparametric estimates of the relation between weather and electricity sales, J. Amer. Statist. Assoc.
[7] J. Fan, Q. Yao, Nonlinear Time Series. Nonparametric and Parametric Methods, Springer Series in Statistics, Springer, New York, 2003.
[8] F. Ferraty, A. Goia, P. Vieu, Functional nonparametric model for time series: a fractal approach to dimension reduction, Test.
[9] F. Ferraty, A. Mas, P. Vieu, Nonparametric regression on functional data: inference and practical aspects, Aust. N.Z. J. Statist. 2007, doi:1.1111/j x x.
[10] F. Ferraty, P. Vieu, Nonparametric models for functional data, with applications in regression, curves discrimination and time series prediction, J. Nonparametric Statist.
[11] F. Ferraty, P. Vieu, Nonparametric Functional Data Analysis, Springer Series in Statistics, Springer, New York, 2006.
[12] I.A. Ibragimov, Some limit theorems for stationary processes, Theoret. Probab. Appl.
[13] E. Masry, Nonparametric regression estimation for dependent functional data: asymptotic normality, Stochastic Process. Appl.
[14] H. Oodaira, K.I. Yoshihara, The law of the iterated logarithm for stationary processes satisfying mixing conditions, Kodai Math. Sem. Rep.
[15] P. Speckman, Kernel smoothing in partial linear models, J. Roy. Statist. Soc. Ser. B.
