Modelling Non-linear and Non-stationary Time Series


1 Modelling Non-linear and Non-stationary Time Series
Chapter 2: Non-parametric methods
Henrik Madsen, Advanced Time Series Analysis, September 2016
Henrik Madsen (02427 Adv. TS Analysis), Lecture Notes, September 2016, slide 1 / 45

2 Kernel estimators in time series analysis: Non-parametric estimation

Non-parametric estimation is free of parameters and model structure (apart from a smoothing constant). It is useful if no prior information about the structure is available; the regression curve is a direct function of the data. Important methods: kernel estimation, splines, and nearest neighbour.

3 Introduction: Non-parametric estimation

Assume that Y has PDF f(y); then

f(y) = lim_{h→0} P(y − h < Y ≤ y + h) / (2h).

Assume further that we have N observations. An estimator of f(y) is

f̂(y) = [number of observations in (y − h, y + h]] / (2hN) = (1/(2hN)) Σ_{i=1}^N χ_{(y−h, y+h]}(Y_i),

where χ is the characteristic (indicator) function:

χ_{(y−h, y+h]}(Y_i) = 1 if Y_i ∈ (y − h, y + h], and 0 otherwise.
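The counting estimator can be coded directly. The sketch below is illustrative only (data, bandwidth, and function name are not from the slides; for continuous data the open/closed endpoint convention makes no difference):

```python
import numpy as np

def naive_density(y, data, h):
    """Counting estimator f-hat(y): fraction of observations within h of y, scaled by 2h."""
    inside = np.abs(data - y) <= h            # the indicator chi for each observation
    return inside.mean() / (2.0 * h)

rng = np.random.default_rng(0)
sample = rng.standard_normal(100_000)         # N = 100,000 draws from N(0, 1)
est = naive_density(0.0, sample, h=0.1)       # true density at 0 is 1/sqrt(2 pi) ~ 0.399
```

With this many observations and a small bandwidth, the estimate lands close to the true standard normal density at zero.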

4 Introduction: Non-parametric estimation

Define

w(u) = 1/2 if |u| < 1, and 0 otherwise.

Then

f̂(y) = (1/(Nh)) Σ_{t=1}^N w((y − Y_t)/h).

The general kernel estimator replaces w by any kernel k with ∫ k(u) du = 1:

f̂(y) = (1/(Nh)) Σ_{t=1}^N k((y − Y_t)/h).

5 The kernel estimator

Assume {Y_t}, {Z_t} are strongly (or strictly) stationary processes (Madsen 2008). The basic quantity estimated is then (Robinson 1983):

E[G(Z_t) | Y_t = y] f_{Y_t}(y)    (*)

where f_{Y_t}(y) is the PDF of Y_t. The estimator of (*) is given by

[G(Z_t); y] = (Nh)^{−1} Σ_{t=1}^N G(Z_t) k((y − Y_t)/h).

6 The kernel estimator: examples

Example 1:

f̂_Y(y) = [1; y] = (1/(Nh)) Σ_{t=1}^N k((y − Y_t)/h).

Example 2:

Ê[G(Z_t) | Y_t = y] = [G(Z_t); y] / [1; y] = Σ_t G(Z_t) k((y − Y_t)/h) / Σ_t k((y − Y_t)/h).

Notice that the special case G(Z_t) = Y_{t+1} leads to estimation of E[Y_{t+1} | Y_t = y].

7 The kernel estimator

Assume that the theoretical relation is

Y = g(X^(1), ..., X^(q)) + ε,

where ε is white noise and g is an unknown continuous function. Given N observations

O = {(X_1^(1), ..., X_1^(q), Y_1), ..., (X_N^(1), ..., X_N^(q), Y_N)},

the goal is now to estimate the function g. If q = 1 the kernel estimator for g given the observations is

ĝ(x) = [N^{−1} Σ_{s=1}^N Y_s k{h^{−1}(x − X_s)}] / [N^{−1} Σ_{s=1}^N k{h^{−1}(x − X_s)}].
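The q = 1 estimator transcribes almost line for line into code. A minimal sketch with a Gaussian kernel, where the data, bandwidth, and function name are chosen only for illustration (the kernel's normalising constant cancels between numerator and denominator, so it is omitted):

```python
import numpy as np

def nw_estimate(x, X, Y, h):
    """Nadaraya-Watson estimate ghat(x): kernel-weighted average of the Y_s.
    Gaussian kernel; its normalising constant cancels in the ratio."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, 2000)                   # illustrative regressor
Y = np.sin(X) + 0.1 * rng.standard_normal(2000)    # Y = g(X) + eps with g = sin
ghat = nw_estimate(1.0, X, Y, h=0.2)               # should lie close to sin(1) ~ 0.84
```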

8 The kernel estimator

It is shown in Robinson (1983) that, under reasonable assumptions about the stochastic process {X_t} and the kernel, the kernel estimator converges in probability to the conditional expectation of Y. More precisely:

ĝ(x) →_p E[Y | X = x] as N → ∞, h → 0, Nh → ∞.

Then at every point of continuity of g(x):

ĝ(x) →_p g(x).

9 Kernel functions

The function k is the kernel, and h is called the bandwidth of the kernel. A kernel is a real and bounded function which satisfies ∫ k(u) du = 1. The kernel with bandwidth h is given by

K_h(u) = h^{−1} k(u/h).

Commonly used kernels are the Gaussian, Epanechnikov, rectangular, and triangular kernels.

10 Important kernels

The Epanechnikov kernel with bandwidth h is given by:

K_h^Epa(u) = (3/(4h)) (1 − u²/h²) I{|u| ≤ h}.

The Gaussian kernel with bandwidth h is given by:

K_h^Gau(u) = (1/(h √(2π))) exp(−u²/(2h²)).
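The two kernel formulas can be sketched as plain functions; a numerical check (a simple Riemann sum, chosen here for illustration) confirms that both integrate to one for an arbitrary bandwidth:

```python
import numpy as np

def epanechnikov(u, h):
    """K_h^Epa(u) = 3/(4h) (1 - u^2/h^2) for |u| <= h, and 0 outside the support."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= h, 0.75 / h * (1.0 - (u / h) ** 2), 0.0)

def gaussian(u, h):
    """K_h^Gau(u) = (1 / (h sqrt(2 pi))) exp(-u^2 / (2 h^2))."""
    u = np.asarray(u, dtype=float)
    return np.exp(-(u ** 2) / (2.0 * h ** 2)) / (h * np.sqrt(2.0 * np.pi))

# Both kernels integrate to one for any bandwidth; check numerically on a fine grid.
grid = np.linspace(-5.0, 5.0, 20001)
dx = grid[1] - grid[0]
area_epa = epanechnikov(grid, 0.7).sum() * dx
area_gau = gaussian(grid, 0.7).sum() * dx
```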

11 Kernel functions: illustration

[Figure: the triangular, Epanechnikov, uniform, and Gaussian kernels K(u) plotted against u.]

12 Nearest Neighbour (NN) estimation

Previously, the bandwidth h was fixed. Another (and often better) alternative is k-NN (k Nearest Neighbour); note that here k is a natural number, not the kernel. Since it is reasonable to link the value of k to N, we often consider a fraction (or percentage) of the data; the symbol ħ is often used in this case.

13 Multivariate kernels

If we allow q > 1 in the model above, a kernel of q'th order has to be used, i.e. a real bounded function k_q(u_1, ..., u_q) with ∫ k_q(u) du = 1. By generalizing the bandwidth h to a quadratic (q × q) matrix h, the more general kernel estimator for g(X^(1), ..., X^(q)) is

ĝ(x) = [N^{−1} Σ_{s=1}^N Y_s k_q((x − X_s) h^{−1})] / [N^{−1} Σ_{s=1}^N k_q((x − X_s) h^{−1})],

where x = (x_1, ..., x_q) and X_s = (X_s^(1), ..., X_s^(q)). This is called the Nadaraya-Watson estimator.

14 Multivariate kernels: example

If k_q is parameterized using the normal distribution N_q(0, I), it is seen that |h|^{−1} k_q((x − z) h^{−1}) is the value at the point x of the q-dimensional normal density function with mean z and covariance h h^T. Hence it is possible to take correlations in the data into account:

Y ∼ N(0, I), Y = (X − z) h^{−1}  ⇒  X ∼ N(z, h h^T).

15 Product kernel

In practice, product kernels are used, i.e.

k_q(x h^{−1}) = Π_{i=1}^q k(x_i / h_i),

where k is a kernel of order 1. Assuming that h_i = h, i = 1, ..., q, the following generalization of the one-dimensional case is obtained:

ĝ(x) = [N^{−1} Σ_{s=1}^N Y_s Π_{i=1}^q k{h^{−1}(x_i − X_s^(i))}] / [N^{−1} Σ_{s=1}^N Π_{i=1}^q k{h^{−1}(x_i − X_s^(i))}].

In the case of a Gaussian kernel, this corresponds to a diagonal covariance matrix.
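A product-kernel estimator is a one-line extension of the univariate case: the factor kernels multiply, so with Gaussian factors the log-weights simply add across coordinates. A sketch for q = 2 (data, bandwidth, and names are illustrative, not from the slides):

```python
import numpy as np

def nw_product(x, X, Y, h):
    """Product-kernel Nadaraya-Watson estimate at the point x.
    Gaussian factor kernels with the same bandwidth h in every coordinate,
    i.e. the diagonal-covariance case noted on the slide."""
    U = (x[None, :] - X) / h                       # (N, q) scaled coordinate differences
    w = np.exp(-0.5 * (U ** 2).sum(axis=1))        # product of the q Gaussian factors
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(4000, 2))         # q = 2 illustrative regressors
Y = X[:, 0] ** 2 + X[:, 1] + 0.05 * rng.standard_normal(4000)
ghat = nw_product(np.array([0.3, -0.2]), X, Y, h=0.15)   # true g(0.3, -0.2) = -0.11
```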

16 Non-parametric LS

If we define the weight

w_s(x) = Π_{i=1}^q k{h^{−1}(x_i − X_s^(i))} / [N^{−1} Σ_{s=1}^N Π_{i=1}^q k{h^{−1}(x_i − X_s^(i))}],

then it is seen that the estimator corresponds to the local average

ĝ(x) = N^{−1} Σ_{s=1}^N w_s(x) Y_s.

17 Non-parametric LS, continued

The estimate is actually a non-parametric least squares estimate at the point x. This is recognized from the fact that the solution to the weighted least squares problem

arg min_θ N^{−1} Σ_{s=1}^N w_s(x)(Y_s − θ)²

is given by

θ̂_LS(x) = Σ_{s=1}^N w_s(x) Y_s / Σ_{s=1}^N w_s(x).

18 Non-parametric LS, continued

This shows that at each x, ĝ is essentially a weighted LS location estimate, i.e.

ĝ(x) = θ̂_LS(x) N^{−1} Σ_{s=1}^N w_s(x) = θ̂_LS(x),

since the weights are normalised so that N^{−1} Σ_s w_s(x) = 1. The kernel estimate is equivalent to a local weighted least squares estimate.
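The equivalence can be verified numerically: the kernel-weighted average and the brute-force minimiser of the weighted squared loss agree up to grid resolution. A small sketch under illustrative data (the grid search stands in for the closed-form solution only to make the equivalence visible):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, 500)
Y = 2.0 * X + 0.1 * rng.standard_normal(500)
x0, h = 0.5, 0.1
w = np.exp(-0.5 * ((x0 - X) / h) ** 2)             # (unnormalised) kernel weights at x0

# The kernel (Nadaraya-Watson) estimate at x0 ...
g_kernel = np.sum(w * Y) / np.sum(w)

# ... and the minimiser of sum_s w_s(x0) (Y_s - theta)^2, found by brute force on a grid
thetas = np.linspace(Y.min(), Y.max(), 10001)
loss = (w[:, None] * (Y[:, None] - thetas[None, :]) ** 2).sum(axis=0)
g_wls = thetas[np.argmin(loss)]
```

The two numbers coincide to within the grid spacing, which is the WLS interpretation of the slide in action.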

19 Variance of the non-parametric estimates

To assess the variance of the curve estimate at the point x, Härdle (1990) proposes the point-wise estimator

σ̂²(x) = N^{−1} Σ_{s=1}^N w_s(x)(Y_s − ĝ(X_s))²,

where the weights w_s(x) are given as shown previously. Remembering the WLS interpretation, this estimate seems reasonable.
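This point-wise variance estimate can be sketched directly: fit ĝ at every design point, then take the kernel-weighted average of the squared residuals around x. A minimal illustration on homoscedastic data (names, data, and bandwidth are assumptions, not from the slides):

```python
import numpy as np

def point_variance(x0, X, Y, h):
    """Haerdle-style point-wise variance estimate: kernel-weighted average of the
    squared residuals Y_s - ghat(X_s), with the weights centred at x0."""
    def w(x):                                      # weights normalised so their mean is 1
        raw = np.exp(-0.5 * ((x - X) / h) ** 2)
        return raw / raw.mean()
    ghat = lambda x: np.mean(w(x) * Y)             # the kernel regression estimate
    resid2 = (Y - np.array([ghat(xs) for xs in X])) ** 2
    return ghat(x0), np.mean(w(x0) * resid2)

rng = np.random.default_rng(4)
X = rng.uniform(-1.0, 1.0, 1500)
Y = X + 0.3 * rng.standard_normal(1500)            # true noise variance is 0.09
g0, s2 = point_variance(0.0, X, Y, h=0.2)          # expect g0 ~ 0 and s2 ~ 0.09
```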

20 Selection of bandwidth

In analogy with smoothing in spectrum analysis, the bandwidth h determines the smoothness of ĝ:

If h is large, the variance is small but the bias is large. If h is small, the variance is large but the bias is small.

In the limits: as h → ∞, ĝ(x) = Ȳ; as h → 0, ĝ(x) = Y_i for x = (X_i^(1), ..., X_i^(q)) and 0 otherwise.

21 Selection of bandwidth: Cross-Validation

The ultimate goal of the selection of h is to minimize (see Härdle (1990))

MSE(h) = N^{−1} Σ_{i=1}^N [ĝ(X_i) − g(X_i)]² π(X_i),

where π(·) is a weighting function which screens off some of the extreme observations. However, g is unknown. An easy solution is the plug-in method, where Y_i is used as an estimate of g(X_i). Hence the criterion is

MSE2(h) = N^{−1} Σ_{i=1}^N [ĝ(X_i) − Y_i]² π(X_i),

but it is clear that MSE2(h) → 0 for h → 0.

22 The leave-one-out method

The idea is to avoid that (X_i, Y_i) is used in the estimate for Y_i. For every data point (X_i, Y_i) we define an estimator ĝ^(i) for Y_i based on all data except (X_i, Y_i). The N estimators ĝ^(1), ..., ĝ^(N) (called the leave-one-out estimators) are written

ĝ^(j)(x) = Σ_{s≠j} Y_s Π_{i=1}^q k{h^{−1}(x_i − X_s^(i))} / Σ_{s≠j} Π_{i=1}^q k{h^{−1}(x_i − X_s^(i))}.

23 The leave-one-out method

The Cross-Validation criterion using the leave-one-out estimates ĝ^(1), ..., ĝ^(N) is

CV(h) = N^{−1} Σ_{i=1}^N [Y_i − ĝ^(i)(X_i)]² π(X_i),

and the optimal value of h is then found by minimizing CV(h). It can be shown that, under weak assumptions, the bandwidth estimate ĥ obtained by minimizing the Cross-Validation criterion is asymptotically optimal, i.e. it minimizes the MSE criterion above; see Härdle (1990).
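For a Gaussian-kernel Nadaraya-Watson estimator, the leave-one-out fits can be computed all at once by zeroing the diagonal of the weight matrix. A sketch, with π taken as 1 and candidate bandwidths chosen only for illustration:

```python
import numpy as np

def cv_score(h, X, Y):
    """Leave-one-out cross-validation criterion CV(h) for a Gaussian-kernel
    Nadaraya-Watson estimator (weighting function pi taken as 1)."""
    W = np.exp(-0.5 * ((X[:, None] - X[None, :]) / h) ** 2)
    np.fill_diagonal(W, 0.0)                       # observation i is left out of its own fit
    g_loo = (W @ Y) / W.sum(axis=1)                # the leave-one-out estimates ghat^(i)(X_i)
    return np.mean((Y - g_loo) ** 2)

rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, 400)
Y = np.sin(2.0 * np.pi * X) + 0.2 * rng.standard_normal(400)
hs = np.array([0.005, 0.02, 0.05, 0.1, 0.3])       # illustrative candidate bandwidths
scores = np.array([cv_score(h, X, Y) for h in hs])
h_best = hs[np.argmin(scores)]                     # heavy oversmoothing is penalised by bias
```

Note how the largest candidate bandwidth, which flattens the sine curve away, receives the worst CV score.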

24 Applications: non-parametric estimation of the conditional mean and variance

Assume that a realization X_1, ..., X_N of a stochastic process {X_t} is available. The goal is now to use the realization to estimate the functions g(·) and h(·) in the model

X_t = g(X_{t−1}, ..., X_{t−p}) + h(X_{t−1}, ..., X_{t−q}) ε_t.

As in Tjöstheim (1994) we shall use the notation

M(x_{i_1}, ..., x_{i_d}) = E[X_t | X_{t−i_1} = x_{i_1}, ..., X_{t−i_d} = x_{i_d}],
V(x_{i_1}, ..., x_{i_d}) = V[X_t | X_{t−i_1} = x_{i_1}, ..., X_{t−i_d} = x_{i_d}].

One solution is to use the kernel estimator (other possibilities are splines, nearest neighbour or neural network estimates).

25 Applications of kernel estimators, continued

Using a product kernel we obtain

M̂(x_{i_1}, ..., x_{i_d}) = [(N − i_d)^{−1} Σ_{s=i_d+1}^N X_s Π_{j=1}^d k{h^{−1}(x_{i_j} − X_{s−i_j})}] / [(N − i_d)^{−1} Σ_{s=i_d+1}^N Π_{j=1}^d k{h^{−1}(x_{i_j} − X_{s−i_j})}].

Assuming that E[X_t] = 0, the estimator for V(·) is

V̂(x_{i_1}, ..., x_{i_d}) = [(N − i_d)^{−1} Σ_{s=i_d+1}^N X_s² Π_{j=1}^d k{h^{−1}(x_{i_j} − X_{s−i_j})}] / [(N − i_d)^{−1} Σ_{s=i_d+1}^N Π_{j=1}^d k{h^{−1}(x_{i_j} − X_{s−i_j})}].
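These two estimators can be sketched together for the simplest case d = 1, i_1 = 1, here with the mean correction applied so that E[X_t] = 0 need not hold. Everything below (the AR(1) test process, bandwidth, names) is an illustrative assumption:

```python
import numpy as np

def cond_moments(x0, series, h, lag=1):
    """Kernel estimates of M(x0) = E[X_t | X_{t-lag} = x0] and of the conditional
    variance, computed as E[X_t^2 | X_{t-lag} = x0] - M(x0)^2."""
    past, now = series[:-lag], series[lag:]
    w = np.exp(-0.5 * ((x0 - past) / h) ** 2)
    M = np.sum(w * now) / np.sum(w)
    S2 = np.sum(w * now ** 2) / np.sum(w)
    return M, S2 - M ** 2

rng = np.random.default_rng(6)
n = 20_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):                              # linear AR(1): X_t = 0.5 X_{t-1} + eps_t
    x[t] = 0.5 * x[t - 1] + eps[t]
M1, V1 = cond_moments(1.0, x, h=0.3)               # theory: M(1) = 0.5, V(1) = 1
```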

26 Applications of kernel estimators, continued

If E[X_t] ≠ 0, it is clear that the above estimator is changed to

V̂(x_{i_1}, ..., x_{i_d}) = [Σ_{s=i_d+1}^N X_s² Π_{j=1}^d k{h^{−1}(x_{i_j} − X_{s−i_j})} / Σ_{s=i_d+1}^N Π_{j=1}^d k{h^{−1}(x_{i_j} − X_{s−i_j})}] − M̂²(x_{i_1}, ..., x_{i_d}).

27 Identification of functional relationships: estimation of conditional mean and variance

We have simulated 10 stochastically independent time series for each of the following non-linear models:

A: X_t = 0.8 X_{t−1} + ε_t for X_{t−1} ≥ 0, and X_t = −0.8 X_{t−1} + 1.5 + ε_t for X_{t−1} < 0.
B: X_t = √(1 + X_{t−1}²) ε_t.

What kind of models are model A and B? Gaussian kernels with bandwidth h = 0.6 are used.
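The identification experiment can be sketched for a threshold model of the model-A type: simulate, then read off M̂ in each regime. The coefficients below (0.8, −0.8, 1.5) follow one plausible reading of the slide, and the bandwidth is taken smaller than the slide's h = 0.6 to keep the smoothing bias modest; treat all of it as illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):                              # threshold AR(1) in two regimes
    if x[t - 1] >= 0:
        x[t] = 0.8 * x[t - 1] + eps[t]
    else:
        x[t] = -0.8 * x[t - 1] + 1.5 + eps[t]

def M_hat(x0, series, h):
    """Kernel estimate of the conditional mean M(x0) = E[X_t | X_{t-1} = x0]."""
    past, now = series[:-1], series[1:]
    w = np.exp(-0.5 * ((x0 - past) / h) ** 2)
    return np.sum(w * now) / np.sum(w)

m_pos = M_hat(1.5, x, h=0.3)                       # regime 1: theory gives 0.8 * 1.5 = 1.2
m_neg = M_hat(-1.5, x, h=0.3)                      # regime 2: -0.8 * (-1.5) + 1.5 = 2.7
```

The estimated M̂(x) traces out two different line segments, which is exactly the kind of structure the non-parametric estimate is meant to reveal before any parametric model is proposed.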

28 Identification of functional relationship: estimated M(x) for model A

[Figure: estimated conditional mean M(x) versus x for model A. The theoretical conditional mean and variance are shown with +.]

29 Identification of functional relationship: estimated M(x) for model B

[Figure: estimated conditional mean M(x) versus x for model B.]

30 Identification of functional relationship: estimated V(x) for model A

[Figure: estimated conditional variance V(x) versus x for model A.]

31 Applications of kernel estimators: estimated V(x) for model B

[Figure: estimated conditional variance V(x) versus x for model B.]

32 Example: non-parametric estimation of the PDF

An estimate of the probability density function (PDF) is (Robinson 1983):

f̂(x_{i_1}, ..., x_{i_d}) = (N − i_d)^{−1} Σ_{s=i_d+1}^N Π_{j=1}^d h^{−1} k{h^{−1}(x_{i_j} − X_{s−i_j})}.
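For d = 1 this density estimator is just the average of rescaled kernels placed at the observed values, and it also applies to dependent data from a stationary process. A sketch on a simulated AR(1), whose stationary law is N(0, 1/(1 − 0.25)) (process, bandwidth, and names are illustrative):

```python
import numpy as np

def pdf_hat(x0, data, h):
    """One-dimensional (d = 1) version of the density estimator on the slide:
    the average of h^{-1} k(h^{-1}(x0 - X_s)) over the sample, Gaussian k."""
    u = (x0 - data) / h
    return np.mean(np.exp(-0.5 * u ** 2) / (h * np.sqrt(2.0 * np.pi)))

rng = np.random.default_rng(8)
n = 50_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):                              # AR(1) with stationary law N(0, 4/3)
    x[t] = 0.5 * x[t - 1] + eps[t]
f0 = pdf_hat(0.0, x, h=0.2)                        # stationary density at 0 is ~ 0.345
```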

33 Example: non-parametric estimation of non-stationarity

Non-parametric methods can be applied to identify the structure of existing relationships, leading to proposals for a parametric model. An example of identifying the diurnal variation in a non-stationary time series is given in the following. Measurements of the heat supply to 6 terrace houses in Kulladal, a suburb of Malmö in Sweden, are used to estimate the heat load as a function of the time of day.

34 Heat supply vs. time of day

[Figure: example data, heat supply [kJ/s] versus measurement number.]

35 Heat supply vs. time of day

[Figure: example data, heat supply [kJ/s] versus hour of day.]

The observations are equidistantly sampled on the interval [0, 24] with 0.25 hours between the sample points. The transport delay between the consumers and the measuring instruments is not more than a couple of minutes.

36 Dependence between data points

Assumptions: the heat supply can be related to the time of day, and there is no difference between the days in the considered period. Hence the regression curve is a function only of the time of day, and it is this functional relationship which will be considered. The assumption of independence between measurements is thus not fulfilled: the information about the regression function contained in the data is less than in a set of data with the same number of independent observations.

37 Smoothing with different bandwidths

Smoothing of the diurnal power load curve using an Epanechnikov kernel and bandwidths 0.25, 0.625, ...

[Figure: Nadaraya-Watson estimator, Epanechnikov kernel; heat supply [kJ/s] versus hour.]

The characteristics of the non-smoothed averages gradually disappear when the bandwidth is increased.

38 CV approach for finding h

[Figure: CV function [(kJ/s)²] versus bandwidth [hours], Epanechnikov kernel.]

The minimum of the CV function is obtained for a bandwidth close to 1.3. This value is most likely somewhat larger than the visually chosen bandwidth.

39 Cross-Validation: comments

The drop at 5.5 in the morning will clearly disappear in a curve estimate with this bandwidth. These results indicate the need for non-parametric curve estimates in which the bandwidth is not only a function of the density of the predictor variables but, for instance, also of the gradient of the regression curve.

40 Cross-Validation: Gaussian kernels

Smoothing of the diurnal power load curve using a Gaussian kernel and bandwidths 0.025, 0.25, ...

[Figure: Nadaraya-Watson estimator, Gaussian kernel; heat supply [kJ/s] versus hour.]

41 Cross-Validation: Gaussian kernels

Clearly, it is difficult to visually observe any difference between the curve estimates obtained with the Epanechnikov and Gaussian kernels. A general conclusion: the choice of the kernel (window) is not important as long as it is reasonably smooth; it is the choice of the bandwidth that matters.

42 Example: non-parametric estimation of (static) dependence on external variables

The dependent variable is the heat load in a district heating system, and the independent variables are the ambient and supply temperatures. Data were collected in the district heating system in Esbjerg, Denmark, during the period August 4 to December 10, 1989. Heat load p(t) versus ambient air temperature a(t) and supply temperature s(t).

43 Applying the Epanechnikov kernel

For the estimation of the regression surface, the kernel method is applied using the Epanechnikov kernel defined earlier. A product kernel is used, i.e. the power load estimate p̂ at time t, ambient air temperature a, and supply temperature s is

p̂_{h_t,h_a,h_s}(t, a, s) = Σ_{i=1009}^{3843} p_i K_{h_t}(t − t_i) K_{h_a}(a − a_i) K_{h_s}(s − s_i) / Σ_{i=1009}^{3843} K_{h_t}(t − t_i) K_{h_a}(a − a_i) K_{h_s}(s − s_i),

where K_h(u) = h^{−1} k(u/h). The bandwidths were chosen using a combination of Cross-Validation and visual inspection.

44 The Epanechnikov kernel estimate

[Figure: kernel estimate of the dependence on time of day t_d, ambient air temperature a, and supply temperature s, shown at t_d = 7.]

45 The Epanechnikov kernel estimate: estimate of the standard deviation

[Figure: point-wise estimate of the standard deviation of the surface estimates.]

46 References

Härdle, W. (1990), Applied Nonparametric Regression, Cambridge University Press, New York.
Madsen, H. (2008), Time Series Analysis, Chapman & Hall/CRC.
Robinson, P. (1983), Nonparametric estimators for time series, Journal of Time Series Analysis 4(3).
Tjöstheim, D. (1994), Non-linear time series: A selective review, Scandinavian Journal of Statistics 21.


More information

Nonparametric Regression

Nonparametric Regression Nonparametric Regression Econ 674 Purdue University April 8, 2009 Justin L. Tobias (Purdue) Nonparametric Regression April 8, 2009 1 / 31 Consider the univariate nonparametric regression model: where y

More information

System Identification, Lecture 4

System Identification, Lecture 4 System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2016 F, FRI Uppsala University, Information Technology 13 April 2016 SI-2016 K. Pelckmans

More information

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity Azhong Ye 1 Rob J Hyndman 2 Zinai Li 3 23 January 2006 Abstract: We present local linear estimator with variable

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Unsupervised Kernel Dimension Reduction Supplemental Material

Unsupervised Kernel Dimension Reduction Supplemental Material Unsupervised Kernel Dimension Reduction Supplemental Material Meihong Wang Dept. of Computer Science U. of Southern California Los Angeles, CA meihongw@usc.edu Fei Sha Dept. of Computer Science U. of Southern

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Test for Discontinuities in Nonparametric Regression

Test for Discontinuities in Nonparametric Regression Communications of the Korean Statistical Society Vol. 15, No. 5, 2008, pp. 709 717 Test for Discontinuities in Nonparametric Regression Dongryeon Park 1) Abstract The difference of two one-sided kernel

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Sliced Inverse Regression

Sliced Inverse Regression Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed

More information

Nonparameteric Regression:

Nonparameteric Regression: Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Nonparametric Regression. Badr Missaoui

Nonparametric Regression. Badr Missaoui Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y

More information

Local Polynomial Modelling and Its Applications

Local Polynomial Modelling and Its Applications Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,

More information

AGEC 661 Note Eleven Ximing Wu. Exponential regression model: m (x, θ) = exp (xθ) for y 0

AGEC 661 Note Eleven Ximing Wu. Exponential regression model: m (x, θ) = exp (xθ) for y 0 AGEC 661 ote Eleven Ximing Wu M-estimator So far we ve focused on linear models, where the estimators have a closed form solution. If the population model is nonlinear, the estimators often do not have

More information

On the Robust Modal Local Polynomial Regression

On the Robust Modal Local Polynomial Regression International Journal of Statistical Sciences ISSN 683 5603 Vol. 9(Special Issue), 2009, pp 27-23 c 2009 Dept. of Statistics, Univ. of Rajshahi, Bangladesh On the Robust Modal Local Polynomial Regression

More information

OR MSc Maths Revision Course

OR MSc Maths Revision Course OR MSc Maths Revision Course Tom Byrne School of Mathematics University of Edinburgh t.m.byrne@sms.ed.ac.uk 15 September 2017 General Information Today JCMB Lecture Theatre A, 09:30-12:30 Mathematics revision

More information

Lecture 11. Multivariate Normal theory

Lecture 11. Multivariate Normal theory 10. Lecture 11. Multivariate Normal theory Lecture 11. Multivariate Normal theory 1 (1 1) 11. Multivariate Normal theory 11.1. Properties of means and covariances of vectors Properties of means and covariances

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships

More information

CS 556: Computer Vision. Lecture 21

CS 556: Computer Vision. Lecture 21 CS 556: Computer Vision Lecture 21 Prof. Sinisa Todorovic sinisa@eecs.oregonstate.edu 1 Meanshift 2 Meanshift Clustering Assumption: There is an underlying pdf governing data properties in R Clustering

More information

12. Cholesky factorization

12. Cholesky factorization L. Vandenberghe ECE133A (Winter 2018) 12. Cholesky factorization positive definite matrices examples Cholesky factorization complex positive definite matrices kernel methods 12-1 Definitions a symmetric

More information

Introduction to Regression

Introduction to Regression Introduction to Regression David E Jones (slides mostly by Chad M Schafer) June 1, 2016 1 / 102 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures

More information

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

System Identification, Lecture 4

System Identification, Lecture 4 System Identification, Lecture 4 Kristiaan Pelckmans (IT/UU, 2338) Course code: 1RT880, Report code: 61800 - Spring 2012 F, FRI Uppsala University, Information Technology 30 Januari 2012 SI-2012 K. Pelckmans

More information

Machine Learning Basics: Maximum Likelihood Estimation

Machine Learning Basics: Maximum Likelihood Estimation Machine Learning Basics: Maximum Likelihood Estimation Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics 1. Learning

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 4 Spatial Point Patterns Definition Set of point locations with recorded events" within study

More information

O Combining cross-validation and plug-in methods - for kernel density bandwidth selection O

O Combining cross-validation and plug-in methods - for kernel density bandwidth selection O O Combining cross-validation and plug-in methods - for kernel density selection O Carlos Tenreiro CMUC and DMUC, University of Coimbra PhD Program UC UP February 18, 2011 1 Overview The nonparametric problem

More information

Acceleration of some empirical means. Application to semiparametric regression

Acceleration of some empirical means. Application to semiparametric regression Acceleration of some empirical means. Application to semiparametric regression François Portier Université catholique de Louvain - ISBA November, 8 2013 In collaboration with Bernard Delyon Regression

More information

Transformation and Smoothing in Sample Survey Data

Transformation and Smoothing in Sample Survey Data Scandinavian Journal of Statistics, Vol. 37: 496 513, 2010 doi: 10.1111/j.1467-9469.2010.00691.x Published by Blackwell Publishing Ltd. Transformation and Smoothing in Sample Survey Data YANYUAN MA Department

More information

1 Empirical Likelihood

1 Empirical Likelihood Empirical Likelihood February 3, 2016 Debdeep Pati 1 Empirical Likelihood Empirical likelihood a nonparametric method without having to assume the form of the underlying distribution. It retains some of

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Other Artificial Neural Networks Samy Bengio IDIAP Research Institute, Martigny, Switzerland,

More information

Lecture Notes 15 Prediction Chapters 13, 22, 20.4.

Lecture Notes 15 Prediction Chapters 13, 22, 20.4. Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data

More information

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Lecture 4 Discriminant Analysis, k-nearest Neighbors Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

26. Filtering. ECE 830, Spring 2014

26. Filtering. ECE 830, Spring 2014 26. Filtering ECE 830, Spring 2014 1 / 26 Wiener Filtering Wiener filtering is the application of LMMSE estimation to recovery of a signal in additive noise under wide sense sationarity assumptions. Problem

More information

System Identification: From Data to Model

System Identification: From Data to Model 1 : From Data to Model With Applications to Aircraft Modeling Division of Automatic Control Linköping University Sweden Dynamic Systems 2 A Dynamic system has an output response y that depends on (all)

More information

4 Nonparametric Regression

4 Nonparametric Regression 4 Nonparametric Regression 4.1 Univariate Kernel Regression An important question in many fields of science is the relation between two variables, say X and Y. Regression analysis is concerned with the

More information

LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity.

LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity. LECTURES 2-3 : Stochastic Processes, Autocorrelation function. Stationarity. Important points of Lecture 1: A time series {X t } is a series of observations taken sequentially over time: x t is an observation

More information

Gaussian with mean ( µ ) and standard deviation ( σ)

Gaussian with mean ( µ ) and standard deviation ( σ) Slide from Pieter Abbeel Gaussian with mean ( µ ) and standard deviation ( σ) 10/6/16 CSE-571: Robotics X ~ N( µ, σ ) Y ~ N( aµ + b, a σ ) Y = ax + b + + + + 1 1 1 1 1 1 1 1 1 1, ~ ) ( ) ( ), ( ~ ), (

More information

Model Selection for Gaussian Processes

Model Selection for Gaussian Processes Institute for Adaptive and Neural Computation School of Informatics,, UK December 26 Outline GP basics Model selection: covariance functions and parameterizations Criteria for model selection Marginal

More information

Nonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors

Nonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors Nonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors Peter Hall, Qi Li, Jeff Racine 1 Introduction Nonparametric techniques robust to functional form specification.

More information

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes

COMP 551 Applied Machine Learning Lecture 20: Gaussian processes COMP 55 Applied Machine Learning Lecture 2: Gaussian processes Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: (herke.vanhoof@mcgill.ca) Class web page: www.cs.mcgill.ca/~hvanho2/comp55

More information

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017

COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

A New Test in Parametric Linear Models with Nonparametric Autoregressive Errors

A New Test in Parametric Linear Models with Nonparametric Autoregressive Errors A New Test in Parametric Linear Models with Nonparametric Autoregressive Errors By Jiti Gao 1 and Maxwell King The University of Western Australia and Monash University Abstract: This paper considers a

More information

14 - Gaussian Stochastic Processes

14 - Gaussian Stochastic Processes 14-1 Gaussian Stochastic Processes S. Lall, Stanford 211.2.24.1 14 - Gaussian Stochastic Processes Linear systems driven by IID noise Evolution of mean and covariance Example: mass-spring system Steady-state

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Titolo Smooth Backfitting with R

Titolo Smooth Backfitting with R Rapporto n. 176 Titolo Smooth Backfitting with R Autori Alberto Arcagni, Luca Bagnato ottobre 2009 Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali Università degli Studi di Milano

More information

Mean-Shift Tracker Computer Vision (Kris Kitani) Carnegie Mellon University

Mean-Shift Tracker Computer Vision (Kris Kitani) Carnegie Mellon University Mean-Shift Tracker 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Mean Shift Algorithm A mode seeking algorithm Fukunaga & Hostetler (1975) Mean Shift Algorithm A mode seeking algorithm

More information

Probabilistic & Unsupervised Learning

Probabilistic & Unsupervised Learning Probabilistic & Unsupervised Learning Gaussian Processes Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc ML/CSML, Dept Computer Science University College London

More information

Nonparametric Density Estimation

Nonparametric Density Estimation Nonparametric Density Estimation Econ 690 Purdue University Justin L. Tobias (Purdue) Nonparametric Density Estimation 1 / 29 Density Estimation Suppose that you had some data, say on wages, and you wanted

More information

Bickel Rosenblatt test

Bickel Rosenblatt test University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance

More information

Malaya J. Mat. 2(1)(2014) 35 42

Malaya J. Mat. 2(1)(2014) 35 42 Malaya J. Mat. 2)204) 35 42 Note on nonparametric M-estimation for spatial regression A. Gheriballah a and R. Rouane b, a,b Laboratory of Statistical and stochastic processes, Univ. Djillali Liabs, BP

More information