Nonparametric density estimation for positive time series

Size: px

Start display at page:

Download "Nonparametric density estimation for positive time series"

Shannon Hunt
5 years ago
Views:

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

1 See discussions, stats, and author profiles for this publication at: Nonparametric density estimation for positive time series Article in Computational Statistics & Data Analysis February 2010 Impact Factor: 1.4 DOI: /ssrn Source: RePEc CITATIONS 13 READS 65 2 authors: Taoufik Bouezmarni Université de Sherbrooke 24 PUBLICATIONS 245 CITATIONS Jeroen V.K. Rombouts HEC Montréal - École des Hautes Études com 44 PUBLICATIONS 1,611 CITATIONS SEE PROFILE SEE PROFILE All in-text references underlined in blue are linked to publications on ResearchGate, letting you access and read them immediately. Available from: Taoufik Bouezmarni Retrieved on: 10 May 2016

2 Nonparametric Density Estimation for Positive Time Series Taoufik Bouezmarni Jeroen V.K. Rombouts September 9, 2009 Abstract The Gaussian kernel density estimator is known to have substantial problems for bounded random variables with high density at the boundaries. For independent and identically distributed data, several solutions have been put forward to solve this boundary problem. In this paper, we propose the gamma kernel estimator as density estimator for positive time series data from a stationary α-mixing process. We derive the mean (integrated) squared error and asymptotic normality. In a Monte Carlo simulation, we generate data from an autoregressive conditional duration model and a stochastic volatility model. We study the local and global behavior of the estimator and we find that the gamma kernel estimator outperforms the local linear density estimator and the Gaussian kernel estimator based on log-transformed data. We illustrate also the good performance of the h block cross-validation as a bandwidth method selection procedure. An application to data from financial transaction durations and realized volatility is provided. Key words: Gamma kernel, Nonparametric density estimation, Mixing process, Transaction durations, Realized volatility. JEL Classification : C13, C14, C32. We would like to thank Luc Bauwens, Marine Carrasco, Song Chen, Luc Devroye, Nicolay Gospodinov, Bruno Remillard, Jean-Marie Rolin and Ingrid Van Keilegom for useful comments. Département de mathématiques et de statistique, Université de Montréal and Institute of Statistics, Université catholique de Louvain. Address: Département de mathématiques et de statistique, Université de Montréal, C.P. 6128, succursale Centre-ville Montréal, Canada, H3C 3J7. Institute of Applied Economics at HEC Montréal, CIRANO, CIRPEE, CORE (Université catholique de Louvain). Address: 3000 Cote Sainte Catherine, Montréal (QC), Canada, H3T 2A7. TEL: ; FAX: ; jeroen.rombouts@hec.ca 1

3 1 Introduction Nonparametric density estimation is now frequently used in economics and finance. An important first step in applied modelling is looking at the characteristics of the data at hand without making yet strong assumptions. Apart from the usual descriptive statistics, we also estimate univariate marginal densities of the variables of interest to examine for example the tails or to check if there is some multimodality in the density. Furthermore, the density estimates are used for example to estimate diffusion processes, to construct semiparametric estimators and in Bayesian inference to summarize posterior information with respect to parameter draws from samplers. Nonparametric density estimation is usually done by a Gaussian kernel estimator the asymptotic properties of which are well established for independent identically distributed (i.i.d.) data and for time series data, see for example Silverman (1986), Pagan and Ullah (1999), Fan and Yao (2003) and? for textbook details. Another advantage of the Gaussian kernel estimator is the direct availability in most statistical software packages. However, a drawback is that it may give positive weight to points outside the support when we estimate the density of a bounded random variable. This causes a bias problem at the boundary and therefore inconsistency of the estimator. This problem, well known as the boundary bias problem, has given rise to new methods and new kernel estimators for independent data. First solutions to this boundary problem are the reflection method, see Schuster (1985) and the local renormalisation method, see Diggle (1985) and Härdle (1990). The idea behind the reflection method is to recapture the missing weight by reflecting the estimate in the boundary. The renormalisation method uses the Gaussian estimator with renormalized kernels. Although both methods lower the bias, there is still more bias at the boundary compared to the interior of the support. To obtain the same bias order over the whole support,? and Marron and Ruppert (1994), apply first a data transformation such that the first derivative of the density of the transformed data is equal to zero at the boundary points. Second, they use the reflection method to estimate the density function of the transformed data which is thereafter transformed to the density of the original data. In order to get the same order of bias without transforming the data, several authors propose to use an adapted kernel at the boundary region and the standard kernel in the interior region. See Müller (1991) for optimal boundary kernel estimators, Lejeune and Sarda (1992) for the local linear estimator, Jones (1993) for the boundary kernel and? who combines two 2

4 density estimators with the same kernel but with two different bandwidths. The drawback of these estimators is that they can take negative values near the boundary. To solve this problem, Jones and Foster (1996) propose a simple nonnegative boundary corrected estimator. A recent solution is to use asymmetric and adapted kernels which do not assign any weight outside the support of the variable. Chen (1999) and Chen (2000) propose respectively beta kernel densities for variables with compact support and gamma densities as kernels for variables with positive support. Chen (2002) applies the gamma kernel in local linear smoothing to estimate a regression curve with a positive support. Fernandez and Monteiro (2005) establish a central limit theorem for gamma kernel functionals. Asymmetric kernels are also developed by? who proposes optimal asymmetric kernels and? who suggests inverse Gaussian and reciprocal inverse Gaussian kernels. For economics data, it frequently happens that the range of the variable of interest is not the whole real line. This is not necessarily a problem for the Gaussian kernel estimator because the boundary can sometimes be ignored in practice. For example in the independent data framework, income data from the computer industry are positive but there are no values near the zero boundary and therefore the boundary problem does not exist practically. However, income data for a country can have most of the density mass near zero or the minimum wage because of high unemployment and/or high income inequality. Bouezmarni and Scaillet (2005), who derive weak convergence of the integrated absolute error for the gamma kernel estimator, illustrate this for an income density using data from Brazil. For dependent data, many illustrations are available in economics and finance where there is an important concentration of time series data close to zero or another boundary. Financial transaction data are typically highly dependent and often close or approximately equal to zero for frequently traded stocks. In the autoregressive conditional duration literature Engle and Russell (1998) analyze IBM trade duration data which a substantial concentration near zero. See Bauwens, Giot, Grammig, and Veredas (2004) and? for similar characteristics on other US stock data and Ghysels, Gourieroux, and Jasiak (2004) for Alcatel traded on the Paris Stock Exchange.? study the impact of volatility on the transaction intensity for Deutsche Telekom IPO data. Another example is, the short term interest process which has been widely studied in the literature, see?. For this process, nonparametric estimators of the drift function and the diffusion function have been proposed by many authors, for example?,?,? and?. The nonparametric estimators using Gaussian kernels cause a boundary bias both in the drift and the diffusion func- 3

5 tions. To remove this bias,? propose the Jackknife kernels of?,? uses the reflection method and? propose the local linear method. Recently,? use the gamma kernel to take into account the boundary bias. Another important example of bounded time series data is daily volatility series constructed using intraday data using realized volatility techniques. Andersen, Bollerslev, Diebold, and Labys (2001) investigate the shape of the unconditional distribution of realized exchange rate volatility and find that the distribution is extremely skewed. See Andersen, Bollerslev, Diebold, and Ebens (2001) for similar results on stock returns. A non finance example is efficient frontier analysis where inefficiency scores for a company related to its peer companies are computed over time, see Park and Simar (1994) for an application using railway company data. Bayesian inference is an important field where the model parameters are more often bounded than not. For example, in a standard GARCH model, see Bollerslev (1986), the parameters of the conditional variance model are imposed to be positive. In the canonical version of the stochastic volatility model, see Ghysels, Harvey, and Renault (1996), the autoregressive parameter of the log-volatility model is generally assumed to be lower than one. These parameter restrictions have to be taken into account when the posterior marginal densities are computed based on kernels using draws of these parameters obtained from MCMC samplers. The objective of this paper is to propose a kernel estimator for positive time series that directly takes into account the support of the time series. Instead of the Gaussian kernel, we use a gamma kernel when the support is positive and we suggest to use a beta kernel whenever the support is the unit interval. The estimators are free from boundary bias and their shapes depend on the point where the density is to be estimated. When there is no boundary bias problem, the estimators behave like the Gaussian kernel estimator which is optimal in that context. For the gamma kernel estimator, we derive the convergence properties for data series from a stationary α-mixing process. We compute the mean squared error (MSE), the mean integrated squared error (MISE) and obtain the optimal rate of convergence. We also state the asymptotic normality for the gamma kernel estimator. The asymptotic normal distribution can be used for confidence intervals as we show in some applications. This new result can of course be applied also in the independent data framework. In general, the theoretical results of this paper can be easily extended for the beta kernel estimator for density with compact supports. In a detailed Monte Carlo study, where we generate data from an autoregressive conditional 4

6 duration model and a stochastic volatility model, we show that, in terms of mean squared error and mean integrated squared error, the gamma kernel outperforms its closest competitor which is the local linear density estimator of Jones (1993). We compare the gamma kernel estimator with the Gaussian kernel estimator based on log-transformed data, a popular technique among applied econometricians to deal with boundary bias problem. We compare three techniques to select a practical bandwidth, and we find that the h-block cross-validation method works well. The rest of the paper is organized as follows. The gamma kernel estimator for positive time series is introduced in Section 2. Section 3 provides convergence properties. In Section 4, we investigate the finite sample properties of the gamma kernel estimator and the bandwidth selection for data from a conditional duration model and a stochastic volatility model. Both models are frequently used in financial econometrics. Section 5 provides first an application to transaction duration data for four US stocks traded on the New York Stock Exchange. The second application uses realized volatility series from the DM/dollar exchange rate and IBM. Section 6 concludes. The proofs of the propositions are gathered in the Appendix. 2 Gamma kernel estimator We observe X 1,...,X n from a stationary α-mixing nonnegative process with density function f. Our goal is to estimate the function f(x) nonparametrically for x [0, ). The gamma kernel estimator is defined as follows ˆf(x) = 1 n n K ρb (x),b (X i ) (1) i=1 where the kernel K ρb (x),b is defined by K ρb (x),b(t) = tρ b(x) 1 exp( t/b) b ρ b(x) Γ(ρ b (x))) and ρ b (x) = x/b if x 2b 1 4 (x/b)2 + 1 if x [0,2b). (2) For notational simplicity we write K x,b instead of K ρb (x),b. The shape of the gamma kernel and the amount of smoothing vary according to the position where the density is estimated implying that the gamma kernel estimator is an adaptive density estimator. This is a crucial difference with the 5

7 Gaussian or other fixed symmetric kernel estimators. The support of the gamma kernel matches the support of the probability density function to be estimated, and therefore no weight is lost when the density is estimated at the boundary region. The gamma kernel density estimator is easy to implement, free of boundary bias and always non-negative. As we will see in the next section, it achieves the optimal rate, O(n 4/5 ) which is the best rate in nonparametric kernel estimation, of convergence for the mean integrated squared error within the class of non-negative kernel density estimators for strong mixing data. Furthermore, the variance of the estimator reduces when the position where the smoothing is made moves away from the boundary. This is an interesting property when the support of the density has sparse regions because more data points can be used to estimate the density in areas with fewer observations. We show also the importance of this feature when we deal with the asymptotic normality of the gamma kernel estimator. For i.i.d. data, Chen (2000) shows by simulations that, in term of mean squared error, the gamma kernel estimator outperforms the local linear estimator of Jones (1993), the non-negative estimator of Jones and Foster (1996) and the boundary kernel estimator of Müller (1991). Note that the latter estimators are proposed to remove the boundary bias problem of the standard kernel estimator. Bouezmarni and Scaillet (2005) state the uniform weak consistency for the gamma kernel estimator on each compact set in the positive real line when f is continuous on its support and the weak convergence in terms of mean integrated absolute error. For unbounded densities at the origin, they investigate the performance of the gamma kernel density estimator through simulations, and prove that it converges in probability to infinity at the origin where the density is unbounded.? study a local multiplicative bias correction for the gamma kernel and? investigate the gamma kernel in the time series regression context. 3 Convergence properties As already mentioned, the boundary bias problem for the standard kernel density estimator of density functions defined on [0, + ) is still present when the data are dependent. In this section we prove that the gamma kernel estimator is free of boundary bias in the case of dependent data. We find that, under mild conditions, it is possible to obtain the same convergence rates as in the i.i.d. case. In fact, we prove that the gamma kernel estimator achieves the optimal rate of convergence, the best precision that can be achieved by a nonparametric kernel estimator, in terms 6

8 of the mean integrated squared error. We also state the asymptotic normality. We require that the process {X i } i 1 is α-mixing. The sequence is α-mixing if the mixing coefficient α(j) 0 as j, where α(j) = sup P(A B) P(A)P(B), A F1 k(x), B F k+j(x) where F k j (X) and F k+j(x) are the σ-field of events generated by {X l,j l k} and {X l,l k+j}, respectively. The concept of α-mixing is omnipresent in time series analysis and is less restrictive than β and ρ-mixing. The following condition requires an α-mixing coefficient with exponential decay. Condition 1. (Mixing condition) (X i ) is supposed to be α-mixing such that α(j) ρ j, j 1 for some constant 0 < ρ < 1. Below are two examples of popular financial econometrics models that satisfy this mixing condition and that generate positive data. Example 1 (Augmented autoregressive conditional model) The augmented autoregressive conditional duration model (AACD) is defined as follows x i = ψ i ǫ i ψ λ i = ω + αψ λ i 1 [ ǫ i 1 b + c(ǫ i 1 b)] v + βψ λ i 1 (3) where the ǫ i are positive i.i.d random variables and independent from ψ i. Fernandez and Grammig (2006) establish that x i is strictly stationary and β mixing with exponential decay, hence x t satisfies condition 1, if β < 1 and IE β + α[ ǫ i 1 b + c(ǫ i 1 b)] v m < 1 for some positive integer m. For the ACD(1,1) model, that is AACD with λ = v = 1 and b = c = 0, Carasco and Chen (2002) show the same results. Example 2 (Interest rate model, CIR model in?)? express the interest rate dynamic as follows dr i = κ(θ r i )dt + σ r i dw i 7

9 where W i is the Brownian motion and κ,θ and σ are strictly positive such that σ 2 2θ. µ(r i ) = κ(θ r i ) and σ(r i ) = σ r i are the drift and the diffusion function, respectively.? show that the interest rate r i is β mixing, hence α-mixing, with geometric decay. Therefore, µ(r i ) and σ(r i ) are α-mixing with geometric decay since they are a measurable transformation of r i, see? and?. For the remainder of the paper we denote by f i the joint density of (X 1,X 1+i ) and g i (x,y) = f i (x,y) f(x)f(y). The function g i is directly linked to the covariance of (X 1,X 1+i ). The next proposition states the mean squared error and mean integrated squared error of the gamma kernel estimator using condition 1 and the same conditions on the bandwidth parameter as in the i.i.d. case. In the sequel, we continue to denote the bandwidth parameter as b without stressing that it depends on the sample size n. Proposition 1. Suppose that f is twice continuously differentiable and g i is bounded. Further assume that lim b = 0 and lim n n n ba/2 = +. Under condition 1, the mean squared error of the gamma kernel estimator is given by MSE(x) = { } b (xf ) 2 (x) + n 1 b 1/2 f(x) 2 πx + o(n 1 b 1/2 ) + o(b 2 ) if x/b and a = 1 b 2 (ξ b (x)f (x)) 2 + Γ(2κ+1) 2 1+2κ Γ 2 (κ+1) n 1 b 1 {f(x) + O(n 1 )} + o(b 2 ) if x/b κ and a = 2. where ξ b (x) = (1 x)(ρ b (x) x/b)/(1 + bρ b (x) x) and for a non-negative constant κ. x/b κ means the boundary region and x/b the interior region. The mean integrated squared error of the gamma kernel estimator is given by MISE = b 2 B + n 1 b 1/2 V ar + o(b 2 ) + o(n 1 b 1/2 ) where B = x 2 f 2 (x)dx and V ar = 1 2 π 0 x 1/2 f(x)dx. The optimal bandwidth which minimizes the asymptotic mean integrated squared error is b = ( ) V ar 2/5 n 2/5. (4) 4B 8

10 This leads to the optimal mean integrated squared error: MISE 5 4 4/5 V ar4/5 B 1/5 n 4/5. The results of Proposition 1 remain valid if α(i) < γi β for some positive constants γ and β > 3 2 instead of β > 2 for the Gaussian kernel estimator in the no boundary case, see Fan and Yao (2003). As in the i.i.d. case, the gamma kernel estimator achieves the same optimal rate of convergence in terms of mean integrated squared error. With respect to MSE, the gamma kernel estimator has a uniform rate of bias, O(b). The presence of xf (x) in the bias indicates that away from zero the bias of the gamma kernel estimator increases. Therefore, we impose that the underlying density is smooth enough away from zero, so that the second derivative is negligible. For the variance, we have the opposite effect: the variance decreases and tends to zero away from zero, that is when x > 2b, but the rate of the variance becomes slower near zero, that is, O(nb 1 ) instead of O(nb 1/2 ) in the interior region. We will see in the simulations that, for finite samples, the variance of the gamma kernel dominates that of the local linear kernel in this region. The MISE gives the global error of the gamma kernel estimator. We remark that the larger order of the variance in the boundary region does not affect the MISE. Note that away from zero, x > 2b, the optimal mean squared error of the gamma kernel and the Gaussian kernel is the same. Proposition 1 also provides the theoretical formula of the optimal bandwidth parameter, see (4), which cannot be used in practice since it depends on the unknown density function f.? recommended to use the global bandwidth, the bandwidth which minimizes the MISE, because the local bandwidth, the bandwidth which minimizes the MSE, destroys the quality of the gamma kernel. Several techniques, such as cross-validation and the block bootstrap are available to obtain a bandwidth in practice. In Section 4, we investigate the performance of three techniques to select a practical bandwidth. First, Bauwens, Giot, Grammig, and Veredas (2004) and? use the formula b = (0.9sn 0.2 ) 2, where s is the empirical standard deviation of the data. This formula is based on (4) with an exponential density with parameter λ = s 2, We call this the exponential rule. Second, we suppose as if the density function is a gamma distribution, and estimate its parameters by maximum likelihood, and then apply (4). We call this the gamma rule. Third, since the data are dependent, we study the h-block cross-validation method proposed by?. For the Gaussian kernel density estimator, they establish that the bandwidth selected with this method is asymptotically 9

11 optimal under the α mixing condition. A similar result could be shown for the gamma kernel estimator. We now turn to the second theoretical result of the paper. The asymptotic distribution of density estimators, usually Gaussian, is useful for goodness of fit and confidence intervals. The following proposition deals with the asymptotic normality of the gamma kernel estimator. Proposition 2. Let f be a density function twice continuously differentiable with bounded g i. Assume that the density f t1,...,t 4 (t 1 <... < t 4 ) is bounded. If α(i) < γi β for some positive constants γ, β > 3 2 and b = O ( n 2/5) we have for all x such that f(x) > 0 ( ) ˆf(x) f(x) n 1/2 b 3/4 B(x) n 1/2 b 1/4 N(0,1), (5) V (x) where B(x) = 1 2 xf (x) if x/b (ξ b (x)f (x)) if x/b κ and V (x) = f(x) 2 πx if x/b Γ(2κ+1)b 1/2 f(x) if x/b κ κ Γ 2 (κ+1) (6) For the Gaussian kernel estimator and when the boundary bias is absent, the asymptotic normality is proved for α(i) < γi β with some positive constants γ and β > 2, see Bosq (1996) and Fan and Yao (2003). If we are interested in confidence intervals for the density function, an interesting feature of the gamma kernel estimator is that the variance vanishes farther away from the origin, so that the confidence intervals become narrow. The unknown density function f(x) in the expression of V (x) can be replaced by ˆf(x), thanks to the convergence in probability of the gamma kernel which can be deduced from its mean squar error. Note that it would also be interesting to a norm (sup and L 2 norm) to show both the consistency as in Proposition 1 and Normal distribution convergence as in Proposition 2. The distribution of the L 2 norm can then be used to evaluate the values in simulation experiments. In the next section on finite sample properties, we focus more on the comparison among different nonparametric density estimators by studying their performance for various levels of dependence, sample sizes and underlying densities. 10

12 4 Finite sample properties We investigate the performance of the gamma kernel estimator for data generated from an autoregressive conditional duration (ACD) model and a stochastic volatility (SV) model. Both models are frequently used in financial econometrics and typically exhibit relatively high dependence when fitted to high frequency financial data. The performance of the gamma kernel estimator is compared to the log-transform method (LT), where we estimate the density of Y i = log(x i ) by the Gaussian kernel estimator and use the inverse transform method to estimate the underlying density. We also compare the gamma kernel estimator with the local linear density estimator (LL) of Lejeune and Sarda (1992) and Jones (1993) which is defined next. For a non-negative integer s and any symmetric kernel K with a compact support [ 1,1], define a s (x,h) = x/h 1 t s K(t)dt. (7) Throughout the paper we take the Epanechnikov kernel. The local linear estimator is defined as follows where ˆf h (x) = 1 nh n i=1 ( K l x,h, x X ) i h (8) K l (x,h,t) = a 2 (x,h) a 1 (x,h)t a 0 (x,h)a 2 (x,h) a 2 (9) 1 (x,h)k(t) is a local linear kernel evaluated at x. Note that K l (x,h,t) = K(t) for x > h. Simulation results of Jones (1993) and Chen (2000) show that the local linear estimator outperforms the boundary kernel estimator of Müller (1991) and the non-negative estimator Jones and Foster (1996). The latter estimators are therefore not considered here. With respect to the practical choice of the bandwidth parameter, we investigate three methods. We compare the gamma kernel estimator, with the exponential rule, the gamma rule as explained in Section 3 and the h-block cross-validation method with the theoretical bandwidth, which is the one that minimizes the integrated square error. The two first methods give only acceptable results in a few settings, and we decide to not report the results here and concentrate on the third method which selects the bandwidth by minimizing the following cross-validation criterion CV h (b) = ˆf 2 (x) 2 n ˆf (i) (X i ) n 11 i=1

13 where ˆf (i) is defined as follows: ˆf (i) (x) = 1 n h n K x,b (X j ) j i >h with n h is the number of data points (j) such that j i > h. ˆf(i) is the gamma kernel estimator based on the remaining sample after removing h observations on both side of X i. Remark that for h = 0, this method is the least-squares cross validation method for independent data. More details on this method can be found in?,? and?.? suggest to choose h such that h/n 0. In the simulations, we consider different values of h and they give similar results. We report those with h = n Since the underlying density f is unknown for the ACD and the SV models, we consider the empirical density function based on observations. For each model we perform 500 replications for different degrees of dependence and for sample sizes 125, 250, 500, 1000 and Autoregressive conditional durations The analysis of durations between financial transactions is of central importance in the understanding of financial markets microstructure. Engle and Russell (1998) propose a new class of point processes with dependent arrival rates. They model the expected duration between events by an autoregressive structure and call it therefore the autoregressive conditional duration (ACD) model. Let the duration be defined as x i = t i t i 1, that is the interval between two arrival times. Their ACD model is defined as x i = ψ i ǫ i ψ i = ω + α 1 x i 1 + α 2 ψ i 1 (10) where the innovations ǫ i,i = 1,...,n are i.i.d random variables. Given a density assumption on ǫ i, for example the generalized gamma or the Burr distributions, the model is estimated easily by maximum likelihood. Inference on a semiparametric model, where the distribution of ǫ i is determined only by the data sample, is provided by Drost and Werker (2004). The unconditional distribution of x i is unknown but expressions for the unconditional moments are available, see?, Chapter 3, and Fernandez and Grammig (2006). For extensions and applications of the ACD model, see for example, Engle (2000), Hamilton and Jorda (2002), Bauwens and Veredas (2004) and Bauwens, Giot, Grammig, and Veredas (2004). 12

14 We consider three innovation densities, the standard exponential, the Weibull with shape and scale parameters equal to 0.91 and 1 respectively, and the Gamma with shape and scale parameters equal to 2 and.05 respectively. As can be seen in Figure 1, the exponential and Weibull densities have a peak at zero, while the Weibull density is unbounded at zero. We consider three levels of (a) Exponential (b) Weibull Figure 1: Innovation densities. (c) Gamma dependence: high, low and independence. The parameter values for these levels are given in Table 1. In general, the choice of α 1, α 2 and the parameter of the density function of ǫ i obey α 2 < 1 and Table 1: Parameter values for the ACD model ω α 1 α 2 I high dependence II low dependence III independence E(α 2 + α 1 ǫ i ) 2 < 1, such that the process is β-mixing with exponential decay, so that the process satisfies the conditions in Proposition 1 and Proposition 2. See also Carasco and Chen (2002) and? for more mixing properties of GARCH and ACD models. By including the limiting independence case we can appreciate how the performance of the gamma kernel changes when dependence is introduced. Figure 2 displays in the first panel a simulated sample path of the high dependence model with exponential innovation density for 1000 observations. The second panel shows the gamma kernel estimator which, as it should be, has a peak at zero. The bandwidth parameter which minimizes the integrated squared error is equal to 13

15 As a comparison, the last panel displays the Gaussian kernel estimator, using the rule of thumb to obtain the bandwidth, for which we immediately observe the boundary problem (a) Sample path from ACD model (b) Gamma kernel estimator (c) Gaussian kernel estimator Figure 2: ACD model with high dependence and exponential innovation density. Table 2 shows the mean and the variance ( 10 4 ) of the L 2 norm for the gamma (G), the local linear (LL), and the log-transformed Gaussian kernel (LT) estimators for ACD Models. For each replication, the bandwidth parameter is chosen such that the integrated squared error is minimized. As expected, the mean norm decreases when the sample size rises, and this holds for any dependence level. For any sample size, there is a positive relation between the mean norm and the dependence level. Therefore, minimizing the L 2 norm is more difficult in the dependent data case. 14

16 In most situations, the gamma kernel estimator dominates the local linear estimator. The difference is more striking when the sample size is small and when the dependence becomes high. For example, for the sample size of 125, the mean L 2 norm in the high dependence case for the gamma kernel estimator is and for the local linear estimator, compared to only and in the medium dependence case. This difference is less pronounced for the exponential innovation density in the independent case. With respect to the log-transformed Gaussian kernel estimator, its performance is dominated by that one of the gamma kernel estimator, for the models with exponential and Weibull innovations. For example, for Weibull innovations and sample size 125, high dependence, the mean L 2 error is for the gamma kernel estimator and for the log-transformed Gaussian kernel estimator. Though, for gamma distributed innovations, they behave similarly. For the variances of the L 2 norm for the three estimators for the ACD Models, there is a negative relation between the size of the variance of the two estimators and the sample size, as expected. The variance increases also for higher dependence levels. Except for the high dependence case and sample size 125, the variance of the gamma kernel and the local linear estimators are similar for the Gamma innovation density. For the exponential and the Weibull innovation density the gamma estimator has a smaller variance than the local linear estimator. In particular, in the high dependence case, the variance of local linear estimator is about twice the variance of gamma kernel estimator. For example, for the Weibull innovation density and a sample size 500, the variance is 4.41 and 2.40, respectively. The variance of the Gaussian kernel estimator based on log-transformed data is always larger than that of the gamma kernel estimator, in particular for high dependence and small sample sizes. For example, for the Weibull innovation and high dependence case, the variance of the L 2 norm is 4.9 for the gamma kernel estimator and for the log-transformed Gaussian kernel estimator. We study now the local performance of the gamma and the local linear kernel estimators. Figure 3 shows the pointwise squared bias, the variance and the mean squared error of the gamma and the local linear kernel estimators for the ACD model with exponential innovation and sample size T = 125 and T = For the small sample size, we remark that the square bias of the gamma kernel estimator is dominated by that of the local linear kernel estimator in a small area in the boundary region, but there is no winner outside this region. Although the presence of xf (x) in the formula of the bias away from zero in Proposition 1, the bias converges to zero when x becomes 15

17 16 n = 125 n = 250 n = 500 n = 1000 n = 2000 G LL LT G LL LT G LL LT G LL LT G LL LT (A) I µ σ II µ σ III µ σ (B) I µ σ II µ σ III µ σ (C) I µ σ II µ σ III µ σ Table 2: L2 norm for gamma (G), local linear (LL) and the log-transform (GauT) estimators. I: high dependence, II: low dependence, III: independent case. For each type of dependence, µ is the mean and σ 2 is the variance ( 10 4 ). ACD Models with innovations drawn from exponential (A), Weibull (B) and gamma (C) densities. For each of the 500 replications, the bandwidth parameter is chosen such that the integrated squared error is minimized.

18 large. For the variance, as expected, we remark that the variance is larger in the boundary region than in the interior region. However, the variance is still small for the gamma kernel estimator compared to the local linear estimator. For the mean squared error, we see a domination of the gamma kernel estimator in the boundary region and a slight domination of the local linear estimator in the interior region. Both MSE s converge to zero when x becomes large. For T = 2000, we see that the differences become negligible which is in line with the results found in Table 2. We focus now on the bandwidth choice. Table 3 shows the median and the interquartile range for the squared difference between the theoretical and the h block cross validation bandwidth. The median and the interquartile range converge to zero as the sample size increases. The error increases with the degree of dependence, but the difference vanishes as the sample size increases. For example, for exponential innovations and a sample size of 125, the median is for high dependence and for weak dependence, compared to, for a sample size of 1000, and for high dependence and weak dependence, respectively. Table 3: Median and interquartile range of the squared error between the theoretical and the h block cross validation bandwidths for gamma kernel estimator for ACD models n = 125 n = 250 n = 500 n = 1000 Med IQR Med IQR Med IQR Med IQR A I II B I II C I II I: high dependence, II: low dependence. The innovations are drawn from exponential (A), Weibull (B) and gamma (C) densities. Med is the median and IQR is the interquartile range. For each of the 500 replications, we calculate the square of the difference between the bandwidth parameter minimizing the integrated squared error (theoretical bandwidth) and the h-block cross-validation (h = n 0.25 ) bandwidth. 17

19 (a) Square bias (b) Variance (c) Mean square error (d) Square bias (e) Variance (f) Mean square error Figure 3: Bias, variance and mean square error of gamma kernel estimator (dashed line) and local linear estimator (solid line). The data are generated from a ACD Model with exponential innovation with high dependence. Upper (resp. below) plots are with sample size 125 (resp. 2000). 18

20 4.2 Stochastic volatility The stochastic volatility (SV) model, see for example Taylor (1986), is a popular alternative to the GARCH type volatility model. Estimation of the SV model is less straightforward than the GARCH model, the likelihood function involves a high dimensional integral, but it has direct links with finance theory such as option and bond pricing for example. The basic SV model is given by y i ( xi ) = exp µ i 2 x i = γ + δx i 1 + νǫ i (11) where µ i and ǫ i are i.i.d random variables and independent from each other. Various extensions and other formulations exist of the basic SV model, see Shephard (2005) for a collection of selected readings. Our interest lies in the marginal density of the variance (exp(x i )). In the simulation study, we consider two innovation densities for µ i, the normal and the student distribution with 5 degrees of freedom, the distribution of ǫ i is the standard normal. As before, we consider three levels of dependence: high, low and independence. The parameter values, given in Table 4, are chosen such that the y t and x t process are α mixing with exponential decay, see Carasco and Chen (2002) for more details. To illustrate the SV process, Figure 4 displays in the Table 4: Parameter values for the SV model γ δ ν I high dependence II low dependence III independence first panel a simulated sample path of the high dependence SV model with normal innovation density for 1000 observations. The second panel shows the gamma kernel estimator which has, as it should be, a peak at zero. The bandwidth parameter which minimizes the integrated squared error is equal to The normal kernel estimator in the last panel suffers from the boundary problem as expected. Here, the bandwidth is obtained using the rule of thumb for the Gaussian kernel estimator and the h block cross-validation for the gamma kernel estimator. Table 5 contains the mean and the variance ( 10 4 ) of the L 2 error for the gamma kernel estimator, the local linear kernel estimator and the Gaussian kernel estimator based on log-transformed 19

21 (a) Sample path from SV model (b) Gamma kernel estimator (c) Gaussian kernel estimator Figure 4: SV model with high dependence and Gaussian innovation density. 20

22 data for the variance of the SV Model. Broadly speaking, the results are the same as the results of the ACD model in Section 4.1. The mean error decreases with the sample size at about the same rate as for the ACD models. Though, the rate is higher for the local linear estimator. This is true for all dependence levels and for both innovation densities. The mean L 2 error increases for higher dependence levels no matter the innovation density and the sample size. The mean error of the gamma kernel estimator is remarkably lower than the local linear estimator in the high dependence case for sample sizes up to 500. For example, for 125 observations and a normal innovation density, the mean errors for the gamma kernel and the local linear estimator are respectively and For 1000 observations the difference in the mean L 2 error is negligible. Except for the high dependence case, where we see a domination of the gamma kernel estimator, the Gaussian kernel estimator on log-transformed data and the gamma kernel estimator have a similar performance for normal innovations and the student innovations. With respect to the variance of the L 2 error, the main conclusions are similar as for the ACD models of the previous section. The Gaussian kernel estimator on log-transformed data has a problem of stability, in particular for small sample sizes. The variance decreases with the sample size but at a rate that is slower than that for the ACD models. However, the rate of decrease in variance is higher for the local linear estimator and the log-transformed Gaussian kernel estimator. For example, in the high dependence case for the student innovation density, the variance of the L 2 error for the gamma kernel estimator, the local linear estimator and the Gaussian kernel estimator on log-transformed data are respectively 13.87, and for 125 observations and respectively 1.852, and for 2000 observations. This observation holds true in general for all dependence levels and for both innovation densities. The variance increases for higher dependence levels and for small sample sizes. With respect to the bandwidths, Table 6 shows the median and the interquartile range for the squared difference between the theoretical and the h block cross validation bandwidth for the SV model. As the ACD models, we report those with h = n The median and the interquartile range decreases when the sample size increases but with at lower rate compared to the ACD models. 21

23 22 n = 125 n = 250 n = 500 n = 1000 n = 2000 G LL LT G LL LT G LL LT G LL LT G LL LT (A) I µ σ II µ σ III µ σ (B) I µ σ II µ σ III µ σ I: high dependence, II: low dependence, III: independent case. For each type of dependence, µ is the mean and σ 2 is the variance ( 10 4 ). SV Model with innovations drawn from A: normal and B: student densities. For each of the 500 replications, the bandwidth parameter is chosen such that the integrated squared error is minimized. Table 5: L2 error for gamma (G), local linear (LL) and the log-transform (LogT) estimators.

24 Table 6: Median and interquartile range of the square error between the theoretical and the h block cross validation bandwidths for gamma kernel estimator for SV models n = 125 n = 250 n = 500 n = 1000 Med IQR Med IQR Med IQR Med IQR I A II I B II I: high dependence, II: low dependence. For each type of dependence, Med is the median and IQR is the interquartile range. SV Model: The data are drawn from A: normal and B: student densities. For each of the 500 replications, we calculate the square of the difference between the bandwidth parameter minimizing the integrated squared error is minimized and the h-block crossvalidation(h = n 0.25 ). 5 Applications In this section we use the gamma kernel to estimate some densities of positive financial time series data the type of which are frequently used in financial econometric modeling. We also compute pointwise confidence intervals based on the asymptotic results in Section 3. The first application is on financial durations. We take a subset of the duration data of Bauwens, Giot, Grammig, and Veredas (2004). They use the durations to compare several financial duration models via density forecasts. The second application is using realized volatility data series from an exchange rate and IBM. 5.1 Price durations We use transaction data from New York Stock Exchange which trading process combines a market maker system and an order book system. Apart from an opening auction, trading is continuous from 9h30 to 16h. Data were extracted from the Trade and Quote (TAQ) database supplied by the NYSE. The data cover the months September, October and November 1996 and is available for Boeing, Coca-Cola, Disney and Exxon which are actively traded Dow Jones stocks. In particular, we use price durations which are defined as the time interval needed to observe a cumulative change 23

25 in the mid-price not less than a threshold that is set at Price durations provide a measure of the instantaneous volatility of the quoted mid-quoted price process as explained in Engle and Russell (1998). For a more detailed description of the data and other types of durations, we refer to Bauwens, Giot, Grammig, and Veredas (2004). Figure 5 displays the durations series for the four stocks. The sample sizes differ for each stock and range between 1609 and 2717 observations many of which are small because the stocks are actively traded. We observe the typical characteristics of high frequency financial duration data for all the series. Fitting ACD models to the durations typically implies a high persistence, α + β is close to one as explained in Section 4, because of the high dependence between the durations. See Bauwens, Giot, Grammig, and Veredas (2004) for a model comparison using density forecasts between several ACD models (different conditional mean and innovation density specifications) using the same data as in this paper. Figure 6 displays the gamma kernel density estimates for the four stocks. The bandwidth parameter is obtained using the h block cross validation method, as explained in Section 3. There is an important concentration of observations near the origin. This makes the Gaussian kernel estimator inadequate as already explained in Section 4. The shape of the four densities is quite similar, although we remark that the Boeing and Coke price durations are more concentrated at the origin than Disney and Exxon. The confidence intervals are close to the density estimates because of the large sample sizes and are inverse trumpet shaped according to the results in proposition Realized volatility A trading day can be divided in N equidistant periods, if r i,t denotes the intradaily return of the ith interval of day t, the realized volatility for day t is defined as N V t = ri,t. 2 (12) i=1 If the intraday returns have zero mean and are uncorrelated, then (12) is an unbiased estimator of volatility. Andersen, Bollerslev, Diebold, and Ebens (2001) show for the 30 stocks in the Dow Jones Industrial Average index that the unconditional distributions of the realized variances are extremely right-skewed and leptokurtic. See Andersen, Bollerslev, Diebold, and Labys (2003) and Andersen, Bollerslev, Diebold, and Labys (2001) for similar features on exchange rate data. We use the DM/$ exchange rate and IBM data from Hansen and Lunde (2005) which are 24

26 (a) Boeing Price (2620 obs) (b) Coke Price (1609 obs) (c) Disney Price (2610 obs) (d) Exxon Price (2717 obs) Figure 5: Price durations of four US stocks. 25

Nonparametric Kernel Density Estimation Near the Boundary

Nonparametric Kernel Density Estimation Near the Boundary Peter Malec a,, Melanie Schienle b a Institute for Statistics and Econometrics, Humboldt-Universität zu Berlin, Spandauer Str. 1, D-10178 Berlin,