ADDITIVE COEFFICIENT MODELING VIA POLYNOMIAL SPLINE

Lan Xue and Lijian Yang

Michigan State University

Abstract: A flexible nonparametric regression model is considered in which the response depends linearly on some covariates, with regression coefficients as additive functions of other covariates. Polynomial spline estimators are proposed for the unknown coefficient functions, with optimal univariate mean square convergence rate under a geometric mixing condition. A consistent model selection method is also proposed, based on a nonparametric Bayes Information Criterion (BIC). Simulated and real data examples demonstrate that the polynomial spline estimators are computationally efficient and as accurate as the existing local polynomial estimators.

Short Running Title: Additive Coefficient Model

Key words and phrases: AIC, approximation space, BIC, German real GNP, knot, mean square convergence, spline approximation.

1. Introduction

Parametric regression models are based on the assumption that the regression function follows a pre-determined parametric form with finitely many unknown parameters. A wrong model can lead to excessive estimation biases and erroneous inferences. In contrast, nonparametric models impose less stringent assumptions on the regression function. General nonparametric models, however, need large sample sizes to obtain reasonable estimators when the predictors are high dimensional. Much effort has been made to alleviate the curse of dimensionality by imposing appropriate structure on the regression function. Of special importance is the varying coefficient model (Hastie & Tibshirani, 1993), whose regression function depends linearly on some regressors, with coefficients that are smooth functions of other predictor variables, called tuning variables. A special type of varying coefficient model, in which all tuning variables are the same univariate variable, is called the functional coefficient model by Chen & Tsay (1993b). It was studied in the time series context by Cai, Fan & Yao (2000) and Huang & Shen (2004). Xue & Yang (2005a) extended the functional coefficient model to the case of a multivariate tuning variable, with an additive structure on the regression coefficients to avoid the

curse of dimensionality. The regression function of the new additive coefficient model is

m(X, T) = \sum_{l=1}^{d_1} α_l(X) T_l,  α_l(X) = α_{l0} + \sum_{s=1}^{d_2} α_{ls}(X_s),  (1.1)

in which the predictor vector (X, T) ∈ R^{d_2} × R^{d_1}, with X = (X_1, ..., X_{d_2})^T, T = (T_1, ..., T_{d_1})^T. This additive coefficient model includes as special cases the varying/functional coefficient models, as well as the additive model (Chen & Tsay, 1993a; Hastie & Tibshirani, 1990) and the linear regression model; see Xue & Yang (2005a), which obtained the asymptotic distributions of local polynomial marginal integration estimators of the unknown coefficients. The parameters {α_{l0}}_{l=1}^{d_1} are estimated at the parametric rate 1/\sqrt{n}, and the nonparametric functions {α_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} are estimated at the univariate smoothing rate.

Due to the integration step and its local nature, the kernel method of Xue & Yang (2005a) is computationally expensive. Based on a sample of size n, to estimate the coefficient functions {α_l(x)}_{l=1}^{d_1} in (1.1) at any fixed point x, a total of (d_2 + 1) × n least squares estimations have to be carried out. So the computational burden increases dramatically as the sample size n and the dimension d_2 of the tuning variables increase. In this paper, we propose a faster polynomial spline estimator for model (1.1). In contrast to the local polynomial, the polynomial spline is a global smoothing method: one solves only a single least squares problem to estimate all the components of the coefficient functions, regardless of the sample size n and the dimension d_2 of the tuning variable, so the computation is substantially reduced. As an attractive alternative to local polynomials, polynomial splines have been used to estimate various models, for example, the additive model (Stone, 1985), the functional ANOVA model (Huang, 1998a, b), the varying coefficient model (Huang, Wu & Zhou, 2002), and the additive model for weakly dependent data (Huang & Yang, 2004), with asymptotic results. We give a complete proof of the polynomial spline estimators' rate of convergence under a geometric mixing condition. A major innovation is the use of an approximation space with possibly unbounded basis. Huang, Wu & Zhou (2002), for instance, imposed the assumption that T = (T_1, ..., T_{d_1})^T in (1.1) has a compactly supported distribution in order to make their basis bounded. Our method, in contrast, imposes only mild moment conditions on T.

The paper is organized as follows. Section 2 discusses the identification issue for model (1.1). Section 3 presents the polynomial spline estimators, their L_2 consistency and a model selection procedure based on the Bayes Information Criterion (BIC). These estimation and model selection procedures adapt automatically to the varying coefficient model (Hastie & Tibshirani, 1993), the functional coefficient model (Chen & Tsay, 1993b), the additive model (Hastie & Tibshirani, 1990; Chen & Tsay, 1993a), and the linear regression model, a feature not shared by any kernel type estimator. Section 4 applies the methods to simulated and empirical examples. Technical assumptions and proofs are given in the Appendix.

2. The Model

Let {(Y_i, X_i, T_i)}_{i=1}^{n} be a sequence of strictly stationary observations, with univariate response Y_i and d_2- and d_1-variate predictors X_i and T_i. With unknown conditional mean and variance functions

m(X_i, T_i) = E(Y_i | X_i, T_i), σ²(X_i, T_i) = var(Y_i | X_i, T_i), the observations satisfy

Y_i = m(X_i, T_i) + σ(X_i, T_i) ε_i.  (2.1)

The errors {ε_i}_{i=1}^{n} are i.i.d. with E(ε_i | X_i, T_i) = 0, E(ε_i² | X_i, T_i) = 1, and ε_i independent of the σ-field F_i = σ{(X_j, T_j), j ≤ i}, for i = 1, ..., n. So the variables (X_i, T_i) can consist of either exogenous variables or lagged values of Y_i. For the additive coefficient model, the regression function m takes the form in (1.1) and satisfies the identification conditions

E{α_{ls}(X_{is})} = 0, 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2.  (2.2)

The conditional variance function σ²(x, t) is assumed to be continuous and bounded. As in most work on nonparametric smoothing, estimation of the functions {α_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} is conducted on compact sets. Without loss of generality, let the compact set be χ = [0, 1]^{d_2}.

Following Stone (1985), p.693, the space of s-centered square integrable functions on [0, 1] is

H_s^0 = {α : E{α(X_s)} = 0, E{α²(X_s)} < +∞}, 1 ≤ s ≤ d_2.

Next define the model space M, a collection of functions on χ × R^{d_1}, as

M = {m(x, t) = \sum_{l=1}^{d_1} α_l(x) t_l ; α_l(x) = α_{l0} + \sum_{s=1}^{d_2} α_{ls}(x_s); α_{ls} ∈ H_s^0},

in which the {α_{l0}}_{l=1}^{d_1} are finite constants. The constraints E{α_{ls}(X_s)} = 0, 1 ≤ s ≤ d_2, ensure a unique additive representation of α_l, but are not necessary for the definition of the space M. In what follows, denote by E_n the empirical expectation, E_n φ = \sum_{i=1}^{n} φ(X_i, T_i)/n. We introduce two inner products on M. For functions m_1, m_2 ∈ M, the theoretical and empirical inner products are defined respectively as

⟨m_1, m_2⟩ = E{m_1(X, T) m_2(X, T)}, ⟨m_1, m_2⟩_n = E_n{m_1(X, T) m_2(X, T)}.

The corresponding induced norms are ‖m_1‖² = E m_1²(X, T) and ‖m_1‖²_n = E_n m_1²(X, T). The model space M is called theoretically (empirically) identifiable if, for any m ∈ M, ‖m‖ = 0 (‖m‖_n = 0) implies that m = 0 a.s.

Lemma 1. Under Assumptions (C1) and (C2) in the Appendix, there exists a constant C > 0 such that

‖m‖² ≥ C \sum_{l=1}^{d_1} {α_{l0}² + \sum_{s=1}^{d_2} ‖α_{ls}‖²}, m = \sum_{l=1}^{d_1} (α_{l0} + \sum_{s=1}^{d_2} α_{ls}) t_l ∈ M.

Hence for any m ∈ M, ‖m‖ = 0 implies α_{l0} = 0 and α_{ls} = 0 a.s., for all 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2. Consequently the model space M is theoretically identifiable.

Proof. Let A_l(X) = α_{l0} + \sum_{s=1}^{d_2} α_{ls}(X_s) and A(X) = (A_1(X), ..., A_{d_1}(X))^T. Under Assumption (C2),

‖m‖² = E[\sum_{l=1}^{d_1} {α_{l0} + \sum_{s=1}^{d_2} α_{ls}(X_s)} T_l]² = E[A(X)^T TT^T A(X)] ≥ c_3 E[A(X)^T A(X)] = c_3 \sum_{l=1}^{d_1} E{α_{l0} + \sum_{s=1}^{d_2} α_{ls}(X_s)}²,

which, by (2.2), is c_3 [\sum_{l=1}^{d_1} α_{l0}² + \sum_{l=1}^{d_1} E{\sum_{s=1}^{d_2} α_{ls}(X_s)}²]. Applying Lemma 1 of Stone (1985),

‖m‖² ≥ c_3 [\sum_{l=1}^{d_1} α_{l0}² + {(1 − δ)/2}^{d_2−1} \sum_{l=1}^{d_1} \sum_{s=1}^{d_2} E α_{ls}²(X_s)],

where δ = (1 − c_1/c_2)^{1/2}, with 0 < c_1 ≤ c_2 as specified in Assumption (C1). Taking C = c_3 {(1 − δ)/2}^{d_2−1}, the first part is proved. To show identifiability, notice that for any m = \sum_{l=1}^{d_1} (α_{l0} + \sum_{s=1}^{d_2} α_{ls}) t_l ∈ M with ‖m‖ = 0, we have

0 = E[\sum_{l=1}^{d_1} {α_{l0} + \sum_{s=1}^{d_2} α_{ls}(X_s)} T_l]² ≥ C [\sum_{l=1}^{d_1} α_{l0}² + \sum_{l=1}^{d_1} \sum_{s=1}^{d_2} E{α_{ls}²(X_s)}],

which entails that α_{l0} = 0 and α_{ls}(X_s) = 0 a.s. for all 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, that is, m = 0 a.s.

3. Polynomial Spline Estimation

3.1 The estimators

In this paper, denote by C^{(p)}[0,1] the space of p-times continuously differentiable functions. For each tuning variable direction s = 1, ..., d_2, a knot sequence k_{s,n} with N_n interior knots is introduced:

k_{s,n} = {0 = x_{s,0} < x_{s,1} < ··· < x_{s,N_n} < x_{s,N_n+1} = 1}.

For an integer p ≥ 1, define φ_s = φ^{(p)}([0,1], k_{s,n}), a subspace of C^{(p−1)}[0,1] consisting of functions that are polynomials of degree p (or less) on the intervals [x_{s,i}, x_{s,i+1}), i = 0, ..., N_n − 1, and [x_{s,N_n}, x_{s,N_n+1}]. Functions in φ_s are called polynomial splines; they are piecewise polynomials connected smoothly at the interior knots, so a polynomial spline of degree p = 1 is a continuous piecewise linear function, etc. The space φ_s is determined by the polynomial degree p and the knot sequence k_{s,n}. Let h_s = h_{s,n} = max_{i=0,...,N_n} (x_{s,i+1} − x_{s,i}), called the mesh size of k_{s,n}, and define h = max_{s=1,...,d_2} h_s, the overall smoothness measure.

Lemma 2. For 1 ≤ s ≤ d_2, define φ_s^0 = {g_s : g_s ∈ φ_s, E g_s(X_s) = 0}, the space of centered polynomial splines. There exists a constant c > 0 such that, for any α_s ∈ H_s^0 ∩ C^{(p+1)}[0,1], there exists a g_s ∈ φ_s^0 with ‖α_s − g_s‖_∞ ≤ c h^{p+1} ‖α_s^{(p+1)}‖_∞.

Proof. According to de Boor (2001), p.149, there exist a constant c > 0 and a spline function g_s* ∈ φ_s such that ‖α_s − g_s*‖_∞ ≤ c h^{p+1} ‖α_s^{(p+1)}‖_∞. Note next that |E g_s*| ≤ |E(g_s* − α_s)| + |E α_s| ≤ ‖g_s* − α_s‖_∞. Thus, for g_s = g_s* − E g_s* ∈ φ_s^0, one has ‖α_s − g_s‖_∞ ≤ ‖α_s − g_s*‖_∞ + |E g_s*| ≤ 2c h^{p+1} ‖α_s^{(p+1)}‖_∞.

Lemma 2 entails that if the functions {α_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} in (1.1) are smooth, they are approximated well by centered splines {g_{ls}(x_s) ∈ φ_s^0}_{l=1,s=1}^{d_1,d_2}. As the definition of φ_s^0 depends on the

unknown distribution of X_s, the empirically defined space φ_s^{0,n} = {g_s : g_s ∈ φ_s, E_n(g_s) = 0} is used. Intuitively, a function m ∈ M is approximated by some function from the approximation space

M_n = {m_n(x, t) = \sum_{l=1}^{d_1} g_l(x) t_l ; g_l(x) = α_{l0} + \sum_{s=1}^{d_2} g_{ls}(x_s); g_{ls} ∈ φ_s^{0,n}}.

Given observations {(Y_i, X_i, T_i)}_{i=1}^{n} from model (2.1), the estimator of the unknown regression function m is defined as its best approximation from M_n,

ˆm = argmin_{m_n ∈ M_n} \sum_{i=1}^{n} {Y_i − m_n(X_i, T_i)}².  (3.1)

To be precise, write J_n = N_n + p, and let {w_{s,0}, w_{s,1}, ..., w_{s,J_n}} be a basis of the spline space φ_s, for 1 ≤ s ≤ d_2. For example, the truncated power basis is used in the implementation:

{1, x_s, ..., x_s^p, (x_s − x_{s,1})_+^p, ..., (x_s − x_{s,N_n})_+^p},

in which (x)_+^p = (x_+)^p, with x_+ = max(x, 0). Let w = {1, w_{1,1}, ..., w_{1,J_n}, ..., w_{d_2,1}, ..., w_{d_2,J_n}}; then {w t_1, ..., w t_{d_1}} is an R_n = d_1(d_2 J_n + 1)-dimensional basis of M_n, and (3.1) amounts to

ˆm(x, t) = \sum_{l=1}^{d_1} {ĉ_{l0} + \sum_{s=1}^{d_2} \sum_{j=1}^{J_n} ĉ_{ls,j} w_{s,j}(x_s)} t_l,  (3.2)

in which the coefficients {ĉ_{l0}, ĉ_{ls,j}, 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, 1 ≤ j ≤ J_n} minimize the sum of squares

\sum_{i=1}^{n} [Y_i − \sum_{l=1}^{d_1} {c_{l0} + \sum_{s=1}^{d_2} \sum_{j=1}^{J_n} c_{ls,j} w_{s,j}(X_{is})} T_{il}]²  (3.3)

with respect to {c_{l0}, c_{ls,j}, 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, 1 ≤ j ≤ J_n}. Note that Lemma A.5 entails that, with probability approaching one, the sum of squares in (3.3) has a unique minimizer. For 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, denote α*_{ls}(x_s) = \sum_{j=1}^{J_n} ĉ_{ls,j} w_{s,j}(x_s). Then the estimators of {α_{l0}}_{l=1}^{d_1} and {α_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} in (1.1) are given as

ˆα_{l0} = ĉ_{l0} + \sum_{s=1}^{d_2} E_n α*_{ls}, 1 ≤ l ≤ d_1; ˆα_{ls}(x_s) = α*_{ls}(x_s) − E_n α*_{ls}, 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2,  (3.4)

where the {ˆα_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} are empirically centered so as to consistently estimate the theoretically centered function components in (1.1). These estimators are determined by the knot sequences {k_{s,n}}_{s=1}^{d_2} and the polynomial degree p, which relates to the smoothness of the regression function. We will refer to an estimator by its degree p; for example, a linear spline fit corresponds to p = 1.

Theorem 1. Under Assumptions (C1)-(C5) in the Appendix, if α_{ls} ∈ C^{(p+1)}[0,1] for 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, one has

‖ˆm − m‖ = O_p(h^{p+1} + \sqrt{1/(nh)}), max_{1≤l≤d_1} |ˆα_{l0} − α_{l0}| + max_{1≤l≤d_1, 1≤s≤d_2} ‖ˆα_{ls} − α_{ls}‖ = O_p(h^{p+1} + \sqrt{1/(nh)}).
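To make the single least squares fit of (3.1)-(3.4) concrete, the following is a minimal sketch in Python/NumPy; it is not the authors' implementation, and the function names, knot placement and interface are illustrative assumptions. It builds the truncated power basis, multiplies each spline column by the corresponding regressor T_l as in (3.3), solves one least squares problem for all components at once, and recenters the fitted components as in (3.4).

```python
import numpy as np

def trunc_power_basis(x, knots, p=1):
    # w_{s,j}(x): x, ..., x^p, (x - knot_1)_+^p, ..., (x - knot_Nn)_+^p
    cols = [x ** q for q in range(1, p + 1)]
    cols += [np.maximum(x - k, 0.0) ** p for k in knots]
    return np.column_stack(cols)              # n x J_n, with J_n = N_n + p

def fit_additive_coefficient(Y, X, T, knots_list, p=1):
    n, d2 = X.shape
    d1 = T.shape[1]
    bases = [trunc_power_basis(X[:, s], knots_list[s], p) for s in range(d2)]
    blocks = []
    for l in range(d1):
        cols = [T[:, [l]]]                    # column for the constant c_{l0} T_l
        for s in range(d2):
            cols.append(bases[s] * T[:, [l]]) # columns for alpha_{ls}(X_s) T_l
        blocks.append(np.hstack(cols))
    Z = np.hstack(blocks)                     # one design matrix, one solve (3.3)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)

    # unpack, then recenter each fitted component empirically as in (3.4)
    J = [b.shape[1] for b in bases]
    per_l = 1 + sum(J)
    alpha0_hat, alpha_hat = [], []
    for l in range(d1):
        c = coef[l * per_l:(l + 1) * per_l]
        pos, means, comps = 1, [], []
        for s in range(d2):
            raw = bases[s] @ c[pos:pos + J[s]]   # alpha*_{ls} at the sample points
            pos += J[s]
            means.append(raw.mean())             # E_n alpha*_{ls}
            comps.append(raw - raw.mean())       # hat alpha_{ls}, centered
        alpha0_hat.append(c[0] + sum(means))     # hat alpha_{l0} of (3.4)
        alpha_hat.append(comps)
    return alpha0_hat, alpha_hat
```

The point of the sketch is the single call to `lstsq`: unlike the marginal integration method, no refitting per evaluation point is needed.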

Theorem 1 entails that the optimal order of h is n^{−1/(2p+3)}, in which case ‖ˆα_{ls} − α_{ls}‖ = O_p(n^{−(p+1)/(2p+3)}), which is the same rate for the mean square errors as that of the marginal integration estimators in Xue & Yang (2005a).

3.2 Knot number selection

An appropriate selection of the knot sequence is important for an efficient implementation of the proposed polynomial spline estimation method. Stone (1986) found that the number of knots is more crucial than their locations. Thus we discuss an approach to selecting the number of interior knots N_n using the AIC criterion. For the knot locations, we use either equally spaced knots (the same distance between any two adjacent knots) or quantile knots (sample quantiles, with the same number of observations between any two adjacent knots). According to Theorem 1, the optimal order of N_n is n^{1/(2p+3)}. Thus we propose to select the optimal N_n, denoted ˆN_n^{opt}, from the set of integers in [0.5N_r, min(5N_r, Tb)], with N_r = n^{1/(2p+3)} and Tb = {n/(4d_1) − 1}/d_2, which ensures that the total number of parameters in the least squares estimation is less than n/4. To be specific, we denote the estimator of the i-th response Y_i by Ŷ_i(N_n) = ˆm(X_i, T_i), for i = 1, ..., n; here ˆm depends on the knot sequence, as given in (3.2). Let q_n = (1 + d_2 N_n) d_1 be the total number of parameters in the least squares problem (3.3). Then ˆN_n^{opt} is the one minimizing the AIC value,

ˆN_n^{opt} = argmin_{N_n ∈ [0.5N_r, min(5N_r, Tb)]} AIC(N_n),  (3.5)

where AIC(N_n) = log(MSE) + 2q_n/n with MSE = \sum_{i=1}^{n} {Y_i − Ŷ_i(N_n)}²/n.

3.3 Model selection

For the full model (1.1), a natural question to ask is whether the functions {α_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} are all significant. A simpler model, obtained by setting some of the {α_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} to zero, may perform as well as the full model. For 1 ≤ l ≤ d_1, let S_l denote the set of indices of the tuning variables that are significant in the coefficient function of T_l, and let S be the collection of indices from all the sets S_l. The set S is called the model indices. In particular, the model indices of the full model are S_f = {S_{f1}, ..., S_{fd_1}}, where S_{fl} = {1, ..., d_2}, 1 ≤ l ≤ d_1. For two indices S = {S_1, ..., S_{d_1}} and S′ = {S′_1, ..., S′_{d_1}}, we say that S ⊂ S′ if and only if S_l ⊆ S′_l for all 1 ≤ l ≤ d_1 and S_l ≠ S′_l for some l. The goal is to select the smallest sub-model, with indices S ⊆ S_f, which gives the same information as the full additive coefficient model. Following Huang & Yang (2004), the Akaike Information Criterion (AIC) and the Bayes Information Criterion (BIC) are considered. For a submodel m_S with indices S = {S_1, ..., S_{d_1}}, let N_{n,S} be the number of interior knots used to estimate the model m_S and J_{n,S} = N_{n,S} + p. As in the full model estimation, let {ĉ_{l0}, ĉ_{ls,j}, 1 ≤ l ≤ d_1, s ∈ S_l, 1 ≤ j ≤ J_{n,S}} be the minimizer of the sum of squares

\sum_{i=1}^{n} [Y_i − \sum_{l=1}^{d_1} {c_{l0} + \sum_{s∈S_l} \sum_{j=1}^{J_{n,S}} c_{ls,j} w_{s,j}(X_{is})} T_{il}]².  (3.6)
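Before completing the submodel estimator below, here is a compact sketch of the two selection devices of this section: the AIC search (3.5) over the number of interior knots, and the BIC score used for submodel comparison. The refitting interface `fit_mse` is a hypothetical placeholder for a routine that solves (3.3) with a given number of knots and reports the mean squared residual; the parameter count follows the q_n formula above.

```python
import numpy as np

def aic_knot_search(fit_mse, n, d1, d2, p=1):
    """Knot-number search (3.5): scan N_n over the integers in
    [0.5 N_r, min(5 N_r, Tb)]. `fit_mse(N)` is assumed to refit (3.3)
    with N interior knots per direction and return the mean squared
    residual."""
    N_r = n ** (1.0 / (2 * p + 3))
    Tb = (n / (4.0 * d1) - 1.0) / d2      # keeps the parameter count below n/4
    lo, hi = max(1, round(0.5 * N_r)), int(min(5 * N_r, Tb))

    def aic(N):
        q_n = (1 + d2 * N) * d1           # parameter count of (3.3)
        return np.log(fit_mse(N)) + 2.0 * q_n / n

    return min(range(lo, hi + 1), key=aic)

def bic_score(mse_S, q_S, n):
    # BIC_S = log(MSE_S) + log(n) q_S / n, as defined in Section 3.3
    return np.log(mse_S) + np.log(n) * q_S / n
```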

Define

ˆm_S(x, t) = \sum_{l=1}^{d_1} {ĉ_{l0} + \sum_{s∈S_l} \sum_{j=1}^{J_{n,S}} ĉ_{ls,j} w_{s,j}(x_s)} t_l.  (3.7)

Denote Ŷ_{i,S} = ˆm_S(X_i, T_i), i = 1, ..., n, MSE_S = \sum_{i=1}^{n} (Y_i − Ŷ_{i,S})²/n, and q_S = \sum_{l=1}^{d_1} {1 + #(S_l) J_{n,S}}, the total number of parameters in (3.6). Then the submodel with the smallest AIC (or BIC) value is selected, where

AIC_S = log(MSE_S) + 2q_S/n, BIC_S = log(MSE_S) + log(n) q_S/n.

Let S_0 and Ŝ be the index sets of the true model and the selected model, respectively. The outcome is defined as correct fitting if Ŝ = S_0; overfitting if S_0 ⊂ Ŝ; and underfitting if S_0 ⊄ Ŝ, that is, S_{0l} ⊄ Ŝ_l for some l. For either overfitting or underfitting, we write Ŝ ≠ S_0.

Theorem 2. Under the same conditions as in Theorem 1, and with N_{n,S} ≡ N_{n,S_0} ≍ n^{1/(2p+3)}, the BIC is consistent: for any S ≠ S_0, lim_{n→∞} P(BIC_S > BIC_{S_0}) = 1, hence lim_{n→∞} P(Ŝ = S_0) = 1.

The condition that N_{n,S} ≡ N_{n,S_0} is essential for the BIC to be consistent. As a referee pointed out, the number of parameters q_S depends on both the number of knots and the number of additive terms used in the model function. To ensure BIC consistency, roughly the same sufficient number of knots should be used to estimate the various models, so that q_S depends only on the number of function terms. In the implementation, we have used the same number of interior knots, ˆN_n^{opt} (see (3.5), the optimal knot number for the full additive coefficient model), in the estimation of all the submodels.

4. Examples

In this section, we first analyze two simulated data sets, with an i.i.d. and a time series set-up respectively. Both data sets have sample sizes n = 100, 250 and 500, with 100 replications. The proposed methods are then successfully applied to an empirical example: the West German real GNP. The performance of the function estimators is assessed by the averaged integrated squared error (AISE). Denoting the estimator of α_{ls} in the i-th replication as ˆα_{i,ls}, and {x_m}_{m=1}^{n_grid} the grid points where the functions are evaluated, we define

ISE(ˆα_{i,ls}) = (1/n_grid) \sum_{m=1}^{n_grid} {ˆα_{i,ls}(x_m) − α_{ls}(x_m)}² and AISE(ˆα_{ls}) = (1/100) \sum_{i=1}^{100} ISE(ˆα_{i,ls}).

4.1 Simulated example 1

The data are generated from the model

Y = {c_1 + α_{11}(X_1) + α_{12}(X_2)} T_1 + {c_2 + α_{21}(X_1) + α_{22}(X_2)} T_2 + ε,

with c_1 = 2, c_2 = 1 and

α_{11}(x) = sin(4x − 2) + exp{−2(x − 0.5)²}, α_{12}(x) = x, α_{21}(x) = sin(2x), α_{22}(x) = 0.

The vector X = (X_1, X_2)^T is uniformly distributed on [−π, π]², independent of the standard bivariate normal T = (T_1, T_2)^T. The error ε is a standard normal variable independent of (X, T).
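For reference, a minimal sketch of this data-generating process, assuming the functional forms displayed above; the output (Y, X, T) could be fed to a fitting routine such as the sketch following Theorem 1.

```python
import numpy as np

def generate_example1(n, rng):
    # X uniform on [-pi, pi]^2, T standard bivariate normal, eps standard normal
    X = rng.uniform(-np.pi, np.pi, size=(n, 2))
    T = rng.standard_normal((n, 2))
    a11 = np.sin(4.0 * X[:, 0] - 2.0) + np.exp(-2.0 * (X[:, 0] - 0.5) ** 2)
    a12 = X[:, 1]
    a21 = np.sin(2.0 * X[:, 0])
    a22 = np.zeros(n)
    Y = (2.0 + a11 + a12) * T[:, 0] + (1.0 + a21 + a22) * T[:, 1] \
        + rng.standard_normal(n)
    return Y, X, T

Y, X, T = generate_example1(250, np.random.default_rng(1))
```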

The functions are estimated by linear spline (p = 1), cubic spline (p = 3) and the marginal integration method of Xue & Yang (2005a). For s = 1, 2, let x^i_{s,min}, x^i_{s,max} denote the smallest and largest observations of the variable x_s in the i-th replication. Knots are placed evenly on the interval [x^i_{s,min}, x^i_{s,max}], with the number of interior knots N_n selected by AIC as in subsection 3.2. The functions {α_{ls}}_{l,s=1,2} are estimated on a grid of equally-spaced points x_m, m = 1, ..., n_grid, with x_1 = −0.975π, x_{n_grid} = 0.975π and n_grid = 26.

Table 1 reports the means and standard errors (in parentheses) of {ĉ_l}_{l=1,2} and the AISEs of {ˆα_{ls}}_{l,s=1,2} for all three fits. The two spline fits (p = 1, 3) are generally comparable, with the cubic fit (p = 3) better than the linear fit (p = 1) for the larger sample sizes (n = 250, 500); the standard errors of the constant estimators and the AISEs of the function estimators decrease as the sample size increases, confirming Theorem 1. The polynomial spline methods also perform better than the marginal integration method. Figure 1 gives the plots of the 100 cubic spline fits for all sample sizes, clearly illustrating the improvement in estimation as the sample size increases. Plots d1-d4, of the typical estimated curves (those whose ISE is the median of the 100 ISEs from the replications), seem satisfactory for a sample size as small as 100. As mentioned earlier, the polynomial spline method enjoys great computational efficiency: it takes the polynomial spline less than 20 seconds to run 100 simulations on a Pentium 4 PC, regardless of the sample size, whereas it takes marginal integration about 2 hours to run 100 simulations with n = 100, and about 20 hours with n = 500.

For each replication, model selection is also conducted according to the criteria proposed in subsection 3.3, for polynomial splines with p = 1, 2, 3. The model selection results are presented in Table 2, indexed as Example 1. The BIC is rather accurate: more than 86% correct selection when the sample size is as small as 100, and entirely correct selection when the sample size increases to 250 and 500, corroborating the consistency of the BIC asserted in Theorem 2. AIC clearly tends to over-fit and never under-fits.

(Insert Table 1 about here)

(Insert Table 2 about here)

(Insert Figure 1 about here)

4.2 Simulated example 2

The data are generated from the nonlinear AR model

Y_t = {c_1 + α_{11}(Y_{t−1}) + α_{12}(Y_{t−2})} Y_{t−3} + {c_2 + α_{21}(Y_{t−1}) + α_{22}(Y_{t−2})} Y_{t−4} + ε_t,

with i.i.d. standard normal noise ε_t, c_1 = 0.2, c_2 = 0.3 and

α_{11}(u) = (2u) exp(−4u²), α_{12}(u) = 0.3/{1 + (2u − 1)⁴}, α_{21}(u) = 0, α_{22}(u) = (2u) exp(−4u²).

In each replication, a total of 2n observations are generated and only the last n observations are used, to ensure approximate stationarity. In this example, we have used linear splines (p = 1) on quantile knot sequences. The coefficient functions {α_{ls}}_{l,s=1,2} are estimated on a

grid of equally-spaced points on the interval [−1, 1], with the number of grid points n_grid = 41. Table 3 contains the means and standard errors of {ĉ_l}_{l=1,2} and the AISEs of {ˆα_{ls}}_{l,s=1,2}. The results are also graphically presented in Figure 2. Similar to Example 1, the estimation improves as the sample size increases, supporting the asymptotic result (Theorem 1). The model selection results are presented in Table 2, indexed as Example 2. As in Example 1, AIC tends to overfit compared with BIC, and for n = 250, 500 the model selection result is satisfactory.

(Insert Table 3 about here)

(Insert Figure 2 about here)

4.3 West German real GNP

We now compare the estimation and prediction performance of the polynomial spline and marginal integration methods on the West German real GNP. For other empirical examples, see Xue & Yang (2005b). The data consist of the quarterly West German real GNP from January 1960 to December 1990, denoted {G_t}_{t=1}^{124}, where G_t is the real GNP in the t-th quarter (the first quarter being from January 1, 1960 to April 1, 1960). From its time plot (Figure 3), {G_t}_{t=1}^{124} appears to have both trend and seasonality. After removing the seasonal means from {log(G_{t+4}/G_{t+3})}_{t=1}^{120}, we obtain a more stationary time series denoted {Y_t}_{t=1}^{120}, whose time plot is given in Figure 4. As the nonparametric alternative to the linear autoregressive model selected by BIC,

Y_t = a_1 Y_{t−2} + a_2 Y_{t−4} + σε_t,  (4.1)

Xue & Yang (2005a) proposed the additive coefficient model

Y_t = {c_1 + α_{11}(Y_{t−1}) + α_{12}(Y_{t−8})} Y_{t−2} + {c_2 + α_{21}(Y_{t−1}) + α_{22}(Y_{t−8})} Y_{t−4} + σε_t,  (4.2)

which is fitted by linear and cubic splines (p = 1, 3). Following Xue & Yang (2005a), we use the first 110 observations for estimation and the last 10 observations for one-step prediction. Table 4 gives the averaged squared estimation errors (ASE) and averaged squared prediction errors (ASPE) of the different fits. The polynomial spline is better than the marginal integration method overall, while model (4.2) significantly improves over model (4.1) in both estimation and prediction. For visualization, plots of the function estimates are given in Figure 5.

(Insert Table 4 about here)

(Insert Figure 3 about here)

(Insert Figure 4 about here)

(Insert Figure 5 about here)

5. Conclusion

A polynomial spline estimation method, together with a BIC model selection procedure, has been proposed for the semiparametric additive coefficient model. Consistency has been established under broad geometric mixing assumptions. The procedures apply to nonlinear regression with weakly dependent as well as i.i.d. observations. Implementation of the proposed method is as easy and

fast as linear regression, with excellent performance, as Theorem 1 stipulates. The empirical study illustrates that these methods are very useful for statistical inference in the multivariate regression setting.

Acknowledgements

The paper is substantially improved thanks to the helpful comments of Editor Jane-Ling Wang, an associate editor and the referees. Research has been supported in part by National Science Foundation grants BCS, DMS and SES.

References

Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, 2nd Ed. Springer-Verlag, New York.

Cai, Z., Fan, J. and Yao, Q. W. (2000). Functional-coefficient regression models for nonlinear time series. J. Amer. Statist. Assoc. 95, 941-956.

Chen, R. and Tsay, R. S. (1993a). Nonlinear additive ARX models. J. Amer. Statist. Assoc. 88, 955-967.

Chen, R. and Tsay, R. S. (1993b). Functional-coefficient autoregressive models. J. Amer. Statist. Assoc. 88, 298-308.

de Boor, C. (2001). A Practical Guide to Splines. Springer, New York.

DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation. Springer-Verlag, Berlin Heidelberg.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London.

Hastie, T. J. and Tibshirani, R. J. (1993). Varying-coefficient models. J. Roy. Statist. Soc. Ser. B 55, 757-796.

Huang, J. Z. (1998a). Projection estimation in multiple regression with application to functional ANOVA models. Ann. Statist. 26, 242-272.

Huang, J. Z. (1998b). Functional ANOVA models for generalized regression. J. Multivariate Anal. 67, 49-71.

Huang, J. Z. and Shen, H. (2004). Functional coefficient regression models for non-linear time series: a polynomial spline approach. Scand. J. Statist. 31, 515-534.

Huang, J. Z., Wu, C. O. and Zhou, L. (2002). Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89, 111-128.

Huang, J. Z. and Yang, L. (2004). Identification of nonlinear additive autoregressive models. J. Roy. Statist. Soc. Ser. B 66, 463-477.

Stone, C. J. (1985). Additive regression and other nonparametric models. Ann. Statist. 13, 689-705.

Xue, L. and Yang, L. (2005a). Estimation of semiparametric additive coefficient model. Journal of Statistical Planning and Inference, in press.

Xue, L. and Yang, L. (2005b). Additive coefficient modeling via polynomial spline, downloadable at yangli/addsplinefull.pdf.

Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA. xuelan@stt.msu.edu

Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA. yang@stt.msu.edu

Appendix

A.1 Assumptions and notations

The following assumptions are needed for our theoretical results.

(C1) The tuning variables X = (X_1, ..., X_{d_2}) are compactly supported and, without loss of generality, we assume that the support is χ = [0, 1]^{d_2}. The joint density of X, denoted f(x), is absolutely continuous and bounded away from zero and infinity; that is, 0 < c_1 ≤ min_{x∈χ} f(x) ≤ max_{x∈χ} f(x) ≤ c_2 < ∞.

(C2) (i) There exist positive constants 0 < c_3 ≤ c_4 such that c_3 I_{d_1} ≤ E(TT^T | X = x) ≤ c_4 I_{d_1} for all x ∈ χ; here I_{d_1} denotes the d_1 × d_1 identity matrix. There also exist positive constants c_5, c_6 such that c_5 ≤ E{|T_l T_{l′}|^{2+δ_0} | X = x} ≤ c_6 a.s., for some δ_0 > 0 and l, l′ = 1, ..., d_1. (ii) For some sufficiently large m > 0, E|T_l|^m < +∞, for l = 1, ..., d_1.

(C3) The d_2 sets of knots, denoted k_{s,n} = {0 = x_{s,0} ≤ x_{s,1} ≤ ··· ≤ x_{s,N_n} ≤ x_{s,N_n+1} = 1}, s = 1, ..., d_2, are quasi-uniform; that is, there exists c_7 > 0 such that

max_{s=1,...,d_2} {max(x_{s,j+1} − x_{s,j}, j = 0, ..., N_n) / min(x_{s,j+1} − x_{s,j}, j = 0, ..., N_n)} ≤ c_7.

The number of interior knots N_n ≍ n^{1/(2p+3)}, where p is the degree of the spline and ≍ means that both sides have the same order. In particular, h ≍ n^{−1/(2p+3)}.

(C4) The vector process {ς_t}_{t=−∞}^{+∞} = {(Y_t, X_t, T_t)}_{t=−∞}^{+∞} is strictly stationary and geometrically strongly mixing; that is, its α-mixing coefficient satisfies α(k) ≤ cρ^k for constants c > 0, 0 < ρ < 1, where α(k) = sup_{A∈σ(ς_t, t≤0), B∈σ(ς_t, t≥k)} |P(A)P(B) − P(AB)|.

(C5) The conditional variance function σ²(x, t) is measurable and bounded.

Assumptions (C1)-(C5) are common in the nonparametric regression literature. Assumption (C1) is the same as Condition 1, p.693 of Stone (1985), and assumption (c), p.468 of Huang & Yang (2004). Assumption (C2) (i) is a direct extension of condition (ii), p.531 of Huang & Shen (2004). Assumption (C2) (ii) is a direct extension of condition (v), p.531 of Huang & Shen (2004), and of the moment condition (A2c), p.952 of Cai, Fan & Yao (2000). Assumption (C3) is the same as in equation (6), p.249 of Huang (1998a), and also p.59 of Huang (1998b). Assumption (C4) is similar to condition (iv), p.531 of Huang & Shen (2004). Assumption (C5) is the same as on p.242 of Huang (1998a) and p.465 of Huang & Yang (2004).

In this Appendix, whenever proofs are brief, see Xue & Yang (2005b) for details.

A.2 Technical lemmas

For notational convenience we introduce, for 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2,

ᾱ_{l0} = ĉ_{l0} + \sum_{s=1}^{d_2} E α*_{ls}, ᾱ_{ls}(x_s) = α*_{ls}(x_s) − E α*_{ls}.  (A.1)

Then one can rewrite ˆm of (3.1) as ˆm = \sum_{l=1}^{d_1} {ᾱ_{l0} + \sum_{s=1}^{d_2} ᾱ_{ls}(x_s)} t_l, with {ᾱ_{ls}(x_s)}_{1≤l≤d_1} ⊂ φ_s^0. The terms in (A.1) are not directly observable and serve only as an intermediate step in the proof of Theorem 1. Observing that, for 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2,

ˆα_{l0} = ᾱ_{l0} + \sum_{s=1}^{d_2} E_n ᾱ_{ls}, ˆα_{ls}(x_s) = ᾱ_{ls}(x_s) − E_n ᾱ_{ls},  (A.2)

the terms {ᾱ_{l0}}_{l=1}^{d_1}, {ᾱ_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} and {ˆα_{l0}}_{l=1}^{d_1}, {ˆα_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} differ only by constants. In Section A.3, we first prove the consistency of {ᾱ_{l0}}_{l=1}^{d_1}, {ᾱ_{ls}(x_s)}_{l=1,s=1}^{d_1,d_2} in Theorem 3; Theorem 1 then follows by showing that the terms {E_n ᾱ_{ls}}_{l=1,s=1}^{d_1,d_2} are negligible.

The B-spline basis is used in the proofs; it is equivalent to the truncated power basis used in the implementation, but has nice local properties (de Boor, 2001). With J_n = N_n + p, we denote the B-spline basis of φ_s by b_s = {b_{s,0}, ..., b_{s,J_n}}. For 1 ≤ s ≤ d_2, denote B_s = {B_{s,1}, ..., B_{s,J_n}} with

B_{s,j} = \sqrt{N_n} {b_{s,j} − E(b_{s,j}) b_{s,0} / E(b_{s,0})}, j = 1, ..., J_n.  (A.3)

Note that Assumption (C1) ensures that all the B_{s,j} are well defined. Now let B = (1, B_{1,1}, ..., B_{1,J_n}, ..., B_{d_2,1}, ..., B_{d_2,J_n})^T, with 1 denoting the function identically equal to one on χ. Define G = (B t_1, ..., B t_{d_1})^T = (G_1, ..., G_{R_n})^T, with R_n = d_1(d_2 J_n + 1). Then G is a set of basis functions of M_n. By (3.1), one has ˆm(x, t) = \sum_{l=1}^{d_1} {ĉ_{l0} + \sum_{s=1}^{d_2} \sum_{j=1}^{J_n} ĉ_{ls,j} B_{s,j}(x_s)} t_l, in which {ĉ_{l0}, ĉ_{ls,j}, 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, 1 ≤ j ≤ J_n} minimize the sum of squares as in (3.3), with w_{s,j} replaced by B_{s,j}. Then Lemma A.1 leads to ᾱ_{l0} = ĉ_{l0}, ᾱ_{ls} = \sum_{j=1}^{J_n} ĉ_{ls,j} B_{s,j}(x_s), 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2.
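The centered basis (A.3) is straightforward to construct numerically. Below is a minimal sketch assuming a recent scipy (BSpline.design_matrix, available from scipy 1.8), with the expectations E(b_{s,j}) replaced by sample means over the observed x; that plug-in, and the clamped knot handling, are illustrative simplifications rather than the exact construction used in the proofs.

```python
import numpy as np
from scipy.interpolate import BSpline   # assumes scipy >= 1.8

def centered_bspline_basis(x, interior_knots, p=1):
    """B_{s,j} of (A.3): sqrt(N_n) * {b_{s,j} - E(b_{s,j}) b_{s,0} / E(b_{s,0})},
    j = 1..J_n, with the expectations approximated by sample means over x
    (assumes x lies in [0, 1] with some points near the left boundary)."""
    Nn = len(interior_knots)
    t = np.r_[[0.0] * (p + 1), interior_knots, [1.0] * (p + 1)]  # clamped knots
    b = BSpline.design_matrix(x, t, p).toarray()  # n x (N_n + p + 1) raw B-splines
    means = b.mean(axis=0)                        # stand-in for E b_{s,j}
    B = b[:, 1:] - np.outer(b[:, 0], means[1:] / means[0])
    return np.sqrt(Nn) * B                        # n x J_n, J_n = N_n + p
```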

Theorem 3. Under Assumptions (C1)-(C5), if α_{ls} ∈ C^{(p+1)}[0,1] for 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, one has

‖ˆm − m‖ = O_p(h^{p+1} + \sqrt{1/(nh)}), max_{1≤l≤d_1} |ᾱ_{l0} − α_{l0}| + max_{1≤l≤d_1, 1≤s≤d_2} ‖ᾱ_{ls} − α_{ls}‖ = O_p(h^{p+1} + \sqrt{1/(nh)}).

To prove Theorem 3, we first present the properties of the basis G in Lemmas A.1-A.3.

Lemma A.1. For any 1 ≤ s ≤ d_2 and the spline basis B_{s,j} of (A.3), one has (i) E(B_{s,j}) = 0, E|B_{s,j}|^k ≤ cN_n^{k/2−1} for k > 1, j = 1, ..., J_n; (ii) there exists a constant C > 0 such that, for any vector a = (a_1, ..., a_{J_n})^T, as n → ∞,

‖\sum_{j=1}^{J_n} a_j B_{s,j}‖² ≥ C \sum_{j=1}^{J_n} a_j².

Proof. Part (i) follows from Theorem 5.4.2 of DeVore & Lorentz (1993) and Assumptions (C1), (C3). To prove (ii), we introduce the auxiliary knots x_{s,−p} = ··· = x_{s,−1} = x_{s,0} = 0 and x_{s,N_n+p+1} = ··· = x_{s,N_n+2} = x_{s,N_n+1} = 1 for the knot sequence k_s. Then

‖\sum_{j=1}^{J_n} a_j B_{s,j}‖² ≥ c_1 |\sum_{j=1}^{J_n} a_j B_{s,j}|² = c_1 N_n |\sum_{j=1}^{J_n} a_j b_{s,j} − {\sum_{j=1}^{J_n} a_j E(b_{s,j})/E(b_{s,0})} b_{s,0}|²,

where |f|² = ∫ f²(x) dx for any square integrable function f. Let d_{s,j} = (x_{s,j+1} − x_{s,j−p})/(p + 1). Then Theorem 5.4.2 of DeVore & Lorentz (1993) ensures that, for some C > 0, the above is

≥ c_1 C N_n [\sum_{j=1}^{J_n} a_j² d_{s,j} + {\sum_{j=1}^{J_n} a_j E(b_{s,j})/E(b_{s,0})}² d_{s,0}] ≥ c_1 C \sum_{j=1}^{J_n} a_j² N_n d_{s,j} ≥ c_1 C {(p + 1)/(2c_7)} \sum_{j=1}^{J_n} a_j².

Lemma A.2. There exists a constant C > 0 such that, as n → ∞, for any set of coefficients {c_{l0}, c_{ls,j}, l = 1, ..., d_1; s = 1, ..., d_2; j = 1, ..., J_n},

‖\sum_{l=1}^{d_1} (c_{l0} + \sum_{s=1}^{d_2} \sum_{j=1}^{J_n} c_{ls,j} B_{s,j}) t_l‖² ≥ C \sum_{l=1}^{d_1} (c_{l0}² + \sum_{s=1}^{d_2} \sum_{j=1}^{J_n} c_{ls,j}²).

Proof. The result follows immediately from Lemmas 1 and A.1.

Lemma A.3. Let ⟨G, G⟩ be the R_n × R_n matrix (⟨G_i, G_j⟩)_{i,j=1}^{R_n}. Define ⟨G, G⟩_n similarly, but with the theoretical inner product replaced by the empirical one, and let D = diag(⟨G, G⟩). Define

Q_n = sup |D^{−1/2} (⟨G, G⟩_n − ⟨G, G⟩) D^{−1/2}|,

where the supremum is taken over all the elements of the random matrix. Then, as n → ∞, Q_n = O_p((log n)/\sqrt{nh}).

Proof. For notational simplicity, we consider the diagonal terms. For any fixed 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, 1 ≤ j ≤ J_n, let ξ = (E_n − E){B²_{s,j}(X_s) T_l²} = n^{−1} \sum_{i=1}^{n} ξ_i, in which ξ_i = B²_{s,j}(X_{is}) T_{il}² − E{B²_{s,j}(X_{is}) T_{il}²}. Define T̃_{il} = T_{il} I{|T_{il}| ≤ n^δ}, for some 0 < δ < 1, and define ξ̃, ξ̃_i as ξ and ξ_i, but with T_l replaced by T̃_l. Then for any ε > 0 one has

P{|ξ| ≥ ε(log n)/\sqrt{nh}} ≤ P{|ξ̃| ≥ ε(log n)/\sqrt{nh}} + P(ξ ≠ ξ̃),  (A.4)

in which P(ξ ≠ ξ̃) ≤ P(T_{il} ≠ T̃_{il}, for some i = 1, ..., n) ≤ n P(|T_l| ≥ n^δ) ≤ E|T_l|^m / n^{mδ−1}.

Also note that sup_{0≤x_s≤1} |B_{s,j}(x_s)| = sup_{0≤x_s≤1} \sqrt{N_n} |b_{s,j} − E(b_{s,j}) b_{s,0}/E(b_{s,0})| ≤ c\sqrt{N_n}, for some c > 0. Then by Minkowski's inequality, for any integer k ≥ 3,

E|ξ̃_i|^k ≤ 2^{k−1} [E{B²_{s,j}(X_s) T̃_l²}^k + {E B²_{s,j}(X_s) T̃_l²}^k] ≤ 2^{k−1} n^{2δk} {c^{2k} N_n^k + (cN_n)^k} ≤ n^{2δk} c^{2k} N_n^k.

On the other hand, E ξ̃_i² ≥ E{B⁴_{s,j}(X_s) T_l⁴} − E{B⁴_{s,j}(X_s) T_l⁴ I(|T_l| > n^δ)} − [E{B²_{s,j}(X_s) T_l²}]², in which, under Assumption (C2),

E{B⁴_{s,j}(X_s) T_l⁴ I(|T_l| > n^δ)} ≤ E{B⁴_{s,j}(X_s) E(T_l^{4+δ_0} | X)}/n^{δδ_0} ≤ c_6 E B⁴_{s,j}(X_s)/n^{δδ_0} ≤ cN_n/n^{δδ_0},

where δ_0 is as in Assumption (C2). Furthermore,

[E{B²_{s,j}(X_s) T_l²}]² ≤ c_4² [E{B²_{s,j}(X_s)}]² ≤ c, E{B⁴_{s,j}(X_s) T_l⁴} ≥ c_5 E B⁴_{s,j}(X_s) ≥ c_1 c_5 ∫ B⁴_{s,j}(x_s) dx_s ≥ cc_1 c_5 N_n.

Thus E ξ̃_i² ≥ cN_n − c − cN_n/n^{δδ_0} ≥ cN_n. So there exists a constant c > 0 such that, for all k ≥ 3,

E|ξ̃_i|^k ≤ n^{2δk} c^{2k} N_n^k ≤ (cn^{6δ} N_n)^{k−2} k! E ξ̃_i².

Then one can apply Theorem 1.4 of Bosq (1998) to \sum_{i=1}^{n} ξ̃_i, with the Cramér constant c_r = cn^{6δ} N_n. That is, for any ε > 0, q ∈ [1, n/2], and k ≥ 3, one has

P{n^{−1} |\sum_{i=1}^{n} ξ̃_i| ≥ ε(log n)/\sqrt{nh}} ≤ a_1 exp[−q ε²(log n)²/(nh) / {25m_2² + 5εc_r (log n)/\sqrt{nh}}] + a_2(k) α([n/(q + 1)])^{2k/(2k+1)},

where

a_1 = 2n/q + 2[1 + ε²(log n)²/(nh) / {25m_2² + 5εc_r (log n)/\sqrt{nh}}], a_2(k) = 11n [1 + 5m_p^{k/(2k+1)} / {ε(log n)/\sqrt{nh}}], m_2² = E ξ̃_i², m_p = ‖ξ̃_i‖_p.

Observe that 5εc_r (log n)/\sqrt{nh} = 5εc n^{6δ} N_n (log n)/\sqrt{nh} = o(1), by taking δ < p/{6(2p+3)}. Taking q = n/{c_0 (log n)}, one has a_1 = O(n/q) = O{c_0 (log n)} and, for n large enough, a_2(k) α([n/(q + 1)])^{2k/(2k+1)} = o(n^{−3}) by taking k large enough. Thus

P{n^{−1} |\sum_{i=1}^{n} ξ̃_i| ≥ ε(log n)/\sqrt{nh}} ≤ c(log n) exp{−ε²(log n)/(50c_0 m_2²)} + cn³ exp{(log ρ) c_0 (log n)}.

Thus by (A.4), taking c_0, ε, m large enough and using Assumption (C4), one has

\sum_{n=1}^{∞} P{sup |⟨G, G⟩_n − ⟨G, G⟩| ≥ ε(log n)/\sqrt{nh}} ≤ \sum_{n=1}^{∞} {d_1(d_2 N_n + 2)}² [c(log n) exp{−ε²(log n)/(50c_0 m_2²)} + cn³ exp{(log ρ) c_0 (log n)} + E|T_l|^m / n^{mδ−1}] ≤ \sum_{n=1}^{∞} {d_1(d_2 N_n + 2)}² n^{−3} < +∞,

in which N_n ≍ n^{1/(2p+3)}. Then the lemma follows from the Borel-Cantelli Lemma and Lemma A.1.

Lemma A.4. As n → ∞, one has

sup_{φ_1 ∈ M_n, φ_2 ∈ M_n} |⟨φ_1, φ_2⟩_n − ⟨φ_1, φ_2⟩| / (‖φ_1‖ ‖φ_2‖) = O_p((log n)/\sqrt{nh}).

In particular, there exist constants 0 < c < 1 < C such that, except on an event whose probability tends to zero as n → ∞,

c‖m‖ ≤ ‖m‖_n ≤ C‖m‖, m ∈ M_n.

Proof. With vector notation, one can write φ_1 = a_1^T G, φ_2 = a_2^T G, for R_n × 1 vectors a_1, a_2. Then

|⟨φ_1, φ_2⟩_n − ⟨φ_1, φ_2⟩| = |\sum_{i,j=1}^{R_n} a_{1i} a_{2j} (⟨G_i, G_j⟩_n − ⟨G_i, G_j⟩)| ≤ Q_n C \sqrt{a_1^T a_1 · a_2^T a_2}.

On the other hand, by Lemma A.2,

‖φ_1‖² ‖φ_2‖² = (a_1^T ⟨G, G⟩ a_1)(a_2^T ⟨G, G⟩ a_2) ≥ C (a_1^T a_1)(a_2^T a_2).

Then

|⟨φ_1, φ_2⟩_n − ⟨φ_1, φ_2⟩| / (‖φ_1‖ ‖φ_2‖) ≤ C Q_n \sqrt{a_1^T a_1 · a_2^T a_2} / \sqrt{a_1^T a_1 · a_2^T a_2} = O_p(Q_n) = O_p((log n)/\sqrt{nh}).

Lemma A.4 shows that the empirical and theoretical inner products are uniformly close over the approximation space M_n. This lemma plays the crucial role analogous to that of Lemma 10 of Huang (1998a). Our result is new in that (i) the spline basis of Huang (1998a) must be bounded,

whereas the term t in the basis G makes it possibly unbounded; (ii) Huang (1998a)'s setting is i.i.d., with uniform approximation rate o_p(1), while our setting is α-mixing, broadly applicable to time series data, with the sharper approximation rate O_p((log n)/\sqrt{nh}).

The next lemma follows immediately from Lemmas A.2 and A.4.

Lemma A.5. There exists a constant C > 0 such that, except on an event whose probability tends to zero as n → ∞,

‖\sum_{l=1}^{d_1} (c_{l0} + \sum_{s=1}^{d_2} \sum_{j=1}^{J_n} c_{ls,j} B_{s,j}) t_l‖²_n ≥ C \sum_{l=1}^{d_1} (c_{l0}² + \sum_{s=1}^{d_2} \sum_{j=1}^{J_n} c_{ls,j}²).

A.3 Proof of mean square consistency

Proof of Theorem 3. We denote

Y = (Y_1, ..., Y_n)^T, m = {m(X_1, T_1), ..., m(X_n, T_n)}^T, E = {σ(X_1, T_1)ε_1, ..., σ(X_n, T_n)ε_n}^T.

Note that Y = m + E, and projecting it onto M_n, one has ˆm = m̃ + ẽ, where ˆm is defined in (3.1), and m̃, ẽ are the solutions of (3.1) with Y_i replaced by m(X_i, T_i) and σ(X_i, T_i)ε_i respectively. Also, one can uniquely represent m̃ as m̃ = \sum_{l=1}^{d_1} (α̃_{l0} + \sum_{s=1}^{d_2} α̃_{ls}) t_l, α̃_{ls} ∈ φ_s^{0,n}. With these notations, one has the error decomposition

ˆm − m = (m̃ − m) + ẽ,

where m̃ − m is the bias term and ẽ is the variance term. Since α_{ls} ∈ C^{(p+1)}[0,1] for 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2, by Lemma 2 there exist C > 0 and spline functions g_{ls} ∈ φ_s^0 such that ‖α_{ls} − g_{ls}‖_∞ ≤ Ch^{p+1}. Let m_n(x, t) = \sum_{l=1}^{d_1} {α_{l0} + \sum_{s=1}^{d_2} g_{ls}(x_s)} t_l ∈ M_n. One has

‖m − m_n‖² = ‖\sum_{l=1}^{d_1} \sum_{s=1}^{d_2} {α_{ls} − g_{ls}} t_l‖² ≤ c_4 \sum_{l=1}^{d_1} \sum_{s=1}^{d_2} ‖α_{ls} − g_{ls}‖² ≤ Ch^{2(p+1)}.  (A.5)

Also, ‖m − m_n‖_n ≤ Ch^{p+1} a.s. Then by the definition of projection, one has ‖m̃ − m‖_n ≤ ‖m − m_n‖_n ≤ Ch^{p+1}, which also implies ‖m̃ − m_n‖_n ≤ ‖m̃ − m‖_n + ‖m − m_n‖_n ≤ 2Ch^{p+1}. By Lemma A.4,

‖m̃ − m_n‖ ≤ ‖m̃ − m_n‖_n (1 − Q_n)^{−1/2} = O_p(h^{p+1}).

Together with (A.5), one has

‖m̃ − m‖ = O_p(h^{p+1}).  (A.6)

Next we consider the variance term ẽ. For some set of coefficients â = (â_1, ..., â_{R_n})^T, one can write ẽ(x, t) = \sum_{j=1}^{R_n} â_j G_j(x, t). Denote N = [N(G_1), ..., N(G_{R_n})]^T, with N(G_j) = n^{−1} \sum_{i=1}^{n} G_j(X_i, T_i) σ(X_i, T_i) ε_i. By the definition of projection, one has

(⟨G_j, G_{j′}⟩_n)_{j,j′=1}^{R_n} â = N.

Multiplying both sides with the same vector, one gets â^T (⟨G_j, G_{j′}⟩_n)_{j,j′=1}^{R_n} â = â^T N. Now, by Lemmas A.2 and A.4, the LHS is ‖\sum_{j=1}^{R_n} â_j G_j‖²_n ≥ C(1 − Q_n) \sum_{j=1}^{R_n} â_j², while the RHS is

≤ (\sum_{j=1}^{R_n} â_j²)^{1/2} [\sum_{j=1}^{R_n} {n^{−1} \sum_{i=1}^{n} G_j(X_i, T_i) σ(X_i, T_i) ε_i}²]^{1/2}.

Hence C(1 − Q_n) \sum_{j=1}^{R_n} â_j² ≤ (\sum_{j=1}^{R_n} â_j²)^{1/2} [\sum_{j=1}^{R_n} {n^{−1} \sum_{i=1}^{n} G_j(X_i, T_i) σ(X_i, T_i) ε_i}²]^{1/2}, entailing

(\sum_{j=1}^{R_n} â_j²)^{1/2} ≤ C^{−1}(1 − Q_n)^{−1} [\sum_{j=1}^{R_n} {n^{−1} \sum_{i=1}^{n} G_j(X_i, T_i) σ(X_i, T_i) ε_i}²]^{1/2},

and as a result ‖ẽ‖² ≤ C(1 − Q_n)^{−2} \sum_{j=1}^{R_n} {n^{−1} \sum_{i=1}^{n} G_j(X_i, T_i) σ(X_i, T_i) ε_i}². Since ε_i is independent of {(X_j, T_j), j ≤ i}, for i = 1, ..., n, one has

\sum_{j=1}^{R_n} E{n^{−1} \sum_{i=1}^{n} G_j(X_i, T_i) σ(X_i, T_i) ε_i}² = \sum_{j=1}^{R_n} n^{−2} \sum_{i=1}^{n} E{G_j(X_i, T_i) σ(X_i, T_i) ε_i}² ≤ CJ_n/n = O(1/(nh)),

by Assumptions (C2), (C5) and Lemma A.1 (i). Therefore ‖ẽ‖² = O_p(n^{−1}h^{−1}). This, together with (A.6), proves ‖ˆm − m‖ = O_p(h^{p+1} + \sqrt{1/(nh)}). The second part of Theorem 3 follows from Lemma 1, which entails that, for some C > 0, ‖ˆm − m‖² ≥ C \sum_{l=1}^{d_1} [(ᾱ_{l0} − α_{l0})² + \sum_{s=1}^{d_2} ‖ᾱ_{ls} − α_{ls}‖²].

Proof of Theorem 1. By (A.2), one only needs to show E_n ᾱ_{ls} = O_p(h^{p+1} + \sqrt{1/(nh)}), for 1 ≤ l ≤ d_1, 1 ≤ s ≤ d_2. Note that |E_n ᾱ_{ls}| ≤ |E_n{ᾱ_{ls} − α_{ls}}| + |E_n α_{ls}|, whose first term satisfies

|E_n{ᾱ_{ls} − α_{ls}}| ≤ ‖ᾱ_{ls} − α_{ls}‖_n ≤ ‖ᾱ_{ls} − α̃_{ls}‖_n + ‖α̃_{ls} − g_{ls}‖_n + ‖α_{ls} − g_{ls}‖_n,

with ‖α_{ls} − g_{ls}‖_n ≤ ‖α_{ls} − g_{ls}‖_∞ ≤ Ch^{p+1}, and, applying Lemmas 1 and A.3, one has

‖α̃_{ls} − g_{ls}‖_n ≤ (1 + Q_n) ‖α̃_{ls} − g_{ls}‖ ≤ (1 + Q_n) C ‖m̃ − m_n‖ = O_p(h^{p+1}),

‖ᾱ_{ls} − α̃_{ls}‖_n ≤ (1 + Q_n) ‖ᾱ_{ls} − α̃_{ls}‖ ≤ (1 + Q_n) C ‖ẽ‖ = O_p(\sqrt{1/(nh)}).

Thus E_n{ᾱ_{ls} − α_{ls}} = O_p(h^{p+1} + \sqrt{1/(nh)}). Since E_n α_{ls} = O_p(1/\sqrt{n}), one now has E_n ᾱ_{ls} = O_p(h^{p+1} + \sqrt{1/(nh)}). Theorem 1 now follows from the triangular inequality.

A.4 Proof of BIC consistency

We denote the model space M_S and the approximation space M_{n,S} of m_S respectively as

M_S = {m(x, t) = \sum_{l=1}^{d_1} α_l(x) t_l ; α_l(x) = α_{l0} + \sum_{s∈S_l} α_{ls}(x_s); α_{ls} ∈ H_s^0},

M_{n,S} = {m_n(x, t) = \sum_{l=1}^{d_1} g_l(x) t_l ; g_l(x) = α_{l0} + \sum_{s∈S_l} g_{ls}(x_s); g_{ls} ∈ φ_s^{0,n}}.

If S ⊆ S_f, then M_S ⊆ M_{S_f} and M_{n,S} ⊆ M_{n,S_f}. Let Proj_S (Proj_{n,S}) be the orthogonal least squares projector onto M_S (M_{n,S}) with respect to the empirical inner product. Then ˆm_S of (3.7) can be viewed as ˆm_S = Proj_{n,S} Y. As a special case of Theorem 1, one has the following result.

Lemma A.6. Under the same conditions as in Theorem 1, one has ‖ˆm_S − m_S‖ = O_p(1/N_S^{p+1} + \sqrt{N_S/n}).

Now denote c(S, m) = ‖Proj_S m − m‖². If m ∈ M_{S_0}, then Proj_{S_0} m = m, thus c(S_0, m) = 0; if S overfits, since m ∈ M_{S_0} ⊆ M_S, c(S, m) = 0; and if S underfits, c(S, m) > 0.

Proof of Theorem 2. Notice that

BIC_S − BIC_{S_0} = {(MSE_S − MSE_{S_0})/MSE_{S_0}} {1 + o_p(1)} + (q_S − q_{S_0})(log n)/n = {(MSE_S − MSE_{S_0})/[E{σ²(X, T)}(1 + o_p(1))]} {1 + o_p(1)} + O(n^{−(2p+2)/(2p+3)} log n),

since q_S − q_{S_0} ≍ n^{1/(2p+3)}, and

MSE_{S_0} ≈ n^{−1} \sum_{i=1}^{n} {Y_i − m(X_i, T_i)}² + n^{−1} \sum_{i=1}^{n} {ˆm_{S_0}(X_i, T_i) − m(X_i, T_i)}² = E{σ²(X, T)}(1 + o_p(1)).

Case 1 (Overfitting): Suppose that S_0 ⊂ S and S_0 ≠ S. One has

|MSE_S − MSE_{S_0}| = ‖ˆm_S − ˆm_{S_0}‖²_n = ‖ˆm_S − ˆm_{S_0}‖² {1 + o_p(1)} ≤ 2(‖ˆm_S − m‖² + ‖ˆm_{S_0} − m‖²) {1 + o_p(1)} = O_p(n^{−(2p+2)/(2p+3)}),

which the positive penalty term (q_S − q_{S_0})(log n)/n ≍ n^{−(2p+2)/(2p+3)} log n dominates. Thus lim_{n→+∞} P(BIC_S − BIC_{S_0} > 0) = 1.

To see why the assumption q_S − q_{S_0} ≍ n^{1/(2p+3)} is necessary, suppose q_{S_0} ≍ n^r, with r > 1/(2p+3), instead. Then it can be shown that

MSE_S − MSE_{S_0} = E{σ²(X, T)} n^{r−1} {1 + o_p(1)}, while (q_S − q_{S_0})(log n)/n = −n^{r−1}(log n) {1 + o_p(1)},

which leads to lim_{n→+∞} P(BIC_S − BIC_{S_0} < 0) = 1 instead.

Case 2 (Underfitting): Similarly as in Huang & Yang (2004), we can show that if S underfits, MSE_S − MSE_{S_0} ≥ c(S, m) + o_p(1). Then

BIC_S − BIC_{S_0} ≥ {c(S, m) + o_p(1)}/[E{σ²(X, T)}(1 + o_p(1))] + o_p(1),

which implies that lim_{n→+∞} P(BIC_S − BIC_{S_0} > 0) = 1.
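As a small illustration of how the fitting outcomes tallied in Table 2 are determined, the following sketch classifies a selected index set Ŝ against the true S_0, directly implementing the definitions of Section 3.3; representing the index sets as Python sets is an assumption of this sketch.

```python
def classify_selection(S_hat, S_0):
    """Classify Ŝ against S_0 as in Section 3.3: 'correct' if equal,
    'overfit' if every true index set is contained in the selected one,
    'underfit' otherwise (some true component is missed).
    Each argument: a list of d_1 sets of tuning-variable indices."""
    if all(sel == true for sel, true in zip(S_hat, S_0)):
        return "correct"
    if all(true <= sel for sel, true in zip(S_hat, S_0)):
        return "overfit"
    return "underfit"

# e.g. with d_1 = 2, d_2 = 2:
print(classify_selection([{1, 2}, {1}], [{1, 2}, set()]))  # 'overfit'
```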

Table 1: Simulated example 1: the means and standard errors (in parentheses) of ĉ_1, ĉ_2 and the AISEs of ˆα_{11}, ˆα_{12}, ˆα_{21}, ˆα_{22} by two methods: marginal integration and polynomial spline. (Rows: Integration fit, Spline fit p = 1, Spline fit p = 3, each at n = 100, 250, 500; columns: c_1 = 2, c_2 = 1, α_{11}, α_{12}, α_{21}, α_{22}. Table entries not reproduced.)

Table 2: Simulation results of model selection using the polynomial spline fit with the BIC and AIC. For each setup, the first, second and third columns give the numbers of underfitting, correct fitting and overfitting over 100 simulations. (Rows: Example 1 and Example 2, each at n = 100, 250, 500; columns: BIC and AIC, each for p = 1, 2, 3. Table entries not reproduced.)

Table 3: Results of simulated example 2: the means and standard errors (in parentheses) of ĉ_1 and ĉ_2, and the AISEs of ˆα_{11}, ˆα_{12}, ˆα_{21}, ˆα_{22} by linear spline (p = 1). (Rows: n = 100, 250, 500; columns: c_1 = 0.2, c_2 = 0.3, α_{11}, α_{12}, α_{21}, α_{22}. Table entries not reproduced.)

Table 4: German real GNP: the ASEs and ASPEs of five fits. (Rows: Integration fit p = 1, Integration fit p = 3, Spline fit p = 1, Spline fit p = 3, Linear AR fit; columns: ASE, ASPE. Table entries not reproduced.)

Figure 1: Plots of the estimated coefficient functions in simulated example 1. (a1)-(a4) are plots of the 100 estimated curves for α_{11}(x) = sin(4x − 2) + exp{−2(x − 0.5)²}, α_{12}(x) = x, α_{21}(x) = sin(2x), α_{22}(x) = 0, for n = 100. (b1)-(b4) and (c1)-(c4) are the same as (a1)-(a4), but for n = 250 and n = 500 respectively. (d1)-(d4) are plots of the typical estimators: the solid curve represents the true curve, the dotted curve is the typical estimated curve for n = 100, and the dot-dashed and dashed curves are for n = 250, 500 respectively.

Figure 2: Plots of the estimated coefficient functions in simulated example 2. (a1)-(a4) are plots of the 100 estimated curves for α_{11}(u) = (2u) exp(−4u²), α_{12}(u) = 0.3/{1 + (2u − 1)⁴}, α_{21}(u) = 0, α_{22}(u) = (2u) exp(−4u²), when n = 100. (b1)-(b4) and (c1)-(c4) are the same as (a1)-(a4), but when n = 250, 500 respectively. (d1)-(d4) are plots of the typical estimators: the solid curve represents the true curve, the dotted curve is the typical estimated curve for n = 100, and the dot-dashed and dashed curves are for n = 250, 500 respectively.

Original quarterly GNP data

Figure 3: GNP data: time plot of the series {G_t}_{t=1}^{124}.

Transformed quarterly GNP data

Figure 4: GNP data after transformation: time plot of the series {Y_t}_{t=1}^{120}.

Figure 5: GNP data: spline approximations of the functions in model (4.2): (a) ˆα_{11}; (b) ˆα_{12}; (c) ˆα_{21}; (d) ˆα_{22}, in which solid lines denote the estimates using linear spline (p = 1), and dotted lines denote the estimates using cubic spline (p = 3).


More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

Stochastic volatility models: tails and memory

Stochastic volatility models: tails and memory : tails and memory Rafa l Kulik and Philippe Soulier Conference in honour of Prof. Murad Taqqu 19 April 2012 Rafa l Kulik and Philippe Soulier Plan Model assumptions; Limit theorems for partial sums and

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

Quantile Regression for Extraordinarily Large Data

Quantile Regression for Extraordinarily Large Data Quantile Regression for Extraordinarily Large Data Shih-Kang Chao Department of Statistics Purdue University November, 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile regression Two-step

More information

A New Method for Varying Adaptive Bandwidth Selection

A New Method for Varying Adaptive Bandwidth Selection IEEE TRASACTIOS O SIGAL PROCESSIG, VOL. 47, O. 9, SEPTEMBER 1999 2567 TABLE I SQUARE ROOT MEA SQUARED ERRORS (SRMSE) OF ESTIMATIO USIG THE LPA AD VARIOUS WAVELET METHODS A ew Method for Varying Adaptive

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th

Professors Lin and Ying are to be congratulated for an interesting paper on a challenging topic and for introducing survival analysis techniques to th DISCUSSION OF THE PAPER BY LIN AND YING Xihong Lin and Raymond J. Carroll Λ July 21, 2000 Λ Xihong Lin (xlin@sph.umich.edu) is Associate Professor, Department ofbiostatistics, University of Michigan, Ann

More information

Wavelet Shrinkage for Nonequispaced Samples

Wavelet Shrinkage for Nonequispaced Samples University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Wavelet Shrinkage for Nonequispaced Samples T. Tony Cai University of Pennsylvania Lawrence D. Brown University

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Mohsen Pourahmadi. 1. A sampling theorem for multivariate stationary processes. J. of Multivariate Analysis, Vol. 13, No. 1 (1983),

Mohsen Pourahmadi. 1. A sampling theorem for multivariate stationary processes. J. of Multivariate Analysis, Vol. 13, No. 1 (1983), Mohsen Pourahmadi PUBLICATIONS Books and Editorial Activities: 1. Foundations of Time Series Analysis and Prediction Theory, John Wiley, 2001. 2. Computing Science and Statistics, 31, 2000, the Proceedings

More information

Estimating GARCH models: when to use what?

Estimating GARCH models: when to use what? Econometrics Journal (2008), volume, pp. 27 38. doi: 0./j.368-423X.2008.00229.x Estimating GARCH models: when to use what? DA HUANG, HANSHENG WANG AND QIWEI YAO, Guanghua School of Management, Peking University,

More information

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Statistica Sinica 26 (2016), 1117-1128 doi:http://dx.doi.org/10.5705/ss.202015.0240 A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Xu He and Peter Z. G. Qian Chinese Academy of Sciences

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Research Article Exponential Inequalities for Positively Associated Random Variables and Applications

Research Article Exponential Inequalities for Positively Associated Random Variables and Applications Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 008, Article ID 38536, 11 pages doi:10.1155/008/38536 Research Article Exponential Inequalities for Positively Associated

More information

Modelling Non-linear and Non-stationary Time Series

Modelling Non-linear and Non-stationary Time Series Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September

More information

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE Estimating the error distribution in nonparametric multiple regression with applications to model testing Natalie Neumeyer & Ingrid Van Keilegom Preprint No. 2008-01 July 2008 DEPARTMENT MATHEMATIK ARBEITSBEREICH

More information

Multiscale Adaptive Inference on Conditional Moment Inequalities

Multiscale Adaptive Inference on Conditional Moment Inequalities Multiscale Adaptive Inference on Conditional Moment Inequalities Timothy B. Armstrong 1 Hock Peng Chan 2 1 Yale University 2 National University of Singapore June 2013 Conditional moment inequality models

More information

Forecasting Levels of log Variables in Vector Autoregressions

Forecasting Levels of log Variables in Vector Autoregressions September 24, 200 Forecasting Levels of log Variables in Vector Autoregressions Gunnar Bårdsen Department of Economics, Dragvoll, NTNU, N-749 Trondheim, NORWAY email: gunnar.bardsen@svt.ntnu.no Helmut

More information

Heteroskedasticity; Step Changes; VARMA models; Likelihood ratio test statistic; Cusum statistic.

Heteroskedasticity; Step Changes; VARMA models; Likelihood ratio test statistic; Cusum statistic. 47 3!,57 Statistics and Econometrics Series 5 Febrary 24 Departamento de Estadística y Econometría Universidad Carlos III de Madrid Calle Madrid, 126 2893 Getafe (Spain) Fax (34) 91 624-98-49 VARIANCE

More information

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction Sankhyā : The Indian Journal of Statistics 2007, Volume 69, Part 4, pp. 700-716 c 2007, Indian Statistical Institute More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order

More information

Interaction effects for continuous predictors in regression modeling

Interaction effects for continuous predictors in regression modeling Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage

More information

Identification and Estimation for Generalized Varying Coefficient Partially Linear Models

Identification and Estimation for Generalized Varying Coefficient Partially Linear Models Identification and Estimation for Generalized Varying Coefficient Partially Linear Models Mingqiu Wang, Xiuli Wang and Muhammad Amin Abstract The generalized varying coefficient partially linear model

More information

Bahadur representations for bootstrap quantiles 1

Bahadur representations for bootstrap quantiles 1 Bahadur representations for bootstrap quantiles 1 Yijun Zuo Department of Statistics and Probability, Michigan State University East Lansing, MI 48824, USA zuo@msu.edu 1 Research partially supported by

More information

PARAMETER ESTIMATION OF CHIRP SIGNALS IN PRESENCE OF STATIONARY NOISE

PARAMETER ESTIMATION OF CHIRP SIGNALS IN PRESENCE OF STATIONARY NOISE PARAMETER ESTIMATION OF CHIRP SIGNALS IN PRESENCE OF STATIONARY NOISE DEBASIS KUNDU AND SWAGATA NANDI Abstract. The problem of parameter estimation of the chirp signals in presence of stationary noise

More information

SPLINE ESTIMATION OF SINGLE-INDEX MODELS

SPLINE ESTIMATION OF SINGLE-INDEX MODELS SPLINE ESIMAION OF SINGLE-INDEX MODELS Li Wang and Lijian Yang University of Georgia and Mihigan State University Supplementary Material his note ontains proofs for the main results he following two propositions

More information

Lecture 2: Univariate Time Series

Lecture 2: Univariate Time Series Lecture 2: Univariate Time Series Analysis: Conditional and Unconditional Densities, Stationarity, ARMA Processes Prof. Massimo Guidolin 20192 Financial Econometrics Spring/Winter 2017 Overview Motivation:

More information

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes

Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Alea 4, 117 129 (2008) Convergence rates in weighted L 1 spaces of kernel density estimators for linear processes Anton Schick and Wolfgang Wefelmeyer Anton Schick, Department of Mathematical Sciences,

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa On the Use of Marginal Likelihood in Model Selection Peide Shi Department of Probability and Statistics Peking University, Beijing 100871 P. R. China Chih-Ling Tsai Graduate School of Management University

More information

arxiv: v1 [stat.me] 30 Dec 2017

arxiv: v1 [stat.me] 30 Dec 2017 arxiv:1801.00105v1 [stat.me] 30 Dec 2017 An ISIS screening approach involving threshold/partition for variable selection in linear regression 1. Introduction Yu-Hsiang Cheng e-mail: 96354501@nccu.edu.tw

More information

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric

More information

Jump detection in time series nonparametric regression models: a polynomial spline approach

Jump detection in time series nonparametric regression models: a polynomial spline approach Ann Inst Stat Math (2014) 66:325 344 DOI 10.1007/s10463-013-0411-3 Jump detection in time series nonparametric regression models: a polynomial spline approach Yujiao Yang Qiongxia Song Received: 7 June

More information

11. Further Issues in Using OLS with TS Data

11. Further Issues in Using OLS with TS Data 11. Further Issues in Using OLS with TS Data With TS, including lags of the dependent variable often allow us to fit much better the variation in y Exact distribution theory is rarely available in TS applications,

More information

ON THE ESTIMATION OF EXTREME TAIL PROBABILITIES. By Peter Hall and Ishay Weissman Australian National University and Technion

ON THE ESTIMATION OF EXTREME TAIL PROBABILITIES. By Peter Hall and Ishay Weissman Australian National University and Technion The Annals of Statistics 1997, Vol. 25, No. 3, 1311 1326 ON THE ESTIMATION OF EXTREME TAIL PROBABILITIES By Peter Hall and Ishay Weissman Australian National University and Technion Applications of extreme

More information

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I.

Vector Autoregressive Model. Vector Autoregressions II. Estimation of Vector Autoregressions II. Estimation of Vector Autoregressions I. Vector Autoregressive Model Vector Autoregressions II Empirical Macroeconomics - Lect 2 Dr. Ana Beatriz Galvao Queen Mary University of London January 2012 A VAR(p) model of the m 1 vector of time series

More information

Plug-in Approach to Active Learning

Plug-in Approach to Active Learning Plug-in Approach to Active Learning Stanislav Minsker Stanislav Minsker (Georgia Tech) Plug-in approach to active learning 1 / 18 Prediction framework Let (X, Y ) be a random couple in R d { 1, +1}. X

More information

Classification via kernel regression based on univariate product density estimators

Classification via kernel regression based on univariate product density estimators Classification via kernel regression based on univariate product density estimators Bezza Hafidi 1, Abdelkarim Merbouha 2, and Abdallah Mkhadri 1 1 Department of Mathematics, Cadi Ayyad University, BP

More information

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013 Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description

More information

Forecasting Lecture 2: Forecast Combination, Multi-Step Forecasts

Forecasting Lecture 2: Forecast Combination, Multi-Step Forecasts Forecasting Lecture 2: Forecast Combination, Multi-Step Forecasts Bruce E. Hansen Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Forecast Combination and Multi-Step Forecasts

More information

Bayesian Estimation and Inference for the Generalized Partial Linear Model

Bayesian Estimation and Inference for the Generalized Partial Linear Model Bayesian Estimation Inference for the Generalized Partial Linear Model Haitham M. Yousof 1, Ahmed M. Gad 2 1 Department of Statistics, Mathematics Insurance, Benha University, Egypt. 2 Department of Statistics,

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

High-dimensional two-sample tests under strongly spiked eigenvalue models

High-dimensional two-sample tests under strongly spiked eigenvalue models 1 High-dimensional two-sample tests under strongly spiked eigenvalue models Makoto Aoshima and Kazuyoshi Yata University of Tsukuba Abstract: We consider a new two-sample test for high-dimensional data

More information

Tutorial lecture 2: System identification

Tutorial lecture 2: System identification Tutorial lecture 2: System identification Data driven modeling: Find a good model from noisy data. Model class: Set of all a priori feasible candidate systems Identification procedure: Attach a system

More information

STAT Financial Time Series

STAT Financial Time Series STAT 6104 - Financial Time Series Chapter 4 - Estimation in the time Domain Chun Yip Yau (CUHK) STAT 6104:Financial Time Series 1 / 46 Agenda 1 Introduction 2 Moment Estimates 3 Autoregressive Models (AR

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

Gaussian processes. Basic Properties VAG002-

Gaussian processes. Basic Properties VAG002- Gaussian processes The class of Gaussian processes is one of the most widely used families of stochastic processes for modeling dependent data observed over time, or space, or time and space. The popularity

More information

Nonparametric Density Estimation

Nonparametric Density Estimation Nonparametric Density Estimation Econ 690 Purdue University Justin L. Tobias (Purdue) Nonparametric Density Estimation 1 / 29 Density Estimation Suppose that you had some data, say on wages, and you wanted

More information

DS-GA 1002 Lecture notes 12 Fall Linear regression

DS-GA 1002 Lecture notes 12 Fall Linear regression DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,

More information

Asymptotic Normality under Two-Phase Sampling Designs

Asymptotic Normality under Two-Phase Sampling Designs Asymptotic Normality under Two-Phase Sampling Designs Jiahua Chen and J. N. K. Rao University of Waterloo and University of Carleton Abstract Large sample properties of statistical inferences in the context

More information