Efficient Regressions via Optimally Combining Quantile Information


Zhibiao Zhao (Penn State University) and Zhijie Xiao (Boston College)

September 29, 2011

Abstract

We study efficient estimation of regression models via quantile regression. Both the classical parametric linear regression model and the nonparametric regression model are investigated. We argue that it is crucial to optimally combine information over quantiles. We investigate both the weighted composite quantile regression estimator based on a quantile weighted loss function, and the weighted quantile average estimator based on a weighted average of quantile regression estimators at single quantiles. We show that efficient regression estimators can be constructed using quantile regression. In the case of non-regular statistical estimation, the proposed estimators lead to super-efficient estimation. Asymptotic properties of the proposed estimators are studied, and efficiency comparisons are conducted among these estimators and other estimators such as the ordinary least squares estimator and the least-absolute-deviation estimator. We also conduct Monte Carlo experiments to investigate the sampling performance of the proposed estimators.

AMS 2000: Primary 62J05, 62G08; Secondary 62G20.

Keywords: Asymptotic normality, Bahadur representation, Composite quantile regression, Efficiency, Fisher information, Quantile regression, Super-efficiency.

We thank Roger Koenker and Steve Portnoy for their very helpful comments. Zhibiao Zhao: Department of Statistics, Penn State University, University Park, PA 16802; zuz13@stat.psu.edu. Zhijie Xiao: Department of Economics, Boston College, Chestnut Hill, MA 02467; zhijie.xiao@bc.edu.

1 Introduction

We are interested in the problem of efficient estimation of regression models:

Y = m(X, θ) + ε,  (1)

where X is the p-dimensional covariate vector, Y ∈ R is the response, ε is the noise, and θ is the vector of unknown parameters. In particular, although our method may be extended to the estimation of other statistical models (see the discussion in Section 6), we consider in this paper two leading models that are widely used in statistical applications: the classical parametric linear regression model, where θ is finite dimensional, and the nonparametric regression model, where the parameter is of infinite dimension.

In practice, model (1) is usually estimated via an ordinary least squares (OLS) regression (for the parametric model) or a local least squares regression (in the nonparametric case) of Y on X based on a collection of observations. When ε is normally distributed, the widely used OLS estimator has a likelihood interpretation and is the most efficient estimator. In the absence of Gaussianity, the OLS estimator is usually less efficient than methods that exploit the distributional information, although it may still be consistent under appropriate regularity conditions. Without these regularity conditions, the OLS estimator may not even be consistent, for example, when the error has a heavy-tailed distribution such as the Cauchy distribution. Monte Carlo evidence indicates that the OLS estimator can be very sensitive to outliers. In empirical analysis, many applications (such as finance and economics) involve data whose distributions are heavy-tailed or skewed, and estimators based on OLS may perform poorly in these cases. It is therefore important to develop robust and efficient estimation procedures for general innovation distributions. If the error distribution were known, the maximum likelihood estimator (MLE) could be constructed.
Under appropriate regularity conditions, the MLE is asymptotically normal and asymptotically efficient in the sense that the limiting covariance matrix attains the Cramér-Rao lower bound. In practice, the true density function is generally unknown and so the MLE is not feasible. However, the MLE and the Cramér-Rao efficiency bound serve as a standard against which we should measure our estimators. The existence and construction of asymptotically efficient regression estimators have attracted much research attention; see, e.g., Stein (1956), Beran (1974), and Stone (

In his Wald lecture, based on a nonparametric estimate of the score function and a one-step Newton-Raphson approximation, Bickel (1982) proposed an asymptotically efficient estimator of linear regressions with unknown symmetric error distribution. He also showed that, in the presence of an intercept term, an asymptotically efficient estimator of the slope parameters can be obtained without the symmetry assumption.

We believe that the quantile regression technique (Koenker and Bassett, 1978) can provide a useful method for efficient statistical estimation. Intuitively, an estimation method that exploits the distributional information can potentially provide an efficient estimator. Since quantile regression provides a way of estimating the whole conditional distribution, appropriately using quantile regressions may improve estimation efficiency. Under regularity assumptions, the least-absolute-deviation (LAD) regression (a quantile regression at the median) can provide better estimators than the OLS in the presence of heavy-tailed distributions. In addition, for certain distributions, a quantile regression at a non-median quantile may deliver a more efficient estimator than the LAD method. Furthermore, additional efficiency gains may be achieved by aggregating information over multiple quantiles.

Although using quantile regression over multiple quantiles can potentially improve the efficiency of estimation, this is often much easier to say than to do in a satisfactory way. To combine information from quantile regression, one may consider combining information over different quantiles via the criterion function of the estimation procedure, or combining information based on estimators at different quantiles. An important contribution along the second direction is Portnoy and Koenker (1989), who proposed an asymptotically efficient estimator for the slope parameters of the classical linear regression model.
Chamberlain (1994) and Xiao and Koenker (2009) considered more complicated models using minimum distance estimation to improve efficiency, although without reaching the efficiency bound. Zou and Yuan (2008) and Kai, Li and Zou (2010) recently considered the composite quantile regression to incorporate information across quantiles along the first direction mentioned above. In particular, they use an aggregate loss function based on a simple average of the individual loss functions over multiple quantiles. In this paper, we argue that, although a composite quantile regression estimator based on the aggregated criterion function may outperform the OLS estimator for some non-normal distributions, a simple average is in general not an efficient way of using distributional information from quantile regressions.

Information at different quantiles is correlated, and improperly using multiple quantile information may even reduce efficiency. Roughly speaking, the simple average delivers good estimators when the error distribution is close to normal. In fact, for nonparametric regression, a simple-averaging-based composite quantile regression estimator is asymptotically equivalent to the local least squares estimator. However, the main purpose of combining quantile information is to improve efficiency when the error distribution is not normal and OLS does not work well. It is therefore important to combine quantile information appropriately to achieve an efficiency gain.

We study the optimal combination of quantile regression information. We investigate both the weighted composite quantile regression estimator based on weighted quantile loss functions, and the weighted quantile average estimator based on a weighted average of quantile regression estimators at single quantiles. Optimal weighting schemes that lead to efficient regression estimators are proposed. In the case of non-regular statistical estimation, the proposed estimators lead to super-efficient estimation. Asymptotic properties of the proposed estimators are studied, and efficiency comparisons are conducted among these estimators and other estimators such as the OLS and LAD estimators.

The rest of this paper is organized as follows: Section 2 contains a preliminary result that is useful for our subsequent asymptotic theory. We study the linear regression model in Section 3 and the nonparametric regression model in Section 4. We briefly discuss estimation in non-regular cases in Section 5. Extensions and future work are discussed in Section 6. Section 7 contains simulation studies. Proofs are gathered in Section 8. Throughout, we consider combining information over k quantiles τ_i = i/(k+1), i = 1, ..., k.
2 A Preliminary Result

Given the regression model (1), we denote the distribution, density, and quantile functions of the error term ε by F, f, and Q, respectively. Denote the Fisher information of f by I(f), i.e.,

I(f) = ∫_R [∂ log f(u)/∂u]² f(u) du = ∫_R [f′(u)]²/f(u) du.

Throughout we assume I(f) < ∞. Our goal is to improve the efficiency of regression estimators (upon traditional methods) by optimally using information at the different quantiles τ_1, ..., τ_k. Throughout this paper,

we use the vector q and the matrices Γ and H defined as follows:

q = (q_1, ..., q_k)^T, where q_i = f(Q(τ_i)),
Γ = [min(τ_i, τ_j) − τ_i τ_j]_{1≤i,j≤k},
H = [ (min(τ_i, τ_j) − τ_i τ_j) / (f(Q(τ_i)) f(Q(τ_j))) ]_{1≤i,j≤k}.

For convenience we write q_0 = q_{k+1} = 0. We also make the following assumptions.

Assumption 1. f is positive, twice continuously differentiable, and bounded on {u : 0 < F(u) < 1}.

Assumption 2. Let g(τ) = f(Q(τ)). Then

lim_{τ→0} [g²(τ) + g²(1−τ)]/τ = 0 and lim_{τ→0} τ ∫_τ^{1−τ} [g′(t)]² dt = 0.

Assumptions 1 and 2 are conventionally imposed in the study of efficient statistical estimation. Assumption 1 assumes smoothness of the density function. Assumption 2 is needed for the Cramér-Rao type efficiency result. It requires that the error density approaches zero at the boundary at a sufficiently fast rate; otherwise one may estimate the parameters at a faster rate. See, e.g., Akahira and Takeuchi (1995) for a discussion of this issue, and also the discussion in Section 5.

Given a random sample {(X_t, Y_t)}_{t=1}^n, we consider efficient estimation of θ using quantile regressions. As will become clear later, for many quantile regression estimators that admit an appropriate Bahadur representation, the covariance matrix of the estimator based on the optimal quantile combination depends on Λ_k defined below:

Λ_k = e_k^T H^{-1} e_k, where e_k = (1, ..., 1)^T.  (2)

Theorem 1 below gives an important preliminary result on combining quantile information: it indicates that as we use a large number of quantiles (k → ∞), Λ_k converges to the Fisher information.

Theorem 1. Under Assumptions 1 and 2, lim_{k→∞} Λ_k = I(f).
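Theorem 1 is easy to check numerically. The sketch below (our illustration, not the paper's code) computes Λ_k = e_k^T H^{-1} e_k for the logistic error distribution, for which f(Q(τ)) = τ(1−τ) in closed form and I(f) = 1/3; the choice of the logistic distribution is ours, made purely because it avoids any special-function evaluation:

```python
import numpy as np

def lambda_k(k):
    """Lambda_k = e_k^T H^{-1} e_k of (2) for a logistic error, where
    H_ij = (min(tau_i, tau_j) - tau_i*tau_j) / (q_i*q_j) and q_i = f(Q(tau_i))."""
    tau = np.arange(1, k + 1) / (k + 1)              # tau_i = i/(k+1)
    q = tau * (1 - tau)                              # logistic: f(Q(tau)) = tau(1-tau)
    G = np.minimum.outer(tau, tau) - np.outer(tau, tau)   # Brownian-bridge covariance
    H = G / np.outer(q, q)
    e = np.ones(k)
    return float(e @ np.linalg.solve(H, e))

# Lambda_k increases toward I(f) = 1/3 as k grows.
print(lambda_k(9), lambda_k(99))
```

With k = 9 the value is already within about 1% of I(f) = 1/3, and k = 99 is closer still, in line with Theorem 1.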

Assumption 2 is an important assumption that ensures the Cramér-Rao efficiency result. Proposition 1 below presents alternative sufficient conditions for Assumption 2.

Proposition 1. Under Assumption 1, if {u : 0 < F(u) < 1} = R, then Assumption 2 holds if either one of the following two conditions holds:

lim_{τ→0} { [g²(τ) + g²(1−τ)]/τ + [|g′(τ)| + |g′(1−τ)|] τ^{3/2} } = 0,  (3)

or, equivalently,

lim_{u→±∞} { f²(u)/min{1−F(u), F(u)} + |∂²[log f(u)]/∂u²| · min{1−F(u), F(u)}^{3/2}/f(u) } = 0.  (4)

By (4) in Proposition 1, for a symmetric density f, Assumption 2 is satisfied if

lim_{u→∞} { f²(u)/[1−F(u)] + |∂²[log f(u)]/∂u²| · [1−F(u)]^{3/2}/f(u) } = 0.  (5)

We write a ≍ b if a/b is bounded away from zero and infinity.

Corollary 1. Suppose that there exist ν > 1 and β > 0 such that f(u) ≍ |u|^{−ν} and |∂²[log f(u)]/∂u²| ≍ |u|^{−β} as |u| → ∞. Then Assumption 2 is satisfied if β + 3(ν−1)/2 > 1. In particular, the latter holds if β ≥ 1.

Many commonly used distributions with support on R satisfy Assumption 2, hence yielding asymptotic efficiency by Theorem 1. (i) If f is the standard normal density, then ∂²[log f(u)]/∂u² = −1 and 1−F(u) = [1+o(1)] f(u)/u as u → ∞, so it is easy to verify (5). (ii) For the Laplace distribution with density f(u) = 0.5 exp(−|u|), u ∈ R, we have 1−F(u) = 0.5 exp(−u) for u > 0 and ∂²[log f(u)]/∂u² = 0 for u ≠ 0, so (5) trivially holds. (iii) For the logistic distribution, we can show g(τ) = c(τ − τ²) for some constant c > 0, so (3) holds. (iv) For the Student-t distribution with α > 0 degrees of freedom, Corollary 1 holds with ν = α + 1 and β = 2. (v) For the symmetric α-stable distribution with index α ∈ (0, 2), Corollary 1 also holds with ν = α + 1 and β = 2. In this case, OLS is consistent, with rate n^{1−1/α}, only for α > 1 (for α ≤ 1, OLS is not consistent), and the limiting distribution is non-normal and again α-stable. The proposed quantile regression method still achieves efficiency with asymptotic normality. (vi) For the mixture normal distribution θN(µ_1, σ_1²) + (1−θ)N(µ_2, σ_2²), µ_1, µ_2 ∈ R, σ_1², σ_2² > 0, θ ∈ [0, 1], we can verify (4).

3 Efficient Estimation of the Linear Model

We first consider in this section the problem of estimating the coefficients of the linear model

Y = α + Xβ + ε,  (6)

where X = (X(1), ..., X(p)) is the p-dimensional covariate vector, Y is the response, ε is the noise independent of X, α is the intercept, and β = (β_1, ..., β_p)^T is the slope vector. We first look at β. Given a random sample {(X_t, Y_t)}_{t=1}^n, we consider two ways of combining quantile information in Sections 3.1 and 3.2, respectively; additional discussion is given in Section 3.3. Section 3.4 considers α. For convenience and without loss of generality, we assume that the regressors X are centered around 0.

Assumption 3. E(X) = 0 and the covariance matrix Σ_X = E(X^T X) is positive definite.

3.1 The Weighted Composite Quantile Regression Estimation

The first approach combines quantiles via the criterion function. For τ ∈ (0, 1), denote by Q_Y(τ|X) the conditional τ-th quantile of Y given X; then model (6) has the following quantile representation:

Q_Y(τ|X) = α(τ) + Xβ with α(τ) = α + Q(τ),  (7)

where Q(τ) is the τ-th quantile of ε. We consider estimating the model over {τ_i, i = 1, ..., k} jointly based on the following weighted quantile loss function:

({α̂(τ_i)}_{i=1}^k, β̂_WCQR) = argmin_{a_1,...,a_k,b} Σ_{i=1}^k w_i Σ_{t=1}^n ρ_{τ_i}(Y_t − a_i − X_t b),  (8)

where w_i ≥ 0 with Σ_{i=1}^k w_i = 1 are weights and ρ_τ(z) = z(τ − 1{z < 0}) is the quantile loss function (Koenker and Bassett, 1978). If w_i ≡ 1/k, (8) reduces to the (unweighted) Composite Quantile Regression (CQR) of Zou and Yuan (2008). For this reason, we call (8) the Weighted CQR (WCQR). Theorem 2 below gives the limiting distribution of the WCQR estimator.
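As a concrete illustration (a sketch of ours, not the authors' code, with hypothetical function names), the check loss and the WCQR criterion in (8) can be written down directly; here `X` is an n×p design array, `a` collects the k candidate intercepts, and `b` is the candidate slope vector:

```python
import numpy as np

def rho(tau, z):
    """Quantile check loss: rho_tau(z) = z * (tau - 1{z < 0})."""
    z = np.asarray(z, dtype=float)
    return z * (tau - (z < 0))

def wcqr_objective(a, b, X, Y, taus, w):
    """Weighted CQR criterion (8): sum_i w_i * sum_t rho_{tau_i}(Y_t - a_i - X_t b)."""
    resid = Y - X @ b                        # n-vector of residuals net of the slope part
    return sum(wi * rho(t, resid - ai).sum()
               for ai, t, wi in zip(a, taus, w))
```

Minimizing this criterion over (a, b) is a linear program; the snippet only evaluates the objective, which is the part that differs from the unweighted CQR.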

Theorem 2. Let w = (w_1, ..., w_k)^T, and let Γ and q be defined as in Section 2. Under Assumptions 1 and 3, as the sample size n → ∞,

√n (β̂_WCQR − β) → N(0, Σ_X^{-1} J(w)), where J(w) = (w^T Γ w)/(w^T q)².  (9)

If we use equal weights w_i = 1/k over all quantiles, the asymptotic covariance matrix of the unweighted CQR is given by Σ_X^{-1} J_CQR, where

J_CQR = ( Σ_{i=1}^k f(Q(τ_i)) )^{-2} Σ_{i=1}^k Σ_{j=1}^k [min(τ_i, τ_j) − τ_i τ_j].  (10)

However, the amounts of information at different quantiles differ, and the information at different quantiles is correlated, depending on the error distribution. Thus, it is important to combine these quantiles optimally (instead of by simple averaging). By Theorem 2, the covariance matrix of β̂_WCQR depends on the weights through J(w). Thus a natural way to select optimal weights in the WCQR is to minimize J(w) in (9). Theorem 3 below studies the optimal WCQR estimator.

Assumption 4. The log density function log f(u) of ε is concave.

Theorem 3. Under Assumptions 1, 3 and 4, the optimal weights w* minimizing J(w) are

w*_i = (2q_i − q_{i−1} − q_{i+1})/(q_1 + q_k) ≥ 0, i = 1, ..., k.  (11)

For the corresponding optimal WCQR estimator β̂*_WCQR, we have

√n (β̂*_WCQR − β) → N(0, Σ_X^{-1} Λ_k^{-1}),

where Λ_k is given by (2). By the result in Theorem 1, under the additional Assumption 2, Λ_k → I(f): as we use more quantiles (k → ∞), the optimal WCQR achieves the Cramér-Rao efficiency bound.

Remark. For general weights w_i = w(τ_i) generated by a function w(·), as k → ∞,

J(w) → [ 2 ∫_0^1 ∫_0^r w(r) w(s) s(1−r) ds dr ] / [ ∫_0^1 w(r) f(Q(r)) dr ]².
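The optimal weights (11) are a second-difference formula in the q_i and are trivial to compute. In the following sketch (ours, not the paper's) we evaluate them for the logistic error, q_i = τ_i(1−τ_i); for this distribution the formula happens to return exactly uniform weights, which is consistent with the observation that simple averaging (the unweighted CQR) works well for near-logistic/normal errors:

```python
import numpy as np

def wcqr_weights(q):
    """Optimal WCQR weights (11): w_i = (2q_i - q_{i-1} - q_{i+1}) / (q_1 + q_k),
    with the convention q_0 = q_{k+1} = 0."""
    qpad = np.concatenate(([0.0], q, [0.0]))
    return (2 * qpad[1:-1] - qpad[:-2] - qpad[2:]) / (q[0] + q[-1])

k = 9
tau = np.arange(1, k + 1) / (k + 1)
q = tau * (1 - tau)                  # logistic error: f(Q(tau)) = tau(1 - tau)
w = wcqr_weights(q)
print(np.round(w, 4), w.sum())       # sums to one by the telescoping in (11)
```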

3.2 The Weighted Quantile Average Estimation

The WCQR estimator combines information at different quantiles via the criterion function. We next consider an alternative approach that combines information based on estimators at different quantiles. For each τ, we estimate β via the quantile regression:

(α̂(τ), β̂(τ)) = argmin_{a,b} Σ_{t=1}^n ρ_τ(Y_t − a − X_t b).  (12)

Under Assumptions 1 and 3, β̂(τ) is asymptotically normal:

√n (β̂(τ) − β) → N(0, Σ_X^{-1} τ(1−τ)/f²(Q(τ))).  (13)

To combine quantile information, we define the weighted quantile average estimator (WQAE):

β̂_WQAE = Σ_{i=1}^k ϖ_i β̂(τ_i), subject to Σ_{i=1}^k ϖ_i = 1.  (14)

Estimators of the above form have a long history in statistics, dating back to Mosteller (1946). Portnoy and Koenker (1989) studied estimators of the form ∫ β̂(τ) dν(τ) with a general weighting function. An important difference between the weights ϖ in the WQAE and the weights w in the WCQR is that the ϖ_i are not restricted to be non-negative.

Just as in the case of the WCQR, properly weighting different quantiles in the WQAE is crucial for efficiently combining distributional information to improve efficiency. As shown by Theorem 4(i) below, the optimal weights can be obtained by minimizing a quadratic function of ϖ. The optimal weights and the corresponding asymptotic distribution of the optimal WQAE are given in Theorem 4(ii). Recall the matrix H and the vector e_k from Section 2.

Theorem 4. (i) Denote ϖ = (ϖ_1, ..., ϖ_k)^T. Under Assumptions 1 and 3, as the sample size n → ∞,

√n (β̂_WQAE − β) → N(0, Σ_X^{-1} S(ϖ)), where S(ϖ) = ϖ^T H ϖ.  (15)

(ii) The optimal weights that minimize S(ϖ) are unique and are given by

ϖ* = H^{-1} e_k / (e_k^T H^{-1} e_k).  (16)

Recall Λ_k in (2). For the corresponding optimal WQAE β̂*_WQAE, we have

√n (β̂*_WQAE − β) → N(0, Σ_X^{-1} S(ϖ*)), where S(ϖ*) = Λ_k^{-1}.
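Once the q_i = f(Q(τ_i)) are known or estimated, the optimal weights (16) amount to one linear solve. A minimal numerical sketch (ours, not the paper's; the logistic error with q_i = τ_i(1−τ_i) is again used purely for illustration):

```python
import numpy as np

def wqae_weights(tau, q):
    """Optimal WQAE weights (16): varpi* = H^{-1} e_k / (e_k^T H^{-1} e_k),
    with H_ij = (min(tau_i, tau_j) - tau_i*tau_j) / (q_i*q_j)."""
    H = (np.minimum.outer(tau, tau) - np.outer(tau, tau)) / np.outer(q, q)
    v = np.linalg.solve(H, np.ones(len(tau)))
    return v / v.sum()

k = 9
tau = np.arange(1, k + 1) / (k + 1)
w = wqae_weights(tau, tau * (1 - tau))   # logistic error: q_i = tau_i (1 - tau_i)
print(np.round(w, 4))                    # symmetric weights summing to one
```

For the logistic error these weights come out proportional to q_i itself; for heavy-tailed errors such as the Cauchy, some components are negative, as discussed above.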

By Theorem 1, lim_{k→∞} Λ_k = I(f) under Assumption 2. Thus, as we use more quantiles, the variance of the optimal WQAE estimator approaches the Cramér-Rao lower bound, corroborating the result of Portnoy and Koenker (1989).

Notice that the optimal weights ϖ* allow for negative components. The appearance of a negative weight in ϖ* is due to very inaccurate estimation at some quantile (say τ) combined with the existence of very accurate estimates at other quantiles that are highly correlated with the estimator at this quantile τ. For example, consider the Cauchy distribution and two quantiles τ_1 = 0.15, τ_2 = 0.5; then β̂(τ_2) is a much more accurate estimator than β̂(τ_1), and they are positively correlated. In this case, the correlation effect is stronger than the averaging effect, and an optimal combination of β̂(τ_1) and β̂(τ_2) would impose a negative weight on β̂(τ_1).

3.3 A Comparison of Asymptotic Efficiency

The WQAE β̂_WQAE differs from the WCQR estimator β̂_WCQR in several aspects. While the WCQR estimator in (8) is based on an aggregation of several quantile loss functions, the WQAE in (14) is based on a weighted average of separate estimates from different quantiles. As a result, computing the WQAE only involves k separate (p+1)-parameter minimization problems, whereas the WCQR requires solving a larger (p+k)-parameter minimization problem. In addition, to ensure a proper loss function, the weights w_i of β̂_WCQR are restricted to be non-negative; by contrast, the weights ϖ_1, ..., ϖ_k in (14) can be negative. It is computationally appealing to impose fewer constraints on the weights. By Theorems 1 and 4, the optimal WQAE is asymptotically efficient under Assumptions 1 and 2. By contrast, asymptotic efficiency of the optimal WCQR requires the additional Assumption 4, which is due to the constraint of non-negative weights in the WCQR. Theorem 5 below shows that in general the optimal WQAE is asymptotically more efficient than the WCQR estimator.
Only when Assumption 4 holds is the optimal WCQR asymptotically equivalent to the optimal WQAE, and then both are asymptotically efficient.

Theorem 5. Let S(ϖ*) be defined as in Theorem 4(ii) and J(w) as in (9). Then S(ϖ*) ≤ min_w J(w), with equality at w = w* (given by (11)) under Assumption 4.

Next we study the asymptotic relative efficiency (ARE) of the optimal WQAE with respect to the OLS estimator, the quantile regression (QR) estimator β̂(τ) in (12) at a single quantile

τ, and the unweighted CQR estimator. They all have asymptotic normality with limiting distribution of the form √n(β̂ − β) → N(0, s² Σ_X^{-1}), where s² = var(ε) (assumed finite) for the OLS, τ(1−τ)/f²(Q(τ)) for the QR, J_CQR in (10) for the unweighted CQR, and S(ϖ*) in Theorem 4(ii) for the optimal WQAE. For comparison, we use the optimal WQAE as the benchmark and define its ARE relative to an estimator β̂ as

ARE(β̂) = s²/S(ϖ*) = (e_k^T H^{-1} e_k) s², where β̂ ∈ {OLS, QR(τ), CQR}.

ARE ≥ 1 indicates better performance of the optimal WQAE. Clearly, ARE(CQR) ≥ 1, ARE(QR(τ_i)) ≥ 1, and lim_{k→∞} ARE(OLS) ≥ 1. We discuss a few examples below using k = 9 quantiles.

Example 1. Let ε be Student-t distributed. Then ARE(CQR) = 1.03, 1.12, 1.58 for t (or N(0,1)), t_2, and t_1 errors, respectively. As the tail of the distribution gets heavier, the optimal WQAE becomes increasingly more efficient than the CQR. For t_1 (the Cauchy distribution), the optimal weights are ϖ* = {−0.03, −0.04, 0.08, 0.29, 0.40, 0.29, 0.08, −0.04, −0.03}: the quantiles τ = 0.4, 0.5, 0.6 contribute almost all the information, whereas the quantiles τ = 0.1, 0.2, 0.8, 0.9 are penalized heavily with negative weights. In this case, Zou and Yuan's CQR with uniform weights would not perform well. On the other hand, for N(0,1), ϖ* = {0.13, 0.11, 0.11, 0.10, 0.10, 0.10, 0.11, 0.11, 0.13} is close to the uniform weights. Therefore the optimal WQAE and the CQR have comparable performance, and both significantly outperform the single QR estimator in (12). See Table 1 for details.

Example 2. Let ε be a mixture of two normal distributions. Mixture 1 (different variances): 0.5N(0,1) + 0.5N(0, 0.5⁶); Mixture 2 (different means): 0.5N(−2,1) + 0.5N(2,1). For Mixture 1, ϖ* = {−0.002, −0.102, 0.183, 0.277, 0.287, 0.277, 0.183, −0.102, −0.002}: the quantiles 0.3, 0.4, 0.5, 0.6, 0.7 contain substantial information, whereas the quantiles 0.2 and 0.8 receive negative weights. For Mixture 2, ϖ* = {0.185, 0.156, 0.153, 0.078, −0.144, 0.078, 0.153, 0.156, 0.185} exhibits a quite different pattern.
For Mixture 1, ARE(OLS) = 10.17 and ARE(CQR) = 2.43; for Mixture 2, ARE(OLS) = 3.28, with ARE(CQR) as reported in Table 1. Thus, the optimal WQAE offers substantial efficiency gains. See Table 1 for details.

Example 3. Let ε be a Gamma random variable with shape parameter α > 0. For α = 1 (the exponential distribution), ϖ*_1 = 1.152, ϖ*_2 = −0.124, and ϖ*_i ≈ 0 for i = 3, ..., 9: the quantiles 0.1 and 0.2 contain almost all the information. See Table 1 for details.

Insert Table 1 about here.

As shown in the examples above, different quantiles may carry substantially different amounts of information, and failing to properly utilize such information may result in a substantial loss of efficiency. This phenomenon provides strong evidence in favor of our proposed optimally weighted quantile-based estimators.

3.4 Efficient Estimation of the Intercept Term

Efficient estimation of the intercept α in (6) requires additional assumptions such as symmetry; see, e.g., Bickel (1982). In the quantile domain representation (7), without further assumptions, α is unidentifiable from α(τ). Following the previous literature, we assume that f is symmetric, which holds for many distributions, such as the Student-t distribution, the uniform distribution on symmetric intervals, the Laplace distribution, the symmetric α-stable distribution, many normal mixture distributions, and their truncated versions on symmetric intervals.

Assumption 5. The distribution of ε is symmetric.

Recall α(τ) in (7) and α̂(τ) in (12). We consider the following WQAE for α:

α̂_WQAE = Σ_{i=1}^k ϖ_i α̂(τ_i), subject to Σ_{i=1}^k ϖ_i = 1 and ϖ_i = ϖ_{k+1−i}, i = 1, ..., k.  (17)

The asymptotic distribution of α̂_WQAE is given in Theorem 6 below. Again, the optimal WQAE of α can be constructed by minimizing the limiting variance of α̂_WQAE. Since the limiting variance of α̂_WQAE depends on S(ϖ) defined in Theorem 4, and the optimal weights ϖ* in Theorem 4 satisfy the condition ϖ*_i = ϖ*_{k+1−i} under Assumption 5, we have:

Theorem 6. (i) Under Assumptions 1, 3 and 5, we have the asymptotic normality

√n (α̂_WQAE − α) → N(0, S(ϖ)), where S(ϖ) = ϖ^T H ϖ.  (18)

(ii) The optimal weights that minimize the asymptotic variance are given by ϖ* in (16), and the optimal WQAE α̂*_WQAE has the limiting distribution

√n (α̂*_WQAE − α) → N(0, Λ_k^{-1}),

where Λ_k is defined in (2). By Theorem 1, under the additional Assumption 2, the variance of the optimal WQAE of α approaches the Cramér-Rao lower bound as the number of quantiles k → ∞.

3.5 Estimating f(Q(τ))

To compute the optimal weights, we need to estimate f(Q(τ)). Since f(Q(τ)) is invariant under location shifts of ε, the intercept α in (6) can be absorbed into ε. We propose the following procedure:

(i) Use uniform weights ϖ_i = 1/k to obtain the preliminary estimator β̃, and compute the residuals (a combination of both α and ε) as ε̃_t = Y_t − X_t β̃.

(ii) Use the nonparametric kernel density estimator to estimate f(u):

f̂(u) = (1/(nb)) Σ_{t=1}^n K((u − ε̃_t)/b),  (19)

where K(·) is a non-negative kernel function and b > 0 is a bandwidth.

(iii) Estimate f(Q(τ)) by f̂(Q̃(τ)), where Q̃(τ) is the sample τ-quantile of ε̃_1, ..., ε̃_n.

In (19), we use the Gaussian kernel for K. Following Silverman (1986, p. 48), we choose

b = 0.9 min{ sd(ε̃_1, ..., ε̃_n), IQR(ε̃_1, ..., ε̃_n)/1.34 } n^{−1/5},

where sd and IQR denote the sample standard deviation and the sample interquartile range. In estimating the intercept term, we symmetrize the weights; see Section 3.4.

4 Efficient Nonparametric Regression

In this section, we study efficient estimation of the nonparametric regression model

Y = m(X) + σ(X)ε,  (20)

where X and Y represent the covariate and the response, respectively, ε is the error independent of X, m(·) is an unknown nonparametric function of interest, and σ(·) is the scale function. Our goal is to obtain an efficient estimate of m(x) at a given x. For simplicity, we consider the case where X is one-dimensional and use local linear smoothing. Our analysis also applies to the case with multivariate X and to other nonparametric smoothing methods. For convenience, write Z_n ≈ N(c_n, s_n²) if s_n^{-1}{Z_n − c_n[1 + o_p(1)]} → N(0, 1) in distribution. Let K(·) be a kernel function and h a bandwidth parameter, and denote

µ_K = ∫_R u² K(u) du, φ_K = ∫_R K²(u) du.

Assumption 6. (i) The density function p_X(·) > 0 of X is differentiable, m(·) is three times differentiable, and σ(·) > 0 is differentiable in the neighborhood of x; the density function f of ε is symmetric, positive, and twice continuously differentiable on its support. (ii) The kernel function K(·) integrates to one, is symmetric, and has bounded support. (iii) The bandwidth h satisfies h → 0 and nh → ∞ as n → ∞.

Assumption 6 is similar to the conditions of Kai, Li and Zou (2010). As in Section 3.4, the symmetry assumption in Assumption 6(i) is imposed to ensure identifiability. Based on the observations (X_1, Y_1), ..., (X_n, Y_n), the widely used local linear least squares regression is

(m̂_LS(x), b̂) = argmin_{a,b} Σ_{t=1}^n {Y_t − a − b(X_t − x)}² K((X_t − x)/h).  (21)

If we assume that var(ε) < ∞, then under some regularity conditions [see, e.g., Fan and Gijbels (1996)] the local linear least squares estimator is asymptotically normal:

m̂_LS(x) − m(x) ≈ N( (1/2) m″(x) µ_K h², φ_K σ²(x) var(ε) / (nh p_X(x)) ).

When the ε_i are Gaussian, local linear least squares estimation corresponds to a weighted likelihood criterion. In the absence of Gaussianity, the asymptotic results for the above estimator generally still hold, but this estimator is less efficient in terms of mean-squared error than estimators that exploit the distributional information. In the presence of heavy-tailed data, local median (or, more generally, local τ-th quantile) regression has been proposed as a robust method for estimating m(x); see, e.g., Yu and Jones (1998):

(m̂_LAD(x), b̂) = argmin_{a,b} Σ_{t=1}^n |Y_t − a − b(X_t − x)| K((X_t − x)/h).  (22)

Under Assumption 6, we have

m̂_LAD(x) − m(x) ≈ N( (1/2) m″(x) µ_K h², φ_K σ²(x) / (nh p_X(x)) · 1/(4f²(0)) ).

If the error density f were known, we could replace {Y_t − a − b(X_t − x)}² in (21) by the negative log-likelihood −log f(Y_t − a − b(X_t − x)) and obtain a likelihood-based estimator, denoted

by m̂_MLE(x); see, e.g., Fan, Farmen and Gijbels (1998). Under some regularity conditions,

m̂_MLE(x) − m(x) ≈ N( (1/2) m″(x) µ_K h², φ_K σ²(x) / (nh p_X(x)) · I(f)^{-1} ),  (23)

where I(f) is the Fisher information of the error distribution. The local likelihood estimator has lower variance than the least squares or least-absolute-deviation based local linear estimators, as follows from the classical Cramér-Rao inequality. In practice, f is unknown and the local likelihood estimator is infeasible.

In this section, we propose a quantile average nonparametric estimator for m(x) and show that, as the number of quantiles k → ∞, the asymptotic variance of the proposed estimator converges to the asymptotic variance of the local likelihood procedure. This is true regardless of the dimensionality of the regressors. In this sense, our estimator is efficient.¹ In many applications, the error distribution is likely to be non-normal, and perhaps so far from normal that the local likelihood estimator is much more efficient than the least squares estimator. As a modeling issue, it is hard to believe that we have better information about the error density than about the shape of the main regression effect, and so it is quite natural to treat the error density as an unknown parameter. Thus our results are comforting in that they say that we can still use information from the error distribution to improve the performance of nonparametric estimates.

4.1 The Weighted Quantile Averaging Nonparametric Estimator

Let Q(τ) be the τ-th quantile of ε, and let Q_{Y|X}(τ|x) be the conditional τ-th quantile of Y given X = x; then the nonparametric regression model (20) has the quantile regression representation Q_{Y|X}(τ|x) = m(x) + σ(x)Q(τ). Denote Q_{Y|X}(τ|x) by m(τ, x). Consider the local linear nonparametric quantile regression:

(m̂(τ, x), b̂) = argmin_{a,b} Σ_{t=1}^n ρ_τ{Y_t − a − b(X_t − x)} K((X_t − x)/h).  (24)

The solution for a in the above optimization, m̂(τ, x), provides an estimate of Q_{Y|X}(τ|x).
¹ This is not to say that our method has any special properties with regard to minimax risk; as is well known, it is not possible to achieve the lower bound here; see Fan (

We construct the following weighted quantile averaging nonparametric (WQAN) estimator for m(x) based on a weighted average of m̂(τ_i, x) over τ_i, i = 1, ..., k:

m̂_WQAN(x) = Σ_{i=1}^k ϖ_i m̂(τ_i, x), subject to Σ_{i=1}^k ϖ_i = 1 and ϖ_i = ϖ_{k+1−i}, i = 1, ..., k.  (25)

Theorem 7. Let S(ϖ) be defined as in Theorem 4(i). Under Assumption 6, as n → ∞,

m̂_WQAN(x) − m(x) ≈ N( (1/2) m″(x) µ_K h², φ_K σ²(x) / (nh p_X(x)) · S(ϖ) ).  (26)

If we simply use equal weights, the resulting unweighted quantile average nonparametric estimator has the asymptotic normality in Theorem 7 with S(ϖ) replaced by

R_k = (1/k²) Σ_{i=1}^k Σ_{j=1}^k [min(τ_i, τ_j) − τ_i τ_j] / [f(Q(τ_i)) f(Q(τ_j))].  (27)

This is the same as the limiting distribution of the nonparametric CQR estimator of Kai, Li and Zou (2010). Assuming that var(ε) < ∞, by a result of Kai, Li and Zou (2010),

lim_{k→∞} R_k = ∫_0^1 ∫_0^1 [min(r, s) − rs] / [f(Q(r)) f(Q(s))] dr ds = var(ε),

so the unweighted quantile averaging nonparametric regression estimator is asymptotically equivalent to the local least squares estimator! This result indicates that if we use a simple average over τ, then even as we use more and more quantiles, there is no efficiency gain from combining quantile information; simple averaging does not work when it is most needed. See the simulation studies in Section 7.2. Proper weighting over different quantiles is thus very important in the nonparametric case.

Next we show that a large efficiency gain can be obtained by optimal weighting in the WQAN. Since the asymptotic bias of m̂_WQAN(x) does not depend on the weights by Theorem 7, it is natural to find the optimal weights in m̂_WQAN(x) that minimize the asymptotic variance subject to the conditions on ϖ in (25). Notice that the limiting variance of the WQAN estimator depends on ϖ through S(ϖ), which is the same as in Theorem 4. In addition, under Assumption 6(i), the optimal weights ϖ* in Theorem 4 satisfy the condition ϖ*_i = ϖ*_{k+1−i}. Therefore, the optimal WQAN estimator of m(x) is given by

m̂*_WQAN(x) = (m̂(τ_1, x), ..., m̂(τ_k, x)) ϖ* = (m̂(τ_1, x), ..., m̂(τ_k, x)) H^{-1} e_k / (e_k^T H^{-1} e_k).  (28)
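The no-gain phenomenon for equal weights is easy to check numerically. The sketch below (our illustration, not from the paper) evaluates R_k in (27) for the logistic error, where f(Q(τ)) = τ(1−τ) and var(ε) = π²/3, so R_k should approach π²/3 ≈ 3.29 rather than the Fisher bound I(f)^{-1} = 3:

```python
import numpy as np

def r_k(k):
    """R_k in (27) (equal weights) for a logistic error, f(Q(tau)) = tau(1-tau)."""
    tau = np.arange(1, k + 1) / (k + 1)
    q = tau * (1 - tau)
    G = np.minimum.outer(tau, tau) - np.outer(tau, tau)
    return float((G / np.outer(q, q)).sum() / k**2)

# R_k tends to var(eps) = pi^2/3 for the logistic: no gain over local least squares.
print(r_k(400), np.pi**2 / 3)
```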

By Theorem 7 and Theorem 4, we have the following result.

Theorem 8. Recall $\Lambda_k$ in (2). Under Assumption 6, as $n \to \infty$,
\[
\widehat m^*_{\mathrm{WQAN}}(x) - m(x) \Rightarrow N\Big( \tfrac12 m''(x)\mu_K h^2,\ \frac{\phi_K \sigma^2(x)}{nh\, p_X(x)}\, \Lambda_k^{-1} \Big).
\]
By Theorem 1, under the additional Assumption 2, the optimal WQAN estimator $\widehat m^*_{\mathrm{WQAN}}(x)$ achieves asymptotic efficiency in the sense that its asymptotic bias and variance are the same as those of the likelihood-based estimator in ( ).

4.2 Comparison of Asymptotic Efficiency

In this section we compare the proposed optimal WQAN estimator $\widehat m^*_{\mathrm{WQAN}}(x)$ in (28) with other nonparametric regression estimators: the local linear least squares estimator $\widehat m_{\mathrm{LS}}(x)$ in (21), the local linear least-absolute-deviation estimator $\widehat m_{\mathrm{LAD}}(x)$ in (22), and the local linear CQR estimator $\widehat m_{\mathrm{CQR}}(x)$ of Kai, Li and Zou (2010). All four estimators have the same asymptotic bias $m''(x)\mu_K h^2/2$ but different variances. In Theorem 9 we show that $\widehat m^*_{\mathrm{WQAN}}(x)$ is more efficient than the local CQR estimator $\widehat m_{\mathrm{CQR}}(x)$.

Theorem 9. Under Assumptions 2 and 6, the optimal WQAN estimator $\widehat m^*_{\mathrm{WQAN}}(x)$ is asymptotically more efficient than the nonparametric CQR estimator $\widehat m_{\mathrm{CQR}}(x)$ for any non-normal distribution; for the normal distribution, they are asymptotically equivalent.

It is easy to see that the proposed optimal WQAN estimator is also more efficient than the local OLS and LAD estimators. Thus the proposed optimal WQAN estimator is asymptotically more efficient than all three of these estimators.

Next we calculate asymptotic relative efficiencies of these nonparametric methods. All four estimators have asymptotic normality of the following form, with different $s^2$:
\[
\widehat m(x) - m(x) \Rightarrow N\Big( \tfrac12 m''(x)\mu_K h^2,\ \frac{\phi_K \sigma^2(x)}{nh\, p_X(x)}\, s^2 \Big).
\]
We define the asymptotic mean-squared error
\[
\mathrm{AMSE}\{\widehat m(x)\} = \Big\{ \tfrac12 m''(x)\mu_K h^2 \Big\}^2 + \frac{\phi_K \sigma^2(x)}{nh\, p_X(x)}\, s^2.
\]
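The AMSE just defined has the form $Ah^4 + B/(nh)$ with $A = \{m''(x)\mu_K/2\}^2$ and $B = \phi_K\sigma^2(x)s^2/p_X(x)$, so its minimizer can be checked numerically against the closed-form bandwidth. A minimal sketch (Python; the constants `curv` and `vconst` are arbitrary illustrative values, not quantities from the paper):

```python
def amse(h, curv, vconst, n):
    """AMSE(h) = (curv * h^2)^2 + vconst/(n*h), where curv stands for
    m''(x) mu_K / 2 and vconst for phi_K sigma^2(x) s^2 / p_X(x)."""
    return (curv * h * h) ** 2 + vconst / (n * h)

def argmin_grid(f, lo, hi, steps=100000):
    """Crude grid search, enough to confirm the closed-form minimizer."""
    best_h, best_v = lo, f(lo)
    for i in range(1, steps + 1):
        h = lo + (hi - lo) * i / steps
        v = f(h)
        if v < best_v:
            best_h, best_v = h, v
    return best_h

# illustrative constants
curv, vconst, n = 1.3, 0.8, 400
h_grid = argmin_grid(lambda h: amse(h, curv, vconst, n), 0.01, 1.0)
# setting d/dh [A h^4 + B/(n h)] = 0 gives h* = (B / (4 A n))^(1/5), A = curv^2, B = vconst
h_star = (vconst / (4 * curv ** 2 * n)) ** 0.2
```

The grid minimizer and the closed form agree to grid resolution, matching the bandwidth formula derived next in the text.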

Minimizing the AMSE, we obtain the optimal bandwidth
\[
h^* = \operatorname*{argmin}_h \mathrm{AMSE}\{\widehat m(x)\} = \{m''(x)\mu_K\}^{-2/5} \Big\{ \frac{\phi_K \sigma^2(x)}{p_X(x)}\, s^2 \Big\}^{1/5} n^{-1/5}, \tag{29}
\]
and the associated optimal AMSE evaluated at the optimal bandwidth $h^*$,
\[
\mathrm{AMSE}^*\{\widehat m(x)\} = \frac{5}{4}\, \{m''(x)\mu_K\}^{2/5} \Big\{ \frac{\phi_K \sigma^2(x)}{p_X(x)}\, s^2 \Big\}^{4/5} n^{-4/5}, \tag{30}
\]
which is proportional to $(s^2)^{4/5}$. Therefore, results similar to those in Section 3.3 also apply here.

Remark. Our analysis also extends to the multivariate case in which X is d-dimensional. In that case, the variance component is of order $(nh^d)^{-1}$.

4.3 Estimating f(Q(τ))

As in Section 3.5, we need to estimate the unknown quantity $f(Q(\tau))$. Notice that the optimal weights $\varpi^*$ are invariant under the transformation $a\varepsilon$ of the error for any $a > 0$, and hence we can assume without loss of generality that $|\varepsilon|$ has median one. Then the conditional median of $|Y - m(X)|$ given X is $\sigma(X)$, and therefore we can apply local median quantile regression to estimate $\sigma(\cdot)$. We propose the following procedure.

(i) Use (25) with uniform weights to obtain a preliminary estimator $\widetilde m(\cdot)$.

(ii) Compute $|Y_i - \widetilde m(X_i)|$ and estimate $\sigma(\cdot)$ by local linear median quantile regression:
\[
(\widehat\sigma(x), \widehat b) = \operatorname*{argmin}_{\sigma, b} \sum_{i=1}^n \rho_{0.5}\big\{ |Y_i - \widetilde m(X_i)| - \sigma - b(X_i - x) \big\} K\Big( \frac{X_i - x}{l} \Big). \tag{31}
\]

(iii) Compute the errors $\widehat\varepsilon_i = [Y_i - \widetilde m(X_i)]/\widehat\sigma(X_i)$ and use (19) to obtain the estimate $\widehat f(u)$.

(iv) Denote the sample τ-quantile of $\widehat\varepsilon_1, \ldots, \widehat\varepsilon_n$ by $\widehat Q(\tau)$, obtain the estimate of $f(Q(\tau))$ as $\widehat f(\widehat Q(\tau))$, use (16) to compute $\widehat\varpi_1, \ldots, \widehat\varpi_k$, and finally obtain the symmetrized weights $\widetilde\varpi_j = (\widehat\varpi_j + \widehat\varpi_{k+1-j})/2$, $j = 1, \ldots, k$.

In (31), following Yu and Jones (1998), we use $l = l_{\mathrm{LS}} (\pi/2)^{1/5}$, where $l_{\mathrm{LS}}$ is the plug-in bandwidth (Ruppert, Sheather and Wand, 1995) for local linear least squares regression based on the data $(X_i, |Y_i - \widetilde m(X_i)|^2)$, $i = 1, \ldots, n$.
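Steps (iii) and (iv) above can be sketched as follows, in a deliberately simplified homoscedastic setting ($\sigma(\cdot) \equiv 1$, so the local median regression of steps (i) and (ii) is skipped). A Gaussian kernel density estimate with a normal-reference bandwidth stands in for the estimator in (19), and `f_at_quantiles` is our own helper name.

```python
import math, random

def kde(data, u, h):
    """Gaussian kernel density estimate  fhat(u) = (1/(n h)) sum_i phi((u - e_i)/h)."""
    n = len(data)
    s = sum(math.exp(-0.5 * ((u - e) / h) ** 2) for e in data)
    return s / (n * h * math.sqrt(2.0 * math.pi))

def f_at_quantiles(resid, k):
    """Sketch of steps (iii)-(iv): estimate f(Q(tau_i)) by fhat(Qhat(tau_i)) for
    tau_i = i/(k+1), with Qhat the sample quantile and a rule-of-thumb bandwidth."""
    n = len(resid)
    srt = sorted(resid)
    mean = sum(resid) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in resid) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)          # normal-reference bandwidth, an assumption here
    qhat = [srt[min(n - 1, int(i * n / (k + 1)))] for i in range(1, k + 1)]
    return [kde(resid, q, h) for q in qhat]

random.seed(1)
eps = [random.gauss(0.0, 1.0) for _ in range(2000)]
fq = f_at_quantiles(eps, 9)
# for a standard normal, the true value at the median is f(Q(0.5)) = 1/sqrt(2*pi), about 0.399
```

The symmetrization $\widetilde\varpi_j = (\widehat\varpi_j + \widehat\varpi_{k+1-j})/2$ of step (iv) is then a one-line postprocessing of the weights computed from these density values.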

5 Super-efficient Estimation via Quantile Regression

Under Assumptions 1, 2 and 3, asymptotically efficient estimators of β in model (6) exist: they are root-n consistent and asymptotically normal, with limiting covariance matrix attaining the Fisher information lower bound. Similarly, under Assumptions 6 and 2, the local likelihood estimator of m(x) is $\sqrt{nh}$-consistent and asymptotically normal, with limiting covariance matrix equal to the Fisher information bound. These cases are usually described by the terminology of regular statistical estimation. The corresponding regularity conditions are not mathematical trivialities (for ease of proof) but genuine restrictions needed to obtain the efficiency results. When these usual regularity conditions do not hold, the previously discussed efficiency results may fail, and results different from those of likelihood-based estimation may arise: for example, a different rate of convergence may be obtained, and it is well known that, in general, asymptotically efficient estimators do not exist. Such cases are sometimes called non-regular statistical estimation.

In this section, we briefly discuss the case of non-regular estimation. We show that, in this case, by using quantile regression with optimal weighting, super-efficient estimators may be obtained, in the sense that the limiting variance is smaller than the conventional Cramér-Rao bound, or even zero.

Assumption 2 guarantees that the density f vanishes at the boundary at a sufficiently fast rate, so that the properties of regular estimation hold. If this assumption does not hold, conventional estimators such as the OLS or the MLE may still be root-n consistent and asymptotically normal. However, we show in Theorem 10 below that the optimally weighted quantile averaging estimators have smaller variance than the conventional lower bound, and one may even estimate β (or m(x)) at a rate better than root-n (or $\sqrt{nh}$).

Theorem 10.
Let g(τ = f(q(τ, and Λ k be defined by (2, and assume g 2 (τ + g 2 (1 τ lim τ 0 τ = c. (32 (i If 0 < c < and lim τ 0 τ 2 1 τ τ g (t 2 dt = 0, then lim k Λ k = [c + I(f] 1. (ii If c =, then lim k Λ k = 0. Assumption 2 covers the regular case corresponding to c = 0 in (32. Theorem 10 indicates that, for the non-regular case c > 0 in (32, under appropriate regularity condi- 19

20 tions, for large k, the variances of the (standardized optimally weighted quantile regression based estimators are smaller than the Cramér-Rao bound. In particular, if c =, as the number of quantiles k increases, the variance of the weighted quantile regression based estimators approaches zero. In this sense, the optimal quantile regression based estimators are super-efficient. Corollary 2 below concerns a special case that the density f is positive at the boundary. Corollary 2. Denote the support of f by D, then lim k Λ k = 0 in any of the following three cases: (i D = [D 1, D 2 ] with f(d 1 + f(d 2 > 0; (ii D = [D 1, with f(d 1 > 0; or (iii D = (, D 2 ] with f(d 2 > 0. Example 4. For the truncated version of many commonly used distributions, we have lim k Λ k = 0. For example, for truncated normal on [ 1, 1], Corollary 2 (i applies. For uniform distribution on [0, 1], we can show Λ k = 1/(2k Example 5. Let ε have Beta distribution with parameters α 1, α 2 > 0. For α 1 = α 2 = 1 (uniform distribution, ϖ1 = ϖk = 0.5 and ϖ 2 = = ϖk 1 = 0, which is not surprising: the sufficient statistic for independent uniform samples on [θ 1, θ 2 ] is the two extreme observations. Also, ARE(CQR = (k +1(k +2/(6k, so the CQR with uniform weights is substantially inferior to the optimal WQAE for large k. 6 Extensions Our proposed methodology provides a general framework for constructing efficient estimator under a broad variety of settings. Here we briefly discuss some general ideas and mention some possible directions. Different types of models may involve different assumptions and have different results - depending on the model and the subject of interest. We wish to explore these in more detail in future research. In general, one may consider efficient estimation of a statistical model that depends on {Y t, X t, α, β}. 
Suppose that the model has a quantile domain representation, say
\[
Q_{Y_t}(\tau \mid X_t) = g(X_t, \alpha(\tau), \beta),
\]
where $\alpha(\tau)$ are local unknown parameters that are related to α and vary over quantiles, and β are global parameters that remain the same over all quantiles. We are

interested in more efficient estimation of the parameters. Under appropriate regularity conditions, the quantile regression estimators $\widehat\theta(\tau) = (\widehat\alpha(\tau), \widehat\beta(\tau))$ are consistent estimators of $\theta(\tau) = (\alpha(\tau), \beta)$ and have a Bahadur representation, say
\[
\widehat\theta(\tau) = \theta(\tau) + \frac{C}{m f(Q(\tau))} \sum_{t=1}^n Z_t \big[ \tau - 1_{\{\varepsilon_t < Q(\tau)\}} \big] + \text{negligible error},
\]
where $m = m(n)$, C is a $p \times p$ positive definite matrix that does not depend on τ, $Z_1, \ldots, Z_n \in \mathbb{R}^p$ are IID p-dimensional column random vectors, and $Q(\tau)$ is the τ-th quantile of $\varepsilon_t$. An efficient estimator for β based on combined quantile regressions can be constructed in a way similar to those discussed in this paper. Under additional assumptions on $\alpha(\tau)$, say that $\alpha(\tau) = \alpha + cQ(\tau)$ is a quantile shift parameter and the error distribution is symmetric, an estimator for α based on combined quantile regressions can also be constructed. We briefly discuss a few possible directions below.

(i) Efficient estimation of derivatives. For model (20), our proposed methodology can also be used to construct an efficient estimator of $m'(x)$ by considering linear combinations of $\widehat b$ in (24) over different quantiles.

(ii) Efficient estimation of varying-coefficient models. Consider
\[
Y_t = m_1(U_t) X_{t,1} + \cdots + m_p(U_t) X_{t,p} + \sigma(U_t)\, \varepsilon_t, \tag{33}
\]
where $m_1(\cdot), \ldots, m_p(\cdot)$ are the nonparametric functional coefficients of interest. Denote by $Q_{Y|U}(\tau \mid u)$ the conditional τ-th quantile of Y given U = u, and by $Q(\tau)$ the τ-th quantile of ε. Then (33) has the quantile representation
\[
Q_{Y|U}(\tau \mid u) = m_1(u) X_{t,1} + \cdots + m_p(u) X_{t,p} + \sigma(u) Q(\tau),
\]
which can be estimated by a local linear quantile regression similar to (24). Note that $m_1(u), \ldots, m_p(u)$ are global parameters and $\sigma(u)Q(\tau)$ are local parameters, so our methodology can be applied here.

(iii) Efficient estimation of the conditional variance function. Consider $Y = \sigma(X)\varepsilon$. Then $Y^2 = \sigma^2(X)\varepsilon^2$, leading to the quantile representation $Q_{Y^2|X}(\tau \mid x) = \sigma^2(x)\, Q_{\varepsilon^2}(\tau)$. To ensure identifiability, assume $\sigma(0) = 1$.
Then Q Y 2 X(τ x/q Y 2 X(τ 0 = σ 2 (x for all τ (0, 1, and efficient estimator of σ 2 ( can be constructed via combining quantiles. 21

(iv) Efficient estimation with dependent data. Our asymptotic theories are developed for independent data for technical convenience, but they can be extended to dependent data, such as time series or even longitudinal data, under appropriate mixing conditions.

7 Monte Carlo Studies

7.1 Linear regression model

In (6), let α = 0, β = 1, and let the covariate X ~ N(0, 1). For ε, we consider the normal distribution, the Student-t distribution with one ($t_1$) and two ($t_2$) degrees of freedom, the two normal mixture distributions in Example 2, the Laplace distribution, the Beta distributions Beta(1, b), b = 0.5, 1, 2, 3, and the Gamma distributions Gamma(γ), γ = 0.5, 1, 2, 3. Throughout this section we consider uniformly spaced quantiles $\tau_i = i/(k+1)$, $i = 1, \ldots, k = 9$.

We compare six estimation methods. OLS: the ordinary least squares estimator; LAD: the median quantile estimator with τ = 0.5 in (12); QAU, QAO, QAE: the quantile average estimator in (14) with uniform weights, the theoretical optimal weights, and the estimated optimal weights (cf. Section 3.5), respectively; CQR: Zou and Yuan (2008)'s CQR estimator. Over the simulation realizations, we use QAE as a benchmark against which the five other methods are compared via the empirical relative efficiency
\[
\mathrm{RE}(\text{Method}) = \frac{\mathrm{MSE}(\text{Method})}{\mathrm{MSE}(\mathrm{QAE})}, \quad \text{with } \mathrm{MSE} = \text{the average of } [\widehat\beta(j) - \beta]^2 \text{ over realizations},
\]
where Method stands for any of the other five estimators and $\widehat\beta(j)$ is the estimate from the j-th realization. A value of RE larger than one indicates better performance of QAE. The results are summarized in Table 2 for two sample sizes, n = 100 and 300.

For the N(0, 1), Student-$t_2$ and Laplace distributions, QAE and CQR are comparable; for all other distributions, QAE significantly outperforms CQR. Also, QAE outperforms OLS for all non-normal distributions, whereas the two are comparable for N(0, 1). For the larger sample size, the superior performance of QAE is even more remarkable, which agrees with our theoretical findings.
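The relative-efficiency criterion above is easy to reproduce. The sketch below (Python) computes RE from stored per-realization estimates; as a stand-in for the regression estimators, it uses the intercept-only model Y = θ + ε with Laplace errors, where the sample mean plays the role of OLS and the sample median that of LAD (for Laplace errors the asymptotic RE of mean versus median is 2). The helper names are ours, not the paper's.

```python
import random, statistics

def mse(estimates, truth):
    """MSE = average of (theta_hat_j - theta)^2 over the simulation realizations."""
    return sum((b - truth) ** 2 for b in estimates) / len(estimates)

def relative_efficiency(est_method, est_benchmark, truth):
    """RE(Method) = MSE(Method) / MSE(benchmark); RE > 1 favours the benchmark."""
    return mse(est_method, truth) / mse(est_benchmark, truth)

# Toy instance: Y = theta + eps with theta = 0 and Laplace errors
# (generated as the difference of two unit exponentials).
random.seed(0)
means, medians = [], []
for _ in range(2000):
    eps = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(100)]
    means.append(sum(eps) / len(eps))        # "OLS" analogue
    medians.append(statistics.median(eps))   # "LAD" analogue
re = relative_efficiency(means, medians, 0.0)
# re should be near 2, the asymptotic mean-vs-median RE for Laplace errors
```

In the paper's study the stored estimates would instead come from the six regression methods listed above.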
For almost all cases considered, QAE substantially outperforms the LAD estimator, and the relative efficiency can be as high as 2845%. It is worth pointing out that, for the Beta and

Gamma distributions, the relative efficiencies are much higher than for the other distributions considered, owing to the super-efficiency phenomenon in Theorem 10. We conclude that the proposed optimal QAE offers a more efficient alternative to existing methods.

Insert Table 2 about here.

7.2 Nonparametric regression model

In our data analysis, we use the standard Gaussian kernel for K(·). We now address the bandwidth selection issue. By (29), the optimal bandwidth h is proportional to $(s^2)^{1/5}$. Denote by $h_{\mathrm{LS}}$ and $h_{\mathrm{WQAN}}$ the bandwidths for the least squares estimator and the proposed optimal WQAN estimator. Then $h_{\mathrm{WQAN}} = h_{\mathrm{LS}} \{\Lambda_k^{-1}/\mathrm{var}(\varepsilon)\}^{1/5}$, where $\Lambda_k$ is defined in (2). In practice, to select $h_{\mathrm{LS}}$, we can use the plug-in bandwidth selector of Ruppert, Sheather and Wand (1995), implemented via the command dpill of the R package KernSmooth. We then select $h_{\mathrm{WQAN}}$ by plugging in estimates of $\Lambda_k$ and var(ε) using the methods in Section 4.3; for the purpose of comparison, however, we use their true values in our simulation studies. Similarly, we can choose the optimal bandwidths for the other two estimators. Kai, Li and Zou (2010) adopted the same strategy. In step (i) of Section 4.3, for the preliminary estimator $\widetilde m(\cdot)$, we use the plug-in bandwidth selector $h_{\mathrm{LS}}$.

We compare the empirical performance of the four methods listed at the beginning of Section 4.2. With 1000 realizations, we use the least squares estimator $\widehat m_{\mathrm{LS}}$ as the benchmark against which the other three methods are compared via the relative efficiency
\[
\mathrm{RE}(\widehat m) = \frac{\mathrm{MISE}(\widehat m_{\mathrm{LS}})}{\mathrm{MISE}(\widehat m)}, \quad \text{with } \mathrm{MISE}(\widehat m) = \frac{1}{1000} \sum_{r=1}^{1000} \int_{l_1}^{l_2} \{\widehat m_r(x) - m(x)\}^2\, dx,
\]
where $\widehat m_r$ is the estimate from the r-th realization and $[l_1, l_2]$ is the interval over which m is estimated. A relative efficiency greater than one indicates that $\widehat m$ outperforms the LS estimator. To facilitate computation, the integral is approximated using 20 grid points.

Consider n = 200 samples from the model
\[
\text{Model I:} \quad Y = \sin(2X) + 2\exp(-16X^2) + 0.5\varepsilon, \quad X \sim \mathrm{Unif}[-1.6, 1.6].
\]
(34) The same model was also used in Kai, Li and Zou (2010), with the normal design X ~ N(0, 1). Here we use a uniform design to avoid some computational issues. For ε, we consider

nine symmetric distributions: N(0, 1), the normal truncated to [-1, 1], the Cauchy truncated to [-10, 10], the Cauchy truncated to [-1, 1], the Student-t with 3 degrees of freedom ($t_3$), the standard Laplace distribution, the uniform distribution on [-0.5, 0.5], and the two normal mixtures 0.5N(2, 1) + 0.5N(-2, 1) and 0.95N(0, 1) + 0.05N(0, 9). The first normal mixture can be used to model a two-cluster population, whereas the second can be viewed as a noise contamination model. Let $[l_1, l_2] = [-1.5, 1.5]$.

The relative efficiencies of the four methods, with $\widehat m_{\mathrm{LS}}(x)$ as the benchmark, are summarized in Table 3(a). Overall, the proposed optimal WQAN either significantly outperforms or is comparable to the other three methods. For example, for N(0, 1) and 0.95N(0, 1) + 0.05N(0, 9), WQAN is comparable to LS; for the other distributions, WQAN has about a 20% efficiency gain over LS in most cases, and a gain of more than 60% for 0.5N(2, 1) + 0.5N(-2, 1). When compared with CQR, WQAN outperforms CQR for all but four distributions (N(0, 1), Student-$t_3$, Laplace, and 0.95N(0, 1) + 0.05N(0, 9)), for which the two are comparable. While WQAN underperforms LAD for the Cauchy truncated to [-10, 10], it has substantial efficiency gains for N(0, 1), the normal truncated to [-1, 1], the Cauchy truncated to [-1, 1], the uniform on [-0.5, 0.5], and 0.5N(2, 1) + 0.5N(-2, 1).

The empirical performance of the proposed method in Table 3(a) is not as impressive as its theoretical performance in (30). For example, for the normal truncated to [-1, 1], the theoretical AREs according to (30) are 1, 0.48, 0.86, 0.93, 0.95, 1.13, 1.69, 2.17, compared with 1, 0.54, 0.93, 0.98, 0.99, 1.14, 1.25, 1.26 in Table 3(a). To explain this phenomenon, note that a plot (not included here) of the function m(x) = sin(2x) + 2exp(-16x²) exhibits large curvature and sharp changes on [-0.5, 0.5], so a large estimation bias can easily offset the asymptotic efficiency improvements, especially for a moderate sample size.
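The Model I design and the 20-grid-point MISE approximation described above can be sketched as follows (Python; the noise is taken to be standard normal for illustration, the 0.5 noise scale is the one that also appears in Model II, and `model_I`/`mise_on_grid` are our own helper names):

```python
import math, random

def m_true(x):
    """True regression function of Model I: m(x) = sin(2x) + 2 exp(-16 x^2)."""
    return math.sin(2 * x) + 2 * math.exp(-16 * x * x)

def model_I(n, rng):
    """Draw n samples with X ~ Unif[-1.6, 1.6] and Y = m(X) + 0.5 eps (eps here N(0,1))."""
    xs = [rng.uniform(-1.6, 1.6) for _ in range(n)]
    ys = [m_true(x) + 0.5 * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def mise_on_grid(fits, m, l1=-1.5, l2=1.5, ngrid=20):
    """MISE(mhat): average over realizations of the integral of (mhat_r(x) - m(x))^2,
    the integral approximated by a midpoint rule on ngrid equally spaced points."""
    grid = [l1 + (l2 - l1) * (j + 0.5) / ngrid for j in range(ngrid)]
    width = (l2 - l1) / ngrid
    return sum(sum((mhat(x) - m(x)) ** 2 for x in grid) * width
               for mhat in fits) / len(fits)

rng = random.Random(0)
xs, ys = model_I(200, rng)
# sanity check: a "fit" off by a constant 0.1 has integral (0.1)^2 * 3 = 0.03 over [-1.5, 1.5]
err = mise_on_grid([lambda x: m_true(x) + 0.1], m_true)
```

In a full replication, `fits` would hold the 1000 fitted curves of each competing estimator rather than a constant-shift toy fit.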
To appreciate this, use the same X and ε as in (34) and consider the model
\[
\text{Model II:} \quad Y = 1.8X + 0.5\varepsilon. \tag{35}
\]
Then the bias term $m''(x)\mu_K h^2/2 = 0$ vanishes and the variance plays the dominating role. For all four estimation methods, we use the same bandwidth: the plug-in bandwidth selector for local linear regression. We summarize the relative efficiencies in Table 3(b). The overall pattern of the empirical relative efficiencies is consistent with that of the theoretical ones in (30), and the proposed WQAN significantly outperforms the other methods for almost all distributions considered. Also, using more quantiles (k = 29) significantly

improves the performance of WQAN for the normal truncated to [-1, 1], the Cauchy truncated to [-1, 1], and the uniform on [-0.5, 0.5], a property not shared by the CQR method. In summary, for most non-normal distributions considered, the proposed method can deliver substantial efficiency improvements over the other methods, and its empirical performance is consistent with our asymptotic theory.

Insert Table 3 about here.

8 Proofs

The following Lemma 1 is needed to prove Theorem 1.

Lemma 1. Let h(·) be any differentiable function on the interval [a, b]. Then
\[
\Big[ \int_a^b h(x)\, dx \Big]^2 \ge (b-a) \int_a^b h^2(x)\, dx - \frac{(b-a)^3}{2} \int_a^b [h'(x)]^2\, dx. \tag{36}
\]

Proof. It is easy to verify the identity
\[
V := \Big[ \int_a^b h(x)\, dx \Big]^2 - (b-a) \int_a^b h^2(x)\, dx = -\frac{1}{2} \int_a^b \int_a^b [h(x) - h(y)]^2\, dx\, dy. \tag{37}
\]
For all $x, y \in [a, b]$, we have $|h(x) - h(y)| = |\int_y^x h'(u)\, du| \le \int_a^b |h'(u)|\, du$, uniformly. Thus, by the Cauchy-Schwarz inequality,
\[
\max_{x, y \in [a, b]} |h(x) - h(y)|^2 \le \Big[ \int_a^b |h'(u)|\, du \Big]^2 \le (b-a) \int_a^b [h'(u)]^2\, du.
\]
Applying this inequality to (37), we obtain
\[
V \ge -\frac{(b-a)^2}{2} \max_{x, y \in [a, b]} |h(x) - h(y)|^2 \ge -\frac{(b-a)^3}{2} \int_a^b [h'(u)]^2\, du,
\]
completing the proof.

Proof of Theorem 1. Recall $\Gamma = [\min(\tau_i, \tau_j) - \tau_i \tau_j]_{1 \le i, j \le k}$, $\tau_i = i/(k+1)$. We can verify that $\Gamma^{-1}$ is the tridiagonal matrix
\[
\Gamma^{-1} = (k+1) \begin{pmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{pmatrix}, \tag{38}
\]
with 2(k + 1) on the diagonal, -(k + 1) on the super-/sub-diagonals, and 0 elsewhere. Define the k × k diagonal matrix
\[
P = \mathrm{diag}\{q_1, \ldots, q_k\} \quad \text{with } q_i = f(Q(\tau_i)). \tag{39}
\]
We have $H = P^{-1} \Gamma P^{-1}$ and $H^{-1} = P \Gamma^{-1} P$. Using (38), we can obtain
\[
\Lambda_k = e_k^T H^{-1} e_k = e_k^T P \Gamma^{-1} P e_k = (k+1) \Big[ q_1^2 + q_k^2 + \sum_{i=2}^k (q_i - q_{i-1})^2 \Big]. \tag{40}
\]
Recall $g(\tau) = f(Q(\tau))$ in Assumption 2. Define $u = Q(\tau)$. Then $\partial\tau/\partial u = \partial F(u)/\partial u = f(u)$, and by the chain rule, $g'(\tau) = (\partial g/\partial u)(\partial u/\partial \tau) = f'(u)/f(u)$. Thus,
\[
\int_0^1 [g'(\tau)]^2\, d\tau = \int_{\mathbb{R}} \Big[ \frac{f'(u)}{f(u)} \Big]^2 f(u)\, du = I(f).
\]
Applying the chain rule again,
\[
g''(\tau) = \frac{\partial \{f'(u)/f(u)\}}{\partial u} \frac{\partial u}{\partial \tau} = \frac{f''(u) f(u) - f'(u)^2}{f^3(u)}. \tag{41}
\]
Let $\Delta = 1/(k+1)$. In (40), by Assumption 2, $(k+1)(q_1^2 + q_k^2) \to 0$. Define
\[
W_k = (k+1) \sum_{i=2}^k (q_i - q_{i-1})^2 - \int_\Delta^{1-\Delta} [g'(t)]^2\, dt.
\]
To prove $\Lambda_k \to I(f)$, it suffices to show $W_k \to 0$. Since $\tau_1 = \Delta$, $\tau_k = 1 - \Delta$, and $\tau_i - \tau_{i-1} = \Delta$, we can rewrite $W_k$ as
\[
W_k = (k+1) \sum_{i=2}^k \Big\{ \Big[ \int_{\tau_{i-1}}^{\tau_i} g'(t)\, dt \Big]^2 - (\tau_i - \tau_{i-1}) \int_{\tau_{i-1}}^{\tau_i} [g'(t)]^2\, dt \Big\}.
\]
Applying Lemma 1 and Assumption 2, we get
\[
|W_k| \le \frac{\Delta^2}{2} \sum_{i=2}^k \int_{\tau_{i-1}}^{\tau_i} [g''(t)]^2\, dt = \frac{\Delta^2}{2} \int_{\tau_1}^{\tau_k} [g''(t)]^2\, dt \to 0,
\]
completing the proof.

Proof of Proposition 1. First, we prove that (3) implies Assumption 2. Let $\epsilon > 0$ be any given number. By the condition $\lim_{\tau \to 0} [\,|g'(\tau)| + |g'(1-\tau)|\,] \tau^{3/2} = 0$, there exists


Minimum distance tests and estimates based on ranks Minimum distance tests and estimates based on ranks Authors: Radim Navrátil Department of Mathematics and Statistics, Masaryk University Brno, Czech Republic (navratil@math.muni.cz) Abstract: It is well

More information

Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

Statistical signal processing

Statistical signal processing Statistical signal processing Short overview of the fundamentals Outline Random variables Random processes Stationarity Ergodicity Spectral analysis Random variable and processes Intuition: A random variable

More information

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA

DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Bayesian spatial quantile regression

Bayesian spatial quantile regression Brian J. Reich and Montserrat Fuentes North Carolina State University and David B. Dunson Duke University E-mail:reich@stat.ncsu.edu Tropospheric ozone Tropospheric ozone has been linked with several adverse

More information

Boosting Methods: Why They Can Be Useful for High-Dimensional Data

Boosting Methods: Why They Can Be Useful for High-Dimensional Data New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

O Combining cross-validation and plug-in methods - for kernel density bandwidth selection O

O Combining cross-validation and plug-in methods - for kernel density bandwidth selection O O Combining cross-validation and plug-in methods - for kernel density selection O Carlos Tenreiro CMUC and DMUC, University of Coimbra PhD Program UC UP February 18, 2011 1 Overview The nonparametric problem

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility American Economic Review: Papers & Proceedings 2016, 106(5): 400 404 http://dx.doi.org/10.1257/aer.p20161082 Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility By Gary Chamberlain*

More information

Quantile Regression for Panel/Longitudinal Data

Quantile Regression for Panel/Longitudinal Data Quantile Regression for Panel/Longitudinal Data Roger Koenker University of Illinois, Urbana-Champaign University of Minho 12-14 June 2017 y it 0 5 10 15 20 25 i = 1 i = 2 i = 3 0 2 4 6 8 Roger Koenker

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Kernel Density Based Linear Regression Estimate

Kernel Density Based Linear Regression Estimate Kernel Density Based Linear Regression Estimate Weixin Yao and Zibiao Zao Abstract For linear regression models wit non-normally distributed errors, te least squares estimate (LSE will lose some efficiency

More information

Robust and Efficient Regression

Robust and Efficient Regression Clemson University TigerPrints All Dissertations Dissertations 5-203 Robust and Efficient Regression Qi Zheng Clemson University, qiz@clemson.edu Follow this and additional works at: https://tigerprints.clemson.edu/all_dissertations

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Estimation de mesures de risques à partir des L p -quantiles

Estimation de mesures de risques à partir des L p -quantiles 1/ 42 Estimation de mesures de risques à partir des L p -quantiles extrêmes Stéphane GIRARD (Inria Grenoble Rhône-Alpes) collaboration avec Abdelaati DAOUIA (Toulouse School of Economics), & Gilles STUPFLER

More information

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Function of Longitudinal Data

Function of Longitudinal Data New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li Abstract This paper develops a new estimation of nonparametric regression functions for

More information

A Few Notes on Fisher Information (WIP)

A Few Notes on Fisher Information (WIP) A Few Notes on Fisher Information (WIP) David Meyer dmm@{-4-5.net,uoregon.edu} Last update: April 30, 208 Definitions There are so many interesting things about Fisher Information and its theoretical properties

More information

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density

Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density ISSN 1440-771X Australia Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Bayesian estimation of bandwidths for a nonparametric regression model

More information

COMPUTER-AIDED MODELING AND SIMULATION OF ELECTRICAL CIRCUITS WITH α-stable NOISE

COMPUTER-AIDED MODELING AND SIMULATION OF ELECTRICAL CIRCUITS WITH α-stable NOISE APPLICATIONES MATHEMATICAE 23,1(1995), pp. 83 93 A. WERON(Wroc law) COMPUTER-AIDED MODELING AND SIMULATION OF ELECTRICAL CIRCUITS WITH α-stable NOISE Abstract. The aim of this paper is to demonstrate how

More information

Efficient and Robust Scale Estimation

Efficient and Robust Scale Estimation Efficient and Robust Scale Estimation Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline Introduction and motivation The robust scale estimator

More information

Tobit and Interval Censored Regression Model

Tobit and Interval Censored Regression Model Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 2, Number (206), pp. 98-994 Research India Publications http://www.ripublication.com Tobit and Interval Censored Regression Model Raidani

More information

Lecture 4: Types of errors. Bayesian regression models. Logistic regression

Lecture 4: Types of errors. Bayesian regression models. Logistic regression Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

Optimal bandwidth selection for the fuzzy regression discontinuity estimator

Optimal bandwidth selection for the fuzzy regression discontinuity estimator Optimal bandwidth selection for the fuzzy regression discontinuity estimator Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP49/5 Optimal

More information

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September

More information

ECAS Summer Course. Fundamentals of Quantile Regression Roger Koenker University of Illinois at Urbana-Champaign

ECAS Summer Course. Fundamentals of Quantile Regression Roger Koenker University of Illinois at Urbana-Champaign ECAS Summer Course 1 Fundamentals of Quantile Regression Roger Koenker University of Illinois at Urbana-Champaign La Roche-en-Ardennes: September, 2005 An Outline 2 Beyond Average Treatment Effects Computation

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Nonparametric Econometrics

Nonparametric Econometrics Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Yuxin Chen Electrical Engineering, Princeton University Coauthors Pragya Sur Stanford Statistics Emmanuel

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Local Polynomial Wavelet Regression with Missing at Random

Local Polynomial Wavelet Regression with Missing at Random Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

Quantile regression and heteroskedasticity

Quantile regression and heteroskedasticity Quantile regression and heteroskedasticity José A. F. Machado J.M.C. Santos Silva June 18, 2013 Abstract This note introduces a wrapper for qreg which reports standard errors and t statistics that are

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical

More information

NEW EFFICIENT AND ROBUST ESTIMATION IN VARYING-COEFFICIENT MODELS WITH HETEROSCEDASTICITY

NEW EFFICIENT AND ROBUST ESTIMATION IN VARYING-COEFFICIENT MODELS WITH HETEROSCEDASTICITY Statistica Sinica 22 202, 075-0 doi:http://dx.doi.org/0.5705/ss.200.220 NEW EFFICIENT AND ROBUST ESTIMATION IN VARYING-COEFFICIENT MODELS WITH HETEROSCEDASTICITY Jie Guo, Maozai Tian and Kai Zhu 2 Renmin

More information

Local Polynomial Modelling and Its Applications

Local Polynomial Modelling and Its Applications Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,

More information

Lecture 3 September 1

Lecture 3 September 1 STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have

More information

Testing Equality of Nonparametric Quantile Regression Functions

Testing Equality of Nonparametric Quantile Regression Functions International Journal of Statistics and Probability; Vol. 3, No. 1; 2014 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Testing Equality of Nonparametric Quantile

More information

Nonparametric Methods

Nonparametric Methods Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered,

regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, L penalized LAD estimator for high dimensional linear regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables

More information

Nonparametric estimation of extreme risks from heavy-tailed distributions

Nonparametric estimation of extreme risks from heavy-tailed distributions Nonparametric estimation of extreme risks from heavy-tailed distributions Laurent GARDES joint work with Jonathan EL METHNI & Stéphane GIRARD December 2013 1 Introduction to risk measures 2 Introduction

More information

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful? Journal of Modern Applied Statistical Methods Volume 10 Issue Article 13 11-1-011 Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information