Efficient Regressions via Optimally Combining Quantile Information
Zhibiao Zhao (Penn State University)    Zhijie Xiao (Boston College)

September 29, 2011

Abstract

We study efficient estimation of regression models via quantile regression. Both the classical parametric linear regression model and the nonparametric regression model are investigated. We argue that it is crucial to optimally combine information over quantiles. We investigate both the weighted composite quantile regression estimator, based on a weighted quantile loss function, and the weighted quantile average estimator, based on a weighted average of quantile regression estimators at single quantiles. We show that efficient regression estimators can be constructed using quantile regression. In the case of non-regular statistical estimation, the proposed estimators lead to super-efficient estimation. Asymptotic properties of the proposed estimators are studied, and efficiency comparisons are conducted among these estimators and other estimators such as the ordinary least squares estimator and the least-absolute-deviation estimator. We also conduct Monte Carlo experiments to investigate the sampling performance of the proposed estimators.

AMS 2000: Primary 62J05, 62G08; Secondary 62G20.

Keywords: Asymptotic normality, Bahadur representation, Composite quantile regression, Efficiency, Fisher information, Quantile regression, Super-efficiency.

We thank Roger Koenker and Steve Portnoy for their very helpful comments. Zhibiao Zhao: Department of Statistics, Penn State University, University Park, PA 16802; zuz13@stat.psu.edu. Zhijie Xiao: Department of Economics, Boston College, Chestnut Hill, MA 02467; zhijie.xiao@bc.edu.
1 Introduction

We are interested in the problem of efficient estimation of regression models:

    Y = m(X, θ) + ε,   (1)

where X is the p-dimensional covariate vector, Y ∈ R is the response, ε is the noise, and θ is the vector of unknown parameters. In particular, although our method may be extended to the estimation of other statistical models (see the discussion in Section 6), we consider in this paper two leading models that are widely used in statistical applications: the classical parametric linear regression model, where θ is finite dimensional, and the nonparametric regression model, where the parameter is of infinite dimension.

In practice, model (1) is usually estimated via an ordinary least squares (OLS) regression (for the parametric model) or a local least squares regression (in the nonparametric case) of Y on X based on a collection of observations. When ε is normally distributed, the widely used OLS estimator has a likelihood interpretation and is the most efficient estimator. In the absence of Gaussianity, the OLS estimator is usually less efficient than methods that exploit the distributional information, although it may still be consistent under appropriate regularity conditions. Without these regularity conditions, the OLS estimator may not even be consistent, for example, when the error has a heavy-tailed distribution such as the Cauchy distribution. Monte Carlo evidence indicates that the OLS estimator can be very sensitive to certain outliers. In empirical analysis, many applications (such as finance and economics) involve data whose distributions are heavy-tailed or skewed, and estimators based on OLS may perform poorly in these cases. It is therefore important to develop robust and efficient estimation procedures for general innovation distributions. If the error distribution were known, the Maximum Likelihood Estimator (MLE) could be constructed.
Under appropriate regularity conditions, the MLE is asymptotically normal and asymptotically efficient in the sense that the limiting covariance matrix attains the Cramér-Rao lower bound. In practice, the true density function is generally unknown and so the MLE is not feasible. However, the MLE and the Cramér-Rao efficiency bound serve as a standard against which we should measure our estimators. The existence and construction of asymptotically efficient regression estimators have attracted much research attention; see, e.g., Stein (1956), Beran (1974), and Stone (1975).
In his Wald lecture, based on a nonparametric estimate of the score function and a one-step Newton-Raphson approximation, Bickel (1982) proposed an asymptotically efficient estimator of linear regressions with unknown symmetric error distribution. He also showed that, in the presence of an intercept term, an asymptotically efficient estimator of the slope parameters can be obtained without the symmetry assumption.

We believe that the quantile regression technique (Koenker and Bassett, 1978) can provide a useful method for efficient statistical estimation. Intuitively, an estimation method that exploits the distributional information can potentially provide an efficient estimator. Since quantile regression provides a way of estimating the whole conditional distribution, appropriately using quantile regressions may improve estimation efficiency. Under regularity assumptions, the least-absolute-deviation (LAD) regression (quantile regression at the median) can provide better estimators than the OLS in the presence of heavy-tailed distributions. In addition, for certain distributions, a quantile regression at a non-median quantile may deliver a more efficient estimator than the LAD method. Furthermore, additional efficiency gains may be achieved by aggregating information over multiple quantiles.

Although using quantile regression over multiple quantiles can potentially improve the efficiency of estimation, this is often much easier to say than it is to do in a satisfactory way. To combine information from quantile regression, one may consider combining information over different quantiles via the criterion function of the estimation procedure, or combining information based on estimators at different quantiles. An important contribution along the second direction is Portnoy and Koenker (1989), who proposed an asymptotically efficient estimator for the slope parameters of the classical linear regression model.
Chamberlain (1994) and Xiao and Koenker (2009) considered some more complicated models, using minimum distance estimation to improve efficiency, although without reaching the efficiency bound. Zou and Yuan (2008) and Kai, Li and Zou (2010) recently considered the composite quantile regression to incorporate information across quantiles along the first direction mentioned above. In particular, they use an aggregate loss function based on a simple average of the individual loss functions over multiple quantiles. In this paper, we argue that, although a composite quantile regression estimator based on the aggregated criterion function may outperform the OLS estimator for some non-normal distributions, a simple average is in general not an efficient way of using distributional information from quantile regressions.
Information at different quantiles is correlated, and improperly using multiple-quantile information may even reduce efficiency. Roughly speaking, the simple average delivers good estimators when the error distribution is close to normal. In fact, for nonparametric regression, a simple-averaging-based composite quantile regression estimator is asymptotically equivalent to the local least squares estimator. However, the main purpose of combining quantile information is to improve efficiency when the error distribution is not normal and OLS does not work well. It is therefore important to combine quantile information appropriately to achieve efficiency gains.

We study the optimal combination of quantile regression information. We investigate both the weighted composite quantile regression estimator based on weighted quantile loss functions, and the weighted quantile average estimator based on a weighted average of quantile regression estimators at single quantiles. Optimal weighting schemes that lead to efficient regression estimators are proposed. In the case of non-regular statistical estimation, the proposed estimators lead to super-efficient estimation. Asymptotic properties of the proposed estimators are studied and efficiency comparison is conducted among these estimators and other estimators such as the OLS and LAD estimators.

The rest of this paper is organized as follows: Section 2 contains a preliminary result that is useful for our subsequent asymptotic theories. We study the linear regression model in Section 3 and the nonparametric regression model in Section 4. We briefly discuss estimation in non-regular cases in Section 5. Extensions and future work are discussed in Section 6. Section 7 contains simulation studies. Proofs are gathered in Section 8. Throughout, we consider combining information over the k quantiles τ_i = i/(k+1), i = 1, ..., k.
2 A Preliminary Result

Given the regression model (1), we denote the distribution, density, and quantile functions of the error term ε by F, f, and Q. Denote the Fisher information of f by I(f), i.e.,

    I(f) = \int_R [∂ log f(u)/∂u]^2 f(u) du = \int_R [f'(u)]^2 / f(u) du.

Throughout we assume I(f) < ∞. Our goal is to improve the efficiency (upon traditional methods) of regression estimators by optimally using information at the different quantiles τ_1, ..., τ_k. Throughout this paper,
we use the vector q and the matrices Γ and H defined as follows:

    q = (q_1, ..., q_k)^T, where q_i = f(Q(τ_i)),
    Γ = [ min(τ_i, τ_j) − τ_i τ_j ]_{1 ≤ i,j ≤ k},
    H = [ (min(τ_i, τ_j) − τ_i τ_j) / ( f(Q(τ_i)) f(Q(τ_j)) ) ]_{1 ≤ i,j ≤ k}.

For convenience we write q_0 = q_{k+1} = 0. We also make the following assumptions.

Assumption 1. f is positive, twice continuously differentiable, and bounded on {u : 0 < F(u) < 1}.

Assumption 2. Let g(τ) = f(Q(τ)). Then

    lim_{τ→0} [g^2(τ) + g^2(1−τ)]/τ = 0 and lim_{τ→0} τ^2 \int_τ^{1−τ} [g'(t)]^2 dt = 0.

Assumptions 1 and 2 are conventionally imposed in the study of efficient statistical estimation. Assumption 1 assumes smoothness of the density function. Assumption 2 is needed for the Cramér-Rao type efficiency result. It requires that the error density approach zero at the boundary at a sufficiently fast rate; otherwise one may estimate the parameters at a faster rate; see, e.g., Akahira and Takeuchi (1995) for a discussion of this issue, and also the discussion in Section 5.

Given a random sample {(X_t, Y_t)}_{t=1}^n, we consider efficient estimation of θ using quantile regressions. As will become clear later, for many quantile regression estimators that admit an appropriate Bahadur representation, the covariance matrix of the estimator based on the optimal quantile combination depends on Λ_k defined below:

    Λ_k = e_k^T H^{−1} e_k, where e_k = (1, ..., 1)^T.   (2)

Theorem 1 below gives an important preliminary result on combining quantile information: as we use a large number of quantiles (k → ∞), Λ_k converges to the Fisher information.

Theorem 1. Under Assumptions 1 and 2, lim_{k→∞} Λ_k = I(f).
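Theorem 1 can be checked numerically. The sketch below (our own illustration, not from the paper) computes Λ_k = e_k^T H^{−1} e_k for the standard logistic distribution, where f(Q(τ)) = τ(1−τ) and I(f) = 1/3; a small pure-Python linear solver avoids any external dependencies.

```python
# Numerical check of Theorem 1 (our own sketch): for the standard logistic
# distribution, g(tau) = f(Q(tau)) = tau(1 - tau) and I(f) = 1/3, so
# Lambda_k = e_k^T H^{-1} e_k should approach 1/3 as k grows.
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lambda_k(k, g):
    """Lambda_k = e^T H^{-1} e, with H_ij = (min(t_i,t_j) - t_i t_j)/(g(t_i) g(t_j))."""
    taus = [i / (k + 1) for i in range(1, k + 1)]
    H = [[(min(ti, tj) - ti * tj) / (g(ti) * g(tj)) for tj in taus] for ti in taus]
    return sum(solve(H, [1.0] * k))

g_logistic = lambda t: t * (1 - t)   # f(Q(tau)) for the standard logistic
print(lambda_k(9, g_logistic), lambda_k(99, g_logistic))   # approaches 1/3
```

For the logistic case one can verify analytically that Λ_k = k(k+2)/(3(k+1)^2), so already k = 9 gives 0.33, within one percent of I(f) = 1/3.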
Assumption 2 is an important assumption that ensures the Cramér-Rao efficiency result. Proposition 1 below presents alternative sufficient conditions for Assumption 2.

Proposition 1. Under Assumption 1, if {u : 0 < F(u) < 1} = R, then Assumption 2 holds if either one of the following two conditions holds:

    lim_{τ→0} { [g^2(τ) + g^2(1−τ)]/τ + [ |g'(τ)| + |g'(1−τ)| ] τ^{3/2} } = 0,   (3)

or, equivalently,

    lim_{u→±∞} { f^2(u)/min{1−F(u), F(u)} + |∂^2 log f(u)/∂u^2| min{1−F(u), F(u)}^{3/2} / f(u) } = 0.   (4)

By (4) in Proposition 1, for a symmetric density f, Assumption 2 is satisfied if

    lim_{u→∞} { f^2(u)/[1−F(u)] + |∂^2 log f(u)/∂u^2| [1−F(u)]^{3/2} / f(u) } = 0.   (5)

We write a ≍ b if a/b is bounded away from zero and infinity.

Corollary 1. Suppose that there exist ν > 1 and β > 0 such that f(u) ≍ |u|^{−ν} and |∂^2 log f(u)/∂u^2| ≍ |u|^{−β} as |u| → ∞. Then Assumption 2 is satisfied if β + 3(ν−1)/2 > 1. In particular, the latter holds if β ≥ 1.

Many commonly used distributions with support R satisfy Assumption 2, and hence yield asymptotic efficiency by Theorem 1. (i) If f is the standard normal density, then ∂^2 log f(u)/∂u^2 = −1 and 1−F(u) = [1+o(1)] f(u)/u as u → ∞, and it is easy to verify (5). (ii) For the Laplace distribution with density f(u) = 0.5 exp(−|u|), u ∈ R, we have 1−F(u) = 0.5 exp(−u) for u > 0 and ∂^2 log f(u)/∂u^2 = 0 for u ≠ 0, so (5) trivially holds. (iii) For the logistic distribution, we can show g(τ) = c(τ − τ^2) for some constant c > 0, so (3) holds. (iv) For the Student-t distribution with α > 0 degrees of freedom, Corollary 1 holds with ν = α+1 and β = 2. (v) For the symmetric α-stable distribution with index α ∈ (0, 2), Corollary 1 also holds with ν = α+1 and β = 2. In this case, the OLS estimator is consistent, with rate n^{1−1/α}, only for α > 1 (for α ≤ 1, OLS is not consistent); moreover, its limiting distribution is non-normal and is again an α-stable distribution. The proposed quantile regression method still achieves efficiency with asymptotic normality. (vi) For the mixture normal distribution θN(µ_1, σ_1^2) + (1−θ)N(µ_2, σ_2^2), µ_1, µ_2 ∈ R, σ_1^2, σ_2^2 > 0, θ ∈ [0, 1], we can verify (4).
3 Efficient Estimation of the Linear Model

We first consider in this section the problem of estimating the coefficients of the linear model

    Y = α + Xβ + ε,   (6)

where X = (X(1), ..., X(p)) is the p-dimensional covariate vector, Y is the response, ε is the noise, independent of X, α is the intercept, and β = (β_1, ..., β_p)^T is the slope vector. We first look at β. Given a random sample {(X_t, Y_t)}_{t=1}^n, we consider two ways of combining quantile information, in Sections 3.1 and 3.2 respectively; additional discussion is given in Section 3.3. Section 3.4 considers α. For convenience and without loss of generality, we assume that the regressors X are centered around 0.

Assumption 3. E(X) = 0 and the covariance matrix Σ_X = E(X^T X) is positive definite.

3.1 The Weighted Composite Quantile Regression Estimation

The first approach combines quantiles via the criterion function. For τ ∈ (0, 1), denote by Q_Y(τ|X) the conditional τ-th quantile of Y given X; then model (6) has the following quantile representation:

    Q_Y(τ|X) = α(τ) + Xβ, with α(τ) = α + Q(τ),   (7)

where Q(τ) is the τ-th quantile of ε. We consider estimating the model over {τ_i, i = 1, ..., k} jointly based on the following weighted quantile loss function:

    ({α̂(τ_i)}_{i=1}^k, β̂_WCQR) = argmin_{a_1,...,a_k,b} Σ_{i=1}^k w_i Σ_{t=1}^n ρ_{τ_i}(Y_t − a_i − X_t b),   (8)

where w_i ≥ 0 with Σ_{i=1}^k w_i = 1 are weights and ρ_τ(z) = z(τ − 1_{z<0}) is the quantile loss function (Koenker and Bassett, 1978). If w_i ≡ 1/k, (8) reduces to the (unweighted) Composite Quantile Regression (CQR) of Zou and Yuan (2008). For this reason, we call (8) the Weighted CQR (WCQR). Theorem 2 below gives the limiting distribution of the WCQR estimator.
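The check loss ρ_τ in (8) is simple to state in code; a minimal sketch (the function name is ours):

```python
# The quantile check loss from (8): rho_tau(z) = z * (tau - 1{z < 0}).
def rho(tau, z):
    return z * (tau - (1.0 if z < 0 else 0.0))

# At tau = 0.5 this is half the absolute loss, so its minimizer is the median;
# at tau = 0.9, positive residuals cost 0.9 per unit and negative ones only 0.1,
# which pushes the fit up toward the 0.9 quantile.
print(rho(0.5, -3.0), rho(0.5, 3.0))
print(rho(0.9, 1.0), rho(0.9, -1.0))
```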
Theorem 2. Let w = (w_1, ..., w_k)^T, and let Γ and q be defined as in Section 2. Under Assumptions 1 and 3, as the sample size n → ∞,

    √n (β̂_WCQR − β) → N(0, Σ_X^{−1} J(w)), where J(w) = (w^T Γ w)/(w^T q)^2.   (9)

If we use equal weights w_i = 1/k over all quantiles, the asymptotic covariance matrix of the unweighted CQR is Σ_X^{−1} J_CQR, where

    J_CQR = ( Σ_{i=1}^k f(Q(τ_i)) )^{−2} Σ_{i=1}^k Σ_{j=1}^k [ min(τ_i, τ_j) − τ_i τ_j ].   (10)

However, the amount of information at different quantiles differs, and the information at different quantiles is correlated, depending on the error distribution. Thus, it is important to combine these quantiles optimally, rather than by simple averaging. By Theorem 2, the covariance matrix of β̂_WCQR depends on the weights through J(w). Thus a natural way to select optimal weights in the WCQR is to minimize J(w) in (9). Theorem 3 below studies the optimal WCQR estimator.

Assumption 4. The log density log f(u) of ε is concave.

Theorem 3. Under Assumptions 1, 3 and 4, the optimal weights w* minimizing J(w) are

    w*_i = (2q_i − q_{i−1} − q_{i+1})/(q_1 + q_k) ≥ 0, i = 1, ..., k.   (11)

For the corresponding optimal WCQR estimator β̂*_WCQR, we have

    √n (β̂*_WCQR − β) → N(0, Σ_X^{−1} Λ_k^{−1}),

where Λ_k is given by (2). By the result in Theorem 1, under the additional Assumption 2, Λ_k → I(f): as we use more quantiles (k → ∞), the optimal WCQR achieves the Cramér-Rao efficiency bound.

Remark. For a general weighting scheme w_i = w(τ_i) for a function w(·), as k → ∞,

    J(w) → ∫_0^1 ∫_0^1 w(r) w(s) [min(r, s) − rs] dr ds / ( ∫_0^1 w(r) f(Q(r)) dr )^2.
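As an illustration of (11) (our own sketch, not from the paper): for standard logistic errors, q_i = f(Q(τ_i)) is proportional to τ_i(1 − τ_i), whose second differences on the grid τ_i = i/(k+1) are constant, so the optimal WCQR weights come out exactly uniform. The equally weighted CQR of Zou and Yuan is thus optimal in the logistic case, consistent with Assumption 4, since the logistic log-density is concave.

```python
# Optimal WCQR weights (11): w_i = (2 q_i - q_{i-1} - q_{i+1}) / (q_1 + q_k),
# with q_i = f(Q(tau_i)) and the convention q_0 = q_{k+1} = 0.
def wcqr_weights(k, g):
    q = [0.0] + [g(i / (k + 1)) for i in range(1, k + 1)] + [0.0]
    denom = q[1] + q[k]
    return [(2 * q[i] - q[i - 1] - q[i + 1]) / denom for i in range(1, k + 1)]

# Standard logistic: f(Q(tau)) = tau(1 - tau); weights are exactly uniform.
w = wcqr_weights(9, lambda t: t * (1 - t))
print([round(x, 6) for x in w])   # nine weights, each 1/9
```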
3.2 The Weighted Quantile Average Estimation

The WCQR estimator combines information at different quantiles via the criterion function. We next consider an alternative approach that combines information based on estimators at different quantiles. For each τ, we estimate β via the quantile regression

    (α̂(τ), β̂(τ)) = argmin_{a,b} Σ_{t=1}^n ρ_τ(Y_t − a − X_t b).   (12)

Under Assumptions 1 and 3, β̂(τ) is asymptotically normal:

    √n (β̂(τ) − β) → N( 0, Σ_X^{−1} τ(1−τ)/f^2(Q(τ)) ).   (13)

To combine quantile information, we define the weighted quantile average estimator (WQAE):

    β̂_WQAE = Σ_{i=1}^k ϖ_i β̂(τ_i), subject to Σ_{i=1}^k ϖ_i = 1.   (14)

Estimators of this form have a long history in statistics, dating back to Mosteller (1946). Portnoy and Koenker (1989) studied estimators of the form ∫ β̂(τ) dν(τ) with a general weighting function. An important difference between the weights ϖ in the WQAE and the weights w in the WCQR is that the ϖ_i are not restricted to be non-negative. Just as in the case of the WCQR, properly weighting the different quantiles in the WQAE is crucial for efficiently combining distributional information to improve efficiency. As shown in Theorem 4(i) below, the optimal weights can be obtained by minimizing a quadratic function of ϖ. The optimal weights and the corresponding asymptotic distribution of the optimal WQAE are given in Theorem 4(ii). Recall the matrix H and the vector e_k from Section 2.

Theorem 4. (i) Let ϖ = (ϖ_1, ..., ϖ_k)^T. Under Assumptions 1 and 3, as the sample size n → ∞,

    √n (β̂_WQAE − β) → N(0, Σ_X^{−1} S(ϖ)), where S(ϖ) = ϖ^T H ϖ.   (15)

(ii) The optimal weights minimizing S(ϖ) are unique and given by

    ϖ* = H^{−1} e_k / (e_k^T H^{−1} e_k).   (16)

Recall Λ_k in (2). For the corresponding optimal WQAE β̂*_WQAE, we have

    √n (β̂*_WQAE − β) → N(0, Σ_X^{−1} S(ϖ*)), where S(ϖ*) = Λ_k^{−1}.
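Formula (16) is easy to evaluate numerically. The sketch below (solver and helper names are ours; no external libraries assumed) computes the optimal WQAE weights for standard logistic errors, where f(Q(τ)) = τ(1−τ); in that case the optimal weights turn out proportional to τ_i(1−τ_i): symmetric, summing to one, and largest at the median.

```python
# Optimal WQAE weights (16): w* = H^{-1} e_k / (e_k^T H^{-1} e_k).
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def wqae_weights(k, g):
    taus = [i / (k + 1) for i in range(1, k + 1)]
    H = [[(min(ti, tj) - ti * tj) / (g(ti) * g(tj)) for tj in taus] for ti in taus]
    x = gauss_solve(H, [1.0] * k)   # H^{-1} e_k
    total = sum(x)                  # e_k^T H^{-1} e_k
    return [xi / total for xi in x]

# Standard logistic errors: f(Q(tau)) = tau(1 - tau).
w = wqae_weights(9, lambda t: t * (1 - t))
print([round(x, 4) for x in w])
```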
By Theorem 1, lim_{k→∞} Λ_k = I(f) under Assumption 2. Thus, as we use more quantiles, the variance of the optimal WQAE approaches the Cramér-Rao lower bound, corroborating the result of Portnoy and Koenker (1989).

Notice that the optimal weights ϖ* allow for negative components. The appearance of a negative weight in ϖ* is due to very inaccurate estimation at some quantile (say τ) together with the existence of very accurate estimates at other quantiles that are highly correlated with the estimator at this quantile τ. For example, consider the Cauchy distribution and two quantiles τ_1 = 0.15, τ_2 = 0.5; then β̂(τ_2) is a much more accurate estimator than β̂(τ_1), and they are positively correlated. In this case, the correlation effect is stronger than the averaging effect, and an optimal combination of β̂(τ_1) and β̂(τ_2) imposes a negative weight on β̂(τ_1).

3.3 A Comparison of Asymptotic Efficiency

The WQAE β̂_WQAE differs from the WCQR estimator β̂_WCQR in several aspects. While the WCQR estimator in (8) is based on an aggregation of several quantile loss functions, the WQAE in (14) is based on a weighted average of separate estimates from different quantiles. As a result, computing the WQAE only involves k separate (p+1)-parameter minimization problems, whereas the WCQR requires solving a larger (p+k)-parameter minimization problem. In addition, to ensure a proper loss function, the weights w_i of β̂_WCQR are restricted to be non-negative; by contrast, the weights ϖ_1, ..., ϖ_k in (14) may be negative. It is computationally appealing to impose fewer constraints on the weights. By Theorems 1 and 4, the optimal WQAE is asymptotically efficient under Assumptions 1 and 2. By contrast, asymptotic efficiency of the optimal WCQR requires the additional Assumption 4, owing to the non-negativity constraint on the weights in the WCQR. Theorem 5 below shows that in general the optimal WQAE is asymptotically more efficient than the WCQR estimator.
Only when Assumption 4 holds is the optimal WCQR asymptotically equivalent to the optimal WQAE, in which case both are asymptotically efficient.

Theorem 5. Let S(ϖ*) be defined as in Theorem 4(ii) and J(w) as in (9). Then S(ϖ*) ≤ min_w J(w), with equality at w = w* (given by (11)) under Assumption 4.

Next we study the asymptotic relative efficiency (ARE) of the optimal WQAE to the OLS estimator, the quantile regression (QR) estimator β̂(τ) in (12) at a single quantile
τ, and the unweighted CQR estimator. They are all asymptotically normal, with limiting distributions of the form √n(β̂ − β) → N(0, s^2 Σ_X^{−1}), where s^2 = var(ε) (assumed finite) for the OLS estimator, τ(1−τ)/f^2(Q(τ)) for the QR estimator, J_CQR in (10) for the unweighted CQR, and S(ϖ*) in Theorem 4(ii) for the optimal WQAE. For comparison, we use the optimal WQAE as the benchmark and define its ARE relative to an estimator β̂ as

    ARE(β̂) = s^2 / S(ϖ*) = (e_k^T H^{−1} e_k) s^2, where β̂ : OLS, QR(τ), CQR.

ARE ≥ 1 indicates better performance of the optimal WQAE. Clearly, ARE(CQR) ≥ 1, ARE(QR(τ_i)) ≥ 1, and lim_{k→∞} ARE(OLS) ≥ 1. We discuss a few examples below using k = 9 quantiles.

Example 1. Let ε be Student-t distributed. Then ARE(CQR) = 1.03, 1.12, 1.58 for t_∞ (i.e., N(0,1)), t_2, and t_1, respectively. As the tail of the distribution gets heavier, the optimal WQAE becomes increasingly more efficient than the CQR. For t_1 (the Cauchy distribution), the optimal weights are ϖ* = {−0.03, −0.04, 0.08, 0.29, 0.40, 0.29, 0.08, −0.04, −0.03}: the quantiles τ = 0.4, 0.5, 0.6 contribute almost all the information, whereas the quantiles τ = 0.1, 0.2, 0.8, 0.9 are penalized heavily with negative weights. In this case, Zou and Yuan's CQR with uniform weights would not perform well. On the other hand, for N(0,1), ϖ* = {0.13, 0.11, 0.11, 0.10, 0.10, 0.10, 0.11, 0.11, 0.13} is close to the uniform weighting. Therefore the optimal WQAE and the CQR have comparable performance, and both significantly outperform the single-quantile QR estimator in (12). See Table 1 for details.

Example 2. Let ε be a mixture of two normal distributions. Mixture 1 (different variances): 0.5N(0,1) + 0.5N(0,0.5^6); Mixture 2 (different means): 0.5N(−2,1) + 0.5N(2,1). For Mixture 1, ϖ* = {−0.002, −0.102, 0.183, 0.277, 0.287, 0.277, 0.183, −0.102, −0.002}: the quantiles 0.3, 0.4, 0.5, 0.6, 0.7 contain substantial information whereas the quantiles 0.2, 0.8 have negative weights.
For Mixture 2, ϖ* = {0.185, 0.156, 0.153, 0.078, −0.144, 0.078, 0.153, 0.156, 0.185} exhibits quite a different pattern. For Mixture 1, ARE(OLS) = 10.17 and ARE(CQR) = 2.43; for Mixture 2, ARE(OLS) = 3.28, with the corresponding ARE(CQR) value reported in Table 1. Thus, the optimal WQAE offers substantial efficiency gains. See Table 1 for details.

Example 3. Let ε be a Gamma random variable with shape parameter α > 0. For α = 1 (the exponential distribution), ϖ*_1 = 1.152, ϖ*_2 = −0.124, and ϖ*_i ≈ 0 for i = 3, ..., 9. The quantiles 0.1, 0.2 contain almost all the information. See Table 1 for details.
Insert Table 1 about here.

As shown in the examples above, different quantiles may carry substantially different amounts of information, and failing to utilize such information properly may result in a substantial loss of efficiency. This phenomenon provides strong evidence in favor of our proposed optimally weighted quantile-based estimators.

3.4 Efficient Estimation of the Intercept Term

Efficient estimation of the intercept α in (6) requires additional assumptions such as symmetry; see, e.g., Bickel (1982). In the quantile-domain representation (7), without further assumptions, α is not identifiable from α(τ). Following the previous literature, we assume that f is symmetric, which holds for many distributions, such as the Student-t distribution, the uniform distribution on symmetric intervals, the Laplace distribution, the symmetric α-stable distribution, many normal mixture distributions, and their truncated versions on symmetric intervals.

Assumption 5. The distribution of ε is symmetric.

Recall α(τ) in (7) and α̂(τ) in (12). We consider the following WQAE for α:

    α̂_WQAE = Σ_{i=1}^k ϖ_i α̂(τ_i), subject to Σ_{i=1}^k ϖ_i = 1 and ϖ_i = ϖ_{k+1−i}, i = 1, ..., k.   (17)

The asymptotic distribution of α̂_WQAE is given in Theorem 6 below. Again, the optimal WQAE of α can be constructed by minimizing the limiting variance of α̂_WQAE. Since the limiting variance of α̂_WQAE depends on S(ϖ) defined in Theorem 4, and the optimal weights ϖ* in Theorem 4 satisfy the condition ϖ*_i = ϖ*_{k+1−i} under Assumption 5, we have:

Theorem 6. (i) Under Assumptions 1, 3 and 5, we have the asymptotic normality

    √n (α̂_WQAE − α) → N(0, S(ϖ)), where S(ϖ) = ϖ^T H ϖ.   (18)

(ii) The optimal weights minimizing the asymptotic variance are given by ϖ* in (16), and the optimal WQAE α̂*_WQAE has limiting distribution √n (α̂*_WQAE − α) → N(0, Λ_k^{−1}), where Λ_k is defined in (2).

By Theorem 1, under the additional Assumption 2, the variance of the optimal WQAE of α approaches the Cramér-Rao lower bound as the number of quantiles k → ∞.
13 3.5 Estimating f(q(τ To compute the optimal weights, we need to estimate f(q(τ. Since f (Q (τ is invariant under location shift of ε, α in (6 can be absorbed into ε. We propose the procedure: (i Use uniform weights ϖ i = 1/k to obtain the preliminary estimator β, and compute the residuals (a combination of both α and ε as ε t = Y t X t β. (ii Use the nonparametric kernel density estimator to estimate f(u: f(u = 1 nb n ( u εt K, (19 b where K( is a non-negative kernel function and b > 0 is a bandwidth. (iii Estimate f(q(τ by f( Q(τ, where Q(τ is the sample τ-quantile of ε 1,..., ε n. t=1 In (19, we use Gaussian kernel for K. Following Silverman (1986, p. 48, we choose { b = 0.9 min sd( ε 1,..., ε n, IQR( ε 1,..., ε n } n 1/5, 1.34 where sd and IQR denote the sample standard deviation and sample interquartile. In estimating the intercept term, we symmetrize the weights; see Section Efficient Nonparametric Regression In this section, we study efficient estimation of the nonparametric regression model: Y = m(x + σ(xε, (20 where X and Y represent covariate and response respectively, ε is the error independent of X, m( is an unknown nonparametric function of interest, and σ( is the scale function. Our goal is to obtain efficient estimate of m(x at given x. For simplicity, we consider the case where X is one-dimensional and use local linear smoothing. Our analysis also applies to the case with multivariate X, and other nonparametric smoothing methods. For convenience, write Z n N(c n, s 2 n if s 1 n {Z n c n [1 + o p (1]} N(0, 1 in distribution. Let K( be a kernel function and h a bandwidth parameter, denote µ K = u 2 K(udu, φ K = K 2 (udu. R 13 R
Assumption 6. (i) The density function p_X(·) > 0 of X is differentiable, m(·) is three times differentiable, and σ(·) > 0 is differentiable, in a neighborhood of x; the density function f of ε is symmetric, positive, and twice continuously differentiable on its support. (ii) The kernel function K(·) integrates to one, is symmetric, and has bounded support. (iii) The bandwidth h satisfies h → 0 and nh → ∞ as n → ∞.

Assumption 6 is similar to those of Kai, Li and Zou (2010). As in Section 3.4, the symmetry assumption in Assumption 6(i) is imposed to ensure identifiability. Based on observations (X_1, Y_1), ..., (X_n, Y_n), the widely used local linear least squares regression is

    (m̂_LS(x), b̂) = argmin_{a,b} Σ_{t=1}^n {Y_t − a − b(X_t − x)}^2 K((X_t − x)/h).   (21)

If we assume that var(ε) < ∞, under some regularity conditions [see, e.g., Fan and Gijbels (1996)], the local linear least squares estimator is asymptotically normal:

    m̂_LS(x) − m(x) ≈ N( (1/2) m″(x) µ_K h^2, φ_K σ^2(x) var(ε) / (nh p_X(x)) ).

When the ε_i are Gaussian, local linear least squares estimation corresponds to the weighted likelihood criterion. In the absence of Gaussianity, the asymptotic results for the above estimator generally still hold, but this estimator is less efficient in terms of mean-squared error than estimators that exploit the distributional information. In the presence of heavy-tailed data, local median (or, more generally, local τ-th quantile) regression has been proposed as a robust method for estimating m(x); see, e.g., Yu and Jones (1998):

    (m̂_LAD(x), b̂) = argmin_{a,b} Σ_{t=1}^n |Y_t − a − b(X_t − x)| K((X_t − x)/h).   (22)

Under Assumption 6, we have

    m̂_LAD(x) − m(x) ≈ N( (1/2) m″(x) µ_K h^2, φ_K σ^2(x) / (nh p_X(x)) · 1/(4f^2(0)) ).

If the error density f were known, we could replace {Y_t − a − b(X_t − x)}^2 in (21) by the log-likelihood log f(Y_t − a − b(X_t − x)) and obtain a likelihood-based estimator, denoted
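The local linear least squares fit in (21) reduces, at each point x, to a kernel-weighted two-parameter least squares problem whose intercept estimates m(x). A minimal sketch (our own; we use a Gaussian kernel for simplicity, although Assumption 6 formally asks for bounded support):

```python
import math

# Local linear least squares smoother in the spirit of (21): solve the
# kernel-weighted normal equations for Y ~ a + b*(X - x); a estimates m(x).
def local_linear(x, X, Y, h):
    w = [math.exp(-0.5 * ((xt - x) / h) ** 2) for xt in X]
    s0 = sum(w)
    s1 = sum(wi * (xt - x) for wi, xt in zip(w, X))
    s2 = sum(wi * (xt - x) ** 2 for wi, xt in zip(w, X))
    t0 = sum(wi * yt for wi, yt in zip(w, Y))
    t1 = sum(wi * (xt - x) * yt for wi, xt, yt in zip(w, X, Y))
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det   # local intercept = estimate of m(x)

# Local linear fitting is exact for linear m: with noiseless Y = 2X, the
# estimate at x = 0.5 recovers m(0.5) = 1 up to rounding error.
X = [i / 50 for i in range(51)]
Y = [2 * xi for xi in X]
print(local_linear(0.5, X, Y, h=0.1))
```

The exactness on linear functions is the reason local linear (rather than local constant) smoothing has no design bias at this order.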
by m̂_MLE(x); see, e.g., Fan, Farmen and Gijbels (1998). Under some regularity conditions,

    m̂_MLE(x) − m(x) ≈ N( (1/2) m″(x) µ_K h^2, φ_K σ^2(x) I(f)^{−1} / (nh p_X(x)) ),   (23)

where I(f) is the Fisher information of the error distribution. The local likelihood estimator has lower variance than the least squares or least-absolute-deviation based local linear estimators, as follows from the classical Cramér-Rao inequality. In practice, f is unknown and the local likelihood estimator is infeasible.

In this section, we propose a quantile average nonparametric estimator for m(x) and show that, as the number of quantiles k → ∞, the asymptotic variance of the proposed estimator converges to the asymptotic variance of the local likelihood procedure. This is true regardless of the dimensionality of the regressors. In this sense, our estimator is efficient.¹ In many applications, the error distribution is likely to be non-normal, and perhaps far enough from normal that the local likelihood estimator is much more efficient than the least squares estimator. As a modeling issue, it is hard to believe that we have better information about the error density than about the shape of the main regression effect, so it is quite natural to treat the error density as an unknown parameter. Thus our results are comforting in that they show that we can still use information from the error distribution to improve the performance of nonparametric estimates.

4.1 The Weighted Quantile Averaging Nonparametric Estimator

Let Q(τ) be the τ-th quantile of ε, and Q_{Y|X}(τ|x) the conditional τ-th quantile of Y given X = x; then the nonparametric regression model (20) has the quantile regression representation Q_{Y|X}(τ|x) = m(x) + σ(x)Q(τ). Denote Q_{Y|X}(τ|x) by m(τ, x). Consider the local linear nonparametric quantile regression

    (m̂(τ, x), b̂) = argmin_{a,b} Σ_{t=1}^n ρ_τ{Y_t − a − b(X_t − x)} K((X_t − x)/h).   (24)

The solution for a in the above optimization, m̂(τ, x), provides an estimate of Q_{Y|X}(τ|x).
¹This is not to say that our method has any special properties with regard to minimax risk; as is well known, it is not possible to achieve the lower bound here; see Fan (1993).
We construct the following weighted quantile averaging nonparametric (WQAN) estimator for m(x) based on a weighted average of m̂(τ_i, x) over τ_i, i = 1, ..., k:

    m̂_WQAN(x) = Σ_{i=1}^k ϖ_i m̂(τ_i, x), subject to Σ_{i=1}^k ϖ_i = 1 and ϖ_i = ϖ_{k+1−i}, i = 1, ..., k.   (25)

Theorem 7. Let S(ϖ) be defined as in Theorem 4(i). Under Assumption 6, as n → ∞,

    m̂_WQAN(x) − m(x) ≈ N( (1/2) m″(x) µ_K h^2, φ_K σ^2(x) S(ϖ) / (nh p_X(x)) ).   (26)

If we simply use equal weights, the resulting unweighted quantile average nonparametric estimator has the asymptotic normality in Theorem 7 with S(ϖ) replaced by

    R_k = (1/k^2) Σ_{i=1}^k Σ_{j=1}^k [min(τ_i, τ_j) − τ_i τ_j] / [f(Q(τ_i)) f(Q(τ_j))].   (27)

This is the same as the limiting distribution of the nonparametric CQR estimator of Kai, Li and Zou (2010). Assuming that var(ε) < ∞, by a result of Kai, Li and Zou (2010),

    lim_{k→∞} R_k = ∫_0^1 ∫_0^1 [min(r, s) − rs] / [f(Q(r)) f(Q(s))] dr ds = var(ε),

so the unweighted quantile averaging nonparametric regression estimator is asymptotically equivalent to the local least squares estimator! This result indicates that if we use a simple average over τ, even as we use more and more quantiles, there is no efficiency gain from combining quantile information; simple averaging does not work when it is most needed. See the simulation studies in Section 7.2. Proper weighting over different quantiles is very important in the nonparametric case.

Next we show that a large efficiency gain can be obtained by optimal weighting in the WQAN. Since the asymptotic bias of m̂_WQAN(x) does not depend on the weights by Theorem 7, it is natural to find the optimal weights in m̂_WQAN(x) that minimize the asymptotic variance subject to the conditions on ϖ in (25). Notice that the limiting variance of the WQAN estimator depends on ϖ through S(ϖ), the same quantity as in Theorem 4. In addition, under Assumption 6(i), the optimal weights ϖ* in Theorem 4 satisfy the condition ϖ*_i = ϖ*_{k+1−i}. Therefore, the optimal WQAN estimator of m(x) is given by

    m̂*_WQAN(x) = (m̂(τ_1, x), ..., m̂(τ_k, x)) ϖ* = (m̂(τ_1, x), ..., m̂(τ_k, x)) H^{−1} e_k / (e_k^T H^{−1} e_k).   (28)
By Theorems 7 and 4, we have the following result.

Theorem 8. Recall Λ_k in (2). Under Assumption 6, as n → ∞,

\[ \widehat m^*_{\mathrm{WQAN}}(x) - m(x) \Rightarrow N\Big( \tfrac12 m''(x)\mu_K h^2,\ \frac{\phi_K \sigma^2(x)}{nh\, p_X(x)}\, \Lambda_k^{-1} \Big). \]

By Theorem 1, under the additional Assumption 2, the optimal WQAN estimator m̂*_WQAN(x) achieves asymptotic efficiency in the sense that its asymptotic bias and variance are the same as those of the likelihood-based estimator.

4.2 Comparison of Asymptotic Efficiency

In this section we compare the proposed optimal WQAN estimator m̂*_WQAN(x) in (28) with other nonparametric regression estimators: the local linear least squares estimator m̂_LS(x) in (21), the local linear least-absolute-deviation estimator m̂_LAD(x) in (22), and the local linear CQR estimator m̂_CQR(x) of Kai, Li and Zou (2010). All four estimators have the same asymptotic bias m''(x)μ_K h²/2 and different variances. In Theorem 9 we show that m̂*_WQAN(x) is more efficient than the local CQR estimator m̂_CQR(x).

Theorem 9. Under Assumptions 2 and 6, the optimal WQAN estimator m̂*_WQAN(x) is asymptotically more efficient than the nonparametric CQR estimator m̂_CQR(x) for any non-normal distribution; for the normal distribution, the two are asymptotically equivalent.

It is easy to see that the proposed optimal WQAN estimator m̂*_WQAN(x) is more efficient than the local OLS and LAD estimators. Thus we conclude that the proposed optimal WQAN estimator is asymptotically more efficient than all three of these estimators.

Next we calculate asymptotic relative efficiencies of these nonparametric methods. All four estimators have asymptotic normality of the following form, with different values of s²:

\[ \widehat m(x) - m(x) \Rightarrow N\Big( \tfrac12 m''(x)\mu_K h^2,\ \frac{\phi_K \sigma^2(x)}{nh\, p_X(x)}\, s^2 \Big). \]

We define the asymptotic mean-squared error (AMSE)

\[ \mathrm{AMSE}\{\widehat m(x)\} = \Big\{ \tfrac12 m''(x)\mu_K h^2 \Big\}^2 + \frac{\phi_K \sigma^2(x)}{nh\, p_X(x)}\, s^2. \]
Minimizing the AMSE, we obtain the optimal bandwidth

\[ h^* = \operatorname*{argmin}_h \mathrm{AMSE}\{\widehat m(x)\} = \{m''(x)\mu_K\}^{-2/5} \Big\{ \frac{\phi_K \sigma^2(x)\, s^2}{p_X(x)} \Big\}^{1/5} n^{-1/5}, \tag{29} \]

and the associated optimal AMSE evaluated at the optimal bandwidth h*,

\[ \mathrm{AMSE}^*\{\widehat m(x)\} = \frac54 \{m''(x)\mu_K\}^{2/5} \Big\{ \frac{\phi_K \sigma^2(x)\, s^2}{p_X(x)} \Big\}^{4/5} n^{-4/5}, \tag{30} \]

which is proportional to (s²)^{4/5}. Therefore, results similar to those in Section 3.3 also apply here.

Remark. Our analysis also extends to the multivariate case where X is d-dimensional; in that case the variance component is of order (nh^d)^{-1}.

4.3 Estimating f(Q(τ))

As in Section 3.5, we need to estimate the unknown quantity f(Q(τ)). Notice that the optimal weights ϖ* are invariant under the transformation aε of the error for any a > 0, and hence we can assume without loss of generality that |ε| has median one. Then the conditional median of |Y − m(X)| given X is σ(X), and therefore we can apply local median quantile regression to estimate σ(·). We propose the following procedure.

(i) Use (25) with uniform weights to obtain a preliminary estimator m̃(·).

(ii) Compute |Y_i − m̃(X_i)| and estimate σ(·) by local linear median quantile regression:

\[ (\widehat\sigma(x), \widehat b) = \operatorname*{argmin}_{\sigma, b} \sum_{i=1}^n \rho_{0.5}\big\{ |Y_i - \widetilde m(X_i)| - \sigma - b(X_i - x) \big\}\, K\Big( \frac{X_i - x}{l} \Big). \tag{31} \]

(iii) Compute the residuals ε̂_i = [Y_i − m̃(X_i)]/σ̂(X_i) and use (19) to obtain the estimate f̂(u).

(iv) Denote the sample τ-quantile of ε̂_1,…,ε̂_n by Q̂(τ), obtain the estimate f̂(Q̂(τ)) of f(Q(τ)), use (16) to compute ϖ̂_1,…,ϖ̂_k, and finally obtain the symmetrized weights ϖ̂_j = (ϖ̂_j + ϖ̂_{k+1−j})/2, j = 1,…,k.

In (31), following Yu and Jones (1998), we use l = l_LS (π/2)^{1/5}, where l_LS is the plug-in bandwidth (Ruppert, Sheather and Wand, 1995) for local linear least squares regression based on the data (X_i, {Y_i − m̃(X_i)}²), i = 1,…,n.
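Steps (iii) and (iv) above can be sketched in a stripped-down form. The sketch below is a hypothetical illustration, not the paper's implementation: it skips the regression steps and works directly with simulated Laplace errors, uses a Gaussian kernel density estimate with a Silverman-type rule-of-thumb bandwidth for f̂, and computes the weights through the tridiagonal form of Γ^{−1} given in the proofs (so that ϖ̂_i ∝ q̂_i(2q̂_i − q̂_{i−1} − q̂_{i+1}) with q̂_i = f̂(Q̂(τ_i))):

```python
import math
import random

def laplace_sample(n, rng):
    # Standard Laplace draws via inverse-CDF.
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(math.log(2 * u) if u < 0.5 else -math.log(2 * (1 - u)))
    return out

def kde(u, data, b):
    # Gaussian kernel density estimate f_hat(u) with bandwidth b.
    c = 1.0 / (len(data) * b * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((u - e) / b) ** 2) for e in data)

rng = random.Random(0)
eps = laplace_sample(2000, rng)           # stand-in for the residuals of step (iii)
n = len(eps)
mu = sum(eps) / n
sd = math.sqrt(sum((e - mu) ** 2 for e in eps) / (n - 1))
b = 1.06 * sd * n ** (-0.2)               # Silverman-type rule-of-thumb bandwidth

k = 9
taus = [i / (k + 1) for i in range(1, k + 1)]
srt = sorted(eps)
qhat = [srt[int(t * n)] for t in taus]    # crude sample tau-quantiles, step (iv)
q = [kde(u, eps, b) for u in qhat]        # estimates of f(Q(tau_i))

# Weights proportional to q_i (2 q_i - q_{i-1} - q_{i+1}), with q_0 = q_{k+1} = 0.
qq = [0.0] + q + [0.0]
raw = [qq[i] * (2 * qq[i] - qq[i - 1] - qq[i + 1]) for i in range(1, k + 1)]
tot = sum(raw)
w = [r / tot for r in raw]
w_sym = [(w[j] + w[k - 1 - j]) / 2 for j in range(k)]   # symmetrization step
print([round(x, 3) for x in w_sym])
```

For Laplace errors the population-optimal weights concentrate at the median (LAD is the MLE in that case), so the estimated weights should put most of their mass near τ = 0.5.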
5 Super-efficient Estimation via Quantile Regression

Under Assumptions 1, 2 and 3, asymptotically efficient estimators of β in model (6) exist: they are root-n consistent and asymptotically normal, with limiting covariance matrix attaining the Fisher information lower bound. Similarly, under Assumptions 6 and 2, the local likelihood estimator of m(x) is \(\sqrt{nh}\)-consistent and asymptotically normal, and its limiting covariance matrix equals the Fisher information bound. These cases are usually referred to as regular statistical estimation. The corresponding regularity conditions are not mathematical trivialities (for ease of proof) but are real restrictions needed to obtain the efficiency results. When those usual regularity conditions do not hold, the previously discussed efficiency results may fail, and results different from those of likelihood-based estimation may be found. For example, different rates of convergence may arise, and it is well known that, in general, asymptotically efficient estimators do not exist. These unusual cases are sometimes called non-regular statistical estimation.

In this section, we briefly discuss the case of non-regular estimation. We show that, in this case, by using quantile regression with optimal weighting, super-efficient estimators may be obtained in the sense that the limiting variance is smaller than the conventional Cramér–Rao bound, or even zero.

Assumption 2 guarantees that the density f vanishes at the boundary at a sufficiently fast rate so that the properties of regular estimation hold. If this assumption does not hold, conventional estimators such as the OLS or the MLE may still be root-n consistent and asymptotically normal. However, we show in Theorem 10 below that the optimally weighted quantile averaging estimators have smaller variance than the conventional lower bound, and one may even estimate β (or m(x)) at a rate better than root-n (or \(\sqrt{nh}\)).

Theorem 10.
Let g(τ = f(q(τ, and Λ k be defined by (2, and assume g 2 (τ + g 2 (1 τ lim τ 0 τ = c. (32 (i If 0 < c < and lim τ 0 τ 2 1 τ τ g (t 2 dt = 0, then lim k Λ k = [c + I(f] 1. (ii If c =, then lim k Λ k = 0. Assumption 2 covers the regular case corresponding to c = 0 in (32. Theorem 10 indicates that, for the non-regular case c > 0 in (32, under appropriate regularity condi- 19
20 tions, for large k, the variances of the (standardized optimally weighted quantile regression based estimators are smaller than the Cramér-Rao bound. In particular, if c =, as the number of quantiles k increases, the variance of the weighted quantile regression based estimators approaches zero. In this sense, the optimal quantile regression based estimators are super-efficient. Corollary 2 below concerns a special case that the density f is positive at the boundary. Corollary 2. Denote the support of f by D, then lim k Λ k = 0 in any of the following three cases: (i D = [D 1, D 2 ] with f(d 1 + f(d 2 > 0; (ii D = [D 1, with f(d 1 > 0; or (iii D = (, D 2 ] with f(d 2 > 0. Example 4. For the truncated version of many commonly used distributions, we have lim k Λ k = 0. For example, for truncated normal on [ 1, 1], Corollary 2 (i applies. For uniform distribution on [0, 1], we can show Λ k = 1/(2k Example 5. Let ε have Beta distribution with parameters α 1, α 2 > 0. For α 1 = α 2 = 1 (uniform distribution, ϖ1 = ϖk = 0.5 and ϖ 2 = = ϖk 1 = 0, which is not surprising: the sufficient statistic for independent uniform samples on [θ 1, θ 2 ] is the two extreme observations. Also, ARE(CQR = (k +1(k +2/(6k, so the CQR with uniform weights is substantially inferior to the optimal WQAE for large k. 6 Extensions Our proposed methodology provides a general framework for constructing efficient estimator under a broad variety of settings. Here we briefly discuss some general ideas and mention some possible directions. Different types of models may involve different assumptions and have different results - depending on the model and the subject of interest. We wish to explore these in more detail in future research. In general, one may consider efficient estimation of a statistical model that depends on {Y t, X t, α, β}. 
Suppose that the model has a quantile-domain representation, say,

\[ Q_{Y_t}(\tau \mid X_t) = g(X_t, \alpha(\tau), \beta), \]

where α(τ) are local unknown parameters that are related to α and vary over quantiles, and β are global parameters that remain the same across all quantiles. We are
interested in more efficient estimation of the parameters. Under appropriate regularity conditions, the quantile regression estimators θ̂(τ) = (α̂(τ), β̂(τ)) are consistent estimators of θ(τ) = (α(τ), β) and have a Bahadur representation, say,

\[ \widehat\theta(\tau) = \theta(\tau) + \frac{C}{m f(Q(\tau))} \sum_{t=1}^n Z_t \big[ \tau - 1_{\{\varepsilon_t < Q(\tau)\}} \big] + \text{negligible error}, \]

where m = m(n), C is a p × p positive definite matrix that does not depend on τ, Z_1,…,Z_n ∈ R^p are IID p-dimensional column random vectors, and Q(τ) is the τ-th quantile of ε_t. An efficient estimator for β based on combined quantile regressions can be constructed in a way similar to those discussed in this paper. Under additional assumptions on α(τ), say that α(τ) = α + cQ(τ) is a quantile shift parameter and the error distribution is symmetric, an estimator for α based on combined quantile regressions can also be constructed. We briefly discuss a few possible directions below.

(i) Efficient estimation of derivatives. For model (20), our proposed methodology can also be used to construct efficient estimators for m'(x) by considering linear combinations of b̂ in (24) over different quantiles.

(ii) Efficient estimation of varying-coefficient models. Consider

\[ Y_t = m_1(U_t) X_{t,1} + \cdots + m_p(U_t) X_{t,p} + \sigma(U_t)\, \varepsilon_t, \tag{33} \]

where m_1(·),…,m_p(·) are the nonparametric functional coefficients of interest. Denote by Q_{Y|U}(τ|u) the conditional τ-th quantile of Y given U = u, and by Q(τ) the τ-th quantile of ε. Then (33) has the quantile representation

\[ Q_{Y|U}(\tau \mid u) = m_1(u) X_{t,1} + \cdots + m_p(u) X_{t,p} + \sigma(u) Q(\tau), \]

which can be estimated by a local linear quantile regression similar to (24). Note that m_1(u),…,m_p(u) are global parameters and σ(u)Q(τ) are local parameters, so our methodology can be applied here.

(iii) Efficient estimation of the conditional variance function. Consider Y = σ(X)ε. Then Y² = σ²(X)ε², leading to the quantile representation Q_{Y²|X}(τ|x) = σ²(x) Q_{ε²}(τ). To ensure identifiability, assume σ(0) = 1.
Then Q_{Y²|X}(τ|x)/Q_{Y²|X}(τ|0) = σ²(x) for all τ ∈ (0, 1), and an efficient estimator of σ²(·) can be constructed by combining quantiles.
(iv) Efficient estimation with dependent data. Our asymptotic theory is developed for independent data for technical convenience, but it can be extended to dependent data, such as time series or even longitudinal data, under appropriate mixing conditions.

7 Monte Carlo Studies

7.1 Linear regression model

In (6), let α = 0, β = 1, and let the covariate X ∼ N(0,1). For ε, we consider the normal distribution, Student-t distributions with one (t₁) and two (t₂) degrees of freedom, the two normal mixture distributions in Example 2, the Laplace distribution, Beta distributions Beta(1, b), b = 0.5, 1, 2, 3, and Gamma distributions Gamma(γ), γ = 0.5, 1, 2, 3. Throughout this section we consider uniformly spaced quantiles τ_i = i/(k+1), i = 1,…,k, with k = 9.

We compare six estimation methods. OLS: the ordinary least squares estimator; LAD: the median quantile estimator with τ = 0.5 in (12); QAU, QAO, QAE: the quantile average estimator in (14) with uniform weights, the theoretical optimal weights, and the estimated optimal weights (cf. Section 3.5), respectively; CQR: Zou and Yuan (2008)'s CQR estimator. We use QAE as a benchmark against which the five other methods are compared based on the empirical relative efficiency

\[ \mathrm{RE(Method)} = \frac{\mathrm{MSE(Method)}}{\mathrm{MSE(QAE)}}, \quad \text{with } \mathrm{MSE} = \sum_{j} \big[ \widehat\beta(j) - \beta \big]^2 \text{ over the realizations}, \]

where "Method" stands for any of the other five estimators, and β̂(j) is the estimate from the j-th realization. A value of RE larger than one indicates better performance of QAE. The results are summarized in Table 2 for two sample sizes, n = 100 and 300.

For the N(0,1), Student-t₂ and Laplace distributions, QAE and CQR are comparable; for all other distributions, QAE significantly outperforms CQR. Also, QAE outperforms OLS for all non-normal distributions, whereas the two are comparable for N(0,1). For the larger sample size, the superior performance of QAE is even more remarkable, which agrees with our theoretical findings.
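The relative-efficiency computation above can be illustrated in a stripped-down setting. The sketch below is hypothetical and much simpler than the paper's experiment: it compares only the sample mean (the OLS estimator of a location parameter) with the sample median (the LAD estimator) under standard Laplace errors, where the theoretical asymptotic relative efficiency of the mean to the median is 2:

```python
import math
import random

def laplace(rng):
    # One standard Laplace draw via inverse-CDF.
    u = rng.random()
    return math.log(2 * u) if u < 0.5 else -math.log(2 * (1 - u))

def relative_efficiency(n=100, reps=1000, seed=1):
    # RE = MSE(mean) / MSE(median) for estimating the true location 0.
    rng = random.Random(seed)
    mse_mean = mse_med = 0.0
    for _ in range(reps):
        x = sorted(laplace(rng) for _ in range(n))
        mse_mean += (sum(x) / n) ** 2
        mse_med += (0.5 * (x[n // 2 - 1] + x[n // 2])) ** 2
    return mse_mean / mse_med

re_hat = relative_efficiency()
print(round(re_hat, 2))   # close to the theoretical ARE of 2
```

An empirical RE above one means the median (the heavier-weighted quantile) beats the mean, the same logic by which Table 2 ranks the six estimators.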
For almost all cases considered, QAE substantially outperforms the LAD estimator, and the relative efficiency can be as high as 2845%. It is worth pointing out that, for the Beta and
Gamma distributions, the relative efficiencies are much higher than for the other distributions considered, owing to the super-efficiency phenomenon in Theorem 10. We conclude that the proposed optimal QAE offers a more efficient alternative to existing methods.

Insert Table 2 about here.

7.2 Nonparametric regression model

In our data analysis, we use the standard Gaussian kernel for K(·). We now address the bandwidth selection issue. By (29), the optimal bandwidth h* is proportional to (s²)^{1/5}. Denote by h_LS and h_WQAN the bandwidths for the least squares estimator and the proposed optimal WQAN estimator, respectively. Then h_WQAN = h_LS {Λ_k^{−1}/var(ε)}^{1/5}, where Λ_k is defined in (2). In practice, to select h_LS, we can use the plug-in bandwidth selector of Ruppert, Sheather and Wand (1995), implemented in the command dpill of the R package KernSmooth. We can then select h_WQAN by plugging in estimates of Λ_k and var(ε) using the methods in Section 4.3, but for the purpose of comparison we use their true values in our simulation studies. Similarly, we choose the optimal bandwidths for the other two estimators. Kai, Li and Zou (2010) adopted the same strategy. In step (i) of Section 4.3, for the preliminary estimator m̃(·), we use the plug-in bandwidth selector h_LS.

We compare the empirical performance of the four methods listed at the beginning of Section 4.2. With 1000 realizations, we use the least squares estimator m̂_LS as the benchmark against which the other three methods are compared based on the relative efficiency

\[ \mathrm{RE}(\widehat m) = \frac{\mathrm{MISE}(\widehat m_{LS})}{\mathrm{MISE}(\widehat m)}, \quad \text{with } \mathrm{MISE}(\widehat m) = \frac{1}{1000} \sum_{r=1}^{1000} \int_{l_1}^{l_2} \{ \widehat m_r(x) - m(x) \}^2\, dx, \]

where m̂_r is the estimate from the r-th realization, and [l_1, l_2] is the interval over which m is estimated. A relative efficiency greater than one indicates that m̂ outperforms the LS estimator. To facilitate computation, the integral is approximated using 20 grid points. Consider n = 200 samples from the model

Model I : Y = sin(2X) + 2 exp(−16X²) + 0.5ε,  X ∼ Unif[−1.6, 1.6].
(34)

The same model was also used in Kai, Li and Zou (2010), with the normal design X ∼ N(0,1); here we use a uniform design to avoid some computational issues. For ε, we consider
nine symmetric distributions: N(0,1), the truncated normal on [−1,1], the truncated Cauchy on [−10,10], the truncated Cauchy on [−1,1], Student-t with 3 degrees of freedom (t₃), the standard Laplace distribution, the uniform distribution on [−0.5, 0.5], and two normal mixtures, 0.5N(2,1) + 0.5N(−2,1) and 0.95N(0,1) + 0.05N(0,9). The first normal mixture can be used to model a two-cluster population, whereas the second can be viewed as a noise contamination model. Let [l_1, l_2] = [−1.5, 1.5]. The relative efficiencies of the four methods, with m̂_LS(x) as the benchmark, are summarized in Table 3 (a).

Overall, the proposed optimal WQAN either significantly outperforms or is comparable to the other three methods. For example, for N(0,1) and 0.95N(0,1) + 0.05N(0,9), WQAN is comparable to LS; for the other distributions, WQAN has about a 20% efficiency gain over LS in most cases and more than a 60% efficiency gain for 0.5N(2,1) + 0.5N(−2,1). When compared with CQR, WQAN outperforms CQR for all but four distributions: N(0,1), Student-t₃, the Laplace distribution, and 0.95N(0,1) + 0.05N(0,9), for which the two are comparable. While WQAN underperforms LAD for the truncated Cauchy on [−10,10], it has substantial efficiency gains for N(0,1), the truncated N(0,1) on [−1,1], the truncated Cauchy on [−1,1], the uniform on [−0.5,0.5], and 0.5N(2,1) + 0.5N(−2,1).

The empirical performance of the proposed method in Table 3 (a) is not as impressive as its theoretical performance in (30). For example, for the truncated N(0,1) on [−1,1], the theoretical AREs according to (30) are 1, 0.48, 0.86, 0.93, 0.95, 1.13, 1.69, 2.17, compared with 1, 0.54, 0.93, 0.98, 0.99, 1.14, 1.25, 1.26 in Table 3 (a). To explain this phenomenon, note that the plot (not included here) of the function m(x) = sin(2x) + 2 exp(−16x²) exhibits large curvature and sharp changes on [−0.5, 0.5], so a large estimation bias can easily offset the asymptotic efficiency improvements, especially at a moderate sample size.
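The curvature claim is easy to check numerically. In the sketch below (the function m is from Model I; the finite-difference helper m2 is illustrative), m''(0) = −4 sin(0) + 2(1024·0² − 32)e⁰ = −64, so the squared-bias term in the AMSE is large on [−0.5, 0.5]:

```python
import math

def m(x):
    # Regression function from Model I: m(x) = sin(2x) + 2 exp(-16 x^2).
    return math.sin(2 * x) + 2 * math.exp(-16 * x ** 2)

def m2(x, h=1e-4):
    # Central finite-difference approximation to m''(x).
    return (m(x + h) - 2 * m(x) + m(x - h)) / h ** 2

print(round(m2(0.0), 2))   # → -64.0
```

A second derivative of magnitude 64 at the peak explains why, at n = 200, the bias contribution m''(x)μ_K h²/2 can swamp the variance gains that (30) promises asymptotically.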
To appreciate this, we use the same X and ε as in (34) and consider the model

Model II : Y = 1.8X + 0.5ε. (35)

Then the bias term m''(x)μ_K h²/2 = 0 vanishes and the variance plays the dominating role. For all four estimation methods, we use the same bandwidth: the plug-in bandwidth selector for local linear regression. We summarize the relative efficiencies in Table 3 (b). The overall pattern of the empirical relative efficiencies is consistent with that of the theoretical ones in (30), and the proposed WQAN significantly outperforms the other methods for almost all distributions considered. Also, using more quantiles (k = 29) significantly
improves the performance of WQAN for the truncated N(0,1) on [−1,1], the truncated Cauchy on [−1,1] and the uniform on [−0.5,0.5], a property not shared by the CQR method. In summary, for most non-normal distributions considered, the proposed method can deliver substantial efficiency improvements over the other methods, and its empirical performance is consistent with our asymptotic theory.

Insert Table 3 about here.

8 Proofs

The following Lemma 1 is needed to prove Theorem 1.

Lemma 1. Let h(·) be any differentiable function on the interval [a, b]. Then

\[ \Big[ \int_a^b h(x)\, dx \Big]^2 \ge (b-a) \int_a^b h^2(x)\, dx - \frac{(b-a)^3}{2} \int_a^b [h'(x)]^2\, dx. \tag{36} \]

Proof. It is easy to verify the identity

\[ V := (b-a) \int_a^b h^2(x)\, dx - \Big[ \int_a^b h(x)\, dx \Big]^2 = \frac12 \int_a^b \int_a^b [h(x) - h(y)]^2\, dx\, dy. \tag{37} \]

For all x, y ∈ [a, b], we have \(|h(x) - h(y)| = |\int_y^x h'(u)\, du| \le \int_a^b |h'(u)|\, du\), uniformly. Thus, by the Cauchy–Schwarz inequality,

\[ \max_{x, y \in [a,b]} |h(x) - h(y)|^2 \le \Big[ \int_a^b |h'(u)|\, du \Big]^2 \le (b-a) \int_a^b [h'(u)]^2\, du. \]

Applying this inequality to (37), we obtain

\[ V \le \frac{(b-a)^2}{2} \max_{x, y \in [a,b]} |h(x) - h(y)|^2 \le \frac{(b-a)^3}{2} \int_a^b [h'(u)]^2\, du, \]

completing the proof.

Proof of Theorem 1. Recall Γ = [min(τ_i, τ_j) − τ_i τ_j]_{1≤i,j≤k}, τ_i = i/(k+1). One can verify that Γ^{−1} is the k × k tridiagonal matrix (38)
with 2(k+1) on the diagonal, −(k+1) on the super-/sub-diagonals, and 0 elsewhere. Define the k × k diagonal matrix

\[ P = \mathrm{diag}\{q_1, \ldots, q_k\} \quad \text{with } q_i = f(Q(\tau_i)). \tag{39} \]

We have H = P^{−1} Γ P^{−1} and H^{−1} = P Γ^{−1} P. Using (38), we obtain

\[ \Lambda_k = e_k^T H^{-1} e_k = e_k^T P \Gamma^{-1} P e_k = (k+1) \Big[ q_1^2 + q_k^2 + \sum_{i=2}^k (q_i - q_{i-1})^2 \Big]. \tag{40} \]

Recall g(τ) = f(Q(τ)) in Assumption 2. Define u = Q(τ). Then ∂τ/∂u = ∂F(u)/∂u = f(u), and by the chain rule, g'(τ) = (∂g/∂u)(∂u/∂τ) = f'(u)/f(u). Thus,

\[ \int_0^1 [g'(\tau)]^2\, d\tau = \int_{\mathbb R} \Big[ \frac{f'(u)}{f(u)} \Big]^2 f(u)\, du = I(f). \]

Applying the chain rule again,

\[ g''(\tau) = \frac{\partial \{f'(u)/f(u)\}}{\partial u} \frac{\partial u}{\partial \tau} = \frac{f''(u) f(u) - f'(u)^2}{f^3(u)}. \tag{41} \]

Let Δ = 1/(k+1). In (40), by Assumption 2, (k+1)(q_1² + q_k²) → 0. Define

\[ W_k = (k+1) \sum_{i=2}^k (q_i - q_{i-1})^2 - \int_{\tau_1}^{\tau_k} [g'(t)]^2\, dt. \]

To prove Λ_k → I(f), it suffices to show W_k → 0. Since τ_1 = Δ, τ_k = 1 − Δ, and τ_i − τ_{i−1} = Δ, we can rewrite W_k as

\[ W_k = (k+1) \sum_{i=2}^k \Big\{ \Big[ \int_{\tau_{i-1}}^{\tau_i} g'(t)\, dt \Big]^2 - (\tau_i - \tau_{i-1}) \int_{\tau_{i-1}}^{\tau_i} [g'(t)]^2\, dt \Big\}. \]

Applying Lemma 1 and Assumption 2, we get

\[ W_k \le \frac{\Delta^2}{2} \sum_{i=2}^k \int_{\tau_{i-1}}^{\tau_i} [g''(t)]^2\, dt = \frac{\Delta^2}{2} \int_{\tau_1}^{\tau_k} [g''(t)]^2\, dt \to 0, \]

completing the proof.

Proof of Proposition 1. First, we prove that (3) implies Assumption 2. Let ϵ > 0 be any given number. By the condition \(\lim_{\tau \downarrow 0} [\,|g'(\tau)| + |g'(1-\tau)|\,] \tau^{3/2} = 0\), there exists
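The matrix identities (38) and (40) used in this proof can be checked numerically. The following sketch (hypothetical variable names) verifies that the claimed tridiagonal matrix inverts Γ, and that for uniform errors (q_i ≡ 1) summing all entries of Γ^{−1} gives Λ_k = 2(k+1), matching Λ_k^{−1} = 1/(2k+2) in Example 4:

```python
k = 9
taus = [i / (k + 1) for i in range(1, k + 1)]

# Gamma_{ij} = min(tau_i, tau_j) - tau_i * tau_j
G = [[min(ti, tj) - ti * tj for tj in taus] for ti in taus]

# Claimed inverse (38): 2(k+1) on the diagonal, -(k+1) on the off-diagonals.
T = [[(k + 1) * (2 if i == j else -1 if abs(i - j) == 1 else 0)
      for j in range(k)] for i in range(k)]

# Check that G @ T is the identity matrix.
prod = [[sum(G[i][l] * T[l][j] for l in range(k)) for j in range(k)]
        for i in range(k)]
ok = all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-9
         for i in range(k) for j in range(k))
print(ok)                   # → True

# Uniform errors: q_i = f(Q(tau_i)) = 1, so (40) reduces to
# Lambda_k = sum of all entries of Gamma^{-1} = 2(k+1).
lam = sum(sum(row) for row in T)
print(lam == 2 * (k + 1))   # → True
```

The interior rows of Γ^{−1} sum to zero and the two boundary rows each sum to (k+1), which is exactly why only the boundary densities q_1, q_k and the increments q_i − q_{i−1} survive in (40).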
More informationQuantile Regression for Panel/Longitudinal Data
Quantile Regression for Panel/Longitudinal Data Roger Koenker University of Illinois, Urbana-Champaign University of Minho 12-14 June 2017 y it 0 5 10 15 20 25 i = 1 i = 2 i = 3 0 2 4 6 8 Roger Koenker
More information,..., θ(2),..., θ(n)
Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationKernel Density Based Linear Regression Estimate
Kernel Density Based Linear Regression Estimate Weixin Yao and Zibiao Zao Abstract For linear regression models wit non-normally distributed errors, te least squares estimate (LSE will lose some efficiency
More informationRobust and Efficient Regression
Clemson University TigerPrints All Dissertations Dissertations 5-203 Robust and Efficient Regression Qi Zheng Clemson University, qiz@clemson.edu Follow this and additional works at: https://tigerprints.clemson.edu/all_dissertations
More informationINFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction
INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental
More informationEstimation de mesures de risques à partir des L p -quantiles
1/ 42 Estimation de mesures de risques à partir des L p -quantiles extrêmes Stéphane GIRARD (Inria Grenoble Rhône-Alpes) collaboration avec Abdelaati DAOUIA (Toulouse School of Economics), & Gilles STUPFLER
More informationModel-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego
Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationWeb Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.
Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we
More informationEconometrics I, Estimation
Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the
More informationFunction of Longitudinal Data
New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li Abstract This paper develops a new estimation of nonparametric regression functions for
More informationA Few Notes on Fisher Information (WIP)
A Few Notes on Fisher Information (WIP) David Meyer dmm@{-4-5.net,uoregon.edu} Last update: April 30, 208 Definitions There are so many interesting things about Fisher Information and its theoretical properties
More informationOn Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness
Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu
More informationA Bootstrap Test for Conditional Symmetry
ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School
More informationBayesian estimation of bandwidths for a nonparametric regression model with a flexible error density
ISSN 1440-771X Australia Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Bayesian estimation of bandwidths for a nonparametric regression model
More informationCOMPUTER-AIDED MODELING AND SIMULATION OF ELECTRICAL CIRCUITS WITH α-stable NOISE
APPLICATIONES MATHEMATICAE 23,1(1995), pp. 83 93 A. WERON(Wroc law) COMPUTER-AIDED MODELING AND SIMULATION OF ELECTRICAL CIRCUITS WITH α-stable NOISE Abstract. The aim of this paper is to demonstrate how
More informationEfficient and Robust Scale Estimation
Efficient and Robust Scale Estimation Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline Introduction and motivation The robust scale estimator
More informationTobit and Interval Censored Regression Model
Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 2, Number (206), pp. 98-994 Research India Publications http://www.ripublication.com Tobit and Interval Censored Regression Model Raidani
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationBayesian estimation of the discrepancy with misspecified parametric models
Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012
More information9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures
FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models
More informationOptimal bandwidth selection for the fuzzy regression discontinuity estimator
Optimal bandwidth selection for the fuzzy regression discontinuity estimator Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP49/5 Optimal
More informationIV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade
IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September
More informationECAS Summer Course. Fundamentals of Quantile Regression Roger Koenker University of Illinois at Urbana-Champaign
ECAS Summer Course 1 Fundamentals of Quantile Regression Roger Koenker University of Illinois at Urbana-Champaign La Roche-en-Ardennes: September, 2005 An Outline 2 Beyond Average Treatment Effects Computation
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationNonparametric Econometrics
Applied Microeconometrics with Stata Nonparametric Econometrics Spring Term 2011 1 / 37 Contents Introduction The histogram estimator The kernel density estimator Nonparametric regression estimators Semi-
More informationDefect Detection using Nonparametric Regression
Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare
More informationWorking Paper No Maximum score type estimators
Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,
More informationLikelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square
Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square Yuxin Chen Electrical Engineering, Princeton University Coauthors Pragya Sur Stanford Statistics Emmanuel
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationLocal Polynomial Wavelet Regression with Missing at Random
Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800
More informationSparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University
More informationQuantile regression and heteroskedasticity
Quantile regression and heteroskedasticity José A. F. Machado J.M.C. Santos Silva June 18, 2013 Abstract This note introduces a wrapper for qreg which reports standard errors and t statistics that are
More informationStatistical Properties of Numerical Derivatives
Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationNEW EFFICIENT AND ROBUST ESTIMATION IN VARYING-COEFFICIENT MODELS WITH HETEROSCEDASTICITY
Statistica Sinica 22 202, 075-0 doi:http://dx.doi.org/0.5705/ss.200.220 NEW EFFICIENT AND ROBUST ESTIMATION IN VARYING-COEFFICIENT MODELS WITH HETEROSCEDASTICITY Jie Guo, Maozai Tian and Kai Zhu 2 Renmin
More informationLocal Polynomial Modelling and Its Applications
Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationTesting Equality of Nonparametric Quantile Regression Functions
International Journal of Statistics and Probability; Vol. 3, No. 1; 2014 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Testing Equality of Nonparametric Quantile
More informationNonparametric Methods
Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis
More informationA General Overview of Parametric Estimation and Inference Techniques.
A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying
More informationBrief Review on Estimation Theory
Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on
More informationregression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered,
L penalized LAD estimator for high dimensional linear regression Lie Wang Abstract In this paper, the high-dimensional sparse linear regression model is considered, where the overall number of variables
More informationNonparametric estimation of extreme risks from heavy-tailed distributions
Nonparametric estimation of extreme risks from heavy-tailed distributions Laurent GARDES joint work with Jonathan EL METHNI & Stéphane GIRARD December 2013 1 Introduction to risk measures 2 Introduction
More informationEstimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?
Journal of Modern Applied Statistical Methods Volume 10 Issue Article 13 11-1-011 Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More information