
The Pennsylvania State University
The Graduate School
Eberly College of Science

A NON-ITERATIVE METHOD FOR FITTING THE SINGLE INDEX QUANTILE REGRESSION MODEL WITH UNCENSORED AND CENSORED DATA

A Dissertation in Statistics
by Eliana Christou

© 2016 Eliana Christou

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

May 2016

The dissertation of Eliana Christou was reviewed and approved by the following:

Michael G. Akritas, Professor of Statistics, Dissertation Advisor, Chair of Committee
Bing Li, Professor of Statistics
Zhibiao Zhao, Associate Professor of Statistics
Spiro E. Stefanou, Professor Emeritus of Agricultural Economics, The Pennsylvania State University; Professor and Chair of Food and Resource Economics, University of Florida
Aleksandra B. Slavković, Associate Head for Graduate Studies, Professor of Statistics

Signatures are on file in the Graduate School.

Abstract

Quantile regression (QR) is becoming increasingly popular due to its relevance in many scientific investigations. Linear and nonlinear QR models have been studied extensively, while recent research focuses on the single index quantile regression (SIQR) model. Compared to the single index mean regression (SIMR) problem, the fitting and the asymptotic theory of the SIQR model are more complicated due to the lack of closed form expressions for estimators of conditional quantiles. Consequently, existing methods are necessarily iterative. We propose a non-iterative estimation algorithm, and derive the asymptotic distribution of the proposed estimator under heteroscedasticity. For identifiability, we use a parametrization that sets the first coefficient to 1 instead of the typical condition which restricts the norm of the parametric component. This distinction is more than simply cosmetic as it affects, in a critical way, the correspondence between the estimator derived and the asymptotic theory.

The ubiquity of high dimensional data has led to a number of variable selection methods for linear/nonlinear QR models and, recently, for the SIQR model. We propose a new algorithm for simultaneous variable selection and parameter estimation applicable also for heteroscedastic data. The proposed algorithm, which is non-iterative, consists of two steps. Step 1 performs an initial variable selection method. Step 2 uses the results of Step 1 to obtain better estimation of the conditional quantiles and, using them, to perform simultaneous variable selection and estimation of the parametric component of the SIQR model. It is shown that the initial variable selection method of Step 1 consistently estimates the relevant variables, and that the estimated parametric component derived in Step 2 satisfies the oracle property.

Furthermore, QR is particularly relevant for the analysis of censored survival data as an alternative to proportional hazards and the accelerated failure time models. Such data occur frequently in biostatistics, environmental sciences, social sciences and econometrics. There is a large body of work for linear/nonlinear QR models for censored data, but it is only recently that the SIQR model has received some attention. However, the only existing method for fitting the SIQR model uses an iterative algorithm and no asymptotic theory for the resulting estimator of the Euclidean parameter is given. We propose a new non-iterative estimation algorithm, and derive the asymptotic distribution of the proposed estimator under heteroscedasticity.

Table of Contents

List of Figures
List of Tables
List of Symbols
Acknowledgments

Chapter 1 Introduction to Quantile Regression
    1.1 Linear Quantile Regression
    1.2 Nonparametric Quantile Regression
    1.3 Semiparametric Quantile Regression
    1.4 Outline of Thesis

Chapter 2 Single Index Quantile Regression for Heteroscedastic Data
    2.1 Introduction
    2.2 The Proposed Estimator
    2.3 Main Results
    2.4 Numerical Studies
        2.4.1 Computational Remarks
        2.4.2 Simulation Results
        2.4.3 Boston Housing Data
    2.5 Conclusions

Chapter 3 Variable Selection in Heteroscedastic Single Index Quantile Regression
    3.1 Introduction
    3.2 The Proposed Estimator
    3.3 Main Results
    3.4 Numerical Studies
        3.4.1 Computational Remarks
        3.4.2 Simulation Results
        3.4.3 Boston Housing Data
        3.4.4 An application to Genomic Data
    3.5 Conclusions

Chapter 4 Single Index Quantile Regression for Censored Data
    4.1 Introduction
    4.2 The Proposed Estimator
    4.3 Main Results
    4.4 Numerical Studies
        4.4.1 Computational Remarks
        4.4.2 Simulation Results
        4.4.3 A Real Example
    4.5 Conclusions

Appendix Assumptions and Proofs of Main Results
    A.1 Assumptions
    A.2 Some General Lemmas
    A.3 Proofs for Chapter 2
        A.3.1 Some Lemmas
        A.3.2 Proof of Proposition 2.3.1
        A.3.3 Proof of Proposition 2.3.2
        A.3.4 Proof of Theorem 2.3.3
    A.4 Proofs for Chapter 3
        A.4.1 Some Lemmas
        A.4.2 Proof of Theorem
        A.4.3 Proof of Proposition
        A.4.4 Proof of Proposition
        A.4.5 Proof of Theorem
    A.5 Proofs for Chapter 4
        A.5.1 Some Lemmas
        A.5.2 Proof of Proposition
        A.5.3 Proof of Proposition
        A.5.4 Proof of Theorem

Bibliography

List of Figures

1.1 Quantile Regression ρ check function.
2.1 Boxplot of estimated parametric component for Model (2.18) for the three estimators; the true β is 2.
2.2 Estimated SIQR for Boston housing data for Model (2.22). The dots are the observations and the curve is the estimated quantile function.
2.3 Estimated SIQR for Boston housing data for Model (2.23). The dots are the observations and the curve is the estimated quantile function.
3.1 Estimated SIQR for Boston housing data for Model (3.11). The dots are the observations and the curve is the conditional quantile function.

List of Tables

2.1 Mean values and standard errors (in parentheses), $R(\hat\beta)$ and average $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$, defined in (2.19), for Model (2.18).
2.2 Mean values and standard errors (in parentheses), $R(\hat\beta)$ and average $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$, defined in (2.19), for Model (2.20). Also, 95% coverage probability for NWQR, WYY-2 and the second estimated coefficient of Wu et al. (2010), denoted by WYY.
2.3 95% coverage probability for NWQR, WYY-2, and the second estimated coefficient of Wu et al. (2010), denoted by WYY, for Model (2.21).
2.4 Proposed parametric vector estimates and standard errors (in parentheses) for Boston housing data for Model (2.22) with five different quantile levels.
2.5 Proposed parametric vector estimates and standard errors (in parentheses) for Boston housing data for Model (2.23) with five different quantile levels.
2.6 Mean check based absolute residuals, $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$, defined in (2.19), for Models (2.22) and (2.23); WYY denotes the method proposed by Wu et al. (2010) and NWQR denotes the proposed methodology.
3.1 Mean values and standard deviations (in parentheses) for the size and the number of correct and incorrect zeros of the estimated parametric component $\hat\beta^{SCAD}$. Also, mean values for $AR(\hat\beta^{SCAD})$ and $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1^{SCAD}})$ for Model (3.7).
3.2 Mean values and standard deviations (in parentheses) for the $MSE(\hat\beta_1^\top X)$ for the SCAD-NWQR, LASSO-AY, and ALASSO-AY estimated parametric components for Model (3.7).
3.3 Mean values and standard deviations (in parentheses) for the size and the number of correct and incorrect zeros of the estimated parametric component $\hat\beta^{SCAD}$. Also, mean values for $AR(\hat\beta^{SCAD})$ and $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1^{SCAD}})$ for Model (3.8).
3.4 Mean values and standard deviations (in parentheses) for the $MSE(\hat\beta_1^\top X)$ for the SCAD-NWQR, LASSO-AY, and ALASSO-AY estimated parametric components for Model (3.8).
3.5 Mean values and standard deviations (in parentheses) for the size and the number of correct and incorrect zeros of the estimated parametric component $\hat\beta^{SCAD}$. Also, mean values for $AR(\hat\beta^{SCAD})$ and $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1^{SCAD}})$ for Model (3.9).
3.6 Mean values and standard deviations (in parentheses) for the $MSE(\hat\beta_1^\top X)$ for the SCAD-SIR, LASSO-AY, and ALASSO-AY estimated parametric components for Model (3.9).
3.7 Mean values and standard deviations (in parentheses) for the size and the number of correct and incorrect zeros of the estimated parametric component $\hat\beta^{SCAD}$ for Model (3.10).
3.8 Mean values and standard deviations (in parentheses) for the $MSE(\hat\beta_1^\top X)$, average $AR(\hat\beta)$ and average $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$ for NWQR and SCAD-NWQR estimated parametric components for Model (3.10).
3.9 Proposed parametric vector estimates for Boston housing data for Model (3.11) with five different quantile levels.
3.10 Number of zero coefficients for SCAD-NWQR, LASSO-AY, and ALASSO-AY for Boston housing data for Model (3.11) with five different quantile levels.
3.11 Mean check based absolute residuals, $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$, defined in (2.19), for SCAD-NWQR, LASSO-AY, and ALASSO-AY for Boston housing data for Model (3.11) with five different quantile levels.
3.12 Average estimated coefficients for States 1, 3 and 6. These states make up the cold or warm bulk of the genome; they have low or intermediate divergence rates, and are located on the autosomes away from the telomeres.
3.13 Average estimated coefficients for States 2, 4 and 5. State 2 has very low divergence rates and is located exclusively on chromosome X. States 4 and 5 have very high divergence rates, and the former is located in the telomeric regions of autosomes (the latter is interspersed throughout the autosomes).
4.1 Mean values and standard errors (in parentheses), average $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$, defined in (2.19), and coverage probabilities for the CD-NWQR for Model (4.10).
4.2 Trimmed mean values and standard errors (in parentheses), and average $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$, defined in (2.19), for the CD-BGK for Model (4.10).
4.3 Mean and trimmed mean values along with their corresponding standard errors (in parentheses) for the estimated parametric component, as well as average $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$, defined in (2.19), for the CD-NWQR and CD-BGK estimators for Model (4.11).
4.4 Parametric vector estimates and standard errors (in parentheses) for Model (4.12) for five different quantile levels.
4.5 Mean check absolute residuals, $R_\tau(\hat Q^{LL}_{\tau,\hat\beta_1})$, defined in (2.19), for the CD-NWQR and CD-BGK estimators for Model (4.12).

List of Symbols

$\tau$: The Greek letter denoting the quantile of interest, where $0 < \tau < 1$.
$\rho_\tau$: The Greek letter denoting the check function, defined for $0 < \tau < 1$ as $\rho_\tau(u) = (\tau - I(u < 0))u$.
$\beta_1$: The Greek letter denoting a d-dimensional vector of unknown parameters, where $\beta_1 = (1, \beta^\top)^\top$ and $\beta \in R^{d-1}$.
$\epsilon$: The Greek letter denoting the error term.
$\hat\beta$: The Greek letter denoting an estimator of $\beta$.

Acknowledgments

I would like to express my special appreciation and thanks to my advisor, Professor Michael G. Akritas; you have been a tremendous mentor for me. I would like to thank you for the useful comments, remarks and engagement through the learning process of this thesis, as well as your continuous encouragement and support throughout these years. I would also like to thank the Associate Head for Graduate Studies, Professor Aleksandra Slavković, for supporting me throughout all my academic years and helping me whenever I had the need. I would like to thank all the faculty in the department for encouraging all the students, and especially Professor Naomi Altman and Professor David Hunter for being next to me throughout all my steps. I would also like to thank an undergraduate professor of mine, Dr. Tasos Christofides, for encouraging me to continue my studies at the graduate level and providing me guidance throughout all my decisions.

A special thanks to my family. Words cannot express how grateful I am to my mother and my father for all of the sacrifices that you've made on my behalf. I have worked hard to complete my graduate studies, but I wouldn't have been able to achieve my goals if it weren't for your support, love, and encouragement. Your prayer for me was what sustained me thus far. Thank you from the bottom of my heart. Many thanks to my brother, my sister, and my grandparents for always being there for me. I lost my grandmother two months before my graduation, but I owe her great thanks for all the times she gave me her support through her calls. Thank you, grandma! I would also like to thank a special friend of mine, Sotiria Marathovounioti, who was always my support, and Vasiliki Vasileiou, who supported me in writing and pushed me to strive towards my goal.

Dedication

To my mother, Andri Christou, my father, Constantinos Christou, and my grandmother, Avgi Stefanou. Thank you for everything!

Chapter 1
Introduction to Quantile Regression

Ordinary least squares regression plays a prominent role in a wide variety of fields and is a very popular method for modeling the relationship between a d-dimensional vector of covariates X and the conditional mean of the response variable Y given X = x. However, mean regression, linear or not, provides only a single summary measure of the conditional distribution of the response given the covariates. Moreover, the sensitivity of the least squares estimator to even a modest number of outliers makes it a very poor estimator for many non-Gaussian, and especially long-tailed, distributions.

Quantile regression (QR), which was first introduced by Koenker and Bassett (1978), is a method for completing the regression picture by focusing on specific conditional quantiles of the distribution. QR models the relationship between a d-dimensional vector of covariates X and the τ-th, for $0 < \tau < 1$, conditional quantile of the response variable Y given X = x. When the error term is heteroscedastic, a direct approach for estimating conditional quantiles has a number of advantages. Many real data sets motivate the use of QR, especially cases where extremes are important. For example, QR can be used in environmental studies, where the upper quantiles of pollution levels are critical from a public health perspective.

Koenker and Bassett (1978) introduced the loss function $\rho_\tau(\cdot)$, also known as the check function, defined for $0 < \tau < 1$ as $\rho_\tau(u) = (\tau - I(u < 0))u$, which simply gives different weights to positive and negative values; see Figure 1.1. It can be easily shown that minimizing the function $E(\rho_\tau(X - q))$ with respect to q gives the τ-th quantile of the random variable X. This idea motivated Koenker and Bassett (1978) to introduce linear QR.

[Figure 1.1. Quantile Regression ρ check function.]

1.1 Linear Quantile Regression

Let
$$Q_\tau(Y|x) \equiv Q_\tau(Y|X = x) = \inf\{y : P(Y \le y | X = x) \ge \tau\} \tag{1.1}$$
denote the τ-th conditional quantile of Y given X = x and consider the linear QR model
$$Q_\tau(Y|x) = \beta^\top x, \tag{1.2}$$
where β is a d-dimensional vector of unknown parameters. Koenker and Bassett (1978) used the representation
$$Q_\tau(Y|x) = \arg\min_q E(\rho_\tau(Y - q) | X = x), \tag{1.3}$$
where $\rho_\tau(\cdot)$ is the check function, to define the estimator $\hat\beta$ as
$$\hat\beta = \arg\min_{b \in R^d} \sum_{i=1}^n \rho_\tau(Y_i - b^\top X_i). \tag{1.4}$$
Thus, $\hat\beta^\top x$ gives the estimator of the τ-th conditional quantile under the linear QR model. Observe that for τ = 1/2, the objective function is the $L_1$ norm, that is,
$$\arg\min_{b \in R^d} \frac{1}{2} \sum_{i=1}^n |Y_i - b^\top X_i|,$$
which gives the estimated conditional median. It turns out that QR inherits the well known robustness properties of median regression; see Pollard (1991).

Koenker and Bassett (1978) studied the asymptotic statistical behavior of the estimated conditional regression quantiles, while Koenker (1994) studied confidence intervals for the regression quantiles, based on the asymptotic theory. He suggested three different construction methods for the confidence interval: the sparsity estimation, which involves direct estimation of the sparsity function; the inversion of rank tests, which computes confidence intervals by inverting the rank score test; and the resampling method. Later, Koenker and Hallock (2001) presented some practical implementations for linear QR. Specifically, they considered two data sets, the quantile Engel curves and the QR of infant birthweights, and they demonstrated that there were characteristics that were not captured by the least squares estimators.

1.2 Nonparametric Quantile Regression

Because the linearity assumption of model (1.2) is quite strict, several authors considered the completely flexible nonparametric estimation of the conditional quantiles. For the model $Q_\tau(Y|x) = h(x)$, where $h : R^d \to R$ is a fully nonparametric function, Truong (1989) showed that, under conditions, local median estimators achieve the global optimal rates of Stone (1982) with respect to $L_m$ norms, $0 < m$. Chaudhuri (1991) constructed local polynomial estimators for conditional quantile functions and their derivatives, and also showed that they achieve the optimal nonparametric rates of convergence of Stone (1982) under mild conditions. A local Bahadur type representation was also established by Chaudhuri (1991) for the uniform kernel function, and this result was later extended to general kernels by Hong (2003). Fan et al. (1994) considered a general convex loss function, which includes the mean, median, quantiles, and other robust functionals, and constructed local linear estimators. See also Yu and Jones (1998), who proposed inverting a local linear conditional distribution estimator. Takeuchi et al. (2006) presented a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem, and provided uniform convergence results. Kong et al. (2010) extended Chaudhuri's (1991) and Hong's (2003) pointwise Bahadur representation results by deriving a strong uniform (with respect to x) Bahadur representation, also for dependent observations. Guerre and Sabbah (2012) investigated the bias and the weak Bahadur representation of a local polynomial estimator of the conditional quantile function and its derivatives uniformly with respect to the quantile level, the covariates and the smoothing parameter. Also, they showed that the local polynomial quantile estimator achieves the global optimal rates of Stone (1982) for the $L_m$ and uniform norms.

1.3 Semiparametric Quantile Regression

The rate of convergence of completely nonparametric estimators of conditional quantiles, however, decreases with increasing dimensionality of the covariate vector. This motivated the study of a number of semiparametric models, and of variable selection methods, for QR. Koenker (2011) considered the additive model for QR, which includes both parametric and nonparametric components. Lin et al. (2013) considered variable selection for nonparametric QR via smoothing spline ANOVA (SS-ANOVA). See Su and Zhang (2012) for a literature review. The single index quantile regression (SIQR) model has received particular attention and this will be the subject undertaken in this thesis; see Chapter 2 for the definition of a SIQR model.

1.4 Outline of Thesis

The main complication faced by methods that are based on minimization of a semiparametric objective function lies in the lack of a closed form expression for the conditional quantile; see Chapter 2, Sections 2.1 and 2.2 for details. Thus, the methods proposed in the literature are necessarily iterative; this is further clarified in the next chapter. Here we propose a new check-function based objective function, which can be minimized non-iteratively for parameter estimation. In Chapter 2 we present the proposed algorithm for estimating the parametric component of a SIQR model for heteroscedastic data. In Chapter 3 we extend the non-iterative algorithm for simultaneous variable selection and parameter estimation, and in Chapter 4 we present the SIQR model for censored data.
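The claim made earlier in this chapter, that minimizing the expected check loss over q yields the τ-th quantile, can be checked numerically on a sample. The following is a minimal sketch for illustration only and is not taken from the thesis; the function names and the simulated exponential sample are assumptions made here.

```python
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - I(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def empirical_quantile_by_check_loss(sample, tau):
    """Return the sample point that minimizes the total check loss.

    The map q -> sum_i rho_tau(x_i - q) is convex and piecewise linear,
    so a minimizer can always be found among the sample points.
    """
    def total_loss(q):
        return sum(check_loss(x - q, tau) for x in sample)

    return min(sorted(sample), key=total_loss)


random.seed(0)
# Illustrative data: 501 draws from an exponential distribution with mean 2.
sample = [random.expovariate(0.5) for _ in range(501)]
tau = 0.75
q_hat = empirical_quantile_by_check_loss(sample, tau)
```

In this sketch the minimizer coincides with an order statistic of the sample, which is the sample-level analogue of the population statement above.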

Chapter 2
Single Index Quantile Regression for Heteroscedastic Data

2.1 Introduction

The single index quantile regression (SIQR) model specifies that
$$Q_\tau(Y|x) = Q_{\tau,\beta_1}(Y|\beta_1^\top x), \tag{2.1}$$
where $\beta_1$ is a d-dimensional vector of unknown parameters, $Q_\tau(Y|x)$ is the τ-th conditional quantile of the response Y given X = x defined in (1.1), and, for any d-dimensional vector $b_1$,
$$Q_{\tau,b_1}(Y|b_1^\top x) = \inf\{y : P(Y \le y | b_1^\top X = b_1^\top x) \ge \tau\}. \tag{2.2}$$
A SIQR model is very useful since it maintains some nonparametric flexibility while, at the same time, it reduces the dimensionality. For identifiability one imposes certain conditions on $\beta_1$, the most common of which is to assume that $\|\beta_1\| = 1$, with its first coordinate positive. In this work, we propose the parametrization which assumes that
$$\beta_1 = (1, \beta^\top)^\top, \quad \beta \in R^{d-1}; \tag{2.3}$$
this parametrization is also used in the R package np for the single index mean regression (SIMR) model. This distinction is more than simply cosmetic as it affects, in a critical way, the correspondence between the estimator derived and the asymptotic theory. The advantages of the proposed parametrization are demonstrated in the simulations; see Section 2.4.

Existing literature considers the SIQR model under homoscedasticity (Wu et al. 2010), restricted heteroscedasticity (Chaudhuri et al. 1997) and general heteroscedasticity (Kong and Xia 2012). Chaudhuri et al. (1997) considered the average derivative quantile regression estimator which, under a SIQR model where the variance function depends only on $\beta_1^\top x$, estimates the direction of $\beta_1$. Wu et al. (2010) and Kong and Xia (2012) estimate $\beta_1$ by minimizing an objective function. Compared with SIMR (see Li and Racine 2007, Chapter 8), the main complication faced by this approach lies in the lack of a closed form expression for the estimator of conditional quantiles. Thus, the methods proposed in these papers are necessarily iterative. Wu et al. (2010) proposed an algorithm which, starting from an initial value $b_1^0$ for the parametric component, iteratively estimates the nonparametric component and its derivative using local linear QR, and the parametric component using (essentially) linear QR. Kong and Xia (2012) criticized the convergence properties of the algorithm in Wu et al. (2010) and proposed an improved iterative algorithm by introducing a penalty term that assures its almost sure convergence. (The outliers in the boxplot of the Wu et al. (2010) estimator shown in Section 2.4 are probably a consequence of the iteration issues of their algorithm.) In addition, Kong and Xia (2012) allowed general heteroscedasticity, but the covariance function of the limiting normal distribution they obtained depends on the true value of the parametric component in an explicit manner.

In this work we propose a non-iterative method, based on minimization of a check-function based objective function, for estimating the parametric component of the SIQR model. The proposed estimator is shown to have an asymptotically normal distribution, with a simple expression for the covariance matrix, under general heteroscedasticity. In Section 2.2 we present the proposed estimator, while in Section 2.3 we present the main results, which include the $\sqrt{n}$-consistency and the asymptotic normality of the estimated parametric component. In Section 2.4 we present results from several simulation examples and a real data application on the Boston housing data. Some concluding remarks are given in Section 2.5.
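The two identifiability conventions discussed in this section, a unit-norm vector with positive first coordinate versus a first coordinate fixed at 1 as in (2.3), describe the same index direction whenever the first coefficient is nonzero, because the unknown link function absorbs the scale. The sketch below illustrates the rescaling between them; the function names are hypothetical and chosen here for illustration only.

```python
import math


def to_first_coordinate_one(direction):
    """Rescale a direction so that its first coordinate equals 1, as in (2.3)."""
    first = direction[0]
    if first == 0:
        raise ValueError("first coordinate must be nonzero for this parametrization")
    return [c / first for c in direction]


def to_unit_norm(direction):
    """Rescale a direction to Euclidean norm one with a positive first coordinate."""
    if direction[0] == 0:
        raise ValueError("first coordinate must be nonzero to fix the sign")
    norm = math.sqrt(sum(c * c for c in direction))
    sign = 1.0 if direction[0] > 0 else -1.0
    return [sign * c / norm for c in direction]


unit = to_unit_norm([1.0, 2.0])         # the direction (1, 2) scaled to norm one
beta_1 = to_first_coordinate_one(unit)  # back to (1, 2), up to rounding error
```

Dividing a unit-norm estimate by its first component is the same rescaling that is applied to the Wu et al. (2010) estimator in Section 2.4 to make it comparable with the proposed parametrization.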

2.2 The Proposed Estimator

Let $\{Y_i, X_i\}_{i=1}^n$ be independent and identically distributed (iid) observations that satisfy
$$Y_i = Q_{\tau,\beta_1}(Y|\beta_1^\top X_i) + \epsilon_i, \tag{2.4}$$
where $Q_{\tau,\beta_1}(Y|\beta_1^\top X_i)$ is defined in (2.2), and the error term $\epsilon_i$ satisfies $Q_\tau(\epsilon_i|X_i) = 0$. The quantities $\beta_1$ and $\epsilon_i$ are specific to the τ-th quantile, but we omit the subscript τ for notational convenience. Note that (2.4) is an equivalent way of specifying the SIQR model (2.1). Relation (1.3) implies that the true parametric vector β (recall the parametrization used in (2.3)) satisfies
$$\beta = \arg\min_b E\big(\rho_\tau(Y - Q_{\tau,b_1}(Y|b_1^\top X))\big), \tag{2.5}$$
where, in a notation that will be used throughout this thesis, $b_1 = (1, b^\top)^\top$, $b \in R^{d-1}$. The sample level version of (2.5) consists of minimizing
$$\sum_{i=1}^n \rho_\tau(Y_i - Q_{\tau,b_1}(Y|b_1^\top X_i)). \tag{2.6}$$
As in the SIMR problem (cf. Ichimura 1993, Newey and Stoker 1993), the unknown $Q_{\tau,b_1}(Y|b_1^\top X_i)$ must be replaced with an estimator. Unlike the SIMR problem, however, there is no closed form expression for the estimator of $Q_{\tau,b_1}(Y|b_1^\top X_i)$, and this has led to iterative algorithms for estimating β; see the literature review in Section 2.1.

To overcome this difficulty, we define, for any given $b \in R^{d-1}$, the function $g(\cdot|b) : R \to R$ as
$$g(t|b) = E\big(Q_\tau(Y|X) \,\big|\, b_1^\top X = t\big),$$
where $b_1 = (1, b^\top)^\top$. Noting that, under the SIQR model (2.1), $Q_\tau(Y|X) = Q_{\tau,\beta_1}(Y|\beta_1^\top X) = g(\beta_1^\top X|\beta)$, it follows that β also satisfies
$$\beta = \arg\min_b E\big(\rho_\tau(Y - g(b_1^\top X|b))\big). \tag{2.7}$$
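The step from (1.3) to (2.7) can be spelled out as follows. This is a sketch of the standard argument, written here for illustration, and it shows only that β attains the minimum; uniqueness of the minimizer relies on the identifiability conditions of the model and is not addressed.

```latex
% For any b, g(b_1^T X | b) is a function of X, so (1.3) applied conditionally on X gives
\begin{align*}
E\big(\rho_\tau(Y - g(b_1^\top X \mid b))\big)
  &= E\Big( E\big(\rho_\tau(Y - g(b_1^\top X \mid b)) \,\big|\, X\big) \Big) \\
  &\ge E\Big( E\big(\rho_\tau(Y - Q_\tau(Y \mid X)) \,\big|\, X\big) \Big)
     && \text{by (1.3)} \\
  &= E\big(\rho_\tau(Y - g(\beta_1^\top X \mid \beta))\big)
     && \text{since } Q_\tau(Y \mid X) = g(\beta_1^\top X \mid \beta) \text{ under (2.1)}.
\end{align*}
% Hence b = \beta attains the minimum of the objective in (2.7).
```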

The sample level version of (2.7) consists of minimizing
$$S_n(\tau, b) = \sum_{i=1}^n \rho_\tau(Y_i - g(b_1^\top X_i|b)). \tag{2.8}$$
Again, $g(\cdot|b)$ is unknown but it can be estimated, in a non-iterative fashion, by first obtaining estimators $\hat Q_\tau(Y|X_i)$, for $i = 1, \ldots, n$, and forming the Nadaraya-Watson-type estimator
$$\hat g^{NW}(t|b) = \frac{\sum_{i=1}^n \hat Q_\tau(Y|X_i) K\left(\frac{t - b_1^\top X_i}{h}\right)}{\sum_{k=1}^n K\left(\frac{t - b_1^\top X_k}{h}\right)}, \tag{2.9}$$
where $K(\cdot)$ is a univariate kernel function and h is a bandwidth.

The different methods for constructing nonparametric estimators $\hat Q_\tau(Y|X_i)$ are summarized in Racine and Li (2014), who also introduced a new direct method. In this thesis, we will use the local polynomial conditional quantile estimator, which was studied in Guerre and Sabbah (2012). Specifically, let k denote the order of the local polynomial estimator and define, for $v = (v_1, \ldots, v_d)^\top$ where $v_1, \ldots, v_d$ are integers, $|v| = v_1 + \cdots + v_d$. Then, for $\alpha = (\alpha_0, \alpha_1^\top)^\top \in R^P$, where P is the number of v's with $|v| \le k$, a multivariate kernel function $K^*(x) = K^*(x_1, \ldots, x_d)$, and a univariate bandwidth $h^*$, let
$$L_n((\alpha_0, \alpha_1); \tau, x) = \frac{1}{n (h^*)^d} \sum_{i=1}^n \rho_\tau\big(Y_i - U(X_i - x)^\top \alpha\big) K^*\left(\frac{X_i - x}{h^*}\right), \tag{2.10}$$
where $U(z) = (z^v / v!, |v| \le k)^\top$, for $z^v = z_1^{v_1} \cdots z_d^{v_d}$ and $v! = \prod_{i=1}^d v_i!$. Define $\hat Q_\tau(Y|x)$ as $\hat\alpha_0(\tau; x)$, where $\hat\alpha_0(\tau; x)$ is defined through
$$(\hat\alpha_0(\tau; x), \hat\alpha_1(\tau; x)) = \arg\min_{(\alpha_0, \alpha_1)} L_n((\alpha_0, \alpha_1); \tau, x). \tag{2.11}$$

Remark. For high dimensional data it is possible to improve the estimation of conditional quantiles by employing variable selection methods; see Chapter 3 for details.

Thus, the proposed estimator is obtained by
$$\hat\beta = \arg\min_{b \in \Theta} \hat S_n(\tau, b), \tag{2.12}$$
where $\Theta \subset R^{d-1}$ is a compact set assumed to contain the true value of β, and
$$\hat S_n(\tau, b) = \sum_{i=1}^n \rho_\tau\big(Y_i - \hat g^{NW}(b_1^\top X_i|b)\big). \tag{2.13}$$
For technical reasons that have to do with the uniform convergence of the Nadaraya-Watson estimator, a trimming function is usually introduced in the objective function (2.13). To avoid complicating the notation, we will assume that the support $\mathcal{X}_0$ of X is compact and the density $f_b$ of $b_1^\top X$ stays bounded away from zero on $T_b = \{t : t = b_1^\top x, x \in \mathcal{X}_0\}$, uniformly in $b \in \Theta$.

2.3 Main Results

The first two results, which have to do with the uniform, in both t and b, consistency of $\hat g^{NW}(t|b)$, and the $\sqrt{n}$-consistency of $\hat\beta$, are needed for the proof of Theorem 2.3.3.

PROPOSITION 2.3.1 Let $\hat g^{NW}(t|b)$ be as defined in (2.9). Assume that for some $r > 2$, $E|Q_\tau(Y|X)|^r < \infty$ and $\sup_{t \in T_b} E(|Q_\tau(Y|X)|^r \,|\, b_1^\top X = t) f_b(t) < \infty$ holds for all $b \in \Theta$, where $T_b = \{t : t = b_1^\top x, x \in \mathcal{X}_0\}$, $\mathcal{X}_0$ is the compact support of X and $f_b$ is the density of $b_1^\top X$. Moreover, assume that $Q_\tau(Y|x)$ is in $H_s(\mathcal{X}_0)$ for some s with $[s] \le k$, where $H_s(\mathcal{X}_0)$ is defined in Appendix A.1 and k is the order of the local polynomial conditional quantile estimators $\hat Q_\tau(Y|X_i)$ (used in (2.9)). Under Assumptions GS1-GS3 and Assumptions A1-A5 given in Appendix A.1, we have
$$\sup_{b \in \Theta, t \in T_b} |\hat g^{NW}(t|b) - g(t|b)| = O_p\big(a_n^* + a_n + h^2\big),$$
where $a_n^* = (\log n / n)^{s/(2s+d)}$ and $a_n = (\log n / (nh))^{1/2}$.

The proof of Proposition 2.3.1 is given in Appendix A.3.2.

PROPOSITION 2.3.2 Let $\hat\beta$ be as defined in (2.12). Then, under the assumptions of Proposition 2.3.1, Assumptions A6 and A7 given in Appendix A.1, and the condition $nh^4 = o(1)$, where h is the bandwidth in (2.9), $\hat\beta$ is a $\sqrt{n}$-consistent estimator of β.

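The proof of Proposition 2.3.2 is given in Appendix A.3.3.

THEOREM 2.3.3 Let $\hat\beta$ be as defined in (2.12). Then, under the assumptions of Proposition 2.3.2,
$$\sqrt{n}(\hat\beta - \beta) = V^{-1} W_n + o_p(1),$$
where
$$V = E\Big( g'(\beta_1^\top X|\beta)^2 \big(X_{-1} - E(X_{-1}|\beta_1^\top X)\big)\big(X_{-1} - E(X_{-1}|\beta_1^\top X)\big)^\top f_{\epsilon|X}(0|X) \Big) \tag{2.14}$$
for $g'(t|b) = (\partial/\partial t) g(t|b)$, $X_{-1}$ the $(d-1)$-dimensional vector consisting of coordinates $2, \ldots, d$ of X, and $f_{\epsilon|X}(\cdot|x)$ the conditional probability density function of ε given X = x, and
$$W_n = n^{-1/2} \sum_{i=1}^n \rho_\tau'(Y_i^*) \, g'(\beta_1^\top X_i|\beta) \big(X_{i,-1} - E(X_{-1}|\beta_1^\top X_i)\big), \tag{2.15}$$
for $Y_i^* = Y_i - \hat g^{NW}(\beta_1^\top X_i|\beta)$. Furthermore,
$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N\big(0, \tau(1-\tau) V^{-1} \Sigma V^{-1}\big),$$
where
$$\Sigma = E\Big( g'(\beta_1^\top X|\beta)^2 \big(X_{-1} - E(X_{-1}|\beta_1^\top X)\big)\big(X_{-1} - E(X_{-1}|\beta_1^\top X)\big)^\top \Big). \tag{2.16}$$

The proof of Theorem 2.3.3 is given in Appendix A.3.4.

Next, let $\hat\beta_1 = (1, \hat\beta^\top)^\top$ and define
$$(\hat a(\tau; x), \hat b(\tau; x)) = \arg\min_{(a,b)} \sum_{i=1}^n \rho_\tau\big(Y_i - a - b \hat\beta_1^\top (X_i - x)\big) K\left(\frac{\hat\beta_1^\top (X_i - x)}{h}\right),$$
where $K(\cdot)$ is a univariate kernel function and h is a bandwidth, and define
$$\hat Q^{LL}_\tau(Y|x) = \hat Q^{LL}_{\tau,\hat\beta_1}(Y|\hat\beta_1^\top x) = \hat a(\tau; x), \tag{2.17}$$
as the quantile estimator based on the assumption of the SIQR model (2.1).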
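As a quick check on the sandwich form of the limiting covariance in Theorem 2.3.3, consider the special case where the conditional error density at zero does not depend on the covariates. The constant $f_\epsilon(0)$ below is notation introduced here for illustration, and Σ is assumed to be invertible.

```latex
% Assume f_{\epsilon|X}(0 | x) = f_\epsilon(0) for all x (homoscedastic-type case).
% Comparing (2.14) with (2.16), the density factor comes out of the expectation:
\begin{align*}
V &= f_\epsilon(0)\, \Sigma, \\
\tau(1-\tau)\, V^{-1} \Sigma V^{-1}
  &= \tau(1-\tau)\, \frac{1}{f_\epsilon(0)^2}\, \Sigma^{-1} \Sigma\, \Sigma^{-1}
   = \frac{\tau(1-\tau)}{f_\epsilon(0)^2}\, \Sigma^{-1}.
\end{align*}
```

Under this assumption the covariance reduces to the familiar quantile regression form, a scalar multiple of $\Sigma^{-1}$, whereas under general heteroscedasticity the weighting by $f_{\epsilon|X}(0|X)$ inside V prevents this cancellation.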

COROLLARY Let $\hat Q^{LL}_\tau(Y|x)$ be as defined in (2.17), where $K(\cdot)$ is a symmetric, second order and density kernel with a compact support and a bounded first derivative that satisfies $\int t^j K^2(t)\,dt < \infty$ for $j = 0, 1, 2$. Then, under the assumptions of Proposition 2.3.2 and Assumption A8 given in Appendix A.1,
$$\sqrt{nh}\Big(\hat Q^{LL}_\tau(Y|x) - Q_\tau(Y|x) - h^2 I(\beta_1^\top x)\Big) \xrightarrow{d} N\big(0, \omega^2(\beta_1^\top x)\big),$$
where $I(\beta_1^\top x) = (1/2)\, g''(\beta_1^\top x|\beta) \int t^2 K(t)\,dt$ and
$$\omega^2(\beta_1^\top x) = \frac{\tau(1-\tau) \int K^2(t)\,dt}{f_\beta(\beta_1^\top x)\, f_{\epsilon|\beta}(0|\beta_1^\top x)^2},$$
for $f_{\epsilon|\beta}(\cdot|t)$ the conditional density function of ε given that $\beta_1^\top X$ takes the value t, $h \to 0$ and $nh \to \infty$ as $n \to \infty$.

The corollary follows from the fact that the proposed estimator $\hat\beta$ is $\sqrt{n}$-consistent and the proof of Theorem 2 of Wu et al. (2010).

Remark. It can be shown that, when using the parametrization adopted here (see (2.3)), the asymptotic normality result for the parametric vector of Wu et al. (2010) is a special case (pertaining to the homoscedastic case) of Theorem 2.3.3. Oberhofer and Haupt (2015) considered nonlinear QR (thus known link function), in a fixed design with heteroscedastic errors which are allowed to be weakly dependent. Taking these differences into consideration, the form of the limiting covariance matrix they obtained is also related to that of Theorem 2.3.3. Finally, Kong and Xia (2012), who considered the heteroscedastic case and use a penalty term to ensure the convergence of their iterative algorithm, obtain a covariance matrix that depends on the true value of the parametric component in an explicit manner and whose form is not directly comparable to that of the previous literature or ours.
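The non-iterative construction of Section 2.2, namely the Nadaraya-Watson-type estimator (2.9) plugged into the objective (2.13), can be sketched in a few lines. The code below is an illustration under stated assumptions and is not the author's implementation: the first-stage estimates $\hat Q_\tau(Y|X_i)$ are taken as given (in the toy usage the true conditional quantiles stand in for them), a plain grid search replaces a numerical optimizer, and all function names, the bandwidth, and the symmetric uniform noise are choices made here.

```python
import math
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - I(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def index_values(b, X):
    """Compute b_1^T x with b_1 = (1, b), i.e. x[0] + sum_j b[j] * x[j + 1]."""
    return [x[0] + sum(bj * xj for bj, xj in zip(b, x[1:])) for x in X]


def g_nw(t, idx, q_hat, h):
    """Kernel-weighted average of first-stage quantile estimates, as in (2.9)."""
    weights = [gaussian_kernel((t - s) / h) for s in idx]
    return sum(w * q for w, q in zip(weights, q_hat)) / sum(weights)


def objective(b, X, Y, q_hat, tau, h):
    """Sample objective (2.13): sum_i rho_tau(Y_i - g_nw(b_1^T X_i | b))."""
    idx = index_values(b, X)
    return sum(check_loss(y - g_nw(t, idx, q_hat, h), tau) for y, t in zip(Y, idx))


def fit_by_grid(X, Y, q_hat, tau, h, grid):
    """Return the candidate b in the grid with the smallest objective value."""
    return min(grid, key=lambda b: objective(b, X, Y, q_hat, tau, h))


# Toy usage with d = 2 and true beta = 2 (assumed setup, for illustration only).
random.seed(1)
n, tau, h = 100, 0.5, 0.2
X = [(random.random(), random.random()) for _ in range(n)]
q_true = [5 * math.cos(x1 + 2 * x2) + math.exp(-(x1 + 2 * x2) ** 2) for x1, x2 in X]
Y = [q + random.uniform(-1.0, 1.0) for q in q_true]  # noise with median zero
grid = [(0.5 * k,) for k in range(9)]                 # candidates b = 0, 0.5, ..., 4
b_hat = fit_by_grid(X, Y, q_true, tau, h, grid)
```

The point of the sketch is that no alternation between the link function and the index coefficients is needed: once the first-stage quantile estimates are available, each evaluation of the objective is a closed form computation in b.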

2.4 Numerical Studies

This section contains simulation results and the analysis of a real data set, contrasting the proposed estimator with that of Wu et al. (2010).

Computational Remarks

For the computation of the proposed estimator, the conditional quantile estimator \(\hat{Q}_{\tau}(Y|x)\) (used in (2.9)) is a multivariate local linear estimator, computed using an extension of the code for the function lprq in the R package quantreg (which applies only to univariate covariates). The bandwidth h used in (2.10) is selected to be the rule-of-thumb bandwidth for the local linear conditional quantile estimator derived in Yu and Jones (1998). The bandwidth h used in (2.9) is selected using the optimal rate \(Cn^{-1/5}\), where C is chosen to be the standard deviation of \(b_1^{0\top} x\), for \(b_1^{0}\) an initial value. The Gaussian kernel was used for estimation of the conditional quantiles \(\hat{Q}_{\tau}(Y|X_i)\) and of the Nadaraya-Watson-type estimator \(\hat{g}^{NW}(t|b)\). The function nlrq of the same R package was used for minimizing the objective function (2.13).

For the computation of the covariance matrix derived in Theorem 2.3.3, note that the expressions for V and Σ in (2.14) and (2.16), respectively, involve the quantities \(g'(\beta_1^\top X|\beta)\{X_{-1} - E(X_{-1}|\beta_1^\top X)\}\) and \(f_{\epsilon|X}(0|X)\). The estimator used for the first quantity is based on the observation that, under the single index (SI) model,

\[ \frac{\partial}{\partial b}\, g(b_1^\top X|b) \Big|_{b=\beta} = g'(\beta_1^\top X|\beta)\{X_{-1} - E(X_{-1}|\beta_1^\top X)\}, \]

and, therefore, it can be estimated as \(\frac{\partial}{\partial b}\, \hat{g}^{NW}(b_1^\top X|b) \big|_{b=\hat\beta}\). Finally, the estimation of \(f_{\epsilon|X}(0|X)\) uses a Gaussian kernel and a bandwidth chosen according to the R function bw.nrd0 in the stats package. The resulting code is available from the first author.

For the computation of the Wu et al. (2010) estimator we used the code provided by these authors. Because the two estimators being compared are derived using different parametrizations for ensuring identifiability, namely (2.3) versus the unit-norm constraint, for the sake of comparison we found it necessary to introduce two modifications of the Wu et al. (2010) estimator. For the first modification, we divide their estimator by its first component, and for the second modification we use our proposed parametrization in conjunction with the iterative algorithm of Wu et al. (2010) for estimating the remaining d − 1 coefficients. In what follows, NWQR denotes the proposed estimator, and WYY, WYY-1 and WYY-2 denote, respectively, the estimator in Wu et al. (2010) and its first and second modifications. All simulation results in this chapter use a sample size of n = 400 and are based on N = 100 iterations.

Simulation Results

Example 1 (Asymmetric Homoscedastic Errors): Here the data are generated according to the model

\[ Y = 5\cos(\beta_1^\top X) + \exp\left(-(\beta_1^\top X)^{2}\right) + \epsilon, \qquad (2.18) \]

where \(X = (X_1, X_2)^\top\), \(X_i \sim U(0, 1)\) are iid, \(\beta_1 = (1, 2)^\top\), the residual \(\epsilon\) follows an exponential distribution with mean 2, and the \(X_i\)'s and \(\epsilon\) are mutually independent. In this example, we fit the single index median regression model using the proposed method and that of Wu et al. (2010). The boxplots presented in Figure 2.1 show the 100 estimates of the parametric component β (whose true value is 2), using the proposed methodology and the two modifications of the Wu et al. (2010) estimator. Observe that the boxplot for NWQR is more closely concentrated around the true value of 2 than the other two boxplots, while the boxplot for WYY-1 displays the widest variability around the true value. The outliers observed in the boxplot for the WYY-2 estimator (which uses the proposed parametrization) are probably a consequence of the iterative algorithm, which stops after a maximum number of iterations.

For further comparison, Table 2.1 reports the mean values and standard deviations, over the N = 100 simulation runs, of the three estimators of the parametric component β, as well as the mean squared error, \(R(\hat\beta)\), and the average mean check based absolute residuals, \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\), defined as

\[ R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1}) = \frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}\left( Y_i - \hat{Q}^{LL}_{\tau,\hat\beta_1}(Y|\hat\beta_1^\top X_i) \right), \qquad (2.19) \]

where \(\hat\beta_1\) denotes an estimator of the Euclidean parameter \(\beta_1\), and \(\hat{Q}^{LL}_{\tau,\hat\beta_1}(Y|\hat\beta_1^\top X_i)\) is defined in (2.17). The findings in Table 2.1 further confirm the conclusions drawn from the boxplots in Figure 2.1. We observe that NWQR gives the smallest bias, the smallest value of \(R(\hat\beta)\), and the smallest average \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\), followed by WYY-2, which uses the proposed constraint, and then WYY-1.

Figure 2.1. Boxplots of the estimated parametric component for Model (2.18) for the three estimators (NWQR, WYY-1, WYY-2); the true β is 2.

Finally, we compare the coverage of the asymptotic 95% confidence intervals

based on NWQR, WYY-2, and the second estimated coefficient of Wu et al. (2010) (which estimates \(2/\sqrt{5}\)). The WYY-1 estimator was not considered due to the additional complication presented by the ratio. The observed coverage probabilities are 0.96, 0.91 and 0.68 for NWQR, WYY-2 and the second estimated coefficient of Wu et al. (2010), respectively. The p-value corresponding to 0.91 is 0.07, while that corresponding to 0.68 is far smaller. The conclusion is that the marginal standard error formula given in Wu et al. (2010) for the second component of \(\beta_1\) is appropriate for the parametrization used in the present work.

Table 2.1. Mean values (standard errors in parentheses) of \(\hat\beta\), \(R(\hat\beta)\) and average \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\), defined in (2.19), for Model (2.18). Rows: NWQR, WYY-1, WYY-2.

Example 2 (Symmetric Homoscedastic Errors): Here the data are generated according to the model

\[ Y = \exp(\beta_1^\top X) + \epsilon, \qquad (2.20) \]

where \(X = (X_1, X_2)^\top\), \(X_i \sim N(0, 1)\) are iid, \(\beta_1 = (1, 2)^\top\), the residual \(\epsilon\) follows a standard normal distribution, and the \(X_i\)'s and \(\epsilon\) are mutually independent. In this example, we fit the SIQR model for five different quantile levels, τ = 0.1, 0.25, 0.5, 0.75, 0.9, using the proposed method and that of Wu et al. (2010). The boxplots for Model (2.20) are omitted since they lead to conclusions similar to those of Example 1. Table 2.2 presents the mean values and standard deviations, over the N = 100 simulation runs, of the three estimators of the parametric component β, as well as the mean squared error, \(R(\hat\beta)\), and the average mean check based absolute residuals, \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\). NWQR is more closely concentrated around 2 and gives the smallest \(R(\hat\beta)\) and average \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\) values for all quantile levels. Note the mean values of the NWQR and WYY-2 estimators for τ = 0.9: the NWQR value is affected by an outlier, although NWQR still gives the smallest bias. Also, observe the large values of \(\hat\beta\) and \(R(\hat\beta)\) for

τ = 0.5 for WYY-1. This is due to a very extreme outlier (around 140), which can be observed from the boxplot and which is a consequence of the iterative algorithm. Table 2.2 also reports the coverage of the asymptotic 95% confidence intervals based on NWQR, WYY-2, and the second estimated coefficient of Wu et al. (2010), denoted by WYY in the table. Conclusions similar to those of Example 1 can be drawn regarding the performance of these confidence intervals, with the additional remark that the coverage probability of the WYY-2 intervals deteriorates for the 75th and 90th percentiles.

Table 2.2. Mean values (standard errors in parentheses), \(R(\hat\beta)\) and average \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\), defined in (2.19), for Model (2.20). Also, 95% coverage probability for NWQR, WYY-2 and the second estimated coefficient of Wu et al. (2010), denoted by WYY. Columns are the quantile levels τ; rows are grouped as \(\hat\beta\) (NWQR, WYY-1, WYY-2), \(R(\hat\beta)\) (NWQR, WYY-1, WYY-2), \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\) (NWQR, WYY-1, WYY-2) and 95% coverage probability (NWQR, WYY-2, WYY).

Example 3 (Asymmetric Heteroscedastic Errors): Here the data are generated according to the model

\[ Y = \sin(2\pi \beta_1^\top X) + \frac{(\beta_2^\top X)^{2}}{4}\, \epsilon, \qquad (2.21) \]

where \(X = (X_1, X_2)^\top\), \(X_i \sim U(0, 1)\) are iid, \(\beta_1 = (1, 2)^\top\), \(\beta_2 = (1, 1)^\top\), and the residual \(\epsilon\) follows an exponential distribution with mean 1. We fit the SIQR model

for five different quantile levels, τ = 0.1, 0.25, 0.5, 0.75, 0.9, using the proposed method and that of Wu et al. (2010). Table 2.3 reports the coverage of the asymptotic 95% confidence intervals based on NWQR, WYY-2, and WYY. The mean values and standard deviations over the N = 100 simulation runs, as well as \(R(\hat\beta)\) and average \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\), for the three estimators of the parametric component are not reported here since they reveal the same trends as in the previous two examples. From Table 2.3 we observe that the coverage probabilities of the NWQR intervals are close to the nominal value of 0.95, while those of the WYY-2 estimator tend to be smaller than the nominal value, probably as a consequence of the heteroscedasticity.

Table 2.3. 95% coverage probability for NWQR, WYY-2, and the second estimated coefficient of Wu et al. (2010), denoted by WYY, for Model (2.21). Columns are the quantile levels τ; rows: NWQR, WYY-2, WYY.

Boston Housing Data

For this example we consider an application to the Boston housing data. The data contain 506 observations on 14 variables; the dependent variable of interest is medv, the median value of owner-occupied homes in $1000s, and the other thirteen variables are statistical measurements on the 506 census tracts in suburban Boston from the 1970 census. The data were originally published by Harrison and Rubinfeld (1978) and can be found in the MASS library in R. QR is appropriate for this data set because the response variable is the median price of homes and the y-values larger than or equal to $50,000 have been recorded as $50,000. As noted in Chaudhuri et al. (1997, p. 724), such a truncation in the upper tail of the response makes quantile regression, which is not influenced very much by extreme values of the response, a very appropriate methodology.

Due to the collinearity in the data set, Breiman and Friedman (1985) applied their alternating conditional expectation (ACE) method for selecting the relevant

variables and selected the four covariates RM, TAX, PTRATIO, and LSTAT, which are described below. Many regression studies (Opsomer and Ruppert 1998; Yu and Lu 2004; Wu et al. 2010) have used this data set and, using a logarithmic transformation on the covariates TAX and LSTAT, found a potential relationship between the response medv and these four covariates. Opsomer and Ruppert (1998) considered mean regression and fitted the additive model after removing the observations with outliers on the covariates TAX and LSTAT. Yu and Lu (2004) fitted an additive QR model and Wu et al. (2010) considered the SIQR model. In addition, several studies (Chaudhuri et al. 1997; Wu et al. 2010) considered the relationship between medv and the three covariates RM, LSTAT, and DIS, which are also described below. Chaudhuri et al. (1997) considered the quantile average derivative estimate (qade) regression, while Wu et al. (2010) considered the SIQR model. We apply our proposed methodology using the above two sets of predictors.

First, consider the four covariates:

RM: average number of rooms per house in the area
TAX: full-value property tax (in dollars) per $10,000
PTRATIO: pupil-teacher ratio by town
LSTAT: percentage of the population having lower economic status in the area.

Following previous studies, we take logarithmic transformations of TAX and LSTAT, and center the dependent variable. Let \(X_1\), \(X_2\), \(X_3\) and \(X_4\) denote the standardized RM, log(TAX), PTRATIO and log(LSTAT), respectively, and set \(X = (X_1, X_2, X_3, X_4)^\top\). We consider the SIQR model

\[ Q_{\tau}(\text{medv}|X) = g(X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 \,|\, \beta), \qquad (2.22) \]

for five different quantile levels τ = 0.1, 0.25, 0.5, 0.75, 0.9. The model assumes that RM is a significant predictor, an assumption which is confirmed by the previous studies. An analysis assuming a different significant predictor gave the same results. Note that the dependence of g and β on τ is not made explicit.
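The preprocessing behind Model (2.22) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are made up for the example, the population standard deviation is used for standardizing (the text does not say which convention was applied), and the four observations and the β values are hypothetical placeholders, not rows of the Boston data or fitted coefficients.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - m) / s for v in values]


def center(values):
    """Subtract the sample mean, as done for the response medv."""
    m = mean(values)
    return [v - m for v in values]


def build_covariates(rm, tax, ptratio, lstat):
    """Return standardized RM, log(TAX), PTRATIO, log(LSTAT) as X1..X4."""
    log_tax = [math.log(v) for v in tax]
    log_lstat = [math.log(v) for v in lstat]
    return [standardize(col) for col in (rm, log_tax, ptratio, log_lstat)]


def single_index(columns, beta):
    """Index X1 + beta2*X2 + ... with the first coefficient fixed at 1."""
    coefs = [1.0] + list(beta)
    if len(coefs) != len(columns):
        raise ValueError("beta must have one entry fewer than there are columns")
    n = len(columns[0])
    return [sum(c * col[i] for c, col in zip(coefs, columns)) for i in range(n)]


# Hypothetical toy values, for illustration only.
rm = [5.9, 6.3, 6.8, 7.1]
tax = [220.0, 300.0, 400.0, 660.0]
ptratio = [15.0, 17.5, 19.0, 20.5]
lstat = [3.0, 6.0, 12.0, 24.0]
medv = [34.0, 25.0, 21.0, 14.0]

columns = build_covariates(rm, tax, ptratio, lstat)
medv_centered = center(medv)
index = single_index(columns, beta=[-0.3, -0.2, -0.5])  # placeholder beta
```

Fixing the first coefficient at 1 is what makes the remaining entries of β identifiable, which is why `single_index` takes only the coefficients of \(X_2, X_3, X_4\).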

Table 2.4 gives the estimators and their standard errors resulting from the proposed methodology. The conclusions derived from this table are: (a) PTRATIO seems to have a significant contribution only on the 90th quantile; (b) none of TAX, PTRATIO, LSTAT appears to have a significant contribution on the 50th and 75th quantiles; (c) TAX is more significant than LSTAT for the 10th and 25th quantiles; (d) LSTAT is more significant than TAX for the 90th quantile; (e) RM seems to have an effect opposite to that of the other predictors. Conclusions (a)-(d) differ from those in the aforementioned literature. The difference from the conclusions in Yu and Lu (2004) arises largely because they considered only the absolute values of the coefficients instead of the coefficients' t-values. Similarly, the conclusions of Wu et al. (2010) regarding the relative significance of the predictors are based on the absolute values of their coefficients, even though they did compute standard errors (based on the bootstrap instead of their variance formulas). Finally, the results of Opsomer and Ruppert (1998) are not directly comparable to ours because they considered mean regression using an additive model.

Table 2.4. Proposed parametric vector estimates (standard errors in parentheses) for the Boston housing data for Model (2.22) with five different quantile levels. Columns: τ, RM, log(TAX), PTRATIO, log(LSTAT).

In order to compare the performance of the proposed estimator with that of Wu et al. (2010), we consider the mean check based absolute residuals statistic, \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\), as defined in (2.19). From the left panel of Table 2.6, we note that both methods perform similarly, with the proposed estimator achieving somewhat smaller values for all quantiles.
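The statistic used for this comparison averages the check loss \(\rho_{\tau}(u) = u\,(\tau - 1\{u < 0\})\) over the residuals \(Y_i - \hat{Q}_i\), where \(\hat{Q}_i\) is the fitted conditional quantile at observation i. A minimal sketch is given below; the function names and the three-point data are hypothetical, and the fitted quantiles are taken as given instead of being computed by a local linear fit.

```python
def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def mean_check_residual(y, q_hat, tau):
    """Average check loss of the residuals y_i - q_hat_i at quantile level tau."""
    if len(y) != len(q_hat):
        raise ValueError("y and q_hat must have the same length")
    if not y:
        raise ValueError("at least one observation is required")
    return sum(check_loss(yi - qi, tau) for yi, qi in zip(y, q_hat)) / len(y)


# Hypothetical responses and fitted quantiles, for illustration only.
y = [1.0, 2.0, 4.0]
q_hat = [1.5, 2.0, 3.0]
r_median = mean_check_residual(y, q_hat, tau=0.5)
r_upper = mean_check_residual(y, q_hat, tau=0.9)
```

With these toy numbers the residuals are −0.5, 0 and 1, so at τ = 0.5 the losses are 0.25, 0 and 0.5 and their average is 0.25. A smaller value indicates a fitted quantile curve that tracks the data more closely at that τ, which is how the two methods are ranked in Table 2.6.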

Figure 2.2. Estimated SIQR for the Boston housing data for Model (2.22). The dots are the observations and the curve is the estimated quantile function (axes in each panel: index, medv).

Each panel in Figure 2.2 superimposes the scatterplots of \((Y_i, \hat\beta_1^\top X_i)\) and \((\hat{Q}^{LL}_{\tau,\hat\beta_1}(Y|\hat\beta_1^\top X_i), \hat\beta_1^\top X_i)\), i = 1, ..., 506. These plots suggest that the estimated conditional quantile function provides a good fit to the data. They also indicate the presence of heteroscedasticity.

Next, we consider the three covariates RM, LSTAT and DIS, where:

RM: average number of rooms per house in the area

LSTAT: percentage of the population having lower economic status in the area
DIS: weighted distance to five Boston employment centers from houses of the area.

Let \(X_1\), \(X_2\) and \(X_3\) denote the standardized RM, LSTAT and DIS, respectively, and set \(X = (X_1, X_2, X_3)^\top\). We consider the SIQR model

\[ Q_{\tau}(\text{medv}|X) = g(X_1 + \beta_2 X_2 + \beta_3 X_3 \,|\, \beta), \qquad (2.23) \]

for five different quantile levels τ = 0.1, 0.25, 0.5, 0.75, 0.9. The model assumes that RM is a significant predictor, an assumption which is confirmed by the previous studies. An analysis assuming that LSTAT is the significant predictor gave the same results.

Table 2.5 gives the estimators and their standard errors resulting from the proposed methodology. The conclusions derived from this table are: (a) LSTAT is the most important covariate for all quantile levels, as judged by comparing the coefficients' t-values; (b) the effect of LSTAT is essentially stable across different quantiles; (c) DIS has a significant contribution only on the 90th quantile. Conclusions (a) and (b) are in conformity with those of Chaudhuri et al. (1997) and Wu et al. (2010), with the difference that the aforementioned authors draw their conclusions by comparing only the absolute values of the normalized coefficients, without calculating the coefficients' t-values. Our conclusion (c), however, is a more definitive statement regarding the relevance of DIS, since none of the above investigators mentioned anything about the significance of DIS.

The right panel of Table 2.6 gives \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\) for the five different quantile levels, contrasting the proposed estimator with that of Wu et al. (2010). We remark that Wu et al. (2010) display the \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\) values resulting from the qade of Chaudhuri et al. (1997). Because these values are considerably higher, they are not displayed in Table 2.6 for either model. For τ = 0.1, 0.25, 0.5, 0.75 the two methods give similar \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\) values, with NWQR resulting in somewhat larger values for τ = 0.1, 0.25 and somewhat smaller values for τ = 0.5, 0.75; for τ = 0.9, however, NWQR results in a considerably lower \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\) value. In terms of comparing the two models, Model (2.22) seems better for τ = 0.1, 0.25, 0.5 according to both WYY and NWQR, while Model (2.23) seems better for τ = 0.75 according to WYY and for τ = 0.75, 0.9 according to NWQR. Lastly, our method suggests that the best fit

corresponds to the lower quantile τ = 0.1 for both models, a conclusion which is in conformity with the aforementioned literature.

Table 2.5. Proposed parametric vector estimates (standard errors in parentheses) for the Boston housing data for Model (2.23) with five different quantile levels. Columns: τ, RM, LSTAT, DIS.

Table 2.6. Mean check based absolute residuals, \(R_{\tau}(\hat{Q}^{LL}_{\tau,\hat\beta_1})\), defined in (2.19), for Models (2.22) and (2.23); WYY denotes the method proposed by Wu et al. (2010) and NWQR denotes the proposed methodology. Columns: τ, then WYY and NWQR for Model (2.22), then WYY and NWQR for Model (2.23).

Finally, each panel in Figure 2.3 superimposes the scatterplots of \((Y_i, \hat\beta_1^\top X_i)\) and \((\hat{Q}^{LL}_{\tau,\hat\beta_1}(Y|\hat\beta_1^\top X_i), \hat\beta_1^\top X_i)\), i = 1, ..., 506. Again, these plots suggest that the estimated conditional quantile function provides a good fit to the data, and they also indicate the presence of heteroscedasticity.
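The significance statements in the Boston housing analysis rest on coefficient t-values, that is, each estimate divided by its standard error. The sketch below shows that calculation with a standard normal reference distribution, which is an assumption of this example motivated by the asymptotic normality of the estimator. The function names and the numeric inputs are hypothetical and are not values from Tables 2.4 or 2.5.

```python
import math


def t_value(estimate, std_error):
    """Ratio of a coefficient estimate to its standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / std_error


def two_sided_p_value(t):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def is_significant(estimate, std_error, level=0.05):
    """True when the two-sided p-value falls below the chosen level."""
    return two_sided_p_value(t_value(estimate, std_error)) < level


# Hypothetical coefficient estimates and standard errors, for illustration only.
large_effect = is_significant(-0.50, 0.10)   # t = -5
small_effect = is_significant(0.10, 0.10)    # t = 1
```

Two coefficients of similar absolute size can lead to different conclusions once their standard errors are taken into account, which is the distinction drawn above between comparing absolute coefficient values and comparing t-values.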


More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

Local Polynomial Modelling and Its Applications

Local Polynomial Modelling and Its Applications Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,

More information

Testing Statistical Hypotheses

Testing Statistical Hypotheses E.L. Lehmann Joseph P. Romano Testing Statistical Hypotheses Third Edition 4y Springer Preface vii I Small-Sample Theory 1 1 The General Decision Problem 3 1.1 Statistical Inference and Statistical Decisions

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade Denis Chetverikov Brad Larsen Christopher Palmer UCLA, Stanford and NBER, UC Berkeley September

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood

Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October

More information

ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI

ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI ESTIMATING STATISTICAL CHARACTERISTICS UNDER INTERVAL UNCERTAINTY AND CONSTRAINTS: MEAN, VARIANCE, COVARIANCE, AND CORRELATION ALI JALAL-KAMALI Department of Computer Science APPROVED: Vladik Kreinovich,

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

An Introduction to Nonstationary Time Series Analysis

An Introduction to Nonstationary Time Series Analysis An Introduction to Analysis Ting Zhang 1 tingz@bu.edu Department of Mathematics and Statistics Boston University August 15, 2016 Boston University/Keio University Workshop 2016 A Presentation Friendly

More information

Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density

Bayesian estimation of bandwidths for a nonparametric regression model with a flexible error density ISSN 1440-771X Australia Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/ Bayesian estimation of bandwidths for a nonparametric regression model

More information

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data

Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Confidence Intervals for Low-dimensional Parameters with High-dimensional Data Cun-Hui Zhang and Stephanie S. Zhang Rutgers University and Columbia University September 14, 2012 Outline Introduction Methodology

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Lecture 14: Variable Selection - Beyond LASSO

Lecture 14: Variable Selection - Beyond LASSO Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)

More information

Bayesian spatial quantile regression

Bayesian spatial quantile regression Brian J. Reich and Montserrat Fuentes North Carolina State University and David B. Dunson Duke University E-mail:reich@stat.ncsu.edu Tropospheric ozone Tropospheric ozone has been linked with several adverse

More information

A Bootstrap Test for Conditional Symmetry

A Bootstrap Test for Conditional Symmetry ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School

More information

Data Mining Techniques. Lecture 2: Regression

Data Mining Techniques. Lecture 2: Regression Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 2: Regression Jan-Willem van de Meent (credit: Yijun Zhao, Marc Toussaint, Bishop) Administrativa Instructor Jan-Willem van de Meent Email:

More information

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE Estimating the error distribution in nonparametric multiple regression with applications to model testing Natalie Neumeyer & Ingrid Van Keilegom Preprint No. 2008-01 July 2008 DEPARTMENT MATHEMATIK ARBEITSBEREICH

More information

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION VICTOR CHERNOZHUKOV CHRISTIAN HANSEN MICHAEL JANSSON Abstract. We consider asymptotic and finite-sample confidence bounds in instrumental

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Tobit and Interval Censored Regression Model

Tobit and Interval Censored Regression Model Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 2, Number (206), pp. 98-994 Research India Publications http://www.ripublication.com Tobit and Interval Censored Regression Model Raidani

More information

A Goodness-of-fit Test for Copulas

A Goodness-of-fit Test for Copulas A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and

More information

arxiv: v1 [stat.me] 23 Dec 2017 Abstract

arxiv: v1 [stat.me] 23 Dec 2017 Abstract Distribution Regression Xin Chen Xuejun Ma Wang Zhou Department of Statistics and Applied Probability, National University of Singapore stacx@nus.edu.sg stamax@nus.edu.sg stazw@nus.edu.sg arxiv:1712.08781v1

More information

Efficient Estimation in Convex Single Index Models 1

Efficient Estimation in Convex Single Index Models 1 1/28 Efficient Estimation in Convex Single Index Models 1 Rohit Patra University of Florida http://arxiv.org/abs/1708.00145 1 Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)

More information

Truncated Regression Model and Nonparametric Estimation for Gifted and Talented Education Program

Truncated Regression Model and Nonparametric Estimation for Gifted and Talented Education Program Global Journal of Pure and Applied Mathematics. ISSN 0973-768 Volume 2, Number (206), pp. 995-002 Research India Publications http://www.ripublication.com Truncated Regression Model and Nonparametric Estimation

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility

Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility American Economic Review: Papers & Proceedings 2016, 106(5): 400 404 http://dx.doi.org/10.1257/aer.p20161082 Fixed Effects, Invariance, and Spatial Variation in Intergenerational Mobility By Gary Chamberlain*

More information

Modelling Non-linear and Non-stationary Time Series

Modelling Non-linear and Non-stationary Time Series Modelling Non-linear and Non-stationary Time Series Chapter 2: Non-parametric methods Henrik Madsen Advanced Time Series Analysis September 206 Henrik Madsen (02427 Adv. TS Analysis) Lecture Notes September

More information

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation

Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based

More information

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Statistica Sinica 19 (2009), 71-81 SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES Song Xi Chen 1,2 and Chiu Min Wong 3 1 Iowa State University, 2 Peking University and

More information

3 Joint Distributions 71

3 Joint Distributions 71 2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random

More information

Expecting the Unexpected: Uniform Quantile Regression Bands with an application to Investor Sentiments

Expecting the Unexpected: Uniform Quantile Regression Bands with an application to Investor Sentiments Expecting the Unexpected: Uniform Bands with an application to Investor Sentiments Boston University November 16, 2016 Econometric Analysis of Heterogeneity in Financial Markets Using s Chapter 1: Expecting

More information

Nonparametric Methods

Nonparametric Methods Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis

More information

Program Evaluation with High-Dimensional Data

Program Evaluation with High-Dimensional Data Program Evaluation with High-Dimensional Data Alexandre Belloni Duke Victor Chernozhukov MIT Iván Fernández-Val BU Christian Hansen Booth ESWC 215 August 17, 215 Introduction Goal is to perform inference

More information

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions CS 147: Computer Systems Performance Analysis Summarizing Variability and Determining Distributions 1

More information

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Bickel Rosenblatt test

Bickel Rosenblatt test University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

More information

Stock Sampling with Interval-Censored Elapsed Duration: A Monte Carlo Analysis

Stock Sampling with Interval-Censored Elapsed Duration: A Monte Carlo Analysis Stock Sampling with Interval-Censored Elapsed Duration: A Monte Carlo Analysis Michael P. Babington and Javier Cano-Urbina August 31, 2018 Abstract Duration data obtained from a given stock of individuals

More information

WEIGHTED LIKELIHOOD NEGATIVE BINOMIAL REGRESSION

WEIGHTED LIKELIHOOD NEGATIVE BINOMIAL REGRESSION WEIGHTED LIKELIHOOD NEGATIVE BINOMIAL REGRESSION Michael Amiguet 1, Alfio Marazzi 1, Victor Yohai 2 1 - University of Lausanne, Institute for Social and Preventive Medicine, Lausanne, Switzerland 2 - University

More information

Median Cross-Validation

Median Cross-Validation Median Cross-Validation Chi-Wai Yu 1, and Bertrand Clarke 2 1 Department of Mathematics Hong Kong University of Science and Technology 2 Department of Medicine University of Miami IISA 2011 Outline Motivational

More information

ON CONCURVITY IN NONLINEAR AND NONPARAMETRIC REGRESSION MODELS

ON CONCURVITY IN NONLINEAR AND NONPARAMETRIC REGRESSION MODELS STATISTICA, anno LXXIV, n. 1, 2014 ON CONCURVITY IN NONLINEAR AND NONPARAMETRIC REGRESSION MODELS Sonia Amodio Department of Economics and Statistics, University of Naples Federico II, Via Cinthia 21,

More information