Variable Selection in Measurement Error Models
Yanyuan Ma
Institut de Statistique, Université de Neuchâtel, Pierre à Mazel 7, 2000 Neuchâtel

Runze Li
Department of Statistics, Pennsylvania State University, University Park, PA

Abstract

Compared with ordinary regression models, the computational cost of estimating parameters in general measurement error models is often much higher, because the estimation procedures typically require solving integral equations. In addition, natural criterion functions are often unavailable for general measurement error models. Thus, the traditional variable selection procedures become infeasible in the measurement error model context. In this paper, we develop a new framework for variable selection in measurement error models via penalized estimating equations. We first propose a new class of variable selection procedures for general parametric measurement error models and for general semiparametric measurement error models, and study the asymptotic properties of the proposed procedures. Then, under certain regularity conditions and with a properly chosen regularization parameter, we demonstrate that the proposed procedures perform as well as an oracle procedure. We assess the finite sample performance by Monte Carlo simulation studies and illustrate the proposed methodology through the empirical analysis of a real data set.

AMS 2000 subject classifications: primary 62G08, 62G10; secondary 62G20. Key words: errors in variables; estimating equations; kernel regression; instrumental variables; latent variables; local efficiency; measurement error; nonconcave penalized likelihood; SCAD; semiparametric methods. Li's research was supported by NSF grants DMS

1 Introduction

Variable selection is fundamental in high-dimensional statistical modeling and has received much attention in the recent literature. Frank and Friedman (1993) proposed bridge regression for variable selection in linear models. Tibshirani (1996) proposed the LASSO method for variable selection in linear models. Fan and Li (2001) proposed a nonconcave penalized likelihood method for variable selection in likelihood-based models.

The nonconcave penalized likelihood approach has been further extended to Cox's model for survival data (Fan and Li, 2002) and to partially linear models for longitudinal data (Fan and Li, 2004). The LASSO method has become very popular in the machine learning literature; the whole solution path of the LASSO can be obtained by the least angle regression (LARS) algorithm (Efron et al., 2004). Yuan and Lin (2005) provided insights into the LASSO method from an empirical Bayes point of view, while Yuan and Lin (2006) extended the LASSO method to grouped variable selection in linear models. Zou (2006) proposed the adaptive LASSO for linear models to reduce the estimation bias of the original LASSO. Zhang and Lu (2007) studied the adaptive LASSO for the Cox model. Fan and Li (2006) presented a review of feature selection in high-dimensional modeling.

In many situations, covariates can only be measured imprecisely or indirectly, resulting in so-called measurement error models, or errors-in-variables models, in the literature. Various statistical procedures have been developed for analyzing measurement error models (Fuller, 1987; Carroll et al., 2006). Tsiatis and Ma (2004) proposed a locally efficient estimator for parametric measurement error models, and Ma and Carroll (2006) further extended the locally efficient estimation procedure to semiparametric measurement error models. To the best of our knowledge, variable selection for measurement error models has not yet been systematically studied. This paper develops a class of variable selection procedures for both parametric measurement error models and semiparametric measurement error models. Liang and Li (2005) extended the nonconcave penalized likelihood method to linear and partially linear measurement error models, but their method is not applicable to measurement error models beyond the linear or partially linear setting, such as the logistic measurement error model and generalized partially linear measurement error models.

Variable selection for general parametric or semiparametric measurement error models is much more challenging than for the linear or partially linear models. One major difficulty associated with this topic is the lack of a likelihood function. In fact, classical model selection methods such as BIC and AIC, as well as the aforementioned penalized likelihood methods, all require computing the likelihood, which is typically formidable in measurement error models because of the difficulty in obtaining the distribution of the error-prone covariates. Although a reasonable criterion function can be used in place of the likelihood, the difficulty persists: except for very special models, such as the linear or partially linear cases, even a criterion function is unavailable.

To develop variable selection procedures for both parametric and semiparametric measurement error models, we propose a penalized estimating equation method and systematically study the asymptotic properties of the proposed procedures. We consider two scenarios: one in which the number of regression coefficients is fixed and finite, and one in which the number of regression coefficients diverges as the sample size increases.

We demonstrate how the convergence rate of the resulting estimator depends on the regularization parameter. We further show that, with proper choices of the regularization parameter and the penalty function, the penalized estimating equation estimator possesses the oracle property (Fan and Li, 2001). It is desirable to have an automatic, data-driven method to select the regularization parameter; we propose GCV-type and BIC-type tuning parameter selectors for the proposed penalized estimating equation method. Monte Carlo simulation studies are conducted to assess finite sample performance in terms of model complexity and model error.

The rest of the paper is organized as follows. We propose a new class of variable selection procedures for parametric measurement error models and study the asymptotic properties of the proposed procedures in Section 2. We develop a new variable selection procedure for semiparametric measurement error models in Section 3. Implementation issues and numerical examples are presented in Section 4, where we describe a data-driven automatic tuning parameter selection method (Section 4.1), define the concept of approximate model error to evaluate the selected model (Section 4.2), carry out a simulation study to assess the finite sample performance of the proposed procedures (Section 4.3), and illustrate our method on an example (Section 4.4). Technical details are collected in Section 5. We give some discussion in Section 6.

2 Parametric measurement error models

A general parametric measurement error model has two parts and can be written as

$$p_{Y|X,Z}(Y\mid X,Z,\beta) \quad\text{and}\quad p_{W|X,Z}(W\mid X,Z,\xi). \qquad (1)$$

The main model is $p_{Y|X,Z}(Y\mid X,Z,\beta)$, the conditional probability density function (pdf) of the response variable $Y$ given the covariates measured with error, $X$, and the covariates measured without error, $Z$. The error model is $p_{W|X,Z}(W\mid X,Z,\xi)$, where $W$ is an observable surrogate of $X$. The parameter $\beta$ is a $d$-dimensional regression coefficient, $\xi$ is a finite dimensional parameter, and our main interest is in selecting the relevant subset of covariates in $X$ and $Z$ and estimating the corresponding parameters contained in $\beta$. Typically, $\xi$ is a nuisance parameter, and its estimation usually requires multiple observations or instrumental variables. As in the literature, for simplicity we assume in the main context of this paper that the error model $p_{W|X,Z}(W\mid X,Z)$ is completely known, and hence $\xi$ is suppressed. We discuss the treatment of the unknown $\xi$ case in Section 6. The observed data are of the form $(W_i, Z_i, Y_i)$, $i = 1,\ldots,n$.

Denote by $S^*_\beta$ the purported score function. That is,

$$S^*_\beta(W,Z,Y) = \frac{\partial}{\partial\beta}\log\int p_{W|X,Z}(W\mid X,Z)\,p_{Y|X,Z}(Y\mid X,Z)\,p^*_{X|Z}(X\mid Z)\,d\mu(X),$$

where $p^*_{X|Z}(X\mid Z)$ is a conditional pdf that one posits, which may or may not equal the true pdf $p_{X|Z}(X\mid Z)$. The function $a(X,Z)$ satisfies

$$E[E^*\{a(X,Z)\mid W,Z,Y\}\mid X,Z] = E\{S^*_\beta(W,Z,Y)\mid X,Z\},$$

where $E^*$ indicates that the expectation is calculated using the posited $p^*_{X|Z}(X\mid Z)$. Let

$$S^*_{\rm eff}(W,Z,Y) = S^*_\beta(W,Z,Y) - E^*\{a(X,Z)\mid W,Z,Y\}.$$

Define the penalized estimating equations for model (1) to be

$$\sum_{i=1}^n S^*_{\rm eff}(W_i,Z_i,Y_i,\beta) - n\dot p_\lambda(\beta) = 0, \qquad (2)$$

where $\dot p_\lambda(\beta) = \{p'_\lambda(\beta_1),\ldots,p'_\lambda(\beta_d)\}^T$ and $p'_\lambda(\cdot)$ is the first-order derivative of a penalty function $p_\lambda(\cdot)$. In practice, we may allow different coefficients to have penalty functions with different regularization parameters; for example, we may want to keep certain variables in the model and therefore not penalize their coefficients. For ease of presentation, we assume in this paper that the penalty function and the regularization parameter are the same for all coefficients.

The choice of the penalty function has been studied in depth in Fan and Li (2001). The penalties used in the classical variable selection criteria, such as AIC and BIC, cannot be applied to the penalized estimating equations. The $L_q$ penalty, namely $p_\lambda(\theta) = q^{-1}\lambda|\theta|^q$, was proposed for bridge regression in Frank and Friedman (1993). Fan and Li (2001) advocated the use of the SCAD penalty, whose first-order derivative is

$$p'_\lambda(\theta) = \lambda\left\{I(|\theta|\le\lambda) + \frac{(a\lambda - |\theta|)_+}{(a-1)\lambda}\,I(|\theta| > \lambda)\right\}{\rm sign}(\theta), \qquad (3)$$

where $a > 2$ and ${\rm sign}(\cdot)$ is the sign function, i.e., ${\rm sign}(\theta) = -1$, $0$, and $1$ when $\theta < 0$, $\theta = 0$ and $\theta > 0$, respectively. Fan and Li (2001) further suggested using $a = 3.7$ from a Bayesian point of view. With a proper choice of penalty function, such as the SCAD penalty, the resulting estimate contains some coefficients that are exactly zero. This is equivalent to excluding the corresponding variables from the final selected model, thus achieving the purpose of variable selection for parametric measurement error models.

We now study the asymptotic properties of the resulting estimator of the penalized estimating equations. Denote by $\beta_0 = (\beta_{10},\ldots,\beta_{d0})^T$ the true value of $\beta$. Let

$$a_n = \max\{p'_{\lambda_n}(|\beta_{j0}|) : \beta_{j0}\ne 0\} \qquad (4)$$

and

$$b_n = \max\{p''_{\lambda_n}(|\beta_{j0}|) : \beta_{j0}\ne 0\}, \qquad (5)$$

where we write $\lambda$ as $\lambda_n$ to emphasize that $\lambda$ depends on the sample size $n$.
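The SCAD derivative (3) is simple to evaluate numerically. The following minimal Python sketch (our own illustration; the function name is not from the paper) implements $p'_\lambda(\theta)$ with the suggested $a = 3.7$ and shows how the penalty leaves large coefficients essentially unpenalized.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First-order derivative p'_lambda(theta) of the SCAD penalty, as in (3).

    Equals lam*sign(theta) for |theta| <= lam, decays linearly to zero on
    (lam, a*lam], and is exactly zero for |theta| > a*lam, so that large
    coefficients are not shrunk.
    """
    theta = np.asarray(theta, dtype=float)
    abs_t = np.abs(theta)
    inner = lam * (abs_t <= lam)
    outer = np.maximum(a * lam - abs_t, 0.0) / (a - 1.0) * (abs_t > lam)
    return (inner + outer) * np.sign(theta)

# Example: with lam = 0.5, the derivative vanishes beyond |theta| > 3.7 * 0.5 = 1.85
# scad_derivative(np.array([-2.0, -0.3, 0.0, 0.3, 1.0, 2.0]), 0.5)
# -> [ 0. , -0.5, 0. , 0.5, 0.3148..., 0. ]
```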

We first establish the convergence rate of the penalized estimating equation estimator for settings in which the dimension of $\beta$ is fixed and finite.

Theorem 1. Suppose that the regularity conditions (A1)-(A4) in Section 5 hold. If both $a_n$ and $b_n$ tend to 0 as $n\to\infty$, then, with probability tending to one, there exists a root of (2), denoted $\hat\beta$, such that $\|\hat\beta - \beta_0\| = O_p(n^{-1/2} + a_n)$.

Before we demonstrate that the resulting estimator possesses the oracle property, let us introduce some notation. Without loss of generality, assume $\beta_0 = (\beta_{I0}^T, \beta_{II0}^T)^T$, where in the true model every element of $\beta_{I0}$ is nonzero while $\beta_{II0} = 0$. Throughout this paper, denote the dimension of $\beta_I$ by $d_1$ and that of $\beta_{II}$ by $d_2$, and let $d = d_1 + d_2$. Furthermore, denote

$$b = \{p'_{\lambda_n}(\beta_{10}),\ldots,p'_{\lambda_n}(\beta_{d_1 0})\}^T, \qquad (6)$$

$$\Sigma = {\rm diag}\{p''_{\lambda_n}(\beta_{10}),\ldots,p''_{\lambda_n}(\beta_{d_1 0})\}, \qquad (7)$$

and denote by $S^*_{{\rm eff},I}(\beta)$ the vector of the first $d_1$ components of $S^*_{\rm eff}(W,Z,Y,\beta)$.

Theorem 2. Under the regularity conditions of Theorem 1, if

$$\liminf_{n\to\infty}\ \liminf_{\theta\to 0+}\ \sqrt{n}\,p'_{\lambda_n}(\theta) \to \infty, \qquad (8)$$

then with probability tending to one, any root-$n$ consistent solution $\hat\beta = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of (2) must satisfy:

(i) $\hat\beta_{II} = 0$;

(ii)
$$\sqrt{n}\left[\hat\beta_I - \beta_{I0} - \left\{E\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T} - \Sigma\right\}^{-1}b\right] \xrightarrow{D} N\left(0,\ \left\{E\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T} - \Sigma\right\}^{-1}E\{S^*_{{\rm eff},I}(\beta_{I0})S^{*T}_{{\rm eff},I}(\beta_{I0})\}\left\{E\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T} - \Sigma\right\}^{-T}\right),$$

where $\xrightarrow{D}$ stands for convergence in distribution, and we use the notation $M^{-T}$ to denote $(M^{-1})^T$ for a matrix $M$.

For some penalty functions, including the SCAD penalty, $b$ and $\Sigma$ are zero when $\lambda_n$ is sufficiently small, so we actually have
$$\sqrt{n}(\hat\beta_I - \beta_{I0}) \to N\left(0,\ E(\partial S^*_{{\rm eff},I}/\partial\beta_I^T)^{-1}\,E(S^*_{{\rm eff},I}S^{*T}_{{\rm eff},I})\,E(\partial S^*_{{\rm eff},I}/\partial\beta_I^T)^{-T}\right)$$
in distribution. In other words, with probability tending to 1, the penalized estimator performs the same as the locally efficient estimator under the correct model; hence it has the celebrated oracle property.

In Theorems 1 and 2, both $d_1$ and $d$ are fixed as $n\to\infty$. Concern about model bias often prompts us to build models that contain many variables, especially when the sample size is large. A reasonable way to capture this tendency is to consider the situation where the dimension of the parameter $\beta$ increases along with the sample size $n$. Obviously, the method in (2) (and also in (11) later on for the semiparametric model case) can still be used in this case; the operation does not involve any modification relative to the fixed number of parameters case. However, the large sample properties of the same procedures will inevitably behave differently when the number of parameters increases. We therefore study the sampling properties of the penalized estimating equation estimator under the setting in which both $d_1$ and $d$ tend to infinity as $n$ goes to infinity. To reflect that $d$ and $d_1$ change with $n$, we use the notation $d_{n1}$, $d_n$. The issue of a diverging number of parameters has also been considered in Fan and Peng (2004) in the context of penalized likelihood. Because semiparametric efficiency is not defined under this setting, our result no longer implies any local efficiency. However, when $\lambda_n$ is sufficiently small, both $b$ and $\Sigma$ are zero, so the oracle property still holds. The main results are given in the following two theorems, whose proofs are given in Section 5.

Theorem 3. Under the regularity conditions (B1)-(B6) in Section 5, and if $d_n^4 n^{-1}\to 0$ and $\lambda_n\to 0$ as $n\to\infty$, then with probability tending to one, there exists a root of (2), denoted $\hat\beta_n$, such that $\|\hat\beta_n - \beta_{n0}\| = O_p\{\sqrt{d_n}(n^{-1/2} + a_n)\}$.

Theorem 4. Under the regularity conditions (B1)-(B6) in Section 5, assume $\lambda_n\to 0$ and $d_n^5/n\to 0$ as $n\to\infty$. If

$$\liminf_{n\to\infty}\ \liminf_{\theta\to 0+}\ \sqrt{n/d_n}\,p'_{\lambda_n}(\theta) \to \infty, \qquad (9)$$

then with probability tending to one, any $\sqrt{n/d_n}$-consistent solution $\hat\beta_n = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of (2) must satisfy:

(i) $\hat\beta_{II} = 0$;

(ii) for any $d_{n1}\times 1$ vector $v$ such that $v^Tv = 1$,
$$\sqrt{n}\,v^T\left[E\{S^*_{{\rm eff},I}(\beta_{I0})S^{*T}_{{\rm eff},I}(\beta_{I0})\}\right]^{-1/2}\left\{E\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T} - \Sigma\right\}\left[\hat\beta_I - \beta_{I0} - \left\{E\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T} - \Sigma\right\}^{-1}b\right] \xrightarrow{D} N(0,1).$$

3 Semiparametric measurement error models

The semiparametric measurement error model we consider here also has two parts. The major difference from its parametric counterpart is that the main model contains an unknown function of one of the observable covariates. For notational convenience, we split the observable covariates $Z$ into two parts, $Z$ and $S$, where now $Z$ denotes an observable covariate that enters the model nonparametrically, say through $\theta(Z)$, and $S$ denotes the observable covariates that enter the model parametrically. The resulting semiparametric measurement error model can be summarized as

$$p_{Y|X,Z,S}\{Y\mid X,Z,S,\beta,\theta(Z)\} \quad\text{and}\quad p_{W|X,Z,S}(W\mid X,Z,S). \qquad (10)$$

The method we propose in this situation admits a similarly simple form,

$$\sum_{i=1}^n L\{W_i,Z_i,S_i,Y_i,\beta,\hat\theta_i\} - n\dot p_\lambda(\beta) = 0, \qquad (11)$$

where $\dot p_\lambda(\beta)$ has the same form as in (2). However, the computation of $L$ is more involved, as we now describe. If we replace $\theta(Z)$ with a single unknown parameter $\alpha$ and append $\alpha$ to $\beta$, we obtain a parametric measurement error model with parameters $(\beta^T,\alpha)^T$. For this parametric model, we can compute its corresponding $S^*_{\rm eff}$ as we did in Section 2. Decompose $S^*_{\rm eff}$ into $L(W,Z,S,Y,\beta,\alpha)$ and $\Psi(W,Z,S,Y,\beta,\alpha)$, where $L$ has the same dimension as $\beta$ and $\Psi$ is a one-dimensional function. We solve for $\hat\theta_i$, $i = 1,\ldots,n$, from

$$\sum_{i=1}^n K_h(z_i - z_1)\Psi(w_i,z_i,s_i,y_i;\beta,\theta_1) = 0,$$
$$\vdots$$
$$\sum_{i=1}^n K_h(z_i - z_n)\Psi(w_i,z_i,s_i,y_i;\beta,\theta_n) = 0, \qquad (12)$$

where $K_h(\cdot) = h^{-1}K(\cdot/h)$, $K$ is a kernel function, and $h$ is a bandwidth. Inserting the $\hat\theta_i$'s into $L$ in (11), we obtain a complete description of the estimator. Note that since $\hat\theta_i$ depends on $\beta$, a more precise notation for $\hat\theta_i$ is $\hat\theta_i(\beta)$.
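To make the profiling step in (12) concrete, the following sketch solves the kernel-weighted estimating equation for $\hat\theta_i$ at each observed $z_i$ with a scalar root finder. It is a simplified illustration, not the authors' implementation: `psi` stands for a generic user-supplied $\Psi(w,z,s,y;\beta,\theta)$, a Gaussian kernel is assumed, and the bracket passed to the root finder is assumed to contain a sign change.

```python
import numpy as np
from scipy.optimize import brentq

def gaussian_kernel(u):
    """A Gaussian kernel; the paper only requires a generic kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def profile_theta(psi, w, z, s, y, beta, h, bracket=(-10.0, 10.0)):
    """Solve sum_i K_h(z_i - z_j) Psi(w_i, z_i, s_i, y_i; beta, theta_j) = 0
    for theta_j at every observation j, as in eq. (12).

    psi : callable(w, z, s, y, beta, theta) -> vector of length n.
    Returns the profiled values theta_hat_j(beta), j = 1, ..., n.
    """
    n = len(y)
    theta_hat = np.empty(n)
    for j in range(n):
        weights = gaussian_kernel((z - z[j]) / h) / h      # K_h(z_i - z_j)

        def local_eq(theta_j):
            return np.sum(weights * psi(w, z, s, y, beta, theta_j))

        # brentq assumes local_eq changes sign over the bracket
        theta_hat[j] = brentq(local_eq, *bracket)
    return theta_hat
```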

Similar results hold for the semiparametric penalized estimator. In the semiparametric model setting, define $L_I$ to be the first $d_1$ components of $L$, $L_{I\beta_I}$ the partial derivative of $L_I$ with respect to $\beta_I$, $L_{I\theta}$ the partial derivative of $L_I$ with respect to $\theta$, $\Psi_\theta$ the partial derivative of $\Psi$ with respect to $\theta$, and $\Psi_{\beta_I}$ the partial derivative of $\Psi$ with respect to $\beta_I$. Also define $\Omega(Z) = E(\Psi_\theta\mid Z)$, $U_I(Z) = E(L_{I\theta}\mid Z)/\Omega(Z)$ and $\theta_{\beta_I}(Z) = -E(\Psi_{\beta_I}\mid Z)/\Omega(Z)$. Note that the dimensions of $\beta_I$ and $\beta$ here, $d_1$ and $d$, are fixed even when the sample size $n$ increases.

Theorem 5. Under the regularity conditions (C1)-(C6) in Section 5, if $\lambda_n\to 0$ as $n\to\infty$, then with probability tending to one, there exists a root of (11), denoted $\hat\beta$, such that $\|\hat\beta - \beta_0\| = O_p(n^{-1/2} + a_n)$.

Further define
$$A = E\left[L_{I\beta_I}\{W,Z,S,Y,\beta_0,\theta_0(Z)\} + L_{I\theta}\{W,Z,S,Y,\beta_0,\theta_0(Z)\}\,\theta^T_{\beta_I}(Z,\beta_0)\right],$$
$$B = {\rm cov}\left[L_I\{W,Z,S,Y,\beta_0,\theta_0(Z)\} - \Psi\{W,Z,S,Y,\beta_0,\theta_0(Z)\}\,U_I(Z)\right];$$
we then obtain the following result.

Theorem 6. Under the regularity conditions (C1)-(C6) in Section 5, if $\lambda_n\to 0$ as $n\to\infty$ and (8) holds, then with probability tending to one, any root-$n$ consistent solution $\hat\beta = (\hat\beta_I^T,\hat\beta_{II}^T)^T$ of (11) must satisfy:

(i) $\hat\beta_{II} = 0$;

(ii) $\hat\beta_I$ has the asymptotic normality
$$\sqrt{n}\left\{\hat\beta_I - \beta_{I0} - (A - \Sigma)^{-1}b\right\} \xrightarrow{D} N\left\{0,\ (A-\Sigma)^{-1}B(A-\Sigma)^{-T}\right\}.$$

For some penalty functions, including the SCAD penalty, $b$ and $\Sigma$ are zero when $\lambda_n$ is sufficiently small, so we have $\sqrt{n}(\hat\beta_I - \beta_{I0}) \xrightarrow{D} N(0, A^{-1}BA^{-T})$. These are exactly the first order asymptotic properties of the locally efficient estimator for this semiparametric model when only the first $d_1$ parameters are included in the model; that is, the procedure possesses the oracle property.

Next, in the following two theorems, we present the asymptotic properties of the estimator for increasing dimensions $d_n$ and $d_{n1}$.

Theorem 7. Under the regularity conditions (D1)-(D7) in Section 5, and if $d_n^4 n^{-1}\to 0$ and $\lambda_n\to 0$ as $n\to\infty$, then with probability tending to one, there exists a root of (11), denoted $\hat\beta_n$, such that $\|\hat\beta_n - \beta_{n0}\| = O_p\{\sqrt{d_n}(n^{-1/2}+a_n)\}$.

Theorem 8. Under the regularity conditions (D1)-(D7) in Section 5, if $\lambda_n\to 0$, $d_n^5/n\to 0$, and (9) holds, then with probability tending to one, any $\sqrt{n/d_n}$-consistent solution $\hat\beta_n = (\hat\beta_I^T,\hat\beta_{II}^T)^T$ obtained from (11) must satisfy:

(i) $\hat\beta_{II} = 0$;

(ii) for any $d_{n1}\times 1$ vector $v$ such that $v^Tv = 1$,
$$\sqrt{n/d_n}\,v^TB^{-1/2}(A-\Sigma)\left\{\hat\beta_I - \beta_{I0} - (A-\Sigma)^{-1}b\right\} \xrightarrow{D} N(0,1).$$

4 Numerical studies and application

In this section, we assess the finite sample performance of the proposed procedure by Monte Carlo simulation and illustrate the proposed methodology by an empirical analysis of the Framingham heart study data. In our simulations, we concentrate on the performance of the proposed procedure for a quadratic logistic measurement error model and a partially linear logistic measurement error model, in terms of model complexity and model error.

4.1 Tuning parameter selection

To implement the Newton-Raphson algorithm for solving the penalized estimating equations, we locally approximate the first-order derivative of the penalty function by a linear function, following the idea of the local quadratic approximation algorithm proposed in Fan and Li (2001). Specifically, suppose that at the $k$th step of the iteration we obtain the value $\beta^{(k)}$. Then, for $\beta^{(k)}_j$ not very close to zero,
$$p'_\lambda(\beta_j) = p'_\lambda(|\beta_j|)\,{\rm sign}(\beta_j) \approx \frac{p'_\lambda(|\beta^{(k)}_j|)}{|\beta^{(k)}_j|}\,\beta_j;$$
otherwise, we set $\beta^{(k+1)}_j = 0$ and exclude the corresponding covariate from the model. This approximation is updated at every step of the Newton-Raphson iteration.

Following the results in Theorems 2 and 6, we can further approximate the estimation variance of the resulting estimate. That is,
$$\widehat{\rm cov}(\hat\beta) = \frac{1}{n}(E - \Sigma_\lambda)^{-1}F(E - \Sigma_\lambda)^{-T},$$
where $\Sigma_\lambda$ is a diagonal matrix with elements equal to $p'_\lambda(|\hat\beta_j|)/|\hat\beta_j|$ for nonvanishing $\hat\beta_j$, a linear approximation of $\Sigma$ defined in (7). We use $E$ to denote the sample approximation of $E\{\partial S^*_{{\rm eff},I}(W,Z,Y,\beta_I)/\partial\beta_I\}$ evaluated at $\hat\beta$ for the parametric model (1), and the sample approximation of the matrix $A$ evaluated at $\hat\beta$ for the semiparametric model (10).

10 Y. Ma and R. Li var(seff,i ) evaluated at ˆβ for the parametric model and the sample approximation of the matrix B evaluated at ˆβ for the semiparametric model, respectively. The accuracy of this sandwich formula will be tested in our simulation studies. It is desirable to have automatic, data-driven method to select the tuning parameter λ. Here we will consider two tuning parameter selectors, the GCV and BIC tuning parameter selectors. To define the GCV statistic and the BIC statistic, we need to define the degrees of freedom and goodness of fit measure for the final selected model. Similar to the nonconcave penalty likelihood approach (Fan and Li, 2001), we may define effective number of parameter or degrees of freedom to be df λ = tracei(i + Σ λ ) 1, where I stands for the Fisher Information matrix. For the logistic regression models that we employ in this section, a natural approximation of I, ignoring the measurement error effect, is C T QC, where C represents the covariates included in the model, and Q is a diagonal matrix with the ith element equal to ˆµ λ,i (1 ˆµ λ,i ). Here, ˆµ λ,i = P (Y i = 1 C i ). In the logistic regression model context of this section, we may employ its deviance as goodness of fit measure. Specifically, let µ i be the conditional expectation of Y i given its covariates for i = 1,, n. The deviance of a model fit ˆµ λ = (ˆµ λ,1,, ˆµ λ,n ) is defined to be D(ˆµ λ ) = 2 [Y i log(y i /ˆµ λ,i ) + (1 Y i )log(1 Y i )/(1 ˆµ λ,i )]. Define GCV statistic to be GCV (λ) = D(ˆµ λ ) n(1 df λ /n) 2, and the BIC statistic to be BIC(λ) = D(ˆµ λ ) + 2log(n)df λ. The GCV and the BIC tuning parameter selectors select λ by minimizing GCV (λ) and BIC(λ) respectively. Note that the BIC tuning parameter selector is distinguished from the traditional BIC variable selection criterion, which is not well defined for estimating equation methods. Wang, Li and Tsai (2007) provided a study on the asymptotic behavior for the GCV and BIC tuning parameter selectors for the nonconvex penalized least squares variable selection procedures in linear and partially linear regression models. 10

4.2 Model error

Let us simplify the model error for the logistic partially linear measurement error model. Denote $\mu(S,X,Z) = E(Y\mid S,X,Z)$; the model error of a model fit $\hat\mu(S,X,Z)$ is defined as
$${\rm ME}(\hat\mu) = E\{\hat\mu(S^*,X^*,Z^*) - \mu(S^*,X^*,Z^*)\}^2,$$
where the expectation is taken over the new observation $S^*$, $X^*$ and $Z^*$. Let $g(\cdot)$ be the logit link. For the logistic partially linear model, the mean function has the form $\mu(S,X,Z) = g^{-1}\{\theta(Z) + \beta^TV\}$, where $V = (S^T,X^T)^T$. If $\hat\theta(\cdot)$ and $\hat\beta$ are consistent estimators of $\theta(\cdot)$ and $\beta$, respectively, then by a Taylor expansion the model error can be approximated by
$${\rm ME}(\hat\mu) \approx E\Big(\big[\dot g^{-1}\{\theta(Z^*) + \beta^TV^*\}\big]^2\Big[\{\hat\theta(Z^*)-\theta(Z^*)\}^2 + (\hat\beta^TV^* - \beta^TV^*)^2 + 2\{\hat\theta(Z^*)-\theta(Z^*)\}(\hat\beta^TV^* - \beta^TV^*)\Big]\Big).$$
The first component is the inherent model error due to $\hat\theta(\cdot)$, the second is due to lack of fit of $\hat\beta$, and the third is the cross product of the first two components. Thus, to assess the performance of the proposed variable selection procedure, we define the approximate model error (AME) for $\hat\beta$ to be
$${\rm AME}(\hat\beta) = E\Big(\big[\dot g^{-1}\{\theta(Z^*)+\beta^TV^*\}\big]^2(\hat\beta^TV^*-\beta^TV^*)^2\Big).$$
Furthermore, the AME of $\hat\beta$ can be written as
$${\rm AME}(\hat\beta) = (\hat\beta-\beta)^T E\Big(\big[\dot g^{-1}\{\theta(Z^*)+\beta^TV^*\}\big]^2 V^*V^{*T}\Big)(\hat\beta-\beta) \;\hat=\; (\hat\beta-\beta)^T C_X(\hat\beta-\beta). \qquad (13)$$
In our simulations, the matrix $C_X$ is estimated by 1,000,000 Monte Carlo simulations. For measurement error data, we observe $W$ instead of $X$. We therefore also consider an alternative approximate model error
$${\rm AME}_W(\hat\beta) = (\hat\beta-\beta)^T C_W(\hat\beta-\beta), \qquad (14)$$
where $C_W$ is obtained by replacing $X$ with $W$ in the definition of $C_X$. The ${\rm AME}(\hat\beta)$ and ${\rm AME}_W(\hat\beta)$ are defined for the parametric model case, specifically the quadratic logistic model, by setting $\theta(\cdot)=0$ and appending $X^2$ (or $W^2$) to $V$. Note that although we defined the model error in the context of a logistic link function, it is certainly not restricted to this case. The general approach for calculating AME is to approximate the probability density function evaluated at the estimated parameters around the true parameter value, and extract the linear term of the parameter of interest. ${\rm AME}_W$ is calculated by replacing $X$ with $W$.
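For the logit link, $C_X$ in (13) is simply a weighted second-moment matrix and can be approximated by simulation, mirroring the Monte Carlo approximation mentioned above. The sketch below is our own illustration; `draw_new_obs` is a placeholder generator of new observations supplied by the user.

```python
import numpy as np

def model_error_matrix(draw_new_obs, beta, theta, n_mc=1_000_000, seed=0):
    """Monte Carlo approximation of C_X in (13) for the logit link:
    C_X = E[ {d g^{-1}(theta(Z*) + beta^T V*) / d eta}^2 V* V*^T ].

    draw_new_obs(m, rng) -> (V, Z): an m x p matrix of V* = (S*^T, X*^T)^T values
    and the corresponding m-vector of Z* values.
    """
    rng = np.random.default_rng(seed)
    V, Z = draw_new_obs(n_mc, rng)
    mu = 1.0 / (1.0 + np.exp(-(theta(Z) + V @ beta)))
    w = (mu * (1.0 - mu)) ** 2            # squared derivative of the inverse logit
    return (V * w[:, None]).T @ V / n_mc

def approximate_model_error(beta_hat, beta, C):
    """AME(beta_hat) = (beta_hat - beta)^T C (beta_hat - beta), as in (13)-(14);
    use the matrix built from X for AME and from W for AME_W."""
    diff = np.asarray(beta_hat, dtype=float) - np.asarray(beta, dtype=float)
    return float(diff @ C @ diff)
```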

4.3 Simulation examples

To demonstrate the performance of the method in both parametric and semiparametric measurement error models, we conduct two simulation studies.

Example 1. In this example, we generate data from a logistic model where the covariate measured with error enters the model through a quadratic function and the covariates measured without error enter linearly. The measurement error follows an additive normal model. Specifically, we have
$${\rm logit}\{P(Y=1\mid X,Z)\} = \beta_0 + \beta_1X + \beta_2X^2 + \beta_3Z_1 + \beta_4Z_2 + \beta_5Z_3 + \beta_6Z_4 + \beta_7Z_5 + \beta_8Z_6 + \beta_9Z_7$$
and $W = X + U$, where $\beta = (0, 1.5, 2, 0, 3, 0, 1.5, 0, 0, 0)^T$, the covariate $X$ is generated from a normal distribution with mean 0 and variance 1, and $(Z_1,\ldots,Z_6)^T$ is generated from a normal distribution with mean 0 and covariance between $Z_i$ and $Z_j$ equal to $0.5^{|i-j|}$. The last component of the $Z$ covariates, $Z_7$, is a binary variable taking values 0 and 1 with equal probability, and $U\sim N(0,\ )$. In our simulation, the sample size is taken to be either $n = 500$ or $n = 1000$.

The model complexity of the selected model is summarized in terms of the number of zero coefficients, and the model error of the selected model is summarized in terms of the relative approximate model error (RAME), defined to be the ratio of the model error of the selected model to that of the full model. In Table 1, the RAME column reports the sample median and the median absolute deviation (MAD, divided by a constant factor) of the AME values defined in (13) over 1000 simulations. Similarly, the RAME_W column corresponds to the AME_W values defined in (14) over 1000 simulations. From Table 1, it can be seen that the values of RAME and RAME_W are very close. The average count of zero coefficients is also reported in Table 1, in which the column labeled C presents the average count restricted only to the true zero coefficients, while the column labeled E displays the average count of coefficients erroneously set to 0.

Table 1: MRMEs and Model Complexity for Example 1

                    RAME              RAME_W            # of Zero Coefficients
   n                Median(MAD)       Median(MAD)       C          E
          GCV           (0.231)       0.698(0.228)
          BIC           (0.188)       0.396(0.187)
          GCV           (0.187)       0.770(0.185)
          BIC           (0.157)       0.401(0.158)
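A data-generating sketch for Example 1 is given below (our own code, following the design stated above). The measurement error standard deviation is left as a parameter `sigma_u`, since its numerical value is not reproduced here.

```python
import numpy as np

def generate_example1(n, sigma_u, seed=None):
    """Simulate one data set from the quadratic logistic measurement error model
    of Example 1: logit P(Y=1|X,Z) = b0 + b1*X + b2*X^2 + b3*Z1 + ... + b9*Z7,
    with the surrogate W = X + U observed in place of X.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([0.0, 1.5, 2.0, 0.0, 3.0, 0.0, 1.5, 0.0, 0.0, 0.0])

    X = rng.standard_normal(n)
    idx = np.arange(6)
    cov = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # cov(Zi, Zj) = 0.5^|i-j|
    Z_cont = rng.multivariate_normal(np.zeros(6), cov, size=n)
    Z7 = rng.integers(0, 2, size=n)                    # binary with probability 1/2
    Z = np.column_stack([Z_cont, Z7])

    design = np.column_stack([np.ones(n), X, X ** 2, Z])
    prob = 1.0 / (1.0 + np.exp(-design @ beta))
    Y = rng.binomial(1, prob)

    W = X + sigma_u * rng.standard_normal(n)           # error-prone surrogate
    return Y, W, Z

# Y, W, Z = generate_example1(n=500, sigma_u=0.5)      # sigma_u value is illustrative
```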

Table 2: Bias and Standard Errors for Example 1

                      $\hat\beta_1$                          $\hat\beta_2$
              bias(SD)        SE(Std(SE))          bias(SD)        SE(Std(SE))
  n = 500
     EE         (0.609)          (0.418)             (0.922)       0.585(0.647)
     GCV        (0.426)          (0.159)             (0.418)       0.388(0.178)
     BIC        (0.593)          (0.189)             (0.462)       0.367(0.176)
  n = 1000
     EE         (0.273)          (0.062)             (0.332)       0.321(0.088)
     GCV        (0.254)          (0.048)             (0.258)       0.253(0.057)
     BIC        (0.290)          (0.054)             (0.255)       0.244(0.052)

                      $\hat\beta_4$                          $\hat\beta_6$
              bias(SD)        SE(Std(SE))          bias(SD)        SE(Std(SE))
  n = 500
     EE         (0.618)          (0.397)             (0.465)          (0.256)
     GCV        (0.453)          (0.105)             (0.302)          (0.066)
     BIC        (0.462)          (0.107)             (0.286)          (0.065)
  n = 1000
     EE         (0.314)          (0.046)             (0.207)          (0.026)
     GCV        (0.282)          (0.038)             (0.184)          (0.022)
     BIC        (0.275)          (0.037)             (0.166)          (0.020)

We next verify the consistency of the estimators and test the accuracy of the proposed standard error formula. Table 2 displays the bias and sample standard deviation (SD) of the estimates of the nonzero coefficients over 1000 simulations, as well as the sample average and the sample standard deviation of the 1000 standard errors obtained using the sandwich formula. The row labeled EE corresponds to the unpenalized estimating equation estimator. Overall, the sandwich formula works well. It is worth pointing out that the variable selection procedure effectively reduces the finite sample estimation bias and variance, even though asymptotically the estimator without variable selection is also consistent.
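The columns of Table 2 can be produced from replicated fits as in the following sketch (ours, not the paper's code): `estimates` and `std_errors` are arrays holding the penalized estimate and the sandwich standard error of one coefficient over the simulation replications.

```python
import numpy as np

def table2_summary(estimates, std_errors, true_value):
    """Summaries of one coefficient as reported in Table 2.

    bias(SD): Monte Carlo bias and standard deviation of the estimates.
    SE(Std(SE)): average sandwich standard error and its standard deviation,
    to be compared with SD as a check of the sandwich formula.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    return {
        "bias": estimates.mean() - true_value,
        "SD": estimates.std(ddof=1),
        "SE": std_errors.mean(),
        "Std(SE)": std_errors.std(ddof=1),
    }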

Example 2. In this example, we illustrate the performance of the method for a semiparametric measurement error model. Simulation data are generated from
$${\rm logit}\{P(Y=1\mid \cdot)\} = \beta_1X + \beta_2S_1 + \cdots + \beta_{10}S_9 + \theta(Z), \qquad W = X + U,$$
where $\beta$, $X$ and $W$ are the same as in the previous simulation. We generate the $S$'s in a similar fashion to the $Z$'s in Example 1. That is, $(S_1,\ldots,S_8)$ is generated from a normal distribution with mean zero and covariance between $S_i$ and $S_j$ equal to $0.5^{|i-j|}$, and $S_9$ is a binary variable taking values zero and one with equal probability. The random variable $Z$ is generated from a uniform distribution on $[-\pi/2,\pi/2]$. The true function is $\theta(Z) = 0.5\cos(Z)$. The parameter takes the value $\beta = (1.5, 2, 0, 0, 3, 0, 1.5, 0, 0, 0)^T$.

The simulation results are summarized in Table 3, in which the notation is the same as that in Table 1. From Table 3, we can see that the penalized estimating equation estimators can significantly reduce model complexity. Overall, the BIC tuning parameter selector performs better, while GCV is too conservative. We have further tested the consistency of the estimator and the accuracy of the standard error formula derived from the sandwich formula. The results are summarized in Table 4, in which the notation is the same as that in Table 2, and from which we can see the consistency of the estimator and that the standard error formula performs rather well when n = 1000.

Table 3: MRMEs and Model Complexity for Example 2

                      RME_X             RME_W             # of Zero Coefficients
  Method    n         Median(MAD)       Median(MAD)       C         E
  GCV                     (0.161)       0.880(0.158)
  BIC                     (0.158)       0.387(0.155)
  GCV                     (0.164)       0.873(0.160)
  BIC                     (0.162)       0.392(0.161)

4.4 Data Analysis

The Framingham heart study data set (Kannel et al., 1986) is a well known data set for which it is generally accepted that there exists measurement error in the long term systolic blood pressure (SBP). In addition to SBP, other measurements include age, smoking status and serum cholesterol. In the literature, there has been speculation that a second order term involving age might be needed in analyzing the dependence of the occurrence of heart disease. In addition, it is unclear whether the interactions (products) between the various covariates play a role in influencing the heart disease rate. The data set includes 1615 observations.

Table 4: Bias and Standard Errors for Example 2

                      $\hat\beta_1$                          $\hat\beta_2$
              Bias(SD)        SE(Std(SE))          Bias(SD)        SE(Std(SE))
  n = 500
     EE         (0.267)          (0.043)             (0.295)          (0.045)
     GCV        (0.275)          (0.047)             (0.297)          (0.050)
     BIC        (0.262)          (0.044)             (0.273)          (0.045)
  n = 1000
     EE      0.039(0.170)     0.166(0.018)        0.057(0.194)     0.190(0.018)
     GCV     0.047(0.174)     0.172(0.020)        0.069(0.196)     0.191(0.021)
     BIC     0.031(0.169)     0.170(0.019)        0.044(0.179)     0.185(0.019)

                      $\hat\beta_5$                          $\hat\beta_7$
              Bias(SD)        SE(Std(SE))          Bias(SD)        SE(Std(SE))
  n = 500
     EE         (0.415)          (0.063)             (0.288)          (0.038)
     GCV        (0.418)          (0.071)             (0.285)          (0.044)
     BIC        (0.378)          (0.064)             (0.249)          (0.037)
  n = 1000
     EE         (0.258)          (0.026)             (0.181)          (0.016)
     GCV        (0.265)          (0.031)             (0.181)          (0.020)
     BIC        (0.242)          (0.028)             (0.159)          (0.016)

With the method developed here, it is possible to perform variable selection to address these issues. Following the literature, we adopt the measurement error model $\log({\rm MSBP} - 50) = \log({\rm SBP} - 50) + U$, where $U$ is a mean zero normal random variable with variance $\sigma_u^2$, and MSBP is the measured SBP. We denote the standardized $\log({\rm MSBP} - 50)$ by $W$. The standardization, using the same parameters, applied to $\log({\rm SBP} - 50)$ is denoted $X$. The standardized serum cholesterol and age are denoted $Z_1$ and $Z_2$, and we use $Z_3$ to denote the binary variable smoking status. Using $Y$ to denote the occurrence of heart disease, the saturated model, which includes all the interaction terms and

also the square of the age term, is of the form
$${\rm logit}\{P(Y = 1\mid X, Z_1, Z_2, Z_3)\} = \beta_1X + \beta_2XZ_1 + \beta_3XZ_2 + \beta_4XZ_3 + \beta_5 + \beta_6Z_1 + \beta_7Z_2 + \beta_8Z_3 + \beta_9Z_2^2 + \beta_{10}Z_1Z_2 + \beta_{11}Z_1Z_3 + \beta_{12}Z_2Z_3,$$
$$W = X + U.$$

We first used both the GCV and BIC tuning parameter selectors to choose $\lambda$. We present the tuning parameters and the corresponding GCV and BIC scores in Figure 1.

[Figure 1: Tuning parameters and their corresponding BIC and GCV scores for the Framingham data. The scores are normalized to the range [0, 1].]

The final chosen values of $\lambda$ are  and  by the GCV and BIC selectors, respectively. The selected model is depicted in Table 5. The GCV criterion selects the covariates $X$, $XZ_1$, $1$, $Z_1$, $Z_2$, $Z_3$, $Z_2^2$, $Z_2Z_3$ into the model, while the BIC criterion selects the covariates $X$, $1$, $Z_1$, $Z_2$ into the model. We report the selection and estimation results in Table 5, along with the semiparametric estimation results without variable selection. As we can see, the terms $X$, $1$, $Z_1$, $Z_2$ are selected by both criteria, while $Z_3$, $Z_2^2$ and some of the interaction terms are selected only by GCV. The BIC criterion is very aggressive and results in a very simple final model, while the GCV criterion is much more conservative, hence the resulting model is more complex. This agrees with the simulation results we have obtained. Since both criteria include the covariate $X$, we see that the measurement error feature and its treatment in the Framingham data are unavoidable.
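The saturated design above can be assembled from the raw Framingham variables as in the following sketch. This is our own illustration: the argument names are hypothetical, and the standardization (subtracting the sample mean and dividing by the sample standard deviation of the observed surrogate) is an assumption about details not spelled out in the text.

```python
import numpy as np

def framingham_design(msbp, cholesterol, age, smoke):
    """Assemble the saturated design of Section 4.4 from raw variables
    (argument names are ours; the raw column names in the data may differ).

    Returns W, the standardized log(MSBP - 50) used as the error-prone
    surrogate, and the design matrix with columns in the order of the model:
    W, W*Z1, W*Z2, W*Z3, intercept, Z1, Z2, Z3, Z2^2, Z1*Z2, Z1*Z3, Z2*Z3.
    """
    def standardize(x):
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()

    W = standardize(np.log(np.asarray(msbp, dtype=float) - 50.0))
    Z1 = standardize(cholesterol)          # serum cholesterol
    Z2 = standardize(age)                  # age
    Z3 = np.asarray(smoke, dtype=float)    # smoking status (binary)

    design = np.column_stack([
        W, W * Z1, W * Z2, W * Z3, np.ones(len(W)),
        Z1, Z2, Z3, Z2 ** 2, Z1 * Z2, Z1 * Z3, Z2 * Z3,
    ])
    return W, design
```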

Table 5: Results for the Framingham data set.

                       EE                   GCV                  BIC
                 $\hat\beta$ (SE)     $\hat\beta$ (SE)     $\hat\beta$ (SE)
  $X$             0.643 (0.248)        0.416 (0.093)        0.179 (0.039)
  $XZ_1$                (0.097)              (0.041)        0 (NA)
  $XZ_2$                (0.111)        0 (NA)               0 (NA)
  $XZ_3$                (0.249)        0 (NA)               0 (NA)
  Intercept             (0.428)              (0.356)              (0.092)
  $Z_1$                 (0.212)        0.332 (0.085)        0.124 (0.033)
  $Z_2$                 (0.341)        1.044 (0.329)        0.398 (0.067)
  $Z_3$                 (0.443)        0.907 (0.373)        0 (NA)
  $Z_2^2$               (0.125)              (0.121)        0 (NA)
  $Z_1Z_2$              (0.103)        0 (NA)               0 (NA)
  $Z_1Z_3$              (0.225)        0 (NA)               0 (NA)
  $Z_2Z_3$              (0.336)              (0.326)        0 (NA)

5 Proofs

List of regularity conditions for Theorems 1 and 2:

(A1) The model is identifiable.

(A2) The expectation of the first derivative of $S^*_{\rm eff}$ with respect to $\beta$ exists and is not singular at $\beta_0$.

(A3) The second derivative of $S^*_{\rm eff}$ with respect to $\beta$ exists and is continuous and bounded in a neighborhood of $\beta_0$.

(A4) The variance-covariance matrix of the first $d_1$ components of $S^*_{\rm eff}$, $S^*_{{\rm eff},I}$, is positive definite at $\beta_0$.

Proof of Theorem 1. From condition (A2), we can denote
$$J = \left\{E\left(\frac{\partial S^*_{\rm eff}}{\partial\beta^T}\right)\Big|_{\beta_0}\right\}^{-1},\qquad \phi^*_{\rm eff} = JS^*_{\rm eff},\qquad q_{\lambda_n}(\beta) = J\dot p_{\lambda_n}(\beta).$$
Let $\alpha_n = n^{-1/2} + a_n$ and $\phi^*_{{\rm eff},i}(\beta) = \phi^*_{\rm eff}(W_i,Z_i,Y_i,\beta)$. It is sufficient to show that
$$n^{-1/2}\sum_{i=1}^n\phi^*_{{\rm eff},i}(\beta) - n^{1/2}q_{\lambda_n}(\beta) = 0 \qquad (15)$$

18 Y. Ma and R. Li has a solution ˆβ that satisfies ˆβ β 0 = O p (α n ). Consider (β β 0 ) T n 1/2 φ eff,i (β) n1/2 q λ n (β). For any β such that β β 0 = Cα n for some constant C, because of condition (A3) and a n 0, it follows by the Taylor expansion that (β β 0 ) T n 1/2 φ eff,i (β) n1/2 q λ n (β) = (β β 0 ) T [ n 1/2 φ eff,i (β 0) n 1/2 q λ n (β 0 ) + n 1/2 ] n 1/2 q λ n (β 0 ) β T (β β 0 )1 + o p (1) [ = (β β 0 ) T n 1/2 φ eff,i (β 0) n 1/2 q λ n (β 0 ) + n 1/2 1 + o p (1)(β β 0 ) φ eff,i (β 0) β T (β β 0 )1 + o p (1) ] +n 1/2 O(b n )(β β 0 )1 + o p (1) = (β β 0 ) T n 1/2 φ eff,i (β 0) n 1/2 q λ n (β 0 ) + n 1/2 β β o p n 1/2 β β 0 2. The first term in the above display is of order O p (Cα n ) = O p (C nα 2 n), the second term equals C 2 nα 2 n, which dominates the first term as long as C is large enough. The remaining terms are dominated by the first two terms. Thus, for any ɛ > 0, as long as C is large enough, the probability for the above display to be larger than zero is at least 1 ɛ. From Brouwer fixed-point theorem, we know there with probability at least 1 ɛ, there exists at least one solution for (15) in the region β β 0 Cα n. To show Theorem 2, we first prove the following lemma. Lemma 1 If conditions in Theorem 2 hold, then with probability tending to 1, for any given β that satisfies β β 0 = O p (n 1/2 ), β II = 0 is a solution to the last d 2 equations of (2). Proof: Denote the kth equation in n S eff (W i, Z i, Y i, β) as L k (β), k = d 1 + 1,..., d, then L k (β) np λ n (β k ) d L k (β) d = L k (β 0 ) + (β j β j0 ) β j j=1 d l=1 j=1 2 L k (β ) β l β j (β l β l0 )(β j β j0 ) np λ n,k ( β k )sign(β k ). The first three terms of the above display are all of order at most O p (n 1/2 ), hence we have L k (β) np λ n (β k ) = n np λ n ( β k )sign(β k ) + O p (1). 18

Using (8), the sign of $L_k(\beta) - np'_{\lambda_n}(\beta_k)$ is completely determined by ${\rm sign}(\beta_k)$ when $n$ is large enough. From the continuity of $L_k(\beta) - np'_{\lambda_n}(\beta_k)$, we obtain that it is zero at $\beta_k = 0$.

Proof of Theorem 2. We let $\lambda_n$ be sufficiently small so that $a_n = o(n^{-1/2})$. From Theorem 1, there is a root-$n$ consistent estimator $\hat\beta$. From Lemma 1, $\hat\beta = (\hat\beta_I^T, 0^T)^T$, so (i) is shown. Denote the first $d_1$ equations in $\sum_{i=1}^n S^*_{\rm eff}\{W_i,Z_i,Y_i,(\beta_I^T,0^T)^T\}$ by $L(\beta_I)$; then
$$0 = L(\hat\beta_I) - n\dot p_{\lambda_n,I}(\hat\beta_I)
= L(\beta_{I0}) + \left\{\frac{\partial L(\beta_{I0})}{\partial\beta_I^T} + o_p(n)\right\}(\hat\beta_I - \beta_{I0}) - nb - n\{\Sigma + o_p(1)\}(\hat\beta_I - \beta_{I0})
= L(\beta_{I0}) + n\left\{E\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T} - \Sigma + o_p(1)\right\}(\hat\beta_I - \beta_{I0}) - nb + o_p(n^{1/2}).$$
We thus obtain
$$\sqrt{n}\left[\hat\beta_I - \beta_{I0} - \left\{E\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T} - \Sigma\right\}^{-1}b\right] = -n^{-1/2}\left\{E\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T} - \Sigma\right\}^{-1}L(\beta_{I0}) + o_p(1).$$
Because of condition (A4), the results in (ii) now follow.

List of regularity conditions for Theorems 3 and 4:

(B1) The model is identifiable.

(B2) The expectation of the first derivative of $S^*_{\rm eff}$ with respect to $\beta$ exists at $\beta_0$, and its eigenvalues are bounded away from zero and infinity uniformly for all $n$. For any entry $S_{jk}$ of $\partial S^*_{\rm eff}(\beta_0)/\partial\beta^T$, $E(S^2_{jk}) < C_3 < \infty$.

(B3) The eigenvalues of the matrix $E(S^*_{{\rm eff},I}S^{*T}_{{\rm eff},I})$ satisfy $0 < C_1 < \lambda_{\min} \le \lambda_{\max} < C_2 < \infty$ for all $n$. For any entries $S_k$, $S_j$ of $S^*_{\rm eff}(\beta_0)$, $E(S^2_kS^2_j) < C_4 < \infty$.

(B4) The second derivatives of $S^*_{\rm eff}$ with respect to $\beta$ exist and their entries are uniformly bounded by a function $M(W_i,Z_i,Y_i)$ in a large enough neighborhood of $\beta_0$. In addition, $E(M^2) < C_5 < \infty$ for all $n$, $d$.

(B5) $\min_{1\le j\le d_1}|\beta_{j0}|/\lambda_n\to\infty$ as $n\to\infty$.

(B6) Let $c_n = \max_{1\le j\le d_1}\{p''_{\lambda_n}(|\beta_{j0}|) : \beta_{j0}\ne 0\}$. Assume that $\lambda_n\to 0$, $a_n = O(n^{-1/2})$ and $c_n\to 0$ as $n\to\infty$. In addition, there exist constants $C$ and $D$ such that when $\theta_1,\theta_2 > C\lambda_n$, $|p''_{\lambda_n}(\theta_1) - p''_{\lambda_n}(\theta_2)| \le D|\theta_1 - \theta_2|$.

20 Y. Ma and R. Li To emphasize the dependence of v, λ, d, d 1, d 2, S eff, b, Σ, β, β I, β II, β j, for j = 1,..., d, on n, we write them as v n, λ n, d n, d n1, d n2, S n,eff, b n, Σ n, β n, β ni, β nii, β nj respectively throughout the proof. Proof of Theorem 3 From condition (B2), we can denote J n = ( S ) 1 n,eff E βn T βn0, φ n,eff = J nsn,eff and q λ n (β n ) = J n p λ n (β n ). Let α n = n 1/2 + a n, φ n,eff,i (β n) = φ n,eff (W i, Z i, Y i, β n ). It suffices to show that n 1/2 φ n,eff,i (β n) n 1/2 q λ n (β n ) = 0 (16) has a solution ˆβ n that satisfies ˆβ n β n0 = O p (d 1/2 n α n ). Consider (β n β n0 ) T n 1/2 φ n,eff,i (β n) n 1/2 q λ n (β n ). For any β n such that β n β n0 = C d n α n for some constant C, because of condition (B2), (B3), (B4) and (B6), we have the expansion (β n β n0 ) T n 1/2 φ n,eff,i (β n) n 1/2 q λ n (β n ) = (β n β n0 ) T [ n 1/2 φ n,eff,i (β n0) n 1/2 q λ n (β n0 ) + n 1/2 φ n,eff,i (β n0) (β n β n0 ) ] +2 1 n 1/2 (β n β n0 ) T 2 φ n,eff,i (β n) β n βn T (β n β n0 ) n 1/2 q λ n (β n0 ) βn T (β n β n0 )1 + o p (1) [ = (β n β n0 ) T n 1/2 φ n,eff,i (β n0) n 1/2 q λ n (β n0 ) + n 1/2 1 + o p (1)(β n β n0 ) ] +o p (1) + n 1/2 O(c n )(β n β n0 )1 + o p (1) = (β n β n0 ) T n 1/2 φ n,eff,i (β n0) n 1/2 q λ n (β n0 ) + n 1/2 β n β n0 2 +o p (n 1/2 β n β n0 2 ). In the above derivation, β n is in between β n and β n0, and we used n 1/2 (β n β n0 ) T 2 φ n,eff,i (β n) β n βn T (β n β n0 ) = o p (1), 20 β T n

21 Measurement Error Model Selection which is shown in detail in (18), hence the proof is omitted here. The first term in the above display is bounded by β n β n0 O p n 1/2 φ n,eff,i (β n0) n 1/2 q λ n (β n0 ) = β n β n0 O p ((d n + d n na 2 n) 1/2 ) = O p (Cn 1/2 d n αn), 2 the second term equals C 2 n 1/2 d n αn, 2 which dominates the first term with probability 1 ɛ for any ɛ > 0 as long as C is large enough. The remaining terms are dominated by the first two terms. Thus, for any ɛ > 0, for large enough C, the probability for the above display to be larger than zero is at least 1 ɛ. From Brouwer fixed-point theorem, we know that with probability at least 1 ɛ, there exists at least one solution for (16) in the region β β 0 C d n α n. We first prove the following lemma, then give the proof of Theorem 4. Lemma 2 If the conditions in Theorem 4 hold, then with probability tending to 1, for any given β n that satisfies β n β n0 = O p ( d n /n), β nii = 0 is a solution to the last d n2 equations of (2). Proof: Denote the kth equation in n S n,eff (W i, Z i, Y i, β n ) as L nk (β n ), k = d n1 + 1,..., d n, then L nk (β n ) np λ n,k (β nk) = L nk (β n0 ) + d n j=1 L nk (β n0 ) (β nj β nj0 ) β nj d n d n l=1 j=1 2 L nk (β n) β nl β nj (β nl β nl0 )(β nj β nj0 ) np λ n,k ( β nk )sign(β nk ), (17) where β n is in between ˆβ n and β n0. Because of condition (B3), the first term of (17) is of order O p (n 1/2 ) = o p (n 1/2 d n 1/2 ). The second term (17) can be further written as d n j=1 Lnk (β n0 ) E L nk(β n0 ) (β nj β nj0 ) + β nj β nj where the first term is controlled by β n1 d n j=1 E L nk(β n0 ) β nj (β nj β nj0 ), d n Lnk (β n0 ) E L 1/2 nk(β n0 ) 2 β n β n0 β j=1 nj β nj ( S ) = O(d 1/2 n n 1/2 n,eff,k )O p var O p (d 1/2 n n 1/2 ) O(d n 1/2 n 1/2 )O p E(S 2 k1 ) O p (d n 1/2 n 1/2 ) = o p (n 1/2 d n 1/2 ), 21

22 Y. Ma and R. Li due to condition (B2), and the second term is controlled by n d n j=1 ( E S ) 2 n,eff,k βn T 1/2 The third term of (17) can be further written as d n d n l=1 j=1 + d n l=1 j=1 β n β n0 nλ max E S 2 n,eff,k βn T β n β n0 = O p (n 1/2 d 1/2 n ). 2 L nk (β E n) β nl β nj d n where the first term is bounded by n d n j,l=1 d n j,l=1 (β nl β nl0 )(β nj β nj0 ) [ 2 L nk (βn) 2 L nk (β E n) β nl β nj β nl β nj [ 2 L nk (β ] 1/2 2 E n) β n β n0 2 = β nl β nj 1/2 d n E(M) 2 O p (n 1 d n ) E(M 2 ) j,l=1 d n j,l=1 due to condition (B4), and the second term is bounded by = d n l,j=1 d n l,j=1 ] (β nl β nl0 )(β nj β nj0 ), 1/2 [ 2 L nk (βn) 2 L nk (β ] 2 E n) β nl β nj β nl β nj [ ne 2 S n,eff,k (βn) ] 2 β nl β nj 1/2 O p (n 1 d n ) O p (d n ) = O p (d n d n ) = o p (n 1/2 d 1/2 n ), 1/2 no p var 2 Sn,eff,k (β n) 1/2 O p (n 1 d n ) β nl β nj d n O p l,j=1 d n l,j=1 E 2 Sn,eff,k 2 (β n) β nl β nj O p E(M 2 ) 1/2 due to condition (B4) and (B5). Hence we have 1/2 β n β n0 2 O p (n 1/2 d n ) O p (n 1/2 d n ) = O p (n 1/2 d 2 n) = o p (n 1/2 d 1/2 n ), L nk (β n ) np λ n,k (β nk) = n n/d n p λ n,k ( β nk )sign(β nk ) + O p (1). Using condition (9), the sign of L nk (β n ) np λ n (β nk ) is decided by sign(β nk ) completely when n is large enough. From the continuity of L nk (β n ) np λ n (β nk ), we obtain that it is zero at β nk = 0. 22

23 Measurement Error Model Selection Proof of Theorem 4: From Theorem 3 and condition (B6), there is a root-(n/d n ) consistent estimator ˆβ n. From Lemma 2, ˆβn = ( ˆβ T ni, 0T ) T, so (i) is shown. Denote the first d n1 equations in n S n,eff W i, Z i, Y i, (β T ni, 0T ) T as L n (β ni ). From Lemma 2, if we select β ni β ni0 = O p (d n 1/2 n 1/2 ), then (i) is shown. Now consider solving the first d n1 equations in (2) for β ni, while β nii = 0. Obviously, when λ n is sufficiently small, a n = 0, hence from Theorem 3, there is a rootn/d n consistent root, denote it ˆβ I. Denote the first d n1 equations in n S n,eff (W i, Z i, Y i, β n ) as L n (β n ), then 0 = L n ( ˆβ ni ) np λ n,i ( ˆβ ni ) = L n (β I0 ) + L n(β I0 ) βni T ( ˆβ ni β I0 ) ( ˆβ ni β ni0 ) T 2 L n (βni ) β ni βni T ( ˆβ ni β ni0 ) nb n np λ n,i (β ni)( ˆβ ni β ni0 ), where β ni is between β I0 and ˆβ ni. We also have n 1 ( ˆβ ni β ni0 ) T 2 L n (βni ) β ni βni T ( ˆβ ni β ni0 ) 2 ˆβ ni β ni0 ) 4 n 1 2 L n (βni ) β ni βni T 2 [ = O p (d 2 n n 2 n 1 d 3 Sn,eff,i (βni n1)o p (n E ) ] 2 Sn,eff,i (βni + var ) ) β j β k β j β k O p (d n 5 n 3 )O p [n E(M) 2 + EM 2] O p (d n 5 n 3 )O p (nem 2 ) = o p (n 1 ), (18) due to condition (B4). In addition, n 1 L n(β ni0 ) β T ni 2 n 1 L n(β ni0 ) β T ni p λ n,i (β ni) E L n(β ni0 ) β T ni E L n(β ni0 ) βni T 2 + O p (n 1 d n1 ) due to condition (B5) and (B6). Since for any fixed ɛ > 0, we have P r n 1 L n(β ni0 ) βni T E L n(β ni0 ) 1 βni T ɛd n d n 2 n 2 ɛ 2 E L n(β ni0 ) β T ni due to condition (B2), we have + p λ n,i (β ni0) 2 ne L n(β ni0 ) βni T 2 = O(d 2 n n 2 d 2 n1n) = o(1), Therefore, n 1 L n(β ni0 ) β T ni n 1 L n(β ni0 ) β T ni p λ n,i (β n) E L n(β ni0 ) β T ni E L n(β ni0 ) βni T = o p (d 1 n ). + p λ n,i (β ni0) 2 = o p (d n 2 ), 23

24 Y. Ma and R. Li and subsequently, n 1 L n(β ni0 ) βni T p λ n,i (β n) E L n(β ni0 ) βni T + p λ n,i (β ni0) ( ˆβ ni β ni0 ) o p (d 1 n )O p (n 1/2 d 1/2 n ) = o p (n 1/2 ). We thus obtain E L n(β ni0 ) β T ni + Σ n ( ˆβ ni β ni0 ) + b n = n 1 L n (β ni0 ) + o p (n 1/2 ). Denoting I = E Sn,eff,I (β ni0)sn,eff,i T (β ni0), using condition (B3), we have Let [ n 1/2 v n I 1/2 E L ] n(β ni0 ) βni T + Σ n ( ˆβ ni β ni0 ) + b n = n 1/2 v n I 1/2 L n (β ni0 ) + o p (v n I 1/2 ) = n 1/2 v n I 1/2 L n (β ni0 ) + o p (1). It follows that for any ɛ > 0, Y i = n 1/2 v n I 1/2 S n,eff,i (W i, Z i, Y i, β ni0 ), i = 1,..., n. E Y i 2 1( Y i > ɛ) = ne Y 1 2 1( Y 1 > ɛ) n(e Y 1 4 ) 1/2 P r ( Y 1 > ɛ) 1/2. We have and P r ( Y 1 > ɛ) = P r ( Y 1 2 > ɛ 2 ) E Y 1 2 = v nv T n nɛ 2 = O(n 1 ), E( Y 1 4 ) = n 2 E v n I 1/2 S n,eff,i (W 1, Z 1, Y 1, β ni0 ) 4 ɛ 2 = E v ni 1/2 S n,eff,i (W 1, Z 1, Y 1, β ni0 ) 2 nɛ 2 = n 2 ES n,eff,i (W 1, Z 1, Y 1, β ni0 ) T I 1/2 v T n v n I 1/2 S n,eff,i (W 1, Z 1, Y 1, β ni0 ) 2 n 2 λ 2 max(v T n v n )E(S n,eff,i (W 1, Z 1, Y 1, β ni0 ) T I 1 S n,eff,i (W 1, Z 1, Y 1, β ni0 ) 2 n 2 λ 2 max(v T n v n )λ 2 max(i 1 )ES n,eff,i (W 1, Z 1, Y 1, β ni0 ) T S n,eff,i (W 1, Z 1, Y 1, β ni0 ) 2 = n 2 λ 2 max(i 1 )E S n,eff,i (W 1, Z 1, Y 1, β ni0 ) 4 = O(d 2 n1n 2 ), due to condition (B3). Hence, E Y i 2 1( Y i > ɛ) = O(nd n1 n 1 n 1/2 ) = o(1). 24

25 Measurement Error Model Selection On the other hand, cov(y i ) = ncovn 1/2 v n I 1/2 S n,eff,i (W 1, Z 1, Y 1, β ni0 ) = v n I 1/2 ES n,eff,i (W 1, Z 1, Y 1, β ni0 )S n,eff,i (W 1, Z 1, Y 1, β ni0 ) T I 1/2 vn T = 1. Following Lindeberg-Feller central limit theorem, the results in (ii) now follow. List of Regularity Conditions for Theorems 5 and 6 (C1) The model is identifiable. (C2) The first derivatives of L with respect to β and θ exist and are denoted as L β and L θ respectively. The first derivative of θ with respect to β exists and is denoted as θ β. Then E(L β + L θ θβ T exists and is not singular at β 0 and the true function θ 0 (Z). (C3) The second derivatives of L with respect to β and θ, 2 L/ β β T, 2 L/ θ 2 and 2 L/ β θ, exist and are uniformly bounded in a neighborhood of β 0, θ 0. The second derivative of θ with respect to β exists and is uniformly bounded in a neighborhood of β 0 for all θ(z) in a neighborhood of θ 0. (C4) The variance-covariance matrix of L I ΨU I (Z) is positive definite at β 0, θ 0. (C5) The random variable Z has compact support and its density f Z (z) is positive on that support. The bandwidth h satisfies nh 4 0 and nh 2. θ(z) has bounded second derivative. Proof of Theorem 5 From condition (C2), we can denote J = [ E ( L β + L θ θβ T ) ] 1 β0,θ 0, φ eff (β, θ) = JL(β, θ) and q λ n (β) = Jp λ n (β). Let α n = n 1/2 + a n, φ eff,i (β, ˆθ) = φ eff W i, Z i, Y i, β, ˆθ i (β), and Ψ eff,i (β, ˆθ) = Ψ eff β, ˆθ i (β), where ˆθ i are obtained from (12). We will prove that n 1/2 φ eff,i (β, ˆθ) n 1/2 q λ n (β) = 0 (19) has a solution ˆβ that satisfies ˆβ β 0 = O p (α n ). Consider (β β 0 ) T n 1/2 φ eff,i (β, ˆθ) n 1/2 q λ n (β). 25

26 Y. Ma and R. Li For any β such that β β 0 = Cα n for some constant C, because of condition (C3) and (C6), using the expansion (A.3) in Ma and Carroll (2006), we have (β β 0 ) T n 1/2 φ eff,i (β, ˆθ) n 1/2 q λ n (β) = (β β 0 ) T [ n 1/2 φ eff,i (β 0, θ 0 ) n 1/2 q λ n (β 0 ) +Eφ eff,i,β (β 0, θ 0 ) + φ eff,i,θ (β 0, θ 0 )θβ T (Z, β 0)n 1/2 (β β 0 ) +n 1/2 φ eff,i (β ] 0, θ 0 ) ˆθ(β 0 ) θ 0 + o p (1) n 1/2 q λ n (β 0 ) θ β T (β β 0 )1 + o p (1) [ = (β β 0 ) T n 1/2 φ eff,i (β 0, θ 0 ) n 1/2 q λ n (β 0 ) + n 1/2 (β β 0 ) +n 1/2 φ eff,i (β ] 0, θ 0 ) θ T ˆθ(β 0 ) θ 0 + n 1/2 O(b n )(β β 0 )1 + o p (1) = (β β 0 ) T n 1/2 φ eff,i (β 0, θ 0 ) n 1/2 q λ n (β 0 ) + n 1/2 β β 0 2 +(β β 0 ) T n 1/2 Due to the usual local estimating equation expansion ˆθ(z, β 0 ) θ 0 (z) = (h 2 /2)θ 0(z) n 1 φ eff,i (β 0, θ 0 ) θ T ˆθ(β 0 ) θ 0 + o p n 1/2 β β 0 2. (20) K h (Z j z)ψ j (β 0, θ 0 )/f Z (z)ω(z) + o p (n 1/2 ), (21) j=1 substituting into the third term in (20), making use of condition (C5), it can be easily seen that (β β 0 ) T n 1/2 φ eff,i (β 0, θ 0 ) θ T ˆθ(β 0 ) θ 0 = (β β 0 ) T n 1/2 Ψ i (β 0, θ 0 )JU(Z i ) + o p (1), where U(Z) = E( L/ θ Z)/Ω(Z). Continuing from (20), we have (β β 0 ) T n 1/2 φ eff,i (β, ˆθ) n 1/2 q λ n (β) = (β β 0 ) T n 1/2 φ eff,i (β 0, θ 0 ) n 1/2 q λ n (β 0 ) n 1/2 +n 1/2 β β o p n 1/2 β β 0 2. Ψ i (β 0, θ 0 )JU(Z i ) The first term in the above display is of order O p (Cn 1/2 α 2 n), the second term equals C 2 n 1/2 α 2 n, which dominates the first term as long as C is large enough. The last term is dominated by the first two 26

27 Measurement Error Model Selection terms. Thus, for any ɛ > 0, as long as C is large enough, the probability for the above display to be larger than zero is at least 1 ɛ. From Brouwer fixed-point theorem, we know there with probability at least 1 ɛ, there exists at least one solution for (19) in the region β β 0 Cα n. We first prove the following lemma, then give the proof of Theorem 6. Lemma 3 If conditions in Theorem 6 hold, then with probability tending to 1, for any given β that satisfies β β 0 = O p (n 1/2 ), β II = 0 is a solution to the last d 2 equations of (11). Proof: Denote the kth equation in n L iβ, ˆθ(β) as L k (β, ˆθ), and that in n Ψ i(β 0, θ 0 )U(Z i ) as G k (β 0, θ 0 ), k = d 1 + 1,..., d, then the expansion in Lemma 5 leads to L k (β, ˆθ) np λ n,k (β k) = L k (β 0, θ 0 ) G k (β 0, θ 0 ) + n d (J 1 ) kj (β j β j0 ) np λ n,k ( β k )sign(β k ) + o p (n 1/2 ). j=1 The first three terms of the above display are all of order O p (n 1/2 ), hence we have L k (β, ˆθ) np λ n,k (β k) = n np λ n ( β k )sign(β k ) + O p (1). Due to (8). the sign of L k (β) np λ n (β k ) is decided by sign(β k ) completely. From the continuity of L k (β) np λ n (β k ), we obtain that it is zero at β k = 0 with probability larger than any 1 ɛ. Proof of Theorem 6: We let λ n be sufficiently small so that a n = o(n 1/2 ). From Theorem 5, there is a root-n consistent estimator ˆβ. From Lemma 3, ˆβ = ( ˆβ T I, 0T ) T, so (i) is shown. Denote the first d 1 equations in n L i(β T I, 0T ) T, ˆθ as Lβ I, ˆθ(β I ), and that in n Ψ i(β 0, θ 0 )U(Z i ) as G(β I0, θ 0 ). Note that the d 1 d 1 upper left block of J 1 is the matrix A defined in Theorem 6. Use the expansion in Theorem 5 at β = (β T I, 0)T, the first d 1 equations yield 0 = L ˆβ I, ˆθ( ˆβ I ) np λ n,i ( ˆβ I ) We thus obtain = L(β I0, θ 0 ) G(β I0, θ 0 ) + na( ˆβ I β I0 ) nb n Σ + o p (1) ( ˆβ I β I0 ) + o p (n 1/2 ) [ ] = L(β I0, θ 0 ) G(β I0, θ 0 ) + n(a Σ) ˆβI β I0 (A Σ) 1 b + o p (n 1/2 ). n ˆβI β I0 (A Σ) b 1 = n 1/2 (A Σ) 1 L(β I0, θ 0 ) G(β I0, θ 0 ) + o p (1). Because of condition (C4), the results in (ii) now follow. List of Regularity Conditions for Theorem 7 and 8 27


More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

High-dimensional regression

High-dimensional regression High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and

More information

Lecture 5: Soft-Thresholding and Lasso

Lecture 5: Soft-Thresholding and Lasso High Dimensional Data and Statistical Learning Lecture 5: Soft-Thresholding and Lasso Weixing Song Department of Statistics Kansas State University Weixing Song STAT 905 October 23, 2014 1/54 Outline Penalized

More information

Robust Variable Selection Through MAVE

Robust Variable Selection Through MAVE Robust Variable Selection Through MAVE Weixin Yao and Qin Wang Abstract Dimension reduction and variable selection play important roles in high dimensional data analysis. Wang and Yin (2008) proposed sparse

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Ultra High Dimensional Variable Selection with Endogenous Variables

Ultra High Dimensional Variable Selection with Endogenous Variables 1 / 39 Ultra High Dimensional Variable Selection with Endogenous Variables Yuan Liao Princeton University Joint work with Jianqing Fan Job Market Talk January, 2012 2 / 39 Outline 1 Examples of Ultra High

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS Jian Huang 1, Joel L. Horowitz 2, and Shuangge Ma 3 1 Department of Statistics and Actuarial Science, University

More information

Penalized Profiled Semiparametric Estimating Functions

Penalized Profiled Semiparametric Estimating Functions Electronic Journal of Statistics ISSN: 1935-7524 Penalized Profiled Semiparametric Estimating Functions Lan Wang School of Statistics, University of Minnesota, Minneapolis, MN 55455, e-mail: wangx346@umn.edu

More information

Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach

Web-based Supplementary Material for. Dependence Calibration in Conditional Copulas: A Nonparametric Approach 1 Web-based Supplementary Material for Dependence Calibration in Conditional Copulas: A Nonparametric Approach Elif F. Acar, Radu V. Craiu, and Fang Yao Web Appendix A: Technical Details The score and

More information

M-estimation in high-dimensional linear model

M-estimation in high-dimensional linear model Wang and Zhu Journal of Inequalities and Applications 208 208:225 https://doi.org/0.86/s3660-08-89-3 R E S E A R C H Open Access M-estimation in high-dimensional linear model Kai Wang and Yanling Zhu *

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

arxiv: v2 [stat.me] 4 Jun 2016

arxiv: v2 [stat.me] 4 Jun 2016 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates 1 Variable Selection for Additive Partial Linear Quantile Regression with Missing Covariates Ben Sherwood arxiv:1510.00094v2

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song

STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song STAT 992 Paper Review: Sure Independence Screening in Generalized Linear Models with NP-Dimensionality J.Fan and R.Song Presenter: Jiwei Zhao Department of Statistics University of Wisconsin Madison April

More information

ESTIMATION AND VARIABLE SELECTION FOR GENERALIZED ADDITIVE PARTIAL LINEAR MODELS

ESTIMATION AND VARIABLE SELECTION FOR GENERALIZED ADDITIVE PARTIAL LINEAR MODELS Submitted to the Annals of Statistics ESTIMATION AND VARIABLE SELECTION FOR GENERALIZED ADDITIVE PARTIAL LINEAR MODELS BY LI WANG, XIANG LIU, HUA LIANG AND RAYMOND J. CARROLL University of Georgia, University

More information

Comparisons of penalized least squares. methods by simulations

Comparisons of penalized least squares. methods by simulations Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy

More information

Moment and IV Selection Approaches: A Comparative Simulation Study

Moment and IV Selection Approaches: A Comparative Simulation Study Moment and IV Selection Approaches: A Comparative Simulation Study Mehmet Caner Esfandiar Maasoumi Juan Andrés Riquelme August 7, 2014 Abstract We compare three moment selection approaches, followed by

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

Linear Models in Machine Learning

Linear Models in Machine Learning CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,

More information

Regularization Path Algorithms for Detecting Gene Interactions

Regularization Path Algorithms for Detecting Gene Interactions Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable

More information

VARIABLE SELECTION IN ROBUST LINEAR MODELS

VARIABLE SELECTION IN ROBUST LINEAR MODELS The Pennsylvania State University The Graduate School Eberly College of Science VARIABLE SELECTION IN ROBUST LINEAR MODELS A Thesis in Statistics by Bo Kai c 2008 Bo Kai Submitted in Partial Fulfillment

More information

VARIABLE SELECTION AND ESTIMATION WITH THE SEAMLESS-L 0 PENALTY

VARIABLE SELECTION AND ESTIMATION WITH THE SEAMLESS-L 0 PENALTY Statistica Sinica 23 (2013), 929-962 doi:http://dx.doi.org/10.5705/ss.2011.074 VARIABLE SELECTION AND ESTIMATION WITH THE SEAMLESS-L 0 PENALTY Lee Dicker, Baosheng Huang and Xihong Lin Rutgers University,

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

Robust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly

Robust Variable Selection Methods for Grouped Data. Kristin Lee Seamon Lilly Robust Variable Selection Methods for Grouped Data by Kristin Lee Seamon Lilly A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Regularization: Ridge Regression and the LASSO

Regularization: Ridge Regression and the LASSO Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model

More information

MSA220/MVE440 Statistical Learning for Big Data

MSA220/MVE440 Statistical Learning for Big Data MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification

More information

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage

BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement

More information

Consistent change point estimation. Chi Tim Ng, Chonnam National University Woojoo Lee, Inha University Youngjo Lee, Seoul National University

Consistent change point estimation. Chi Tim Ng, Chonnam National University Woojoo Lee, Inha University Youngjo Lee, Seoul National University Consistent change point estimation Chi Tim Ng, Chonnam National University Woojoo Lee, Inha University Youngjo Lee, Seoul National University Outline of presentation Change point problem = variable selection

More information

A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces

A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces Journal of Machine Learning Research 17 (2016) 1-26 Submitted 6/14; Revised 5/15; Published 4/16 A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces Xiang Zhang Yichao

More information

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA

The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:

More information

On Mixture Regression Shrinkage and Selection via the MR-LASSO

On Mixture Regression Shrinkage and Selection via the MR-LASSO On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University

More information

Grouped variable selection in high dimensional partially linear additive Cox model

Grouped variable selection in high dimensional partially linear additive Cox model University of Iowa Iowa Research Online Theses and Dissertations Fall 2010 Grouped variable selection in high dimensional partially linear additive Cox model Li Liu University of Iowa Copyright 2010 Li

More information

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests

Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University

More information

Generalized Elastic Net Regression

Generalized Elastic Net Regression Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,

More information

Variable Selection in Multivariate Multiple Regression

Variable Selection in Multivariate Multiple Regression Variable Selection in Multivariate Multiple Regression by c Anita Brobbey A thesis submitted to the School of Graduate Studies in partial fulfillment of the requirement for the Degree of Master of Science

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

Chapter 3. Linear Models for Regression

Chapter 3. Linear Models for Regression Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear

More information

Ensemble estimation and variable selection with semiparametric regression models

Ensemble estimation and variable selection with semiparametric regression models Ensemble estimation and variable selection with semiparametric regression models Sunyoung Shin Department of Mathematical Sciences University of Texas at Dallas Joint work with Jason Fine, Yufeng Liu,

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Function of Longitudinal Data

Function of Longitudinal Data New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li Abstract This paper develops a new estimation of nonparametric regression functions for

More information

A Confidence Region Approach to Tuning for Variable Selection

A Confidence Region Approach to Tuning for Variable Selection A Confidence Region Approach to Tuning for Variable Selection Funda Gunes and Howard D. Bondell Department of Statistics North Carolina State University Abstract We develop an approach to tuning of penalized

More information

VARIABLE SELECTION IN QUANTILE REGRESSION

VARIABLE SELECTION IN QUANTILE REGRESSION Statistica Sinica 19 (2009), 801-817 VARIABLE SELECTION IN QUANTILE REGRESSION Yichao Wu and Yufeng Liu North Carolina State University and University of North Carolina, Chapel Hill Abstract: After its

More information

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite

More information

STA 450/4000 S: January

STA 450/4000 S: January STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have

More information

Model Selection. Frank Wood. December 10, 2009

Model Selection. Frank Wood. December 10, 2009 Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide

More information

Efficient Estimation of Population Quantiles in General Semiparametric Regression Models

Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Arnab Maity 1 Department of Statistics, Texas A&M University, College Station TX 77843-3143, U.S.A. amaity@stat.tamu.edu

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Bootstrapped Basis Regression with Variable Selection: A New Method for Flexible Functional Form Estimation

Bootstrapped Basis Regression with Variable Selection: A New Method for Flexible Functional Form Estimation Bootstrapped Basis Regression with Variable Selection: A New Method for Flexible Functional Form Estimation Brenton Kenkel Curtis S. Signorino January 23, 2013 Work in progress: comments welcome. Abstract

More information

High-dimensional regression modeling

High-dimensional regression modeling High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS

ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS The Annals of Statistics 2008, Vol. 36, No. 2, 587 613 DOI: 10.1214/009053607000000875 Institute of Mathematical Statistics, 2008 ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates

Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates Jingyuan Liu, Runze Li and Rongling Wu Abstract This paper is concerned with feature screening and variable selection

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH

FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH Jian-Jian Ren 1 and Mai Zhou 2 University of Central Florida and University of Kentucky Abstract: For the regression parameter

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:

More information

Gaussian Graphical Models and Graphical Lasso

Gaussian Graphical Models and Graphical Lasso ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf

More information

Full likelihood inferences in the Cox model: an empirical likelihood approach

Full likelihood inferences in the Cox model: an empirical likelihood approach Ann Inst Stat Math 2011) 63:1005 1018 DOI 10.1007/s10463-010-0272-y Full likelihood inferences in the Cox model: an empirical likelihood approach Jian-Jian Ren Mai Zhou Received: 22 September 2008 / Revised:

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

Estimating subgroup specific treatment effects via concave fusion

Estimating subgroup specific treatment effects via concave fusion Estimating subgroup specific treatment effects via concave fusion Jian Huang University of Iowa April 6, 2016 Outline 1 Motivation and the problem 2 The proposed model and approach Concave pairwise fusion

More information

New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data

New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data ew Local Estimation Procedure for onparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li The Pennsylvania State University Technical Report Series #0-03 College of Health and Human

More information

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n

More information