Variable Selection in Measurement Error Models 1
Variable Selection in Measurement Error Models 1

Yanyuan Ma
Institut de Statistique, Université de Neuchâtel, Pierre à Mazel 7, 2000 Neuchâtel

Runze Li
Department of Statistics, Pennsylvania State University, University Park, PA

Abstract

Compared with ordinary regression models, the computational cost of estimating parameters in general measurement error models is often much higher because the estimation procedures typically require solving integral equations. In addition, natural criterion functions are often unavailable for general measurement error models. Thus, traditional variable selection procedures become infeasible in the measurement error model context. In this paper, we develop a new framework for variable selection in measurement error models via penalized estimating equations. We first propose a new class of variable selection procedures for general parametric measurement error models and for general semiparametric measurement error models, and study the asymptotic properties of the proposed procedures. Then, under certain regularity conditions and with a properly chosen regularization parameter, we demonstrate that the proposed procedure performs as well as an oracle procedure. We assess the finite sample performance by Monte Carlo simulation studies and illustrate the proposed methodology through the empirical analysis of a real data set.

1 Introduction

Variable selection is fundamental in high-dimensional statistical modeling and has received much attention in the recent literature. Frank and Friedman (1993) proposed bridge regression for variable selection in linear models. Tibshirani (1996) proposed the LASSO method for variable selection in linear models. Fan and Li (2001) proposed a nonconcave penalized likelihood method for variable selection in likelihood-based models. The nonconcave penalized likelihood approach has been

1 AMS 2000 subject classifications. Primary 62G08, 62G10; secondary 62G20.
Key Words: Errors in variables; Estimating equations; Kernel regression; Instrumental variables; Latent variables; Local efficiency; Measurement error; Nonconcave penalized likelihood; SCAD; Semiparametric methods. Li's research was supported by NSF grants DMS
further extended to Cox's model for survival data (Fan and Li, 2002) and to partially linear models for longitudinal data (Fan and Li, 2004). The LASSO method has become very popular in the machine learning literature. The whole solution path of the LASSO can be obtained by the least angle regression (LARS) algorithm (Efron et al., 2004). Yuan and Lin (2005) provided insights into the LASSO method from an empirical Bayes point of view, while Yuan and Lin (2006) extended the LASSO method to grouped variable selection in linear models. Zou (2006) proposed the adaptive LASSO for linear models to reduce the estimation bias of the original LASSO. Zhang and Lu (2007) studied the adaptive LASSO for the Cox model. Fan and Li (2006) presented a review of feature selection in high-dimensional modeling. In many situations, covariates can only be measured imprecisely or indirectly, resulting in the so-called measurement error models, or errors-in-variables models, of the literature. Various statistical procedures have been developed for analyzing measurement error models (Fuller, 1987; Carroll et al., 2006). Tsiatis and Ma (2004) proposed a locally efficient estimator for parametric measurement error models, and Ma and Carroll (2006) further extended the locally efficient estimation procedure to semiparametric measurement error models. To the best of our knowledge, variable selection for measurement error models has not been systematically studied yet. This paper develops a class of variable selection procedures for both parametric and semiparametric measurement error models. Liang and Li (2005) extended the nonconcave penalized likelihood method to linear and partially linear measurement error models, but their method is not applicable to measurement error models beyond the linear or partially linear setting, such as logistic measurement error models and generalized partially linear measurement error models.
Variable selection for general parametric or semiparametric measurement error models is much more challenging than for linear or partially linear models. One major difficulty is the lack of a likelihood function. In fact, classical model selection methods such as BIC and AIC, as well as the aforementioned penalized likelihood methods, all require computing the likelihood, which is typically formidable in measurement error models because of the difficulty of obtaining the distribution of the error-prone covariates. Although a reasonable criterion function could in principle be used in place of the likelihood, the difficulty persists in that, except for very special models such as the linear or partially linear cases, even a criterion function is unavailable. To develop variable selection procedures for both parametric and semiparametric measurement error models, we propose a penalized estimating equation method and systematically study the asymptotic properties of the proposed procedures. In this paper, we consider two scenarios: one in which the number of regression coefficients is fixed and finite, and one in which the number of regression coefficients diverges as the sample size increases. We demonstrate how the convergence
rate of the resulting estimator depends on the regularization parameter. We further show that, with a proper choice of the regularization parameter and the penalty function, the penalized estimating equation estimator possesses the oracle property (Fan and Li, 2001). It is desirable to have an automatic, data-driven method to select the regularization parameter; we propose GCV-type and BIC-type tuning parameter selectors for the proposed penalized estimating equation method. Monte Carlo simulation studies are conducted to assess finite sample performance in terms of model complexity and model error.

The rest of the paper is organized as follows. We propose a new class of variable selection procedures for parametric measurement error models and study the asymptotic properties of the proposed procedures in Section 2. We develop a new variable selection procedure for semiparametric measurement error models in Section 3. Implementation issues and numerical examples are presented in Section 4, where we describe a data-driven automatic tuning parameter selection method (Section 4.1), define the concept of approximate model error to evaluate the selected model (Section 4.2), carry out a simulation study to assess the finite sample performance of the proposed procedures (Section 4.3), and illustrate our method in an example (Section 4.4). Technical details are collected in Section 5. We give some discussion in Section 6.

2 Parametric measurement error models

A general parametric measurement error model has two parts and can be written as

$$p_{Y|X,Z}(Y|X,Z,\beta) \quad\text{and}\quad p_{W|X,Z}(W|X,Z,\xi). \qquad (1)$$

The main model is $p_{Y|X,Z}(Y|X,Z,\beta)$, the conditional probability density function (pdf) of the response variable $Y$ given the covariates measured with error, $X$, and the covariates measured without error, $Z$. The error model is $p_{W|X,Z}(W|X,Z,\xi)$, where $W$ is an observable surrogate of $X$.
The parameter $\beta$ is a $d$-dimensional regression coefficient, $\xi$ is a finite-dimensional parameter, and our main interest is in selecting the relevant subset of covariates in $X$ and $Z$ and estimating the corresponding parameters contained in $\beta$. Typically, $\xi$ is a nuisance parameter, and its estimation usually requires multiple observations or instrumental variables. As in the literature, for simplicity we assume in the main part of this paper that the error model $p_{W|X,Z}(W|X,Z)$ is completely known, and hence $\xi$ is suppressed. We discuss the treatment of the unknown-$\xi$ case in Section 6. The observed data are of the form $(W_i, Z_i, Y_i)$, $i = 1, \dots, n$. Denote by $S^*_\beta$ the purported score function, that is,

$$S^*_\beta(W,Z,Y) = \frac{\partial}{\partial\beta}\,\log \int p_{W|X,Z}(W|X,Z)\, p_{Y|X,Z}(Y|X,Z,\beta)\, p^*_{X|Z}(X|Z)\, d\mu(X),$$
where $p^*_{X|Z}(X|Z)$ is a conditional pdf that one posits, which may or may not equal the true pdf $p_{X|Z}(X|Z)$. The function $a(X,Z)$ satisfies

$$E^*[\,E^*\{a(X,Z)\,|\,W,Z,Y\}\,|\,X,Z\,] = E^*\{S^*_\beta(W,Z,Y)\,|\,X,Z\},$$

where $E^*$ indicates that the expectation is calculated using the posited $p^*_{X|Z}(X|Z)$. Let

$$S^*_{\rm eff}(W,Z,Y) = S^*_\beta(W,Z,Y) - E^*\{a(X,Z)\,|\,W,Z,Y\}.$$

Define the penalized estimating equations for model (1) to be

$$\sum_{i=1}^n S^*_{\rm eff}(W_i,Z_i,Y_i,\beta) - n\,\dot p_\lambda(\beta) = 0, \qquad (2)$$

where $\dot p_\lambda(\beta) = \{p'_\lambda(\beta_1), \dots, p'_\lambda(\beta_d)\}^T$ and $p'_\lambda(\cdot)$ is the first-order derivative of a penalty function $p_\lambda(\cdot)$. In practice, we may allow different coefficients to have penalty functions with different regularization parameters; for example, we may want to keep some variables in the model and not penalize their coefficients. For ease of presentation, we assume in this paper that the penalty functions and the regularization parameters are the same for all coefficients. The choice of penalty function has been studied in depth by Fan and Li (2001). The penalties in the classical variable selection criteria, such as the AIC and BIC, cannot be applied to the penalized estimating equations. The $L_q$ penalty, namely $p_\lambda(\theta) = q^{-1}\lambda|\theta|^q$, was proposed for bridge regression in Frank and Friedman (1993). Fan and Li (2001) advocated the use of the SCAD penalty, whose first-order derivative is defined as

$$p'_\lambda(\theta) = \lambda\left\{ I(|\theta| \le \lambda) + \frac{(a\lambda - |\theta|)_+}{(a-1)\lambda}\, I(|\theta| > \lambda) \right\} {\rm sign}(\theta), \qquad (3)$$

where $a > 2$ and ${\rm sign}(\cdot)$ is the sign function, i.e., ${\rm sign}(\theta) = -1$, $0$, and $1$ when $\theta < 0$, $\theta = 0$, and $\theta > 0$, respectively. Fan and Li (2001) further suggested using $a = 3.7$ from a Bayesian point of view. With a proper choice of penalty function, such as the SCAD penalty, the resulting estimate contains some exact zero coefficients. This is equivalent to excluding the corresponding variables from the final selected model, thus achieving the purpose of variable selection for parametric measurement error models.
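For concreteness, the SCAD derivative in (3) can be sketched in a few lines; this is an illustrative implementation, not code from the paper:

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First-order derivative p'_lambda(theta) of the SCAD penalty,
    following equation (3); a = 3.7 as suggested by Fan and Li (2001)."""
    t = np.abs(np.atleast_1d(theta)).astype(float)
    inner = (t <= lam).astype(float)                      # |theta| <= lambda: L1-like slope
    outer = np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam)  # tapering zone
    return lam * (inner + outer) * np.sign(theta)         # zero for |theta| >= a*lambda
```

Note the three regimes: a constant slope $\lambda$ near zero (producing exact zeros), a linearly decaying slope for $\lambda < |\theta| < a\lambda$, and no penalization beyond $a\lambda$ (producing unbiased large coefficients).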
We now study the asymptotic properties of the resulting estimator from the penalized estimating equations. Denote by $\beta_0 = (\beta_{10}, \dots, \beta_{d0})^T$ the true value of $\beta$. Let

$$a_n = \max\{\,|p'_{\lambda_n}(|\beta_{j0}|)| : \beta_{j0} \ne 0\,\} \qquad (4)$$

and

$$b_n = \max\{\,|p''_{\lambda_n}(|\beta_{j0}|)| : \beta_{j0} \ne 0\,\}, \qquad (5)$$
where we write $\lambda$ as $\lambda_n$ to emphasize that $\lambda$ depends on the sample size $n$. We first establish the convergence rate of the penalized estimating equation estimator in the setting where the dimension of $\beta$ is fixed and finite.

Theorem 1 Suppose that the regularity conditions (A1)-(A4) in Section 5 hold. If both $a_n$ and $b_n$ tend to 0 as $n \to \infty$, then, with probability tending to one, there exists a root of (2), denoted $\hat\beta$, such that $\|\hat\beta - \beta_0\| = O_p(n^{-1/2} + a_n)$.

Before we demonstrate that the resulting estimator possesses the oracle property, let us introduce some notation. Without loss of generality, assume $\beta_0 = (\beta_{I0}^T, \beta_{II0}^T)^T$, where in the true model every element of $\beta_{I0}$ is nonzero while $\beta_{II0} = 0$. Throughout this paper, denote the dimension of $\beta_I$ by $d_1$ and that of $\beta_{II}$ by $d_2$, and let $d = d_1 + d_2$. Furthermore, denote

$$b = \{p'_{\lambda_n}(\beta_{10}), \dots, p'_{\lambda_n}(\beta_{d_1 0})\}^T, \qquad (6)$$

$$\Sigma = {\rm diag}\{p''_{\lambda_n}(\beta_{10}), \dots, p''_{\lambda_n}(\beta_{d_1 0})\}, \qquad (7)$$

and denote the first $d_1$ components of $S^*_{\rm eff}(W,Z,Y,\beta)$ by $S^*_{{\rm eff},I}(\beta)$.

Theorem 2 Under the regularity conditions of Theorem 1, if

$$\liminf_{n\to\infty}\ \liminf_{\theta\to 0+}\ \sqrt{n}\, p'_{\lambda_n}(\theta) = \infty, \qquad (8)$$

then with probability tending to one, any root-$n$ convergent solution $\hat\beta = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of (2) must satisfy

(i) $\hat\beta_{II} = 0$;

(ii)
$$\sqrt{n}\left[\hat\beta_I - \beta_{I0} + \left\{E\!\left(\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T}\right) - \Sigma\right\}^{-1} b\right] \stackrel{D}{\to} N\!\left(0,\ \left\{E\!\left(\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T}\right) - \Sigma\right\}^{-1} E\{S^*_{{\rm eff},I}(\beta_{I0})\, S^{*T}_{{\rm eff},I}(\beta_{I0})\} \left\{E\!\left(\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T}\right) - \Sigma\right\}^{-T}\right),$$

where $\stackrel{D}{\to}$ stands for convergence in distribution, and we use the notation $M^{-T}$ to denote $(M^{-1})^T$ for a matrix $M$.
For some penalty functions, including the SCAD penalty, $b$ and $\Sigma$ are zero when $\lambda_n$ is sufficiently small, so we actually have

$$\sqrt{n}(\hat\beta_I - \beta_{I0}) \stackrel{D}{\to} N\!\left(0,\ E(\partial S^*_{{\rm eff},I}/\partial\beta_I^T)^{-1}\, E(S^*_{{\rm eff},I} S^{*T}_{{\rm eff},I})\, E(\partial S^*_{{\rm eff},I}/\partial\beta_I^T)^{-T}\right).$$

In other words, with probability tending to 1, the penalized estimator performs the same as the locally efficient estimator under the correct model; hence it has the celebrated oracle property.

In Theorems 1 and 2, both $d_1$ and $d$ are fixed as $n \to \infty$. Concerns about model bias often prompt us to build models that contain many variables, especially when the sample size is large. A reasonable way to capture this tendency is to consider the situation where the dimension of the parameter $\beta$ increases along with the sample size $n$. Obviously, the method in (2) (and also in (11) later on, for the semiparametric model case) can still be used in this situation; the operation does not require any modification from the fixed-dimension case. However, the large sample properties of the same procedure will inevitably behave differently when the number of parameters increases. We therefore study the sampling properties of the penalized estimating equation estimator in the setting where both $d_1$ and $d$ tend to infinity as $n$ goes to infinity. To reflect that $d$ and $d_1$ change with $n$, we use the notation $d_{n1}$, $d_n$. The issue of a diverging number of parameters has also been considered in Fan and Peng (2004) in the context of penalized likelihood. Because semiparametric efficiency is not defined in this setting, our result no longer implies any local efficiency. However, when $\lambda_n$ is sufficiently small, both $b$ and $\Sigma$ are zero, so the oracle property still holds. The main results are given in the following two theorems, whose proofs are given in Section 5.
Theorem 3 Under the regularity conditions (B1)-(B6) in Section 5, if $d_n^4 n^{-1} \to 0$ and $\lambda_n \to 0$ as $n \to \infty$, then with probability tending to one, there exists a root of (2), denoted $\hat\beta_n$, such that $\|\hat\beta_n - \beta_{n0}\| = O_p\{\sqrt{d_n}\,(n^{-1/2} + a_n)\}$.

Theorem 4 Under the regularity conditions (B1)-(B6) in Section 5, assume $\lambda_n \to 0$ and $d_n^5/n \to 0$ as $n \to \infty$. If

$$\liminf_{n\to\infty}\ \liminf_{\theta\to 0+}\ \sqrt{n/d_n}\; p'_{\lambda_n}(\theta) = \infty, \qquad (9)$$

then with probability tending to one, any root-$n$ consistent solution $\hat\beta_n = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of (2) must satisfy

(i) $\hat\beta_{II} = 0$,
(ii) for any $d_{n1} \times 1$ vector $v$ such that $v^T v = 1$,

$$\sqrt{n}\, v^T \left[E\{S^*_{{\rm eff},I}(\beta_{I0})\, S^{*T}_{{\rm eff},I}(\beta_{I0})\}\right]^{-1/2} \left\{E\!\left(\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T}\right) - \Sigma\right\} \left[\hat\beta_I - \beta_{I0} + \left\{E\!\left(\frac{\partial S^*_{{\rm eff},I}(\beta_{I0})}{\partial\beta_I^T}\right) - \Sigma\right\}^{-1} b\right] \stackrel{D}{\to} N(0,1).$$

3 Semiparametric measurement error models

The semiparametric measurement error model we consider here also has two parts. The major difference from its parametric counterpart is that the main model contains an unknown function of one of the observable covariates. For notational convenience, we rewrite the observable covariates as two parts, $Z$ and $S$, where $Z$ now denotes an observable covariate that enters the model nonparametrically, say through $\theta(Z)$, and $S$ denotes the observable covariates that enter the model parametrically. The resulting semiparametric measurement error model can be summarized as

$$p_{Y|X,Z,S}\{Y|X,Z,S,\beta,\theta(Z)\} \quad\text{and}\quad p_{W|X,Z,S}(W|X,Z,S). \qquad (10)$$

The method we propose in this situation admits a similarly simple form,

$$\sum_{i=1}^n L(W_i, Z_i, S_i, Y_i, \beta, \hat\theta_i) - n\,\dot p_\lambda(\beta) = 0, \qquad (11)$$

where $\dot p_\lambda(\beta)$ has the same form as in (2). However, the computation of $L$ is more involved, as we now describe. If we replace $\theta(Z)$ with a single unknown parameter $\alpha$ and append $\alpha$ to $\beta$, we obtain a parametric measurement error model with parameters $(\beta^T, \alpha)^T$. For this parametric model, we can compute its corresponding $S^*_{\rm eff}$ as we did in Section 2. Decompose $S^*_{\rm eff}$ into $L(X,Z,S,Y,\beta,\alpha)$ and $\Psi(X,Z,S,Y,\beta,\alpha)$, where $L$ has the same dimension as $\beta$ and $\Psi$ is a one-dimensional function. We solve for $\hat\theta_j$, $j = 1, \dots, n$, from

$$\sum_{i=1}^n K_h(Z_i - Z_j)\, \Psi(W_i, Z_i, S_i, Y_i; \beta, \theta_j) = 0, \qquad j = 1, \dots, n, \qquad (12)$$

where $K_h(\cdot) = h^{-1}K(\cdot/h)$, $K$ is a kernel function, and $h$ is a bandwidth. Inserting the $\hat\theta_i$'s into $L$ in (11), we obtain a complete description of the estimator. Note that since $\hat\theta_i$ depends on $\beta$, a more precise notation for $\hat\theta_i$ is $\hat\theta_i(\beta)$.
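The profiling step in (12) can be sketched numerically: for each $j$, a kernel-weighted scalar equation is solved in $\theta_j$. The sketch below uses an Epanechnikov kernel and bisection, and assumes the user-supplied `psi` is monotone decreasing in $\theta$ (true, e.g., for a residual-type $\Psi$); the function names and interface are illustrative, not from the paper.

```python
import numpy as np

def kernel_theta_hat(Z, beta, psi, h, grid=(-10.0, 10.0), iters=60):
    """For each j, solve sum_i K_h(Z_i - Z_j) * psi(i, beta, theta_j) = 0
    in theta_j by bisection, mirroring the system of equations (12).
    psi(indices, beta, theta) returns the Psi values for the given units."""
    n = len(Z)
    theta_hat = np.empty(n)
    idx = np.arange(n)
    for j in range(n):
        # Epanechnikov kernel weights K_h(Z_i - Z_j) = h^{-1} K{(Z_i - Z_j)/h}
        w = 0.75 * np.maximum(1.0 - ((Z - Z[j]) / h) ** 2, 0.0) / h
        lo, hi = grid
        for _ in range(iters):               # bisection on the weighted sum
            mid = 0.5 * (lo + hi)
            val = np.sum(w * psi(idx, beta, mid))
            lo, hi = (mid, hi) if val > 0 else (lo, mid)
        theta_hat[j] = 0.5 * (lo + hi)
    return theta_hat
```

In a full implementation these $\hat\theta_j(\beta)$ would be recomputed at every update of $\beta$ when solving (11).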
Similar results hold for the semiparametric penalized estimator. In the semiparametric model setting, let $L_I$ be the first $d_1$ components of $L$, let $L_{I\beta_I}$ be the partial derivative of $L_I$ with respect to $\beta_I$, $L_{I\theta}$ the partial derivative of $L_I$ with respect to $\theta$, $\Psi_\theta$ the partial derivative of $\Psi$ with respect to $\theta$, and $\Psi_{\beta_I}$ the partial derivative of $\Psi$ with respect to $\beta_I$. Also define $\Omega(Z) = E(\Psi_\theta|Z)$, $U_I(Z) = E(L_{I\theta}|Z)/\Omega(Z)$ and $\theta_{\beta_I}(Z) = -E(\Psi_{\beta_I}|Z)/\Omega(Z)$. Note that the dimensions of $\beta_I$ and $\beta$ here, $d_1$ and $d$, are fixed even as the sample size $n$ increases.

Theorem 5 Under the regularity conditions (C1)-(C6) in Section 5, if $\lambda_n \to 0$ as $n \to \infty$, then with probability tending to one, there exists a root of (11), denoted $\hat\beta$, such that $\|\hat\beta - \beta_0\| = O_p(n^{-1/2} + a_n)$.

Further defining

$$A = E\left[L_{I\beta_I}\{W,Z,S,Y,\beta_0,\theta_0(Z)\} + L_{I\theta}\{W,Z,S,Y,\beta_0,\theta_0(Z)\}\,\theta^T_{\beta_I}(Z,\beta_0)\right],$$

$$B = {\rm cov}\left[L_I\{W,Z,S,Y,\beta_0,\theta_0(Z)\} - \Psi\{W,Z,S,Y,\beta_0,\theta_0(Z)\}\,U_I(Z)\right],$$

we obtain the following result.

Theorem 6 Under the regularity conditions (C1)-(C6) in Section 5, if $\lambda_n \to 0$ as $n \to \infty$ and (8) holds, then with probability tending to one, any root-$n$ consistent solution $\hat\beta = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of (11) must satisfy

(i) $\hat\beta_{II} = 0$,

(ii) $\hat\beta_I$ has the asymptotic normality

$$\sqrt{n}\{\hat\beta_I - \beta_{I0} + (A - \Sigma)^{-1} b\} \stackrel{D}{\to} N\{0,\ (A - \Sigma)^{-1} B (A - \Sigma)^{-T}\}.$$

For some penalty functions, including the SCAD penalty, $b$ and $\Sigma$ are zero when $\lambda_n$ is sufficiently small, so we have $\sqrt{n}(\hat\beta_I - \beta_{I0}) \stackrel{D}{\to} N(0, A^{-1} B A^{-T})$. These are exactly the first order asymptotic properties of the locally efficient estimator for such a semiparametric model when only the first $d_1$ parameters are included in the model; that is, the procedure possesses the oracle property. Next, in the following two theorems, we present the asymptotic properties of the estimator for increasing dimensions $d_n$ and $d_{n1}$.
Theorem 7 Under the regularity conditions (D1)-(D7) in Section 5, if $d_n^4 n^{-1} \to 0$ and $\lambda_n \to 0$ as $n \to \infty$, then with probability tending to one, there exists a root of (11), denoted $\hat\beta_n$, such that $\|\hat\beta_n - \beta_{n0}\| = O_p\{\sqrt{d_n}\,(n^{-1/2} + a_n)\}$.
Theorem 8 Under the regularity conditions (D1)-(D7) in Section 5, if $\lambda_n \to 0$, $d_n^5/n \to 0$, and (9) holds, then with probability tending to one, any root-$n$ consistent solution $\hat\beta_n = (\hat\beta_I^T, \hat\beta_{II}^T)^T$ of (11) must satisfy

(i) $\hat\beta_{II} = 0$,

(ii) for any $d_{n1} \times 1$ vector $v$ such that $v^T v = 1$,

$$\sqrt{n/d_n}\; v^T B^{-1/2} (A - \Sigma) \{\hat\beta_I - \beta_{I0} + (A - \Sigma)^{-1} b\} \stackrel{D}{\to} N(0,1).$$

4 Numerical studies and application

In this section, we assess the finite sample performance of the proposed procedure by Monte Carlo simulation and illustrate the proposed methodology through an empirical analysis of the Framingham heart study data. In our simulations, we concentrate on the performance of the proposed procedure for a quadratic logistic measurement error model and a partially linear logistic measurement error model, in terms of model complexity and model error.

4.1 Tuning parameter selection

To implement the Newton-Raphson algorithm for solving the penalized estimating equations, we locally approximate the first order derivative of the penalty function by a linear function, following the local quadratic approximation algorithm proposed in Fan and Li (2001). Specifically, suppose that at the $k$th step of the iteration we obtain the value $\beta^{(k)}$. Then, for $\beta_j^{(k)}$ not very close to zero,

$$p'_\lambda(\beta_j) = p'_\lambda(|\beta_j|)\,{\rm sign}(\beta_j) \approx \frac{p'_\lambda(|\beta_j^{(k)}|)}{|\beta_j^{(k)}|}\,\beta_j;$$

otherwise, we set $\beta_j^{(k+1)} = 0$ and exclude the corresponding covariate from the model. This approximation is updated at every step of the Newton-Raphson iteration. Following the results in Theorems 2 and 6, we can further approximate the variance of the resulting estimate by

$$\widehat{\rm cov}(\hat\beta) = \frac{1}{n}\,(\hat E - \Sigma_\lambda)^{-1}\, \hat F\, (\hat E - \Sigma_\lambda)^{-T},$$

where $\Sigma_\lambda$ is a diagonal matrix with elements $p'_\lambda(|\hat\beta_j|)/|\hat\beta_j|$ for the nonvanishing $\hat\beta_j$'s, a linear approximation of $\Sigma$ defined in (7).
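A single Newton-Raphson step under the local quadratic approximation can be sketched as follows. The interface (`G` for the summed estimating functions, `J` for their Jacobian, `p_deriv` for the penalty derivative) is an illustrative assumption, not the paper's code:

```python
import numpy as np

def lqa_newton_step(beta, G, J, lam, p_deriv, n, eps=1e-6):
    """One Newton step for sum_i S(beta) - n * p_dot_lambda(beta) = 0, using
    the local quadratic approximation p'(b_j) ~ {p'(|b_j|)/|b_j|} b_j.
    G(beta): d-vector of summed estimating functions; J(beta): its d x d Jacobian;
    p_deriv(t, lam): penalty derivative evaluated at t = |b_j|.
    Coefficients within eps of zero are fixed at zero (covariate excluded)."""
    active = np.abs(beta) > eps
    b = beta[active]
    E = np.diag(p_deriv(np.abs(b), lam) / np.abs(b))   # Sigma_lambda on active set
    Ja = J(beta)[np.ix_(active, active)]
    # Newton step for F(beta) = G(beta) - n * E * beta: solve (Ja - nE) step = -F
    step = np.linalg.solve(Ja - n * E, -(G(beta)[active] - n * E @ b))
    new = np.zeros_like(beta)
    new[active] = b + step
    new[np.abs(new) <= eps] = 0.0                      # threshold tiny coefficients
    return new
```

Iterating this step until convergence, with `p_deriv` taken to be the SCAD derivative (3), yields the penalized estimating equation estimator; the sandwich formula above then gives standard errors on the surviving coefficients.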
We use $\hat E$ to denote the sample approximation of $E\{\partial S^*_{{\rm eff},I}(W,Z,Y,\beta_I)/\partial\beta_I^T\}$ evaluated at $\hat\beta$ for the parametric model (1), and the sample approximation of the matrix $A$ evaluated at $\hat\beta$ for the semiparametric model (10). Similarly, we use $\hat F$ to denote the sample approximation of
10 Y. Ma and R. Li var(seff,i ) evaluated at ˆβ for the parametric model and the sample approximation of the matrix B evaluated at ˆβ for the semiparametric model, respectively. The accuracy of this sandwich formula will be tested in our simulation studies. It is desirable to have automatic, data-driven method to select the tuning parameter λ. Here we will consider two tuning parameter selectors, the GCV and BIC tuning parameter selectors. To define the GCV statistic and the BIC statistic, we need to define the degrees of freedom and goodness of fit measure for the final selected model. Similar to the nonconcave penalty likelihood approach (Fan and Li, 2001), we may define effective number of parameter or degrees of freedom to be df λ = tracei(i + Σ λ ) 1, where I stands for the Fisher Information matrix. For the logistic regression models that we employ in this section, a natural approximation of I, ignoring the measurement error effect, is C T QC, where C represents the covariates included in the model, and Q is a diagonal matrix with the ith element equal to ˆµ λ,i (1 ˆµ λ,i ). Here, ˆµ λ,i = P (Y i = 1 C i ). In the logistic regression model context of this section, we may employ its deviance as goodness of fit measure. Specifically, let µ i be the conditional expectation of Y i given its covariates for i = 1,, n. The deviance of a model fit ˆµ λ = (ˆµ λ,1,, ˆµ λ,n ) is defined to be D(ˆµ λ ) = 2 [Y i log(y i /ˆµ λ,i ) + (1 Y i )log(1 Y i )/(1 ˆµ λ,i )]. Define GCV statistic to be GCV (λ) = D(ˆµ λ ) n(1 df λ /n) 2, and the BIC statistic to be BIC(λ) = D(ˆµ λ ) + 2log(n)df λ. The GCV and the BIC tuning parameter selectors select λ by minimizing GCV (λ) and BIC(λ) respectively. Note that the BIC tuning parameter selector is distinguished from the traditional BIC variable selection criterion, which is not well defined for estimating equation methods. 
Wang, Li and Tsai (2007) studied the asymptotic behavior of the GCV and BIC tuning parameter selectors for nonconvex penalized least squares variable selection procedures in linear and partially linear regression models.
4.2 Model error

Let us first develop the model error for the logistic partially linear measurement error model. Denote $\mu(S,X,Z) = E(Y|S,X,Z)$; the model error of a model fit $\hat\mu(S,X,Z)$ is defined as

$${\rm ME}(\hat\mu) = E\{\hat\mu(S^*, X^*, Z^*) - \mu(S^*, X^*, Z^*)\}^2,$$

where the expectation is taken over the new observation $(S^*, X^*, Z^*)$. Let $g(\cdot)$ be the logit link. For the logistic partially linear model, the mean function has the form $\mu(S,X,Z) = g^{-1}\{\theta(Z) + \beta^T V\}$, where $V = (S^T, X^T)^T$. If $\hat\theta(\cdot)$ and $\hat\beta$ are consistent estimators of $\theta(\cdot)$ and $\beta$, respectively, then by a Taylor expansion the model error can be approximated by

$${\rm ME}(\hat\mu) \approx E\Big( \big[\dot g^{-1}\{\theta(Z^*) + \beta^T V^*\}\big]^2 \big[\{\hat\theta(Z^*) - \theta(Z^*)\}^2 + (\hat\beta^T V^* - \beta^T V^*)^2 + 2\{\hat\theta(Z^*) - \theta(Z^*)\}(\hat\beta^T V^* - \beta^T V^*)\big] \Big).$$

The first component is the inherent model error due to $\hat\theta(\cdot)$, the second is due to the lack of fit of $\hat\beta$, and the third is the cross-product of the first two components. Thus, to assess the performance of the proposed variable selection procedure, we define the approximate model error (AME) for $\hat\beta$ to be

$${\rm AME}(\hat\beta) = E\Big( \big[\dot g^{-1}\{\theta(Z^*) + \beta^T V^*\}\big]^2 (\hat\beta^T V^* - \beta^T V^*)^2 \Big).$$

Furthermore, the AME of $\hat\beta$ can be written as

$${\rm AME}(\hat\beta) = (\hat\beta - \beta)^T E\Big[ \big[\dot g^{-1}\{\theta(Z^*) + \beta^T V^*\}\big]^2 V^* V^{*T} \Big] (\hat\beta - \beta) \mathrel{\hat=} (\hat\beta - \beta)^T C_X (\hat\beta - \beta). \qquad (13)$$

In our simulations, the matrix $C_X$ is estimated by 1,000,000 Monte Carlo simulations. For measurement error data, we observe $W$ instead of $X$, so we also consider an alternative approximate model error,

$${\rm AME}_W(\hat\beta) = (\hat\beta - \beta)^T C_W (\hat\beta - \beta), \qquad (14)$$

where $C_W$ is obtained by replacing $X$ with $W$ in the definition of $C_X$. The quantities ${\rm AME}(\hat\beta)$ and ${\rm AME}_W(\hat\beta)$ are defined for the parametric model case, specifically the quadratic logistic model, by setting $\theta(\cdot) = 0$ and appending $X^2$ (or $W^2$) to $V$. Note that although we defined the model error in the context of a logit link function, it is certainly not restricted to that case.
The general approach for calculating the AME is to approximate the mean function evaluated at the estimated parameters around the true parameter value, and to extract the term that is quadratic in the error of the parameter of interest. ${\rm AME}_W$ is calculated by replacing $X$ with $W$.
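The quadratic form (13) with a Monte Carlo estimate of $C_X$ can be sketched directly; the function and argument names here are illustrative:

```python
import numpy as np

def approximate_model_error(beta_hat, beta, V_new, theta_vals):
    """AME(beta_hat) = (beta_hat - beta)^T C (beta_hat - beta), as in (13),
    with C estimated by a Monte Carlo average of
    [g^{-1}'{theta(Z*) + beta^T V*}]^2 V* V*^T over fresh draws V_new.
    g is the logit link, so g^{-1}'(u) = exp(u) / (1 + exp(u))^2."""
    eta = theta_vals + V_new @ beta
    w = (np.exp(eta) / (1.0 + np.exp(eta)) ** 2) ** 2   # [g^{-1}'(eta)]^2 per draw
    C = (V_new * w[:, None]).T @ V_new / len(w)          # Monte Carlo estimate of C_X
    diff = beta_hat - beta
    return diff @ C @ diff
```

Passing surrogate covariates $W$ in place of $X$ in `V_new` yields ${\rm AME}_W$ of (14) with the same code.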
4.3 Simulation examples

To demonstrate the performance of the method in both parametric and semiparametric measurement error models, we conduct two simulation studies.

Example 1. In this example, we generate data from a logistic model where the covariate measured with error enters the model through a quadratic function, and the covariates measured without error enter linearly. The measurement error follows a normal additive pattern. Specifically, we have

$${\rm logit}\{P(Y = 1|X, Z)\} = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 Z_1 + \beta_4 Z_2 + \beta_5 Z_3 + \beta_6 Z_4 + \beta_7 Z_5 + \beta_8 Z_6 + \beta_9 Z_7, \qquad W = X + U,$$

where $\beta = (0, 1.5, 2, 0, 3, 0, 1.5, 0, 0, 0)^T$, the covariate $X$ is generated from a normal distribution with mean 0 and variance 1, and $(Z_1, \dots, Z_6)^T$ is generated from a normal distribution with mean 0 and covariance between $Z_i$ and $Z_j$ equal to $0.5^{|i-j|}$. The last component of the $Z$ covariates, $Z_7$, is a binary variable taking values 0 and 1 with equal probability. The measurement error $U$ follows a mean zero normal distribution. In our simulation, the sample size is taken to be either $n = 500$ or $n = 1000$. The model complexity of the selected model is summarized by the number of zero coefficients, and the model error of the selected model is summarized by the relative approximate model error (RAME), defined as the ratio of the model error of the selected model to that of the full model. In Table 1, the RAME column reports the sample median and the median absolute deviation (MAD, divided by the usual normalizing factor) of the AME values defined in (13) over 1000 simulations. Similarly, RAME$_W$ corresponds to the AME$_W$ values defined in (14) over 1000 simulations. From Table 1, it can be seen that the values of RAME and RAME$_W$ are very close. The average count of zero coefficients is also reported in Table 1, in which the column labeled C gives the average count restricted to the true zero coefficients, while the column labeled E gives the average count of coefficients erroneously set to 0.
Table 1: MRMEs and Model Complexity for Example 1

n      Method   RAME Median(MAD)   RAME_W Median(MAD)   C   E
500    GCV      (0.231)            0.698(0.228)
500    BIC      (0.188)            0.396(0.187)
1000   GCV      (0.187)            0.770(0.185)
1000   BIC      (0.157)            0.401(0.158)
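For reproducibility, the data-generating mechanism of Example 1 can be sketched as follows. The measurement error standard deviation did not survive extraction of the text, so `sigma_u` below is an assumed placeholder, and the function name is illustrative:

```python
import numpy as np

def generate_example1(n, sigma_u=0.5, seed=0):
    """Simulate one data set from the quadratic logistic measurement error
    model of Example 1.  sigma_u is a placeholder for the (missing) error SD."""
    rng = np.random.default_rng(seed)
    beta = np.array([0, 1.5, 2, 0, 3, 0, 1.5, 0, 0, 0], dtype=float)
    X = rng.normal(size=n)                                # error-prone covariate
    cov = 0.5 ** np.abs(np.subtract.outer(np.arange(6), np.arange(6)))
    Z = rng.multivariate_normal(np.zeros(6), cov, size=n)  # cov(Z_i, Z_j) = 0.5^|i-j|
    Z7 = rng.integers(0, 2, size=n)                        # balanced binary covariate
    design = np.column_stack([np.ones(n), X, X**2, Z, Z7])
    p = 1.0 / (1.0 + np.exp(-design @ beta))               # logistic main model
    Y = rng.binomial(1, p)
    W = X + rng.normal(scale=sigma_u, size=n)              # surrogate W = X + U
    return Y, W, Z, Z7, X
```

In the simulation only $(Y, W, Z, Z_7)$ would be passed to the estimation procedure; $X$ is returned solely to allow computing the oracle quantities such as $C_X$.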
Table 2: Bias and Standard Errors for Example 1

                 $\hat\beta_1$                  $\hat\beta_2$
         bias(SD)      SE(Std(SE))      bias(SD)      SE(Std(SE))
n = 500
EE       (0.609)       (0.418)          (0.922)       0.585(0.647)
GCV      (0.426)       (0.159)          (0.418)       0.388(0.178)
BIC      (0.593)       (0.189)          (0.462)       0.367(0.176)
n = 1000
EE       (0.273)       (0.062)          (0.332)       0.321(0.088)
GCV      (0.254)       (0.048)          (0.258)       0.253(0.057)
BIC      (0.290)       (0.054)          (0.255)       0.244(0.052)

                 $\hat\beta_4$                  $\hat\beta_6$
n = 500
EE       (0.618)       (0.397)          (0.465)       (0.256)
GCV      (0.453)       (0.105)          (0.302)       (0.066)
BIC      (0.462)       (0.107)          (0.286)       (0.065)
n = 1000
EE       (0.314)       (0.046)          (0.207)       (0.026)
GCV      (0.282)       (0.038)          (0.184)       (0.022)
BIC      (0.275)       (0.037)          (0.166)       (0.020)

We next verify the consistency of the estimators and test the accuracy of the proposed standard error formula. Table 2 displays the bias and sample standard deviation (SD) of the estimates of two nonzero coefficients, $(\beta_1, \beta_2)$, over 1000 simulations, together with the sample average and sample standard deviation of the 1000 standard errors obtained from the sandwich formula. The row labeled EE corresponds to the unpenalized estimating equation estimator. Overall, the sandwich formula works well. It is worth pointing out that the variable selection procedure effectively reduces the finite sample estimation bias and variance, even though the estimator without variable selection is also asymptotically consistent.

Example 2. In this example, we illustrate the performance of the method for a semiparametric
measurement error model. Simulation data are generated from

$${\rm logit}\{P(Y = 1|X, S, Z)\} = \beta_1 X + \beta_2 S_1 + \cdots + \beta_{10} S_9 + \theta(Z), \qquad W = X + U,$$

where $X$ and $W$ are as in the previous simulation. We generate the $S$'s in a similar fashion to the $Z$'s in Example 1: $(S_1, \dots, S_8)$ is generated from a normal distribution with mean zero and covariance between $S_i$ and $S_j$ equal to $0.5^{|i-j|}$, and $S_9$ is a binary variable taking values 0 and 1 with equal probability. The random variable $Z$ is generated from a uniform distribution on $[-\pi/2, \pi/2]$. The true function is $\theta(Z) = 0.5\cos(Z)$, and the parameter takes the value $\beta = (1.5, 2, 0, 0, 3, 0, 1.5, 0, 0, 0)^T$.

The simulation results are summarized in Table 3, in which the notation is the same as in Table 1. From Table 3, we can see that the penalized estimating equation estimators significantly reduce model complexity. Overall, the BIC tuning parameter selector performs better, while GCV is too conservative. We have further tested the consistency of the estimator and the accuracy of the standard error formula derived from the sandwich approach. The results are summarized in Table 4, in which the notation is the same as in Table 2; they show that the estimator is consistent and that the standard error formula performs rather well when $n = 1000$.

Table 3: MRMEs and Model Complexity for Example 2

n      Method   RME_X Median(MAD)   RME_W Median(MAD)   C   E
500    GCV      (0.161)             0.880(0.158)
500    BIC      (0.158)             0.387(0.155)
1000   GCV      (0.164)             0.873(0.160)
1000   BIC      (0.162)             0.392(0.161)

4.4 Data Analysis

The Framingham heart study data set (Kannel et al., 1986) is a well known data set for which it is generally accepted that the long term systolic blood pressure (SBP) is measured with error. In addition to SBP, the measurements include age, smoking status and serum cholesterol. In the literature, there has been speculation that a second order term in age might be needed when analyzing the dependence of the occurrence of heart disease on the covariates.
In addition, it is unclear whether the interactions (products) between the various covariates play a role in influencing the heart disease rate. The data set includes 1615 observations.
Table 4: Bias and Standard Errors for Example 2

                 $\hat\beta_1$                  $\hat\beta_2$
         Bias(SD)      SE(Std(SE))      Bias(SD)      SE(Std(SE))
n = 500
EE       (0.267)       (0.043)          (0.295)       (0.045)
GCV      (0.275)       (0.047)          (0.297)       (0.050)
BIC      (0.262)       (0.044)          (0.273)       (0.045)
n = 1000
EE       0.039(0.170)  0.166(0.018)     0.057(0.194)  0.190(0.018)
GCV      0.047(0.174)  0.172(0.020)     0.069(0.196)  0.191(0.021)
BIC      0.031(0.169)  0.170(0.019)     0.044(0.179)  0.185(0.019)

                 $\hat\beta_5$                  $\hat\beta_7$
n = 500
EE       (0.415)       (0.063)          (0.288)       (0.038)
GCV      (0.418)       (0.071)          (0.285)       (0.044)
BIC      (0.378)       (0.064)          (0.249)       (0.037)
n = 1000
EE       (0.258)       (0.026)          (0.181)       (0.016)
GCV      (0.265)       (0.031)          (0.181)       (0.020)
BIC      (0.242)       (0.028)          (0.159)       (0.016)

With the method developed here, it is possible to perform a variable selection to address these issues. Following the literature, we adopt the measurement error model $\log({\rm MSBP} - 50) = \log({\rm SBP} - 50) + U$, where $U$ is a mean zero normal random variable with known variance $\sigma_u^2$ and MSBP is the measured SBP. We denote the standardized $\log({\rm MSBP} - 50)$ by $W$; the same standardization applied to $\log({\rm SBP} - 50)$ is denoted $X$. The standardized serum cholesterol and age are denoted $Z_1$ and $Z_2$, and we use $Z_3$ to denote the binary variable smoking status. Using $Y$ to denote the occurrence of heart disease, the saturated model, which includes all the interaction terms and
Figure 1: Tuning parameters and their corresponding BIC and GCV scores for the Framingham data. The scores are normalized to the range [0, 1].

also the squared age term, is of the form

$${\rm logit}\{P(Y = 1|X, Z{\text{'s}})\} = \beta_1 X + \beta_2 XZ_1 + \beta_3 XZ_2 + \beta_4 XZ_3 + \beta_5 + \beta_6 Z_1 + \beta_7 Z_2 + \beta_8 Z_3 + \beta_9 Z_2^2 + \beta_{10} Z_1 Z_2 + \beta_{11} Z_1 Z_3 + \beta_{12} Z_2 Z_3, \qquad W = X + U.$$

We first used both the GCV and BIC tuning parameter selectors to choose $\lambda$. We present the tuning parameters and the corresponding GCV and BIC scores in Figure 1; the two selectors lead to different final choices of $\lambda$. The selected models are depicted in Table 5. The GCV criterion selects the covariates $X$, $XZ_1$, 1, $Z_1$, $Z_2$, $Z_3$, $Z_2^2$, $Z_2 Z_3$ into the model, while the BIC criterion selects the covariates $X$, 1, $Z_1$, $Z_2$. We report the selection and estimation results in Table 5, together with the semiparametric estimation results without variable selection. As we can see, the terms $X$, 1, $Z_1$, $Z_2$ are selected by both criteria, while $Z_3$, $Z_2^2$ and some of the interaction terms are selected only by GCV. The BIC criterion is very aggressive and results in a very simple final model, while the GCV criterion is much more conservative, hence the resulting model is more complex. This agrees with the simulation results we have obtained. Since both criteria include the covariate $X$, we see that the measurement error feature of the Framingham data, and its treatment, is unavoidable.
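The saturated design matrix for this analysis can be assembled directly from the standardized covariates; the helper below is an illustrative sketch (the function name is ours, and $W$ stands in for the unobserved $X$):

```python
import numpy as np

def framingham_design(W, Z1, Z2, Z3):
    """Columns of the saturated Framingham model, in the order of Table 5:
    error-prone main effect and its interactions, intercept, clean main
    effects, squared age, and pairwise interactions among clean covariates."""
    one = np.ones_like(W)
    return np.column_stack([
        W, W * Z1, W * Z2, W * Z3,     # X-terms (observed through W)
        one, Z1, Z2, Z3, Z2**2,        # intercept, main effects, age squared
        Z1 * Z2, Z1 * Z3, Z2 * Z3,     # interactions among clean covariates
    ])
```

The 12 columns correspond to $\beta_1, \dots, \beta_{12}$ in the saturated model above.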
Table 5: Results for the Framingham data set.

                  EE                GCV               BIC
              β̂ (SE)            β̂ (SE)            β̂ (SE)
X           0.643(0.248)      0.416(0.093)      0.179(0.039)
XZ_1             (0.097)           (0.041)      0 (NA)
XZ_2             (0.111)      0 (NA)            0 (NA)
XZ_3             (0.249)      0 (NA)            0 (NA)
Intercept        (0.428)           (0.356)           (0.092)
Z_1              (0.212)      0.332(0.085)      0.124(0.033)
Z_2              (0.341)      1.044(0.329)      0.398(0.067)
Z_3              (0.443)      0.907(0.373)      0 (NA)
Z_2^2            (0.125)           (0.121)      0 (NA)
Z_1 Z_2          (0.103)      0 (NA)            0 (NA)
Z_1 Z_3          (0.225)      0 (NA)            0 (NA)
Z_2 Z_3          (0.336)           (0.326)      0 (NA)

5 Proofs

List of Regularity Conditions for Theorems 1 and 2

(A1) The model is identifiable.

(A2) The expectation of the first derivative of S_eff with respect to β exists and is not singular at β_0.

(A3) The second derivative of S_eff with respect to β exists and is continuous and bounded in a neighborhood of β_0.

(A4) The variance-covariance matrix of the first d_1 components of S_eff, denoted S_eff,I, is positive definite at β_0.

Proof of Theorem 1. From condition (A2), we can denote J = {E(∂S_eff/∂β^T)|_{β_0}}^{-1}, φ_eff = J S_eff and q_{λ_n}(β) = J p'_{λ_n}(β). Let α_n = n^{-1/2} + a_n and φ_eff,i(β) = φ_eff(W_i, Z_i, Y_i, β). It is sufficient to show that

n^{-1/2} Σ_{i=1}^n φ_eff,i(β) − n^{1/2} q_{λ_n}(β) = 0    (15)
has a solution β̂ that satisfies ‖β̂ − β_0‖ = O_p(α_n). Consider

(β − β_0)^T {n^{-1/2} Σ_{i=1}^n φ_eff,i(β) − n^{1/2} q_{λ_n}(β)}.

For any β such that ‖β − β_0‖ = Cα_n for some constant C, because of condition (A3) and a_n → 0, it follows by the Taylor expansion that

(β − β_0)^T {n^{-1/2} Σ_{i=1}^n φ_eff,i(β) − n^{1/2} q_{λ_n}(β)}
  = (β − β_0)^T [n^{-1/2} Σ_{i=1}^n φ_eff,i(β_0) − n^{1/2} q_{λ_n}(β_0) + n^{-1/2} Σ_{i=1}^n {∂φ_eff,i(β_0)/∂β^T}(β − β_0){1 + o_p(1)} − n^{1/2} {∂q_{λ_n}(β_0)/∂β^T}(β − β_0){1 + o_p(1)}]
  = (β − β_0)^T [n^{-1/2} Σ_{i=1}^n φ_eff,i(β_0) − n^{1/2} q_{λ_n}(β_0) + n^{1/2} {1 + o_p(1)}(β − β_0) − n^{1/2} O(b_n)(β − β_0){1 + o_p(1)}]
  = (β − β_0)^T {n^{-1/2} Σ_{i=1}^n φ_eff,i(β_0) − n^{1/2} q_{λ_n}(β_0)} + n^{1/2} ‖β − β_0‖^2 + o_p(n^{1/2} ‖β − β_0‖^2).

The first term in the above display is of order O_p(C n^{1/2} α_n^2), and the second term equals C^2 n^{1/2} α_n^2, which dominates the first term as long as C is large enough. The remaining terms are dominated by the first two terms. Thus, for any ɛ > 0, as long as C is large enough, the probability that the above display is larger than zero is at least 1 − ɛ. From the Brouwer fixed-point theorem, with probability at least 1 − ɛ there exists at least one solution of (15) in the region ‖β − β_0‖ ≤ Cα_n.

To show Theorem 2, we first prove the following lemma.

Lemma 1. If the conditions in Theorem 2 hold, then with probability tending to 1, for any given β that satisfies ‖β − β_0‖ = O_p(n^{-1/2}), β_II = 0 is a solution to the last d_2 equations of (2).

Proof: Denote the kth component of Σ_{i=1}^n S_eff(W_i, Z_i, Y_i, β) by L_k(β), k = d_1 + 1, ..., d. Then

L_k(β) − n p'_{λ_n,k}(|β_k|) sign(β_k)
  = L_k(β_0) + Σ_{j=1}^d {∂L_k(β_0)/∂β_j}(β_j − β_{j0}) + 2^{-1} Σ_{l=1}^d Σ_{j=1}^d {∂^2 L_k(β*)/∂β_l ∂β_j}(β_l − β_{l0})(β_j − β_{j0}) − n p'_{λ_n,k}(|β_k|) sign(β_k),

where β* is in between β and β_0. The first three terms of the above display are all of order at most O_p(n^{1/2}), hence we have

L_k(β) − n p'_{λ_n,k}(|β_k|) sign(β_k) = n^{1/2} {−n^{1/2} p'_{λ_n}(|β_k|) sign(β_k) + O_p(1)}.
Using (8), the sign of L_k(β) − n p'_{λ_n,k}(|β_k|) sign(β_k) is decided completely by sign(β_k) when n is large enough. From its continuity in β_k, we obtain that it is zero at β_k = 0.

Proof of Theorem 2. We let λ_n be sufficiently small so that a_n = o(n^{-1/2}). From Theorem 1, there is a root-n consistent estimator β̂, and from Lemma 1, β̂ = (β̂_I^T, 0^T)^T, so (i) is shown. Denote the first d_1 equations in Σ_{i=1}^n S_eff{W_i, Z_i, Y_i, (β_I^T, 0^T)^T} by L(β_I). Then

0 = L(β̂_I) − n p'_{λ_n,I}(β̂_I)
  = L(β_{I0}) + [n E{∂S_eff,I(β_{I0})/∂β_I^T} + o_p(n)](β̂_I − β_{I0}) − n b − n {Σ + o_p(1)}(β̂_I − β_{I0}).

We thus obtain

n^{1/2} [β̂_I − β_{I0} − {E ∂S_eff,I(β_{I0})/∂β_I^T − Σ}^{-1} b] = −n^{-1/2} {E ∂S_eff,I(β_{I0})/∂β_I^T − Σ}^{-1} L(β_{I0}) + o_p(1).

Because of condition (A4), the results in (ii) now follow.

List of Regularity Conditions for Theorems 3 and 4

(B1) The model is identifiable.

(B2) The expectation of the first derivative of S_eff with respect to β exists at β_0 and its eigenvalues are bounded away from zero and infinity uniformly for all n. For any entry S_jk of ∂S_eff(β_0)/∂β^T, E(S_jk^2) < C_3 < ∞.

(B3) The eigenvalues of the matrix E(S_eff,i S_eff,i^T) satisfy 0 < C_1 < λ_min ≤ λ_max < C_2 < ∞ for all n. For any entries S_k, S_j of S_eff(β_0), E(S_k^2 S_j^2) < C_4 < ∞.

(B4) The second derivatives of S_eff with respect to β exist and their entries are uniformly bounded by a function M(W_i, Z_i, Y_i) in a large enough neighborhood of β_0. In addition, E(M^2) < C_5 < ∞ for all n, d.

(B5) min_{1 ≤ j ≤ d_1} |β_{j0}|/λ_n → ∞ as n → ∞.

(B6) Let c_n = max_{1 ≤ j ≤ d_1} {|p''_{λ_n}(|β_{j0}|)| : β_{j0} ≠ 0}. Assume that λ_n → 0, a_n = O(n^{-1/2}) and c_n → 0 as n → ∞. In addition, there exist constants C and D such that when θ_1, θ_2 > Cλ_n, |p''_{λ_n}(θ_1) − p''_{λ_n}(θ_2)| ≤ D|θ_1 − θ_2|.
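Conditions of this kind are phrased for a general penalty; the SCAD penalty of Fan and Li (2001) is the standard concrete choice satisfying them, because its derivative vanishes for large arguments, so the bias quantities built from p'_{λ_n} and p''_{λ_n} at the nonzero true coefficients are eventually zero. A minimal sketch of its first derivative, with a = 3.7 as recommended in that paper:

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """First derivative p'_lambda(theta) of the SCAD penalty, for theta >= 0."""
    theta = np.asarray(theta, dtype=float)
    # p'_lambda(theta) = lam                       if theta <= lam
    #                  = (a*lam - theta)_+/(a - 1) if theta >  lam
    small = (theta <= lam) * lam
    large = (theta > lam) * np.maximum(a * lam - theta, 0.0) / (a - 1.0)
    return small + large

# Small coefficients keep the full penalty slope lam (so they are shrunk to
# exactly zero), while coefficients beyond a*lam are left unpenalized.
slopes = scad_deriv(np.array([0.1, 0.6, 5.0]), 0.5)
```

Since p'_{λ}(θ) = 0 for θ > aλ, once λ_n → 0 and the nonzero coefficients stay bounded away from zero, quantities such as a_n and c_n above vanish for large n.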
To emphasize the dependence of v, λ, d, d_1, d_2, S_eff, b, Σ, β, β_I, β_II, β_j, for j = 1, ..., d, on n, we write them as v_n, λ_n, d_n, d_{n1}, d_{n2}, S_{n,eff}, b_n, Σ_n, β_n, β_{nI}, β_{nII}, β_{nj}, respectively, throughout the proof.

Proof of Theorem 3. From condition (B2), we can denote J_n = {E(∂S_{n,eff}/∂β_n^T)|_{β_{n0}}}^{-1}, φ_{n,eff} = J_n S_{n,eff} and q_{λ_n}(β_n) = J_n p'_{λ_n}(β_n). Let α_n = n^{-1/2} + a_n and φ_{n,eff,i}(β_n) = φ_{n,eff}(W_i, Z_i, Y_i, β_n). It suffices to show that

n^{-1/2} Σ_{i=1}^n φ_{n,eff,i}(β_n) − n^{1/2} q_{λ_n}(β_n) = 0    (16)

has a solution β̂_n that satisfies ‖β̂_n − β_{n0}‖ = O_p(d_n^{1/2} α_n). Consider

(β_n − β_{n0})^T {n^{-1/2} Σ_{i=1}^n φ_{n,eff,i}(β_n) − n^{1/2} q_{λ_n}(β_n)}.

For any β_n such that ‖β_n − β_{n0}‖ = C d_n^{1/2} α_n for some constant C, because of conditions (B2), (B3), (B4) and (B6), we have the expansion

(β_n − β_{n0})^T {n^{-1/2} Σ_{i=1}^n φ_{n,eff,i}(β_n) − n^{1/2} q_{λ_n}(β_n)}
  = (β_n − β_{n0})^T [n^{-1/2} Σ_{i=1}^n φ_{n,eff,i}(β_{n0}) − n^{1/2} q_{λ_n}(β_{n0}) + n^{-1/2} Σ_{i=1}^n {∂φ_{n,eff,i}(β_{n0})/∂β_n^T}(β_n − β_{n0})]
    + 2^{-1} n^{-1/2} (β_n − β_{n0})^T Σ_{i=1}^n {∂^2 φ_{n,eff,i}(β*_n)/∂β_n ∂β_n^T}(β_n − β_{n0}) − n^{1/2} (β_n − β_{n0})^T {∂q_{λ_n}(β_{n0})/∂β_n^T}(β_n − β_{n0}){1 + o_p(1)}
  = (β_n − β_{n0})^T [n^{-1/2} Σ_{i=1}^n φ_{n,eff,i}(β_{n0}) − n^{1/2} q_{λ_n}(β_{n0}) + n^{1/2} {1 + o_p(1)}(β_n − β_{n0})] + o_p(1) − n^{1/2} O(c_n) ‖β_n − β_{n0}‖^2 {1 + o_p(1)}
  = (β_n − β_{n0})^T {n^{-1/2} Σ_{i=1}^n φ_{n,eff,i}(β_{n0}) − n^{1/2} q_{λ_n}(β_{n0})} + n^{1/2} ‖β_n − β_{n0}‖^2 + o_p(n^{1/2} ‖β_n − β_{n0}‖^2).

In the above derivation, β*_n is in between β_n and β_{n0}, and we used

n^{-1/2} (β_n − β_{n0})^T Σ_{i=1}^n {∂^2 φ_{n,eff,i}(β*_n)/∂β_n ∂β_n^T}(β_n − β_{n0}) = o_p(1),
which is shown in detail in (18), hence the proof is omitted here. The first term in the above display is bounded by

‖β_n − β_{n0}‖ O_p{‖n^{-1/2} Σ_{i=1}^n φ_{n,eff,i}(β_{n0}) − n^{1/2} q_{λ_n}(β_{n0})‖} = ‖β_n − β_{n0}‖ O_p{(d_n + d_n n a_n^2)^{1/2}} = O_p(C n^{1/2} d_n α_n^2),

and the second term equals C^2 n^{1/2} d_n α_n^2, which dominates the first term with probability 1 − ɛ for any ɛ > 0 as long as C is large enough. The remaining terms are dominated by the first two terms. Thus, for any ɛ > 0, for large enough C, the probability that the above display is larger than zero is at least 1 − ɛ. From the Brouwer fixed-point theorem, with probability at least 1 − ɛ there exists at least one solution of (16) in the region ‖β_n − β_{n0}‖ ≤ C d_n^{1/2} α_n.

We first prove the following lemma, then give the proof of Theorem 4.

Lemma 2. If the conditions in Theorem 4 hold, then with probability tending to 1, for any given β_n that satisfies ‖β_n − β_{n0}‖ = O_p{(d_n/n)^{1/2}}, β_{nII} = 0 is a solution to the last d_{n2} equations of (2).

Proof: Denote the kth component of Σ_{i=1}^n S_{n,eff}(W_i, Z_i, Y_i, β_n) by L_{nk}(β_n), k = d_{n1} + 1, ..., d_n. Then

L_{nk}(β_n) − n p'_{λ_n,k}(|β_{nk}|) sign(β_{nk})
  = L_{nk}(β_{n0}) + Σ_{j=1}^{d_n} {∂L_{nk}(β_{n0})/∂β_{nj}}(β_{nj} − β_{nj0}) + 2^{-1} Σ_{l=1}^{d_n} Σ_{j=1}^{d_n} {∂^2 L_{nk}(β*_n)/∂β_{nl} ∂β_{nj}}(β_{nl} − β_{nl0})(β_{nj} − β_{nj0}) − n p'_{λ_n,k}(|β_{nk}|) sign(β_{nk}),    (17)

where β*_n is in between β̂_n and β_{n0}. Because of condition (B3), the first term of (17) is of order O_p(n^{1/2}) = o_p(n^{1/2} d_n^{1/2}). The second term of (17) can be further written as

Σ_{j=1}^{d_n} [∂L_{nk}(β_{n0})/∂β_{nj} − E{∂L_{nk}(β_{n0})/∂β_{nj}}](β_{nj} − β_{nj0}) + Σ_{j=1}^{d_n} E{∂L_{nk}(β_{n0})/∂β_{nj}}(β_{nj} − β_{nj0}),

where the first term is controlled by

[Σ_{j=1}^{d_n} {∂L_{nk}(β_{n0})/∂β_{nj} − E ∂L_{nk}(β_{n0})/∂β_{nj}}^2]^{1/2} ‖β_n − β_{n0}‖ = O(d_n^{1/2} n^{1/2}) O_p[var{∂S_{n,eff,k}(β_{n0})/∂β_n}]^{1/2} O_p(d_n^{1/2} n^{-1/2}) ≤ O(d_n^{1/2} n^{1/2}) O_p{E(S_{k1}^2)}^{1/2} O_p(d_n^{1/2} n^{-1/2}) = o_p(n^{1/2} d_n^{1/2}),
due to condition (B2), and the second term is controlled by

n [Σ_{j=1}^{d_n} {E ∂S_{n,eff,k}(β_{n0})/∂β_{nj}}^2]^{1/2} ‖β_n − β_{n0}‖ ≤ n λ_max{E ∂S_{n,eff,k}(β_{n0})/∂β_n^T} ‖β_n − β_{n0}‖ = O_p(n^{1/2} d_n^{1/2}).

The third term of (17) can be further written as

Σ_{l=1}^{d_n} Σ_{j=1}^{d_n} E{∂^2 L_{nk}(β*_n)/∂β_{nl} ∂β_{nj}}(β_{nl} − β_{nl0})(β_{nj} − β_{nj0}) + Σ_{l=1}^{d_n} Σ_{j=1}^{d_n} [∂^2 L_{nk}(β*_n)/∂β_{nl} ∂β_{nj} − E{∂^2 L_{nk}(β*_n)/∂β_{nl} ∂β_{nj}}](β_{nl} − β_{nl0})(β_{nj} − β_{nj0}),

where the first term is bounded by

n [Σ_{j,l=1}^{d_n} {E ∂^2 S_{n,eff,k}(β*_n)/∂β_{nl} ∂β_{nj}}^2]^{1/2} ‖β_n − β_{n0}‖^2 ≤ n [d_n^2 {E(M)}^2]^{1/2} O_p(n^{-1} d_n) ≤ n d_n {E(M^2)}^{1/2} O_p(n^{-1} d_n) = O_p(d_n^2) = o_p(n^{1/2} d_n^{1/2}),

due to condition (B4), and the second term is bounded by

[Σ_{l,j=1}^{d_n} {∂^2 L_{nk}(β*_n)/∂β_{nl} ∂β_{nj} − E ∂^2 L_{nk}(β*_n)/∂β_{nl} ∂β_{nj}}^2]^{1/2} ‖β_n − β_{n0}‖^2 = [Σ_{l,j=1}^{d_n} n O_p{var ∂^2 S_{n,eff,k}(β*_n)/∂β_{nl} ∂β_{nj}}]^{1/2} O_p(n^{-1} d_n) ≤ O_p[{n d_n^2 E(M^2)}^{1/2}] O_p(n^{-1} d_n) = O_p(n^{-1/2} d_n^2) = o_p(n^{1/2} d_n^{1/2}),

due to conditions (B4) and (B5). Hence we have

L_{nk}(β_n) − n p'_{λ_n,k}(|β_{nk}|) sign(β_{nk}) = (n d_n)^{1/2} {−(n/d_n)^{1/2} p'_{λ_n,k}(|β_{nk}|) sign(β_{nk}) + O_p(1)}.

Using condition (9), the sign of L_{nk}(β_n) − n p'_{λ_n,k}(|β_{nk}|) sign(β_{nk}) is decided completely by sign(β_{nk}) when n is large enough. From its continuity in β_{nk}, we obtain that it is zero at β_{nk} = 0.
Proof of Theorem 4: From Theorem 3 and condition (B6), there is a root-(n/d_n) consistent estimator β̂_n, and from Lemma 2, β̂_n = (β̂_{nI}^T, 0^T)^T, so (i) is shown. Now consider solving the first d_{n1} equations in (2) for β_{nI}, while β_{nII} = 0. When λ_n is sufficiently small, a_n = 0, hence from Theorem 3 there is a root-(n/d_n) consistent root satisfying ‖β̂_{nI} − β_{nI0}‖ = O_p(d_n^{1/2} n^{-1/2}); denote it β̂_{nI}. Denote the first d_{n1} equations in Σ_{i=1}^n S_{n,eff}(W_i, Z_i, Y_i, β_n) by L_n(β_n). Then

0 = L_n(β̂_{nI}) − n p'_{λ_n,I}(β̂_{nI})
  = L_n(β_{nI0}) + {∂L_n(β_{nI0})/∂β_{nI}^T}(β̂_{nI} − β_{nI0}) + 2^{-1} (β̂_{nI} − β_{nI0})^T {∂^2 L_n(β*_{nI})/∂β_{nI} ∂β_{nI}^T}(β̂_{nI} − β_{nI0}) − n b_n − n p''_{λ_n,I}(β*_{nI})(β̂_{nI} − β_{nI0}),

where β*_{nI} is between β_{nI0} and β̂_{nI}. We also have

‖n^{-1} (β̂_{nI} − β_{nI0})^T {∂^2 L_n(β*_{nI})/∂β_{nI} ∂β_{nI}^T}(β̂_{nI} − β_{nI0})‖^2 ≤ ‖β̂_{nI} − β_{nI0}‖^4 ‖n^{-1} ∂^2 L_n(β*_{nI})/∂β_{nI} ∂β_{nI}^T‖^2
  = O_p(d_n^2 n^{-2}) n^{-2} d_{n1}^3 O_p[n {E ∂^2 S_{n,eff,i}(β*_{nI})/∂β_j ∂β_k}^2 + n var{∂^2 S_{n,eff,i}(β*_{nI})/∂β_j ∂β_k}]
  ≤ O_p(d_n^5 n^{-3}) O_p[n {E(M)}^2 + n E(M^2)] ≤ O_p(d_n^5 n^{-3}) O_p{n E(M^2)} = o_p(n^{-1}),    (18)

due to condition (B4). In addition,

‖n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T − p''_{λ_n,I}(β*_{nI}) − E{n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T} + p''_{λ_n,I}(β_{nI0})‖^2 ≤ 2 ‖n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T − E{n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T}‖^2 + O_p(n^{-1} d_{n1}),

due to conditions (B5) and (B6). Since for any fixed ɛ > 0,

Pr[‖n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T − E{n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T}‖ ≥ ɛ d_n^{-1}] ≤ d_n^2 n^{-2} ɛ^{-2} E‖∂L_n(β_{nI0})/∂β_{nI}^T − n E{∂S_{n,eff,i}(β_{nI0})/∂β_{nI}^T}‖^2 = O(d_n^2 n^{-2} d_{n1}^2 n) = o(1),

due to condition (B2), we have

‖n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T − p''_{λ_n,I}(β*_{nI}) − E{n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T} + p''_{λ_n,I}(β_{nI0})‖ = o_p(d_n^{-1}),
and subsequently,

‖[n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T − p''_{λ_n,I}(β*_{nI}) − E{n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T} + p''_{λ_n,I}(β_{nI0})](β̂_{nI} − β_{nI0})‖ ≤ o_p(d_n^{-1}) O_p(n^{-1/2} d_n^{1/2}) = o_p(n^{-1/2}).

We thus obtain

[E{n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T} − Σ_n](β̂_{nI} − β_{nI0}) − b_n = −n^{-1} L_n(β_{nI0}) + o_p(n^{-1/2}).

Denoting I = E{S_{n,eff,I}(β_{nI0}) S_{n,eff,I}^T(β_{nI0})} and using condition (B3), we have

n^{1/2} v_n I^{-1/2} [{E n^{-1} ∂L_n(β_{nI0})/∂β_{nI}^T − Σ_n}(β̂_{nI} − β_{nI0}) − b_n] = −n^{-1/2} v_n I^{-1/2} L_n(β_{nI0}) + o_p(v_n I^{-1/2}) = −n^{-1/2} v_n I^{-1/2} L_n(β_{nI0}) + o_p(1).

Let

Y_i = n^{-1/2} v_n I^{-1/2} S_{n,eff,I}(W_i, Z_i, Y_i, β_{nI0}), i = 1, ..., n.

It follows that for any ɛ > 0,

Σ_{i=1}^n E{‖Y_i‖^2 1(‖Y_i‖ > ɛ)} = n E{‖Y_1‖^2 1(‖Y_1‖ > ɛ)} ≤ n {E‖Y_1‖^4}^{1/2} {Pr(‖Y_1‖ > ɛ)}^{1/2}.

We have

Pr(‖Y_1‖ > ɛ) = Pr(‖Y_1‖^2 > ɛ^2) ≤ E‖Y_1‖^2/ɛ^2 = v_n v_n^T/(n ɛ^2) = O(n^{-1}),

and

E(‖Y_1‖^4) = n^{-2} E‖v_n I^{-1/2} S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})‖^4
  = n^{-2} E{S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})^T I^{-1/2} v_n^T v_n I^{-1/2} S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})}^2
  ≤ n^{-2} λ_max^2(v_n^T v_n) E{S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})^T I^{-1} S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})}^2
  ≤ n^{-2} λ_max^2(v_n^T v_n) λ_max^2(I^{-1}) E{S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})^T S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})}^2
  = n^{-2} λ_max^2(I^{-1}) E‖S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})‖^4 = O(d_{n1}^2 n^{-2}),

due to condition (B3). Hence,

Σ_{i=1}^n E{‖Y_i‖^2 1(‖Y_i‖ > ɛ)} = O(n d_{n1} n^{-1} n^{-1/2}) = o(1).
On the other hand,

Σ_{i=1}^n cov(Y_i) = n cov{n^{-1/2} v_n I^{-1/2} S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})} = v_n I^{-1/2} E{S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0}) S_{n,eff,I}(W_1, Z_1, Y_1, β_{nI0})^T} I^{-1/2} v_n^T = 1.

Following the Lindeberg-Feller central limit theorem, the results in (ii) now follow.

List of Regularity Conditions for Theorems 5 and 6

(C1) The model is identifiable.

(C2) The first derivatives of L with respect to β and θ exist and are denoted by L_β and L_θ, respectively. The first derivative of θ with respect to β exists and is denoted by θ_β. Then E(L_β + L_θ θ_β^T) exists and is not singular at β_0 and the true function θ_0(Z).

(C3) The second derivatives of L with respect to β and θ, ∂^2 L/∂β ∂β^T, ∂^2 L/∂θ^2 and ∂^2 L/∂β ∂θ, exist and are uniformly bounded in a neighborhood of β_0, θ_0. The second derivative of θ with respect to β exists and is uniformly bounded in a neighborhood of β_0 for all θ(Z) in a neighborhood of θ_0.

(C4) The variance-covariance matrix of L_I − ΨU_I(Z) is positive definite at β_0, θ_0.

(C5) The random variable Z has compact support and its density f_Z(z) is positive on that support. The bandwidth h satisfies nh^4 → 0 and nh^2 → ∞. θ(z) has bounded second derivative.

Proof of Theorem 5. From condition (C2), we can denote J = [E(L_β + L_θ θ_β^T)|_{β_0, θ_0}]^{-1}, φ_eff(β, θ) = J L(β, θ) and q_{λ_n}(β) = J p'_{λ_n}(β). Let α_n = n^{-1/2} + a_n, φ_eff,i(β, θ̂) = φ_eff{W_i, Z_i, Y_i, β, θ̂_i(β)} and Ψ_eff,i(β, θ̂) = Ψ_eff{β, θ̂_i(β)}, where the θ̂_i are obtained from (12). We will prove that

n^{-1/2} Σ_{i=1}^n φ_eff,i(β, θ̂) − n^{1/2} q_{λ_n}(β) = 0    (19)

has a solution β̂ that satisfies ‖β̂ − β_0‖ = O_p(α_n). Consider

(β − β_0)^T {n^{-1/2} Σ_{i=1}^n φ_eff,i(β, θ̂) − n^{1/2} q_{λ_n}(β)}.
For any β such that ‖β − β_0‖ = Cα_n for some constant C, because of conditions (C3) and (C5), using the expansion (A.3) in Ma and Carroll (2006), we have

(β − β_0)^T {n^{-1/2} Σ_{i=1}^n φ_eff,i(β, θ̂) − n^{1/2} q_{λ_n}(β)}
  = (β − β_0)^T [n^{-1/2} Σ_{i=1}^n φ_eff,i(β_0, θ_0) − n^{1/2} q_{λ_n}(β_0) + E{φ_eff,i,β(β_0, θ_0) + φ_eff,i,θ(β_0, θ_0) θ_β^T(Z, β_0)} n^{1/2} (β − β_0) + n^{-1/2} Σ_{i=1}^n {∂φ_eff,i(β_0, θ_0)/∂θ^T}{θ̂(β_0) − θ_0}{1 + o_p(1)} − n^{1/2} {∂q_{λ_n}(β_0)/∂β^T}(β − β_0){1 + o_p(1)}]
  = (β − β_0)^T [n^{-1/2} Σ_{i=1}^n φ_eff,i(β_0, θ_0) − n^{1/2} q_{λ_n}(β_0) + n^{1/2}(β − β_0) + n^{-1/2} Σ_{i=1}^n {∂φ_eff,i(β_0, θ_0)/∂θ^T}{θ̂(β_0) − θ_0} − n^{1/2} O(b_n)(β − β_0){1 + o_p(1)}]
  = (β − β_0)^T {n^{-1/2} Σ_{i=1}^n φ_eff,i(β_0, θ_0) − n^{1/2} q_{λ_n}(β_0)} + n^{1/2} ‖β − β_0‖^2 + (β − β_0)^T n^{-1/2} Σ_{i=1}^n {∂φ_eff,i(β_0, θ_0)/∂θ^T}{θ̂(β_0) − θ_0} + o_p(n^{1/2} ‖β − β_0‖^2).    (20)

Due to the usual local estimating equation expansion,

θ̂(z, β_0) − θ_0(z) = (h^2/2) θ_0''(z) + n^{-1} Σ_{j=1}^n K_h(Z_j − z) Ψ_j(β_0, θ_0)/{f_Z(z) Ω(z)} + o_p(n^{-1/2}),    (21)

substituting into the third term in (20) and making use of condition (C5), it can be easily seen that

(β − β_0)^T n^{-1/2} Σ_{i=1}^n {∂φ_eff,i(β_0, θ_0)/∂θ^T}{θ̂(β_0) − θ_0} = −(β − β_0)^T n^{-1/2} Σ_{i=1}^n Ψ_i(β_0, θ_0) J U(Z_i) + o_p(1),

where U(Z) = E(∂L/∂θ | Z)/Ω(Z). Continuing from (20), we have

(β − β_0)^T {n^{-1/2} Σ_{i=1}^n φ_eff,i(β, θ̂) − n^{1/2} q_{λ_n}(β)}
  = (β − β_0)^T {n^{-1/2} Σ_{i=1}^n φ_eff,i(β_0, θ_0) − n^{1/2} q_{λ_n}(β_0) − n^{-1/2} Σ_{i=1}^n Ψ_i(β_0, θ_0) J U(Z_i)} + n^{1/2} ‖β − β_0‖^2 + o_p(n^{1/2} ‖β − β_0‖^2).

The first term in the above display is of order O_p(C n^{1/2} α_n^2), and the second term equals C^2 n^{1/2} α_n^2, which dominates the first term as long as C is large enough. The last term is dominated by the first two
terms. Thus, for any ɛ > 0, as long as C is large enough, the probability that the above display is larger than zero is at least 1 − ɛ. From the Brouwer fixed-point theorem, with probability at least 1 − ɛ there exists at least one solution of (19) in the region ‖β − β_0‖ ≤ Cα_n.

We first prove the following lemma, then give the proof of Theorem 6.

Lemma 3. If the conditions in Theorem 6 hold, then with probability tending to 1, for any given β that satisfies ‖β − β_0‖ = O_p(n^{-1/2}), β_II = 0 is a solution to the last d_2 equations of (11).

Proof: Denote the kth component of Σ_{i=1}^n L_i{β, θ̂(β)} by L_k(β, θ̂), and that of Σ_{i=1}^n Ψ_i(β_0, θ_0) U(Z_i) by G_k(β_0, θ_0), k = d_1 + 1, ..., d. Then the expansion in the proof of Theorem 5 leads to

L_k(β, θ̂) − n p'_{λ_n,k}(|β_k|) sign(β_k) = L_k(β_0, θ_0) − G_k(β_0, θ_0) + n Σ_{j=1}^d (J^{-1})_{kj}(β_j − β_{j0}) − n p'_{λ_n,k}(|β_k|) sign(β_k) + o_p(n^{1/2}).

The first three terms of the above display are all of order O_p(n^{1/2}), hence we have

L_k(β, θ̂) − n p'_{λ_n,k}(|β_k|) sign(β_k) = n^{1/2} {−n^{1/2} p'_{λ_n}(|β_k|) sign(β_k) + O_p(1)}.

Due to (8), the sign of L_k(β, θ̂) − n p'_{λ_n,k}(|β_k|) sign(β_k) is decided completely by sign(β_k). From its continuity in β_k, we obtain that it is zero at β_k = 0 with probability tending to 1.

Proof of Theorem 6: We let λ_n be sufficiently small so that a_n = o(n^{-1/2}). From Theorem 5, there is a root-n consistent estimator β̂, and from Lemma 3, β̂ = (β̂_I^T, 0^T)^T, so (i) is shown. Denote the first d_1 equations in Σ_{i=1}^n L_i{(β_I^T, 0^T)^T, θ̂} by L{β_I, θ̂(β_I)}, and those in Σ_{i=1}^n Ψ_i(β_0, θ_0) U(Z_i) by G(β_{I0}, θ_0). Note that the d_1 × d_1 upper left block of J^{-1} is the matrix A defined in Theorem 6. Using the expansion in the proof of Theorem 5 at β = (β_I^T, 0^T)^T, the first d_1 equations yield

0 = L{β̂_I, θ̂(β̂_I)} − n p'_{λ_n,I}(β̂_I)
  = L(β_{I0}, θ_0) − G(β_{I0}, θ_0) + n A(β̂_I − β_{I0}) − n b − n {Σ + o_p(1)}(β̂_I − β_{I0}) + o_p(n^{1/2})
  = L(β_{I0}, θ_0) − G(β_{I0}, θ_0) + n (A − Σ){β̂_I − β_{I0} − (A − Σ)^{-1} b} + o_p(n^{1/2}).

We thus obtain
n^{1/2} {β̂_I − β_{I0} − (A − Σ)^{-1} b} = −n^{-1/2} (A − Σ)^{-1} {L(β_{I0}, θ_0) − G(β_{I0}, θ_0)} + o_p(1).

Because of condition (C4), the results in (ii) now follow.

List of Regularity Conditions for Theorems 7 and 8
MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized
More informationHigh-dimensional Ordinary Least-squares Projection for Screening Variables
1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor
More informationRegularization: Ridge Regression and the LASSO
Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression
More informationSTAT 100C: Linear models
STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationConsistent change point estimation. Chi Tim Ng, Chonnam National University Woojoo Lee, Inha University Youngjo Lee, Seoul National University
Consistent change point estimation Chi Tim Ng, Chonnam National University Woojoo Lee, Inha University Youngjo Lee, Seoul National University Outline of presentation Change point problem = variable selection
More informationA Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces
Journal of Machine Learning Research 17 (2016) 1-26 Submitted 6/14; Revised 5/15; Published 4/16 A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces Xiang Zhang Yichao
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationOn Mixture Regression Shrinkage and Selection via the MR-LASSO
On Mixture Regression Shrinage and Selection via the MR-LASSO Ronghua Luo, Hansheng Wang, and Chih-Ling Tsai Guanghua School of Management, Peing University & Graduate School of Management, University
More informationGrouped variable selection in high dimensional partially linear additive Cox model
University of Iowa Iowa Research Online Theses and Dissertations Fall 2010 Grouped variable selection in high dimensional partially linear additive Cox model Li Liu University of Iowa Copyright 2010 Li
More informationSize and Shape of Confidence Regions from Extended Empirical Likelihood Tests
Biometrika (2014),,, pp. 1 13 C 2014 Biometrika Trust Printed in Great Britain Size and Shape of Confidence Regions from Extended Empirical Likelihood Tests BY M. ZHOU Department of Statistics, University
More informationGeneralized Elastic Net Regression
Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1
More informationSTA6938-Logistic Regression Model
Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of
More informationSemiparametric Mixed Effects Models with Flexible Random Effects Distribution
Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,
More informationVariable Selection in Multivariate Multiple Regression
Variable Selection in Multivariate Multiple Regression by c Anita Brobbey A thesis submitted to the School of Graduate Studies in partial fulfillment of the requirement for the Degree of Master of Science
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationChapter 3. Linear Models for Regression
Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear
More informationEnsemble estimation and variable selection with semiparametric regression models
Ensemble estimation and variable selection with semiparametric regression models Sunyoung Shin Department of Mathematical Sciences University of Texas at Dallas Joint work with Jason Fine, Yufeng Liu,
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationFunction of Longitudinal Data
New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li Abstract This paper develops a new estimation of nonparametric regression functions for
More informationA Confidence Region Approach to Tuning for Variable Selection
A Confidence Region Approach to Tuning for Variable Selection Funda Gunes and Howard D. Bondell Department of Statistics North Carolina State University Abstract We develop an approach to tuning of penalized
More informationVARIABLE SELECTION IN QUANTILE REGRESSION
Statistica Sinica 19 (2009), 801-817 VARIABLE SELECTION IN QUANTILE REGRESSION Yichao Wu and Yufeng Liu North Carolina State University and University of North Carolina, Chapel Hill Abstract: After its
More informationCURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University
CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite
More informationSTA 450/4000 S: January
STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have
More informationModel Selection. Frank Wood. December 10, 2009
Model Selection Frank Wood December 10, 2009 Standard Linear Regression Recipe Identify the explanatory variables Decide the functional forms in which the explanatory variables can enter the model Decide
More informationEfficient Estimation of Population Quantiles in General Semiparametric Regression Models
Efficient Estimation of Population Quantiles in General Semiparametric Regression Models Arnab Maity 1 Department of Statistics, Texas A&M University, College Station TX 77843-3143, U.S.A. amaity@stat.tamu.edu
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models
Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationBootstrapped Basis Regression with Variable Selection: A New Method for Flexible Functional Form Estimation
Bootstrapped Basis Regression with Variable Selection: A New Method for Flexible Functional Form Estimation Brenton Kenkel Curtis S. Signorino January 23, 2013 Work in progress: comments welcome. Abstract
More informationHigh-dimensional regression modeling
High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION MODELS
The Annals of Statistics 2008, Vol. 36, No. 2, 587 613 DOI: 10.1214/009053607000000875 Institute of Mathematical Statistics, 2008 ASYMPTOTIC PROPERTIES OF BRIDGE ESTIMATORS IN SPARSE HIGH-DIMENSIONAL REGRESSION
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationFeature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates
Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates Jingyuan Liu, Runze Li and Rongling Wu Abstract This paper is concerned with feature screening and variable selection
More informationFULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH
FULL LIKELIHOOD INFERENCES IN THE COX MODEL: AN EMPIRICAL LIKELIHOOD APPROACH Jian-Jian Ren 1 and Mai Zhou 2 University of Central Florida and University of Kentucky Abstract: For the regression parameter
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationFull likelihood inferences in the Cox model: an empirical likelihood approach
Ann Inst Stat Math 2011) 63:1005 1018 DOI 10.1007/s10463-010-0272-y Full likelihood inferences in the Cox model: an empirical likelihood approach Jian-Jian Ren Mai Zhou Received: 22 September 2008 / Revised:
More informationVariable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1
Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr
More informationEstimating subgroup specific treatment effects via concave fusion
Estimating subgroup specific treatment effects via concave fusion Jian Huang University of Iowa April 6, 2016 Outline 1 Motivation and the problem 2 The proposed model and approach Concave pairwise fusion
More informationNew Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data
ew Local Estimation Procedure for onparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li The Pennsylvania State University Technical Report Series #0-03 College of Health and Human
More informationBayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson
Bayesian variable selection via penalized credible regions Brian Reich, NC State Joint work with Howard Bondell and Ander Wilson Brian Reich, NCSU Penalized credible regions 1 Motivation big p, small n
More information