Sparse regularization for functional logistic regression models
Hidetoshi Matsui
The Center for Data Science Education and Research, Shiga University, Banba, Hikone, Shiga, 5-85, Japan.

Abstract: We consider the problem of variable selection in logistic regression models where the predictors are functions, with the help of sparse regularization. The observations corresponding to the predictors are assumed to be measured repeatedly at discrete points and are then treated as smooth functional data. The parameters included in the functional logistic regression model are estimated by the penalized likelihood method with L1-type penalties. The tuning parameters that control the degree of regularization are chosen by model selection criteria. To investigate the effectiveness of the proposed method, we apply it to the analysis of real data.

Key Words and Phrases: Variable selection, Lasso, Functional data analysis, Regularization.

1 Introduction

Sparse regularization methods have attracted attention because they provide a unified approach to estimating models and selecting variables, and for this reason they are broadly applied in several fields (see, e.g., Bühlmann and van de Geer, 2011; Hastie et al., 2015). In particular, applying sparse regularization to the estimation of logistic regression models allows us to select the variables that affect the classification (Friedman et al., 2010b). In this work we apply sparse regularization to the analysis of longitudinal data in order to select genes that affect the classification. When the data to be classified have been measured repeatedly over time, they can be represented in functional form. Ramsay and Silverman (2005) established this type of analysis and called it functional data analysis (FDA). FDA is one of the most useful frameworks for effectively analyzing discretely observed data, and it has received considerable attention in various fields (Ramsay and Silverman, 2002; Horváth and Kokoszka, 2012).
The basic idea behind FDA is to express the repeated measurements for each individual as a smooth function and then to draw information from the collection of these functions. For regression, various methods are available, such as functional versions of logistic regression models (Aguilera-Morillo et al., 2013), generalized linear models (Goldsmith et al., 2010), and generalized additive models (Reiss and Ogden, 2010). Furthermore, the problem of variable selection for functional regression models using L1-type regularization is considered in Matsui and Konishi (2011) and Gertheiss et al. (2013). However, these works
do not include the multiclass logistic regression model. For this model, existing types of penalties may fail to select functional variables, since the model has multiple coefficients corresponding to multiple classification boundaries. In this paper, we consider the problem of using L1-type regularization to select variables for classifying functional data with the multiclass logistic regression model. Data from repeated measurements are represented by basis expansions, and the functional logistic regression model is estimated by the penalized maximum likelihood method with L1-type penalties. We apply two L1-type penalties, the elastic net (Zou and Hastie, 2005) and the sparse group lasso (Friedman et al., 2010a), and describe their effects. We then report results of the analysis of multiple sclerosis data and yeast cell cycle gene expression data.

2 Multiclass logistic regression model for functional data

Suppose we have n sets of functional data and class labels {(x_i(t), g_i); i = 1, ..., n}, where x_i(t) = (x_{i1}(t), ..., x_{ip}(t))^T are predictors given as functions and g_i ∈ {1, ..., L} is the class to which x_i belongs. In the classification setting, we apply the Bayes rule, which assigns x_i to the class g_i = l with the maximum posterior probability Pr(g_i = l | x_i). The logistic regression model is then given by the log-odds of the posterior probabilities:

    log{ Pr(g_i = l | x_i) / Pr(g_i = L | x_i) } = β_{0l} + Σ_{j=1}^{p} ∫ x_{ij}(t) β_{lj}(t) dt,    (1)

where β_{0l} is an intercept and β_{lj}(t) are coefficient functions. We assume that x_{ij}(t) can be expressed by the basis expansion

    x_{ij}(t) = Σ_{m=1}^{M_j} w_{ijm} φ_{jm}(t) = w_{ij}^T φ_j(t),    (2)

where φ_j(t) = (φ_{j1}(t), ..., φ_{jM_j}(t))^T is a vector of basis functions, such as B-splines or radial basis functions, and w_{ij} = (w_{ij1}, ..., w_{ijM_j})^T is a coefficient vector.
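The log-odds in model (1) involve integrals of the form ∫ x_{ij}(t) β_{lj}(t) dt. As a rough sketch (not the paper's implementation), these integrals can be approximated on a grid by the trapezoidal rule; all function and variable names below are illustrative:

```python
import numpy as np

def trapezoid(y, t):
    # trapezoidal rule: sum of 0.5 * (y[i] + y[i+1]) * (t[i+1] - t[i])
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(t))

def functional_log_odds(beta0, x_funcs, beta_funcs, t_grid):
    """Log-odds of model (1) for one observation:
    beta0 + sum_j ∫ x_ij(t) beta_lj(t) dt, with each integral
    approximated numerically on t_grid."""
    total = beta0
    for x, beta in zip(x_funcs, beta_funcs):
        total += trapezoid(x * beta, t_grid)
    return total
```

For instance, with a single predictor x(t) = 1 and coefficient function β(t) = 1 on [0, 1], the log-odds equal the intercept plus 1.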
Since the data are originally observed at discrete points, we smooth them with a basis expansion to obtain the functional data x_{ij}(t); that is, the w_{ij} are obtained before constructing the functional logistic regression model (1). Details of the smoothing method are described in Araki et al. (2009). Furthermore, the β_{lj}(t) are also expressed by basis expansions:

    β_{lj}(t) = Σ_{m=1}^{M_j} b_{jlm} φ_{jm}(t) = b_{jl}^T φ_j(t),    (3)
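To make the preliminary smoothing step concrete, here is a minimal sketch (not the method of Araki et al., which uses regularized Gaussian basis expansions with model-based tuning): a ridge-penalized least-squares fit of discretely observed values to a Gaussian radial basis, yielding a coefficient vector w_{ij} as in equation (2). All names and the ridge value are illustrative assumptions.

```python
import numpy as np

def gaussian_basis(t, centers, width):
    # Phi[i, m] = exp(-(t_i - c_m)^2 / (2 * width^2)): Gaussian radial basis
    return np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

def smooth_to_coefficients(t_obs, y_obs, centers, width, ridge=1e-6):
    """Ridge-penalized least squares: solve (Phi'Phi + ridge I) w = Phi'y,
    giving the basis coefficients w_ij for one observed curve."""
    Phi = gaussian_basis(t_obs, centers, width)
    A = Phi.T @ Phi + ridge * np.eye(len(centers))
    return np.linalg.solve(A, Phi.T @ y_obs)
```

Once every curve is reduced to its coefficient vector, the regression model operates on these vectors rather than on the raw discrete observations.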
where b_{jl} = (b_{jl1}, ..., b_{jlM_j})^T are vectors of coefficient parameters. Writing π_l(x_i; b) = Pr(g_i = l | x_i), where b = (b_1^T, ..., b_p^T)^T and b_j = (b_{j1}^T, ..., b_{j(L−1)}^T)^T, since the posterior probability is controlled by b, we can express the functional logistic regression model (1) as

    log{ π_l(x_i; b) / π_L(x_i; b) } = β_{0l} + Σ_{j=1}^{p} w_{ij}^T Φ_j b_{jl} = Σ_{j=1}^{p} z_{ij}^T b_{jl},    (4)

where Φ_j = ∫ φ_j(t) φ_j^T(t) dt and z_{ij}^T = w_{ij}^T Φ_j. It follows from (1) that the posterior probabilities are

    π_l(x_i; b) = exp(z_i^T b_l) / {1 + Σ_{h=1}^{L−1} exp(z_i^T b_h)}   (l = 1, ..., L−1),
    π_L(x_i; b) = 1 / {1 + Σ_{h=1}^{L−1} exp(z_i^T b_h)}.

We define the vector of response variables y_i, which indicates the class label, as

    y_i = (y_{i1}, ..., y_{i(L−1)})^T = (0, ..., 0, 1, 0, ..., 0)^T (with the 1 in the lth position) if g_i = l, l = 1, ..., L−1,
    y_i = (0, ..., 0)^T if g_i = L.

Then the functional logistic regression model has the probability function

    f(y_i | x_i; b) = Π_{l=1}^{L−1} π_l(x_i; b)^{y_{il}} · π_L(x_i; b)^{1 − Σ_{h=1}^{L−1} y_{ih}}.    (5)

3 Estimation by sparse regularization

From the result of the previous section we can construct the likelihood function. The log-likelihood function for the functional logistic regression model (5), l(b) = Σ_i log f(y_i | x_i; b), can be approximated by the quadratic form

    l(b) ≈ −(1/2) ∥W^{1/2}(η − Zb)∥² + const.,

where W = (W_{hl}) with

    W_{hl} = diag{π_{1l}(1 − π_{1l}), ..., π_{nl}(1 − π_{nl})}   (h = l),
    W_{hl} = diag{−π_{1h}π_{1l}, ..., −π_{nh}π_{nl}}   (h ≠ l),

and W^{1/2} is a matrix satisfying W = W^{1/2} W^{1/2}. Here Z = (Z̃_1, ..., Z̃_p), Z̃_j = I_{L−1} ⊗ Z_j, and Z_j = (z_{1j}, ..., z_{nj})^T. Furthermore, η = Zb + W^{−1} Λ 1_{n(L−1)}, where Λ = diag{Λ_1, ..., Λ_{L−1}} and Λ_l = diag{y_{1l} − π_{1l}, ..., y_{nl} − π_{nl}}. We then consider maximizing the penalized log-likelihood function

    l_{λ,α}(b) = l(b) − n P_{λ,α}(b),    (6)

where P_{λ,α}(b) is a penalty function controlled by tuning parameters λ > 0 and α ∈ [0, 1]. The following two subsections introduce two different penalties P_{λ,α}(b) and their characteristics.
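The reduction of (1) to the finite-dimensional model (4) rests on the cross-product matrices Φ_j = ∫ φ_j(t) φ_j^T(t) dt and on the multinomial posterior probabilities. The sketch below computes both, with numerical integration standing in for exact formulas; names and the choice of reference class are illustrative:

```python
import numpy as np

def basis_gram(Phi_vals, t_grid):
    """Φ_j = ∫ φ_j(t) φ_j(t)^T dt by the trapezoidal rule.
    Phi_vals: (T, M) array of M basis functions evaluated on t_grid."""
    w = np.diff(t_grid)                      # interval lengths
    mid = 0.5 * (Phi_vals[1:, :, None] * Phi_vals[1:, None, :] +
                 Phi_vals[:-1, :, None] * Phi_vals[:-1, None, :])
    return np.einsum('t,tmn->mn', w, mid)

def posterior_probs(z, B):
    """π_l(x; b): z is the feature vector z_i, B an (L-1, d) matrix of
    stacked b_l; class L is the reference class with log-odds 0."""
    eta = B @ z                              # log-odds against class L
    num = np.concatenate([np.exp(eta), [1.0]])
    return num / num.sum()
```

For a numerically stable version one would subtract max(eta) before exponentiating; the plain form above matches the formulas in the text.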
3.1 Elastic net-type penalty

Kayano et al. (2016) introduced an elastic net-type penalty for estimating the model (1) and selecting variables:

    P_{λ,α}(b) = (1/2)(1 − α) Σ_{j=1}^{p} λ_j Σ_{l=1}^{L−1} ∥b_{jl}∥₂² + α Σ_{j=1}^{p} λ_j { Σ_{l=1}^{L−1} ∥b_{jl}∥₂² }^{1/2},    (7)

where λ_j = √(M_j) λ. The first term penalizes the L2 norms of the parameter vectors b_{jl}, and the second term penalizes the L1 norm of the L2 norms of the grouped coefficients b_j. As described in Section 2, the functional logistic regression model (4) has M_j(L−1) parameters for the jth predictor. Therefore, if we want to select variables, we need to treat these as grouped parameters, using the idea of the group lasso (Yuan and Lin, 2006).

3.2 Sparse group lasso-type penalty

The elastic net-type penalty described above treats the set of parameters {b_{j1}, ..., b_{j(L−1)}} as a single group for the jth variable. On the other hand, for the multiclass classification problem, if we treat each vector as a separate group, we can select decision boundaries for each variable. For example, when b_{jl} is estimated as the zero vector, the jth variable does not affect the classification between classes l and L. Matsui (2014) proposed two penalties that select variables and decision boundaries, respectively. We extend these penalties and introduce the following penalty:

    P_{λ,α}(b) = (1 − α) Σ_{j=1}^{p} λ_j { Σ_{l=1}^{L−1} ∥b_{jl}∥₂² }^{1/2} + α Σ_{j=1}^{p} λ_j Σ_{l=1}^{L−1} ∥b_{jl}∥₂.    (8)

The first term on the right-hand side of (8) selects variables, and the second term selects decision boundaries.

4 Real data analysis

We applied the proposed methods to the analysis of two gene expression data sets. Sections 4.1 and 4.2 report our strategies for analyzing these data using functional logistic regression models with the penalties given in Sections 3.1 and 3.2, respectively. Details of the analysis in Section 4.1 are described in Kayano et al. (2016).

4.1 Multiple sclerosis data analysis

This data set consists of time-course gene expression profiles obtained from an investigation of the long-term effects of recombinant interferon β (rIFN-β) on disease progression in multiple sclerosis (MS).
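Both penalties can be written compactly as functions of the grouped coefficients. The sketch below (illustrative, assuming the group-lasso weighting λ_j = √(M_j)·λ) evaluates (7) and (8) for coefficient groups stored as (L−1) × M_j arrays:

```python
import numpy as np

def elastic_net_type(b_groups, lam, alpha):
    """Penalty (7): b_groups[j] is an (L-1, M_j) array of the b_jl;
    lam_j = sqrt(M_j) * lam is an assumed group weighting."""
    p1 = p2 = 0.0
    for Bj in b_groups:
        lj = np.sqrt(Bj.shape[1]) * lam
        sq = np.sum(Bj ** 2, axis=1)       # ||b_jl||_2^2 for each l
        p1 += lj * sq.sum()                # ridge part over all boundaries
        p2 += lj * np.sqrt(sq.sum())       # one group norm per variable
    return 0.5 * (1 - alpha) * p1 + alpha * p2

def sparse_group_lasso_type(b_groups, lam, alpha):
    """Penalty (8): variable-level group norm plus boundary-level norms."""
    p1 = p2 = 0.0
    for Bj in b_groups:
        lj = np.sqrt(Bj.shape[1]) * lam
        sq = np.sum(Bj ** 2, axis=1)
        p1 += lj * np.sqrt(sq.sum())       # selects whole variables
        p2 += lj * np.sqrt(sq).sum()       # selects individual boundaries
    return (1 - alpha) * p1 + alpha * p2
```

At α = 1 the two penalties differ: (7) groups all boundaries of a variable together, while (8) penalizes each boundary vector b_{jl} separately, which is what allows boundary-level selection.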
There are n = 53 MS patients treated with rIFN-β,
Figure 1: Examples of gene expression profiles for a gene (IRF8). Points and thin lines show expression profiles, and heavy lines are the estimated mean functions for good responders (left) and poor responders (right).

and the patients are categorized into 33 good responders and 20 poor responders according to their response levels to rIFN-β administration (Figure 1). Expression levels were measured at the beginning of the administration and after 3, 6, 9, 12, 18, and 24 months. The data include missing values, so the actual number of time points per patient ranges from 4 to 7. The data consist of p = 76 genes coding for type I and II IFN-responsive molecules, cytokine receptors, members of the IFN signaling and apoptosis pathways, and transcription factors in immune regulation. We expressed the observed longitudinal data as functions using the mixed effects model implemented in the R package fpca (Peng and Paul, 2009). We then estimated the functional logistic model and selected genes using the method described in Section 3.1. We also compared the results of our method with those of the functional ANOVA of Minas et al. (2011) with respect to the selection of genes. As a result of the analysis, we detected a gene that has attracted attention in biology as a new target for the treatment of MS, whereas the functional ANOVA could not select it.

4.2 Yeast cell cycle gene expression data analysis

Spellman et al. (1998) measured expression profiles over about two cell cycles for 6,178 genome-wide yeast genes using cDNA microarrays. The data contain 77 microarrays with several types of temporal synchronization: cln3 (2 points), clb2 (2 points), α-factor (18 points), cdc15 (24 points), cdc28 (17 points), and elu (14 points). Spellman et al. (1998) used clustering over these 77 experiments to classify 800 genes into 5 groups: G1, G2/M, M/G1, S, and S/G2. Figure 2 shows examples for each type of synchronization.
We examined not only whether these experiments affect the classification
but also whether each of them affects the classification for each combination of the 5 classes. Since there are many missing values in the expression profiles and only 7 genes have no missing values, we excluded genes according to the following two rules: (1) genes with at least one missing value for either cln3 or clb2 were excluded; (2) genes with a total of more than 10 missing values across some combination of α-factor, cdc15, cdc28, and elu were excluded. By expressing the data as functions, we can apply the regression model even when there are some (though not excessively many) missing values. The resulting 657 genes were used for this analysis. First, we expressed the time-course data other than cln3 and clb2 as functions, using basis expansions with 4 basis functions that were selected in advance. The remaining variables, cln3 and clb2, each of which has only 2 points, were treated as vector data rather than functional data, and we treated the variables corresponding to these points as a group. Next, we constructed the functional logistic regression model

    log{ Pr(g_i = l | x_i) / Pr(g_i = L | x_i) } = β_{0l} + Σ_{j=1}^{2} Σ_{j'} x_{ijj'} β_{jj'l} + Σ_{j=3}^{6} ∫ x_{ij}(t) β_{lj}(t) dt,

which is a special case of (1), where x_j (j = 1, ..., 6) correspond to cln3, clb2, α-factor, cdc15, cdc28, and elu, respectively. The model was estimated by the penalized likelihood method with the penalty (8) and was then evaluated by a BIC-type model selection criterion. We also altered the class label L on the left-hand side of (1) and repeatedly estimated the model in order to investigate all the coefficients of the classification boundaries. We repeated this process for 100 bootstrap samples and then investigated which variables and boundaries affected the classification.

5 Concluding remarks

In this paper we treated observed longitudinal data as a set of functional data and then selected the variables related to the classification.
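The bootstrap step described above can be organized as a generic selection-frequency loop. This is only a schematic: in the paper the refit is the penalized likelihood fit with penalty (8), which here is replaced by a placeholder `fit_and_select` callback, and the toy correlation-based selector is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_selection(Z, y, fit_and_select, n_boot=100):
    """Refit on bootstrap resamples and record how often each variable
    (or boundary) is selected. fit_and_select(Z, y) must return a boolean
    selection array; returns per-variable selection frequencies."""
    n = Z.shape[0]
    counts = None
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        sel = np.asarray(fit_and_select(Z[idx], y[idx]))
        counts = sel.astype(float) if counts is None else counts + sel
    return counts / n_boot

def toy_selector(Z, y):
    # illustrative stand-in: flag variables whose |correlation| with y > 0.2
    r = np.array([abs(np.corrcoef(Z[:, j], y)[0, 1]) for j in range(Z.shape[1])])
    return r > 0.2
```

Variables (or boundaries) with high selection frequency across resamples are then judged to affect the classification.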
To estimate the functional logistic regression model and select variables, we used elastic net-type and sparse group lasso-type penalties. The former selects variables, whereas the latter can select variables and decision boundaries simultaneously. We applied the proposed methods to the analysis of two gene expression data sets and investigated their effectiveness. Recently, several algorithms for estimating models with sparse regularization have been proposed (e.g., Boyd et al., 2011). Future work includes developing more efficient algorithms for estimating these models.
Figure 2: Yeast cell cycle gene expression profiles for each type of synchronization (cln3, clb2, α-factor, cdc15, cdc28, and elu). Each plot consists of 5 genes from the 5 classes: G1 (solid), G2/M (dashed), M/G1 (dotted), S (dot-dashed), and S/G2 (long dashed).

References

Aguilera-Morillo, M. C., Aguilera, A. M., Escabias, M., and Valderrama, M. J. (2013), Penalized spline approaches for functional logit regression, Test.
Araki, Y., Konishi, S., Kawano, S., and Matsui, H. (2009), Functional regression modeling via regularized Gaussian basis expansions, Ann. Inst. Statist. Math., 61.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011), Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3, 1–122.
Bühlmann, P. and van de Geer, S. (2011), Statistics for High-Dimensional Data: Methods, Theory and Applications, Heidelberg: Springer.
Friedman, J., Hastie, T., and Tibshirani, R. (2010a), A note on the group lasso and a sparse group lasso, arXiv preprint arXiv:1001.0736.
Friedman, J., Hastie, T., and Tibshirani, R. (2010b), Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw., 33, 1–22.
Gertheiss, J., Maity, A., and Staicu, A.-M. (2013), Variable selection in generalized functional linear models, Stat, 2.
Goldsmith, J., Feder, J., and Crainiceanu, C. (2010), Penalized functional regression, J. Comput. Graph. Statist., 20.
Hastie, T., Tibshirani, R., and Wainwright, M. (2015), Statistical Learning with Sparsity: The Lasso and Generalizations, Boca Raton: Chapman & Hall/CRC.
Horváth, L. and Kokoszka, P. (2012), Inference for Functional Data with Applications, New York: Springer.
Kayano, M., Matsui, H., Yamaguchi, R., Imoto, S., and Miyano, S. (2016), Gene set differential analysis of time course expression profiles via sparse estimation in functional logistic model with application to time-dependent biomarker detection, Biostatistics, 17.
Matsui, H. (2014), Variable and boundary selection for functional data via multiclass logistic regression modeling, Comput. Statist. Data Anal., 78.
Matsui, H. and Konishi, S. (2011), Variable selection for functional regression models via the L1 regularization, Comput. Statist. Data Anal., 55.
Minas, C., Waddell, S. J., and Montana, G. (2011), Distance-based differential analysis of gene curves, Bioinformatics, 27.
Peng, J. and Paul, D. (2009), A geometric approach to maximum likelihood estimation of the functional principal components from sparse longitudinal data, J. Comput. Graph. Statist., 18.
Ramsay, J. and Silverman, B. (2002), Applied Functional Data Analysis: Methods and Case Studies, New York: Springer.
Ramsay, J. and Silverman, B. (2005), Functional Data Analysis, 2nd ed., New York: Springer.
Reiss, P. T. and Ogden, R. T. (2010), Functional generalized linear models with images as predictors, Biometrics, 66.
Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., and Futcher, B. (1998), Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Mol. Biol. Cell, 9, 3273–3297.
Yuan, M. and Lin, Y. (2006), Model selection and estimation in regression with grouped variables, J. Roy. Statist. Soc. Ser. B, 68, 49–67.
Zou, H. and Hastie, T. (2005), Regularization and variable selection via the elastic net, J. Roy. Statist. Soc. Ser. B, 67, 301–320.
More informationConvex relaxation for Combinatorial Penalties
Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,
More informationInstitute of Statistics Mimeo Series No Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR
DEPARTMENT OF STATISTICS North Carolina State University 2501 Founders Drive, Campus Box 8203 Raleigh, NC 27695-8203 Institute of Statistics Mimeo Series No. 2583 Simultaneous regression shrinkage, variable
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne
More informationPENALIZING YOUR MODELS
PENALIZING YOUR MODELS AN OVERVIEW OF THE GENERALIZED REGRESSION PLATFORM Michael Crotty & Clay Barker Research Statisticians JMP Division, SAS Institute Copyr i g ht 2012, SAS Ins titut e Inc. All rights
More informationAn Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong
More informationImportance Sampling: An Alternative View of Ensemble Learning. Jerome H. Friedman Bogdan Popescu Stanford University
Importance Sampling: An Alternative View of Ensemble Learning Jerome H. Friedman Bogdan Popescu Stanford University 1 PREDICTIVE LEARNING Given data: {z i } N 1 = {y i, x i } N 1 q(z) y = output or response
More informationPrediction & Feature Selection in GLM
Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis
More informationPathwise coordinate optimization
Stanford University 1 Pathwise coordinate optimization Jerome Friedman, Trevor Hastie, Holger Hoefling, Robert Tibshirani Stanford University Acknowledgements: Thanks to Stephen Boyd, Michael Saunders,
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from
More informationHierarchical kernel learning
Hierarchical kernel learning Francis Bach Willow project, INRIA - Ecole Normale Supérieure May 2010 Outline Supervised learning and regularization Kernel methods vs. sparse methods MKL: Multiple kernel
More informationSimultaneous variable selection and class fusion for high-dimensional linear discriminant analysis
Biostatistics (2010), 11, 4, pp. 599 608 doi:10.1093/biostatistics/kxq023 Advance Access publication on May 26, 2010 Simultaneous variable selection and class fusion for high-dimensional linear discriminant
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationComparisons of penalized least squares. methods by simulations
Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationIterative Selection Using Orthogonal Regression Techniques
Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department
More informationFunctional Graphical Models
Functional Graphical Models Xinghao Qiao 1, Shaojun Guo 2, and Gareth M. James 3 1 Department of Statistics, London School of Economics, U.K. 2 Institute of Statistics and Big Data, Renmin University of
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationLearning with Sparsity Constraints
Stanford 2010 Trevor Hastie, Stanford Statistics 1 Learning with Sparsity Constraints Trevor Hastie Stanford University recent joint work with Rahul Mazumder, Jerome Friedman and Rob Tibshirani earlier
More informationRegularized Linear Models in Stacked Generalization
Regularized Linear Models in Stacked Generalization Sam Reid and Greg Grudic University of Colorado at Boulder, Boulder CO 80309-0430, USA Abstract Stacked generalization is a flexible method for multiple
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu 1,2, Daniel F. Schmidt 1, Enes Makalic 1, Guoqi Qian 2, John L. Hopper 1 1 Centre for Epidemiology and Biostatistics,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationSupplemental Information for Pramila et al. Periodic Normal Mixture Model (PNM)
Supplemental Information for Pramila et al. Periodic Normal Mixture Model (PNM) The data sets alpha30 and alpha38 were analyzed with PNM (Lu et al. 2004). The first two time points were deleted to alleviate
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationRegularization Path Algorithms for Detecting Gene Interactions
Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable
More informationLecture 5: November 19, Minimizing the maximum intracluster distance
Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction
More informationLASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape
LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape Nikolaus Umlauf https://eeecon.uibk.ac.at/~umlauf/ Overview Joint work with Andreas Groll, Julien Hambuckers
More informationGroup exponential penalties for bi-level variable selection
for bi-level variable selection Department of Biostatistics Department of Statistics University of Kentucky July 31, 2011 Introduction In regression, variables can often be thought of as grouped: Indicator
More informationMissing Value Estimation for Time Series Microarray Data Using Linear Dynamical Systems Modeling
22nd International Conference on Advanced Information Networking and Applications - Workshops Missing Value Estimation for Time Series Microarray Data Using Linear Dynamical Systems Modeling Connie Phong
More informationOdds ratio estimation in Bernoulli smoothing spline analysis-ofvariance
The Statistician (1997) 46, No. 1, pp. 49 56 Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance models By YUEDONG WANG{ University of Michigan, Ann Arbor, USA [Received June 1995.
More informationExploratory quantile regression with many covariates: An application to adverse birth outcomes
Exploratory quantile regression with many covariates: An application to adverse birth outcomes June 3, 2011 eappendix 30 Percent of Total 20 10 0 0 1000 2000 3000 4000 5000 Birth weights efigure 1: Histogram
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationPackage Grace. R topics documented: April 9, Type Package
Type Package Package Grace April 9, 2017 Title Graph-Constrained Estimation and Hypothesis Tests Version 0.5.3 Date 2017-4-8 Author Sen Zhao Maintainer Sen Zhao Description Use
More informationShrinkage Tuning Parameter Selection in Precision Matrices Estimation
arxiv:0909.1123v1 [stat.me] 7 Sep 2009 Shrinkage Tuning Parameter Selection in Precision Matrices Estimation Heng Lian Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang
More information10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification
10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable
More information