Sparse regularization for functional logistic regression models
Hidetoshi Matsui
The Center for Data Science Education and Research, Shiga University, Banba, Hikone, Shiga, 5-85, Japan.

Abstract: We consider the problem of variable selection in logistic regression models where the predictors are functions, with the help of sparse regularization. The observations corresponding to the predictors are assumed to be measured repeatedly at discrete points and are then treated as smooth functional data. The parameters included in the functional logistic regression model are estimated by the penalized likelihood method with L1-type penalties. The tuning parameters that control the degree of regularization are chosen by model selection criteria. To investigate the effectiveness of the proposed method, we apply it to the analysis of real data.

Key Words and Phrases: Variable selection, Lasso, Functional data analysis, Regularization.

1 Introduction

Sparse regularization methods have attracted attention because they provide a unified approach to estimating models and selecting variables, and for this reason they are broadly applied in several fields (see, e.g., Bühlmann and van de Geer, 2011; Hastie et al., 2015). In particular, applying sparse regularization to the estimation of logistic regression models allows us to select the variables that affect the classification (Friedman et al., 2010b). In this work we apply sparse regularization to the analysis of longitudinal data in order to select genes that affect the classification. When the data to be classified have been measured repeatedly over time, they can be represented in functional form. Ramsay and Silverman (2005) established this type of analysis and called it functional data analysis (FDA). FDA is one of the most useful frameworks for effectively analyzing discretely observed data, and it has received considerable attention in various fields (Ramsay and Silverman, 2002; Horváth and Kokoszka, 2012).
The basic idea behind FDA is to express the repeated measurements for each individual as a smooth function and then to draw information from the collection of these functions. For regression, various methods are available, such as functional versions of logistic regression models (Aguilera-Morillo et al., 2013), generalized linear models (Goldsmith et al., 2010), and generalized additive models (Reiss and Ogden, 2010). Furthermore, the problem of variable selection for functional regression models using L1-type regularization is considered in Matsui and Konishi (2011) and Gertheiss et al. (2013). However, these works
do not include the multiclass logistic regression model. For this model, existing types of penalties may fail to select functional variables, since the model has multiple coefficients corresponding to multiple classification boundaries. In this paper, we consider the problem of using L1-type regularization to select variables for classifying functional data with the multiclass logistic regression model. Data from repeated measurements are represented by basis expansions, and the functional logistic regression model is estimated by the penalized maximum likelihood method with L1-type penalties. We apply two L1-type penalties, the elastic net (Zou and Hastie, 2005) and the sparse group lasso (Friedman et al., 2010a), and describe their effects. We then report results of the analysis of multiple sclerosis data and yeast cell cycle gene expression data.

2 Multiclass logistic regression model for functional data

Suppose we have n sets of functional data and class labels {(x_i(t), g_i); i = 1, ..., n}, where x_i(t) = (x_{i1}(t), ..., x_{ip}(t))^T are predictors given as functions and g_i ∈ {1, ..., L} is the class to which x_i belongs. In the classification setting, we apply the Bayes rule, which assigns x_i to the class g_i = l with the maximum posterior probability Pr(g_i = l | x_i). The logistic regression model is then given by the log-odds of the posterior probabilities:

    log{ Pr(g_i = l | x_i) / Pr(g_i = L | x_i) } = β_{0l} + Σ_{j=1}^{p} ∫ x_{ij}(t) β_{lj}(t) dt,    (1)

where β_{0l} is an intercept and β_{lj}(t) are coefficient functions. We assume that x_{ij}(t) can be expressed by the basis expansion

    x_{ij}(t) = Σ_{m=1}^{M_j} w_{ijm} φ_{jm}(t) = w_{ij}^T φ_j(t),    (2)

where φ_j(t) = (φ_{j1}(t), ..., φ_{jM_j}(t))^T is a vector of basis functions, such as B-splines or radial basis functions, and w_{ij} = (w_{ij1}, ..., w_{ijM_j})^T is a coefficient vector.
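The log-odds in model (1) involve integrals of the form ∫ x_{ij}(t) β_{lj}(t) dt. As a rough sketch (not the paper's implementation), these integrals can be approximated on a grid by the trapezoidal rule; all function and variable names below are illustrative:

```python
import numpy as np

def trapezoid(y, t):
    # trapezoidal rule: sum of 0.5 * (y[i] + y[i+1]) * (t[i+1] - t[i])
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(t))

def functional_log_odds(beta0, x_funcs, beta_funcs, t_grid):
    """Log-odds of model (1) for one observation:
    beta0 + sum_j ∫ x_ij(t) beta_lj(t) dt, with each integral
    approximated numerically on t_grid."""
    total = beta0
    for x, beta in zip(x_funcs, beta_funcs):
        total += trapezoid(x * beta, t_grid)
    return total
```

For instance, with a single predictor x(t) = 1 and coefficient function β(t) = 1 on [0, 1], the log-odds equal the intercept plus 1.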
Since the data are originally observed at discrete points, we smooth them with a basis expansion to obtain the functional data x_{ij}(t); that is, the w_{ij} are obtained before constructing the functional logistic regression model (1). Details of the smoothing method are described in Araki et al. (2009). Furthermore, the β_{lj}(t) are also expressed by basis expansions:

    β_{lj}(t) = Σ_{m=1}^{M_j} b_{jlm} φ_{jm}(t) = b_{jl}^T φ_j(t),    (3)
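To make the preliminary smoothing step concrete, here is a minimal sketch (not the method of Araki et al., which uses regularized Gaussian basis expansions with model-based tuning): a ridge-penalized least-squares fit of discretely observed values to a Gaussian radial basis, yielding a coefficient vector w_{ij} as in equation (2). All names and the ridge value are illustrative assumptions.

```python
import numpy as np

def gaussian_basis(t, centers, width):
    # Phi[i, m] = exp(-(t_i - c_m)^2 / (2 * width^2)): Gaussian radial basis
    return np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

def smooth_to_coefficients(t_obs, y_obs, centers, width, ridge=1e-6):
    """Ridge-penalized least squares: solve (Phi'Phi + ridge I) w = Phi'y,
    giving the basis coefficients w_ij for one observed curve."""
    Phi = gaussian_basis(t_obs, centers, width)
    A = Phi.T @ Phi + ridge * np.eye(len(centers))
    return np.linalg.solve(A, Phi.T @ y_obs)
```

Once every curve is reduced to its coefficient vector, the regression model operates on these vectors rather than on the raw discrete observations.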
where b_{jl} = (b_{jl1}, ..., b_{jlM_j})^T are vectors of coefficient parameters. Writing π_l(x_i; b) = Pr(g_i = l | x_i), where b = (b_1^T, ..., b_p^T)^T and b_j = (b_{j1}^T, ..., b_{j(L−1)}^T)^T, since the posterior probability is controlled by b, we can express the functional logistic regression model (1) as

    log{ π_l(x_i; b) / π_L(x_i; b) } = β_{0l} + Σ_{j=1}^{p} w_{ij}^T Φ_j b_{jl} = Σ_{j=1}^{p} z_{ij}^T b_{jl},    (4)

where Φ_j = ∫ φ_j(t) φ_j^T(t) dt and z_{ij}^T = w_{ij}^T Φ_j. It follows from (1) that the posterior probabilities are

    π_l(x_i; b) = exp(z_i^T b_l) / {1 + Σ_{h=1}^{L−1} exp(z_i^T b_h)}   (l = 1, ..., L−1),
    π_L(x_i; b) = 1 / {1 + Σ_{h=1}^{L−1} exp(z_i^T b_h)}.

We define the vector of response variables y_i, which indicates the class label, as

    y_i = (y_{i1}, ..., y_{i(L−1)})^T = (0, ..., 0, 1, 0, ..., 0)^T (with the 1 in the lth position) if g_i = l, l = 1, ..., L−1,
    y_i = (0, ..., 0)^T if g_i = L.

Then the functional logistic regression model has the probability function

    f(y_i | x_i; b) = Π_{l=1}^{L−1} π_l(x_i; b)^{y_{il}} · π_L(x_i; b)^{1 − Σ_{h=1}^{L−1} y_{ih}}.    (5)

3 Estimation by sparse regularization

From the result of the previous section we can construct the likelihood function. The log-likelihood function for the functional logistic regression model (5), l(b) = Σ_i log f(y_i | x_i; b), can be approximated by the quadratic form

    l(b) ≈ −(1/2) ∥W^{1/2}(η − Zb)∥² + const.,

where W = (W_{hl}) with

    W_{hl} = diag{π_{1l}(1 − π_{1l}), ..., π_{nl}(1 − π_{nl})}   (h = l),
    W_{hl} = diag{−π_{1h}π_{1l}, ..., −π_{nh}π_{nl}}   (h ≠ l),

and W^{1/2} is a matrix satisfying W = W^{1/2} W^{1/2}. Here Z = (Z̃_1, ..., Z̃_p), Z̃_j = I_{L−1} ⊗ Z_j, and Z_j = (z_{1j}, ..., z_{nj})^T. Furthermore, η = Zb + W^{−1} Λ 1_{n(L−1)}, where Λ = diag{Λ_1, ..., Λ_{L−1}} and Λ_l = diag{y_{1l} − π_{1l}, ..., y_{nl} − π_{nl}}. We then consider maximizing the penalized log-likelihood function

    l_{λ,α}(b) = l(b) − n P_{λ,α}(b),    (6)

where P_{λ,α}(b) is a penalty function controlled by tuning parameters λ > 0 and α ∈ [0, 1]. The following two subsections introduce two different penalties P_{λ,α}(b) and their characteristics.
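The reduction of (1) to the finite-dimensional model (4) rests on the cross-product matrices Φ_j = ∫ φ_j(t) φ_j^T(t) dt and on the multinomial posterior probabilities. The sketch below computes both, with numerical integration standing in for exact formulas; names and the choice of reference class are illustrative:

```python
import numpy as np

def basis_gram(Phi_vals, t_grid):
    """Φ_j = ∫ φ_j(t) φ_j(t)^T dt by the trapezoidal rule.
    Phi_vals: (T, M) array of M basis functions evaluated on t_grid."""
    w = np.diff(t_grid)                      # interval lengths
    mid = 0.5 * (Phi_vals[1:, :, None] * Phi_vals[1:, None, :] +
                 Phi_vals[:-1, :, None] * Phi_vals[:-1, None, :])
    return np.einsum('t,tmn->mn', w, mid)

def posterior_probs(z, B):
    """π_l(x; b): z is the feature vector z_i, B an (L-1, d) matrix of
    stacked b_l; class L is the reference class with log-odds 0."""
    eta = B @ z                              # log-odds against class L
    num = np.concatenate([np.exp(eta), [1.0]])
    return num / num.sum()
```

For a numerically stable version one would subtract max(eta) before exponentiating; the plain form above matches the formulas in the text.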
3.1 Elastic net-type penalty

Kayano et al. (2016) introduced an elastic net-type penalty for estimating the model (1) and selecting variables:

    P_{λ,α}(b) = (1/2)(1 − α) Σ_{j=1}^{p} λ_j Σ_{l=1}^{L−1} ∥b_{jl}∥₂² + α Σ_{j=1}^{p} λ_j { Σ_{l=1}^{L−1} ∥b_{jl}∥₂² }^{1/2},    (7)

where λ_j = √(M_j) λ. The first term penalizes the L2 norms of the parameter vectors b_{jl}, and the second term penalizes the L1 norm of the L2 norms of the grouped coefficients b_j. As described in Section 2, the functional logistic regression model (4) has M_j(L−1) parameters for the jth predictor. Therefore, if we want to select variables, we need to treat these as grouped parameters, using the idea of the group lasso (Yuan and Lin, 2006).

3.2 Sparse group lasso-type penalty

The elastic net-type penalty described above treats the set of parameters {b_{j1}, ..., b_{j(L−1)}} as a single group for the jth variable. On the other hand, for the multiclass classification problem, if we treat each vector as a separate group, we can select decision boundaries for each variable. For example, when b_{jl} is estimated as the zero vector, the jth variable does not affect the classification between classes l and L. Matsui (2014) proposed two penalties that select variables and decision boundaries, respectively. We extend these penalties and introduce the following penalty:

    P_{λ,α}(b) = (1 − α) Σ_{j=1}^{p} λ_j { Σ_{l=1}^{L−1} ∥b_{jl}∥₂² }^{1/2} + α Σ_{j=1}^{p} λ_j Σ_{l=1}^{L−1} ∥b_{jl}∥₂.    (8)

The first term on the right-hand side of (8) selects variables, and the second term selects decision boundaries.

4 Real data analysis

We applied the proposed methods to the analysis of two gene expression data sets. Sections 4.1 and 4.2 report our strategies for analyzing these data using functional logistic regression models with the penalties given in Sections 3.1 and 3.2, respectively. Details of the analysis in Section 4.1 are described in Kayano et al. (2016).

4.1 Multiple sclerosis data analysis

This data set consists of time-course gene expression profiles obtained from an investigation of the long-term effects of recombinant interferon β (rIFN-β) on disease progression in multiple sclerosis (MS).
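Both penalties can be written compactly as functions of the grouped coefficients. The sketch below (illustrative, assuming the group-lasso weighting λ_j = √(M_j)·λ) evaluates (7) and (8) for coefficient groups stored as (L−1) × M_j arrays:

```python
import numpy as np

def elastic_net_type(b_groups, lam, alpha):
    """Penalty (7): b_groups[j] is an (L-1, M_j) array of the b_jl;
    lam_j = sqrt(M_j) * lam is an assumed group weighting."""
    p1 = p2 = 0.0
    for Bj in b_groups:
        lj = np.sqrt(Bj.shape[1]) * lam
        sq = np.sum(Bj ** 2, axis=1)       # ||b_jl||_2^2 for each l
        p1 += lj * sq.sum()                # ridge part over all boundaries
        p2 += lj * np.sqrt(sq.sum())       # one group norm per variable
    return 0.5 * (1 - alpha) * p1 + alpha * p2

def sparse_group_lasso_type(b_groups, lam, alpha):
    """Penalty (8): variable-level group norm plus boundary-level norms."""
    p1 = p2 = 0.0
    for Bj in b_groups:
        lj = np.sqrt(Bj.shape[1]) * lam
        sq = np.sum(Bj ** 2, axis=1)
        p1 += lj * np.sqrt(sq.sum())       # selects whole variables
        p2 += lj * np.sqrt(sq).sum()       # selects individual boundaries
    return (1 - alpha) * p1 + alpha * p2
```

At α = 1 the two penalties differ: (7) groups all boundaries of a variable together, while (8) penalizes each boundary vector b_{jl} separately, which is what allows boundary-level selection.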
There are n = 53 MS patients treated with rIFN-β,
Figure 1: Examples of gene expression profiles for a gene (IRF8). Points and thin lines show expression profiles, and heavy lines are the estimated mean functions for good responders (left) and poor responders (right).

and the patients are categorized into 33 good responders and 20 poor responders according to their response levels to rIFN-β administration (Figure 1). Expression levels were measured at the beginning of the administration and after 3, 6, 9, 12, 18, and 24 months. The data include missing values, so the actual number of time points per patient ranges from 4 to 7. The data consist of p = 76 genes coding for type I and II IFN-responsive molecules, cytokine receptors, members of the IFN signaling and apoptosis pathways, and transcription factors in immune regulation. We expressed the observed longitudinal data as functions using the mixed effects model implemented in the R package fpca (Peng and Paul, 2009). We then estimated the functional logistic model and selected genes using the method described in Section 3.1. We also compared the results of our method with those of the functional ANOVA of Minas et al. (2011) with respect to the selection of genes. As a result of the analysis, we detected a gene that has attracted attention in biology as a new target for the treatment of MS, whereas the functional ANOVA could not select it.

4.2 Yeast cell cycle gene expression data analysis

Spellman et al. (1998) measured expression profiles over about two cell cycles for 6,178 genome-wide yeast genes using cDNA microarrays. The data contain 77 microarrays with several types of temporal synchronization: cln3 (2 points), clb2 (2 points), α-factor (18 points), cdc15 (24 points), cdc28 (17 points), and elu (14 points). Spellman et al. (1998) used clustering over these 77 experiments to classify 800 genes into 5 groups: G1, G2/M, M/G1, S, and S/G2. Figure 2 shows examples for each type of synchronization.
We examined not only whether these experiments affect the classification
but also whether each of them affects the classification for each combination of the 5 classes. Since there are many missing values in the expression profiles and only 7 genes have no missing values, we excluded genes according to the following two rules: (1) genes with at least one missing value for either cln3 or clb2 were excluded; (2) genes with a total of more than 10 missing values across some combination of α-factor, cdc15, cdc28, and elu were excluded. By expressing the data as functions, we can apply the regression model even when there are some (though not excessively many) missing values. The resulting 657 genes were used for this analysis. First, we expressed the time-course data other than cln3 and clb2 as functions, using basis expansions with 4 basis functions that were selected in advance. The remaining variables, cln3 and clb2, each of which has only 2 points, were treated as vector data rather than functional data, and we treated the variables corresponding to these points as a group. Next, we constructed the functional logistic regression model

    log{ Pr(g_i = l | x_i) / Pr(g_i = L | x_i) } = β_{0l} + Σ_{j=1}^{2} Σ_{j'} x_{ijj'} β_{jj'l} + Σ_{j=3}^{6} ∫ x_{ij}(t) β_{lj}(t) dt,

which is a special case of (1), where x_j (j = 1, ..., 6) correspond to cln3, clb2, α-factor, cdc15, cdc28, and elu, respectively. The model was estimated by the penalized likelihood method with the penalty (8) and was then evaluated by a BIC-type model selection criterion. We also altered the class label L on the left-hand side of (1) and repeatedly estimated the model in order to investigate all the coefficients of the classification boundaries. We repeated this process for 100 bootstrap samples and then investigated which variables and boundaries affected the classification.

5 Concluding remarks

In this paper we treated observed longitudinal data as a set of functional data and then selected the variables related to the classification.
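The bootstrap step described above can be organized as a generic selection-frequency loop. This is only a schematic: in the paper the refit is the penalized likelihood fit with penalty (8), which here is replaced by a placeholder `fit_and_select` callback, and the toy correlation-based selector is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_selection(Z, y, fit_and_select, n_boot=100):
    """Refit on bootstrap resamples and record how often each variable
    (or boundary) is selected. fit_and_select(Z, y) must return a boolean
    selection array; returns per-variable selection frequencies."""
    n = Z.shape[0]
    counts = None
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        sel = np.asarray(fit_and_select(Z[idx], y[idx]))
        counts = sel.astype(float) if counts is None else counts + sel
    return counts / n_boot

def toy_selector(Z, y):
    # illustrative stand-in: flag variables whose |correlation| with y > 0.2
    r = np.array([abs(np.corrcoef(Z[:, j], y)[0, 1]) for j in range(Z.shape[1])])
    return r > 0.2
```

Variables (or boundaries) with high selection frequency across resamples are then judged to affect the classification.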
To estimate the functional logistic regression model and select variables, we used elastic net-type and sparse group lasso-type penalties. The former selects variables, whereas the latter can select variables and decision boundaries simultaneously. We applied the proposed methods to the analysis of two gene expression data sets and investigated their effectiveness. Recently, several algorithms for estimating models with sparse regularization have been proposed (e.g., Boyd et al., 2011). Future work includes developing more efficient algorithms for estimating these models.
Figure 2: Yeast cell cycle gene expression profiles for each type of synchronization (cln3, clb2, α-factor, cdc15, cdc28, and elu). Each plot consists of 5 genes from the 5 classes: G1 (solid), G2/M (dashed), M/G1 (dotted), S (dot-dashed), and S/G2 (long dashed).

References

Aguilera-Morillo, M. C., Aguilera, A. M., Escabias, M., and Valderrama, M. J. (2013), Penalized spline approaches for functional logit regression, Test.
Araki, Y., Konishi, S., Kawano, S., and Matsui, H. (2009), Functional regression modeling via regularized Gaussian basis expansions, Ann. Inst. Statist. Math., 61.
Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2011), Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3, 1–122.
Bühlmann, P. and van de Geer, S. (2011), Statistics for High-Dimensional Data: Methods, Theory and Applications, Heidelberg: Springer.
Friedman, J., Hastie, T., and Tibshirani, R. (2010a), A note on the group lasso and a sparse group lasso, arXiv preprint arXiv:1001.0736.
Friedman, J., Hastie, T., and Tibshirani, R. (2010b), Regularization paths for generalized linear models via coordinate descent, J. Stat. Softw., 33, 1–22.
Gertheiss, J., Maity, A., and Staicu, A.-M. (2013), Variable selection in generalized functional linear models, Stat, 2.
Goldsmith, J., Feder, J., and Crainiceanu, C. (2010), Penalized functional regression, J. Comput. Graph. Statist., 20.
Hastie, T., Tibshirani, R., and Wainwright, M. (2015), Statistical Learning with Sparsity: The Lasso and Generalizations, Boca Raton: Chapman & Hall/CRC.
Horváth, L. and Kokoszka, P. (2012), Inference for Functional Data with Applications, New York: Springer.
Kayano, M., Matsui, H., Yamaguchi, R., Imoto, S., and Miyano, S. (2016), Gene set differential analysis of time course expression profiles via sparse estimation in functional logistic model with application to time-dependent biomarker detection, Biostatistics, 17.
Matsui, H. (2014), Variable and boundary selection for functional data via multiclass logistic regression modeling, Comput. Statist. Data Anal., 78.
Matsui, H. and Konishi, S. (2011), Variable selection for functional regression models via the L1 regularization, Comput. Statist. Data Anal., 55.
Minas, C., Waddell, S. J., and Montana, G. (2011), Distance-based differential analysis of gene curves, Bioinformatics, 27.
Peng, J. and Paul, D. (2009), A geometric approach to maximum likelihood estimation of the functional principal components from sparse longitudinal data, J. Comput. Graph. Statist., 18.
Ramsay, J. and Silverman, B. (2002), Applied Functional Data Analysis: Methods and Case Studies, New York: Springer.
Ramsay, J. and Silverman, B. (2005), Functional Data Analysis, 2nd ed., New York: Springer.
Reiss, P. T. and Ogden, R. T. (2010), Functional generalized linear models with images as predictors, Biometrics, 66.
Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., and Futcher, B. (1998), Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Mol. Biol. Cell, 9, 3273–3297.
Yuan, M. and Lin, Y. (2006), Model selection and estimation in regression with grouped variables, J. Roy. Statist. Soc. Ser. B, 68, 49–67.
Zou, H. and Hastie, T. (2005), Regularization and variable selection via the elastic net, J. Roy. Statist. Soc. Ser. B, 67, 301–320.
More informationConvex relaxation for Combinatorial Penalties
Convex relaxation for Combinatorial Penalties Guillaume Obozinski Equipe Imagine Laboratoire d Informatique Gaspard Monge Ecole des Ponts - ParisTech Joint work with Francis Bach Fête Parisienne in Computation,
More informationInstitute of Statistics Mimeo Series No Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR
DEPARTMENT OF STATISTICS North Carolina State University 2501 Founders Drive, Campus Box 8203 Raleigh, NC 27695-8203 Institute of Statistics Mimeo Series No. 2583 Simultaneous regression shrinkage, variable
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne
More informationPENALIZING YOUR MODELS
PENALIZING YOUR MODELS AN OVERVIEW OF THE GENERALIZED REGRESSION PLATFORM Michael Crotty & Clay Barker Research Statisticians JMP Division, SAS Institute Copyr i g ht 2012, SAS Ins titut e Inc. All rights
More informationAn Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong
More informationImportance Sampling: An Alternative View of Ensemble Learning. Jerome H. Friedman Bogdan Popescu Stanford University
Importance Sampling: An Alternative View of Ensemble Learning Jerome H. Friedman Bogdan Popescu Stanford University 1 PREDICTIVE LEARNING Given data: {z i } N 1 = {y i, x i } N 1 q(z) y = output or response
More informationPrediction & Feature Selection in GLM
Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis
More informationPathwise coordinate optimization
Stanford University 1 Pathwise coordinate optimization Jerome Friedman, Trevor Hastie, Holger Hoefling, Robert Tibshirani Stanford University Acknowledgements: Thanks to Stephen Boyd, Michael Saunders,
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from
More informationHierarchical kernel learning
Hierarchical kernel learning Francis Bach Willow project, INRIA - Ecole Normale Supérieure May 2010 Outline Supervised learning and regularization Kernel methods vs. sparse methods MKL: Multiple kernel
More informationSimultaneous variable selection and class fusion for high-dimensional linear discriminant analysis
Biostatistics (2010), 11, 4, pp. 599 608 doi:10.1093/biostatistics/kxq023 Advance Access publication on May 26, 2010 Simultaneous variable selection and class fusion for high-dimensional linear discriminant
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationComparisons of penalized least squares. methods by simulations
Comparisons of penalized least squares arxiv:1405.1796v1 [stat.co] 8 May 2014 methods by simulations Ke ZHANG, Fan YIN University of Science and Technology of China, Hefei 230026, China Shifeng XIONG Academy
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationIterative Selection Using Orthogonal Regression Techniques
Iterative Selection Using Orthogonal Regression Techniques Bradley Turnbull 1, Subhashis Ghosal 1 and Hao Helen Zhang 2 1 Department of Statistics, North Carolina State University, Raleigh, NC, USA 2 Department
More informationFunctional Graphical Models
Functional Graphical Models Xinghao Qiao 1, Shaojun Guo 2, and Gareth M. James 3 1 Department of Statistics, London School of Economics, U.K. 2 Institute of Statistics and Big Data, Renmin University of
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationLearning with Sparsity Constraints
Stanford 2010 Trevor Hastie, Stanford Statistics 1 Learning with Sparsity Constraints Trevor Hastie Stanford University recent joint work with Rahul Mazumder, Jerome Friedman and Rob Tibshirani earlier
More informationRegularized Linear Models in Stacked Generalization
Regularized Linear Models in Stacked Generalization Sam Reid and Greg Grudic University of Colorado at Boulder, Boulder CO 80309-0430, USA Abstract Stacked generalization is a flexible method for multiple
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu 1,2, Daniel F. Schmidt 1, Enes Makalic 1, Guoqi Qian 2, John L. Hopper 1 1 Centre for Epidemiology and Biostatistics,
More informationIntroduction to Machine Learning
Introduction to Machine Learning Linear Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574 1
More informationSupplemental Information for Pramila et al. Periodic Normal Mixture Model (PNM)
Supplemental Information for Pramila et al. Periodic Normal Mixture Model (PNM) The data sets alpha30 and alpha38 were analyzed with PNM (Lu et al. 2004). The first two time points were deleted to alleviate
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationRegularization Path Algorithms for Detecting Gene Interactions
Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable
More informationLecture 5: November 19, Minimizing the maximum intracluster distance
Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction
More informationLASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape
LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape Nikolaus Umlauf https://eeecon.uibk.ac.at/~umlauf/ Overview Joint work with Andreas Groll, Julien Hambuckers
More informationGroup exponential penalties for bi-level variable selection
for bi-level variable selection Department of Biostatistics Department of Statistics University of Kentucky July 31, 2011 Introduction In regression, variables can often be thought of as grouped: Indicator
More informationMissing Value Estimation for Time Series Microarray Data Using Linear Dynamical Systems Modeling
22nd International Conference on Advanced Information Networking and Applications - Workshops Missing Value Estimation for Time Series Microarray Data Using Linear Dynamical Systems Modeling Connie Phong
More informationOdds ratio estimation in Bernoulli smoothing spline analysis-ofvariance
The Statistician (1997) 46, No. 1, pp. 49 56 Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance models By YUEDONG WANG{ University of Michigan, Ann Arbor, USA [Received June 1995.
More informationExploratory quantile regression with many covariates: An application to adverse birth outcomes
Exploratory quantile regression with many covariates: An application to adverse birth outcomes June 3, 2011 eappendix 30 Percent of Total 20 10 0 0 1000 2000 3000 4000 5000 Birth weights efigure 1: Histogram
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationPackage Grace. R topics documented: April 9, Type Package
Type Package Package Grace April 9, 2017 Title Graph-Constrained Estimation and Hypothesis Tests Version 0.5.3 Date 2017-4-8 Author Sen Zhao Maintainer Sen Zhao Description Use
More informationShrinkage Tuning Parameter Selection in Precision Matrices Estimation
arxiv:0909.1123v1 [stat.me] 7 Sep 2009 Shrinkage Tuning Parameter Selection in Precision Matrices Estimation Heng Lian Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang
More information10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification
10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene
More informationVariable Selection for Highly Correlated Predictors
Variable Selection for Highly Correlated Predictors Fei Xue and Annie Qu Department of Statistics, University of Illinois at Urbana-Champaign WHOA-PSI, Aug, 2017 St. Louis, Missouri 1 / 30 Background Variable
More information