A sparse factor analysis model for high dimensional latent spaces

Chuan Gao
Institute for Genome Sciences & Policy
Duke University

Barbara E. Engelhardt
Department of Biostatistics & Bioinformatics
Institute for Genome Sciences & Policy
Duke University
barbara.engelhardt@duke.edu

Abstract

Inducing sparsity in factor analysis has become increasingly important as applications have arisen that are best modeled by a high dimensional, sparse latent space, and the interpretability of this latent space is critical. Applying latent factor models with a high dimensional latent space but without sparsity yields nonsense factors that may be artifactual and are prone to overfitting the data. In the Bayesian context, a number of sparsity-inducing priors have been proposed, but none that specifically address the context of a high dimensional latent space. Here we describe a Bayesian sparse factor analysis model that uses a general three parameter beta prior, which, given specific settings of hyperparameters, can recapitulate sparsity-inducing priors with appropriate modeling assumptions and computational properties. We apply the model to simulated and real gene expression data sets to illustrate the model properties and to identify large numbers of sparse, possibly correlated factors in this space.

1 Introduction

Factor analysis has been used in a variety of settings to extract useful features from high dimensional data sets [1, 2]. Factor analysis, in a general context, has a number of drawbacks, such as unidentifiability with respect to rotation of the latent matrices and the difficulty of selecting the appropriate number of factors. One solution that addresses these drawbacks is to induce sparsity in the loading matrix. By imposing substantial regularization on the loading matrix, the identifiability issue can be alleviated when the latent space is sufficiently sparse, and model selection criteria appear to be more effective at choosing the number of factors because the model does not overfit to the same extent as a non-sparse model.

There are currently a number of options for inducing sparsity constraints on the latent parameter space. We choose to work in the Bayesian context, where a sparsity-inducing prior should have substantial mass around zero to provide strong shrinkage near zero, and also heavy tails to allow signals to escape strong shrinkage [3]. In the context of sparse regression, there have been a number of proposed solutions [4, 5, 6, 7, 8], some of which have been applied to latent factor models [9, 10]. However, all of these approaches in the factor analysis context impose an equal amount of shrinkage on all parameters, which may sacrifice small signals to achieve high levels of sparsity. To address this issue in the Bayesian latent factor model context, one can use a mixture of a point mass at zero and a normal distribution, a so-called spike and slab prior, on the loading matrix [1]. Unfortunately, there is no closed form solution for the parameter estimates, so MCMC is used to estimate the parameters, which is computationally intractable for large data.

In this work, we use a three parameter beta (TPB) prior [11] as a general shrinkage prior for the factor loading matrix of a latent factor model. TPB(a, b, φ) is a generalized form of the beta distribution, with the third parameter φ further controlling the shape of the density. It has been shown that a linear transformation of the beta distribution, producing the inverse beta distribution, has desirable shrinkage properties in sparse modeling (the horseshoe prior) [7].

The TPB distribution can be used to mimic this distribution, with the inverse beta variable scaled by φ. The TPB is thus appealing because (a) it can recapitulate the sparsity-inducing properties of the horseshoe prior, with substantial mass around zero to provide strong shrinkage for noise and heavy tails to avoid shrinking signal, and (b) by carefully controlling its parameters, it recapitulates the two-groups model [12, 3] for priors with different shrinkage characteristics. This allows us to recreate a two-groups sparsity-inducing prior, with one mode centered at zero and the other at the posterior mean, that also has a straightforward posterior distribution whose parameters can be estimated via expectation maximization, making it computationally tractable. In the setting of identifying a large number of factors that may individually contribute minimally to the data variance, these two features, namely computational tractability and two-groups sparsity modeling, are critical to effectively model the data.

2 Bayesian sparse factor model via TPB

We define the factor analysis model as follows:

Y = ΛX + W,  (1)

where Y has dimension n × m, Λ is the loading matrix with dimension n × p, X is the factor matrix with dimension p × m, and W is the n × m residual error matrix, where we assume W ~ N(0, Ψ). For computational tractability, we assume Ψ is diagonal (but the diagonal elements are not necessarily equal). For the latent variable X, we follow convention by giving it a standard normal prior, X ~ N(0, I).

To induce sparsity in the factor loading matrix Λ, we put the following priors on each element λ_ik of the parameter matrix Λ:

λ_ik ~ N(0, 1/ρ_ik − 1),  (2)
ρ_ik ~ TPB(a, b, γ_k),  (3)
γ_k ~ TPB(c, d, ν).  (4)

In the prior on λ_ik, ρ_ik provides local shrinkage for each element, while γ_k controls the global shrinkage and is specific to each factor k. As in the horseshoe, the shrinkage coefficient ρ_ik ∈ [0, 1] has the desirable properties of strong shrinkage to zero while not overly shrinking signals. This general model is able to capture a number of different types of shrinkage scenarios, depending on the values of a, b, and ρ (Table 1).

Table 1: Shrinkage effects for different values of a, b, and ρ; the setting a = b = 1/2 recovers the horseshoe, with strong shrinkage of noise and weak shrinkage of signal, while other settings of a and b trade off strong, weak, and variable shrinkage across the range of ρ.

3 Posterior distribution

We will generalize this prior further for the latent factor model. For a given parameter θ and scale φ, the following relationship holds [11]:

θ/φ ~ β′(a, b)  ⟺  θ ~ G(a, λ) and λ ~ G(b, φ),  (5)

where β′(a, b) and G indicate an inverse beta and a gamma distribution, respectively. From Equations 2, 3 and 4, if we make the substitutions θ_ik = 1/ρ_ik − 1 and φ_k = 1/γ_k − 1, it can be shown that θ_ik/φ_k ~ β′(a, b). This relationship implies the following simple hierarchical structure for the latent factor model:

λ_ik ~ N(0, θ_ik), θ_ik ~ G(a, δ_ik), δ_ik ~ G(b, φ_k), φ_k ~ G(c, η), and η ~ G(d, ν),

where the parameter φ_k controls the global shrinkage and θ_ik controls the local shrinkage. We give Ψ an uninformative prior.
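To make the gamma-chain hierarchy above concrete, the following is a minimal sketch (ours, not the authors' implementation) that draws a loading matrix from this prior. It assumes the Gamma(shape, rate) reading of G(·, ·); the function name and default hyperparameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_loadings_tpb(n, p, a=0.5, b=0.5, c=0.5, d=0.5, nu=1e-2):
    """Draw an n x p loading matrix from the TPB prior via its gamma-chain
    representation: lambda_ik ~ N(0, theta_ik), theta_ik ~ G(a, delta_ik),
    delta_ik ~ G(b, phi_k), phi_k ~ G(c, eta), eta ~ G(d, nu).
    G(shape, rate) is assumed; numpy's gamma takes scale = 1/rate."""
    eta = rng.gamma(d, 1.0 / nu)                  # top-level global scale
    phi = rng.gamma(c, 1.0 / eta, size=p)         # per-factor global shrinkage
    delta = rng.gamma(b, 1.0 / phi, size=(n, p))  # local rates, one per element
    theta = rng.gamma(a, 1.0 / delta)             # per-element variances
    return rng.normal(0.0, np.sqrt(theta))        # loadings

Lam = sample_loadings_tpb(200, 20)
print(np.mean(np.abs(Lam) < 1e-2))  # fraction of near-zero loadings
```

Smaller ν pushes mass toward zero across all factors, while the per-factor φ_k and per-element θ_ik allow individual signals to escape shrinkage, matching the global-local structure described above.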

Based on the posterior distributions, a Gibbs sampler can be constructed to iteratively sample the parameter values from their posterior distributions. For faster computation, we use expectation maximization (EM), where the expectation step involves taking the expected value of the latent variable X (for this Gaussian model, the standard factor analysis form E[X | Y] = (I + Λ'Ψ⁻¹Λ)⁻¹ Λ'Ψ⁻¹ Y), and the maximization step identifies MAP parameter estimates (see paper for details).

4 Results

4.1 Simulations

We simulated five data sets with different levels of sparsity to test the performance of our model, with sample size n = 200, m = 500, and p = 20 factors. The loading matrices Λ were generated from the above model, setting a = b = 1. To adjust the sparsity of the matrix, we let ν take values in the range [10⁻⁴, 1], where smaller values of ν produce more sparsity in the matrix. Both X and W were drawn from N(0, I) with appropriate dimensions. When fitting, we set a = b = 1/2 to recapitulate the horseshoe prior and to differentiate the fitted prior from the simulated distribution. For ν, we used values between 1 and 0.01, with minimal changes in the estimates. We ran EM from a random starting point ten times, and used the parameters from the run with the best fit.

We compared our result with the Bayesian factor regression model (BFRM) [1] and the K-SVD model [9]. BFRM was run with default settings, with a burn-in period of 2,000 and a sampling period of 20,000, and K-SVD was run with the same settings as the demonstration file included in the package. We compared the three models by looking at the sparsity level of the three methods versus the amount of information represented in the latent subspace. The sparsity level was measured by the Gini index [13]. For a list of values c with sorted absolute values c_(1) ≤ c_(2) ≤ ... ≤ c_(N), the Gini index is

Gini(c) = 1 − 2 Σ_{k=1}^{N} (c_(k)/‖c‖₁) · ((N − k + 1/2)/N),

where N is the total number of elements in the list; bigger values indicate a sparser representation (a sketch of this computation appears at the end of this subsection). The accuracy of the prediction is reflected in the mean squared error (MSE), computed from the residuals.

We find that, compared to BFRM, TPB achieves equivalent or better MSE with far more sparsity (a range of [0.5, 0.9] for TPB versus [0.4, 0.6] for BFRM). Compared to K-SVD, our method adaptively learned the sparsity of the data, keeping MSE low, while K-SVD maintains the same sparsity level for all simulations, sacrificing accuracy for sparsity. Interestingly, for sparser simulations, both K-SVD and TPB achieve sparser estimates than the real data while maintaining the same prediction accuracies (Figure 1). We find that the Bayesian information criterion (BIC) score, depending on the value of ν, is fairly accurate in terms of the number of selected factors in this context (Figure 2).

Figure 1: Comparison of the sparsity level (left panel, y-axis) and the MSE (right panel, y-axis) of TPB, BFRM and K-SVD. The underlying sparsity is on the x-axis. The true sparsity on the left panel corresponds to the line with slope of 1.

Figure 2: Fitness of the model, with factor number on the x-axis and log likelihood (left panel) or BIC score (right panel) on the y-axis, for ν ∈ {0.1, 0.01, 0.001, 0.0001}. In this simulated data with twenty factors, the BIC score is fairly accurate in terms of the number of selected factors across different values of ν.
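As a concrete check of the sparsity measure used above, the following is a minimal sketch (ours, not the authors' code) of the Gini index exactly as defined; the function name and toy inputs are illustrative.

```python
import numpy as np

def gini_index(c):
    """Hurley-Rickard Gini index of a coefficient vector:
    1 - 2 * sum_k (c_(k) / ||c||_1) * ((N - k + 1/2) / N),
    with c_(1) <= ... <= c_(N) the sorted absolute values.
    Larger values indicate a sparser representation."""
    c = np.sort(np.abs(np.ravel(c)))
    N = c.size
    k = np.arange(1, N + 1)
    return 1.0 - 2.0 * np.sum((c / c.sum()) * (N - k + 0.5) / N)

# A sparse vector scores higher than a uniformly dense one:
print(gini_index([0, 0, 0, 0, 5.0]))  # 0.8
print(gini_index(np.ones(5)))         # 0.0
```

Applied to a loading matrix, the index can be computed per column or on the flattened matrix, which is how a single sparsity level per fitted model is compared across methods.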
4.2 Gene Expression Analysis

Microarrays are able to generate gene expression levels for tens of thousands of genes in a sample quickly and at low cost. Biologists know that genes do not function as independent units, but instead as parts of complicated networks with different biochemical purposes [14, 15]. As a result, genes that share similar functions tend to have gene expression levels that are correlated across samples because, for example, they may be regulated by common transcription factors. Identifying these co-regulated sets of genes from high dimensional gene expression measurements is critical for analysis of gene networks and for identifying genetic variants that impact transcription from long genomic distances. The number of co-regulated sets of genes may be very large relative to the number of genes in the gene expression matrix.

To this end, we applied our method to a subset of 8,262 genes measured in 354 human cerebellum samples [unpublished]. We set K = 1,000 and ran EM from ten starting points with a = 0.5, b = 1,000 and ν = 10⁻⁴ to induce strong shrinkage; we used the result with the best fit. We note that the sparse prior alleviates the problem of overfitting by shrinking unnecessary factors to 0. By looking at the correlation of the genes that load on each factor (those with nonzero values), we found that the genes on the same factors clustered well (Figure 3, left). The sizes of the gene clusters range from 10 to 500, with most around 50 (Figure 3, right).

Figure 3: Correlation of genes loaded on the first few factors (left) and distribution of the gene cluster sizes for a total of 1,000 factors (right). Factors on the left are denoted by black lines.
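As an illustration of how such gene sets can be read off a fitted loading matrix, the following is a hypothetical sketch (not the authors' pipeline): it takes the genes with nonzero loading in each column of Λ and summarizes their within-factor expression correlation. The function names and the tolerance parameter are ours.

```python
import numpy as np

def factor_gene_sets(Lam, tol=0.0):
    """For each factor k, return indices of genes with nonzero loading
    (|lambda_ik| > tol); these index the candidate co-regulated gene sets."""
    return [np.flatnonzero(np.abs(Lam[:, k]) > tol) for k in range(Lam.shape[1])]

def within_factor_correlation(Y, genes):
    """Mean absolute pairwise correlation of expression rows for one gene set."""
    if genes.size < 2:
        return np.nan
    R = np.corrcoef(Y[genes, :])            # genes as variables, samples as observations
    off = R[np.triu_indices_from(R, k=1)]   # off-diagonal entries only
    return np.abs(off).mean()

# Usage with fitted loadings Lam (genes x factors) and expression Y (genes x samples):
# sets = factor_gene_sets(Lam)
# sizes = [s.size for s in sets if s.size > 0]
# corrs = [within_factor_correlation(Y, s) for s in sets if s.size >= 2]
```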

5 Conclusions

We built a model for sparse factor analysis using a three parameter beta prior to induce shrinkage. We found that this model has favorable characteristics for estimating sparse, possibly high-dimensional latent spaces. We are further testing the robustness of estimates from our model, and we will use the factors to identify genetic variants that are associated with long-distance genetic regulation of each factor.

Acknowledgments

The authors would like to thank Sayan Mukherjee for helpful conversations. The gene expression data were generated by Merck Research Laboratories in collaboration with the Harvard Brain Tissue Resource Center and were obtained through the Synapse data repository (data set id: syn4505).

References

[1] C. M. Carvalho, J. E. Lucas, Q. Wang, J. Chang, J. R. Nevins, and M. West. High-dimensional sparse factor modelling: applications in gene expression genomics. Journal of the American Statistical Association, 103(484):1438–1456, 2008.

[2] Mingyuan Zhou, Lauren Hannah, David Dunson, and Lawrence Carin. Beta-negative binomial process and Poisson factor analysis. December 2011.

[3] Nicholas G. Polson and James G. Scott. Shrink globally, act locally: sparse Bayesian regularization and prediction. In Bayesian Statistics 9, 2010.

[4] Michael E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, September 2001.

[5] Jim E. Griffin and Philip J. Brown. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1):171–188, 2010.

[6] Trevor Park and George Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681–686, 2008.

[7] Carlos M. Carvalho, Nicholas G. Polson, and James G. Scott. Handling sparsity via the horseshoe. Journal of Machine Learning Research W&CP, 5:73–80, 2009.

[8] Barbara Engelhardt and Matthew Stephens. Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis. PLoS Genetics, 6(9):e1001117, 2010.

[9] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, November 2006.

[10] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19–60, March 2010.

[11] Artin Armagan, David Dunson, and Merlise Clyde. Generalized beta mixtures of Gaussians. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, 2011.

[12] B. Efron. Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23(1):1–47, 2008.

[13] Niall Hurley and Scott Rickard. Comparing measures of sparsity. In Machine Learning for Signal Processing (MLSP), IEEE Workshop on, pages 55–60, October 2008.

[14] Yoo-Ah Kim, Stefan Wuchty, and Teresa Przytycka. Identifying causal genes and dysregulated pathways in complex diseases. PLoS Computational Biology, 7(3):e1001095, 2011.

[15] Yanqing Chen, Jun Zhu, Pek Lum, Xia Yang, Shirly Pinto, Douglas MacNeil, Chunsheng Zhang, John Lamb, Stephen Edwards, Solveig Sieberts, Amy Leonardson, Lawrence Castellini, Susanna Wang, Marie-France Champy, Bin Zhang, Valur Emilsson, Sudheer Doss, Anatole Ghazalpour, Steve Horvath, Thomas Drake, Aldons Lusis, and Eric Schadt. Variations in DNA elucidate molecular networks that cause disease. Nature, 452(7186):429–435, 2008.
