A sparse factor analysis model for high dimensional latent spaces
Chuan Gao
Institute for Genome Sciences & Policy
Duke University

Barbara E. Engelhardt
Department of Biostatistics & Bioinformatics
Institute for Genome Sciences & Policy
Duke University
barbara.engelhardt@duke.edu

Abstract

Inducing sparsity in factor analysis has become increasingly important as applications have arisen that are best modeled by a high dimensional, sparse latent space for which interpretability is critical. Applying latent factor models with a high dimensional latent space but without sparsity yields nonsense factors that may be artifactual and are prone to overfitting the data. In the Bayesian context, a number of sparsity-inducing priors have been proposed, but none that specifically addresses a high dimensional latent space. Here we describe a Bayesian sparse factor analysis model that uses a general three parameter beta prior which, for specific settings of its hyperparameters, recapitulates existing sparsity-inducing priors with the corresponding modeling assumptions and computational properties. We apply the model to simulated and real gene expression data sets to illustrate the model's properties and to identify large numbers of sparse, possibly correlated factors in this space.

1 Introduction

Factor analysis has been used in a variety of settings to extract useful features from high dimensional data sets [1, 2]. In its general form, factor analysis has a number of drawbacks, such as unidentifiability with respect to rotation of the latent matrices and the difficulty of selecting the appropriate number of factors. One solution that addresses these drawbacks is to induce sparsity in the loading matrix.
By imposing substantial regularization on the loading matrix, the identifiability issue can be alleviated when the latent space is sufficiently sparse, and model selection criteria appear to be more effective at choosing the number of factors because the model does not overfit to the same extent as a non-sparse model. There are currently a number of options for inducing sparsity constraints on the latent parameter space. We choose to work in the Bayesian context, where a sparsity-inducing prior should have substantial mass around zero, to provide strong shrinkage near zero, and also heavy tails, to allow signals to escape strong shrinkage [3]. In the context of sparse regression, a number of solutions have been proposed [4, 5, 6, 7, 8], some of which have been applied to latent factor models [9, 10]. However, all of these approaches in the factor analysis context impose an equal amount of shrinkage on all parameters, which may sacrifice small signals to achieve high levels of sparsity. To address this issue in the Bayesian latent factor model context, one can place a mixture of a point mass at zero and a normal distribution, a so-called spike-and-slab prior, on the loading matrix [1]. Unfortunately there is no closed form solution for the parameter estimates, so MCMC is used to estimate the parameters, which is computationally intractable for large data. In this work, we use a three parameter beta (TPB) prior [11] as a general shrinkage prior for the factor loading matrix of a latent factor model. TPB(a, b, φ) is a generalized form of the beta distribution, with the third parameter φ further controlling the shape of the density. It has been shown that a linear transformation of the beta distribution, producing the inverse beta distribution, has
desirable shrinkage properties in sparse modeling (the horseshoe prior) [7]. The TPB distribution can be used to mimic this distribution, with the inverse beta variable scaled by φ. The TPB is thus appealing because a) it can be used to recapitulate the sparsity-inducing properties of the horseshoe prior, with substantial mass around zero to provide strong shrinkage for noise and heavy tails to avoid shrinking signal, and b) by carefully controlling its parameters, it recapitulates the two-groups model [12, 3] for priors with different shrinkage characteristics. This allows us to recreate a two-groups sparsity-inducing prior, with one mode centered at zero and the other at the posterior mean, which also has a straightforward posterior distribution for which the parameters can be estimated via expectation maximization, making it computationally tractable. In the setting of identifying a large number of factors that may individually contribute minimally to the data variance, these two features, namely computational tractability and two-groups sparsity modeling, are critical to effectively model the data.

2 Bayesian sparse factor model via TPB

We define the factor analysis model as follows:

Y = ΛX + W,    (1)

where Y has dimension n × m, Λ is the loading matrix with dimension n × p, X is the factor matrix with dimension p × m, and W is the n × m residual error matrix, with W ∼ N(0, Ψ). For computational tractability, we assume Ψ is diagonal (but the diagonal elements are not necessarily equal). For the latent variable X, we follow convention by giving it a standard normal prior, X ∼ N(0, I). To induce sparsity in the factor loading matrix Λ, we put the following priors on each element λ_ik of the parameter matrix Λ:

λ_ik ∼ N(0, ρ_ik),    (2)
ρ_ik ∼ TPB(a, b, γ_k),    (3)
γ_k ∼ TPB(c, d, ν).    (4)

In the prior on λ_ik, ρ_ik provides local shrinkage for each element, while γ_k controls the global shrinkage and is specific to each factor k.
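The generative process in Equations (1)-(4) can be sketched numerically. The following is a minimal numpy sketch, with the simplifying assumption (ours, not the paper's) that the sparse loadings are produced by hard thresholding a dense Gaussian matrix rather than by sampling the TPB hierarchy itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 200, 500, 20                       # dimensions used in the simulations below

# Sparse loading matrix Lambda (n x p); hard thresholding stands in for the TPB prior
Lambda = rng.normal(size=(n, p))
Lambda[rng.random(size=(n, p)) < 0.9] = 0.0  # roughly 90% of entries set exactly to zero

X = rng.normal(size=(p, m))                  # factors, X ~ N(0, I)
psi = rng.uniform(0.5, 1.5, size=n)          # diagonal of residual covariance Psi
W = rng.normal(size=(n, m)) * np.sqrt(psi)[:, None]

Y = Lambda @ X + W                           # observed data matrix, n x m
```

Each row of Y is then a sparse linear combination of the p latent factors plus heteroscedastic noise, which is the structure the sparsity prior is meant to recover.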
As in the horseshoe, this prior has the desirable property of strong shrinkage to zero while not overly shrinking signals. This general model is able to capture a number of different types of shrinkage scenarios, depending on the values of a, b, and ρ (Table 1).

Table 1: Shrinkage effects for different values of a, b, and ρ; the setting a = b = 1/2 recovers the horseshoe prior.

3 Posterior distribution

We generalize this prior further for the latent factor model. For a given parameter θ and scale φ, the following relationship holds [11]:

θ/φ ∼ β′(a, b)  ⟺  θ ∼ G(a, λ) and λ ∼ G(b, φ),    (5)

where β′(a, b) and G indicate an inverse beta (beta prime) and a gamma distribution, respectively. From Equations 2, 3 and 4, if we make the substitutions θ_ik = ρ_ik and φ_k = γ_k, it can be shown that θ_ik/φ_k ∼ β′(a, b). This relationship implies the following simple hierarchical structure for the latent factor model:

λ_ik ∼ N(0, θ_ik), θ_ik ∼ G(a, δ_ik), δ_ik ∼ G(b, φ_k), φ_k ∼ G(c, η), and η ∼ G(d, ν),

where the parameter φ_k controls the global shrinkage and θ_ik controls the local shrinkage. We give Ψ an uninformative prior.
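The equivalence in Equation (5) is easy to check by simulation: drawing θ through the two-level gamma hierarchy (rate parameterization assumed here) reproduces the moments of the inverse beta (beta prime) distribution, whose mean is a/(b − 1) for b > 1. A small sketch with illustrative hyperparameter values:

```python
import numpy as np

a, b, phi = 2.0, 3.0, 1.5        # illustrative hyperparameter values
rng = np.random.default_rng(1)
N = 200_000

lam = rng.gamma(shape=b, scale=1.0 / phi, size=N)   # lambda ~ G(b, phi), rate phi
theta = rng.gamma(shape=a, scale=1.0 / lam)         # theta ~ G(a, lambda), rate lambda

ratio = theta / phi               # should follow the beta-prime distribution beta'(a, b)
print(ratio.mean())               # close to a / (b - 1) = 1.0
```

This compound-gamma representation is what makes the conditional updates conjugate, so each parameter in the hierarchy has a closed-form full conditional.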
Based on the posterior distributions, a Gibbs sampler can be constructed to iteratively sample the parameter values. For faster computation, we use expectation maximization (EM), where the expectation step takes the expected value of the latent variable X, and the maximization step identifies MAP parameter estimates (see paper for details).

4 Results

4.1 Simulations

We simulated five data sets with different levels of sparsity to test the performance of our model, with sample size n = 200, m = 500, and p = 20 factors. The loading matrices Λ were generated from the above model, setting a = b = 1. To adjust the sparsity of the matrix, we let ν take values in the range [10^-4, 1], where smaller values of ν produce more sparsity in the matrix. Both X and W were drawn from N(0, I) with the appropriate dimensions. In the fitted model we set a = b = 1/2 to recapitulate the horseshoe prior and differentiate it from the simulated distribution. For ν, we used values between 1 and 0.01, with minimal changes in the estimates. We ran EM from a random starting point ten times, and used the parameters from the run with the best fit. We compared our results with the Bayesian Factor Regression Model (BFRM) [1] and the K-SVD model [9]. BFRM was run with default settings, with a burn-in period of 2,000 and a sampling period of 20,000, and K-SVD was run with the same settings as the demonstration file included in the package. We compared the three models by looking at the sparsity level of each method versus the amount of information represented in the latent subspace. The sparsity level was measured by the Gini index [13]. For a list of values c sorted in ascending order of magnitude, the Gini index is

1 − 2 Σ_{k=1}^{N} (c_(k) / ‖c‖_1) · (N − k + 1/2) / N,

where N is the total number of elements in the list; larger values indicate a sparser representation. The accuracy of the prediction is reflected in the mean squared error (MSE), computed from the residuals.
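The Gini index above can be computed directly; a short sketch (the function name is ours):

```python
import numpy as np

def gini_index(c):
    """Gini index of Hurley & Rickard [13]: 0 for a uniform vector,
    approaching 1 for a maximally sparse (one-hot) vector."""
    c = np.sort(np.abs(np.asarray(c, dtype=float)))  # ascending magnitudes c_(k)
    N = c.size
    k = np.arange(1, N + 1)
    return 1.0 - 2.0 * np.sum((c / c.sum()) * (N - k + 0.5) / N)

print(gini_index([1.0, 1.0, 1.0, 1.0]))  # 0.0: no sparsity
print(gini_index([0.0, 0.0, 0.0, 1.0]))  # 0.75: one-hot vector gives 1 - 1/N
```

In the comparisons below, this index is applied to the estimated loading matrices, so a sparser Λ scores closer to 1.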
We find that, compared to BFRM, TPB achieves equivalent or better MSE with far more sparsity (Gini index range of [0.5, 0.9] for TPB versus [0.4, 0.6] for BFRM). Compared to K-SVD, our method adaptively learned the sparsity of the data, keeping MSE low, while K-SVD maintains the same sparsity level for all simulations, sacrificing accuracy for sparsity. Interestingly, for sparser simulations, both K-SVD and TPB achieve sparser estimates than the real data, while maintaining the same prediction accuracies. We find that the Bayes Information Criterion (BIC) score, depending on the value of ν, is fairly accurate in terms of the number of selected features in this context (Figure 2).

Figure 1: Comparison of the sparsity level (left panel, y-axis) and the MSE (right panel, y-axis) of TPB, BFRM and K-SVD, with the underlying sparsity on the x-axis. The true sparsity in the left panel corresponds to the line with slope 1.

4.2 Gene Expression Analysis

Microarrays are able to generate gene expression levels for tens of thousands of genes in a sample quickly and at low cost. Biologists know that genes do not function as independent units, but instead as parts of complicated networks with different biochemical purposes [14, 15]. As a result, genes that share similar functions tend to have gene expression levels that are correlated across samples because, for example, they may be regulated by common transcription factors. Identifying these co-regulated sets of genes from high dimensional gene expression measurements is critical for the analysis of gene networks and for identifying genetic variants that impact transcription from long genomic distances. The number of co-regulated sets of genes may be very large relative to the number of genes in the gene expression matrix.
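The BIC model selection used above can be sketched as follows. Under the assumption (ours, as one common convention for sparse models) that the effective number of parameters counts only the nonzero loadings plus the n residual variances, a hypothetical scoring function for a fitted sparse factor model is:

```python
import numpy as np

def factor_bic(Y, Lambda, X, psi):
    """BIC = -2 log L + d log(n m) for Y = Lambda X + W, W ~ N(0, diag(psi)).
    Counting only nonzero loadings toward d is an assumption here."""
    n, m = Y.shape
    R = Y - Lambda @ X                                   # residuals
    loglik = (-0.5 * np.sum(R ** 2 / psi[:, None])
              - 0.5 * m * np.sum(np.log(2 * np.pi * psi)))
    d = np.count_nonzero(Lambda) + n                     # effective parameter count
    return -2.0 * loglik + d * np.log(n * m)

# Toy usage: a sparser loading matrix with an essentially identical fit scores
# lower (better) than a dense one, because d is smaller.
rng = np.random.default_rng(2)
Lam = np.zeros((30, 5)); Lam[:10, 0] = 1.0
X = rng.normal(size=(5, 40))
Y = Lam @ X + 0.1 * rng.normal(size=(30, 40))
dense = np.where(np.abs(Lam) > 0, Lam, 1e-9)            # same fit, all entries nonzero
print(factor_bic(Y, Lam, X, np.full(30, 0.01)) <
      factor_bic(Y, dense, X, np.full(30, 0.01)))       # True
```

Sweeping the number of factors (or ν) and keeping the model with the lowest such score mimics the selection procedure reported in Figure 2.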
Figure 2: Fit of the model, with factor number on the x-axis and log likelihood (left panel) or BIC score (right panel) on the y-axis; curves correspond to ν = 0.1, 0.01, 0.001, and 0.0001. In this simulated data with twenty factors, the BIC score is fairly accurate in terms of the number of selected features across different values of ν.

To this end, we applied our method to a subset of 8,262 genes in 354 human cerebellum samples [unpublished]. We set K = 1,000 and ran EM from ten starting points with a = 0.5, b = 1,000 and ν = 10^-4 to induce strong shrinkage; we used the result with the best fit. We note that the sparse prior alleviates the problem of overfitting by shrinking unnecessary factors to 0. By looking at the correlation of the genes that load on each factor (those with nonzero values), we found that the genes on the same factors clustered well (Figure 3, left). The sizes of the gene clusters range from 10 to 500, with most around 50 (Figure 3, right).

Figure 3: Correlation of genes loaded on the first few factors (left) and distribution of the gene cluster sizes for a total of 1,000 factors (right). Factors on the left are delimited by black lines.

5 Conclusions

We built a model for sparse factor analysis using a three parameter beta prior to induce shrinkage. We found that this model has favorable characteristics for estimating possibly high-dimensional latent spaces. We are further testing the robustness of estimates from our model and will use the factors to identify genetic variants that are associated with long-distance genetic regulation of each factor.

Acknowledgments

The authors would like to thank Sayan Mukherjee for helpful conversations. The gene expression data were generated by Merck Research Laboratories in collaboration with the Harvard Brain Tissue Resource Center and were obtained through the Synapse data repository (data set ID: syn4505).

References

[1] C. M. Carvalho, J. E. Lucas, Q. Wang, J.
Chang, J. R. Nevins, and M. West. High-dimensional sparse factor modelling: applications in gene expression genomics. Journal of the American Statistical Association, 103(484):1438-1456, 2008.

[2] Mingyuan Zhou, Lauren Hannah, David Dunson, and Lawrence Carin. Beta-negative binomial process and Poisson factor analysis. December 2011.
[3] Nicholas G. Polson and James G. Scott. Shrink globally, act locally: sparse Bayesian regularization and prediction. In Bayesian Statistics 9. Oxford University Press, 2010.

[4] Michael E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211-244, 2001.

[5] Jim E. Griffin and Philip J. Brown. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis, 5(1):171-188, 2010.

[6] Trevor Park and George Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681-686, 2008.

[7] Carlos M. Carvalho, Nicholas G. Polson, and James G. Scott. Handling sparsity via the horseshoe. Journal of Machine Learning Research - Proceedings Track, 5:73-80, 2009.

[8] Barbara Engelhardt and Matthew Stephens. Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis. PLoS Genetics, 6(9):e1001117, 2010.

[9] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311-4322, November 2006.

[10] Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19-60, March 2010.

[11] Artin Armagan, David Dunson, and Merlise Clyde. Generalized beta mixtures of Gaussians. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, 2011.

[12] B. Efron. Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23(1):1-47, 2008.

[13] Niall Hurley and Scott Rickard. Comparing measures of sparsity. In Machine Learning for Signal Processing (MLSP), IEEE Workshop on, pages 55-60, October 2008.

[14] Yoo-Ah Kim, Stefan Wuchty, and Teresa Przytycka. Identifying causal genes and dysregulated pathways in complex diseases. PLoS Computational Biology, 7(3):e1001095, 2011.
[15] Yanqing Chen, Jun Zhu, Pek Lum, Xia Yang, Shirly Pinto, Douglas MacNeil, Chunsheng Zhang, John Lamb, Stephen Edwards, Solveig Sieberts, Amy Leonardson, Lawrence Castellini, Susanna Wang, Marie-France Champy, Bin Zhang, Valur Emilsson, Sudheer Doss, Anatole Ghazalpour, Steve Horvath, Thomas Drake, Aldons Lusis, and Eric Schadt. Variations in DNA elucidate molecular networks that cause disease. Nature, 452(7186):429-435, 2008.
More informationApproximate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery
Approimate Message Passing with Built-in Parameter Estimation for Sparse Signal Recovery arxiv:1606.00901v1 [cs.it] Jun 016 Shuai Huang, Trac D. Tran Department of Electrical and Computer Engineering Johns
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationNon-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources
th International Conference on Information Fusion Chicago, Illinois, USA, July -8, Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources Priyadip Ray Department of Electrical
More informationChapter 10. Semi-Supervised Learning
Chapter 10. Semi-Supervised Learning Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Outline
More informationOn the half-cauchy prior for a global scale parameter
On the half-cauchy prior for a global scale parameter Nicholas G. Polson University of Chicago arxiv:1104.4937v2 [stat.me] 25 Sep 2011 James G. Scott The University of Texas at Austin First draft: June
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationChoosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation
Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation COMPSTAT 2010 Revised version; August 13, 2010 Michael G.B. Blum 1 Laboratoire TIMC-IMAG, CNRS, UJF Grenoble
More informationOn Markov chain Monte Carlo methods for tall data
On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational
More informationMultivariate Bayes Wavelet Shrinkage and Applications
Journal of Applied Statistics Vol. 32, No. 5, 529 542, July 2005 Multivariate Bayes Wavelet Shrinkage and Applications GABRIEL HUERTA Department of Mathematics and Statistics, University of New Mexico
More informationDefault Priors and Efficient Posterior Computation in Bayesian Factor Analysis
Default Priors and Efficient Posterior Computation in Bayesian Factor Analysis Joyee Ghosh Institute of Statistics and Decision Sciences, Duke University Box 90251, Durham, NC 27708 joyee@stat.duke.edu
More informationRegularized Regression A Bayesian point of view
Regularized Regression A Bayesian point of view Vincent MICHEL Director : Gilles Celeux Supervisor : Bertrand Thirion Parietal Team, INRIA Saclay Ile-de-France LRI, Université Paris Sud CEA, DSV, I2BM,
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationGreedy Dictionary Selection for Sparse Representation
Greedy Dictionary Selection for Sparse Representation Volkan Cevher Rice University volkan@rice.edu Andreas Krause Caltech krausea@caltech.edu Abstract We discuss how to construct a dictionary by selecting
More information. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)
Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,
More informationBayesian spatial quantile regression
Brian J. Reich and Montserrat Fuentes North Carolina State University and David B. Dunson Duke University E-mail:reich@stat.ncsu.edu Tropospheric ozone Tropospheric ozone has been linked with several adverse
More informationScaling Neighbourhood Methods
Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)
More informationMotivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble
Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Zhilin Zhang and Ritwik Giri Motivation Sparse Signal Recovery is an interesting
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationNonparametric Bayesian Dictionary Learning for Machine Listening
Nonparametric Bayesian Dictionary Learning for Machine Listening Dawen Liang Electrical Engineering dl2771@columbia.edu 1 Introduction Machine listening, i.e., giving machines the ability to extract useful
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationUncertainty quantification and visualization for functional random variables
Uncertainty quantification and visualization for functional random variables MascotNum Workshop 2014 S. Nanty 1,3 C. Helbert 2 A. Marrel 1 N. Pérot 1 C. Prieur 3 1 CEA, DEN/DER/SESI/LSMR, F-13108, Saint-Paul-lez-Durance,
More informationDocument and Topic Models: plsa and LDA
Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis
More informationABC random forest for parameter estimation. Jean-Michel Marin
ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint
More informationSupplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control
Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model
More informationMixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data
Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data Cinzia Viroli 1 joint with E. Bonafede 1, S. Robin 2 & F. Picard 3 1 Department of Statistical Sciences, University
More informationarxiv: v3 [stat.ml] 3 Sep 2014
A Truncated EM Approach for Spike-and-Slab Sparse Coding Abdul-Saboor Sheikh 1, Jacquelyn A. Shelton 1 and Jörg Lücke 2 arxiv:1211.3589v3 [stat.ml] 3 Sep 2014 {sheikh, shelton}@tu-berlin.de, joerg.luecke@uni-oldenburg.de
More informationBayesian Clustering of Multi-Omics
Bayesian Clustering of Multi-Omics for Cardiovascular Diseases Nils Strelow 22./23.01.2019 Final Presentation Trends in Bioinformatics WS18/19 Recap Intermediate presentation Precision Medicine Multi-Omics
More informationHandling Sparsity via the Horseshoe
Handling Sparsity via the Carlos M. Carvalho Booth School of Business The University of Chicago Chicago, IL 60637 Nicholas G. Polson Booth School of Business The University of Chicago Chicago, IL 60637
More information