Bayesian learning of sparse factor loadings
1: Magnus Rattray, School of Computer Science, University of Manchester. Bayesian Research Kitchen, Ambleside, September 6th 2008.
2: Talk Outline
- Brief overview of popular sparsity priors
- Example application: sparse Bayesian factor analysis for network inference
- Theory for average-case performance with the mixture prior
- Theory and MCMC results for different data distributions
- Comparison with the L1 prior
3-7: Sparsity priors tend to be convenient rather than realistic, e.g.
- L1: p(w_i) = (λ/2) exp(-λ|w_i|)
- ARD: p(w_i) = N(w_i | 0, λ_i^{-1})
- Mixture: p(w_i) = (1 - C) δ(w_i) + C N(w_i | 0, λ^{-1})
Is it OK not to worry about whether these actually fit the data?
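The contrast between these priors is easy to illustrate numerically. The sketch below (plain NumPy; the values of λ and C and the Gamma hyper-prior on the ARD precisions are illustrative assumptions, not values from the talk) draws samples from each prior. Only the spike-and-slab mixture puts positive probability on exactly zero weights.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, C = 4.0, 0.3
n = 100_000

# L1 (Laplace) prior: p(w) = (lam/2) * exp(-lam * |w|)
# NumPy's Laplace uses scale b with density exp(-|x|/b)/(2b), so b = 1/lam
w_l1 = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

# ARD prior: each weight is Gaussian with its own precision lam_i
# (here the precisions are drawn from a hypothetical Gamma hyper-prior)
lam_i = rng.gamma(shape=2.0, scale=2.0, size=n)
w_ard = rng.normal(0.0, 1.0 / np.sqrt(lam_i))

# Spike-and-slab mixture: exact zeros with probability 1 - C
slab = rng.random(n) < C
w_mix = np.where(slab, rng.normal(0.0, 1.0 / np.sqrt(lam), size=n), 0.0)

# Fraction of exact zeros: ~0 for L1, ~1 - C for the mixture
print((w_l1 == 0).mean(), (w_mix == 0).mean())
```

The L1 prior concentrates mass near zero but almost surely produces no exact zeros; sparsity under it appears only at the posterior mode, whereas the mixture prior is sparse in the prior itself.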
8: Example application: factor analysis for network inference

Y ~ N(WZ + μ, Ψ)
- Y = [y_in]: log-expression of gene i in sample n
- Z = [z_jn]: log-concentration (or "activity") of TF j in sample n
- W = [w_ij]: factor loading, the effect of TF j on gene i

Model Z as a latent variable, since mRNA data may not capture TF protein level/activity, and TFs may be too weakly expressed. W has one row per gene and one column per TF, so for e.g. yeast it is large. Various methods exist for inferring a sparse W from Y (reviewed by Pournara and Wernisch, BMC Bioinformatics 2007).
9: The mixture prior leads to a tractable Gibbs sampler:

p(w_ij) = (1 - C_ij) δ(w_ij) + C_ij N(w_ij | 0, λ^{-1})

Hyper-parameters C_ij ∈ [0, 1] can be obtained from e.g. ChIP-chip data or DNA motifs (Sabatti and James, Bioinformatics 2006), or we can estimate (grouped) hyper-parameters by MCMC.
10-11: Gibbs sampler

Write w_ij = x_ij b_ij where x_ij ∈ {0, 1} and b_ij ~ N(0, λ^{-1}). Sample in turn:

x_j ~ p(x_j | X \ x_j, Z, Y)   (1)
B ~ p(B | X, Z, Y)             (2)
Z ~ p(Z | X, B, Y)             (3)

- Integrate out B before sampling X in (1)
- Steps (2) and (3) are more efficient when X is typically sparse
- Can also sample the hyper-parameters C_ij and λ if required
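A minimal single-factor version of such a sampler can be sketched as follows. This is an illustrative reconstruction, not the talk's implementation: the noise level psi, the block update of all indicators given z, and the toy-data settings are simplifying assumptions. As on the slide, the indicators x_i are sampled with the slab weights b_i integrated out.

```python
import numpy as np

def gibbs_sparse_factor(Y, C=0.1, lam=0.25, psi=1.0, n_iter=300, seed=0):
    """One-factor spike-and-slab Gibbs sampler for Y ~ N(w z^T, psi I),
    with w_i = x_i * b_i, x_i ~ Bern(C), b_i ~ N(0, 1/lam), z_n ~ N(0, 1).
    Rows are conditionally independent given z, so all x_i are updated
    jointly in one valid Gibbs step."""
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    z = rng.normal(size=M)
    for _ in range(n_iter):
        # Sample x_i | z, Y with b_i integrated out (Bayes-factor form)
        s = z @ z / psi + lam                  # posterior precision of b_i
        m = (Y @ z) / psi                      # b_i posterior mean is m / s
        log_bf = 0.5 * np.log(lam / s) + m**2 / (2 * s)
        odds = C / (1.0 - C) * np.exp(np.clip(log_bf, -30, 30))
        x = rng.random(N) < odds / (1.0 + odds)
        # Sample b_i | x_i, z, Y (a prior draw where x_i = 0)
        b = np.where(x, m / s + rng.normal(size=N) / np.sqrt(s),
                     rng.normal(0.0, 1.0 / np.sqrt(lam), size=N))
        w = x * b
        # Sample z_n | w, Y with prior z_n ~ N(0, 1)
        sz = w @ w / psi + 1.0
        z = (Y.T @ w) / psi / sz + rng.normal(size=M) / np.sqrt(sz)
    return w, z

# Toy data: 10 of 100 "genes" load on a single factor
rng = np.random.default_rng(1)
w_true = np.zeros(100)
w_true[:10] = 2.0
Y = np.outer(w_true, rng.normal(size=50)) + rng.normal(size=(100, 50))
w_hat, z_hat = gibbs_sparse_factor(Y)
```

Note the usual sign indeterminacy of factor models: (w, z) and (-w, -z) give the same likelihood, so only |w_hat| is compared against the true support.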
12-16: Theory for sparse Bayesian PCA

y_n ~ N(0, σ² I + w w^T)
p(w | C, λ) = Π_{i=1}^N [(1 - C) δ(w_i) + C N(w_i | 0, λ^{-1})]

We study average behaviour over datasets D = {y_1, y_2, ..., y_M} produced by a teacher distribution. The teacher is identical except for a different factorized parameter distribution. We consider two cases:

(1) Same form: p(w_i^t) = (1 - C_t) δ(w_i^t) + C_t N(w_i^t | 0, λ_t^{-1})
(2) Different form: p(w_i^t) = (1 - C_t) δ(w_i^t) + C_t δ(w_i^t - λ_t^{-1/2})
17-20: Compute the marginal likelihood ⟨log p(D | C, λ)⟩_D in the limit N → ∞ with α = M/N held constant, using the replica method. Also compute functions of the mean posterior parameter w*(D):

ρ(w*) = w* · w^t / (‖w*‖ ‖w^t‖)
L(w*) = ⟨log p(y | w*, C, λ)⟩_{y|w^t}

Good agreement with simulations in the most relevant case of small α (the so-called "large p, small n" regime, where the dimension greatly exceeds the number of samples).
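As a concrete check of the ρ definition, one can generate teacher data and measure the overlap of an estimate with w^t. The sketch below is an assumption-laden illustration: it uses the leading PCA eigenvector, suitably scaled, as a stand-in for the posterior mean w*(D), and picks N, α, C_t, λ_t loosely in the spirit of the talk's settings rather than reproducing them.

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha = 1000, 2.0            # dimension and sample ratio alpha = M/N
M = int(alpha * N)
C_t, lam_t = 0.2, N / 100       # teacher sparsity and slab precision

# Teacher vector drawn from the mixture distribution (case 1)
w_t = np.where(rng.random(N) < C_t,
               rng.normal(0.0, 1.0 / np.sqrt(lam_t), size=N), 0.0)

# Data y_n ~ N(0, I + w_t w_t^T), generated as y_n = z_n w_t + noise
Z = rng.normal(size=M)
Y = np.outer(Z, w_t) + rng.normal(size=(M, N))

# Stand-in estimate w*: leading principal component scaled to the
# teacher norm (the talk uses the posterior mean instead)
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
w_star = Vt[0] * np.linalg.norm(w_t)

# Overlap rho(w*) = w* . w^t / (||w*|| ||w^t||), sign-fixed
rho = abs(w_star @ w_t) / (np.linalg.norm(w_star) * np.linalg.norm(w_t))
print(round(float(rho), 3))
```

In this strong-signal regime the overlap is close to 1; shrinking α or C_t pushes it down towards the transition described on the next slides.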
21-22: Similar replica calculation to Uda and Kabashima (J. Phys. Soc. Japan 74, 2005):

Z(D) = p(D | C, λ) = ∫ dw p(w | C, λ) Π_{n=1}^M p(y_n | w)

(1/N) ⟨log Z(D)⟩_{D,w^t} = (1/N) lim_{n→0} ∂/∂n ⟨Z^n(D)⟩_{D,w^t}
  = α ⟨log p(y | w*(D), C, λ)⟩_{y,D,w^t} + entropic terms

The average case becomes typical for large N due to self-averaging.
23-25: Standard PCA result (C = 1). Learning exhibits phase transitions, e.g. (for α > 1):

ρ(w*) = θ(αT² - 1) √((αT² - 1)/(αT² + T))

where θ(x) is the step function and T = ‖w^t‖² = N C_t λ_t^{-1}. This is consistent with the result for Bayesian PCA with a spherical prior p(w) ∝ δ(‖w‖ - 1) (Reimann et al., J. Phys. A 1996). The only new feature is a 1st-order transition with increasing λ.
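The predicted overlap is easy to tabulate. The sketch below assumes the spiked-covariance form ρ² = (αT² - 1)/(αT² + T) above the retarded-learning transition αT² = 1 and zero below it; this is a reconstruction consistent with the standard PCA result of Reimann et al. (1996), and should be checked against the original slides before being relied on.

```python
import numpy as np

def rho_theory(alpha, T):
    """Overlap between estimate and teacher: zero below the
    retarded-learning transition alpha * T**2 = 1, and
    sqrt((alpha*T^2 - 1) / (alpha*T^2 + T)) above it."""
    above = alpha * T**2 > 1.0
    r2 = np.where(above, (alpha * T**2 - 1.0) / (alpha * T**2 + T), 0.0)
    return np.sqrt(np.clip(r2, 0.0, None))

# Sweep the sample ratio alpha at fixed signal strength T
T = 20.0
for alpha in (0.001, 0.0025, 0.01, 0.1, 1.0):
    print(alpha, round(float(rho_theory(alpha, T)), 3))
```

Below the transition the posterior retains no information about the teacher direction; above it the overlap grows continuously towards 1 as α increases, matching the θ-function structure of the formula.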
26-27: Standard PCA result (C = 1): λ_t = N, N = 5000, M = 25000 (α = 5).

[Figure: theory vs. MCMC for ρ(w*) as a function of λ/λ_t, averaged over 10 data sets.]

Here we will only consider learning away from phase transitions.
28-32: Results for sparse PCA (C < 1). We consider two types of data set distribution:

(1) Same form as prior: p(w_i^t) = (1 - C_t) δ(w_i^t) + C_t N(w_i^t | 0, λ_t^{-1})
(2) Different form: p(w_i^t) = (1 - C_t) δ(w_i^t) + C_t δ(w_i^t - λ_t^{-1/2})

Both give identical performance for standard PCA, and both give identical performance if the sparsity is known.
33-34: Results for data distribution (1): ρ(w*) and L(w*)

p(w_i^t) = (1 - C_t) δ(w_i^t) + C_t N(w_i^t | 0, λ_t^{-1}), with C_t = 0.2, λ = λ_t = N/100, M = 200, N = M/α.

[Figures: theory versus MCMC for ρ(w*) and L(w*) as functions of C, for several values of α (0.15, 0.1, 0.05, ...); shown for a single dataset and averaged over 10 datasets.]
35: Results for data distribution (1): ρ(w*) and L(w*), with C_t = 0.2, λ_t = 20, M = 200, N = 2000 (α = 0.1).

[Figures: ρ(w*) and L(w*) as functions of C and λ/λ_t.]
36: Results for data distribution (1): p(D | C, λ), with C_t = 0.2, λ_t = 20, M = 200, N = 2000 (α = 0.1).

[Figures: theory for log p(D | C, λ), and MCMC for p(C, λ | D) averaged over 20 data sets, as functions of C and λ/λ_t.]
37: Results for data distribution (2): ρ(w*)

p(w_i^t) = (1 - C_t) δ(w_i^t) + C_t δ(w_i^t - λ_t^{-1/2}), with C_t = 0.2, λ = λ_t = N/100, M = 200, N = M/α.

[Figures: delta-mixture versus Gaussian-mixture theory, and theory versus MCMC averaged over 10 datasets, for ρ(w*) as a function of C at α = 0.2, 0.15, 0.1, ...]
38: Results for data distribution (2): ρ(w*) and L(w*), with C_t = 0.2, λ_t = 10, M = 200, N = 1000 (α = 0.2).

[Figures: ρ(w*) and L(w*) as functions of C and λ/λ_t.]
39: Results for data distribution (2): p(D | C, λ), with C_t = 0.2, λ_t = 10, M = 200, N = 1000 (α = 0.2).

[Figures: theory for log p(D | C, λ), and MCMC for p(C, λ | D) averaged over 20 data sets, as functions of C and λ/λ_t.]
40-42: Other priors? Is this problem specific to the mixture prior? Consider the L1 prior (with an additional L2 term):

p(w_i) ∝ exp(-λ_2 w_i²/2 - λ_1 |w_i|)

together with data distribution (2): p(w_i^t) = (1 - C_t) δ(w_i^t) + C_t δ(w_i^t - λ_t^{-1/2}).
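The sparsifying effect of this combined L1+L2 prior is easiest to see at the mode: the proximal map of the negative log-prior is a soft-threshold followed by a shrink, so MAP estimates are exactly sparse even though the prior density has no point mass at zero. A short sketch (illustrative only; the talk studies the full posterior rather than MAP, and the λ_1, λ_2 values here are arbitrary):

```python
import numpy as np

def neg_log_prior(w, lam1, lam2):
    """Negative log of the unnormalised L1+L2 prior,
    p(w_i) proportional to exp(-lam2 * w_i**2 / 2 - lam1 * |w_i|)."""
    return lam2 * w**2 / 2.0 + lam1 * np.abs(w)

def prox(v, lam1, lam2, step=1.0):
    """Proximal map of the penalty: soft-threshold at step*lam1,
    then shrink by 1 / (1 + step*lam2). Small inputs map to exact zero."""
    return (np.sign(v) * np.maximum(np.abs(v) - step * lam1, 0.0)
            / (1.0 + step * lam2))

v = np.array([-2.0, -0.3, 0.0, 0.4, 3.0])
print(prox(v, lam1=0.5, lam2=1.0))
```

This is the elastic-net-style geometry: the L1 term creates the zeros, while the L2 term keeps the objective strongly convex and shrinks the surviving coefficients.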
43: L1 prior results: ρ(w*), with C_t = 0.2, λ_t = N/100, α = M/N.

[Figures: L1 prior theory (ρ(w*) as a function of λ) and mixture prior theory (ρ(w*) as a function of C), for α = 0.2, 0.15, 0.1, ...]
44-45: L1 prior results: p(D | λ_1, λ_2) versus L(w*) and ρ(w*), with C_t = 0.2, λ_t = N/100, α = 0.2.

[Figures: L(w*), ρ(w*) and p(D | λ_1, λ_2) as functions of λ_1 and λ_2/λ_t.]
46-53: Conclusions
- The mixture prior works as expected when well-matched to the data
- The marginal likelihood for the mixture prior can be misleading
- The marginal likelihood seems more effective for the L1 prior... although L1 didn't really perform well (preliminary)
- Future work should look at multiple factors
- Assessment of metrics using the full posterior
- Comparison with MAP and ML approaches
- And better priors!
More informationA new Hierarchical Bayes approach to ensemble-variational data assimilation
A new Hierarchical Bayes approach to ensemble-variational data assimilation Michael Tsyrulnikov and Alexander Rakitko HydroMetCenter of Russia College Park, 20 Oct 2014 Michael Tsyrulnikov and Alexander
More informationBayesian Quadrature: Model-based Approximate Integration. David Duvenaud University of Cambridge
Bayesian Quadrature: Model-based Approimate Integration David Duvenaud University of Cambridge The Quadrature Problem ˆ We want to estimate an integral Z = f ()p()d ˆ Most computational problems in inference
More informationMCMC and Gibbs Sampling. Kayhan Batmanghelich
MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction
More informationBayes methods for categorical data. April 25, 2017
Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationMotivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University
Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationProbabilistic Reasoning in Deep Learning
Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian
More informationLearning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling
Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Introduction. Basic Probability and Bayes Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 26/9/2011 (EPFL) Graphical Models 26/9/2011 1 / 28 Outline
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationExpectation maximization tutorial
Expectation maximization tutorial Octavian Ganea November 18, 2016 1/1 Today Expectation - maximization algorithm Topic modelling 2/1 ML & MAP Observed data: X = {x 1, x 2... x N } 3/1 ML & MAP Observed
More informationLecture 9: PGM Learning
13 Oct 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I Learning parameters in MRFs 1 Learning parameters in MRFs Inference and Learning Given parameters (of potentials) and
More informationHub Gene Selection Methods for the Reconstruction of Transcription Networks
for the Reconstruction of Transcription Networks José Miguel Hernández-Lobato (1) and Tjeerd. M. H. Dijkstra (2) (1) Computer Science Department, Universidad Autónoma de Madrid, Spain (2) Institute for
More informationNearest Neighbor Gaussian Processes for Large Spatial Data
Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns
More informationVariational Autoencoders
Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly
More informationDefault Priors and Efficient Posterior Computation in Bayesian Factor Analysis
Default Priors and Efficient Posterior Computation in Bayesian Factor Analysis Joyee Ghosh Institute of Statistics and Decision Sciences, Duke University Box 90251, Durham, NC 27708 joyee@stat.duke.edu
More informationMaking rating curves - the Bayesian approach
Making rating curves - the Bayesian approach Rating curves what is wanted? A best estimate of the relationship between stage and discharge at a given place in a river. The relationship should be on the
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationContrastive Divergence
Contrastive Divergence Training Products of Experts by Minimizing CD Hinton, 2002 Helmut Puhr Institute for Theoretical Computer Science TU Graz June 9, 2010 Contents 1 Theory 2 Argument 3 Contrastive
More informationQuantifying Fingerprint Evidence using Bayesian Alignment
Quantifying Fingerprint Evidence using Bayesian Alignment Peter Forbes Joint work with Steffen Lauritzen and Jesper Møller Department of Statistics University of Oxford UCL CSML Lunch Talk 14 February
More informationTopic Models. Charles Elkan November 20, 2008
Topic Models Charles Elan elan@cs.ucsd.edu November 20, 2008 Suppose that we have a collection of documents, and we want to find an organization for these, i.e. we want to do unsupervised learning. One
More informationGaussian Processes for Machine Learning
Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of
More informationBayesian Analysis for Natural Language Processing Lecture 2
Bayesian Analysis for Natural Language Processing Lecture 2 Shay Cohen February 4, 2013 Administrativia The class has a mailing list: coms-e6998-11@cs.columbia.edu Need two volunteers for leading a discussion
More informationWeek 3: The EM algorithm
Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent
More information