Bayesian learning of sparse factor loadings

1 Magnus Rattray, School of Computer Science, University of Manchester. Bayesian Research Kitchen, Ambleside, September 6th 2008

2 Talk Outline
Brief overview of popular sparsity priors
Example application: sparse Bayesian factor analysis for network inference
Theory for average-case performance with the mixture prior
Theory and MCMC results for different data distributions
Comparison with the L1 prior

3-7 Sparsity priors tend to be convenient rather than realistic, e.g.
L1: $p(w_i) = \frac{\lambda}{2} e^{-\lambda |w_i|}$
ARD: $p(w_i) = \mathcal{N}(w_i \mid 0, \lambda_i^{-1})$
Mixture: $p(w_i) = (1-C)\,\delta(w_i) + C\,\mathcal{N}(w_i \mid 0, \lambda^{-1})$
Is it OK not to worry about whether these actually fit the data?
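
To make the contrast concrete, here is a small illustrative numpy sketch (mine, not from the talk) that draws weights from each of the three priors; the values of λ, the per-weight ARD precisions and C are arbitrary. Only the mixture prior produces exact zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, C = 10_000, 4.0, 0.2                # illustrative values, not from the talk

# L1 (Laplace) prior: p(w) = (lam/2) * exp(-lam * |w|)
w_l1 = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

# ARD prior: each weight has its own precision lam_i (here drawn from a Gamma, for illustration)
lam_i = rng.gamma(shape=2.0, scale=2.0, size=n)
w_ard = rng.normal(0.0, 1.0 / np.sqrt(lam_i))

# Spike-and-slab mixture: exact zeros with probability 1 - C
on = rng.random(n) < C
w_mix = np.where(on, rng.normal(0.0, 1.0 / np.sqrt(lam), size=n), 0.0)

print("fraction exactly zero:", (w_l1 == 0).mean(), (w_ard == 0).mean(), (w_mix == 0).mean())
```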

8 Example application: factor analysis for network inference
$Y \sim \mathcal{N}(WZ + \mu, \Psi)$
$Y = [y_{in}]$: log-expression of gene $i$ in sample $n$
$Z = [z_{jn}]$: log-concentration (or "activity") of TF $j$ in sample $n$
$W = [w_{ij}]$: factor loading giving the effect of TF $j$ on gene $i$
Model Z as a latent variable, since mRNA data may not capture TF protein level/activity, or TFs may be too weakly expressed
For yeast, for example, W is large (thousands of genes by hundreds of TFs)
Various methods exist for inferring a sparse W from Y (reviewed by Pournara and Wernisch, BMC Bioinformatics 2007)

9 Factor analysis for network inference
$Y \sim \mathcal{N}(WZ + \mu, \Psi)$
The mixture prior leads to a tractable Gibbs sampler: $p(w_{ij}) = (1 - C_{ij})\,\delta(w_{ij}) + C_{ij}\,\mathcal{N}(w_{ij} \mid 0, \lambda^{-1})$
Hyper-parameters $C_{ij} \in [0, 1]$ can be obtained from e.g. ChIP-chip data or DNA motifs (Sabatti and James, Bioinformatics 2006)
Or we can estimate (grouped) hyper-parameters by MCMC
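
As an illustration of this generative model, the sketch below (my simplification: zero mean, isotropic noise, made-up sizes, hand-set inclusion probabilities $C_{ij}$) draws a sparse loading matrix with entry-specific inclusion probabilities and simulates expression data from it.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, M = 100, 5, 40           # genes, transcription factors, samples (illustrative)
lam, sigma2 = 1.0, 0.25        # slab precision and noise variance (illustrative)

# Entry-specific inclusion probabilities, e.g. higher where ChIP-chip/motif evidence exists
C = np.full((p, K), 0.05)
C[rng.random((p, K)) < 0.1] = 0.9        # pretend 10% of entries have strong prior evidence

X = rng.random((p, K)) < C                              # which TF-gene links are "on"
W = X * rng.normal(0.0, 1.0 / np.sqrt(lam), (p, K))     # spike-and-slab loadings
Z = rng.normal(size=(K, M))                             # latent TF activities
Y = W @ Z + rng.normal(0.0, np.sqrt(sigma2), (p, M))    # observed log-expression

print("active loadings:", int(X.sum()), "of", p * K)
```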

10-11 Gibbs sampler
Write $w_{ij} = x_{ij} b_{ij}$ where $x_{ij} \in \{0, 1\}$ and $b_{ij} \sim \mathcal{N}(0, \lambda^{-1})$, then cycle:
(1) $x_j \sim p(x_j \mid X \setminus x_j, Z, Y)$
(2) $B \sim p(B \mid X, Z, Y)$
(3) $Z \sim p(Z \mid X, B, Y)$
Integrate out B before sampling X in step (1)
Steps (2) and (3) are more efficient when X is typically sparse
Can also sample the hyper-parameters $C_{ij}$ and $\lambda$ if required
A minimal sketch of one Gibbs sweep is given below.
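
Below is a minimal, self-contained sketch of one way such a sampler can be implemented (not the authors' code): isotropic noise, zero mean, fixed hyper-parameters C and λ, and a per-element update of $(x_{ij}, b_{ij})$ in which $b_{ij}$ is integrated out when deciding $x_{ij}$. All sizes and values are made up, and factor rotation/permutation is not resolved.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny synthetic problem (illustrative sizes and values, not from the talk)
p, K, M = 30, 3, 50                  # genes, TFs, samples
sigma2, lam, C = 0.5, 1.0, 0.2       # noise variance, slab precision, inclusion probability
X_true = rng.random((p, K)) < C
W_true = X_true * rng.normal(0.0, 1.0 / np.sqrt(lam), (p, K))
Y = W_true @ rng.normal(size=(K, M)) + rng.normal(0.0, np.sqrt(sigma2), (p, M))

# State: w_ij = x_ij * b_ij, as on the slide
X = np.zeros((p, K), dtype=bool)
B = rng.normal(0.0, 1.0 / np.sqrt(lam), (p, K))
Z = rng.normal(size=(K, M))

def sample_W(X, B, Z):
    """Per-element update of (x_ij, b_ij), with b_ij integrated out when sampling x_ij."""
    W = X * B
    for i in range(p):
        for j in range(K):
            r = Y[i] - W[i] @ Z + W[i, j] * Z[j]      # residual of gene i, excluding TF j
            s = Z[j] @ Z[j] / sigma2
            q = Z[j] @ r / sigma2
            prec = lam + s                            # posterior precision of b_ij under the slab
            log_bf = 0.5 * np.log(lam / prec) + 0.5 * q ** 2 / prec   # slab vs spike Bayes factor
            p_on = C / (C + (1.0 - C) * np.exp(-log_bf))
            X[i, j] = rng.random() < p_on
            if X[i, j]:
                B[i, j] = rng.normal(q / prec, 1.0 / np.sqrt(prec))
            else:
                B[i, j] = rng.normal(0.0, 1.0 / np.sqrt(lam))         # prior draw; does not affect Y
            W[i, j] = X[i, j] * B[i, j]
    return W

def sample_Z(W):
    """Columns of Z are conditionally independent Gaussians (step 3 on the slide)."""
    cov = np.linalg.inv(np.eye(K) + W.T @ W / sigma2)
    mean = cov @ W.T @ Y / sigma2
    return mean + np.linalg.cholesky(cov) @ rng.normal(size=(K, M))

for sweep in range(200):
    W = sample_W(X, B, Z)
    Z = sample_Z(W)

print("true active loadings:", int(X_true.sum()), " sampled active loadings:", int(X.sum()))
```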

12-16 Sparse Bayesian PCA
$y_n \sim \mathcal{N}(0, \sigma^2 I + w w^T)$
$p(w \mid C, \lambda) = \prod_{i=1}^{N} \left[ (1-C)\,\delta(w_i) + C\,\mathcal{N}(w_i \mid 0, \lambda^{-1}) \right]$
We study the average behaviour over datasets $D = \{y_1, y_2, \ldots, y_M\}$ produced by a teacher distribution
The teacher is identical except for a different factorized parameter distribution. We consider two cases:
(1) Same form: $p(w^t_i) = (1 - C_t)\,\delta(w^t_i) + C_t\,\mathcal{N}(w^t_i \mid 0, \lambda_t^{-1})$
(2) Different form: $p(w^t_i) = (1 - C_t)\,\delta(w^t_i) + C_t\,\delta(w^t_i - \lambda_t^{-1/2})$
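
For concreteness, a short numpy sketch (illustrative, not from the talk) that draws a teacher vector from each of the two cases and simulates a dataset from the rank-one covariance above; here $\sigma^2 = 1$ and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N, alpha, sigma2 = 1000, 0.1, 1.0      # dimension, ratio alpha = M/N, noise variance (illustrative)
M = int(alpha * N)
C_t, lam_t = 0.2, N / 100.0            # teacher sparsity and precision (values used later in the talk)

nonzero = rng.random(N) < C_t
w_t_gaussian = np.where(nonzero, rng.normal(0.0, 1.0 / np.sqrt(lam_t), N), 0.0)   # case (1)
w_t_delta = np.where(nonzero, lam_t ** -0.5, 0.0)                                 # case (2)

def sample_dataset(w_t):
    """Draw M samples from N(0, sigma^2 I + w_t w_t^T) without forming the N x N covariance."""
    return np.sqrt(sigma2) * rng.normal(size=(M, N)) + rng.normal(size=(M, 1)) * w_t

D1, D2 = sample_dataset(w_t_gaussian), sample_dataset(w_t_delta)
print(D1.shape, D2.shape, int(nonzero.sum()))
```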

17-20 Sparse Bayesian PCA
Compute the marginal likelihood $\langle \log p(D \mid C, \lambda) \rangle_D$ in the limit $N \to \infty$ with $\alpha = M/N$ held constant, using the replica method
Compute functions of the mean posterior parameter $w^*(D)$:
$\rho(w^*) = \frac{w^* \cdot w^t}{\|w^*\|\,\|w^t\|}$,  $L(w^*) = \left\langle \log p(y \mid w^*, C, \lambda) \right\rangle_{y \mid w^t}$
Good agreement with simulations in the most relevant case of small $\alpha$, i.e. many dimensions and few samples (the so-called "large p, small n" regime)
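
A possible way to compute these two quantities for a given estimate $w^*$ is sketched below (assumptions: isotropic noise, and $L(w^*)$ estimated on held-out samples rather than via the analytic average over the teacher).

```python
import numpy as np

def rho(w_star, w_t):
    """Normalized overlap between an estimate and the teacher loading vector."""
    return w_star @ w_t / (np.linalg.norm(w_star) * np.linalg.norm(w_t))

def avg_loglik(w_star, Y, sigma2=1.0):
    """Average per-sample log-likelihood of data under N(0, sigma^2 I + w* w*^T).
    Uses the matrix determinant lemma and Sherman-Morrison for the rank-one update."""
    N = w_star.size
    a = w_star @ w_star
    logdet = N * np.log(2 * np.pi * sigma2) + np.log1p(a / sigma2)
    quad = (Y ** 2).sum(axis=1) / sigma2 - (Y @ w_star) ** 2 / (sigma2 * (sigma2 + a))
    return float(np.mean(-0.5 * (logdet + quad)))

# Tiny demo with a made-up teacher, estimate and held-out data
rng = np.random.default_rng(4)
N, M_test, sigma2 = 500, 2000, 1.0
w_t = (rng.random(N) < 0.2) * rng.normal(0.0, 1.0, N)
w_star = w_t + 0.3 * rng.normal(size=N)                 # pretend posterior-mean estimate
Y_test = np.sqrt(sigma2) * rng.normal(size=(M_test, N)) + rng.normal(size=(M_test, 1)) * w_t
print(f"rho = {rho(w_star, w_t):.3f}, L = {avg_loglik(w_star, Y_test, sigma2):.1f}")
```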

21-22 Sparse Bayesian PCA
Similar replica calculation to Uda and Kabashima (J. Phys. Soc. Japan 74, 2005)
$Z(D) = p(D \mid C, \lambda) = \int dw\, p(w \mid C, \lambda) \prod_{n=1}^{M} p(y_n \mid w)$
$\frac{1}{N} \langle \log Z(D) \rangle_{D, w^t} = \frac{1}{N} \lim_{n \to 0} \frac{\partial}{\partial n} \langle Z^n(D) \rangle_{D, w^t} = \alpha \left\langle \log p(y \mid w^*(D), C, \lambda) \right\rangle_{y, D, w^t} + \text{entropic terms}$
The average case becomes typical for large N due to self-averaging
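
The first equality uses the standard replica identity (a general fact, not specific to this calculation):

```latex
\[
  \langle \log Z \rangle
  = \lim_{n \to 0} \frac{1}{n} \log \langle Z^n \rangle
  = \lim_{n \to 0} \frac{\partial}{\partial n} \langle Z^n \rangle ,
  \qquad \text{since} \quad
  \langle Z^n \rangle = \langle e^{\,n \log Z} \rangle = 1 + n \langle \log Z \rangle + O(n^2).
\]
```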

23-25 Standard PCA result (C = 1)
Learning exhibits phase transitions, e.g. (for $\alpha > 1$):
$\rho(w^*) = \theta(\alpha - T^{-2})\, \theta\!\left(\alpha - \frac{\lambda}{NT}\right) \sqrt{\frac{\alpha - T^{-2}}{\alpha + T^{-1}}}$
where $\theta(x)$ is the step function and $T = \|w^t\|^2 = N C_t \lambda_t^{-1}$
Consistent with the result for Bayesian PCA with a spherical prior $p(w) \propto \delta(\|w\| - 1)$ (Reimann et al., J. Phys. A, 1996)
The only new feature is a first-order transition with increasing $\lambda$

26-27 Standard PCA result (C = 1)
$\lambda_t = N$, $N = 5000$, $M = 25000$ ($\alpha = 5$)
[Figure: $\rho(w^*)$ as a function of $\lambda/\lambda_t$; theory versus MCMC averaged over 10 data sets]
Here we will only consider learning away from phase transitions

28-32 Sparse PCA (C < 1)
We consider two types of data set distribution:
(1) Same form as the prior: $p(w^t_i) = (1 - C_t)\,\delta(w^t_i) + C_t\,\mathcal{N}(w^t_i \mid 0, \lambda_t^{-1})$
(2) Different form: $p(w^t_i) = (1 - C_t)\,\delta(w^t_i) + C_t\,\delta(w^t_i - \lambda_t^{-1/2})$
Both give identical performance for standard PCA
Both give identical performance if the sparsity is known

33 Data distribution (1): $\rho(w^*)$ and $L(w^*)$
$p(w^t_i) = (1 - C_t)\,\delta(w^t_i) + C_t\,\mathcal{N}(w^t_i \mid 0, \lambda_t^{-1})$
$C_t = 0.2$, $\lambda = \lambda_t = N/100$, $M = 200$, $N = M/\alpha$
[Figures: $\rho(w^*)$ and $L(w^*)$ as functions of C; theory versus MCMC results for a single dataset, for $\alpha = 0.15, 0.1, 0.05, \ldots$]

34 Data distribution (1): $\rho(w^*)$ and $L(w^*)$
$p(w^t_i) = (1 - C_t)\,\delta(w^t_i) + C_t\,\mathcal{N}(w^t_i \mid 0, \lambda_t^{-1})$
$C_t = 0.2$, $\lambda = \lambda_t = N/100$, $M = 200$, $N = M/\alpha$
[Figures: $\rho(w^*)$ and $L(w^*)$ as functions of C; theory versus MCMC averaged over 10 datasets, for $\alpha = 0.15, 0.1, 0.05, \ldots$]

35 Data distribution (1): $\rho(w^*)$ and $L(w^*)$
$C_t = 0.2$, $\lambda_t = 20$, $M = 200$, $N = 2000$ ($\alpha = 0.1$)
[Figures: $\rho(w^*)$ and $L(w^*)$ over the $(C, \lambda/\lambda_t)$ plane]

36 Data distribution (1): $p(D \mid C, \lambda)$
$C_t = 0.2$, $\lambda_t = 20$, $M = 200$, $N = 2000$ ($\alpha = 0.1$)
[Figures: theory for $\log p(D \mid C, \lambda)$, and MCMC for $p(C, \lambda \mid D)$ averaged over 20 data sets, over the $(C, \lambda/\lambda_t)$ plane]

37 Data distribution (2): $\rho(w^*)$
$p(w^t_i) = (1 - C_t)\,\delta(w^t_i) + C_t\,\delta(w^t_i - \lambda_t^{-1/2})$
$C_t = 0.2$, $\lambda = \lambda_t = N/100$, $M = 200$, $N = M/\alpha$
[Figures: $\rho(w^*)$ as a function of C; delta mixture versus Gaussian mixture theory, and theory versus MCMC averaged over 10 datasets, for $\alpha = 0.2, 0.15, 0.1, \ldots$]

38 Data distribution (2): $\rho(w^*)$ and $L(w^*)$
$C_t = 0.2$, $\lambda_t = 10$, $M = 200$, $N = 1000$ ($\alpha = 0.2$)
[Figures: $\rho(w^*)$ and $L(w^*)$ over the $(C, \lambda/\lambda_t)$ plane]

39 Data distribution (2): $p(D \mid C, \lambda)$
$C_t = 0.2$, $\lambda_t = 10$, $M = 200$, $N = 1000$ ($\alpha = 0.2$)
[Figures: theory for $\log p(D \mid C, \lambda)$, and MCMC for $p(C, \lambda \mid D)$ averaged over 20 data sets, over the $(C, \lambda/\lambda_t)$ plane]

40-42 Other priors?
Is this problem specific to the mixture prior?
Consider the L1 prior (with an additional L2 term): $p(w_i) \propto e^{-\frac{\lambda_2 w_i^2}{2} - \lambda_1 |w_i|}$
Data distribution (2): $p(w^t_i) = (1 - C_t)\,\delta(w^t_i) + C_t\,\delta(w^t_i - \lambda_t^{-1/2})$
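
There is no exact-zero conditional under this prior, so one simple way to obtain posterior samples of w (an illustrative sketch, not the method used in the talk) is random-walk Metropolis on the sparse-PCA log-posterior, using the rank-one determinant and inversion identities for the likelihood; the step size and run length are untuned.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, sigma2 = 200, 40, 1.0            # illustrative sizes
lam1, lam2 = 3.0, 1.0                  # L1 and L2 prior strengths (illustrative)

# Synthetic data from a sparse teacher (delta mixture, as in data distribution (2))
C_t, lam_t = 0.2, N / 100.0
w_t = (rng.random(N) < C_t) * lam_t ** -0.5
Y = np.sqrt(sigma2) * rng.normal(size=(M, N)) + rng.normal(size=(M, 1)) * w_t

def log_post(w):
    """log p(Y | w) + log p(w) up to constants, for y ~ N(0, sigma^2 I + w w^T)."""
    a = w @ w
    logdet = M * np.log1p(a / sigma2)
    quad = ((Y ** 2).sum() - ((Y @ w) ** 2).sum() / (sigma2 + a)) / sigma2
    log_prior = -0.5 * lam2 * (w ** 2).sum() - lam1 * np.abs(w).sum()
    return -0.5 * (logdet + quad) + log_prior

# Random-walk Metropolis
w = np.zeros(N)
lp, step, accepted = log_post(w), 0.05, 0
for t in range(5000):
    w_new = w + step * rng.normal(size=N)
    lp_new = log_post(w_new)
    if np.log(rng.random()) < lp_new - lp:
        w, lp, accepted = w_new, lp_new, accepted + 1

# Sign of w is not identified by the likelihood, so report the absolute overlap
overlap = abs(w @ w_t) / (np.linalg.norm(w) * np.linalg.norm(w_t) + 1e-12)
print(f"acceptance rate {accepted / 5000:.2f}, overlap with teacher {overlap:.2f}")
```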

43 L1 prior results: $\rho(w^*)$
$C_t = 0.2$, $\lambda_t = N/100$, $\alpha = M/N$
[Figures: L1 prior theory, $\rho(w^*)$ as a function of $\lambda$, and mixture prior theory, $\rho(w^*)$ as a function of C, for $\alpha = 0.2, 0.15, 0.1, \ldots$]

44 L1 prior results: $p(D \mid \lambda_1, \lambda_2)$ versus $L(w^*)$ and $\rho(w^*)$
$C_t = 0.2$, $\lambda_t = N/100$, $\alpha = 0.2$
[Figures: $L(w^*)$ and $p(D \mid \lambda_1, \lambda_2)$ over the $(\lambda_1, \lambda_2/\lambda_t)$ plane]

45 L1 prior results: $p(D \mid \lambda_1, \lambda_2)$ versus $L(w^*)$ and $\rho(w^*)$
$C_t = 0.2$, $\lambda_t = N/100$, $\alpha = 0.2$
[Figures: $\rho(w^*)$ and $p(D \mid \lambda_1, \lambda_2)$ over the $(\lambda_1, \lambda_2/\lambda_t)$ plane]

46-53 Mixture prior works as expected when well-matched to the data
Marginal likelihood for the mixture prior can be misleading
Marginal likelihood seems more effective for the L1 prior
... although the L1 prior didn't really perform well (preliminary)
Future work should look at multiple factors
Assessment of metrics using the full posterior
Comparison with MAP and ML approaches
And better priors!
