Motif representation using position weight matrix


1 Motif representation using position weight matrix

Xiaohui Xie, University of California, Irvine

2 Position weight matrix

Position weight matrix representation of a motif with width $w$:

$$\theta = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} \\ \theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} \\ \vdots & \vdots & \vdots & \vdots \\ \theta_{w1} & \theta_{w2} & \theta_{w3} & \theta_{w4} \end{pmatrix} \qquad (1)$$

where each row represents one position of the motif and is normalized:

$$\sum_{j=1}^{4} \theta_{ij} = 1 \qquad (2)$$

for all $i = 1, 2, \dots, w$.
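As a concrete illustration, a width-$w$ weight matrix can be stored as a $w \times 4$ array whose rows sum to one. The sketch below (Python/NumPy, with purely illustrative counts and a hypothetical variable name `counts`, not taken from the slides) builds such a matrix from raw letter counts:

```python
import numpy as np

# Hypothetical counts of A, C, G, T at each of w = 3 motif positions.
counts = np.array([[8, 1, 1, 0],
                   [0, 9, 0, 1],
                   [2, 0, 7, 1]], dtype=float)

# Normalize each row so that sum_j theta[i, j] = 1, as in equation (2).
theta = counts / counts.sum(axis=1, keepdims=True)
print(theta)
print(theta.sum(axis=1))   # every entry should be 1.0
```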

3 Likelihood

Given the position weight matrix $\theta$, the probability of generating a sequence $S = (S_1, S_2, \dots, S_w)$ from $\theta$ is

$$P(S \mid \theta) = \prod_{i=1}^{w} P(S_i \mid \theta_i) \qquad (3)$$

$$= \prod_{i=1}^{w} \theta_{i,S_i} \qquad (4)$$

For convenience, we have converted $S$ from a string over $\{A, C, G, T\}$ to a string over $\{1, 2, 3, 4\}$.
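A minimal sketch of equation (4), assuming letters are encoded 0 through 3 (0-based, rather than the 1-based indexing used on the slide) and reusing illustrative $\theta$ values; the function name `sequence_likelihood` is hypothetical:

```python
import numpy as np

def sequence_likelihood(theta, seq):
    """P(S | theta) = prod_i theta[i, S_i] for a length-w sequence."""
    positions = np.arange(len(seq))
    return np.prod(theta[positions, seq])

# Example: w = 3, sequence "ACG" encoded as 0, 1, 2.
theta = np.array([[0.8, 0.1, 0.1, 0.0],
                  [0.0, 0.9, 0.0, 0.1],
                  [0.2, 0.0, 0.7, 0.1]])
seq = np.array([0, 1, 2])
print(sequence_likelihood(theta, seq))   # 0.8 * 0.9 * 0.7 = 0.504
```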

4 Likelihood

Suppose we observe not just one, but a set of sequences $S_1, S_2, \dots, S_n$, each of which contains exactly $w$ letters. Assume each of them is generated independently from the model $\theta$. Then the likelihood of observing these $n$ sequences is

$$P(S_1, S_2, \dots, S_n \mid \theta) = \prod_{k=1}^{n} P(S_k \mid \theta) \qquad (5)$$

$$= \prod_{k=1}^{n} \prod_{i=1}^{w} \theta_{i,S_{ki}} = \prod_{i=1}^{w} \prod_{j=1}^{4} \theta_{ij}^{c_{ij}} \qquad (6)$$

where $c_{ij}$ is the number of letter $j$ at position $i$ (note that $\sum_{j=1}^{4} c_{ij} = n$ for all $i$).

5 Parameter estimation

Now suppose we do not know $\theta$. How can we estimate it from the observed sequence data $S_1, S_2, \dots, S_n$?

One solution: calculate the likelihood of observing the provided $n$ sequences for different values of $\theta$,

$$L(\theta) = P(S_1, S_2, \dots, S_n \mid \theta) = \prod_{k=1}^{n} \prod_{i=1}^{w} \theta_{i,S_{ki}} \qquad (7)$$

and pick the value with the largest likelihood, that is, find the $\theta$ that achieves

$$\max_\theta P(S_1, S_2, \dots, S_n \mid \theta) \qquad (8)$$

6 Maximum likelihood estimation

The maximum likelihood estimate of $\theta$ is

$$\hat{\theta}_{ML} = \arg\max_\theta \log L(\theta) = \arg\max_\theta \sum_{i=1}^{w} \sum_{j=1}^{4} c_{ij} \log \theta_{ij} \qquad (9)$$

subject to

$$\sum_{j=1}^{4} \theta_{ij} = 1, \quad i = 1, \dots, w.$$

7 Optimization with equality constraints

Construct a Lagrangian function taking the equality constraints into account:

$$g(\theta) = \log L(\theta) + \sum_{i=1}^{w} \lambda_i \Big(1 - \sum_{j=1}^{4} \theta_{ij}\Big) \qquad (10)$$

Solve the unconstrained optimization problem

$$\hat{\theta} = \arg\max_\theta g(\theta) = \arg\max_\theta \left[ \sum_{i=1}^{w} \sum_{j=1}^{4} c_{ij} \log \theta_{ij} + \sum_{i=1}^{w} \lambda_i \Big(1 - \sum_{j=1}^{4} \theta_{ij}\Big) \right] \qquad (11)$$

8 Optimization with equality constraints

Take the derivatives of $g(\theta)$ with respect to $\theta_{ij}$ and the Lagrange multipliers $\lambda_i$ and set them to zero:

$$\frac{\partial g(\theta)}{\partial \theta_{ij}} = 0 \qquad (12)$$

$$\frac{\partial g(\theta)}{\partial \lambda_i} = 0 \qquad (13)$$

which leads to

$$\hat{\theta}_{ij} = \frac{c_{ij}}{n} \qquad (14)$$

which is simply the frequency of each letter at each position ($c_{ij}$ is the number of letter $j$ at position $i$).
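The closed-form solution (14) is just the per-position letter frequency. A minimal sketch, assuming the $n$ sequences are stored as an $n \times w$ integer array with letters encoded 0 through 3 (the function name `ml_estimate` and the example data are illustrative):

```python
import numpy as np

def ml_estimate(sequences, alphabet_size=4):
    """theta_hat[i, j] = c_ij / n, the frequency of letter j at position i (equation (14))."""
    n, w = sequences.shape
    counts = np.zeros((w, alphabet_size))
    for i in range(w):
        counts[i] = np.bincount(sequences[:, i], minlength=alphabet_size)
    return counts / n

# Example: n = 4 sequences of width w = 3.
S = np.array([[0, 1, 2],
              [0, 1, 2],
              [0, 3, 2],
              [2, 1, 2]])
print(ml_estimate(S))
```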

9 Bayes Theorem

$$P(\theta \mid S) = \frac{P(S \mid \theta)\, P(\theta)}{P(S)} \qquad (15)$$

Each term in Bayes' theorem has a conventional name:

1. $P(S \mid \theta)$: the conditional probability of $S$ given $\theta$, also called the likelihood.
2. $P(\theta)$: the prior probability or marginal probability of $\theta$.
3. $P(\theta \mid S)$: the conditional probability of $\theta$ given $S$, also called the posterior probability of $\theta$.
4. $P(S)$: the marginal probability of $S$, which acts as a normalizing constant.

10 Maximum a posteriori (MAP) estimation

The MAP (or posterior mode) estimate of $\theta$ is

$$\hat{\theta}_{MAP}(S) = \arg\max_\theta P(\theta \mid S_1, S_2, \dots, S_n) \qquad (16)$$

$$= \arg\max_\theta \left[ \log L(\theta) + \log P(\theta) \right] \qquad (17)$$

Assume $P(\theta) = \prod_{i=1}^{w} P(\theta_i)$ (independence of $\theta_i$ at different positions $i$). Model $P(\theta_i)$ with a Dirichlet distribution:

$$(\theta_{i1}, \theta_{i2}, \theta_{i3}, \theta_{i4}) \sim \mathrm{Dir}(\alpha_1, \alpha_2, \alpha_3, \alpha_4). \qquad (18)$$

11 Dirichlet Distribution

Probability density function of the Dirichlet distribution $\mathrm{Dir}(\alpha)$ of order $K \ge 2$:

$$p(x_1, \dots, x_K; \alpha_1, \dots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \qquad (19)$$

for all $x_1, \dots, x_K > 0$ with $\sum_{i=1}^{K} x_i = 1$. The density is zero outside this open $(K-1)$-dimensional simplex. $\alpha = (\alpha_1, \dots, \alpha_K)$ are parameters with $\alpha_i > 0$ for all $i$. The normalizing constant $B(\alpha)$ is the multinomial beta function:

$$B(\alpha) = \frac{\prod_{i=1}^{K} \Gamma(\alpha_i)}{\Gamma\big(\sum_{i=1}^{K} \alpha_i\big)} \qquad (20)$$

12 Gamma function

The Gamma function for positive real $z$:

$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t} \, dt \qquad (21)$$

It satisfies the recurrence

$$\Gamma(z + 1) = z\,\Gamma(z) \qquad (22)$$

and if $n$ is a positive integer,

$$\Gamma(n + 1) = n! \qquad (23)$$
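As a quick numerical check of (23), using only Python's standard library (the loop range is illustrative):

```python
import math

for n in range(1, 6):
    # Gamma(n + 1) equals n! for positive integers n.
    print(n, math.gamma(n + 1), math.factorial(n))
```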

13 Properties of Dirichlet distribution

Dirichlet distribution:

$$p(x_1, \dots, x_K; \alpha) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} x_i^{\alpha_i - 1} \qquad (24)$$

Define $\alpha_0 = \sum_{i=1}^{K} \alpha_i$. Expectation:

$$E[X_i] = \frac{\alpha_i}{\alpha_0} \qquad (25)$$

Variance:

$$\mathrm{Var}[X_i] = \frac{\alpha_i (\alpha_0 - \alpha_i)}{\alpha_0^2 (\alpha_0 + 1)} \qquad (26)$$

Covariance (for $i \neq j$):

$$\mathrm{Cov}[X_i, X_j] = \frac{-\alpha_i \alpha_j}{\alpha_0^2 (\alpha_0 + 1)} \qquad (27)$$
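A small sketch (NumPy, with illustrative $\alpha$ values and sample size) comparing the analytic mean and variance in (25) and (26) with Monte Carlo estimates:

```python
import numpy as np

alpha = np.array([2.0, 1.0, 1.0, 0.5])   # illustrative parameters
alpha0 = alpha.sum()

# Draw many samples from Dir(alpha) and compare empirical and analytic moments.
samples = np.random.default_rng(0).dirichlet(alpha, size=200_000)

print("empirical mean:", samples.mean(axis=0))
print("analytic mean :", alpha / alpha0)
print("empirical var :", samples.var(axis=0))
print("analytic var  :", alpha * (alpha0 - alpha) / (alpha0**2 * (alpha0 + 1)))
```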

14 Posterior Distribution

Conditional probability:

$$P(S_1, S_2, \dots, S_n \mid \theta) = \prod_{i=1}^{w} \prod_{j=1}^{4} \theta_{ij}^{c_{ij}}$$

Prior probability:

$$p(\theta_{i1}, \dots, \theta_{i4}; \alpha) = \frac{1}{B(\alpha)} \prod_{j=1}^{4} \theta_{ij}^{\alpha_j - 1}$$

Posterior probability:

$$P(\theta_i \mid S_1, \dots, S_n) = \mathrm{Dir}(c_{i1} + \alpha_1,\, c_{i2} + \alpha_2,\, c_{i3} + \alpha_3,\, c_{i4} + \alpha_4)$$

Maximum a posteriori estimate:

$$\theta^{MAP}_{ij} = \frac{c_{ij} + \alpha_j - 1}{n + \alpha_0 - 4} \qquad (28)$$

where $\alpha_0 \equiv \sum_j \alpha_j$.
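The MAP estimate (28) only changes the ML estimate by adding pseudocounts. A sketch reusing the same assumed 0-to-3 encoding and counting loop as the ML example (function name and $\alpha$ values are illustrative; $\alpha_j \ge 1$ keeps the smoothed counts non-negative):

```python
import numpy as np

def map_estimate(sequences, alpha, alphabet_size=4):
    """theta_MAP[i, j] = (c_ij + alpha_j - 1) / (n + alpha_0 - 4), per equation (28)."""
    n, w = sequences.shape
    counts = np.zeros((w, alphabet_size))
    for i in range(w):
        counts[i] = np.bincount(sequences[:, i], minlength=alphabet_size)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha - 1.0) / (n + alpha.sum() - alphabet_size)

S = np.array([[0, 1, 2],
              [0, 1, 2],
              [0, 3, 2],
              [2, 1, 2]])
print(map_estimate(S, alpha=[2.0, 2.0, 2.0, 2.0]))   # alpha_j = 2 adds one pseudocount per letter
```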

15 Mixture of sequences

Suppose we have a more difficult situation: among the set of $n$ given sequences $S_1, S_2, \dots, S_n$, only a subset of them are generated by the weight matrix model $\theta$. How do we identify $\theta$ in this case?

Let us first define the "non-motif" (also called background) sequences. Suppose they are generated from a single distribution

$$p^0 = (p^0_A, p^0_C, p^0_G, p^0_T) = (p^0_1, p^0_2, p^0_3, p^0_4) \qquad (29)$$

16 Likelihood for mixture of sequences

Now the problem is that we do not know which sequences are generated from the motif model ($\theta$) and which are generated from the background model ($\theta_0$). Suppose we are provided with such label information:

$$z_i = \begin{cases} 1 & \text{if } S_i \text{ is generated by } \theta \\ 0 & \text{if } S_i \text{ is generated by } \theta_0 \end{cases} \qquad (30)$$

for all $i = 1, 2, \dots, n$. Then the likelihood of observing the $n$ sequences is

$$P(S_1, S_2, \dots, S_n \mid z, \theta, \theta_0) = \prod_{i=1}^{n} \left[ z_i P(S_i \mid \theta) + (1 - z_i) P(S_i \mid \theta_0) \right]$$

17 Maximum Likelihood

Find the joint probability of the sequences and the labels:

$$P(S, z \mid \theta, \theta_0) = P(S \mid z, \theta, \theta_0) P(z) = \prod_{i=1}^{n} P(z_i) \left[ z_i P(S_i \mid \theta) + (1 - z_i) P(S_i \mid \theta_0) \right]$$

where $z \equiv (z_1, \dots, z_n)$ and $P(z) = \prod_i P(z_i)$.

Marginalize over the labels to derive the likelihood:

$$L(\theta) = P(S \mid \theta, \theta_0) = \prod_{i=1}^{n} \left[ P(z_i = 1) P(S_i \mid \theta) + P(z_i = 0) P(S_i \mid \theta_0) \right]$$

Maximum likelihood estimate: $\hat{\theta}_{ML} = \arg\max_\theta \log L(\theta)$


19 Lower bound on L(θ)

The log-likelihood function is

$$\log L(\theta) = \sum_{i=1}^{n} \log \left[ P(z_i = 1) P(S_i \mid z_i = 1) + P(z_i = 0) P(S_i \mid z_i = 0) \right]$$

where $P(S_i \mid z_i = 1) = P(S_i \mid \theta)$ and $P(S_i \mid z_i = 0) = P(S_i \mid \theta_0)$.

Jensen's inequality:

$$\log(q_1 x + q_2 y) \ge q_1 \log(x) + q_2 \log(y)$$

for all $q_1, q_2 \ge 0$ with $q_1 + q_2 = 1$.
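A quick numerical illustration of Jensen's inequality for the concave log, with illustrative values of $q_1, q_2, x, y$ (not from the slides):

```python
import math

q1, q2 = 0.3, 0.7
x, y = 2.0, 5.0
print(math.log(q1 * x + q2 * y))             # log of the mixture
print(q1 * math.log(x) + q2 * math.log(y))   # mixture of the logs, never larger
```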

20 EM-algorithm

Lower bound on $\log L(\theta)$:

$$\log L(\theta) \ge \sum_{i=1}^{N} \left\{ q_i \log \frac{P(z_i = 1) P(S_i \mid z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S_i \mid z_i = 0)}{1 - q_i} \right\} \equiv \sum_{i=1}^{N} \phi(q_i, \theta)$$

Expectation-Maximization: alternate between two steps.

E-step:
$$\hat{q}_i = \arg\max_{q_i} \phi(q_i, \hat{\theta})$$

M-step:
$$\hat{\theta} = \arg\max_\theta \sum_{i=1}^{N} \phi(\hat{q}_i, \theta)$$

21 E-Step

Auxiliary function:

$$\phi(q_i, \theta) = q_i \log \frac{P(z_i = 1) P(S_i \mid z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S_i \mid z_i = 0)}{1 - q_i}$$

E-step:

$$\hat{q}_i = \arg\max_{q_i} \phi(q_i, \hat{\theta})$$

which leads to

$$\hat{q}_i = \frac{P(z_i = 1) P(S_i \mid z_i = 1)}{P(z_i = 1) P(S_i \mid z_i = 1) + P(z_i = 0) P(S_i \mid z_i = 0)} = P(z_i = 1 \mid S_i)$$

22 M-Step

Auxiliary function:

$$\phi(q_i, \theta) = q_i \log \frac{P(z_i = 1) P(S_i \mid z_i = 1)}{q_i} + (1 - q_i) \log \frac{P(z_i = 0) P(S_i \mid z_i = 0)}{1 - q_i}$$

M-step:

$$\hat{\theta} = \arg\max_\theta \sum_{i=1}^{N} \phi(\hat{q}_i, \theta) = \arg\max_\theta \sum_{i=1}^{N} \left[ \hat{q}_i \log P(S_i \mid \theta) + (1 - \hat{q}_i) \log P(S_i \mid \theta_0) \right]$$

which leads to

$$\hat{\theta}_{ij} = \frac{\sum_{k=1}^{N} \hat{q}_k \, I(S_{ki} = j)}{\sum_{k=1}^{N} \hat{q}_k}$$

where $I(a)$ is an indicator function: $I(a) = 1$ if $a$ is true and $0$ otherwise.

23 Summary of EM-algorithm

Initialize the parameters $\theta$. Repeat until convergence:

E-step: estimate the expected values of the labels, given the current parameter estimate,
$$\hat{q}_i = P(z_i = 1 \mid S_i)$$

M-step: re-estimate the parameters, given the expected estimates of the labels,
$$\hat{\theta}_{ij} = \frac{\sum_{k=1}^{N} \hat{q}_k \, I(S_{ki} = j)}{\sum_{k=1}^{N} \hat{q}_k}$$

The procedure is guaranteed to converge to a local maximum or saddle point solution.
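A compact sketch of the full EM loop from slides 20 to 23, assuming sequences encoded 0 through 3, a fixed background distribution `theta0`, and a fixed mixing weight `q` = P(z_i = 1) (both treated as known here, as on the slides); the function name `em_motif` and the initialization scheme are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def em_motif(S, theta0, q=0.5, n_iter=100, seed=0):
    """EM for the two-component mixture: motif PWM theta vs. background theta0."""
    n, w = S.shape
    rng = np.random.default_rng(seed)
    # Random row-normalized initialization of the PWM.
    theta = rng.random((w, 4))
    theta /= theta.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Per-sequence likelihoods under the motif and background models.
        p_motif = np.prod(theta[np.arange(w), S], axis=1)   # prod_i theta[i, S_ki]
        p_bg    = np.prod(theta0[S], axis=1)                # prod_i theta0[S_ki]

        # E-step: q_hat_k = P(z_k = 1 | S_k).
        q_hat = q * p_motif / (q * p_motif + (1 - q) * p_bg)

        # M-step: theta_ij = sum_k q_hat_k I(S_ki = j) / sum_k q_hat_k.
        for i in range(w):
            for j in range(4):
                theta[i, j] = q_hat[S[:, i] == j].sum()
        theta /= q_hat.sum()

    return theta, q_hat

# Illustrative usage with a uniform background:
# theta_hat, q_hat = em_motif(S, theta0=np.full(4, 0.25), q=0.5)
```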

24 What about the MAP estimate?

Consider a Dirichlet prior distribution on each $\theta_i$:

$$\theta_i \sim \mathrm{Dir}(\alpha), \quad i = 1, \dots, w$$

Initialize the parameters $\theta$. Repeat until convergence:

E-step: estimate the expected values of the labels, given the current parameter estimate,
$$\hat{q}_i = P(z_i = 1 \mid S_i)$$

M-step: re-estimate the parameters, given the expected estimates of the labels,
$$\hat{\theta}_{ij} = \frac{\sum_{k=1}^{N} \hat{q}_k \, I(S_{ki} = j) + \alpha_j - 1}{\sum_{k=1}^{N} \hat{q}_k + \alpha_0 - 4}$$
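Relative to the EM sketch above, only the M-step changes: the weighted counts are smoothed by the Dirichlet pseudocounts. A sketch with the same assumed encoding and hypothetical names ($\alpha_j \ge 1$ keeps the smoothed counts non-negative):

```python
import numpy as np

def m_step_map(S, q_hat, alpha):
    """M-step with a Dir(alpha) prior: smoothed counts divided by (sum_k q_hat_k + alpha_0 - 4)."""
    n, w = S.shape
    alpha = np.asarray(alpha, dtype=float)
    theta = np.zeros((w, 4))
    for i in range(w):
        for j in range(4):
            theta[i, j] = q_hat[S[:, i] == j].sum() + alpha[j] - 1.0
    theta /= q_hat.sum() + alpha.sum() - 4.0
    return theta
```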

25 Different methods for parameter estimation

So far, we have introduced two methods: ML and MAP.

Maximum likelihood (ML):
$$\hat{\theta}_{ML} = \arg\max_\theta P(S \mid \theta)$$

Maximum a posteriori (MAP):
$$\hat{\theta}_{MAP} = \arg\max_\theta P(\theta \mid S)$$

Bayes estimator:
$$\hat{\theta} = \arg\min_{\hat{\theta}} E\left[ (\hat{\theta}(S) - \theta)^2 \right]$$
that is, the estimator minimizing the MSE (mean squared error), which is the posterior mean:
$$\hat{\theta} = E[\theta \mid S] = \int \theta \, P(\theta \mid S) \, d\theta$$
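For the Dirichlet posterior of slide 14, the Bayes (posterior mean) estimator also has a simple closed form, $E[\theta_{ij} \mid S] = (c_{ij} + \alpha_j)/(n + \alpha_0)$, obtained by applying the Dirichlet mean from slide 13 to the posterior. A sketch with the same assumed encoding and a hypothetical function name:

```python
import numpy as np

def bayes_estimate(sequences, alpha, alphabet_size=4):
    """Posterior mean under a Dir(alpha) prior: (c_ij + alpha_j) / (n + alpha_0)."""
    n, w = sequences.shape
    counts = np.zeros((w, alphabet_size))
    for i in range(w):
        counts[i] = np.bincount(sequences[:, i], minlength=alphabet_size)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (n + alpha.sum())
```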

26 Joint distribution

Joint distribution of the labels and sequences:

$$P(S, z \mid \theta, \theta_0) = P(S \mid z, \theta, \theta_0) P(z) = \prod_{i=1}^{n} P(z_i) \left[ z_i P(S_i \mid \theta) + (1 - z_i) P(S_i \mid \theta_0) \right]$$

Joint distribution of $S$, $z$, $\theta$ and $\theta_0$:

$$P(S, z, \theta, \theta_0) = \prod_{i=1}^{n} P(z_i) \left[ z_i P(S_i \mid \theta) + (1 - z_i) P(S_i \mid \theta_0) \right] P(\theta) P(\theta_0)$$

Find the joint distribution of $S$ and $z$:

$$P(S, z) = \int_\theta \int_{\theta_0} \prod_{i=1}^{n} P(z_i) \left[ z_i P(S_i \mid \theta) + (1 - z_i) P(S_i \mid \theta_0) \right] P(\theta) P(\theta_0) \, d\theta \, d\theta_0$$

27 Posterior distribution of labels

Posterior distribution of $z$:

$$P(z \mid S) = \frac{1}{P(S)} \int_\theta \int_{\theta_0} \prod_{i=1}^{n} P(z_i) \left[ z_i P(S_i \mid \theta) + (1 - z_i) P(S_i \mid \theta_0) \right] P(\theta) P(\theta_0) \, d\theta \, d\theta_0$$

$$\propto q^m (1 - q)^{n - m} \prod_{i=1}^{w} \int \prod_{j=1}^{4} \theta_{ij}^{n_{ij}} P(\theta_i) \, d\theta_i \;\cdot\; \frac{B(n_0 + \alpha_0)}{B(\alpha_0)} \;\propto\; q^m (1 - q)^{n - m} \prod_{i=1}^{w} \frac{B(n_i + \alpha)}{B(\alpha)} \;\frac{B(n_0 + \alpha_0)}{B(\alpha_0)}$$

where $m = \sum_{i=1}^{n} z_i$, $P(z_i = 1) = q$, $P(\theta_i) = \mathrm{Dir}(\alpha)$ and $P(\theta_0) = \mathrm{Dir}(\alpha_0)$ are Dirichlet priors, $n_{ij} \equiv \sum_{k=1}^{n} z_k I(S_{ki} = j)$ is the number of letter $j$ at position $i$ among the sequences with label 1, $n_i \equiv (n_{i1}, \dots, n_{i4})$, $n_{0,j} \equiv \sum_{k=1}^{n} (1 - z_k) \sum_{i=1}^{w} I(S_{ki} = j)$, and $n_0 \equiv (n_{0,1}, \dots, n_{0,4})$.

28 Sampling

Posterior distribution of $z$:

$$P(z \mid S) \propto q^m (1 - q)^{n - m} \prod_{i=1}^{w} \frac{B(n_i + \alpha)}{B(\alpha)} \;\frac{B(n_0 + \alpha_0)}{B(\alpha_0)}$$

Posterior distribution of $z_k$ conditioned on all other labels $z_{-k} \equiv \{z_i : i = 1, \dots, n,\ i \neq k\}$:

$$P(z_k = 1 \mid z_{-k}, S) \propto q \prod_{i=1}^{w} \frac{B(n_{-k,i} + \delta(S_{ki}) + \alpha)}{B(n_{-k,i} + \alpha)}$$

where $n_{-k,ij} \equiv \sum_{l=1, l \neq k}^{n} z_l I(S_{li} = j)$ is the number of letter $j$ at position $i$ among all sequences with label 1, excluding the $k$-th sequence; $n_{-k,i} \equiv (n_{-k,i1}, \dots, n_{-k,i4})$; and $\delta(l) = (b_1, \dots, b_4)$ with $b_j = 1$ for $j = l$ and $0$ otherwise.

29 Posterior distribution

Posterior distribution of $z_k$ conditioned on $z_{-k}$:

$$P(z_k = 1 \mid z_{-k}, S) \propto q \prod_{i=1}^{w} \frac{n_{-k,i S_{ki}} + \alpha_{S_{ki}} - 1}{\sum_j \left[ n_{-k,ij} + \alpha_j - 1 \right]} = q \prod_{i=1}^{w} \theta_{i S_{ki}}$$

Note that $\theta_{i S_{ki}}$ is the same as the MAP estimate of the frequency weight matrix using all sequences with label 1, excluding the $k$-th sequence.

Similarly,

$$P(z_k = 0 \mid z_{-k}, S) \propto (1 - q) \prod_{i=1}^{w} \frac{n_{-k,0 S_{ki}} + \alpha_{0,S_{ki}} - 1}{\sum_j \left[ n_{-k,0j} + \alpha_{0,j} - 1 \right]} = (1 - q) \prod_{i=1}^{w} \theta_{0,S_{ki}}$$

where $\theta_{0,S_{ki}}$ is the same as the MAP estimate of the background distribution.

30 Gibbs sampling

Posterior probabilities:

$$P(z_k = 1 \mid z_{-k}, S) \propto q \prod_{i=1}^{w} \theta_{i S_{ki}}$$

$$P(z_k = 0 \mid z_{-k}, S) \propto (1 - q) \prod_{i=1}^{w} \theta_{0,S_{ki}}$$

Gibbs sampling:

$$P(z_k = 1 \mid z_{-k}, S) = \frac{q \prod_{i=1}^{w} \theta_{i S_{ki}}}{q \prod_{i=1}^{w} \theta_{i S_{ki}} + (1 - q) \prod_{i=1}^{w} \theta_{0,S_{ki}}}$$

31 Gibbs sampling

Initialize the labels $z$: assign the value of $z_i$ randomly according to $P(z_i = 1) = q$ for all $i = 1, \dots, n$.

Repeat until convergence:
  For $i = 1$ to $n$:
    Update the $\theta$ matrix using the MAP estimate (excluding the $i$-th sequence).
    Sample the value of $z_i$.
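A minimal sketch of the collapsed Gibbs sampler of slides 28 to 31, assuming sequences encoded 0 through 3, a symmetric Dirichlet prior $\mathrm{Dir}(\alpha, \dots, \alpha)$ on each motif position (a simplifying assumption), and a background distribution `theta0` that is kept fixed rather than re-estimated, to stay close to the outline above; the function name `gibbs_motif`, the sweep count, and the default values are illustrative:

```python
import numpy as np

def gibbs_motif(S, theta0, alpha=2.0, q=0.5, n_sweeps=200, seed=0):
    """Gibbs sampling of the labels z; theta is recomputed as the MAP estimate
    from the currently labeled sequences, excluding the one being updated."""
    n, w = S.shape
    rng = np.random.default_rng(seed)
    z = (rng.random(n) < q).astype(int)      # initialize labels with P(z_i = 1) = q

    for _ in range(n_sweeps):
        for k in range(n):
            keep = (z == 1)
            keep[k] = False                  # exclude the k-th sequence
            m = keep.sum()

            # MAP estimate of the PWM from the remaining label-1 sequences (slide 24 form).
            counts = np.zeros((w, 4))
            for i in range(w):
                counts[i] = np.bincount(S[keep, i], minlength=4)
            theta = (counts + alpha - 1.0) / (m + 4 * alpha - 4.0)

            # Unnormalized probabilities of z_k = 1 and z_k = 0 for sequence k.
            p1 = q * np.prod(theta[np.arange(w), S[k]])
            p0 = (1 - q) * np.prod(theta0[S[k]])

            # Sample z_k from its conditional posterior (slide 30).
            z[k] = int(rng.random() < p1 / (p1 + p0))
    return z
```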
