Motif representation using position weight matrix
Slide 1: Motif representation using position weight matrix
Xiaohui Xie, University of California, Irvine
Slide 2: Position weight matrix
Position weight matrix representation of a motif of width w:

    θ = ( θ_11  θ_12  θ_13  θ_14
          θ_21  θ_22  θ_23  θ_24
          ...
          θ_w1  θ_w2  θ_w3  θ_w4 )        (1)

where each row represents one position of the motif and is normalized:

    ∑_{j=1}^{4} θ_ij = 1   for all i = 1, 2, ..., w.        (2)
Slide 3: Likelihood
Given the position weight matrix θ, the probability of generating a sequence S = (S_1, S_2, ..., S_w) from θ is

    P(S | θ) = ∏_{i=1}^{w} P(S_i | θ_i) = ∏_{i=1}^{w} θ_{i,S_i}        (3)-(4)

For convenience, we have converted S from a string over {A, C, G, T} to a string over {1, 2, 3, 4}.
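As a concrete illustration of equations (3)-(4), the Python sketch below scores a short sequence against a small weight matrix; the matrix values and the sequence are invented for the example and are not from the slides.

import numpy as np

# Hypothetical PWM for a motif of width w = 3; rows are positions,
# columns are the letters A, C, G, T (indices 0..3); each row sums to 1.
theta = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.2, 0.2, 0.2, 0.4],
])

letter_index = {"A": 0, "C": 1, "G": 2, "T": 3}

def sequence_likelihood(seq, theta):
    # P(S | theta) = prod_i theta[i, S_i], per equations (3)-(4)
    assert len(seq) == theta.shape[0]
    return float(np.prod([theta[i, letter_index[s]] for i, s in enumerate(seq)]))

print(sequence_likelihood("AGT", theta))  # 0.7 * 0.7 * 0.4 = 0.196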
Slide 4: Likelihood
Suppose we observe not just one but a set of sequences S^1, S^2, ..., S^n, each of which contains exactly w letters. Assume each of them is generated independently from the model θ. Then the likelihood of observing these n sequences is

    P(S^1, S^2, ..., S^n | θ) = ∏_{k=1}^{n} P(S^k | θ) = ∏_{k=1}^{n} ∏_{i=1}^{w} θ_{i,S^k_i} = ∏_{i=1}^{w} ∏_{j=1}^{4} θ_ij^{c_ij}        (5)-(6)

where c_ij is the number of occurrences of letter j at position i (note that ∑_{j=1}^{4} c_ij = n for all i).
Slide 5: Parameter estimation
Now suppose we do not know θ. How do we estimate it from the observed sequence data S^1, S^2, ..., S^n? One solution: calculate the likelihood of observing the provided n sequences for different values of θ,

    L(θ) = P(S^1, S^2, ..., S^n | θ) = ∏_{k=1}^{n} ∏_{i=1}^{w} θ_{i,S^k_i}        (7)

and pick the value with the largest likelihood, that is, find the θ that maximizes P(S^1, S^2, ..., S^n | θ).        (8)
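The brute-force idea on this slide can be sketched directly: evaluate the log of the likelihood (7) for a few candidate matrices and keep the best. The candidate matrices and sequences below are invented for illustration.

import numpy as np

letter_index = {"A": 0, "C": 1, "G": 2, "T": 3}

def log_likelihood(seqs, theta):
    # log L(theta) = sum_k sum_i log theta[i, S_ki], per equation (7)
    return sum(np.log(theta[i, letter_index[s]])
               for seq in seqs for i, s in enumerate(seq))

# Two hypothetical candidate weight matrices for a width-2 motif
theta_a = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.1, 0.7, 0.1, 0.1]])
theta_b = np.full((2, 4), 0.25)
seqs = ["AC", "AC", "AG", "TC"]

# Pick the candidate with the larger (log-)likelihood, as in (8)
best = max([theta_a, theta_b], key=lambda t: log_likelihood(seqs, t))
print(best)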
Slide 6: Maximum likelihood estimation
The maximum likelihood estimate of θ is

    θ̂_ML = argmax_θ log L(θ) = argmax_θ ∑_{i=1}^{w} ∑_{j=1}^{4} c_ij log θ_ij
    subject to ∑_{j=1}^{4} θ_ij = 1,  i = 1, ..., w.        (9)
Slide 7: Optimization with equality constraints
Construct a Lagrangian function taking the equality constraints into account:

    g(θ) = log L(θ) + ∑_{i=1}^{w} λ_i (1 − ∑_{j=1}^{4} θ_ij)        (10)

Solve the unconstrained optimization problem

    θ̂ = argmax_θ g(θ) = argmax_θ [ ∑_{i=1}^{w} ∑_{j=1}^{4} c_ij log θ_ij + ∑_{i=1}^{w} λ_i (1 − ∑_{j=1}^{4} θ_ij) ]        (11)
Slide 8: Optimization with equality constraints
Take the derivatives of g(θ) with respect to θ_ij and the Lagrange multipliers λ_i and set them to zero:

    ∂g(θ)/∂θ_ij = 0        (12)
    ∂g(θ)/∂λ_i = 0        (13)

which leads to

    θ̂_ij = c_ij / n        (14)

which is simply the frequency of each letter at each position (c_ij is the number of occurrences of letter j at position i).
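In code, the closed-form solution (14) is just per-position letter counting followed by normalization. A minimal sketch with made-up aligned sequences:

import numpy as np

letter_index = {"A": 0, "C": 1, "G": 2, "T": 3}

def ml_estimate(seqs):
    # theta_hat[i, j] = c[i, j] / n, per equation (14)
    w, n = len(seqs[0]), len(seqs)
    counts = np.zeros((w, 4))
    for seq in seqs:
        for i, s in enumerate(seq):
            counts[i, letter_index[s]] += 1
    return counts / n

seqs = ["ACGT", "AAGT", "ACGC", "TCGT"]  # hypothetical aligned motif instances
print(ml_estimate(seqs))  # every row sums to 1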
Slide 9: Bayes theorem

    P(θ | S) = P(S | θ) P(θ) / P(S)        (15)

Each term in Bayes' theorem has a conventional name:
1. P(S | θ): the conditional probability of S given θ, also called the likelihood.
2. P(θ): the prior (marginal) probability of θ.
3. P(θ | S): the conditional probability of θ given S, also called the posterior probability of θ.
4. P(S): the marginal probability of S, which acts as a normalizing constant.
Slide 10: Maximum a posteriori (MAP) estimation
The MAP (or posterior mode) estimate of θ is

    θ̂_MAP(S) = argmax_θ P(θ | S^1, S^2, ..., S^n) = argmax_θ [ log L(θ) + log P(θ) ]        (16)-(17)

Assume P(θ) = ∏_{i=1}^{w} P(θ_i) (independence of θ_i at different positions i). Model P(θ_i) with a Dirichlet distribution:

    (θ_i1, θ_i2, θ_i3, θ_i4) ~ Dir(α_1, α_2, α_3, α_4).        (18)
Slide 11: Dirichlet distribution
The probability density function of the Dirichlet distribution Dir(α) of order K ≥ 2 is

    p(x_1, ..., x_K; α_1, ..., α_K) = (1 / B(α)) ∏_{i=1}^{K} x_i^{α_i − 1}        (19)

for all x_1, ..., x_K > 0 with ∑_{i=1}^{K} x_i = 1. The density is zero outside this open (K−1)-dimensional simplex. α = (α_1, ..., α_K) are parameters with α_i > 0 for all i. The normalizing constant B(α) is the multinomial beta function:

    B(α) = ∏_{i=1}^{K} Γ(α_i) / Γ(∑_{i=1}^{K} α_i)        (20)
Slide 12: Gamma function
The Gamma function for positive real z:

    Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt        (21)

It satisfies

    Γ(z + 1) = z Γ(z)        (22)

and, if n is a positive integer,

    Γ(n + 1) = n!        (23)
Slide 13: Properties of the Dirichlet distribution
Dirichlet distribution:

    p(x_1, ..., x_K; α) = (1 / B(α)) ∏_{i=1}^{K} x_i^{α_i − 1}        (24)

Define α_0 = ∑_{i=1}^{K} α_i.
Expectation:

    E[X_i] = α_i / α_0        (25)

Variance:

    Var[X_i] = α_i (α_0 − α_i) / (α_0^2 (α_0 + 1))        (26)

Covariance:

    Cov[X_i, X_j] = −α_i α_j / (α_0^2 (α_0 + 1)),  i ≠ j        (27)
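The moments (25)-(27) are easy to check empirically by sampling; the sketch below uses NumPy's Dirichlet sampler with an arbitrarily chosen α.

import numpy as np

alpha = np.array([2.0, 1.0, 1.0, 1.0])   # arbitrary example parameters
alpha0 = alpha.sum()

samples = np.random.default_rng(0).dirichlet(alpha, size=200_000)

# Empirical mean and variance vs. the closed forms (25) and (26)
print(samples.mean(axis=0), alpha / alpha0)
print(samples.var(axis=0), alpha * (alpha0 - alpha) / (alpha0**2 * (alpha0 + 1)))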
Slide 14: Posterior distribution
Conditional probability: P(S^1, S^2, ..., S^n | θ) = ∏_{i=1}^{w} ∏_{j=1}^{4} θ_ij^{c_ij}
Prior probability: p(θ_i1, ..., θ_i4; α) = (1 / B(α)) ∏_{j=1}^{4} θ_ij^{α_j − 1}
Posterior probability: P(θ_i | S^1, ..., S^n) = Dir(c_i1 + α_1, c_i2 + α_2, c_i3 + α_3, c_i4 + α_4)
Maximum a posteriori estimate:

    θ^MAP_ij = (c_ij + α_j − 1) / (n + α_0 − 4)        (28)

where α_0 ≡ ∑_i α_i.
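Equation (28) amounts to adding pseudocounts α_j − 1 to the observed counts before normalizing. A small sketch, assuming a count matrix like the one produced by the ML example above:

import numpy as np

def map_estimate(counts, alpha):
    # theta[i, j] = (c[i, j] + alpha[j] - 1) / (n + alpha0 - 4), per equation (28)
    n = counts[0].sum()          # each row of counts sums to n
    alpha0 = np.sum(alpha)
    return (counts + alpha - 1.0) / (n + alpha0 - 4.0)

counts = np.array([[3, 0, 0, 1],   # hypothetical letter counts from n = 4 sequences
                   [0, 3, 0, 1],
                   [0, 0, 4, 0],
                   [0, 1, 0, 3]])
alpha = np.array([2.0, 2.0, 2.0, 2.0])   # uniform pseudocount prior
print(map_estimate(counts, alpha))       # rows sum to 1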
Slide 15: Mixture of sequences
Suppose we have a more difficult situation: among the set of n given sequences S^1, S^2, ..., S^n, only a subset are generated by a weight matrix model θ. How do we identify θ in this case? Let us first define the "non-motif" (also called background) sequences. Suppose they are generated from a single distribution

    p^0 = (p^0_A, p^0_C, p^0_G, p^0_T) = (p^0_1, p^0_2, p^0_3, p^0_4)        (29)
Slide 16: Likelihood for mixture of sequences
The problem now is that we do not know which sequences are generated from the motif model (θ) and which from the background model (θ^0). Suppose we are provided with such label information:

    z_i = 1 if S^i is generated by θ,  z_i = 0 if S^i is generated by θ^0        (30)

for all i = 1, 2, ..., n. Then the likelihood of observing the n sequences is

    P(S^1, S^2, ..., S^n | z, θ, θ^0) = ∏_{i=1}^{n} [ z_i P(S^i | θ) + (1 − z_i) P(S^i | θ^0) ]
Slide 17: Maximum likelihood
Find the joint probability of the sequences and the labels:

    P(S, z | θ, θ^0) = P(S | z, θ, θ^0) P(z) = ∏_{i=1}^{n} P(z_i) [ z_i P(S^i | θ) + (1 − z_i) P(S^i | θ^0) ]

where z ≡ (z_1, ..., z_n) and P(z) = ∏_i P(z_i). Marginalize over the labels to derive the likelihood:

    L(θ) = P(S | θ, θ^0) = ∏_{i=1}^{n} [ P(z_i = 1) P(S^i | θ) + P(z_i = 0) P(S^i | θ^0) ]

Maximum likelihood estimate: θ̂_ML = argmax_θ log L(θ)
Slide 18: Maximum likelihood (identical to slide 17)
Slide 19: Lower bound on L(θ)
Log-likelihood function:

    log L(θ) = ∑_{i=1}^{n} log [ P(z_i = 1) P(S^i | z_i = 1) + P(z_i = 0) P(S^i | z_i = 0) ]

where P(S^i | z_i = 1) = P(S^i | θ) and P(S^i | z_i = 0) = P(S^i | θ^0).
Jensen's inequality:

    log(q_1 x + q_2 y) ≥ q_1 log(x) + q_2 log(y)

for all q_1, q_2 ≥ 0 with q_1 + q_2 = 1.
Slide 20: EM algorithm
Lower bound on log L(θ):

    log L(θ) ≥ ∑_{i=1}^{n} { q_i log [ P(z_i = 1) P(S^i | z_i = 1) / q_i ] + (1 − q_i) log [ P(z_i = 0) P(S^i | z_i = 0) / (1 − q_i) ] } ≡ ∑_{i=1}^{n} φ(q_i, θ)

Expectation-Maximization: alternate between two steps:

    E-step:  q̂_i = argmax_{q_i} φ(q_i, θ̂)
    M-step:  θ̂ = argmax_θ ∑_{i=1}^{n} φ(q̂_i, θ)
Slide 21: E-step
Auxiliary function:

    φ(q_i, θ) = q_i log [ P(z_i = 1) P(S^i | z_i = 1) / q_i ] + (1 − q_i) log [ P(z_i = 0) P(S^i | z_i = 0) / (1 − q_i) ]

E-step: q̂_i = argmax_{q_i} φ(q_i, θ̂), which leads to

    q̂_i = P(z_i = 1) P(S^i | z_i = 1) / [ P(z_i = 1) P(S^i | z_i = 1) + P(z_i = 0) P(S^i | z_i = 0) ] = P(z_i = 1 | S^i)
Slide 22: M-step
Auxiliary function:

    φ(q_i, θ) = q_i log [ P(z_i = 1) P(S^i | z_i = 1) / q_i ] + (1 − q_i) log [ P(z_i = 0) P(S^i | z_i = 0) / (1 − q_i) ]

M-step:

    θ̂ = argmax_θ ∑_{i=1}^{n} φ(q̂_i, θ) = argmax_θ ∑_{i=1}^{n} [ q̂_i log P(S^i | θ) + (1 − q̂_i) log P(S^i | θ^0) ]

which leads to

    θ̂_ij = ∑_{k=1}^{n} q̂_k I(S^k_i = j) / ∑_{k=1}^{n} q̂_k

where I(a) is an indicator function: I(a) = 1 if a is true and 0 otherwise.
Slide 23: Summary of the EM algorithm
Initialize the parameters θ. Repeat until convergence:
E-step: estimate the expected values of the labels, given the current parameter estimate: q̂_i = P(z_i = 1 | S^i)
M-step: re-estimate the parameters, given the expected estimates of the labels:

    θ̂_ij = ∑_{k=1}^{n} q̂_k I(S^k_i = j) / ∑_{k=1}^{n} q̂_k

The procedure is guaranteed to converge to a local maximum or saddle point of the likelihood.
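A compact sketch of this EM loop for the motif/background mixture, assuming a fixed mixing weight q = P(z_i = 1) and a fixed background distribution θ^0; the function name and defaults below are illustrative, not from the slides.

import numpy as np

letter_index = {"A": 0, "C": 1, "G": 2, "T": 3}

def em_motif(seqs, q=0.5, theta0=None, n_iter=50, seed=0):
    # E-step: q_i = P(z_i = 1 | S_i); M-step: theta = q_i-weighted letter frequencies
    rng = np.random.default_rng(seed)
    X = np.array([[letter_index[s] for s in seq] for seq in seqs])  # n x w letter indices
    n, w = X.shape
    theta0 = np.full(4, 0.25) if theta0 is None else theta0         # uniform background
    theta = rng.dirichlet(np.ones(4), size=w)                       # random initialization

    for _ in range(n_iter):
        # E-step: responsibility of the motif component for each sequence
        p_motif = q * np.prod(theta[np.arange(w), X], axis=1)
        p_bg = (1 - q) * np.prod(theta0[X], axis=1)
        qhat = p_motif / (p_motif + p_bg)

        # M-step: qhat-weighted letter counts, normalized per position
        counts = np.zeros((w, 4))
        for i in range(w):
            for j in range(4):
                counts[i, j] = qhat[X[:, i] == j].sum()
        theta = counts / qhat.sum()
    return theta, qhat

With a Dirichlet prior on each row of θ (slide 24), the M-step normalization would simply gain α_j − 1 in the numerator and α_0 − 4 in the denominator.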
Slide 24: What about the MAP estimate?
Consider a Dirichlet prior distribution on each θ_i: θ_i ~ Dir(α), i = 1, ..., w.
Initialize the parameters θ. Repeat until convergence:
E-step: estimate the expected values of the labels, given the current parameter estimate: q̂_i = P(z_i = 1 | S^i)
M-step: re-estimate the parameters, given the expected estimates of the labels:

    θ̂_ij = [ ∑_{k=1}^{n} q̂_k I(S^k_i = j) + α_j − 1 ] / [ ∑_{k=1}^{n} q̂_k + α_0 − 4 ]
Slide 25: Different methods for parameter estimation
So far, we have introduced two methods: ML and MAP.
Maximum likelihood (ML): θ̂_ML = argmax_θ P(S | θ)
Maximum a posteriori (MAP): θ̂_MAP = argmax_θ P(θ | S)
Bayes estimator: θ̂ = argmin_{θ̂} E[ (θ̂(S) − θ)^2 ], that is, the estimator minimizing the mean squared error (MSE):

    θ̂ = E[θ | S] = ∫ θ P(θ | S) dθ
Slide 26: Joint distribution
Joint distribution of labels and sequences:

    P(S, z | θ, θ^0) = P(S | z, θ, θ^0) P(z) = ∏_{i=1}^{n} P(z_i) [ z_i P(S^i | θ) + (1 − z_i) P(S^i | θ^0) ]

Joint distribution of S, z, θ and θ^0:

    P(S, z, θ, θ^0) = ∏_{i=1}^{n} P(z_i) [ z_i P(S^i | θ) + (1 − z_i) P(S^i | θ^0) ] P(θ) P(θ^0)

Joint distribution of S and z:

    P(S, z) = ∫_θ ∫_{θ^0} ∏_{i=1}^{n} P(z_i) [ z_i P(S^i | θ) + (1 − z_i) P(S^i | θ^0) ] P(θ) P(θ^0) dθ dθ^0
Slide 27: Posterior distribution of labels
Posterior distribution of z:

    P(z | S) = ∫_θ ∫_{θ^0} ∏_{i=1}^{n} P(z_i) [ z_i P(S^i | θ) + (1 − z_i) P(S^i | θ^0) ] P(θ) P(θ^0) dθ dθ^0 / P(S)
             ∝ q^m (1 − q)^{n−m} ∏_{i=1}^{w} ∫_{θ_i} ∏_{j=1}^{4} θ_ij^{n_ij} P(θ_i) dθ_i × B(n_0 + α^0) / B(α^0)
             ∝ q^m (1 − q)^{n−m} ∏_{i=1}^{w} [ B(n_i + α) / B(α) ] × B(n_0 + α^0) / B(α^0)

where m = ∑_{i=1}^{n} z_i, P(z_i = 1) = q, P(θ_i) = Dir(α) and P(θ^0) = Dir(α^0) are Dirichlet priors, n_ij ≡ ∑_{k=1}^{n} z_k I(S^k_i = j) is the number of occurrences of letter j at position i among the sequences with label 1, n_i ≡ (n_i1, ..., n_i4), n_0,j ≡ ∑_{k=1}^{n} (1 − z_k) ∑_{i=1}^{w} I(S^k_i = j), and n_0 ≡ (n_0,1, ..., n_0,4).
Slide 28: Sampling
Posterior distribution of z:

    P(z | S) ∝ q^m (1 − q)^{n−m} ∏_{i=1}^{w} [ B(n_i + α) / B(α) ] × B(n_0 + α^0) / B(α^0)

Posterior distribution of z_k conditioned on all the other labels z_{−k} ≡ {z_i : i = 1, ..., n, i ≠ k}:

    P(z_k = 1 | z_{−k}, S) ∝ q ∏_{i=1}^{w} [ B(n_{−k,i} + Δ(S^k_i) + α) / B(n_{−k,i} + α) ]

where n_{−k,ij} ≡ ∑_{l=1, l≠k}^{n} z_l I(S^l_i = j) is the number of occurrences of letter j at position i among all sequences with label 1 excluding the k-th sequence, n_{−k,i} ≡ (n_{−k,i1}, ..., n_{−k,i4}), and Δ(l) = (b_1, ..., b_4) with b_j = 1 for j = l and 0 otherwise.
Slide 29: Posterior distribution
Posterior distribution of z_k conditioned on z_{−k}:

    P(z_k = 1 | z_{−k}, S) ∝ q ∏_{i=1}^{w} (n_{−k,i,S^k_i} + α_{S^k_i} − 1) / ∑_j [ n_{−k,ij} + α_j − 1 ] = q ∏_{i=1}^{w} θ_{i,S^k_i}

Note that θ_{i,S^k_i} is the same as the MAP estimate of the frequency weight matrix using all sequences with label 1, excluding the k-th sequence. Similarly,

    P(z_k = 0 | z_{−k}, S) ∝ (1 − q) ∏_{i=1}^{w} (n_{−k,0,S^k_i} + α^0_{S^k_i} − 1) / ∑_j [ n_{−k,0j} + α^0_j − 1 ] = (1 − q) ∏_{i=1}^{w} θ^0_{S^k_i}

where θ^0_{S^k_i} is the same as the MAP estimate of the background distribution.
Slide 30: Gibbs sampling
Posterior probabilities:

    P(z_k = 1 | z_{−k}, S) ∝ q ∏_{i=1}^{w} θ_{i,S^k_i}
    P(z_k = 0 | z_{−k}, S) ∝ (1 − q) ∏_{i=1}^{w} θ^0_{S^k_i}

Gibbs sampling:

    P(z_k = 1 | z_{−k}, S) = q ∏_{i=1}^{w} θ_{i,S^k_i} / [ q ∏_{i=1}^{w} θ_{i,S^k_i} + (1 − q) ∏_{i=1}^{w} θ^0_{S^k_i} ]
Slide 31: Gibbs sampling
Initialize the labels z: assign the value of each z_i randomly according to P(z_i = 1) = q, for all i = 1, ..., n.
Repeat until convergence:
  For i = 1 to n:
    Update the θ matrix using the MAP estimate (excluding the i-th sequence).
    Sample the value of z_i.
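A sketch of this sampler under the same illustrative assumptions as the EM sketch above (fixed q, fixed background θ^0, a uniform Dir(α, α, α, α) prior on each row of θ); names and defaults are hypothetical.

import numpy as np

letter_index = {"A": 0, "C": 1, "G": 2, "T": 3}

def gibbs_motif(seqs, q=0.5, alpha=2.0, theta0=None, n_sweeps=200, seed=0):
    # Sweep over the labels z; for each k, rebuild the MAP weight matrix from the
    # label-1 sequences excluding k, then resample z_k from its conditional posterior.
    rng = np.random.default_rng(seed)
    X = np.array([[letter_index[s] for s in seq] for seq in seqs])  # n x w letter indices
    n, w = X.shape
    theta0 = np.full(4, 0.25) if theta0 is None else theta0
    z = (rng.random(n) < q).astype(int)                             # random initialization

    for _ in range(n_sweeps):
        for k in range(n):
            mask = (z == 1)
            mask[k] = False                      # exclude the k-th sequence
            counts = np.zeros((w, 4))
            for i in range(w):
                for j in range(4):
                    counts[i, j] = np.sum(X[mask, i] == j)
            # MAP (pseudocount) estimate of the weight matrix, as on slide 29
            theta = (counts + alpha - 1.0) / (mask.sum() + 4 * (alpha - 1.0))
            # Sample z_k with the probability from slide 30
            p1 = q * np.prod(theta[np.arange(w), X[k]])
            p0 = (1 - q) * np.prod(theta0[X[k]])
            z[k] = int(rng.random() < p1 / (p1 + p0))
    return z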