Introduction to Bayesian inference
1 Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge 17 November 2015
2 Probabilistic models Describe how the data was generated using probability distributions: a generative process. Data $D$, parameters $\theta$. We want to find the best parameters $\theta$ for the data $D$ - this is inference.
3 Topic modelling Documents $D_1, \ldots, D_D$. Documents cover topics, with a distribution $\theta_d = (t_1, \ldots, t_T)$. Words in a document, $D_d = \{w_{d,1}, \ldots, w_{d,N}\}$. Some words are more prevalent in some topics, so topics have a word distribution $\phi_t = (w_1, \ldots, w_V)$. The data is the words in documents $D_1, \ldots, D_D$; the parameters are the $\theta_d$ and $\phi_t$.
4 [Figure from Blei's ICML-2012 tutorial]
5 Overview Probabilistic models; Probability theory; Latent variable models; Bayesian inference; Bayes theorem; Latent Dirichlet Allocation; Conjugacy; Graphical models; Gibbs sampling; Variational Bayesian inference
6 Probability primer Random variable $X$. Probability distribution $p(X)$. Discrete distributions, e.g. a coin flip or a die roll. Continuous distributions, e.g. a height distribution.
7 Multiple RVs Joint distribution $p(X, Y)$. Conditional distribution $p(X \mid Y)$.
8 Probability rules Chain rule: $p(X \mid Y) = \frac{p(X, Y)}{p(Y)}$, or equivalently $p(X, Y) = p(X \mid Y)\,p(Y)$. Marginal rule: $p(X) = \sum_y p(X, y) = \sum_y p(X \mid y)\,p(y)$; for continuous variables, $p(X) = \int p(X, y)\,dy = \int p(X \mid y)\,p(y)\,dy$. We can add more conditional random variables if we want, e.g. $p(X, Y \mid Z) = p(X \mid Y, Z)\,p(Y \mid Z)$.
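To make the rules concrete, here is a minimal sketch (Python with numpy; the toy joint distribution is my own choosing) that checks the chain and marginal rules numerically:

```python
import numpy as np

# Toy joint distribution over X in {0,1} (rows) and Y in {0,1} (columns);
# the numbers are made up but sum to 1.
p_xy = np.array([[0.1, 0.3],
                 [0.2, 0.4]])

p_y = p_xy.sum(axis=0)        # marginal rule: p(Y) = sum_x p(x, Y)
p_x_given_y = p_xy / p_y      # chain rule rearranged: p(X|Y) = p(X,Y) / p(Y)

# Multiplying back recovers the joint: p(X,Y) = p(X|Y) p(Y)
assert np.allclose(p_x_given_y * p_y, p_xy)
print(p_x_given_y)
```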
9 Independence $X$ and $Y$ are independent if $p(X, Y) = p(X)\,p(Y)$. Equivalently, if $p(Y \mid X) = p(Y)$.
10 Expectation and variance Expectation: $\mathbb{E}[X] = \sum_x x\,p(X = x)$ (discrete), $\mathbb{E}[X] = \int x\,p(x)\,dx$ (continuous). Variance: $\mathbb{V}[X] = \mathbb{E}\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$, where $\mathbb{E}[X^2] = \sum_x x^2\,p(X = x)$ or $\mathbb{E}[X^2] = \int x^2\,p(x)\,dx$.
11 Dice roll: $\mathbb{E}[X] = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{7}{2}$ and $\mathbb{E}[X^2] = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6}$. So $\mathbb{V}[X] = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}$.
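A quick sketch checking these moments numerically (numpy assumed):

```python
import numpy as np

faces = np.arange(1, 7)          # die faces 1..6
p = np.full(6, 1 / 6)            # fair die

ex = np.sum(faces * p)           # E[X]   = 3.5     (= 7/2)
ex2 = np.sum(faces**2 * p)       # E[X^2] = 15.1667 (= 91/6)
var = ex2 - ex**2                # V[X]   = 2.9167  (= 35/12)
print(ex, ex2, var)
```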
12 Latent variable models Manifest, or observed, variables; latent, or unobserved, variables. Latent variable models involve both kinds.
13 Probability distributions Categorical distribution: $N$ possible outcomes, with probabilities $(p_1, \ldots, p_N)$. Draw a single value, e.g. roll a die once. Parameters $\theta = (p_1, \ldots, p_N)$. Discrete distribution, $p(X = i) = p_i$. The expectation for outcome $i$ is $p_i$; the variance is $p_i(1 - p_i)$.
14 Probability distributions Dirichlet distribution: draws are vectors $\mathbf{x} = (x_1, \ldots, x_N)$ s.t. $\sum_i x_i = 1$. In other words, draws are probability vectors, like the parameter of the categorical distribution. Parameters $\theta = (\alpha_1, \ldots, \alpha_N) = \boldsymbol{\alpha}$. Continuous distribution, $p(\mathbf{x}) = \frac{1}{B(\boldsymbol{\alpha})} \prod_i x_i^{\alpha_i - 1}$, where $B(\boldsymbol{\alpha}) = \frac{\prod_i \Gamma(\alpha_i)}{\Gamma\left(\sum_i \alpha_i\right)}$ and $\Gamma(\alpha_i) = \int_0^\infty y^{\alpha_i - 1} e^{-y}\,dy$. The expectation of the $i$th element is $\mathbb{E}[x_i] = \frac{\alpha_i}{\sum_j \alpha_j}$.
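A minimal sketch of those two properties, using numpy's Dirichlet sampler with illustrative $\alpha$ values: every draw sums to 1, and the sample mean approaches $\alpha_i / \sum_j \alpha_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])     # illustrative Dirichlet parameters

samples = rng.dirichlet(alpha, size=10_000)
print(samples.sum(axis=1)[:3])        # every draw is a probability vector
print(samples.mean(axis=0))           # ~ alpha / alpha.sum() = [0.2, 0.3, 0.5]
```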
15 Probabilistic models; Probability theory; Latent variable models; Bayesian inference; Bayes theorem; Latent Dirichlet Allocation; Conjugacy; Graphical models; Gibbs sampling; Variational Bayesian inference
16 Unfair dice A die with unknown distribution, $\mathbf{p} = (p_1, p_2, p_3, p_4, p_5, p_6)$. We observe some throws and want to estimate $\mathbf{p}$. Say we observe 4, 6, 6, 4, 6, 3. Perhaps $\mathbf{p} = (0, 0, \frac{1}{6}, \frac{2}{6}, 0, \frac{3}{6})$?
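That guess is just the normalised empirical counts of the throws; a minimal sketch in plain Python:

```python
from collections import Counter

throws = [4, 6, 6, 4, 6, 3]
counts = Counter(throws)

# Normalised counts per face: the estimate suggested on the slide.
p_hat = [counts.get(face, 0) / len(throws) for face in range(1, 7)]
print(p_hat)   # [0.0, 0.0, 0.1667, 0.3333, 0.0, 0.5] (approximately)
```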
17 Maximum likelihood Maximum likelihood solution, $\theta_{ML} = \arg\max_\theta p(D \mid \theta)$. Easily leads to overfitting. We want to incorporate some prior belief or knowledge about our parameters.
18 Bayes theorem Bayes theorem: for any two random variables $X$ and $Y$, $p(X \mid Y) = \frac{p(Y \mid X)\,p(X)}{p(Y)}$. Proof: from the chain rule, $p(X, Y) = p(Y \mid X)\,p(X) = p(X \mid Y)\,p(Y)$. Divide both sides by $p(Y)$.
19 Disease test A test for a disease is 99% accurate. 1 in 1000 people have the disease. You tested positive. What is the probability that you have the disease?
20 Disease test Let $X$ = disease and $Y$ = positive. We want to know $p(X \mid Y)$, the probability of disease given a positive test. From Bayes, $p(X \mid Y) = \frac{p(Y \mid X)\,p(X)}{p(Y)}$. We have $p(Y \mid X) = 0.99$ and $p(X) = 0.001$. Then $p(Y) = p(Y, X) + p(Y, \neg X) = p(Y \mid X)\,p(X) + p(Y \mid \neg X)\,p(\neg X) = 0.99 \times 0.001 + 0.01 \times 0.999 = 0.01098$. So $p(X \mid Y) = \frac{0.00099}{0.01098} \approx 0.09$.
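The same calculation as a short sketch, reading the slide's 99% accuracy as both the sensitivity and one minus the false-positive rate:

```python
p_pos_given_disease = 0.99   # sensitivity: 99% accurate on the sick
p_pos_given_healthy = 0.01   # false-positive rate: 1% on the healthy
p_disease = 0.001            # base rate: 1 in 1000

# Marginal rule: p(positive), summing over disease status.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes theorem: p(disease | positive).
print(p_pos_given_disease * p_disease / p_pos)   # ~0.0902
```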
21 Bayes theorem for inference We want to find the best parameters $\theta$ for our model after observing the data $D$. ML overfits by using only $p(D \mid \theta)$. We need some way of using prior belief about the parameters. Consider $p(\theta \mid D)$: our belief about the parameters after observing the data.
22 Bayesian inference Using Bayes theorem, $p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)}$. Prior $p(\theta)$, likelihood $p(D \mid \theta)$, posterior $p(\theta \mid D)$. Maximum a posteriori (MAP): $\theta_{MAP} = \arg\max_\theta p(\theta \mid D) = \arg\max_\theta p(D \mid \theta)\,p(\theta)$. Bayesian inference: find the full posterior distribution $p(\theta \mid D)$.
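A minimal sketch contrasting ML and MAP on an illustrative coin example (the Beta(2, 2) prior and the data are my own choices, not from the slides): three heads out of three tosses drive ML to $\theta = 1$, while the prior pulls MAP back towards $\frac{1}{2}$.

```python
heads, tosses = 3, 3    # illustrative data: three heads in a row
a, b = 2.0, 2.0         # illustrative Beta(a, b) prior over theta

theta_ml = heads / tosses                            # argmax p(D|theta) = 1.0
theta_map = (heads + a - 1) / (tosses + a + b - 2)   # Beta-binomial MAP mode
print(theta_ml, theta_map)                           # 1.0 0.8
```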
23 Intractability In our model we define the prior $p(\theta)$ and the likelihood $p(D \mid \theta)$. How do we find $p(D)$? $p(D) = \int_\theta p(D, \theta)\,d\theta = \int_\theta p(D \mid \theta)\,p(\theta)\,d\theta$. BUT: the space of possible values for $\theta$ is huge! Hence approximate Bayesian inference.
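For a model small enough, the evidence integral can be approximated by averaging the likelihood over draws from the prior. A sketch on an illustrative coin-flip model (not from the slides), where the exact answer is known:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
heads, tosses = 7, 10                      # illustrative coin data

# p(D) = int p(D|theta) p(theta) dtheta, approximated by averaging the
# likelihood over draws from a Beta(1, 1), i.e. uniform, prior.
thetas = rng.beta(1.0, 1.0, size=100_000)
liks = comb(tosses, heads) * thetas**heads * (1 - thetas)**(tosses - heads)
print(liks.mean())                         # ~1/11, the exact evidence here
```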
24 Latent Dirichlet Allocation Generative process: Draw document-to-topic distributions, $\theta_d \sim \text{Dir}(\alpha)$ ($d = 1, \ldots, D$). Draw topic-to-word distributions, $\phi_t \sim \text{Dir}(\beta)$ ($t = 1, \ldots, T$). For each of the $N$ words in each of the $D$ documents: draw a topic from the document's topic distribution, $z_{dn} \sim \text{Multinomial}(\theta_d)$; draw a word from that topic's word distribution, $w_{dn} \sim \text{Multinomial}(\phi_{z_{dn}})$. Note that our model's data is the words $w_{dn}$ we observe, and the parameters are the $\theta_d$ and $\phi_t$. We have placed Dirichlet priors over the parameters, with their own parameters $\alpha$ and $\beta$. A simulation of this process is sketched below.
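A minimal simulation of the generative process (all sizes and hyperparameter values are illustrative; single categorical draws stand in for the one-sample multinomials):

```python
import numpy as np

rng = np.random.default_rng(0)
D, T, V, N = 5, 3, 20, 50        # documents, topics, vocabulary, words/doc
alpha, beta = 0.5, 0.1           # illustrative hyperparameter values

theta = rng.dirichlet(np.full(T, alpha), size=D)   # theta_d ~ Dir(alpha)
phi = rng.dirichlet(np.full(V, beta), size=T)      # phi_t   ~ Dir(beta)

docs = []
for d in range(D):
    words = []
    for n in range(N):
        z = rng.choice(T, p=theta[d])           # z_dn ~ Cat(theta_d)
        words.append(rng.choice(V, p=phi[z]))   # w_dn ~ Cat(phi_{z_dn})
    docs.append(words)
print(docs[0][:10])                             # first ten words of doc 0
```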
25 Hyperparameters In our model we have: random variables, both observed ones, like the words, and latent ones, like the topics; parameters, the document-to-topic distributions $\theta_d$ and topic-to-word distributions $\phi_t$; and hyperparameters, the parameters of the prior distributions over our parameters, so $\alpha$ and $\beta$.
26 Conjugacy For a specific parameter $\theta_i$, the prior $p(\theta_i)$ is conjugate to the likelihood $p(D \mid \theta_i)$ if the posterior of the parameter, $p(\theta_i \mid D)$, is of the same family as the prior. E.g. the Dirichlet distribution is the conjugate prior for the categorical distribution.
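A sketch of that Dirichlet-categorical conjugacy, reusing the die throws from the earlier slide (the uniform Dir(1, ..., 1) prior is an illustrative choice): the posterior is again a Dirichlet, with the observed counts added to $\boldsymbol{\alpha}$.

```python
import numpy as np

alpha = np.ones(6)                # illustrative Dir(1,...,1) prior over faces
throws = [4, 6, 6, 4, 6, 3]       # observations from the unfair-dice slide

counts = np.bincount(throws, minlength=7)[1:]   # counts for faces 1..6
alpha_post = alpha + counts                     # posterior: Dir(alpha + counts)

print(alpha_post)                      # [1. 1. 2. 3. 1. 4.]
print(alpha_post / alpha_post.sum())   # posterior mean of each p_i
```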
27 Probabilistic models; Probability theory; Latent variable models; Bayesian inference; Bayes theorem; Latent Dirichlet Allocation; Conjugacy; Graphical models; Gibbs sampling; Variational Bayesian inference
28 Bayesian network Nodes are random variables (latent or observed). Arrows indicate dependencies. The distribution of a node only depends on its parents (and things further down the network). Plates indicate repetition of variables. [Diagram: nodes $A$, $B$, $C$, $D$ with $A \to C$, $B \to C$, $C \to D$.] Here $p(D \mid A, B, C) = p(D \mid C)$, BUT $p(C \mid A, B, D) \neq p(C \mid A, B)$.
29 [Same network: $A \to C$, $B \to C$, $C \to D$.] Recall Bayes: $p(X \mid Y, Z) = \frac{p(Y \mid X, Z)\,p(X \mid Z)}{p(Y \mid Z)}$. Then $p(C \mid A, B, D) = \frac{p(D \mid A, B, C)\,p(C \mid A, B)}{p(D \mid A, B)} = \frac{p(D \mid C)\,p(C \mid A, B)}{\int_C p(D \mid A, B, C)\,p(C \mid A, B)\,dC} = \frac{p(D \mid C)\,p(C \mid A, B)}{\int_C p(D \mid C)\,p(C \mid A, B)\,dC} = \frac{p(D \mid C)\,p(C \mid A, B)}{\int_C p(C, D \mid A, B)\,dC}$.
30 Latent Dirichlet Allocation Generative process: Draw document-to-topic distributions, $\theta_d \sim \text{Dir}(\alpha)$ ($d = 1, \ldots, D$). Draw topic-to-word distributions, $\phi_t \sim \text{Dir}(\beta)$ ($t = 1, \ldots, T$). For each of the $N$ words in each of the $D$ documents: draw a topic from the document's topic distribution, $z_{dn} \sim \text{Multinomial}(\theta_d)$; draw a word from that topic's word distribution, $w_{dn} \sim \text{Multinomial}(\phi_{z_{dn}})$.
31 Latent Dirichlet Allocation [figure]
32 Probabilistic models; Probability theory; Latent variable models; Bayesian inference; Bayes theorem; Latent Dirichlet Allocation; Conjugacy; Graphical models; Gibbs sampling; Variational Bayesian inference
33 Gibbs sampling We want to approximate $p(\theta \mid D)$ for parameters $\theta = (\theta_1, \ldots, \theta_N)$. We cannot compute this exactly, but maybe we can draw samples from it. We can then use these samples to estimate the distribution, or to estimate its expectation and variance.
34 Gibbs sampling For each parameter $\theta_i$, write down its distribution conditional on the data and the values of the other parameters, $p(\theta_i \mid \theta_{-i}, D)$. If our model is conjugate, this gives closed-form expressions (meaning this distribution is of a known form, e.g. Dirichlet, so we can draw from it). Drawing new values for the parameters $\theta_i$ in turn will eventually converge to give draws from the true posterior, $p(\theta \mid D)$. In practice we discard the early samples (burn-in) and keep only every $k$th sample afterwards (thinning); see the sketch below.
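A minimal sketch of the mechanics, alternating conditional draws with burn-in and thinning, on an illustrative toy target that is not from the slides: a bivariate normal with correlation $\rho$, chosen because both full conditionals are one-dimensional normals we can sample directly.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                                 # target correlation
n_iter, burn_in, thin = 20_000, 1_000, 10

x, y = 0.0, 0.0
samples = []
for i in range(n_iter):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # draw x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # draw y | x
    if i >= burn_in and i % thin == 0:             # discard burn-in, thin
        samples.append((x, y))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])       # ~0.8, matching rho
```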
35 Latent Dirichlet Allocation We want to draw samples from $p(\theta, \phi, z \mid w)$, where $w = \{w_{d,n}\}_{d=1..D,\,n=1..N}$, $z = \{z_{d,n}\}_{d=1..D,\,n=1..N}$, $\theta = \{\theta_d\}_{d=1..D}$, and $\phi = \{\phi_t\}_{t=1..T}$.
36 Latent Dirichlet Allocation For Gibbs sampling, we need the conditional distributions: $p(\theta_d \mid \theta_{-d}, \phi, z, w)$, $p(\phi_t \mid \theta, \phi_{-t}, z, w)$, and $p(z_{d,n} \mid \theta, \phi, z_{-(d,n)}, w)$.
37 Latent Dirichlet Allocation These are relatively straightforward to derive. For example: $p(z_{d,n} \mid \theta, \phi, z_{-(d,n)}, w) = \frac{p(w \mid \theta, \phi, z)\,p(z_{d,n} \mid \theta, \phi, z_{-(d,n)})}{p(w \mid \theta, \phi, z_{-(d,n)})} \propto p(w_{d,n} \mid \theta, \phi, z)\,p(z_{d,n} \mid \theta, \phi, z_{-(d,n)}) = p(w_{d,n} \mid z_{d,n}, \phi_{z_{d,n}})\,p(z_{d,n} \mid \theta_d) = \phi_{z_{d,n}, w_{d,n}}\,\theta_{d, z_{d,n}}$. The first step follows from Bayes theorem, the second from the fact that some terms do not depend on $z_{d,n}$, the third from independence in our Bayesian network, and the fourth from our model's definition of those distributions. We then simply compute these probabilities for all values of $z_{d,n}$, normalise them to sum to 1, and draw a new value with those probabilities (see the sketch below)!
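That final update is just a categorical draw; a minimal sketch, where `theta_d` and `phi` hold the current values in the Gibbs sweep (the function and variable names are my own):

```python
import numpy as np

def sample_z_dn(rng, theta_d, phi, w_dn):
    # theta_d: current theta_d, shape (T,); phi: current phi, shape (T, V).
    probs = phi[:, w_dn] * theta_d   # phi_{t, w_dn} * theta_{d, t} per topic t
    probs /= probs.sum()             # normalise to sum to 1
    return rng.choice(len(theta_d), p=probs)
```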
38 Collapsed Gibbs sampler In practice we actually want to find $p(z \mid w)$, as we can estimate the $\theta_d$ and $\phi_t$ from the topic assignments. We integrate out the other parameters. This is called a collapsed Gibbs sampler.
39 Probabilistic models; Probability theory; Latent variable models; Bayesian inference; Bayes theorem; Latent Dirichlet Allocation; Conjugacy; Graphical models; Gibbs sampling; Variational Bayesian inference
40 Variational Bayesian inference We want to approximate $p(\theta \mid D)$ for parameters $\theta = (\theta_1, \ldots, \theta_N)$. We cannot compute this exactly, but maybe we can approximate it. Introduce a new distribution $q(\theta)$ over the parameters, called the variational distribution. We can choose the exact form of $q$ ourselves, giving us a set of variational parameters $\nu$, i.e. we have $q(\theta \mid \nu)$. We then tweak $\nu$ so that $q$ is as similar to $p$ as possible! We want $q$ to be easy to compute; we normally achieve this by assuming each of the parameters $\theta_i$ is independent in the posterior, the mean-field assumption: $q(\theta \mid \nu) = \prod_i q(\theta_i \mid \nu_i)$.
41 KL-divergence We need some way of measuring the similarity between distributions. We use the KL-divergence between $q$ and $p$: $D_{KL}(q \,\|\, p) = \int_\theta q(\theta) \log \frac{q(\theta)}{p(\theta \mid D)}\,d\theta$.
42 ELBO We can show that minimising $D_{KL}(q \,\|\, p)$ is equivalent to maximising something called the Evidence Lower Bound (ELBO), $\mathcal{L}$: $\mathcal{L} = \int_\theta q(\theta) \log p(\theta, D)\,d\theta - \int_\theta q(\theta) \log q(\theta)\,d\theta = \mathbb{E}_q[\log p(\theta, D)] - \mathbb{E}_q[\log q(\theta)]$. Once we choose the precise distribution for $q$, we can write down this expression. We then optimise by taking the derivative w.r.t. $\nu$, setting it to 0, and solving, which gives the variational parameter updates.
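To make the ELBO concrete, here is a Monte Carlo estimate of $\mathcal{L}$ for an illustrative model (a normal prior and likelihood with a normal $q$; the model, data, and variational parameters are all my own choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal_pdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

data = np.array([1.2, 0.8, 1.5])    # illustrative observations
mu_q, sigma_q = 1.0, 0.5            # variational parameters nu

thetas = rng.normal(mu_q, sigma_q, size=100_000)   # theta ~ q(theta | nu)
log_joint = log_normal_pdf(thetas, 0.0, 1.0)       # log p(theta), N(0,1) prior
for x in data:                                     # + log p(D | theta)
    log_joint += log_normal_pdf(x, thetas, 1.0)
log_q = log_normal_pdf(thetas, mu_q, sigma_q)      # log q(theta | nu)

print(np.mean(log_joint - log_q))   # Monte Carlo estimate of the ELBO
```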
43 Convergence We update the variational parameters $\nu$ in turn, and alternate updates until the value of the ELBO converges. After convergence, our estimate of the posterior distribution of a parameter $\theta_i$ is $q(\theta_i \mid \nu_i)$.
44 Choosing q Our choice of $q$ determines how good our approximation to $p$ is. If our model has conjugacy, we simply choose for $q(\theta_i)$ the same family of distribution as the Gibbs sampling conditional $p(\theta_i \mid \theta_{-i}, D)$. We then obtain very nice updates. In non-conjugate models we need to use gradient-based optimisation of the ELBO instead!
45 Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge 17 November 2015