Probabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April
2 Today
- What is a probabilistic graphical model, and why is it useful?
- Bayesian networks
- Basic inference
- Generative models
- Fancier inference (when some variables are unobserved)
- How to learn model parameters from data
- Undirected graphical models
- Inference (belief propagation)
- New directions in PGM research, and wrapping up
3 What I cannot create, I do not understand. -Richard Feynman
4 Generative models vs. discriminative models
Discriminative models learn P(Y | X). It's easier and requires less data, but is only useful for one particular task: given X, what is P(Y | X)? [Examples: logistic regression, feed-forward or convolutional neural networks, etc.]
Generative models instead learn P(Y, X) completely. Once they do that, they can compute everything!
P(X) = Σy P(X, Y)
P(Y) = Σx P(X, Y)
P(Y | X) = P(Y, X) / Σy P(Y, X)
[Caveat: no free lunch!! You want to answer every question under the sun? You need more data!]
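As a concrete illustration, here is a minimal sketch with a made-up 2x2 joint table, showing that the three identities above are just sums and ratios over the joint:

```python
import numpy as np

# A hypothetical joint P(Y, X) over binary Y (rows) and binary X (columns).
joint = np.array([[0.30, 0.10],
                  [0.20, 0.40]])  # entries sum to 1

p_x = joint.sum(axis=0)           # P(X)   = Σy P(X, Y)
p_y = joint.sum(axis=1)           # P(Y)   = Σx P(X, Y)
p_y_given_x = joint / p_x         # P(Y|X) = P(Y, X) / Σy P(Y, X), per column

print(p_x, p_y, p_y_given_x[:, 0])  # e.g., the last one is P(Y | X=0)
```

Once the joint table is in hand, every query is one marginalization or one ratio; that is the sense in which a generative model "can compute everything."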
5 Probabilistic graphical models: the main classic approach to modeling P(Y, X) = P(Y1, ..., YM, X1, ..., XD)
6-8 Some calculations on space. Imagine each variable is binary: P(Y1, ..., YM, X1, ..., XD). How many parameters do we need to estimate from data to specify P(Y, X)? Answer: 2^(M+D) - 1. (With just 20 binary variables, that is already 2^20 - 1, about a million parameters.)
9 Too many parameters! What can be done?
1) Look for conditional independences
2) Use the chain rule for probabilities to break P(Y, X) into smaller pieces
3) Rewrite P(Y, X) as a product of smaller factors
   a) Maybe you have more data for a subset of variables
4) Simplify some of the modeling assumptions to cut parameters
   a) e.g., assume the data is multivariate Gaussian
   b) e.g., assume conditional independencies even if they don't really always apply
10 Bayesian networks
Use the chain rule for probabilities: P(X1, ..., XD) = Πi P(Xi | X1, ..., Xi-1). This is always true, with no approximations or assumptions, so there is no reduction in the number of parameters either.
BNs add a conditional independence assumption: for some of the variables, P(Xi | X1, ..., Xi-1) is approximated by P(Xi | Subset of (X1, ..., Xi-1)). This subset of (X1, ..., Xi-1) is referred to as Parents(Xi).
This reduces the parameters (in the binary case, for instance) from 2^(i-1) to 2^|Parents(Xi)|.
11 Bayesian networks: parameter counts in the binary case, for the student network (X1: Difficulty, X2: Intelligence, X3: Grade, X4: SAT, X5: Letter of recommendation).

Variable           | Assumption                         | Raw chain rule          | BN chain rule
X1 (Difficulty)    | P(X1)                              | 1  for P(X1)            | 1 for P(X1)
X2 (Intelligence)  | P(X2 | X1) = P(X2)                 | 2  for P(X2|X1)         | 1 for P(X2)
X3 (Grade)         | P(X3 | X1,X2) = P(X3 | X1,X2)      | 4  for P(X3|X1,X2)      | 4 for P(X3|X1,X2)
X4 (SAT score)     | P(X4 | X1,X2,X3) = P(X4 | X2)      | 8  for P(X4|X1,X2,X3)   | 2 for P(X4|X2)
X5 (Letter)        | P(X5 | X1,X2,X3,X4) = P(X5 | X3)   | 16 for P(X5|X1,...,X4)  | 2 for P(X5|X3)
Total P(X1,...,X5) |                                    | 31 = 2^5 - 1            | 1+1+4+2+2 = 10
12 An example of a BN for SNPs (figure).
13 Benefits of Bayesian networks
1) Once estimated, they can answer any conditional or marginal query
   a) This is called inference
2) Fewer parameters to estimate!
3) We can start putting prior information into the network
4) We can incorporate LATENT (hidden/unobserved) variables based on how we or domain experts think variables might be related
5) Generating samples from the distribution becomes super easy (see the sketch after this list)
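On point 5: ancestral sampling just draws each variable after its parents have been drawn. A minimal sketch for the student network, where all CPT values are made-up numbers for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical CPTs for the student network (all numbers made up).
p_d = 0.4                                   # P(Difficulty = 1)
p_i = 0.3                                   # P(Intelligence = 1)
p_g = {(0, 0): 0.3, (0, 1): 0.9,            # P(Grade = 1 | D, I)
       (1, 0): 0.1, (1, 1): 0.5}
p_s = {0: 0.2, 1: 0.8}                      # P(SAT = 1 | I)
p_l = {0: 0.1, 1: 0.9}                      # P(Letter = 1 | G)

def sample():
    # Sample each variable given its (already sampled) parents.
    d = int(rng.random() < p_d)
    i = int(rng.random() < p_i)
    g = int(rng.random() < p_g[(d, i)])
    s = int(rng.random() < p_s[i])
    l = int(rng.random() < p_l[g])
    return d, i, g, s, l

samples = [sample() for _ in range(10_000)]
print(sum(s[4] for s in samples) / len(samples))  # Monte-Carlo estimate of P(Letter=1)
```

One pass in topological order gives an exact sample from the joint, which is why sampling from a BN is "super easy."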
14 Inference in Bayesian networks. Query types:
1) Conditional probabilities: P(Y | X) = ?, or P(Xi = a | X\i = B, Y = C) = ?
2) Maximum a posteriori estimates: argmax_xi P(Xi | X\i) = ?, or argmax_yi P(Yi | X) = ?
(Running example, the student network: X1: Difficulty, X2: Intelligence, X3: Grade, X4: SAT, X5: Letter of recommendation.)
15 Key operation: marginalization, P(X) = Σy P(X, Y).
P(X5 | X2=a) = ?
P(X5 | X2=a) = P(X5, X2=a) / P(X2=a)
P(X5, X2=a) = ΣX1,X3,X4 P(X1, X2=a, X3, X4, X5)
P(X2=a) = ΣX1,X3,X4,X5 P(X1, X2=a, X3, X4, X5)
16-22 Marginalize from the first parents (the roots) down to the query variable. This method is called sum-product or variable elimination.
23-25 Marginalization when P(X) = Σy P(X, Y). Query: P(X5 | X2=a) = ?
P(X5 | X2=a) = P(X5, X2=a) / P(X2=a)
P(X5, X2=a) = ΣX1,X3,X4 P(X1, X2=a, X3, X4, X5)
 = ΣX1,X3,X4 P(X1) P(X2=a) P(X3 | X1, X2=a) P(X4 | X2=a) P(X5 | X3)
 = P(X2=a) ΣX1,X3,X4 P(X1) P(X3 | X1, X2=a) P(X4 | X2=a) P(X5 | X3)
 = P(X2=a) ΣX1,X3 P(X1) P(X3 | X1, X2=a) P(X5 | X3) ΣX4 P(X4 | X2=a)
 = P(X2=a) ΣX1,X3 P(X1) P(X3 | X1, X2=a) P(X5 | X3)      [since ΣX4 P(X4 | X2=a) = 1]
 = P(X2=a) ΣX3 P(X5 | X3) ΣX1 P_{X2=a}(X3 | X1) P(X1)
 = P(X2=a) ΣX3 P(X5 | X3) f_{X2=a}(X3)
 = P(X2=a) g_{X2=a}(X5)
Therefore P(X5 | X2=a) = P(X5, X2=a) / P(X2=a) = g_{X2=a}(X5).
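The same elimination can be run in code. Here is a sketch with made-up CPTs for the student network, following the slide's elimination order (X4 first, then X1, then X3) and checking the answer against brute-force marginalization:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*shape):
    # Hypothetical CPT: the last axis indexes the child variable and sums to 1.
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

# Student network, all variables binary; CPT values are made up.
P1 = random_cpt(2)        # P(X1)
P2 = random_cpt(2)        # P(X2)
P3 = random_cpt(2, 2, 2)  # P(X3 | X1, X2), indexed [x1, x2, x3]
P4 = random_cpt(2, 2)     # P(X4 | X2),     indexed [x2, x4]
P5 = random_cpt(2, 2)     # P(X5 | X3),     indexed [x3, x5]

a = 1  # condition on X2 = a

# Eliminate X4, then X1, then X3, exactly as in the derivation.
sum_x4 = P4[a].sum()                        # ΣX4 P(X4 | X2=a) = 1
f = np.einsum('i,ik->k', P1, P3[:, a, :])   # f_{X2=a}(x3) = ΣX1 P(x1) P(x3 | x1, X2=a)
g = np.einsum('k,kl->l', f, P5)             # g_{X2=a}(x5) = ΣX3 f(x3) P(x5 | x3)
p_x5_given_x2 = g * sum_x4                  # already normalized: Σx5 g = 1

# Sanity check against brute-force marginalization of the full joint.
joint = np.einsum('i,j,ijk,jm,kl->ijkml', P1, P2, P3, P4, P5)  # axes (x1,x2,x3,x4,x5)
brute = joint[:, a].sum(axis=(0, 1, 2))
brute /= brute.sum()
assert np.allclose(p_x5_given_x2, brute)
print(p_x5_given_x2)
```

Each Σ produces a small intermediate factor (f, then g) instead of touching the full 2^5 joint, which is the whole point of variable elimination.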
27 Estimating the parameters of a Bayesian network: maximum likelihood estimation (also sometimes maximum pseudo-likelihood estimation).
28 How to estimate the parameters of a Bayesian network? (1) You have observed all Y, X variables and the dependency structure is known.
If you remember from other lectures:
Likelihood(D; Parameters) = Π_{Dj in data} P(Dj | Parameters)
 = Π_{Dj in data} Π_{Xij in Dj} P(Xij | Par(Xij), Parameters{Par(Xij) -> Xij})
 = Π_{i in variable set} Π_{Dj in data} P(Xij | Par(Xij), Parameters{Par(Xij) -> Xij})
 = Π_{i in variable set} (independent local terms, each a function of all observed Xij and Par(Xij))
MLE-Parameters{Par(Xi) -> Xi} = argmax (local likelihood of the observed Xij and Par(Xij) in the data!)
29 How to estimate the parameters of a Bayesian network? (1) You have observed all Y, X variables and the dependency structure is known.
If variables are discrete:
P(Xi = a | Parents(Xi) = B) = Count(Xi = a & Pa(Xi) = B) / Count(Pa(Xi) = B)
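A sketch of this counting rule for a single binary parent-child pair; the toy data and the optional Laplace smoothing parameter are assumptions added for illustration:

```python
import numpy as np

# Hypothetical dataset: each row is (parent value, child value), both binary.
data = np.array([[0, 0], [0, 1], [1, 1], [1, 1], [0, 0], [1, 0]])

def mle_cpt(child, parent, k_child=2, k_parent=2, alpha=0.0):
    # P(child = a | parent = b) = Count(child=a & parent=b) / Count(parent=b).
    # alpha > 0 gives Laplace smoothing for parent values never seen in the data.
    counts = np.zeros((k_parent, k_child)) + alpha
    for b, a in zip(parent, child):
        counts[b, a] += 1
    return counts / counts.sum(axis=1, keepdims=True)

cpt = mle_cpt(child=data[:, 1], parent=data[:, 0])
print(cpt)  # row b holds P(child | parent = b)
```

Because the likelihood factorizes into independent local terms (previous slide), this per-family counting is the global MLE, not just a heuristic.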
30 How to estimate the parameters of a Bayesian network? (1, continued)
If variables are continuous:
P(Xi = a | Parents(Xi) = B) = fit Some_PDF_Function(a, B)
31 How to estimate the parameters of a Bayesian network? (1, continued) In the continuous case, P(Xi = a | Parents(Xi) = B) = Some_PDF_Function(a, B), where the PDF can be a single multivariate Gaussian, a mixture of multivariate Gaussians, or a non-parametric density function.
32 How to estimate the parameters of a Bayesian network? (2) You have observed all Y, X variables, but the dependency structure is NOT known.
33 Structure learning when all variables are observed
1) Neighborhood selection via the Lasso: an L1-regularized regression per variable, learned using the other variables. Not necessarily a tree structure.
2) Tree learning via the Chow-Liu method (sketched below): for each variable pair, find the empirical distribution P(Xi, Xj) = Count(Xi, Xj)/M; compute the mutual information I(Xi, Xj); use I(Xi, Xj) as the edge weight in a graph, and learn the maximum spanning tree.
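A minimal Chow-Liu sketch, assuming made-up binary data and using networkx for the maximum spanning tree; the mutual-information estimator is the plug-in one built from empirical counts:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 4))        # hypothetical binary data, M=500, 4 variables
X[:, 1] = X[:, 0] ^ (rng.random(500) < 0.1)  # make variable 1 a noisy copy of variable 0

def mutual_information(xi, xj):
    # Plug-in I(Xi; Xj) from empirical counts P(xi, xj) = Count(xi, xj) / M.
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((xi == a) & (xj == b))
            p_a, p_b = np.mean(xi == a), np.mean(xj == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

G = nx.Graph()
n = X.shape[1]
for i in range(n):
    for j in range(i + 1, n):
        G.add_edge(i, j, weight=mutual_information(X[:, i], X[:, j]))

tree = nx.maximum_spanning_tree(G)           # the Chow-Liu tree
print(sorted(tree.edges()))
```

On this toy data the tree should keep the edge between variables 0 and 1, since that pair carries the highest mutual information.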
34 How to estimate the parameters of a Bayesian network? (3) You have unobserved variables, but the dependency structure is known. These are the most commonly used Bayesian networks these days!
35 In practice, Bayes nets are most used to inject priors and structure into the task. Example: modeling documents as a collection of topics, where each topic is a distribution over words (topic modeling via Latent Dirichlet Allocation).
36-37 In practice, Bayes nets are most used to inject priors and structure. Example: correcting for hidden confounders in expression data.
38 Estimation/inference when there are missing values
1) Sometimes P(observed) = Σ_unobserved P(observed & unobserved) has a closed form!
   a) Combining Gaussian conditionals and priors usually leads to Gaussian marginals (closed form)
   b) If your prior distribution on the latent variables is conjugate to the conditional distribution, you get a closed form
      i) There are lots of known pairs of distributions: Gaussian and Gaussian; Dirichlet and multinomial; Gamma and Gamma; etc.
2) Expectation maximization (EM); see the sketch after this list
   a) Initialize parameters randomly
   b) E step: infer the most likely values of the unobserved variables (MAP estimate)
   c) M step: re-estimate the parameters (MLE)
   d) Iterate (b) and (c) until the parameters converge
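A minimal EM sketch, assuming a made-up mixture of two biased coins whose identities are latent. Note that this uses the standard soft E step (posterior responsibilities) rather than the hard MAP assignment described on the slide:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: 200 sessions of 10 flips each; which coin was used is latent.
true_p = np.array([0.2, 0.8])
z_true = rng.integers(0, 2, size=200)
x = (rng.random((200, 10)) < true_p[z_true][:, None]).sum(axis=1)  # heads out of 10

p = np.array([0.4, 0.6])     # rough initialization of the two coin biases
mix = np.array([0.5, 0.5])   # mixture weights
for _ in range(50):
    # E step: responsibility of each coin for each observed head count
    # (soft posterior weights instead of a hard MAP assignment).
    like = mix * (p ** x[:, None]) * ((1 - p) ** (10 - x[:, None]))
    resp = like / like.sum(axis=1, keepdims=True)
    # M step: weighted MLE for the biases and mixture weights.
    p = (resp * x[:, None]).sum(axis=0) / (resp.sum(axis=0) * 10)
    mix = resp.mean(axis=0)

print(p, mix)  # should approach the true biases (up to label swap)
```

Each iteration provably does not decrease the likelihood, which is why the loop in (d) converges.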
39 Estimation/inference when there are missing values (continued)
3) Gibbs sampling or MCMC (see the sketch after this list)
   a) Initialize randomly
   b) Sample each variable anew from P(xi | everything else)
   c) Burn-in: repeat over the variables and draw thousands of samples sequentially
   d) Eventually (it's proven) you'll be sampling from the true distribution! Use those samples to compute anything you want. (Note that in those samples, all variables are observed.)
4) Variational inference (approximate with another model which HAS a closed form)
   a) Find a functional mapping from the probability under the original Bayesian model to the probability under a simpler model (per data point)
   b) Estimation = minimize the gap between the two distributions
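A minimal Gibbs sampler, assuming a made-up 2x2 joint over binary variables A and B; alternating draws from the two full conditionals reproduce the joint:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical joint P(A, B) over two binary variables.
joint = np.array([[0.35, 0.05],
                  [0.10, 0.50]])

def sample_conditional(unnormalized):
    # Draw one binary value from an unnormalized conditional distribution.
    p = unnormalized / unnormalized.sum()
    return rng.choice(2, p=p)

a, b = 0, 0                                # arbitrary initialization
samples = []
for t in range(20_000):
    a = sample_conditional(joint[:, b])    # A ~ P(A | B=b)
    b = sample_conditional(joint[a, :])    # B ~ P(B | A=a)
    if t > 1_000:                          # discard burn-in samples
        samples.append((a, b))

emp = np.zeros((2, 2))
for a, b in samples:
    emp[a, b] += 1
print(emp / emp.sum())                     # should approach the true joint table
```

The empirical table from the post-burn-in samples approaches the true joint, which is the sense in which Gibbs eventually samples "from the true distribution."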
40 Example of EM for estimating hidden Markov model parameters. (Figure: a chain of hidden states Y1 -> Y2 -> ... -> Y6, each Yi emitting an observation Xi.)
P(X, Y) = P(Y1) P(X1 | Y1) Π_{i>=2} P(Yi | Yi-1) P(Xi | Yi)
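To make the factorization concrete, here is a direct evaluation of the HMM joint above; the initial, transition, and emission probabilities are made-up numbers:

```python
import numpy as np

# Hypothetical HMM parameters (2 hidden states, 2 output symbols; numbers made up).
pi = np.array([0.6, 0.4])                 # P(Y1)
A  = np.array([[0.7, 0.3],                # P(Yi | Yi-1), rows index Yi-1
               [0.2, 0.8]])
B  = np.array([[0.9, 0.1],                # P(Xi | Yi), rows index Yi
               [0.3, 0.7]])

def hmm_joint(y, x):
    # P(X, Y) = P(Y1) P(X1|Y1) Π_{i>=2} P(Yi|Yi-1) P(Xi|Yi)
    p = pi[y[0]] * B[y[0], x[0]]
    for i in range(1, len(y)):
        p *= A[y[i - 1], y[i]] * B[y[i], x[i]]
    return p

print(hmm_joint(y=[0, 0, 1, 1, 1, 0], x=[0, 1, 1, 0, 1, 0]))
```

EM for HMMs (Baum-Welch) repeatedly computes expectations of the hidden Yi under this joint (E step) and re-estimates pi, A, B (M step).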
41 Gibbs Sampling for all variants of models. Let your imagination go wild!
42 Problems with Bayesian networks
- The prior has to take the form of a conditional probability. What if the variables are symmetric?
- Bayes nets can't have loops, e.g., a two-node cycle A -> B -> A.
- What if the relationship can be described in an un-normalized way (i.e., as an energy)?
43 Undirected graphical models (aka Markov random fields)
These come from the world of statistical physics, where they model energies and electron spins.
Define the joint probability as a normalized product of factors (i.e., energies) over cliques of variables:
P(X1, ..., XD) = (1/Z) Π_{Ci in {subsets of X1..XD}} f(Ci)
Z = Σ_{x1,x2,...,xD} Π_{Ci} f(Ci)
In practice, people often use pairwise and node-wise factors only, often called edge and node potentials.
The main problem with these models: how do we estimate Z?!
44 Conditional independencies in Markov random fields. We assume one edge for every pairwise potential. By the definition of undirected graphical models: a variable Xi is conditionally independent of another variable Xj if, on every path that goes from Xi to Xj, at least one variable is observed.
45 Example: Gaussian graphical models. These are equivalent to a multivariate Gaussian distribution whose precision matrix (the inverse covariance) is nonzero exactly on the edges of the graph. They easily allow conditional independence decisions, especially during inference.
46 Computing Z (the normalization factor)
Note: Z is a function of the parameters, not of the samples. So without Z you can still compute some conditional probabilities, but you need Z to compute MAP estimates and actual probabilities.
Just like with Bayes nets, you can use the sum-product method to compute Z.
47 Factor graph representation of MRFs
P(X) = (1/Z) f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
Z = Σ_{x1,...,x6} f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6)
  = Σ_{x1,x2} f1(x1,x2) Σ_{x3,x4} f2(x2,x3,x4) (Σ_{x5} f3(x3,x5)) (Σ_{x6} f4(x4,x6))
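The distributive-law rearrangement above is what makes Z tractable. Here is a sketch with made-up factor tables, checking the staged sums against brute-force enumeration:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K = 2  # binary variables x1..x6

# Hypothetical nonnegative factor tables.
f1 = rng.random((K, K))          # f1(x1, x2)
f2 = rng.random((K, K, K))       # f2(x2, x3, x4)
f3 = rng.random((K, K))          # f3(x3, x5)
f4 = rng.random((K, K))          # f4(x4, x6)

# Brute force: sum the factor product over all 2^6 joint states.
Z_brute = 0.0
for x1, x2, x3, x4, x5, x6 in itertools.product(range(K), repeat=6):
    Z_brute += f1[x1, x2] * f2[x2, x3, x4] * f3[x3, x5] * f4[x4, x6]

# Distributive law (as on the slide): push each sum inside the product.
m5 = f3.sum(axis=1)                        # Σ_x5 f3(x3, x5): a function of x3
m6 = f4.sum(axis=1)                        # Σ_x6 f4(x4, x6): a function of x4
m34 = np.einsum('bcd,c,d->b', f2, m5, m6)  # Σ_{x3,x4} f2*m5*m6: a function of x2
Z_factored = np.einsum('ab,b->', f1, m34)  # Σ_{x1,x2} f1*m34

assert np.isclose(Z_brute, Z_factored)
print(Z_brute, Z_factored)
```

The staged version touches only small tables instead of all 2^6 joint states, and the gap grows exponentially as variables are added.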
48 Belief propagation algorithm. Kschischang, Frank R., Brendan J. Frey, and H.-A. Loeliger. "Factor graphs and the sum-product algorithm." IEEE Transactions on Information Theory (2001).
49 Some notes on belief propagation / inference in MRFs
- If the structure doesn't have a loop, the results are exact (see the sketch after this list).
- If the structure is loopy, people still use loopy BP to infer Z: keep passing messages until the messages converge. Some theoretical properties of the convergence exist.
- Sometimes messages don't have a closed form. Use approximations to keep them within a closed form: e.g., if the D incoming messages are mixtures of K Gaussians, the outgoing message would be a mixture of DK Gaussians; re-approximate it with K new Gaussians. Variants of this method exist, such as expectation propagation.
- If you replace sum with max, you can get MAP estimates at the same time complexity.
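A minimal sum-product pass on a 3-node chain (a tree, so the result is exact), with made-up potentials and a brute-force check:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3  # each variable takes K states

# Hypothetical potentials for a chain x1 - x2 - x3.
phi = [rng.random(K) for _ in range(3)]   # node potentials
psi12 = rng.random((K, K))                # edge potential psi(x1, x2)
psi23 = rng.random((K, K))                # edge potential psi(x2, x3)

# Sum-product messages into x2 from both neighbors.
m1_to_2 = psi12.T @ phi[0]                # Σ_x1 phi1(x1) psi12(x1, x2)
m3_to_2 = psi23 @ phi[2]                  # Σ_x3 phi3(x3) psi23(x2, x3)
belief2 = phi[1] * m1_to_2 * m3_to_2
belief2 /= belief2.sum()                  # normalize to get P(x2)

# Brute-force check: marginalize the full joint.
joint = np.einsum('a,b,c,ab,bc->abc', phi[0], phi[1], phi[2], psi12, psi23)
marg2 = joint.sum(axis=(0, 2))
marg2 /= marg2.sum()

assert np.allclose(belief2, marg2)
print(belief2)
```

Replacing the sums in the two messages with maxima would turn this into max-product and yield the MAP configuration instead of the marginal, at the same cost.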
50 Related topics (no time to cover)
- Generative adversarial networks: another method to generate samples, but without factorizing the probability. Useful when conditional independencies are bad assumptions, i.e., for highly correlated data like images, sounds, etc.
- Deep variational inference: make the function that maps between the two distributions more powerful, and optimize it via gradient descent.
- Probabilistic programming!
- Nonparametric models (Dirichlet processes) and kernel-based graphical models.
- Causal inference and Bayesian networks.
51 Back to the big picture
PGMs give you a full model of the task:
- You can inject prior information into your model.
- You can use partial data for better estimation.
- They give you justifications for your results, are easy to interpret, and allow humans to form hypotheses.
- If your data changes, you can adjust parts of the model while re-estimating other parts.
This comes with costs:
- You're making independence assumptions, which are often wrong.
- You're multiplying a ton of factors, so errors can grow exponentially.
- Inference can be slow if you need sampling.