Decision Tree Fields
1 Sebastian Nowozin, Carsten Rother, Shai Bagon, Toby Sharp, Bangpeng Yao, Pushmeet Kohli. Barcelona, 8th November 2011
2 Introduction Random Fields in Computer Vision Markov Random Fields (MRF) (Kindermann and Snell, 1980), (Li, 1995), (Blake, Kohli, Rother, 2011) Conditional Random Fields (CRF) (Lafferty, McCallum, Pereira, 2001) Structured prediction of multiple dependent variables
3 Introduction CRFs: How do we use them? x E(y_i, x) y_i y_j E(y_i, y_j, x) Factor graph notation (Kschischang, Frey, Loeliger, 1998) x: observed image; y_i, y_j: dependent variables at pixels i and j
4 Introduction CRFs: How do we use them? x E(y_i, x) y_i y_j E(y_i, y_j, x) Unary energy E(y_i, x): machine learning (SVM, Boosting, Random Forests, etc.)
5 Introduction CRFs: How do we use them? x E(y_i, x) y_i y_j E(y_i, y_j, x) Pairwise energy E(y_i, y_j, x): generalized Potts, image-independent; contrast-sensitive smoothing (e.g. GrabCut, TextonBoost): E(y_i, y_j, x) = exp(−α ||x_i − x_j||^2)
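The contrast-sensitive smoothing term above can be sketched in a few lines. The function name, the pixel color vectors, and α = 1 are illustrative, not from the slides:

```python
import numpy as np

def contrast_sensitive_energy(y_i, y_j, x_i, x_j, alpha=1.0):
    """Potts-style pairwise energy: zero when the labels agree, and a
    penalty that decays with local image contrast when they disagree."""
    if y_i == y_j:
        return 0.0
    return float(np.exp(-alpha * np.sum((x_i - x_j) ** 2)))

# Strong color edge between pixels -> small penalty for a label change.
e_edge = contrast_sensitive_energy(0, 1, np.array([0.9, 0.1, 0.1]),
                                         np.array([0.1, 0.1, 0.9]))
# Flat region -> penalty near 1, discouraging a label change.
e_flat = contrast_sensitive_energy(0, 1, np.array([0.5, 0.5, 0.5]),
                                         np.array([0.5, 0.5, 0.5]))
assert e_edge < e_flat
```

This is the sense in which the smoothing is "contrast-sensitive": the energy only penalizes label discontinuities where the image itself is smooth.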
6 Introduction CRFs: How do we use them? x E(y_i, x) E(y_i, y_k, x) y_i y_j y_k ???
7 Introduction CRFs: How do we use them? x E(y_i, x) E(y_i, y_j, y_k, x) y_i y_j y_k ???
8 Introduction Decision Trees in Computer Vision Random Forests (Breiman, MLJ 2001) Non-parametric, infinite model capacity. Fast inference and training, parallelizable (Shotton et al., 2008, 2011), (Saffari et al., 2009), (Gall and Lempitsky, 2009), etc. No structured prediction
9 Introduction Contributions 1. Learn image-dependent interactions 2. Combine random fields and decision trees 3. Efficient training 4. Superior empirical performance
10 Decision Tree Classifiers
11 Decision Tree Classifiers (cont)
12 Decision Trees for Image Labeling
13 Decision Trees for Image Labeling (cont) Apply the decision tree to each pixel independently
15 Decision Tree Field (DTF) Example
16 Unary Factor Example x E(y_i, x) y_i x: entire observed image; y_i: prediction at pixel i, y_i ∈ {1, 2, 3, 4}; E(y_i, x): energy function
21 Unary Factor Example E(y_i, x) = Σ_{q ∈ Path(x)} w(q, y_i)
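The unary energy above sums a learned weight at every node along the root-to-leaf path that the image features select. A minimal sketch, with a hypothetical `Node` layout and invented weights:

```python
# Hypothetical node layout: each node holds a split test and one weight
# per label; the unary energy is the sum of the weights of every node
# visited while routing the pixel's features down the tree.
class Node:
    def __init__(self, weights, test=None, left=None, right=None):
        self.weights = weights      # one weight per label y_i
        self.test = test            # feature test; None for leaf nodes
        self.left, self.right = left, right

def unary_energy(root, features, label):
    energy, node = 0.0, root
    while node is not None:
        energy += node.weights[label]
        if node.test is None:       # reached a leaf
            break
        node = node.left if node.test(features) else node.right
    return energy

# Tiny two-level tree over 2 labels; the split test is image-dependent.
tree = Node([0.0, 0.0], test=lambda f: f[0] < 0.5,
            left=Node([-1.0, 1.0]), right=Node([1.0, -1.0]))
assert unary_energy(tree, [0.2], label=0) == -1.0   # routed left
assert unary_energy(tree, [0.8], label=0) == 1.0    # routed right
```

Note that the weights live in every node, not only the leaves, so deeper nodes refine the energy contributed by their ancestors.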
22 Pairwise Factor Example x E(y_i, x) y_i y_j E(y_i, y_j, x)
27 Pairwise Factor Example E(y_i, y_j, x) = Σ_{q ∈ Path(x)} w(q, y_i, y_j)
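The pairwise factor uses the same image-dependent routing; only the per-node weights become an L × L table over label pairs. A sketch with invented weight tables:

```python
import numpy as np

def pairwise_energy(path_nodes, y_i, y_j):
    # path_nodes: the L x L weight tables of the nodes on Path(x);
    # the energy is the sum of the (y_i, y_j) entries along the path.
    return sum(w[y_i, y_j] for w in path_nodes)

root_w = np.zeros((2, 2))
leaf_w = np.array([[-1.0, 0.5],
                   [0.5, -1.0]])   # attractive diagonal, repulsive off-diagonal
path = [root_w, leaf_w]            # Path(x) selected by image-dependent splits
assert pairwise_energy(path, 0, 0) == -1.0
assert pairwise_energy(path, 0, 1) == 0.5
```

Because the path depends on x, the same edge can be attractive in one image context and repulsive in another, which is the "conditional interaction" the talk emphasizes.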
28 Full DTF Model
E(y, x, w) = Σ_{F ∈ F} E_{t_F}(y_F, x_F, w_{t_F})
p(y | x, w) = (1 / Z(x, w)) exp(−E(y, x, w))
Z(x, w) = Σ_{y ∈ Y} exp(−E(y, x, w))
x: image; y: predicted labels, one for each pixel; w: weights/energies, to be learned from data. How is this different from other models? What about learning and inference?
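For a toy model small enough to enumerate, the Gibbs distribution and partition function above can be computed exactly. The energy tables below are invented; real DTF instances (10k+ variables) make this enumeration impossible, which is why the talk turns to pseudo-likelihood:

```python
import itertools
import numpy as np

# Invented image-dependent energy tables for a 3-pixel chain, 2 labels.
unary = np.array([[0.0, 2.0], [0.0, 1.0], [2.0, 0.0]])   # E(y_i, x)
pair = np.array([[0.0, 1.0], [1.0, 0.0]])                # E(y_i, y_j, x)

def energy(y):
    return (sum(unary[i, y[i]] for i in range(3))
            + sum(pair[y[i], y[i + 1]] for i in range(2)))

# Exhaustive partition function: 2^3 labelings, feasible only for toys.
labelings = list(itertools.product([0, 1], repeat=3))
Z = sum(np.exp(-energy(y)) for y in labelings)            # Z(x, w)
p = {y: np.exp(-energy(y)) / Z for y in labelings}        # p(y | x, w)
assert abs(sum(p.values()) - 1.0) < 1e-9
assert max(p, key=p.get) == (0, 0, 1)                     # lowest-energy labeling
```

The sum defining Z has L^n terms for n pixels and L labels, which is exactly the intractability addressed two slides later.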
30 Related Work Relationship to Other Models Generalizes random forests (learned weights) and Markov random fields. Here: 2 trees
32 Related Work CPT-Trees (1995) Conditional Probability Table Trees (Glesner, Koller, 1995), (Boutilier et al., 1996) Decision tree on states of random variables. Limited to Bayesian networks
33 Related Work Learning a Markov Chain Dependency Networks: learn p(x_i | x_{V\{i}}) (Heckerman et al., 2000). Decision tree on states of random variables. Inference requires simulation (pseudo-Gibbs sampling). Random Forest Random Field (Payet and Todorovic, 2010): decision tree determines the sampler; inference: Swendsen-Wang Metropolis MCMC
34 Learning Learning DTFs Given iid data {(x, y)_i}_{i=1,...,N}, need to learn: the structure of the factor graph; the tree structure defined by split functions; the weight parameters in decision nodes. Let us assume structure and trees are given
37 Learning Training Maximum Likelihood Estimation, given ground truth y: w* = argmax_w log p(y | x, w). Intractable!
40 Learning Training Maximum Pseudo-Likelihood Estimation (Besag, 1974):
w* = argmax_w (1/|V|) Σ_{i ∈ V} log p(y_i | y_{V\{i}}, x, w)
with ground truth y and V: the set of pixels in all images. (details in paper)
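Each pseudo-likelihood term conditions on the neighbors' ground-truth labels, so its normalizer runs over the L labels of a single site only; the intractable global Z never appears. A sketch on a toy chain (energy tables invented):

```python
import numpy as np

# Invented toy model: 3 pixels, 2 labels, chain neighborhood.
unary = [[0.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
pair = [[0.0, 1.0], [1.0, 0.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def site_log_pl(i, y, L=2):
    """log p(y_i | y_{V\{i}}, x, w): only energies touching site i enter."""
    local = np.array([unary[i][k] + sum(pair[k][y[j]] for j in neighbors[i])
                      for k in range(L)])
    logits = -local                              # unnormalized log-probs
    return float(logits[y[i]] - np.log(np.sum(np.exp(logits))))

y = [0, 0, 1]                                    # ground-truth labeling
ll = sum(site_log_pl(i, y) for i in range(3))    # objective to maximize in w
assert ll < 0.0
```

Summing these site-wise terms over all pixels gives the training objective in the slide; each term costs O(L) instead of O(L^n).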
41 Learning Efficient Training by Subsampling 1M pixels...
43 Learning Efficient Training by Subsampling We subsample our training set to train a structured model
44 Learning Training Algorithm 1. Fix the factor graph structure 2. For each factor: learn a classification tree 3. Jointly optimize the convex pseudo-likelihood objective in w
45 Learning Test-time Inference in DTFs 1. Energy minimization (MAP), e.g. TRW-S 2. Maximum Posterior Marginal (MPM), e.g. Gibbs sampling
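Option 2 (MPM via Gibbs sampling) can be sketched on a toy chain: repeatedly resample each variable from its local conditional, then report the per-pixel majority label. The model tables, seed, and sweep counts below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented toy model: 3 pixels, 2 labels, chain neighborhood.
unary = np.array([[0.0, 2.0], [0.0, 1.0], [2.0, 0.0]])
pair = np.array([[0.0, 1.0], [1.0, 0.0]])
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def gibbs_mpm(n_sweeps=2000, burn_in=200):
    y = rng.integers(0, 2, size=3)           # random initial labeling
    counts = np.zeros((3, 2))
    for sweep in range(n_sweeps):
        for i in range(3):                   # resample each site in turn
            local = unary[i] + sum(pair[:, y[j]] for j in neighbors[i])
            p = np.exp(-local)
            p /= p.sum()
            y[i] = rng.choice(2, p=p)
        if sweep >= burn_in:                 # accumulate posterior marginals
            counts[np.arange(3), y] += 1
    return counts.argmax(axis=1)             # per-pixel posterior mode (MPM)

mpm = gibbs_mpm()
assert mpm.tolist() == [0, 0, 1]
```

MPM minimizes the expected per-pixel error under the model, whereas MAP (option 1) returns the single lowest-energy labeling; on this toy both agree.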
46 Experiments Experiment: Inpainting
50 Experiments Experiment: Inpainting Training set (300 images)
51 Experiments Experiment: Inpainting Test set (100 images, disjoint characters)
53 Experiments Instances Densely-connected, 64 neighbors. Each instance: 10k variables, 300k factors, hard to minimize energy. Associativity of the learned pairwise potentials (shown as a function of x/y offset)
54 Experiments Experiment: Body-part Recognition Body-part recognition (Shotton et al., CVPR 2011) 1500 training images, 150 test images
55 Experiments Experiment: Body-part Recognition (cont) Training set
56 Experiments Experiment: Body-part Recognition, Results
57 Experiments Experiment: Body-part Recognition, Results
Model | Measure | Shotton et al. | unary | +1 | +1,20 | +1,5,20
MRF | avg-acc | | | | |
MRF | runtime | 6h34 | * | * | * | (30h)*
MRF | weights | - | 6.3M | 6.2M | 6.2M | 6.3M
DTF | avg-acc | | | | |
DTF | runtime | - | - | * | * | (40h)*
DTF | weights | | | M | 7.8M | 8.8M
62 Experiments DTF Summary: non-parametric CRF model for discrete image labeling tasks. Non-parametric: the model class can scale with training set size. Scalable: can make use of large training sets. Conditional interactions: richer models without latent variables. Code will be made available after the CVPR deadline!
63 Experiments Thank you!
64 Appendix DTF: Linearity
E_{t_F}(y_F, x_F, w_{t_F}) can be written as a function linear in w_{t_F}:
E_{t_F}(y_F, x_F, w_{t_F}) = Σ_{q ∈ Tree(t_F)} Σ_{z ∈ Y_F} w_{t_F}(q, z) B_{t_F}(q, z; y_F, x_F),
where B_{t_F}(q, z; y_F, x_F) = 1 if q ∈ Path(x_F) and z = y_F, and 0 otherwise.
The overall energy function is linear in w, so the (pseudo-)likelihood function is log-concave. Here: not necessarily a unique maximizer.
65 Appendix Learning the Decision Trees How to learn the decision tree? Ideal world: learn the entire model jointly. Here: learn decision trees using the common information gain criterion. Pairwise and order-k factors: treat as an L × L classification problem (L^k for order k). Although the trees are trained independently, overcounting is avoided by optimizing the weights jointly. Training summary: 1. For each factor type, train a decision tree using information gain 2. Initialize tree weights to zero 3. Maximize the pseudo-likelihood (using L-BFGS)
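Step 3 is a convex minimization. The slides use L-BFGS; as a dependency-free stand-in, plain gradient descent on a tiny unary-only logistic pseudo-likelihood shows the zero initialization and joint weight fit (data and learning rate are invented):

```python
import numpy as np

# Stand-in for step 3 of the training summary: the slides use L-BFGS,
# but any convex optimizer works on the log-concave objective; here,
# gradient descent on a 2-label, unary-only logistic likelihood.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # per-pixel features
y = np.array([0.0, 1.0, 1.0])                        # ground-truth labels

w = np.zeros(2)                          # step 2: initialize weights to zero
for _ in range(500):
    p1 = 1.0 / (1.0 + np.exp(-X @ w))    # P(label 1) per pixel
    grad = X.T @ (p1 - y)                # gradient of the neg. log-likelihood
    w -= 0.5 * grad                      # fixed step size (invented)
p1 = 1.0 / (1.0 + np.exp(-X @ w))
assert np.all((p1 > 0.5) == (y == 1))    # all training pixels are fit
```

Because the objective is convex (the energy is linear in w, per the previous appendix slide), any such first-order method reaches a global optimum.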
67 Appendix Experiment: Body-part Recognition, Results Figure: Test recognition results. MRF (top) and DTF (bottom).
68 Appendix Experiment: Body-part Recognition, Results
Model | Measure | Shotton et al. | unary | +1 | +1,20 | +1,5,20
MRF | avg-acc | | | | |
MRF | runtime | 6h34 | * | * | * | (30h)*
MRF | weights | - | 6.3M | 6.2M | 6.2M | 6.3M
DTF | avg-acc | | | | |
DTF | runtime | - | - | * | * | (40h)*
DTF | weights | | | M | 7.8M | 8.8M
Table: Body-part recognition results: mean per-class accuracy, training time on a single 8-core machine, and number of model parameters.
69 Appendix Experiment: Body-part Recognition, Results Figure: Learned horizontal interactions. Left: mean silhouette reaching the 32 leaf nodes in the learned tree; one leaf (marked red) and its corresponding effective weight matrix, visualizing the most attractive (blue) and most repulsive (red) weights. Right: superimposing label-label interactions on test images, (a) matching the pattern, (b) no match, interaction is inactive.
72 Appendix Experiment: Snakes One of the simplest tasks with conditional label-label structure. Snake: 10 labels from head (black) to tail (white). The image contains perfect instructions: red = go up, yellow = go right, green = go down, blue = go left. Myopic decisions are impossible (weak local evidence). Training: 200 small images. Testing: 100 small images. Features: relative pixel color tests
74 Appendix Experiment: Snakes, Results
 | RF | Unary | MRF | DTF
Accuracy | | | |
Accuracy (tail) | | | |
Accuracy (mid) | | | |
Table: Test set accuracies for the snake data set.
75 Appendix Experiment: Snakes, Results Figure: Predictions on a novel test instance.
76 Appendix Experiment: Snakes, Conclusion Here: strong pairwise interactions help when local evidence is weak; the pairwise interactions are strong because they condition on the image; 200 training images are enough
77 Appendix Factor types in DTFs Every factor type has: one scope, the relative set of variables it acts on; a decision tree with split functions; weight parameters in each node. The energy is the sum along the path of traversed nodes:
E_{t_F}(y_F, x_F, w_{t_F}) = Σ_{q ∈ Path(x_F)} w_{t_F}(q, y_F)
78 Appendix Efficient Training Minimize in w the regularized negative log-pseudo-likelihood
ℓ_npl(w) = (1/|V|) Σ_{i ∈ V} ℓ_i(w) − (1/|V|) Σ_t log p_t(w_t),
with ℓ_i(w) = −log p(y_i | y_{V\{i}}, x, w) and V: the set of pixels in all images.
81 Appendix Efficient Training
ℓ_npl(w) = E_{i ∼ U(V)}[ℓ_i(w)] − (1/|V|) Σ_t log p_t(w_t)
Composite objective: expectation + simple function. Approximate the expectation deterministically: for Ṽ ⊆ V,
ℓ_npl(w) ≈ (1/|Ṽ|) Σ_{i ∈ Ṽ} ℓ_i(w) − (1/|V|) Σ_t log p_t(w_t).
MPLE enables subsampling at the variable level.
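The subsampling argument can be checked numerically: since the objective is an expectation over pixels, the mean over a random subset Ṽ is an accurate estimate of the full mean at a fraction of the cost. The per-pixel losses below are simulated, not real DTF values:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated per-pixel losses l_i(w) standing in for |V| = 1,000,000 pixels.
losses = rng.gamma(shape=2.0, scale=0.5, size=1_000_000)

full = losses.mean()                       # E_{i ~ U(V)}[l_i(w)], full pass
subset = rng.choice(losses, size=10_000, replace=False)
approx = subset.mean()                     # deterministic once V~ is fixed
assert abs(approx - full) / full < 0.05    # close, at 1% of the cost
```

This is why maximum pseudo-likelihood estimation (MPLE) pairs well with subsampling: each ℓ_i depends only on one variable's neighborhood, so dropping pixels from the sum leaves an unbiased estimate of the data term.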
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationStatistical Learning Reading Assignments
Statistical Learning Reading Assignments S. Gong et al. Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (Chapt. 3, hard copy). T. Evgeniou, M. Pontil, and T. Poggio, "Statistical
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University September 24 th, 2007 Generative v. Discriminative classifiers Intuition Want to Learn: h:x a Y X features Y target
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationDeep Learning Srihari. Deep Belief Nets. Sargur N. Srihari
Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous
More informationShared Segmentation of Natural Scenes. Dependent Pitman-Yor Processes
Shared Segmentation of Natural Scenes using Dependent Pitman-Yor Processes Erik Sudderth & Michael Jordan University of California, Berkeley Parsing Visual Scenes sky skyscraper sky dome buildings trees
More informationTransformation of Markov Random Fields for Marginal Distribution Estimation
Transformation of Markov Random Fields for Marginal Distribution Estimation Masaki Saito Takayuki Okatani Tohoku University, Japan {msaito, okatani}@vision.is.tohoku.ac.jp Abstract This paper presents
More informationProbabilistic Graphical Models
Probabilisti Graphial Models David Sontag New York University Leture 12, April 19, 2012 Aknowledgement: Partially based on slides by Eri Xing at CMU and Andrew MCallum at UMass Amherst David Sontag (NYU)
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationConditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013
Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative Chain CRF General
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationBayesian Random Fields: The Bethe-Laplace Approximation
Bayesian Random Fields: The Bethe-Laplace Approximation Max Welling Dept. of Computer Science UC Irvine Irvine CA 92697-3425 welling@ics.uci.edu Sridevi Parise Dept. of Computer Science UC Irvine Irvine
More informationGaussian Processes (10/16/13)
STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationDiscriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification
Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification Sanjiv Kumar and Martial Hebert The Robotics Institute, Carnegie Mellon University Pittsburgh, PA 15213,
More informationStatistical Learning
Statistical Learning Lecture 5: Bayesian Networks and Graphical Models Mário A. T. Figueiredo Instituto Superior Técnico & Instituto de Telecomunicações University of Lisbon, Portugal May 2018 Mário A.
More informationMarkov Random Fields
Markov Random Fields Umamahesh Srinivas ipal Group Meeting February 25, 2011 Outline 1 Basic graph-theoretic concepts 2 Markov chain 3 Markov random field (MRF) 4 Gauss-Markov random field (GMRF), and
More informationIntelligent Systems:
Intelligent Systems: Probabilistic Inference in Undirected and Directed Graphical models Carsten Rother 04/02/2014 Intelligent Systems: Probabilistic Inference in DGM and UGM Reminder: Random Forests 1)
More informationRandomized Decision Trees
Randomized Decision Trees compiled by Alvin Wan from Professor Jitendra Malik s lecture Discrete Variables First, let us consider some terminology. We have primarily been dealing with real-valued data,
More information