Decision Tree Fields


1 Sebastian Nowozin, Carsten Rother, Shai Bagon, Toby Sharp, Bangpeng Yao, Pushmeet Kohli. Barcelona, 8th November 2011

2 Introduction. Random Fields in Computer Vision. Markov Random Fields (MRF): (Kindermann and Snell, 1980), (Li, 1995), (Blake, Kohli, Rother, 2011). Conditional Random Fields (CRF): (Lafferty, McCallum, Pereira, 2001). Structured prediction of multiple dependent variables.

3 Introduction. CRFs: How do we use them? Factor graph notation (Kschischang, Frey, Loeliger, 1998): x is the observed image; y_i and y_j are the dependent variables at pixels i and j; E(y_i, x) is a unary factor and E(y_i, y_j, x) a pairwise factor.

4 Introduction. CRFs: How do we use them? Unary energy E(y_i, x): obtained from a machine learning classifier (SVM, Boosting, Random Forests, etc.).

5 Introduction. CRFs: How do we use them? Pairwise energy E(y_i, y_j, x): generalized Potts (image-independent), or contrast-sensitive smoothing (e.g. GrabCut, TextonBoost): E(y_i, y_j, x) = \exp(-\alpha \|x_i - x_j\|^2).
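As a concrete sketch of the contrast-sensitive term (the α value and the RGB pixel vectors below are illustrative, not from the slides):

```python
import numpy as np

def contrast_sensitive_energy(x_i, x_j, alpha=0.5):
    """Contrast-sensitive pairwise weight exp(-alpha * ||x_i - x_j||^2):
    large for similar-looking neighbors (strong smoothing), small across
    an image edge (weak smoothing)."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return np.exp(-alpha * diff.dot(diff))

# Similar RGB neighbors -> weight near 1; dissimilar -> weight near 0.
print(contrast_sensitive_energy([0.2, 0.2, 0.2], [0.21, 0.20, 0.20]))
print(contrast_sensitive_energy([0.2, 0.2, 0.2], [0.9, 0.9, 0.9]))
```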

6 Introduction. CRFs: How do we use them? What about pairwise energies E(y_i, y_k, x) between more distant pixels?

7 Introduction. CRFs: How do we use them? What about higher-order energies E(y_i, y_j, y_k, x)?

8 Introduction. Decision Trees in Computer Vision. Random Forests (Breiman, MLJ 2001): non-parametric, infinite model capacity; fast inference and training, parallelizable; (Shotton et al., 2008, 2011), (Saffari et al., 2009), (Gall and Lempitsky, 2009), etc. But: no structured prediction.

9 Introduction. Contributions: 1. Learn image-dependent interactions. 2. Combine random fields and decision trees. 3. Efficient training. 4. Superior empirical performance.

10 Decision Tree Classifiers

12 Decision Trees for Image Labeling

13 Decision Trees for Image Labeling (cont). Apply the decision tree to each pixel (i, j) of the image I independently.

15 Decision Tree Field (DTF) Example. [Figure: factor graph in which every factor, unary and pairwise, is associated with a decision tree.]

16 Unary Factor Example. x: the entire observed image; y_i: the prediction at pixel i, with y_i ∈ {1, 2, 3, 4}; E(y_i, x): the energy function connecting them.


21 Unary Factor Example. The pixel is routed through the decision tree by split tests on x, and the unary energy sums the weights stored at the visited nodes: E(y_i, x) = \sum_{q \in \mathrm{Path}(x)} w(q, y_i).
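A minimal sketch of this path-sum (hypothetical node layout and split functions, not the authors' code): each node q visited on the way from the root to a leaf contributes its stored weight w(q, y_i).

```python
class Node:
    """Decision-tree node: one weight per label, plus an optional split."""
    def __init__(self, weights, split=None, left=None, right=None):
        self.weights = weights      # dict: label -> w(q, label)
        self.split = split          # split(x, i) -> bool; None at a leaf
        self.left, self.right = left, right

def unary_energy(root, x, i, label):
    """E(y_i, x): sum of w(q, y_i) over all nodes q on pixel i's path."""
    energy, node = 0.0, root
    while node is not None:
        energy += node.weights[label]
        if node.split is None:
            break
        node = node.left if node.split(x, i) else node.right
    return energy

# Tiny illustrative tree: the root tests the pixel's intensity.
leaf_l = Node({"fg": 0.2, "bg": -0.1})
leaf_r = Node({"fg": -0.3, "bg": 0.4})
root = Node({"fg": 0.0, "bg": 0.0},
            split=lambda x, i: x[i] > 0.5, left=leaf_l, right=leaf_r)
print(unary_energy(root, x=[0.7, 0.1], i=0, label="fg"))  # 0.0 + 0.2
```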

22 Pairwise Factor Example. A pairwise factor E(y_i, y_j, x) connects the labels of neighboring pixels i and j and, like the unary factor, conditions on the image x.

27 Pairwise Factor Example. The pairwise energy is again the sum of the weights along the path through its own decision tree: E(y_i, y_j, x) = \sum_{q \in \mathrm{Path}(x)} w(q, y_i, y_j).

28 Full DTF Model

E(y, x, w) = \sum_{F \in \mathcal{F}} E_{t_F}(y_F, x_F, w_{t_F}),

p(y \mid x, w) = \frac{1}{Z(x, w)} \exp(-E(y, x, w)), \quad Z(x, w) = \sum_{y \in \mathcal{Y}} \exp(-E(y, x, w)).

x: image; y: predicted labels, one for each pixel; w: weights/energies, to be learned from data. How is this different from other models? What about learning and inference?
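On real images Z(x, w) is intractable, which is what motivates the pseudo-likelihood training below; on a toy model, though, Z can be computed by brute force, which makes the Gibbs form concrete. A sketch with a hypothetical 3-pixel chain, 2 labels, and arbitrary energy tables:

```python
import itertools
import numpy as np

# Hypothetical 3-pixel chain, 2 labels; arbitrary illustrative tables.
unary = np.array([[0.1, 1.2], [0.8, 0.3], [0.5, 0.5]])  # unary[i, y_i]
pairwise = np.array([[0.0, 1.0], [1.0, 0.0]])           # pairwise[y_i, y_j]

def energy(y):
    """E(y, x, w): sum of unary and chain pairwise energies."""
    e = sum(unary[i, y[i]] for i in range(3))
    return e + sum(pairwise[y[i], y[i + 1]] for i in range(2))

# Partition function by exhaustive enumeration (2^3 labelings).
states = list(itertools.product([0, 1], repeat=3))
Z = sum(np.exp(-energy(y)) for y in states)

def p(y):
    """Gibbs distribution p(y | x, w) = exp(-E(y, x, w)) / Z(x, w)."""
    return np.exp(-energy(y)) / Z

assert abs(sum(p(y) for y in states) - 1.0) < 1e-9
```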

30 Related Work. Relationship to other models: the DTF generalizes both random forests (with learned weights) and Markov random fields. [Figure: example DTF instantiation with 2 trees.]

32 Related Work. CPT-Trees (1995): Conditional Probability Table trees (Glesner, Koller, 1995), (Boutilier et al., 1996); a decision tree over the states of random variables; limited to Bayesian networks.

33 Related Work. Dependency Networks: learn p(x_i | x_{V\{i}}) (Heckerman et al., 2000); a decision tree over the states of random variables; inference requires simulation (pseudo-Gibbs sampling of a Markov chain). Random Forest Random Fields (Payet and Todorovic, 2010): a decision tree determines the sampler; inference by Swendsen-Wang / Metropolis MCMC.

34 Learning DTFs. Given iid data {(x, y)_i}_{i=1,...,N}, we need to learn: the structure of the factor graph; the tree structures, defined by split functions; and the weight parameters in the decision nodes. For now, assume the structure and the trees are given.

36 Learning: Training. Maximum Likelihood Estimation, given ground truth y: w^* = \mathrm{argmax}_w \log p(y \mid x, w). Intractable: the likelihood (and its gradient) requires the partition function Z(x, w).

40 Learning: Training. Maximum Pseudo-Likelihood Estimation (Besag, 1974), given ground truth y:

w^* = \mathrm{argmax}_w \frac{1}{|V|} \sum_{i \in V} \log p(y_i \mid y_{V \setminus \{i\}}, x, w),

where V is the set of pixels in all images. (Details in the paper.)
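Each conditional in this sum only involves the factors that touch pixel i, which is what makes the pseudo-likelihood tractable. A minimal sketch of one such conditional, assuming illustrative data structures (a per-pixel unary energy table, one shared symmetric pairwise table, and adjacency lists), not the authors' implementation:

```python
import numpy as np

def local_conditional(i, y, labels, unary, pairwise, neighbors):
    """p(y_i | y_{V\\{i}}, x, w): only the factors touching pixel i
    matter, so the conditional is a softmax over the local energies."""
    energies = np.array([
        unary[i][lab] + sum(pairwise[lab][y[j]] for j in neighbors[i])
        for lab in labels
    ])
    p = np.exp(-(energies - energies.min()))  # shift for stability
    return p / p.sum()
```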

41 Learning: Efficient Training by Subsampling. A single training image already contains on the order of 1M pixels, i.e. 1M terms in the objective.

43 Learning: Efficient Training by Subsampling. We subsample our training set to train a structured model.

44 Learning: Training Algorithm. 1. Fix the factor graph structure. 2. For each factor type: learn a classification tree. 3. Jointly optimize the convex pseudo-likelihood objective in w (see the sketch below).
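Step 3 is a smooth convex problem, so any quasi-Newton solver applies. A sketch using SciPy's L-BFGS-B; the objective below is a stand-in, where a real DTF would return the negative log-pseudolikelihood and its gradient:

```python
import numpy as np
from scipy.optimize import minimize

def objective(w):
    """Stand-in for the negative log-pseudolikelihood: returns the
    objective value and its gradient in w (here a simple quadratic)."""
    return 0.5 * w.dot(w), w

w0 = np.zeros(10)  # step 2: all tree weights initialized to zero
result = minimize(objective, w0, method="L-BFGS-B", jac=True)
w_star = result.x
```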

45 Learning: Test-time Inference in DTFs. 1. Energy minimization (MAP), e.g. TRW-S. 2. Maximum Posterior Marginal (MPM), e.g. Gibbs sampling.
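For MPM, a Gibbs sampler suffices: resample each pixel from its local conditional, count the visited labels, and return the per-pixel majority. A sketch reusing local_conditional from above (labels assumed to be integers 0..L-1; not the authors' implementation):

```python
import numpy as np

def gibbs_mpm(y_init, n_labels, unary, pairwise, neighbors,
              sweeps=200, seed=0):
    """MPM by Gibbs sampling: resample each pixel from its local
    conditional, count visited labels, return the per-pixel majority."""
    rng = np.random.default_rng(seed)
    y = list(y_init)
    labels = list(range(n_labels))
    counts = np.zeros((len(y), n_labels))
    for _ in range(sweeps):
        for i in range(len(y)):
            p = local_conditional(i, y, labels, unary, pairwise, neighbors)
            y[i] = int(rng.choice(n_labels, p=p))
            counts[i, y[i]] += 1
    return counts.argmax(axis=1)
```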

46 Experiments. Experiment: Inpainting.

50 Experiments. Experiment: Inpainting. Training set (300 images).

51 Experiments. Experiment: Inpainting. Test set (100 images, disjoint characters).

52 Experiments. Instances: densely connected, 64 neighbors per pixel; each instance has 10k variables and 300k factors, so the energy is hard to minimize.

53 Experiments. Instances (cont). [Plot: associativity of the learned pairwise potentials as a function of the X and Y offset.]

54 Experiments. Experiment: Body-part Recognition (Shotton et al., CVPR 2011). 1500 training images, 150 test images.

55 Experiments. Experiment: Body-part Recognition (cont). Training set.

56 Experiments. Experiment: Body-part Recognition, Results.

57 Experiments. Experiment: Body-part Recognition, Results.

Model | Measure | Shotton et al. | unary | +1 | +1,20 | +1,5,20
MRF | avg-acc | | | | |
MRF | runtime | 6h34 | * | * | * | (30h)*
MRF | weights | - | 6.3M | 6.2M | 6.2M | 6.3M
DTF | avg-acc | | | | |
DTF | runtime | - | - | * | * | (40h)*
DTF | weights | | | M | 7.8M | 8.8M

61 Experiments. DTF Summary: a non-parametric CRF model for discrete image labeling tasks. Non-parametric: the model class can scale with the training set size. Scalable: can make use of large training sets. Conditional interactions: richer models without latent variables.

62 Experiments. Code will be made available after the CVPR deadline!

63 Experiments. Thank you!

64 Appendix. DTF: Linearity. E_{t_F}(y_F, x_F, w_{t_F}) can be written as a function linear in w_{t_F}:

E_{t_F}(y_F, x_F, w_{t_F}) = \sum_{q \in \mathrm{Tree}(t_F)} \sum_{z \in \mathcal{Y}_F} w_{t_F}(q, z) \, B_{t_F}(q, z; y_F, x_F),

where B_{t_F}(q, z; y_F, x_F) = 1 if q \in \mathrm{Path}(x_F) and z = y_F, and 0 otherwise. [Figure: decision tree with nodes n_0, ..., n_4 and split functions s_0, s_1.] The overall energy function is therefore linear in w, and the (pseudo-)likelihood function is log-concave. Here: not necessarily a unique maximizer.

65 Appendix. Learning the Decision Trees. How do we learn the decision trees? In an ideal world we would learn the entire model jointly; here we learn the decision trees using the common information gain criterion. Pairwise and order-k factors: treat as a classification problem over the L × L (in general L^k) joint label configurations. Although the trees are trained independently, overcounting is avoided by optimizing the weights jointly. Training summary: 1. For each factor type, train a decision tree using information gain. 2. Initialize the tree weights to zero. 3. Maximize the pseudolikelihood (using L-BFGS).

67 Appendix. Experiment: Body-part Recognition, Results. Figure: Test recognition results, MRF (top) and DTF (bottom).

68 Appendix. Experiment: Body-part Recognition, Results.

Model | Measure | Shotton et al. | unary | +1 | +1,20 | +1,5,20
MRF | avg-acc | | | | |
MRF | runtime | 6h34 | * | * | * | (30h)*
MRF | weights | - | 6.3M | 6.2M | 6.2M | 6.3M
DTF | avg-acc | | | | |
DTF | runtime | - | - | * | * | (40h)*
DTF | weights | | | M | 7.8M | 8.8M

Table: Body-part recognition results: mean per-class accuracy, training time on a single 8-core machine, and number of model parameters.

69 Appendix. Experiment: Body-part Recognition, Results. Figure: Learned horizontal interactions. Left: the mean silhouette reaching each of the 32 leaf nodes in the learned tree; one leaf (marked red) with its corresponding effective weight matrix, visualizing the most attractive (blue) and most repulsive (red) weights. Right: the label-label interactions superimposed on test images; (a) the pattern matches, (b) no match, so the interaction is inactive.

72 Appendix. Experiment: Snakes. The simplest task with conditional label-label structure. A snake consists of 10 labels from head (black) to tail (white). The image contains perfect instructions: red = go up, yellow = go right, green = go down, blue = go left. Myopic decisions are impossible (weak local evidence). Training: 200 small images. Testing: 100 small images. Features: relative pixel color tests. [Figure: input image and its labeling.]

74 Appendix. Experiment: Snakes, Results. Table: Test set accuracies (overall, tail, and mid) for the snake data set, comparing RF, Unary, MRF, and DTF.

75 Appendix. Experiment: Snakes, Results. Figure: Predictions on a novel test instance.

76 Appendix. Experiment: Snakes, Conclusion. Strong pairwise interactions help when the local evidence is weak; the pairwise interactions are strong because they condition on the image; and 200 training images are enough.

77 Appendix. Factor types in DTFs. Every factor type has: one scope, the relative set of variables it acts on; a decision tree, with split functions; and weight parameters in each node. The energy is the sum along the path of traversed nodes:

E_{t_F}(y_F, x_F, w_{t_F}) = \sum_{q \in \mathrm{Path}(x_F)} w_{t_F}(q, y_F)

[Figure: decision tree with nodes n_0, ..., n_4 and split functions s_0, s_1.]

78 Appendix. Efficient Training. Minimize in w the regularized negative log-pseudolikelihood

\ell_{npl}(w) = \frac{1}{|V|} \sum_{i \in V} \ell_i(w) - \frac{1}{|V|} \sum_t \log p_t(w_t),

with \ell_i(w) = -\log p(y_i \mid y_{V \setminus \{i\}}, x, w), where V is the set of pixels in all images.

80 Appendix. Efficient Training. The data term is an expectation over a uniformly drawn pixel index:

\ell_{npl}(w) = \mathbb{E}_{i \sim U(V)}[\ell_i(w)] - \frac{1}{|V|} \sum_t \log p_t(w_t)

81 Appendix. Efficient Training. This is a composite objective: an expectation plus a simple function. Approximate the expectation deterministically on a subset V' \subseteq V:

\ell_{npl}(w) \approx \frac{1}{|V'|} \sum_{i \in V'} \ell_i(w) - \frac{1}{|V|} \sum_t \log p_t(w_t).

MPLE enables subsampling at the variable level.
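A sketch of the resulting subsampled objective (loss_i and log_prior are hypothetical callables standing in for \ell_i(w) and the per-type prior term):

```python
import numpy as np

def subsampled_npl(w, loss_i, log_prior, n_pixels, rng, m=10_000):
    """Approximate l_npl(w): average the per-pixel losses l_i(w) over a
    random subset V' of m pixels; the prior term stays exact."""
    subset = rng.choice(n_pixels, size=min(m, n_pixels), replace=False)
    data_term = float(np.mean([loss_i(w, i) for i in subset]))
    return data_term - log_prior(w) / n_pixels
```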
