Introduction To Graphical Models

1 Introduction To Graphical Models. Peter V. Gehler, Max Planck Institute for Intelligent Systems, Tübingen, Germany. ENS/INRIA Summer School, Paris, July.

2 Extended version in book form: Sebastian Nowozin and Christoph Lampert, Structured Learning and Prediction in Computer Vision, ca. 200 pages, available free online. Slides mainly based on a tutorial version from Christoph. Thanks!

3 Literature Recommendation: David Barber, Bayesian Reasoning and Machine Learning, 670 pages. Available free online: staff/d.barber/pmwiki/pmwiki.php?n=brml.online

4-5 Standard Regression: f : X → ℝ. Inputs x ∈ X can be any kind of objects; the output y is a real number.
Structured Output Learning: f : X → Y. Inputs x ∈ X can be any kind of objects; the outputs y ∈ Y are complex (structured) objects.

6 What is structured output prediction? Ad hoc definition: predicting structured outputs from input data (in contrast to predicting just a single number, as in classification or regression). Natural Language Processing: Automatic Translation (output: sentences), Sentence Parsing (output: parse trees). Bioinformatics: Secondary Structure Prediction (output: bipartite graphs), Enzyme Function Prediction (output: path in a tree). Speech Processing: Automatic Transcription (output: sentences), Text-to-Speech (output: audio signal). Robotics: Planning (output: sequence of actions). This tutorial: applications and examples from Computer Vision.

7 Probabilistic Graphical Models

8 Example: Human Pose Estimation. x ∈ X, y ∈ Y. Given an image, where is a person and how is it articulated? f : X → Y. The image is x, but what is the human pose y ∈ Y precisely?

9-15 Human Pose Y. Example y_head. Body part: y_head = (u, v, θ), where (u, v) is the center and θ the rotation, with (u, v) ∈ {1,..., M} × {1,..., N} and θ ∈ {0, 45, 90,...}. Entire body: y = (y_head, y_torso, y_left lower arm, ...) ∈ Y.

16-18 Human Pose. [Figure: image x ∈ X, example y_head, and the head-detector response ψ(y_head, x) over Y_head.] Idea: have a head classifier (SVM, NN,...) with score ψ(y_head, x) ∈ ℝ_+. Evaluate it everywhere and record the score. Repeat for all body parts.

19-21 Human Pose Estimation. [Figure: per-part detector responses ψ(y_head, x), ψ(y_torso, x),... over the image x ∈ X, and the resulting prediction y ∈ Y.] Compute
y* = (y*_head, y*_torso,...) = argmax_{y_head, y_torso,...} ψ(y_head, x) ψ(y_torso, x) ···
   = (argmax_{y_head} ψ(y_head, x), argmax_{y_torso} ψ(y_torso, x),...)
Great! Problem solved!?

22 Idea: Connect up the body. [Figure: factors ψ(y_head, x), ψ(y_torso, x), ψ(y_head, y_torso), ψ(y_torso, y_arm) linking the parts; left image from Ben Sapp.] Head-Torso Model: ensure the head is on top of the torso, ψ(y_head, y_torso) ∈ ℝ_+. Compute
y* = argmax_{y_head, y_torso,...} ψ(y_head, x) ψ(y_torso, x) ψ(y_head, y_torso) ···
but this does not decompose anymore!

23 The recipe. Structured output function, X = anything, Y = anything.
1) Define an auxiliary function g : X × Y → ℝ, e.g. g(x, y) = ∏_i ψ_i(y_i, x) · ∏_{i∼j} ψ_ij(y_i, y_j, x).
2) Obtain f : X → Y by maximization: f(x) = argmax_{y∈Y} g(x, y).
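
As a concrete illustration of the recipe (not from the slides), the sketch below builds a tiny chain model with made-up positive potential tables and computes f(x) = argmax_y g(x, y) by brute-force enumeration; the variable count, label count, and potential values are all assumptions chosen for illustration.
```python
import itertools
import numpy as np

# Hypothetical toy instance of the recipe: 3 variables, K = 4 states each,
# unary potentials psi_i(y_i) and pairwise potentials psi_ij(y_i, y_j) on a chain.
rng = np.random.default_rng(0)
K, n = 4, 3
unary = rng.random((n, K)) + 0.1            # psi_i(y_i) > 0
pairwise = rng.random((n - 1, K, K)) + 0.1  # psi_{i,i+1}(y_i, y_{i+1}) > 0

def g(y):
    """Auxiliary score g(x, y) = prod_i psi_i(y_i) * prod_{i~j} psi_ij(y_i, y_j)."""
    score = 1.0
    for i, yi in enumerate(y):
        score *= unary[i, yi]
    for i in range(n - 1):
        score *= pairwise[i, y[i], y[i + 1]]
    return score

# f(x) = argmax_y g(x, y), here by brute-force enumeration of all K^n labelings.
y_star = max(itertools.product(range(K), repeat=n), key=g)
print(y_star, g(y_star))
```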

24 A Probabilistic View. Computer vision problems usually deal with uncertain information: incomplete information (we observe static images, projections, etc.), noisy annotation (wrong or ambiguous cases),... Uncertainty is captured by (conditional) probability distributions: p(y | x). For input x ∈ X, how likely is y ∈ Y to be the correct output? We can also phrase this as: what is the probability of observing y given x? How strong is our belief in y if we know x?

25 A Probabilistic View on f : X → Y. Structured output function, X = anything, Y = anything. We need to define an auxiliary function g : X × Y → ℝ, e.g. g(x, y) := p(y | x). Then maximization
f(x) = argmax_{y∈Y} g(x, y) = argmax_{y∈Y} p(y | x)
becomes maximum a posteriori (MAP) prediction. Interpretation: the MAP estimate y* ∈ Y is the most probable value (there can be multiple).

26-27 Probability Distributions. p(y) ≥ 0 for all y ∈ Y (positivity); Σ_{y∈Y} p(y) = 1 (normalization). Example: binary ("Bernoulli") variable y ∈ Y = {0, 1}: 2 values, 1 degree of freedom. [Figure: bar plot of p(y) over y.]

28 Conditional Probability Distributions. p(y | x) ≥ 0 for all x ∈ X, y ∈ Y (positivity); Σ_{y∈Y} p(y | x) = 1 for all x ∈ X (normalization w.r.t. y). For example, binary prediction: X = {images}, y ∈ Y = {0, 1}. For each x: 2 values, 1 d.o.f., one (or two) functions.

29-30 Multi-class prediction, y ∈ Y = {1,..., K}. For each x: K values, K−1 d.o.f.; K−1 functions, or 1 vector-valued function with K−1 outputs. Typically: K functions plus explicit normalization.
Example: predicting the center point of an object, y ∈ Y = {(1, 1),..., (width, height)}. Write y = (y_1, y_2) ∈ Y_1 × Y_2 with Y_1 = {1,..., width} and Y_2 = {1,..., height}. For each x: |Y_1 × Y_2| = W·H values.

31 Structured objects: predicting M variables jointly, Y = {1,..., K} × {1,..., K} × ··· × {1,..., K}. For each x: K^M values, K^M − 1 d.o.f., K^M functions. Example: object detection with a variable-size bounding box, Y ⊂ {1,..., W} × {1,..., H} × {1,..., W} × {1,..., H}, y = (left, top, right, bottom). For each x: (1/4)·W(W−1)·H(H−1) values (millions to billions...).

32-33 Example: image denoising. Y = {RGB images}. For each x: |Y| values in p(y | x), roughly 10^2,000,000 functions. Too much! We cannot consider all possible distributions, we must impose structure.

34-36 Probabilistic Graphical Models. A (probabilistic) graphical model defines a family of probability distributions over a set of random variables by means of a graph. Popular classes of graphical models: undirected graphical models (Markov random fields), directed graphical models (Bayesian networks), factor graphs; others: chain graphs, influence diagrams, etc. The graph encodes conditional independence assumptions between the variables: with N(i) the neighbors of node i in the graph,
p(y_i | y_{V∖{i}}) = p(y_i | y_{N(i)}),  where y_{V∖{i}} = (y_1,..., y_{i−1}, y_{i+1},..., y_n).

37 Example: Pictorial Structures for Articulated Pose Estimation. [Figure: tree-structured factor graph over body-part variables Y_top, Y_head, Y_torso, Y_larm, Y_rarm, Y_lhnd, Y_rhnd, Y_lleg, Y_rleg, Y_lfoot, Y_rfoot, with factors such as F^(1)_top and F^(2)_top,head connected to the image X.] In principle, all parts depend on each other: knowing where the head is puts constraints on where the feet can be. But conditional independences are specified by the graph: if we know where the left leg is, the left foot's position does not depend on the torso position anymore, etc.:
p(y_lfoot | y_top,..., y_torso,..., y_rfoot, x) = p(y_lfoot | y_lleg, x).

38-41 Factor Graphs. Decomposable output y = (y_1,..., y_{|V|}). Graph G = (V, F, E) with E ⊆ V × F: variable nodes V (circles), factor nodes F (boxes), edges E between variable and factor nodes. Each factor F ∈ F connects a subset of variable nodes; write F = {v_1,..., v_{|F|}} and y_F = (y_{v_1},..., y_{v_{|F|}}). [Figure: example factor graph over Y_i, Y_j, Y_k, Y_l.]
Factorization into potentials ψ at the factors:
p(y) = (1/Z) ∏_{F∈F} ψ_F(y_F) = (1/Z) ψ_1(y_l) ψ_2(y_j, y_l) ψ_3(y_i, y_j) ψ_4(y_i, y_k, y_l)
Z is a normalization constant, called the partition function:
Z = Σ_{y∈Y} ∏_{F∈F} ψ_F(y_F).
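
To make the factorization and the partition function concrete, here is a small sketch (an illustration, not the slides' implementation) that enumerates the example factor graph over (y_i, y_j, y_k, y_l) with made-up binary potential tables.
```python
import itertools
import numpy as np

# Toy version (made-up numbers) of the example factor graph over
# (y_i, y_j, y_k, y_l), each variable binary.
rng = np.random.default_rng(1)
psi1 = rng.random(2) + 0.1          # psi_1(y_l)
psi2 = rng.random((2, 2)) + 0.1     # psi_2(y_j, y_l)
psi3 = rng.random((2, 2)) + 0.1     # psi_3(y_i, y_j)
psi4 = rng.random((2, 2, 2)) + 0.1  # psi_4(y_i, y_k, y_l)

def unnormalized(y):
    yi, yj, yk, yl = y
    return psi1[yl] * psi2[yj, yl] * psi3[yi, yj] * psi4[yi, yk, yl]

# Partition function Z = sum over all joint states of the product of potentials.
states = list(itertools.product([0, 1], repeat=4))
Z = sum(unnormalized(y) for y in states)

# p(y) = (1/Z) * prod_F psi_F(y_F); the probabilities sum to one by construction.
p = {y: unnormalized(y) / Z for y in states}
print(Z, sum(p.values()))
```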

42 Conditional Distributions. How to model p(y | x)? The potentials also become functions of (part of) x: ψ_F(y_F; x_F) instead of just ψ_F(y_F), and
p(y | x) = (1/Z(x)) ∏_{F∈F} ψ_F(y_F; x_F).
[Figure: factor graph with observed nodes X_i, X_j attached to Y_i, Y_j.] The partition function depends on x:
Z(x) = Σ_{y∈Y} ∏_{F∈F} ψ_F(y_F; x_F).
Note: x is treated just as an argument, not as a random variable. Conditional random fields (CRFs).

43-44 Conventions: Potentials and Energy Functions. Assume ψ_F(y_F) > 0. Then instead of potentials we can also work with energies:
ψ_F(y_F; x_F) = exp(−E_F(y_F; x_F)), or equivalently E_F(y_F; x_F) = −log(ψ_F(y_F; x_F)).
p(y | x) can then be written as
p(y | x) = (1/Z(x)) ∏_{F∈F} ψ_F(y_F; x_F) = (1/Z(x)) ∏_{F∈F} exp(−E_F(y_F; x_F)) = (1/Z(x)) exp(−E(y; x))
for E(y; x) = Σ_{F∈F} E_F(y_F; x_F).

45-46 Conventions: Energy Minimization.
argmax_{y∈Y} p(y | x) = argmax_{y∈Y} (1/Z(x)) exp(−E(y; x)) = argmax_{y∈Y} exp(−E(y; x)) = argmax_{y∈Y} −E(y; x) = argmin_{y∈Y} E(y; x).
MAP prediction can be performed by energy minimization. In practice, one typically models the energy function directly; the probability distribution is uniquely determined by it.

47 Example: An Energy Function for Image Segmentation. Foreground/background image segmentation: X = [0, 255]^{W×H}, Y = {0, 1}^{W×H}, foreground: y_i = 1, background: y_i = 0. Graph: 4-connected grid. Each output pixel depends on the local grayvalue (input) and on neighboring outputs. Energy function components ("Ising" model): unary terms E_i(y_i = 1, x_i) = 255 − x_i and E_i(y_i = 0, x_i) = x_i, so x_i bright → y_i rather foreground, x_i dark → y_i rather background; pairwise terms E_ij(0, 0) = E_ij(1, 1) = 0 and E_ij(0, 1) = E_ij(1, 0) = ω for ω > 0, which prefer that neighbors have the same label → the labeling is smooth.

48 E(y; x) = Σ_i [ (255 − x_i) ⟦y_i = 1⟧ + x_i ⟦y_i = 0⟧ ] + ω Σ_{i∼j} ⟦y_i ≠ y_j⟧
[Figure: input image, segmentation from thresholding, segmentation from minimal energy.]
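
A minimal sketch of this energy, assuming a hypothetical 3x3 grayscale image and a made-up smoothness weight ω; the brute-force minimization is only feasible because the example is tiny.
```python
import itertools
import numpy as np

# Energy of the Ising-style segmentation model from the slide, on a tiny
# (hypothetical) 3x3 grayscale image x with values in [0, 255].
x = np.array([[ 20,  30, 200],
              [ 25, 210, 220],
              [ 15, 205, 230]], dtype=float)
omega = 50.0  # smoothness weight (made-up value)

def energy(y, x, omega):
    """E(y;x) = sum_i [(255-x_i)[y_i=1] + x_i[y_i=0]] + omega * #{4-neighbors with y_i != y_j}."""
    unary = np.where(y == 1, 255.0 - x, x).sum()
    pair = omega * (np.count_nonzero(y[:, :-1] != y[:, 1:])
                    + np.count_nonzero(y[:-1, :] != y[1:, :]))
    return unary + pair

y_thresh = (x > 127).astype(int)                 # per-pixel thresholding
# Brute-force minimum over all 2^9 labelings (only feasible for tiny images).
best = min((np.array(y).reshape(3, 3) for y in itertools.product([0, 1], repeat=9)),
           key=lambda y: energy(y, x, omega))
print(energy(y_thresh, x, omega), energy(best, x, omega))
```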

49 What to do with Structured Prediction Models? Case 1) p(y | x) is known.
MAP Prediction: predict f : X → Y by solving y* = argmax_{y∈Y} p(y | x) = argmin_{y∈Y} E(y; x).
Probabilistic Inference: compute marginal probabilities p(y_F | x) for any factor F, in particular p(y_i | x) for all i ∈ V.

50 What to do with Structured Prediction Models? Case 2) p(y | x) is unknown, but we have training data.
Parameter Learning: assume a fixed graph structure and learn the potentials/energies (ψ_F), among other tasks (learning the graph structure, the variables, etc.). Topic of Wednesday's lecture.

51 Example: Pictorial Structures. [Figure: input image x, MAP prediction argmax_y p(y | x), and marginals p(y_i | x).] MAP makes a single (structured) prediction (point estimate): the best overall pose. Marginal probabilities p(y_i | x) give us the potential positions and the uncertainty of the individual body parts.

52 Example: Man-made structure detection. [Figure: input image x, MAP prediction argmax_y p(y | x), and variable marginals p(y_i | x).] Task: does a pixel depict a man-made structure or not? y_i ∈ {0, 1}. Middle: MAP inference; right: variable marginals. Attention: max-marginals ≠ MAP.

53 Probabilistic Inference. Compute p(y_F | x) and Z(x).

54-56 Assume y = (y_i, y_j, y_k, y_l), Y = Y_i × Y_j × Y_k × Y_l, and an energy function E(y; x) compatible with the following factor graph: [Figure: chain factor graph Y_i - F - Y_j - G - Y_k - H - Y_l.]
Task 1: for any y ∈ Y, compute p(y | x), using p(y | x) = (1/Z(x)) exp(−E(y; x)).
Problem: we don't know Z(x), and computing it using Z(x) = Σ_{y∈Y} exp(−E(y; x)) looks expensive (the sum has |Y_i| · |Y_j| · |Y_k| · |Y_l| terms). A lot of research has been done on how to compute Z(x) efficiently.

57-67 Probabilistic Inference: Belief Propagation / Message Passing. [Figure: chain factor graph Y_i - F - Y_j - G - Y_k - H - Y_l.] For notational simplicity, we drop the dependence on the (fixed) x:
Z = Σ_{y∈Y} exp(−E(y))
  = Σ_{y_i∈Y_i} Σ_{y_j∈Y_j} Σ_{y_k∈Y_k} Σ_{y_l∈Y_l} exp(−E(y_i, y_j, y_k, y_l))
  = Σ_{y_i} Σ_{y_j} Σ_{y_k} Σ_{y_l} exp(−(E_F(y_i, y_j) + E_G(y_j, y_k) + E_H(y_k, y_l)))
  = Σ_{y_i} Σ_{y_j} Σ_{y_k} Σ_{y_l} exp(−E_F(y_i, y_j)) exp(−E_G(y_j, y_k)) exp(−E_H(y_k, y_l))
  = Σ_{y_i} Σ_{y_j} exp(−E_F(y_i, y_j)) Σ_{y_k} exp(−E_G(y_j, y_k)) Σ_{y_l} exp(−E_H(y_k, y_l)).
Define the factor-to-variable message r_{H→Y_k} ∈ ℝ^{Y_k} by r_{H→Y_k}(y_k) = Σ_{y_l} exp(−E_H(y_k, y_l)). Then
Z = Σ_{y_i} Σ_{y_j} exp(−E_F(y_i, y_j)) Σ_{y_k} exp(−E_G(y_j, y_k)) r_{H→Y_k}(y_k)
  = Σ_{y_i} Σ_{y_j} exp(−E_F(y_i, y_j)) r_{G→Y_j}(y_j),  with r_{G→Y_j}(y_j) = Σ_{y_k} exp(−E_G(y_j, y_k)) r_{H→Y_k}(y_k),
  = Σ_{y_i} r_{F→Y_i}(y_i),  with r_{F→Y_i}(y_i) = Σ_{y_j} exp(−E_F(y_i, y_j)) r_{G→Y_j}(y_j).
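
The same right-to-left elimination can be written down directly; the sketch below (made-up pairwise energy tables, an assumed number of states K) checks the message-passing value of Z against brute-force summation.
```python
import numpy as np

# Chain factor graph Y_i - F - Y_j - G - Y_k - H - Y_l with (made-up) pairwise
# energies; compute Z by passing messages from right to left, and compare with
# brute-force summation.
rng = np.random.default_rng(2)
K = 5                                   # states per variable (assumption)
E_F, E_G, E_H = rng.random((3, K, K))   # E_F(y_i,y_j), E_G(y_j,y_k), E_H(y_k,y_l)

# Factor-to-variable messages, exactly as in the derivation above.
r_H_to_k = np.exp(-E_H).sum(axis=1)                          # r_{H->Y_k}(y_k)
r_G_to_j = (np.exp(-E_G) * r_H_to_k[None, :]).sum(axis=1)    # r_{G->Y_j}(y_j)
r_F_to_i = (np.exp(-E_F) * r_G_to_j[None, :]).sum(axis=1)    # r_{F->Y_i}(y_i)
Z_messages = r_F_to_i.sum()

# Brute force: sum exp(-E(y)) over all K^4 joint states.
E_total = E_F[:, :, None, None] + E_G[None, :, :, None] + E_H[None, None, :, :]
Z_brute = np.exp(-E_total).sum()
print(Z_messages, Z_brute)   # should agree up to floating-point error
```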

68-72 Example: Inference on Trees. [Figure: tree factor graph with variables Y_i, Y_j, Y_k, Y_l, Y_m and factors F, G, H, I, where H connects Y_k to Y_l and I connects Y_k to Y_m.]
Z = Σ_{y∈Y} exp(−E(y)) = Σ_{y_i} Σ_{y_j} Σ_{y_k} Σ_{y_l} Σ_{y_m} exp(−(E_F(y_i, y_j) + ··· + E_I(y_k, y_m)))
  = Σ_{y_i} Σ_{y_j} exp(−E_F(y_i, y_j)) Σ_{y_k} exp(−E_G(y_j, y_k)) [Σ_{y_l} exp(−E_H(y_k, y_l))] [Σ_{y_m} exp(−E_I(y_k, y_m))]
  = Σ_{y_i} Σ_{y_j} exp(−E_F(y_i, y_j)) Σ_{y_k} exp(−E_G(y_j, y_k)) r_{H→Y_k}(y_k) r_{I→Y_k}(y_k)
  = Σ_{y_i} Σ_{y_j} exp(−E_F(y_i, y_j)) Σ_{y_k} exp(−E_G(y_j, y_k)) q_{Y_k→G}(y_k),
with the variable-to-factor message q_{Y_k→G}(y_k) = r_{H→Y_k}(y_k) · r_{I→Y_k}(y_k).

73-74 Factor Graph Sum-Product Algorithm. "Message": a pair of vectors at each factor graph edge (i, F) ∈ E:
1. r_{F→Y_i} ∈ ℝ^{Y_i}: factor-to-variable message,
2. q_{Y_i→F} ∈ ℝ^{Y_i}: variable-to-factor message.
Algorithm: iteratively update the messages. After convergence, Z and the marginals p(y_F) can be obtained from the messages. → Belief Propagation.
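
The slide does not spell out the update rules; for reference, a sketch of the standard sum-product updates in the energy parameterization used above (textbook form, not taken from the slides):
```latex
% Standard sum-product message updates and the beliefs they yield
% (exact on trees, approximate on loopy graphs).
\begin{align*}
r_{F \to Y_i}(y_i) &= \sum_{y_{F \setminus \{i\}}} \exp\!\big(-E_F(y_F)\big)
                      \prod_{j \in F \setminus \{i\}} q_{Y_j \to F}(y_j), \\
q_{Y_i \to F}(y_i) &= \prod_{F' \ni i,\; F' \neq F} r_{F' \to Y_i}(y_i), \\
p(y_i) &\propto \prod_{F \ni i} r_{F \to Y_i}(y_i), \qquad
p(y_F) \propto \exp\!\big(-E_F(y_F)\big) \prod_{i \in F} q_{Y_i \to F}(y_i).
\end{align*}
```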

75 Example: Pictorial Structures. [Figure: tree-structured factor graph over the body-part variables, as before.] Tree-structured model for articulated pose (Felzenszwalb and Huttenlocher, 2000), (Fischler and Elschlager, 1973). Body-part variables with states given by a discretized tuple (x, y, s, θ): (x, y) position, s scale, and θ rotation.

76 Example: Pictorial Structures. [Figure: input image x and marginals p(y_i | x).] Exact marginals, although the state space is huge and the partition function is therefore a huge sum: Z(x) = Σ_{all bodies y} exp(−E(y; x)).

77 Belief Propagation in Loopy Graphs. Can we do message passing also in graphs with loops? [Figure: grid-structured factor graph with variables Y_i,..., Y_q and factors A,..., L.] Problem: there is no well-defined leaf-to-root order. Suggested solution: Loopy Belief Propagation (LBP): initialize all messages as constant 1, then pass messages until convergence.

78 Belief Propagation in Loopy Graphs. [Figure: grid-structured factor graphs, as before.] Loopy Belief Propagation is very popular, but has some problems: it might not converge (e.g. oscillate), and even if it does, the computed probabilities are only approximate. Many improved message-passing schemes exist (see the tutorial book).

79 Probabilistic Inference: Variational Inference / Mean Field. Task: compute marginals p(y_F | x) for a general p(y | x). Idea: approximate p(y | x) by a simpler q(y) and use the marginals of q instead:
q* = argmin_{q∈Q} D_KL(q(y) ‖ p(y | x)).
E.g. Naive Mean Field: Q = all distributions of the form q(y) = ∏_{i∈V} q_i(y_i). [Figure: fully factorized approximation with factors q_e,..., q_m.]
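
The slide leaves the actual coordinate update implicit; a sketch of the standard naive mean-field update for a factor-graph energy, holding all q_j with j ≠ i fixed (textbook form, not from the slides):
```latex
% Naive mean-field coordinate update for p(y|x) \propto exp(-sum_F E_F(y_F)):
\begin{align*}
q_i(y_i) \;\propto\; \exp\!\Big(
   -\sum_{F \ni i} \; \sum_{y_{F\setminus\{i\}}}
   \Big(\prod_{j \in F\setminus\{i\}} q_j(y_j)\Big)\, E_F(y_F)
\Big),
\end{align*}
% iterated over i until convergence; the resulting q_i(y_i) approximate p(y_i | x).
```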

80 Probabilistic Inference: Sampling / Markov Chain Monte Carlo. Task: compute marginals p(y_F | x) for a general p(y | x). Idea: rephrase this as computing the expected value of a quantity, E_{y∼p(y|x)}[h(x, y)], for some (well-behaved) function h : X × Y → ℝ. For probabilistic inference, this step is easy. Set h_{F,z}(x, y) := ⟦y_F = z⟧; then
E_{y∼p(y|x)}[h_{F,z}(x, y)] = Σ_{y∈Y} p(y | x) ⟦y_F = z⟧ = Σ_{y_F∈Y_F} p(y_F | x) ⟦y_F = z⟧ = p(y_F = z | x).

81-82 Probabilistic Inference: Sampling / Markov Chain Monte Carlo. Expectations can be computed/approximated by sampling: for fixed x, let y^(1), y^(2),... be i.i.d. samples from p(y | x); then
E_{y∼p(y|x)}[h(x, y)] ≈ (1/S) Σ_{s=1}^{S} h(x, y^(s)).
The law of large numbers guarantees convergence for S → ∞. For S independent samples, the approximation error is O(1/√S), independent of the dimension of Y.
Problem: producing i.i.d. samples y^(s) from p(y | x) is hard. Solution: we can get away with a sequence of dependent samples → Markov Chain Monte Carlo (MCMC) sampling.

83 Probabilistic Inference: Sampling / Markov Chain Monte Carlo. One example of how to do MCMC sampling: the Gibbs sampler. Initialize y^(0) = (y_1,..., y_d) arbitrarily. For s = 1,..., S:
1. Select a variable y_i,
2. Re-sample y_i ∼ p(y_i | y^(s−1)_{V∖{i}}, x),
3. Output the sample y^(s) = (y^(s−1)_1,..., y^(s−1)_{i−1}, y_i, y^(s−1)_{i+1},..., y^(s−1)_d).
The conditional is easy to compute:
p(y_i | y^(s−1)_{V∖{i}}, x) = p(y_i, y^(s−1)_{V∖{i}} | x) / Σ_{y'_i∈Y_i} p(y'_i, y^(s−1)_{V∖{i}} | x) = exp(−E(y_i, y^(s−1)_{V∖{i}}; x)) / Σ_{y'_i∈Y_i} exp(−E(y'_i, y^(s−1)_{V∖{i}}; x)).
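
A minimal Gibbs-sampler sketch along these lines, for a hypothetical chain of binary variables with made-up unary and pairwise energies; the sample average after a burn-in period estimates the marginals p(y_i = 1 | x).
```python
import numpy as np

# Gibbs sampler for d binary variables on a chain with made-up energies.
rng = np.random.default_rng(3)
d = 6
E_un = rng.normal(size=(d, 2))          # E_i(y_i)
E_pw = rng.normal(size=(d - 1, 2, 2))   # E_{i,i+1}(y_i, y_{i+1})

def local_energy(y, i, v):
    """Energy terms that involve variable i when it takes value v."""
    e = E_un[i, v]
    if i > 0:
        e += E_pw[i - 1, y[i - 1], v]
    if i < d - 1:
        e += E_pw[i, v, y[i + 1]]
    return e

y = rng.integers(0, 2, size=d)          # arbitrary initialization y^(0)
counts = np.zeros(d)
S, burn_in = 5000, 500
for s in range(S):
    i = rng.integers(d)                 # 1. select a variable
    p = np.exp(-np.array([local_energy(y, i, 0), local_energy(y, i, 1)]))
    p /= p.sum()                        # 2. conditional p(y_i | rest) from the energies
    y[i] = rng.choice(2, p=p)           # 3. re-sample y_i, keep the rest unchanged
    if s >= burn_in:
        counts += y
print(counts / (S - burn_in))           # estimated marginals p(y_i = 1 | x)
```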

84 MAP Prediction. Compute y* = argmax_y p(y | x).

85 MAP Prediction: Belief Propagation / Message Passing. [Figure: message-passing schedules on a tree and on a loopy grid graph.] One can also derive message-passing algorithms for MAP prediction. In trees: guaranteed to converge to the optimal solution. In loopy graphs: convergence is not guaranteed, approximate solution.

86-87 MAP Prediction: Graph Cuts. For loopy graphs, we can find the global optimum only in special cases: binary output variables Y_i = {0, 1} for i = 1,..., d, and an energy function with only unary and pairwise terms,
E(y; x) = Σ_i E_i(y_i; x) + Σ_{i∼j} E_ij(y_i, y_j; x).
Restriction 1 (positive unary potentials): E_i(y_i; x) ≥ 0 (always achievable by reparametrization).
Restriction 2 (regular/submodular/attractive pairwise potentials): E_ij(y_i, y_j; x) = 0 if y_i = y_j, and E_ij(y_i, y_j; x) = E_ij(y_j, y_i; x) ≥ 0 otherwise (not always achievable, depends on the task).

88 Construct an auxiliary undirected graph: one node i per variable i ∈ V, plus two extra nodes, source s and sink t. Edge weights:
  edge {i, j}: E_ij(y_i = 0, y_j = 1; x)
  edge {i, s}: E_i(y_i = 1; x)
  edge {i, t}: E_i(y_i = 0; x)
Find the minimum s-t-cut. [Figure: grid of pixel nodes connected to source s and sink t.] The solution defines an optimal binary labeling of the original energy minimization problem → GraphCuts algorithms. (Approximate) multi-class extensions exist, see the tutorial book.

89 GraphCuts Example. Image segmentation energy:
E(y; x) = Σ_i [ (255 − x_i) ⟦y_i = 1⟧ + x_i ⟦y_i = 0⟧ ] + ω Σ_{i∼j} ⟦y_i ≠ y_j⟧
All conditions to apply GraphCuts are fulfilled: E_i(y_i, x) ≥ 0, E_ij(y_i, y_j) = 0 for y_i = y_j, and E_ij(y_i, y_j) = ω > 0 for y_i ≠ y_j. [Figure: input image, thresholding result, GraphCuts result.]
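
A sketch of this construction on the same hypothetical 3x3 image as above, using networkx's general-purpose min-cut solver purely for illustration (practical GraphCuts implementations use specialized max-flow codes).
```python
import numpy as np
import networkx as nx

# Graph-cut construction from the slide on a tiny made-up image.
x = np.array([[ 20,  30, 200],
              [ 25, 210, 220],
              [ 15, 205, 230]], dtype=float)
omega = 50.0
H, W = x.shape

G = nx.DiGraph()
for i in range(H):
    for j in range(W):
        # terminal edges: {i,s} with weight E_i(y_i=1) = 255 - x_i,
        #                 {i,t} with weight E_i(y_i=0) = x_i
        G.add_edge('s', (i, j), capacity=255.0 - x[i, j])
        G.add_edge((i, j), 't', capacity=x[i, j])
        # pairwise edges (both directions) with weight E_ij(0,1) = omega
        for di, dj in [(0, 1), (1, 0)]:
            if i + di < H and j + dj < W:
                G.add_edge((i, j), (i + di, j + dj), capacity=omega)
                G.add_edge((i + di, j + dj), (i, j), capacity=omega)

cut_value, (source_side, sink_side) = nx.minimum_cut(G, 's', 't')
# Nodes on the source side pay their {i,t} edge E_i(y_i=0), so they get label 0.
y = np.array([[0 if (i, j) in source_side else 1 for j in range(W)] for i in range(H)])
print(cut_value)   # equals the minimal energy E(y*; x) for this construction
print(y)
```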

90 MAP Prediction: Linear Programming Relaxation. A more general alternative, Y_i = {1,..., K}:
E(y; x) = Σ_i E_i(y_i; x) + Σ_{ij} E_ij(y_i, y_j; x)
Linearize the energy using indicator functions:
E_i(y_i; x) = Σ_{k=1}^{K} E_i(k; x) ⟦y_i = k⟧ = Σ_{k=1}^{K} a_{i;k} μ_{i;k},  with a_{i;k} := E_i(k; x),
for new variables μ_{i;k} ∈ {0, 1} with Σ_k μ_{i;k} = 1;
E_ij(y_i, y_j; x) = Σ_{k=1}^{K} Σ_{l=1}^{K} E_ij(k, l; x) ⟦y_i = k⟧ ⟦y_j = l⟧ = Σ_{k,l} a_{ij;kl} μ_{ij;kl},  with a_{ij;kl} := E_ij(k, l; x),
for new variables μ_{ij;kl} ∈ {0, 1} with Σ_l μ_{ij;kl} = μ_{i;k} and Σ_k μ_{ij;kl} = μ_{j;l}.

91 MAP Prediction: Linear Programming Relaxation. Energy minimization becomes
y* ↔ μ* := argmin_μ Σ_i Σ_k a_{i;k} μ_{i;k} + Σ_{ij} Σ_{k,l} a_{ij;kl} μ_{ij;kl} = argmin_μ ⟨a, μ⟩
subject to μ_{i;k} ∈ {0, 1}, μ_{ij;kl} ∈ {0, 1}, Σ_k μ_{i;k} = 1, Σ_l μ_{ij;kl} = μ_{i;k}, Σ_k μ_{ij;kl} = μ_{j;l}.
Integer variables, linear objective function, linear constraints: an Integer Linear Program (ILP). Unfortunately, ILPs are in general NP-hard.

92 MAP Prediction: Linear Programming Relaxation. Relax the integrality constraints:
μ* := argmin_μ Σ_i Σ_k a_{i;k} μ_{i;k} + Σ_{ij} Σ_{k,l} a_{ij;kl} μ_{ij;kl}
subject to μ_{i;k} ∈ [0, 1] (instead of {0, 1}), μ_{ij;kl} ∈ [0, 1] (instead of {0, 1}), Σ_k μ_{i;k} = 1, Σ_l μ_{ij;kl} = μ_{i;k}, Σ_k μ_{ij;kl} = μ_{j;l}.
Real-valued variables, linear objective function, linear constraints: a Linear Program (LP) relaxation. LPs can be solved very efficiently, and μ* yields an approximate solution for y*.
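
A small sketch of the relaxed LP for a hypothetical 2-variable, 2-label model using scipy's linprog; the coefficients a_{i;k} and a_{ij;kl} are made up, and the rounding at the end is one simple way to read off an approximate labeling.
```python
import numpy as np
from scipy.optimize import linprog

# Variable order: mu_1;0, mu_1;1, mu_2;0, mu_2;1, mu_12;00, mu_12;01, mu_12;10, mu_12;11
a_unary = np.array([0.5, 1.0,    # a_{1;k} = E_1(k)
                    0.8, 0.2])   # a_{2;k} = E_2(k)
a_pair = np.array([0.0, 1.0, 1.0, 0.0])  # a_{12;kl} = E_12(k,l), attractive
c = np.concatenate([a_unary, a_pair])    # linear objective <a, mu>

A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],    # sum_k mu_1;k = 1
    [0, 0, 1, 1, 0, 0, 0, 0],    # sum_k mu_2;k = 1
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_l mu_12;0l = mu_1;0
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_l mu_12;1l = mu_1;1
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_k mu_12;k0 = mu_2;0
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_k mu_12;k1 = mu_2;1
], dtype=float)
b_eq = np.array([1, 1, 0, 0, 0, 0], dtype=float)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * 8)
mu = res.x
# Round the unary pseudo-marginals to get an (approximate) MAP labeling.
y1, y2 = mu[0:2].argmax(), mu[2:4].argmax()
print(res.fun, y1, y2)
```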

93-101 MAP Prediction: Custom solutions, e.g. branch-and-bound. Note: we just try to solve an optimization problem,
y* = argmin_{y∈Y} E(y; x).
We can use any optimization technique that fits the problem. For low-dimensional Y, such as bounding boxes: branch-and-bound. [Figure: branch-and-bound iterations successively splitting the set of candidate boxes.]

102 Optimal Prediction. Predict with a loss function Δ(ȳ, y).

103-104 Optimal Prediction. The optimal prediction minimizes the expected risk, an expectation:
y* = argmin_{ȳ∈Y} Σ_{y∈Y} Δ(ȳ, y) p(y | x) = argmin_{ȳ∈Y} Σ_{y∈Y} Δ(ȳ, y) ∏_F ψ_F(y_F; x)
[Figure: CRF factor graph with an additional factor Δ(ȳ, ·) attached.] One can think of Δ as another CRF factor and reuse the inference techniques.

105 Example: Hamming loss. Count the number of mislabeled variables:
Δ_H(ȳ, y) = (1/|V|) Σ_{i∈V} I(ȳ_i ≠ y_i)
It makes more sense than the 0/1 loss for image segmentation. Optimal: predict the maximizers of the marginals (exercise):
y* = (argmax_{y_1} p(y_1 | x), argmax_{y_2} p(y_2 | x),...)
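
A tiny sketch of this prediction rule, assuming the per-variable marginals have already been computed (the numbers are made up).
```python
import numpy as np

# Hypothetical marginals p(y_i = k | x) for 3 variables with 4 labels each.
marginals = np.array([[0.10, 0.60, 0.20, 0.10],
                      [0.30, 0.30, 0.30, 0.10],
                      [0.05, 0.05, 0.20, 0.70]])
# Minimizing the expected Hamming loss = picking the most probable label per variable.
y_star = marginals.argmax(axis=1)
print(y_star)   # [1 0 3]
```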

106 Example: Pixel error. If we can add elements in Y_i (pixel intensities, optical flow vectors, etc.): sum of squared errors
Δ_Q(ȳ, y) = (1/|V|) Σ_{i∈V} ‖ȳ_i − y_i‖².
Used, e.g., in stereo reconstruction and part-based object detection. Optimal: predict the marginal means (exercise):
y* = (E_{p(y|x)}[y_1], E_{p(y|x)}[y_2],...)

107 Example: Task-specific losses. Object detection: bounding boxes, or arbitrary regions. [Figure: detection vs. ground-truth region in an image.] Area overlap loss:
Δ_AO(ȳ, y) = 1 − area(ȳ ∩ y) / area(ȳ ∪ y)
Used, e.g., in the PASCAL VOC challenges for object detection, because it is scale-invariant (no bias for or against big objects).

108 Summary: Inference and Prediction. Two main tasks for a given probability distribution p(y | x):
Probabilistic Inference: compute p(y_I | x) for a subset I of variables, in particular p(y_i | x); (loopy) belief propagation, variational inference, sampling,...
MAP Prediction: identify the y* ∈ Y that maximizes p(y | x) (minimizes the energy); (loopy) belief propagation, GraphCuts, LP relaxation, custom solvers,...
Structured prediction comes with structured loss functions Δ : Y × Y → ℝ. The loss function Δ(ȳ, y) is the loss (or cost) for predicting ȳ ∈ Y if y ∈ Y is correct. Task-specific: use the 0/1 loss, Hamming loss, area overlap,...

109 More information: Max Planck Institute for Intelligent Systems. Other groups on campus: Empirical Inference (Machine Learning), Perceiving Systems (Computer Vision), Autonomous Motion (Robotics).


More information

Lab 12: Structured Prediction

Lab 12: Structured Prediction December 4, 2014 Lecture plan structured perceptron application: confused messages application: dependency parsing structured SVM Class review: from modelization to classification What does learning mean?

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

Machine Learning Lecture 2

Machine Learning Lecture 2 Machine Perceptual Learning and Sensory Summer Augmented 15 Computing Many slides adapted from B. Schiele Machine Learning Lecture 2 Probability Density Estimation 16.04.2015 Bastian Leibe RWTH Aachen

More information

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we

More information

Sequential Supervised Learning

Sequential Supervised Learning Sequential Supervised Learning Many Application Problems Require Sequential Learning Part-of of-speech Tagging Information Extraction from the Web Text-to to-speech Mapping Part-of of-speech Tagging Given

More information

Does Better Inference mean Better Learning?

Does Better Inference mean Better Learning? Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract

More information

Pushmeet Kohli Microsoft Research

Pushmeet Kohli Microsoft Research Pushmeet Kohli Microsoft Research E(x) x in {0,1} n Image (D) [Boykov and Jolly 01] [Blake et al. 04] E(x) = c i x i Pixel Colour x in {0,1} n Unary Cost (c i ) Dark (Bg) Bright (Fg) x* = arg min E(x)

More information

ML4NLP Multiclass Classification

ML4NLP Multiclass Classification ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we

More information

Structured Learning with Approximate Inference

Structured Learning with Approximate Inference Structured Learning with Approximate Inference Alex Kulesza and Fernando Pereira Department of Computer and Information Science University of Pennsylvania {kulesza, pereira}@cis.upenn.edu Abstract In many

More information

Probabilistic and Bayesian Machine Learning

Probabilistic and Bayesian Machine Learning Probabilistic and Bayesian Machine Learning Day 4: Expectation and Belief Propagation Yee Whye Teh ywteh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London http://www.gatsby.ucl.ac.uk/

More information

Machine Learning for Data Science (CS4786) Lecture 24

Machine Learning for Data Science (CS4786) Lecture 24 Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each

More information

Partially Directed Graphs and Conditional Random Fields. Sargur Srihari

Partially Directed Graphs and Conditional Random Fields. Sargur Srihari Partially Directed Graphs and Conditional Random Fields Sargur srihari@cedar.buffalo.edu 1 Topics Conditional Random Fields Gibbs distribution and CRF Directed and Undirected Independencies View as combination

More information

Lecture 13: Structured Prediction

Lecture 13: Structured Prediction Lecture 13: Structured Prediction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501: NLP 1 Quiz 2 v Lectures 9-13 v Lecture 12: before page

More information

CSCE 478/878 Lecture 6: Bayesian Learning

CSCE 478/878 Lecture 6: Bayesian Learning Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell

More information

Algorithms for Predicting Structured Data

Algorithms for Predicting Structured Data 1 / 70 Algorithms for Predicting Structured Data Thomas Gärtner / Shankar Vembu Fraunhofer IAIS / UIUC ECML PKDD 2010 Structured Prediction 2 / 70 Predicting multiple outputs with complex internal structure

More information