Joint Optimization of Segmentation and Appearance Models

1 Joint Optimization of Segmentation and Appearance Models. David Mandle, Sameep Tandon (Stanford). April 29, 2013.

2 Overview: (1) Recap: Image Segmentation, (2) Optimization Strategy, (3) Experimental Results.

3 Recap: Image Segmentation Problem. Segment an image into foreground and background. Figure: left, input image; middle, segmentation by EM (GrabCut); right, segmentation by the method covered today.

4 Recap: Image Segmentation as Energy Optimization. Recall the grid-structured Markov Random Field: latent variables x_i ∈ {0, 1} corresponding to foreground/background; observations z_i, taken to be RGB pixel values; node and edge potentials Φ(x_i, z_i) and Ψ(x_i, x_j).

5 Recap: Image Segmentation as Energy Optimization. The graphical model encodes the (unnormalized) probability distribution P(x, z) ∝ ∏_i Φ(x_i, z_i) ∏_{(i,j)} Ψ(x_i, x_j).

6-9 Recap: Image Segmentation as Energy Optimization. Goal: find x to maximize P(x, z) (z is observed). Taking negative logs gives an energy to minimize:
E(x, z) = Σ_i φ(x_i, z_i) + Σ_{(i,j)} ψ(x_i, x_j).
The unary potential φ(x_i, z_i) encodes how likely a pixel or patch z_i is to take segmentation label x_i. The pairwise potential ψ(x_i, x_j) encodes neighborhood information about pixel/patch segmentation labels.

10-11 Recap: GrabCut Model. Unary potentials: negative log of a Gaussian Mixture Model. For tractability, each pixel i is additionally assigned to a single GMM component k_i:
φ(x_i, k_i, θ; z_i) = −log π(x_i, k_i) − log N(z_i; µ(k_i), Σ(k_i)).
Pairwise potentials:
ψ(x_i, x_j; z_i, z_j) = [x_i ≠ x_j] exp(−β⁻¹ ‖z_i − z_j‖²), where β = 2 · avg(‖z_i − z_j‖²).
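As a concrete illustration of the pairwise term, here is a minimal NumPy sketch (not from the talk; img and contrast_weights are illustrative names) that computes the contrast-sensitive weights exp(−β⁻¹‖z_i − z_j‖²) for horizontally adjacent pixels of an RGB image, with β = 2·avg(‖z_i − z_j‖²); vertical neighbours are handled the same way.

import numpy as np

def contrast_weights(img):
    """Contrast-sensitive weights w = exp(-||z_i - z_j||^2 / beta) for each
    pixel and its right-hand neighbour, with beta = 2 * avg(||z_i - z_j||^2)."""
    img = img.astype(np.float64)
    # Squared colour difference between each pixel and its right neighbour.
    diff2 = np.sum((img[:, 1:, :] - img[:, :-1, :]) ** 2, axis=2)
    beta = 2.0 * diff2.mean()        # beta = 2 * avg(||z_i - z_j||^2)
    return np.exp(-diff2 / beta)     # applied only where x_i != x_j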

12-16 Recap: GrabCut Optimization Strategy. GrabCut EM algorithm:
1. Initialize the mixture models.
2. Assign GMM components: k_i = arg min_{k_i} φ(x_i, k_i, θ; z_i).
3. Fit GMM parameters: θ = arg min_θ Σ_i φ(x_i, k_i, θ; z_i).
4. Perform segmentation using the reduction to min-cut: x = arg min_x E(x, z; k, θ).
5. Iterate from step 2 until converged.
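The loop above can be sketched compactly in Python. This is a simplified sketch rather than the authors' implementation: scikit-learn's GaussianMixture refits each colour model with its own soft EM instead of the hard component assignments of steps 2-3, and graph_cut is a hypothetical placeholder for the min-cut reduction of step 4 (pixels, seg, and all other names are illustrative).

import numpy as np
from sklearn.mixture import GaussianMixture

def grabcut_em(pixels, seg, graph_cut, n_components=5, n_iters=5):
    """GrabCut-style alternation: refit fg/bg colour models, then re-segment.

    pixels    : (N, 3) array of RGB values z_i
    seg       : (N,) initial binary labels x_i (e.g. from a user-drawn box)
    graph_cut : callable(unary_fg, unary_bg) -> new labels; placeholder for
                the min-cut step, which also uses the pairwise weights.
    """
    for _ in range(n_iters):
        # Steps 2-3: component assignment and parameter fitting (done here by
        # GaussianMixture's own EM, a simplification of the hard assignment).
        fg = GaussianMixture(n_components=n_components).fit(pixels[seg == 1])
        bg = GaussianMixture(n_components=n_components).fit(pixels[seg == 0])
        # Unary potentials phi = -log p(z_i | theta): lower means more likely.
        unary_fg = -fg.score_samples(pixels)
        unary_bg = -bg.score_samples(pixels)
        # Step 4: segmentation via min-cut over unary + pairwise terms.
        seg = graph_cut(unary_fg, unary_bg)
    return seg

The contrast weights from the previous sketch would be computed once and wired into graph_cut.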

17-22 New Model. Let's consider a simpler model; this will be useful soon.
Unary terms: histograms. K bins; b_i is the bin of pixel z_i. θ^0, θ^1 ∈ [0, 1]^K represent color models (distributions) over foreground/background:
φ(x_i, b_i, θ) = −log θ^{x_i}_{b_i}.
Pairwise potentials:
ψ(x_i, x_j) = w_ij |x_i − x_j|.
We will define w_ij later; for now, take the pairwise term equal to GrabCut's.
Total energy:
E(x, θ^0, θ^1) = −Σ_{p∈V} log P(z_p | θ^{x_p}) + Σ_{(p,q)∈N} w_pq |x_p − x_q|, where P(z_p | θ^{x_p}) = θ^{x_p}_{b_p}.
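The histogram unaries are simple enough to write out directly. A minimal sketch with illustrative names (not from the talk): bin counts give the normalised colour models θ^0, θ^1, and the unary term is a table lookup φ(x_i, b_i, θ) = −log θ^{x_i}_{b_i}.

import numpy as np

def fit_histograms(bins, seg, K, eps=1e-8):
    """Normalised colour histograms theta^0 (label 0) and theta^1 (label 1)
    from bin indices b_i in {0, ..., K-1} and binary labels x_i."""
    theta = np.zeros((2, K))
    for label in (0, 1):
        counts = np.bincount(bins[seg == label], minlength=K)
        theta[label] = (counts + eps) / (counts.sum() + K * eps)
    return theta

def histogram_unary(bins, theta):
    """phi(x_i = label, b_i, theta) = -log theta^label_{b_i}, for both labels;
    returns an array of shape (2, N)."""
    return -np.log(theta[:, bins])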

23-26 EM under the new model:
1. Initialize histograms θ^0, θ^1.
2. Fix θ. Perform segmentation using the reduction to min-cut: x = arg min_x E(x, θ^0, θ^1).
3. Fix x. Recompute θ^0, θ^1 (via standard parameter fitting).
4. Iterate from step 2 until converged.
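Putting the pieces together, a sketch of this alternation, reusing fit_histograms and histogram_unary from the sketch above and the same hypothetical graph_cut placeholder as before (label 1 is taken to be foreground, an assumption for illustration):

def histogram_em(bins, seg, graph_cut, K, n_iters=5):
    """Block-coordinate descent on E(x, theta^0, theta^1): alternate the
    min-cut step over x with histogram refitting over theta."""
    theta = fit_histograms(bins, seg, K)        # step 1: initial histograms
    for _ in range(n_iters):
        unary = histogram_unary(bins, theta)    # phi = -log theta^x_{b_i}
        seg = graph_cut(unary[1], unary[0])     # step 2: min-cut over x
        theta = fit_histograms(bins, seg, K)    # step 3: refit theta from x
    return seg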

27-29 Optimization. Goal: minimize the energy
E(x) = E_1(x) + E_2(x), with
E_1(x) = Σ_k h_k(n_k^1) + Σ_{(p,q)∈N} w_pq |x_p − x_q| and E_2(x) = h(n^1),
where n_k^1 = Σ_{p∈V_k} x_p counts the foreground pixels whose color falls in bin k (V_k = {p : b_p = k}) and n^1 = Σ_{p∈V} x_p counts all foreground pixels. The color models θ^0, θ^1 have been minimized out for each fixed x, which is what produces the functions h_k and h of these counts.
This is hard! But there are efficient strategies for optimizing E_1(x) and E_2(x) separately.
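The slides use h_k and h without spelling them out. Following the standard construction in the joint-optimization approach (a reconstruction, not something shown on the slide): for fixed x the optimal histograms are the empirical frequencies θ^1_k = n_k^1 / n^1 and θ^0_k = n_k^0 / n^0 (with n_k^0 and n^0 the corresponding background counts), and substituting them into the unary part of E(x, θ^0, θ^1) yields h_k(n_k^1) = −n_k^1 log n_k^1 − n_k^0 log n_k^0 (concave) and h(n^1) = n^1 log n^1 + n^0 log n^0 (convex). The short check below verifies this identity numerically on random data; all names are illustrative.

import numpy as np

def xlogx(t):
    """t * log(t) elementwise, with the convention 0 * log 0 = 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    mask = t > 0
    out[mask] = t[mask] * np.log(t[mask])
    return out

rng = np.random.default_rng(0)
N, K = 1000, 16
bins = rng.integers(0, K, size=N)     # b_p: colour bin of each pixel
seg = rng.integers(0, 2, size=N)      # x_p: an arbitrary segmentation

n1, n0 = seg.sum(), N - seg.sum()
nk1 = np.bincount(bins[seg == 1], minlength=K)   # n_k^1
nk0 = np.bincount(bins[seg == 0], minlength=K)   # n_k^0

# Unary energy with the maximum-likelihood histograms substituted in ...
theta = np.stack([nk0 / n0, nk1 / n1])           # theta^0, theta^1
direct = -np.log(theta[seg, bins]).sum()
# ... equals the h_k / h decomposition used on this slide.
h_k = -xlogx(nk1) - xlogx(nk0)                   # concave in n_k^1
h = xlogx(n1) + xlogx(n0)                        # convex in n^1
print(np.isclose(direct, h_k.sum() + h))         # True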

30-33 Optimization via Dual Decomposition. Consider an optimization of the form min_x f_1(x) + f_2(x), where optimizing f(x) = f_1(x) + f_2(x) jointly is hard, but min_x f_1(x) and min_x f_2(x) are easy problems. Dual decomposition idea: optimize f_1(x) and f_2(x) separately and combine the results in a principled way.

34-36 Optimization via Dual Decomposition. Original problem: min_x f_1(x) + f_2(x). Introduce local copies of the variables: min_{x_1, x_2} f_1(x_1) + f_2(x_2) s.t. x_1 = x_2. Equivalent problem: min_{x_1, x_2} f_1(x_1) + f_2(x_2) s.t. x_2 − x_1 = 0.

37-39 Optimization via Dual Decomposition. Primal problem: min_{x_1, x_2} f_1(x_1) + f_2(x_2) s.t. x_2 − x_1 = 0. Lagrangian dual:
g(y) = min_{x_1, x_2} f_1(x_1) + f_2(x_2) + yᵀ(x_2 − x_1).
The dual decomposes:
g(y) = (min_{x_1} f_1(x_1) − yᵀx_1) + (min_{x_2} f_2(x_2) + yᵀx_2).

40-43 Optimization via Dual Decomposition.
g(y) = (min_{x_1} f_1(x_1) − yᵀx_1) + (min_{x_2} f_2(x_2) + yᵀx_2) = g_1(y) + g_2(y).
For all y, g(y) is a lower bound on the optimum of the primal problem. Maximize g(y) w.r.t. y to get the tightest bound. Further, g(y) is concave in y, so many techniques for concave maximization apply (subgradient ascent, etc.). Ideally we have fast optimization strategies for g_1(y) and g_2(y).
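A toy numeric example of this machinery may help before returning to segmentation. In the sketch below (all names illustrative, not from the talk) both f_1 and f_2 are linear over x ∈ {0, 1}^3, so each subproblem is a per-coordinate threshold and the joint problem is easy too; the point is only the subgradient-ascent mechanics and the fact that every g(y) is a valid lower bound, which brute force confirms is tight on this instance.

import itertools
import numpy as np

def dual_decomposition(min_f1, min_f2, f1, f2, n, iters=200):
    """Subgradient ascent on the concave dual
    g(y) = min_x1 [f1(x1) - y.x1] + min_x2 [f2(x2) + y.x2]."""
    y = np.zeros(n)
    best = -np.inf
    for t in range(1, iters + 1):
        x1 = min_f1(-y)                    # easy subproblem: argmin f1(x) - y.x
        x2 = min_f2(+y)                    # easy subproblem: argmin f2(x) + y.x
        g = f1(x1) - y @ x1 + f2(x2) + y @ x2
        best = max(best, g)                # every g(y) lower-bounds the primal optimum
        y += (1.0 / t) * (x2 - x1)         # supergradient of g at y is (x2 - x1)
    return best

# Toy instance: linear f1, f2 over x in {0,1}^3, so each argmin is a threshold.
c1 = np.array([1.0, -2.0, 0.5])
c2 = np.array([-0.5, 1.0, -1.5])
lin_argmin = lambda c: lambda w: ((c + w) < 0).astype(float)  # argmin_x (c + w).x
bound = dual_decomposition(lin_argmin(c1), lin_argmin(c2),
                           lambda x: c1 @ x, lambda x: c2 @ x, n=3)
exact = min((c1 + c2) @ np.array(b) for b in itertools.product([0, 1], repeat=3))
print(round(bound, 3), round(exact, 3))    # both -2.0: the bound is tight here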

44-46 Optimization via Dual Decomposition. Back to our segmentation problem. The energy is
E(x) = Σ_k h_k(n_k^1) + Σ_{(p,q)∈N} w_pq |x_p − x_q| + h(n^1), where n_k^1 = Σ_{p∈V_k} x_p and n^1 = Σ_{p∈V} x_p.
Group terms as E(x) = E_1(x) + E_2(x), with E_1(x) = Σ_k h_k(n_k^1) + Σ_{(p,q)∈N} w_pq |x_p − x_q| and E_2(x) = h(n^1).

47-51 Optimization via Dual Decomposition. Dual decomposition on E(x) = E_1(x) + E_2(x):
Φ(y) = min_{x_1} (E_1(x_1) − yᵀx_1) + min_{x_2} (E_2(x_2) + yᵀx_2) = Φ_1(y) + Φ_2(y).
Maximize Φ(y) w.r.t. y to get the tightest lower bound. Φ_1(y) can be computed efficiently via reduction to min s-t cut; Φ_2(y) via convex minimization (several strategies). One could now apply any favorite concave-maximization strategy to Φ(y).

52-59 Optimization via Dual Decomposition. It turns out Φ(y) can be maximized over y in polynomial time using a parametric max-flow technique. Sketch:
- Theorem: given Φ_1(y) and Φ_2(y) as described, the optimal y has the form y = s·1.
- Implication: it suffices to maximize Φ(s·1) over the scalar s.
- Φ_1(s·1) is piecewise-linear and concave, with at most |V| breakpoints computed by parametric max-flow; parametric max-flow also returns between 2 and |V| + 1 candidate segmentations x, one per linear piece.
- Φ_2(s·1) is piecewise-linear and concave, with at most |V| breakpoints that can be enumerated from h(·).
- Hence Φ(s·1) is piecewise-linear and concave with at most 2|V| breakpoints, so finding its maximum is easy: it is attained at one of the breakpoints.
- Given the maximizing breakpoint, return the segmentation x with minimum energy from the set produced by parametric max-flow.
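The final maximization is mechanically simple once the breakpoints are known: Φ(s·1) is piecewise-linear and concave, so it suffices to evaluate it at the candidate breakpoints and keep the best. A minimal sketch (illustrative names; it assumes the breakpoints of Φ_1 and Φ_2 and a callable Φ are already supplied, e.g. by a parametric max-flow routine, which is not implemented here):

import numpy as np

def maximize_over_breakpoints(phi, breakpoints_1, breakpoints_2):
    """Phi(s) = Phi_1(s) + Phi_2(s) is piecewise-linear and concave, so its
    maximum over s is attained at one of the (at most 2|V|) breakpoints."""
    candidates = np.union1d(breakpoints_1, breakpoints_2)
    values = np.array([phi(s) for s in candidates])
    i = int(values.argmax())
    return candidates[i], values[i]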

60 References. MRF review slides: spring1213/slides/segmentation.p. Dual decomposition slides. Petter Strandmark and Fredrik Kahl, Parallel and Distributed Graph Cuts by Dual Decomposition.
