Joint Optimization of Segmentation and Appearance Models
|
|
- Cynthia Andrews
- 5 years ago
- Views:
Transcription
1 Joint Optimization of Segmentation and Appearance Models David Mandle, Sameep Tandon April 29, 2013 David Mandle, Sameep Tandon (Stanford) April 29, / 19
2 Overview 1 Recap: Image Segmentation 2 Optimization Strategy 3 Experimental David Mandle, Sameep Tandon (Stanford) April 29, / 19
3 Recap: Image Segmentation Problem Segment an image into foreground and background Figure: Left: Input image. Middle: Segmentation by EM (GrabCut). Right: Segmentation by the method covered today David Mandle, Sameep Tandon (Stanford) April 29, / 19
4 Recap: Image Segmentation as Energy Optimization Recall Grid Structured Markov Random Field: Latent variables x i {0, 1} corresponding to foreground/background Observations z i. Take to be RGB pixel values Edge potentials Φ(x i, z i ), Ψ(x i, x j ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
5 Recap: Image Segmentation as Energy Optimization The Graphical Model encodes the following (unnormalized) probability distribution: David Mandle, Sameep Tandon (Stanford) April 29, / 19
6 Recap: Image Segmentation as Energy Optimization Goal: find x to maximize P(x, z) (z is observed) David Mandle, Sameep Tandon (Stanford) April 29, / 19
7 Recap: Image Segmentation as Energy Optimization Goal: find x to maximize P(x, z) (z is observed) Taking logs: E(x, z) = i φ(x i, z i ) + i,j ψ(x i, x j ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
8 Recap: Image Segmentation as Energy Optimization Goal: find x to maximize P(x, z) (z is observed) Taking logs: E(x, z) = i φ(x i, z i ) + i,j ψ(x i, x j ) Unary potential φ(x i, z i ) encodes how likely it is for a pixel or patch y i to belong to segmentation x i. David Mandle, Sameep Tandon (Stanford) April 29, / 19
9 Recap: Image Segmentation as Energy Optimization Goal: find x to maximize P(x, z) (z is observed) Taking logs: E(x, z) = i φ(x i, z i ) + i,j ψ(x i, x j ) Unary potential φ(x i, z i ) encodes how likely it is for a pixel or patch y i to belong to segmentation x i. Pairwise potential ψ(x i, x j ) encodes neighborhood info about pixel/patch segmentation labels David Mandle, Sameep Tandon (Stanford) April 29, / 19
10 Recap: GrabCut Model Unary Potentials: log of Gaussian Mixture Model But to deal with tractability, we assign each x i to component k i φ(x i, k i, θ z i ) = log π(x i, k i ) + log N(z i ; µ(k i ), Σ(k i )) David Mandle, Sameep Tandon (Stanford) April 29, / 19
11 Recap: GrabCut Model Unary Potentials: log of Gaussian Mixture Model But to deal with tractability, we assign each x i to component k i φ(x i, k i, θ z i ) = log π(x i, k i ) + log N(z i ; µ(k i ), Σ(k i )) Pairwise Potentials: ψ(x i, x j z i, z j ) = [x i x j ] exp( β 1 z i z j 2 ) where β = 2 avg( z i z j 2 ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
12 Recap: GrabCut Optimization Strategy GrabCut EM Algorithm 1 Initialize Mixture Models David Mandle, Sameep Tandon (Stanford) April 29, / 19
13 Recap: GrabCut Optimization Strategy GrabCut EM Algorithm 1 Initialize Mixture Models 2 Assign GMM components: k i = arg min k φ(x i, k i, θ z i ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
14 Recap: GrabCut Optimization Strategy GrabCut EM Algorithm 1 Initialize Mixture Models 2 Assign GMM components: 3 Get GMM parameters: k i = arg min k φ(x i, k i, θ z i ) θ = arg min θ φ(x i, k i, θ z i ) i David Mandle, Sameep Tandon (Stanford) April 29, / 19
15 Recap: GrabCut Optimization Strategy GrabCut EM Algorithm 1 Initialize Mixture Models 2 Assign GMM components: 3 Get GMM parameters: k i = arg min k φ(x i, k i, θ z i ) θ = arg min θ φ(x i, k i, θ z i ) 4 Perform segmentation using reduction to min-cut: i x = arg min x E(x, z; k, θ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
16 Recap: GrabCut Optimization Strategy GrabCut EM Algorithm 1 Initialize Mixture Models 2 Assign GMM components: 3 Get GMM parameters: k i = arg min k φ(x i, k i, θ z i ) θ = arg min θ φ(x i, k i, θ z i ) 4 Perform segmentation using reduction to min-cut: 5 Iterate from step 2 until converged i x = arg min x E(x, z; k, θ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
17 New Model Let s consider a simpler model. This will be useful soon Unary terms: Histograms David Mandle, Sameep Tandon (Stanford) April 29, / 19
18 New Model Let s consider a simpler model. This will be useful soon Unary terms: Histograms K bins, b i is bin of pixel z i David Mandle, Sameep Tandon (Stanford) April 29, / 19
19 New Model Let s consider a simpler model. This will be useful soon Unary terms: Histograms K bins, b i is bin of pixel z i θ 0, θ 1 [0, 1] K represent color models (distributions) over foreground/background David Mandle, Sameep Tandon (Stanford) April 29, / 19
20 New Model Let s consider a simpler model. This will be useful soon Unary terms: Histograms K bins, b i is bin of pixel z i θ 0, θ 1 [0, 1] K represent color models (distributions) over foreground/background φ(x i, b i, θ) = log θ x i b i David Mandle, Sameep Tandon (Stanford) April 29, / 19
21 New Model Let s consider a simpler model. This will be useful soon Unary terms: Histograms K bins, b i is bin of pixel z i θ 0, θ 1 [0, 1] K represent color models (distributions) over foreground/background Pairwise Potentials φ(x i, b i, θ) = log θ x i b i ψ(x i, x j ) = w ij x i x j We will define w ij later; for now, consider pairwise equal to Grabcut David Mandle, Sameep Tandon (Stanford) April 29, / 19
22 New Model Let s consider a simpler model. This will be useful soon Unary terms: Histograms K bins, b i is bin of pixel z i θ 0, θ 1 [0, 1] K represent color models (distributions) over foreground/background Pairwise Potentials φ(x i, b i, θ) = log θ x i b i ψ(x i, x j ) = w ij x i x j We will define w ij later; for now, consider pairwise equal to Grabcut Total Energy: E(x, θ 0, θ 1 ) = log P(z p θ xp ) + w pq x p x q p V P(z p θ xp ) = θ xp b p (p,q) N David Mandle, Sameep Tandon (Stanford) April 29, / 19
23 EM under new model 1 Initialize histograms θ 0, θ 1. David Mandle, Sameep Tandon (Stanford) April 29, / 19
24 EM under new model 1 Initialize histograms θ 0, θ 1. 2 Fix θ. Perform segmentation using reduction to min-cut: x = arg min x E(x, θ 0, θ 1 ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
25 EM under new model 1 Initialize histograms θ 0, θ 1. 2 Fix θ. Perform segmentation using reduction to min-cut: x = arg min x E(x, θ 0, θ 1 ) 3 Fix x. Compute θ 0, θ 1 (via standard parameter fitting). David Mandle, Sameep Tandon (Stanford) April 29, / 19
26 EM under new model 1 Initialize histograms θ 0, θ 1. 2 Fix θ. Perform segmentation using reduction to min-cut: x = arg min x E(x, θ 0, θ 1 ) 3 Fix x. Compute θ 0, θ 1 (via standard parameter fitting). 4 Iterate from step 2 until converged David Mandle, Sameep Tandon (Stanford) April 29, / 19
27 Optimization Goal: Minimize Energy min x E(x) E(x) = k h k (n 1 k ) + (p,q) N w pq x p x q } {{ } E 1 (x) + h(n 1 ) }{{} E 2 (x) where n 1 k = p V k x p and n 1 = p V x p David Mandle, Sameep Tandon (Stanford) April 29, / 19
28 Optimization Goal: Minimize Energy min x E(x) E(x) = k h k (n 1 k ) + (p,q) N w pq x p x q } {{ } E 1 (x) + h(n 1 ) }{{} E 2 (x) where n 1 k = p V k x p and n 1 = p V x p This is hard! David Mandle, Sameep Tandon (Stanford) April 29, / 19
29 Optimization Goal: Minimize Energy min x E(x) E(x) = k h k (n 1 k ) + (p,q) N w pq x p x q } {{ } E 1 (x) + h(n 1 ) }{{} E 2 (x) where n 1 k = p V k x p and n 1 = p V x p This is hard! But efficient strategies for optimizing E 1 (x) and E 2 (x) separately David Mandle, Sameep Tandon (Stanford) April 29, / 19
30 Optimization via Dual Decomposition Consider an optimization of the form min x f 1 (x) + f 2 (x) David Mandle, Sameep Tandon (Stanford) April 29, / 19
31 Optimization via Dual Decomposition Consider an optimization of the form min x f 1 (x) + f 2 (x) where optimizing f (x) = f 1 (x) + f 2 (x) is hard David Mandle, Sameep Tandon (Stanford) April 29, / 19
32 Optimization via Dual Decomposition Consider an optimization of the form min x f 1 (x) + f 2 (x) where optimizing f (x) = f 1 (x) + f 2 (x) is hard But min x f 1 (x) and min x f 2 (x) are easy problems David Mandle, Sameep Tandon (Stanford) April 29, / 19
33 Optimization via Dual Decomposition Consider an optimization of the form min x f 1 (x) + f 2 (x) where optimizing f (x) = f 1 (x) + f 2 (x) is hard But min x f 1 (x) and min x f 2 (x) are easy problems Dual Decomposition idea: Optimize f 1 (x) and f 2 (x) separately and combine in a principled way. David Mandle, Sameep Tandon (Stanford) April 29, / 19
34 Optimization via Dual Decomposition Original Problem: min x f 1 (x) + f 2 (x) David Mandle, Sameep Tandon (Stanford) April 29, / 19
35 Optimization via Dual Decomposition Original Problem: min x f 1 (x) + f 2 (x) Introduce local variables: min x1,x 2 f 1 (x 1 ) + f 2 (x 2 ) s.t x 1 = x 2 David Mandle, Sameep Tandon (Stanford) April 29, / 19
36 Optimization via Dual Decomposition Original Problem: min x f 1 (x) + f 2 (x) Introduce local variables: min x1,x 2 f 1 (x 1 ) + f 2 (x 2 ) s.t x 1 = x 2 Equivalent Problem: min x1,x 2 f 1 (x 1 ) + f 2 (x 2 ) s.t x 2 x 1 = 0 David Mandle, Sameep Tandon (Stanford) April 29, / 19
37 Optimization via Dual Decomposition Primal Problem: min x1,x 2 f 1 (x 1 ) + f 2 (x 2 ) s.t x 2 x 1 = 0 David Mandle, Sameep Tandon (Stanford) April 29, / 19
38 Optimization via Dual Decomposition Primal Problem: Lagrangian Dual: min x1,x 2 f 1 (x 1 ) + f 2 (x 2 ) s.t x 2 x 1 = 0 g(y) = min x 1,x 2 f 1 (x 1 ) + f 2 (x 2 ) + y T (x 2 x 1 ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
39 Optimization via Dual Decomposition Primal Problem: Lagrangian Dual: min x1,x 2 f 1 (x 1 ) + f 2 (x 2 ) s.t x 2 x 1 = 0 g(y) = min x 1,x 2 f 1 (x 1 ) + f 2 (x 2 ) + y T (x 2 x 1 ) Decompose Lagrangian Dual: g(y) = (min f 1 (x 1 ) y T x 1 ) + (min f 2 (x 2 ) + y T x 2 ) x 1 x 2 David Mandle, Sameep Tandon (Stanford) April 29, / 19
40 Optimization via Dual Decomposition g(y) = (min f 1 (x 1 ) y T x 1 ) + (min f 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} g 1 (y) g 2 (y) For all y, g(y) is a lower bound on optimal of the primal problem. David Mandle, Sameep Tandon (Stanford) April 29, / 19
41 Optimization via Dual Decomposition g(y) = (min f 1 (x 1 ) y T x 1 ) + (min f 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} g 1 (y) g 2 (y) For all y, g(y) is a lower bound on optimal of the primal problem. Maximize g(y) w.r.t. y to get the tightest bound David Mandle, Sameep Tandon (Stanford) April 29, / 19
42 Optimization via Dual Decomposition g(y) = (min f 1 (x 1 ) y T x 1 ) + (min f 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} g 1 (y) g 2 (y) For all y, g(y) is a lower bound on optimal of the primal problem. Maximize g(y) w.r.t. y to get the tightest bound Further, g(y) is concave in y. Many techniques for concave maximization (subgradient ascent, etc). David Mandle, Sameep Tandon (Stanford) April 29, / 19
43 Optimization via Dual Decomposition g(y) = (min f 1 (x 1 ) y T x 1 ) + (min f 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} g 1 (y) g 2 (y) For all y, g(y) is a lower bound on optimal of the primal problem. Maximize g(y) w.r.t. y to get the tightest bound Further, g(y) is concave in y. Many techniques for concave maximization (subgradient ascent, etc). Ideally have fast optimization strategies for g 1 (y) and g 2 (y) David Mandle, Sameep Tandon (Stanford) April 29, / 19
44 Optimization via Dual Decomposition Back to our Segmentation problem Energy E(x) = h k (nk 1 ) + k w pq x p x q + h(n 1 ) (p,q) N where n 1 k = p V k x p and n 1 = p V x p David Mandle, Sameep Tandon (Stanford) April 29, / 19
45 Optimization via Dual Decomposition Back to our Segmentation problem Energy E(x) = k h k (n 1 k ) + (p,q) N where n 1 k = p V k x p and n 1 = p V x p Group terms E(x) = k h k (n 1 k ) + (p,q) N w pq x p x q + h(n 1 ) w pq x p x q } {{ } E 1 (x) + h(n 1 ) }{{} E 2 (x) David Mandle, Sameep Tandon (Stanford) April 29, / 19
46 Optimization via Dual Decomposition Back to our Segmentation problem Energy E(x) = k h k (n 1 k ) + (p,q) N where n 1 k = p V k x p and n 1 = p V x p Group terms E(x) = k E(x) = E 1 (x) + E 2 (x) h k (n 1 k ) + (p,q) N w pq x p x q + h(n 1 ) w pq x p x q } {{ } E 1 (x) + h(n 1 ) }{{} E 2 (x) David Mandle, Sameep Tandon (Stanford) April 29, / 19
47 Optimization via Dual Decomposition Dual Decomposition on E(x) = E 1 (x) + E 2 (x) Φ(y) = min(e 1 (x 1 ) y T x 1 ) + min(e 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} Φ 1 (y) Φ 2 (y) David Mandle, Sameep Tandon (Stanford) April 29, / 19
48 Optimization via Dual Decomposition Dual Decomposition on E(x) = E 1 (x) + E 2 (x) Φ(y) = min(e 1 (x 1 ) y T x 1 ) + min(e 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} Φ 1 (y) Φ 2 (y) Maximize Φ(y) w.r.t. y to get the tightest lower bound. David Mandle, Sameep Tandon (Stanford) April 29, / 19
49 Optimization via Dual Decomposition Dual Decomposition on E(x) = E 1 (x) + E 2 (x) Φ(y) = min(e 1 (x 1 ) y T x 1 ) + min(e 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} Φ 1 (y) Φ 2 (y) Maximize Φ(y) w.r.t. y to get the tightest lower bound. Φ 1 (y) can be computed efficiently via reduction to min s-t cut. David Mandle, Sameep Tandon (Stanford) April 29, / 19
50 Optimization via Dual Decomposition Dual Decomposition on E(x) = E 1 (x) + E 2 (x) Φ(y) = min(e 1 (x 1 ) y T x 1 ) + min(e 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} Φ 1 (y) Φ 2 (y) Maximize Φ(y) w.r.t. y to get the tightest lower bound. Φ 1 (y) can be computed efficiently via reduction to min s-t cut. Φ 2 (y) via convex minimization (several strategies) David Mandle, Sameep Tandon (Stanford) April 29, / 19
51 Optimization via Dual Decomposition Dual Decomposition on E(x) = E 1 (x) + E 2 (x) Φ(y) = min(e 1 (x 1 ) y T x 1 ) + min(e 2 (x 2 ) + y T x 2 ) x } 1 x {{}} 2 {{} Φ 1 (y) Φ 2 (y) Maximize Φ(y) w.r.t. y to get the tightest lower bound. Φ 1 (y) can be computed efficiently via reduction to min s-t cut. Φ 2 (y) via convex minimization (several strategies) You could now use your favorite concave maximization strategy for Φ(y) David Mandle, Sameep Tandon (Stanford) April 29, / 19
52 Optimization via Dual Decomposition Turns out can maximize Φ(y) over y in polynomial time using a parametric max-flow technique. David Mandle, Sameep Tandon (Stanford) April 29, / 19
53 Optimization via Dual Decomposition Turns out can maximize Φ(y) over y in polynomial time using a parametric max-flow technique. Sketch: David Mandle, Sameep Tandon (Stanford) April 29, / 19
54 Optimization via Dual Decomposition Turns out can maximize Φ(y) over y in polynomial time using a parametric max-flow technique. Sketch: Theorem: Given Φ 1 (y) and Φ 2 (y) as described, optimal y has form y = s1 David Mandle, Sameep Tandon (Stanford) April 29, / 19
55 Optimization via Dual Decomposition Turns out can maximize Φ(y) over y in polynomial time using a parametric max-flow technique. Sketch: Theorem: Given Φ 1 (y) and Φ 2 (y) as described, optimal y has form y = s1 Implication: maximize Φ(s1) over all possible s David Mandle, Sameep Tandon (Stanford) April 29, / 19
56 Optimization via Dual Decomposition Turns out can maximize Φ(y) over y in polynomial time using a parametric max-flow technique. Sketch: Theorem: Given Φ 1 (y) and Φ 2 (y) as described, optimal y has form y = s1 Implication: maximize Φ(s1) over all possible s Φ1 (s1) is piecewise-linear concave V breakpoints computed by parametric max-flow Parametric max-flow also returns between 2 and V + 1 solutions segmentations x per breakpoint David Mandle, Sameep Tandon (Stanford) April 29, / 19
57 Optimization via Dual Decomposition Turns out can maximize Φ(y) over y in polynomial time using a parametric max-flow technique. Sketch: Theorem: Given Φ 1 (y) and Φ 2 (y) as described, optimal y has form y = s1 Implication: maximize Φ(s1) over all possible s Φ1 (s1) is piecewise-linear concave V breakpoints computed by parametric max-flow Parametric max-flow also returns between 2 and V + 1 solutions segmentations x per breakpoint Φ 2 (s1) is piecewise-linear concave V breakpoints can be enumerated from h( ) David Mandle, Sameep Tandon (Stanford) April 29, / 19
58 Optimization via Dual Decomposition Turns out can maximize Φ(y) over y in polynomial time using a parametric max-flow technique. Sketch: Theorem: Given Φ 1 (y) and Φ 2 (y) as described, optimal y has form y = s1 Implication: maximize Φ(s1) over all possible s Φ1 (s1) is piecewise-linear concave V breakpoints computed by parametric max-flow Parametric max-flow also returns between 2 and V + 1 solutions segmentations x per breakpoint Φ 2 (s1) is piecewise-linear concave V breakpoints can be enumerated from h( ) Φ(s1) is piecewise-linear concave with 2 V breakpoints, so finding max over Φ(1s) is easy it s one of the breakpoints. David Mandle, Sameep Tandon (Stanford) April 29, / 19
59 Optimization via Dual Decomposition Turns out can maximize Φ(y) over y in polynomial time using a parametric max-flow technique. Sketch: Theorem: Given Φ 1 (y) and Φ 2 (y) as described, optimal y has form y = s1 Implication: maximize Φ(s1) over all possible s Φ1 (s1) is piecewise-linear concave V breakpoints computed by parametric max-flow Parametric max-flow also returns between 2 and V + 1 solutions segmentations x per breakpoint Φ 2 (s1) is piecewise-linear concave V breakpoints can be enumerated from h( ) Φ(s1) is piecewise-linear concave with 2 V breakpoints, so finding max over Φ(1s) is easy it s one of the breakpoints. Given the breakpoint, return the segmentation x with minimum energy from set given by PMF. David Mandle, Sameep Tandon (Stanford) April 29, / 19
60 References MRF Review Slides: spring1213/slides/segmentation.p Dual Decomposition Slides: Petter Strandmark and Fredrik Kahl. Parallel and Distributed Graph Cuts by Dual Decomposition. David Mandle, Sameep Tandon (Stanford) April 29, / 19
Energy minimization via graph-cuts
Energy minimization via graph-cuts Nikos Komodakis Ecole des Ponts ParisTech, LIGM Traitement de l information et vision artificielle Binary energy minimization We will first consider binary MRFs: Graph
More informationPart 7: Structured Prediction and Energy Minimization (2/2)
Part 7: Structured Prediction and Energy Minimization (2/2) Colorado Springs, 25th June 2011 G: Worst-case Complexity Hard problem Generality Optimality Worst-case complexity Integrality Determinism G:
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationUndirected Graphical Models: Markov Random Fields
Undirected Graphical Models: Markov Random Fields 40-956 Advanced Topics in AI: Probabilistic Graphical Models Sharif University of Technology Soleymani Spring 2015 Markov Random Field Structure: undirected
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationSTATS 306B: Unsupervised Learning Spring Lecture 3 April 7th
STATS 306B: Unsupervised Learning Spring 2014 Lecture 3 April 7th Lecturer: Lester Mackey Scribe: Jordan Bryan, Dangna Li 3.1 Recap: Gaussian Mixture Modeling In the last lecture, we discussed the Gaussian
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More informationLikelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University
Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x q) Likelihood: estimate
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationExpectation maximization
Expectation maximization Subhransu Maji CMSCI 689: Machine Learning 14 April 2015 Motivation Suppose you are building a naive Bayes spam classifier. After your are done your boss tells you that there is
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationUndirected graphical models
Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical
More informationBayesian Machine Learning - Lecture 7
Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1
More informationComputer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization
Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More informationProbabilistic Graphical Models & Applications
Probabilistic Graphical Models & Applications Learning of Graphical Models Bjoern Andres and Bernt Schiele Max Planck Institute for Informatics The slides of today s lecture are authored by and shown with
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More informationEnergy Minimization via Graph Cuts
Energy Minimization via Graph Cuts Xiaowei Zhou, June 11, 2010, Journal Club Presentation 1 outline Introduction MAP formulation for vision problems Min-cut and Max-flow Problem Energy Minimization via
More informationExpectation Maximization
Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning More Approximate Inference Mark Schmidt University of British Columbia Winter 2018 Last Time: Approximate Inference We ve been discussing graphical models for density estimation,
More information1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM.
Université du Sud Toulon - Var Master Informatique Probabilistic Learning and Data Analysis TD: Model-based clustering by Faicel CHAMROUKHI Solution The aim of this practical wor is to show how the Classification
More informationClustering and Gaussian Mixture Models
Clustering and Gaussian Mixture Models Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 25, 2016 Probabilistic Machine Learning (CS772A) Clustering and Gaussian Mixture Models 1 Recap
More informationPosterior Regularization
Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods
More informationLatent Variable View of EM. Sargur Srihari
Latent Variable View of EM Sargur srihari@cedar.buffalo.edu 1 Examples of latent variables 1. Mixture Model Joint distribution is p(x,z) We don t have values for z 2. Hidden Markov Model A single time
More informationExpectation Maximization
Expectation Maximization Aaron C. Courville Université de Montréal Note: Material for the slides is taken directly from a presentation prepared by Christopher M. Bishop Learning in DAGs Two things could
More informationSeries 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning)
Exercises Introduction to Machine Learning SS 2018 Series 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning) LAS Group, Institute for Machine Learning Dept of Computer Science, ETH Zürich Prof
More informationEfficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials Philipp Krähenbühl and Vladlen Koltun Stanford University Presenter: Yuan-Ting Hu 1 Conditional Random Field (CRF) E x I = φ u
More informationCS Lecture 4. Markov Random Fields
CS 6347 Lecture 4 Markov Random Fields Recap Announcements First homework is available on elearning Reminder: Office hours Tuesday from 10am-11am Last Time Bayesian networks Today Markov random fields
More informationTHE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING
THE HIDDEN CONVEXITY OF SPECTRAL CLUSTERING Luis Rademacher, Ohio State University, Computer Science and Engineering. Joint work with Mikhail Belkin and James Voss This talk A new approach to multi-way
More informationMarkov Random Fields
Markov Random Fields Umamahesh Srinivas ipal Group Meeting February 25, 2011 Outline 1 Basic graph-theoretic concepts 2 Markov chain 3 Markov random field (MRF) 4 Gauss-Markov random field (GMRF), and
More informationGaussian Mixture Models, Expectation Maximization
Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationMarkov Random Fields for Computer Vision (Part 1)
Markov Random Fields for Computer Vision (Part 1) Machine Learning Summer School (MLSS 2011) Stephen Gould stephen.gould@anu.edu.au Australian National University 13 17 June, 2011 Stephen Gould 1/23 Pixel
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationIntroduction to Probabilistic Graphical Models: Exercises
Introduction to Probabilistic Graphical Models: Exercises Cédric Archambeau Xerox Research Centre Europe cedric.archambeau@xrce.xerox.com Pascal Bootcamp Marseille, France, July 2010 Exercise 1: basics
More informationMAP Examples. Sargur Srihari
MAP Examples Sargur srihari@cedar.buffalo.edu 1 Potts Model CRF for OCR Topics Image segmentation based on energy minimization 2 Examples of MAP Many interesting examples of MAP inference are instances
More informationJoint Factor Analysis for Speaker Verification
Joint Factor Analysis for Speaker Verification Mengke HU ASPITRG Group, ECE Department Drexel University mengke.hu@gmail.com October 12, 2012 1/37 Outline 1 Speaker Verification Baseline System Session
More informationProbabilistic Graphical Models
Probabilistic Graphical Models David Sontag New York University Lecture 6, March 7, 2013 David Sontag (NYU) Graphical Models Lecture 6, March 7, 2013 1 / 25 Today s lecture 1 Dual decomposition 2 MAP inference
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More information13: Variational inference II
10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational
More informationLecture 6: Gaussian Mixture Models (GMM)
Helsinki Institute for Information Technology Lecture 6: Gaussian Mixture Models (GMM) Pedram Daee 3.11.2015 Outline Gaussian Mixture Models (GMM) Models Model families and parameters Parameter learning
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationWeighted Finite-State Transducers in Computational Biology
Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Joint work with Corinna Cortes (Google Research). 1 This Tutorial
More informationSTATS 306B: Unsupervised Learning Spring Lecture 2 April 2
STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationMixture Models and EM
Mixture Models and EM Yoan Miche CIS, HUT March 17, 27 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 1 / 23 Mise en Bouche Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 2 / 23 Mise
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 18 Oct, 21, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models CPSC
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationEstimating Latent Variable Graphical Models with Moments and Likelihoods
Estimating Latent Variable Graphical Models with Moments and Likelihoods Arun Tejasvi Chaganty Percy Liang Stanford University June 18, 2014 Chaganty, Liang (Stanford University) Moments and Likelihoods
More informationCS 664 Segmentation (2) Daniel Huttenlocher
CS 664 Segmentation (2) Daniel Huttenlocher Recap Last time covered perceptual organization more broadly, focused in on pixel-wise segmentation Covered local graph-based methods such as MST and Felzenszwalb-Huttenlocher
More informationRevisiting the Limits of MAP Inference by MWSS on Perfect Graphs
Revisiting the Limits of MAP Inference by MWSS on Perfect Graphs Adrian Weller University of Cambridge CP 2015 Cork, Ireland Slides and full paper at http://mlg.eng.cam.ac.uk/adrian/ 1 / 21 Motivation:
More informationLatent Variable Models
Latent Variable Models Stefano Ermon, Aditya Grover Stanford University Lecture 5 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 5 1 / 31 Recap of last lecture 1 Autoregressive models:
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationMixtures of Gaussians. Sargur Srihari
Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm
More information13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction
More informationDual Decomposition for Inference
Dual Decomposition for Inference Yunshu Liu ASPITRG Research Group 2014-05-06 References: [1]. D. Sontag, A. Globerson and T. Jaakkola, Introduction to Dual Decomposition for Inference, Optimization for
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu November 3, 2015 Methods to Learn Matrix Data Text Data Set Data Sequence Data Time Series Graph
More informationClustering, K-Means, EM Tutorial
Clustering, K-Means, EM Tutorial Kamyar Ghasemipour Parts taken from Shikhar Sharma, Wenjie Luo, and Boris Ivanovic s tutorial slides, as well as lecture notes Organization: Clustering Motivation K-Means
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationPushmeet Kohli Microsoft Research
Pushmeet Kohli Microsoft Research E(x) x in {0,1} n Image (D) [Boykov and Jolly 01] [Blake et al. 04] E(x) = c i x i Pixel Colour x in {0,1} n Unary Cost (c i ) Dark (Bg) Bright (Fg) x* = arg min E(x)
More informationBetter restore the recto side of a document with an estimation of the verso side: Markov model and inference with graph cuts
June 23 rd 2008 Better restore the recto side of a document with an estimation of the verso side: Markov model and inference with graph cuts Christian Wolf Laboratoire d InfoRmatique en Image et Systèmes
More informationLecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models
Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance
More informationGaussian Mixture Models
Gaussian Mixture Models David Rosenberg, Brett Bernstein New York University April 26, 2017 David Rosenberg, Brett Bernstein (New York University) DS-GA 1003 April 26, 2017 1 / 42 Intro Question Intro
More informationReview and Motivation
Review and Motivation We can model and visualize multimodal datasets by using multiple unimodal (Gaussian-like) clusters. K-means gives us a way of partitioning points into N clusters. Once we know which
More informationEfficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials by Phillip Krahenbuhl and Vladlen Koltun Presented by Adam Stambler Multi-class image segmentation Assign a class label to each
More informationBayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington
Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The
More informationCOMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017
COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY
More informationECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning
ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Markov Random Fields: Representation Conditional Random Fields Log-Linear Models Readings: KF
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationKernel methods, kernel SVM and ridge regression
Kernel methods, kernel SVM and ridge regression Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Collaborative Filtering 2 Collaborative Filtering R: rating matrix; U: user factor;
More informationVariational Mixture of Gaussians. Sargur Srihari
Variational Mixture of Gaussians Sargur srihari@cedar.buffalo.edu 1 Objective Apply variational inference machinery to Gaussian Mixture Models Demonstrates how Bayesian treatment elegantly resolves difficulties
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationA Discriminatively Trained, Multiscale, Deformable Part Model
A Discriminatively Trained, Multiscale, Deformable Part Model P. Felzenszwalb, D. McAllester, and D. Ramanan Edward Hsiao 16-721 Learning Based Methods in Vision February 16, 2009 Images taken from P.
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationCSC 412 (Lecture 4): Undirected Graphical Models
CSC 412 (Lecture 4): Undirected Graphical Models Raquel Urtasun University of Toronto Feb 2, 2016 R Urtasun (UofT) CSC 412 Feb 2, 2016 1 / 37 Today Undirected Graphical Models: Semantics of the graph:
More informationLecture 14. Clustering, K-means, and EM
Lecture 14. Clustering, K-means, and EM Prof. Alan Yuille Spring 2014 Outline 1. Clustering 2. K-means 3. EM 1 Clustering Task: Given a set of unlabeled data D = {x 1,..., x n }, we do the following: 1.
More informationHidden Markov Models Part 2: Algorithms
Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:
More informationUndirected Graphical Models
Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates
More informationVariational inference
Simon Leglaive Télécom ParisTech, CNRS LTCI, Université Paris Saclay November 18, 2016, Télécom ParisTech, Paris, France. Outline Introduction Probabilistic model Problem Log-likelihood decomposition EM
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationDiscrete Inference and Learning Lecture 3
Discrete Inference and Learning Lecture 3 MVA 2017 2018 h
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University October 29, 2016 David Rosenberg (New York University) DS-GA 1003 October 29, 2016 1 / 42 K-Means Clustering K-Means Clustering David
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationMixture Models and Expectation-Maximization
Mixture Models and Expectation-Maximiation David M. Blei March 9, 2012 EM for mixtures of multinomials The graphical model for a mixture of multinomials π d x dn N D θ k K How should we fit the parameters?
More informationDiscrete Optimization Lecture 5. M. Pawan Kumar
Discrete Optimization Lecture 5 M. Pawan Kumar pawan.kumar@ecp.fr Exam Question Type 1 v 1 s v 0 4 2 1 v 4 Q. Find the distance of the shortest path from s=v 0 to all vertices in the graph using Dijkstra
More informationStructured Variational Inference
Structured Variational Inference Sargur srihari@cedar.buffalo.edu 1 Topics 1. Structured Variational Approximations 1. The Mean Field Approximation 1. The Mean Field Energy 2. Maximizing the energy functional:
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition 2017-2018 Jakob Verbeek Clustering Finding a group structure in the data Data in one cluster similar to
More informationLearning MN Parameters with Approximation. Sargur Srihari
Learning MN Parameters with Approximation Sargur srihari@cedar.buffalo.edu 1 Topics Iterative exact learning of MN parameters Difficulty with exact methods Approximate methods Approximate Inference Belief
More informationSupport Vector Machines and Speaker Verification
1 Support Vector Machines and Speaker Verification David Cinciruk March 6, 2013 2 Table of Contents Review of Speaker Verification Introduction to Support Vector Machines Derivation of SVM Equations Soft
More information