Dual Decomposition for Inference


1 Dual Decomposition for Inference Yunshu Liu ASPITRG Research Group

2 References:
[1] D. Sontag, A. Globerson and T. Jaakkola, "Introduction to Dual Decomposition for Inference," in Optimization for Machine Learning, MIT Press.
[2] A. Globerson and T. Jaakkola, "Tutorial: Linear Programming Relaxations for Graphical Models," NIPS.
[3] C. Yanover, T. Meltzer and Y. Weiss, "Linear Programming Relaxations and Belief Propagation: An Empirical Study," JMLR, Volume 7, September 2006.
[4] A. M. Rush and M. Collins, "A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing," JAIR, Volume 45, Issue 1, September 2012.
[5] D. Batra, A. C. Gallagher, D. Parikh and T. Chen, "Beyond Trees: MRF Inference via Outer-Planar Decomposition," CVPR, 2010.

3 Outline The MAP inference problem Motivating Applications Dual decomposition and Lagrangian relaxation Algorithms for solving the dual problem Subgradient algorithms Block coordinate descent algorithms Discussion

4 Outline The MAP inference problem Motivating Applications Dual decomposition and Lagrangian relaxation Algorithms for solving the dual problem Subgradient algorithms Block coordinate descent algorithms Discussion

5 The MAP inference problem: Markov random fields. In the undirected case, the probability distribution factorizes according to functions defined on the cliques of the graph. A clique is a subset of nodes in a graph such that there exists a link between all pairs of nodes in the subset; a maximal clique is a clique such that it is not possible to include any other node from the graph in the set without it ceasing to be a clique. Denote a clique by C and the set of variables in that clique by x_C. We define the factors in the decomposition of the joint distribution on the maximal cliques; this loses no generality, because a factor defined over any other clique would be redundant. Example (graph on x_1, x_2, x_3, x_4 with edges {x_1,x_2}, {x_1,x_3}, {x_2,x_3}, {x_2,x_4}, {x_3,x_4}): the cliques are {x_1,x_2}, {x_2,x_3}, {x_3,x_4}, {x_2,x_4}, {x_1,x_3}, {x_1,x_2,x_3}, {x_2,x_3,x_4}; the maximal cliques are {x_1,x_2,x_3} and {x_2,x_3,x_4}. The set {x_1,x_2,x_3,x_4} is not a clique because of the missing link from x_1 to x_4.

6 The MAP inference problem: Markov random fields. Markov random fields: Definition. Denote C as a clique, x_C the set of variables in clique C, and ψ_C(x_C) a nonnegative potential function associated with clique C. Then a Markov random field is a collection of distributions that factorize as a product of potential functions ψ_C(x_C) over the maximal cliques of the graph: p(x) = (1/Z) ∏_C ψ_C(x_C), where the normalization constant Z = Σ_x ∏_C ψ_C(x_C) is sometimes called the partition function.

7 The MAP inference problem: Markov random fields. Factorization of undirected graphs. Question: how do we write the joint distribution for an undirected graph with edges {x_1, x_2} and {x_1, x_3}? Answer: p(x) = (1/Z) ψ_12(x_1, x_2) ψ_13(x_1, x_3), where ψ_12(x_1, x_2) and ψ_13(x_1, x_3) are the potential functions and Z is the partition function that makes p(x) a valid probability distribution.

8 The MAP inference problem. MAP: find the maximum probability assignment arg max_x p(x), where p(x) = (1/Z) ∏_C ψ_C(x_C). Converting to a max-product problem (since Z is constant): arg max_x ∏_C ψ_C(x_C). Converting to a max-sum inference task by taking the log and letting θ_C(x_C) = log ψ_C(x_C): arg max_x Σ_C θ_C(x_C).

9 The MAP inference problem. MAP: find the maximum probability assignment. Consider a set of n discrete variables x_1, ..., x_n, and a set F of subsets of these variables (i.e., each f ∈ F is a subset of V = {1, ..., n}), where each subset corresponds to the domain of one of the factors. The MAP inference task is to find an assignment x = (x_1, ..., x_n) which maximizes the sum of the factors: MAP(θ) = max_x ( Σ_{i∈V} θ_i(x_i) + Σ_{f∈F} θ_f(x_f) ). Note we also include factors θ_i(x_i) on each of the individual variables, which is useful in some applications.

10 The MAP inference problem. An example of MAP inference. Consider three binary variables x_1, x_2 and x_3 and two factors {1, 2} and {2, 3}, for which we know the values of the factors θ_12(x_1, x_2) and θ_23(x_2, x_3). The goal is MAP(θ) = max_x ( θ_12(x_1, x_2) + θ_23(x_2, x_3) ).
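
To make the max-sum objective concrete, here is a minimal brute-force sketch in Python/NumPy (our own illustration, not from the slides); the factor tables theta_12 and theta_23 below are hypothetical values chosen only for illustration.

```python
import itertools
import numpy as np

# Hypothetical log-potentials for a chain x1 - x2 - x3 (binary variables).
theta_12 = np.array([[1.0, -0.5],
                     [-0.5, 2.0]])   # theta_12[x1, x2]
theta_23 = np.array([[0.3,  0.0],
                     [1.5, -1.0]])   # theta_23[x2, x3]

def brute_force_map():
    """Enumerate all assignments and return the maximizer of the factor sum."""
    best_val, best_x = -np.inf, None
    for x1, x2, x3 in itertools.product([0, 1], repeat=3):
        val = theta_12[x1, x2] + theta_23[x2, x3]
        if val > best_val:
            best_val, best_x = val, (x1, x2, x3)
    return best_x, best_val

print(brute_force_map())   # ((1, 1, 0), 3.5) for the tables above
```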

11 Outline The MAP inference problem Motivating Applications Dual decomposition and Lagrangian relaxation Algorithms for solving the dual problem Subgradient algorithms Block coordinate descent algorithms Discussion

12 Motivating Applications. Protein side-chain placement (see reference [3] for details). (Figure: a protein backbone (main chain) with side-chain variables x_1, ..., x_5 and pairwise potentials such as θ_12(x_1, x_2), θ_34(x_3, x_4), θ_45(x_4, x_5).) For a fixed protein backbone, find the maximum probability assignment of amino acid side-chains. Orientations of the side-chains are represented by discretized angles called rotamers; rotamer choices for nearby amino acids are energetically coupled (attractive or repulsive forces).

13 Motivating Applications. Dependency parsing (see reference [4] for details). Given a sentence, predict the dependency tree that relates the words in it: root He watched a basketball game last sunday. An arc is drawn from the head word of each phrase to the words that modify it. We use θ_ij(x_ij) to denote the weight on the arc between words i and j, and θ_i(x_i) for higher-order interactions between the arc selections for a given word i, where x_ij ∈ {0, 1} indicates the existence of an arc from i to j and x_i = {x_ij}_{j≠i} is the coupled arc selection. The task is to maximize a function involving Σ_ij θ_ij(x_ij) + Σ_i θ_i(x_i).

14 Motivating Applications. Other applications: Computational biology, e.g. protein design (see reference [3] for details). Natural language processing, e.g. combining a weighted context-free grammar with a finite-state tagger (see reference [4] for details). Computer vision, e.g. image segmentation, object labeling (see reference [5] for details). Graphical models, e.g. structure learning of Bayesian networks and Markov networks.

15 Outline The MAP inference problem Motivating Applications Dual decomposition and Lagrangian relaxation Algorithms for solving the dual problem Subgradient algorithms Block coordinate descent algorithms Discussion

16 Dual decomposition and Lagrangian relaxation: Approximate inference for the MAP problem. MAP: find the maximum probability assignment. The MAP inference task is to find an assignment x = (x_1, ..., x_n) which maximizes the sum of the factors: MAP(θ) = max_x ( Σ_{i∈V} θ_i(x_i) + Σ_{f∈F} θ_f(x_f) ). MAP inference is hard; even for pairwise MRFs the problem is NP-hard. Dynamic programming provides exact inference for some special cases (e.g. tree-structured graphs). In general, approximate inference algorithms are used: by relaxing some constraints we can divide the problem into subproblems that are easy to solve.

17 Dual decomposition and Lagrangian relaxation. For the MAP inference problem MAP(θ) = max_x ( Σ_{i∈V} θ_i(x_i) + Σ_{f∈F} θ_f(x_f) ), for any given factor f we make a copy of each variable x_i in the factor and denote it x_i^f. (Figure 1.3, adapted from reference [1]: the pairwise model with factors θ_f(x_1, x_2), θ_g(x_1, x_3), θ_h(x_2, x_4), θ_k(x_3, x_4) is split into independent subproblems by creating copies of the variables, e.g. x_1^f, x_2^f for factor f, together with copies of the singleton terms such as θ_{f1}(x_1), subject to the agreement constraints x_i^f = x_i.)

18 Dual decomposition and Lagrangian relaxation. MAP: find the maximum probability assignment. The original MAP inference problem is equivalent to: max_{x, x^F} Σ_{i∈V} θ_i(x_i) + Σ_{f∈F} θ_f(x_f^f) s.t. x_i^f = x_i for all f, i ∈ f, since we add the constraints x_i^f = x_i for all f, i ∈ f. Lagrangian: introduce Lagrange multipliers δ = {δ_fi(x_i) : f ∈ F, i ∈ f, x_i}, then define the Lagrangian: L(δ, x, x^F) = Σ_{i∈V} θ_i(x_i) + Σ_{f∈F} θ_f(x_f^f) + Σ_{f∈F} Σ_{i∈f} Σ_{x̂_i} δ_fi(x̂_i) ( 1[x_i = x̂_i] − 1[x_i^f = x̂_i] ).

19 Dual decomposition and Lagrangian relaxation. Lagrangian. Then the original MAP inference problem is equivalent to: max_{x, x^F} L(δ, x, x^F) s.t. x_i^f = x_i for all f, i ∈ f, since if the constraints x_i^f = x_i for all f, i ∈ f hold, the last term in L(δ, x, x^F) is zero. Meaning of the Lagrange multipliers δ_fi(x_i): consider δ_fi(x_i) as a message that factor f sends to variable i about its state x_i; iteratively updating δ_fi(x_i) is then similar to message passing between factors and variables.

20 Dual decomposition and Lagrangian relaxation. Lagrangian relaxation. Now we discard the constraints x_i^f = x_i for all f, i ∈ f and consider the relaxed dual function: L(δ) = max_{x, x^F} L(δ, x, x^F) = Σ_{i∈V} max_{x_i} ( θ_i(x_i) + Σ_{f : i∈f} δ_fi(x_i) ) + Σ_{f∈F} max_{x_f^f} ( θ_f(x_f^f) − Σ_{i∈f} δ_fi(x_i^f) ). Notice that without the constraints x_i^f = x_i this problem is no longer equivalent to the original one, since we exchanged the positions of max and sum. Each subproblem involves maximization over only a small set of variables; we use δ_fi(x_i) to help achieve consensus among the subproblems.

21 Dual decomposition and Lagrangian relaxation. The dual problem. Without the constraints x_i^f = x_i for all f, i ∈ f, L(δ) is maximized over a larger space, thus MAP(θ) ≤ L(δ). The goal is then to find the tightest upper bound by solving min_δ L(δ), where L(δ) = max_{x, x^F} L(δ, x, x^F) = Σ_{i∈V} max_{x_i} ( θ_i(x_i) + Σ_{f : i∈f} δ_fi(x_i) ) + Σ_{f∈F} max_{x_f} ( θ_f(x_f) − Σ_{i∈f} δ_fi(x_i) ). Note that the maximizations in L(δ) are independent, so in the updating process we do not distinguish between x_i and x_i^f.
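
A minimal sketch (our own toy code, not from the references) of evaluating the dual bound L(δ) for a pairwise model: each singleton and each factor term is maximized independently after absorbing the messages δ. All names (eval_dual, theta_i, theta_f, delta) are illustrative.

```python
import numpy as np

def eval_dual(theta_i, theta_f, delta):
    """Dual bound L(delta) = sum_i max_xi theta_i^delta + sum_f max_xf theta_f^delta."""
    val = 0.0
    # Reparameterized singleton terms: theta_i plus incoming messages.
    for i, th in theta_i.items():
        reparam = th + sum(delta[(f, i)] for f in theta_f if i in f)
        val += reparam.max()
    # Reparameterized pairwise factor terms: theta_f minus outgoing messages.
    for f, th in theta_f.items():
        i, j = f
        reparam = th - delta[(f, i)][:, None] - delta[(f, j)][None, :]
        val += reparam.max()
    return val

# Tiny example: two binary variables, one edge; delta initialized to zero.
theta_i = {0: np.array([0.0, 1.0]), 1: np.array([0.5, 0.0])}
theta_f = {(0, 1): np.array([[0.0, 2.0], [1.0, 0.0]])}
delta = {(f, i): np.zeros(2) for f in theta_f for i in f}
print(eval_dual(theta_i, theta_f, delta))  # 1.0 + 0.5 + 2.0 = 3.5
```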

22 Dual decomposition and Lagrangian relaxation. Reparameterization interpretation of the dual. Given a set of dual variables δ, define θ_i^δ(x_i) = θ_i(x_i) + Σ_{f : i∈f} δ_fi(x_i) and θ_f^δ(x_f) = θ_f(x_f) − Σ_{i∈f} δ_fi(x_i). For all assignments x we have: Σ_{i∈V} θ_i(x_i) + Σ_{f∈F} θ_f(x_f) = Σ_{i∈V} θ_i^δ(x_i) + Σ_{f∈F} θ_f^δ(x_f). We call θ^δ a reparameterization of the original parameters θ. Observe that L(δ) = Σ_{i∈V} max_{x_i} θ_i^δ(x_i) + Σ_{f∈F} max_{x_f} θ_f^δ(x_f). Thus the dual problem can be viewed as searching for the reparameterization θ^δ that minimizes L(δ).
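
A quick numerical check of the reparameterization identity on a made-up two-variable model (assumed toy data, not from the references): the δ terms cancel for every assignment x.

```python
import numpy as np

# Tiny model: two binary variables and one pairwise factor (values are made up).
theta_i = {0: np.array([0.0, 1.0]), 1: np.array([0.5, 0.0])}
theta_f = {(0, 1): np.array([[0.0, 2.0], [1.0, 0.0]])}
delta = {(f, i): np.random.randn(2) for f in theta_f for i in f}
x = {0: 1, 1: 0}                      # an arbitrary assignment

orig = (sum(th[x[i]] for i, th in theta_i.items())
        + sum(th[x[i], x[j]] for (i, j), th in theta_f.items()))

reparam = 0.0
for i, th in theta_i.items():         # theta_i^delta = theta_i + incoming messages
    reparam += th[x[i]] + sum(delta[(f, i)][x[i]] for f in theta_f if i in f)
for (i, j), th in theta_f.items():    # theta_f^delta = theta_f - outgoing messages
    reparam += th[x[i], x[j]] - delta[((i, j), i)][x[i]] - delta[((i, j), j)][x[j]]

assert np.isclose(orig, reparam)      # the delta terms cancel for every x
```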

23 Dual decomposition and Lagrangian relaxation. Reparameterization interpretation of the dual. (Figure 1.2, adapted from reference [1]. Left: the original pairwise model consisting of four factors θ_f(x_1,x_2), θ_g(x_1,x_3), θ_h(x_2,x_4), θ_k(x_3,x_4) together with singleton terms. Right: the independent maximization problems corresponding to the objective L(δ), one per factor and one per variable, built from the reparameterized terms such as θ_{f1}(x_1) and θ_{f2}(x_2).)

24 Dual decomposition and Lagrangian relaxation. Existence of a duality gap. The difference between the primal optimal value and the dual optimal value is called the duality gap. For our problem, MAP(θ) = max_x ( Σ_{i∈V} θ_i(x_i) + Σ_{f∈F} θ_f(x_f) ) ≤ min_δ L(δ), and we do not necessarily have strong duality (i.e., equality in the above relation, hence no duality gap). Strong duality holds only when the subproblems agree on a maximizing assignment, that is, when there exist δ*, x* such that x_i* ∈ arg max_{x_i} θ_i^{δ*}(x_i) and x_f* ∈ arg max_{x_f} θ_f^{δ*}(x_f), which is not guaranteed. However, in real-world problems exact solutions are frequently found using dual decomposition.

25 Outline The MAP inference problem Motivating Applications Dual decomposition and Lagrangian relaxation Algorithms for solving the dual problem Subgradient algorithms Block coordinate descent algorithms Discussion

26 Outline The MAP inference problem Motivating Applications Dual decomposition and Lagrangian relaxation Algorithms for solving the dual problem Subgradient algorithms Block coordinate descent algorithms Discussion

27 Subgradient algorithms. Gradient descent / steepest descent. If f(x) is differentiable in a neighborhood of a point a, then f(x) decreases fastest if one goes from a in the direction of the negative gradient of f at a, −∇f(a), which gives the gradient descent algorithm: x_{t+1} = x_t − α_t ∇f(x_t), where α_t is the step size; recall the different ways to choose the step size: fixed, decreasing, adaptive. (Note: the figure is adapted from Wikipedia.)
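
A minimal gradient descent loop on a toy smooth function, with a fixed step size (our own illustration):

```python
import numpy as np

def grad_f(x):
    # Gradient of f(x) = ||x||^2 / 2, a toy smooth convex function.
    return x

x = np.array([3.0, -2.0])
alpha = 0.1                       # fixed step size
for t in range(100):
    x = x - alpha * grad_f(x)     # x_{t+1} = x_t - alpha_t * grad f(x_t)
print(x)                          # converges toward the minimizer [0, 0]
```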

28 Subgradient algorithms. Limitations of gradient descent: for ill-conditioned convex problems, gradient descent tends to zigzag, and the function must be differentiable. Our problem of minimizing L(δ): L(δ) = Σ_{i∈V} max_{x_i} θ_i^δ(x_i) + Σ_{f∈F} max_{x_f} θ_f^δ(x_f). L(δ) is continuous and convex, but it is not differentiable at points δ where θ_i^δ(x_i) or θ_f^δ(x_f) have multiple optima.

29 Subgradient algorithms. Subgradient algorithms. A subgradient of a convex function L(δ) at δ is a vector g_δ such that for all δ', L(δ') ≥ L(δ) + g_δ · (δ' − δ). Suppose that the dual variables at iteration t are given by δ^t; then the updating rule is: δ_fi^{t+1}(x_i) = δ_fi^t(x_i) − α_t g_fi^t(x_i), where g^t is a subgradient of L(δ) at δ^t.

30 Subgradient algorithms. Calculating the subgradient of L(δ). Given the current dual variables δ^t, we first choose a maximizing assignment for each subproblem: let x_i^s be a maximizing assignment of θ_i^{δ^t}(x_i), and let x_f^f be a maximizing assignment of θ_f^{δ^t}(x_f^f). The subgradient of L(δ) at δ^t is then given by the following pseudocode:
g_fi^t(x_i) = 0 for all f, i ∈ f, x_i
For f ∈ F and i ∈ f:
  If x_i^f ≠ x_i^s:
    g_fi^t(x_i^s) = +1
    g_fi^t(x_i^f) = −1
(Note: the algorithm is adapted from reference [1].)

31 Subgradient algorithms. Updating with the subgradient g^δ. Recall δ_fi^{t+1}(x_i) = δ_fi^t(x_i) − α_t g_fi^t(x_i), with θ_i^δ(x_i) = θ_i(x_i) + Σ_{f : i∈f} δ_fi(x_i) and θ_f^δ(x_f) = θ_f(x_f) − Σ_{i∈f} δ_fi(x_i). In the updating process, whenever x_i^f ≠ x_i^s, the singleton term θ_i^{δ^t}(x_i^s) decreases and θ_i^{δ^t}(x_i^f) increases; similarly, the factor term θ_f^{δ^t}(x_i^f, x_{f\i}) decreases and θ_f^{δ^t}(x_i^s, x_{f\i}) increases, pushing the two subproblems toward agreement.

32 Subgradient algorithms. Notes on the subgradient algorithm. If at any iteration t we find that g^t = 0, then x_i^f = x_i^s for all f ∈ F, i ∈ f; there is no duality gap and we have exactly solved the problem. The subgradient algorithm will solve the dual to optimality whenever the step sizes are chosen such that lim_{t→∞} α_t = 0 and Σ_{t=0}^∞ α_t = ∞, e.g. α_t = 1/t.
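
Putting the last three slides together, a sketch of one subgradient step on L(δ) for a pairwise model, following the sign pattern of the pseudocode above; the data layout (dicts of NumPy arrays keyed by variables and factor tuples) is our own assumption.

```python
import numpy as np

def subgradient_step(theta_i, theta_f, delta, alpha):
    """One subgradient step on L(delta) for a pairwise model.

    theta_i: dict i -> 1-D array, theta_f: dict (i, j) -> 2-D array,
    delta:   dict (f, i) -> 1-D array of messages (modified in place).
    """
    # Maximizers of the reparameterized singleton terms: x_i^s.
    xs = {}
    for i, th in theta_i.items():
        reparam = th + sum(delta[(f, i)] for f in theta_f if i in f)
        xs[i] = int(np.argmax(reparam))
    # Maximizers of the reparameterized factor terms: x^f.
    xf = {}
    for f, th in theta_f.items():
        i, j = f
        reparam = th - delta[(f, i)][:, None] - delta[(f, j)][None, :]
        xf[f] = np.unravel_index(np.argmax(reparam), reparam.shape)
    # Subgradient update: whenever the factor copy disagrees with x_i^s,
    # move delta so the two subproblems are pushed toward agreement.
    for f in theta_f:
        for pos, i in enumerate(f):
            if xf[f][pos] != xs[i]:
                delta[(f, i)][xs[i]] -= alpha        # g = +1 at x_i^s
                delta[(f, i)][xf[f][pos]] += alpha   # g = -1 at x_i^f
    return xs

# Usage with a decreasing step size alpha_t = 1/t (model from earlier sketches):
# for t in range(1, 200): subgradient_step(theta_i, theta_f, delta, 1.0 / t)
```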

33 Outline The MAP inference problem Motivating Applications Dual decomposition and Lagrangian relaxation Algorithms for solving the dual problem Subgradient algorithms Block coordinate descent algorithms Discussion

34 Block coordinate descent algorithms. Coordinate descent: fix all other coordinates, do a line search along the chosen coordinate direction, then cycle through the coordinates, updating one at a time throughout the procedure.

35 Block coordinate descent algorithms. Block coordinate descent: the coordinates are divided into blocks, where each block contains a subset of the coordinates, and the update rule is to update one block at a time while keeping the other blocks fixed. Difficulties in designing block coordinate descent algorithms: how to choose the coordinate block to update, and how to design the update rule for a given coordinate block. A generic numerical sketch follows.
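
As a generic illustration of coordinate descent (not specific to MAP inference; our own toy example), exact coordinate minimization of a two-dimensional quadratic, updating one coordinate at a time while the other is held fixed:

```python
import numpy as np

# Minimize f(x) = 0.5 * x^T A x - b^T x by exact coordinate minimization.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite
b = np.array([1.0, -1.0])
x = np.zeros(2)

for sweep in range(50):
    for i in range(2):
        # Minimizing over x_i with the other coordinate fixed gives the
        # closed-form update: x_i = (b_i - sum_{j != i} A_ij x_j) / A_ii.
        x[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]

print(x, np.linalg.solve(A, b))          # the two should (nearly) agree
```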

36 Block coordinate descent algorithms. Block coordinate descent for MAP inference. The updating rule of the subgradient method only involves the entries x_i^s, x_f^f corresponding to the maximizing assignments, whereas in block coordinate descent algorithms we update δ_fi(x_i) for all configurations of x_i. Two common choices of blocks: Fix f and i, and update δ_fi(x_i) for all x_i: Max-Sum Diffusion (MSD) (Schlesinger 76, Werner 07). Fix f, and update δ_fi(x_i) for all i ∈ f and all x_i: Max-Product Linear Programming (MPLP) (Globerson & Jaakkola 07).

37 Block coordinate descent algorithms. The Max-Sum Diffusion algorithm: δ_fi(x_i) = −(1/2) δ_i^{−f}(x_i) + (1/2) max_{x_{f\i}} { θ_f(x_f) − Σ_{î ∈ f\i} δ_{f î}(x_î) }, where δ_i^{−f}(x_i) = θ_i(x_i) + Σ_{f̂ ≠ f} δ_{f̂ i}(x_i). For fixed f and i, Σ_{î ∈ f\i} δ_{f î}(x_î) is the message from f to all the children of f except for i (i.e. f\i), and Σ_{f̂ ≠ f} δ_{f̂ i}(x_i) is the message from all of i's parents other than f to i. (Figure: a factor graph indicating which message is updated and which messages are used in the update.)
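
A sketch of the MSD update for a pairwise factor, using the same toy dictionary layout as the earlier dual-bound sketch; treat it as an assumption-laden illustration of the update above rather than a reference implementation.

```python
import numpy as np

def msd_update(theta_i, theta_f, delta, f, i):
    """Max-sum diffusion update of delta[(f, i)](x_i) for all x_i at once,
    for a pairwise factor f = (a, b)."""
    a, b = f
    j = b if i == a else a                      # the other variable in f
    # delta_i^{-f}: singleton term plus messages from all factors except f.
    delta_minus = theta_i[i] + sum(delta[(g, i)] for g in theta_f
                                   if i in g and g != f)
    # max over x_{f \ i} of [theta_f(x_f) - delta_{f j}(x_j)].
    th = theta_f[f] if i == a else theta_f[f].T     # axis 0 = x_i, axis 1 = x_j
    max_over_rest = (th - delta[(f, j)][None, :]).max(axis=1)
    delta[(f, i)] = -0.5 * delta_minus + 0.5 * max_over_rest

# Usage (with theta_i, theta_f, delta as in the earlier sketches):
# msd_update(theta_i, theta_f, delta, f=(0, 1), i=0)
```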

38 Block coordinate descent algorithms. The Max-Product Linear Programming algorithm: δ_fi(x_i) = −δ_i^{−f}(x_i) + (1/|f|) max_{x_{f\i}} { θ_f(x_f) + Σ_{î ∈ f} δ_î^{−f}(x_î) }, where δ_i^{−f}(x_i) = θ_i(x_i) + Σ_{f̂ ≠ f} δ_{f̂ i}(x_i). For fixed f, the term Σ_{î ∈ f} δ_î^{−f}(x_î) collects, for every variable î in f, the messages it receives from all of its parents other than f (plus its singleton term); Σ_{f̂ ≠ f} δ_{f̂ i}(x_i) is the message from all of i's parents other than f to i. (Figure: a factor graph indicating which messages are updated and which are used in the update.)
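
The corresponding MPLP sketch for a pairwise factor, updating δ_fi for both endpoints of f at once (same assumed data layout as above; the signs follow the update written on this slide).

```python
import numpy as np

def mplp_update(theta_i, theta_f, delta, f):
    """MPLP update of delta[(f, i)] for every i in the pairwise factor f."""
    a, b = f
    # delta_i^{-f} for both endpoints of f.
    dm = {i: theta_i[i] + sum(delta[(g, i)] for g in theta_f
                              if i in g and g != f)
          for i in f}
    # theta_f(x_a, x_b) + delta_a^{-f}(x_a) + delta_b^{-f}(x_b)
    augmented = theta_f[f] + dm[a][:, None] + dm[b][None, :]
    # For each endpoint, max out the other variable and average over |f| = 2.
    delta[(f, a)] = -dm[a] + 0.5 * augmented.max(axis=1)
    delta[(f, b)] = -dm[b] + 0.5 * augmented.max(axis=0)

# Usage: mplp_update(theta_i, theta_f, delta, f=(0, 1))
```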

39 Block coordinate descent algorithms. Comparison of Subgradient, MSD and MPLP. In the subgradient algorithm, only the entries x_i^s, x_f^f corresponding to the maximizing assignments are used. In MSD, we fix f and i and update δ_fi(x_i) for all x_i simultaneously. In MPLP, we fix f and update δ_fi(x_i) for all i ∈ f and all x_i simultaneously. Larger blocks are used in MPLP, and more communication between the subproblems is involved in each MPLP update.

40 Block coordinate descent algorithms. Limitations of coordinate descent: for non-smooth problems, coordinate descent may get stuck at corners!

41 Block coordinate descent algorithms. Possible solution: use the soft-max (convex and differentiable) instead of the max: max_i f_i ≈ ε log Σ_i exp(f_i / ε), where the soft-max converges to the max as ε → 0. (Figure, adapted from reference [2]: objective versus iteration for the subgradient method, MPLP and soft-max MPLP, together with the optimum, on an example model where MPLP gets stuck.)
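
A quick numerical illustration of the soft-max (our own): as ε shrinks, ε log Σ_i exp(f_i/ε) approaches max_i f_i.

```python
import numpy as np

def soft_max(f, eps):
    # eps * log(sum_i exp(f_i / eps)): smooth, convex upper bound on max(f).
    m = f.max()                                   # shift for numerical stability
    return m + eps * np.log(np.sum(np.exp((f - m) / eps)))

f = np.array([1.0, 2.5, 2.4])
for eps in [1.0, 0.1, 0.01]:
    print(eps, soft_max(f, eps))                  # approaches max(f) = 2.5
```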

42 Outline The MAP inference problem Motivating Applications Dual decomposition and Lagrangian relaxation Algorithms for solving the dual problem Discussion

43 Discussion. Decoding the MAP assignment. We locally decode from the single-node reparameterizations: x_i* ∈ arg max_{x̂_i} θ_i^δ(x̂_i). If the node terms θ_i^δ(x_i) each have a unique maximum, we say that δ is locally decodable to x*. For subgradient algorithms, when the dual decomposition has a unique solution and it is integral, the MAP assignment is guaranteed to be found from the dual solution.
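
A sketch of local decoding in the toy representation used earlier (our own illustration): each x_i is read off independently from its reparameterized singleton term.

```python
import numpy as np

def locally_decode(theta_i, theta_f, delta):
    """Read off x_i from the reparameterized singleton terms theta_i^delta."""
    x = {}
    for i, th in theta_i.items():
        reparam = th + sum(delta[(f, i)] for f in theta_f if i in f)
        x[i] = int(np.argmax(reparam))   # caution: a tie here means delta may
                                         # not be locally decodable at node i
    return x
```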

44 Discussion. Decoding the MAP assignment. For coordinate descent algorithms, even when the dual decomposition has a unique integral solution, local decodability depends on the model: For binary pairwise MRFs, fixed points of the coordinate descent algorithms are locally decodable to the true MAP assignment. For pairwise tree-structured MRFs and MRFs with a single cycle, fixed points of the coordinate descent algorithms are locally decodable to the true MAP assignment. However, there exist non-binary pairwise MRFs with cycles and dual-optimal fixed points of the coordinate descent algorithms that are not locally decodable.

45 Discussion. Combining subgradient and coordinate descent. The subgradient method is slow but guaranteed to converge; coordinate descent is faster but may not converge to the optimum. We can combine the two algorithms: first run coordinate descent and, once it is close to convergence, switch to the subgradient method to reach the global optimum; or alternate between subgradient and coordinate descent steps.

46 Thanks! Questions!
