Discrete Optimization in Machine Learning. Colorado Reed
1 Discrete Optimization in Machine Learning Colorado Reed [ML-RCC] 31 Jan
2 Acknowledgements Some slides/animations are based on: Krause et al. tutorials; a Pushmeet Kohli tutorial; Jeff Bilmes' class. Please consult the original authors before reusing slides marked AK, PK, or JB. 2
3 What is Discrete Optimization? An optimization problem with a finite or countably infinite set of solutions. Applications: image segmentation, games. Common examples: TSP, Set Cover, network flows, vertex coloring, the knapsack problem. 3
4 Discrete Optimization in ML. Given RVs Y, X_1, …, X_n, predict Y from X_{i1}, …, X_{ik} — the k most informative features. Example: Y = Sick; X_1 = Fever, X_2 = Age, X_3 = Biopsy Result. A* := argmax_A I(X_A; Y) s.t. |A| ≤ k, where I(X_A; Y) = H(Y) − H(Y | X_A). AK 4
5 Discrete Optimization in ML. Given RVs X_1, …, X_n with index set V, partition V into sets that are as independent as possible, e.g. A = {X_1, X_3, X_4} and V \ A = {X_2, X_5, X_6}. Formally: A* := argmin_A I(X_A; X_{V\A}) s.t. |A| ≤ n, where I(X_A; X_{V\A}) = H(X_{V\A}) − H(X_{V\A} | X_A). 5
6 This Tutorial. Given a finite set V and a function F: 2^V → R, solve A* = argopt_A F(A) s.t. constraints(A) are satisfied. Methods not covered here: relaxations, mixed integer programming, POMDPs, heuristics. Focus: submodularity — scalable algorithms, occurs naturally, rising interest in ML, performance guarantees! Goal: foster submodular intuition. 6
7 Submodular set functions. A set function F: 2^V → R on a finite set V = {1, 2, …, n} is called submodular if for all A, B ⊆ V: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B). Equivalent diminishing-returns characterization: adding an element to a small set yields a large improvement, adding it to a large set a small one. For A ⊆ B, s ∉ B: F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B). AK 7
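Both characterizations on this slide can be checked mechanically on a tiny ground set. The following is a minimal sketch (not from the slides) that brute-forces the inequality F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B) over all subset pairs; the coverage function used as an example is a standard submodular function:

```python
from itertools import combinations

def subsets(V):
    """All subsets of V as frozensets."""
    return [frozenset(c) for r in range(len(V) + 1)
            for c in combinations(V, r)]

def is_submodular(F, V):
    """Check F(A) + F(B) >= F(A | B) + F(A & B) for every pair A, B subset of V."""
    for A in subsets(V):
        for B in subsets(V):
            if F(A) + F(B) < F(A | B) + F(A & B) - 1e-9:
                return False
    return True

# Coverage function: F(A) = size of the union of the regions indexed by A.
regions = {1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5, 6}}
F_cover = lambda A: len(set().union(*(regions[i] for i in A))) if A else 0
print(is_submodular(F_cover, {1, 2, 3}))          # coverage is submodular
print(is_submodular(lambda A: len(A) ** 2, {1, 2, 3}))  # |A|^2 is not (supermodular)
```

Checking all pairs A, B is exponential, so this only works for toy ground sets — but it makes the definition concrete.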
8 Intuitive Example: Shopping. F(A) = amount of time spent shopping for the items in A. [Figure: cart contents omitted — the time added by one more item to a small cart A exceeds the time it adds to a larger cart B ⊇ A.] For A ⊆ B, s ∉ B: F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B). 8
9 Example: Entropy. Given random variables X_1, …, X_N with index set V: F(A) = H(X_A) = −Σ_{x_A} p(x_A) log p(x_A). Proof: let A ⊆ B ⊆ V and s ∈ V \ B; need to show H(X_{A∪{s}}) − H(X_A) ≥ H(X_{B∪{s}}) − H(X_B), i.e. H(X_s | X_A) ≥ H(X_s | X_B) — "information never hurts". JB 9
10 Example: Entropy. Proof #2 (mutual information): let A, B ⊆ V. Since I(X_A; X_B | X_{A∩B}) ≥ 0 and I(X_A; X_B | X_{A∩B}) = H(X_A) + H(X_B) − H(X_{A∪B}) − H(X_{A∩B}), we get H(X_A) + H(X_B) ≥ H(X_{A∪B}) + H(X_{A∩B}). 10
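The two proofs above can be sanity-checked numerically. A minimal sketch (my own, not from the slides): draw a random joint distribution over three binary variables, compute marginal entropies, and verify the submodular inequality on every pair of subsets:

```python
import itertools, math, random

random.seed(0)
# Random joint distribution over three binary variables X0, X1, X2.
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in outcomes]
Z = sum(weights)
p = {o: w / Z for o, w in zip(outcomes, weights)}

def H(A):
    """Entropy of the marginal distribution of the variables indexed by A."""
    marg = {}
    for o, pr in p.items():
        key = tuple(o[i] for i in sorted(A))
        marg[key] = marg.get(key, 0.0) + pr
    return -sum(pr * math.log2(pr) for pr in marg.values() if pr > 0)

V = {0, 1, 2}
subs = [frozenset(c) for r in range(4) for c in itertools.combinations(V, r)]
# Submodularity: H(A) + H(B) >= H(A u B) + H(A n B) for all A, B.
ok = all(H(A) + H(B) >= H(A | B) + H(A & B) - 1e-9 for A in subs for B in subs)
print(ok)  # True: joint entropy is submodular
```

Any joint distribution passes this check, which is exactly what the proofs guarantee.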
11 Example: Mutual information. Given random variables X_1, …, X_N with index set V: F(A) = I(X_A; X_{V\A}) = H(X_{V\A}) − H(X_{V\A} | X_A). The marginal gain is F(A ∪ {s}) − F(A) = H(X_s | X_A) − H(X_s | X_{V\(A∪{s})}). The first term is nonincreasing in A and the second is nondecreasing in A, so F(A ∪ {s}) − F(A) is monotonically nonincreasing, hence F is submodular. 11
12 Quick Definition: [Super]modular. A set function F: 2^V → R on V = {1, 2, …, n} is supermodular iff for all A, B ⊆ V: F(A) + F(B) ≤ F(A ∪ B) + F(A ∩ B) — increasing returns or "synergy"; equivalently, −F is submodular. F is modular if it is both submodular and supermodular. Dr. Steven Pinker (Harvard psychologist), answering the NYT question "What scientific concept would improve everybody's cognitive toolkit?": Emergent systems are ones in which many different elements interact. The pattern of interaction then produces a new element that is greater than the sum of the parts, which then exercises a top-down influence on the constituent elements. JB 12
13 Quiz. V = {1, …, n}, A ⊆ V. Define the characteristic vector w^A = (w^A_1, …, w^A_n), where w^A_i = 1 if i ∈ A and 0 otherwise. Let F(A) = (w^A)^T r with r_i ∈ R, r ∈ R^n. Prove F(A) is submodular. Submodular definition: for A ⊆ B, s ∉ B: F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B). 13
14 Closedness properties. Let F_1, …, F_m be submodular functions on V and λ_1, …, λ_m > 0. Then F(A) = Σ_i λ_i F_i(A) is submodular: submodularity is closed under nonnegative linear combinations. Very useful: if F_θ(A) is submodular for each θ, then Σ_θ P(θ) F_θ(A) is submodular. 14
15 Submodularity and convexity. V = {1, …, n}, A ⊆ V; characteristic vector w^A = (w^A_1, …, w^A_n) where w^A_i = 1 if i ∈ A, 0 otherwise. Very important Theorem [Lovász 1983]: every submodular function F(·) induces a function g(·) on R^n s.t.: g(w) is convex; F(A) = g(w^A) for all A ⊆ V; min_A F(A) = min_w g(w) s.t. w ∈ [0, 1]^n. But what is g(w)? 15
16 Submodular Polytope. P_F = {x ∈ R^n : x(A) ≤ F(A) for all A ⊆ V}, where x(A) = Σ_{i∈A} x_i. Example with V = {a, b}: F(∅) = 0, F({a}) = −1, F({b}) = 2, F({a,b}) = 0. [Figure: P_F in the (x_{a}, x_{b}) plane, carved out by x({a}) ≤ F({a}), x({b}) ≤ F({b}), x({a,b}) ≤ F({a,b}).] AK 16
17 Lovász extension [Lovász 1983]. g(w) = max_{x∈P_F} w^T x, where P_F = {x ∈ R^n : x(A) ≤ F(A) for all A ⊆ V}. Writing x_w = argmax_{x∈P_F} w^T x, we have g(w) = w^T x_w. Quiz: we defined g(w), but evaluating g(w) involves solving an LP with exponentially many constraints — why exponentially many? AK, JB 17
18 Evaluating the Lovász extension. g(w) = max_{x∈P_F} w^T x, P_F = {x ∈ R^n : x(A) ≤ F(A) for all A ⊆ V}. Very Important Theorem [Edmonds 71, Lovász 83]: for any given w, an optimal solution x_w to the LP is obtained by the following greedy algorithm. Order V = {e_1, …, e_n} such that w(e_1) ≥ … ≥ w(e_n). Let x_w(e_i) = F({e_1, …, e_i}) − F({e_1, …, e_{i−1}}). Then w^T x_w = g(w) = max_{x∈P_F} w^T x. 18
19 Example: Lovász extension. Same F as before: F(∅) = 0, F({a}) = −1, F({b}) = 2, F({a,b}) = 0. Take w = [0, 1]; we want g(w) = max_{x∈P_F} w^T x. Greedy ordering: e_1 = b, e_2 = a; x_w(e_1) = F({b}) − F(∅) = 2, x_w(e_2) = F({b, a}) − F({b}) = −2. So g([0, 1]) = [0, 1]^T [−2, 2] = 2 = F({b}). Similarly g([1, 1]) = [1, 1]^T [−1, 1] = 0 = F({a, b}). AK 19
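The Edmonds greedy rule above is only a few lines of code. A minimal sketch (my own names, using the toy F from this example) that reproduces g([0,1]) = 2 and g([1,1]) = 0:

```python
def lovasz(F, V, w):
    """Evaluate the Lovasz extension g(w) via the Edmonds greedy algorithm."""
    order = sorted(V, key=lambda e: -w[e])   # sort elements by descending weight
    g, prefix, prev = 0.0, set(), F(frozenset())
    for e in order:
        prefix.add(e)
        cur = F(frozenset(prefix))
        g += w[e] * (cur - prev)             # x_w(e_i) = F(prefix) - F(previous prefix)
        prev = cur
    return g

# Toy function from the slides: F({})=0, F({a})=-1, F({b})=2, F({a,b})=0.
table = {frozenset(): 0, frozenset('a'): -1,
         frozenset('b'): 2, frozenset('ab'): 0}
F = table.__getitem__
print(lovasz(F, 'ab', {'a': 0, 'b': 1}))  # 2.0 = F({b})
print(lovasz(F, 'ab', {'a': 1, 'b': 1}))  # 0.0 = F({a,b})
```

Note how the exponential LP collapses to one sort plus n evaluations of F, which is the whole point of the theorem.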
20 Lovász Extension: useful? Theorem [Lovász 1983]: g(w) attains its minimum over [0, 1]^n at a corner of the cube. Translation: the corners correspond to the characteristic vectors, so minimizing g(w) minimizes F(A)! How to minimize? 20
21 21 slide credit: Satoru Iwata
22 Submodular Minimization in Practice. Base polytope: B_F = P_F ∩ {x : x(V) = F(V)}. Minimum-norm algorithm: 1. Find x* = argmin ‖x‖² s.t. x ∈ B_F. 2. Return A* = {i : x*(i) < 0}. For the running example (F(∅) = 0, F({a}) = −1, F({b}) = 2, F({a,b}) = 0): x* = [−1, 1], so A* = {a}. Theorem [Fujishige 1991]: A* is an optimal solution. Note: step 1 is solved via Wolfe's algorithm; runtime is finite but unknown. AK 22
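For a ground set of two elements the min-norm answer is easy to cross-check by exhaustive enumeration. A minimal sketch (my own, exponential and only for tiny V) confirming that A* = {a} from the slide is indeed the minimizer:

```python
from itertools import combinations

def brute_min(F, V):
    """Exhaustive submodular minimization (exponential; fine for tiny V)."""
    best = min((frozenset(c) for r in range(len(V) + 1)
                for c in combinations(V, r)), key=F)
    return best, F(best)

# Running example: F({})=0, F({a})=-1, F({b})=2, F({a,b})=0.
table = {frozenset(): 0, frozenset('a'): -1,
         frozenset('b'): 2, frozenset('ab'): 0}
A_star, val = brute_min(table.__getitem__, 'ab')
print(sorted(A_star), val)  # ['a'] -1  -- matches A* = {a} from the min-norm run
```

The min-norm algorithm gets the same answer in polynomial time per iteration, which is what makes it usable beyond toy sizes.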
24 Empirical Justification [Fujishige et al 2006]. [Figure: running time (seconds) of the min-norm algorithm vs. problem size.] AK DEMO 24
25 MNP: Super-fast! Blue: [Fujishige 2006] ; Red: [Garcia 2012] 25
26 Symmetric Submodular Functions. If F(A) = F(V \ A) then we can minimize in O(n³) (NB: over nontrivial solutions, i.e. excluding ∅ and V). Queyranne's Algorithm: a straightforward split/merge algorithm over element sets; takes around 10 minutes to explain; implemented in Krause's Matlab toolbox. Summary: general submodular functions can be exactly minimized with the min-norm algorithm; symmetric submodular functions can be exactly minimized with Queyranne's algorithm in O(n³). 26
27 Example: Q-Clustering [Narasimhan, Jojic, Bilmes NIPS 2005]. [Figure: data points V partitioned into clusters A_1 and A_2.] Group data points V into homogeneous clusters: find a partition that minimizes F(A_1, …, A_k) = Σ_i E(A_i), e.g. with E the entropy H(A) or a cut function. Theorem: F(P) ≤ (2 − 2/k) F(P_opt) for the returned partition P. For k = 2 and symmetric submodular F(A): use Queyranne's algorithm and obtain the optimum! First algorithm for finding the optimal MDL clustering for k = 2 — and it provides optimality guarantees for all symmetric submodular partitions! AK 27
28 Example: MAP for Pairwise MRFs. E(x) = Σ_p θ_p(x_p) + Σ_{p,q} θ_pq(x_p, x_q): the x_p are discrete variables (e.g., x_p ∈ {0,1}); the θ_p(·) are unary potentials (data terms); the θ_pq(·,·) are pairwise potentials (coherence terms). E is submodular iff θ_pq(0,0) + θ_pq(1,1) ≤ θ_pq(0,1) + θ_pq(1,0) for every pair (p, q). Pairwise submodular functions can be minimized in O(n³) — O(n) in practice! (via the st-mincut algorithm) 28
29 Example: MAP for Pairwise MRFs. [Figure: the energy E(x) encoded as an st-mincut between source S and sink T, and the resulting solution.] PK 29
30 Sub-example: Image Segmentation PK 30
31 Sub-example: Image Segmentation. E(x | θ) = Σ_p θ_p x_p + Σ_{p,q} θ_pq x_p (1 − x_q), with x_p ∈ {0,1} (0 = fg, 1 = bg); the first sum is the likelihood term, the second the regularity term. E: {0,1}^n → R. Can find the global minimum using the min-cut/max-flow algorithm. PK 31
32 Sub-example: Image Segmentation. E(a_1, a_2) = 2a_1 + 5ā_1 + 9a_2 + 4ā_2 + 2a_1ā_2 + ā_1a_2, encoded as an st-graph over nodes a_1, a_2. Augmenting Paths Algorithm: 1. Find a path from source to sink with positive capacity. 2. Push the maximum possible flow through this path. 3. Repeat until no path can be found. The cut shown has value 11 = E(1,1). 32
33 Sub-example: Image Segmentation. E(a_1, a_2) = 2a_1 + 5ā_1 + 9a_2 + 4ā_2 + 2a_1ā_2 + ā_1a_2. [Figure: max flow pushed through the st-graph from source to sink.] Augmenting Paths Algorithm: 1. Find a path from source to sink with positive capacity. 2. Push the maximum possible flow through this path. 3. Repeat until no path can be found. 33
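The two-variable energy above is small enough to solve end to end. A minimal sketch (my own graph construction and a plain Edmonds–Karp, not the slides' code): each unary term becomes a source- or sink-side edge and each pairwise term an edge between variable nodes, so the min st-cut equals min_{a1,a2} E(a_1, a_2):

```python
from collections import deque, defaultdict

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:          # BFS for an augmenting path
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        path, v = [], t                        # recover the path s -> t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:                      # update residual capacities
            cap[u][v] -= push
            cap[v][u] += push
        flow += push

# E(a1,a2) = 2*a1 + 5*(1-a1) + 9*a2 + 4*(1-a2) + 2*a1*(1-a2) + (1-a1)*a2
cap = defaultdict(lambda: defaultdict(int))
for u, v, c in [('s', 1, 5), (1, 't', 2),     # unary terms for a1
                ('s', 2, 4), (2, 't', 9),     # unary terms for a2
                (1, 2, 2), (2, 1, 1)]:        # pairwise terms
    cap[u][v] += c
flow_val = max_flow(cap, 's', 't')
print(flow_val)  # 8 = min over {0,1}^2 of E(a1, a2), attained at (a1, a2) = (1, 0)
```

Here a node ends up on the source side of the cut exactly when its variable takes value 1; brute-forcing the four assignments confirms the minimum energy is 8.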
34 History of Maxflow Algorithms. Augmenting-path and push-relabel methods (n: #nodes, m: #edges, U: maximum edge weight). Predates the Edmonds & Lovász work! [Slide credit: Andrew Goldberg; from PK] 34
35 Sub-example: Image denoising image credit: 35
36 Sub-example: Image denoising. Pairwise Markov Random Field over observed pixels X_i and true pixels Y_i: P(x_1, …, x_n, y_1, …, y_n) = Π_{i,j} ψ_{i,j}(y_i, y_j) Π_i φ_i(x_i, y_i). Want argmax_y P(y | x) = argmax_y log P(x, y) = argmin_y Σ_{i,j} E_{i,j}(y_i, y_j) + Σ_i E_i(y_i), where E_{i,j}(y_i, y_j) = −log ψ_{i,j}(y_i, y_j). Suppose the y_i are binary; let F(A) = E(y^A), where y^A_i = 1 iff i ∈ A. Then min_y E(y) = min_A F(A). AK 36
37 maximizing submodular functions 37
38 Concave or Convex? Theorem: suppose g: N → R and F(A) = g(|A|). Then F(A) is submodular if and only if g is concave. (Recall: for A ⊆ B, s ∉ B, F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B).) 38
39 Maximizing convex functions: NP-hard? Maximizing submodular functions: NP-hard? Yes, but in some cases we have approximability guarantees. 39
40 Maximizing Submodular Functions: approximability under various constraints. Monotone with cardinality constraints: 1 − 1/e (greedy). Monotone with matroid constraints: 1/2 (greedy). Nonnegative symmetric submodular: 1/2. Nonnegative submodular: 1/3. 40
41 Example: Max Cover (NP-Hard). Place sensors in a building: we want to cover the floorplan with discs. Possible locations V; for A ⊆ V, F(A) = area covered by sensors placed at A. Goal: place k sensors that cover as much area as possible. AK 41
42 Example: max cover is submodular. [Figure: adding sensor S' to A = {S_1, S_2} covers more new area than adding it to B = {S_1, S_2, S_3, S_4} ⊇ A, i.e. F(A ∪ {S'}) − F(A) ≥ F(B ∪ {S'}) − F(B).] AK 42
43 Greedy Max Cover Algorithm. Repeat: pick the location that covers the maximum uncovered area; mark the area covered by the new sensor as covered; until k sensors are placed. GREEDY ≥ (1 − 1/e) OPT: best possible [Feige 1998]. 43
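The loop on this slide is a one-screen program. A minimal sketch (my own data — discrete cells standing in for covered area, names like `regions` are illustrative):

```python
def greedy_max_cover(sets, k):
    """Pick k sets greedily, each time maximizing the newly covered area."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(sets, key=lambda i: len(sets[i] - covered))  # max marginal gain
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

# Each candidate sensor location covers a set of floorplan cells.
regions = {'S1': {1, 2, 3, 4}, 'S2': {3, 4, 5}, 'S3': {5, 6}, 'S4': {7}}
chosen, covered = greedy_max_cover(regions, k=2)
print(chosen, len(covered))  # ['S1', 'S3'] 6
```

Note that greedy skips S2 in the second round: although S2 is the second-largest set, most of it is already covered by S1 — exactly the diminishing-returns effect submodularity formalizes.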
44 Submodularity embraces Greed. Theorem [Nemhauser et al 1978]: Greedy gives a (1 − 1/e)-approximation for maximizing monotone submodular functions,* (*subject to cardinality constraints): F(A_greedy) ≥ (1 − 1/e) max_{|A| ≤ k} F(A). Common scenario: best possible guarantee unless NP = P. 44
45 Theorem [Nemhauser et al 1978]: Greedy gives a (1 − 1/e)-approximation for maximizing monotone submodular functions, subject to cardinality constraints. Proof: let S_i be the first i elements selected by greedy and C a set with F(C) = OPT, |C| ≤ k. Show by induction: F(C) − F(S_i) ≤ (1 − 1/k)^i F(C). Case i = 0: F(C) − F(S_0) ≤ F(C). In step i > 0, greedy selects the element e_i maximizing the marginal gain F_{S_{i−1}}(e) := F(S_{i−1} ∪ {e}) − F(S_{i−1}). By submodularity (and monotonicity): F(C) − F(S_{i−1}) ≤ Σ_{e ∈ C \ S_{i−1}} F_{S_{i−1}}(e), implying F_{S_{i−1}}(e_i) ≥ (1/|C \ S_{i−1}|) Σ_{e ∈ C \ S_{i−1}} F_{S_{i−1}}(e) ≥ (1/k)(F(C) − F(S_{i−1})). 45
46 Proof continued: S_i are the first i elements selected by greedy, F(C) = OPT; we show F(C) − F(S_i) ≤ (1 − 1/k)^i F(C). In step i > 0, greedy selects e_i with F_{S_{i−1}}(e_i) ≥ (1/k)(F(C) − F(S_{i−1})), so F(C) − F(S_i) = F(C) − F(S_{i−1}) − F_{S_{i−1}}(e_i) ≤ F(C) − F(S_{i−1}) − (1/k)(F(C) − F(S_{i−1})) = (1 − 1/k)(F(C) − F(S_{i−1})) ≤ (1 − 1/k)^i F(C) ≤ (1/e) F(C) at i = k, i.e. F(S_k) ≥ (1 − 1/e) F(C). 46
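The bound just proved can be witnessed empirically: run greedy on a random monotone submodular (coverage) instance and compare against the brute-force optimum. A minimal sketch (my own random instance; sizes chosen so the exhaustive OPT is cheap):

```python
from itertools import combinations
import math, random

random.seed(1)
# 8 random sets over a universe of 20 cells; F(A) = |union of chosen sets|.
ground = [set(random.sample(range(20), 6)) for _ in range(8)]
F = lambda A: len(set().union(*(ground[i] for i in A))) if A else 0
k = 3

# Greedy: repeatedly add the element with the largest marginal gain.
S = []
for _ in range(k):
    e = max(set(range(8)) - set(S), key=lambda i: F(S + [i]))
    S.append(e)

# Brute-force optimum over all subsets of size k.
OPT = max(F(list(c)) for c in combinations(range(8), k))
print(F(S) >= (1 - 1/math.e) * OPT)  # True: the Nemhauser et al. guarantee holds
```

In practice greedy is often far closer to OPT than the worst-case (1 − 1/e) factor; the theorem only pins down the floor.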
47 Example: Influence in social networks [Kempe, Kleinberg, Tardos KDD 03]. [Figure: graph over Alice, Bob, Charlie, Dorothy, Eric, Fiona; each edge carries a probability of influencing, e.g. 0.2, 0.5.] Who should get free cell phones? V = {Alice, Bob, Charlie, Dorothy, Eric, Fiona}; F(A) = expected number of people influenced when targeting A. Key idea: flip coins in advance → "live" edges. slide credit: AK 47
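The "flip coins in advance" idea can be sketched directly: sample a live-edge graph, then the influence of a seed set is just reachability. Below is a minimal sketch using the names from the slide; the edge set and most probabilities are my own illustrative assumptions (the slide only shows 0.2 and 0.5 on two edges):

```python
import random

# Hypothetical influence graph; edge weights are influence probabilities.
edges = {('Alice', 'Bob'): 0.5, ('Alice', 'Charlie'): 0.3,
         ('Bob', 'Dorothy'): 0.2, ('Charlie', 'Eric'): 0.4,
         ('Dorothy', 'Fiona'): 0.5}

def reachable(seeds, live):
    """Nodes reachable from the seed set through live edges."""
    out, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for (a, b) in live:
            if a == u and b not in out:
                out.add(b)
                stack.append(b)
    return out

def spread(seeds, n_samples=2000, seed=0):
    """Monte-Carlo estimate of expected influence: flip all coins in advance."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        live = {e for e, p in edges.items() if rng.random() < p}  # live-edge graph
        total += len(reachable(seeds, live))
    return total / n_samples

print(spread({'Alice'}) > spread(set()))  # targeting someone beats targeting no one
```

Each sampled live-edge graph makes F a coverage-like (hence submodular) function of the seed set, so averaging over samples keeps F submodular — which is why the greedy (1 − 1/e) guarantee applies here.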
48 Adaptive Submodularity [Golovin & Krause 2010]. Key idea: optimize over policies (decision trees) instead of sets. Objective function F(A, x_V), where A is the set of actions taken and x_V the realization of a set of random variables. Expected marginal benefit conditioned on the observations: Δ(e | x_A) = Σ_{x_V} P(x_V | x_A) [F(A ∪ {e}, x_V) − F(A, x_V)]. Adaptive submodularity: Δ(e | x_A) ≥ Δ(e | x_B) for A ⊆ B. Adaptive monotonicity: Δ(e | x_A) ≥ 0 for all e, x_A. Theorem: the adaptive greedy algorithm returns a policy π_greedy with G(π_greedy) ≥ (1 − 1/e) G(π_opt), where G(π) = E[F(π(x_V), x_V)]. (There are many more policies than sets.) 48
49 Adaptive Submodularity Example. Prior over diseases P(Y); likelihood of outcomes P(X_V | Y). Suppose P(X_V | Y) is deterministic (noise-free); each test eliminates hypotheses y. How should we test to eliminate all incorrect hypotheses? Generalized binary search — equivalent to maximizing info-gain. Example: Y = Sick; X_1 = Fever, X_2 = Age, X_3 = Biopsy Result. 49
50 Summary: Submodularity in ML Minimization Clustering MAP inference MRF (computer vision) Structure learning* Maximization Active learning Ranking* Feature Selection 50
51 DEMO See toolbox: 51
52 Resources. "Beyond convexity: submodularity in machine learning" (Krause); Jeff Bilmes' Submodular Optimization class: ee595a_spring_2011/; Pushmeet Kohli's video lecture on MAP in MRFs; Coursera courses on discrete optimization.
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationA Note on the Budgeted Maximization of Submodular Functions
A Note on the udgeted Maximization of Submodular Functions Andreas Krause June 2005 CMU-CALD-05-103 Carlos Guestrin School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract Many
More informationTufts COMP 135: Introduction to Machine Learning
Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Logistic Regression Many slides attributable to: Prof. Mike Hughes Erik Sudderth (UCI) Finale Doshi-Velez (Harvard)
More informationStructure Learning: the good, the bad, the ugly
Readings: K&F: 15.1, 15.2, 15.3, 15.4, 15.5 Structure Learning: the good, the bad, the ugly Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 29 th, 2006 1 Understanding the uniform
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,
More informationAPPROXIMATION ALGORITHMS RESOLUTION OF SELECTED PROBLEMS 1
UNIVERSIDAD DE LA REPUBLICA ORIENTAL DEL URUGUAY IMERL, FACULTAD DE INGENIERIA LABORATORIO DE PROBABILIDAD Y ESTADISTICA APPROXIMATION ALGORITHMS RESOLUTION OF SELECTED PROBLEMS 1 STUDENT: PABLO ROMERO
More informationCSC 411 Lecture 3: Decision Trees
CSC 411 Lecture 3: Decision Trees Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 03-Decision Trees 1 / 33 Today Decision Trees Simple but powerful learning
More informationRevisiting the Greedy Approach to Submodular Set Function Maximization
Submitted to manuscript Revisiting the Greedy Approach to Submodular Set Function Maximization Pranava R. Goundan Analytics Operations Engineering, pranava@alum.mit.edu Andreas S. Schulz MIT Sloan School
More informationGuaranteeing Solution Quality for SAS Optimization Problems by being Greedy
Guaranteeing Solution Quality for SAS Optimization Problems by being Greedy Ulrike Stege University of Victoria ustege@uvic.ca Preliminary results of this in: S. Balasubramanian, R.J. Desmarais, H.A. Müller,
More informationAdaptive Submodularity: A New Approach to Active Learning and Stochastic Optimization
Adaptive Submodularity: A New Approach to Active Learning and Stochastic Optimization Daniel Golovin California Institute of Technology Pasadena, CA 91125 dgolovin@caltech.edu Andreas Krause California
More informationProbability Theory for Machine Learning. Chris Cremer September 2015
Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares
More informationIntroduction to Logistic Regression and Support Vector Machine
Introduction to Logistic Regression and Support Vector Machine guest lecturer: Ming-Wei Chang CS 446 Fall, 2009 () / 25 Fall, 2009 / 25 Before we start () 2 / 25 Fall, 2009 2 / 25 Before we start Feel
More informationMassachusetts Institute of Technology 6.854J/18.415J: Advanced Algorithms Friday, March 18, 2016 Ankur Moitra. Problem Set 6
Massachusetts Institute of Technology 6.854J/18.415J: Advanced Algorithms Friday, March 18, 2016 Ankur Moitra Problem Set 6 Due: Wednesday, April 6, 2016 7 pm Dropbox Outside Stata G5 Collaboration policy:
More informationCS 188: Artificial Intelligence. Machine Learning
CS 188: Artificial Intelligence Review of Machine Learning (ML) DISCLAIMER: It is insufficient to simply study these slides, they are merely meant as a quick refresher of the high-level ideas covered.
More informationCSCE 478/878 Lecture 6: Bayesian Learning
Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell
More informationRandomized Algorithms
Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More informationMaximization of Submodular Set Functions
Northeastern University Department of Electrical and Computer Engineering Maximization of Submodular Set Functions Biomedical Signal Processing, Imaging, Reasoning, and Learning BSPIRAL) Group Author:
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationEnergy minimization via graph-cuts
Energy minimization via graph-cuts Nikos Komodakis Ecole des Ponts ParisTech, LIGM Traitement de l information et vision artificielle Binary energy minimization We will first consider binary MRFs: Graph
More informationActive Learning and Optimized Information Gathering
Active Learning and Optimized Information Gathering Lecture 7 Learning Theory CS 101.2 Andreas Krause Announcements Project proposal: Due tomorrow 1/27 Homework 1: Due Thursday 1/29 Any time is ok. Office
More informationJunction Tree, BP and Variational Methods
Junction Tree, BP and Variational Methods Adrian Weller MLSALT4 Lecture Feb 21, 2018 With thanks to David Sontag (MIT) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,
More informationComputational Cognitive Science
Computational Cognitive Science Lecture 8: Frank Keller School of Informatics University of Edinburgh keller@inf.ed.ac.uk Based on slides by Sharon Goldwater October 14, 2016 Frank Keller Computational
More informationSubmodular Functions, Optimization, and Applications to Machine Learning
Submodular Functions, Optimization, and Applications to Machine Learning Spring Quarter, Lecture 4 http://www.ee.washington.edu/people/faculty/bilmes/classes/ee563_spring_2018/ Prof. Jeff Bilmes University
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationCS675: Convex and Combinatorial Optimization Fall 2014 Combinatorial Problems as Linear Programs. Instructor: Shaddin Dughmi
CS675: Convex and Combinatorial Optimization Fall 2014 Combinatorial Problems as Linear Programs Instructor: Shaddin Dughmi Outline 1 Introduction 2 Shortest Path 3 Algorithms for Single-Source Shortest
More informationarxiv: v1 [math.oc] 1 Jun 2015
NEW PERFORMANCE GUARANEES FOR HE GREEDY MAXIMIZAION OF SUBMODULAR SE FUNCIONS JUSSI LAIILA AND AE MOILANEN arxiv:1506.00423v1 [math.oc] 1 Jun 2015 Abstract. We present new tight performance guarantees
More information1 Matroid intersection
CS 369P: Polyhedral techniques in combinatorial optimization Instructor: Jan Vondrák Lecture date: October 21st, 2010 Scribe: Bernd Bandemer 1 Matroid intersection Given two matroids M 1 = (E, I 1 ) and
More information