CS781 Lecture 3 January 27, 2011


CS781 Lecture 3, January 27, 2011. Greedy Algorithms. Topics: Interval Scheduling and Partitioning; Dijkstra's Shortest Path Algorithm; Minimum Spanning Trees; Single-Link k-Clustering.

Interval Scheduling
Interval scheduling. Job j starts at s_j and finishes at f_j. Two jobs are compatible if they don't overlap. Goal: find a maximum subset of mutually compatible jobs. [Figure: jobs a-h drawn as intervals on a time axis from 0 to 11.]

Interval Scheduling: Greedy Algorithms
Greedy template. Consider jobs in some order. Take each job provided it's compatible with the ones already taken.
- [Earliest start time] Consider jobs in ascending order of start time s_j.
- [Earliest finish time] Consider jobs in ascending order of finish time f_j.
- [Shortest interval] Consider jobs in ascending order of interval length f_j - s_j.
- [Fewest conflicts] For each job, count the number of conflicting jobs c_j. Schedule in ascending order of conflicts c_j.

Interval Scheduling: Greedy Algorithms
Greedy template. Consider jobs in some order. Take each job provided it's compatible with the ones already taken. [Figure: counterexamples showing that the earliest-start-time, shortest-interval, and fewest-conflicts orders can each fail to be optimal.]

Interval Scheduling: Greedy Algorithm
Greedy algorithm. Consider jobs in increasing order of finish time. Take each job provided it's compatible with the ones already taken.

Sort jobs by finish times so that f_1 ≤ f_2 ≤ ... ≤ f_n.
A ← ∅   (jobs selected)
for j = 1 to n {
    if (job j compatible with A)
        A ← A ∪ {j}
}
return A

Implementation analysis. Sorting intervals by finish time takes O(n log n). Checking whether job j is compatible takes O(1): remember the job j* that was added last to A; job j is compatible with A iff s_j ≥ f_{j*}.
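The earliest-finish-time rule above can be sketched in a few lines of Python; the job list in the usage example is made up for illustration.

```python
def interval_schedule(jobs):
    """Greedy earliest-finish-time interval scheduling.
    jobs: list of (start, finish) pairs.
    Returns indices of a maximum set of mutually compatible jobs."""
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][1])  # sort by finish time
    selected = []
    last_finish = float("-inf")  # finish time of the job added last (f_{j*})
    for j in order:
        s, f = jobs[j]
        if s >= last_finish:     # compatible with everything taken so far
            selected.append(j)
            last_finish = f
    return selected

# Example (illustrative data):
jobs = [(0, 6), (1, 4), (3, 5), (3, 8), (4, 7), (5, 9), (6, 10), (8, 11)]
# greedy picks the jobs finishing at 4, 7, and 11 → indices [1, 4, 7]
```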

Demo: Greedy Interval Scheduling
[Animation frames: jobs B, C, A, E, D, F, G, H are considered in order of finish time. B is taken; C and A are rejected as incompatible with B; E is taken; D, F, and G are rejected; H is taken. Final schedule: {B, E, H}.]

Interval Scheduling: Analysis
Theorem. The greedy algorithm returns an optimal solution.
Pf. (by contradiction) Assume greedy is not optimal, and let's see what happens. Let i_1, i_2, ..., i_k denote the set of jobs selected by greedy. Choose the optimal solution that most closely matches greedy — call it our Prime Suspect — namely the set of jobs j_1, j_2, ..., j_m with i_1 = j_1, i_2 = j_2, ..., i_r = j_r for the largest possible value of r. By the greedy choice, job i_{r+1} finishes no later than j_{r+1}, so why not replace job j_{r+1} with job i_{r+1}?
We can modify the Prime Suspect in exactly this way: the solution remains feasible and optimal, but now agrees with greedy on its first r + 1 jobs. This contradicts the maximality of r, and the proof follows. ▪

Interval Partitioning

Interval Partitioning
Interval partitioning. Lecture j starts at s_j and finishes at f_j. Goal: find the minimum number of classrooms to schedule all lectures so that no two occur at the same time in the same room.
Ex: This schedule uses 4 classrooms to schedule 10 lectures. [Figure: lectures a-j assigned to rooms 1-4 on a time axis from 9:00 to 4:30.]

Interval Partitioning
Interval partitioning. Lecture j starts at s_j and finishes at f_j. Goal: find the minimum number of classrooms to schedule all lectures so that no two occur at the same time in the same room.
Ex: This schedule uses only 3 classrooms. [Figure: the same 10 lectures packed into 3 rooms on a time axis from 9:00 to 4:30.]

Interval Partitioning: Lower Bound on Optimal Solution
Def. The depth of a set of open intervals is the maximum number that contain any given time.
Key observation. Number of classrooms needed ≥ depth.
Ex: Depth of the schedule below = 3 (lectures a, b, c all contain 9:30), so the 3-classroom schedule is optimal.
Q. Does there always exist a schedule equal to the depth of the intervals? And how can we partition to match the depth? [Figure: the 3-classroom schedule.]

Interval Partitioning: Greedy Algorithm
Greedy algorithm. Consider lectures in increasing order of start time: assign each lecture to any compatible classroom.

Sort intervals by start time so that s_1 ≤ s_2 ≤ ... ≤ s_n.
d ← 0   (number of allocated classrooms)
for j = 1 to n {
    if (lecture j is compatible with some classroom k)
        schedule lecture j in classroom k
    else
        allocate a new classroom d + 1
        schedule lecture j in classroom d + 1
        d ← d + 1
}

Interval Partitioning: Greedy Algorithm Analysis
Implementation. For each classroom k, maintain the finish time of the last job added. Keep all classrooms in a priority queue with the last finish time as the key.
Analysis of implementation. O(n log n) time to sort the n intervals; O(log d) time to update/maintain the priority queue with d allocated classrooms (heap implementation). Total time: O(n log n) + O(n log d).
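A minimal Python sketch of this priority-queue implementation, using a min-heap of classroom finish times; the lecture lists in the examples are illustrative.

```python
import heapq

def partition_intervals(lectures):
    """Greedy interval partitioning by start time.
    lectures: list of (start, finish) pairs.
    Returns the number of classrooms used (equals the depth)."""
    rooms = []  # min-heap of finish times, one entry per allocated classroom
    for s, f in sorted(lectures):
        if rooms and rooms[0] <= s:      # some classroom is free by time s
            heapq.heapreplace(rooms, f)  # reuse it: pop its finish time, push f
        else:
            heapq.heappush(rooms, f)     # allocate a new classroom
    return len(rooms)

# Example (illustrative data): depth 3, so 3 classrooms suffice
# partition_intervals([(0, 3), (1, 4), (2, 5), (3, 6)]) == 3
```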

Interval Partitioning: Greedy Correctness Analysis
Observation. The greedy algorithm never schedules two incompatible lectures in the same classroom.
Theorem. The greedy algorithm returns an optimal solution.
Pf. Let d = number of classrooms that the greedy algorithm allocates. Classroom d is opened because we needed to schedule a job, say j, that is incompatible with a lecture in each of the other d - 1 classrooms. Since we sorted by start time, all of these incompatibilities are caused by lectures that start no later than s_j. Thus we have d lectures overlapping at time s_j + ε, so the depth is at least d. By the key observation, every schedule uses ≥ d classrooms. ▪

Selecting Breakpoints

Selecting Breakpoints
Selecting breakpoints. Road trip from Cincinnati to Miami Beach along a fixed route. Refueling stations at certain points along the way. Distance limit based on fuel capacity = C. Goal: make as few refueling stops as possible.
Greedy algorithm. Go as far as you can before refueling. [Figure: breakpoints along the route from Cincinnati to Miami Beach, with segments of length at most C between stops.]

Selecting Breakpoints: Greedy Algorithm
Truck driver's algorithm.

Sort breakpoints so that: 0 = b_0 < b_1 < b_2 < ... < b_n = L
S ← {0}   (breakpoints selected)
x ← 0     (current location)
while (x < b_n) {
    let p be the largest integer such that b_p ≤ x + C
    if (b_p = x) return "no solution"
    x ← b_p
    S ← S ∪ {p}
}
return S

Implementation. O(n log n) to sort the breakpoints. Use binary search to select each breakpoint p in S — or linear search; which is better?
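The truck driver's algorithm with binary search might look like this in Python; the breakpoint values in the examples are made up.

```python
import bisect

def select_breakpoints(b, C):
    """Greedy refueling. b: sorted breakpoint positions with b[0] = 0 and
    b[-1] = L (the destination); C: distance limit per tank.
    Returns the list of selected positions (destination included), or None
    if some gap exceeds C."""
    x = 0          # current location (always a breakpoint)
    stops = [0]
    while x < b[-1]:
        # furthest breakpoint reachable from x, found by binary search
        p = bisect.bisect_right(b, x + C) - 1
        if b[p] == x:          # no progress possible: gap larger than C
            return None
        x = b[p]
        stops.append(x)
    return stops

# Example (illustrative data): stops at 3, 5, 8 on the way to L = 10
# select_breakpoints([0, 2, 3, 5, 8, 10], C=4) == [0, 3, 5, 8, 10]
```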

Selecting Breakpoints: Correctness
Theorem. The greedy algorithm is optimal.
Pf. (by contradiction) Assume greedy is not optimal, and let's see what happens. Let 0 = g_0 < g_1 < ... < g_p = L denote the set of breakpoints chosen by greedy. Find a prime suspect: let 0 = f_0 < f_1 < ... < f_q = L denote the set of breakpoints in an optimal solution with f_0 = g_0, f_1 = g_1, ..., f_r = g_r for the largest possible value of r. Note: g_{r+1} ≥ f_{r+1} by the greedy choice of the algorithm. Why doesn't the optimal solution drive a little further? Replacing f_{r+1} with g_{r+1} keeps the solution feasible and optimal, contradicting the maximality of r. ▪

Shortest Paths in a Graph
[Figure: driving route for the shortest path from Cincinnati to Atlantic Ocean Beach.]

Shortest Path Problem
Shortest path network. Directed graph G = (V, E). Source s, destination t. Length ℓ_e = length of edge e (today we assume ℓ_e ≥ 0).
Shortest path problem: find the shortest directed path from s to t, where the cost of a path = sum of the edge costs in the path. [Figure: example network on nodes s, 2, ..., 7, t; the cost of path s-2-3-5-t is 48.]

Dijkstra's Algorithm
Dijkstra's algorithm. Maintain a set of explored nodes S for which we have determined the shortest path distance d(u) from s to u. Initialize S = { s }, d(s) = 0. Repeatedly choose the unexplored node v which minimizes
    π(v) = min over edges e = (u, v) with u ∈ S of d(u) + ℓ_e,
add v to S, and set d(v) = π(v). (π(v) is the length of the shortest path to some u in the explored part, followed by a single edge (u, v).) [Figure: explored set S containing s and u, with edge e = (u, v) crossing out to v.]

Dijkstra's Algorithm
Greedily extend the explored set to include the node v that is closest to s among the nodes lying outside the explored set. Then (as we will show) we have identified the shortest distance d(v) from s to v. [Figure: the explored set S grows to absorb v.]

Dijkstra's Algorithm: Proof of Correctness
Invariant. For each node u ∈ S, d(u) is the length of the shortest s-u path.
Pf. (by induction on |S|, the size of the explored set)
Base case: |S| = 1 is trivial.
Inductive hypothesis: assume the invariant holds for |S| = k ≥ 1.
Let v be the next node added to S, and let (u, v) be the chosen edge. The shortest s-u path plus (u, v) is an s-v path of length π(v). Consider any s-v path P; we'll see that it's no shorter than π(v). Let (x, y) be the first edge in P that leaves S, and let P' be the subpath to x. P is already too long as soon as it leaves S:
    ℓ(P) ≥ ℓ(P') + ℓ(x, y) ≥ d(x) + ℓ(x, y) ≥ π(y) ≥ π(v),
using, in order: nonnegative weights, the inductive hypothesis, the definition of π(y), and the fact that Dijkstra chose v instead of y. ▪

Dijkstra's Algorithm: Implementation
For each unexplored node, explicitly maintain π(v) = min over e = (u, v) with u ∈ S of d(u) + ℓ_e. The next node to explore is the node with minimum π(v). When exploring v, for each incident edge e = (v, w), update π(w) = min { π(w), π(v) + ℓ_e }.
Efficient implementation. Maintain a priority queue of unexplored nodes, prioritized by π(v).

PQ operation | Dijkstra freq | Array | Binary heap | d-way heap    | Fib heap †
Insert       | n             | 1     | log n       | d log_d n     | 1
ExtractMin   | n             | n     | log n       | d log_d n     | log n
ChangeKey    | m             | 1     | log n       | log_d n       | 1
IsEmpty      | n             | 1     | 1           | 1             | 1
Total        |               | n^2   | m log n     | m log_{m/n} n | m + n log n

† Individual ops are amortized bounds.
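A compact Python version of Dijkstra's algorithm. Instead of an explicit ChangeKey, it uses the common lazy-deletion trick: push a duplicate entry on every improvement and skip stale ones on extraction. The example graph is made up.

```python
import heapq

def dijkstra(graph, s):
    """graph: dict mapping node -> list of (neighbor, nonnegative length).
    Returns a dict of shortest-path distances d(u) from s to every reachable u."""
    d = {s: 0}
    pq = [(0, s)]              # priority queue keyed by tentative distance pi(v)
    explored = set()           # the set S of the slides
    while pq:
        du, u = heapq.heappop(pq)
        if u in explored:
            continue           # stale entry (lazy deletion instead of ChangeKey)
        explored.add(u)
        for v, length in graph.get(u, []):
            if v not in explored and du + length < d.get(v, float("inf")):
                d[v] = du + length
                heapq.heappush(pq, (d[v], v))
    return d

# Example (illustrative data):
graph = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('t', 6)],
         'b': [('t', 3)], 't': []}
# dijkstra(graph, 's') == {'s': 0, 'a': 1, 'b': 3, 't': 6}
```

The lazy variant matches the binary-heap row of the table: each edge may push one entry, giving O(m log n) overall.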

Dijkstra's Shortest Path Algorithm: Demo
Find the shortest path from s to t. [Animation frames: all nodes start in the priority queue with distance labels; each step performs delmin on the node with the smallest label, moves it into S, and performs decrease-key on the labels of its unexplored neighbors. Nodes enter S in the order s, 2, 6, 7, 3, 5, 4, t; the demo ends with S = { s, 2, 3, 4, 5, 6, 7, t } and the priority queue empty.]

Coin Changing Problem

Coin Changing
Goal. Given currency denominations 1, 5, 10, 25, 100, devise a method to pay any amount to a customer using the fewest number of coins. Ex: 34¢.
Cashier's algorithm. At each iteration, add the coin of the largest value that does not take us past the amount to be paid. Ex: $2.89.

Coin-Changing: Greedy Algorithm
Cashier's algorithm. At each iteration, add the coin of the largest value that does not take us past the amount to be paid.

Sort coin denominations by value: c_1 < c_2 < ... < c_n.
S ← ∅   (coins selected)
while (x > 0) {
    let k be the largest integer such that c_k ≤ x
    if (k = 0) return "no solution found"
    x ← x - c_k
    S ← S ∪ {k}
}
return S

Q. Is the cashier's algorithm optimal?
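The cashier's algorithm can be sketched in Python as follows. It is optimal for the U.S. coinage of the slide, but not in general, as the postal counterexample shows.

```python
def make_change(x, denominations=(1, 5, 10, 25, 100)):
    """Cashier's algorithm: repeatedly take the largest coin <= x.
    Returns the list of coins used, or None if x can't be paid exactly."""
    coins = []
    for c in sorted(denominations, reverse=True):
        while x >= c:
            coins.append(c)
            x -= c
    return coins if x == 0 else None

# make_change(34) == [25, 5, 1, 1, 1, 1]   (6 coins, optimal for U.S. coinage)
# On the postal denominations, greedy uses 8 coins for 140¢ although 2 suffice:
# make_change(140, (1, 10, 21, 34, 70, 100, 350)) == [100, 34, 1, 1, 1, 1, 1, 1]
```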

Coin-Changing: Analysis of Greedy Algorithm
Theorem. Greed is optimal for U.S. coinage: 1, 5, 10, 25, 100.
Pf. (by induction on x) Consider the optimal way to change c_k ≤ x < c_{k+1}: greedy takes coin k. We claim that any optimal solution must also take coin k. Clearly if x = c_k, then greedy is optimal. Now look at values x in the gaps: if greedy were not optimal, there would need to be enough coins of types c_1, ..., c_{k-1} to add up to x, and the table below shows that no optimal solution can do this.

k | c_k | All optimal solutions must satisfy | Max value of coins 1, ..., k-1 in any OPT
1 | 1   | P ≤ 4                              | —
2 | 5   | N ≤ 1                              | 4
3 | 10  | N + D ≤ 2                          | 4 + 5 = 9
4 | 25  | Q ≤ 3                              | 20 + 4 = 24
5 | 100 | no limit                           | 75 + 24 = 99

Coin-Changing: Analysis of Greedy Algorithm
Observation. The greedy algorithm is sub-optimal for the US postal denominations 1, 10, 21, 34, 70, 100, 350.
Counterexample: 140¢. Greedy: 100, 34, 1, 1, 1, 1, 1, 1. Optimal: 70, 70.

Minimum Spanning Trees

Minimum Spanning Tree
Minimum spanning tree. Given a connected graph G = (V, E) with real-valued edge weights c_e, an MST is a subset of the edges T ⊆ E such that T is a spanning tree whose sum of edge weights Σ_{e∈T} c_e is minimized. [Figure: example graph G = (V, E) and a minimum spanning tree T.]
Cayley's Theorem. There are n^{n-2} spanning trees of the complete graph K_n, so we can't solve the problem by brute force.
The MST is a fundamental problem with diverse applications.

Greedy Algorithms
Kruskal's algorithm. Start with T = ∅. Consider edges in ascending order of cost. Insert edge e into T unless doing so would create a cycle.
Reverse-Delete algorithm. Start with T = E. Consider edges in descending order of cost. Delete edge e from T unless doing so would disconnect T.
Prim's algorithm. Start with some root node s and greedily grow a tree T from s outward. At each step, add the cheapest edge e to T that has exactly one endpoint in T.
Remark. All three algorithms produce an MST.

Greedy Algorithms
Simplifying assumption. All edge costs c_e are distinct.
Cut property. Let S be any subset of nodes, and let e be the min-cost edge with exactly one endpoint in S. Then the MST contains e.
Cycle property. Let C be any cycle, and let f be the max-cost edge belonging to C. Then the MST does not contain f. [Figure: e crossing the cut S is in the MST; f on the cycle C is not.]

Cycles and Cuts
Cycle. Set of edges of the form a-b, b-c, c-d, ..., y-z, z-a. Ex: Cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1.
Cutset. A cut is a subset of nodes S. The corresponding cutset D is the subset of edges with exactly one endpoint in S. Ex: Cut S = { 4, 5, 8 }; Cutset D = 5-6, 5-7, 3-4, 3-5, 7-8.

Cycle-Cut Intersection
Claim. A cycle and a cutset intersect in an even number of edges.
Ex: Cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1; Cutset D = 3-4, 3-5, 5-6, 5-7, 7-8; Intersection = 3-4, 5-6.
Pf. (by picture) [Figure: a cycle that crosses between S and V - S must cross an even number of times.]

Greedy Algorithms
Simplifying assumption. All edge costs c_e are distinct.
Cut property. Let S be any subset of nodes, and let e be the min-cost edge with exactly one endpoint in S. Then the MST T* contains e.
Pf. (exchange argument) Suppose e does not belong to T*, and let's see what happens. Adding e to T* creates a cycle C in T*. Edge e is both in the cycle C and in the cutset D corresponding to S, so there exists another edge, say f, that is in both C and D. T' = T* ∪ { e } - { f } is also a spanning tree. Since c_e < c_f, cost(T') < cost(T*). This is a contradiction. ▪

Greedy Algorithms
Simplifying assumption. All edge costs c_e are distinct.
Cycle property. Let C be any cycle in G, and let f be the max-cost edge belonging to C. Then the MST T* does not contain f.
Pf. (exchange argument) Suppose f belongs to T*, and let's see what happens. Deleting f from T* creates a cut S in T*. Edge f is both in the cycle C and in the cutset D corresponding to S, so there exists another edge, say e, that is in both C and D. T' = T* ∪ { e } - { f } is also a spanning tree. Since c_e < c_f, cost(T') < cost(T*). This is a contradiction. ▪

Prim's Algorithm: Proof of Correctness
Prim's algorithm. [Jarník 1930, Dijkstra 1957, Prim 1959] Initialize S = any node. Apply the cut property to S: add the min-cost edge in the cutset corresponding to S to T, and add one new explored node u to S.

Implementation: Prim's Algorithm
Implementation. Use a priority queue à la Dijkstra. Maintain a set of explored nodes S. For each unexplored node v, maintain the attachment cost a[v] = cost of the cheapest edge from v to a node in S. The complexity analysis is the same as for Dijkstra's shortest-path algorithm: O(n^2) with an array; O(m log n) with a binary heap.

Prim(G, c) {
    foreach (v ∈ V) a[v] ← ∞
    Initialize an empty priority queue Q
    foreach (v ∈ V) insert v onto Q
    Initialize set of explored nodes S ← ∅
    while (Q is not empty) {
        u ← delete min element from Q
        S ← S ∪ { u }
        foreach (edge e = (u, v) incident to u)
            if ((v ∉ S) and (c_e < a[v]))
                decrease priority a[v] to c_e
    }
}
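A Python sketch of Prim's algorithm. Rather than maintaining a[v] with decrease-key, this lazy variant pushes every crossing edge into the heap and discards edges whose far endpoint is already explored; the example graph is illustrative.

```python
import heapq

def prim(graph, root):
    """graph: dict node -> list of (neighbor, cost), undirected (both directions listed).
    Returns (total MST cost, list of MST edges), using lazy deletion."""
    S = {root}                                     # explored nodes
    pq = [(c, root, v) for v, c in graph[root]]    # crossing edges, keyed by cost
    heapq.heapify(pq)
    total, tree = 0, []
    while pq and len(S) < len(graph):
        c, u, v = heapq.heappop(pq)
        if v in S:
            continue                               # edge no longer crosses the cut
        S.add(v)                                   # cut property: take cheapest crossing edge
        total += c
        tree.append((u, v))
        for w, cw in graph[v]:
            if w not in S:
                heapq.heappush(pq, (cw, v, w))
    return total, tree

# Example (illustrative data): MST is a-b (1), b-c (2), c-d (3), total cost 6
g = {'a': [('b', 1), ('c', 4)], 'b': [('a', 1), ('c', 2), ('d', 5)],
     'c': [('a', 4), ('b', 2), ('d', 3)], 'd': [('b', 5), ('c', 3)]}
```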

Kruskal's Algorithm: Proof of Correctness
Kruskal's algorithm. [Kruskal, 1956] Consider edges in ascending order of weight.
Case 1: If adding e to T creates a cycle, discard e, according to the cycle property.
Case 2: Otherwise, insert e = (u, v) into T, according to the cut property, where S = set of nodes in u's connected component. [Figure: case 1, e closes a cycle; case 2, e crosses the cut around u's component.]

Implementation: Kruskal's Algorithm
Implementation. Use the union-find data structure. Build the set T of edges in the MST; maintain a set for each connected component. O(m log n) for sorting and O(m log n) for union-find. (Since m ≤ n^2, log m is O(log n).)

Kruskal(G, c) {
    Sort edge weights so that c_1 ≤ c_2 ≤ ... ≤ c_m.
    T ← ∅
    foreach (u ∈ V) make a set containing singleton u
    for i = 1 to m {
        (u, v) = e_i
        if (u and v are in different sets) {    // different connected components?
            T ← T ∪ { e_i }
            merge the sets containing u and v   // merge two components
        }
    }
    return T
}
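A Python sketch of Kruskal's algorithm with a small union-find (path halving rather than full path compression, for brevity); the edge list in the example is illustrative.

```python
def kruskal(n, edges):
    """n nodes labeled 0..n-1; edges: list of (cost, u, v).
    Returns (total MST cost, list of MST edges) via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    total, tree = 0, []
    for c, u, v in sorted(edges):           # ascending order of cost
        ru, rv = find(u), find(v)
        if ru != rv:                        # different components: no cycle, take e
            parent[ru] = rv                 # merge the two components
            total += c
            tree.append((u, v))
    return total, tree

# Example (illustrative data): MST uses the three cheapest edges, cost 6
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 2), (5, 1, 3)]
```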

Lexicographic Tiebreaking
To remove the assumption that all edge costs are distinct: perturb all edge costs by tiny amounts to break any ties.
Impact. Kruskal and Prim interact with costs only via pairwise comparisons. If the perturbations are sufficiently small, the MST with perturbed costs is the MST with the original costs. (E.g., if all edge costs are integers, perturb the cost of edge e_i by i / n^2.)
Implementation. Handle arbitrarily small perturbations implicitly by breaking ties lexicographically, according to index:

boolean less(i, j) {
    if      (cost(e_i) < cost(e_j)) return true
    else if (cost(e_i) > cost(e_j)) return false
    else if (i < j)                 return true
    else                            return false
}

Clustering
Outbreak of cholera deaths in London in the 1850s. Reference: Nina Mishra, HP Labs.

Clustering
Clustering. Given a set U of n objects labeled p_1, ..., p_n, classify/partition them into coherent groups.
Distance function. Numeric value specifying the "closeness" of two objects.
Fundamental problem. Divide into clusters so that points in different clusters are far apart.
Applications: routing in mobile ad hoc networks; identifying patterns in gene expression; document categorization for web search; similarity searching in medical image databases; Skycat: cluster 10^9 sky objects into stars, quasars, galaxies.

Clustering of Maximum Spacing
k-clustering. Divide objects into k non-empty groups.
Distance function. Assume it satisfies several natural properties:
d(p_i, p_j) = 0 iff p_i = p_j (identity of indiscernibles);
d(p_i, p_j) ≥ 0 (nonnegativity);
d(p_i, p_j) = d(p_j, p_i) (symmetry).
Spacing. Min distance between any pair of points in different clusters.
Clustering of maximum spacing. Given an integer k, find a k-clustering of maximum spacing. [Figure: example with k = 4; the spacing is the smallest inter-cluster distance.]

Greedy Clustering Algorithm
Single-link k-clustering algorithm. Form a graph on the vertex set U, corresponding to n clusters. Find the closest pair of objects such that each object is in a different cluster, and add an edge between them. Repeat n - k times, until there are exactly k clusters.
Key observation. This procedure is precisely Kruskal's algorithm (except we stop when there are k connected components).
Remark. Equivalent to finding an MST and deleting the k - 1 most expensive edges.
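Since single-link k-clustering is just Kruskal's algorithm stopped early, it can be sketched directly; the points and distance function in the example are illustrative.

```python
def single_link_clusters(points, k, dist):
    """Kruskal-style single-link k-clustering: merge the closest cross-cluster
    pair until exactly k components remain.
    points: list of objects; dist(p, q): distance function.
    Returns a list of k clusters (lists of points)."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # all pairs in ascending order of distance, as in Kruskal's algorithm
    pairs = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    components = n
    for d, i, j in pairs:
        if components == k:
            break                      # stop early: k clusters remain
        ri, rj = find(i), find(j)
        if ri != rj:                   # merge the two closest clusters
            parent[ri] = rj
            components -= 1
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])
    return list(groups.values())

# Example (illustrative data): three well-separated groups on the line
# single_link_clusters([0, 1, 10, 11, 20], 3, lambda a, b: abs(a - b))
# yields the clusters {0, 1}, {10, 11}, {20}
```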

Greedy Clustering Algorithm: Analysis
Theorem. Let C* denote the clustering C*_1, ..., C*_k formed by deleting the k - 1 most expensive edges of an MST. C* is a k-clustering of maximum spacing.
Pf. Let C denote some other clustering C_1, ..., C_k. The spacing of C* is the length d* of the (k-1)st most expensive edge. Let p_i, p_j be in the same cluster in C*, say C*_r, but in different clusters in C, say C_s and C_t. Some edge (p, q) on the p_i-p_j path in C*_r spans two different clusters in C. All edges on the p_i-p_j path have length ≤ d*, since Kruskal chose them. Hence the spacing of C is ≤ d*, since p and q are in different clusters of C. ▪

Dendrogram
Dendrogram. Scientific visualization of a hypothetical sequence of evolutionary events. Leaves = genes; internal nodes = hypothetical ancestors. Reference: http://www.biostat.wisc.edu/bmi7/fall-003/lecture13.pdf

Dendrogram of Cancers in Humans
Tumors in similar tissues cluster together. [Figure: gene-expression heat map over genes 1, ..., n; legend: gene expressed / gene not expressed.] Reference: Botstein & Brown group.