CS 161: Design and Analysis of Algorithms


Greedy Algorithms 3: Minimum Spanning Trees / Scheduling
Disjoint Sets, continued
Analysis of Kruskal's Algorithm
Interval Scheduling

Disjoint Sets, Continued
Each set is represented as a directed tree; a set is identified by the root of its tree.
[Figure: a forest of trees whose nodes are labeled A through O]

Disjoint Sets, Continued
makeset(x):                    O(1)
    Set p(x) = x
    Set rank(x) = 0
find(x):                       O(h)
    Let r = x
    While p(r) ≠ r: r = p(r)
    Return r

Disjoint Sets, Continued
union(x, y):                   O(h_1 + h_2)
    Let x' = find(x), y' = find(y)
    If rank(x') > rank(y'):
        p(y') = x'
    Else:
        p(x') = y'
        If rank(x') = rank(y'): rank(y') = rank(y') + 1
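
As a concrete reference, here is a minimal runnable sketch of the three operations in Python. The dict-based representation and the guard against uniting a set with itself are assumptions; the names p and rank follow the slides.

    # Union-find with union by rank, mirroring the slide pseudocode.
    def makeset(p, rank, x):               # O(1)
        p[x] = x                           # x starts as its own root
        rank[x] = 0

    def find(p, x):                        # O(h): walk up to the root
        r = x
        while p[r] != r:
            r = p[r]
        return r

    def union(p, rank, x, y):              # O(h_1 + h_2)
        rx, ry = find(p, x), find(p, y)
        if rx == ry:                       # already in the same set
            return
        if rank[rx] > rank[ry]:
            p[ry] = rx                     # hang the lower-rank root below
        else:
            p[rx] = ry
            if rank[rx] == rank[ry]:       # equal ranks: new root grows
                rank[ry] += 1

    # Example: three singletons, then one union.
    p, rank = {}, {}
    for v in "ABC":
        makeset(p, rank, v)
    union(p, rank, "A", "B")
    assert find(p, "A") == find(p, "B") != find(p, "C")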

Disjoint Sets, Continued: properties of rank
rank(x) < rank(p(x))
Any root of rank k has at least 2^k descendants
There are at most n/2^k nodes of rank k
Maximum rank is at most log n

Optimization: Path Compression When we call find(x), we traverse the ancestors of x until we reach the root r. Regardless of later union operations, r will always remain an ancestor of x. So we can shortcut the path to the root and set p(x) = r.

Optimization: Path Compression
find(x):
    Let r = x
    Let s be a stack
    While p(r) ≠ r:
        s.push(r)
        r = p(r)
    While s isn't empty:
        p(s.pop()) = r
    Return r

Optimization: Path Compression
find(x):
    If x ≠ p(x): set p(x) = find(p(x))
    Return p(x)
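
In Python, this recursive version is a short drop-in replacement for the find in the sketch above (the explicit parent dict p is the same assumption as before):

    def find(p, x):
        # Walk to the root recursively, then point x (and, via the
        # recursion, every node on the path) directly at the root.
        if p[x] != x:
            p[x] = find(p, p[x])
        return p[x]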

Running Time? Running time = depth of x in the tree. Still O(h), which might still be O(log n). However, once we call find(x), subsequent calls to find(x) take constant time. Using amortized analysis: almost linear total time for any sequence of operations.

Amortized Analysis Consider any sequence of n calls to find (some might be part of calls to union). The running time is bounded by the number of times we follow parent pointers.

Amortized Analysis Path compression does not affect ranks. If we update a parent pointer, the new parent must have higher rank than the old parent. Once x has a parent, its rank will never change again. Maximum rank: with n calls to find, the largest set has at most n elements, so the maximum rank is log n.

Amortized Analysis Break the possible ranks into intervals of the form {k+1, ..., 2^k}:
Interval 0: {1}
Interval 1: {2}
Interval 2: {3, 4}
Interval 3: {5, 6, 7, ..., 16}
Interval 4: {17, ..., 2^16 = 65536}
Interval 5: {65537, ..., 2^65536}

Amortized Analysis How many intervals are needed? The maximum rank is log n. In practice, n is never larger than 2^65536, so log n is never larger than 65536: we only need 5 intervals. Define log*(k) = the number of times we must apply log to k to get a value at most 1. In theory we need log*(n) intervals.

Amortized Analysis How many nodes have rank in {k+1, ..., 2^k}? There are at most n/2^i nodes of rank i, so summing from i = k+1 to 2^k:

    Σ_{i=k+1}^{2^k} n/2^i < (n/2^k) · Σ_{j=1}^{∞} 1/2^j = n/2^k

How many times do we update a parent pointer where the old parent's rank is in the same interval as x's rank? At most 2^k times for each node with rank in {k+1, ..., 2^k}, since the parent's rank strictly increases with each update. That gives at most (n/2^k) · 2^k = n updates per interval, so at most n·log*(n) in total.

Amortized Analysis When we call find(x), we repeatedly follow parent pointers. For all except the last pointer, we update the parent pointer. Within one call to find, at most log*(n) of the nodes we visit have a parent whose rank is in the next interval. Total number of parent pointers followed over n calls:
Parent pointers not updated: n (at most one per call)
Updates where the parent is in the next interval: n·log*(n)
Updates where the parent is in the same interval: n·log*(n)

Amortized Analysis Total number of parent pointers followed: n + 2n·log*(n) ≤ 3n·log*(n) = O(n·log*(n)). Time per call to find: O(log*(n)) amortized. Therefore, find and union take amortized O(log*(n)) time, which is essentially constant.

Kruskal's Algorithm Repeatedly add the lightest edge that does not form a cycle. How to find the lightest edge? Sort the edges initially: a heap or self-balancing BST gives O(|E| log |E|); if edge weights are small, we can sort in O(|E|). How to check whether adding an edge causes a cycle? Keep track of the separate components with Disjoint Sets.

Kruskal's Algorithm
For all v: makeset(v)
Sort E by increasing weight
For each edge (u, v) in E in sorted order:
    If find(u) ≠ find(v):
        Add (u, v) to the MST
        union(u, v)
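
A self-contained Python sketch of Kruskal's algorithm. The (weight, u, v) edge format is an assumption; union-find with path compression and union by rank is inlined so the function runs on its own.

    def kruskal(vertices, edges):
        """vertices: iterable; edges: list of (weight, u, v) tuples.
        Returns the list of MST edges as (u, v, weight)."""
        p = {v: v for v in vertices}       # makeset(v) for all v
        rank = {v: 0 for v in vertices}

        def find(x):                       # find with path compression
            if p[x] != x:
                p[x] = find(p[x])
            return p[x]

        mst = []
        for w, u, v in sorted(edges):      # sort E by increasing weight
            ru, rv = find(u), find(v)
            if ru != rv:                   # no cycle: different components
                mst.append((u, v, w))
                if rank[ru] > rank[rv]:    # union by rank
                    p[rv] = ru
                else:
                    p[ru] = rv
                    if rank[ru] == rank[rv]:
                        rank[rv] += 1
        return mst

    # Example: a triangle plus a pendant edge; the weight-3 edge closes a cycle.
    print(kruskal("ABCD", [(1, "A", "B"), (2, "B", "C"),
                           (3, "A", "C"), (4, "C", "D")]))
    # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 4)]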

Kruskal's Algorithm Running Time? Time for the sort: O(|E| log |E|), or O(|E|) for small weights. We call makeset |V| times, find twice per edge, and union at most once per edge. Since the graph is connected, |E| ≥ |V| - 1, so |V| = O(|E|). Time after sorting: O(|E| log*|E|).

Kruskal's Algorithm In general, sorting is the bottleneck. If we can sort in linear time or are given the edges in sorted order: O(|E| log*|E|). Otherwise: O(|E| log |E|), the same as Prim's with a binary heap.

More on log*(n) Called the iterated logarithm. Extremely slow growing: at most 5 for all values seen in practice. Not the slowest-growing function that arises: the inverse Ackermann function α(n) is asymptotically slower and occasionally appears in algorithmic analysis. Example use: multiplying n-bit integers in time O(n log n · c^(log* n)).
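
A quick Python sketch makes the growth rate concrete (using log base 2 is an assumption; any fixed base changes log* by at most a constant):

    import math

    def log_star(n):
        """Number of times log2 must be applied before the value is <= 1."""
        count = 0
        while n > 1:
            n = math.log2(n)
            count += 1
        return count

    # log_star(16) == 3, log_star(65536) == 4,
    # and log_star(n) == 5 for every n up to 2**65536.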

Interval Scheduling Suppose there are n jobs. Each job i must be worked on and completed within its specified time interval [s(i), f(i)]. We can only work on one job at a time. How do we maximize the number of jobs completed?

Greedy Attempt 1: Shortest Intervals Since we want the most intervals possible, it might make sense to repeatedly pick the shortest interval that does not cause a conflict.

Greedy Attempt 2: Earliest Start Since getting started sooner means we can work for longer, it might make sense to pick the interval with the earliest start time.

Greedy Attempt 3: Fewest Conflicts Whenever we do one job, we are preventing ourselves from completing any job that overlaps it. Therefore, it might make sense to pick the job with the fewest conflicts.

Greedy Attempt 4: Earliest Finish If we finish the first task as early as possible, we will have the most time left for the remaining tasks. This turns out to be the correct greedy algorithm.

Greedy Attempt 4: Earliest Finish
Sort tasks by their end time f(i)
Let t = 0
For each task i in increasing order of f(i):
    If t > s(i), discard i
    Otherwise, do task i, and set t = f(i)
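
As a Python sketch (representing each task as an (s, f) pair is an assumption):

    def schedule(tasks):
        """Earliest-finish-time greedy. tasks: list of (s, f) pairs.
        Returns a maximum-size set of non-conflicting tasks."""
        chosen = []
        t = 0                                    # finish time of last task done
        for s, f in sorted(tasks, key=lambda task: task[1]):  # by end time f(i)
            if s >= t:                           # no conflict: do task i
                chosen.append((s, f))
                t = f                            # otherwise discard i
        return chosen

    print(schedule([(0, 3), (2, 4), (3, 5)]))    # [(0, 3), (3, 5)]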

Greedy Attempt 4: Earliest Finish Running time? Sorting takes O(n log n), plus O(1) extra time per task. Total time: O(n log n) in general, or O(n) if we can sort in linear time.

Greedy Attempt 4: Earliest Finish Optimal? Let i_1, ..., i_k be the intervals used in the greedy solution G, in order of time completed (f(i_r) < f(i_{r+1})). Let j_1, ..., j_m be the intervals in some optimal solution O, in order of time completed. Clearly m ≥ k; we want to show that m = k. Claim: For each r = 0, ..., k, there is an optimal solution O_r whose first r intervals are i_1, ..., i_r.

Greedy Attempt 4: Earliest Finish Claim: For each r = 0, ..., k, there is an optimal solution O_r whose first r intervals are i_1, ..., i_r. Proof: Clearly true for r = 0. Given O_{r-1}, construct O_r by changing interval j_r. Interval j_{r-1} = i_{r-1}, so it ends at time f(i_{r-1}). Interval i_r has the earliest end point among intervals starting after f(i_{r-1}). So we can safely replace j_r with i_r, arriving at O_r.

Greedy Attempt 4: Earliest Finish Claim: Greedy is optimal. Proof: Consider the optimal solution O_k. Its first k intervals are i_1, ..., i_k, so its first k intervals are exactly the greedy solution. If greedy were not optimal, there must be an interval j_{k+1} occurring after i_k. But then after picking i_k, greedy would have had at least one interval left to choose from, and would not have terminated.

Related Problem: Interval Partitioning We have n tasks and want to complete them all using as few people as possible. Formally: given n intervals, partition them into k subsets so that the intervals in each subset do not overlap, and k is minimized.

Interval Partitioning How many partitions are needed? [Figure: an example set of intervals, four of which overlap at one point in time] Clearly at least 4.

Interval Partitioning Let S be the set of tasks. Let depth(S) be the maximum number of tasks occurring simultaneously. Clearly, the number of partitions must be at least depth(S). Can we find a partition into depth(S) sets?

Interval Partitioning Idea: take the earliest interval i (by start time or by finish time?) and add it to any set that has no intervals overlapping i. By finish time? [Figure: step-by-step assignment of intervals in finish-time order, labeling each interval with its set number] Greedy by finish time gives 3 sets, but depth = 2, so finish-time order does not work.

Interval Partitioning Take the earliest interval i (by start time) and add it to any set that has no intervals overlapping i. Suppose the number of sets used were greater than depth(S). Consider the moment we open a new set for some interval i: all depth(S) existing sets contain an interval overlapping i. Each of those intervals starts no later than s(i), since we process by start time. But then the number of intervals overlapping at time s(i) is depth(S) + 1, a contradiction.
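
A Python sketch of this greedy. Using a min-heap keyed on each set's latest finish time to find a free set quickly is an implementation choice, not something from the slides; it gives O(n log n) overall.

    import heapq

    def partition(intervals):
        """Start-time greedy partition into depth(S) non-overlapping sets.
        intervals: list of (s, f) pairs. Returns a list of sets (lists)."""
        sets = []                            # sets[i]: intervals in set i
        free = []                            # heap of (finish_time, set index)
        for s, f in sorted(intervals):       # process by start time
            if free and free[0][0] <= s:     # some set ended before s: reuse it
                _, i = heapq.heappop(free)
                sets[i].append((s, f))
            else:                            # every set overlaps i: open a new one
                sets.append([(s, f)])
                i = len(sets) - 1
            heapq.heappush(free, (f, i))
        return sets

    print(len(partition([(0, 3), (1, 4), (3, 5)])))   # 2 (= depth)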

Scheduling to Minimize Lateness Back to having one person work on the tasks. We have n jobs, each with processing time t_i and deadline d_i. Devise a schedule to complete the jobs, minimizing the maximum lateness: define the lateness L_i of task i as max(0, f_i - d_i), and minimize the maximum of the lateness values.

Minimizing Lateness How should we prioritize tasks? By processing time, shortest first? [Figure: two-job counterexample] Greedy by processing time makes a job late, while the optimal schedule completes all jobs on time.

Minimizing Lateness How should we prioritize tasks? By slack time d_i - t_i, smallest first? [Figure: two-job counterexample] Greedy by slack time produces a large maximum lateness, while the optimal schedule has small lateness.

Minimizing Lateness How should we prioritize tasks? By deadline? This seems too simplistic, but it turns out to be the right approach.

Minimizing Lateness Greedy algorithm: schedule tasks with earlier deadlines earlier. Running time: the time needed to sort the deadlines, O(n log n).
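
A Python sketch of this earliest-deadline-first rule (the (t, d) pair format is an assumption):

    def edf(jobs):
        """jobs: list of (t, d) pairs (processing time, deadline).
        Returns (schedule, max_lateness); schedule entries are (start, finish, d)."""
        schedule, finish, max_late = [], 0, 0
        for t, d in sorted(jobs, key=lambda job: job[1]):  # earlier deadline first
            start, finish = finish, finish + t             # no gaps between tasks
            schedule.append((start, finish, d))
            max_late = max(max_late, finish - d)           # lateness = max(0, f - d)
        return schedule, max_late

    sched, late = edf([(3, 6), (2, 8), (1, 9)])
    print(sched, late)   # [(0, 3, 6), (3, 5, 8), (5, 6, 9)] 0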

Minimizing Lateness Claim 1: There is an optimal schedule with no gaps between tasks

Minimizing Lateness In any schedule, an inversion is a pair of tasks where the task with the later deadline is scheduled earlier. Claim 2: All schedules with no gaps between tasks and no inversions have the same maximum lateness

Minimizing Lateness Proof: If two schedules have no inversions and no gaps, the only difference between them is the order of tasks that share the same deadline. Consider one deadline d. All tasks with deadline d are scheduled after all tasks with earlier deadlines. The last task with deadline d ends at the same time in both schedules. So the maximum lateness among tasks with deadline d is the same in both schedules.

Minimizing Lateness Claim 3: There is an optimal schedule with no inversions and no gaps. By Claim 1, there is an optimal solution O with no gaps. Idea: repeatedly remove inversions, showing that the maximum lateness never increases.

Minimizing Lateness Fact 1: If O has an inversion, then there is a pair of jobs i and j such that j is scheduled immediately after i, and d_j < d_i. Proof: Start with any inversion (a, b). Starting from a and continuing in scheduled order, we must at some point see the deadline decrease for the first time. That interval and the one before it are j and i, respectively.

Minimizing Lateness Fact 2: After swapping i and j, O has one fewer inversion.

Minimizing Lateness Fact 3: After swapping i and j, O's maximum lateness does not increase. Proof sketch: [Figure: the schedule before and after swapping the adjacent jobs i and j] Before the swap, j finishes last and has the larger lateness (d_j < d_i, so L_i < L_j). After the swap, i finishes exactly when j used to finish: the new lateness of j can only shrink, and the new lateness of i is at most the old lateness of j, since d_i > d_j.

Minimizing Lateness Fact 3, conclusion: After swapping i and j, the maximum lateness between just i and j did not increase. The lateness of the other intervals did not change. Therefore, the maximum over all intervals did not increase.

Minimizing Lateness Since greedy has no inversions or gaps, it must have the same maximum lateness as any other schedule with no inversions or gaps, including the optimal schedule guaranteed by Claim 3. Therefore, greedy is optimal.