Aditya Bhaskara, CS 5968/6968, 12 January 2016


Lecture 1: Introduction and Review

We begin with a short introduction to the course and its logistics. We then survey some basics about approximation algorithms and probability, and introduce some of the problems that we will encounter over the course of the semester.

Disclaimer: These lecture notes are informal in nature and are not thoroughly proofread. In case you find a serious error, please send email to the instructor pointing it out.

Introduction and Background

The goal of the course is to introduce some key ideas in modern algorithm design. We will cover topics such as spectral algorithms, convex relaxations, random walks, local search, the multiplicative weight update method, and so on. To illustrate the techniques, we will use graph problems and variants of clustering.

Course Logistics

Office hours will be Wednesday 11AM-Noon. For information about grading, references and course material, see: http://www.cs.utah.edu/~bhaskara/courses/x968/

Approximation Algorithms

As we discussed in class, many interesting optimization problems we encounter turn out to be NP-hard. Can we still give interesting algorithms for these problems? One natural idea is to come up with algorithms that produce a solution whose objective value is close to that of the optimal solution. An algorithm that does so for every input is called an approximation algorithm.

Formally, suppose we have a problem P in which the goal is to minimize an objective function, and let ALG be an algorithm for the problem. For an input I, we denote by OPT(I) the optimal value (of the objective function), and by ALG(I) the objective value of the solution produced by the algorithm. The approximation ratio of the algorithm ALG is defined to be

    $\max_{\text{inputs } I} \frac{ALG(I)}{OPT(I)}$.

Remark 1. Note that the approximation ratio is a worst-case quantity: an algorithm has a large approximation ratio even if it does badly on only one input.

Remark 2. The definition above is tailored to minimization problems. For maximization problems, we consider the maximum value of the ratio $\frac{OPT(I)}{ALG(I)}$.

Remark 3. From our definition (and the remark above), an approximation ratio is always $\ge 1$; the closer it is to 1, the better.

Many algorithms have approximation ratios that are functions of the input size. In these cases, the max over inputs means the max over inputs of a certain size.

Let us now see some simple examples of approximation algorithms. First, some basic notation about graphs. A graph G is defined by a pair (V, E), where V is the set of vertices and E the set of edges (the edges could be weighted, or directed). The neighborhood of a vertex v is the set of vertices that have an edge to v, and is denoted $\Gamma(v)$. The size of $\Gamma(v)$ is called the degree of v, denoted $\deg(v)$. For a set of vertices S, we write $\Gamma(S) = \bigcup_{v \in S} \Gamma(v)$ (the union of the neighborhoods of the vertices in S). For two disjoint sets of vertices $V_1, V_2 \subseteq V$, we denote by $E(V_1, V_2)$ the set of edges with one endpoint in $V_1$ and the other in $V_2$.

Maximum Cut

Definition of the problem: Given an undirected, unweighted graph G = (V, E), partition V into $V_1$ and $V_2$ so as to maximize $|E(V_1, V_2)|$. The objective value is thus $|E(V_1, V_2)|$. We call the edges in $E(V_1, V_2)$ the edges that are cut by the partition.

Let us consider the following natural algorithm, which can be thought of as local search.

1. Start with an arbitrary partition $V_1, V_2$ of V (so every vertex of V is in one of the parts).
2. If there exists a vertex $u \in V$ that has more than $(1/2)\deg(u)$ edges going to vertices in the same part as u, then move u to the other part. Else, terminate.
3. Repeat step 2 as long as there exists such a vertex u.

Efficiency. Note that whenever step 2 executes (without terminating), the number of edges in the cut strictly increases: the number of edges incident to u that are cut is less than $(1/2)\deg(u)$ before the move and more than $(1/2)\deg(u)$ after it, and all the other edges remain unchanged. Thus the algorithm terminates in at most $|E|$ steps.

Approximation ratio. What can we say about the quality of the solution at the end of the algorithm? Let $C_u$ denote the number of edges incident to u that are cut when the algorithm terminates. Then we have

    $|E(V_1, V_2)| = \frac{1}{2} \sum_u C_u$.   (1)

The factor 1/2 is there because the sum counts every cut edge precisely twice (once for each endpoint). What do we know about $C_u$? By design, when the algorithm terminates, we have $C_u \ge (1/2)\deg(u)$ (else we would have moved u to the other part). Thus

    $\sum_u C_u \ge \frac{1}{2} \sum_u \deg(u) = |E|$,

where we used the fact that in any undirected graph $\sum_u \deg(u) = 2|E|$ (the summation counts every edge precisely twice). Combined with (1), we obtain $|E(V_1, V_2)| \ge \frac{1}{2}|E|$. This proves that for any input I, $ALG(I) \ge \frac{1}{2} OPT(I)$, because $OPT(I)$ is (trivially) at most $|E|$. Thus the approximation ratio is at most 2.
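
The algorithm is also easy to implement directly from the description above. Below is a minimal Python sketch (illustrative code, not from the course; the function name and the adjacency-list representation are my own choices):

```python
import random
from collections import defaultdict

def local_search_max_cut(vertices, edges, seed=0):
    """Local search for Max Cut: flip a vertex whenever more than half of
    its incident edges stay inside its own part (step 2 of the algorithm)."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = {v: rng.randint(0, 1) for v in vertices}   # arbitrary starting partition
    improved = True
    while improved:
        improved = False
        for u in vertices:
            same = sum(1 for w in adj[u] if side[w] == side[u])
            if 2 * same > len(adj[u]):   # more than deg(u)/2 uncut edges: flipping u helps
                side[u] = 1 - side[u]
                improved = True
    cut_value = sum(1 for u, v in edges if side[u] != side[v])
    return side, cut_value

# A 4-cycle has |E| = 4, so the guarantee promises a cut with at least 2 edges.
print(local_search_max_cut([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))
```

The flip condition in the inner loop is exactly the test in step 2: a vertex moves when more than half of its incident edges stay within its own part.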

In the Homework, we will see that the approximation ratio is actually equal to 2 (asymptotically): there are inputs for which the ratio between OPT and ALG can be arbitrarily close to 2. This shows that the analysis above is tight.

In the analysis above, it was easy to compare OPT and ALG because we had an obvious upper bound on OPT, namely the total number of edges in the graph. For many problems, such an easy bound might not exist. The next example illustrates this, and consequently the way in which we analyze the algorithm is significantly different.

Set Cover

Let T be a set of topics (e.g., sports, weather, finance, math, ...). Formally, let us denote them by $T_1, T_2, \dots, T_n$. Suppose we have m people, denoted $P_1, P_2, \dots, P_m$, and suppose that each person is an expert on a subset of the topics (e.g., $P_1$ could be an expert on $T_2$, $T_5$ and $T_6$, $P_2$ an expert on $T_1$ and $T_5$, and so on). The aim is to pick the smallest set of people such that, among them, there is an expert on each of the n topics. That is, we want the smallest set of people who cover all the topics. The objective function here is the size of the set of people picked.

Greedy algorithm: A natural idea is to start by picking the person who is an expert on the largest number of topics (breaking ties arbitrarily). This leaves us with some topics uncovered. The natural thing now is to pick a person who is an expert on the largest number of these (uncovered) topics. This leaves us with a smaller set of uncovered topics. We again pick the person who is an expert on the largest number of the remaining topics, and continue this process until there are no uncovered topics. (As long as it is possible to cover all the topics, the algorithm terminates; if it is not possible, the instance is infeasible and the approximation question does not arise.)

How can we bound the number of iterations of this algorithm? One way is to quantify the progress we make: can we show that the number of uncovered topics keeps decreasing at every step, and if so, at what rate? To answer this, define k to be the optimal value of the objective, i.e., there exists a set of k people who cover all the topics; denote them by $P_{i_1}, P_{i_2}, \dots, P_{i_k}$. Further, denote by $U_j$ the set of topics still uncovered after j steps of the algorithm. Clearly $U_0$ is the entire set of topics (initially, none of the topics are covered).

The key claim is that for every $j \ge 0$, step $(j+1)$ in fact covers at least $|U_j|/k$ of the uncovered topics in $U_j$. Let us first see this for j = 0, in which case step $(j+1)$ is simply the first step: we know that $P_{i_1}, \dots, P_{i_k}$ cover the entire set of topics, so at least one of them must cover at least $n/k$ topics (by averaging). Our algorithm picks the person who covers the maximum number of uncovered topics, so this person covers at least $n/k$ topics. The exact same argument applies for $j \ge 1$, because $P_{i_1}, P_{i_2}, \dots, P_{i_k}$ cover all the topics, and in particular all of $U_j$. Thus the person picked at step $(j+1)$ must cover at least $|U_j|/k$ uncovered topics.
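
Before turning this claim into a bound on the number of iterations, here is a short Python sketch of the greedy rule itself (illustrative code, not from the course; the toy instance mirrors the topics/experts example above, and the names are made up):

```python
def greedy_cover(topics, experts):
    """Greedy set cover: repeatedly pick the person covering the most
    still-uncovered topics. `experts` maps a person to the set of topics
    they know; we assume every topic has at least one expert."""
    uncovered = set(topics)
    chosen = []
    while uncovered:
        # Person covering the largest number of currently uncovered topics.
        best = max(experts, key=lambda p: len(experts[p] & uncovered))
        chosen.append(best)
        uncovered -= experts[best]
    return chosen

experts = {
    "P1": {"T2", "T5", "T6"},
    "P2": {"T1", "T5"},
    "P3": {"T3", "T4"},
}
print(greedy_cover({"T1", "T2", "T3", "T4", "T5", "T6"}, experts))
```

On this toy instance the optimum needs all three people, and the greedy also picks exactly three.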

The upshot of the claim is that the number of uncovered topics after step $(j+1)$ is at most $|U_j| - |U_j|/k$. More precisely,

    $|U_{j+1}| \le \left(1 - \frac{1}{k}\right) |U_j|$.

Now suppose the algorithm runs for T iterations. We have

    $|U_T| \le (1 - 1/k)\,|U_{T-1}| \le (1 - 1/k)^2\,|U_{T-2}| \le \dots \le (1 - 1/k)^T\,|U_0| = (1 - 1/k)^T\, n$.

Thus if we set $T = k \log n$ (all logarithms we consider are natural logs, i.e., to base e), then using the inequality $(1 - x) < e^{-x}$ for $x > 0$,

    $|U_T| \le n (1 - 1/k)^T < n\, e^{-T/k} = n\, e^{-\log n} = 1$.

Thus the number of uncovered topics is less than 1, meaning it is zero. Hence the number of iterations is at most $k \log n$. Recall that k is the optimal value of the objective, so the algorithm is a $(\log n)$-factor approximation.

Remark 4. This is an example of a problem in which the approximation factor depends on the input size n. In fact, it is known (a result of Uriel Feige) that approximating set cover to a factor $(1 - \epsilon) \log n$ is NP-hard for any constant $\epsilon > 0$, so this greedy algorithm is essentially the best possible in terms of approximation ratio!

Review of Basic Probability

Let us begin by seeing a couple of examples of the so-called union bound. For any two events A and B in a probability space, we have

    $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$.

Here $A \cup B$ is the event "A or B", and $A \cap B$ denotes "A and B". This immediately implies that $\Pr[A \cup B] \le \Pr[A] + \Pr[B]$ (simply because the subtracted term is non-negative). We can extend this to a collection of events $A_1, A_2, \dots, A_n$:

    $\Pr[A_1 \cup A_2 \cup \dots \cup A_n] \le \sum_i \Pr[A_i]$.

While very simple, this inequality is rather powerful, and will be useful in many places in the course.
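
As a quick sanity check of the inequality (a toy example of my own, not from the notes), consider one roll of a fair die with A = "the roll is at most 3" and B = "the roll is even"; then $\Pr[A \cup B] = 5/6$ while $\Pr[A] + \Pr[B] = 1$. A few lines of Python estimate both sides:

```python
import random

def union_bound_check(trials=200_000, seed=1):
    """Monte Carlo estimate of Pr[A or B] versus Pr[A] + Pr[B] for one die roll."""
    rng = random.Random(seed)
    count_a = count_b = count_union = 0
    for _ in range(trials):
        roll = rng.randint(1, 6)
        a, b = (roll <= 3), (roll % 2 == 0)
        count_a += a
        count_b += b
        count_union += (a or b)
    print("Pr[A or B] ~", count_union / trials,
          "   Pr[A] + Pr[B] ~", (count_a + count_b) / trials)

union_bound_check()   # prints roughly 0.83 <= 1.0
```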

Collecting coupons

Suppose we have n stores, each of which gives out coupons in a newspaper. When a person buys a copy of the newspaper, he gets the coupon of a random store. The question is: how many newspapers should he buy to be 99% sure that he gets the coupons of all the stores?

Intuitively, this is how the process should go: initially, the person ends up seeing a lot of distinct coupons, but once the number of unseen coupons becomes small, so does his likelihood of seeing them. As an extreme case, suppose he has seen all but one of the coupons; now the probability that he sees this coupon when he buys a newspaper is only 1/n, so in expectation he needs to buy n newspapers to see this one coupon.

One can make this observation semi-formal. Suppose the person has i coupons remaining. Then the next time he buys a newspaper, he has a probability of i/n of seeing an unseen coupon. Thus the expected number of newspapers he needs to buy to see an unseen coupon is n/i. After this many newspapers, the number of unseen coupons drops to i - 1, and we can use the same argument again with i replaced by i - 1. Thus the expected number of newspapers he needs to buy for the number of unseen coupons to drop from n to 0 is

    $\frac{n}{n} + \frac{n}{n-1} + \dots + \frac{n}{1} = n\left(1 + \frac{1}{2} + \dots + \frac{1}{n}\right) = \Theta(n \log n)$.

How can we make the argument more formal? Furthermore, we want a success probability as well (99%). For this, suppose the person buys T newspapers, and let us analyze the probability that he has not seen all the coupons. If this probability is less than 1%, then the success probability is more than 99%. Now, not seeing all the coupons is equivalent to saying that at least one of the coupons was not seen. Let us denote by $A_i$ the event that coupon i was not seen (after buying T papers). Then we are interested in an upper bound on $\Pr[A_1 \cup A_2 \cup \dots \cup A_n]$. We can apply the union bound to say that this is at most $\sum_i \Pr[A_i]$. But what is $\Pr[A_i]$? It is precisely the probability that none of the T newspapers ended up giving coupon i, which is exactly $(1 - \frac{1}{n})^T$ (for any i). Thus we have

    $\Pr[A_1 \cup A_2 \cup \dots \cup A_n] \le n \left(1 - \frac{1}{n}\right)^T$.

Once again, we can use the inequality $1 - x < e^{-x}$ to say that $(1 - 1/n)^T < e^{-T/n}$. Now, to make the right-hand side above less than 1/100, we simply need $e^{-T/n} < \frac{1}{100n}$, or equivalently, $T \ge n \log(100n)$. This is, of course, $n \log n + \Theta(n)$, which is essentially the bound from the heuristic argument.

The main advantage of the union bound is that it reduces the question of computing the probability of a complicated event to that of a bunch of simple events, whose probabilities we can directly estimate.
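
The following small simulation (an illustrative sketch, not part of the notes) compares a typical run with the two estimates above: the average number of newspapers is close to the heuristic value $n H_n$, while $n \log(100n)$ is the more conservative 99% bound.

```python
import math
import random

def coupons_needed(n, rng):
    """Buy newspapers until all n coupons have been seen; return how many were bought."""
    seen = set()
    bought = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))   # each paper carries a uniformly random coupon
        bought += 1
    return bought

rng = random.Random(0)
n, trials = 100, 1000
average = sum(coupons_needed(n, rng) for _ in range(trials)) / trials
print(f"average newspapers over {trials} runs: {average:.0f}")
print(f"heuristic n*H_n: {n * sum(1 / i for i in range(1, n + 1)):.0f}")
print(f"99% bound n*log(100n): {n * math.log(100 * n):.0f}")
```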

Balls and Bins

Suppose we have n bins, labelled $B_1, B_2, \dots, B_n$. Now consider throwing n balls randomly into the bins (i.e., each ball is thrown into a uniformly random bin, and we proceed with the next ball). A natural question is: how unbalanced is the resulting assignment? For instance, could one bin end up getting a lot of balls? This sort of question is very useful in applications: imagine a large number of search queries, where each query is forwarded to a randomly chosen machine; we do not want any machine to be overloaded with too many queries.

Formally, if $S_1, S_2, \dots, S_n$ denote the numbers of balls that landed in the bins (in order), we are interested in the quantity $\max_i S_i$. For what T can we say that $\max_i S_i < T$ with probability 99%? Once again, we can argue using a union bound. If $X_i$ denotes the event $S_i > T$, then we are interested in bounding $\Pr[X_1 \cup X_2 \cup \dots \cup X_n]$, which we can analyze as before.

Exercise. Prove that for $T = C \log n / \log \log n$ for some constant C, we have $\max_i S_i < T$ with probability 99%.

This bound actually happens to be tight: with probability at least 99%, there exists a bin that gets at least $c \log n / \log \log n$ balls (for an absolute constant c). This quantity could be quite large for large n, meaning that random assignment typically produces a fairly unbalanced solution. There are very elegant ways of dealing with this; the reader is encouraged to look up "the power of two choices".
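
To see this behaviour empirically (and to preview the power of two choices), here is a small simulation sketch; the code and the two-choices variant are my own illustration, not part of the notes:

```python
import math
import random
from collections import Counter

def max_load_one_choice(n, rng):
    """Throw n balls into n bins uniformly at random; return the fullest bin's load."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())

def max_load_two_choices(n, rng):
    """Each ball samples two random bins and goes into the currently emptier one."""
    loads = [0] * n
    for _ in range(n):
        i, j = rng.randrange(n), rng.randrange(n)
        loads[i if loads[i] <= loads[j] else j] += 1
    return max(loads)

rng = random.Random(0)
n = 10_000
print("one choice :", max_load_one_choice(n, rng),
      "  (log n / log log n ~", round(math.log(n) / math.log(math.log(n)), 1), ")")
print("two choices:", max_load_two_choices(n, rng),
      "  (log log n ~", round(math.log(math.log(n)), 1), ")")
```

With one random choice per ball the maximum load grows like $\log n / \log \log n$, while letting each ball pick the emptier of two random bins brings it down to $O(\log \log n)$; this is the phenomenon behind the power of two choices.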