
CS264: Beyond Worst-Case Analysis
Lecture #14: Smoothed Analysis of Pareto Curves

Tim Roughgarden
November 5, 2014

©2014, Tim Roughgarden. Department of Computer Science, Stanford University, 474 Gates Building, 353 Serra Mall, Stanford, CA 94305. Email: tim@cs.stanford.edu.

1 Pareto Curves and a Knapsack Algorithm

Our next application of smoothed analysis is to Pareto curves. We begin by explaining one of several reasons why you should care about the size of Pareto curves: they govern the running time of some interesting algorithms.

In the Knapsack problem, the input comprises $n$ items, each with a positive value $v_i$ and a positive weight (or size) $w_i$. Also given is a knapsack capacity $W$. The goal is to choose a subset $S \subseteq \{1, 2, \ldots, n\}$ of the items that maximizes the total value $\sum_{i \in S} v_i$, subject to the chosen items fitting in the knapsack (i.e., $\sum_{i \in S} w_i \le W$). When you were first introduced to this problem, you may have been told some hokey story about a burglar trying to escape with the best loot, but it really is a fundamental problem, relevant whenever you're trying to make optimal use of a limited resource.

We say that a subset $S \subseteq \{1, 2, \ldots, n\}$ dominates $T \subseteq \{1, 2, \ldots, n\}$ if: (i) the total value of $S$ is at least that of $T$ ($\sum_{i \in S} v_i \ge \sum_{i \in T} v_i$); (ii) the total weight of $S$ is at most that of $T$ ($\sum_{i \in S} w_i \le \sum_{i \in T} w_i$); and (iii) at least one of these two inequalities is strict. If $S$ dominates $T$, then it renders $T$ moot: the solution $T$ can be safely pruned without regret. The Pareto curve of a Knapsack instance is the set of all undominated solutions; see Figure 1. Geometrically, a point is dominated if and only if there is another point to the northwest of it. You've seen this concept before, in Lecture #2, when we studied the 2D Maxima problem. The Pareto curve corresponds, after a reflection about the y-axis, to the set of maxima of the point set that has one point for each subset $S$ of $\{1, 2, \ldots, n\}$, the two coordinates of the point being the total weight and total value of $S$.

Here is a Knapsack algorithm that you may not have seen before (due to [5]).

1. Generate the Pareto curve. If multiple solutions have identical total value and total weight, an arbitrary one of them is retained.

Figure 1: The Pareto curve. The red points are dominated; the green points form the Pareto curve.

2. Among all solutions in the Pareto curve with total weight at most the knapsack capacity $W$, return the one with the largest total value.

The first thing to notice about this algorithm is that it does not involve dynamic programming (at least not explicitly). It is clearly correct, since there is no loss of generality in restricting the search for an optimal solution to the Pareto curve (by the definition of domination).

A good way to implement the algorithm above is to order the items arbitrarily and generate a sequence of Pareto curves $PC_0, PC_1, \ldots, PC_n$, where $PC_i$ denotes the Pareto curve for the Knapsack instance induced by the first $i$ items. Provided each Pareto curve is represented in increasing order of total weight, $PC_{i+1}$ can be obtained from $PC_i$ by merging two sorted lists ($PC_i$ and a shifted version of $PC_i$). A suitable implementation of this idea yields an algorithm running in time $O(\sum_{i=0}^{n} |PC_i|)$; see Homework #7 for details.

The good news is that the quantity $O(\sum_{i=0}^{n} |PC_i|)$ can be bounded above, even in the worst case, in three different ways. Certainly $|PC_i| \le 2^i$ for every $i$, so $O(\sum_{i=0}^{n} |PC_i|) = O(2^n)$. Next, suppose that all items' weights are integers between 1 and $w_{\max}$. No curve $PC_i$ contains more than one solution with the same total weight (all but the highest-value one are dominated), so in this case $|PC_i| \le n w_{\max}$ and hence $O(\sum_{i=0}^{n} |PC_i|) = O(n^2 w_{\max})$. Similarly, if all of the items' values are integers between 1 and $v_{\max}$, then $O(\sum_{i=0}^{n} |PC_i|) = O(n^2 v_{\max})$. Thus, this algorithm not only solves the Knapsack problem without explicit dynamic programming, its running time bound of $O(\min\{2^n, n^2 w_{\max}, n^2 v_{\max}\})$ is at least as good as both of the dynamic programming algorithms that you may have seen previously for the problem. One of these classical dynamic programs has a 2-D table indexed by a number $i$ of items and a weight budget of at most $W$; the other is indexed by $i$ and a target value of at most $n v_{\max}$.

The bad news is that the worst-case size of the Pareto curve of a Knapsack instance with $n$ items can indeed be as large as $2^n$ (see Homework #7).
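To make the merging idea concrete, here is a minimal Python sketch of the incremental construction described above (my illustration, not the Homework #7 implementation). For brevity it re-sorts the concatenation of the two weight-sorted lists rather than merging them in place, and it tracks only (total weight, total value) pairs rather than the subsets themselves.

```python
def pareto_curves(values, weights):
    """Incrementally build the Pareto curves PC_0, ..., PC_n of a Knapsack
    instance. Each curve is a list of (total_weight, total_value) pairs in
    increasing order of weight, with dominated and duplicate points pruned."""
    curve = [(0, 0)]                      # PC_0: only the empty set
    curves = [list(curve)]
    for v, w in zip(values, weights):
        # Solutions that use the new item: shift the old curve by (w, v).
        shifted = [(tw + w, tv + v) for (tw, tv) in curve]
        # Sort by weight, breaking ties by larger value first (Timsort merges
        # the two already-sorted runs in near-linear time).
        merged = sorted(curve + shifted, key=lambda p: (p[0], -p[1]))
        new_curve, best_value = [], float("-inf")
        for tw, tv in merged:             # sweep in increasing weight,
            if tv > best_value:           # keeping only strict value improvements
                new_curve.append((tw, tv))
                best_value = tv
        curve = new_curve
        curves.append(list(curve))
    return curves


def knapsack_via_pareto(values, weights, capacity):
    """Step 2 of the algorithm: best value on the final curve that fits."""
    final_curve = pareto_curves(values, weights)[-1]
    return max(tv for tw, tv in final_curve if tw <= capacity)
```

For example, `knapsack_via_pareto([3, 2, 4], [4, 3, 6], 9)` returns 6: the final curve contains the point (9, 6) for the subset consisting of the last two items, and no feasible point has larger value.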

2 Smoothed Analysis of Pareto Curves

One reason to cover the above Knapsack algorithm is that it's simple, practically useful in some cases, and underrepresented in undergraduate courses and textbooks. But the main reason here is that it's particularly amenable to a smoothed analysis. The main result in this lecture is the following.

Theorem 2.1 ([1]) In smoothed Knapsack instances (see details below), the expected size of the Pareto curve is $O(n^2/\sigma)$, where $\sigma$ is a measure of the perturbation size.

The bound in Theorem 2.1 is tight; see Homework #7. Theorem 2.1 and linearity of expectation imply that the Knapsack algorithm in the previous section has expected running time $O(n^3/\sigma)$, which is polynomial provided $\sigma$ is bounded below by an inverse polynomial function of $n$. This is particularly interesting in light of the fact that Knapsack is an NP-hard problem! Thus, Theorem 2.1 tells us something interesting not just about a specific Knapsack algorithm, but more generally about the (smoothed) complexity of the Knapsack problem. (Next lecture elaborates on this difference further.) That the Knapsack problem has polynomial smoothed complexity accords well with empirical evidence that it is among the easiest of NP-hard problems.

The perturbation model we use in Theorem 2.1 is essentially the same as in the last lecture. We allow the item values $v_i$ to be arbitrarily chosen; only the item weights $w_i$ are random. (Alternatively, one can assume that the weights are arbitrary and that the item values are random.) We assume, without loss of generality, that the values lie in $[0, 1]$. We assume that each weight $w_i$ is drawn independently according to a density function $f_i : [0, 1] \to [0, \frac{1}{\sigma}]$ that is supported on $[0, 1]$ and is not too spiky, meaning that the density is never higher than $1/\sigma$. In particular, the support of the distribution has measure at least $\sigma$, with equality if and only if $f_i$ is uniform over its support. This perturbation model is even simpler than last lecture's, since we've moved from two-dimensional to one-dimensional objects (from points in the plane to item weights).

It is very satisfying that Theorem 2.1 makes only the minimal assumption that the distributions have enough randomness, and makes no assumptions about their specific form. It is easy to believe that real data contains a little randomness, even though any specific distributional assumption would be implausible.
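For intuition about the perturbation model, the following sketch (my illustration, not part of the lecture) draws a smoothed instance in the simplest way consistent with the assumptions: each weight is uniform on a subinterval of $[0, 1]$ of length $\sigma$, so its density is exactly $1/\sigma$, while the values stand in for the adversary's fixed choices. Paired with the `pareto_curves` routine sketched in Section 1, it can be used to eyeball the $O(n^2/\sigma)$ bound on small instances.

```python
import random

def smoothed_instance(n, sigma, seed=0):
    """One smoothed Knapsack instance: values in [0, 1] standing in for the
    adversary's fixed choices, and each weight drawn uniformly from a
    subinterval of [0, 1] of length sigma (so every density is <= 1/sigma)."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n)]
    weights = []
    for _ in range(n):
        left = rng.uniform(0.0, 1.0 - sigma)             # adversary's choice of support
        weights.append(rng.uniform(left, left + sigma))  # nature's perturbation
    return values, weights

def average_pareto_size(n, sigma, trials=20):
    """Average size of the final Pareto curve over several smoothed draws
    (uses pareto_curves from the sketch in Section 1)."""
    total = 0
    for t in range(trials):
        values, weights = smoothed_instance(n, sigma, seed=t)
        total += len(pareto_curves(values, weights)[-1])
    return total / trials
```

For modest parameters, say $n = 15$ and $\sigma = 0.5$, the averages one observes are typically far below the $2^{15}$ worst-case count, in line with the $O(n^2/\sigma)$ bound.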

is to be Pareto optimal. It also depends on other item weights, however the larger the weights of other items (and hence other solutions), the more likely S is to be Pareto optimal. To make matters worse, there might be an exponential number of other solutions that are eligible to dominate S, and it is hard to argue about all of them in a coherent way. Even if we condition on all item weights except one, it is not clear how to argue directly about whether or not a given solution S contributes to N. So our plan is to decompose N into simpler random variables that are easier to relate to our hypothesis that the item weights are random. To identify a role model, we recall the high-level idea of last lecture s analysis, of the 2OPT local search algorithm for TSP in the plane. We first zoomed in on a specific swap happily, there were only O(n 4 ) fundamentally distinct swaps to consider. Such a swap involved 8 numbers, two coordinates for each of the 4 relevant points. We then zoomed in further on one of the (4!) 2 relevant integer linear combinations of these 8 numbers, which corresponds to a particular relative ordering of the 4 x-coordinates and of the 4 y-coordinates. After conditioning on all but one of these numbers, the bad event boiled down simply to whether or not the remaining number fell into a particular interval. This probability was trivial to upper bound because of the upper bound on every point s density function. Returning to the present setting, we would like to identify analogously simple bad events, ideally corresponding simply to one of the smoothed random variables (i.e., item weights) falling into a particular interval. As a first step, imagine plotting the total weight and total value of every subset S {1, 2,..., n}, in the box [0, n] [0, n]. For the sake of analysis, we imagine chopping up this box into n/ɛ vertical slices, each of width ɛ (see Figure 2). Here ɛ > 0 is a parameter internal to our analysis, chosen for convenience (it will not appear in our final bound). We choose ɛ so small (e.g., inverse doubly exponential in n) that there is essentially zero probability that more than one subset S appears in a single slice. 2 We are then justified in redefining N as the number of slices that contain a Pareto optimal solution, rather than as the number of Pareto optimal solutions. Here s an idea that is in the spirit of what we want, but too simple to work. We could decompose N = slices t N t, where N t is an indicator random variable for whether or not there is a Pareto optimal solution in the slice with left boundary t (and right boundary t + ɛ). Since N t N t, linearity of expectation reduces our task from upper bounding the expectation of N to upper bounding the probability of an event {N t = 1}. This event is still too complicated to analyze directly, though: there are potentially an exponential number of item subsets that might land in this slice, even after conditioning on all item weights except for one. Intuitively, only the subsets with the highest total values are relevant. To tease out this point, we need to refine our approach and decompose the random variable N further. We now proceed to the proof of Theorem 2.1, which is extremely clever. There are two parts, and both are somewhat delicate. The first part shows that a particular decomposition 2 In more detail: because all density functions f i are bounded above by 1/σ, the probability that a given subset S winds up in a given slice is at most ɛ/σ. 
(To see this, fix a slice, and condition on all item weights except for one item in S.) Thus, the probability that some slice contains two or more points can be bounded above by n ɛ ( ɛ σ )2, which goes to 0 as ɛ 0. We can also assume that no solutions lands squarely on a slice boundary, since this is a probability 0 event. 4

Figure 2: Slices, Pareto optimal solutions, and the slice boundary $t$.

To define the indicator random variables, fix a slice boundary $t$ and an item $i \in \{1, 2, \ldots, n\}$. Ultimately, we want to charge each Pareto optimal solution $S$ to the slice boundary immediately to its left and to an item $i$ in $S$ that is responsible for its high value. Formally, we define four random variables; a small brute-force sketch of these definitions follows the list.

- $P_{it} \subseteq \{1, 2, \ldots, n\}$ is the solution that maximizes total value subject to two constraints: (i) the total weight of $P_{it}$ is at most $t$; and (ii) $P_{it}$ does not include item $i$. Note that $P_{it}$ is a random variable, since a given subset might or might not have weight at most $t$, depending on the item weights' coin flips. Also note that $P_{it}$ might be the same as the maximum-value solution with weight at most $t$ (if the latter happens to exclude item $i$), or not (otherwise). Similarly, $P_{it}$ might or might not be a Pareto optimal point.

- $V_{it}$ is the total value of $P_{it}$.

- $S_{it}$ is the solution that minimizes total weight subject to two constraints: (i) the total value of $S_{it}$ is more than $V_{it}$; and (ii) $S_{it}$ includes item $i$. Note that the total weight of $S_{it}$ could be more or less than that of $P_{it}$.

- $N_{it}$ is 1 if the total weight of $S_{it}$ lies in the interval $(t, t + \epsilon)$, and 0 if $S_{it}$ lies in some other slice.
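Since these definitions are easy to misread, here is the brute-force sketch promised above. It computes $P_{it}$, $V_{it}$, $S_{it}$, and $N_{it}$ directly from the definitions in exponential time, so it is only for hand-checking tiny instances; the names are mine, and items are 0-indexed rather than 1-indexed as in the lecture.

```python
from itertools import combinations

def all_subsets(n):
    """Every subset of {0, ..., n-1}, as a tuple of item indices."""
    for k in range(n + 1):
        yield from combinations(range(n), k)

def charging_variables(values, weights, i, t, eps):
    """Brute-force P_it, V_it, S_it, N_it for one item i and slice boundary t."""
    def wt(S):
        return sum(weights[j] for j in S)
    def val(S):
        return sum(values[j] for j in S)

    n = len(values)
    # P_it: maximum-value solution with total weight <= t that excludes item i.
    P_candidates = [S for S in all_subsets(n) if i not in S and wt(S) <= t]
    P_it = max(P_candidates, key=val)        # the empty set is always a candidate
    V_it = val(P_it)

    # S_it: minimum-weight solution with total value > V_it that includes item i.
    # (With positive values, P_it plus item i is always a candidate.)
    S_candidates = [S for S in all_subsets(n) if i in S and val(S) > V_it]
    S_it = min(S_candidates, key=wt)

    # N_it: does S_it land in the slice (t, t + eps)?
    N_it = int(t < wt(S_it) < t + eps)
    return P_it, V_it, S_it, N_it
```

Summing $N_{it}$ over all items $i$ and all slice boundaries $t$ then gives the quantity bounded in Lemma 3.1 below.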

The $N_{it}$'s are, finally, random variables that we can analyze directly. Before we do this, however, we need to argue that the $N_{it}$'s cover all the Pareto optimal solutions.

Lemma 3.1 With probability 1,
$$\sum_{\text{slices } t} \; \sum_{\text{items } i} N_{it} \;\ge\; N.$$

Proof: We need to show that for every contribution to the right-hand side (every Pareto optimal solution $S \subseteq \{1, 2, \ldots, n\}$) there is a corresponding contribution to the left-hand side. So, for each Pareto optimal solution we exhibit a distinct item-boundary pair $(i, t)$ with $N_{it} = 1$; this is the charging argument alluded to above.

Condition on all of the items' weights and fix a Pareto optimal solution $S$. Define $t$ as the slice boundary immediately to the left of $S$. Thus, the total weight of $S$ lies in the interval $(t, t + \epsilon)$. Let $P$ denote the maximum-value solution with weight at most $t$, and call its total value $V$. Since $S$ is Pareto optimal and $P$ has strictly less weight than $S$, it must be that $S$ has total value strictly larger than $V$ (otherwise $P$ would dominate it). This can only occur when there is at least one item in $S$ that is not also in $P$; let $i$ denote an arbitrary item of $S \setminus P$.

Now let's trace through the definitions of all the random variables; eventually, we will conclude that $N_{it} = 1$. We begin with $P_{it}$, defined as the maximum-value solution with weight at most $t$ that excludes item $i$. Since the maximum-value solution with weight at most $t$, namely $P$, happens to exclude item $i$, it must be that $P = P_{it}$. Thus, by the definitions, $V = V_{it}$. Next, recall that $S_{it}$ is defined as the minimum-weight solution with value strictly larger than $V_{it}$ (equivalently, strictly larger than $V$) that includes item $i$. By the definition of $P$ and $V$, every solution with total value strictly greater than $V$ has total weight strictly greater than $t$. Since $S$ happens to include item $i$ and also happens to have total weight in the earliest possible slice (total weight in $(t, t + \epsilon)$), it must be that $S_{it} = S$ and $N_{it} = 1$.

Finally, since we chose $\epsilon$ small enough that distinct Pareto optimal solutions have distinct left boundaries $t$, distinct Pareto optimal solutions are charged (i.e., force $N_{it} = 1$) to distinct $(i, t)$ pairs. This completes the proof.

Lemma 3.1 reduces the task of bounding $N$ to that of bounding the probability that a given $N_{it}$ equals 1.

Lemma 3.2 For every $i$ and $t$, $E[N_{it}] \le \frac{\epsilon}{\sigma}$.

Proof: The lemma is equivalent to the statement that, for every $i$ and $t$, the probability that the set $S_{it}$ has total weight in $(t, t + \epsilon)$ is at most $\frac{\epsilon}{\sigma}$. Recall that $S_{it}$ is the minimum-weight solution with value greater than $V_{it}$ that includes item $i$, where $V_{it}$ is the maximum value of a solution that excludes item $i$ and has total weight at most $t$.

So, fix $i$ and $t$. The dominoes start falling as soon as we condition on all of the random item weights except for that of item $i$.

1. The total weight and total value of every set that excludes item $i$ are now fixed. (We've conditioned on all weights other than that of item $i$, and values have been fixed all along.) This means that $P_{it}$, and hence $V_{it}$, are now fixed.

2. The total weight of every subset $S$ that includes item $i$ is a fixed amount $W_S$ (namely, $\sum_{j \in S \setminus \{i\}} w_j$) plus the random weight $w_i$ of item $i$. This implies that the relative ordering by total weight of all subsets including $i$ is now fixed: it is the same as the relative ordering by the $W_S$'s. The total values of all of these sets are also fixed (since the $v_i$'s were never random).

3. Since $S_{it}$ is the minimum-weight subset that includes item $i$ and has value greater than $V_{it}$, and since $V_{it}$ is fixed, the subsets with value greater than $V_{it}$ are fixed, and the relative order by total weight of the sets including $i$ is fixed, the set $S_{it}$ is fixed.

Summarizing, all that remains random is the actual weight of the (fixed) set $S_{it}$, which has the form $W + w_i$ for a fixed number $W$. Thus, the bad event that $S_{it}$ has total weight in the interval $(t, t + \epsilon)$ translates to the still-random weight $w_i$ taking on a value in an interval of length at most $\epsilon$ (namely, the interval $(t - W, t + \epsilon - W)$). Since the density function of $w_i$ is bounded above by $1/\sigma$, this probability is at most $\epsilon/\sigma$. Since this inequality holds conditioned on arbitrary weights for the items other than $i$, the unconditional inequality of the lemma follows.

Using that there are $n \cdot \frac{n}{\epsilon}$ choices for $i$ and $t$, combining Lemmas 3.1 and 3.2 with linearity of expectation implies that
$$E[N] \;\le\; E\left[\sum_{i,t} N_{it}\right] \;\le\; \frac{n^2}{\epsilon} \cdot \max_{i,t} \Pr[N_{it} = 1] \;\le\; \frac{n^2}{\sigma}.$$
This completes the proof of Theorem 2.1.

Remark 3.3 Over the past few years there has been much progress in extending Theorem 2.1 to a smoothed analysis of higher-dimensional Pareto curves. (In $d$ dimensions, there are $d$ linear objective functions, one point dominates another if and only if it is at least as good in every coordinate, and the Pareto curve contains all undominated points.) The right answer is somewhere between $n^d/\sigma^d$ and $n^{2d}/\sigma^d$; see [4, 2, 3] for more details.

References

[1] R. Beier and B. Vöcking. Typical properties of winners and losers in discrete optimization. SIAM Journal on Computing, 35(4):855–881, 2006.

[2] T. Brunsch, N. Goyal, L. Rademacher, and H. Röglin. Lower bounds for the average and smoothed number of Pareto-optima. Theory of Computing, 10(10):237–256, 2014.

[3] T. Brunsch and H. Röglin. Improved smoothed analysis of multiobjective optimization. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC), pages 407–426, 2012.

[4] A. Moitra and R. O'Donnell. Pareto optimal solutions for smoothed analysts. SIAM Journal on Computing, 41(5):1266–1284, 2012.

[5] G. Nemhauser and Z. Ullmann. Discrete dynamic programming and capital allocation. Management Science, 15(9):494–505, 1969.