Algoritmos e Incerteza (PUC-Rio INF2979, 2017.1)
Lecture 2: Paging and AdWords
March 20, 2017
Lecturer: Marco Molinaro
Scribe: Gabriel Homsi

In this class we had a brief recap of the Ski Rental Problem and its applications. Then, we studied two other classical online algorithm problems: the Paging Problem and the AdWords Problem.

1 Ski Rental Problem (recap)

The ski rental problem belongs to the rent-or-buy class of problems. In these problems, we have to decide between continuing to pay a recurring cost (rent) or paying a high one-time cost (buy). A wide range of problems belong to this class. Next, we describe two of them.

1.1 Wait for the elevator or take the stairs

In this problem we have to decide between waiting for an elevator (rent) or taking the stairs (buy). The arrival time l of the elevator is unknown. The longer we wait for the elevator, the more time we lose (1 unit of time per turn). We stop waiting for the elevator once we decide to take the stairs; however, taking the stairs costs b units of time.

1.2 Energy states

An idle computer system has two energy states: high and low energy. The system starts in the high energy state. Each turn, the system pays 1 if it is in the high energy state and 0 if it is in the low energy state. Transitioning from the high to the low energy state costs b. The time l that the system will remain idle is unknown. Therefore, we must decide when to put the system in the low energy state.

2 Paging Problem

In the paging problem, we have a small cache memory that holds k pages and a big main memory that holds n pages (k < n). A sequence of reads r_1, ..., r_m happens over time. If a page b is read at time t, two things can happen. If b is already cached, then we have a cache hit and b is read at no additional cost. Otherwise, we have a cache miss: b is copied from the main memory to the cache and then read. This has a cost of 1.
Our goal is to minimize the number of cache misses. We consider a simplification of the problem where n = k + 1. An algorithm for the paging problem must decide which page to evict when the cache is full and an uncached page is read. One possible strategy is to evict a random cached page; another is to evict the oldest cached page. We will show that, no matter the strategy, no deterministic online algorithm is better than k-competitive.

2.1 Example

We now consider an example where pages are read in the order 1, 2, 3, 4, 1, 2, 3, with n = 4 and k = 3, and see how an online algorithm compares with the optimal solution. First, we consider the LRU (least recently used) online algorithm: whenever the cache is full, the page that LRU selects for eviction is the one read least recently. The execution of LRU is shown in Table 1.

Time  Cache    Read  Removed  Cost
0     -        -     -        -
1     1        1     -        1
2     1, 2     2     -        1
3     1, 2, 3  3     -        1
4     2, 3, 4  4     1        1
5     3, 4, 1  1     2        1
6     4, 1, 2  2     3        1
7     1, 2, 3  3     4        1

Table 1: LRU execution, cost = 7

Next, we show the execution of the optimal offline algorithm OPT for the same instance (Table 2).

Time  Cache    Read  Removed  Cost
0     -        -     -        -
1     1        1     -        1
2     1, 2     2     -        1
3     1, 2, 3  3     -        1
4     1, 2, 4  4     3        1
5     1, 2, 4  1     -        0
6     1, 2, 4  2     -        0
7     1, 2, 3  3     4        1

Table 2: Optimal execution, cost = 5

LRU performed worse than OPT. Next, we discuss how the cost of online algorithms compares to the cost of OPT.
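The executions above are easy to reproduce with a short simulation. The following is a minimal sketch (the function name `lru_misses` is ours, not from the lecture); it replays the example under the LRU eviction policy and counts the misses of Table 1:

```python
from collections import OrderedDict

def lru_misses(reads, k):
    """Count cache misses of LRU with a cache of k pages."""
    cache = OrderedDict()  # least recently used page sits first
    misses = 0
    for page in reads:
        if page in cache:
            cache.move_to_end(page)        # cache hit: refresh recency
        else:
            misses += 1                    # cache miss: load the page
            if len(cache) == k:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses

print(lru_misses([1, 2, 3, 4, 1, 2, 3], k=3))  # prints 7, matching Table 1
```

Swapping the eviction line for `cache.popitem(last=True)` would evict the most recently used page instead, turning the same sketch into the MRU policy discussed later.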
2.2 Phases

First, we formalize the concept of phases. An instance is a sequence of reads r_1, r_2, ..., r_m and can be partitioned into phases as follows: the first phase p_1 is the longest prefix of the sequence such that the number of distinct pages in p_1 is at most k. A direct implication of this definition is that a phase fits in the cache. The subsequent phases p_2, ..., p_z are defined in the same way over the remaining reads. Take the following example: 1, 2, 3, 4, 1, 2, 3, with n = 4 and k = 3. It is partitioned into phases as follows: [1, 2, 3], [4, 1, 2], [3]. The idea behind phases is that every time we switch phases, a cache miss happens. We will analyze paging algorithms based on the maximum number of cache misses inside a phase.

2.3 Optimal offline algorithm

Now we analyze the number of cache misses per phase of the optimal offline algorithm. We later use this to compare the competitiveness of deterministic and randomized online algorithms.

Theorem 2.1. The number of cache misses of the optimal offline algorithm is at least the number of phases divided by two (rounded down).

Proof. Divide any instance I into a list of phases p_1, p_2, ..., p_z. Denote by d(p) the number of distinct pages in a sequence p. We pair up consecutive phases without overlap (i.e. we consider only {(p_1, p_2), (p_3, p_4), ...}). For any such pair (p_i, p_{i+1}) we have d(p_i p_{i+1}) ≥ k + 1, since otherwise p_i would not be maximal. Thus, while reading the sequence p_i p_{i+1}, at least one cache miss must happen (one of these pages is missing from the cache at the beginning of this sequence). Since the number of such pairs of adjacent phases is ⌊z/2⌋, the result follows.

Actually, the same argument, handling the overlaps more carefully, shows that every algorithm must incur a cache miss per phase (i.e. we can consider the interval from the second read of a phase until the first read of the next phase and argue that there is a miss in this interval).

Theorem 2.2. The optimal offline algorithm incurs at least one cache miss per phase.
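The phase partition just defined can be computed greedily in one pass, as the following sketch shows (the function name `phases` is ours):

```python
def phases(reads, k):
    """Partition a request sequence into phases: each phase is the
    longest run of reads containing at most k distinct pages."""
    result, current, distinct = [], [], set()
    for page in reads:
        if page not in distinct and len(distinct) == k:
            result.append(current)   # current phase is full: close it
            current, distinct = [], set()
        current.append(page)
        distinct.add(page)
    if current:
        result.append(current)       # close the last (possibly short) phase
    return result

print(phases([1, 2, 3, 4, 1, 2, 3], k=3))  # [[1, 2, 3], [4, 1, 2], [3]]
```

The output matches the partition of the example above. Note that the loop only needs the reads seen so far, which is exactly the observation that lets an online algorithm detect phase boundaries.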
2.4 Online algorithms

Now we discuss the competitiveness of online algorithms by looking at the maximum number of cache misses inside a phase.

Recap An algorithm Algo is α-competitive if, for every instance I, Algo(I) ≤ α · OPT(I). Therefore, by Theorem 2.2, an algorithm that makes at most α cache misses per phase is α-competitive.

2.4.1 AlgoPhases

AlgoPhases is a simple online paging algorithm that evicts every page in the cache when a phase is over (notice that an online algorithm can indeed detect the beginning of a new phase by counting the number of distinct pages in the current phase).
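A minimal sketch of AlgoPhases (the function name is ours). Since the cache is flushed at every phase boundary and a phase contains at most k distinct pages, the cache contents always coincide with the set of distinct pages seen so far in the current phase:

```python
def algophases_misses(reads, k):
    """AlgoPhases: flush the cache at each phase boundary; the first
    read of each page within a phase is then a cache miss."""
    misses, seen = 0, set()  # seen = cache = distinct pages of the phase
    for page in reads:
        if page not in seen and len(seen) == k:
            seen = set()     # phase over: empty the whole cache
        if page not in seen:
            misses += 1      # first read of this page in the phase
            seen.add(page)
    return misses

print(algophases_misses([1, 2, 3, 4, 1, 2, 3], k=3))  # 7
```

On the running example AlgoPhases happens to pay the same 7 misses as LRU: k misses in each of the two full phases plus one in the final partial phase.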
Theorem 2.3. AlgoPhases is k-competitive.

Proof. Consider any pair of consecutive phases p_i, p_{i+1}. By the end of p_i, the cache is full, which leads AlgoPhases to empty the whole cache. As the number of distinct pages in p_{i+1} is at most k, AlgoPhases makes at most k cache misses during phase p_{i+1}. As this is valid for any phase, Theorem 2.2 implies that AlgoPhases is k-competitive.

2.4.2 LRU

Despite the simplicity of AlgoPhases, we show that LRU is no better than AlgoPhases in the worst case.

Theorem 2.4. LRU is k-competitive.

Proof. Due to the eviction policy of LRU, if an uncached page b is read during a phase p, then b is kept in the cache until k other distinct pages are read. This implies that b remains in the cache until the end of p. As the number of distinct pages in p is at most k, the number of cache misses in p is at most k. Therefore, by Theorem 2.2, LRU is k-competitive.

2.4.3 MRU

Consider now the MRU (most recently used) online algorithm: whenever the cache is full, the page that MRU selects for eviction is the one read most recently.

Theorem 2.5. MRU is not α-competitive for any α ≥ 1.

Proof. Consider k = 2. Create an instance [1, 2], [3, 2, 3, 2, ..., 3, 2] with an arbitrarily long second phase. In the second phase, reading 3 always evicts 2, and reading 2 always evicts 3. Therefore, the number of cache misses inside the second phase is proportional to its size.

2.4.4 An upper bound

Our main question now is: can we do better than k-competitive? For deterministic online algorithms, the answer is no, and we prove it next.

Theorem 2.6. Let I be any instance made up of reads of pages 1, 2, ..., k, k + 1, in any order. The number of cache misses of the optimal offline algorithm on I is at most ⌈|I|/k⌉.

Proof. Consider the longest forward distance (LFD) offline algorithm: whenever the cache is full, LFD evicts the cached page p whose next read is farthest in the future.
By the definition of LFD, all k pages now in the cache will be read before p is read again. Therefore, during any sequence of k requests, LFD incurs at most one cache miss, and the number of cache misses on I is at most ⌈|I|/k⌉. The same bound holds for the optimal offline algorithm, as OPT(I) ≤ LFD(I) (note: LFD is in fact optimal, but knowing this is not necessary for this proof).

Theorem 2.7. No deterministic online algorithm is better than k-competitive.
Proof. Select any deterministic online algorithm Algo. Such an algorithm has a specific eviction strategy. We create an adversarial instance I for Algo that causes one cache miss on every read. This is done by building an arbitrarily long sequence of requests for pages 1 through k + 1, always requesting the page that the algorithm evicted on the previous read (since n = k + 1, there is always exactly one uncached page). Therefore, the number of cache misses of the algorithm equals the length of the sequence, while by Theorem 2.6 the number of misses of OPT on this sequence is only ⌈length of the sequence / k⌉. Thus, the algorithm is no better than k-competitive.

2.5 Randomized online algorithm

The marking algorithm (randomized 1-bit LRU) of [2] is approximately ln(k)-competitive, which is essentially the best possible competitive ratio for a randomized online paging algorithm. The behavior of the algorithm is sketched next. When a page is read:

- If it is cached (cache hit), read it with cost zero and mark it.
- Otherwise (cache miss): if all cached pages are marked, unmark all of them. Then choose an unmarked cached page uniformly at random and evict it, and place the read page in the cache, marked.

We do not provide a formal proof for this algorithm, but the general idea is as follows. Consider an instance with two phases p_1, p_2: [1, 2, ..., k], [k + 1, 1, 2, ..., k - 1]. At the beginning of p_2, the cache is full with the elements of p_1, all marked. A cache miss happens because the uncached page k + 1 must be read. As all pages in the cache are marked, we unmark all of them and evict a random unmarked page; this could be any of the unmarked pages 1, 2, ..., k. The probability that the first read of p_2 (page k + 1) causes a cache miss is 1 (k + 1 is not in the cache). The second read (page 1) causes a cache miss if 1 was the page evicted during the previous read; as the number of unmarked pages at that point was k, the probability that page 1 was evicted is 1/k, so this is the probability of a miss while reading 1. Now consider the read for 2.
The probability of a miss is 1/k (page 2 was evicted while reading k + 1) plus (1/k) · (1/(k - 1)) (page 1 was evicted while reading k + 1 and page 2 was evicted while reading 1), which equals 1/(k - 1). This goes on until the number of unmarked pages is zero, leading to at most H_k = 1 + 1/k + 1/(k - 1) + ... + 1/2 ≈ ln(k) expected cache misses in p_2.

3 AdWords Problem

A search engine receives a series of keyword searches. For each search, the engine must show a single ad. Ads belong to businesses, which place bids b(k, i) connecting keywords k to ads i. Additionally, each ad i has a daily budget B_i. Whenever an ad i is shown on a search for keyword k, the engine
earns b̂(k, i) = min{b(k, i), B_i} and the same amount is deducted from the ad's budget B_i. The goal is to maximize the engine's revenue, given that the full list of keyword searches arriving over time is unknown.

Observation The offline version of the AdWords problem is NP-hard, and the best known approximation algorithm is a (1 - 1/e)-approximation [1].

3.1 Greedy online algorithm

We define a greedy algorithm as follows: whenever a new keyword k arrives, we assign it to the highest paying ad, i.e. the ad i for which b̂(k, i) is maximum.

3.1.1 Example

Consider an instance with two keywords k_1 and k_2 and two ads 1 and 2, both with unit budgets. The bids between keywords and ads are detailed in Table 3. Table 4 illustrates the execution of the greedy algorithm and Table 5 illustrates the execution of the optimal offline algorithm.

       i = 1             i = 2
k_1    b(k_1, 1) = 0.8   b(k_1, 2) = 0.7
k_2    b(k_2, 1) = 0.4   b(k_2, 2) = 0.1

Table 3: Bids between keywords and ads

Time  Keyword  Ad  Revenue  B_1  B_2
0     -        -   -        1.0  1.0
1     k_1      1   0.8      0.2  1.0
2     k_2      1   0.2      0.0  1.0

Table 4: Greedy algorithm has a revenue of 1.0

Time  Keyword  Ad  Revenue  B_1  B_2
0     -        -   -        1.0  1.0
1     k_1      2   0.7      1.0  0.3
2     k_2      1   0.4      0.6  0.3

Table 5: Optimal offline algorithm has a revenue of 1.1
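The greedy execution of Table 4 can be replayed with a small sketch (the names `greedy_adwords`, `bids` and `budgets` are ours; the budgets dictionary is updated in place):

```python
def greedy_adwords(bids, budgets, keywords):
    """Greedy: assign each arriving keyword to the ad with the largest
    effective bid min(b(k, i), remaining budget of ad i)."""
    revenue = 0.0
    for k in keywords:
        # pick the ad with the largest effective bid for this keyword
        best = max(budgets, key=lambda i: min(bids[k][i], budgets[i]))
        gain = min(bids[k][best], budgets[best])
        budgets[best] -= gain  # charge the ad's budget
        revenue += gain
    return revenue

bids = {"k1": {1: 0.8, 2: 0.7}, "k2": {1: 0.4, 2: 0.1}}
print(greedy_adwords(bids, {1: 1.0, 2: 1.0}, ["k1", "k2"]))  # 1.0
```

Assigning k_1 to ad 2 and k_2 to ad 1 instead yields the optimal revenue 1.1 of Table 5, which greedy cannot reach because it commits to ad 1 as soon as k_1 arrives.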
3.1.2 Competitiveness

Recap In a maximization problem, an algorithm Algo is α-competitive if, for every instance I, Algo(I) ≥ α · OPT(I). The AdWords problem is a maximization problem.

Theorem 3.1. The greedy algorithm is 1/2-competitive.

Proof. (Local-ratio argument.) For the greedy algorithm to be 1/2-competitive, for any instance I we need Greedy(I) ≥ (1/2) · OPT(I), or equivalently OPT(I) ≤ 2 · Greedy(I). We prove this by induction on the size of the instance I. The first step of the greedy algorithm assigns to a keyword k the ad i that maximizes b̂(k, i). Therefore, the revenue Greedy(I) equals b̂(k, i) plus the revenue of the rest of the instance, named I'; more precisely, I' is obtained from I by removing k and reducing B_i by b̂(k, i) units. So we need to show

  OPT(I) ≤ 2 · (b̂(k, i) + Greedy(I')).   (1)

We observe that reducing a budget B_j to B'_j reduces the value of a solution by at most B_j - B'_j. As the budget of ad i was reduced by b̂(k, i) in instance I', the value of the optimal offline algorithm on I' is bounded as follows:

  OPT(I') ≥ OPT(I) - b̂(k, i) - b̂(k, ī),

where ī is the ad assigned to k in OPT(I) (the last term comes from the fact that the keyword k was removed, so OPT may lose the value associated with the assignment it made to this keyword). Moreover, by the greedy choice, b̂(k, ī) is at most b̂(k, i), and so we get the lower bound

  OPT(I') ≥ OPT(I) - 2 b̂(k, i).   (2)

By induction we have OPT(I') ≤ 2 · Greedy(I'). Employing this in the bound (2) above gives the desired inequality (1), which concludes the proof.

This analysis is also tight: Table 7 and Table 8 illustrate with an adversarial instance why the greedy algorithm for the AdWords problem is not better than 1/2-competitive. There are two keywords k_1 and k_2 and two ads 1 and 2 with unit budgets. The bids between keywords and ads are detailed in Table 6.
       i = 1                 i = 2
k_1    b(k_1, 1) = 1.0       b(k_1, 2) = 1.0
k_2    b(k_2, 1) = 1.0 - ε   b(k_2, 2) = 0.0

Table 6: Bids between keywords and ads (ε is a very small positive number)

3.2 Better online algorithms

In [4], two algorithms for the AdWords problem are proposed (based on [3]). The first is deterministic and the second is randomized. Both are (1 - 1/e)-competitive, but only when the budgets are much larger than the bids (the proof is beyond the scope of these lecture notes). Both algorithms depend on a tradeoff function ψ(f) = 1 - e^{-(1-f)} defined in [4].
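For concreteness, ψ can be written as follows (a sketch; the argument f is taken to be the fraction of the budget already spent, and the helper name is ours):

```python
import math

def psi(f):
    """Tradeoff discount of [4] for an ad that has already spent a
    fraction f of its budget."""
    return 1 - math.exp(-(1 - f))

print(psi(0.0))  # 1 - 1/e, about 0.632: a fresh budget gets the most weight
print(psi(1.0))  # 0.0: an exhausted budget is never chosen
```

The function interpolates smoothly between these extremes, so ads with mostly unspent budgets are favored without ever ignoring a high bid entirely.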
Time  Keyword  Ad  Revenue  B_1  B_2
0     -        -   -        1.0  1.0
1     k_1      1   1.0      0.0  1.0
2     k_2      -   0.0      0.0  1.0

Table 7: Greedy algorithm has a revenue of 1.0

Time  Keyword  Ad  Revenue  B_1  B_2
0     -        -   -        1.0  1.0
1     k_1      2   1.0      1.0  0.0
2     k_2      1   1.0 - ε  ε    0.0

Table 8: Optimal offline algorithm has a revenue of 2.0 - ε

3.2.1 Deterministic algorithm

Whenever a keyword k arrives, assign it to the ad i that maximizes b̂(k, i) · ψ(T(i)), where T(i) is the fraction of ad i's budget already spent.

3.2.2 Randomized algorithm

Randomly shuffle the ads. Whenever a keyword k arrives, assign it to the ad i that maximizes b̂(k, i) · ψ(r/n), where r is the shuffled position of ad i and n is the total number of ads.

References

[1] N. Andelman and Y. Mansour. Auctions with budget constraints, pages 26-38. Springer Berlin Heidelberg, 2004.

[2] A. Fiat, R. M. Karp, M. Luby, L. A. McGeoch, D. D. Sleator, and N. E. Young. Competitive paging algorithms. Journal of Algorithms, 12(4):685-699, 1991.

[3] R. M. Karp, U. V. Vazirani, and V. V. Vazirani. An optimal algorithm for on-line bipartite matching. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, STOC '90, pages 352-358, New York, NY, USA, 1990. ACM.

[4] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani. AdWords and generalized on-line matching. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05), pages 264-273, October 2005.