1 K-Lists Anindya Sen and Corrie Scalisi University of California, Santa Cruz June 11, 2007 Anindya Sen and Corrie Scalisi (UCSC) K-Lists 1 / 25
2 Outline
1 Introduction
2 Experts
3 Noise-Free Case: Deterministic Algorithm; Mistake Bound
4 Randomized Algorithm: Efficient Implementation
5 Deterministic Algorithm: Median of Medians; Proportion of Weight
6 Future Work
7 Conclusion
3 Problem
How to combine the heads of k lists into a single prediction.
An online sequence of requests indexes into the lists.
We want to minimize the number of misses (requests for indices that are not part of our prediction).
Assume no two lists have an element in common.
[Figure of the k lists with requested elements marked, omitted from transcription.]
4 Motivation
Applications in caching, ranking, etc.:
combining the elements of a k-level virtual cache into a single real cache;
designing a meta-search engine that combines the rankings of several popular search engines.
5 Two Lists
Only need n experts for the 2-list case.
Easy: partitions can be specified using one index i into list 1; the index into list 2 is just n − i.
Can use known algorithms such as Weighted Median, Weighted Average, etc.
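For the two-list case, a prediction can be read off a weight vector over the candidate cutoff indices. A minimal sketch of a weighted-median pick (my own illustration, not the slides' exact algorithm): return the index at which the cumulative weight first reaches half the total.

```python
def weighted_median(weights):
    """Index at which cumulative weight first reaches half the total."""
    half = sum(weights) / 2
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= half:
            return i
    return len(weights) - 1

# Cutoff i into list 1 (list 2 gets the remaining n - i elements).
print(weighted_median([1, 1, 4, 1, 1]))  # → 2
```

Updating the weights multiplicatively on each miss and re-running the pick gives the usual expert-style prediction loop.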
6 K-Lists
n = cache size, k = number of lists.
A partition is a tuple P = (P_1, ..., P_k) such that Σ_{j=1}^{k} P_j = n.
8 Experts
Two weight schemes:
1. k(n + 1) weights, one per cutoff position per list. For n = 3, k = 3 this gives 12 weights.
2. One weight per partition. For n = 3, k = 3 this gives 10 experts.
Partitions: 003, 012, 021, 030, 102, 111, 120, 201, 210, 300
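The two counts can be verified with a small sketch (mine, not from the slides) that enumerates every partition of n cache pages across k lists and compares against the closed-form count C(n+k−1, k−1):

```python
from math import comb

def partitions(n, k):
    """All k-tuples of nonnegative cutoffs (one per list) summing to n."""
    if k == 1:
        return [(n,)]
    return [(i,) + rest
            for i in range(n + 1)
            for rest in partitions(n - i, k - 1)]

P = partitions(3, 3)
# 10 partition experts, matching C(n+k-1, k-1), vs. k(n+1) = 12 per-gap weights.
print(len(P), comb(3 + 3 - 1, 3 - 1), 3 * (3 + 1))  # → 10 10 12
```

The per-partition scheme stays polynomial only for tiny n and k; the DP on a later slide is what makes the partition experts tractable in general.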
10 Deterministic Algorithm for the Noise-Free Case
m: number of cache pages with unknown status.
At each trial t, the adversary maximizes m while the algorithm minimizes m.
If A splits m evenly amongst the k lists, then any miss in list i reveals m/k pages from the top of list i (in cache) and m/k pages from the bottom of each list j ≠ i (not in cache).
11 Mistake Bound of Deterministic Algorithm
Theorem. Let M be the number of misses suffered by A on a sequence of requests for which there exists a partition with no misses. Then M ≤ 0.7 k log(n).
Proof.
Recurrence: M(n) = M(n − ⌈n/k⌉) + 1 ≤ M(n(1 − 1/k)) + 1
Base case: M(1) = 0
Solving for M(n): M(n) ≤ log(n) / log(k/(k−1)) < k ln(2) log(n) ≈ 0.7 k log(n)
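The recurrence M(n) = M(n − ⌈n/k⌉) + 1 can be checked numerically. This sketch (my own, not from the slides) unrolls it and compares against the 0.7 k log₂(n) bound:

```python
import math

def mistake_bound(n, k):
    """Unroll M(n) = M(n - ceil(n/k)) + 1 with base case M(1) = 0."""
    m = 0
    while n > 1:
        n -= math.ceil(n / k)  # each miss reveals at least ceil(n/k) pages
        m += 1
    return m

# The unrolled count stays below 0.7 * k * log2(n) in every case tried.
for n, k in [(16, 2), (1000, 2), (1024, 4), (100, 10)]:
    assert mistake_bound(n, k) <= 0.7 * k * math.log2(n)
    print(n, k, mistake_bound(n, k))
```

For n = 16, k = 2 the unknown region halves each round (16 → 8 → 4 → 2 → 1), giving exactly 4 misses against a bound of 5.6.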
15 Randomized Algorithm
WMR selects a partition in each round, with equal initial weights assigned to each partition.
WMR implicitly maintains a probability distribution p_t over the set of all partitions:
p_{t,i} = e^{−η Σ_{s=1}^{t−1} L_i^s} / Z
where L_i^s is the loss of the ith expert at trial s and Z is the normalization.
At the tth round, WMR randomly selects a partition based on p_t.
+ Able to get good bounds for the loss of the algorithm w.r.t. the loss of the best in-hindsight partition:
E(M_ALG) ≤ M + √(2 M k ln n) + k ln n
− May need to do a lot of refetching.
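The sampling step can be sketched as follows (my own illustration; `eta` and the cumulative-loss bookkeeping are assumptions, not the slides' exact notation): each partition's weight decays exponentially in its cumulative loss, and a partition is drawn with probability proportional to its weight.

```python
import math
import random

def wmr_sample(partitions, cum_losses, eta=1.0):
    """Draw a partition with probability proportional to exp(-eta * loss)."""
    weights = [math.exp(-eta * L) for L in cum_losses]
    z = sum(weights)
    r = random.random() * z
    acc = 0.0
    for p, w in zip(partitions, weights):
        acc += w
        if acc >= r:
            return p
    return partitions[-1]

# A partition with far smaller cumulative loss is drawn almost surely.
print(wmr_sample([(3, 0, 0), (0, 0, 3)], [0.0, 1000.0]))  # → (3, 0, 0)
```

Done naively over all partitions this takes time exponential in k; the next slide's dynamic program is what makes the sampling efficient.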
16 Efficient Implementation of WMR
S(p, q, r) = number of partial partitions on the first p lists, with q elements and r mistakes.
Recurrence: S(p, q, r) = Σ_{i=0}^{q} S(p − 1, q − i, r − c(p, i))
c(p, i) = total number of misses in the pth list below cutoff i.
Use dynamic programming to fill an n × k × t table for 1 ≤ p ≤ k, 0 ≤ q ≤ n and 1 ≤ r ≤ t. Time complexity: O(n² k t).
Let P_i = S(k, n, i)/Z for 1 ≤ i ≤ t, the proportion of partitions making i mistakes.
Modify WMR to predict i with probability proportional to P_i. A corresponding partition can then be recovered by backtracking.
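A sketch of the counting DP (my construction, not the slides' code): requests are modeled as (list, position) pairs, and c(p, i) counts the requests to list p deeper than cutoff i. Names and the request encoding are illustrative assumptions.

```python
from math import comb

def partitions_by_mistakes(requests, n, k):
    """S-table DP: count partitions of n pages over k lists by total misses.

    requests: list of (list_index, position) with 1-based positions;
    a partition with cutoff i on list p misses every request to p
    whose position exceeds i.
    """
    # c[p][i] = misses charged to list p under cutoff i
    c = [[sum(1 for (l, pos) in requests if l == p and pos > i)
          for i in range(n + 1)]
         for p in range(k)]
    S = {(0, 0): 1}  # (pages used, mistakes) -> count, over a prefix of lists
    for p in range(k):
        nxt = {}
        for (q, r), cnt in S.items():
            for i in range(n - q + 1):  # choose cutoff i for list p
                key = (q + i, r + c[p][i])
                nxt[key] = nxt.get(key, 0) + cnt
        S = nxt
    return {r: cnt for (q, r), cnt in S.items() if q == n}

counts = partitions_by_mistakes([(0, 2)], n=3, k=3)
print(counts)                             # mistake count -> number of partitions
print(sum(counts.values()), comb(5, 2))   # all 10 partitions accounted for
```

With a single request to position 2 of list 0, the 3 partitions with cutoff ≥ 2 on that list make 0 mistakes and the other 7 make 1, consistent with the 10 partitions counted on the Experts slide.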
18 Deterministic Algorithm
How can we balance the use of weights with the need to select only n experts for the real cache?
Can we find a deterministic algorithm that gives a bound of the form aM + b, where M is the loss of the best expert?
19 Median of Medians
Idea: maybe the median of the total partition weight through each gap in each of the k lists corresponds to a cache?
Counterexample below with β = 0.5.
[Table of accesses, partitions, losses, and weights omitted from transcription.]
20 Median of Medians
[Table of accesses and total path weight through each gap omitted from transcription.]
The medians of path weight through the gaps are at (2, 1, 1): 4 cache pages, even though the cache should be of size 3.
21 Selection by Proportion of Weight
Uses the k(n + 1) experts model. When a miss occurs:
penalize the experts above where the loss occurred;
penalize the same number of experts from the bottoms of the other lists.
Predict by taking W/(2k) of the weight from the top of each list, where W is the total weight.
22 Proportion of Weights Counterexample
n = 3, k = 3, β = 0.5, and the k(n + 1) = 12 weights initialized to 1.
After a single miss in list 1 at index 1: W/(2k) = 12/6 = 2.
[Weight table omitted from transcription.]
Taking weight 2 from the top of each list results in the indices (2, 1, 1): 4 cache pages, even though the cache should be of size 3.
24 Connection to the Metrical Task System Problem
Extensively studied in the online algorithms community.
An online algorithm A controls a system with n states located at points in a space with distance metric d.
At time t, A receives a task and a cost vector l, specifying the cost of performing the task in each state.
A tells the system to move from state i to state j and pays cost d_{i,j} + l_j (d_{i,j} = 0 if i = j).
Performance is compared w.r.t. the optimal offline algorithm:
E[cost_A(σ)] ≤ a · cost_OPT(σ) + b
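The cost accounting can be illustrated with a tiny sketch (mine, not from the slides): per round, the algorithm pays the metric distance for its move plus the task cost of the state it ends the round in.

```python
def mts_cost(d, start, moves, cost_vectors):
    """Total cost of a state sequence in a metrical task system:
    d[i][j] to move from state i to j, plus l[j] to process the task in j."""
    total, cur = 0, start
    for j, l in zip(moves, cost_vectors):
        total += d[cur][j] + l[j]
        cur = j
    return total

d = [[0, 1], [1, 0]]                  # two states, unit distance apart
tasks = [[5, 0], [0, 5]]              # each task is cheap in exactly one state
print(mts_cost(d, 0, [1, 0], tasks))  # move, process, move back: 1+0+1+0 = 2
```

In the caching view of the slides, a "state" is a whole cache, so a move can cost up to the diameter D (dump one cache and load another).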
25 Connection to the Metrical Task System Problem
Theorem (Blum & Burch). Given n online algorithms for a problem that can be formulated as a metrical task system of diameter at most D > 0, and given ε < 1/4, the WMR algorithm can combine them such that on any request sequence σ it incurs expected cost at most
(1 + 2ε) L + O(D ln(n) / ε)
where L is the cost of the best of the n algorithms on request sequence σ.
Here D = size of a list (dump the contents of the current cache and load those of another cache).
What if we use the best sequence of experts (rather than a single best expert) as the comparator?
27 Conclusion
The randomized algorithm gives us a good bound but cannot make any guarantees on refetching.
Open Problem: find a deterministic algorithm that produces a valid cache and combines weights such that we get a good bound.
Questions?
Reference: Avrim Blum and Carl Burch. On-line Learning and the Metrical Task System Problem. Machine Learning, 39:35–58, 2000. Kluwer Academic Publishers.
Distributed Machine Learning Maria-Florina Balcan Carnegie Mellon University Distributed Machine Learning Modern applications: massive amounts of data distributed across multiple locations. Distributed
More informationSome Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm
Some Formal Analysis of Rocchio s Similarity-Based Relevance Feedback Algorithm Zhixiang Chen (chen@cs.panam.edu) Department of Computer Science, University of Texas-Pan American, 1201 West University
More informationAlternatives to competitive analysis Georgios D Amanatidis
Alternatives to competitive analysis Georgios D Amanatidis 1 Introduction Competitive analysis allows us to make strong theoretical statements about the performance of an algorithm without making probabilistic
More informationDivide-and-conquer: Order Statistics. Curs: Fall 2017
Divide-and-conquer: Order Statistics Curs: Fall 2017 The divide-and-conquer strategy. 1. Break the problem into smaller subproblems, 2. recursively solve each problem, 3. appropriately combine their answers.
More informationProvable Approximation via Linear Programming
Chapter 7 Provable Approximation via Linear Programming One of the running themes in this course is the notion of approximate solutions. Of course, this notion is tossed around a lot in applied work: whenever
More informationA Different Perspective For Approximating Max Set Packing
Weizmann Institute of Science Thesis for the degree Master of Science Submitted to the Scientific Council of the Weizmann Institute of Science Rehovot, Israel A Different Perspective For Approximating
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationTowards a Theory of Information Flow in the Finitary Process Soup
Towards a Theory of in the Finitary Process Department of Computer Science and Complexity Sciences Center University of California at Davis June 1, 2010 Goals Analyze model of evolutionary self-organization
More informationMulti-Embedding and Path Approximation of Metric Spaces
Multi-Embedding and Path Approximation of Metric Spaces Yair Bartal and Manor Mendel mendelma@cshujiacil The Hebrew University Multi-Embedding and Path Approximation of Metric Spaces p1/4 Metric Embedding
More informationRank minimization via the γ 2 norm
Rank minimization via the γ 2 norm Troy Lee Columbia University Adi Shraibman Weizmann Institute Rank Minimization Problem Consider the following problem min X rank(x) A i, X b i for i = 1,..., k Arises
More information