Design and Analysis of Algorithms


Design and Analysis of Algorithms 6.046J/18.410J LECTURE 7: Skip Lists. Data structure. Randomized insertion. With-high-probability (w.h.p.) bound. 7/10/15 Copyright 2001-8 by Leiserson et al L9.1

Skip lists. A simple randomized dynamic search structure, invented by William Pugh in 1989 and easy to implement. A skip list maintains a dynamic set of n elements in O(lg n) time per operation, both in expectation and with high probability. This is a strong guarantee on the tail of the distribution of T(n): operations take O(lg n) time "almost always".

One linked list. Start from the simplest data structure: a (sorted) linked list. Searches take Θ(n) time in the worst case. How can we speed up searches? [Figure: sorted linked list 14, 23, 34, 42, 50, 59, 66, 72, 79]

Two linked lists. Suppose we had two sorted linked lists (on subsets of the elements). Each element can appear in one or both lists. How can we speed up searches? [Figure: sorted linked list 14, 23, 34, 42, 50, 59, 66, 72, 79]

Two linked lists as a subway. IDEA: Express and local subway lines (à la New York City 7th Avenue Line). The express line connects a few of the stations. The local line connects all stations. The two lines are linked at their common stations. [Figure: express line 14, 34, 42, 72 above local line 14, 23, 34, 42, 50, 59, 66, 72, 79]

Searching in two linked lists. SEARCH(x): Walk right in the top linked list ($L_1$) until going right would go too far. Walk down to the bottom linked list ($L_2$). Walk right in $L_2$ until the element is found (or not). [Figure: $L_1$ = 14, 34, 42, 72 above $L_2$ = 14, 23, 34, 42, 50, 59, 66, 72, 79]
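The SEARCH(x) steps above can be sketched in Python. This is a minimal illustration, not the lecture's code: plain sorted Python lists stand in for the two linked lists, the function name `two_level_search` and the `down` dictionary (which plays the role of the links between lines) are hypothetical, and the step counter is only there to make the cost visible.

```python
def two_level_search(express, local, x):
    """Search for x using an express list (L1) over a local list (L2).

    Assumptions for this sketch: both lists are sorted, every key in
    `express` also appears in `local`, and both start with the same key.
    Returns (found, steps), where steps counts the nodes visited.
    """
    # Stand-in for the down-pointers between the two lists.
    # (Built per call here for simplicity; a real structure stores them.)
    down = {key: idx for idx, key in enumerate(local)}

    # Walk right in L1 until going right would go too far.
    i, steps = 0, 1
    while i + 1 < len(express) and express[i + 1] <= x:
        i += 1
        steps += 1

    # Walk down to L2 at the same key.
    j = down[express[i]]
    steps += 1

    # Walk right in L2 until the element is found (or passed).
    while j + 1 < len(local) and local[j + 1] <= x:
        j += 1
        steps += 1

    return local[j] == x, steps


express = [14, 34, 42, 72]
local = [14, 23, 34, 42, 50, 59, 66, 72, 79]
print(two_level_search(express, local, 59))
```

For SEARCH(59) the walk visits 14, 34, 42 in the express list, drops down at 42, and then visits 50 and 59 in the local list.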

Searching in two linked lists. EXAMPLE: SEARCH(59). The walk right in $L_1$ stops at 42, because the next node would be too far: 59 < 72. [Figure: $L_1$ = 14, 34, 42, 72 above $L_2$ = 14, 23, 34, 42, 50, 59, 66, 72, 79]

Design of two linked lists. QUESTION: Which nodes should be in $L_1$? In a subway, the "popular stations". Here we care about worst-case performance. Best approach: evenly space the nodes in $L_1$. But how many nodes should be in $L_1$? [Figure: $L_1$ = 14, 34, 42, 72 above $L_2$ = 14, 23, 34, 42, 50, 59, 66, 72, 79]

Analysis of two linked lists. ANALYSIS: Search cost is roughly $|L_1| + \frac{|L_2|}{|L_1|}$. This is minimized (up to constant factors) when the two terms are equal: $|L_1|^2 = |L_2| = n \Rightarrow |L_1| = \sqrt{n}$. [Figure: $L_1$ = 14, 34, 42, 72 above $L_2$ = 14, 23, 34, 42, 50, 59, 66, 72, 79]

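Analysis of two linked lists. ANALYSIS: $|L_1| = \sqrt{n}$, $|L_2| = n$. Search cost is roughly $|L_1| + \frac{|L_2|}{|L_1|} = \sqrt{n} + \frac{n}{\sqrt{n}} = 2\sqrt{n}$. [Figure: $L_1$ = 14, 42, 66 above $L_2$ = 14, 23, 34, 42, 50, 59, 66, 72, 79; consecutive $L_1$ nodes are $\sqrt{n}$ apart in $L_2$]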
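The balancing argument can be checked numerically. The sketch below is an illustration only; the function names `search_cost` and `best_l1_size` are hypothetical, and the cost model is exactly the rough estimate $|L_1| + n/|L_1|$ from the slide.

```python
def search_cost(l1_size, n):
    """Rough search cost with |L1| = l1_size express nodes over n elements."""
    return l1_size + n / l1_size


def best_l1_size(n):
    """Brute-force the integer |L1| that minimizes the rough search cost."""
    return min(range(1, n + 1), key=lambda s: search_cost(s, n))


n = 10_000
s = best_l1_size(n)
print(s, search_cost(s, n))  # the minimizer is sqrt(n) = 100, with cost 2*sqrt(n) = 200
```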

More linked lists. What if we had more sorted linked lists? 2 sorted lists ⇒ $2 \cdot \sqrt{n}$. 3 sorted lists ⇒ $3 \cdot \sqrt[3]{n}$. k sorted lists ⇒ $k \cdot \sqrt[k]{n}$. lg n sorted lists ⇒ $\lg n \cdot \sqrt[\lg n]{n} = 2 \lg n$. [Figure: $L_1$ = 14, 42, 66 above $L_2$ = 14, 23, 34, 42, 50, 59, 66, 72, 79]
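To see why lg n lists give a cost of 2 lg n, note that $n^{1/\lg n} = 2$. A small sketch (the helper name `k_list_cost` is hypothetical, and n is taken to be a power of two for convenience) evaluates the rough cost $k \cdot n^{1/k}$ for a few values of k:

```python
import math


def k_list_cost(k, n):
    """Rough search cost k * n**(1/k) with k evenly spaced sorted lists."""
    return k * n ** (1.0 / k)


n = 2 ** 16
for k in (1, 2, 3, 4, int(math.log2(n))):
    print(k, round(k_list_cost(k, n), 2))
# With k = lg n = 16 lists the cost is 16 * 2 = 32 = 2 lg n.
```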

lg n linked lists. lg n sorted linked lists are like a binary tree (in fact, a level-linked B+-tree). [Figure: four lists, top to bottom: 14, 79; 14, 50, 79; 14, 34, 50, 66, 79; 14, 23, 34, 42, 50, 59, 66, 72, 79]

Searching in lg n linked lists. EXAMPLE: SEARCH(72). [Figure: four lists, top to bottom: 14, 79; 14, 50, 79; 14, 34, 50, 66, 79; 14, 23, 34, 42, 50, 59, 66, 72, 79]

Skip lists. The ideal skip list is this lg n linked list structure. The skip list data structure maintains roughly this structure subject to updates (insert/delete). [Figure: four lists, top to bottom: 14, 79; 14, 50, 79; 14, 34, 50, 66, 79; 14, 23, 34, 42, 50, 59, 66, 72, 79]

INSERT(x). To insert an element x into a skip list: SEARCH(x) to see where x fits in the bottom list. Always insert into the bottom list. INVARIANT: The bottom list contains all elements. Insert into some of the lists above. QUESTION: To which other lists should we add x?

INSERT(x). QUESTION: To which other lists should we add x? IDEA: Flip a (fair) coin; if HEADS, promote x to the next level up and flip again. Probability of promotion to the next level = p = 1/2. On average: 1/2 of the elements are promoted 0 levels, 1/4 of the elements are promoted 1 level, 1/8 of the elements are promoted 2 levels, etc. Approximately balanced?
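The coin-flipping rule is easy to simulate. The sketch below is illustrative only: the names `random_promotions` and `promotion_histogram` are hypothetical, and Python's pseudo-random generator stands in for a real coin.

```python
import random
from collections import Counter


def random_promotions(rng, p=0.5):
    """Flip a coin with HEADS probability p until TAILS; count the HEADS."""
    k = 0
    while rng.random() < p:
        k += 1
    return k


def promotion_histogram(n, seed=0, p=0.5):
    """Promotion counts for n simulated insertions (seeded for repeatability)."""
    rng = random.Random(seed)
    return Counter(random_promotions(rng, p) for _ in range(n))


n = 100_000
hist = promotion_histogram(n, seed=1)
for level in range(4):
    # Fractions should be close to 1/2, 1/4, 1/8, 1/16.
    print(level, hist[level] / n)
```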

Example of skip list. EXERCISE: Try building a skip list from scratch by repeated insertion using a real coin. Small change: Add the special value $-\infty$ to every list, so that we can search with the same algorithm. [Figure: example skip list with bottom list 23, 34, 42, 50; 34 and 50 also appear in higher lists]


Skip lists. A skip list is the result of insertions (and deletions) from an initially empty structure (containing just $-\infty$). INSERT(x) uses random coin flips to decide the promotion level. DELETE(x) removes x from all lists containing it. How good are skip lists (speed/balance)? INTUITIVELY: Pretty good on average. Expected time for search: O(lg n).
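The operations described above can be sketched as a small Python class. This is a minimal illustration under stated assumptions, not the lecture's implementation: the class and method names (`SkipList`, `_Node`, `_predecessors`, and so on) are hypothetical, each node stores an array of forward pointers (one per list it belongs to) instead of separate per-level nodes, keys are assumed to be distinct and mutually comparable, and a sentinel head node plays the role of $-\infty$.

```python
import random


class _Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i]: successor in list i (0 = bottom)


class SkipList:
    def __init__(self, p=0.5, seed=None):
        self._p = p
        self._rng = random.Random(seed)
        self._head = _Node(float("-inf"), 1)  # sentinel, present in every list

    def _random_height(self):
        # One list for the bottom level, plus one per HEADS before the first TAILS.
        h = 1
        while self._rng.random() < self._p:
            h += 1
        return h

    def _predecessors(self, key):
        """Per level, the last node whose key is < key (walk right, then down)."""
        levels = len(self._head.next)
        preds = [self._head] * levels
        node = self._head
        for lvl in range(levels - 1, -1, -1):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            preds[lvl] = node
        return preds

    def search(self, key):
        cand = self._predecessors(key)[0].next[0]
        return cand is not None and cand.key == key

    def insert(self, key):
        preds = self._predecessors(key)
        cand = preds[0].next[0]
        if cand is not None and cand.key == key:
            return  # already present; keys stay distinct
        h = self._random_height()
        # Grow the sentinel if the new element is promoted above the top list.
        while len(self._head.next) < h:
            self._head.next.append(None)
            preds.append(self._head)
        node = _Node(key, h)
        for lvl in range(h):  # splice into the bottom list and h - 1 lists above
            node.next[lvl] = preds[lvl].next[lvl]
            preds[lvl].next[lvl] = node

    def delete(self, key):
        preds = self._predecessors(key)
        target = preds[0].next[0]
        if target is None or target.key != key:
            return False
        for lvl in range(len(target.next)):  # unlink from every list containing it
            preds[lvl].next[lvl] = target.next[lvl]
        return True

    def keys(self):
        out, node = [], self._head.next[0]
        while node is not None:
            out.append(node.key)
            node = node.next[0]
        return out


sl = SkipList(seed=42)
for k in [50, 14, 79, 34, 23, 66, 42, 72, 59]:
    sl.insert(k)
print(sl.keys(), sl.search(59), sl.search(60))
```

The bottom list always contains every element, so `keys()` only needs to walk level 0.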

Expected time for SEARCH. The search for a target begins with the head element in the top list. Proceed horizontally until the current element is greater than or equal to the target. If the current element is equal to the target, it has been found. If the current element is greater than the target, go back to the previous element, drop down vertically to the next lower list, and repeat the procedure. The expected number of steps in each linked list is seen to be 1/p, by tracing the search path backwards from the target until reaching an element that appears in the next higher list. The total expected cost of a search is $O(\log_{1/p} n) \cdot (1/p)$, which is O(lg n) when p is a constant.
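The 1/p figure can be made explicit with a short worked calculation. It assumes, as in the backward-tracing argument above, that each node met on the backward walk appears in the next higher list independently with probability p, so the number of nodes examined in one list is geometrically distributed:

```latex
\begin{align*}
\mathbb{E}[\text{steps in one list}]
  &= \sum_{j=1}^{\infty} j\,(1-p)^{j-1}\,p \;=\; \frac{1}{p}, \\
\mathbb{E}[\text{total search cost}]
  &= O\!\left(\frac{1}{p}\,\log_{1/p} n\right)
   \;=\; O(\lg n) \quad \text{for constant } p, \\
\text{e.g. } p = \tfrac{1}{2}: \quad
  \frac{1}{p}\,\log_{1/p} n &= 2 \lg n .
\end{align*}
```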

With-high-probability theorem. THEOREM: With high probability, every search in an n-element skip list costs O(lg n).

With-high-probability theorem. INFORMALLY: Event E occurs with high probability (w.h.p.) if, for any $\alpha \ge 1$, there is an appropriate choice of constants for which E occurs with probability at least $1 - O(1/n^{\alpha})$. In fact, the constant in O(lg n) depends on α. FORMALLY: Parameterized event $E_{\alpha}$ occurs with high probability if, for any $\alpha \ge 1$, there is an appropriate choice of constants for which $E_{\alpha}$ occurs with probability at least $1 - c_{\alpha}/n^{\alpha}$.

With-high-probability theorem. IDEA: We can make the error probability $O(1/n^{\alpha})$ very small by setting α large, e.g., 100. Almost certainly, the bound remains true for the entire execution of a polynomial-time algorithm.

Boole's inequality / union bound. Recall BOOLE'S INEQUALITY / UNION BOUND: For any random events $E_1, E_2, \ldots, E_k$, $\Pr\{E_1 \cup E_2 \cup \cdots \cup E_k\} \le \Pr\{E_1\} + \Pr\{E_2\} + \cdots + \Pr\{E_k\}$. Application to with-high-probability events: If $k = n^{O(1)}$, and each $E_i$ occurs with high probability, then so does $E_1 \cap E_2 \cap \cdots \cap E_k$.
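The application follows by applying the union bound to the complementary (failure) events. In the worked step below, the constant b with $k \le n^{b}$ and the bound $\Pr\{\overline{E_i}\} \le c_{\alpha}/n^{\alpha}$ are notation introduced here for illustration:

```latex
\begin{align*}
\Pr\{\overline{E_1 \cap E_2 \cap \cdots \cap E_k}\}
  &= \Pr\{\overline{E_1} \cup \overline{E_2} \cup \cdots \cup \overline{E_k}\} \\
  &\le \sum_{i=1}^{k} \Pr\{\overline{E_i}\}
   \;\le\; n^{b} \cdot \frac{c_{\alpha}}{n^{\alpha}}
   \;=\; \frac{c_{\alpha}}{n^{\alpha - b}} .
\end{align*}
```

To reach a target exponent $\alpha'$ for the intersection, it suffices to choose $\alpha = \alpha' + b$ for the individual events.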

Analysis warmup. LEMMA: An n-element skip list has O(lg n) expected number of levels. PROOF: The probability that x has been promoted once is p. The probability that x has been promoted k times is $p^k$. The expected number of promotions is $\sum_{i=0}^{\infty} i \cdot p^i$ = O(log … Error probability for having at most c lg n levels $= \Pr\{\text{more than } c \lg n \text{ levels}\} \le n \cdot \Pr\{\text{element } x \text{ promoted at least } c \lg n \text{ times}\}$ (by Boole's inequality) $= n \cdot (1/2^{c \lg n}) = n \cdot (1/n^{c}) = 1/n^{c-1}$.

Analysis warmup. LEMMA: With high probability, an n-element skip list has O(lg n) levels. PROOF: Error probability for having at most c lg n levels $\le 1/n^{c-1}$. This probability is polynomially small, i.e., at most $1/n^{\alpha}$ for $\alpha = c - 1$. We can make α arbitrarily large by choosing the constant c in the O(lg n) bound accordingly.
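The lemma can be probed empirically. The sketch below is a rough illustration with hypothetical names (`max_promotions`, `fraction_exceeding`); it simulates only the coin flips of n insertions with p = 1/2 and reports how often some element is promoted at least c lg n times, which the proof bounds by $1/n^{c-1}$.

```python
import math
import random


def max_promotions(n, rng, p=0.5):
    """Largest promotion count among n simulated insertions."""
    top = 0
    for _ in range(n):
        k = 0
        while rng.random() < p:
            k += 1
        top = max(top, k)
    return top


def fraction_exceeding(n, c, trials, seed=0):
    """Fraction of trials in which some element is promoted >= c lg n times."""
    rng = random.Random(seed)
    bound = c * math.log2(n)
    bad = sum(max_promotions(n, rng) >= bound for _ in range(trials))
    return bad / trials


n = 1024
for c in (1, 2, 3):
    # Compare the observed failure rate with the bound 1 / n**(c - 1).
    print(c, fraction_exceeding(n, c, trials=200, seed=1), 1 / n ** (c - 1))
```

For c = 1 the bound is trivial (it equals 1), which is why the interesting cases start at c = 2.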

Proof of theorem. THEOREM: Every search in an n-element skip list costs O(lg n) expected time. COOL IDEA: Analyze the search backwards, leaf to root. The search starts [ends] at a leaf (a node in the bottom level). At each node visited: if the node wasn't promoted higher (got TAILS here), then we go [came from] left; if the node was promoted higher (got HEADS here), then we go [came from] up. The search stops [starts] at the root (or $-\infty$).

Proof of theorem. THEOREM: With high probability, every search in an n-element skip list costs O(lg n). COOL IDEA: Analyze the search backwards, leaf to root. PROOF: The search makes "up" and "left" moves until it reaches the root (or $-\infty$). The number of "up" moves is less than the number of levels, which is O(lg n) both in expectation and w.h.p. (Lemma). Therefore, w.h.p., the number of moves is at most the number of times we need to flip a coin to get c lg n HEADS.

Chernoff bounds. THEOREM (CHERNOFF): Let Y be a random variable representing the total number of heads (tails) in a series of m independent coin flips, where each flip has a probability p of coming up heads (tails). Then, for all r > 0, $\Pr[Y \ge E[Y] + r] \le e^{-2r^2/m}$.
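The bound can be compared against the exact binomial tail for small m. This is a sanity-check sketch only; the function names `binomial_upper_tail` and `chernoff_bound` are hypothetical, and the exact tail is computed by direct summation with `math.comb`.

```python
import math


def binomial_upper_tail(m, p, threshold):
    """Exact Pr[Y >= threshold] for Y ~ Binomial(m, p)."""
    start = max(0, math.ceil(threshold))
    return sum(
        math.comb(m, k) * p ** k * (1 - p) ** (m - k)
        for k in range(start, m + 1)
    )


def chernoff_bound(m, r):
    """The bound e^(-2 r^2 / m) on Pr[Y >= E[Y] + r]."""
    return math.exp(-2 * r * r / m)


m, p = 100, 0.5
for r in (5, 10, 20):
    exact = binomial_upper_tail(m, p, m * p + r)
    print(r, exact, chernoff_bound(m, r))  # the exact tail stays below the bound
```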

Lemma. LEMMA: For any c there is a constant d such that w.h.p. the number of heads in flipping d lg n fair coins is at least c lg n. PROOF: Let Y be the number of tails when flipping a fair coin d lg n times. Here p = 1/2 and m = d lg n, so $E[Y] = \frac{1}{2} m = \frac{1}{2} d \lg n$. We want to bound the probability of $\le c \lg n$ heads = probability of $\ge d \lg n - c \lg n$ tails.

Lemma proof (contd.). $\Pr[Y \ge (d - c) \lg n] = \Pr[Y \ge E[Y] + (\frac{1}{2} d - c) \lg n]$. Choose $d = 8c \Rightarrow r = 3c \lg n$ and $m = 8c \lg n$. By Chernoff, the probability of $\le c \lg n$ heads is $\le e^{-2r^2/m} = e^{-2(3c \lg n)^2/(8c \lg n)} = e^{-\frac{9}{4} c \lg n} \le e^{-c \lg n} \le 2^{-c \lg n}$ (since e > 2) $= 1/n^{c}$.
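The chain of inequalities can be checked on concrete numbers. The sketch below is illustrative: it takes d = 8c (the choice consistent with m = 8c lg n in the exponent), uses n = 1024 so that lg n is an integer, and computes the exact probability of at most c lg n heads; the function name `prob_at_most_heads` is hypothetical.

```python
import math


def prob_at_most_heads(m, h):
    """Exact Pr[at most h heads in m fair coin flips]."""
    return sum(math.comb(m, k) for k in range(h + 1)) / 2 ** m


n = 1024
lg_n = int(math.log2(n))
for c in (1, 2, 3):
    m = 8 * c * lg_n                      # d = 8c flips per lg n
    r = 3 * c * lg_n                      # r = (d/2 - c) lg n
    exact = prob_at_most_heads(m, c * lg_n)
    chernoff = math.exp(-2 * r * r / m)   # equals e^(-(9/4) c lg n)
    # Expect: exact <= chernoff <= 1 / n**c
    print(c, exact, chernoff, 1 / n ** c)
```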

Proof of theorem (finally!). THEOREM: With high probability, every search in an n-element skip list costs O(lg n). Event A: the number of levels is $\le c \lg n$ w.h.p. Event B: the number of moves until c lg n "up" moves is $\le d \lg n$ w.h.p. A and B are not independent! We want to show that A & B occurs w.h.p. to prove the theorem: $\Pr(\overline{A \cap B}) = \Pr(\overline{A} \cup \overline{B}) \le \Pr(\overline{A}) + \Pr(\overline{B})$ (union bound) $\le 1/n^{c-1} + 1/n^{c} = O(1/n^{c-1})$.

MIT OpenCourseWare http://ocw.mit.edu 6.046J / 18.410J Design and Analysis of Algorithms Spring 2015 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.