Introduction to Algorithms


Introduction to Algorithms, 6.046J/18.410J, Lecture 7. Prof. Piotr Indyk

Data structures
Role of data structures: encapsulate data and support certain operations (e.g., INSERT, DELETE, SEARCH). Our focus: efficiency of the operations. Algorithms vs. data structures.
Introduction to Algorithms, February 27, 2003, L7.2

Symbol-table problem
Symbol table T holding n records: each record x has a key, key[x], and other fields containing satellite data. Operations on T: INSERT(T, x), DELETE(T, x), SEARCH(T, k). How should the data structure T be organized?

Direct-access table
IDEA: Suppose that the set of keys is $K \subseteq \{0, 1, \ldots, m-1\}$, and keys are distinct. Set up an array $T[0 \,..\, m-1]$:
$$T[k] = \begin{cases} x & \text{if } k \in K \text{ and } key[x] = k, \\ \text{NIL} & \text{otherwise.} \end{cases}$$
Then, operations take $\Theta(1)$ time.
Problem: The range of keys can be large: 64-bit numbers (which represent 18,446,744,073,709,551,616 different keys), character strings (even larger!).
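A minimal sketch of the direct-access idea, assuming distinct integer keys in {0, ..., m-1}. The class name `DirectAccessTable` and the use of `None` for NIL are illustrative choices, not part of the lecture.

```python
class DirectAccessTable:
    """Direct-access table for distinct integer keys in {0, ..., m-1}."""

    def __init__(self, m):
        # One array cell per possible key; None plays the role of NIL.
        self.slots = [None] * m

    def insert(self, key, record):
        self.slots[key] = record      # Theta(1): index directly by the key

    def delete(self, key):
        self.slots[key] = None        # Theta(1)

    def search(self, key):
        return self.slots[key]        # Theta(1); None means "not present"


table = DirectAccessTable(8)
table.insert(3, "record with key 3")
print(table.search(3))  # record with key 3
print(table.search(5))  # None
```

The array needs one cell per possible key, which is exactly why the approach breaks down for 64-bit keys or strings.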

Hash functions
Solution: Use a hash function h to map the universe U of all keys into the slots $\{0, 1, \ldots, m-1\}$ of table T. As each key is inserted, h maps it to a slot of T. When a record to be inserted maps to an already occupied slot in T, a collision occurs.
[Figure: keys $k_1, \ldots, k_5$ from $K \subseteq U$ mapped by h to slots 0 through m-1 of T, with $h(k_2) = h(k_5)$ colliding.]

Resolving collisions by chaining
Records in the same slot are linked into a list.
[Figure: slot i of T points to the list 49, 86, 52, where h(49) = h(86) = h(52) = i.]
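Chaining can be sketched as follows. This is only an illustration: the class name `ChainedHashTable`, the use of Python lists as chains, and the choice h(k) = k mod m are assumptions, not the lecture's code.

```python
class ChainedHashTable:
    """Hash table with collisions resolved by chaining."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain (list) per slot

    def _h(self, key):
        return key % self.m  # assumed hash function (division method)

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, v) for k, v in self.slots[idx] if k != key]


t = ChainedHashTable(7)
for key in (10, 17, 24):       # all three keys are congruent to 3 mod 7
    t.insert(key, str(key))
print(len(t.slots[3]))         # 3: the colliding records share one chain
```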

Analysis of chaining
We make the assumption of simple uniform hashing: each key $k \in K$ is equally likely to be hashed to any slot of table T, independent of where other keys are hashed. Let n be the number of keys in the table, and let m be the number of slots. Define the load factor of T to be $\alpha = n/m$ = average number of keys per slot.

Search cost
Expected time to search for a record with a given key = $\Theta(1 + \alpha)$, where the 1 accounts for applying the hash function and accessing the slot, and the $\alpha$ accounts for searching the list. Expected search time = $\Theta(1)$ if $\alpha = O(1)$, or equivalently, if $n = O(m)$.

Choosing a hash function
The assumption of simple uniform hashing is hard to guarantee, but several common techniques tend to work well in practice as long as their deficiencies can be avoided. Desiderata: A good hash function should distribute the keys uniformly into the slots of the table. Regularity in the key distribution should not affect this uniformity.

Division method
Assume all keys are integers, and define $h(k) = k \bmod m$.
Deficiency: Don't pick an m that has a small divisor d. A preponderance of keys that are congruent modulo d can adversely affect uniformity.
Extreme deficiency: If $m = 2^r$, then the hash doesn't even depend on all the bits of k: if $k = 1011000111011010_2$ and $r = 6$, then $h(k) = 011010_2$.

Division method (continued)
$h(k) = k \bmod m$. Pick m to be a prime not too close to a power of 2 or 10 and not otherwise used prominently in the computing environment. Annoyance: Sometimes, making the table size a prime is inconvenient. But this method is popular, although the next method we'll see is usually superior.
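The two deficiencies of the division method are easy to see numerically. The sketch below is illustrative only; the helper name `h_division` and the sample keys and table sizes are assumptions chosen for the demonstration.

```python
def h_division(k, m):
    """Division-method hash: h(k) = k mod m."""
    return k % m


# Extreme deficiency: with m = 2^6, only the low 6 bits of k matter.
k = 0b1011000111011010
low_bits_only = k & 0b111111           # keep just the low 6 bits of k
print(h_division(k, 64), h_division(low_bits_only, 64))  # 26 26
print(h_division(k, 61), h_division(low_bits_only, 61))  # 24 26

# Small divisor: keys that are all multiples of 4 use only 3 of 12 slots,
# but spread over every slot when m is the prime 13.
keys = range(0, 400, 4)
print(sorted({h_division(key, 12) for key in keys}))     # [0, 4, 8]
print(len({h_division(key, 13) for key in keys}))        # 13
```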

Multiplication method
Assume that all keys are integers, $m = 2^r$, and our computer has w-bit words. Define $h(k) = (A \cdot k \bmod 2^w) \;\mathrm{rsh}\; (w - r)$, where rsh is the bit-wise right-shift operator and A is an odd integer in the range $2^{w-1} < A < 2^w$. Don't pick A too close to $2^w$. Multiplication modulo $2^w$ is fast. The rsh operator is fast.

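Multiplication method example
$h(k) = (A \cdot k \bmod 2^w) \;\mathrm{rsh}\; (w - r)$. Suppose that $m = 8 = 2^3$ and that our computer has $w = 7$-bit words:
$A = 1011001_2$, $k = 1101011_2$, $A \cdot k = 10010100110011_2$, so the low 7 bits of the product are $0110011_2$ and $h(k) = 011_2$.
[Figure: modular wheel with positions 0 through 7, showing where A, 2A, 3A land.]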
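The multiplication method translates almost directly into integer arithmetic. In this sketch the function name `h_multiplication` is an assumption, and the sample values of A and k are the 7-bit values this example appears to use; treat them as illustrative.

```python
def h_multiplication(k, A, w, r):
    """h(k) = (A*k mod 2^w) rsh (w - r), for a table of size m = 2^r."""
    product_low = (A * k) % (1 << w)   # keep only the low w bits
    return product_low >> (w - r)      # take the top r of those w bits


A = 0b1011001   # odd, with 2^(w-1) < A < 2^w for w = 7
k = 0b1101011
print(bin(A * k))                      # 0b10010100110011
print(h_multiplication(k, A, 7, 3))    # 3, i.e. 011 in binary
```

On real hardware the `% (1 << w)` step is free when w is the machine word size, since the multiplication simply overflows.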

Dot-product method
Randomized strategy: Let m be prime. Decompose key k into r + 1 digits, each with value in the set $\{0, 1, \ldots, m-1\}$. That is, let $k = \langle k_0, k_1, \ldots, k_r \rangle$, where $0 \le k_i < m$. Pick $a = \langle a_0, a_1, \ldots, a_r \rangle$, where each $a_i$ is chosen randomly from $\{0, 1, \ldots, m-1\}$. Define
$$h_a(k) = \sum_{i=0}^{r} a_i k_i \bmod m.$$
Excellent in practice, but expensive to compute.

A weakness of hashing as we saw it
Problem: For any hash function h, a set of keys exists that can cause the average access time of a hash table to skyrocket. An adversary can pick all keys from $\{k \in U : h(k) = i\}$ for some slot i.
IDEA: Choose the hash function at random, independently of the keys. Even if an adversary can see your code, he or she cannot find a bad set of keys, since he or she doesn't know exactly which hash function will be chosen.

Universal hashing
Definition. Let U be a universe of keys, and let H be a finite collection of hash functions, each mapping U to $\{0, 1, \ldots, m-1\}$. We say H is universal if for all $x, y \in U$, where $x \ne y$, we have $|\{h \in H : h(x) = h(y)\}| = |H|/m$. That is, the chance of a collision between x and y is 1/m if we choose h randomly from H.
[Figure: the set H, with the subset $\{h : h(x) = h(y)\}$ of size $|H|/m$ marked.]

Universality is good
Theorem. Let h be a hash function chosen (uniformly) at random from a universal set H of hash functions. Suppose h is used to hash n arbitrary keys into the m slots of a table T. Then, for a given key x, we have E[#collisions with x] < n/m.

Proof of theorem
Proof. Let $C_x$ be the random variable denoting the total number of collisions of keys in T with x, and let
$$c_{xy} = \begin{cases} 1 & \text{if } h(x) = h(y), \\ 0 & \text{otherwise.} \end{cases}$$
Note: $E[c_{xy}] = 1/m$ and $C_x = \sum_{y \in T - \{x\}} c_{xy}$.

Proof (continued)
$$\begin{aligned}
E[C_x] &= E\Big[\sum_{y \in T - \{x\}} c_{xy}\Big] && \text{(take expectation of both sides)} \\
&= \sum_{y \in T - \{x\}} E[c_{xy}] && \text{(linearity of expectation)} \\
&= \sum_{y \in T - \{x\}} \frac{1}{m} && (E[c_{xy}] = 1/m) \\
&= \frac{n-1}{m} && \text{(algebra)}
\end{aligned}$$

Constructing a set of universal hash functions
Let m be prime. Decompose key k into r + 1 digits, each with value in the set $\{0, 1, \ldots, m-1\}$. That is, let $k = \langle k_0, k_1, \ldots, k_r \rangle$, where $0 \le k_i < m$.
Randomized strategy: Pick $a = \langle a_0, a_1, \ldots, a_r \rangle$, where each $a_i$ is chosen randomly from $\{0, 1, \ldots, m-1\}$. Define the dot product, modulo m:
$$h_a(k) = \sum_{i=0}^{r} a_i k_i \bmod m.$$
How big is $H = \{h_a\}$? $|H| = m^{r+1}$. REMEMBER THIS!
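The construction can be sketched in a few lines. The helper names `digits`, `random_a`, and `h_a`, and the decision to read the key's digits least significant first, are assumptions made for illustration; the slide does not fix a digit order.

```python
import random


def digits(k, m, r):
    """Split integer k into r + 1 base-m digits, least significant first."""
    out = []
    for _ in range(r + 1):
        out.append(k % m)
        k //= m
    if k:
        raise ValueError("key does not fit in r + 1 base-m digits")
    return out


def random_a(m, r, rng=random):
    """Pick a = <a_0, ..., a_r> with each a_i uniform in {0, ..., m-1}."""
    return [rng.randrange(m) for _ in range(r + 1)]


def h_a(a, k, m):
    """Dot-product hash: sum of a_i * k_i, taken modulo the prime m."""
    ks = digits(k, m, len(a) - 1)
    return sum(ai * ki for ai, ki in zip(a, ks)) % m


m, r = 7, 2
a = random_a(m, r)
print(a, h_a(a, 162, m))   # the slot depends on the random choice of a
```

Each distinct vector a gives one function, so the family has m^(r+1) members.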

Universality of dot-product hash functions
Theorem. The set $H = \{h_a\}$ is universal.
Proof. Suppose that $x = \langle x_0, x_1, \ldots, x_r \rangle$ and $y = \langle y_0, y_1, \ldots, y_r \rangle$ are distinct keys. Thus, they differ in at least one digit position, wlog position 0. For how many $h_a \in H$ do x and y collide? A collision means $h_a(x) = h_a(y)$, that is,
$$\sum_{i=0}^{r} a_i x_i \equiv \sum_{i=0}^{r} a_i y_i \pmod{m}.$$

Proof (continued)
Equivalently, we have
$$\sum_{i=0}^{r} a_i (x_i - y_i) \equiv 0 \pmod{m}$$
or
$$a_0 (x_0 - y_0) + \sum_{i=1}^{r} a_i (x_i - y_i) \equiv 0 \pmod{m},$$
which implies that
$$a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod{m}.$$

Fact from number theory
Theorem. Let m be prime. For any $z \in \mathbb{Z}_m$ such that $z \ne 0$, there exists a unique $z^{-1} \in \mathbb{Z}_m$ such that $z \cdot z^{-1} \equiv 1 \pmod{m}$.
Example: m = 7.
z:       1  2  3  4  5  6
z^{-1}:  1  4  5  2  3  6
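The table of inverses can be reproduced with a short computation. This sketch computes each inverse as z^(m-2) mod m, which works for prime m by Fermat's little theorem; that technique and the function name `inverse_mod_prime` are additions for illustration, not something the slide prescribes.

```python
def inverse_mod_prime(z, m):
    """Inverse of z modulo a prime m, via z^(m-2) mod m (Fermat)."""
    if z % m == 0:
        raise ValueError("0 has no multiplicative inverse")
    return pow(z, m - 2, m)


m = 7
inverses = [inverse_mod_prime(z, m) for z in range(1, m)]
print(inverses)   # [1, 4, 5, 2, 3, 6]
```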

Back to the proof
We have
$$a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod{m},$$
and since $x_0 \ne y_0$, an inverse $(x_0 - y_0)^{-1}$ must exist, which implies that
$$a_0 \equiv \Big(-\sum_{i=1}^{r} a_i (x_i - y_i)\Big) \cdot (x_0 - y_0)^{-1} \pmod{m}.$$
Thus, for any choices of $a_1, a_2, \ldots, a_r$, exactly one choice of $a_0$ causes x and y to collide.

Proof (completed)
Q. How many $h_a$'s cause x and y to collide?
A. There are m choices for each of $a_1, a_2, \ldots, a_r$, but once these are chosen, exactly one choice for $a_0$ causes x and y to collide, namely
$$a_0 = \Big(\Big(-\sum_{i=1}^{r} a_i (x_i - y_i)\Big) \cdot (x_0 - y_0)^{-1}\Big) \bmod m.$$
Thus, the number of $h_a$'s that cause x and y to collide is $m^r \cdot 1 = m^r = |H|/m$.
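For small parameters the counting argument can be checked by brute force. The sketch below enumerates every vector a for a prime m and counts how many make two given keys collide; the function name `count_colliding` and the parameters m = 5, r = 1 are illustrative assumptions.

```python
from itertools import combinations, product


def count_colliding(x, y, m):
    """Number of vectors a for which keys x and y (digit tuples) collide."""
    return sum(
        1
        for a in product(range(m), repeat=len(x))
        if sum(ai * (xi - yi) for ai, xi, yi in zip(a, x, y)) % m == 0
    )


m, r = 5, 1
keys = list(product(range(m), repeat=r + 1))      # every (r+1)-digit key
counts = {count_colliding(x, y, m) for x, y in combinations(keys, 2)}
print(counts)   # {5}: every distinct pair collides under exactly m^r functions
```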

Perfect hashing
Given a set of n keys, construct a static hash table of size $m = O(n)$ such that SEARCH takes $\Theta(1)$ time in the worst case.
IDEA: Two-level scheme with universal hashing at both levels. No collisions at level 2!
[Figure: two-level table. Each occupied slot of the level-1 table T stores the parameters m and a of a level-2 table $S_i$, which holds that slot's keys without collisions.]

Collisions at level 2
Theorem. Let H be a class of universal hash functions for a table of size $m = n^2$. Then, if we use a random $h \in H$ to hash n keys into the table, the expected number of collisions is at most 1/2.
Proof. By the definition of universality, the probability that 2 given keys in the table collide under h is $1/m = 1/n^2$. Since there are $\binom{n}{2}$ pairs of keys that can possibly collide, the expected number of collisions is
$$\binom{n}{2} \cdot \frac{1}{n^2} = \frac{n(n-1)}{2} \cdot \frac{1}{n^2} < \frac{1}{2}.$$

No collisions at level 2
Corollary. The probability of no collisions is at least 1/2.
Proof. Markov's inequality says that for any nonnegative random variable X, we have $\Pr\{X \ge t\} \le E[X]/t$. Applying this inequality with t = 1, we find that the probability of 1 or more collisions is at most 1/2. Thus, just by testing random hash functions in H, we'll quickly find one that works.

Analysis of storage
For the level-1 hash table T, choose m = n, and let $n_i$ be the random variable for the number of keys that hash to slot i in T. By using $n_i^2$ slots for the level-2 hash table $S_i$, the expected total storage required for the two-level scheme is therefore
$$E\Big[\sum_{i=0}^{m-1} \Theta(n_i^2)\Big] = \Theta(n),$$
since the analysis is identical to the analysis from recitation of the expected running time of bucket sort. (For a probability bound, apply Markov.)
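The two-level scheme can be sketched as below. Note the swapped-in hash family: for brevity this sketch uses the family ((a*k + b) mod p) mod m rather than the dot-product family from the lecture, and it assumes distinct nonnegative integer keys smaller than the prime `P`. All names (`P`, `random_hash`, `build_perfect_table`, `search`) are hypothetical, and the key set must be nonempty.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; assumed larger than every key


def random_hash(m, rng):
    """Random member of the assumed family h(k) = ((a*k + b) mod P) mod m."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda k: ((a * k + b) % P) % m


def build_perfect_table(keys, rng):
    """Level 1 has n slots; slot i gets a level-2 table with n_i^2 slots."""
    n = len(keys)
    h1 = random_hash(n, rng)
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[h1(k)].append(k)
    level2 = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:  # retry random functions until level 2 is collision-free
            h2 = random_hash(size, rng) if size else None
            slots = [None] * size
            collided = False
            for k in bucket:
                j = h2(k)
                if slots[j] is not None:
                    collided = True
                    break
                slots[j] = k
            if not collided:
                break
        level2.append((h2, slots))
    return h1, level2


def search(table, k):
    """Worst-case constant time: one level-1 hash, one level-2 hash."""
    h1, level2 = table
    h2, slots = level2[h1(k)]
    return bool(slots) and slots[h2(k)] == k


keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
table = build_perfect_table(keys, random.Random(1))
print(all(search(table, k) for k in keys), search(table, 11))  # True False
```

Each retry succeeds with probability at least 1/2 by the corollary, so the inner loop is expected to run only a couple of times per slot. The sketch does not retry the level-1 function, so its total storage is only O(n) in expectation.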

Resolving collisions by open addressing
No storage is used outside of the hash table itself. Insertion systematically probes the table until an empty slot is found. The hash function depends on both the key and probe number:
$$h : U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}.$$
The probe sequence $\langle h(k,0), h(k,1), \ldots, h(k,m-1) \rangle$ should be a permutation of $\{0, 1, \ldots, m-1\}$. The table may fill up, and deletion is difficult (but not impossible).

Example of open addressing
Insert key k = 496:
0. Probe h(496,0): collision.
1. Probe h(496,1): collision.
2. Probe h(496,2): the slot is empty, so 496 is inserted there.
[Figure: table T with slots 0 through m-1, several already occupied; successive probes are marked "collision" until the insertion succeeds.]

Example of open addressing
Search for key k = 496:
0. Probe h(496,0).
1. Probe h(496,1).
2. Probe h(496,2).
Search uses the same probe sequence, terminating successfully if it finds the key and unsuccessfully if it encounters an empty slot.

Probing strategies
Linear probing: Given an ordinary hash function $h'(k)$, linear probing uses the hash function $h(k,i) = (h'(k) + i) \bmod m$. This method, though simple, suffers from primary clustering, where long runs of occupied slots build up, increasing the average search time. Moreover, the long runs of occupied slots tend to get longer.
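Insertion and search under open addressing with linear probing might look like the following sketch. The class name `LinearProbingTable` and the choice h'(k) = k mod m are assumptions; deletion is omitted because, as noted earlier, it is the difficult operation.

```python
class LinearProbingTable:
    """Open-addressed table using h(k, i) = (h'(k) + i) mod m."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m   # None marks an empty slot

    def _probe(self, k, i):
        return (k % self.m + i) % self.m   # assumed h'(k) = k mod m

    def insert(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.slots[j] is None or self.slots[j] == k:
                self.slots[j] = k
                return j                   # slot where k now lives
        raise OverflowError("hash table is full")

    def search(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.slots[j] is None:      # empty slot: k cannot be present
                return None
            if self.slots[j] == k:
                return j
        return None


t = LinearProbingTable(8)
print([t.insert(k) for k in (3, 11, 19)])  # [3, 4, 5]: a run builds up
print(t.search(19), t.search(27))          # 5 None
```

The three keys all hash to slot 3 and end up in adjacent slots, a small instance of primary clustering.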

Probing strategies
Double hashing: Given two ordinary hash functions $h_1(k)$ and $h_2(k)$, double hashing uses the hash function $h(k,i) = (h_1(k) + i \cdot h_2(k)) \bmod m$. This method generally produces excellent results, but $h_2(k)$ must be relatively prime to m. One way is to make m a power of 2 and design $h_2(k)$ to produce only odd numbers.
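A quick check of why the step size must be relatively prime to m. The particular functions h1(k) = k mod m and h2(k) = 2*(k // m) + 1 are hypothetical choices made only so that h2 is always odd; the function name `probe_sequence` is likewise an assumption.

```python
def probe_sequence(k, m):
    """Double-hashing probe sequence h(k, i) = (h1(k) + i*h2(k)) mod m."""
    h1 = k % m
    h2 = 2 * (k // m) + 1          # assumed second hash: always odd
    return [(h1 + i * h2) % m for i in range(m)]


m = 16                             # a power of 2
print(probe_sequence(496, m))      # visits every slot exactly once

# An even step shares a factor with m and revisits the same few slots.
even_step = [(0 + i * 4) % m for i in range(m)]
print(sorted(set(even_step)))      # [0, 4, 8, 12]
```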

Analysis of open addressing
We make the assumption of uniform hashing: each key is equally likely to have any one of the m! permutations as its probe sequence.
Theorem. Given an open-addressed hash table with load factor $\alpha = n/m < 1$, the expected number of probes in an unsuccessful search is at most $1/(1 - \alpha)$.

Proof of the theorem
Proof. At least one probe is always necessary. With probability n/m, the first probe hits an occupied slot, and a second probe is necessary. With probability (n-1)/(m-1), the second probe hits an occupied slot, and a third probe is necessary. With probability (n-2)/(m-2), the third probe hits an occupied slot, etc. Observe that
$$\frac{n-i}{m-i} < \frac{n}{m} = \alpha \quad \text{for } i = 1, 2, \ldots, n.$$

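Proof (continued)
Therefore, the expected number of probes is
$$\begin{aligned}
& 1 + \frac{n}{m}\left(1 + \frac{n-1}{m-1}\left(1 + \frac{n-2}{m-2}\left(\cdots\left(1 + \frac{1}{m-n+1}\right)\cdots\right)\right)\right) \\
&\quad \le 1 + \alpha(1 + \alpha(1 + \alpha(\cdots(1 + \alpha)\cdots))) \\
&\quad \le 1 + \alpha + \alpha^2 + \alpha^3 + \cdots \\
&\quad = \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1 - \alpha}.
\end{aligned}$$
The textbook has a more rigorous proof.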

Implications of the theorem
If $\alpha$ is constant, then accessing an open-addressed hash table takes constant time. If the table is half full, then the expected number of probes is $1/(1 - 0.5) = 2$. If the table is 90% full, then the expected number of probes is $1/(1 - 0.9) = 10$.
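To see how tight the bound is, the nested expression for the expected number of probes can be evaluated from the inside out and compared with 1/(1 - α). This is a sketch under the uniform-hashing assumption; the function name `expected_probes` and the table size m = 100 are illustrative.

```python
def expected_probes(n, m):
    """Evaluate 1 + (n/m)(1 + ((n-1)/(m-1))(1 + ...)), innermost term first."""
    e = 1.0
    for i in range(n - 1, -1, -1):
        # (n - i)/(m - i): chance that probe i + 1 hits an occupied slot
        e = 1.0 + (n - i) / (m - i) * e
    return e


m = 100
for n in (50, 90):
    alpha = n / m
    print(n, round(expected_probes(n, m), 3), round(1 / (1 - alpha), 3))
```

For both load factors the evaluated expression stays below the bound, and the gap widens as the table fills.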