Data Structures. Hash Tables. Dana Shapira.

Element Uniqueness Problem

Let $x_1, \ldots, x_n < m$. Determine whether there exist $i \neq j$ such that $x_i = x_j$.

Approaches: a sort algorithm, or bucket sort with a table T of size m:

    for (i = 0; i < m; i++) T[i] = NULL;
    for (i = 1; i <= n; i++) {
        if (T[x_i] == NULL) T[x_i] = i;
        else { output(i, T[x_i]); return; }
    }

What happens when m is large or when we are dealing with real numbers?

Hash Tables

Notations:
U: universe of keys, of size $|U|$
K: an actual set of keys, of size n
T: a hash table of size m

Use a hash function $h: U \to \{0, 1, \ldots, m-1\}$, $h(x) = i$, that computes the slot i in array T where element x is to be stored, for all x in U. $h(k)$ is computed in $O(|k|) = O(1)$.

[Figure: keys $x_1, \ldots, x_4$ in U mapped by h to $h(x_1), \ldots, h(x_4)$ in the set of array indices.]

Example

$h: U \to \{0, 1, \ldots, m-1\}$, $h(x) = x \bmod 10$ (what is m?)
Input: 17, 62, 19, 81, 53

[Figure: table with slots 0 to 9; 81 in slot 1, 62 in slot 2, 53 in slot 3, 17 in slot 7, 19 in slot 9.]

Collision: $x \neq y$ but $h(x) = h(y)$. Collisions cannot be avoided in general because $m \ll |U|$.
Solutions:
1. Chaining
2. Open addressing
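The bucket-table check above can be sketched in Python as follows. This is a minimal illustration, not the lecturer's code: the function names are hypothetical, indices are 0-based, and the second function uses Python's built-in `dict` (itself a hash table) as one possible answer to the question of what to do when m is large or the keys are real numbers.

```python
def find_duplicate_direct(xs, m):
    """Direct-address (bucket) check; assumes every x is an integer with 0 <= x < m."""
    table = [None] * m              # table[v] = index of the first occurrence of v
    for i, x in enumerate(xs):
        if table[x] is not None:
            return (table[x], i)    # (earlier index, current index)
        table[x] = i
    return None                     # all elements are distinct


def find_duplicate_hashed(xs):
    """Same idea with a hash table; space is O(n) regardless of the key range."""
    first_seen = {}
    for i, x in enumerate(xs):
        if x in first_seen:
            return (first_seen[x], i)
        first_seen[x] = i
    return None
```

The direct version needs Θ(m) space and time for initialization, which is what makes it impractical for a large universe; the hashed version only stores the n keys actually seen.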

Collision-Resolution by Chaining

[Figure: chained table with m = 10. Slot 1 holds 81; slot 2 holds 62 and one more key; slot 3 holds 53; slot 7 holds the chain 17, 37, 57; slot 9 holds 19.]

Insert(T,x): Insert new element x at the head of list T[h(x.key)].
Delete(T,x): Delete element x from list T[h(x.key)].
Search(T,x): Search list T[h(x.key)].

Analysis of Chaining

Simple Uniform Hashing: Any given element is equally likely to hash to any slot in the hash table. The slot an element hashes to is independent of where other elements hash.

Load factor: $\alpha = n/m$ (elements stored in the hash table / number of slots in the hash table)

Analysis of Chaining

Theorem: In a hash table with chaining, under the assumption of simple uniform hashing, both successful and unsuccessful searches take expected time $\Theta(1+\alpha)$ on the average, where $\alpha$ is the hash table load factor.

Proof:
Unsuccessful search: Under the assumption of simple uniform hashing, any key k is equally likely to hash to any slot in the hash table. The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which has expected length $\alpha$. Expected time: $\Theta(1+\alpha)$, including the time for computing h(k).

Successful search: The number of elements examined during a successful search is 1 more than the number of elements that appear before k in T[h(k)]. Averaging over the n elements, where the list holding the i-th inserted element has expected length $(i-1)/m$ at the time of insertion:

$$\frac{1}{n}\sum_{i=1}^{n}\left(1+\frac{i-1}{m}\right) = 1+\frac{1}{nm}\sum_{i=1}^{n}(i-1) = 1+\frac{1}{nm}\cdot\frac{n(n-1)}{2} = 1+\frac{n-1}{2m} = 1+\frac{\alpha}{2}-\frac{1}{2m} = 1+\frac{\alpha}{2}-\frac{\alpha}{2n} = \Theta(1+\alpha)$$

Expected time: $\Theta(1+\alpha)$.

Corollary: If $m = \Theta(n)$, then Insert, Delete, and Search take expected constant time.

Designing Good Hash Functions

Example: Input = reals drawn uniformly at random from $[0,1)$. Hash function: $h(x) = \lfloor mx \rfloor$.

Often, the input distribution is unknown. Then we can use heuristics or universal hashing.
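The three chaining operations and the load factor can be sketched as below. This is an illustrative sketch under stated assumptions: the class name `ChainedHashTable` is hypothetical, keys are non-negative integers, the hash function is the simple h(k) = k mod m from the earlier example, and `collections.deque` stands in for a linked list so that insertion at the head is O(1).

```python
from collections import deque


class ChainedHashTable:
    def __init__(self, m):
        self.m = m                                  # number of slots
        self.n = 0                                  # number of stored elements
        self.slots = [deque() for _ in range(m)]    # one chain per slot

    def _h(self, key):
        return key % self.m                         # assumed hash function

    def insert(self, key):
        self.slots[self._h(key)].appendleft(key)    # insert at the head of the chain
        self.n += 1

    def search(self, key):
        return key in self.slots[self._h(key)]      # scans only the chain of slot h(key)

    def delete(self, key):
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)
            self.n -= 1
            return True
        return False

    def load_factor(self):
        return self.n / self.m                      # alpha = n / m
```

Search and delete walk a single chain, so their cost is proportional to the chain length, which is α in expectation under simple uniform hashing.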

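The Division Method

Hash function: $h(x) = x \bmod m$
If $m = 2^k$, then h(x) = the lowest k bits of x.
Heuristic: m = a prime number not too close to a power of 2.

The Multiplication Method

Hash function: $h(x) = \lfloor m\,(cx \bmod 1) \rfloor$, for some $0 < c < 1$.
The optimal choice of c depends on the input distribution.
Heuristic: Knuth suggests the inverse of the golden ratio as a value that works well: $c = (\sqrt{5}-1)/2 = 0.61803\ldots$

Example: x = 123,456, m = 10,000

$$h(x) = \lfloor 10{,}000 \cdot (123{,}456 \cdot 0.61803\ldots \bmod 1) \rfloor = \lfloor 10{,}000 \cdot (76{,}300.0041151\ldots \bmod 1) \rfloor = \lfloor 10{,}000 \cdot 0.0041151\ldots \rfloor = \lfloor 41.151\ldots \rfloor = 41$$

Efficient Implementation of the Multiplication Method

$h(x) = \lfloor m\,(cx \bmod 1) \rfloor$

Let w be the size of a machine word.
Assume that key x fits into a machine word.
Assume that $m = 2^p$.
Restrict ourselves to values of c of the form $c = s / 2^w$ with $0 < s < 2^w$. Then $cx = sx / 2^w$.
sx is a number that fits into two machine words.
h(x) = the p most significant bits of the lower word.

[Figure: the two-word product sx. The upper word is the integer part of cx and the lower word is the fractional part; the p most significant bits of the lower word are the integer part after multiplying by $m = 2^p$.]

Example

$h(x) = \lfloor m\,(cx \bmod 1) \rfloor$

x = 123456, p = 14, $m = 2^{14} = 16384$, w = 32.
Then $sx = (76300 \cdot 2^{32}) + 17612864$.
The 14 most significant bits of 17612864 are 67; that is, h(x) = 67.

[Figure: x, s, sx and h(x) written in binary, with h(x) = 67.]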
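Both forms of the multiplication method can be checked against the two worked examples with a short sketch. The function names are hypothetical, and the default multiplier s = 2654435769 (the integer part of $2^{32}(\sqrt{5}-1)/2$) is an assumption: the transcript gives only the product sx, with which this value is consistent.

```python
import math


def mult_hash_float(x, m, c=(math.sqrt(5) - 1) / 2):
    """h(x) = floor(m * (c*x mod 1)), computed in floating point."""
    return math.floor(m * ((c * x) % 1.0))


def mult_hash_fixed(x, p, w=32, s=2654435769):
    """Fixed-point variant with m = 2**p and c = s / 2**w."""
    low_word = (s * x) & ((1 << w) - 1)   # lower w bits of s*x: the fractional part of c*x
    return low_word >> (w - p)            # the p most significant bits of the lower word
```

The fixed-point version uses only one multiplication, a mask and a shift, which is why the slides restrict m to a power of two.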

Open Addressing

All elements are stored directly in the hash table.
The load factor $\alpha$ cannot exceed 1.
If slot T[h(x)] is already occupied for a key x, we probe alternative locations until we find an empty slot.
Searching probes slots starting at T[h(x)] until x is found or we are sure that x is not in T.
Instead of computing h(x), we compute h(x, i), where i is the probe number.

Linear Probing

Hash function: $h(k, i) = (h'(k) + i) \bmod m$, where h' is an original hash function.
Benefits: Easy to implement.
Problem: Primary clustering. Long runs of occupied slots build up as the table becomes fuller.

Quadratic Probing

Hash function: $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$, where h' is an original hash function.
Benefits: No more primary clustering.
Problem: Secondary clustering. Two elements x and y with h'(x) = h'(y) have the same probe sequence.

Double Hashing

Hash function: $h(k, i) = (h_1(k) + i\,h_2(k)) \bmod m$, where $h_1$ and $h_2$ are two original hash functions.
$h_2(k)$ has to be prime with respect to m; that is, $\gcd(h_2(k), m) = 1$.
Two methods:
Choose m to be a power of 2 and guarantee that $h_2(k)$ is always odd.
Choose m to be a prime number and guarantee that $0 < h_2(k) < m$.
Benefits: No more clustering.
Drawback: More complicated than linear and quadratic probing.
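The probing schemes above differ only in how h(k, i) is computed, which a small sketch can make concrete. Everything named here is an assumption for illustration: the class `OpenAddressTable`, the choices h'(k) = h_1(k) = k mod m and h_2(k) = 1 + (k mod (m - 1)), and non-negative integer keys. Deletion is left out because the slides do not cover it.

```python
def linear_probe(key, i, m):
    return (key % m + i) % m              # h(k, i) = (h'(k) + i) mod m


def double_hash_probe(key, i, m):
    h1 = key % m
    h2 = 1 + key % (m - 1)                # in 1..m-1, so gcd(h2, m) = 1 when m is prime
    return (h1 + i * h2) % m              # h(k, i) = (h1(k) + i*h2(k)) mod m


class OpenAddressTable:
    def __init__(self, m, probe=linear_probe):
        self.m = m
        self.probe = probe
        self.slots = [None] * m           # None marks an empty slot

    def insert(self, key):
        for i in range(self.m):           # at most m probes
            j = self.probe(key, i, self.m)
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j                  # slot where the key now lives
        raise OverflowError("hash table is full")

    def search(self, key):
        for i in range(self.m):
            j = self.probe(key, i, self.m)
            if self.slots[j] is None:     # empty slot: key cannot be further along
                return None
            if self.slots[j] == key:
                return j
        return None
```

With linear probing and m = 10, the keys 17, 37, 57, 27 land in the consecutive slots 7, 8, 9, 0, a small instance of the primary clustering described above.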

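Analysis of Open Addressing

Uniform hashing: The probe sequence $h(k, 0), \ldots, h(k, m-1)$ is equally likely to be any permutation of $0, \ldots, m-1$.

Theorem: In an open-address hash table with load factor $\alpha < 1$, the expected number of probes in an unsuccessful search is at most $1/(1-\alpha)$, assuming uniform hashing.

Proof: Let X be the number of probes in an unsuccessful search.

$$E[X] = \sum_{i=0}^{\infty} i\,P\{X = i\} = \sum_{i=0}^{\infty} i\,\bigl(P\{X \ge i\} - P\{X \ge i+1\}\bigr) = \sum_{i=1}^{\infty} P\{X \ge i\}$$

Analysis of Open Addressing Cont.

Let $A_i$ be the event that there is an i-th probe and it accesses a non-empty slot. (The slide's formulas for the remaining steps are missing from the transcript; the standard argument is as follows.) We have $X \ge i$ exactly when $A_1, \ldots, A_{i-1}$ all occur, and under uniform hashing $P\{A_1 \cap \cdots \cap A_{i-1}\} \le (n/m)^{i-1} = \alpha^{i-1}$. Hence

$$E[X] \le \sum_{i=1}^{\infty} \alpha^{i-1} = \frac{1}{1-\alpha}$$

Corollary: The expected number of probes performed during an insertion into an open-address hash table with uniform hashing is at most $1/(1-\alpha)$.

Analysis of Open Addressing Cont.

Theorem: Given an open-address hash table with load factor $\alpha < 1$, the expected number of probes in a successful search is at most $(1/\alpha)\ln(1/(1-\alpha))$, assuming uniform hashing and assuming that each key in the table is equally likely to be searched for.

A successful search for an element x follows the same probe sequence as the insertion of element x.
Consider the (i + 1)-st element x that was inserted. At that time the load factor is i/m, so the expected number of probes performed when inserting x is at most

$$\frac{1}{1 - i/m} = \frac{m}{m-i}$$

Averaging over all n elements, the expected number of probes in a successful search is at most

$$\frac{1}{n}\sum_{i=0}^{n-1}\frac{m}{m-i} = \frac{1}{\alpha}\sum_{k=m-n+1}^{m}\frac{1}{k} \le \frac{1}{\alpha}\int_{m-n}^{m}\frac{dx}{x} = \frac{1}{\alpha}\ln\frac{m}{m-n} = \frac{1}{\alpha}\ln\frac{1}{1-\alpha}$$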
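To get a feel for the two bounds, they can be evaluated numerically. The sketch below is illustrative only, and the function names are hypothetical. `successful_average` computes the average of m/(m - i) over the n insertions, which is the quantity that the standard proof of the successful-search theorem bounds by the logarithmic expression; that step is an assumption here, since the slide's own formulas did not survive transcription.

```python
import math


def unsuccessful_bound(alpha):
    """Upper bound 1/(1 - alpha) on expected probes in an unsuccessful search."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    return 1.0 / (1.0 - alpha)


def successful_bound(alpha):
    """Upper bound (1/alpha) * ln(1/(1 - alpha)) for a successful search."""
    if not 0 < alpha < 1:
        raise ValueError("load factor must satisfy 0 < alpha < 1")
    return (1.0 / alpha) * math.log(1.0 / (1.0 - alpha))


def successful_average(n, m):
    """Average over i = 0..n-1 of m/(m - i), for n < m stored keys."""
    return sum(m / (m - i) for i in range(n)) / n
```

At half load the bounds are 2 probes for an unsuccessful search and about 1.39 for a successful one; at 90% load they grow to 10 and about 2.56, which is why open-address tables are usually kept well below full.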

Analysis of Open Addressing Cont.

[Slide consisting of formulas that did not survive transcription.]

Universal Hashing

A family H of hash functions is universal if for each pair k, l of distinct keys, there are at most $|H| / m$ functions in H such that h(k) = h(l).

This means: For any two distinct keys k and l and any function $h \in H$ chosen uniformly at random, the probability that h(k) = h(l) is at most 1/m:

$$P \le \frac{|H|/m}{|H|} = \frac{1}{m}$$

This is the same as if we chose h(k) and h(l) uniformly at random from $[0, m-1]$.

Analysis of Universal Hashing

Theorem: For a hash function h chosen uniformly at random from a universal family H, the expected length of the list T[h(x)] is at most $\alpha$ if x is not in the hash table and at most $1 + \alpha$ if x is in the hash table.

Proof: Indicator variables:

$$\chi_{ij} = \begin{cases} 1 & \text{if } h(i) = h(j) \\ 0 & \text{if } h(i) \neq h(j) \end{cases}$$

$Y_x$ = the number of keys other than x that hash to the same slot as x:

$$Y_x = \sum_{y \in T,\; y \neq x} \chi_{xy}$$

Analysis of Universal Hashing Cont.

$$E[Y_x] = E\Bigl[\sum_{y \in T,\; y \neq x} \chi_{xy}\Bigr] = \sum_{y \in T,\; y \neq x} E[\chi_{xy}] \le \sum_{y \in T,\; y \neq x} \frac{1}{m}$$

If x is not in T, then $|\{y \in T : x \neq y\}| = n$. Hence, $E[Y_x] \le n/m = \alpha$.

If x is in T, then $|\{y \in T : x \neq y\}| = n - 1$. Hence, $E[Y_x] \le (n-1)/m < \alpha$. The length of list T[h(x)] is one more, that is, less than $1 + \alpha$.

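Universal Family of Hash Functions

Choose a prime p so that m = p.
Let $x = [x_0, \ldots, x_r]$ such that $0 \le x_i < m$ for every i.
For each $a = [a_0, \ldots, a_r] \in \{0, \ldots, m-1\}^{r+1}$ define the hash function as follows:

$$h_a(x) = \sum_{i=0}^{r} a_i x_i \bmod m$$

$$H = \bigcup_a \{h_a\}, \qquad |H| = m^{r+1}$$

Example: $U = \{0, \ldots, 2^{24}-1\}$. [The numbers of this example are only partly recoverable, because the digits 0 and 1 were lost in transcription. Surviving fragments: m = p = 253, a = [248, 223, ...], x = 25 = [..., 2, ...], h(x) = (248 ... + 223 · 2 + ...) mod 253 = 4.]

Universal Family of Hash Functions

Theorem: The class H is universal.

Proof: Let $x = [x_0, \ldots, x_r]$, $y = [y_0, \ldots, y_r]$ such that $x \neq y$; without loss of generality $x_0 \neq y_0$.
For all $a_1, \ldots, a_r$ there exists a single $a_0$ such that $h_a(x) = h_a(y)$:

$$h_a(x) - h_a(y) \equiv \sum_{i=0}^{r} a_i (x_i - y_i) \equiv 0 \pmod m$$

$$a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod m$$

For all $z \not\equiv 0 \pmod p$ there exists a single w such that $z \cdot w \equiv 1 \pmod m$. Taking $z = x_0 - y_0$:

$$a_0 \equiv -\Bigl(\sum_{i=1}^{r} a_i (x_i - y_i)\Bigr) (x_0 - y_0)^{-1} \pmod m$$

So $h_a(x) = h_a(y)$ for $m^r$ values of a.
The number of hash functions h in H for which $h(k_1) = h(k_2)$ is at most $m^r = m^{r+1}/m = |H|/m$.

Universal Family of Hash Functions

Choose a prime p so that m < p and every key x satisfies $0 \le x < p$.
For any $1 \le a < p$ and $0 \le b < p$, we define a function $h_{a,b}(x) = ((ax + b) \bmod p) \bmod m$.
Let $H_{p,m}$ be the family $H_{p,m} = \{h_{a,b} : 1 \le a < p \text{ and } 0 \le b < p\}$.

Theorem: The class $H_{p,m}$ is universal.

Summary

Hash tables are the most efficient dictionaries if only the operations Insert, Delete, and Search have to be supported.
If uniform hashing is used, the expected time of each of these operations is constant.
Universal hashing is somewhat complicated, but performs well even for adversarial input distributions.
If the input distribution is known, heuristics perform well and are much simpler than universal hashing.
For collision resolution, chaining is the simplest method, but it requires more space than open addressing.
Open addressing is either more complicated or suffers from clustering effects.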
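The family $H_{p,m}$ defined above is small enough to sketch and to check by brute force for tiny parameters. The names `make_universal_hash` and `collision_count` are hypothetical, keys are assumed to be integers in the range 0 to p - 1, and the optional `rng` argument exists only so that the random choice of a and b can be made reproducible.

```python
import random


def make_universal_hash(p, m, rng=None):
    """Draw h_{a,b}(x) = ((a*x + b) mod p) mod m uniformly from H_{p,m}."""
    rng = rng or random.Random()
    a = rng.randrange(1, p)               # 1 <= a < p
    b = rng.randrange(0, p)               # 0 <= b < p

    def h(x):
        return ((a * x + b) % p) % m

    return h


def collision_count(x, y, p, m):
    """Number of pairs (a, b) for which h_{a,b}(x) == h_{a,b}(y)."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * x + b) % p) % m == ((a * y + b) % p) % m
    )
```

Universality says that for distinct keys x and y the collision count is at most |H|/m = p(p - 1)/m, which can be confirmed exhaustively for small primes such as p = 7 or p = 17.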