Lecture 3 January 31, 2017

CS 224: Advanced Algorithms, Spring 2017
Prof. Jelani Nelson
Lecture 3, January 31, 2017
Scribe: Saketh Rama

1 Overview

In the last lecture we covered Y-fast tries and Fusion Trees. In this lecture we start our discussion of hashing. We will study load balancing, k-wise independence, and the dynamic dictionary problem solved using hashing with chaining and linear probing.

2 Load Balancing

Formally, consider jobs with IDs in the universe $[u]$, and machines labeled $1, \ldots, m$. The task of load balancing studies the assignment of jobs to machines so that no machine is too overloaded. For example, we could have a centralized scheduler which decides where jobs should go. However, local decisions for scheduling are preferable for complexity reasons; this motivates our study of hashing. The idea is to have a random function $h : [u] \to [m]$.

2.1 Chernoff Bound

We will assume there are $n$ jobs in the system, with $n \le u$, and focus on the case where $m = n$. Studying this case will motivate our statement and proof of Chernoff bounds.

2.1.1 Application of Bound

Load balancing corresponds to a small probability $P(\text{max load of any machine} > T)$. We can restate this as follows:
$$P(\text{max load of any machine} > T) = P(\exists \text{ machine } i : \text{load}(i) > T) \le \sum_{i=1}^{n} P(\text{load}(i) > T) = n \cdot P(\text{load}(1) > T),$$
where the inequality follows from the union bound. We can now apply the Chernoff bound, which we will prove later.
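To get a feel for the quantities involved, the following minimal simulation sketch (our own illustration, not part of the lecture) throws $n$ jobs into $m = n$ machines, with Python's `random` module standing in for a truly random $h$, and prints the observed max load next to the $\lg n / \lg \lg n$ scale that the analysis below predicts.

```python
import random
from math import log

def max_load(n_jobs, n_machines):
    """Assign each job to a uniformly random machine and return the largest load."""
    loads = [0] * n_machines
    for _ in range(n_jobs):
        loads[random.randrange(n_machines)] += 1
    return max(loads)

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        observed = max_load(n, n)          # the m = n case studied below
        scale = log(n) / log(log(n))       # the Theta(lg n / lg lg n) scale
        print(f"n = {n:>7}: max load = {observed}, lg n / lg lg n = {scale:.2f}")
```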

Lemma. Chernoff Bound Dscrete Case). Let random varables X,..., X R {0, } be ndependent, wth X = X and E[X] = µ. Then for all δ > 0, e δ PX > + δ)µ) < + δ) +δ To apply ths to hashng, defne R = n ndcator varables { h) = X =. 0 o.w. ) µ Then µ = E[X] = E[X ] = n n =. We can now analyze the probablty n load balancng. Pload) > T ) n < et T T n e ) T < n < n et /T ) T T ) By settng T = Θ lg n such that /T ) T /n 2, we get n e T /T ) T < /n o). In load balancng ) jargon, we say that f the left-hand condton s satsfed, then the max load s O wth hgh probablty. lg n 2..2 Proof of Bound Because X s Bernoull, E[X ] = p mples µ = p. We wll make use of the followng nequalty to bound the probablty usng an expectaton. Lemma 2. Markov s Inequalty. Let X be a nonnegatve r.v. Then for all λ > 0, Because f s strctly ncreasng, we can say that PX > λ) < E[X] λ. PX > + δ)µ) = Pfx) > f + δ)µ)). As a somewhat magcal step, choose fz) = e tz, such that we can guess at the form of z. 2

Pe tx > e t+δ)µ ) < e t+δ)µ Ee t X ) by Markov s nequalty) = e t+δ)µ E e tx ) = e t+δ)µ Ee tx ) = e t+δ)µ p + p e t ) e t+δ)µ e p e t ) ) = e t+δ)µ e µet ) ) µ e Ths establshes that PX > + δ)µ) < δ +δ). +δ The above proof requred a magcal step of guessng at the functon s form. We can also consder Chernoff bounds more ntutvely as a moment bound for large p n the expresson derved from a repeated applcaton of Markov s nequalty: P X E[X] p ) < E X E[X] )p λ p. 2.2 Alternatve Analyss We can also approach load balancng more drectly. P max load > T ) < n Pload) > T ) = n P T jobs mappng to machne ) ) n < n T n T We can bound n T as follows. For I = {,..., T } wth < < T, let { f all s map to X I =. 0 o.w. Then P T jobs mappng to ) = P I : X I = ) I PX I = ) by the unon bound. Note that ) n T n T = n! T!n T!) nn ) n T + ) = nt T! n T < T!. Here, we can ether use Strlng s approxmaton or be slopper wth the nequalty T! > T/2) T/2. We choose the latter, and so T! < q where q = T/2. Ths quantty s much smaller than /n for ) q T = q = Θ. lg n 3

3 k-wise Independence

It turns out that the above analysis only requires $k$-wise independence (where $k = T$), a concept which we will now study. Note that
$$P(X_I = 1) = P_h\Big(\bigwedge_{j=1}^{T} \big(h(i_j) = 1\big)\Big) = \prod_{j=1}^{T} P_h\big(h(i_j) = 1\big) = (1/n)^{T},$$
where the probability is taken over the randomness of the hash function.

Definition 3. A family $\mathcal{H}$ of functions $h : [u] \to [m]$ is a $k$-wise independent hash family if for any $i_1 < \cdots < i_k \in [u]$ and $y_1, \ldots, y_k \in [m]$, we have
$$P_{h \in \mathcal{H}}\Big(\bigwedge_{j=1}^{k} h(i_j) = y_j\Big) = \prod_{j=1}^{k} P_h\big(h(i_j) = y_j\big).$$

This condition is useful because a totally random hash function would require $u \lg m$ bits to store. With the less restrictive $k$-wise independence, we can get away with less storage. Note that if $\mathcal{H}$ is the set of all functions mapping $[u]$ to $[m]$, then a random $h \in \mathcal{H}$ is what we just analyzed.

3.1 Example

Let $u = p$ where $p$ is prime, with $p \ge 2m$. Define
$$\mathcal{H} = \Big\{h : h(x) = \sum_{i=0}^{k-1} a_i x^{i} \bmod p\Big\}.$$
Then $|\mathcal{H}| = p^{k}$, and so storing $h$ takes $\lg|\mathcal{H}| = k \lg p$ bits. We will omit the analysis of this example for the purposes of this course; it can be derived using polynomial interpolation.

4 Dynamic Dictionary Problem

The dynamic dictionary problem is a data structure problem. The goal is to maintain items $S \subseteq [u]$ as keys, where each $x \in S$ has an associated value in the universe $[u]$, subject to the following operations:

1. insert(x, v): associate key x with value v
2. query(x): return the value of key x
3. del(x): remove x from S

4.1 First Solution: Hashing with Chaining

We can define an array $A[1 \ldots m]$ whose entries are pointers to linked lists with key-value pairs as nodes. The hash function maps the universe into this array. It turns out that if $\mathcal{H}$ is 2-wise independent with $m \ge |S|$, then $E[\text{time per op}] = O(1)$. The analysis of this is available in the notes for CS 124/125 and in CLRS.
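The polynomial construction of Section 3.1 fits in a few lines. The sketch below is our own illustrative code, not from the lecture; we also fold in a final "mod m" reduction (an assumption, not spelled out above) so that outputs land in $[m]$, which makes the values only approximately uniform when $m$ does not divide $p$.

```python
import random

class PolynomialHash:
    """Sketch of the degree-(k-1) polynomial family from Section 3.1 (our own code):
    h(x) = (a_{k-1} x^{k-1} + ... + a_1 x + a_0 mod p) mod m, coefficients uniform in [p].
    The trailing 'mod m' is an extra step we assume in order to map into [m]."""

    def __init__(self, k, p, m):
        self.p, self.m = p, m
        self.coeffs = [random.randrange(p) for _ in range(k)]  # a_0, ..., a_{k-1}

    def __call__(self, x):
        acc = 0
        for a in reversed(self.coeffs):       # Horner's rule, all arithmetic mod p
            acc = (acc * x + a) % self.p
        return acc % self.m

# Example: a 5-wise family into m = 8 buckets with a prime p satisfying p >= 2m.
h = PolynomialHash(k=5, p=2_147_483_647, m=8)  # 2^31 - 1 is prime
print([h(x) for x in range(10)])
```

In the chaining solution of Section 4.1, such an $h$ would pick the bucket $A[h(x)]$ whose linked list stores the pair $(x, v)$; storing $h$ itself takes only the $k$ coefficients, i.e. $k \lg p$ bits.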

4.2 Second Solution: Linear Probing

The approach we will focus on in this course is linear probing. We again have an array $A[1 \ldots m]$. To insert a key $k$, we start at $A[h(k)]$ and move right until we find an empty slot. (For now, we will consider the simpler case which does not support deletions.)

Linear probing first appeared in an IBM 701 program by Samuel, Amdahl, and Boehme in 1954, and was subsequently analyzed by Knuth in 1963 in the case where $\mathcal{H}$ is the set of all functions [1]. Knuth showed that if $m \ge (1+\epsilon) n$, where $n = |S|$, then $E(\text{time per op}) \le O(1/\epsilon^{2})$. Pagh, Pagh, and Ružić showed more recently that if $m \ge c n$ (e.g., $c = 3$ works), then 5-wise independence guarantees constant expected time as well, which we will prove in the next lecture [2]. In the case of 4-wise independence, Pǎtraşcu and Thorup showed that there exist $\mathcal{H}$ which have expected runtime $\Omega(\lg n)$, which is not constant [3].

References

[1] Donald Knuth. The Art of Computer Programming (2nd ed.). Addison-Wesley, pp. 513-558, 1998.

[2] Anna Pagh, Rasmus Pagh, and Milan Ružić. Linear probing with 5-wise independence. SIAM Review 53.3 (2011): 547-558.

[3] Mihai Pǎtraşcu and Mikkel Thorup. On the k-independence required by linear probing and minwise independence. International Colloquium on Automata, Languages, and Programming. Springer Berlin Heidelberg, 2010.
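To make the probing procedure of Section 4.2 concrete, here is a minimal sketch placed after the references. It is our own illustration, not part of the lecture notes: it supports only insert and query (no deletions, no resizing), matching the simplified deletion-free case above, and it plugs in Python's built-in `hash` purely as a stand-in for a member of a hash family.

```python
class LinearProbingTable:
    """Insert/query-only linear probing table following Section 4.2 (sketch: no deletions,
    no resizing, and we assume the table never fills up)."""

    def __init__(self, m, h):
        self.m = m
        self.h = h                        # any hash function mapping keys into integers
        self.slots = [None] * m           # each slot holds a (key, value) pair or None

    def insert(self, key, value):
        i = self.h(key) % self.m
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.m          # move right (wrapping around) until a free slot
        self.slots[i] = (key, value)

    def query(self, key):
        i = self.h(key) % self.m
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.m
        return None                       # key not present

# Tiny usage example, with Python's built-in hash as a stand-in for a family member.
t = LinearProbingTable(m=16, h=hash)
t.insert("alice", 1)
t.insert("bob", 2)
print(t.query("alice"), t.query("bob"), t.query("carol"))
```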