Multiple Choice Tries and Distributed Hash Tables
1 Multiple Choice Tries and Distributed Hash Tables
Luc Devroye, Gábor Lugosi, Gahyun Park, and W. Szpankowski
January 3, 2007
McGill University, Montreal, Canada; U. Pompeu Fabra, Barcelona, Spain; U. Wisconsin, Whitewater, USA; Purdue University, W. Lafayette, USA
2 Outline of the Talk
1. Digital Tries and Their Applications
2. Known Results
3. Our Main Results
   - Two Choice Trie (greedy and optimal algorithms)
   - Algorithmic Considerations
   - Multiple Choice Trie
4. Distributed Hash Table
3 Trie and Its Parameters
[Figure: a trie over the strings $x_1, \dots, x_{10}$ with the levels $F_n$, $D_n$, and $H_n$ marked.]
$F_n$: fill-up level; $D_n$: typical depth; $H_n$: height.
4 Application of Tries
Tries are popular and efficient data structures that were initially developed and analyzed by Fredkin (1960) and Knuth (1973) as an efficient method for searching and sorting digital data. Applications include:
- dynamic hashing
- conflict resolution algorithms
- leader election algorithms
- IP address lookup
- Lempel-Ziv compression schemes
- distributed hash tables (for ID management, though tries were never explicitly named).
5 Distributed Hash Tables
Figure 1: (a) IDs are randomly generated on the perimeter, and either IDs own the intervals to their left in clockwise order, or boundaries are determined by virtue of a trie or digital search tree. The objective is to make all intervals of about equal length, so that all hosts receive about equal traffic. (b) A standard trie for five strings. The correspondence between nodes and intervals in a dyadic partition of the unit interval is shown. The leaf ID assigned is read from the path to the root (0 for left, 1 for right). The external nodes, not normally part of the trie, are shown as well. Together, external nodes and leaves define a partition of the unit interval (shaded boxes). The fill-up level of this tree is one, while the height is four.
The balance $B_n$ is defined as $B_n = 2^{H_n - F_n + O(1)}$.
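The two quantities in the caption can be computed directly for a small example. The sketch below is only an illustration under stated assumptions: keys are distinct binary strings of equal length (finite truncations of the infinite strings), a level counts as full when every binary prefix of that length is used by at least one key, and the function names and the five sample keys are hypothetical, not taken from the slides.

```python
from itertools import combinations


def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def height(strings):
    """Depth of the deepest leaf: one more than the longest shared prefix."""
    if len(strings) < 2:
        return 0
    return 1 + max(lcp(a, b) for a, b in combinations(strings, 2))


def fill_up_level(strings):
    """Last level at which all 2**level binary prefixes occur (assumed definition)."""
    level = 0
    while True:
        present = {s[:level + 1] for s in strings if len(s) > level}
        if len(present) < 2 ** (level + 1):
            return level
        level += 1


# Hypothetical five-key example with fill-up level one and height four.
keys = ["0000", "0001", "0010", "0100", "1000"]
H, F = height(keys), fill_up_level(keys)
balance = 2 ** (H - F)  # ratio of largest to smallest interval, up to the O(1) term
```

The pairwise height computation is quadratic and only meant for toy inputs; sorting the keys and comparing neighbours would give the same value faster.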
6 Some Known Results
Probabilistic Assumption: A memoryless or Markov (mixing) source generates $n$ binary sequences over a finite or infinite alphabet, with $p_i$ being the probability of emitting the $i$th symbol.
Depth $D_n$ (path distance to the root):
$$\frac{D_n}{\log n} \to \frac{1}{h} \quad \text{in probability as } n \to \infty,$$
where $h = -\sum_i p_i \log p_i$ is the entropy of the distribution. The mean, variance, and the central limit theorem for $D_n$ were first obtained by Jacquet and Régnier (1986), Pittel (1985), and W.S. (1988).
Height $H_n$ (maximum distance between root and leaves):
$$\frac{H_n}{\log n} \to \frac{2}{Q} \quad \text{in probability as } n \to \infty,$$
where $Q = -\log\left(\sum_i p_i^2\right)$; see Pittel (1985) and Clément, Flajolet, Vallée (2001). The asymptotic distribution is of the extreme type.
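As a rough numerical companion to these limits, the following sketch evaluates the entropy $h$, the constant $Q$, and the resulting first-order predictions $\log n / h$ and $2 \log n / Q$ for a memoryless source. The function names are hypothetical, natural logarithms are assumed throughout, and the predictions ignore all lower-order terms.

```python
import math


def entropy(probs):
    """Shannon entropy h = -sum p_i log p_i (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def collision_exponent(probs):
    """Q = -log(sum p_i^2)."""
    return -math.log(sum(p * p for p in probs))


def first_order_predictions(probs, n):
    """Leading-order typical depth and height of a random trie on n strings."""
    h, q = entropy(probs), collision_exponent(probs)
    return {"depth": math.log(n) / h, "height": 2 * math.log(n) / q}


symmetric = first_order_predictions([0.5, 0.5], 1024)
biased = first_order_predictions([0.7, 0.3], 1024)
```

For the symmetric binary source $h = Q = \log 2$, so the predicted height is exactly twice the predicted depth; for a biased source the gap is larger.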
7 Known Results for Generalized Tries
Height in $b$-tries (i.e., an external node may store up to $b$ strings):
$$\frac{H_n}{\log n} \to \frac{b+1}{-\log\left(\sum_i p_i^{b+1}\right)} \quad \text{in probability};$$
see Flajolet (1980), Pittel (1985), and also W.S. (2001).
Height in PATRICIA trees and DIGITAL SEARCH TREES:
$$\frac{H_n}{\log n} \to \frac{1}{\log(1/\max_i p_i)} = \frac{1}{h_{\infty}} \quad \text{in probability}$$
(Pittel (1985)), where $h_{\infty} = \log(1/\max_i p_i)$. Moreover, Pittel, and Knessl & W.S., showed that
$$H_n = \frac{1}{h_{\infty}} \log n + O(\sqrt{\log n}).$$
8 More Known Results for Tries
Fill-up $F_n$ (the last level that has a full set of internal nodes):
$$\frac{F_n}{\log n} \to \frac{1}{\log(1/p_{\min})} = \frac{1}{h_{-\infty}} \quad \text{in probability},$$
where $p_{\min} = \min_i \{p_i\}$ and $h_{-\infty} = \log(1/p_{\min})$ is the Rényi entropy of infinite order (cf. W.S. (2001)).
Furthermore, $F_n$ concentrates on two points, $k_n$ and $k_n + 1$, where
$$k_n = \frac{1}{\log(1/p_{\min})} (\log n - \log\log\log n) + O(1),$$
while for symmetric sources (i.e., sources with $p_1 = p_2 = 1/2$)
$$k_n = \log_2 n - \log_2 \log_2 n + O(1);$$
see Pittel (1986), Devroye (1992), and Knessl and W.S. (2005).
9 Important Relationship
From Jensen's inequality and $(\max_i p_i)^2 \le \sum_i p_i^2 \le \max_i p_i$, we conclude
$$\frac{1}{\log(1/\min_i p_i)} \le \frac{1}{h} \le \frac{1}{Q} \le \frac{1}{\log(1/\max_i p_i)} \le \frac{2}{Q},$$
so that the height is always at least twice as big as the typical depth of a node.
For distributed hash tables, the so-called load balancing ratio $B_n$ is important, where $B_n = 2^{H_n - F_n}$.
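The chain of inequalities can be unpacked step by step. The derivation below is a sketch that writes $h = -\sum_i p_i \log p_i$, $Q = -\log \sum_i p_i^2$, $p_{\max} = \max_i p_i$, and $p_{\min} = \min_i p_i$; the shorthand $p_{\max}$ is introduced here for readability and is not notation from the slides.

```latex
\begin{align*}
h = \sum_i p_i \log\frac{1}{p_i}
  &\le \sum_i p_i \log\frac{1}{p_{\min}} = \log\frac{1}{p_{\min}}
  &&\Longrightarrow\; \frac{1}{\log(1/p_{\min})} \le \frac{1}{h}, \\
-h = \sum_i p_i \log p_i
  &\le \log\Big(\sum_i p_i \cdot p_i\Big) = -Q
  \quad\text{(Jensen, $\log$ concave)}
  &&\Longrightarrow\; \frac{1}{h} \le \frac{1}{Q}, \\
\sum_i p_i^2
  &\le p_{\max} \sum_i p_i = p_{\max}
  &&\Longrightarrow\; Q \ge \log\frac{1}{p_{\max}}
  \;\Longrightarrow\; \frac{1}{Q} \le \frac{1}{\log(1/p_{\max})}, \\
p_{\max}^2
  &\le \sum_i p_i^2
  &&\Longrightarrow\; Q \le 2\log\frac{1}{p_{\max}}
  \;\Longrightarrow\; \frac{1}{\log(1/p_{\max})} \le \frac{2}{Q}.
\end{align*}
```

In particular $2/Q \ge 2/h$, which is the statement that the first-order height constant is at least twice the first-order depth constant.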
10 Two-choice Tries
In many applications (e.g., distributed hashing) one needs to construct a well-balanced trie: the height should be as small as possible and as close as possible to the fill-up level.
Two-choice Trie: Each datum (key) has two strings, $X_i$ and $Y_i$; that is, there are $n$ pairs of strings $(X_i, Y_i)$, and we can select one of the two to insert in the trie.
A Greedy Heuristic: Choose the string which, at the time of insertion, would yield the leaf nearest to the root. Note: Once the selection is made, it cannot be undone!
Main Result:
$$\frac{H_n}{\log n} \to \frac{3}{2Q} \quad \text{in probability as } n \to \infty.$$
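The greedy rule can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes distinct fixed-length binary strings, stores the trie implicitly as a table of prefix counts, and breaks ties in favour of the first string. All names and the sample pairs are hypothetical.

```python
from collections import defaultdict


def leaf_depth(counts, s):
    """Depth at which s would become a leaf: the shortest prefix no stored string shares."""
    depth = 0
    while depth < len(s) and counts.get(s[:depth], 0) > 0:
        depth += 1
    return depth


def insert(counts, s):
    """Record every prefix of s (including the empty prefix) in the count table."""
    for d in range(len(s) + 1):
        counts[s[:d]] += 1


def greedy_two_choice(pairs):
    """For each pair (X_i, Y_i), irrevocably keep the string whose leaf is nearer the root."""
    counts = defaultdict(int)
    chosen = []
    for x, y in pairs:
        z = x if leaf_depth(counts, x) <= leaf_depth(counts, y) else y
        insert(counts, z)
        chosen.append(z)
    return chosen


def trie_height(strings):
    """Leaves sit one level below the deepest prefix shared by two or more strings."""
    counts = defaultdict(int)
    for s in strings:
        insert(counts, s)
    shared = [len(p) for p, c in counts.items() if c >= 2]
    return 1 + max(shared) if shared else 0


pairs = [("0000", "1111"), ("0001", "1110"), ("0011", "1000")]
greedy = greedy_two_choice(pairs)
first_only = [x for x, _ in pairs]
```

On this toy input the greedy selection gives a trie of height 2, whereas always taking the first string of each pair gives height 4.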
11 Sketch of Proof
Theorem 1. For all integer $d > 0$ and any $t > 0$,
$$P\left\{H_n \ge \frac{3\log n + t}{2Q}\right\} \le 4e^{-t} + 2n^{-1/4} e^{-3t/4}.$$
If $p_1 = \cdots = p_V = 1/V$ (symmetric case), then for every $\epsilon > 0$
$$\lim_{n\to\infty} P\left\{H_n \le \frac{(3-\epsilon)\log n}{2Q}\right\} = 0.$$
Upper Bound: Define
- $C(X, Y)$: the length of the longest common prefix of $X$ and $Y$;
- $Z_i$: the string selected for the $i$th datum (i.e., $Z_i = X_i$ or $Z_i = Y_i$);
- $P_r = \sum_i p_i^r$; note that $Q = -\log P_2$.
Then
$$\{H_n > d\} = \bigcup_{l=1}^{n} \bigcup_{1 \le i,j < l} \{C(Z_i, X_l) > d\} \cap \{C(Z_j, Y_l) > d\},$$
hence
$$P(H_n > d) \le 4n^3 P_2^{2d} + 2n^2 P_3^{d} \le 4n^3 P_2^{2d} + 2n^2 P_2^{3d/2},$$
since $P_3 \le P_2^{3/2}$.
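To see how the tail bound of Theorem 1 follows from the union bound, one can substitute $d = (3\log n + t)/(2Q)$ and use $P_2 = e^{-Q}$. The worked substitution below is a sketch that ignores the rounding of $d$ to an integer.

```latex
\begin{align*}
P_2^{d} &= e^{-Qd} = e^{-(3\log n + t)/2} = n^{-3/2} e^{-t/2}, \\
4n^3 P_2^{2d} &= 4n^3 \cdot n^{-3} e^{-t} = 4e^{-t}, \\
2n^2 P_2^{3d/2} &= 2n^2 \cdot n^{-9/4} e^{-3t/4} = 2n^{-1/4} e^{-3t/4}, \\
P(H_n > d) &\le 4e^{-t} + 2n^{-1/4} e^{-3t/4}.
\end{align*}
```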
12 Optimized off-line Algorithm
Define $Z_i(0) = X_i$ and $Z_i(1) = Y_i$, and let $(i_1, \dots, i_n) \in \{0, 1\}^n$. Then $H_n(i_1, \dots, i_n)$ is the height of the trie over $Z_1(i_1), \dots, Z_n(i_n)$. Finally,
$$H_n = \min_{i_1, \dots, i_n} H_n(i_1, \dots, i_n)$$
is the minimal height over all these $2^n$ tries.
Theorem 2. If $\max_i p_i < 1$, then $H_n / \log n \to 1/Q$ in probability. In particular, for fixed $t$,
$$P\left\{H_n \ge \frac{\log n + t}{Q}\right\} \le 8e^{-t}.$$
Also, for all $\epsilon > 0$,
$$\lim_{n\to\infty} P\left\{H_n \le \frac{(1-\epsilon)\log n}{Q}\right\} = 0.$$
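The definition of the off-line optimum can be made concrete by brute force for very small $n$. The sketch below simply enumerates all $2^n$ selections; it illustrates the definition only and is not the efficient algorithm discussed in the talk. It assumes distinct fixed-length binary strings, and the function names and sample pairs are hypothetical.

```python
from itertools import combinations, product


def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def height(strings):
    """Trie height: one more than the longest prefix shared by any two strings."""
    if len(strings) < 2:
        return 0
    return 1 + max(lcp(a, b) for a, b in combinations(strings, 2))


def optimal_offline(pairs):
    """Return (minimal height, selection vector) over all 2**n selections."""
    best = None
    for selection in product((0, 1), repeat=len(pairs)):
        strings = [pair[i] for pair, i in zip(pairs, selection)]
        h = height(strings)
        if best is None or h < best[0]:
            best = (h, selection)
    return best


pairs = [("0000", "1111"), ("0001", "1110"), ("0011", "1000")]
best_height, best_selection = optimal_offline(pairs)
```

The exponential enumeration is usable only for a handful of pairs, which is why the graph-based argument on the following slides matters.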
13 Upper Bound Proof
1. Construct an infinite trie over $2n$ strings.
2. Let $T_j$ ($1 \le j \le 2^d$) be a subtree rooted at distance $d$ from the root.
3. A bad datum is a datum whose two strings both fall in the same $T_j$.
4. A colliding pair of data is such that, for some $j \ne k$, each datum in the pair delivers one string to $T_j$ and one string to $T_k$.
Define $\lambda = \sum_i p_i^2 = P_2$.
Lemma 1. (i) The probability that there exists a bad datum anywhere is not more than $n\lambda^d$. (ii) The probability that there is a colliding pair of data anywhere is not more than $2n^2\lambda^{2d}$.
14 A Multigraph Representation
5. Construct a multigraph $G(d)$ whose vertices represent the $T_j$. We connect $T_j$ with $T_l$ if a datum deposits one string in each of these trees.
Figure 2: The multigraph $G$ and an infinite trie for $n = 3$ pairs of strings, denoted by $(1, 1')$, $(2, 2')$ and $(3, 3')$. Note that $(2, 2')$ and $(3, 3')$ form a colliding pair.
15 Cycles in G
6. Consider cycles of length at least 3.
Lemma 2. The probability that $G$ has a cycle of length $\ge 3$ is not more than
$$\frac{(4n)^3 \lambda^{3d}}{1 - 4n\lambda^d}.$$
Sketch of Proof. The probability of a cycle of length $l$ can be bounded by the number of possible data assignments times the probability that the $l$ pairs of data are in the given lists: $2^l (2n)^l \lambda^{dl}$. Provided $4n\lambda^d < 1$, the probability of a cycle of length $\ge 3$ therefore does not exceed
$$\sum_{l=3}^{\infty} (4n)^l \lambda^{dl} = \frac{(4n)^3 \lambda^{3d}}{1 - 4n\lambda^d}.$$
16 Selection Process
7. Assume that there is (i) no bad datum, (ii) no colliding pair of data, and (iii) no cycle, so that $G$ is a forest with no multiedges and one can select one string for each node.
We can assign strings as follows. We choose any one of the strings in the root node's list. For all other strings in the root's list, choose the companion string of the same datum (found by following edges away from the root). This either terminates, or has an impact on one or more child trees. But for the child tree of the root, we have fixed one string (as we did for the root), and thus choose again companion strings for that child list, and so forth. This process is continued until one string of each datum is chosen for the trie.
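A simplified variant of this selection can be sketched with a breadth-first traversal of the multigraph. The assumptions are spelled out here because they differ slightly from the slide: strings are fixed-length binary strings, a subtree $T_j$ is identified with a length-$d$ prefix, every edge is oriented away from the root of its component (so each root receives no string rather than one), and the function returns `None` as soon as a bad datum, a colliding pair, or a cycle is detected. The names are hypothetical.

```python
from collections import defaultdict, deque


def select_strings(pairs, d):
    """Pick one string per datum so that no two picks share a length-d prefix.

    Returns None when a bad datum, a multi-edge, or a cycle is found.
    """
    adj = defaultdict(list)  # prefix -> list of (neighbouring prefix, datum index)
    for idx, (x, y) in enumerate(pairs):
        bx, by = x[:d], y[:d]
        if bx == by:
            return None  # bad datum: both strings land in the same subtree
        adj[bx].append((by, idx))
        adj[by].append((bx, idx))

    choice = [None] * len(pairs)
    visited = set()
    for root in list(adj):
        if root in visited:
            continue
        visited.add(root)
        queue = deque([(root, None)])
        while queue:
            node, via = queue.popleft()
            for nxt, idx in adj[node]:
                if idx == via:
                    continue  # the edge we arrived by
                if nxt in visited:
                    return None  # colliding pair (multi-edge) or longer cycle
                visited.add(nxt)
                x, y = pairs[idx]
                # Keep the string that falls in the child subtree.
                choice[idx] = x if x[:d] == nxt else y
                queue.append((nxt, idx))
    return choice


path_pairs = [("000", "010"), ("011", "100"), ("101", "110")]
picked = select_strings(path_pairs, 2)
```

When the call succeeds, every subtree at depth $d$ receives at most one selected string, so the trie built from the selection has height at most $d$.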
17 Finally
If the height $H_n$ is at least $d$, then the height $H_{2n}$ is at least $d$; hence $H_n > d$ only if
- there exists a bad datum, or
- there exists a colliding pair, or
- there exists a cycle.
Thus
$$P\{H_n > d\} \le P\{\text{there exists a bad datum}\} + P\{\text{there exists a colliding pair}\} + P\{\text{there exists a cycle}\} \le n\lambda^d + 2n^2\lambda^{2d} + \frac{(4n)^3\lambda^{3d}}{1 - 4n\lambda^d}.$$
If we set $A = n\lambda^d$, then
$$P\{H_n > d\} \le 4A\, I_{[A \le 1/8]} + I_{[A > 1/8]} \le 4A\, I_{[A \le 1/8]} + 8A\, I_{[A > 1/8]} \le 8n\lambda^d.$$
Algorithm: Using parent-pointer data representations for forests, we can find the optimal selection in $O(n \log n)$ time.
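The arithmetic behind the last display can be checked numerically. The sketch below writes the three-term bound in terms of $A = n\lambda^d$, compares it with $4A$ and $8A$, and evaluates $A$ at the depth $d = (\log n + t)/Q$, where it equals $e^{-t}$ and the simplified bound therefore becomes $8e^{-t}$. Function names are hypothetical and $d$ is treated as a real number.

```python
import math


def three_term_bound(a):
    """Bad-datum + colliding-pair + cycle bounds, with A = n * lambda**d."""
    if 4 * a >= 1:
        return 1.0  # the geometric series diverges; use the trivial bound
    return min(1.0, a + 2 * a ** 2 + (4 * a) ** 3 / (1 - 4 * a))


def simplified_bound(a):
    """The cruder bound 8A, capped at one."""
    return min(1.0, 8 * a)


def a_at_depth(n, probs, t):
    """A = n * lambda**d evaluated at d = (log n + t) / Q."""
    lam = sum(p * p for p in probs)
    q = -math.log(lam)
    d = (math.log(n) + t) / q
    return n * lam ** d


a = a_at_depth(1000, [0.7, 0.3], 3.0)
```

For $A \le 1/8$ the three terms are at most $A$, $A/4$, and $2A$ respectively, which is where the factor $4A$ in the slide comes from.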
18 Multiple-choice Tries
Consider now $k$ strings per datum: there are $n$ data, each composed of $k$ independent strings of i.i.d. symbols drawn from a memoryless distribution. Let $H_n(k)$ denote the minimal height of any trie of $n$ strings that takes one string of each datum.
Theorem 3. Assume $h < \infty$. For all $\epsilon > 0$, there exists $k$ large enough such that
$$\lim_{n\to\infty} P\left\{\frac{(1-\epsilon)\log n}{h} \le H_n(k) \le \frac{(1+\epsilon)\log n}{h}\right\} = 1.$$
Observe that $D_n \le H_n(k)$.
19 Uniform Distribution for k = O(log n)
Consider the interval $[0, 1]$ and let $X_1, \dots, X_n$ be $n$ independent vectors of $k = c \log n$ i.i.d. uniform $[0, 1]$ random variables $X_{i,j}$, $1 \le i \le n$, $1 \le j \le k$, where $c > 0$ is a constant.
Theorem 4. Let $\alpha \in (0, 1/3)$ and $c = 2/\alpha$. Then there exists a selection $Z_1, \dots, Z_n$ such that the height $H_n$ and fill-up level $F_n$ of the associated trie for $X_{1,Z_1}, \dots, X_{n,Z_n}$ satisfy, for $n \ge 8$,
$$P\{H_n - F_n \le 2\} \ge 1 - \frac{3}{n}.$$
Thus $B_n = O(1)$ (existential result).
For DHT, a greedy heuristic (on-line algorithm) with $k = O(\log n)$ suffices to yield $H_n - F_n \le 7$ in probability.
Entropy for Sparse Random Graphs With Vertex-Names David Aldous 11 February 2013 if a problem seems... Research strategy (for old guys like me): do-able = give to Ph.D. student maybe do-able = give to
More informationChapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code
Chapter 3 Source Coding 3. An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code 3. An Introduction to Source Coding Entropy (in bits per symbol) implies in average
More informationcompare to comparison and pointer based sorting, binary trees
Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:
More information10-704: Information Processing and Learning Fall Lecture 10: Oct 3
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of
More information1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.
Problem sheet Ex. Verify that the function H(p,..., p n ) = k p k log p k satisfies all 8 axioms on H. Ex. (Not to be handed in). looking at the notes). List as many of the 8 axioms as you can, (without
More informationOn the minimum neighborhood of independent sets in the n-cube
Matemática Contemporânea, Vol. 44, 1 10 c 2015, Sociedade Brasileira de Matemática On the minimum neighborhood of independent sets in the n-cube Moysés da S. Sampaio Júnior Fabiano de S. Oliveira Luérbio
More informationPattern Matching in Constrained Sequences
Pattern Matching in Constrained Sequences Yongwook Choi and Wojciech Szpankowski Department of Computer Science Purdue University W. Lafayette, IN 47907 U.S.A. Email: ywchoi@purdue.edu, spa@cs.purdue.edu
More informationON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION
ON THE BIT-COMPLEXITY OF LEMPEL-ZIV COMPRESSION PAOLO FERRAGINA, IGOR NITTO, AND ROSSANO VENTURINI Abstract. One of the most famous and investigated lossless data-compression schemes is the one introduced
More informationOutline. Computer Science 331. Cost of Binary Search Tree Operations. Bounds on Height: Worst- and Average-Case
Outline Computer Science Average Case Analysis: Binary Search Trees Mike Jacobson Department of Computer Science University of Calgary Lecture #7 Motivation and Objective Definition 4 Mike Jacobson (University
More informationSlides for CIS 675. Huffman Encoding, 1. Huffman Encoding, 2. Huffman Encoding, 3. Encoding 1. DPV Chapter 5, Part 2. Encoding 2
Huffman Encoding, 1 EECS Slides for CIS 675 DPV Chapter 5, Part 2 Jim Royer October 13, 2009 A toy example: Suppose our alphabet is { A, B, C, D }. Suppose T is a text of 130 million characters. What is
More information? 11.5 Perfect hashing. Exercises
11.5 Perfect hashing 77 Exercises 11.4-1 Consider inserting the keys 10; ; 31; 4; 15; 8; 17; 88; 59 into a hash table of length m 11 using open addressing with the auxiliary hash function h 0.k/ k. Illustrate
More informationCSE 190, Great ideas in algorithms: Pairwise independent hash functions
CSE 190, Great ideas in algorithms: Pairwise independent hash functions 1 Hash functions The goal of hash functions is to map elements from a large domain to a small one. Typically, to obtain the required
More informationInformation Theory and Statistics Lecture 2: Source coding
Information Theory and Statistics Lecture 2: Source coding Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Injections and codes Definition (injection) Function f is called an injection
More information