String Complexity. Dedicated to Svante Janson for his 60 Birthday

Size: px
Start display at page:

Download "String Complexity. Dedicated to Svante Janson for his 60 Birthday"

Transcription

1 String Complexity Wojciech Szpankowski Purdue University W. Lafayette, IN June 1, 2015 Dedicated to Svante Janson for his 60 Birthday

2 Outline 1. Working with Svante 2. String Complexity 3. Joint String Complexity

3 Joint Papers 1. S. Janson and W. Szpankowski, Analysis of an asymmetric leader election algorithm Electronic J. of Combinatorics, 4, R17, S. Janson, S. Lonardi, and W. Szpankowski, On Average Sequence Complexity, Theoretical Computer Science, 326, , 2004 (also Combinatorial Pattern Matching Conference, CPM 04, Istanbul, 2004). 3. S. Janson and W. Szpankowski, Partial Fillup and Search Time in LC Tries ACM Trans. on Algorithms, 3, 44:1-44:14, 2007 (also, ANALCO, Miami, 2006). 4. A. Magner, S. Janson, G. Kollias, and W. Szpankowski On Symmetry of Uniform and Preferential Attachment Graphs, Electronic J. Combinatorics, 21, P3.32, 2014 (also, 25th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms AofA 14, Paris, 2014).

4 Joint Papers 1. S. Janson and W. Szpankowski, Analysis of an asymmetric leader election algorithm Electronic J. of Combinatorics, 4, R17, S. Janson, S. Lonardi, and W. Szpankowski, On Average Sequence Complexity, Theoretical Computer Science, 326, , 2004 (also Combinatorial Pattern Matching Conference, CPM 04, Istanbul, 2004). 3. S. Janson and W. Szpankowski, Partial Fillup and Search Time in LC Tries ACM Trans. on Algorithms, 3, 44:1-44:14, 2007 (also, ANALCO, Miami, 2006). 4. A. Magner, S. Janson, G. Kollias, and W. Szpankowski On Symmetry of Uniform and Preferential Attachment Graphs, Electronic J. Combinatorics, 21, P3.32, 2014 (also, 25th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms AofA 14, Paris, 2014). Working with Svante is easy...

5

6 Outline 1. Working with Svante 2. String Complexity 3. Joint String Complexity

7 Some Definitions String Complexity of a single sequence is the number of distinct substrings. Throughout, we write X for the string and denote by I(X) the set of distinct substrings of X over alphabet A. Example. If X = aabaa, then I(X) = {ǫ,a,b,aa,ab,ba,aab,aba,baa,aaba,abaa,aabaa}, and I(X) = 12. But if X = aaaaa, then and I(X) = 6. I(X) = {ǫ,a,aa,aaa,aaaa,aaaaa}, The string complexity is the cardinality of I(X) and we study here the average string complexity. E[ I(X) ] = X A n P(X) I(X).

8 Suffix Trees and String Complexity a b a a b a b a b a 4 a b a b a 3 b a a b a b a 2 b a a b a a b a b a Non-compact suffix trie for X = abaababa and string complexity I(X) = 24. a b a ababa 1 a ba ababa ba 4 ba 4 ababa 3 ababa 3 b a ababa 2 ba ababa 2 ba 5 ba a b a a b a b a a b a a b a b a String Complexity = # internal nodes in a non-compact suffix tree.

9 Some Simple Facts Let O(w) denote the number of times that the word w occurs in X. Then I(X) = w A min{1,o(w)}. Since between any two positions in X there is one and only one substring: w A O(w) = ( X + 1) X. 2 Hence Define: ( X + 1) X I(X) = max{0,o(w) 1}. 2 w A C n := E[ I(X) X = n]. Then C n = (n + 1)n 2 (k 1)P(O n (w) = k). w A k 2 We need to study probabilistically O n (w): that is: number of w occurrences in a text X generated a probabilistic source.

10 New Book on Pattern Matching Pattern matching techniques can offer answers to these questions and to many others, from molecular biology, to telecommunications, to classifying Twitter content. More than 100 end-of-chapter problems help the reader to make the link between theory and practice. Philippe Jacquet is a research director at INRIA, a major public research lab in Computer Science in France. He has been a major contributor to the Internet OLSR protocol for mobile networks. His research interests involve information theory, probability theory, quantum telecommunication, protocol design, performance evaluation and optimization, and the analysis of algorithms. Since 2012 he has been with Alcatel-Lucent Bell Labs as head of the department of Mathematics of Dynamic Networks and Information. Jacquet is a member of the prestigious French Corps des Mines, known for excellence in French industry, with the rank of Ingenieur General. He is also a member of ACM and IEEE : Jacquet & Szpankowski: PPC: C M Y K Wojciech Szpankowski is Saul Rosen Professor of Computer Science and (by courtesy) Electrical and Computer Engineering at Purdue University, where he teaches and conducts research in analysis of algorithms, information theory, bioinformatics, analytic combinatorics, random structures, and stability problems of distributed systems. In 2008 he launched the interdisciplinary Institute for Science of Information, and in 2010 he became the Director of the newly established NSF Science and Technology Center for Science of Information. Szpankowski is a Fellow of IEEE and an Erskine Fellow. He received the Humboldt Research Award in ISBN Cover design: Andrew Ward > Analytic Pattern Matching This book for researchers and graduate students demonstrates the probabilistic approach to pattern matching, which predicts the performance of pattern matching algorithms with very high precision using analytic combinatorics and analytic information theory. Part I compiles known results of pattern matching problems via analytic methods. Part II focuses on applications to various data structures on words, such as digital trees, suffix trees, string complexity and string-based data compression. The authors use results and techniques from Part I and also introduce new methodology such as the Mellin transform and analytic depoissonization. Jacquet and Szpankowski How do you distinguish a cat from a dog by their DNA? Did Shakespeare really write all of his plays? Philippe Jacquet and Wojciech Szpankowski Analytic Pattern Matching From DNA to Twitter

11 Book Contents Chapter 1: Probabilistic Models Chapter 2: Exact String Matching Chapter 3: Constrained Exact String Matching Chapter 4: Generalized String Matching Chapter 5: Subsequence String Matching Chapter 6: Algorithms and Data Structures Chapter 7: Digital Trees Chapter 8: Suffix Trees & Lempel-Ziv 77 Chapter 9: Lempel-Ziv 78 Compression Algorithm Chapter 10: String Complexity

12 Some Results Last expression allows us to write C n = (n + 1)n 2 + E[S n ] E[L n ] where E[S n ] and E[L n ] are, respectively, the average size and path length in the associated (compact) suffix trees. We know that E[S n ] = 1 (n + Ψ(logn)) + o(n), h E[L n ] = nlogn h + nψ 2 (logn) + o(n), where Ψ(logn) and Ψ 2 (logn) are periodic functions (when the logp a, a A are rationally related), and where h is the entropy rate. Therefore, C n = (n + 1)n 2 n h (logn 1 + Q 0(logn) + o(1)) where Q 0 (x) is a periodic function.

13 Theorem from 2004 Proved with Bare-Hands In 2004 Svante, Stefano and I published the first result of this kind for a symmetric memoryless source (all symbol probabilities are the same). Theorem 1 (Janson, Lonardi, W.S., 2004). Let C n be the string complexity for an unbiased memoryless source over alphabet A. Then E(C n ) = ( n + 1 ) ( 1 nlog A n γ ) ln A + ϕ A (log A n) n+o( nlogn) where γ is Euler s constant and ϕ A (x) = 1 ln A Γ j 0 ( 1 2πij ) e 2πijx ln A is a continuous function with period 1. ϕ A (x) is very small for small A : ϕ 2 (x) < , ϕ 3 (x) < , ϕ 4 (x) <

14 Outline 1. Working with Svante 2. String Complexity 3. Joint String Complexity

15 Joint String Complexity For X and Y, let J(X,Y ) be the set of common words between X and Y. The joint string complexity is J(X,Y) = I(X) I(Y ) Example. If X = aabaa and Y = abbba, then J(X,Y) = {ε,a,b,ab,ba}. Goal. Estimate J n,m = E[ J(X,Y) ] when X = n and Y = m.

16 Joint String Complexity For X and Y, let J(X,Y ) be the set of common words between X and Y. The joint string complexity is J(X,Y) = I(X) I(Y ) Example. If X = aabaa and Y = abbba, then J(X,Y) = {ε,a,b,ab,ba}. Goal. Estimate J n,m = E[ J(X,Y) ] when X = n and Y = m. Some Observations. For any word w A J(X,Y) = w A min{1,o X (w)} min{1,o Y (w)}. When X = n and Y = m, we have J n,m = E[ J(X,Y ) ] 1 = w A {ε} P(O 1 n (w) 1)P(O2 m (w) 1) where O i n (w) is the number of w-occurrences in a string of generated by source i = 1, 2 (i.e., X and Y ) which we assume to be memoryless sources.

17 Independent Joint String Complexity Consider two sets of n independently generated (memoryless) strings. Let Ω i n (w) be the number of strings for which w is a prefix when the n strings are generated by a source i = 1,2 define C n,m = w A {ε} Theorem 2. There exists ε > 0 such that for large n. P(Ω 1 n (w) 1)P(Ω2 m (w) 1) J n,m C n,m = O(min{n,m} ε )

18 Independent Joint String Complexity Consider two sets of n independently generated (memoryless) strings. Let Ω i n (w) be the number of strings for which w is a prefix when the n strings are generated by a source i = 1,2 define C n,m = w A {ε} Theorem 2. There exists ε > 0 such that for large n. P(Ω 1 n (w) 1)P(Ω2 m (w) 1) J n,m C n,m = O(min{n,m} ε ) Recurrence for C n,m C n,m = 1+ a A k,l 0 with C 0,m = C n,0 = 0. ( n 1 (a) k)p k (1 P 1 (a)) n k( m) P 2 (a) l (1 P 2 (a)) m l C k,l l

19 Generating Functions, Mellin Transform, DePoissonization... Poisson Transform. The Poisson transform C(z 1,z 2 ) of C n,m is C(z 1,z 2 ) = n,m 0 C n,m z n 1 zm 2 n!m! e z 1 z 2. which becomes the functional equation after summing up the recurrence: C(z 1,z 2 ) = (1 e z 1 )(1 e z 2 ) + a AC (P 1 (a)z 1,P 2 (a)z 2 ). Clearly, n!m!c n,m = [z n 1 ][zm 2 ]C(z 1,z 2 )e z 1 +z 2.

20 Generating Functions, Mellin Transform, DePoissonization... Poisson Transform. The Poisson transform C(z 1,z 2 ) of C n,m is C(z 1,z 2 ) = n,m 0 C n,m z n 1 zm 2 n!m! e z 1 z 2. which becomes the functional equation after summing up the recurrence: C(z 1,z 2 ) = (1 e z 1 )(1 e z 2 ) + a AC (P 1 (a)z 1,P 2 (a)z 2 ). Clearly, n!m!c n,m = [z n 1 ][zm 2 ]C(z 1,z 2 )e z 1 +z 2. Mellin Transform. Two dimensional Mellin transform is defined as C (s 1,s 2 ) = 0 0 C(z 1,z 2 )z s z s dz 1 dz 2. From the above functional equation we find for 2 < R(s i ) < 1 ( ) C 1 (s 1,s 2 ) = Γ(s 1 )Γ(s 2 ) H(s 1,s 2 ) + s 1 H( 1,s 2 ) + s 2 H(s 1, 1) + s 1 s 2 H( 1, 1) where the kernel is defined as H(s 1,s 2 ) = 1 a A(P 1 (a)) s 1 (P 2 (a)) s 2.

21 Finding C n,n Here we only consider m = n and z 1 = z 2 = z. To recover C n,n we first find the inverse Mellin C(z,z) = 1 (2iπ) 2 R(s 1 )=c 1 R(s 2 )=c 2 C (s 1,s 2 )z s 1 s 2 ds 1 ds 2 which turns out to be C(z,z) = ( ) 1 2 2iπ R(s 1 )=ρ 1 R(s 2 )=ρ 2 Γ(s 1 )Γ(s 2 ) H(s 1,s 2 ) z s 1 s 2 ds 1 ds 2 + o(z M ), for any M > 0 as z in a cone around the real axis. The final step to recover C n,n C(n,n) is to apply the two-dimensional depoissonization.

22 Main Results Assume that a A we have P 1 (a) = P 2 (a) = p a. Theorem 3. For a biased memoryless source, the joint complexity is asymptotically C n,n = n 2log2 + Q(logn)n + o(n), h where Q(x) is a small periodic function (with amplitude smaller than 10 6 ) which is nonzero only when the logp a, a A, are rationally related, that is, logp a /logp b Q. Assume that P 1 (a) P 2 (a). Theorem 4. Define κ = min (s1,s 2 ) K R 2 {( s 1 s 2 )} < 1, where s 1 and s 2 are roots of H(s 1,s 2 ) = 1 a A(P 1 (a)) s 1 (P 2 (a)) s 2 = 0. Then ( ) C n,n = nκ Γ(c 1 )Γ(c 2 ) logn π H(c1,c 2 ) H(c 1,c 2 ) + Q(logn) + O(1/logn), where Q is a double periodic function.

23 Very Brief Sketch of Proof 1. Set P 1 (a) = 1/ A and then the kernel is H(s 1,s 2 ) = 1 A s 1 a Ap s 2 a. Define r(s 2 ) = a A ps 2 a and L(s 2 ) = log A r(s 2 ). 2. Roots of H(s 1,s 2 ) = 0 are which are poles of C(z,z) leading to C(z,z) 1 2iπlog A s 1 = log A (r(s 2 )) + 2ikπ log( A ) R(s)=c 2 Integrating over s = s 2 requires the saddle point method. k Γ ( L(s) + 2ikπ ) Γ(s)z L(s) s 2ikπ/log( A ) ds log( A )

24 Saddle Point 3. The function L(s) s achieves it minimum at c 2 =: ρ is the dominant real saddle point. But there is more... ρ + i 2π j logr, j logn Infinitely Many Saddle Points: ρ + i 2π logr ρ + i 2π log r 1 3a. L(c 2 + it) is a periodic function with period 2πlogν. ρ i 2π logr 3b The saddle points are at c 2 + 2πil/logν. ρ i ρ i 2π logr 2π j logr, j logn 3c. The infinite saddle points defines the fluctuating function Q. 4. The growth of C(z,z) is defined by z L(c 2 ) c 2 = z κ where κ = min s R {log A (r(s)) s}, c 2 = minarg s R {log A (r(s)) s}, where here s = s 2, and recall L(s 2 ) = log A r(s 2 ). The factor 1/ logn comes from the saddle point approximation. completes the sketch. This

25 Classification of Sources The growth of C n,n is: Θ(n) for identical sources; Θ(n κ / logn) for nonidential sources with κ < 1. (a) (b) Figure 1: Joint complexity: (a) English text vs French, Greek, Polish, and Finnish texts; (b) real and simulated texts (3rd Markov order) of English, French, Greek, Polish and Finnish language.

26 That s It THANK YOU, SVANTE!

Analytic Pattern Matching: From DNA to Twitter. AxA Workshop, Venice, 2016 Dedicated to Alberto Apostolico

Analytic Pattern Matching: From DNA to Twitter. AxA Workshop, Venice, 2016 Dedicated to Alberto Apostolico Analytic Pattern Matching: From DNA to Twitter Wojciech Szpankowski Purdue University W. Lafayette, IN 47907 June 19, 2016 AxA Workshop, Venice, 2016 Dedicated to Alberto Apostolico Joint work with Philippe

More information

Analytic Pattern Matching: From DNA to Twitter. Combinatorial Pattern Matching, Ischia Island, 2015

Analytic Pattern Matching: From DNA to Twitter. Combinatorial Pattern Matching, Ischia Island, 2015 Analytic Pattern Matching: From DNA to Twitter Wojciech Szpankowski Purdue University W. Lafayette, IN 47907 June 26, 2015 Combinatorial Pattern Matching, Ischia Island, 2015 Joint work with Philippe Jacquet

More information

Analytic Pattern Matching: From DNA to Twitter. Information Theory, Learning, and Big Data, Berkeley, 2015

Analytic Pattern Matching: From DNA to Twitter. Information Theory, Learning, and Big Data, Berkeley, 2015 Analytic Pattern Matching: From DNA to Twitter Wojciech Szpankowski Purdue University W. Lafayette, IN 47907 March 10, 2015 Information Theory, Learning, and Big Data, Berkeley, 2015 Jont work with Philippe

More information

A Note on a Problem Posed by D. E. Knuth on a Satisfiability Recurrence

A Note on a Problem Posed by D. E. Knuth on a Satisfiability Recurrence A Note on a Problem Posed by D. E. Knuth on a Satisfiability Recurrence Dedicated to Philippe Flajolet 1948-2011 July 4, 2013 Philippe Jacquet Charles Knessl 1 Wojciech Szpankowski 2 Bell Labs Dept. Math.

More information

Multiple Choice Tries and Distributed Hash Tables

Multiple Choice Tries and Distributed Hash Tables Multiple Choice Tries and Distributed Hash Tables Luc Devroye and Gabor Lugosi and Gahyun Park and W. Szpankowski January 3, 2007 McGill University, Montreal, Canada U. Pompeu Fabra, Barcelona, Spain U.

More information

Compact Suffix Trees Resemble Patricia Tries: Limiting Distribution of Depth

Compact Suffix Trees Resemble Patricia Tries: Limiting Distribution of Depth Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 1992 Compact Suffix Trees Resemble Patricia Tries: Limiting Distribution of Depth Philippe

More information

Analytic Information Theory: From Shannon to Knuth and Back. Knuth80: Piteaa, Sweden, 2018 Dedicated to Don E. Knuth

Analytic Information Theory: From Shannon to Knuth and Back. Knuth80: Piteaa, Sweden, 2018 Dedicated to Don E. Knuth Analytic Information Theory: From Shannon to Knuth and Back Wojciech Szpankowski Center for Science of Information Purdue University January 7, 2018 Knuth80: Piteaa, Sweden, 2018 Dedicated to Don E. Knuth

More information

A Master Theorem for Discrete Divide and Conquer Recurrences. Dedicated to PHILIPPE FLAJOLET

A Master Theorem for Discrete Divide and Conquer Recurrences. Dedicated to PHILIPPE FLAJOLET A Master Theorem for Discrete Divide and Conquer Recurrences Wojciech Szpankowski Department of Computer Science Purdue University U.S.A. September 6, 2012 Uniwersytet Jagieloński, Kraków, 2012 Dedicated

More information

Digital Trees and Memoryless Sources: from Arithmetics to Analysis

Digital Trees and Memoryless Sources: from Arithmetics to Analysis Digital Trees and Memoryless Sources: from Arithmetics to Analysis Philippe Flajolet, Mathieu Roux, Brigitte Vallée AofA 2010, Wien 1 What is a digital tree, aka TRIE? = a data structure for dynamic dictionaries

More information

A Master Theorem for Discrete Divide and Conquer Recurrences

A Master Theorem for Discrete Divide and Conquer Recurrences A Master Theorem for Discrete Divide and Conquer Recurrences Wojciech Szpankowski Department of Computer Science Purdue University W. Lafayette, IN 47907 January 20, 2011 NSF CSoI SODA, 2011 Research supported

More information

From the Discrete to the Continuous, and Back... Philippe Flajolet INRIA, France

From the Discrete to the Continuous, and Back... Philippe Flajolet INRIA, France From the Discrete to the Continuous, and Back... Philippe Flajolet INRIA, France 1 Discrete structures in... Combinatorial mathematics Computer science: data structures & algorithms Information and communication

More information

Partial Fillup and Search Time in LC Tries

Partial Fillup and Search Time in LC Tries Partial Fillup and Search Time in LC Tries August 17, 2006 Svante Janson Wociech Szpankowski Department of Mathematics Department of Computer Science Uppsala University, P.O. Box 480 Purdue University

More information

arxiv:cs/ v1 [cs.ds] 6 Oct 2005

arxiv:cs/ v1 [cs.ds] 6 Oct 2005 Partial Fillup and Search Time in LC Tries December 27, 2017 arxiv:cs/0510017v1 [cs.ds] 6 Oct 2005 Svante Janson Wociech Szpankowski Department of Mathematics Department of Computer Science Uppsala University,

More information

Pattern Matching in Constrained Sequences

Pattern Matching in Constrained Sequences Pattern Matching in Constrained Sequences Yongwook Choi and Wojciech Szpankowski Department of Computer Science Purdue University W. Lafayette, IN 47907 U.S.A. Email: ywchoi@purdue.edu, spa@cs.purdue.edu

More information

Analysis of the Multiplicity Matching Parameter in Suffix Trees

Analysis of the Multiplicity Matching Parameter in Suffix Trees Analysis of the Multiplicity Matching Parameter in Suffix Trees Mark Daniel Ward 1 and Wojciech Szpankoski 2 1 Department of Mathematics, Purdue University, West Lafayette, IN, USA. mard@math.purdue.edu

More information

Probabilistic analysis of the asymmetric digital search trees

Probabilistic analysis of the asymmetric digital search trees Int. J. Nonlinear Anal. Appl. 6 2015 No. 2, 161-173 ISSN: 2008-6822 electronic http://dx.doi.org/10.22075/ijnaa.2015.266 Probabilistic analysis of the asymmetric digital search trees R. Kazemi a,, M. Q.

More information

Counting Markov Types

Counting Markov Types Counting Markov Types Philippe Jacquet, Charles Knessl, Wojciech Szpankowski To cite this version: Philippe Jacquet, Charles Knessl, Wojciech Szpankowski. Counting Markov Types. Drmota, Michael and Gittenberger,

More information

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria Source Coding Master Universitario en Ingeniería de Telecomunicación I. Santamaría Universidad de Cantabria Contents Introduction Asymptotic Equipartition Property Optimal Codes (Huffman Coding) Universal

More information

Analysis of the multiplicity matching parameter in suffix trees

Analysis of the multiplicity matching parameter in suffix trees 2005 International Conference on Analysis of Algorithms DMTCS proc. AD, 2005, 307 322 Analysis of the multiplicity matching parameter in suffix trees Mark Daniel Ward and Wojciech Szpankoski 2 Department

More information

NUMBER OF SYMBOL COMPARISONS IN QUICKSORT

NUMBER OF SYMBOL COMPARISONS IN QUICKSORT NUMBER OF SYMBOL COMPARISONS IN QUICKSORT Brigitte Vallée (CNRS and Université de Caen, France) Joint work with Julien Clément, Jim Fill and Philippe Flajolet Plan of the talk. Presentation of the study

More information

Minimax Redundancy for Large Alphabets by Analytic Methods

Minimax Redundancy for Large Alphabets by Analytic Methods Minimax Redundancy for Large Alphabets by Analytic Methods Wojciech Szpankowski Purdue University W. Lafayette, IN 47907 March 15, 2012 NSF STC Center for Science of Information CISS, Princeton 2012 Joint

More information

NUMBER OF SYMBOL COMPARISONS IN QUICKSORT AND QUICKSELECT

NUMBER OF SYMBOL COMPARISONS IN QUICKSORT AND QUICKSELECT NUMBER OF SYMBOL COMPARISONS IN QUICKSORT AND QUICKSELECT Brigitte Vallée (CNRS and Université de Caen, France) Joint work with Julien Clément, Jim Fill and Philippe Flajolet Plan of the talk. Presentation

More information

Asymmetric Rényi Problem

Asymmetric Rényi Problem Asymmetric Rényi Problem July 7, 2015 Abram Magner and Michael Drmota and Wojciech Szpankowski Abstract In 1960 Rényi in his Michigan State University lectures asked for the number of random queries necessary

More information

Asymptotic and Exact Poissonized Variance in the Analysis of Random Digital Trees (joint with Hsien-Kuei Hwang and Vytas Zacharovas)

Asymptotic and Exact Poissonized Variance in the Analysis of Random Digital Trees (joint with Hsien-Kuei Hwang and Vytas Zacharovas) Asymptotic and Exact Poissonized Variance in the Analysis of Random Digital Trees (joint with Hsien-Kuei Hwang and Vytas Zacharovas) Michael Fuchs Institute of Applied Mathematics National Chiao Tung University

More information

RENEWAL THEORY IN ANALYSIS OF TRIES AND STRINGS: EXTENDED ABSTRACT

RENEWAL THEORY IN ANALYSIS OF TRIES AND STRINGS: EXTENDED ABSTRACT RENEWAL THEORY IN ANALYSIS OF TRIES AND STRINGS: EXTENDED ABSTRACT SVANTE JANSON Abstract. We give a survey of a number of simple applications of renewal theory to problems on random strings, in particular

More information

The Moments of the Profile in Random Binary Digital Trees

The Moments of the Profile in Random Binary Digital Trees Journal of mathematics and computer science 6(2013)176-190 The Moments of the Profile in Random Binary Digital Trees Ramin Kazemi and Saeid Delavar Department of Statistics, Imam Khomeini International

More information

A New Binomial Recurrence Arising in a Graphical Compression Algorithm

A New Binomial Recurrence Arising in a Graphical Compression Algorithm A New Binomial Recurrence Arising in a Graphical Compression Algorithm Yongwoo Choi, Charles Knessl, Wojciech Szpanowsi To cite this version: Yongwoo Choi, Charles Knessl, Wojciech Szpanowsi. A New Binomial

More information

AVERAGE-CASE ANALYSIS OF COUSINS IN m-ary TRIES

AVERAGE-CASE ANALYSIS OF COUSINS IN m-ary TRIES J. Appl. Prob. 45, 888 9 (28) Printed in England Applied Probability Trust 28 AVERAGE-CASE ANALYSIS OF COUSINS IN m-ary TRIES HOSAM M. MAHMOUD, The George Washington University MARK DANIEL WARD, Purdue

More information

Approximate Counting via the Poisson-Laplace-Mellin Method (joint with Chung-Kuei Lee and Helmut Prodinger)

Approximate Counting via the Poisson-Laplace-Mellin Method (joint with Chung-Kuei Lee and Helmut Prodinger) Approximate Counting via the Poisson-Laplace-Mellin Method (joint with Chung-Kuei Lee and Helmut Prodinger) Michael Fuchs Department of Applied Mathematics National Chiao Tung University Hsinchu, Taiwan

More information

A One-to-One Code and Its Anti-Redundancy

A One-to-One Code and Its Anti-Redundancy A One-to-One Code and Its Anti-Redundancy W. Szpankowski Department of Computer Science, Purdue University July 4, 2005 This research is supported by NSF, NSA and NIH. Outline of the Talk. Prefix Codes

More information

Digital search trees JASS

Digital search trees JASS Digital search trees Analysis of different digital trees with Rice s integrals. JASS Nicolai v. Hoyningen-Huene 28.3.2004 28.3.2004 JASS 04 - Digital search trees 1 content Tree Digital search tree: Definition

More information

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007. Proofs, Strings, and Finite Automata CS154 Chris Pollett Feb 5, 2007. Outline Proofs and Proof Strategies Strings Finding proofs Example: For every graph G, the sum of the degrees of all the nodes in G

More information

Analytic Depoissonization and Its Applications

Analytic Depoissonization and Its Applications Analytic Depoissonization and Its Applications W. Szpankowski Department of Computer Science Purdue University W. Lafayette, IN 47907 April 29, 2008 AofA and IT logos This research is supported by NSF

More information

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for MARKOV CHAINS A finite state Markov chain is a sequence S 0,S 1,... of discrete cv s from a finite alphabet S where q 0 (s) is a pmf on S 0 and for n 1, Q(s s ) = Pr(S n =s S n 1 =s ) = Pr(S n =s S n 1

More information

Philippe Flajolet & Analytic Combinatorics: Inherent Ambiguity of Context-Free Languages

Philippe Flajolet & Analytic Combinatorics: Inherent Ambiguity of Context-Free Languages Philippe Flajolet & Analytic Combinatorics: Inherent Ambiguity of Context-Free Languages Frédérique Bassino and Cyril Nicaud LIGM, Université Paris-Est & CNRS December 16, 2011 I first met Philippe in

More information

Analytic Algorithmics, Combinatorics, and Information Theory

Analytic Algorithmics, Combinatorics, and Information Theory Analytic Algorithmics, Combinatorics, and Information Theory W. Szpankowski Department of Computer Science Purdue University W. Lafayette, IN 47907 September 11, 2006 AofA and IT logos Research supported

More information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information 4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk

More information

7 Asymptotics for Meromorphic Functions

7 Asymptotics for Meromorphic Functions Lecture G jacques@ucsd.edu 7 Asymptotics for Meromorphic Functions Hadamard s Theorem gives a broad description of the exponential growth of coefficients in power series, but the notion of exponential

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

PROBABILISTIC BEHAVIOR OF ASYMMETRIC LEVEL COMPRESSED TRIES

PROBABILISTIC BEHAVIOR OF ASYMMETRIC LEVEL COMPRESSED TRIES PROBABILISTIC BEAVIOR OF ASYMMETRIC LEVEL COMPRESSED TRIES Luc Devroye Wojcieh Szpankowski School of Computer Science Department of Computer Sciences McGill University Purdue University 3450 University

More information

UNIT I INFORMATION THEORY. I k log 2

UNIT I INFORMATION THEORY. I k log 2 UNIT I INFORMATION THEORY Claude Shannon 1916-2001 Creator of Information Theory, lays the foundation for implementing logic in digital circuits as part of his Masters Thesis! (1939) and published a paper

More information

Lecture 4 : Adaptive source coding algorithms

Lecture 4 : Adaptive source coding algorithms Lecture 4 : Adaptive source coding algorithms February 2, 28 Information Theory Outline 1. Motivation ; 2. adaptive Huffman encoding ; 3. Gallager and Knuth s method ; 4. Dictionary methods : Lempel-Ziv

More information

The Expected Profile of Digital Search Trees

The Expected Profile of Digital Search Trees The Expected Profile of Digital Search Trees March 24, 2011 Michael Drmota Wojciech Szpankowski Inst. für Diskrete Mathematik und Geometrie Department of Computer Science TU Wien Purdue University A-1040

More information

On universal types. Gadiel Seroussi Information Theory Research HP Laboratories Palo Alto HPL September 6, 2004*

On universal types. Gadiel Seroussi Information Theory Research HP Laboratories Palo Alto HPL September 6, 2004* On universal types Gadiel Seroussi Information Theory Research HP Laboratories Palo Alto HPL-2004-153 September 6, 2004* E-mail: gadiel.seroussi@hp.com method of types, type classes, Lempel-Ziv coding,

More information

Sturmian Words, Sturmian Trees and Sturmian Graphs

Sturmian Words, Sturmian Trees and Sturmian Graphs Sturmian Words, Sturmian Trees and Sturmian Graphs A Survey of Some Recent Results Jean Berstel Institut Gaspard-Monge, Université Paris-Est CAI 2007, Thessaloniki Jean Berstel (IGM) Survey on Sturm CAI

More information

Analytic Pattern Matching: From DNA to Twitter. Philippe Jacquet and Wojciech Szpankowski

Analytic Pattern Matching: From DNA to Twitter. Philippe Jacquet and Wojciech Szpankowski Analytic Pattern Matching: From DNA to Twitter Philippe Jacquet and Wojciech Szpankowski May 11, 2015 Dedicated to Philippe Flajolet, our mentor and friend. Contents Foreword...................................

More information

Variable-to-Variable Codes with Small Redundancy Rates

Variable-to-Variable Codes with Small Redundancy Rates Variable-to-Variable Codes with Small Redundancy Rates M. Drmota W. Szpankowski September 25, 2004 This research is supported by NSF, NSA and NIH. Institut f. Diskrete Mathematik und Geometrie, TU Wien,

More information

Exploring String Patterns with Trees

Exploring String Patterns with Trees Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 11 Exploring String Patterns with Trees Deborah Sutton Purdue University, dasutton@purdue.edu Booi Yee Wong Purdue University, wongb@purdue.edu

More information

Entropy Coding. Connectivity coding. Entropy coding. Definitions. Lossles coder. Input: a set of symbols Output: bitstream. Idea

Entropy Coding. Connectivity coding. Entropy coding. Definitions. Lossles coder. Input: a set of symbols Output: bitstream. Idea Connectivity coding Entropy Coding dd 7, dd 6, dd 7, dd 5,... TG output... CRRRLSLECRRE Entropy coder output Connectivity data Edgebreaker output Digital Geometry Processing - Spring 8, Technion Digital

More information

arxiv: v1 [math.pr] 5 Nov 2017

arxiv: v1 [math.pr] 5 Nov 2017 Asymmetric Rényi Problem arxiv:1711.01528v1 [math.pr] 5 Nov 2017 M. DRMOTA 1 and A. MAGNER 2 and W. SZPANKOWSKI 1 Institute for Discrete Mathematics and Geometry, TU Wien, Vienna, Austria, A -1040 Wien,

More information

Analytic Information Theory and Beyond

Analytic Information Theory and Beyond Analytic Information Theory and Beyond W. Szpankowski Department of Computer Science Purdue University W. Lafayette, IN 47907 May 12, 2010 AofA and IT logos Frankfurt 2010 Research supported by NSF Science

More information

Analytic Depoissonization and Its Applications Λ

Analytic Depoissonization and Its Applications Λ Analytic Depoissonization and Its Applications Λ W. Szpankowski y Department of Computer Science Purdue University W. Lafayette, IN 47907 May 31, 2003 Λ This research is supported by NSF and NIH. y Joint

More information

A Mathematical Theory of Communication

A Mathematical Theory of Communication A Mathematical Theory of Communication Ben Eggers Abstract This paper defines information-theoretic entropy and proves some elementary results about it. Notably, we prove that given a few basic assumptions

More information

Is there an Elegant Universal Theory of Prediction?

Is there an Elegant Universal Theory of Prediction? Is there an Elegant Universal Theory of Prediction? Shane Legg Dalle Molle Institute for Artificial Intelligence Manno-Lugano Switzerland 17th International Conference on Algorithmic Learning Theory Is

More information

COLLECTED WORKS OF PHILIPPE FLAJOLET

COLLECTED WORKS OF PHILIPPE FLAJOLET COLLECTED WORKS OF PHILIPPE FLAJOLET Editorial Committee: HSIEN-KUEI HWANG Institute of Statistical Science Academia Sinica Taipei 115 Taiwan ROBERT SEDGEWICK Department of Computer Science Princeton University

More information

Universal Loseless Compression: Context Tree Weighting(CTW)

Universal Loseless Compression: Context Tree Weighting(CTW) Universal Loseless Compression: Context Tree Weighting(CTW) Dept. Electrical Engineering, Stanford University Dec 9, 2014 Universal Coding with Model Classes Traditional Shannon theory assume a (probabilistic)

More information

On Buffon Machines & Numbers

On Buffon Machines & Numbers On Buffon Machines & Numbers Philippe Flajolet, Maryse Pelletier, Michèle Soria AofA 09, Fréjus --- June 2009 [INRIA-Rocquencourt & LIP6, Paris] 1 1733: Countess Buffon drops her knitting kit on the floor.

More information

Phase Transitions in a Sequence-Structure Channel. ITA, San Diego, 2015

Phase Transitions in a Sequence-Structure Channel. ITA, San Diego, 2015 Phase Transitions in a Sequence-Structure Channel A. Magner and D. Kihara and W. Szpankowski Purdue University W. Lafayette, IN 47907 January 29, 2015 ITA, San Diego, 2015 Structural Information Information

More information

A Master Theorem for Discrete Divide and Conquer Recurrences

A Master Theorem for Discrete Divide and Conquer Recurrences A Master Theorem for Discrete Divide and Conquer Recurrences MICHAEL DRMOTA and WOJCIECH SZPANKOWSKI TU Wien and Purdue University Divide-and-conquer recurrences are one of the most studied equations in

More information

Common Information. Abbas El Gamal. Stanford University. Viterbi Lecture, USC, April 2014

Common Information. Abbas El Gamal. Stanford University. Viterbi Lecture, USC, April 2014 Common Information Abbas El Gamal Stanford University Viterbi Lecture, USC, April 2014 Andrew Viterbi s Fabulous Formula, IEEE Spectrum, 2010 El Gamal (Stanford University) Disclaimer Viterbi Lecture 2

More information

the electronic journal of combinatorics 4 (997), #R7 2 (unbiased) coin tossing process by Prodinger [9] (cf. also []), who provided the rst non-trivia

the electronic journal of combinatorics 4 (997), #R7 2 (unbiased) coin tossing process by Prodinger [9] (cf. also []), who provided the rst non-trivia ANALYSIS OF AN ASYMMETRIC LEADER ELECTION ALGORITHM Svante Janson Wojciech Szpankowski Department of Mathematics Department of Computer Science Uppsala University Purdue University 756 Uppsala W. Lafayette,

More information

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H. Problem sheet Ex. Verify that the function H(p,..., p n ) = k p k log p k satisfies all 8 axioms on H. Ex. (Not to be handed in). looking at the notes). List as many of the 8 axioms as you can, (without

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Chapter 2: Source coding

Chapter 2: Source coding Chapter 2: meghdadi@ensil.unilim.fr University of Limoges Chapter 2: Entropy of Markov Source Chapter 2: Entropy of Markov Source Markov model for information sources Given the present, the future is independent

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lecturer: Stefanos Zafeiriou Goal (Lectures): To present discrete and continuous valued probabilistic linear dynamical systems (HMMs

More information

The number of runs in Sturmian words

The number of runs in Sturmian words The number of runs in Sturmian words Paweł Baturo 1, Marcin Piatkowski 1, and Wojciech Rytter 2,1 1 Department of Mathematics and Computer Science, Copernicus University 2 Institute of Informatics, Warsaw

More information

lossless, optimal compressor

lossless, optimal compressor 6. Variable-length Lossless Compression The principal engineering goal of compression is to represent a given sequence a, a 2,..., a n produced by a source as a sequence of bits of minimal possible length.

More information

Introduction Consider a group of n people (users, computers, objects, etc.) sharing a scarce resource (e.g., channel, CPU, etc.). The following elimin

Introduction Consider a group of n people (users, computers, objects, etc.) sharing a scarce resource (e.g., channel, CPU, etc.). The following elimin ANALYSIS OF AN ASYMMETRIC LEADER ELECTION ALGORITHM September 3, 996 Svante Janson Wojciech Szpankowski Department of Mathematics Department of Computer Science Uppsala University Purdue University 756

More information

Streaming and communication complexity of Hamming distance

Streaming and communication complexity of Hamming distance Streaming and communication complexity of Hamming distance Tatiana Starikovskaya IRIF, Université Paris-Diderot (Joint work with Raphaël Clifford, ICALP 16) Approximate pattern matching Problem Pattern

More information

Information in Biology

Information in Biology Information in Biology CRI - Centre de Recherches Interdisciplinaires, Paris May 2012 Information processing is an essential part of Life. Thinking about it in quantitative terms may is useful. 1 Living

More information

1 Background on Information Theory

1 Background on Information Theory Review of the book Information Theory: Coding Theorems for Discrete Memoryless Systems by Imre Csiszár and János Körner Second Edition Cambridge University Press, 2011 ISBN:978-0-521-19681-9 Review by

More information

X 1 : X Table 1: Y = X X 2

X 1 : X Table 1: Y = X X 2 ECE 534: Elements of Information Theory, Fall 200 Homework 3 Solutions (ALL DUE to Kenneth S. Palacio Baus) December, 200. Problem 5.20. Multiple access (a) Find the capacity region for the multiple-access

More information

Multimedia Communications. Mathematical Preliminaries for Lossless Compression

Multimedia Communications. Mathematical Preliminaries for Lossless Compression Multimedia Communications Mathematical Preliminaries for Lossless Compression What we will see in this chapter Definition of information and entropy Modeling a data source Definition of coding and when

More information

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols.

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols. Universal Lossless coding Lempel-Ziv Coding Basic principles of lossless compression Historical review Variable-length-to-block coding Lempel-Ziv coding 1 Basic Principles of Lossless Coding 1. Exploit

More information

Estimating Statistics on Words Using Ambiguous Descriptions

Estimating Statistics on Words Using Ambiguous Descriptions Estimating Statistics on Words Using Ambiguous Descriptions Cyril Nicaud Université Paris-Est, LIGM (UMR 8049), F77454 Marne-la-Vallée, France cyril.nicaud@u-pem.fr Abstract In this article we propose

More information

Linear Classifiers (Kernels)

Linear Classifiers (Kernels) Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers (Kernels) Blaine Nelson, Christoph Sawade, Tobias Scheffer Exam Dates & Course Conclusion There are 2 Exam dates: Feb 20 th March

More information

LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES

LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES Luc Devroye School of Computer Science McGill University Montreal, Canada H3A 2K6 luc@csmcgillca June 25, 2001 Abstract We

More information

Information in Biology

Information in Biology Lecture 3: Information in Biology Tsvi Tlusty, tsvi@unist.ac.kr Living information is carried by molecular channels Living systems I. Self-replicating information processors Environment II. III. Evolve

More information

A Representation of Approximate Self- Overlapping Word and Its Application

A Representation of Approximate Self- Overlapping Word and Its Application Purdue University Purdue e-pubs Computer Science Technical Reports Department of Computer Science 1994 A Representation of Approximate Self- Overlapping Word and Its Application Wojciech Szpankowski Purdue

More information

Networks: space-time stochastic models for networks

Networks: space-time stochastic models for networks Mobile and Delay Tolerant Networks: space-time stochastic models for networks Philippe Jacquet Alcatel-Lucent Bell Labs France Plan of the talk Short history of wireless networking Information in space-time

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission

More information

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping

Plan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics

More information

PART III. Outline. Codes and Cryptography. Sources. Optimal Codes (I) Jorge L. Villar. MAMME, Fall 2015

PART III. Outline. Codes and Cryptography. Sources. Optimal Codes (I) Jorge L. Villar. MAMME, Fall 2015 Outline Codes and Cryptography 1 Information Sources and Optimal Codes 2 Building Optimal Codes: Huffman Codes MAMME, Fall 2015 3 Shannon Entropy and Mutual Information PART III Sources Information source:

More information

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: Hidden Markov Models Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: www.ioalgorithms.info Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm

More information

arxiv: v1 [cs.ds] 15 Feb 2012

arxiv: v1 [cs.ds] 15 Feb 2012 Linear-Space Substring Range Counting over Polylogarithmic Alphabets Travis Gagie 1 and Pawe l Gawrychowski 2 1 Aalto University, Finland travis.gagie@aalto.fi 2 Max Planck Institute, Germany gawry@cs.uni.wroc.pl

More information

Combinatorics On Sturmian Words. Amy Glen. Major Review Seminar. February 27, 2004 DISCIPLINE OF PURE MATHEMATICS

Combinatorics On Sturmian Words. Amy Glen. Major Review Seminar. February 27, 2004 DISCIPLINE OF PURE MATHEMATICS Combinatorics On Sturmian Words Amy Glen Major Review Seminar February 27, 2004 DISCIPLINE OF PURE MATHEMATICS OUTLINE Finite and Infinite Words Sturmian Words Mechanical Words and Cutting Sequences Infinite

More information

On Symmetries of Non-Plane Trees in a Non-Uniform Model

On Symmetries of Non-Plane Trees in a Non-Uniform Model On Symmetries of Non-Plane Trees in a Non-Uniform Model Jacek Cichoń Abram Magner Wojciech Spankowski Krystof Turowski October 3, 26 Abstract Binary trees come in two varieties: plane trees, often simply

More information

Pattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching 1

Pattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching 1 Pattern Matching a b a c a a b 1 4 3 2 Pattern Matching 1 Outline and Reading Strings ( 9.1.1) Pattern matching algorithms Brute-force algorithm ( 9.1.2) Boyer-Moore algorithm ( 9.1.3) Knuth-Morris-Pratt

More information

Context-free Grammars and Languages

Context-free Grammars and Languages Context-free Grammars and Languages COMP 455 002, Spring 2019 Jim Anderson (modified by Nathan Otterness) 1 Context-free Grammars Context-free grammars provide another way to specify languages. Example:

More information

BuST-Bundled Suffix Trees

BuST-Bundled Suffix Trees BuST-Bundled Suffix Trees Luca Bortolussi 1, Francesco Fabris 2, and Alberto Policriti 1 1 Department of Mathematics and Informatics, University of Udine. bortolussi policriti AT dimi.uniud.it 2 Department

More information

Chapter 9 Fundamental Limits in Information Theory

Chapter 9 Fundamental Limits in Information Theory Chapter 9 Fundamental Limits in Information Theory Information Theory is the fundamental theory behind information manipulation, including data compression and data transmission. 9.1 Introduction o For

More information

1 Alphabets and Languages

1 Alphabets and Languages 1 Alphabets and Languages Look at handout 1 (inference rules for sets) and use the rules on some examples like {a} {{a}} {a} {a, b}, {a} {{a}}, {a} {{a}}, {a} {a, b}, a {{a}}, a {a, b}, a {{a}}, a {a,

More information

arxiv: v1 [cs.dm] 13 Feb 2010

arxiv: v1 [cs.dm] 13 Feb 2010 Properties of palindromes in finite words arxiv:1002.2723v1 [cs.dm] 13 Feb 2010 Mira-Cristiana ANISIU Valeriu ANISIU Zoltán KÁSA Abstract We present a method which displays all palindromes of a given length

More information

Data Compression Techniques

Data Compression Techniques Data Compression Techniques Part 1: Entropy Coding Lecture 4: Asymmetric Numeral Systems Juha Kärkkäinen 08.11.2017 1 / 19 Asymmetric Numeral Systems Asymmetric numeral systems (ANS) is a recent entropy

More information

Exam in Discrete Mathematics

Exam in Discrete Mathematics Exam in Discrete Mathematics First Year at The TEK-NAT Faculty June 10th, 2016, 9.00-13.00 This exam consists of 11 numbered pages with 16 problems. All the problems are multiple choice problems. The answers

More information

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Markus L. Schmid Trier University, Fachbereich IV Abteilung Informatikwissenschaften, D-54286 Trier, Germany, MSchmid@uni-trier.de

More information

Average Case Analysis of the Boyer-Moore Algorithm

Average Case Analysis of the Boyer-Moore Algorithm Average Case Analysis of the Boyer-Moore Algorithm TSUNG-HSI TSAI Institute of Statistical Science Academia Sinica Taipei 115 Taiwan e-mail: chonghi@stat.sinica.edu.tw URL: http://www.stat.sinica.edu.tw/chonghi/stat.htm

More information

Indexing LZ77: The Next Step in Self-Indexing. Gonzalo Navarro Department of Computer Science, University of Chile

Indexing LZ77: The Next Step in Self-Indexing. Gonzalo Navarro Department of Computer Science, University of Chile Indexing LZ77: The Next Step in Self-Indexing Gonzalo Navarro Department of Computer Science, University of Chile gnavarro@dcc.uchile.cl Part I: Why Jumping off the Cliff The Past Century Self-Indexing:

More information