String Complexity. Dedicated to Svante Janson for his 60th Birthday
1 String Complexity. Wojciech Szpankowski, Purdue University, W. Lafayette, IN. June 1, 2015. Dedicated to Svante Janson for his 60th Birthday.
2 Outline 1. Working with Svante 2. String Complexity 3. Joint String Complexity
3 Joint Papers
1. S. Janson and W. Szpankowski, Analysis of an Asymmetric Leader Election Algorithm, Electronic J. of Combinatorics, 4, R17, 1997.
2. S. Janson, S. Lonardi, and W. Szpankowski, On Average Sequence Complexity, Theoretical Computer Science, 326, 2004 (also Combinatorial Pattern Matching Conference, CPM 04, Istanbul, 2004).
3. S. Janson and W. Szpankowski, Partial Fillup and Search Time in LC Tries, ACM Trans. on Algorithms, 3, 44:1-44:14, 2007 (also ANALCO, Miami, 2006).
4. A. Magner, S. Janson, G. Kollias, and W. Szpankowski, On Symmetry of Uniform and Preferential Attachment Graphs, Electronic J. Combinatorics, 21, P3.32, 2014 (also 25th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms, AofA 14, Paris, 2014).
4 Working with Svante is easy...
6 Outline 1. Working with Svante 2. String Complexity 3. Joint String Complexity
7 Some Definitions
String complexity of a single sequence is the number of distinct substrings. Throughout, we write X for the string and denote by I(X) the set of distinct substrings of X over the alphabet A.
Example. If X = aabaa, then I(X) = {ε, a, b, aa, ab, ba, aab, aba, baa, aaba, abaa, aabaa} and |I(X)| = 12. But if X = aaaaa, then I(X) = {ε, a, aa, aaa, aaaa, aaaaa} and |I(X)| = 6.
The string complexity is the cardinality of I(X), and we study here the average string complexity
$E[\,|I(X)|\,] = \sum_{X \in \mathcal{A}^n} P(X)\,|I(X)|.$
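The definition is easy to check by brute force; here is a minimal sketch (the function name is mine, not from the talk) that enumerates all substrings, including the empty one:

```python
def string_complexity(x):
    # I(X): the set of distinct substrings of x, including the empty word
    subs = {x[i:j] for i in range(len(x) + 1) for j in range(i, len(x) + 1)}
    return len(subs)

print(string_complexity("aabaa"))  # 12
print(string_complexity("aaaaa"))  # 6
```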
8 Suffix Trees and String Complexity
[Figure: the non-compact suffix trie and the compact suffix tree built from the suffixes of X = abaababa.]
Non-compact suffix trie for X = abaababa and string complexity |I(X)| = 24.
String Complexity = # internal nodes in a non-compact suffix trie.
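The correspondence with the trie can be exercised in code: inserting every suffix into a plain (non-compact) trie creates exactly one node per distinct substring, with the root playing the role of the empty word. A small sketch (the function name is mine):

```python
def suffix_trie_nodes(x):
    # Insert every suffix of x into a non-compact trie; each trie node
    # corresponds to exactly one distinct substring (the root = empty word).
    root = {}
    count = 1  # the root
    for i in range(len(x)):
        node = root
        for c in x[i:]:
            if c not in node:
                node[c] = {}
                count += 1
            node = node[c]
    return count

print(suffix_trie_nodes("aabaa"))  # 12, matching |I(aabaa)| from the example on slide 7
```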
9 Some Simple Facts
Let O(w) denote the number of times that the word w occurs in X. Then
$|I(X)| = \sum_{w \in \mathcal{A}^*} \min\{1, O(w)\}.$
Since between any two positions in X there is one and only one substring,
$\sum_{w \in \mathcal{A}^*} O(w) = \binom{|X|+1}{2}.$
Hence
$|I(X)| = \binom{|X|+1}{2} - \sum_{w \in \mathcal{A}^*} \max\{0, O(w) - 1\}.$
Define $C_n := E[\,|I(X)|\,\big|\,|X| = n\,]$. Then
$C_n = \binom{n+1}{2} - \sum_{w \in \mathcal{A}^*} \sum_{k \ge 2} (k-1)\,P(O_n(w) = k).$
We need to study probabilistically O_n(w), the number of w occurrences in a text X generated by a probabilistic source.
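Both identities are easy to verify numerically on a short string. A sketch (the helper name is mine) that checks the total-occurrence count and the two ways of computing the complexity for X = aabaa:

```python
from collections import Counter

def occurrences(x):
    # O(w) for every non-empty substring w of x, over all (i, j) windows
    n = len(x)
    return Counter(x[i:j] for i in range(n) for j in range(i + 1, n + 1))

x = "aabaa"
O = occurrences(x)
n = len(x)

# sum_w O(w) = (n + 1) n / 2: every position pair gives exactly one occurrence
assert sum(O.values()) == n * (n + 1) // 2

# |I(X)| (without the empty word), computed two ways
distinct = sum(min(1, c) for c in O.values())
assert distinct == n * (n + 1) // 2 - sum(max(0, c - 1) for c in O.values())
print(distinct)  # 11 distinct non-empty substrings of aabaa
```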
10 New Book on Pattern Matching
Philippe Jacquet and Wojciech Szpankowski, Analytic Pattern Matching: From DNA to Twitter.
How do you distinguish a cat from a dog by their DNA? Did Shakespeare really write all of his plays? Pattern matching techniques can offer answers to these questions and to many others, from molecular biology, to telecommunications, to classifying Twitter content. This book for researchers and graduate students demonstrates the probabilistic approach to pattern matching, which predicts the performance of pattern matching algorithms with very high precision using analytic combinatorics and analytic information theory. Part I compiles known results of pattern matching problems via analytic methods. Part II focuses on applications to various data structures on words, such as digital trees, suffix trees, string complexity and string-based data compression; the authors use results and techniques from Part I and also introduce new methodology such as the Mellin transform and analytic depoissonization. More than 100 end-of-chapter problems help the reader to make the link between theory and practice.
Philippe Jacquet is a research director at INRIA, a major public research lab in Computer Science in France. He has been a major contributor to the Internet OLSR protocol for mobile networks. His research interests involve information theory, probability theory, quantum telecommunication, protocol design, performance evaluation and optimization, and the analysis of algorithms. Since 2012 he has been with Alcatel-Lucent Bell Labs as head of the department of Mathematics of Dynamic Networks and Information. Jacquet is a member of the prestigious French Corps des Mines, known for excellence in French industry, with the rank of Ingenieur General. He is also a member of ACM and IEEE.
Wojciech Szpankowski is Saul Rosen Professor of Computer Science and (by courtesy) Electrical and Computer Engineering at Purdue University, where he teaches and conducts research in analysis of algorithms, information theory, bioinformatics, analytic combinatorics, random structures, and stability problems of distributed systems. In 2008 he launched the interdisciplinary Institute for Science of Information, and in 2010 he became the Director of the newly established NSF Science and Technology Center for Science of Information. Szpankowski is a Fellow of IEEE, an Erskine Fellow, and a recipient of the Humboldt Research Award.
11 Book Contents Chapter 1: Probabilistic Models Chapter 2: Exact String Matching Chapter 3: Constrained Exact String Matching Chapter 4: Generalized String Matching Chapter 5: Subsequence String Matching Chapter 6: Algorithms and Data Structures Chapter 7: Digital Trees Chapter 8: Suffix Trees & Lempel-Ziv 77 Chapter 9: Lempel-Ziv 78 Compression Algorithm Chapter 10: String Complexity
12 Some Results
The last expression allows us to write
$C_n = \binom{n+1}{2} + E[S_n] - E[L_n],$
where E[S_n] and E[L_n] are, respectively, the average size and path length of the associated (compact) suffix tree. We know that
$E[S_n] = \frac{n}{h}\bigl(1 + \Psi(\log n)\bigr) + o(n), \qquad E[L_n] = \frac{n \log n}{h} + n\,\Psi_2(\log n) + o(n),$
where Ψ(log n) and Ψ_2(log n) are periodic functions (nonzero when the log p_a, a ∈ A, are rationally related), and where h is the entropy rate. Therefore,
$C_n = \binom{n+1}{2} - \frac{n}{h}\bigl(\log n - 1 + Q_0(\log n) + o(1)\bigr),$
where Q_0(x) is a periodic function.
13 Theorem from 2004 Proved with Bare Hands
In 2004 Svante, Stefano and I published the first result of this kind, for a symmetric memoryless source (all symbol probabilities are the same).
Theorem 1 (Janson, Lonardi, W.S., 2004). Let C_n be the string complexity for an unbiased memoryless source over the alphabet A. Then
$E(C_n) = \binom{n+1}{2} - n \log_{|\mathcal{A}|} n + \left(\frac{1-\gamma}{\ln |\mathcal{A}|} + \varphi_{\mathcal{A}}(\log_{|\mathcal{A}|} n)\right) n + o\!\left(\sqrt{n \log n}\right),$
where γ is Euler's constant and
$\varphi_{\mathcal{A}}(x) = \frac{1}{\ln |\mathcal{A}|} \sum_{j \ne 0} \Gamma\!\left(-1 - \frac{2\pi i j}{\ln |\mathcal{A}|}\right) e^{2\pi i j x}$
is a continuous function with period 1. The fluctuation φ_A(x) is very small for small alphabets |A| = 2, 3, 4.
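The leading terms are already visible in a small simulation. The sketch below (the sample size, seed, and the 1% tolerance are my own choices, not from the talk) averages the complexity of random binary strings and compares it with the first three terms of Theorem 1, ignoring the tiny fluctuation φ_2:

```python
import math
import random

def string_complexity(x):
    # number of distinct non-empty substrings of x
    return len({x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1)})

random.seed(1)
n, trials = 300, 10
avg = sum(
    string_complexity("".join(random.choice("ab") for _ in range(n)))
    for _ in range(trials)
) / trials

gamma = 0.5772156649015329  # Euler's constant
# first terms of Theorem 1 for a binary alphabet (|A| = 2)
predicted = n * (n + 1) / 2 - n * math.log2(n) + (1 - gamma) / math.log(2) * n
assert abs(avg - predicted) < 0.01 * predicted
```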
14 Outline 1. Working with Svante 2. String Complexity 3. Joint String Complexity
15 Joint String Complexity
For X and Y, let J(X,Y) = I(X) ∩ I(Y) be the set of common words between X and Y. The joint string complexity is |J(X,Y)|.
Example. If X = aabaa and Y = abbba, then J(X,Y) = {ε, a, b, ab, ba}.
Goal. Estimate $J_{n,m} = E[\,|J(X,Y)|\,]$ when |X| = n and |Y| = m.
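The definition is again easy to exercise by brute force (the function names are mine):

```python
def substrings(x):
    # all distinct substrings of x, including the empty word
    return {x[i:j] for i in range(len(x) + 1) for j in range(i, len(x) + 1)}

def joint_complexity(x, y):
    # |J(X, Y)|: number of words common to I(X) and I(Y)
    return len(substrings(x) & substrings(y))

print(joint_complexity("aabaa", "abbba"))  # 5: the set {ε, a, b, ab, ba}
```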
16 Joint String Complexity
Some Observations. Summing over all words,
$|J(X,Y)| = \sum_{w \in \mathcal{A}^*} \min\{1, O_X(w)\}\,\min\{1, O_Y(w)\}.$
When |X| = n and |Y| = m, we have
$J_{n,m} = E[\,|J(X,Y)|\,] - 1 = \sum_{w \in \mathcal{A}^* \setminus \{\varepsilon\}} P(O^1_n(w) \ge 1)\,P(O^2_m(w) \ge 1),$
where O^i_n(w) is the number of w-occurrences in a string of length n generated by source i = 1, 2 (i.e., X and Y), which we assume to be memoryless sources.
17 Independent Joint String Complexity
Consider two sets of n and m independently generated (memoryless) strings. Let Ω^i_n(w) be the number of strings for which w is a prefix when the strings are generated by source i = 1, 2, and define
$C_{n,m} = \sum_{w \in \mathcal{A}^* \setminus \{\varepsilon\}} P(\Omega^1_n(w) \ge 1)\,P(\Omega^2_m(w) \ge 1).$
Theorem 2. There exists ε > 0 such that
$J_{n,m} - C_{n,m} = O(\min\{n,m\}^{-\varepsilon})$
for large n.
18 Independent Joint String Complexity
Recurrence for C_{n,m}:
$C_{n,m} = 1 + \sum_{a \in \mathcal{A}} \sum_{k,l \ge 0} \binom{n}{k} P_1(a)^k (1-P_1(a))^{n-k} \binom{m}{l} P_2(a)^l (1-P_2(a))^{m-l}\, C_{k,l},$
with C_{0,m} = C_{n,0} = 0.
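The recurrence can be solved numerically. Note that the (k, l) = (n, m) term contains C_{n,m} itself, so the sketch below (my own implementation; the function and parameter names are hypothetical) moves that term to the left-hand side before dividing:

```python
from math import comb

def joint_tries_complexity(P1, P2, N, M):
    # Solve C_{n,m} = 1 + sum_a sum_{k,l} Bin(n,k;P1(a)) Bin(m,l;P2(a)) C_{k,l}
    # with C_{0,m} = C_{n,0} = 0, for memoryless sources P1 and P2 given as
    # lists of symbol probabilities over the same alphabet.
    C = [[0.0] * (M + 1) for _ in range(N + 1)]
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            rhs = 1.0
            self_coeff = 0.0  # coefficient of C_{n,m} on the right-hand side
            for p, q in zip(P1, P2):
                for k in range(n + 1):
                    bk = comb(n, k) * p**k * (1 - p) ** (n - k)
                    for l in range(m + 1):
                        bl = comb(m, l) * q**l * (1 - q) ** (m - l)
                        if k == n and l == m:
                            self_coeff += bk * bl
                        else:
                            rhs += bk * bl * C[k][l]
            C[n][m] = rhs / (1.0 - self_coeff)
    return C[N][M]

# For two symmetric binary sources, C_{1,1} = 1 / (1 - 1/2) = 2 by hand
print(joint_tries_complexity([0.5, 0.5], [0.5, 0.5], 1, 1))  # 2.0
```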
19 Generating Functions, Mellin Transform, Depoissonization...
Poisson Transform. The Poisson transform C(z_1, z_2) of C_{n,m} is
$C(z_1,z_2) = \sum_{n,m \ge 0} C_{n,m} \frac{z_1^n z_2^m}{n!\,m!} e^{-z_1 - z_2},$
which, after summing up the recurrence, satisfies the functional equation
$C(z_1,z_2) = (1 - e^{-z_1})(1 - e^{-z_2}) + \sum_{a \in \mathcal{A}} C(P_1(a) z_1, P_2(a) z_2).$
Clearly, $C_{n,m} = n!\,m!\,[z_1^n][z_2^m]\, C(z_1,z_2)\, e^{z_1 + z_2}.$
20 Generating Functions, Mellin Transform, Depoissonization...
Mellin Transform. The two-dimensional Mellin transform is defined as
$C^*(s_1,s_2) = \int_0^\infty \!\!\int_0^\infty C(z_1,z_2)\, z_1^{s_1-1} z_2^{s_2-1}\, dz_1\, dz_2.$
From the above functional equation we find, for $-2 < \Re(s_i) < -1$,
$C^*(s_1,s_2) = \Gamma(s_1)\Gamma(s_2)\left(\frac{1}{H(s_1,s_2)} + \frac{1}{s_1 H(-1,s_2)} + \frac{1}{s_2 H(s_1,-1)} + \frac{1}{s_1 s_2 H(-1,-1)}\right),$
where the kernel is defined as
$H(s_1,s_2) = 1 - \sum_{a \in \mathcal{A}} P_1(a)^{-s_1} P_2(a)^{-s_2}.$
21 Finding C_{n,n}
Here we only consider m = n and z_1 = z_2 = z. To recover C_{n,n} we first find the inverse Mellin transform
$C(z,z) = \frac{1}{(2i\pi)^2} \int_{\Re(s_1)=c_1} \int_{\Re(s_2)=c_2} C^*(s_1,s_2)\, z^{-s_1-s_2}\, ds_1\, ds_2,$
which turns out to be
$C(z,z) = \left(\frac{1}{2i\pi}\right)^2 \int_{\Re(s_1)=\rho_1} \int_{\Re(s_2)=\rho_2} \frac{\Gamma(s_1)\Gamma(s_2)}{H(s_1,s_2)}\, z^{-s_1-s_2}\, ds_1\, ds_2 + O(z^{-M})$
for any M > 0 as z → ∞ in a cone around the real axis. The final step, to recover $C_{n,n} \approx C(n,n)$, is to apply two-dimensional depoissonization.
22 Main Results
Assume that for all a ∈ A we have P_1(a) = P_2(a) = p_a.
Theorem 3. For a biased memoryless source, the joint complexity is asymptotically
$C_{n,n} = \frac{2\log 2}{h}\, n + Q(\log n)\, n + o(n),$
where Q(x) is a small periodic function (with amplitude smaller than 10^{-6}) which is nonzero only when the log p_a, a ∈ A, are rationally related, that is, log p_a / log p_b ∈ ℚ.
Now assume that P_1(a) ≠ P_2(a).
Theorem 4. Define
$\kappa = \min_{(s_1,s_2) \in \mathcal{K} \cap \mathbb{R}^2} \{(-s_1 - s_2)\} < 1,$
where K is the set of roots (s_1, s_2) of
$H(s_1,s_2) = 1 - \sum_{a \in \mathcal{A}} P_1(a)^{-s_1} P_2(a)^{-s_2} = 0.$
Then
$C_{n,n} = \frac{n^{\kappa}}{\sqrt{\log n}} \left( \frac{\Gamma(c_1)\,\Gamma(c_2)}{\sqrt{\pi\, \partial_{s_1} H(c_1,c_2)\, \partial_{s_2} H(c_1,c_2)}} + Q(\log n) + O(1/\log n) \right),$
where (c_1, c_2) is the dominant real saddle point and Q is a doubly periodic function.
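For the special case treated in the proof sketch (source 1 unbiased), κ reduces to the one-dimensional form $\kappa = \min_{s}\{\log_{|\mathcal{A}|} r(s) - s\}$ of slide 24; substituting t = −s gives $\min_t\{t + \log_{|\mathcal{A}|} \sum_a p_a^t\}$, which is easy to evaluate numerically. A sketch (the grid search and function name are my own choices):

```python
import math

def kappa_vs_uniform(p, A=2, steps=10_000):
    # kappa = min over real t of  t + log_A( sum_a p_a^t ), the reduction of
    # Theorem 4 when source 1 is uniform over an alphabet of size A.  The
    # function is convex and equals 1 at t = 0 and t = 1, so its minimum
    # lies in [0, 1] and a grid scan of that interval suffices.
    def f(t):
        return t + math.log(sum(q**t for q in p), A)
    return min(f(i / steps) for i in range(steps + 1))

print(abs(kappa_vs_uniform([0.5, 0.5]) - 1.0) < 1e-9)  # True: identical sources give kappa = 1
print(kappa_vs_uniform([0.9, 0.1]))                    # about 0.84, strictly below 1
```

This matches the classification on the next slide: κ = 1 (linear growth) exactly when the two sources agree, and κ < 1 otherwise.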
23 Very Brief Sketch of Proof
1. Set P_1(a) = 1/|A|; then the kernel is
$H(s_1,s_2) = 1 - |\mathcal{A}|^{s_1} \sum_{a \in \mathcal{A}} p_a^{-s_2}.$
Define $r(s_2) = \sum_{a \in \mathcal{A}} p_a^{-s_2}$ and $L(s_2) = \log_{|\mathcal{A}|} r(s_2)$.
2. The roots of H(s_1,s_2) = 0 are
$s_1 = -L(s_2) + \frac{2ik\pi}{\log |\mathcal{A}|},$
which are poles of C(z,z), leading to
$C(z,z) \approx \frac{1}{2i\pi \log |\mathcal{A}|} \int_{\Re(s)=c_2} \sum_k \Gamma\!\left(-L(s) + \frac{2ik\pi}{\log |\mathcal{A}|}\right) \Gamma(s)\, z^{L(s) - s - 2ik\pi/\log |\mathcal{A}|}\, ds.$
Integrating over s = s_2 requires the saddle point method.
24 Saddle Point
3. The function L(s) − s achieves its minimum at c_2 =: ρ, the dominant real saddle point. But there is more...
[Figure: infinitely many saddle points ρ ± 2ijπ/log r, for j up to about log n.]
3a. L(c_2 + it) is a periodic function of t with period 2π/log ν.
3b. The saddle points are at c_2 + 2πil/log ν.
3c. The infinitely many saddle points define the fluctuating function Q.
4. The growth of C(z,z) is governed by $z^{L(c_2) - c_2} = z^{\kappa}$, where
$\kappa = \min_{s \in \mathbb{R}} \{\log_{|\mathcal{A}|}(r(s)) - s\}, \qquad c_2 = \arg\min_{s \in \mathbb{R}} \{\log_{|\mathcal{A}|}(r(s)) - s\},$
with s = s_2 here and, recall, $L(s_2) = \log_{|\mathcal{A}|} r(s_2)$. The factor $1/\sqrt{\log n}$ comes from the saddle point approximation. This completes the sketch.
25 Classification of Sources
The growth of C_{n,n} is: Θ(n) for identical sources; Θ(n^κ/√(log n)) for nonidentical sources, with κ < 1.
Figure 1: Joint complexity: (a) English text vs. French, Greek, Polish, and Finnish texts; (b) real and simulated texts (3rd Markov order) of the English, French, Greek, Polish and Finnish languages.
26 That's It. THANK YOU, SVANTE!
More informationEstimating Statistics on Words Using Ambiguous Descriptions
Estimating Statistics on Words Using Ambiguous Descriptions Cyril Nicaud Université Paris-Est, LIGM (UMR 8049), F77454 Marne-la-Vallée, France cyril.nicaud@u-pem.fr Abstract In this article we propose
More informationLinear Classifiers (Kernels)
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers (Kernels) Blaine Nelson, Christoph Sawade, Tobias Scheffer Exam Dates & Course Conclusion There are 2 Exam dates: Feb 20 th March
More informationLAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES
LAWS OF LARGE NUMBERS AND TAIL INEQUALITIES FOR RANDOM TRIES AND PATRICIA TREES Luc Devroye School of Computer Science McGill University Montreal, Canada H3A 2K6 luc@csmcgillca June 25, 2001 Abstract We
More informationInformation in Biology
Lecture 3: Information in Biology Tsvi Tlusty, tsvi@unist.ac.kr Living information is carried by molecular channels Living systems I. Self-replicating information processors Environment II. III. Evolve
More informationA Representation of Approximate Self- Overlapping Word and Its Application
Purdue University Purdue e-pubs Computer Science Technical Reports Department of Computer Science 1994 A Representation of Approximate Self- Overlapping Word and Its Application Wojciech Szpankowski Purdue
More informationNetworks: space-time stochastic models for networks
Mobile and Delay Tolerant Networks: space-time stochastic models for networks Philippe Jacquet Alcatel-Lucent Bell Labs France Plan of the talk Short history of wireless networking Information in space-time
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationIntroduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.
L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate
More informationDept. of Linguistics, Indiana University Fall 2015
L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission
More informationPlan for today. ! Part 1: (Hidden) Markov models. ! Part 2: String matching and read mapping
Plan for today! Part 1: (Hidden) Markov models! Part 2: String matching and read mapping! 2.1 Exact algorithms! 2.2 Heuristic methods for approximate search (Hidden) Markov models Why consider probabilistics
More informationPART III. Outline. Codes and Cryptography. Sources. Optimal Codes (I) Jorge L. Villar. MAMME, Fall 2015
Outline Codes and Cryptography 1 Information Sources and Optimal Codes 2 Building Optimal Codes: Huffman Codes MAMME, Fall 2015 3 Shannon Entropy and Mutual Information PART III Sources Information source:
More informationHidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:
Hidden Markov Models Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: www.ioalgorithms.info Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm
More informationarxiv: v1 [cs.ds] 15 Feb 2012
Linear-Space Substring Range Counting over Polylogarithmic Alphabets Travis Gagie 1 and Pawe l Gawrychowski 2 1 Aalto University, Finland travis.gagie@aalto.fi 2 Max Planck Institute, Germany gawry@cs.uni.wroc.pl
More informationCombinatorics On Sturmian Words. Amy Glen. Major Review Seminar. February 27, 2004 DISCIPLINE OF PURE MATHEMATICS
Combinatorics On Sturmian Words Amy Glen Major Review Seminar February 27, 2004 DISCIPLINE OF PURE MATHEMATICS OUTLINE Finite and Infinite Words Sturmian Words Mechanical Words and Cutting Sequences Infinite
More informationOn Symmetries of Non-Plane Trees in a Non-Uniform Model
On Symmetries of Non-Plane Trees in a Non-Uniform Model Jacek Cichoń Abram Magner Wojciech Spankowski Krystof Turowski October 3, 26 Abstract Binary trees come in two varieties: plane trees, often simply
More informationPattern Matching. a b a c a a b. a b a c a b. a b a c a b. Pattern Matching 1
Pattern Matching a b a c a a b 1 4 3 2 Pattern Matching 1 Outline and Reading Strings ( 9.1.1) Pattern matching algorithms Brute-force algorithm ( 9.1.2) Boyer-Moore algorithm ( 9.1.3) Knuth-Morris-Pratt
More informationContext-free Grammars and Languages
Context-free Grammars and Languages COMP 455 002, Spring 2019 Jim Anderson (modified by Nathan Otterness) 1 Context-free Grammars Context-free grammars provide another way to specify languages. Example:
More informationBuST-Bundled Suffix Trees
BuST-Bundled Suffix Trees Luca Bortolussi 1, Francesco Fabris 2, and Alberto Policriti 1 1 Department of Mathematics and Informatics, University of Udine. bortolussi policriti AT dimi.uniud.it 2 Department
More informationChapter 9 Fundamental Limits in Information Theory
Chapter 9 Fundamental Limits in Information Theory Information Theory is the fundamental theory behind information manipulation, including data compression and data transmission. 9.1 Introduction o For
More information1 Alphabets and Languages
1 Alphabets and Languages Look at handout 1 (inference rules for sets) and use the rules on some examples like {a} {{a}} {a} {a, b}, {a} {{a}}, {a} {{a}}, {a} {a, b}, a {{a}}, a {a, b}, a {{a}}, a {a,
More informationarxiv: v1 [cs.dm] 13 Feb 2010
Properties of palindromes in finite words arxiv:1002.2723v1 [cs.dm] 13 Feb 2010 Mira-Cristiana ANISIU Valeriu ANISIU Zoltán KÁSA Abstract We present a method which displays all palindromes of a given length
More informationData Compression Techniques
Data Compression Techniques Part 1: Entropy Coding Lecture 4: Asymmetric Numeral Systems Juha Kärkkäinen 08.11.2017 1 / 19 Asymmetric Numeral Systems Asymmetric numeral systems (ANS) is a recent entropy
More informationExam in Discrete Mathematics
Exam in Discrete Mathematics First Year at The TEK-NAT Faculty June 10th, 2016, 9.00-13.00 This exam consists of 11 numbered pages with 16 problems. All the problems are multiple choice problems. The answers
More informationFinding Consensus Strings With Small Length Difference Between Input and Solution Strings
Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Markus L. Schmid Trier University, Fachbereich IV Abteilung Informatikwissenschaften, D-54286 Trier, Germany, MSchmid@uni-trier.de
More informationAverage Case Analysis of the Boyer-Moore Algorithm
Average Case Analysis of the Boyer-Moore Algorithm TSUNG-HSI TSAI Institute of Statistical Science Academia Sinica Taipei 115 Taiwan e-mail: chonghi@stat.sinica.edu.tw URL: http://www.stat.sinica.edu.tw/chonghi/stat.htm
More informationIndexing LZ77: The Next Step in Self-Indexing. Gonzalo Navarro Department of Computer Science, University of Chile
Indexing LZ77: The Next Step in Self-Indexing Gonzalo Navarro Department of Computer Science, University of Chile gnavarro@dcc.uchile.cl Part I: Why Jumping off the Cliff The Past Century Self-Indexing:
More information