NUMBER OF SYMBOL COMPARISONS IN QUICKSORT

Brigitte Vallée (CNRS and Université de Caen, France). Joint work with Julien Clément, Jim Fill and Philippe Flajolet.

Plan of the talk.
Presentation of the study
Statement of the results
The general model of source
The main steps of the method
Sketch of the proof
What is a tamed source?

The classical framework for sorting. The main sorting or searching algorithms (e.g., QuickSort, BST-Search, InsertionSort, ...) deal with n (distinct) keys U_1, U_2, ..., U_n of the same ordered set Ω. They perform comparisons and exchanges between keys; the unit cost is the key comparison. The behaviour of the algorithm (with respect to key comparisons) only depends on the relative order of the keys, so it suffices to restrict to the case Ω = [1..n]; the input set is then S_n, with the uniform probability. The analysis of all these algorithms is then very well known, with respect to the number of key comparisons performed in the worst case or in the average case.

Here: a realistic analysis of the QuickSort algorithm.

QuickSort(n, A): sorts the array A
  Choose a pivot;
  (k, A-, A+) := Partition(A);
  QuickSort(k - 1, A-);
  QuickSort(n - k, A+).

Mean number K_n of key comparisons: K_n ~ 2 n log n.
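As a sanity check of K_n ~ 2 n log n (natural log), here is a minimal counting sketch in Python; it is ours, not part of the talk, and the function name is ours.

import math, random

def quicksort_comparisons(a):
    # Number of key comparisons of a random-pivot QuickSort,
    # counting the idealized n - 1 comparisons of each Partition step.
    if len(a) <= 1:
        return 0
    pivot = random.choice(a)
    left = [x for x in a if x < pivot]
    right = [x for x in a if x > pivot]
    return len(a) - 1 + quicksort_comparisons(left) + quicksort_comparisons(right)

n = 10_000
mean = sum(quicksort_comparisons([random.random() for _ in range(n)])
           for _ in range(20)) / 20
print(mean, 2 * n * math.log(n))   # the two values are close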

A more realistic framework for sorting. Keys are viewed as words. The domain Ω of keys is a subset of Σ^∞ := {the infinite words over some ordered alphabet Σ}. Words are compared with respect to the lexicographic order. The realistic unit cost is now the symbol comparison. The realistic cost of comparing two words A = a_1 a_2 a_3 ... and B = b_1 b_2 b_3 ... equals k + 1, where k is the length of their largest common prefix, called the coincidence: k := max{i : a_j = b_j for all j ≤ i}.
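The coincidence is straightforward to compute; a small helper (ours), illustrated on the words A and B of the example below:

def coincidence(a, b):
    # Length of the largest common prefix of the words a and b.
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return k

def symbol_cost(a, b):
    # Realistic cost of one key comparison: coincidence + 1.
    return coincidence(a, b) + 1

print(symbol_cost("abbbbbaaabab", "abbbbbbaabaa"))   # 7: coincidence 6, plus 1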

We are interested in this new cost for each algorithm: the number of symbol comparisons, and its mean value S_n (for n words). How does S_n compare to K_n? That is the question... It was first asked by Sedgewick in 2000, in order to also compare with text algorithms based on tries.

An example. Sixteen words drawn from the memoryless source p(a) = 1/3, p(b) = 2/3; we keep the prefixes of length 12.
A = abbbbbaaabab  B = abbbbbbaabaa  C = baabbbabbbba  D = bbbababbbaab
E = bbabbaababbb  F = abbbbbbbbabb  G = bbaabbabbaba  H = ababbbabbbab
I = bbbaabbbbbbb  J = abaabbbbaabb  K = bbbabbbbbbaa  L = aaaabbabaaba
M = bbbaaabbbbbb  N = abbbbbbabbaa  O = abbabababbbb  P = bbabbbaaaabb
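Such a sample is easy to regenerate (a sketch, ours; the random draws will of course differ from the sixteen prefixes above):

import random

def memoryless_prefix(length, rng):
    # Draw a prefix from the memoryless source p(a) = 1/3, p(b) = 2/3.
    return "".join(rng.choices("ab", weights=(1, 2), k=length))

rng = random.Random(1)
words = [memoryless_prefix(12, rng) for _ in range(16)]
print(words)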

Statement of the results.

Case of QuickSort(n) [CFFV 08].
Theorem. For any tamed source, the mean number S_n of symbol comparisons used by QuickSort(n) satisfies
S_n ~ (1/h_S) n log² n,
and involves the entropy h_S of the source S, defined as
h_S := lim_{k→∞} -(1/k) Σ_{w∈Σ^k} p_w log p_w,
where p_w is the probability that a word begins with the prefix w.
Compared to K_n ~ 2 n log n, there is an extra factor equal to (1/(2 h_S)) log n.
Compared to T_n ~ (1/h_S) n log n (the cost of trie-based sorting), there is an extra factor of log n.
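For a memoryless source the defining limit for h_S is in fact attained at every depth k, which gives a quick numerical check (a sketch, ours):

import math
from itertools import product

def entropy_at_depth(p, k):
    # (-1/k) * sum over w in Sigma^k of p_w log p_w, for a memoryless source.
    total = 0.0
    for w in product(p, repeat=k):
        pw = math.prod(p[s] for s in w)
        total += pw * math.log(pw)
    return -total / k

p = {"a": 1/3, "b": 2/3}
closed_form = -sum(pi * math.log(pi) for pi in p.values())
print(entropy_at_depth(p, 10), closed_form)   # both ~ 0.63651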

The (general) model of source. A general source S produces infinite words on an ordered alphabet Σ. For w ∈ Σ*, p_w := probability that a word begins with the prefix w. Define
a_w := Σ_{w′: |w′| = |w|, w′ < w} p_{w′},    b_w := a_w + p_w = Σ_{w′: |w′| = |w|, w′ ≤ w} p_{w′}.
Then, for each u ∈ [0, 1] and each k ≥ 1, there is a unique word w = M_k(u) ∈ Σ^k such that u ∈ [a_w, b_w[.

Example: p_0 = 1/3, p_1 = 2/3; then M_1(1/2) = 1, M_1(1/4) = 0.
With p_00 = 1/12, p_01 = 3/12, p_10 = 1/2, p_11 = 1/6: M_2(1/2) = 10, M_2(1/4) = 01.

If p_w → 0 as |w| → ∞, the sequences (a_w) and (b_w) are adjacent and define an infinite word M(u) := lim_{k→∞} M_k(u). The source is then alternatively defined by a mapping M : [0, 1] → Σ^∞.
Fundamental interval: [a_w, b_w] := {u : M(u) begins with the prefix w}.
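The mechanism (a_w, b_w, M_k) is easy to implement; a sketch (ours) that hard-codes the prefix probabilities of the example and recovers M_1 and M_2:

# Prefix probabilities of the example (a two-symbol source with memory).
p = {"0": 1/3, "1": 2/3,
     "00": 1/12, "01": 3/12, "10": 1/2, "11": 1/6}

def fundamental_interval(w):
    # a_w sums p_w' over same-length prefixes w' < w; b_w = a_w + p_w.
    a = sum(pv for wv, pv in p.items() if len(wv) == len(w) and wv < w)
    return a, a + p[w]

def M(u, k):
    # The unique w of length k with u in [a_w, b_w).
    for w in (wv for wv in p if len(wv) == k):
        a, b = fundamental_interval(w)
        if a <= u < b:
            return w

print(M(1/2, 1), M(1/4, 1))   # 1 0
print(M(1/2, 2), M(1/4, 2))   # 10 01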

Natural instances of sources: dynamical sources. With a shift map T : I → I and an encoding map τ : I → Σ, the emitted word is
M(x) = (τx, τTx, τT²x, ..., τT^k x, ...).
[Figure: a dynamical system with Σ = {a, b, c}, where the orbit x, Tx, T²x, T³x yields the word M(x) = (c, b, a, c, ...).]

Memoryless sources or Markov chains = dynamical sources with affine branches.
[Figure: two such systems, with piecewise-affine branches over [0, 1].]

The dynamical framework leads to more general sources: the curvature of the branches entails correlation between symbols. Example: the Continued Fraction source.
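For the Continued Fraction source, the shift map is the Gauss map T(x) = {1/x} and the encoding is τ(x) = ⌊1/x⌋, over the infinite alphabet Σ = {1, 2, 3, ...}; a sketch (ours):

import math

def cf_word(x, k):
    # First k symbols of M(x): tau(x), tau(Tx), tau(T^2 x), ...
    digits = []
    for _ in range(k):
        q = math.floor(1 / x)
        digits.append(q)       # tau(x) = floor(1/x)
        x = 1 / x - q          # T(x) = {1/x}, the Gauss map
    return digits

print(cf_word(math.pi - 3, 4))   # [7, 15, 1, 292]: continued fraction digits of pi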

Fundamental intervals and fundamental triangles.
[Figure: the fundamental intervals [a_w, b_w] on the unit interval, and the corresponding fundamental triangles T_w inside the unit triangle T.]

The main steps of the method.

Three main steps for the analysis of the mean number S_n of symbol comparisons.
(A) The Poisson model P_Z does not deal with a fixed number n of keys: the number N of keys is now a random variable that follows a Poisson law of parameter Z. We first obtain nice expressions for the mean cost S(Z).
(B) It is then possible to return to the model where the number of keys is fixed. We obtain a nice exact formula for S_n, from which it is not easy to obtain the asymptotics.
(C) Finally, the Rice formula provides the asymptotics of S_n (n → ∞), as soon as the source is tamed.

Sketch of the proof.

(A) Dealing with the Poisson model. In the P_Z model, the number N of keys follows the Poisson law
Pr[N = n] = e^{-Z} Z^n / n!,
and the mean number S(Z) of symbol comparisons for QuickSort is
S(Z) = ∫∫_T [γ(u, t) + 1] π(u, t) du dt,
where T := {(u, t) : 0 ≤ u ≤ t ≤ 1} is the unit triangle,
γ(u, t) := coincidence between M(u) and M(t),
π(u, t) du dt := mean number of key comparisons between M(u′) and M(t′) with u′ ∈ [u, u + du] and t′ ∈ [t − dt, t].

First step in the Poisson model: the coincidence γ(u, t). An (easy) alternative expression:
S(Z) = ∫∫_T [γ(u, t) + 1] π(u, t) du dt = Σ_{w∈Σ*} ∫∫_{T_w} π(u, t) du dt,
since γ(u, t) + 1 counts the common prefixes w (including the empty one) of M(u) and M(t), and (u, t) ∈ T_w precisely when both words begin with w. This expression involves the fundamental triangles and separates the roles of the source and the algorithm. It is then sufficient to study the key probability π(u, t) of the algorithm (the second step), and to take its integral on each fundamental triangle (the third step).

Study of the key probability π(u, t) of the algorithm.
π(u, t) du dt := mean number of key comparisons between M(u′) and M(t′) with u′ ∈ [u, u + du] and t′ ∈ [t − dt, t].
Case of QuickSort: M(u) and M(t) are compared iff the first pivot chosen in {M(v) : v ∈ [u, t]} is M(u) or M(t). Hence
π(u, t) du dt = (Z du)(Z dt) E[ 2 / (2 + N_{[u,t]}) ].
Here N_{[u,t]} is the number of words M(v) with v ∈ [u, t]; it follows a Poisson law of parameter Z(t − u). Then
π(u, t) = 2 Z² f_1(Z(t − u))  with  f_1(θ) := θ^{-2} [e^{-θ} − 1 + θ].
Finally: S(Z) = 2 Z² Σ_{w∈Σ*} ∫∫_{T_w} f_1(Z(t − u)) du dt.
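The identity E[2/(2 + N)] = 2 f_1(θ) for N Poisson of parameter θ can be verified directly (a numerical sketch, ours):

import math

def f1(theta):
    return (math.exp(-theta) - 1 + theta) / theta**2

def mean_2_over_2_plus_N(theta, terms=100):
    # Exact series: sum over n of 2/(2+n) * P[N = n], truncated.
    term = math.exp(-theta)            # P[N = 0]
    total = 0.0
    for n in range(terms):
        total += 2 / (2 + n) * term
        term *= theta / (n + 1)        # P[N = n+1] from P[N = n]
    return total

theta = 3.7
print(mean_2_over_2_plus_N(theta), 2 * f1(theta))   # the two values agree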

(B) Return to the model where n is fixed. With the expansion of f_1, the mean value
S(Z) = Σ_{k≥2} (−1)^k ϖ(−k) Z^k / k!
is expressed with a series ϖ(s) of Dirichlet type, which depends both on the algorithm and the source:
ϖ(s) = 2 Σ_{w∈Σ*} ∫∫_{T_w} (t − u)^{−(s+2)} du dt,   with   ∫∫_{T_w} (t − u)^{−(s+2)} du dt = p_w^{−s} / (s(s + 1)),
so that ϖ(s) = 2 Λ(−s) / (s(s + 1)), where Λ(s) := Σ_{w∈Σ*} p_w^s is the Dirichlet series of the probabilities.
Since S_n / n! = [Z^n] (e^Z S(Z)), there is an exact formula for S_n:
S_n = Σ_{k=2}^{n} (n choose k) (−1)^k ϖ(−k) = 2 Σ_{k=2}^{n} (n choose k) (−1)^k Λ(k) / (k(k − 1)).
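For a memoryless source, Λ(k) has the closed form 1/(1 − λ(k)) with λ(k) = Σ_i p_i^k, so the exact formula can be evaluated and cross-checked by simulation; a sketch (ours), under that memoryless assumption, for the source p(a) = 1/3, p(b) = 2/3 of the example:

import math, random

p = (1/3, 2/3)

def Lam(k):
    # Lambda(k) = sum over all prefixes w of p_w^k = 1/(1 - lambda(k)).
    return 1 / (1 - sum(pi**k for pi in p))

def S_exact(n):
    return 2 * sum((-1)**k * math.comb(n, k) * Lam(k) / (k * (k - 1))
                   for k in range(2, n + 1))

# Monte-Carlo cross-check: symbol comparisons of QuickSort on random words.
def word(rng):
    # A long prefix stands in for the infinite word (ties are then unlikely).
    return "".join(rng.choices("ab", weights=(1, 2), k=64))

def cost(a, b):
    k = 0
    while k < len(a) and a[k] == b[k]:
        k += 1
    return k + 1

def quicksort_symbols(ws, rng):
    if len(ws) <= 1:
        return 0
    pivot = ws[rng.randrange(len(ws))]
    c = sum(cost(w, pivot) for w in ws if w != pivot)
    return (c + quicksort_symbols([w for w in ws if w < pivot], rng)
              + quicksort_symbols([w for w in ws if w > pivot], rng))

rng = random.Random(0)
n, runs = 16, 2000
sim = sum(quicksort_symbols([word(rng) for _ in range(n)], rng)
          for _ in range(runs)) / runs
print(S_exact(n), sim)   # close (sampling error aside)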

(C) Using the Rice formula. As soon as ϖ(s) is weakly tamed in ℜ(s) < σ_0, for some σ_0 > −2, the residue formula transforms the sum into an integral:
S_n = Σ_{k=2}^{n} (n choose k) (−1)^k ϖ(−k) = (1 / 2iπ) ∫_{d−i∞}^{d+i∞} n! ϖ(s) / (s(s + 1) ⋯ (s + n)) ds,
with −2 < d < min(−1, σ_0).
Where are the leftmost singularities of ϖ(s)? Recall that ϖ(s) = 2 Λ(−s) / (s(s + 1)), and that Λ(s) := Σ_w p_w^s always has a singularity at s = 1, which gives ϖ(s) a singularity at s = −1. What type of singularity is it? Is it the dominant one?
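The passage from the alternating sum to the integral rests on the poles of the Rice kernel; in our notation,

\[
\frac{n!}{s(s+1)\cdots(s+n)}
\;\text{has simple poles at } s=-k \ (0\le k\le n), \qquad
\operatorname{Res}_{s=-k}\,\frac{n!}{s(s+1)\cdots(s+n)}
=\frac{n!}{\prod_{j\neq k}(j-k)}
=(-1)^k\binom{n}{k},
\]

so the line ℜs = d with −2 < d < −1 separates the poles s = −2, ..., −n from s = −1 and s = 0, and closing the contour to the left recovers exactly the sum over k = 2, ..., n.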

What is a tamed source?

What can be expected about Λ(s)?
For any source, Λ(s) has a singularity at s = 1.
For a tamed source S, the dominant singularity of Λ(s) is located at s = 1; it is a simple pole, whose residue equals 1/h_S.
In this case, ϖ(s)/(s + 1) = 2 Λ(−s) / (s(s + 1)²) has a triple pole at s = −1, and
ϖ(s)/(s + 1) ~ (2/h_S) · 1/(s + 1)³   (s → −1).
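The extra log² n can be traced through this triple pole: with the Rice kernel approximated by n!/(s(s+1)⋯(s+n)) = Γ(s) Γ(n+1)/Γ(n+1+s) ≈ Γ(s) n^{−s}, a sketch of the residue computation (ours) reads

\[
\Gamma(s)\sim\frac{-1}{s+1},\qquad
n^{-s}=n\,e^{-(s+1)\log n}
=n\Big(1-(s+1)\log n+\tfrac{(s+1)^2}{2}\log^2 n-\cdots\Big),
\]
\[
\varpi(s)\,\Gamma(s)\,n^{-s}
\sim -\frac{2}{h_S}\,\frac{n}{(s+1)^{3}}
\Big(1-(s+1)\log n+\tfrac{(s+1)^2}{2}\log^2 n-\cdots\Big),
\]

whose residue at s = −1 equals −(1/h_S) n log² n (1 + o(1)); crossing this pole while shifting the line to the right therefore contributes the announced (1/h_S) n log² n to S_n.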

For shifting the integral to the right, past d = −1, other and more subtle properties of Λ(s) are needed on ℜs ≤ 1.
[Figure: different behaviours of Λ(s) in the region ℜs ≤ 1 where one can push d past −1; in the colored domains, Λ(s) is meromorphic and of polynomial growth as |s| → ∞.]
For dynamical sources, we provide sufficient conditions (of geometric or arithmetic type) under which these behaviours hold. For a memoryless source, they depend on the approximability of the ratios log p_i / log p_j.

Conclusions.
If the source S is tamed, then S_n ~ (1/h_S) n log² n. We are done!
QuickSort must be compared to the algorithm which uses tries for sorting n words; we studied its mean cost T_n in 2001. For a tamed source, T_n ~ (1/h_S) n log n.
Our methods apply to all the QuickSelect algorithms; the hypotheses needed on the source are different (less strong), and are related to properties of Λ(s) for ℜs < 1.
It is easy to adapt our results to intermittent sources, which emit long runs of the same symbol. In this case, S_n = Θ(n log³ n) and T_n = Θ(n log² n).

Open problems. What about the distribution of the average search cost in a BST? Is it asymptotically normal? We know that this is the case when one counts the number of key comparisons, and we already know that, for a tamed source, the average depth of a trie is asymptotically normal (Cesaratto–V, 07).
Long-term research projects:
Revisit the complexity results of the main classical algorithms, taking into account the number of symbol comparisons instead of the number of key comparisons.
Provide a sharp analytic classification of sources: transfer geometric properties of sources into analytic properties of Λ(s).

Case of QuickMin(n), QuickMax(n) [CFFV 08].
Theorem 2. For any weakly tamed source, the mean numbers of symbol comparisons used by QuickMin(n) and QuickMax(n) satisfy
T_n^{(−)} ~ c_S^{(−)} n   and   T_n^{(+)} ~ c_S^{(+)} n,
and involve constants which depend on the probabilities p_w and p_w^{(ε)} (ε = ±):
c_S^{(ε)} := 2 Σ_{w∈Σ*} p_w [ 1 − (p_w^{(ε)}/p_w) log(1 + p_w / p_w^{(ε)}) ].
Here p_w^{(−)}, p_w^{(+)}, p_w are the probabilities that a word begins with a prefix w′ with |w′| = |w| and w′ < w, w′ > w, or w′ = w, respectively.

Case of QuickRand(n) [CFFV 08].
Theorem 3. For any weakly tamed source, the mean number of symbol comparisons used by QuickRand(n) (randomized with respect to the rank) satisfies T_n ~ c_S n, with
c_S = Σ_{w∈Σ*} p_w² [ 3 + Σ_{ε=±} ( p_w^{(ε)}/p_w + log(1 + p_w^{(ε)}/p_w) − (p_w^{(ε)}/p_w)² log(1 + p_w / p_w^{(ε)}) ) ].
Here p_w^{(−)}, p_w^{(+)}, p_w are the probabilities that a word begins with a prefix w′ with |w′| = |w| and w′ < w, w′ > w, or w′ = w, respectively.

Case of QuickQuant_α(n) [CFFV 09] (work in progress).
Theorem 4. For any weakly tamed source, the mean number of symbol comparisons used by QuickQuant_α(n) satisfies q_n(α) ~ ρ_S(α) n, with
ρ_S(α) = 2 Σ_{w∈S(α)} [ p_w + p_w log p_w − p_w^{(α,+)} log p_w^{(α,+)} − p_w^{(α,−)} log p_w^{(α,−)} ]
 + 2 Σ_{w∈R(α)} p_w [ 1 + (p_w^{(α,−)}/p_w) log(1 − p_w / p_w^{(α,−)}) ]
 + 2 Σ_{w∈L(α)} p_w [ 1 + (p_w^{(α,+)}/p_w) log(1 − p_w / p_w^{(α,+)}) ].
The three parts depend on the position of the probabilities p_w^{(ε)} with respect to α:
w ∈ L(α) iff p_w^{(+)} ≥ 1 − α,   w ∈ R(α) iff p_w^{(−)} ≥ α,   w ∈ S(α) iff p_w^{(−)} < α < 1 − p_w^{(+)},
with p_w^{(α,+)} := 1 − α − p_w^{(+)} and p_w^{(α,−)} := α − p_w^{(−)}.