Randomized Quicksort and the Entropy of the Random Number Generator

Electronic Colloquium on Computational Complexity, Report No. 59 (2004), ISSN 1433-8092

Beatrice List, Markus Maucher, Uwe Schöning and Rainer Schuler
Abt. Theoretische Informatik, Universität Ulm, 89069 Ulm, Germany
June 21, 2004

Abstract

The worst-case complexity of an implementation of Quicksort depends on the random number generator that is used to select the pivot elements. In this paper we estimate the expected number of comparisons of Quicksort as a function of the entropy of the random source. We give upper and lower bounds and show that the expected number of comparisons increases from $n \log n$ to $n^2$ if the entropy of the random source is bounded. As examples we show explicit bounds for distributions with bounded min-entropy, the geometric distribution and the δ-random source.

1 Introduction

Randomized QuickSort is the well known version of QuickSort (invented by Hoare [Ho]) where the array element for splitting the array in two parts (the "pivot" element) is selected at random. It is also well known that the expected number of comparisons (for every input permutation of the array elements) is $(2 \ln 2)\, n \log_2 n - \Theta(n)$. Here, the expectation is taken over the random choices made in the algorithm. This analysis assumes random numbers which are independent and uniformly distributed.

Here we analyze randomized QuickSort without assuming such a high entropy of the underlying random source. Using a random number generator with a low entropy can result in a worst-case behavior that can go up to $\Theta(n^2)$. An extreme example is a very "bad" random number generator that produces only "1" as output. That is, in each recursive call of QuickSort the first array element is selected as pivot element. A worst case input in this case is the already sorted array.

Related work has been done by Karloff and Raghavan [KR] (see also [To]), where the special case of a linear congruence generator is considered and a worst-case behavior of $\Omega(n^2)$ is shown.
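The dependence on the random number generator described above can be made concrete with a small experiment. The following is a minimal sketch, not the authors' implementation: the names `quicksort_count`, `always_first` and `uniform` are hypothetical, the partition step is assumed to keep the relative order of the elements, and the "bad" generator is modelled as a function that always returns position 1.

```python
import random
from typing import Callable, List, Sequence, Tuple


def quicksort_count(items: Sequence[int],
                    pick_index: Callable[[int], int]) -> Tuple[List[int], int]:
    """Sort distinct integers; return (sorted list, number of comparisons).

    pick_index(m) must return a 1-based position in 1..m. It plays the role
    of the random number generator that selects the pivot (an assumption of
    this sketch, not an interface taken from the paper).
    """
    comparisons = 0
    result: List[int] = []
    # Explicit stack instead of recursion, so the quadratic worst case
    # does not run into Python's recursion limit.
    stack: List[object] = [list(items)]
    while stack:
        part = stack.pop()
        if not isinstance(part, list):       # a pivot waiting to be emitted
            result.append(part)
            continue
        if len(part) <= 1:
            result.extend(part)
            continue
        pivot = part[pick_index(len(part)) - 1]
        comparisons += len(part) - 1         # every other element vs. the pivot
        smaller = [x for x in part if x < pivot]   # relative order is kept
        larger = [x for x in part if x > pivot]
        # Pushed in reverse so that `smaller` is processed first.
        stack.append(larger)
        stack.append(pivot)
        stack.append(smaller)
    return result, comparisons


def always_first(m: int) -> int:
    """The 'bad' generator that only ever outputs 1."""
    return 1


rng = random.Random(0)


def uniform(m: int) -> int:
    """A uniformly distributed position in 1..m."""
    return rng.randint(1, m)


n = 200
sorted_input = list(range(1, n + 1))
_, worst = quicksort_count(sorted_input, always_first)
_, typical = quicksort_count(sorted_input, uniform)
print(worst, typical)
```

On the already sorted input the constant generator removes only one element per partition step, so `worst` equals $n(n-1)/2$, while the uniformly seeded run stays far below that.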
Recursion for expected number of comparisons

Let $T_\pi(n)$ be the expected number of comparisons done by randomized QuickSort when operating on an input array $a[1], \dots, a[n]$ whose elements are permuted according to $\pi \in S_n$, that is, $a[\pi(1)] < a[\pi(2)] < \dots < a[\pi(n)]$, where $S_n$ is the set of all permutations on $\{1, \dots, n\}$. Let $X$ be a random variable taking values between 1 and $n$ (not necessarily under uniform distribution) which models the random number generator that is used to pick out a pivot element $a[X]$.

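We obtain the following recursion for the expected complexity (i.e. number of comparisons) $T(n) = \max_{\pi \in S_n} T_\pi(n)$. We have $T(n) = 0$ for $n \le 1$; and for $n > 1$ we get

$$
\begin{aligned}
T(n) &= \max_{\pi \in S_n} T_\pi(n) \\
&= n - 1 + \max_{\pi \in S_n} \sum_{i=1}^{n} p_i \big( T_{\pi'}(i-1) + T_{\pi''}(n-i) \big) \\
&\le n - 1 + \max_{\pi \in S_n} \sum_{i=1}^{n} p_i \Big( \max_{\Phi \in S_{i-1}} T_\Phi(i-1) + \max_{\Psi \in S_{n-i}} T_\Psi(n-i) \Big) \\
&= n - 1 + \max_{\pi \in S_n} \sum_{i=1}^{n} p_i \big( T(i-1) + T(n-i) \big),
\end{aligned}
$$

where $\pi'$ and $\pi''$ denote the orderings of the two subarrays. That is, there are $n-1$ comparisons with the selected pivot element, and depending on the rank $i$ of the pivot element within the array, there are $T(i-1)$ and $T(n-i)$ additional comparisons. Here $p_i$ is the probability that the pivot element has rank $i$ within the ordering of the array, that is, $p_i = \Pr(\pi(X) = i)$. If the rank is not uniformly distributed among the numbers 1 to $n$, a worst case input permutation can be constructed such that the middle ranks receive relatively low probability and the extreme ranks (close to 1 or close to $n$) get relatively high probability, resulting in a large expected number of comparisons.

We give upper and lower bounds on the expected number $T(n)$ of comparisons. Lower bounds are given with respect to a fixed input sequence (the already sorted list of elements). We can show (see Theorem 1) that $T(n) \le g(n)\, n \log_2 n$ for any function $g(n)$ greater than $1 / \big(\min_\pi \sum_{i=1}^{n} p_i H(i/n)\big)$, where $H$ is the binary entropy function. Note that $\min_\pi \sum_i p_i H(i/n)$ is independent of the permutation of the elements, i.e. it is identical for all distributions $p$ and $q$ such that $p_i = q_{\pi(i)}$ for all $i$ and some permutation $\pi$.

The lower bound (see Theorems 3 and 4) is derived for a fixed permutation (the sorted list of elements), where we can assume that the order is preserved in all recursive calls of QuickSort. Therefore the lower bound $T(n) \ge c\, n\, g(n)$ (Theorem 4) is with respect to any function $g(n)$ less than $1 / \big(\sum_{i=1}^{n} p_i H(i/(n+1))\big)$, where $p_i$ is the probability of selecting $a[i]$ as a pivot element.

2 Upper bound on the number of expected comparisons

Let $(P_n)$ denote a sequence of probability distributions where $P_n = (p_{1,n}, \dots, p_{n,n})$ is a distribution on $(1, \dots, n)$. In the following we use $p_i$ to denote $p_{i,n}$, since $n$ is determined by the size of the array.

Theorem 1 We have $T(n) \le g(n)\, n \log_2 n$ for any monotone increasing function $g$ with the property

$$ g(n) \ge \frac{1}{\min_{\pi \in S_n} \sum_{i=1}^{n} p_i H(i/n)}, $$

where $H(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ is the binary entropy function (Shannon entropy).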

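Proof. Using the above recursion for $T(n)$ and the induction hypothesis we obtain

$$
\begin{aligned}
T(n) &\le n - 1 + \max_{\pi \in S_n} \sum_{i=1}^{n} p_i \big( T(i-1) + T(n-i) \big) \\
&\le n + \max_{\pi \in S_n} \sum_{i=1}^{n} p_i \big( g(i-1)(i-1) \log_2 (i-1) + g(n-i)(n-i) \log_2 (n-i) \big) \\
&\le n + g(n) \max_{\pi \in S_n} \sum_{i=1}^{n} p_i \big( i \log_2 i + (n-i) \log_2 (n-i) \big) \\
&= n + g(n)\, n \max_{\pi \in S_n} \sum_{i=1}^{n} p_i \Big( \frac{i}{n} \log_2 \frac{i}{n} + \frac{n-i}{n} \log_2 \frac{n-i}{n} + \log_2 n \Big) \\
&= n + g(n)\, n \log_2 n - g(n)\, n \min_{\pi \in S_n} \sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big).
\end{aligned}
$$

To finish the induction proof, this last expression should be at most $g(n)\, n \log_2 n$. This holds if and only if

$$ g(n) \ge \frac{1}{\min_{\pi \in S_n} \sum_{i=1}^{n} p_i H(i/n)} $$

as claimed.

Example: In the standard case of a uniform distribution $p_i = 1/n$ we obtain

$$ g(n) = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} H(i/n)}. $$

This is asymptotically equal to

$$ \frac{1}{\int_0^1 H(x)\, dx} = 2 \ln 2 \approx 1.38. $$

Another Example: In the median-of-3 version of QuickSort (cf. [Kn, SF]), 3 different elements are picked uniformly at random and the median of the 3 is used as the pivot element. In this case

$$ p_i = \frac{6 (i-1)(n-i)}{n (n-1)(n-2)}. $$

Here the constant factor of the $n \log n$ term can be asymptotically estimated by

$$ \frac{1}{6 \int_0^1 x (1-x) H(x)\, dx} = \frac{12 \ln 2}{7} \approx 1.18. $$

(We ignore here the additional number of comparisons between the 3 elements to find their median, but this does not have an influence asymptotically.)

Sorting the probabilities

Using the symmetry of the function $H$ around $\tfrac{1}{2}$ and its monotonicity, we get:

$$ \min_{\pi \in S_n} \sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big) = \min_{\pi \in S_n} \sum_{j=0}^{n-1} q_j H\Big(\frac{\lceil j/2 \rceil}{n}\Big). $$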

Here, the $q_j$ are a reordering of the $p_i$ in the following way (assuming $n$ is even):

$$ q_0 = p_n,\quad q_1 = p_1,\quad q_2 = p_{n-1},\quad q_3 = p_2,\quad \dots,\quad q_{n-2} = p_{n/2+1},\quad q_{n-1} = p_{n/2}. $$

This new representation has the advantage that the $H$-values in the sum are in increasing order, and we can determine which permutation $\pi \in S_n$ actually achieves the minimum. Namely, the minimum is achieved if the $q_j$ are ordered in decreasing order. This is in accordance with the statement in the introduction that the worst case is associated with the situation that the extreme ranks occur with higher probability than the middle ranks.

Lemma 2 Given a sum of the following form

$$ \sum_{j=1}^{n} a_j b_{\pi(j)}, \qquad a_j, b_j \ge 0, $$

where the $a_j$ are sorted in strictly increasing order and the permutation $\pi$ can be chosen arbitrarily, the minimum value of the sum occurs when the permutation $\pi$ is such that the $b_{\pi(j)}$ are sorted in decreasing order.

Proof. Suppose that two elements $b_{\pi(i)}, b_{\pi(j)}$ with $i < j$ are in the wrong order, i.e. $b_{\pi(i)} < b_{\pi(j)}$. We compare the situation before and after exchanging them:

$$ \big( a_i b_{\pi(j)} + a_j b_{\pi(i)} \big) - \big( a_i b_{\pi(i)} + a_j b_{\pi(j)} \big) = (a_i - a_j)\big(b_{\pi(j)} - b_{\pi(i)}\big) < 0. $$

This means that interchanging the two elements in this way strictly decreases the value of the sum. Furthermore, it is easy to see that the decreasingly sorted order can always be achieved by swapping two elements which are in the wrong order (e.g. like in the BubbleSort algorithm).

3 A lower bound

As we saw in Section 1, the running time of QuickSort is given by the recursion

$$ T(n) = n - 1 + \sum_{i=1}^{n} p_i \big( T(i-1) + T(n-i) \big), $$

where $p_i$ is the probability of choosing the element with rank $i$ as pivot element. To estimate a lower bound for the worst-case running time of QuickSort, we consider as input the already sorted array of numbers. Further we assume that the partitioning step of QuickSort leaves the elements of the two sub-arrays in the same relative order as in the input array. Recall that pivot elements are chosen according to a sequence of probability distributions $(P_n)$, where distribution $P_n$ defines the probabilities on arrays of size $n$, i.e. $P_n = (p_{1,n}, \dots, p_{n,n})$.

Note that if the $p_{j,n}$ are sorted in decreasing order, then a worst-case input is the already sorted sequence of numbers. In fact, if the sequence of probability distributions $(P_n)$ is sufficiently uniform, it should be possible to construct a worst-case input by sorting probabilities as described in Section 2.
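For the sorted input the recursion $T(n) = n - 1 + \sum_i p_i (T(i-1) + T(n-i))$ can be evaluated exactly by dynamic programming. The following is a minimal sketch under the assumption that the rank distribution for arrays of size $n$ is supplied as a plain list; the names `expected_comparisons`, `uniform` and `always_first` are hypothetical and only serve the illustration.

```python
from math import log2
from typing import Callable, List

# A distribution family maps the array size n to the list [p_1, ..., p_n],
# where p_i is the probability that the pivot has rank i.
Distribution = Callable[[int], List[float]]


def expected_comparisons(n_max: int, dist: Distribution) -> List[float]:
    """Return [T(0), ..., T(n_max)] for the recursion
    T(n) = n - 1 + sum_i p_i * (T(i-1) + T(n-i)), with T(0) = T(1) = 0."""
    T = [0.0] * (n_max + 1)
    for n in range(2, n_max + 1):
        p = dist(n)
        T[n] = (n - 1) + sum(p[i - 1] * (T[i - 1] + T[n - i])
                             for i in range(1, n + 1))
    return T


def uniform(n: int) -> List[float]:
    """Every rank is equally likely."""
    return [1.0 / n] * n


def always_first(n: int) -> List[float]:
    """Degenerate source: the pivot always has rank 1."""
    return [1.0] + [0.0] * (n - 1)


N = 200
T_uniform = expected_comparisons(N, uniform)
T_first = expected_comparisons(N, always_first)

# Growth relative to n log2 n (uniform) and to n^2 (degenerate source).
print(T_uniform[N] / (N * log2(N)))
print(T_first[N] / (N * N))
```

With the degenerate source the recursion collapses to $T(n) = n - 1 + T(n-1)$, so the second ratio approaches $1/2$; with the uniform distribution the first ratio stays below the constant $2 \ln 2$ from the example after Theorem 1.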

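Theorem 3 For any sequence of probability distributions $(P_n)$ it holds that $T(n) \ge c\, n\, g(n)$ for some constant $c > 0$, if for all $n > n_0$, $g$ satisfies the two conditions

$$ g(n) \le \frac{1}{\sum_{i=1}^{n} p_{i,n} \Big( 1 - \frac{(i-1)^2}{n^2} - \frac{(n-i)^2}{n^2} \Big)} $$

and

$$ \frac{g(i)}{g(n)} \ge \frac{i}{n} \quad \text{for all } 0 \le i \le n. $$

Proof. Let $P_n = (p_1, \dots, p_n)$ be a distribution where $p_i$ is the probability that we choose as a pivot element the element with rank $i$. For $n > 2$, it holds by the induction hypothesis and the second condition that

$$
\begin{aligned}
T(i-1) + T(n-i) &\ge c\,(i-1)\, g(i-1) + c\,(n-i)\, g(n-i) \\
&= c\, n\, g(n) \Big( \frac{i-1}{n} \cdot \frac{g(i-1)}{g(n)} + \frac{n-i}{n} \cdot \frac{g(n-i)}{g(n)} \Big) \\
&\ge c\, n\, g(n) \Big( \frac{(i-1)^2}{n^2} + \frac{(n-i)^2}{n^2} \Big) \\
&= c\, n\, g(n) - c\, n\, g(n) \Big( 1 - \frac{(i-1)^2}{n^2} - \frac{(n-i)^2}{n^2} \Big).
\end{aligned}
$$

Therefore,

$$ T(n) = n - 1 + \sum_{i=1}^{n} p_i \big( T(i-1) + T(n-i) \big) \ge n - 1 + c\, n\, g(n) - c\, n\, g(n) \sum_{i=1}^{n} p_i \Big( 1 - \frac{(i-1)^2}{n^2} - \frac{(n-i)^2}{n^2} \Big). $$

The induction hypothesis follows if

$$ g(n) \sum_{i=1}^{n} p_i \Big( 1 - \frac{(i-1)^2}{n^2} - \frac{(n-i)^2}{n^2} \Big) \le 1, \quad \text{i.e.} \quad g(n) \le \frac{1}{\sum_{i=1}^{n} p_i \Big( 1 - \frac{(i-1)^2}{n^2} - \frac{(n-i)^2}{n^2} \Big)}, $$

since then the subtracted term is at most $c\, n \le n - 1$ for every constant $c \le 1/2$ and $n \ge 2$.

The lower bound, Theorem 3, can be given using the entropy function. This shows that up to a logarithmic factor we obtain matching upper and lower bounds.

Theorem 4 For any sequence of probability distributions $(P_n)$ it holds that $T(n) \ge c\, n\, g(n)$ for some constant $c > 0$, if $g$ satisfies the two conditions

$$ g(n) \le \frac{1}{\sum_{i=1}^{n} p_{i,n} H\big(\frac{i}{n+1}\big)} \qquad \text{and} \qquad \frac{g(i)}{g(n)} \ge \frac{i}{n} \quad \text{for } 0 \le i \le n. $$

Proof. We follow the proof of Theorem 3. For $n \ge 2$,

$$
\begin{aligned}
T(i-1) + T(n-i) &\ge c\, n\, g(n) \Big( \frac{(i-1)^2}{n^2} + \frac{(n-i)^2}{n^2} \Big) \\
&= c\, n\, g(n) \Big( \frac{(i-1)^2}{n^2} + \frac{(n-i)^2}{n^2} + H\Big(\frac{i}{n+1}\Big) \Big) - c\, n\, g(n)\, H\Big(\frac{i}{n+1}\Big) \\
&\ge c\, n\, g(n) - c\, n\, g(n)\, H\Big(\frac{i}{n+1}\Big).
\end{aligned}
$$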

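The last inequality follows from Lemma 5 below. Therefore,

$$ T(n) = n - 1 + \sum_{i=1}^{n} p_{i,n} \big( T(i-1) + T(n-i) \big) \ge n - 1 + c\, n\, g(n) - c\, n\, g(n) \sum_{i=1}^{n} p_{i,n} H\Big(\frac{i}{n+1}\Big). $$

Then the induction hypothesis follows if

$$ g(n) \sum_{i=1}^{n} p_{i,n} H\Big(\frac{i}{n+1}\Big) \le 1. $$

Lemma 5 For integers $n \ge 5$ and $i$ with $0 \le i \le n$,

$$ \frac{(i-1)^2}{n^2} + \frac{(n-i)^2}{n^2} + H\Big(\frac{i}{n+1}\Big) \ge 1. $$

Proof. We use the known inequalities $\ln(1-x) \le -x$ resp. $\log_2(1-x) \le -x / \ln 2$, which hold for $0 \le x < 1$. Applied to both terms of the entropy function (with the convention $0 \log_2 0 = 0$) they give

$$ H\Big(\frac{i}{n+1}\Big) = -\frac{i}{n+1} \log_2\Big(1 - \frac{n+1-i}{n+1}\Big) - \frac{n+1-i}{n+1} \log_2\Big(1 - \frac{i}{n+1}\Big) \ge \frac{2\, i\, (n+1-i)}{(n+1)^2 \ln 2}. $$

Since $(i-1)^2 + (n-i)^2 = n^2 + 1 - 2 i (n+1-i)$, we get

$$
\begin{aligned}
\frac{(i-1)^2}{n^2} + \frac{(n-i)^2}{n^2} + H\Big(\frac{i}{n+1}\Big)
&\ge \frac{n^2 + 1 - 2 i (n+1-i)}{n^2} + \frac{2 i (n+1-i)}{(n+1)^2 \ln 2} \\
&\ge \frac{n^2 + 1 - 2 i (n+1-i)}{n^2} + \frac{2 i (n+1-i)}{n^2} \\
&= 1 + \frac{1}{n^2} \ge 1.
\end{aligned}
$$

For the second last inequality, we use that $(n+1)^2 \ln 2 \le n^2$ for $n \ge 5$ and set $n_0 = 5$.

Remark: Actually, the Lemma holds for $n \ge 1$, not only for $n \ge 5$. The remaining 14 cases, $(n, i) = (1,0), (1,1), \dots, (4,4)$, can be checked by computer.

4 Distributions with bounded entropy

The uniform distribution on $[1, n] = \{1, \dots, n\}$ has maximal entropy. In this section we consider distributions which have bounded entropy.

Uniform distributions on a subset of $\{1, \dots, n\}$

First we consider distributions with positive probability on subsets of $[1, n]$. Let $t(n) \in o(n)$ be a time constructible monotone increasing function. Define a distribution $P_n = (p_1, \dots, p_n)$ such that

$$
p_i = \begin{cases}
1/t(n), & \text{if } \operatorname{rank}(a_i) \le t(n)/2, \\
1/t(n), & \text{if } \operatorname{rank}(a_i) > n - t(n)/2, \\
0, & \text{otherwise.}
\end{cases}
$$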

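That is, we choose the pivot element randomly using a uniform distribution among only the worst $t(n)$ array elements. Writing $t$ for $t(n)$, the sums $\sum_i p_i H(i/(n+1))$ resp. $\sum_i p_i H(i/n)$ are bounded as follows:

$$ \sum_{i=1}^{n} p_i H\Big(\frac{i}{n+1}\Big) \le \frac{t+2}{2(n+1)} \log_2 \frac{4(n+1)}{t}, \qquad \sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big) \ge \frac{t}{4n} \log_2 \frac{2n}{t}. $$

This gives $T(n) \le \dfrac{4 n^2 \log_2 n}{t \log_2 (2n/t)}$ as an upper bound and $T(n) \in \Omega\Big(\dfrac{n^2}{t \log (4n/t)}\Big)$ as a lower bound.

Proof. An upper bound $T(n) \le g(n)\, n \log_2 n$ can be estimated as follows. The ranks $i \le t/2$ contribute $H(i/n)$ and the ranks $i > n - t/2$ contribute $H((n-i)/n)$ with $0 \le n - i < t/2$. Using $H(x) \ge x \log_2 (1/x)$ and $\log_2 (n/i) \ge \log_2 (2n/t)$ for $i \le t/2$, we get

$$
\begin{aligned}
\sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big)
&= \frac{1}{t} \Big( \sum_{i=1}^{t/2} H\Big(\frac{i}{n}\Big) + \sum_{i=0}^{t/2-1} H\Big(\frac{i}{n}\Big) \Big) \\
&\ge \frac{1}{t} \Big( 2 \sum_{i=1}^{t/2-1} \frac{i}{n} \log_2 \frac{n}{i} + \frac{t}{2n} \log_2 \frac{2n}{t} \Big) \\
&\ge \frac{1}{t\, n} \log_2 \frac{2n}{t} \Big( 2 \sum_{i=1}^{t/2-1} i + \frac{t}{2} \Big) \\
&= \frac{t}{4n} \log_2 \frac{2n}{t}.
\end{aligned}
$$

With this it follows from Theorem 1 that

$$ g(n) = \frac{4n}{t \log_2 (2n/t)}, \qquad T(n) \le \frac{4 n^2 \log_2 n}{t \log_2 (2n/t)}. $$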
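The estimate for the distribution concentrated on the $t$ extreme ranks can be checked numerically. The sketch below assumes an even $t$ with $2 \le t \le n$ and compares $\sum_i p_i H(i/n)$ with $\frac{t}{4n} \log_2 \frac{2n}{t}$; the function names are hypothetical and chosen for this illustration only.

```python
from math import log2
from typing import List


def binary_entropy(x: float) -> float:
    """H(x) = -x log2 x - (1-x) log2 (1-x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)


def extreme_subset_distribution(n: int, t: int) -> List[float]:
    """p_i = 1/t on the t/2 smallest and the t/2 largest ranks, 0 elsewhere."""
    if t % 2 != 0 or not 2 <= t <= n:
        raise ValueError("t must be even with 2 <= t <= n")
    half = t // 2
    return [1.0 / t if (i <= half or i > n - half) else 0.0
            for i in range(1, n + 1)]


def entropy_sum(p: List[float]) -> float:
    """sum_i p_i * H(i / n) for p = [p_1, ..., p_n]."""
    n = len(p)
    return sum(p_i * binary_entropy(i / n) for i, p_i in enumerate(p, start=1))


def theorem1_coefficient(n: int, t: int) -> float:
    """1 / sum_i p_i H(i/n): the smallest g(n) admitted at this n."""
    return 1.0 / entropy_sum(extreme_subset_distribution(n, t))


for n, t in [(16, 4), (1024, 32), (4096, 64)]:
    lower = t / (4 * n) * log2(2 * n / t)
    print(n, t, entropy_sum(extreme_subset_distribution(n, t)), lower)
```

In each printed row the computed sum lies above the closed-form lower estimate, so `theorem1_coefficient(n, t)` never exceeds $4n / (t \log_2 (2n/t))$.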

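In the same way the lower bound can be calculated. By the symmetry of $H$ around $\tfrac{1}{2}$, the ranks $i > n - t/2$ contribute the same values $H(i/(n+1))$ as the ranks $i \le t/2$, so

$$
\begin{aligned}
\sum_{i=1}^{n} p_i H\Big(\frac{i}{n+1}\Big)
&= \sum_{i=1}^{t/2} \frac{2}{t} H\Big(\frac{i}{n+1}\Big) \\
&= \frac{2}{t} \sum_{i=1}^{t/2} \Big( \frac{i}{n+1} \log_2 \frac{n+1}{i} + \frac{n+1-i}{n+1} \log_2 \frac{n+1}{n+1-i} \Big) \\
&\le \frac{2}{t} \sum_{i=1}^{t/2} \frac{2i}{n+1} \log_2 \frac{n+1}{i} \\
&= \frac{4 \log_2 (n+1)}{(n+1)\, t} \sum_{i=1}^{t/2} i - \frac{4}{(n+1)\, t} \sum_{i=1}^{t/2} i \log_2 i \\
&\le \frac{4}{(n+1)\, t} \sum_{i=1}^{t/2} i \,\big( \log_2 (n+1) - \log_2 (t/2) + 1 \big) \\
&= \frac{t+2}{2(n+1)} \log_2 \frac{4(n+1)}{t},
\end{aligned}
$$

where we use that $\sum_{i=1}^{t/2} i \log_2 i \ge \sum_{i=1}^{t/2} i \,(\log_2 (t/2) - 1)$ (see Appendix, Lemma 10). With the function

$$ g(n) = \frac{2(n+1)}{(t+2) \log_2 \frac{4(n+1)}{t}} $$

we receive a lower bound of

$$ T(n) \ge c\, \frac{2 n (n+1)}{(t+2) \log_2 \frac{4(n+1)}{t}} \in \Omega\Big( \frac{n^2}{t \log (4n/t)} \Big). $$

Min-Entropy

A distribution $(p_1, \dots, p_n)$ has min-entropy $k$ (cf. [Lu]) if $p_i \le 2^{-k}$ for all $i$. Let $P_n = (p_1, \dots, p_n)$ be a distribution with min-entropy $k$. Then we get

$$ T(n) \le \frac{4 n^2 \log_2 n}{2^k \log_2 (2n / 2^k)} $$

as an upper bound, and for the distribution that puts probability $2^{-k}$ on each of the $2^k$ extreme ranks

$$ T(n) \ge c\, \frac{2 n (n+1)}{(2^k + 2) \log_2 \frac{4(n+1)}{2^k}} $$

as a lower bound.

Proof. The sum $\sum_i p_i H(i/n)$ is smallest when the probability mass $2^{-k}$ sits on each of the $2^k$ extreme ranks, hence (same as above, with $t = 2^k$)

$$ \sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big) \ge \frac{2^k}{4n} \log_2 \frac{2n}{2^k}, $$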

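and, for the distribution on the $2^k$ extreme ranks,

$$ \sum_{i=1}^{n} p_i H\Big(\frac{i}{n+1}\Big) = 2 \sum_{i=1}^{2^k/2} 2^{-k} H\Big(\frac{i}{n+1}\Big) \le \frac{2^k + 2}{2(n+1)} \log_2 \frac{4(n+1)}{2^k}, $$

and thus

$$ T(n) \le \frac{4 n^2 \log_2 n}{2^k \log_2 (2n/2^k)} \qquad \text{and} \qquad T(n) \ge c\, \frac{2 n (n+1)}{(2^k + 2) \log_2 \frac{4(n+1)}{2^k}}. $$

So, for min-entropy 0 (this includes the deterministic case) we get

$$ T(n) \le \frac{4 n^2 \log_2 n}{\log_2 (2n)} = \frac{4 n^2 \log_2 n}{\log_2 n + 1} \le 4 n^2 \qquad \text{and} \qquad T(n) \ge c\, \frac{2 n (n+1)}{3 \log_2 (4(n+1))}, $$

and for min-entropy $\log_2 n$ (all pivot elements are equally distributed), we have

$$ T(n) \le \frac{4 n^2 \log_2 n}{n \log_2 2} = 4 n \log_2 n. $$

Bounds for geometric distributions

We consider the case that pivot elements are selected using a geometric distribution. The probability of picking an element with rank $i$ as pivot is given by $p_i = q^{i-1} (1 - q)$. More generally, we allow the geometric distribution to depend on the size of the array, i.e., we define $P_n$ using

$$ q_n := 1 - \frac{1}{f(n)} $$

for some time constructible monotone function $f$. To estimate a lower bound on the number of comparisons, we use Theorem 3 and estimate

$$ \sum_{i=1}^{n} p_i \Big( 1 - \frac{(i-1)^2}{n^2} - \frac{(n-i)^2}{n^2} \Big) \le c\, \frac{f(n)}{n} $$

for a constant $c$.

Proof. Writing $f$ for $f(n)$ and using the fact that

$$ q^{f} = \Big( 1 - \frac{1}{f} \Big)^{f} \le \frac{1}{e}, $$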

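it follows that

$$
\begin{aligned}
\sum_{i=1}^{n} p_i \Big( 1 - \frac{(i-1)^2}{n^2} - \frac{(n-i)^2}{n^2} \Big)
&= \sum_{i=1}^{n} (1-q)\, q^{i-1}\, \frac{2 i (n-i) + 2i - 1}{n^2} \\
&\le \frac{2n+2}{n^2}\, (1-q) \sum_{i \ge 1} i\, q^{i-1} \\
&= \frac{1}{f} \cdot \frac{2n+2}{n^2} \sum_{i \ge 1} i \Big( 1 - \frac{1}{f} \Big)^{i-1}.
\end{aligned}
$$

We split the sum into blocks of length $f$ and see that for $k = 0, 1, 2, \dots$

$$ \sum_{i = kf+1}^{(k+1) f} i \Big( 1 - \frac{1}{f} \Big)^{i-1} = \sum_{j=1}^{f} (kf + j) \Big( 1 - \frac{1}{f} \Big)^{kf + j - 1} \le \sum_{j=1}^{f} e^{-k} (k+1) f = f^2\, e^{-k + \ln (k+1)}. $$

Then we get

$$ \frac{1}{f} \cdot \frac{2n+2}{n^2} \sum_{k \ge 0} \sum_{j=1}^{f} (kf+j) \Big( 1 - \frac{1}{f} \Big)^{kf+j-1} \le \frac{(2n+2)\, f}{n^2} \sum_{k \ge 0} (k+1)\, e^{-k} \le c\, \frac{f}{n} $$

for a constant $c$. Using Theorem 3, we get a lower bound of $c\, n^2 / f(n)$ for the running time of QuickSort.

To get an upper bound for geometric distributions we estimate

$$ \sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big) \ge \cdots $$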

which gives $T(n) \le c\, n^2 \cdots$ as upper bound, if $f(n) \in o(n)$.

Proof.

$$ \sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big) = \frac{1-q}{q} \sum_{i=1}^{n} q^{i} H\Big(\frac{i}{n}\Big) \ge \cdots $$

We again set $q := 1 - 1/f(n)$ to obtain

$$ \sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big) \ge \cdots $$

for some constant $c > 0$ if $f(n) \in o(n)$. So we have an upper bound for the worst-case running time of $T(n) \le c\, n^2 \cdots$ for some constant $c > 0$.

5 The δ-random source

A general model of a random bit generator is the δ-random-source. Since the bias of each bit is a function of the previous output, it can be applied as an adversary argument and is particularly suited for worst-case analysis. See also [Pa, SV, AR].

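Definition 6 (See [AR]) A δ-random-source is a random bit generator. Its bias may depend on the bits it has previously output, but the probability to output "1" must be in the range $[\delta, 1-\delta]$. Therefore, it has an internal state $\omega \in \{0,1\}^*$, denoting its previously output bits.

To obtain a random number $X$ in the range $1, \dots, n$ from the δ-random-source, we output $\lceil \log n \rceil$ bits and interpret them as a number $Y$. Then, we set $X := (Y \bmod n) + 1$.

Lemma 7 (See [As]) For each $p$ with $0 < p < \tfrac{1}{2}$, there exists a constant $c_p$, such that for all $n \in \mathbb{N}$:

$$ c_p\, \frac{2^{n H(p)}}{\sqrt{n}} \le \sum_{j=0}^{\lfloor p n \rfloor} \binom{n}{j} \le 2^{n H(p)}. $$

Lemma 8 (de Moivre-Laplace Limit Theorem) For each $p$, $0 < p < 1$,

$$ \lim_{n \to \infty} \sum_{k=0}^{\lfloor p n \rfloor} \binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{2}. $$

Proof. Let $S_{n,p}$ be a binomially distributed random variable with parameters $n$ and $p$. The normalized binomial distribution can be approximated by the normal distribution $\Phi$, so

$$ \lim_{n \to \infty} \sum_{k=0}^{\lfloor p n \rfloor} \binom{n}{k} p^k (1-p)^{n-k} = \lim_{n \to \infty} \Pr[S_{n,p} \le p n] = \lim_{n \to \infty} \Pr\Big[ \frac{S_{n,p} - p n}{\sqrt{n p (1-p)}} \le 0 \Big] = \Phi(0) = \frac{1}{2}. $$

Theorem 9 For each δ-random-source, $0 < \delta < \tfrac{1}{2}$, there exists $n_0 \in \mathbb{N}$, such that for each $n > n_0$ and each permutation $\pi$, Theorem 1 can be applied with

$$ g(n) = c_\delta\, \frac{n^{1 - H(\delta)}}{\sqrt{\log n}}, $$

where the random bits are produced by a δ-random-source (modulo $n$) and $c_\delta$ is a constant that depends on $\delta$.

Proof. From the symmetry and monotony of the entropy function it follows that for each $s$

$$ \sum_{i=1}^{n} p_i H\Big(\frac{i}{n}\Big) \ge \Big( 1 - \sup_{\pi, \omega} \sum_{j=1}^{s} p_j \Big) \cdot H\Big(\frac{s}{2n}\Big), \qquad (1) $$

where $p_j$ depends on $\pi$ and on the internal state $\omega$ of the random source. Now we examine the two factors on the righthand side of (1) separately. We set

$$ k := \lceil \log n \rceil \qquad \text{and} \qquad s := \frac{1}{2} \sum_{j=0}^{\lfloor \delta k \rfloor} \binom{k}{j}. $$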

Sce { P r[y πj], + πj 2 k p j P r[y πj] + P r[y πj + ] otherwse we get for the frst factor of 1, s 1 sup π, ω j1 p j sup ω δk j0 max M {0,1} k, M 2s k δ j 1 δ k j. j P r[y M] Here we use the result from [AR], that the maxmum probablty of httg a set of a certa sze ca be acheved by a extreme δ-radom-source that always outputs 0 wth probablty δ. Sce by Lemma 8 δk k lm δ j 1 δ k j 1 k j 2, j0 there exsts some costat c δ, so that p j c δ. s 1 sup π, ω j1 Now we cosder the rght factor of the equato above. We use the mootoy of Hx o the tervall [0, 1 2 ] ad Lemma 7: s s H H 2 2 k+1 H c 1 δ 2Hδ 1k 4 k We cosder δ < 1 2 so that Hδ < 1 ad use that Hx x log x. So we get s [ H c 1 δ 2Hδ 1k 2 4 1 Hδk log c ] 1δ k 4. k For k bg eough k > k 0 correspods to > 0, there s a costat c δ so that s H c δ k 2 Hδ 1k. 2 Combg the results, there s a 0 IN ad a c δ, such that for all 0, ad all permutatos π o {0,..., } ad all states ω {0, 1} of the geerator the followg holds: p H c δ log Hδ 1 log 2 1 cδ log Hδ 1, whch leads to the expected rug tme of T cδ 2 Hδ log. 13

References

[AR] Noga Alon, Michael O. Rabin: Biased coins and randomized algorithms. In: F.P. Preparata, S. Micali (eds): Advances in Computing Research 5. JAI Press, 1989, pages 499-507.
[As] R.B. Ash: Information Theory. Dover, 1965.
[Ho] C.A.R. Hoare: Quicksort. Computer Journal, 5(1): 10-15, 1962.
[Kn] Donald Knuth: The Art of Computer Programming. Vol 3: Sorting and Searching. Addison-Wesley, 1973.
[KR] H.J. Karloff, P. Raghavan: Randomized algorithms and pseudorandom numbers. Journal of the Association for Computing Machinery 40 (1993) 454-476.
[Li] Beatrice List: Probabilistische Algorithmen und schlechte Zufallszahlen. Doctoral Dissertation, Universität Ulm, 1999.
[Lu] Michael Luby: Pseudorandomness and Cryptographic Applications. Princeton University Press, 1996.
[Pa] C. H. Papadimitriou: Computational Complexity. Addison-Wesley, 1994 (pages 259ff).
[SF] Robert Sedgewick, Philippe Flajolet: Analysis of Algorithms. Addison-Wesley, 1996.
[SV] M. Santha, U. V. Vazirani: Generating quasi-random sequences from slightly random sources. Proceedings of the 25th IEEE
[To] Martin Tompa: Probabilistic Algorithms and Pseudorandom Generators. Lecture Notes, 1991.

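Appendix

Lemma 10 For $n > 0$,

$$ \sum_{i=1}^{n} i \log_2 i \ge \sum_{i=1}^{n} i \,(\log_2 n - 1). $$

Proof. Let $S_n := \sum_{i=1}^{n} i \log_2 i$. We prove the lemma by induction, going from $n/2$ to $n$:

$$
\begin{aligned}
S_n &= \sum_{i=1}^{n/2} i \log_2 i + \sum_{i=n/2+1}^{n} i \log_2 i \\
&\ge \sum_{i=1}^{n/2} i \,\Big( \log_2 \frac{n}{2} - 1 \Big) + \sum_{i=n/2+1}^{n} i \,\Big( \log_2 n - 1 + 1 + \log_2 \frac{i}{n} \Big) \\
&= \sum_{i=1}^{n} i \,(\log_2 n - 1) - \sum_{i=1}^{n/2} i + \sum_{i=n/2+1}^{n} i \,\Big( 1 + \log_2 \frac{i}{n} \Big).
\end{aligned}
$$

For $i = n/2 + j$ with $1 \le j \le n/2$ we have $1 + \log_2 (i/n) = \log_2 (1 + 2j/n) \ge 2j/n$, because $\log_2 (1+x) \ge x$ for $0 \le x \le 1$. Hence

$$ \sum_{i=n/2+1}^{n} i \,\Big( 1 + \log_2 \frac{i}{n} \Big) \ge \sum_{j=1}^{n/2} \Big( \frac{n}{2} + j \Big) \frac{2j}{n} \ge \sum_{j=1}^{n/2} j, $$

so the last two sums above together are non-negative and $S_n \ge \sum_{i=1}^{n} i \,(\log_2 n - 1)$.

ECCC, ISSN 1433-8092
http://www.eccc.uni-trier.de/eccc
ftp://ftp.eccc.uni-trier.de/pub/eccc
ftpmail@ftp.eccc.uni-trier.de, subject 'help eccc'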