THE ANALYSIS OF RANGE QUICKSELECT AND RELATED PROBLEMS


CONRADO MARTÍNEZ, ALOIS PANHOLZER, AND HELMUT PRODINGER

ABSTRACT. Range Quickselect, a simple modification of the well-known Quickselect algorithm for selection, can be used to efficiently find an element with rank $k$ in a given range $[i..j]$, out of $n$ given elements. We study basic cost measures of Range Quickselect by computing exact and asymptotic results for the expected number of passes, comparisons and data moves during the execution of this algorithm. The key element appearing in the analysis of Range Quickselect is a trivariate recurrence that we solve in full generality. The general solution of the recurrence proves to be very useful, as it allows us to tackle several related problems, besides the analysis that originally motivated us. In particular, we have been able to carry out a precise analysis of the expected number of moves of the $p$th element when selecting the $j$th smallest element with standard Quickselect, where we are able to give both exact and asymptotic results. Moreover, we can apply our general results to obtain exact and asymptotic results for several parameters in binary search trees, namely the expected number of common ancestors of the nodes with rank $i$ and $j$, the expected size of the subtree rooted at the least common ancestor of the nodes with rank $i$ and $j$, and the expected distance between the nodes of ranks $i$ and $j$.

1. INTRODUCTION

Quickselect, also called Hoare's FIND algorithm, is a very flexible and easy to implement recursive algorithm to find the element of given rank $k$ (i.e., the $k$th smallest element) in a given data array $A[1..n]$ of length $n$. The Quickselect algorithm uses partitioning of the array into two subarrays around a pivot element, as in the popular Quicksort, also by C. A. R. Hoare [5, 6]. The behavior of fundamental quantities like the number of comparisons between data elements and the number of passes (recursive calls of the algorithm) in Quickselect has been extensively studied; see, for instance, [4, 8, 10, 14] and references therein.
These quantities have also been studied for many variants of the standard algorithm, for example, for the median-of-three partitioning scheme [9].

In the present work we consider a variant of Quickselect, which we have dubbed Range Quickselect, which receives as input the data array and a range $[i..j]$. Its goal is to find an element whose rank falls in the given range. The analysis of Range Quickselect poses several quite natural questions related to the Quickselect algorithm that do not seem to have been treated up to now. Range Quickselect (RQS, for short) is useful when we are not necessarily interested in an exact order statistic, but in some order statistic within a range $[i..j]$ of ranks. For example, instead of finding the exact median we could be content with an element whose rank is, say, between $0.48n$ and $0.52n$. This relaxation of the Quickselect algorithm will lead, depending on the range $[i..j]$, to a reduction of the number of passes and of the number of comparisons between elements in the array during the execution, and thus to a faster execution time.

We compute the exact average number of passes and the exact average number of comparisons between elements when executing RQS, and as a consequence we can give results quantifying the average amount of savings compared to standard Quickselect. In particular, given some measure of performance $X$, we compare the difference between $X_{n,i}$, the average value of $X$ corresponding to Quickselect when given an input of size $n$ and looking for the $i$th smallest element, and $X_{n,i-d,i+d}$, the average value of $X$ corresponding to Range Quickselect when given an input of size $n$ and looking for an element whose rank falls in the range $[i-d, i+d]$. The asymptotic behavior of that difference in terms of $n$ and $d$ provides a clear picture of the benefits of Range Quickselect and the trade-off between speed and accuracy.

2000 Mathematics Subject Classification. 05A15, 68P10, 68W40.
Key words and phrases. Quickselect, Hoare's Find, moves, Range Quickselect, binary search trees, average-case analysis.
This work was supported by the Spanish-Austrian research agreement Acciones Integradas, grant ES 0/2008, and by the Spanish-South African research agreement Acciones Integradas, grant HS. The first author was supported by the Spanish Min. of Science and Technology, project TIN (ALINEX). The second author was supported by the Austrian Science Foundation FWF, grant S9608. The third author was supported by a grant of the South African Science Foundation NRF. This research was also supported by the Center for Mathematical Research (CRM), Bellaterra, Spain, while the first and the third authors held research visiting positions there.

The description of the algorithm and the analysis of the expected behavior of its fundamental performance characteristics form the core of Section 2. The analysis of Range Quickselect involves the solution of trivariate recurrences which we have been able to solve in full generality. The result (Theorem 2) that we obtain in Subsection 2.3 turns out to be very useful in the analysis of other interesting parameters, including the number of moves of a particular element during the execution of the standard Quickselect algorithm and the total number of moves made during the execution of Range Quickselect. In particular, we give exact results for the average number of moves of the element with rank $p$ made while selecting the $j$th smallest element out of $n$, and also for the average total number of moves during the execution of the Range Quickselect algorithm, when finding an element with rank $k \in [i..j]$ out of $n$ (Section 3).
These parameters give further insight into the functionality of the Quickselect algorithms and moreover, since moves of elements correspond to variable assignments in the algorithm, these quantities appear when measuring the total cost of the Quickselect algorithms. We also want to mention here two recent related studies, one about the number of moves of particular elements in the Quicksort sorting algorithm [8] and the other on the total number of moves in Quickselect, but for a randomly chosen rank [3].

The close connection between Quickselect and random binary search trees surfaces also in this paper, as in many previous works of the area (see, for instance, [7]). We establish in Section 4 the relation between Range Quickselect and several parameters in random binary search trees that involve two given nodes. We study the average number of common ancestors of the nodes with ranks $i$ and $j$, the average size of the subtree rooted at the least common ancestor of the nodes with ranks $i$ and $j$, and the average distance (number of edges) from the node of rank $i$ to the node of rank $j$. Although these results can be obtained (and have been obtained) by other means, we show that all of them follow from direct application of Theorem 2. This is a further example of the generality and usefulness of this tool, which qualifies as one of the important contributions of this paper.

We stress that in this paper we restrict our analysis to the expected value of the quantities considered. In all cases, we shall consider that the input is an array of $n$ distinct elements, with the $n!$ possible orderings taken as equally likely. This assumption is standard in the probabilistic analysis of comparison-based sorting and selection algorithms (see, for instance, []). Furthermore, the assumption that the input is a random permutation can be removed if we consider that the pivot of each recursive stage is picked uniformly at random among the elements of the current subarray.
Indeed, whatever the initial permutation is, if we pick pivots at random then the probability that we choose the $k$th smallest element out of $N$ is $1/N$ for all $k$, $1 \le k \le N$. When we assume that the source of randomness comes from the algorithm itself, expectations are with respect to the random choices made by the algorithm, not by assuming any particular distribution on the inputs. Both approaches yield the same results, but we will talk in terms of the random permutation model for the rest of the article.

It is also worth mentioning that, apart from the study of the number of moves of a particular element in Quickselect, where dependencies between the quantities appearing in the recursive description occur (see Section 3), our analysis could, at least in principle, be extended to higher moments, most notably to the second moment and thus to the variance, although the computational effort would be considerable (see, for instance, [8]).

We conclude this section with a few remarks concerning notations used in this paper. We use Iverson's bracket notation $[Q]$ for a statement $Q$: $[Q] = 1$ if $Q$ is true and $[Q] = 0$ otherwise [3]. The harmonic numbers are always denoted by $H_n := \sum_{k=1}^{n} \frac{1}{k}$, for a positive integer $n$. Moreover, the random variable $\mathbf{1}_E$ always denotes the indicator function of the event $E$, which gives the value 1 when $E$ occurs and gives the value 0 otherwise. Throughout this paper we use for all quantities considered a calligraphic letter such as $\mathcal{P}$, $\mathcal{C}$, etc. to denote random variables, whereas the corresponding ordinary letters denote their expectations, e.g., $P = \mathbb{E}(\mathcal{P})$.

2. RANGE QUICKSELECT

2.1. The algorithm. We begin with a description of the standard Quickselect algorithm for selection. The call QUICKSELECT(A, j, l, r) will find the $(j-l+1)$th smallest element amongst all elements in the array $A[l..r]$, with $l \le j \le r$. In full rigor, the algorithm will return an element $x$ of $A[l..r]$ such that there are at least $j-l+1$ elements in the subarray which are less than or equal to $x$. To have a neat definition of rank, we shall assume that the given elements are distinct.
This will simplify the discussion about the algorithms and their correctness throughout the paper, and it is also essential for our analysis, as we have already pointed out in the introduction.¹ After executing this algorithm, it holds that $A[j]$ stores the element of the desired rank $j-l+1$ in $A[l..r]$; in particular, the initial call QUICKSELECT(A, j, 1, n) will bring the $j$th smallest element of $A[1..n]$ to $A[j]$. Moreover, the algorithm rearranges the contents of the array in such a way that $A[m] \le A[j]$ for all $l \le m < j$, and $A[j] \le A[m]$ for all $j < m \le r$.

If $r \le l$, the subarray contains at most one element, and the problem is trivially solved, since $A[l]$ must contain the sought element. When $l < r$, we perform a partitioning phase, in which one of the elements in the array, say $A[l]$, is chosen as a pivot element. By comparing this pivot element $v$ with all remaining elements in the array and interchanging elements, the pivot element will be brought to its correct position in the array, say $A[k]$, such that all elements in the array $A[l..k-1]$ are smaller than or equal to $v = A[k]$ and all elements in the array $A[k+1..r]$ are larger than or equal to $v$. The partitioning algorithm is given in full detail in Subsection 3.1, when we analyze the number of moves carried out by Quickselect and Range Quickselect. For the time being, it is enough to note that the partitioning algorithm will make exactly $n-1 = r-l$ comparisons between the pivot and the remaining elements in the (sub)array of size $n$; moreover, if the subarray contains a random permutation of $n$ elements, the two subarrays that we obtain after partitioning are random permutations too.
After the partitioning phase, three cases can occur: (1) if $j = k$ we know then that $v = A[k] = A[j]$ is the $(j-l+1)$th smallest element in $A[l..r]$ and the algorithm terminates, (2) if $j < k$ we know that the required element is contained in the left subarray and we proceed by searching for the $(j-l+1)$th smallest element in the array $A[l..k-1]$ with a recursive call of Quickselect, and (3) if $j > k$ we know that the required element is contained in the right subarray and we proceed by searching for the $(j-k)$th smallest element in the array $A[k+1..r]$, again with a recursive call of Quickselect. The algorithm is detailed as Algorithm 1.

¹ Both standard Quickselect and Range Quickselect work correctly in the presence of repeated elements; they return an element such that there are at least some number, say $k$, of elements smaller than or equal to it.

Algorithm 1 The Quickselect algorithm
Require: array A[l..r], integer j with l ≤ j ≤ r
Ensure: Returns j; A[j] contains the (j - l + 1)th smallest element in the array A[l..r]
procedure QUICKSELECT(A, j, l, r)
    if r ≤ l then
        return l
    end if
    PARTITION(A, l, r, k)
    ▷ ∀m : (l ≤ m < k) ⇒ A[m] ≤ A[k], and ∀m : (k < m ≤ r) ⇒ A[k] ≤ A[m]
    if j < k then
        return QUICKSELECT(A, j, l, k - 1)
    else if j > k then
        return QUICKSELECT(A, j, k + 1, r)
    else
        return k
    end if
end procedure

Two simple modifications of the Quickselect algorithm allow us to solve the problem of range selection. Range Quickselect is given the array $A$, the lower and upper indices $l$ and $r$ that delimit the subarray that contains the elements of interest, and the values $i$ and $j$ that specify a range of ranks. The call RQS(A, i, j, l, r) returns a value $k$ such that the element at $A[k]$ has a rank between $i-l+1$ and $j-l+1$ amongst all elements in the array $A[l..r]$, for $l \le i \le j \le r$. A call to RQS(A, i, j, 1, n) returns a value $k$ such that $A[k]$ has a rank $k \in [i..j]$ among the $n$ elements in $A[1..n]$. As in Quickselect, it also holds that $A[m] \le A[k]$ for all $l \le m < k$, and that $A[k] \le A[m]$ for all $k < m \le r$.

Compared to the standard Quickselect algorithm we need only make the following two modifications. First, we stop if $j-i \ge r-l$, since the subarray contains only elements whose ranks are between $i$ and $j$ and any of them will do.² The other modification comes after the partitioning phase, that is, after the pivot element $v$ is brought to its correct position $A[k]$ in the array, with all elements in the array $A[l..k-1]$ smaller than or equal to $v = A[k]$ and all elements in the array $A[k+1..r]$ larger than or equal to $v$.
We have three cases: (1) if $i \le k \le j$ the pivot has a rank in the range $[i-l+1..j-l+1]$ and we can return $k$ and terminate the algorithm, (2) if $j < k$ we know that each element of interest is contained in the left subarray and we continue with the selection of an element with a rank between $i-l+1$ and $j-l+1$ in the array $A[l..k-1]$ by making a recursive call of Range Quickselect on $A[l..k-1]$, and (3) if $i > k$ we know that each element of interest is contained in the right subarray and we recursively proceed looking for an element with a rank between $i-k$ and $j-k$ in the array $A[k+1..r]$. An implementation of this algorithm is given as Algorithm 2.

2.2. The number of passes. We start our analysis of Range Quickselect with the average behavior of the random variable $\mathcal{P}_{n,i,j}$, which counts the number of passes, i.e., (recursive) calls, of the algorithm RQS until an element with a rank between $i$ and $j$ is found in an array $A[1..n]$. Here, and for the rest of the paper, as we have already discussed in the introduction, we assume that the array contains a random permutation of $n$ distinct elements. We also assume that we choose the first element of the current subarray as the pivot of each recursive stage.

² Equivalently, we stop if $i \le l$ and $r \le j$.
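The two modifications described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: it uses 0-based indices, the helper names `partition` and `range_quickselect` are hypothetical, and the partition step is assumed to be a simple hole-based scheme that takes the first element of the subarray as pivot.

```python
import random

def partition(A, l, r):
    """Partition A[l..r] (inclusive) around the pivot A[l] (assumed scheme).

    Returns the final index k of the pivot; afterwards A[l..k-1] <= A[k] <= A[k+1..r].
    """
    a, b, v = l, r, A[l]
    while a < b:
        while a < b and A[b] > v:   # scan from the right
            b -= 1
        if a < b:
            A[a] = A[b]
            a += 1
        while a < b and A[a] < v:   # scan from the left
            a += 1
        if a < b:
            A[b] = A[a]
            b -= 1
    A[a] = v
    return a

def range_quickselect(A, i, j, l=0, r=None):
    """Return an index k such that A[k] has a (0-based) rank in [i..j].

    Requires 0 <= i <= j <= len(A) - 1 on the initial call.
    """
    if r is None:
        r = len(A) - 1
    if r - l <= j - i:              # every remaining element has a rank in [i..j]
        return l
    k = partition(A, l, r)
    if j < k:                       # all ranks of interest lie to the left
        return range_quickselect(A, i, j, l, k - 1)
    if i > k:                       # all ranks of interest lie to the right
        return range_quickselect(A, i, j, k + 1, r)
    return k                        # the pivot itself falls in the range

data = random.sample(range(100), 20)
k = range_quickselect(data, 8, 11)  # any element of rank 8..11 will do
```

Setting `i == j` in the call gives back ordinary Quickselect, which is why the analysis of the range variant also covers the standard algorithm.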

Algorithm 2 The Range Quickselect algorithm
Require: Array A[l..r], integers i and j with l ≤ i ≤ j ≤ r
Ensure: Returns k, with i ≤ k ≤ j; A[k] has rank between i - l + 1 and j - l + 1 in the array A[l..r]
procedure RQS(A, i, j, l, r)
    if r - l ≤ j - i then
        return l
    end if
    PARTITION(A, l, r, k)
    ▷ ∀m : (l ≤ m < k) ⇒ A[m] ≤ A[k], and ∀m : (k < m ≤ r) ⇒ A[k] ≤ A[m]
    if j < k then
        return RQS(A, i, j, l, k - 1)
    else if i > k then
        return RQS(A, i, j, k + 1, r)
    else
        return k
    end if
end procedure

Theorem 1. The expected number of passes $P_{n,i,j} = \mathbb{E}(\mathcal{P}_{n,i,j})$ of the algorithm Range Quickselect until an element with a rank between $i$ and $j$ is found in an array of $n$ elements is
\[
P_{n,i,j} = H_j + H_{n-i+1} - 2H_{j-i+1} + 1 = \log j + \log(n-i+1) - 2\log(j-i+1) + O(1), \quad \text{for } 1 \le i \le j \le n.
\]
The asymptotic estimate given holds uniformly for $1 \le i \le j \le n$ and $n \to \infty$.

When $i = j$ the formula yields the well-known average number of passes of Quickselect (see, for instance, [7]):
\[
P_{n,j,j} = H_j + H_{n-j+1} - 1, \quad \text{for } 1 \le j \le n.
\]

In order to show this theorem we start with a recursive description of $\mathcal{P}_{n,i,j}$. Since we assume that the input is a random permutation of size $n$, the probability that the pivot element $v = A[1]$ is the $k$th smallest element in the array is $1/n$ for all $k$, $1 \le k \le n$. After the partitioning phase the left subarray $A[1..k-1]$ and the right subarray $A[k+1..n]$ contain random permutations of lengths $k-1$ and $n-k$, respectively. If $i \le k \le j$ the algorithm terminates and we only have to count the original call to RQS. If $k < i$ we proceed with a recursive call of RQS for the right subarray, and if $k > j$ we proceed with a recursive call of RQS for the left subarray. In these latter cases we have to add the number of calls of RQS occurring therein to the original call. These considerations immediately lead to the following proposition (see, for instance, [4, 20] and references therein for background on distributional recurrences like the one below).

Proposition 1.
The random variable $\mathcal{P}_{n,i,j}$ satisfies the following distributional recurrence:
\[
\mathcal{P}_{n,i,j} \overset{(d)}{=} 1 + \mathbf{1}_{\{U_n < i\}}\, \mathcal{P}^{(1)}_{n-U_n,\,i-U_n,\,j-U_n} + \mathbf{1}_{\{U_n > j\}}\, \mathcal{P}^{(2)}_{U_n-1,\,i,\,j}, \quad \text{for } 1 \le i \le j \le n,
\]
and $\mathcal{P}_{n,i,j} = 0$ if $i < 1$ or $j > n$, where the rank $U_n$ of the pivot element is uniformly distributed on $\{1, 2, \ldots, n\}$ and it is independent of $(\mathcal{P}_{n,i,j})_{n,i,j}$, $(\mathcal{P}^{(1)}_{n,i,j})_{n,i,j}$ and $(\mathcal{P}^{(2)}_{n,i,j})_{n,i,j}$; furthermore $(\mathcal{P}^{(1)}_{n,i,j})_{n,i,j}$ and $(\mathcal{P}^{(2)}_{n,i,j})_{n,i,j}$ are independent copies of $(\mathcal{P}_{n,i,j})_{n,i,j}$.

Proposition 1 immediately leads to the following recurrence for the expectation $P_{n,i,j}$ of the number of passes:
\[
P_{n,i,j} = 1 + \frac{1}{n}\sum_{k=1}^{i-1} P_{n-k,\,i-k,\,j-k} + \frac{1}{n}\sum_{k=j+1}^{n} P_{k-1,\,i,\,j}, \quad \text{for } 1 \le i \le j \le n, \tag{1}
\]
and $P_{n,i,j} = 0$ if $i < 1$ or $j > n$. It is not difficult to show by induction that the closed form for $P_{n,i,j}$ given in Theorem 1 is indeed the solution of the recurrence above. This recurrence and others that we will find later can in principle be solved using more or less standard techniques in an ad-hoc fashion; however, the details of the derivation are already cumbersome for (1) and they get even worse when we have to deal with more complicated recurrences. Therefore, we will take a detour in the next subsection, where we will investigate the general solution of trivariate recurrences whose shape is that of (1), but with a generic non-recursive cost $T_{n,i,j}$. With this systematic and general approach the solution of (1) will then be a by-product of the main result in the next subsection (Theorem 2). We will need only to set $T_{n,i,j} = 1$ and apply the theorem. The rewards of this general analysis will be manifest soon afterwards, when we use Theorem 2 to obtain the expected number of comparisons of Range Quickselect (Subsection 2.4), later in Section 3 when we analyze the number of moves of particular elements made by Quickselect and the total number of moves made by Range Quickselect, and finally, in Section 4 when we investigate several parameters of random binary search trees.

2.3. Solving a trivariate recurrence. We consider the following recurrence for numbers $X_{n,i,j}$, which appears in our studies of the Quickselect and Range Quickselect algorithms, and later for binary search trees:
\[
X_{n,i,j} = \frac{1}{n}\sum_{k=1}^{i-1} X_{n-k,\,i-k,\,j-k} + \frac{1}{n}\sum_{k=j+1}^{n} X_{k-1,\,i,\,j} + T_{n,i,j}, \quad \text{for } 1 \le i \le j \le n. \tag{2}
\]
Furthermore we define $X_{n,i,j} = 0$ if $i < 1$ or $j < i$ or $n < j$. For the toll function $T_{n,i,j}$ we also define $T_{n,i,j} = 0$ if $i < 1$ or $j < i$ or $n < j$. We remark that (2) is a generalization of the ordinary Quickselect recurrence which appears when studying the moments of the number of comparisons and passes of Quickselect to select the $j$th smallest element in an array of size $n$. Indeed, the ordinary Quickselect recurrence is the special instance of (2) where $i = j$.
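The ordinary Quickselect recurrence was first studied by Knuth [10]; an exact solution for arbitrary toll functions has been given by Kuba in [2]. To treat recurrence (2) we introduce the following trivariate generating functions:
\[
X(z, u_1, u_2) := \sum_{n \ge 1}\ \sum_{1 \le i \le j \le n} X_{n,i,j}\, z^n u_1^i u_2^j, \qquad
T(z, u_1, u_2) := \sum_{n \ge 1}\ \sum_{1 \le i \le j \le n} T_{n,i,j}\, z^n u_1^i u_2^j.
\]
Multiplying (2) by $n z^{n-1} u_1^i u_2^j$ and summing up for all values $1 \le i \le j \le n$ leads, after straightforward computations, to the following differential equation for the generating function $X(z, u_1, u_2)$:
\[
\frac{\partial}{\partial z} X(z, u_1, u_2) = \left(\frac{1}{1-z} + \frac{u_1 u_2}{1 - z u_1 u_2}\right) X(z, u_1, u_2) + \frac{\partial}{\partial z} T(z, u_1, u_2),
\]
with initial condition $X(0, u_1, u_2) = 0$. The solution of this first order linear differential equation, which can be obtained by standard techniques, is:
\[
X(z, u_1, u_2) = \frac{1}{(1-z)(1 - z u_1 u_2)} \int_0^z (1-t)(1 - u_1 u_2 t)\, \frac{\partial}{\partial t} T(t, u_1, u_2)\, dt. \tag{3}
\]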
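For readers who want the "standard techniques" step spelled out, the following worked derivation sketches the integrating-factor argument. It uses only the differential equation and the initial condition stated above; the shorthand $w = u_1 u_2$ and the name $\mu$ for the integrating factor are introduced here for convenience and are not notation from the paper.

```latex
\begin{align*}
\frac{\partial}{\partial z} X
  &= \Bigl(\frac{1}{1-z} + \frac{w}{1-wz}\Bigr) X + \frac{\partial}{\partial z} T,
  \qquad w := u_1 u_2, \\
\mu(z)
  &:= \exp\Bigl(-\int_0^z \frac{dt}{1-t} - \int_0^z \frac{w\,dt}{1-wt}\Bigr)
   = (1-z)(1-wz), \\
\frac{\partial}{\partial z}\bigl(\mu X\bigr)
  &= \mu\,\Bigl(\frac{\partial}{\partial z} X
       - \Bigl(\frac{1}{1-z} + \frac{w}{1-wz}\Bigr) X\Bigr)
   = \mu\,\frac{\partial}{\partial z} T, \\
X(z,u_1,u_2)
  &= \frac{1}{(1-z)(1-wz)} \int_0^z (1-t)(1-wt)\,
     \frac{\partial}{\partial t} T(t,u_1,u_2)\,dt ,
\end{align*}
```

where the last line integrates from $0$ to $z$ and uses $X(0, u_1, u_2) = 0$, so that no additive constant appears.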

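The numbers $X_{n,i,j}$ can then be obtained by extracting coefficients from the solution (3). By taking into account that $T_{n,i,j} = [z^n u_1^i u_2^j]\,T(z, u_1, u_2) = 0$ if $i < 1$ or $j < i$ or $n < j$, we get, for $1 \le i \le j \le n$:
\begin{align*}
X_{n,i,j} &= [z^n u_1^i u_2^j]\, X(z, u_1, u_2) \\
&= \sum_{l=0}^{i-1} [z^{n-l} u_1^{i-l} u_2^{j-l}]\, \frac{1}{1-z} \int_0^z (1-t)(1 - u_1 u_2 t)\, \frac{\partial}{\partial t} T(t, u_1, u_2)\, dt \\
&= \sum_{l=0}^{i-1} \sum_{k=j-l}^{n-l} [z^{k} u_1^{i-l} u_2^{j-l}] \int_0^z (1-t)(1 - u_1 u_2 t)\, \frac{\partial}{\partial t} T(t, u_1, u_2)\, dt \\
&= \sum_{l=0}^{i-1} \sum_{k=j-l}^{n-l} \frac{1}{k}\, [z^{k-1} u_1^{i-l} u_2^{j-l}]\, (1-z)(1 - u_1 u_2 z)\, \frac{\partial}{\partial z} T(z, u_1, u_2) \\
&= \sum_{l=0}^{i-1} \sum_{k=j-l}^{n-l} \frac{1}{k} \Bigl( k T_{k,i-l,j-l} - (k-1) T_{k-1,i-l,j-l} - (k-1) T_{k-1,i-l-1,j-l-1} + (k-2) T_{k-2,i-l-1,j-l-1} \Bigr).
\end{align*}
The expression can be simplified easily by straightforward manipulations (replacing $l$ by $i-l$ and shifting the summation indices of the last two terms), thus
\begin{align*}
X_{n,i,j} &= \sum_{l=1}^{i} \sum_{k=j-i+l}^{n-i+l} \frac{k T_{k,l,j-i+l} - (k-1) T_{k-1,l,j-i+l}}{k} - \sum_{l=1}^{i-1} \sum_{k=j-i+l+1}^{n-i+l+1} \frac{(k-1) T_{k-1,l,j-i+l} - (k-2) T_{k-2,l,j-i+l}}{k} \\
&= \sum_{l=1}^{i} \sum_{k=j-i+l}^{n-i+l} \frac{k T_{k,l,j-i+l} - (k-1) T_{k-1,l,j-i+l}}{k} - \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l} \frac{k T_{k,l,j-i+l} - (k-1) T_{k-1,l,j-i+l}}{k+1} \\
&= \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l} \frac{k T_{k,l,j-i+l} - (k-1) T_{k-1,l,j-i+l}}{k(k+1)} + \sum_{k=j}^{n} \frac{k T_{k,i,j} - (k-1) T_{k-1,i,j}}{k}.
\end{align*}
Further simplifications yield
\begin{align*}
X_{n,i,j} &= \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l} \frac{k T_{k,l,j-i+l}}{k(k+1)} - \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l-1} \frac{k T_{k,l,j-i+l}}{(k+1)(k+2)} + \sum_{k=j}^{n} \frac{k T_{k,i,j}}{k} - \sum_{k=j}^{n-1} \frac{k T_{k,i,j}}{k+1} \\
&= \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l-1} \frac{2 T_{k,l,j-i+l}}{(k+1)(k+2)} + \sum_{l=1}^{i-1} \frac{T_{n-i+l,l,j-i+l}}{n-i+l+1} + \sum_{k=j}^{n-1} \frac{T_{k,i,j}}{k+1} + T_{n,i,j}.
\end{align*}
We collect our results in the following theorem.

Theorem 2. Let the sequence of numbers $X_{n,i,j}$, for $1 \le i \le j \le n$, satisfy the following recurrence:
\[
X_{n,i,j} = \frac{1}{n}\sum_{k=1}^{i-1} X_{n-k,\,i-k,\,j-k} + \frac{1}{n}\sum_{k=j+1}^{n} X_{k-1,\,i,\,j} + T_{n,i,j},
\]
with $T_{n,i,j}$, $1 \le i \le j \le n$, an arbitrary sequence, such that $T_{n,i,j} = 0$ if $i < 1$, $j < i$ or $n < j$. Then $X_{n,i,j}$, for $1 \le i \le j \le n$, is given by the explicit formula
\[
X_{n,i,j} = \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l-1} \frac{2 T_{k,l,j-i+l}}{(k+1)(k+2)} + \sum_{l=1}^{i-1} \frac{T_{n-i+l,l,j-i+l}}{n-i+l+1} + \sum_{k=j}^{n-1} \frac{T_{k,i,j}}{k+1} + T_{n,i,j}.
\]

We remark that setting $i = j$ above gives an exact solution of the generic Quickselect recurrence. The solution thus obtained is slightly different from the one given in [2] and it is stated in the following corollary.

Corollary 1. Let the sequence of numbers $X_{n,j}$, for $1 \le j \le n$, satisfy the following recurrence:
\[
X_{n,j} = \frac{1}{n}\sum_{k=1}^{j-1} X_{n-k,\,j-k} + \frac{1}{n}\sum_{k=j+1}^{n} X_{k-1,\,j} + T_{n,j}, \tag{4}
\]
with $T_{n,j}$, $1 \le j \le n$, an arbitrary sequence such that $T_{n,j} = 0$ if $j < 1$ or $n < j$. Then $X_{n,j}$, for $1 \le j \le n$, is given by the explicit formula
\[
X_{n,j} = \sum_{l=1}^{j-1} \sum_{k=l}^{n-j+l-1} \frac{2 T_{k,l}}{(k+1)(k+2)} + \sum_{l=1}^{j-1} \frac{T_{n-j+l,l}}{n-j+l+1} + \sum_{k=j}^{n-1} \frac{T_{k,j}}{k+1} + T_{n,j}.
\]

Recurrence (1) studied in Subsection 2.2 is the instance of recurrence (2) for the particular toll function $T_{n,i,j} = 1$, $1 \le i \le j \le n$. We can then obtain the exact solution of (1) by applying Theorem 2, which gives after easy summations:
\begin{align*}
P_{n,i,j} &= \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l-1} \frac{2}{(k+1)(k+2)} + \sum_{l=1}^{i-1} \frac{1}{n-i+l+1} + \sum_{k=j}^{n-1} \frac{1}{k+1} + 1 \\
&= \sum_{l=1}^{i-1} \left( \frac{2}{j-i+l+1} - \frac{2}{n-i+l+1} \right) + (H_n - H_{n-i+1}) + (H_n - H_j) + 1 \\
&= 2H_j - 2H_{j-i+1} - 2H_n + 2H_{n-i+1} + 2H_n - H_{n-i+1} - H_j + 1 \\
&= H_j + H_{n-i+1} - 2H_{j-i+1} + 1, \quad \text{for } 1 \le i \le j \le n.
\end{align*}
This proves Theorem 1.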
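As a sanity check, the explicit formula of Theorem 2 can be compared with a direct evaluation of recurrence (2) in exact rational arithmetic. The sketch below is an illustration, not part of the paper: the helper names `solve_by_recurrence`, `explicit` and `H` are hypothetical, and the toll function is passed as an ordinary Python function of `(n, i, j)`.

```python
from fractions import Fraction
from functools import lru_cache

def solve_by_recurrence(T):
    """Return a function X(n, i, j) defined by recurrence (2) with toll T."""
    @lru_cache(maxsize=None)
    def X(n, i, j):
        if not (1 <= i <= j <= n):
            return Fraction(0)
        s = Fraction(0)
        for k in range(1, i):            # pivot rank k < i: continue on the right
            s += X(n - k, i - k, j - k)
        for k in range(j + 1, n + 1):    # pivot rank k > j: continue on the left
            s += X(k - 1, i, j)
        return s / n + T(n, i, j)
    return X

def explicit(T, n, i, j):
    """Right-hand side of the explicit formula stated in Theorem 2."""
    def t(k, a, b):                      # toll, extended by 0 outside 1 <= a <= b <= k
        return Fraction(T(k, a, b)) if 1 <= a <= b <= k else Fraction(0)
    total = t(n, i, j)
    for l in range(1, i):
        for k in range(j - i + l, n - i + l):
            total += 2 * t(k, l, j - i + l) / ((k + 1) * (k + 2))
        total += t(n - i + l, l, j - i + l) / (n - i + l + 1)
    for k in range(j, n):
        total += t(k, i, j) / (k + 1)
    return total

def H(n):
    """Harmonic number H_n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def unit_toll(n, i, j):
    return 1                             # T = 1 counts passes (Theorem 1)

passes = solve_by_recurrence(unit_toll)
closed_form = H(4) + H(10 - 2 + 1) - 2 * H(4 - 2 + 1) + 1
agree = passes(10, 2, 4) == explicit(unit_toll, 10, 2, 4) == closed_form
```

Exact fractions are used instead of floating point so that agreement between the recurrence, the explicit formula and the closed form of Theorem 1 can be tested with `==` rather than with a tolerance.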

2.4. The number of comparisons. Next we study the average behavior of the random variable $\mathcal{C}_{n,i,j}$, with $1 \le i \le j \le n$, which counts the number of comparisons in the partitioning phase between elements in the array and the pivot element, when executing the algorithm Range Quickselect until an element with a rank between $i$ and $j$ is found in the array $A[1..n]$.

Theorem 3. The expected number of element comparisons $C_{n,i,j} = \mathbb{E}(\mathcal{C}_{n,i,j})$ made while executing the algorithm Range Quickselect until an element with a rank between $i$ and $j$ is found in an array of size $n$ is:
\begin{align*}
C_{n,i,j} &= 2(n+1)H_n + 2(j-i+4)H_{j-i+1} - 2(j+2)H_j - 2(n-i+3)H_{n-i+1} + 2n - j + i - 2 \\
&= 2n\log n + 2(j-i+1)\log(j-i+1) - 2j\log j - 2(n-i+1)\log(n-i+1) + 2n - j + i + O(\log n),
\end{align*}
for $1 \le i \le j \le n$. The asymptotic estimate holds uniformly for $1 \le i \le j \le n$ and $n \to \infty$.

Setting $i = j$ above, we obtain the average number of comparisons to select the $j$th smallest element out of $n$ [10]:
\[
C_{n,j,j} = 2\bigl(n + 3 + (n+1)H_n - (j+2)H_j - (n-j+3)H_{n-j+1}\bigr), \quad \text{for } 1 \le j \le n.
\]
Another immediate consequence of the theorem is that the value $C_{n,i,j}$ is always $\Theta(n)$, namely, $C_{n,i,j} = c(i/n, j/n)\,n + o(n)$, with
\[
c(a, b) = -2(1-a)\ln(1-a) - 2b\ln b + 2(b-a)\ln(b-a) + 2 - (b-a).
\]

The proof of this theorem is fully analogous to that of Theorem 1 in Subsection 2.2. First we obtain a distributional recurrence for $\mathcal{C}_{n,i,j}$, which has the same structure as the one given in Proposition 1. Here, we only have to take into account that during the partitioning phase, and independently of the actual rank of the pivot, we perform exactly $n-1$ comparisons between the pivot element and the other elements in the array.

Proposition 2. The random variable $\mathcal{C}_{n,i,j}$ satisfies the following distributional recurrence:
\[
\mathcal{C}_{n,i,j} \overset{(d)}{=} n - 1 + \mathbf{1}_{\{U_n < i\}}\, \mathcal{C}^{(1)}_{n-U_n,\,i-U_n,\,j-U_n} + \mathbf{1}_{\{U_n > j\}}\, \mathcal{C}^{(2)}_{U_n-1,\,i,\,j}, \quad \text{for } 1 \le i \le j \le n,
\]
and $\mathcal{C}_{n,i,j} = 0$ if $i < 1$ or $j > n$, where the rank $U_n$ of the pivot element is uniformly distributed on $\{1, 2, \ldots, n\}$ and independent of $(\mathcal{C}_{n,i,j})_{n,i,j}$, $(\mathcal{C}^{(1)}_{n,i,j})_{n,i,j}$ and $(\mathcal{C}^{(2)}_{n,i,j})_{n,i,j}$; the last two are independent copies of $(\mathcal{C}_{n,i,j})_{n,i,j}$.

Proposition 2 then gives the following recurrence for the expectation $C_{n,i,j}$ of the number of comparisons:
\[
C_{n,i,j} = n - 1 + \frac{1}{n}\sum_{k=1}^{i-1} C_{n-k,\,i-k,\,j-k} + \frac{1}{n}\sum_{k=j+1}^{n} C_{k-1,\,i,\,j}, \quad \text{for } 1 \le i \le j \le n, \tag{5}
\]
and $C_{n,i,j} = 0$ if $i < 1$ or $j > n$.
This recurrence is exactly the recurrence studied in Subsection 2.3 for the particular toll function $T_{n,i,j} = n-1$, for $1 \le i \le j \le n$. Applying Theorem 2 then easily leads, for $1 \le i \le j \le n$, to an exact formula for $C_{n,i,j}$ and proves Theorem 3:
\begin{align*}
C_{n,i,j} &= \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l-1} \frac{2(k-1)}{(k+1)(k+2)} + \sum_{l=1}^{i-1} \frac{n-i+l-1}{n-i+l+1} + \sum_{k=j}^{n-1} \frac{k-1}{k+1} + n - 1 \\
&= \sum_{l=1}^{i-1} \sum_{k=j-i+l}^{n-i+l-1} \left( \frac{6}{k+2} - \frac{4}{k+1} \right) + \sum_{l=1}^{i-1} \left( 1 - \frac{2}{n-i+l+1} \right) + \sum_{k=j}^{n-1} \left( 1 - \frac{2}{k+1} \right) + n - 1 \\
&= \sum_{l=1}^{i-1} \Bigl( -4\bigl(H_{n-i+l} - H_{j-i+l}\bigr) + 6\bigl(H_{n-i+l+1} - H_{j-i+l+1}\bigr) \Bigr) \\
&\qquad + i - 1 - 2(H_n - H_{n-i+1}) + n - j - 2(H_n - H_j) + n - 1 \\
&= 2(n+1)H_n + 2(j-i+4)H_{j-i+1} - 2(j+2)H_j - 2(n-i+3)H_{n-i+1} + 2n - j + i - 2.
\end{align*}
To obtain the final result we just used the basic summation formula
\[
\sum_{k=1}^{n-1} H_k = n(H_n - 1). \tag{6}
\]

2.5. Savings and grand averages. Given any measure of performance $\mathcal{X}_{n,i,j}$ of Range Quickselect when looking for an element whose rank falls in the range $[i..j]$, out of $n$ elements, it is quite obvious that $\mathcal{X}_{n,i,j} \le \mathcal{X}_{n,k,k}$ for any $k \in [i..j]$. In other words, no matter what measure we consider, Range Quickselect will never perform worse than Quickselect when the sought rank $k$ belongs to the range $[i..j]$ given as input to Range Quickselect. The inequality above of course carries over to expectations, thus $X_{n,i,j} \le X_{n,k,k}$ for $k \in [i..j]$. It makes sense then to introduce the difference
\[
\Delta X_{n,i,d} = X_{n,i,i} - X_{n,i-d,i+d}, \qquad d < i < n+1-d, \quad 0 \le d \le (n-1)/2,
\]
which measures the savings of Range Quickselect over Quickselect when looking for the $i$th smallest element and Range Quickselect is given a range of size $2d+1$ around $i$. As we shall see, in some cases $\Delta X_{n,i,d}$ does not depend (or its main order term does not depend) on $i$, so using the size $d$ of the range to express the savings yielded by Range Quickselect turns out to be a relevant choice. Obtaining both explicit and asymptotic formulæ for $\Delta P_{n,i,d}$ and $\Delta C_{n,i,d}$ is straightforward from the explicit expressions given by Theorems 1 and 3, and the well-known asymptotic expansion of the harmonic numbers $H_n = \log n + \gamma + O(1/n)$, with $\gamma$ denoting the Euler-Mascheroni constant.

Another interesting set of quantities that we study in this section (and in the forthcoming ones) are the grand averages. We fix a size $2d+1$ for the range given to Range Quickselect and then average over all possible $i$, i.e., we are interested in the expected value of $X_{n,i-d,i+d}$ when $i$ is uniformly distributed in $[d+1..n-d]$. Such quantities are often called grand averages [4, 9]. Thus,
\[
\overline{X}_{n,d} = \frac{1}{n-2d} \sum_{d < i \le n-d} X_{n,i-d,i+d}, \qquad 0 \le d \le (n-1)/2.
\]
Notice that $\overline{X}_{n,0}$ is the expected value for Quickselect with random rank.
As before, we will also be interested in the grand average savings
\[
\Delta\overline{X}_{n,d} = \overline{X}_{n,0} - \overline{X}_{n,d}, \qquad 0 \le d \le (n-1)/2.
\]
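The grand averages and the grand average savings can also be evaluated numerically by averaging the closed forms of Theorems 1 and 3 over all admissible ranks. The following Python sketch does this in floating point; the function names are hypothetical, and the two cost formulas are transcribed from the statements of Theorems 1 and 3, so any slip in that transcription would carry over.

```python
from itertools import accumulate

def harmonic_table(n):
    """Return a list H with H[k] = 1 + 1/2 + ... + 1/k for 0 <= k <= n."""
    return [0.0] + list(accumulate(1.0 / k for k in range(1, n + 1)))

def passes(H, n, i, j):
    """Expected number of passes of Range Quickselect (Theorem 1)."""
    return H[j] + H[n - i + 1] - 2 * H[j - i + 1] + 1

def comparisons(H, n, i, j):
    """Expected number of comparisons of Range Quickselect (Theorem 3)."""
    return (2 * (n + 1) * H[n] + 2 * (j - i + 4) * H[j - i + 1]
            - 2 * (j + 2) * H[j] - 2 * (n - i + 3) * H[n - i + 1]
            + 2 * n - j + i - 2)

def grand_average(cost, n, d):
    """Average of cost(n, i-d, i+d) over i uniform in [d+1 .. n-d]."""
    H = harmonic_table(n)
    ranks = range(d + 1, n - d + 1)
    return sum(cost(H, n, i - d, i + d) for i in ranks) / len(ranks)

def grand_average_savings(cost, n, d):
    """Grand average for Quickselect (d = 0) minus the one for range width 2d+1."""
    return grand_average(cost, n, 0) - grand_average(cost, n, d)

# Savings in comparisons for n = 10000 and a range of size 2*100 + 1 = 201.
saved = grand_average_savings(comparisons, 10_000, 100)
```

Averaging the per-rank closed forms directly, rather than using a closed form for the grand average itself, keeps the sketch independent of the explicit expressions for the grand averages.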

In the case of passes and comparisons, explicit and asymptotic expressions for the grand averages and the average savings follow easily from the explicit formulæ available for these measures of cost. The following corollary summarizes the relevant results.

Corollary 2. Let $d$ and $i$ be such that $0 \le d \le (n-1)/2$ and $d < i < n+1-d$; in the asymptotic estimates, $d > 0$ unless explicitly stated otherwise, and $n \to \infty$.

(1) Let $\Delta P_{n,i,d} = P_{n,i,i} - P_{n,i-d,i+d}$ be the average number of passes saved if we use Range Quickselect with range $[i-d..i+d]$ instead of Quickselect with rank $i$. Then
\[
\Delta P_{n,i,d} = (H_i - H_{i+d}) + (H_{n+1-i} - H_{n+1-i+d}) + 2H_{2d+1} - 2 = 2\log d + \Theta(1).
\]

(2) Let
\[
\overline{P}_{n,d} = \frac{1}{n-2d} \sum_{d < i < n+1-d} P_{n,i-d,i+d}
\]
be the average number of passes made by Range Quickselect for a range of size $2d+1$ centered around a rank chosen uniformly at random. Then
\[
\overline{P}_{n,d} = \frac{2(n+1)}{n-2d}\,(H_n - H_{2d}) - \frac{2d+3}{2d+1} =
\begin{cases}
2\log(n/d) + O(1), & \text{if } 0 < d = o(n), \\[2pt]
\dfrac{2}{1-2\delta}\log\dfrac{1}{2\delta} - 1 + O(1/n), & \text{if } d = \delta n + o(n), \text{ with } 0 < \delta < 1/2.
\end{cases}
\]
Furthermore, the grand average of the savings is
\[
\Delta\overline{P}_{n,d} = \overline{P}_{n,0} - \overline{P}_{n,d} = 2\log d + O(1).
\]

(3) Let $\Delta C_{n,i,d} = C_{n,i,i} - C_{n,i-d,i+d}$ be the average number of element comparisons that we save if we use Range Quickselect with range $[i-d..i+d]$ instead of Quickselect with rank $i$. Then
\begin{align*}
\Delta C_{n,i,d} &= 8 - 2(i+2)H_i - 2(n-i+3)H_{n+1-i} - 2(2d+4)H_{2d+1} \\
&\qquad + 2(i+d+2)H_{i+d} + 2(n-i+d+3)H_{n+1+d-i} + 2d \\
&= \begin{cases}
4d\log(n/d) + \Theta(d), & \text{if } 0 < d = o(n), \\[2pt]
2c(\alpha, \delta)\,n - 8\log n + O(1), & \text{if } d = \delta n + o(n), \text{ with } 0 < \delta < 1/2,
\end{cases}
\end{align*}
where
\[
c(\alpha, \delta) = \delta + (1-\alpha+\delta)\log(1-\alpha+\delta) + (\alpha+\delta)\log(\alpha+\delta) - \alpha\log\alpha - (1-\alpha)\log(1-\alpha) - 2\delta\log(2\delta).
\]
The second asymptotic estimate holds uniformly for $i = \alpha n + o(n)$ and $n \to \infty$.

(4) Let
\[
\overline{C}_{n,d} = \frac{1}{n-2d} \sum_{d < i < n+1-d} C_{n,i-d,i+d}
\]
be the average number of comparisons made by Range Quickselect for a range of size $2d+1$ centered around a rank chosen uniformly at random. Then
\[
\overline{C}_{n,d} = 3n + 5 + \frac{4(d+2)}{2d+1} - \frac{4(d+2)(n+1)}{n-2d}\,(H_n - H_{2d}) =
\begin{cases}
3n - 4(d+2)\log(n/d) + \Theta(d), & \text{if } 0 < d = o(n), \\[2pt]
\left(3 + \dfrac{4\delta\log(2\delta)}{1-2\delta}\right) n + \Theta(1), & \text{if } d = \delta n + o(n), \text{ with } 0 < \delta < 1/2.
\end{cases}
\]
Furthermore, the grand average of the savings is
\[
\Delta\overline{C}_{n,d} = \overline{C}_{n,0} - \overline{C}_{n,d} =
\begin{cases}
4d\log(n/d) + \Theta(d), & \text{if } 0 < d = o(n), \\[2pt]
\dfrac{4\delta\log(1/2\delta)}{1-2\delta}\, n - 8\log n + \Theta(1), & \text{if } d = \delta n + o(n),\ 0 < \delta < 1/2.
\end{cases}
\]

To conclude, a few words on the practical significance of these findings. For instance, we can find an element whose rank is $n/2 \pm \sqrt{n}$ and save up to $\Theta(\sqrt{n}\log n)$ comparisons, or find an element of rank $\alpha n(1 \pm \delta)$, for some $\delta > 0$, and save a linear number of comparisons. Savings of the order $\Theta(\sqrt{n}\log n)$ might seem too small to bother with, since the algorithm runs in linear time; however, for moderate sizes of the array, savings such as these are noticeable in practice. For instance, with a range of size $2\sqrt{n}+1$ around the desired rank, the algorithm allows us to save 1555 comparisons on average for an array of size 10000 (the range around the sought rank is of size 201), and we save 2400 comparisons on average when the size is ... (the range is then of size ...).

3. MOVES IN QUICKSELECT AND RANGE QUICKSELECT

We start with the definition of the quantities in our study of moves of elements in the standard Quickselect and Range Quickselect algorithms. The random variable $\mathcal{M}_{n,p,j}$, with $1 \le p, j \le n$, counts the number of moves of the element with rank $p$, i.e., assignments appearing in line 8, line 13 or line 17 where the right-hand side contains the $p$th element, in the PARTITION procedure (given as Algorithm 3) while executing the algorithm Quickselect to find an element with rank $j$ in an array $A[1..n]$. The random variable $\mathcal{V}_{n,i,j}$, with $1 \le i \le j \le n$, counts the total number of moves, i.e., assignments appearing in line 8, line 13 or line 17, of array elements in the procedure PARTITION when executing the algorithm RQS to find the element with rank $k \in [i..j]$ in an array $A[1..n]$.

We conclude this introduction by stating the following well-known randomness preservation property (see, e.g., []) of the partition algorithm PARTITION as described in Subsection 3.1 (we remark that this property also holds for other commonly used partition procedures).
When starting with a random permutation of distinct values $a_l < a_{l+1} < \cdots < a_r$ as input data $A[l..r]$ for the partition algorithm PARTITION(A, l, r, k), it holds that after executing this procedure the left subarray $A[l..k-1]$ is itself a random permutation of $a_l < a_{l+1} < \cdots < a_{k-1}$, and the right subarray $A[k+1..r]$ is itself a random permutation of $a_{k+1} < a_{k+2} < \cdots < a_r$. This randomness preservation property allows a recursive description of the parameters studied in this paper and is thus heavily used in the analysis carried out in what follows.

3.1. The Partition procedure. There are several standard implementations of the partitioning phase used in practice for the algorithm Quickselect (and, of course, also for Quicksort). The procedure PARTITION given as Algorithm 3 is just one particular implementation, which we assume to be used in both Quickselect and Range Quickselect. While for the analysis of moves we continue assuming that all elements are distinct, the implementation of PARTITION contemplates the more general case where repetitions may occur. At this point, we want to point out that other standard implementations of the partitioning procedure will likely lead to similar, although slightly different, results for the quantities studied here.

After executing PARTITION(A, l, r, k) a pivot element $v$ is brought to its correct position $v = A[k]$ in the array, such that all elements in the array $A[l..k-1]$ are smaller than or equal to $v$ and all elements in the array $A[k+1..r]$ are larger than or equal to $v$.

To do this the procedure starts by choosing as pivot element $v$ the first element $A[l]$ in the array $A[l..r]$, which is stored. Then, by using two pointers $a$ and $b$ that are initialized by $a = l$ and $b = r$, the array is scanned in an alternating way from right and from left, where each element is compared with the pivot element $v$. When scanning from right we search for the first element $A[b]$ which is smaller than or equal to $v$; this element is then stored at position $A[a]$ and one continues with scanning from left. When scanning from left we search for the first element $A[a]$ which is larger than or equal to $v$; this element is then stored at position $A[b]$ and one continues with scanning from right. The scan stops if $a = b$, i.e., if the two pointers $a$ and $b$ meet each other. Then it remains to store the pivot element $v$ at its correct place $A[a]$ in the array and return this final location of the pivot element.

Algorithm 3 The PARTITION procedure
Require: Array A[l..r]
Ensure: ∀m : (l ≤ m < k) ⇒ A[m] ≤ A[k], and ∀m : (k < m ≤ r) ⇒ A[k] ≤ A[m]
 1: procedure PARTITION(A, l, r, k)
 2:     if l > r then return                              ▷ Nothing will be done
 3:     end if
 4:     a := l; b := r; v := A[a]
 5:     while a < b do
 6:         while a < b and A[b] > v do b := b - 1        ▷ Scan from right
 7:         if a < b then
 8:             A[a] := A[b]
 9:             a := a + 1
10:         end if
11:         while a < b and A[a] < v do a := a + 1        ▷ Scan from left
12:         if a < b then
13:             A[b] := A[a]
14:             b := b - 1
15:         end if
16:     end while
17:     A[a] := v
18:     k := a                                            ▷ Task finished
19: end procedure

3.2. The number of moves of particular elements in Quickselect. We study here the average behavior of the random variable $\mathcal{M}_{n,p,j}$, with $1 \le p, j \le n$, which counts the number of moves, i.e., assignments $A[\cdot] := a_p$, of the element with rank $p$ in the PARTITION procedure when executing the algorithm Quickselect to find the element with rank $j$ in an array of size $n$. The following theorem provides an exact formula for the expectation $M_{n,p,j} := \mathbb{E}(\mathcal{M}_{n,p,j})$.

Theorem 4.
The expected number of moves $M_{n,p,j} = \mathbb{E}(\mathcal{M}_{n,p,j})$ of the element with rank $p$ during the execution of the algorithm Quickselect to find the element with rank $j$ in an array of size $n$ is

\[ M_{n,p,j} = \tfrac{1}{3} H_n + \tfrac{1}{6} H_j + \tfrac{1}{6} H_{n-p+1} - \tfrac{2}{3} H_{j-p+1} + \cdots, \quad \text{for } 1 \le p < j \le n, \]
\[ M_{n,p,j} = \tfrac{1}{3} H_n + \tfrac{1}{6} H_p + \tfrac{1}{6} H_{n-j+1} - \tfrac{2}{3} H_{p-j+1} + \cdots, \quad \text{for } 1 \le j < p \le n, \]
\[ M_{n,j,j} = \tfrac{1}{3} H_n + \tfrac{1}{6} H_j + \tfrac{1}{6} H_{n-j+1} + \cdots, \quad \text{for } p = j,\ 1 \le j \le n \text{ and } n \ge 2, \]

and $M_{1,1,1} = 1$. In each case the omitted terms are explicit rational functions of $n$, $p$ and $j$ (in the case $p = j$ also involving the Iverson brackets $[\![ j = 1 ]\!]$ and $[\![ j = n ]\!]$); they are bounded, in agreement with the $O(1)$ terms of Corollary 3.

Although the formulæ for $M_{n,p,j}$ when $p < j$ and when $p > j$ seem to be related via the substitution $p \to n+1-p$, $j \to n+1-j$, this is not the case, as the reader can readily convince herself by substituting a few values. The difference between the formula for $p > j$, call it $M^{(2)}_{n,p,j}$, and the formula for $p < j$, call it $M^{(1)}_{n,p,j}$, when we substitute $p$ by $n+1-p$ and $j$ by $n+1-j$ is very small, namely $M^{(2)}_{n,p,j} = M^{(1)}_{n,n+1-p,n+1-j} + O(1)$. For completeness we will later give separate asymptotic formulæ in Corollary 3 for $M_{n,p,j}$ when $p < j$ and when $p > j$, although that should not be necessary because of the relation just noted.

To prove this theorem we start with a recursive description of $\mathcal{M}_{n,p,j}$, which is obtained by considering a call of Quickselect for an array $A[1..n]$. We assume now that the pivot element $v = A[1]$ is the $k$th smallest element in the array; since our input data form a random permutation of length $n$, the probability that the pivot element has rank $k$ is $1/n$, for $1 \le k \le n$. Now we study whether the element with rank $p$ will be moved, i.e., whether an assignment $A[\cdot] := \ldots$ whose right-hand side contains the element with rank $p$ is performed during the execution of the partition procedure PARTITION. We have to distinguish three cases: (1) if $k = p$ then the element with rank $p$ (in this case this is the pivot element) will always be moved, (2) if $k < p$ the element with rank $p$ will be moved only if it is located in the subarray $A[2..k]$; the probability that this happens is $\frac{k-1}{n-1}$, (3) if $k > p$ the element with rank $p$ will be moved only if it is located in the subarray $A[k..n]$; the probability that this happens is then $\frac{n-k+1}{n-1}$.

After the partitioning phase the left subarray $A[1..k-1]$ and the right subarray $A[k+1..n]$ each form a random permutation, of lengths $k-1$ and $n-k$, respectively. Next we observe that if the pivot element has a rank between $p$ and $j$, i.e., depending on the order of the considered elements either $p \le k \le j$ or $j \le k \le p$, the final number of moves of the element with rank $p$ during the execution of Quickselect is already reached. This holds since then either the Quickselect algorithm terminates ($k = j$) or it continues executing in a subarray that does not contain the element with rank $p$. Only if $k < \min(p, j)$ or $k > \max(p, j)$ do we proceed with a recursive call of Quickselect (for the right or for the left subarray, respectively) on a subarray that still contains the element with rank $p$. In these latter cases we have to add the number of moves of the element with rank $p$ during the execution of Quickselect occurring therein. These considerations immediately lead to the following proposition.

Proposition 3. The random variable $\mathcal{M}_{n,p,j}$ satisfies, for $1 \le p, j \le n$, the following distributional recurrence:

\[ \mathcal{M}_{n,p,j} \overset{(d)}{=} \mathbf{1}_{\{U_n < p\}} \mathcal{M}^{(1)}_{n-U_n,\,p-U_n,\,j-U_n} + \mathbf{1}_{\{U_n > j\}} \mathcal{M}^{(2)}_{U_n-1,\,p,\,j} + \mathcal{T}_{n,p,U_n}, \quad \text{for } p \le j, \]
\[ \mathcal{M}_{n,p,j} \overset{(d)}{=} \mathbf{1}_{\{U_n < j\}} \mathcal{M}^{(1)}_{n-U_n,\,p-U_n,\,j-U_n} + \mathbf{1}_{\{U_n > p\}} \mathcal{M}^{(2)}_{U_n-1,\,p,\,j} + \mathcal{T}_{n,p,U_n}, \quad \text{for } j < p, \]

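and $\mathcal{M}_{n,p,j} = 0$ if $\min(p, j) < 1$ or $\max(p, j) > n$. The rank $U_n$ of the pivot element is uniformly distributed on $\{1, 2, \ldots, n\}$ and independent of $(\mathcal{M}_{n,p,j})_{n,p,j}$, $(\mathcal{M}^{(1)}_{n,p,j})_{n,p,j}$ and $(\mathcal{M}^{(2)}_{n,p,j})_{n,p,j}$, where the latter two are independent copies of $(\mathcal{M}_{n,p,j})_{n,p,j}$. Here the random variable $\mathcal{T}_{n,p,k}$ is the indicator function of the event that the element with rank $p$ is moved during the execution of the partition procedure PARTITION for a randomly chosen permutation of length $n$ leading to a pivot element of rank $k$. It holds then, for $1 \le p, k \le n$:

\[ P\{\mathcal{T}_{n,p,k} = 1\} = \begin{cases} \frac{k-1}{n-1}, & k < p, \\ \frac{n-k+1}{n-1}, & k > p, \\ 1, & k = p, \end{cases} \]

and $P\{\mathcal{T}_{n,p,k} = 0\} = 1 - P\{\mathcal{T}_{n,p,k} = 1\}$. We remark here that in the distributional recurrence given as Proposition 3 the random variables $\mathcal{T}_{n,p,k}$ and $\mathcal{M}^{(1)}_{n-k,p-k,j-k}$ (and also $\mathcal{T}_{n,p,k}$ and $\mathcal{M}^{(2)}_{k-1,p,j}$) are dependent, as can be checked easily for concrete examples (e.g., for $n = 3$ and $p = j = 1$). Thus Proposition 3 will only allow us to treat the expectation $M_{n,p,j}$ of the number of moves, whereas a study of higher moments would require a more refined description of $\mathcal{M}_{n,p,j}$.

However, Proposition 3 immediately leads to a recurrence for the expected value $M_{n,p,j}$. It is here advantageous to distinguish between the cases $p < j$, $p = j$ and $p > j$. We start with the case $p < j$, where we obtain, for $1 \le p < j \le n$:

\[ M_{n,p,j} = \frac{1}{n} \sum_{k=1}^{p-1} M_{n-k,p-k,j-k} + \frac{1}{n} \sum_{k=j+1}^{n} M_{k-1,p,j} + \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}(\mathcal{T}_{n,p,k}) \]
\[ = \frac{1}{n} \sum_{k=1}^{p-1} M_{n-k,p-k,j-k} + \frac{1}{n} \sum_{k=j+1}^{n} M_{k-1,p,j} + \frac{1}{n} \Big( \sum_{k=1}^{p-1} \frac{k-1}{n-1} + 1 + \sum_{k=p+1}^{n} \frac{n-k+1}{n-1} \Big) \]
\[ = \frac{1}{n} \sum_{k=1}^{p-1} M_{n-k,p-k,j-k} + \frac{1}{n} \sum_{k=j+1}^{n} M_{k-1,p,j} + \frac{(p-1)(p-2)}{2n(n-1)} + \frac{(n-p)(n-p+1)}{2n(n-1)} + \frac{1}{n}. \]

To get an exact solution of $M_{n,p,j}$ we can thus apply Theorem 2 for the particular toll function

\[ T_{n,p,j} = \frac{(p-1)(p-2)}{2n(n-1)} + \frac{(n-p)(n-p+1)}{2n(n-1)} + \frac{1}{n}, \quad 1 \le p < j \le n. \]

We omit here the computations leading to the exact formula of $M_{n,p,j}$, $p < j$, given in Theorem 4, since nothing more is required than basic summation formulæ. For the case $p = j$ we obtain, for $1 \le j \le n$:

\[ M_{n,p,j} = \frac{1}{n} \sum_{k=1}^{j-1} M_{n-k,p-k,j-k} + \frac{1}{n} \sum_{k=j+1}^{n} M_{k-1,p,j} + \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}(\mathcal{T}_{n,p,k}) \]
\[ = \frac{1}{n} \sum_{k=1}^{j-1} M_{n-k,p-k,j-k} + \frac{1}{n} \sum_{k=j+1}^{n} M_{k-1,p,j} + \begin{cases} \frac{(j-1)(j-2)}{2n(n-1)} + \frac{(n-j)(n-j+1)}{2n(n-1)} + \frac{1}{n}, & \text{for } n \ge 2, \\ 1, & \text{for } n = 1. \end{cases} \]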

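Thus, an exact solution of $M_{n,p,j}$ with $p = j$ can be obtained by introducing $\tilde{M}_{n,j} := M_{n,j,j}$ and applying Corollary 1 for the particular toll function

\[ T_{n,j} := \begin{cases} \frac{(j-1)(j-2)}{2n(n-1)} + \frac{(n-j)(n-j+1)}{2n(n-1)} + \frac{1}{n}, & \text{for } 1 \le j \le n \text{ and } n \ge 2, \\ 1, & \text{for } j = n = 1. \end{cases} \]

After carrying out the computations, which are omitted here, we obtain the exact formula of $M_{n,j,j}$, $1 \le j \le n$, given in Theorem 4. Finally we consider the case $p > j$, where we obtain, for $1 \le j < p \le n$:

\[ M_{n,p,j} = \frac{1}{n} \sum_{k=1}^{j-1} M_{n-k,p-k,j-k} + \frac{1}{n} \sum_{k=p+1}^{n} M_{k-1,p,j} + \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}(\mathcal{T}_{n,p,k}) \]
\[ = \frac{1}{n} \sum_{k=1}^{j-1} M_{n-k,p-k,j-k} + \frac{1}{n} \sum_{k=p+1}^{n} M_{k-1,p,j} + \frac{(p-1)(p-2)}{2n(n-1)} + \frac{(n-p)(n-p+1)}{2n(n-1)} + \frac{1}{n}. \]

When introducing $\tilde{M}_{n,p,j} := M_{n,j,p}$ this recurrence can be written as follows, with $1 \le p < j \le n$:

\[ \tilde{M}_{n,p,j} = \frac{1}{n} \sum_{k=1}^{p-1} \tilde{M}_{n-k,p-k,j-k} + \frac{1}{n} \sum_{k=j+1}^{n} \tilde{M}_{k-1,p,j} + \frac{(j-1)(j-2)}{2n(n-1)} + \frac{(n-j)(n-j+1)}{2n(n-1)} + \frac{1}{n}. \]

An exact solution of $\tilde{M}_{n,p,j}$, $p < j$, can be obtained by applying Theorem 2 for the particular toll function

\[ \tilde{T}_{n,p,j} = \frac{(j-1)(j-2)}{2n(n-1)} + \frac{(n-j)(n-j+1)}{2n(n-1)} + \frac{1}{n}, \quad 1 \le p < j \le n. \]

After back substitution we thus obtain an exact solution of $M_{n,p,j}$, with $j < p$, which is given in Theorem 4. Again the straightforward computations are omitted. Last but not least, we can obtain asymptotic equivalents with little effort.

Corollary 3. The expected number of moves $M_{n,p,j} = \mathbb{E}(\mathcal{M}_{n,p,j})$ of the element with rank $p$ when executing the algorithm Quickselect to find the element with rank $j$ in an array of size $n$ has the following asymptotic equivalents, which hold for $n \to \infty$ and uniformly for the given range of $p$ and $j$:

\[ M_{n,p,j} = \tfrac{1}{3} \log n + \tfrac{1}{6} \log j + \tfrac{1}{6} \log(n-p) - \tfrac{2}{3} \log(j-p+1) + O(1), \quad 1 \le p < j \le n, \]
\[ M_{n,p,j} = \tfrac{1}{3} \log n + \tfrac{1}{6} \log p + \tfrac{1}{6} \log(n-j) - \tfrac{2}{3} \log(p-j+1) + O(1), \quad 1 \le j < p \le n, \]
\[ M_{n,j,j} = \tfrac{1}{3} \log n + \tfrac{1}{6} \log j + \tfrac{1}{6} \log(n-j+1) + O(1), \quad 1 \le j \le n. \]

In particular, we get the following important estimates when $j = \beta n + o(n)$, $0 < \beta < 1$:

\[ M_{n,p,j} \sim \tfrac{1}{6} \log \beta + \tfrac{1}{6} \log(1-\alpha) - \tfrac{2}{3} \log(\beta-\alpha) + \cdots, \quad \text{for } p = \alpha n + o(n) \text{ and } 0 < \alpha < \beta < 1, \]
\[ M_{n,p,j} \sim \tfrac{1}{6} \log \alpha + \tfrac{1}{6} \log(1-\beta) - \tfrac{2}{3} \log(\alpha-\beta) + \cdots, \quad \text{for } p = \alpha n + o(n) \text{ and } 0 < \beta < \alpha < 1, \]

where in both cases the omitted terms are explicit rational functions of $\alpha$ and $\beta$,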

\[ M_{n,p,j} \sim \tfrac{2}{3} (1-\kappa) \log n, \quad \text{for } |j-p| \sim K n^{\kappa}, \text{ with } 0 < \kappa < 1 \text{ and } K > 0, \]
\[ M_{n,p,j} \sim \tfrac{2}{3} \log n, \quad \text{for } |j-p| = O((\log n)^{\kappa}) \text{ for some } \kappa > 0. \]

3.3. The total number of moves in Range Quickselect. Now we study the average behavior of the random variable $\mathcal{V}_{n,i,j}$, with $1 \le i \le j \le n$, which counts the total number of moves, i.e., assignments $A[\cdot] := \cdot$ of array elements, in the partition procedure PARTITION when executing the algorithm RQS to find an element with rank $k \in [i..j]$ in an array $A[1..n]$.

Theorem 5. The expected total number of moves $V_{n,i,j} = \mathbb{E}(\mathcal{V}_{n,i,j})$ of array elements in the partition procedure PARTITION when executing the algorithm RQS to find an element with rank $k \in [i..j]$ in an array $A[1..n]$ filled with a random permutation of length $n$, for $1 \le i \le j \le n$, is given by the following exact formula:

\[ V_{n,i,j} = \tfrac{2}{3}(n+1) H_n - \tfrac{1}{6}(4j+1) H_j - \tfrac{1}{6}(4n-4i+5) H_{n-i+1} + \tfrac{1}{3}(2j-2i+1) H_{j-i+1} + \tfrac{2n}{3} - \tfrac{j}{3} + \tfrac{i}{3} + \tfrac{1}{2}, \quad \text{for } 1 \le i < j \le n, \]
\[ V_{n,j,j} = \tfrac{2}{3}(n+1) H_n - \tfrac{1}{6}(4j+1) H_j - \tfrac{1}{6}(4n-4j+5) H_{n-j+1} + \tfrac{2n}{3} + \tfrac{7}{9} - \tfrac{1}{36} [\![ j = 1 \vee j = n ]\!], \quad \text{for } 1 \le j \le n \text{ and } n \ge 2, \]
\[ V_{1,1,1} = 1. \]

Asymptotically, for $i = \alpha n + o(n)$ and $j - i = \delta n + o(n)$,

\[ V_{n,i,j} \sim \frac{n}{3} \Big( 2\delta \log \delta - 2(1-\alpha) \log(1-\alpha) - 2(\alpha+\delta) \log(\alpha+\delta) + 2 - \delta \Big), \quad 0 < \delta < 1-\alpha. \]

We derive this theorem from a recursive description of $\mathcal{V}_{n,i,j}$, which is again obtained by considering a call of RQS for an array $A[1..n]$. We assume that the pivot element $v = A[1]$ is the $k$th smallest element in the array; the probability that this happens is $1/n$, for $1 \le k \le n$. Now we want to count the total number of moves, i.e., assignments $A[\cdot] := \cdot$ (appearing in line 8, line 13 or line 17) of array elements in the partition procedure PARTITION. We distinguish between two cases: (1) if $k = 1$ then there is exactly one move during the partitioning phase, namely the assignment of the pivot element in line 17, (2) if $k \ge 2$ then the following two situations may occur:

Element $A[k]$ has a rank in the range $1..(k-1)$ and exactly $l$ elements with a rank in the range $1..(k-1)$ are located in the subarray $A[k..n]$. It follows then that exactly $l-1$ elements with a rank in the range $(k+1)..n$ are located in the subarray $A[2..k-1]$. In this situation we obtain that exactly $2l$, i.e., $l$ (line 8) $+\ (l-1)$ (line 13) $+\ 1$ (line 17), moves are carried out during the partitioning phase.
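By elementary combinatorial considerations we get the following probability that this event occurs:

\[ \binom{k-2}{l-1} \binom{n-k}{l-1} \frac{(k-1)!\,(n-k)!}{(n-1)!} = \frac{\binom{k-2}{l-1} \binom{n-k}{l-1}}{\binom{n-1}{k-1}}, \quad \text{for } 1 \le l \le k-1. \]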

Element $A[k]$ has a rank in the range $(k+1)..n$ and exactly $l$ elements with a rank in the range $1..(k-1)$ are located in the subarray $A[k+1..n]$. It follows then that exactly $l$ elements with a rank in the range $(k+1)..n$ are located in the subarray $A[2..k]$. In this situation we obtain that exactly $2l+1$, i.e., $l$ (line 8) $+\ l$ (line 13) $+\ 1$ (line 17), moves are carried out during the partitioning phase. This gives the following probability that this event occurs:

\[ \binom{k-2}{l-1} \binom{n-k}{l} \frac{(k-1)!\,(n-k)!}{(n-1)!} = \frac{\binom{k-2}{l-1} \binom{n-k}{l}}{\binom{n-1}{k-1}}, \quad \text{for } 1 \le l \le k-1. \]

Of course, after the partitioning phase the left subarray $A[1..k-1]$ and the right subarray $A[k+1..n]$ each form a random permutation, of lengths $k-1$ and $n-k$, respectively. But as can be shown easily (permuting the elements with a rank in the range $1..(k-1)$ and the elements with a rank in the range $(k+1)..n$, respectively, in the input data array leads to easily describable permutations of the elements in the subarrays $A[1..k-1]$ and $A[k+1..n]$ after the partitioning phase), even more is true. Namely, if we consider only those permutations for which the number of moves in the procedure PARTITION is exactly $l$, with an arbitrary $l$, then it also holds that after the partitioning phase the left subarray $A[1..k-1]$ and the right subarray $A[k+1..n]$ each form a random permutation, of lengths $k-1$ and $n-k$, respectively. Thus the number of moves during the partitioning phase is independent of the number of moves that are made during a recursive call of RQS for the right subarray $A[k+1..n]$ (if $k < i$) or for the left subarray $A[1..k-1]$ (if $k > j$) and that have to be added to get the total number of moves. This independence property, which appears in the distributional recurrence stated in the following proposition, would also allow the study of higher moments of $\mathcal{V}_{n,i,j}$, or could be a starting point for considerations concerning the limiting distributional behavior of $\mathcal{V}_{n,i,j}$ (see, e.g., [4, 7] for limiting distribution results studying the parameter number of comparisons in Quickselect).

Proposition 4.
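The random variable $\mathcal{V}_{n,i,j}$ satisfies, for $1 \le i \le j \le n$, the following distributional recurrence:

\[ \mathcal{V}_{n,i,j} \overset{(d)}{=} \mathbf{1}_{\{U_n < i\}} \mathcal{V}^{(1)}_{n-U_n,\,i-U_n,\,j-U_n} + \mathbf{1}_{\{U_n > j\}} \mathcal{V}^{(2)}_{U_n-1,\,i,\,j} + \mathcal{T}_{n,U_n}, \quad \text{for } 1 \le i \le j \le n, \]

and $\mathcal{V}_{n,i,j} = 0$ if $i < 1$ or $j > n$, where the sequences $(U_n)_n$, $(\mathcal{T}_{n,k})_{n,k}$, $(\mathcal{V}^{(1)}_{n,i,j})_{n,i,j}$ and $(\mathcal{V}^{(2)}_{n,i,j})_{n,i,j}$ of random variables are all independent. Here $\mathcal{V}^{(1)}_{n,i,j}$ and $\mathcal{V}^{(2)}_{n,i,j}$ are independent copies of $\mathcal{V}_{n,i,j}$, whereas $U_n$ is uniformly distributed on $\{1, 2, \ldots, n\}$. Furthermore, $\mathcal{T}_{n,k}$ is, for $1 \le k \le n$, distributed as follows:

\[ P\{\mathcal{T}_{n,1} = 1\} = 1, \]
\[ P\{\mathcal{T}_{n,k} = 2l\} = \frac{\binom{k-2}{l-1} \binom{n-k}{l-1}}{\binom{n-1}{k-1}}, \quad \text{for } k \ge 2 \text{ and } 1 \le l \le k-1, \]
\[ P\{\mathcal{T}_{n,k} = 2l+1\} = \frac{\binom{k-2}{l-1} \binom{n-k}{l}}{\binom{n-1}{k-1}}, \quad \text{for } k \ge 2 \text{ and } 1 \le l \le k-1. \]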
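The distribution of the toll $\mathcal{T}_{n,k}$ can be checked by brute force for small $n$. The Python sketch below enumerates all inputs whose first element has rank $k$, counts the assignments made by a guarded hole-based partitioning step, and compares the resulting exact distribution with the binomial expressions of Proposition 4. The function names, the `a < b` guards in the scans and the choice $n = 6$ are assumptions of this illustration; the binomial expressions are coded as we read them from the proposition.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def count_moves(perm):
    """Assignments made while partitioning perm around its first element."""
    A = list(perm)
    a, b, v = 0, len(A) - 1, A[0]
    moves = 0
    while a < b:
        while a < b and A[b] > v:       # scan from the right
            b -= 1
        if a < b:
            A[a] = A[b]
            moves += 1
            a += 1
            while a < b and A[a] < v:   # scan from the left
                a += 1
            if a < b:
                A[b] = A[a]
                moves += 1
                b -= 1
    return moves + 1                    # final placement of the pivot


def stated_distribution(n, k):
    """P{T_{n,k} = t} as read from Proposition 4, as a dict t -> probability."""
    if k == 1:
        return {1: Fraction(1)}
    total = comb(n - 1, k - 1)
    dist = {}
    for l in range(1, k):
        even = comb(k - 2, l - 1) * comb(n - k, l - 1)
        odd = comb(k - 2, l - 1) * comb(n - k, l)
        if even:
            dist[2 * l] = Fraction(even, total)
        if odd:
            dist[2 * l + 1] = Fraction(odd, total)
    return dist


def enumerated_distribution(n, k):
    """Exact distribution of the move count over all inputs with pivot rank k."""
    rest = [x for x in range(1, n + 1) if x != k]
    counts = {}
    for tail in permutations(rest):
        t = count_moves((k,) + tail)
        counts[t] = counts.get(t, 0) + 1
    total = sum(counts.values())
    return {t: Fraction(c, total) for t, c in counts.items()}


n = 6
agree = all(enumerated_distribution(n, k) == stated_distribution(n, k)
            for k in range(1, n + 1))
mean_toll = sum(t * p for k in range(1, n + 1)
                for t, p in stated_distribution(n, k).items()) / n
print(agree, mean_toll)
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point approximation; `mean_toll` is the average of $\mathbb{E}(\mathcal{T}_{n,k})$ over a uniformly chosen pivot rank.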

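Proposition 4 immediately gives the following recurrence for the expectation $V_{n,i,j}$ of the total number of moves:

\[ V_{n,i,j} = \frac{1}{n} \sum_{k=1}^{i-1} V_{n-k,i-k,j-k} + \frac{1}{n} \sum_{k=j+1}^{n} V_{k-1,i,j} + \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}(\mathcal{T}_{n,k}), \quad \text{for } 1 \le i \le j \le n, \qquad (7) \]

and $V_{n,i,j} = 0$ if $i < 1$ or $j > n$. It holds that $\mathbb{E}(\mathcal{T}_{n,1}) = 1$, whereas for $k \ge 2$ we obtain:

\[ \mathbb{E}(\mathcal{T}_{n,k}) = \frac{1}{\binom{n-1}{k-1}} \Big( \sum_{l=1}^{k-1} 2l \binom{k-2}{l-1} \binom{n-k}{l-1} + \sum_{l=1}^{k-1} (2l+1) \binom{k-2}{l-1} \binom{n-k}{l} \Big) = \frac{(n-k+1)(2k-1) - 1}{n-1}, \]

where we used the Chu-Vandermonde identity (see, e.g., [3]). Easy computations then give

\[ \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}(\mathcal{T}_{n,k}) = \begin{cases} \frac{2n+5}{6}, & \text{for } n \ge 2, \\ 1, & \text{for } n = 1. \end{cases} \]

Thus (7) can be written as follows:

\[ V_{n,i,j} = \frac{1}{n} \sum_{k=1}^{i-1} V_{n-k,i-k,j-k} + \frac{1}{n} \sum_{k=j+1}^{n} V_{k-1,i,j} + T_{n,i,j}, \quad \text{for } 1 \le i \le j \le n, \qquad (8) \]

with $T_{1,1,1} = 1$ and $T_{n,i,j} = \frac{2n+5}{6}$, for $1 \le i \le j \le n$ and $n \ge 2$. An exact solution of this recurrence can be obtained by simply applying Theorem 2, which shows Theorem 5; the straightforward computations are omitted here.

We remark here that setting $i = j$ leads to results concerning the total number of moves in standard Quickselect, e.g., $\mathcal{V}_{n,j,j}$ is the random variable that counts the total number of moves made by Quickselect when selecting the $j$th smallest element out of $n$. As we have done for passes and comparisons, we can compare the savings of Range Quickselect relative to Quickselect. The following corollaries provide the exact and asymptotic formulæ for the savings and the grand average.

Corollary 4. Let $\widehat{V}_{n,i,d} = V_{n,i,i} - V_{n,i-d,i+d}$, that is, the average number of data moves that we save if we use Range Quickselect with range $[i-d..i+d]$ instead of Quickselect with rank $i$, for $0 \le d < \min(i, n+1-i)$. Then, by Theorem 5,

\[ \widehat{V}_{n,i,d} = \tfrac{1}{6}(4i+4d+1) H_{i+d} - \tfrac{1}{6}(4i+1) H_i + \tfrac{1}{6}(4n-4i+4d+5) H_{n-i+d+1} - \tfrac{1}{6}(4n-4i+5) H_{n-i+1} - \tfrac{1}{3}(4d+1) H_{2d+1} + \tfrac{2d+1}{3} - \tfrac{1}{18} [\![ d > 0 ]\!]. \]

Moreover,

\[ \widehat{V}_{n,i,d} = \begin{cases} \tfrac{4}{3}\, d \log\!\big(\tfrac{n}{d}\big) + \Theta(d), & \text{if } 0 < d = o(n), \\ c(\alpha, \delta)\, n - \tfrac{1}{3} \log n + O(1), & \text{if } d = \delta n + o(n), \text{ with } 0 < \delta < 1/2, \end{cases} \]

where

\[ c(\alpha, \delta) = \tfrac{2}{3} \Big( \delta + (1-\alpha+\delta) \log(1-\alpha+\delta) + (\alpha+\delta) \log(\alpha+\delta) - \alpha \log \alpha - (1-\alpha) \log(1-\alpha) - 2\delta \log(2\delta) \Big). \]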

The first asymptotic estimate holds uniformly for all $d = o(n)$ and $n \to \infty$. The second asymptotic estimate holds uniformly for $i = \alpha n + o(n)$ and $n \to \infty$. Observe that for any valid $i$ and $d$, $\widehat{V}_{n,i,d} \sim \tfrac{1}{3} \widehat{C}_{n,i,d}$; actually, $V_{n,i,j} = \tfrac{1}{3} C_{n,i,j} + \tfrac{7}{6} P_{n,i,j} + O(1)$.

Corollary 5. Let

\[ \overline{V}_{n,d} = \frac{1}{n-2d} \sum_{d < i < n+1-d} V_{n,i-d,i+d}, \quad 0 \le d \le (n-1)/2, \]

that is, $\overline{V}_{n,d}$ is the average total number of moves made by Range Quickselect for a range of size $2d+1$ centered around a rank chosen uniformly at random. Then

\[ \overline{V}_{n,d} = -\frac{(4d+1)(n+1)}{3(n-2d)} \big( H_n - H_{2d+1} \big) + n + \frac{1}{2} - \frac{4d+1}{3(n-2d)}, \quad \text{for } d \ge 1, \]
\[ \overline{V}_{n,0} = n - \frac{n+1}{3n} H_n + \frac{7}{9} - \frac{1}{18n}, \quad \text{for } n \ge 2, \]
\[ \overline{V}_{1,0} = 1. \]

Moreover, it holds

\[ \overline{V}_{n,d} = \begin{cases} n - \frac{4d+1}{3} \log\!\big(\tfrac{n}{d}\big) + \Theta(d), & \text{if } 0 < d = o(n), \\ \Big( 1 + \frac{4\delta \log(2\delta)}{3(1-2\delta)} \Big) n + \Theta(1), & \text{if } d = \delta n + o(n), \text{ with } 0 < \delta < 1/2. \end{cases} \]

The first asymptotic estimate holds uniformly for all $d = o(n)$, when $n \to \infty$. Furthermore, the grand average of the savings is

\[ \overline{V}_{n,0} - \overline{V}_{n,d} = \begin{cases} \frac{4d}{3} \log(n/d) + \Theta(d), & \text{if } 0 < d = o(n), \\ \frac{4\delta \log(1/(2\delta))}{3(1-2\delta)}\, n - \frac{1}{3} \log n + \Theta(1), & \text{if } d = \delta n + o(n),\ 0 < \delta < 1/2. \end{cases} \]

4. SOME PARAMETERS IN BINARY SEARCH TREES

Binary search trees are binary trees generated by successively inserting elements into an originally empty tree via a simple recursive algorithm (see for instance [2]). If element $x$ has to be inserted into an empty tree one creates a new node containing $x$. If element $x$ has to be inserted into a non-empty tree one has to compare $x$ with the element $k$ of the root: if $x < k$ then $x$ will be inserted into the left subtree, whereas if $x \ge k$ then $x$ will be inserted into the right subtree. For the average-case analysis of the quantities considered for binary search trees we also always use the random permutation model, i.e., we assume that all $n!$ permutations of a sequence of distinct values $a_1 < a_2 < \cdots < a_n$ are chosen with equal probability as input data to generate a binary search tree of size $n$. We define now the three parameters for random binary search trees we will consider in this paper.
The random variable $\mathcal{A}_{n,i,j}$, with $1 \le i \le j \le n$, counts the number of common ancestors (in a rooted tree $B$ a node $v$ is an ancestor of node $w$ if $v$ lies on the unique path from the root of $B$ to $w$) of the nodes with rank $i$ and $j$ in a random binary search tree of size $n$. The random variable $\mathcal{S}_{n,i,j}$, with $1 \le i \le j \le n$, counts the size of the subtree rooted at the least common ancestor of the nodes with rank $i$ and $j$ in a random binary search tree of size $n$ (i.e., the size of the smallest subtree containing the nodes with rank $i$ and $j$). Finally, the random variable $\mathcal{D}_{n,i,j}$, with $1 \le i \le j \le n$, is the distance (number of edges in the unique path from the $i$th node to the $j$th node) in a random BST of size $n$.
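To make the three definitions concrete, the following Python sketch builds a binary search tree by successive insertions and reads off the number of common ancestors, the size of the subtree rooted at the least common ancestor, and the distance for two given ranks. The insertion order `order` is a hypothetical example chosen for this illustration, and the function names are ours; since the keys are $1, \ldots, n$, the rank of a node equals its key.

```python
def build_bst(keys):
    """Insert distinct keys one by one; return (root, left, right) child maps."""
    left, right = {}, {}
    root = keys[0]
    for x in keys[1:]:
        node = root
        while True:
            children = left if x < node else right
            if node in children:
                node = children[node]   # descend into the existing subtree
            else:
                children[node] = x      # attach x as a new leaf
                break
    return root, left, right


def path_from_root(root, left, right, key):
    """Nodes on the unique path from the root to key (both included)."""
    path, node = [root], root
    while node != key:
        node = left[node] if key < node else right[node]
        path.append(node)
    return path


def subtree_size(left, right, node):
    """Number of nodes in the subtree rooted at node."""
    size = 1
    for children in (left, right):
        if node in children:
            size += subtree_size(left, right, children[node])
    return size


def bst_parameters(keys, i, j):
    """Return (A, S, D) for the nodes with keys i and j."""
    root, left, right = build_bst(keys)
    path_i = path_from_root(root, left, right, i)
    path_j = path_from_root(root, left, right, j)
    common = 0
    for u, w in zip(path_i, path_j):
        if u != w:
            break
        common += 1
    lca = path_i[common - 1]            # last shared node on both paths
    size = subtree_size(left, right, lca)
    distance = (len(path_i) - common) + (len(path_j) - common)
    return common, size, distance


# Hypothetical insertion order of the keys 1..16 (an assumption of this sketch).
order = [15, 5, 10, 6, 8, 12, 13, 2, 1, 14, 16, 4, 7, 9, 3, 11]
print(bst_parameters(order, 8, 12))
```

With this insertion order the nodes 8 and 12 share the ancestors 15, 5 and 10, the subtree rooted at 10 holds the nine keys 6 to 14, and the path between 8 and 12 has three edges, so the call returns `(3, 9, 3)`.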

FIGURE 1. An example of the parameters $\mathcal{A}_{n,i,j}$, $\mathcal{S}_{n,i,j}$ and $\mathcal{D}_{n,i,j}$.

An example of a binary search tree together with the quantities considered in this paper is given as Figure 1. The binary search tree depicted is of size 16 and was generated by inserting the elements [5, 5, 0, 6, 8, 2, 3, 2, , 4, 6, 4, 7, 9, 3, ] (digits lost in this copy), in that order. The nodes $i = 8$ and $j = 12$ have $\mathcal{A}_{16,8,12} = 3$ common ancestors (nodes 5, 15 and 10). The size of the subtree rooted at the least common ancestor of nodes $i = 8$ and $j = 12$ (which is node 10) is $\mathcal{S}_{16,8,12} = 9$. The distance between the two nodes is $\mathcal{D}_{16,8,12} = 3$.

Both $\mathcal{A}_{n,i,j}$ and $\mathcal{D}_{n,i,j}$ have received attention in the literature [9, 22, 2]. The corresponding results in the following subsections are thus alternative derivations, using Theorem 2, of formulæ that were already known. Other authors have also investigated the number of common ancestors and the distance between two randomly chosen nodes in a random binary search tree [5]. The results given here (Subsection 4.2) about the size of the subtree rooted at the least common ancestor of two given nodes are new, to the best of our knowledge.

4.1. Common ancestors. We consider now the random variable $\mathcal{A}_{n,i,j}$, with $1 \le i \le j \le n$, which counts the number of common ancestors of the nodes with rank $i$ and $j$ in a random binary search tree of size $n$. We find that the distribution of $\mathcal{A}_{n,i,j}$ has been dealt with already in Section 2.

Theorem 6. The random variable $\mathcal{A}_{n,i,j}$ and the number of passes made by Range Quickselect $\mathcal{P}_{n,i,j}$, which has been defined in Subsection 2.2, are equally distributed, i.e., $\mathcal{A}_{n,i,j} \overset{(d)}{=} \mathcal{P}_{n,i,j}$. Therefore, the expected number of common ancestors $A_{n,i,j} = \mathbb{E}(\mathcal{A}_{n,i,j})$ of the nodes with rank $i$ and $j$ in a random binary search tree of size $n$ is, for $1 \le i \le j \le n$, given by the following exact and asymptotic formula (which holds uniformly for $1 \le i \le j \le n$ and $n \to \infty$):

\[ A_{n,i,j} = H_j + H_{n-i+1} - 2 H_{j-i+1} + 1 = \log j + \log(n-i+1) - 2 \log(j-i+1) + O(1). \]

This can be shown easily using a recursive description of $\mathcal{A}_{n,i,j}$, which is obtained via the decomposition of a binary search tree of size $n$ into the root node and its left and right subtree.
Assuming the random permutation model we get that with probability $1/n$ the root node has rank $k$, for all $1 \le k \le n$. In any case the root node is a common ancestor of the nodes with rank $i$ and $j$. If
