Fundamenta Informaticae XXI (2001) IOS Press

Fundamenta Informaticae XXI (2001) 1001-1006
IOS Press

Approximate Sorting

Joachim Giesen
Max Planck Institute for Computer Science, Saarbrücken, Germany

Eva Schuberth
Institute for Theoretical Computer Science, ETH Zurich, Switzerland

Miloš Stojaković
Department of Mathematics and Informatics, University of Novi Sad, Serbia

Author notes: Partly supported by the Swiss National Science Foundation under the grant "Robust Algorithms for Conjoint Analysis". Corresponding author. Partly supported by the Ministry of Science and Environmental Protection, Republic of Serbia, and Provincial Secretariat for Science, Province of Vojvodina.

Abstract. We show that any comparison based, randomized algorithm to approximate any given ranking of $n$ items within expected Spearman's footrule distance $n^2/\nu(n)$ needs at least $n(\min\{\log\nu(n), \log n\} - 6)$ comparisons in the worst case. This bound is tight up to a constant factor, since there exists a deterministic algorithm showing that $6n\log\nu(n)$ comparisons are always sufficient.

Keywords: algorithms, sorting, ranking, Spearman's footrule metric, Kendall's tau metric

1. Introduction

Our motivation to study approximate sorting comes from the following market research application. We want to find out how a respondent ranks a set of $n$ products. In order to simulate real buying situations, the respondent is presented with pairs of products and has to choose the one he prefers from each pair, i.e., he has to perform paired comparisons. The respondent's ranking is then reconstructed from the sequence of his choices. That is, a procedure that presents a sequence of product pairs to the respondent in order to obtain the product ranking is nothing else than a comparison based sorting algorithm. We can measure the efficiency of such an algorithm in terms of the number of (pairwise) comparisons needed in order to obtain the ranking. The information theoretic lower bound on sorting [6] states that there is no procedure that can determine a ranking by posing fewer than $n\log(n/e)$ paired comparison questions to the respondent, i.e., in general $\Omega(n\log n)$ comparisons are needed. Even for only moderately large $n$ this is easily too much, since respondents often get worn out after a certain number of questions (independent of $n$) and then no longer answer further questions faithfully. On the other hand, it might be enough to know the respondent's ranking approximately. In this paper we pursue the question of how many comparisons are necessary and sufficient in order to approximately rank $n$ products.

In order to give meaning to the term "approximately", we need some metric to compare rankings. Assume that we are dealing with $n$ products. Since a ranking is a permutation of the $n$ products, we need a metric on the permutation group $S_n$. Not every such metric is meaningful for our application. The Hamming distance, which counts how many products are ranked differently, is an example. If in the respondent's ranking one exchanges every second product with its predecessor, then the resulting ranking has maximal Hamming distance to the original one. Nevertheless, this ranking still tells a lot about the respondent's preferences. In marketing applications Kendall's tau metric [3] is frequently used, since it seems to capture the intuitive notion of closeness of two rankings and also arises naturally in the statistics of certain random rankings [7].

Our results. Instead of working with Kendall's metric we use Spearman's footrule metric [3]. The two are essentially equivalent, since they are within a constant factor of each other [3]. The maximal distance between any two rankings of $n$ products in Spearman's footrule metric is less than $n^2$.
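The contrast between the Hamming distance and Spearman's footrule, and the constant-factor relation to Kendall's tau, can be checked on small examples. The following Python sketch is illustrative and not taken from the paper. Rankings are represented as lists in which `p[i]` is the position of item `i`, and the helper names (`hamming`, `footrule`, `kendall`, `adjacent_swap`) are hypothetical. The inequality $K \le D \le 2K$ it checks is the Diaconis-Graham bound behind the "constant factor" remark [3].

```python
import itertools
from typing import List, Sequence


def hamming(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of items placed at different positions in the two rankings."""
    return sum(1 for a, b in zip(p, q) if a != b)


def footrule(p: Sequence[int], q: Sequence[int]) -> int:
    """Spearman's footrule: total displacement of the items between the rankings."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kendall(p: Sequence[int], q: Sequence[int]) -> int:
    """Kendall's tau distance: number of item pairs ordered differently."""
    n = len(p)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (p[i] - p[j]) * (q[i] - q[j]) < 0
    )


def adjacent_swap(n: int) -> List[int]:
    """Identity ranking with every second item exchanged with its predecessor."""
    p = list(range(n))
    for i in range(0, n - 1, 2):
        p[i], p[i + 1] = p[i + 1], p[i]
    return p


if __name__ == "__main__":
    n = 8
    ident = list(range(n))
    swapped = adjacent_swap(n)
    print(hamming(swapped, ident), footrule(swapped, ident), kendall(swapped, ident))
    # Diaconis-Graham: K <= D <= 2K, checked for every permutation of 6 items
    for p in itertools.permutations(range(6)):
        k, d = kendall(p, range(6)), footrule(p, range(6))
        assert k <= d <= 2 * k
```

For $n = 8$ this prints `8 8 4`. The swapped ranking is at the maximal Hamming distance $n$. Its footrule distance, however, is only $n$, which is far below the largest possible footrule distance of $\lfloor n^2/2 \rfloor = 32$.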
We show that in order to obtain a ranking at distance $n^2/\nu(n)$ from the actual ranking, a respondent in general has to perform at least $n(\min\{\log\nu(n), \log n\} - 6)$ comparisons in the worst case, whatever strategy is used. That is, there is an instance for which any comparison based algorithm performs at least $n(\min\{\log\nu(n), \log n\} - 6)$ comparisons. Moreover, the same bound on the minimum number of comparisons holds if we allow the strategy to be randomized, so that the obtained ranking is only required to be at expected distance $n^2/\nu(n)$ from the respondent's ranking. On the other hand, there is a deterministic strategy (algorithm) showing that $6n\log\nu(n)$ comparisons are always sufficient.

Related work. At first glance our work seems related to work done on pre-sorting. In pre-sorting the goal is to pre-process the data such that fewer comparisons are needed afterwards to sort them. For example, [4] shows that with $O(1)$ pre-processing one can save $\Theta(n)$ comparisons for Quicksort on average. Pre-processing can be seen as computing a partial order on the data that helps a given sorting algorithm to reduce the number of necessary comparisons. Given a partial order, the structural quantity that determines how many comparisons are needed in general to find the ranking is the number of linear extensions of the partial order, i.e., the number of rankings consistent with it. In fact, the logarithm of this number is a lower bound on the number of comparisons needed in general [5]. Here we study another structural measure, namely the maximum diameter, in Spearman's metric, of the set of rankings consistent with a partial order. Our results show that with $o(n\log n)$ comparisons one can make this diameter asymptotically smaller than the diameter of the set of all rankings. This is not the case for the number of linear extensions, which stays in $2^{\Theta(n\log n)}$.

Notation. The logarithm $\log$ in this paper is assumed to be binary, and by $\mathrm{id}$ we denote the identity (increasing) permutation of $[n]$.
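The contrast drawn in the related-work paragraph can be made concrete for one specific partial order. The sketch below is illustrative and not from the paper. It considers $n$ items split into $2^m$ ordered bins of equal size $b = n/2^m$, which is the structure ASORT computes, and compares two quantities:

- the Spearman footrule diameter of the set of consistent rankings, which is $2^m\lfloor b^2/2\rfloor$, because two consistent rankings can differ only inside bins and reversing a bin of size $b$ gives footrule $\lfloor b^2/2\rfloor$;
- $\log_2$ of the number $(b!)^{2^m}$ of linear extensions.

The function name `bin_partition_stats` is hypothetical, and the sketch assumes $2^m$ divides $n$.

```python
import math
from typing import Tuple


def log2_factorial(x: int) -> float:
    """log2(x!) computed via the log-gamma function (no overflow for large x)."""
    return math.lgamma(x + 1) / math.log(2)


def bin_partition_stats(n: int, m: int) -> Tuple[float, float]:
    """For 2**m ordered equal-size bins, return
    (diameter / diameter of S_n, log2(#linear extensions) / log2(n!)).

    Hypothetical helper for illustration; assumes n is divisible by 2**m.
    """
    bins = 2 ** m
    if n % bins:
        raise ValueError("n must be divisible by 2**m")
    b = n // bins
    diameter = bins * (b * b // 2)       # reverse every bin independently
    full_diameter = n * n // 2           # maximum footrule over all of S_n
    log2_ext = bins * log2_factorial(b)  # log2 of (b!)**(2**m)
    log2_all = log2_factorial(n)         # log2 of n!
    return diameter / full_diameter, log2_ext / log2_all


if __name__ == "__main__":
    n = 2 ** 20
    for m in (2, 4, 6, 8):
        diam_ratio, ext_ratio = bin_partition_stats(n, m)
        print(f"m={m}: diameter ratio {diam_ratio:.5f}, log2(extensions) ratio {ext_ratio:.3f}")
```

With $m$ rounds, the diameter ratio is exactly $2^{-m}$ when $b$ is even. The ratio of the logarithms of the numbers of linear extensions shrinks only slowly, roughly like $(\log n - m)/\log n$.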

2. Lower Bound

Here we show that in order to obtain a ranking reasonably close to the actual ranking, a respondent has to perform a substantial number of comparisons in the worst case. More precisely, for any (possibly randomized) comparison based algorithm that outputs a ranking at distance $n^2/\nu(n)$ from the actual ranking, there is an instance for which it performs (in expectation) at least $n(\min\{\log\nu(n), \log n\} - 6)$ comparisons.

The distance of an approximate ranking from the actual ranking will be measured in Spearman's footrule metric,
$$D(\pi, \mathrm{id}) = D(\pi) = \sum_{i=1}^{n} |i - \pi(i)|,$$
where $\pi(i)$ is the rank, in the approximate ranking, of the element whose actual rank is $i$. Thus $|i - \pi(i)|$ measures the deviation of the approximated rank from the actual rank. Note that for any ranking the distance to $\mathrm{id}$ in Spearman's footrule metric is at most $n^2/2$. For $r > 0$, we denote by $B_D(\mathrm{id}, r)$ the ball of radius $r$ centered at $\mathrm{id}$ with respect to Spearman's footrule metric, so
$$B_D(\mathrm{id}, r) := \{\pi \in S_n : D(\pi, \mathrm{id}) \le r\}.$$
Next we estimate the number of permutations in a ball of radius $r$.

Lemma 2.1. $|B_D(\mathrm{id}, r)| \le \left(\dfrac{2e(r+n)}{n}\right)^n$.

Proof: Every permutation $\pi \in S_n$ is uniquely determined by the sequence $\{\pi(i) - i\}_i$. Hence, for any sequence of non-negative integers $d_i$, $i = 1, \dots, n$, there are at most $2^n$ permutations $\pi \in S_n$ satisfying $|\pi(i) - i| = d_i$ for all $i$ (one choice of sign per index). If $D(\pi, \mathrm{id}) \le r$, then $\sum_i |\pi(i) - i| \le r$. The number of sequences of $n$ non-negative integers whose sum is at most $r$ is $\binom{r+n}{n}$. Using $\binom{a}{b} \le (ea/b)^b$, we have
$$|B_D(\mathrm{id}, r)| \le 2^n \binom{r+n}{n} \le \left(\frac{2e(r+n)}{n}\right)^n.$$

Using the previous lemma and Yao's Principle [8], we give a lower bound for the worst case running time of any (randomized) comparison based approximate sorting algorithm.

Theorem 2.1. Let $A$ be a randomized approximate sorting algorithm based on comparisons, let $\nu = \nu(n)$ be a function, and let $r = r(n) = n^2/\nu(n)$. Suppose that for every input permutation $\pi \in S_n$ the expected Spearman's footrule distance of the output to $\mathrm{id}$ is at most $r$. Then in the worst case the algorithm performs at least $n(\min\{\log\nu, \log n\} - 6)$ comparisons in expectation.
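For small $n$, Lemma 2.1 and the remark that the footrule distance to $\mathrm{id}$ never exceeds $n^2/2$ can be sanity-checked by brute force. The sketch below is illustrative, and its helper names are hypothetical. It tabulates the footrule distances of all permutations of $[n]$ once. It then compares the exact ball size $|B_D(\mathrm{id}, r)|$ with the two bounds that appear in the proof: the intermediate bound $2^n\binom{r+n}{n}$ and the final bound $(2e(r+n)/n)^n$. Positions are 0-based here, which does not change $D$.

```python
import itertools
import math
from collections import Counter
from typing import List, Tuple


def footrule_to_id(perm) -> int:
    """D(pi, id) = sum_i |i - pi(i)|, with 0-based positions."""
    return sum(abs(i - p) for i, p in enumerate(perm))


def ball_sizes(n: int) -> List[int]:
    """Return a list whose r-th entry is |B_D(id, r)| for r = 0 .. floor(n^2/2)."""
    dist = Counter(footrule_to_id(p) for p in itertools.permutations(range(n)))
    sizes, running = [], 0
    for r in range(n * n // 2 + 1):
        running += dist.get(r, 0)  # cumulative count of permutations with D <= r
        sizes.append(running)
    return sizes


def lemma_bounds(n: int, r: int) -> Tuple[int, float]:
    """The intermediate and final upper bounds from the proof of Lemma 2.1."""
    middle = 2 ** n * math.comb(r + n, n)
    final = (2 * math.e * (r + n) / n) ** n
    return middle, final


if __name__ == "__main__":
    n = 6
    sizes = ball_sizes(n)
    assert sizes[-1] == math.factorial(n)  # every permutation lies within n^2/2
    for r, size in enumerate(sizes):
        middle, final = lemma_bounds(n, r)
        assert size <= middle <= final
        print(r, size, middle, round(final))
```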

Proof: Let $k$ be the smallest integer such that $A$ performs at most $k$ comparisons for every input. For a contradiction, assume that $k < n(\min\{\log\nu, \log n\} - 6)$. First, we prove
$$\frac{1}{2}\, n! > 2^k \left(\frac{2e(2r+n)}{n}\right)^n. \qquad (1)$$
Since $\log\nu - 6 > k/n$, we have $\nu/2^6 > 2^{k/n}$. Since $\nu = n^2/r$, we get
$$\frac{n^2}{2e \cdot 2r} > 2e \cdot 2^{k/n}. \qquad (2)$$
On the other hand, from $\log n - 6 > k/n$ we get $n/2^6 > 2^{k/n}$, implying
$$\frac{n}{2e} > 2e \cdot 2^{k/n}. \qquad (3)$$
(Both implications use $e^2 < 8$.) By (2) and (3), each of $2r$ and $n$ is less than $\frac{n^2}{4e^2\, 2^{k/n}}$, so $2r + n < \frac{n^2}{2e^2\, 2^{k/n}}$. Putting (2) and (3) together in this way, we obtain
$$\frac{n}{e} \cdot \frac{n}{2e(2r+n)} > 2^{k/n}.$$
Raising this to the $n$-th power and using Stirling's bound $n! \ge \sqrt{2\pi n}\,(n/e)^n > 2(n/e)^n$, we get
$$\frac{1}{2}\, n! > \left(\frac{n}{e}\right)^n > 2^k \left(\frac{2e(2r+n)}{n}\right)^n,$$
proving (1).

We denote by $R$ the source of random bits for $A$. One can view $R$ as the set of all infinite 0-1 sequences. The algorithm is then given a random element of $R$ along with the input. For a permutation $\pi \in S_n$ and $\alpha \in R$, we denote by $A(\pi, \alpha)$ the output of the algorithm with input $\pi$ and random bits $\alpha$.

We fix $\alpha \in R$ and run the algorithm for every permutation $\pi \in S_n$. Note that with the random bits fixed, the algorithm is deterministic. Every comparison made by the algorithm has two possible outcomes. We partition the set of all permutations $S_n$ into classes such that all permutations in a class have the same outcomes for all the comparisons the algorithm makes. Since no randomness is involved, for every class $C$ there exists a $\sigma \in S_n$ such that $A(\pi, \alpha) = \sigma \circ \pi$ for every $\pi \in C$, where $\circ$ is the multiplication in the permutation group $S_n$. In particular, the set $\{A(\pi, \alpha) : \pi \in C\}$ has size $|C|$. On the other hand, since the algorithm is deterministic in this setting and makes at most $k$ comparisons, there can be at most $2^k$ classes. Hence, each permutation in $S_n$ is the output for at most $2^k$ different input permutations.

From Lemma 2.1 we have $|B_D(\mathrm{id}, 2r)| \le \left(\frac{2e(2r+n)}{n}\right)^n$. Hence at most $2^k\left(\frac{2e(2r+n)}{n}\right)^n$ inputs have their output within distance $2r$ of $\mathrm{id}$. Together with (1), this implies that at least
$$n! - 2^k\left(\frac{2e(2r+n)}{n}\right)^n > \frac{1}{2}\, n!$$
input permutations have their output at distance more than $2r$ from $\mathrm{id}$. Now suppose both the random bits $\alpha \in R$ and the input permutation $\pi \in S_n$ are chosen at random, $\pi$ uniformly. For every fixed $\alpha$, more than half of the inputs contribute a distance greater than $2r$. So the expected distance of the output $A(\pi, \alpha)$ to $\mathrm{id}$ is more than $r$.
Therefore, there exists a permutation $\pi_0$ such that, for a randomly chosen $\alpha \in R$, the expected distance $D(A(\pi_0, \alpha), \mathrm{id})$ is more than $r$. This is a contradiction.
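The chain of inequalities leading to (1) can be checked numerically. The sketch below is illustrative and not part of the paper, and its function name is hypothetical. It works in $\log_2$ space via `math.lgamma` to avoid overflow. It takes $k$ to be the largest integer strictly below $n(\min\{\log\nu, \log n\} - 6)$ and sets $r = n^2/\nu$. It then tests whether $\tfrac12\, n! > 2^k\left(2e(2r+n)/n\right)^n$.

```python
import math


def inequality_1_holds(n: int, nu: float) -> bool:
    """Check (1/2) n! > 2^k (2e(2r+n)/n)^n, with r = n^2/nu and
    k the largest integer strictly below n(min{log nu, log n} - 6)."""
    bound = n * (min(math.log2(nu), math.log2(n)) - 6)
    k = math.ceil(bound) - 1                        # largest integer < bound
    r = n * n / nu
    lhs = math.lgamma(n + 1) / math.log(2) - 1      # log2(n!/2)
    rhs = k + n * math.log2(2 * math.e * (2 * r + n) / n)
    return lhs > rhs


if __name__ == "__main__":
    for n in (128, 1000, 10 ** 5):
        for nu in (2 ** 7, 2 ** 10, n, n ** 2):
            print(n, nu, inequality_1_holds(n, nu))
```

The chosen parameters keep both $\log\nu$ and $\log n$ above 6, so the assumed bound on $k$ is positive. On such a grid the check should succeed everywhere, as the proof guarantees.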

3. Algorithm

The idea of the ASORT algorithm is to partition the $n$ products into a sorted sequence of equal-sized bins such that the elements in each bin have smaller rank than any element in subsequent bins. It is based on a well-studied variation of the Quicksort algorithm in which the median is chosen as the pivot element (see, e.g., [2]). The output of the algorithm is the sequence of bins. Note that we do not specify the ordering of elements inside each bin, but consider any ranking consistent with the ordering of the bins. As it turns out, any such ranking approximates the actual ranking of the elements well in terms of Spearman's footrule metric.

The algorithm ASORT iteratively performs a number of median searches, each time placing the median into the right position in the ranking. Here the median of $n$ elements is defined to be the element of rank $\lfloor (n+1)/2 \rfloor$.

```
ASORT (B : set, m : int)
1  B_{0,1} := B                        // B_{i,j} is the j-th bin in the i-th round
2  for i := 1 to m do
3      for j := 1 to 2^(i-1) do
4          compute the median of B_{i-1,j}
5          B_{i,2j-1} := {x in B_{i-1,j} | x <= median}
6          B_{i,2j}   := {x in B_{i-1,j} | x > median}
7      end for
8  end for
9  return B_{m,1}, ..., B_{m,2^m}
```

To compute the median in line 4 and to partition the elements in lines 5 and 6, we use the deterministic algorithm by Blum et al. [1]. It performs at most $5.73n$ comparisons to compute the median of $n$ elements and to partition them according to the median. We note that when putting ASORT into practice, one may want to use a different median algorithm, e.g., RANDOMIZED-SELECT [2]. In each round, the sum of the cardinalities of all the bins is $n$. Hence, one round takes at most $5.73n$ comparisons. As the algorithm runs for $m$ rounds overall, the total number of comparisons is less than $6nm$.

Theorem 3.1. Let $r = n^2/\nu(n)$. Suppose ASORT runs for $\log\nu(n)$ rounds, i.e., uses fewer than $6n\log\nu(n)$ comparisons. Then any ranking consistent with the ordering of the computed bins has Spearman's footrule distance at most $r$ to the actual ranking of the elements of $B$.
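The ASORT pseudocode can be turned into a small runnable sketch. This version is an illustration under stated assumptions, not the authors' implementation:

- It uses randomized quickselect, in the spirit of the RANDOMIZED-SELECT alternative mentioned above, instead of the deterministic algorithm of Blum et al. The $5.73n$ per-round bound therefore does not apply, and the comparison count is a random quantity.
- It assumes distinct items, and that $n$ is divisible by $2^m$ so that all bins have equal size.

All names (`CountingComparator`, `select`, `asort`) are hypothetical.

```python
import random
from typing import List


class CountingComparator:
    """Strict '<' on items that also counts the pairwise comparisons made."""

    def __init__(self) -> None:
        self.count = 0

    def less(self, a: int, b: int) -> bool:
        self.count += 1
        return a < b


def select(items: List[int], k: int, cmp: CountingComparator, rng: random.Random) -> int:
    """Return the element of rank k (1-based) among distinct items (randomized quickselect)."""
    while True:
        if len(items) == 1:
            return items[0]
        p = rng.randrange(len(items))
        pivot = items[p]
        lower, upper = [], []
        for i, x in enumerate(items):
            if i != p:  # one comparison per non-pivot element
                (lower if cmp.less(x, pivot) else upper).append(x)
        if k <= len(lower):
            items = lower
        elif k == len(lower) + 1:
            return pivot
        else:
            k -= len(lower) + 1
            items = upper


def asort(items: List[int], m: int, cmp: CountingComparator, rng: random.Random) -> List[List[int]]:
    """Split items into 2**m ordered, equal-size bins by repeated median partitioning."""
    if len(items) % (2 ** m):
        raise ValueError("assumes len(items) is divisible by 2**m")
    bins = [list(items)]                                     # round 0: B_{0,1} = B
    for _ in range(m):                                       # rounds i = 1..m
        next_bins = []
        for b in bins:                                       # bins j = 1..2^(i-1)
            median = select(b, (len(b) + 1) // 2, cmp, rng)  # line 4
            left, right = [], []
            for x in b:                                      # lines 5 and 6
                (right if cmp.less(median, x) else left).append(x)
            next_bins += [left, right]
        bins = next_bins
    return bins


if __name__ == "__main__":
    rng = random.Random(1)
    n, m = 1024, 4
    items = list(range(n))  # item x has actual rank x (0-based)
    rng.shuffle(items)
    cmp = CountingComparator()
    bins = asort(items, m, cmp, rng)
    ranking = [x for b in bins for x in b]  # any order inside a bin is allowed
    distance = sum(abs(pos - x) for pos, x in enumerate(ranking))
    print(distance, n * n // 2 ** m, cmp.count)
```

The elements inside each bin are left in input order, so `ranking` is one particular ranking consistent with the bins. Theorem 3.1 bounds its footrule distance by $n^2/2^m$, which is 65536 for the parameters shown.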
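Proof: After $m$ rounds every bin contains $n/2^m$ elements. So in any ranking consistent with the bins, each element is placed at most $n/2^m - 1$ positions away from its actual rank. Hence the distance of the actual ranking of the elements in $B$ to any ranking consistent with the ordering of the bins computed by ASORT in $m$ rounds can be bounded by
$$n\left(\frac{n}{2^m} - 1\right) \le \frac{n^2}{2^m}.$$
Plugging in $m = \log\nu(n)$, we see that the distance is at most $r$. As we saw earlier, the algorithm performs fewer than $6nm = 6n\log\nu(n)$ comparisons.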
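For very small $n$, the bound in the proof can be checked exhaustively. The sketch below is illustrative and its helper names are hypothetical. It assumes $2^m$ divides $n$ and that bin $j$ holds the items of actual ranks $jb, \dots, (j+1)b - 1$ with $b = n/2^m$. It enumerates every ranking consistent with the bins and takes the largest footrule distance to the actual ranking. It then checks that this distance does not exceed $n(b-1) \le n^2/2^m$.

```python
import itertools
from typing import Iterator, List


def consistent_rankings(n: int, m: int) -> Iterator[List[int]]:
    """Yield every ranking of items 0..n-1 consistent with 2**m ordered bins of
    equal size b = n // 2**m; only the order inside each bin is free."""
    if n % (2 ** m):
        raise ValueError("assumes n is divisible by 2**m")
    b = n // 2 ** m
    bins = [range(j * b, (j + 1) * b) for j in range(2 ** m)]
    for parts in itertools.product(*(itertools.permutations(bin_) for bin_ in bins)):
        yield [x for part in parts for x in part]


def max_distance(n: int, m: int) -> int:
    """Largest footrule distance to the actual ranking over all bin-consistent rankings."""
    return max(sum(abs(pos - x) for pos, x in enumerate(r)) for r in consistent_rankings(n, m))


if __name__ == "__main__":
    for n, m in [(8, 1), (8, 2), (8, 3), (12, 2)]:
        b = n // 2 ** m
        worst = max_distance(n, m)
        assert worst <= n * (b - 1) <= n * n // 2 ** m
        print(n, m, worst, n * (b - 1), n * n // 2 ** m)
```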

Acknowledgments. We are indebted to Jiří Matoušek for comments and insights that made this paper possible.

References

[1] Blum, M., Floyd, R. W., Pratt, V., Rivest, R. L., Tarjan, R. E.: Linear time bounds for median computations, STOC '72: Proceedings of the fourth annual ACM symposium on Theory of computing, ACM Press, 1972.
[2] Cormen, T. H., Leiserson, C. E., Rivest, R. L.: Introduction to Algorithms, 2nd ed., The MIT Press/McGraw-Hill, 2001.
[3] Diaconis, P., Graham, R. L.: Spearman's Footrule as a Measure of Disarray, Journal of the Royal Statistical Society, 39(2), 1977, 262-268.
[4] Hwang, H. K., Yang, B. Y., Yeh, Y. N.: Presorting algorithms: an average-case point of view, Theoretical Computer Science, 242(1-2), 2000, 29-40.
[5] Kahn, J., Kim, J. H.: Entropy and Sorting, STOC '92: Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, ACM Press, 1992.
[6] Knuth, D. E.: The Art of Computer Programming, vol. 3, Addison Wesley, 1973.
[7] Mallows, C. L.: Non-null ranking models, Biometrika, 44, 1957, 114-130.
[8] Yao, A. C.: Probabilistic computations: Towards a unified measure of complexity, FOCS '77: Proceedings of the 18th Annual Symposium on Foundations of Computer Science, IEEE Computer Society Press, 1977.