Running Max/Min Filters using 1 + o(1) Comparisons per Sample

Hao Yuan, Member, IEEE, and Mikhail J. Atallah, Fellow, IEEE

Abstract: A running max (or min) filter asks for the maximum (or minimum) elements within a fixed-length sliding window. The previous best deterministic algorithm (developed by Gil and Kimmel, and refined by Coltuc) can compute the 1D max filter using 1.5 + o(1) comparisons per sample in the worst case. The best known algorithm for independent and identically distributed input uses 1.25 + o(1) expected comparisons per sample (by Gil and Kimmel). In this work, we show that the number of comparisons can be reduced to 1 + o(1) comparisons per sample in the worst case. As a consequence of the new max/min filters, the opening (or closing) filter can also be computed using 1 + o(1) comparisons per sample in the worst case, where the previous best work requires 1.5 + o(1) comparisons per sample (by Gil and Kimmel); and computing the max and min filters simultaneously can be done in 2 + o(1) comparisons per sample in the worst case, where the previous best work (by Lemire) requires 3 comparisons per sample. Our improvements over the previous work are asymptotic, that is, the number of comparisons is reduced only when the window size is large.

Index Terms: Mathematical morphology, erosion, dilation, opening, closing.

1 INTRODUCTION

Given an input sequence x_0, ..., x_{n-1} and the window size p > 1, the 1D running max filter problem is to compute the outputs y_i = max_{0 <= j < p} x_{i+j} for 0 <= i <= n - p. The 1D running min filter problem is defined in a similar way (by changing max to min). In the d-dimensional case, a d-dimensional cube is used as a window. Throughout this work, we assume that the dimension is one unless otherwise specified. Following previous work [6], [7], [14], the computation model is the comparison model, i.e., only the number of comparisons for comparing input elements is counted; the comparisons between indices (e.g., as part of iterations) are not counted.
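The definition above can be transcribed directly as a correctness baseline (a Python sketch of ours, not the paper's code; it spends p - 1 comparisons per output):

```python
def max_filter_naive(x, p):
    """Reference 1D running max filter: y[i] = max(x[i], ..., x[i+p-1]).

    A direct transcription of the definition, producing the n - p + 1
    outputs for 0 <= i <= n - p.  Useful only as a correctness check.
    """
    n = len(x)
    return [max(x[i:i + p]) for i in range(n - p + 1)]
```

For example, `max_filter_naive([3, 1, 4, 1, 5, 9, 2], 3)` slides a window of size p = 3 and returns `[4, 4, 5, 9, 9]`.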
This means that the inputs can be drawn from any totally ordered set, rather than a restricted universe like {0, 1, ..., 255}. Usually, n is very large compared to p, so the complexity is measured by the number of comparisons per output (sometimes called per sample) in terms of p.

Portions of this work were supported by a grant from City University of Hong Kong (Project No. 7200218); by National Science Foundation Grants CNS-0915436, CNS-0913875, and Science and Technology Center CCF-0939370; by an NPRP grant from the Qatar National Research Fund; by Grant FA9550-09-1-0223 from the Air Force Office of Scientific Research; and by sponsors of the Center for Education and Research in Information Assurance and Security. The statements made herein are solely the responsibility of the authors. Hao Yuan (the corresponding author) is with the Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China. E-mail: haoyuan@cityu.edu.hk. Mikhail J. Atallah is with the Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA. E-mail: mja@cs.purdue.edu.

Running max/min filters are fundamental operators in morphological image processing [13]. The max filter corresponds to the dilation operator over gray-scale images using a flat and linear structuring element, and the min filter corresponds to the erosion operator. Because all the algorithms for the running max filter apply to the min filter, we will only discuss the running max filter. In this work, we also consider the opening and closing filters, which are two basic morphological image processing operators [13]. The opening filter is obtained by first applying the min filter, and then applying the max filter to the result of the previous min filtering. Similarly, the closing filter is obtained by first applying the max filter, and then applying the min filter to the result of the previous max filtering.

1.1 Previous Work

A naïve implementation of the max filter requires p - 1 comparisons per sample.
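The naïve implementation, and the opening/closing compositions built on it, can be sketched as follows (a Python illustration of ours; border handling is simplified, so each pass shortens the sequence, whereas image-processing practice pads the borders):

```python
def max_filter_naive(x, p):
    # naive max filter: p - 1 comparisons per output
    return [max(x[i:i + p]) for i in range(len(x) - p + 1)]

def min_filter_naive(x, p):
    # naive min filter, the erosion counterpart
    return [min(x[i:i + p]) for i in range(len(x) - p + 1)]

def opening_filter(x, p):
    # opening: min filter (erosion) first, then max filter (dilation)
    return max_filter_naive(min_filter_naive(x, p), p)

def closing_filter(x, p):
    # closing: max filter (dilation) first, then min filter (erosion)
    return min_filter_naive(max_filter_naive(x, p), p)
```

Any faster max/min filter (such as the one developed in this paper) can replace the naïve passes without changing the compositions.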
Pitas [11] gave an algorithm that uses O(log p) comparisons per sample for the max filter. (Throughout this work, we use log to represent log_2.) An algorithm that does not depend on the window size was given by van Herk [14] and independently by Gil and Werman [6]. Their algorithm (HGW for short) requires 3 - 4/p comparisons per sample. Gevorkian, Astola and Atourian [5] considered independent and identically distributed (i.i.d.) inputs, and presented an algorithm that uses 2.5 - 3.5/p + 1/p^2 expected comparisons per sample. The previous best known worst-case algorithm was due to Gil and Kimmel [7]. Their algorithm (GK for

short) requires 1.5 + (log p)/p + O(1/p) comparisons per sample. The GK algorithm was further improved by Coltuc [3]. Coltuc's algorithm saves about 1.5/(p + 1) comparisons per sample compared to the GK algorithm, but the order is still 1.5 + (log p)/p + O(1/p). For i.i.d. inputs, Gil and Kimmel also presented an algorithm that requires 1.25 + (log p)/p expected comparisons per sample. Note that the HGW algorithm can compute semigroup sums in a sliding window using 3 - 4/p semigroup operations per sample. The semigroup case was improved to 3 - 6/(p + 1) operations per sample by Coltuc [3]. For computing both the maximum and minimum elements in the sliding window, Lemire [10] presented an algorithm that uses 3 comparisons per sample in the worst case, which is slightly better than running the GK algorithm twice. When the input is i.i.d., Gil and Kimmel [7] presented an algorithm that uses 2 + 2.3466 (log p)/p expected comparisons per sample.

The algorithms and theory community has studied a more general version of the problem, the Range Minimum Query (RMQ). The RMQ problem is to preprocess an input array, such that the minimum element within any query window can be returned efficiently. A linear time and space preprocessing algorithm for 1D RMQ that achieves constant-time query answering was first given by Gabow, Bentley and Tarjan [4], using a linear reduction to the Nearest Common Ancestor (NCA) problem [9] on the Cartesian tree [15]. The Cartesian tree (defined by Vuillemin [15]) is built on top of the input array to completely capture the information necessary to determine the solution for any range minimum query on the original array. It was shown that any range minimum query can be translated to an NCA query on the precomputed Cartesian tree in constant time. The first constant-time NCA solution with linear preprocessing was given by Harel and Tarjan [9], and much effort was spent on simplifying the solution [1], [2], [12]. Multidimensional RMQ was studied by the authors [16].
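The flavor of RMQ preprocessing can be illustrated with a sparse table, a simpler alternative (O(n log n) space instead of linear) to the Cartesian-tree/NCA solution described above; this sketch is ours, not from the paper:

```python
def build_sparse_table(a):
    """Sparse table for RMQ: st[k][i] = min of a[i .. i + 2^k - 1].

    O(n log n) preprocessing; each level doubles the window width by
    combining two overlapping half-width minima.
    """
    n = len(a)
    st = [a[:]]
    k = 1
    while (1 << k) <= n:
        prev = st[k - 1]
        half = 1 << (k - 1)
        st.append([min(prev[i], prev[i + half])
                   for i in range(n - (1 << k) + 1)])
        k += 1
    return st

def rmq(st, l, r):
    """Minimum of a[l..r] (inclusive) in O(1): two overlapping
    power-of-two windows cover the query range."""
    k = (r - l + 1).bit_length() - 1
    return min(st[k][l], st[k][r - (1 << k) + 1])
```

Each query costs one comparison of precomputed minima, regardless of the window length.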
Note that using the Cartesian tree approach [15], the 1D max filter can be solved using at most 2 comparisons per sample. The d-dimensional max/min filter can be computed by applying the 1D filter d times [6], using a structuring element decomposition approach [8]. Therefore, the GK algorithm can be used to compute the 2D max (or min) filter with 3 + o(1) comparisons per sample, and more generally with d(1.5 + o(1)) comparisons per sample in the d-dimensional case. The previous best result for the 1D opening/closing filter was 1.5 + O((log^2 p)/p) comparisons per sample, by Gil and Kimmel [7]. They extended the result to the d-dimensional case using (2d - 1)(1.5 + o(1)) comparisons per sample.

1.2 Our Contribution

In this work, we present a simple improvement based on the GK algorithm for the 1D max filter. Our new algorithm achieves 1 + 2/sqrt(p) + (log p)/p + O(1/p) comparisons per sample in the worst case, which improves over the GK algorithm when p >= 20, and over Coltuc's algorithm when p >= 24. As a consequence, the worst-case complexity to compute both the maximum and minimum elements in a sliding window becomes 2 + 4/sqrt(p) + 2(log p)/p + O(1/p) comparisons per sample, which improves over the solution of Lemire [10] when p >= 37. This means that our improvements over previous work are asymptotic, i.e., the number of comparisons is reduced only for large values of p. Based on our improvement for the 1D max/min filter, the d-dimensional max (or min) filter can be computed in d(1 + o(1)) comparisons per sample in the worst case. Using the techniques of Gil and Kimmel [7], the 1D opening (or closing) filter can be computed in 1 + O(1/sqrt(p)) comparisons per sample in the worst case, and the d-dimensional opening and closing can be computed in (2d - 1)(1 + o(1)) comparisons per sample in the worst case.

We will first review the GK algorithm in Section 2, and then present our result in Section 3. Section 4 presents an experimental evaluation. Section 5 concludes.

2 GIL AND KIMMEL'S ALGORITHM

We will first review the HGW algorithm, on which the GK algorithm is built. All the presentations given are for the 1D max filter.
The HGW algorithm splits the input sequence into overlapping segments of length 2p - 1, where the segments are centered at positions ip - 1 for 1 <= i <= n/p. Consider a segment centered at position c, i.e., x_{c-p+1}, x_{c-p+2}, ..., x_c, ..., x_{c+p-2}, x_{c+p-1}. The HGW algorithm generates the outputs y_{c-p+1}, y_{c-p+2}, ..., y_c in two stages: a preprocessing stage and a merge stage. In the preprocessing stage, prefix maximums and suffix maximums are computed for each block. This consists of computing

P_c(k) = max{x_c, x_{c+1}, ..., x_{c+k}},
S_c(k) = max{x_c, x_{c-1}, ..., x_{c-k}},

for 0 <= k <= p - 1. Then in the merge stage, p - 2 outputs can be obtained by merging the prefix maximums and suffix maximums in the following way: for 1 <= k <= p - 2,

y_{c-p+1+k} = max{S_c(p - 1 - k), P_c(k)}.   (1)

We can get the remaining two outputs by y_{c-p+1} = S_c(p - 1) and y_c = P_c(p - 1).
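The two HGW stages can be sketched in an equivalent block formulation (a Python illustration of ours, not the authors' code; real implementations avoid the padding and count comparisons explicitly):

```python
def hgw_max_filter(x, p):
    """Sketch of the HGW scheme: prefix maxima P and suffix maxima S are
    computed within each length-p block, and each window, which spans at
    most two adjacent blocks, is answered as in equation (1) by one max
    of a suffix part and a prefix part.  Borders are simplified by
    padding with -infinity.
    """
    n = len(x)
    NEG = float('-inf')
    m = -(-n // p) * p                 # round n up to a multiple of p
    xp = x + [NEG] * (m - n)
    P = xp[:]                          # prefix maxima, restarting at block starts
    for i in range(1, m):
        if i % p != 0:
            P[i] = max(P[i - 1], xp[i])
    S = xp[:]                          # suffix maxima, restarting at block ends
    for i in range(m - 2, -1, -1):
        if (i + 1) % p != 0:
            S[i] = max(S[i + 1], xp[i])
    # window [i, i + p - 1]: suffix part of one block, prefix part of the next
    return [max(S[i], P[i + p - 1]) for i in range(n - p + 1)]
```

This merge performs one comparison per output, so the dominant cost is the two preprocessing scans, matching the 3 - 4/p figure quoted above.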

A straightforward implementation of the preprocessing stage uses 2(p - 1) comparisons (based on the facts that P_c(k) = max{P_c(k - 1), x_{c+k}} and S_c(k) = max{S_c(k - 1), x_{c-k}}), and the merge stage uses p - 2 comparisons. Therefore, the total number of comparisons is 3p - 4 for generating p outputs.

The GK algorithm improves both the preprocessing and merge stages of the HGW algorithm. In the preprocessing stage, the saving of comparisons is achieved by considering the computations of P_c(.) and S_{c+p}(.) together when processing the elements x_c, x_{c+1}, ..., x_{c+p}. Let q = ceil((p + 1)/2). The GK algorithm first computes P_c(k) for k = 0, 1, ..., q - 1 using q - 1 comparisons, and then computes S_{c+p}(k) for k = 0, 1, ..., p - q using p - q comparisons. If P_c(q - 1) >= S_{c+p}(p - q), then no element in {x_{c+j} | q <= j < p} can be greater than P_c(q - 1). In such a case, we have P_c(j) = P_c(q - 1) for q <= j < p (note that no comparison is required), and only q - 1 more comparisons are required to finish computing S_{c+p}(k) for p - q < k < p. Similarly, if P_c(q - 1) <= S_{c+p}(p - q), we have S_{c+p}(j) = S_{c+p}(p - q) for p - q < j < p, and p - q more comparisons are required to finish the computation of P_c(k) for q <= k < p. This GK improvement reduces the total number of comparisons for the preprocessing stage to 1.5p - (p mod 2)/2.

For the merge stage, the GK algorithm reduces the number of comparisons by employing the following observation: in equation (1), there is an index k* such that y_{c-p+1+k} = S_c(p - 1 - k) for all 1 <= k <= k*, and y_{c-p+1+k} = P_c(k) for all k* < k <= p - 2. Therefore, a binary search for k* using ceil(log(p - 1)) comparisons is sufficient to do the merge. Combining the improvements for the preprocessing stage and the merge stage, the amortized number of comparisons for p outputs is 1.5p - (p mod 2)/2 + ceil(log(p - 1)), which is equivalent to 1.5 + (log p)/p + O(1/p) per sample.

3 IMPROVED ALGORITHM

Our improvement is obtained by improving the preprocessing stage of the GK algorithm.
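(The merge stage is kept exactly as in GK. Its binary search for the crossover index k* can be sketched as follows, a Python illustration of ours where `S[k]` stands for S_c(p - 1 - k) and `P[k]` for P_c(k), restricted to the p - 2 interior outputs.)

```python
def merge_outputs(S, P):
    """GK merge stage, sketched: S is non-increasing and P is
    non-decreasing in k, so a single binary search finds the crossover
    k* with y[k] = S[k] for k <= k* and y[k] = P[k] for k > k*,
    replacing the p - 2 pairwise max comparisons of equation (1)."""
    m = len(P)
    lo, hi = 0, m                  # find the first k with P[k] > S[k]
    while lo < hi:
        mid = (lo + hi) // 2
        if P[mid] > S[mid]:        # one comparison of input elements
            hi = mid
        else:
            lo = mid + 1
    return S[:lo] + P[lo:]
```

The monotonicity of the two arrays guarantees the spliced result equals the elementwise maximum.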
When considering the computations of S_{c+p}(.) and P_c(.) together, the GK algorithm achieves its saving of comparisons by dividing the sequence x_c, x_{c+1}, ..., x_{c+p} into two halves, and then computing the prefix maximums of the first half and the suffix maximums of the second half. The half that has the smaller maximum then extends its maximum computation into the other half by doing more comparisons to finish the computation. The worst case for the GK algorithm happens when the maximum is at or near the boundaries of the sequence x_c, x_{c+1}, ..., x_{c+p}, in which case many comparisons are redundant. For example, if the maximum is located at x_{c+3} (assume that it is in the left half), then the prefix maximum computations for P_c(i) (where 3 < i < q) cost too many comparisons. This preprocessing strategy can be refined by adaptively advancing the prefix/suffix maximum computations, so that not many comparisons are done after the prefix/suffix maximum touches the actual maximum of x_c, x_{c+1}, ..., x_{c+p}.

At the beginning, a current prefix index i and a current suffix index j are both set to 0. The index i is used to track the progress of the prefix maximum computation, and its semantic meaning is that P_c(k) has already been computed for 0 <= k <= i. Similarly, the index j is used to track the progress of the suffix maximum computation, and it means that S_{c+p}(k) has already been computed for 0 <= k <= j. Note that initially, P_c(0) = x_c and S_{c+p}(0) = x_{c+p}. Let s >= 1 be a default step size (a parameter whose value will be fixed to ceil(sqrt(p - 1)) later). Our algorithm adaptively advances the values of i and j until i + j = p + s - 1. In each advancement (except the last one), either i or j is advanced by s. In the last advancement (right before i + j reaches p + s - 1), the step size is s if (p - 1) mod s = 0, or (p - 1) mod s if (p - 1) mod s != 0. More specifically, except in the last advancement, if P_c(i) <= S_{c+p}(j), then P_c(k) for i < k <= i + s will be computed, and after that i will have increased by s.
Similarly, if P_c(i) > S_{c+p}(j), then S_{c+p}(k) for j < k <= j + s will be computed, and after that j will have increased by s. In the last advancement, a similar computation is done except that the different step size (p - 1) mod s may be used (see the description above). There are ceil((p + s - 1)/s) advancements, so the number of comparisons spent on determining whether to advance i or j (by comparing P_c(i) to S_{c+p}(j)) is ceil((p + s - 1)/s). Each time i (or j) is increased by δ, there are δ comparisons for computing δ entries of P_c(.) (or S_{c+p}(.)). Because i + j is 0 at the beginning and p + s - 1 at the end, p + s - 1 comparisons are made other than the comparisons for determining whether to advance i or j. So the total number of comparisons so far is

p + s - 1 + ceil((p + s - 1)/s).

After i + j = p + s - 1, no comparison is required, and we will show that

P_c(k) = P_c(i) for i < k < p,   (2)
S_{c+p}(k) = S_{c+p}(j) for j < k < p.   (3)

The preprocessing stage is then finished, and its correctness is based on Lemma 1 (given below). The pseudocode of this algorithm is given in Algorithm 1. Line 5 sets δ to the advancement step size. Lines 7 to 10 advance the prefix computations, and lines 12 to 15 advance the suffix computations. Note that in Algorithm 1, some special boundary cases are handled when i or j reaches p - 1, but these cases do not affect the correctness and complexity of the algorithm.

Lemma 1: When i + j = p + s - 1, equations (2) and (3) hold.

Algorithm 1: Improved Preprocessing Stage

Input: x_c, x_{c+1}, ..., x_{c+p} and a parameter s
Output: P_c(k) and S_{c+p}(k) for 0 <= k <= p - 1

1  P_c(0) ← x_c
2  S_{c+p}(0) ← x_{c+p}
3  i ← 0, j ← 0
4  while i + j < p + s - 1 do
5      δ ← min{s, p + s - 1 - i - j}
6      if i < p - 1 and P_c(i) <= S_{c+p}(j) then
7          for k = i + 1 to min{i + δ, p - 1} do
8              P_c(k) ← max{P_c(k - 1), x_{c+k}}
9          end
10         i ← i + δ
11     else
12         for k = j + 1 to min{j + δ, p - 1} do
13             S_{c+p}(k) ← max{S_{c+p}(k - 1), x_{c+p-k}}
14         end
15         j ← j + δ
16     end
17 end
18 while i < p - 1 do
19     i ← i + 1
20     P_c(i) ← P_c(i - 1)
21 end
22 while j < p - 1 do
23     j ← j + 1
24     S_{c+p}(j) ← S_{c+p}(j - 1)
25 end

Proof: Without loss of generality, assume that all the input elements are distinct (ties can be broken by considering their indices). After each advancement, let l be the index of the maximum element of {x_k | k ∈ [c, c + i] ∪ [c + p - j, c + p]} (the set of elements considered so far); then we must have either c + i - s < l <= c + i or c + p - j <= l < c + p - j + s. In other words, l is within a distance of s - 1 from either c + i or c + p - j. This is because the algorithm always advances i if the current prefix maximum is smaller, or advances j if the current suffix maximum is smaller. When i + j = p + s - 1, because c + i + 1 >= c + p - j, x_l is the maximum element of {x_k | k ∈ [c, c + p]}. According to the previous paragraph, l <= c + i or l < c + p - j + s. The latter inequality implies that l <= c + p - j + s - 1 = c + (p + s - 1 - j) = c + i. Therefore, we always have l <= c + i. This means that no element in {x_k | k ∈ [c + i + 1, c + p)} can be greater than the current prefix maximum, hence equation (2) holds. By a similar argument, equation (3) holds.

If s = ceil(sqrt(p - 1)), then the total number of comparisons for the preprocessing stage is bounded by

p + s - 1 + ceil((p + s - 1)/s) = p + ceil(sqrt(p - 1)) - 1 + ceil((p - 1)/ceil(sqrt(p - 1))) + 1 <= p + 2 ceil(sqrt(p - 1)) + 1.

Combining with the merge stage of the GK algorithm (which requires ceil(log(p - 1)) comparisons by a binary search), the total amortized number of comparisons per sample of Algorithm 1 is bounded by

C_1 = (1/p)(p + 2 ceil(sqrt(p - 1)) + 1 + ceil(log(p - 1))) <= (1/p)(p + 2 sqrt(p) + log p + 3) <= 1 + 2/sqrt(p) + (log p)/p + O(1/p) = 1 + O(1/sqrt(p)) = 1 + o(1).
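Algorithm 1 admits a direct runnable translation, shown below as a Python sketch of ours (indices are relative to the segment, and the boundary guard for the cases when i or j reaches p - 1, which the paper only alludes to, is made explicit):

```python
def improved_preprocess(a, s):
    """Sketch of Algorithm 1 (our translation, not the authors' code).

    `a` holds the p + 1 elements x_c, ..., x_{c+p}; s >= 1 is the step
    size.  Returns (P, S) with P[k] = max(a[0..k]) and
    S[k] = max(a[p-k..p]) for 0 <= k <= p - 1.
    """
    p = len(a) - 1
    P, S = [None] * p, [None] * p
    P[0], S[0] = a[0], a[p]
    i = j = 0
    while i + j < p + s - 1:
        delta = min(s, p + s - 1 - i - j)           # line 5: step size
        # line 6, with an explicit guard: advance the prefix side while
        # its maximum is smaller, or once the suffix side is complete
        if i < p - 1 and (j >= p - 1 or P[i] <= S[j]):
            for k in range(i + 1, min(i + delta, p - 1) + 1):
                P[k] = max(P[k - 1], a[k])          # extend prefix maxima
            i += delta
        else:
            for k in range(j + 1, min(j + delta, p - 1) + 1):
                S[k] = max(S[k - 1], a[p - k])      # extend suffix maxima
            j += delta
    while i < p - 1:               # lines 18-21: by Lemma 1, the rest
        i += 1                     # of P is constant, no comparisons
        P[i] = P[i - 1]
    while j < p - 1:               # lines 22-25: likewise for S
        j += 1
        S[j] = S[j - 1]
    return P, S
```

Sweeping s over 1, 2, 3, ... produces the same output; only the comparison count changes, and the analysis fixes s = ceil(sqrt(p - 1)).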
This establishes the following theorem:

Theorem 1: The 1D running maximum (or minimum) filter can be computed in 1 + o(1) comparisons per sample in the worst case by a deterministic algorithm.

Note that the GK algorithm is very similar to ours when s = q - 1 = ceil((p - 1)/2). It was shown that the d-dimensional max/min filter can be computed by applying the 1D filter d times [6], using a structuring element decomposition approach [8]. More specifically, the following lemma holds:

Lemma 2: Let C_1(p) be the number of comparisons per sample of a 1D filter for a 1D window of size p; then a d-dimensional max/min filter with window size p_1 × p_2 × ... × p_d can be computed with Σ_{i=1}^{d} C_1(p_i) comparisons per sample.

Based on Lemma 2, we have Corollary 1.

Corollary 1: The d-dimensional maximum (or minimum) filter can be computed deterministically using d(1 + o(1)) comparisons per sample in the worst case.

Gil and Kimmel [7] showed that the opening/closing filter can be computed using C_1 + O((log^2 p)/p) comparisons per sample. Therefore, the number of comparisons per sample for computing the opening (or closing) filter is at most

C_1 + O((log^2 p)/p) = 1 + O(1/sqrt(p)) + O((log^2 p)/p) = 1 + O(1/sqrt(p)) = 1 + o(1).
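The separable construction behind Lemma 2 can be sketched for d = 2 (a Python illustration of ours; the naïve 1D pass stands in for any 1D max filter, including Algorithm 1):

```python
def max_filter_1d(x, p):
    # naive 1D pass; any 1D max filter can be substituted here
    return [max(x[i:i + p]) for i in range(len(x) - p + 1)]

def max_filter_2d(img, p):
    """2D max filter with a p x p window by separable application of a
    1D filter (Lemma 2 with d = 2): one pass along every row, then one
    pass along every column of the intermediate result.
    """
    rows = [max_filter_1d(r, p) for r in img]               # horizontal pass
    cols = [max_filter_1d(list(c), p) for c in zip(*rows)]  # vertical pass
    return [list(r) for r in zip(*cols)]                    # transpose back
```

The per-sample cost is the sum of the two 1D costs, which is exactly the statement of Lemma 2.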

Fig. 1: Experiment result (running time in seconds versus window size p, for our algorithm YA and the algorithms GK and HGW; window sizes up to 100).

So Corollary 2 is established.

Corollary 2: The 1D opening (or closing) filter can be computed deterministically using 1 + o(1) comparisons per sample in the worst case.

Gil and Kimmel [7] show that the d-dimensional opening (or closing) can be computed by running the 1D min filter d - 1 times, the 1D max filter d - 1 times, and the 1D opening (or closing) filter once. So we have Corollary 3.

Corollary 3: The d-dimensional opening (or closing) filter can be computed deterministically using (2d - 1)(1 + o(1)) comparisons per sample in the worst case.

4 EXPERIMENTAL EVALUATION

The comparison model does not capture other costs, like comparisons between indices, memory accesses, branch mispredictions, etc. Therefore, we do not expect our algorithm to beat existing methods in actual implementations even when the window size is large, unless the comparison between two input elements is very expensive. So here, we only show experimental results for a case where the cost of comparing input elements is high. One such case is comparing two 128-bit floating-point numbers on a 64-bit processor without hardware support for 128-bit floating-point computations. Note that for the cases when the comparisons of input elements are cheap, we do not observe that our algorithm outperforms previous ones even for large window sizes.

We implemented the 1D max filter using the HGW algorithm, the GK algorithm, and our algorithm in C++. The experiment was conducted on a laptop computer with an Intel Core i5-540M 2.53GHz processor and 2GB main memory. The operating system is Ubuntu 11.04 (64-bit version), and the compiler is the GNU Compiler Collection (GCC) 4.5.2. The compilation option that we used is -O2. The compiled binary is in 64-bit mode. The data type is the quadruple-precision floating-point number (__float128 in the C++ of GCC), which has 128 bits.
The comparisons for quadruple-precision floating-point numbers are very expensive, because they are currently implemented using software emulation. We generated 10^7 random input numbers, and tested the three algorithms on the input with window sizes from 2 to 100. The running times are shown in Figure 1. Our algorithm (denoted by YA in the figure) outperforms the GK algorithm when p >= 33 in this experiment.

5 CONCLUDING REMARKS

In this work, we asymptotically improve the state of the art for computing running max/min filters. Some questions remain open: Is it possible to simultaneously compute the 1D maximum and minimum filters with fewer than 2C_1 (e.g., 2 or even 1.5) comparisons per sample? Is it possible to compute the 2D maximum (or minimum) filter with fewer than 2C_1 (e.g., 2 or even 1.5) comparisons per sample? Can the 2D opening or closing be computed with 3 or fewer comparisons per sample?

ACKNOWLEDGMENTS

We would like to thank the referees for their helpful comments and suggestions.

REFERENCES

[1] S. Alstrup, C. Gavoille, H. Kaplan, and T. Rauhe, "Nearest common ancestors: a survey and a new distributed algorithm," in SPAA '02: Proceedings of the Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures. New York, NY, USA: ACM, 2002, pp. 258-264.
[2] M. A. Bender, M. Farach-Colton, G. Pemmasani, S. Skiena, and P. Sumazin, "Lowest common ancestors in trees and directed acyclic graphs," Journal of Algorithms, vol. 57, no. 2, pp. 75-94, 2005.
[3] D. Coltuc, "Mathematical complexity of running filters on semi-groups and related problems," Signal Processing, IEEE Transactions on, vol. 56, no. 7, pp. 3191-3197, Jul. 2008.
[4] H. N. Gabow, J. L. Bentley, and R. E. Tarjan, "Scaling and related techniques for geometry problems," in STOC '84: Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing. New York, NY, USA: ACM, 1984, pp. 135-143.
[5] D. Gevorkian, J. Astola, and S. Atourian, "Improving Gil-Werman algorithm for running min and max filters," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, no. 5, pp. 526-529, May 1997.
[6] J. Gil and M. Werman, "Computing 2-D min, median, and max filters," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 5, pp. 504-507, May 1993.
[7] J. Gil and R. Kimmel, "Efficient dilation, erosion, opening, and closing algorithms," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 12, pp. 1606-1617, Dec. 2002.
[8] R. M. Haralick, S. R. Sternberg, and X. Zhuang, "Image analysis using mathematical morphology," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 4, pp. 532-550, Jul. 1987.
[9] D. Harel and R. E. Tarjan, "Fast algorithms for finding nearest common ancestors," SIAM Journal on Computing, vol. 13, no. 2, pp. 338-355, 1984.
[10] D.
Lemire, "Streaming maximum-minimum filter using no more than three comparisons per element," Nordic Journal of Computing, vol. 13, no. 4, pp. 328-339, 2006.
[11] I. Pitas, "Fast algorithms for running ordering and max/min calculation," Circuits and Systems, IEEE Transactions on, vol. 36, no. 6, pp. 795-804, Jun. 1989.
[12] B. Schieber and U. Vishkin, "On finding lowest common ancestors: simplification and parallelization," SIAM Journal on Computing, vol. 17, no. 6, pp. 1253-1262, 1988.
[13] P. Soille, Morphological Image Analysis: Principles and Applications. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2003.
[14] M. van Herk, "A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels," Pattern Recognition Letters, vol. 13, no. 7, pp. 517-521, 1992. [Online]. Available: http://www.sciencedirect.com/science/article/B6V15-48N559B-18/2/1b976a6102acf3d50831ad0e75a481e2
[15] J. Vuillemin, "A unifying look at data structures," Communications of the ACM, vol. 23, no. 4, pp. 229-239, 1980.
[16] H. Yuan and M. J. Atallah, "Data structures for range minimum queries in multidimensional arrays," in Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2010), 2010, pp. 150-160.