Quantile Search: A Distance-Penalized Active Learning Algorithm for Spatial Sampling

Size: px

Start display at page:

Download "Quantile Search: A Distance-Penalized Active Learning Algorithm for Spatial Sampling"

Matthew Cain
5 years ago
Views:

Quantile Search: A Distance-Penalized Active Learning Algorith for Spatial Sapling John Lipor, Laura Balzano, Branko Kerkez, and Don Scavia 3 Departent of Electrical and Coputer Engineering Departent

1 Quantile Search: A Distance-Penalized Active Learning Algorith for Spatial Sapling John Lipor, Laura Balzano, Branko Kerkez, and Don Scavia 3 Departent of Electrical and Coputer Engineering Departent of Civil and Environental Engineering 3 School of Natural Resources and Environent University of Michigan, Ann Arbor {lipor,girasole,bkerkez,scavia}@uichedu Abstract Adaptive sapling theory has shown that, with proper assuptions on the signal class, algoriths exist to reconstruct a signal in R d with an optial nuber of saples We generalize this proble to when the cost of sapling is not only the nuber of saples but also the distance traveled between saples This is otivated by our work studying regions of low oxygen concentration in the Great Lakes We show that for one-diensional threshold classifiers, a tradeoff between nuber of saples and distance traveled can be achieved using a generalization of binary search, which we refer to as quantile search We derive the expected total sapling tie for noiseless easureents and the expected nuber of saples for an extension to the noisy case We illustrate our results in siulations relevant to our sapling application I INTRODUCTION In any signal reconstruction probles, the cost associated with taking a saple is in soe way dependent on the distance between saples For exaple, consider sapling the Great Lakes with a sall autonoous watercraft in order to reconstruct a spatial signal Lakes such as Lake Erie span an area on the order of thousands of kiloeters; hence the tie it takes for the boat to reach the next saple location is a ajor part of the sapling cost In the traditional or passive sapling scenario, the saple locations or saple ties are chosen a priori, often on a unifor grid Contrasting this is the field of active sapling or active learning, wherein saple locations ay be chosen as a function of all previous locations and their corresponding easureents Active learning theory has shown that binary search guarantees optial sapling rates [], but this theory ignores any sapling cost other than the nuber of saples taken Our proble of interest is one in spatial sapling where the spatial region of interest is relatively large, and the total sapling tie is a function of both the nuber of saples required and the distance traveled throughout the sapling procedure Our contribution is an algorith, tered quantile search, that provides a tradeoff between the nuber of saples required and the distance traveled in finding the change point of a one-diensional threshold classifier At its two extrees, quantile search iniizes either the nuber of saples or the distance traveled, with a tradeoff achieved by varying a search paraeter We derive and analyze this algorith in the noise-free and noisy cases and study it in siulations with an application to environental onitoring, the study of oxygen levels in the Great Lakes The reconstruction of this signal is of interest to water quality experts in order to ensure the balance and quality of water in this region []; we show an estiate interpolated fro historical easureents in Figure Fig : Dissolved oxygen concentrations in Lake Erie Points represent saple locations and solid black lines delineate the central basin Source:

2 fx 04 0 θ x Fig : Threshold classifier in one diension with change point θ 3 Where as binary search takes saples at the bisection of a probability distribution, or the -quantile, quantile search instead saples at the first -quantile For a preview of our results, suppose that we wish to estiate a threshold θ on a one diensional interval For noiseless saples, our results show that the expected estiation error after n easureents is E[ ˆθ n θ ] n +, 4 which generalizes to binary search E[ ˆθ n θ ] n+ The expected distance traveled is E[D], which also generalizes to binary search where E[D] Therefore, larger values of allow for a tradeoff between nuber of saples and distance traveled The reainder of this paper is organized as follows In Section II, we otivate and forulate the proble of interest In Section III we place this work into the context of the current literature on active learning Section IV describes the quantile search algorith for both the noise-free and noisy cases In Section V we verify the theory presented via siulation and apply the algoriths described to our proble of interest, sapling hypoxia in Lake Erie Conclusions and directions of future research are given in Section VI II MOTIVATION AND PROBLEM FORMULATION The proble of interest for this work is the initial otivating proble described in the Introduction, where we wish to deterine hypoxic regions in the Great Lakes The spatial extent of low oxygen concentration or hypoxia is a strong indicator of the health of the lakes [3] Studies perfored by the National Oceanic and Atospheric Adinistration on Lake Erie utilize a nuber of easureent stations on the order of tens, as well as hydrodynaic odels, in order to provide a warning syste for severe hypoxia in the central basin We wish to deterine the spatial extent of the hypoxic region in the central basin of Lake Erie, a phenoenon that occurs yearly in the late suer onths A region is deeed hypoxic if the dissolved oxygen concentration is less than 0 pp [3] Further, we assue the hypoxic zone is one connected region with a sooth boundary The proble can then be considered a binary classification proble, with spatial points receiving a label 0 if they are hypoxic and otherwise, where the desired spatial extent corresponds to the Bayes decision boundary To perfor sapling, an autonoous watercraft with a speed ranging fro 05-4 /s will be used Further, since the axiu spatial extent ay occur at one of any depths, a depth profile ust be taken at each point, increasing the tie per saple significantly Finally, the hypoxic zone can only be considered approxiately stationary for a period of tie less than one week We therefore seek to estiate the decision boundary while siultaneously iniizing the total tie spent sapling Following [4], we split our proble into several one diensional intervals, a process that is described further in Section V-B In each interval we ust find a threshold beyond which the lake is hypoxic Define the step function class F {f : [0, ] R fx [0,θ x} where θ [0, ] is the change point and S x denotes the indicator function, which is on the set S and 0 elsewhere In contrast to the active learning scenario, our goal is to estiate θ while iniizing the total tie required for sapling, a function of both the nuber of saples taken and the distance traveled during sapling Denote the observations {Y i } n i {0, }n as saples of an unknown function f θ F taken at saple locations on the unit interval {X i } n i With probability p, 0 p < /, we observe an erroneous easureent Thus { f θ X i with probability p Y i fx i U i, f θ X i with probability p

3 3 where denotes suation odulo, and U i {0, } are Bernoulli rando variables with paraeter p While other noise scenarios are coon, here we assue the U i are independent and identically distributed and independent of {X i } This noise scenario is of interest as the otivating data hypoxia is a thresholded value in {0, }, where Gaussian noise results in iproper thresholding of the easureents The extension to nonunifor noise eg, a Tsybakov-like noise condition as studied in [5] reains as a topic for future work See Figure for an illustration of f θ X As stated previously, our goal is to estiate θ using as little tie as possible To this end, the saple location at step n is chosen as a function of all previous saple locations and easureents as well as the noise statistics III RELATED WORK A nuber of active learning algoriths exist to estiate θ while iniizing the nuber of saples taken Binary search or binary bisection has been studied on this proble and on any other active learning probles Consider binary bisection applied to the signal in Figure, where θ /3 Initially, the hypothesis space is the unit interval The binary bisection algorith takes its first easureent at X /, easuring Y 0, indicating that θ lies to the left of /, and thus reducing the hypothesis space to the interval [0, /] Continuing in this anner, the algorith reduces the hypothesis space by half after each easureent, achieving what has been shown to be an optial rate of convergence [] In fact, copared to its passive counterpart in this case, saple locations distributed uniforly throughout the interval, binary bisection has been shown to achieve an exponential speedup [], [6] A large body of work has answered questions such as what type of iproveent is attainable in higher diensions [6], [7], for ore coplex decision boundaries and under easureent noise [4], [8] [0], and whether algoriths exist that achieve such iproveents [] [4] Extensions of the binary bisection algorith continue to be an iportant topic of research in active learning and related fields In [5], the author presents a odified version of binary search that can handle noisy easureents, referred to as probabilistic binary search PBS A discretized version of this algorith was presented in [8] and analyzed, yielding an upper bound on the expected rate of convergence of the algorith This discretized algorith, referred to as the B-Z algorith after its authors, has been analyzed fro an inforation-theoretic perspective [6], [7], as well as under Tsybakov s noise condition [5] Further, in [5], the authors show that for the boundary fragent class in diension d, the B-Z algorith can be used to perfor a series of one-diensional searches, achieving an optial rate of convergence up to a logarithic factor The algorith has also been used ore recently in [8] to deonstrate the relationship between active learning and stochastic convex optiization As we noted before, however, binary search travels on average the entire length of the interval under consideration in order to estiate the threshold In the case of Lake Erie, one interval corresponds to approxiately 58 k, aking the use of binary search undesirable A nuber of active learning algoriths exist aside fro binary search, any of which have been shown to be optial under various boundary and noise conditions [4], [7], [9], [] [4] The work of [7], [] [4] presents active linear classifiers that are increasingly adaptive to noise conditions and coputationally feasible In [4], the authors present an algorith that akes use of epirical risk iniization of convex losses Another approach relies on the use of recursive dyadic partitions RDPs [9], in which the d-diensional unit hypercube is recursively divided into d equal sized hypercubes, foring a tree that is then pruned to give rise to an optial estiator An active learning algorith based on this idea is shown to be nearly optial in [9], and this algorith is used in a lake sapling context in [0] While these algoriths provide optial saple rates, they both begin by sapling uniforly at rando throughout the entire feature space This would require traversing the entire lake before proceeding to locations where the boundary is likely to lie, rendering these algoriths infeasible A sall aount of literature exists in which the cost of sapling is generalized beyond the nuber of saples [0] [] The probles considered in [0], [] are siilar to the setting here, in that the distance traveled to obtain saples is taken into account In [0], the authors eploy a traveling salesan with profits odel to iniize the distance traveled while obtaining labels for points of axiu uncertainty The authors of [] iniize an objective function that takes into account both label uncertainty and label cost In both cases, the saples are taken batch-wise before being labeled by a huan expert As every individual saple is inforative, we wish to take advantage of all available data by choosing our saple locations on a saple-by-saple basis, rather than updating after each batch Further, neither algorith is accopanied by theoretical analysis In [], the authors investigate perforance bounds for active learning in the case where label costs are nonunifor The quantile search algoriths presented here extend this idea to the case where label costs are both nonunifor and dynaic IV QUANTILE SEARCH In this section, we extend the ideas of [4], [8] to account for a distance penalization in addition to saple coplexity, resulting in what we refer to as quantile search The basic idea behind this algorith is as follows We wish to find a tradeoff between the nuber of saples required and the total distance traveled to achieve a given estiation error for the change point of a step function on the unit interval As we know, binary bisection iniizes the nuber of required saples On the other hand, continuous spatial sapling iniizes the required distance to estiate the threshold Binary search bisects

4 4 Algorith Deterinistic Quantile Search DQS : input search paraeter, saple budget N : initialize X 0 0, Y 0, n, a 0, b 3: while n N do 4: if Y n then 5: X n X n + b a 6: else 7: X n X n b a 8: end if 9: Y n fx n 0: a ax {X i : Y i, i n} : b in {X i : Y i 0, i n} : ˆθn a+b 3: end while the feasible interval hypothesis space at each step In contrast, one can think of continuous sapling as dividing the feasible interval into infinitesial subintervals at each step With this in ind, a tradeoff becoes clear: one can divide the feasible interval into subintervals of size /, where ranges between and Intuition would tell us that increasing would increase the nuber of saples required but decrease the distance traveled in sapling In what follows, we show that this intuition is correct in both the noise-free and noisy cases, resulting in two novel search algoriths A Deterinistic Quantile Search We first describe and analyze quantile search in the noise-free case p 0, here referred to as deterinistic quantile search DQS and given as Algorith In the following subsections, we analyze the expected saple coplexity and distance traveled for the algorith and show how these results can be used to iniize the total sapling tie Further, the results show that the required nuber of saples increases onotonically with and the distance traveled decreases onotonically with, indicating that the desired tradeoff is achieved Rate of Convergence: Recalling fro Section II, our goal is to estiate the threshold θ of a one-diensional threshold classifier We analyze the expected error of our estiate ˆθ n after a fixed nuber of saples for the DQS algorith The ain result and a sketch of the proof are provided here An expanded proof can be found in Appendix A Theore Consider a deterinistic quantile search with paraeter and let ρ Begin with a unifor prior on θ The expected estiation error after n easureents is then E[ ˆθ n θ ] [ ρ + ρ ] n 3 4 Proof Sketch The proof proceeds fro the law of total expectation Let Z n ˆθ n θ The first easureent is taken at /, and hence the expected error can be calculated when θ / and θ > / [ E[Z ] E Z θ ] Pr θ [ + E Z θ > ] Pr θ > [ ρ + ρ ] 4 Siilarly, after the second easureent is taken, there are four intervals, two that partition the interval [0, /], and two that partition /, ] These intervals contribute four onoials of degree 4 to the error, one of which is ρ 4, one which is ρ 4, and two which are ρ ρ The basic idea is that each parent interval integrates to ρ i ρ j and in the next step gives birth to two child intervals, one evaluating to ρ i+ ρ j and the other ρ i ρ j+ The proof of the theore then follows by induction Consider the result of Theore when In this case, the error becoes E[ ˆθ n θ ] n+ Coparing to the worst case, we see that the average case saple coplexity is exactly one saple better than the worst case, atching the well-known theory of binary search In Section V we confir this result through siulation Distance Traveled: Next, we analyze the expected distance traveled by the DQS algorith in order to converge to the true θ The proof is siilar to that of the previous section, in that it follows by the law of total expectation After each saple, we analyze the expected distance given that the true θ lies in a given interval The result and a proof sketch are given below, with the full proof included in Appendix B

5 5 Theore Let D denote a rando variable representing the distance traveled during a deterinistic quantile search with paraeter Begin with a unifor prior on θ Then E[D] 4 Proof Sketch The DQS algorith for our class of functions chooses saples oving further into the interval until it akes a zero easureent It then turns back and takes saples in the opposite direction until a one is easured This behavior allows us to analyze the total distance by splitting it up into stages first the expected distance traveled before the algorith reaches a point x > θ, and then x < θ, etc Let D n be a rando variable denoting the distance required to ove to the right of θ for the n th tie when n is odd, and to the left of θ for the n th tie when n is even In this case, we have that E[D] E[D n ] n [ i First, we would like to find E[D ] Let A i denote the interval p p0, i p p0, where A 0 [ 0,, so that the A i s for a partition of the unit interval whose endpoints are possible values of the saple locations X j Now note that E[D ] E[D θ A i ] Prθ A i Then since we assue θ is distributed uniforly over the unit interval, Prθ A i i p i p i Next, note that Thus we have p0 E[D ] i0 E[D θ A i ] i0 p0 i p0 p E[D θ A i ] Prθ A i The proof proceeds by rewriting the above in ters of ρ / and then calculating E[D n ] This is done by dividing each A i into subintervals which for partitions of A i The result then follows by induction on E[D n ] 3 Sapling Tie: Given the results in Theores and, we can now choose the optial value of to iniize the expected sapling tie First, let N be a rando variable denoting the nuber of saples required to achieve an error ε and denote its expectation E[N] log4ε log + n, which is found by solving 3 for n Now let T be a rando variable denoting the total tie required to achieve an error ε Then the expected value is E[T ] γe[n] + ηe[d] γn + η n γn + η, 5 where γ denotes the tie required to take one saple, and η denotes the tie required to travel one unit of distance Finding the optial is difficult However, 5 can be used to solve for nuerically using standard techniques We show exaples of such values in Section V As a final note, one ay wonder about the relation to what is known as -ary search [3] In contrast to quantile search, -ary search is tree-based To ake the difference clear, consider the an exaple with θ 3/8 and let 4 In this case, both algoriths take their first saple at X /4 However, after easuring Y, quantile search takes its second easureent at X 7/6, while -ary search proceeds to X / One ay then expect that both algoriths would achieve the desired tradeoff, with -ary search requiring fewer saples than quantile search for a fixed value of We choose the latter for two reasons First, quantile search does not require to be an integer and therefore gives ore flexibility in the resulting tradeoff Second, quantile search as described is the natural generalization of PBS and lends itself to the analysis of [4], [8] in the case where the easureents are noisy A coparison to noisy -ary search is a topic for future work

6 6 Algorith Probabilistic Quantile Search PQS : input search paraeter, saple budget N, probability of error p : initialize π 0 x for all x [0, ], n 0 3: while n N do 4: choose X n+ such that X n+ π 0 n xdx 5: observe Y n+ fx n+ U n+, where U n+ Bernp 6: if Y n+ 0 then 7: 8: else 9: 0: end if : end while : estiate ˆθ n such that ˆθ n 0 π n x / p π n+ x p + p p π n+ x p + p + p π n x, x X n+ π n x, x > X n+ π n x, x X n+ π n x, x > X n+ + p B Probabilistic Quantile Search We now extend the idea behind Section IV-A to the case where easureents ay be noisy ie, p 0 In [8], the authors present the probabilistic binary search PBS algorith The basic idea behind this algorith is to perfor Bayesian updating in order to aintain a posterior distribution on θ given the easureents and locations Rather than bisecting the interval, at each step the algorith bisects the posterior distribution This process is then iterated until convergence and has been shown to achieve optial saple coplexity [5], [7] We now extend this idea to achieve a tradeoff between saple coplexity and distance traveled We refer to the noise-tolerant algorith as probabilistic quantile search PQS, which proceeds as follows Starting with a unifor prior, the first saple is taken at X / The posterior density π n x is then updated as described below, and ˆθ n is chosen as the edian of this distribution The algorith proceeds by taking saples X n such that Xn+ 0 π n xdx, ie, X n is the first -quantile of π n For, the above denotes the edian of the posterior distribution and reduces to PBS A foral description is given in Algorith The Bayesian update on π n can be derived as follows Begin with the first saple We have π 0 x for all x and wish to find π x Let f x X, Y be the conditional density of θ given X, Y Applying Bayes rule, the posterior becoes: f x X, Y PrX, Y θ xπ 0 x PrX, Y For illustration, consider the case where θ 0 We now take the first easureent at X / Then and Pr X /, Y 0 θ 0 p Pr X /, Y θ 0 p In fact, this holds for any θ < / Now exaine the denoinator: + p PrX /, Y 0 Notice that for this siplifies to /, which is the case in [8] We then update the posterior distribution to be p x / π x p + p + p x > /

7 7 The equivalent posterior density can be found for when Y The process of aking an observation and updating the prior is then repeated, yielding general forula for the posterior update When Y n+ 0, we have p + p π n x x / π n+ x p π n x x > / Siilarly, for Y n+, we have + p p π n+ x p + p π n x x / π n x x > / + p Rate of Convergence: Analysis of the above algorith has proven difficult since its inception in 974, with a first proof of a geoetric rate of convergence appearing only recently in [4] Instead, the authors of [4], [5], [8], [7] use a discretized version involving inor odifications In this case, the unit interval is divided into bins of size, such that N The posterior distribution is paraeterized, and a paraeter α is used instead of p in the Bayesian update, where 0 < p < α The analysis of rate of convergence then centers around the increasing probability that at least half of the ass of π n x lies in the correct bin A foral description of the algorith can be found in [5] Given a version of PQS discretized in the sae way, we arrive at the following result Theore 3 Under the assuptions given in Section II, the discretized PQS algorith satisfies sup Pr ˆθ n θ > [ p θ [0,] α + p p + α p ] n [ Taking p α + p + p α p ] n/ yields a bound on the expected error [ p sup E[ ˆθ n θ ] θ [0,] α + p p + α p ] n/ Finally, taking α p/ p + q iniizes the right hand side of these bounds, yielding sup E[ ˆθ n θ ] θ [0,] + n/ p p 6 Proof Sketch In the discretized algorith, a i j denotes the posterior ass at the ith bin after j easureents and aj denotes the vectorized collection of all bins after j easureents Let kθ be the index of the bin containing θ Note that a i 0 for all i We define M θ j a kθj, a kθ j and N θ j + M θj + M θ j a kθj a kθ j + a kθ j + a kθ j As a brief bit of intuition, note that M θ j is a easure of how uch ass is in the bin actually containing θ, while N θ j + is a easure of iproveent fro one iteration to the next and is strictly less than one when an iproveent is ade [4] Fro [4] and following a straightforward application of Markov s inequality and the law of total expectation, we have that Pr θ n θ > E[M θ n], 7 and { } n E[M θ n] M θ 0 ax ax E[N θj + aj] j 0,,n aj The reainder of the proof consists of bounding E[N θ j + aj] by considering the three cases: i kj kθ; ii kj > kθ; and iii kj < kθ In all cases, we show that E[N θ j + aj] q + p q + p β α, where q p and β α Using 7, we have Pr ˆθ n θ > [ q + p + q p β α ] n

8 8 Next, we bound the expected error E[ ˆθ n θ ] Pr ˆθ n θ > t dt + 0 and the result follows by iniizing with respect to and α [ q + p q + p β α The full proof can be found in Appendix C In the case where, the above result atches that of [4], [8] as desired One iportant fact to note is that in contrast to the deterinistic case, the result here is an upper bound on the nuber of saples required for convergence as opposed to an expected value Therefore this result cannot be used directly select the value of that iniizes the expected sapling tie Other siilar analyses [4], [8], [4] also find upper bounds, with the ain barrier to finding expected values being the use of bounds to handle the dependence aong saple locations We instead rely on Monte Carlo siulations to choose the optial value of Finally, the bound here is loose For clarity, consider the case where p 0 and Then the above becoes As noted in [5] and entioned in Section IV-A, sup E[ ˆθ n θ ] θ [0,] sup E[ ˆθ n θ ] θ [0,] n/ n+, indicating that the bound is quite loose, even for the previously considered PBS algorith However, in [5], the authors use this result when to show rate optiality of the PBS algorith This fact suggests that despite the discrepancy, the result of Theore 3 ay still be useful in proving optiality for the PQS algorith While the rate of convergence for PQS can be derived using standard techniques, the expected distance or a useful bound on the distance is ore difficult The technique used in Section IV-A becoes intractable as the values of X i are no longer deterinistic given θ We have considered the approach of exaining the posterior distribution after each step and calculating the possible locations, but at the nth easureent, there are n possible distributions Deterining the expected distance traveled by PQS reains a subject of future work ] n,

9 9 Algorith 3 Discretized Probabilistic Quantile Search : input search paraeter, saple budget N, > 0 such that N, α, β α : define the posterior after j easureent as π j : [0, ] R π j x a i j Ii x, where I [0, ] and I i i, i], for i {,, } for a partition of the unit interval 3: initialize a i 0 such that i a ij 4: while n N do 5: define kj {,, } such that kj i i i a i j, kj a i j > 6: copute τ j, τ j, P j, and P j as τ j a i j kj a i j ikj i τ j kj a i j a i j P j τ j τ j + τ j, i ikj+ P j P j 7: choose X j+ { kj, kj, with probability P j with probability P j P j and define k X j+ 8: observe Y j+ fx j+ U j+ where U j+ Bernα 9: if Y j+ 0 then 0: { : else : 3: end if 4: end while 5: estiate ˆθ n such that ˆθ n 0 π nx / a i j + a i j + { +τ /β α +τ /β α + / τβ α + / τβ α i k i > k i k i > k A Verification of Algoriths V SIMULATIONS In this section, we verify through siulation the theoretical rate of convergence and distance traveled derived in Section IV-A Further, we present siulated results for the PQS algorith and copare with the bound derived in Section IV-B We first siulate the the DQS algorith over a range of fro to 0, where θ ranges over a 000 point grid on the unit interval The resulting average error after 0 saples is shown in Fig 3a, while the average distance before convergence to an error of ε 0 4 is shown in Fig 3b, and the average nuber of saples to achieve the sae error is shown in Fig 3c The figures show the theoretical values for expected error and distance atch the siulated values Fig 3c shows the nuber of saples required to achieve a given error is onotonically increasing in, while Fig 3b shows the distance traveled is onotonically decreasing This indicates that DQS achieves a tradeoff in the noise-free case Fig 3d shows a plot of the expected sapling tie as a function of, with the optial value of indicated in the case where γ 60 s and η 4 /s

10 theoretical siulated 0 theoretical siulated ˆθ0 θ average distance a b average saples expected sapling tie s c Fig 3: Siulated and theoretical vales for a expected error after 0 saples and b distance traveled for DQS algorith c Optial value of for γ 60 s and η 4 /s d Next, we siulate the PQS algorith over a range of fro to 0, where θ ranges over a 00 point grid on the unit interval with 500 Monte Carlo trials run for each value of θ Fig 4a shows the average nuber of saples required to converge to an error of 0 4 As in the deterinistic case, the required nuber of saples increases onotonically with Fig 4b shows the average distance traveled before converging to the sae error value Again, the distance decreases onotonically with, indicating that the algorith achieves the desired tradeoff in the noisy case B Application to Spatial Sapling In this section, we apply the DQS and PQS algoriths to the proble of sapling hypoxic regions in Lake Erie Fig 5a shows the lake with an exaple hypoxic zone pictured in gray In [5], the authors show that for the set of distributions such that the Bayes decision set is a boundary fragent, a variation on PBS can be used to estiate the boundary while achieving optial rates up to a logarithic factor The authors of [5] inforally describe the boundary fragent class on [0, ] d as the collection of sets in which the Bayes decision boundary is a Hölder sooth function of the first d coordinates In [0, ], this iplies that the boundary is crossed at ost one tie when traveling on a path along the second coordinate For a foral definition, see [5], Defn and the surrounding text Given one location x a, b inside the hypoxic zone, the proble can be transfored into two probles, each with a function belonging to the boundary fragent class Using odels and easureents fro previous years, this is a reasonable assuption Splitting the lake along the line y b yields the two sets shown in Fig 5b Now we can further divide the proble into strips along the first diension, as shown in Fig 5c Along each of these strips, the proble reduces to change

11 80 70 average saples average distance a Fig 4: Siulated average a saples required for convergence and b distance traveled for PQS algorith with probability of error p 0 Dashed lines indicate 95% of points above/below line Sapling Tie s Speed /s Total Tie hrs TABLE I: Coparison of deterinistic quantile search with binary search for a variety of sapling ties and speeds b point estiation of a one-diensional threshold classifier as we have studied thus far After estiating the change point at each strip, the boundary is estiated as a piecewise linear function of the estiates, as shown in Fig 5d We apply this procedure to the hypoxic region shown in Fig 5a using 6 strips for a variety of values for tie per saple and speed of watercraft These values are used with Theores and in order to select the optial value of The total tie required for both DQS and binary search can be seen in Table I The results are as expected A higher sapling tie biases the algorith toward lower values of in order to iniize the nuber of saples required, while a lower speed biases the algorith toward higher values of in order to iniize the distance traveled We see that in the case of a low sapling tie and low speed, the total tie saved copared to binary search is on the order of 60 hours This brings the sapling tie fro roughly 5 days down to -3, decreasing the possibility that the spatial extent of the hypoxic region will change significantly before the easureent is coplete Siilar results hold for PQS, as shown in Table II, where we show the sapling ties with a probability of easureent error p 0 averaged over 00 Monte Carlo siulations Note that in this case, the chosen value of ay not be optial VI CONCLUSIONS & FUTURE WORK The algoriths described here deonstrate the desired tradeoff between saples required and distance traveled for onediensional threshold classifier estiation We have shown how this idea can be used to estiate a two-diensional region of hypoxia under certain soothness assuptions on the boundary, and epirical results indicate the benefits of quantile search over traditional binary search Several open questions reain Deriving or bounding the expected distance for the PQS algorith is an iportant next step Further, while the choice of shown here is optial for the DQS algorith, the question reains whether this algorith is optial in soe general sense Beyond this, several interesting generalizations exist The boundary fragent class entioned here is restrictive [4], and the extension to ore general cases would be of interest The recent work of [6] describes a graphbased algorith that eploys PBS to higher-diensional nonparaetric estiation Extending this idea to penalize distance traveled is a proising avenue for practical applications of quantile search Finally, the PQS algorith requires knowledge of the noise paraeter p in order to update the posterior The algoriths presented in [4], [8] enjoy the property that they

a b c d Fig 5: Exaple splitting of hypoxic zone in Lake Erie to achieve two boundary fragent sets a Lake erie with hypoxic region illustrated in gray b

12 a b c d Fig 5: Exaple splitting of hypoxic zone in Lake Erie to achieve two boundary fragent sets a Lake erie with hypoxic region illustrated in gray b Exaple splitting of hypoxic zone to achieve two boundary fragent sets c Division of boundary fragent into strips d Piecewise linear estiation of boundary

13 3 Sapling Tie s Speed /s Total Tie hrs TABLE II: Coparison of probabilistic quantile search with binary search for a variety of sapling ties and speeds with probability of error p 0 are adaptive to unknown noise levels The developent of a noise-adaptive probabilistic search would certainly be of great interest, with potential applications in areas such as stochastic optiization [8] beyond direct applicability to this proble APPENDIX A PROOF OF THEOREM RATE OF CONVERGENCE FOR DQS The proof follows by induction After n saples, the unit interval has been split into n subintervals, one for each possible sequence of n easureents In order to find the expected error, we break the interval down by subinterval, find the conditional expected error given that θ is in each subinterval, and cobine all subintervals using the law of total expectation We note that in binary search, these subintervals are all the sae length, but in quantile search each has a different length Let Z n denote the error after n saples, eg, Z n ˆθ n θ We establish the base case by direct coputation The first saple is ade at / ρ into the interval The expected error after one saple is then E[Z ] E [ Z θ / ] Pr θ + E [ Z θ > 0 + / x 0 dx + ρ ρ x 0 dx + ρ x ρ dx [ ρ + ρ ] 4 ] Pr / θ > / + x dx After the second easureent, each interval [0, ρ] and [ ρ, ] is split into two subintervals, and the error can be calculated again using the law of total expectation We generalize this idea using the following lea Lea Suppose at saple n we know θ lies in a subinterval [a, b] [0, ] with noralized conditional error E[Z n θ [a, b]] Prθ [a, b] 4 ρi ρ j At saple n +, we split [a, b] into two subintervals according to quantile search Then the noralized conditional error of one subinterval will be 4 ρi ρ j+ and of the other will be 4 ρi+ ρ j Proof Note that quantile search either splits the interval at the point a + b a or b b a, depending on the value of Y n We show the first case the second is syetric At saple n +, we have E[Z n+ θ [a, b]] E [Z n+ θ [a, a + ] b a] Pr θ [a, a + b a] + E [Z n+ θ [a + ] b a, b] Pr θ [a + b a, b] a+ b a a a + a + b a b a 4 + b a 4 4 b a ρ + 4 b a ρ b x dx + a+ b a a + b a + b x dx We now show that b a ρ i ρ j using induction on n Consider the base case of n and note that / ρ Then the possible intervals are [0, /] and [/, ], and we have { b a ρ, [a, b] [0, ] ρ, [a, b] [, ]

14 4 Noting that E[Z ] 4 [ ρ + ρ ] proves the base case Now suppose that b a ρ i ρ j for soe n such that E[Z n θ [a, b]] Prθ [a, b] 4 ρi ρ j Splitting the interval at a + b a yields two subintervals, [c, d] and [e, f] with d c a + b a a b a ρ ρ i ρ j and Therefore we see that f e b a b a b a ρ ρ i ρ j E[Z n+ θ [a, b]] 4 ρi ρ j+ + 4 ρi+ ρ j and the proof is coplete We now use this lea to prove the induction step Suppose at saple n we have E[Z n ] n n ρ n k ρ k 4 k k0 With the next saple, the intervals will split into two We then apply the lea to see that the resulting conditional expectation will be E[Z n+ ] n n ρ n k+ ρ k + ρ n k ρ k+ 4 k k0 n + ρ n+ + n n n + ρ n+ k ρ k + n ρ n k k 4 n k Using the fact that n n n+ and we have n k Recalling the binoial forula copletes the proof + n k n! n k + k! n k +! + n! k k! n k +! E[Z n+ ] n+ n + ρ n+ k ρ k 4 k k0 x + y n n k0 n x n k y k k n +, k APPENDIX B PROOF OF THEOREM DISTANCE TRAVELED FOR DQS Following the proof sketch given in Section IV-A, we first rewrite the interval A i by siplifying the geoetric sus Let ρ / Then [ ρ i A i ρ, ρ i+ ρ [ ρ i, ρ i+, where we have used the fact that / ρ It is tedious but straightforward to define the subintervals that partition A i, and the following result is obtained A ii ρ i+ ρi ρ i+, ρ i+ ] ρi ρ i After further inspection, one can obtain a general equation for odd values of n and a siilar equation for even values [ A ii n ρ i+ ρi ρ i+ + + ρi+ +in n ρ in, ρ i + ρi ρ i+ + + ρi+ +in n ρ in+

15 5 The DQS algorith for our class of functions chooses saples oving further into the interval until it akes a zero easureent It then turns back and takes saples in the opposite direction until a one is easured This behavior allows us to analyze the total distance by splitting it up into stages first the expected distance traveled before the algorith reaches a point x > θ, and then x < θ, etc Let D n be a rando variable denoting the distance required to ove to the right of θ for the n th tie when n is odd, and to the left of θ for the n th tie when n is even Then by linearity of expectation, the total expected distance is E[D] E[D n ] 8 n We now calculate E[D n ] Using the above definition, we can easily calculate E[D n θ A ii n ] by taking the absolute value of the difference between the lower edge of A ii n and the upper edge of A ii n for odd values of n and the converse for even values This yields E[D n θ A ii n ] ρi+ +in n ρ in+ The probability Pr θ A ii n is easily calculated as the length of the interval Note that the ajority of ters cancel, so the result becoes Pr θ A ii n ρi+ +in n ρ in ρ in+ ρi+ +in n We now prove the expected distance using induction on E[D n ] The base case has been shown to be E[D ] Now assue for arbitrary n that Using the above definitions, we see that E[D n ] i 0 i 0 n E[D n ] n E[D n θ A ii n ] Pr θ A ii n i n0 i n0 i 0 n ρi+ +in ρ in+ n ρi+ +in i n 0 n 3 i 0 n 3 ρ n 3 i 0 ρ i+ +in i n 0 i 0 i n 0 E[D n ] n, i n0 ρ i+ +in i n 0 ρ i+ +in ρ in ρ in+ ρ i+ +in i n 0 where the third and sixth lines follow fro the fact that i0 ρi ρ i+ ρ definition of E[D n ] This shows the inductive hypothesis and ay obtain E[D] by 8 ρ in ρ in + APPENDIX C PROOF OF THEOREM 3 RATE OF CONVERGENCE FOR DISCRETIZED PQS and the final line follows fro the The proof follows that of [4], [8] Recall fro Algorith 3 that I i denotes the ith discretized bin of the interval and a i j denotes the posterior ass at the ith bin after j easureents First define kθ to be the index of the bin I i containing θ, so that θ I kθ Next define the following ters M θ j a kθj, a kθ j

16 6 and N θ j + M θj + M θ j a kθj a kθ j + a kθ j + a kθ j As a brief bit of intuition, note that M θ j is a easure of how uch ass is in the bin actually containing θ after the jth easureent, while N θ j + is a easure of iproveent fro one iteration to the next and is strictly less than one when an iproveent correct easureent is ade [4] Fro [4] and following a straightforward application of Markov s inequality and the law of total expectation, we have that Pr θ n θ > E[M θ n], 9 and { } n E[M θ n] M θ 0 ax ax E[N θj + aj] 0 j 0,,n aj The reainder of the proof is to bound E[N θ j + aj] To do this, we consider three cases: i kj kθ; ii kj > kθ; and iii kj < kθ First, consider the case i, where kj kθ, and let X j+ kj so that the correct easureent is Y j+ In this case, k in the algorith is kθ, and hence i k + for updating Also, assue the case where we easure correctly, so that with probability q p where β α Then we have where N θ j + a kθ j + a kθ j + / τβ α, + / τβ α a kθj + / τβ α a kθj a kθ j + / τβ α a kθj + a kθ j a kθ j a kθ j a kθj + β α / τ a kθ j a kθ j a kθj β α + a kθ j β αx +, / τ akθ j a kθ j x τ j + / a kθ j a kθ j and τ j is defined in Algorith 3 Siilarly, when X j+ kj, a kθ j +τ /β α a kθj N θ j + +τ /β α a kθj a kθ j where + τ /β α a kθj + a kθ j a kθ j a kθ j a kθj + β α τ / a kθ j a kθ j a kθj β α + a kθ j β αx +, τ / akθ j a kθ j x τ j / a kθ j a kθ j and τ j is defined in Algorith 3 Equivalent steps can be followed for cases ii and iii, yielding respectively τ j / a kθ j a x kθ j if X j+ kj, if X j+ kj τ j / a kθ j a kθ j

17 7 and x τ j+ / a kθ j a kθ j if X j+ kj τ j+ / a kθ j a kθ j if X j+ kj The algebra is analogous for the case where we easure incorrectly, so that { +β αx, with probability q N θ j + β αx, with probability p, where x is as defined above Before bounding E[N θ j + aj], we prove three useful leas Lea For 0 a and 0 τ, τ + / a a τ a a Proof Rewrite the left hand side in two ways as follows τ + / a a τ a a τ a + a τ a a τ a τ a + a Now consider two cases, a τ and a > τ a Case a τ:: In equation, the left ter of the product is positive but the right ter is nonpositive, aking our whole ter less than or equal to zero b Case a > τ:: Note that both ters in the product of equation are less than or equal to one because 0 τ First consider the situation where τ a Then τ a + a τ Next consider the situation when τ a < τ This iplies that τ a < τ τ This iplies the left hand ter of the product in equation is a τ a < a and therefore that τ τ τ a +, aking the product again less than /, which copletes the proof Lea 3 For all 0 < a < and τ, τ / a For all 0 < a < and τ, a τ + / a a τ + τ + Proof We prove the first stateent given in the lea The proof of the second stateent is nearly identical Note that the stateent fro the lea is equivalent to τ a τ + a τ + τa Equivalently, we rearrange further to see that the original stateent holds if a τa a, a

18 8 and hence if Note since a < copleting the proof of the first stateent a + τ a + τ + τ +, We are now ready to bound E[N θ j + aj] Fro the definitions given in Algorith 3, it is straightforward to see that 0 τ j, 0 < τ j, and τ j + τ j a kj j a kθ j Also define gx q p + β αx + β αx q p β αx, q + p + so that E[N θ j + aj] P jgx + P jgx, where x is defined above for the various cases and P j and P j are defined in Algorith 3 Consider case i, kj kθ Then a rearrangeent of ters shows τ j + / a kθ j τ j / a kθ j E[N θ j + aj] P jg + P jg a kθ j a kθ j q + p q + p β α [ ] τ j + / a kθ j τ j / a kθ j P j + P j a kθ j a kθ j q + p q + p β α [ ] τ j + / a kθ j τ j / + a kθ j P j + P j a kθ j a kθ j q + p q + p τ j + / a kθ j β α P j P j a kθ j Note that Then by Lea, we have that P j P j τ j τ j τ j + τ j a kjj τ j a kj j E[N θ j + aj] q + p + q p β α Using Lea 3 for τ /, we see that for case ii kj < kθ E[N θ j + aj] P jg τ j + + P jg τ j + q + p q + p [ ] β α P τ j + P τ j + P + P q + p q + p [ β α P τ j + P τ j + ] q + p q + p β α

19 9 Siilarly applying Lea 3 for τ / in case iii kj > kθ we have E[N θ j + aj] P jg τ j + + P jg τ j + q + p q + p [ ] β α P τ j P τ j + P + P q + p q + p [ β α P τ j P τ j + ] q + p q + p β α For, these reduce to the bound given in [4] We now coplete the proof by using the above result into 7 and 0 Plugging in to 7, we see that Pr ˆθ n θ > [ q + p q + p ] n β α Next, we bound the expected error E ˆθ n θ 0 + Pr ˆθ n θ > t dt Pr ˆθ n θ > t dt Pr ˆθ n θ > [ q + p + Pr q p and the result follows fro the values for, α, and β indicated in the theore ˆθ n θ > t dt β α ] n, REFERENCES [] S Dasgupta, Analysis of a greedy active learning strategy, in Proc Advances in Neural Inforation Processing Systes, 005 [] D Beletsky, D Schwab, and M McCorick, Modeling the suer circulation and theral structure in Lake Michigan, Journal of Geophys Res, vol, 006 [3] G L E R Laboratory, Lake Erie hypoxia warning syste, [4] R Castro and R Nowak, Active learning and sapling, in Foundations and Applications of Sensor Manageent, st ed New York, NY: Springer, 008, ch 8 [5], Miniax bounds for active learning, IEEE Trans Inf Theory, vol 54, pp , May 008 [6] S Dasgupta, Coarse saple coplexity bounds for active learning, in Proc Advances in Neural Inforation Processing Systes, 005 [7] S Hanneke, A bound on the label coplexity of agnostic active learning, in Proc ICML, 007 [8] M V Burnashev and K S Zigangirov, An interval estiation proble for controlled observations, Probles in Inforation Transission, vol 0:3-3, 974, translated fro Probley Peredachi Inforatsii, 03:5-6, July-Septeber, 974 [9] R Castro, R Willett, and R Nowak, Faster rates in regression via active learning, in Proc Advances in Neural Inforation Processing Systes, 005 [0] A Singh, R Nowak, and P Raanathan, Active learning for adaptive obile sensing networks, in Proc Inforation Processing in Sensor Networks, 006 [] M-F Balcan, A Beygelzier, and J Langford, Agnostic active learning, in Proc International Conference on Machine Learning, 006 [] S Dasgupta, D Hsu, and C Monteleoni, A general agnostic active learning algorith, in Proc Advances in Neural Inforation Processing Systes, 007 [3] M-F Balcan, A Broder, and T Zhang, Margin based active learning, Learning Theory, p 350, 007 [4] Y Wang and A Singh, Noise-adaptive argin-based active learning for ulti-diensional data, arxiv Preprint, vol v, 04 [5] M Horstein, Sequential decoding using noiseless feedback, IEEE Trans Inf Theory, vol 9, 963 [6] R M Karp and R Kleinberg, Noisy binary search and its applications, in Proc ACM-SIAM Syposiu on Discrete Algoriths, 007 [7] M B Or and A Hassidi, The bayesian learner is optial for noisy binary search and pretty good for quantu as well, in Proc IEEE Syposiu of Foundations of Coputer Science, 008 [8] Y R Wang and A Singh, Algorithic connections between active learning and stochastic convex optiization, in Proc Algorithic Learning Theory, 03 [9] C Scott and R Nowak, Miniax-optial classification with dyadic decision trees, IEEE Trans Inf Theory, vol 5, pp , Apr 006 [0] A Liu, G Jun, and J Ghosh, Spatially cost-sensitive active learning, in Proc SIAM Conf on Data Mining, 009 [] B Deir, L Minello, and L Bruzzone, A cost-sensitive active learning technique for the definition of effective training sets for supervised classifiers, in Proc SIAM Conf on Data Mining, 009 [] A Guillory and J Blies, Average-case active learning with costs, in Proc Algorithic Learning Theory, 009 [3] B Schlegel, R Geulla, and W Lehner, k-ary search on odern processors, in Proc Fifth Int Workshop on Data Manageent on New Hardware, 009 [4] R Waeber, P I Frazier, and S G Henderson, Bisection search with noisy responses, SIAM Journal on Control and Optiization, vol 5, pp 6 79, 03 [5] J Lipor and L Balzano, Quantile search: A distance-penalized active learning algorith for spatial sapling, Technical report, 05 [6] G Dasarathy and R Nowak, S : An efficient graph-based active learning algorith with application to nonparaetric learning, in Proc Conf on Learning Theory, 05

Quantile Search: A Distance-Penalized Active Learning Algorithm for Spatial Sampling

Quantile Search: A Distance-Penalized Active Learning Algorith for Spatial Sapling John Lipor 1, Laura Balzano 1, Branko Kerkez 2, and Don Scavia 3 1 Departent of Electrical and Coputer Engineering, 2