arxiv: v1 [math.st] 24 Apr 2015

Size: px

Start display at page:

Download "arxiv: v1 [math.st] 24 Apr 2015"

Chad Haynes
5 years ago
Views:

1 Multiple Testing of Local Extrema for Detection of Change Points arxiv:.v [math.st] Apr Dan Cheng North Carolina State University April, Astract Armin Schwartzman North Carolina State University A new approach to change point detection ased on smoothing and multiple testing is presented for long data sequences modeled as piecewise constant functions plus stationary ergodic Gaussian noise. As an application of the STEM algorithm for peak detection developed in Schwartzman et al. () and Cheng & Schwartzman (), the method detects change points as significant local maxima and minima after smoothing and differentiating the oserved sequence. The algorithm, comined with the Benjamini-Hocherg procedure for thresholding p- values, provides asymptotic strong control of the False Discovery Rate (FDR) and power consistency, as the length of the sequence and the size of the jumps get large. Simulations show that FDR levels are maintained in non-asymptotic conditions and guide the choice of smoothing andwidth with respect to the desired location tolerance. The methods are illustrated in genomic array-cgh data. Key Words: False discovery rate; Gaussian; Kernel smoothing; Local maxima; Local minima; Location tolerance. Introduction Detecting change points in the mean of an oserved signal is a common statistical prolem with applications in many research areas such as climatology (Reeves et al., ), oceanography(killick et al., ), finance (Zeileis et al., ) and medical imaging (Nam et al., ). It often appears in the analysis of time series ut it has more recently een found in the analysis of genomic sequences, see Erdman & Emerson (); Lai et al. (); Muggeo & Adelfio (); Olshen et al. (); Tishirani & Wang (); Wang et al. () and the references therein. Given the large amounts of data present in modern applications, it is of interest to design a change point detection method that Partially supported y NIH grant R-CA.

2 can operate over long sequences where the numer and location of change points are unknown, and in such a way that the overall detection error rate is controlled. Many different approaches have een proposed to find and estimate change points, such as kernelased methods (Gijels, ), Bayesian methods (Barry & Hartigan, ; Erdman & Emerson, ), segmentation techniques (Olshen et al., ; Wang et al., ; Muggeo & Adelfio, ), nonparametric tests (Lanzante, ) andl -penalty methods (Eilers & de Menezes, ; Huang et al., ; Tishirani & Wang, ), including the PELT method (Jackson et al., ; Killick et al., ). However, the approach proposed here is unique in the following two ways. First, the noise is assumed to e a stationary Gaussian process, allowing the error terms to e correlated. This is an important departure from the standard assumption of white noise in most of the change-point literature. In fact, applied statisticians desiring to use change-point methods have sometimes aandoned this option in favor of other techniques simply ecause the white noise assumption does not hold (Hung et al., ). This paper shows that change-point methods can e devised for correlated noise, expanding the domain of their applicaility. Second, we use the theory of Gaussian processes to compute p-values for all candidate change points, so that significant change points can e selected at a desired significance level. For concreteness, we adopt the Benjamini-Hocherg multiple testing procedure, enaling control of the false discovery rate (FDR) of detected change points when the data sequence is long and the numer and location of change points are unknown. To our knowledge, this is the first article proposing a multiple testing method for controlling FDR of detected change points. In this paper, we consider a signal-plus-noise model where the true signal is a piecewise constant function and the change points are defined as the points of discontinuity. Inspired y the method for detecting peaks in Schwartzman et al. () and Cheng & Schwartzman (), we modify the STEM algorithm therein to detect change points. The central idea is the oservation that the true signal has zero derivative everywhere except at the change points, where the derivative is infinite. Thus, in the presence of noise and under temporal or spatial sampling, change points can e seen as positive or negative peaks in the derivative of the smoothed signal. Note that ecause of the time sampling, derivatives cannot e oserved directly and can only e estimated. The focus on the derivative of the smoothed signal effectively transforms the change point detection prolem into a peak detection prolem. As in the STEM algorithm, the resulting peak detection prolem is then solved y identifying local maxima and minima of the derivative as candidate peaks and applying a multiple testing procedure to the list of candidates. The modified STEM algorithm for change point detection consists of the following steps:. Differential kernel smoothing: to transform change points to local maxima or minima.. Candidate peaks: find local maxima and minima of the differentiated smoothed process.. P-values: computed at each local maximum and minimum under the the null hypothesis of no signal in a local neighorhood.

3 Figure : Following the notation in. and Example., the left panel is the oserved signal-plus-noise model y(t) containing ten true change points with varying a i and noise z(t) given y (.) withσ = andν =. The right panel illustrates the STEM algorithm. The lue curve isy γ(t), otained with a Gaussian smoothing kernel with standard deviation γ =. Local maxima (green solid dots) and local minima (red solid dots) are declared as significant (marked with solid triangles) at FDR level α =. if their heights are eyond the dotted line thresholds. The cyan and pink ars indicate the location tolerance intervals (v i,v i + ) with = for increasing and decreasing change points respectively. At this tolerance, there are nine true discoveries and one false discovery.. Multiple testing: apply a multiple testing procedure to the set of local maxima and minima; declare as detected change points those local maxima and minima whose p-values are significant. The algorithm is illustrated y a toy example in Figure. The modified STEM algorithm aove differs from the ones in Schwartzman et al. () and Cheng & Schwartzman () in that peaks are sought in the derivative of the smoothed signal rather than the smoothed signal itself, and that oth positive and negative peaks are considered. In addition, an important consideration for the proper definition of error in change point detection is that, as opposed to the peak detection prolems considered in Schwartzman et al. () and Cheng & Schwartzman () where signal peaks had compact support, a true single change point at t = v has Leesgue measure zero. Thus in the presence of noise, it can never e detected exactly at t = v. Therefore we introduce a location tolerancethat defines the precision within which a change point should e detected. Specifically, given, a detected change point is regarded as a true discovery if it falls in the interval (v, v + ). Conversely, if a significant change point is found more than a distance from any true change point, it is considered a false discovery. The quantity is not used in the STEM algorithm itself ut is needed for proper error definition. Under this convention, it is shown here that the modified STEM algorithm exhiits asymptotic FDR control and power consistency as the length of the sequence and the size of the jumps at the change points increase. These asymptotic conditions are similar to those considered in Schwartzman et al.

4 () and Cheng & Schwartzman () and, in fact, the proofs are easily extended from those in Cheng & Schwartzman (). Simulations for varying levels of smoothing andwidth γ, location tolerance and jump size a are used to study the ehavior of the algorithm under non-asymptotic conditions. The simulation results help guide the choice of smoothing andwidth with respect to the desired location tolerance. In general, power increases with andwidth to a limit dictated y the distance etween the change points, so admitting a higher tolerance generally allows a higher andwidth and higher power. The methods are illustrated in a genomic sequence of array-cgh data in a reast-cancer tissue sample (Loo et al., ; Hsu et al., ). The goal of the analysis is to find genomic segments with copy-numer alterations. These are found y detecting change points in the copy numer genomic sequence. The multiple testing scheme. The model We consider a continuous time model, although the algorithm is designed for data discretely sampled in time. Consider the signal-plus-noise model y(t) = µ(t)+z(t), t R, (.) where the signal µ(t) is a piecewise constant function of the form µ(t) = a j h j (t), j= a j R\{}, with h j (t) = ½(t > v j ) for v j R. We are interested in finding the change points v j. For the asymptotic analysis, we assume a = inf j a j > and v = inf j v j v j >, so that the change points do not ecome aritrarily small in size nor aritrarily close to each other. Let w γ (t) = w(t/γ)/γ, where γ > is the andwidth parameter and w(t) is a unimodal symmetric kernel with compact connected support [ c, c] and unit action. Convolving the process (.) with the kernel w γ (t) results in the smoothed random process y γ (t) = w γ (t) y(t) = w γ (t s)y(s)ds = µ γ (t)+z γ (t), (.) where the smoothed signal and smoothed noise are defined respectively as µ γ (t) = w γ (t) µ(t) = R a j h j,γ (t) and z γ (t) = w γ (t) z(t), (.) j=

5 and where the smoothed change point takes the form h j,γ (t) = w γ (t) h j (t). (.) The smoothed noise z γ (t) defined y (.) is assumed to e a zero-mean four-times differentiale stationary ergodic Gaussian process.. Change point detection as peak detection of the derivative Consider now the derivative of the smoothed oserved process (.) y γ (t) = w γ R (t) y(t) = w γ (t s)y(s)ds = µ γ (t)+z γ (t), (.) N where the derivatives of the smoothed signal and smoothed noise are respectively µ γ (t) = w γ (t) µ(t) = a j h j,γ (t) and z γ (t) = w γ (t) z(t). A key oservation from (.) is that h j,γ(t) = w γ(t s)h j (s)ds = w γ(s)h j (t s)ds R R = w γ (s)½(t s > v j)ds = w γ (t v j ). R Thus (.) represents a signal-plus-noise model where the smoothed signal j= (.) µ γ (t) = a j h j,γ (t) = a j w γ (t v j ) (.) j= is a sequence of unimodal peaks with the same shape as that of w γ and located at locations v j. The prolem of finding change points in y γ (t) is thus reduced to finding (positive or negative) peaks in y γ(t). For simplicity, we assume that the compact supports S j,γ of the smoothed peak shape h j,γ (t) = j= w γ (t v j ) do not overlap, although this is not crucial in practice.. The STEM algorithm for change point detection Suppose we oserve y(t) with J jumps defined y (.) in the line of length L centered at the origin, denoted y U(L) = ( L/, L/). The following is a version of the STEM algorithm of Schwartzman et al. () and Cheng & Schwartzman () for detecting change points. Algorithm. (STEM algorithm for change point detection). Differential kernel smoothing: Otain the process (.) y convolution of y(t) with the kernel derivative w γ (t).

6 . Candidate peaks: Find the set of local maxima and minima of y γ(t) in U(L), denoted y T γ = T + γ T γ, where { } T γ + = t U(L) : y γ(t) =, y γ () (t) <, { } T γ = t U(L) : y γ(t) =, y γ () (t) >.. P-values: For each t T γ +, compute the p-value p γ(t) for testing the (conditional) hypotheses H (t) : {µ (s) = for all s (t,t+)} vs. H A (t) : {µ (s) > for some s (t,t+)}; and for each t T γ, compute the p-value p γ(t) for testing the (conditional) hypotheses H (t) : {µ (s) = for all s (t,t+)} vs. H A (t) : {µ (s) < for some s (t,t+)}, where > is an appropriate location tolerance.. Multiple testing: Let m γ = #{t T γ } e the numer of tested hypotheses. Apply a multiple testing procedure on the set of m γ p-values {p γ (t), t T γ }, and declare significant all local extrema whose p-values are smaller than the significance threshold.. P-values Given the oserved heights y γ (t) at the local maxima or minimat T γ = T γ + T γ, p-values in step () of Algorithm. are computed as F γ (y γ(t)), t T γ +, p γ (t) = F γ ( y γ (t)), t T (.) γ, wheref γ (u) denotes the right tail proaility ofz γ (t) at the local maximumt T γ + the null model µ (s) =, s (t,t+), that is,, evaluated under F γ (u) = P ( z γ(t) > u t is a local maximum of z γ (t) ). (.) The second line in (.) is otained y noting that, y (.), P ( z γ(t) < u t is a local minimum of z γ (t) ) = P ( z γ (t) > u tis a local maximum of z γ (t)) = F γ ( u), since z γ (t) and z γ (t) have the same distriution.

7 The distriution (.) has a closed-form expression, which can e otained as in Schwartzman et al. () or Cramér & Leadetter (). More specifically, the distriution (.) is given y ( ) λ,γ πλ ( ) ( ),γ u λ,γ F γ (u) = Φ u + λ,γ σ γ φ σ γ Φ u σ γ, (.) where σ γ = Var(z γ(t)), λ,γ = Var(z γ(t)), λ,γ = Var(z γ () (t)), = σ γ λ,γ λ,γ, and φ(x), Φ(x) are the standard normal density and cumulative distriution function, respectively. The quantities σ γ, λ,γ and λ,γ depend on the kernel w γ (t) and the autocorrelation function of the original noise process z(t). Explicit expressions may e otained, for instance, for the Gaussian autocorrelation model in Example. elow, which we use later in the simulations.. Error definitions Assuming the model of., define the signal region S = J j= (v j,v j + ) and null region S = U(L)\S. For u >, let T γ (u) = T γ + (u) T γ (u), where { } T γ + (u) = t U(L) : y γ(t) > u, y γ(t) =, y γ () (t) <, } T γ {t (u) = U(L) : y γ (t) < u, y γ (t) =, y() γ (t) >, indicating that T γ + (u) and T γ (u) are respectively the set of local maxima of y γ (t) aove u and the set of local minima of y γ (t) elow u. The numer of totally and falsely detected change points at threshold u are defined respectively as R γ (u) = #{t T + γ (u)}+#{t T γ (u)}, V γ (u;) = #{t T + γ (u) S }+#{t T γ (u) S }. (.) Both are defined as zero if T γ (u) is empty. proportion of falsely detected jumps FDR γ (u;) = E The FDR at threshold u is defined as the expected { } Vγ (u;). (.) R γ (u) Note that whenγ and u are fixed,v γ (u;) and hence FDR γ (u;) are decreasing in. Following the notation in Cheng & Schwartzman (), define the smoothed signal region S,γ to e the support of µ γ(t) and smoothed null region S,γ = U(L) \ S,γ. We call the difference etween the expanded signal support due to smoothing and the true signal support the transition region T γ = S,γ \S = S \S,γ.. Power Denote y I + and I the collections of indices j corresponding to increasing and decreasing change points v j, respectively. We define the power of Algorithm. as the expected fraction of true discov-

8 ered change points Power γ (u;) = J [ ( J Power j,γ (u;) = E J j= + j I ½ j I + ½ ( T+ γ (u) (v j,v j +) ) ) )] ( T γ (u) (v j,v j +), where Power j,γ (u;) is the proaility of detecting jump v j within a distance, ) P( T+ γ (u) (v j,v j +), if j I +, Power j,γ (u;) = ) P( T γ (u) (v j,v j +), if j I. (.) (.) The indicator function in (.) ensures that only one significant local extremum is counted within a distance of a change point, so power is not inflated. Note that whenγ anduare fixed,power γ (u;) and Power j,γ (u;) are increasing in. Asymptotic error control and power consistency Suppose the BH procedure is applied in step of Algorithm. as follows. For a fixed α (, ), let k e the largest index for which theith smallest p-value is less than iα/ m γ. Then the null hypothesis H (t) at t T γ is rejected if p γ (t) < kα m γ y γ(t) > ũ BH = F y γ(t) < ũ BH = F γ ( ) kα m γ γ ( ) kα m γ if t T + γ, if t T γ, wherekα/ m γ is defined as if m γ =. Sinceũ BH is random, we define FDR in such BH procedure as { } Vγ (ũ BH ;) FDR BH,γ () = E, R γ (ũ BH ) where R γ ( ) and V γ ( ;) are defined in (.) and the expectation is taken over all possile realizations of the random threshold ũ BH. We will make use of the following conditions: (C) The assumptions of. hold. (C) L and a = inf j a j, such that (logl)/a, J/L = A + O(a + L / ) with A >. Let E[ m,γ (U())] and E[ m,γ (U(),u)] e the expected numer of local maxima and local maxima aove level u of z γ(t) on unit interval U() = ( /,/), respectively. In particular, we have the following explicit formula (Schwartzman et al., ) E[ m,γ (U())] = λ,γ. (.) π λ,γ (.)

9 Theorem. Let conditions (C) and (C) hold. (i) Suppose Algorithm. is applied with a fixed threshold u >. Then FDR γ (u;) E[ m,γ (U(),u)]( cγa ) E[ m,γ (U(),u)]( cγa )+A +O(a +L / ). (ii) Suppose Algorithm. is applied with the random threshold ũ BH (.). Then E[ m,γ (U())]( cγa ) FDR BH,γ () α +O(a +L / ). E[ m,γ (U())]( cγa )+A Proof Since w γ (t) has compact support [ cγ,cγ], y (.), the support S,γ of µ γ (t) in (.) is composed of the support segments [v j cγ,v j + cγ] of h j,γ (t). By condition (C), S,γ /L = cγa +O(a +L / ), which implies S,γ /L = cγa +O(a +L / ). Notice that, on the null region S,γ, the expected numer of local extrema, including oth local maxima and minima, equals S,γ E[ m,γ (U())]. On the other hand, similarly to the proof of Theorem in Cheng & Schwartzman (), the expected numer of local extrema on the signal region S,γ is asymptotically equivalent toj. This is ecause, for eachj I + and >, asa, asymptotically, there is no local maximum ofy γ(t) in(v j cγ,v j ) (v j +,v j +cγ), and there is only one local maximum ofy γ (t) in (v j,v j +). The result then follows from similar arguments for proving Theorem in Cheng & Schwartzman () with N =, A,γ = cγa, z γ (t) replaced y z γ(t) and E[ m,γ (U())] replaced y E[ m,γ (U())]. Lemma. Let conditions (C) and (C) hold. As a j, the power for peak j and fixed u (.) can e approximated y ( ) aj w γ () u Power j,γ (u;) = Φ (+O( a j )). (.) σ γ Proof By (.), h j,γ (v j) = w γ () is the maximum of h j,γ (t) over t R. The result then follows from similar arguments for proving Lemma in Cheng & Schwartzman () with z γ (t) replaced y z γ(t). By similar arguments in Cheng & Schwartzman () (see equation () therein), one can show that the random threshold ũ BH converges asymptotically to the deterministic threshold ( ) u BH = αa F γ, (.) A +E[ m,γ (U())]( cγa )( α)

10 wheree[ m,γ (U())] is given y (.). Sinceũ BH is random, similarly to the definition offdr BH,γ (), we define power in the BH procedure as [ ( Power BH,γ () = E J j I + ½ + j I ½ Theorem. Let conditions (C) and (C) hold. ( T+ γ (ũ BH ) (v j,v j +) ) ) )] ( T γ (ũ BH ) (v j,v j +). (i) Suppose Algorithm. is applied with a fixed threshold u >. Then Power γ (u;) = O(a ). (ii) Suppose Algorithm. is applied with the random threshold ũ BH (.). Then Power BH,γ () = O(a +L / ). Proof The desired results follow from similar arguments for showing Theorem in Cheng & Schwartzman (). Example. [Gaussian autocorrelation model] Let the noise z(t) in (.) e constructed as ( ) t s z(t) = σ ν φ db(s), σ,ν >, (.) ν R where φ is the standard Gaussian density, db(s) is Gaussian white noise and ν > (z(t) is regarded y convention as Gaussian white noise when ν = ). Convolving with a Gaussian kernel w γ (t) = (/γ)φ(t/γ) with γ > as in (.) produces a zero-mean infinitely differentiale stationary ergodic Gaussian fieldz γ (t) such that z γ (t) = w γ (t) z(t) = σ (t s) R N ξ ( ) t s φ db(s), ξ = γ ξ +ν, withσ γ = σ /( πξ ), λ,γ = σ /( πξ ) and λ,γ = σ /( πξ ). We have SNR j,γ = a [ ] jw γ () aj (γ +ν ) / σ γ = σπ /. (.) γ As a function of γ, the SNR has a local minimum at γ = ν and is strictly increasing for large γ. In particular, when ν =, it is strictly increasing in γ. Thus we generally expect the detection power to increase with γ for γ > ν. This will e confirmed in the simulations elow. Note that for ν >, the SNR is unounded asγ, however in practice γ cannot e too small: if the support of w γ ecomes smaller than the sampling interval, then the derivative µ γ cannot e estimated.

11 a = a = a = FDR Power Figure : Realized FDR and power of the BH procedure ford =. Simulation studies Simulations were used to evaluate the performance and limitations of the STEM algorithm for signals µ(t) = a t/d, where t =,...,L, L =, d {,} and signal strength a {,,}. Under this setting, the true change points arev j = jd for j =,...,L/d. The noise is generated as the Gaussian process constructed in (.) with σ = and ν =. The smoothing kernels are w γ (t) = (/γ)φ(t/γ)½(t [ γ,γ]) for varying γ. The BH procedure was applied at FDR level α =.. Results were averaged over, replications to simulate the expectations. Figure shows the realized FDR and detection power for a change point separation of d =. A special color map was used for FDR to emphasize the control of FDR (FDR values less than the nominal level. appear in dark lue). We see that for every fixed andwidth γ, increasing the location tolerance allows for detections to e counted as true farther away from the true change points, therey decreasing FDR and increasing power. On the other hand, as expected from Theorems. and., as the strength of the signal a increases, FDR is eventually controlled elow the nominal level and the power tends to for every comination of parameters γ and. For a fixed tolerance, increasing the andwidth γ increases FDR y moving some true change points eyond and producing artificial errors. This also results in some loss of power, which can e seen in Figure especially when is small and γ is large. In addition, for larger and smaller γ, the power is seen to first decrease quickly and then increase again as γ increases. This phenomenon coincides with the ehavior of the SNR (.) derived in Example., predicting the power to decrease for γ ν and increase for γ > ν. Figure illustrates this phenomenon more clearly for fixed =. The realized FDR curves have the same shape as the theoretical power curves, otained y plugging the asymptotic threshold u BH

12 . Power.. γ Figure : The solid curves and dotted curves (in oth cases, lue for a =, green for a = and red for a = ) are respectively theoretical powers and realized powers from Figure with =. The pink dotted vertical line is γ = ν as shown in Example.. (.) into the approximate power (.). Both curves get closer to as the signal strength a increases. The separation of d = in Figure is large enough that andwidths up to γ = do not produce any interference etween neighoring change points. To investigate the effect of neighoring interference, Figure shows the realized FDR and detection power for a change point separation of d =. It can e seen that increasing γ eyond will produce interference and contamination error, therey decreasing power. Still, for any fixed γ and, FDR will eventually e controlled and power will increase if the signal strength is large enough. Data example. Data description Array-ased comparative genomic hyridization (array-cgh) is a high-throughput high-resolution technique used to evaluate changes in the numer of copies of alleles at thousands of genomic loci simultaneously. The output is often called Copy Numer Variation (CNV) data. Changes in copy numer are represented y segments whose mean is displaced with respect to the ackground. To detect these changes, it is costumary to search for change points along the genome. In this paper, we apply our method to chromosome of tumor sample # from the dataset of Hsu et al. () and Loo et al. (). This sample is one of formalin-fixed reast cancer tumors in that dataset and it was chosen for its visual appeal in the illustration of our method. The data in chromosome of tumor sample # consists of average copy numer reads mapped onto unequally spaced locations along the chromosome. For simplicity, the data was analyzed ignoring the gaps in the genomic locations: Figure (top left) shows the data with spacings etween reads artificially set to. Note that ignoring the spacings does not affect the presence or asence of change points.

13 a = a = a = FDR Power Figure : Realized FDR and power of the BH procedure ford =.. Data analysis To analyze the data, the STEM algorithm was applied with a truncated Gaussian smoothing kernel w γ (t) = (/γ)φ(t/γ)½(t [ γ,γ]) with γ =. The andwidth was chosen not too large in order to avoid interference etween neighoring change points. Figure (top right) shows the estimated first derivative (.). Figure (ottom left) marks local maxima (green) and local minima (red). P-values corresponding to local maxima and minima were computed according to (.) using the distriution (.). The required parameters σ γ = Var(z γ (t)), λ,γ = Var(z γ (t)), λ,γ = Var(z γ () (t)) were estimated empirically from the estimated first, second and third derivatives over the oserved data sequence. However, the empirical variances were computed using truncated averages instead of regular averages in order to avoid ias from the extreme derivatives at the change points without assuming their presence or location in advance. The BH algorithm was applied to the p-values FDR level., yielding a p-value significance threshold of.. The corresponding asolute height threshold of. is marked as dashed lines in Figure (ottom left). The significant peaks are plotted on the original data in Figure (ottom right) with a location tolerance of = for visual reference. Discussion In this paper, we comined oth local maxima and minima as candidate peaks, and then applied a multiple testing procedure to find a uniform threshold (in asolute value) for detecting all change points. This approach is sensile when the distriutions (numer and height) of true increasing and

14 Figure : Data example. Top left: Oserved data. Top right: estimated first derivative. Bottom left: local maxima (upward triangles), local minima (downward triangles) of the estimated first derivative, and significance height threshold (lack dashed). Bottom right: The detected positive (green) and negative (red) change points. decreasing change points are aout the same. Alternatively, different thresholds for detecting increasing and decreasing change points could e found y applying separate multiple testing procedures to the sets of candidate local maxima and local minima. While we applied the BH algorithm to control FDR, in principle other multiple testing procedures may e used to control other error rates. A natural and important question is how to choose the smoothing andwidth γ. From Example. and Figures and, we see that either a very small γ or a relatively large γ is preferred in order to increase power, ut only to the extent that the smoothed signal supports h j,γ (t) have little overlap and that detected change points are not displaced y more than the desired tolerance (recall that the value of is not used in the STEM algorithm itself, ut it may e determined y the needs of the specific scientific application). Considering the Gaussian kernel to have an effective support of ±cγ, a good value of γ may e aout min(,d/c), where d is the separation etween change points. Since the location of the change points is unknown, a more precise optimization of γ may require an iterative procedure. Moreover, if some change points are close together and others are far apart, an adpative andwidth may e preferale. We leave these as prolems for future research.

15 References ADLER, R. J. & TAYLOR, J. E. (). Random Fields and Geometry. Springer, New York. BARRY, D. & HARTIGAN, J. A. (). A Bayesian analysis for change point prolems. J. Am. Statist. Assoc.,. BENJAMINI, Y. & HOCHBERG, Y. (). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B,. CHENG, D. & SCHWARTZMAN, A. (a). Distriution of the height of local maxima of Gaussian random fields. Extremes, to appear. arxiv:. CHENG, D. & SCHWARTZMAN, A. (). Multiple testing of local maxima for detection of peaks in random fields. arxiv:. CRAMÉR, H. & LEADBETTER, M. R. (). Stationary and Related Stochastic Processes. Wiley, New York. EILERS, P. H. C. & DE MENEZES, R. X. (). Quantile smoothing of array CGH data. Bioinformatics,. ERDMAN, C. & EMERSON J. W. (). A fast Bayesian change point analysis for the segmentation of microarray data. Bioinformatics,. GIJBELS, I. (). Inference for nonsmooth regression curves and surfaces using kernel-ased methods. a survey collected in Recent advances and trends in nonparametric statistics y Michael G. Akritas and Dimitris N. Politis. HUANG, T., WU, B., LIZARDI, P. & ZHAO, H. (). Detection of DNA copy numer alterations using penalized least squares regression. Bioinformatics,. HUNG, Y., WANG, Y., ZARNITSYNA, V., ZHU, C. & WU C. F. J. (). Hidden Markov models with applications in cell adhesion experiments. J. Am. Statist. Assoc.,. HSU, L., SELF, S. G., GROVE, D., RANDOLPH, T., WANG, K., DELROW, J. J., LOO, L. & PORTER, P. (). Denoising array-ased comparative genomic hyridization data using wavelets. Biostatistics,. JACKSON, B., SCARGLE, J. D., BARNES, D., ARABHI, S., ALT, A., GIOUMOUSIS, P., GWIN, E., SANG- TRAKULCHAROEN, P., TAN, L. & TSAI, T. T. (). An Algorithm for Optimal Partitioning of Data on an Interval. IEEE, Signal Processing Letters,. KILLICK, R., ECKLEY, I. A., EWANS, K. & JONATHAN, P. (). Detection of changes in variance of oceanographic time-series using changepoint analysis. Ocean Engineering,. KILLICK, R., FEARNHEAD, P. & ECKLEY, I. A. (). Optimal detection of changepoints with a linear computational cost. J. Am. Statist. Assoc.,. LAI, W. R., JOHNSON, M. D., KUCHERLAPATI, R. & PARK, P. J. (). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics,.

16 LOO, L. W. M., GROVE, D. I., NEAL, C. L., COUSENS, L. A., SCHUBERT, E. L., WILLIAMS, E. M., HOLCOMB, I. N., DELROW, J. J., TRASK, B. J., HSU, L. & PORTER, P. L. (). Array-CGH analysis of genomic alterations in reast cancer sutypes. Cancer Research,. LANZANTE, J. R. (). Resistant, roust and non-parametric techniques for the analysis of climate data: Theory and example, including applications to historical radiosonde station data. International Journal of Climatology,. MUGGEO, V. M. R. & ADELFIO, G. (). Efficient change point detection for genomic sequences of continuous measurements. Bioinformatics,. NAM, C. F. H., ASTON, J. A. D. & JOHANSEN, A. M. (). Quantifying the uncertainty in change points. Journal of Time Series Analysis,. OLSHEN, A. B., VENKATRAMAN, E. S., LUCITO, R. & WIGLER, M. (). Circular inary segmentation for the analysis of array-ased DNA copy numer data. Biostatistics,. REEVES, J., CHEN, J., WANG, X. L., LUND, R. & LU, Q. Q. (). A Review and Comparison of Changepoint Detection Techniques for Climate Data. Journal of Applied Meteorology and Climatology,. SCHWARTZMAN, A., DOUGHERTY, R. F. & TAYLOR, J. E. (). False discovery rate analysis of rain diffusion direction maps. Ann. Appl. Statist.,. SCHWARTZMAN, A., GAVRILOV, Y. & ADLER, R. J. (). Multiple testing of local maxima for detection of peaks in D. Ann. Statist.,. TIBSHIRANI, R. & WANG, P. (). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics,. WANG, P., KIM, Y., POLLACK, J., NARASIMHAN, B. & TIBSHIRANI, R. (). A method for calling gains and losses in array CGH data. Biostatistics,. ZEILEIS, A., SHAH, A. & PATNAIK, I. (). Testing, Monitoring, and Dating Structural Changes in Exchange Rate Regimes. Computational Statistics & Data Analysis,.

Peak Detection for Images

Peak Detection for Images Armin Schwartzman Division of Biostatistics, UC San Diego June 016 Overview How can we improve detection power? Use a less conservative error criterion Take advantage of prior