arxiv: v1 [math.st] 24 Apr 2015

Size: px
Start display at page:

Download "arxiv: v1 [math.st] 24 Apr 2015"

Transcription

1 Multiple Testing of Local Extrema for Detection of Change Points arxiv:.v [math.st] Apr Dan Cheng North Carolina State University April, Astract Armin Schwartzman North Carolina State University A new approach to change point detection ased on smoothing and multiple testing is presented for long data sequences modeled as piecewise constant functions plus stationary ergodic Gaussian noise. As an application of the STEM algorithm for peak detection developed in Schwartzman et al. () and Cheng & Schwartzman (), the method detects change points as significant local maxima and minima after smoothing and differentiating the oserved sequence. The algorithm, comined with the Benjamini-Hocherg procedure for thresholding p- values, provides asymptotic strong control of the False Discovery Rate (FDR) and power consistency, as the length of the sequence and the size of the jumps get large. Simulations show that FDR levels are maintained in non-asymptotic conditions and guide the choice of smoothing andwidth with respect to the desired location tolerance. The methods are illustrated in genomic array-cgh data. Key Words: False discovery rate; Gaussian; Kernel smoothing; Local maxima; Local minima; Location tolerance. Introduction Detecting change points in the mean of an oserved signal is a common statistical prolem with applications in many research areas such as climatology (Reeves et al., ), oceanography(killick et al., ), finance (Zeileis et al., ) and medical imaging (Nam et al., ). It often appears in the analysis of time series ut it has more recently een found in the analysis of genomic sequences, see Erdman & Emerson (); Lai et al. (); Muggeo & Adelfio (); Olshen et al. (); Tishirani & Wang (); Wang et al. () and the references therein. Given the large amounts of data present in modern applications, it is of interest to design a change point detection method that Partially supported y NIH grant R-CA.

2 can operate over long sequences where the numer and location of change points are unknown, and in such a way that the overall detection error rate is controlled. Many different approaches have een proposed to find and estimate change points, such as kernelased methods (Gijels, ), Bayesian methods (Barry & Hartigan, ; Erdman & Emerson, ), segmentation techniques (Olshen et al., ; Wang et al., ; Muggeo & Adelfio, ), nonparametric tests (Lanzante, ) andl -penalty methods (Eilers & de Menezes, ; Huang et al., ; Tishirani & Wang, ), including the PELT method (Jackson et al., ; Killick et al., ). However, the approach proposed here is unique in the following two ways. First, the noise is assumed to e a stationary Gaussian process, allowing the error terms to e correlated. This is an important departure from the standard assumption of white noise in most of the change-point literature. In fact, applied statisticians desiring to use change-point methods have sometimes aandoned this option in favor of other techniques simply ecause the white noise assumption does not hold (Hung et al., ). This paper shows that change-point methods can e devised for correlated noise, expanding the domain of their applicaility. Second, we use the theory of Gaussian processes to compute p-values for all candidate change points, so that significant change points can e selected at a desired significance level. For concreteness, we adopt the Benjamini-Hocherg multiple testing procedure, enaling control of the false discovery rate (FDR) of detected change points when the data sequence is long and the numer and location of change points are unknown. To our knowledge, this is the first article proposing a multiple testing method for controlling FDR of detected change points. In this paper, we consider a signal-plus-noise model where the true signal is a piecewise constant function and the change points are defined as the points of discontinuity. Inspired y the method for detecting peaks in Schwartzman et al. () and Cheng & Schwartzman (), we modify the STEM algorithm therein to detect change points. The central idea is the oservation that the true signal has zero derivative everywhere except at the change points, where the derivative is infinite. Thus, in the presence of noise and under temporal or spatial sampling, change points can e seen as positive or negative peaks in the derivative of the smoothed signal. Note that ecause of the time sampling, derivatives cannot e oserved directly and can only e estimated. The focus on the derivative of the smoothed signal effectively transforms the change point detection prolem into a peak detection prolem. As in the STEM algorithm, the resulting peak detection prolem is then solved y identifying local maxima and minima of the derivative as candidate peaks and applying a multiple testing procedure to the list of candidates. The modified STEM algorithm for change point detection consists of the following steps:. Differential kernel smoothing: to transform change points to local maxima or minima.. Candidate peaks: find local maxima and minima of the differentiated smoothed process.. P-values: computed at each local maximum and minimum under the the null hypothesis of no signal in a local neighorhood.

3 Figure : Following the notation in. and Example., the left panel is the oserved signal-plus-noise model y(t) containing ten true change points with varying a i and noise z(t) given y (.) withσ = andν =. The right panel illustrates the STEM algorithm. The lue curve isy γ(t), otained with a Gaussian smoothing kernel with standard deviation γ =. Local maxima (green solid dots) and local minima (red solid dots) are declared as significant (marked with solid triangles) at FDR level α =. if their heights are eyond the dotted line thresholds. The cyan and pink ars indicate the location tolerance intervals (v i,v i + ) with = for increasing and decreasing change points respectively. At this tolerance, there are nine true discoveries and one false discovery.. Multiple testing: apply a multiple testing procedure to the set of local maxima and minima; declare as detected change points those local maxima and minima whose p-values are significant. The algorithm is illustrated y a toy example in Figure. The modified STEM algorithm aove differs from the ones in Schwartzman et al. () and Cheng & Schwartzman () in that peaks are sought in the derivative of the smoothed signal rather than the smoothed signal itself, and that oth positive and negative peaks are considered. In addition, an important consideration for the proper definition of error in change point detection is that, as opposed to the peak detection prolems considered in Schwartzman et al. () and Cheng & Schwartzman () where signal peaks had compact support, a true single change point at t = v has Leesgue measure zero. Thus in the presence of noise, it can never e detected exactly at t = v. Therefore we introduce a location tolerancethat defines the precision within which a change point should e detected. Specifically, given, a detected change point is regarded as a true discovery if it falls in the interval (v, v + ). Conversely, if a significant change point is found more than a distance from any true change point, it is considered a false discovery. The quantity is not used in the STEM algorithm itself ut is needed for proper error definition. Under this convention, it is shown here that the modified STEM algorithm exhiits asymptotic FDR control and power consistency as the length of the sequence and the size of the jumps at the change points increase. These asymptotic conditions are similar to those considered in Schwartzman et al.

4 () and Cheng & Schwartzman () and, in fact, the proofs are easily extended from those in Cheng & Schwartzman (). Simulations for varying levels of smoothing andwidth γ, location tolerance and jump size a are used to study the ehavior of the algorithm under non-asymptotic conditions. The simulation results help guide the choice of smoothing andwidth with respect to the desired location tolerance. In general, power increases with andwidth to a limit dictated y the distance etween the change points, so admitting a higher tolerance generally allows a higher andwidth and higher power. The methods are illustrated in a genomic sequence of array-cgh data in a reast-cancer tissue sample (Loo et al., ; Hsu et al., ). The goal of the analysis is to find genomic segments with copy-numer alterations. These are found y detecting change points in the copy numer genomic sequence. The multiple testing scheme. The model We consider a continuous time model, although the algorithm is designed for data discretely sampled in time. Consider the signal-plus-noise model y(t) = µ(t)+z(t), t R, (.) where the signal µ(t) is a piecewise constant function of the form µ(t) = a j h j (t), j= a j R\{}, with h j (t) = ½(t > v j ) for v j R. We are interested in finding the change points v j. For the asymptotic analysis, we assume a = inf j a j > and v = inf j v j v j >, so that the change points do not ecome aritrarily small in size nor aritrarily close to each other. Let w γ (t) = w(t/γ)/γ, where γ > is the andwidth parameter and w(t) is a unimodal symmetric kernel with compact connected support [ c, c] and unit action. Convolving the process (.) with the kernel w γ (t) results in the smoothed random process y γ (t) = w γ (t) y(t) = w γ (t s)y(s)ds = µ γ (t)+z γ (t), (.) where the smoothed signal and smoothed noise are defined respectively as µ γ (t) = w γ (t) µ(t) = R a j h j,γ (t) and z γ (t) = w γ (t) z(t), (.) j=

5 and where the smoothed change point takes the form h j,γ (t) = w γ (t) h j (t). (.) The smoothed noise z γ (t) defined y (.) is assumed to e a zero-mean four-times differentiale stationary ergodic Gaussian process.. Change point detection as peak detection of the derivative Consider now the derivative of the smoothed oserved process (.) y γ (t) = w γ R (t) y(t) = w γ (t s)y(s)ds = µ γ (t)+z γ (t), (.) N where the derivatives of the smoothed signal and smoothed noise are respectively µ γ (t) = w γ (t) µ(t) = a j h j,γ (t) and z γ (t) = w γ (t) z(t). A key oservation from (.) is that h j,γ(t) = w γ(t s)h j (s)ds = w γ(s)h j (t s)ds R R = w γ (s)½(t s > v j)ds = w γ (t v j ). R Thus (.) represents a signal-plus-noise model where the smoothed signal j= (.) µ γ (t) = a j h j,γ (t) = a j w γ (t v j ) (.) j= is a sequence of unimodal peaks with the same shape as that of w γ and located at locations v j. The prolem of finding change points in y γ (t) is thus reduced to finding (positive or negative) peaks in y γ(t). For simplicity, we assume that the compact supports S j,γ of the smoothed peak shape h j,γ (t) = j= w γ (t v j ) do not overlap, although this is not crucial in practice.. The STEM algorithm for change point detection Suppose we oserve y(t) with J jumps defined y (.) in the line of length L centered at the origin, denoted y U(L) = ( L/, L/). The following is a version of the STEM algorithm of Schwartzman et al. () and Cheng & Schwartzman () for detecting change points. Algorithm. (STEM algorithm for change point detection). Differential kernel smoothing: Otain the process (.) y convolution of y(t) with the kernel derivative w γ (t).

6 . Candidate peaks: Find the set of local maxima and minima of y γ(t) in U(L), denoted y T γ = T + γ T γ, where { } T γ + = t U(L) : y γ(t) =, y γ () (t) <, { } T γ = t U(L) : y γ(t) =, y γ () (t) >.. P-values: For each t T γ +, compute the p-value p γ(t) for testing the (conditional) hypotheses H (t) : {µ (s) = for all s (t,t+)} vs. H A (t) : {µ (s) > for some s (t,t+)}; and for each t T γ, compute the p-value p γ(t) for testing the (conditional) hypotheses H (t) : {µ (s) = for all s (t,t+)} vs. H A (t) : {µ (s) < for some s (t,t+)}, where > is an appropriate location tolerance.. Multiple testing: Let m γ = #{t T γ } e the numer of tested hypotheses. Apply a multiple testing procedure on the set of m γ p-values {p γ (t), t T γ }, and declare significant all local extrema whose p-values are smaller than the significance threshold.. P-values Given the oserved heights y γ (t) at the local maxima or minimat T γ = T γ + T γ, p-values in step () of Algorithm. are computed as F γ (y γ(t)), t T γ +, p γ (t) = F γ ( y γ (t)), t T (.) γ, wheref γ (u) denotes the right tail proaility ofz γ (t) at the local maximumt T γ + the null model µ (s) =, s (t,t+), that is,, evaluated under F γ (u) = P ( z γ(t) > u t is a local maximum of z γ (t) ). (.) The second line in (.) is otained y noting that, y (.), P ( z γ(t) < u t is a local minimum of z γ (t) ) = P ( z γ (t) > u tis a local maximum of z γ (t)) = F γ ( u), since z γ (t) and z γ (t) have the same distriution.

7 The distriution (.) has a closed-form expression, which can e otained as in Schwartzman et al. () or Cramér & Leadetter (). More specifically, the distriution (.) is given y ( ) λ,γ πλ ( ) ( ),γ u λ,γ F γ (u) = Φ u + λ,γ σ γ φ σ γ Φ u σ γ, (.) where σ γ = Var(z γ(t)), λ,γ = Var(z γ(t)), λ,γ = Var(z γ () (t)), = σ γ λ,γ λ,γ, and φ(x), Φ(x) are the standard normal density and cumulative distriution function, respectively. The quantities σ γ, λ,γ and λ,γ depend on the kernel w γ (t) and the autocorrelation function of the original noise process z(t). Explicit expressions may e otained, for instance, for the Gaussian autocorrelation model in Example. elow, which we use later in the simulations.. Error definitions Assuming the model of., define the signal region S = J j= (v j,v j + ) and null region S = U(L)\S. For u >, let T γ (u) = T γ + (u) T γ (u), where { } T γ + (u) = t U(L) : y γ(t) > u, y γ(t) =, y γ () (t) <, } T γ {t (u) = U(L) : y γ (t) < u, y γ (t) =, y() γ (t) >, indicating that T γ + (u) and T γ (u) are respectively the set of local maxima of y γ (t) aove u and the set of local minima of y γ (t) elow u. The numer of totally and falsely detected change points at threshold u are defined respectively as R γ (u) = #{t T + γ (u)}+#{t T γ (u)}, V γ (u;) = #{t T + γ (u) S }+#{t T γ (u) S }. (.) Both are defined as zero if T γ (u) is empty. proportion of falsely detected jumps FDR γ (u;) = E The FDR at threshold u is defined as the expected { } Vγ (u;). (.) R γ (u) Note that whenγ and u are fixed,v γ (u;) and hence FDR γ (u;) are decreasing in. Following the notation in Cheng & Schwartzman (), define the smoothed signal region S,γ to e the support of µ γ(t) and smoothed null region S,γ = U(L) \ S,γ. We call the difference etween the expanded signal support due to smoothing and the true signal support the transition region T γ = S,γ \S = S \S,γ.. Power Denote y I + and I the collections of indices j corresponding to increasing and decreasing change points v j, respectively. We define the power of Algorithm. as the expected fraction of true discov-

8 ered change points Power γ (u;) = J [ ( J Power j,γ (u;) = E J j= + j I ½ j I + ½ ( T+ γ (u) (v j,v j +) ) ) )] ( T γ (u) (v j,v j +), where Power j,γ (u;) is the proaility of detecting jump v j within a distance, ) P( T+ γ (u) (v j,v j +), if j I +, Power j,γ (u;) = ) P( T γ (u) (v j,v j +), if j I. (.) (.) The indicator function in (.) ensures that only one significant local extremum is counted within a distance of a change point, so power is not inflated. Note that whenγ anduare fixed,power γ (u;) and Power j,γ (u;) are increasing in. Asymptotic error control and power consistency Suppose the BH procedure is applied in step of Algorithm. as follows. For a fixed α (, ), let k e the largest index for which theith smallest p-value is less than iα/ m γ. Then the null hypothesis H (t) at t T γ is rejected if p γ (t) < kα m γ y γ(t) > ũ BH = F y γ(t) < ũ BH = F γ ( ) kα m γ γ ( ) kα m γ if t T + γ, if t T γ, wherekα/ m γ is defined as if m γ =. Sinceũ BH is random, we define FDR in such BH procedure as { } Vγ (ũ BH ;) FDR BH,γ () = E, R γ (ũ BH ) where R γ ( ) and V γ ( ;) are defined in (.) and the expectation is taken over all possile realizations of the random threshold ũ BH. We will make use of the following conditions: (C) The assumptions of. hold. (C) L and a = inf j a j, such that (logl)/a, J/L = A + O(a + L / ) with A >. Let E[ m,γ (U())] and E[ m,γ (U(),u)] e the expected numer of local maxima and local maxima aove level u of z γ(t) on unit interval U() = ( /,/), respectively. In particular, we have the following explicit formula (Schwartzman et al., ) E[ m,γ (U())] = λ,γ. (.) π λ,γ (.)

9 Theorem. Let conditions (C) and (C) hold. (i) Suppose Algorithm. is applied with a fixed threshold u >. Then FDR γ (u;) E[ m,γ (U(),u)]( cγa ) E[ m,γ (U(),u)]( cγa )+A +O(a +L / ). (ii) Suppose Algorithm. is applied with the random threshold ũ BH (.). Then E[ m,γ (U())]( cγa ) FDR BH,γ () α +O(a +L / ). E[ m,γ (U())]( cγa )+A Proof Since w γ (t) has compact support [ cγ,cγ], y (.), the support S,γ of µ γ (t) in (.) is composed of the support segments [v j cγ,v j + cγ] of h j,γ (t). By condition (C), S,γ /L = cγa +O(a +L / ), which implies S,γ /L = cγa +O(a +L / ). Notice that, on the null region S,γ, the expected numer of local extrema, including oth local maxima and minima, equals S,γ E[ m,γ (U())]. On the other hand, similarly to the proof of Theorem in Cheng & Schwartzman (), the expected numer of local extrema on the signal region S,γ is asymptotically equivalent toj. This is ecause, for eachj I + and >, asa, asymptotically, there is no local maximum ofy γ(t) in(v j cγ,v j ) (v j +,v j +cγ), and there is only one local maximum ofy γ (t) in (v j,v j +). The result then follows from similar arguments for proving Theorem in Cheng & Schwartzman () with N =, A,γ = cγa, z γ (t) replaced y z γ(t) and E[ m,γ (U())] replaced y E[ m,γ (U())]. Lemma. Let conditions (C) and (C) hold. As a j, the power for peak j and fixed u (.) can e approximated y ( ) aj w γ () u Power j,γ (u;) = Φ (+O( a j )). (.) σ γ Proof By (.), h j,γ (v j) = w γ () is the maximum of h j,γ (t) over t R. The result then follows from similar arguments for proving Lemma in Cheng & Schwartzman () with z γ (t) replaced y z γ(t). By similar arguments in Cheng & Schwartzman () (see equation () therein), one can show that the random threshold ũ BH converges asymptotically to the deterministic threshold ( ) u BH = αa F γ, (.) A +E[ m,γ (U())]( cγa )( α)

10 wheree[ m,γ (U())] is given y (.). Sinceũ BH is random, similarly to the definition offdr BH,γ (), we define power in the BH procedure as [ ( Power BH,γ () = E J j I + ½ + j I ½ Theorem. Let conditions (C) and (C) hold. ( T+ γ (ũ BH ) (v j,v j +) ) ) )] ( T γ (ũ BH ) (v j,v j +). (i) Suppose Algorithm. is applied with a fixed threshold u >. Then Power γ (u;) = O(a ). (ii) Suppose Algorithm. is applied with the random threshold ũ BH (.). Then Power BH,γ () = O(a +L / ). Proof The desired results follow from similar arguments for showing Theorem in Cheng & Schwartzman (). Example. [Gaussian autocorrelation model] Let the noise z(t) in (.) e constructed as ( ) t s z(t) = σ ν φ db(s), σ,ν >, (.) ν R where φ is the standard Gaussian density, db(s) is Gaussian white noise and ν > (z(t) is regarded y convention as Gaussian white noise when ν = ). Convolving with a Gaussian kernel w γ (t) = (/γ)φ(t/γ) with γ > as in (.) produces a zero-mean infinitely differentiale stationary ergodic Gaussian fieldz γ (t) such that z γ (t) = w γ (t) z(t) = σ (t s) R N ξ ( ) t s φ db(s), ξ = γ ξ +ν, withσ γ = σ /( πξ ), λ,γ = σ /( πξ ) and λ,γ = σ /( πξ ). We have SNR j,γ = a [ ] jw γ () aj (γ +ν ) / σ γ = σπ /. (.) γ As a function of γ, the SNR has a local minimum at γ = ν and is strictly increasing for large γ. In particular, when ν =, it is strictly increasing in γ. Thus we generally expect the detection power to increase with γ for γ > ν. This will e confirmed in the simulations elow. Note that for ν >, the SNR is unounded asγ, however in practice γ cannot e too small: if the support of w γ ecomes smaller than the sampling interval, then the derivative µ γ cannot e estimated.

11 a = a = a = FDR Power Figure : Realized FDR and power of the BH procedure ford =. Simulation studies Simulations were used to evaluate the performance and limitations of the STEM algorithm for signals µ(t) = a t/d, where t =,...,L, L =, d {,} and signal strength a {,,}. Under this setting, the true change points arev j = jd for j =,...,L/d. The noise is generated as the Gaussian process constructed in (.) with σ = and ν =. The smoothing kernels are w γ (t) = (/γ)φ(t/γ)½(t [ γ,γ]) for varying γ. The BH procedure was applied at FDR level α =.. Results were averaged over, replications to simulate the expectations. Figure shows the realized FDR and detection power for a change point separation of d =. A special color map was used for FDR to emphasize the control of FDR (FDR values less than the nominal level. appear in dark lue). We see that for every fixed andwidth γ, increasing the location tolerance allows for detections to e counted as true farther away from the true change points, therey decreasing FDR and increasing power. On the other hand, as expected from Theorems. and., as the strength of the signal a increases, FDR is eventually controlled elow the nominal level and the power tends to for every comination of parameters γ and. For a fixed tolerance, increasing the andwidth γ increases FDR y moving some true change points eyond and producing artificial errors. This also results in some loss of power, which can e seen in Figure especially when is small and γ is large. In addition, for larger and smaller γ, the power is seen to first decrease quickly and then increase again as γ increases. This phenomenon coincides with the ehavior of the SNR (.) derived in Example., predicting the power to decrease for γ ν and increase for γ > ν. Figure illustrates this phenomenon more clearly for fixed =. The realized FDR curves have the same shape as the theoretical power curves, otained y plugging the asymptotic threshold u BH

12 . Power.. γ Figure : The solid curves and dotted curves (in oth cases, lue for a =, green for a = and red for a = ) are respectively theoretical powers and realized powers from Figure with =. The pink dotted vertical line is γ = ν as shown in Example.. (.) into the approximate power (.). Both curves get closer to as the signal strength a increases. The separation of d = in Figure is large enough that andwidths up to γ = do not produce any interference etween neighoring change points. To investigate the effect of neighoring interference, Figure shows the realized FDR and detection power for a change point separation of d =. It can e seen that increasing γ eyond will produce interference and contamination error, therey decreasing power. Still, for any fixed γ and, FDR will eventually e controlled and power will increase if the signal strength is large enough. Data example. Data description Array-ased comparative genomic hyridization (array-cgh) is a high-throughput high-resolution technique used to evaluate changes in the numer of copies of alleles at thousands of genomic loci simultaneously. The output is often called Copy Numer Variation (CNV) data. Changes in copy numer are represented y segments whose mean is displaced with respect to the ackground. To detect these changes, it is costumary to search for change points along the genome. In this paper, we apply our method to chromosome of tumor sample # from the dataset of Hsu et al. () and Loo et al. (). This sample is one of formalin-fixed reast cancer tumors in that dataset and it was chosen for its visual appeal in the illustration of our method. The data in chromosome of tumor sample # consists of average copy numer reads mapped onto unequally spaced locations along the chromosome. For simplicity, the data was analyzed ignoring the gaps in the genomic locations: Figure (top left) shows the data with spacings etween reads artificially set to. Note that ignoring the spacings does not affect the presence or asence of change points.

13 a = a = a = FDR Power Figure : Realized FDR and power of the BH procedure ford =.. Data analysis To analyze the data, the STEM algorithm was applied with a truncated Gaussian smoothing kernel w γ (t) = (/γ)φ(t/γ)½(t [ γ,γ]) with γ =. The andwidth was chosen not too large in order to avoid interference etween neighoring change points. Figure (top right) shows the estimated first derivative (.). Figure (ottom left) marks local maxima (green) and local minima (red). P-values corresponding to local maxima and minima were computed according to (.) using the distriution (.). The required parameters σ γ = Var(z γ (t)), λ,γ = Var(z γ (t)), λ,γ = Var(z γ () (t)) were estimated empirically from the estimated first, second and third derivatives over the oserved data sequence. However, the empirical variances were computed using truncated averages instead of regular averages in order to avoid ias from the extreme derivatives at the change points without assuming their presence or location in advance. The BH algorithm was applied to the p-values FDR level., yielding a p-value significance threshold of.. The corresponding asolute height threshold of. is marked as dashed lines in Figure (ottom left). The significant peaks are plotted on the original data in Figure (ottom right) with a location tolerance of = for visual reference. Discussion In this paper, we comined oth local maxima and minima as candidate peaks, and then applied a multiple testing procedure to find a uniform threshold (in asolute value) for detecting all change points. This approach is sensile when the distriutions (numer and height) of true increasing and

14 Figure : Data example. Top left: Oserved data. Top right: estimated first derivative. Bottom left: local maxima (upward triangles), local minima (downward triangles) of the estimated first derivative, and significance height threshold (lack dashed). Bottom right: The detected positive (green) and negative (red) change points. decreasing change points are aout the same. Alternatively, different thresholds for detecting increasing and decreasing change points could e found y applying separate multiple testing procedures to the sets of candidate local maxima and local minima. While we applied the BH algorithm to control FDR, in principle other multiple testing procedures may e used to control other error rates. A natural and important question is how to choose the smoothing andwidth γ. From Example. and Figures and, we see that either a very small γ or a relatively large γ is preferred in order to increase power, ut only to the extent that the smoothed signal supports h j,γ (t) have little overlap and that detected change points are not displaced y more than the desired tolerance (recall that the value of is not used in the STEM algorithm itself, ut it may e determined y the needs of the specific scientific application). Considering the Gaussian kernel to have an effective support of ±cγ, a good value of γ may e aout min(,d/c), where d is the separation etween change points. Since the location of the change points is unknown, a more precise optimization of γ may require an iterative procedure. Moreover, if some change points are close together and others are far apart, an adpative andwidth may e preferale. We leave these as prolems for future research.

15 References ADLER, R. J. & TAYLOR, J. E. (). Random Fields and Geometry. Springer, New York. BARRY, D. & HARTIGAN, J. A. (). A Bayesian analysis for change point prolems. J. Am. Statist. Assoc.,. BENJAMINI, Y. & HOCHBERG, Y. (). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B,. CHENG, D. & SCHWARTZMAN, A. (a). Distriution of the height of local maxima of Gaussian random fields. Extremes, to appear. arxiv:. CHENG, D. & SCHWARTZMAN, A. (). Multiple testing of local maxima for detection of peaks in random fields. arxiv:. CRAMÉR, H. & LEADBETTER, M. R. (). Stationary and Related Stochastic Processes. Wiley, New York. EILERS, P. H. C. & DE MENEZES, R. X. (). Quantile smoothing of array CGH data. Bioinformatics,. ERDMAN, C. & EMERSON J. W. (). A fast Bayesian change point analysis for the segmentation of microarray data. Bioinformatics,. GIJBELS, I. (). Inference for nonsmooth regression curves and surfaces using kernel-ased methods. a survey collected in Recent advances and trends in nonparametric statistics y Michael G. Akritas and Dimitris N. Politis. HUANG, T., WU, B., LIZARDI, P. & ZHAO, H. (). Detection of DNA copy numer alterations using penalized least squares regression. Bioinformatics,. HUNG, Y., WANG, Y., ZARNITSYNA, V., ZHU, C. & WU C. F. J. (). Hidden Markov models with applications in cell adhesion experiments. J. Am. Statist. Assoc.,. HSU, L., SELF, S. G., GROVE, D., RANDOLPH, T., WANG, K., DELROW, J. J., LOO, L. & PORTER, P. (). Denoising array-ased comparative genomic hyridization data using wavelets. Biostatistics,. JACKSON, B., SCARGLE, J. D., BARNES, D., ARABHI, S., ALT, A., GIOUMOUSIS, P., GWIN, E., SANG- TRAKULCHAROEN, P., TAN, L. & TSAI, T. T. (). An Algorithm for Optimal Partitioning of Data on an Interval. IEEE, Signal Processing Letters,. KILLICK, R., ECKLEY, I. A., EWANS, K. & JONATHAN, P. (). Detection of changes in variance of oceanographic time-series using changepoint analysis. Ocean Engineering,. KILLICK, R., FEARNHEAD, P. & ECKLEY, I. A. (). Optimal detection of changepoints with a linear computational cost. J. Am. Statist. Assoc.,. LAI, W. R., JOHNSON, M. D., KUCHERLAPATI, R. & PARK, P. J. (). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics,.

16 LOO, L. W. M., GROVE, D. I., NEAL, C. L., COUSENS, L. A., SCHUBERT, E. L., WILLIAMS, E. M., HOLCOMB, I. N., DELROW, J. J., TRASK, B. J., HSU, L. & PORTER, P. L. (). Array-CGH analysis of genomic alterations in reast cancer sutypes. Cancer Research,. LANZANTE, J. R. (). Resistant, roust and non-parametric techniques for the analysis of climate data: Theory and example, including applications to historical radiosonde station data. International Journal of Climatology,. MUGGEO, V. M. R. & ADELFIO, G. (). Efficient change point detection for genomic sequences of continuous measurements. Bioinformatics,. NAM, C. F. H., ASTON, J. A. D. & JOHANSEN, A. M. (). Quantifying the uncertainty in change points. Journal of Time Series Analysis,. OLSHEN, A. B., VENKATRAMAN, E. S., LUCITO, R. & WIGLER, M. (). Circular inary segmentation for the analysis of array-ased DNA copy numer data. Biostatistics,. REEVES, J., CHEN, J., WANG, X. L., LUND, R. & LU, Q. Q. (). A Review and Comparison of Changepoint Detection Techniques for Climate Data. Journal of Applied Meteorology and Climatology,. SCHWARTZMAN, A., DOUGHERTY, R. F. & TAYLOR, J. E. (). False discovery rate analysis of rain diffusion direction maps. Ann. Appl. Statist.,. SCHWARTZMAN, A., GAVRILOV, Y. & ADLER, R. J. (). Multiple testing of local maxima for detection of peaks in D. Ann. Statist.,. TIBSHIRANI, R. & WANG, P. (). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics,. WANG, P., KIM, Y., POLLACK, J., NARASIMHAN, B. & TIBSHIRANI, R. (). A method for calling gains and losses in array CGH data. Biostatistics,. ZEILEIS, A., SHAH, A. & PATNAIK, I. (). Testing, Monitoring, and Dating Structural Changes in Exchange Rate Regimes. Computational Statistics & Data Analysis,.

Peak Detection for Images

Peak Detection for Images Peak Detection for Images Armin Schwartzman Division of Biostatistics, UC San Diego June 016 Overview How can we improve detection power? Use a less conservative error criterion Take advantage of prior

More information

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem

More information

arxiv: v1 [stat.ml] 10 Apr 2017

arxiv: v1 [stat.ml] 10 Apr 2017 Integral Transforms from Finite Data: An Application of Gaussian Process Regression to Fourier Analysis arxiv:1704.02828v1 [stat.ml] 10 Apr 2017 Luca Amrogioni * and Eric Maris Radoud University, Donders

More information

FinQuiz Notes

FinQuiz Notes Reading 9 A time series is any series of data that varies over time e.g. the quarterly sales for a company during the past five years or daily returns of a security. When assumptions of the regression

More information

A Brief Introduction to the R package StepSignalMargiLike

A Brief Introduction to the R package StepSignalMargiLike A Brief Introduction to the R package StepSignalMargiLike Chu-Lan Kao, Chao Du Jan 2015 This article contains a short introduction to the R package StepSignalMargiLike for estimating change-points in stepwise

More information

arxiv: v1 [math.st] 31 Mar 2009

arxiv: v1 [math.st] 31 Mar 2009 The Annals of Statistics 2009, Vol. 37, No. 2, 619 629 DOI: 10.1214/07-AOS586 c Institute of Mathematical Statistics, 2009 arxiv:0903.5373v1 [math.st] 31 Mar 2009 AN ADAPTIVE STEP-DOWN PROCEDURE WITH PROVEN

More information

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors The Multiple Testing Problem Multiple Testing Methods for the Analysis of Microarray Data 3/9/2009 Copyright 2009 Dan Nettleton Suppose one test of interest has been conducted for each of m genes in a

More information

A Brief Introduction to the Matlab package StepSignalMargiLike

A Brief Introduction to the Matlab package StepSignalMargiLike A Brief Introduction to the Matlab package StepSignalMargiLike Chu-Lan Kao, Chao Du Jan 2015 This article contains a short introduction to the Matlab package for estimating change-points in stepwise signals.

More information

1. Define the following terms (1 point each): alternative hypothesis

1. Define the following terms (1 point each): alternative hypothesis 1 1. Define the following terms (1 point each): alternative hypothesis One of three hypotheses indicating that the parameter is not zero; one states the parameter is not equal to zero, one states the parameter

More information

Simple Examples. Let s look at a few simple examples of OI analysis.

Simple Examples. Let s look at a few simple examples of OI analysis. Simple Examples Let s look at a few simple examples of OI analysis. Example 1: Consider a scalar prolem. We have one oservation y which is located at the analysis point. We also have a ackground estimate

More information

Spectrum Opportunity Detection with Weak and Correlated Signals

Spectrum Opportunity Detection with Weak and Correlated Signals Specum Opportunity Detection with Weak and Correlated Signals Yao Xie Department of Elecical and Computer Engineering Duke University orth Carolina 775 Email: yaoxie@dukeedu David Siegmund Department of

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

Fixed-b Inference for Testing Structural Change in a Time Series Regression

Fixed-b Inference for Testing Structural Change in a Time Series Regression Fixed- Inference for esting Structural Change in a ime Series Regression Cheol-Keun Cho Michigan State University imothy J. Vogelsang Michigan State University August 29, 24 Astract his paper addresses

More information

Chaos and Dynamical Systems

Chaos and Dynamical Systems Chaos and Dynamical Systems y Megan Richards Astract: In this paper, we will discuss the notion of chaos. We will start y introducing certain mathematical concepts needed in the understanding of chaos,

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

arxiv: v1 [math.st] 1 Dec 2014

arxiv: v1 [math.st] 1 Dec 2014 HOW TO MONITOR AND MITIGATE STAIR-CASING IN L TREND FILTERING Cristian R. Rojas and Bo Wahlberg Department of Automatic Control and ACCESS Linnaeus Centre School of Electrical Engineering, KTH Royal Institute

More information

Estimation of a Two-component Mixture Model

Estimation of a Two-component Mixture Model Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

A sequential multiple change-point detection procedure via VIF regression

A sequential multiple change-point detection procedure via VIF regression Comput Stat DOI 10.1007/s00180-015-0587-5 ORIGINAL PAPER A sequential multiple change-point detection procedure via VIF regression iaoping Shi 1 iang-sheng Wang 1 Dongwei Wei 1 Yuehua Wu 1 Received: 6

More information

Generalized Seasonal Tapered Block Bootstrap

Generalized Seasonal Tapered Block Bootstrap Generalized Seasonal Tapered Block Bootstrap Anna E. Dudek 1, Efstathios Paparoditis 2, Dimitris N. Politis 3 Astract In this paper a new lock ootstrap method for periodic times series called Generalized

More information

Depth versus Breadth in Convolutional Polar Codes

Depth versus Breadth in Convolutional Polar Codes Depth versus Breadth in Convolutional Polar Codes Maxime Tremlay, Benjamin Bourassa and David Poulin,2 Département de physique & Institut quantique, Université de Sherrooke, Sherrooke, Quéec, Canada JK

More information

TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION

TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION Jiangping Wang and Dapeng Wu Department of Electrical and Computer Engineering University of Florida, Gainesville, FL 3611 Correspondence author: Prof.

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

Post-selection Inference for Changepoint Detection

Post-selection Inference for Changepoint Detection Post-selection Inference for Changepoint Detection Sangwon Hyun (Justin) Dept. of Statistics Advisors: Max G Sell, Ryan Tibshirani Committee: Will Fithian (UC Berkeley), Alessandro Rinaldo, Kathryn Roeder,

More information

#A50 INTEGERS 14 (2014) ON RATS SEQUENCES IN GENERAL BASES

#A50 INTEGERS 14 (2014) ON RATS SEQUENCES IN GENERAL BASES #A50 INTEGERS 14 (014) ON RATS SEQUENCES IN GENERAL BASES Johann Thiel Dept. of Mathematics, New York City College of Technology, Brooklyn, New York jthiel@citytech.cuny.edu Received: 6/11/13, Revised:

More information

EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS

EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS APPLICATIONES MATHEMATICAE 9,3 (), pp. 85 95 Erhard Cramer (Oldenurg) Udo Kamps (Oldenurg) Tomasz Rychlik (Toruń) EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS Astract. We

More information

A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data

A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data Biostatistics (2007), 8, 4, pp. 744 755 doi:10.1093/biostatistics/kxm002 Advance Access publication on January 22, 2007 A moment-based method for estimating the proportion of true null hypotheses and its

More information

Estimating a Finite Population Mean under Random Non-Response in Two Stage Cluster Sampling with Replacement

Estimating a Finite Population Mean under Random Non-Response in Two Stage Cluster Sampling with Replacement Open Journal of Statistics, 07, 7, 834-848 http://www.scirp.org/journal/ojs ISS Online: 6-798 ISS Print: 6-78X Estimating a Finite Population ean under Random on-response in Two Stage Cluster Sampling

More information

The miss rate for the analysis of gene expression data

The miss rate for the analysis of gene expression data Biostatistics (2005), 6, 1,pp. 111 117 doi: 10.1093/biostatistics/kxh021 The miss rate for the analysis of gene expression data JONATHAN TAYLOR Department of Statistics, Stanford University, Stanford,

More information

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using

More information

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017 Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION By Degui Li, Peter C. B. Phillips, and Jiti Gao September 017 COWLES FOUNDATION DISCUSSION PAPER NO.

More information

Spiking problem in monotone regression : penalized residual sum of squares

Spiking problem in monotone regression : penalized residual sum of squares Spiking prolem in monotone regression : penalized residual sum of squares Jayanta Kumar Pal 12 SAMSI, NC 27606, U.S.A. Astract We consider the estimation of a monotone regression at its end-point, where

More information

Expansion formula using properties of dot product (analogous to FOIL in algebra): u v 2 u v u v u u 2u v v v u 2 2u v v 2

Expansion formula using properties of dot product (analogous to FOIL in algebra): u v 2 u v u v u u 2u v v v u 2 2u v v 2 Least squares: Mathematical theory Below we provide the "vector space" formulation, and solution, of the least squares prolem. While not strictly necessary until we ring in the machinery of matrix algera,

More information

Optional Stopping Theorem Let X be a martingale and T be a stopping time such

Optional Stopping Theorem Let X be a martingale and T be a stopping time such Plan Counting, Renewal, and Point Processes 0. Finish FDR Example 1. The Basic Renewal Process 2. The Poisson Process Revisited 3. Variants and Extensions 4. Point Processes Reading: G&S: 7.1 7.3, 7.10

More information

Sample Size Estimation for Studies of High-Dimensional Data

Sample Size Estimation for Studies of High-Dimensional Data Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,

More information

Time-Varying Spectrum Estimation of Uniformly Modulated Processes by Means of Surrogate Data and Empirical Mode Decomposition

Time-Varying Spectrum Estimation of Uniformly Modulated Processes by Means of Surrogate Data and Empirical Mode Decomposition Time-Varying Spectrum Estimation of Uniformly Modulated Processes y Means of Surrogate Data and Empirical Mode Decomposition Azadeh Moghtaderi, Patrick Flandrin, Pierre Borgnat To cite this version: Azadeh

More information

Nonparametric Estimation of Smooth Conditional Distributions

Nonparametric Estimation of Smooth Conditional Distributions Nonparametric Estimation of Smooth Conditional Distriutions Bruce E. Hansen University of Wisconsin www.ssc.wisc.edu/~hansen May 24 Astract This paper considers nonparametric estimation of smooth conditional

More information

A Sufficient Condition for Optimality of Digital versus Analog Relaying in a Sensor Network

A Sufficient Condition for Optimality of Digital versus Analog Relaying in a Sensor Network A Sufficient Condition for Optimality of Digital versus Analog Relaying in a Sensor Network Chandrashekhar Thejaswi PS Douglas Cochran and Junshan Zhang Department of Electrical Engineering Arizona State

More information

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model Luis Manuel Santana Gallego 100 Appendix 3 Clock Skew Model Xiaohong Jiang and Susumu Horiguchi [JIA-01] 1. Introduction The evolution of VLSI chips toward larger die sizes and faster clock speeds makes

More information

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data Faming Liang, Chuanhai Liu, and Naisyin Wang Texas A&M University Multiple Hypothesis Testing Introduction

More information

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018 High-Throughput Sequencing Course Multiple Testing Biostatistics and Bioinformatics Summer 2018 Introduction You have previously considered the significance of a single gene Introduction You have previously

More information

Hypes and Other Important Developments in Statistics

Hypes and Other Important Developments in Statistics Hypes and Other Important Developments in Statistics Aad van der Vaart Vrije Universiteit Amsterdam May 2009 The Hype Sparsity For decades we taught students that to estimate p parameters one needs n p

More information

New Procedures for False Discovery Control

New Procedures for False Discovery Control New Procedures for False Discovery Control Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Elisha Merriam Department of Neuroscience University

More information

Real option valuation for reserve capacity

Real option valuation for reserve capacity Real option valuation for reserve capacity MORIARTY, JM; Palczewski, J doi:10.1016/j.ejor.2016.07.003 For additional information aout this pulication click this link. http://qmro.qmul.ac.uk/xmlui/handle/123456789/13838

More information

MATH 225: Foundations of Higher Matheamatics. Dr. Morton. 3.4: Proof by Cases

MATH 225: Foundations of Higher Matheamatics. Dr. Morton. 3.4: Proof by Cases MATH 225: Foundations of Higher Matheamatics Dr. Morton 3.4: Proof y Cases Chapter 3 handout page 12 prolem 21: Prove that for all real values of y, the following inequality holds: 7 2y + 2 2y 5 7. You

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

arxiv: v2 [stat.me] 14 Jul 2016

arxiv: v2 [stat.me] 14 Jul 2016 Multiple Change-point Detection: a Selective Overview Yue S. Niu, Ning Hao, and Heping Zhang University of Arizona and Yale University arxiv:1512.04093v2 [stat.me] 14 Jul 2016 July 15, 2016 Abstract Very

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 147 Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data Sunduz

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

QUALITY CONTROL OF WINDS FROM METEOSAT 8 AT METEO FRANCE : SOME RESULTS

QUALITY CONTROL OF WINDS FROM METEOSAT 8 AT METEO FRANCE : SOME RESULTS QUALITY CONTROL OF WINDS FROM METEOSAT 8 AT METEO FRANCE : SOME RESULTS Christophe Payan Météo France, Centre National de Recherches Météorologiques, Toulouse, France Astract The quality of a 30-days sample

More information

Comments on A Time Delay Controller for Systems with Uncertain Dynamics

Comments on A Time Delay Controller for Systems with Uncertain Dynamics Comments on A Time Delay Controller for Systems with Uncertain Dynamics Qing-Chang Zhong Dept. of Electrical & Electronic Engineering Imperial College of Science, Technology, and Medicine Exhiition Rd.,

More information

Tools and topics for microarray analysis

Tools and topics for microarray analysis Tools and topics for microarray analysis USSES Conference, Blowing Rock, North Carolina, June, 2005 Jason A. Osborne, osborne@stat.ncsu.edu Department of Statistics, North Carolina State University 1 Outline

More information

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof)

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof) JOURNAL OF MODERN DYNAMICS VOLUME 4, NO. 4, 010, 637 691 doi: 10.3934/jmd.010.4.637 STRUCTURE OF ATTRACTORS FOR (a, )-CONTINUED FRACTION TRANSFORMATIONS SVETLANA KATOK AND ILIE UGARCOVICI (Communicated

More information

Comment about AR spectral estimation Usually an estimate is produced by computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo

Comment about AR spectral estimation Usually an estimate is produced by computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo Comment aout AR spectral estimation Usually an estimate is produced y computing the AR theoretical spectrum at (ˆφ, ˆσ 2 ). With our Monte Carlo simulation approach, for every draw (φ,σ 2 ), we can compute

More information

HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO

HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO The Annals of Statistics 2006, Vol. 34, No. 3, 1436 1462 DOI: 10.1214/009053606000000281 Institute of Mathematical Statistics, 2006 HIGH-DIMENSIONAL GRAPHS AND VARIABLE SELECTION WITH THE LASSO BY NICOLAI

More information

Miscellanea False discovery rate for scanning statistics

Miscellanea False discovery rate for scanning statistics Biometrika (2011), 98,4,pp. 979 985 C 2011 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asr057 Miscellanea False discovery rate for scanning statistics BY D. O. SIEGMUND, N. R. ZHANG Department

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

A New Smoothing Model for Analyzing Array CGH Data

A New Smoothing Model for Analyzing Array CGH Data A New Smoothing Model for Analyzing Array CGH Data Nha Nguyen, Heng Huang, Soontorn Oraintara and An Vo Department of Electrical Engineering, University of Texas at Arlington Email: nhn3175@exchange.uta.edu,

More information

Bumpbars: Inference for region detection. Yuval Benjamini, Hebrew University

Bumpbars: Inference for region detection. Yuval Benjamini, Hebrew University Bumpbars: Inference for region detection Yuval Benjamini, Hebrew University yuvalbenj@gmail.com WHOA-PSI-2017 Collaborators Jonathan Taylor Stanford Rafael Irizarry Dana Farber, Harvard Amit Meir U of

More information

Confidence Thresholds and False Discovery Control

Confidence Thresholds and False Discovery Control Confidence Thresholds and False Discovery Control Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry Wasserman Department of Statistics

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

arxiv: v1 [hep-ex] 20 Jan 2013

arxiv: v1 [hep-ex] 20 Jan 2013 Heavy-Flavor Results from CMS arxiv:.69v [hep-ex] Jan University of Colorado on ehalf of the CMS Collaoration E-mail: keith.ulmer@colorado.edu Heavy-flavor physics offers the opportunity to make indirect

More information

Doing Cosmology with Balls and Envelopes

Doing Cosmology with Balls and Envelopes Doing Cosmology with Balls and Envelopes Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry Wasserman Department of Statistics Carnegie

More information

STRONG NORMALITY AND GENERALIZED COPELAND ERDŐS NUMBERS

STRONG NORMALITY AND GENERALIZED COPELAND ERDŐS NUMBERS #A INTEGERS 6 (206) STRONG NORMALITY AND GENERALIZED COPELAND ERDŐS NUMBERS Elliot Catt School of Mathematical and Physical Sciences, The University of Newcastle, Callaghan, New South Wales, Australia

More information

Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001).

Chapter 3: Statistical methods for estimation and testing. Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference: Statistical methods in bioinformatics by Ewens & Grant (2001). Chapter 3: Statistical methods for estimation and testing Key reference:

More information

The Mean Version One way to write the One True Regression Line is: Equation 1 - The One True Line

The Mean Version One way to write the One True Regression Line is: Equation 1 - The One True Line Chapter 27: Inferences for Regression And so, there is one more thing which might vary one more thing aout which we might want to make some inference: the slope of the least squares regression line. The

More information

Estimation of the False Discovery Rate

Estimation of the False Discovery Rate Estimation of the False Discovery Rate Coffee Talk, Bioinformatics Research Center, Sept, 2005 Jason A. Osborne, osborne@stat.ncsu.edu Department of Statistics, North Carolina State University 1 Outline

More information

Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing

Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Roust esting Yixiao Sun Department of Economics University of California, San Diego Peter C. B. Phillips Cowles Foundation, Yale University,

More information

Test for Discontinuities in Nonparametric Regression

Test for Discontinuities in Nonparametric Regression Communications of the Korean Statistical Society Vol. 15, No. 5, 2008, pp. 709 717 Test for Discontinuities in Nonparametric Regression Dongryeon Park 1) Abstract The difference of two one-sided kernel

More information

A Large-Sample Approach to Controlling the False Discovery Rate

A Large-Sample Approach to Controlling the False Discovery Rate A Large-Sample Approach to Controlling the False Discovery Rate Christopher R. Genovese Department of Statistics Carnegie Mellon University Larry Wasserman Department of Statistics Carnegie Mellon University

More information

Asymptotic inference for a nonstationary double ar(1) model

Asymptotic inference for a nonstationary double ar(1) model Asymptotic inference for a nonstationary double ar() model By SHIQING LING and DONG LI Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong maling@ust.hk malidong@ust.hk

More information

Structuring Unreliable Radio Networks

Structuring Unreliable Radio Networks Structuring Unreliale Radio Networks Keren Censor-Hillel Seth Gilert Faian Kuhn Nancy Lynch Calvin Newport March 29, 2011 Astract In this paper we study the prolem of uilding a connected dominating set

More information

Step-down FDR Procedures for Large Numbers of Hypotheses

Step-down FDR Procedures for Large Numbers of Hypotheses Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate

More information

arxiv: v4 [math.st] 29 Aug 2017

arxiv: v4 [math.st] 29 Aug 2017 A Critical Value Function Approach, with an Application to Persistent Time-Series Marcelo J. Moreira, and Rafael Mourão arxiv:66.3496v4 [math.st] 29 Aug 27 Escola de Pós-Graduação em Economia e Finanças

More information

Supervised Classification for Functional Data Using False Discovery Rate and Multivariate Functional Depth

Supervised Classification for Functional Data Using False Discovery Rate and Multivariate Functional Depth Supervised Classification for Functional Data Using False Discovery Rate and Multivariate Functional Depth Chong Ma 1 David B. Hitchcock 2 1 PhD Candidate University of South Carolina 2 Associate Professor

More information

Understanding Regressions with Observations Collected at High Frequency over Long Span

Understanding Regressions with Observations Collected at High Frequency over Long Span Understanding Regressions with Observations Collected at High Frequency over Long Span Yoosoon Chang Department of Economics, Indiana University Joon Y. Park Department of Economics, Indiana University

More information

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing Statistics Journal Club, 36-825 Beau Dabbs and Philipp Burckhardt 9-19-2014 1 Paper

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

A spatio-temporal model for extreme precipitation simulated by a climate model

A spatio-temporal model for extreme precipitation simulated by a climate model A spatio-temporal model for extreme precipitation simulated by a climate model Jonathan Jalbert Postdoctoral fellow at McGill University, Montréal Anne-Catherine Favre, Claude Bélisle and Jean-François

More information

Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC)

Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC) Supplement to: Guidelines for constructing a confidence interval for the intra-class correlation coefficient (ICC) Authors: Alexei C. Ionan, Mei-Yin C. Polley, Lisa M. McShane, Kevin K. Doin Section Page

More information

Divide-and-Conquer. Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, CSE 6331 Algorithms Steve Lai

Divide-and-Conquer. Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, CSE 6331 Algorithms Steve Lai Divide-and-Conquer Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, 33.4. CSE 6331 Algorithms Steve Lai Divide and Conquer Given an instance x of a prolem, the method works as follows: divide-and-conquer

More information

CPM: A Covariance-preserving Projection Method

CPM: A Covariance-preserving Projection Method CPM: A Covariance-preserving Projection Method Jieping Ye Tao Xiong Ravi Janardan Astract Dimension reduction is critical in many areas of data mining and machine learning. In this paper, a Covariance-preserving

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research

Models for models. Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Models for models Douglas Nychka Geophysical Statistics Project National Center for Atmospheric Research Outline Statistical models and tools Spatial fields (Wavelets) Climate regimes (Regression and clustering)

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Example: Bipolar NRZ (non-return-to-zero) signaling

Example: Bipolar NRZ (non-return-to-zero) signaling Baseand Data Transmission Data are sent without using a carrier signal Example: Bipolar NRZ (non-return-to-zero signaling is represented y is represented y T A -A T : it duration is represented y BT. Passand

More information

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University Multiple Testing Hoang Tran Department of Statistics, Florida State University Large-Scale Testing Examples: Microarray data: testing differences in gene expression between two traits/conditions Microbiome

More information

Parametric Empirical Bayes Methods for Microarrays

Parametric Empirical Bayes Methods for Microarrays Parametric Empirical Bayes Methods for Microarrays Ming Yuan, Deepayan Sarkar, Michael Newton and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 General Model Structure: Two Conditions

More information

A nonparametric test for seasonal unit roots

A nonparametric test for seasonal unit roots Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna To be presented in Innsbruck November 7, 2007 Abstract We consider a nonparametric test for the

More information

Testing near or at the Boundary of the Parameter Space (Job Market Paper)

Testing near or at the Boundary of the Parameter Space (Job Market Paper) Testing near or at the Boundary of the Parameter Space (Jo Market Paper) Philipp Ketz Brown University Novemer 7, 24 Statistical inference aout a scalar parameter is often performed using the two-sided

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

IN this paper, we consider the estimation of the frequency

IN this paper, we consider the estimation of the frequency Iterative Frequency Estimation y Interpolation on Fourier Coefficients Elias Aoutanios, MIEEE, Bernard Mulgrew, MIEEE Astract The estimation of the frequency of a complex exponential is a prolem that is

More information

Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model

Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model arxiv:1203.4394v2 [stat.ap] 22 Jan 2013 Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model The Minh Luong, Yves Rozenholc, and Gregory Nuel MAP5,

More information

Probabilistic Inference for Multiple Testing

Probabilistic Inference for Multiple Testing This is the title page! This is the title page! Probabilistic Inference for Multiple Testing Chuanhai Liu and Jun Xie Department of Statistics, Purdue University, West Lafayette, IN 47907. E-mail: chuanhai,

More information

Testing equality of autocovariance operators for functional time series

Testing equality of autocovariance operators for functional time series Testing equality of autocovariance operators for functional time series Dimitrios PILAVAKIS, Efstathios PAPARODITIS and Theofanis SAPATIAS Department of Mathematics and Statistics, University of Cyprus,

More information

Sanat Sarkar Department of Statistics, Temple University Philadelphia, PA 19122, U.S.A. September 11, Abstract

Sanat Sarkar Department of Statistics, Temple University Philadelphia, PA 19122, U.S.A. September 11, Abstract Adaptive Controls of FWER and FDR Under Block Dependence arxiv:1611.03155v1 [stat.me] 10 Nov 2016 Wenge Guo Department of Mathematical Sciences New Jersey Institute of Technology Newark, NJ 07102, U.S.A.

More information

arxiv: v1 [cs.fl] 24 Nov 2017

arxiv: v1 [cs.fl] 24 Nov 2017 (Biased) Majority Rule Cellular Automata Bernd Gärtner and Ahad N. Zehmakan Department of Computer Science, ETH Zurich arxiv:1711.1090v1 [cs.fl] 4 Nov 017 Astract Consider a graph G = (V, E) and a random

More information

Module 9: Further Numbers and Equations. Numbers and Indices. The aim of this lesson is to enable you to: work with rational and irrational numbers

Module 9: Further Numbers and Equations. Numbers and Indices. The aim of this lesson is to enable you to: work with rational and irrational numbers Module 9: Further Numers and Equations Lesson Aims The aim of this lesson is to enale you to: wor with rational and irrational numers wor with surds to rationalise the denominator when calculating interest,

More information

ROBUST FREQUENCY DOMAIN ARMA MODELLING. Jonas Gillberg Fredrik Gustafsson Rik Pintelon

ROBUST FREQUENCY DOMAIN ARMA MODELLING. Jonas Gillberg Fredrik Gustafsson Rik Pintelon ROBUST FREQUENCY DOMAIN ARMA MODELLING Jonas Gillerg Fredrik Gustafsson Rik Pintelon Department of Electrical Engineering, Linköping University SE-581 83 Linköping, Sweden Email: gillerg@isy.liu.se, fredrik@isy.liu.se

More information