Package FDRSeg. September 20, PDF Free Download

Type Package Package FDRSeg September 20, 2017 Title FDR-Control in Multiscale Change-Point Segmentation Version 1.0-3 Date 2017-09-20 Author Housen Li [aut], Hannes Sieling [aut], Timo Aspelmeier [cre] Maintainer Timo Aspelmeier <timo.aspelmeier@mathematik.uni-goettingen.de> Estimate step functions via multiscale inference with controlled false discovery rate (FDR). For details see H. Li, A. Munk and H. Sieling (2016) <doi:10.1214/16-ejs1131>. Depends R (>= 2.15.3) License GPL-3 Imports Rcpp (>= 0.11.5), stepr (>= 1.0-1), stats LinkingTo Rcpp NeedsCompilation yes Repository CRAN Date/Publication 2017-09-20 08:34:11 UTC R topics documented: FDRSeg-package...................................... 2 computefdp......................................... 4 dfdrseg........................................... 5 evalstepfun......................................... 6 fdrseg............................................ 7 simulquantile........................................ 9 smuce............................................ 10 teethfun........................................... 12 v_measure.......................................... 13 Index 14 1

2 FDRSeg-package FDRSeg-package FDR-Control in Multiscale Change-Point Segmentation Details Estimate step functions via multiscale inference with controlled false discovery rate (FDR). For details see H. Li, A. Munk and H. Sieling (2016) <doi:10.1214/16-ejs1131>. Package: FDRSeg Type: Package Version: 1.0-3 Date: 2017-09-20 License: The GNU General Public License Index: computefdp dfdrseg evalstepfun fdrseg simulquantile smuce teethfun v_measure Compute false discovery proportion (FDP) Piecewise constant regression with D-FDRSeg Evaluate step function Piecewise constant regression with FDRSeg Quantile simulations Piecewise constant regression with SMUCE Teeth function Compute V-measure Author(s) Housen Li [aut], Hannes Sieling [aut], Timo Aspelmeier [cre] Maintainer: Timo Aspelmeier <timo.aspelmeier@mathematik.uni-goettingen.de> Frick, K., Munk, A., and Sieling, H. (2014). Multiscale Change-Point Inference. J. R. Statist. Soc. B, with discussion and rejoinder by the authors, 76:495 580. Hotz, T., Schuette, O. M., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013). Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on Nanobioscience, 12(4), 376 86. Li, H., Munk, A., and Sieling, H. (2015). FDR-control in multiscale change-point segmentation. arxiv:1412.5844. smucer, jsmurf

FDRSeg-package 3 library(stepr) ## (I) Independent Gaussian Data # simulate data n <- 300 # number of observations K <- 20 # number of change-points u0 <- teethfun(n, K) set.seed(2) Y <- rnorm(n, u0, 0.3) # plot data plot(y, pch = 20, col = "grey", ylab = "") lines(u0, type = "s") # estimate standard deviation sd <- sdrobnorm(y) # simulate quantiles alpha <- 0.1 qs <- simulquantile(1 - alpha, n, type = "smuce") # for SMUCE qfs <- simulquantile(1 - alpha, n, type = "fdrseg") # for FDRSeg # compute estimates us <- smuce(y, qs, sd = sd) # SMUCE ufs <- fdrseg(y, qfs, sd = sd) # FDRSeg # plot results lines(evalstepfun(us), type = "s", col = "blue") lines(evalstepfun(ufs), type = "s", col = "red") legend("topleft", c("truth", "SMUCE", "FDRSeg"), lty = c(1, 1, 1), col = c("black", "blue", "red")) ## (II) Dependent Gaussian Data # simulate data (a continuous time Markov chain) ts <- 0.1 # sampling time SNR <- 3 # signal-to-noise ratio sampling <- 1e4 # sampling rate 10 khz over <- 10 # tenfold oversampling cutoff <- 1e3 # 1 khz 4-pole Bessel-filter, adjusted for oversampling simdf <- dfilter("bessel", list(pole=4, cutoff=cutoff/sampling/over)) transrate <- 50 rates <- rbind(c(0, transrate), c(transrate, 0)) set.seed(123) sim <- contmc(ts*sampling, c(0,snr), rates, sampling = sampling, family = "gausskern", param = list(df=simdf, over=over, sd=1)) Y <- sim$data$y x <- sim$data$x # D-FDRseg convkern <- dfilter("bessel", list(pole=4, cutoff=cutoff/sampling))$kern alpha <- 0.1 r <- 10 # r could be much larger

4 computefdp qdfs udfs <- simulquantile(1 - alpha, ts*sampling, r, "dfdrseg", convkern) <- dfdrseg(y, qdfs, convkern = convkern) # plot results plot(x, Y, pch = 20, col = "grey", xlab="", ylab = "", main = "Simulate Ion Channel Data") lines(sim$discr, col = "blue") lines(x, evalstepfun(udfs), col = "red") legend("topleft", c("truth", "D-FDRSeg"), lty = c(1, 1), col = c("blue", "red")) computefdp Compute false discovery proportion (FDP) Compute false discovery proportion for estimated change-points, see (Li et al., 2015) for a detailed explanation. computefdp(u, ej) u ej true signal; a numeric vector estimated change-points; a numeric vector A scalar takes value in [0, 1]. Li, H., Munk, A., and Sieling, H. (2015). FDR-control in multiscale change-point segmentation. arxiv:1412.5844. fdrseg, v_measure # simulate data set.seed(2) u0 <- c(rep(1, 50), rep(5, 50)) Y <- rnorm(100, u0) # compute FDRSeg uh <- fdrseg(y)

dfdrseg 5 plot(y, pch = 20, col = "grey", xlab = "", ylab = "") lines(u0, type = "s", col = "blue") lines(evalstepfun(uh), type = "s", col = "red") legend("topleft", c("truth", "FDRSeg"), lty = c(1, 1), col = c("blue", "red")) # compute false discovery proportion fdp <- computefdp(u0, uh$left) cat("false discovery propostion is ", fdp, "\n") dfdrseg Piecewise constant regression with D-FDRSeg Compute the D-FDRSeg estimator for one-dimensional data with dependent Gaussian noises, especially for ion channel recordings, see (Hotz et al., 2013; Li et al., 2015) for further details. dfdrseg(y, q, alpha = 0.1, r = round(50/min(alpha, 1-alpha)), convkern, sd = stepr::sdrobnorm(y, lag=length(convkern)+1)) Y q alpha r a numeric vector containing the noisy data threshold value; a numeric vector of the same length as the data significance level; if q is missing, q is chosen as the (1-alpha)-quantile of the null distribution of the multiscale statistic via Monte Carlo simulation, see (Li et al., 2015) for an explanation numer of Monte Carlo simulations convkern kernel of the low-pass filter, see (Li et al., 2015) sd A list with components standard deviation of noises value left n function values on each segment of the estimator indices of leftmost points within each segment of the estimator number of samples Hotz, T., Schuette, O. M., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013). Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on Nanobioscience, 12(4), 376-86. Li, H., Munk, A., and Sieling, H. (2015). FDR-control in multiscale change-point segmentation. arxiv:1412.5844.

6 evalstepfun smuce, dfdrseg, jsmurf, simulquantile, sdrobnorm, contmc, dfilter, evalstepfun library(stepr) # simulate data (a continuous time Markov chain) ts <- 0.1 # sampling time SNR <- 3 # signal-to-noise ratio sampling <- 1e4 # sampling rate 10 khz over <- 10 # tenfold oversampling cutoff <- 1e3 # 1 khz 4-pole Bessel-filter, adjusted for oversampling simdf <- dfilter("bessel", list(pole=4, cutoff=cutoff/sampling/over)) transrate <- 50 rates <- rbind(c(0, transrate), c(transrate, 0)) set.seed(123) sim <- contmc(ts*sampling, c(0,snr), rates, sampling = sampling, family = "gausskern", param = list(df=simdf, over=over, sd=1)) Y <- sim$data$y x <- sim$data$x # D-FDRseg library(stepr) convkern <- dfilter("bessel", list(pole=4, cutoff=cutoff/sampling))$kern uh <- dfdrseg(y, convkern = convkern, r = 10) # r could be much larger # plot results plot(x, Y, pch = 20, col = "grey", xlab="", ylab = "", main = "Simulate Ion Channel Data") lines(sim$discr, col = "blue") lines(x, evalstepfun(uh), col = "red") legend("topleft", c("truth", "D-FDRSeg"), lty = c(1, 1), col = c("blue", "red")) ## Not run: # alternatively simulate quantiles first alpha <- 0.1 q <- simulquantile(1 - alpha, ts*sampling, type = "dfdrseg", convkern = convkern) # then compute the estimate uh <- dfdrseg(y, q, convkern = convkern) ## End(Not run) evalstepfun Evaluate step function Transform the return value by smuce, fdrseg, or dfdrseg into a numeric vector.

fdrseg 7 evalstepfun(stepf) stepf a list returned by smuce, fdrseg, or dfdrseg with components value function values on each segment of the estimator left indices of leftmost points within each segment of the estimator n number of samples A numeric vector gives function values of stepf at sampling locations. smuce, fdrseg, dfdrseg # simulate data set.seed(2) u0 <- c(rep(1, 5), rep(5, 5)) Y <- rnorm(10, u0) # compute the SMUCE estimate uh <- smuce(y) # print results # step function returned by smuce print(uh) # vector returned by evalstepfun print(evalstepfun(uh)) fdrseg Piecewise constant regression with FDRSeg Compute the FDRSeg estimator for one-dimensional data with i.i.d. Gaussian noises. fdrseg(y, q, alpha = 0.1, r = round(50/min(alpha, 1-alpha)), sd = stepr::sdrobnorm(y))

8 fdrseg Y q alpha r sd a numeric vector containing the noisy data threshold value; a numeric vector of the same length as the data significance level; if q is missing, q is chosen as the (1-alpha)-quantile of the null distribution of the multiscale statistic via Monte Carlo simulation, see (Li et al., 2015) for an explanation numer of Monte Carlo simulations standard deviation of noises A list with components value left n function values on each segment of the estimator indices of leftmost points within each segment of the estimator number of samples Li, H., Munk, A., and Sieling, H. (2015). FDR-control in multiscale change-point segmentation. arxiv:1412.5844. smuce, dfdrseg, simulquantile, sdrobnorm, evalstepfun, computefdp, v_measure # simulate data set.seed(123) u0 <- c(rep(1, 50), rep(5, 50)) Y <- rnorm(100, u0) # compute the estimate (q is automatically simulated) # it might take a while due to simulating quantiles and will # be faster for later calls on signals of the same length uh <- fdrseg(y) # plot result plot(y, pch = 20, col = "grey", ylab = "", main = expression(alpha*" = 0.1")) lines(u0, type = "s", col = "blue") lines(evalstepfun(uh), type = "s", col = "red") legend("topleft", c("truth", "FDRSeg"), lty = c(1, 1), col = c("blue", "red")) # other choice of alpha uh <- fdrseg(y, alpha = 0.05) # plot result plot(y, pch = 20, col = "grey", ylab = "", main = expression(alpha*" = 0.05"))

simulquantile 9 lines(u0, type = "s", col = "blue") lines(evalstepfun(uh), type = "s", col = "red") legend("topleft", c("truth", "FDRSeg"), lty = c(1, 1), col = c("blue", "red")) ## Not run: # alternatively simulate quantiles first alpha <- 0.1 q <- simulquantile(1 - alpha, 100, type = "fdrseg") # then compute the estimate uh <- fdrseg(y, q) ## End(Not run) simulquantile Quantile simulations Simulate the quantiles of multiscale statistics for SMUCE, FDRSeg, and D-FDRSeg under null hypothesis. simulquantile(alpha, n, r = round(50/min(alpha, 1-alpha)), type = c("smuce","fdrseg","dfdrseg"), convkern, pos =.GlobalEnv) alpha n r a scalar with values in [0, 1]; the alpha-quantile of the null distribution of the multiscale statistic for SMUCE, FDRSeg, or D-FDRSeg via Monte Carlo simulation, see (Frick et al., 2014; Hotz et al., 2013; Li et al., 2015) for an explanation number of observations numer of Monte Carlo simulations type "smuce" simulate quantile for SMUCE "fdrseg" simulate quantiles for FDRSeg "dfdrseg" simulate quantiles for D-FDRSeg convkern pos convolution kernel, only needed when type is "dfdrseg" environment for saving the simulations for possible later usage A scalar value if type is chosen as "smuce"; a numeric vector of length n if type is chosen as "fdrseg" or "dfdrseg".

10 smuce Frick, K., Munk, A., and Sieling, H. (2014). Multiscale Change-Point Inference. J. R. Statist. Soc. B, with discussion and rejoinder by the authors, 76:495 580. Hotz, T., Schuette, O. M., Sieling, H., Polupanow, T., Diederichsen, U., Steinem, C., and Munk, A. (2013). Idealizing ion channel recordings by a jump segmentation multiresolution filter. IEEE Transactions on Nanobioscience, 12(4):376 86. Li, H., Munk, A., and Sieling, H. (2015). FDR-control in multiscale change-point segmentation. arxiv:1412.5844. smuce, fdrseg, dfdrseg library(stepr) # simulate quantiles for independent Gaussian noises qs <- simulquantile(0.9, 100, type = "smuce") qfs <- simulquantile(0.9, 100, type = "fdrseg") # plot result yrng <- range(qs, qfs) plot(qfs, pch = 20, ylim = yrng, xlab = "n", ylab = "") abline(h = qs) # simulate quantiles for dependent Gaussian noises convkern <- dfilter("bessel")$kern # create digital filters qdfs <- simulquantile(0.9, 100, type = "dfdrseg", convkern = convkern) plot(qdfs, pch = 20, xlab = "n", ylab = "") smuce Piecewise constant regression with SMUCE Compute the SMUCE estimator for one-dimensional data with i.i.d. Gaussian noises. smuce(y, q, alpha = 0.1, r = round(50/min(alpha, 1-alpha)), sd = stepr::sdrobnorm(y)) Y q alpha a numeric vector containing the noisy data threshold value; a scalar number significance level; if q is missing, q is chosen as the (1-alpha)-quantile of the null distribution of the multiscale statistic via Monte Carlo simulation, see (Frick et al., 2014) for an explanation

smuce 11 r sd numer of Monte Carlo simulations standard deviation of noises A list with components value left n function values on each segment of the estimator indices of leftmost points within each segment of the estimator number of samples Note This is an efficient implementation of function smucer in R package stepr (CRAN) for data with i.i.d. Gaussian noises. The detailed algorithm is described in (Seiling, 2013). Frick, K., Munk, A., and Sieling, H. (2014). Multiscale Change-Point Inference. J. R. Statist. Soc. B, with discussion and rejoinder by the authors, 76:495 580. Seiling, H. (2013). Statistical Multiscale Segmentation: Inference, Algorithms and Applications. PhD thesis, University of Goettingen, Germany. fdrseg, dfdrseg, simulquantile, sdrobnorm, evalstepfun, computefdp, v_measure # simulate data set.seed(2) u0 <- c(rep(1, 50), rep(5, 50)) Y <- rnorm(100, u0) # compute the estimate (q is automatically simulated) uh <- smuce(y) # plot result plot(y, pch = 20, col = "grey", ylab = "", main = expression(alpha*" = 0.1")) lines(u0, type = "s", col = "blue") lines(evalstepfun(uh), type = "s", col = "red") legend("topleft", c("truth", "SMUCE"), lty = c(1, 1), col = c("blue", "red")) # other choice of alpha uh <- smuce(y, alpha = 0.05) # plot result plot(y, pch = 20, col = "grey", ylab = "", main = expression(alpha*" = 0.05")) lines(u0, type = "s", col = "blue") lines(evalstepfun(uh), type = "s", col = "red")

12 teethfun legend("topleft", c("truth", "SMUCE"), lty = c(1, 1), col = c("blue", "red")) ## Not run: # alternatively simulate quantiles first alpha <- 0.1 q <- simulquantile(1 - alpha, 100, type = "smuce") # then compute the estimate uh <- smuce(y, q) ## End(Not run) teethfun Teeth function Creat the teeth function with specified lengths and number of change-points. teethfun(n, K, h = 3) n K h length of the vector (values of the teeth function) number of change-points height of the jump A numeric vector gives values of the teeth function. Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Ann. Statist., 42(6): 2243 1572. # create teeth function u <- teethfun(100, 6) # plot plot(u, type = "s")

v_measure 13 v_measure Compute V-measure Compute V-measure, a segmentation evaluation measure, which is based upon two criteria for clustering usefulness, homogeneity and completeness. v_measure(sig, est, beta = 1) sig est beta true signal; a numeric vector estimator; a numeric vector parameter in definition of V-measure, see (Rosenberg and Hirschberg, 2007) for details A scalar takes value in [0, 1], with a larger value indicating higher accuracy. Rosenberg, A., and Hirschberg, J. (2007). V-measure: a conditional entropy-based external cluster evaluation measures. Proc. Conf. Empirical Methods Natural Lang. Process., (June):410 420. computefdp, smuce, fdrseg, evalstepfun # simulate data u0 <- c(rep(1, 50), rep(5, 50)) Y <- rnorm(100, u0) # compute FDRSeg uh <- fdrseg(y) plot(y, pch = 20, col = "grey", xlab = "", ylab = "") lines(u0, type = "s", col = "blue") lines(evalstepfun(uh), type = "s", col = "red") legend("topleft", c("truth", "FDRSeg"), lty = c(1, 1), col = c("blue", "red")) # compute V-measure vm <- v_measure(u0, evalstepfun(uh)) print(vm)

Index Topic nonparametric computefdp, 4 dfdrseg, 5 evalstepfun, 6 fdrseg, 7 FDRSeg-package, 2 simulquantile, 9 smuce, 10 teethfun, 12 v_measure, 13 Topic package FDRSeg-package, 2 computefdp, 4, 8, 11, 13 contmc, 6 dfdrseg, 5, 6 8, 10, 11 dfilter, 6 evalstepfun, 6, 6, 8, 11, 13 FDRSeg (FDRSeg-package), 2 fdrseg, 4, 6, 7, 7, 10, 11, 13 FDRSeg-package, 2 jsmurf, 2, 6 sdrobnorm, 6, 8, 11 simulquantile, 6, 8, 9, 11 smuce, 6 8, 10, 10, 13 smucer, 2 teethfun, 12 v_measure, 4, 8, 11, 13 14

Package FDRSeg. September 20, 2017