Bootstrapping Dependent Data


One of the key issues confronting bootstrap resampling approximations is how to deal with dependent data. Consider a sequence $\{X_t\}_{t=1}^{n}$ of dependent random variables. Clearly it would be a mistake to resample scalar quantities from the sequence, as the reshuffled resamples would break the temporal dependence. Our goal is most often to learn the variance of a general statistic $T_n(X_1, \ldots, X_n)$; we hereafter refer to the unknown variance as $\sigma^2$. The quantity $\sigma^2$ may not be calculable analytically, because the dependence structure and the underlying distribution of the innovations are not assumed to be known.

In 1985, Hall examined the problem of bootstrap estimation for data that was spatial in character. His proposed methods could be applied to time-series data, although the specific details of his results cannot be directly applied. (To see why, consider the case of two-dimensional spatial data. Rather than a sequence of time-series variables, the underlying components are rectangles. One assumption is that the ratio of the lengths of two adjoining edges of the rectangle is constant, which has no natural analog in time-series data.) For the fixed-block bootstrap, he proposes dividing the series into nonoverlapping blocks of equal length $l \le n$. For the moving-block bootstrap, he proposes dividing the series into $n - l + 1$ overlapping blocks of equal length $l$.

To fix ideas, consider the sample $\{x_1, x_2, x_3, x_4\}$ with block length $l = 2$. The fixed-block bootstrap is obtained by constructing the statistic of interest for each member of the set $\{(x_1, x_2),\ (x_3, x_4)\}$. The moving-block bootstrap is obtained by constructing the statistic of interest for each member of the set $\{(x_1, x_2),\ (x_2, x_3),\ (x_3, x_4)\}$.
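
To make the two constructions concrete, here is a minimal Python sketch (the notes contain no code; the language and the helper names `fixed_blocks` and `moving_blocks` are my own) that reproduces the two block sets above:

```python
import numpy as np

def fixed_blocks(x, l):
    """Non-overlapping blocks of length l; trailing observations are dropped."""
    k = len(x) // l
    return [x[t * l : (t + 1) * l] for t in range(k)]

def moving_blocks(x, l):
    """All n - l + 1 overlapping blocks of length l."""
    return [x[t : t + l] for t in range(len(x) - l + 1)]

x = np.array([1.0, 2.0, 3.0, 4.0])   # stands in for (x_1, x_2, x_3, x_4)
print(fixed_blocks(x, 2))            # [(x_1, x_2), (x_3, x_4)]
print(moving_blocks(x, 2))           # [(x_1, x_2), (x_2, x_3), (x_3, x_4)]
```

Note that the fixed scheme discards trailing observations that do not fill a block, a point that recurs in the $n = 100$, $l = 8$ example below.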

The intuition underpinning the fixed-block bootstrap is as follows. The moving-block bootstrap has many samples that share a large number of observations; in this way there is redundancy. The fixed-block bootstrap avoids such redundancy. Further, if $l$ grows with $n$, then a statistic constructed from a given subsample will eventually behave as though it is independent of the statistics constructed from all but two (the adjacent two) of the other subsamples. In addition, $l$ should grow with $n$ to allow long-lived dynamics to be captured. One natural choice would be $l = cn$, with $0 < c < 1$, as the subsamples would then be of the same order of magnitude as the original data. Unfortunately, such an approach would not provide enough subsamples, as we would have only about $1/c$ subsamples regardless of $n$. We require that $l$ increase more slowly, so that $l/n \to 0$.

In 1986, Carlstein independently developed the fixed-block bootstrap for stationary, $\alpha$-mixing sequences. Formally, let $\{X_t,\ -\infty < t < \infty\}$ be a strictly stationary sequence defined on a probability space $(\Omega, \mathcal{F}, P)$. The function $T_n(x_1, \ldots, x_n)$, from $\mathbb{R}^n$ to $\mathbb{R}$, is defined so that $T_n(X_1(\omega), \ldots, X_n(\omega))$ is $\mathcal{F}$-measurable. Fixed blocks of data are defined as

$$X_t^l = \left( X_{tl+1}, X_{tl+2}, \ldots, X_{tl+l} \right),$$

so the whole sample is denoted $X_0^n$. A general statistic defined for the fixed block is $T_t^l = T_l\left(X_t^l\right)$; for example, the sample mean is $\bar{X}_t^l = \frac{1}{l} \sum_{j=1}^{l} X_{tl+j}$. The statistic is appropriately standardized, so that for the unknown variance $\sigma^2$

$$\lim_{n \to \infty} E\left[ n^{1/2} \left( T_n^t - E\,T_n^0 \right) \right]^2 = \sigma^2 \in (0, \infty),$$

which is clear for the case of the sample mean, with $\mu = E X_t$:

$$\lim_{n \to \infty} n\,E\left( \frac{1}{n} \sum_{t=1}^{n} X_t - \mu \right)^{\!2} = \sigma^2.$$

The value of the statistic for each of the fixed blocks is denoted $T_t^l$, $0 \le t \le k_n - 1$, where $k_n = \lfloor n/l \rfloor$. For example, let $n = 100$ and $l = 8$, so $k_n = \lfloor 12.5 \rfloor = 12$ and

$$T_0^8 = T_8(X_1, \ldots, X_8),\quad T_1^8 = T_8(X_9, \ldots, X_{16}),\ \ldots,\ T_{11}^8 = T_8(X_{89}, \ldots, X_{96}),$$

so the last four observations are not used. To construct the estimator of $\sigma^2$, first construct the average value of the statistic across the subsamples,

$$\bar{T} = \frac{1}{k_n} \sum_{t=0}^{k_n - 1} T_t^l.$$

With the sample average in hand, the variance estimator is simply the standard variance estimator applied to the standardized block statistics:

$$\hat{\sigma}^2_{FBoot} = \frac{l}{k_n} \sum_{t=0}^{k_n - 1} \left( T_t^l - \bar{T} \right)^2.$$
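
The estimator takes only a few lines to sketch. Continuing the $n = 100$, $l = 8$ example, with an AR(1) test series and the function name `fboot_variance` as my own illustrative choices:

```python
import numpy as np

def fboot_variance(x, l, stat=np.mean):
    """Carlstein's fixed-block estimate of sigma^2: compute the statistic on
    each of the k_n = floor(n/l) nonoverlapping blocks, then take l times
    the empirical variance of the block statistics."""
    k_n = len(x) // l
    t_l = np.array([stat(x[t * l : (t + 1) * l]) for t in range(k_n)])
    t_bar = t_l.mean()                      # T-bar, the average across blocks
    return l * np.mean((t_l - t_bar) ** 2)  # sigma2_hat_FBoot

# AR(1) test series: X_t = 0.5 X_{t-1} + U_t, U_t ~ N(0, 1).
rng = np.random.default_rng(0)
n, l, rho = 100, 8, 0.5
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()

# With n = 100 and l = 8 there are k_n = 12 blocks; X_97,...,X_100 are unused.
# For the mean of this AR(1), the target long-run variance is 1/(1-rho)^2 = 4.
print(fboot_variance(x, l))
```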

For comparison, consider the variance of the sample mean: there the target is $\sigma^2 = \lim_{n \to \infty} n\,E\left( \bar{X}_n - \mu \right)^2$, and $\hat{\sigma}^2_{FBoot}$ is just $l$ times the empirical variance of the block means. Observe that there is no randomness in the construction of $\hat{\sigma}^2_{FBoot}$: the statistic of interest is calculated for each subsample (that is, for each fixed block) and the variance is estimated directly. In this way, as Künsch (1989) argues, the fixed-block bootstrap is really closer to the jackknife than to the moving-block bootstrap. For the jackknife, one deletes each block of $l$ consecutive observations once and calculates the sample variance of the statistics constructed from the $n - l + 1$ samples of length $n - l$. Thus the jackknife differs from the fixed-block bootstrap in that overlapping subsamples are used (and in that tapering is used to make a smooth transition between observations omitted and observations included). For the arithmetic mean, Künsch argues that the fixed-block bootstrap and the jackknife are equivalent. For more complicated statistics they are not, and Künsch argues that the jackknife outperforms the fixed-block bootstrap.

Carlstein shows that if $l \to \infty$ and $l/n \to 0$ as $n \to \infty$, then $\hat{\sigma}^2_{FBoot} \overset{P}{\to} \sigma^2$. How should one choose $l$ in practice? Increasing $l$ reduces bias and captures more persistent dependence. Decreasing $l$ reduces variance, as more subsamples become available. The trade-off between bias and variance leads one to consider mean squared error as the optimality criterion. Because construction of the MSE depends on knowledge of the underlying data-generating process, no general optimality results are available. For the special case in which

$$X_t = \rho X_{t-1} + U_t, \qquad |\rho| < 1, \qquad U_t \overset{iid}{\sim} N(0, \sigma_U^2),$$

the value of the block length that minimizes first-order MSE is

$$l_n = \left( \frac{2|\rho|}{1 - \rho^2} \right)^{2/3} n^{1/3}.$$

Sensibly, the block length increases with the magnitude of $\rho$.
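
Read as a rule of thumb, the first-order MSE-optimal choice gives block lengths such as the following (a sketch under the formula as stated above; `optimal_block_length` is a hypothetical name):

```python
def optimal_block_length(rho, n):
    """Block length minimizing first-order MSE in the AR(1) special case,
    l_n = (2|rho| / (1 - rho**2))**(2/3) * n**(1/3), rounded to an integer."""
    l = (2 * abs(rho) / (1 - rho ** 2)) ** (2 / 3) * n ** (1 / 3)
    return max(1, round(l))

for rho in (0.2, 0.5, 0.9):
    print(rho, optimal_block_length(rho, n=100))   # 3, 6, 21: grows with |rho|
```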

In a 1989 paper rich with results, Künsch explored the moving-block bootstrap (as well as the jackknife, about which we have little to say here). Künsch is clear that both the moving-block and fixed-block methods are appropriate only for statistics constructed from the empirical distribution function, as Hall makes clear from the outset in his book. To construct the potential blocks of data for the moving-block bootstrap, we now index blocks by their starting point and let

$$X_t^l = \left( X_{t+1}, X_{t+2}, \ldots, X_{t+l} \right), \qquad t = 0, 1, \ldots, n - l,$$

and note that there are $n - l + 1$ possible overlapping blocks. (For the case in which $n = 100$ and $l = 8$, the fixed-block bootstrap used 12 nonoverlapping subsamples, while there are 93 potential overlapping blocks for the moving-block bootstrap.) Unlike the fixed-block bootstrap, there is randomness in the moving-block bootstrap, as the potential overlap of the blocks does not make clear precisely which subsamples should be used. If we let $S_t$ be a random variable distributed uniformly on the integers $\{0, 1, \ldots, n - l\}$, then the moving-block bootstrap begins by constructing a sample of length $kl$ (Künsch assumes that $kl = n$, so we do as well in what follows) as

$$X_{S_1}^l, X_{S_2}^l, \ldots, X_{S_k}^l.$$

The statistic of interest is calculated for the entire bootstrap sample, rather than from the subsamples as in the fixed-block bootstrap, and is denoted

$$T_n^* = T_n\left( X_{S_1}^l, X_{S_2}^l, \ldots, X_{S_k}^l \right).$$

The moving-block estimator of the variance is

$$\hat{\sigma}^2_{MBoot} = \mathrm{Var}^*\left( T_n^* \right) = E^*\left( T_n^* - E^* T_n^* \right)^2,$$

where $E^*$ denotes expectation with respect to $S_1, \ldots, S_k$. Künsch shows that if $l \to \infty$ and $l/n \to 0$, then $n\,\hat{\sigma}^2_{MBoot} \overset{P}{\to} \sigma^2$.

There has been no mention of Monte Carlo resampling. That is because the bootstrap is defined literally as the variance of the statistic constructed from all possible subsamples. In most applications $\hat{\sigma}^2_{MBoot}$ must be evaluated by Monte Carlo simulation (a sketch of such a calculation appears below, after the analytic example). To illustrate how such a quantity can be calculated without computer simulation, consider estimation of the (arithmetic) mean from blocks of length $l = 2$. Because we are estimating the mean, calculation of the statistic on the entire bootstrap sample is equivalent to calculating the statistic on each block and averaging the block means,

$$T_n^* = \frac{1}{k} \sum_{t=1}^{k} W_{n,t},$$

where $W_{n,t}$ is the average from block $t$. Because each block is equally likely to be sampled, the $W_{n,t}$ are i.i.d. with

$$P\left( W_{n,t} = \frac{X_{j+1} + X_{j+2}}{2} \right) = \frac{1}{n - 1} \qquad \text{for each } j = 0, 1, \ldots, n - 2.$$

We have

$$E^*\left( T_n^* \mid X_1, \ldots, X_n \right) = E^* W_{n,1} = \frac{1}{n - 1} \sum_{j=0}^{n-2} \frac{1}{2} \sum_{i=1}^{2} X_{j+i} = \frac{1}{2(n - 1)} \sum_{t=1}^{n} c_t X_t,$$

where $c_t = \min(t - 1,\, n - 2) - \max(t - 2,\, 0) + 1$ is a counter that indexes the number of appearances of each $X_t$ in the total sum. For example, $X_1$ appears in only the first block, so $c_1 = 1$. Similarly, $X_2$ appears in the first two blocks, so $c_2 = 2$. We thus have an analytic expression for the expectation of the moving-block bootstrap estimator of the mean. Of course, we are typically interested in the variance of the estimator. We have

$$\mathrm{Var}^*\left( T_n^* \mid X_1, \ldots, X_n \right) = \frac{1}{k} \mathrm{Var}^*\left( W_{n,1} \right) = \frac{1}{k(n - 1)} \sum_{j=0}^{n-2} \left( \frac{1}{2} \sum_{i=1}^{2} X_{j+i} - E^* W_{n,1} \right)^{\!2},$$

which provides an analytic expression for the variance of the bootstrap estimator.
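
These closed-form expressions are straightforward to verify numerically. The sketch below (helper name mine) computes $E^* W_{n,1}$ through the counter $c_t$ and checks it, together with $\mathrm{Var}^*(W_{n,1})$, against direct enumeration of the $n - 1$ equally likely blocks:

```python
import numpy as np

def mb_mean_moments(x):
    """Analytic moments of the moving-block bootstrap mean for block length
    l = 2, checked against direct enumeration over the n - 1 blocks."""
    n = len(x)
    # c_t = min(t-1, n-2) - max(t-2, 0) + 1 counts the blocks containing X_t.
    c = np.array([min(t - 1, n - 2) - max(t - 2, 0) + 1 for t in range(1, n + 1)])
    e_star = (c * x).sum() / (2 * (n - 1))        # E*(W_{n,1}) via the counter
    block_means = (x[:-1] + x[1:]) / 2            # all n - 1 block averages
    var_w = np.mean((block_means - e_star) ** 2)  # Var*(W_{n,1})
    var_tn = var_w / (n // 2)                     # Var*(T_n* | X) with k = n/l
    return e_star, var_w, var_tn, block_means

x = np.arange(1.0, 11.0)                          # X_1, ..., X_10
e_star, var_w, var_tn, bm = mb_mean_moments(x)
print(np.isclose(e_star, bm.mean()))              # True: counter matches enumeration
print(np.isclose(var_w, bm.var()))                # True: same variance either way
print(var_tn)                                     # analytic Var*(T_n* | X)
```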

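For statistics other than the mean, no such closed form is available, and the Monte Carlo evaluation mentioned above might look like the following sketch (the median statistic, the replication count `B`, and the function name are my own illustrative choices):

```python
import numpy as np

def mboot_variance_mc(x, l, stat=np.median, B=2000, seed=0):
    """Monte Carlo approximation of sigma2_hat_MBoot = Var*(T_n*): draw
    k = n/l block starts S_1, ..., S_k uniformly on {0, ..., n-l}, paste the
    blocks into a bootstrap series, recompute the statistic, and repeat."""
    rng = np.random.default_rng(seed)
    n = len(x)
    k = n // l                                    # Kunsch's k * l = n
    t_star = np.empty(B)
    for b in range(B):
        starts = rng.integers(0, n - l + 1, size=k)
        x_star = np.concatenate([x[s : s + l] for s in starts])
        t_star[b] = stat(x_star)
    return t_star.var()                           # approximates Var*(T_n*)

# AR(1) test series again; per the text, n * sigma2_hat_MBoot estimates sigma^2.
rng = np.random.default_rng(1)
n, l, rho = 96, 8, 0.5
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()
print(n * mboot_variance_mc(x, l))
```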
We have E (T n jx ; : : : ; X n ) = EW n; = n + = X t c t ; n + t= j=0 X i= X j+i where c t = [in (t ; n ) ax (t ; 0) + ] is a counter that indexes the nuber of appearances of each X t in the total su. For exaple, X appears in only the rst block, so c =. Siilarly, X appears in the rst two blocks and c =. We thus have an analytic expression for the expectation of the ovingblock bootstrap estiator of the ean. Of course, we are typically interested in the variance of the estiator. We have V ar (T n jx ; : : : ; X n ) = k V ar (W n;) = k n + j=0 X (X j+i EW n; ) ; which provides an analytic expression for the variance of the bootstrap estiator. References Carlstein, E., 986, The Use of Subseries Values for Estiating the Variance of a General Statistic fro a Stationary Sequence Annals of Statistics 4, 7-79. Hall, P., 985, Resapling a Coverage Pattern Stochastic Processes and their Applications 0, 3-46. K unsch, :: H., 989, The Jackknife and the Bootstrap for General Stationary Observations Annals of Statistics 7, 7-4. i= 5