Estimation of ordinal pattern probabilities in Gaussian processes with stationary increments


Estimation of ordinal pattern probabilities in Gaussian processes with stationary increments

Mathieu Sinn (a), Karsten Keller (b)

(a) David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1
(b) Institute of Mathematics, University of Lübeck, Wallstrasse 40, D-23560 Lübeck, Germany

Abstract: The investigation of ordinal pattern distributions is a novel approach to quantifying the complexity of time series and detecting changes in the underlying dynamics. Being fast and robust against monotone distortions, this method is particularly well-suited for the analysis of long biophysical time series where the exact calibration of the measurement device is unknown. In this paper we investigate properties of the estimators of ordinal pattern probabilities in discrete-time Gaussian processes with stationary increments. We show that better estimators than the sample frequency estimators are available because the considered processes are subject to certain statistical symmetries. Furthermore, we establish sufficient conditions for the estimators to be strongly consistent and asymptotically normal. As an application, we discuss the Zero-Crossing (ZC) estimator of the Hurst parameter in fractional Brownian motion and compare its performance to that of a similar metric estimator by simulation studies.

Key words: Time series analysis, ordinal pattern, stochastic process, estimation, fractional Brownian motion
2000 MSC: 60G15, 62M10, 60G18

Corresponding author. Email addresses: msinn@cs.uwaterloo.ca (Mathieu Sinn), keller@math.uni-luebeck.de (Karsten Keller)
Preprint submitted to Computational Statistics & Data Analysis, September 25, 2009

1. Introduction

One of the main challenges in time series analysis these days is the computational complexity due to the length and the high resolution of data sets. For instance, time series in finance, medicine or meteorology often consist of several thousands of data points. Therefore, scalability of methods is a crucial requirement. Ordinal time series analysis is a new approach to the investigation of long and complex time series (see Bandt (2004), Keller et al. (2007)). The basic idea is to consider the order relations among the values of a time series instead of the values themselves. As major advantages compared to methods which take the exact metric structure into account, ordinal methods are particularly fast and robust (see Bandt and Pompe (2002), Keller and Sinn (2005)). Moreover, the order structure is invariant with respect to different offsets or scalings of a time series, which is important for the modelling of observations where the exact calibration of the measurement device is unknown.

The key ingredient of ordinal time series analysis is the concept of ordinal patterns (or order patterns in the terminology of Bandt and Shiha (2007)). An ordinal pattern represents the order relations among a fixed number of equidistant values in a time series. If the values of the time series are pairwise different, ordinal patterns can be identified with permutations. Statistics of the distribution of ordinal patterns in a time series (and in parts of it, respectively) provide information on the dynamics of the underlying system. One such statistic is the permutation entropy introduced by Bandt and Pompe (2002) as a measure for the complexity of time series. Permutation entropy has been applied to detect and investigate qualitative changes in brain dynamics as measured by an electroencephalogram (EEG) (see, e.g., Keller and Lauffer (2002), Cao (2004), Li et al. (2007, 2008)).
Further statistics besides permutation entropy measure the symmetry of time series or quantify the amount of zigzag (see Bandt and Shiha (2007), Keller et al. (2007)). The distribution of ordinal patterns in time-discrete real-valued stochastic processes was first investigated by Bandt and Shiha (2007). For special classes of Gaussian processes, they derive the probabilities of ordinal patterns describing the order relations between three and four successive observations. Some of their results apply more generally to processes with non-degenerate

and symmetric finite-dimensional distributions. In this paper we study statistical properties of the estimators of ordinal pattern probabilities. The framework of our analysis is discrete-time real-valued Gaussian processes with stationary and non-degenerate increments. Since only the order relations are considered, our results also apply to monotone functionals of such processes. In Section 2 we give general statements on ordinal pattern probabilities and their estimation. As the distribution of ordinal patterns in processes with stationary increments is time-invariant, unbiased estimators of ordinal pattern probabilities are given by the corresponding sample frequencies. By using symmetries of the distribution of Gaussian processes, we derive estimators with strictly lower risk with respect to convex loss functions. We show that these estimators are strongly consistent and give sufficient conditions for asymptotic normality. In Section 3 we apply the results to ordinal patterns describing the order structure of three successive observations. We show that reasonable estimators of the probability of such patterns can be expressed as an affine function of the frequency of changes between upwards and downwards. When the probability of a change is monotonically related to underlying process parameters, we derive estimators of such parameters. As an example, we discuss the Zero-Crossing (ZC) estimator of the Hurst parameter in fractional Brownian motion (see Coeurjolly (2000)). Section 4 illustrates the results and compares the ZC estimator with a metric analogue by simulation studies. As an interesting finding of the simulations, an even number of changes between upwards and downwards in a sample is much more likely than an odd number when the Hurst parameter is large.

2. Ordinal patterns and their probabilities

Preliminaries. As usual, let N = {1, 2, ...} and Z = {..., −1, 0, 1, ...}, and let R^Z be the space of sequences (z_t)_{t∈Z} with z_t ∈ R for all t ∈ Z.
By B(R), B(R^n) and B(R^Z) we denote the Borel σ-algebras on R, R^n and R^Z. For d ∈ N, let S_d denote the set of permutations of {0, 1, ..., d}, which we write as (d+1)-tuples containing each of the numbers 0, 1, ..., d exactly once. By the ordinal pattern of x = (x_0, x_1, ..., x_d) ∈ R^{d+1} we understand the unique permutation

π(x) = π(x_0, x_1, ..., x_d) = (r_0, r_1, ..., r_d)

of {0, 1, ..., d} which satisfies

x_{r_0} ≥ x_{r_1} ≥ ... ≥ x_{r_d}  and  r_{i−1} > r_i if x_{r_{i−1}} = x_{r_i}, for i = 1, 2, ..., d.

The second condition is necessary to guarantee the uniqueness of (r_0, r_1, ..., r_d) if there are equal values among x_0, x_1, ..., x_d. We may regard π(x) as a representation of the rank order of x_0, x_1, ..., x_d. If x_i = x_j for i, j ∈ {0, 1, ..., d} with i < j, then x_j is ranked higher than x_i. When x_0, x_1, ..., x_d are pairwise different, the order relation between any two components of x (being either < or >) can be obtained from π(x). Ordinal time series analysis is based on counting ordinal patterns in a time series. The way of getting the ordinal pattern at some time t ∈ Z for some fixed d ∈ N is illustrated by the following example.

Example. Figure 1 shows the values of a time series (x_t)_{t∈Z} at times 24, 25, ..., 36. For t = 27 and d = 5, we have

(x_t, x_{t+1}, ..., x_{t+d}) = (0.3, 0.1, 0.5, 0.9, 0.7, 0.5).

In Figure 1, the given values are connected by black line segments. Since (r_0, r_1, ..., r_d) = (3, 4, 5, 2, 0, 1) is the only permutation of {0, 1, ..., d} satisfying x_{t+r_0} ≥ x_{t+r_1} ≥ ... ≥ x_{t+r_d} together with the tie-breaking condition above, we obtain π(x_t, x_{t+1}, ..., x_{t+d}) = (3, 4, 5, 2, 0, 1). In order to get the whole ordinal pattern distribution of a (part of a) time series for d ∈ N, one has to determine π(x_t, x_{t+1}, ..., x_{t+d}) for all times t of interest. This can be done by a very efficient algorithm which takes into account the overlapping of successive vectors (see Keller et al. (2007)). Instead of the permutation representation of an ordinal pattern, this algorithm uses an equivalent representation by a sequence of inversion numbers.
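The rank-order definition above is straightforward to implement. The following minimal sketch (the function name `ordinal_pattern` is ours, not from the paper) sorts the indices by decreasing value, breaking ties in favour of the larger index, which matches the convention that x_j is ranked higher than x_i when x_i = x_j and i < j:

```python
def ordinal_pattern(x):
    """Return the ordinal pattern (r_0, ..., r_d) of the vector x: the unique
    permutation of {0, ..., d} with x[r_0] >= x[r_1] >= ... >= x[r_d],
    where ties are broken in favour of the larger index."""
    d = len(x) - 1
    return tuple(sorted(range(d + 1), key=lambda i: (-x[i], -i)))

# The example from the text (t = 27, d = 5):
print(ordinal_pattern((0.3, 0.1, 0.5, 0.9, 0.7, 0.5)))  # (3, 4, 5, 2, 0, 1)
```

Computing this for every window of length d + 1 yields the ordinal pattern distribution of a series; the inversion-number algorithm of Keller et al. (2007) mentioned above does the same more efficiently by reusing work across overlapping windows.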

[Figure 1: A part of a time series (x_t)_{t∈Z}, where the permutation π(x) describing the order relations among the components of the vector x = (x_t, x_{t+1}, ..., x_{t+d}) with d = 5 and t = 27 is given by π(x) = (3, 4, 5, 2, 0, 1).]

The framework of analysis. Let (Ω, A) be a measurable space and X = (X_t)_{t∈Z} a sequence of measurable mappings from (Ω, A) into (R, B(R)). Let Y = (Y_t)_{t∈Z} denote the process of increments of X, given by Y_t := X_t − X_{t−1} for t ∈ Z. Suppose (Ω, A) is equipped with a family of probability measures (P_ϑ)_{ϑ∈Θ} with Θ ≠ ∅. The subscript ϑ (e.g., in E_ϑ, Var_ϑ, etc.) indicates integration with respect to P_ϑ. Note that the consideration of a family of probability measures is only necessary for Sections 3 and 4; however, we assume it from the beginning for reasons of simplicity. We assume that for every ϑ ∈ Θ the following conditions are satisfied:

(M1) Y is non-degenerate, that is, for all t_1 < t_2 < ... < t_k in Z with k ∈ N and every set B ∈ B(R^k), P_ϑ((Y_{t_1}, Y_{t_2}, ..., Y_{t_k}) ∈ B) > 0 only if B has strictly positive Lebesgue measure.

(M2) Y is stationary, that is, for all t_1 < t_2 < ... < t_k in Z with k ∈ N and every l ∈ N, (Y_{t_1}, Y_{t_2}, ..., Y_{t_k}) =_dist (Y_{t_1+l}, Y_{t_2+l}, ..., Y_{t_k+l}), where =_dist denotes equality in distribution.

(M3) Y is zero-mean Gaussian.

Note that the class of models satisfying (M1)-(M3) includes equidistant discretizations of fractional Brownian motion (fBm) with Hurst parameter H ∈ (0, 1) (see Taqqu (2003) and the end of Section 3). As a consequence of (M1), the values of X are pairwise different P_ϑ-almost surely for all ϑ ∈ Θ, that is,

P_ϑ(X_s = X_t) = 0   (1)

for all s, t ∈ Z with s ≠ t. For k ∈ Z and ϑ ∈ Θ, define

ρ_ϑ(k) := Corr_ϑ(Y_0, Y_k).   (2)

Ordinal patterns. Let d ∈ N. By the ordinal pattern of order d at time t in X, we mean the random permutation given by Π(t) := π(X_t, X_{t+1}, ..., X_{t+d}) for t ∈ Z. In this section, we study the distribution of the ordinal pattern process (Π(t))_{t∈Z} and the problem of estimating ordinal pattern probabilities. Note that we could define Π(t) as a causal filter depending only on the past values X_{t−d}, X_{t−d+1}, ..., X_t. The above non-causal definition is just for the sake of simpler notation in some proofs. Clearly, if h is a strictly monotone mapping from R onto R, then π(x) = π(h(x_0), h(x_1), ..., h(x_d)) for all x = (x_0, x_1, ..., x_d) ∈ R^{d+1}. Consequently, the ordinal patterns in X and h(X) are identical. Note that the mapping h may be random. For instance, when A and B are random variables with values in R and (0, ∞), respectively, the ordinal patterns in X and A + B·X are identical.

Stationarity. For y = (y_1, y_2, ..., y_d) ∈ R^d, define π̃(y) := π(0, y_1, y_1 + y_2, ..., y_1 + y_2 + ... + y_d). Let x = (x_0, x_1, ..., x_d) ∈ R^{d+1}. Clearly, π(x) = π(x_0 − x_0, x_1 − x_0, ..., x_d − x_0). Furthermore, for i ∈ {1, 2, ..., d}, we can write x_i − x_0 as the sum of the differences x_1 − x_0, x_2 − x_1, ..., x_i − x_{i−1}. Therefore, π(x) = π̃(x_1 − x_0, x_2 − x_1, ..., x_d − x_{d−1}). This shows that, for every t ∈ Z, the ordinal pattern Π(t) only depends on the increments Y_{t+1}, Y_{t+2}, ..., Y_{t+d}, namely, Π(t) = π̃(Y_{t+1}, Y_{t+2}, ..., Y_{t+d}). Thus, the following corollary is an immediate consequence of (M2).

Corollary 1. (Π(t))_{t∈Z} is stationary for every ϑ ∈ Θ.
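For the equidistant discretizations of fBm mentioned above, the increment process Y is fractional Gaussian noise, whose autocorrelation ρ_H(k) = ((k+1)^{2H} − 2k^{2H} + (k−1)^{2H})/2 for k ≥ 1 is standard (cf. Taqqu (2003)); the formula itself is not displayed in this paper, so the sketch below is our own illustration:

```python
def fgn_autocorrelation(H, k):
    """Autocorrelation rho_H(k) of fractional Gaussian noise, i.e. of the
    unit-lag increments of fBm with Hurst parameter H in (0, 1)."""
    k = abs(k)
    if k == 0:
        return 1.0
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + (k - 1) ** (2 * H))

# H = 1/2 gives uncorrelated increments (ordinary Brownian motion);
# H > 1/2 gives positive, H < 1/2 negative correlation at lag 1.
```

In particular ρ_H(1) = 2^{2H−1} − 1, the quantity entering the Zero-Crossing estimator of Section 3, and ρ_H(k) decays like k^{2H−2} as k → ∞, so the decay conditions appearing later in this section hold for all H ∈ (0, 1) in the consistency statement and for H < 3/4 in the asymptotic-normality statement.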

Let r = (r_0, r_1, ..., r_d) ∈ S_d for some d ∈ N. For ϑ ∈ Θ, define p_r(ϑ) := P_ϑ(Π(t) = r). According to Corollary 1, the function p_r(·) does not depend on the specific time point t ∈ Z on the right hand side of the definition. We call p_r(·) the probability of the ordinal pattern r. By (M1), we easily obtain the following statement.

Corollary 2. For every ϑ ∈ Θ, 0 < p_r(ϑ) < 1.

Next, we study the statistical problem of estimating the (generally unknown) ordinal pattern probability p_r(·). For n ∈ N, consider the ordinal pattern sample Π_n := (Π(0), Π(1), ..., Π(n−1)). A natural estimator of p_r(·) is given by the relative frequency of observations of r in the sample Π_n,

q̂_{r,n} = q̂_{r,n}(Π_n) := (1/n) Σ_{t=0}^{n−1} 1_{{Π(t)=r}}.

Since (Π(t))_{t∈Z} is stationary, we have E_ϑ(q̂_{r,n}) = p_r(ϑ) for every ϑ ∈ Θ, that is, q̂_{r,n} is an unbiased estimator of p_r(·). In the next paragraph we show that there is a simple way of improving this estimator.

Space and time symmetry. Let k ∈ N and t_1 < t_2 < ... < t_k in Z. According to (M3), the random vectors (Y_{t_1}, Y_{t_2}, ..., Y_{t_k}) and (−Y_{t_1}, −Y_{t_2}, ..., −Y_{t_k}) are zero-mean Gaussian for every ϑ ∈ Θ. Since Cov_ϑ(Y_i, Y_j) = Cov_ϑ(−Y_i, −Y_j) for all i, j ∈ Z, they have identical covariance matrices, which shows that

(Y_{t_1}, Y_{t_2}, ..., Y_{t_k}) =_dist (−Y_{t_1}, −Y_{t_2}, ..., −Y_{t_k})

for every ϑ ∈ Θ. Furthermore, since Y is stationary, we have Cov_ϑ(Y_i, Y_j) = Cov_ϑ(Y_{−i}, Y_{−j}) for all i, j ∈ Z. Therefore,

(Y_{t_1}, Y_{t_2}, ..., Y_{t_k}) =_dist (Y_{−t_1}, Y_{−t_2}, ..., Y_{−t_k})

for every ϑ ∈ Θ. We refer to these properties of Y as symmetry in space and time, respectively. In the terminology of Bandt and Shiha (2007), symmetry in space and time is equivalent to reversibility and rotation symmetry, respectively. Next, we show that, as a consequence of the symmetry in space and time of Y, the distribution of Π_n is invariant with respect to spatial and time reversals of ordinal pattern sequences. Consider the mappings α, β from S_d onto itself given by

α(r) := (r_d, r_{d−1}, ..., r_0) and β(r) := (d − r_0, d − r_1, ..., d − r_d)   (3)

for r = (r_0, r_1, ..., r_d) ∈ S_d. Geometrically, we can regard α(r) and β(r) as the spatial and time reversal of r (for an illustration, see Figure 2). In particular, if the components of x = (x_0, x_1, ..., x_d) ∈ R^{d+1} are pairwise different, then α(π(x)) = π(−x_0, −x_1, ..., −x_d) and β(π(x)) = π(x_d, x_{d−1}, ..., x_0). In terms of the vector of increments y = (y_1, y_2, ..., y_d) given by y_k := x_k − x_{k−1} for k = 1, 2, ..., d, we have

α(π̃(y)) = π̃(−y_1, −y_2, ..., −y_d) and β(π̃(y)) = π̃(−y_d, −y_{d−1}, ..., −y_1).   (4)

[Figure 2: The pattern r = (2, 0, 1) and its spatial and time reversals: α(r) = (1, 0, 2), β(r) = (0, 2, 1), β∘α(r) = (1, 2, 0).]

As usual, let ∘ denote the composition of functions. For r ∈ S_d, consider the subset ⟨r⟩ of S_d given by ⟨r⟩ := {r, α(r), β(r), β∘α(r)}. Since α∘β(r) = β∘α(r) and α∘α(r) = β∘β(r) = r, the set ⟨r⟩ is closed under α and β, i.e., α(⟨r⟩) = β(⟨r⟩) = ⟨r⟩. Consequently, if s ∈ ⟨r⟩ for r, s ∈ S_d, then

⟨s⟩ = ⟨r⟩. This provides a partition of each S_d into classes which contain 2 or 4 elements. For d = 1, the only class is S_1 = {(0, 1), (1, 0)}. For d = 2, there are two classes: {(0, 1, 2), (2, 1, 0)} and {(0, 2, 1), (2, 0, 1), (1, 2, 0), (1, 0, 2)}. For d = 3, there are 8 classes. For d ≥ 2, classes of both 2 and 4 elements are possible. Now, let n ∈ N, and consider the mappings A, B from (S_d)^n onto (S_d)^n given by

A(r_1, r_2, ..., r_n) := (α(r_1), α(r_2), ..., α(r_n)),
B(r_1, r_2, ..., r_n) := (β(r_n), β(r_{n−1}), ..., β(r_1))

for (r_1, r_2, ..., r_n) ∈ (S_d)^n. According to the geometrical interpretation of α and β, the ordinal pattern sequences A(r_1, r_2, ..., r_n) and B(r_1, r_2, ..., r_n) can be regarded as the spatial and time reversal of the ordinal pattern sequence (r_1, r_2, ..., r_n).

Lemma 3. For every ϑ ∈ Θ, the ordinal pattern sequences Π_n, A(Π_n), B(Π_n) and B∘A(Π_n) have the same distribution.

Proof. Let ϑ ∈ Θ. Since the values in X are pairwise different P_ϑ-almost surely (see (1)), equation (4) yields α(π̃(Y_{t+1}, Y_{t+2}, ..., Y_{t+d})) = π̃(−Y_{t+1}, −Y_{t+2}, ..., −Y_{t+d}) P_ϑ-almost surely for every t ∈ Z. Furthermore, by the space symmetry of Y, the random vectors (Y_1, Y_2, ..., Y_{n+d−1}) and (−Y_1, −Y_2, ..., −Y_{n+d−1}) have the same distribution with respect to P_ϑ. Thus,

Π_n = (π̃(Y_1, ..., Y_d), π̃(Y_2, ..., Y_{d+1}), ..., π̃(Y_n, ..., Y_{n+d−1}))
=_dist (π̃(−Y_1, ..., −Y_d), π̃(−Y_2, ..., −Y_{d+1}), ..., π̃(−Y_n, ..., −Y_{n+d−1})) = A(Π_n),

where the last equality holds P_ϑ-almost surely. Similarly, we obtain β(π̃(Y_{t+1}, Y_{t+2}, ..., Y_{t+d})) = π̃(−Y_{t+d}, −Y_{t+d−1}, ..., −Y_{t+1}) P_ϑ-almost surely for every t ∈ Z. Because of the space and time symmetry of Y, the random vectors (Y_1, Y_2, ..., Y_{n+d−1}) and (−Y_{n+d−1}, −Y_{n+d−2}, ..., −Y_1) have the same distribution with respect to P_ϑ, and therefore

Π_n =_dist (π̃(−Y_{n+d−1}, ..., −Y_n), ..., π̃(−Y_d, ..., −Y_1)) = B(Π_n)

with the last equality holding P_ϑ-almost surely. Now, combining the two previous statements yields equality in distribution of Π_n and B∘A(Π_n). Note that, for the proof of Lemma 3, we have only used that Y is symmetric in space and time and that the values of X are pairwise different P_ϑ-almost surely. Thus, the statement is valid under more general assumptions than (M1)-(M3).

A Rao-Blackwellization. In this paragraph, let r ∈ S_d with d ∈ N, and let n ∈ N. According to Lemma 3, we obtain p_r(·) = p_{α(r)}(·) = p_{β(r)}(·) = p_{α∘β(r)}(·). This shows that q̂_{r,n}, q̂_{α(r),n}, q̂_{β(r),n} and q̂_{α∘β(r),n} are all unbiased estimators of p_r(·). By averaging them we obtain another unbiased estimator of p_r(·), given by

p̂_{r,n} = p̂_{r,n}(Π_n) := (1/4) (q̂_{r,n} + q̂_{α(r),n} + q̂_{β(r),n} + q̂_{α∘β(r),n}) = (1/n) Σ_{t=0}^{n−1} (1/|⟨r⟩|) 1_{{Π(t)∈⟨r⟩}},

where |⟨r⟩| denotes the cardinality of the set ⟨r⟩. Theorem 5 below shows that p̂_{r,n} has lower risk than q̂_{r,n} with respect to any convex loss function. We first prove the following lemma.

Lemma 4. For every ϑ ∈ Θ, we have P_ϑ(p̂_{r,n} ≠ q̂_{r,n}) > 0.

Proof. Let ϑ ∈ Θ. We show there exists a permutation (s_0, s_1, ..., s_{n+d−1}) ∈ S_{n+d−1} such that X_{s_0} > X_{s_1} > ... > X_{s_{n+d−1}} implies p̂_{r,n} > 0 and q̂_{r,n} = 0. Then, according to Corollary 2,

P_ϑ(p̂_{r,n} ≠ q̂_{r,n}) ≥ P_ϑ(p̂_{r,n} > 0, q̂_{r,n} = 0) ≥ P_ϑ(X_{s_0} > X_{s_1} > ... > X_{s_{n+d−1}}) > 0.

Let i, j ∈ {0, 1, ..., d} be the indices satisfying r_i = d − 1 and r_j = d. If i < j, then we choose

(s_0, s_1, ..., s_{n+d−1}) = (n + d − 1, n + d − 2, ..., d + 1, r_d, r_{d−1}, ..., r_0).

Otherwise, let

(s_0, s_1, ..., s_{n+d−1}) = (r_d, r_{d−1}, ..., r_0, d + 1, d + 2, ..., n + d − 1).

In both cases, if X_{s_0} > X_{s_1} > ... > X_{s_{n+d−1}}, then Π(0) = α(r) and Π(t) ≠ r for t = 1, 2, ..., n − 1, which implies p̂_{r,n} > 0 and q̂_{r,n} = 0. The proof is complete.

Theorem 5. The estimator p̂_{r,n} of p_r(·) is unbiased and has lower risk than q̂_{r,n} with respect to any convex loss function; that is, for every ϑ ∈ Θ,

E_ϑ(ϕ(p̂_{r,n}, p_r(ϑ))) ≤ E_ϑ(ϕ(q̂_{r,n}, p_r(ϑ)))

for any function ϕ : [0, 1] × [0, 1] → [0, ∞) with ϕ(p, p) = 0 and ϕ(·, p) convex for every p ∈ [0, 1]. When ϕ(·, p) is strictly convex for every p ∈ [0, 1], then p̂_{r,n} has strictly lower risk than q̂_{r,n} with respect to ϕ. In particular, for every ϑ ∈ Θ, Var_ϑ(p̂_{r,n}) < Var_ϑ(q̂_{r,n}).

Proof. Let ϑ ∈ Θ, and let ⪯ be any total order on (S_d)^n. According to Lemma 3, the statistic S(Π_n) := min{Π_n, A(Π_n), B(Π_n), B∘A(Π_n)} (the minimum taken with respect to ⪯) is sufficient for Π_n; namely, if π ∈ (S_d)^n is such that P_ϑ(S(Π_n) = π) ≠ 0, then the conditional distribution of Π_n given S(Π_n) = π is the equidistribution on {π, A(π), B(π), B∘A(π)}, which clearly does not depend on ϑ. Now, note that

p̂_{r,n} = (1/4) (q̂_{r,n}(Π_n) + q̂_{r,n}(A(Π_n)) + q̂_{r,n}(B(Π_n)) + q̂_{r,n}(B∘A(Π_n)))

P_ϑ-almost surely, and thus p̂_{r,n} is a conditional expectation of q̂_{r,n} given S(Π_n). Since q̂_{r,n} is unbiased, the statement on the lower risk of p̂_{r,n} follows by Theorem 3.2.1 in Pfanzagl (1994). The result on strictness is also a consequence of Theorem 3.2.1 in Pfanzagl (1994) and the fact that, according to Lemma 4, P_ϑ(p̂_{r,n} ≠ q̂_{r,n}) > 0. Now, the statement on the variance follows since (q − p)² is strictly convex in q for each p ∈ [0, 1].

Strong consistency. Up to the end of this section, fix some r ∈ S_d with d ∈ N. In order to establish sufficient conditions for the strong consistency of p̂_{r,n}, we use well-known results from ergodic theory. Let τ denote the shift transformation, given by τ(z) = (z_{t+1})_{t∈Z}

for z = (z_t)_{t∈Z} ∈ R^Z. For j ∈ N, let τ^j be given by τ^j(z) = τ^{j−1}(τ(z)), where τ^0(z) := z is the identity on R^Z. A real-valued stationary stochastic process Z = (Z_t)_{t∈Z} on a probability space (Ω, A, P) is called ergodic iff P(Z ∈ B) = 0 or P(Z ∈ B) = 1 for every set B ∈ B(R^Z) satisfying P(τ^{−1}(B) △ B) = 0. The Birkhoff-Khinchin Ergodic Theorem states that, if Z is ergodic and f : R^Z → R is measurable with E(|f(Z)|) < ∞, then

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(τ^j(Z)) = E(f(Z))

P-almost surely (see Theorem 1.2.1 in Cornfeld et al. (1982)). According to Theorem 14.2.2 in Cornfeld et al. (1982), a sufficient condition for a stationary Gaussian process to be ergodic is that its autocorrelations tend to zero as the lag tends to infinity.

Theorem 6. If ρ_ϑ(k) → 0 as k → ∞ for every ϑ ∈ Θ and h : [0, 1] → R is continuous on an open set containing p_r(Θ), then h(p̂_{r,n}) is a strongly consistent estimator of h(p_r(·)), that is, lim_{n→∞} h(p̂_{r,n}) = h(p_r(ϑ)) P_ϑ-almost surely for every ϑ ∈ Θ. If, additionally, h is bounded on [0, 1], then h(p̂_{r,n}) is an asymptotically unbiased estimator of h(p_r(·)), that is, lim_{n→∞} E_ϑ(h(p̂_{r,n})) = h(p_r(ϑ)) for every ϑ ∈ Θ.

Proof. Let ϑ ∈ Θ. Consider the mapping f : R^Z → R given by

f(y) := 1 if π̃(y_1, y_2, ..., y_d) ∈ ⟨r⟩, and f(y) := 0 otherwise,

for y = (y_t)_{t∈Z} ∈ R^Z. Note that, for j = 0, 1, 2, ..., we have f(τ^j(Y)) = 1_{{Π(j)∈⟨r⟩}}. Under the assumptions, Y is ergodic and thus, according to the Birkhoff-Khinchin Ergodic Theorem,

lim_{n→∞} p̂_{r,n} = (1/|⟨r⟩|) lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(τ^j(Y)) = (1/|⟨r⟩|) E_ϑ(f(Y))

P_ϑ-almost surely. Furthermore, according to Lemma 3, (1/|⟨r⟩|) E_ϑ(f(Y)) = p_r(ϑ). Under the assumptions, there exists a δ > 0 such that h is continuous on (p_r(ϑ) − δ, p_r(ϑ) + δ). Therefore, lim_{n→∞} h(p̂_{r,n}) = h(p_r(ϑ)) P_ϑ-almost surely. Now, if additionally h is bounded on [0, 1], the Dominated Convergence Theorem yields lim_{n→∞} E_ϑ(h(p̂_{r,n})) = E_ϑ(lim_{n→∞} h(p̂_{r,n})) = h(p_r(ϑ)). The proof is complete.

Asymptotic normality. Next, we derive a sufficient condition on the autocorrelations of Y for the estimator p̂_{r,n} to be asymptotically normally distributed. Let N(0, σ²) with σ² ∈ [0, ∞) denote the (possibly degenerate) normal distribution with zero mean and variance σ². As usual, we write g(k) = o(h(k)) for g, h : N → R iff lim_{k→∞} g(k)/h(k) = 0. By ⇒_{P_ϑ} we denote convergence in distribution with respect to P_ϑ. Let Z = (Z_1, Z_2, ..., Z_n) with n ∈ N be a Gaussian random vector on a probability space (Ω, A, P). For a measurable mapping f : R^n → R with Var(f(Z)) < ∞, the Hermite rank of f with respect to Z is defined by

rank(f) := min{κ ∈ N : there exists a real polynomial q : R^n → R of degree κ with E([f(Z) − E(f(Z))] q(Z)) ≠ 0},

where the minimum of the empty set is infinity. The following result is derived from a limit theorem for nonlinear functionals of a stationary Gaussian sequence of vectors in Arcones (1994), which relates the Hermite rank of f with respect to (Y_1, Y_2, ..., Y_d) and the rate of decay of k ↦ ρ_ϑ(k) to the asymptotic distribution of (1/√n) Σ_{t=0}^{n−1} f(Y_{t+1}, Y_{t+2}, ..., Y_{t+d}).

Theorem 7. If ρ_ϑ(k) = o(k^{−β}) for ϑ ∈ Θ and some β > 1/2, then

√n (p̂_{r,n} − p_r(ϑ)) ⇒_{P_ϑ} N(0, σ²_ϑ),

where

σ²_ϑ := γ_ϑ(0) + 2 Σ_{k=1}^{∞} γ_ϑ(k) and γ_ϑ(k) := (1/|⟨r⟩|²) Cov_ϑ(1_{{Π(0)∈⟨r⟩}}, 1_{{Π(k)∈⟨r⟩}})

for k ∈ Z.

Proof. Let g : R^d → R be defined by g(z) := 1/|⟨r⟩| if π̃(z) ∈ ⟨r⟩, and g(z) := 0 otherwise, for z ∈ R^d, and note that g(Y_{t+1}, Y_{t+2}, ..., Y_{t+d}) = (1/|⟨r⟩|) 1_{{Π(t)∈⟨r⟩}} for every t ∈ Z. Therefore, according to the definition of p̂_{r,n},

√n p̂_{r,n} = (1/√n) Σ_{t=0}^{n−1} g(Y_{t+1}, Y_{t+2}, ..., Y_{t+d})

for every n ∈ N. Now, let Z = (Z_1, Z_2, ..., Z_d) be a standard normal random vector on (Ω, A, P) and note that E((g(Z))²) < ∞. We show that g has Hermite rank κ ≥ 2 with respect to Z. Let i ∈ {1, 2, ..., d}. It is sufficient to show that E([g(Z) − E(g(Z))] Z_i) = 0. Since Z_i is zero-mean Gaussian, we have E(g(Z)) E(Z_i) = 0 and thus E([g(Z) − E(g(Z))] Z_i) = E(g(Z) Z_i). Furthermore, because Z is non-degenerate, the same argument as in the proof of Lemma 3 shows that 1_{{π̃(−Z)=α(s)}} = 1_{{π̃(Z)=s}} P-almost surely for every s ∈ S_d. Since Z is zero-mean Gaussian, Z and −Z are identically distributed, and thus

E(1_{{π̃(Z)=α(s)}} Z_i) = E(1_{{π̃(−Z)=α(s)}} (−Z_i)) = −E(1_{{π̃(Z)=s}} Z_i).

Now, in the case |⟨r⟩| = 2, where g(z) = (1/2)(1_{{π̃(z)=r}} + 1_{{π̃(z)=α(r)}}), we have

2 E(g(Z) Z_i) = E(1_{{π̃(Z)=r}} Z_i) + E(1_{{π̃(Z)=α(r)}} Z_i) = 0.

Analogously, in the case |⟨r⟩| = 4, we obtain

4 E(g(Z) Z_i) = E(1_{{π̃(Z)=r}} Z_i) + E(1_{{π̃(Z)=α(r)}} Z_i) + E(1_{{π̃(Z)=β(r)}} Z_i) + E(1_{{π̃(Z)=α∘β(r)}} Z_i) = 0.

Putting it all together, we have E([g(Z) − E(g(Z))] Z_i) = 0, which shows that g has Hermite rank κ ≥ 2. Note that we have only used that Z is a non-degenerate zero-mean Gaussian vector, so g has Hermite rank κ ≥ 2 also with respect to (Y_1, Y_2, ..., Y_d) for every ϑ ∈ Θ. Now, let ϑ ∈ Θ and suppose ρ_ϑ(k) = o(k^{−β}) for some β > 1/2. Define r^{(i,j)}_ϑ(k) := ρ_ϑ(k + i − j) for k ∈ Z and i, j ∈ {1, 2, ..., d}. Since (k + i − j)^{−β} = O(k^{−β}) for all i, j ∈ {1, 2, ..., d}, we have r^{(i,j)}_ϑ(k) = o(k^{−β}) and thus

Σ_{k=1}^{∞} |r^{(i,j)}_ϑ(k)|^κ < ∞.

By Theorem 4 in Arcones (1994), the result follows.

By applying the Delta Method (see Theorem 2.5.2 in Lehmann (1999)), we obtain the following corollary.

Corollary 8. If ρ_ϑ(k) = o(k^{−β}) for ϑ ∈ Θ and some β > 1/2, and h : [0, 1] → R has a non-vanishing first derivative at p_r(ϑ), then

√n (h(p̂_{r,n}) − h(p_r(ϑ))) ⇒_{P_ϑ} N(0, σ²_ϑ [h′(p_r(ϑ))]²),

with σ²_ϑ as given in Theorem 7.

Note that Theorem 4 in Arcones (1994) also allows one to derive a multidimensional statement: Let m ∈ N and r_1 ∈ S_{d_1}, r_2 ∈ S_{d_2}, ..., r_m ∈ S_{d_m} with (possibly different) d_1, d_2, ..., d_m ∈ N. Suppose h_i : [0, 1] → R has a non-vanishing first derivative at p_{r_i}(ϑ) for i = 1, 2, ..., m. If ρ_ϑ(k) = o(k^{−β}) for some β > 1/2, then (h_1(p̂_{r_1,n}), h_2(p̂_{r_2,n}), ..., h_m(p̂_{r_m,n})) is asymptotically normally distributed.
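The variance reduction promised by Theorem 5 is easy to see numerically. The following sketch is our own illustration (all function names are ours): it compares the sample frequency q̂_{r,n} with the symmetrized estimator p̂_{r,n} for the monotone pattern r = (0, 1, 2) on simulated Brownian-motion discretizations (iid standard normal increments, the fBm case H = 1/2), where both estimators are unbiased for p_r = 1/4:

```python
import random

def ordinal_pattern(x):
    d = len(x) - 1
    return tuple(sorted(range(d + 1), key=lambda i: (-x[i], -i)))

def alpha(r):                      # spatial reversal
    return tuple(reversed(r))

def beta(r):                       # time reversal
    d = len(r) - 1
    return tuple(d - ri for ri in r)

def pattern_class(r):              # the class {r, alpha(r), beta(r), beta(alpha(r))}
    return {r, alpha(r), beta(r), beta(alpha(r))}

def estimators(X, r, d=2):
    """Sample frequency q_hat and symmetrized estimator p_hat of p_r."""
    cls = pattern_class(r)
    n = len(X) - d
    patterns = [ordinal_pattern(X[t:t + d + 1]) for t in range(n)]
    q_hat = sum(p == r for p in patterns) / n
    p_hat = sum(p in cls for p in patterns) / (n * len(cls))
    return q_hat, p_hat

random.seed(1)
r = (0, 1, 2)
qs, ps = [], []
for _ in range(2000):
    # Brownian motion: cumulative sum of iid N(0, 1) increments
    X, x = [], 0.0
    for _ in range(102):
        x += random.gauss(0.0, 1.0)
        X.append(x)
    q, p = estimators(X, r)
    qs.append(q)
    ps.append(p)

mean = lambda v: sum(v) / len(v)
var = lambda v: sum((x - mean(v)) ** 2 for x in v) / len(v)
# Both estimators are unbiased for p_r = 1/4 here; the symmetrized
# estimator shows the smaller sampling variance, as Theorem 5 predicts.
print(mean(qs), mean(ps), var(qs), var(ps))
```

Here ⟨(0, 1, 2)⟩ = {(0, 1, 2), (2, 1, 0)} has two elements, so p̂_{r,n} simply averages the frequencies of the two monotone patterns.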

3. Parameter estimation from the frequency of changes

The case d = 2. Next, we show that in the case of ordinal patterns of order $d=2$, any reasonable estimator of ordinal pattern probabilities can be written as an affine function of the frequency of changes between upwards and downwards. Consider
$$C(t) \;:=\; 1_{\{X_t\geq X_{t+1},\;X_{t+1}<X_{t+2}\}} \;+\; 1_{\{X_t<X_{t+1},\;X_{t+1}\geq X_{t+2}\}}$$
for $t\in\mathbb{Z}$, indicating a change of direction in $X$, either from downwards to upwards (when $X_t\geq X_{t+1}$ and $X_{t+1}<X_{t+2}$), or from upwards to downwards (when $X_t<X_{t+1}$ and $X_{t+1}\geq X_{t+2}$). Clearly, $C(t)$ only depends on the ordinal pattern of order $d=2$ at time $t$; namely, $X_t\geq X_{t+1}<X_{t+2}$ iff $\Pi(t)=(0,2,1)$ or $\Pi(t)=(2,0,1)$, and $X_t<X_{t+1}\geq X_{t+2}$ iff $\Pi(t)=(1,0,2)$ or $\Pi(t)=(1,2,0)$. As a consequence of Corollary 1, $(C(t))_{t\in\mathbb{Z}}$ is stationary for every $\vartheta\in\Theta$.

Consider the probability of a change, $c(\vartheta) := P_\vartheta(C(t)=1)$ for $\vartheta\in\Theta$. Clearly, $c(\cdot)$ does not depend on the value of $t$ on the right hand side of the definition. By evaluating two-dimensional normal orthant probabilities, it can be shown that
$$c(\vartheta) \;=\; \frac12 - \frac{1}{\pi}\arcsin\rho_\vartheta(1) \tag{5}$$
for $\vartheta\in\Theta$ (see Bandt and Shiha (2007)). For $n\in\mathbb{N}$, let $\hat c_n$ denote the relative frequency of changes,
$$\hat c_n \;:=\; \frac1n\sum_{t=0}^{n-1} C(t).$$
According to the relation between $C(t)$ and $\Pi(t)$ discussed above, we have
$$\hat c_n \;=\; \begin{cases} 4\,\hat p_{r,n} & \text{if } r\in\{(1,0,2),(1,2,0),(0,2,1),(2,0,1)\},\\ 1-2\,\hat p_{r,n} & \text{if } r\in\{(0,1,2),(2,1,0)\}. \end{cases}$$
In particular, $\mathrm{Var}_\vartheta(\hat c_n) = 16\,\mathrm{Var}_\vartheta(\hat p_{r,n})$ if $r\in\{(1,0,2),(1,2,0),(0,2,1),(2,0,1)\}$, and $\mathrm{Var}_\vartheta(\hat c_n) = 4\,\mathrm{Var}_\vartheta(\hat p_{r,n})$ if $r\in\{(0,1,2),(2,1,0)\}$. Note that the results in Sinn and Keller (2009) allow one to evaluate $\mathrm{Var}_\vartheta(\hat c_n)$ numerically.
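As a minimal illustration, the statistic $\hat c_n$ and formula (5) can be sketched as follows (a sketch of ours; the function names are not from the paper, and ties are counted as "not up", consistent with the definition of $C(t)$):

```python
import numpy as np

def c_hat(x):
    """Relative frequency of changes in the series x: a change at time t
    means the direction of x[t] -> x[t+1] differs from that of
    x[t+1] -> x[t+2] (ties counted as 'not up')."""
    x = np.asarray(x, dtype=float)
    up = x[:-1] < x[1:]                      # x[t] < x[t+1]
    return float(np.mean(up[:-1] != up[1:])) # direction changes at t

def c_of_rho(rho1):
    """Formula (5): c = 1/2 - arcsin(rho(1)) / pi."""
    return 0.5 - np.arcsin(rho1) / np.pi

print(c_hat([0, 1, 0, 1, 0, 1]))  # alternating series: every t is a change -> 1.0
print(c_of_rho(0.0))              # uncorrelated increments -> 0.5
```

For a strictly monotone series, `c_hat` returns 0, matching the fact that the monotone patterns (0, 1, 2) and (2, 1, 0) carry no change.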

Estimation of ϑ. We now restrict to the case where $\Theta$ is a subset of $\mathbb{R}$. If the mapping $\vartheta\mapsto c(\vartheta)$ is strictly monotone on $\Theta$, then $\vartheta$ can be estimated by plugging the estimate $\hat c_n$ of $c(\vartheta)$ into the inverse functional relation. More precisely, assume there exists a function $h:[0,1]\to\mathbb{R}$ with
$$h(c(\vartheta)) \;=\; \vartheta \tag{6}$$
for every $\vartheta\in\Theta$. Note that, according to formula (5), a necessary and sufficient condition for the existence of such a function $h$ is that $\vartheta\mapsto\rho_\vartheta(1)$ is strictly monotonic. Plugging the estimate $\hat c_n$ of $c(\cdot)$ into the left hand side of (6), we obtain
$$\hat\vartheta_n \;:=\; h(\hat c_n)$$
as an estimate of $\vartheta$. The following corollary gives properties of $\hat\vartheta_n$.

Corollary 9. The estimator $\hat\vartheta_n$ has the following properties:

(i) If $h$ is continuous on an open set containing $c(\Theta)$ and $\lim_{k\to\infty}\rho_\vartheta(k)=0$ for all $\vartheta\in\Theta$, then $\hat\vartheta_n$ is a strongly consistent estimator of $\vartheta$. If, additionally, $h$ is bounded on $[0,1]$, then $\hat\vartheta_n$ is an asymptotically unbiased estimator of $\vartheta$.

(ii) If $\rho_\vartheta(k)=o(k^{-\beta})$ for some $\beta>\frac12$ and $h$ has a non-vanishing first derivative at $c(\vartheta)$, then
$$\sqrt n\,(\hat\vartheta_n-\vartheta) \;\xrightarrow{\;P_\vartheta\;}\; \mathcal N\bigl(0,\ \sigma_\vartheta^2\,[h'(c(\vartheta))]^2\bigr),$$
with
$$\sigma_\vartheta^2 \;:=\; \gamma_\vartheta(0) + 2\sum_{k=1}^\infty\gamma_\vartheta(k)$$
and $\gamma_\vartheta(k) := \mathrm{Cov}_\vartheta(C(0),C(k))$ for $k\in\mathbb{Z}$.

Proof. (i) is a consequence of Theorem 6. (ii) follows by Corollary 8.

Equidistant discretizations of fbm. In the following, we apply the previous results to the estimation of the Hurst parameter in equidistant discretizations of fbm. Let $B=(B(t))_{t\in\mathbb{R}}$ be a family of measurable mappings from a measurable space $(\Omega,\mathcal{A})$ into $(\mathbb{R},\mathcal{B}(\mathbb{R}))$. Furthermore, let $(P_H)_{H\in(0,1)}$ be a family of probability measures on $(\Omega,\mathcal{A})$ such that $B$ measured with respect

to $P_H$ is fbm with the Hurst parameter $H$, that is, $B$ is zero-mean Gaussian and
$$\mathrm{Cov}_H(B(t),B(s)) \;=\; \frac12\bigl(|t|^{2H} + |s|^{2H} - |t-s|^{2H}\bigr)$$
for $s,t\in\mathbb{R}$. Note that many authors define fbm only for $t\in[0,\infty)$; however, we adopt the double-sided infinite definition in Taqqu (2003). It is well-known that fbm with the Hurst parameter $H$ is $H$-self-similar, i.e., for every $a>0$, the processes $(B(at))_{t\in\mathbb{R}}$ and $(a^H B(t))_{t\in\mathbb{R}}$ have the same finite-dimensional distributions with respect to $P_H$ (see Taqqu (2003)).

For a fixed sampling interval length $\delta>0$, consider the equidistant discretization of fbm, $X=(X_t)_{t\in\mathbb{Z}}$, given by $X_t := B(\delta t)$ for $t\in\mathbb{Z}$. Let $H\in(0,1)$. According to the self-similarity of fbm, $(B(\delta t))_{t\in\mathbb{Z}}$ and $(\delta^H B(t))_{t\in\mathbb{Z}}$ have the same distribution. Furthermore, the ordinal patterns in $(\delta^H B(t))_{t\in\mathbb{Z}}$ and $(B(t))_{t\in\mathbb{Z}}$ are identical. Therefore, we obtain the following statement.

Corollary 10. The distribution of ordinal patterns in $X$ does not depend on the sampling interval length $\delta$.

In the following, we assume $\delta=1$. Let $Y=(Y_t)_{t\in\mathbb{Z}}$ denote the increment process of $X$ given by $Y_t := X_t - X_{t-1}$ for $t\in\mathbb{Z}$. It is well-known that $Y$ is non-degenerate, stationary and zero-mean Gaussian for every $H\in(0,1)$. Thus, with $\Theta := (0,1)$ and $\vartheta := H$, we have a class of stochastic processes with $Y$ satisfying the model assumptions (M1)-(M3).

The ZC estimator of the Hurst parameter. The frequency of changes $\hat c_n$ is a particularly interesting statistic for equidistant discretizations of fbm, because the probability of a change is monotonically related to the Hurst parameter. Note that the first-order autocorrelation of $Y$ measured with respect to $P_H$ is given by $\rho_H(1) = 2^{2H-1}-1$. Thus, according to formula (5) and using the fact that $\arcsin x = 2\arcsin\sqrt{(1+x)/2} - \pi/2$ for $x\in[-1,1]$, we obtain
$$c(H) \;=\; 1 - \frac{2}{\pi}\arcsin 2^{H-1} \tag{7}$$

for $H\in(0,1)$. Plugging the estimate $\hat c_n$ of $c(\cdot)$ into the left hand side of (7) and solving for $H$ yields an estimator for the Hurst parameter. In order to obtain only finite non-negative estimates, we define
$$h(x) \;:=\; \max\bigl\{0,\ \log_2(\cos(\pi x/2)) + 1\bigr\}$$
for $x\in[0,1]$ and set $\hat H_n := h(\hat c_n)$. Note that the first derivative of $h$ on $(0,\tfrac23)$ is given by
$$h'(x) \;=\; -\,\frac{\pi\tan(\pi x/2)}{2\ln 2} \tag{8}$$
for $x\in(0,\tfrac23)$.

The estimator $\hat H_n$ is known as the ZC estimator of the Hurst parameter, with ZC standing for zero-crossings, because changes between upwards and downwards in $X$ are equivalent to zero-crossings in $Y$ (see Kedem (1994), Coeurjolly (2000)). The following corollary gives properties of the ZC estimator. Note that the second statement has been established by Coeurjolly (2000) using a central limit theorem of Ho and Sun (1987).

Corollary 11. The estimator $\hat H_n$ has the following properties:

(i) $\hat H_n$ is a strongly consistent and asymptotically unbiased estimator of the Hurst parameter.

(ii) If $H<\tfrac34$, then
$$\sqrt n\,(\hat H_n - H) \;\xrightarrow{\;P_H\;}\; \mathcal N\bigl(0,\ \sigma_H^2\,[h'(c(H))]^2\bigr),$$
with $h'$ as given in (8) and
$$\sigma_H^2 \;:=\; \gamma_H(0) + 2\sum_{k=1}^\infty\gamma_H(k),$$
where $\gamma_H(k) := \mathrm{Cov}_H(C(0),C(k))$ for $k\in\mathbb{Z}$.

Proof. Note that the image of $(0,1)$ under $c(\cdot)$ is given by $(0,\tfrac23)$, and $h$ is continuous and bounded on $[0,1]$. Furthermore, $\rho_H(k)$ is asymptotically equivalent to $H(2H-1)k^{2H-2}$ as $k\to\infty$ (see Taqqu (2003)), and thus $\lim_{k\to\infty}\rho_H(k)=0$ for every $H\in(0,1)$. Now, statement (i) follows by Corollary 9 (i). In order to establish (ii), note that $h'$ is non-vanishing on $(0,\tfrac23)$. Furthermore, when $H<\tfrac34$, there exists a $\beta>\tfrac12$ such that $\rho_H(k)=o(k^{-\beta})$ (for instance, we can choose $\beta=\tfrac54-H$). Thus, statement (ii) follows by Corollary 9 (ii).
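The ZC estimator $\hat H_n = h(\hat c_n)$ can be sketched in a few lines. The code below is our own illustration (function names are ours, not from the paper); the inner check confirms that $h$ inverts $c(\cdot)$ on $(0, \tfrac23)$, since $\cos(\pi c(H)/2) = 2^{H-1}$.

```python
import numpy as np

def c_of_H(H):
    """Formula (7): probability of a change for Hurst parameter H."""
    return 1.0 - (2.0 / np.pi) * np.arcsin(2.0 ** (H - 1.0))

def zc_estimator(x):
    """ZC estimate of the Hurst parameter from a sample path x:
    H_hat = h(c_hat) with h(c) = max{0, log2(cos(pi*c/2)) + 1}."""
    x = np.asarray(x, dtype=float)
    up = x[:-1] < x[1:]
    c_hat = np.mean(up[:-1] != up[1:])   # relative frequency of changes
    return max(0.0, np.log2(np.cos(np.pi * c_hat / 2.0)) + 1.0)

# h inverts c(.): h(c(H)) = H for H in (0, 1).
for H in (0.2, 0.5, 0.8):
    assert np.isclose(np.log2(np.cos(np.pi * c_of_H(H) / 2.0)) + 1.0, H)

print(zc_estimator(np.arange(10)))  # monotone path: c_hat = 0, estimate 1.0
```

The `max{0, ...}` clipping mirrors the definition of $h$: a path with many changes ($\hat c_n$ near 1) would give a large negative value of $\log_2\cos(\pi \hat c_n/2) + 1$ and is mapped to 0.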

Alternative estimation methods. There are various alternative methods for estimating the Hurst parameter, such as Maximum Likelihood (ML) methods and approximations thereof, or semi-parametric estimators based on the rescaled range statistic or on the periodogram. Although ML methods have asymptotic optimality properties, they are computationally intensive and thus often impracticable. Semi-parametric methods are more robust and more generally applicable, but usually also less efficient. A further disadvantage of semi-parametric estimators is that they depend on certain tuning parameters which are difficult to select automatically.

Here we consider a computationally simple alternative estimation method proposed by Kettani and Gubner (2006). Note that the probability of a change is monotonically related to the first-order autocorrelation of $Y$ (which in turn is monotonically related to the Hurst parameter). The estimator of Kettani and Gubner first computes the sample autocorrelation of $Y$,
$$\tilde\rho_n \;:=\; \frac{\sum_{t=1}^{n-1}(Y_t-\bar Y_n)(Y_{t+1}-\bar Y_n)}{\sum_{t=1}^{n}(Y_t-\bar Y_n)^2} \tag{9}$$
where $\bar Y_n := \frac1n\sum_{t=1}^n Y_t$ is the sample mean, and plugs $\tilde\rho_n$ into the inverse of the monotonic relation to the Hurst parameter. Altogether, this gives the estimate of the Hurst parameter
$$\tilde H_n \;:=\; g(\tilde\rho_n),$$
where
$$g(x) \;:=\; \max\bigl\{0,\ \tfrac12(\log_2(1+x)+1)\bigr\} \tag{10}$$
for $x\in(-1,1]$. Note that $\hat H_n$ can be regarded as the estimator obtained by plugging the estimate $\hat\rho_n := \cos(\pi\hat c_n)$ of the first-order autocorrelation into $g(\cdot)$. In this sense, $\hat H_n$ is the ordinal analogue of $\tilde H_n$.

In the next section, we compare the performance of $\hat H_n$ and $\tilde H_n$ in a simulation study. Note that $\tilde H_n$ can be applied more generally to the estimation of the index of self-similarity in (not necessarily Gaussian) self-similar processes with stationary increments. In contrast to $\hat H_n$, however, $\tilde H_n$ is in general not invariant with respect to monotone transformations of the process.
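A sketch of the Kettani-Gubner estimator, formulas (9) and (10), might look as follows (our own illustration with a hypothetical function name; not the authors' code):

```python
import numpy as np

def hurst_kg(y):
    """Kettani-Gubner estimate from the increments y: plug the lag-1
    sample autocorrelation (formula (9)) into g of formula (10),
    H = max{0, (log2(1 + rho) + 1) / 2}."""
    y = np.asarray(y, dtype=float)
    yc = y - y.mean()
    rho = float(yc[:-1] @ yc[1:]) / float(yc @ yc)   # formula (9)
    return max(0.0, 0.5 * (np.log2(1.0 + rho) + 1.0))

# The ordinal analogue plugs rho_hat = cos(pi * c_hat) into the same g,
# which reproduces the ZC estimator described earlier.
rng = np.random.default_rng(3)
y = rng.standard_normal(100_000)   # white noise: estimate should be near 1/2
print(hurst_kg(y))
```

Unlike the ZC estimator, this estimate uses the actual values of the increments, so any monotone but nonlinear transformation of the process changes it.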

         n = 100                        n = 1000                       n = 10 000
  H      µ              σ              µ              σ              µ              σ
 0.05   0.087 (0.070)  0.101 (0.072)  0.054 (0.051)  0.043 (0.030)  0.050 (0.050)  0.016 (0.010)
 0.10   0.119 (0.108)  0.113 (0.082)  0.099 (0.100)  0.048 (0.031)  0.100 (0.100)  0.016 (0.010)
 0.15   0.156 (0.152)  0.124 (0.088)  0.149 (0.150)  0.048 (0.030)  0.150 (0.150)  0.015 (0.009)
 0.20   0.199 (0.199)  0.131 (0.089)  0.199 (0.200)  0.047 (0.029)  0.200 (0.200)  0.015 (0.009)
 0.25   0.245 (0.248)  0.134 (0.087)  0.249 (0.250)  0.045 (0.028)  0.250 (0.250)  0.014 (0.009)
 0.30   0.291 (0.297)  0.133 (0.085)  0.299 (0.300)  0.043 (0.027)  0.300 (0.300)  0.014 (0.008)
 0.35   0.341 (0.345)  0.131 (0.082)  0.349 (0.350)  0.042 (0.026)  0.350 (0.350)  0.013 (0.008)
 0.40   0.390 (0.394)  0.127 (0.079)  0.399 (0.400)  0.040 (0.025)  0.400 (0.400)  0.013 (0.008)
 0.45   0.441 (0.442)  0.121 (0.076)  0.449 (0.449)  0.038 (0.024)  0.450 (0.450)  0.012 (0.008)
 0.50   0.491 (0.489)  0.116 (0.073)  0.499 (0.499)  0.036 (0.023)  0.500 (0.500)  0.011 (0.007)
 0.55   0.541 (0.536)  0.110 (0.070)  0.549 (0.548)  0.034 (0.022)  0.550 (0.550)  0.011 (0.007)
 0.60   0.592 (0.581)  0.103 (0.067)  0.599 (0.597)  0.032 (0.021)  0.600 (0.600)  0.010 (0.007)
 0.65   0.641 (0.625)  0.097 (0.064)  0.649 (0.646)  0.031 (0.020)  0.650 (0.649)  0.010 (0.007)
 0.70   0.692 (0.668)  0.093 (0.061)  0.699 (0.693)  0.030 (0.020)  0.700 (0.698)  0.010 (0.007)
 0.75   0.741 (0.708)  0.089 (0.059)  0.749 (0.739)  0.029 (0.020)  0.750 (0.747)  0.010 (0.007)
 0.80   0.790 (0.746)  0.085 (0.056)  0.798 (0.783)  0.031 (0.020)  0.800 (0.794)  0.012 (0.008)
 0.85   0.837 (0.782)  0.081 (0.053)  0.848 (0.824)  0.035 (0.020)  0.849 (0.838)  0.017 (0.009)
 0.90   0.884 (0.814)  0.076 (0.050)  0.895 (0.860)  0.039 (0.019)  0.898 (0.879)  0.023 (0.009)
 0.95   0.930 (0.843)  0.067 (0.046)  0.941 (0.893)  0.039 (0.018)  0.944 (0.914)  0.027 (0.010)

Table 1: Sample mean µ and sample standard deviation σ for the estimator $\hat H_n$ (values for $\tilde H_n$ in brackets).

4. Simulation studies

In order to illustrate the performance of $\hat H_n$ and to point out an interesting phenomenon in the distribution of the number of changes for large $H$, we present some data based on simulations. We have used the pseudo-random number generator of Matlab 7.6.0 and the algorithm of Davies and Harte (1987) for simulating paths of equidistant discretizations of fbm. For the sample sizes $n = 100$, $1000$, $10\,000$ and different values of the Hurst parameter, we have generated 100 000 paths each and have computed the number of changes and the values of $\hat H_n$ and $\tilde H_n$ for each path. The sample mean µ and the sample standard deviation σ of the obtained values for $\hat H_n$ and $\tilde H_n$ (in brackets) are shown in Table 1.

The results suggest that the bias of $\hat H_n$ is particularly large for larger values of the Hurst parameter. For $n = 100$ and $n = 1000$, the sample standard deviation is highest for small values of $H$; for $n = 10\,000$ it is maximal in the case $H = 0.95$. Compared to the results for the estimator $\tilde H_n$, the sample standard deviation of the ZC estimator is larger; for instance, when $n = 10\,000$ and $H < 0.85$, the standard deviation is about 1.5 times as large. The observed loss of efficiency of $\hat H_n$ compared to $\tilde H_n$ is not surprising, because $\hat H_n$ is based only on the number of changes between upwards and downwards, whereas $\tilde H_n$ uses the metric structure. When $n$ is small, both

estimators tend to overestimate the Hurst parameter (except for very small values of $H$). Particularly for large values of the Hurst parameter, $\hat H_n$ seems to have a much smaller bias than $\tilde H_n$.

Figure 3 shows the empirical distribution of the number of changes between upwards and downwards in the samples of length $n = 100$. For $H = 0.70$, the distribution looks approximately normal, which is compatible with Corollary 9 (ii), but for large values of $H$, the distribution is very irregular. Most remarkably, the frequency of even numbers is larger than the frequency of odd numbers, and the distributions conditioned on an odd or an even number look entirely different. For instance, when $H = 0.95$, the frequencies of odd and even numbers are 0.321 and 0.679, respectively. The distribution conditioned on an odd number of changes is slightly left-skewed with mean 21.5 and mode 23. The distribution conditioned on an even number of changes has mean 14.4 and mode 0 and looks, roughly, like the mixture of a geometric and a binomial distribution. An interesting consequence is that the probability of a change (which is given by $c(0.95) = 0.167$) is overestimated by the relative frequency of changes in a sample given that the number of changes in the sample is odd, and underestimated given that the number of changes in the sample is even.

A heuristic explanation for the high frequency of even numbers is the following: When $H$ is large, there is a high probability of observing path segments which look, roughly, like a straight line. Typically, such segments show only local changes in direction; globally, there is one prevailing trend, either downwards or upwards. Overall, such segments result in an even number of changes. In other words: paths with an even number of changes look more similar to a straight line than paths with an odd number of changes. Note that in the limit case $H \to 1$, the number of changes is even with probability 1 (namely, equal to 0).
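The even/odd asymmetry is easy to reproduce. The sketch below (our own code, with hypothetical function names; not the simulation code used for Table 1) samples fractional Gaussian noise by circulant embedding, a standard formulation of the Davies-Harte method mentioned above, and counts the changes in 2 000 samples of length n = 100 for H = 0.95. Since a change at time t is a sign change between consecutive increments, the parity can be read off directly from the increments.

```python
import numpy as np

def fgn_davies_harte(n, H, rng):
    """Sample n increments of fbm (fractional Gaussian noise) by
    circulant embedding of the autocovariance (Davies-Harte sketch)."""
    k = np.arange(n + 1)
    gamma = 0.5 * ((k + 1.0)**(2*H) - 2.0*k**(2*H) + np.abs(k - 1.0)**(2*H))
    row = np.concatenate([gamma, gamma[-2:0:-1]])   # circulant first row, length 2n
    lam = np.maximum(np.fft.fft(row).real, 0.0)     # clip tiny round-off negatives
    m = len(row)
    w = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    return np.fft.fft(np.sqrt(lam / m) * w).real[:n]

rng = np.random.default_rng(2)
H, n, paths = 0.95, 100, 2000
counts = []
for _ in range(paths):
    y = fgn_davies_harte(n + 1, H, rng)   # n + 1 increments give n change slots
    up = y > 0
    counts.append(int(np.sum(up[:-1] != up[1:])))
counts = np.asarray(counts)
even_freq = float(np.mean(counts % 2 == 0))
print(even_freq, counts.mean())   # even counts clearly dominate for large H
```

Note that the parity statistics depend only on the signs of the increments, so they are insensitive to the overall scaling of the simulated process.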
Acknowledgements

The research of Mathieu Sinn was supported by a Government of Canada Post-Doctoral Research Fellowship (PDRF).

References

[1] Arcones, M. A., 1994. Limit theorems for nonlinear functionals of a stationary Gaussian sequence of vectors. Annals of Probability 22, 2242-2274.

[2] Bandt, C., 2005. Ordinal time series analysis. Ecological Modelling 182, 229-238.

[3] Bandt, C., Pompe, B., 2002. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 88, 174102.

[4] Bandt, C., Shiha, F., 2007. Order patterns in time series. J. Time Ser. Anal. 28, 646-665.

[5] Cao, Y. H., Tung, W. W., Gao, J. B., Protopopescu, V. A., Hively, L. M., 2004. Detecting dynamical changes in time series using the permutation entropy. Phys. Rev. E 70, 046217.

[6] Coeurjolly, J. F., 2000. Simulation and identification of the fractional Brownian motion: A bibliographical and comparative study. J. Stat. Software 5.

[7] Cornfeld, I. P., Fomin, S. V., Sinai, Ya. G., 1982. Ergodic Theory. Springer-Verlag, Berlin.

[8] Davies, R. B., Harte, D. S., 1987. Tests for Hurst effect. Biometrika 74, 95-101.

[9] Ho, H.-C., Sun, T.-C., 1987. A central limit theorem for non-instantaneous filters of a stationary Gaussian process. J. Multivariate Anal. 22, 144-155.

[10] Kedem, B., 1994. Time Series Analysis by Higher Order Crossings. IEEE Press, New York.

[11] Keller, K., Lauffer, H., 2003. Symbolic analysis of high-dimensional time series. Int. J. Bifurcation Chaos 13, 2657-2668.

[12] Keller, K., Sinn, M., 2005. Ordinal analysis of time series. Physica A 356, 114-120.

[13] Keller, K., Sinn, M., Emonds, J., 2007. Time series from the ordinal viewpoint. Stochastics and Dynamics 2, 247-272.

[14] Keller, K., Lauffer, H., Sinn, M., 2007. Ordinal analysis of EEG time series. Chaos and Complexity Letters 2, 247-258.

[15] Kettani, H., Gubner, J. A., 2006. A novel approach to the estimation of the Hurst parameter in self-similar traffic. IEEE Trans. Circuits Syst. II 53, 463-467.

[16] Lehmann, E., 1999. Elements of Large Sample Theory. Springer, New York.

[17] Li, X., Cui, S., Voss, L. J., 2008. Using permutation entropy to measure the electroencephalographic effects of sevoflurane. Anesthesiology 109, 448-456.

[18] Li, X., Ouyang, G., Richards, D. A., 2007. Predictability analysis of absence seizures with permutation entropy. Epilepsy Research 77, 70-74.

[19] Pfanzagl, J., 1994. Parametric Statistical Theory. De Gruyter, Berlin.

[20] Sinn, M., Keller, K., 2009. Covariances of zero-crossings in Gaussian processes. Submitted.

[21] Taqqu, M. S., 2003. Fractional Brownian Motion and Long-Range Dependence, in: Doukhan, P., Oppenheim, G., Taqqu, M. S. (Eds.), Theory and Applications of Long-Range Dependence. Birkhäuser, Boston.

Figure 3: Simulation of the number of changes in samples of the length n = 100. The six panels show the empirical distributions of the number of changes (horizontal axis: 0 to 60; vertical axis: relative frequency, 0.00 to 0.08) for H = 0.70, 0.75, 0.80, 0.85, 0.90 and 0.95.