Bivariate statistical analysis of TCP-flow sizes and durations


Ann Oper Res (2009) 170

Natalia M. Markovich · Jorma Kilpi

Published online: 7 March 2009
Springer Science+Business Media, LLC 2009

N.M. Markovich: Institute of Control Sciences, Russian Academy of Sciences, Profsoyuznaya str. 65, Moscow, Russia. E-mail: markovic@ipu.rssi.ru, nmarkovich@yahoo.com
J. Kilpi: VTT, Vuorimiehentie 3, VTT, Finland. E-mail: jorma.kilpi@vtt.fi

Abstract We approximate the distribution of the TCP-flow rate by deriving it from the joint bivariate distribution of the flow sizes and flow durations of a given access network. The latter distribution is represented by a bivariate extreme value distribution using the Pickands dependence A-function. We estimate the A-function to measure the dependence of random pairs: TCP-flow size and duration, the rate of a TCP-flow and its size, as well as the rate and duration. We provide a method to test that the obtained estimate of the A-function is good and perform the analysis with one concrete data example.

Keywords TCP-flow · Extreme value distribution · Pickands function

In a recent paper of D'Auria and Resnick (2006) a new family of infinite-source Poisson models is shown to be able to take into account several properties of true traffic, such as burstiness, long range dependence (LRD), heavy tails, bursty behavior determined by high bandwidth users and dependence determined by users with low transmission capacity. The basic elements of these flow-level models are the beginnings of the data transfer sessions and the rates (R), durations (D) and sizes (S) of each of these data transfer sessions. Different dependence structures within the triple (R, D, S) are possible in these models, and in D'Auria and Resnick (2006) the natural but optimistic case is analyzed in which the rate R of a data transfer session is independent of its size S. A form of asymptotic independence for the pair (D, R) is obtained in Resnick (2006), p. 239, as a result of the examination of the tail of the product DR. In van de Meent and Mandjes (2005) the dependence of R and D was found in 80% of the traces for aggregated flows generated by the same user by checking the condition $\mathrm{E}D\,\mathrm{E}R/\mathrm{E}S = 1$. Some specific variant of this general model-family framework could also be suitable for access networks where the level of traffic aggregation may be moderate. The advantages of such flow-level models are discussed, for example, in van de Meent and Mandjes (2005).

The data transfer session could be interpreted as an IP-flow, which is an aggregate of traffic with the same source/destination IP-addresses, or it could be defined by a TCP-connection if we restrict the interest only to TCP-traffic. The latter choice is somewhat more specific but more accurately defined. Sizes and durations of both directions of a TCP-connection are well-defined and can be easily measured with, for example, a tool like Tstat (2007). The quantities S and D are obtained directly from the data. For the purposes of this paper we define the rate R of a TCP-flow in a natural way as the ratio R = S/D. Since both S and D are random, the rate R is also random. Since S and D are dependent (see the results of the empirical study in Sect. 4.2) and positive, the joint distribution F of the pair (S, D) determines the distribution of R:

\[ F_R(x) = P\{S/D \le x\} = \int_0^\infty \int_0^{zx} dF(y,z) = \int_0^\infty P\{S \le zx \mid D = z\}\, dF_D(z). \tag{1} \]

We denote by $F_D$ the marginal cumulative distribution function (CDF) of D. Direct estimation of the distribution of R from the data, as the ratio S/D, would disregard the dependence structure of S and D. Since the distributions of both S and D are heavy-tailed (see the investigation in Sect. 2), their joint distribution F is very difficult to estimate. This dependence structure is driven by the physical structure of the access network, which contains access links of various capacities, and by the separation of users behind their access links. With faster access rates users tend to download larger files, but the access link capacity is not the only factor. The tariffing policy, for example, also has an effect.

The distribution of the rate R = S/D is of interest. It allows us to estimate the expectation and variance of the rate. Another aim is to measure the degree of dependence of S and D, R and S, as well as R and D.

In the context of heavy-tailed distributions the asymptotic distribution (in the sense that the sample size increases infinitely) of the normalized maximum of the sample is used as a model of the tail. We shall estimate a bivariate Extreme Value Distribution (EVD) G of the pair (S, D) instead of F(y, z) in (1) (the necessary explanations are given in Sect. 4.1). There are at least two reasons for this. First, the maximum $M_n = \max_{i=1,\dots,n}\{X_i\}$ of n observations of a random variable X fully represents its distribution in the sense that $P\{X_1 + \dots + X_n > x\} \sim nP\{X_1 > x\} \sim P\{M_n > x\}$ as $x \to \infty$, Embrechts et al. (1997). Second, the lack of observations beyond the sample range does not allow one to restore such a distribution at infinity.

Our approach follows three steps:
1. The preliminary detection of univariate heavy tails.
2. The dependence structure of univariate time series data.
3. The dependence structure of multivariate data.

For heavy-tailed data and, in particular, data with infinite variance, the classical statistical methods are not adequate and flexible enough. We should also distinguish between methods that are valid for independent and for dependent data. For example, an empirical CDF is biased when applied to dependent data. The maximum likelihood (ML) method requires an independence assumption. The evaluation of the distributions of univariate data is especially important when estimating multivariate quantiles and distributions. When the data are independent or weakly dependent one can apply traditional methods like kernel estimators to estimate the probability density function (PDF), see Hall et al. (1995). The problem of density estimation arises when the data are long-range dependent (LRD), see Castellana and Leadbetter (1986).

The paper is organized as follows. First, the data analysed in this paper are briefly described in Sect. 1. The data analysis starts in Sect. 2, where we first describe rough methods to detect heaviness of tails of univariate distributions. Then, in Sect. 3, we study methods to detect dependencies in the data. The bivariate analysis is presented in Sect. 4. Finally, conclusions are made in Sect. 5.

1 A brief description of the data

The data set we use in this paper is part of a trace measured from a gateway between a mobile network and the Internet. The important thing is that all of the flows have passed through a mobile radio access, hence the access rates have always been rather limited, always less than 384 kb/s. The same data were used in a different study, reported in Kilpi and Lassila (2006), and from this study we know that the mobile backbone network was not congested at any time during the measurement. For this reason we ignore the daily structure of this traffic.

In Kilpi and Lassila (2006) the problems of TCP in this specific trace were found to be due to the low access rate, the limited CPU capacity of mobile terminals and self-congestion at the access link due to user behavior. The latter two properties are prominent in this trace due to the tariffing policy of the operator, flat-rate charged hours. Hence they are specific to this particular trace and are mentioned here only because of the problems they generated for the data analysis.

The extreme value theory that will be applied to these data assumes that the size-duration pairs $(S_i, D_i)$, $i = 1,\dots,N$, are an i.i.d. sample from a common joint distribution F. However, data obtained from monitoring a single point of a network fall into the category of observational data. They cannot automatically be considered i.i.d. data. Rather, they resemble (unequally spaced) time series data. For example, a single web session of a single user can spawn several TCP-connections with short inter-arrival times, and these TCP-connections are naturally dependent.

The observational data are not necessarily homogeneous enough for meaningful inference from the trace specific empirical distribution. In order to get a more homogeneous data set we considered only downstream flows running on TCP port 80 (HTTP). We can assume that a user selects a random web content, that is, he/she chooses the size S from the file size distribution randomly, and then downloads this content using TCP. The action of a user initiates the TCP connection and its arrival time AT, but the TCP protocol generates the flow departure time DT when the downloading is finished and, thus, also determines the flow duration $D = DT - AT$. The duration D of the download depends on the rate of the flow and, since the access rate is very low, the access rate can be assumed to be the major constraint for the rate rather than congestion elsewhere in the network. This is the special feature of this data set. The majority of pairs (S, D) are like this, although there are also outliers in the data.

The reconstruction of flows from the original tcpdump-file was made using the Tstat program (Tstat 2007). The total number of the flows described above is N = 610000 and, throughout the paper, we will divide the data into smaller blocks of flows, consecutive according to arrival times AT.

The block size is denoted by n and the number of blocks by m, and we always have N = mn. We first consider m = 61 disjoint bivariate blocks, each of size n = 10000. Table 1 gives the observed ranges [min, max] of sample means, sample variances and sample maxima over these 61 blocks. Content in Table 1 refers to the size of the downloaded web content and Transmitted means Content plus segments retransmitted by TCP. Both are measures of the size of a flow. SYN-FIN means the period from the three-way handshake (synchronization) to finish. (Other measures of duration would also be possible.) There are some individual examples where Transmitted is significantly larger than Content but, in general, they differ little. This may be due to a bias arising from the fact that we consider only completely observed flows. In the data examples of this paper the size S corresponds to Content. Then R = S/D can be interpreted as the rate that the user experiences.

Table 1 Brief description of the data. Columns: Statistic, Unit, Definition, and the [Min, Max] ranges of the sample mean, sample variance and sample maximum. Rows: Size (kb): Content and Transmitted; Duration (sec): SYN-FIN.

2 Testing for heaviness of univariate tails

2.1 Theory

Let $X_1, X_2, \dots, X_n$ be the observed sample of r.v.s that have an identical marginal distribution. Rough methods to test the heaviness of tails and the number of finite moments, see Embrechts et al. (1997) and Markovich and Krieger (2006), include the ratio of the maximum to the sum, the plot of the empirical mean excess function, QQ-plots and the tail index estimation.

The statistic

\[ R_n(p) = M_n(p)/S_n(p), \qquad n \ge 1,\ p > 0, \]

where $S_n(p) = |X_1|^p + \dots + |X_n|^p$ and $M_n(p) = \max(|X_1|^p, \dots, |X_n|^p)$, can be used to check the moment conditions of the data (Embrechts et al. 1997, p. 308). If $R_n(p)$ is small for large n, then $\mathrm{E}|X|^p < \infty$; otherwise it suggests that the pth moment is infinite, $\mathrm{E}|X|^p = \infty$.
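The max-to-sum check can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `max_to_sum_ratios` and the synthetic exponential and Pareto samples are hypothetical and are not the trace data analysed in the paper.

```python
import random

def max_to_sum_ratios(xs, p):
    """Return [R_1(p), ..., R_n(p)] with R_n(p) = M_n(p) / S_n(p)."""
    ratios = []
    running_sum = 0.0   # S_n(p) = |X_1|^p + ... + |X_n|^p
    running_max = 0.0   # M_n(p) = max(|X_1|^p, ..., |X_n|^p)
    for x in xs:
        v = abs(x) ** p
        running_sum += v
        running_max = max(running_max, v)
        ratios.append(running_max / running_sum if running_sum > 0 else float("nan"))
    return ratios

# Synthetic illustration (assumed data, not the measured flows).
rng = random.Random(0)
light = [rng.expovariate(1.0) for _ in range(20000)]    # all moments finite
heavy = [rng.paretovariate(0.8) for _ in range(20000)]  # tail index 0.8: infinite mean

r_light = max_to_sum_ratios(light, 1.0)
r_heavy = max_to_sum_ratios(heavy, 1.0)
print(r_light[-1], r_heavy[-1])
```

For the light-tailed sample the ratio at p = 1 shrinks towards zero as n grows, whereas for the sample with infinite mean a single large observation keeps contributing a visible share of the sum.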
The Extreme Value Index (EVI) $\gamma$, or the tail index $\alpha = 1/\gamma$, indicates the shape of the tail and is a significant part of the analysis of heavy-tailed data. Distributions with regularly varying tails (RVT) provide a wide class of heavy-tailed distributions. The distribution function of an RVT distribution is given by $F(x) = 1 - x^{-1/\gamma}\ell(x)$, where $\ell(x)$ denotes a slowly varying function, e.g., $\ln x$ or a positive constant. The smaller the $\alpha$, the heavier the tail. A positive sign of the EVI may imply heaviness of the tail. The EVI may indicate the number of finite moments if the distribution is RVT. In this case, the moments of order $\beta$ exist, i.e. $\mathrm{E}\{X^\beta\} < \infty$, if $\beta < \alpha$, but $\mathrm{E}\{X^\beta\} = \infty$ if $\beta > \alpha$, Embrechts et al. (1997).

The well-known Hill's estimator is defined as

\[ \hat\gamma^H(n,k) = \frac{1}{k}\sum_{i=1}^{k} \log X_{(n-i+1)} - \log X_{(n-k)}, \]

where the parameter k indicates the number of largest order statistics $X_{(n-k)} \le \dots \le X_{(n)}$ that are used. This estimator is valid for a positive EVI $\gamma$ and can also be constructed for dependent data. Two other estimators of the EVI, the moment estimator

\[ \hat\gamma^M(n,k) = \hat\gamma^H(n,k) + 1 - 0.5\left(1 - \left(\hat\gamma^H(n,k)\right)^2 / S_{n,k}\right)^{-1}, \tag{2} \]

where $S_{n,k} = (1/k)\sum_{i=1}^{k} \left(\log X_{(n-i+1)} - \log X_{(n-k)}\right)^2$, and the UH estimator

\[ \hat\gamma^{UH}(n,k) = \frac{1}{k}\sum_{i=1}^{k} \log UH_i - \log UH_{k+1}, \tag{3} \]

where $UH_i = X_{(n-i)}\,\hat\gamma^H(n,i)$, see Beirlant et al. (2004) or Embrechts et al. (1997), are based on a similar argument involving the k + 1 largest values and are valid for any real-valued EVI.

We also apply the group estimator, which is based on a different principle. For this estimator the sample is divided into l groups $V_1, \dots, V_l$, each group containing r r.v.s, i.e. n = lr. Let $M^{(1)}_{li} = \max\{X_j : X_j \in V_i\}$ and let $M^{(2)}_{li}$ denote the second largest element in the same group $V_i$. Then $\gamma_l = 1/z_l - 1$, where $z_l = (1/l)\sum_{i=1}^{l} M^{(2)}_{li}/M^{(1)}_{li}$, can also be used as an estimator of $\gamma$, see Davydov et al. (2000) for details.

The main problem of all EVI estimates is the choice of the parameter k (or r in the group estimator). This parameter can be automatically calculated by the bootstrap method, Markovitch and Krieger (2002), Markovich (2005).

The linearity of a QQ-plot shows that the parametric model of the distribution is selected correctly. It is fruitful to check, as a candidate, the Generalized Pareto distribution GPD($\sigma$, $\gamma$) with the distribution function

\[ \Psi_{\sigma,\gamma}(x) = \begin{cases} 1 - (1 + \gamma x/\sigma)^{-1/\gamma}, & \gamma \ne 0, \\ 1 - \exp(-x/\sigma), & \gamma = 0, \end{cases} \]

where $\sigma > 0$, and $x \ge 0$ if $\gamma \ge 0$ and $0 \le x \le -\sigma/\gamma$ if $\gamma < 0$. The GPD tail-fitting approach is based on the Pickands theorem, which provides the GPD as a limiting distribution of the excesses over a high fixed threshold, Beirlant et al. (2004). However, the choice of parameters which provide a linear-looking QQ-plot is not unique.

For heavy-tailed distributions the mean excess function $e(u) = \mathrm{E}[X - u \mid X > u]$ tends to infinity.
For example, in the case of the Pareto distribution it increases linearly. For the data this can be tested with the empirical mean excess function, defined as

\[ e_n(u) = \sum_{i=1}^{n} (X_i - u)\,\mathbf{1}\{X_i > u\} \Big/ \sum_{i=1}^{n} \mathbf{1}\{X_i > u\}. \]

A more detailed description of these methods is given in Markovich and Krieger (2006).

2.2 Practice

The estimators of the EVI mentioned above were applied to the observed flow sizes and durations. The results, again in the form of [min, max]-ranges of these statistics over all blocks when n = 10000 and m = 61, are given in Table 2. For each sample the parameter k was estimated by a bootstrapping method. For the group estimator we used $r = l = 100 = \sqrt{n}$. The positive signs of all estimates allow us to assume that both flow sizes and durations have heavy-tailed distributions. All estimators apart from the group estimator indicate that the block-samples of flow sizes (both Content and Transmitted) may have infinite variance under the assumption that their distributions are regularly varying. Some block-samples of flow durations may have two finite first moments.

Table 2 Estimation of the EVI for flow sizes and durations. Columns: [Min, Max] ranges of $\hat\gamma^H(n,k)$, $\gamma_l$, $\hat\gamma^M(n,k)$ and $\hat\gamma^{UH}(n,k)$. Rows: Content, Transmitted, SYN-FIN.

Fig. 1 On the left are examples of $n \mapsto R_n(p)$ calculated from one block of the durations and for a variety of p-values: p = 0.5 (lowest curve), p = 1.0 (middle curve) and p = 2.0 (topmost curve). Since $R_n(1)$ does not decrease as n increases, one may conclude that the moments beginning with the first are not finite. The middle and right plots are examples of the estimation of $\gamma$ by Hill's estimator (solid line) and by the moment estimator (dotted line) for the flow size (middle) and the duration of transmission (right). Horizontal lines show the bootstrap-selected values of $\hat\gamma^H(n,k)$ and $\hat\gamma^M(n,k)$. For this block of data the estimate of the tail index $\alpha$ is larger than 1. This may imply that the first moment of the corresponding distribution is finite.

Fig. 2 On the left is an example of a QQ-plot for the duration with quantiles of the GPD(0.85, 1). In the middle and on the right are examples of the exceedance $e_n(u)$ against the threshold u for the flow size and duration, respectively.

Examples of the results of this analysis for one typical block-sample of observations are given in Figs. 1 and 2, and a summary over all m = 61 blocks is given in Table 3. The column Estimators of $\gamma$ summarizes Table 2. The indication <1 implies that only the pth moments with p < 1 may exist.
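As a rough illustration of Sect. 2.1, Hill's estimator, the moment estimator (2) and the empirical mean excess function can be sketched as follows. The function names and the synthetic Pareto sample are assumptions made for this sketch, and k is fixed by hand here rather than selected by the bootstrap method used in the paper.

```python
import math
import random

def _log_excesses(xs, k):
    """log X_(n-i+1) - log X_(n-k), i = 1..k, for positive data."""
    s = sorted(xs)
    n = len(s)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = math.log(s[n - k - 1])            # log X_(n-k), 0-based index
    return [math.log(s[n - i]) - threshold for i in range(1, k + 1)]

def hill(xs, k):
    """Hill's estimate of the EVI from the k largest order statistics."""
    logs = _log_excesses(xs, k)
    return sum(logs) / k

def moment_estimator(xs, k):
    """Moment estimate: gamma_H + 1 - 0.5 * (1 - gamma_H^2 / S_nk)^(-1)."""
    if k < 2:
        raise ValueError("k >= 2 is needed for the second log-moment")
    logs = _log_excesses(xs, k)
    g_h = sum(logs) / k
    s_nk = sum(v * v for v in logs) / k
    return g_h + 1.0 - 0.5 / (1.0 - g_h * g_h / s_nk)

def mean_excess(xs, u):
    """Empirical mean excess e_n(u); NaN if no observation exceeds u."""
    excesses = [x - u for x in xs if x > u]
    return sum(excesses) / len(excesses) if excesses else float("nan")

# Synthetic Pareto sample with tail index alpha = 2, i.e. EVI gamma = 0.5.
rng = random.Random(1)
sample = [rng.paretovariate(2.0) for _ in range(20000)]
print(hill(sample, 1000), moment_estimator(sample, 1000))
```

In practice the estimates are usually plotted against k, as in the middle and right plots of Fig. 1, because a single value of k can be misleading.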
The conclusion of Table 3 does not mean that the GPD was a good model. It means only that in the QQ-plot the few largest values of the tail could be forced into a roughly linear relationship with the model quantiles and that $e_n(u)$ seems to grow linearly when u is small, as shown in Fig. 2. When u is large the number of data points larger than u is very small and, thus, $e_n(u)$ is not very reliable.

Table 3 Conclusion over all m = 61 samples

              Amount of first finite moments    Type of distribution
              R_n(p)    Estimators of γ         QQ-plot                  e_n(u)
Content       1         2 or 1                  GPD(σ = 1, γ = 1.3)      Pareto-like
Transmitted   1         2 or <1                 GPD(σ = 1, γ = 1.3)      Pareto-like
SYN-FIN       <1        2 or <1                 GPD(σ = 1, γ = 0.85)     Pareto-like

We emphasize that the identical distribution here is affected by factors specific to the trace. It is also affected by our choice to consider only port 80 traffic. The results cannot be generalized to arbitrary TCP-flows. However, these results are in line with similar results obtained elsewhere, see e.g. those referred to by D'Auria and Resnick (2006).

3 Testing for dependence of univariate tails

3.1 Theory

The autocorrelation function (ACF) at lag h,

\[ \rho_X(h) = \rho(X_t, X_{t+h}) = \mathrm{E}\big((X_t - \mathrm{E}X_t)(X_{t+h} - \mathrm{E}X_{t+h})\big) / \mathrm{Var}(X_t), \]

is considered an important indicator of the dependence structure of a time series. For a stationary time series $\{X_t, t = 0, \pm 1, \pm 2, \dots\}$ the standard sample ACF at lag $h \in \mathbb{Z}$ is defined by

\[ \rho_{n,X}(h) = \frac{\sum_{t=1}^{n-h} (X_t - \bar X_n)(X_{t+h} - \bar X_n)}{\sum_{t=1}^{n} (X_t - \bar X_n)^2}, \tag{4} \]

where $\bar X_n = \frac{1}{n}\sum_{t=1}^{n} X_t$ represents the sample mean. The relevance of this estimate is determined by its rate of convergence to the real ACF. When the distribution of the $X_t$'s is very heavy-tailed (in the sense that $\mathrm{E}X_t^4 = \infty$, i.e. the tail index is small enough, $\alpha < 4$ in this case), this rate can be extremely slow. For heavy-tailed data with infinite variance it is better to use the following slightly modified estimate without the usual centering by the sample mean:

\[ \tilde\rho_{n,X}(h) = \frac{\sum_{t=1}^{n-h} X_t X_{t+h}}{\sum_{t=1}^{n} X_t^2}. \tag{5} \]

However, for the class of non-linear processes this estimate may behave in a very unpredictable way and not estimate anything reasonable, in the sense that this sample ACF may converge in distribution to a non-degenerate r.v. depending on h. For linear processes it converges in distribution to a constant depending on h (Davis and Resnick 1985). Resnick (1997) says: If on graphing the sample heavy tailed ACF $\tilde\rho_{n,X}(h)$ one finds only small values, then it may be possible to model the data as i.i.d. If the sample ACF is small beyond lag q, then there is some evidence that the moving average process MA(q) may be an appropriate model.
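The two sample ACFs (4) and (5) differ only in the centering, which a short sketch makes explicit. The function names below are hypothetical, and the helper `bartlett_bound` merely returns the Gaussian 95% reference value $1.96/\sqrt{n}$, which is only suggestive for heavy-tailed data.

```python
import math

def sample_acf(xs, h):
    """Standard sample ACF (4): centered by the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t + h] - mean) for t in range(n - h))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def heavy_tailed_acf(xs, h):
    """Heavy-tailed sample ACF (5): no centering by the sample mean."""
    n = len(xs)
    num = sum(xs[t] * xs[t + h] for t in range(n - h))
    den = sum(x * x for x in xs)
    return num / den

def bartlett_bound(n):
    """Reference value 1.96 / sqrt(n) (valid for linear Gaussian processes)."""
    return 1.96 / math.sqrt(n)

xs = [1.0, 2.0, 3.0, 4.0]
print(sample_acf(xs, 1), heavy_tailed_acf(xs, 1), bartlett_bound(len(xs)))
```

For positive data such as sizes and durations, the uncentered version (5) is positive at every lag, so its values are better compared across lags, or against a randomly reordered copy of the same block, than against zero.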

Usual short-range dependent data sets would show a sample ACF dying after only a few lags and then persisting within the 95% Gaussian confidence window. Independence can be assumed with a certainty of 95% if $|\rho_{n,X}(h)| \le 1.96/\sqrt{n}$. The boundaries $\pm 1.96/\sqrt{n}$ provided by Bartlett's formula (Brockwell and Davis 1991) are valid only for linear processes with Gaussian noise. In the case of regularly varying linear processes (Resnick 2006) or nonlinear processes (Mikosch 2002) such boundaries cannot be indicated so easily. Then a decision about independence evidently cannot be made from Bartlett's interval alone.

We also have to test for long range dependence, which means that there is dependence in a time series over an unusually long period of time. More exactly, the time series $(X_t)$ is long range dependent if

\[ \sum_{h=0}^{\infty} |\rho_X(h)| = \infty. \tag{6} \]

The Hurst parameter is a tool to test for LRD. One can assume that for some constant $c_\rho > 0$, $\rho_X(h) \sim c_\rho h^{2(H-1)}$ for large h and some $H \in (0.5, 1)$ (in this case (6) holds). The closer the Hurst parameter H is to 1, the slower $\rho_X(h)$ decays to zero as $h \to \infty$, i.e., the longer is the range of dependence in the time series.

3.2 Practice

Since we treat unequally spaced data as equally spaced we are completely blind to dependencies caused by the fact that a single user's web session can, and often does, generate several TCP connections which obviously are dependent. Hence, we know beforehand that the data are not independent and the task is merely to quantify how dependent the univariate time series data are.

Non-stationarities caused by the daily traffic profile, tariffing policy and other similar trace specific features easily affect the ACFs. Therefore we did the analysis using block sizes n = 10000, n = 1000 and, a few times, even n = 100. When the traffic load is low, n = 10000 may cover too long a period for the necessary stationarity assumption to hold. When the traffic load is high, n = 1000 may be too short and biased due to trace specific features. This resulted in hundreds of pictures, only a few of which can be shown here.

Very large values also easily affect the ACFs. The previous analysis (see Table 3) shows that the TCP-flow data appear heavy-tailed with possibly infinite variance. Therefore, the application of formula (5) is relevant. However, as indicated in Resnick (1997), there is no general knowledge of what small values of the ACF (5) should be, and without known confidence lines inference from any single example of (5) alone is not possible. Pictures of (5) contain the suggestive confidence lines $\pm 1.96/\sqrt{n}$. Therefore, we compared the ACFs calculated when the data are ranked in arrival order with the ACFs calculated when the same data are ranked in random order. Putting the data in a random order should destroy the dependence structure caused by correlated arrivals. Moreover, the operation of random ordering can be repeated many times for any fixed block.

Figure 3 shows one comparison of heavy-tailed ACFs. The general conclusion of this type of comparison was that dependence due to correlated arrivals is not very significant, since the overall shape of the (heavy-tailed) ACF did not usually change much. Large spikes are due to large values, not due to significant dependence at any lag. The standard ACFs of the TCP-flow durations have small values except at several lags. One can recognize at least three clusters in the ACF-plot that may indicate dependence, Mikosch (2002). The heavy-tailed ACFs are small beyond approximately the 5000th and 7000th lag, respectively, for the plots on the left and on the right. This may indicate that a linear model of the considered time series is possible and that the data are dependent. Since the values of both ACFs are of the same order, it may imply that the correlation of arrivals does not influence the ACF.

Fig. 3 On the left is an example of (4) for a block of n = 10000 flow durations, in the middle is an example of the heavy-tailed ACF (5) for the same data, and on the right is the same data but ranked in a random order before making the ACF.

Fig. 4 Left and middle plots show estimates of the sample heavy-tailed ACF (5) of the j = 10th block (n = 10000) of flow sizes; on the left the data are ranked in arrival order and in the middle the data are ranked in departure order before division into blocks. The right plot shows the sample values of Spearman's Rho, $r_{S,n}(D_{AT_i}, D_{DT_i})$, n = 10000, between durations in arrival and in departure rankings of all m = 61 blocks.

For the flow sizes $S_i$ we also compared ACFs when the whole data set was first ranked in arrival versus in departure order and the division into blocks was done after this departure ranking. In this case the same jth block does not consist of exactly the same points. This could bring forth some dependencies which are due to correlated flow departures. For flow durations this may not be meaningful since, due to the functional relationship $AT_i + D_i = DT_i$ for the data, it is hard to interpret the results. An example of this comparison is shown in Fig. 4. Again, the differences were typically small and could be explained by non-stationarities. The ACFs of the TCP-flow sizes may indicate a weak dependence.
The claim that the mobile access/backbone network was not congested at any time during the measurement means that, as a system, it was in a (trivial) equilibrium state. Since congestion at the access link, due to trace specific user behaviour, and congestion in the Internet are only local, the flow departure process should be roughly independent of the flow arrival process. Spearman's Rho, denoted by $\rho_S$, was used to check whether sizes and durations, when ranked in arrival order, were independent of exactly the same data when ranked in departure order. That is, the division into blocks was done first and the rankings were done within each block only. The sample version $r_{S,n}$ of $\rho_S$ satisfies the well-known Kendall (1970) inequality $\mathrm{Var}(r_{S,n}) \le \frac{3}{n}(1 - \rho_S^2)$, which bounds the standard deviation of $r_{S,n}$ by $\sqrt{3/n}$ and provides us with confidence lines. They are shown in the right plot of Fig. 4. This type of plot could indicate that ignoring the daily structure on the grounds of an uncongested mobile backbone would not be justified, but here it confirms this knowledge. The largest exceedances occur during the night time due to non-stationarity.
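The arrival-versus-departure comparison can be sketched as below. This is a minimal illustration with hypothetical helper names, which assumes that the departure time is computed as DT = AT + D and that ties receive average ranks. The value `sqrt(3/n)` is the upper bound on the standard deviation implied by the variance inequality above, not necessarily the exact confidence line drawn in Fig. 4.

```python
import math

def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Sample Spearman's Rho: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def arrival_vs_departure_rho(arrivals, durations):
    """Rho between durations listed in arrival order and in departure order."""
    by_arrival = [d for _, d in sorted(zip(arrivals, durations))]
    by_departure = [d for _, _, d in
                    sorted((a + d, a, d) for a, d in zip(arrivals, durations))]
    return spearman_rho(by_arrival, by_departure)

def std_bound(n):
    """Upper bound sqrt(3/n) on the standard deviation of the sample Rho."""
    return math.sqrt(3.0 / n)

print(arrival_vs_departure_rho([0, 10, 20, 30], [1.0, 2.0, 3.0, 1.5]))
```

When flows are short compared with their inter-arrival times, the two orderings coincide and the statistic equals 1, so the comparison is only informative for blocks in which many flows overlap.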

Fig. 5 Left plot shows estimates of the Hurst parameter for all block samples of durations. The middle and right plots are examples of ACFs calculated for the flow inter-arrival times and for the empirical rates, respectively.

LRD was also tested. It would be slightly odd if Content, as chosen by a user, were LRD, but such statistics as Transmitted and the (SYN-FIN) duration could in principle be affected by the LRD nature of the aggregate packet level traffic. Again, since the assumptions of various existing methods, see for example Taqqu and Teverovsky (1998), typically cannot be absolutely verified, different estimators of the Hurst parameter were compared. Figure 5 shows the results using the estimator $\hat H_n = 0.5\,(1 + \log_2(1 + \rho_{n,X}(1)))$ from Kettani and Gubner (2006). The confidence lines $\hat H_n \pm 5/\sqrt{n}$ are based on the assumption of an exactly second-order self-similar process. The exceedances of the confidence lines in the left plot of Fig. 5 are due to non-stationarity at night time. None of the time series is LRD.

Finally, since we really would need to know the dependencies in the bivariate time series $(S_i, D_i)$, $i = 1,\dots,n$, we also constructed ACFs of the flow inter-arrival times $AT_{i+1} - AT_i$, the empirical rates $R_i = S_i/D_i$, and $S_i^2 + D_i^2$, $i = 1,\dots,n$. Two examples are shown in Fig. 5. These ACFs could indicate some significant bivariate time series dependencies that would not be visible in other ways, but they were extremely well behaved and, furthermore, the empirical rates confirmed the fact that the mobile backbone was not congested at any time during the more than 30 hour long trace collection.

As a conclusion of the analysis presented in this section we can say that there are dependencies in the univariate time series data, but the data are certainly not LRD. On the contrary, they are close to independent data. Sizes (Content) are less dependent than durations, which is natural.
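The Hurst estimator quoted above needs only the lag-1 sample ACF, as the following sketch shows. The function names and the synthetic Gaussian sample are assumptions made for illustration, and the half-width $5/\sqrt{n}$ is taken from the text as given.

```python
import math
import random

def lag1_acf(xs):
    """Standard (centered) sample ACF at lag 1."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def hurst_estimate(xs):
    """H_n = 0.5 * (1 + log2(1 + rho_n(1)))."""
    return 0.5 * (1.0 + math.log2(1.0 + lag1_acf(xs)))

def hurst_band(xs):
    """Return (lower, upper) = H_n -/+ 5 / sqrt(n)."""
    h = hurst_estimate(xs)
    half_width = 5.0 / math.sqrt(len(xs))
    return h - half_width, h + half_width

# Synthetic i.i.d. sample: the estimate should be close to H = 0.5.
rng = random.Random(2)
iid = [rng.gauss(0.0, 1.0) for _ in range(10000)]
h_iid = hurst_estimate(iid)
low, high = hurst_band(iid)
print(h_iid, low, high)
```

A value of H near 0.5 corresponds to a lag-1 correlation near zero, while values approaching 1 require a lag-1 correlation approaching 1.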
We can also assume that the bivariate time series data $(S_i, D_i)$, $i = 1,\dots,n$, are only weakly dependent. As a conclusion of the data analysis of both the previous section and this section we can now conclude that the data not only appear to be but really are heavy-tailed.

4 Non-parametric estimation of the Pickands dependence function

4.1 Theory

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a bivariate i.i.d. (or weakly dependent) sample with a bivariate max-stable distribution G. This implies that there exist normalizing constants $a_{j,n} > 0$ and $b_{j,n} \in \mathbb{R}$, j = 1, 2, such that as $n \to \infty$,

\[ P\{(M_{1,n} - b_{1,n})/a_{1,n} \le x,\ (M_{2,n} - b_{2,n})/a_{2,n} \le y\} \to G(x,y), \tag{7} \]

where $M_{1,n} = \max\{X_1, \dots, X_n\}$ and $M_{2,n} = \max\{Y_1, \dots, Y_n\}$ are the componentwise maxima (Fougères 2004). An asymptotic bivariate distribution G of normalized maxima $(M_1, M_2)$ may be determined by its margins $G_1$ and $G_2$ through the representation

\[ G(x,y) = \exp\left( \log\big(G_1(x)G_2(y)\big)\, A\!\left( \frac{\log G_2(y)}{\log\big(G_1(x)G_2(y)\big)} \right) \right), \tag{8} \]

where $A(t)$, $t \in [0, 1]$, is the Pickands dependence function, Beirlant et al. (2004). One should not mix up the marginal CDFs $F_1$ and $F_2$ of the r.v.s X and Y with the corresponding univariate extreme value CDFs $G_1$ and $G_2$ of their maxima $M_{1,n}$ and $M_{2,n}$.

In order to get an approximation of $F_R(x)$ we can replace $F(y,z)$ in (1) by an estimated $G(y,z)$, Beirlant et al. (2004). Generally, the detailed motivation for the replacement of a d-variate CDF F by a d-variate extreme value CDF G is given in Beirlant et al. (2004), Sect. 9.4 (see also formula (9.62)), under the assumption that F is in the domain of attraction of G, i.e. that (7) is fulfilled. We obtain

\[ F_R(x) \approx \int_0^\infty \int_0^{zx} d\left( \exp\left( \log\big(\hat G_1(y)\hat G_2(z)\big)\, \hat A\!\left( \frac{\log \hat G_2(z)}{\log\big(\hat G_1(y)\hat G_2(z)\big)} \right) \right) \right). \tag{9} \]

Let the random pair (X, Y) have the DF G(x, y). The Pickands function indicates the degree of dependence between X and Y. In practice, (X, Y) are component-wise maxima over large blocks of data. In the bivariate case the function A(t) satisfies A(0) = A(1) = 1, it is convex and it lies inside the triangle determined by the points (0, 1), (1, 1) and (0.5, 0.5). The cases $A \equiv 1$ and $A(t) = \max\{1 - t, t\}$ correspond to total independence and total dependence, respectively.

It is convenient to transform the initial random pairs $(X_i^*, Y_i^*)$ to new pairs $(\xi_i, \eta_i)$, $i = 1,\dots,m$, where m is the number of blocks, in such a way that the margins are all the same. For example, the transformations

\[ \xi_i = -\log G_1(X_i^*), \qquad \eta_i = -\log G_2(Y_i^*) \tag{10} \]

lead to exponentially distributed r.v.s $\xi$ and $\eta$. In this case we get

\[ P\{\xi > x, \eta > y\} = \exp\left( -(x + y)\, A\!\left( \frac{y}{x + y} \right) \right), \qquad x \ge 0,\ y \ge 0. \tag{11} \]

The simulation study provided in Hall and Tajvidi (2000) indicates that the best estimators of A are $\hat A^C_m$ from Capéraà et al. (1997) and $\hat A^{HT}_m$ from Hall and Tajvidi (2000):

\[ \hat A^{HT}_m(t) = \left( \frac{1}{m}\sum_{j=1}^{m} \min\left\{ \frac{\hat\xi_j/\bar\xi_m}{1 - t},\ \frac{\hat\eta_j/\bar\eta_m}{t} \right\} \right)^{-1}, \tag{12} \]

and

\[ \log \hat A^C_m(t) = \frac{1}{m}\sum_{j=1}^{m} \log\max\big\{ t\,\hat\xi_j,\ (1 - t)\,\hat\eta_j \big\} - t\,\frac{1}{m}\sum_{j=1}^{m} \log\hat\xi_j - (1 - t)\,\frac{1}{m}\sum_{j=1}^{m} \log\hat\eta_j. \tag{13} \]

Here $\hat\xi_j = -\log \hat G_1(X_j^*)$ and $\hat\eta_j = -\log \hat G_2(Y_j^*)$, $j = 1,\dots,m$, $\bar\xi_m = \frac{1}{m}\sum_{j=1}^{m} \hat\xi_j$ and $\bar\eta_m = \frac{1}{m}\sum_{j=1}^{m} \hat\eta_j$.

When estimating A we face the following three problems.
1. The estimators are not convex. The easiest solution is to take their convex hull.
2. The margins $G_1$ and $G_2$ are unknown. One has to replace them by their estimates $\hat G_1$ and $\hat G_2$. One can choose some parametric model. The GPD($\sigma$, $\gamma$) or the univariate generalized extreme value distribution (GEV) may be applied as such a model, i.e.

\[ H_\gamma(x) = \begin{cases} \exp\left( -\left(1 + \gamma\,\frac{x - \mu}{\sigma}\right)^{-1/\gamma} \right), & \gamma \ne 0, \\ \exp\left( -e^{-(x - \mu)/\sigma} \right), & \gamma = 0, \end{cases} \]

where $1 + \gamma(x - \mu)/\sigma > 0$. This approach requires estimates of the parameters. In the i.i.d. case the ML method can be applied.
3. The pairs of component-wise maxima may not be observable (they may be artificial) in the sense that such pairs are not present in the initial bivariate sample.

Under certain conditions one can estimate a bivariate extreme value distribution from an initial random sample, see Sect. 9.4 of Beirlant et al. (2004). The presence of artificial pairs does not affect the asymptotic distribution (7) and the representations (8) and (11). Excluding artificial pairs may reduce the sample size. In consequence, the accuracy of the A-function estimation may decrease.

4.2 Practice

Given the jth block of size n, let us denote the block maxima by $S_j^* = \max\{S_i \mid i = 1,\dots,n\}$ and $D_j^* = \max\{D_i \mid i = 1,\dots,n\}$. If the number of such blocks is m we obtain the bivariate block maxima data $(S_j^*, D_j^*)$, $j = 1,\dots,m$. The selection of the number of blocks m is a principal problem. Obviously, a larger number of blocks leads to a lower variance and a larger bias of the parameter estimates. The optimal number of blocks is connected with the dependence in the data. Roughly speaking, the size of the blocks should provide approximate independence of the maxima sample. The more dependent the data, the larger the size of the blocks should be (Leadbetter 1983). As concluded in Sect. 3, we can assume independence of the original pairs $(S_i, D_i)$. Experimentally, block sizes including n = 200, 500 and 1000 were tested.
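The two estimators of Sect. 4.1 can be sketched as follows for pairs that have already been transformed to unit exponential margins by (10). The function names and the simulated samples are assumptions made for this sketch. The convexification step of problem 1 is not applied, and the endpoints t = 0 and t = 1 are simply set to 1, which is the limiting value of both formulas.

```python
import math
import random

def pickands_hall_tajvidi(xi, eta, t):
    """Hall-Tajvidi estimate of A(t) from margins rescaled to mean one."""
    if t <= 0.0 or t >= 1.0:
        return 1.0
    m = len(xi)
    xi_bar = sum(xi) / m
    eta_bar = sum(eta) / m
    total = sum(min(x / xi_bar / (1.0 - t), e / eta_bar / t)
                for x, e in zip(xi, eta))
    return m / total

def pickands_caperaa(xi, eta, t):
    """Estimate of A(t) in the logarithmic form of Capéraà et al."""
    if t <= 0.0 or t >= 1.0:
        return 1.0
    m = len(xi)
    mean_log_max = sum(math.log(max(t * x, (1.0 - t) * e))
                       for x, e in zip(xi, eta)) / m
    mean_log_xi = sum(math.log(x) for x in xi) / m
    mean_log_eta = sum(math.log(e) for e in eta) / m
    return math.exp(mean_log_max - t * mean_log_xi - (1.0 - t) * mean_log_eta)

# Two synthetic extremes: independent margins and identical margins.
rng = random.Random(3)
xi_ind = [rng.expovariate(1.0) for _ in range(5000)]
eta_ind = [rng.expovariate(1.0) for _ in range(5000)]
xi_dep = [rng.expovariate(1.0) for _ in range(5000)]
eta_dep = list(xi_dep)

print(pickands_hall_tajvidi(xi_ind, eta_ind, 0.5),
      pickands_caperaa(xi_dep, eta_dep, 0.5))
```

With independent margins both estimates at t = 0.5 are close to 1, the upper boundary of the triangle, and with identical margins they equal 0.5, the lower vertex, which matches the two limiting cases of A described in Sect. 4.1.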
Since we have enough data, the analysis of flow size and duration pairs uses true (not artificial) data points only, i.e., flows where both size and duration are simultaneously block maxima. However, not every block contains such a pair, hence m is typically smaller than N/n. The left plot of Fig. 6 shows a scatter plot of such pairs when n = 1000 and m = 211. Before the marginal CDFs of the component-wise maxima of size and duration were estimated by the parametric GEV model, independence was checked with ACFs as explained in Sect. 3. The ML estimates of the GEV parameters are summarized in Table 4. We checked the GEV models by QQ-plots; examples for n = 1000 and m = 211 are shown in Fig. 6. The QQ-plots are nearly linear and close to the diagonal line except at a few tail points, which are much less heavy than the parametric model suggests. The cases n = 500 and n = 200 were worse, and in those cases the GEV model is not adequate. The normalizing constants of (7) are embedded in the parameters μ and σ of Table 4. Hence they depend on the block size n, but the values of γ should correspond to the values of Table 2 of Sect. 2. Since the fit was good (except for the outliers) only in the case n = 1000, this case provides the best estimates of the EVI. The obtained estimates of the Pickands function, shown in Fig. 7, show strong dependence between S and D (since both estimates are situated deeply under the upper boundary of the triangle), weak dependence between R and S, and near independence of R and D (since both estimates, especially $\hat{A}^{C}_m(t)$, are close to the horizontal line). The corresponding dependence (or independence) of the initial r.v.s S and D, R and S, R and D follows from the dependence (or independence) of their extreme values, assuming that the corresponding sequences of pairs $(S_i, D_i)$, $(R_i, S_i)$ and $(R_i, D_i)$ are independent.¹

¹ Indeed, suppose the maxima $X^*$ and $Y^*$ are independent. This means that $P\{X^* \le x, Y^* \le y\} = P\{X^* \le x\} P\{Y^* \le y\}$. Then for i.i.d. pairs $(X_i, Y_i)$, $i = 1,\dots,n$, we have $P\{X^* \le x\} P\{Y^* \le y\} = P^n\{X_i \le x\} P^n\{Y_i \le y\} = P^n\{X_i \le x, Y_i \le y\}$. From here the independence of $X_i$ and $Y_i$ follows. Suppose now that the maxima $X^*$ and $Y^*$ of the r.v.s X and Y are dependent. Then evidently X and Y cannot be independent.

Fig. 6 On the left is a scatter plot of pairs of block maxima $(S_j^*, D_j^*)$, $j = 1,\dots,m$, where n = 1000 and m = 211. QQ-plots against the GEV model of S (middle) and D (right) when n = 1000 and m = 211

Fig. 7 On the left plot are the estimates $\hat{A}^{HT}_m(t)$ (dashed line) and $\hat{A}^{C}_m(t)$ (solid line) calculated from $(S_j^*, D_j^*)$, $j = 1,\dots,m$, where n = 1000 and m = 211, when GEV models were used for the (10) transform. The middle and right plots show estimates of A for the pairs $(S_j^*, R_j^*)$ and $(D_j^*, R_j^*)$, respectively, where $R_j^* = \max_{1 \le i \le n} R_{ij}$ is the maximal rate $S_i/D_i$ in the jth block, $j = 1,\dots,m$, m = 610, n = 1000

Table 4 ML estimates of GEV parameters of TCP-flow data

Statistic | Definition | Block size | Maxima | γ | μ | σ
Size | Content | n = 1000 | m = | | |
Size | Content | n = 500 | m = | | |
Size | Content | n = 200 | m = | | |
Duration | SYN-FIN | n = 1000 | m = | | |
Duration | SYN-FIN | n = 500 | m = | | |
Duration | SYN-FIN | n = 200 | m = | | |

Since the GEV model was adequate only in the case n = 1000 and still there were outliers, the empirical CDFs were used as a model for the margins when estimating $\hat{A}^{HT}_m(t)$ and $\hat{A}^{C}_m(t)$ in the cases n = 200, n = 500 and also for n = . The middle plot of Fig. 8 shows the case when n = 200 and m = 857.

Fig. 8 Left: PP-plot test of $\hat{A}^{HT}_m(t)$ with the GEV marginals when n = 1000 and m = 211. Middle: The empirical CDF based estimation of the Pickands dependence function by the estimators $\hat{A}^{C}_m(t)$ (solid line) and $\hat{A}^{HT}_m(t)$ (dashed line) when n = 200 and m = 857. Right: PP-plot test of $\hat{A}^{HT}_m(t)$ with the empirical CDFs as marginals when n = 200 and m = 857

4.3 Testing of estimates of A-function

Finally, we need a method to show that the estimated A is good. Assume that the second derivative $A''$ exists. It is a reasonable assumption since A is known to be convex. One such method proceeds as follows. We first transform the block maxima data $(S_i^*, D_i^*)$, $i = 1,\dots,m$, into the data $(\xi_i, \eta_i)$, $i = 1,\dots,m$, of (10) by using either parametric GEV models as margins or simply empirical CDFs. Then we define H(x, y) by

$$H(x,y) = P\{\xi \le x, \eta \le y\} = 1 - P\{\xi > x\} - P\{\eta > y\} + P\{\xi > x, \eta > y\} = 1 - e^{-x} - e^{-y} + \exp\left( -(x+y)\,A\left( \frac{y}{x+y} \right) \right).$$

We use the notation $h_{\xi|\eta}(x|y) = h(x,y)/e^{-y}$ for the conditional density of ξ given that η = y. Then

$$P\{\xi \le x \mid \eta = y\} = \int_0^x h_{\xi|\eta}(w|y)\,dw = \int_0^x \frac{h(w,y)}{e^{-y}}\,dw = \frac{1}{e^{-y}}\,\frac{\partial H(x,y)}{\partial y}$$

and

$$P\{\xi/\eta \le z\} = \int_0^\infty P\{\xi \le zy \mid \eta = y\}\,e^{-y}\,dy = \int_0^\infty \frac{\partial H}{\partial y}(zy, y)\,dy.$$

Now a straightforward calculation gives the distribution of χ = ξ/η. Indeed, with $t = 1/(1+z)$ one has $\frac{\partial H}{\partial y}(zy,y) = e^{-y} - \left[ A(t) + (1-t)A'(t) \right] e^{-(1+z)\,y\,A(t)}$, and integrating over $y \in (0,\infty)$ yields

$$F_\chi(z) = \frac{z\left( 1 + z - \dfrac{A'\left( \frac{1}{1+z} \right)}{A\left( \frac{1}{1+z} \right)} \right)}{(1+z)^2}. \qquad (14)$$

The value of the derivative $A'(t)$ can be taken to be the slope of the edge of the convex hull of the estimate of A. After constructing the values $\chi_i = \xi_i/\eta_i$, $i = 1,\dots,m$, we can make a probability plot (PP-plot)

$$\left( F_\chi(\chi_{(i)}),\ \frac{i}{m} \right), \quad i = 1,\dots,m, \qquad (15)$$

and see whether the fit is good. The proposed method makes use of the well known fact that if a random variable X has a continuous CDF F(x), then F(X) is uniformly distributed. Hence, the PP-plot is close to linear if the model of the distribution is selected correctly, see Coles (2001). The accuracy of this test depends on the accuracy of the estimation of both marginal distributions $G_1$ and $G_2$. The PP-plot can indicate an appropriate value of m for a selected model of the marginal distribution. The left and right plots of Fig. 8 show two examples of this test. In the left plot of Fig. 8 the model is the convex hull of $\hat{A}^{HT}_m(t)$, m = 211, with GEV marginals; in the right plot the model is the convex hull of $\hat{A}^{HT}_m(t)$, m = 857, with empirical CDFs as marginals.

4.4 Demonstration of the use of the approximation formula

In order to demonstrate the use of the approximation formula (9) we introduce the notation $\tilde{R}_j = S_j^*/D_j^*$, $j = 1,\dots,m$. (Note that $\tilde{R}_j \neq R_j^* = \max_{1 \le i \le n} R_{ij}$; the notation $R_j^*$ was used in Fig. 7.) We show a generic method to obtain a parametric model for the distribution $F_R$ of this ratio, and show that it serves as a model for the conditional distribution of R = S/D given that S and D are larger than some thresholds $s_0$ and $d_0$, respectively. (Note that, since the access rates are bounded, a threshold $s_0$ for the size induces an implicit threshold $d_0$ for the duration.)
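The PP-plot test based on (14) and (15) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the convex hull is computed as the lower convex envelope of the estimate evaluated on an increasing grid of t values, and $A'$ is read off as the slope of the hull edge containing the evaluation point, which is one possible reading of the slope rule described above.

```python
import numpy as np


def convex_minorant(t, a):
    """Indices of the vertices of the lower convex hull of the points
    (t_k, a_k); t must be strictly increasing."""
    hull = []
    for k in range(len(t)):
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            cross = (t[j] - t[i]) * (a[k] - a[i]) - (a[j] - a[i]) * (t[k] - t[i])
            if cross <= 0:  # vertex j lies on or above the chord from i to k
                hull.pop()
            else:
                break
        hull.append(k)
    return np.array(hull)


def hull_model(t, a):
    """Callables A(u) and dA(u) for the piecewise-linear convex hull."""
    t, a = np.asarray(t, dtype=float), np.asarray(a, dtype=float)
    idx = convex_minorant(t, a)
    tv, av = t[idx], a[idx]
    slopes = np.diff(av) / np.diff(tv)

    def A(u):
        return np.interp(u, tv, av)

    def dA(u):
        # index of the hull edge whose t-range contains u
        k = np.clip(np.searchsorted(tv, u, side="right") - 1, 0, len(slopes) - 1)
        return slopes[k]

    return A, dA


def F_chi(z, A, dA):
    """CDF (14) of chi = xi / eta; A and A' are evaluated at 1 / (1 + z)."""
    z = np.asarray(z, dtype=float)
    u = 1.0 / (1.0 + z)
    return z * (1.0 + z - dA(u) / A(u)) / (1.0 + z) ** 2


def pp_points(xi, eta, A, dA):
    """Coordinates (15): model probabilities against i / m."""
    chi = np.sort(np.asarray(xi) / np.asarray(eta))
    m = len(chi)
    return F_chi(chi, A, dA), np.arange(1, m + 1) / m
```

Plotting the two arrays returned by `pp_points` against each other gives the PP-plot; points close to the diagonal indicate that the chosen A and margins are compatible with the data. Under independence (A ≡ 1) the formula reduces to z/(1 + z), the CDF of the ratio of two independent unit exponentials, which is a convenient sanity check.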
First we briefly list some reasons why a parametric model for $F_R$ is expected to be useful: (1) The quantiles of $F_R$ contain information about bottlenecks of the rates of all flows with size larger than $s_0$. (2) The LRD of the aggregate traffic is known to be a consequence of heavy-tailed file sizes. Thus, the ability to simulate the rates of large flows can be expected to help in studying LRD and burstiness phenomena of the aggregate traffic.

The margins $G_1$ and $G_2$ of the right hand side of (9) can be modelled, for example, by the fitted GEV distributions or by GPDs, see Sect. 2. The POT package for the R statistical software, Ribatet (2006), was found useful when dealing with GPDs. However, for this data, the GEV distribution with parameters taken from Table 4 with m = 211 was finally the best choice. Next, guided by the non-parametric estimates of A, we searched for a suitable parametric model for A. In the literature, see e.g. Hall and Tajvidi (2000), Ribatet (2006), there exist at least six different families of parametric models for A. The left plot of Fig. 9 shows that the best estimates of A, namely $\hat{A}^{HT}_m$ (and $\hat{A}^{C}_m$) with m = 857, can be approximated, for example, by a logistic model of the dependence function, Hall and Tajvidi (2000), Ribatet (2006), of the type

$$A_r(t) = \left( t^r + (1-t)^r \right)^{1/r}$$

when r = 2. On the other hand, if $A_r$ was assumed a priori as a model for A and the value of r was estimated from the data, we also got $r \approx 2$.

Fig. 9 Left: A comparison of $\hat{A}^{HT}_m$, m = 857 (dashed curve) with the logistic model $A_r$ when r = 2 (solid curve). Middle: A PP-plot of the GEV and $A_2$ based parametric model $F_R$ against all flows with size larger than a threshold value 200 kb. Right: The parametric model CDF $F_R(x)$ (dashed curve) and the empirical CDF of rates of all flows with size $S \ge 200$ kb (solid curve)

To demonstrate the relevance of the model $F_R$ to empirical data, we show the empirical CDF of the corresponding observations $R_i = S_i/D_i$ and a PP-plot of the model. The middle plot of Fig. 9 shows a PP-plot of the parametric model $F_R$, where $G_1$ and $G_2$ are modelled by GEVs with parameters taken from Table 4 (m = 211) and $A_2$ is chosen as a model for A, against all flows with size larger than a threshold value $s_0 = 200$ kb. This threshold value was found by requiring the PP-plot to be as linear as possible. It corresponds to the 99.5% empirical quantile of sizes and there were flows with $S \ge s_0$. The implicit threshold for durations was $d_0 = 13.4$ s, which corresponds to the 65% empirical quantile of durations. The difference in the quantiles is due to the fact that there were a lot of flows with size smaller than 200 kb but duration longer than 13.4 s. Finally, the right plot of Fig. 9 shows the parametric CDF $F_R$, where $G_1$ and $G_2$ are modelled by GEVs with parameters taken from Table 4 (m = 211) and $A_2$ is chosen as a model for A. For comparison it also shows the empirical CDF of R = S/D of all flows with $S \ge s_0 = 200$ kb.
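The logistic dependence model $A_r$ and a way to pick r from a non-parametric estimate can be sketched in Python as follows. This is a sketch under assumptions: the source does not say how r was estimated, so the least-squares grid search below is only one plausible choice, and the function names and the default grid are hypothetical.

```python
import numpy as np


def logistic_A(t, r):
    """Logistic dependence function A_r(t) = (t^r + (1 - t)^r)^(1/r), r >= 1."""
    t = np.asarray(t, dtype=float)
    return (t ** r + (1.0 - t) ** r) ** (1.0 / r)


def logistic_dA(t, r):
    """Derivative of A_r with respect to t, usable as A' in (14)."""
    t = np.asarray(t, dtype=float)
    g = t ** r + (1.0 - t) ** r
    return g ** (1.0 / r - 1.0) * (t ** (r - 1.0) - (1.0 - t) ** (r - 1.0))


def fit_logistic_r(t, a_hat, r_grid=None):
    """Value of r on a grid minimizing the squared distance between A_r and
    a non-parametric estimate a_hat evaluated at the points t (assumed
    criterion, not taken from the source)."""
    if r_grid is None:
        r_grid = np.linspace(1.0, 10.0, 901)
    sse = [np.sum((logistic_A(t, r) - a_hat) ** 2) for r in r_grid]
    return r_grid[int(np.argmin(sse))]
```

Here r = 1 gives A ≡ 1 (independence) and larger r gives stronger dependence, so a fitted value near 2 corresponds to the clearly dependent size and duration pairs reported above.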
5 Conclusions

This paper presents several new ideas concerning TCP-flow data analysis that are of interest for both statistics and telecommunications. The distribution of the rate R of TCP-flows of a given access network is approximated by means of the bivariate extreme value distribution of the flow size S and duration D, using the representation by the Pickands A-function. The dependence of extreme values of size, duration and rate of TCP-flows is investigated by means of the Pickands function and different models of the marginal distributions, namely the GEV and the GPD. We found that, for this data, the TCP-flow size and duration are dependent random variables, the throughput rate and size are weakly dependent, and the throughput rate and duration are almost independent.

Regarding the statistical innovations, the proposed statistic (14) and the further use of the PP-plot technique (15) provide a new method to test the relevance of estimates of the Pickands function. Another new statistical method is proposed to estimate the most sensitive smoothing parameter of the Pickands function, that is, the number of blocks m. To the best of our knowledge, this was done for the first time in this paper. Moreover, we gave a motivation for this type of analysis and performed the analysis with one concrete data example. A data transformation technique (10) to new data sets with exponentially distributed random variables is applied to estimate and to test the A-function. The use of such a transformation allows the described methodology to be extended to data sets with other (not necessarily exponential) tails, specifically to other data on TCP flows.

References

Beirlant, J., Goegebeur, Y., Teugels, J., & Segers, J. (2004). Statistics of extremes. New York: Wiley.
Brockwell, P., & Davis, R. (1991). Time series: theory and methods. Berlin: Springer.
Capéraà, P., Fougères, A.-L., & Genest, C. (1997). Estimation of bivariate extreme value copulas. Biometrika, 84.
Castellana, J. V., & Leadbetter, M. R. (1986). On smoothed probability density estimation for stationary processes. Stochastic Processes and their Applications, 21.
Coles, S. (2001). An introduction to statistical modeling of extreme values. Berlin: Springer.
D'Auria, B., & Resnick, S. (2006). Data network models of burstiness. Advances in Applied Probability, 38.
Davis, R., & Resnick, S. (1985). Limit theory for moving averages of random variables with regularly varying tail probabilities. Annals of Probability, 13.
Davydov, Y., Paulauskas, V., & Račkauskas, A. (2000). More on p-stable convex sets in Banach spaces. Journal of Theoretical Probability, 13(1).
Embrechts, P., Klüppelberg, C., & Mikosch, T. (1997). Modelling extremal events for finance and insurance. Berlin: Springer.
Fougères, A.-L. (2004). Multivariate extremes. In Extreme values in finance, telecommunications and the environment. London: Chapman & Hall.
Hall, P., & Tajvidi, N. (2000). Distribution and dependence-function estimation for bivariate extreme-value distributions. Bernoulli, 6.
Hall, P., Lahiri, S. N., & Truong, Y. K. (1995). On bandwidth choice for density estimation with dependent data. Annals of Statistics, 23(6).
Kendall, M. (1970). Rank correlation methods (4th edn.). London: Griffin.
Kettani, H., & Gubner, J. A. (2006). A novel approach to the estimation of the long-range dependence parameter. IEEE Transactions on Circuits and Systems-II: Express Briefs, 53(6).
Kilpi, J., & Lassila, P. (2006). Micro- and macroscopic analysis of RTT variability in GPRS and UMTS networks. In F. Boavida et al. (Eds.), LNCS: Networking 2006. Berlin: Springer.
Leadbetter, M. R. (1983). Extremes and local dependence in stationary sequences. Probability Theory and Related Fields, 65(2).
Markovich, N. M. (2005). On-line estimation of the tail index for heavy-tailed distributions with applications to WWW-traffic. In Proceedings of the EuroNGI first conference on NGI: traffic engineering, Rome, Italy.
Markovitch, N. M., & Krieger, U. R. (2002). The estimation of heavy-tailed probability density functions, their mixtures and quantiles. Computer Networks, 40(3).
Markovich, N. M., & Krieger, U. R. (2006). Inspection and analysis techniques for traffic data arising from the Internet. In Proceedings of the HETNETs '04 2nd international working conference on performance modelling and evaluation of heterogeneous networks, Ilkley, West Yorkshire (pp. 72/1-72/9).
Mikosch, T. (2002). Modeling dependence and tails of financial time series (Technical Report Working Paper No. 181). University of Copenhagen, Laboratory of Actuarial Mathematics.
Resnick, S. (1997). Heavy tail modeling and teletraffic data. Annals of Statistics, 25. With discussion and a rejoinder by the author.
Resnick, S. (2006). Heavy-tail phenomena. Probabilistic and statistical modeling. Berlin: Springer.
Ribatet, M. (2006). A user's guide to the POT package (Version 1.0).
Taqqu, M., & Teverovsky, V. (1998). On estimating the intensity of long-range dependence in finite and infinite variance time series. In R. J. Adler, F. E. Feldman, & M. S. Taqqu (Eds.), A practical guide to heavy tails. Basel: Birkhäuser.
Tstat (2007). Tstat: TCP statistic and analysis tool.
van de Meent, R., & Mandjes, M. (2005). Evaluation of user-oriented and black-box traffic models for link provisioning. In Proceedings of the 1st EuroNGI conference on next generation Internet networks traffic engineering, Rome, Italy. New York: IEEE Press.


More information

The extremal elliptical model: Theoretical properties and statistical inference

The extremal elliptical model: Theoretical properties and statistical inference 1/25 The extremal elliptical model: Theoretical properties and statistical inference Thomas OPITZ Supervisors: Jean-Noel Bacro, Pierre Ribereau Institute of Mathematics and Modeling in Montpellier (I3M)

More information

Math 576: Quantitative Risk Management

Math 576: Quantitative Risk Management Math 576: Quantitative Risk Management Haijun Li lih@math.wsu.edu Department of Mathematics Washington State University Week 11 Haijun Li Math 576: Quantitative Risk Management Week 11 1 / 21 Outline 1

More information
