Analysis methods of heavy-tailed data
1 Institute of Control Sciences, Russian Academy of Sciences, Moscow, Russia. PhD course: February 13-18, 2006, Bamberg, Germany; June 19-23, 2006, Brest, France; May 14-19, 2007, Trondheim, Norway.
2 Chapter 3. Heavy-tailed density estimation. Combined parametric-nonparametric methods, Barron's estimate and χ²-optimality. Kernel estimators with variable bandwidth and their smoothing methods: the weighted version of squared-error cross-validation (WISE) and the discrepancy method. Re-transformed nonparametric estimators.
3 In Section 3 the problems of heavy-tailed density estimation are discussed. Three approaches are considered. 1) Combined parametric-nonparametric methods, where the tail domain of the density is fitted by a parametric model and the main part of the density (the body) is fitted by a nonparametric method such as a histogram; a similar approach, realized by Barron's estimator, is also considered. 2) Kernel estimates with variable bandwidth; the optimal accuracy of these estimates as well as their disadvantages for heavy-tailed density estimation are discussed. 3) Re-transformed estimates, which use a preliminary transformation of the underlying random variable to a new one whose density is more convenient to restore.
4 Specific features of the analysis of heavy-tailed distributions: the heavy tail decays to zero at a slower-than-exponential rate; Cramér's condition is violated; observations are sparse in the tail domain of the distribution. Aim: non-parametric PDF estimation with accurate tail behavior. Comparison of PDFs is needed in classification: classification of measurements belonging to different sources (mobile, fax, normal calls, Internet, ...); classification of services using customers' behavior.
5 Example of heavy-tailed density estimation: the Fréchet PDF, estimation of the body and the tail.
6 Statement of the problem. Combined parametric-non-parametric estimators with separate estimation of the tail and the body of the PDF. The kernel estimator

f̂_h(x) = (1/(nh)) Σ_{i=1}^n K((x − X_i)/h)

produces spurious peaks in the tail domain or over-smooths the main part of the PDF for finite samples.
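A minimal Python sketch of the fixed-bandwidth estimator above (illustrative only; the Gaussian kernel, the Pareto test sample, and all names are my own assumptions, not from the slides):

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, sample, h):
    """Standard fixed-bandwidth kernel estimate f_h(x) = (1/(nh)) sum_i K((x - X_i)/h)."""
    u = (np.asarray(x, dtype=float)[..., None] - sample) / h
    return gauss(u).sum(axis=-1) / (len(sample) * h)

# Pareto(1) sample (F(x) = 1 - 1/x, x >= 1): with a single fixed h the estimate
# shows spurious peaks at the sparse tail observations or over-smooths the body.
rng = np.random.default_rng(0)
sample = (1.0 - rng.uniform(size=500)) ** -1.0
grid = np.linspace(1.0, 10.0, 200)
est = kde(grid, sample, h=0.3)
```

Evaluating `est` for several values of h makes the peak/over-smoothing trade-off visible.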
7 Statement of the problem. Variable bandwidth kernel estimator:

f̂_A(x) = (1/(nh)) Σ_{i=1}^n f(X_i)^{1/2} K((x − X_i) f(X_i)^{1/2} / h),

f̃_A(x) = (1/(nh)) Σ_{i=1}^n f̂(X_i)^{1/2} K((x − X_i) f̂(X_i)^{1/2} / h),

where f̂(X_i) is a pilot estimate of f(x), e.g. a standard kernel estimate. Advantage over the standard kernel estimator: local adaptation to the sample via the local bandwidth h f̂(X_i)^{−1/2} with a fixed h.
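The practical (pilot-based) version can be sketched as follows; the Gaussian kernel, the clipping of the pilot away from zero, and the bandwidth values are assumptions for illustration:

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, sample, h):
    u = (np.asarray(x, dtype=float)[..., None] - sample) / h
    return gauss(u).sum(axis=-1) / (len(sample) * h)

def abramson_kde(x, sample, h, h1):
    """Practical variable-bandwidth estimate: a standard KDE with bandwidth h1
    is the pilot f_hat(X_i); each point then gets the local bandwidth
    h / f_hat(X_i)^(1/2)."""
    pilot = np.maximum(kde(sample, sample, h1), 1e-12)  # keep sqrt well-defined
    s = np.sqrt(pilot)
    u = (np.asarray(x, dtype=float)[..., None] - sample) * s / h
    return (s * gauss(u)).sum(axis=-1) / (len(sample) * h)

rng = np.random.default_rng(0)
sample = (1.0 - rng.uniform(size=500)) ** -1.0   # Pareto(1) sample
grid = np.linspace(1.0, 10.0, 200)
est = abramson_kde(grid, sample, h=0.3, h1=0.5)
```

Points sitting in low-density regions get wider kernels, which flattens the spurious tail peaks of the fixed-h estimate.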
8 Problems of kernel estimates for PDFs with finite support. Boundary effects of kernel estimates. Epanechnikov's kernel with Y_(n) = 0.8: h1 < 1 − Y_(n): the kernel is truncated; h2 = 1 − Y_(n): the kernel corresponds to a triangular PDF in the neighborhood of 1; h3 > 1 − Y_(n): oversmoothing of the PDF.
9 Outline. Heavy-tailed density estimation: 1) the combined approach; 2) variable bandwidth kernel estimators; 3) the use of the transform-re-transform scheme. Boundary kernels. The discrepancy method and cross-validation as smoothing tools for a variable bandwidth kernel estimator.
10 Main assumption: the asymptotic behavior of F(x) at infinity is based on the asymptotic limit distribution of the sample maximum. Gnedenko (1943): if F(x) is such that the limit distribution of the maximum M_n = max(X_1, X_2, ..., X_n) exists, then this limit distribution can only be of the following form: for some normalizing constants a_n > 0, b_n ∈ R,

P{(M_n − b_n)/a_n ≤ x} = F^n(b_n + a_n x) → H_γ(x) as n → ∞, x ∈ R,

where

H_γ(x) = exp(−x^{−1/γ}), x > 0, γ > 0 (Fréchet);
H_γ(x) = exp(−(−x)^{−1/γ}), x < 0, γ < 0 (Weibull);
H_γ(x) = exp(−e^{−x}), x ∈ R, γ = 0 (Gumbel).
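A purely numerical illustration of the Fréchet case (my own example, not from the slides): for the Pareto DF F(x) = 1 − 1/x one may take a_n = n, b_n = 0, and then F^n(nx) → exp(−1/x) = H_1(x):

```python
import numpy as np

# Pareto DF F(x) = 1 - 1/x, x >= 1: the maximum M_n normalized by a_n = n,
# b_n = 0 converges to the Frechet law H_gamma with gamma = 1.
def F(x):
    return np.where(x >= 1.0, 1.0 - 1.0 / np.maximum(x, 1.0), 0.0)

x = np.linspace(0.5, 5.0, 10)
n = 100_000
exact = F(n * x) ** n          # P{(M_n - b_n)/a_n <= x}, computed exactly
frechet = np.exp(-1.0 / x)     # H_1(x) = exp(-x^{-1/gamma}) with gamma = 1
```

For n = 100000 the two curves already agree to several decimal places.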
11 Combined estimators for heavy-tailed densities. Combined parametric-nonparametric method:

f(t, γ, N) = f_N(t), t ∈ [0, X_(n−k)];  f_γ(t), t ∈ (X_(n−k), ∞),

where X_(n−k) is some r.v. (an order statistic),

f_γ(x) = (1/γ) x^{−1/γ−1} + (2/γ) x^{−2/γ−1}

is the parametric tail model of Pareto type, and

f_N(t) = (1/X_(n−k)) Σ_{j=1}^N λ_j φ_j(t / X_(n−k))

is the non-parametric estimator of the main part (the body) of the PDF, an expansion in basis functions φ_j(t), j = 1, 2, ....
12 Estimation of mixtures of two PDFs by the combined estimator: the PDF of a mixture of Gamma and Pareto distributions (left) and of two Gamma distributions (right).
13 Barron's estimator and χ²-optimality. Let P_n = {A_n1, ..., A_nm_n} be a partition of the half-line (0, ∞) into finite intervals (bins) by the quantiles G^{−1}(j/m_n), 1 ≤ j ≤ m_n − 1, of an arbitrary distribution G(x), and let

δ_j = ∫_{A_nj} dF_n(x) = (1/n) Σ_{i=1}^n 1{X_i ∈ A_nj},

where n is the sample size and F_n is the empirical DF. Estimator (Barron, Györfi, van der Meulen, 1992):

f̂_B(x) = g(x) (1/n + δ_j)/(1/n + 1/m_n), x ∈ A_nj, 1 ≤ j ≤ m_n,

where g(x) is a tail model. A histogram-type estimate on [0, X_(k)] is superposed with the tail model g(x). The estimate is consistent in the sense of χ²-divergence if m_n → ∞ and m_n/n → 0 as n → ∞.
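Barron's estimator can be sketched as below; the exponential auxiliary model G(x) = 1 − e^{−x}, the sample, the bin count m, and all names are assumptions for illustration:

```python
import numpy as np

def barron(x, sample, m, g_pdf, G_inv):
    """Barron-type estimate: on each bin A_nj (built from the quantiles
    G^{-1}(j/m) of the auxiliary DF G) the density g is rescaled by
    (1/n + delta_j) / (1/n + 1/m)."""
    n = len(sample)
    edges = G_inv(np.arange(1, m) / m)                  # inner bin edges
    bins = np.concatenate(([0.0], edges, [np.inf]))
    delta = np.histogram(sample, bins=bins)[0] / n      # empirical bin masses delta_j
    j = np.clip(np.searchsorted(edges, x, side="right"), 0, m - 1)
    return g_pdf(np.asarray(x, dtype=float)) * (1.0 / n + delta[j]) / (1.0 / n + 1.0 / m)

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.5, size=400)
grid = np.linspace(0.0, 30.0, 3000)
est = barron(grid, sample, m=10,
             g_pdf=lambda x: np.exp(-x),                # auxiliary density g = G'
             G_inv=lambda u: -np.log(1.0 - u))          # G^{-1} for G(x) = 1 - e^{-x}
```

By construction each bin carries mass (1/m)(1/n + δ_j)/(1/n + 1/m), so the estimate integrates to one regardless of the sample.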
14 Problems of f̂_B(x): optimal selection of the partition; optimal selection of g(x). The behavior of the true DF for x > X_(n) is unknown; one has to apply asymptotic results of extreme value theory concerning the behavior of the DF at infinity. Examples of the auxiliary distribution G(x) are the lognormal, normal, and Weibull distributions. The choice of the auxiliary density g(x) = G′(x) strongly influences the estimate in the tail domain A_nm_n = [X_(k), ∞):

1 − F̂(x) = (1/n + δ_{m_n})/(1/n + 1/m_n) ∫_x^∞ g(t) dt = (1/n + δ_{m_n})/(1/n + 1/m_n) (1 − G(x)).

For samples of moderate size the tail model g(x) distorts the estimate of the body of the PDF; for large samples this influence weakens.
15 Kernel estimates with variable bandwidth. Let X_n = {X_1, ..., X_n} be a sample of i.i.d. r.v.s distributed with a heavy-tailed DF F(x) and PDF f(x). Variable bandwidth kernel estimate (Abramson, 1982):

f̂_A(x|h) = (nh)^{−1} Σ_{i=1}^n f(X_i)^{1/2} K((x − X_i) f(X_i)^{1/2} / h).

Practical version:

f̃_A(x|h_1, h) = (nh)^{−1} Σ_{i=1}^n f̂_{h_1}(X_i)^{1/2} K((x − X_i) f̂_{h_1}(X_i)^{1/2} / h).

Main advantages: non-negativity; the best mean squared error.
16 Mean squared errors (MSE) for kernel estimates:

MSE = E ∫ (f̂_h(x) − f(x))² dx.

MSE of a standard kernel estimate: MSE(f̂_h) ∼ n^{−4/5} (bias ∼ h²; variance ∼ (nh)^{−1}) if a second-order kernel is used, h ∼ n^{−1/5}, and f has two continuous derivatives.

MSE of a variable bandwidth kernel estimate: MSE(f̂_A(·|h)) ∼ n^{−8/9} (bias ∼ h⁴; variance ∼ (nh)^{−1}) if a symmetric kernel with ∫ x⁴ K(x) dx < ∞ is used, h ∼ n^{−1/9}, and f has four continuous derivatives.
17 Cross-validation for a variable bandwidth kernel estimator (P. Hall, 1992). Weighted integrated squared error:

WISE = ∫ (f̂_{−i}(x; h) − f(x))² ω(x) dx = ∫ f̂_{−i}(x; h)² ω(x) dx − 2 ∫ f̂_{−i}(x; h) f(x) ω(x) dx + const,

where, for p-dimensional data,

f̂_{−i}(x; h) = (1/(n h^p)) Σ_{j≠i} f̂(X_j; h_1)^{p/2} K((x − X_j) f̂(X_j; h_1)^{1/2} / h) K_1((x − X_j)/(Ah)), A > 0,

and ω(x) is a bounded, nonnegative function (a weight).
18 Cross-validation for a variable bandwidth kernel estimator (P. Hall, 1992). Example of the weight function:

ω(x) = 1 for ‖Σ^{−1/2}(x − μ)‖² ≤ z_η, and 0 otherwise,

where μ and Σ denote the sample mean and variance, ‖·‖ is the Euclidean distance, and z_η is the upper (1 − η)-level critical point of the chi-squared distribution. Practical version:

ŴISE = ∫ f̂_{−i}(x; h)² ω(x) dx − (2/n) Σ_{i=1}^n f̂_{−i}(X_i; h) ω(X_i).

How good is h?
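A one-dimensional sketch of the practical criterion (illustrative only: p = 1, no clipping kernel K_1, ω taken as the indicator of a central sample region, and the data, grids, and names are my assumptions):

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def vb_kde(x, sample, s, h, skip=None):
    """Variable-bandwidth estimate; skip=i leaves X_i out (leave-one-out)."""
    keep = np.ones(len(sample), dtype=bool)
    if skip is not None:
        keep[skip] = False
    xs, ss = sample[keep], s[keep]
    u = (np.asarray(x, dtype=float)[..., None] - xs) * ss / h
    return (ss * gauss(u)).sum(axis=-1) / (len(xs) * h)

def wise_cv(sample, h_grid, h1):
    """Pick h minimizing int f_{-i}(x;h)^2 w(x) dx - (2/n) sum_i f_{-i}(X_i;h) w(X_i)."""
    n = len(sample)
    pilot = np.maximum(vb_kde(sample, sample, np.ones(n), h1), 1e-12)
    s = np.sqrt(pilot)
    lo, hi = np.quantile(sample, [0.05, 0.95])      # w = indicator of a central region
    grid = np.linspace(lo, hi, 300)
    inside = np.flatnonzero((sample >= lo) & (sample <= hi))
    scores = np.empty(len(h_grid))
    for k, h in enumerate(h_grid):
        term1 = np.trapz(vb_kde(grid, sample, s, h) ** 2, grid)
        term2 = 2.0 / n * sum(vb_kde(sample[i], sample, s, h, skip=i) for i in inside)
        scores[k] = term1 - term2
    return h_grid[int(np.argmin(scores))], scores

rng = np.random.default_rng(0)
sample = rng.normal(size=150)
h_grid = np.linspace(0.05, 1.5, 20)
h_best, scores = wise_cv(sample, h_grid, h1=0.4)
```

The minimizer of the score answers "how good is h?" in the WISE sense over the chosen grid.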
19 Discrepancy method for the variable bandwidth kernel estimator. Let h be a solution of the discrepancy equation

sup_{−∞<x<∞} |F_n(x) − F^A_{h,h_1}(x)| = δ n^{−1/2},   (1)

where F_n(x) is the empirical DF,

F^A_{h,h_1}(x) = ∫_{−∞}^x f̃_A(t|h_1, h) dt,

f̃_A(t|h_1, h) is the variable bandwidth kernel estimator, and δ is a quantile of the Kolmogorov-Smirnov statistic √n D_n = √n sup_{−∞<x<∞} |F_n(x) − F(x)|.
20 Discrepancy method for the variable bandwidth kernel estimator. The bias rate. Let h* be a solution of the discrepancy equation. It is possible to prove the following. Assuming h_1 = c n^{−1/5}, we have

P{h* > n^{−1/9}} < exp(−2 n^{1−2/α}) for α > 2,

P{|E f̃_A(x|h_1, h*) − f(x)| > ψ(x) n^{−4/9}} < 2 exp(−2 n^{1/9}),

where ψ(x) is a function independent of n.
21 Discrepancy method for the variable bandwidth kernel estimator. Practical version:

√n max(D̂⁺_n, D̂⁻_n) = 0.5,

where

D̂⁺_n = max_{1≤i≤n} (i/n − F^A_{h,h_1}(X_(i))),
D̂⁻_n = max_{1≤i≤n} (F^A_{h,h_1}(X_(i)) − (i − 1)/n),

and X_(1) ≤ X_(2) ≤ ... ≤ X_(n) are the order statistics.
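The practical equation can be solved by bisection in h, since the discrepancy is small when the estimated DF hugs F_n (tiny h) and grows as h over-smooths. The sketch below is illustrative: the Epanechnikov kernel (whose DF is an explicit polynomial), the Gaussian test sample, the pilot bandwidth, and all names are assumptions:

```python
import numpy as np

def epan_cdf(t):
    """DF of the Epanechnikov kernel K(u) = 0.75(1 - u^2), |u| <= 1."""
    t = np.clip(t, -1.0, 1.0)
    return 0.5 + 0.75 * (t - t**3 / 3.0)

def vb_df(x, sample, s, h):
    """DF of the variable-bandwidth estimate: (1/n) sum_i KCDF((x - X_i) s_i / h)."""
    return epan_cdf((np.asarray(x, dtype=float)[..., None] - sample) * s / h).mean(axis=-1)

def discrepancy(h, sample, s):
    n = len(sample)
    Fh = vb_df(np.sort(sample), sample, s, h)
    d_plus = np.max(np.arange(1, n + 1) / n - Fh)     # D+_n
    d_minus = np.max(Fh - np.arange(0, n) / n)        # D-_n
    return np.sqrt(n) * max(d_plus, d_minus)

def discrepancy_bandwidth(sample, s, target=0.5, h_lo=1e-3, h_hi=50.0, iters=60):
    """Bisection on h for sqrt(n) * max(D+, D-) = target."""
    for _ in range(iters):
        h = 0.5 * (h_lo + h_hi)
        if discrepancy(h, sample, s) < target:
            h_lo = h
        else:
            h_hi = h
    return 0.5 * (h_lo + h_hi)

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
pilot = np.maximum(
    (0.75 * np.clip(1 - ((sample[:, None] - sample) / 0.5) ** 2, 0, None)).sum(1)
    / (200 * 0.5), 1e-12)                             # Epanechnikov pilot, h1 = 0.5
s = np.sqrt(pilot)
h_star = discrepancy_bandwidth(sample, s)
```

At the returned h the discrepancy statistic sits at the target value 0.5.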
22 Approach with transformations.

X_1, ..., X_n → Y_1, ..., Y_n, where Y_j = T(X_j), j = 1, ..., n.

Let T(x) be a monotone increasing, one-to-one, continuous transformation function. The re-transformed estimate of the PDF of X_i is

f̂(x) = ĝ(T(x)) T′(x),

where g(x) is the PDF of the r.v. Y_i. The DF of the r.v. Y_i is

G(x) = P{Y_i ≤ x} = P{T(X_i) ≤ x} = F(T^{−1}(x)).
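The transform-re-transform scheme in code (a sketch under my own assumptions: the fixed transformation T(x) = ln x, a Gaussian kernel on the transformed scale, and a Pareto test sample):

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def retransformed_kde(x, sample, h, T, T_prime):
    """f_hat(x) = g_hat(T(x)) * T'(x): estimate the density of Y_j = T(X_j)
    by a standard KDE, then map it back to the x scale."""
    x = np.asarray(x, dtype=float)
    y, ys = T(x), T(sample)
    g_hat = gauss((y[..., None] - ys) / h).sum(axis=-1) / (len(sample) * h)
    return g_hat * T_prime(x)

# For a Pareto(1) sample, Y = ln X is standard exponential: a distribution
# that a kernel estimate handles far more easily than the heavy-tailed X.
rng = np.random.default_rng(1)
sample = (1.0 - rng.uniform(size=1000)) ** -1.0
grid = np.linspace(1.01, 20.0, 400)
f_hat = retransformed_kde(grid, sample, h=0.25, T=np.log, T_prime=lambda x: 1.0 / x)
```

Because ∫ ĝ(T(x)) T′(x) dx = ∫ ĝ(y) dy, the re-transformed estimate keeps the total mass of ĝ.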
23 Approach with transformations. Preliminary transformations to a new r.v. Y_j = T(X_j), j = 1, ..., n. Fixed transformations: ln x, (2/π) arctan x. Features of fixed transformations. Advantage: they do not require any knowledge about the distribution of X. Disadvantage: they can lead to densities of the transformed r.v.s Y_j with discontinuities, which are difficult to estimate.
24 Approach with transformations. First adapted transformations to a new r.v. Y_j = T(X_j), j = 1, ..., n (Wand et al., 1991):

T(x) = x^λ sign(λ), λ ≠ 0;  ln x, λ = 0,

λ = arg min ∫_R (g″(y))² dy,

where g is the PDF of the transformed r.v. Y_1 = T_λ(X_1).
25 Approach with transformations. First adapted transformations to a new r.v. Y_j = T(X_j), j = 1, ..., n: T: R₊ → [0, 1], where F(x) is some parametric model (Devroye and Györfi, 1985). The transformation to the isosceles triangular PDF φ_tri(x) on [0, 1],

T(x) = Φ_tri^{−1}(F(x)), i.e. T(x) = √(F(x)/2) for F(x) ≤ 0.5 and T(x) = 1 − √((1 − F(x))/2) for F(x) > 0.5,

used for kernel estimates with compactly supported kernels, and the transformation T(x) = F(x) to the uniform PDF φ_uni(x), used for a histogram, provide the minimal convergence rate in L_1:

min_g E ∫_0^1 |ĝ(x) − g(x)| dx.
26 Approach with transformations. Problems of the transform-re-transform scheme: the DF F(x) is unknown, so it is impossible to transform to the exact desired PDF; selection of a parametric or non-parametric family of distributions as guess DFs; selection of a target PDF that keeps the re-transformed estimates stable under minor perturbations of the tail index estimates; selection of the PDF estimate so that the tail decay rate (of the true PDF) is preserved after the inverse transformation.
27 Approach with transformations. Adaptive transformation (Maiboroda & Markovich, 2004):

T_γ̂(x) = Φ^{−1}(Ψ_γ̂(x)) = 1 − (1 + γ̂x)^{−1/(2γ̂)},

where the guess DF F of X_i is assumed to be the Generalized Pareto distribution

Ψ_γ̂(x) = 1 − (1 + γ̂x)^{−1/γ̂}, x ≥ 0;  0, x < 0,

and the target DF G of Y_i is

Φ(x) = (2x − x²) 1{x ∈ [0, 1]} + 1{x > 1}.
28 Approach with transformations. Adaptive transformation. For a consistent estimate γ̂ of γ, the transformation provides a PDF g(x) on [0, 1] that is continuous in the neighborhood of 1 for typical distributions (with regularly varying tails, lognormal-type tails, and Weibull-like tails). Estimators ĝ(x): polygram, kernel estimate (1/(nh)) Σ_{i=1}^n K((x − X_i)/h). The choice of the Generalized Pareto distribution is widespread and motivated by Pickands's theorem, which states that, for a certain class of distributions and a sufficiently high threshold u of the r.v. X, the conditional distribution of the overshoot Y = X − u, given that X exceeds u, converges to a Generalized Pareto distribution.
29 Adaptive transformation approach. Estimation algorithm: 1) The tail index of X_j is estimated from the sample {X_1, ..., X_n} by the Hill estimate

γ̂_k = (1/k) Σ_{i=1}^k log X_(n−i+1) − log X_(n−k)

(X_(1) ≤ ... ≤ X_(n) are the order statistics of the sample). 2) The transformation T = T_γ̂k is constructed so that if ξ has the guess DF Ψ_γ̂k, then T_γ̂k(ξ) has the target DF Φ (e.g. a triangular one); here γ̂_k is treated as a fixed value. 3) The transformed sample Y_j = T_γ̂k(X_j), j = 1, ..., n, is constructed. 4) The PDF of Y_1, ..., Y_n is estimated by some estimate ĝ_h(x). 5) The PDF of X_j is estimated by f̂_h(x) = ĝ_h(T(x)) T′(x).
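The five steps above can be sketched end-to-end (illustrative only: a Gaussian kernel for step 4, a shifted-Pareto test sample with γ = 1, and fixed h and k are my assumptions; the slides select k by bootstrap):

```python
import numpy as np

def hill(sample, k):
    """Step 1: Hill estimate gamma_k = (1/k) sum_i log X_(n-i+1) - log X_(n-k)."""
    xs = np.sort(sample)
    return np.mean(np.log(xs[-k:])) - np.log(xs[-k - 1])

def adaptive_estimate(x, sample, h, k):
    g = hill(sample, k)                                            # step 1
    T = lambda t: 1.0 - (1.0 + g * t) ** (-1.0 / (2.0 * g))        # step 2
    T_prime = lambda t: 0.5 * (1.0 + g * t) ** (-1.0 / (2.0 * g) - 1.0)
    y = T(sample)                                                  # step 3
    gauss = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    x = np.asarray(x, dtype=float)
    g_hat = gauss((T(x)[..., None] - y) / h).sum(axis=-1) / (len(sample) * h)  # step 4
    return g_hat * T_prime(x)                                      # step 5

# Shifted Pareto: P{X > x} = (1 + x)^{-1}, x >= 0, i.e. GPD with gamma = 1.
rng = np.random.default_rng(2)
sample = (1.0 - rng.uniform(size=1000)) ** -1.0 - 1.0
grid = np.linspace(0.0, 30.0, 300)
f_hat = adaptive_estimate(grid, sample, h=0.05, k=100)
```

Here the transformed Y_j land in [0, 1), close to the triangular target, so a standard kernel estimate works on the transformed scale.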
30 The Pareto PDF with γ = 1; sample size n = 100. The PDF of the Gaussian distribution N(0, σ) is used as the pilot f̂(x) in f̃_A(x). The Gaussian kernel is used in the re-transformed kernel estimator. h1 = σ(Y) n^{−1/5} = 0.099, h2 = 1.06 σ(X) n^{−1/5} = 9.453, where σ(X) and σ(Y) are the standard deviations of the samples X_n = {X_1, ..., X_n} and Y_n = T_γ̂(X_n).
31 The Pareto PDF with γ = 1; sample size n = 100. The Gaussian kernel is used in the re-transformed kernel estimator and for f̂(x) in f̃_A(x). T_γ̂(x) = 1 − (1 + γ̂x)^{−1/(2γ̂)} is the adapted transformation; γ̂ is the Hill estimator, and k is selected by bootstrap.
32 The Weibull PDF with γ = 0.5; sample size n = 100. The Gaussian kernel is used for the re-transformed kernel estimator and for f̂(x) in f̃_A(x). h1 = σ(Y) n^{−1/5} = 0.102, h2 = 1.06 σ(X) n^{−1/5} = 3.673, T_γ̂(x) = 1 − (1 + γ̂x)^{−1/(2γ̂)}.
33 The accuracy of the re-transformed estimators. Mean integrated squared error (MISE):

MISE_h(γ̂, Ω) = E ∫_Ω (f̂(x) − f(x))² dx
= E ∫_Ω (ĝ_h(T_γ̂(x)) − g(T_γ̂(x)))² T′_γ̂(x) dT_γ̂(x)
= E ∫_Ω̃ (ĝ_h(y) − g(y))² T′_γ̂(T_γ̂^{−1}(y)) dy,

where Ω̃ = T_γ̂(Ω). For fixed transformations and non-random intervals Ω̃:

MISE_h(Ω) = ∫_Ω̃ T′(T^{−1}(y)) E(ĝ_h(y) − g(y))² dy.
34 The MISE of re-transformed kernel estimators. Mean integrated squared error (MISE). If 0 < T′(T^{−1}(x)) ≤ c holds on Ω̃ for the transformation T (not necessarily fixed), then

MISE_h(Ω) ≤ c ∫_Ω̃ E(ĝ_h(y) − g(y))² dy

for a non-random Ω̃. MSE of kernel estimates: MSE(ĝ_h) ∼ n^{−4/5} if a non-variable bandwidth kernel estimator is used as ĝ_h(y), h ∼ n^{−1/5}, and g^(2) is continuous; MSE(ĝ_h) ∼ n^{−8/9} if a variable bandwidth kernel estimator is used as ĝ_h(y), h ∼ n^{−1/9}, and g^(4) is continuous.
35 The rate of decay of the re-transformed estimators at infinity is determined by γ. Boundary kernels: the bias of the estimate at the boundary.
36 Boundary kernels. Example. Let the PDF be

f(x) = l(x)(1 + γx)^{−(1/γ+1)}, x ≥ 0;  0, x < 0.

The re-transformed estimate is

f̂(x) = ĝ_h(T_γ̂(x)) T′_γ̂(x) = 0.5 ĝ_h(T_γ̂(x)) (1 + γ̂x)^{−1/(2γ̂)−1},

with the transformation T_γ̂(x) = 1 − (1 + γ̂x)^{−1/(2γ̂)}. A smoothed polygram with ĝ_n(x) = C_n(1 − x) near x = 1 gives

f̂_n(x) ≈ 0.5 C_n (1 + γ̂x)^{−(1/γ̂+1)},

while a kernel estimator gives

f̂_h(x) ≈ 0.5 ĝ_h(1)(1 + γ̂x)^{−(1/(2γ̂)+1)},

i.e. the EVI is two times larger than needed.
37 Boundary kernels. Example. Principles of the selection of boundary kernels. 1) The kernel coincides with the target PDF: K(y) = g(y), y ∈ [Y_(n), 1]. 2) Direct fitting of the boundary with h = 1 − Y_(n):

(1/h) K((T(x) − Y_(n))/h) T′(x) = f̂(x),

because

ĝ_h(y) ≈ (1/h) K((y − Y_(n))/h), y ∈ (Y_(n), 1].
38 Reduction of the boundary bias (Simonoff, 1996). Let a new kernel, independent of the PDF, be

B(x) = (a_2(p) − a_1(p) x) K(x) / (a_0(p) a_2(p) − a_1²(p)),  a_l(p) = ∫_{−1}^p u^l K(u) du, 0 < p < 1.

The bias of the kernel estimator with such a kernel in the boundary region is O(h²) and the variance is O((nh)^{−1}) (the same as in the interior) when the second derivative of the underlying density is continuous.
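A numerical check of this construction (my own sketch; the Epanechnikov kernel and the value of p are assumptions): on the truncated support [−1, p] the boundary kernel B recovers the moment conditions ∫ B = 1 and ∫ uB = 0, which is what restores the interior O(h²) bias at the boundary.

```python
import numpy as np

def epan(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def boundary_kernel(u, p, K=epan):
    """B(u) = (a2(p) - a1(p) u) K(u) / (a0(p) a2(p) - a1(p)^2),
    with a_l(p) = int_{-1}^p u^l K(u) du computed numerically."""
    grid = np.linspace(-1.0, p, 4001)
    a0, a1, a2 = (np.trapz(grid**l * K(grid), grid) for l in (0, 1, 2))
    return (a2 - a1 * u) * K(u) / (a0 * a2 - a1**2)

p = 0.4
uu = np.linspace(-1.0, p, 4001)
B = boundary_kernel(uu, p)
m0 = np.trapz(B, uu)        # zeroth moment on [-1, p]
m1 = np.trapz(uu * B, uu)   # first moment on [-1, p]
```

The identities follow directly from the definition: ∫ B = (a_0 a_2 − a_1²)/(a_0 a_2 − a_1²) = 1 and ∫ uB = (a_1 a_2 − a_1 a_2)/(a_0 a_2 − a_1²) = 0.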
39 Overcoming boundary effects. Combination of the two approaches: use of B(x) and K(y) = g(y). For the adapted transformation and h = 1 − Y_(n) we have

ĝ_h(T_γ̂(x)) = (1/h) B((T_γ̂(x) − Y_(n))/h)
= (1/h) (a_2(p) − a_1(p)(T_γ̂(x) − Y_(n))/h) K((T_γ̂(x) − Y_(n))/h) / (a_0(p) a_2(p) − a_1²(p)),

and

ĝ_h(y) ≈ (1/h) K((y − Y_(n))/h), y ∈ (Y_(n), 1].
40 Re-transformed kernel estimators applied to Web data. PDF estimation of the sizes of sub-sessions (left) and inter-response times (right). K(x) is Epanechnikov's kernel. h = σ n^{−1/5}, h1 = 1.01 − T_γ̂(X_(n)), h < h1, where σ is the standard deviation of the transformed data.
41 Comparison of the re-transformed kernel estimate and the variable bandwidth kernel estimate. Re-transformed standard kernel estimate and variable bandwidth kernel estimate with Epanechnikov's kernel for the Pareto distribution: body (left) and tail (right). h is selected by the discrepancy method.
42 Comparison of the re-transformed kernel estimate and the variable bandwidth kernel estimate. Conclusions: a pure variable bandwidth kernel estimator, at least with compactly supported kernels, does not fit the density at infinity, in contrast to a variable bandwidth kernel estimator that uses a transformation of the data.
43 Papers:
Markovitch, N.M., Krieger, U.R. (2000a) Nonparametric estimation of long-tailed density functions and its application to the analysis of World Wide Web traffic. Performance Evaluation, 42(2-3).
Markovitch, N.M., Krieger, U.R. (2002) The estimation of heavy-tailed probability density functions, their mixtures and quantiles. Computer Networks, 40(3).
Maiboroda, R.E., Markovich, N.M. (2004) Estimation of heavy-tailed probability density function with application to Web data. Computational Statistics, 4.
Barron, A.R., Sheu, C.-H. (1991) Approximation of density functions by sequences of exponential families. Annals of Statistics, 19(3).
44 Papers:
Barron, A.R., Györfi, L., van der Meulen, E. (1992) Distribution estimation consistent in total variation and in two types of information divergence. IEEE Transactions on Information Theory, 38.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. New York: Chapman & Hall.
Hall, P. (1992) On global properties of variable bandwidth density estimators. Annals of Statistics, 20(2).
More informationExtreme Value Theory and Applications
Extreme Value Theory and Deauville - 04/10/2013 Extreme Value Theory and Introduction Asymptotic behavior of the Sum Extreme (from Latin exter, exterus, being on the outside) : Exceeding the ordinary,
More informationExtremogram and Ex-Periodogram for heavy-tailed time series
Extremogram and Ex-Periodogram for heavy-tailed time series 1 Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia) and Yuwei Zhao (Ulm) 1 Jussieu, April 9, 2014 1 2 Extremal
More informationConfidence intervals for kernel density estimation
Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting
More informationECON 721: Lecture Notes on Nonparametric Density and Regression Estimation. Petra E. Todd
ECON 721: Lecture Notes on Nonparametric Density and Regression Estimation Petra E. Todd Fall, 2014 2 Contents 1 Review of Stochastic Order Symbols 1 2 Nonparametric Density Estimation 3 2.1 Histogram
More information2 Functions of random variables
2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as
More informationOne-Sample Numerical Data
One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationModel-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego
Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships
More informationKernel density estimation for heavy-tailed distributions...
Kernel density estimation for heavy-tailed distributions using the Champernowne transformation Buch-Larsen, Nielsen, Guillen, Bolance, Kernel density estimation for heavy-tailed distributions using the
More informationKernel density estimation of reliability with applications to extreme value distribution
University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School 2008 Kernel density estimation of reliability with applications to extreme value distribution Branko Miladinovic
More informationEstimation de mesures de risques à partir des L p -quantiles
1/ 42 Estimation de mesures de risques à partir des L p -quantiles extrêmes Stéphane GIRARD (Inria Grenoble Rhône-Alpes) collaboration avec Abdelaati DAOUIA (Toulouse School of Economics), & Gilles STUPFLER
More informationAkaike Information Criterion to Select the Parametric Detection Function for Kernel Estimator Using Line Transect Data
Journal of Modern Applied Statistical Methods Volume 12 Issue 2 Article 21 11-1-2013 Akaike Information Criterion to Select the Parametric Detection Function for Kernel Estimator Using Line Transect Data
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More informationS6880 #7. Generate Non-uniform Random Number #1
S6880 #7 Generate Non-uniform Random Number #1 Outline 1 Inversion Method Inversion Method Examples Application to Discrete Distributions Using Inversion Method 2 Composition Method Composition Method
More informationNonparametric Estimation of Luminosity Functions
x x Nonparametric Estimation of Luminosity Functions Chad Schafer Department of Statistics, Carnegie Mellon University cschafer@stat.cmu.edu 1 Luminosity Functions The luminosity function gives the number
More informationMean-Shift Tracker Computer Vision (Kris Kitani) Carnegie Mellon University
Mean-Shift Tracker 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University Mean Shift Algorithm A mode seeking algorithm Fukunaga & Hostetler (1975) Mean Shift Algorithm A mode seeking algorithm
More informationUNIVERSITÄT POTSDAM Institut für Mathematik
UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam
More informationNonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix
Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract
More informationBrief Review on Estimation Theory
Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on
More informationBootstrap. Director of Center for Astrostatistics. G. Jogesh Babu. Penn State University babu.
Bootstrap G. Jogesh Babu Penn State University http://www.stat.psu.edu/ babu Director of Center for Astrostatistics http://astrostatistics.psu.edu Outline 1 Motivation 2 Simple statistical problem 3 Resampling
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationExtreme value theory and high quantile convergence
Journal of Operational Risk 51 57) Volume 1/Number 2, Summer 2006 Extreme value theory and high quantile convergence Mikhail Makarov EVMTech AG, Baarerstrasse 2, 6300 Zug, Switzerland In this paper we
More informationExtremogram and ex-periodogram for heavy-tailed time series
Extremogram and ex-periodogram for heavy-tailed time series 1 Thomas Mikosch University of Copenhagen Joint work with Richard A. Davis (Columbia) and Yuwei Zhao (Ulm) 1 Zagreb, June 6, 2014 1 2 Extremal
More informationPENULTIMATE APPROXIMATIONS FOR WEATHER AND CLIMATE EXTREMES. Rick Katz
PENULTIMATE APPROXIMATIONS FOR WEATHER AND CLIMATE EXTREMES Rick Katz Institute for Mathematics Applied to Geosciences National Center for Atmospheric Research Boulder, CO USA Email: rwk@ucar.edu Web site:
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationNon-parametric Inference and Resampling
Non-parametric Inference and Resampling Exercises by David Wozabal (Last update 3. Juni 2013) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and
More informationLocal Polynomial Wavelet Regression with Missing at Random
Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800
More informationDoes k-th Moment Exist?
Does k-th Moment Exist? Hitomi, K. 1 and Y. Nishiyama 2 1 Kyoto Institute of Technology, Japan 2 Institute of Economic Research, Kyoto University, Japan Email: hitomi@kit.ac.jp Keywords: Existence of moments,
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationPractical conditions on Markov chains for weak convergence of tail empirical processes
Practical conditions on Markov chains for weak convergence of tail empirical processes Olivier Wintenberger University of Copenhagen and Paris VI Joint work with Rafa l Kulik and Philippe Soulier Toronto,
More informationIntroduction to Regression
Introduction to Regression p. 1/97 Introduction to Regression Chad Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/97 Acknowledgement Larry Wasserman, All of Nonparametric
More informationGaussian processes for inference in stochastic differential equations
Gaussian processes for inference in stochastic differential equations Manfred Opper, AI group, TU Berlin November 6, 2017 Manfred Opper, AI group, TU Berlin (TU Berlin) inference in SDE November 6, 2017
More informationn! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2
Order statistics Ex. 4.1 (*. Let independent variables X 1,..., X n have U(0, 1 distribution. Show that for every x (0, 1, we have P ( X (1 < x 1 and P ( X (n > x 1 as n. Ex. 4.2 (**. By using induction
More informationOverview of Extreme Value Theory. Dr. Sawsan Hilal space
Overview of Extreme Value Theory Dr. Sawsan Hilal space Maths Department - University of Bahrain space November 2010 Outline Part-1: Univariate Extremes Motivation Threshold Exceedances Part-2: Bivariate
More informationBayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets
Bayesian Point Process Modeling for Extreme Value Analysis, with an Application to Systemic Risk Assessment in Correlated Financial Markets Athanasios Kottas Department of Applied Mathematics and Statistics,
More informationEXPLICIT NONPARAMETRIC CONFIDENCE INTERVALS FOR THE VARIANCE WITH GUARANTEED COVERAGE
EXPLICIT NONPARAMETRIC CONFIDENCE INTERVALS FOR THE VARIANCE WITH GUARANTEED COVERAGE Joseph P. Romano Department of Statistics Stanford University Stanford, California 94305 romano@stat.stanford.edu Michael
More informationLarge deviations for random walks under subexponentiality: the big-jump domain
Large deviations under subexponentiality p. Large deviations for random walks under subexponentiality: the big-jump domain Ton Dieker, IBM Watson Research Center joint work with D. Denisov (Heriot-Watt,
More informationRandom variables. DS GA 1002 Probability and Statistics for Data Science.
Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities
More informationESTIMATORS IN THE CONTEXT OF ACTUARIAL LOSS MODEL A COMPARISON OF TWO NONPARAMETRIC DENSITY MENGJUE TANG A THESIS MATHEMATICS AND STATISTICS
A COMPARISON OF TWO NONPARAMETRIC DENSITY ESTIMATORS IN THE CONTEXT OF ACTUARIAL LOSS MODEL MENGJUE TANG A THESIS IN THE DEPARTMENT OF MATHEMATICS AND STATISTICS PRESENTED IN PARTIAL FULFILLMENT OF THE
More information