Bayesian inference for multivariate extreme value distributions

Bayesian inference for multivariate extreme value distributions. Sebastian Engelke, Clément Dombry, Marco Oesting. Toronto, Fields Institute, May 4th, 2016.

Main motivation

For a parametric model Z ~ F_θ of multivariate max-stable distributions, the full likelihoods are usually not available, so maximum likelihood estimation is infeasible.

Goal: Use full likelihoods in a Bayesian setup to improve frequentist efficiency and to allow for Bayesian methods in multivariate extremes.

Max-stable distributions and partitions

Let Z be a d-dimensional max-stable distribution and X_1, ..., X_n (with standard Fréchet margins) be in its max-domain of attraction (MDA).

Componentwise maxima: M_n = max_{i=1,...,n} X_i / n →_d Z.

Partition Π_n of the set {1, ..., d} by occurrence times of the maxima: j and k lie in the same block if M_{n,j} and M_{n,k} come from the same X_i.

Example: d = 4, n = 3, Π_n = {{1}, {2, 3}, {4}}.
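The occurrence-time partition can be computed directly from a sample. A minimal sketch (the helper name `occurrence_partition` and 0-based component indices are our choices, not from the talk):

```python
def occurrence_partition(x):
    """Group the components of the componentwise maximum by the observation
    in which each maximum occurs; x is a list of n observations, each a
    list of d components (0-based indices)."""
    n, d = len(x), len(x[0])
    # For each component k, index of the observation attaining the maximum.
    argmax = [max(range(n), key=lambda i: x[i][k]) for k in range(d)]
    blocks = {}
    for k, i in enumerate(argmax):
        blocks.setdefault(i, []).append(k)
    return sorted(blocks.values())

# The slide's example (d = 4, n = 3): the maxima of the second and third
# components occur in the same observation, the others in their own.
x = [[5, 1, 1, 1],
     [1, 7, 7, 1],
     [1, 1, 1, 9]]
print(occurrence_partition(x))  # → [[0], [1, 2], [3]]
```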

Max-stable distributions and partitions

We have the weak limit (M_n, Π_n) →_d (Z, Π) as n → ∞, where Π is the limit partition of occurrence times of Z.

Π_n and Π are distributed on the set P_d of all partitions of {1, ..., d}.

Example: d = 4, n = 3, Π_n = {{1}, {2, 3}, {4}}.

Max-stable distributions and likelihoods

P(Z ≤ z) = exp(−V(z)), where V is the exponent measure of Z.

The likelihood of Z is L_Z(z) = Σ_{π ∈ P_d} L_{Z,Π}(z, π), where the Stephenson–Tawn likelihood (2005, Biometrika), the joint likelihood of Z and Π, has the explicit form

    L_{Z,Π}(z, π) = exp(−V(z)) ∏_{j=1}^{|π|} {−V_{π^(j)}(z)},

where the V_{π^(j)} are partial derivatives of V in the directions π^(j), and the partition π = (π^(1), ..., π^(|π|)) consists of |π| blocks.

Since |P_d| is the d-th Bell number (very large), direct use of L_Z(z) is infeasible.
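To see why summing over all of P_d is hopeless, the Bell numbers can be computed with the classical Bell-triangle recursion (a quick illustration, not part of the talk):

```python
def bell(d):
    """|P_d|, the d-th Bell number, via the Bell triangle:
    each row starts with the last entry of the previous row, and each
    further entry is the previous entry plus the one above it."""
    row = [1]
    for _ in range(d - 1):
        nxt = [row[-1]]
        for entry in row:
            nxt.append(nxt[-1] + entry)
        row = nxt
    return row[-1]

print(bell(4))             # 15 partitions of {1, 2, 3, 4}
print(bell(10))            # 115975
print(len(str(bell(50))))  # number of digits of |P_50|: the sum is infeasible
```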

Parametric inference methods for Z ~ F_θ

With partition information (occurrence times observed):
Stephenson and Tawn (2005, Biometrika) use the partition information and the joint likelihood L_{Z,Π}(z, π). But: possible bias if Π_n ≠ Π.
Wadsworth (2015, Biometrika) proposes a bias correction by using L_{Z,Π_n}(z, π).

Without partition information (occurrence times unobserved):
Composite (pairwise) likelihood, Padoan et al. (2010, JASA). But: losses in efficiency.
Dombry et al. (2013, Biometrika) introduce a Gibbs sampler to draw conditional partitions from L_{Π|Z}(π | z).
Thibaud et al. (2015, arXiv:1506.07836) use this Gibbs sampler to choose Π automatically and obtain the posterior L(θ | z) in a Bayesian framework for Brown–Resnick processes.

Goals

Recall: MLE with L_Z(z) is infeasible.

Establish a Bayesian framework that uses full likelihoods for general max-stable distributions, without observed partition information.

Improve the (frequentist) efficiency of estimates compared to existing methods (pairwise likelihood, ...): without partition information; with partition information, ignore the partition information.

Explore applications of Bayesian methodology in multivariate extremes.

Bayesian framework

Parametric model Z ~ F_θ with θ ∈ Θ.

Observations from Z: z_1, ..., z_n ∈ R^d.
Unobserved partitions (latent variables): π_1, ..., π_n ∈ P_d.
Model parameters: θ ∈ Θ.

Bayesian setup: introduce a prior γ on Θ and try to obtain the posterior distribution L(θ, π_1, ..., π_n | z_1, ..., z_n).

Markov Chain Monte Carlo

The posterior has to be evaluated by MCMC: a Gibbs sampler for the partitions and Metropolis–Hastings for θ, using

    L(θ, π_1, ..., π_n | z_1, ..., z_n) ∝ γ(θ) ∏_{i=1}^n L(z_i, π_i) = γ(θ) ∏_{i=1}^n exp(−V(z_i)) ∏_{j=1}^{|π_i|} {−V_{π_i^(j)}(z_i)}.

We need the derivatives of the exponent measure, V_{π^(j)}(z).
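The Metropolis–Hastings half of the scheme can be sketched generically. The toy below is ours (the name `mh_uniform01` and the Beta(2, 5) stand-in target are illustrative, not from the talk): a random-walk update for a parameter on (0, 1) under a uniform prior; in the actual algorithm this step alternates with the Gibbs update of the partitions π_i.

```python
import math
import random

def mh_uniform01(log_lik, n_iter=20000, step=0.15, seed=1):
    """Random-walk Metropolis for a parameter on (0, 1) with a uniform
    prior: proposals outside (0, 1) have prior density zero and are
    rejected outright; otherwise the usual symmetric-proposal ratio."""
    rng = random.Random(seed)
    theta, ll = 0.5, log_lik(0.5)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        if 0.0 < prop < 1.0:
            ll_prop = log_lik(prop)
            if math.log(rng.random()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        chain.append(theta)
    return chain

# Stand-in log-likelihood whose posterior is Beta(2, 5), mean 2/7 ≈ 0.286.
chain = mh_uniform01(lambda t: math.log(t) + 4.0 * math.log(1.0 - t))
est = sum(chain[2000:]) / len(chain[2000:])
print(round(est, 2))
```

In the talk's sampler, `log_lik` would be the log of γ(θ) ∏_i L(z_i, π_i) at the current partitions.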

Examples

Logistic model with θ ∈ Θ = (0, 1):

    −V_{π^(j)}(z) = θ^{1−|π^(j)|} [Γ(|π^(j)| − θ) / Γ(1 − θ)] (Σ_{k=1}^d z_k^{−1/θ})^{θ−|π^(j)|} ∏_{k∈π^(j)} z_k^{−1−1/θ}

Many other models: Brown–Resnick (cf. Thibaud et al., 2015), extremal-t, Reich–Shaby, Dirichlet, ...
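The logistic formula is easy to code. A sketch (function names are ours), with a numerical check that the singleton-block case agrees with −∂V/∂z_k:

```python
import math

def V(z, theta):
    """Exponent measure of the logistic model: V(z) = (Σ_k z_k^{-1/θ})^θ."""
    return sum(zk ** (-1.0 / theta) for zk in z) ** theta

def neg_V_block(z, block, theta):
    """-V_{π^(j)}(z) for a block of 0-based component indices, following
    the displayed formula (0 < theta < 1); lgamma avoids overflow."""
    m = len(block)
    s = sum(zk ** (-1.0 / theta) for zk in z)
    gamma_ratio = math.exp(math.lgamma(m - theta) - math.lgamma(1.0 - theta))
    return (theta ** (1 - m) * gamma_ratio * s ** (theta - m)
            * math.prod(z[k] ** (-1.0 - 1.0 / theta) for k in block))

# Singleton block: the formula should match -dV/dz_0 (central difference).
z, theta, h = [1.2, 0.8, 2.0], 0.6, 1e-6
num = -(V([z[0] + h] + z[1:], theta) - V([z[0] - h] + z[1:], theta)) / (2 * h)
print(abs(num - neg_V_block(z, [0], theta)) < 1e-6)  # True
```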

Extremal dependence: max-stable data

Simulate n = 100 samples z_1, ..., z_n from the d-dimensional max-stable logistic model for Z with parameter θ_0 ∈ {0.1, 0.7, 0.9}; see Dombry et al. (2016, Biometrika).

Run the MCMC with a uniform prior γ on (0, 1), and take the empirical median of the posterior L(θ | z_1, ..., z_n) as the point estimate θ̂_Bayes.

[Figure: Markov chains whose stationary distributions are the posterior of θ_0 (left) and the mean partition size (right); θ_0 = 0.1, d = 10.]
[Figure: The same chains for θ_0 = 0.9, d = 10.]
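For reference, the logistic max-stable model itself can be simulated exactly through a positive-stable mixture (a standard construction; the function names below are ours): with S positive α-stable for α = θ and E_k iid unit exponentials, Z_k = (S/E_k)^θ has the logistic distribution with standard Fréchet margins.

```python
import math
import random

def positive_stable(alpha, rng):
    """S >= 0 with Laplace transform E[exp(-t S)] = exp(-t^alpha),
    0 < alpha < 1 (Chambers-Mallows-Stuck / Kanter construction)."""
    u = rng.uniform(0.0, math.pi)
    w = -math.log(rng.random())  # Exp(1)
    return (math.sin(alpha * u)
            * math.sin((1.0 - alpha) * u) ** ((1.0 - alpha) / alpha)
            / math.sin(u) ** (1.0 / alpha)
            / w ** ((1.0 - alpha) / alpha))

def rlogistic(n, d, theta, seed=0):
    """n samples from the d-dim logistic max-stable model
    P(Z <= z) = exp(-(Σ_k z_k^{-1/θ})^θ), standard Fréchet margins."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        s = positive_stable(theta, rng)
        samples.append([(s / -math.log(rng.random())) ** theta
                        for _ in range(d)])
    return samples

# Margins should be standard Fréchet: P(Z_k <= 1) = exp(-1) ≈ 0.368.
zs = rlogistic(20000, 2, 0.5)
print(round(sum(z[0] <= 1.0 for z in zs) / len(zs), 2))
```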

Extremal dependence: max-stable data

Compare with the pairwise composite likelihood estimator θ̂_PL.

                 θ_0 = 0.1         θ_0 = 0.7         θ_0 = 0.9
d                6    10    50     6    10    50     6    10    50
Bias(θ̂_Bayes)    2     2     2    10     6     1    -6    -3     2
s(θ̂_Bayes)      36    27    12   240   179    79   239   182    84
Bias(θ̂_PL)       1     0     2    13    12    16    26    31    41
s(θ̂_PL)         40    30    13   275   237   173   313   273   246

Table: Sample bias and standard deviation of θ̂_Bayes and θ̂_PL, estimated from 1500 estimates; figures multiplied by 10^4.

Observations: The posterior median θ̂_Bayes is unbiased, and the standard deviations are substantially reduced by using full likelihoods.

Extremal dependence: max-stable data

                          θ_0 = 0.1       θ_0 = 0.7       θ_0 = 0.9
d                         6   10   50     6   10   50     6   10   50
MSE(θ̂_Bayes)/MSE(θ̂_PL)  82   83   78    76   57   21    58   44   11

Table: Relative efficiencies (%) of θ̂_Bayes compared to θ̂_PL, estimated from 1500 estimates.

Observations: The efficiency gain grows for weaker dependence and higher dimensions; see also Huser et al. (2016, Extremes).

Extremal dependence: max-domain of attraction

For i = 1, ..., n, with n = 100, simulate b = 50 samples x_{1,i}, ..., x_{b,i} from the outer power Clayton copula in the MDA of the logistic model with parameter θ_0.

Take the block maxima z̃_i = max_{j=1,...,b} x_{j,i} as input data.

Compare θ̂_Bayes and θ̂_PL, based on {z̃_i}_i, with the Stephenson–Tawn estimator θ̂_ST and its bias-corrected version θ̂_W, based on {(z̃_i, π̃_i)}_i.

            θ_0 = 0.1     θ_0 = 0.7     θ_0 = 0.9
d            6     10      6     10      6     10
θ̂_Bayes    100    100    100    100    100    100
θ̂_PL        83     81     71     60     62     51
θ̂_ST        90     97     47     23     16      7
θ̂_W         90     97    110     70     70     26

Table: Relative efficiencies MSE(θ̂_Bayes)/MSE(θ̂) (in %) for the different estimators θ̂; estimated from 1500 estimates.

Observations: The efficiency gain from using the full information remains large. The automatic choice of the partition in θ̂_Bayes is more robust than the Stephenson–Tawn likelihood with a fixed partition; no bias correction is needed.
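The block-maxima preprocessing in this experiment is a one-liner per block. A sketch (the helper name is ours) that also records the within-block occurrence time of each componentwise maximum, i.e. the information used by the Stephenson–Tawn estimator:

```python
def block_maximum(block):
    """Componentwise maximum of one block of observations (list of lists),
    together with the index of the observation attaining each maximum."""
    d = len(block[0])
    idx = [max(range(len(block)), key=lambda j: block[j][k])
           for k in range(d)]
    return [block[idx[k]][k] for k in range(d)], idx

z, occ = block_maximum([[1.0, 4.0], [3.0, 2.0]])
print(z, occ)  # [3.0, 4.0] [1, 0]
```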

More results and work in progress

Estimation of marginal parameters: Simultaneous estimation of marginal and dependence parameters is possible. Astonishing observation: this substantially reduces the MSE for shape estimation (with weak improvement for location and scale).

Bayesian hypothesis testing: Example: testing asymptotic independence against dependence, H_0: θ_0 = 1 against H_1: θ_0 ∈ (0, 1), by comparing the posterior probabilities P(H_0 | z_1, ..., z_n) and P(H_1 | z_1, ..., z_n). Many other tests are possible (symmetry, regression coefficients, ...).

Asymptotic limit results for the posterior median.

References

C. Dombry, S. Engelke, and M. Oesting. Exact simulation of max-stable processes. Biometrika, 2016. To appear.
C. Dombry, F. Eyi-Minko, and M. Ribatet. Conditional simulation of max-stable processes. Biometrika, 100(1):111-124, 2013.
R. Huser, A. C. Davison, and M. G. Genton. Likelihood estimators for multivariate extremes. Extremes, 19:79-103, 2016.
S. A. Padoan, M. Ribatet, and S. A. Sisson. Likelihood-based inference for max-stable processes. J. Amer. Statist. Assoc., 105:263-277, 2010.
A. Stephenson and J. A. Tawn. Exploiting occurrence times in likelihood inference for componentwise maxima. Biometrika, 92(1):213-227, 2005.
E. Thibaud, J. Aalto, D. S. Cooley, A. C. Davison, and J. Heikkinen. Bayesian inference for the Brown-Resnick process, with an application to extreme low temperatures. arXiv:1506.07836, 2015.
J. L. Wadsworth. On the occurrence times of componentwise maxima and bias in likelihood inference for multivariate max-stable distributions. Biometrika, 102:705-711, 2015.