
Heriot-Watt University Research Gateway

Prediction of settlement delay in critical illness insurance claims by using the generalized beta of the second kind distribution
Dodd, Erengul; Streftaris, George
Published in: Journal of the Royal Statistical Society, Series C: Applied Statistics
DOI: 10.1111/rssc.12165
Publication date: 2017
Document version: peer-reviewed version

Citation (APA): Dodd, E., & Streftaris, G. (2017). Prediction of settlement delay in critical illness insurance claims by using the generalized beta of the second kind distribution. Journal of the Royal Statistical Society, Series C: Applied Statistics. DOI: 10.1111/rssc.12165


Supplementary material for "Prediction of settlement delay in critical illness insurance claims using GB2 distribution" by Erengul Dodd and George Streftaris

S1 Fitting related distributions

Results for s_i similar to those in Section 3.1 can be derived for the GG distribution. For the nested Burr and Pareto distributions, equation (7) can be adjusted accordingly by substituting γ = 1 and γ = τ = 1, respectively. For the log-normal distribution, on the other hand, we employ a standard weighted GLM where, again, the variance is inversely proportional to the weights. The posterior distributions of the parameters α and τ of the GG distribution are shown in Figure S1 and, as discussed in Section 4, they provide evidence against the simpler nested distributions.

Under all considered distributions we use the same linear predictor as in Section 3 of the main paper and the same priors for β, i.e. N(0, 10^4). The prior distributions for the other model parameters are given below.

GG distribution: α ~ Gamma(1, 0.01), τ ~ Gamma(1, 0.01)
Burr distribution: α ~ Gamma(1, 0.01) I(1/τ, ∞), τ ~ Gamma(1, 0.01)
Log-normal distribution: σ² ~ Inverse-Gamma(1, 0.1)
Pareto distribution: α ~ Gamma(1, 0.01) I(1, ∞)

We summarise the posterior estimates of the shape parameters of the Burr and Pareto distributions, and the scale parameter of the log-normal distribution, together with 95% credible intervals, in Table S1. We note here that the posterior estimates of the GB2 model in Section 3.1, as illustrated in the densities of model parameters shown in Figure 2, also provide evidence against the Pareto model (for which γ = τ = 1). Similarly, Figure S1 illustrates that under the GG model the parameters α and τ are far from 1, and therefore the related simpler nested models (gamma, Weibull, exponential) are not supported.
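To make the nesting above concrete, here is a minimal numerical sketch (not the authors' code) of the GB2 log-density and its Burr and Pareto special cases, written in the standard GB2(a, b, p, q) parameterisation. The mapping to the paper's shape parameters (a = τ, p = γ, q = α, so that γ = 1 gives the Burr and γ = τ = 1 the Pareto) is our reading of the substitutions described above, not a statement of the authors' notation.

```python
# Sketch of the GB2 log-density and its nested special cases, assuming the
# standard parameterisation
#   f(y) = a y^(ap-1) / ( b^(ap) B(p, q) (1 + (y/b)^a)^(p+q) ),  y > 0.
import numpy as np
from scipy.special import betaln

def gb2_logpdf(y, a, b, p, q):
    """Log-density of GB2(a, b, p, q)."""
    y = np.asarray(y, dtype=float)
    z = a * (np.log(y) - np.log(b))               # log of (y/b)^a
    return (np.log(a) + (a * p - 1.0) * np.log(y) - a * p * np.log(b)
            - betaln(p, q) - (p + q) * np.logaddexp(0.0, z))

def burr_logpdf(y, a, b, q):
    """Burr: GB2 with p = 1 (gamma = 1 in the paper's notation, we assume)."""
    return gb2_logpdf(y, a, b, p=1.0, q=q)

def pareto_logpdf(y, b, q):
    """Pareto (type II): GB2 with a = p = 1 (gamma = tau = 1)."""
    return gb2_logpdf(y, a=1.0, b=b, p=1.0, q=q)
```

Fixing p = 1 (and then also a = 1) in gb2_logpdf reproduces the substitutions applied to equation (7) of the main paper.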

Table S1: Posterior means and 95% credible intervals (CI) of the shape parameters of the Burr and Pareto distributions and of the scale parameter of the log-normal distribution.

Parameter        Mean     95% CI
Burr α           0.5869   (0.5564, 0.6189)
Burr τ           2.6221   (2.5452, 2.7008)
Pareto α         6.8084   (6.1727, 7.5150)
Log-normal σ²    1.0224   (0.9992, 1.0463)

[Figure S1: Posterior densities and histograms of the model parameters, together with their posterior means (dashed lines), under the GG distribution. Panels: (a) α, (b) τ.]

S2 DIC

In the main paper we consider the following three versions of DIC (keeping the original indices of Celeux et al. (2006)):

\[
\mathrm{DIC}_4 = -4\,E_{\theta, D_{\mathrm{mis}} \mid D_{\mathrm{obs}}}\big[\log f(D_{\mathrm{obs}}, D_{\mathrm{mis}} \mid \theta)\big]
+ 2\,E_{D_{\mathrm{mis}} \mid D_{\mathrm{obs}}}\big[\log f\big(D_{\mathrm{obs}}, D_{\mathrm{mis}} \mid E_{\theta}[\theta \mid D_{\mathrm{obs}}, D_{\mathrm{mis}}]\big)\big], \tag{S1}
\]

\[
\mathrm{DIC}_5 = -4\,E_{\theta, D_{\mathrm{mis}} \mid D_{\mathrm{obs}}}\big[\log f(D_{\mathrm{obs}}, D_{\mathrm{mis}} \mid \theta)\big]
+ 2\,\log f\big(D_{\mathrm{obs}}, \hat{D}_{\mathrm{mis}}(D_{\mathrm{obs}}) \mid \hat{\theta}(D_{\mathrm{obs}})\big), \tag{S2}
\]

\[
\mathrm{DIC}_8 = -4\,E_{\theta, D_{\mathrm{mis}} \mid D_{\mathrm{obs}}}\big[\log f(D_{\mathrm{obs}} \mid D_{\mathrm{mis}}, \theta)\big]
+ 2\,E_{D_{\mathrm{mis}} \mid D_{\mathrm{obs}}}\big[\log f\big(D_{\mathrm{obs}} \mid D_{\mathrm{mis}}, \hat{\theta}(D_{\mathrm{obs}}, D_{\mathrm{mis}})\big)\big]. \tag{S3}
\]

In (S2), (\hat{D}_{\mathrm{mis}}(D_{\mathrm{obs}}), \hat{\theta}(D_{\mathrm{obs}})) is the joint maximum a posteriori estimator of (D_mis, θ). Following Celeux et al. (2006), we use the pair (D_mis, θ) from the MCMC iterations that gives the highest value of the non-standardised posterior density, i.e. f(D_obs, D_mis | θ) f(θ). In (S3), \hat{θ}(D_obs, D_mis) denotes the posterior mean of θ.
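As an illustration of how (S2) and (S3) can be estimated from stored MCMC output, the following sketch computes Monte Carlo versions of DIC_5 and DIC_8. The functions loglik_joint, loglik_cond and logprior are hypothetical placeholders that must be supplied for the model at hand; this is a generic sketch under those assumptions, not the authors' implementation.

```python
# Monte Carlo estimates of DIC_5 (S2) and DIC_8 (S3) from stored MCMC draws.
# loglik_joint(D_obs, D_mis, theta) = log f(D_obs, D_mis | theta)   (placeholder)
# loglik_cond(D_obs, D_mis, theta)  = log f(D_obs | D_mis, theta)   (placeholder)
# logprior(theta)                   = log f(theta)                  (placeholder)
import numpy as np

def dic5_dic8(D_obs, D_mis_draws, theta_draws, loglik_joint, loglik_cond, logprior):
    N = len(theta_draws)
    lj = np.array([loglik_joint(D_obs, D_mis_draws[t], theta_draws[t]) for t in range(N)])
    lc = np.array([loglik_cond(D_obs, D_mis_draws[t], theta_draws[t]) for t in range(N)])

    # DIC_5: joint MAP taken over the stored draws, i.e. the pair (D_mis, theta)
    # maximising the non-standardised posterior f(D_obs, D_mis | theta) f(theta).
    lpost = lj + np.array([logprior(theta_draws[t]) for t in range(N)])
    t_map = int(np.argmax(lpost))
    dic5 = -4.0 * lj.mean() + 2.0 * lj[t_map]

    # DIC_8: plug the posterior mean of theta into the conditional likelihood.
    theta_bar = np.mean(np.asarray(theta_draws), axis=0)
    lc_bar = np.array([loglik_cond(D_obs, D_mis_draws[t], theta_bar) for t in range(N)])
    dic8 = -4.0 * lc.mean() + 2.0 * lc_bar.mean()
    return dic5, dic8
```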

S3 Latent likelihood ratio test

In more general terms, at each post-convergence MCMC iteration t = 1, ..., N we compute the latent value of the likelihood ratio Λ as

\[
\Lambda^{(t)} = \frac{L_1(\theta^{(t)}; D)}{L_2(\tilde{\theta}; D)},
\]

where θ^(t) are the MCMC posterior estimates at iteration t and \tilde{θ} are the MLEs. To calculate the tail probability π_Λ = P(Λ ≤ Λ^(t)) we need the sampling distribution of Λ under M_1. If the models are nested, we can use asymptotic arguments (as demonstrated in Streftaris and Gibson (2012)) implying that −2 log Λ^(t) ∼ χ²_df approximately, and therefore obtain a tail probability as

\[
\pi_{\Lambda}^{(t)} = P\big(\chi^2_{df} \ge -2 \log \Lambda^{(t)}\big),
\]

where df is the degrees of freedom of M_2, i.e. the number of estimated parameters in M_2.

If the models are not nested, we can employ simulation to obtain the empirical sampling distribution of Λ. First we generate a sample from M_1,

\[
\tilde{D} \sim M_1\big(\theta^{(t)}\big), \tag{S4}
\]

and calculate

\[
\tilde{\Lambda}^{(t)} = \frac{L_1(\theta^{(t)}; \tilde{D})}{L_2(\tilde{\theta}; \tilde{D})}. \tag{S5}
\]

Then we can either obtain the expectation of π_Λ as an ergodic mean, by computing the binary indicator 1(\tilde{Λ} ≤ Λ^(t)) and estimating the expectation of the posterior distribution of the p-values as

\[
E(\pi_{\Lambda} \mid D) = \int P\big(\tilde{\Lambda} \le \Lambda^{(t)}\big) f(\theta \mid D)\, d\theta \approx \frac{1}{N} \sum_{t} 1\big(\tilde{\Lambda} \le \Lambda^{(t)}\big),
\]

or we can obtain the entire posterior distribution of the p-values as a posterior sample π_Λ^(1), ..., π_Λ^(N), by repeating the steps in (S4) and (S5) at each MCMC iteration. Here we prefer the repeated-simulation approach when the two models under comparison are not nested.
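The ergodic-mean variant of this scheme can be sketched as follows. Here simulate_M1, loglik_M1, loglik_M2 and mle_M2 are hypothetical placeholders for the model-specific simulator, log-likelihoods and M_2 maximum likelihood fit, and one replicate per posterior draw is used; the second variant (the full posterior of p-values) would require many replicates per draw.

```python
# Sketch of the repeated-simulation scheme in (S4)-(S5): at each posterior
# draw, simulate data from M1, recompute the latent likelihood ratio on the
# simulated data, and compare it with the observed-data ratio.
import numpy as np

def latent_lrt_pvalue(D, theta_draws, simulate_M1, loglik_M1, loglik_M2, mle_M2, seed=None):
    rng = np.random.default_rng(seed)
    theta_mle2 = mle_M2(D)                      # MLE of M2 on the observed data
    indicators = []
    for theta_t in theta_draws:
        # Observed-data latent log-ratio: log L1(theta_t; D) - log L2(theta~; D).
        log_lam_obs = loglik_M1(theta_t, D) - loglik_M2(theta_mle2, D)
        # (S4): simulate a replicate data set from M1 at theta_t.
        D_sim = simulate_M1(theta_t, rng)
        # (S5): latent log-ratio on the simulated data (M2 MLE refitted to D_sim).
        log_lam_sim = loglik_M1(theta_t, D_sim) - loglik_M2(mle_M2(D_sim), D_sim)
        # Binary indicator 1(Lambda_sim <= Lambda_obs), one per posterior draw.
        indicators.append(float(log_lam_sim <= log_lam_obs))
    return np.mean(indicators)                  # ergodic-mean estimate of E(pi_Lambda | D)
```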

where sd^{MV}_{GB2} denotes the posterior standard deviation of each parameter under the GB2 model when missing values are included in the analysis, and similarly for the other cases. Negative values therefore indicate that the GB2 distribution is more efficient in dealing with missing values. A barplot of the values of this measure for the 30 β coefficients is shown in Figure S2 and suggests that the GB2 model outperforms the Burr model (the measure also has a negative mean, −0.024).

[Figure S2: Barplot comparing the efficiency of the GB2 and Burr models in dealing with missing values. The relative efficiency measure (S6) has been computed for the β coefficients of the models, β_1, ..., β_30; values lie between about −0.3 and 0.3.]

S5 Variable selection using marginal likelihoods

We can confirm the results found in Section 5 using an approximation to the marginal likelihood of the GB2 model, since its exact analytical calculation is not tractable. If we denote the models by m, where m ∈ {m_1, ..., m_1024}, and the parameter vector of model m by θ_m, then the marginal likelihood can be expressed as

\[
f(D \mid m) = \int f(D \mid \theta_m, m)\, f(\theta_m \mid m)\, d\theta_m. \tag{S7}
\]

We consider the Laplace approximation to (S7), that is

\[
f(D \mid m) \approx (2\pi)^{d_m/2}\, |\Sigma_m|^{1/2}\, f(D \mid \tilde{\theta}_m, m)\, f(\tilde{\theta}_m \mid m), \tag{S8}
\]

where \tilde{θ}_m denotes the parameter vector of model m evaluated at the posterior modes, d_m is the dimension of model m, and Σ_m = (−H(\tilde{θ}_m))^{−1} is the covariance matrix, with the Hessian matrix of the log-likelihood, H, evaluated at the posterior modes of the parameters. The resulting posterior model probabilities based on the approximated marginal likelihood under the GB2 distribution are given in Table S2 and are very similar to those calculated with the MCMC approach (see Section 5, Table 5). This is not surprising, as this approximation to the marginal likelihood is known to give good results with large sample sizes (Gelman et al., 2000); more information on this approach can be found in Kass and Raftery (1995).
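On the log scale, (S8) reads log f(D | m) ≈ (d_m/2) log(2π) + ½ log |Σ_m| + log f(D | θ̃_m, m) + log f(θ̃_m | m), and the sketch below evaluates it with a numerically located mode and a finite-difference Hessian. Here log_post is a hypothetical placeholder for log f(D | θ, m) + log f(θ | m), and the optimiser and differencing scheme are our choices for illustration, not necessarily the procedure used in the paper.

```python
# Sketch of the Laplace approximation (S8) on the log scale.
import numpy as np
from scipy.optimize import minimize

def log_marginal_laplace(log_post, theta_init, eps=1e-5):
    neg = lambda th: -log_post(th)
    opt = minimize(neg, theta_init, method="BFGS")   # locate the posterior mode
    theta_mode, d = opt.x, opt.x.size
    # Finite-difference Hessian of -log_post at the mode, H = -Hessian(log_post).
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (neg(theta_mode + ei + ej) - neg(theta_mode + ei - ej)
                       - neg(theta_mode - ei + ej) + neg(theta_mode - ei - ej)) / (4 * eps**2)
    _, logdet_H = np.linalg.slogdet(H)               # Sigma = H^{-1}, log|Sigma| = -log|H|
    return 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet_H + log_post(theta_mode)
```

Exponentiating and normalising these values over the candidate models yields posterior model probabilities of the kind reported in Table S2.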

Table S2: Laplace approximation for the marginal likelihood: posterior model probabilities for the five highest-ranked models under the GB2 distribution, with posterior odds relative to model m_981.

Rank  Model   f(m | D)   PO(m_981 / ·)
1     m_981   0.2129     1.00
2     m_977   0.2098     1.01
3     m_978   0.1666     1.27
4     m_982   0.1120     1.90
5     m_979   0.0412     5.16

References

Celeux, G., Forbes, F., Robert, C. P. and Titterington, D. M. (2006) Deviance information criteria for missing data models. Bayesian Analysis, 1, 651–673.

Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2000) Bayesian Data Analysis. Chapman and Hall/CRC.

Kass, R. E. and Raftery, A. E. (1995) Bayes factors. Journal of the American Statistical Association, 90, 773–795.

Rashid, S., Mitra, R. and Steele, R. (2015) Using mixtures of t densities to make inferences in the presence of missing data with a small number of multiply imputed data sets. Computational Statistics & Data Analysis, 92, 84–96.

Streftaris, G. and Gibson, G. J. (2012) Non-exponential tolerance to infection in epidemic systems: modeling, inference, and assessment. Biostatistics, 13, 580–593.