U N I V E R S I T Ä T   H A M B U R G

Bayes-Statistics for mixed Erlang-Models

Erhard Kremer

Preprint No. 2010-04          May 2010

DEPARTMENT MATHEMATIK
SCHWERPUNKT MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE
BUNDESSTR. 55, D-20146 HAMBURG

Bayes-Statistics for mixed Erlang-models

Erhard Kremer
Department Mathematik
Bundesstraße 55, 20146 Hamburg
e-mail: Kremer@math.uni-hamburg.de

1 Introduction

Bayesian ideas are already comparably old. Already in the 18th century Thomas Bayes and Pierre-Simon Laplace developed first conclusions on the Bayes calculus. Nowadays Bayes theory, or Bayes statistics, is a highly developed part of Mathematical Statistics. The significant difference to usual Mathematical Statistics is the assumption that the parameter of the statistical model is a realization of a random variable. The distribution of that random variable is called the a-priori-distribution. Usually one interprets the a-priori-distribution as a kind of prior information on the value of the parameter of the model. Note, however, that sometimes one has a more concrete interpretation of such an a-priori-distribution: in the actuarial field, for example, it is the concretely given distribution of the so-called risk parameter in a collective of insurance risks.

The classical approach of Bayes statistics is to calculate a so-called a-posteriori-distribution out of the a-priori-distribution by incorporating the data of a statistical sample. With that a-posteriori-distribution one then derives optimal statistical decision rules, the so-called Bayes rules. A lot of research on Bayes rules has already been done; for surveys of the most important results see the books of Berger (1985), Ghosh & Ramamoorthi (2003) and Robert (1994). An elegant introduction (in German) was given by the author (see Kremer (2005)). Recently the author noted a certain gap in this field of Bayes research. Closing that gap is the content of the following paper, whose results are quite elegant and self-contained.

2 The context

Let the sample X = (X_1, ..., X_n) be given as random variables

    X_i : (Ω, A, P) → (ℝ, 𝔅),

together with the random variable θ (also defined on (Ω, A, P), with values in a set Θ), whose realisation ϑ is the parameter of the underlying stochastic model. Denote by

    P_ϑ^X := P^{X | θ = ϑ}

the conditional probability (measure) of X given the realisation ϑ of θ. With this notation the probability law P^θ is the a-priori-distribution (of θ), and the conditional distribution of θ given X = x, the so-called a-posteriori-distribution (of θ), is denoted by P^{θ | X = x}. The latter can (usually) be computed with the so-called Bayes theorem (see on this e.g. Kremer (2005), Theorem 2.1, page 30).

For all that follows assume that the X_1, ..., X_n are i.i.d. given θ, which means in detail:

(A)  P^{X | θ = ϑ} = ⊗_{i=1}^n P^{X_i | θ = ϑ},

(B)  P^{X_i | θ = ϑ} = P^{X_j | θ = ϑ}  for all ϑ and i ≠ j.

Usually one assumes in addition that the P^{X_i | θ = ϑ} are dominated by a σ-finite measure µ, giving the µ-density f_ϑ^{X_i} of the conditional distribution of X_i given ϑ, with the notation

    P_ϑ^{X_i} := P^{X_i | θ = ϑ}.

So far for the general notations. The new parts of the paper are specialized to the additional model assumptions:

(C)  θ has an Erlang(a, b)-distribution with (given) fixed a ∈ ℕ and parameter b ∈ (0, ∞), meaning that P^θ has the Lebesgue-density

         f_b(ϑ) = (b^a / Γ(a)) · ϑ^{a−1} · exp(−bϑ)   on ϑ ∈ (0, ∞).

(D)  P_ϑ^{X_i} is an Erlang(α, ϑ)-distribution with (given) fixed α ∈ ℕ and parameter ϑ ∈ (0, ∞), meaning that one has (with µ the Lebesgue-measure)

         f_ϑ^{X_i}(y) = (ϑ^α / Γ(α)) · y^{α−1} · exp(−ϑy)   on y ∈ (0, ∞).

Note that for n = 1 the model given by (A)-(D) is related to the model in Johnson & Kotz (1970), section 8.2.

3 Basic results

For giving Bayes rules under the model defined through (A)-(D), one can take certain general theorems from Kremer (2005). They are listed here in adequate form.

Result 1: Assume (A), (B) with Θ = (a_1, a_2) ⊆ ℝ. Suppose that one has:

(a) P^θ is dominated by the Lebesgue-measure with density of the type

        f(ϑ) = [C(ϑ)]^m · exp(−ϑ x_0) / D(m, x_0),

    where m ≥ 0 is fixed, C(·) is a positive function, x_0 ∈ (b_1, b_2) ⊆ ℝ is a parameter and D(m, x_0) is the norming constant such that

        ∫_{a_1}^{a_2} f(ϑ) dϑ = 1.

(b) P_ϑ^{X_i} is dominated by the Lebesgue-measure and has the density

        f_ϑ^{X_i}(y) = C(ϑ) · exp(−ϑy) · h(y)

    with the function C(·) of (a) and an adequate, nonnegative, measurable function h(·).

Then the a-posteriori-distribution P^{θ | X = x} has the Lebesgue-density

    f^{θ | X = x}(ϑ) = [C(ϑ)]^{n+m} · exp(−ϑ (x_0 + Σ_{i=1}^n x_i)) / D(n + m, x_0 + Σ_{i=1}^n x_i),    (3.1)

where x = (x_1, ..., x_n) and D(n + m, x_0 + Σ_{i=1}^n x_i) is again the norming constant as in (a).

This result is Theorem 2.14 in Kremer (2005). It will be applied later on to the model given by assumptions (A)-(D).

Next let the function γ on Θ be defined by

    γ(ϑ) = E(X_i | θ = ϑ) = ∫ y P_ϑ^{X_i}(dy),    (3.2)

and consider the problem of estimating γ(ϑ) based on (the data) X. The corresponding optimal estimator (in the context of the above Bayes model, see everything in front of (C) in section 2) is the so-called Bayes-estimator. For its general definition see part 4.1 in Kremer (2005).

Result 2: Take the complete context of Result 1 with in addition

    f(a_2) − f(a_1) = 0.

The Bayes-estimator is given by

    γ̂(x) = (x_0 + Σ_{i=1}^n x_i) / (m + n)

with x = (x_1, ..., x_n).

This result is just Theorem 4.6 in Kremer (2005). It too will be applied later on.

Finally let Θ be split up into two disjoint sets H and K:

    Θ = H + K.    (3.3)

A decision (based on X) shall be made between the hypothesis H and the alternative K. The Bayes rule for this (so-called) testing problem is called the Bayes-test. For details on it see again Kremer (2005), part 3.1.

Result 3: Take the assumptions (A), (B) of section 2 and assume the existence of µ-densities f_ϑ^{X_i}(·) of P_ϑ^{X_i}. Suppose f_ϑ^{X_i}(y_i) is measurable in (ϑ, y_i). Then one has as Bayes-test

    ϕ(x) = 1, when P^{θ | X = x}(K) > c_1 / (c_1 + c_2),
    ϕ(x) = 0, otherwise,

with x = (x_1, ..., x_n). Here c_1, c_2 are certain weighting constants in the so-called Bayes risk (see Theorem 3.2 in Kremer (2005)); they serve to define a certain loss function (see Kremer (2005), page 53). Simplest is to take c_1 = c_2 = 1.

The proof of a more general version of Result 3 can be found in Kremer (2005), pages 55-57 (note that the above ϕ is given in the remark on page 57).

4 Bayes-estimator

Take the Bayes context of section 2 with (A), (B) and, more specially, (C), (D) in addition. One has:

Theorem 1: The Bayes-estimator of γ(ϑ) (according to (3.2)) is given by

    γ̂(x) = α · (b + Σ_{i=1}^n x_i) / (a + αn − 1)

with (the sample) x = (x_1, ..., x_n), when, more strongly, α > 1.

Remark 1: Note that the excluded case α = 1 is just the (classical) exponential distribution, which (in usual applications, like nonlife insurance rating) is of no such great importance. But note also that this exponential case is already treated in Kremer (2005), example 2.11 and example 4.5, case 4.

Remark 2: From the properties of the Erlang(α, ϑ)-distribution one knows that

    ∫_0^∞ y f_ϑ^{X_i}(y) dy = α/ϑ,

which means

    γ(ϑ) = α/ϑ.    (4.1)

Proof of Theorem 1: The f_ϑ^{X_i} of (D) is of the type (b) of section 3. One has

    C(ϑ) = ϑ^α / Γ(α),   h(y) = y^{α−1}.

Also f_b of (C) is of the type (a) of section 3. Here one has

    x_0 = b,   m = (a − 1)/α.    (4.2)

Obviously the conditions of Result 1 are satisfied with

    a_1 = 0,  a_2 = ∞,  b_1 = 0,  b_2 = ∞.

Since

    f_b(0) = 0,   f_b(∞) = 0,

all conditions of Result 2 are satisfied as well. Inserting the special choices (4.2) into the formula for γ̂(x) there, one arrives at the result of Theorem 1.

Of certain interest is:

Remark 3: From formula (3.1) and (4.1) one concludes easily that the a-posteriori-distribution P^{θ | X = x} with x = (x_1, ..., x_n) is just the Erlang(a + nα, b + Σ_{i=1}^n x_i)-distribution, which is for n = 1 in agreement with a result in table 3.2 of Robert (1994).
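To make the posterior update of Remark 3 and the estimator of Theorem 1 concrete, here is a minimal numerical sketch in Python (NumPy/SciPy). The prior parameters a, b, the shape α and the sample x are illustrative assumptions, not values from the paper; the script computes the posterior Erlang parameters and γ̂(x), and checks the latter against the posterior expectation of α/ϑ obtained by numerical integration.

```python
import numpy as np
from scipy.stats import gamma
from scipy.integrate import quad

# Illustrative (hypothetical) choices, not taken from the paper:
a, b = 3, 2.0                         # prior Erlang(a, b) for theta (shape a, rate b)
alpha = 2                             # conditional Erlang(alpha, theta); Theorem 1 needs alpha > 1
x = np.array([1.4, 0.9, 2.3, 1.1])    # observed sample x_1, ..., x_n
n = len(x)

# Posterior of theta given x is Erlang(a + n*alpha, b + sum(x))   (Remark 3)
a_post = a + n * alpha
b_post = b + x.sum()

# Bayes-estimator of gamma(theta) = alpha/theta   (Theorem 1)
gamma_hat = alpha * b_post / (a_post - 1)

# Independent check: posterior expectation of alpha/theta by numerical integration
post_pdf = lambda t: gamma.pdf(t, a_post, scale=1.0 / b_post)   # SciPy gamma: shape, scale = 1/rate
check, _ = quad(lambda t: (alpha / t) * post_pdf(t), 0.0, np.inf)

print(gamma_hat, check)   # the two numbers should agree
```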

Remark 4: Example 4.9 in Robert (1994) is related to the above theorem. In the special situation n = 1 one has, for the Bayes-estimator δ_1^π(X) in Robert and the γ̂(X) of Theorem 1,

    γ̂(X) = α · δ_1^π(X).

The factor α is easily explained: Robert takes the loss function

    L(ϑ, δ) = (δ − 1/ϑ)²,

whereas the above is based on

    L(ϑ, δ) = (δ − α/ϑ)².

5 Bayes-Test

Suppose again that the Bayes context of section 2 with (A), (B) and in addition (C), (D) is given. An adequate testing problem shall now be considered, more concretely the hypothesis

    H = [ϑ_0, ∞)

against the alternative

    K = (0, ϑ_0),

where ϑ_0 is fixed and given. That this type is the most natural one, one concludes from (4.1). Usually the most natural choice would be H = (0, ϑ_0) against K = (ϑ_0, ∞); but since (4.1) involves ϑ^{−1}, the above H and K are the right choice. One has:

Theorem 2: The Bayes-test for H against K is given by

    ϕ(x) = 1, when x̄_n > (1/n) · (k(a, α, ϑ_0) − b),
    ϕ(x) = 0, otherwise,

where x = (x_1, ..., x_n) and

    x̄_n = (1/n) Σ_{i=1}^n x_i,
    k(a, α, ϑ_0) = F_{ϑ_0}^{−1}(c_1 / (c_1 + c_2)),

and where F_{ϑ_0}^{−1} is the inverse of the (strictly monotone increasing) function

    F_{ϑ_0}(z) = 1 − exp(−ϑ_0 z) · Σ_{l=0}^{a'−1} (ϑ_0 z)^l / l!

with

    a' = a + nα.

Proof: According to Remark 3 one has

    P^{θ | X = x}(K) = F_{ϑ_0}(b')

with

    b' = b + Σ_{i=1}^n x_i,

since the Erlang(a', b')-distribution has at the point y the distribution function value F_y(b'). As a consequence the condition

    P^{θ | X = x}(K) > c_1 / (c_1 + c_2)

is equivalent to

    b' > F_{ϑ_0}^{−1}(c_1 / (c_1 + c_2)),

and that in turn to

    x̄_n > (1/n) · (k(a, α, ϑ_0) − b).

Remark 5: Note that F_{ϑ_0} is the distribution function of the Erlang(a', ϑ_0)-distribution. This means that one can compute k(a, α, ϑ_0) just as the γ-fractile (with γ = c_2/(c_1 + c_2)) of the Erlang(a', ϑ_0)-distribution. Note also that the above Bayes-test of Theorem 2 cannot be found in Berger (1985) (not even for the special case n = 1!) or elsewhere.
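As a small numerical illustration of Theorem 2 and Remark 5, the following Python/SciPy sketch (the values of a, b, α, ϑ_0, c_1, c_2 and the sample are illustrative assumptions) computes the critical value k(a, α, ϑ_0) as the point with lower-tail probability c_1/(c_1 + c_2) under the Erlang(a + nα, ϑ_0)-distribution, evaluates the test, and cross-checks it against the equivalent posterior-probability form used in the proof.

```python
import numpy as np
from scipy.stats import gamma

# Illustrative (hypothetical) values, not taken from the paper:
a, b, alpha = 3, 2.0, 2
theta0 = 1.5                                  # H = [theta0, inf) against K = (0, theta0)
c1, c2 = 1.0, 1.0                             # weights in the Bayes risk (simplest choice)
x = np.array([1.4, 0.9, 2.3, 1.1])
n = len(x)

a_prime = a + n * alpha
# k(a, alpha, theta0): lower-tail probability c1/(c1+c2) under Erlang(a', theta0),
# i.e. the c2/(c1+c2)-(upper-)fractile in the convention of Remark 5
k = gamma.ppf(c1 / (c1 + c2), a_prime, scale=1.0 / theta0)

x_bar = x.mean()
phi = 1 if x_bar > (k - b) / n else 0         # Bayes-test of Theorem 2

# Equivalent form from the proof: reject H iff P(theta < theta0 | x) > c1/(c1+c2)
post_prob_K = gamma.cdf(theta0, a_prime, scale=1.0 / (b + x.sum()))
print(phi, post_prob_K > c1 / (c1 + c2))      # both decisions agree
```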

Finally, a nice application: Also in Bayes statistics one thinks about certain confidence regions for the parameter. A quick introduction into that area can again be found in Kremer (2005) (see part 4.4 there), where confidence regions are derived from Bayes-tests. Denote by C(x) a confidence region for the parameter ϑ based on (the sample) x = (x_1, ..., x_n). According to formula (3.7) in Kremer (2005) it makes most sense to take

    C(x) = {ϑ : ϕ_ϑ(x) = 0},

where ϕ_ϑ is the test of Theorem 2 with ϑ in place of ϑ_0. Obviously

    C(x) = {ϑ : x̄_n ≤ (1/n) · (k(a, α, ϑ) − b)},

which can be rewritten as

    C(x) = {ϑ : F_ϑ(b + Σ_{i=1}^n x_i) ≤ c}

with

    c = c_1 / (c_1 + c_2)

(compare the proof of Theorem 2). For practical applications one needs an adequate value for c. Certainly one wants the confidence region to hold a certain confidence level, say (1 − δ) with a fixed, chosen δ ∈ (0, 1) (small, e.g. 0.05). Consequently one has the condition for choosing c:

    P^{θ | X = x}(C(x)) ≥ 1 − δ,    (5.1)

where "≥" can be replaced by "=". According to Remark 3, P^{θ | X = x} is the Erlang(a + nα, b + Σ_{i=1}^n x_i)-distribution. But first rewrite C(x) into a nicer form. Take the notation

    G_{b'}(ϑ) := F_ϑ(b + Σ_{i=1}^n x_i)   with   b' = b + Σ_{i=1}^n x_i.

Since G_{b'}(·) is strictly increasing, its inverse G_{b'}^{−1} exists as well, which gives

    C(x) = {ϑ : ϑ ≤ G_{b'}^{−1}(c)}.

The condition (5.1) (with "≥" replaced by "=") means as a consequence that G_{b'}^{−1}(c) must be the δ-fractile u_δ(x) of P^{θ | X = x}. Altogether one has as Bayesian confidence region for ϑ:

    C(x) = (0, u_δ(x)],    (5.2)

where u_δ(x) is the δ-fractile of the Erlang(a + nα, b + Σ_{i=1}^n x_i)-distribution. C(x) is an HPD δ-credible region in the sense of Robert (1994) (see definition 5.7 there). But note that the above results (especially (5.2)) are not given by Robert (or others).
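A minimal numerical sketch of the credible region (5.2) (Python/SciPy; all parameter values are illustrative assumptions): u_δ(x) is the point leaving posterior mass δ in the upper tail of the Erlang(a + nα, b + Σ x_i)-distribution, i.e. its (1 − δ)-quantile in the usual convention.

```python
import numpy as np
from scipy.stats import gamma

# Illustrative (hypothetical) values, not taken from the paper:
a, b, alpha = 3, 2.0, 2
x = np.array([1.4, 0.9, 2.3, 1.1])
n, delta = len(x), 0.05

a_post = a + n * alpha
b_post = b + x.sum()

# delta-fractile u_delta(x): upper-tail posterior mass delta
u_delta = gamma.ppf(1.0 - delta, a_post, scale=1.0 / b_post)

# Bayesian confidence region (5.2): C(x) = (0, u_delta(x)]
covered = gamma.cdf(u_delta, a_post, scale=1.0 / b_post)   # equals 1 - delta by construction
print(u_delta, covered)
```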

6 Parameter-estimation

For applying the results of sections 4 and 5 one needs to know the parameter b of the a-priori-distribution (remember that a ∈ ℕ and α ∈ ℕ were assumed to be given and known). Since b is not given, one needs an estimator for b. For deriving such an estimator suppose one has k replications of the X, say

    X_j = (X_{j1}, ..., X_{jn}),   j = 1, ..., k,

with, for each j, the random variable θ_j (analogous to θ) of the parameter ϑ_j (analogous to ϑ), j = 1, ..., k. Assume for the following:

a) θ_1, ..., θ_k are identically distributed.

b) (X_j, θ_j), j = 1, ..., k, are independent.

c) X_{j1}, ..., X_{jn} are i.i.d. given θ_j (for j = 1, ..., k),

defined in analogy to (A), (B). Obviously one has the context of sections 6.2 and 6.3 in Kremer (2005). In addition assume that θ_j is distributed like the θ in (C) (for all j = 1, ..., k) and that

    P_ϑ^{X_{ji}} := P^{X_{ji} | θ_j = ϑ}

is the P_ϑ^{X_i} of (D) (for all j = 1, ..., k and i = 1, ..., n).

According to section 6.2 in Kremer (2005) one gets the so-called moment-estimator for b as the solution b̂ of

    E_{b̂}(E(X_{ji} | θ_j)) = X̄..    (6.1)

with

    X̄.. := (1/(kn)) Σ_{j=1}^k Σ_{i=1}^n X_{ji},

and where the outer expectation E_b(·) stands for the integration over θ_j = ϑ according to the Erlang(a, b)-law. According to (4.1) one has

    E(X_{ji} | θ_j) = α / θ_j,

and since

    E(θ_j^{−1}) = b / (a − 1),

it follows from (6.1) as equation for b̂:

    α · b̂ / (a − 1) = X̄.. .

Obviously one gets a moment-estimator b̂_ME for b simply as

    b̂_ME = ((a − 1)/α) · X̄.. .

Finally one would also like to know how to calculate the maximum-likelihood-estimator for b in the above given Bayes context. According to section 6.3 the b̂ is a solution of the equation

    Σ_{j=1}^k [ (d/db) ln ( ∫ Π_{i=1}^n f_ϑ^{X_i}(X_{ji}) P_b(dϑ) ) ]_{b = b̂} = 0,    (6.2)

where P_b is the a-priori-distribution with parameter b (a is known!). Inserting all densities (according to (C) and (D)), one gets after routine calculations

    (d/db) ln ( ∫ Π_{i=1}^n f_ϑ^{X_i}(X_{ji}) P_b(dϑ) ) = a/b − (a + nα) · (b + Σ_{i=1}^n X_{ji})^{−1}.

As a consequence, (6.2) gives as equation for b̂:

    k · a / b̂ = (a + nα) · Σ_{j=1}^k 1 / (b̂ + Σ_{i=1}^n X_{ji}).

With further modifications one arrives at the final result that the maximum-likelihood-estimator b̂_ML of b is given as

    b̂_ML = n · B̂,

where B̂ is (the) solution of the equation

    B̂ = (a / (a + nα)) · 1 / S(B̂)    (6.3)

with S(·) given by

    S(B) = (1/k) Σ_{j=1}^k 1 / (B + X̄_{j.}),

where

    X̄_{j.} = (1/n) Σ_{i=1}^n X_{ji}.

Clearly, in practical applications the solution B̂ of (6.3) has to be calculated with an adequate method of numerical mathematics. Certainly the practitioner might prefer the b̂_ME to the b̂_ML; but note that, according to certain general theoretical investigations of Asymptotic Statistics, also the more unhandy b̂_ML has its merits.
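The two estimators of this section can be computed as in the following Python/NumPy sketch (the simulated portfolio and all numerical values are illustrative assumptions, not data from the paper): b̂_ME is the closed form above, and b̂_ML = n·B̂ is obtained here by simply iterating the fixed-point equation (6.3); in practice a bracketing root finder may be preferable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) portfolio generated from model (C), (D):
a, alpha, b_true = 3, 2, 2.0
k, n = 50, 10
theta = rng.gamma(shape=a, scale=1.0 / b_true, size=k)                 # theta_j ~ Erlang(a, b)
X = rng.gamma(shape=alpha, scale=1.0 / theta[:, None], size=(k, n))    # X_ji ~ Erlang(alpha, theta_j)

# Moment estimator: b_ME = (a - 1)/alpha * X..   (overall mean)
b_ME = (a - 1) / alpha * X.mean()

# Maximum-likelihood estimator: b_ML = n * B, with B solving (6.3)
#   B = a / (a + n*alpha) * 1 / S(B),   S(B) = (1/k) * sum_j 1 / (B + Xbar_j.)
x_bar_j = X.mean(axis=1)
B = b_ME / n                                   # start the fixed-point iteration at the moment estimate
for _ in range(500):
    S = np.mean(1.0 / (B + x_bar_j))
    B = a / (a + n * alpha) / S
b_ML = n * B

print(b_ME, b_ML)                              # both should lie near b_true
```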

7 Final remarks

Note that the author's roots go back to non-parametric statistics. He did elegant research on Bahadur efficiencies of rank tests (see e.g. Kremer (1979), (1981), (1982a), (1985)). At the beginning of the 1980s he moved more towards mathematical risk theory. In that field he contributed a lot of new research on Bayes techniques in premium rating (see e.g. Kremer (1982b), (1982c), (1996), (2005b)) and also wrote many papers on the statistics of loss reserving (see e.g. Kremer (1984), (1999), (2005c)). His biggest body of work lies in the mathematical stochastics of reinsurance (see e.g. Kremer (2004), (2008)).

References

[1] Berger, J.O. (1985): Statistical Decision Theory and Bayesian Analysis. Springer.

[2] Ghosh, J.K. & Ramamoorthi, R.V. (2003): Bayesian Nonparametrics. Springer.

[3] Johnson, N.L. and Kotz, S. (1970): Continuous Distributions - 1. John Wiley.

[4] Kremer, E. (1979): Approximate and local Bahadur efficiency of linear rank tests in the two-sample problem. Annals of Statistics.

[5] Kremer, E. (1981): Local comparison of rank tests for the independence problem. Journal of Multivariate Analysis.

[6] Kremer, E. (1982a): Local comparison of linear rank tests in the Bahadur sense. Metrika.

[7] Kremer, E. (1982b): Credibility theory for some evolutionary models. Scandinavian Actuarial Journal.

[8] Kremer, E. (1982c): Exponential smoothing and credibility theory. Insurance: Mathematics & Economics.

[9] Kremer, E. (1984): A class of autoregressive models for predicting the final claims amount. Insurance: Mathematics & Economics.

[10] Kremer, E. (1985): Behavior of rank tests at infinity. Mathematische Operationsforschung und Statistik.

[11] Kremer, E. (1996): Credibility for stationarity. Blätter der deutschen Gesellschaft für Versicherungsmathematik.

[12] Kremer, E. (1999): Threshold loss reserving. Mitteilungen der Vereinigung Schweizerischer Versicherungsmathematiker.

[13] Kremer, E. (2004): Einführung in die Risikotheorie der verallgemeinerten Höchstschadenrückversicherung. Verlag Versicherungswirtschaft.

[14] Kremer, E. (2005a): Einführung in die Mathematik der Bayes-Statistik für Aktuare und Andere. Logos Verlag.

[15] Kremer, E. (2005b): Credibility for the regression model with autoregressive error terms. Blätter der deutschen Gesellschaft für Versicherungsmathematik.

[16] Kremer, E. (2005c): The correlated Chain-Ladder-method for correlated claims developments. Blätter der deutschen Gesellschaft für Versicherungsmathematik.

[17] Kremer, E. (2008): Einführung in die Risikotheorie des Stoploss-Vertrags. Der Andere Verlag.

[18] Robert, C.P. (1994): The Bayesian Choice. Springer.

Abstract

Bayesian techniques are usually not standard techniques of Mathematical Statistics. Nevertheless they are applied in various fields of research and practice, e.g. in insurance premium rating. Many theoretical results have been derived on Bayesian-statistical methods; applied to more special model assumptions, they often yield handy techniques for applications. The present paper likewise gives, under specialized conditions, such theory-based techniques, which have not been presented in this form in the literature up to now.

Keywords

Bayes Statistics, Mixed Erlang Model, Estimator, Test, Parameter-Estimators.