PL-2 The Matrix Inverted: A Primer in GLM Theory

Similar documents
A Practitioner s Guide to Generalized Linear Models

C-14 Finding the Right Synergy from GLMs and Machine Learning

GLM I An Introduction to Generalized Linear Models

Content Preview. Multivariate Methods. There are several theoretical stumbling blocks to overcome to develop rating relativities

Advanced Ratemaking. Chapter 27 GLMs

Generalized linear mixed models (GLMMs) for dependent compound risk models

Generalized linear mixed models for dependent compound risk models

Ratemaking with a Copula-Based Multivariate Tweedie Model

Single-level Models for Binary Responses

Lecture 8. Poisson models for counts

Generalized linear mixed models (GLMMs) for dependent compound risk models

Chapter 5: Generalized Linear Models

Analytics Software. Beyond deterministic chain ladder reserves. Neil Covington Director of Solutions Management GI

11. Generalized Linear Models: An Introduction

8 Nominal and Ordinal Logistic Regression

Generalized Linear Models 1

Generalized Linear Models: An Introduction

Standard Error of Technical Cost Incorporating Parameter Uncertainty

Logistic regression: Miscellaneous topics

LOGISTIC REGRESSION Joseph M. Hilbe

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Generalized Linear Models

MODELING COUNT DATA Joseph M. Hilbe

Subject CS1 Actuarial Statistics 1 Core Principles

Non-Life Insurance: Mathematics and Statistics

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

Bonus Malus Systems in Car Insurance

Ratemaking application of Bayesian LASSO with conjugate hyperprior

GB2 Regression with Insurance Claim Severities

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Lecture 2: Linear and Mixed Models

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

DELTA METHOD and RESERVING

The simple linear regression model discussed in Chapter 13 was written as

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Chapter 1 Statistical Inference

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Linear Regression Models P8111

BOOTSTRAPPING WITH MODELS FOR COUNT DATA

Introduction to General and Generalized Linear Models

f X (x) = λe λx, , x 0, k 0, λ > 0 Γ (k) f X (u)f X (z u)du

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

Delta Boosting Machine and its application in Actuarial Modeling Simon CK Lee, Sheldon XS Lin KU Leuven, University of Toronto

Chapter 4: Generalized Linear Models-II

Solutions to the Spring 2016 CAS Exam S

Generalized Linear Models I

Package HGLMMM for Hierarchical Generalized Linear Models

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

SAS Software to Fit the Generalized Linear Model

Chapter 1. Modeling Basics

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

9 Generalized Linear Models

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Solutions. Some of the problems that might be encountered in collecting data on check-in times are:

Statistics 5100 Spring 2018 Exam 1

Counts using Jitters joint work with Peng Shi, Northern Illinois University

Correlation & Simple Regression

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy

Modelling the risk process

Chapter 22: Log-linear regression for Poisson counts

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

A Supplemental Material for Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models

Generalized linear models

A few basics of credibility theory

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Solutions to the Fall 2017 CAS Exam S

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

BIOS 2083 Linear Models c Abdus S. Wahed

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Sections 4.1, 4.2, 4.3

Multinomial Logistic Regression Models

Various Issues in Fitting Contingency Tables

Lecture 9: Linear Regression

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Lecture 5: Poisson and logistic regression

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

Peng Shi. Actuarial Research Conference August 5-8, 2015

Computer exercise 4 Poisson Regression

Loglinear models. STAT 526 Professor Olga Vitek

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang

Week 8: Correlation and Regression

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Solutions to the Spring 2015 CAS Exam ST

Linear Regression With Special Variables

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018

Final Exam. Name: Solution:

Lecture 2: Poisson and logistic regression

13.1 Categorical Data and the Multinomial Experiment

Logistic Regression - problem 6.14

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Generalized Linear Models for a Dependent Aggregate Claims Model

DISPLAYING THE POISSON REGRESSION ANALYSIS

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

Transcription:

PL-2 The Matrix Inverted: A Primer in GLM Theory 2005 CAS Seminar on Ratemaking Claudine Modlin, FCAS Watson Wyatt Insurance & Financial Services, Inc W W W. W A T S O N W Y A T T. C O M / I N S U R A N C E

Generalized linear model benefits Consider all factors simultaneously Allow for nature of random process Provide diagnostics Robust and transparent

Example of GLM output (real UK data) 0.25 22% 180 0.2 0.15 0.1 10% 6% 7% 160 140 Log of multiplier 0.05 0-0.05-0.1-0.15-0.2 0% -5% -19% -4% -16% -20% -15% -19% -17% 120 100 80 60 Exposure (policy years) -0.25-0.3 40-0.35-0.4 20-0.45 1 2 3 4 5 6 7 0 Factor Exposure Oneway relativities Approx 2 SE from estimate GLM estimate

Agenda Formularization of GLMs linear predictor, link function, offset error term, scale parameter, prior weights typical model forms Model testing use only variables which are predictive make sure model is reasonable Aliasing

Agenda Formularization of GLMs linear predictor, link function, offset error term, scale parameter, prior weights typical model forms Model testing use only variables which are predictive make sure model is reasonable Aliasing

Linear models Linear model Y i = i + error i based on linear combination of measured factors Which factors, and how they are best combined is to be derived i = +.age i +.age i 2 +.height i.age i i = +.age i +.(sex i =female) i = ( +.age i ) * exp(.height i.age i )

Linear models - formularization E[Y i ] = i = X ij j Var[Y i ] = 2 Y i ~ N( i, 2 )

What is X ij j? X defines the explanatory variables to be included in the model could be continuous variables - "variates" could be categorical variables - "factors" contains the parameter estimates which relate to the factors / variates defined by the structure of X "the answer"

What is X.? Write X ij j as X. Consider 3 rating factors - age of driver ("age") - sex of driver ("sex") - age of vehicle ("car") Represent by,,,,...

What is X.? Suppose we wanted a model of the form: = +.age +.age 2 +.car 27.age 52½ X. would need to be defined as: 1 age 1 age 1 2 car 1 27.age 1 52½ 1 age 2 age 2 2 car 2 27.age 2 52½ 1 age 3 age 3 2 car 3 27.age 3 52½ 1 age 4 age 4 2 car 4 27.age 4 52½ 1 age 5 age 2 5 car 27 5.age 52½ 5......

What is X.? Suppose we wanted a model of the form: = + if age < 30 1 + if age 30-40 2 + if age > 40 3 + if sex male 1 + if sex female 2

What is X.? Age Sex 1 2 3 4 5 <30 30-40 >40 M F 1 0 1 0 1 0 1 1 0 0 1 0 1 1 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1...... 1 2 1 2 3

What is X.? Suppose we wanted a model of the form: = + if age < 30 1 + if age 30-40 2 + if age > 40 3 + if sex male 1 + if sex female 2

What is X.? Suppose we wanted a model of the form: = + if age < 30 1 + if age 30-40 2 "Base levels" + if age > 40 3 + if sex male 1 + if sex female 2

1 2 3 4 5 X. having adjusted for base levels Age Sex <30 30-40 >40 M F 1 0 1 0 1 0 1 1 0 0 1 0 1 1 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1...... 1 2 1 2 3

Linear models - formularization E[Y i ] = i = X ij j Var[Y i ] = 2 Y i ~ N( i, 2 )

Generalized linear models i = f( +.age i +.age i 2 +.height i.age i ) i = f( +.age i +.(sex i =female) )

Generalized linear models i = g -1 ( +.age i +.age i 2 +.height i.age i ) i = g -1 ( +.age i +.(sex i =female) )

Generalized linear models Linear Models E[Y i ] = i = X ij j Var[Y i ] = 2 Y from Normal distribution Generalized Linear Models E[Y i ] = i = g -1 ( X ij j + i ) Var[Y i ] = V( i )/ i Y from a distribution from the exponential family

Generalized linear models Each observation i from distribution with mean i 0.045 0.04 0.035 0.03 0.025 0.02 0.015 0.01 0.005 0 Women Men

Generalized linear models E[Y i ] = i = g -1 ( X ij j + i ) j Var[Y i ] =.V( i )/ i

Generalized linear models E[Y] = = g -1( X + ) Var[Y] = V( ) /

Generalized linear models E[Y] = = g -1( X ) Observed thing (data) Some function (user defined) Some matrix based on data (user defined) as per linear models Parameters to be estimated (the answer!)

What is g -1 (X. )? -1 Y = g (X. ) + error Assuming a model with three categorical factors, each observation can be expressed as: -1 Y = g ( ijk + + + i j k ) + error = = = 0 2 1 3 age is in group i sex is in group j car is in group k

What is g -1 (X. )? g(x) = x Y ijk = + i + j + k + error g(x) = ln(x) Y ijk = e ( + i + j + k ) + error = A.B i.c j.d k + error where B i = e i etc Multiplicative form common for frequency and amounts

Multiplicative model $ 207.10 x Age Factor 17 2.52 18 2.05 19 1.97 20 1.85 21-23 1.75 24-26 1.54 27-30 1.42 31-35 1.20 36-40 1.00 41-45 0.93 46-50 0.84 50-60 0.76 60+ 0.78 Group Factor 1 0.54 2 0.65 3 0.73 4 0.85 5 0.92 6 0.96 7 1.00 8 1.08 9 1.19 10 1.26 11 1.36 12 1.43 13 1.56 Sex Factor Male 1.00 Female 1.25 Area Factor A 0.95 B 1.00 C 1.09 D 1.15 E 1.18 F 1.27 G 1.36 H 1.44 E(losses) = $ 207.10 x 1.42 x 0.92 x 1.00 x 1.15 = $ 311.14

Generalized linear models E[Y] = = g -1( X + ) "Offset" Eg Y = claim numbers Smith: Jones: Male, 30, Ford, 1 years, 2 claims Female, 40, VW, ½ year, 1 claim

What is? g(x) = ln(x) ijk = ln(exposure ijk ) E[Y ijk ] = e ( + i + j + k + ijk ) = A.B i.c j.d k. e (ln(exposure ijk )) = A.B i.c j.d k. exposure ijk

Restricted models E[Y] = = g -1( X + ) Offset Constrain model (eg territory, ABS discount) Other factors adjusted to compensate

Generalized linear models E[Y] = = g -1( X + ) Var[Y] = V( ) /

Generalized linear models Var[Y] = V( ) / Normal: = 2, V(x) = 1 Var[Y] = 2.1 Poisson: = 1, V(x) = x Var[Y] = Gamma: = k, V(x) = x 2 Var[Y] = k 2

The scale parameter Var[Y] = V( ) / Normal: = 2, V(x) = 1 Var[Y] = 2.1 Poisson: = 1, V(x) = x Var[Y] = Gamma: = k, V(x) = x 2 Var[Y] = k 2

Example of effect of changing assumed error - 1 10 8 6 4 2 0 1 2 3 Data Normal

Example of effect of changing assumed error - 1 10 8 6 4 2 0 1 2 3 Data Normal Poisson

Example of effect of changing assumed error - 1 10 8 6 4 2 0 1 2 3 Data Normal Poisson Gamma

Example of effect of changing assumed error - 2 Example portfolio with five rating factors, each with five levels A, B, C, D, E Typical correlations between those rating factors Assumed true effect of factors Claims randomly generated (with Gamma) Random experience analyzed by three models

Example of effect of changing assumed error - 2 0.6 0.5 0.4 Log of multiplier 0.3 0.2 0.1 0-0.1 A B C D E -0.2 True effect

Example of effect of changing assumed error - 2 0.6 0.5 0.4 Log of multiplier 0.3 0.2 0.1 0-0.1 A B C D E -0.2 True effect One way

Example of effect of changing assumed error - 2 0.6 0.5 0.4 Log of multiplier 0.3 0.2 0.1 0-0.1 A B C D E -0.2 True effect One way GLM / Normal

Example of effect of changing assumed error - 2 0.6 0.5 0.4 Log of multiplier 0.3 0.2 0.1 0-0.1 A B C D E -0.2 True effect One way GLM / Normal GLM / Gamma

Example of effect of changing assumed error - 2 0.6 0.5 0.4 Log of multiplier 0.3 0.2 0.1 0-0.1 A B C D E -0.2 True effect GLM / Gamma + 2 SE - 2 SE

Prior weights Var[Y] = V( ) / Exposure Other credibility Eg Y = claim frequency Smith: Male, 30, Ford, 1 years, 2 claims, 100% Jones: Female, 40, VW, ½ year, 1 claim, 100%

Typical model forms Y Claim frequency Claim number Average claim amount Probability (eg lapses) g(x) ln(x) ln(x) ln(x) ln(x/(1-x)) Error Poisson Poisson Gamma Binomial V(x) 1 x 1 x estimate x 2 1 x(1-x) exposure 1 # claims 1 0 ln(exposure) 0 0

0.2 0 Tweedie distributions Incurred losses have a point mass at zero and then a continuous distribution Poisson and gamma not suited to this Tweedie distribution has point mass and parameters which can alter the shape to be like Poisson and gamma above zero f Y ( y;,, ) n 1 1 ( ) ( 1/ y) ( n ) n! y p( Y 0) exp ( 0) n.exp [ y ( )] 0 0 for y 0

Generalized linear models Var[Y] = V( ) / Normal: = 2, V(x) = 1 Var[Y] = 2.1 Poisson = 1, V(x) = x Var[Y] = Gamma = k, V(x) = x 2 Var[Y] = k 2 Tweedie = k, V(x) = x p Var[Y] = k p

Tweedie distributions Tweedie = k, V(x) = x p Var[Y] = k p Defines a valid distribution for p<0, 1<p<2, p>2 Can be considered as Poisson/gamma process for 1<p<2 Typical values of p for insurance incurred claims around, or just under, 1.5

Tweedie conclusions Helpful when important to fit to pure premium Often similar results to traditional approach but differences may occur if numbers and amounts models have effects which are both large and insignificant No information about whether frequencies or amounts are driving result

Link function g(x) Offset Error Structure V( ) Linear Predictor Form X. = i + j + k + l Data Y Scale Parameter Prior Weights Numerical MLE Parameter Estimates,,, Diagnostics

Maximum likelihood estimation Seek parameters which give highest likelihood function given data Likelihood Parameter 1 Parameter 2

Newton-Raphson In one dimension: x n+1 = x n f '(x n ) / f ''(x n ) In n dimensions: n+1 = n H -1.s where is the vector of the parameter estimates (with p elements), s is the vector of the first derivatives of the log-likelihood and H is the (p*p) matrix containing the second derivatives of the log-likelihood

Agenda Formularization of GLMs linear predictor, link function, offset error term, scale parameter, prior weights typical model forms Model testing use only variables which are predictive make sure model is reasonable Aliasing

Model testing Use only those variables which are predictive standard errors of parameter estimates F tests / 2 tests on deviances Make sure the model is reasonable histogram of deviance residuals residual vs fitted value Box Cox link function investigation

Standard errors Roughly speaking, for a parameter p: SE = -1 / ( 2 / p 2 Likelihood) Likelihood Parameter 1 Parameter 2

GLM output (significant factor) 1.2 200000 180000 1 154% 138% 160000 0.8 105% 140000 Log of multiplier 0.6 0.4 31% 39% 45% 58% 84% 73% 72% 93% 120000 100000 80000 Exposure (years) 0.2 60000 0 0% 5% 40000 20000-0.2 1 2 3 4 5 6 7 8 9 10 11 12 13 0 Vehicle symbol Onew ay relativities Approx 95% confidence interval Parameter es timate P value = 0.0%

GLM output (insignificant factor) 0.2 200000 0.15 180000 160000 0.1 7% 140000 Log of multiplier 0.05 0-0.05 0% 4% -1% 3% -5% 0% -5% -1% 3% -1% 4% -1% 120000 100000 80000 Exposure (years) 60000-0.1 40000-0.15 20000-0.2 1 2 3 4 5 6 7 8 9 10 11 12 13 0 Vehicle symbol Onew ay relativities Approx 95% confidence interval Par ameter estimate P value = 52.5%

Awkward cases 2 1.8 1.6 1.4 Log of multiplier 1.2 1 0.8 0.6 0.4 0.2 0-0.2-10% -5% 16% -14% 11% -10% -22% 0% 42% 65% 92% 101% 123% 159% 200% -0.4-0.6-0.8-1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Deviances Single figure measure of goodness of fit Try model with & without a factor Statistical tests show the theoretical significance given the extra parameters

Deviances Age Sex Vehicle Zone Model A Fitted value Deviance = 9585 df = 109954 Mileage Claims? Age Sex Vehicle Model B Fitted value Deviance = 9604 df = 109965 Mileage Claims

Deviances If known, scaled deviance S output S = n 2 / Y u u (Y u - ) / V( ) d u=1 u S 1 - S 2 ~ 2 d 1 -d 2 If unknown, unscaled deviance D =.S output (D 1 - D 2 ) (d 1 - d 2 ) D 3 / d 3 ~ F d1 -d 2, d 3

Model testing Use only those variables which are predictive standard errors of parameter estimates F tests / 2 tests on deviances Make sure the model is reasonable histogram of deviance residuals residual vs fitted value Box Cox link function investigation

Residuals 4.5 4 3.5 3 2.5 2 1.5 0 1 2 3 4 5 6 7 Residuals Fitted values Data

Residuals Several forms, eg standardized deviance sign (Y u - u ) / ( (1-h u ) ) ½ 2 u ( Y u - ) / V( d standardized Pearson Y - u u (.V( ).(1-h ) / ) u u u Standardized deviance - Normal (0,1) Numbers/frequency residuals problematical ½ Y u u

Residuals Histogram of Deviance Residuals Run 12 (Final models with analysis) Model 8 (AD amounts) 6000 5000 4000 Frequency 3000 2000 1000 0-4.5-4.25-4 -3.75-3.5-3.25-3 -2.75-2.5-2.25-2 -1.75-1.5-1.25-1 -0.75-0.5-0.25 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 2.25 2.5 2.75 3 3.25 3.5 3.75 4 4.25 4.5 4.75 5 5.25 Size of deviance residuals Pretium 08/01/2004 12:24

Residuals 4 3 2 1 Residual 0-1 -2-3 -4 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3 7.4 7.5 Pretium 04/05/2004 11:03 Log of fitted value

Residuals

Gamma data, Gamma error Plot of deviance residual against fitted value Run 12 (All claim types, final models, N&A) Model 6 (Own damage, Amounts) 2 1 Deviance Residual 0-1 -2-3 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 Pretium 08/01/2004 12:32 Fitted Value

Gamma data, Normal error Plot of deviance residual against fitted value Run 12 (All claim types, final models, N&A) Model 7 (Own damage, Amounts) 6000 5000 4000 3000 Deviance Residual 2000 1000 0-1000 -2000 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 Pretium 08/01/2004 12:32 Fitted Value

Agenda Formularization of GLMs linear predictor, link function, offset error term, scale parameter, prior weights typical model forms Model testing use only variables which are predictive make sure model is reasonable Aliasing

Aliasing and "near aliasing" Aliasing the removal of unwanted redundant parameters Intrinsic aliasing occurs by the design of the model Extrinsic aliasing occurs "accidentally" as a result of the data

Intrinsic aliasing X. = + if age 20-29 1 + if age 30-39 2 + if age 40 + 3 "Base levels" + if sex male 1 + if sex female 2

Intrinsic aliasing Example job Run 16 Model 3 - Small interaction - Third party material damage, Numbers 1.2 250000 1 155% 138% 0.8 200000 Log of multiplier 0.6 0.4 0.2 63% 28% 24% 150000 100000 Exposure (years) 0-0.2 0% -19% -6% -11% 50000-0.4 17-21 22-24 25-29 30-34 35-39 40-49 50-59 60-69 70+ 0 Age of driver Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate

Extrinsic aliasing If a perfect correlation exists, one factor can alias levels of another Eg if doors declared first: Selected base Exposure: # Doors 2 3 4 5 Unknown Color Selected base Red 13,234 12,343 13,432 13,432 0 Green 4,543 4,543 13,243 2,345 0 Blue 6,544 5,443 15,654 4,565 0 Black 4,643 1,235 14,565 4,545 0 Further aliasing Unknown 0 0 0 0 3,242 This is the only reason the order of declaration can matter (fitted values are unaffected)

Extrinsic aliasing Example job Run 16 Model 3 - Small interaction - Third party material damage, Numbers 1 135% 600000 Log of multiplier 0.8 0.6 0.4 0.2 30% 39% 45% 57% 82% 72% 70% 91% 103% 500000 400000 300000 200000 Exposure (years) 0 0% 0% 100000-0.2 50 55-59 60-64 65-69 70-74 75-79 80-84 85-89 90-94 95-99 100 Malus 0 No claim discount Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate

"Near aliasing" If two factors are almost perfectly, but not quite aliased, convergence problems can result as a result of low exposures (even though one-ways look fine), and/or results can become hard to interpret Selected base Exposure: # Doors 2 3 4 5 Unknown Color Selected base Red 13,234 12,343 13,432 13,432 0 Green 4,543 4,543 13,243 2,345 0 Blue 6,544 5,443 15,654 4,565 0 Black 4,643 1,235 14,565 4,545 2 Unknown 0 0 0 0 3,242 Eg if the 2 black, unknown doors policies had no claims, GLM would try to estimate a very large negative number for unknown doors, and a very large positive number for unknown color

"A Practitioner's Guide to Generalized Linear Models" CAS 2004 Discussion Paper Program Copies available at www.watsonwyatt.com/glm

The Matrix Inverted: A Primer in GLM Theory 2005 CAS Seminar on Ratemaking Claudine Modlin, FCAS Watson Wyatt Insurance & Financial Services, Inc W W W. W A T S O N W Y A T T. C O M / I N S U R A N C E