PL-2 The Matrix Inverted: A Primer in GLM Theory

Size: px
Start display at page:

Download "PL-2 The Matrix Inverted: A Primer in GLM Theory"

Transcription

1 PL-2 The Matrix Inverted: A Primer in GLM Theory 2005 CAS Seminar on Ratemaking Claudine Modlin, FCAS Watson Wyatt Insurance & Financial Services, Inc W W W. W A T S O N W Y A T T. C O M / I N S U R A N C E

2 Generalized linear model benefits Consider all factors simultaneously Allow for nature of random process Provide diagnostics Robust and transparent

3 Example of GLM output (real UK data) % % 6% 7% Log of multiplier % -5% -19% -4% -16% -20% -15% -19% -17% Exposure (policy years) Factor Exposure Oneway relativities Approx 2 SE from estimate GLM estimate

4 Agenda Formularization of GLMs linear predictor, link function, offset error term, scale parameter, prior weights typical model forms Model testing use only variables which are predictive make sure model is reasonable Aliasing

5 Agenda Formularization of GLMs linear predictor, link function, offset error term, scale parameter, prior weights typical model forms Model testing use only variables which are predictive make sure model is reasonable Aliasing

6 Linear models Linear model Y i = i + error i based on linear combination of measured factors Which factors, and how they are best combined is to be derived i = +.age i +.age i 2 +.height i.age i i = +.age i +.(sex i =female) i = ( +.age i ) * exp(.height i.age i )

7 Linear models - formularization E[Y i ] = i = X ij j Var[Y i ] = 2 Y i ~ N( i, 2 )

8 What is X ij j? X defines the explanatory variables to be included in the model could be continuous variables - "variates" could be categorical variables - "factors" contains the parameter estimates which relate to the factors / variates defined by the structure of X "the answer"

9 What is X.? Write X ij j as X. Consider 3 rating factors - age of driver ("age") - sex of driver ("sex") - age of vehicle ("car") Represent by,,,,...

10 What is X.? Suppose we wanted a model of the form: = +.age +.age 2 +.car 27.age 52½ X. would need to be defined as: 1 age 1 age 1 2 car 1 27.age 1 52½ 1 age 2 age 2 2 car 2 27.age 2 52½ 1 age 3 age 3 2 car 3 27.age 3 52½ 1 age 4 age 4 2 car 4 27.age 4 52½ 1 age 5 age 2 5 car 27 5.age 52½

11 What is X.? Suppose we wanted a model of the form: = + if age < if age if age > if sex male 1 + if sex female 2

12 What is X.? Age Sex < >40 M F

13 What is X.? Suppose we wanted a model of the form: = + if age < if age if age > if sex male 1 + if sex female 2

14 What is X.? Suppose we wanted a model of the form: = + if age < if age "Base levels" + if age > if sex male 1 + if sex female 2

15 X. having adjusted for base levels Age Sex < >40 M F

16 Linear models - formularization E[Y i ] = i = X ij j Var[Y i ] = 2 Y i ~ N( i, 2 )

17 Generalized linear models i = f( +.age i +.age i 2 +.height i.age i ) i = f( +.age i +.(sex i =female) )

18 Generalized linear models i = g -1 ( +.age i +.age i 2 +.height i.age i ) i = g -1 ( +.age i +.(sex i =female) )

19 Generalized linear models Linear Models E[Y i ] = i = X ij j Var[Y i ] = 2 Y from Normal distribution Generalized Linear Models E[Y i ] = i = g -1 ( X ij j + i ) Var[Y i ] = V( i )/ i Y from a distribution from the exponential family

20 Generalized linear models Each observation i from distribution with mean i Women Men

21 Generalized linear models E[Y i ] = i = g -1 ( X ij j + i ) j Var[Y i ] =.V( i )/ i

22 Generalized linear models E[Y] = = g -1( X + ) Var[Y] = V( ) /

23 Generalized linear models E[Y] = = g -1( X ) Observed thing (data) Some function (user defined) Some matrix based on data (user defined) as per linear models Parameters to be estimated (the answer!)

24 What is g -1 (X. )? -1 Y = g (X. ) + error Assuming a model with three categorical factors, each observation can be expressed as: -1 Y = g ( ijk i j k ) + error = = = age is in group i sex is in group j car is in group k

25 What is g -1 (X. )? g(x) = x Y ijk = + i + j + k + error g(x) = ln(x) Y ijk = e ( + i + j + k ) + error = A.B i.c j.d k + error where B i = e i etc Multiplicative form common for frequency and amounts

26 Multiplicative model $ x Age Factor Group Factor Sex Factor Male 1.00 Female 1.25 Area Factor A 0.95 B 1.00 C 1.09 D 1.15 E 1.18 F 1.27 G 1.36 H 1.44 E(losses) = $ x 1.42 x 0.92 x 1.00 x 1.15 = $

27 Generalized linear models E[Y] = = g -1( X + ) "Offset" Eg Y = claim numbers Smith: Jones: Male, 30, Ford, 1 years, 2 claims Female, 40, VW, ½ year, 1 claim

28 What is? g(x) = ln(x) ijk = ln(exposure ijk ) E[Y ijk ] = e ( + i + j + k + ijk ) = A.B i.c j.d k. e (ln(exposure ijk )) = A.B i.c j.d k. exposure ijk

29 Restricted models E[Y] = = g -1( X + ) Offset Constrain model (eg territory, ABS discount) Other factors adjusted to compensate

30 Generalized linear models E[Y] = = g -1( X + ) Var[Y] = V( ) /

31 Generalized linear models Var[Y] = V( ) / Normal: = 2, V(x) = 1 Var[Y] = 2.1 Poisson: = 1, V(x) = x Var[Y] = Gamma: = k, V(x) = x 2 Var[Y] = k 2

32 The scale parameter Var[Y] = V( ) / Normal: = 2, V(x) = 1 Var[Y] = 2.1 Poisson: = 1, V(x) = x Var[Y] = Gamma: = k, V(x) = x 2 Var[Y] = k 2

33 Example of effect of changing assumed error Data Normal

34 Example of effect of changing assumed error Data Normal Poisson

35 Example of effect of changing assumed error Data Normal Poisson Gamma

36 Example of effect of changing assumed error - 2 Example portfolio with five rating factors, each with five levels A, B, C, D, E Typical correlations between those rating factors Assumed true effect of factors Claims randomly generated (with Gamma) Random experience analyzed by three models

37 Example of effect of changing assumed error Log of multiplier A B C D E -0.2 True effect

38 Example of effect of changing assumed error Log of multiplier A B C D E -0.2 True effect One way

39 Example of effect of changing assumed error Log of multiplier A B C D E -0.2 True effect One way GLM / Normal

40 Example of effect of changing assumed error Log of multiplier A B C D E -0.2 True effect One way GLM / Normal GLM / Gamma

41 Example of effect of changing assumed error Log of multiplier A B C D E -0.2 True effect GLM / Gamma + 2 SE - 2 SE

42 Prior weights Var[Y] = V( ) / Exposure Other credibility Eg Y = claim frequency Smith: Male, 30, Ford, 1 years, 2 claims, 100% Jones: Female, 40, VW, ½ year, 1 claim, 100%

43 Typical model forms Y Claim frequency Claim number Average claim amount Probability (eg lapses) g(x) ln(x) ln(x) ln(x) ln(x/(1-x)) Error Poisson Poisson Gamma Binomial V(x) 1 x 1 x estimate x 2 1 x(1-x) exposure 1 # claims 1 0 ln(exposure) 0 0

44 0.2 0 Tweedie distributions Incurred losses have a point mass at zero and then a continuous distribution Poisson and gamma not suited to this Tweedie distribution has point mass and parameters which can alter the shape to be like Poisson and gamma above zero f Y ( y;,, ) n 1 1 ( ) ( 1/ y) ( n ) n! y p( Y 0) exp ( 0) n.exp [ y ( )] 0 0 for y 0

45 Generalized linear models Var[Y] = V( ) / Normal: = 2, V(x) = 1 Var[Y] = 2.1 Poisson = 1, V(x) = x Var[Y] = Gamma = k, V(x) = x 2 Var[Y] = k 2 Tweedie = k, V(x) = x p Var[Y] = k p

46 Tweedie distributions Tweedie = k, V(x) = x p Var[Y] = k p Defines a valid distribution for p<0, 1<p<2, p>2 Can be considered as Poisson/gamma process for 1<p<2 Typical values of p for insurance incurred claims around, or just under, 1.5

47 Tweedie conclusions Helpful when important to fit to pure premium Often similar results to traditional approach but differences may occur if numbers and amounts models have effects which are both large and insignificant No information about whether frequencies or amounts are driving result

48 Link function g(x) Offset Error Structure V( ) Linear Predictor Form X. = i + j + k + l Data Y Scale Parameter Prior Weights Numerical MLE Parameter Estimates,,, Diagnostics

49 Maximum likelihood estimation Seek parameters which give highest likelihood function given data Likelihood Parameter 1 Parameter 2

50 Newton-Raphson In one dimension: x n+1 = x n f '(x n ) / f ''(x n ) In n dimensions: n+1 = n H -1.s where is the vector of the parameter estimates (with p elements), s is the vector of the first derivatives of the log-likelihood and H is the (p*p) matrix containing the second derivatives of the log-likelihood

51 Agenda Formularization of GLMs linear predictor, link function, offset error term, scale parameter, prior weights typical model forms Model testing use only variables which are predictive make sure model is reasonable Aliasing

52 Model testing Use only those variables which are predictive standard errors of parameter estimates F tests / 2 tests on deviances Make sure the model is reasonable histogram of deviance residuals residual vs fitted value Box Cox link function investigation

53 Standard errors Roughly speaking, for a parameter p: SE = -1 / ( 2 / p 2 Likelihood) Likelihood Parameter 1 Parameter 2

54 GLM output (significant factor) % 138% % Log of multiplier % 39% 45% 58% 84% 73% 72% 93% Exposure (years) % 5% Vehicle symbol Onew ay relativities Approx 95% confidence interval Parameter es timate P value = 0.0%

55 GLM output (insignificant factor) % Log of multiplier % 4% -1% 3% -5% 0% -5% -1% 3% -1% 4% -1% Exposure (years) Vehicle symbol Onew ay relativities Approx 95% confidence interval Par ameter estimate P value = 52.5%

56 Awkward cases Log of multiplier % -5% 16% -14% 11% -10% -22% 0% 42% 65% 92% 101% 123% 159% 200%

57 Deviances Single figure measure of goodness of fit Try model with & without a factor Statistical tests show the theoretical significance given the extra parameters

58 Deviances Age Sex Vehicle Zone Model A Fitted value Deviance = 9585 df = Mileage Claims? Age Sex Vehicle Model B Fitted value Deviance = 9604 df = Mileage Claims

59 Deviances If known, scaled deviance S output S = n 2 / Y u u (Y u - ) / V( ) d u=1 u S 1 - S 2 ~ 2 d 1 -d 2 If unknown, unscaled deviance D =.S output (D 1 - D 2 ) (d 1 - d 2 ) D 3 / d 3 ~ F d1 -d 2, d 3

60 Model testing Use only those variables which are predictive standard errors of parameter estimates F tests / 2 tests on deviances Make sure the model is reasonable histogram of deviance residuals residual vs fitted value Box Cox link function investigation

61 Residuals Residuals Fitted values Data

62 Residuals Several forms, eg standardized deviance sign (Y u - u ) / ( (1-h u ) ) ½ 2 u ( Y u - ) / V( d standardized Pearson Y - u u (.V( ).(1-h ) / ) u u u Standardized deviance - Normal (0,1) Numbers/frequency residuals problematical ½ Y u u

63 Residuals Histogram of Deviance Residuals Run 12 (Final models with analysis) Model 8 (AD amounts) Frequency Size of deviance residuals Pretium 08/01/ :24

64 Residuals Residual Pretium 04/05/ :03 Log of fitted value

65 Residuals

66 Gamma data, Gamma error Plot of deviance residual against fitted value Run 12 (All claim types, final models, N&A) Model 6 (Own damage, Amounts) 2 1 Deviance Residual Pretium 08/01/ :32 Fitted Value

67 Gamma data, Normal error Plot of deviance residual against fitted value Run 12 (All claim types, final models, N&A) Model 7 (Own damage, Amounts) Deviance Residual Pretium 08/01/ :32 Fitted Value

68 Agenda Formularization of GLMs linear predictor, link function, offset error term, scale parameter, prior weights typical model forms Model testing use only variables which are predictive make sure model is reasonable Aliasing

69 Aliasing and "near aliasing" Aliasing the removal of unwanted redundant parameters Intrinsic aliasing occurs by the design of the model Extrinsic aliasing occurs "accidentally" as a result of the data

70 Intrinsic aliasing X. = + if age if age if age "Base levels" + if sex male 1 + if sex female 2

71 Intrinsic aliasing Example job Run 16 Model 3 - Small interaction - Third party material damage, Numbers % 138% Log of multiplier % 28% 24% Exposure (years) % -19% -6% -11% Age of driver Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate

72 Extrinsic aliasing If a perfect correlation exists, one factor can alias levels of another Eg if doors declared first: Selected base Exposure: # Doors Unknown Color Selected base Red 13,234 12,343 13,432 13,432 0 Green 4,543 4,543 13,243 2,345 0 Blue 6,544 5,443 15,654 4,565 0 Black 4,643 1,235 14,565 4,545 0 Further aliasing Unknown ,242 This is the only reason the order of declaration can matter (fitted values are unaffected)

73 Extrinsic aliasing Example job Run 16 Model 3 - Small interaction - Third party material damage, Numbers 1 135% Log of multiplier % 39% 45% 57% 82% 72% 70% 91% 103% Exposure (years) 0 0% 0% Malus 0 No claim discount Onew ay relativities Approx 95% confidence interval Unsmoothed estimate Smoothed estimate

74 "Near aliasing" If two factors are almost perfectly, but not quite aliased, convergence problems can result as a result of low exposures (even though one-ways look fine), and/or results can become hard to interpret Selected base Exposure: # Doors Unknown Color Selected base Red 13,234 12,343 13,432 13,432 0 Green 4,543 4,543 13,243 2,345 0 Blue 6,544 5,443 15,654 4,565 0 Black 4,643 1,235 14,565 4,545 2 Unknown ,242 Eg if the 2 black, unknown doors policies had no claims, GLM would try to estimate a very large negative number for unknown doors, and a very large positive number for unknown color

75 "A Practitioner's Guide to Generalized Linear Models" CAS 2004 Discussion Paper Program Copies available at

76 The Matrix Inverted: A Primer in GLM Theory 2005 CAS Seminar on Ratemaking Claudine Modlin, FCAS Watson Wyatt Insurance & Financial Services, Inc W W W. W A T S O N W Y A T T. C O M / I N S U R A N C E

A Practitioner s Guide to Generalized Linear Models

A Practitioner s Guide to Generalized Linear Models A Practitioners Guide to Generalized Linear Models Background The classical linear models and most of the minimum bias procedures are special cases of generalized linear models (GLMs). GLMs are more technically

More information

C-14 Finding the Right Synergy from GLMs and Machine Learning

C-14 Finding the Right Synergy from GLMs and Machine Learning C-14 Finding the Right Synergy from GLMs and Machine Learning 2010 CAS Annual Meeting Claudine Modlin November 8, 2010 Slide 1 Definitions Parametric modeling Objective: build a predictive model User makes

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March Presented by: Tanya D. Havlicek, ACAS, MAAA ANTITRUST Notice The Casualty Actuarial Society is committed

More information

Content Preview. Multivariate Methods. There are several theoretical stumbling blocks to overcome to develop rating relativities

Content Preview. Multivariate Methods. There are several theoretical stumbling blocks to overcome to develop rating relativities Introduction to Ratemaking Multivariate Methods March 15, 2010 Jonathan Fox, FCAS, GuideOne Mutual Insurance Group Content Preview 1. Theoretical Issues 2. One-way Analysis Shortfalls 3. Multivariate Methods

More information

Advanced Ratemaking. Chapter 27 GLMs

Advanced Ratemaking. Chapter 27 GLMs Mahlerʼs Guide to Advanced Ratemaking CAS Exam 8 Chapter 27 GLMs prepared by Howard C. Mahler, FCAS Copyright 2016 by Howard C. Mahler. Study Aid 2016-8 Howard Mahler hmahler@mac.com www.howardmahler.com/teaching

More information

Generalized linear mixed models (GLMMs) for dependent compound risk models

Generalized linear mixed models (GLMMs) for dependent compound risk models Generalized linear mixed models (GLMMs) for dependent compound risk models Emiliano A. Valdez joint work with H. Jeong, J. Ahn and S. Park University of Connecticut 52nd Actuarial Research Conference Georgia

More information

Generalized linear mixed models for dependent compound risk models

Generalized linear mixed models for dependent compound risk models Generalized linear mixed models for dependent compound risk models Emiliano A. Valdez joint work with H. Jeong, J. Ahn and S. Park University of Connecticut ASTIN/AFIR Colloquium 2017 Panama City, Panama

More information

Ratemaking with a Copula-Based Multivariate Tweedie Model

Ratemaking with a Copula-Based Multivariate Tweedie Model with a Copula-Based Model joint work with Xiaoping Feng and Jean-Philippe Boucher University of Wisconsin - Madison CAS RPM Seminar March 10, 2015 1 / 18 Outline 1 2 3 4 5 6 2 / 18 Insurance Claims Two

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Lecture 8. Poisson models for counts

Lecture 8. Poisson models for counts Lecture 8. Poisson models for counts Jesper Rydén Department of Mathematics, Uppsala University jesper.ryden@math.uu.se Statistical Risk Analysis Spring 2014 Absolute risks The failure intensity λ(t) describes

More information

Generalized linear mixed models (GLMMs) for dependent compound risk models

Generalized linear mixed models (GLMMs) for dependent compound risk models Generalized linear mixed models (GLMMs) for dependent compound risk models Emiliano A. Valdez, PhD, FSA joint work with H. Jeong, J. Ahn and S. Park University of Connecticut Seminar Talk at Yonsei University

More information

Chapter 5: Generalized Linear Models

Chapter 5: Generalized Linear Models w w w. I C A 0 1 4. o r g Chapter 5: Generalized Linear Models b Curtis Gar Dean, FCAS, MAAA, CFA Ball State Universit: Center for Actuarial Science and Risk Management M Interest in Predictive Modeling

More information

Analytics Software. Beyond deterministic chain ladder reserves. Neil Covington Director of Solutions Management GI

Analytics Software. Beyond deterministic chain ladder reserves. Neil Covington Director of Solutions Management GI Analytics Software Beyond deterministic chain ladder reserves Neil Covington Director of Solutions Management GI Objectives 2 Contents 01 Background 02 Formulaic Stochastic Reserving Methods 03 Bootstrapping

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Standard Error of Technical Cost Incorporating Parameter Uncertainty

Standard Error of Technical Cost Incorporating Parameter Uncertainty Standard Error of Technical Cost Incorporating Parameter Uncertainty Christopher Morton Insurance Australia Group This presentation has been prepared for the Actuaries Institute 2012 General Insurance

More information

Logistic regression: Miscellaneous topics

Logistic regression: Miscellaneous topics Logistic regression: Miscellaneous topics April 11 Introduction We have covered two approaches to inference for GLMs: the Wald approach and the likelihood ratio approach I claimed that the likelihood ratio

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Non-Life Insurance: Mathematics and Statistics

Non-Life Insurance: Mathematics and Statistics Exercise sheet 1 Exercise 1.1 Discrete Distribution Suppose the random variable N follows a geometric distribution with parameter p œ (0, 1), i.e. ; (1 p) P[N = k] = k 1 p if k œ N \{0}, 0 else. (a) Show

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Bonus Malus Systems in Car Insurance

Bonus Malus Systems in Car Insurance Bonus Malus Systems in Car Insurance Corina Constantinescu Institute for Financial and Actuarial Mathematics Joint work with Weihong Ni Croatian Actuarial Conference, Zagreb, June 5, 2017 Car Insurance

More information

Ratemaking application of Bayesian LASSO with conjugate hyperprior

Ratemaking application of Bayesian LASSO with conjugate hyperprior Ratemaking application of Bayesian LASSO with conjugate hyperprior Himchan Jeong and Emiliano A. Valdez University of Connecticut Actuarial Science Seminar Department of Mathematics University of Illinois

More information

GB2 Regression with Insurance Claim Severities

GB2 Regression with Insurance Claim Severities GB2 Regression with Insurance Claim Severities Mitchell Wills, University of New South Wales Emiliano A. Valdez, University of New South Wales Edward W. (Jed) Frees, University of Wisconsin - Madison UNSW

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 Applied Statistics I Time Allowed: Three Hours Candidates should answer

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

Lecture 2: Linear and Mixed Models

Lecture 2: Linear and Mixed Models Lecture 2: Linear and Mixed Models Bruce Walsh lecture notes Introduction to Mixed Models SISG, Seattle 18 20 July 2018 1 Quick Review of the Major Points The general linear model can be written as y =

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

DELTA METHOD and RESERVING

DELTA METHOD and RESERVING XXXVI th ASTIN COLLOQUIUM Zurich, 4 6 September 2005 DELTA METHOD and RESERVING C.PARTRAT, Lyon 1 university (ISFA) N.PEY, AXA Canada J.SCHILLING, GIE AXA Introduction Presentation of methods based on

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

BOOTSTRAPPING WITH MODELS FOR COUNT DATA

BOOTSTRAPPING WITH MODELS FOR COUNT DATA Journal of Biopharmaceutical Statistics, 21: 1164 1176, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.607748 BOOTSTRAPPING WITH MODELS FOR

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

f X (x) = λe λx, , x 0, k 0, λ > 0 Γ (k) f X (u)f X (z u)du

f X (x) = λe λx, , x 0, k 0, λ > 0 Γ (k) f X (u)f X (z u)du 11 COLLECTED PROBLEMS Do the following problems for coursework 1. Problems 11.4 and 11.5 constitute one exercise leading you through the basic ruin arguments. 2. Problems 11.1 through to 11.13 but excluding

More information

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic

More information

Delta Boosting Machine and its application in Actuarial Modeling Simon CK Lee, Sheldon XS Lin KU Leuven, University of Toronto

Delta Boosting Machine and its application in Actuarial Modeling Simon CK Lee, Sheldon XS Lin KU Leuven, University of Toronto Delta Boosting Machine and its application in Actuarial Modeling Simon CK Lee, Sheldon XS Lin KU Leuven, University of Toronto This presentation has been prepared for the Actuaries Institute 2015 ASTIN

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Solutions to the Spring 2016 CAS Exam S

Solutions to the Spring 2016 CAS Exam S Solutions to the Spring 2016 CAS Exam S There were 45 questions in total, of equal value, on this 4 hour exam. There was a 15 minute reading period in addition to the 4 hours. The Exam S is copyright 2016

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

Package HGLMMM for Hierarchical Generalized Linear Models

Package HGLMMM for Hierarchical Generalized Linear Models Package HGLMMM for Hierarchical Generalized Linear Models Marek Molas Emmanuel Lesaffre Erasmus MC Erasmus Universiteit - Rotterdam The Netherlands ERASMUSMC - Biostatistics 20-04-2010 1 / 52 Outline General

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

Chapter 1. Modeling Basics

Chapter 1. Modeling Basics Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

Solutions. Some of the problems that might be encountered in collecting data on check-in times are:

Solutions. Some of the problems that might be encountered in collecting data on check-in times are: Solutions Chapter 7 E7.1 Some of the problems that might be encountered in collecting data on check-in times are: Need to collect separate data for each airline (time and cost). Need to collect data for

More information

Statistics 5100 Spring 2018 Exam 1

Statistics 5100 Spring 2018 Exam 1 Statistics 5100 Spring 2018 Exam 1 Directions: You have 60 minutes to complete the exam. Be sure to answer every question, and do not spend too much time on any part of any question. Be concise with all

More information

Counts using Jitters joint work with Peng Shi, Northern Illinois University

Counts using Jitters joint work with Peng Shi, Northern Illinois University of Claim Longitudinal of Claim joint work with Peng Shi, Northern Illinois University UConn Actuarial Science Seminar 2 December 2011 Department of Mathematics University of Connecticut Storrs, Connecticut,

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided

More information

Modelling the risk process

Modelling the risk process Modelling the risk process Krzysztof Burnecki Hugo Steinhaus Center Wroc law University of Technology www.im.pwr.wroc.pl/ hugo Modelling the risk process 1 Risk process If (Ω, F, P) is a probability space

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

A Supplemental Material for Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models

A Supplemental Material for Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models A Supplemental Material for Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models Yi Yang, Wei Qian and Hui Zou April 20, 2016 Part A: A property of Tweedie distributions

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

A few basics of credibility theory

A few basics of credibility theory A few basics of credibility theory Greg Taylor Director, Taylor Fry Consulting Actuaries Professorial Associate, University of Melbourne Adjunct Professor, University of New South Wales General credibility

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Solutions to the Fall 2017 CAS Exam S

Solutions to the Fall 2017 CAS Exam S Solutions to the Fall 2017 CAS Exam S (Incorporating the Final CAS Answer Key) There were 45 questions in total, of equal value, on this 4 hour exam. There was a 15 minute reading period in addition to

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

BIOS 2083 Linear Models c Abdus S. Wahed

BIOS 2083 Linear Models c Abdus S. Wahed Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Various Issues in Fitting Contingency Tables

Various Issues in Fitting Contingency Tables Various Issues in Fitting Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Complete Tables with Zero Entries In contingency tables, it is possible to have zero entries in a

More information

Lecture 9: Linear Regression

Lecture 9: Linear Regression Lecture 9: Linear Regression Goals Develop basic concepts of linear regression from a probabilistic framework Estimating parameters and hypothesis testing with linear models Linear regression in R Regression

More information

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes: Practice Exam 1 1. Losses for an insurance coverage have the following cumulative distribution function: F(0) = 0 F(1,000) = 0.2 F(5,000) = 0.4 F(10,000) = 0.9 F(100,000) = 1 with linear interpolation

More information

Lecture 5: Poisson and logistic regression

Lecture 5: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Peng Shi. Actuarial Research Conference August 5-8, 2015

Peng Shi. Actuarial Research Conference August 5-8, 2015 Multivariate Longitudinal of Using Copulas joint work with Xiaoping Feng and Jean-Philippe Boucher University of Wisconsin - Madison Université du Québec à Montréal Actuarial Research Conference August

More information

Computer exercise 4 Poisson Regression

Computer exercise 4 Poisson Regression Chalmers-University of Gothenburg Department of Mathematical Sciences Probability, Statistics and Risk MVE300 Computer exercise 4 Poisson Regression When dealing with two or more variables, the functional

More information

Loglinear models. STAT 526 Professor Olga Vitek

Loglinear models. STAT 526 Professor Olga Vitek Loglinear models STAT 526 Professor Olga Vitek April 19, 2011 8 Can Use Poisson Likelihood To Model Both Poisson and Multinomial Counts 8-1 Recall: Poisson Distribution Probability distribution: Y - number

More information

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang Project Report for STAT7 Statistical Methods Instructor: Dr. Ramon V. Leon Wage Data Analysis Yuanlei Zhang 77--7 November, Part : Introduction Data Set The data set contains a random sample of observations

More information

Week 8: Correlation and Regression

Week 8: Correlation and Regression Health Sciences M.Sc. Programme Applied Biostatistics Week 8: Correlation and Regression The correlation coefficient Correlation coefficients are used to measure the strength of the relationship or association

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Solutions to the Spring 2015 CAS Exam ST

Solutions to the Spring 2015 CAS Exam ST Solutions to the Spring 2015 CAS Exam ST (updated to include the CAS Final Answer Key of July 15) There were 25 questions in total, of equal value, on this 2.5 hour exam. There was a 10 minute reading

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 Work all problems. 60 points are needed to pass at the Masters Level and 75

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Lecture 2: Poisson and logistic regression

Lecture 2: Poisson and logistic regression Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

Logistic Regression - problem 6.14

Logistic Regression - problem 6.14 Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Generalized Linear Models for a Dependent Aggregate Claims Model

Generalized Linear Models for a Dependent Aggregate Claims Model Generalized Linear Models for a Dependent Aggregate Claims Model Juliana Schulz A Thesis for The Department of Mathematics and Statistics Presented in Partial Fulfillment of the Requirements for the Degree

More information

DISPLAYING THE POISSON REGRESSION ANALYSIS

DISPLAYING THE POISSON REGRESSION ANALYSIS Chapter 17 Poisson Regression Chapter Table of Contents DISPLAYING THE POISSON REGRESSION ANALYSIS...264 ModelInformation...269 SummaryofFit...269 AnalysisofDeviance...269 TypeIII(Wald)Tests...269 MODIFYING

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information