A Practitioner's Guide to Generalized Linear Models

Background

The classical linear models and most of the minimum bias procedures are special cases of generalized linear models (GLMs). GLMs are more technically efficient than iterative methods, and they provide statistical diagnostics that aid in variable selection. Today, they're the industry standard for PPA (private passenger auto) and other personal lines pricing. Their primary applications are in ratemaking and underwriting, although there's been increased use for target marketing analysis.

The failings of one-way analysis

A one-way analysis summarizes statistics for each explanatory variable, but doesn't take into account the effect of other variables. These analyses can be distorted by correlations between rating factors. Traditional techniques attempt to standardize the data to remove the distorting effect caused by the correlations, but they are only approximations. One-way analyses also don't consider interdependencies between factors in the way they affect claims experience. Such an interdependency (an interaction) exists when the effect of one factor varies depending on the level of another factor. Multivariate methods (like GLMs) adjust for correlations, and allow us to investigate interaction effects.

The failings of minimum bias procedures

Minimum bias procedures impose a set of equations relating the observed data, the rating variables, and a set of parameters to be determined. An iterative procedure is used to converge to the optimal solution. However, once this solution is found, there's no systematic way to test whether a variable influences the result with statistical significance. This type of procedure lacks a statistical framework which would allow us to better assess the quality of the modeling.
The connection of minimum bias to GLM

| Minimum Bias Procedure | Link Function | Error Function |
|---|---|---|
| Multiplicative Balance Principle | Logarithmic | Poisson |
| Additive Balance Principle | Identity | Normal |
| Multiplicative Least Squares | Logarithmic | Normal |
| Multiplicative Maximum Likelihood with exponential density function | Logarithmic | Gamma |
| Multiplicative Maximum Likelihood with Normal density function | Logarithmic | Normal |
| Additive Maximum Likelihood with Normal density function | Identity | Normal |

The chi-squared additive and multiplicative minimum bias models have no corresponding GLM analog.

Linear models

The purpose of linear models and GLMs is to express the relationship between an observed response variable (Y) and a number of covariates (X). Linear models state that Y is the sum of its mean and a random variable: $Y = \mu + \varepsilon$. We assume that the expected value of Y ($\mu$) can be written as a linear combination of the covariates, and that the error term ($\varepsilon$) is normally distributed with mean zero and variance $\sigma^2$.

Example: Suppose Y is the average claim severity, and that there are two factors, territory and sex, resulting in four covariates: male ($X_1$), female ($X_2$), urban ($X_3$), and rural ($X_4$). The linear model expresses the observed item (Y) as a linear combination of a specified selection of the four variables, plus an error term that's normally distributed as above.

One model is $Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$. Since there is a linear dependency between the four covariates ($X_1 + X_2 = X_3 + X_4 = 1$ for every observation), the model in this form is not uniquely defined. To fix this, we instead consider $Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$. This implies that there's an average response for men ($\beta_1$) and an average response for women ($\beta_2$). Being an urban driver has an additional additive effect ($\beta_3$) regardless of gender.
We could also think of this as a model which assumes an average response for the base case of women in rural areas, with additional additive effects for being male and for being in an urban area.

If we have a matrix of 4 observations, we can express them as a system of equations. For the classical linear model, we minimize the sum of the squared errors to solve for the three parameters. If the system expands greatly, vectors and matrices may be used in place of the system of equations. Solving the matrix algebra will yield the same factors as the system of equations.
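The least squares step for the three-parameter model can be sketched as follows. This is a minimal illustration, not the notes' own worked example: the four severity values are hypothetical, and the small `solve` helper is an assumption introduced here so the sketch needs no external libraries.

```python
# Hypothetical average claim severities for the four cells (illustrative only).
observations = [
    # (male, female, urban, severity)
    (1, 0, 1, 800.0),  # men, urban
    (1, 0, 0, 500.0),  # men, rural
    (0, 1, 1, 400.0),  # women, urban
    (0, 1, 0, 200.0),  # women, rural
]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Design matrix X (columns: male, female, urban) and response vector y.
X = [list(row[:3]) for row in observations]
y = [row[3] for row in observations]
p = len(X[0])

# Minimizing the sum of squared errors leads to the normal equations
# (X^T X) beta = X^T y.
xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]

beta = solve(xtx, xty)
fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]

print("beta:", beta)
print("fitted:", fitted)
```

With these made-up inputs the solution is $\beta_1 = 525$, $\beta_2 = 175$, $\beta_3 = 250$. The fitted values do not reproduce the four observations exactly, because three parameters cannot match four data points in general.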

In conclusion, the basic ingredients for a linear model are (1) a set of assumptions about the relationship between the observed values and the predictor variables, and (2) an objective function which is optimized in order to solve the problem.

Classic linear model assumptions

$Y = E[Y] + \varepsilon$, with $E[Y] = X\beta$.

Random Component: Each component of Y is independent and normally distributed. The mean of each component may differ, but they all have a common variance.
Systematic Component: The covariates are combined to give the linear predictor $\eta$; $\eta = X\beta$.
Link Function: The relationship between the random and systematic components is specified via a link function. In the linear model, the link function is the identity function.

These assumptions are not easy to guarantee. It's hard to assume normality and constant variance for response variables. In linear regression, the data are often transformed so that these requirements are met, but there's no reason why such a transformation should exist. Also, the values of the response variable may be restricted to be positive, in which case the assumption of normality is violated. If the response variable is strictly non-negative, then the variance is a function of the mean, and tends to zero with the mean. Additivity is also not realistic for many applications. Many insurance risks tend to vary multiplicatively with the rating factors, not additively.

Generalized linear model assumptions

Random Component: Each component of Y is independent and is from one of the exponential family of distributions.
Systematic Component: The covariates are combined to give the linear predictor $\eta$; $\eta = X\beta$.
Link Function: The relationship between the random and systematic components is specified via a link function, g, that is differentiable and monotonic, such that $E[Y] = \mu = g^{-1}(\eta)$.

A member of the exponential family of distributions has 2 properties:
1. The distribution is completely specified in terms of its mean and variance.
2. The variance is a function of its mean.

The 2nd property can be seen by writing the variance as $\mathrm{Var}[Y_i] = \frac{\phi V(\mu_i)}{\omega_i}$.

| Distribution | $V(\mu)$ |
|---|---|
| Normal | $1$ |
| Poisson | $\mu$ |
| Gamma | $\mu^2$ |
| Binomial | $\mu(1 - \mu)$ |
| Inverse Gaussian | $\mu^3$ |

Here, $V(\mu)$ is the variance function and is a specified function. The parameter $\phi$ scales the variance, and $\omega_i$ is a constant that assigns weights to individual observations. In addition to the distributions shown here, the Tweedie distribution is a member of the exponential family. This distribution has a point mass at zero, and a variance function proportional to $\mu^p$. It is used to model pure premium data directly.

The choice of the variance function affects the results of the GLM. Two examples are shown here.

The first example shows the result of fitting three different GLMs to three data points. As shown in the graph, the GLM with a Normal variance function produced fitted values which are attracted to the original data points with equal weight. With a Poisson error, the GLM assumes that the variance increases with the expected value of each observation. Observations with smaller expected values have a smaller assumed variance. The model produces fitted values which are more influenced by the points on the left than by the points on the right. With the Gamma variance function, the GLM is even more strongly influenced by the point on the left, since the model assumes the variance increases with the square of the expected value.
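The variance structure $\mathrm{Var}[Y_i] = \phi V(\mu_i) / \omega_i$ can be sketched in a few lines. The function and dictionary names below are illustrative assumptions, not part of any particular GLM package.

```python
# Variance functions V(mu) for the error structures listed in the notes.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "binomial": lambda mu: mu * (1.0 - mu),
    "inverse_gaussian": lambda mu: mu ** 3,
}


def tweedie_variance(p):
    """Return a Tweedie-style variance function V(mu) = mu ** p."""
    return lambda mu: mu ** p


def observation_variance(mu, family, phi=1.0, weight=1.0):
    """Var[Y_i] = phi * V(mu_i) / omega_i.

    `family` is either a key of VARIANCE_FUNCTIONS or a callable V(mu).
    """
    v = VARIANCE_FUNCTIONS[family] if isinstance(family, str) else family
    return phi * v(mu) / weight


# The assumed variance grows with the mean at different rates, which is why
# Poisson and Gamma fits lean harder on observations with small fitted values.
for mu in (1.0, 2.0, 4.0):
    row = [observation_variance(mu, f) for f in ("normal", "poisson", "gamma")]
    print(mu, row)
```

Doubling the prior weight halves the assumed variance, which is how higher-exposure observations come to have more influence on the fit.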

A more realistic example is listed here. This consists of an artificially generated dataset representing an insurance portfolio. Claims experience is randomly generated using a gamma distribution, then analyzed using three models to see how closely the results of each model relate to the true factor effect. In this case we know the true effect of the rating factors; in practice, we do not.

Three methods are used:
1. One-way analysis
2. GLM with Normal variance function
3. GLM with Gamma variance function

Because of the correlations between the rating factors in the data, the one-way analysis is extremely distorted. The GLM with the assumed Normal variance function is close to the correct relativities, but the GLM with the Gamma variance function yields results closest to the true effect.

In addition to the variance function, two other parameters define the variance of each observation: the scale parameter $\phi$ and the prior weights $\omega_i$. The prior weights allow information about the known credibility of each observation to be incorporated in the model. Observations with higher exposure are deemed to have lower variance, and the model will be more influenced by these observations.

Let cell i denote a cell defined by a classification system, and let
$m_{ik}$ be the number of claims arising from the k-th unit of exposure in cell i,
$\omega_i$ be the number of exposures in cell i, and
$Y_i$ be the observed claim frequency in cell i:

$Y_i = \frac{1}{\omega_i} \sum_{k=1}^{\omega_i} m_{ik}$

If $m_{ik}$ is Poisson with frequency $f_i$ for all exposures, then $E[m_{ik}] = f_i = \mathrm{Var}[m_{ik}]$. Assuming that the exposures are independent, $\mu_i = E[Y_i] = f_i$ and $\mathrm{Var}[Y_i] = \frac{\mu_i}{\omega_i}$. So, in this case, $V(\mu_i) = \mu_i$, $\phi = 1$, and the prior weights are the exposures.

An alternative example: let
$z_{ik}$ be the claim size of the k-th claim in cell i,
$\omega_i$ be the number of claims in the cell, and
$Y_i$ be the observed mean claim size in cell i:

$Y_i = \frac{1}{\omega_i} \sum_{k=1}^{\omega_i} z_{ik}$
Assume that the random process generating each individual claim is gamma distributed, and that each claim is independent:

$E[z_{ik}] = m_i$, $\mathrm{Var}[z_{ik}] = \sigma^2 m_i^2$

$\mu_i = E[Y_i] = m_i$, $\mathrm{Var}[Y_i] = \frac{\sigma^2 \mu_i^2}{\omega_i}$

So, in this case, the variance of $Y_i$ follows the general form for all exponential family distributions, with $V(\mu_i) = \mu_i^2$, $\phi = \sigma^2$, and prior weight equal to the number of claims in the cell.

Prior weights can also be used to attach a lower credibility to a part of the data which is known to be less reliable.

In some cases, the scale parameter is equal to 1, and falls out of the analysis. In general, this is not true, and the scale parameter must be estimated from the data. This is not actually necessary in order to solve for the GLM parameters, but it is necessary in order to calculate certain statistics (like the standard errors). $\phi$ can be treated as another parameter and estimated by maximum likelihood. This is mathematically difficult, so in its place a simpler estimate of $\phi$ can be used.

Estimates of $\phi$:
1. The moment estimator: $\hat{\phi} = \frac{1}{n - p} \sum_i \frac{\omega_i (Y_i - \mu_i)^2}{V(\mu_i)}$
2. The total deviance estimator: $\hat{\phi} = \frac{D}{n - p}$

Here n is the number of observations, p is the number of parameters, and D is the total deviance.

In practice, we sometimes attempt to transform data to satisfy the requirements of Normality, constant variance, and additivity of effects. GLMs merely require that there be a link function that guarantees the condition of additivity. Classical linear models require that Y be additive in the covariates; GLMs require that some transformation of Y be additive in the covariates. In theory, a different link function could be used for each observation i, but this is impractical and rarely done. The link function must be differentiable and monotonic (either strictly increasing or strictly decreasing). Typical choices are:

| Link | $g(x)$ | $g^{-1}(x)$ |
|---|---|---|
| Identity | $x$ | $x$ |
| Log | $\ln(x)$ | $e^x$ |
| Logit | $\ln\left(\frac{x}{1 - x}\right)$ | $\frac{e^x}{1 + e^x}$ |
| Reciprocal | $\frac{1}{x}$ | $\frac{1}{x}$ |

The log link function is appealing because the effects of the covariates are multiplicative. When a log link function is used, the GLM estimates logs of multiplicative effects. Choices of link functions and error functions can yield GLMs which are equivalent to a number of minimum bias models, as well as to a simple linear model.

When the effect of an explanatory variable is known, it's appropriate to include information about this variable in the model as a known effect, by introducing an offset term $\xi$ into the definition of the linear predictor, giving $\eta = X\beta + \xi$. This gives us:

$E[Y] = \mu = g^{-1}(\eta) = g^{-1}(X\beta + \xi)$

An example of this is when we're fitting a GLM to the claim count (as opposed to the claim frequency). Since we assume that the expected count of claims increases in proportion to the exposure of an observation, we should incorporate this information in the GLM. We set the offset term equal to the log of the exposure of each observation.
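The link functions and the offset in $\mu = g^{-1}(X\beta + \xi)$ can be sketched as below. The parameter values (a base frequency of 0.10 and an urban relativity of 1.5) and the exposure are hypothetical, and `LINKS` and `predict_mean` are names introduced here for illustration only.

```python
import math

# Each entry is (g, g_inverse) for one of the typical link functions.
LINKS = {
    "identity": (lambda x: x, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda x: math.log(x / (1.0 - x)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # same as e^eta / (1 + e^eta)
    ),
    "reciprocal": (lambda x: 1.0 / x, lambda eta: 1.0 / eta),
}


def predict_mean(x_row, beta, link="log", offset=0.0):
    """Return mu = g^{-1}(x . beta + offset) for one observation."""
    eta = sum(b * x for b, x in zip(beta, x_row)) + offset
    _, inverse = LINKS[link]
    return inverse(eta)


# Hypothetical log-link parameters: intercept and an urban indicator.
beta = [math.log(0.10), math.log(1.5)]
exposure = 200.0

# Expected frequency for an urban risk (no offset).
expected_frequency = predict_mean([1, 1], beta, link="log")

# Expected claim count for the same cell: the offset ln(exposure) scales the
# mean in proportion to exposure without adding a parameter to estimate.
expected_count = predict_mean([1, 1], beta, link="log", offset=math.log(exposure))

print(expected_frequency, expected_count)
```

Under the log link the parameters act multiplicatively, so the expected count here is the base frequency times the relativity times the exposure.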
In the case of the Poisson multiplicative GLM, modeling claim counts with an offset term equal to the log of the exposure produces identical results to modeling claim frequencies with no offset term, but with prior weights set equal to the exposure of each observation.

Typical GLM forms

| Y | Claim Frequencies | Claim Counts | Average Claim Size | Probability |
|---|---|---|---|---|
| Link Function $g(x)$ | $\ln(x)$ | $\ln(x)$ | $\ln(x)$ | $\ln\left(\frac{x}{1 - x}\right)$ |
| Error | Poisson | Poisson | Gamma | Binomial |
| Scale Parameter $\phi$ | 1 | 1 | Estimated | 1 |
| Variance Function $V(x)$ | $x$ | $x$ | $x^2$ | $x(1 - x)$ for a single trial, or $x(t - x)/t$ for t trials |
| Prior Weights $\omega_i$ | Exposure | 1 | # Claims | 1 |
| Offset $\xi$ | 0 | $\ln(\text{exposure})$ | 0 | 0 |

Appeal: the Poisson form is invariant to the unit of time, and the Gamma form is invariant to the unit of currency.

GLM maximum likelihood estimators

Once the model is defined, the parameters are estimated by maximizing the likelihood function, that is, by finding the parameters which produce the observed data with the highest probability. In simple examples, the procedure for maximizing the likelihood involves finding the solution to a system of equations with linear algebra. In practice, numerical techniques are used due to the large number of observations.
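The equivalence between the two Poisson forms, and the kind of numerical iteration used to maximize the likelihood, can be sketched with iteratively reweighted least squares (one common way of carrying out the Newton-type iteration for GLMs). The data, the two-column design matrix, and the helper names are all hypothetical choices made for this illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_poisson_log(X, y, weights, offset, n_iter=25):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    p = len(X[0])
    beta = [0.0] * p  # start the iteration from zero
    for _ in range(n_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for row, yi, wi, oi in zip(X, y, weights, offset):
            eta = sum(b * x for b, x in zip(beta, row)) + oi
            mu = math.exp(eta)
            w = wi * mu                    # prior weight * (dmu/deta)^2 / V(mu)
            z = eta - oi + (yi - mu) / mu  # working response with offset removed
            for j in range(p):
                xtwz[j] += row[j] * w * z
                for k in range(p):
                    xtwx[j][k] += row[j] * w * row[k]
        beta = solve(xtwx, xtwz)
    return beta


# Hypothetical data: intercept plus one indicator, four cells.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
exposure = [100.0, 200.0, 150.0, 50.0]
counts = [10.0, 24.0, 30.0, 9.0]

# Form 1: claim counts, unit prior weights, offset ln(exposure).
beta_counts = fit_poisson_log(
    X, counts, [1.0] * 4, [math.log(e) for e in exposure]
)

# Form 2: claim frequencies, prior weights equal to exposure, no offset.
frequencies = [c / e for c, e in zip(counts, exposure)]
beta_freq = fit_poisson_log(X, frequencies, exposure, [0.0] * 4)

print(beta_counts)
print(beta_freq)
```

Both fits give the same parameters up to rounding. In this small design the fitted frequencies are simply total claims over total exposure within each level of the indicator, which gives a quick sanity check on the iteration.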


Solving simple examples

The general procedure used in the two handwritten examples is:
1. Specify the design matrix X and the vector of parameters $\beta$
2. Choose the error structure and the link function
3. Identify the likelihood function
4. Take the logarithm to convert the product of many terms into a sum
5. Maximize the logarithm of the likelihood function by taking partial derivatives with respect to each parameter, setting them to zero, and solving the resulting system of equations
6. Compute the predicted values

Solving for large datasets using numerical techniques

In insurance modeling, it's not practical to use the above techniques; instead, we use iterative numerical techniques, such as Newton-Raphson iteration. Iterative processes can be started using either a value of zero for each parameter, or the estimates implied by a one-way analysis or by another previously fitted GLM.

Base levels and the intercept term

In practice, when considering many factors each with many levels, it's helpful to parameterize the GLM by including an intercept term, which is a parameter that applies to all observations. In our example, this is done by redefining $\beta_1$ as the intercept term in the design matrix, and having only one parameter relating to gender. When considering categorical factors and an intercept term, one level of each factor should have no parameter associated with it, so that the model remains uniquely defined. If a model were structured with an intercept term but without each factor having a base level, then the GLM solving routine would remove as many parameters as necessary to make the model uniquely defined. This process is called aliasing.

Aliasing

Aliasing occurs when there's a linear dependency among the observed covariates. There are two types: intrinsic and extrinsic. Intrinsic aliasing occurs because of dependencies inherent in the definition of the covariates.
These arise most commonly whenever categorical factors are included in the model. GLM software will remove parameters which are aliased. The choice of which parameter to alias does not affect the fitted values.

Extrinsic aliasing also arises from a dependency among the covariates, but occurs when the dependency results from the nature of the data, rather than from inherent properties of the covariates themselves. It arises if one level of a particular factor is perfectly correlated with a level of another factor.

When modeling in practice, a common problem occurs when two or more factors contain levels that are almost, but not quite, perfectly correlated. When levels of two factors are nearly aliased in this way, convergence problems can occur, or the GLM can give results that appear very confusing. To deal with this, examine two-way tables of exposure and claim counts for the factors containing the nonsensical parameter estimates. Then, identify the factor combinations which cause the near aliasing. The issue can be resolved by deleting or excluding records, or by reclassifying the records into another factor level.

Model diagnostics

GLMs can also produce additional information indicating the certainty of the parameter estimates. The multivariate version of the Cramer-Rao lower bound can define standard errors for each parameter estimate. Standard errors can be thought of as indicators of the speed with which the log-likelihood falls from its maximum, given a change in a parameter. It's assumed that parameter estimates are asymptotically normally distributed, so it's possible to use a statistical test on individual parameter estimates, comparing each estimate with zero using a $\chi^2$ test: the square of the parameter estimate divided by its variance is compared to a $\chi^2$ distribution. This compares the parameter with the base level of the factor. We can repeatedly change the base level, and construct a triangle of tests comparing every pair of estimates.
If none of the differences is significant, it's a good bet that the factor is not significant either.

Measures of deviance can be used to assess the theoretical significance of a particular factor. Deviance is a measure of how much the fitted values differ from the observations. Define the deviance function:

$d(Y_i; \mu_i) = 2\omega_i \int_{\mu_i}^{Y_i} \frac{Y_i - \zeta}{V(\zeta)} \, d\zeta$
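As a rough numerical check of the deviance function, the integral can be evaluated directly and compared with the closed forms it produces for the Normal, Poisson, and Gamma variance functions. The closed forms are standard results; the function names and the test values (Y = 3, mu = 2) are assumptions chosen for illustration.

```python
import math


def unit_deviance(y, mu, variance, weight=1.0, steps=2000):
    """2 * weight * integral from mu to y of (y - z) / V(z) dz, by Simpson's rule.

    `steps` must be even. The rule also works when y is below mu, because the
    step h is then negative.
    """
    h = (y - mu) / steps

    def integrand(z):
        return (y - z) / variance(z)

    total = integrand(mu) + integrand(y)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(mu + k * h)
    return 2.0 * weight * total * h / 3.0


# Closed forms obtained by carrying out the integral analytically.
def normal_deviance(y, mu, weight=1.0):
    return weight * (y - mu) ** 2  # the familiar squared error


def poisson_deviance(y, mu, weight=1.0):
    return 2.0 * weight * (y * math.log(y / mu) - (y - mu))


def gamma_deviance(y, mu, weight=1.0):
    return 2.0 * weight * ((y - mu) / mu - math.log(y / mu))


y_obs, mu_fit = 3.0, 2.0
numeric = {
    "normal": unit_deviance(y_obs, mu_fit, lambda z: 1.0),
    "poisson": unit_deviance(y_obs, mu_fit, lambda z: z),
    "gamma": unit_deviance(y_obs, mu_fit, lambda z: z ** 2),
}
closed = {
    "normal": normal_deviance(y_obs, mu_fit),
    "poisson": poisson_deviance(y_obs, mu_fit),
    "gamma": gamma_deviance(y_obs, mu_fit),
}
for name in numeric:
    print(name, numeric[name], closed[name])
```

For the same gap between observation and fitted value, the Gamma deviance is the smallest of the three here, reflecting the larger variance that the Gamma model assumes at this mean.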

Since $V(\cdot)$ is strictly positive, the deviance function is non-negative (it is zero only when $Y_i = \mu_i$), and satisfies the condition for being a distance function. This function is a measurement of the difference between the fitted and the actual observations which gives more weight to the difference between $Y_i$ and $\mu_i$ when the variance function is small. The deviance function can be thought of as a generalized form of the squared error.

Summing the deviance function across all observations gives an overall measure of deviance, called the total deviance:

$D = \sum_{i=1}^{n} 2\omega_i \int_{\mu_i}^{Y_i} \frac{Y_i - \zeta}{V(\zeta)} \, d\zeta$

We can divide this by the scale parameter to get the scaled deviance, which is a generalized form of the sum of squared errors, adjusting for the shape of the distribution:

$D^{*} = \frac{D}{\phi} = \sum_{i=1}^{n} \frac{2\omega_i}{\phi} \int_{\mu_i}^{Y_i} \frac{Y_i - \zeta}{V(\zeta)} \, d\zeta$

For the exponential family of distributions, this is equal to twice the difference between the maximum achievable log-likelihood and the log-likelihood of the model.

One useful test considers the ratio of the likelihoods of two nested models. Nested models refer to the situation where one model contains explanatory variables which are a subset of the explanatory variables in a second model. The change in scaled deviance between two nested models is a sample from a $\chi^2$ distribution with degrees of freedom equal to the difference in degrees of freedom between the two models. The degrees of freedom for a model are the number of observations minus the number of parameters. This allows us to test the significance of the parameters that differ between the two models. It measures whether the inclusion of an explanatory factor improves the model enough, given the extra parameters which it adds to the model.

The $\chi^2$ tests depend on the scaled deviance. For some distributions, the scale parameter is not known and must be estimated. If the scale parameter used isn't accurate, the reliability of this test is decreased.
After adjusting for degrees of freedom and the (true) scale parameter, the estimate of the scale parameter is also distributed with a $\chi^2$ distribution. The ratio of the change in deviance to the adjusted estimate of the scale is therefore distributed with an F-distribution. The F-test is suitable for use when the scale parameter is not known. When we know the scale, there's no advantage.
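The test statistics described in the model diagnostics discussion can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and all numeric inputs are hypothetical, the scale estimate in the F statistic is taken to be the total deviance estimator from the larger model, and only the one-degree-of-freedom $\chi^2$ p-value is computed because it can be written with the standard library's `math.erfc`.

```python
import math


def wald_chi_square(estimate, standard_error):
    """Chi-square test of a single parameter against zero (1 degree of freedom)."""
    stat = (estimate / standard_error) ** 2
    # Upper tail of a chi-square with 1 df equals the two-sided normal tail.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def scaled_deviance_change(dev_small, dev_large, phi):
    """Change in scaled deviance between nested models when the scale is known.

    Compare with a chi-square distribution whose degrees of freedom equal the
    number of extra parameters in the larger model.
    """
    return (dev_small - dev_large) / phi


def f_statistic(dev_small, df_small, dev_large, df_large):
    """F statistic for nested models when the scale must be estimated.

    Degrees of freedom are observations minus parameters, so the smaller
    model has the larger value. Compare with F(df_small - df_large, df_large).
    """
    phi_hat = dev_large / df_large  # total deviance estimator of the scale
    return (dev_small - dev_large) / ((df_small - df_large) * phi_hat)


# Hypothetical parameter estimate and standard error.
stat, p_value = wald_chi_square(0.392, 0.2)
print(stat, p_value)

# Hypothetical deviances: adding a factor with 2 parameters to a model.
chi_sq_change = scaled_deviance_change(120.0, 100.0, phi=1.0)
f_value = f_statistic(120.0, 96, 100.0, 94)
print(chi_sq_change, f_value)
```

The critical values for the nested-model tests would come from $\chi^2$ or F tables (or a statistics library) for the stated degrees of freedom.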


Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Plotting data is one method for selecting a probability distribution. The following

Plotting data is one method for selecting a probability distribution. The following Advanced Analytical Models: Over 800 Models and 300 Applications from the Basel II Accord to Wall Street and Beyond By Johnathan Mun Copyright 008 by Johnathan Mun APPENDIX C Understanding and Choosing

More information

SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS

SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS SYLLABUS FOR ENTRANCE EXAMINATION NANYANG TECHNOLOGICAL UNIVERSITY FOR INTERNATIONAL STUDENTS A-LEVEL MATHEMATICS STRUCTURE OF EXAMINATION PAPER. There will be one -hour paper consisting of 4 questions..

More information

Gini Coefficient. A supplement to Mahlerʼs Guide to Loss Distributions. Exam C. prepared by Howard C. Mahler, FCAS Copyright 2017 by Howard C. Mahler.

Gini Coefficient. A supplement to Mahlerʼs Guide to Loss Distributions. Exam C. prepared by Howard C. Mahler, FCAS Copyright 2017 by Howard C. Mahler. Gini Coefficient A supplement to Mahlerʼs Guide to Loss Distributions Eam C prepared by Howard C. Mahler, FCAS Copyright 27 by Howard C. Mahler. Howard Mahler hmahler@mac.com www.howardmahler.com/teaching

More information

Given the vectors u, v, w and real numbers α, β, γ. Calculate vector a, which is equal to the linear combination α u + β v + γ w.

Given the vectors u, v, w and real numbers α, β, γ. Calculate vector a, which is equal to the linear combination α u + β v + γ w. Selected problems from the tetbook J. Neustupa, S. Kračmar: Sbírka příkladů z Matematiky I Problems in Mathematics I I. LINEAR ALGEBRA I.. Vectors, vector spaces Given the vectors u, v, w and real numbers

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

SPRING 2007 EXAM C SOLUTIONS

SPRING 2007 EXAM C SOLUTIONS SPRING 007 EXAM C SOLUTIONS Question #1 The data are already shifted (have had the policy limit and the deductible of 50 applied). The two 350 payments are censored. Thus the likelihood function is L =

More information

Model Assumptions; Predicting Heterogeneity of Variance

Model Assumptions; Predicting Heterogeneity of Variance Model Assumptions; Predicting Heterogeneity of Variance Today s topics: Model assumptions Normality Constant variance Predicting heterogeneity of variance CLP 945: Lecture 6 1 Checking for Violations of

More information

Chapter 2. Random Variable. Define single random variables in terms of their PDF and CDF, and calculate moments such as the mean and variance.

Chapter 2. Random Variable. Define single random variables in terms of their PDF and CDF, and calculate moments such as the mean and variance. Chapter 2 Random Variable CLO2 Define single random variables in terms of their PDF and CDF, and calculate moments such as the mean and variance. 1 1. Introduction In Chapter 1, we introduced the concept

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

2 Statistical Estimation: Basic Concepts

2 Statistical Estimation: Basic Concepts Technion Israel Institute of Technology, Department of Electrical Engineering Estimation and Identification in Dynamical Systems (048825) Lecture Notes, Fall 2009, Prof. N. Shimkin 2 Statistical Estimation:

More information

Chapter 8. Exponential and Logarithmic Functions

Chapter 8. Exponential and Logarithmic Functions Chapter 8 Eponential and Logarithmic Functions Lesson 8-1 Eploring Eponential Models Eponential Function The general form of an eponential function is y = ab. Growth Factor When the value of b is greater

More information

AB Calculus 2013 Summer Assignment. Theme 1: Linear Functions

AB Calculus 2013 Summer Assignment. Theme 1: Linear Functions 01 Summer Assignment Theme 1: Linear Functions 1. Write the equation for the line through the point P(, -1) that is perpendicular to the line 5y = 7. (A) + 5y = -1 (B) 5 y = 8 (C) 5 y = 1 (D) 5 + y = 7

More information

Estimators as Random Variables

Estimators as Random Variables Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maimum likelihood Consistency Confidence intervals Properties of the mean estimator Introduction Up until

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

= 1 2 x (x 1) + 1 {x} (1 {x}). [t] dt = 1 x (x 1) + O (1), [t] dt = 1 2 x2 + O (x), (where the error is not now zero when x is an integer.

= 1 2 x (x 1) + 1 {x} (1 {x}). [t] dt = 1 x (x 1) + O (1), [t] dt = 1 2 x2 + O (x), (where the error is not now zero when x is an integer. Problem Sheet,. i) Draw the graphs for [] and {}. ii) Show that for α R, α+ α [t] dt = α and α+ α {t} dt =. Hint Split these integrals at the integer which must lie in any interval of length, such as [α,

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

2 Generating Functions

2 Generating Functions 2 Generating Functions In this part of the course, we re going to introduce algebraic methods for counting and proving combinatorial identities. This is often greatly advantageous over the method of finding

More information

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of

More information

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis

Path Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate

More information

Chapter 6. Nonlinear Equations. 6.1 The Problem of Nonlinear Root-finding. 6.2 Rate of Convergence

Chapter 6. Nonlinear Equations. 6.1 The Problem of Nonlinear Root-finding. 6.2 Rate of Convergence Chapter 6 Nonlinear Equations 6. The Problem of Nonlinear Root-finding In this module we consider the problem of using numerical techniques to find the roots of nonlinear equations, f () =. Initially we

More information

Partitioning variation in multilevel models.

Partitioning variation in multilevel models. Partitioning variation in multilevel models. by Harvey Goldstein, William Browne and Jon Rasbash Institute of Education, London, UK. Summary. In multilevel modelling, the residual variation in a response

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed

More information

ECON 5350 Class Notes Functional Form and Structural Change

ECON 5350 Class Notes Functional Form and Structural Change ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this

More information

C-14 Finding the Right Synergy from GLMs and Machine Learning

C-14 Finding the Right Synergy from GLMs and Machine Learning C-14 Finding the Right Synergy from GLMs and Machine Learning 2010 CAS Annual Meeting Claudine Modlin November 8, 2010 Slide 1 Definitions Parametric modeling Objective: build a predictive model User makes

More information

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear

More information

ABSTRACT KEYWORDS 1. INTRODUCTION

ABSTRACT KEYWORDS 1. INTRODUCTION THE SAMPLE SIZE NEEDED FOR THE CALCULATION OF A GLM TARIFF BY HANS SCHMITTER ABSTRACT A simple upper bound for the variance of the frequency estimates in a multivariate tariff using class criteria is deduced.

More information

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs Outline Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team UseR!2009,

More information

Definition of Statistics Statistics Branches of Statistics Descriptive statistics Inferential statistics

Definition of Statistics Statistics Branches of Statistics Descriptive statistics Inferential statistics What is Statistics? Definition of Statistics Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make a decision. Branches of Statistics The study of statistics

More information

Solutions to the Spring 2016 CAS Exam S

Solutions to the Spring 2016 CAS Exam S Solutions to the Spring 2016 CAS Exam S There were 45 questions in total, of equal value, on this 4 hour exam. There was a 15 minute reading period in addition to the 4 hours. The Exam S is copyright 2016

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

STA 414/2104, Spring 2014, Practice Problem Set #1

STA 414/2104, Spring 2014, Practice Problem Set #1 STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,

More information

Soft and hard models. Jiří Militky Computer assisted statistical modeling in the

Soft and hard models. Jiří Militky Computer assisted statistical modeling in the Soft and hard models Jiří Militky Computer assisted statistical s modeling in the applied research Monsters Giants Moore s law: processing capacity doubles every 18 months : CPU, cache, memory It s more

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Homework 3 solution (100points) Due in class, 9/ (10) 1.19 (page 31)

Homework 3 solution (100points) Due in class, 9/ (10) 1.19 (page 31) Homework 3 solution (00points) Due in class, 9/4. (0).9 (page 3) (a) The density curve forms a rectangle over the interval [4, 6]. For this reason, uniform densities are also called rectangular densities

More information

Statistical Process Control for Multivariate Categorical Processes

Statistical Process Control for Multivariate Categorical Processes Statistical Process Control for Multivariate Categorical Processes Fugee Tsung The Hong Kong University of Science and Technology Fugee Tsung 1/27 Introduction Typical Control Charts Univariate continuous

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Zero inflated negative binomial-generalized exponential distribution and its applications

Zero inflated negative binomial-generalized exponential distribution and its applications Songklanakarin J. Sci. Technol. 6 (4), 48-491, Jul. - Aug. 014 http://www.sst.psu.ac.th Original Article Zero inflated negative binomial-generalized eponential distribution and its applications Sirinapa

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next Book Contents Previous Next SAS/STAT User's Guide Overview Getting Started Syntax Details Examples References Book Contents Previous Next Top http://v8doc.sas.com/sashtml/stat/chap29/index.htm29/10/2004

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Basic concepts in estimation

Basic concepts in estimation Basic concepts in estimation Random and nonrandom parameters Definitions of estimates ML Maimum Lielihood MAP Maimum A Posteriori LS Least Squares MMS Minimum Mean square rror Measures of quality of estimates

More information

A Re-Introduction to General Linear Models

A Re-Introduction to General Linear Models A Re-Introduction to General Linear Models Today s Class: Big picture overview Why we are using restricted maximum likelihood within MIXED instead of least squares within GLM Linear model interpretation

More information

Intermediate Algebra Section 9.3 Logarithmic Functions

Intermediate Algebra Section 9.3 Logarithmic Functions Intermediate Algebra Section 9.3 Logarithmic Functions We have studied inverse functions, learning when they eist and how to find them. If we look at the graph of the eponential function, f ( ) = a, where

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the

More information

MACHINE LEARNING ADVANCED MACHINE LEARNING

MACHINE LEARNING ADVANCED MACHINE LEARNING MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 2 2 MACHINE LEARNING Overview Definition pdf Definition joint, condition, marginal,

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information