A Reliable Constrained Method for Identity Link Poisson Regression

Similar documents
Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Statistics 3858 : Maximum Likelihood Estimators

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

The material for categorical data follows Agresti closely.

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

Marginal Screening and Post-Selection Inference

Link lecture - Lagrange Multipliers

Statistical Estimation

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Homework 1 Solutions

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain

Test of Association between Two Ordinal Variables while Adjusting for Covariates

Poisson regression: Further topics

Definition 3.1 A statistical hypothesis is a statement about the unknown values of the parameters of the population distribution.

Generalized Linear Models

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

Semiparametric Generalized Linear Models

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Lecture 8. Poisson models for counts

COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS

Chapter 1 Statistical Inference

Logistisk regression T.K.

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Categorical data analysis Chapter 5

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 12: Effect modification, and confounding in logistic regression

Loglikelihood and Confidence Intervals

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Introduction to Statistical Analysis

Model comparison and selection

,..., θ(2),..., θ(n)

Lecture 01: Introduction

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

Does low participation in cohort studies induce bias? Additional material

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula.

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Stat 642, Lecture notes for 04/12/05 96

An ordinal number is used to represent a magnitude, such that we can compare ordinal numbers and order them by the quantity they represent.

Constrained estimation for binary and survival data

Discussion of Identifiability and Estimation of Causal Effects in Randomized. Trials with Noncompliance and Completely Non-ignorable Missing Data

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

Outline of GLMs. Definitions

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Single-level Models for Binary Responses

Chapter 5: Logistic Regression-I

1 One-way analysis of variance

Simple logistic regression

Describing Contingency tables

Survival Analysis Math 434 Fall 2011

Linear regression is designed for a quantitative response variable; in the model equation

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Dose-response Modeling for Ordinal Outcome Data

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

STAT 705: Analysis of Contingency Tables

(θ θ ), θ θ = 2 L(θ ) θ θ θ θ θ (θ )= H θθ (θ ) 1 d θ (θ )

LOGISTIC REGRESSION Joseph M. Hilbe

A class of latent marginal models for capture-recapture data with continuous covariates

Sections 4.1, 4.2, 4.3

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary

Localization of Radioactive Sources Zhifei Zhang

Conditional Maximum Likelihood Estimation of the First-Order Spatial Non-Negative Integer-Valued Autoregressive (SINAR(1,1)) Model

Chapter 22: Log-linear regression for Poisson counts

Tied survival times; estimation of survival probabilities

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

Generalized Linear Models Introduction

Multinomial Logistic Regression Models

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes 1

Biostatistics Advanced Methods in Biostatistics IV

Statistical Data Mining and Machine Learning Hilary Term 2016

Various Issues in Fitting Contingency Tables

Generalized Linear Models (GLZ)

Biostat 2065 Analysis of Incomplete Data

High-Throughput Sequencing Course

Variable selection for model-based clustering

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

High dimensional ising model selection using l 1 -regularized logistic regression

Introduction An approximated EM algorithm Simulation studies Discussion

Machine Learning 2017

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood Estimation

Chapter 2: Describing Contingency Tables - I

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

Correlation and regression

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Testing Independence

Bayesian Hierarchical Models

Regression Clustering

Investigating Models with Two or Three Categories

Econometrics Lecture 5: Limited Dependent Variable Models: Logit and Probit

Generalized Linear Modeling - Logistic Regression

Efficient MCMC Samplers for Network Tomography

Transcription:

A Reliable Constrained Method for Identity Link Poisson Regression Ian Marschner Macquarie University, Sydney Australasian Region of the International Biometrics Society, Taupo, NZ, Dec 2009. 1 / 16

Identity Link Poisson Model Useful for count data where the mean depends additively on a collection of covariates Useful as an approximation to identity link binomial model for modelling additive probabilities Epidemiological applications: Rate difference regression Risk difference regression Alternative to the usual multiplicative models: Multiplicative incidence (log link Poisson) Multiplicative risk (log link binomial) Multiplicative odds (logistic link binomial) 2 / 16

Advantages and Disadvantages Modelling advantages: Rate/risk differences may be preferred to rate/risk/odds ratios Additive model may fit the data better than a multiplicative model 3 / 16

Advantages and Disadvantages Modelling advantages: Rate/risk differences may be preferred to rate/risk/odds ratios Additive model may fit the data better than a multiplicative model Computational disadvantages: Positivity constraints may lead to non-convergence of standard computational methods e.g. IRLS Standard errors and confidence intervals may be invalid if MLE is on or near parameter space boundary Even if standard methods do converge for a single data set, they often won t work for thousands of bootstrap replications 3 / 16

Categorical Model Count data: {Y i ; i = 1,..., n} with covariates x i = (x i1,..., x ia ) x ij {1, 2,..., k j} and x i X = j {1, 2,..., kj} 4 / 16

Categorical Model Count data: {Y i ; i = 1,..., n} with covariates x i = (x i1,..., x ia ) x ij {1, 2,..., k j} and x i X = j {1, 2,..., kj} Poisson model: Y i independent Poisson(N i λ i ) i = 1,..., n Additive means: λ i = Λ(x i; θ) = α 0 + A j=1 αj(xij) N i are known positive standardisation constants e.g. expected number of events or number of binomial trials Log-likelihood: L(θ) = n i=1 Yi log[niλ(xi; θ)] NiΛ(xi; θ) 4 / 16

Categorical Model Count data: {Y i ; i = 1,..., n} with covariates x i = (x i1,..., x ia ) x ij {1, 2,..., k j} and x i X = j {1, 2,..., kj} Poisson model: Y i independent Poisson(N i λ i ) i = 1,..., n Additive means: λ i = Λ(x i; θ) = α 0 + A j=1 αj(xij) N i are known positive standardisation constants e.g. expected number of events or number of binomial trials Log-likelihood: L(θ) = n i=1 Yi log[niλ(xi; θ)] NiΛ(xi; θ) Identifiability restriction: for some r = (r 1,..., r A ) X define α j (r j ) = 0 for each j = 1,..., A 4 / 16

Categorical Model Count data: {Y i ; i = 1,..., n} with covariates x i = (x i1,..., x ia ) x ij {1, 2,..., k j} and x i X = j {1, 2,..., kj} Poisson model: Y i independent Poisson(N i λ i ) i = 1,..., n Additive means: λ i = Λ(x i; θ) = α 0 + A j=1 αj(xij) N i are known positive standardisation constants e.g. expected number of events or number of binomial trials Log-likelihood: L(θ) = n i=1 Yi log[niλ(xi; θ)] NiΛ(xi; θ) Identifiability restriction: for some r = (r 1,..., r A ) X define α j (r j ) = 0 for each j = 1,..., A Parameter space: Θ = {θ : Λ(x; θ) 0 for all x X } Interpretation: overall means E(Y i) are constrained to be non-negative but the individual α j(x ij) can be negative 4 / 16

Model with Non-negative Coefficients Suppose we did constrain the model: α j (x ij ) 0 for all i and j This type of identity link Poisson model is well known in specialised applications involving Poisson deconvolution The EM algorithm provides a computationally stable and convenient method to fit such models This is great if you have a specialised application that calls for non-negative coefficients In general, identity link Poisson regression models do not have this constraint so the specialised methods are not applicable We discuss how to adapt the specialised methods to the general case 5 / 16

Parameter Space Subset Given the chosen identifiability restriction r, define the following subset of the parameter space Θ: Θ(r) = {θ Θ : Λ(r; θ) Λ(x; θ) for all x X } Θ(r) is the subset of the parameter space in which no covariate pattern has a smaller (standardised) Poisson mean than the covariate pattern r Θ(r) is the subset of the parameter space in which all α j (x ij ) 0 for x ij = r j Given the chosen identifiability restriction r, fitting a non-negative coefficient model is the same as fitting the model subject to the constraint θ Θ(r) What use is this for the general model with θ Θ? 6 / 16

Fitting the General Model For one of the choices of r X the maximum of L(θ) over Θ is within Θ(r) since Θ = Θ(r) r X Cycle through the finite number of possible choices of r X and maximise L(θ) over Θ(r) in each case This involves a sequence of (stable and convenient) EM algorithm applications to yield a sequence of contrained estimates ˆθ(r) The MLE ˆθ is then the ˆθ(r) that yields the highest value of L. That is, ˆθ = ˆθ(r ) where: r = {r X : L(ˆθ(r)) L(ˆθ(s)) for all s X } If you find a stationary point then the process can terminate but often the MLE will not be a stationary point 7 / 16

EM Algorithm Since the Poisson means are additive, Y i can be viewed as the sum of unobserved Poisson variates: Y i = A j=0 Y (j) i where E(Y (j) i ) = N i α j (x ij ) EM algorithm: {Y i } = observed data and {Y (j) i } = complete data Only works for the non-negative coefficient model since the complete data means must be non-negative Yields a stable multiplicative algorithm which increases L within Θ(r) as long as ˆθ (0) Θ(r) At convergence: ˆθ ( ) = ˆθ(r), the maximum of L(θ) over Θ(r) 8 / 16

EM Algorithm At iteration c: ˆY (j) i(c+1) (j) = E(Y i Y i ; ˆθ (c) ) = ˆα (c) j (l)y i /Λ(x i ; ˆθ (c) ) ˆα (c+1) j (l) = / (l) ˆY i(c+1) N i i I jl i I jl = ˆα (c) j (l) i I jl N i 1 i I jl / (Y i Λ(x i ; ˆθ ) (c) ) where I jl = {i : x ij = l} identifies the observations that have covariate j equal to l 9 / 16

Extension to Continuous Covariates Isotonic regression: Covariate with C + 1 distinct observed values w 0 < < w C Additive contribution is f(w i) for unspecified non-decreasing f Add C dummy covariates to the model corresponing to the C non-negative increments γ i = f(w i) f(w i 1) EM algorithm maximises subject to the isotonicity constraint γ i 0 Works for ordinal categorical or continuous covariates 10 / 16

Extension to Continuous Covariates Isotonic regression: Covariate with C + 1 distinct observed values w 0 < < w C Additive contribution is f(w i) for unspecified non-decreasing f Add C dummy covariates to the model corresponing to the C non-negative increments γ i = f(w i) f(w i 1) EM algorithm maximises subject to the isotonicity constraint γ i 0 Works for ordinal categorical or continuous covariates Linear regression: Shift each covariate by subtracting the minimum (or maximum) Use the EM algorithm to maximise subject to a non-negative (or non-positive) gradient Cycle through all the combinations and find the highest likelihood 10 / 16

Example 1: Respiratory Cancer Mortality Breslow et al (JASA, 1983) Epidemiological data on respiratory cancer mortality in 8047 Montana smelter workers > glm.fit(design*expected,observed,family=poisson(link="identity")) Error: no valid set of coefficients has been found: please supply starting values In addition: Warning message: In log(ifelse(y == 0, 1, y/mu)) : NaNs produced 11 / 16

Example 1: Respiratory Cancer Mortality Illustration of calculations for constrained identity link model with 2 covariates: r Constant Birthplace Years heavy arsenic Log-likelihood Foreign US 0 <1 1 4 5+ (1, 1) 2.56 0.00 1.44 0.96 4.98 345.02 (1, 2) 2.66 0.00 0.00 0.86 4.88 343.99 (1, 3) 2.59 0.00 0.00 1.41 4.95 344.80 (1, 4) 2.72 0.00 0.00 1.28 0.80 340.63 (2,1) 1.65 3.05 1.77 1.87 5.47 360.61 (2, 2) 1.79 2.99 0.00 1.73 5.34 358.73 (2, 3) 1.74 2.97 0.00 1.69 5.40 359.54 (2, 4) 1.87 2.92 0.00 1.57 1.65 354.35 12 / 16

Example 1: Respiratory Cancer Mortality Constrained identity link analysis with all covariates and isotonic or unrestricted arsenic exposure effects Unrestricted analysis Isotonic analysis Covariate Estimate 95% CI Estimate 95% CI Constant 1.41 0.91 1.90 1.37 0.89 1.78 Foreign born (vs. U.S.) 2.48 1.26 3.70 2.48 1.29 3.67 Years moderate arsenic (vs. 0) <1 year 0.17 1.18 0.85 0.00 0.00 1.03 1 4 years 1.60 0.17 3.37 1.35 0.00 2.63 5 14 years 0.86 1.16 2.88 1.35 0.31 3.50 15+ years 4.03 1.09 6.96 4.06 1.59 7.18 Years heavy arsenic (vs. 0) <1 year 1.17 0.80 3.13 1.21 0.00 2.96 1 4 years 1.93 1.47 5.32 1.95 0.00 5.26 5+ years 5.68 1.11 10.25 5.70 1.90 10.59 Identity link deviance 26.05 26.46 Log link deviance 30.36 31.95 13 / 16

Example 1: Respiratory Cancer Mortality Results of 5000 bootstrap replications of the isotonic identity link model Constant Foreign born Moderate arsenic <1 year Frequency 0 200 600 Frequency 0 500 1000 Frequency 0 1000 2500 0.5 1.0 1.5 2.0 Estimate 0 1 2 3 4 5 Estimate 0.0 0.5 1.0 1.5 Estimate Moderate arsenic 1 4 years Moderate arsenic 5 14 years Moderate arsenic 15+ years Frequency 0 200 400 Frequency 0 400 800 1200 Frequency 0 200 400 600 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 Estimate 0 1 2 3 4 5 Estimate 0 2 4 6 8 10 Estimate Heavy arsenic <1 year Heavy arsenic 1 4 years Heavy arsenic 5+ years Frequency 0 500 1000 1500 Frequency 0 200 600 Frequency 0 200 600 0 1 2 3 4 5 Estimate 0 2 4 6 8 Estimate 0 5 10 15 Estimate 14 / 16

Example 2: Crab Population Counts Agresti (1996) Outcome: Number of partners for n = 173 horseshoe crabs Model: Identity link Poisson model for the mean number of partners versus crab size (cm), adjusted for colour and spine condition Analysis: IRLS fails to converge; identity link better than log link Linear and Isotonic Identity Link Models Bootstrap 95% Confidence Intervals Mean number of partners 0 2 4 6 8 Mean number of partners 0 2 4 6 8 20 25 30 35 Width 20 25 30 35 Width 15 / 16

Summary Reliability: Computationally stable method for fitting constrained identity link Poisson models when standard methods fail Stationary or non-stationary maximum Boundary or interior of parameter space Flexibility: Categorical, ordinal or continuous covariates Unspecified isotonic relationships Smoothing step could easily be added for non-parameteric smooth relationships Validity: Computationally reliable bootstrap analysis yields confidence intervals that satisfy parameter constraints Even when standard methods converge for the main analysis, they are often too unreliable for a resampling analysis 16 / 16