# A note on R 2 measures for Poisson and logistic regression models when both models are applicable

Save this PDF as:

Size: px
Start display at page:

Download "A note on R 2 measures for Poisson and logistic regression models when both models are applicable"

## Transcription

2 100 M. Mittlböck, H. Heinzl / Journal of Clinical Epidemiology 54 (001) Fig. 1. ogistic function (solid line) and oisson function (dotted line) with a single continuous covariate with range ( 70, 70) and parameters 0.1 and 1, respectively. The horizontal line at p 0.05 denotes the maximum of p, where an approximation of the logistic regression by a oisson regression is good. a dose response study shows that the probability of binary responses can be fitted very well for the various dose levels, and the goodness-of-fit is nearly perfect. But it is still difficult to predict which patient will respond as the covariate dose is not very effective in distinguishing outcomes [7]. Viewed on an individual-by-individual basis, the observations are all zeros and ones while for the various dose levels the predictions are between 0.30 and 0.8 in Fig.. Note, the closer the probability of an event is around 0.5 the more inaccurate is the prediction of an individual event. The basic idea of oisson regression is to estimate frequencies (number of responses) and not individual outcomes. This is similar to the estimation of probabilities for binary events. Therefore there is no large difference between goodness-of-fit and R measures in oisson models [8, 9]. However, if a typical binomial situation with binary outcomes is fitted by oisson regression, which is usually done in epidemiological studies for event rates, then model fit, parameter estimates and significances will be equivalent to the fit of a logistic regression, but the results of R measures may be completely different and may also have different interpretations. In the following section we describe R measures for logistic and oisson regression based on the sums-of-squares and on entropy concepts. Differences between R measures of both models are exemplified using real data, and thereby stressing the differences in the interpretation of R on the one hand and -values and parameter estimates on the other hand.. Materials and methods.1. Data We use data of a follow-up study [10,11], which investigates whether the covariates age and smoking can predict death from coronary artery disease of British male doctors. Age is given in decades, smoking is a dichotomous covariate which categorizes the study participants into smokers and non-smokers (Table 1). The numbers of death from coronary artery disease and person-years lived are given for each distinct covariate pattern formed by age-decades and smoking status... Statistical methods Fig.. Observed (dots) and by logistic regression estimated (line) probabilities of adverse events of a dose response example. The data are fitted by oisson and logistic regression, so that the effect of age and smoking on death of coronary artery disease can be described by parameter estimates and -values. The question, how good the individual outcomes can be predicted by these two factors is of further interest.

3 M. Mittlböck, H. Heinzl / Journal of Clinical Epidemiology 54 (001) Table 1 Data from the study about death from coronary artery disease among British doctors [10] Age Smoke Number of deaths erson-years , , , , , , , , , ,317 Usually, this is answered by computing R, which is also called the coefficient of determination or the proportion of explained variation. The general form of R measures is R i Dy ( i x i ) = , i Dy ( i ) where D(y i ) and D(y i x i ) denote a measure of the distance of observed values y i from an unconditional and a conditional (on a covariate vector x i ) central location parameter, respectively. R measures based on deviances make use of the loglikelihood of the saturated model (log(y)), when observed and estimated outcomes coincide, the log-likelihood of the full model (log( ˆ )), when the effects of covariates ( ˆ ) are estimated, and the log-likelihood of the null model (log(y)), when only an intercept parameter ( ˆ 0 ) is fitted. The conditional distance measure is the difference between the log-likelihoods of the saturated and the full model and the unconditional distance measure is the difference between the log-likelihood of the saturated and the null model. That is, log( y) log( βˆ ) R DEV = log( y) log( y) For oisson regression the deviance-based R measure is R DEV, 1 i y [( i log( y i ) y i ) ( y i log ( µˆ i ) µˆ i )] = y ( i log( y i ) y i ) y [ ( i log( y i ) y i )] i [8,9] with observed frequencies y i 0, estimated frequencies ˆ i n i exp( ˆ x i ) and y i n i exp( ˆ 0 ) n i ( j y/ j j n j ) under the full and null model, respectively; n i denotes the total number of exposures to risk (or person-years lived) for a given covariate vector x i. For logistic regression the deviance-based R measure is log( βˆ ) R DEV, = log( y), as log(y) is equal to zero (note, by definition 0log0 0). Thus R DEV, 1 i y [ i logpˆ i + ( 1 y i )log( 1 pˆ i )] = i y i logy [ + ( 1 y i )log( 1 y )] [6,1,13] with observed outcome y i {0,1} and estimated probabilities exp( βˆ x pˆ i ) i = exp( βˆ x i ) and y exp ( βˆ 0) = = 1 + exp( βˆ 0) under the full and null model, respectively, and the number of observations n i n i. The R measure for oisson regression models, based on sums-of-squares, uses as conditional distance measure the squared difference between observed and estimated frequencies (y i ˆ i). For the unconditional distance measure ˆ i is replaced by y. The resulting R i measure gives R SS, [8,9]. Whereas the R measure in logistic regression models, based on sums-of-squares, quantifies the difference between the binary observed outcomes and estimated probabilities, so that D(y i x i ) (y i) i pˆ. For the unconditional distance measure pˆ i is replaced by y, so that eventually R SS, [6,14]. In the following, all computations for model fitting and calculations for the R measures are made using the statistical software package SAS [15]. 3. Results = 1 i ( y i µˆ i ) ( y i y i ) i = 1 i ( y i pˆ i ) ( y i y ) i 1 -- n i y i, Results of modelling coronary artery disease are given in Table with the models: (a) smoking only, (b) age only, (c) smoking and age and (d) smoking, age and age-squared. We see that smoking and increasing age have a significant influence on death from coronary artery disease, but the increase in risk is larger for younger people than for older people (quadratic age effect). Model results of logistic and oisson regression are nearly identical. In Table 3, model (a), the estimated R -values for fitting only the covariate smoking is shown. R DEV gives 3.1% under oisson regression and 0.3% under logistic regression. The difference between the deviance-based R measures of oisson and logistic regression becomes even more dramatic when age is modelled (model (b)): 90.9% under oisson regression and 9.0% under logistic regression. The reason for this is simple: both measures are correct, but have different interpretation. In both models the underlying dependent variable is not the same and one has therefore to define what an R measure should explain. The main intention of the oisson regression is to explain the observed frequencies. Under model (d) the number of persons died with coro-

4 10 M. Mittlböck, H. Heinzl / Journal of Clinical Epidemiology 54 (001) Table Results of oisson/logistic regression for the study about death from coronary artery disease among British doctors [10] Model S.E. df -value (a) Intercept 5.4/ / Smoke 0.54/ / / / (b) Intercept 10.30/ / Age 0.08/ / / / (c) Intercept 10./ / Smoke 0.41/ / / / Age 0.08/ / / / (d) Intercept 17.51/ / Smoke 0.35/ / / / Age 0.33/ / / / Age-squared 0.00/ / / / nary artery disease can be predicted very well R is close to 100%. In contrast the dependent variable in logistic regression indicates on a year-by-year basis, if the person has died or is alive. Obviously it is much easier to predict the number of events, than to predict single events. Consequently, there will be a much higher R -value for oisson than for logistic regression. Of course, the same is true if we consider an R measure based on sums-of-squares (R ss). As oisson regression is usually fitted using maximum likelihood and not leastsquares, R ss can become negative in some situations under oisson regression (see [9]), as it is the case in Table 3, model (a). In models (b) to (d) the difference between RDEV and R ss under oisson regression is minimal. But under logistic regression R ss is much smaller than R DEV, which is due to different construction principles the improvement per observation in the likelihood of the full model relative to the null model is in our example larger than the improvement in the sums-of-squares of the full model relative to the null model. From Table we see that all variables (smoke, age and age-squared) have significant influence on the death from coronary artery disease. Despite the substantial differences in R -values between oisson and logistic regression, their ranking of the covariates remains unchanged. Under both models age explains most, whereas smoking, although significant, reduces uncertainty about death only slightly. Table 3 Estimated R measures in percent for the study about death from coronary artery disease among British doctors calculated under oisson and logistic regression [10] oisson regression ogistic regression Model R DEV, R SS, R DEV, R SS, (a) Smoke (b) Age (c) Age smoke (d) Age age-squared smoke Discussion The example illustrates the structurally big difference in R measures between oisson and logistic regression. The likelihood, parameter estimates and -values are approximately the same for both models, but one has to be cautious in interpreting R measures. Do we want to explain the variability in the observed event rates or in the individual outcomes? In the example a major reduction in the uncertainty of the event rate prediction for a given covariate pattern can be achieved, however, the predictability of an individual outcome is still poor. This is a typical example for studies with many observations, where contributions of the explanatory variables are highly significant and of substantial interest. However the R measure of the logistic regression can be a useful reminder that the contribution of the variables may, in fact, explain only a small percentage of the variability in the individual response. Cox and Wermuth [16] stated that this type of interpretation is misleading in linear regressions with binary responses since low values of R, roughly 0.1, are inevitable even if an important relation is present. But our example shows that low R -values in logistic regression are a necessary hint that small -values, impressive parameter estimates for prognostic factors and large R -values calculated under oisson regression may hide the necessity to search for further prognostic factors to improve the prediction of the individual outcome. It is even possible that the R for oisson regression achieves one, so that the event rates are perfectly predicted by the model, and in the same time the R -value of the corresponding logistic regression may achieve only a value close to zero, indicating that it is still difficult to predict individual outcome. R measures usually increase monotonically with increasing number of covariates even if they have no prognostic value at all. Therefore the use of adjusted R measures (as proposed by [6,9,17,18]) is advised especially in cases with small number of observations (that is distinctive covariate patterns in oisson regression) relative to the number of covariates. However, the principal problem as described here remains the same.

5 M. Mittlböck, H. Heinzl / Journal of Clinical Epidemiology 54 (001) References [1] Drolette ME. The vanishing table: linking the hypergeometric, binomial and oisson. Am Stat 1974;8:10 3. [] Eisenberg H, Geoghagen R, Walsh J. A general use of the oisson approximation for binomial events, with application to bacterial endocarditis data. Biometrics 1966;:74 8. [3] Matsunawa T. oisson distribution. In: Kotz S, Johnson N, editors. Encyclopedia of statistical sciences, vol. 7. New York: Wiley, pp [4] Sheu S. The oisson approximation to the binomial distribution. Am Stat 1984;38:06 7. [5] Hosmer DW, emeshow S. Applied logistic regression. New York: Wiley, [6] Mittlböck M, Schemper M. Explained variation for logistic regression. Stat Med 1996;15: [7] Korn E, Simon R. Explained residual variation, explained risk, and goodness of fit. Am Stat 1991;45:01 6. [8] Cameron AC, Windmeijer FAG. R measures for count data regression models with applications to health-care utilization. J Bus Econ Stat 1996;14:09 0. [9] Waldhör T, Haidinger G, Schober E. Comparison of R measures for oisson regression by simulation. J Epidemiol Biostat 1998;3: [10] Doll R, Hill AB. Mortality of British doctors in relation to smoking: observations on coronary thrombosis. Nat Cancer Inst Monog 1966; 19: [11] Breslow NE. Cohort analysis in epidemiology. In: Atkinson AC, Fienberg SE, editors. A celebration of statistics. New York: Springer Verlag, pp [1] Theil H. On the estimation of relationships involving qualitative variables. Am J Sociol 1970;76: [13] McFadden D. The measurement of urban travel demand. J ublic Econ 1974;3: [14] Margolin BH, ight RJ. An analysis of variance for categorical data, II: small sample comparisons with chi-square and other competitors. J Am Stat Assoc 1974;69: [15] The SAS system for windows. SAS Cary, NC: SAS Institute Inc., [16] Cox DR, Wermuth N. A comment on the coefficient of determination for binary responses. Am Stat 199;46:1 4. [17] Mittlböck M, Schemper M. Computing measures of explained variation for logistic regression models. Comp Methods rog Biomed 1999;58:17 4. [18] Mittlböck M, Waldhör T. Adjustments for R -measures for oisson regression models. Comp Stat Data Anal 000;34:461 7.

### BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

### Correlation and regression

1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

### Basic Medical Statistics Course

Basic Medical Statistics Course S7 Logistic Regression November 2015 Wilma Heemsbergen w.heemsbergen@nki.nl Logistic Regression The concept of a relationship between the distribution of a dependent variable

### Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect

### Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

### 8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

### Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

### 9 Generalized Linear Models

9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

### Multinomial Logistic Regression Models

Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

### STA6938-Logistic Regression Model

Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

### Penalized likelihood logistic regression with rare events

Penalized likelihood logistic regression with rare events Georg Heinze 1, Angelika Geroldinger 1, Rainer Puhr 2, Mariana Nold 3, Lara Lusa 4 1 Medical University of Vienna, CeMSIIS, Section for Clinical

### Exact McNemar s Test and Matching Confidence Intervals Michael P. Fay April 25,

Exact McNemar s Test and Matching Confidence Intervals Michael P. Fay April 25, 2016 1 McNemar s Original Test Consider paired binary response data. For example, suppose you have twins randomized to two

### Effects of Exposure Measurement Error When an Exposure Variable Is Constrained by a Lower Limit

American Journal of Epidemiology Copyright 003 by the Johns Hopkins Bloomberg School of Public Health All rights reserved Vol. 157, No. 4 Printed in U.S.A. DOI: 10.1093/aje/kwf17 Effects of Exposure Measurement

### Statistical Modelling with Stata: Binary Outcomes

Statistical Modelling with Stata: Binary Outcomes Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 21/11/2017 Cross-tabulation Exposed Unexposed Total Cases a b a + b Controls

### A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

A new strategy for meta-analysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials

### ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

### Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

### ZERO INFLATED POISSON REGRESSION

STAT 6500 ZERO INFLATED POISSON REGRESSION FINAL PROJECT DEC 6 th, 2013 SUN JEON DEPARTMENT OF SOCIOLOGY UTAH STATE UNIVERSITY POISSON REGRESSION REVIEW INTRODUCING - ZERO-INFLATED POISSON REGRESSION SAS

### STA102 Class Notes Chapter Logistic Regression

STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response

### Neuendorf Logistic Regression. The Model: Assumptions:

1 Neuendorf Logistic Regression The Model: Y Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV... bear in mind that

### Generalized linear models

Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

### LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

### Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

### Investigating Models with Two or Three Categories

Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

### LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi

LOGISTIC REGRESSION Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi- lmbhar@gmail.com. Introduction Regression analysis is a method for investigating functional relationships

### Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

Lecture : Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 27 Binomial Model n independent trials (e.g., coin tosses) p = probability of success on each trial (e.g., p =! =

### Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Lecture 10: Alternatives to OLS with limited dependent variables PEA vs APE Logit/Probit Poisson PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample

### COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

### Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

### STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs The data for the tutorial came from support.sas.com, The LOGISTIC Procedure: Conditional Logistic Regression for Matched Pairs Data :: SAS/STAT(R)

### Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

### Chapter 19: Logistic regression

Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

### " M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

### Looking at data: relationships

Looking at data: relationships Least-squares regression IPS chapter 2.3 2006 W. H. Freeman and Company Objectives (IPS chapter 2.3) Least-squares regression p p The regression line Making predictions:

### Longitudinal Modeling with Logistic Regression

Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

### Regression techniques provide statistical analysis of relationships. Research designs may be classified as experimental or observational; regression

LOGISTIC REGRESSION Regression techniques provide statistical analysis of relationships. Research designs may be classified as eperimental or observational; regression analyses are applicable to both types.

### Models for Count and Binary Data. Poisson and Logistic GWR Models. 24/07/2008 GWR Workshop 1

Models for Count and Binary Data Poisson and Logistic GWR Models 24/07/2008 GWR Workshop 1 Outline I: Modelling counts Poisson regression II: Modelling binary events Logistic Regression III: Poisson Regression

### 7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure).

1 Neuendorf Logistic Regression The Model: Y Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV... bear in mind that

### Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X

Chapter 864 Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X Introduction Logistic regression expresses the relationship between a binary response variable and one or more

### Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

### Analysing categorical data using logit models

Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

### A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS

A COEFFICIENT OF DETEMINATION FO LOGISTIC EGESSION MODELS ENATO MICELI UNIVESITY OF TOINO After a brief presentation of the main extensions of the classical coefficient of determination ( ), a new index

### 1 Introduction A common problem in categorical data analysis is to determine the effect of explanatory variables V on a binary outcome D of interest.

Conditional and Unconditional Categorical Regression Models with Missing Covariates Glen A. Satten and Raymond J. Carroll Λ December 4, 1999 Abstract We consider methods for analyzing categorical regression

### Statistical aspects of prediction models with high-dimensional data

Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by

### Extensions of Cox Model for Non-Proportional Hazards Purpose

PhUSE 2013 Paper SP07 Extensions of Cox Model for Non-Proportional Hazards Purpose Jadwiga Borucka, PAREXEL, Warsaw, Poland ABSTRACT Cox proportional hazard model is one of the most common methods used

### Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Sample Size and Power I: Binary Outcomes James Ware, PhD Harvard School of Public Health Boston, MA Sample Size and Power Principles: Sample size calculations are an essential part of study design Consider

### Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School

### LOGISTICS REGRESSION FOR SAMPLE SURVEYS

4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

### Simple ways to interpret effects in modeling ordinal categorical data

DOI: 10.1111/stan.12130 ORIGINAL ARTICLE Simple ways to interpret effects in modeling ordinal categorical data Alan Agresti 1 Claudia Tarantola 2 1 Department of Statistics, University of Florida, Gainesville,

### Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable

### Survival Regression Models

Survival Regression Models David M. Rocke May 18, 2017 David M. Rocke Survival Regression Models May 18, 2017 1 / 32 Background on the Proportional Hazards Model The exponential distribution has constant

### 2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors

### Precision of maximum likelihood estimation in adaptive designs

Research Article Received 12 January 2015, Accepted 24 September 2015 Published online 12 October 2015 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/sim.6761 Precision of maximum likelihood

### Müller: Goodness-of-fit criteria for survival data

Müller: Goodness-of-fit criteria for survival data Sonderforschungsbereich 386, Paper 382 (2004) Online unter: http://epub.ub.uni-muenchen.de/ Projektpartner Goodness of fit criteria for survival data

### Multivariable Fractional Polynomials

Multivariable Fractional Polynomials Axel Benner September 7, 2015 Contents 1 Introduction 1 2 Inventory of functions 1 3 Usage in R 2 3.1 Model selection........................................ 3 4 Example

### CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or

### CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA

STATISTICS IN MEDICINE, VOL. 17, 59 68 (1998) CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA J. K. LINDSEY AND B. JONES* Department of Medical Statistics, School of Computing Sciences,

### The STS Surgeon Composite Technical Appendix

The STS Surgeon Composite Technical Appendix Overview Surgeon-specific risk-adjusted operative operative mortality and major complication rates were estimated using a bivariate random-effects logistic

### Binary Logistic Regression

The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

### EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors.

EDF 7405 Advanced Quantitative Methods in Educational Research Data are available on IQ of the child and seven potential predictors. Four are medical variables available at the birth of the child: Birthweight

### Research Methodology: Tools

MSc Business Administration Research Methodology: Tools Applied Data Analysis (with SPSS) Lecture 05: Contingency Analysis March 2014 Prof. Dr. Jürg Schwarz Lic. phil. Heidi Bruderer Enzler Contents Slide

### Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE

ORIGINAL ARTICLE Marginal Structural Models as a Tool for Standardization Tosiya Sato and Yutaka Matsuyama Abstract: In this article, we show the general relation between standardization methods and marginal

### ECON 5350 Class Notes Functional Form and Structural Change

ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this

### Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s

Chapter 866 Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s Introduction Logistic regression expresses the relationship between a binary response variable and one or

### CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS

CHAPTER 5 FUNCTIONAL FORMS OF REGRESSION MODELS QUESTIONS 5.1. (a) In a log-log model the dependent and all explanatory variables are in the logarithmic form. (b) In the log-lin model the dependent variable

### Modelling Binary Outcomes 21/11/2017

Modelling Binary Outcomes 21/11/2017 Contents 1 Modelling Binary Outcomes 5 1.1 Cross-tabulation.................................... 5 1.1.1 Measures of Effect............................... 6 1.1.2 Limitations

### Goodness of Fit Tests: Homogeneity

Goodness of Fit Tests: Homogeneity Mathematics 47: Lecture 35 Dan Sloughter Furman University May 11, 2006 Dan Sloughter (Furman University) Goodness of Fit Tests: Homogeneity May 11, 2006 1 / 13 Testing

### Ignoring the matching variables in cohort studies - when is it valid, and why?

Ignoring the matching variables in cohort studies - when is it valid, and why? Arvid Sjölander Abstract In observational studies of the effect of an exposure on an outcome, the exposure-outcome association

### STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

### Poisson Regression. Ryan Godwin. ECON University of Manitoba

Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression

### On estimation of the Poisson parameter in zero-modied Poisson models

Computational Statistics & Data Analysis 34 (2000) 441 459 www.elsevier.com/locate/csda On estimation of the Poisson parameter in zero-modied Poisson models Ekkehart Dietz, Dankmar Bohning Department of

### Chapter 10 Nonlinear Models

Chapter 10 Nonlinear Models Nonlinear models can be classified into two categories. In the first category are models that are nonlinear in the variables, but still linear in terms of the unknown parameters.

### Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

### Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

### Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY

Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions Alan J Xiao, Cognigen Corporation, Buffalo NY ABSTRACT Logistic regression has been widely applied to population

### STAT331. Cox s Proportional Hazards Model

STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

### Incorporating published univariable associations in diagnostic and prognostic modeling

Incorporating published univariable associations in diagnostic and prognostic modeling Thomas Debray Julius Center for Health Sciences and Primary Care University Medical Center Utrecht The Netherlands

### Impact of covariate misclassification on the power and type I error in clinical trials using covariate-adaptive randomization

Impact of covariate misclassification on the power and type I error in clinical trials using covariate-adaptive randomization L I Q I O N G F A N S H A R O N D. Y E A T T S W E N L E Z H A O M E D I C

### Assessing the effect of a partly unobserved, exogenous, binary time-dependent covariate on -APPENDIX-

Assessing the effect of a partly unobserved, exogenous, binary time-dependent covariate on survival probabilities using generalised pseudo-values Ulrike Pötschger,2, Harald Heinzl 2, Maria Grazia Valsecchi

### LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

### Sigmaplot di Systat Software

Sigmaplot di Systat Software SigmaPlot Has Extensive Statistical Analysis Features SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical

### Linear Regression Analysis

REVIEW ARTICLE Linear Regression Analysis Part 14 of a Series on Evaluation of Scientific Publications by Astrid Schneider, Gerhard Hommel, and Maria Blettner SUMMARY Background: Regression analysis is

### A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

### Model Estimation Example

Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

### STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

### ABSTRACT INTRODUCTION. SESUG Paper

SESUG Paper 140-2017 Backward Variable Selection for Logistic Regression Based on Percentage Change in Odds Ratio Evan Kwiatkowski, University of North Carolina at Chapel Hill; Hannah Crooke, PAREXEL International

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Logistic regression: Miscellaneous topics

Logistic regression: Miscellaneous topics April 11 Introduction We have covered two approaches to inference for GLMs: the Wald approach and the likelihood ratio approach I claimed that the likelihood ratio

### Statistics in medicine

Statistics in medicine Lecture 1- part 1: Describing variation, and graphical presentation Outline Sources of variation Types of variables Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease

### Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

### Chapter 1 Statistical Inference

Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

### ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

### PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH From Basic to Translational: INDIRECT BIOASSAYS INDIRECT ASSAYS In indirect assays, the doses of the standard and test preparations are are applied

### BAYESIAN ANALYSIS OF DOSE-RESPONSE CALIBRATION CURVES

Libraries Annual Conference on Applied Statistics in Agriculture 2005-17th Annual Conference Proceedings BAYESIAN ANALYSIS OF DOSE-RESPONSE CALIBRATION CURVES William J. Price Bahman Shafii Follow this

### Readings Howitt & Cramer (2014)

Readings Howitt & Cramer (014) Ch 7: Relationships between two or more variables: Diagrams and tables Ch 8: Correlation coefficients: Pearson correlation and Spearman s rho Ch 11: Statistical significance

### Correlation and simple linear regression S5

Basic medical statistics for clinical and eperimental research Correlation and simple linear regression S5 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/41 Introduction Eample: Brain size and

### CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

### Pseudo-score confidence intervals for parameters in discrete statistical models

Biometrika Advance Access published January 14, 2010 Biometrika (2009), pp. 1 8 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asp074 Pseudo-score confidence intervals for parameters

### Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

### Chapter 26 Multiple Regression, Logistic Regression, and Indicator Variables

Chapter 26 Multiple Regression, Logistic Regression, and Indicator Variables 26.1 S 4 /IEE Application Examples: Multiple Regression An S 4 /IEE project was created to improve the 30,000-footlevel metric