Generating Half-normal Plot for Zero-inflated Binomial Regression
Paper SP05
Zhao Yang, Xuezheng Sun
Department of Epidemiology & Biostatistics, University of South Carolina, Columbia, SC

SUMMARY
The half-normal plot, a valuable tool in model diagnostics, is a statistical graph based on a simulated envelope. Michael Friendly (1998) contributed the macro %halfnormal for generalized linear models, but it supports only the Normal, Binomial, Poisson and Gamma distributions. Since the zero-inflated binomial (ZIB) model was well defined by D. Hall (2000), it has become popular for modeling zero-inflated binomial data, including pharmaceutical data in which, for example, natural immunity produces excess zeros. The half-normal plot is a good way to evaluate the fit of a ZIB model. The macro %halfnormal_zib developed in this paper generates the half-normal plot for ZIB regression and is easy to use in practice. We show its application with a simulated data set and with the Whitefly data set used in Hall's paper.

Keywords: half-normal plot, zero-inflated binomial, proc nlmixed, whitefly dataset

INTRODUCTION
There are many analysis methods for binomial data, such as logistic regression and the probit model. In practice, however, the raw data can contain excess zeros, i.e. the observed number of zeros is larger than the number predicted by the model. We call this phenomenon zero-inflation. In the biomedical field, natural immunity may be a cause of zero-inflation. In that case the zero-inflated binomial (ZIB) model can be used to fit the data. The basic idea behind ZIB is that the data-generating mechanism contains a pure zero state, and the remaining part follows an ordinary binomial distribution.
ZERO-INFLATED BINOMIAL MODEL
The zero-inflated binomial (ZIB) distribution was first introduced by Kemp and Kemp in 1988, but they used ZIB only to highlight some important aspects of empirical probability generating function estimation. Hall (2000) investigated and extended the ZIB model with and without random effects, and gave detailed applications in data analysis. Following Hall (2000), the ZIB regression model is

$$ Y_i \sim \begin{cases} 0 & \text{with probability } p_i \\ \mathrm{Binomial}(n_i, \pi_i) & \text{with probability } 1 - p_i \end{cases} \quad (1) $$

This model implies

$$ Y_i = \begin{cases} 0 & \text{with probability } p_i + (1 - p_i)(1 - \pi_i)^{n_i} \\ k & \text{with probability } (1 - p_i)\binom{n_i}{k}\pi_i^{k}(1 - \pi_i)^{n_i - k}, \quad k = 1, 2, \ldots, n_i \end{cases} \quad (2) $$

with $E(Y_i) = (1 - p_i) n_i \pi_i$ and $Var(Y_i) = (1 - p_i) n_i \pi_i \left(1 - \pi_i (1 - p_i n_i)\right)$. The above probability can also be expressed as a generalized Bernoulli distribution, which makes it easier to combine into a full likelihood,

$$ L = \prod_{i=1}^{n} \left( p_i + (1 - p_i)(1 - \pi_i)^{n_i} \right)^{u_i} \left( (1 - p_i)\binom{n_i}{k}\pi_i^{k}(1 - \pi_i)^{n_i - k} \right)^{1 - u_i} \quad (3) $$

where $k = y_i$ is the observed count, $u_i = 1$ if $y_i = 0$, and $u_i = 0$ otherwise.
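As a quick numerical check of the ZIB probabilities, mean and variance given above, the following minimal Python sketch evaluates the probability function directly and compares the moments with the closed-form expressions. It is an illustration only and is not part of the SAS macro; the function names and the values n = 10, p = 0.3, pi = 0.4 are hypothetical choices made for this example.

```python
from math import comb

def zib_pmf(k, n, p, pi):
    """P(Y = k) for a zero-inflated binomial with zero-inflation probability p."""
    binom = comb(n, k) * pi**k * (1 - pi) ** (n - k)
    if k == 0:
        return p + (1 - p) * binom      # pure-zero state plus binomial zero
    return (1 - p) * binom              # k = 1, ..., n

def zib_mean(n, p, pi):
    """Closed-form mean (1 - p) n pi."""
    return (1 - p) * n * pi

def zib_var(n, p, pi):
    """Closed-form variance (1 - p) n pi (1 - pi (1 - p n))."""
    return (1 - p) * n * pi * (1 - pi * (1 - p * n))

# Hypothetical parameter values, chosen only for illustration.
n, p, pi = 10, 0.3, 0.4
probs = [zib_pmf(k, n, p, pi) for k in range(n + 1)]
mean_direct = sum(k * q for k, q in enumerate(probs))
var_direct = sum(k**2 * q for k, q in enumerate(probs)) - mean_direct**2

print(sum(probs), mean_direct, zib_mean(n, p, pi), var_direct, zib_var(n, p, pi))
```

The moments computed from the probability function agree with the closed-form mean and variance up to floating-point error, which is a useful sanity check before the variance is used in a residual.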
The parameters $\pi = (\pi_1, \ldots, \pi_n)^T$ and $p = (p_1, \ldots, p_n)^T$ are modeled via logit links, $\mathrm{logit}(p) = G\gamma$ and $\mathrm{logit}(\pi) = B\beta$. Hence the log-likelihood for this ZIB model is

$$ l(\gamma, \beta; y) = \sum_{i=1}^{n} \left\{ u_i \log\left( e^{G_i\gamma} + (1 + e^{B_i\beta})^{-n_i} \right) - \log\left(1 + e^{G_i\gamma}\right) + (1 - u_i)\left[ y_i B_i\beta - n_i \log\left(1 + e^{B_i\beta}\right) + \log\binom{n_i}{y_i} \right] \right\} \quad (4) $$

for covariate matrices $B_{n \times p}$ and $G_{n \times q}$, whose $i$-th rows are $B_i$ and $G_i$. Here n is the number of observations, p is the number of covariates in the ordinary binomial regression part, and q is the number of covariates in the zero-inflation part; $\gamma_{q \times 1}$ and $\beta_{p \times 1}$ are the parameter vectors for the zero-inflation part and the binomial regression part, respectively. The EM algorithm or the Newton-Raphson method can be used to obtain the ML estimates. See Hall (2000) for more information. We define the standardized Pearson residual as

$$ r_i^{p} = \frac{y_i - (1 - p_i) n_i \pi_i}{\sqrt{(1 - p_i) n_i \pi_i \left(1 - \pi_i (1 - p_i n_i)\right)}} \quad (5) $$

where $p_i$ and $\pi_i$ are replaced by their fitted values. Having fitted a ZIB model to the raw data, the score test, likelihood ratio test, or Wald test can be used to check whether the ZIB model is better than the ordinary binomial regression model. Even if we are convinced that the raw data are zero-inflated, we still need a tool to evaluate the fit of the ZIB model. The half-normal plot is a good candidate for this model diagnosis.

HALF-NORMAL PLOT
Since the distribution of the residuals is not known, half-normal plots with simulated envelopes are a helpful diagnostic tool (Atkinson, 1985, §4.2; Neter et al., 1996, §14.6; Collett, 2003, §5.2.2). The main idea is to enhance the usual half-normal plot by adding a simulated envelope, which can be used to decide whether the observed residuals are consistent with the fitted model.
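To make the log-likelihood (4) and the standardized Pearson residual (5) concrete, here is a minimal Python sketch that evaluates both for a tiny data set. It is not the PROC NLMIXED code used by the macro; the function names, the design rows and the coefficient values are hypothetical and serve only to show how the terms of (4) and (5) fit together.

```python
from math import comb, exp, log, sqrt

def zib_loglik(gamma, beta, G, B, y, n):
    """Log-likelihood (4): G, B are lists of covariate rows, y counts, n trials."""
    total = 0.0
    for g_row, b_row, yi, ni in zip(G, B, y, n):
        eta_g = sum(c * x for c, x in zip(gamma, g_row))   # G_i gamma
        eta_b = sum(c * x for c, x in zip(beta, b_row))    # B_i beta
        total -= log(1 + exp(eta_g))                       # common to both cases
        if yi == 0:                                        # u_i = 1
            total += log(exp(eta_g) + (1 + exp(eta_b)) ** (-ni))
        else:                                              # u_i = 0
            total += yi * eta_b - ni * log(1 + exp(eta_b)) + log(comb(ni, yi))
    return total

def pearson_residual(yi, ni, p, pi):
    """Standardized Pearson residual (5) for given (fitted) p and pi."""
    mean = (1 - p) * ni * pi
    var = mean * (1 - pi * (1 - p * ni))
    return (yi - mean) / sqrt(var)

# Hypothetical example: intercept plus one covariate in both parts.
G = B = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]
y = [0, 3, 7]
n = [10, 10, 10]
gamma = [-1.0, 0.5]
beta = [0.2, 0.3]

ll = zib_loglik(gamma, beta, G, B, y, n)
res = pearson_residual(5, 10, 0.3, 0.4)
print(ll, res)
```

In practice the coefficients would come from the maximum likelihood fit, and the residual would be computed for every observation from its fitted p and pi.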
Half-normal plots with a simulated envelope can be produced as follows:
(i) fit the model and generate a simulated sample of n independent observations using the fitted model as if it were the true model;
(ii) fit the model to the generated sample, and compute the ordered absolute values of the residuals;
(iii) repeat steps (i) and (ii) k times;
(iv) consider the n sets of the k order statistics; for each set compute its average, minimum and maximum values;
(v) plot these values and the ordered residuals of the original sample against the half-normal scores $\Phi^{-1}\left((t + n - 1/8)/(2n + 1/2)\right)$, $t = 1, \ldots, n$.
The minimum and maximum values of the k order statistics yield the envelope. Atkinson (1985, p. 36) suggests using k = 19, so that the probability that a given absolute residual falls beyond the upper band of the envelope is approximately 1/20 = 0.05. Observations whose absolute residuals fall outside the limits of the simulated envelope are worthy of further investigation. Additionally, if a considerable proportion of points falls outside the envelope, there is evidence against the adequacy of the fitted model.

SAS MACRO %halfnormal_zib
Michael Friendly (1998) contributed the macro %halfnormal for some generalized linear models (GLMs), but it supports only the Normal, Binomial, Poisson and Gamma distributions. In statistical modeling, the adequacy of the model for the data is an important issue, and the half-normal plot is a valuable tool for evaluating it. As the ZIB model becomes more widely applied, a program for generating its half-normal plot is needed. This paper develops the macro %halfnormal_zib for generating the half-normal plot for the ZIB model; the original code is in the appendix. The half-normal plot generated by %halfnormal_zib is based on the standardized Pearson residual.
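Steps (i) to (v) of the envelope procedure can be sketched in a few lines of Python. To keep the sketch short and runnable, it is demonstrated on a deliberately simple toy model, an ordinary binomial with one common success probability and a closed-form fit, and not on the ZIB model or the macro's PROC NLMIXED fit. The callables `fit`, `simulate` and `residuals`, the constant `M`, and the seeds are all hypothetical names and values introduced for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def half_normal_scores(n):
    """Half-normal scores Phi^{-1}((t + n - 1/8) / (2n + 1/2)), t = 1..n."""
    nd = NormalDist()
    return [nd.inv_cdf((t + n - 0.125) / (2 * n + 0.5)) for t in range(1, n + 1)]

def simulated_envelope(y, fit, simulate, residuals, k=19, seed=2006):
    """Return scores, ordered |residuals| and the simulated envelope."""
    rng = random.Random(seed)
    model = fit(y)                                        # fit to original data
    observed = sorted(abs(r) for r in residuals(y, model))
    sims = []
    for _ in range(k):                                    # step (iii)
        y_sim = simulate(model, len(y), rng)              # step (i)
        refit = fit(y_sim)                                # step (ii)
        sims.append(sorted(abs(r) for r in residuals(y_sim, refit)))
    columns = list(zip(*sims))                            # n sets of k order statistics
    return {
        "scores": half_normal_scores(len(y)),
        "observed": observed,
        "lower": [min(c) for c in columns],               # step (iv)
        "mean": [sum(c) / k for c in columns],
        "upper": [max(c) for c in columns],
    }

# Toy model (an assumption for illustration): Binomial(M, pi) with common pi.
M = 20

def fit(y):
    return sum(y) / (M * len(y))                          # MLE of pi

def simulate(pi, n, rng):
    return [sum(rng.random() < pi for _ in range(M)) for _ in range(n)]

def residuals(y, pi):
    sd = sqrt(M * pi * (1 - pi))
    return [(yi - M * pi) / sd for yi in y]

y = simulate(0.4, 30, random.Random(1))
env = simulated_envelope(y, fit, simulate, residuals)
outside = sum(o < lo or o > hi
              for o, lo, hi in zip(env["observed"], env["lower"], env["upper"]))
print("points outside the envelope:", outside)
```

Plotting `observed`, `lower`, `mean` and `upper` against `scores` gives the half-normal plot of step (v); for a ZIB analysis the three callables would be replaced by a ZIB fit, a ZIB sampler and the standardized Pearson residual.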
SIMULATION AND AN EXAMPLE
We first generate a simulated data set to test the macro. The simulation is based on the following schedule:

$$ \mathrm{logit}(p_i) = \log\frac{p_i}{1 - p_i} = g_i\gamma = \beta_0 + \beta_1 x_i \quad (6) $$

$$ \mathrm{logit}(\pi_i) = \log\frac{\pi_i}{1 - \pi_i} = b_i\beta = \alpha_0 + \alpha_1 x_i \quad (7) $$

The code for generating data under this schedule is shown in Figure-1. There is only one covariate, x, in the ZIB model; it can be thought of as a dosage in the biomedical field. The variable m is the number of trials, and yzib is the response variable, i.e. the number of events in m trials.

Figure 1: The program to generate a zero-inflated Binomial random sample based on schedule (6) for the zero-inflation part and schedule (7) for the regular Binomial model; the dataset test has 149 observations.

We then obtain the parameter estimates for the simulated data via PROC NLMIXED, which is integrated in the macro %halfnormal_zib. The following code shows how to use the macro. In general, not all parameters need to be specified; for example, seed, nres and out can be omitted.

    %halfnorm_zib(data = test, resp = yzib, coefg = bp_0 bp_1, coefb = bll_0 bll_1, g = x, b = x, gv = 0 0, bv = 0 0, trials = m, out = pp, seed = 2006, nres = 19);

The parameter estimates are shown in Table-1, where the t-value is an asymptotically normal Wald-type statistic defined as the ratio of the estimate to its standard error. From Table-1, we can see that the estimated parameters are very close to the assumed values. The half-normal plot generated by the macro is shown in Figure-2. All the points fall within the boundary of the simulated envelope, indicating that the fitted model is adequate.

Next, we analyze a real data set, the Whitefly data (Hall, 2000), to show the application of the macro. For more information about the Whitefly dataset, see Hall's paper.
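The simulation schedule for the zero-inflation part and the binomial part can be sketched as follows in Python. This is not the SAS program of Figure-1; the coefficient values, the covariate grid and the number of trials below are hypothetical stand-ins chosen only to produce a runnable example with 149 observations.

```python
import random
from math import exp

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + exp(-eta))

def simulate_zib(x, m, coef_zero, coef_binom, rng):
    """Draw one ZIB response per observation.

    coef_zero  = (intercept, slope) for logit(p_i), the zero-inflation part
    coef_binom = (intercept, slope) for logit(pi_i), the binomial part
    """
    y = []
    for xi, mi in zip(x, m):
        p = inv_logit(coef_zero[0] + coef_zero[1] * xi)
        pi = inv_logit(coef_binom[0] + coef_binom[1] * xi)
        if rng.random() < p:
            y.append(0)                                   # pure-zero state
        else:
            y.append(sum(rng.random() < pi for _ in range(mi)))  # Binomial(m_i, pi_i)
    return y

# Hypothetical settings: 149 observations, covariate on [0, 2], 20 trials each.
rng = random.Random(2006)
x = [2 * i / 148 for i in range(149)]
m = [20] * 149
yzib = simulate_zib(x, m, coef_zero=(-1.0, 0.5), coef_binom=(0.2, 0.3), rng=rng)

print(len(yzib), "observations,", yzib.count(0), "zeros")
```

With these settings a plain binomial would almost never produce a zero, so nearly every zero in `yzib` comes from the pure-zero state, which is the excess-zero pattern the ZIB model is meant to capture.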
PROC NLMIXED has no CLASS statement, so a nominal variable cannot be used directly. Figure-3 therefore shows a program that uses PROC GLMMOD to generate a new dataset containing the required indicator variables for further analysis. For simplicity, and to illustrate an inadequate ZIB model, we include only one variable, treatment, from the original dataset. There are 6 treatment methods, so PROC GLMMOD generates 6 new variables in the dataset. We are interested in whether the treatment method affects the number of surviving whiteflies.
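The indicator-variable step can be illustrated with a short Python sketch that turns a nominal treatment variable into one 0/1 column per level. It only mimics the kind of columns described above and does not reproduce PROC GLMMOD or its output naming; the function name, the `trt` prefix handling and the example treatment codes are hypothetical.

```python
def indicator_columns(values, prefix="trt"):
    """Build one 0/1 indicator per distinct level of a nominal variable."""
    levels = sorted(set(values))
    names = [f"{prefix}_{j}" for j in range(1, len(levels) + 1)]
    rows = [
        {name: int(value == level) for name, level in zip(names, levels)}
        for value in values
    ]
    return names, rows

# Hypothetical treatment codes for eight observations, six treatment levels.
treatment = [1, 2, 3, 4, 5, 6, 6, 1]
names, rows = indicator_columns(treatment)

print(names)
print(rows[0])
```

Each observation has exactly one indicator equal to 1, so the columns can be used in a model formula in place of the nominal variable.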
Table 1: The model fitting information for the simulated data, from schedule (6) for the zero-inflation part and schedule (7) for the regular Binomial model, based on 149 observations.
Columns: Part of ZIB | Parameter | Parameter estimate | Standard error | df | t-value | P-value | 95% CI Lower | 95% CI Upper
Rows: Zero-inflation (β_0, β_1); Regular Binomial (α_0, α_1)

Figure 2: The half-normal plot generated from the macro %halfnormal_zib using the standardized Pearson residual. The graph indicates that the ZIB model fits the data reasonably well.

The application of the macro %halfnormal_zib is also shown in Figure-3. The original treatment variable becomes 6 variables, trt_1, trt_2, trt_3, trt_4, trt_5 and trt_6, where trt_6 is the control group. The parameter estimates are shown in Table-2, from which we can see that the treatments, except the control group, have a significant effect on the number of surviving whiteflies. To compare different treatments, we can use the ESTIMATE statement in PROC NLMIXED. For example,

    estimate "bll_1 = bll_2" bll_1 - bll_2;

tests whether the effect of treatment 1 differs from that of treatment 2 in controlling the number of surviving whiteflies. This has to be done in a separate PROC NLMIXED program, not with this macro. Although Table-2 provides some interesting information about the data, we still have to check whether the ZIB model fits the data adequately, and Figure-4 gives a negative answer. Many of the points fall outside the boundary of the simulated envelope, indicating that some important variables have not been controlled for in the model. Adequate models are considered in Hall's paper.
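The treatment comparison described above amounts to a Wald-type test of a difference between two coefficients. The Python sketch below shows the arithmetic under the usual large-sample assumptions; it is not output from PROC NLMIXED, and the estimates, variances and covariance are made-up numbers used only to illustrate the calculation.

```python
from math import sqrt

def wald_difference(est_a, est_b, var_a, var_b, cov_ab):
    """Difference of two estimates, its standard error and the Wald ratio."""
    diff = est_a - est_b
    se = sqrt(var_a + var_b - 2 * cov_ab)   # Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)
    return diff, se, diff / se

# Hypothetical values standing in for two treatment coefficients.
diff, se, t_value = wald_difference(est_a=1.2, est_b=0.7,
                                    var_a=0.04, var_b=0.05, cov_ab=0.01)
print(diff, se, t_value)
```

The covariance term matters: ignoring it and simply adding the two variances would misstate the standard error whenever the two estimates are correlated.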
Figure 3: Program to generate a new dataset from the Whitefly dataset by using PROC GLMMOD, and the usage of the macro %halfnormal_zib to generate the half-normal plot for diagnosis.

Table 2: The model fitting information for the Whitefly data, which has 640 observations. Only one variable, treatment, from the original dataset is included in the model.
Columns: Part of ZIB | Parameter | Parameter estimate | Standard error | df | t-value | P-value | 95% CI Lower | 95% CI Upper
Rows: Zero-inflation (seven bp parameters); Regular Binomial (seven bll parameters)

DISCUSSION AND SUMMARY
In this article, we have adapted Friendly's macro %halfnormal to zero-inflated binomial regression and developed a new macro, %halfnormal_zib, to generate the half-normal plot for ZIB model diagnosis. The half-normal plot generated by %halfnormal_zib is based only on the standardized Pearson residual, and the source code is attached in the appendix. You can change it to another kind of residual to generate the corresponding half-normal plot. Also, this macro gives you only the parameter estimates and the half-normal plot; it does not perform comparisons, i.e. it does not use the ESTIMATE or CONTRAST statements of PROC NLMIXED. The macro %halfnormal_zib is designed for general application, but for the simulated data we set the order option in the AXIS statement. In your application, you can delete this option and run the macro to find the axis scale for your plot, and then reset the order option to produce a good-looking graph for your work.
You also need to pay attention to the origin option in the LEGEND statement. On different computers, SAS® will put the legend in a different location on the graph, so you may need to make small adjustments for your computer.

Figure 4: The half-normal plot generated from the model fitted to the Whitefly data. Many points fall outside the boundary of the envelope, indicating that the ZIB model with only one variable cannot fit the data reasonably well. We can then include more variables in the model or use the mixed-effects ZIB model to fit the data, as shown by D. Hall (2000).

References
[1] Atkinson, A.C. (1985). Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. New York: Oxford University Press.
[2] Kemp, C.D. and Kemp, A.W. (1988). Rapid estimation for discrete distributions. The Statistician, 37:
[3] Collett, D. (2003). Modelling Binary Data, 2nd ed. New York: Chapman & Hall/CRC.
[4] Hall, D.B. (2000). Zero-inflated Poisson and Binomial regression with random effects: A case study. Biometrics, 56:
[5] Friendly, M. (1998),
[6] Neter, J., Kutner, M.H., Nachtsheim, C.J. and Wasserman, W. (1996). Applied Linear Statistical Models, 4th ed. Chicago: Irwin.

ACKNOWLEDGEMENTS
The authors thank Dr. Daniel B. Hall for permission to use the Whitefly data set, and also thank Toby Dunn, whose suggestions were valuable to this paper.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Zhao Yang
Department of Epidemiology & Biostatistics, University of South Carolina
Columbia, SC
Xuezheng Sun
Department of Epidemiology & Biostatistics, University of South Carolina
Columbia, SC

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. This document is generated by LaTeX. Other brand and product names are trademarks of their respective companies.
APPENDIX
This part contains the original code for the macro %halfnormal_zib. The code is shown in several figures, Figure-5 to Figure-13, each with some comments.

Figure 5: Definition of the macro %halfnormal_zib. The parameters of the macro are defined above, and a new dataset ZIBD is generated for further analysis.

Figure 6: We define three macros: %coefvar, which is used to generate the model formula in PROC NLMIXED; %coefval, which is used to generate the expression in the PARMS statement of PROC NLMIXED; and %nwords, which is used to set up some conditions for using this macro.

Figure 7: Conditions that prevent the user from misusing this macro: the number of coefficients in the model should equal the number of initial values, and should be one more than the number of variables, in both parts of the ZIB model. The response variable and the trials variable must also be specified.

Figure 8: Using PROC NLMIXED, we fit the ZIB model to the dataset. We present the log-likelihood function and the probability function for the model. You can add more options to the PROC NLMIXED statement.

Figure 9: We first generate two datasets from PROC NLMIXED and then calculate the standardized Pearson residual. We then generate a new dataset from the fitted ZIB model, which contains 19 replications by default.

Figure 10: Using PROC NLMIXED, we refit the model to each of the 19 generated response variables.

Figure 11: Panel A calculates the residuals from the 19 fitted models; in Panel B, we combine all the generated datasets into a new dataset for further processing.

Figure 12: This part also contains Panel A and Panel B; both are extracted from the macro %halfnormal contributed by M. Friendly (1998).

Figure 13: This part generates the half-normal graph for ZIB diagnosis. You can modify the LEGEND, AXIS and SYMBOL statements as needed.
More informationGMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM
Paper 1025-2017 GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM Kyle M. Irimata, Arizona State University; Jeffrey R. Wilson, Arizona State University ABSTRACT The
More informationGeneralized Linear Modeling - Logistic Regression
1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating
More informationR Hints for Chapter 10
R Hints for Chapter 10 The multiple logistic regression model assumes that the success probability p for a binomial random variable depends on independent variables or design variables x 1, x 2,, x k.
More informationMantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC
Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)
More informationPower calculation for non-inferiority trials comparing two Poisson distributions
Paper PK01 Power calculation for non-inferiority trials comparing two Poisson distributions Corinna Miede, Accovion GmbH, Marburg, Germany Jochen Mueller-Cohrs, Accovion GmbH, Marburg, Germany Abstract
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the
More informationBiostat Methods STAT 5820/6910 Handout #5a: Misc. Issues in Logistic Regression
Biostat Methods STAT 5820/6910 Handout #5a: Misc. Issues in Logistic Regression Recall general χ 2 test setu: Y 0 1 Trt 0 a b Trt 1 c d I. Basic logistic regression Previously (Handout 4a): χ 2 test of
More informationOverdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion
Biostokastikum Overdispersion is not uncommon in practice. In fact, some would maintain that overdispersion is the norm in practice and nominal dispersion the exception McCullagh and Nelder (1989) Overdispersion
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More informationSTAT 705 Nonlinear regression
STAT 705 Nonlinear regression Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 1 Chapter 13 Parametric nonlinear regression Throughout most
More informationLogistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression
Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationSTA6938-Logistic Regression Model
Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of
More informationComparing Priors in Bayesian Logistic Regression for Sensorial Classification of Rice
SAS 1018-2017 Comparing Priors in Bayesian Logistic Regression for Sensorial Classification of Rice Geiziane Oliveira, SAS Institute, Brazil; George von Borries, Universidade de Brasília, Brazil; Priscila
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationjh page 1 /6
DATA a; INFILE 'downs.dat' ; INPUT AgeL AgeU BirthOrd Cases Births ; MidAge = (AgeL + AgeU)/2 ; Rate = 1000*Cases/Births; (epidemiologically correct: a prevalence rate) LogRate = Log10( (Cases+0.5)/Births
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationGeneralized linear models
Generalized linear models Christopher F Baum ECON 8823: Applied Econometrics Boston College, Spring 2016 Christopher F Baum (BC / DIW) Generalized linear models Boston College, Spring 2016 1 / 1 Introduction
More information7.1, 7.3, 7.4. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis 1/ 31
7.1, 7.3, 7.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 31 7.1 Alternative links in binary regression* There are three common links considered
More informationMODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA
J. Jpn. Soc. Comp. Statist., 26(2013), 53 69 DOI:10.5183/jjscs.1212002 204 MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA Yiping Tang ABSTRACT Overdispersion is a common
More informationLattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)
Title: Spatial Statistics for Point Processes and Lattice Data (Part III) Lattice Data Tonglin Zhang Outline Description Research Problems Global Clustering and Local Clusters Permutation Test Spatial
More informationAnalyzing Residuals in a PROC SURVEYLOGISTIC Model
Paper 1477-2017 Analyzing Residuals in a PROC SURVEYLOGISTIC Model Bogdan Gadidov, Herman E. Ray, Kennesaw State University ABSTRACT Data from an extensive survey conducted by the National Center for Education
More informationGeneralized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationSAS/STAT 13.1 User s Guide. Introduction to Mixed Modeling Procedures
SAS/STAT 13.1 User s Guide Introduction to Mixed Modeling Procedures This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is
More informationRobust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study
Science Journal of Applied Mathematics and Statistics 2014; 2(1): 20-25 Published online February 20, 2014 (http://www.sciencepublishinggroup.com/j/sjams) doi: 10.11648/j.sjams.20140201.13 Robust covariance
More informationThe Steps to Follow in a Multiple Regression Analysis
ABSTRACT The Steps to Follow in a Multiple Regression Analysis Theresa Hoang Diem Ngo, Warner Bros. Home Video, Burbank, CA A multiple regression analysis is the most powerful tool that is widely used,
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationConfidence intervals for the variance component of random-effects linear models
The Stata Journal (2004) 4, Number 4, pp. 429 435 Confidence intervals for the variance component of random-effects linear models Matteo Bottai Arnold School of Public Health University of South Carolina
More informationSAS/STAT 14.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures
SAS/STAT 14.2 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual
More informationTECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study
TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,
More informationApproximation of Survival Function by Taylor Series for General Partly Interval Censored Data
Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor
More informationThe betareg Package. October 6, 2006
The betareg Package October 6, 2006 Title Beta Regression. Version 1.2 Author Alexandre de Bustamante Simas , with contributions, in this last version, from Andréa Vanessa Rocha
More informationImplementation of Pairwise Fitting Technique for Analyzing Multivariate Longitudinal Data in SAS
PharmaSUG2011 - Paper SP09 Implementation of Pairwise Fitting Technique for Analyzing Multivariate Longitudinal Data in SAS Madan Gopal Kundu, Indiana University Purdue University at Indianapolis, Indianapolis,
More informationThe GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next
Book Contents Previous Next SAS/STAT User's Guide Overview Getting Started Syntax Details Examples References Book Contents Previous Next Top http://v8doc.sas.com/sashtml/stat/chap29/index.htm29/10/2004
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationUNIVERSITY OF CALIFORNIA, SAN DIEGO
UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department
More informationBinary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017
Binary Regression GH Chapter 5, ISL Chapter 4 January 31, 2017 Seedling Survival Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which
More informationThe Simulation Extrapolation Method for Fitting Generalized Linear Models with Additive Measurement Error
The Stata Journal (), Number, pp. 1 12 The Simulation Extrapolation Method for Fitting Generalized Linear Models with Additive Measurement Error James W. Hardin Norman J. Arnold School of Public Health
More information