INFORMATION AS A UNIFYING MEASURE OF FIT IN SAS STATISTICAL MODELING PROCEDURES

Size: px
Start display at page:

Download "INFORMATION AS A UNIFYING MEASURE OF FIT IN SAS STATISTICAL MODELING PROCEDURES"

Transcription

1 INFORMATION AS A UNIFYING MEASURE OF FIT IN SAS STATISTICAL MODELING PROCEDURES Ernest S. Shtatland, PhD Mary B. Barton, MD, MPP Harvard Medical School, Harvard Pilgrim Health Care, Boston, MA ABSTRACT This paper shows that there is a common denominator for many goodness-of-fit criteria in such popular statistical modeling procedures as REG, LOGISTIC, GENMOD, PHREG and others, and that the most natural common denominator is an information-based measure. The necessity to have some common, universal and easy to interpret criterion is almost obvious. PROC REG has eleven criteria of fit, LOGISTIC has four, GENMOD - nine, and PHREG - three. Some of these criteria look alike, but others look absolutely unrelated. With this abundance of criteria, the questions of which one to use, why and how to interpret the criterion of our choice are far from being trivial. As a unifying criterion, it is proposed to use (together with other criteria as R-Square, deviance, loglikelihood and so on) the information-difference statistic which: a) is common for all the types of regression analysis; b) is easy to interpret in terms of information gain/loss (in bits); c) has a very convenient property of additivity that allows SAS users to evaluate the contribution of an individual covariate or a group of covariates in terms of information. SAS USERS AND SAS STATISTICAL MODELING PROCEDURES One of the most difficult points in using SAS Statistical Modeling Procedures (after the type of the model is chosen) is to understand whether the model we have built is good enough for us in terms of some goodness-of-fit criterion. This means that at a minimum, we have to understand and interpret printouts. The difficulty of this task is compounded because, for example, PROC REG has at least 11 statistics of fit ([1], p. 1369), PROC GENMOD - 9 statistics ([2], p. 273), PROC LOGISTIC - 4 ([1], p. 1088, and [2], pp ), PROC PHREG - 3 ([2], p. 832), PROC MIXED - 6 ([2], pp. 547, ), and so on. The expertise required to understand and interpret properly each of these statistics is not available to all users. Therefore it would be desirable to have a universal criterion with an easy to understand meaning that would be common for many applications. POSSIBLE CANDIDATES FOR THE ROLE OF UNIVERSAL CRITERION OF FIT There are two clear candidates for the role of a universal criterion of fit: R-Square and a statistic based on likelihood. 1. R-SQUARE R-Square may be the most popular statistic of fit. Its popularity is so widespread that you can find it in your Vanguard or Fidelity mutual funds booklets. R-Square dominates in PROC REG and PROC GLM (in the latter it is the sole criterion). R-Square has been added to the original list of goodness-of-fit criteria for PROC LOGISTIC. Also, R-Square is sometimes used in Survival Analysis. Nevertheless, the most natural area of applications of R-Square is linear modeling. Here, R-Square is not just a rescaled measure of variation as it is usually thought of, but a comparison between two models: a null model with intercept only and our current model of interest (see [4], p. 114). 1

2 intercept only) models respectively. Here log is either We should mention also some drawbacks of the natural or binary logarithm. The natural R-Square. The basic shortcomings are: logarithm is used more often with the Gaussian or 1) R-Square always has a positive bias, sometimes normal distribution (in linear regression) and the rather substantial. This is why the ADJUSTED base 2 logarithm with discrete distributions R-Square has been introduced. (binomial, Poisson etc.). 2)There are many interpretations of R-Square; this is not bad in itself, but often leads to significant Now, how can the information concept appear in confusion. equation (1)? It is well-known that for any random As a result, we can find the advice ([3], p. 14): event E we have two numbers: the event s probability It may be easier to interpret 1-R 2 [rather than P(E) and its information I(E), the information R-Square itself]. Perhaps this is the reason why in contained in the spite of all the popularity of R-Square, it is a IE ( ) = log PE ( ) message that E has measure many statisticians love to hate. ([4], p. occurred. These 113). quantities are connected according to the formula 2. LIKELIHOOD (2) This is certainly a very fundamental characteristic, Information is as fundamental a concept as even more fundamental than R-Square. It appears in probability and for a general audience, information the form of the logarithm of the likelihood (log- rather than probability may have more intuitive ease. likelihood) in printouts of almost all nonlinear Sometimes I(E) regression procedures as itself or as a part of AIC, HX ( ) = ( Px ( i)log Px ( i)) is called the i BIC, SC or DEVIANCE (see [1], pp , Shannon 1369; [2], p. 273). But sometimes it is used with the information or Shannon complexity ([8], p. 38). For a sign '+', sometimes with the sign '-'; sometimes you discrete-valued random variable X with the can see the log-likelihood with the factor 2, probability distribution {P(x)} we can define the sometimes without it. This is confusing, especially if entropy as the information average: you don't know whether this statistic, the loglikelihood, is meaningful in itself or it is used instead (3) of likelihood because of some mathematical convenience (by the way, the latter is the most typical For a continuous random variable we will have an explanation). We will find the answer to this question integral instead of the sum. According to these through the concepts of information and entropy formulas, we can think of the right side of (1) as a when trying to find a relationship between R-Square sample estimate of the difference between the entropy and likelihood statistics. characteristics of the null model (b=0) and the model under consideration. The most natural R-SQUARE AND LIKELIHOOD - interpretation of this difference is the information WHAT THEY HAVE IN COMMON gain (IG) we have when moving from the simplest 'null' model to the fitted model ([9]-[11]). Only now It has been noted ([5]-[7]) that there exists a simple we have justified our using the log-likelihood instead formula relating R-Square and log-likelihood of just the likelihood not only from a technical but statistics from a conceptual point of view log(1- R 2 )= 1 [logl(b)-logl(0)] (1) N where N is a sample size, logl(b) and logl(0) denote the log-likelihoods of the fitted and null (with INFORMATION AND EVALUATING SCIENTIFICALLY IMPORTANT DIFFERENCES IN R-SQUARE Whatever criterion of fit we are using, we should distinguish 3 types of differences in our models (see [12], p. 319): (1) numeric differences, (2) statistically 2

3 significant differences, and (3) scientifically conclude that the R-Square scale is extremely important differences. Numerical differences may or inhomogeneous: closer to 1 the scale becomes steeper, may not correspond to significant or important and the change in R-Square from 0.99 to (less differences. For R-Square, the statistical significance than 1%!) has resulted in 1.67 bits. At the same of the difference between two models and two time, the change of R-Square from 0.50 to 0.75 is R-Square values (the so-called uniqueness index, see equivalent to 0.5 bit information. Thus, the [13], p. 407) usually is determined by using F-ratio information gain (IG) can be much more helpful statistics. The authors of the above-mentioned book than the uniqueness index in determining which expressed their concern about the absence of a direct difference is practically or scientifically important relationship between statistical significance and and which one is not. practical or scientific importance of uniqueness indices ([13], p. 440): With a large sample, it is It is not difficult to show that there exists a critical possible to obtain a uniqueness index that is value of R-Square equal to statistically significant even though the predictor variable accounts for only 2% or 3% of the variance 1-1/(2ln2) = in the criterion. When reviewing your results, it is important to assess whether a significant uniqueness below which the uniqueness index slightly overindex is large enough to be of substantive importance. estimates the information gain, and above which it How large is large enough?... When doing underestimates the information increment. With research with a criterion variable that traditionally R-Square closer to 1, this underestimate becomes has been difficult to predict, a uniqueness index of stronger and stronger. But for 0 < R 2 < 0.5, 0.05 (and possibly even smaller) may be viewed as the uniqueness index provides us with a good being of substantive importance. When researching a approximation of the information gain. criterion that is relatively easy to predict, the uniqueness index may have to be larger It is interesting to note that with introducing the to be considered important. A similar point of view information or entropy measure we can not only is expressed in [12], p Various sources measure the importance of changes in our model recommend using a 2%, 3% or 5% level of merit for more precisely and objectively, but we can make some the uniqueness index to be considered important. This fundamental relationships in the theory of regression advice is vague and furthermore may be incorrect in more understandable. For example, in a linear some cases. Below we demonstrate this by showing regression problem with a dependent variable Y and that the only objective way of quantifying an independent variables X 1,...,X K, a well-known important vs. unimportant dilemma is in using and fundamental relationship between R-Square and information statistics. The following table contains conditional partial correlations does not look the values of R-Square and corresponding values of encouraging and easy interpretable (see [15], for information gain (see also, [14], p. 250): example): TABLE R 2 =1-(1-R 2 )(1 - R 2 )... YX 1 YX 2 X 1 (4) R-Square Information Gain (IG)-in bits (1 - R 2 YX K X 1,..., X ) K where R YX1 is the correlation coefficient between Y and X1, the partial correlation coefficient R YX2 is X a measure of the strength of the linear relationship between Y and X 2 after controlling for the variable X 1, and so on. After simple transformations and (We have used the binary logarithm in TABLE 1 to taking the binary (or natural) logarithm, we are measure information in bits). From this data we can getting a very simple and easy to interpret additive 3

4 relationship N K IG(5%) (K+1)/N DIFFERENCE I(Y X 1,...,X K )=I(Y X 1 ) I(Y X 2 )+I(Y X 3 ) +... (5) I(Y X K,...,X K-1 ) Here I(Y X 1, X 2,...,X K ) is the the information about Y contained in all the covariates X 1, X 2,... Thus, using this fact, we can approximately test the, X K ; I(Y X 1 ) is the information about Y contained overall significance of our model in advance, without in X 1 alone; I(Y X 2 ) is the information about Y even referring to the table of critical values for the F- contained in X 2 after controlling for X 1, and so on. distribution (as a preliminary or exploratory analysis - to obtain a rough idea of significance). And to finish our list of benefits from using the information-difference or information-gain statistics Note that the parameter (K+1)/N (or K/N which is instead of the classical R-Square criterion, we will close to it) appears in many cases in statistical mention the following. It is well-known that the most modeling: powerful test of significance for overall regression is 1) in formulas for the statistics JP and PC performed by using an F-statistic which in terms of in PROC REG ([1], p. 1369) and in some variants of R-Square is expressed by the formula (see, for AIC, for example (see [9], pp , [11], p. 276, example [12], p. 126): [17], p. 423) F = N-K-1 [R 2 (1-R 2 )] (6) K AIC =- 2 N logl + 2 K N 2) in determining the importance or lack of where K is the number of explanatory or independent importance of the information gain (see [9], p. 170, variables in the model and N is the number of [11], p. 276); observations. We see that (6) is equivalent to 3) in the formula for the average bias of R-Square (see [15], p. 1031), etc log(1- R 2 K ) = 2log(1 + F N-K-1 ) (7) from which we can find quite a remarkable fact: 5% significance levels for the F-statistic are transformed to the significance levels for the information gain according to the formula IG(5%) = -1/2[log(1 - R-Square)], and these significance levels obey approximately the formula (K + 1)/N. The approximation improves with increase in N as demonstrated in the table below (see also [16]) TABLE 2 Before starting with nonlinear procedures, it is worth noting that the ideas similar to those discussed above can be used in canonical correlation analysis and PROC CANCORR. INFORMATION STATISTICS AS CRITERIA OF FIT IN NONLINEAR MODELS Equation (1) can serve as a prompt for us to introduce and utilize information- measuring statistics in PROC LOGISTIC, GENMOD, PHREG and other SAS statistical procedures for nonlinear modeling. 4

5 Frequently applications of these procedures belong to adequacy is better done by comparison with simpler medicine and health and behavioral sciences ([1], [2], models (for example, the model with intercept only) [12], [18]-[22]. For a typical application of logistic than the saturated model. Thus, it is proposed to use regression and PROC LOGISTIC in health services the statistic (8) rather than (9). The reason is that the research see also [23]. Chi-Square approximation for the deviance distribution is often very poor. In this rather uncertain PROC LOGISTIC situation, an additional recommendation is given ([2], p. 281): a deviance that is small relative to its For PROC LOGISTIC, equation (1) was used to degrees of freedom is a possible indication of a good introduce the generalized coefficient of determination model fit. The meaning of a new characteristic, (see [2], p. 414). The most typically used criterion for Value/DF, that appears in GENMOD printouts, is assessing overall model fit is (-2logL). The difference not explained and remains unclear. Taking into account that typically the sample size N is much 2(logL(b) - logl(0)) (8) greater than the number of explanatory variables K we can see that Value/DF is equal (approximately) to has an approximate Chi-Square distribution with k twice the information difference between the degrees of freedom (which is the only justification of saturated model and the model under consideration, introducing the factor 2 in the criterion). Based on which provides us with a new meaningful this fact, we can find the corresponding p-value and interpretation of this characteristic. test the hypotheses. However, it has been noted (see [24] and references in it) that in spite of its practical utility the chi-square statistic is not an estimate of anything meaningful. At the same time, if we divide CONCLUSIONS (8) by 2N we obtain the familiar expression 1 N [logl(b)-logl(0)] which is the estimate of the information difference between the current fitted model and the model with intercept only. In PROC PHREG we have the same situation. PROC GENMOD The deviance (or scaled deviance) plays a central role in assessment of goodness of fit for PROC GENMOD. It can be defined as 1. The facts discussed above support the suggestion that SAS users in their statistical analyses and especially interpretations should think in terms of the information gain or information differences rather than R-Square and uniqueness indices. The information gain, not the uniqueness index, gives us a real value (in universal information units) of our model or improvements in our model if we add some new explanatory variables. 2. Information statistics provide us with an interpretation bridge between linear and nonlinear regressions. These statistics have the same appearance and meaning in both linear and nonlinear settings. D = 2{logL(saturated) - logl(current)} (9) where L(saturated) is the likelihood of a saturated model that contains as many parameters as there are data points, and L(current) is the likelihood of the fitted model. At the same time SAS does not calculate p-values for the overall model deviances as it recommends ( [2], p. 281) that evaluation of model SAS and SAS/STAT are registered trademarks of SAS Institute Inc. indicates USA registration. REFERENCES 1. SAS Institute Inc. (1990). SAS/STAT User s Guide, Version 6, Fourth Edition, Volume 2, Cary, 5

6 NC: SAS Institute Inc. 2. SAS Institute Inc. (1996). SAS/STAT Software Changes and Enhancements Through Release 6.11, Cary, NC: SAS Institute Inc. 3. SAS Institute Inc. (1988). SAS/STAT User s Guide, Release 6.03 Edition, Cary, NC: SAS Institute Inc. 4. Anderson-Sprecher, R. (1994). Model comparisons and. The American Statistician, 48, R 2 5. Cox, D. R. & Snell, E. J. (1989). The Analysis of Binary Data, Second Edition, London: Chapman and Hall. 6. Maddala, G. S. (1983). Limited-Dependent and Quantitative Variables in Econometrics, Cambridge: University Press. 7. Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, Rissanen J. (1989). Stochastic Complexity in Statistical Inquiry, London: World Scientific. 9. Kent J. T. (1983). Information gain and a general measure of correlation. Biometrika, 70, Kent J. T. & O Quigley, J. (1988). Measures of dependence for censored survival data. Biometrika,75, El Hasnaoui A. & Jais, J. P. (1992). Study and applications of an informational measure of dependence in survival models. Methods of Information in Medicine, 31, Kleinbaum, D. G., Kupper, L. L., & Muller, K. E. (1988). Applied Regression Analysis and Other Multivariable Methods, Second Edition, Boston: PWS-Kent. 13. Hatcher, L. & Stepanski, E. J. (1994). A Stepby -Step Approach to Using the SAS System for Univariate and Multivariate Statistics, Cary, NC: SAS Institute Inc. 14. Theil, H. & Chung, C. F. (1988). Information- theoretic measures of fit for univariate and multivariate linear regression. The American Statistician, 42, Stuart, A. & Ord, J. K. (1991). Advanced Theory of Statistics, Fifth edition, New York: Oxford University Press. 16. Parzen E. (1983). Time series model identification by estimating information. In Studies in Econometric Time Series, New York: Academic Press. 17. Judge, G. G., Griffiths, W. E., Hill, R. C., and Lee, T. (1980). The Theory and Practice of Econometrics, New York: John Wiley & Sons, Inc. 18. Hosmer, D. W. & Lemeshow, S. (1989). Applied Logistic Regression, New York: John Wiley & Sons, Inc. 19. McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models, Second Edition, New York: Chapman and Hall. 20. Dobson, A. J. (1983). An Introduction to Statistical Modelling, London: Chapman and Hall. 21. Dobson, A. J. (1990). An Introduction to Generalized Linear Models, London: Chapman and Hall. 22. Firth, D. (1991). Generalized Linear Models. In Statistical Theory and Modelling. Ed. Hinkley, D.V., Reid, N., and Snell, E. J., London: Chapman and Hall. 23. Barton, M. B., Fletcher, S. W. (1995) Preventive Services for Medicaid Enrollees in an HMO. J Gen Int Med, 10 (Suppl): Akaike, H. (1977). On Entropy Maximization Principle. In Applications of Statistics. Ed. Krishnaiah, P. R., New York: North-Holland Publishing Company. CONTACT INFORMATION: Ernest S. Shtatland Department of Ambulatory Care and Prevention 6

7 Harvard Pilgrim Health Care & Harvard Medical School 126 Brookline Avenue, Suite 200 Boston, MA tel: (617) Mary B. Barton, MD, MPP Department of Ambulatory Care and Prevention Harvard Medical School & Harvard Pilgrim Health Care 126 Brookline Avenue, Suite 200 Boston, MA tel: (617)

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION Ernest S. Shtatland, Ken Kleinman, Emily M. Cain Harvard Medical School, Harvard Pilgrim Health Care, Boston, MA ABSTRACT In logistic regression,

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions

Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions JKAU: Sci., Vol. 21 No. 2, pp: 197-212 (2009 A.D. / 1430 A.H.); DOI: 10.4197 / Sci. 21-2.2 Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions Ali Hussein Al-Marshadi

More information

A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS

A COEFFICIENT OF DETERMINATION FOR LOGISTIC REGRESSION MODELS A COEFFICIENT OF DETEMINATION FO LOGISTIC EGESSION MODELS ENATO MICELI UNIVESITY OF TOINO After a brief presentation of the main extensions of the classical coefficient of determination ( ), a new index

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

COLLABORATION OF STATISTICAL METHODS IN SELECTING THE CORRECT MULTIPLE LINEAR REGRESSIONS

COLLABORATION OF STATISTICAL METHODS IN SELECTING THE CORRECT MULTIPLE LINEAR REGRESSIONS American Journal of Biostatistics 4 (2): 29-33, 2014 ISSN: 1948-9889 2014 A.H. Al-Marshadi, This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license doi:10.3844/ajbssp.2014.29.33

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

DISPLAYING THE POISSON REGRESSION ANALYSIS

DISPLAYING THE POISSON REGRESSION ANALYSIS Chapter 17 Poisson Regression Chapter Table of Contents DISPLAYING THE POISSON REGRESSION ANALYSIS...264 ModelInformation...269 SummaryofFit...269 AnalysisofDeviance...269 TypeIII(Wald)Tests...269 MODIFYING

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA

CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA STATISTICS IN MEDICINE, VOL. 17, 59 68 (1998) CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA J. K. LINDSEY AND B. JONES* Department of Medical Statistics, School of Computing Sciences,

More information

Introduction to the Logistic Regression Model

Introduction to the Logistic Regression Model CHAPTER 1 Introduction to the Logistic Regression Model 1.1 INTRODUCTION Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response

More information

Analysis of Longitudinal Data: Comparison between PROC GLM and PROC MIXED.

Analysis of Longitudinal Data: Comparison between PROC GLM and PROC MIXED. Analysis of Longitudinal Data: Comparison between PROC GLM and PROC MIXED. Maribeth Johnson, Medical College of Georgia, Augusta, GA ABSTRACT Longitudinal data refers to datasets with multiple measurements

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection Model comparison Patrick Breheny March 28 Patrick Breheny BST 760: Advanced Regression 1/25 Wells in Bangladesh In this lecture and the next, we will consider a data set involving modeling the decisions

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

LOGISTICS REGRESSION FOR SAMPLE SURVEYS

LOGISTICS REGRESSION FOR SAMPLE SURVEYS 4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

More information

Estimating terminal half life by non-compartmental methods with some data below the limit of quantification

Estimating terminal half life by non-compartmental methods with some data below the limit of quantification Paper SP08 Estimating terminal half life by non-compartmental methods with some data below the limit of quantification Jochen Müller-Cohrs, CSL Behring, Marburg, Germany ABSTRACT In pharmacokinetic studies

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY

Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions Alan J Xiao, Cognigen Corporation, Buffalo NY ABSTRACT Logistic regression has been widely applied to population

More information

Logistic Regression. Fitting the Logistic Regression Model BAL040-A.A.-10-MAJ

Logistic Regression. Fitting the Logistic Regression Model BAL040-A.A.-10-MAJ Logistic Regression The goal of a logistic regression analysis is to find the best fitting and most parsimonious, yet biologically reasonable, model to describe the relationship between an outcome (dependent

More information

A note on R 2 measures for Poisson and logistic regression models when both models are applicable

A note on R 2 measures for Poisson and logistic regression models when both models are applicable Journal of Clinical Epidemiology 54 (001) 99 103 A note on R measures for oisson and logistic regression models when both models are applicable Martina Mittlböck, Harald Heinzl* Department of Medical Computer

More information

Approximate Median Regression via the Box-Cox Transformation

Approximate Median Regression via the Box-Cox Transformation Approximate Median Regression via the Box-Cox Transformation Garrett M. Fitzmaurice,StuartR.Lipsitz, and Michael Parzen Median regression is used increasingly in many different areas of applications. The

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

\It is important that there be options in explication, and equally important that. Joseph Retzer, Market Probe, Inc., Milwaukee WI

\It is important that there be options in explication, and equally important that. Joseph Retzer, Market Probe, Inc., Milwaukee WI Measuring The Information Content Of Regressors In The Linear Model Using Proc Reg and SAS IML \It is important that there be options in explication, and equally important that the candidates have clear

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction ReCap. Parts I IV. The General Linear Model Part V. The Generalized Linear Model 16 Introduction 16.1 Analysis

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Czech J. Anim. Sci., 50, 2005 (4):

Czech J. Anim. Sci., 50, 2005 (4): Czech J Anim Sci, 50, 2005 (4: 163 168 Original Paper Canonical correlation analysis for studying the relationship between egg production traits and body weight, egg weight and age at sexual maturity in

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next Book Contents Previous Next SAS/STAT User's Guide Overview Getting Started Syntax Details Examples References Book Contents Previous Next Top http://v8doc.sas.com/sashtml/stat/chap29/index.htm29/10/2004

More information

Introduction to Statistical modeling: handout for Math 489/583

Introduction to Statistical modeling: handout for Math 489/583 Introduction to Statistical modeling: handout for Math 489/583 Statistical modeling occurs when we are trying to model some data using statistical tools. From the start, we recognize that no model is perfect

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Li wan Chen, LENDIS Corporation, McLean, VA Forrest Council, Highway Safety Research Center,

More information

FEEG6017 lecture: Akaike's information criterion; model reduction. Brendan Neville

FEEG6017 lecture: Akaike's information criterion; model reduction. Brendan Neville FEEG6017 lecture: Akaike's information criterion; model reduction Brendan Neville bjn1c13@ecs.soton.ac.uk Occam's razor William of Occam, 1288-1348. All else being equal, the simplest explanation is the

More information

RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed. Charlie Liu, Dachuang Cao, Peiqi Chen, Tony Zagar

RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed. Charlie Liu, Dachuang Cao, Peiqi Chen, Tony Zagar Paper S02-2007 RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed Charlie Liu, Dachuang Cao, Peiqi Chen, Tony Zagar Eli Lilly & Company, Indianapolis, IN ABSTRACT

More information

Procedia - Social and Behavioral Sciences 109 ( 2014 )

Procedia - Social and Behavioral Sciences 109 ( 2014 ) Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 09 ( 04 ) 730 736 nd World Conference On Business, Economics And Management - WCBEM 03 Categorical Principal

More information

Sigmaplot di Systat Software

Sigmaplot di Systat Software Sigmaplot di Systat Software SigmaPlot Has Extensive Statistical Analysis Features SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Determining the number of components in mixture models for hierarchical data

Determining the number of components in mixture models for hierarchical data Determining the number of components in mixture models for hierarchical data Olga Lukočienė 1 and Jeroen K. Vermunt 2 1 Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000

More information

Open Problems in Mixed Models

Open Problems in Mixed Models xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For

More information

The GENMOD Procedure (Book Excerpt)

The GENMOD Procedure (Book Excerpt) SAS/STAT 9.22 User s Guide The GENMOD Procedure (Book Excerpt) SAS Documentation This document is an individual chapter from SAS/STAT 9.22 User s Guide. The correct bibliographic citation for the complete

More information

Application of Ghosh, Grizzle and Sen s Nonparametric Methods in. Longitudinal Studies Using SAS PROC GLM

Application of Ghosh, Grizzle and Sen s Nonparametric Methods in. Longitudinal Studies Using SAS PROC GLM Application of Ghosh, Grizzle and Sen s Nonparametric Methods in Longitudinal Studies Using SAS PROC GLM Chan Zeng and Gary O. Zerbe Department of Preventive Medicine and Biometrics University of Colorado

More information

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1

Case of single exogenous (iv) variable (with single or multiple mediators) iv à med à dv. = β 0. iv i. med i + α 1 Mediation Analysis: OLS vs. SUR vs. ISUR vs. 3SLS vs. SEM Note by Hubert Gatignon July 7, 2013, updated November 15, 2013, April 11, 2014, May 21, 2016 and August 10, 2016 In Chap. 11 of Statistical Analysis

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

SAS/STAT 14.2 User s Guide. The GENMOD Procedure

SAS/STAT 14.2 User s Guide. The GENMOD Procedure SAS/STAT 14.2 User s Guide The GENMOD Procedure This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute Inc.

More information

Bias-corrected AIC for selecting variables in Poisson regression models

Bias-corrected AIC for selecting variables in Poisson regression models Bias-corrected AIC for selecting variables in Poisson regression models Ken-ichi Kamo (a), Hirokazu Yanagihara (b) and Kenichi Satoh (c) (a) Corresponding author: Department of Liberal Arts and Sciences,

More information

Bivariate Weibull-power series class of distributions

Bivariate Weibull-power series class of distributions Bivariate Weibull-power series class of distributions Saralees Nadarajah and Rasool Roozegar EM algorithm, Maximum likelihood estimation, Power series distri- Keywords: bution. Abstract We point out that

More information

Statistics 262: Intermediate Biostatistics Model selection

Statistics 262: Intermediate Biostatistics Model selection Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA ABSTRACT Regression analysis is one of the most used statistical methodologies. It can be used to describe or predict causal

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Extensions of Cox Model for Non-Proportional Hazards Purpose

Extensions of Cox Model for Non-Proportional Hazards Purpose PhUSE 2013 Paper SP07 Extensions of Cox Model for Non-Proportional Hazards Purpose Jadwiga Borucka, PAREXEL, Warsaw, Poland ABSTRACT Cox proportional hazard model is one of the most common methods used

More information

Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information

Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information Budi Pratikno 1 and Shahjahan Khan 2 1 Department of Mathematics and Natural Science Jenderal Soedirman

More information

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures SAS/STAT 13.1 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Graphical Procedures, SAS' PROC MIXED, and Tests of Repeated Measures Effects. H.J. Keselman University of Manitoba

Graphical Procedures, SAS' PROC MIXED, and Tests of Repeated Measures Effects. H.J. Keselman University of Manitoba 1 Graphical Procedures, SAS' PROC MIXED, and Tests of Repeated Measures Effects by H.J. Keselman University of Manitoba James Algina University of Florida and Rhonda K. Kowalchuk University of Manitoba

More information

Group Comparisons: Differences in Composition Versus Differences in Models and Effects

Group Comparisons: Differences in Composition Versus Differences in Models and Effects Group Comparisons: Differences in Composition Versus Differences in Models and Effects Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 Overview.

More information

Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta

Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta International Journal of Science and Engineering Investigations vol. 7, issue 77, June 2018 ISSN: 2251-8843 Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in

More information

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev

Treatment Variables INTUB duration of endotracheal intubation (hrs) VENTL duration of assisted ventilation (hrs) LOWO2 hours of exposure to 22 49% lev Variable selection: Suppose for the i-th observational unit (case) you record ( failure Y i = 1 success and explanatory variabales Z 1i Z 2i Z ri Variable (or model) selection: subject matter theory and

More information

GLM models and OLS regression

GLM models and OLS regression GLM models and OLS regression Graeme Hutcheson, University of Manchester These lecture notes are based on material published in... Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist:

More information

ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2

ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2 ARIC Manuscript Proposal # 1186 PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2 1.a. Full Title: Comparing Methods of Incorporating Spatial Correlation in

More information

Comment on Article by Scutari

Comment on Article by Scutari Bayesian Analysis (2013) 8, Number 3, pp. 543 548 Comment on Article by Scutari Hao Wang Scutari s paper studies properties of the distribution of graphs ppgq. This is an interesting angle because it differs

More information

SESUG 2011 ABSTRACT INTRODUCTION BACKGROUND ON LOGLINEAR SMOOTHING DESCRIPTION OF AN EXAMPLE. Paper CC-01

SESUG 2011 ABSTRACT INTRODUCTION BACKGROUND ON LOGLINEAR SMOOTHING DESCRIPTION OF AN EXAMPLE. Paper CC-01 Paper CC-01 Smoothing Scaled Score Distributions from a Standardized Test using PROC GENMOD Jonathan Steinberg, Educational Testing Service, Princeton, NJ Tim Moses, Educational Testing Service, Princeton,

More information

Comparison of Estimators in GLM with Binary Data

Comparison of Estimators in GLM with Binary Data Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 10 11-2014 Comparison of Estimators in GLM with Binary Data D. M. Sakate Shivaji University, Kolhapur, India, dms.stats@gmail.com

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350

REVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350 bsa347 Logistic Regression Logistic regression is a method for predicting the outcomes of either-or trials. Either-or trials occur frequently in research. A person responds appropriately to a drug or does

More information

Generating Half-normal Plot for Zero-inflated Binomial Regression

Generating Half-normal Plot for Zero-inflated Binomial Regression Paper SP05 Generating Half-normal Plot for Zero-inflated Binomial Regression Zhao Yang, Xuezheng Sun Department of Epidemiology & Biostatistics University of South Carolina, Columbia, SC 29208 SUMMARY

More information

KULLBACK-LEIBLER INFORMATION THEORY A BASIS FOR MODEL SELECTION AND INFERENCE

KULLBACK-LEIBLER INFORMATION THEORY A BASIS FOR MODEL SELECTION AND INFERENCE KULLBACK-LEIBLER INFORMATION THEORY A BASIS FOR MODEL SELECTION AND INFERENCE Kullback-Leibler Information or Distance f( x) gx ( ±) ) I( f, g) = ' f( x)log dx, If, ( g) is the "information" lost when

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa

1. Introduction Over the last three decades a number of model selection criteria have been proposed, including AIC (Akaike, 1973), AICC (Hurvich & Tsa On the Use of Marginal Likelihood in Model Selection Peide Shi Department of Probability and Statistics Peking University, Beijing 100871 P. R. China Chih-Ling Tsai Graduate School of Management University

More information

Introduction to Generalized Models

Introduction to Generalized Models Introduction to Generalized Models Today s topics: The big picture of generalized models Review of maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical

More information

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago Mohammed Research in Pharmacoepidemiology (RIPE) @ National School of Pharmacy, University of Otago What is zero inflation? Suppose you want to study hippos and the effect of habitat variables on their

More information

Multiple regression and inference in ecology and conservation biology: further comments on identifying important predictor variables

Multiple regression and inference in ecology and conservation biology: further comments on identifying important predictor variables Biodiversity and Conservation 11: 1397 1401, 2002. 2002 Kluwer Academic Publishers. Printed in the Netherlands. Multiple regression and inference in ecology and conservation biology: further comments on

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Chapter 5. Introduction to Path Analysis. Overview. Correlation and causation. Specification of path models. Types of path models

Chapter 5. Introduction to Path Analysis. Overview. Correlation and causation. Specification of path models. Types of path models Chapter 5 Introduction to Path Analysis Put simply, the basic dilemma in all sciences is that of how much to oversimplify reality. Overview H. M. Blalock Correlation and causation Specification of path

More information

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 Part 1 of this document can be found at http://www.uvm.edu/~dhowell/methods/supplements/mixed Models for Repeated Measures1.pdf

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017 Dimension reduction Exploratory (EFA) Background While the motivation in PCA is to replace the original (correlated) variables

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

SAS/STAT 14.2 User s Guide. Introduction to Analysis of Variance Procedures

SAS/STAT 14.2 User s Guide. Introduction to Analysis of Variance Procedures SAS/STAT 14.2 User s Guide Introduction to Analysis of Variance Procedures This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is

More information

Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure

Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Holzmann, Min, Czado: Validating linear restrictions in linear regression models with general error structure Sonderforschungsbereich 386, Paper 478 (2006) Online unter: http://epub.ub.uni-muenchen.de/

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

Determining Sufficient Number of Imputations Using Variance of Imputation Variances: Data from 2012 NAMCS Physician Workflow Mail Survey *

Determining Sufficient Number of Imputations Using Variance of Imputation Variances: Data from 2012 NAMCS Physician Workflow Mail Survey * Applied Mathematics, 2014,, 3421-3430 Published Online December 2014 in SciRes. http://www.scirp.org/journal/am http://dx.doi.org/10.4236/am.2014.21319 Determining Sufficient Number of Imputations Using

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information