Using PROC GENMOD to Model Adverse Event Counts in a Health Care Setting

John Ulicny, Fox Chase - Temple Bone Marrow Transplant Program, Philadelphia, PA
Thomas R. Klumpp, MD, Fox Chase - Temple Bone Marrow Transplant Program, Philadelphia, PA

ABSTRACT

Statistical models for adverse events have been developed as part of a quality management initiative at the Fox Chase - Temple Bone Marrow Transplant Program in Philadelphia, PA. Their purpose is to enable the transplant team to compare recent adverse event counts to what should be expected based on the patient population currently under care by the team. The count of adverse events is the response variable in the regression models. Generalized linear modeling, as implemented in the GENMOD procedure, is an effective tool for performing regression analysis on a response of this type. Unlike ordinary least squares (OLS), it can be applied to a wide range of non-normal responses as long as they come from the natural exponential family of distributions and meet certain other assumptions.

The models in this paper were developed using Version of SAS. This version allows for enhanced, integrated graphical assessment of the model via the ODS Statistical Graphics facility, which is built into various statistical procedures, GENMOD being one. The ODS Statistical Graphics facility is experimental in this release. Some examples of its use are provided in this paper.

INTRODUCTION

Hematopoietic Cell Transplantation (HCT) is performed on patients for a variety of severe illnesses, typically affecting the bone marrow and circulating blood. Bone Marrow Transplant (BMT) is an older term often used interchangeably with HCT, and for the purposes of this paper no distinction between the two terms is made.
The vast majority of patients transplanted at the Fox Chase - Temple Bone Marrow Transplant Program have hematologic malignancies such as Non-Hodgkin's Lymphoma (NHL), Hodgkin's Disease (HD), Multiple Myeloma (MM), Acute Myelogenous Leukemia (AML) and others. Transplantation for some types of solid tumors is done as well, although these constitute a small minority of cases. Due to the high severity of most of these illnesses, high dose chemotherapy and radiation are typically administered to treat the cancer, but this results in the destruction of the bone marrow, which must then be replaced. This process is prone to generating a variety of toxicities, many of which are quite severe or even fatal. Common examples of HCT-related toxicities are anemia, nausea, diarrhea and mucositis.

Three main types of HCT are useful for predicting adverse events: autologous, allogeneic-related and allogeneic-unrelated. In an autologous transplant, a patient's own blood stem cells are harvested before the preparative regimen (i.e., chemotherapy and/or radiation) is administered. The stem cells are then reinfused in order to reconstitute the marrow. In an allogeneic-related HCT, a member of the patient's family is the donor. The regimen is administered and then the related donor's cells are infused into the patient. The third form of HCT is required when a compatible donor cannot be found within the patient's family. This is the allogeneic-unrelated HCT, which involves a search through an international registry of donors to find a match. The matched unrelated donor's (MUD) cells are then infused after the regimen is administered. Autologous transplants are the most common and least risky form of transplant, followed by matched related transplants. The riskiest transplants are the allogeneic-unrelated transplants.

The toxicities associated with each transplant are recorded by the data management team at the Fox Chase - Temple BMT Program.
They are graded according to the Eastern Cooperative Oncology Group (ECOG) standard grading system. The grades specified by the ECOG standard are ordinal, ranging from zero to five. A grade of zero indicates no event, and a grade of five indicates that the event was fatal. A toxicity of grade 3 or higher is considered an adverse event in the model. The event descriptions are listed in Table 1.

Table 1. Toxicity Frequencies (An Adverse Event is a Toxicity of Grade 3 or Higher)

Event Grade   Description of Toxicity   Aggregate Frequency ( )
1             Mild                      1,737
2             Moderate
3             Severe
4             Life threatening
5             Fatal                     19
-             Total Toxicities          3,900

The goal of the model described in this paper is to compute a reasonable expected number of adverse events each month based on the characteristics of the patients under our care. Based on the information in Table 1 there have been ( )/60 ≈ 21 adverse events per month on average, based on a transplant volume of about 5 patients per month during the time frame.

GENERALIZED LINEAR MODELS AND PROC GENMOD

There have been many excellent papers and books written about generalized linear models. For a thorough technical discussion see the books by Agresti [1] or Myers [2]. For sources that describe using PROC GENMOD for generalized linear models see Allison [3] or Stokes et al. [5]. The discussion here is designed as a tutorial for those who have little or no familiarity with this procedure.

In multiple linear regression, a response variable Y is related linearly to a set of X-variables as

$$Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{(p-1)i} + \varepsilon_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i \qquad (1)$$

for i = 1, 2, ..., n observations. The errors from this model are assumed to be independent with zero mean and constant variance. The assumption of error normality is usually also added to enable one to construct hypothesis tests and confidence intervals for the parameters.

A generalized linear model also describes a relationship between a response variable and one or more independent variables; however, the relationship may be much more complex than a simple linear one. As described by Myers [2], the generalized model is made up of three components.

The random component. This component consists of the response variable Y with observed values Y_1, Y_2, ..., Y_n. These observations are mutually independent and come from a natural exponential family.
This family is of the general form given in equation (2), where the vector θ_i may vary depending on the values of the covariates.

$$f(y_i; \theta_i) = a(\theta_i)\, b(y_i) \exp[\, y_i\, c(\theta_i)\,] \qquad (2)$$

For instance, a special case of this family is the Bernoulli distribution with one parameter, given by

$$f(y_i; p_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} = (1 - p_i) \exp\left[ y_i \log\left( \frac{p_i}{1 - p_i} \right) \right] \qquad (3)$$

The parameters in equations (2) and (3) are indexed by i because they can vary as a function of the covariates. In other words, because each observation can have different covariate values, the estimate of θ_i can be different for each observation due to the dependence relationship specified in the model. In a simple linear regression it is only the response mean that can vary, but in a generalized linear model the variance can vary as well. The assumption of variance homogeneity can therefore be relaxed, although it is important to understand how the variance depends on the model data. Keep in mind that there is a distinction between the parameters of the response distribution, represented by θ_i, and the parameters of the regression equation covariates, represented by β.

The systematic component. This is a function of the X_ij that is linear in the parameters. If Y depends on several covariates then X (of dimension n × p) is often called the design matrix, with n observations corresponding to p variables, possibly including an intercept. These explanatory variables can be combinations of continuous variables, categorical variables and interactions. The function results in a vector called the linear predictor:

$$\boldsymbol{\eta}_{n \times 1} = \mathbf{X}_{n \times p}\, \boldsymbol{\beta}_{p \times 1} \qquad (4)$$

The link function. The final component is the link function g. All link functions must be monotonic and differentiable, and they are often non-linear. This function relates the first two components to each other by specifying that

$$\eta_i = g(\mu_i) = g[E(Y_i)] \qquad (5)$$

The normal distribution is a special case of the natural exponential family, and the assumption of normality plays a key role in the process of estimating and evaluating simple linear and multiple regression models. In the present case, however, where adverse event counts need to be modeled, other distributions from the exponential family are required to adequately represent the special nature of the data. One obvious requirement of the distribution is that it does not allow for a negative count value. A probability distribution for a count variable also will not typically have a constant variance. The variance of a count variable usually increases as its mean increases. The Poisson distribution is therefore a good choice because its variance equals its mean, and so any model with a Poisson response will accommodate such data. Note that this contrasts with the case of multiple regression, where homogeneity of variance is assumed. One cannot have homogeneity of variance in a Poisson regression unless the mean always stays the same!
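As a worked check of how the Poisson distribution fits the natural exponential family form of equation (2), the sketch below rewrites its probability function in that form. The symbol μ_i for the Poisson mean follows equation (5); the identification of a, b and c is the standard textbook result rather than something stated in this paper.

```latex
f(y_i;\mu_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}
             = \underbrace{e^{-\mu_i}}_{a(\mu_i)}\;
               \underbrace{\frac{1}{y_i!}}_{b(y_i)}\;
               \exp\bigl[\,y_i \underbrace{\log \mu_i}_{c(\mu_i)}\bigr],
\qquad
E(Y_i) = \mathrm{Var}(Y_i) = \mu_i .
```

With the log link, η_i = log μ_i coincides with c(μ_i), and the fitted mean μ_i = exp(x_i'β) is always positive, which is what a count response requires.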
The negative binomial distribution is another member of the natural exponential family that is useful in the context of modeling count data. This distribution, although significantly more complicated than the Poisson, can handle the situation called overdispersion, in which the variance of the count variable is actually greater than its mean. The adverse events model illustration will use the Poisson model as well as the negative binomial model to demonstrate the two different approaches.

In the standard regression case, where the regression response is normally distributed and the other regression assumptions mentioned above are met, ordinary least squares (OLS) can be used to arrive at parameter estimates that are unbiased as well as being maximum likelihood estimates (MLEs). They are also usually referred to as Best Linear Unbiased Estimators (BLUE). With non-normal responses, however, it is necessary to use an algorithm such as iteratively re-weighted least squares instead of OLS to arrive at estimates that are approximate MLEs. These estimators are not guaranteed to be unbiased as in the normal response case; however, they will tend to have the smallest variance possible while maximizing the probability that the sample obtained was actually drawn from a distribution with parameter values equal to the estimates that were generated. This is why some GENMOD output will mention iteratively re-weighted least squares as the estimation technique used.

CHARACTERISTICS OF THE DATA

There is a significant amount of very high quality data collected on each patient treated by the BMT team. The main file (which we call the core file) consists of one observation for each patient-protocol combination. For example, one patient might be registered onto three treatment protocols over a period of time, and only one of these protocols will actually involve a transplant.
Nevertheless, that patient will have three observations in the database and may be at risk for an adverse event under any of the protocols. In reality the risk of an adverse event on any protocol other than a transplant protocol is quite small, and so only patients registered for transplants are included in the model.

We collect and quality-assure dozens of variables, including patient demographic information, disease status, treatment details, outcomes and more. SAS/AF and SCL are used to input the information into a dataset. This dataset is then checked at least once per day by an expert system written in Base SAS. This system checks the relationships of key variables to one another, and it also checks whether the magnitudes of the most important variables are reasonable within the context of the patient's current status. For instance, the code in Figure 1 prints a warning message if the data manager enters a suspicious value for the creatinine level of a transplanted patient.

   /* CHECK CREATININE LEVELS */
   TITLE 'WARNING (CREAT) CREATININE LESS THAN .1 OR GREATER THAN 4.0';
   TITLE2 'WHEN BMT=YES';
   PROC PRINT DATA=BMT.CORE;
      WHERE ((CREAT<.1) OR (CREAT>4.0)) AND BMT='YES';
      VAR ROWNUM NAME TYPE TUPN DIGNOSIS BMT BMTDAT CREAT;
   RUN;

We have defined a reporting period to be approximately one month. This coincides with our monthly quality management meetings and with the historical cycle of monthly permanent backups of the core file. Although our current backup strategy involves multiple daily, weekly and monthly backups, some of the old monthly backups were not done at precise monthly intervals, and so we must recognize and adjust for the fact that the exposure to adverse event risk is different in these periods.

THE VARIABLES USED

The BMT team has limited resources available to collect adverse event data, and so we cannot record the exact date or exact cause of the events. (Sometimes it is virtually impossible to pin down such information regardless of the resources being devoted to the task.) In addition, only the maximum grade experienced by a patient within each particular adverse event category is captured. For these reasons, we make assumptions regarding the distribution of the events between the ending dates of each reporting period. In particular, it is assumed that an event reported as of the end of a reporting period was equally likely to have happened on any day within that period on which the patient was under treatment. If the reporting period was 30 days and the patient was being treated under a protocol for the entire period, then the probability is 1/30 that the event happened on any particular day within the period.

Using this assumption, it was possible to construct a graph of the events as a function of days from the initiation of treatment for each patient. In virtually all cases the initiation of treatment is the start of the conditioning regimen, i.e., chemotherapy with or without radiation.
The interest of the quality management team is highly focused on extremely severe adverse events. A plot of toxicities expressed as a rate per 100 patients per day is shown in Chart 1, with autologous, allogeneic-related and allogeneic-unrelated (MUD) events plotted separately.

RESPONSE VARIABLE: ADVERSE EVENTS (AECOUNT)

The adverse event variable is defined as the number of adverse events experienced by a patient in the reporting period under study. Each patient/month combination is a separate observation.

Chart 1. Estimated Historical Toxicity Rate (vertical axis: toxicity rate per 100 surviving patients per day; horizontal axis: days after Rx initiated; series: Allogeneic - MUD, Allogeneic - Other, Autologous)

COVARIATE 1 - TEMPORAL RISK INDEX (TRI)

Chart 1 shows graphically the tendency of the risk to start low and rise to a peak at about 50 to 75 days, depending on the transplant type. The risk tends to trail off at about 100 days, gradually subsiding over the subsequent 400-day period. This result is extremely reasonable based on clinical understanding. The process of conditioning and cell infusion occurs in the early stages of treatment and then, ideally, the gradual recovery of the patient commences once the blood cells, particularly neutrophils and platelets, start to be produced by the patient without the need for artificial support.

We attempted to fit a number of candidate curves to this data in order to find a curve that was simple, intuitive and a good fit to all 3 types of transplants. Because of the apparent curvilinear risk shape in the early stage, and the tapering off of risk after day 100, an inverse quadratic curve emerged as the best fit to the data. A predictor called the temporal risk index (TRI) was constructed based on this curve; however, we knew that temporal risk was not the only factor affecting adverse events! The inverse quadratic equation as well as a plot of the fit to autologous toxicities is displayed below.

Despite its relatively simple structure, this curve was able to explain 95% of the toxicity variability in the AUTO, 85% of the variability in the ALLO-OTHER, and 72% of the variability in the ALLO-MUD transplants, and it yielded reasonably consistent parameter estimates across the different types of transplants as well as for different subsets of data. These characteristics reduce the possibility that the curves are simply fitting noise. We are also attempting to obtain data from national and international BMT registries to further validate this functional form for the relationship.

Chart 2. Inverse Quadratic Fit to Toxicity Data

This curve can be fit using PROC NLIN in SAS.
We used all toxicities instead of just adverse events in this stage in order to get a greater volume of data to work with. The resulting parameters are listed in Table 2 below.

Table 2. Coefficients Estimated in the Inverse Quadratic Fit: $\text{rate} = 1/(a + bt + ct^2)$

Coefficient   AUTO   ALLO - other   ALLO - MUD
a
b
c
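The inverse quadratic fit described above might be set up in PROC NLIN along the following lines. This is a minimal sketch, not the authors' code: the dataset TOXRATE, the variables T and RATE, the output names, and the starting values are all hypothetical placeholders chosen only so that the denominator stays positive.

```sas
/* Hypothetical input: one row per day T since treatment initiation,   */
/* with RATE = observed toxicities per 100 surviving patients per day. */
/* Dataset name, variable names and starting values are assumptions.   */
PROC NLIN DATA=TOXRATE METHOD=MARQUARDT;
   PARMS A=2 B=-0.02 C=0.0002;            /* placeholder starting values       */
   MODEL RATE = 1 / (A + B*T + C*T**2);   /* inverse quadratic form of Table 2 */
   OUTPUT OUT=TOXFIT PREDICTED=RATEHAT;   /* fitted curve, e.g. for plotting   */
RUN;
```

For c > 0 a curve of this form peaks where the denominator is smallest, at t = -b/(2c), so the fitted coefficients can be checked against the peak visible in Chart 1. A separate fit per transplant type would be needed to produce the three columns of Table 2.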

COVARIATE 2 - LENGTH OF TIME AS INPATIENT (LTIP)

An explanatory variable based on the inpatient length of stay (LOS) was included as a proxy for illness severity and complications during the admission. It is defined as:

   LTIP = 0 if not yet discharged in the current reporting period
   LTIP = LOS for the transplant admission if discharged during or prior to the current reporting period

COVARIATE 3 - FOLLOW UP FLAG (FLFLAG)

Adverse events are recorded into the database as they are observed by the transplant team. The frequency of follow-ups typically diminishes over time. As we have fewer chances to observe a patient at distant time points after transplant, any adverse events are more likely to be recorded in the month in which a follow up visit has occurred. This variable is a flag to indicate whether or not a follow up visit has occurred in the current reporting period.

COVARIATE 4 - OBSERVER FLAG (OBSERVER)

A personnel change took place in the BMT program during which the responsibility for assigning toxicity grades changed from one group of observers to another. This kind of change almost always introduces an observer bias, and our adverse events data capture is no exception. Therefore a dummy variable called OBSERVER was included in the model to account for the bias.

COVARIATE 5 - DIAGNOSIS/REGIMEN FLAG (DREG)

One combination of diagnosis and regimen that was suspected to be particularly prone to generating adverse events was renal cell carcinoma patients who received fludarabine and total body irradiation as a preparative regimen. This indicator variable was included to assess the impact of this particular situation on adverse event risk.

COVARIATE 6 - TRANSPLANT YEAR (BMTYR)

Over time, improvements in transplant technology, training and process control should enable us to reduce the incidence of adverse events after adjusting for the other covariates.
Examples of improvements in the area of supportive care include improved antibiotics and antiviral medications and better screening for viruses so that we can treat problems sooner. Treating problems sooner can keep toxicities that are at level 1 or 2 from becoming severe toxicities (i.e., adverse events). Including the transplant year variable allows us to test for evidence of this effect, and if there is improvement over time, it can indicate approximately how much improvement has occurred.

MODELS PRODUCED BY GENMOD

We will now present two models that can be produced by PROC GENMOD to analyze the adverse event data. The first is the Poisson model and the second is the negative binomial model. There are many other approaches to modeling this data that may or may not work better; however, this paper will focus on these two approaches to illustrate and contrast the techniques as implemented in the GENMOD procedure. For a discussion of the event history approach and PROC PHREG see chapter 5 in Allison [3], and for a description of a new SAS procedure called GLIMMIX that can estimate generalized linear mixed models see the SUGI 30 paper by Oliver Schabenberger [4] of the SAS Institute. A key thing to remember is that all of these approaches will be superior to using standard multiple regression for the reasons stated above.

The ODS statements invoke ODS statistical graphics, which is experimental in this version of SAS. The examples below show how to evaluate the link function by using the ASSESS statement within the GENMOD procedure. The ASSESS statement works in tandem with ODS statistical graphics to produce the assessment graphics (Charts 3-5) displayed later in this paper.
According to SAS documentation for version 9.1.3, the ASSESS statement implements the following idea:

"Lin, Wei, and Ying (2002) present graphical and numerical methods for model assessment based on the cumulative sums of residuals over certain coordinates (e.g., covariates or linear predictors) or some related aggregates of residuals. The distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by simulation. Each observed residual pattern can then be compared, both graphically and numerically, with a number of realizations from the null distribution. Such comparisons enable you to assess objectively whether the observed residual pattern reflects anything beyond random fluctuation. These procedures are useful in determining appropriate functional forms of covariates and link function."

MODEL 1 - POISSON REGRESSION

The code for this model is:

   /* POISSON REGRESSION */
   ODS LISTING CLOSE;
   ODS RTF;
   ODS GRAPHICS ON;
   PROC GENMOD DATA=AEDATA;
      CLASS FLFLAG OBSERVER DREG;
      MODEL AECOUNT3 = TRI FLFLAG LTIP OBSERVER DREG BMTYR
            / DIST=POISSON LINK=LOG;
      ASSESS LINK / RESAMPLE=10000;
   RUN;
   QUIT;
   ODS GRAPHICS OFF;
   ODS RTF CLOSE;
   ODS LISTING;

The ASSESS statement checks the aptness of the functional form of the link or of one of the continuous covariates. The analysis centers on whether the residual patterns simulated from the model under the specified assumptions are statistically different from the pattern actually observed. The actual pattern is printed in bold while the simulations are represented by dotted lines. If the p-value is quite low, say p < .05, then there is cause for concern that the functional form being used is less than optimal, because the actual residual pattern differs from the expected patterns generated by simulation. It is best to have p-values greater than .2. This is just an introduction to the ASSESS statement. For more on its capabilities please see the SAS documentation.

Excerpts from the Poisson regression run are listed below:

Table 3 - Goodness of Fit Statistics for the Poisson Regression Model

Criteria For Assessing Goodness Of Fit
Criterion             DF   Value   Value/DF
Deviance
Scaled Deviance
Pearson Chi-Square
Scaled Pearson X2
Log Likelihood

Table 4 - Analysis of Parameter Estimates for the Poisson Regression Model

Analysis Of Parameter Estimates
Parameter               DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept
TRI                                                                                                <.0001
FLFLAG
FLFLAG
LTIP                                                                                               <.0001
OBSERVER
OBSERVER
DREG 01 REN_FLUTBI
DREG 99 OTHER
BMTYR
Scale

Chart 3 - Assessment of the Poisson Model Link Function

MODEL 2 - NEGATIVE BINOMIAL REGRESSION

   /* NEGATIVE BINOMIAL REGRESSION */
   ODS LISTING CLOSE;
   ODS RTF;
   ODS GRAPHICS ON;
   PROC GENMOD DATA=AEDATA;
      CLASS FLFLAG OBSERVER DREG;
      MODEL AECOUNT3 = TRI FLFLAG LTIP OBSERVER DREG BMTYR / DIST=NB;
      ASSESS VAR=(TRI) / RESAMPLE=10000;
   RUN;
   QUIT;
   ODS GRAPHICS OFF;
   ODS RTF CLOSE;
   ODS LISTING;

Notice in the negative binomial code that DIST=NB is specified and no link function is specified. The default link function for a negative binomial regression is the log, so its specification may be omitted. Also notice that the ASSESS statement is now being used to evaluate the temporal risk index (TRI). You may assess any of the continuous variables in the model or you may assess the link function using this same statement, but not at the same time!

Table 5 - Goodness of Fit Statistics for the Negative Binomial Model

Criteria For Assessing Goodness Of Fit
Criterion             DF   Value   Value/DF
Deviance
Scaled Deviance
Pearson Chi-Square
Scaled Pearson X2
Log Likelihood

Table 6 - Analysis of Parameter Estimates for the Negative Binomial Model

Analysis Of Parameter Estimates
Parameter               DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept
TRI                                                                                                <.0001
FLFLAG
FLFLAG
LTIP                                                                                               <.0001
OBSERVER
OBSERVER
DREG 01 REN_FLUTBI
DREG 99 OTHER
BMTYR
Dispersion

MODEL ANALYSIS

Both the Poisson and the negative binomial models found the temporal risk and the length of stay to be very significant adverse event predictors. Other variables that have an impact are FLFLAG, OBSERVER and DREG, although these variables are somewhat less important or only borderline significant. The DREG variable affects only a relatively small proportion of our patients, and so even though it is statistically significant it is not as useful as one might think without knowing the data. Unfortunately the BMT year variable, BMTYR, is only borderline significant in the Poisson model and is clearly not significant in the negative binomial model. The same can be said of the intercept estimate. It is possible to re-estimate these models using the NOINT option in the MODEL statement.

Tables 3 and 5 display goodness of fit statistics for the Poisson and negative binomial models respectively. Models that fit well have Value/DF entries close to 1. Compare the Pearson chi-square Value/DF ratio for the Poisson model in Table 3 with the corresponding entry for the negative binomial model in Table 5. This number is better for the negative binomial case because the negative binomial distribution allows for a variance greater than the mean (overdispersion), whereas the Poisson distribution requires that the variance equal the mean. This is the reason the scale parameter is set to 1.0 in the Poisson model, while the negative binomial model has a dispersion parameter estimate instead of a scale parameter estimate. The adverse event data are overdispersed, and Table 6 reports the dispersion parameter estimate for the negative binomial model.
Note that a dispersion parameter doesn't exist in the corresponding location in the Poisson parameter estimates table, Table 4. Instead the Poisson model displays a scale parameter that stays fixed at 1.
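The contrast between the two variance assumptions discussed above can be written out explicitly. The sketch below uses k for the negative binomial dispersion parameter, a symbol that does not appear in the original text; the variance function shown is the standard one which, to our understanding, is the parameterization PROC GENMOD reports for DIST=NB.

```latex
\text{Poisson:}\quad \mathrm{Var}(Y_i) = \mu_i
\qquad\qquad
\text{Negative binomial:}\quad \mathrm{Var}(Y_i) = \mu_i + k\,\mu_i^{2}, \quad k > 0
```

As k approaches 0 the negative binomial variance approaches the Poisson variance, so a dispersion estimate clearly above zero is consistent with overdispersion in the counts.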

Chart 4 - Assessment of the Negative Binomial Model's Temporal Risk Index

Chart 5 - Assessment of the Link Function for a Multiple Regression Model

The link function assessment graph (Chart 3) shows that the log link is an excellent choice for the functional form of the link in this particular Poisson regression. For variety we showed the TRI assessment (Chart 4) for the negative binomial model. The verdict on this is less clear, as there seems to be some divergence from the simulated cumulative errors at a TRI value of around 22. Nevertheless we cannot reject the null hypothesis that the cumulative error pattern for TRI is consistent with the random fluctuation we would expect if the functional form used were appropriate.

In order to assess an entire model one could produce these cumulative residual graphs on all covariates in the model as well as on the link function. Certain types of covariate misspecification will be readily apparent from the distinctive pattern produced in the cumulative residual graph. For instance, if a variable should be included in the model as log(X) but is instead included as X, then the graph will exhibit a distinct pattern that starts out sloping downward, then rises to a peak, and then slopes downward again like a sideways S. Examples of these patterns are given in the ODS statistical graphics documentation for the GENMOD procedure in SAS.

For comparison purposes we ran a standard multiple regression analysis on the data and plotted the link function assessment graph. (To save space we have not shown the SAS code for this.) The link assessment is displayed in Chart 5. Note that the p-value is less than .0001, indicating the severe departure of the actual cumulative residuals from the simulated residuals. The chart shows that multiple regression is clearly not appropriate for this data. As an aside, PROC GENMOD and PROC REG yield virtually identical results when the response is set to normal and the link is set to identity in GENMOD, but the link assessment is possible only in GENMOD.
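The multiple regression comparison described above could be run in GENMOD roughly as follows. The authors did not show their code, so this is only a guess that reuses the dataset, response and covariates of Model 1 and swaps in a normal response with an identity link; the actual specification behind Chart 5 may have differed.

```sas
/* Sketch only: normal response with identity link, i.e. ordinary     */
/* multiple regression fitted through GENMOD so ASSESS can be used.   */
/* Covariate list copied from Model 1 as an assumption.               */
ODS LISTING CLOSE;
ODS RTF;
ODS GRAPHICS ON;
PROC GENMOD DATA=AEDATA;
   CLASS FLFLAG OBSERVER DREG;
   MODEL AECOUNT3 = TRI FLFLAG LTIP OBSERVER DREG BMTYR
         / DIST=NORMAL LINK=IDENTITY;
   ASSESS LINK / RESAMPLE=10000;   /* cumulative-residual check of the link */
RUN;
QUIT;
ODS GRAPHICS OFF;
ODS RTF CLOSE;
ODS LISTING;
```

Only the DIST= and LINK= options change relative to the Poisson run, which makes the resulting link assessment directly comparable with Chart 3.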
CONCLUSION

The Poisson and negative binomial regression approaches to modeling adverse events were discussed in this paper. Either approach has predictive value and is far superior to using standard multiple regression. When overdispersion exists in the Poisson approach it may be more appropriate to use a model based on a negative binomial response. The price you pay for using a negative binomial model is the additional complexity of the response distribution; however, this additional complexity is worthwhile when the problem of overdispersion is pronounced.

These models tell us how many adverse events we should expect to be reported during a reporting period. That was the main goal. The models are not intended primarily for making clinical inferences. For example, the timing of a follow up visit has nothing to do with the adverse event risks our patients face, but it DOES say that an adverse event, if it occurred, is more likely to be reported to us during a period in which a follow up visit took place.

If the effect of BMT year on adverse events exists, it is too small to be detected by these models. The p-value of .06 for the BMTYR estimate in the Poisson model was encouraging, but we clearly cannot draw any conclusions here without further information and additional analysis.

One must always be cognizant of multiple comparison issues when building regression models. Rigorous validation and model assessment techniques should be employed to ensure that significant variables truly are significant and that the analyst is not simply modeling noise. This is especially important in borderline significance cases such as the one we have with the BMTYR variable.

A future direction for studying adverse events would be to implement a generalized estimating equations (GEE) adjustment. According to Allison [3], this technique allows for correlations in the dependent variable across observations.
In the present case this means that the technique would adjust for correlation over time within each patient's transplant data. Such correlations violate the assumption of independence on which many of the formulas are based, and GEE adjustment would reduce the impact of this violation.

It would also be useful to try an event history approach (i.e., survival analysis) to model adverse events. Whereas the generalized linear model assumes that the response variable comes from an exponential family of distributions, that the responses are independent and that a specific link function applies, the event history approach has proportional hazards as a central assumption. In addition, the event history approach utilizes partial likelihood as opposed to the MLEs used in generalized linear models. These differences between the two approaches may yield somewhat different inferences.
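The GEE adjustment mentioned above as a future direction could be requested in GENMOD with a REPEATED statement, roughly as sketched here. This was not part of the authors' analysis: PATID is a hypothetical patient identifier assumed to exist in AEDATA, and the exchangeable working correlation is just one of several structures that could be tried.

```sas
/* Sketch only: PATID is a hypothetical patient identifier assumed to  */
/* be present in AEDATA. The subject variable must appear in CLASS.    */
PROC GENMOD DATA=AEDATA;
   CLASS PATID FLFLAG OBSERVER DREG;
   MODEL AECOUNT3 = TRI FLFLAG LTIP OBSERVER DREG BMTYR
         / DIST=POISSON LINK=LOG;
   /* Exchangeable working correlation among a patient's monthly rows; */
   /* CORRW prints the estimated working correlation matrix.           */
   REPEATED SUBJECT=PATID / TYPE=EXCH CORRW;
RUN;
```

With the REPEATED statement the parameter estimates come with empirical (robust) standard errors, which is where the adjustment for within-patient correlation shows up.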

REFERENCES

1. Agresti, A. (1990), Categorical Data Analysis, New York: Wiley.
2. Myers, Raymond H. (1990), Classical and Modern Regression with Applications, 2nd Edition, Pacific Grove: Duxbury Press.
3. Allison, Paul D. (2005), Fixed Effects Regression Methods for Longitudinal Data Using SAS, 1st Edition, Cary, NC: SAS Institute Inc.
4. Schabenberger, Oliver, Introducing the GLIMMIX Procedure for Generalized Linear Mixed Models, Proceedings of the Thirtieth Annual SAS Users Group International Conference, Philadelphia, PA.
5. Stokes, Maura E., Davis, Charles S., Koch, Gary G., Categorical Data Analysis Using the SAS System, Cary, NC: SAS Institute Inc., pp.
6. Nelder, J.A., Wedderburn, R.W.M., Generalized Linear Models, Journal of the Royal Statistical Society, Series A 153:
7. Gardner, W., Mulvey, Edward P., Shaw, Esther C., Regression Analysis of Counts and Rates: Poisson, Overdispersed Poisson, and Negative Binomial Models, Psychological Bulletin, Vol. 118, No

RECOMMENDED READING

Cameron, A.C., Trivedi, P.K. (1998), Regression Analysis of Count Data, Cambridge: University Press.
Firth, D. (1991), Generalized Linear Models, in Statistical Theory and Modeling, ed. Hinkley, D.V., Reid, N., and Snell, E.J., London: Chapman and Hall.
Lin, D.Y., Wei, L.J., and Ying, Z., "Model-Checking Techniques Based on Cumulative Residuals," Biometrics, 58,
McCullagh, P., Nelder, J.A. (1983), Generalized Linear Models, New York: Chapman and Hall.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Fox Chase - Temple BMT Program
7604 Central Ave
Philadelphia, PA
Phone: (215)
john.ulicny@tuhs.temple.edu
Web:

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.


Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta

Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta International Journal of Science and Engineering Investigations vol. 7, issue 77, June 2018 ISSN: 2251-8843 Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p ) Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

More information