Using PROC GENMOD to Model Adverse Event Counts in a Health Care Setting
John Ulicny, Fox Chase - Temple Bone Marrow Transplant Program, Philadelphia, PA
Thomas R. Klumpp, MD, Fox Chase - Temple Bone Marrow Transplant Program, Philadelphia, PA

ABSTRACT

Statistical models for adverse events have been developed as part of a quality management initiative at the Fox Chase - Temple Bone Marrow Transplant Program in Philadelphia, PA. Their purpose is to enable the transplant team to compare recent adverse event counts to what should be expected based on the patient population currently under care by the team. The count of adverse events is the response variable in the regression models. Generalized linear modeling, as implemented in the GENMOD procedure, is an effective tool for performing regression analysis on a response of this type. Unlike ordinary least squares (OLS), it can be applied to a wide range of non-normal responses as long as they come from the natural exponential family of distributions and meet certain other assumptions. The models in this paper were developed using Version of SAS. This version allows for enhanced, integrated graphical assessment of the model via the ODS Statistical Graphics facility, which is built into various statistical procedures, GENMOD being one. The ODS Statistical Graphics facility is experimental in this release. Some examples of its use are provided in this paper.

INTRODUCTION

Hematopoietic Cell Transplantation (HCT) is performed on patients for a variety of severe illnesses, typically affecting the bone marrow and circulating blood. Bone Marrow Transplant (BMT) is an older term often used interchangeably with HCT, and for the purposes of this paper no distinction between the two terms is made.
The vast majority of patients transplanted at the Fox Chase - Temple Bone Marrow Transplant Program have hematologic malignancies such as Non-Hodgkin's Lymphoma (NHL), Hodgkin's Disease (HD), Multiple Myeloma (MM), Acute Myelogenous Leukemia (AML) and others. Transplantation for some types of solid tumors is done as well, although these constitute a small minority of cases. Due to the high severity of most of these illnesses, high dose chemotherapy and radiation are typically administered to treat the cancer, but this results in the destruction of the bone marrow, which must then be replaced. This process is prone to generating a variety of toxicities, many of which are quite severe or even fatal. Common examples of HCT-related toxicities are anemia, nausea, diarrhea and mucositis.

There are three main types of HCT useful for predicting adverse events: autologous, allogeneic-related and allogeneic-unrelated. In an autologous transplant, a patient's own blood stem cells are harvested before the preparative regimen (i.e., chemotherapy and/or radiation) is administered. The stem cells are then reinfused in order to reconstitute the marrow. In an allogeneic-related HCT, a member of the patient's family is a donor. The regimen is administered and then the related donor's cells are infused into the patient. The third form of HCT is required when a compatible donor cannot be found from within the patient's family. This is the allogeneic-unrelated HCT and involves a donor search through an international registry of donors to find a match. The matched unrelated donor's (MUD) cells are then infused after the regimen is administered. Autologous transplants are the most common and least risky form of transplant, followed by matched related transplants. The riskiest transplants are the allogeneic-unrelated transplants.

The toxicities associated with each transplant are recorded by the data management team at the Fox Chase - Temple BMT Program.
They are graded according to the Eastern Cooperative Oncology Group (ECOG) standard grading system. The grades specified by the ECOG standard are ordinal, ranging from zero to five. A grade of zero indicates no event, and a grade of five indicates that the event was fatal. A toxicity of grade 3 or higher is considered an adverse event in the model. The event descriptions are listed in Table 1.
Table 1. Toxicity Frequencies (An Adverse Event Is a Toxicity of Grade 3 or Higher)

Event Grade   Description of Toxicity   Aggregate Frequency ( )
1             Mild                      1,737
2             Moderate
3             Severe
4             Life threatening
5             Fatal                     19
-             Total Toxicities          3,900

The goal of the model described in this paper is to compute a reasonable expected number of adverse events each month based on the characteristics of the patients under our care. Based on the information in Table 1 there have been ( )/60 ≈ 21 adverse events per month on average, based on a transplant volume of about 5 patients per month during this time frame.

GENERALIZED LINEAR MODELS AND PROC GENMOD

There have been many excellent papers and books written about generalized linear models. For a thorough technical discussion see the books by Agresti 1 or Myers 2. For sources that describe using PROC GENMOD for generalized linear models see Allison 3 or Stokes et al 5. The discussion here is designed as a tutorial for those who have little or no familiarity with this procedure.

In multiple linear regression, a response variable Y is related to a set of X-variables linearly as

$Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{(p-1)i} + \varepsilon_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i$   (1)

for i = 1, 2, ..., n observations. The errors from this model are assumed to be independent with zero mean and constant variance. The assumption of error normality is usually also added to enable one to construct hypothesis tests and confidence intervals for the parameters.

A generalized linear model also describes a relationship between a response variable and one or more independent variables; however, the relationship may be much more complex than a simple linear one. As described by Myers 2, the generalized model is made up of three components:

The random component. This component consists of the response variable Y with observed values $Y_1, Y_2, \ldots, Y_n$. These observations are mutually independent and come from a natural exponential family.
This family is of the general form given in equation (2), where the vector $\theta_i$ may vary depending on the values of the covariates.

$f(y_i; \theta_i) = a(\theta_i)\, b(y_i) \exp[\, y_i\, c(\theta_i)\,]$   (2)

For instance, a special case of this family is the Bernoulli distribution with one parameter, as given by

$f(y_i; p_i) = p_i^{y_i} (1 - p_i)^{1 - y_i} = (1 - p_i) \exp\left[\, y_i \log\left( \frac{p_i}{1 - p_i} \right) \right]$   (3)

The parameter(s) in equations 2 and 3 are indexed by i because they can vary as a function of the covariates. In other words, because each observation can have different covariate values, the estimate of $\theta_i$ can be different for each observation due to the dependence relationship specified in the model. In a simple linear regression it is only the response mean that can vary, but in a generalized linear model the variance can vary as well. In other words, the assumption of variance homogeneity can be relaxed, although it is important to understand how the variance depends on the model data. Keep in mind that there is a distinction between the parameters of the response distribution
represented by $\theta_i$ and the parameters of the regression equation covariates represented by $\boldsymbol{\beta}$.

The systematic component. This is a function of the $X_{ij}$ that is linear in the parameters. If Y depends on several covariates then $\mathbf{X}_{n \times p}$ is often called the design matrix, with n observations corresponding to p variables, possibly including an intercept. These explanatory variables can be combinations of continuous variables, categorical variables and interactions. The function results in a vector called the linear predictor. The equation is:

$\boldsymbol{\eta}_{n \times 1} = \mathbf{X}_{n \times p}\, \boldsymbol{\beta}_{p \times 1}$   (4)

The link function. The final component is called the link function g. All link functions must be monotonic and differentiable, and they are often non-linear. This function relates the first two components to each other by specifying that

$\eta_i = g(\mu_i) = g[E(Y_i)]$   (5)

The normal distribution is a special case of the natural exponential family, and the assumption of normality plays a key role in the process of estimating and evaluating simple linear and multiple regression models. In the present case, however, where adverse event counts need to be modeled, other distributions from the exponential family are required to adequately represent the special nature of the data. One obvious requirement of the distribution is that it does not allow for a negative count value. A probability distribution for a count variable also will not typically have a constant variance. The variance of a count variable usually increases as the size of its mean increases. The Poisson distribution is therefore a good choice because its variance equals its mean, so a model that uses the Poisson as the response distribution will accommodate such responses. Note that this contrasts with the case of multiple regression, where homogeneity of variance is assumed. One cannot have homogeneity of variance in a Poisson regression unless the mean always stays the same!
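As a worked illustration (not part of the original derivation), the Poisson probability function can be rearranged into the form of equation (2). This is a sketch that reuses the mean $\mu_i$ from equation (5); it shows why the Poisson belongs to the natural exponential family and why the log link, which keeps the fitted mean positive, is the natural choice for it.

```latex
\begin{align*}
f(y_i;\mu_i) &= \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}
  = \underbrace{e^{-\mu_i}}_{a(\mu_i)}\;
    \underbrace{\frac{1}{y_i!}}_{b(y_i)}\;
    \exp\!\left[\, y_i \underbrace{\log \mu_i}_{c(\mu_i)} \right],
    \qquad y_i = 0, 1, 2, \ldots \\[4pt]
E(Y_i) &= \operatorname{Var}(Y_i) = \mu_i , \\[4pt]
\eta_i &= g(\mu_i) = \log \mu_i
  \;\Longrightarrow\;
  \mu_i = \exp(\mathbf{x}_i'\boldsymbol{\beta}) > 0 .
\end{align*}
```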
The negative binomial distribution is another member of the natural exponential family (for a fixed value of its dispersion parameter) that is useful in the context of modeling count data. This distribution, although significantly more complicated than the Poisson, can handle the situation called overdispersion, in which the variance of the count variable is actually greater than the mean of the variable. The adverse events model illustration will use the Poisson model as well as the negative binomial model to demonstrate the two different approaches.

In the standard regression case, where the regression response is normally distributed and the other regression assumptions mentioned above are met, ordinary least squares (OLS) can be used to arrive at parameter estimates that are unbiased as well as being maximum likelihood estimates (MLEs). Among linear unbiased estimators they also have minimum variance, which is why they are usually referred to as Best Linear Unbiased Estimators (BLUE). With non-normal responses, however, it is necessary to use an algorithm such as iteratively re-weighted least squares instead of OLS to arrive at estimates that are approximate MLEs. These estimators are not guaranteed to be unbiased as in the normal response case; however, they will tend to have the smallest variance possible while maximizing the likelihood, that is, the probability that the sample obtained was actually drawn from a distribution with parameter values equal to the estimates that were generated. This is why some GENMOD output will mention iteratively re-weighted least squares as the estimation technique used.

CHARACTERISTICS OF THE DATA

There is a significant amount of very high quality data collected on each patient treated by the BMT team. The main file (that we call the core file) consists of one observation for each patient-protocol combination. For example, one patient might be registered onto three treatment protocols over a period of time, and only one of these protocols will actually involve a transplant.
Nevertheless, that patient will have three observations in the database and may be at risk for an adverse event under any of the protocols. In reality the risk of an adverse event on any protocol other than a transplant protocol is quite small, and so only patients registered for transplants are included in the model. We collect and quality-assure dozens of variables, including patient demographic information, disease status, treatment details, outcomes and more. SAS/AF and SCL are used to input the information into a dataset. This dataset is then checked at least once per day by an expert system written in Base SAS. This system checks the relationships of key variables to one another, and it also checks whether the magnitudes of the most important variables are reasonable within the context of the patient's current status. For instance, the code in Figure 1 prints a warning message if the data manager enters a suspicious value for the creatinine level of a transplanted patient.
    /* CHECK CREATININE LEVELS */
    TITLE 'WARNING (CREAT) CREATININE LESS THAN .1 OR GREATER THAN 4.0';
    TITLE2 'WHEN BMT=YES';
    PROC PRINT DATA=BMT.CORE;
       WHERE ((CREAT<.1) OR (CREAT>4.0)) AND BMT='YES';
       VAR ROWNUM NAME TYPE TUPN DIGNOSIS BMT BMTDAT CREAT;
    RUN;

We have defined a reporting period to be approximately one month. This coincides with our monthly quality management meetings and with the historical cycle of monthly permanent backups of the core file. Although our current backup strategy involves multiple daily, weekly and monthly backups, some of the old monthly backups were not done at precise monthly intervals, and so we must recognize and adjust for the fact that the exposure to adverse event risk is different in these periods.

THE VARIABLES USED

The BMT team has limited resources available to collect adverse event data, and so we cannot record the exact date or exact cause of the events. (Sometimes it is virtually impossible to pin down such information regardless of the resources being devoted to the task.) In addition, only the maximum grade experienced by a patient within each particular adverse event category is captured. For these reasons, we make assumptions regarding the distribution of the events between the ending dates of each reporting period. In particular, it is assumed that an event reported as of the end of a reporting period was equally likely to have happened on any day within that period on which the patient was under treatment. If the reporting period was 30 days and the patient was being treated under a protocol for the entire period, then the probability is 1/30 that the event happened on any particular day within the period. Using this assumption, it was possible to construct a graph of the events as a function of days from the initiation of treatment for each patient. In virtually all cases the initiation of treatment is the start of the conditioning regimen, i.e., chemotherapy with or without radiation.
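The equal-likelihood assumption described above can be sketched in a DATA step. This is only an illustration of the idea, not the authors' code: the dataset WORK.EVENTS and the variables TRT_FIRST, TRT_LAST and RX_START (SAS date values for the first and last treated day in the reporting period and for the start of the conditioning regimen) are hypothetical names.

```sas
/* Hypothetical input: one row per reported event.                  */
/* TRT_FIRST, TRT_LAST and RX_START are assumed SAS date variables. */
data work.event_days;
   set work.events;
   ndays = trt_last - trt_first + 1;    /* days under treatment in the period */
   do day = trt_first to trt_last;
      days_from_rx = day - rx_start;    /* days since treatment was initiated */
      weight = 1 / ndays;               /* e.g. 1/30 for a 30-day period      */
      output;
   end;
run;

/* Sum the fractional events by day since treatment was initiated */
proc means data=work.event_days noprint nway;
   class days_from_rx;
   var weight;
   output out=work.events_by_day sum=expected_events;
run;
```

Each reported event contributes a total weight of 1, spread evenly across the days on which it could have occurred, so the summed weights can be plotted against days from treatment initiation.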
The interest of the quality management team is highly focused on extremely severe adverse events. A plot of toxicities expressed as a rate per 100 patients per day is shown in Chart 1, with autologous, allogeneic-related and allogeneic-unrelated (MUD) events plotted separately.

Chart 1. Estimated Historical Toxicity Rate (vertical axis: Toxicity Rate Per 100 Surviving Patients Per Day; horizontal axis: Days After Rx Initiated; series: Allogeneic - MUD, Allogeneic - Other, Autologous)

RESPONSE VARIABLE: ADVERSE EVENTS (AECOUNT)

The adverse event variable is defined as the number of adverse events experienced by a patient in the reporting period under study. Each patient/month combination is a separate observation.
COVARIATE 1 - TEMPORAL RISK INDEX (TRI)

Chart 1 shows graphically the tendency of the risk to start low and rise to a peak at about 50 to 75 days, depending on the transplant type. The risk tends to trail off at about 100 days, gradually subsiding over the subsequent 400-day period. This result is extremely reasonable based on clinical understanding. The process of conditioning and cell infusion occurs in the early stages of treatment and then, ideally, the gradual recovery of the patient commences once the blood cells, particularly neutrophils and platelets, start to be produced by the patient without the need for artificial support.

We attempted to fit a number of possible candidate curves to this data in order to find a curve that was simple, intuitive and a good fit to all 3 types of transplants. Because of the apparent curvilinear risk shape in the early stage, and the tapering off of risk after day 100, an inverse quadratic curve emerged as the best fit to the data. A predictor called the temporal risk index (TRI) was constructed based on this curve; however, we knew that temporal risk was not the only factor affecting adverse events! The inverse quadratic equation as well as a plot of the fit to autologous toxicities is displayed below. Despite its relatively simple structure, this curve was able to explain 95% of the toxicity variability in the AUTO, 85% of the variability in the ALLO-OTHER, and 72% of the variability in the ALLO-MUD transplants, and it yielded reasonably consistent parameter estimates across the different types of transplants as well as for different subsets of data. These characteristics reduce the possibility that the curves are simply fitting noise. We are also attempting to obtain data from national and international BMT registries to further validate this functional form for the relationship.

Chart 2. Inverse Quadratic Fit to Toxicity Data

This curve can be fit using PROC NLIN in SAS.
We used all toxicities instead of just adverse events in this stage in order to get a greater volume of data to work with. The resulting parameters are listed in Table 2 below.

Table 2. Coefficients Estimated in the Inverse Quadratic Fit: $\text{rate} = 1/(a + bt + ct^2)$

Coefficient   AUTO   ALLO - other   ALLO - MUD
a
b
c
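A minimal sketch of how the inverse quadratic form rate = 1/(a + bt + ct**2) might be fit with PROC NLIN is given here for orientation. It is not the authors' program: the dataset WORK.TOXRATE, the variables RATE and T, and the starting values in the PARMS statement are all assumptions chosen for illustration, and in practice the starting values would need to be tuned to the data.

```sas
/* Hypothetical input: one row per day since treatment initiation,  */
/* with T = days after Rx initiated and RATE = toxicity rate.       */
proc nlin data=work.toxrate;
   parms a = 1  b = 0.01  c = 0.001;        /* illustrative starting values */
   model rate = 1 / (a + b*t + c*t**2);     /* inverse quadratic curve      */
   output out=work.toxfit predicted=rate_hat;
run;
```

Running the step separately for each transplant type (for example with a WHERE statement or a BY statement on sorted data) would give one set of coefficients per column of Table 2.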
COVARIATE 2 - LENGTH OF TIME AS INPATIENT (LTIP)

An explanatory variable based on the inpatient length of stay (LTIP) was included as a proxy for illness severity and complications during the admission. It is defined as:

LTIP = 0 if not yet discharged in the current reporting period
LTIP = LOS for the transplant admission if discharged during or prior to the current reporting period

COVARIATE 3 - FOLLOW UP FLAG (FLFLAG)

Adverse events are recorded into the database as they are observed by the transplant team. The frequency of follow-ups typically diminishes over time. As we have fewer chances to observe a patient at distant time points after transplant, any adverse events are more likely to be recorded in a month in which a follow up visit has occurred. This variable is a flag to indicate whether or not a follow up visit has occurred in the current reporting period.

COVARIATE 4 - OBSERVER FLAG (OBSERVER)

A personnel change took place in the BMT program during which the responsibility for assigning toxicity grades changed from one group of observers to another. This kind of change almost always introduces an observer bias, and the case of our adverse events data capture is no exception. Therefore a dummy variable called OBSERVER was included in the model to account for the bias.

COVARIATE 5 - DIAGNOSIS/REGIMEN FLAG (DREG)

One combination of diagnosis and regimen that was suspected to be particularly prone to generating adverse events was renal cell carcinoma patients who received fludarabine and total body irradiation as a preparative regimen. This indicator variable was included to assess the impact of this particular situation on adverse event risk.

COVARIATE 6 - TRANSPLANT YEAR (BMTYR)

Over time, improvements in transplant technology, training and process control should enable us to reduce the incidence of adverse events after adjusting for the other covariates.
Examples of improvements in the area of supportive care include improved antibiotics and antiviral medications and better screening for viruses so that we can treat problems sooner. Treating problems sooner can keep toxicities that are at level 1 or 2 from becoming severe toxicities (i.e., adverse events). Including the transplant year variable allows us to test for evidence of this effect, and if there is improvement over time, it can indicate approximately how much improvement has occurred.

MODELS PRODUCED BY GENMOD

We will now present two models that can be produced by PROC GENMOD to analyze the adverse event data. The first is the Poisson model and the second is the negative binomial model. There are many other approaches to modeling this data that may or may not work better; however, this paper will focus on these two approaches to illustrate and contrast the techniques as implemented in the GENMOD procedure. For a discussion of the event history approach and PROC PHREG see chapter 5 in Allison 3, and for a description of a new SAS PROC called GLIMMIX that can estimate generalized linear mixed models see the SUGI 30 paper by Oliver Schabenberger 4 of the SAS Institute. A key thing to remember is that all of these approaches will be superior to using standard multiple regression for the reasons stated above.

The ODS statements invoke ODS statistical graphics, which is experimental in version of SAS. The examples below show how to evaluate the LINK function by using the ASSESS statement within the GENMOD procedure. The ASSESS statement works in tandem with ODS statistical graphics to produce the assessment graphics (charts 3-5) displayed later in this paper.
According to SAS documentation for version 9.1.3, the ASSESS statement implements the following idea:

"Lin, Wei, and Ying (2002) present graphical and numerical methods for model assessment based on the cumulative sums of residuals over certain coordinates (e.g., covariates or linear predictors) or some related aggregates of residuals. The distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by simulation. Each observed residual pattern can then be compared, both graphically and numerically, with a number of realizations from the null distribution. Such comparisons enable you to assess objectively whether the observed residual pattern reflects anything beyond random fluctuation. These procedures are useful in determining appropriate functional forms of covariates and link function."
MODEL 1 - POISSON REGRESSION

The code for this model is:

    /* POISSON REGRESSION */
    ODS LISTING CLOSE;
    ODS RTF;
    ODS GRAPHICS ON;
    PROC GENMOD DATA=AEDATA;
       CLASS FLFLAG OBSERVER DREG;
       MODEL AECOUNT3 = TRI FLFLAG LTIP OBSERVER DREG BMTYR
             / DIST=POISSON LINK=LOG;
       ASSESS LINK / RESAMPLE=10000;
    RUN;
    QUIT;
    ODS GRAPHICS OFF;
    ODS RTF CLOSE;
    ODS LISTING;

The ASSESS statement checks the aptness of the functional form of the link or of one of the continuous covariates. The analysis centers on whether the simulated residual patterns that would be generated by the model under the specified assumptions are statistically different from the one actually generated. The actual pattern is printed in bold while the simulations are represented by dotted lines. If the p-value is quite low, say p < .05, then there is cause for concern that the functional form being used is less than optimal, because the actual residual pattern differs from the expected patterns generated by simulation. It is best to have p-values greater than .2. This is just an introduction to the ASSESS statement. For more on its capabilities please see the SAS documentation for this version.

Excerpts from the Poisson regression run are listed below:

Table 3 - Goodness of Fit Statistics for the Poisson Regression Model

Criteria For Assessing Goodness Of Fit
Criterion            DF   Value   Value/DF
Deviance
Scaled Deviance
Pearson Chi-Square
Scaled Pearson X2
Log Likelihood
Table 4 - Analysis of Parameter Estimates for the Poisson Regression Model

Analysis Of Parameter Estimates
Parameter            DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept
TRI                                                                                             <.0001
FLFLAG
FLFLAG
LTIP                                                                                            <.0001
OBSERVER
OBSERVER
DREG 01 REN_FLUTBI
DREG 99 OTHER
BMTYR
Scale

Chart 3 - Assessment of the Poisson Model Link Function
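When reading estimates from a log-link model such as the one in Table 4, it can help to recall how a coefficient translates into a multiplicative effect on the expected count. The following is a sketch using the notation of equation (1); the value 0.10 is purely illustrative and is not taken from the paper's output.

```latex
\begin{align*}
\log \mu_i &= \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{(p-1)i} \\[4pt]
\frac{\mu_i \,\big|\, x_{ji} + 1}{\mu_i \,\big|\, x_{ji}} &= e^{\beta_j}
  \qquad \text{(all other covariates held fixed)} \\[4pt]
\text{e.g. } \beta_j = 0.10 &\;\Longrightarrow\; e^{0.10} \approx 1.105
\end{align*}
```

In words, under these assumptions a one-unit increase in a covariate with coefficient 0.10 would raise the expected adverse event count by roughly 10.5 percent.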
MODEL 2 - NEGATIVE BINOMIAL REGRESSION

    /* NEGATIVE BINOMIAL REGRESSION */
    ODS LISTING CLOSE;
    ODS RTF;
    ODS GRAPHICS ON;
    PROC GENMOD DATA=AEDATA;
       CLASS FLFLAG OBSERVER DREG;
       MODEL AECOUNT3 = TRI FLFLAG LTIP OBSERVER DREG BMTYR
             / DIST=NB;
       ASSESS VAR=(TRI) / RESAMPLE=10000;
    RUN;
    QUIT;
    ODS GRAPHICS OFF;
    ODS RTF CLOSE;
    ODS LISTING;

Notice in the negative binomial code that DIST=NB is specified and no link function is specified. The default link function for a negative binomial regression is the log, so its specification may be omitted. Also notice that the ASSESS statement is now being used to evaluate the temporal risk index (TRI). You may assess any of the continuous variables in the model or you may assess the link function using this same statement, but not at the same time!

Table 5 - Goodness of Fit Statistics for the Negative Binomial Model

Criteria For Assessing Goodness Of Fit
Criterion            DF   Value   Value/DF
Deviance
Scaled Deviance
Pearson Chi-Square
Scaled Pearson X2
Log Likelihood
Table 6 - Analysis of Parameter Estimates for the Negative Binomial Model

Analysis Of Parameter Estimates
Parameter            DF   Estimate   Standard Error   Wald 95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept
TRI                                                                                             <.0001
FLFLAG
FLFLAG
LTIP                                                                                            <.0001
OBSERVER
OBSERVER
DREG 01 REN_FLUTBI
DREG 99 OTHER
BMTYR
Dispersion

MODEL ANALYSIS

Both the Poisson and the negative binomial models found the temporal risk and the length of stay to be very significant adverse event predictors. Other variables that have an impact are FLFLAG, OBSERVER and DREG, although these variables are somewhat less important or only borderline significant. The DREG variable only affects a relatively small proportion of our patients, and so even though it is statistically significant it is not as useful as one might think without knowing the data. Unfortunately the BMT year variable, BMTYR, is only borderline significant in the Poisson model and is clearly not significant in the negative binomial model. The same can be said of the intercept estimate. It is possible to re-estimate these models using the NOINT option in the MODEL statement.

Tables 3 and 5 display goodness of fit statistics for the Poisson and negative binomial models respectively. Models that fit well have Value/DF ratios close to 1. Compare the Value/DF ratio of the Pearson chi-square for the Poisson model with the corresponding entry for the negative binomial model. This number is better for the negative binomial case because the negative binomial distribution allows for a variance greater than the mean (overdispersion), whereas the Poisson distribution requires that the variance equal the mean. This is the reason the scale parameter is set to 1.0 in the Poisson model and the negative binomial model has a dispersion parameter estimate instead of a scale parameter estimate. The adverse event data are overdispersed, and Table 6 reports the estimated dispersion parameter for the negative binomial model.
Note that a dispersion parameter doesn't exist in the corresponding location in the Poisson parameter estimates table, Table 4. Instead the Poisson model displays a scale parameter that stays fixed at 1.
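The contrast between the two variance assumptions can be written out explicitly. The sketch below uses $k$ for the negative binomial dispersion parameter and $V(\cdot)$ for the variance function; these symbols are introduced here for illustration, and the numbers in the last line are made up rather than taken from the paper's tables.

```latex
\begin{align*}
\text{Poisson:} \quad & \operatorname{Var}(Y_i) = \mu_i \\[2pt]
\text{Negative binomial:} \quad & \operatorname{Var}(Y_i) = \mu_i + k\,\mu_i^{2},
   \qquad k > 0 \\[2pt]
\text{Pearson chi-square:} \quad & X^2 = \sum_{i=1}^{n}
   \frac{(y_i - \hat{\mu}_i)^2}{V(\hat{\mu}_i)} \\[2pt]
\text{e.g. } \mu_i = 2,\; k = 0.5: \quad &
   \operatorname{Var}(Y_i) = 2 + 0.5\,(2)^2 = 4
   \quad \text{versus } 2 \text{ under the Poisson}
\end{align*}
```

As $k$ approaches zero the negative binomial variance collapses to the Poisson variance, which is why a Pearson chi-square Value/DF ratio well above 1 in a Poisson fit is commonly read as a sign of overdispersion.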
Chart 4 - Assessment of the Negative Binomial Model's Temporal Risk Index

Chart 5 - Assessment of the Link Function for a Multiple Regression Model
The link function assessment graph (Chart 3) shows that the log link is an excellent choice for the functional form of the link in this particular Poisson regression. For variety we showed the TRI assessment (Chart 4) for the negative binomial model. The verdict on this is less clear, as there seems to be some divergence from the simulated cumulative errors at a TRI value of around 22. Nevertheless we cannot reject the null hypothesis that the cumulative error pattern for TRI is consistent with the random fluctuation we would expect if the functional form used were appropriate.

In order to assess an entire model one could produce these cumulative residual graphs on all covariates in the model as well as on the link function. Certain types of covariate misspecification will be readily apparent from the distinctive pattern produced in the cumulative residual graph. For instance, if a variable should be included in the model as log(X) but is instead included as X, then the graph will exhibit a distinct pattern that starts out sloping downward, then rises to a peak, and then slopes downward again like a sideways S. Examples of these patterns are given in the ODS statistical graphics documentation for the GENMOD procedure in SAS.

For comparison purposes we ran a standard multiple regression analysis on the data and plotted the link function assessment graph. (To save space we have not shown the SAS code for this.) The link assessment is displayed in Chart 5. Note that the p-value is less than .0001, indicating the severe departure of the actual cumulative residuals from the simulated residuals. The chart shows that multiple regression is clearly not appropriate for this data. As an aside, PROC GENMOD and PROC REG yield virtually identical results when the response is set to normal and the link is set to identity in GENMOD, but the link assessment is possible only in GENMOD.
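The paper does not show the code for the multiple regression comparison. A sketch of what such a run might look like is given here, reusing the dataset and variable names from the Poisson example. The PROC REG step is an assumption-laden illustration: PROC REG has no CLASS statement, so it relies on hypothetical 0/1 dummy variables (FL_DUMMY, OBS_DUMMY, DREG_DUMMY) in a hypothetical dataset AEDATA_DUMMIES created beforehand.

```sas
/* Normal response with identity link: ordinary multiple regression in GENMOD */
ods graphics on;
proc genmod data=aedata;
   class flflag observer dreg;
   model aecount3 = tri flflag ltip observer dreg bmtyr
         / dist=normal link=identity;
   assess link / resample=10000;     /* link assessment, as in Chart 5 */
run;
ods graphics off;

/* Comparable fit in PROC REG, using hypothetical pre-built dummy variables */
proc reg data=aedata_dummies;
   model aecount3 = tri fl_dummy ltip obs_dummy dreg_dummy bmtyr;
run;
quit;
```

The parameter estimates from the two steps should agree closely, with sign and intercept differences possible depending on how the dummy variables are coded relative to the GENMOD reference levels.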
CONCLUSION

The Poisson and negative binomial regression approaches to modeling adverse events were discussed in this paper. Either approach has predictive value and is far superior to using standard multiple regression. When overdispersion exists in the Poisson approach it may be more appropriate to use a model based on a negative binomial response. The price you pay for using a negative binomial model is the additional complexity of the response distribution; however, this additional complexity is worthwhile when the problem of overdispersion is pronounced.

These models tell us how many adverse events we should expect to be reported during a reporting period. That was the main goal. The models are not intended primarily for making clinical inferences. For example, the timing of a follow up visit has nothing to do with the adverse event risks our patients face, but it DOES say that an adverse event, if it occurred, is more likely to be reported to us during a period in which a follow up visit took place.

If the effect of BMT year on adverse events exists, it is too small to be detected by these models. The fact that the p-value for its parameter estimate was .06 in the Poisson model was encouraging, but we clearly cannot draw any conclusions here without further information and additional analysis. One must always be cognizant of multiple comparison issues when building regression models. Rigorous validation and model assessment techniques should be employed to ensure that significant variables truly are significant and that the analyst is not simply modeling noise. This is especially important in borderline significance cases such as we have with the BMTYR variable.

A future direction for studying adverse events would be to implement a generalized estimating equations (GEE) adjustment. According to Allison 3, this technique allows for correlations in the dependent variable across observations.
In the present case this means that the technique would adjust for correlation over time within each patient's transplant data. Such correlations violate the assumption of independence on which many of the formulas are based, and GEE adjustment would reduce the impact of this violation.

It would also be useful to try an event history approach (i.e., survival analysis) to model adverse events. Whereas the generalized linear model assumes that the response variable comes from an exponential family of distributions, that the responses are independent and that a specific link function applies, the event history approach has proportional hazards as a central assumption. In addition, the event history approach utilizes partial likelihood as opposed to the MLEs used in generalized linear models. These differences in the two approaches may yield somewhat different inferences.
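For readers curious about the GEE direction mentioned above, PROC GENMOD supports it through the REPEATED statement. The following is a hedged sketch only, not an analysis performed in this paper: PATID is a hypothetical patient identifier that would need to exist in AEDATA, and the exchangeable working correlation (TYPE=EXCH) is an arbitrary choice made for illustration.

```sas
/* Sketch of a GEE-adjusted Poisson model; PATID is a hypothetical variable */
proc genmod data=aedata;
   class patid flflag observer dreg;
   model aecount3 = tri flflag ltip observer dreg bmtyr
         / dist=poisson link=log;
   repeated subject=patid / type=exch;   /* correlation within each patient */
run;
```

Other working correlation structures, such as TYPE=AR(1) for monthly observations that are more strongly related when close in time, could be tried in the same way.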
REFERENCES

1. Agresti, A. (1990). Categorical Data Analysis. New York: Wiley.
2. Myers, Raymond H. (1990). Classical and Modern Regression with Applications, 2nd Edition. Pacific Grove: Duxbury Press.
3. Allison, Paul D. (2005). Fixed Effects Regression Methods for Longitudinal Data Using SAS, 1st Edition. Cary, NC: SAS Institute Inc.
4. Schabenberger, Oliver. Introducing the GLIMMIX Procedure for Generalized Linear Mixed Models. Proceedings of the Thirtieth Annual SAS Users Group International Conference, Philadelphia, PA.
5. Stokes, Maura E., Davis, Charles S., Koch, Gary G. Categorical Data Analysis Using the SAS System. Cary, NC: SAS Institute Inc.
6. Nelder, J.A., Wedderburn, R.W.M. Generalized Linear Models. Journal of the Royal Statistical Society, Series A 153.
7. Gardner, W., Mulvey, Edward P., Shaw, Esther C. Regression Analysis of Counts and Rates: Poisson, Overdispersed Poisson, and Negative Binomial Models. Psychological Bulletin, Vol. 118.

RECOMMENDED READING

Cameron, A.C., Trivedi, P.K. (1998), Regression Analysis of Count Data, Cambridge: University Press.
Firth, D. (1991), Generalized Linear Models, in Statistical Theory and Modeling, ed. Hinkley, D.V., Reid, N., and Snell, E.J., London: Chapman and Hall.
Lin, D.Y., Wei, L.J., and Ying, Z. (2002), "Model-Checking Techniques Based on Cumulative Residuals," Biometrics, 58.
McCullagh, P., Nelder, J.A. (1983), Generalized Linear Models, New York: Chapman and Hall.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Fox Chase - Temple BMT Program
7604 Central Ave
Philadelphia, PA
Phone: (215)
john.ulicny@tuhs.temple.edu
Web:

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.