Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method
Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3

1 Department of Biostatistics, UCLA School of Public Health, Los Angeles, CA
2 David Geffen School of Medicine at the University of California, Los Angeles, Department of Medicine, Division of General Internal Medicine & Health Services Research, 911 Broxton Ave, 1st Floor, Los Angeles, CA
3 UCLA School of Dentistry, Le Conte Ave, Los Angeles, CA

Abstract

The Zero Truncated Poisson (ZTP) regression model is used to model positive count data, where zero is a potential value but is almost impossible to observe because of the nature of the study and its design. For this kind of data, ZTP is more accurate than the traditional Poisson regression model. In practice, researchers often need to test the difference in predicted counts between groups under a ZTP regression model, and the test result can be misleading if the design is very unbalanced. Combining the ZTP regression model with the recycled predictions method is one possible way to create an identical covariate structure when comparing the predicted counts between groups. This paper uses the ZTP regression model based on the recycled predictions method to model positive count data and estimates the variance of the difference in predicted counts by the delta method. Finally, the model and estimation techniques are applied to a real study, Adherence and Efficacy of Protease Inhibitor Therapy (ADEPT).

Keywords: Zero Truncated Poisson (ZTP) regression model, recycled predictions method, variance estimation, delta method

I. Introduction

Regression models for count data are receiving more and more attention [1-3], even though the use of regression models to describe count data is relatively recent [4-6]. Count data are non-negative integer outcomes, such as the number of international conflicts, daily accidents, industrial injuries and so on.
Count data are also common in clinical settings, such as the number of doctor and hospital visits. In these cases, applying a standard linear regression model directly to count outcomes results in inefficient, inconsistent and biased estimation [7]. It is much better to use models specifically designed for count outcomes. The standard Poisson Regression Model (SPRM) and the standard Negative Binomial Regression Model (SNBRM) are the foundation of other modified count models, such as zero truncated models and zero inflated models. In this paper we are interested only in the zero truncated Poisson (ZTP) regression model.

When a zero count is a possible value but is missing from the data set, we call the data zero truncated. Zero counts are missing because of the sampling scheme, under which a zero count cannot be observed [4-7]. For example, suppose we study how often people have coffee every month and we collect the data in Starbucks. Then we are not able to observe zero counts, since every sampled person has had coffee at least once. Shaw (1988) proposed Poisson regression models for the analysis of truncated samples of count data [8].

The recycled predictions method is widely used to balance the data between different groups. It is very common in medical cost-effectiveness analysis [9], where it creates an identical covariate structure. The method first codes everyone as if they were all in the control group and predicts the outcome for each individual. It then codes everyone as if they were all in the treatment group and again predicts the outcome for each individual. The estimated outcomes of the control group and the treatment group are the arithmetic means of the respective predictions over all individuals in the data set.
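The recoding-and-averaging steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ages and the coefficients are hypothetical values (not estimates from the paper), the helper names `ztp_mean` and `recycled_mean` are invented for the example, and the predicted count is taken to be the zero-truncated Poisson mean lam / (1 - exp(-lam)), averaged over subjects.

```python
import math


def ztp_mean(lam):
    """Mean of a zero-truncated Poisson with rate lam: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)


def recycled_mean(ages, beta, group_value):
    """Average predicted count with every subject recoded to group_value."""
    total = 0.0
    for age in ages:
        # Linear predictor on the log scale; only the group dummy is overwritten.
        eta = beta["const"] + beta["age"] * age + beta["group"] * group_value
        total += ztp_mean(math.exp(eta))
    return total / len(ages)


# Hypothetical ages for the pooled sample (both groups together).
ages = [25.0, 31.0, 38.0, 44.0, 52.0, 47.0]
# Hypothetical fitted coefficients on the log scale (assumed, not from the paper).
beta = {"const": -0.5, "age": 0.01, "group": 0.3}

mean_control = recycled_mean(ages, beta, 0)  # everyone coded as control
mean_treated = recycled_mean(ages, beta, 1)  # everyone coded as treated
print(mean_control, mean_treated, mean_treated - mean_control)
```

Because both averages run over the same set of ages, the two groups are compared on an identical covariate structure, regardless of how unbalanced the original groups were.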
To test whether the values predicted by the above method are significantly different, the essential step is to estimate the variance of the difference, which involves not only the variance of the predicted values but also an adjustment for the variance contributed by all the covariates. This paper discusses using the ZTP regression model based on the recycled predictions method to model positive count data and to estimate the variance of the difference in predicted counts by the delta method.

The idea in this paper comes from a real problem in health services research. We want to test the difference in the length of hospital stay among 6 hospitals, adjusting for age, gender and other demographic information. We use ZTP regression to model the positive count of length of stay. The sample is very unbalanced: one hospital has most of the observations and the other 5 hospitals have far fewer. After we use the ZTP regression model based on the recycled predictions method to predict the counts for the different hospitals, we want to test whether the counts differ between any two hospitals. The research question is how to estimate the standard error of the difference between any two predicted counts from a ZTP model based on recycled predictions. The model and the estimation are derived in the methods section. Finally, the estimation method is applied to a real data example, Adherence and Efficacy of Protease Inhibitor Therapy (ADEPT), in the results section.
II. Methods

2.1 Poisson Model and Poisson Regression Model

The density function of the standard Poisson model can be expressed as follows [1]:

$$P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots \tag{1}$$

Here $\lambda$ is the only parameter in model (1). It denotes the rate of occurrence, or the expected number of times an event will occur over a given period of time. The standard Poisson model is fundamental to understanding the Poisson regression models for counts [7]. The count variable of interest is $Y$, a random variable denoting the number of times an event occurs, and $y = 0, 1, 2, \ldots$ are the possible values of $Y$. The standard Poisson model requires the data to satisfy $E(Y) = \mathrm{Var}(Y) = \lambda$.

In Poisson regression, given the covariate vector $x_i$ for each subject $i$, where $p$ is the number of covariates, the parameter can be estimated at subject level as follows [6] (we use log to denote the natural logarithm):

$$\log \lambda_i = x_i \beta \tag{2}$$

Here $\beta$ is the coefficient vector in the model, which can be estimated by the maximum likelihood method (MLE). The log likelihood function is

$$\ell(\beta) = \sum_i \left[ y_i \log \lambda_i - \lambda_i - \log(y_i!) \right], \quad \lambda_i = \exp(x_i \beta) \tag{3}$$

In the standard Poisson regression model, the conditional mean of $Y_i$ is given by

$$E(Y_i \mid x_i) = \lambda_i = \exp(x_i \beta) \tag{4}$$

Here we allow zero counts in the model.

2.2 Zero Truncated Poisson Regression Model

Zero-truncated Poisson (ZTP) regression, introduced by [8], is used to model counts that are always positive. (If zero is an admissible value for the dependent variable, then standard Poisson regression is more appropriate [6].) The sampling scheme is most likely the reason that gives rise to the ZTP model. The density function for ZTP, after the zero value of $Y_i$ has been truncated, is given by (5) [2-4]. Here $j$ can be any positive integer that $Y_i$ takes, and the probability of $Y_i = j$ is

$$P(Y_i = j \mid Y_i > 0) = \frac{e^{-\lambda_i}\lambda_i^{j}}{j!\,(1 - e^{-\lambda_i})}, \quad j = 1, 2, \ldots \tag{5}$$
Here $\lambda_i$ is the only parameter in the model. We add the index $i$ to $\lambda$ to indicate that different groups or different design matrices may have different parameters. We will later extend this to the regression setting. The expected count for each given parameter is

$$E(Y_i \mid Y_i > 0) = \frac{\lambda_i}{1 - e^{-\lambda_i}} \tag{6}$$

The variance based on the model is

$$\mathrm{Var}(Y_i \mid Y_i > 0) = \frac{\lambda_i}{1 - e^{-\lambda_i}} \left[ 1 + \lambda_i - \frac{\lambda_i}{1 - e^{-\lambda_i}} \right]$$

Comparing (6) and this variance with (4) shows how ZTP differs from the standard Poisson regression. As in the Poisson regression model, $\lambda_i$ at subject level can be estimated by the generalized linear model below [6], where the hat indicates an estimate:

$$\log \hat\lambda_i = x_i \hat\beta + \text{offset}_i \tag{7}$$

The offset is a design-based constant, different from the constant (intercept) in the model. It is a variable that is included in the linear predictor without a corresponding coefficient being estimated [4-6], for example the exposure time or the population size used to estimate a rate. The coefficients can again be estimated by maximizing the log likelihood function

$$\ell(\beta) = \sum_i \left[ y_i \log \lambda_i - \lambda_i - \log(y_i!) - \log\left(1 - e^{-\lambda_i}\right) \right] \tag{8}$$

This log likelihood function is more complicated than that of the standard Poisson model in (3).

2.3 Recycled Predictions Method

Before we use the recycled predictions method to estimate $\lambda$ for each hospital, we first define the notation in more detail. We assume that there are $n_i$ subjects from hospital $i$, $i = 1, \ldots, 6$. Therefore, there are $N = \sum_{i=1}^{6} n_i$ observations from the 6 hospitals in total. The parameter $\lambda_{ik}$ can be estimated for each given $x_{ik}$, where $x_{ik}$ is the covariate vector of subject $k$ in hospital $i$. We suppose that the vector $x_{ik}$ contains the dummy variables for the 6 hospitals, with hospital 6 as the reference group, i.e. $x_{6k}$ represents the information of subject $k$ in hospital 6 with all hospital dummy variables equal to zero. We denote the estimator of the coefficient vector by $\hat\beta$.
Here $\hat\beta_1, \ldots, \hat\beta_5$ denote the components of $\hat\beta$ that correspond to the hospital dummy variables. They are used to estimate $\lambda_{ik}$ for subject $k$ in hospital $i$ given $x_{ik}$, where $i$ runs from 1 to 6 and $k$ from 1 to $n_i$, as in (9):

$$\log \hat\lambda_{ik} = x_{ik} \hat\beta + \text{offset}_{ik} \tag{9}$$

Here $\text{offset}_{ik}$ is the design-based constant at subject level, related to the data collection.

Now we can apply the recycled predictions method, in which we vary the characteristic of interest across the whole data set to create an identical covariate structure for the different hospitals, and then average the predictions across the whole data set to estimate the parameter at hospital level. For example, we treat all the observations as if they came from hospital 1 by setting the hospital 1 dummy variable to one and all other hospital dummy variables to zero for every subject in the data set. We then calculate the hospital-level estimate as the average of the subject-level estimates over all $N$ observations in the data set. With the same technique, we can calculate the estimates for the other hospitals. The reference hospital is estimated by setting all dummy variables to zero. We note that the estimate for hospital $i$ is based on all the observations in the data set and has nothing to do with which hospital each observation originally belongs to. It is obtained by resetting all the dummy variables.

To simplify the notation, we use $k$ to index all the subjects in the data set, where $k = 1, \ldots, N$. At person level, each $\hat\lambda_k^{(i)}$ is estimated from $x_k^{(i)}$, where $x_k^{(i)}$ is obtained from the information $x_k$ with only the dummy variables for hospitals changed. This is because the recycled predictions method for hospital $i$ only involves a change of the dummy variables. We can simplify the notation of the estimate as

$$\log \hat\lambda_k^{(i)} = x_k^{(i)} \hat\beta + \text{offset}_k \tag{10}$$

For each hospital $i$, the estimate depends on the information $x_k$ only through the corresponding set of dummy variables, as above. We use $x_k^{(i)}$ to denote that, for each subject $k$ in the data, a different set of dummy variables is used for each hospital. The hospital-level mean is then calculated as in (11). Its variance is estimated by the delta method, considering the mean as a function of $\hat\beta$, as in (12).

Here the offset is a constant. The variance-covariance matrix of the coefficients is given by STATA as e(V) after fitting the model with the ztp command. The covariate vector $x_k^{(i)}$ is a vector, $x_k^{(i)} \hat\beta$ is a scalar, and the coefficient vector $\hat\beta$ is the same as before. The bootstrap may be an option to estimate the variance, but the calculation in (12) based on the delta method takes the variance of the other covariates into account.

2.4 Estimate the variance of the difference

Next, we derive the estimate of the variance of the difference. In the recycled predictions method, which is used to remove the effects of the other variables, the estimated parameters $\hat\lambda_i$ and $\hat\lambda_j$ for two hospitals ($i \neq j$) can be expressed as in (13). From (11), we use the fact that the covariate vector is the same as before for both hospitals, and that $\hat\beta_i$ and $\hat\beta_j$ are the estimated coefficients in the model corresponding to the two different sets of dummy variables.

For example, if we want to estimate the log-scale difference between hospital 1 and hospital 2, the only difference is the set of dummy variables. For hospital 1, we set the hospital 1 dummy variable to one and all others to zero, while for hospital 2 we set the hospital 2 dummy variable to one and all others to zero. Considering the above fact, we have

$$\log \hat\lambda_k^{(1)} - \log \hat\lambda_k^{(2)} = \hat\beta_1 - \hat\beta_2$$

If we want to compare hospital 1 with the reference hospital 6 (by setting all dummy variables to zero), then the difference is $\hat\beta_1$. Therefore, the only difference between the two groups is related to the corresponding dummy coefficient(s). We have the following relationship between $\hat\lambda_k^{(i)}$ and $\hat\lambda_k^{(j)}$:

$$\hat\lambda_k^{(i)} = c_{ij}\, \hat\lambda_k^{(j)}, \quad c_{ij} = \exp(\hat\beta_i - \hat\beta_j) \tag{14}$$

where $c_{ij}$ is a constant for any given model. When we estimate the covariance between $\hat\lambda_i$ and $\hat\lambda_j$, we need to take the estimation of $c_{ij}$ into account.
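As a numerical illustration of the delta-method idea, the sketch below computes a standard error for the difference between two recycled means and a two-sided normal p-value. It is not the paper's analytic derivation: the gradient is approximated by central finite differences instead of the closed-form derivative, the predicted count is assumed to be the average of subject-level ZTP means, and the ages, coefficients and covariance matrix are hypothetical stand-ins for what a fitted model (for example e(V) in Stata) would supply.

```python
import math


def ztp_mean(lam):
    """Zero-truncated Poisson mean: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)


def recycled_mean(beta, ages, group_value):
    """Mean predicted count with every subject recoded to group_value."""
    b0, b_age, b_group = beta
    preds = [ztp_mean(math.exp(b0 + b_age * a + b_group * group_value)) for a in ages]
    return sum(preds) / len(preds)


def numeric_gradient(f, beta, h=1e-6):
    """Central-difference gradient of f with respect to the coefficients."""
    grad = []
    for j in range(len(beta)):
        up, down = list(beta), list(beta)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def delta_variance(grad, cov):
    """Delta-method variance: grad' * cov * grad."""
    n = len(grad)
    return sum(grad[a] * cov[a][b] * grad[b] for a in range(n) for b in range(n))


# Hypothetical inputs (assumed for illustration only).
ages = [25.0, 31.0, 38.0, 44.0, 52.0, 47.0]
beta_hat = [-0.5, 0.01, 0.3]          # const, age, group
cov = [[0.0400, -0.0005, -0.0050],    # covariance matrix of beta_hat
       [-0.0005, 0.00002, 0.00001],
       [-0.0050, 0.00001, 0.0200]]


def difference(beta):
    """Difference between the two recycled means as a function of beta."""
    return recycled_mean(beta, ages, 1) - recycled_mean(beta, ages, 0)


grad = numeric_gradient(difference, beta_hat)
var_diff = delta_variance(grad, cov)
se_diff = math.sqrt(var_diff)
z = difference(beta_hat) / se_diff
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
print(se_diff, z, p_value)
```

Treating the difference itself as a single function of the coefficients means the covariance between the two recycled means is handled automatically, since both means share the same estimated coefficient vector.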
Next, we derive the covariance of $\hat\lambda_i$ and $\hat\lambda_j$. It would be too complicated to calculate the variance directly. Therefore, we consider the Taylor expansion of (10) and keep only the linear part of the expansion, which gives a linear approximation of the estimate. Finally, we obtain the variance-covariance matrix of $\hat\lambda_i$ and $\hat\lambda_j$ in (15).

Again, we calculate the variance of the predicted count by the delta method. Writing the predicted count as a function of $\lambda$ and taking its derivative with respect to $\lambda$, the variance of the predicted count can be expressed through this derivative and (15). This can be used further for testing whether the two predicted values are equal.

To derive the variance of the difference between two predicted values, we can calculate directly from (10) and (13). We define the difference in (16). Then, by the relationship (14) between $\hat\lambda_i$ and $\hat\lambda_j$, together with (12), the derivative can be expressed as in (17). By the delta method, we finally obtain the variance of the difference between group $i$ and group $j$ in (18). Substituting the values from (11), (15) and (17) gives the standard error of the difference of predicted values, which is the square root of (18). We can further use a test based on this standard error to test whether the two predicted values are equal.

III. Results

We use the Adherence and Efficacy of Protease Inhibitor Therapy (ADEPT) study as a real data example to apply the above theory. It is a prospective, observational investigation of medication adherence among HIV-infected patients starting a new Highly Active Antiretroviral Treatment (HAART) regimen [12-13] from February 1998 through April [...]. At study weeks 0, 8, 24 and 48, each patient was asked for the names of their antiretroviral (ARV) medications. Patients visited the study nurses every 4 weeks for measurement of medication adherence. At each visit, any change in their medications was noted. We model the number of combinations of ARV drugs for each patient in the study. Given the design of the study, every patient has at least one drug combination in the ADEPT data. A Medication Event Monitoring System (MEMS) bottle cap recorded the date and time of each opening of the pill bottle.

In the simplest case, we consider two covariates, age and gender. Of the 116 patients, 8 had 3 drugs, 26 had 2 drugs and the other 82 had only one drug. There are 23 (around 20.0%) females in the data set. The average age of these patients is 37.2 (standard deviation 8.1), ranging from 20.3 to [...]. First we model the data with the classical Zero Truncated Poisson model. The STATA output for the number of ARV drugs based on the model, and the predicted mean number of drugs by gender, are shown in Table 1.

Table 1. Zero Truncated Poisson Model without Using Recycled Predictions

Zero-truncated Poisson regression        Number of obs = 116
                                         LR chi2(2)    =
                                         Prob > chi2   =
Log likelihood =                         Pseudo R2     =

drug   | Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
age    |
sex    |
_cons  |

(a) Standard ZTP model

-> sex = 0
Variable | Obs   Mean   Std. Dev.   Min   Max
drug_ztp |

-> sex = 1
Variable | Obs   Mean   Std. Dev.   Min   Max
drug_ztp |

(b) Predictions based on Standard ZTP model
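As a rough cross-check on the scale of the outcome, the drug counts reported above (82 patients with one drug, 26 with two, 8 with three) are enough to fit an intercept-only ZTP by hand. This is not the model in Table 1, which also includes age and sex; it is a sketch that relies on the standard result that the intercept-only ZTP maximum likelihood estimate of lambda solves lambda / (1 - exp(-lambda)) = sample mean. The solver name `solve_lambda` and the bisection bounds are choices made for this example.

```python
import math

# Frequency table taken from the text: number of drugs -> number of patients.
counts = {1: 82, 2: 26, 3: 8}
n = sum(counts.values())
ybar = sum(y * f for y, f in counts.items()) / n


def ztp_mean(lam):
    """Zero-truncated Poisson mean: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)


def solve_lambda(target, lo=1e-8, hi=50.0, tol=1e-12):
    """Bisection for the lam whose ZTP mean equals target (mean is increasing in lam)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ztp_mean(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


lam_hat = solve_lambda(ybar)
print(n, ybar, lam_hat, ztp_mean(lam_hat))
```

The fitted ZTP mean reproduces the observed mean by construction, while the underlying Poisson rate `lam_hat` is noticeably smaller, which is the adjustment for the unobserved zeros.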
Now we apply the recycled predictions method to modify the predictions. We treat all 116 patients in the data as male and then as female to predict the mean number of drugs. In this case, we predict the outcome adjusted for age based on the ZTP model in Table 1. The recycled predictions are shown in Table 2.

Table 2. The predictions based on the recycled predictions method

Variable    | Obs   Mean   Std. Dev.   Min   Max
nl_drug_gen |
nl_drug_gen |

Comparing Table 1 (b) with Table 2, the predictions are modified by the recycled predictions. Finally, in order to test the difference between these correlated predictions, we use the variance estimation method from the methods section. The estimated standard error for the difference is [...]. We do not have strong evidence (p-value = 0.52) to reject the null hypothesis that the predicted numbers of ARV drug combinations for males and females are the same, based on the ZTP model with the recycled predictions method adjusted for age.

IV. Discussion

In this paper, the zero truncated count is the key point of the model. As [11] mentioned, the interpretation of coefficients in a truncated model is always more complicated than in the standard model. When the truncation exists only in the sample (the zero counts cannot be observed), i.e. the population follows the standard Poisson model without zero truncation, the coefficients have the usual interpretation. Consider the following:

$$E(Y_i \mid Y_i > 0, x_i) = \frac{E(Y_i \mid x_i)}{1 - e^{-\lambda_i}} = \frac{\lambda_i}{1 - e^{-\lambda_i}} \tag{19}$$

A simple multiplicative connection exists between the two models. However, if zero counts are absent not because of the sampling scheme but because a zero count is truly impossible, then this simple connection is no longer correct.

The adverse effects of overdispersion are worse with truncated models [7]. When the sample is not truncated, using SPRM in the presence of overdispersion does not bias the estimated coefficients.
But if we use the ZTP model in the presence of overdispersion, the estimated coefficients will be biased and inconsistent, which in turn leads to biased estimates of the predicted counts. As suggested by [7], before using the ZTP regression model we must check for overdispersion by comparing it with the zero truncated negative binomial (ZTNB) model using a likelihood-ratio test.
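The likelihood-ratio check mentioned above can be sketched as follows, assuming the two maximized log likelihoods are already available from fitted ZTP and ZTNB models. The log likelihood values below are hypothetical, the function name is invented for the example, and halving the chi-square(1) p-value reflects a common convention for testing a dispersion parameter on the boundary of its parameter space, which is treated here as an assumption.

```python
import math


def lr_test_overdispersion(ll_ztp, ll_ztnb):
    """Likelihood-ratio test of ZTP (null) against ZTNB (alternative).

    Returns the LR statistic and a boundary-adjusted p-value, i.e. half the
    upper tail of a chi-square distribution with 1 degree of freedom.
    """
    lr = max(0.0, 2.0 * (ll_ztnb - ll_ztp))
    if lr == 0.0:
        return lr, 1.0
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_chi2 = math.erfc(math.sqrt(lr / 2.0))
    return lr, 0.5 * p_chi2


# Hypothetical maximized log likelihoods from the two fitted models.
ll_ztp = -105.2
ll_ztnb = -104.9

lr_stat, p_value = lr_test_overdispersion(ll_ztp, ll_ztnb)
print(lr_stat, p_value)
```

A small p-value would point towards overdispersion and the ZTNB model; a large one gives no evidence against the simpler ZTP model.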
Reference:

1. Dobson AJ, Dobson A. An Introduction to Generalized Linear Models, Second Edition. 2nd ed. Chapman & Hall/CRC.
2. Grogger JT, Carson RT. Models for Truncated Counts. Journal of Applied Econometrics. 1991;6(3).
3. Springael J, Van Nieuwenhuyse I. On the sum of independent zero-truncated Poisson random variables. University of Antwerp, Faculty of Applied Economics.
4. Hardin JW, Hilbe JM. Generalized Linear Models and Extensions, Second Edition. 2nd ed. Stata Press.
5. Wedel M, Desarbo WS, Bult JR, Ramaswamy V. A Latent Class Poisson Regression Model for Heterogeneous Count Data. Journal of Applied Econometrics. 1993;8(4).
6. Stata Corporation. STATA Base Reference Manual, Volume 3: Q-Z, Release 10. Stata.
7. Long JS, Freese J. Regression Models for Categorical Dependent Variables Using Stata, Second Edition. 2nd ed. Stata Press.
8. Shaw D. On-site samples' regression: problems of non-negative integers, truncation, and endogenous stratification. Journal of Econometrics. 1988;37.
9. Basu A, Meltzer D. Implications of spillover effects within the family for medical cost-effectiveness analysis. Journal of Health Economics. 2005;24(4).
10. Glick H, Doshi JA, Sonnad SS, Polsky D. Economic evaluation in clinical trials. Oxford University Press; 2007.
11. Simonoff JS. Analyzing categorical data. Springer; 2003.
12. Miller LG, Liu H, Hays RD, et al. Knowledge of antiretroviral regimen dosing and adherence: a longitudinal study. Clin. Infect. Dis. 2003;36(4).
13. Golin CE, Liu H, Hays RD, et al. A prospective study of predictors of adherence to combination antiretroviral medication. J Gen Intern Med. 2002;17(10).
More informationIntroductory Econometrics. Lecture 13: Hypothesis testing in the multiple regression model, Part 1
Introductory Econometrics Lecture 13: Hypothesis testing in the multiple regression model, Part 1 Jun Ma School of Economics Renmin University of China October 19, 2016 The model I We consider the classical
More informationBinomial and Poisson Probability Distributions
Binomial and Poisson Probability Distributions Esra Akdeniz March 3, 2016 Bernoulli Random Variable Any random variable whose only possible values are 0 or 1 is called a Bernoulli random variable. What
More informationLecture 7 Time-dependent Covariates in Cox Regression
Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the
More informationCategorical and Zero Inflated Growth Models
Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State University, Corvallis OR 97331 (alan.acock@oregonstate.edu).
More informationLecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University
Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk
More informationEditor Executive Editor Associate Editors Copyright Statement:
The Stata Journal Editor H. Joseph Newton Department of Statistics Texas A & M University College Station, Texas 77843 979-845-3142 979-845-3144 FAX jnewton@stata-journal.com Associate Editors Christopher
More information8. Nonstandard standard error issues 8.1. The bias of robust standard errors
8.1. The bias of robust standard errors Bias Robust standard errors are now easily obtained using e.g. Stata option robust Robust standard errors are preferable to normal standard errors when residuals
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More information1 The basics of panel data
Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Related materials: Steven Buck Notes to accompany fixed effects material 4-16-14 ˆ Wooldridge 5e, Ch. 1.3: The Structure of Economic Data ˆ Wooldridge
More informationBIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke
BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart
More informationECON Introductory Econometrics. Lecture 17: Experiments
ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.
More informationClinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.
Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,
More informationRegression #8: Loose Ends
Regression #8: Loose Ends Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #8 1 / 30 In this lecture we investigate a variety of topics that you are probably familiar with, but need to touch
More informationS o c i o l o g y E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7,
S o c i o l o g y 63993 E x a m 2 A n s w e r K e y - D R A F T M a r c h 2 7, 2 0 0 9 I. True-False. (20 points) Indicate whether the following statements are true or false. If false, briefly explain
More informationSTAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon January 26, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 32 Table of contents Contingency Tables 1 Contingency Tables Mahinda Samarakoon
More informationPBAF 528 Week 8. B. Regression Residuals These properties have implications for the residuals of the regression.
PBAF 528 Week 8 What are some problems with our model? Regression models are used to represent relationships between a dependent variable and one or more predictors. In order to make inference from the
More informationMeasurement Error. Often a data set will contain imperfect measures of the data we would ideally like.
Measurement Error Often a data set will contain imperfect measures of the data we would ideally like. Aggregate Data: (GDP, Consumption, Investment are only best guesses of theoretical counterparts and
More informationClassification & Regression. Multicollinearity Intro to Nominal Data
Multicollinearity Intro to Nominal Let s Start With A Question y = β 0 + β 1 x 1 +β 2 x 2 y = Anxiety Level x 1 = heart rate x 2 = recorded pulse Since we can all agree heart rate and pulse are related,
More informationAnswer all questions from part I. Answer two question from part II.a, and one question from part II.b.
B203: Quantitative Methods Answer all questions from part I. Answer two question from part II.a, and one question from part II.b. Part I: Compulsory Questions. Answer all questions. Each question carries
More informationOrdinal Independent Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised April 9, 2017
Ordinal Independent Variables Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised April 9, 2017 References: Paper 248 2009, Learning When to Be Discrete: Continuous
More informationAt this point, if you ve done everything correctly, you should have data that looks something like:
This homework is due on July 19 th. Economics 375: Introduction to Econometrics Homework #4 1. One tool to aid in understanding econometrics is the Monte Carlo experiment. A Monte Carlo experiment allows
More informationFinal Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)
Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the
More informationECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests
ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 Applied Statistics I Time Allowed: Three Hours Candidates should answer
More informationAssessing the Calibration of Dichotomous Outcome Models with the Calibration Belt
Assessing the Calibration of Dichotomous Outcome Models with the Calibration Belt Giovanni Nattino The Ohio Colleges of Medicine Government Resource Center The Ohio State University Stata Conference -
More informationEssential of Simple regression
Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship
More informationEvaluating Patient Level Costs. Outline
Evaluating Patient Level Costs Statistical Considerations in Economic Evaluations ISPOR 14th Annual International Meeting May 2009 Jalpa Doshi and Henry Glick www.uphs.upenn.edu/dgimhsr Outline Part 1.
More informationLinear Modelling in Stata Session 6: Further Topics in Linear Modelling
Linear Modelling in Stata Session 6: Further Topics in Linear Modelling Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 14/11/2017 This Week Categorical Variables Categorical
More informationsociology 362 regression
sociology 36 regression Regression is a means of modeling how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,
More informationLecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson
Lecture 10: Alternatives to OLS with limited dependent variables PEA vs APE Logit/Probit Poisson PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample
More information1: a b c d e 2: a b c d e 3: a b c d e 4: a b c d e 5: a b c d e. 6: a b c d e 7: a b c d e 8: a b c d e 9: a b c d e 10: a b c d e
Economics 102: Analysis of Economic Data Cameron Spring 2016 Department of Economics, U.C.-Davis Final Exam (A) Tuesday June 7 Compulsory. Closed book. Total of 58 points and worth 45% of course grade.
More information16.400/453J Human Factors Engineering. Design of Experiments II
J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential
More informationfhetprob: A fast QMLE Stata routine for fractional probit models with multiplicative heteroskedasticity
fhetprob: A fast QMLE Stata routine for fractional probit models with multiplicative heteroskedasticity Richard Bluhm May 26, 2013 Introduction Stata can easily estimate a binary response probit models
More informationPath Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis
Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate
More information5. Let W follow a normal distribution with mean of μ and the variance of 1. Then, the pdf of W is
Practice Final Exam Last Name:, First Name:. Please write LEGIBLY. Answer all questions on this exam in the space provided (you may use the back of any page if you need more space). Show all work but do
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationSTAT 5500/6500 Conditional Logistic Regression for Matched Pairs
STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data
More informationLatent class analysis and finite mixture models with Stata
Latent class analysis and finite mixture models with Stata Isabel Canette Principal Mathematician and Statistician StataCorp LLC 2017 Stata Users Group Meeting Madrid, October 19th, 2017 Introduction Latent
More informationAutocorrelation. Think of autocorrelation as signifying a systematic relationship between the residuals measured at different points in time
Autocorrelation Given the model Y t = b 0 + b 1 X t + u t Think of autocorrelation as signifying a systematic relationship between the residuals measured at different points in time This could be caused
More informationleebounds: Lee s (2009) treatment effects bounds for non-random sample selection for Stata
leebounds: Lee s (2009) treatment effects bounds for non-random sample selection for Stata Harald Tauchmann (RWI & CINCH) Rheinisch-Westfälisches Institut für Wirtschaftsforschung (RWI) & CINCH Health
More informationLecture 7: OLS with qualitative information
Lecture 7: OLS with qualitative information Dummy variables Dummy variable: an indicator that says whether a particular observation is in a category or not Like a light switch: on or off Most useful values:
More informationSociology 63993, Exam 2 Answer Key [DRAFT] March 27, 2015 Richard Williams, University of Notre Dame,
Sociology 63993, Exam 2 Answer Key [DRAFT] March 27, 2015 Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ I. True-False. (20 points) Indicate whether the following statements
More informationQuestion 1 carries a weight of 25%; Question 2 carries 20%; Question 3 carries 20%; Question 4 carries 35%.
UNIVERSITY OF EAST ANGLIA School of Economics Main Series PGT Examination 017-18 ECONOMETRIC METHODS ECO-7000A Time allowed: hours Answer ALL FOUR Questions. Question 1 carries a weight of 5%; Question
More informationAddition to PGLR Chap 6
Arizona State University From the SelectedWorks of Joseph M Hilbe August 27, 216 Addition to PGLR Chap 6 Joseph M Hilbe, Arizona State University Available at: https://works.bepress.com/joseph_hilbe/69/
More informationAnalysing repeated measurements whilst accounting for derivative tracking, varying within-subject variance and autocorrelation: the xtiou command
Analysing repeated measurements whilst accounting for derivative tracking, varying within-subject variance and autocorrelation: the xtiou command R.A. Hughes* 1, M.G. Kenward 2, J.A.C. Sterne 1, K. Tilling
More informationPoisson regression: Further topics
Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to
More informationUnobserved Heterogeneity and the Statistical Analysis of Highway Accident Data. Fred Mannering University of South Florida
Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data Fred Mannering University of South Florida Highway Accidents Cost the lives of 1.25 million people per year Leading cause
More information